Analyse a large data set using the function "HistogramDistribution"?

Posted 7 months ago
When I analyze a large data set (see the attachment) using the function "HistogramDistribution",

dataout = Import["/Users/apple/Desktop/dataOut.txt", "Table"];

HistogramDistribution[dataout[[All, {1, 2}]], "Scott"]

an error occurs:

Thread::tdlen: Objects of unequal length in (1/99953){0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,<<42>>} {<<1>>} cannot be combined.

And the output data distribution becomes a mess. The strange thing is, when I change the bspec "Scott" to "Sturges" or "Automatic", everything works.

HistogramDistribution[dataout[[All, {1, 2}]], "Sturges"]

Now for some practical reasons, I have to use "Scott" to analyze the data. So could anyone help me solve this? Thanks a lot.

This appears to work for me in Mathematica 11.3. What version are you using?

Posted 7 months ago

I have tried 11.0 and , both failed. Could you upload your code and the output results? Thank you.

It looks like there was a bug in the HistogramDistribution code that was introduced in version 10.4 and fixed in version 11.2. I'm afraid I don't know of a workaround for the affected versions.

And the code I used was:

data = Import["~/Downloads/dataOut_2.txt", "TSV"];
HistogramDistribution[data[[2 ;;, {1, 2}]], "Scott"]

Posted 7 months ago

Right, solved! Thank you!

As far as I can see the problem comes from the very first line of your data:

[Screenshot not preserved]

So instead you might try:

HistogramDistribution[dataout[[2 ;;, {1, 2}]], "Scott"]
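If you would rather not hard-code which row to skip, a minimal sketch along the following lines may help. It assumes the problematic first line is a non-numeric header row (a guess on my part), and the name numericRows is only illustrative. It cannot work around a bug in HistogramDistribution itself; it only keeps non-numeric rows out of the input.

```mathematica
(* Keep only rows whose first two columns are both numeric, *)
(* so a text header row (assumed here) is dropped automatically. *)
numericRows = Select[dataout, Length[#] >= 2 && VectorQ[#[[{1, 2}]], NumericQ] &];

(* Same call as above, applied to the filtered rows. *)
HistogramDistribution[numericRows[[All, {1, 2}]], "Scott"]
```

Comparing Length[dataout] with Length[numericRows] shows how many rows were discarded.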

Hope that helps, regards -- Henrik

Posted 7 months ago

Sorry, maybe I did not present the problem clearly. I have now deleted the first line of the .txt file, but the problem remains.

All I can say is that this is working:

[Screenshot not preserved]

