
Visualize probability distribution data as BarChart or Histogram?

Posted 7 years ago

Dear All,

I have the following probability distribution data, and I would like to visualize it as a bar chart or histogram. The command below works, but it has two problems:

1) The axis labels in the generated plot are cluttered, and the values are hard to distinguish.
2) I would like to draw a proper bar chart/histogram so that I can later fit a distribution to it and plot the fit. My command places the bars right next to one another at equal spacing, whereas the horizontal positions should reflect the actual values, which are not equally spaced. For example, the distance between 0.3 and 0.375 is not the same as the distance between 0 and 0.1.

I would greatly appreciate your help and suggestions.

Best regards, Alex.

    data = {{0.`, 0.14943079650402`}, {0.1`, 
        0.00021827389719`}, {0.1111111`, 0.00032284128247`}, {0.125`, 
        0.00105345470365`}, {0.1428571`, 0.00189604749903`}, {0.1666667`, 
        0.00496507203206`}, {0.2`, 0.0128972902894`}, {0.2222222`, 
        0.00094202003675`}, {0.25`, 0.03224607929587`}, {0.2857143`, 
        0.00478106550872`}, {0.3`, 0.00093960686354`}, {0.3333333`, 
        0.08392683416605`}, {0.375`, 0.00326516595669`}, {0.4`, 
        0.02893583476543`}, {0.4285714`, 0.0072281844914`}, {0.4444444`, 
        0.00200031138957`}, {0.5`, 0.22689934074879`}, {0.5555556`, 
        0.00208450458013`}, {0.5714286`, 0.00720308022574`}, {0.6`, 
        0.02667851746082`}, {0.625`, 0.00338809774257`}, {0.6666667`, 
        0.10289531946182`}, {0.7`, 0.00141532090493`}, {0.7142857`, 
        0.00444651674479`}, {0.75`, 0.03364257141948`}, {0.7777778`, 
        0.00085208012024`}, {0.8`, 0.01232228334993`}, {0.8333333`, 
        0.00435024732724`}, {0.8571429`, 0.00152650044765`}, {0.875`, 
        0.00095871940721`}, {0.8888889`, 0.00024728401331`}, {0.9`, 
        0.00045259558829`}, {1.`, 0.23558811843395`}};

    BarChart[data[[All, 2]], ChartLabels -> data[[All, 1]], Frame -> True,
      GridLines -> {None, Automatic}, BarSpacing -> Automatic]
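
For the first problem (cluttered labels), one possible tweak is to rotate the labels so they no longer overlap. This is only a sketch, assuming `data` is defined as above; the use of `Placed` with `Rotate` is a suggestion and not part of the original command. It leaves the bars equally spaced, so it does not address the second problem.

```mathematica
(* Same bar chart as above, with each label rotated by 90 degrees *)
(* Assumes `data` is the list of {value, probability} pairs defined above *)
BarChart[data[[All, 2]],
  ChartLabels -> Placed[data[[All, 1]], Axis, Rotate[#, Pi/2] &],
  Frame -> True,
  GridLines -> {None, Automatic}]
```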
POSTED BY: Alex Token
5 Replies
Posted 7 years ago

I also find that a simple command may give a good description of how the data behaves.

    data = {{0., 0.14943079650402}, {0.1, 0.00021827389719}, {0.1111111, 0.00032284128247},
       {0.125, 0.00105345470365}, {0.1428571, 0.00189604749903}, {0.1666667, 0.00496507203206},
       {0.2, 0.0128972902894}, {0.2222222, 0.00094202003675}, {0.25, 0.03224607929587},
       {0.2857143, 0.00478106550872}, {0.3, 0.00093960686354}, {0.3333333, 0.08392683416605},
       {0.375, 0.00326516595669}, {0.4, 0.02893583476543}, {0.4285714, 0.0072281844914},
       {0.4444444, 0.00200031138957}, {0.5, 0.22689934074879}, {0.5555556, 0.00208450458013},
       {0.5714286, 0.00720308022574}, {0.6, 0.02667851746082}, {0.625, 0.00338809774257},
       {0.6666667, 0.10289531946182}, {0.7, 0.00141532090493}, {0.7142857, 0.00444651674479},
       {0.75, 0.03364257141948}, {0.7777778, 0.00085208012024}, {0.8, 0.01232228334993},
       {0.8333333, 0.00435024732724}, {0.8571429, 0.00152650044765}, {0.875, 0.00095871940721},
       {0.8888889, 0.00024728401331}, {0.9, 0.00045259558829}, {1., 0.23558811843395}};

    ListPlot[data]

POSTED BY: Alex Token
Posted 7 years ago

One just doesn't automatically get relative frequencies when sampling from a probability distribution. What is the sample size and how did you end up with such odd, unequally spaced values?

POSTED BY: Jim Baldwin
Posted 7 years ago

Dear Jim, this was sent to me as raw data; the sample size is about 300,000,000. The numbers come from Census data. My task is to graph it (preserving the original “distances” between the horizontal axis values) with “spikes” of the given heights.

POSTED BY: Alex Token
Posted 7 years ago

Dear Jim, thank you for your reply. No, I actually have the data as is: it shows the possible values of a random variable and the associated frequencies. I need to see it in a "correct" histogram-type form or, preferably, just as "thin spikes" at the given data points. For example, at the horizontal axis value of 0 there is a vertical spike of height 0.14943079650402, at 0.1 another spike of height 0.00021827389719, etc. Thank you.
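
A minimal sketch of the "thin spikes" plot described above, assuming `data` is the list of {value, frequency} pairs from the original post. The styling options and the frame labels are illustrative choices, not something taken from the thread.

```mathematica
(* Spike plot: each point sits at its true horizontal position, *)
(* and Filling -> Axis draws a vertical line from the point down to the axis *)
ListPlot[data,
  Filling -> Axis,
  FillingStyle -> Directive[Thick, Blue],
  PlotStyle -> PointSize[Small],
  PlotRange -> All,   (* keep the tall spikes at 0, 0.5 and 1 fully visible *)
  Frame -> True,
  FrameLabel -> {"value", "relative frequency"}]
```

Because `ListPlot` uses the first element of each pair as the x-coordinate, the unequal distances between values are preserved automatically, unlike in `BarChart`.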

POSTED BY: Alex Token
Posted 7 years ago

Do you have the raw data (i.e., the data before binning)? If not, about all you can do is get the horizontal axis cleaned up. There are 15 different distances between the horizontal values. Maybe describing how the data got into the form it's in would help. At a minimum, the actual sample size is needed if you want to fit a probability distribution and have some idea of the precision of the parameter estimates.
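
The spacing between the horizontal values can be inspected directly, and the tabulated frequencies can be treated as a weighted empirical distribution for quick summaries. The following is a rough sketch, assuming `data` is defined as in the original post; the names `xs`, `ws`, `gaps`, and `dist` are introduced here for illustration only.

```mathematica
xs = data[[All, 1]];  (* horizontal values *)
ws = data[[All, 2]];  (* relative frequencies *)

(* Distinct gaps between consecutive values, rounded to the 7 decimals given in the data *)
gaps = Union[Round[Differences[xs], 10^-7]];
Length[gaps]

(* Should be close to 1 if the second column really holds relative frequencies *)
Total[ws]

(* Weighted empirical distribution, using the weights -> values form *)
dist = EmpiricalDistribution[ws -> xs];
{Mean[dist], StandardDeviation[dist]}
```

This only summarizes the tabulated values. It says nothing about the precision of any fitted parameters, for which the sample size is still needed.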

POSTED BY: Jim Baldwin