
How to input classed (grouped) data for empirical distribution

Originally I wanted to ask about this. Then I figured it out on my own, along with some other undocumented (or poorly documented) tricks and workarounds, so I thought I should write it up in English and share it with everyone. I could have reorganized the text, but in the end it's okay as it is imho, so please bear with the repetitiveness. Beginners in this area of maths might find some of the information useful.

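If the random variable takes discrete values, or the class limits are not contiguous (e.g. one class ends at 159 and the next starts at 160), then one must make the entire range continuous, i.e. transform the pseudo-discrete problem into a continuous one, so that adjacent classes share a boundary. In the first example below, the bin edges are therefore placed at the half-units 149.5, 154.5, ..., 184.5. For example: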

1) calculation of the median from grouped data with "discontinuous" borders:

In[1]:= d = DataDistribution["Histogram",
            {
             {2, 5, 11, 14, 9, 3, 0}/(44*5)
             , Range[149.5, 184.5, 5]
            }, 1, 44];
        Median[d]

Out[2]= 165.929
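
As a cross-check by hand, the usual linear-interpolation formula for the median of grouped data appears to reproduce this value. The symbols are my own labels, not anything from the Mathematica documentation: L is the lower boundary of the median class, w the class width, N the total count, F the cumulative count below the median class, and f the count inside it. With the counts above, the cumulative counts are 2, 7, 18, 32, so N/2 = 22 falls into the fourth class [164.5, 169.5):

```latex
\tilde{x} \;=\; L + \frac{N/2 - F}{f}\, w
          \;=\; 164.5 + \frac{22 - 18}{14}\cdot 5
          \;\approx\; 165.929
```

This agrees with Out[2], which suggests that the histogram distribution treats the data as uniformly spread within each class.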

2) calculation of the standard deviation from grouped data with "discontinuous" borders:

In[3]:= d = DataDistribution["Histogram",
            {
             {3, 5, 9, 12, 8, 0}/(37*10)
             , Range[50, 110, 10]
            }, 1, 37];
        StandardDeviation[d] // N

Out[4]= 12.3324
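
This result can also be checked by hand, assuming the histogram distribution is uniform within each class. Again the symbols are my own labels: f_i are the class counts, m_i the class midpoints (55, 65, ..., 105), w = 10 the class width, and N = 37. Under that assumption each class contributes an extra within-class variance of w^2/12:

```latex
\mu      = \frac{1}{N}\sum_i f_i\, m_i = \frac{2945}{37} \approx 79.595
\\[4pt]
\sigma^2 = \frac{1}{N}\sum_i f_i\, m_i^2 \;-\; \mu^2 \;+\; \frac{w^2}{12}
         = \frac{239725}{37} - \left(\frac{2945}{37}\right)^{2} + \frac{100}{12}
         \approx 152.088
\\[4pt]
\sigma   = \sqrt{\sigma^2} \approx 12.332
```

This matches Out[4]. Note that the textbook midpoint formula without the w^2/12 term would give about 11.99 instead, so the two conventions differ slightly.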
POSTED BY: Raspi Rascal