Input of Frequency distribution

Posted 6 months ago

How do we represent the frequency distribution below in Mathematica?

[Image: frequency distribution table]

I can find the mean, median and mode manually. I am curious how I can represent the frequency distribution in Mathematica and determine the mean, median and mode from it.

The values that I determined manually using the respective formulas are mean = 35.8867, median = 33.9444, mode = 28.1364

I tried to approach it in Mathematica 12.1 by using a probability distribution:

pdf[x_] := Piecewise[{{20, 19.5 <= x < 24.5},
             {24, 24.5 <= x < 29.5},
             {45, 29.5 <= x < 39.5},
             {30, 39.5 <= x < 54.5},
             {5, 54.5 <= x < 59.5},
             {4, 59.5 <= x < 69.5}}, 0]

dist = ProbabilityDistribution[pdf[x], {x, 19.5, 69.5}, Method -> "Normalize"];

Mean[dist]

Median[dist]

The results I obtained are mean = 38.9198 and median = 37.7738, which are incorrect, so my approach must be wrong as well.

Can anyone help?

10 Replies

Hello, your mean is correct, since

Integrate[x PDF[dist, x], {x, 19.5, 69.5}]

evaluates to 38.9198.
Posted 6 months ago

From an elementary statistics text, the mean is computed as:

sumFreq = Total[{20, 24, 45, 30, 5, 4 }]

mean = Total[{22*20, 27*24, 34.5*45, 47*30, 57*5, 64.5*4}]/sumFreq

which gives mean = 35.8867

The median class is 30-39, and the median is computed as

median = 29.5 + 10 ((sumFreq /2 - (20 + 24))/45)

which gives median = 33.9444

The mode is estimated by

mode = 24.5 + (24/5 - 20/5)/((24/5 - 20/5) + (24/5 - 45/10)) 5

which gives mode = 28.1364
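The three hand calculations above can be collected into one sketch for grouped data. The variable names (bounds, freqs, dens and so on) are illustrative and do not come from the original posts, and the mode line assumes the frequency-density form of the interpolation formula used above.

```mathematica
(* Class boundaries and frequencies from the table in the question *)
bounds = {19.5, 24.5, 29.5, 39.5, 54.5, 59.5, 69.5};
freqs = {20, 24, 45, 30, 5, 4};

widths = Differences[bounds];      (* class widths: 5, 5, 10, 15, 5, 10 *)
mids = MovingAverage[bounds, 2];   (* class midpoints: 22, 27, 34.5, 47, 57, 64.5 *)
n = Total[freqs];

(* Mean: class midpoints weighted by frequency *)
groupedMean = mids . freqs/n

(* Median: interpolate inside the first class whose cumulative frequency reaches n/2 *)
cum = Prepend[Accumulate[freqs], 0];   (* cum[[k]] = count below class k *)
k = LengthWhile[Accumulate[freqs], # < n/2 &] + 1;
groupedMedian = bounds[[k]] + widths[[k]] (n/2 - cum[[k]])/freqs[[k]]

(* Mode: interpolate inside the class with the highest frequency density *)
dens = Join[{0}, freqs/widths, {0}];   (* zero-padded so edge classes also work *)
j = First[Ordering[freqs/widths, -1]];
d1 = dens[[j + 1]] - dens[[j]];        (* modal density minus previous class density *)
d2 = dens[[j + 1]] - dens[[j + 2]];    (* modal density minus next class density *)
groupedMode = bounds[[j]] + widths[[j]] d1/(d1 + d2)
```

For this table the median class is the third one (30-39) and the modal class is the second one (25-29), matching the classes used in the manual formulas.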

My doubt is whether my representation of the uneven frequency distribution using Piecewise and ProbabilityDistribution is appropriate to estimate the mean and median? If it is, then

Integrate[x PDF[dist, x], {x, 19.5, 69.5}], which evaluates to 38.9198, is also correct.

But then why does it differ from the manual estimate of the mean using the class midpoints?

Thanks for your kind response.
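One possible explanation for the discrepancy, offered as a sketch rather than a definitive answer: if the frequencies in the table are total counts per class, a piecewise-constant density that spreads each count evenly over its class needs height frequency/width rather than the raw frequency. With raw frequencies as heights, the wider classes receive extra weight. The names pdfDensity and distDensity are illustrative.

```mathematica
(* Assumption: each frequency is the total count for its class,
   so the height of each piece is frequency / class width *)
pdfDensity[x_] := Piecewise[{{20/5, 19.5 <= x < 24.5},
    {24/5, 24.5 <= x < 29.5},
    {45/10, 29.5 <= x < 39.5},
    {30/15, 39.5 <= x < 54.5},
    {5/5, 54.5 <= x < 59.5},
    {4/10, 59.5 <= x < 69.5}}, 0]

distDensity = ProbabilityDistribution[pdfDensity[x], {x, 19.5, 69.5},
   Method -> "Normalize"];

Mean[distDensity]
Median[distDensity]
```

Under this assumption the mean of the density equals the midpoint-weighted mean, and its median equals the interpolated median from the textbook formula, so the two calls should reproduce the manual values 35.8867 and 33.9444.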

The above mean computation is not properly weighting for interval lengths. So it would actually be:

vals = {20, 24, 45, 45, 30, 30, 30, 5, 4, 4};
midpts = {22, 27, 32, 37, 42, 47, 52, 57, 62, 67};
midpts.vals/Total[vals] // N

(* Out[169]= 38.9198 *)
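
As a cross-check (a sketch with illustrative names), the same value results from weighting the six original class midpoints by frequency times class width. This is the weighting that both the sub-interval lists above and a Piecewise density with raw frequencies as heights apply.

```mathematica
classMids = {22, 27, 34.5, 47, 57, 64.5};
classFreqs = {20, 24, 45, 30, 5, 4};
classWidths = {5, 5, 10, 15, 5, 10};

(* Each class is weighted by frequency times width *)
weights = classFreqs classWidths;      (* {100, 120, 450, 450, 25, 40} *)
classMids . weights/Total[weights]     (* 38.9198 *)
```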
Posted 6 months ago

Since my Mean[dist] is correct, is my Median[dist] correct as well?

Thanks for your help.

"My doubt is whether my representation of the uneven frequency distribution using Piecewise and ProbabilityDistribution appropriate to estimate the mean and median?" Yes, but other representations are probably better. It's an issue of Density Estimation.

Your dist estimates the density by a piecewise constant function, while the solution proposed by D.L. is the midpoint density estimation. More sophisticated methods fit the associated distribution function by some F expressed in a convenient basis (polynomials (basic, trigonometric, Bernstein, ...), splines, wavelets). Then f := F' is a density estimate, Integrate[x f[x]] is an estimate of the mean, x0 such that F[x0] == 0.5 is an estimate of the median, and so on. There is a large literature on this topic.
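
A minimal sketch of that recipe, assuming the simplest possible fit: F is taken as the linear interpolation of the cumulative relative frequencies at the class boundaries (an ogive). The names bounds, freqs and cdfFit are illustrative. Instead of differentiating F, the mean uses the identity mean = b - Integrate[F[x], {x, a, b}], which holds for a distribution supported on [a, b].

```mathematica
bounds = {19.5, 24.5, 29.5, 39.5, 54.5, 59.5, 69.5};
freqs = {20, 24, 45, 30, 5, 4};

(* Cumulative relative frequency at each class boundary, starting from 0 *)
cumRel = Prepend[Accumulate[freqs], 0]/Total[freqs];

(* Piecewise-linear fit F of the distribution function *)
cdfFit = Interpolation[Transpose[{bounds, cumRel}], InterpolationOrder -> 1];

(* Median: solve F[x0] == 0.5, starting the search inside the data range *)
medianEst = x0 /. FindRoot[cdfFit[x0] == 0.5, {x0, 35}]

(* Mean: b - Integrate[F, {a, b}] for a distribution supported on [a, b] *)
meanEst = Last[bounds] - NIntegrate[cdfFit[x], {x, First[bounds], Last[bounds]}]
```

With a linear fit this is the textbook interpolation again; a smoother basis such as splines would give somewhat different estimates.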

Posted 6 months ago

Thanks for your explanation. Very much appreciated. I will find out more about kernel density estimation. The representation of frequency distribution is in the realm of elementary statistics but the theory behind it, as happens most of the time, is not elementary. Thanks again.

Posted 6 months ago

In an effort to contribute something (useful, I hope) to the community after all these years, I offer this schoolboy approach with yet another result for the weighted average... 34.

But, I have no idea how to add the code from my cloud directory, so I attach the nb file in hopes someone can see it.

Posted 6 months ago

Thanks David. I'm trying to understand your use of ConstantArray in detailing the frequencies. I'm lost.

Posted 6 months ago

You are not alone; I am lost somewhere between Wolfram Language, math and physics 97% of the time. I used ConstantArray as an easy way to generate repeated frequency values over the length of the mass span: ConstantArray[x, 4] gives {x, x, x, x}, so I just repeated each frequency for the length of its mass span. That way, all data is weighted properly. I would not recommend trying this with large data sets, but for lists of under several hundred elements it is a straightforward method.

Note that I erred in the data table, so the proper result is 39. Technically, you cannot get more significant digits than your least precise data, so you can only resolve to 39. Here is a corrected code block:

Block[{x = {{{20, 24}, 20}, {{25, 29}, 24}, {{30, 39}, 45}, {{40, 54},
      30}, {{55, 59}, 5}, {{60, 69}, 4}}, m, f, a},
 m = Table[Range[x[[i, 1, 1]], x[[i, 1, 2]]], {i, 1, Length@x}];
 f = Table[ConstantArray[x[[i, 2]], Length[m[[i]]]], {i, 1, Length@x}];
 a = Flatten[m].Flatten[f]/Total[Flatten[f]];
 Column[{
   StringForm["Given a mass-frequency table: ``", 
    MatrixForm[x\[Transpose]]],
   StringForm["The table range of masses m \[Rule] ``", 
    m // MatrixForm],
   StringForm["Corresponding frequencies f \[Rule] ``", 
    f // MatrixForm],
   StringForm[
    "Thus m*f/(Total@f) evaluates average a \[TildeTilde] ``", 
    Round[N@a, 1]]
   }]
 ]
Posted 6 months ago

Thanks for your time and effort in explaining the code, David.

What if I change the code to the following?

midpts = Range[20, 69];    
freq = Join[Table[20, {5}], Table[24, {5}], Table[45, {10}], 
   Table[30, {15}], Table[5, {5}], Table[4, {10}]];    
mean = midpts . freq /Total[freq] // N

The answer 38.9198 (39 to 2 sig fig) is the same as yours. Please comment.
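
One comment, offered as a sketch under a stated assumption: repeating the full class frequency at every integer mass counts a class once per unit of its width, so the wide classes get extra weight. If each frequency is instead taken as the total count for its class and shared equally among the integer masses in that class, the list version looks like this (masses, classFreqs, classLengths and perMass are illustrative names):

```mathematica
masses = Range[20, 69];
classFreqs = {20, 24, 45, 30, 5, 4};
classLengths = {5, 5, 10, 15, 5, 10};   (* number of integer masses per class *)

(* Share each class frequency equally among the masses in that class *)
perMass = Flatten[
   MapThread[ConstantArray[#1/#2, #2] &, {classFreqs, classLengths}]];

Total[perMass]                          (* 128, the total frequency *)
masses . perMass/Total[perMass] // N    (* 35.8867 *)
```

With this weighting the list-based mean agrees with the textbook midpoint calculation rather than with 38.9198.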
