# Input of Frequency distribution

Posted 6 months ago
How do we represent the frequency distribution below in Mathematica? I can find the mean, median and mode manually. I am just curious how to represent the frequency distribution in Mathematica and determine the mean, median and mode from it.

The values that I determined manually using the respective formulas are mean = 35.8867, median = 33.9444, mode = 28.1364.

I tried to approach it in Mathematica 12.1 by using a probability distribution:

    pdf[x_] := Piecewise[{{20, 19.5 <= x < 24.5}, {24, 24.5 <= x < 29.5},
        {45, 29.5 <= x < 39.5}, {30, 39.5 <= x < 54.5},
        {5, 54.5 <= x < 59.5}, {4, 59.5 <= x < 69.5}}, 0]
    dist = ProbabilityDistribution[pdf[x], {x, 19.5, 69.5}, Method -> "Normalize"];
    Mean[dist]
    Median[dist]

The results that I obtained are mean = 38.9198 and median = 37.7738, which are incorrect, and so is my approach. Can anyone help?
Posted 6 months ago
Hello, your mean is correct, since Integrate[x PDF[dist, x], {x, 19.5, 69.5}] = 38.9198.
Posted 6 months ago
From an elementary statistics text, the mean is computed as:

    sumFreq = Total[{20, 24, 45, 30, 5, 4}]
    mean = Total[{22*20, 27*24, 34.5*45, 47*30, 57*5, 64.5*4}]/sumFreq

which gives mean = 35.8867.

The median class is 30-39, and the median is computed as

    median = 29.5 + 10 ((sumFreq/2 - (20 + 24))/45)

which gives median = 33.9444.

The mode is estimated by

    mode = 24.5 + (24/5 - 20/5)/((24/5 - 20/5) + (24/5 - 45/10)) 5

which gives mode = 28.1364.

My doubt is whether my representation of the uneven frequency distribution using Piecewise and ProbabilityDistribution is appropriate for estimating the mean and median. If it is, then Integrate[x PDF[dist, x], {x, 19.5, 69.5}] = 38.9198 is also correct. But why the discrepancy from the manual estimate of the mean using the class midpoints? Thanks for your kind response.
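The grouped mean above can also be sketched without typing the products by hand. This is only an illustration: the names bounds, freqs and mids are made up for the example, and the class boundaries are assumed to be the ones used in the Piecewise definition in the original post.

```mathematica
(* class boundaries and frequencies, as quoted in this thread *)
bounds = {19.5, 24.5, 29.5, 39.5, 54.5, 59.5, 69.5};
freqs = {20, 24, 45, 30, 5, 4};

(* class midpoints: averages of consecutive boundaries *)
mids = MovingAverage[bounds, 2]   (* {22., 27., 34.5, 47., 57., 64.5} *)

(* grouped mean: each midpoint weighted by its class frequency *)
Mean[WeightedData[mids, freqs]]   (* 35.8867 *)
```

WeightedData only carries the midpoints and their weights, so it reproduces the grouped mean but not the interpolated median or mode, which depend on the class widths.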
Posted 6 months ago
The above mean computation is not properly weighting for interval lengths. So it would actually be:

    vals = {20, 24, 45, 45, 30, 30, 30, 5, 4, 4};
    midpts = {22, 27, 32, 37, 42, 47, 52, 57, 62, 67};
    midpts.vals/Total[vals] // N
    (* 38.9198 *)
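One possible way to see where the two means part ways, offered as a sketch rather than a verdict on either method: a density whose heights are the raw frequencies gives each class the weight frequency × width (the 38.9198 result), while heights of frequency / width give each class the weight frequency. Mathematically, the second density has mean 4593.5/128 ≈ 35.8867 and median 29.5 + 10 (20/45) ≈ 33.9444, the values obtained by hand. The names pdfDensity and distDensity are illustrative.

```mathematica
(* same classes as the original pdf, but each height is frequency / class width,
   so each piece integrates to its class frequency *)
pdfDensity[x_] := Piecewise[{{20/5, 19.5 <= x < 24.5}, {24/5, 24.5 <= x < 29.5},
    {45/10, 29.5 <= x < 39.5}, {30/15, 39.5 <= x < 54.5},
    {5/5, 54.5 <= x < 59.5}, {4/10, 59.5 <= x < 69.5}}, 0]

(* "Normalize" rescales the total area (128 here) to 1 *)
distDensity = ProbabilityDistribution[pdfDensity[x], {x, 19.5, 69.5},
   Method -> "Normalize"];

Mean[distDensity]
Median[distDensity]
```

Mean and Median are called exactly as in the original post; only the heights change.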
Posted 6 months ago
Since my Mean[dist] is correct, is my Median[dist] correct as well? Thanks for your help.
Posted 6 months ago
"My doubt is whether my representation of the uneven frequency distribution using Piecewise and ProbabilityDistribution appropriate to estimate the mean and median?"

Yes, but other representations are probably better. It's an issue of density estimation. Your dist estimates the density by a piecewise constant function, while the solution proposed by D.L. is the midpoint density estimation.

More sophisticated methods consist of fitting the associated distribution function by some F in a convenient basis (polynomials (basic, trigonometric, Bernstein, ...), splines, wavelets). Then f := F' is a density estimate, Integrate[x f[x]] is the estimate of the mean, x0 such that F[x0] == 0.5 is an estimate of the median, etc. There is a large literature on this topic.
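As a minimal sketch of the F-based median estimate described above, assuming the simplest choice of F: a linear interpolation of the cumulative relative frequencies at the class boundaries (which corresponds to a piecewise constant density). The names bounds, freqs, cum and invF are illustrative, and the boundaries are assumed to be those of the original Piecewise definition.

```mathematica
bounds = {19.5, 24.5, 29.5, 39.5, 54.5, 59.5, 69.5};
freqs = {20, 24, 45, 30, 5, 4};

(* value of F at each class boundary, rising from 0 to 1 *)
cum = N[Prepend[Accumulate[freqs], 0]/Total[freqs]];

(* linear interpolation of the inverse of F: cumulative proportion -> value *)
invF = Interpolation[Transpose[{cum, bounds}], InterpolationOrder -> 1];

invF[0.5]   (* median estimate, 29.5 + 10 (20/45), about 33.9444 *)
```

Interpolating the inverse avoids a root search for F[x0] == 0.5; it relies on the cumulative proportions being strictly increasing, which holds here because every class has a nonzero frequency. Other quantiles follow the same way, for example invF[0.25].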
Posted 6 months ago
 Thanks for your explanation. Very much appreciated. I will find out more about kernel density estimation. The representation of frequency distribution is in the realm of elementary statistics but the theory behind it, as happens most of the time, is not elementary. Thanks again.
Posted 6 months ago
In an effort to contribute something (useful, I hope) to the community after all these years, I offer this schoolboy approach with yet another result for the weighted average... 34. But I have no idea how to add the code from my cloud directory, so I attach the nb file in hopes someone can see it.
You are not alone, I am lost somewhere between Wolfram Language, math and physics 97% of the time. I used ConstantArray as an easy way to generate repeated frequency values the length of the mass span. ConstantArray[x, 4] -> {x, x, x, x}, so I just repeated the frequencies for the length of the mass span. That way, all data is weighted properly. I would not recommend trying this with large data sets, but for lists of under several hundred elements it is a straightforward method.

Note that I erred in the data table, so the proper result is 39. Technically, you cannot get more significant digits than the least of your data, so you can only resolve to 39. Here is a corrected code block...

    Block[{x = {{{20, 24}, 20}, {{25, 29}, 24}, {{30, 39}, 45},
        {{40, 54}, 30}, {{55, 59}, 5}, {{60, 69}, 4}}, m, f, a},
     (* every integer mass in each class *)
     m = Table[Range[x[[i, 1, 1]], x[[i, 1, 2]]], {i, 1, Length@x}];
     (* the class frequency repeated once per mass in the class *)
     f = Table[ConstantArray[x[[i, 2]], Length[m[[i]]]], {i, 1, Length@x}];
     a = Flatten[m].Flatten[f]/Total[Flatten[f]];
     Column[{
       StringForm["Given a mass-frequency table: ``", MatrixForm[x\[Transpose]]],
       StringForm["The table range of masses m \[Rule] ``", m // MatrixForm],
       StringForm["Corresponding frequencies f \[Rule] ``", f // MatrixForm],
       StringForm["Thus m*f/(Total@f) evaluates average a \[TildeTilde] ``", Round[N@a, 1]]
      }]
    ]
Thanks for your time and effort to help explain the code, David. What if I change the code to the following?

    midpts = Range[20, 69];
    freq = Join[Table[20, {5}], Table[24, {5}], Table[45, {10}],
       Table[30, {15}], Table[5, {5}], Table[4, {10}]];
    mean = midpts . freq/Total[freq] // N

The answer, 38.9198 (39 to 2 sig fig), is the same as yours. Please comment.