Group Abstract

Message Boards

WOLFRAM COMMUNITY

10.9K Views

7 Replies

8 Total Likes

View groups...

Follow this post

Share this post:

GROUPS:

Mathematics Wolfram Language Statistics and Probability

Calculation of variance of weighted data

Raspi Rascal

Raspi Rascal, novato, contributor, pseudo-wannabe (not even tryhard)

Posted 4 years ago

The WDC example (/Scope/Data: "Find the variance of WeightedData") merges the internal intermediary result (of the mean) and shows the end result in a single line only, so i cannot back track anymore. Inspired by that WDC example, please could anyone demonstrate how the $\frac{8800}{23}$ was calculated (just the start/from which definition)? Feel free to use two lines: 1 line for the numeric mean, 1 for the variance using that numeric mean. In[1]:= data = {-30, 10, 10, 10, 10, 10, 10, 10, 20, 20};(* sample data ) {Mean[data], Variance[data]}( bias-corrected sample variance) Out[2]= {8, 1760/9} In[3]:= edis = EmpiricalDistribution[data];( population ) {Mean[edis], Variance[edis]}( population variance ) Out[4]= {8, 176} In[5]:= wdata = WeightedData[{-30, 10, 20}, {1/10, 7/10, 2/10}]; wedis = EmpiricalDistribution[wdata]; {Mean[wedis], Variance[wedis]}( okay,as expected ) Out[7]= {8, 176} In[8]:= {Mean[wdata], Variance[wdata]}( which formula/definition used here, why? *) Out[8]= {8, 8800/23} I cannot figure it out, thank you! Best wishes.

The WDC example (/Scope/Data: "Find the variance of WeightedData") merges the internal intermediary result (of the mean) and shows the end result in a single line only, so i cannot back track anymore.

Inspired by that WDC example, please could anyone demonstrate how the $\frac{8800}{23}$ was calculated (just the start/from which definition)? Feel free to use two lines: 1 line for the numeric mean, 1 for the variance using that numeric mean.

In[1]:= data = {-30, 10, 10, 10, 10, 10, 10, 10, 20, 20};(* sample data *)
{Mean[data], Variance[data]}(* bias-corrected sample variance*)

Out[2]= {8, 1760/9}

In[3]:= edis = EmpiricalDistribution[data];(* population *)
{Mean[edis], Variance[edis]}(* population variance *)

Out[4]= {8, 176}

In[5]:= wdata = WeightedData[{-30, 10, 20}, {1/10, 7/10, 2/10}];
wedis = EmpiricalDistribution[wdata];
{Mean[wedis], Variance[wedis]}(* okay,as expected *)

Out[7]= {8, 176}

In[8]:= {Mean[wdata], Variance[wdata]}(* which formula/definition used here, why? *)

Out[8]= {8, 8800/23}

I cannot figure it out, thank you! Best wishes.

POSTED BY: Raspi Rascal

7 Replies

Sort By:

Raspi Rascal

Raspi Rascal, novato, contributor, pseudo-wannabe (not even tryhard)

Posted 4 years ago

Not a question today. I answered it on my own and took these notes, hopefully someone agrees or disagrees or finds it useful.

POSTED BY: Raspi Rascal

Mike Besso

Posted 4 years ago

Raspi: Thank you for the great question. Though I am not sure I fully understand your question, I think I might understand your frustration. The documentation for Variance states that Variance takes either a Distribution or a List as its argument. The head of a WeightedData is WeightedData. While one of the properties of WeightedData is an EmpericalPDF, I do not think that a Probability Density Function is the same thing as a Distribution. So, my best guess is that there is an undocumented use of [Variance] that takes a WeightedData argument. It would be great if someone from Wolfram jumped in to help us better understand.

POSTED BY: Mike Besso

Michael Helmle

Posted 4 years ago

Since Mathematica can do many caclulations symbolic, you can build a small symbolic weighted data list and use this: wdata2 = WeightedData[Array[v, 3], Array[w, 3]] Here you get the formula how Mean and Variance are calculated Mean[wdata2] Variance[wdata2]

POSTED BY: Michael Helmle

Raspi Rascal

Raspi Rascal, novato, contributor, pseudo-wannabe (not even tryhard)

Posted 4 years ago

Thanks but that's exactly the point which i was making. Your output shows the end result, not the definition. I cannot verify where the output comes from. You will not find such an output as the literal definition of the variance, in any reference book, textbook, or elsewhere. I shouldn't give up here. The topic is relevant in practice. Knowing what you're doing is relevant in practice ... because weighted data is everywhere (including introtexts) and the calculation of variance is everywhere too. So if you're not sure what you're doing, you will end up producing the wrong variance value for your problem. Thanks to the documentation of Mathematica :P

POSTED BY: Raspi Rascal

Jim Baldwin

Posted 4 years ago

@MichaelHelmle did give you enough information to determine the answer for `WeightedData`. And it is readily available at a Wiki page (and elsewhere). Using Michael's notation the weighted mean is given by $$\bar{v}=\frac{\sum _{i=1}^n v_i w_i}{\sum _{i=1}^n w_i}$$ And the weighted variance is given by $$\frac{\sum _{i=1}^n w_i (v_i-\bar{v})^2}{V_1-\frac{V_2}{V_1}}$$ where $$V_i=\sum_{j=1}^n w_j^i$$ The weights are assumed to be known and not random variables.

POSTED BY: Jim Baldwin

Raspi Rascal

Raspi Rascal, novato, contributor, pseudo-wannabe (not even tryhard)

Posted 4 years ago

POSTED BY: Raspi Rascal

Mike Besso

Posted 4 years ago

@Raspi Rascal Yes, I am enjoying this discussion. And for me, it is timely. My job is requiring me to do more and more statistics. Keep up the good work.

POSTED BY: Mike Besso

Reply to this discussion

Reply Preview

Attachments

Remove Add a file to this post

Follow this discussion

or Discard

Feedback