Unexpected Variance values with WeightedData

Posted 1 day ago

The variances of WeightedData are wrong in Mathematica.
Mathematica lacks quality testing.

Moreover, Mean and Variance of WeightedData are wrong in the documentation.

https://reference.wolfram.com/language/ref/Mean.html

https://reference.wolfram.com/language/ref/Variance.html


POSTED BY: Paolo Zavarise
8 Replies

Variance of WeightedData gives the population variance, while Variance on a set of raw observations gives the unbiased sample variance.
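
For reference, a sketch of the two definitions being contrasted, assuming observations $x_1, \dots, x_n$ with non-negative weights $w_i$ (the symbols are chosen here for illustration and are not taken from the documentation):

$$\mu_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}, \qquad \sigma_w^2 = \frac{\sum_{i=1}^{n} w_i (x_i - \mu_w)^2}{\sum_{i=1}^{n} w_i}, \qquad s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

With equal weights, $\mu_w = \bar{x}$ and $\sigma_w^2 = \frac{n-1}{n} s^2$, so the two results differ by the factor $(n-1)/n$ rather than agreeing.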

POSTED BY: David Trimas

I also think there is a problem with WeightedData and the documentation... With the same data generation:

Mean@d
Mean[WeightedData[d, w]] (* Correct, but different from the documentation of Mean! *)
Mean[d w]/Total[w] (* From the documentation of Mean for WeightedData *)

As for the variance, the problem seems linked to unbiasedness:

Variance@d (* unbiased *)
Mean[(d - Mean@d)^2] (nbpoints/(nbpoints - 1)) (* unbiased *)
Variance[WeightedData[d, w]] (* Problematic definition in the documentation for WeightedData *)
Mean[(d - Mean@d)^2] (* biased *)
POSTED BY: Claude Mante
Posted 19 hours ago

Does this make sense? I don't think so.

The rule should be: equal weights give the same result as no weights.

POSTED BY: Paolo Zavarise

To add to David's point, data will be treated as population data if wrapped in EmpiricalDistribution[]:

Variance[EmpiricalDistribution@data]

This is documented in Mean[], Variance[], and StandardDeviation[] under "Details":

The data can have the following additional forms and interpretations:... WeightedData — weighted mean, based on the underlying EmpiricalDistribution  »

(This comes a couple of items after the erroneous formulae that have been pointed out already.)

POSTED BY: Michael Rogers
Posted 12 hours ago

It's good that you pointed this out, and it would be helpful if the specific documentation errors were described. For example, the documentation for Mean with weighted data shows

[Image: weighted mean definition in the documentation]

The $\frac{1}{n}$ should not be in that definition.
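
A sketch of the definition without that factor, assuming the documented formula otherwise has the usual weighted-sum form (the symbols $w_i$ and $x_i$ are chosen here for illustration):

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

With all $w_i$ equal, this reduces to the ordinary mean $\frac{1}{n}\sum_{i=1}^{n} x_i$; an additional $\frac{1}{n}$ in front would instead give $\bar{x}/n$.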

POSTED BY: Jim Baldwin

Yes! The formula for weighted mean in the documentation is wrong - but the result is correct. More clearly:

In[70]:= nbpoints = 10;
d = RandomReal[{-1, 1}, nbpoints];
w = Table[1, nbpoints];
Proba = w/Total@w; (* Probability *)
UsualMean = Mean@d
WeightedMean = Mean[WeightedData[d, Proba]] (* Correct, but different from the documentation of Mean! *)
DocMean = (1/nbpoints) Total[Proba d] (* According to the documentation of Mean for WeightedData *)

Out[74]= -0.268511

Out[75]= -0.268511

Out[76]= -0.0268511

DocMean is wrong. As for the variance, compute:

WeightedVar = Variance[WeightedData[d, Proba]] (* Problematic definition in the documentation for WeightedData *)
Good = Total[Proba (d - WeightedMean)^2]
DocVar = (1/(nbpoints - 1)) Total[Proba (d - Mean@d)^2] (* According to the documentation of Variance for WeightedData *)

WeightedVar = Good, which I think is correct. DocVar is wrong too, but the problem lies in the documentation, not in the computations: a lesser evil!
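
To make the size of the discrepancy explicit, here is a short check under the setup above (equal weights, so each entry of Proba is $1/n$ and WeightedMean equals Mean@d; $n$ stands for nbpoints and $\bar{d}$ for Mean@d):

$$\text{Good} = \frac{1}{n}\sum_{i=1}^{n}(d_i - \bar{d})^2, \qquad \text{DocVar} = \frac{1}{n-1}\cdot\frac{1}{n}\sum_{i=1}^{n}(d_i - \bar{d})^2 = \frac{\text{Good}}{n-1}$$

For nbpoints = 10, DocVar would therefore be one ninth of Good.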

POSTED BY: Claude Mante

Maybe there is a different definition, but I tried 2 examples on the given websites and everything is fine.

POSTED BY: Mariusz Iwaniuk