@Jim I couldn't follow the wiki article, my bad. Your posted middle formula does the trick for me though, thanks:
In[9]:= V1 = 1/10 + 7/10 + 2/10;(* ==1, True*)
V2 = (1/10)^2 + (7/10)^2 + (2/10)^2;(* ==(27/50), True *)
176/(V1 - (V2/V1))
Out[9]= 8800/23
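As a cross-check (in Python, since that is what I had at hand), the same computation with exact rationals, assuming the data values $-30, 10, 20$ and weights $1/10, 7/10, 2/10$ from In[8] above, where $176$ turns out to be the weighted sum of squared deviations:

```python
from fractions import Fraction as F

x = [-30, 10, 20]                      # sample values (assumed from In[8])
w = [F(1, 10), F(7, 10), F(2, 10)]     # reliability weights

V1 = sum(w)                            # sum of weights, here exactly 1
V2 = sum(wi**2 for wi in w)            # sum of squared weights, 27/50
mu = sum(wi * xi for wi, xi in zip(w, x)) / V1          # weighted mean, 8
ssd = sum(wi * (xi - mu)**2 for wi, xi in zip(w, x))    # weighted SSD, 176
s2 = ssd / (V1 - V2 / V1)              # reliability-weights unbiased variance
print(s2)                              # 8800/23
```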
So … Mathematica uses the formula for the "reliability-weights unbiased (estimate of the) sample variance", aha, good to know! I find the explanation elsewhere more understandable, especially the paraphrase "this expression reduces to the unweighted bias-corrected sample variance with the familiar $\frac{1}{n-1}$ factor when there are $n$ equal weights in the population". Wiki claims that the expectation of this quantity equals the population variance:
$\mathcal{E}[s_w^2]=\sigma_{\text{actual}}^2$
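For reference, written out, the formula I believe is meant here (it matches the In[9] computation, where $176$ is the weighted sum of squared deviations):

$$s_w^2=\frac{\sum_i w_i\,(x_i-\mu^{*})^2}{V_1-V_2/V_1},\qquad V_1=\sum_i w_i,\quad V_2=\sum_i w_i^2,\quad \mu^{*}=\frac{\sum_i w_i x_i}{V_1}.$$

With $n$ equal weights $w_i=1/n$ one gets $V_1=1$, $V_2=1/n$, so the denominator becomes $1-1/n=(n-1)/n$, which recovers the familiar $\frac{1}{n-1}$ factor.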
Two things I learn from this (please correct me if I'm mistaken):
- wdata and data are not related in any way, if data is to be viewed as a sample with $n=10$ observations. Hence the results Out[2] and Out[8] have very different meanings. In particular, the weights in In[8] represent the weights in the population, not the weighting of $-30$ (and of $10$ and $20$) within the sample! For 2 days I was wrong, thinking the opposite; whose fault is that? I am going to blame the ones whose job is to teach the right thing fully, clearly and exemplarily, but who didn't.
- The quality of this estimator is not too good; its "convergence" (and performance) is slow. It is interesting to learn that the estimator does indeed do its job (and does so better for increasing sample sizes $n$), but using the regular estimator $s_{N-1}^2$ gives a far higher-quality estimate of the population variance:
$\mathcal{E}(s_{N-1}^2)=\sigma _{\text{actual}}^2$
Here are two pieces of code to illustrate the correct use of the Variance function on sample data (not population data). The first variant is more intelligible, but the second variant should be preferred imho. It is good to know both for learning/doing statistics in Wolfram Language.
n = 10; (* taking n observations PER SAMPLE *)
A = Tuples[{0, 1}, n]; (* drawing with replacement *)
prules = {0 -> 1/3, 1 -> 2/3}; (* the 2 elements in the population are weighted *)
Total[A /. a : {_?NumericQ ..} :> Variance[WeightedData[a, a /. prules]]]/Length[A] (* should converge to 2/9 *)
As opposed to working with the unbiased sample variance:
piData = Sort@(A /. a : {_?NumericQ ..} :> {Variance[a], Times @@ (a /. prules)}); (* {s^2, probability} for each sample *)
pData = {#[[1, 1]], Total[#[[All, 2]]]} & /@ Split[piData, First[#1] == First[#2] &]; (* merge samples with equal s^2, summing their probabilities *)
var = Times @@@ pData // Total (* expectation of s^2 is 2/9 *)
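The second variant can be cross-checked with an exhaustive enumeration in Python with exact rationals (a sketch; I use a smaller $n$ than the $n=10$ above just to keep the enumeration quick, and the result is exact either way):

```python
from fractions import Fraction as F
from itertools import product
from math import prod

n = 6                                  # smaller than n = 10 above, for speed
p = {0: F(1, 3), 1: F(2, 3)}           # population weights, as in prules

def s2(a):
    """Unbiased sample variance with the 1/(n-1) factor."""
    m = F(sum(a), len(a))
    return sum((xi - m)**2 for xi in a) / (len(a) - 1)

# E[s^2] = sum over all 2^n samples of s^2(sample) * P(sample)
exp_s2 = sum(s2(a) * prod(p[xi] for xi in a)
             for a in product((0, 1), repeat=n))
print(exp_s2)                          # 2/9, the population variance p(1-p)
```

Since $s_{N-1}^2$ is unbiased, the result equals the population variance $\frac{2}{3}\cdot\frac{1}{3}=\frac{2}{9}$ exactly, for any $n\ge 2$.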
This was really essential for me to figure out; I am slowly understanding what the built-in statistics functions do for me, hehe. Also I am still wondering whether the Mathematica of today is better than Mathstatica (powered by Mathematica) and the ubiquitous $R$ for programming in stochastics/statistics. Today I got my hands on the two Murray Spiegel Schaum's Outline workbooks on statistics. Oh, just found a new one, hmmm.
@Mike I was frustrated indeed for 48h haha, but things are clearing up now. Thanks for your attention, appreciated! Hope you enjoyed this 'contribution'. I am all new to statistics, so my steps are tiny, but they need to be firmly rooted (for "knowing what I am doing", which is very important when handling a software application, or maths).