Message Boards

Switching samples in Mann-Whitney test changes p-value

Posted 1 year ago

I was doing a demonstration of the Mann-Whitney test for students and found some weird behavior. Take two samples of 6 values each. The first is 1, 2, 3, 4, 5, 6; the second is 7, 8, 9, 10, 11, 12.

data1 = {1, 2, 3, 4, 5, 6};
data2 = {7, 8, 9, 10, 11, 12};

Now perform the Mann-Whitney test on these datasets:

MannWhitneyTest[{data1, data2}]
MannWhitneyTest[{data2, data1}]

Surprisingly, the two calls return different p-values: the first returns 0.0030528, the second 0.00507487.

The same is true for WolframAlpha: MannWhitneyTest[{data1, data2}], MannWhitneyTest[{data2, data1}]

Any ideas why the test implementation is not symmetrical?

8 Replies

Hi, just an update on this. The reason the computation wasn't symmetric is that the previously chosen convention was only asymptotically symmetric. I have now modified this so that the case AlternativeHypothesis -> "Unequal" is symmetric for any Method. I have also added an "Exact" method following the original sources (even though they are missing boundary conditions in the derivation), which matches the tables in those sources. I have chosen a convention that is as consistent as possible with the preexisting "Asymptotic" computation. Unfortunately it won't make it into 14.0.
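Once that lands, the exact computation would presumably be requested through the existing Method option; the calls below are a sketch, and the exact syntax in the released version may differ.

(* sketch: requesting the new exact computation via the Method option;
   syntax in the released version may differ *)
data1 = {1, 2, 3, 4, 5, 6};
data2 = {7, 8, 9, 10, 11, 12};
MannWhitneyTest[{data1, data2}, 0, "PValue", Method -> "Exact"]
MannWhitneyTest[{data2, data1}, 0, "PValue", Method -> "Exact"]
(* the two calls should now agree *)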

POSTED BY: Eduardo Serna

Thanks so much for this input; it helps a lot. I have added it to the test report.

POSTED BY: Eduardo Serna

Thank you very much. One more request: as long as you are fixing the MannWhitneyTest function, it would be great to also fix the behavior of Method -> "Automatic". Currently it always calculates the p-value from the asymptotic normal distribution for U. At the same time, for small samples (<20) an exact p-value could be calculated based on permutations (not Monte Carlo, but a direct formula/table). For instance, in the reported case of data1, data2 the exact p-value is the probability of the 2 extreme cases: all ranks of sample 1 below sample 2, and all ranks of sample 1 above sample 2, which is exactly 2(6!6!)/(6+6)! = 1/462 ≈ 0.00216. The asymptotic normal approximation for U gives 0.003053. This difference is not a big deal if a single M-W test is performed, as both results are significant. But if multiple comparisons are required, then multiple M-W tests are applied with subsequent correction of the p-values by, for instance, the Holm–Bonferroni method. And there the difference between 0.002 and 0.003 might become important.
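For this pair of samples the exact value is easy to verify by hand (only the combinatorial count from above; no special options assumed):

(* exact two-sided p-value for complete separation of two samples of size 6 *)
exact = 2*6!*6!/(6 + 6)!     (* 1/462 *)
N[exact]                     (* 0.0021645 *)

(* asymptotic normal approximation returned by the current default method *)
MannWhitneyTest[{Range[6], Range[7, 12]}]   (* 0.0030528 *)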

Thank you for reporting this behavior; I am currently investigating. The intention of the implementation is certainly to be symmetric, but something is wrong. I will try to get it fixed soon.

POSTED BY: Eduardo Serna

The same problem occurs with SignedRankTest.

By default, MannWhitneyTest uses asymptotic statistics, which is not appropriate for such small samples. Using the permutation method instead, with a reasonable number of permutations (50, 100), I obtained a p-value of zero in both cases.
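Something along these lines (the exact Method and suboption spellings are from memory, so check them against the documentation):

data1 = {1, 2, 3, 4, 5, 6};
data2 = {7, 8, 9, 10, 11, 12};
(* permutation-based p-value with a modest number of permutations *)
MannWhitneyTest[{data1, data2}, 0, "PValue",
 Method -> {"Permutation", "Permutations" -> 100}]
MannWhitneyTest[{data2, data1}, 0, "PValue",
 Method -> {"Permutation", "Permutations" -> 100}]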

POSTED BY: Claude Mante

Is there any way to report this bug? It is extremely consequential. I can't imagine how many datasets from animal studies I've processed with M-W in Mathematica over the last 5 years... It will be a nightmare to revisit all the data once again, considering that a lot of it is already published. I'm shocked.

Yes, this does seem to be a bug. In the M-W test, the test statistic U is the smaller of U1 and U2, but the WL function seems to just take the value of U1, which is the incorrect value when the first group is the one with the larger values. So in your example, the first version, with {data1, data2}, gives the correct answer.
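You can reproduce the two U values from the textbook definitions (a quick sketch, not the internal WL code):

data1 = {1, 2, 3, 4, 5, 6};
data2 = {7, 8, 9, 10, 11, 12};
n1 = Length[data1]; n2 = Length[data2];
ranks = Ordering[Ordering[Join[data1, data2]]];  (* ranks 1..12; no ties here *)
r1 = Total[ranks[[;; n1]]];                      (* rank sum of sample 1: 21 *)
u1 = r1 - n1 (n1 + 1)/2                          (* U1 = 0 *)
u2 = n1 n2 - u1                                  (* U2 = 36 *)
Min[u1, u2]                                      (* classical test statistic U = 0 *)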

POSTED BY: Gareth Russell