Message Boards

Replicate Dave Giles' permutation test using WL?

Posted 3 months ago
787 Views | 10 Replies | 2 Total Likes

Dear community members, I'm trying to replicate Dave Giles' permutation-test examples using Mathematica, but I'm having difficulty randomly selecting 50,000 samples from all the possible permutations of a list of 20 prices. Mathematica runs out of memory on my laptop (8 GB RAM).

prices = {5.0, 4.8, 4.7, 4.0, 5.3, 4.1, 5.5, 4.7, 3.3, 4.0, 4.0, 4.6, 
   5.3, 3.0, 3.5, 3.9, 4.7, 5.0, 5.2, 4.6};
In[4]:= RandomSample[Permutations[prices], 50000]

During evaluation of In[4]:= General::nomem: The current computation was aborted because there was insufficient memory available to complete the computation.

During evaluation of In[4]:= Throw::sysexc: Uncaught SystemException returned to top level. Can be caught with Catch[\[Ellipsis], _SystemException].

Out[4]= SystemException["MemoryAllocationFailure"]

Is there an alternative way that I could use to generate those samples?

10 Replies

Just drop the use of Permutations. Permutations[prices] tries to build the full list of orderings explicitly (on the order of 20! ≈ 2.4*10^18 of them), which cannot fit in memory. Instead, draw each random ordering directly:

prices = {5.0, 4.8, 4.7, 4.0, 5.3, 4.1, 5.5, 4.7, 3.3, 4.0, 4.0, 4.6, 5.3, 3.0, 
  3.5, 3.9, 4.7, 5.0, 5.2, 4.6};
samples = Table[RandomSample[prices, Length[prices]], {i, 50000}];

Just a simple comment: your method is obviously more efficient than the OP's. However, the resulting collection of permutations may contain duplicates, whereas Ruben's original approach (RandomSample from the full list, which samples without replacement) would not.

Point well taken. However, here's my (maybe feeble) defense:

  • The R code used in Dave Giles' permutation-test examples also allows for potential duplicates. The probability of any duplicates at all among 50,000 samples drawn from 20! possible orderings is extremely small. (Although to have a proper defense I'll need to quantify that.)

  • The number of duplicates in any particular sample using the method I proposed will almost always be zero. One could execute Length[samples] - Length[DeleteDuplicates[Sort[samples]]] to provide the number of duplicates.

  • If one looked at the distribution of the 20! test statistics and the distribution of an infinite number of samples drawn with replacement from those 20! test statistics, the two distributions would be identical, since each of the 20! test statistics would be represented in the same proportion.
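The duplicate count from the second point can be checked directly. Here is a sketch, re-using the Table construction from the first reply:

```mathematica
samples = Table[RandomSample[prices, Length[prices]], {i, 50000}];
(* Sort makes identical orderings adjacent; DeleteDuplicates then
   collapses them. The difference in lengths is the number of
   duplicated samples, which is almost always 0. *)
Length[samples] - Length[DeleteDuplicates[Sort[samples]]]
```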

There will be no practical difference in the result with or without duplicates for this particular permutation test and dataset. (I am reminded of the Hans Christian Andersen story "The Princess and the Pea".) Or rather, the only practical difference is the ability to perform the test at all (allowing duplicates) versus not (disallowing them).

Re that small probability: It appears to be a birthday paradox problem, and if I set it up correctly the probability is around 5*10^(-10).

Not that you need my confirmation but, yes, that is a good way to set up the problem:

1 - 50000! Binomial[20!, 50000]/(20!)^50000 // N
(* 5.13779*10^-10 *)

However, if you see lots of duplicates (or even one) when running the code, please buy a lotto ticket for me.

How about

Permute[prices, RandomPermutation[Length[prices]]]

This gives only one permutation and no duplicates.

Regards,

Neil

But it is apparently desired to have 50,000 permutations with none of those arrangements duplicated. With a sample of size 1 there are by definition no duplicates. What am I missing?

Jim,

Actually, after rereading what you posted more carefully, I agree with you. You can obviously expand the approach I posted above to include 50000 samples:

Map[Permute[prices, #] &, RandomPermutation[Length[prices], 50000]]

but this would have the same duplicates as your approach and would be slower than what you posted.

I like your post best. I suppose you can always guarantee no duplicates with:

DeleteDuplicates[
 Table[RandomSample[prices, Length[prices]], {i, 50000}]]

and it is still very fast (although duplicates are unlikely).

Regards,

Neil

Good idea, Neil. If one needed exactly 50,000 samples with no duplicates: sample somewhat more than 50,000, delete duplicates, and use the first 50,000.
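That oversample-then-trim idea can be sketched as follows (the margin of 500 extra draws is an arbitrary illustrative choice):

```mathematica
(* Draw a few hundred extra random orderings, drop any duplicates,
   and keep the first 50,000. *)
candidates = Table[RandomSample[prices, Length[prices]], {50500}];
samples = Take[DeleteDuplicates[candidates], UpTo[50000]];
Length[samples]
```

UpTo guards the (astronomically unlikely) case where deduplication leaves fewer than 50,000 orderings.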

Dear all, thank you for your answers. I followed Jim's initial suggestion and got a result (0.00184) quite close to Dave's figures (0.0017 and 0.0016). Regards, Ruben
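For readers who want to see how the 50,000 samples feed into the test itself, here is a hedged sketch of the full pipeline. The statistic below (correlation of the prices with their time index) is purely a placeholder, since the thread does not reproduce the statistic from Giles' post; substitute the actual one:

```mathematica
(* Placeholder order-dependent statistic; replace stat with the
   statistic from Giles' example. *)
stat[p_] := Correlation[p, N@Range[Length[p]]];
observed = stat[prices];
(* Null distribution: the statistic over random re-orderings. *)
permuted = Table[stat[RandomSample[prices, Length[prices]]], {50000}];
(* Two-sided p-value: fraction of permuted statistics at least as
   extreme as the observed one. *)
N[Count[permuted, s_ /; Abs[s] >= Abs[observed]]/50000]
```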
