Message Boards Message Boards

0
|
7630 Views
|
2 Replies
|
1 Total Likes
View groups...
Share
Share this post:

Sampling Without Replacement much faster than Built-In

Attachments:
POSTED BY: Brian Beckman
2 Replies

That's a nice solution. The only downsides w.r.t. Vitter's that I can think of are (1) the storage for the hash table, which can be mitigated by making it a Module variable so it's garbage-collected; permuting the ordered output of Vitter's in the ordinary use-case would also entail extra storage, so, in practice, I call that a wash, (2) the time to regenerate numbers in case of collision, negligible unless the sample size approaches the population size, which is not the use case here, where we are interested in small samples from very large populations of integers.

POSTED BY: Brian Beckman

RandomSample takes a set as input. It scales linearly in the size of the set, in the above examples, for the fairly obvious reason that it has to first create the set.

A fast way to get a few elements between 1 and a given maximum, without replacement, is as below. It uses hashing under the hood to remember what's been chosen. We allow for multiple calls, keeping already chosen elements from being reselected. I did not add error handling for the case where there are not sufficiently many elements remaining to fill the order.

randomSampleFromRange[n_, r_, taken_] := Reap[Module[
    {j = 0, val},
    While[j < r,
      val = RandomInteger[{1, n}];
      If[! TrueQ[taken[val]], j++; Sow[val]; taken[val] = True];
      ];
    ]][[2, 1]]

Example:

Clear[taken]
Timing[randomSampleFromRange[10^10, 20, taken]]

(* Out[15]= {0., {5225044465, 7784713962, 4676630142, 5502367616, 
  561195181, 3490396528, 4458882112, 5747601606, 8671904831, 
  7764579636, 6624948369, 4611866065, 4057914586, 8132719602, 
  4118571904, 6751169498, 9765405784, 2128317632, 6907625021, 
  1835393080}} *)
POSTED BY: Daniel Lichtblau
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract