Group Abstract

Message Boards

WOLFRAM COMMUNITY

0

|

15.4K Views

|

13 Replies

|

16 Total Likes

View groups...

Follow this post

Share

Share this post:

GROUPS:

Data Science Mathematics Physics Mathematica Wolfram Language Statistics and Probability

Generate weighted random numbers

Michael Knoerzer

Michael Knoerzer

Posted 12 years ago

Hi there, I need to generate weighted random numbers. The resulting distribution of random numbers should correspond to a scalar function W(a, b). The function Random[] only gives a uniformly distribution. I think there is a function to generate weighted random numbers, but seems it does not work with a continuously function. Any ideas how to do this with mathematica? Thanks a lot in advance! Cheers, Michael

POSTED BY: Michael Knoerzer

13 Replies

Sort By:

0

Raymond Low

Posted 2 years ago

If I wanted a random sampling of a weighted distribution that was not a normal distribution or probability distribution, WM documentation implies that I use p5 = RandomSample[w -> n, 5] where w = weight and n = variable. How does Mathematica make this weighted sampling??

POSTED BY: Raymond Low

2

Jari Kirma

Posted 2 years ago

Probably by accumulating weights together, taking an uniform sample on the total range of those weights, and figuring out on what interval corresponding to a value the sample ended (or doing the same by subtracting weights from a random sample until it is below zero), then performing the same with remaining values and weights for the next sample in the case of RandomSample. Like this, but more efficiently: ClearAll[randomSample]; randomSample[w0_List -> v0_List, n0_ : default] := With[{n = Min[n0, default] /. default -> Length[w0]}, Last[Nest[ Apply[Function[{w, v, l}, With[{r = Total[w] RandomReal[]}, LengthWhile[Accumulate[Most[w]], # < r &] + 1 // {Delete[w, #], Delete[v, #], Append[l, v[[#]]]} &]]], {w0, v0, {}}, n]]]; randomSample[{1, 3, 5} -> {a, b, c}, 2] (* {c, b} *)

POSTED BY: Jari Kirma

0

Raymond Low

Posted 2 years ago

Thank you, Jari. I have read your code many times, but I don't know WM well enough to picture the verbiage, I imagined that w -> n would have to regenerate the database enumerating the variables as many times as their weight. RandomSample - to create random numbers to choose variables and then remove them before the next selection. It would seem simpler to take RandomSample of the original database, and then to Tally the database before taking the RandomSample. If the database had duplicates, does RandomSample remove the duplicates or just not replace the variable selected? Again thank you for your insights.

POSTED BY: Raymond Low

2

Jari Kirma

Posted 2 years ago

If a value is present multiple times in the input (and zero weights are not involved) it can also be present multiple times in the result from RandomSample. I'm not entirely certain how to interpret your question, but it should also be noted that - although not obvious as a feature - weights can actually affect the order of items on the result, even when it is a permutation. Consider the following, for instance: RandomSample[{1, 1000, 1000000, 1000000000} -> {1, 2, 1, 3}] This almost always results the following: {3, 1, 2, 1} Thus tallying in order to avoid manipulating the value and weight lists is unlikely to work as expected.

POSTED BY: Jari Kirma

2

Daniel Lichtblau

Daniel Lichtblau, Wolfram Research

Posted 2 years ago

If only a small number of values are required it will just pick uniform random values between 0 an 1, use the weights in effect to form subintervals of (0,1) to see which input from the list gets selected. That element is then marked as removed, so further selections of that element will be discarded. When many elements are needed, the idea is to prune the list every so often, and reweight accordingly. This at least is how I recall it being done. But this was done years ago and not by me, so I might be remembering incorrectly how it was actually coded.

POSTED BY: Daniel Lichtblau

2

Jari Kirma

Posted 2 years ago

"Discarding" a value in further selections can mean two things: re-running selection step if the item is already taken, or discarding it from further selection candidates. With very skewed probabilities the first option can be very inefficient, while the second is just (relatively) fine. Based on run time performance I could imagine something like the following being done per resulting value in C pseudocode: item_t getone(item_t values, double weights, double weightsum) { r = weightsum * randomreal(); i = 0; while (r > weights[i]) { r -= weights[i]; i++; } *weightsum -= weights[i]; weights[i] = 0; return values[i]; } One could even perform in-place compaction for such arrays... but I guess I'm thinking on quite non-Mathematica abstraction level here.

POSTED BY: Jari Kirma

2

Daniel Lichtblau

Daniel Lichtblau, Wolfram Research

Posted 2 years ago

I meant the first. Actual deletion is `O(n)`, so not really fine, unless absolutely necessary, if the list is long. I do not know how the code deals with the lopsided weights issue but most likely by having a threshold of misses-to-hits after which reweighting is done; that's what I mean by "absolutely necessary". As for in-place compaction, possibly that can be done using an indexed priority queue, if one is willing to accept each selection being `O(log n)` instead of `O(1)`. Operative term being "possibly".

POSTED BY: Daniel Lichtblau

1

Jari Kirma

Posted 2 years ago

There are certainly plenty of options available here if one really wants to go into optimising certain kinds of workloads. Just that it's pretty obvious from run time observations that the simplest strategy of "repeat if item is marked already used" is not at least always used, because if it would, taking a permutation of a list with trillion-to-one weight distribution would not be effectively instantaneous...

POSTED BY: Jari Kirma

0

Raymond Low

Posted 2 years ago

the nuances escape me db = RandomInteger[89, 5589]; db[[10 ;; 20]] Histogram[db, 89] My question would be - which method would be a better representation of the database, RandomSample on the database or RandomSample on the Tally of the database? dbTally = Tally[db]; dbTally[[10 ;; 20]]; w = dbTally[[All, 2]]; n = dbTally[[All, 1]]; sampledb = RandomSample[db, 7] sampleTally = RandomSample[w -> n, 7] I guess you have answered the question "Thus tallying in order to avoid manipulating the value and weight lists is unlikely to work as expected." And the extra work only adds distortion to the selection.

POSTED BY: Raymond Low

0

Jari Kirma

Posted 2 years ago

Ah, it may be that I misunderstood something... `RandomSample` removes the sampled value from the value list before sampling again. So, `RandomSample[{1000, 1} -> {1, 2}, 2]` may have 1 only once in a result, and only twice in `RandomSample[{999, 1, 1} -> {1, 2, 3}, 3]`, while with corresponding `RandomChoice` both are capable and very likely to return only ones. If `RandomSample` `db` directly it is capable of returning each unique value the number of times `Tally` would indicate...

POSTED BY: Jari Kirma

0

Michael Knoerzer

Michael Knoerzer

Posted 12 years ago

Thanks a lot for your replies. I need to check if I can somehow use a built-in distribution. I am not sure about this.

POSTED BY: Michael Knoerzer

2

Jari Kirma

Posted 12 years ago

As Daniel Lichtblau stated, RandomVariate combined with a specific distribution (such as NormalDistribution) does the trick, and Mathematica provides plenty of built-in distributions to play with. One benefit of using built-in distributions is that we can easily and efficiently manipulate them symbolically. For instance, numeric random variates of a specific distributions are available: RandomVariate[NormalDistribution[0, 1], {20}] (* {0.270132, -1.79075, 0.0748062, 0.715661, 0.617855, 0.202395, \ -1.01686, -0.125683, 1.5585, 1.42078, 2.04629, -0.309086, -1.33024, \ 0.591959, 0.530648, -0.52868, -1.07803, 0.504637, 0.327951, -0.569275} ) Variance[%] ( 0.963508 ) ... but also, it is possible compare this with symbolic result: Variance[NormalDistribution[0, 1]] ( 1 ) EDIT: This is probably closest to what you want. This is an example directly from Mathematica documentation: dist = ProbabilityDistribution[(Sqrt[2] / Pi) (1 / (1 + x^4)), {x, -Infinity, Infinity}]; It's important to define the function so that it integrates to unity over range of defined ProbabilityDistribution. Integrate[(Sqrt[2] / Pi) (1 / (1 + x^4)), {x, -Infinity, Infinity}] ( 1 ) Some random variates: Histogram[RandomVariate[dist, {1000}]] And symbolic manipulation is possible: Variance[dist] ( 1 ) Also, using TransformedDistribution, we can also map one distribution to another. Let's take inverse function of cumulative distribution function of a normal distribution: InverseFunction[CDF[NormalDistribution[0, 1]]][x] ( -Sqrt[2] InverseErfc[2 x] ) Now we use this to map uniform distribution to above normal distribution: dist = TransformedDistribution[-Sqrt[2] InverseErfc[2 x], x \[Distributed] UniformDistribution[{0, 1}]]; We can generate transformed numeric random variates this way: Histogram[RandomVariate[dist, {10000}]] Also, symbolic manipulation is still possible. Now it's considerably slower, though (first value is runtime in seconds, second is the symbolic result): Variance[dist] // AbsoluteTiming ( {2.605953, 1} *)

POSTED BY: Jari Kirma

3

Daniel Lichtblau

Daniel Lichtblau, Wolfram Research

Posted 12 years ago

If your W corresponds to a probability distribution (which I would think is the case) then RandomVariate is meant for this purpose.

POSTED BY: Daniel Lichtblau

Reply to this discussion

Reply Preview

Attachments

Remove Add a file to this post

Follow this discussion

or Discard

Feedback