# Gathering from large set of random numbers

Posted 8 years ago
5154 Views
|
14 Replies
|
0 Total Likes
|
 How can I generate a very large (billions) list of sublists consisting of randomly permuted integers from 1 to N and then gather them in subsublists where all elements differ only by one unit? I have especially hard time with the last operation. I guess it has something to do with functional programming. Will appreciate any help.
14 Replies
Sort By:
Posted 8 years ago
 t = Table[Sort[RandomSample[Range[12], 7]], {10}] {{1, 3, 4, 7, 9, 11, 12}, {2, 4, 6, 7, 8, 10, 11}, {2, 4, 5, 7, 9, 11, 12}, {3, 5, 6, 8, 9, 11, 12}, {2, 3, 4, 7, 9, 10, 12}, {1, 2, 4, 8, 9, 10, 12}, {2, 3, 4, 5, 8, 9, 10}, {2, 3, 5, 6, 7, 8, 9}, {1, 3, 6, 7, 9, 10, 11}, {1, 2, 3, 6, 7, 10, 11}} Table[Split[t[[i]], #2 - #1 == 1 &], {i, Length@t}] {{{1}, {3, 4}, {7}, {9}, {11, 12}}, {{2}, {4}, {6, 7, 8}, {10, 11}}, {{2}, {4, 5}, {7}, {9}, {11, 12}}, {{3}, {5, 6}, {8, 9}, {11, 12}}, {{2, 3, 4}, {7}, {9, 10}, {12}}, {{1, 2}, {4}, {8, 9, 10}, {12}}, {{2, 3, 4, 5}, {8, 9, 10}}, {{2, 3}, {5, 6, 7, 8, 9}}, {{1}, {3}, {6, 7}, {9, 10, 11}}, {{1, 2, 3}, {6, 7}, {10, 11}}} I am not sure this is what you want..
Posted 8 years ago
 Split is a problem here. It's up in a few comments.
Posted 8 years ago
 No idea whether this is faster or slower In[1]:= t = Table[Sort[RandomSample[Range[12], 7], Less], {10}] Out[1]= {{2, 3, 4, 6, 7, 11, 12}, {3, 4, 6, 8, 9, 11, 12}, {1, 2, 3, 4, 8, 9, 10}, {1, 2, 5, 6, 8, 9, 12}, {2, 7, 8, 9, 10, 11, 12}, {1, 2, 4, 5, 6, 7, 9}, {2, 3, 4, 6, 7, 8, 12}, {3, 4, 7, 8, 9, 11, 12}, {1, 2, 3, 5, 6, 9, 11}, {2, 4, 5, 6, 7, 8, 11}} In[2]:= one = Table[1, {7 - 1}]; Map[1 + Total[Unitize[Rest[#] - Most[#] - one]] &, t] Out[3]= {3, 4, 2, 4, 2, 3, 3, 3, 4, 3} 
Posted 8 years ago
 This is great! Faster 20 times! I need one more order of magnitude and that's it.
Posted 8 years ago
 Perhaps this is what you want to learn next.
Posted 8 years ago
 Just to elaborate a little: t = Table[Sort[RandomSample[ll, 7], Less], {10^5}] How can I write a function (I assume with Map) so that it will return the number of non-consecutive neighbor discontinuities in each sublist, say 12356710 has two discontinuities? The above written Split is slow. I think there should be other faster way without drastically reorganizing the list.
Posted 8 years ago
 I think the problem is Split function, it takes a lot of time to execute. I was trying to put something like CountDistnictBy but i don't know how to supply the test function that would count when the neighbors' difference is more than 2.
Posted 8 years ago
 I think the compilation wasn't good. Here is what CompilePrint returned: No argument 8 Integer registers 2 Real registers 1 Tensor register Underflow checking off Overflow checking off Integer overflow checking on RuntimeAttributes -> {} I0 = 0 I7 = 1000000 I1 = 12 I6 = 7 I3 = 1 Result = I4 1 V17 = MainEvaluate[ Function[{}, k = 0][ ]] 2 V17 = MainEvaluate[ Function[{}, ll = Range[12]][ ]] 3 V17 = MainEvaluate[ Function[{}, n = 7][ ]] 4 I5 = I7 5 I4 = I0 6 goto 8 7 R1 = MainEvaluate[ Function[{}, k = k + \ Length[Split[Sort[RandomSample[ll, n]], #1 - #2 == -1 & ]]][ ]] 8 if[ ++ I4 <= I5] goto 7 9 I4 = MainEvaluate[ Function[{}, k][ ]] 10 Return 
Posted 8 years ago
 Good catch. Thank you for pointing out my mistake.I didn't take time to check that because I (incorrectly) assumed all those steps looked simple enough that they would compile without complaint.Perhaps you can experiment with small changes to this until you isolate what is causing the compile to hook back to the Mathematica evaluator and then see if you can think of a simple way to rewrite that.If I had to bet then I would expect it is the user defined function being given to Split that isn't compiling. Replacing that with some very simple iterative code might fix the problem.That then might give the 3-10x increase in speed.I would also watch for the size of the total being accumulated in k because it looks like it might overflow with larger exponents.
Posted 8 years ago
 Thank you. That helps.You are doing lots of small integer operations. My first thought is to compare your code with a compiled version of your code.I quit the kernel, evaluate x=0 to get the kernel loaded and then I evaluate your function. In[1]:= x = 0 Out[1]= 0 In[2]:= AbsoluteTiming[ k = 0; ll = Range[12]; n = 7; For[i = 1, i <= 10^6, i++, k += Length[Split[Sort[RandomSample[ll, n], Less], #1 - #2 == -1 &]] ]; k] Out[2]= {19.4494, 3500839} Now I quit the kernel, evaluate x=0 and then I evaluate the compiled version. In[1]:= x = 0 Out[1]= 0 In[2]:= cfun = Compile[{}, k = 0; ll = Range[12]; n = 7; Do[ k += Length[Split[Sort[RandomSample[ll, n]], #1 - #2 == -1 &]] , {10^6}]; k]; AbsoluteTiming[cfun[]] Out[3]= {15.3916, 3500458} So that is 19.4 seconds versus 15.4 seconds. That isn't enough of an increase.Perhaps someone else can think of another way to increase this even more.
Posted 8 years ago
 So far I wrote this code. But it is remarkably inefficient, I can't get beyond 10^7. Any way to accelerate? ll = Range[12]; k = 0; n = 7; AbsoluteTiming@(For[i = 1, i <= 10^9, i++, k += Length[ Split[Sort[RandomSample[ll, n], Less], #1 - #2 == -1 &]]; ];) k 
Posted 8 years ago
 I suspect you might have a better chance of getting useful replies if you were to manually create a small example list of permuted integers for a small N, a set small enough that it is easy understand, but just large enough that it will show each of the kinds of cases that might arise. Then manually show what the result should be after you perform your operation on these sets and explain why.It might not be precisely clear to some readers exactly what "gather them in subsublists where all elements differ only by one unit" means and without that understanding it might be difficult to offer the code idea you are looking for.
Posted 8 years ago
Posted 8 years ago
 Not sure what it has to do with my question.