Message Boards Message Boards

0
|
5954 Views
|
14 Replies
|
0 Total Likes
View groups...
Share
Share this post:
GROUPS:

Gathering from large set of random numbers

Posted 9 years ago

How can I generate a very large (billions) list of sublists consisting of randomly permuted integers from 1 to N and then gather them in subsublists where all elements differ only by one unit? I have especially hard time with the last operation. I guess it has something to do with functional programming. Will appreciate any help.

POSTED BY: Al Guy
14 Replies

enter image description here

POSTED BY: Simon Cadrin
Posted 9 years ago

Not sure what it has to do with my question.

POSTED BY: Al Guy
Posted 9 years ago

I suspect you might have a better chance of getting useful replies if you were to manually create a small example list of permuted integers for a small N, a set small enough that it is easy understand, but just large enough that it will show each of the kinds of cases that might arise. Then manually show what the result should be after you perform your operation on these sets and explain why.

It might not be precisely clear to some readers exactly what "gather them in subsublists where all elements differ only by one unit" means and without that understanding it might be difficult to offer the code idea you are looking for.

POSTED BY: Bill Simpson
Posted 9 years ago

So far I wrote this code. But it is remarkably inefficient, I can't get beyond 10^7. Any way to accelerate?

ll = Range[12];
k = 0; n = 7;
AbsoluteTiming@(For[i = 1, i <= 10^9, i++,

    k += Length[
       Split[Sort[RandomSample[ll, n], Less], #1 - #2 == -1 &]];

    ];)
k
POSTED BY: Al Guy
Posted 9 years ago

Thank you. That helps.

You are doing lots of small integer operations. My first thought is to compare your code with a compiled version of your code.

I quit the kernel, evaluate x=0 to get the kernel loaded and then I evaluate your function.

In[1]:= x = 0

Out[1]= 0

In[2]:= AbsoluteTiming[
 k = 0; ll = Range[12]; n = 7;
 For[i = 1, i <= 10^6, i++, 
  k += Length[Split[Sort[RandomSample[ll, n], Less], #1 - #2 == -1 &]]
 ];
 k]

Out[2]= {19.4494, 3500839}

Now I quit the kernel, evaluate x=0 and then I evaluate the compiled version.

In[1]:= x = 0

Out[1]= 0

In[2]:= cfun = Compile[{},
   k = 0; ll = Range[12]; n = 7;
   Do[
    k += Length[Split[Sort[RandomSample[ll, n]], #1 - #2 == -1 &]]
    , {10^6}];
   k];
AbsoluteTiming[cfun[]]

Out[3]= {15.3916, 3500458}

So that is 19.4 seconds versus 15.4 seconds. That isn't enough of an increase.

Perhaps someone else can think of another way to increase this even more.

POSTED BY: Bill Simpson
Posted 9 years ago

I think the compilation wasn't good. Here is what CompilePrint returned:

No argument
       8 Integer registers
       2 Real registers
       1 Tensor register
       Underflow checking off
       Overflow checking off
       Integer overflow checking on
       RuntimeAttributes -> {}

       I0 = 0
       I7 = 1000000
       I1 = 12
       I6 = 7
       I3 = 1
       Result = I4

1   V17 = MainEvaluate[ Function[{}, k = 0][ ]]
2   V17 = MainEvaluate[ Function[{}, ll = Range[12]][ ]]
3   V17 = MainEvaluate[ Function[{}, n = 7][ ]]
4   I5 = I7
5   I4 = I0
6   goto 8
7   R1 = MainEvaluate[ Function[{}, k = k + \
Length[Split[Sort[RandomSample[ll, n]], #1 - #2 == -1 & ]]][ ]]
8   if[ ++ I4 <= I5] goto 7
9   I4 = MainEvaluate[ Function[{}, k][ ]]
10  Return
POSTED BY: Al Guy
Posted 9 years ago

Good catch. Thank you for pointing out my mistake.

I didn't take time to check that because I (incorrectly) assumed all those steps looked simple enough that they would compile without complaint.

Perhaps you can experiment with small changes to this until you isolate what is causing the compile to hook back to the Mathematica evaluator and then see if you can think of a simple way to rewrite that.

If I had to bet then I would expect it is the user defined function being given to Split that isn't compiling. Replacing that with some very simple iterative code might fix the problem.

That then might give the 3-10x increase in speed.

I would also watch for the size of the total being accumulated in k because it looks like it might overflow with larger exponents.

POSTED BY: Bill Simpson
Posted 9 years ago

I think the problem is Split function, it takes a lot of time to execute. I was trying to put something like CountDistnictBy but i don't know how to supply the test function that would count when the neighbors' difference is more than 2.

POSTED BY: Al Guy
Posted 9 years ago

Just to elaborate a little:

t = Table[Sort[RandomSample[ll, 7], Less], {10^5}]

How can I write a function (I assume with Map) so that it will return the number of non-consecutive neighbor discontinuities in each sublist, say 12356710 has two discontinuities? The above written Split is slow. I think there should be other faster way without drastically reorganizing the list.

POSTED BY: Al Guy
Posted 9 years ago

No idea whether this is faster or slower

In[1]:= t = Table[Sort[RandomSample[Range[12], 7], Less], {10}]

Out[1]= {{2, 3, 4, 6, 7, 11, 12}, {3, 4, 6, 8, 9, 11, 12}, {1, 2, 3, 4, 8, 9, 10}, {1, 2, 5, 6, 8, 9, 12},
   {2, 7, 8, 9, 10, 11, 12}, {1, 2, 4, 5, 6, 7, 9}, {2, 3, 4, 6, 7, 8, 12}, {3, 4, 7, 8, 9, 11, 12},
   {1, 2, 3, 5, 6, 9, 11}, {2, 4, 5, 6, 7, 8, 11}}

In[2]:= one = Table[1, {7 - 1}];
Map[1 + Total[Unitize[Rest[#] - Most[#] - one]] &, t]

Out[3]= {3, 4, 2, 4, 2, 3, 3, 3, 4, 3}
POSTED BY: Bill Simpson
Posted 9 years ago

This is great! Faster 20 times! I need one more order of magnitude and that's it.

POSTED BY: Al Guy
Posted 9 years ago
t = Table[Sort[RandomSample[Range[12], 7]], {10}]
{{1, 3, 4, 7, 9, 11, 12}, {2, 4, 6, 7, 8, 10, 11}, {2, 4, 5, 7, 9, 11,
   12}, {3, 5, 6, 8, 9, 11, 12}, {2, 3, 4, 7, 9, 10, 12}, {1, 2, 4, 8,
   9, 10, 12}, {2, 3, 4, 5, 8, 9, 10}, {2, 3, 5, 6, 7, 8, 9}, {1, 3, 
  6, 7, 9, 10, 11}, {1, 2, 3, 6, 7, 10, 11}}
Table[Split[t[[i]], #2 - #1 == 1 &], {i, Length@t}]
{{{1}, {3, 4}, {7}, {9}, {11, 12}}, {{2}, {4}, {6, 7, 8}, {10, 
   11}}, {{2}, {4, 5}, {7}, {9}, {11, 12}}, {{3}, {5, 6}, {8, 9}, {11,
    12}}, {{2, 3, 4}, {7}, {9, 10}, {12}}, {{1, 2}, {4}, {8, 9, 
   10}, {12}}, {{2, 3, 4, 5}, {8, 9, 10}}, {{2, 3}, {5, 6, 7, 8, 
   9}}, {{1}, {3}, {6, 7}, {9, 10, 11}}, {{1, 2, 3}, {6, 7}, {10, 
   11}}}

I am not sure this is what you want..

POSTED BY: Okkes Dulgerci
Posted 9 years ago

Split is a problem here. It's up in a few comments.

POSTED BY: Al Guy
Posted 9 years ago

Perhaps this is what you want to learn next.

POSTED BY: Bill Simpson
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract