Group Abstract

Message Boards

WOLFRAM COMMUNITY

3.6K Views

8 Replies

0 Total Likes

View groups...

Follow this post

Share this post:

GROUPS:

Wolfram Language Tuning and Debugging

Increasing the speed of a code with Select

Hongyi Zhao

Posted 2 years ago

I've the following list, and I want to find all the combinations of its sub-lists that meet specific conditions, as represented below: This is the original list: In[77]:= pk={{2, 3, 1}, {6, 3, 2}, {18, 3, 3}, {4, 5, 1}, {20, 5, 2}, {6, 7, 1}, {10, 11, 1}, {12, 13, 1}, {16, 17, 1}, {18, 19, 1}}; I want to find all the combinations of sub-lists with the sum of whose first elements equal to the following value: n=22; Create the combinations with all possible lengths: sub=Subsets[pk,{2,Length[pk]}]; This is the result list, which in addition meets the following condition: All the second elements in a specific sub-list should be different. lst=Select[ sub, (#[[All,1]]//Total)==n && (#[[All,2]]//Length)==(Union[ #[[All,2]] ]//Length) & ] Out[80]= {{{2, 3, 1}, {20, 5, 2}}, {{6, 3, 2}, {16, 17, 1}}, {{18, 3, 3}, {4, 5, 1}}, {{4, 5, 1}, {18, 19, 1}}, {{6, 7, 1}, {16, 17, 1}}, {{10, 11, 1}, {12, 13, 1}}, {{2, 3, 1}, {4, 5, 1}, {16, 17, 1}}, {{6, 3, 2}, {4, 5, 1}, {12, 13, 1}}, {{6, 3, 2}, {6, 7, 1}, {10, 11, 1}}, {{4, 5, 1}, {6, 7, 1}, {12, 13, 1}}, {{2, 3, 1}, {4, 5, 1}, {6, 7, 1}, {10, 11, 1}}} Here's my difficulty: when the initial list pk is longer than 20, the above algorithm will be very time-consuming. So, I must seek a more efficient method to solve this problem. Any tips will be appreciated. Regards, Zhao

POSTED BY: Hongyi Zhao

8 Replies

Sort By:

Hongyi Zhao

Posted 2 years ago

The scale of the problem is still beyond imagination and unable to handle: In[6]:= Subsets[pk,{2,20}]//Length During evaluation of In[6]:= Subsets::toomany: The number of subsets (3079913176815553421320) indicated by Subsets[{{2,3,1},{4,5,1},{6,3,2},{6,7,1},{10,11,1},{12,13,1},{16,17,1},{18,3,3},{18,19,1},{20,5,2},<<97>>},{2,20}] is too large; it must be a machine integer. Out[6]= 2

The scale of the problem is still beyond imagination and unable to handle:

In[6]:= Subsets[pk,{2,20}]//Length

During evaluation of In[6]:= Subsets::toomany: The number of subsets (3079913176815553421320) indicated by Subsets[{{2,3,1},{4,5,1},{6,3,2},{6,7,1},{10,11,1},{12,13,1},{16,17,1},{18,3,3},{18,19,1},{20,5,2},<<97>>},{2,20}] is too large; it must be a machine integer.

Out[6]= 2

POSTED BY: Hongyi Zhao

Hongyi Zhao

Posted 2 years ago

Some additional explainations: In my question, the values of n and lst are not arbitrary integers. They have the following relationships and restrictions: They are all positive integers. n>(#[[1]]&/@lst//Max) The list elements are generated by Phi(x), where Phi is [Euler's totient function](https://en.wikipedia.org/wiki/Euler%27s_totient_function), and x is a prime or power of prime with the condition x - 1 <= n. See following confirmation: In[26]:= n>(#[[1]]&/@lst//Max) (* Here, #[[2]] must a prime: *) ForAll[#[[1]]==EulerPhi[#[[2]]^#[[3]]]&/@lst,True] Out[26]= True Out[27]= True I want to know, under the constraints of the above conditions, whether this problem can be NP-Complete or even simpler? Regards, Zhao

POSTED BY: Hongyi Zhao

Hongyi Zhao

Posted 2 years ago

The scale of the problem is still beyond imagination and unable to handle: In[6]:= Subsets[pk,{2,20}]//Length During evaluation of In[6]:= Subsets::toomany: The number of subsets (3079913176815553421320) indicated by Subsets[{{2,3,1},{4,5,1},{6,3,2},{6,7,1},{10,11,1},{12,13,1},{16,17,1},{18,3,3},{18,19,1},{20,5,2},<<97>>},{2,20}] is too large; it must be a machine integer. Out[6]= 2

The scale of the problem is still beyond imagination and unable to handle:

In[6]:= Subsets[pk,{2,20}]//Length

During evaluation of In[6]:= Subsets::toomany: The number of subsets (3079913176815553421320) indicated by Subsets[{{2,3,1},{4,5,1},{6,3,2},{6,7,1},{10,11,1},{12,13,1},{16,17,1},{18,3,3},{18,19,1},{20,5,2},<<97>>},{2,20}] is too large; it must be a machine integer.

Out[6]= 2

POSTED BY: Hongyi Zhao

Bill Nelson

Posted 2 years ago

Thank you for your actual data. For that, at most 20 subsets can be in any solution. Perhaps your original code with sub=Subsets[pk,{2,20}] might be interesting to measure. That should be significantly faster because of trying far far fewer subsets. Please let me know how much that helps if you take the time to test that. I am sorry that my idea did not provide sufficient speed-up for you.

POSTED BY: Bill Nelson

Hongyi Zhao

Posted 2 years ago

My actual data is something as follows: n = 500; lst={ { 2, 3, 1 }, { 6, 3, 2 }, { 18, 3, 3 }, { 54, 3, 4 }, { 162, 3, 5 }, { 486, 3, 6 }, { 4, 5, 1 }, { 20, 5, 2 }, { 100, 5, 3 }, { 6, 7, 1 }, { 42, 7, 2 }, { 294, 7, 3 }, { 10, 11, 1 }, { 110, 11, 2 }, { 12, 13, 1 }, { 156, 13, 2 }, { 16, 17, 1 }, { 272, 17, 2 }, { 18, 19, 1 }, { 342, 19, 2 }, { 22, 23, 1 }, { 28, 29, 1 }, { 30, 31, 1 }, { 36, 37, 1 }, { 40, 41, 1 }, { 42, 43, 1 }, { 46, 47, 1 }, { 52, 53, 1 }, { 58, 59, 1 }, { 60, 61, 1 }, { 66, 67, 1 }, { 70, 71, 1 }, { 72, 73, 1 }, { 78, 79, 1 }, { 82, 83, 1 }, { 88, 89, 1 }, { 96, 97, 1 }, { 100, 101, 1 }, { 102, 103, 1 }, { 106, 107, 1 }, { 108, 109, 1 }, { 112, 113, 1 }, { 126, 127, 1 }, { 130, 131, 1 }, { 136, 137, 1 }, { 138, 139, 1 }, { 148, 149, 1 }, { 150, 151, 1 }, { 156, 157, 1 }, { 162, 163, 1 }, { 166, 167, 1 }, { 172, 173, 1 }, { 178, 179, 1 }, { 180, 181, 1 }, { 190, 191, 1 }, { 192, 193, 1 }, { 196, 197, 1 }, { 198, 199, 1 }, { 210, 211, 1 }, { 222, 223, 1 }, { 226, 227, 1 }, { 228, 229, 1 }, { 232, 233, 1 }, { 238, 239, 1 }, { 240, 241, 1 }, { 250, 251, 1 }, { 256, 257, 1 }, { 262, 263, 1 }, { 268, 269, 1 }, { 270, 271, 1 }, { 276, 277, 1 }, { 280, 281, 1 }, { 282, 283, 1 }, { 292, 293, 1 }, { 306, 307, 1 }, { 310, 311, 1 }, { 312, 313, 1 }, { 316, 317, 1 }, { 330, 331, 1 }, { 336, 337, 1 }, { 346, 347, 1 }, { 348, 349, 1 }, { 352, 353, 1 }, { 358, 359, 1 }, { 366, 367, 1 }, { 372, 373, 1 }, { 378, 379, 1 }, { 382, 383, 1 }, { 388, 389, 1 }, { 396, 397, 1 }, { 400, 401, 1 }, { 408, 409, 1 }, { 418, 419, 1 }, { 420, 421, 1 }, { 430, 431, 1 }, { 432, 433, 1 }, { 438, 439, 1 }, { 442, 443, 1 }, { 448, 449, 1 }, { 456, 457, 1 }, { 460, 461, 1 }, { 462, 463, 1 }, { 466, 467, 1 }, { 478, 479, 1 }, { 486, 487, 1 }, { 490, 491, 1 }, { 498, 499, 1 } }; pk = SortBy[lst,First] count=0; subsets[subset_,total_,list_]:=( Print[count++]; If[total== target ,Sow[Partition[Flatten[subset],3]]]; If[list!={}&&total+list[[1,1]]<= target, subsets[subset,total,Rest[list]]; subsets[{subset,list[[1]]},total+list[[1,1]],Rest[list]] ]); Reap[subsets[{},0, lst]][[2,1]] According to my test, the above code has been running for a long time without ending or seeing any improvement in efficiency. So I terminated it.

My actual data is something as follows:

n = 500;
lst={ { 2, 3, 1 }, { 6, 3, 2 }, { 18, 3, 3 }, { 54, 3, 4 }, { 162, 3,
5 }, { 486, 3, 6 }, { 4, 5, 1 }, { 20, 5, 2 }, { 100, 5, 3 }, { 6, 7,
1 },
  { 42, 7, 2 }, { 294, 7, 3 }, { 10, 11, 1 }, { 110, 11, 2 }, { 12,
13, 1 }, { 156, 13, 2 }, { 16, 17, 1 }, { 272, 17, 2 }, { 18, 19, 1 },
  { 342, 19, 2 }, { 22, 23, 1 }, { 28, 29, 1 }, { 30, 31, 1 }, { 36,
37, 1 }, { 40, 41, 1 }, { 42, 43, 1 }, { 46, 47, 1 }, { 52, 53, 1 },
  { 58, 59, 1 }, { 60, 61, 1 }, { 66, 67, 1 }, { 70, 71, 1 }, { 72,
73, 1 }, { 78, 79, 1 }, { 82, 83, 1 }, { 88, 89, 1 }, { 96, 97, 1 },
  { 100, 101, 1 }, { 102, 103, 1 }, { 106, 107, 1 }, { 108, 109, 1 },
{ 112, 113, 1 }, { 126, 127, 1 }, { 130, 131, 1 }, { 136, 137, 1 },
  { 138, 139, 1 }, { 148, 149, 1 }, { 150, 151, 1 }, { 156, 157, 1 },
{ 162, 163, 1 }, { 166, 167, 1 }, { 172, 173, 1 }, { 178, 179, 1 },
  { 180, 181, 1 }, { 190, 191, 1 }, { 192, 193, 1 }, { 196, 197, 1 },
{ 198, 199, 1 }, { 210, 211, 1 }, { 222, 223, 1 }, { 226, 227, 1 },
  { 228, 229, 1 }, { 232, 233, 1 }, { 238, 239, 1 }, { 240, 241, 1 },
{ 250, 251, 1 }, { 256, 257, 1 }, { 262, 263, 1 }, { 268, 269, 1 },
  { 270, 271, 1 }, { 276, 277, 1 }, { 280, 281, 1 }, { 282, 283, 1 },
{ 292, 293, 1 }, { 306, 307, 1 }, { 310, 311, 1 }, { 312, 313, 1 },
  { 316, 317, 1 }, { 330, 331, 1 }, { 336, 337, 1 }, { 346, 347, 1 },
{ 348, 349, 1 }, { 352, 353, 1 }, { 358, 359, 1 }, { 366, 367, 1 },
  { 372, 373, 1 }, { 378, 379, 1 }, { 382, 383, 1 }, { 388, 389, 1 },
{ 396, 397, 1 }, { 400, 401, 1 }, { 408, 409, 1 }, { 418, 419, 1 },
  { 420, 421, 1 }, { 430, 431, 1 }, { 432, 433, 1 }, { 438, 439, 1 },
{ 442, 443, 1 }, { 448, 449, 1 }, { 456, 457, 1 }, { 460, 461, 1 },
  { 462, 463, 1 }, { 466, 467, 1 }, { 478, 479, 1 }, { 486, 487, 1 },
{ 490, 491, 1 }, { 498, 499, 1 } };
pk = SortBy[lst,First]

count=0;
subsets[subset_,total_,list_]:=(
  Print[count++];
  If[total== target ,Sow[Partition[Flatten[subset],3]]];
  If[list!={}&&total+list[[1,1]]<= target,
    subsets[subset,total,Rest[list]];
    subsets[{subset,list[[1]]},total+list[[1,1]],Rest[list]]
  ]);
Reap[subsets[{},0, lst]][[2,1]]

According to my test, the above code has been running for a long time without ending or seeing any improvement in efficiency. So I terminated it.

POSTED BY: Hongyi Zhao

Bill Nelson

Posted 2 years ago

Since you are using extremely long lists and you only want subsets where the first elements sum to n then it may be possible that you do not need to look at almost all possible subsets. I sorted your list of lists on the first element. I created a recursive function to generate subsets. That will be much slower than the internal Mathematica Subsets function. But before each recursive step I looked at the total thus far and did not recurse if the total would be greater than n. That meant I could skip all remaining sets in the sorted list. For your sample data set that meant I only needed to look at 106 of the possible 1024 cases. This method was approximately 4.6 times faster than your original code for that sample data. Someone else might be able to see how to make method this even faster. For much larger lists this might avoid an even greater amount of work. I do not know how easily this might be done with the code using IntegerDigits. This example only generates the subset lists and does not impose your second condition. Please test this very carefully before you depend on it. AbsoluteTiming[pk=SortBy[{{2,3,1},{6,3,2},{18,3,3},{4,5,1},{20,5,2},{6,7,1},{10,11,1},{12,13,1},{16,17,1},{18,19,1}},First]; n=22; count=0; subsets[subset_,total_,list_]:=( Print[count++]; If[total==n,Sow[subset]]; If[list!={}&&total+list[[1,1]]<=n, subsets[subset,total,Rest[list]]; subsets[Join[subset,{list[[1]]}],total+list[[1,1]],Rest[list]] ]); Reap[subsets[{},0,pk]][[2,1]]]

POSTED BY: Bill Nelson

Hongyi Zhao

Posted 2 years ago

I didn't analyze your algorithm, but I don't think it finds all of the possibilities. For example, your output is missing {{2, 3, 1}, {4, 5, 1}, {6, 3, 2}, {10, 11, 1}}. As I've said, the following condition must also be met: All the second elements in a specific sub-list should be different. In the above result shown by you, 3 appears twice in the 2nd positions of the sub-lists, so this shouldn't be a desired solution in my case, and I removed it from the results: lst=Select[ sub, (#[[All,1]]//Total)==n && (#[[All,2]]//Length)==(Union[ #[[All,2]] ]//Length) & ] I came up with an algorithm that is faster and that finds more results. It uses IntegerPartitions instead of Subsets. However it typically takes over a minute when length of pk is 30. Does your approach also rely on dynamic programming (DP) technology? I wonder if you can show me the algorithm? Here's another algorithm I've noticed for this problem: (* 2.2 The practical code We here present a light code for the algorithm, which is written in language “Mathematica” because of its easily available and amicable windows. *) BeginPackage["DiscreteMath‘SubsetSum‘"] SubsetSum::usage = "Find all solutions for subset sum problem." Begin["‘Private‘"] SubsetSum[t_, W_List]:= Module[{T,K,A,len,F}, T={t}; For[i=1,i<=Length[W],i++,B=T-W[[i]];T=Join[T,B]]; K=Flatten[Position[T,0]]; If[Length[K]>0, For[i=1,i<=Length[K],i++, A=IntegerDigits[2(K[[i]]-1),2]; len=Length[A];F={}; For[j=1,j<=len,j++,If[A[[j]]==1,F=Append[F,W[[len-j]]]]]; Print[F] ], Print["Fail"] ] ] End[] EndPackage[] How long will your pk be? Do you need to find absolutely all solutions? I'm solving a problem related to number theory and group theory, and of course I want the algorithm to deal with extremely long lists as efficiently as possible, so that I can make a larger comparison database for others to study and use.

POSTED BY: Hongyi Zhao

Updating Name

Posted 2 years ago

I didn't analyze your algorithm, but I don't think it finds all of the possibilities. For example, your output is missing `{{2, 3, 1}, {4, 5, 1}, {6, 3, 2}, {10, 11, 1}}`. I came up with an algorithm that is faster and that finds more results. It uses `IntegerPartitions` instead of `Subsets`. However it typically takes over a minute when length of `pk` is 30. How long will your `pk` be? Do you need to find absolutely all solutions?

POSTED BY: Updating Name

Reply to this discussion

Reply Preview

Attachments

Remove Add a file to this post

Follow this discussion

or Discard

Feedback