Message Boards Message Boards

0
|
2788 Views
|
8 Replies
|
0 Total Likes
View groups...
Share
Share this post:

Increasing the speed of a code with Select

Posted 1 year ago

I've the following list, and I want to find all the combinations of its sub-lists that meet specific conditions, as represented below:

This is the original list:

In[77]:= pk={{2, 3, 1}, {6, 3, 2}, {18, 3, 3}, {4, 5, 1}, {20, 5, 2}, {6, 7, 
  1}, {10, 11, 1}, {12, 13, 1}, {16, 17, 1}, {18, 19, 1}};

I want to find all the combinations of sub-lists with the sum of whose first elements equal to the following value:

n=22;

Create the combinations with all possible lengths:

sub=Subsets[pk,{2,Length[pk]}];

This is the result list, which in addition meets the following condition: All the second elements in a specific sub-list should be different.

lst=Select[ sub, (#[[All,1]]//Total)==n && (#[[All,2]]//Length)==(Union[ #[[All,2]] ]//Length) &  ]

Out[80]= {{{2, 3, 1}, {20, 5, 2}}, {{6, 3, 2}, {16, 17, 1}}, {{18, 3, 
   3}, {4, 5, 1}}, {{4, 5, 1}, {18, 19, 1}}, {{6, 7, 1}, {16, 17, 
   1}}, {{10, 11, 1}, {12, 13, 1}}, {{2, 3, 1}, {4, 5, 1}, {16, 17, 
   1}}, {{6, 3, 2}, {4, 5, 1}, {12, 13, 1}}, {{6, 3, 2}, {6, 7, 
   1}, {10, 11, 1}}, {{4, 5, 1}, {6, 7, 1}, {12, 13, 1}}, {{2, 3, 
   1}, {4, 5, 1}, {6, 7, 1}, {10, 11, 1}}}

Here's my difficulty: when the initial list pk is longer than 20, the above algorithm will be very time-consuming. So, I must seek a more efficient method to solve this problem. Any tips will be appreciated.

Regards,
Zhao

POSTED BY: Hongyi Zhao
8 Replies
Posted 1 year ago

The scale of the problem is still beyond imagination and unable to handle:

In[6]:= Subsets[pk,{2,20}]//Length

During evaluation of In[6]:= Subsets::toomany: The number of subsets (3079913176815553421320) indicated by Subsets[{{2,3,1},{4,5,1},{6,3,2},{6,7,1},{10,11,1},{12,13,1},{16,17,1},{18,3,3},{18,19,1},{20,5,2},<<97>>},{2,20}] is too large; it must be a machine integer.

Out[6]= 2
POSTED BY: Hongyi Zhao
Posted 1 year ago

Some additional explainations:

In my question, the values of n and lst are not arbitrary integers. They have the following relationships and restrictions:

  1. They are all positive integers.
  2. n>(#[[1]]&/@lst//Max)
  3. The list elements are generated by Phi(x), where Phi is [Euler's totient function](https://en.wikipedia.org/wiki/Euler%27s_totient_function), and x is a prime or power of prime with the condition x - 1 <= n.

See following confirmation:

In[26]:= n>(#[[1]]&/@lst//Max)
(*
Here, #[[2]] must a prime:
*)
ForAll[#[[1]]==EulerPhi[#[[2]]^#[[3]]]&/@lst,True]

Out[26]= True

Out[27]= True

I want to know, under the constraints of the above conditions, whether this problem can be NP-Complete or even simpler?

Regards, Zhao

POSTED BY: Hongyi Zhao
Posted 1 year ago

The scale of the problem is still beyond imagination and unable to handle:

In[6]:= Subsets[pk,{2,20}]//Length

During evaluation of In[6]:= Subsets::toomany: The number of subsets (3079913176815553421320) indicated by Subsets[{{2,3,1},{4,5,1},{6,3,2},{6,7,1},{10,11,1},{12,13,1},{16,17,1},{18,3,3},{18,19,1},{20,5,2},<<97>>},{2,20}] is too large; it must be a machine integer.

Out[6]= 2
POSTED BY: Hongyi Zhao
Posted 1 year ago

Thank you for your actual data. For that, at most 20 subsets can be in any solution. Perhaps your original code with

sub=Subsets[pk,{2,20}]

might be interesting to measure. That should be significantly faster because of trying far far fewer subsets. Please let me know how much that helps if you take the time to test that.

I am sorry that my idea did not provide sufficient speed-up for you.

POSTED BY: Bill Nelson
Posted 1 year ago

My actual data is something as follows:

n = 500;
lst={ { 2, 3, 1 }, { 6, 3, 2 }, { 18, 3, 3 }, { 54, 3, 4 }, { 162, 3,
5 }, { 486, 3, 6 }, { 4, 5, 1 }, { 20, 5, 2 }, { 100, 5, 3 }, { 6, 7,
1 },
  { 42, 7, 2 }, { 294, 7, 3 }, { 10, 11, 1 }, { 110, 11, 2 }, { 12,
13, 1 }, { 156, 13, 2 }, { 16, 17, 1 }, { 272, 17, 2 }, { 18, 19, 1 },
  { 342, 19, 2 }, { 22, 23, 1 }, { 28, 29, 1 }, { 30, 31, 1 }, { 36,
37, 1 }, { 40, 41, 1 }, { 42, 43, 1 }, { 46, 47, 1 }, { 52, 53, 1 },
  { 58, 59, 1 }, { 60, 61, 1 }, { 66, 67, 1 }, { 70, 71, 1 }, { 72,
73, 1 }, { 78, 79, 1 }, { 82, 83, 1 }, { 88, 89, 1 }, { 96, 97, 1 },
  { 100, 101, 1 }, { 102, 103, 1 }, { 106, 107, 1 }, { 108, 109, 1 },
{ 112, 113, 1 }, { 126, 127, 1 }, { 130, 131, 1 }, { 136, 137, 1 },
  { 138, 139, 1 }, { 148, 149, 1 }, { 150, 151, 1 }, { 156, 157, 1 },
{ 162, 163, 1 }, { 166, 167, 1 }, { 172, 173, 1 }, { 178, 179, 1 },
  { 180, 181, 1 }, { 190, 191, 1 }, { 192, 193, 1 }, { 196, 197, 1 },
{ 198, 199, 1 }, { 210, 211, 1 }, { 222, 223, 1 }, { 226, 227, 1 },
  { 228, 229, 1 }, { 232, 233, 1 }, { 238, 239, 1 }, { 240, 241, 1 },
{ 250, 251, 1 }, { 256, 257, 1 }, { 262, 263, 1 }, { 268, 269, 1 },
  { 270, 271, 1 }, { 276, 277, 1 }, { 280, 281, 1 }, { 282, 283, 1 },
{ 292, 293, 1 }, { 306, 307, 1 }, { 310, 311, 1 }, { 312, 313, 1 },
  { 316, 317, 1 }, { 330, 331, 1 }, { 336, 337, 1 }, { 346, 347, 1 },
{ 348, 349, 1 }, { 352, 353, 1 }, { 358, 359, 1 }, { 366, 367, 1 },
  { 372, 373, 1 }, { 378, 379, 1 }, { 382, 383, 1 }, { 388, 389, 1 },
{ 396, 397, 1 }, { 400, 401, 1 }, { 408, 409, 1 }, { 418, 419, 1 },
  { 420, 421, 1 }, { 430, 431, 1 }, { 432, 433, 1 }, { 438, 439, 1 },
{ 442, 443, 1 }, { 448, 449, 1 }, { 456, 457, 1 }, { 460, 461, 1 },
  { 462, 463, 1 }, { 466, 467, 1 }, { 478, 479, 1 }, { 486, 487, 1 },
{ 490, 491, 1 }, { 498, 499, 1 } };
pk = SortBy[lst,First]

count=0;
subsets[subset_,total_,list_]:=(
  Print[count++];
  If[total== target ,Sow[Partition[Flatten[subset],3]]];
  If[list!={}&&total+list[[1,1]]<= target,
    subsets[subset,total,Rest[list]];
    subsets[{subset,list[[1]]},total+list[[1,1]],Rest[list]]
  ]);
Reap[subsets[{},0, lst]][[2,1]]

According to my test, the above code has been running for a long time without ending or seeing any improvement in efficiency. So I terminated it.

POSTED BY: Hongyi Zhao
Posted 1 year ago

Since you are using extremely long lists and you only want subsets where the first elements sum to n then it may be possible that you do not need to look at almost all possible subsets.

I sorted your list of lists on the first element. I created a recursive function to generate subsets. That will be much slower than the internal Mathematica Subsets function. But before each recursive step I looked at the total thus far and did not recurse if the total would be greater than n. That meant I could skip all remaining sets in the sorted list. For your sample data set that meant I only needed to look at 106 of the possible 1024 cases. This method was approximately 4.6 times faster than your original code for that sample data. Someone else might be able to see how to make method this even faster.

For much larger lists this might avoid an even greater amount of work. I do not know how easily this might be done with the code using IntegerDigits.

This example only generates the subset lists and does not impose your second condition.

Please test this very carefully before you depend on it.

AbsoluteTiming[pk=SortBy[{{2,3,1},{6,3,2},{18,3,3},{4,5,1},{20,5,2},{6,7,1},{10,11,1},{12,13,1},{16,17,1},{18,19,1}},First];
n=22;
count=0;
subsets[subset_,total_,list_]:=(
  Print[count++];
  If[total==n,Sow[subset]];
  If[list!={}&&total+list[[1,1]]<=n,
    subsets[subset,total,Rest[list]];
    subsets[Join[subset,{list[[1]]}],total+list[[1,1]],Rest[list]]
  ]);
Reap[subsets[{},0,pk]][[2,1]]]
POSTED BY: Bill Nelson
Posted 1 year ago

I didn't analyze your algorithm, but I don't think it finds all of the possibilities. For example, your output is missing {{2, 3, 1}, {4, 5, 1}, {6, 3, 2}, {10, 11, 1}}.

As I've said, the following condition must also be met:

All the second elements in a specific sub-list should be different.

In the above result shown by you, 3 appears twice in the 2nd positions of the sub-lists, so this shouldn't be a desired solution in my case, and I removed it from the results:

lst=Select[ sub, (#[[All,1]]//Total)==n && (#[[All,2]]//Length)==(Union[ #[[All,2]] ]//Length) &  ]

I came up with an algorithm that is faster and that finds more results. It uses IntegerPartitions instead of Subsets. However it typically takes over a minute when length of pk is 30.

Does your approach also rely on dynamic programming (DP) technology? I wonder if you can show me the algorithm?

Here's another algorithm I've noticed for this problem:

(* 2.2 The practical code
We here present a light code for the algorithm, which is written in language “Mathematica”
because of its easily available and amicable windows. *)

BeginPackage["DiscreteMath‘SubsetSum‘"]
SubsetSum::usage = "Find all solutions for subset sum problem."
Begin["‘Private‘"]
SubsetSum[t_, W_List]:=
Module[{T,K,A,len,F},
T={t};
For[i=1,i<=Length[W],i++,B=T-W[[i]];T=Join[T,B]];
K=Flatten[Position[T,0]];
If[Length[K]>0,
For[i=1,i<=Length[K],i++,
A=IntegerDigits[2(K[[i]]-1),2];
len=Length[A];F={};
For[j=1,j<=len,j++,If[A[[j]]==1,F=Append[F,W[[len-j]]]]];
Print[F]
],
Print["Fail"]
]
]
End[]
EndPackage[]

How long will your pk be? Do you need to find absolutely all solutions?

I'm solving a problem related to number theory and group theory, and of course I want the algorithm to deal with extremely long lists as efficiently as possible, so that I can make a larger comparison database for others to study and use.

POSTED BY: Hongyi Zhao
Posted 1 year ago

I didn't analyze your algorithm, but I don't think it finds all of the possibilities. For example, your output is missing {{2, 3, 1}, {4, 5, 1}, {6, 3, 2}, {10, 11, 1}}.

I came up with an algorithm that is faster and that finds more results. It uses IntegerPartitions instead of Subsets. However it typically takes over a minute when length of pk is 30. How long will your pk be? Do you need to find absolutely all solutions?

POSTED BY: Updating Name
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract