Can this simple list matching task be done faster?

Hello,

I have a 2D list of about 13K rows by 10 columns containing data on test results:

Dimensions[list1]
{12946, 10}

Only the first two columns are important for what I am trying to do. The first column has only 50 unique string entries (test case names):

Dimensions[Union[Part[list1, All, 1]]]
{50}

The second column contains real numbers (let's say times). Taking the first two columns together, list1 has only about 10K unique {name, time} pairs, because some test cases have two data points at the same time (say, at different positions).

I have another list of about 31K rows and only 3 columns:

Dimensions[list2]
{31154, 3}

The first column of list2 contains the same 50 distinct test case names as list1. Every {name, time} pair found in the first two columns of list1 also appears in the first two columns of list2, where each pair occurs only once:

Complement[Part[list1, All, {1, 2}], Part[list2, All, {1, 2}]]
{}

The third column of list2 contains data on an additional parameter that was not captured in list1. For each row of list1, I need to look up the row of list2 with the same values in the first two columns and extract its third-column value, collecting the results in a 1D list. Basically, I want to add an eleventh column to my original list1. Note that occasionally the same value needs to be extracted twice (because of the repetitions in list1). I was able to do this by creating a function that matches the values and then mapping that function onto the first two columns of list1, but it takes a long time to execute (and I need to do this several times):

myfind = Function[x, Part[Select[list2, Drop[#, -1] == x &, 1], 1, 3]];
AbsoluteTiming[Map[myfind, Take[list1, All, 2]];]
{139.517903, Null}

I have the impression there must be a faster way to do this, but I cannot think of one.

Thanks in advance,

OL.

POSTED BY: Otto Linsuain
Posted 2 years ago

First I make list1 and list2 with the structure you describe (keeping only the two relevant columns of list1).

strings1=Table[StringJoin@RandomChoice[Characters["abcdefghijklmnopqrstuvwxyz"],4],12946];
times1=RandomReal[{0,500},12946];
list1=Transpose[{strings1,times1}];
strings2=Table[StringJoin@RandomChoice[Characters["abcdefghijklmnopqrstuvwxyz"],4],31000-12946];
times2=RandomReal[{0,500},31000-12946];
list2=RandomSample[Join[Transpose[{strings2,times2}],list1],31000];
list2=Transpose[Join[Transpose[list2],{RandomInteger[{1,1000000},31000]}]];

Then I think the following does what you want in less than 0.15 seconds.

Scan[(fun[Most[#]]=#)&,list2];
newList=fun/@list1;
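
Note that in the code above `fun` returns the whole matching row of list2, and list1 has only two columns. A minimal sketch of how the same idea might be adapted to a list1 with extra columns follows, so that only the third-column value is stored and then appended as a new last column. The toy data and the names `col1`, `col2`, `lookup` and `newList1` are hypothetical, chosen only for illustration. An `Association` is shown as an alternative to down-values. Both variants assume the times in the two lists are identical machine numbers; values that differ by rounding may fail to match.

```mathematica
(* Toy stand-ins for the lists in the question (hypothetical data). *)
(* list1 has extra columns after the first two; list2 has exactly three. *)
list1 = {{"a", 1.5, "x1"}, {"b", 2.0, "x2"}, {"a", 1.5, "x3"}};
list2 = {{"b", 2.0, 20}, {"a", 1.5, 10}, {"c", 3.0, 30}};

(* The lookup keys are the first two columns of list1. *)
keys = list1[[All, {1, 2}]];

(* Variant 1: down-values keyed on {name, time}, storing only column 3. *)
ClearAll[fun];
Scan[(fun[Most[#]] = Last[#]) &, list2];
col1 = fun /@ keys;            (* {10, 20, 10} *)

(* Variant 2: an Association with the same keys and values. *)
lookup = AssociationThread[list2[[All, {1, 2}]] -> list2[[All, 3]]];
col2 = lookup /@ keys;         (* {10, 20, 10} *)

(* Append the looked-up values to list1 as a new last column. *)
newList1 = MapThread[Append, {list1, col2}];
```

Either way, list2 is traversed once to build the lookup table, and each row of list1 then costs a single hashed lookup instead of a `Select` scan over list2. One difference between the variants: for a key that is absent from list2, `fun` stays unevaluated, whereas the `Association` returns `Missing["KeyAbsent", key]`.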
POSTED BY: Ted Ersek