This should help. First some helper code to construct an example.
randomString[n_] := StringJoin @@ Map[FromCharacterCode, RandomChoice[Range[65, 90], n]]
randomEntry[n_, m_] := Prepend[RandomInteger[{0, 9}, m], randomString[n]]
randomEntrySet[n_, m_, l_] := Table[randomEntry[n, m], {l}]
Quick example of what this does:
In[1229]:= randomEntrySet[3, 4, 10]
Out[1229]= {{"LWN", 9, 1, 8, 7}, {"JDD", 2, 2, 0, 7}, {"BKE", 3, 2, 6, 7},
{"ZGS", 5, 9, 3, 2}, {"JPW", 1, 1, 3, 9}, {"FBK", 5, 8, 4, 8},
{"IPR", 7, 9, 1, 9}, {"FQD", 9, 3, 0, 2}, {"SDD", 9, 4, 3, 2}, {"UVG", 1, 3, 3, 5}}
Create a random example for a dataset of length one million and a set of one thousand keys selected at random from that set.
SeedRandom[1234];
dsize = 10^6;
randdata = randomEntrySet[10, 10, dsize];
ksize = 10^3;
randSKUs = randdata[[RandomChoice[Range@dsize, ksize], 1]];
Your lookup iterates over all the SKUs we seek. This has the possible advantage of finding them in the order they are listed (if that's important to your work).
Timing[skuposns = Flatten@Table[
Position[randdata[[All, 1]], randSKUs[[i]]], {i, 1, Length@randSKUs}];]
(* Out[1235]= {44.2355, Null} *)
If we use Alternatives
it is much faster to find these positions.
Timing[skuposns2 = Flatten[Position[randdata[[All, 1]], Apply[Alternatives, randSKUs]]];]
(* Out[1236]= {0.095339, Null} *)
These are not in the same ordering. The first is ordered by where the SKU appears in the search list while the second is ordered by where a sought-for SKU appears in the data set. It is easy to show this so I'll skip that and just show that the two are equivalent as unordered sets.
skuposns === skuposns2, Union[skuposns] === Union[skuposns2]}
(* Out[1263]= {False, True} *)
Why did I use Union
instead of, say, Sort
? In this particular example it happens that a key we sought for is repeated in the set, so the result lengths differ by one since there's a duplicate in the first set. Which might be a disadvantage, depending on actual usage requirements.
If you do not care about the ordering of the positions then you are done. If you require the ordering of the first result, that can be obtained with a bit more work. And it's much faster since we've reduced the size of the set we'd need to work with. If this is a requirement feel free to note that and people can provide guidance.