Message Boards Message Boards

5
|
3590 Views
|
1 Reply
|
10 Total Likes
View groups...
Share
Share this post:

Efficient sort by template

You are given 2 vectors:

x={a,b,c,d,e,f,r,e,t};
y={f,a,c,f,b,a,b,f,g};

and need to sort vector y in such a way that the HammingDistance[x,y] is minimized. This literal implementation:

HumMin[x_, y_] := First[MinimalBy[Permutations[y], HammingDistance[#, x] &, 1]]

x
HumMin[x, y]    
(*{a,b,c,d,e,f,r,e,t}*)
(*{a,b,c,f,f,f,b,a,g}*)

is terribly inefficient due to generally large space of Permutations[y]. The order of non-matching elements between x and y is not important, so there could be several equivalent solutions and any would do. So in principle an algorithm should stop as soon as first solution is found. THis is when all elements of y identical to some elements of x aligned at the same position if possible. For example it is possible for one a in y and impossible for the other. What would be a more efficient implementation?

POSTED BY: Vitaliy Kaurov

Here is a method that uses several sweeps over the lists. asically liner in complexity (there is a hidden log(n) factor but it's easy to get rid of it). I'll illustrate with your example, then wrap the thing in a Module and show some larger ones

In[724]:= x = {a, b, c, d, e, f, r, e, t};
y = {f, a, c, f, b, a, b, f, g};

Record positions of x elements.

In[755]:= xValsAndPositions = 
 Reap[MapIndexed[Sow[{#1, #2[[1]]}, #1] &, x]][[2]]

(* Out[755]= {{{a, 1}}, {{b, 2}}, {{c, 3}}, {{d, 4}}, {{e, 5}, {e, 
   8}}, {{f, 6}}, {{r, 7}}, {{t, 9}}} *)

In[756]:= valAndIndexList = 
 Map[{#[[1, 1]], #[[All, 2]]} &, xValsAndPositions]

(* Out[756]= {{a, {1}}, {b, {2}}, {c, {3}}, {d, {4}}, {e, {5, 
   8}}, {f, {6}}, {r, {7}}, {t, {9}}} *)

Create a lookup table from this. Also throw in indexing to first ocurrence positions for later use.

In[757]:= Scan[(indexList[#[[1]]] = {1, #[[2]]}) &, valAndIndexList]

Have look at the internals of this table.

In[758]:= DownValues[indexList]

(* Out[758]= {HoldPattern[indexList[a]] :> {1, {1}}, 
 HoldPattern[indexList[b]] :> {1, {2}}, 
 HoldPattern[indexList[c]] :> {1, {3}}, 
 HoldPattern[indexList[d]] :> {1, {4}}, 
 HoldPattern[indexList[e]] :> {1, {5, 8}}, 
 HoldPattern[indexList[f]] :> {1, {6}}, 
 HoldPattern[indexList[r]] :> {1, {7}}, 
 HoldPattern[indexList[t]] :> {1, {9}}} *)

Now figure out landing positions for elements in the y list.

In[759]:= yPositions = Table[
  If[ListQ[index = indexList[y[[i]]]] && 
    index[[1]] <= Length[index[[2]]],
   indexList[y[[i]]] = {index[[1]] + 1, index[[2]]}; 
   index[[2, index[[1]]]],
   0],
  {i, Length[y]}]

(* Out[759]= {6, 1, 3, 0, 2, 0, 0, 0, 0} *)

Extract the "missing" positions.

In[760]:= missingPositions = Complement[Range[Length[y]], yPositions]

(* Out[760]= {4, 5, 7, 8, 9} *)

Merge the two lists.

In[763]:= k = 0;
finalYPositions = 
 Table[If[yPositions[[j]] === 0, k++; missingPositions[[k]], 
   yPositions[[j]]], {j, Length[yPositions]}]

(* Out[764]= {6, 1, 3, 4, 2, 5, 7, 8, 9} *)

Use this to get our result.

In[767]:= y[[Ordering[finalYPositions]]]

(*Out[767]= {a, b, c, f, a, f, b, f, g} *)

Here is the full function.

reorderByHamming[x_List, y_List] := Module[
  {xValsAndPositions, valAndIndexList, indexList, yPositions, index, 
   missingPositions, k, finalPositions},
  xValsAndPositions = 
   Reap[MapIndexed[Sow[{#1, #2[[1]]}, #1] &, x]][[2]];
  valAndIndexList = Map[{#[[1, 1]], #[[All, 2]]} &, xValsAndPositions];
  Scan[(indexList[#[[1]]] = {1, #[[2]]}) &, valAndIndexList];
  yPositions = Table[
    If[ListQ[index = indexList[y[[i]]]] && 
      index[[1]] <= Length[index[[2]]],
     indexList[y[[i]]] = {index[[1]] + 1, index[[2]]}; 
     index[[2, index[[1]]]],
     0],
    {i, Length[y]}];
  missingPositions = Complement[Range[Length[y]], yPositions];
  k = 0;
  finalPositions = 
   Table[If[yPositions[[j]] === 0, k++; missingPositions[[k]], 
     yPositions[[j]]], {j, Length[yPositions]}];
  Clear[indexList];
  y[[Ordering[finalPositions]]]
  ]

We'll illustrate on a pair of lists that we expect to have a "good" reordering (low Hamming distance).

SeedRandom[111111]
x = RandomInteger[10, 30]
y = RandomInteger[10, 30]

(* Out[782]= {10, 0, 8, 9, 9, 6, 10, 2, 4, 8, 2, 8, 9, 7, 9, 3, 1, 3, 6, \
3, 10, 10, 7, 9, 6, 3, 0, 6, 2, 8}

Out[783]= {5, 5, 6, 1, 6, 2, 0, 8, 0, 5, 1, 9, 3, 1, 6, 3, 5, 3, 9, \
8, 2, 10, 8, 4, 7, 4, 2, 9, 9, 8} *)

Do the reordering.

yNew = reorderByHamming[x, y]

(* Out[785]= {10, 0, 8, 9, 9, 6, 5, 2, 4, 8, 2, 8, 9, 7, 9, 3, 1, 3, 6, \
3, 5, 5, 1, 1, 6, 5, 0, 4, 2, 8} *)

Check the result for quality.

HammingDistance[x, yNew]

(* Out[787]= 7 *)

Here is a bigger example.

SeedRandom[111111]
n = 10^5;
x = RandomInteger[10, n];
y = RandomInteger[10, n];
Timing[yNew = reorderByHamming[x, y];]
HammingDistance[x, y]
HammingDistance[x, yNew]

(* Out[806]= {3.26563, Null}

Out[807]= 90988

Out[808]= 760 *)

Could be faster but at least it's not outrageously slow. As for the result, we start with the expected roughly 10% agreement between the lists, and end with agreement to better than 99%.

POSTED BY: Daniel Lichtblau
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract