Message Boards Message Boards

GROUPS:

Program the distance Jaro?

Posted 3 months ago
758 Views
|
9 Replies
|
3 Total Likes
|

Hello everyone, I'm trying to program Jaro distance as requested by this page, I've done the following code that works well for the next two pairs ("MARTHA", "MARHTA") and ("DIXON", "DICKSONX") but when I try with ("JELLYFISH", "SMELLYFISH") I get an error because the code counts the double S of "SMELLYFISH", due to this error I have not been able to finish successfully, here is what I have programmed up to this moment:

uno = "DIXON"; dos ="DICKSONX" ;
rep = Characters[uno] \[Intersection] Characters[dos]
scope = Max[StringLength[uno], StringLength[dos]]/2 - 1
inter = Transpose[{Flatten[Position[Characters[uno], #] & /@ rep], 
   Flatten[Position[Characters[dos], #] & /@ rep]}]

m = Select[inter, Abs[#[[1]] - #[[2]]] < scope &]

prb = Select[m, #[[1]] != #[[2]] &]

trans = Length[DeleteCases[Position[prb, Reverse[#]] & /@ prb, {}]]/2

1/3 (Length[m]/StringLength[uno] + Length[m]/StringLength[dos] + (
   Length[m] - trans)/Length[m])

Someone who can help me solve this problem? Maybe the approach I'm using is wrong, unfortunately I have not been able to find a way to solve the problem with my code, I hope that someone can please guide me to achieve my goal, any help is welcome, thank you in advance for your help

9 Replies

One could do something like that

Clear[check, jaro]
check[d_Integer, l1_List, l2_List] := 
  If[l1[[1]] == l2[[1]] && Abs[l1[[2]] - l2[[2]]] <= d,
   {l1, l2}, (* else *) 
   Missing[]
   ];
jaro[s1_String, s2_String] := 
 Block[{r, m = {}, l1 = StringLength[s1], l2 = StringLength[s2], w1, 
    w2, l3, t = 0, l4, l5},
   r = Floor[Max[l1, l2]/2] - 1;
   w1 = Transpose[{ToCharacterCode[s1], Range[l1]}];
   w2 = Transpose[{ToCharacterCode[s2], Range[l2]}];
   l3 = DeleteMissing[
     Flatten[Outer[check[r, #1, #2] &, w1, w2, 1], 1]];
   m = Length[
     GatherBy[SortBy[l3, First[First[#]] &], First[First[#]] &]];
   {l4, l5} = SortBy[#, Last] & /@ (Union /@ Transpose[l3]);
   l4 = First /@ l4;
   l5 = First /@ l5;
   t = Length[
      Select[Subtract @@@ 
        Transpose[{Take[l4, Min[Length[l4], Length[l5]]], 
          Take[l5, Min[Length[l4], Length[l5]]]}], # != 0 &]]/2;
   Print["r = ", r, "| m = ", m, "| t = ", t]; 
   If[m == 0,
    0, (* else *)
    (m/l1 + m/l2 + (m - t)/m)/3.
    ]
   ] /; StringLength[s1] > 0 && StringLength[s2] > 0

but it has an error if s1 and s2 are the same character. Then

In[204]:= jaro["Mis", "Mis"]
During evaluation of In[204]:= r = 0| m = 3| t = 0
Out[204]= 1.

In[205]:= jaro["Miss", "Miss"]
During evaluation of In[205]:= r = 1| m = 3| t = 0
Out[205]= 0.833333

the $s$ in $ss$ is in reach of r to the other s, but seems to count twice, because m must be 4 to bring jaro to 1 in that case; correspondingly the test cases have a 50% fail, of course:

In[202]:= jaro @@@ {{"DWAYNE", "DUANE"},
  {"MARTHA", "MARHTA"},
  {"DIXON", "DICKSONX"},
  {"JELLYFISH", "SMELLYFISH"}}
During evaluation of In[202]:= r = 2| m = 4| t = 0
During evaluation of In[202]:= r = 2| m = 5| t = 1
During evaluation of In[202]:= r = 3| m = 4| t = 0
During evaluation of In[202]:= r = 4| m = 7| t = 0

(Debug) Out[202]= {0.822222, 0.822222, 0.766667, 0.825926}

so, please Luis, get it fixed, to reach at

0.8222222, 0.9444444, 0.7666667, 0.8962963.

Okay, fix it into

Clear[check, jaro]
check[d_Integer, l1_List, l2_List] := 
  If[l1[[1]] == l2[[1]] && Abs[l1[[2]] - l2[[2]]] <= d,
   {l1, l2}, (* else *) 
   Missing[]
   ];
jaro[s1_String, s2_String] := 
 Block[{r, m = {}, l1 = StringLength[s1], l2 = StringLength[s2], w1, 
    w2, l3, t = 0, l4, l5},
   r = Floor[Max[l1, l2]/2] - 1;
   If[r < 0,
    m = 0, (* else *)
    w1 = Transpose[{ToCharacterCode[s1], Range[l1]}];
    w2 = Transpose[{ToCharacterCode[s2], Range[l2]}];
    l3 = DeleteMissing[
      Flatten[Outer[check[r, #1, #2] &, w1, w2, 1], 1]];
    m = Length[GatherBy[SortBy[l3, First[First[#]] &], First]];
    {l4, l5} = SortBy[#, Last] & /@ (Union /@ Transpose[l3]);
    t = EditDistance[First /@ l4, First /@ l5]/2
    ];
   Print["r = ", r, "| m = ", m, "| t = ", t];
   If[m == 0,
    0, (* else *)
    (m/l1 + m/l2 + (m - t)/m)/3.
    ]
   ] /; StringLength[s1] > 0 && StringLength[s2] > 0

this implementation is still toy because Outer[] goes too far and then ,,,,, using the Levenshtein distance in computing the Jaro distance seems a bit lunatic ... but at least the test cases go through:

In[71]:= jaro["s", "s"]
During evaluation of In[71]:= r = -1| m = 0| t = 0
Out[71]= 0

In[72]:= jaro["Miss", "Miss"]
During evaluation of In[72]:= r = 1| m = 4| t = 0
Out[72]= 1.

In[73]:= jaro @@@ {{"DWAYNE", "DUANE"},
  {"MARTHA", "MARHTA"},
  {"DIXON", "DICKSONX"},
  {"JELLYFISH", "SMELLYFISH"}}
During evaluation of In[73]:= r = 2| m = 4| t = 0
During evaluation of In[73]:= r = 2| m = 6| t = 1
During evaluation of In[73]:= r = 3| m = 4| t = 0
During evaluation of In[73]:= r = 4| m = 8| t = 0
Out[73]= {0.822222, 0.944444, 0.766667, 0.896296}

In[76]:= jaro["Miss Argentina", "Miss Brasilia"]
During evaluation of In[76]:= r = 6| m = 8| t = 3/2
Out[76]= 0.666438

In[77]:= jaro["0100010100101001001001001010010", \
"10000100100111101010101010101010"]
During evaluation of In[77]:= r = 15| m = 31| t = 9/2
Out[77]= 0.941196
Posted 3 months ago

$Udo $ $Krause$ Thank you very much for your help and support to face this problem, I sincerely believed that this problem was easier to solve, but I see that no, it has its details. Thanks to the code you shared, I learned another different approach to mine, which is more sophisticated and elegant. For my part I have made the following code based on the response they provide using python in the page I shared, I probe it with the cases that they request there and with some that you showed. I hope the community will discuss the improvements we can make to the code and thus progress in my learning, thanks again for all your support.

jaro[uno_String, dos_String] := 
 Module[{scope, slen, tlen, smatches, tmatches, matches, 
   transpositions, i, start, end, k}, 
  scope = Floor[Max[StringLength[uno], StringLength[dos]]/2] - 1; 
  slen = StringLength[uno]; tlen = StringLength[dos]; 
  smatches = Table[False, {slen}]; tmatches = Table[False, {tlen}]; 
  matches = 0; transpositions = 0; 
  For[i = 1, i <= slen, i++, start = Max[1, i - scope]; 
   end = Min[i + scope, tlen]; 
   For[j = start, j <= end, j++, 
    If[tmatches[[j]], Continue[], 
     If[StringTake[uno, {i}] != StringTake[dos, {j}], Continue[], 
      smatches[[i]] = True; tmatches[[j]] = True]]; matches++; 
    Break[]]]; k = 1; i =.; 
  For[i = 1, i <= slen, i++, If[Not[smatches[[i]]], Continue[]]; 
   While[Not[tmatches[[k]]], k++]; 
   If[StringTake[uno, {i}] != StringTake[dos, {k}], transpositions++];
    k++]; transpositions /= 2 ; 
  1/3 (matches/slen + matches/tlen + (matches - transpositions)/
      matches) // N]

In[10]:= jaro @@@ {{"DWAYNE", "DUANE"}, {"MARTHA", 
   "MARHTA"}, {"DIXON", "DICKSONX"}, {"JELLYFISH", "SMELLYFISH"}}

Out[10]= {0.822222, 0.944444, 0.766667, 0.896296}

In[11]:= jaro @@@ {{"Miss Argentina", 
   "Miss Brasilia"}, {"MissArgentina", 
   "MissBrasilia"}, {"miss Argentina", 
   "miss Brasilia"}, {"MISSARGENTINA", "MISSBRASILIA"}, {"miss", 
   "miss"}, {"MiSs", "MisS"}}

Out[11]= {0.687271, 0.707265, 0.687271, 0.719017, 1., 0.916667}

For my part I have made the following Code

which reminds me to the quotation

enter image description here

you witness $FORTRAN66$ in Mathematica. If it is necessary to hammer out the fact that a double loop must be done -- Outer[] states exactly that -- it can be done in Mathematica without using the built-in Symbol For[] - I guess. Let's see whether the Delayed constructs allow for it. Outer[] is an overkill and elegance is of no good if it compromises performance (and/or correctness)

The Outer[] has been banned down the call tree

Clear[check, jaro]
check[l1_List, l2_List] := 
 Flatten[Outer[List, l1, Select[l2, #[[1]] == l1[[1, 1]] &], 1], 1]
jaro[s1_String, s2_String] := 
 Block[{r, l1 = StringLength[s1], l2 = StringLength[s2], w1, w2, l3, m, l4, l5, t},
   r = Floor[Max[l1, l2]/2] - 1;
   If[r >= 0,(* then *)
    w1 = Transpose[{ToCharacterCode[s1], Range[l1]}];
    w2 = Transpose[{ToCharacterCode[s2], Range[l2]}];
    l3 = Flatten[check[{w1[[#]]}, w2[[Min[l2, Max[1, # - r]] ;; Min[l2, # + r]]]] & /@ Range[l1], 1];
    If[Length[l3] > 0,
     {l4, l5} = MapAt[First, SortBy[#, Last] & /@ (Union /@ Transpose[l3]), {{1, All}, {2, All}}];
     m = MinMax[{Length[l4], Length[l5]}];
     t = Count[PadRight[l4, m[[2]], -1] - PadRight[l5, m[[2]], -1], u_ /; u != 0]/2;
     (m[[1]]/l1 + m[[1]]/l2 + (m[[1]] - t)/m[[1]])/3., (* else *)
     0 
     ], (* else *)
    0
    ]
   ] /; StringLength[s1] > 0 && StringLength[s2] > 0

to let out

In[12]:= jaro["Miss Australia", "Miss Brasilia"]
Out[12]= 0.771229

In[6]:= jaro["Miss Brasilia", "Miss Australia"]
Out[6]= 0.771229

In[8]:= jaro["s", "s"]
Out[8]= 0

In[10]:= jaro["Miss", "Miss"]
Out[10]= 1.

In[13]:= jaro @@@ {{"DWAYNE", "DUANE"},
  {"MARTHA", "MARHTA"},
  {"DIXON", "DICKSONX"},
  {"JELLYFISH", "SMELLYFISH"}}
Out[13]= {0.822222, 0.944444, 0.766667, 0.896296}

In[25]:= jaro["Miss Mexiko", "Miss Belize"]
Out[25]= 0.733766

In[26]:= jaro[ "Miss Belize", "Miss Mexiko"]
Out[26]= 0.733766

In[27]:= jaro["0100010100101001001001001010010", \
"10000100100111101010101010101010"]
Out[27]= 0.898185

In[28]:= jaro["Miss Mexikoooooooooooooooooooo", "Miss Belize"]
Out[28]= 0.602146

In[29]:= jaro["abdegopq", "cfhijklmnrstuvwyz"]
Out[29]= 0

In[30]:= jaro["cfhijklmnrstuvwyz", "abdegopq"]
Out[30]= 0

In[33]:= jaro["aasdjkdashdahsgdashdgasj", "asdjkdashdahsgdashdgasj"]
Out[33]= 0.819444

In[34]:= jaro["aasdjkdashdahsgdashdgasj", "aasdjkdashdahsgdashdgasj"]
Out[34]= 1.

In[31]:= jaro["CRATE", "TRACE"]
Out[31]= 0.733333

In[32]:= jaro["Mary has a little lamb", "and Meghan has the redhead Harry"]
Out[32]= 0.465097

to check it against another implementation textdistance has been choosen, it gives for a r = -1 nevertheless 1 and shows the following results

Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 27 2018, 04:06:47) [MSC v.1914 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import textdistance
>>> textdistance.jaro.distance('Miss', 'Miss')
0
>>> textdistance.jaro('Miss', 'Miss')
1
>>> textdistance.jaro('s', 's')
1
>>> textdistance.jaro('Miss Australia', 'Miss Brasilia')
0.8166833166833167
>>> textdistance.jaro('Miss Brasilia', 'Miss Australia')
0.8166833166833167
>>> textdistance.jaro('DWAYNE', 'DUANE')
0.8222222222222223
>>> textdistance.jaro('MARTHA', 'MARHTA')
0.9444444444444445
>>> textdistance.jaro('DIXON', 'DICKSONX')
0.7666666666666666
>>> textdistance.jaro('JELLYFISH', 'SMELLYFISH')
0.8962962962962964
>>> textdistance.jaro('Miss Mexiko', 'Miss Belize')
0.7575757575757575
>>> textdistance.jaro('Miss Belize', 'Miss Mexiko')
0.7575757575757575
>>> textdistance.jaro('0100010100101001001001001010010', '10000100100111101010101010101010')
0.8308371735791091
>>> textdistance.jaro('Miss Mexikoooooooooooooooooooo', 'Miss Belize')
0.6232323232323232
>>> textdistance.jaro('abdegopq', 'cfhijklmnrstuvwyz')
0.0
>>> textdistance.jaro('cfhijklmnrstuvwyz', 'abdegopq')
0.0
>>> textdistance.jaro('aasdjkdashdahsgdashdgasj', 'asdjkdashdahsgdashdgasj')
0.841183574879227
>>> textdistance.jaro('aasdjkdashdahsgdashdgasj', 'aasdjkdashdahsgdashdgasj')
1
>>> textdistance.jaro('CRATE', 'TRACE')
0.7333333333333334
>>> textdistance.jaro('Mary has a little lamb', 'and Meghan has the redhead Harry')
0.5631555944055945
>>>

most of them disagree with the above jaro[]. Does your implementation match it?

The EditDistance[] is back and

Clear[check, jaro]
check[l1_List, l2_List] := 
 Flatten[Outer[List, l1, Select[l2, #[[1]] == l1[[1, 1]] &], 1], 1]
jaro[s1_String, s2_String, prec_: $MachinePrecision] := 
 Block[{r, l1 = StringLength[s1], l2 = StringLength[s2], l3, m, l4, 
    l5, t},
   r = Floor[Max[l1, l2]/2] - 1;
   If[r >= 0,(* then *)
    l3 = Flatten[
      check[{Transpose[{ToCharacterCode[s1], Range[l1]}][[#]]}, 
         Transpose[{ToCharacterCode[s2], Range[l2]}][[
          Min[l2, Max[1, # - r]] ;; Min[l2, # + r]]]] & /@ Range[l1], 
      1];
    If[Length[l3] > 0,
     {l4, l5} = 
      MapAt[First, 
       SortBy[#, Last] & /@ (Union /@ Transpose[l3]), {{1, All}, {2, 
         All}}];
     m = Min[Length[l4], Length[l5]];
     t = EditDistance[Take[l4, m], Take[l5, m]]/2;
     N[(m/l1 + m/l2 + (m - t)/m)/3, prec], (* else *)
     0 
     ], (* else *)
    0
    ]
   ] /; StringLength[s1] > 0 && StringLength[s2] > 0 && prec > 1

many results of textdistance.jaro() are matched

>>> textdistance.jaro('Miss Australia', 'Miss Brasilia')
0.8166833166833167
>>> textdistance.jaro('Miss Mexiko', 'Miss Belize')
0.7575757575757575
>>> textdistance.jaro('Miss Belize', 'Miss Mexiko')
0.7575757575757575

In[161]:= jaro["Miss Australia", "Miss Brasilia"]
Out[161]= 0.8166833166833167

In[162]:= jaro["Miss Mexiko", "Miss Belize"]
Out[162]= 0.7575757575757576

In[163]:= jaro[ "Miss Belize", "Miss Mexiko"]
Out[163]= 0.7575757575757576

but some not, to be discussed in the following: First define a function jo[] usable in FindRoot[] to find the m and t which have been ssemingly found by textdistance.jaro():

Clear[jo, joex]
jo[s1_Integer, s2_Integer, m_?NumberQ, t_?NumberQ, 
  prec_: $MachinePrecision] := 
 N[(m/s1 + m/s2 + (m - t)/m)/3, prec] /; m > 0 && prec > 0
joex[s1_Integer, s2_Integer, m_Integer, 
  t_?NumberQ] := (m/s1 + m/s2 + (m - t)/m)/3 /; m > 0

so one uses a jaro[] printing it's findings:

>>> textdistance.jaro('Miss Mexikoooooooooooooooooooo', 'Miss Belize')
0.6232323232323232

In[136]:= jaro["Miss Mexikoooooooooooooooooooo", "Miss Belize"]
During evaluation of In[136]:= l4 = Miss Mei
During evaluation of In[136]:= l5 = Miss eie
During evaluation of In[136]:= s1 = 30| s2 = 11| r = 14| m = 8| t = 1
Out[136]= 0.6229797979797980

the result ot textdistance.jaro() follows with m = 7 and t = 0:

In[145]:= jo[30, 11, 7, 0]
Out[145]= 0.6232323232323232

it's unclear how a disagreement in the number of matching characters can happen.

This is interesting, only the first character differs:

>>> textdistance.jaro('aasdjkdashdahsgdashdgasj', 'asdjkdashdahsgdashdgasj')
0.841183574879227

In[146]:= jaro["aasdjkdashdahsgdashdgasj", "asdjkdashdahsgdashdgasj"]
During evaluation of In[146]:= l4 = aasdjkdashdahsgdashdgasj
During evaluation of In[146]:= l5 = asdjkdashdahsgdashdgasj
During evaluation of In[146]:= s1 = 24| s2 = 23| r = 11| m = 23| t = 1
Out[146]= 0.971618357487923

this is reached by the same m (luckily), but t = 10:

In[149]:= jo[24, 23, 23, 10]
Out[149]= 0.8411835748792271

this one has a great discrepance in the number of matching charachters

>>> textdistance.jaro('0100010100101001001001001010010', '10000100100111101010101010101010')
0.8308371735791091

In[67]:= jaro["0100010100101001001001001010010", \
"10000100100111101010101010101010"]
During evaluation of In[67]:= l4 = 0100010100101001001001001010010
During evaluation of In[67]:= l5 = 10000100100111101010101010101010
During evaluation of In[67]:= s1 = 31| s2 = 32| r = 15| m = 31| t = 5
Out[67]= 0.935819892473118

because it's reached with m = 28 and t = 8

In[133]:= jo[31, 32, 28, 16/2]
Out[133]= 0.8308371735791091 

and last but not least, Mary and Meghan

>>> textdistance.jaro('Mary has a little lamb', 'and Meghan has the redhead Harry')
0.5631555944055945

In[167]:= jaro["Mary has a little lamb", "and Meghan has the redhead Harry"]
During evaluation of In[167]:= l4 = Ma has a tte a
During evaluation of In[167]:= l5 = a Meha has the eea a
During evaluation of In[167]:= s1 = 22| s2 = 32| r = 15| m = 14| t = 9/2
Out[167]= 0.5841450216450216

again textdistance,jaro() disagrees with the m (needs 13) as well as with the t (has 4):

In[153]:= jo[22, 32, 13, 4]
Out[153]= 0.5631555944055944

I quit. I expected that the m fits and the difference lies only in determining the t which has an English description

The number of matching (but different sequence order) characters divided by 2 defines the number of transpositions.

as well as a French description

Le nombre de transpositions est obtenu en comparant le i-ème caractère correspondant de {\displaystyle s{1}} s{1} avec le i-ème caractère correspondant de {\displaystyle s{2}} s{2}. Le nombre de fois où ces caractères sont différents, divisé par deux, donne le nombre de transpositions.

but if one does that, even 'Miss Australia' and 'Miss Brasilia' do not agree with textdistance.jaro().

Posted 2 months ago

Very interesting dissertation about distance Jaro, so I get to understand you still preserve certain discrepancies with

textdistance results

, I do not know how to solve them because that escapes my knowledge, but I do not know if you already know about the following that can be used to make more comparisons

jD = Experimental`JaroDistance;

jD["DIXON", "DICKSONX"]

    0.766667

jD["JELLYFISH", "SMELLYFISH"]

    0.896296

I put those examples so you can see what I mean, maybe my suggestion will help you. What illustrative has been for me everything you have done on this subject, I continue to read both your code and your observations

jD = Experimental`JaroDistance;

This function seems not to have a single point of definition:

In[1]:= $Version
Out[1]= "11.3.0 for Microsoft Windows (64-bit) (March 7, 2018)"

In[2]:= Needs["Experimental`"]

In[3]:= ?JaroDistance
Experimental`JaroDistance
Attributes[JaroDistance]={Protected}
Options[JaroDistance]={IgnoreCase->False}

In[4]:= JaroDistance["Miss Australia", "Miss Brasilia"]
Out[4]= 0.816683

In[5]:= JaroDistance["Miss Australia", "Miss Brasilia"]
Out[5]= 0.78638

In[6]:= JaroDistance["Miss Australia", "Miss Brasilia", IgnoreCase -> True]
Out[6]= 0.816683

In[7]:= JaroDistance["Miss Australia", "Miss Brasilia", IgnoreCase -> False]
Out[7]= 0.816683

this is mind-boggling; JaroDistance[] has different results under identical input, it has also different result if default option is given or not.

For the sake of completeness, I found a NIST description of the Jaro distance containing

William E. Winkler and Yves Thibaudeau, An Application of the Fellegi-Sunter Model of Record Linkage to the 1990 U.S. Decennial Census, Statistical Research Report Series RR91/09, U.S. Bureau of the Census, Washington, D.C., 1991. The abstract (HTML) and full paper (PDF).

and there it is said on p. 12

Two characters are considered in common only if they are no further apart than (m/2 - 1) where m = max(d,r). Characters in common from two strings are assigned; remaining characters unassigned. Each string has the same number of assigned characters.

That tells you to treat l3 as a whole and suggests the following implementation

Needs["Experimental`"]

Clear[check]
check[l1_List, l2_List] := Flatten[Outer[List, l1, Select[l2, #[[1]] == l1[[1, 1]] &], 1], 1]

Clear[jaro]
jaro[s1_String, s2_String, prec_: $MachinePrecision] := 
 Block[{r, l1 = StringLength[s1], l2 = StringLength[s2], l3, m, t},
   r = Floor[Max[l1, l2]/2] - 1;
   If[r >= 0,(* then *)
    l3 = DeleteDuplicates[
      Flatten[check[{Transpose[{ToCharacterCode[s1], 
              Range[l1]}][[#]]}, 
          Transpose[{ToCharacterCode[s2], Range[l2]}][[
           Min[l2, Max[1, # - r]] ;; Min[l2, # + r]]]] & /@ Range[l1],
        1], ((First[#1] == First[#2]) || (Last[#1] == Last[#2])) &];
    m = Length[l3];
    If[m > 0,
     t = (m - 
         Count[{1, -1} . 
           MapAt[First, 
            SortBy[#, Last] & /@ Transpose[l3], {{1, All}, {2, All}}],
           0])/2;
     N[(m/l1 + m/l2 + (m - t)/m)/3, prec], (* else *)
     0 
     ], (* else *)
    0
    ]
   ] /; StringLength[s1] > 0 && StringLength[s2] > 0 && prec > 1

which has still t sometimes off by 1/2 with respect to the textdistance.jaro. The JaroDistance[] is spurious (this was made with Mathematica 10.3). Taking the drudgery to create a table

In[204]:=  Grid[{{Item["s1"], Item["s2"], Item["td.jaro"], Item["jaro"], 
       Item["Defect"], Item["JaroDistance"]},
      {Item["Miss Australia"], Item["Miss Brasilia"],
       Item[0.8166833166833167],
       Item[jaro["Miss Australia", "Miss Brasilia"], Background -> Pink],
       Item["t+1/2"],
       Item[JaroDistance["Miss Australia", "Miss Brasilia", 
         IgnoreCase -> False], Background -> Pink]
       },
      {Item["DWAYNE"], Item["DUANE"],
       Item[0.8222222222222223],
       Item[jaro["DWAYNE", "DUANE"]],
       Item["-"],
       Item[JaroDistance["DWAYNE", "DUANE", IgnoreCase -> False], 
        Background -> Pink]
       },
      {Item["MARTHA"], Item["MARHTA"],
       Item[0.9444444444444445],
       Item[jaro["MARTHA", "MARHTA"]],
       Item["-"],
       Item[JaroDistance["MARTHA", "MARHTA", IgnoreCase -> False], 
        Background -> Pink]
       },
      {Item["DIXON"], Item["DICKSONX"],
       Item[0.7666666666666666],
       Item[jaro["DIXON", "DICKSONX"]],
       Item["-"],
       Item[JaroDistance["DIXON", "DICKSONX", IgnoreCase -> False], 
        Background -> Pink]
       },
      {Item["JELLYFISH"], Item["SMELLYFISH"],
       Item[0.8962962962962964],
       Item[jaro["JELLYFISH", "SMELLYFISH"]],
       Item["-"],
       Item[JaroDistance["JELLYFISH", "SMELLYFISH", IgnoreCase -> False], 
        Background -> Pink]
       },
      {Item["Miss Mexiko"], Item["Miss Belize"],
       Item[0.7575757575757575],
       Item[jaro["Miss Mexiko", "Miss Belize"]],
       Item["-"],
       Item[JaroDistance["Miss Mexiko", "Miss Belize", 
         IgnoreCase -> False], Background -> Pink]
       },
      {Item["0100010100101001001001001010010"], 
       Item["10000100100111101010101010101010"],
       Item[0.8308371735791091],
       Item[jaro["0100010100101001001001001010010", 
         "10000100100111101010101010101010"]],
       Item["-"],
       Item[JaroDistance["0100010100101001001001001010010", 
         "10000100100111101010101010101010", IgnoreCase -> False]]
       },
      {Item["aasdjkdashdahsgdashdgasj"], Item["asdjkdashdahsgdashdgasj"],
       Item[0.841183574879227],
       Item[jaro["aasdjkdashdahsgdashdgasj", "asdjkdashdahsgdashdgasj"]],
       Item["-"],
       Item[JaroDistance["aasdjkdashdahsgdashdgasj", 
         "asdjkdashdahsgdashdgasj", IgnoreCase -> False], 
        Background -> Pink]
       },
      {Item["abdegopq"], Item["cfhijklmnrstuvwxyz"],
       Item[0.0],
       Item[jaro["abdegopq", "cfhijklmnrstuvwxyz"]],
       Item["-"],
       Item[JaroDistance["abdegopq", "cfhijklmnrstuvwxyz", 
         IgnoreCase -> False]]
       },
      {Item["Mary has a little lamb"], 
       Item["and Meghan has the redhead Harry"],
       Item[0.5631555944055945],
       Item[jaro["Mary has a little lamb", 
         "and Meghan has the redhead Harry"], Background -> Pink],
       Item["t+1/2"],
       Item[JaroDistance["Mary has a little lamb", 
         "and Meghan has the redhead Harry"]]
       },
      {Item["Take[list,-n] gives the last n elements of list."], 
       Item["Take[list,{m,n}] gives elements m through n of list."],
       Item[0.791056166056166],
       Item[jaro["Take[list,-n] gives the last n elements of list.", 
         "Take[list,{m,n}] gives elements m through n of list."], 
        Background -> Pink],
       Item["t+1/2"],
       Item[JaroDistance[
         "Take[list,-n] gives the last n elements of list.", 
         "Take[list,{m,n}] gives elements m through n of list."]]
       }
      }, Background -> {None, {Cyan}}, Frame -> All]

one gets

enter image description here

and I've to confess in complete humbleness that I'm unable to get jaro[] right using the given descriptions what is is meant to be. Please read the cited PDF and tell me.

Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract