Jaro–Winkler distance in Wolfram Language?

Posted 10 years ago

Does anyone have an idea for an efficient implementation of the Jaro–Winkler distance? If there is a built-in function under a different name, that would be great; please let me know. If not, perhaps a modification of the Levenshtein distance or something similar would do, or a compiled version (I am not sure which functions to use to make it compilable). I know good etiquette is to show some code, but I have to compare millions of strings pairwise and need the most efficient approach. Any advice would be appreciated. Thanks in advance!

BTW, does anyone see a link between Shannon entropy and Levenshtein or similar distances? Is it possible to describe a distance metric between strings in terms of the information change needed to turn one string into the other?

POSTED BY: Sam Carrettie
2 Replies

The Java library SimMetrics implements that distance function.

This StackExchange post shows how to use SimMetrics from Mathematica:

Using SimMetrics Java Function on Mathematica

POSTED BY: Rodrigo Murta

This is pretty cool, but I would like a self-sufficient solution that does not rely on external libraries. I do agree this is a very nice list, and I wish Wolfram Language had ALL (not just some) of these:

Levenshtein, NeedlemanWunch, SmithWaterman, SmithWatermanGotoh, SmithWatermanGotohWindowedAffine, Jaro, JaroWinkler, ChapmanLengthDeviation, ChapmanMeanLength, QGramsDistance, BlockDistance, CosineSimilarity, DiceSimilarity, EuclideanDistance, JaccardSimilarity, MatchingCoefficient, MongeElkan, OverlapCoefficient.

And while I need a pure WL solution, I am sure other interested folks can benefit from SimMetrics (GitHub / SourceForge) and Rolf Mertig's StackExchange answer.
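
For anyone attempting a self-contained port, here is a minimal reference sketch of the usual Jaro and Jaro–Winkler definitions. It is written in Python only so the logic is easy to check; it is not SimMetrics' code and not a Wolfram Language built-in. The function names, the prefix scale `p = 0.1`, and the 4-character prefix cap are the conventional choices and are assumptions here, not something taken from this thread.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they are equal and no farther
    # apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0

    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = True
                matched2[j] = True
                matches += 1
                break

    if matches == 0:
        return 0.0

    # Walk both strings' matched characters in order; each out-of-order
    # pair is half a transposition.
    k = 0
    half_transpositions = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions // 2

    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3.0


def jaro_winkler_similarity(s1: str, s2: str, p: float = 0.1,
                            max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix."""
    sim = jaro_similarity(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return sim + prefix * p * (1.0 - sim)


def jaro_winkler_distance(s1: str, s2: str, p: float = 0.1) -> float:
    """Distance form: 0.0 for identical strings, up to 1.0 for no matches."""
    return 1.0 - jaro_winkler_similarity(s1, s2, p)
```

With these definitions, `jaro_winkler_similarity("MARTHA", "MARHTA")` comes out at about 0.961 (six matches, one transposition, common prefix of three). The inner loop only scans characters inside the match window, so the per-pair cost grows roughly with string length times window size. The same loop structure over integer character codes may be a reasonable starting point for a compiled WL version, but that is untested here.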

POSTED BY: Sam Carrettie