Comparing two strings to find the differing character

Hello Wolfram Community, my question is fairly trivial and might not be a real challenge for many people, but here I go. I have a long list with two columns of strings, for example:

String1          String2           Edit Distance   Different Character
OKJ530-WM        OKJS30-WM         1               5
LTBRT5100-244    LTBR-T5100-244    1               -

Computing the edit distance is pretty simple, and so is sorting my whole list from the smallest distance to the largest.
In the first example, the user entered "5" in our system instead of "S", most probably because they got confused while copying the characters. In the second example, they forgot the "-" between "R" and "T".
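
For readers following along, here is a minimal sketch of the step described above: computing the edit distance for each pair and sorting by it. The two pairs are the ones from the table; the variable names `pairs` and `distances` are only illustrative, and a real list would come from the imported data.

```wolfram
(* The two sample pairs from the table above, as {String1, String2} *)
pairs = {{"OKJ530-WM", "OKJS30-WM"}, {"LTBRT5100-244", "LTBR-T5100-244"}};

(* Edit distance for each pair *)
distances = EditDistance @@@ pairs
(* {1, 1} *)

(* Sort the pairs from the smallest edit distance to the largest *)
SortBy[pairs, EditDistance @@ # &]
```

Both sample pairs have distance 1 (one substitution and one insertion, respectively), so the distance alone does not say which character differs.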

I am trying to make a list of the most common mistakes to see if there is a way we can avoid them.

Right now I know that String1 and String2 differ in 1, 2, 3, or N characters, depending on the EditDistance value. However, I want to compute what the difference actually is, and this is where I am a little lost. That is my final goal: to compute the Different Character column.

I think a combination of StringCases and a RegularExpression might do the trick, but I am not very confident with regular expressions. Or maybe there is a different way.

If anyone has an idea, it would be much appreciated.

Attachments:
POSTED BY: Rodrigo Amor
2 Replies

Thank you very much, Eric. This works perfectly, because then I can get the middle pair, which is the difference.

Love it, thanks again.

POSTED BY: Rodrigo Amor
Posted 1 year ago

I think SequenceAlignment might be interesting to you. You'll be able to find the swapped characters (and other differences) by inspecting what it gives you. For example, I imported your csv file and applied it to the pairs in the first two columns:

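rawData = Import[<path to file>];    (* the attached CSV file; the path is left as a placeholder *)
pairs = rawData[[2 ;;, 1 ;; 2]];     (* skip the first row (presumably the header), keep the first two columns *)
SequenceAlignment @@@ pairs          (* align each {String1, String2} pair *)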

Result:

{{"OKJ", {"5", "S"}, "30-WM"}, 
 {"CPX13", {"", " "}, "1980 P7.2"}, 
 {"LTBR", {"", "-"}, "T5100-244"}, 
 {"SFJT6-75-M3", {"_", "."}, "0"}, 
 {"TELANA10T-P8.4-B25-L12", {"", "-"}, "KD"}, 
 {"WSJM30-12-2", {",", ""}}, 
 {"CCGH18-80", {"-", "."}, "0"}, 
 {"CCGH18-80", {"-", "."}, "0"}, 
 {"LFZB8", {"", "-"}, "15"}, 
 {"LFZB12", {"", "-"}, "15"}, 
 {"LFZB12", {"", "-"}, "12"}, 
 {"MSRB", {" ", ""}, "6.5-0.5"}, 
 {"MSRB", {" ", ""}, "6.5-0.5"}, 
 {"MSRB", {" ", ""}, "6.5-0.5"}, 
 {"MSRB", {" ", ""}, "6.5-0.5"}, 
 {"37104-3122-000", {"", " "}, "FL"}, 
 {"37104-3122-000", {"", " "}, "FL"}, 
 {"HXMN", {"S", "5"}}}

SequenceAlignment has several options that you can play with, for example GapPenalty, SimilarityRules, and MergeDifferences.
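
As a possible next step toward the list of most common mistakes, here is a rough sketch of how the differing pairs might be pulled out of such alignments and counted. It assumes the alignments have the shape shown in the result above, where matching segments are plain strings and each difference is a two-element list {fromString1, fromString2}. The names `alignments` and `differences` are only illustrative.

```wolfram
(* Three alignments copied from the result above *)
alignments = {
   {"OKJ", {"5", "S"}, "30-WM"},
   {"LTBR", {"", "-"}, "T5100-244"},
   {"LFZB8", {"", "-"}, "15"}
   };

(* Differences are the two-string sublists at level 2; matching segments are plain strings *)
differences = Cases[alignments, {_String, _String}, {2}]
(* {{"5", "S"}, {"", "-"}, {"", "-"}} *)

(* Count each kind of mistake, most frequent first *)
SortBy[Tally[differences], -Last[#] &]
(* {{{"", "-"}, 2}, {{"5", "S"}, 1}} *)
```

An empty string in a pair marks a character that is missing from that side, so {"", "-"} reads as "String1 lacks a hyphen that String2 has".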

POSTED BY: Eric Rimbey