WOLFRAM COMMUNITY

Comparing two strings to find the different character

Rodrigo Amor, Misumi Corporation

Posted 2 years ago

Hello Wolfram Community, my question is somewhat trivial and might not be a real challenge for many people, but here I go. I have a long list with 2 columns of strings, for example:

String1          String2           Edit Distance   Different Character
OKJ530-WM        OKJS30-WM         1               5
LTBRT5100-244    LTBR-T5100-244    1               -

Computing the Edit Distance is pretty simple, and so is sorting my whole list from smaller distances to bigger ones.

In the first row, the user made a mistake: they entered "5" in our system instead of "S". Most probably they got confused while copying the numbers. In the second row, they forgot a "-" between "R" and "T".

I am trying to make a list of the most common mistakes to see if there is a way we can avoid them. Right now I know that String1 and String2 differ in 1, 2, 3, ..., N characters, depending on the EditDistance value. However, I want to compute what the difference actually is, and this is where I am a little lost. That is my final goal: to compute the Different Character column.

I think a combination of StringCases and a RegularExpression might do the trick, but I am not very confident using regular expressions. Or maybe there is a different way. If anyone has an idea, it will be super appreciated.

Attachments: Wolfram Communit...csv
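A minimal Wolfram Language sketch of the edit-distance and sorting step described above, assuming the data is a list of {String1, String2} rows. The two sample rows from the table stand in for the full attached CSV, and the names pairs, withDistance and sorted are only illustrative; EditDistance and SortBy are built-in functions.

```mathematica
(* Two sample rows from the question; the real data would be the first two CSV columns *)
pairs = {{"OKJ530-WM", "OKJS30-WM"}, {"LTBRT5100-244", "LTBR-T5100-244"}};

(* Append the edit distance between String1 and String2 to each row *)
withDistance = Append[#, EditDistance @@ #] & /@ pairs;
(* -> {{"OKJ530-WM", "OKJS30-WM", 1}, {"LTBRT5100-244", "LTBR-T5100-244", 1}} *)

(* Sort the rows from the smallest edit distance to the largest *)
sorted = SortBy[withDistance, Last]
```

Both sample rows have distance 1: one substitution ("5" for "S") and one insertion ("-"), which matches the Edit Distance column above.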

POSTED BY: Rodrigo Amor

2 Replies

Rodrigo Amor, Misumi Corporation

Posted 2 years ago

Thank you very much, Eric, this works perfectly, because then I can get the middle pair, which is the difference. Love it, thanks again!

POSTED BY: Rodrigo Amor

Eric Rimbey

Posted 2 years ago

I think SequenceAlignment might be interesting to you. You'll be able to find the swapped characters (and other differences) by inspecting what it gives you. For example, I imported your csv file and applied it to the pairs in the first two columns:

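rawData = Import[<path to file>];   (* the attached CSV file *)
pairs = rawData[[2 ;;, 1 ;; 2]];     (* skip the header row; keep columns String1 and String2 *)
SequenceAlignment @@@ pairs          (* align each {String1, String2} pair *)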

Result:

{{"OKJ", {"5", "S"}, "30-WM"}, 
 {"CPX13", {"", " "}, "1980 P7.2"}, 
 {"LTBR", {"", "-"}, "T5100-244"}, 
 {"SFJT6-75-M3", {"_", "."}, "0"}, 
 {"TELANA10T-P8.4-B25-L12", {"", "-"}, "KD"}, 
 {"WSJM30-12-2", {",", ""}}, 
 {"CCGH18-80", {"-", "."}, "0"}, 
 {"CCGH18-80", {"-", "."}, "0"}, 
 {"LFZB8", {"", "-"}, "15"}, 
 {"LFZB12", {"", "-"}, "15"}, 
 {"LFZB12", {"", "-"}, "12"}, 
 {"MSRB", {" ", ""}, "6.5-0.5"}, 
 {"MSRB", {" ", ""}, "6.5-0.5"}, 
 {"MSRB", {" ", ""}, "6.5-0.5"}, 
 {"MSRB", {" ", ""}, "6.5-0.5"}, 
 {"37104-3122-000", {"", " "}, "FL"}, 
 {"37104-3122-000", {"", " "}, "FL"}, 
 {"HXMN", {"S", "5"}}}

SequenceAlignment also has several options that you can play with, such as GapPenalty, SimilarityRules and MergeDifferences.
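One possible way to go from these alignments to a list of the most common mistakes is sketched below. It assumes, as in the results above, that matching runs come back as plain strings and each difference as a {fromString1, fromString2} pair; the two sample rows and the names alignments, diffs and mistakeCounts are hypothetical.

```mathematica
(* Two sample rows from the question; with the CSV, use pairs = rawData[[2 ;;, 1 ;; 2]] *)
pairs = {{"OKJ530-WM", "OKJS30-WM"}, {"LTBRT5100-244", "LTBR-T5100-244"}};

(* Align each pair: matching runs are plain strings, differences are {from1, from2} lists *)
alignments = SequenceAlignment @@@ pairs;

(* Keep only the difference pairs (the "middle pairs") of each alignment *)
diffs = Cases[#, {_String, _String}] & /@ alignments;
(* -> {{{"5", "S"}}, {{"", "-"}}} *)

(* Count each distinct difference and list the most frequent first *)
mistakeCounts = SortBy[Tally[Flatten[diffs, 1]], -Last[#] &]
```

The pattern {_String, _String} only matches the difference lists, because the unchanged runs are bare strings. An empty string in a pair marks an insertion or deletion, as with {"", "-"} for the missing hyphen.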

POSTED BY: Eric Rimbey
