
Is there a built-in to convert a RegularExpression to StringExpression?

Posted 2 years ago

If not, what is the StringExpression equivalent of this RegularExpression?



POSTED BY: Richard Frost
3 Replies

There is no built-in function to convert a RegularExpression to a StringExpression; only the reverse direction is supported.
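As a minimal sketch of the supported direction: the undocumented internal function StringPattern`PatternConvert returns a list whose first element is the regular-expression string for a given string pattern. Because it is undocumented, its behavior may differ between versions. The pattern and the test string below are illustrative only.

```wolfram
(* StringPattern`PatternConvert is undocumented and internal; treat this as a sketch *)
se = "GGTG" ~~ Repeated[WordCharacter, {18}] ~~ "CCAA";

(* The first element of the result is the regular-expression string *)
regexString = First[StringPattern`PatternConvert[se]]

(* Check that both forms agree on an illustrative input *)
test = "GGTG" <> StringRepeat["A", 18] <> "CCAA";
StringMatchQ[test, se] === StringMatchQ[test, RegularExpression[regexString]]
```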


The equivalent pattern would be:

patt = "GGTG" ~~ Repeated[Except["\n", _], {18, 18}] ~~ "CCAA" ~~ 
   Repeated[Except["\n", _], {17, 17}] ~~ "TTAT";

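You can test it with the following strings. Only the first has gaps of exactly 18 and 17 characters, so only it should match: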

StringMatchQ["GGTG123456789012345678CCAA12345878901234567TTAT", patt]
StringMatchQ["GGTG123456789012345678CCAA123458789012345678TTAT", patt]
StringMatchQ["GGTG12345678901234567CCAA123458789012345678TTAT", patt]
StringMatchQ["GGTG12345678901234567CCAA123458789012345678TTATX", patt]
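The regular expression from the question is not reproduced in this thread. Assuming it had the form "GGTG.{18}CCAA.{17}TTAT" (an assumption inferred from the pattern above, not the original text), a quick sketch to check that both forms agree on the test strings:

```wolfram
(* Assumed regex; the question's original expression is not shown in the thread *)
rx = RegularExpression["GGTG.{18}CCAA.{17}TTAT"];

patt = "GGTG" ~~ Repeated[Except["\n", _], {18, 18}] ~~ "CCAA" ~~ 
   Repeated[Except["\n", _], {17, 17}] ~~ "TTAT";

tests = {
   "GGTG123456789012345678CCAA12345878901234567TTAT",
   "GGTG123456789012345678CCAA123458789012345678TTAT",
   "GGTG12345678901234567CCAA123458789012345678TTAT",
   "GGTG12345678901234567CCAA123458789012345678TTATX"};

(* Both pattern forms should give the same list of results *)
StringMatchQ[tests, rx] === StringMatchQ[tests, patt]
```

Except["\n", _] is used rather than a bare _ because "." in a RegularExpression does not match a newline by default.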
POSTED BY: Sander Huisman
Posted 2 years ago

The conversion to a regular expression is also available in the Wolfram Function Repository (WFR) as ToRegularExpression.

POSTED BY: Rohit Namjoshi

Thank you for the example.

I tested RegularExpression and StringExpression against one of my genomic computations, which checks 2773 different patterns against several chromosomes. In these tests, the chromosome is 35.7M characters long.

Here are two examples of the 2773 match sequences for the RegularExpression test:


Here are the timing results:

read "EM03 chr 1 F parts" (31.7237MB) (2.05 seconds)
r, k = 4713, 123 (24.5 seconds)
r, k = 9426, 311 (36.4 seconds)
r, k = 14139, 570 (50.1 seconds)
r, k = 18852, 792 (42.9 seconds)
r, k = 23565, 1123 (1.07 minutes)
r, k = 28278, 1429 (59.8 seconds)
r, k = 32991, 1767 (1.10 minutes)
r, k = 37704, 2072 (59.6 seconds)
r, k = 42417, 2415 (1.13 minutes)
r, k = 47130, 2773 (1.20 minutes)
finished cluster vetting r, k = 47132, 2773 in 9.05 minutes

Here are the same two match sequences for the StringExpression test:


Here are the timing results:

read "EM03 chr 1 F parts" (31.7237MB) (2.01 seconds)
r, k = 4713, 123 (1.64 minutes)
r, k = 9426, 311 (2.86 minutes)
r, k = 14139, 570 (4.06 minutes)
r, k = 18852, 792 (3.46 minutes)
r, k = 23565, 1123 (5.28 minutes)
r, k = 28278, 1429 (4.89 minutes)
r, k = 32991, 1767 (5.56 minutes)
r, k = 37704, 2072 (4.97 minutes)
r, k = 42417, 2415 (5.58 minutes)
r, k = 47130, 2773 (5.80 minutes)
finished cluster vetting r, k = 47132, 2773 in 44.1 minutes
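By the totals above (9.05 minutes versus 44.1 minutes), the RegularExpression run was roughly 4.9 times faster. A comparison like this can be reproduced on a small scale with AbsoluteTiming. The sketch below uses a synthetic random sequence and a single assumed pattern, not the actual genomic data or the 2773 patterns, so the timings it gives will not match the figures above.

```wolfram
(* Hypothetical benchmark on synthetic data; not the data or code from the post *)
SeedRandom[1];
dna = StringJoin[RandomChoice[{"A", "C", "G", "T"}, 10^6]];

(* Assumed regex form and the StringExpression given earlier in the thread *)
rx = RegularExpression["GGTG.{18}CCAA.{17}TTAT"];
se = "GGTG" ~~ Repeated[Except["\n", _], {18, 18}] ~~ "CCAA" ~~ 
   Repeated[Except["\n", _], {17, 17}] ~~ "TTAT";

(* Time the same search with each pattern form *)
{tRx, posRx} = AbsoluteTiming[StringPosition[dna, rx]];
{tSe, posSe} = AbsoluteTiming[StringPosition[dna, se]];

posRx === posSe  (* both forms should report the same match positions *)
{tRx, tSe}       (* elapsed seconds for each form *)
```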
POSTED BY: Richard Frost