Is there a built-in to convert a RegularExpression to StringExpression?

Posted 1 year ago

If not, what is the StringExpression equivalent of this RegularExpression?

GGTG.{18,18}CCAA.{17,17}TTAT

Thanks!

POSTED BY: Richard Frost
3 Replies

There is no built-in function to convert a RegularExpression to a StringExpression, only the reverse, via the undocumented function:

StringPattern`PatternConvert
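As a minimal sketch of that direction, assuming the undocumented function still returns a list whose first element is the generated regular expression string (the variable name `demo` is only for illustration):

```wolfram
(* A small string pattern in the same style as the one in this thread *)
demo = "GGTG" ~~ Repeated[Except["\n", _], {18, 18}] ~~ "CCAA";

(* PatternConvert is internal and undocumented, so its output format may change; *)
(* the first element of the result is assumed to be the regex string *)
First[StringPattern`PatternConvert[demo]]
```

Inspecting the result is a quick way to check how a given StringExpression is translated before matching.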

The equivalent pattern would be:

patt = "GGTG" ~~ Repeated[Except["\n", _], {18, 18}] ~~ "CCAA" ~~ 
   Repeated[Except["\n", _], {17, 17}] ~~ "TTAT";

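You can try it out. Only the first string has exactly 18 and then 17 characters in the two gaps, so only it should match: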

StringMatchQ["GGTG123456789012345678CCAA12345878901234567TTAT", patt]
StringMatchQ["GGTG123456789012345678CCAA123458789012345678TTAT", patt]
StringMatchQ["GGTG12345678901234567CCAA123458789012345678TTAT", patt]
StringMatchQ["GGTG12345678901234567CCAA123458789012345678TTATX", patt]
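
For regular expressions of this restricted shape (literal letters separated by `.{m,n}` gaps), the manual translation above could be automated. The following is only a sketch: `regexToStringExpression` is a hypothetical helper, not a built-in, and it silently ignores any regex syntax other than these two forms.

```wolfram
(* Hypothetical helper: handles only runs of letters and ".{m,n}" gaps *)
regexToStringExpression[re_String] :=
  StringExpression @@ StringCases[re, {
     (* ".{m,n}" becomes a bounded repeat of any character except newline *)
     ".{" ~~ (m : (DigitCharacter ..)) ~~ "," ~~ (n : (DigitCharacter ..)) ~~ "}" :>
      Repeated[Except["\n", _], {FromDigits[m], FromDigits[n]}],
     (* runs of letters are kept as literal strings *)
     lit : (LetterCharacter ..) :> lit
     }];

patt2 = regexToStringExpression["GGTG.{18,18}CCAA.{17,17}TTAT"];
StringMatchQ["GGTG123456789012345678CCAA12345878901234567TTAT", patt2]
```

Anything beyond this narrow subset (character classes, alternation, anchors) would need its own rule.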
POSTED BY: Sander Huisman

The StringExpression-to-RegularExpression conversion is also available in the Wolfram Function Repository (WFR) as ToRegularExpression.

POSTED BY: Rohit Namjoshi

Thank you for the example.

I tested RegularExpression and StringExpression against one of my genomic computations, which checks 2773 different patterns against several chromosomes. In these tests, the chromosome is 35.7 million characters long.

Here are two examples of the 2773 match sequences for the RegularExpression test:

GGTG.{35,35}TTAT
GGTG.{13,13}CCAA.{86,86}TTAT

Here are the timing results:

read "EM03 chr 1 F parts clusters.mx.gz" (31.7237MB) (2.05 seconds)
r, k = 4713, 123 (24.5 seconds)
r, k = 9426, 311 (36.4 seconds)
r, k = 14139, 570 (50.1 seconds)
r, k = 18852, 792 (42.9 seconds)
r, k = 23565, 1123 (1.07 minutes)
r, k = 28278, 1429 (59.8 seconds)
r, k = 32991, 1767 (1.10 minutes)
r, k = 37704, 2072 (59.6 seconds)
r, k = 42417, 2415 (1.13 minutes)
r, k = 47130, 2773 (1.20 minutes)
finished cluster vetting r, k = 47132, 2773 in 9.05 minutes

Here are the same two match sequences for the StringExpression test:

"GGTG" ~~ Repeated[_, {35, 35}] ~~ "TTAT"
"GGTG" ~~ Repeated[_, {13, 13}] ~~ "CCAA" ~~ Repeated[_, {86, 86}] ~~ "TTAT"

Here are the timing results (44.1 minutes in total versus 9.05 minutes for RegularExpression, roughly 4.9 times slower):

read "EM03 chr 1 F parts clusters.mx.gz" (31.7237MB) (2.01 seconds)
r, k = 4713, 123 (1.64 minutes)
r, k = 9426, 311 (2.86 minutes)
r, k = 14139, 570 (4.06 minutes)
r, k = 18852, 792 (3.46 minutes)
r, k = 23565, 1123 (5.28 minutes)
r, k = 28278, 1429 (4.89 minutes)
r, k = 32991, 1767 (5.56 minutes)
r, k = 37704, 2072 (4.97 minutes)
r, k = 42417, 2415 (5.58 minutes)
r, k = 47130, 2773 (5.80 minutes)
finished cluster vetting r, k = 47132, 2773 in 44.1 minutes
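
A comparison of this kind can be reproduced on a small scale. This is a sketch under stated assumptions: the random test string, its length, and the use of StringPosition are illustrative choices, not the actual chromosome data or the code behind the timings above.

```wolfram
(* Hypothetical test data: a random DNA-like string of 10^6 letters *)
SeedRandom[1];
dna = StringJoin[RandomChoice[{"A", "C", "G", "T"}, 10^6]];

(* The same pattern written both ways *)
regex = RegularExpression["GGTG.{35,35}TTAT"];
strexp = "GGTG" ~~ Repeated[_, {35, 35}] ~~ "TTAT";

(* AbsoluteTiming returns {seconds, result} *)
{tRegex, posRegex} = AbsoluteTiming[StringPosition[dna, regex]];
{tStrexp, posStrexp} = AbsoluteTiming[StringPosition[dna, strexp]];

(* Both forms should report the same match positions on newline-free input *)
posRegex === posStrexp
{tRegex, tStrexp}
```

Timings will vary with machine, version, and pattern, so the ratio seen here need not match the figures reported above.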
POSTED BY: Richard Frost