
Partial string replacement

Posted 5 months ago

Dear all,

I urgently need your help. I have many strings of the following kind:

string="(Vitis_vinifera_VvNAC09_GSVIVT01009651001:1.174911034,((Musa_\
acuminata_GSMUA_Achr7T26050_001:0.1846057474,Musa_acuminata_GSMUA_\
Achr4T07148_001:0.2741664889)_D_1_0_0_9260000000_:0.2005997307,(Oryza_\
sativa_Os10g21560_1:0.9733414222,Musa_acuminata_GSMUA_Achr2T06610_001:\
0.4881902598)0_1700000000:0.0623981619)_D_0_5__:0.13917144990000008);"

These are trees downloaded from the public database "TreeBase". I import them as strings, and I want to get rid of what are officially the node names in the Newick format.

To be precise, I want to delete everything that stands between a ")" symbol and THE NEXT ":" symbol.

For my example, I want to get the following:

"(Vitis_vinifera_VvNAC09_GSVIVT01009651001:1.174911034,((Musa_\
acuminata_GSMUA_Achr7T26050_001:0.1846057474,Musa_acuminata_GSMUA_\
Achr4T07148_001:0.2741664889):0.2005997307,(Oryza_\
sativa_Os10g21560_1:0.9733414222,Musa_acuminata_GSMUA_Achr2T06610_001:\
0.4881902598):0.0623981619):0.13917144990000008);"

However, if I do

StringReplace[string, ")" ~~ __ ~~ ":" -> ")"]

what I get is this:

"(Vitis_vinifera_VvNAC09_GSVIVT01009651001:1.174911034,((Musa_\
acuminata_GSMUA_Achr7T26050_001:0.1846057474,Musa_acuminata_GSMUA_\
Achr4T07148_001:0.2741664889)0.13917144990000008);"

I.e. Mathematica replaces everything between the first ")" symbol and the LAST ":" symbol. But that's not what I want.

Can anyone help with this please?

Thanks so much!

6 Replies

Use

StringReplace[string, ")" ~~ Shortest[__] ~~ ":" -> "):"]

Ordinary (non-string) expression patterns try the shortest match first by default, while string patterns match the longest possible substring.

Note, I added the ":" to the rhs of the replacement rule.

In[21]:= StringReplace[string, ")" ~~ Shortest[__] ~~ ":" -> "):"]

Out[21]= "(Vitis_vinifera_VvNAC09_GSVIVT01009651001:1.174911034,((Musa_\
acuminata_GSMUA_Achr7T26050_001:0.1846057474,Musa_acuminata_GSMUA_\
Achr4T07148_001:0.2741664889):0.2005997307,(Oryza_\
sativa_Os10g21560_1:0.9733414222,Musa_acuminata_GSMUA_Achr2T06610_001:\
0.4881902598):0.0623981619):0.13917144990000008);"
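
The greedy-versus-shortest behaviour is easier to see on a small input. The following is only a sketch: the string `small` and the symbols `a`, `b`, `c`, `d` are made up for illustration and are assumed to have no values assigned.

```mathematica
(* Hypothetical toy string, not one of the TreeBase trees *)
small = "(a:1,b:2)x_1:3,(c:4)y:5";

(* Default string matching is greedy: a single match runs from the
   first ")" to the last ":" *)
StringReplace[small, ")" ~~ __ ~~ ":" -> "):"]
(* "(a:1,b:2):5" *)

(* Shortest stops at the first ":" after each ")" *)
StringReplace[small, ")" ~~ Shortest[__] ~~ ":" -> "):"]
(* "(a:1,b:2):3,(c:4):5" *)

(* For comparison, an ordinary list pattern: the first __ takes as
   few elements as possible *)
Replace[{a, b, c, d}, {x__, y__} :> {{x}, {y}}]
(* {{a}, {b, c, d}} *)
```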
Posted 5 months ago

Or using RegularExpression

StringReplace[string, RegularExpression["\\).*?:"] -> "):"]

StringReplace[string, RegularExpression["\\).*?:"] -> "):"] ==
 StringReplace[string, ")" ~~ Shortest[__] ~~ ":" -> "):"]
(* True *)
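
One caveat, as far as I can tell: the two forms agree on this string because every internal node has a label. `__` requires at least one character, whereas `.*?` also matches an empty label, so the results can differ on trees with unlabelled nodes. A small sketch with a made-up string (`unlabelled` is hypothetical, not from the original data):

```mathematica
(* Hypothetical toy tree: the first internal node has no label *)
unlabelled = "(a:1,b:2):3,(c:4)y:5";

(* __ needs at least one character, so the first match swallows ":3,(c" *)
StringReplace[unlabelled, ")" ~~ Shortest[__] ~~ ":" -> "):"]
(* "(a:1,b:2):4):5" *)

(* ___ (three underscores) also accepts an empty label *)
StringReplace[unlabelled, ")" ~~ Shortest[___] ~~ ":" -> "):"]
(* "(a:1,b:2):3,(c:4):5" *)

(* The regular expression allows an empty label as well *)
StringReplace[unlabelled, RegularExpression["\\).*?:"] -> "):"]
(* "(a:1,b:2):3,(c:4):5" *)

(* Alternative: exclude ":" from the label, so Shortest is not needed *)
StringReplace[unlabelled, ")" ~~ (Except[":"] ...) ~~ ":" -> "):"]
(* "(a:1,b:2):3,(c:4):5" *)
```

If some of the trees may contain unlabelled internal nodes, one of the last three forms is probably the safer choice.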
Posted 5 months ago

Oh wow, I am totally unaware of Regular Expressions and Normal Expressions. Is there an easy way to understand what these two terms mean and which expressions are treated by Mathematica as which?

Posted 5 months ago

Thanks so much! I wasn't aware of "Shortest". This helped a lot.

Posted 5 months ago

Crossposted here. If you are going to crosspost, please add links to both posts so people don't waste time answering a question that has already been answered.

Posted 5 months ago

Ok, sorry! I am new to all this. Will keep that in mind for future questions!
