Posted 5 months ago
Dear all,

I urgently need your help. I have many strings of the following kind:


These are trees downloaded from the public database "TreeBase". I import them as strings and I want to get rid of what are officially the node names in the Newick format.

To be precise, I want to delete everything that stands between a ")" symbol and THE NEXT ":" symbol.

For my example, I want to get the following:

"(VitisviniferaVvNAC09GSVIVT01009651001:1.174911034,((MusaacuminataGSMUAAchr7T26050001:0.1846057474,MusaacuminataGSMUAAchr4T07148001:0.2741664889):0.2005997307,(OryzasativaOs10g215601:0.9733414222,MusaacuminataGSMUAAchr2T06610001:0.4881902598):0.0623981619):0.13917144990000008);"

However, if I do

StringReplace[string, ")" ~~ __ ~~ ":" -> ")"]

what I get is this:

"(VitisviniferaVvNAC09GSVIVT01009651001:1.174911034,((MusaacuminataGSMUAAchr7T26050001:0.1846057474,MusaacuminataGSMUAAchr4T07148_001:0.2741664889)0.13917144990000008);"

I.e. Mathematica replaces everything between the first ")" symbol and the LAST ":" symbol. But that's not what I want.

Can anyone help with this please?

Thanks so much!

StringReplace[string, ")" ~~ Shortest[__] ~~ ":" -> "):"]

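Ordinary expression patterns (for example, patterns on lists) try the shortest match first by default, while string patterns try the longest. Wrapping the pattern in Shortest makes the string pattern stop at the next ":" instead of the last one.

Note that I also added the ":" to the right-hand side of the replacement rule, so the colon in front of each branch length is kept.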
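
The two matching defaults can be seen on a small input. This is only a sketch: the list and the string toy are made up for illustration and are not taken from the TreeBase data.

```mathematica
(* Expression pattern: x__ takes the shortest sequence that lets the whole pattern match *)
Replace[{a, b, c, d}, {x__, y__} :> {{x}, {y}}]
(* {{a}, {b, c, d}} *)

(* Made-up string with two ")name:" groups *)
toy = "(A:1,B:2)n1:3,(C:4)n2:5";

(* String pattern, default: __ runs from the first ")" to the LAST ":" *)
StringReplace[toy, ")" ~~ __ ~~ ":" -> "):"]
(* "(A:1,B:2):5" *)

(* With Shortest: each match stops at the next ":" after a ")" *)
StringReplace[toy, ")" ~~ Shortest[__] ~~ ":" -> "):"]
(* "(A:1,B:2):3,(C:4):5" *)
```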

In[21]:= StringReplace[string, ")" ~~ Shortest[__] ~~ ":" -> "):"]

Out[21]= "(VitisviniferaVvNAC09GSVIVT01009651001:1.174911034,((Musa \
acuminataGSMUAAchr7T26050001:0.1846057474,MusaacuminataGSMUA \
Achr4T07148001:0.2741664889):0.2005997307,(Oryza \
sativaOs10g215601:0.9733414222,MusaacuminataGSMUAAchr2T06610001: \
Posted 5 months ago

Or using RegularExpression, where the lazy quantifier .*? plays the role of Shortest:

StringReplace[string, RegularExpression["\\).*?:"] -> "):"]

StringReplace[string, RegularExpression["\\).*?:"] -> "):"] ==
 StringReplace[string, ")" ~~ Shortest[__] ~~ ":" -> "):"]
(* True *)
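
One caveat that may be worth checking on real data: the two forms agree on the string in this thread, but they are not equivalent in general. __ needs at least one character, while .*? also matches none. The sketch below assumes a tree in which an internal node has no name, so that ")" is followed directly by ":". The string toy is hypothetical and not from the original post.

```mathematica
(* Hypothetical tree: the inner node is unnamed, the outer node is called n1 *)
toy = "((A:1,B:2):0.5,C:3)n1:4;";

(* __ must consume at least one character, so the match runs on to the following ":" *)
StringReplace[toy, ")" ~~ Shortest[__] ~~ ":" -> "):"]
(* "((A:1,B:2):3):4;" *)

(* .*? may match zero characters, so an existing "):" is left as it is *)
StringReplace[toy, RegularExpression["\\).*?:"] -> "):"]
(* "((A:1,B:2):0.5,C:3):4;" *)

(* Two variants that should behave like the regex on this input *)
StringReplace[toy, ")" ~~ Shortest[___] ~~ ":" -> "):"]
StringReplace[toy, RegularExpression["\\)[^:]*:"] -> "):"]
```

If unnamed nodes can occur in the downloaded trees, the three-underscore form ___ or the character class [^:]* is probably the safer choice.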
Posted 5 months ago

Oh wow, I am totally unaware of Regular Expressions and Normal Expressions. Is there an easy way to understand what these two terms mean and which expressions are treated by Mathematica as which?

Posted 5 months ago

Thanks so much! I wasn't aware of "Shortest". This helped a lot.

Posted 5 months ago

Crossposted here. If you are going to crosspost, please add links to both posts so people don't waste time answering a question that has already been answered.

Posted 5 months ago

Ok, sorry! I am new to all this. Will keep that in mind for future questions!

