Partial string replacement

Posted 3 years ago

Dear all,

I urgently need your help. I have many strings of the following kind:

string="(Vitis_vinifera_VvNAC09_GSVIVT01009651001:1.174911034,((Musa_\
acuminata_GSMUA_Achr7T26050_001:0.1846057474,Musa_acuminata_GSMUA_\
Achr4T07148_001:0.2741664889)_D_1_0_0_9260000000_:0.2005997307,(Oryza_\
sativa_Os10g21560_1:0.9733414222,Musa_acuminata_GSMUA_Achr2T06610_001:\
0.4881902598)0_1700000000:0.0623981619)_D_0_5__:0.13917144990000008);"

These are trees downloaded from the public database "TreeBase". I import them as strings, and I want to get rid of what are officially the node names in the Newick format.

To be precise, I want to delete everything that stands between a ")" symbol and THE NEXT ":" symbol.

For my example, I want to get the following:

"(Vitis_vinifera_VvNAC09_GSVIVT01009651001:1.174911034,((Musa_\
acuminata_GSMUA_Achr7T26050_001:0.1846057474,Musa_acuminata_GSMUA_\
Achr4T07148_001:0.2741664889):0.2005997307,(Oryza_\
sativa_Os10g21560_1:0.9733414222,Musa_acuminata_GSMUA_Achr2T06610_001:\
0.4881902598):0.0623981619):0.13917144990000008);"

However, if I do

StringReplace[string, ")" ~~ __ ~~ ":" -> ")"]

what I get is this:

"(Vitis_vinifera_VvNAC09_GSVIVT01009651001:1.174911034,((Musa_\
acuminata_GSMUA_Achr7T26050_001:0.1846057474,Musa_acuminata_GSMUA_\
Achr4T07148_001:0.2741664889)0.13917144990000008);"

I.e. Mathematica replaces everything between the first ")" symbol and the LAST ":" symbol. But that's not what I want.

Can anyone help with this please?

Thanks so much!

POSTED BY: copito411
6 Replies
Posted 3 years ago

Crossposted here. If you are going to crosspost, please add links to both posts so people don't waste time answering a question that has already been answered.

POSTED BY: Rohit Namjoshi
Posted 3 years ago

Ok, sorry! I am new to all this. Will keep that in mind for future questions!

POSTED BY: copito411

Use

StringReplace[string, ")" ~~ Shortest[__] ~~ ":" -> "):"]

Normal expression patterns take the shortest pattern by default, while string patterns take the longest.
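
To illustrate the difference, here is a minimal sketch. The list and the toy string are made up for illustration and are not from the original data.

```mathematica
(* Expression pattern: the first __ takes the shortest sequence that works *)
{a, b, c, d} /. {x__, y__} :> {{x}, {y}}
(* {{a}, {b, c, d}} *)

(* String pattern: __ takes the longest sequence that works (toy string, assumed) *)
StringReplace["(a)X:1,(b)Y:2", ")" ~~ __ ~~ ":" -> "):"]
(* "(a):2" *)

(* Shortest makes the string pattern stop at the next ":" *)
StringReplace["(a)X:1,(b)Y:2", ")" ~~ Shortest[__] ~~ ":" -> "):"]
(* "(a):1,(b):2" *)
```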

Note that I added the ":" to the right-hand side of the replacement rule, because the pattern also consumes the colon.

In[21]:= StringReplace[string, ")" ~~ Shortest[__] ~~ ":" -> "):"]

Out[21]= "(Vitis_vinifera_VvNAC09_GSVIVT01009651001:1.174911034,((Musa_\
acuminata_GSMUA_Achr7T26050_001:0.1846057474,Musa_acuminata_GSMUA_\
Achr4T07148_001:0.2741664889):0.2005997307,(Oryza_\
sativa_Os10g21560_1:0.9733414222,Musa_acuminata_GSMUA_Achr2T06610_001:\
0.4881902598):0.0623981619):0.13917144990000008);"
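
A possible alternative that does not rely on Shortest is to exclude the colon from the characters that may be skipped. This is only a sketch on a hypothetical toy string (the name toy and its contents are made up), and it assumes that every ")" you want to clean is followed by a label and then a ":", as in the example tree.

```mathematica
(* Hypothetical toy tree with the same shape as the example (assumed data) *)
toy = "(V:1,((A:2,B:3)_D_1_:4,(C:5,D:6)0_17:7)_D_0_5__:8);";

(* After ")", skip zero or more characters that are not ":", then match ":" *)
StringReplace[toy, ")" ~~ (Except[":"] ...) ~~ ":" -> "):"]
(* "(V:1,((A:2,B:3):4,(C:5,D:6):7):8);" *)
```

Because the skipped part can never contain a ":", the match cannot run past the next colon, so the default longest-match behavior does no harm here.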
POSTED BY: Robert Nachbar
Posted 3 years ago

Or, using RegularExpression:

StringReplace[string, RegularExpression["\\).*?:"] -> "):"]

StringReplace[string, RegularExpression["\\).*?:"] -> "):"] ==
 StringReplace[string, ")" ~~ Shortest[__] ~~ ":" -> "):"]
(* True *)
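
The "?" after ".*" is what makes the regular expression stop at the next ":". A minimal sketch of the difference, using a hypothetical toy string (toy is a made-up name and made-up data):

```mathematica
(* Hypothetical toy tree with the same shape as the example (assumed data) *)
toy = "(V:1,((A:2,B:3)_D_1_:4,(C:5,D:6)0_17:7)_D_0_5__:8);";

(* Greedy .* runs from the first ")" to the last ":" *)
StringReplace[toy, RegularExpression["\\).*:"] -> "):"]
(* "(V:1,((A:2,B:3):8);" *)

(* Lazy .*? stops at the next ":" after each ")" *)
StringReplace[toy, RegularExpression["\\).*?:"] -> "):"]
(* "(V:1,((A:2,B:3):4,(C:5,D:6):7):8);" *)
```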
POSTED BY: Rohit Namjoshi
Posted 3 years ago

Oh wow, I am totally unaware of Regular Expressions and Normal Expressions. Is there an easy way to understand what these two terms mean and which expressions are treated by Mathematica as which?

POSTED BY: copito411
Posted 3 years ago

Thanks so much! I wasn't aware of "Shortest". This helped a lot.

POSTED BY: copito411