Cases with string patterns and pure function

Posted 10 years ago

Hi,

I have a list of strings, let's call it "data". I want to extract those strings that match a particular pattern: "str1 str2 str3". I can accomplish that with

Cases[_?(StringMatchQ[#,NumberString~~Whitespace~~NumberString~~Whitespace~~NumberString]&)][data]

However, I would like to use the Cases feature to "format" the results in the following way (and this is why I am using Cases instead of Select):

Cases[x_?(StringMatchQ[#,NumberString~~Whitespace~~NumberString~~Whitespace~~NumberString]&)->StringSplit[x]][data]

which would yield a list structured as {{str1,str2,str3},...}. In this case I get the output I want by using StringSplit, but how can I (if I can at all) form more complicated rules without resorting to that StringSplit trick? More generally, how can I name parts of the pattern inside StringMatchQ so that I can use them later on the right-hand side of ->?

Notice that if I run

Cases[x_?(StringMatchQ[#,f:NumberString~~Whitespace~~g:NumberString~~Whitespace~~h:NumberString]&)->{f,g,h}][data]

I will not get the answer I want.

Thanks,

POSTED BY: Miguel Olivo-V
7 Replies
Posted 10 years ago

I think this meets your requirement for naming parts. Works for me.

Flatten[Map[
  StringCases[#, 
    f : NumberString ~~ Whitespace ~~ g : NumberString ~~ Whitespace ~~
       h : NumberString -> {f, g, h}] &, data], 1]
POSTED BY: Douglas Kubler
Posted 10 years ago

One more thing,

In the following example (which is very similar to yours),

Cases[x_?(StringMatchQ[#, 
       NumberString ~~ Whitespace ~~ NumberString ~~ Whitespace ~~ 
        NumberString] &) :> 
   StringCases[x, 
    f : NumberString ~~ Whitespace ~~ g : NumberString ~~ Whitespace ~~
       h : NumberString :> {f, g, h}]][data]

I don't actually need to "find" anything with StringCases, since the StringMatchQ test ensures that everything returned already has the pattern I want; I just need to apply the replacement rule. Is there a way to do just that? We are exploiting StringCases to do the replacement, but we wouldn't need that function if there were a way to do the replacement without "searching" for the same pattern a second time.

POSTED BY: Miguel Olivo-V

You're welcome!

I wonder whether the lack of an operator form for StringCases is just an oversight? The operator forms are all new in version 10 (and I have yet to get it in my head that they're there).

The world of strings and their patterns is an alternative universe from the world of Mathematica patterns. Functions like Cases predate by quite a few years Mathematica's inclusion of a robust set of string manipulation tools. Back in the day, all one could do was use StringMatchQ with the wildcards * and @.

POSTED BY: David Reiss
Posted 10 years ago

Nice, I also thought that it could be accomplished by combining Cases and StringCases as you show. In any case, why doesn't Cases take string patterns directly as arguments? I was expecting Mathematica (or the Wolfram Language now) to be nice enough to be able to compute

Cases[f:NumberString~~Whitespace~~g:NumberString~~Whitespace~~h:NumberString->{f,g,h}][data]

Also funny is that StringCases doesn't have an operator form.

Thanks a lot, David.

POSTED BY: Miguel Olivo-V

I meant to address that question. I don't think that's possible. The named patterns are "scoped" within the pure function and are not the patterns that Cases itself is matching against.
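A small sketch of this scoping point, assuming a three-element sample list (the list contents and the names `unbound` and `bound` are made up for illustration, and f, g, h are assumed to have no global values). In the first expression the names exist only inside the StringMatchQ test, so the rule attached to Cases has nothing to substitute. In the second, the rule is handed to StringCases itself, which does bind the names.

```wolfram
(* sample input, assumed for illustration *)
data = {"1 2 3", "g 8 9", "213 452 9876"};

(* f, g, h are named only inside the PatternTest function, so the outer
   rule never binds them and the symbols come back unsubstituted *)
unbound = Cases[data,
   x_?(StringMatchQ[#,
        f : NumberString ~~ Whitespace ~~ g : NumberString ~~
         Whitespace ~~ h : NumberString] &) :> {f, g, h}]

(* here the rule belongs to StringCases, so f, g, h are bound to the
   matched substrings; Flatten removes one level of per-string nesting *)
bound = Flatten[
   StringCases[data,
    f : NumberString ~~ Whitespace ~~ g : NumberString ~~
      Whitespace ~~ h : NumberString :> {f, g, h}], 1]
```

The second form returns the string triples for the two matching elements and silently skips "g 8 9".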

Also, I meant to comment that in your replacement rules for Cases and StringCases you probably want to use a delayed rule (:>) rather than an immediate rule (->), so that the right-hand side is evaluated only after the pattern names have been bound.
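To make the difference between the two kinds of rule concrete, here is a minimal sketch. The symbol names a and b, the test string "1 2", and the stale global value are all assumptions chosen only to expose the behavior.

```wolfram
(* a stale global value, assumed here to make the difference visible *)
a = "stale";

(* immediate rule: {a, b} is evaluated before any matching happens,
   so a has already been replaced by its global value *)
immediate = StringCases["1 2",
   a : NumberString ~~ Whitespace ~~ b : NumberString -> {a, b}]

(* delayed rule: {a, b} is held until a and b are bound to the match *)
delayed = StringCases["1 2",
   a : NumberString ~~ Whitespace ~~ b : NumberString :> {a, b}]
(* delayed is {{"1", "2"}} *)

Clear[a];
```

With no global values assigned, both forms often give the same answer, which is why the immediate rule can appear to work until a name happens to be defined elsewhere in the session.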

Also (again) my final expression above should probably read

StringCases[newData, 
 "X" ~~ f : NumberString ~~ Whitespace ~~ g : NumberString ~~ 
   Whitespace ~~ h : NumberString ~~ "X" :> ToExpression[{f, g, h}]]

so that you actually get numbers in the lists rather than strings with number characters.

Finally here is an approach using Cases and then modifying the result using StringCases

Cases[x_?(StringMatchQ[#, 
       NumberString ~~ Whitespace ~~ NumberString ~~ Whitespace ~~ 
        NumberString] &) :> 
   StringCases[x, 
    f : NumberString ~~ Whitespace ~~ g : NumberString ~~ Whitespace ~~
       h : NumberString :> Sequence @@ ToExpression@{f, g, h}]][data]

or a slightly different approach:

Cases[x_?(StringMatchQ[#, 
       NumberString ~~ Whitespace ~~ NumberString ~~ Whitespace ~~ 
        NumberString] &) :> 
   StringReplace[x, 
    f : NumberString ~~ Whitespace ~~ g : NumberString ~~ Whitespace ~~
       h : NumberString :> 
     ToExpression["{" <> f <> "," <> g <> "," <> h <> "}"]]][data]
POSTED BY: David Reiss
Posted 10 years ago

Thanks, that works. But I guess the proper question is whether I can name arguments inside a pure function that modifies a pattern (using PatternTest).
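One possible workaround, sketched under assumptions rather than offered as a true answer to the naming question: move the string pattern into a helper and use Condition in place of PatternTest, so the same helper both tests the element and supplies the named parts. The helper name parseTriple and the variable triples are hypothetical, and the sample list is assumed.

```wolfram
(* sample input, assumed for illustration *)
data = {"1 2 3", "g 8 9", "5 3  7", "8 45", "213 452 9876"};

(* hypothetical helper: gives {{f, g, h}} when the whole string matches,
   and {} otherwise *)
parseTriple[s_String] := StringCases[s,
   StartOfString ~~ f : NumberString ~~ Whitespace ~~ g : NumberString ~~
     Whitespace ~~ h : NumberString ~~ EndOfString :> {f, g, h}];

(* the Condition performs the test; the right-hand side calls the
   helper again to retrieve the named parts *)
triples = Cases[data,
   x_String /; parseTriple[x] =!= {} :> First[parseTriple[x]]]
(* {{"1", "2", "3"}, {"5", "3", "7"}, {"213", "452", "9876"}} *)
```

The names f, g, h are still bound by StringCases and not by Cases, and the helper runs twice per matching element, so this is a convenience more than a change in how the scoping works.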

POSTED BY: Miguel Olivo-V

You might take an "all string" approach like this (using the "XX" as a boundary between items in the list, though this assumes that there are no Xs in the strings in the data list):

In[1]:= data = {"1 2 3", "g 8 9", "5 3  7", "8 45", "213 452 9876"}

Out[1]= {"1 2 3", "g 8 9", "5 3  7", "8 45", "213 452 9876"}

In[2]:= newData = "XX" <> StringJoin@Riffle[data, "XX"] <> "XX"

Out[2]= "XX1 2 3XXg 8 9XX5 3  7XX8 45XX213 452 9876XX"

In[3]:= StringCases[newData, 
 "X" ~~ f : NumberString ~~ Whitespace ~~ g : NumberString ~~ 
   Whitespace ~~ h : NumberString ~~ "X" :> {f, g, h}]

Out[3]= {{"1", "2", "3"}, {"5", "3", "7"}, {"213", "452", "9876"}}
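A variant that avoids the boundary character, and with it the assumption that no X appears in the data, is sketched below. It relies on StringCases accepting a list of strings and on StartOfString and EndOfString to require that each whole string matches. The variable name anchored is made up for this example.

```wolfram
(* same sample data as in the session above *)
data = {"1 2 3", "g 8 9", "5 3  7", "8 45", "213 452 9876"};

(* StringCases is applied to each string in the list; the anchors force
   a whole-string match, and Flatten removes the per-string nesting *)
anchored = Flatten[
   StringCases[data,
    StartOfString ~~ f : NumberString ~~ Whitespace ~~ g : NumberString ~~
      Whitespace ~~ h : NumberString ~~ EndOfString :> {f, g, h}], 1]
(* {{"1", "2", "3"}, {"5", "3", "7"}, {"213", "452", "9876"}} *)
```

Without the anchors, a string such as "1 2 3 4" would still contribute a partial match, so they are what makes this equivalent to testing with StringMatchQ first.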

POSTED BY: David Reiss