How to remove lists in a list containing a variable string pattern (DropCases)

Posted 9 years ago

How can I remove the lists in a list that contain a variable string pattern? The question comes from the Wolfram Language How To: Clean Up Data Imported from a Website (http://reference.wolfram.com/language/howto/CleanUpDataImportedFromAWebsite.html).

In this example, the HTML page imported contained a summary row for each year that began with the string "TOTAL". The web page has been updated since the 'How To' was written and the summary row for each year now begins with the string "TOTAL YYYY", where "YYYY" is the year being summarized. See below:

tmp3 = {{"January 2015", "9,552.0", "38,158.4", 
  "-28,606.4"}, {"February 2015", "8,699.8", "31,240.1", 
  "-22,540.3"}, {"March 2015", "9,887.2", "41,121.9", 
  "-31,234.7"}, {"April 2015", "9,316.8", "35,795.1", 
  "-26,478.3"}, {"May 2015", "8,758.8", "39,211.2", 
  "-30,452.4"}, {"June 2015", "9,687.8", "41,145.1", 
  "-31,457.3"}, {"July 2015", "9,500.7", "41,077.2", 
  "-31,576.6"}, {"TOTAL 2015", "65,403.1", "267,749.0", 
  "-202,345.9"}, {"January 2014", "10,266.1", "38,278.6", 
  "-28,012.5"}, {"February 2014", "9,766.4", "30,617.1", "-20,850.7"}, ...

The original example used DeleteCases of the form:

tmp4 = DeleteCases[tmp3, {"TOTAL", __}];

to remove the total lists (rows) for each year. This no longer works: the pattern {"TOTAL", __} only matches a row whose first element is exactly the string "TOTAL", but each total row now starts with "TOTAL YYYY" instead. How would you recommend updating this 'How To' example to handle the newly imported data?
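A quick check shows why the literal pattern fails: {"TOTAL", __} compares the first element with the exact string "TOTAL", whereas a Condition (/;) with StringMatchQ tests it against a string pattern. This is a minimal sketch; totalRow is just an illustrative copy of the 2015 summary row above.

```mathematica
(* Illustrative copy of the 2015 summary row from the imported data *)
totalRow = {"TOTAL 2015", "65,403.1", "267,749.0", "-202,345.9"};

(* Literal first element: "TOTAL 2015" is not the string "TOTAL", so this gives False *)
oldMatch = MatchQ[totalRow, {"TOTAL", __}]

(* Condition plus string pattern: any first element starting with "TOTAL" matches, giving True *)
newMatch = MatchQ[totalRow, {s_String /; StringMatchQ[s, "TOTAL*"], __}]
```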

Thx!

POSTED BY: David Proffer
7 Replies

Rather than using StringMatchQ, one could use the nicer StringContainsQ or even StringStartsQ. These are slight variations of StringMatchQ...

POSTED BY: Sander Huisman
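For example, the StringStartsQ variant could look like the sketch below. It assumes a version of the Wolfram Language that has StringStartsQ (added in 10.4); rows is a hypothetical three-row excerpt of the data above.

```mathematica
(* Hypothetical three-row excerpt of the imported data *)
rows = {
   {"July 2015", "9,500.7", "41,077.2", "-31,576.6"},
   {"TOTAL 2015", "65,403.1", "267,749.0", "-202,345.9"},
   {"January 2014", "10,266.1", "38,278.6", "-28,012.5"}};

(* Delete every row whose first element is a string starting with "TOTAL" *)
cleaned = DeleteCases[rows, {s_String /; StringStartsQ[s, "TOTAL"], ___}]
```

StringContainsQ[s, "TOTAL"] would also work on this data, but it would additionally remove any row whose label contains "TOTAL" somewhere other than at the start.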
Posted 9 years ago

Thanks, Girish. I couldn't get "Total*" to work. Nice trick with _?(...).

Here's what I had before your post: something more general, in case the pattern gets a little more complex.

reTotal[n_] := StringMatchQ[n, RegularExpression["TOTAL .*"]]

DeleteCases[tmp3, {n_ /; reTotal[n], ___}]

Using your idea, I was able to shorten the above using the n_/; technique.

DeleteCases[tmp3, {n_ /; StringMatchQ[n, "TOTAL*"], ___}]
POSTED BY: Dana DeLouis
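One likely reason "Total*" did not work is that StringMatchQ is case-sensitive by default, so "Total*" does not match "TOTAL 2015". A minimal sketch using the IgnoreCase option; label and rows are illustrative names, and rows is a short excerpt of the data above.

```mathematica
label = "TOTAL 2015";

(* Case-sensitive by default: "Total" is not "TOTAL", so this gives False *)
caseSensitive = StringMatchQ[label, "Total*"]

(* IgnoreCase -> True lets "Total*" match "TOTAL 2015", giving True *)
caseInsensitive = StringMatchQ[label, "Total*", IgnoreCase -> True]

(* The same option used inside the DeleteCases condition *)
rows = {
   {"July 2015", "9,500.7", "41,077.2", "-31,576.6"},
   {"TOTAL 2015", "65,403.1", "267,749.0", "-202,345.9"}};
cleaned = DeleteCases[rows,
   {n_String /; StringMatchQ[n, "Total*", IgnoreCase -> True], ___}]
```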
Posted 9 years ago

Hey Dana, you can still use your function this way:

reTotal[n_] := StringMatchQ[n, "TOTAL*"];
DeleteCases[z1, {_?reTotal, __}, Infinity]
POSTED BY: Girish Arabale
Posted 9 years ago

Thank you. I was just looking at regular expressions today and thought I'd give them a try.
Using your ideas, here's what I tried, just for learning:
ignore case, and look for "Total " followed by 4 digits.

The following seems to work: :>)

fx[{n_String, __}] := StringMatchQ[n, RegularExpression["(?i)Total \\d{4}"]]

DeleteCases[v, _?fx]

Worked! :>)

Thanks for the feedback. :>)

POSTED BY: Dana DeLouis
Posted 9 years ago

With a PatternTest:

DeleteCases[z1, {_?(StringMatchQ[#, "TOTAL*"] &), __}, Infinity]
POSTED BY: Girish Arabale

You can try:

DeleteCases[tmp3, 
 {str_String /; StringMatchQ[str, "TOTAL " ~~ DigitCharacter ..], __}, 2]
POSTED BY: Gianluca Gorni
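Note that the list wrapper {..., __} matters. With a bare string pattern and level spec 2, DeleteCases removes only the matching label string inside the row and leaves the three totals behind; wrapping the condition in {..., __} removes the whole row. A small sketch; data and the helper totalQ are hypothetical names introduced for illustration.

```mathematica
data = {{"TOTAL 2015", "65,403.1", "267,749.0", "-202,345.9"}};

(* Hypothetical helper: True for labels such as "TOTAL 2015" *)
totalQ[s_String] := StringMatchQ[s, "TOTAL " ~~ DigitCharacter ..]

(* Bare string pattern at levels 1-2: only the "TOTAL 2015" element is removed *)
labelOnly = DeleteCases[data, s_String /; totalQ[s], 2]

(* Row pattern: the entire summary row is removed, leaving {} *)
wholeRow = DeleteCases[data, {s_String /; totalQ[s], __}]
```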
Posted 9 years ago

Thank you, Gianluca! That looks like the solution! Many thanks!

z1 = {{"January 2015", "9,552.0", "38,158.4", 
   "-28,606.4"}, {"February 2015", "8,699.8", "31,240.1", 
   "-22,540.3"}, {"March 2015", "9,887.2", "41,121.9", 
   "-31,234.7"}, {"April 2015", "9,316.8", "35,795.1", 
   "-26,478.3"}, {"May 2015", "8,758.8", "39,211.2", 
   "-30,452.4"}, {"June 2015", "9,687.8", "41,145.1", 
   "-31,457.3"}, {"July 2015", "9,500.7", "41,077.2", 
   "-31,576.6"}, {"TOTAL 2015", "65,403.1", "267,749.0", 
   "-202,345.9"}, {"January 2014", "10,266.1", "38,278.6", 
   "-28,012.5"}, {"February 2014", "9,766.4", "30,617.1", 
   "-20,850.7"}, {"March 2014", "10,922.0", "31,325.9", 
   "-20,404.0"}, {"April 2014", "9,027.2", "36,327.6", "-27,300.4"}}

z2 = DeleteCases[
  z1, {str_String /; 
    StringMatchQ[str, "TOTAL " ~~ DigitCharacter ..], __}, 2]

{{"January 2015", "9,552.0", "38,158.4", 
  "-28,606.4"}, {"February 2015", "8,699.8", "31,240.1", 
  "-22,540.3"}, {"March 2015", "9,887.2", "41,121.9", 
  "-31,234.7"}, {"April 2015", "9,316.8", "35,795.1", 
  "-26,478.3"}, {"May 2015", "8,758.8", "39,211.2", 
  "-30,452.4"}, {"June 2015", "9,687.8", "41,145.1", 
  "-31,457.3"}, {"July 2015", "9,500.7", "41,077.2", 
  "-31,576.6"}, {"January 2014", "10,266.1", "38,278.6", 
  "-28,012.5"}, {"February 2014", "9,766.4", "30,617.1", 
  "-20,850.7"}, {"March 2014", "10,922.0", "31,325.9", 
  "-20,404.0"}, {"April 2014", "9,027.2", "36,327.6", "-27,300.4"}}
POSTED BY: David Proffer