FindList AND Rule to find lines with ALL (not just any) Specified Strings

Posted 11 years ago

I am trying to parse a file for lines that contain logical combinations of strings, and I want FindList to follow that logic. As an example, if I have the following:

"some line.............a,1,2,b"
"some line..............a,1,b,3,c"    --> want FindList to pull this line out of the file for a&&b&&c but not the other two since a,b,and c are not all present there
"some line..............3,b,c,4"

Is there an easy way to implement a rule for FindList or do I need to implement my own FindList function? It would be nice if I could add a Boolean rule to FindList search strings.

FindList["myfile.csv", {"wantThisString",",AndThisString,",",AndThisString to all be present to add line to list,"}];

thanks

POSTED BY: Bob Stephens
8 Replies
Posted 11 years ago

OK, thank you very much. This solution would address scenarios with larger files (larger than the one I used in this example). I am not sure I understand the 80's comment. In any case, my comment that it would be nice if this logic could be applied as a rule to FindList was about incremental reads, but also about getting a cleaner and simpler solution by adding some conditional logic to FindList.

thanks again for your responses

POSTED BY: Bob Stephens

An intermediate approach: use FindList as far as possible, then continue filtering with Select.

In[64]:= Select[
  StringSplit[#, ",", All] & /@ FindList["test_file.csv", {"Type3,"}],
   StringMatchQ[#[[3]], "1"] || StringMatchQ[#[[3]], "6"] &] // Length

Out[64]= 598
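
The same two-step idea also covers the a && b && c case from the original question. What follows is only a minimal sketch, assuming a version that has SubsetQ (10.0 or later) and assuming the strings of interest are whole comma-separated fields. The helper name containsAllQ is hypothetical, and the sample lines are simplified stand-ins for what FindList would return, not data from the thread.

```mathematica
(* Hypothetical helper: True only if every key is one of the line's
   comma-separated fields *)
containsAllQ[line_String, keys : {__String}] :=
  SubsetQ[StringSplit[line, ","], keys]

(* Simplified stand-ins for lines returned by FindList *)
lines = {
   "some line,a,1,2,b",
   "some line,a,1,b,3,c",
   "some line,3,b,c,4"};

(* Keep only the lines in which a, b and c are all present *)
result = Select[lines, containsAllQ[#, {"a", "b", "c"}] &]
```

With a real file, one would presumably let FindList do the coarse search on one of the strings and pass its result to Select in place of lines, for example Select[FindList["myfile.csv", "a"], containsAllQ[#, {"a", "b", "c"}] &].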
POSTED BY: Udo Krause
Posted 11 years ago

Yes, I like this approach; it is much closer to adding conditional logic to FindList.

thanks!

POSTED BY: Bob Stephens

This is the application area of regular expressions. If you provide an example file to parse, some more hints might follow.
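
As a rough illustration of what a regular expression can do here, the sketch below applies RegularExpression patterns to lines that are already in memory (for example, lines returned by FindList), since FindList itself searches for literal text. It assumes the type is the first field and the index the third. Two of the sample rows are taken from output shown in this thread; the other two are made up. An AND of several substrings can be expressed with lookaheads, which the PCRE-style syntax accepted by RegularExpression supports.

```mathematica
rows = {
   "Type3,OK,1,-44.57,-40.68",   (* appears in this thread *)
   "Type3,OK,6,-44.6,-41.83",    (* appears in this thread *)
   "Type3,OK,2,-44.00,-40.00",   (* made up for illustration *)
   "Type1,OK,1,-44.00,-40.00"};  (* made up for illustration *)

(* Type3 in field 1 AND (1 OR 6) in field 3; StringMatchQ must match
   the whole line, so no explicit anchors are needed *)
typeAndIndex = RegularExpression["Type3,[^,]*,(1|6),.*"];
byColumn = Select[rows, StringMatchQ[#, typeAndIndex] &]

(* Generic AND: each (?=.*text) lookahead must succeed from the start
   of the line, so both substrings have to occur somewhere in it *)
allOf = RegularExpression["(?=.*Type3)(?=.*,6,).*"];
byLookahead = Select[rows, StringMatchQ[#, allOf] &]
```

The column-based pattern is stricter than the lookahead version, because it ties each condition to a specific field instead of accepting the text anywhere in the line.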

POSTED BY: Udo Krause
Posted 11 years ago

sure - in the enclosed example file I want to use FindList to select all lines containing "Type3" AND {1 OR 6} for Index. So I assume you are saying FindList would look something like the line below?

I noticed the documentation for RegularExpression shows p1 | p2 (a string matching p1 OR p2), which may take care of selecting index 1 or 6 in my example, but there does not appear to be an option for AND.

Please see enclosed file.

thanks

FindList["D:\\test_file.csv", <RegularExpression selecting  "Type3" AND {"1" OR "6" in Index column}> ]
Attachments:
POSTED BY: Bob Stephens

That's it

 In[1]:= SetDirectory[FileNameJoin[{NotebookDirectory[], "test"}]]
 Out[1]= "N:\\Udo\\Abt_N\\test"

 In[22]:= Select[
          Import["test_file.csv", "Data"], 
          (StringMatchQ[#[[1]], "Type3"] && ((#[[3]] == 1) || (#[[3]] == 6))) &] // Short[#, 17] &

 Out[22]//Short= 
 {{Type3,OK,1,-44.57,-40.68},{Type3,OK,6,-44.6,-41.83},
  {Type3,OK,1,-44.61,-39.72},{Type3,OK,6,-44.53,-41.44},
  {Type3,OK,1,-44.56,-39.92},{Type3,OK,6,-44.58,-40.83},
  {Type3,OK,1,-44.54,-41.47},{Type3,OK,6,-44.51,-41.17},
  {Type3,OK,1,-44.56,-39.89},{Type3,OK,6,-44.57,-41.47},
  {Type3,OK,1,-44.57,-40.62},{Type3,OK,6,-44.55,-41.61},
  {Type3,OK,1,-44.54,-40.41},{Type3,OK,6,-44.57,-41.26},
  {Type3,OK,1,-44.56,-39.98},<<568>>,{Type3,OK,6,-44.72,-41.84},
  {Type3,OK,1,-44.7,-40.06},{Type3,OK,6,-44.7,-40.89},
  {Type3,OK,1,-44.71,-40.85},{Type3,OK,6,-44.69,-42.13},
  {Type3,OK,1,-44.71,-40.68},{Type3,OK,6,-44.73,-41.68},
  {Type3,OK,1,-44.74,-40.77},{Type3,OK,6,-44.71,-41.36},
  {Type3,OK,1,-44.76,-40.94},{Type3,OK,6,-44.78,-41.68},
  {Type3,OK,1,-44.79,-39.98},{Type3,OK,6,-44.75,-42.75},
  {Type3,OK,1,-44.8,-40.52},{Type3,OK,6,-44.8,-41.7}}

Import recognizes the CSV format and builds one list per row, from which one simply selects what is needed.

The result could be exported back to the file system as CSV file, of course.

POSTED BY: Udo Krause
Posted 11 years ago

Thanks Udo, that does work. Since some of the files are rather large, it would be nice if this logic could be applied to FindList, so that the file is read incrementally instead of being imported into memory in its entirety.

POSTED BY: Bob Stephens

OMG, you turn me back to the eighties ... The following works, but seems by no means faster than the CSV Import. Have fun with it nevertheless.

Clear[stephensReader]
stephensReader[s_String, a1_String, v1_Integer:1, v2_Integer:6] := 
 Module[{str, r, o = 0, oo = 0, n },
   If[!FileExistsQ[s],
    Print["Sorry, cannot find file \"", s, "\". Bye."];
    Return[$Failed]
   ];
   str = OpenRead[s, BinaryFormat -> True];
   While[True,
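    r = ReadLine[str];
    (* === always yields True or False; == stays unevaluated for a string *)
    If[r === EndOfFile, Break[]];
    o++; (* count every line read *)
    r = StringSplit[r, ",", All];
    If[Length[r] < 3, Continue[]]; (* skip lines without an index column *)
    n = ToExpression[r[[3]]];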
    If[StringMatchQ[r[[1]], a1] && ((n == v1) || (n == v2)),
     oo++;
     Print[r]
    ]
   ];
   Close[str];
   Print["Lines read: ", o, " Lines selected: ", oo]
  ] /; StringLength[s] > 0 && StringLength[a1] > 0


stephensReader["test_file.csv", "Type3"]

{Type3,OK,1,-44.57,-40.68}
{Type3,OK,6,-44.6,-41.83}
{Type3,OK,1,-44.61,-39.72}
{Type3,OK,6,-44.53,-41.44}
{Type3,OK,1,-44.56,-39.92}
{Type3,OK,6,-44.58,-40.83}
{Type3,OK,1,-44.54,-41.47}
{Type3,OK,6,-44.51,-41.17}
{Type3,OK,1,-44.56,-39.89}
<snip>
{Type3,OK,6,-44.71,-41.36}
{Type3,OK,1,-44.76,-40.94}
{Type3,OK,6,-44.78,-41.68}
{Type3,OK,1,-44.79,-39.98}
{Type3,OK,6,-44.75,-42.75}
{Type3,OK,1,-44.8,-40.52}
{Type3,OK,6,-44.8,-41.7}

Lines read: 6327 Lines selected: 598

P.S.: Printing the rows is for demonstration only. To be useful, stephensReader should, as usual, return the list of matching rows:

Clear[stephensReader]
stephensReader[s_String, a1_String, v1_Integer: 1, v2_Integer: 6] := 
 Module[{str, r, o = 0, oo = 0, n, resL = {}},
   If[! FileExistsQ[s],
    Print["Sorry, cannot find file \"", s, "\". Bye."];
    Return[$Failed]
   ];
   str = OpenRead[s, BinaryFormat -> True];
   While[True,
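    r = ReadLine[str];
    (* === always yields True or False; == stays unevaluated for a string *)
    If[r === EndOfFile, Break[]];
    o++; (* count every line read *)
    r = StringSplit[r, ",", All];
    If[Length[r] < 3, Continue[]]; (* skip lines without an index column *)
    n = ToExpression[r[[3]]];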
    If[StringMatchQ[r[[1]], a1] && ((n == v1) || (n == v2)),
     oo++;
     resL = Join[resL, {r}];
    ]
   ];
   Close[str];
   Print["Lines read: ", o, "| Lines selected: ", oo];
   resL
 ] /; StringLength[s] > 0 && StringLength[a1] > 0

In[65]:= stephensReader["test_file.csv", "Type3"] // Length
During evaluation of In[65]:= Lines read: 6327| Lines selected: 598
Out[65]= 598
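
The line-by-line reader can be generalized so that the selection logic is passed in as a predicate. The following is just a sketch under stated assumptions: selectLines and wantedQ are hypothetical names, the demo file is written to the temporary directory purely for illustration, and two of its four rows are made up. Sow and Reap collect the matches, which avoids rebuilding the result list on every hit the way Join[resL, {r}] does.

```mathematica
(* Hypothetical helper: stream a file line by line and keep the lines
   for which test gives True; only matching lines are held in memory *)
selectLines[file_String, test_] :=
  Module[{stream, line, sown},
   stream = OpenRead[file];
   sown = Reap[
      While[(line = ReadLine[stream]) =!= EndOfFile,
       If[TrueQ[test[line]], Sow[line]]]][[2]];
   Close[stream];
   Flatten[sown, 1]]  (* {} when nothing matched *)

(* Small demo file; rows 2 and 3 are made up for illustration *)
demoFile = FileNameJoin[{$TemporaryDirectory, "stephens_demo.csv"}];
Export[demoFile,
  StringRiffle[{
    "Type3,OK,1,-44.57,-40.68",
    "Type3,OK,2,-44.00,-40.00",
    "Type1,OK,6,-44.00,-40.00",
    "Type3,OK,6,-44.6,-41.83"}, "\n"], "Text"];

(* Type3 in field 1 AND (1 OR 6) in field 3, compared as strings so
   that nothing read from the file is evaluated *)
wantedQ[line_String] :=
  With[{f = StringSplit[line, ","]},
   Length[f] >= 3 && f[[1]] === "Type3" && MemberQ[{"1", "6"}, f[[3]]]]

selected = selectLines[demoFile, wantedQ]
```

Comparing the index as a string rather than through ToExpression is a deliberate design choice in this sketch: it keeps arbitrary file content from being evaluated as code.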
POSTED BY: Udo Krause