# Line count/patterns

Posted 10 years ago
Given: the contents of a text file, which may have Unix or Windows line endings, as a byte list.

Task: compute the line count.

Example: `bytes = {65, 13, 10, 66, 67, 13, 10}`

I thought `Count[bytes, PatternSequence[13, 10]]` might do the trick, but it gives zero. `Count[bytes, 10 | 13]/2` works if I know that I'm getting Windows line endings. I don't want an iterative approach.
Posted 10 years ago
I think this works: `Length[Split[bytes, #1 != 10 && (#1 != 13 || #2 == 10) &]]`

It's a little counterintuitive in this context that `Split` splits when the test function returns `False`; thus the test is `!(#1 == 10 || (#1 == 13 && #2 != 10))`, written to avoid splitting on a 13 when it's followed by a 10. A 13 on its own does split, because I assume you also want Mac line endings to work.
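A quick check of the `Split` idea on the example from the question may make the "splits on `False`" behavior easier to see. This is only a sketch; the names `keepTogether` and `lines` are made up for illustration.

```mathematica
bytes = {65, 13, 10, 66, 67, 13, 10};

(* Split keeps two adjacent elements in the same run while the test is True *)
keepTogether = #1 != 10 && (#1 != 13 || #2 == 10) &;

lines = Split[bytes, keepTogether]
(* {{65, 13, 10}, {66, 67, 13, 10}} *)

Length[lines]
(* 2 *)
```

Each run keeps its terminator, so a final line with no terminator still forms its own run and is counted.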
Posted 10 years ago
If you assume it's consistent within the list (either `{10}`, `{13, 10}`, or `{13}` for every line ending), then something like `Max[Count[bytes, 10], Count[bytes, 13]]`, which gives 2 for the example, is probably pretty efficient.

If you want to support mix and match, something like this should work: `StringCount[FromCharacterCode[bytes], RegularExpression["\r\n?|\n"]]`, which also gives 2.

`Count[bytes, PatternSequence[13, 10]]` doesn't work because `Count` doesn't support sequence matching, only element matching. To do that kind of thing for Mathematica expressions, one typical approach is `Count[Partition[bytes, 2, 1, 1], {13, 10}]`, which gives 2 here, but it's not very efficient (and wouldn't cover the other cases anyway).
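For readers on a newer release: assuming version 10.1 or later is available (an assumption, since the thread does not say which version is in use), `SequenceCount` does the sequence matching that `Count` cannot. A minimal sketch for the Windows-only case:

```mathematica
bytes = {65, 13, 10, 66, 67, 13, 10};

(* Count tests each element on its own, so a two-element sequence never matches *)
Count[bytes, PatternSequence[13, 10]]
(* 0 *)

(* SequenceCount looks for the sublist {13, 10} inside the list *)
SequenceCount[bytes, {13, 10}]
(* 2 *)
```

Like the `Partition` version, this only counts `{13, 10}` pairs, so it does not cover bare 10 or bare 13 endings.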
Posted 10 years ago
For those who assume the last line is terminated by a newline, here's another approach: `newlines = 1 - Unitize[bytes - 10]; carriageReturns = 1 - Unitize[bytes - 13]; both = Rest[newlines]*Most[carriageReturns]; Total[newlines] + Total[carriageReturns] - Total[both]`

I haven't checked the relative efficiency.
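To see what the `Unitize` trick computes, here are the intermediate vectors for the example from the question. This is a sketch of the same arithmetic; the names `lf`, `cr`, and `crlf` are chosen here for illustration only.

```mathematica
bytes = {65, 13, 10, 66, 67, 13, 10};

(* Unitize maps nonzero values to 1 and zero to 0, so 1 - Unitize[...] marks matches *)
lf = 1 - Unitize[bytes - 10]
(* {0, 0, 1, 0, 0, 0, 1} *)

cr = 1 - Unitize[bytes - 13]
(* {0, 1, 0, 0, 0, 1, 0} *)

(* a 13 immediately followed by a 10: align each element with its successor *)
crlf = Rest[lf]*Most[cr]
(* {0, 1, 0, 0, 0, 1} *)

(* every 10 and every 13 ends a line, minus the pairs that would be counted twice *)
Total[lf] + Total[cr] - Total[crlf]
(* 2 *)
```

Because this counts terminators rather than lines, a final line without a terminator is not included.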
Posted 10 years ago
What would you expect for the following? Or are you reasonably certain this won't come up? `bytes = {65, 13, 10, 66, 67, 10}`
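For what it's worth, here is how the approaches from this thread behave on that mixed input. It is written as a sketch, with the list renamed to `mixed` to keep it apart from the original example.

```mathematica
mixed = {65, 13, 10, 66, 67, 10};  (* CRLF after the first line, bare LF after the second *)

(* the Windows-only shortcut from the question no longer gives an integer *)
Count[mixed, 10 | 13]/2
(* 3/2 *)

(* correct here, but only because no line ends in a bare 13 *)
Max[Count[mixed, 10], Count[mixed, 13]]
(* 2 *)

(* the regex and Split versions treat each terminator separately *)
StringCount[FromCharacterCode[mixed], RegularExpression["\r\n?|\n"]]
(* 2 *)

Length[Split[mixed, #1 != 10 && (#1 != 13 || #2 == 10) &]]
(* 2 *)
```

The `Max` version does break once bare 13 and bare 10 endings are mixed: for `{65, 13, 66, 10}` it gives 1, although there are two line endings.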