help cleaning up a list

Posted 11 years ago

Hello all, I am a new Mathematica user and I very much like the power of the platform for data analysis and visualization. I've run into a problem importing data from an instrument with limited formatting capabilities: it fills empty fields with dashes, and I need to strip them out. The data comes in as a list of lists, and some of the lists consist entirely of dashes and need to be removed. A very simplified example would be:

{{1,2,3,---},{4,5,---},{---,---,---},{7,8,9,10,---}}

Which I would like to reduce to:

{{1,2,3},{4,5},{7,8,9,10}}

I can handle the trailing dashes in the first, second, and fourth lists, but removing the list that is all dashes has me stumped. And since the actual data consists of over 100 sets, each with anywhere from 10 to 1000 items, a generic automated method is a must. Any suggestions would be greatly appreciated. Thanks, Mike.

POSTED BY: Michael Marino
8 Replies

Great!

POSTED BY: David Reiss
Posted 11 years ago

Of course! I feel silly for not realizing it sooner. TableForm had me thinking that the data was in columns when rows were the more accurate way to think about it. Transposing the original array before using the DeleteCases function has everything working perfectly now. Thanks again David for all your help!
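A minimal sketch of the transpose-then-delete fix, assuming (hypothetically) that Import returned a rectangular table in which each column is one data set, padded with "---" strings. The array `raw` below is invented for illustration. Transposing turns each column into a sublist, after which DeleteCases can strip the padding at level 2 and then drop any set left empty.

```mathematica
(* Hypothetical import: each column is one data set, padded with "---" *)
raw = {
   {1, 4, "---", 7},
   {2, 5, "---", 8},
   {3, "---", "---", 9},
   {"---", "---", "---", 10}};

(* Transpose so that each data set becomes one sublist *)
sets = Transpose[raw];

(* Remove the "---" padding inside each set, then drop sets that are now empty *)
cleaned = DeleteCases[DeleteCases[sets, "---", {2}], {}]
```

Restricting the first DeleteCases to level `{2}` only touches the elements inside each set, and the second call (default level 1) only removes whole sets.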

Chris, that is some useful information that I will definitely use in the future.

Best regards,

Mike.

POSTED BY: Michael Marino
POSTED BY: David Reiss

A straight Import of your data file may be what is causing the sub-lists to be appended. There are a few lower-level functions that give finer control over how the data is read from the file: Read and ReadList.

To see where the sub-list confusion is coming from, start with ReadList using Record and check whether it gives you the expected data structure. If it does, then Import is getting confused. If it does not, then your instrument's recording software is omitting record-separator delimiters in the data file.
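The comparison can be sketched as below. The file path and its contents are hypothetical stand-ins for the instrument output (written to a temporary file so the snippet runs on its own); with a real file you would skip the Export step and point `file` at your data.

```mathematica
(* Hypothetical sample file standing in for the instrument output *)
file = FileNameJoin[{$TemporaryDirectory, "instrument-sample.txt"}];
Export[file, "1 2 3 ---\n4 5 ---\n--- --- ---\n7 8 9 10 ---", "Text"];

(* One string per record (line), exactly as stored in the file *)
records = ReadList[file, Record];

(* What Import makes of the same file *)
imported = Import[file, "Table"];

(* If both counts match the number of data sets you expect,
   the record separators in the file are fine *)
{Length[records], Length[imported]}
```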

Things to try:

ReadList[ file, Record]
ReadList[ file, Record, RecordSeparators-> {"\r\n", "\n", "\r"}]
ReadList[ file, Word]
ReadList[ file, Word, WordSeparators -> {", ", ",", " ", "\t"}]
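If ReadList sees the records correctly, the option RecordLists -> True keeps one sublist of words per record, so the cleanup can happen right at read time. This is a sketch under assumptions: the sample file is hypothetical, fields are separated by spaces (ReadList's default WordSeparators), and every remaining field is a number that ToExpression can convert.

```mathematica
(* Hypothetical sample file standing in for the instrument output *)
file = FileNameJoin[{$TemporaryDirectory, "instrument-sample.txt"}];
Export[file, "1 2 3 ---\n4 5 ---\n--- --- ---\n7 8 9 10 ---", "Text"];

(* RecordLists -> True gives one sublist of words (strings) per line *)
words = ReadList[file, Word, RecordLists -> True];

(* Drop the dash placeholders and any record left empty,
   then convert each remaining numeric string into a number *)
cleaned = Map[ToExpression, DeleteCases[DeleteCases[words, "---", {2}], {}], {2}]
```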

Read is like a microscope: it lets you pull the data from the file as a stream, one chunk at a time.

Things to try:

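str = OpenRead[file]
Read[str, Word]
StreamPosition[str]

(* jump to the end; Read then returns EndOfFile *)
SetStreamPosition[str, Infinity];
Read[str, Byte]
endoffile = StreamPosition[str]

(* rewind and show the first 256 bytes (fewer for a shorter file) as
   characters; stopping at endoffile avoids an endless loop at end of file *)
SetStreamPosition[str, 0];
Reap[
  While[StreamPosition[str] < Min[256, endoffile],
   Sow[FromCharacterCode@Read[str, Byte]]]
  ][[2]]

Close[str]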
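The same stream can also be consumed one record at a time and cleaned on the fly. A minimal sketch, assuming a hypothetical space-separated file in which every non-dash field is a number; records that contain nothing but dashes are simply never collected.

```mathematica
(* Hypothetical sample file standing in for the instrument output *)
file = FileNameJoin[{$TemporaryDirectory, "instrument-sample.txt"}];
Export[file, "1 2 3 ---\n4 5 ---\n--- --- ---\n7 8 9 10 ---", "Text"];

str = OpenRead[file];
cleaned = Flatten[
   Reap[
     (* read one record (line) at a time until the end of the file *)
     While[(rec = Read[str, Record]) =!= EndOfFile,
      (* split the line on whitespace and drop the dash placeholders *)
      fields = DeleteCases[StringSplit[rec], "---"];
      (* collect only records that still contain data *)
      If[fields =!= {}, Sow[ToExpression /@ fields]]]
     ][[2]],
   1];
Close[str];
cleaned
```

The `Flatten[..., 1]` unwraps the extra list level that Reap adds, and yields `{}` if nothing was sown.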

Hmmmm... it's hard to say without an example. It is possible that the import from the text file is creating a data structure somewhat different from what you are expecting. Does the issue happen with all 3 approaches that I suggested? And, if so, does it happen in the same way? How large is the file that shows the problem? If it is small enough to email (less than 5 MB), then email it to the address posted in the "contact" section of my website (found by clicking on my name here).

By the way, the reason you may have thought that the three dashes were not strings is that the quotation marks around strings are suppressed by default in output cells. So if you evaluate this in an input cell

"I am a string"

The output cell will look like this

I am a string
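To see the hidden quotation marks, or to check directly whether the imported dashes are strings, something like the following should work. The small `data` list is a made-up fragment of the example above.

```mathematica
x = "I am a string";

(* InputForm shows the quotation marks that the default output hides *)
InputForm[x]

(* List every string anywhere in the data; numbers are not matched *)
data = {{1, 2, 3, "---"}, {4, 5, "---"}};
Cases[data, _String, Infinity]
```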
POSTED BY: David Reiss
Posted 11 years ago
POSTED BY: Michael Marino

Is the list that you obtain when it is imported into Mathematica this,

{{1,2,3,---},{4,5,---},{---,---,---},{7,8,9,10,---}}

or is it this:

{{1,2,3,"---"},{4,5,"---"},{"---","---","---"},{7,8,9,10,"---"}}

I.e., are the dashes imported as strings of dashes? If they are not strings, the expressions (three dashes in a row) are not syntactically valid, so I will assume that they appear as strings. If so, then the following will work for you:

In[1]:= test = {{1, 2, 3, "---"}, {4, 5, "---"}, {"---", "---", "---"}, {7, 8, 9, 10, "---"}}

In[2]:= DeleteCases[test /. "---" -> Sequence[], {}, Infinity]

Out[2]= {{1, 2, 3}, {4, 5}, {7, 8, 9, 10}}

Another way to do it might be

In[3]:= DeleteCases[DeleteCases[test, "---", Infinity], {}, Infinity]

Out[3]= {{1, 2, 3}, {4, 5}, {7, 8, 9, 10}}

And here's another....

In[4]:= (test /. "---" -> Sequence[]) /. {} -> Sequence[]

Out[4]= {{1, 2, 3}, {4, 5}, {7, 8, 9, 10}}
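If the instrument might pad with a varying number of dashes (an assumption; the thread only shows "---"), a regular expression can match any string made entirely of dashes. The helper name `dashQ` and the sample list are hypothetical.

```mathematica
test = {{1, 2, 3, "---"}, {4, 5, "--"}, {"---", "----", "---"}, {7, 8, 9, 10, "-"}};

(* Hypothetical helper: True for strings consisting only of one or more dashes *)
dashQ[s_String] := StringMatchQ[s, RegularExpression["-+"]];
dashQ[_] := False;

(* Remove dash-only entries inside each sublist, then drop empty sublists *)
cleaned = DeleteCases[DeleteCases[test, _?dashQ, {2}], {}]
```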
POSTED BY: David Reiss

Please see the function DeleteCases:

DeleteCases[{1, 2, 3, "---"}, "---"]
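Applied to the full list of lists, one way to use this is to map the single-list DeleteCases over every sublist and then remove the sublists left empty. A short sketch using the thread's example data:

```mathematica
test = {{1, 2, 3, "---"}, {4, 5, "---"}, {"---", "---", "---"}, {7, 8, 9, 10, "---"}};

(* Strip "---" from each sublist, then drop the sublists that are now empty *)
cleaned = DeleteCases[Map[DeleteCases[#, "---"] &, test], {}]
```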
POSTED BY: Sean Clarke