
Ignore portions that are part of a CSV file when importing?

Posted 8 years ago

Hi, I'm trying to build an import and data-processing tool in the Wolfram Development Platform and have run into a small snag (most likely due to my inexperience with the Wolfram Language).

What I am trying to do is import a CSV file (see the snippet below) that contains comments and markers describing the data. If I manually strip out the comments and markers, I can easily import the data and create graphs from it (so far so good). Since I have a considerable number of these files, I want to automate that. Comments are marked with #, and the dataset begins with $V and ends with $E.

So here are my questions:

Q1. Can I tell the importer (or maybe pre/post-process the file) to ignore elements that are not part of the data?

Q2. Can I extract the comments based on the markers (maybe parse the file? scanf?) and add them to a specified variable for further use?

Thanks for any pointers you might give me. The options in the Wolfram Language are a bit overwhelming and getting lost is easy.

#A Albert
#B 0.5 %
#C 0.01726
$V
-25.0,-25.0,5.0,0.053609
-24.0,-25.0,5.0,0.065964
-23.0,-25.0,5.0,0.051466
-22.0,-25.0,5.0,0.053896
$E

[edit] So I made some progress, based on the following code:

rawDataSet = ReadList["test.txt", Record, RecordSeparators -> {"$V", "$E"}];
measurementSet = Table[rawDataSet[[2]]];
Print[measurementSet]

This results in a Table object (at least from what I can gather; what is the way to get information on an object?). But it appears that the knowledge that this is a CSV list is lost; it's just a flat string. I tried StringSplit[measurementSet, ","], but that didn't seem to give me a 2D array of items to plot a graph from. Any suggestions?
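One possible reason the StringSplit attempt did not yield a 2D array is that splitting only on "," leaves the line breaks embedded in the pieces. A minimal sketch, assuming the record between $V and $E arrives as one string with newline-separated rows (the variable `block` is a hypothetical stand-in for that string, not output copied from ReadList):

```wolfram
(* Hypothetical stand-in for the flat string found between $V and $E *)
block = "\n-25.0,-25.0,5.0,0.053609\n-24.0,-25.0,5.0,0.065964\n";

(* Trim the surrounding whitespace, then let the CSV importer
   split the text into rows and fields and convert them to numbers *)
rows = ImportString[StringTrim[block], "CSV"];

(* rows should now be a list of lists, one inner list per line *)
Dimensions[rows]
```

ImportString applies the same parsing as Import, so the result can be passed to plotting functions in the same way as a directly imported CSV file.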

POSTED BY: Erick van Rijk
6 Replies
Posted 8 years ago

Hi David, many thanks for the example. It will take me a while to study the method you made, since there are things that are new to me in there.

Thanks for the help! Erick

POSTED BY: Erick van Rijk
Posted 8 years ago

Hi David,

Thank you very much for the detailed examples you provided. They allowed me to continue playing with processing the raw data. Do you have a suggestion for turning the # statements into variables? Perhaps using Cases?

thanks Erick
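As a rough illustration of the Cases idea, the sketch below uses FirstCase (a close relative of Cases that returns only the first match) on a list shaped like the raw import shown elsewhere in this thread. The variable names `name`, `percent` and `factor` are hypothetical, and the meaning attached to each tag is an assumption:

```wolfram
(* Same shape as the raw import of test.txt, shortened for illustration *)
raw = {{"#A", "Albert"}, {"#B", 0.5, "%"}, {"#C", 0.01726},
   {"$V"}, {-25., -25., 5., 0.053609}, {"$E"}};

(* Pull the value that follows each tag; a tag that is absent
   yields Missing["NotFound"] instead of an error *)
name = FirstCase[raw, {"#A", value_} :> value];
percent = FirstCase[raw, {"#B", value_, "%"} :> value];
factor = FirstCase[raw, {"#C", value_} :> value];

{name, percent, factor}
```

Replacing FirstCase with Cases returns a list of all matching values, which may be preferable if a tag can occur more than once in a file.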

POSTED BY: Updating Name
Posted 8 years ago

Hi Erick,

Attached is a notebook which defines a function for importing data from such a file. It accepts a filename and returns a list structure containing the imported elements. Following that are examples of using it to assign the data to arbitrary variables, or to convert the control tags into a form suitable for use as a Mathematica symbol and assign the data elements to them. It assumes a form similar to "test.txt" and does no error checking. It also assumes the data file is in the same directory as the notebook, although a full path could be used.
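For readers without the notebook, here is a minimal sketch of what such a function could look like. It is not the function from the attachment: the name `importTagged` and the keys "Comments" and "Data" are hypothetical, and like the description above it assumes a file shaped like "test.txt" and does no error checking.

```wolfram
(* importTagged is a hypothetical name, not the notebook's function *)
importTagged[file_String] := Module[{raw, data, tags},
  (* Split fields on both spaces and commas so numbers import as numbers *)
  raw = Import[file, "Table", "FieldSeparators" -> {" ", ","}];
  (* Rows made up entirely of numbers are the measurements *)
  data = Cases[raw, {__?NumericQ}];
  (* Rows whose first field starts with "#" are the comment tags *)
  tags = Cases[raw,
    {t_String /; StringStartsQ[t, "#"], v___} :> (StringDrop[t, 1] -> {v})];
  <|"Comments" -> Association[tags], "Data" -> data|>
]

(* Usage, assuming test.txt is in the current directory *)
result = importTagged["test.txt"];
result["Comments"]["A"]
result["Data"]
```

Returning an Association keeps the tags as string keys, which avoids creating symbols from file contents; converting them to symbols, as described above, would be an additional step.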

Kind regards, David

POSTED BY: David Keith
Posted 8 years ago

I generally do this by importing all of the data, so that I have the numbers imported as numbers, and then select the data from the raw import. In the code below two methods are shown. The first assumes that the data is a single block of lines, each with 4 reals. The second method finds the positions of the start and end delimiters and takes the elements in between.

Best,

David

In[1]:= SetDirectory[NotebookDirectory[]]

Out[1]= "C:\\Users\\David\\Desktop\\Erick"

In[2]:= raw = 
 Import["test.txt", "Table", "FieldSeparators" -> { " ", ","}]

Out[2]= {{"#A", "Albert"}, {"#B", 0.5, "%"}, {"#C", 
  0.01726}, {"$V"}, {-25., -25., 5., 0.053609}, {-24., -25., 5., 
  0.065964}, {-23., -25., 5., 0.051466}, {-22., -25., 5., 
  0.053896}, {"$E"}}

In[3]:= (* By pattern match *)

In[4]:= data1 = Cases[raw, {_Real, _Real, _Real, _Real}]

Out[4]= {{-25., -25., 5., 0.053609}, {-24., -25., 5., 
  0.065964}, {-23., -25., 5., 0.051466}, {-22., -25., 5., 0.053896}}

In[5]:= (* by delimiters *)

In[6]:= start = Position[raw, {"$V"}][[1, 1]] + 1

Out[6]= 5

In[7]:= end = Position[raw, {"$E"}][[1, 1]] - 1

Out[7]= 8

In[8]:= data2 = raw[[start ;; end]]

Out[8]= {{-25., -25., 5., 0.053609}, {-24., -25., 5., 
  0.065964}, {-23., -25., 5., 0.051466}, {-22., -25., 5., 0.053896}}
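The delimiter method fails with a Part error if a file lacks $V or $E, because Position then returns an empty list. A hedged variant, assuming it is acceptable to return an empty list for such files, could use FirstPosition with a default value (the Missing labels below are arbitrary):

```wolfram
(* Shortened example with the same shape as the raw import above *)
raw = {{"#A", "Albert"}, {"$V"}, {-25., -25., 5., 0.053609},
   {-24., -25., 5., 0.065964}, {"$E"}};

(* Search only the top level; the third argument is returned
   when the delimiter row is absent *)
start = FirstPosition[raw, {"$V"}, Missing["NoStart"], {1}];
end = FirstPosition[raw, {"$E"}, Missing["NoEnd"], {1}];

(* Take the rows strictly between the delimiters, or nothing at all *)
data = If[MissingQ[start] || MissingQ[end],
  {},
  raw[[First[start] + 1 ;; First[end] - 1]]
]
```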
POSTED BY: David Keith
Posted 8 years ago

Is it always the first row of the CSV that needs to be removed? If so, read the file and delete the first line.

If it is not always the first row then is there some other way a program can determine what are comments and what are data?

If you could include a small example file and describe which parts are the data, that might help.

Even better would be an example chosen so that a program able to handle it correctly would almost certainly handle any other file you might have.

POSTED BY: Bill Simpson
Posted 8 years ago

Hi Bill, I included a snippet showing what the CSV file format looks like. Basically, the data sits between the $V and $E markers and consists of 4 columns to be post-processed. In this case I am interested in columns 1, 2 and 4 (x, y, value). The comments are marked with # and a tag. For example, #A Albert is the marker for a name. It would be great to also extract this information and put it in a variable to use later in the program.
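Once the numeric rows are available as a list of lists, picking columns 1, 2 and 4 is a matter of Part with All. A minimal sketch, using the four sample rows from the question (the names `xyv` and `xv` are hypothetical):

```wolfram
(* The four data rows from the sample file *)
data = {{-25., -25., 5., 0.053609}, {-24., -25., 5., 0.065964},
   {-23., -25., 5., 0.051466}, {-22., -25., 5., 0.053896}};

(* Keep x, y and value; drop the third column *)
xyv = data[[All, {1, 2, 4}]];

(* The sample has a single y value, so plot value against x here *)
xv = data[[All, {1, 4}]];
ListPlot[xv, AxesLabel -> {"x", "value"}]
```

For a file covering a full x-y grid, `xyv` could instead be passed to a function such as ListDensityPlot, which expects {x, y, value} triples.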

I managed to extract that data with the following code: ReadList["test.txt", Record, RecordSeparators -> {"$V", "$E"}]. But this only outputs a single blob of text (which does contain the data I need, but with an extra , at the end?), and I have not been able to get this blob into a table of any sort. So I guess I'm missing something here.

Thanks for the help.

POSTED BY: Erick van Rijk