[WSS19] Tools for Consuming and Transforming Time Series Data

Posted 6 years ago
4 Replies
Posted 6 years ago

Thank you, that is really helpful feedback!

POSTED BY: Kyle MacLaury
Posted 6 years ago

Subelements (like 2 ;; 4 in the following example) are applied first, so their positions correspond to the actual positions of the data in the file. Then "SkipLines" is applied, and finally "HeaderLines".

In[10]:= csv = ExportString[Partition[Range[25], 5], "CSV"]

Out[10]= "1,2,3,4,5
6,7,8,9,10
11,12,13,14,15
16,17,18,19,20
21,22,23,24,25
"

In[11]:= ImportString[csv, {"Data", 2 ;; 4}]

Out[11]= {{6, 7, 8, 9, 10}, {11, 12, 13, 14, 15}, {16, 17, 18, 19, 20}}

In[12]:= ImportString[csv, {"Data", 2 ;; 4}, "SkipLines" -> 1]

Out[12]= {{11, 12, 13, 14, 15}, {16, 17, 18, 19, 20}}

For "Data" and similar elements, "HeaderLines" and "SkipLines" behave the same: both simply drop lines. For "Dataset", the lines counted by "HeaderLines" become the Dataset headers, while "SkipLines" still just skips lines as usual.
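As a rough illustration of that difference, here is a sketch on a small made-up CSV string. The column names time and load and the variable names are placeholders, not taken from the original file.

```wolfram
(* made-up CSV: one header row followed by two data rows *)
csv = "time,load\n1,10.5\n2,11.2\n";

(* with "Data", both options are expected to just drop the first line *)
viaHeader = ImportString[csv, {"CSV", "Data"}, "HeaderLines" -> 1];
viaSkip = ImportString[csv, {"CSV", "Data"}, "SkipLines" -> 1];
viaHeader === viaSkip

(* with "Dataset", the header line supplies the column names instead *)
ds = ImportString[csv, {"CSV", "Dataset"}, "HeaderLines" -> 1]
```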

POSTED BY: Sean Cheren
Posted 6 years ago

This has been a great experience. Thank you for the feedback, that is much cleaner!

I hadn't come across the usage of "SkipLines" -> 5, "HeaderLines" -> 1 until now. It looks like the use of "SkipLines" resets the line count, so that you start at 1 with the line immediately after the final skipped line. Is that correct?

POSTED BY: Kyle MacLaury
Posted 6 years ago

Nice post, and I hope you had fun playing with WL at WSS this year! One small improvement I'd like to suggest is to use the features in Import to automatically deal with headers and parts of data rather than importing everything and starting from there. For example, this will give a Dataset with the correct headers and data range:

Import["rt_hourlysysload_20190709.csv", {"Dataset", ;; -2}, "SkipLines" -> 5, "HeaderLines" -> 1]

And to get the associations with headers, one can simply call Normal on the result. This removes all usage of Part and AssociationThread!
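A minimal sketch of that conversion, again on a made-up CSV string rather than the real file (the column names and the variables ds and rows are assumptions for illustration):

```wolfram
(* made-up CSV standing in for the real file *)
csv = "time,load\n1,10.5\n2,11.2\n";

(* import with the first line as column names *)
ds = ImportString[csv, {"CSV", "Dataset"}, "HeaderLines" -> 1];

(* Normal turns the Dataset into a list of associations, one per data row *)
rows = Normal[ds];

(* values can then be looked up by header name *)
First[rows]["load"]
```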

Headers can also be inserted on Export directly so there is no need to join the header to values.

Export[exportPath, Values[data], "CSV", "TableHeadings" -> headers]

Since we are just taking Values on the associations anyway, I think this step can be skipped entirely, and we can just aggregate the Data, and pull a header from a single example. This would be much more efficient for very large data, and would be my ultimate suggestion for the step of combining the data:

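(* Import one file: skip 5 lines plus the header line, and leave off the last row *)
importData[path_String] := 
    Import[path, {"Data", 1 ;; -2}, "SkipLines" -> 5, "HeaderLines" -> 1];
csvFiles = FileNames["*.csv", FileNameJoin[{NotebookDirectory[], "Data"}]];
(* The header is row 6 of the raw file, right after the 5 skipped lines *)
headers = Import[First[csvFiles], {"Data", 6}];
(* Join with a single argument returns it unchanged, so spread the per-file tables into arguments *)
data = Join @@ (importData /@ csvFiles);
exportPath = FileNameJoin[{NotebookDirectory[], "alldata.csv"}];
Export[exportPath, data, "CSV", "TableHeadings" -> headers]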
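A note on the combining step: Join concatenates the lists it receives as separate arguments, so a list of per-file tables has to be spread into arguments or flattened by one level before export. A minimal sketch with made-up rows standing in for two imported files (perFile is a hypothetical name):

```wolfram
(* hypothetical stand-in for importData /@ csvFiles: two files, two rows each *)
perFile = {{{1, 2}, {3, 4}}, {{5, 6}, {7, 8}}};

nested = Join[perFile];        (* single argument: returned unchanged, still grouped per file *)
joined = Join @@ perFile;      (* rows of all files in one table *)
flat = Catenate[perFile];      (* same result as Join @@ perFile *)
```

Either Join @@ or Catenate yields a single table whose rows line up with one header row.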
POSTED BY: Sean Cheren