Concatenating a group of csv files

POSTED BY: Henrick Jeanty
9 Replies

Great! I didn't realize that you could just join them like that. I assumed that you first needed to get the lists of associations (hence the use of Normal), join the lists, and then turn the result into a Dataset.

Your solution is just so much simpler.

Thank you!

POSTED BY: Henrick Jeanty

Thank you again, Eric. Your first reply made me think more about it, and I solved my problem in exactly what you call a nifty way. Here is essentially how it works:

I convert each dataset to its Normal form, which is a list of associations. I then Join all those lists and turn the result into a Dataset with // Dataset.
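
A minimal sketch of the steps described above, using two small in-memory datasets in place of the imported files. The names ds1, ds2, rows, and merged are only illustrative, and the sketch assumes every dataset has the same column keys:

```wolfram
(* Stand-ins for datasets that would normally come from SemanticImport *)
ds1 = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>}];
ds2 = Dataset[{<|"a" -> 3, "b" -> "z"|>, <|"a" -> 4, "b" -> "t"|>}];

(* Normal turns each Dataset into a plain list of associations *)
rows = Join @@ (Normal /@ {ds1, ds2});

(* Wrap the combined list back into a single Dataset *)
merged = rows // Dataset
```

The header is not handled separately here because each row carries its own keys.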

POSTED BY: Henrick Jeanty
Posted 3 years ago
POSTED BY: Rohit Namjoshi
POSTED BY: Henrick Jeanty
Posted 3 years ago
POSTED BY: Eric Rimbey

Thank you, Eric. Your answer has set me on the right path, and I will mark your reply as an answer. However, I would now like to know whether I can import each CSV file as a Dataset (which I do via SemanticImport) and somehow concatenate the different datasets into a single dataset. In short, I would like to turn the CSV files f1.csv, f2.csv, ..., fn.csv into datasets ds1, ds2, ..., dsn and merge the n datasets into a single one with the proper header. Any suggestions?

POSTED BY: Henrick Jeanty
Posted 3 years ago

Join works with Dataset:

ds1 = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>}]
ds2 = Dataset[{<|"a" -> 3, "b" -> "z"|>, <|"a" -> 4, "b" -> "t"|>}]
ds1~Join~ds2
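
The same idea should extend to any number of datasets by applying Join to a list of them. This is only a sketch: the datasets list below is made up for illustration and stands in for whatever the import step produces.

```wolfram
(* Three small datasets with identical column keys *)
datasets = {
   Dataset[{<|"a" -> 1, "b" -> "x"|>}],
   Dataset[{<|"a" -> 2, "b" -> "y"|>}],
   Dataset[{<|"a" -> 3, "b" -> "z"|>}]};

(* Join @@ list is equivalent to Join[ds1, ds2, ds3] *)
merged = Join @@ datasets;

(* Inspect the result as a plain list of associations *)
Normal[merged]
```

With real files, the list could presumably be built with something like SemanticImport /@ FileNames["*.csv", dir], where dir is a placeholder for the folder holding the CSV files.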
POSTED BY: Rohit Namjoshi
Posted 3 years ago

Nice!

POSTED BY: Eric Rimbey
Posted 3 years ago

Here's a very lightly tested attempt:

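ConcatenateCsvFiles[output_String, inputs : {__String}] :=
 With[
  (* one table (list of rows) per input file *)
  {tables = Import /@ inputs},
  (* drop each file's header row, join the data rows, then restore the first file's header *)
  Export[output, Prepend[Flatten[tables[[All, 2 ;;]], 1], tables[[1, 1]]]]]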

But I should make some comments. First, I probably wouldn't actually do all three steps in one big function like this. I would probably keep the Import and Export separate from the concatenation of the list data. That would be easier to test, and the parts could be reused independently. It also reduces the risk of overwriting good data with bad.
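
One way that separation might look is a pure function that only restructures the imported tables and never touches the file system. This is a rough sketch: concatenateTables is a hypothetical name, and it assumes every table is non-empty and starts with the same header row.

```wolfram
(* Hypothetical helper: tables is a list of matrices, each with its header as the first row *)
concatenateTables[tables : {__List}] :=
 Prepend[
  Join @@ (Rest /@ tables),  (* data rows of every table, headers dropped *)
  First[First[tables]]]      (* header row taken from the first table *)

(* Small in-memory check, no files involved *)
t1 = {{"a", "b"}, {1, "x"}, {2, "y"}};
t2 = {{"a", "b"}, {3, "z"}};
concatenateTables[{t1, t2}]
```

For these two tables the result is {{"a", "b"}, {1, "x"}, {2, "y"}, {3, "z"}}. The file handling then reduces to something like Export[output, concatenateTables[Import /@ inputs]], which can be run only after the combined table has been checked.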

Second, this doesn't match what you asked for:

concatenate[singleFilename,#]& /@ {f1.csv,f2.csv,...,fn.csv}

But what you asked for would probably need to rely on updating some temporary state, and it would just become awkward. The semantics of the problem (as I understand it) isn't about applying the same function to a list of data, but about restructuring a list of (lists of) data.

Third, if you're never going to process the data itself (i.e. this is strictly a file-to-file convenience tool), then there might be a neater approach that doesn't bother with importing structured data and just processes the files' contents as lines.
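
A rough, untested sketch of what that line-based approach could look like. The name concatenateCsvLines is made up, and the sketch assumes that every file is non-empty, begins with exactly one header line, and contains no quoted fields with embedded line breaks:

```wolfram
(* Hypothetical helper: treats each CSV file as plain lines of text *)
concatenateCsvLines[output_String, inputs : {__String}] :=
 Module[{lines},
  (* read every file as a list of raw text lines, without parsing fields *)
  lines = Import[#, {"Text", "Lines"}] & /@ inputs;
  (* first file's header line, followed by the non-header lines of every file *)
  Export[
   output,
   StringRiffle[Prepend[Join @@ (Rest /@ lines), lines[[1, 1]]], "\n"],
   "Text"]]
```

Because the fields are never parsed, quoting and number formatting pass through unchanged, but any mismatch between the headers of different files goes unnoticed.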

POSTED BY: Eric Rimbey