Message Boards Message Boards


Concatenating a group of csv files

Posted 12 days ago
9 Replies
10 Total Likes

I don't know why I seem to be stuck on something so. seemingly, simple.
I have a set of Excel csv files. Call them f1.csv, f2.csv.... fn.csv.
They are all in the same format meaning that they all have the same header).

What I want to do is to read them all and concatenate them as I read them into one big file with a single header.
I have tried OpenAppend, Write, Import and Export and nothing seems to be working.

Could someone please provide me with a function such that I could use it as follows:

contatenate[singleFilename,#]& /@ {f1.csv,f2.csv,....fn.csv}

The concatemate function would create a CSV file that would have the common header and all the rows of all the fx.csv files
I apologize in advance if this is trivial (which I know it must be) but that I somehow seem to be stock on.

POSTED BY: Henrick Jeanty
9 Replies
Posted 11 days ago

Here's a very lightly tested attempt:

ConcatenateCsvFiles[output_String, inputs : {__String}] :=
  {joinedData = Import /@ inputs},
  Export[output, Prepend[Flatten[joinedData[[All, 2 ;;]], 1], joinedData[[1, 1]]]]]

But I should make some comments. First, I probably wouldn't actually do all three steps in one big function like this. I would probably do the Import and Export separately from the catenation of the list data. It would be easier to test and parts could be re-used independently. It also reduces the risk of overwriting good data with bad.

Second, this doesn't match what you asked for:

contatenate[singleFilename,#]& /@ {f1.csv,f2.csv,....fn.csv}

But what you asked for would probably need to rely on updating some temporary state, and it would just become awkward. The semantics of the problem (as I understand it) isn't about applying the same function to a list of data, but about restructuring a list of (lists of) data.

Third, if you're never going to process the data itself--i.e. this is strictly a file-to-file convenience tool, then there might be a neater approach that doesn't bother with importing structured data but just processes the files' contents as just lines.

POSTED BY: Eric Rimbey

Thank you Eric. Your answer has set me on the right path. I will mark your reply as an answer. However, I would now like to know if I can import each csv file as a Dataset (which I do via SemanticImport) and somehow concatenate the different datasets into a single dataset. In short I would like to turn CSV files f1.csv, f2.csv .... fn.csv into datasets ds1,ds2,...dsn and merge the n datasets into a single one with the proper header. Any suggestion?

POSTED BY: Henrick Jeanty
Posted 11 days ago

Join works with Dataset

ds1 = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>}]
ds2 = Dataset[{<|"a" -> 3, "b" -> "z"|>, <|"a" -> 4, "b" -> "t"|>}]
POSTED BY: Rohit Namjoshi
Posted 11 days ago


POSTED BY: Eric Rimbey
Posted 11 days ago

There might be a nifty way to do it with datasets directly, but the way that I immediately thought of was to use Normal to "unwrap" the dataset and then apply Dataset again. Let's say you have data1, data2, and data3 that are Dataset expressions that you have already imported from the csv files. You could do this:

Dataset[Normal@{data1, data2, data3}]

Normal turns the datasets into lists of Associations. Dataset automatically works with such Associations. You might have to Flatten the "normalized" data in some circumstances--not sure.

POSTED BY: Eric Rimbey

Thank you again Eric. Your first reply made me think more about it and I solved my problem with exactly what you call a nifty way. Here is essentially how it works:

I just turn each dataset to its Normal format as a list of associations. I then Join all those associations and turn the final file into a Dataset with // Dataset

POSTED BY: Henrick Jeanty
Posted 10 days ago

No need for that complexity, as I mentioned earlier, this works fine

t = d1~Join~d2
POSTED BY: Rohit Namjoshi

Rohit, I now have a further question. Join[d1,d2] works great. But how would I Join N files whose names are in a list. For example, assume I have filenames = {fn1,fn2,fn3,fn4,..fnN} I could use a loop that reads each csv file as a Dataset and Joins it to a running result, where at first I would read the first file, fn1 into result. From that point, I want to read each subsequent file (fn2,....fnN) and Join it to result. I find that:

result = fn1; result = Join[result,#]& /@ {fn2,.....fn3} Doesn't work. I am stuck trying to find a "Mathematica Elegant" way to join N csv files into a dataset without using a loop.

Any suggestions?

POSTED BY: Henrick Jeanty

Great! I didn't realize that you could just join them like that. I assumed that you needed first to get the lists of associations (thus the use of Normal), join the list, and then turn the list into a Dataset.

Your solution is just so much simpler.

Thank you!

POSTED BY: Henrick Jeanty
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
or Discard

Group Abstract Group Abstract