Message Boards

[✓] Fastest safe way to change keys in large Dataset?

Consider a large Dataset (ds) that wraps a list of Associations or an Association of Associations. Perhaps the Dataset has something like a million rows and thirty columns. I want to rename a few of the columns. The fastest way I have found to do this safely is with a function like this (assuming we have a list of Associations).

 changeColumnNames[ds_,newColumnNames_]:=Dataset[Map[
   AssociationThread[newColumnNames, #] &, Normal@ds[All, Values]]]

Basically, I rebuild the Dataset using AssociationThread. This executes in time roughly proportional to the number of rows. Given that the Dataset's FullForm already embeds a list of the column names (in its second element), I wonder whether there is a swifter way of doing this, particularly if one is willing to use the Dataset` package.
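A rough way to check the scaling claim above is to time the function at a few row counts. The sketch below repeats the definition from the post so that it runs on its own; makeTestData, the column names, and the row counts are hypothetical choices for illustration, not part of the original question.

```wolfram
(* Definition repeated from the post so the block is self-contained *)
changeColumnNames[ds_, newColumnNames_] := Dataset[Map[
   AssociationThread[newColumnNames, #] &, Normal@ds[All, Values]]]

(* Hypothetical helper: a Dataset of n rows with columns "a", "b", "c" *)
makeTestData[n_Integer] :=
 Dataset[Table[<|"a" -> i, "b" -> 2 i, "c" -> 3 i|>, {i, n}]]

(* Build each Dataset first, then time only the rename.
   Each result is a pair {rowCount, seconds}. *)
timings = Table[
  Module[{data = makeTestData[n]},
   {n, First@AbsoluteTiming[changeColumnNames[data, {"A", "B", "C"}];]}],
  {n, {10^4, 10^5}}]
```

If the time grows by about a factor of ten between the two sizes, that is consistent with linear scaling in the number of rows. Absolute numbers will depend on the machine and the Wolfram Language version.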

All ideas welcome!

P.S. I've played around with Transpose-ing the Dataset but have not gotten anything to work both well and swiftly.

POSTED BY: Seth Chandler
8 days ago

Oops. There is a method in the Documentation that reads as follows, where {"a","b","c"} are the names of the old columns and {"A","B","C"} are the names of the new columns. So this is probably about as good a method as exists (unless someone has a better idea).

 dataset[All, <|"A" -> "a", "B" -> "b", "C" -> "c"|>]
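For many columns, the renaming Association can be built from two lists of names rather than typed by hand. The following is a minimal sketch on a toy Dataset; renameColumns and the sample data are hypothetical names introduced here for illustration and do not come from the Documentation.

```wolfram
(* Toy data; the column names and values are illustrative only *)
ds = Dataset[{
    <|"a" -> 1, "b" -> 2, "c" -> 3|>,
    <|"a" -> 4, "b" -> 5, "c" -> 6|>}];

(* Hypothetical helper: builds <|"A" -> "a", "B" -> "b", ...|> from the
   two name lists and applies it in the column position of the query *)
renameColumns[data_Dataset, oldNames_List, newNames_List] /;
  Length[oldNames] == Length[newNames] :=
 data[All, AssociationThread[newNames, oldNames]]

renamed = renameColumns[ds, {"a", "b", "c"}, {"A", "B", "C"}];

(* Inspect the keys of the first row to confirm the rename *)
Keys[First[Normal[renamed]]]
```

One caveat, stated as an assumption rather than a guarantee: columns left out of the Association appear to be dropped from the result, so when renaming only a few of thirty columns, oldNames should still list every column to keep, with unchanged names repeated in newNames.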
POSTED BY: Seth Chandler
8 days ago
