
Simple way to add a column to a 2-D Dataset?

Is there a simpler way to add a column to a Dataset than what I'm doing? I've defined a function to combine two (or more) separately created Datasets column-wise:

joinDataset[x_List] := Transpose[Join @@ Transpose /@ x]

But all I want to do is use data from a couple of columns of the dataset and append the computed value to each row as a new column. The current process seems too laborious:

dataset = Dataset[{
   <|"a" -> 1, "b" -> "x", "c" -> {1}|>,
   <|"a" -> 2, "b" -> "y", "c" -> {2, 3}|>,
   <|"a" -> 3, "b" -> "z", "c" -> {3}|>,
   <|"a" -> 4, "b" -> "x", "c" -> {4, 5}|>}]
dataset = joinDataset[{dataset, dataset[All, <|"d" -> #a + #b|> &]}]

That will add a column "d" that adds "a" and "b", but I suspect there's something built-in that will do this better.

POSTED BY: Eric Smith
11 Replies

Shortest way of doing it might be:

dataset[All, Append[#, "d" -> #["a"] + #["b"]] &]

but it might not be the fastest.

Or simpler notation:

dataset[All, <|#, "e" -> #a + #b|> &]
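A minimal sketch of the second form on a small, all-numeric dataset. The data and the column name "d" are hypothetical, chosen so that `+` returns a number (in the dataset above, "b" holds strings, so the sum would stay symbolic). `<|#, ...|>` works because an association nested inside `Association` is spliced in, so the row keeps its existing keys and gains one more.

```wolfram
(* hypothetical numeric data; the thread's "b" column holds strings *)
ds = Dataset[{
   <|"a" -> 1, "b" -> 10|>,
   <|"a" -> 2, "b" -> 20|>}];

(* <|#, ...|> splices the existing row and appends the key "d" *)
withD = ds[All, <|#, "d" -> #a + #b|> &];

Normal[withD]
(* {<|"a" -> 1, "b" -> 10, "d" -> 11|>, <|"a" -> 2, "b" -> 20, "d" -> 22|>} *)
```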
POSTED BY: Sander Huisman

Good call, Sander! It's obvious now. I suppose if the dataset is very big, I'm better off doing the operation first and the join later?

POSTED BY: Eric Smith

If your dataset is very big, you probably don't want to use Dataset ;-) It is very handy and flexible, but in some cases that comes at the expense of memory usage and speed…

POSTED BY: Sander Huisman

I see. I was thinking Dataset was similar to a hash table. What structure would you recommend?

POSTED BY: Eric Smith

It is indeed very similar to a hash table. But finding and manipulating data in "named" columns simply takes more time than when the data can be accessed directly by index. You can store the data as a 'matrix' (a list of lists) and 'remember' the column positions yourself, which is generally much faster. Dataset can also hold data as plain matrices rather than associations.
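A sketch of the matrix approach, assuming purely numeric data: the table is a plain list of lists, and a hypothetical association `cols` maps each column name to its position, so the names are "remembered" outside the data and every access is by index.

```wolfram
(* hypothetical numeric table with columns "a" and "b" *)
m = {{1, 10}, {2, 20}, {3, 30}};
cols = <|"a" -> 1, "b" -> 2|>;  (* column name -> column index *)

(* compute the new column from existing columns, addressed by index *)
d = m[[All, cols["a"]]] + m[[All, cols["b"]]];

(* append it as the last column and record its position *)
m2 = Join[m, Transpose[{d}], 2];
cols2 = Append[cols, "d" -> Length[cols] + 1];

m2[[All, cols2["d"]]]
(* {11, 22, 33} *)
```

The bookkeeping in `cols` is exactly the part that becomes error-prone as columns are added or reordered, which is the trade-off discussed in the following replies.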

POSTED BY: Sander Huisman

That's what I did for years, and it was error-prone. Associations are probably more efficient, right?

POSTED BY: Eric Smith

That's the trade-off: flexibility, convenience, and fewer errors versus speed and lower memory use. Associations are implemented very efficiently, but if you can avoid them and use packed arrays instead, that is (generally) much faster.
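A small illustration of the packed-array point, using random hypothetical data: a purely numeric matrix can be stored as a packed array, while the same values held as rows of associations cannot. The exact `ByteCount` figures depend on version and platform, so none are quoted here.

```wolfram
(* a purely numeric matrix is stored as a packed array *)
packed = RandomReal[1, {10000, 3}];
Developer`PackedArrayQ[packed]   (* True *)

(* the same values as a list of associations cannot be packed *)
rows = AssociationThread[{"a", "b", "c"}, #] & /@ packed;
Developer`PackedArrayQ[rows]     (* False *)

(* compare memory use of the two representations *)
{ByteCount[packed], ByteCount[rows]}
```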

POSTED BY: Sander Huisman

Eric,

joinDataset[{dataset2, dataset[All, {"b"}]}]

will work. You have to wrap the key in {} to keep it in the result, just as you did with

dataset2 = dataset[All, {"a", "c"}]
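A quick check of this on a cut-down, hypothetical version of the dataset: a bare key returns only the values, while a one-element key list keeps each row as an association with that key, which is the same result as mapping `<|"b" -> #|> &` over the bare values.

```wolfram
ds = Dataset[{
   <|"a" -> 1, "b" -> "x"|>,
   <|"a" -> 2, "b" -> "y"|>}];

(* bare key: only the values, the key is dropped *)
Normal[ds[All, "b"]]
(* {"x", "y"} *)

(* key wrapped in a list: each row keeps <|"b" -> ...|> *)
Normal[ds[All, {"b"}]]
(* {<|"b" -> "x"|>, <|"b" -> "y"|>} *)

(* same as mapping an association over the bare values *)
Normal[ds[All, {"b"}]] === (<|"b" -> #|> & /@ Normal[ds[All, "b"]])
(* True *)
```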

Regards,

Neil

POSTED BY: Neil Singer
Posted 8 years ago
POSTED BY: Updating Name

Perhaps this?:

dataset = Join[dataset, dataset[All, <|"d" -> #a + #b|> &], 2]
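Join at level 2 pairs up the i-th element of each list and joins them; when those elements are associations, joining merges their keys, which is why this acts as a column append. A sketch on plain lists of associations with hypothetical numeric rows:

```wolfram
rows = {<|"a" -> 1, "b" -> 10|>, <|"a" -> 2, "b" -> 20|>};
extra = <|"d" -> #a + #b|> & /@ rows;

(* level 2: the i-th association of each list is merged with Join *)
Join[rows, extra, 2]
(* {<|"a" -> 1, "b" -> 10, "d" -> 11|>, <|"a" -> 2, "b" -> 20, "d" -> 22|>} *)
```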
POSTED BY: Michael Rogers

Not quite, but what I'm asking for isn't much different from adding a column to a 2-D array, which can be a bit tedious with

MapThread[Append,{twoDarray,newColumn}]

I can use MapThread if I convert the datasets to lists of associations with Normal, do the MapThread, and then turn the result back into a Dataset:

Dataset@MapThread[Append, 
  Normal /@ {dataset, dataset[All, <|"d" -> #a + #b|> &]}]

Using Transpose instead

The route that seems a little more intuitive is to use Transpose. So, for adding a column to a 2-D array:

Transpose[Append[Transpose[twoDarray],newColumn]]

This is analogous to the approach I'm using with joinDataset. I think Datasets are a great way to keep track of a lot of data in a human-usable form.
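For a plain 2-D list, the MapThread and Transpose routes give the same matrix, and Join at level 2 (with the new column turned into a one-column matrix) is a third option. A small check with hypothetical numbers:

```wolfram
twoDarray = {{1, 2}, {3, 4}, {5, 6}};
newColumn = {7, 8, 9};

viaMapThread = MapThread[Append, {twoDarray, newColumn}];
viaTranspose = Transpose[Append[Transpose[twoDarray], newColumn]];
(* Transpose[{newColumn}] is the column as a 3 x 1 matrix *)
viaJoin = Join[twoDarray, Transpose[{newColumn}], 2];

viaMapThread === viaTranspose === viaJoin
(* True *)
viaJoin
(* {{1, 2, 7}, {3, 4, 8}, {5, 6, 9}} *)
```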


Extracting column/row with key

One other thing I wonder about is how to retain the key when extracting a single column from the Dataset. For instance, say I want to take dataset[All, "b"] and add it to a different Dataset, dataset2. I can't do this:

dataset2 = dataset[All, {"a", "c"}]
joinDataset[{dataset2, dataset[All, "b"]}]

I have to map an association onto each element first:

joinDataset[{dataset2, <|"b" -> #|> & /@ dataset[All, "b"]}]
POSTED BY: Eric Smith