[?] Simple way to add a column to a 2-D dataset?

POSTED BY: Eric Smith
POSTED BY: Sander Huisman

That's what I did for years and it was error prone. Associations are probably more efficient, right?

POSTED BY: Eric Smith

It is very similar to a hash table indeed. But finding and manipulating data in "named" columns simply takes more time than accessing it by index. You can store the data as a 'matrix' (list of lists) and 'remember' the columns yourself, which is generally much faster. Dataset can also handle data as plain matrices rather than associations.
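
As a rough illustration of the 'matrix plus remembered columns' idea, the sketch below keeps the values in a plain list of lists and stores the column positions in a separate lookup. The names `data` and `cols` and the sample values are made up for this example, and it makes no claim about actual timings.

```wolfram
(* Hypothetical sample data: each inner list is one row *)
data = {{1, 10}, {2, 20}, {3, 30}};

(* Hypothetical lookup that remembers the position of each named column *)
cols = <|"a" -> 1, "b" -> 2|>;

(* Extract column "b" by its integer position *)
colB = data[[All, cols["b"]]];

(* Compute a new column from "a" and "b" without storing any keys in the rows *)
colD = data[[All, cols["a"]]] + colB;
```

The rows stay packable numeric lists this way, and the column names live in one small association instead of being repeated in every row.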

POSTED BY: Sander Huisman

I see. I was thinking Dataset was similar to a hash table. What structure would you recommend?

POSTED BY: Eric Smith

If your dataset is very big you probably don't want to use Dataset ;-) It is very handy and flexible, but that comes at the expense of memory usage and speed in some cases…

POSTED BY: Sander Huisman
POSTED BY: Eric Smith
Posted 8 years ago

This is perfect! Thanks, Neil. I didn't pick this up from the documentation.

Operators using Right Composition

I hope you don't mind me using this thread for another Dataset question. I've struggled with the logic behind

dataset[All, Key["c"] /* <|"ctotal" -> Total, "clength" -> Length|>]

RightComposition is being used, so I should be able to write this in another form. If (f /* g /* h)[x] = h[g[f[x]]], then I should be able to use

dataset[All, 
 Function[x, <|"ctotal" -> Total, "clength" -> Length|>[Key["c"][x]]]]

But this doesn't work.
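
One likely reason, offered as a sketch rather than a definitive answer: inside a Dataset query, an association of functions is treated as a query operator, but in ordinary code an Association applied to an argument just looks that argument up as a key. The example below assumes a small made-up `dataset` whose "c" column holds lists (so that Total and Length make sense), and spells out the association explicitly inside the Function.

```wolfram
(* Hypothetical sample data; column "c" holds lists *)
dataset = Dataset[{
    <|"a" -> 1, "b" -> "x", "c" -> {1}|>,
    <|"a" -> 2, "b" -> "y", "c" -> {2, 3}|>}];

(* Outside a query, this is a key lookup for the key {2, 3}, which is absent *)
lookup = <|"ctotal" -> Total, "clength" -> Length|>[{2, 3}];

(* Explicit version: build the result association by hand for each row x *)
explicit = dataset[All,
   Function[x, <|"ctotal" -> Total[x["c"]], "clength" -> Length[x["c"]]|>]];
```

Here `lookup` comes back as a Missing[...] expression instead of applying Total and Length, which would explain why the hand-written Function form fails.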

Pure functions using "&" instead of Function

Last question (I think). This doesn't work:

dataset[All, {"a" ->( #["a"] + 1) &, "b" -> g, "c" -> h}]

but this does

dataset[All, {"a" -> Function[x, (x + 1)], "b" -> g, "c" -> h}]
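
A possible explanation, as a sketch: `&` binds more loosely than `->`, so without parentheses around the whole pure function the rule itself ends up inside the Function. The function on the right of "a" -> ... also appears to receive the column value rather than the row, which is why the working version uses x + 1 and not x["a"] + 1. The small `dataset` below is made up for the example.

```wolfram
(* Without parentheses the entire rule becomes the body of the pure function *)
headWrong = Head["a" -> # + 1 &];

(* With parentheses around the pure function, the expression is a Rule again *)
headRight = Head["a" -> (# + 1 &)];

(* Hypothetical sample data *)
dataset = Dataset[{<|"a" -> 1, "b" -> 2|>, <|"a" -> 3, "b" -> 4|>}];

(* Same query as the Function[x, x + 1] form, written with & *)
fixed = dataset[All, {"a" -> (# + 1 &)}];
```

`headWrong` is Function and `headRight` is Rule, so wrapping each pure function in its own parentheses should make the `&` form behave like the Function form.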

I've been using Datasets for a while but I feel like I'm not using them with full understanding of what's going on. Same goes for &, Function, and RightComposition.

I appreciate all the help I've gotten on this so far.

POSTED BY: Updating Name

Shortest way of doing it might be:

dataset[All, Append[#, "d" -> #["a"] + #["b"]] &]

but might not be fastest.

Or simpler notation:

dataset[All, <|#, "e" -> #a + #b|> &]
POSTED BY: Sander Huisman

Eric,

joinDataset[{dataset2, dataset[All, {"b"}]}]

will work. You have to use the {} to get the key to stay around, just as you did with

dataset2 = dataset[All, {"a", "c"}]

Regards,

Neil

POSTED BY: Neil Singer

Not quite, but what I'm asking for isn't much different from adding a column to a 2-D array, which can be a bit tedious with

MapThread[Append,{twoDarray,newColumn}]

I can use MapThread if I convert the datasets to lists of associations, do the MapThread, then turn the result back into a Dataset

Dataset@MapThread[Append, 
  Normal /@ {dataset, dataset[All, <|"d" -> #a + #b|> &]}]

Using Transpose instead

The route that seems a little more intuitive is to use Transpose. So for adding a column to a 2-D array

Transpose[Append[Transpose[twoDarray],newColumn]]

Which is analogous to the approach I'm using with "joinDataset". I think Datasets are a great way to keep track of a lot of data in a human-usable form.
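
For comparison, here is a small sketch of the 2-D array case with made-up values for `twoDarray` and `newColumn`. It shows the MapThread and Transpose routes next to a level-2 Join, which mirrors the Join[..., 2] suggestion made elsewhere in this thread; the level-2 Join on plain lists is added here only for illustration.

```wolfram
(* Hypothetical sample data *)
twoDarray = {{1, 2}, {3, 4}};
newColumn = {5, 6};

(* Append one new element to each row *)
viaMapThread = MapThread[Append, {twoDarray, newColumn}];

(* Turn columns into rows, append the new column as a row, then turn back *)
viaTranspose = Transpose[Append[Transpose[twoDarray], newColumn]];

(* Wrap each new element in a list and join row by row at level 2 *)
viaJoin = Join[twoDarray, List /@ newColumn, 2];
```

All three give {{1, 2, 5}, {3, 4, 6}} for this input.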


Extracting column/row with key

One other thing I wonder about is, how do I retain the key when extracting a single value from the Dataset? So for instance, say I want to take dataset[All,"b"] and add it to a different Dataset dataset2? I can't do this:

dataset2 = dataset[All, {"a", "c"}]
joinDataset[{dataset2, dataset[All, "b"]}]

I have to map an association to each element first

joinDataset[{dataset2, <|"b" -> #|> & /@ dataset[All, "b"]}]
POSTED BY: Eric Smith

Perhaps this?:

dataset = Join[dataset, dataset[All, <|"d" -> #a + #b|> &], 2]
POSTED BY: Michael Rogers