How to modify a Dataset?

Posted 7 years ago
9 Replies
14 Total Likes

Hello all, I have the following problem. I have a Dataset with three columns. Now I want to add a fourth column that combines the second and third columns (e.g. stores the sum of the two numbers from the second and third columns in the fourth column). I tried a lot but had no success. Maybe it is very simple, but I do not see how to manage this. Can anyone give me a hint?

Greetings from Germany



Dear Mike,

let's generate a data set:

dataset = RandomReal[1, {10, 3}]

This is one way:

{#[[1]], #[[2]], #[[3]], #[[2]] + #[[3]]} & /@ dataset

This is another:

Flatten[{#, #[[2]] + #[[3]]}] & /@ dataset

If you prefer procedural programming you can use

Table[Flatten[{dataset[[i]], dataset[[i, 2]] + dataset[[i, 3]]}], {i, 1, Length[dataset]}]

This one works, too:

Transpose[Append[Transpose[dataset] , dataset[[All, 2]] + dataset[[All, 3]]]]
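As a quick sanity check, the four variants above can be compared directly. This is only a sketch: the names r1 to r4 and the fixed random seed are my own additions for illustration, not part of the original answer.

```mathematica
(* Sketch: r1..r4 and the seed are illustrative names/choices, not from the thread *)
SeedRandom[1];
dataset = RandomReal[1, {10, 3}];

(* The four approaches from the post, each stored under its own name *)
r1 = {#[[1]], #[[2]], #[[3]], #[[2]] + #[[3]]} & /@ dataset;
r2 = Flatten[{#, #[[2]] + #[[3]]}] & /@ dataset;
r3 = Table[Flatten[{dataset[[i]], dataset[[i, 2]] + dataset[[i, 3]]}], {i, 1, Length[dataset]}];
r4 = Transpose[Append[Transpose[dataset], dataset[[All, 2]] + dataset[[All, 3]]]];

(* All four should agree, and each result has one extra column *)
r1 == r2 == r3 == r4
Dimensions[r1]
```

The comparison should evaluate to True and the dimensions to {10, 4}; which variant to prefer is then mostly a matter of taste and readability.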



Marco, I believe that Michael is asking about a Version 10 Dataset rather than a simple rectangular array of data.

Dear David,

I am sorry. You are right. I should have read the question more carefully. My bad!

Here's the answer for a "Dataset" dataset ...

dataset = Dataset[Table[<|"a" -> RandomReal[], "b" -> RandomReal[], "c" -> RandomReal[]|>, {i, 1, 10}]]


Then you can do

dataset2 = Append[#, "d" -> #["b"] + #["c"]] & /@ dataset


This one is a bit shorter

dataset2 = Append[#, "d" -> #b + #c] & /@ dataset

Alternatively, you can use the Join command

dataset2 = Join[#, <|"d" -> #b + #c|>] & /@ dataset

I guess that it can also be done using procedural programming, e.g.

dataset2 = Dataset[Table[Normal[Join[dataset[[i]], <|"d" -> dataset[[i, 2]] + dataset[[i, 3]]|>]], {i, 1, Length[dataset]}]]

but this version is neither fast, nor elegant nor readable.
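For comparison, the same column can probably also be added with the Dataset query syntax itself, without Append or Join. The following is a minimal sketch, assuming rows with keys "a", "b" and "c" as above; the fixed seed and the two checks at the end are my own additions.

```mathematica
(* Sketch: assumes a Dataset whose rows are associations with keys "a", "b", "c" *)
SeedRandom[1];  (* fixed seed, only for reproducibility *)
dataset = Dataset[Table[<|"a" -> RandomReal[], "b" -> RandomReal[], "c" -> RandomReal[]|>, {10}]];

(* dataset[All, f] applies f to every row; <|#, "d" -> ...|> splices the old row in and adds key "d" *)
dataset2 = dataset[All, <|#, "d" -> #b + #c|> &];

(* Checks on the underlying list of associations *)
Keys[First[Normal[dataset2]]]   (* {"a", "b", "c", "d"} *)
AllTrue[Normal[dataset2], #d == #b + #c &]
```

The second check should return True. Writing the row as <|#, ...|> relies on Association flattening a nested association, so the existing keys keep their order and "d" is added at the end.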

Cheers, M.

Thank you very much, David! When I see the solution it is really not too complicated, but I was not able to manage it by myself :-( . The documentation for Dataset in V10 is really not very instructive, so it is great to get quick help here :-)

Greetings from Germany


Thanks Marco for the nice quick tutorial!

I just noticed that this can also be easily done with pattern matching:

Normal[dataset] /. x_Association :>  Join[x , <|{"d" -> x[[2]] + x[[3]]}|>]

That would then cover all three programming paradigms, I guess.



Posted 6 years ago

Suppose I want to test whether the value in column A is in some other list or set, and then append the result of the test (True or False) as a separate column to the dataset. I am used to doing this in SQL, but not sure how best to do this using Datasets in Mathematica 10.

Get a data set

Clear[daS1]
daS1 = Dataset[Table[<|"a" -> o, "b" -> RandomChoice[Characters["caitlin ramsey"]]|>, {o, 12}]]

and another one

Clear[daS2]
daS2 = Dataset[Table[<|"c" -> o, "d" -> Characters["caitlin"][[o]]|>, {o, StringLength["caitlin"]}]]

to create a third one using approaches mentioned in this discussion already

daS1 /. x_Association :> Join[x, <|{"Q" -> If[Intersection[{x[[2]]}, Normal[Query[All, "d"]@daS2]] != {}, True, False]}|>]

To point out a best way to do it, one needs some criterion (what counts as good, what counts as bad) and at least two ways to get the job done, so that the decision is non-trivial ...
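As one possible second way, the membership test can be written with MemberQ inside a Dataset query. This is only a sketch: the fixed three-row daS1 (used instead of random letters so the result is reproducible) and the names allowed and daS3 are my own assumptions for illustration.

```mathematica
(* Sketch: a small fixed stand-in for daS1; `allowed` and `daS3` are illustrative names *)
daS1 = Dataset[{<|"a" -> 1, "b" -> "c"|>, <|"a" -> 2, "b" -> "m"|>, <|"a" -> 3, "b" -> "t"|>}];
daS2 = Dataset[Table[<|"c" -> o, "d" -> Characters["caitlin"][[o]]|>, {o, StringLength["caitlin"]}]];

(* Pull the lookup column out once as a plain list *)
allowed = Normal[daS2[All, "d"]];

(* Append the result of the membership test as a new column "Q" *)
daS3 = daS1[All, <|#, "Q" -> MemberQ[allowed, #b]|> &];

Normal[daS3[All, "Q"]]   (* {True, False, True} *)
```

MemberQ already returns True or False, so no If wrapper is needed, and extracting the lookup column once avoids re-querying daS2 for every row.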

Posted 6 years ago

I think that will be quite sufficient, Udo, thank you. In this case, I was mainly interested in the syntax to incorporate the comparison test into the dataset modification. No requirement for optimality here -- I could have said "reasonably efficient" rather than "best" to express the problem -- and thank you for considering that detail!
