How to join a Dataset as a column to another dataset?

Posted 3 years ago

I am trying to import the Titanic dataset from ExampleData and convert it to a Dataset. My code so far is as follows:

trainData = ExampleData[{"MachineLearning", "Titanic"}, "TrainingData"];
X = Keys[trainData]
Y = Values[trainData]
dsX = Dataset[X]
dsY = Dataset[Y]

How do I now add dsY as a column to dsX? I tried Join and Append but could not get either to work. Any help is deeply appreciated.

POSTED BY: Abhijit Mustafi
7 Replies
Posted 3 years ago

Another way, this time without keys and values. Take a look at Normal to understand this approach and the previous one I posted.

MapThread[Append, {dsX // Normal, dsY // Normal}] // Dataset
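
As a rough illustration of what the line above does, here is a minimal sketch on a two-row stand-in for the training data. The values in `toy` are made up for the example and are not taken from the real Titanic data; the point is only that each feature row gets its class value appended before everything is wrapped back into a Dataset.

```wolfram
(* Made-up stand-in for the {features} -> class rules; not real Titanic rows *)
toy = {{"1st", 29, "female"} -> "survived", {"3rd", 25, "male"} -> "died"};

(* Same split as in the question: features and classes as separate Datasets *)
dsX = Dataset[Keys[toy]];
dsY = Dataset[Values[toy]];

(* Normal strips the Dataset wrapper, MapThread appends each class to its row *)
combined = MapThread[Append, {Normal[dsX], Normal[dsY]}] // Dataset;

Normal[combined]
(* {{"1st", 29, "female", "survived"}, {"3rd", 25, "male", "died"}} *)
```

The result has no column names, because the rows are plain lists rather than associations.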
POSTED BY: Rohit Namjoshi
Posted 3 years ago

Using the data from the data repository, which, apart from the column names, looks identical:

titanicData = ResourceData["Sample Data: Titanic Survival"]

(* Replace all Missing with 0. May not be appropriate if e.g. gender is missing *)
titanicData /. _Missing -> 0

(* Replace just missing Age with 0 years *)
titanicData[All, <|#, "Age" -> If[MissingQ[#Age], Quantity[0, "Years"], #Age]|> &]

Depending on what you are going to do with the data, replacing a missing age with 0 may not be a good idea, as it can introduce unwanted bias. It might be better to ignore those rows or to replace the missing age with the mean age for that class / gender. You can also try SynthesizeMissingValues.
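
For the mean-age idea, a minimal sketch on made-up rows might look like the following. It uses a single overall mean rather than a per-class / per-gender mean, and the names `toyRows`, `meanAge`, and `filled` are placeholders for this example only.

```wolfram
(* Made-up rows; the real data has more columns and stores ages as quantities *)
toyRows = {<|"Age" -> 20|>, <|"Age" -> Missing["NotAvailable"]|>, <|"Age" -> 40|>};

(* Mean of the ages that are actually present *)
meanAge = Mean[DeleteMissing[Lookup[toyRows, "Age"]]];

(* Same row-rewriting pattern as above, substituting the mean instead of 0 *)
filled = Map[<|#, "Age" -> If[MissingQ[#Age], meanAge, #Age]|> &, toyRows];

filled
(* {<|"Age" -> 20|>, <|"Age" -> 30|>, <|"Age" -> 40|>} *)
```

The same pure function should work as the second argument of a Dataset query, as in the Age example above.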

POSTED BY: Rohit Namjoshi

One more query, my combined dataset is called X as shown in the figure.

[Figure: the combined dataset X]

A number of the rows have a missing "Age" value. I need to substitute 0 (zero) for every missing age. I can find all the rows with missing ages using X[Select[MissingQ[#Age] &]][All, "Age"], but I cannot update them to zero. Can you please help? Thanks once again!!

POSTED BY: Abhijit Mustafi

Thanks a tonne!! The second method is a godsend and does exactly what I wanted. Thanks once again.

POSTED BY: Abhijit Mustafi
Posted 3 years ago

If for some reason you need to convert the {features} -> class format to a Dataset, then here is one way:

columns = 
 List @@ ExampleData[{"MachineLearning", "Titanic"}, "VariableDescriptions"] // Flatten
data = List @@@ trainData // Map[Flatten]

data // Map[AssociationThread[columns -> #] &] // Dataset
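
To make the intermediate shapes easier to see, here is a small sketch of the same steps on two made-up rules. The column names in `toyColumns` are hypothetical stand-ins for whatever "VariableDescriptions" returns, and the rows are invented for the example.

```wolfram
(* Hypothetical column names and made-up rows, for illustration only *)
toyColumns = {"class", "age", "sex", "survived"};
toy = {{"1st", 29, "female"} -> "survived", {"3rd", 25, "male"} -> "died"};

(* Turn each rule into a list, then flatten it into one row of four values *)
toyRowsFlat = Flatten /@ (List @@@ toy);

(* Pair each row with the column names to get one association per row *)
toyRecords = AssociationThread[toyColumns -> #] & /@ toyRowsFlat;

Dataset[toyRecords]
```

Because every row is now an association, the resulting Dataset has named columns, so a query such as `Dataset[toyRecords][All, "age"]` picks out a single column.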
POSTED BY: Rohit Namjoshi

Thanks for the input. But I specifically want to append the two datasets as asked in the original question.

POSTED BY: Abhijit Mustafi
Posted 3 years ago

Hi Abhijit,

The "MachineLearning" data is in a form suitable for input to ML functions. If you want a Dataset, it can be converted, but it is easier to get the data from the Wolfram Data Repository, where it is already a Dataset.

ResourceData["Sample Data: Titanic Survival"]
POSTED BY: Rohit Namjoshi