
[✓] Delete missing values of a dataset?


Hello, Mathematica Community. I have imported the following data as a Dataset into Mathematica: [image]

and I'm trying to fit the following linear model: [image]

When I run the following code, I get all of this missing data: [image]

But I'm unable to select the rows with missing data: [image]

Nothing shows up. Can someone please help me select those rows and delete them? When I examine the original data, I can't find the rows with missing data either, so I am curious to see Mathematica's output. Thank you in advance,

Thad

Answer
3 months ago

The missing data is probably tagged with an additional "reason" argument, as in Missing["reason"] (see the documentation for Missing).

Maybe this example will help:

ds = Dataset[{<|"a" -> 1, "b" -> 11|>, <|"a" -> 2, "b" -> Missing["somereason"]|>, <|"a" -> 3, "b" -> Missing[]|>}]

[image]

ds[Count[_Missing], "b"]

2

ds[Select[#b == Missing[] &]]

[image]

ds[Select[#b == Missing["somereason"] &]]

[image]

ds[Select[MatchQ[#b, _Missing] &]]

[image]
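
To go from selecting the rows with missing values to deleting them, the pattern-based query can be negated. A minimal sketch reusing the toy ds from this example; the name clean is only illustrative:

```mathematica
(* Same toy dataset as in the example above *)
ds = Dataset[{<|"a" -> 1, "b" -> 11|>, <|"a" -> 2, "b" -> Missing["somereason"]|>, <|"a" -> 3, "b" -> Missing[]|>}];

(* Keep only rows in which no value has head Missing, whatever the reason tag *)
clean = ds[Select[! MemberQ[Values[#], _Missing] &]];

Normal[clean]
(* {<|"a" -> 1, "b" -> 11|>} *)
```

Because the test matches the head Missing rather than one specific expression, it catches Missing[], Missing["somereason"], and any other tagged variant, in any column.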

POSTED BY: Chris P
Answer
3 months ago

Thanks, Chris P. I think you are spot on, because when I investigate further, I see missing values in the Dataset (house$size): [image]

that are not missing in the original data (notice house$size and nbaths as well): [image]

What's going on? Do you know how I can fix this?

Thanks, Thad

Answer
3 months ago

If I understand correctly, the problem seems to occur when you import the original data into Mathematica. I notice that the missing nbaths value is a real number (1.5), whereas all the others are integers (1) ...

In any case, you should start a new question stating precisely how you imported the data and what format the original data is in, and including a minimal example.


POSTED BY: Chris P
Answer
3 months ago

Thanks, Chris P. I imported the data from a CSV file (train.csv) using

data = SemanticImport[ "C:\\Users\\Thadeu\\Documents\\Kaggle\\train.csv"]
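
If the per-column type inference of SemanticImport is what turns values such as 1.5 into Missing (an assumption; the thread does not confirm it), one way to check is to read the data with the plain CSV importer, which parses each field on its own. A minimal sketch on an inline, made-up two-column sample rather than the real train.csv; it assumes a Mathematica version whose CSV importer supports the "Dataset" element:

```mathematica
(* Hypothetical sample standing in for train.csv: one real value among integers *)
csv = "nbaths,size\n1,1000\n1.5,1200\n2,1500\n";

(* Parse as plain CSV; the first line supplies the column names *)
data = ImportString[csv, {"CSV", "Dataset"}, "HeaderLines" -> 1];

(* Count Missing entries in the suspect column; 0 means no value was dropped *)
data[Count[_Missing], "nbaths"]
```

For the actual file, the same element and option would be passed to Import with the path used above in place of ImportString.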

I'll start a new question. Thanks

Answer
3 months ago
