
How Do I Suppress Attributes in an Association so Classify[] Ignores Them?

Posted 9 years ago

I am using Classify[] to process a large number of records, represented as associations that each have many attributes.

Some of the attributes should not figure in Classify[]'s work: they are effectively identifiers for the records themselves. I could strip them out, but they are useful for identifying the records downstream.

Is there any way to supply associations to Classify[], but ask it to ignore certain attributes?

POSTED BY: Brad Varey
3 Replies

Hi Brad,

I know that this is not really what you want, because it sort of involves stripping the dataset down to what you need, but I am not aware of an option in Classify/Predict that does what you ask. What I show here works with the toy data you provided, but I wanted to see how fast it is, so I generated my own toy dataset:

datalong = <|"ID" -> RandomInteger[20000], "Name" -> #, "Height" -> RandomVariate[NormalDistribution[1.7, 0.2]], 
      "Systolic" -> RandomInteger[{105, 160}], "Diastolic" -> RandomInteger[{65, 95}]|> -> RandomInteger[{60, 99}] & /@ 
   Flatten@Import["http://www.babycenter.com/top-baby-names-2012", "Data"][[2, 2, 2]];


There are 200 fake subjects in that dataset.

 p = Predict[Rule @@@ Transpose[{{#[[1]], #[[2]]} & /@ datalong[[All, 1, {"Diastolic", "Systolic"}]], datalong[[All, 2]]}]]

To query the predictor, you could use:

p[<|"Diastolic" -> 35, "Systolic" -> 160|>]

This example is admittedly a poor one, because all the parameters are chosen at random.

Best wishes,

Marco

PS: Note that for p I strip the "Systolic" / "Diastolic" key names and train on plain lists of values. This version keeps the associations:

p2 = Predict[Rule @@@ Transpose[{datalong[[All, 1, {"Diastolic", "Systolic"}]], datalong[[All, 2]]}]]
p2[<|"Diastolic" -> 35, "Systolic" -> 160|>]

This version appears to give slightly different results.
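A minimal sketch of a more compact way to build an association-based training set like the one for p2, assuming Wolfram Language 10 or later: instead of the Rule @@@ Transpose construction, KeyTake is applied to the input side of every example via MapAt. Because the babycenter import may no longer return the same table, `sample` is a hypothetical stand-in with synthetic names; `sample`, `training`, and `p3` are illustrative names, not part of the original post. The data are random, so the prediction will differ from run to run.

```mathematica
(* Hypothetical stand-in for datalong: same record shape, synthetic names
   instead of the imported list of baby names *)
sample = Table[
   <|"ID" -> RandomInteger[20000], "Name" -> "Subject" <> ToString[i],
     "Height" -> RandomVariate[NormalDistribution[1.7, 0.2]],
     "Systolic" -> RandomInteger[{105, 160}],
     "Diastolic" -> RandomInteger[{65, 95}]|> -> RandomInteger[{60, 99}],
   {i, 200}];

(* Keep only the two blood-pressure keys on the input side (part 1)
   of every record -> outcome rule; outcomes are left untouched *)
training = MapAt[KeyTake[{"Diastolic", "Systolic"}], sample, {All, 1}];

(* The inputs are associations, so the predictor is queried with one too *)
p3 = Predict[training];
p3[<|"Diastolic" -> 35, "Systolic" -> 160|>]
```

Whitelisting with KeyTake suits the case where only a few attributes matter; when most attributes are useful and only a few are identifiers, KeyDrop with the identifier keys does the same job from the other side.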

POSTED BY: Marco Thiel
Posted 9 years ago

Hi Jesse,

This is a made-up example of some training data, much simpler than what I am actually working with:

data = {
   <| "ID" -> 12345, "Name" -> "Alice", "Height" -> 1.62, 
     "Systolic" -> 122, "Diastolic" -> 91 |> -> 78,
   <| "ID" -> 12346, "Name" -> "Bob", "Height" -> 1.73, 
     "Systolic" -> 131, "Diastolic" -> 92 |> -> 89,
   <| "ID" -> 12347, "Name" -> "Carole", "Height" -> 1.67, 
     "Systolic" -> 118, "Diastolic" -> 79 |> -> 92
   };

One can imagine this data coming from a database. ID and Name are completely irrelevant to Classify[]'s task, but I would rather keep them with each record than strip them out. Once the classifier is built, I might throw a thousand such records at it, without the outcomes, to see what Classify predicts, then sort by its predictions and look for interesting cases.

Keeping ID and Name in the records would make this a lot easier.
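A minimal sketch of the workflow described above, assuming Wolfram Language 10 or later and no built-in Classify option for ignoring keys: the classifier is trained on copies of the records with the identifier keys removed by KeyDrop, and each new record is scored by passing a stripped copy to the classifier while the prediction is attached to the full record, so ID and Name survive the sort. `idKeys`, `newRecords`, `scored`, `ranked`, and the "Predicted" key are hypothetical names for illustration, and the two unlabeled records are made up.

```mathematica
(* Hypothetical list of identifier keys Classify should never see *)
idKeys = {"ID", "Name"};

(* Toy training data from this thread: record -> outcome *)
data = {
   <|"ID" -> 12345, "Name" -> "Alice", "Height" -> 1.62, "Systolic" -> 122, "Diastolic" -> 91|> -> 78,
   <|"ID" -> 12346, "Name" -> "Bob", "Height" -> 1.73, "Systolic" -> 131, "Diastolic" -> 92|> -> 89,
   <|"ID" -> 12347, "Name" -> "Carole", "Height" -> 1.67, "Systolic" -> 118, "Diastolic" -> 79|> -> 92
   };

(* Train on stripped copies; data itself still carries ID and Name *)
c = Classify[(KeyDrop[#[[1]], idKeys] -> #[[2]]) & /@ data];

(* Made-up unlabeled records, e.g. fresh rows from the database *)
newRecords = {
   <|"ID" -> 20001, "Name" -> "Dave", "Height" -> 1.80, "Systolic" -> 125, "Diastolic" -> 85|>,
   <|"ID" -> 20002, "Name" -> "Erin", "Height" -> 1.58, "Systolic" -> 140, "Diastolic" -> 90|>
   };

(* Classify sees only the stripped copy; the prediction goes on the full record *)
scored = Append[#, "Predicted" -> c[KeyDrop[#, idKeys]]] & /@ newRecords;

(* Sort by prediction; ID and Name travel with each row *)
ranked = SortBy[scored, #["Predicted"] &]
```

The design choice here is to strip on the fly rather than keep two copies of the dataset: the identifiers never leave the original records, and only the argument handed to Classify (or to the classifier) is reduced.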

POSTED BY: Brad Varey

Can you provide an example of data you want to input to Classify?

POSTED BY: Jesse Friedman