How To Classify Databin Data - Auto Transformation

I have this data:

{<|"Request For Intent" -> "Exactly why are you text messaging me?"|>,
 <|"Request For Intent" -> "Why are you texting me?"|>,
 <|"Request For Intent" -> "How come you text messaging me?"|>,
 <|"Request For Intent" -> "Why are you sending text messages me?"|>,
 <|"Request For Intent" -> "Why are you text messaging me?"|>,
 <|"Request For Intent" -> "Exactly why are you sending text messages me?"|>}

After I store each line in DataDrop and view the data, it has been transformed into:

 {<|"Request For Intent" -> {"Exactly why are you text messaging me?",
"Why are you texting me?", "How come you text messaging me?",
"Why are you sending text messages me?",
"Why are you text messaging me?",
"Exactly why are you sending text messages me?"}|>}

When I try to run Classify on the data, it fails because it's not in the correct format. Classify wants:

<|"Exactly why are you text messaging me?" ->"Request For Intent"|>

However, that format cannot work in a Databin. If you add more data in the exact order Classify wants, and your data has similar features, the Databin starts duplicating the second item, because it thinks the second items are all Values belonging to one Key, like:

<|"Exactly why are you text messaging me?" -> {"Request For Intent", "Request For Intent"}|>

So, when I pull the data out of the Databin to run Classify, I have problems. I tried Map and a few other approaches to get the data back into the format Classify needs, but I can't figure it out. I have read every tutorial and help doc I could find over the last 2 days. It seems like machine learning on IoT data would be super important, so I expected lots of tutorials on making IoT data useful. Any help would be great.

I really wish the Classify function could run directly on the Databin without my needing to worry about how the data is ordered. The Databin knows what the Keys and the Values are. Why force the user to do anything? Just assume the Key is the Label and the Values are the Samples for the purposes of Classify. Either that, or let me decide the order of the data I enter into the Databin and don't transform it, so that when I pull the data out, it is the same as when I put it in. Then I could just use Reverse before putting the data in or after taking it out.

Until that works out of the box, any help would be appreciated. I can't move forward if I can't run Classify on DataDrop data.

POSTED BY: David Johnston

I think I figured it out. Normal is not always Normal and neither is Values.

Solution for pulling out the data and getting it ready for Classification:

binValues = Apply[Reverse, Reverse[#]] & /@ Normal[Normal[bin]]

{"Exactly why are you text messaging me?" -> "Request For Intent", 
 "Why are you texting me?" -> "Request For Intent", 
 "How come you text messaging me?" -> "Request For Intent", 
 "Why are you sending text messages me?" -> "Request For Intent", 
 "Why are you text messaging me?" -> "Request For Intent", 
 "Exactly why are you sending text messages me?" -> 
  "Request For Intent"}
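For anyone who wants to follow the conversion step by step without a Databin, here is a minimal sketch. It assumes Normal[bin] returns a list of one-key associations like the ones in my question; the variable names entries, ruleLists and training are made up for illustration. Note that the one-liner above seems to rely on each entry having exactly one key, because Apply[Reverse, ...] hands the rules of an entry to Reverse as separate arguments. The version below flips every rule individually instead.

```wolfram
(* Hypothetical stand-in for Normal[bin]: a list of one-key associations *)
entries = {
   <|"Request For Intent" -> "Exactly why are you text messaging me?"|>,
   <|"Request For Intent" -> "Why are you texting me?"|>
   };

(* Step 1: turn each association into a list of rules, label -> text *)
ruleLists = Normal /@ entries;

(* Step 2: merge the rule lists and flip each rule to text -> label *)
training = Reverse /@ Flatten[ruleLists]
```

The result should be a flat list of "text" -> "Request For Intent" rules, the same shape as the output shown above.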

Then:

c = Classify[binValues]

So it works now. The trick is that you don't use:

Values[bin] or bin["Values"]; you use Normal[bin] instead, assuming "bin" is a defined variable representing your Databin.
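If you only have the grouped form from my question (one key with a list of samples), a sketch like the following might expand it into rules for Classify. The names grouped and training are hypothetical, and the data is a shortened copy of the example above. As far as I know, the Classify documentation also lists an association of the form class -> {examples} as valid input, so this step may not even be necessary; I have not verified that against Databin output.

```wolfram
(* Hypothetical grouped data: one label mapped to a list of sample texts *)
grouped = <|
   "Request For Intent" -> {
     "Why are you texting me?",
     "How come you text messaging me?"
     }
   |>;

(* For each label, pair every sample text with that label, then flatten *)
training = Flatten[
  KeyValueMap[
   Function[{label, texts}, Thread[texts -> label]],
   grouped
   ]
  ]
```

Thread pairs each text in the list with the single label, so the output is again a flat list of "text" -> "label" rules.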

POSTED BY: David Johnston