Using text (not images) in neural networks - Mathematica 11?

After searching for a few hours, I can't find any examples of text-based Neural Networks. Image processing is great and all but NLP is key to what I am building.

I tried several variations of what I could create with "NetEncoder" and "NetDecoder" to get NLP working, but was unsuccessful. I also tried leaving them out altogether, hoping the automatic behavior would kick in and handle it for me.

I am assuming the data needs to be vectorized? I really like how Classify makes all this so much easier, but I cannot tolerate the training time. I must find a way to make the new NNs work for NLP because TargetDevice -> "GPU" is so much faster!

You can see the error I get at the bottom. I don't know how to create the NN configuration so it will work on text.

Here is my code:

dataSF = Import["E:\\Downloads\\Spam-Filter-CSV2.csv", "CSV"];

In[21]:= assocSF = Apply[Rule, dataSF, {1}];

In[22]:= trainLength = Round[Length[assocSF]*.3]
testLength = Round[Length[assocSF]*.7] + 1

Out[22]= 28923

Out[23]= 67487

In[24]:= trainingData = Take[assocSF, ;; trainLength];
testData = Take[assocSF, testLength ;;];

In[39]:= testData = RandomSample[trainingData, 10]

Out[39]= {"i can assist with mor info what about you" -> "NEUTRAL", 
 "i am just fired up to have received the txt these days" -> "OPTIN", 
 "how do your building reach your firm" -> "COMMERCIAL", 
 "i am just grateful for obtaining ur txts its remarkable" -> 
  "NEUTRAL", "hey there" -> "NEUTRAL", 
 "avoid the company texting" -> "OPTOUT", 
 "aww thanks swthart lov you as wll" -> "PERSONAL", 
 "i am just fired up to have attained that txt now" -> "OPTIN", 
 "cannot wait to see ya either" -> "PERSONAL", 
 "fairly great what about you" -> "NEUTRAL"}

neuralNet = 
 NetChain[{ConvolutionLayer[20, 5], Ramp, PoolingLayer[2, 2], 
   ConvolutionLayer[50, 5], Ramp, PoolingLayer[2, 2], FlattenLayer[], 
   500, Ramp, 10, SoftmaxLayer[]}]

trainedNN = 
 NetTrain[neuralNet, trainingData, ValidationSet -> testData, 
  MaxTrainingRounds -> 3, TargetDevice -> "GPU"]

Out[38]= Failure[NetTrain, 
Association[
 "MessageTemplate" :> MessageName[NetTrain, "invnet"], 
  "MessageParameters" :> {"First argument to NetTrain should be a fully specified net"}]]
POSTED BY: David Johnston
3 Replies
POSTED BY: David Johnston

Oh. Okay. Thanks for clearing that up. At least I am not crazy. lol

Well, hm... Hopefully they just add TargetDevice -> "GPU" to Classify, Predict, Cluster, etc., so that no one accustomed to those functions has to worry about the new complexities of NetTrain and NetGraph.

The training time difference is almost 10 to 1. If training takes a day on the NVIDIA 660 GPU built into my Alienware X51 PC, it would take over a week on the CPU. That is a very big deal.

In fact, it should probably just auto-detect a compatible GPU, attempt to use it, and fall back to the CPU if that fails. I don't see any big benefit in expecting users to specify the GPU. They all want fast by default, but would probably want the reverse, TargetDevice -> "CPU", as an option.
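As a rough sketch of that fallback idea in user code (not a built-in feature): the hypothetical helper `withDeviceFallback` below takes a function that trains on a given device, tries "GPU" first, and retries on "CPU" if the result is a failure. It assumes NetTrain returns `$Failed` or a `Failure` object when the GPU cannot be used; `net` and `data` are placeholders for a fully specified net and its training data.

```mathematica
(* Hypothetical helper, not part of the Wolfram Language. *)
(* train is a function that takes a device name and returns the trained net. *)
withDeviceFallback[train_] := Module[{result},
  (* Try the GPU first; suppress messages from this attempt *)
  result = Quiet[train["GPU"]];
  (* Assumption: an unusable GPU shows up as $Failed or a Failure object *)
  If[FailureQ[result], train["CPU"], result]
]

(* Usage sketch: net and data are placeholders *)
(* trainedNN = withDeviceFallback[
     NetTrain[net, data, MaxTrainingRounds -> 3, TargetDevice -> #] &] *)
```

Note that any failure, not only a missing GPU, triggers the CPU retry, so a net that is invalid for other reasons will simply fail twice.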

I would also like to see the really cool visualization of training in progress that NetTrain has, but for Classify and Predict. There are many great features like this that users will probably expect to work the same way in any function that accepts Method->"NeuralNetwork".

Just my 2 cents. I am loving Mathematica 11 and believe Wolfram Language is the future!

POSTED BY: David Johnston

> After searching for a few hours, I can't find any examples of text-based Neural Networks. Image processing is great and all but NLP is key to what I am building.

This is not an accident: v11.0 is not very good for text. Proper text support was reserved for 11.1, which will:

1) have full RNN support

2) support variable-length sequences

3) have appropriate NetEncoders for text

4) generalize existing layers (add 1-D convolutions, make EmbeddingLayer accept sequences)

Regarding your example: without NetEncoders for text, it's up to you to convert the text to some appropriate tensor representation that can be fed into a conv net. The net obviously has no idea what to do with pure text.
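One possible way to do that conversion by hand, as a minimal sketch rather than a recommended encoding: map each message to a fixed-size one-hot matrix over characters. The names `alphabet`, `maxLen`, `charIndex`, and `encodeText` are hypothetical, and the choices of a lowercase a-z plus space alphabet and a length of 64 are assumptions made for illustration.

```mathematica
(* Assumed alphabet: lowercase letters plus space (27 symbols) *)
alphabet = Join[CharacterRange["a", "z"], {" "}];
(* Assumed fixed message length, in characters *)
maxLen = 64;
charIndex = AssociationThread[alphabet -> Range[Length[alphabet]]];

encodeText[s_String] := Module[{idx},
  (* Map characters to indices, dropping anything outside the alphabet *)
  idx = DeleteMissing[Lookup[charIndex, Characters[ToLowerCase[s]]]];
  (* Truncate long messages, zero-pad short ones *)
  idx = PadRight[Take[idx, UpTo[maxLen]], maxLen];
  (* One-hot rows; padding index 0 becomes an all-zero row *)
  Table[Boole[i == j], {i, idx}, {j, Length[alphabet]}]
]

Dimensions[encodeText["hey there"]]  (* {64, 27} *)

(* Apply to a list of text -> label rules such as trainingData *)
encodeRules[rules_List] := Map[encodeText[First[#]] -> Last[#] &, rules]
```

The conv net would still need an input size that matches this matrix, probably with an extra leading channel dimension such as `{encodeText[s]}`, and an output that matches the number of classes in the data; those parts are not shown here.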
