String Manipulation and Calculations Referencing Word Dictionaries

Posted 10 years ago

Hello,

Suppose I have a table in which one column holds a date and the other holds a headline (as a list of words), like so:

 2015-08-21 {Spain,gains,balloons,tomorrow}
 2015-08-21 {England,loses,rocks,after,rain}
 2015-08-21 {Brazil,gets,bananas,and,abandons,red,spinach}    
 2015-08-21 {Israel,begins,sleeping,but,loses,race}
 2015-08-21 {Japan,fights,at,noon,might,win,large,gains}
 2015-08-21 {Russia,rejects,koala,adopts,on,lonely,tiger}

Next, suppose we have created two word dictionaries, like so:

negativeWords={EXCLUDE,FIGHTS,LONELY,LOSES,REJECT,TIGER}
positiveWords={ADOPT,BALLOON,BANANA,EMBRACE,GAIN,JOY}

I would like to re-create the first table and add:

  • a third column that shows the number of positive words in each headline
  • a fourth column that shows the number of negative words in each headline
  • a final column that shows the number of positive words minus the number of negative words.

In this example, I would like the final table to look like this:

 2015-08-21 {Spain,gains,balloons,tomorrow}                 2  0   2
 2015-08-21 {England,loses,rocks,after,rain}                0  1  -1
 2015-08-21 {Brazil,gets,bananas,and,abandons,red,spinach}  1  1   0
 2015-08-21 {Israel,begins,sleeping,but,loses,race}         0  1  -1
 2015-08-21 {Japan,fights,at,noon,might,win,large,gains}    1  1   0
 2015-08-21 {Russia,rejects,koala,adopts,on,lonely,tiger}   1  3  -2

So far I have managed to create a table in which each headline is reduced to only the words that appear in the negative dictionary. I can then take a single headline, count those words with WordCount, and total them up. Repeating this process with the positive dictionary gives me the positive count, and from the two I can compute positives minus negatives. However, I am not proficient enough with Mathematica to apply a function like this to every headline and build a table like the one above.

Any help would be greatly appreciated.
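
One way the per-headline calculation could be applied to every row is sketched below. It is only a sketch under stated assumptions: the rows and dictionaries are entered as strings, the helper names countMatches and scoreRow are hypothetical, and a headline word is treated as a match when it begins with a dictionary entry (ignoring case), so that "gains" counts toward GAIN. That prefix rule is an assumption and not something stated in the question.

```mathematica
(* Sample rows from the question: {date, list of headline words}. *)
headlines = {
   {"2015-08-21", {"Spain", "gains", "balloons", "tomorrow"}},
   {"2015-08-21", {"England", "loses", "rocks", "after", "rain"}},
   {"2015-08-21", {"Brazil", "gets", "bananas", "and", "abandons", "red", "spinach"}},
   {"2015-08-21", {"Israel", "begins", "sleeping", "but", "loses", "race"}},
   {"2015-08-21", {"Japan", "fights", "at", "noon", "might", "win", "large", "gains"}},
   {"2015-08-21", {"Russia", "rejects", "koala", "adopts", "on", "lonely", "tiger"}}};

negativeWords = {"EXCLUDE", "FIGHTS", "LONELY", "LOSES", "REJECT", "TIGER"};
positiveWords = {"ADOPT", "BALLOON", "BANANA", "EMBRACE", "GAIN", "JOY"};

(* Hypothetical helper: count the words that begin with any dictionary
   entry, ignoring case. Prefix matching is an assumption. *)
countMatches[words_List, dict_List] :=
  Count[words,
   w_String /; StringStartsQ[w, Alternatives @@ dict, IgnoreCase -> True]];

(* Hypothetical helper: append the positive count, the negative count,
   and their difference to one {date, words} row. *)
scoreRow[{date_, words_List}] :=
  With[{pos = countMatches[words, positiveWords],
    neg = countMatches[words, negativeWords]},
   {date, words, pos, neg, pos - neg}];

(* Map over every row instead of writing an explicit loop. *)
scored = scoreRow /@ headlines;
Grid[scored, Alignment -> Left]
```

Mapping with /@ takes the place of a loop: scoreRow is applied once to each row and the results are collected into a new table. Note that, with the dictionaries as listed, no word in the Brazil headline begins with an entry in negativeWords, so this sketch would report 0 negatives for that row unless the dictionary is extended.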

4 Replies
POSTED BY: Jesse Friedman

Sorry, I assumed your data would be in a computer-readable format. Try this:

wordsByDate4= 
 Dataset[Function[<|"date" -> DateObject[#[[1]]], 
       "words" -> ToString /@ ToExpression[#[[2]]]|>][
     StringSplit[#, " "]] & /@ 
   StringSplit[StringReplace["2015-08-21 {Spain,gains,balloons,tomorrow}
               2015-08-21 (England,loses,rocks,after,rain)
               2015-08-21 {Brazil,gets,bananas,and,abandons,red,spinach}    
               2015-08-21 {Israel,begins,sleeping,but,loses,race}
               2015-08-21 {Japan,fights,at,noon,might,win,large,gains}
               2015-08-21 {Russia,rejects,koala,adopts,on,lonely,tiger}",{"("->"{",")"->"}"}], 
    "\n"]]
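
As a possible alternative to normalizing the brackets and then calling ToExpression, the lines could be parsed purely as strings. The sketch below assumes each line has the form "date", whitespace, then a comma-separated word list wrapped in braces or parentheses, with no spaces inside the list. The names parseLine, raw, and wordsByDate are hypothetical, and only two of the rows are shown.

```mathematica
(* Hypothetical helper: parse one "date {w1,w2,...}" line without
   evaluating its contents as Wolfram Language code. *)
parseLine[line_String] :=
  Module[{parts},
   (* Split on the first run of whitespace only: date, then word list. *)
   parts = StringSplit[StringTrim[line], Whitespace, 2];
   <|"date" -> DateObject[First[parts]],
    (* Strip leading/trailing braces or parentheses, then split on commas. *)
    "words" ->
     StringSplit[StringTrim[Last[parts], ("{" | "}" | "(" | ")") ..], ","]|>];

raw = "2015-08-21 {Spain,gains,balloons,tomorrow}
   2015-08-21 (England,loses,rocks,after,rain)";

wordsByDate = Dataset[parseLine /@ StringSplit[raw, "\n"]]
```

Because the words never pass through ToExpression, they stay as plain strings and the kind of bracket used on a given line no longer matters.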
POSTED BY: Jesse Friedman

Perfect, thanks again.

Thanks for the reply, Jesse. It all works great apart from one hiccup I haven't been able to overcome.

ToExpression throws an error when it encounters a headline wrapped in parentheses instead of braces, as in the second line here:

wordsByDate4= 
 Dataset[Function[<|"date" -> DateObject[#[[1]]], 
       "words" -> ToString /@ ToExpression[#[[2]]]|>][
     StringSplit[#, " "]] & /@ 
   StringSplit["2015-08-21 {Spain,gains,balloons,tomorrow}
        2015-08-21 (England,loses,rocks,after,rain)
        2015-08-21 {Brazil,gets,bananas,and,abandons,red,spinach}    
        2015-08-21 {Israel,begins,sleeping,but,loses,race}
        2015-08-21 {Japan,fights,at,noon,might,win,large,gains}
        2015-08-21 {Russia,rejects,koala,adopts,on,lonely,tiger}", 
    "\n"]]

Do you have any suggestions?
