String Manipulation and Calculations Referencing Word Dictionaries

Posted 9 years ago

Hello,

Suppose I have a table with two columns: the first holds a date and the second holds a headline split into words, like so:

 2015-08-21 {Spain,gains,balloons,tomorrow}
 2015-08-21 {England,loses,rocks,after,rain}
 2015-08-21 {Brazil,gets,bananas,and,abandons,red,spinach}    
 2015-08-21 {Israel,begins,sleeping,but,loses,race}
 2015-08-21 {Japan,fights,at,noon,might,win,large,gains}
 2015-08-21 {Russia,rejects,koala,adopts,on,lonely,tiger}

Next, suppose we have created two word dictionaries like so,

negativeWords={EXCLUDE,FIGHTS,LONELY,LOSES,REJECT,TIGER}
positiveWords={ADOPT,BALLOON,BANANA,EMBRACE,GAIN,JOY}

I would like to re-create the first table and add:

  • a third column that shows the number of positive words in each headline
  • a fourth column that shows the number of negative words in each headline
  • a final column that shows the number of positive words minus the number of negative words.

In this example, I would like the final table to look like this,

 2015-08-21 {Spain,gains,balloons,tomorrow}                 2  0   2
 2015-08-21 {England,loses,rocks,after,rain}                0  1  -1
 2015-08-21 {Brazil,gets,bananas,and,abandons,red,spinach}  1  1   0
 2015-08-21 {Israel,begins,sleeping,but,loses,race}         0  1  -1
 2015-08-21 {Japan,fights,at,noon,might,win,large,gains}    1  1   0
 2015-08-21 {Russia,rejects,koala,adopts,on,lonely,tiger}   1  3  -2

So far I have managed to create a table in which each headline is reduced to only the words that appear in the negative dictionary. For a single headline I can then count those words with WordCount and total them. Repeating the process with the positive dictionary gives me the positive count, and from there the positive-minus-negative difference. However, I am not proficient enough with Mathematica to apply such a function to every headline and build a table like the one above.

Any help would be greatly appreciated.

4 Replies

First we'll define the dictionaries:

negativeWords = {"exclude", "fights", "lonely", "loses", "reject", 
   "tiger"};
positiveWords = {"adopt", "balloon", "banana", "embrace", "gain", 
   "joy"};

Then we'll extract the actual data from the table you posted. Since I assume your data is in a computer-readable format, you'll end up doing this a bit differently.

wordsByDate = 
 Dataset[Function[<|"date" -> DateObject[#[[1]]], 
       "words" -> ToString /@ ToExpression[#[[2]]]|>][
     StringSplit[#, " "]] & /@ 
   StringSplit["2015-08-21 {Spain,gains,balloons,tomorrow}
    2015-08-21 {England,loses,rocks,after,rain}
    2015-08-21 {Brazil,gets,bananas,and,abandons,red,spinach}    
    2015-08-21 {Israel,begins,sleeping,but,loses,race}
    2015-08-21 {Japan,fights,at,noon,might,win,large,gains}
    2015-08-21 {Russia,rejects,koala,adopts,on,lonely,tiger}", "\n"]]
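If your data is already held as Wolfram Language lists rather than as text, the string parsing can be skipped. The following is a minimal sketch under that assumption; `rawRows` is a hypothetical name, and the two rows are copied from the question.

```mathematica
(* Assumption: each row is already a {dateString, listOfWordStrings} pair. *)
(* rawRows is a hypothetical name used only for this illustration. *)
rawRows = {
   {"2015-08-21", {"Spain", "gains", "balloons", "tomorrow"}},
   {"2015-08-21", {"England", "loses", "rocks", "after", "rain"}}};

(* Build the same "date"/"words" associations as above, with no ToExpression *)
wordsByDate = 
 Dataset[<|"date" -> DateObject[#[[1]]], "words" -> #[[2]]|> & /@ rawRows]
```

The resulting Dataset has the same two keys as the parsed version, so the later steps should apply to it unchanged.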

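Next we can define a function that returns the positive count, the negative count, and the net sentiment (positive minus negative) for a list of words. It's easier to operate on a string than on a list of words, so we first join the list into a single lowercase string. Because StringCount matches substrings, a dictionary stem such as "gain" also matches "gains".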

sentimentList[words_List] := 
 Block[{lowercase = StringJoin[Riffle[ToLowerCase /@ words, " "]], 
   positive, negative, sentiment}, 
  positive = StringCount[lowercase, positiveWords]; 
  negative = StringCount[lowercase, negativeWords];
  sentiment = positive - negative;
  {"positive" -> positive, "negative" -> negative, 
   "sentiment" -> sentiment}]
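One thing to keep in mind: counting substrings in the joined string also picks up matches in the middle of a word, so "bargain" would count as a hit for "gain". If that matters for your data, a word-level count is one alternative. The sketch below is not part of the answer above; `countWords` is a hypothetical helper that counts words beginning with any dictionary entry.

```mathematica
(* Dictionary as defined earlier in the thread *)
positiveWords = {"adopt", "balloon", "banana", "embrace", "gain", "joy"};

(* Substring matching: "gain" is found inside "bargain" *)
StringCount["bargain", "gain"]   (* 1 *)

(* countWords is a hypothetical helper: it counts the words that START *)
(* with any dictionary entry, so "gains" matches but "bargain" does not. *)
countWords[words_List, dict_List] := 
 Count[ToLowerCase /@ words, 
  w_String /; StringStartsQ[w, Alternatives @@ dict]]

countWords[{"Spain", "gains", "balloons", "tomorrow"}, positiveWords]   (* 2 *)
countWords[{"bargain"}, positiveWords]   (* 0 *)
```

Prefix matching still treats the dictionary entries as stems, which is what the example in the question seems to expect ("gains" for GAIN, "rejects" for REJECT).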

Then we can execute sentimentList on each row of the table:

wordsByDate[All, Append[#, sentimentList[#[["words"]]]] &]

Here's the result:

[Image: 5-column table of dates, word lists, and sentiments]
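If you would rather have plain rows in the layout shown in the question instead of a Dataset, the same counting can be mapped over an ordinary list. This is a minimal sketch assuming the input is a list of {date, words} pairs; `headlines` and `scoreRow` are hypothetical names.

```mathematica
(* Dictionaries as defined earlier in the thread *)
negativeWords = {"exclude", "fights", "lonely", "loses", "reject", "tiger"};
positiveWords = {"adopt", "balloon", "banana", "embrace", "gain", "joy"};

(* Hypothetical sample input: {date, words} pairs taken from the question *)
headlines = {
   {"2015-08-21", {"Spain", "gains", "balloons", "tomorrow"}},
   {"2015-08-21", {"England", "loses", "rocks", "after", "rain"}}};

(* scoreRow is an illustrative name: it appends the positive count, *)
(* the negative count, and their difference to a single row. *)
scoreRow[{date_, words_List}] := 
 Module[{text = StringJoin[Riffle[ToLowerCase /@ words, " "]], pos, neg},
  pos = StringCount[text, positiveWords];
  neg = StringCount[text, negativeWords];
  {date, words, pos, neg, pos - neg}]

(* Map over every headline, so no explicit loop is needed *)
Grid[scoreRow /@ headlines, Alignment -> Left]
```

For these two rows the appended columns come out as 2, 0, 2 and 0, 1, -1, matching the first two rows of the table in the question.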

POSTED BY: Jesse Friedman

Thanks for the reply, Jesse. It all works great besides a little hiccup I haven't been able to overcome.

ToExpression throws an error when it encounters a headline wrapped in parentheses instead of braces, as in the second row here:

wordsByDate4= 
 Dataset[Function[<|"date" -> DateObject[#[[1]]], 
       "words" -> ToString /@ ToExpression[#[[2]]]|>][
     StringSplit[#, " "]] & /@ 
   StringSplit["2015-08-21 {Spain,gains,balloons,tomorrow}
        2015-08-21 (England,loses,rocks,after,rain)
        2015-08-21 {Brazil,gets,bananas,and,abandons,red,spinach}    
        2015-08-21 {Israel,begins,sleeping,but,loses,race}
        2015-08-21 {Japan,fights,at,noon,might,win,large,gains}
        2015-08-21 {Russia,rejects,koala,adopts,on,lonely,tiger}", 
    "\n"]]

Do you have any suggestions?

Sorry, I assumed your data would be in a computer-readable format. Try this:

wordsByDate4= 
 Dataset[Function[<|"date" -> DateObject[#[[1]]], 
       "words" -> ToString /@ ToExpression[#[[2]]]|>][
     StringSplit[#, " "]] & /@ 
   StringSplit[StringReplace["2015-08-21 {Spain,gains,balloons,tomorrow}
               2015-08-21 (England,loses,rocks,after,rain)
               2015-08-21 {Brazil,gets,bananas,and,abandons,red,spinach}    
               2015-08-21 {Israel,begins,sleeping,but,loses,race}
               2015-08-21 {Japan,fights,at,noon,might,win,large,gains}
               2015-08-21 {Russia,rejects,koala,adopts,on,lonely,tiger}",{"("->"{",")"->"}"}], 
    "\n"]]
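If the text turns out to be messier still, another option is to avoid ToExpression altogether and treat each line purely as a string. ToExpression parses the words as Wolfram Language code, so a word containing a hyphen or an apostrophe would not survive intact. The sketch below is an alternative, not the approach above; `parseLine` and `text` are hypothetical names, and it assumes each line is a date, whitespace, and then a comma-separated word list in braces or parentheses.

```mathematica
(* parseLine is a hypothetical helper: it splits a line into the date and *)
(* the bracketed word list, strips any braces or parentheses, and splits *)
(* the remainder on commas. Nothing is evaluated as code. *)
parseLine[line_String] := 
 Module[{date, rest},
  {date, rest} = StringSplit[StringTrim[line], Whitespace, 2];
  <|"date" -> DateObject[date], 
   "words" -> 
    StringSplit[StringReplace[rest, ("{" | "}" | "(" | ")") -> ""], ","]|>]

(* Hypothetical two-line sample mixing both bracket styles *)
text = "2015-08-21 {Spain,gains,balloons,tomorrow}\n2015-08-21 (England,loses,rocks,after,rain)";

wordsByDate4 = Dataset[parseLine /@ StringSplit[text, "\n"]]
```

Because the words stay strings throughout, the ToString step is no longer needed either.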
POSTED BY: Jesse Friedman

Perfect, thanks again.
