
String Manipulation and Calculations Referencing Word Dictionaries

Posted 10 years ago

Hello,

Suppose I have a table in which one column holds a date and the other holds a headline, split into words, like so:

 2015-08-21 {Spain,gains,balloons,tomorrow}
 2015-08-21 {England,loses,rocks,after,rain}
 2015-08-21 {Brazil,gets,bananas,and,abandons,red,spinach}    
 2015-08-21 {Israel,begins,sleeping,but,loses,race}
 2015-08-21 {Japan,fights,at,noon,might,win,large,gains}
 2015-08-21 {Russia,rejects,koala,adopts,on,lonely,tiger}

Next, suppose we have created two word dictionaries like so,

negativeWords={EXCLUDE,FIGHTS,LONELY,LOSES,REJECT,TIGER}
positiveWords={ADOPT,BALLOON,BANANA,EMBRACE,GAIN,JOY}

I would like to re-create the first table and add:

  • a third column that shows the number of positive words in each headline
  • a fourth column that shows the number of negative words in each headline
  • a final column that shows the number of positive words minus the number of negative words.

In this example, I would like the final table to look like this,

 2015-08-21 {Spain,gains,balloons,tomorrow}                 2  0   2
 2015-08-21 {England,loses,rocks,after,rain}                0  1  -1
 2015-08-21 {Brazil,gets,bananas,and,abandons,red,spinach}  1  0   1
 2015-08-21 {Israel,begins,sleeping,but,loses,race}         0  1  -1
 2015-08-21 {Japan,fights,at,noon,might,win,large,gains}    1  1   0
 2015-08-21 {Russia,rejects,koala,adopts,on,lonely,tiger}   1  3  -2

So far I have managed to create a table in which each headline is reduced to only the words found in the negative dictionary. I can then take a single headline, count those words with WordCount, and total them. Repeating this with the positive dictionary gives me the positive count, and from the two I can compute positives minus negatives. What I cannot work out is how to apply a function like this to every headline in Mathematica and build a table like the one above.

Any help would be greatly appreciated.

4 Replies

First we'll define the dictionaries:

negativeWords = {"exclude", "fights", "lonely", "loses", "reject", 
   "tiger"};
positiveWords = {"adopt", "balloon", "banana", "embrace", "gain", 
   "joy"};

Then we'll extract the actual data from the table you posted. I assume your real data is already in a computer-readable format, so you will probably load it a bit differently.

wordsByDate = 
 Dataset[Function[<|"date" -> DateObject[#[[1]]], 
       "words" -> ToString /@ ToExpression[#[[2]]]|>][
     StringSplit[#, " "]] & /@ 
   StringSplit["2015-08-21 {Spain,gains,balloons,tomorrow}
    2015-08-21 {England,loses,rocks,after,rain}
    2015-08-21 {Brazil,gets,bananas,and,abandons,red,spinach}    
    2015-08-21 {Israel,begins,sleeping,but,loses,race}
    2015-08-21 {Japan,fights,at,noon,might,win,large,gains}
    2015-08-21 {Russia,rejects,koala,adopts,on,lonely,tiger}", "\n"]]

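Next we can define a function that, given a list of words, returns the positive count, the negative count, and the net sentiment (positive minus negative). It's easier to operate on a single string than on a list of words, so we first join the list into one string. Note that StringCount matches substrings, which is why the stem "gain" also counts "gains".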

sentimentList[words_List] := 
 Block[{lowercase = StringJoin[Riffle[ToLowerCase /@ words, " "]], 
   positive, negative, sentiment}, 
  positive = StringCount[lowercase, positiveWords]; 
  negative = StringCount[lowercase, negativeWords];
  sentiment = positive - negative;
  {"positive" -> positive, "negative" -> negative, 
   "sentiment" -> sentiment}]
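One caveat with the substring approach: StringCount also counts a stem that sits inside an unrelated word ("gain" inside "again", "joy" inside "enjoy"). Below is a minimal sketch of a stricter variant, assuming a word should count only when it begins with a dictionary stem. The name countStems is hypothetical and not part of the answer above, and StringStartsQ needs Mathematica 10.1 or later.

```wolfram
positiveWords = {"adopt", "balloon", "banana", "embrace", "gain", "joy"};

(* Hypothetical helper: count the words that begin with any stem in the list *)
countStems[words_List, stems_List] := 
 Count[StringStartsQ[ToLowerCase /@ words, Alternatives @@ stems], True]

countStems[{"Spain", "gains", "balloons", "tomorrow"}, positiveWords]  (* 2 *)
countStems[{"again", "enjoy"}, positiveWords]                          (* 0 *)
StringCount["again enjoy", positiveWords]                              (* 2, both are substring matches *)
```

Whether the stricter rule is wanted depends on the dictionaries; for the six sample headlines both approaches give the same counts.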

Then we can execute sentimentList on each row of the table:

wordsByDate[All, Append[#, sentimentList[#[["words"]]]] &]

Here's the result:

[Image: 5-column table of dates, word lists, and sentiments]
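If the data is a plain list of {date, words} pairs rather than a Dataset, the same per-row step can be sketched with Map, which is the usual replacement for an explicit loop. This is only an illustration: the rows list and the name scoreRow are made up for the example, and the dictionaries are repeated so the snippet runs on its own.

```wolfram
negativeWords = {"exclude", "fights", "lonely", "loses", "reject", "tiger"};
positiveWords = {"adopt", "balloon", "banana", "embrace", "gain", "joy"};

(* Example rows: a hypothetical stand-in for the real table *)
rows = {
   {"2015-08-21", {"Spain", "gains", "balloons", "tomorrow"}},
   {"2015-08-21", {"England", "loses", "rocks", "after", "rain"}}};

(* Hypothetical helper: append positive, negative, and net counts to one row *)
scoreRow[{date_, words_List}] := 
 Module[{text = ToLowerCase[StringJoin[Riffle[words, " "]]], pos, neg},
  pos = StringCount[text, positiveWords];
  neg = StringCount[text, negativeWords];
  {date, words, pos, neg, pos - neg}]

(* Map (/@) applies scoreRow to every row; Grid displays the result as a table *)
scored = scoreRow /@ rows;
Grid[scored]
```

For these two rows, scored ends in {2, 0, 2} and {0, 1, -1}, matching the first two lines of the requested table.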

POSTED BY: Jesse Friedman

Sorry, I assumed your data would be in a computer-readable format. Try this:

(* Normalize parentheses to braces so ToExpression can parse every headline as a list *)
wordsByDate4 = 
 Dataset[Function[<|"date" -> DateObject[#[[1]]], 
       "words" -> ToString /@ ToExpression[#[[2]]]|>][
     StringSplit[#, " "]] & /@ 
   StringSplit[StringReplace["2015-08-21 {Spain,gains,balloons,tomorrow}
               2015-08-21 (England,loses,rocks,after,rain)
               2015-08-21 {Brazil,gets,bananas,and,abandons,red,spinach}    
               2015-08-21 {Israel,begins,sleeping,but,loses,race}
               2015-08-21 {Japan,fights,at,noon,might,win,large,gains}
               2015-08-21 {Russia,rejects,koala,adopts,on,lonely,tiger}",{"("->"{",")"->"}"}], 
    "\n"]]
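A possible alternative, sketched here under the assumption that each line is "date", whitespace, then a comma-separated word list in braces or parentheses, is to skip ToExpression and split the text directly. ToExpression turns every word into a symbol, so a word with a built-in value (Now, for instance) would be evaluated, and a hyphenated word would be parsed as a subtraction. The name parseRow is hypothetical.

```wolfram
(* Hypothetical helper: parse one "date {w1,w2,...}" line without ToExpression *)
parseRow[line_String] := 
 Module[{parts = StringSplit[line]},  (* split on whitespace *)
  <|"date" -> DateObject[parts[[1]]], 
   "words" -> StringSplit[StringTrim[parts[[2]], "{" | "}" | "(" | ")"], ","]|>]

parseRow["2015-08-21 (England,loses,rocks,after,rain)"]["words"]
(* {"England", "loses", "rocks", "after", "rain"} *)
```

Mapping parseRow over the lines of the input and wrapping the result in Dataset should give the same shape as wordsByDate4, with the words kept as strings throughout.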
POSTED BY: Jesse Friedman

Thanks for the reply, Jesse. It all works great besides a little hiccup I haven't been able to overcome.

ToExpression throws an error when it encounters a headline wrapped in parentheses instead of braces, as in the second row here:

wordsByDate4= 
 Dataset[Function[<|"date" -> DateObject[#[[1]]], 
       "words" -> ToString /@ ToExpression[#[[2]]]|>][
     StringSplit[#, " "]] & /@ 
   StringSplit["2015-08-21 {Spain,gains,balloons,tomorrow}
        2015-08-21 (England,loses,rocks,after,rain)
        2015-08-21 {Brazil,gets,bananas,and,abandons,red,spinach}    
        2015-08-21 {Israel,begins,sleeping,but,loses,race}
        2015-08-21 {Japan,fights,at,noon,might,win,large,gains}
        2015-08-21 {Russia,rejects,koala,adopts,on,lonely,tiger}", 
    "\n"]]

Do you have any suggestions?

Perfect, thanks again.
