String Manipulation and Calculations Referencing Word Dictionaries

Posted 9 years ago
 Hello,

Suppose I have a table in which one column holds a headline and the other holds its date, like so:

2015-08-21 {Spain,gains,balloons,tomorrow}
2015-08-21 {England,loses,rocks,after,rain}
2015-08-21 {Brazil,gets,bananas,and,abandons,red,spinach}
2015-08-21 {Israel,begins,sleeping,but,loses,race}
2015-08-21 {Japan,fights,at,noon,might,win,large,gains}
2015-08-21 {Russia,rejects,koala,adopts,on,lonely,tiger}

Next, suppose we have created two word dictionaries:

negativeWords={EXCLUDE,FIGHTS,LONELY,LOSES,REJECT,TIGER}
positiveWords={ADOPT,BALLOON,BANANA,EMBRACE,GAIN,JOY}

I would like to re-create the first table and add:

- a third column that shows the number of positive words in each headline
- a fourth column that shows the number of negative words in each headline
- a final column that shows the number of positive words minus the number of negative words

In this example, I would like the final table to look like this:

2015-08-21 {Spain,gains,balloons,tomorrow} 2 0 2
2015-08-21 {England,loses,rocks,after,rain} 0 1 -1
2015-08-21 {Brazil,gets,bananas,and,abandons,red,spinach} 1 1 0
2015-08-21 {Israel,begins,sleeping,but,loses,race} 0 1 -1
2015-08-21 {Japan,fights,at,noon,might,win,large,gains} 1 1 0
2015-08-21 {Russia,rejects,koala,adopts,on,lonely,tiger} 1 3 -2

So far I have managed to create a table in which each headline is cut down to only the words that appear in the negative dictionary. I can then take a single headline, count its words with WordCount, and total them up. Repeating this with the positive dictionary gives me the positives minus the negatives. I am not proficient enough with Mathematica to loop a function that does this for every headline and builds a table like the one above.

Any help would be greatly appreciated.
Posted 9 years ago
 First we'll define the dictionaries:

negativeWords = {"exclude", "fights", "lonely", "loses", "reject", "tiger"};
positiveWords = {"adopt", "balloon", "banana", "embrace", "gain", "joy"};

Then we'll extract the actual data from the table you posted. Since I assume your data is in a computer-readable format, you'll end up doing this a bit differently.

wordsByDate = Dataset[
  Function[<|"date" -> DateObject[#[[1]]],
      "words" -> ToString /@ ToExpression[#[[2]]]|>][StringSplit[#, " "]] & /@
    StringSplit["2015-08-21 {Spain,gains,balloons,tomorrow}
2015-08-21 {England,loses,rocks,after,rain}
2015-08-21 {Brazil,gets,bananas,and,abandons,red,spinach}
2015-08-21 {Israel,begins,sleeping,but,loses,race}
2015-08-21 {Japan,fights,at,noon,might,win,large,gains}
2015-08-21 {Russia,rejects,koala,adopts,on,lonely,tiger}", "\n"]]

Next we can define a function that, given a list of words, creates a list of the positive count, the negative count, and the net sentiment. StringCount operates on a string rather than a list of words, so we first join the list into a single lowercase string.

sentimentList[words_List] :=
  Block[{lowercase = StringJoin[Riffle[ToLowerCase /@ words, " "]],
     positive, negative, sentiment},
   positive = StringCount[lowercase, positiveWords];
   negative = StringCount[lowercase, negativeWords];
   sentiment = positive - negative;
   {"positive" -> positive, "negative" -> negative, "sentiment" -> sentiment}]

Then we can execute sentimentList on each row of the table:

wordsByDate[All, Append[#, sentimentList[#[["words"]]]] &]

Here's the result: [image of the resulting Dataset]
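A note on how the counting in this answer behaves: StringCount with a list of patterns counts substring occurrences, which is why the dictionary entry "gain" matches "gains" and "reject" matches "rejects". The same behavior means a word such as "regain" would also be counted as a hit for "gain". The following is a minimal sketch for a single headline, reusing the dictionaries from the answer; the names headline and text are only illustrative, and StringRiffle is swapped in for the StringJoin/Riffle combination.

```mathematica
negativeWords = {"exclude", "fights", "lonely", "loses", "reject", "tiger"};
positiveWords = {"adopt", "balloon", "banana", "embrace", "gain", "joy"};

(* Illustrative names, not from the original answer *)
headline = {"Russia", "rejects", "koala", "adopts", "on", "lonely", "tiger"};
text = ToLowerCase[StringRiffle[headline, " "]];

(* Substring matches: "adopt" is found inside "adopts" *)
StringCount[text, positiveWords]

(* Substring matches: "reject" (inside "rejects"), "lonely", "tiger" *)
StringCount[text, negativeWords]
```

If whole-word matching is wanted instead, one option would be to compare the lowercase word list against the dictionaries directly, at the cost of losing the plural and verb-form matches that the example table relies on.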
Posted 9 years ago
 Sorry, I assumed your data would be in a computer-readable format. Try this, which replaces the parentheses with braces before parsing:

wordsByDate4 = Dataset[
  Function[<|"date" -> DateObject[#[[1]]],
      "words" -> ToString /@ ToExpression[#[[2]]]|>][StringSplit[#, " "]] & /@
    StringSplit[StringReplace["2015-08-21 {Spain,gains,balloons,tomorrow}
2015-08-21 (England,loses,rocks,after,rain)
2015-08-21 {Brazil,gets,bananas,and,abandons,red,spinach}
2015-08-21 {Israel,begins,sleeping,but,loses,race}
2015-08-21 {Japan,fights,at,noon,might,win,large,gains}
2015-08-21 {Russia,rejects,koala,adopts,on,lonely,tiger}",
      {"(" -> "{", ")" -> "}"}], "\n"]]
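As an alternative to normalizing the brackets, the ToExpression step could be avoided altogether by treating each row purely as text. This is only a sketch under the assumption that every row has the form "date, space, bracketed comma-separated words" and that no word contains a space or comma; parseLine, raw, and wordsByDate5 are hypothetical names, not part of the original answer.

```mathematica
(* Hypothetical helper: parse one "date {w1,w2,...}" or "date (w1,w2,...)" row *)
parseLine[line_String] := Module[{date, rest},
  (* Split only at the first space: date on the left, bracketed words on the right *)
  {date, rest} = StringSplit[line, " ", 2];
  <|"date" -> DateObject[date],
    (* Trim one bracket of either kind from each end, then split on commas *)
    "words" -> StringSplit[StringTrim[rest, "{" | "}" | "(" | ")"], ","]|>]

raw = "2015-08-21 {Spain,gains,balloons,tomorrow}
2015-08-21 (England,loses,rocks,after,rain)";

wordsByDate5 = Dataset[parseLine /@ StringSplit[raw, "\n"]]
```

Because the words never pass through ToExpression, they are not turned into symbols first, so a headline word containing a hyphen or apostrophe would stay as text rather than being parsed as an expression.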
Posted 9 years ago
 Thanks for the reply, Jesse. It all works great apart from one hiccup I haven't been able to overcome. ToExpression throws an error when it encounters a headline wrapped in parentheses, like this:

wordsByDate4 = Dataset[
  Function[<|"date" -> DateObject[#[[1]]],
      "words" -> ToString /@ ToExpression[#[[2]]]|>][StringSplit[#, " "]] & /@
    StringSplit["2015-08-21 {Spain,gains,balloons,tomorrow}
2015-08-21 (England,loses,rocks,after,rain)
2015-08-21 {Brazil,gets,bananas,and,abandons,red,spinach}
2015-08-21 {Israel,begins,sleeping,but,loses,race}
2015-08-21 {Japan,fights,at,noon,might,win,large,gains}
2015-08-21 {Russia,rejects,koala,adopts,on,lonely,tiger}", "\n"]]

Do you have any suggestions?
Posted 9 years ago
 Perfect, thanks again.