
Understanding sentiment analysis in Mathematica

I have a project to analyze some letters that Bertrand Russell, the logician and philosopher, wrote while imprisoned during 1918 for his writings about WWI.

As I understand the classifier, it uses words and phrases classified for sentiment, and predicts the sentiment of the word or phrase provided. In her book Artificial Intelligence, Melanie Mitchell discusses sentiment analysis, saying that the network is trained on human-labeled examples, learns useful features, and outputs a classification confidence.

I expect this is the method for the Mathematica classifier, which implies a large amount of training data. Is this the method Mathematica uses?

The data sources are important because language changes over time.

There are examples of using Mathematica to analyze sentiment in some of Shakespeare’s work from 400 years ago. The change in language and meaning doesn’t seem to affect the analysis.

Is the classifier suitable for text written in 1918?

POSTED BY: Dan O'Leary
15 Replies

Kacie,

I haven't finished this project because I was pulled in a different direction. I hope to restart work early in 2023.

I found the following, which might help you.

Tracking a Descent to Savagery with the Wolfram Language: Plotting Sentiment Analysis in Lord of the Flies https://blog.wolfram.com/2017/12/07/tracking-a-descent-to-savagery-with-the-wolfram-language-plotting-sentiment-analysis-in-lord-of-the-flies/

Analyzing Shakespeare’s Texts on the 400th Anniversary of His Death https://blog.wolfram.com/2016/04/21/analyzing-shakespeares-texts-on-the-400th-anniversary-of-his-death/

POSTED BY: Dan O'Leary

Hi Kacie,

Your question seems to be unrelated to this thread which is more about sentiment analysis. You should probably ask a new question.

Anyway, the structure and format of the text of Hamlet that you are using will dictate how to proceed with the analysis. Take a look at my answer to this question. It should help you get started. If you get stuck, ask a new question and include the code you have tried, and explain the problem you have.

POSTED BY: Rohit Namjoshi
Posted 1 year ago

I need to count the names of the characters and their number of letters in Shakespeare's "Hamlet", but even after looking at your advice, I don't understand how to do it. For the theoretical part of my assignment, I found https://samploon.com/free-essays/hamlet/ to get details and philosophical explanations of the relationship between Hamlet and Ophelia. However, in addition to these free essays, I need help with the programming exercise.

POSTED BY: Kacie Brown

Again, thank you to all for the information.

For the letters, I copy and paste the transcription into MS Word so I can edit them. This means deleting the annotation references, poetry, and long quotations. The poetry and quotations could affect the sentiment analysis. Then I copy and paste the edited transcript into a Mathematica notebook.

I have a notebook for each letter, so I just change the file name and the name of the text. The notebook then calculates a word cloud, sentiment analysis for the whole letter, a pie chart for the whole letter, identifies the sentences, plots the sentence positive sentiments (and moving averages), etc.

When I’ve finished all the letters, I’ll use TimelinePlot to show both sets of letters in order. I’ve not decided on what other information to present, so I enjoy the suggestions and references.
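A minimal sketch of what such a per-letter notebook might contain is given below. It is not the actual notebook: the variable names, the moving-average window, and the sample text (invented for illustration, not Russell's words) are all assumptions.

```wolfram
(* Placeholder text, invented for illustration; in practice this would be the edited transcript *)
letterText = "I am well and in good spirits. The days are long and dull. Your letter gave me great joy.";

(* Split the letter into sentences *)
sentences = TextSentences[letterText];

(* Probability of the "Positive" class for each sentence *)
positive = Classify["Sentiment", #, {"Probability", "Positive"}] & /@ sentences;

(* Smooth with a moving average; a window of 2 is arbitrary for this short example *)
smoothed = MovingAverage[positive, 2];

(* Word cloud of the letter with common stopwords removed *)
WordCloud[DeleteStopwords[letterText]]

(* Sentence-level positive probability alongside its moving average *)
ListLinePlot[{positive, smoothed},
 PlotLegends -> {"Positive probability", "Moving average"}]
```

For a real letter a wider window would probably give a more readable trend line.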

POSTED BY: Dan O'Leary

For those interested in recent approaches to sentiment and emotion mining of literary texts (and arguably one could include Russell's letters in this group), this survey provides an overview.

POSTED BY: Arno Bosse

Russell’s 1918 letters from Brixton Prison are available at https://russell-letters.mcmaster.ca/. For each letter there is an image of the original and an annotated transcription. I’m using the transcriptions of the letters in English (some are in French) after removing the references to the annotations. In some cases, Russell quotes passages of text and poems, which I also delete, since they might influence the overall sentiment analysis.

In addition, there is a database of Russell-related information at https://bracers.mcmaster.ca/. It doesn’t have transcriptions because of ownership and copyright issues. In the larger project I’m looking at social network analysis related to Russell for 1918. The database, today, returned 1167 entries for 1918.

I’m not sure what to use as training data for a classifier. The general consensus is that the writing style in letters is not the same as in published articles and books.

I considered classifying each sentence and letter by hand, but I’m concerned it would be too subjective. Also, since it would take some time, I’m concerned about being consistent.

I like the objectivity from the Mathematica classifier, but given some of the points raised, I may need to adjust some of the classifications.

POSTED BY: Dan O'Leary

Thanks for that information. Looks like a nice repository but I cannot seem to import the texts of the letters, or even the web site pages, into Mathematica.

POSTED BY: Daniel Lichtblau
Posted 2 years ago

Daniel,

This works reasonably well on the few that I tried. Will likely need adjusting if the format is different in some of the transcripts. Also does not remove footnote numbers from the text. Will have to parse the HTML to do that.

brixtonLetter[number_] := Module[{plaintext, start, end},
  (* Download the letter page as plain text *)
  plaintext = 
   Import["https://russell-letters.mcmaster.ca/brixton-letter-" <> ToString@number, "Plaintext"];
  (* Keep the text between the first "Original" and the first "Notes" *)
  start = StringPosition[plaintext, "Original"][[1, 2]] + 2;
  end = StringPosition[plaintext, "Notes"][[1, 1]] - 1;
  StringTake[plaintext, {start, end}]
  ]

To get the transcript for letter 20:

brixtonLetter[20]
POSTED BY: Rohit Namjoshi

Thank you Rohit. It seems I may have a connectivity problem:

Import::nfurl: Unable to retreive data from https://russell-letters.mcmaster.ca/brixton-letter-20. Consult Internal`$LastInternalFailure for potential information.

Might be a permissions issue. I need to look into this.

POSTED BY: Daniel Lichtblau
Posted 2 years ago

Daniel,

I was using "12.3.1 for Mac OS X ARM (64-bit) (July 8, 2021)". Tried it on 13.0 PR3 "13.0.0 for Mac OS X ARM (64-bit) (November 12, 2021)" and got the same failure.

Looks like a bug in 13.0.

POSTED BY: Rohit Namjoshi

Yes, this sounds interesting, please keep going. However, you should be skeptical toward whatever the AI outputs, because it's guaranteed not to understand all the nuances of human language, especially in letters from a jailed man to a free woman.

POSTED BY: Brad Klee

I also use sentiment analysis quite regularly. But there are some quirks you need to be aware of. E.g.

Classify["Sentiment", "The weather is horrible."]

gives "Negative" as expected.

Classify["Sentiment", "The weather is horrible!"]

gives "Positive". The exclamation mark has an interesting effect on nearly any statement:

Classify["Sentiment", "This day is the worst of my life and I am so sad!"]

is classified as "Positive". In many situations these issues can be mended though.
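One way to probe the exclamation-mark quirk described above is to look at the full class probabilities, and to compare them with a version of the text where the exclamation marks are replaced. This is only a sketch: `normalizePunctuation` is a hypothetical helper, not a built-in function, and replacing "!" with "." is an assumption that throws away whatever emphasis the writer intended.

```wolfram
(* Hypothetical helper: replace each run of "!" with a single period *)
normalizePunctuation[text_String] := StringReplace[text, "!" .. -> "."]

sample = "The weather is horrible!";

(* Full probability distribution for the original text *)
Classify["Sentiment", sample, "Probabilities"]

(* Same, after normalizing the punctuation *)
Classify["Sentiment", normalizePunctuation[sample], "Probabilities"]
```

Looking at the probabilities rather than only the top label shows how close the decision was, which helps in deciding whether a classification deserves a manual check.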

I think that there might be larger issues to consider than "only" a gradual change of language over time. Also, when analysing, for example, social media posts in real time during political debates, one would use rather different classifiers for, say, the US and the UK.

I would be quite interested in the analysis of Bertrand Russell's letters and would love to see the results.

Cheers from Scotland, Marco

PS: you can easily build your own classifier of course, using datasets like:

https://www.baeldung.com/cs/sentiment-analysis-training-data https://analyticsindiamag.com/10-popular-datasets-for-sentiment-analysis/ https://blog.cambridgespark.com/50-free-machine-learning-datasets-sentiment-analysis-b9388f79c124 https://www.kaggle.com/c/sentiment-analysis-evaluation/data https://www.kaggle.com/kazanova/sentiment140
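As a rough sketch of what building your own classifier looks like, the example below trains on a handful of labeled sentences. The training sentences are invented purely for illustration and are far too few to give a useful model; a real classifier would be trained on one of the datasets above, and `customSentiment` is just a name chosen for this example.

```wolfram
(* Toy training set, invented for illustration only *)
training = {
   "I am delighted by your letter" -> "Positive",
   "Today was a wonderful day" -> "Positive",
   "Your visit made me very happy" -> "Positive",
   "I feel miserable and tired" -> "Negative",
   "The news was dreadful" -> "Negative",
   "This place is cold and gloomy" -> "Negative"
   };

(* Train a classifier; the method is chosen automatically *)
customSentiment = Classify[training];

(* Class probabilities for a new sentence *)
customSentiment["Your letter was a great comfort", "Probabilities"]
```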

This might be interesting, too:

https://epjdatascience.springeropen.com/articles/10.1140/epjds/s13688-017-0121-9

It is also possible to use data from GDELT, which has newspaper articles and has data on how positive/negative the articles are. Not ideal for sentence by sentence analysis.

For a lecture I have "redone" the analysis here: https://www.jstor.org/stable/118328 based on the automatic classification in the Wolfram Language.

POSTED BY: Marco Thiel

This is an element of a larger project for social network analysis of Bertrand Russell’s correspondence in 1918, the year he went to prison for objections to WWI. The social media of the time was the written letter, and Russell was a prolific letter writer.

While in prison he wrote letters and smuggled them out in books he borrowed and returned. My interest here is in the letters he wrote to two women: Lady Constance Malleson, an author and actress, also known as Colette, and Lady Ottoline Morrell.

I expect that early in his prison term the letters were largely negative, but as the term came to an end, they would be more positive. I would also like to compare the sentiment between letters to the two correspondents.

I’m using tools from Mathematica. Many of the ideas came from the April 2016 post Analyzing Shakespeare’s Texts on the 400th Anniversary of His Death.

I certainly appreciate the ideas and look forward to exploring them. If all goes well, I’ll post the results here.

POSTED BY: Dan O'Leary

Is the data for this publicly available somewhere? Also I wonder what would be good training data for a classifier. Another question (or two): Were there so many letters that it is difficult to classify by hand? Or are there subtleties that might make classification by readers today difficult?

POSTED BY: Daniel Lichtblau
Posted 2 years ago

The method and training data used by Classify for sentiment analysis are not described anywhere in the documentation as far as I can tell. Apparently, it is trained on social media posts as mentioned here.

The NN repository has this sentiment model trained on Amazon product reviews.

In my experience both perform fairly well on contemporary text. You will just have to try it on your data and manually examine a subset of positive/neutral/negative classifications to convince yourself that it does a good enough job.
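A sketch of how such a manual spot check could be set up is below. The sentences are invented placeholders (in practice they would come from `TextSentences` applied to a letter), and the sample size of 2 per class is an arbitrary assumption.

```wolfram
(* Placeholder sentences, invented for illustration *)
sentences = {
   "I am well and in good spirits.",
   "The days are long and dull.",
   "Your letter gave me great joy.",
   "I read for six hours today.",
   "The food here is wretched."
   };

(* Predicted label for every sentence *)
labels = Classify["Sentiment", sentences];

(* Group the sentences by their predicted label *)
byLabel = GroupBy[Thread[sentences -> labels], Last -> First];

(* Draw up to 2 random sentences per label for manual review *)
toReview = Map[RandomSample[#, UpTo[2]] &, byLabel];

Dataset[toReview]
```

Reading through the sampled sentences for each label gives a quick impression of how often the classifier agrees with your own judgment on this kind of text.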

POSTED BY: Rohit Namjoshi