Trying to analyze discussions around Covid using Twitter data

Posted 2 years ago

Hi! A little bit about my idea and what I could figure out with my basic Mathematica knowledge. Hopefully you guys can let me know other solutions for the objectives I want to achieve.

Basically, I would like to check what kinds of words or topics are most discussed in my country's Twittosphere. I'm really interested in this data because I enjoy doing SciComm, and data about how the public interacts on the Internet will help me build a better proposal whenever I have to do SciComm expos or similar events. My country is also interesting because we're known for not investing much in science, for example. What reputation do scientists have in a not-so-scientific country? That would also be interesting to know.

To start, I picked 7 search terms and hashtags that I know people in my country are using, based on the daily trending topics (TT).

So, my first idea is to build word clouds from imported Twitter data.

I tried to use Mathematica v.11.2.0 on my Ubuntu computer, but sadly it can't process any of the Twitter code I write there, and I don't know why. So I decided to work in a notebook in the Cloud instead. This is what I got using a script I found somewhere:

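(* Connect to Twitter and fetch up to 100 tweets matching the hashtag *)
twitter = ServiceConnect["Twitter"]
result = twitter["TweetSearch", "Query" -> "#CoronavirusEnPeru", MaxItems -> 100];
(* Split each tweet's text into words and build a word cloud from all of them *)
WordCloud@Flatten[Normal[StringSplit[#["Text"]] & /@ result]]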

This resulted in a nice WordCloud, but it lacks filtering. The tweets I'm trying to analyze are all written in Spanish, so, using a list of Spanish stop words, I created a list stepsp.

My goal is to remove all Spanish stop words to get a cleaner picture. I also added some Twitter jargon that might be noisy.
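As an aside, Twitter jargon such as retweet markers, mentions, and links can also be filtered by pattern rather than by listing every item. The following is only a minimal sketch under assumptions: the word list is made-up sample data, not real tweets, and the names words, noisy, and clean are hypothetical.

```wolfram
(* Hypothetical sample of already-split words; real data would come from the tweets *)
words = {"RT", "@usuario", "https://t.co/abc", "#CoronavirusEnPeru", "vacuna", "hoy"};

(* Assumed notion of "noise": the retweet marker, mentions, and links *)
noisy = "RT" | ("@" ~~ ___) | ("http" ~~ ___);

(* StringMatchQ tests the whole word against the pattern; keep the words that do not match *)
clean = Select[words, ! StringMatchQ[#, noisy] &]
```

Whether hashtags should count as noise depends on the analysis; this sketch keeps them.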

This is what I tried in order to remove the Spanish stop words, using my list stepsp:

DeleteCases[Normal[result[[All,"Text"]]], Alternatives @@  stepsp] 

And this is where I'm stuck right now. It seems that DeleteCases isn't actually doing anything. I tried to produce a word cloud from the result, but that computation seems to exceed what I'm able to do in the Cloud.
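One possible explanation, sketched on made-up data: DeleteCases compares each list element as a whole, and here each element is an entire tweet, so no element is ever equal to a single stop word. Splitting the tweets into lowercase words first gives DeleteCases something it can match. The tweets and the short stepsp list below are hypothetical stand-ins for the real data, and the names unchanged, words, and filtered are assumptions made for illustration.

```wolfram
(* Hypothetical sample data standing in for Normal[result[[All, "Text"]]] *)
tweets = {"El virus en el Perú", "Quédate en casa por favor"};

(* Hypothetical short stop-word list standing in for the real stepsp *)
stepsp = {"el", "en", "por", "la", "de"};

(* Each element is a whole tweet, so none equals a stop word and nothing is deleted *)
unchanged = DeleteCases[tweets, Alternatives @@ stepsp];

(* Lowercase, split every tweet into words, and flatten into one word list *)
words = Flatten[StringSplit[ToLowerCase[tweets]]];

(* Now each element is a single word, so the stop words can be matched and removed *)
filtered = DeleteCases[words, Alternatives @@ stepsp];

WordCloud[filtered]
```

On this sample, unchanged is identical to tweets, while filtered keeps only the words that are not in the stop list. Punctuation attached to words is not handled in this sketch.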

These are my questions:

  1. Is DeleteCases a good way to remove stop words? Why is that function not deleting what I want to delete?
  2. Is there a way to obtain only tweets from a certain country? I tried using GeoLocation, but I don't know if this is the way to go.
  3. How should I proceed with the Mathematica I have installed on my computer? This is what I get whenever I try to process anything:

    $CharacterEncoding: "The byte sequence {240} could not be interpreted as a character in the UTF-8 character encoding."

I tried the same code from the Cloud notebook on my own computer, but it doesn't work there. And it seems I won't be able to complete these tasks in the Cloud either, since I might use up all the memory I have available.
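A note on the error message above: byte 240 (0xF0) is the lead byte of a four-byte UTF-8 sequence, which is how characters above U+FFFF, such as most emoji, are encoded. It may be that the local version cannot represent those characters, but that is an assumption and not something confirmed here. The sketch below only illustrates what such a sequence looks like and one way to drop it from a byte list; the sample bytes and the helper stripFourByte are hypothetical, and it does not show how to intercept the data that ServiceConnect receives.

```wolfram
(* Hypothetical sample: the bytes of "Hola " followed by the UTF-8 bytes of U+1F600 (an emoji) *)
sample = {72, 111, 108, 97, 32, 240, 159, 152, 128};

(* A four-byte lead byte has the bit pattern 11110xxx, i.e. BitAnd[b, 248] == 240.
   Repeatedly remove such a lead byte together with its three continuation bytes. *)
stripFourByte[bytes_List] :=
  bytes //. ({a___, b_ /; BitAnd[b, 248] == 240, _, _, _, c___} :> {a, c});

(* Decode what is left as UTF-8 *)
text = FromCharacterCode[stripFourByte[sample], "UTF-8"]
```

This assumes the byte list is valid UTF-8, so that every four-byte lead byte is followed by exactly three continuation bytes.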