Hi! A little bit about my idea and what I could figure out with my basic Mathematica knowledge. Hopefully you guys can let me know other solutions for the objectives I want to achieve.
Basically, I would like to check which words or topics are discussed most in my country's Twittersphere. I'm really interested in this data because I enjoy doing SciComm, and knowing how the public interacts on the Internet will help me build better proposals whenever I have to do SciComm expos or similar events. My country is also interesting because we're known for not investing too much in science, for example. What reputation do scientists have in a not-so-scientific country? That would also be interesting to know.
To start with, I picked 7 search words and hashtags that I know people in my country are using, based on what I could extract from the daily TT.
So, my first idea is to build word clouds from imported Twitter data.
I tried to use Mathematica v.11.2.0 on my Ubuntu computer but, sadly, it can't process any of the Twitter code I write there, and I don't know why. So I decided to start working with a notebook on the Cloud. This is what I got using a script I found somewhere:
twitter = ServiceConnect["Twitter"]
result = twitter["TweetSearch", "Query" -> "#CoronavirusEnPeru", MaxItems -> 100];
WordCloud@Flatten[Normal[StringSplit[#["Text"]] & /@ result]]
This resulted in a nice WordCloud, but it lacks any filtering. The tweets I'm trying to analyze are all written in Spanish, so, starting from a list of Spanish stopwords, I created a list called stepsp.
My goal is to remove all Spanish stopwords to get a clearer picture. I also added some Twitter jargon that might be noisy.
This is what I tried in order to remove the Spanish stopwords, using my list stepsp:
DeleteCases[Normal[result[[All,"Text"]]], Alternatives @@ stepsp]
And this is where I'm stuck right now. It seems that DeleteCases isn't actually doing anything. I tried to produce a word cloud from its output, but that computation seems to exceed what I'm able to do with the Cloud.
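To show what I mean, here is a minimal sketch with made-up data. The lists tweets and stopwords are hypothetical stand-ins for result[[All, "Text"]] and stepsp, not my real data. My guess is that DeleteCases compares each whole tweet against the stopwords, so nothing ever matches unless the text is split into words first, but I'm not sure this is the whole story:

```wolfram
(* Hypothetical sample data standing in for Normal[result[[All, "Text"]]] *)
tweets = {"El virus en el Peru", "La ciencia es importante"};

(* Hypothetical short list standing in for stepsp *)
stopwords = {"el", "la", "en", "es"};

(* Each list element here is a full tweet, and no full tweet equals a
   single stopword, so this returns the tweets unchanged *)
DeleteCases[tweets, Alternatives @@ stopwords]

(* Lowercasing and splitting into words first gives elements that can
   actually be compared with the stopwords *)
words = Flatten[StringSplit[ToLowerCase[tweets]]];
filtered = DeleteCases[words, Alternatives @@ stopwords]
```

If that guess is right, filtered could then be passed to WordCloud in the same way as the unfiltered words above.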
These are my questions:
- Is DeleteCases a good way to remove stopwords? Why is that function not deleting what I want to delete?
- Is there a way to obtain only the tweets from a certain country? I tried using GeoLocation, but I don't know if this is the way to go.
- How should I proceed with the Mathematica I have installed on my computer? This is what I get whenever I try to process anything:
$CharacterEncoding: "The byte sequence {240} could not be interpreted as a character in the UTF-8 character encoding."
I tried the same code I'm using in the Cloud notebook on my own computer, but it doesn't work there. And it seems that I won't be able to complete these tasks in the Cloud either, since I might use up all the memory I have available.
Thanks for any help you could give me!