Trying to analyze discussions around Covid using Twitter data

Posted 5 years ago
POSTED BY: Camila Castillo
4 Replies
POSTED BY: Camila Castillo

Hola Camila, I think I found a way to remove these tags from the Twitter data before you process it. I called the Twitter string t, and the line below seemed to work. I only tried it on a short segment of your Twitter string, though, so other patterns may also need to be excluded. Good luck.

StringDelete[t, Shortest["RT@" ~~ __ ~~ ":"]]
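
To illustrate why Shortest matters here, this is a small sketch on a made-up string (the sample text and the name sample are hypothetical, not taken from the notebook). Without Shortest, the pattern __ is greedy and would run from the first "RT@" to the last ":" in the string.

```wolfram
(* Hypothetical sample, only for illustration *)
sample = "RT@usuario1: quedate en casa RT@otro_usuario: covid hoy";

(* Shortest stops each match at the nearest ":", so only the two retweet tags are removed *)
StringDelete[sample, Shortest["RT@" ~~ __ ~~ ":"]]

(* For comparison: the greedy version also removes the text between the two tags *)
StringDelete[sample, "RT@" ~~ __ ~~ ":"]
```

If the real data contains a space after RT (as in "RT @user:"), the literal "RT@" would need to be adjusted accordingly.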
POSTED BY: Nathan Shpritz

Hi Nathan!

First of all, thank you so much for your help. It worked perfectly! I also followed your suggestion about converting everything to lowercase and it went great. Thank you again.

As a last touch, however, I would like to delete all "@<<STRING>>" usernames from my data, because they are just noise for what I want to analyze. Searching a bit, I came up with:

CleanttAsString = StringDelete[ttAsString,"@"~~__~~" "]

Here ttAsString is the variable containing the Twitter data. However, it seems that I'm just erasing all of my data. Any idea of what might be happening? I'm still trying to get my head around string patterns.

I'm attaching a new Notebook with the new improvements. Thanks again for the help!

POSTED BY: Camila Castillo
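
A possible explanation for the question above, offered as a sketch and not tested on the real data: the pattern __ is greedy, so "@" ~~ __ ~~ " " matches from the first "@" all the way to the last space in the string, which deletes almost everything. Two ways around it are to wrap the pattern in Shortest, or to let the username consist only of non-whitespace characters. The sample string below is hypothetical.

```wolfram
(* Hypothetical sample, only for illustration *)
ttAsString = "gracias @usuario1 por el dato sobre covid @usuario2 hoy";

(* Greedy: one match from the first "@" to the last space *)
StringDelete[ttAsString, "@" ~~ __ ~~ " "]

(* Option 1: Shortest ends each match at the first space after the "@" *)
StringDelete[ttAsString, Shortest["@" ~~ __ ~~ " "]]

(* Option 2: a username is "@" followed by one or more non-whitespace characters *)
StringDelete[ttAsString, "@" ~~ (Except[WhitespaceCharacter] ..)]
```

Option 2 also catches a username at the very end of the string, where no trailing space exists, and it leaves the surrounding spaces in place so neighboring words are not glued together.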

I think this might work for you - I do not have a Twitter account and I am hoping to see what you come up with.

First, I copied your stopwords list and converted it into a string and then into an association.

To get it into String format:

stepspStep1 = ToString /@ stepsp;
stepspAsString = StringJoin[Riffle[stepspStep1, " "]]

Then create an association of word counts with WordCounts:

stepsspAsWordcount = WordCounts[stepspAsString]

Then I took a piece of your Twitter data and created a WordCounts association as well:

shortTwitter = WordCounts[t]

And then just took the complement of the keys from the Twitter with the stop words:

shortList = KeyComplement[{shortTwitter, stepsspAsWordcount}]
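
As a small illustration of what KeyComplement does in this step, here is a sketch with two made-up associations (tweetCounts and stopCounts are hypothetical names and values). It keeps the entries of the first association whose keys do not appear in the second, so the counts attached to the stop words do not matter.

```wolfram
(* Hypothetical word counts, only for illustration *)
tweetCounts = <|"de" -> 5, "la" -> 4, "covid" -> 3, "vacuna" -> 2|>;
stopCounts = <|"de" -> 1, "la" -> 1|>;

(* Keep the keys of tweetCounts that are not keys of stopCounts *)
KeyComplement[{tweetCounts, stopCounts}]
(* <|"covid" -> 3, "vacuna" -> 2|> *)
```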

You can create a WordCloud directly from the shortList:

WordCloud[shortList]

I did notice that one should probably move all letters to lowercase because "Todo" slipped into the WordCloud.
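
A minimal sketch of the lowercase step, assuming it is applied to the text before counting (the sample string is hypothetical). WordCounts treats "Todo" and "todo" as different keys, so a capitalized stop word is not removed by a lowercase stop list unless the text is lowercased first.

```wolfram
(* Hypothetical sample, only for illustration *)
sample = "Todo el mundo habla de todo";

(* "Todo" and "todo" are counted as separate words *)
WordCounts[sample]

(* After ToLowerCase they are merged into a single key *)
WordCounts[ToLowerCase[sample]]
```

The same ToLowerCase call would need to be applied to the stop word string as well, so that both associations use matching keys.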

Good luck.

POSTED BY: Nathan Shpritz