

Animated WordCloud for Alice in Wonderland

Posted 10 years ago

I am trying to find a good way to visualize the evolution of subjects, ideas, and characters in a text using WordCloud. Here is a simple idea:

  1. Get the text of the book and remove stopwords with DeleteStopwords

  2. Delete obvious non-informative top words, such as "alice" in our case

  3. Build the animation frames, each frame covering about a page of the book (~500 words)

  4. For smoother transitions, make the shift between frames much smaller than a page, say 20 words

So we are basically scanning the text with a 500-word window shifted in steps of 20 words. Here are the code and the result. The smaller words are hard to make out. Let me know if you have better ideas.


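(* lowercase the text, drop stopwords, split into words, and remove a few frequent but uninformative words *)
alice = DeleteCases[TextWords[DeleteStopwords[
     ToLowerCase[ExampleData[{"Text", "AliceInWonderland"}]]]], "alice" | "said" | "little" | "heard"];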

lngth = alice // Length

3580

(* one frame per 500-word window (k ;; k + 499), shifting the window by 20 words *)
frms = ParallelTable[WordCloud[alice[[k ;; k + 499]], ImageSize -> 400], {k, 1, lngth - 499, 20}];
Export["alice.gif", frms, "DisplayDurations" -> {.25}]

"alice.gif"
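For reference, the length of the animation follows directly from the window and step sizes. With N words in the text, a window of W words and a step of s words, the number of complete windows (and hence frames) is:

```latex
F = \left\lfloor \frac{N - W}{s} \right\rfloor + 1
  = \left\lfloor \frac{3580 - 500}{20} \right\rfloor + 1
  = 154 + 1 = 155
```

At 0.25 s per frame that is roughly 155 × 0.25 ≈ 39 s of animation, so doubling the step halves both the frame count and the running time.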

POSTED BY: Vitaliy Kaurov
5 Replies
POSTED BY: Dorothy Evans
POSTED BY: Robert Nachbar
Posted 10 years ago

This is really great, Vitaliy! Seeing the way the representation changes as the window moves through time is very interesting. It made me think of another representation -- different but quite similar.

Picture a network map that includes all the words ever used, with their spatial proximity determined by how close in time, on average, their utterances were to each other. Now represent their usage density within each time window as a magnitude, and show that as something like a color or a value (like saturation or value in the colorimetric sense), or even as the height of the characters, as in a 3D histogram. Then we see a connectedness representation as well as a time evolution. It would be interesting to follow a political campaign, a debate, or even the conversation at a lunch table as the discussion moved from subject to subject.
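A minimal sketch of a simplified version of this idea, assuming the same preprocessing as the first post: instead of a 2D network layout, words are laid out in one dimension, ordered by the mean position of their occurrences in the text, so words used at similar points in the story land in neighbouring rows. Each column is one window and the cell shade is the word's count in that window. The names `vocab`, `posIdx`, `order` and `density`, the 40-word vocabulary, and the 500/20 window and step are illustrative assumptions, not part of the original post.

```mathematica
(* word list as in the first post: lowercase, drop stopwords, remove a few uninformative words *)
alice = DeleteCases[
   TextWords[DeleteStopwords[ToLowerCase[ExampleData[{"Text", "AliceInWonderland"}]]]],
   "alice" | "said" | "little" | "heard"];

{win, step, nWords} = {500, 20, 40}; (* illustrative window, step and vocabulary size *)

(* vocabulary: the nWords most frequent words in the whole text *)
vocab = Keys[TakeLargest[Counts[alice], nWords]];

(* word -> list of positions where it occurs *)
posIdx = PositionIndex[alice];

(* 1D stand-in for the network layout: order words by the mean position of their occurrences *)
order = SortBy[vocab, Mean[posIdx[#]] &];

(* rows = words (in `order`), columns = windows, entries = count of the word in that window *)
density = Transpose @ Table[
    Lookup[Counts[alice[[k ;; k + win - 1]]], order, 0],
    {k, 1, Length[alice] - win + 1, step}];

ArrayPlot[density, AspectRatio -> 1/2, ImageSize -> 600]
```

Row i of the plot corresponds to `order[[i]]`, and darker cells mean more occurrences in that window. A true network layout (for example, edges weighted by the difference in mean position) would need a graph-embedding step on top of this.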

POSTED BY: David Keith

Neat project! For this application, it would be nice if WordCloud took an argument or option for the maximum number of words to display in the cloud. Maybe it would be easier to read if you used only the top 20 or 25 words in each window to build the cloud. I might also try longer DisplayDurations, perhaps increasing the step size from 20 if needed to keep the motion lively. Maybe I'll try it myself when I have some time :).
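A minimal sketch of this suggestion, assuming the same preprocessing as the first post: each window's word counts are trimmed to the most frequent words before being passed to WordCloud, which also accepts an association of word -> weight. The helper `topWordCounts` is hypothetical, and the values (25 words, a 40-word step, 0.4 s per frame) are only illustrative.

```mathematica
alice = DeleteCases[
   TextWords[DeleteStopwords[ToLowerCase[ExampleData[{"Text", "AliceInWonderland"}]]]],
   "alice" | "said" | "little" | "heard"];

(* hypothetical helper: association word -> count, keeping at most the n most frequent words *)
topWordCounts[words_List, n_Integer] := Take[ReverseSort[Counts[words]], UpTo[n]]

{win, step, topN} = {500, 40, 25}; (* illustrative values *)

(* one cloud per window, built only from that window's top words *)
frames = Table[
   WordCloud[topWordCounts[alice[[k ;; k + win - 1]], topN], ImageSize -> 400],
   {k, 1, Length[alice] - win + 1, step}];

Export["alice-top25.gif", frames, "DisplayDurations" -> {0.4}]
```

Raising the step while lengthening the per-frame duration keeps the total running time roughly constant, so the two settings can be traded against each other.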

POSTED BY: Paco Jain

Thank you for your suggestions. Indeed, a maximum number of words could be a useful option. I already played with various DisplayDurations, and slowing the animation down makes it more boring to my taste. I was wondering whether there is a method of packing words that keeps the same words mostly in the same positions. Most of the jumping is caused not by words disappearing but by their sudden relocation. That sounds like a tough problem, though.

POSTED BY: Vitaliy Kaurov