Animated WordCloud for Alice in Wonderland

I am trying to find a good way to visualize the evolution of subjects, ideas, and characters in a text using WordCloud. The idea is simple:

  1. Get text of the book and DeleteStopwords

  2. Delete obvious uninformative top words, such as "Alice" in our case

  3. Build the frames of the animation, each frame covering about a page of the book (~500 words)

  4. For smoother animation transitions, make the shifts between frames much smaller than a page, say 20 words

So we are basically scanning the text with a window of 500 words shifted in steps of 20 words. Here is the result and code. Smaller words are harder to comprehend. Let me know if you have any better ideas.


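(* Lowercase the text, drop stopwords, split into words, and remove uninformative top words *)
alice = DeleteCases[TextWords[DeleteStopwords[
     ToLowerCase[ExampleData[{"Text", "AliceInWonderland"}]]]], "alice" | "said" | "little" | "heard"];

lngth = alice // Length

(* One frame per 500-word window (k through k + 499), shifted in steps of 20 words *)
frms = ParallelTable[WordCloud[alice[[k ;; k + 499]], ImageSize -> 400], {k, 1, lngth - 499, 20}];
Export["alice.gif", frms, "DisplayDurations" -> {.25}]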


POSTED BY: Vitaliy Kaurov
5 Replies

Neat project! I think for this application, it would be nice if WordCloud took an argument/option for the max number of words to display in the cloud. Maybe it would be easier to understand if you take just the top, say, 20 or 25 words in each window to make a cloud with. I might also play with a longer DisplayDurations setting, while perhaps increasing the window step size from 20, if needed, to maintain dynamics. Maybe I'll try it myself when I have some time :).
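A minimal sketch of the "top words per window" idea, not the original poster's code. The helper name `topWords` and the `sample` list are made up for illustration; the sketch relies on WordCloud accepting an association of word -> weight. Recent versions of WordCloud also appear to document a MaxItems option, which may make the helper unnecessary.

```mathematica
(* Hypothetical helper: keep only the n most frequent words of a window *)
topWords[window_List, n_Integer] := TakeLargest[Counts[window], UpTo[n]]

(* Small stand-in for one 500-word window of the book *)
sample = {"rabbit", "queen", "rabbit", "hatter", "queen", "rabbit", "tea"};

(* Association of the 2 most frequent words and their counts *)
top = topWords[sample, 2]

(* WordCloud accepts an association of word -> weight *)
WordCloud[top, ImageSize -> 400]
```

In the animation, something like `WordCloud[topWords[alice[[k ;; k + 499]], 25], ImageSize -> 400]` could replace the frame expression, assuming `alice` is the word list from the original post.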

POSTED BY: Paco Jain

Hi Vitaliy,

I liked that display.

Three ideas for visualization:

  1. I don't know if you can access it, but in a play it might be nice to have the characters' word clouds playing against each other to see what they are saying in sequence. Also, if a character doesn't say much, the words spoken by that character will have more visual impact.

    In W|A we have the "Dialog timeline" pod, which shows when several of the main characters speak in relation to the progression of the play.

  2. Maybe something with the presidential debates could be done with word clouds from separate speeches being compared. That might also indicate how well they answer any questions posed to them. For example if the debate has a question posed about a certain subject, it would be interesting to see how often the responses to that question include reference to that subject.

  3. Could you color the words by sentiment, so happy = yellow, sad = gray, peace = blue, or whatever scheme you choose (included in a legend, possibly), to see if the clouds of words spoken give a visual impact on that dimension as well?

POSTED BY: Dorothy Evans
Posted 9 years ago

This is really great, Vitaliy! Seeing the way the representation changes as the window moves through time is very interesting. It made me think of another representation -- different but quite similar.

Picture a network map which includes all the words ever used, with their spatial proximities determined by how close in time their utterances were, on average, to each other. But now represent their usage densities, within each time window, as a magnitude, and represent that as something like a color or a value (like saturation or value in the colorimetric sense), or even a height of the characters, as in a 3D histogram. Now we see a connectedness representation, as well as a time evolution. It would be interesting to follow a political campaign, or a debate, or even the conversation at a lunch table, as the discussion went from subject to subject.

POSTED BY: David Keith

Very cool, Vitaliy!

Page-by-page analysis is one of the easiest ways to track the evolution, but I would think using chapters or paragraphs might be more meaningful. Dialog, of course, makes for very short paragraphs, and some authors' chapters are too long. There might be an adaptive method to find natural partitions of the whole body of text, sort of like automatically making chapters and paragraphs.
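As a rough sketch of the simplest (non-adaptive) version, a text can be cut at its chapter headings. The helper name `splitChapters` is hypothetical, and the heading pattern ("CHAPTER" followed by a Roman numeral) is an assumption about how the text is formatted; it is demonstrated here on a made-up sample string rather than on the book itself.

```mathematica
(* Hypothetical helper: split a text at headings such as "CHAPTER IV." *)
(* Assumes headings look like "CHAPTER" followed by a Roman numeral    *)
splitChapters[text_String] := Select[
  StringTrim /@ StringSplit[text, RegularExpression["CHAPTER [IVXLC]+\\.?"]],
  # =!= "" &]

(* Made-up sample standing in for the full text *)
sample = "CHAPTER I. Down the hole. CHAPTER II. The pool of tears.";

chapters = splitChapters[sample]

(* One word cloud per chapter, after removing stopwords *)
clouds = WordCloud[DeleteStopwords[ToLowerCase[#]]] & /@ chapters;
```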

And then for visualizing, would the community graph be roughly linear, or could there be loops (perhaps from a flashback)? This idea stems from this morning's post on the Republican candidate debates. They were connected to each other by the frequency of mutually used words. I was thinking of a similar analysis of pages or chapters and of the overall structure of that network. Would page 1 be followed by page 2, then page 3, etc., or could there be loops back to earlier pages? Or could some of the connections be so tenuous that the edges essentially vanish?
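One way to experiment with this is sketched below. It is not the method from the debates post: it swaps in Jaccard overlap of page vocabularies as the similarity measure, and the names `overlap` and `pageGraph`, the threshold value, and the toy `pages` data are all assumptions made for illustration.

```mathematica
(* Hypothetical helper: Jaccard overlap of the vocabularies of two pages *)
(* Assumes at least one of the two word lists is non-empty               *)
overlap[a_List, b_List] := N[Length[Intersection[a, b]]/Length[Union[a, b]]]

(* Connect two pages when their overlap reaches the threshold *)
pageGraph[pages_List, threshold_] := Graph[
  Range[Length[pages]],
  UndirectedEdge @@@ Select[
    Subsets[Range[Length[pages]], {2}],
    overlap[pages[[#[[1]]]], pages[[#[[2]]]]] >= threshold &],
  VertexLabels -> "Name"]

(* Toy data: each page is a list of words *)
pages = {{"rabbit", "hole"}, {"hole", "tears"}, {"queen", "croquet"}};

g = pageGraph[pages, 0.3]
```

Raising the threshold makes weak connections vanish, so a mostly linear chain of consecutive pages or a loop back to an earlier page would show up directly in the edge list.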

Fun stuff! My wife and daughter both enjoyed these posts.

POSTED BY: Robert Nachbar

Thank you for your suggestions. Indeed, a max number of words could be an interesting option to take advantage of. I already played with various DisplayDurations, and slowing it down makes it more boring to my taste. I was thinking that maybe there is a method of packing words that keeps the same words mostly at the same positions. Most of the jumping is not due to the disappearance of words but to their sudden relocation. But this sounds like a tough problem.

POSTED BY: Vitaliy Kaurov