Create a simple word cloud?

Posted 8 years ago

Hello everyone,

I'd like to create a simple word cloud to use as one of a number of programming examples for my students. I obviously do not want to use Mathematica's slick WordCloud function. Here is what I have so far, which isn't very much: the placement of the styled word Howdy at position {4, 4} in a graphics object whose size is 288.

Graphics[Style[Text["Howdy", {4, 4}], FontFamily -> "Palatino", 
  FontSize -> 22, FontColor -> Orange], ImageSize -> 288]

Suppose that Howdy is the most frequent word in my text, so that, at 22 points, it must display as the biggest in my word cloud. The other, less frequent words will be displayed around it in smaller fonts. That's where I'm stuck. I realize that this is an arrangement or packing problem, where I need to arrange all of the words in the graphics area in such a way that they do not overlap. But in order to do this I need the location of Howdy as the rectangle that most snugly encloses it (top-left, top-right, bottom-left, bottom-right) so that I can place the other words. How should I approach this? Perhaps I should be using something other than the Graphics function?

Regards,

Gregory

POSTED BY: Gregory Lypny
5 Replies

I learned in this discussion that it should/can be treated as a packing problem. See e.g. interesting contributions made by @Frank Kampas.

POSTED BY: Henrik Schachner
Posted 8 years ago

Hi Henrik,

Thanks for the reference to Frank Kampas' work on packing ellipses. I downloaded the paper. It looks fascinating but it will be a tough read!

Regards,

Gregory

POSTED BY: Gregory Lypny
Posted 8 years ago

Create a simple word cloud?

POSTED BY: Louis Godwin
Posted 8 years ago

Nice question. Building a word cloud from scratch may be a big project. Here are a few ideas that may be useful. To keep things really simple, maybe start with a list of only 3 to 5 words and a list of how many times each is used. This will be handy as a proof of concept. At first make a word tower, with the most frequently used words on the bottom. For each word in the list, apply ImageCrop to the Graphics. Students can then use ImageDimensions to find the size of each word. They may want to scale these images so that the most common word has the greatest area, rather than just the biggest font. Maybe you will want to use the same font size for all words, and ImageResize to scale the images.
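The word-tower idea above might look something like the following. This is only a minimal sketch: the word list, the counts, and the helper names `fontSize` and `wordImage` are made up for illustration, and it scales by font size (largest count gets 22 points) rather than by area.

```mathematica
(* Hypothetical sample data: a few words with made-up counts, most frequent first *)
words = {"Howdy", "partner", "trail"};
counts = {9, 6, 4};

(* Font size grows with the count; the largest count gets 22 points *)
fontSize[c_] := 22. c/Max[counts];

(* Render one word as an image and trim the uniform white border around it *)
wordImage[w_, c_] := ImageCrop[
   Rasterize[
    Style[w, FontFamily -> "Palatino", FontSize -> fontSize[c], 
     FontColor -> Orange],
    "Image", Background -> White]];

images = MapThread[wordImage, {words, counts}];

(* {width, height} in pixels of the snug rectangle around each word *)
sizes = ImageDimensions /@ images

(* Word tower: most frequent word on the bottom *)
tower = Column[Reverse[images], Center]
```

The pairs in `sizes` are the snug rectangle dimensions asked about in the original question. To scale by area instead, one could render every word at the same font size and apply ImageResize to each image.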

Individual words can be placed into a common graphics object at specific locations using Inset. The MMA documentation contains some interesting applications. For instance, the students may want to inset the images, if they can get the sizes right, or they may want to inset the words, using Style, and letting Inset do the scaling. To get this work

Once the students have their word towers working, they can add more words. A long string can be split into words using StringSplit. Common articles (a, an, the) can be ignored using StringCases. Word counts can be obtained using Tally.
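A rough sketch of that counting step, using a made-up sample sentence with no punctuation. Note that it drops the articles with DeleteCases on the word list rather than with StringCases, simply because that is the shortest route once the string has already been split; the names `text`, `articles`, `kept`, and `tally` are illustrative only.

```mathematica
(* Hypothetical sample text without punctuation *)
text = "The cat saw a dog and the dog saw the cat chase an owl";

(* Lower-case first so that "The" and "the" count as the same word *)
allWords = StringSplit[ToLowerCase[text]];

(* Drop the common articles *)
articles = {"a", "an", "the"};
kept = DeleteCases[allWords, Alternatives @@ articles];

(* {word, count} pairs, most frequent first *)
tally = SortBy[Tally[kept], -Last[#] &]
```

Real text would also need its punctuation stripped before counting.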

After the tower comes the cloud. One idea that may work is to generate random (x,y) points with -1<x<1 and -1<y<1. Then sort the points by distance from the origin. The biggest word goes closest to the origin, etc. To prevent overlap, when a word image is added to the graphics, all the remaining points must be pushed away, depending on the size of the word image. Points that are above the word image must be pushed up to make room for the image. Points below the image must be pushed down, etc. Students may come up with much better ways of ordering the images. Or, they may google it.
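Here is one way the placement could be prototyped. It is a simplified variant, not the pushing scheme described above: instead of moving the remaining points away from each placed word, each word just takes the candidate point nearest the origin that does not make its box overlap the boxes already placed. The box sizes are invented, and `halfSizes`, `candidates`, `overlapQ`, and `place` are hypothetical names.

```mathematica
SeedRandom[1];  (* fixed seed so the layout is reproducible *)

(* Hypothetical {half-width, half-height} of each word box, largest word first *)
halfSizes = {{0.30, 0.10}, {0.22, 0.08}, {0.15, 0.06}, {0.12, 0.05}};

(* Candidate centres with -1 < x < 1 and -1 < y < 1, nearest the origin first *)
candidates = SortBy[RandomReal[{-1, 1}, {200, 2}], Norm];

(* Two axis-aligned boxes overlap when, on both axes, the centres are
   closer together than the sum of the half-sizes *)
overlapQ[{c1_, h1_}, {c2_, h2_}] := 
  And @@ MapThread[Less, {Abs[c1 - c2], h1 + h2}];

(* Greedy step: take the first candidate that clears every box placed so far.
   SelectFirst gives Missing["NotFound"] if no candidate fits; that case is
   not handled in this sketch. *)
place[placed_, h_] := Append[placed,
   {SelectFirst[candidates, 
     Function[c, NoneTrue[placed, overlapQ[{c, h}, #] &]]], h}];

(* Each entry of layout is {centre, halfSize} *)
layout = Fold[place, {}, halfSizes];

(* Draw the boxes; corners are centre - halfSize and centre + halfSize *)
Graphics[{EdgeForm[Black], FaceForm[LightOrange], 
  Rectangle[#1 - #2, #1 + #2] & @@@ layout}, PlotRange -> 1.5]
```

Once the boxes look right, the rectangles could be swapped for the word images or styled words using Inset at the same centres, with the half-sizes taken from ImageDimensions and converted to plot units.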

Attached is a notebook that contains some scratch work done as a proof of concept. It is nothing close to a finished product.

Attachments:
POSTED BY: Louis Godwin
Posted 8 years ago

Hi Louis,

Thank you for your thorough and thoughtful approach. I didn't know about ImageCrop and ImageDimensions because I don't do a lot of work with graphics other than financial plots. It is those two functions that will let me deal with the problem of establishing the location of each word as a rectangular graphic with respect to the coordinates of their corners. Good stuff. I also like your approach of starting students off with building word towers!

Thanks again,

Gregory

POSTED BY: Gregory Lypny