[WSS19] Time dependent text data and the dynamical word cloud

Introduction

A word cloud is a popular way to visualize text data: words are drawn at sizes proportional to their weights. However, if the text data changes over time (for example, the articles in successive editions of a newspaper), the positions of the words can change drastically from one word cloud to the next. This happens especially when the sizes of the words change over time. See the picture below for an example.

FIG 1: Word clouds from Wikipedia data in 2019 (left) and in 2014 (right). Words whose size has changed also move away from their previous positions.

A stable word position helps the viewer keep track of a word and see how it has changed over time. Stabilizing the positions of the words for dynamical data is challenging. This project aims to solve this problem.

A common algorithm for a static word cloud places the words from largest to smallest, starting each one at the center and spiraling it outward until it no longer overlaps any word already placed.

I tested the following modification of this algorithm for dynamical data. Time is discretized. For simplicity, we consider only two consecutive time frames and try to minimize the displacement of each word between them. In the long run a word can still move, but the movement should look smooth, which makes reading through a sequence of word clouds natural.

Here is the algorithm in plain English:

A. Take the initial word data and convert it into a dictionary of words and frequencies. Words with higher frequencies are drawn bigger.

  1. Put the largest word at the center of the graph.
  2. Take the next largest word. Place it at the center of the graph and move it outward along a spiral until it does not overlap any previously placed word.
  3. Repeat for the smaller words, each time starting from the center.

B. Take the word data for the next time step. Sort the words by size and process them from largest to smallest.

  1. If the current word was present in the word cloud of the previous time step, start spiraling it from its previous position until it does not overlap any already placed word. If it was not present, start from the center of the graph.
  2. Take the next word and repeat.

C. Repeat step B for the set of words at each subsequent time step.

For other approaches to the dynamical word cloud, follow this link and the references therein.

Generating the words

Rectangles of various sizes are used as placeholders for the words in this project. We use labels on the rectangle to track them over time. Here, we define a container that gives a rectangle with a specified label.

toGraphic[
  myRectangle[leftDown_, rightUp_, label_]] := {Rectangle[leftDown, 
   rightUp], Text[Style[label, Red], (leftDown + rightUp)/2.]}
toGraphic[list_List] := Flatten[Map[toGraphic, list]];

myRectangle[leftDown_, rightUp_, id_]["LeftDown"] := leftDown;
myRectangle[leftDown_, rightUp_, id_]["RightUp"] := rightUp;
myRectangle[leftDown_, rightUp_, id_]["Id"] := id

Now we define various functions on this container which can be useful later.

myRectangle /: toRectangle[myRectangle[leftDown_, rightUp_, id_]] := 
 Rectangle[leftDown, rightUp]
SetAttributes[toRectangle, Listable]

myRectangle /: myArea[myRectangle[leftDown_, rightUp_, id_]] := 
 Times @@ (leftDown - rightUp)

myRectangle /: myCentroid[myRectangle[leftDown_, rightUp_, id_]] := 
 1/2 (myRectangle[leftDown, rightUp, id]["LeftDown"] + 
    myRectangle[leftDown, rightUp, id]["RightUp"])
SetAttributes[myCentroid, Listable]

myRectangle /: 
 myTranslate[myRectangle[leftDown_, rightUp_, id_], vec_List] := 
 myRectangle[leftDown + vec, rightUp + vec, id]

myRectangle /: 
 myScaling[myRectangle[leftDown_, rightUp_, id_], scale_] := 
 myRectangle[leftDown*scale, rightUp*scale, id]

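Here, we generate two word lists and treat them as the initial and the final word list. The first list contains one rectangle for every combination of the random half-widths and half-heights (dataLength^2 = 25 rectangles). The second list is obtained by rescaling each rectangle of the first by a random factor between 0.8 and 1.2.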

dataLength = 5;

xPos = RandomReal[{5, 10}, dataLength];
yPos = RandomReal[{1, 5}, dataLength];

$currentID = 0;
ID[] := (IntegerString[$currentID++, 10, 3]);
wordList1 = 
  Table[myRectangle[{-x, -y}, {x, y}, ID[]], {x, xPos}, {y, yPos}] // 
   Flatten;

wordList2 = myScaling[#, RandomReal[{0.8, 1.2}]] & /@ wordList1;

Here, we can visualize how each word has changed its size. The horizontal axis shows the numerical labels of the words and the vertical axis shows the area of each word.

FIG 2: Change in the size of each word. The horizontal axis shows the numerical labels of the words and the vertical axis shows the area of each word.

Spiral grids

This is the code to generate the spiral grid. The total area spanned by the grid is proportional to the total area covered by all the words in the list.

\[Theta]MaxVal = 2.00 10^4;
\[CapitalDelta]\[Theta]Val = 10.00;

bValue[wordList_, \[Theta]Max_: \[Theta]MaxVal] := 
  3/4 Sqrt[Total[myArea /@ wordList]]/\[Theta]Max;

gridPoint[\[Theta]_] := {b \[Theta] Cos[\[Theta]/180], 
  b \[Theta] Sin[\[Theta]/180]}

spiralGrid[
  wordList_, \[Theta]Max_: \[Theta]MaxVal, \[CapitalDelta]\[Theta]_: \
\[CapitalDelta]\[Theta]Val] := 
 Module[{bSub = bValue[wordList, \[Theta]Max]}, 
  Table[gridPoint[\[Theta]], {\[Theta], \[CapitalDelta]\[Theta], \
\[Theta]Max, \[CapitalDelta]\[Theta]}] /. b -> bSub]

FIG 3: Spiral grid used for the static word cloud
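
Reading the definitions of bValue and gridPoint above, the grid points lie on an Archimedean spiral. The summary below is only a restatement of that code, with $A$ (a symbol introduced here) denoting the total area of all the words:

```latex
r(\theta) = b\,\theta, \qquad
b = \frac{3}{4}\,\frac{\sqrt{A}}{\theta_{\max}}, \qquad
r_{\max} = b\,\theta_{\max} = \frac{3}{4}\sqrt{A}
```

The disc swept by the spiral therefore has area $\pi r_{\max}^2 = \tfrac{9\pi}{16} A \approx 1.77\,A$, which is the sense in which the grid area is proportional to the total word area. Note that the angle passed to Cos and Sin is $\theta/180$, so with the default values ($\theta_{\max} = 2 \times 10^4$, $\Delta\theta = 10$) the grid has 2000 points and winds about 17.7 times around the center.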

Static word cloud

Intersection of the words

We define a function to check the intersection of two rectangles and then we use this function to define another function to check the intersection of a rectangle with a list of rectangles.

intersectingQ[rect1_, rect2_] :=
 Module[{
   x1d = rect1["LeftDown"][[1]],
   y1d = rect1["LeftDown"][[2]],
   x1u = rect1["RightUp"][[1]],
   y1u = rect1["RightUp"][[2]],
   x2d = rect2["LeftDown"][[1]],
   y2d = rect2["LeftDown"][[2]],
   x2u = rect2["RightUp"][[1]],
   y2u = rect2["RightUp"][[2]]},
  And[(Max[x1d, x2d] < Min[x1u, x2u]), (Max[y1d, y2d] < 
     Min[y1u, y2u])]]

intersectListQ[nextWord_, existingWordList_] := 
 Or @@ (intersectingQ[nextWord, #] & /@ existingWordList)
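
To see what the overlap test does on concrete numbers, here is a minimal self-contained sketch. It uses the same inequalities as intersectingQ, but on plain corner pairs rather than on the myRectangle container; the helper name overlapQ and the three sample rectangles are made up for this illustration.

```mathematica
(* Hypothetical helper: a rectangle is given as {{xmin, ymin}, {xmax, ymax}} *)
overlapQ[{{x1d_, y1d_}, {x1u_, y1u_}}, {{x2d_, y2d_}, {x2u_, y2u_}}] :=
  Max[x1d, x2d] < Min[x1u, x2u] && Max[y1d, y2d] < Min[y1u, y2u]

rectA = {{0, 0}, {2, 1}};
rectB = {{1, 0}, {3, 1}};   (* shares the strip 1 < x < 2 with rectA *)
rectC = {{2, 0}, {4, 1}};   (* only touches rectA along the edge x = 2 *)

overlapQ[rectA, rectB]   (* True *)
overlapQ[rectA, rectC]   (* False: the strict inequalities ignore shared edges *)
```

Because the inequalities are strict, two words may sit exactly edge to edge without being counted as overlapping.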

Positioning the words in the right place

Now, we define a function to place each word in the right position, and then we make graphics out of the result.

positionCurrentWordStatic[wordList_, wordPosition_, 
  currentWord_, \[Theta]Max_: \[Theta]MaxVal, \
\[CapitalDelta]\[Theta]_: \[CapitalDelta]\[Theta]Val] :=
 Module[{
   sg = spiralGrid[wordList, \[Theta]Max, \[CapitalDelta]\[Theta]],
   out, cg},
  out = Not[RegionMember[RegionUnion[toRectangle@wordPosition], sg]];
  SetAttributes[Not, Listable];
  cg = Pick[sg, out];
  SelectFirst[myTranslate[currentWord, #] & /@ cg, 
   Not[intersectListQ[#, wordPosition]] &]
  ]


wordListSort[wordList_] := Reverse@SortBy[wordList, myArea]

finalWordPositionsStatic[
  wordList_, \[Theta]Max_: \[Theta]MaxVal, \[CapitalDelta]\[Theta]_: \
\[CapitalDelta]\[Theta]Val] :=
 Module[
  {sorted = wordListSort[wordList]},
  Fold[Append[#1, 
     positionCurrentWordStatic[
      wordList, #1, #2, \[Theta]Max, \[CapitalDelta]\[Theta]]] &, \
{sorted[[1]]}, sorted[[2 ;;]]]
  ]


wordCloud[
  wordList_, \[Theta]Max_: \[Theta]MaxVal, \[CapitalDelta]\[Theta]_: \
\[CapitalDelta]\[Theta]Val] := 
 Graphics[toGraphic[
   finalWordPositionsStatic[
    wordList, \[Theta]Max, \[CapitalDelta]\[Theta]]], 
  ImageSize -> Medium, Frame -> False]

FIG 4: Static word cloud

Dynamic word cloud

Current word in the previous position

We start by calculating the starting position for each word in the current word data. In a more general case, it could be the average position of the word's center over the previous word clouds. Here, however, I am working with two time steps, so each word is simply assigned its previous centroid scaled by a factor: the square root of the ratio of the total area of the words in the present word cloud to that in the previous word cloud (the scale variable in centroidList).

Here we define functions that assign each word in the new word list to its position at the previous time step. They are limited to one static word cloud followed by one dynamic word cloud, but they handle words that appear in or disappear from the word data between the two steps.

centerSelection[wordListInit_, wordListCurrent_] :=
 Module[{
   w2 = wordListSort[wordListCurrent],
   w1 = finalWordPositionsStatic[wordListInit]},
  Table[
    w1[[i]]["Id"] == w2[[j]]["Id"], {j, Length[w2]}, {i, 
     Length[wordListInit]}
    ] // Boole
  ]

centroidList[wordListInit_, wordListCurrent_] :=
 Module[{
   center = myCentroid[finalWordPositionsStatic[wordListInit]],
   cmatrix = centerSelection[wordListInit, wordListCurrent],
   scale},
  scale = Sqrt[Total[myArea /@ wordListCurrent]/
   Total[myArea /@ wordListInit]];
  scale*(cmatrix.center)
  ]

wordInCentroid[wordListInit_, wordListCurrent_] := Module[
  {wl2 = wordListSort[wordListCurrent],
   wl1 = centroidList[wordListInit, wordListCurrent]},
  Table[myTranslate[wl2[[i]], wl1[[i]]], {i, Length[wl2]}]
  ]

FIG 5: The word cloud for the first set of word data (left) and step one of the dynamical word cloud, before removing the overlap of the words (right).
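
The matching step in centerSelection and centroidList can be illustrated on toy data. The sketch below is self-contained and uses made-up ids, centroids and total areas (all names here are hypothetical); it builds the same kind of 0/1 selection matrix and multiplies it with the previous centroids.

```mathematica
(* Hypothetical toy data: ids and centroids from the previous word cloud *)
prevIds = {"000", "001", "002"};
prevCenters = {{0., 0.}, {4., 1.}, {-3., 2.}};

(* Ids of the current word list, already sorted by size;
   "003" is new and "001" has disappeared *)
currIds = {"002", "000", "003"};

(* Row j has a 1 in column i when current word j matches previous word i *)
selection = Boole[Outer[SameQ, currIds, prevIds]];
(* {{0, 0, 1}, {1, 0, 0}, {0, 0, 0}} *)

(* Assumed total areas of the previous and the current word list *)
areaPrev = 100.; areaCurr = 121.;
scale = Sqrt[areaCurr/areaPrev];   (* 1.1 *)

startPositions = scale*(selection . prevCenters)
(* {{-3.3, 2.2}, {0., 0.}, {0., 0.}} *)
```

A word that was not present before gives a row of zeros, so its starting position is the origin, which matches step B.1 of the algorithm. A word that disappeared simply has no row.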

Off-centered spirals

We define a function that calculates a spiral grid centered on the starting position of each word. Each spiral is restricted so that it does not extend beyond a projected area proportional to the total area of all the words in the word cloud. This partly keeps words from flying away from the center during dynamical word cloud generation.

offCenterGrid[wordListInit_, wordListCurrent_] :=
 Module[{
   grid = spiralGrid[wordListCurrent],
   center = centroidList[wordListInit, wordListCurrent],
   bLimit = Max[Abs[spiralGrid[wordListCurrent]]],
   xmin, ymin, xmax, ymax,
   offGrid},
  {xmin, ymin, xmax, ymax} = {-bLimit, -bLimit, bLimit, bLimit};
  offGrid = (Function[coord, # + coord] /@ grid) & /@ center;
  Table[
   Select[offGrid[[i]], 
    xmin < #[[1]] < xmax && ymin < #[[2]] < ymax &], {i, 
    Dimensions[offGrid][[1]]}
   ]
  ]
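
The two operations inside offCenterGrid, shifting the grid to a word's starting position and clipping it to a square, can be checked on a few points. This is a minimal sketch with a made-up four-point grid, starting position and limit, not the actual spiral.

```mathematica
(* Hypothetical toy grid, starting position and half-width of the square *)
toyGrid = {{0., 0.}, {1., 0.}, {0., 2.}, {-3., 0.}};
startPoint = {2., 1.};
limit = 3.;

(* Shift every grid point so that the grid is centered on startPoint *)
shifted = (# + startPoint) & /@ toyGrid;
(* {{2., 1.}, {3., 1.}, {2., 3.}, {-1., 1.}} *)

(* Keep only the points strictly inside the square of half-width limit *)
clipped = Select[shifted,
  -limit < #[[1]] < limit && -limit < #[[2]] < limit &]
(* {{2., 1.}, {-1., 1.}} *)
```

Points that land exactly on the boundary of the square are discarded because the comparison is strict.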

Remove the overlap

Now, we remove the overlap (if any) of the words by moving them around the spiral centered at the initial position of each word.

positionCurrentWordDynamic[wordListInit_, wordListCurrent_, 
  wordPosition_, currentWord_] :=
 Module[{
   ocg = offCenterGrid[wordListInit, wordListCurrent],
   part = Length[wordPosition],
   out, cog, cwog},
  out = Not[
    RegionMember[RegionUnion[toRectangle@wordPosition[[part]]], 
     ocg[[part]]]];
  cog = Pick[ocg[[part]], out];
  cwog = myTranslate[currentWord, (# - cog[[1]])] & /@ cog;
  SelectFirst[cwog, Not[intersectListQ[#, wordPosition]] &]
  ]

FIG 6: Illustration of the algorithm. We start with the biggest word, labeled 10, at its previous position. Then we take the next largest word. If it overlaps, we move it along the spiral centered on its own starting position.

Dynamical word cloud generation

Finally, we find the non-overlapped position for each word in the list and create graphics out of it.

finalWordPositionsDynamic[wordListInit_, wordListCurrent_] := 
 Fold[Append[#1, 
    positionCurrentWordDynamic[wordListInit, 
     wordListCurrent, #1, #2]] &, {wordInCentroid[wordListInit, 
     wordListCurrent][[1]]}, 
  wordInCentroid[wordListInit, wordListCurrent][[2 ;;]]]


wordCloudDynamic[wordListInit_, wordListCurrent_] := 
 Graphics[toGraphic[
   finalWordPositionsDynamic[wordListInit, wordListCurrent]], 
  ImageSize -> Medium, Frame -> False]

FIG 7: The initial word cloud (left) and the final word cloud (right). We can see that the words are stable in their position and there is no overlap.

An advantage of this algorithm is that the biggest words are placed first, so they move relatively little. The smaller words can often fit into the gaps left by the bigger words near their previous positions, so they also tend to move little.

Future direction

  1. The hyper-parameters still need detailed tuning for better performance.

  2. The algorithm only considers the evolution from one static word cloud to the next. Continuous evolution over many time steps is a straightforward generalization.

  3. The algorithm may not pack the words compactly. A simple remedy is to pull each word toward the center as a final step until it would overlap another word. Alternatively, a weight could penalize placing a word far from the center of the graph (not to be confused with the center of the spiral along which the word moves).

  4. Allowing vertical words might improve compactness. This can be achieved by also trying the word rotated by 90 degrees at each grid point before moving to the next grid point.

  5. Restricting the off-centered spiral grids to a square leaves the words a relatively large area to fit in. Restricting them instead to a circle with the same relative area as the previous word cloud could improve the compactness of the word cloud.
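
As a rough illustration of the "pull toward the center" idea in item 3, here is a minimal sketch. It is not part of the project code: the helper names (overlapQ, shift, center, pullToCenter), the plain corner-pair rectangles and the 5% step are all assumptions made for this example.

```mathematica
(* Hypothetical helpers; a rectangle is {{xmin, ymin}, {xmax, ymax}} *)
overlapQ[{{x1d_, y1d_}, {x1u_, y1u_}}, {{x2d_, y2d_}, {x2u_, y2u_}}] :=
  Max[x1d, x2d] < Min[x1u, x2u] && Max[y1d, y2d] < Min[y1u, y2u]
shift[{ld_, ru_}, v_] := {ld + v, ru + v}
center[{ld_, ru_}] := (ld + ru)/2

(* Move rect toward the origin by a fraction `step` of its remaining
   distance per iteration, and stop just before the first overlap *)
pullToCenter[rect_, others_List, step_: 0.05] :=
 Module[{current = rect, trial},
  While[Norm[center[current]] > 10^-6,
   trial = shift[current, -step*center[current]];
   If[AnyTrue[others, overlapQ[trial, #] &], Break[]];
   current = trial];
  current]

(* One fixed word at the center and one word far to the right *)
placed = {{{-1., -1.}, {1., 1.}}};
farWord = {{4., -0.5}, {6., 0.5}};
pulled = pullToCenter[farWord, placed]
```

In this toy case the word slides left along the x axis and stops a short distance to the right of the fixed word. A real implementation would have to decide in which order to pull the words and whether pulling should be allowed to undo the position stability gained earlier.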

POSTED BY: Roshan Koirala