Group Abstract Group Abstract

Message Boards Message Boards

Convergence of synonym networks

GROUPS:
Take a word and find its synonyms. Then find the synonyms of the synonyms from the previous step. And so on. For how long will the number of synonyms continue to grow?

Language structures can take peculiar shapes. Code can be quite simple to compute and automate informative visualizations. Below vertex size is larger if the vertex has more connections, meaning a word has a greater number of synonyms. Note GraphLayout -> {"BalloonEmbedding", "RootVertex" -> word} option to Graph is used to produce this specific layout. The code below is based on an example from Documentation.

Let’s define a function:
 SynonymNetwork[word_String, depth_Integer, labels_: Tooltip] :=
 
  Module[{ed, sz, g},
  
   (* list of edges *)
    ed = Union[Sort /@ Flatten[Rest[NestList[Union[Flatten[
            Thread[# <-> WordData[#, "Synonyms", "List"]] & /@ #[[All, 2]]]] &, {"" <-> word}, depth]]]];
  
   (* size of vertices based on number of synonyms *)
   sz = Thread[VertexList[Graph[ed]] -> Map[{"Scaled", #} &,
           .05 (.01 + .99 N[Rescale[VertexDegree[g = Graph[ed]]]])^.5]];
 
  (* graph *) 
  SetProperty[g,
   {GraphLayout -> {"BalloonEmbedding", "RootVertex" -> word},
    EdgeStyle -> Directive[Opacity[.2], Red],
    VertexStyle -> Directive[Opacity[.1], Black],
    VertexStyle -> {word -> Directive[Opacity[1], Red]},
    VertexSize -> sz, VertexLabels -> Placed["Name", labels]}]
 
  ]

Here how it works:
SynonymNetwork["promise", 3]




Now as I asked earlier: will the number of synonyms continue to grow? This of course depends on a particular thesaurus dictionary at hand. But once we settled on one, what can we find? Obviously some very specific words have no synonyms at all:
In[1]:= WordData["transmogrification", "Definitions"]
Out[1]= {{"transmogrification", "Noun"} ->
          "the act of changing into a different form or appearance (especially a fantastic or grotesque one)"}


In[2]:= WordData["transmogrification", "Synonyms"]
Out[2]= {{"transmogrification", "Noun"} -> {}}

Some words will have very trivial small finite networks (note network depth is set 20, while even 100 or greater will not change it):
SynonymNetwork["chemistry", 20, Above]



And of course many words will have networks that grow very fast. This applies not only for very general words such as “fast” or “beautiful”, but also for strange rare words such as "discombobulate":
ParallelMap[Length[EdgeList[SynonymNetwork["discombobulate", #]]] &, Range[16]]
Out[]= {9, 97, 1097, 7644, 26051, 46671, 59440, 65187, 67805, 68798, 69274, 69456, 69565, 69587, 69592, 69592}

data = ParallelMap[Length[VertexList[SynonymNetwork["discombobulate", #]]] &, Range[16]
{10, 73, 787, 4293, 11646, 18858, 22931, 24911, 25743, 26096, 26238, 26302, 26321, 26330, 26330, 26330}

ListPlot[data, Filling -> Bottom, Joined -> True, Mesh -> All, AxesLabel -> {"Network depth", "Synonyms"}]




So we see "discombobulate" synonym network gets saturated at depth 15 and attains 26330 vertices and 69592 edges. Quite magnificent result I think ;-) This is, btw, very inefficient way of computing the convergence – we repeat same computations many times while building the graph. Ideally we should introduce counting synonyms part into our SynonymNetwork function. This is how “discombobulate” network looks "big" already at depth 3:
SynonymNetwork["discombobulate", 3]


How big do such networks get? Can we make some estimates? Well, lets first define set of all words that to belong to “discombobulate” network:
gr = SynonymNetwork["discombobulate", 15];
ver = VertexList[gr];

Then lets find all unique words in “Alice in Wonderland”:
alice = Union[ToLowerCase[StringSplit[ExampleData[{"Text", "AliceInWonderland"}], RegularExpression["[\\W_]+"]]]];
alice // Length

Out[]= 1484

Then select, say, only nouns that have synonyms 
nouns = Select[alice, MemberQ[WordData[#, "PartsOfSpeech"], "Noun"] && WordData[#, "Synonyms", "List"] =!= {} &];
nouns // Length

Out[]= 809

We see that more than half of these nouns belong to the “discombobulate” network:
MemberQ[ver, #] & /@ nouns // Tally
Out[]= {{False, 223}, {True, 586}}

I also would like to share some beautiful smaller CONVERGED (saturated) networks, which I found, - beautiful in structure and sets of words they gather (see below). They of course do not belong to huge "discombobulate" graph above. And as already mentioned I propose the following - comment if:
  • you find some beautiful networks
  • you can figure out how we can make estimates on “large” networks or do any further digging 
  • you can optimize code
  • you have any ideas / comments at all ;-)

===> "dragonfly" - 23 synonyms, depth 3

ParallelMap[Length[VertexList[SynonymNetwork["dragonfly", #]]] &, Range[7]]
Out[]= {8, 14, 23, 23}

SynonymNetwork["dragonfly", 20, Above]



===> "benevolent" - 27 synonyms, depth 7

ParallelMap[Length[VertexList[SynonymNetwork["benevolent", #]]] &, Range[9]]
Out[]= {11, 15, 19, 23, 25, 26, 27, 27}

SynonymNetwork["benevolent", 20, Below]

POSTED BY: Vitaliy Kaurov
Answer
5 months ago
For the sake of us newbies, could you explain how the heart of this works?
Union[Flatten[Thread[# <-> WordData[#, "Synonyms", "List"]] & /@ Last /@ #]]

At the center, I can see that it builds up a list of edges between a word and it's synonyms and the Union[Flatten[... looks like it removes duplicate edges but I can't understand the structure of the edge list that is being created by this part of the code:
 Thread[# <-> WordData[#, "Synonyms", "List"]] & /@ Last /@ #]]

A couple of my questions are:
  1. What function is being threaded to create the list of edges?
  2. What do the mapping to Last ( /@ Last) and the mapping to the word being processed (/@  #) do?
I've tried extracting parts of the code and evaluating it as a separate inut to see the output structure but I have not been able to get anything helpful.
Thanks in advance.
POSTED BY: Carl Lemp
Answer
5 months ago
Carl, thanks for asking - he is my try at explaining this.

1) Function being threaded is UndirectedEdge[a,b] which has a shorter notation a<->b. When we actually get to the stage of threading, we have a simple construct of linking a specific word to its synonyms in the following way:
In[1]:= Thread[word <-> {syn1, syn2, syn3}]



2) To answer this question it is better to remember that the functions you are asking about are inside NestList[f,exp,n] which is basically a looping iterative contract that takes expression exp, plugs it into function f and MUST return expression of the same structure as original exp in order for NestList to proceed self-consistently from loop to loop. In our example the expression exp is a list of edges. So function f must also return a list of edges. What f does on its 1st step via Last/@ is taking last parts {x,y,z} from edges list {a<->x,b<->y,c<->z} which are the synonyms from the last step, while {a,b,c} would be the words for which synonyms are looked up. The very original expression exp is {"" <-> word} where word is the original word and "" just a dummy placeholder to have "a list of edges" structure. This original exp is getting dropped at the end by function Rest, so vertex "" never appear in the final graph.

I hope this helps somehow.
POSTED BY: Vitaliy Kaurov
Answer
5 months ago
Actually, there is a better way than Last/@, compare:
In[1]:= qqq = RandomInteger[9, {10^6, 2}];

In[2]:= Last /@ qqq; // AbsoluteTiming
Out[2]= {0.071944, Null}

In[3]:= qqq[[All, 2]]; // AbsoluteTiming
Out[3]= {0.013791, Null}

I will update my code soon with [[...]] which is function Part. 
POSTED BY: Vitaliy Kaurov
Answer
5 months ago
Hi Carl,

The smallest subexpression in this example that will do something interesting when you evaluate it is what you get when you triple-click on NestList. The Union has an & at the end of it, which means that it is the pure function that is being repeatedly applied by NestList, so it won't do anything by itself because the # doesn't have a value.

To answer your questions:
1. The function/head being threaded is UndirectedEdge. It is being written with its infix operator (<->) though. Remember to think in terms of how things look in FullForm. Consider this similar example with the infix operator for Equal.
Thread[a == {b, c, d}]
{a == b, a == c, a == d}
It can also be written:
Thread[Equal[a, {b, c, d}]]
But the former is easier to read once you understand it.
2. The Map infix operator is right associative. This means that
a /@ b /@ c
is equivalent to
a /@ (b /@ c)
You can see this by triple-clicking on each of the letters a, b, and c and seeing what subexpression is selected. So Last /@ # is evaluated first. The # here is replaced with a list of undirected edges at each iteration of NestList. Mapping Last across this list gives us a list of the new words to expand from at each iteration. If the Thread was written like
Thread[WordData[#, "Synonyms", "List"] <-> #]
then we would want to map First across the edges to get the new words. We would also need to change the starting value to { word <-> "" }. Once we have the list of new words we map the Thread across them which creates an edge joining each new word to each of its synonyms.
POSTED BY: Michael Hale
Answer
5 months ago
Oops. I took too long typing. Vitaliy already responded.
POSTED BY: Michael Hale
Answer
5 months ago
It's great you responded, Michael, - excellent explanation, thanks for helping. It's always better to say more than less when explaining ;-)
POSTED BY: Vitaliy Kaurov
Answer
5 months ago
Vitaliy and Michael,  thank you for the explanations.  The infix notation and the right associative nature of /@ were throwing me but now I can see it.  I'll have to remember to use FullForm.  I also didn't know about the tripple-click.  Very helpful.  Thanks again.
POSTED BY: Carl Lemp
Answer
5 months ago
Very cool!

I used this approach to plot similar graphs using a statististical thesaurus based on NPR podcast transripts --- see http://mathematicaforprediction.wordpress.com/2013/10/15/statistical-thesaurus-from-npr-podcasts/

(I replaced WordData[#, "Synonyms", "List"] with Rest[StatThesaurus[#, 12]] .)

The graph plots below use the statistical thesaurus entries for the words "obama", "fbi", and "food". (The words are stemmed, see WordData[_,"PorterStem"].)





POSTED BY: Anton Antonov
Answer
5 months ago
I find it amazing how little code it takes to create the networks like that and it takes almost nothing to allow the user to make some adjustments on the fly
Manipulate[
SynonymNetwork["program", lev, myDataTyp],
{{myDataTyp, "Synonyms", "Relationship"}, {"Synonyms", "Antonyms",
   "NarrowerTerms", "BroaderTerms"}},
{{lev, 2, "Depth"}, {1, 2, 3, 4, 5}}]

But is there an easy way to make the vertexes selectable so the network is updated with the selected word as next root word?
I've seen a couple of demonstation projects where network nodes are clickable but they seem to require a lot of code to do it.

In comparison to Mathematica, I wonder how much more coding was required for applications like this.
https://www.visualthesaurus.com/
POSTED BY: Carl Lemp
Answer
5 months ago
it's not that complicated.  You just need to add Button to vertices..

For example,
 SetAttributes[SynonymNetworkNew, HoldFirst]
 
 SynonymNetworkNew[word_, depth_Integer, labels_: Tooltip] :=
  Module[{ed, sz, g, vlist},(*list of edges*)
   ed = Union[
     Sort /@ Flatten[
       Rest[NestList[
         Union[Flatten[
            Thread[# <-> WordData[#, "Synonyms", "List"]] & /@ #[[All,
              2]]]] &, {"" <-> word}, depth]]]];
  vlist = Union[Flatten[ed, Infinity, UndirectedEdge]];
  g = Graph[Button[#, word = #] & /@ vlist, ed];
  (*size of vertices based on number of synonyms*)
  sz = Thread[
    vlist ->
     Map[{"Scaled", #} &, .05 (.01 + .99 N[
            Rescale[VertexDegree[g]]])^.5]];
  (*graph*)
  SetProperty[
   g, {GraphLayout -> {"BalloonEmbedding", "RootVertex" -> word},
    EdgeStyle -> Directive[Opacity[.2], Red],
    VertexStyle -> Directive[Opacity[.1], Black],
    VertexStyle -> {word -> Directive[Opacity[1], Red]},
    VertexSize -> sz, VertexLabels -> Placed["Name", labels]}]]

And then do something like the following
DynamicModule[{word = "program"},
Dynamic[SynonymNetworkNew[word, 2], TrackedSymbols :> {word}]
]

Hope this helps..

POSTED BY: Jaebum Jung
Answer
5 months ago
Jaebum, very neat, I really enjoy the implementation. Thank you for haring!
POSTED BY: Sam Carrettie
Answer
5 months ago
Jaebum, that does look easy...at leaste after you wrote it and all I had to do is read it!
I understand the rest of what you added but what is the purpose of this line?
SetAttributes[SynonymNetworkNew, HoldFirst]
The first argument of SynonymNetworkNew is a string so does it ever get evaluated with or without that line? 
I tried removing it and the result was the same bu that may be because the attribute has already been set by that point.

Thanks.
POSTED BY: Carl Lemp
Answer
5 months ago
Key point here is treating word as symbol so that we can reset word by pressing button.  By setting HoldFirst (or HoldAll) we prevent evaluation of input word, i.e., when we passing word (symbol) into function, it still remain as symbol.. For example,
SetAttributes[test1, HoldFirst]

test1[symbol_] := symbol = "Yes"

test2[symbol_] := symbol = "No"
 In[56]:= yes = "No";
 
 In[60]:= test1[yes]
 
 Out[60]= "Yes"
 
 In[61]:= test2[yes]
 
 During evaluation of In[61]:= Set::setraw: Cannot assign to raw object Yes. >>

Out[61]= "No"
POSTED BY: Jaebum Jung
Answer
5 months ago
Nice graphs! Depending on your goal, you can get more "accurate" synonyms by staying within the sense-disambiguated classes in WordData. E.g., taking the word "discombobulate", you can find these synonyms by using the third argument "Rules":
In[1]:= WordData["discombobulate", "Synonyms", "Rules"]
Out[1]= {{"discombobulate", "Verb", "Be"} ->
   {{"bedevil", "Verb", "Be"}, {"befuddle", "Verb", "Be"}, {"confound", "Verb", "Be"},
    {"confuse", "Verb", "Be"}, {"fox", "Verb", "Be"}, {"fuddle", "Verb", "Be"},
    {"throw", "Verb", "Be"}},
{"discombobulate", "Verb", "Disconcert"} -> {{"bemuse", "Verb"},
    {"bewilder", "Verb", "Disconcert"}, {"throw", "Verb", "Disconcert"}}}
If you try to do "synonyms of synonyms" of these disambiguated words, you will actually not generate any more synonyms (so significantly less fun emoticon ...)! E.g.,
In[2]:= WordData[{"fox", "Verb", "Be"}, "Synonyms", "List"]
Out[2]= {"bedevil", "befuddle", "confound", "confuse", "discombobulate", "fuddle", "throw"}
which we already had, and is a rather shorter list than
In[3]:= WordData["fox", "Synonyms", "List"]
Out[3]= {"bedevil", "befuddle", "confound", "confuse",
"discombobulate", "dodger", "flim-flam", "fob", "fuddle",
"play a joke on", "play a trick on", "play tricks", "pull a fast one on",
"slyboots", "throw", "trick"}
This might be a stricter sense of a "synonym" than you're looking for, but it's actually how the underlying WordData sources are structured (so called WordNet synsets). Although you can then make further connections by using the "BroaderTerms", "NarrowerTerms", and "PartTerms" properties, again with "Rules" as the third argument to stay within the synsets.
POSTED BY: Oyvind Tafjord
Answer
5 months ago
Thanks, Oyvind, very useful info. Ideas behind WordNet are quite intriguing. And I am very happy that we use it in Wolfram Language. Princeton University Cognitive Science Laboratory site is very interesting to browse too. You are right, considering synonym within synsets or "BroaderTerms" and "NarrowerTerms" makes network to restrictive or too over-connected. Somehow simple synonym "List" makes distribution of network sizes very broad - which sometimes considered as some sort of signature of criticality or a transition. Simplifying terribly - perhaps transition from under-connected to over-connected structures - some sort of percolation threshold. And I sort of like the idea that this is achieved when synonyms from all synsets are considered uniformly (If I am understanding meaning of "List" option correctly). 

Perhaps some research of the filed would make a Wolfram Science Summer School project. I found already some work there on synonyms: A Simple Rule to Describe Meaning-Preserving Transformations of Word Order.
POSTED BY: Vitaliy Kaurov
Answer
5 months ago
Nice work! I missed a lot of good stuff in the linguistic data part. Also, will there be data on languages other than English? I remember seeing someone on a Mathematica conference demostrated a Chinese classic poetry generator. (Might also be interesting to apply this visualization method on programming languages e.g. Mathematica itself emoticon
POSTED BY: Silvia Hao
Answer
5 months ago