Group Abstract

Message Boards

WOLFRAM COMMUNITY

16

|

38.9K Views

|

17 Replies

|

44 Total Likes

View groups...

Follow this post

Share

Share this post:

GROUPS:

Wolfram Science Data Science Mathematica Graphs and Networks Wolfram Language

Convergence of synonym networks

Vitaliy Kaurov, WOLFRAM Research

Posted 12 years ago

Take a word and find its synonyms. Then find the synonyms of the synonyms from the previous step. And so on. For how long will the number of synonyms continue to grow? Language structures can take peculiar shapes. Code can be quite simple to compute and automate informative visualizations. Below vertex size is larger if the vertex has more connections, meaning a word has a greater number of synonyms. Note GraphLayout -> {"BalloonEmbedding", "RootVertex" -> word} option to Graph is used to produce this specific layout. The code below is based on an example from Documentation. Lets define a function: SynonymNetwork[word_String, depth_Integer, labels_: Tooltip] := Module[{ed, sz, g}, (* list of edges ) ed = Union[Sort /@ Flatten[Rest[NestList[Union[Flatten[ Thread[# <-> WordData[#, "Synonyms", "List"]] & /@ #[[All, 2]]]] &, {"" <-> word}, depth]]]]; ( size of vertices based on number of synonyms ) sz = Thread[VertexList[Graph[ed]] -> Map[{"Scaled", #} &, .05 (.01 + .99 N[Rescale[VertexDegree[g = Graph[ed]]]])^.5]]; ( graph ) SetProperty[g, {GraphLayout -> {"BalloonEmbedding", "RootVertex" -> word}, EdgeStyle -> Directive[Opacity[.2], Red], VertexStyle -> Directive[Opacity[.1], Black], VertexStyle -> {word -> Directive[Opacity[1], Red]}, VertexSize -> sz, VertexLabels -> Placed["Name", labels]}] ] Here how it works: SynonymNetwork["promise", 3] Now as I asked earlier: will the number of synonyms continue to grow? This of course depends on a particular thesaurus dictionary at hand. But once we settled on one, what can we find? Obviously some very specific words have no synonyms at all: In[1]:= WordData["transmogrification", "Definitions"] Out[1]= {{"transmogrification", "Noun"} -> "the act of changing into a different form or appearance (especially a fantastic or grotesque one)"} In[2]:= WordData["transmogrification", "Synonyms"] Out[2]= {{"transmogrification", "Noun"} -> {}} Some words will have very trivial small finite networks (note network depth is set 20, while even 100 or greater will not change it): SynonymNetwork["chemistry", 20, Above] And of course many words will have networks that grow very fast. This applies not only for very general words such as fast or beautiful, but also for strange rare words such as "discombobulate": ParallelMap[Length[EdgeList[SynonymNetwork["discombobulate", #]]] &, Range[16]] Out[]= {9, 97, 1097, 7644, 26051, 46671, 59440, 65187, 67805, 68798, 69274, 69456, 69565, 69587, 69592, 69592} data = ParallelMap[Length[VertexList[SynonymNetwork["discombobulate", #]]] &, Range[16] {10, 73, 787, 4293, 11646, 18858, 22931, 24911, 25743, 26096, 26238, 26302, 26321, 26330, 26330, 26330} ListPlot[data, Filling -> Bottom, Joined -> True, Mesh -> All, AxesLabel -> {"Network depth", "Synonyms"}] So we see "discombobulate" synonym network gets saturated at depth 15 and attains 26330 vertices and 69592 edges. Quite magnificent result I think ;-) This is, btw, very inefficient way of computing the convergence we repeat same computations many times while building the graph. Ideally we should introduce counting synonyms part into our SynonymNetwork function. This is how discombobulate network looks "big" already at depth 3: SynonymNetwork["discombobulate", 3] How big do such networks get? Can we make some estimates? Well, lets first define set of all words that to belong to discombobulate network: gr = SynonymNetwork["discombobulate", 15]; ver = VertexList[gr]; Then lets find all unique words in Alice in Wonderland: alice = Union[ToLowerCase[StringSplit[ExampleData[{"Text", "AliceInWonderland"}], RegularExpression["[\\W_]+"]]]]; alice // Length Out[]= 1484 Then select, say, only nouns that have synonyms nouns = Select[alice, MemberQ[WordData[#, "PartsOfSpeech"], "Noun"] && WordData[#, "Synonyms", "List"] =!= {} &]; nouns // Length Out[]= 809 We see that more than half of these nouns belong to the discombobulate network: MemberQ[ver, #] & /@ nouns // Tally Out[]= {{False, 223}, {True, 586}} I also would like to share some beautiful smaller CONVERGED (saturated) networks, which I found, - beautiful in structure and sets of words they gather (see below). They of course do not belong to huge "discombobulate" graph above. And as already mentioned I propose the following - comment if: you find some beautiful networks you can figure out how we can make estimates on large networks or do any further digging you can optimize code you have any ideas / comments at all ;-) ===> "dragonfly" - 23 synonyms, depth 3* ParallelMap[Length[VertexList[SynonymNetwork["dragonfly", #]]] &, Range[7]] Out[]= {8, 14, 23, 23} SynonymNetwork["dragonfly", 20, Above] ===> "benevolent" - 27 synonyms, depth 7 ParallelMap[Length[VertexList[SynonymNetwork["benevolent", #]]] &, Range[9]] Out[]= {11, 15, 19, 23, 25, 26, 27, 27} SynonymNetwork["benevolent", 20, Below]

POSTED BY: Vitaliy Kaurov

17 Replies

Sort By:

0

Matthew Mosher

Posted 6 years ago

Mr. Kaurov, Is there a way to map the closeness or relatedness of words with your method? I would like to be able to measure or rate the closeness or proximity of words in absolute terms ( if the whole language is mapped and quantified ) or as a measure of degrees of freedom away from each other. Maybe some multiplicative measure between touch-points and degrees of freedom? I found a tool that almost does something similar at https://wordnet.princeton.edu but it is not a graphical network map like you created. Any help or pointers would be greatly appreciated. Thank you in advance.

POSTED BY: Matthew Mosher

1

Silvia Hao, Glimscape Technology

Posted 12 years ago

Nice work! I missed a lot of good stuff in the linguistic data part. Also, will there be data on languages other than English? I remember seeing someone on a Mathematica conference demostrated a Chinese classic poetry generator. (Might also be interesting to apply this visualization method on programming languages e.g. Mathematica itself

POSTED BY: Silvia Hao

3

Oyvind Tafjord

Posted 12 years ago

Nice graphs! Depending on your goal, you can get more "accurate" synonyms by staying within the sense-disambiguated classes in WordData. E.g., taking the word "discombobulate", you can find these synonyms by using the third argument "Rules": In[1]:= WordData["discombobulate", "Synonyms", "Rules"] Out[1]= {{"discombobulate", "Verb", "Be"} -> {{"bedevil", "Verb", "Be"}, {"befuddle", "Verb", "Be"}, {"confound", "Verb", "Be"}, {"confuse", "Verb", "Be"}, {"fox", "Verb", "Be"}, {"fuddle", "Verb", "Be"}, {"throw", "Verb", "Be"}}, {"discombobulate", "Verb", "Disconcert"} -> {{"bemuse", "Verb"}, {"bewilder", "Verb", "Disconcert"}, {"throw", "Verb", "Disconcert"}}} If you try to do "synonyms of synonyms" of these disambiguated words, you will actually not generate any more synonyms (so significantly less fun ...)! E.g., In[2]:= WordData[{"fox", "Verb", "Be"}, "Synonyms", "List"] Out[2]= {"bedevil", "befuddle", "confound", "confuse", "discombobulate", "fuddle", "throw"} which we already had, and is a rather shorter list than In[3]:= WordData["fox", "Synonyms", "List"] Out[3]= {"bedevil", "befuddle", "confound", "confuse", "discombobulate", "dodger", "flim-flam", "fob", "fuddle", "play a joke on", "play a trick on", "play tricks", "pull a fast one on", "slyboots", "throw", "trick"} This might be a stricter sense of a "synonym" than you're looking for, but it's actually how the underlying WordData sources are structured (so called WordNet synsets). Although you can then make further connections by using the "BroaderTerms", "NarrowerTerms", and "PartTerms" properties, again with "Rules" as the third argument to stay within the synsets.

POSTED BY: Oyvind Tafjord

3

Vitaliy Kaurov, WOLFRAM Research

Posted 12 years ago

Thanks, Oyvind, very useful info. Ideas behind WordNet are quite intriguing. And I am very happy that we use it in Wolfram Language. Princeton University Cognitive Science Laboratory site is very interesting to browse too. You are right, considering synonym within synsets or "BroaderTerms" and "NarrowerTerms" makes network to restrictive or too over-connected. Somehow simple synonym "List" makes distribution of network sizes very broad - which sometimes considered as some sort of signature of criticality or a transition. Simplifying terribly - perhaps transition from under-connected to over-connected structures - some sort of percolation threshold. And I sort of like the idea that this is achieved when synonyms from all synsets are considered uniformly (If I am understanding meaning of "List" option correctly). Perhaps some research of the filed would make a Wolfram Science Summer School project. I found already some work there on synonyms: A Simple Rule to Describe Meaning-Preserving Transformations of Word Order.

POSTED BY: Vitaliy Kaurov

1

Carl Lemp, Integral Controls, LLC

Posted 12 years ago

I find it amazing how little code it takes to create the networks like that and it takes almost nothing to allow the user to make some adjustments on the fly Manipulate[ SynonymNetwork["program", lev, myDataTyp], {{myDataTyp, "Synonyms", "Relationship"}, {"Synonyms", "Antonyms", "NarrowerTerms", "BroaderTerms"}}, {{lev, 2, "Depth"}, {1, 2, 3, 4, 5}}] But is there an easy way to make the vertexes selectable so the network is updated with the selected word as next root word? I've seen a couple of demonstation projects where network nodes are clickable but they seem to require a lot of code to do it. In comparison to Mathematica, I wonder how much more coding was required for applications like this. https://www.visualthesaurus.com/

POSTED BY: Carl Lemp

6

Jaebum Jung, Wolfram

Posted 12 years ago

it's not that complicated. You just need to add Button to vertices.. For example, SetAttributes[SynonymNetworkNew, HoldFirst] SynonymNetworkNew[word_, depth_Integer, labels_: Tooltip] := Module[{ed, sz, g, vlist},(list of edges) ed = Union[ Sort /@ Flatten[ Rest[NestList[ Union[Flatten[ Thread[# <-> WordData[#, "Synonyms", "List"]] & /@ #[[All, 2]]]] &, {"" <-> word}, depth]]]]; vlist = Union[Flatten[ed, Infinity, UndirectedEdge]]; g = Graph[Button[#, word = #] & /@ vlist, ed]; (size of vertices based on number of synonyms) sz = Thread[ vlist -> Map[{"Scaled", #} &, .05 (.01 + .99 N[ Rescale[VertexDegree[g]]])^.5]]; (graph) SetProperty[ g, {GraphLayout -> {"BalloonEmbedding", "RootVertex" -> word}, EdgeStyle -> Directive[Opacity[.2], Red], VertexStyle -> Directive[Opacity[.1], Black], VertexStyle -> {word -> Directive[Opacity[1], Red]}, VertexSize -> sz, VertexLabels -> Placed["Name", labels]}]] And then do something like the following DynamicModule[{word = "program"}, Dynamic[SynonymNetworkNew[word, 2], TrackedSymbols :> {word}] ] Hope this helps..

POSTED BY: Jaebum Jung

0

Sam Carrettie, Freelancer

Posted 12 years ago

Jaebum, very neat, I really enjoy the implementation. Thank you for haring!

POSTED BY: Sam Carrettie

0

Carl Lemp, Integral Controls, LLC

Posted 12 years ago

Jaebum, that does look easy...at leaste after you wrote it and all I had to do is read it! I understand the rest of what you added but what is the purpose of this line? SetAttributes[SynonymNetworkNew, HoldFirst] The first argument of SynonymNetworkNew is a string so does it ever get evaluated with or without that line? I tried removing it and the result was the same bu that may be because the attribute has already been set by that point. Thanks.

POSTED BY: Carl Lemp

0

Jaebum Jung, Wolfram

Posted 12 years ago

Key point here is treating word as symbol so that we can reset word by pressing button. By setting HoldFirst (or HoldAll) we prevent evaluation of input word, i.e., when we passing word (symbol) into function, it still remain as symbol.. For example, SetAttributes[test1, HoldFirst] test1[symbol_] := symbol = "Yes" test2[symbol_] := symbol = "No" In[56]:= yes = "No"; In[60]:= test1[yes] Out[60]= "Yes" In[61]:= test2[yes] During evaluation of In[61]:= Set::setraw: Cannot assign to raw object Yes. >> Out[61]= "No"

POSTED BY: Jaebum Jung

6

Anton Antonov, Accendo Data LLC

Posted 12 years ago

Very cool! I used this approach to plot similar graphs using a statististical thesaurus based on NPR podcast transripts --- see http://mathematicaforprediction.wordpress.com/2013/10/15/statistical-thesaurus-from-npr-podcasts/ . (I replaced WordData[#, "Synonyms", "List"] with Rest[StatThesaurus[#, 12]] .) The graph plots below use the statistical thesaurus entries for the words "obama", "fbi", and "food". (The words are stemmed, see WordData[_,"PorterStem"].)

POSTED BY: Anton Antonov

2

Carl Lemp, Integral Controls, LLC

Posted 12 years ago

For the sake of us newbies, could you explain how the heart of this works? Union[Flatten[Thread[# <-> WordData[#, "Synonyms", "List"]] & /@ Last /@ #]] At the center, I can see that it builds up a list of edges between a word and it's synonyms and the Union[Flatten[... looks like it removes duplicate edges but I can't understand the structure of the edge list that is being created by this part of the code: Thread[# <-> WordData[#, "Synonyms", "List"]] & /@ Last /@ #]] A couple of my questions are: What function is being threaded to create the list of edges? What do the mapping to Last ( /@ Last) and the mapping to the word being processed (/@ #) do? I've tried extracting parts of the code and evaluating it as a separate inut to see the output structure but I have not been able to get anything helpful. Thanks in advance.

POSTED BY: Carl Lemp

2

Vitaliy Kaurov, WOLFRAM Research

Posted 12 years ago

Carl, thanks for asking - he is my try at explaining this. 1) Function being threaded is UndirectedEdge[a,b] which has a shorter notation a<->b. When we actually get to the stage of threading, we have a simple construct of linking a specific word to its synonyms in the following way: In[1]:= Thread[word <-> {syn1, syn2, syn3}] 2) To answer this question it is better to remember that the functions you are asking about are inside NestList[f,exp,n] which is basically a looping iterative contract that takes expression exp, plugs it into function f and MUST return expression of the same structure as original exp in order for NestList to proceed self-consistently from loop to loop. In our example the expression exp is a list of edges. So function f must also return a list of edges. What f does on its 1st step via Last/@ is taking last parts {x,y,z} from edges list {a<->x,b<->y,c<->z} which are the synonyms from the last step, while {a,b,c} would be the words for which synonyms are looked up. The very original expression exp is {"" <-> word} where word is the original word and "" just a dummy placeholder to have "a list of edges" structure. This original exp is getting dropped at the end by function Rest, so vertex "" never appear in the final graph. I hope this helps somehow.

POSTED BY: Vitaliy Kaurov

1

Vitaliy Kaurov, WOLFRAM Research

Posted 12 years ago

Actually, there is a better way than Last/@, compare: In[1]:= qqq = RandomInteger[9, {10^6, 2}]; In[2]:= Last /@ qqq; // AbsoluteTiming Out[2]= {0.071944, Null} In[3]:= qqq[[All, 2]]; // AbsoluteTiming Out[3]= {0.013791, Null} I will update my code soon with [[...]] which is function Part.

POSTED BY: Vitaliy Kaurov

3

Michael Hale

Posted 12 years ago

Hi Carl, The smallest subexpression in this example that will do something interesting when you evaluate it is what you get when you triple-click on NestList. The Union has an & at the end of it, which means that it is the pure function that is being repeatedly applied by NestList, so it won't do anything by itself because the # doesn't have a value. To answer your questions: 1. The function/head being threaded is UndirectedEdge. It is being written with its infix operator (<->) though. Remember to think in terms of how things look in FullForm. Consider this similar example with the infix operator for Equal. Thread[a == {b, c, d}] {a == b, a == c, a == d} It can also be written: Thread[Equal[a, {b, c, d}]] But the former is easier to read once you understand it. 2. The Map infix operator is right associative. This means that a /@ b /@ c is equivalent to a /@ (b /@ c) You can see this by triple-clicking on each of the letters a, b, and c and seeing what subexpression is selected. So Last /@ # is evaluated first. The # here is replaced with a list of undirected edges at each iteration of NestList. Mapping Last across this list gives us a list of the new words to expand from at each iteration. If the Thread was written like Thread[WordData[#, "Synonyms", "List"] <-> #] then we would want to map First across the edges to get the new words. We would also need to change the starting value to { word <-> "" }. Once we have the list of new words we map the Thread across them which creates an edge joining each new word to each of its synonyms.

POSTED BY: Michael Hale

0

Michael Hale

Posted 12 years ago

Oops. I took too long typing. Vitaliy already responded.

POSTED BY: Michael Hale

0

Vitaliy Kaurov, WOLFRAM Research

Posted 12 years ago

It's great you responded, Michael, - excellent explanation, thanks for helping. It's always better to say more than less when explaining ;-)

POSTED BY: Vitaliy Kaurov

0

Carl Lemp, Integral Controls, LLC

Posted 12 years ago

Vitaliy and Michael, thank you for the explanations. The infix notation and the right associative nature of /@ were throwing me but now I can see it. I'll have to remember to use FullForm. I also didn't know about the tripple-click. Very helpful. Thanks again.

POSTED BY: Carl Lemp

Reply to this discussion

Reply Preview

Attachments

Remove Add a file to this post

Follow this discussion

or Discard

Feedback