Message Boards

[WSS22] Extracting linguistic relations from word embeddings & language models

Posted 1 year ago
2 Replies

How would you convert TextStructure into a list of vertices that are directed from the first member of the list to the next and then the next? For example,

TextStructure["Jupiter is the biggest planet in our solar system, \
eleven times bigger in diameter than Earth and two-and-a-half times \
more massive than all the other planets put together. Jupiter has no \
solid surface. Beneath the gas clouds lies hot, liquid hydrogen, then \
a layer of hydrogen in a form similar to liquid metal, and finally a \
rocky core. Jupiter has a faint ring around its equator made of \
microscopic dust particles.", "DependencyGraphs"]

The directed edge lists that generate these abstraction graphs in natural language processing really aid in the visualization of latent spaces.


What would you do if you wanted to create the directed list with strings of words instead of natural numbers? That is, to generate DirectedEdges between the words themselves.

With[{text = 
   StringSplit[
    "What do you call an arrow function that cares about where it is \
invoked & called (for example, not about where it is defined)? The \
idea is that the 8 year old has a lot to look forward to, therefore \
is most analogous to the arrow function; he's taking his true calling \
and following the laws of physics, rather than being affected by the \
laws of physics. He thus finds the illogical arrangement in the data, \
caring about the logical progression of the data (for example, not \
its definition).", {" ", ". ", ", "}]},
 Graph[Select[Thread[Rule[Drop[text, -1], Drop[text, 1]]], 
   UnsameQ[#, {}] &], VertexLabels -> "Name"]]


CosineSimilarity[word1_, word2_, model_] := 
 1 - CosineDistance[model[word1], model[word2]]
CosineSimilarity["dog", "cat", word2vecGl]

This cosine similarity is a simple, transparent measure of semantic relatedness between two word vectors, offering us new possibilities for efficiency, transparency, and fairness. It yields the following value:

Cosine Similarity
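For intuition about what that number means, here is a minimal sketch of the cosine computation in Python with NumPy, using made-up three-dimensional toy vectors (real GloVe embeddings have hundreds of dimensions; all values below are purely hypothetical):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot product over the product of norms."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical 3-dimensional "embeddings"; only the geometry matters here
dog = [0.9, 0.1, 0.3]
cat = [0.8, 0.2, 0.35]
car = [0.1, 0.9, 0.0]

# Related words point in similar directions, so their cosine similarity is higher:
# cosine_similarity(dog, cat) exceeds cosine_similarity(dog, car)
```

Note that similarity is one minus the cosine distance, so identical vectors score exactly 1.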

Nearest[word2vecGl, word2vecGl["king"], 10]

Nearest King

Nearest[word2vecGl, 
 word2vecGl["paris"] - word2vecGl["france"] + word2vecGl["italy"], 10]

Nearest Paris

Nearest[word2vecCn, word2vecCn["birthday"], 10]

Nearest Birthday


Mean Synonym Distances

Mean Antonym Distances

Before the mean synonym and antonym distances can be addressed, there are some breathtaking computational-interpretability questions related to ethics and privacy. Working through them might help us gain a deeper understanding of our own societal norms and rules.

Nearest[word2vecGl, 
 word2vecGl["king"] - word2vecGl["man"] + word2vecGl["woman"], 8]
Nearest[word2vecGl, word2vecGl["waitress"] + 2.2*converterTrans, 8]
Nearest[word2vecGl, word2vecGl["actress"] + 2.2*converterTrans, 8]
WordRelations[words_List, modelAssoc_] := 
 Column[Nearest[modelAssoc, modelAssoc[#], 5] & /@ words]
WordRelations[{"king", "queen", "prince", "princess"}, word2vecGl]

King Man Woman

Waitress 2.2 converterTrans

Actress 2.2 converterTrans


The philosophical implications of the king-queen dichotomy, or the bartender-actor understanding of human concepts ranging from "bartender" to "guy" and "actor" to "mr.", would need to be translated into dazzling computational terms that the AIs can understand. Constitutionally, we should be able to represent legal and linguistic constructs symbolically; human language is inherently ambiguous, which leads to many legal disputes.

PrincipalRelations["kitten", "cat"]
PrincipalRelations["summer", "winter"]
PrincipalRelations["republican", "democrat"]

Kitten Cat

Summer Winter

Republican Democrat

{Histogram[synoDistances, PlotLabel -> "Synonyms Distances", 
  ImageSize -> Medium],
 Histogram[antonymsDistances, PlotLabel -> "Antonyms Distances", 
  ImageSize -> Medium],
 Histogram[randomWordDistances, (* variable name reconstructed for the truncated line *)
  PlotLabel -> "Random Word Embedding Distances", ImageSize -> Medium],
 Histogram[randomVecDistances, (* variable name reconstructed for the truncated line *)
  PlotLabel -> "Random Vector Distances", ImageSize -> Medium]}

Random Vector Distances Histograms

This ambiguity is a major challenge when you look at these random word embedding distances, which require precise geometric interpretation. Did you know that the Cayley-Menger determinant is extensively used in geometric computation and in representations of geometric and algebraic problems? When this determinant equals zero, all the points lie on the same (n-1)-dimensional sphere; that is, they are co-spherical. And if the determinant is negative, it indicates that the distances do not come from any Euclidean space; that is, they violate a basic principle of geometry: the triangle inequality.

FindSimilarWords[word_String, modelAssoc_, n_ : 10] := 
 TakeSmallest[
  AssociationMap[CosineDistance[modelAssoc[word], modelAssoc[#]] &, 
   DeleteCases[Keys[modelAssoc], word]], n]
FindSimilarWords["king", word2vecGl]


It's the determinant, named after the mathematicians Arthur Cayley and Carl Menger, that provides an easy way to calculate volumes without more complex coordinate geometry: it computes the volume of an n-dimensional simplex directly from the pairwise distances between its vertices.
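As a hedged illustration, here is a small Python/NumPy sketch of the Cayley-Menger volume formula (the function name and the toy triangle are mine, chosen only so the result is easy to check by hand):

```python
import math
import numpy as np

def simplex_volume(points):
    """Volume of an n-simplex from its n+1 vertices via the Cayley-Menger determinant."""
    pts = np.asarray(points, dtype=float)
    n = len(pts) - 1  # dimension of the simplex
    # Matrix of squared pairwise distances, bordered by a row and column of ones
    d2 = np.square(np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1))
    cm = np.ones((n + 2, n + 2))
    cm[0, 0] = 0.0
    cm[1:, 1:] = d2
    # V^2 = (-1)^(n+1) / (2^n * (n!)^2) * det(CM)
    v2 = (-1) ** (n + 1) / (2 ** n * math.factorial(n) ** 2) * np.linalg.det(cm)
    return math.sqrt(max(v2, 0.0))  # clamp tiny negative floating-point noise

# A right triangle with legs of length 1 has area 1/2
area = simplex_volume([(0, 0), (1, 0), (0, 1)])  # → 0.5 (up to floating point)
```

The same function applies unchanged to simplices formed by word embeddings, since it only ever sees pairwise distances.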

genderVector = Normalize[word2vecGl["man"] - word2vecGl["woman"]];
genderBias = Dot[genderVector, #] & /@ word2vecGl;
genderBias // KeySort // ReverseSort

Gender Bias

This gender bias score illuminates the relationship between downstream task performance and the analogy task, helping to uncover how these models represent and learn linguistic information. And it's this radiant ability to solve the analogy task: can the linear maps that solve it generalize to other tasks, and can they be interpreted meaningfully through their eigenvectors and eigenvalues? For the analogy "man is to king as woman is to queen", such a map would transform the word embedding of 'man' into that of the ravishing 'king'.
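A minimal numeric sketch of fitting such an analogy map by least squares, in Python with NumPy on made-up toy embeddings (all names and values here are hypothetical); the fit is exact on the training pairs, while generalization to held-out pairs is precisely the open question:

```python
import numpy as np

rng = np.random.default_rng(42)
dim = 10

# Hypothetical toy embeddings: each royal word is its base word plus one shared offset
offset = rng.normal(size=dim)
man, woman = rng.normal(size=dim), rng.normal(size=dim)
king, queen = man + offset, woman + offset

# Fit a linear map M with (source @ M) approximating the target, by least squares
X = np.stack([man, woman])   # source embeddings, one per row
Y = np.stack([king, queen])  # target embeddings
M, residuals, rank, sv = np.linalg.lstsq(X, Y, rcond=None)

# The map's eigenvalues are one handle for interpreting what it does geometrically
eigenvalues = np.linalg.eigvals(M)
```

Because the system is underdetermined (two pairs, ten dimensions), `lstsq` returns the minimum-norm map that reproduces the training pairs exactly; whether that map means anything on new word pairs is what the analogy benchmarks probe.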

SynonymFinder[word_String] := 
 Nearest[word2vecGl, word2vecGl[word], 6]
SynonymFinder["happy"]

Synonym Finder Happy

wordList = {"king", "queen", "man", "woman", "bread", "butter", "cat"};
wordVectors = word2vecGl /@ wordList;
pca = PrincipalComponents[wordVectors];
proj = pca[[All, 1 ;; 2]];
ListPlot[proj,
 PlotStyle -> PointSize[Medium],
 PlotLabel -> "Word Embeddings Visualized via PCA",
 FrameLabel -> {"Principal Component 1", "Principal Component 2"},
 Frame -> True,
 ImageSize -> Medium,
 Epilog -> (Text[wordList[[#]], proj[[#]], {-1, -1}] & /@ 
    Range[Length[wordList]])]

Word Embeddings Visualized via PCA

@Cayden Pierce it's delectable, the behavior of these linear maps: word embeddings visualized via Principal Component Analysis. Whether linear maps learned in a word analogy task still hold when applied to other tasks speaks to how these kinds of models perform on downstream tasks.

word = "pizza";
distances = 
  AssociationMap[EuclideanDistance[word2vecGl[word], word2vecGl[#]] &, 
   Keys[word2vecGl]];
closest = TakeSmallest[distances, 50];
WordCloud[closest, ImageSize -> Large]

Word Cloud

It's interesting to think about where these word cloud errors come from. While word2vec and similar models can capture some intriguing semantic relationships, they have limitations and do not always perform as expected. The word2vec model often reflects biases in the training data and may not always provide the most intuitively correct answers to these analogy problems. The famous word2vec example, "king - man + woman = ?", often results in 'queen', but it does not always do so! The exact result varies depending on the specific model implementation and word vectors.
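To see why the arithmetic can work at all, here is a toy Python sketch in which the gender offset is exact by construction, so "king - man + woman" lands precisely on "queen"; real embeddings only approximate this, which is where the variability comes from (the vocabulary and vector values below are made up):

```python
import numpy as np

# Hypothetical toy embeddings in which the gender offset (0, 1, 0) is exact
emb = {
    "man":   np.array([1.0, 0.0, 0.2]),
    "woman": np.array([1.0, 1.0, 0.2]),
    "king":  np.array([2.0, 0.0, 0.9]),
    "queen": np.array([2.0, 1.0, 0.9]),
    "apple": np.array([0.0, 0.2, 3.0]),
}

def nearest(vec, embeddings, exclude=()):
    """Closest word to vec by Euclidean distance, skipping the query words."""
    return min((w for w in embeddings if w not in exclude),
               key=lambda w: float(np.linalg.norm(embeddings[w] - vec)))

target = emb["king"] - emb["man"] + emb["woman"]
answer = nearest(target, emb, exclude={"king", "man", "woman"})  # → "queen"
```

Excluding the query words matters: in real embedding spaces the nearest neighbor of "king - man + woman" is frequently "king" itself, which is one source of the surprising answers.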

Rasterize[FeatureSpacePlot[RandomSample[vecsCn, 2000]]]


randomVecNormal[r_, d_] := 
 r*Normalize@RandomVariate[NormalDistribution[], d]
Rasterize[FeatureSpacePlot[Table[randomVecNormal[1, 300], 1000]]]

Feature Space Plot Rasterize

Here's another example. The FeatureSpacePlot can be Rasterized as usual. What's the balance between production readiness and prototyping speed? In TensorFlow vs PyTorch for text classification, Convolutional Neural Networks perform text classification on two distinct datasets: 20 Newsgroups and the Movie Review Data. PyTorch's "Pythonic" presentation is exhilarating and flexible, and both it and TensorFlow have been described as "a low-level library with high-level APIs built on top"; TensorFlow is more verbose and provides more explicit control, which might be beneficial for deployment in a production environment. How do you implement your production environment?

synonyms = {"happy", "joyful", "cheerful", "gleeful", "jubilant"};
meanSynonymEmbedding = Mean[word2vecCn[#] & /@ synonyms];
Nearest[word2vecCn, meanSynonymEmbedding, 10]

Nearest Mean Synonym Embedding

WordDiffWalkSteps[w1_String, w2_String, steps_Integer, modelAssoc_] :=
   Module[{e1, e2, transSteps}, e1 = Flatten[modelAssoc[w1]];
   e2 = Flatten[modelAssoc[w2]];
   transSteps = Table[e1 + i*((e2 - e1)/steps), {i, steps}];
   Nearest[modelAssoc, #, 5] & /@ transSteps];
Column[WordDiffWalkSteps["cat", "dog", 15, word2vecCn]]


categories = <|
   "animals" -> {"dog", "cat", "tiger", "elephant", "bear", "fish", 
     "dolphin", "bird"}, 
   "furniture" -> {"table", "chair", "sofa", "cupboard", "bed", 
     "desk", "shelf", "drawer"}, 
   "emotions" -> {"happy", "sad", "angry", "excited", "afraid", 
     "curious", "bored", "surprised"}|>;
categoryEmbeddings = Map[word2vecGl, categories, {2}];
categorySamples = RandomSample /@ categoryEmbeddings;
words = Join[categories["animals"], categories["furniture"]];
vectors = word2vecGl /@ words;
clusters = FindClusters[vectors];
wordsByCluster = 
  GatherBy[words, Position[clusters, word2vecGl[#]] &];
Rasterize /@ Map[FeatureSpacePlot, categorySamples]



There's actually an interactive, intoxicating visualization tool that allows users to explore word analogies using the word2vec model, with pre-trained word vectors from GloVe. It's so compelling how pairs like 'uncle' and 'aunt', 'niece' and 'nephew', 'brother' and 'sister', 'actor' and 'actress', etc., are positioned close together, signifying that, up to the gender difference, they're similar.

Rasterize[FeatureSpacePlot[RandomSample[vecsCn, 1000]]]
Rasterize[FeatureSpacePlot[RandomSample[vecsGl, 1000]]]



{PhraseBogusPair["the sky"], PhraseBogusPair["the green"], 
 PhraseBogusPair["a cat"], PhraseBogusPair["many dogs"]}


(* bigramScore stands in for the scoring function lost from the original snippet;
   it should map each bigram to a rule word -> value in [0, 1] *)
Row[Flatten[
  Style[Keys[#] <> " ", Background -> Hue[Values[#]]] & /@ 
   Map[bigramScore, 
    Partition[StringSplit[
      "In the beginning God created the heavens and the earth", " "], 
     2, 1], {1}]]]


@Cayden Pierce, Pointwise Mutual Information, the log of the ratio between the probability that two words co-occur and the product of their individual probabilities, can be approximated in a high-dimensional space by the scalar product of their word vectors. The technical papers, tutorials, and pre-trained models that "Extracting linguistic relations from word embeddings & language models" provides have furthered our exploration of word2vec.
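A small self-contained Python sketch of that PMI definition on a made-up corpus (counting each word at most once per sentence; corpus and counts are purely illustrative):

```python
import math
from collections import Counter
from itertools import combinations

def pmi_scores(sentences):
    """PMI(a, b) = log( p(a, b) / (p(a) * p(b)) ), with probabilities estimated
    as the fraction of sentences containing the word (or the pair)."""
    n = len(sentences)
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.split()))  # each word counted once per sentence
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))  # each unordered pair once
    return {(a, b): math.log((c / n) / ((word_counts[a] / n) * (word_counts[b] / n)))
            for (a, b), c in pair_counts.items()}

# A tiny made-up corpus: "hot" and "dog" co-occur more often than chance,
# so their PMI is positive; "day" and "hot" co-occur less often, so it is negative
corpus = ["hot dog stand", "hot dog bun", "hot day", "rainy day"]
scores = pmi_scores(corpus)
```

Word-embedding training objectives like skip-gram with negative sampling implicitly factor a (shifted) version of this PMI matrix, which is why the dot product of two word vectors approximates their PMI.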

abstractConcretePairs = {{"vehicle", "car"}, {"fruit", 
    "apple"}, {"color", "blue"}, {"animal", "dog"}};
abstractConcretePairsEmbeddings = 
  Apply[{Rule[#1, word2vecCn[#1]], Rule[#2, word2vecCn[#2]]} &, 
   abstractConcretePairs, 1];
abstractConcretePairsNearest = 
  Apply[{Rule[Values[#1], Nearest[word2vecCn, Values[#1], 1000]], 
     Rule[Values[#2], Nearest[word2vecCn, Values[#2], 1000]]} &, 
   abstractConcretePairsEmbeddings, 1];
abstractConcretePairsNearestEmbeddings = 
  Map[{Keys[#], Map[word2vecCn, Values[#]]} &, 
   abstractConcretePairsNearest, {2}];
abstractConcretePairsNearestDistances = 
  Map[Map[Function[{var}, EuclideanDistance[var, #[[1]]]], #[[
      2]], {1}] &, abstractConcretePairsNearestEmbeddings, {2}];
Apply[PairedHistogram, abstractConcretePairsNearestDistances, {1}]


The concept of "Linguistic Equations" compels the left-hand side to represent a single word and the right-hand side to represent a synonymous, definitional expression. You had me at: word embeddings contain grammatical information, just not in the conventional sense. Assign a higher value to grammatically correct sentences, so that well-formed phrases yield larger inner products than agrammatical phrases; a noun can be represented by its word embedding while, in the word embedding space, an adjective acts as a translation vector. Play around with the validity of these linguistic equations and you've got it figured out. Simple distance metrics do not capture this relation: a class of words with a unique relationship occupies a subspace of the word embedding space. So instead of merely relying on the proximity between words in the embedding space, @Cayden Pierce also calculates the volume of the simplex formed by the embeddings of each word in the class together with the centroid of those embeddings, galvanizing the subspaces and relationships that exist within different classes of words.

POSTED BY: Dean Gladish

You have earned the Featured Contributor Badge! Your exceptional post has been selected for our editorial column Staff Picks, and your profile is now distinguished by a Featured Contributor Badge and displayed on the Featured Contributor Board. Thank you!

POSTED BY: Moderation Team
