
Exploring philosophy with LLM embeddings and Wolfram Language

LLM Embeddings and Philosophy are Easy and Fun to Use with Wolfram Lang

Overview of Project

  • Get some Philosophy texts (Code of Hammurabi, Wittgenstein, my own (Russell Foltz-Smith))
  • Clean/Tag it a bit
  • Use Nomic.ai Atlas to produce embeddings and visual map
  • Further explore the data in WolframLang
  • Enjoy life with philosophy over the eons

Tech used

  • Get an API Key from Nomic.ai https://docs.nomic.ai/
  • Set up your Python environment, which makes using nomic easier from within WolframLang

The Code

start a python External Session

session = StartExternalSession[{"Python", "Name" -> "wolframlang"}]

get the data sources

  • ExampleData for Hammurabi
  • JSON for Wittgenstein's Tractatus from here https://github.com/Geurt/tractatus/tree/master/data (then simplified using a script; ChatGPT, wolframGPT or another tool can help ya do it)
  • my own philosophy thoughts from a public google sheet

    (* Hammurabi: load the text and split it on markers such as "12. " *)
    text = ExampleData[{"Text", "CodeOfHammurabiEnglish"}];
    sections2 = StringSplit[text, RegularExpression["\\d+\\. "]];
    (* zero-width split that keeps the markers in the text; not used below *)
    sections = StringSplit[text, RegularExpression["(?<=\\d\\. )"]];
    ham = 
     Map[<|{"thought_author" -> "Hammurabi", 
         "thought_id" -> Position[sections2, #][[1]][[1]], 
         "thought_section_id" -> Position[sections2, #][[1]][[1]], 
         "thought_sub_id" -> Position[sections2, #][[1]][[1]], 
         "thought" -> #}|> &, sections2];
    (* JSON copy of the Hammurabi records; not used below *)
    hammurabi = ExportString[ham, "JSON"];
    (* Wittgenstein: the simplified Tractatus JSON, saved locally *)
    TLPDataset = 
     Import["/Users/eventhorizon/Downloads/TLP_simple_decimal.json", 
      "Dataset"];
    witt = 
     Map[<|{"thought_author" -> "Wittgenstein (trans. Ogden)", 
         "thought_id" -> 
          Flatten[Position[Normal[TLPDataset][[All, 2]], #[[2]]]][[1]], 
         "thought_section_id" -> #[[1]], "thought_sub_id" -> #[[2]], 
         "thought" -> #[[3]]}|> &, Normal@TLPDataset];
    (* my own thoughts, from the public Google Sheet *)
    symbolAndrelation = 
     Import["https://docs.google.com/spreadsheets/d/e/2PACX-1vTV5nLJFkS4iCKojJ7qWMtHTVxAsnNmA_RsmqQircWgELoJqdZWmQn0wF-d4l06ivz5pgZJkeDbRxNh/pub?gid=0&single=true&output=csv", "Dataset", 
      "HeaderLines" -> 1];
    russ = 
     Map[<|{"thought_author" -> "Russell Foltz-Smith", 
         "thought_id" -> #[[6]], "thought_section_id" -> #[[5]], 
         "thought_sub_id" -> #[[1]], "thought" -> #[[2]]}|> &, 
      Normal@symbolAndrelation];
    (* one list of records with the same keys for all three authors *)
    allThoughts = Join[ham, russ, witt]
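
If you'd rather do the splitting on the Python side, here is a minimal sketch of the same idea using only the standard library. The function name `split_numbered_sections` and the sample text are made up for illustration; the record keys match the ones used above. Unlike the Wolfram code, this sketch trims whitespace and numbers the pieces by running position.

```python
import re

def split_numbered_sections(text, author):
    """Split text on markers such as '12. ' and wrap each piece in a record."""
    # Same pattern as the Wolfram call: one or more digits, a dot, a space.
    pieces = [p.strip() for p in re.split(r"\d+\. ", text)]
    # Drop empty pieces, e.g. the one before the very first marker.
    pieces = [p for p in pieces if p]
    return [
        {
            "thought_author": author,
            "thought_id": i,
            "thought_section_id": i,
            "thought_sub_id": i,
            "thought": piece,
        }
        for i, piece in enumerate(pieces, start=1)
    ]

# Made-up sample text, NOT the real Code of Hammurabi.
sample = "1. If a man steals an ox, he shall pay. 2. If a man breaks a wall, he shall repair it."
records = split_numbered_sections(sample, "Hammurabi")
```

Each record can then go straight into `pd.DataFrame(data=records)`.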
    

Turn all the thoughts into a nice python dataframe for easy loading into Nomic python API

pythonCodeAll = StringJoin[{
    "import pandas as pd\n",
    "import json\n",
    "thoughts = <* allThoughts *>\n",
    "dfAll = pd.DataFrame(data=thoughts)\n",
    "dfAll['thought_sub_id'] = dfAll['thought_sub_id'].astype('float64')\n",
    "dfAll['thought_section_id'] = dfAll['thought_section_id'].astype('float64')\n",
    "dfAll"
    }];

(* Evaluate the Python code in the external session. *)
ExternalEvaluate[session, pythonCodeAll]
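
The two `astype('float64')` calls put the id columns into one numeric type, presumably because the three sources store their ids differently. A rough plain-Python sketch of that normalization, without pandas, is below. The helper `normalize_ids` and the sample rows are hypothetical, just to show the idea.

```python
def normalize_ids(records, fields=("thought_sub_id", "thought_section_id")):
    """Return copies of the records with the given id fields cast to float."""
    normalized = []
    for record in records:
        fixed = dict(record)  # copy so the input rows stay untouched
        for field in fields:
            # float() raises ValueError if an id is not numeric at all
            fixed[field] = float(fixed[field])
        normalized.append(fixed)
    return normalized

# Hypothetical rows: one integer id, one decimal id stored as a string.
rows = [
    {"thought_author": "Hammurabi", "thought_sub_id": 3, "thought_section_id": 3},
    {"thought_author": "Wittgenstein (trans. Ogden)", "thought_sub_id": "2.01", "thought_section_id": 2},
]
clean = normalize_ids(rows)
```

If an id can't be read as a number, this fails loudly instead of silently mixing types.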

should look like this

Create your nomic map!

ExternalEvaluate[session, "import nomic
nomic.login('[YOUR NOMIC API KEY]')"]

ExternalEvaluate[session, "from nomic import atlas
dataset = atlas.map_data(data=dfAll,
indexed_field='thought',            
is_public=False,
identifier='[your URL friendly name]',
embedding_model='nomic-embed-text-v1.5',
description='how are thoughts really connected?')
dataset.maps[0]"]

After your data finishes mapping on Nomic (it will get embeddings automatically) you'll get a URL that takes you to your interactive map.

Then bring your embedding, topic and duplicate-detection data back into WolframLang for more fun.

ExternalEvaluate[session, "from nomic import atlas
from nomic import AtlasDataset
datasetThoughts = AtlasDataset('thoughts-of-three')
map = datasetThoughts.maps[0]
map.data.df
"]

or if you just want the projected (2D) embeddings

embeddingsNomicProjected = 
 ExternalEvaluate[session, "from nomic import atlas
from nomic import AtlasDataset
map.embeddings.projected
"]

You'll probably want the projected x, y embeddings and the other data together in one dataframe, which comes back as a Dataset (the WolframLang counterpart of a dataframe).

embeddingsNomicProjected = 
 ExternalEvaluate[session, "from nomic import atlas
from nomic import AtlasDataset
merged_df = pd.merge(map.embeddings.projected,map.data.df, on='id_')
merged_df
"]
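
In case the merge step is unfamiliar: `pd.merge(..., on='id_')` does an inner join by default, so only rows whose `id_` shows up in both tables survive. Here is a small plain-Python sketch of that behaviour. The helper `inner_join` and the sample rows are made up and stand in for the real Nomic dataframes.

```python
def inner_join(left, right, key="id_"):
    """Join two lists of row dicts on a shared key, keeping only matching rows."""
    right_by_key = {}
    for row in right:
        right_by_key.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in right_by_key.get(row[key], []):
            # Note: pandas adds suffixes to clashing column names;
            # this sketch simply lets the right-hand value win.
            joined.append({**row, **match})
    return joined

# Hypothetical rows standing in for map.embeddings.projected and map.data.df.
projected = [{"id_": "a", "x": 0.1, "y": 0.9}, {"id_": "b", "x": -0.4, "y": 0.2}]
data = [
    {"id_": "a", "thought_author": "Hammurabi"},
    {"id_": "c", "thought_author": "Wittgenstein (trans. Ogden)"},
]
merged = inner_join(projected, data)
```

Rows "b" and "c" drop out because each appears in only one table, so it's worth checking the row count after merging.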

nomic data out

Now do more from within Wolfram, like plotting the author names in the embedding space.

ListPlot[
 Labeled[{#[[1]], #[[2]]}, #[[3]]] & /@ 
  Values[embeddingsNomicProjected[All, {"x", "y", "thought_author"}]],
  ColorFunction -> "DarkRainbow"]

View From Within Wolfram Lang

Let's add in the Nomic topics so we can label with that too!

embeddingsNomicProjected = 
ExternalEvaluate[session, "from nomic import atlas
from nomic import AtlasDataset
merged_df = pd.merge(map.embeddings.projected,map.topics.df, on='id_')
merged_df2 = pd.merge(merged_df ,map.data.df, on='id_')
merged_df2
"]

ListPlot[
 Labeled[{#[[1]], #[[2]]}, {#[[3]]}] & /@ 
  Values[embeddingsNomicProjected[
    All, {"x", "y", "topic_depth_1", "thought_author"}]], 
 ImageSize -> Full, ColorFunction -> "DarkRainbow"]

top level subjects

Maybe philosophers (ancient lawyers) have longer thoughts the more they write?

thought order by thought length
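
One rough way to put a number on that question is to correlate each thought's position with its length in characters, for one author at a time. The sketch below is only an illustration: the helper names and toy records are invented, and a Pearson correlation is just one possible measure, not necessarily what the plot shows.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def length_vs_order(records):
    """Correlate thought_id (writing order) with thought length in characters."""
    ordered = sorted(records, key=lambda r: r["thought_id"])
    orders = [r["thought_id"] for r in ordered]
    lengths = [len(r["thought"]) for r in ordered]
    return pearson(orders, lengths)

# Toy records, NOT real data from any of the three authors.
toy = [
    {"thought_id": 1, "thought": "Short."},
    {"thought_id": 2, "thought": "A bit longer."},
    {"thought_id": 3, "thought": "Longer still, by a few words."},
]
r = length_vs_order(toy)
```

A value near 1 would suggest thoughts get longer as the author goes on, and a value near 0 would suggest no linear trend.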

sentimentByAuthor = {#[[4]], #[[3]], 
    Classify["Sentiment", #[[5]]]} & /@ 
  Values[embeddingsNomicProjected[
    All, {"x", "y", "topic_depth_2", "thought_author", "thought", 
     "thought_id"}]]

what do you think the feelings are of each author by subject? :)
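
To get a quick answer, you could tally the {author, topic, sentiment} triples from `sentimentByAuthor` once they are in Python. This is a minimal sketch using only the standard library. The function `tally_sentiment`, the topic names and the sample triples are hypothetical; the labels follow the "Positive" / "Negative" / "Neutral" classes of the built-in sentiment classifier.

```python
from collections import Counter, defaultdict

def tally_sentiment(triples):
    """Count sentiment labels for each (author, topic) pair."""
    counts = defaultdict(Counter)
    for author, topic, sentiment in triples:
        counts[(author, topic)][sentiment] += 1
    return counts

# Hypothetical triples in the same {author, topic, sentiment} order.
triples = [
    ("Hammurabi", "Law", "Negative"),
    ("Hammurabi", "Law", "Negative"),
    ("Hammurabi", "Law", "Neutral"),
    ("Wittgenstein (trans. Ogden)", "Logic", "Neutral"),
]
counts = tally_sentiment(triples)

# Print the most common label per author and topic.
for (author, topic), labels in counts.items():
    label, n = labels.most_common(1)[0]
    print(f"{author} / {topic}: mostly {label} ({n} of {sum(labels.values())})")
```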

have fun!

p.s. here's a set of animations of Nomic showing different topologies of the thought data

Animation of Embedding and data using Nomic Atlas

animating through different projects to make some philosophic point about:

  • length of philosophers' ideas
  • order in which they have thoughts
  • diffusion of authors' thoughts
  • artful interpretation of Wittgenstein's famous last thought from Tractatus

You have earned a Featured Contributor Badge. Your exceptional post has been selected for our editorial column Staff Picks http://wolfr.am/StaffPicks and your profile is now distinguished by a Featured Contributor Badge and is displayed on the Featured Contributor Board. Thank you!

POSTED BY: EDITORIAL BOARD