LLM Embeddings and Philosophy are Easy and Fun to Use with Wolfram Lang
Overview of Project
- Get some Philosophy texts (Code of Hammurabi, Wittgenstein, my own (Russell Foltz-Smith))
- Clean/Tag it a bit
- Use Nomic.ai Atlas to produce embeddings and visual map
- Further explore the data in WolframLang
- Enjoy life with philosophy over the eons
Tech used
- Get an API Key from Nomic.ai https://docs.nomic.ai/
- Set up your Python environment; it makes using nomic easier from within WolframLang
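Before starting the external session, it can help to confirm that the Python interpreter can actually see the packages used later in this post. A minimal sketch, assuming the packages import as `nomic` and `pandas`; the helper name `missing_packages` is made up for illustration:

```python
import importlib.util

def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# `nomic` and `pandas` are the two packages the later cells import.
missing = missing_packages(["nomic", "pandas"])
if missing:
    print("Install these first (for example with pip):", ", ".join(missing))
else:
    print("Python environment looks ready.")
```

Run it with the same interpreter the external session will use, otherwise the result says nothing about that session.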
The Code
Start a Python external session
session = StartExternalSession[{"Python", "Name" -> "wolframlang"}]
Get the data sources
- ExampleData for Hammurabi
- JSON for Wittgenstein's Tractatus from here https://github.com/Geurt/tractatus/tree/master/data (then simplified using a script; ChatGPT, WolframGPT or similar can help ya do it)
- My own philosophy thoughts from a public Google Sheet
(* Hammurabi: split the text into numbered sections *)
text = ExampleData[{"Text", "CodeOfHammurabiEnglish"}];
sections2 = StringSplit[text, RegularExpression["\\d+\\. "]];
sections = StringSplit[text, RegularExpression["(?<=\\d\\. )"]];
hammurabi = ExportString[
   Map[<|{"thought_author" -> "Hammurabi",
       "thought_id" -> Position[sections2, #][[1]][[1]],
       "thought_section_id" -> Position[sections2, #][[1]][[1]],
       "thought_sub_id" -> Position[sections2, #][[1]][[1]],
       "thought" -> #}|> &, sections2], "JSON"];

(* Wittgenstein: the simplified Tractatus JSON, saved locally *)
TLPDataset = Import["/Users/eventhorizon/Downloads/TLP_simple_decimal.json", "Dataset"];

(* My own thoughts: a public Google Sheet published as CSV *)
symbolAndrelation =
  Import["https://docs.google.com/spreadsheets/d/e/2PACX-\
1vTV5nLJFkS4iCKojJ7qWMtHTVxAsnNmA_RsmqQircWgELoJqdZWmQn0wF-\
d4l06ivz5pgZJkeDbRxNh/pub?gid=0&single=true&output=csv", "Dataset",
   "HeaderLines" -> 1];

(* One association per thought, with the same keys for every author *)
ham = Map[<|{"thought_author" -> "Hammurabi",
      "thought_id" -> Position[sections2, #][[1]][[1]],
      "thought_section_id" -> Position[sections2, #][[1]][[1]],
      "thought_sub_id" -> Position[sections2, #][[1]][[1]],
      "thought" -> #}|> &, sections2];
russ = Map[<|{"thought_author" -> "Russell Foltz-Smith",
      "thought_id" -> #[[6]], "thought_section_id" -> #[[5]],
      "thought_sub_id" -> #[[1]], "thought" -> #[[2]]}|> &,
   Normal@symbolAndrelation];
witt = Map[<|{"thought_author" -> "Wittgenstein (trans. Ogden)",
      "thought_id" ->
       Flatten[Position[Normal[TLPDataset][[All, 2]], #[[2]]]][[1]],
      "thought_section_id" -> #[[1]], "thought_sub_id" -> #[[2]],
      "thought" -> #[[3]]}|> &, Normal@TLPDataset];
allThoughts = Join[ham, russ, witt]
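The "simplified using a script" step for the Tractatus JSON is not shown in this post. As a rough sketch of what such a script might do, the snippet below flattens a nested tree of propositions into flat rows with a section number, a decimal id, and the text. The input layout (keys `number`, `text`, `children`) is purely an assumption for illustration and may not match the files in the linked repository; the output keys are hypothetical too.

```python
def flatten(node, rows=None):
    """Walk a nested proposition tree depth-first and collect flat rows."""
    rows = [] if rows is None else rows
    number = node["number"]  # hypothetical key, e.g. "1.11"
    rows.append({
        "section": int(number.split(".")[0]),  # top-level proposition, e.g. 1
        "decimal": float(number),              # full proposition number
        "text": node["text"],
    })
    for child in node.get("children", []):
        flatten(child, rows)
    return rows

# A tiny hand-made tree in the assumed layout (Ogden translation).
tree = {
    "number": "1",
    "text": "The world is everything that is the case.",
    "children": [{
        "number": "1.1",
        "text": "The world is the totality of facts, not of things.",
        "children": [{
            "number": "1.11",
            "text": "The world is determined by the facts, and by these being all the facts.",
        }],
    }],
}

rows = flatten(tree)
for row in rows:
    print(row["section"], row["decimal"], row["text"])
```

Three columns in that order line up with how the `witt` mapping above reads each row by position.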
Turn all the thoughts into a nice pandas DataFrame for easy loading into the Nomic Python API
pythonCodeAll = StringJoin[{
"import pandas as pd\n",
"import json\n",
"thoughts = <* allThoughts *>\n",
"dfAll = pd.DataFrame(data=thoughts)\n",
"dfAll['thought_sub_id'] \
=dfAll['thought_sub_id'].astype('float64')\n",
"dfAll['thought_section_id'] \
=dfAll['thought_section_id'].astype('float64')\n",
"dfAll"
}];
(* Evaluate the Python code in the external session. *)
ExternalEvaluate[session, pythonCodeAll]
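The two `astype('float64')` casts above are likely to raise if any id arrives as text that is not a number, which is easy to get from a hand-edited spreadsheet. A minimal plain-Python sketch of a check that could be run on the records first; the field names come from the code above, while the helper name `non_numeric_ids` and the sample rows are made up for illustration:

```python
def non_numeric_ids(records, fields=("thought_sub_id", "thought_section_id")):
    """Return (row_index, field, value) for every id that float() cannot parse."""
    problems = []
    for i, record in enumerate(records):
        for field in fields:
            value = record.get(field)
            try:
                float(value)
            except (TypeError, ValueError):
                problems.append((i, field, value))
    return problems

# Made-up sample rows: the third one has an id that is not a number.
sample = [
    {"thought_author": "Hammurabi", "thought_sub_id": 1, "thought_section_id": 1},
    {"thought_author": "Wittgenstein (trans. Ogden)", "thought_sub_id": "1.1", "thought_section_id": 1},
    {"thought_author": "Russell Foltz-Smith", "thought_sub_id": "n/a", "thought_section_id": 3},
]
print(non_numeric_ids(sample))
# [(2, 'thought_sub_id', 'n/a')]
```

An empty list means every id in those two fields can at least be parsed as a float.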
Create your nomic map!
ExternalEvaluate[session, "import nomic
nomic.login('[YOUR NOMIC API KEY]')"]
ExternalEvaluate[session, "from nomic import atlas
dataset = atlas.map_data(data=dfAll,
indexed_field='thought',
is_public=False,
identifier='[your URL friendly name]',
embedding_model='nomic-embed-text-v1.5',
description='how are thoughts really connected?')
dataset.maps[0]"]
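Pasting the API key straight into a notebook makes it easy to leak when the notebook is shared. One hedged alternative is to read it from an environment variable inside the Python session. The variable name `NOMIC_API_KEY` and the helper `read_api_key` are assumptions chosen for this sketch; the only nomic call involved is the `nomic.login` already used above, shown commented out because it needs a real key.

```python
import os

def read_api_key(var_name="NOMIC_API_KEY", env=None):
    """Fetch the key from the environment and fail loudly if it is missing."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key

# Usage inside the external session (needs a real key, so left commented out):
# import nomic
# nomic.login(read_api_key())
```

The environment variable has to be set before the Wolfram kernel starts the external session for the Python process to see it.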
After your data finishes mapping on Nomic (it will get embeddings automatically), you'll get a URL that takes you to your interactive map.
Then you bring your embeddings, topics and duplicate-detection data back into WolframLang for more fun.
ExternalEvaluate[session, "from nomic import atlas
from nomic import AtlasDataset
datasetThoughts = AtlasDataset('thoughts-of-three')
map = datasetThoughts.maps[0]
map.data.df
"]
Or, if you just want the projection (2D embeddings):
embeddingsNomicProjected =
ExternalEvaluate[session, "from nomic import atlas
from nomic import AtlasDataset
map.embeddings.projected
"]
You'll probably just want the x, y projected embeddings and the other data in one dataframe/Dataset (the WolframLang counterpart of a dataframe)
embeddingsNomicProjected =
ExternalEvaluate[session, "from nomic import atlas
from nomic import AtlasDataset
merged_df = pd.merge(map.embeddings.projected,map.data.df, on='id_')
merged_df
"]
Now do more from within Wolfram, like plotting the author names in the embedding space
ListPlot[
Labeled[{#[[1]], #[[2]]}, #[[3]]] & /@
Values[embeddingsNomicProjected[All, {"x", "y", "thought_author"}]],
ColorFunction -> "DarkRainbow"]
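The dataset plotted here was built with `pd.merge(..., on='id_')`, which performs an inner join by default: only rows whose `id_` appears in both frames survive. The plain-Python sketch below imitates that behaviour on tiny made-up rows so the effect is easy to see. It is an illustration of the join, not what pandas does internally, and it assumes `id_` is unique on the right-hand side. The helper name `inner_join` and the coordinate values are invented.

```python
def inner_join(left_rows, right_rows, key="id_"):
    """Combine rows that share the same key, like an inner pd.merge on that key."""
    right_by_key = {row[key]: row for row in right_rows}  # assumes unique keys
    return [
        {**left, **right_by_key[left[key]]}
        for left in left_rows
        if left[key] in right_by_key
    ]

# Made-up rows standing in for the projected embeddings and the thought data.
projected = [
    {"id_": "a", "x": 0.1, "y": 0.2},
    {"id_": "b", "x": -1.0, "y": 0.5},
]
data = [
    {"id_": "a", "thought_author": "Hammurabi"},
    {"id_": "c", "thought_author": "Wittgenstein (trans. Ogden)"},
]
print(inner_join(projected, data))
# [{'id_': 'a', 'x': 0.1, 'y': 0.2, 'thought_author': 'Hammurabi'}]
```

Rows `b` and `c` drop out because each exists on only one side, so a merged dataset that looks too short is worth checking against the row counts of its inputs.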
Let's add in the Nomic topics so we can label with that too!
embeddingsNomicProjected =
ExternalEvaluate[session, "from nomic import atlas
from nomic import AtlasDataset
merged_df = pd.merge(map.embeddings.projected,map.topics.df, \
on='id_')
merged_df2 = pd.merge(merged_df ,map.data.df, on='id_')
merged_df2
"]
ListPlot[
Labeled[{#[[1]], #[[2]]}, {#[[3]]}] & /@
Values[embeddingsNomicProjected[
All, {"x", "y", "topic_depth_1", "thought_author"}]],
ImageSize -> Full, ColorFunction -> "DarkRainbow"]
Maybe philosophers (ancient lawyers) have longer thoughts the more they write?
sentimentByAuthor = {#[[4]], #[[3]],
Classify["Sentiment", #[[5]]]} & /@
Values[embeddingsNomicProjected[
All, {"x", "y", "topic_depth_2", "thought_author", "thought",
"thought_id"}]]
What do you think the feelings of each author are, by subject? :)
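One way to start answering that question is to tally the sentiment labels per author and topic. A minimal sketch, assuming rows shaped like the `{author, topic, sentiment}` triples produced above; the topic names and labels in the sample are invented for illustration, and `tally` is a hypothetical helper:

```python
from collections import Counter, defaultdict

def tally(rows):
    """Count sentiment labels for each (author, topic) pair."""
    counts = defaultdict(Counter)
    for author, topic, sentiment in rows:
        counts[(author, topic)][sentiment] += 1
    return counts

# Invented sample rows in the same (author, topic, sentiment) order.
rows = [
    ("Hammurabi", "Law", "Negative"),
    ("Hammurabi", "Law", "Negative"),
    ("Hammurabi", "Law", "Neutral"),
    ("Wittgenstein (trans. Ogden)", "Logic", "Neutral"),
]

for (author, topic), labels in tally(rows).items():
    print(author, "|", topic, "|", dict(labels))
```

Counts like these are easier to compare across authors than a long list of per-thought labels.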
have fun!
p.s. here's a set of animations of Nomic showing different topologies of the thought data: "Animation of Embedding and data using Nomic Atlas"
Animating through different projections to make some philosophic point about:
- length of philosophers' ideas
- order in which they have thoughts
- diffusion of authors' thoughts
- artful interpretation of Wittgenstein's famous last thought from the Tractatus