WOLFRAM COMMUNITY

GROUPS: Staff Picks, Data Science, External Programs and Systems, Wolfram Language, Machine Learning, Natural Language Processing, Artificial Intelligence

Exploring philosophy with LLM embeddings and Wolfram Language

Russell Foltz-Smith, SmarterX, Wolfram Alum, OpenAI Dev Ambassador, Microsoft MVP

Posted 1 year ago

LLM Embeddings and Philosophy are Easy and Fun to Use with Wolfram Lang

Overview of Project

Get some Philosophy texts (Code of Hammurabi, Wittgenstein, my own (Russell Foltz-Smith))
Clean/Tag it a bit
Use Nomic.ai Atlas to produce embeddings and visual map
Further explore the data in WolframLang
Enjoy life with philosophy over the eons

Tech used

Get an API Key from Nomic.ai https://docs.nomic.ai/
Set up your Python environment; it makes using Nomic easier from within WolframLang

The Code

start a python External Session

session = StartExternalSession[{"Python", "Name" -> "wolframlang"}]

get the data sources

ExampleData for Hammurabi
JSON for Wittgenstein's Tractatus from here https://github.com/Geurt/tractatus/tree/master/data (then simplified using a script; ChatGPT, WolframGPT, or similar can help you do it)

my own philosophy thoughts from a public google sheet

(* Code of Hammurabi from the built-in example data *)
text = ExampleData[{"Text", "CodeOfHammurabiEnglish"}];
(* split on numbered markers like "12. " *)
sections2 = StringSplit[text, RegularExpression["\\d+\\. "]];
(* alternative split using a lookbehind; not used below *)
sections = StringSplit[text, RegularExpression["(?<=\\d\\. )"]];

(* the Hammurabi records as a JSON string; not used below *)
hammurabi = ExportString[
   Map[<|{"thought_author" -> "Hammurabi", 
       "thought_id" -> Position[sections2, #][[1]][[1]], 
       "thought_section_id" -> Position[sections2, #][[1]][[1]], 
       "thought_sub_id" -> Position[sections2, #][[1]][[1]], 
       "thought" -> #}|> &, sections2], "JSON"];

(* Tractatus JSON (a local path, adjust to yours) and the public Google Sheet *)
TLPDataset = 
  Import["/Users/eventhorizon/Downloads/TLP_simple_decimal.json", 
   "Dataset"];
symbolAndrelation = 
  Import["https://docs.google.com/spreadsheets/d/e/2PACX-\
1vTV5nLJFkS4iCKojJ7qWMtHTVxAsnNmA_RsmqQircWgELoJqdZWmQn0wF-\
d4l06ivz5pgZJkeDbRxNh/pub?gid=0&single=true&output=csv", "Dataset", 
   "HeaderLines" -> 1];

(* one association per thought, with the same keys for all three authors *)
ham = Map[<|{"thought_author" -> "Hammurabi", 
      "thought_id" -> Position[sections2, #][[1]][[1]], 
      "thought_section_id" -> Position[sections2, #][[1]][[1]], 
      "thought_sub_id" -> Position[sections2, #][[1]][[1]], 
      "thought" -> #}|> &, sections2];
russ = Map[<|{"thought_author" -> "Russell Foltz-Smith", 
      "thought_id" -> #[[6]], "thought_section_id" -> #[[5]], 
      "thought_sub_id" -> #[[1]], "thought" -> #[[2]]}|> &, 
   Normal@symbolAndrelation];
witt = Map[<|{"thought_author" -> "Wittgenstein (trans. Ogden)", 
      "thought_id" -> 
       Flatten[Position[Normal[TLPDataset][[All, 2]], #[[2]]]][[1]], 
      "thought_section_id" -> #[[1]], "thought_sub_id" -> #[[2]], 
      "thought" -> #[[3]]}|> &, Normal@TLPDataset];

allThoughts = Join[ham, russ, witt]
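If you're more at home in Python, the Hammurabi split-and-tag step can be sketched with the standard library. This is only an illustration: the sample text and the helper name `tag_sections` are made up here, and it numbers the pieces by their position after the split, which is what the `Position` lookups above give for distinct sections.

```python
import re

def tag_sections(text, author):
    """Split text on markers like '12. ' and tag each piece with its position."""
    # Same pattern as the Wolfram RegularExpression["\\d+\\. "].
    # Note: unlike StringSplit, re.split keeps an empty leading piece if the
    # text starts with a marker.
    pieces = re.split(r"\d+\. ", text)
    records = []
    for position, piece in enumerate(pieces, start=1):  # 1-based, like Position
        records.append({
            "thought_author": author,
            "thought_id": position,
            "thought_section_id": position,
            "thought_sub_id": position,
            "thought": piece,
        })
    return records

# Made-up sample text, not the real ExampleData content
sample = "Preamble. 1. If a man steals, he pays. 2. If a judge errs, he is removed."
records = tag_sections(sample, "Hammurabi")
for r in records:
    print(r["thought_id"], repr(r["thought"]))
```

Note that the first record is whatever text comes before the first numbered marker, so you may want to drop it before mapping.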

Turn all the thoughts into a nice pandas dataframe for easy loading into the Nomic Python API

pythonCodeAll = StringJoin[{
    "import pandas as pd\n",
    "import json\n",
    "thoughts = <* allThoughts *>\n",
    "dfAll = pd.DataFrame(data=thoughts)\n",
    "dfAll['thought_sub_id'] = dfAll['thought_sub_id'].astype('float64')\n",
    "dfAll['thought_section_id'] = dfAll['thought_section_id'].astype('float64')\n",
    "dfAll"
    }];

(* Evaluate the Python code in the external session. *)
ExternalEvaluate[session, pythonCodeAll]
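A note on the two `astype('float64')` casts: my reading is that they are there because the three sources number their thoughts differently (positions for Hammurabi, decimal numbers for the Tractatus), and a column with mixed types is awkward to load. Here is a rough stdlib sketch of that normalization. The rows and the helper `to_float_id` are made up for illustration; the actual code lets pandas do the cast.

```python
def to_float_id(value):
    """Coerce an int, float, or numeric string id to a float."""
    return float(value)

# Made-up rows mixing integer positions and a decimal-style id
rows = [
    {"thought_author": "Hammurabi", "thought_section_id": 5, "thought_sub_id": 5},
    {"thought_author": "Wittgenstein (trans. Ogden)", "thought_section_id": 2, "thought_sub_id": "2.0121"},
]

for row in rows:
    for key in ("thought_section_id", "thought_sub_id"):
        row[key] = to_float_id(row[key])

print(rows)
```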

[Image: what the resulting dataframe should look like]

Create your nomic map!

ExternalEvaluate[session, "import nomic
nomic.login('[YOUR NOMIC API KEY]')"]

ExternalEvaluate[session, "from nomic import atlas
dataset = atlas.map_data(data=dfAll,
indexed_field='thought',            
is_public=False,
identifier='[your URL friendly name]',
embedding_model='nomic-embed-text-v1.5',
description='how are thoughts really connected?')
dataset.maps[0]"]

After your data finishes mapping on Nomic (it computes the embeddings automatically), you'll get a URL that takes you to your interactive map.

Then you bring your embedding, topic, and duplicate-detection data back into WolframLang for more fun.

ExternalEvaluate[session, "from nomic import atlas
from nomic import AtlasDataset
datasetThoughts = AtlasDataset('thoughts-of-three')
map = datasetThoughts.maps[0]
map.data.df
"]

or if you just want the projected (2D) embeddings

embeddingsNomicProjected = 
 ExternalEvaluate[session, "from nomic import atlas
from nomic import AtlasDataset
map.embeddings.projected
"]

You'll probably just want the x, y projected embeddings and the other data in one dataframe (a Dataset is the WolframLang counterpart of a dataframe)

embeddingsNomicProjected = 
 ExternalEvaluate[session, "from nomic import atlas
from nomic import AtlasDataset
merged_df = pd.merge(map.embeddings.projected, map.data.df, on='id_')
merged_df
"]

[Image: nomic data out]
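If the `pd.merge(..., on='id_')` step is unfamiliar: with its default settings it is an inner join, keeping the rows whose `id_` appears in both tables and putting their columns side by side. A plain-Python sketch of the same idea follows. The helper `merge_on` and the rows are made up for illustration, and it assumes each `id_` occurs only once on the right-hand side.

```python
def merge_on(left, right, key):
    """Inner-join two lists of dicts on a shared key (a rough stand-in for pd.merge)."""
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:  # inner join: keep only ids present on both sides
            merged.append({**row, **match})
    return merged

# Made-up rows standing in for map.embeddings.projected and map.data.df
projected = [
    {"id_": "a", "x": 0.1, "y": 1.2},
    {"id_": "b", "x": -0.4, "y": 0.3},
]
data = [
    {"id_": "a", "thought_author": "Hammurabi"},
    {"id_": "b", "thought_author": "Russell Foltz-Smith"},
]

merged = merge_on(projected, data, "id_")
print(merged[0])
```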

Now do more from within Wolfram, like plotting the author names in the embedding space

ListPlot[
 Labeled[{#[[1]], #[[2]]}, #[[3]]] & /@ 
  Values[embeddingsNomicProjected[All, {"x", "y", "thought_author"}]],
  ColorFunction -> "DarkRainbow"]

[Image: View From Within Wolfram Lang]

Let's add in the Nomic topics so we can label with that too!

embeddingsNomicProjected = 
 ExternalEvaluate[session, "from nomic import atlas
from nomic import AtlasDataset
merged_df = pd.merge(map.embeddings.projected, map.topics.df, on='id_')
merged_df2 = pd.merge(merged_df, map.data.df, on='id_')
merged_df2
"]

ListPlot[
 Labeled[{#[[1]], #[[2]]}, {#[[3]]}] & /@ 
  Values[embeddingsNomicProjected[
    All, {"x", "y", "topic_depth_1", "thought_author"}]], 
 ImageSize -> Full, ColorFunction -> "DarkRainbow"]

[Image: top level subjects]

Maybe philosophers (ancient lawyers) have longer thoughts the more they write?

[Image: thought order by thought length]
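One rough way to put a number on that question is the correlation between a thought's order (`thought_id`) and its length in characters. The sketch below uses made-up thoughts and a hand-rolled Pearson correlation; it is just one possible check, not how the picture above was produced.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up thoughts; in practice use one author's rows from the merged data
thoughts = [
    {"thought_id": 1, "thought": "Short."},
    {"thought_id": 2, "thought": "A little longer."},
    {"thought_id": 3, "thought": "Noticeably longer than the first two."},
]

order = [t["thought_id"] for t in thoughts]
length = [len(t["thought"]) for t in thoughts]
r = pearson(order, length)
print(round(r, 3))
```

A value near 1 would mean later thoughts tend to be longer, and a value near 0 would mean no linear relationship. It is worth computing per author, since the three sources number their thoughts differently.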

sentimentByAuthor = {#[[4]], #[[3]], 
    Classify["Sentiment", #[[5]]]} & /@ 
  Values[embeddingsNomicProjected[
    All, {"x", "y", "topic_depth_2", "thought_author", "thought", 
     "thought_id"}]]

what do you think the feelings are of each author by subject? :)
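To start answering that, you could tally the sentiment labels per author and topic. Here is a small sketch with made-up triples in the same (author, topic, sentiment) shape as `sentimentByAuthor`. The topic names are invented, and the labels are the "Positive", "Negative", and "Neutral" classes that the sentiment classifier returns.

```python
from collections import Counter

# Made-up (author, topic, sentiment) triples in the shape of sentimentByAuthor
sentiment_by_author = [
    ("Hammurabi", "Law", "Negative"),
    ("Hammurabi", "Law", "Negative"),
    ("Hammurabi", "Property", "Neutral"),
    ("Wittgenstein (trans. Ogden)", "Logic", "Neutral"),
]

# Count how often each (author, topic, sentiment) combination occurs
counts = Counter(sentiment_by_author)
for (author, topic, sentiment), n in sorted(counts.items()):
    print(f"{author} / {topic}: {sentiment} x{n}")
```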

have fun!

p.s. here's a set of animations from Nomic showing different topologies of the thought data

[Image: Animation of Embedding and data using Nomic Atlas]

animating through different projections to make some philosophic point about:
* length of philosophers' ideas
* order in which they have thoughts
* diffusion of authors' thoughts
* artful interpretation of Wittgenstein's famous last thought from the Tractatus

EDITORIAL BOARD, WOLFRAM

Posted 1 year ago

You have earned the *Featured Contributor Badge*. Your exceptional post has been selected for our editorial column *Staff Picks* (http://wolfr.am/StaffPicks), and your profile is now distinguished by a *Featured Contributor Badge* and is displayed on the Featured Contributor Board. Thank you!
