Message Boards

Shakespearean GPT from scratch: create a generative pre-trained transformer

10 Replies

Thanks a lot for this excellent post which stimulated me to try to understand more of the internals of the network.

In this context, I came up with two questions which I was not able to figure out by myself:

1) How can I visualize the word-embedding space for the Shakespeare vocabulary, to test whether words that are "semantically near" show up nearby in the embedding map, similar to what Stephen Wolfram showed in his blog post "What Is ChatGPT Doing..."? I have the feeling it should be quite simple, but all my attempts to access the embedding layer of the trained network have failed, probably due to my very limited understanding of the net-surgery tools.
2) Is it possible to easily visualize the attention matrix for a particular input sentence?
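
For reference, the kind of thing I have been attempting for question 1 looks roughly like this (the layer name "embedding" is just my guess at how the net is structured):

```mathematica
(* sketch: pull the embedding weights out of the trained net and
   project the vocabulary to 2D; layer names are guesses on my part *)
embedding = NetExtract[trainedNet, "embedding"];
vectors = NetExtract[embedding, "Weights"];  (* one row per vocabulary token *)
coords = DimensionReduce[Normal[vectors], 2, Method -> "TSNE"];
ListPlot[coords]
```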

Any help is appreciated...

POSTED BY: Frank Scherbaum

I find using OpenAILink within the Wolfram notebook kinda fun, and the ability to save the results in an orderly fashion is super useful. I can almost write a sort of graphic novella. Here's my first try with the defaults.

However, OpenAICompletion did not follow the 7-word and 5-line rule that I gave. It worked fine on the OpenAI website but not with the paclet command. Here's the output from ChatGPT:


I tried to change the OpenAIMethod, with no luck.


POSTED BY: Jack I Houng

With the new 2.0.2 version of OpenAILink, the poetry is now as I requested!

POSTED BY: Jack I Houng

After reading "What Is ChatGPT Doing … and Why Does It Work?" I really wanted to explore the networks, better understand what they do, and play around with them in Mathematica. This post gave me everything in one go, which is awesome!

POSTED BY: Martijn Froeling

Happy to hear that. :)

I'm preparing another post on the original vanilla encoder-decoder transformer architecture (the "Attention Is All You Need" paper from 2017).

You might also be interested in this other recent post on few-shot learning using GPT-3 and WL: https://community.wolfram.com/groups/-/m/t/2848741

This is an excellent hands-on presentation. Thank you for this. I do have a question, though. In the example with:

trainedNet = NetExtract[results["TrainedNet"], "decode"];

Instead of using CloudPut and CloudGet, what would be the correct expression to save to a local file and retrieve it from the local file?

POSTED BY: Loren Abdulezer

Happy to hear that you enjoy it. :)

You can use Export, for example:

Export["trainedModel.wlnet", trainedNet]

And then you can load the model back with Import. Note that you might want to specify a particular directory in which to store the model, for example with SetDirectory.
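
Putting it together, a minimal sketch (the "models" folder name here is just an example, not something from the original post):

```mathematica
(* save the trained net to a chosen local folder *)
SetDirectory[FileNameJoin[{$HomeDirectory, "models"}]];
Export["trainedModel.wlnet", trainedNet];

(* in a later session, reload it from the same location *)
trainedNet = Import[FileNameJoin[{$HomeDirectory, "models", "trainedModel.wlnet"}]];
```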

Hope this helps.

Works perfectly. Thanks!

POSTED BY: Loren Abdulezer

Excellent introduction ... thanks!

POSTED BY: Stuart Nettleton

You have earned the Featured Contributor Badge! Your exceptional post has been selected for our editorial column Staff Picks http://wolfr.am/StaffPicks and your profile is now distinguished by a Featured Contributor Badge and is displayed on the Featured Contributor Board. Thank you!

POSTED BY: EDITORIAL BOARD