[WSG25] Daily Study Group: What is ChatGPT Doing... and Why Does It Work?

Posted 2 months ago

A one-week Wolfram U Daily Study Group covering Stephen Wolfram's best-selling book What is ChatGPT Doing... and Why Does It Work? begins on Monday, March 3, 2025.


Join a cohort of fellow learners to discover the principles that underlie ChatGPT and other LLMs. I have adapted the material from the aforementioned book into a series of four notebooks covering topics ranging from probabilistic text generation to neural nets, machine learning, embeddings and even transformer models.
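As a small taste of where the first notebook starts, here is a minimal Wolfram Language sketch of Markov-style text generation from letter-pair statistics. It is illustrative only: it uses a built-in ExampleData text rather than the course materials.

    (* Generate text one character at a time from letter-pair statistics,
       in the spirit of the book's opening chapters. *)
    text = ToLowerCase[ExampleData[{"Text", "AliceInWonderland"}]];
    chars = Characters[StringReplace[text, Except[LetterCharacter | " "] -> ""]];
    successors = GroupBy[Partition[chars, 2, 1], First -> Last];
    nextChar[c_] := RandomChoice[successors[c]];
    StringJoin[NestList[nextChar, "t", 80]]

The output is gibberish with an English "feel" to it, which is exactly the point the book makes before moving on to longer-range statistics and, eventually, transformer models.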

On the final day of the study group, I will go through a bunch of interesting examples using our new Notebook Assistant, which closely integrates powerful LLMs with the Wolfram documentation system and notebook interface.

Stephen Wolfram's book is aimed at anybody who is curious about these ideas, and this study group follows the book's lead. Therefore, no prior Wolfram Language, machine learning, or even coding experience is necessary to attend this study group.

Please feel free to post any questions, ideas and/or useful links in this thread between sessions—we always love to continue the discussion here on Community! If you'd like to read the discussion from the last time we ran this study group, you can find that here.

This is a one-week study group that will run from March 3 through March 7, 2025, at 11:00am Central US Time each day.

REGISTER HERE


POSTED BY: Arben Kalziqi
Posted 1 month ago

Several observations:

I absolutely love computational essays as a means to host lectures. They should be used everywhere–for all sorts of online presentations. I loved the example in Lecture #4 of projecting the "shadow" of a multidimensional array onto a 2D surface. That provides a perfect model for what happens when we make puns: there is some orientation of projection of the multi-dimensional word-space in which the "shadow" lines particular words up next to each other. There are many different kinds of puns–just like there are many orientations in which one could project the shadow of a word-map.
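If you want to play with that "shadow" picture yourself, here is a rough Wolfram Language sketch. It is my own example, not from the lecture notebook; the GloVe model is downloaded from the Wolfram Neural Net Repository the first time it is used, so it needs internet access.

    (* Project 50-dimensional word embeddings down to 2D and see which words
       land near one another -- the "shadow" depends on the projection chosen. *)
    glove = NetModel["GloVe 50-Dimensional Word Vectors Trained on Wikipedia and Gigaword 5 Data"];
    words = {"cat", "dog", "kitten", "puppy", "car", "truck", "apple", "banana"};
    vectors = First[glove[#]] & /@ words;       (* one 50-dimensional vector per word *)
    FeatureSpacePlot[Thread[vectors -> words]]  (* a 2D "shadow" of the word space *)

Different reduction methods (via the Method option of FeatureSpacePlot) give differently oriented shadows, which is the analogy to different kinds of puns.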

Indirectly, this course has some profound commentary about learning. In the context of training a neural net, errors are never a problem. Errors are simply a way to re-jigger the neural network to behave better in the future -- to train to a local minimum. Maybe the most important strategy for learning is seeking out strategies-for-possibly-making-errors, noticing when you make errors, and letting the neural network rewire itself as appropriate. We're so obsessed with avoiding [public] errors; that's really silly.
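That "errors as the training signal" picture is essentially what NetTrain does. Here is a toy sketch of my own (not from the course notebooks), assuming NetTrain can infer the scalar input/output shapes from the data, as it usually can:

    (* Fit a tiny neural net to sin(x): every wrong prediction nudges the
       weights slightly, walking the loss down toward a (local) minimum. *)
    data = Table[x -> Sin[x], {x, 0., 2 Pi, 0.1}];
    net = NetChain[{LinearLayer[20], Tanh, LinearLayer[]}];
    trained = NetTrain[net, data];
    Plot[{Sin[x], trained[x]}, {x, 0, 2 Pi}]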

The commentary about what problems AIs cannot solve was as valuable as anything else in this course. I like how Stephen went back to his cellular automata to explain this.

Word puzzles and number puzzles are interesting. When playing Knotwords, it's useful to know vowel-consonant combinations and groups of 3 letters. At the same time, that's insufficient to do well at the game. You come face-to-face with strange rules: "if I have guessed an 8-letter word, it must be right". I've gotten better over time, but I couldn't really say how.

Wolfram Research has a compelling value proposition with LLMs + Wolfram|Alpha + the Wolfram Language. It's uncanny how Wolfram Research's past investments in a vast library of human knowledge make it the perfect partner for LLMs. Understanding the shortcomings of LLMs and why they are contained by judicious use of the Wolfram Assistant is an important thing to explain to others.
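A hedged sketch of that division of labor in Wolfram Language (both calls need internet access, LLMSynthesize needs configured LLM service access, and the query wording is just my example):

    (* Let Wolfram|Alpha / Wolfram Language supply the fact, and the LLM supply the prose. *)
    ratio = WolframAlpha["population of France / population of Norway", "Result"];
    LLMSynthesize[
      "In one sentence, compare the populations of France and Norway, given that their ratio is about " <>
        ToString[ratio]]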

The Q&A digests that Arben published from this course are tremendous. Arben's comment that code functions more like a natural language than an algorithm is intriguing. I am paraphrasing his comment; I'll have to scrutinize his written answer.

POSTED BY: Updating Name
Posted 1 month ago

@Arben, I attended the course live on Friday and am going through the other sessions via the recordings. Good job with the poll questions; they worked great in the BigMarker replays. I was able to understand the poll questions as you asked them, and you consistently shared the results on your screen, so people listening that way could play along. It's a good quiz, too.

I loved your discussion about particles in the universe on Day 1. It tickled me immensely to see an instructor provide a calculation that uses [Wolfram's estimate of] the number of particles in the universe.

That discussion reminded me of a short story by Stanislaw Lem in the book The Cyberiad. In the collection, a pair of "constructor" robots create fantastic machines. In the sixth sally, they create a demon of the second kind to work their way out of a problem. The story references a demon of the first kind -- Maxwell's Demon. Maxwell's demon was about thermodynamics; this second demon is about information. For a book written over 50 years ago, Lem's machine in this story is a rather astonishing vision of an LLM. The demon uses a tiny bit of stale air -- a tiny number of particles -- as the source of its information; it could not possibly work as described in the story. OTOH, Maxwell's demon couldn't possibly work, either. Both hypotheticals are quite interesting; they are fine little thought experiments. I don't know if you -- or anyone -- will learn anything from Lem's short story, but you should be amused. Ask an AI to find you a copy.

POSTED BY: Phil Earnhardt
Posted 1 month ago

Arben, On Wednesday I had asked, "Is DeepSeek doing something fundamentally different, or did they find a way to do it more efficiently?" You responded, "I have to say I'm not sure on that front—I'll try to look into this tomorrow and post about it in the Community thread." Have you had a chance to look into this yet?

Also, I am not seeing a Digest_Day5 in the Daily Q&A Digests.

POSTED BY: Gerald Oberg

Hmm... annoying!! I absolutely dropped it in there yesterday. I'll make sure it works this time.

As far as DeepSeek goes, as best I can tell there are indeed some architectural improvements, but the overall idea and structure remain the same and most of the improvements are in the training process. This article from MIT Technology Review is insightful! https://www.technologyreview.com/2025/01/31/1110740/how-deepseek-ripped-up-the-ai-playbook-and-why-everyones-going-to-follow-it/

POSTED BY: Arben Kalziqi
Posted 1 month ago

Arben, Digest_Day5 is still not showing up in the Daily Q&A Digests. Did you by any chance put it in the Series 58 ChatGPT rather than the Series 62 ChatGPT? Thanks for the link. The three that I put in the other post are more about the geopolitical implications of AI and LLMs rather than their technology.

POSTED BY: Gerald Oberg

Added!! Sorry about that!

POSTED BY: Arben Kalziqi
Posted 1 month ago

Okay, I found it (posted two days ago) using a link from the latest email received two hours ago (3/12 at 11:59AM).

POSTED BY: Gerald Oberg

Arben, Thanks 10^6 for a wonderful course! Question: Will you be posting an updated Q&A Digest along with an updated notebook for Day 5? We would appreciate it very much.

POSTED BY: Zbigniew Kabala

Ah, yes! The digest can be added asap, which at this hour probably means Monday morning :). As for an updated notebook—if you mean the transcript of the chats, I've added that already and it should be visible. Let me know if not! (If you mean the questions people asked in the Thursday survey, I'll try to review those more thoroughly and provide answers where reasonable.)

POSTED BY: Arben Kalziqi
Posted 1 month ago

In the last class I was wondering about two topics related to lines of research at Wolfram:

1) Is Wolfram Research developing its own algorithmic differentiation framework for programs, a.k.a. (parametrized) functions, that depend on loops, conditionals, recursion, and (non)smooth elemental functions? I commented in class that this sits between numerical and symbolic differentiation; a toy sketch of what I mean appears after these questions. By the way, this is the standard in Scientific Machine Learning and the backbone of computationally efficient optimizers like Adam (and its subsequent versions).

2) If the answer to (1) is no, are you more focused on developing discrete techniques to emulate neural networks' capabilities via rule arrays applied to cellular automata?
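To make (1) concrete, here is a toy forward-mode sketch in Wolfram Language using "dual numbers" {value, derivative}. This is only my illustration of what differentiating a program with loops and conditionals means, not anything Wolfram Research has announced:

    (* Each quantity carries {value, derivative}; arithmetic propagates both. *)
    dual[v_, d_] := {v, d};
    dAdd[{v1_, d1_}, {v2_, d2_}] := {v1 + v2, d1 + d2};
    dMul[{v1_, d1_}, {v2_, d2_}] := {v1 v2, v1 d2 + v2 d1};

    (* A small "program": the product (x+1)(x+2)...(x+n), computed with a loop,
       followed by an absolute-value branch -- AD differentiates through both. *)
    f[x_?NumericQ, n_Integer] := Module[{acc = dual[1., 0.], xd = dual[x, 1.]},
      Do[acc = dMul[acc, dAdd[xd, dual[k, 0.]]], {k, n}];
      If[First[acc] < 0, dMul[acc, dual[-1., 0.]], acc]];

    f[2., 3]   (* -> {60., 47.}: value and derivative of (x+1)(x+2)(x+3) at x = 2 *)

For comparison, Wolfram Language's built-in D handles the closed-form case (D[(x + 1) (x + 2) (x + 3), x] /. x -> 2 also gives 47); the question above is about differentiating general programs rather than symbolic expressions.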

POSTED BY: Angel Rojas
Posted 1 month ago

Arben, Can you please explain your decision in your communications to abrogate the standard rule of capitalizing the first letter in a sentence? Is that just a strategy to minimize the time to post a response to a question/comment? Is it a standard style followed by certain tech people? Would you do the same in more formal settings, such as published articles? Are you in the vanguard of a movement to change the English language? No disrespect or criticism intended - I am just curious …

POSTED BY: Gerald Oberg

I don't! If you're referring to the Q&A digests, that's because those are just logs of the live chats from the sessions rather than email or otherwise "official"/formal communications. I think you'll find that the number of people who capitalized the first word of their messages back on AIM in 1997 was also quite small—though to your point, I do imagine that that number is shrinking over time as people have more access to instant back-and-forth communication. Language does change over time, and while I am largely a stickler for rules in a visceral sense I'm certainly not a prescriptivist. (If I were, I might point out that you use a hyphen rather than an em-dash in your last sentence, and add a novel space before the ellipses :). Language—written and spoken—always changes, particularly when exacerbated by the movement to new mediums and entry tools like keyboards where it's easier to type a hyphen than an en- or em-dash.)

POSTED BY: Arben Kalziqi
Posted 1 month ago

In response to the survey question, "How will you use what you learned at this Daily Study Group?" I wrote: I will have a better appreciation and understanding of what is being discussed in the numerous newscasts, podcasts, articles, and interviews that one encounters about ChatGPT or other LLMs. Something the course did not address: how could the technology overviewed here lead to the catastrophic results people are predicting about AGI? One would not think that even monstrously large matrices could produce malicious consciousness. (I am sometimes reminded of a book I read as a teenager, "Colossus", the 1966 science fiction novel by D. F. Jones, later filmed as "Colossus: The Forbin Project", about supercomputers taking control of mankind.) The things Arben demoed are really impressive, but there is no "mind" (with potential intentions) producing them. I would like to hear Arben's thoughts about these issues. Even more, can you point us somewhere that Stephen Wolfram has discussed these issues (potential threats of AI or AGI)?

Here is a good expert discussion: https://drive.google.com/file/d/1JVPc3ObMP1L2a53T5LA1xxKXM6DAwEiC/view

Also: https://www.google.com/books/edition/The_Age_of_AI/Y2QwEAAAQBAJ?hl=en&gbpv=1

This could be considered a post-DeepSeek update to the reference above: https://www.csis.org/analysis/deepseek-huawei-export-controls-and-future-us-china-ai-race

POSTED BY: Gerald Oberg

Hi Gerald—I've taken some time to respond to this one because I'm not sure there's much I can say in my position here. I have personal opinions on this, but we'd kind of get into the weeds more than I think we ought to on Community. I have two main things that I think I can say:

  1. LLMs really are quite good at a lot of things, as we've seen. However, some companies suggesting that they're SO powerful that we need to be extremely worried about them "taking over" (etc.) needs to be understood in its proper context—namely, ask "who benefits?" I think it's highly unlikely that most Western governments will outright ban or even strongly regulate this sort of stuff, so when a company says this kind of thing, I think the only reasonably expectable outcome is that it creates a public image of the product it is trying to sell as being unbelievably powerful.

  2. At the very least, I think that in the short and medium term the larger danger to society posed by LLMs is the potential for actual, sourced knowledge to be "evaporated". LLMs can learn "the gist" of facts about the world, but they don't know them and generally don't retain them word for word or have the ability to reliably produce sources. As more and more people rely on "what AI said" in terms of forming their opinions, beliefs, and views about certain facts of the world, I think that we run the risk of some bad outcomes. (An unexpected one I just saw recently—somebody who must have been rather young asked for help from other people on a forum, but their request was formatted exactly like a query to an LLM: "Explain it this and this way, and DON'T do this or I won't believe you." Scary!) You have to take this with a grain of salt because I'm an employee here at Wolfram telling you this, but this very issue is why computational language and LLM integration with tools like Wolfram Language is really important: it provides a way for users to be sure that what they're getting from the LLM is related in a concrete way to the real world.

I don't think that there's a 1:1 discussion from Stephen on this topic, but you're likely to find some posts of interest here: https://writings.stephenwolfram.com/category/artificial-intelligence/

POSTED BY: Arben Kalziqi

Thanks, Arben. I did not see this aspect of the problem. Very interesting.

POSTED BY: Laurence Bloxham

Gradient descent optimization procedures can get trapped at false (local) minima. How does ChatGPT avoid or correct false minima? Aside: incredible class, many thanks, Arben.

POSTED BY: Laurence Bloxham

Thanks, Laurence! One of the big reasons that LLMs don't get stuck is that they live in such a high-dimensional space. Imagine you're on some 2D surface: if you're in a well, it's hard to get out because you only have two directions to move—but imagine you had tens of thousands of directions to try to get out. It's much easier!
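One way to see this numerically is the standard random-matrix intuition (a rough sketch, not anything specific to GPT's training): at a critical point, being a true local minimum requires every direction to curve upward, and for a random symmetric Hessian that becomes vanishingly unlikely as the dimension grows.

    (* Fraction of random symmetric d x d "Hessians" that are positive definite,
       i.e. curve upward in every direction (a genuine trap). *)
    probAllUp[d_, trials_: 500] := Module[{m},
      N@Mean@Table[
         m = RandomVariate[NormalDistribution[], {d, d}];
         Boole@PositiveDefiniteMatrixQ[(m + Transpose[m])/2],
         {trials}]];
    probAllUp /@ {1, 2, 4, 8, 12}
    (* The fraction drops rapidly toward 0: in high dimensions almost every
       critical point has at least one downhill escape direction. *)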

POSTED BY: Arben Kalziqi
Posted 1 month ago

Also, converging to a global minimiser of the loss function would not be ideal: we do not want our model to overfit the outputs of the training set, as in the supervised learning scenario.

However, the "descent" direction that an optimiser computes should not be purely descent. Why? Because the more we explore the parameter space, the better we can find suitable "contextual" fits to the output space. By exploring as many neighborhoods of local minimizers as possible (while still converging toward a minimum), we could improve training by starting the search from a local minimizer that worked for a certain context. However, the optimizer would need to be smart enough to escape from that local minimizer again.
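Here is a tiny Wolfram Language sketch of that exploration idea (purely illustrative; the function and step sizes are made up): adding noise to the descent step lets the iterate hop out of a shallow local minimum that plain gradient descent stays trapped in, loosely analogous to the stochasticity of minibatch SGD.

    f[x_] := x^4 - 3 x^2 + x;   (* shallow local minimum near x ~ 1.1, deeper one near x ~ -1.3 *)
    plainStep[x_, eta_] := x - eta f'[x];
    noisyStep[x_, eta_, s_] := x - eta f'[x] + RandomVariate[NormalDistribution[0, s]];
    plain = NestList[plainStep[#, 0.05] &, 1.2, 300];       (* settles into the shallow well *)
    noisy = NestList[noisyStep[#, 0.05, 0.5] &, 1.2, 300];  (* noise lets it explore and often reach the deeper well *)
    {Last[plain], Median[Take[noisy, -50]]}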

POSTED BY: Angel Rojas