[WSG25] Daily Study Group: What is ChatGPT Doing... and Why Does It Work?

Posted 15 days ago

A one-week Wolfram U Daily Study Group covering Stephen Wolfram's best-selling book What is ChatGPT Doing... and Why Does It Work? begins on Monday, March 3, 2025.


Join a cohort of fellow learners to discover the principles that underlie ChatGPT and other LLMs. I have adapted the material from the aforementioned book into a series of four notebooks covering topics ranging from probabilistic text generation to neural nets, machine learning, embeddings, and even transformer models.

On the final day of the study group, I will go through a bunch of interesting examples using our new Notebook Assistant, which closely integrates powerful LLMs with the Wolfram documentation system and notebook interface.

Stephen Wolfram's book is aimed at anybody who is curious about these ideas, and this study group follows the book's lead. Therefore, no prior Wolfram Language, machine learning, or even coding experience is necessary to attend this study group.

Please feel free to post any questions, ideas and/or useful links in this thread between sessions—we always love to continue the discussion here on Community! If you'd like to read the discussion from the last time we ran this study group, you can find that here.

This is a one-week study group that will run from March 3 through March 7, 2025, at 11:00am Central US Time each day.

REGISTER HERE


POSTED BY: Arben Kalziqi
13 Replies
Posted 1 day ago

Arben, On Wednesday I had asked, "Is DeepSeek doing something fundamentally different, or did they find a way to do it more efficiently?" You responded, "I have to say I'm not sure on that front—I'll try to look into this tomorrow and post about it in the Community thread." Have you had a chance to look into this yet?

Also, I am not seeing a Digest_Day5 in the Daily Q&A Digests.

POSTED BY: Gerald Oberg

Hmm... annoying!! I absolutely dropped it in there yesterday. I'll make sure it works this time.

As far as DeepSeek goes, as best I can tell there are indeed some architectural improvements, but the overall idea and structure remain the same and most of the improvements are in the training process. This article from MIT Technology Review is insightful! https://www.technologyreview.com/2025/01/31/1110740/how-deepseek-ripped-up-the-ai-playbook-and-why-everyones-going-to-follow-it/

POSTED BY: Arben Kalziqi

Arben, Thanks 10^6 for a wonderful course! Question: Will you be posting an updated Q&A Digest along with an updated notebook for Day 5? We would appreciate it very much.

POSTED BY: Zbigniew Kabala

Ah, yes! The digest can be added asap, which at this hour probably means Monday morning :). As for an updated notebook—if you mean the transcript of the chats, I've added that already and it should be visible. Let me know if not! (If you mean the questions people asked in the Thursday survey, I'll try to review those more thoroughly and provide answers where reasonable.)

POSTED BY: Arben Kalziqi
Posted 4 days ago

During the last class I was wondering about two topics related to lines of research at Wolfram:

1) Is Wolfram Research developing its own algorithmic differentiation (AD) framework for programs, i.e. (parametrized) functions that depend on loops, conditionals, recursion, and (non)smooth elemental functions? I commented in class that this sits between numerical and symbolic differentiation. By the way, this is the standard in scientific machine learning and the backbone that makes optimizers like Adam (and its successors) computationally efficient. (A dual-number sketch of the idea follows below.)

2) If the answer to (1) is no, are you instead more focused on developing discrete techniques to emulate neural-network capabilities via rule arrays applied to cellular automata?
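For readers who want to see what (1) means concretely, here is a minimal Wolfram Language sketch of the forward-mode flavor of AD, done with dual numbers; the symbols dual and prog are illustrative assumptions, not a Wolfram Research API:

    (* dual[v, d] pairs a value v with its derivative d; these rules
       propagate derivatives through arithmetic and elemental functions *)
    dual /: dual[a_, b_] + dual[c_, d_] := dual[a + c, b + d];
    dual /: dual[a_, b_] + c_?NumericQ := dual[a + c, b];
    dual /: dual[a_, b_]*dual[c_, d_] := dual[a c, a d + b c];
    dual /: dual[a_, b_]*c_?NumericQ := dual[a c, b c];
    dual /: Sin[dual[a_, b_]] := dual[Sin[a], b Cos[a]];

    (* a small program with a loop and a conditional, differentiated
       simply by running it on a dual number *)
    prog[x_] := Module[{y = x}, Do[y = If[y[[1]] > 0, y*y, Sin[y]], {3}]; y];
    prog[dual[0.5, 1.]]

Since the loop amounts to x^8 for this input, the result dual[0.00390625, 0.0625] matches the exact derivative 8 (0.5)^7 = 0.0625.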

POSTED BY: Angel Rojas
Posted 4 days ago

Arben, Can you please explain your decision in your communications to abrogate the standard rule of capitalizing the first letter in a sentence? Is that just a strategy to minimize the time to post a response to a question/comment? Is it a standard style followed by certain tech people? Would you do the same in more formal settings, such as published articles? Are you in the vanguard of a movement to change the English language? No disrespect or criticism intended - I am just curious …

POSTED BY: Gerald Oberg

I don't! If you're referring to the Q&A digests, that's because those are just logs of the live chats from the sessions rather than email or otherwise "official"/formal communications. I think you'll find that the number of people who capitalized the first word of their messages back on AIM in 1997 was also quite small—though to your point, I do imagine that that number is shrinking over time as people have more access to instant back-and-forth communication. Language does change over time, and while I am largely a stickler for rules in a visceral sense I'm certainly not a prescriptivist. (If I were, I might point out that you use a hyphen rather than an em-dash in your last sentence, and add a novel space before the ellipsis :). Language—written and spoken—always changes, particularly when accelerated by the movement to new mediums and entry tools like keyboards, where it's easier to type a hyphen than an en- or em-dash.)

POSTED BY: Arben Kalziqi
Posted 4 days ago

In response to the survey question, "How will you use what you learned at this Daily Study Group?" I wrote: I will have a better appreciation and understanding of what is being discussed in the numerous newscasts, podcasts, articles, and interviews one encounters about ChatGPT and other LLMs. Something the course did not address: how could the technology overviewed lead to the catastrophic results people are predicting about AGI? One would not think that even monstrously large matrices could produce malicious consciousness. (I am sometimes reminded of a book I read as a teenager: Colossus, the 1966 science fiction novel by D. F. Jones, later filmed as Colossus: The Forbin Project, about supercomputers taking control of mankind.) The things Arben demoed are really impressive, but there is no "mind" (with potential intentions) producing them. I would like to hear Arben's thoughts about these issues. Even more, can you point us somewhere that Stephen Wolfram has discussed these issues (potential threats of AI or AGI)?

Here is a good expert discussion: https://drive.google.com/file/d/1JVPc3ObMP1L2a53T5LA1xxKXM6DAwEiC/view

Also: https://www.google.com/books/edition/The_Age_of_AI/Y2QwEAAAQBAJ?hl=en&gbpv=1

This could be considered a post-DeepSeek update to the reference above: https://www.csis.org/analysis/deepseek-huawei-export-controls-and-future-us-china-ai-race

POSTED BY: Gerald Oberg

Hi Gerald—I've taken some time to respond to this one because I'm not sure there's much I can say in my position here. I have personal opinions on this, but we'd kind of get into the weeds more than I think we ought to on Community. I have two main things that I think I can say:

  1. LLMs really are quite good at a lot of things, as we've seen. However, when some companies suggest that they're SO powerful that we need to be extremely worried about them "taking over" (etc.), that suggestion needs to be understood in its proper context—namely, ask "who benefits?" I think it's highly unlikely that most western governments will outright ban or even strongly regulate this sort of stuff, so when a company says this kind of thing, the only outcome we can reasonably expect is that it builds a public image of the product it's trying to sell as unbelievably powerful.

  2. At the very least, I think that in the short and medium term the larger danger to society posed by LLMs is the potential for actual, sourced knowledge to be "evaporated". LLMs can learn "the gist" of facts about the world, but they don't know them, generally don't retain them word for word, and can't reliably produce sources. As more and more people rely on "what the AI said" when forming their opinions, beliefs, and views about certain facts of the world, I think we run the risk of some bad outcomes. (An unexpected one I saw just recently—somebody who must have been rather young asked for help from other people on a forum, but their request was formatted exactly like a query to an LLM: "Explain it this and this way, and DON'T do this or I won't believe you." Scary!) You have to take this with a grain of salt because I'm an employee here at Wolfram telling you this, but this very issue is why computational language and LLM integration with tools like Wolfram Language is really important: it provides a way for users to be sure that what they're getting from the LLM is related in a concrete way to the real world. (The small sketch below illustrates the contrast.)
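To make that contrast concrete, here is a small hedged Wolfram Language sketch (assuming version 13.3 or later with an LLM service configured; the LLM output is illustrative and will vary by model):

    (* a free-form LLM answer: fluent, but not guaranteed to be sourced or current *)
    LLMSynthesize["What is the population of Lesotho?"]

    (* the same fact from curated computable data, tied to a concrete source *)
    CountryData["Lesotho", "Population"]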

I don't think that there's a 1:1 discussion from Stephen on this topic, but you're likely to find some posts of interest here: https://writings.stephenwolfram.com/category/artificial-intelligence/

POSTED BY: Arben Kalziqi

Thanks Arben, I had not seen this aspect of the problem. Very interesting.

POSTED BY: Laurence Bloxham

Gradient descent optimization procedures can get trapped at false (local) minima. How does ChatGPT avoid or correct false minima? Aside: incredible class, many thanks Arben.

POSTED BY: Laurence Bloxham

Thanks, Laurence! One of the big reasons that LLMs don't get stuck is that they live in such a high-dimensional space. Imagine you're on some 2D surface: if you're in a well, it's hard to get out because you only have two directions to move—but imagine you had tens of thousands of directions to try to get out. It's much easier!
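To make that picture concrete, here is a toy sketch of my own (not from the course notebooks): plain gradient descent on a one-dimensional double well simply falls into whichever well it starts in, with no way out.

    (* a double well: the minimum near x = -1.03 is deeper than the one near x = 0.96 *)
    f[x_] := (x^2 - 1)^2 + 0.3 x;

    (* 100 plain gradient-descent steps with learning rate 0.05, starting at x = 0.9 *)
    steps = NestList[# - 0.05 f'[#] &, 0.9, 100];
    Last[steps]  (* settles near 0.96, the shallow local minimum, not the global one *)

In the very high-dimensional loss landscapes of real neural nets there is almost always some direction along which the loss still decreases, so dead-end traps like this one are far rarer than the low-dimensional picture suggests.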

POSTED BY: Arben Kalziqi
Posted 5 days ago

Also, converging to a global minimizer of the loss function would not even be desirable: we do not want our model to overfit the outputs of the training set, as in the supervised learning scenario.

However, the "descent" direction that an optimizer computes should not be purely descent. Why? Because the more we explore the parameter space, the better placed we are to find suitable "contextual" fits to the output space. By exploring as many neighborhoods of local minimizers as possible (while still converging toward a minimum), we could improve training by starting our search from a local minimizer that worked for a certain context. The optimizer, however, would need to be smart enough to escape from that local minimizer again. (A small sketch of this idea follows.)
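As a toy continuation of the double-well sketch above (my own illustration, not code from the course): adding gradient noise that decays over time, loosely in the spirit of minibatch noise in stochastic gradient descent, lets the iterate hop out of the shallow well early on and then settle.

    f[x_] := (x^2 - 1)^2 + 0.3 x;
    SeedRandom[42];

    (* gradient step plus decaying Gaussian noise: early steps explore,
       late steps converge *)
    step[x_, k_] := x - 0.05 f'[x] + RandomVariate[NormalDistribution[0, 0.4/Sqrt[k]]];
    traj = FoldList[step, 0.9, Range[2000]];
    Last[traj]  (* typically ends near x = -1.03, the deeper of the two minima *)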

POSTED BY: Angel Rojas