I have not done much with neural networks in the past, but the Wolfram Language made it easy for me to jump right in. This page: https://resources.wolframcloud.com/NeuralNetRepository/resources/GPT-2-Transformer-Trained-on-WebText-Data made setting up a language model for text completion as simple as copy-paste! I've been having fun messing around with the model.
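For anyone finding this thread, the setup from that repository page boils down to something like the following. This is a minimal sketch from my reading of the page: the exact `"Size"` values and the assumption that the language-modeling variant returns the predicted next token as a string should be double-checked against the page itself.

```
(* load the language-modeling variant from the Neural Net Repository *)
lm = NetModel[{"GPT-2 Transformer Trained on WebText Data",
    "Task" -> "LanguageModeling", "Size" -> "345M"}];

(* naive completion loop: repeatedly append the predicted next token *)
complete[prompt_String, steps_Integer] :=
  Nest[StringJoin[#, lm[#]] &, prompt, steps]

complete["The Wolfram Language is", 20]
```

The repository page also shows fancier sampling (e.g. with temperature); the greedy loop above is just the shortest thing that produces a completion.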
But I have an issue now. As you can see from the page I posted, the built-in model lets you choose a "size": one can currently pick between 117M and 345M. As you can see from this page: https://github.com/openai/gpt-2 the researchers who made the model have released three sizes: 117M, 345M, and 774M. For their own reasons they have not released the full model yet. However, another group has apparently replicated their full 1.5B model anyway: https://blog.usejournal.com/opengpt-2-we-replicated-gpt-2-because-you-can-too-45e34e6d36dc
They link to the model files in their first paragraph, apparently.
I've messed around with the first three model sizes on a website that implemented them, and in my opinion there is quite a difference in performance as the model gets larger. I would love to try out the 1.5B model, which was not on the website I tried and is apparently not built into the Wolfram Language at this time. Even having access to the 774M model in the Wolfram Language would be nice.
Does anyone know how to go about importing the larger models from those websites into the Wolfram Language and then running them? I did try downloading some files from the website with the larger model and using the "Import" function, but didn't have much luck: it didn't seem to recognize the file format. Again, I feel like I'm pretty good with WL (Wolfram Language), but I'm not too proficient with the details of neural networks; I'm just plugging stuff in as suggested on that Wolfram page about GPT-2. Alternatively, if anyone happens to know whether Wolfram Research plans to implement the larger model(s) any time soon, that would be interesting to know as well.
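On the Import attempt: the files those projects distribute are TensorFlow checkpoints, which, as far as I know, Import doesn't read directly, so the format error is expected. Import does understand MXNet-format nets, so one possible path, assuming you first convert the checkpoint to MXNet with an external tool, would be something like the following (the file names here are hypothetical, and I haven't tried this conversion myself):

```
(* hypothetical file names; assumes a prior TensorFlow -> MXNet conversion *)
net = Import["gpt2-symbol.json", {"MXNet", "Net"},
   "ArrayPath" -> "gpt2-0000.params"]
```

Even then you would still need to wire up the byte-pair-encoding tokenizer yourself, which the built-in NetModel handles for you, so waiting for an official import may well be the easier route.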
Thank you for your suggestions.
When we first published the models, only 117M and 345M had been released, so I imported just those two. I will make sure to add the remaining models and will put it in as a suggestion.
Great to hear! Thanks! Nice that you guys are keeping on top of things; the new model hasn't been out that long, and I'm sure there's lots of other stuff to add too, so truly, thanks :-)
Yes, while I was aware of the newer/bigger model releases, we were focusing on other models and on improving or introducing functionality that would allow more models to be imported. However, I will make sure that the bigger models are added to the to-do list or imported.
Hi, apologies if I'm being impatient. I hear you that you guys have other priorities for neural networks, on top of all the other things Wolfram Research is doing; I'm just wondering if there's any update, thanks :-/ I'm working on a project that uses GPT-2 and getting some interesting results. I'll post my code when I'm finished, but ofc I just can't wait to plug a more powerful language generator into the code X-D
If anyone has tips for importing the full GPT-2 model another way, that would also be much appreciated. Otherwise I could look into finding a way to link it in myself, I guess, but I'm lazy lol. That's what the Wolfram Language is great for: plugging in stuff that just works X-D Thanks, you guys.
Yes, these models have been converted. I will try to publish them as soon as possible.
Thanks for inquiring again (and being impatient in this case).
I haven't seen the larger models posted yet. Any chance those can get put up?