I have not done much with neural networks in the past, but the Wolfram Language made it easy for me to jump right in. This page: https://resources.wolframcloud.com/NeuralNetRepository/resources/GPT-2-Transformer-Trained-on-WebText-Data made setting up a language model for text completion as simple as copy-paste! I've been having fun messing around with the model.
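For reference, here is roughly what I'm doing, in case it matters for the answer. This is just a sketch from memory: the exact NetModel name and parameter values come from that repository page, and I may be misremembering the calling convention for generating text, so please check the page itself.

```
(* load the pretrained GPT-2 language model from the
   Wolfram Neural Net Repository *)
lm = NetModel[{"GPT-2 Transformer Trained on WebText Data",
    "Task" -> "LanguageModeling", "Size" -> "117M"}];

(* repeatedly append the model's predicted next token to the prompt *)
generate[prompt_String, n_Integer] :=
  Nest[Function[text, StringJoin[text, lm[text]]], prompt, n];

generate["The weather today is", 20]
```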
But now I have an issue. As you can see from the page I linked, the built-in model lets you choose a "Size": currently either 117M or 345M. As you can see from this page: https://github.com/openai/gpt-2 the researchers who made the model have released three sizes of it: 117M, 345M, and 774M. For their own reasons they have not yet released the full model. However, another group has apparently replicated the full 1.5B model anyway: https://blog.usejournal.com/opengpt-2-we-replicated-gpt-2-because-you-can-too-45e34e6d36dc and they link to their model files in the first paragraph of that post.
I've messed around with the first three model sizes on a website that implemented them, and in my opinion the quality of the completions improves noticeably as the model gets larger. I would love to try the 1.5B model, which was not on the website I tried and apparently is not built into the Wolfram Language at this time. Even having access to the 774M model in the Wolfram Language would be nice.
Does anyone know how to go about importing the larger models from those sites into the Wolfram Language and then running them? I did try downloading some files from the page with the larger model and using the Import function, but I didn't have much luck: Import didn't seem to recognize the file format. Again, I feel like I'm pretty good with WL (the Wolfram Language), but I'm not very proficient with the details of neural networks; I'm just plugging things in as suggested on that Wolfram page about GPT-2. Alternatively, if anyone knows whether Wolfram Research plans to add the larger model(s) any time soon, that would be interesting to hear as well.
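In case it helps anyone diagnose the problem: the files I downloaded look like TensorFlow checkpoint files, and a naive Import along these lines is what failed for me (the file name below is hypothetical, just to illustrate the attempt; as far as I can tell, Import simply has no reader registered for this format):

```
(* fails with an unrecognized-format message: TensorFlow
   checkpoint files are not among Import's supported formats *)
Import["model.ckpt.data-00000-of-00001"]
```

So I suspect the weights would need to be converted into something WL understands before they could be loaded into a net, but I don't know how to do that conversion.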
Thanks!