For anybody who is interested in extracting the tokens for a given input from GPT-2 (or any other net model with a tokenizing encoder that you can load into a notebook), you just need to do this:
tokenizer = NetExtract[NetModel["GPT2 Transformer Trained on WebText Data"], "Input"]
Then you can run tokenizer["whatever text here"] and it will produce the token ids. For example:
In[591]:= tokenizer["Hello there! My name is Arben. Hello hello hello hello hello goodbye goodbye."]
Out[591]= {15241, 357, 50257, 1756, 1183, 63, 688, 11467, 50244, 18180, 23493, 23493, 23493, 23493, 24574, 24574, 50244}
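If you also want to see which substrings the ids correspond to, you may be able to pull the encoder's vocabulary out with NetExtract as well and index into it. This is only a sketch: the "Tokens" property name is an assumption and may differ (or not exist) depending on your Wolfram Language version and the encoder type:

(* assumption: the encoder exposes its vocabulary under a "Tokens" property *)
vocab = NetExtract[tokenizer, "Tokens"];
(* index the vocabulary by the ids the encoder produced *)
vocab[[tokenizer["Hello there!"]]]

If that property name does not work for you, evaluating Information[tokenizer] (or just tokenizer on its own) should show what the encoder actually exposes.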