It seems that the GPT-2 NetModel does not encode input words correctly. Trying the official Wolfram examples for word generation in the documentation mostly gives me random words.
After spending some time on this, I believe the issue is that the encoder for this model does not encode words correctly. For example, I looked at the encoder vocabulary and confirmed that the word "Hitman" is in there. I then gave the encoder the word "Hitman". Interestingly, token indices are generated for "Hit" and "man" separately:
lm = NetModel[{"GPT2 Transformer Trained on WebText Data", "Task" -> "LanguageModeling", "Size" -> "774M"}]
NetExtract[lm, "Input"]["Hitman"]
Output: {17634, 550}
According to the decoder, these two indices correspond to the tokens "Hit" and "man":
NetExtract[lm, {"Output", "Labels"}][[{17634, 550}]]
Output: {"Hit", "man"}
Try other words and you will see the same kind of behavior - for example, "Gorgeous" splits into {"G", "orge", "ous"}.
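For reference, this is how I checked the split for "Gorgeous" (a sketch using the same extraction calls as above; the actual indices returned will depend on the model's vocabulary):

(* Pull out the encoder and the decoder's label list from the net *)
enc = NetExtract[lm, "Input"];
labels = NetExtract[lm, {"Output", "Labels"}];
(* Encode the word, then look up each index's subword token *)
indices = enc["Gorgeous"];
labels[[indices]]  (* gives the subword pieces, e.g. {"G", "orge", "ous"} *)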
I am relatively new to Mathematica... Am I doing something wrong, or is there really something wrong with the encoder?
-Ethan