

GPT-2 NetModel encoder issue?

Posted 2 months ago
0 Replies
0 Total Likes

It seems that the GPT-2 net model does not encode input words correctly. Running the official Wolfram examples for text generation from the documentation mostly gives me random words.

I spent some time on this, and I now believe the issue is that the encoder for this model does not encode words correctly. For example, I looked at the encoder vocabulary and confirmed that the word "Hitman" is in it. I then gave the encoder the word "Hitman". Interestingly, token indices are generated for "Hit" and "man" separately.

lm = NetModel[{"GPT2 Transformer Trained on WebText Data", "Task" -> "LanguageModeling", "Size" -> "774M"}]

NetExtract[lm, "Input"]["Hitman"]

Output: {17634, 550}

According to the decoder, these two indices correspond to the tokens "Hit" and "man":

NetExtract[lm, {"Output", "Labels"}][[{17634, 550}]]

Output: {"Hit", "man"}

Try other words and you will see the same type of behavior; for example, "Gorgeous" splits into {"G", "orge", "ous"}.
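For what it's worth, the splitting I am seeing looks like subword tokenization: the encoder breaks an out-of-merge word into smaller pieces that are in the vocabulary. The sketch below is a toy greedy longest-prefix-match tokenizer in Python, just to illustrate how such splits arise; it is not GPT-2's actual byte-pair encoding, which applies learned merge ranks rather than longest-match, and the tiny `vocab` set is made up for the example.

```python
def tokenize(word, vocab):
    """Toy greedy longest-prefix-match subword tokenizer.

    Repeatedly takes the longest vocabulary entry that prefixes the
    remaining text; falls back to single characters if nothing matches.
    This only illustrates subword splitting, not GPT-2's real BPE.
    """
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # no vocabulary piece matches: emit one character and move on
            tokens.append(word[i])
            i += 1
    return tokens

# hypothetical mini-vocabulary containing the observed pieces
vocab = {"Hit", "man", "G", "orge", "ous"}
print(tokenize("Hitman", vocab))    # ['Hit', 'man']
print(tokenize("Gorgeous", vocab))  # ['G', 'orge', 'ous']
```

Note that with this toy scheme, if "Hitman" itself were in the vocabulary it would come out as a single token, which is why the behavior of the real encoder puzzles me.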

I am relatively new to Mathematica. Am I doing something wrong, or is there really something wrong with the encoder?

