GPT-2 NetModel encoder issue?


It seems that the GPT-2 net model does not encode input words correctly. Running the official Wolfram examples for word generation from the documentation mostly gives me random words.

I spent some time on this, and I now believe the issue is that the encoder for this model does not encode words correctly. For example, I looked at the encoder vocabulary and made sure that the word "Hitman" is in there. I then gave the encoder the word "Hitman". Interestingly, the encoder produces separate token indices for "Hit" and "man".

lm = NetModel[{"GPT2 Transformer Trained on WebText Data", "Task" -> "LanguageModeling", "Size" -> "774M"}]

NetExtract[lm, "Input"]["Hitman"]

Output: {17634, 550}

According to the decoder, these two indices correspond to the tokens "Hit" and "man":

NetExtract[lm, {"Output", "Labels"}][[{17634, 550}]]

Output: {"Hit", "man"}

Try other words and you will see the same kind of behavior; for example, "Gorgeous" splits into {"G", "orge", "ous"}.
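
Here is the same check applied to "Gorgeous" (a minimal sketch along the lines of the code above; idx is just a throwaway variable, and I am only showing the decoded tokens rather than the raw indices):

idx = NetExtract[lm, "Input"]["Gorgeous"]; (* encode the word into token indices *)

NetExtract[lm, {"Output", "Labels"}][[idx]] (* map the indices back to tokens *)

Output: {"G", "orge", "ous"}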

I am relatively new to Mathematica... Am I doing something wrong, or is there really something wrong with the encoder?

-Ethan
