Hi Sam, you ask a good question.
That's the surprising part: the only WL "training data" given to the model is the gray text shown in each example above. This is sometimes called "few-shot learning": a model adapts to a new task after seeing only a few examples.
That said, OpenAI trained GPT-3 on a huge corpus of text taken from the web, which undoubtedly would have contained examples of WL code, so it's possible that it's drawing on some of that prior knowledge.
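
To make the few-shot idea concrete, here is a rough sketch in Python of how a prompt like that might be assembled. The example pairs, the `English:`/`WL:` labels, and the function name `build_few_shot_prompt` are all made up for illustration; they are not the actual prompts from the post, and no model is called here.

```python
# Hypothetical (description, Wolfram Language) pairs standing in for the gray
# example text; these are illustrative, not the examples from the post.
EXAMPLES = [
    ("Add up the numbers from 1 to 10", "Total[Range[10]]"),
    ("Reverse the list {1, 2, 3}", "Reverse[{1, 2, 3}]"),
]


def build_few_shot_prompt(examples, query):
    """Join the worked examples and leave the final answer slot empty."""
    blocks = [f"English: {text}\nWL: {code}" for text, code in examples]
    # The last block has no code: the model is expected to continue from here.
    blocks.append(f"English: {query}\nWL:")
    return "\n\n".join(blocks)


prompt = build_few_shot_prompt(EXAMPLES, "Sort the list {3, 1, 2}")
print(prompt)
```

The whole "training set" is just the text in `EXAMPLES`; the model's weights are never updated, it simply continues the pattern after the final `WL:`.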