Hi Sander,
I didn't envisage that the program generator would create a question, as such. It would simply generate a single, random line of WL code that produces "interesting" output of some kind.
I envisage the generator function taking perhaps two parameters. The first would specify the maximum number of characters in the generated WL code (otherwise we could see a "one-liner" comprising 100,000 characters of code!). The second would specify the form of output that the generated code would produce, such as a graphic, a Manipulate, a 3D plot, or a formula. So it might look like this:
genCode = WLGenerator[250, "Formula"]
which might produce an expression, say:
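genCode = "Exp[I*Pi]", which could be evaluated (for example with ToExpression) to produce the result -1. Note the capital I: that is WL's imaginary unit, whereas a lowercase i would be treated as an undefined symbol.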
Don't ask me how WLGenerator decides that the above expression is "interesting" - that's part of the challenge!
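To make the proposed interface concrete, here is a minimal sketch of what a stub might look like. WLGenerator is not a built-in function, and the names pool, maxChars and form, along with the hand-written candidates, are my own assumptions for illustration. A real generator would build code at random rather than pick from a fixed list, and this stub makes no attempt to judge "interestingness".

```wl
(* Hypothetical stub: WLGenerator is not a built-in function. *)
(* The pool is hand-written for illustration; a real generator would build code at random. *)
pool = <|
  "Formula" -> {"Exp[I*Pi]", "Sum[1/n^2, {n, 1, Infinity}]"},
  "3D plot" -> {"Plot3D[Sin[x y], {x, 0, 3}, {y, 0, 3}]"}
|>;

(* Pick a random candidate of the requested form that fits within maxChars characters. *)
WLGenerator[maxChars_Integer, form_String] :=
  RandomChoice[Select[pool[form], StringLength[#] <= maxChars &]]

genCode = WLGenerator[250, "Formula"]

(* Evaluate the generated string as WL code. *)
ToExpression[genCode]
```

The character limit is enforced with StringLength before a candidate is chosen, so a very small limit leaves nothing to choose from; a fuller version would need to handle that case.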
I suppose one could create a WLGenerator function, have it produce a bunch of expressions, tag each of them as interesting or uninteresting, and then use that data to build an ML model to try to learn which is which.
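As a rough sketch of that tagging idea, one could feed hand-labelled code strings to the built-in Classify function. The labels below are made up purely for illustration, the names tagged and interestingQ are hypothetical, and six examples are far too few to learn anything meaningful; this only shows the shape of the workflow.

```wl
(* Made-up labels for illustration only; real training data would need many more examples. *)
tagged = {
  "Exp[I*Pi]" -> "interesting",
  "Sum[1/n^2, {n, 1, Infinity}]" -> "interesting",
  "Plot3D[Sin[x y], {x, 0, 3}, {y, 0, 3}]" -> "interesting",
  "1 + 1" -> "uninteresting",
  "x" -> "uninteresting",
  "Range[3]" -> "uninteresting"
};

(* Classify infers the input type automatically and returns a ClassifierFunction. *)
interestingQ = Classify[tagged];

(* Ask the trained classifier about a code string it has not seen. *)
interestingQ["Integrate[Exp[-x^2], {x, -Infinity, Infinity}]"]
```

Training on the raw code strings is the simplest choice; it might work better to classify features of the evaluated output instead, but that is a separate design question.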