Super interesting research, and very good results given the scope and difficulty of the task. Especially love the idea on just having a neural network overfit the entire world :P Given that your end network architecture was tuned automatically, why do you think a "mostly linear" neural network ended up working best for this task?