Maybe the theorem fails on a set of measure 0? I have three nested functions, each quadratic in its inputs. A standard multi-layer NN (linear layers plus logistic activations) trained with the Adam optimizer can't seem to minimize the loss, or else it gets stuck in a local minimum. I've tried varying the number of layers and their width, but to no avail. What's going on?
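
To make the setup concrete, here is a minimal sketch of a target built from three nested quadratics. The coefficients, the scalar input, and the input range [-3, 3] are all assumptions for illustration; the actual functions are not given above. The sketch only builds the target and inspects its output range, since a composition of three quadratics is a degree-8 polynomial and its values can span several orders of magnitude, which may be worth checking before blaming the network or the optimizer.

```python
import statistics

# Hypothetical quadratics: the real coefficients are not stated in the post.
def f1(x):
    return x * x + 1.0

def f2(u):
    return 2.0 * u * u - u

def f3(v):
    return 0.5 * v * v + v

def target(x):
    # Three nested quadratics give a degree-8 polynomial in x.
    return f3(f2(f1(x)))

# Assumed input range [-3, 3], sampled on a uniform grid of 201 points.
xs = [-3.0 + 6.0 * i / 200 for i in range(201)]
ys = [target(x) for x in xs]

# Raw output range of the target on this grid.
y_min, y_max = min(ys), max(ys)

# Standardize the targets (zero mean, unit variance), a common
# preprocessing step before fitting a regression network.
mean = statistics.fmean(ys)
std = statistics.pstdev(ys)
ys_scaled = [(y - mean) / std for y in ys]

print(f"raw target range: [{y_min}, {y_max}]")
print(f"scaled target range: [{min(ys_scaled):.3f}, {max(ys_scaled):.3f}]")
```

With these made-up coefficients the raw targets run from 1.5 to 18240 on a small input interval, so an unscaled squared-error loss is dominated by the largest values. Whether that is the cause here depends on the actual functions and on how the inputs and outputs are scaled.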