This was a great project. Since the general impression is that typical behaviors already show up in simple rules, I was wondering how the machine learning compares to simple statistics. For example, a quick test for halting might be to run just a few steps and check whether the expression has already halted or is growing quickly. That is of course not a perfect test, but how does the machine learning compare to this sort of crude test?
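
To make the kind of crude test I have in mind concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the toy string-rewriting system stands in for whatever expressions the project actually evaluates, and the names `crude_halting_guess` and `make_step` and the thresholds `max_steps` and `growth_factor` are hypothetical, not taken from the project.

```python
from typing import Callable, List, Optional, Tuple


def crude_halting_guess(
    state: str,
    step: Callable[[str], Optional[str]],
    size: Callable[[str], int] = len,
    max_steps: int = 20,          # assumed budget: how many steps we bother to run
    growth_factor: float = 4.0,   # assumed threshold for "growing quickly"
) -> str:
    """Guess halting behavior by running only a few steps.

    Returns "halts" if evaluation stops within the budget,
    "probably runs forever" if the state grows past growth_factor
    times its initial size, and "undecided" otherwise.
    """
    initial = max(size(state), 1)
    for _ in range(max_steps):
        nxt = step(state)
        if nxt is None:           # no rule applies: the system has halted
            return "halts"
        state = nxt
        if size(state) > growth_factor * initial:
            return "probably runs forever"
    return "undecided"


def make_step(rules: List[Tuple[str, str]]) -> Callable[[str], Optional[str]]:
    """Build a one-step rewriter for a toy string-rewriting system.

    The first rule (in list order) whose left side occurs in the string
    is applied at its leftmost occurrence. Returns None when no rule
    applies, which is treated as halting.
    """
    def step(s: str) -> Optional[str]:
        for lhs, rhs in rules:
            i = s.find(lhs)
            if i != -1:
                return s[:i] + rhs + s[i + len(lhs):]
        return None
    return step


if __name__ == "__main__":
    print(crude_halting_guess("AAA", make_step([("A", "BB")])))
    print(crude_halting_guess("A", make_step([("A", "AA")])))
    print(crude_halting_guess("AB", make_step([("AB", "BA"), ("BA", "AB")])))
```

The third case shows where a baseline like this gives up: the string oscillates without halting or growing, so the test can only answer "undecided". Those are presumably the cases where a learned model would have to earn its keep.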