I just want to clarify a few things about this post:
- The timing results are completely misleading: for a real-world dataset, both TensorFlow and the WL neural net framework should be faster than your NumPy code, and the two frameworks should have very similar speeds on both GPU and CPU. Only in the special case of a very small dataset should you see timing differences like these. There are a number of reasons:
NetTrain spends time plotting loss curves, and both NetTrain and TensorFlow run a compilation step that tries to find runtime optimizations by rewriting the computation graph of the net. All of this is pure overhead for tiny examples, but it can produce major speedups for large nets, which is the case people actually care about optimizing. Some of that fixed overhead can be stripped out when timing tiny examples, as sketched below.
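As a minimal sketch of a fairer micro-benchmark: NetTrain's live loss-curve panel can be switched off with its TrainingProgressReporting option. The toy net, data, and round count here are hypothetical stand-ins, not taken from the post:

```
(* Sketch: time a tiny training run without the progress panel.
   Net shape, data, and round count are made-up examples. *)
net = NetChain[{LinearLayer[32], Ramp, LinearLayer[1]}, "Input" -> 1];
data = Table[{x} -> {2 x + 1}, {x, RandomReal[1, 100]}];
AbsoluteTiming[
 NetTrain[net, data, MaxTrainingRounds -> 5,
  TrainingProgressReporting -> None]  (* no loss-curve plotting *)
]
```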
- Lines of code versus TensorFlow: TensorFlow is designed to be a very low-level framework that gives maximal flexibility (it can do things the WL neural net framework can't, precisely because of this flexibility), whilst the WL neural net framework is designed to be as high-level and as simple to use as possible (it itself uses a low-level framework, MXNet, as its backend). A much more interesting comparison is against Keras (a high-level framework built on top of TensorFlow), which you mention. The sketch after this bullet shows what the high-level style looks like.
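For concreteness, a whole classifier in the WL framework is a few declarative lines. The layer sizes and class labels here are illustrative assumptions, not from the post:

```
(* Sketch: a small MLP classifier; sizes and labels are illustrative. *)
net = NetChain[
  {LinearLayer[64], Ramp, LinearLayer[2], SoftmaxLayer[]},
  "Input" -> 4,
  "Output" -> NetDecoder[{"Class", {"a", "b"}}]];
(* trained = NetTrain[net, examples] would then train it in one call *)
```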
- The comparison between NumPy and WL is strange: the analogue of using NumPy is writing a neural net from scratch in WL using PackedArrays. Why not compare against that instead? A sketch of what that looks like is below.
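To make the suggestion concrete, here is a minimal from-scratch sketch: a one-hidden-layer regression net with manual backprop, operating entirely on packed arrays. All sizes, the toy target function, and the learning rate are made-up assumptions:

```
(* Sketch: from-scratch one-hidden-layer regression net on packed
   arrays; sizes, data, and learning rate are made-up examples. *)
n = 100; din = 2; dh = 16;
x = RandomReal[{-1, 1}, {n, din}];  (* inputs: packed n x din array *)
y = Total /@ x^2;                   (* toy regression targets *)
w1 = RandomReal[{-0.1, 0.1}, {din, dh}];
w2 = RandomReal[{-0.1, 0.1}, dh];
lr = 0.01;
Do[
  h = Tanh[x.w1];                      (* hidden activations *)
  e = h.w2 - y;                        (* prediction errors *)
  gw2 = Transpose[h].e/n;              (* gradient wrt output weights *)
  gh = Outer[Times, e, w2] (1 - h^2);  (* backprop through Tanh *)
  gw1 = Transpose[x].gh/n;             (* gradient wrt hidden weights *)
  w1 -= lr gw1; w2 -= lr gw2,
  {500}];
Mean[(Tanh[x.w1].w2 - y)^2]            (* final mean squared error *)
```

Everything here stays a packed array throughout, so this is a much closer analogue of the NumPy implementation than NetTrain is.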