Nice work!
Just a small comment on this part:
Second limitation : During training Mathematica is not keeping track
of the statistics of the intermediate values (input and output of
layers).
[...]
So, to get those statistics I am just applying each layer of the
trained network one after another and keeping track of the input /
output
This can be done more easily with the NetPort[All] evaluation
property, which returns the output of every layer in a single call:
NetChain[{Sin, Cos, Log}][{1, 2, 3}, NetPort[All]]
<|NetPort[{1, "Output"}] -> {0.841471, 0.909297, 0.14112},
NetPort[{2, "Output"}] -> {0.666367, 0.6143, 0.990059},
NetPort[{3, "Output"}] -> {-0.405915, -0.487271, -0.00999066}|>
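
If the goal is per-layer statistics, the resulting association can be mapped over directly. Here is a minimal sketch, assuming that mean and standard deviation are the statistics of interest; the names net, outputs, and stats are only placeholders of mine, not anything from the original post:

```wolfram
(* same toy chain as above; net, outputs and stats are illustrative names *)
net = NetChain[{Sin, Cos, Log}];

(* association of every layer's output, keyed by NetPort[{i, "Output"}] *)
outputs = net[{1, 2, 3}, NetPort[All]];

(* per-layer summary; Flatten so the same code also handles higher-rank outputs *)
stats = Map[
  <|"Mean" -> Mean[Flatten[#]],
    "StandardDeviation" -> StandardDeviation[Flatten[#]]|> &,
  outputs
]
```

Since the association only contains outputs, the input of layer i is simply the output of layer i - 1 (and the network input itself for the first layer), so both sides of each layer are covered.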