I recently noticed that Mathematica uses MXNet as the backend for its neural network functionality. It seems to have been integrated in 2015. The blog post listing the rationale is here.
MXNet does not seem to have gained enough momentum to become popular; you can see this in the framework trends on "Papers with Code". A lack of popularity means the framework may be slow to gain new features.
For instance, consider this question from Joshua Schrier about using neural networks to fit ODEs. It requires the underlying framework to support higher-order gradients. There's an open issue to add this support to MXNet, but progress has stalled. Meanwhile, PyTorch, TensorFlow, and JAX all support this feature.
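To make the higher-order gradient requirement concrete, here is a minimal JAX sketch. The ODE y'(x) = -y(x), the tiny network, and the names `net`, `dnet_dx`, and `residual_loss` are my own illustrative assumptions, not taken from the linked question. The point is that the loss contains the derivative of the network with respect to its input, so training needs the gradient of a gradient.

```python
import jax
import jax.numpy as jnp

def net(params, x):
    # Tiny one-hidden-layer network mapping scalar x to scalar y
    # (hypothetical architecture, chosen only for illustration).
    w1, b1, w2, b2 = params
    h = jnp.tanh(w1 * x + b1)
    return jnp.dot(w2, h) + b2

# First derivative of the network output with respect to its input x.
dnet_dx = jax.grad(net, argnums=1)

def residual_loss(params, xs):
    # Mean squared residual of the assumed ODE y'(x) = -y(x) at points xs.
    y = jax.vmap(net, in_axes=(None, 0))(params, xs)
    dy = jax.vmap(dnet_dx, in_axes=(None, 0))(params, xs)
    return jnp.mean((dy + y) ** 2)

# Gradient of the loss with respect to the parameters. This differentiates
# through dnet_dx, so it is a second-order (mixed) derivative.
loss_grad = jax.grad(residual_loss)

k1, k2 = jax.random.split(jax.random.PRNGKey(0))
params = (
    jax.random.normal(k1, (8,)),  # w1
    jnp.zeros(8),                 # b1
    jax.random.normal(k2, (8,)),  # w2
    jnp.array(0.0),               # b2
)
xs = jnp.linspace(0.0, 1.0, 16)
grads = loss_grad(params, xs)
```

Here `grads` has the same structure as `params`, so it can be fed to any gradient-based optimizer. A framework without higher-order gradient support cannot evaluate `loss_grad`, because it would have to differentiate through `dnet_dx`.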