That post turns the task into a supervised learning problem by building a dataset of finalized renders of the PDE solution generated by NDSolve and then training a network on that dataset.
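To make sure I am reading that approach correctly, here is a rough sketch of the kind of pipeline I picture: solve a parametrized 1D heat equation many times with NDSolve, sample each solution on a fixed grid (the post may use rendered plots instead, but the structure would be the same), and fit a small net with NetTrain. The equation, parameter range, grid, and layer sizes are placeholders I made up for illustration, not anything from that post.

    (* Solve a 1D heat equation whose initial condition depends on one coefficient c *)
    solveHeat[c_] := NDSolveValue[
       {D[u[t, x], t] == 0.1 D[u[t, x], x, x],
        u[0, x] == Sin[Pi x] + c Sin[2 Pi x],
        u[t, 0] == 0, u[t, 1] == 0},
       u, {t, 0, 1}, {x, 0, 1}];

    (* Sample each solution at t = 0.5 on a fixed spatial grid so every
       training example has the same shape *)
    grid = Range[0., 1., 0.05];
    makeExample[c_] := With[{sol = solveHeat[c]},
       {c} -> Table[sol[0.5, xi], {xi, grid}]];
    dataset = makeExample /@ RandomReal[{-1, 1}, 500];

    (* Small fully connected net mapping the parameter to the sampled solution *)
    net = NetChain[{64, Ramp, 64, Ramp, Length[grid]}, "Input" -> 1];
    trained = NetTrain[net, dataset];

If that is roughly the idea, the cost of building the dataset is dominated by the repeated NDSolve calls, which is what my question below is about.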
I am not sure how that method compares to something like the techniques described here (paper1) or here (paper2), but it could be an acceptable approach. The author of that post reports roughly 8% error, though that figure may simply reflect the hyperparameter choices for the model.
Is NDSolve fast enough at solving high-dimensional differential equations to build a training dataset of thousands of solutions? If each NDSolve run is slow enough, it would probably be faster to train a physics-informed network (PINN) directly, using the PDE residual and its gradients as the training signal (as described in the papers linked above), rather than minimizing a distance to the 'correct' NDSolve solution.
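For context on what I mean by using the gradients directly, here is the kind of residual-driven setup I have in mind, written with an explicit tiny ansatz and FindMinimum just to keep it self-contained. The equation, network size, and collocation counts are again made up, and a real PINN would use an autodiff framework and a proper optimizer rather than this toy version.

    (* Tiny explicit "network": weights a[i], b[i], w[i], d[i] to be fit *)
    n = 6;
    params = Flatten[{Array[a, n], Array[b, n], Array[w, n], Array[d, n]}];
    uNet[t_, x_] := Sum[a[i] Tanh[b[i] t + w[i] x + d[i]], {i, n}];

    (* PDE residual for u_t == 0.1 u_xx, kept symbolic so the optimizer
       can use its gradient with respect to the weights *)
    residual[t_, x_] = D[uNet[t, x], t] - 0.1 D[uNet[t, x], {x, 2}];

    pdePts = RandomReal[{0, 1}, {100, 2}]; (* interior (t, x) collocation points *)
    xPts = RandomReal[{0, 1}, 25];         (* initial-condition points *)
    tPts = RandomReal[{0, 1}, 25];         (* boundary-condition points *)

    (* Loss = squared residual + squared initial/boundary mismatch; no NDSolve data *)
    loss = Total[(residual @@@ pdePts)^2] +
       Total[((uNet[0, #] - Sin[Pi #]) & /@ xPts)^2] +
       Total[(uNet[#, 0] & /@ tPts)^2] + Total[(uNet[#, 1] & /@ tPts)^2];

    fit = FindMinimum[loss, Transpose[{params, RandomReal[{-1, 1}, Length[params]]}]];
    uTrained[t_, x_] = uNet[t, x] /. Last[fit];

The point of the sketch is only that the loss is built entirely from the residual and the initial/boundary terms, so no NDSolve runs are needed at all.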
I am very new to this area, so any insight regarding the tradeoffs would be greatly appreciated.