With the DeleteCachePNG tip I was able to get it working. If you want to play with it in my silly notebook above, you can add the following definitions and change the NetTrain call to match:
(* Load one input/target pair from disk, then clear the internal PNG cache *)
LoadTrainingPair[idx_] := Module[{out},
  out = <|
    "Input" -> First@Image`ImportExportDump`ImageReadPNG[frogImages[[idx]]],
    "Output" -> First@Image`ImportExportDump`ImageReadPNG[gtImages[[idx]]]
  |>;
  Image`ImportExportDump`DeleteCachePNG[];
  out
]

(* NetTrain calls the generator with an association that includes "BatchSize" *)
trainingGenerator = Function[
  ParallelMap[LoadTrainingPair,
    RandomSample[Range[Length[frogImages]], #BatchSize]]
]

trained = NetTrain[net, trainingGenerator, BatchSize -> 32,
  MaxTrainingRounds -> 100, TargetDevice -> "GPU"]
That (and a slightly different variation of it for my work setup) allowed me to train without memory issues while still reaching 130-180 samples/sec. The real version was tested on Ubuntu 20.04 + Mathematica 12.3.1, and the silly notebook above was tested on Windows 11 + Mathematica 13.
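If you want to check what a generator like this hands to NetTrain before starting a long run, you can call it by hand with a small batch size. The snippet below is only a sketch: fakeLoadPair and fakeGenerator are hypothetical stand-ins I made up so it runs without the frog images, returning random 8x8 images instead of reading PNGs, and it uses Map rather than ParallelMap to avoid launching subkernels.

```wolfram
(* Hypothetical stand-in for LoadTrainingPair: random images instead of PNG reads *)
fakeLoadPair[idx_] := <|
  "Input" -> RandomImage[1, {8, 8}],
  "Output" -> RandomImage[1, {8, 8}]
|>;

(* Same shape as trainingGenerator, but with Map and a made-up dataset size of 100 *)
fakeGenerator = Function[
  Map[fakeLoadPair, RandomSample[Range[100], #BatchSize]]
];

(* Call it the way NetTrain would, with an association containing "BatchSize" *)
batch = fakeGenerator[<|"BatchSize" -> 4|>];

Length[batch]       (* 4: one association per sample *)
Keys[First[batch]]  (* {"Input", "Output"} *)
```

The same call, trainingGenerator[<|"BatchSize" -> 4|>], should work on the real generator and is a cheap way to confirm that each element is an association with the "Input" and "Output" keys the net expects.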