The images are large enough that #1 isn't an option (out-of-core training is genuinely needed). I haven't tried the .wxf route yet because I was hoping not to have to transcode tens of thousands of files.
I tried MongoLink, since that was the only officially documented out-of-core solution I hadn't tried yet. It does work, but the performance is terrible because of Mongo's 16 MB per-result-set limit (GPU throughput was something like 12 images/second). I should have known that already, but I think it would be worth noting in the documentation that the Mongo solution is not appropriate for image-based networks.
I wrote my own generator to read batches of PNGs, something like this:
generator = Function[ParallelMap[
   First @* Image`ImportExportDump`ImageReadPNG, (* ImageReadPNG returns a list of images *)
   RandomSample[filenames, #BatchSize]]];
This worked and achieved ~180 images/second, but it also leaked memory.
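For anyone who wants to try the same thing, here is a minimal sketch of how a generator like that gets wired into NetTrain, using the documented generator-function form where NetTrain passes an association containing "BatchSize" to the function on every batch. The file list, labeling rule, directory, and net below are hypothetical placeholders, and I've used plain Import here since the undocumented PNG reader is what seemed to leak:

(* Minimal sketch; "files", "labelOf", and "net" are hypothetical placeholders. *)
files = FileNames["*.png", "path/to/images"];   (* hypothetical image directory *)
labelOf = FileBaseName;                          (* hypothetical: label taken from the file name *)

pngGenerator = Function[
   With[{batch = RandomSample[files, #BatchSize]},
    <|"Input" -> Map[Import, batch],             (* Import is slower than ImageReadPNG but doesn't seem to leak *)
      "Output" -> Map[labelOf, batch]|>]];

(* NetTrain calls the generator once per batch and passes #BatchSize in. *)
trained = NetTrain[net, {pngGenerator, "RoundLength" -> Length[files]},
   BatchSize -> 64, MaxTrainingRounds -> 10];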
I then spent a while longer trying to get the unofficial HDF5 support working, but I haven't managed it yet. I created an HDF5 file with "Input" and "Output" datasets, each n x w x h, but the HDF5 importer says the format of the file is wrong. I think the error message is telling me it wants each dataset to be n x (w*h), but I'm not sure that's true, and to even attempt it I'm going to have to write NetEncoders and NetDecoders that reshape everything going in and out. I doubt it's going to work anyway, so I don't see myself spending the effort tonight.
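In case it helps anyone hit the same error: the dataset layout can at least be checked with the documented HDF5 import elements, and a flattened n x (w*h) copy is easy to produce with ArrayReshape. The file name, array, and dimensions here are hypothetical:

(* Sketch: inspect what actually landed in the file ("train.h5" is hypothetical). *)
Import["train.h5", "Datasets"]                           (* names of the datasets in the file *)
Dimensions@Import["train.h5", {"Datasets", "Input"}]     (* e.g. {n, w, h} *)

(* If the importer really wants n x (w*h), flatten each image before exporting: *)
flat = ArrayReshape[inputArray, {n, w*h}];               (* inputArray, n, w, h assumed to exist *)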
I'll probably try the WXF approach tomorrow.
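The rough plan for that, sketched under the assumption that a few hundred images per file fit comfortably in memory (the directory name, chunk size, and the "files"/"labelOf" helpers from the sketch above are all hypothetical):

(* Sketch of the WXF route: transcode the PNGs into per-batch .wxf files once,
   then let the generator read one whole file per batch. *)
If[!DirectoryQ["batches"], CreateDirectory["batches"]];
chunks = Partition[RandomSample[files], UpTo[256]];
MapIndexed[
  Export["batches/batch" <> ToString[First[#2]] <> ".wxf",
    <|"Input" -> Map[Import, #1], "Output" -> Map[labelOf, #1]|>] &,
  chunks];

(* Each .wxf file already holds one batch, so BatchSize must match the chunk size above. *)
wxfGenerator = Function[Import[RandomChoice[FileNames["batches/*.wxf"]]]];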
I would like to add, though, that I really like everything else about the neural net framework. I know I've been venting a bit here, but I really do want to figure out how to make this work. I haven't posted code yet because I've been working on a network for my job, but I'll try to put together an example with publicly available data and contribute something constructive to this thread.