Tips for out-of-core training of neural net for semantic segmentation

Posted 4 months ago

I'm posting because I'm a little bit at the end of my rope. I put together a network for semantic segmentation (just a U-net style thing). I've got a few tens of thousands of labeled images (input image and output labeled image). I tried to train this with a list of {File[input] -> File[output], ...} and it seems to work, except it leaks memory and consumes all 64 GB on my machine in about 5 minutes. I spent a few more hours putting together an HDF5 file following the undocumented procedure on StackExchange (https://mathematica.stackexchange.com/questions/142135/how-to-use-mathematica-to-train-a-network-using-out-of-core-classification?noredirect=1&lq=1 ), but I couldn't get this to work with image inputs and outputs either. Most of the issues seem to be around the shapes of the inputs and outputs, but without documentation this seems like a good way to waste a day or two; the errors I got were malformed, so I'm guessing I'm one of the few to ever hit them. It seems like my last option is the MongoDB approach, but I've already spent a lot of time on this and I'd like to know whether anyone has successfully trained a pixel-wise classifier and, if so, how you managed the data.

I know there are semantic segmentation models on the Wolfram Neural Net repo, but so far I can only guess they must be done by importing ONNX models that were trained with Python.

POSTED BY: Dan Farmer
7 Replies

Assuming your segmentation masks are images, there was a memory leak in the fast image import function used by the NN framework. It should be fixed in 13.0.1, so be on the lookout for that update.

In the meantime, here are a few things you can try:

  1. If the segmentation masks are small and you have fewer than 257 classes, keep them in-core as byte arrays. They take less space than 64-bit integer arrays because they are automatically saved as Integer8:

    (mask = RandomInteger[10, {128, 128}]) // ByteCount
    BinarySerialize[mask] // ByteCount
    (*
    131280
    17512
    *)
    
  2. Keep them out of core but export them as WXF files

    (* In[117] *)
    file1 = BinaryWrite["mask1", BinarySerialize[mask]];
    Close["mask1"];
    
    (* In[125] *)
    (mask1 = BinaryDeserialize@ReadByteArray[file1]) // MaxMemoryUsed // RepeatedTiming
    mask1 == mask
    
    (* Out[125] *)
    (* {0.000401338, 167864} *)
    
    (* Out[126] *)
    (* True *)
    
  3. Not sure why the version above does not use the same encoding, but you can force it to use one byte per class:

    (* In[122] *)
    file2 = BinaryWrite["mask2", BinarySerialize[NumericArray[mask, "Integer8"]]];
    Close["mask2"];
    
    (* In[128] *)
    (mask2 = BinaryDeserialize@ReadByteArray[file2]) // MaxMemoryUsed // RepeatedTiming
    
    (* Out[128] *)
    (* {0.00037028, 110752} *)
    
    (* In[130] *)
    mask2 == mask
    
    (* Out[130] *)
    (* True *)
    

These solutions would require the appropriate encoder on the segmentation port, e.g.

ElementwiseLayer[Sin, "Input" -> NetEncoder[{"Function", BinaryDeserialize, {128, 128}}]]
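
For the out-of-core variants (2 and 3), the encoder function presumably also has to read the file before deserializing it. A minimal sketch, assuming the training data supplies file paths on that port; `readMaskFile` and `maskFileEncoder` are hypothetical names, not part of the framework:

```wolfram
(* Hypothetical helper: load one WXF-serialized mask from a file path. *)
(* Normal turns a NumericArray (variant 3) back into an ordinary list. *)
readMaskFile[path_String] := Normal[BinaryDeserialize[ReadByteArray[path]]]

(* Also accept File[...] wrappers, as used in File[input] -> File[output]. *)
readMaskFile[File[path_String]] := readMaskFile[path]

(* "Function" encoder mapping a path to a 128 x 128 array. *)
(* The dimensions are taken from the example above. *)
maskFileEncoder = NetEncoder[{"Function", readMaskFile, {128, 128}}];
```

The encoder would then replace the one in the ElementwiseLayer example, with the dimensions adjusted to the real mask size.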
Posted 4 months ago

The images are large enough that #1 isn't an option (out-of-core is genuinely needed). I didn't yet try the .wxf route because I was hoping to not have to transcode tens of thousands of files.
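
For reference, a one-off transcoding pass could be sketched roughly as follows. This assumes the masks are PNG files whose pixel values fit in one byte; `pngMaskToWXF` and the directory names are hypothetical, and the write/close pattern follows the BinaryWrite example above:

```wolfram
(* Hypothetical helper: convert one PNG mask to a WXF file in outDir. *)
pngMaskToWXF[png_String, outDir_String] := Module[{mask, target, stream},
  (* Read pixel values as integers 0..255 and store one byte per pixel. *)
  mask = NumericArray[ImageData[Import[png], "Byte"], "UnsignedInteger8"];
  target = FileNameJoin[{outDir, FileBaseName[png] <> ".wxf"}];
  stream = OpenWrite[target, BinaryFormat -> True];
  BinaryWrite[stream, BinarySerialize[mask]];
  Close[stream];
  target
]

(* Example use over a hypothetical "masks" directory:
   pngMaskToWXF[#, "masks_wxf"] & /@ FileNames["*.png", "masks"] *)
```

Reading a file back with BinaryDeserialize@ReadByteArray[...] returns a NumericArray; Normal converts it to a plain list if needed.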

I tried using MongoLink since that was the only officially documented out-of-core solution I hadn't tried yet. It does work, but the performance is terrible because of Mongo's 16 MB per result set limit (GPU throughput was something like 12 images/second). I should have known that already, but I think it would be worth adding to the documentation that the Mongo solution is not appropriate for image-based networks.

I wrote my own generator to read batches of PNGs like

generator = Function[ParallelMap[First@*Image`ImportExportDump`ImageReadPNG, 
RandomSample[filenames, #BatchSize]]]

Which worked and achieved ~180 images / second but also leaked memory.

I then spent a while longer trying to get the unofficial HDF5 support working, but I haven't managed yet. I created an HDF5 file with "Input" and "Output" datasets, each n x w x h, but the HDF5 importer says the format of the file is wrong. I think the error message is telling me it wants each dataset to be n x (w * h), but I'm not sure that's true. To even attempt it I would have to write NetEncoders and decoders that reshape everything going in and out, and I doubt it would work anyway, so I don't see myself spending the effort tonight.

I'll try the wxf approach tomorrow probably.

I would like to add though that I really like everything else about the neural net framework. I know I've been venting a bit here, but I really do want to figure out how to make this work. I haven't posted code yet because I've been working on a network for my job but I'll try to make an example with publicly available data and try to put together something constructive for this thread.

POSTED BY: Dan Farmer

The low-level PNG importer ImageReadPNG automatically caches its result. If you use the internal function directly, you must clean up the cache yourself with

Image`ImportExportDump`DeleteCachePNG[]

This should hopefully solve the memory leak problem with that approach.

Posted 4 months ago

Here's a simple example. The network itself is really dumb; don't pay any attention to that. I was just trying to slap something together that would do some plausible amount of work on the GPU. On my machine (Windows 11, Mathematica 13 with Neural Net paclet version 13.0.3) this will eat all of the memory on my computer. If you elect to download the data I mentioned, it's ~200 MB. Obviously that would fit in RAM if you just read it all at once, but for the actual case I'm dealing with that isn't an option.

POSTED BY: Dan Farmer
Posted 4 months ago

I can confirm almost the same problem. Out-of-core data eats all 64 GB of my memory, and using a generator function does not help. Consequently I have to reduce the volume of the training data and the maximum number of training rounds.

My platform is Ubuntu 20.04, Mathematica 13.0.

POSTED BY: Kyle Jiang
Posted 4 months ago

With the DeleteCachePNG tip I was able to get it working. If you want to play with it in my silly notebook above, you can add the following (and change the NetTrain call):

LoadTrainingPair[idx_] := Module[{out},
  out = <|"Input" -> 
     First@Image`ImportExportDump`ImageReadPNG[frogImages[[idx]]], 
    "Output" -> 
     First@Image`ImportExportDump`ImageReadPNG[gtImages[[idx]]]|>;
  Image`ImportExportDump`DeleteCachePNG[];
  out
  ]
trainingGenerator = Function[
  ParallelMap[LoadTrainingPair, 
   RandomSample[Range[Length[frogImages]], #BatchSize]]
  ]
trained = 
 NetTrain[net, trainingGenerator, BatchSize -> 32, 
  MaxTrainingRounds -> 100, TargetDevice -> "GPU"]

That (and a variation on it for my work setup, which was a bit different) allowed me to train without memory issues and still at 130-180 samples/sec. The real version was tested on Ubuntu 20.04 + Mathematica 12.3.1, and the silly notebook above was tested on Windows 11 + Mathematica 13.
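
If anyone wants to check a generator for leaks before starting a long training run, something like the following might help. It is only a sketch: `generatorMemoryTrace` is a made-up name, and it assumes the generator takes an association with a "BatchSize" key, as the #BatchSize slot above implies.

```wolfram
(* Hypothetical helper: call a generator n times and record the main *)
(* kernel's memory use (in bytes) after each batch is produced. *)
generatorMemoryTrace[gen_, batchSize_Integer, n_Integer] :=
  Table[
    gen[<|"BatchSize" -> batchSize|>]; (* the batch itself is discarded *)
    MemoryInUse[],
    {n}
  ]

(* Example: ListLinePlot[generatorMemoryTrace[trainingGenerator, 32, 50]] *)
```

A steadily rising trace would point to a leak in the generator rather than in NetTrain. MemoryInUse[] only reports the main kernel, so with ParallelMap the subkernels would need to be checked separately, e.g. with ParallelEvaluate[MemoryInUse[]].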

POSTED BY: Dan Farmer
Posted 4 months ago

Thanks! Your solution does solve my problem with the data generator function in NetTrain. In my case the call is:

Image`ImportExportDump`DeleteCacheJPEG[]
POSTED BY: Kyle Jiang