WOLFRAM COMMUNITY

GROUPS:

Staff Picks Data Science Image Processing Wolfram Language Machine Learning Neural Networks

Semantic Image Segmentation Neural Network in Wolfram Language

Test Account

Posted 7 years ago

Hello everyone, this is my first post on Wolfram Community, so I am quite excited about it! I got interested in the neural network functionality after attending a talk right when version 11.0.0 of Mathematica was about to be released (about one and a half years ago). Since then, I have been teaching myself about neural networks using various resources available online and the Wolfram Language documentation. What really helped me with this project is the ease with which one can implement a seemingly complicated network in the Wolfram Language, the level of automation that NetTrain handles behind the scenes (batch sizes, methods, learning rates, initializations... and the list goes on), and the simplicity with which we can stitch (chain) it all together and run it (essentially Shift+Enter).

In this post, I have tried to implement SegNet ( https://arxiv.org/pdf/1511.00561.pdf ), a convolutional neural network for semantic pixel-wise segmentation. The encoder network is topologically identical to the 13 convolutional layers of the VGG16 network, except that each convolution layer is followed by batch normalization (and a ReLU). The decoder upsamples the feature maps produced by the encoder; in the paper this is done with the max-pooling indices stored by the corresponding encoder layer, whereas the implementation here uses a DeconvolutionLayer. (Please note that this encoder and decoder are not the same as the NetEncoder and NetDecoder functionality in the Wolfram Language.)

[Figure: snippet from the SegNet paper showing the layer organization]

Let's build the network step by step, starting with the encoder. To build the networks, I referred directly to the Caffe prototxt files and reproduced them in the Wolfram Language. The prototxt files show that the layers follow a repetitive pattern.
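The repetitive structure can be checked on a small scale before building the full network. The sketch below is not from the original post: it assembles one encoder stage (two conv → batch norm → ReLU blocks followed by 2×2 max pooling) and one mirrored decoder stage using the deconvolution-plus-padding upsampling used in this post, on an assumed 3×32×32 input with 8 channels, and reports the resulting shapes. `convBlock`, `encStage`, and `decStage` are hypothetical names used only for this illustration.

```mathematica
(* Hypothetical helper: n x (3x3 conv -> batch norm -> ReLU); padding 1 keeps the spatial size *)
convBlock[channels_, n_] := Flatten@Table[
    {ConvolutionLayer[channels, {3, 3}, "PaddingSize" -> 1],
     BatchNormalizationLayer[], ElementwiseLayer[Ramp]}, {n}];

(* One encoder stage: conv blocks, then 2x2 max pooling with stride 2 (halves height and width) *)
encStage = NetChain[
   Append[convBlock[8, 2], PoolingLayer[{2, 2}, "Stride" -> 2]],
   "Input" -> {3, 32, 32}];

(* One decoder stage: stride-2 deconvolution maps 16x16 to 2*(16-1)+3-2 = 31x31,
   so one extra row and column are padded to reach 32x32, then conv blocks *)
decStage = NetChain[
   Join[
    {DeconvolutionLayer[8, {3, 3}, "Stride" -> 2, "PaddingSize" -> 1],
     PaddingLayer[{{0, 0}, {0, 1}, {0, 1}}]},
    convBlock[8, 2]],
   "Input" -> {8, 16, 16}];

(* Run random data through freshly initialized stages and inspect the shapes *)
x = RandomReal[1, {3, 32, 32}];
down = NetInitialize[encStage][x];
up = NetInitialize[decStage][down];
{Dimensions[down], Dimensions[up]}
(* {{8, 16, 16}, {8, 32, 32}} *)
```

The deconvolution alone yields 31×31; the one-pixel PaddingLayer restores the expected 32×32, which is why each upsampling decoder block pads right after its DeconvolutionLayer.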
Taking advantage of that, we can build the encoder in a neater way:

```mathematica
(* One encoder block: nLayers x (3x3 conv -> batch norm -> ReLU), then 2x2 max pooling.
   Layer names follow the Caffe prototxt, e.g. "conv1_1", "conv1_1bn", "relu1_1", "pool1". *)
encChain[index_, nLayers_, nChannels_] := Module[{tags, names, layers},
  tags = Table[ToString[index] <> "_" <> ToString[j], {j, nLayers}];
  names = Append[
    Flatten[{"conv" <> #, "conv" <> # <> "bn", "relu" <> #} & /@ tags],
    "pool" <> ToString[index]];
  layers = Append[
    Flatten@Table[{ConvolutionLayer[nChannels, {3, 3}, "PaddingSize" -> {1, 1}],
       BatchNormalizationLayer[], ElementwiseLayer[Ramp]}, {nLayers}],
    PoolingLayer[{2, 2}, "Stride" -> {2, 2}]];
  AssociationThread[names -> layers]]

encoder = NetChain[Join[
   encChain[1, 2, 64],
   encChain[2, 2, 128],
   encChain[3, 3, 256],
   encChain[4, 3, 512],
   encChain[5, 3, 512]],
  "Input" -> NetEncoder[{"Image", {256, 256}}]]
```

Next comes the decoder section of the network. It contains the same repetitive pattern as the encoder, except that each upsampling block starts with a DeconvolutionLayer followed by the appropriate padding. A 3×3 deconvolution with stride 2 and padding 1 maps an n×n input to (2n−1)×(2n−1) (for example, 8×8 to 15×15), so a PaddingLayer adds one row and one column to reach the doubled size 2n×2n. Here is a decoder that closely follows (if not exactly matches) the SegNet prototxt found in the literature:

```mathematica
(* Decoder block with upsampling: stride-2 deconvolution, one-pixel padding,
   then (nLayersmax - nLayersmin + 1) x (3x3 conv -> batch norm -> ReLU).
   Names follow the prototxt decoder convention, e.g. "conv5_3_D". *)
decChain[index_, nLayersmax_, nLayersmin_, nChannels_] := Module[{tags, names, layers, nlayers},
  nlayers = nLayersmax - nLayersmin + 1;
  tags = Table[ToString[index] <> "_" <> ToString[j] <> "_" <> "D", {j, nLayersmax, nLayersmin, -1}];
  names = Flatten@Append[{"deconv" <> ToString[index], "deconv" <> ToString[index] <> "pad"},
     Flatten[{"conv" <> #, "conv" <> # <> "bn", "relu" <> #} & /@ tags]];
  layers = Flatten@Append[
     {DeconvolutionLayer[nChannels, {3, 3}, "Stride" -> 2, "PaddingSize" -> 1],
      PaddingLayer[{{0, 0}, {0, 1}, {0, 1}}]},
     Flatten@Table[{ConvolutionLayer[nChannels, {3, 3}, "PaddingSize" -> {1, 1}],
        BatchNormalizationLayer[], ElementwiseLayer[Ramp]}, {nlayers}]];
  AssociationThread[names -> layers]]

(* Decoder block without upsampling, used where the channel count changes within a stage *)
decChain2[index_, nLayersmax_, nLayersmin_, nChannels_] := Module[{tags, names, layers, nlayers},
  nlayers = nLayersmax - nLayersmin + 1;
  tags = Table[ToString[index] <> "_" <> ToString[j] <> "_" <> "D", {j, nLayersmax, nLayersmin, -1}];
  names = Flatten[{"conv" <> #, "conv" <> # <> "bn", "relu" <> #} & /@ tags];
  layers = Flatten@Table[{ConvolutionLayer[nChannels, {3, 3}, "PaddingSize" -> {1, 1}],
      BatchNormalizationLayer[], ElementwiseLayer[Ramp]}, {nlayers}];
  AssociationThread[names -> layers]]

decoder = NetChain[Join[
   decChain[5, 3, 1, 512],
   decChain[4, 3, 2, 512], decChain2[4, 1, 1, 256],
   decChain[3, 3, 2, 256], decChain2[3, 1, 1, 128],
   decChain[2, 2, 2, 128], decChain2[2, 1, 1, 64],
   decChain[1, 2, 2, 64]]]
```

The last step is to put it all together, chaining the encoder and decoder with a final layer that depends on the application. Since in this project I just wanted to obtain a final image (pixels classified according to the object class), I kept the output as an image:

```mathematica
(* Final 3-channel convolution whose output is decoded back into an RGB image *)
chain = NetChain[{encoder, decoder,
   ConvolutionLayer[3, {3, 3}, "PaddingSize" -> 1,
    "Input" -> {64, 256, 256}, "Output" -> NetDecoder[{"Image"}]]}]
```

Now, as a proof of concept, the network was trained on images from the ISPRS Potsdam 2D semantic labeling dataset: http://www2.isprs.org/potsdam-2d-semantic-labeling.html. I chose this dataset because there was already a similar network trained in Caffe (https://github.com/nshaud/DeepNetsForEO), and I wanted to implement the same in the Wolfram Language. The dataset consists of 6000×6000 images, which I divided into training, validation, and test images. They were then imported, resized to 2048×2048, and partitioned into 256×256 tiles so that they would fit on my modest GPU; each image therefore yields 8 × 8 = 64 tiles.

The final step is to train the network. I used a very modest GPU (a laptop with a 4 GB graphics card), with 256×256 images and a batch size of 10.
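The resize-and-tile preprocessing described above could look roughly like the following. This is a sketch under assumptions, not the code used for the post: random 600×600 images stand in for a real 6000×6000 image and its label, `makeTrainingPairs` is a hypothetical helper, and the label is resized with nearest-neighbor resampling (my choice; the post does not say how labels were resized) so that no blended, non-class colors appear.

```mathematica
(* Hypothetical helper: resize an image and its label to size x size, cut both into
   patch x patch tiles (2048/256 = 8, so 8 x 8 = 64 tiles), and pair them as rules *)
makeTrainingPairs[img_Image, label_Image, size_: 2048, patch_: 256] := Module[
  {imgTiles, labelTiles},
  imgTiles = Flatten@ImagePartition[ImageResize[img, {size, size}], patch];
  (* nearest-neighbour resampling keeps label colours exact (no blended classes) *)
  labelTiles = Flatten@ImagePartition[
     ImageResize[label, {size, size}, Resampling -> "Nearest"], patch];
  Thread[imgTiles -> labelTiles]]

(* Stand-in data: random RGB images instead of a real image/label pair *)
img = RandomImage[1, {600, 600}, ColorSpace -> "RGB"];
label = RandomImage[1, {600, 600}, ColorSpace -> "RGB"];

pairs = makeTrainingPairs[img, label];
{Length[pairs], ImageDimensions[pairs[[1, 1]]]}
(* {64, {256, 256}} *)
```

Lists of such image -> label rules are the kind of data that `traindata` and `valdata` in the NetTrain call would hold. Splitting into training, validation, and test sets at the level of whole 6000×6000 images, before tiling, keeps neighboring tiles of one image from ending up in different sets.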
Training was run with NetTrain, checkpointing progress to the notebook's directory every 30 minutes:

```mathematica
(* traindata, valdata: lists of 256x256 image -> label image rules *)
dir = SetDirectory[NotebookDirectory[]];
trained = NetTrain[chain, traindata,
  ValidationSet -> valdata,
  MaxTrainingRounds -> 1000,
  TargetDevice -> "GPU",
  BatchSize -> 10,
  TrainingProgressCheckpointing -> {"Directory", dir, "Interval" -> Quantity[30, "Minutes"]}]
```

Once the network had been trained to some extent (not completely, since the validation loss had not plateaued yet), I was impatient to look at the results (this is, after all, my first exploratory neural net project).

[Figure: segmentation of roads, cars, and trees by the network. First: the input image; second: the classes predicted by the network; third: the actual label (ground truth).]

I am still working on an efficient way to do error analysis. Please feel free to give me feedback so that I can improve this code and do efficient error analysis, or to suggest other nets that I could implement in the Wolfram Language. I am aware of the Cityscapes dataset (https://www.cityscapes-dataset.com/) and the SYNTHIA dataset, which might be interesting applications of this net; these are still a work in progress (limited by GPU availability).

P.S. This was trained for research/leisure purposes only and cannot be used commercially.
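For the error analysis mentioned above, one simple starting point is per-pixel accuracy. Because the network outputs colors rather than class indices, each predicted pixel is first snapped to the nearest label color. The sketch below assumes the six-color ISPRS Potsdam label palette (impervious surfaces, building, low vegetation, tree, car, clutter); check it against your own label images. `toClassMatrix` and `pixelAccuracy` are hypothetical helpers, and the synthetic 16×16 example only demonstrates usage.

```mathematica
(* Assumed label palette, RGB in [0, 1]: impervious surfaces, building,
   low vegetation, tree, car, clutter *)
palette = {{1, 1, 1}, {0, 0, 1}, {0, 1, 1}, {0, 1, 0}, {1, 1, 0}, {1, 0, 0}};
nearestClass = Nearest[N[palette] -> Range[Length[palette]]];

(* Matrix of class indices: each pixel is snapped to its nearest palette colour *)
toClassMatrix[img_Image] :=
  Map[First@nearestClass[#] &, ImageData[ColorConvert[img, "RGB"]], {2}];

(* Fraction of pixels whose snapped class matches the ground-truth class *)
pixelAccuracy[pred_Image, truth_Image] :=
  Module[{p = Flatten@toClassMatrix[pred], t = Flatten@toClassMatrix[truth]},
   N[Count[p - t, 0]/Length[t]]]

(* Usage on synthetic data: a striped ground truth, and a noisy prediction
   with exactly one pixel deliberately assigned the wrong class *)
SeedRandom[1];
truthData = N@Table[palette[[Mod[i + j, 6] + 1]], {i, 16}, {j, 16}];
truth = Image[truthData, ColorSpace -> "RGB"];
predData = ReplacePart[truthData, {1, 1} -> {1., 1., 1.}];
pred = Image[
   Clip[predData + RandomReal[{-0.1, 0.1}, Dimensions[predData]], {0, 1}],
   ColorSpace -> "RGB"];
pixelAccuracy[pred, truth]
(* 0.996094, i.e. 255/256 *)
```

Per-class metrics such as intersection-over-union could be computed from the same class matrices.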

POSTED BY: Test Account

10 Replies

Michael Sollami, Salesforce

Posted 7 years ago

Great post! The link to your dataset is broken though... Also, can you upload your wlnet so we can try it?

POSTED BY: Michael Sollami

Test Account

Posted 7 years ago

There are similar images in the Wolfram Neural Net Repository.

POSTED BY: Test Account

Test Account

Posted 7 years ago

Thank you so much for the encouraging words!

POSTED BY: Test Account

Kamil Luto, University of Rochester

Posted 7 years ago

Great, inspiring work! Thank you for sharing. I just started learning about neural networks on my own, and your post is really helpful on this voyage!

POSTED BY: Kamil Luto

Rand Baldwin, AEgis Technologies | Defense & Space

Posted 7 years ago

Great first post! Congratulations!

POSTED BY: Rand Baldwin

l van Veen, Hewlett-Packard Enterprise

Posted 7 years ago

Hi, nice post, and congrats on your first post! I'm just curious about the resources you used. You mentioned "various resources available online". Which do you think were the most valuable? Thx!

POSTED BY: l van Veen

Test Account

Posted 7 years ago

I would first suggest going through http://reference.wolfram.com/language/guide/NeuralNetworks.html. In particular, the documentation for NetTrain (http://reference.wolfram.com/language/ref/NetTrain.html) has many good examples to get started. There are videos of free online courses on the Wolfram U website: https://www.wolfram.com/wolfram-u/ and https://www.wolfram.com/wolfram-u/deep-neural-networks-computer-vision/. For a more theoretical read, Deep Learning by Ian Goodfellow et al. is perhaps the best one-stop resource I found. Also, most of the papers are on arXiv and are freely available.

POSTED BY: Test Account

Kyle Martin, WOLFRAM

Posted 7 years ago

A similar model, but trained instead on grass fields to automatically measure square footage, would be very useful to landscaping companies. Exciting stuff, thanks for sharing.