Great post! Thanks for sharing a concrete application =)
A few tips. The net can be made both faster and more powerful by
- embedding augmentation on the input image
- using the stride to downsample
- keeping the convolution pyramid going deeper to reduce the size of the final aggregation
I also did not see a significant drop in accuracy using grayscale images instead of RGB.
A small quality-of-life trick is to use a NetDecoder to automatically decorate the output. In this case NetDecoder["Boolean"] is the right choice.
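To see what the decoder does on its own, here is a minimal sketch that applies it directly to a couple of made-up probabilities (the name dec and the two values are only for illustration). As far as I know, the Boolean decoder turns a scalar probability into True or False by thresholding at 0.5.

```wolfram
(* Standalone decoder, the same one attached to the "Output" port of the net *)
dec = NetDecoder["Boolean"];

(* Made-up probabilities: one well above and one well below one half *)
dec[0.9]
dec[0.1]
```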
All this considered, this is the new net:
net = NetInitialize @ NetChain[
{
ImageAugmentationLayer[{100, 100}, "ReflectionProbabilities" -> {.5, .5}],
ConvolutionLayer[16, {3,3}, "Stride" -> 2],
ElementwiseLayer[Ramp],
ConvolutionLayer[32, {3,3}, "Stride" -> 2],
ElementwiseLayer[Ramp],
ConvolutionLayer[64, {3,3}, "Stride" -> 2],
ElementwiseLayer[Ramp],
ConvolutionLayer[128, {3,3}, "Stride" -> 2],
AggregationLayer[Mean],
LinearLayer[{}],
LogisticSigmoid
},
"Input" -> NetEncoder[{"Image", {120,120}, ColorSpace -> "Grayscale"}],
"Output" -> NetDecoder["Boolean"]
]
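As a rough check on how quickly the strided layers shrink the feature maps, here is a small sketch. It assumes unpadded 3x3 convolutions with stride 2 applied to the 100x100 augmented crop, i.e. the usual Floor[(n - 3)/2] + 1 rule; the helper name outSize is made up for illustration.

```wolfram
(* Hypothetical helper: spatial size after one unpadded 3x3, stride-2 convolution *)
outSize[n_] := Floor[(n - 3)/2] + 1

(* Sizes after each of the four convolutions, starting from 100 pixels *)
NestList[outSize, 100, 4]
(* {100, 49, 24, 11, 5} *)
```

Under those assumptions the final AggregationLayer only has to average a 5x5 map per channel, which is where most of the speedup comes from.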
And this is the training code. Note that I wrap the paths in File, since the whole pipeline supports out-of-core training.
negativeImages = File /@ FileNames["*.jpg", FileNameJoin[{dataDir, "Negative"}]];
positiveImages = File /@ FileNames["*.jpg", FileNameJoin[{dataDir, "Positive"}]];
split = {3800, 400, 15000};
(* consecutive, non-overlapping index ranges for train, validation and test *)
bounds = Accumulate[split];
ranges = Transpose[{Most[Join[{0}, bounds]] + 1, bounds}]
(* {{1, 3800}, {3801, 4200}, {4201, 19200}} *)
SeedRandom[1234];
negatives = Thread[RandomSample[negativeImages] -> False];
positives = Thread[RandomSample[positiveImages] -> True];
{train, val, test} =
Table[
Join[Take[negatives, s], Take[positives, s]],
{s, ranges}
];
Length /@ {train, val, test}
(* {7600, 800, 30000} *)
trainingResult =
NetTrain[net, train, All, ValidationSet -> val, MaxTrainingRounds -> 4,
TrainingProgressMeasurements -> {"ROCCurvePlot", "F1Score", "Accuracy"}]
Then you can test with
trained = trainingResult["TrainedNet"];
NetMeasurements[trained, test, {"Accuracy", "ConfusionMatrixPlot",
"ROCCurvePlot"}]
I get an accuracy of 0.988.
For an application like this you probably want to calibrate the classifier to have close to zero false negatives (better to check and discard than ignore a crack!)
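One way to do that calibration, as a rough sketch rather than a definitive recipe: strip the Boolean decoder to get raw probabilities, then pick the threshold from the scores of the true cracks in the validation set. The names rawNet, positiveScores, targetRecall, threshold and crackQ are all made up here, and the 99.9% target is just an example value.

```wolfram
(* Remove the Boolean decoder so the net returns the raw probability *)
rawNet = NetReplacePart[trained, "Output" -> None];

(* Probabilities and labels on the validation set (a list of File -> label rules) *)
probs = rawNet[Keys[val]];
labels = Values[val];

(* Scores the net assigns to the true cracks *)
positiveScores = Pick[probs, labels, True];

(* Assumed target: keep at least 99.9% of the cracks above the threshold *)
targetRecall = 0.999;
threshold = Quantile[positiveScores, 1 - targetRecall];

(* Hypothetical decision rule using the calibrated threshold instead of 0.5 *)
crackQ[img_] := rawNet[img] >= threshold
```

Lowering the threshold this way trades false negatives for false positives, so it is worth re-checking the confusion matrix on the test set afterwards.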