A few salient points:
You state that you wish to use "mean-squared loss". However, your code does not do this. When an Elementwise[LogisticSigmoid] is the last layer, NetTrain will default to using CrossEntropyLoss (see ref/LossFunction). If you wish to use a loss function other than this default, set the LossFunction -> MeanSquaredLossLayer[] option in NetTrain.
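As a minimal sketch (where net and data stand in for your own network and training set, which I don't have here), the override looks like:

```
(* net and data are placeholders for your network and training examples *)
trained = NetTrain[net, data, LossFunction -> MeanSquaredLossLayer[]]
```

With this option set, NetTrain attaches a MeanSquaredLossLayer instead of inferring a cross-entropy loss from the final sigmoid layer.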
You may need to increase the MaxTrainingRounds option in NetTrain. By default (if nothing else is specified) it will run a maximum of 10^4 batches, but this may not be enough. Try increasing it until the results converge. For example, using the default loss (CrossEntropyLoss), I found that the loss was still decreasing beyond this default; increasing to MaxTrainingRounds -> 10^5 batches gave a well-converged result.
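Combining both suggestions (again with net and data as placeholders for your own network and training set), the call would look something like:

```
(* Use mean-squared loss and allow up to 10^5 training batches;
   net and data are placeholders for your own network and examples *)
trained = NetTrain[net, data,
  LossFunction -> MeanSquaredLossLayer[],
  MaxTrainingRounds -> 10^5]
```

You can watch the loss curve in the training progress panel to judge whether it has flattened out, and stop early or raise the limit further as needed.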
