From one session to the next, the only thing that changes is the previously trained model.
There is one other thing I see that does change: the validation set. Each time you run NetTrain, 20% of your data is randomly selected to be the validation set, so each training session gets a different validation set and slightly different training data.
Under these conditions I would expect somewhat of a zigzag pattern; I'm not sure, however, whether that accounts for what we see in your example.
Or worse, maybe each round is somewhat *overfitting* to the validation set? I'm not sure what the correct word choice is there.
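
If you want to rule the moving validation set out, one option is to split off a validation set yourself, once, and pass it to NetTrain explicitly in every session. A rough sketch, where `allData` and `net` stand in for your own data and network:

```
(* Sketch: hold out one fixed validation set and reuse it in every session,
   instead of letting NetTrain resample 20% each time.
   allData and net are placeholders for your own data and network. *)
SeedRandom[1234];                      (* make the split reproducible *)
shuffled = RandomSample[allData];
nVal = Ceiling[0.2 Length[shuffled]];
valSet = shuffled[[;; nVal]];          (* fixed 20% validation set *)
trainSet = shuffled[[nVal + 1 ;;]];

(* pass the same valSet explicitly in each training session *)
trained = NetTrain[net, trainSet, ValidationSet -> valSet];
```

If the zigzag disappears with a fixed validation set, the resampling was at least part of the story.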