
Discontinuous Deep Learning?

Posted 7 years ago

Over the last week I have been training a deep learning NN (VGG style). Overall the results look good, but the discontinuities between training sessions worry me. Here is the validation loss graph:

[Plot: validation loss over 20 epochs, trained in 4 separate sessions]

Each training session lasts 5 epochs; when a session finishes, the results are examined and the next session is started if necessary:

  1. Training Session #1: epochs 1 - 5
  2. Training Session #2: epochs 6 - 10
  3. Training Session #3: epochs 11 - 15
  4. Training Session #4: epochs 16 - 20

Each of these sessions is started from the previous one using the following command:

    {trainCNN4a1e, LossPlot4a1e, weightPlot4a1e, gradPlot4a1e, TrainTime4a1e,
       MeanBatchesPerSec4a1e, MeanInputsPerSecond4a1e, BatchLossList4a1e,
       RoundLossList4a1e, ValidationLossList4a1e} =
     NetTrain[trainCNN4a1d, TrainSet,
      {"TrainedNet", "LossEvolutionPlot", "RMSWeightEvolutionPlot",
       "RMSGradientEvolutionPlot", "TotalTrainingTime", "MeanBatchesPerSecond",
       "MeanInputsPerSecond", "BatchLossList", "RoundLossList",
       "ValidationLossList"},
      ValidationSet -> Scaled[0.2],
      Method -> {"SGD", "Momentum" -> 0.95},
      TrainingProgressReporting -> "Print",
      MaxTrainingRounds -> 5,
      BatchSize -> 256];

From one session to the next, the only thing that changes is the starting point: the previously trained model (in this case trainCNN4a1d) is used as the initial net for the next training session. This is true for all 4 training sessions shown above; the last trained NN model is always used to start the next one.

If I ran all of these epochs as one training session (instead of 4), these discontinuities would NOT be present. Why do they appear when I do the training incrementally?
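For comparison, the single-session version would be essentially the following (a sketch; untrainedCNN is a stand-in for the original untrained network, whose name isn't shown here):

    (* Sketch of the equivalent single-session run: all 20 epochs in one
       NetTrain call, starting from the untrained network. *)
    trainCNN4aSingle = NetTrain[untrainedCNN, TrainSet, "TrainedNet",
      ValidationSet -> Scaled[0.2],
      Method -> {"SGD", "Momentum" -> 0.95},
      TrainingProgressReporting -> "Print",
      MaxTrainingRounds -> 20, (* 20 rounds in one call instead of 4 x 5 *)
      BatchSize -> 256];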

POSTED BY: Bryan Minor
4 Replies

So I can control the learning rate with the following suboptions of the Method setting:

  • LearningRate
  • LearningRateSchedule

I'm not really sure what reasonable values are for each, particularly the latter.

I can also mitigate the issue of varying validation sets by specifying the same one for all sessions, as in the sketch below.
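Something like this is what I have in mind for the validation set (a rough sketch; fixedValidation and trainOnly are placeholder names, and it assumes TrainSet is a list of input -> output rules):

    (* Hold out 20% of the data once and reuse exactly that split in every
       session, so the validation loss is always computed on the same examples. *)
    {fixedValidation, trainOnly} =
      TakeDrop[RandomSample[TrainSet], Round[0.2 Length[TrainSet]]];

    trainCNN4a1e = NetTrain[trainCNN4a1d, trainOnly, "TrainedNet",
      ValidationSet -> fixedValidation, (* same held-out data in every session *)
      Method -> {"SGD", "Momentum" -> 0.95},
      MaxTrainingRounds -> 5, BatchSize -> 256];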

I will report back my findings; it will be a few days.

POSTED BY: Bryan Minor

"From one session to the next the only thing that changes is the previous trained model"

There is one other thing I can see that does change: the validation set. Each training session will have a new validation set and some new training data. Each time you run NetTrain with ValidationSet -> Scaled[0.2], 20% of your data is randomly selected to be the validation set, so there is new data in each training session.

Under these conditions I would expect something of a zigzag pattern. I'm not sure, however, whether that explains what we see in your example.

Or worse, maybe each round is somewhat "overfitting" to the validation set? I'm not sure what the correct word choice is there.

POSTED BY: Sean Clarke

Thanks Sean! I think you are correct. I appreciate your insight on this issue.

Now off to do more Deep Learning NN models.

POSTED BY: Bryan Minor

So after talking with the developers about this, it turns out there's a better explanation.

The learning rate also changes. It is relatively high at the start of NetTrain and decreases as training runs. It's likely that when you restart NetTrain, it begins again with a learning rate that is too high, which disturbs the previously learned values.
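If that is what's happening, one workaround (a sketch, not something I've verified on your network) is to resume with an explicit, lower initial learning rate via the "SGD" method's "LearningRate" suboption, rather than letting NetTrain choose its default:

    (* Resume from the previously trained net with a small, explicit initial
       learning rate so the restart doesn't undo earlier progress; 0.001 is
       only an illustrative value to be tuned for the actual network. *)
    trainCNN4a1e = NetTrain[trainCNN4a1d, TrainSet, "TrainedNet",
      ValidationSet -> Scaled[0.2],
      Method -> {"SGD", "Momentum" -> 0.95, "LearningRate" -> 0.001},
      MaxTrainingRounds -> 5, BatchSize -> 256];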

POSTED BY: Sean Clarke