Just a comment on Christopher's answer.
There is an efficient workflow for transfer learning in general that is worth mentioning: instead of keeping the pre-trained part of the net active during training, use it to extract features from the whole dataset once, dump them to disk, and train the final part on top of the dumped features. This makes training experiments much faster at the cost of some extra disk space. Since you usually run many trainings while exploring different combinations of hyperparameters and net architectures, the time gain is usually significant.
Following Christopher's example, you would do:
extractor = Take[net, 7]; (* keep the first 7 layers of the pre-trained net *)
features = extractor[trainingSet[[All, 1]]]; (* run every input through the frozen part *)
featureTrainingSet = Thread[features -> trainingSet[[All, 2]]]; (* pair each feature vector with its label *)
Export["feature_training_set.mx", featureTrainingSet, "MX"]; (* dump the features to disk *)
The above assumes that trainingSet is a list of the form {input1 -> label1, input2 -> label2, ...}.
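For concreteness, a toy trainingSet of that shape might look like this (the file names and class labels here are just placeholders, not from the original answer):

trainingSet = {
  Import["cat_001.jpg"] -> "cat",
  Import["dog_001.jpg"] -> "dog",
  Import["cat_002.jpg"] -> "cat"};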
When ready for training, just load the new dataset, define the new layers, and train them:
featureTrainingSet = Import["feature_training_set.mx"]; (* load the dumped features *)
newLayers = NetChain[{...}]; (* the trainable head to put on top of the extracted features *)
trained = NetTrain[newLayers, featureTrainingSet]
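Since the {...} above is left open, here is a minimal sketch of what the new head could look like for a two-class problem; the layer sizes, class labels, and the feature dimension of 4096 are illustrative assumptions only:

(* illustrative sketch: sizes, labels and the 4096 input dimension are assumptions *)
newLayers = NetChain[
   {LinearLayer[256], Ramp, LinearLayer[2], SoftmaxLayer[]},
   "Input" -> 4096, (* assumed length of the extracted feature vectors *)
   "Output" -> NetDecoder[{"Class", {"cat", "dog"}}]];
trained = NetTrain[newLayers, featureTrainingSet]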
After training, you can obtain your final model by simply doing
final = NetChain[{extractor, trained}]
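To sanity-check the assembled model, you can run it on a fresh input and, if you like, save it in the standard "WLNet" format (the file names here are placeholders):

final[Import["new_image.jpg"]]
Export["final_model.wlnet", final]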