# Using NetModel to "fine tune" models with new final layers?

**David Cardinal** · 2 Votes

Hi -- First, congrats on 11.1. Support for the 1080 GPU is enough to get me excited. I also love that I can load pre-trained models using NetModel, as that is an increasingly obvious strategy for problem solving. However, I'm not clear on how I would go about keeping the weights from the lower (feature) layers while re-training the upper layers. Knowing how thorough you are, I'm sure it's possible; I'm just not sure how. NetChain lets me build layers, but I don't think it lets me operate on them. NetExtract lets me pull out layers, but I don't want just the model -- I'd like to keep the pre-trained weights. Thanks! -- David
1 year ago
9 Replies
**Christopher Wolfram** · 4 Votes

Hi David. I think what you are looking for is LearningRateMultipliers, an option for NetTrain that lets you specify what learning rate you want for each layer in the network. Look at the details section for more information, but these seem to be the most important lines:

- "m;;n->r use multiplier r for layers m through n"
- "If r is zero or None, it specifies that the layer or array should not undergo training and will be left unchanged by NetTrain."
- "LearningRateMultipliers->{layer->None} can be used to 'freeze' a specific layer."

Here is an example. First, we will get some model from NetModel:

```mathematica
net = NetModel["LeNet Trained on MNIST Data"]
```

Then we can get some training and test data:

```mathematica
resource = ResourceObject["MNIST"];
trainingData = ResourceData[resource, "TrainingData"];
testData = ResourceData[resource, "TestData"];
```

And finally, we can train just the layers after layer 7:

```mathematica
NetTrain[net, trainingData, ValidationSet -> testData,
 LearningRateMultipliers -> {1 ;; 7 -> None}]
```

This particular example probably isn't all that useful, as we're training on the same dataset this network was originally trained on, but I think it shows the general idea.
1 year ago
Christopher -- Yes, changing the LearningRateMultipliers for the layers I want fixed is perfect. That, combined with Sebastian's reminder about Drop, will do what I want. And Matteo's alternate suggestion of saving out the features is an interesting one, as I think it also does what I'd like, but in a different way. Thanks all. -- David
1 year ago
Christopher -- Thanks! That's definitely helpful, although I'm not sure it is the complete solution. It sounds like we can retrain the existing upper layers (which is helpful), but in many cases the need is to replace them (as when we want to use a pre-trained classifier to generate 2 classes instead of many). In Keras I'd .pop the relevant layers and add back the ones I want, but I haven't found an equivalent in Mathematica.
1 year ago
**Christopher Wolfram** · 1 Vote

Ah, I think I see what you're saying. If you just want to remove the final layers and replace them with your own, you can do something like this:

```mathematica
NetChain[{
  (* the parts of the net you want to keep *)
  Take[net, 7],
  (* the following are the replacement layers *)
  500,
  Ramp,
  10,
  SoftmaxLayer[]
}]
```

Is that what you mean?
1 year ago
**Sebastian Bodenstein** · 2 Votes

Just to mention: Drop could also be used (perhaps more similar to pop). Both Take and Drop work on NetChain, whilst Take works on NetGraph as well. This is mentioned in the NetChain and NetGraph docs.
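To make the Drop suggestion concrete, here is a minimal sketch that combines it with Christopher's earlier LeNet example. The number of layers dropped and the sizes of the replacement layers are illustrative, not taken from the thread:

```mathematica
(* Sketch: "pop" the final layers of a pre-trained net with Drop,
   then append fresh replacement layers with NetChain.
   The index -2 and the layer sizes below are illustrative. *)
net = NetModel["LeNet Trained on MNIST Data"];
trunk = Drop[net, -2];                      (* keep all but the last two layers *)
newNet = NetChain[{trunk, 2, SoftmaxLayer[]}]  (* e.g. a new 2-class head *)
```

The pre-trained weights inside `trunk` are retained; only the appended layers start uninitialized and get trained by NetTrain (optionally with LearningRateMultipliers freezing the trunk, as discussed above).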
1 year ago
**Matteo Salvarezza** · 4 Votes

Just a comment on Christopher's answer. There is an efficient workflow for transfer learning in general that is worth mentioning: instead of keeping the pre-trained part of the net active during training, use it to extract features from the whole dataset, dump them to disk, and train the final part on top of the dumped features. This will make training experiments much faster at the price of extra disk space being used. Since you usually run many trainings, exploring different combinations of hyperparameters and net architectures, the time gain is usually significant.

Following Christopher's example, you would do:

```mathematica
extractor = Take[net, 7];
features = extractor[trainingSet[[All, 1]]];
featureTrainingSet = Thread[features -> trainingSet[[All, 2]]];
Export["feature_training_set.mx", featureTrainingSet, "MX"];
```

The above assumes that trainingSet is a list of the form {input1 -> label1, input2 -> label2, ...}. When ready for training, just load the new dataset, then define the new layers and train them:

```mathematica
featureTrainingSet = Import["feature_training_set.mx"];
newLayers = NetChain[{...}];
trained = NetTrain[newLayers, featureTrainingSet]
```

After training, you can obtain your final model by simply doing:

```mathematica
final = NetChain[{extractor, trained}]
```
UPDATE: Even though the Weights show up in red, if I use NetExtract on the nested NetChains, they seem to have legit values. So maybe I'm fine. Testing now... -- Seems to be fine, which is great. Sorry for the false alarm, although I still don't understand why the Weights show as red and the Net shows as Uninitialized in this situation.

Hi again -- I've been trying to use the combination of Drop and then NetChain to add my new layers, but when I call NetChain it appears to be clearing the weights of my pre-trained network. (They show up in red, although they were correctly black when first loaded.) Some sample code and a screenshot of the first bit of the output below. Thanks, as always, for any help!

```mathematica
nm = NetModel["Inception V3 Trained on ImageNet Competition Data"];
bn = BatchNormalizationLayer[];
relu = NetInitialize[ElementwiseLayer[Ramp]];
dpLayer2 = NetInitialize[DotPlusLayer[1024, "Input" -> 2048]];
dpLayer3 = NetInitialize[DotPlusLayer[2, "Input" -> 1024]];
normLayer3 = NetInitialize[SoftmaxLayer[]];
nmFull = NetChain[{Drop[nm, -3], dpLayer2, relu, dpLayer3, normLayer3},
  "Output" -> NetDecoder[{"Class", Range[0, 1]}]]
```

![output of adding additional layers to a pre-trained network][1]
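For anyone hitting the same red/Uninitialized display, here is a minimal sketch of the check described in the update above: pulling weights out of the nested NetChain with NetExtract to confirm they are still real pre-trained values. The part indices are illustrative and depend on the structure of the `Drop[nm, -3]` sub-chain:

```mathematica
(* Sketch: confirm the nested chain still carries pre-trained weights.
   Part indices are illustrative; adjust to your network's structure. *)
trunk = NetExtract[nmFull, 1];          (* the Drop[nm, -3] sub-chain *)
w = NetExtract[trunk, {1, "Weights"}];  (* weights of its first layer *)
Dimensions[Normal[w]]                   (* a numeric array, not Automatic *)
```

If the extracted weights are numeric arrays rather than Automatic, the pre-trained values survived the wrapping; the red display appears to reflect only how nested chains are summarized.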