
Using NetModel to "fine tune" models with new final layers?

mathematics wolfram language machine learning

Hi -- First, congrats on 11.1. Support for the 1080 GPU is enough for me to get excited. I also love that I can load pre-trained models using NetModel, as that is an increasingly obvious strategy for problem solving. However, I'm not clear on how I would go about keeping the weights from the lower (feature) layers, while re-training the upper layers. Knowing how thorough you are, I'm sure it's possible, but just not sure how. NetChain lets me build layers, but I don't think it lets me operate on them. NetExtract lets me pull out layers, but I don't want just the model, I'd like to keep the pre-trained weights. Thanks! -- David

POSTED BY: David Cardinal
8 days ago

Hi David.

I think what you are looking for is LearningRateMultipliers. LearningRateMultipliers is an option for NetTrain that lets you specify what learning rate you want for each layer in the network.

Look at the details section for more information, but these seem to be the most important lines:

"m;;n->r use multiplier r for layers m through n"

"If r is zero or None, it specifies that the layer or array should not undergo training and will be left unchanged by NetTrain."

"LearningRateMultipliers->{layer->None} can be used to "freeze" a specific layer."

Here is an example. First, we will get some model from NetModel:

net = NetModel["LeNet Trained on MNIST Data"]

Then we can get some training and test data:

resource = ResourceObject["MNIST"];
trainingData = ResourceData[resource, "TrainingData"];
testData = ResourceData[resource, "TestData"];

And finally, we can train just the layers after layer 7:

NetTrain[net, trainingData, ValidationSet -> testData, LearningRateMultipliers -> {1 ;; 7 -> None}]

This particular example probably isn't all too useful, as we're probably using the same dataset as this network was originally trained on, but I think this shows the general idea.
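
One way to check that the freeze took effect is to compare a frozen layer's weights before and after training. This is only a minimal sketch: it assumes layer 1 of this LeNet is a convolution layer with a "Weights" array, and the 1000-example subset and single training round are arbitrary choices to keep it quick.

```wolfram
(* Load the pre-trained net and a small random subset of MNIST *)
net = NetModel["LeNet Trained on MNIST Data"];
trainingData = ResourceData[ResourceObject["MNIST"], "TrainingData"];
subset = RandomSample[trainingData, 1000];

(* Weights of the first layer before training *)
before = NetExtract[net, {1, "Weights"}];

(* Train briefly with layers 1 through 7 frozen *)
retrained = NetTrain[net, subset,
   LearningRateMultipliers -> {1 ;; 7 -> None},
   MaxTrainingRounds -> 1];

(* Weights of the same layer after training *)
after = NetExtract[retrained, {1, "Weights"}];

(* Normal converts array objects to plain lists where needed *)
Normal[before] == Normal[after]
```

If layers 1 through 7 were really left alone, the last line should evaluate to True.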

POSTED BY: Christopher Wolfram
7 days ago

Christopher -- Yes, changing the LearningRateMultipliers for the layers I want fixed is perfect. That, combined with Sebastian's reminder about Drop, will do what I want. Matteo's alternate suggestion of saving out the features is an interesting one too, as I think it also does what I'd like, but in a different way. Thanks all. -- David

POSTED BY: David Cardinal
7 days ago

Christopher -- Thanks! That is definitely helpful, although I'm not sure it is the complete solution. It sounds like we can retrain the existing upper layers (which is helpful), but in many cases the need is to replace them (for example, when we want to use a pre-trained classifier to distinguish 2 classes instead of many). In Keras I'd .pop the relevant layers and add back the ones I want, but I can't find an equivalent in Mathematica.

POSTED BY: David Cardinal
7 days ago

Ah, I think I see what you're saying.

If you just want to remove the final layers and replace them with your own, you can do something like this:

    NetChain[{
      (*the parts of the net you want to keep*)
      Take[net, 7], 
      (*the following are the replacement layers*)
      ...
    }]

Is that what you mean?
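
A fuller sketch of this kind of replacement, combined with freezing the kept layers, might look like the following. The two-class task (digits 0 and 1 only), the layer sizes 100 and 2, and the names twoClassData and newNet are assumptions made for illustration. It also assumes the multiplier spec 1 -> None applies to the whole nested block in position 1.

```wolfram
net = NetModel["LeNet Trained on MNIST Data"];
trainingData = ResourceData[ResourceObject["MNIST"], "TrainingData"];

(* Hypothetical two-class task: keep only the digits 0 and 1 *)
twoClassData = Select[trainingData, MemberQ[{0, 1}, Last[#]] &];

(* Layers 1-7 keep their pre-trained weights; the integers 100 and 2
   are NetChain shorthand for fully connected layers of those sizes *)
newNet = NetChain[
   {Take[net, 7], 100, Ramp, 2, SoftmaxLayer[]},
   "Output" -> NetDecoder[{"Class", {0, 1}}]];

(* The kept block is element 1 of the new chain, so freeze it as a
   whole; only the new layers are trained *)
trained = NetTrain[newNet, twoClassData,
   LearningRateMultipliers -> {1 -> None},
   MaxTrainingRounds -> 2];

(* Classify one example with the new two-class net *)
trained[twoClassData[[1, 1]]]
```

NetTrain initializes only the arrays that have no values yet, so the pre-trained weights in the kept block are the starting point rather than being reset.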

POSTED BY: Christopher Wolfram
7 days ago

Just to mention: Drop could also be used (maybe more similar to pop). Both Take and Drop work on NetChain, whilst Take works on NetGraph as well. This is mentioned in the NetChain and NetGraph docs.

7 days ago

Just a comment to Christopher's answer.

There is an efficient workflow for transfer learning in general that is worth mentioning: instead of keeping the pre-trained part of the net active during training, use that to extract features from the whole dataset, dump them to disk and train the final part on top of the dumped features. This will make training experiments much faster at the price of extra disk space being used. Since you usually run many trainings, exploring different combinations of hyperparameters and net architectures, the time gain is usually significant.

Following Christopher's example, you would do:

extractor = Take[net, 7];
features = extractor[trainingSet[[All, 1]]];
featureTrainingSet = Thread[features -> trainingSet[[All, 2]]];
Export["", featureTrainingSet, "MX"];

Where the above assumes that trainingSet is a list of the form {input1 -> label1, input2 -> label2, ...}.

When ready for training just load the new dataset, then define the new layers and train them:

featureTrainingSet = Import[""];
newLayers = NetChain[{...}];
trained = NetTrain[newLayers, featureTrainingSet]
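
As a concrete instance of the steps so far, here is a rough sketch on a small MNIST subset. The cache file location, the 2000-example subset, and the particular replacement layers are all hypothetical choices. The input size 800 assumes that the first 7 layers of this LeNet end in a flattened vector of that length.

```wolfram
net = NetModel["LeNet Trained on MNIST Data"];
trainingSet = RandomSample[
   ResourceData[ResourceObject["MNIST"], "TrainingData"], 2000];

(* Run the frozen part once and pair each feature vector with its label *)
extractor = Take[net, 7];
features = extractor[trainingSet[[All, 1]]];
featureTrainingSet = Thread[features -> trainingSet[[All, 2]]];

(* Hypothetical cache location; any writable path would do *)
cacheFile = FileNameJoin[{$TemporaryDirectory, "lenet-features.mx"}];
Export[cacheFile, featureTrainingSet, "MX"];

(* Later: reload the features and train only a small head on them *)
featureTrainingSet = Import[cacheFile];
newLayers = NetChain[{100, Ramp, 10, SoftmaxLayer[]},
   "Input" -> 800,
   "Output" -> NetDecoder[{"Class", Range[0, 9]}]];
trained = NetTrain[newLayers, featureTrainingSet, MaxTrainingRounds -> 2];
```

Because the head only ever sees stored vectors, each new experiment skips the convolutional layers entirely.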

After training, you can obtain your final model by simply doing

final = NetChain[{extractor, trained}]

POSTED BY: Matteo Salvarezza
7 days ago

Hi David, What platform are you using MMA 11.1 on? I was not expecting my 1080 GPU to be supported on OS X, but I have found that 11.1 says it no longer supports the GPU for NeuralNet functions at all. At least my 950 card worked with 11.0. Rather frustrating....

POSTED BY: David Proffer
7 days ago

David P. -- I'm running on Windows 10. I did a NetTrain with TargetDevice -> "GPU" and it ran just fine on my EVGA 1080. Just to double check after seeing your post, I ran the code again, and indeed the GPU lights up with activity, so it is really in use.

POSTED BY: David Cardinal
7 days ago

UPDATE: Even though the Weights show up in Red, if I use NetExtract on the nested NetChains, they seem to have legit values. So maybe I'm fine. Testing now... -- Seems to be fine, which is great. Sorry for the false alarm, although I still don't understand why the Weights show as Red & the Net shows as Uninitialized in this situation.

Hi again -- I've been trying to use the combination of Drop and then NetChain to add my new layers, but when I call NetChain it appears to be clearing the weights of my pre-trained network. (They show up in red, although they were correctly black when first loaded.) Some sample code and a screenshot of the first bit of the output are below. Thanks, as always, for any help!

nm = NetModel[
   "Inception V3 Trained on ImageNet Competition Data"];

bn = BatchNormalizationLayer[];
relu = NetInitialize[ElementwiseLayer[Ramp]];

dpLayer2 = NetInitialize[DotPlusLayer[1024, "Input" -> 2048]];
dpLayer3 = NetInitialize[DotPlusLayer[2, "Input" -> 1024]];
normLayer3 = NetInitialize[SoftmaxLayer[]];

nmFull = NetChain[{Drop[nm, -3], dpLayer2, relu, dpLayer3, 
   normLayer3}, "Output" -> NetDecoder[{"Class", Range[0, 1]}]]

[Screenshot: output of adding additional layers to a pre-trained network]

POSTED BY: David Cardinal
7 days ago
