I've been trying to use your BERT example for tweet classification (on a simple Kaggle dataset):
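(* Precompute BERT embeddings for every tweet; forBERT holds {text, label} rows *)
trainembeddings =
  NetModel["BERT Trained on BookCorpus and English Wikipedia Data"][
    forBERT[[All, 1]], TargetDevice -> "CPU"] -> forBERT[[All, 2]];

(* Classifier head: per-token linear layer with 2 outputs, max-pooled over tokens, then softmax *)
classifierhead = NetChain[{DropoutLayer[], NetMapOperator[2],
    AggregationLayer[Max, 1], SoftmaxLayer[]},
  "Output" -> NetDecoder[{"Class", {"negative", "positive"}}]]

(* Train only the head on the precomputed embeddings; All returns a results object *)
bertresults = NetTrain[classifierhead, trainembeddings, All,
  TargetDevice -> "CPU", MaxTrainingRounds -> 50]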
But after training the model, I have no idea how to score new tweets, and I can't see how to do it in your example.