Sorry if my post was unclear or if it appeared that I was bluntly asking for help. Perhaps copying the code here will make my already confusing question even more confusing, but I have spent a lot of time searching online and reading through the Mathematica documentation pages. I was hoping that someone here could either tell me that the feature I am looking for does not exist in Mathematica, or give me a pointer.
Some reinforcement-learning methods such as TRPO and PPO minimize (or maximize) an entropy or Kullback–Leibler divergence term; see, for example, Equations 2b and 2c in this recent paper:
Hsu, Chloe Ching-Yun, Celestine Mendler-Dünner, and Moritz Hardt. "Revisiting Design Choices in Proximal Policy Optimization." arXiv preprint arXiv:2009.10897 (2020).
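For reference, the KL-penalty objective those equations instantiate (this is my paraphrase of the standard PPO penalty objective, so the notation may differ slightly from the paper) is

$$\max_\theta \; \hat{\mathbb{E}}_t\!\left[ r_t(\theta)\,\hat{A}_t \right] \;-\; \beta\, \hat{\mathbb{E}}_t\!\left[ \mathrm{KL}\!\left( \pi_{\theta_{\text{old}}}(\cdot \mid s_t) \,\|\, \pi_\theta(\cdot \mid s_t) \right) \right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)},$$

where $\beta$ is the KL-penalty coefficient that I want to anneal during training.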
I want to be able to dynamically update the scaling factor of a specific loss term during training. Below is how I train my network; note that the loss functions for the different parts of the network are weighted separately using Scaled, which lets me scale individual losses by different factors. For example, -1.0 for clipLoss maximizes that term, and 1.0 for valueFunctionLoss minimizes the value-function loss.

So, using Scaled, if I need a different scaling factor beta for the KL-divergence loss of the network, klForwardLoss, say 0.01, I write Scaled[0.01].

But what if I want to change beta during training, starting at 0.01 and slowly increasing it until it reaches 0.1? Is that possible?
resultNet = NetTrain[
  net,
  ppoSampler[#Net, #BatchSize] &,
  All,
  LossFunction -> {
    "clipLoss" -> Scaled[-1.0],
    "valueFunctionLoss" -> Scaled[1.0],
    "klForwardLoss" -> Scaled[beta] (* (1) <------- BETA: can it be updated here? *)
  },
  Method -> "RMSProp",
  BatchSize -> 32,
  MaxTrainingRounds -> 20000,
  LearningRate -> 0.00025,
  TrainingUpdateSchedule -> {"policy", "value"},
  WorkingPrecision -> "Real64",
  TrainingProgressFunction -> Function[
    (* (2) Or can beta be updated here? *)
    (* access to the network is provided through #Net *)
  ]
]
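The only fallback I can think of is staged training: run NetTrain several times, each time passing a larger beta to Scaled and continuing from the previously trained net. This is a sketch of what I mean (betas and trained are hypothetical names; I have not verified this against my exact net), and it is not what I want, because the optimizer state, e.g. the RMSProp moving averages, is reset at every stage boundary:

```mathematica
(* Staged-training fallback sketch: re-invoke NetTrain with a larger beta
   each stage, warm-starting from the previous stage's trained net. *)
betas = {0.01, 0.03, 0.1};  (* hypothetical beta schedule *)
trained = net;
Do[
  trained = NetTrain[
    trained,
    ppoSampler[#Net, #BatchSize] &,
    LossFunction -> {
      "clipLoss" -> Scaled[-1.0],
      "valueFunctionLoss" -> Scaled[1.0],
      "klForwardLoss" -> Scaled[b]  (* beta fixed within a stage *)
    },
    Method -> "RMSProp",
    BatchSize -> 32,
    MaxTrainingRounds -> Quotient[20000, Length[betas]],
    LearningRate -> 0.00025,
    TrainingUpdateSchedule -> {"policy", "value"},
    WorkingPrecision -> "Real64"
  ],
  {b, betas}
]
```

Is there a way to achieve the same effect within a single NetTrain call, e.g. through the LossFunction spec at (1) or the TrainingProgressFunction at (2) above?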