In an attempt at creating new gym environments, I want to use Images as inputs to neural networks in reinforcement learning, but I am stuck with a problem: training never converges. Am I making an obvious mistake? Using a deeper CNN as the policy network does not help either.
I suspect there are one or more issues with:
- CollectEpisodes
- sample
- step and reward design
- loss
- network definition
Context

The first environment (Pixels-v1) is going to be that of a few happy pixels surviving against various simple hazards.

- "ObservedState": a 40*40 Image
- "ActionSpace": {Left, Right, Up, Down}
"Step"
: I am not sure how to define the reward here for the neural network to converge, intention for the simplest case: Reward == 1
if stepped closer than ever before to center, otherwise 0; also Ended == True
if active Pixel hit edges.
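To make the intended step/reward logic concrete, here is a minimal sketch in Wolfram Language. All names (pixelStep, the "Position" and "BestDistance" state keys, the movement convention) are hypothetical illustrations, not taken from the notebook:

```
(* Minimal sketch of the step described above: reward 1 only when the
   pixel is strictly closer to the center than ever before; episode
   ends when it hits an edge of the 40*40 grid. *)
pixelStep[state_Association, action_String] := Module[
  {pos, center = {20, 20}, dist, reward, ended},
  pos = state["Position"] + Replace[action,
    {"Left" -> {-1, 0}, "Right" -> {1, 0},
     "Up" -> {0, 1}, "Down" -> {0, -1}}];
  dist = Norm[N[pos - center]];
  reward = If[dist < state["BestDistance"], 1, 0];
  ended = AnyTrue[pos, # < 1 || # > 40 &];
  <|"Position" -> pos,
    "BestDistance" -> Min[dist, state["BestDistance"]],
    "Reward" -> reward, "Ended" -> ended|>
]
```

One caveat with this sparse design: once the pixel stops improving on its best-ever distance, the reward signal is all zeros, which can stall learning; a dense shaping term (e.g. the per-step decrease in distance to center) may converge more easily.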
Questions
- Is there anyone willing to help me out with the attached notebook?
- Does anyone have a different complete example of using neural networks for reinforcement learning in Mathematica?
Notebook