I am trying to create new gym environments that use images as inputs to neural networks for reinforcement learning, but I am stuck: training never converges.
Am I making an obvious mistake? Using a deeper CNN as the policy network does not help either.
I suspect there are one or more issues with:
- CollectEpisodes, sample
- step and reward design
- loss network definition
Context
The first environment (Pixels-v1) is going to be that of a few happy pixels surviving against various simple hazards.
"ObservedState": a 40*40 Image
"ActionSpace": {Left, Right, Up, Down}
"Step": I am not sure how to define the reward so that the neural network converges. My intention for the simplest case: Reward == 1 if the step brings the active pixel closer to the center than it has ever been before, otherwise 0; also Ended == True if the active pixel hits an edge.
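To make the intended step rule concrete, here is a minimal sketch of it in Python. It is not the notebook's code, only an illustration of the rule as described above. The names `reset`, `step`, `observe`, and `distance_to_center` are hypothetical, and several details are assumptions: Euclidean distance, the center taken as the geometric center of the 40*40 grid, and "hit edges" read as landing on the outermost row or column.

```python
import math

SIZE = 40  # 40*40 observed image, as in the post
CENTER = ((SIZE - 1) / 2, (SIZE - 1) / 2)  # assumption: geometric center of the grid

# Assumption: row index grows downward, column index grows to the right.
MOVES = {"Left": (0, -1), "Right": (0, 1), "Up": (-1, 0), "Down": (1, 0)}


def distance_to_center(pos):
    """Euclidean distance from (row, col) to the grid center (an assumption)."""
    return math.hypot(pos[0] - CENTER[0], pos[1] - CENTER[1])


def reset(start):
    """Return the initial state for an episode starting at `start` (row, col)."""
    return {"pos": start, "best": distance_to_center(start)}


def step(state, action):
    """Apply one action; return (new_state, reward, ended)."""
    d_row, d_col = MOVES[action]
    pos = (state["pos"][0] + d_row, state["pos"][1] + d_col)
    dist = distance_to_center(pos)
    # Reward == 1 only when this step sets a new closest-distance record.
    reward = 1 if dist < state["best"] else 0
    # Assumption: "hit edges" means landing on the outermost row or column.
    ended = pos[0] in (0, SIZE - 1) or pos[1] in (0, SIZE - 1)
    return {"pos": pos, "best": min(dist, state["best"])}, reward, ended


def observe(state):
    """Render the state as a 40*40 image with the active pixel set to 1."""
    image = [[0] * SIZE for _ in range(SIZE)]
    image[state["pos"][0]][state["pos"][1]] = 1
    return image
```

Under these assumptions, stepping back onto an earlier position earns nothing, and once the pixel reaches one of the four cells nearest the center no further reward is possible, so the signal may be quite sparse for episodes that start close to the center.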
Questions
- Is there anyone willing to help me out with the attached notebook?
- Is there anyone with a different complete example of employing neural networks in reinforcement learning in Mathematica?
Notebook