I get errors immediately when training, but if I change this line:
<|"Measurement"->Function[N[#BatchData["score"]][[1]]],"Key"->"Score"|>
to this:
<|"Measurement"->Function[N[#BatchData["score"]]],"Key"->"Score"|>
it works.
(From the code above, it seems clear that "score" is a single real number rather than a list, so taking part [[1]] of it fails.)
However, the trained agent doesn't seem to do very well on my 2018 MacBook Pro.
Twice the training hung (although after I stopped it manually, the agent performed fairly well), and once it finished but failed to keep the pole balanced (image attached).