Alternative algorithm for reinforcement learning

Posted 3 years ago

Hello, I am new to the world of machine learning, and I want to learn it by working on a project.

For my project I would like to train a simple NN (dense layers) in an unsupervised way. I first thought of my problem as a reinforcement learning problem, but I noticed a key difference.

As far as I am aware, the reinforcement learning framework defines an agent that interacts with an environment. At each time step the agent chooses one action from a finite set of possible actions, the chosen action affects the environment, and a reward value is given to the agent. The point of reinforcement learning is to train the agent to perform the best actions in the environment (the actions that end up maximizing the total reward).
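For reference, the objective described above is usually written as follows. The symbols are standard textbook notation and do not come from any particular library: $\pi_\theta$ is the agent (the NN with weights $\theta$), $s_t$ and $a_t$ are the state and action at step $t$, $r_t$ is the reward, $P$ is the environment's transition rule, and $T$ is the number of steps (an undiscounted, finite-horizon version is assumed here).

```latex
J(\theta) = \mathbb{E}\left[\sum_{t=0}^{T-1} r_t\right],
\qquad a_t \sim \pi_\theta(\cdot \mid s_t),
\qquad s_{t+1} \sim P(\cdot \mid s_t, a_t),
\qquad \theta^{*} = \arg\max_{\theta} J(\theta)
```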

My question is:

Is there a way to do reinforcement learning with an agent that has no impact on the states of the environment?

In my problem I have:

  • A data set containing all the states of the environment. Each state is a vector of 130 elements, each between -1 and 1. (What the agent does has no effect on the states.)

  • A function that, given a particular state and action, outputs a numerical reward. (The reward also depends on some of the previous state-action pairs already performed.)

What I want to accomplish:

  • To find the optimal weights of the NN (the agent) so that the cumulative reward is maximized (trained on the dataset).

Is there a machine learning algorithm that can perform this task? Is this task even possible?
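To make the setup above concrete, here is a minimal sketch in plain Python of one possible approach, a REINFORCE-style policy gradient, in which the state sequence is simply replayed from the data set and the sampled actions only influence the reward. Everything in it is an assumption for illustration: `toy_reward` is a made-up reward function, the states have 4 elements instead of 130, and a single linear layer with softmax stands in for the dense network.

```python
import math
import random


def policy_probs(weights, state):
    # One linear layer followed by a softmax; a stand-in for the dense NN.
    logits = [sum(w * x for w, x in zip(row, state)) for row in weights]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(state, action, history):
    # Hypothetical reward: action 1 is right when the first element is
    # positive, action 0 otherwise. Repeating the previous action costs 0.1,
    # so the reward also depends on earlier state-action pairs.
    reward = 1.0 if action == (1 if state[0] > 0 else 0) else 0.0
    if history and history[-1][1] == action:
        reward -= 0.1
    return reward


def train(states, reward_fn, n_actions, episodes=400, lr=0.05, seed=0):
    rng = random.Random(seed)
    dim = len(states[0])
    weights = [[0.0] * dim for _ in range(n_actions)]
    baseline = None  # running average of episode returns (variance reduction)
    for _ in range(episodes):
        history, total = [], 0.0
        grad = [[0.0] * dim for _ in range(n_actions)]
        for state in states:  # fixed sequence: actions never change the states
            probs = policy_probs(weights, state)
            action = rng.choices(range(n_actions), weights=probs)[0]
            total += reward_fn(state, action, history)
            history.append((state, action))
            # Gradient of log pi(action | state) for a linear softmax policy.
            for k in range(n_actions):
                coeff = (1.0 if k == action else 0.0) - probs[k]
                for j in range(dim):
                    grad[k][j] += coeff * state[j]
        if baseline is None:
            baseline = total
        advantage = total - baseline
        baseline = 0.9 * baseline + 0.1 * total
        for k in range(n_actions):
            for j in range(dim):
                weights[k][j] += lr * advantage * grad[k][j]
    return weights


def greedy_return(weights, states, reward_fn):
    # Cumulative reward when always taking the most probable action.
    history, total = [], 0.0
    for state in states:
        probs = policy_probs(weights, state)
        action = max(range(len(probs)), key=probs.__getitem__)
        total += reward_fn(state, action, history)
        history.append((state, action))
    return total


data_rng = random.Random(1)
states = [[data_rng.uniform(-1, 1) for _ in range(4)] for _ in range(30)]
weights = train(states, toy_reward, n_actions=2)
print(greedy_return(weights, states, toy_reward))
```

Because the reward function is only evaluated, never differentiated, this kind of estimator does not need the reward to be smooth. With a real 130-element data set and a deeper network, the same loop would normally be written with an automatic-differentiation framework rather than by hand.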

3 Replies
Posted 11 months ago

It is pretty late, but if you are still interested, I think your problem can be handled with reinforcement learning. There is sample code by Landajuela (https://community.wolfram.com/groups/-/m/t/2256202 ).
This code seems to work and solves the CartPole problem.
It also seems straightforward to adapt this code and train on your data in a similar way.

Is your data a time series?
If you don't mind, you could share a link to your data, or to simulated data, and we can see whether the code above can be adapted to your problem. (No guarantees, though.)

POSTED BY: Young Kim

Welcome to Wolfram Community! Please make sure you know the rules: https://wolfr.am/READ-1ST
Your post is too vague. Please describe your subject extensively providing the details, examples, code, and other relevant ideas, so it is clear what exactly you are looking for.

POSTED BY: Moderation Team

Thank you for the welcome. My question is vague because I expect an equally vague (general) answer. I'll edit the question to add more context about the problem, but I'm pretty sure my question, although maybe vague, is answerable.
