GROUPS: Data Science, Wolfram Language, Machine Learning

Posted 5 years ago

Hello, I am new to the world of machine learning and I want to learn it by working on a project. For my project I would like to train a simple NN (dense layers) without supervision. I first thought of my problem in a reinforcement learning framework, but I noticed a key difference.

As far as I am aware, the reinforcement learning framework defines an agent that interacts with an environment. At each time step the agent chooses from a finite set of possible actions, those actions interact with the environment, and a reward value is given to the agent. The point of reinforcement learning is to train the agent to perform the best actions in the environment (the actions that end up maximizing the total reward).

My question is: is there a way to do reinforcement learning with an agent that has no impact on the states of the environment?

In my problem I have:

- A data set containing all the states of the environment. Each state is a vector with 130 elements that range between -1 and 1. (What the agent does is irrelevant to the states.)
- A function that, given a particular state and action, outputs a numerical reward. (It also depends on some of the previous state-action pairs already performed.)

What I want to accomplish: to find the optimal weights of the NN (the agent) so that the cumulative reward is maximal (trained on the data set).

Is there a machine learning algorithm that can perform this task? Is this task even possible?

POSTED BY: Daniel Casasampera
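
One way to read the question above is as a policy search problem: the states come from a fixed data set, the agent's actions never change the next state, and only the reward depends on the action (and on earlier state-action pairs). A minimal sketch of that setup, using derivative-free hill climbing over the weights of a single dense layer, might look like the following. It is written in Python rather than Wolfram Language so that it runs with the standard library only. The number of actions, the synthetic states, and `reward_fn` are all assumptions made for illustration; the post does not specify them, and this is not presented as the only or the best algorithm for the task.

```python
import random

STATE_DIM = 130  # from the post: each state has 130 elements in [-1, 1]
N_ACTIONS = 3    # assumption: the post does not say how many actions exist


def make_states(n, rng):
    """Synthetic stand-in for the real data set (illustration only)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(STATE_DIM)] for _ in range(n)]


def reward_fn(state, action, history):
    """Hypothetical reward: depends on the state, the action, and the previous action."""
    target = 0 if state[0] < -0.3 else (1 if state[0] < 0.3 else 2)
    reward = 1.0 if action == target else 0.0
    if history and history[-1][1] != action:
        reward -= 0.1  # small penalty for switching actions
    return reward


def act(weights, state):
    """One dense layer plus argmax; each row holds STATE_DIM weights and a bias."""
    scores = [sum(w * x for w, x in zip(row[:-1], state)) + row[-1] for row in weights]
    return max(range(N_ACTIONS), key=lambda a: scores[a])


def cumulative_reward(weights, states):
    """Replay the data set in order; the action never changes the next state."""
    history = []
    total = 0.0
    for state in states:
        action = act(weights, state)
        total += reward_fn(state, action, history)
        history.append((state, action))
    return total


def hill_climb(states, iterations=200, sigma=0.1, seed=0):
    """Perturb the weights at random and keep a candidate if it is not worse."""
    rng = random.Random(seed)
    best = [[rng.gauss(0.0, 0.1) for _ in range(STATE_DIM + 1)] for _ in range(N_ACTIONS)]
    initial_score = cumulative_reward(best, states)
    best_score = initial_score
    for _ in range(iterations):
        candidate = [[w + rng.gauss(0.0, sigma) for w in row] for row in best]
        score = cumulative_reward(candidate, states)
        if score >= best_score:
            best, best_score = candidate, score
    return best, initial_score, best_score
```

Calling `hill_climb(make_states(100, random.Random(1)))` returns the best weights found together with the initial and final cumulative reward. Because a candidate is only accepted when it is at least as good as the current best, the final score can never be lower than the initial one. With a real data set, `make_states` and `reward_fn` would be replaced by the actual states and the actual reward function.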

3 Replies

Posted 2 years ago

This is a late reply, but if you are still interested, I think your problem can be solved with reinforcement learning. There is sample code by Landajuela (https://community.wolfram.com/groups/-/m/t/2256202 ). The code seems to work and solves the CartPole problem. It also seems straightforward to adapt it so that it trains on your data in a similar way. Is your data a time series? If you don't mind, you could post a link to your data, or to a simulated version of it, so we can see whether the code above can be adapted to your problem. (I cannot guarantee it, though.)

POSTED BY: Young Kim
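
To reuse a CartPole-style training loop as suggested above, the data set would need to look like an environment. A rough sketch of such a wrapper is given below in Python. It is not taken from Landajuela's code; `ReplayEnv`, `toy_reward`, and `run_episode` are hypothetical names, and the sketch assumes the data is an ordered sequence (a time series), which the thread does not confirm. The key property is that `step` returns the next state from the data set regardless of the action taken.

```python
class ReplayEnv:
    """Hypothetical gym-like wrapper around a fixed, ordered list of states."""

    def __init__(self, states, reward_fn):
        self.states = states
        self.reward_fn = reward_fn  # called as reward_fn(state, action, history)
        self.t = 0
        self.history = []

    def reset(self):
        """Start a new pass over the data set and return the first state."""
        self.t = 0
        self.history = []
        return self.states[0]

    def step(self, action):
        """Score the action on the current state; the next state ignores the action."""
        state = self.states[self.t]
        reward = self.reward_fn(state, action, self.history)
        self.history.append((state, action))
        self.t += 1
        done = self.t >= len(self.states)
        next_state = None if done else self.states[self.t]
        return next_state, reward, done


def toy_reward(state, action, history):
    """Illustrative reward only: action 1 is right when the first element is positive."""
    return 1.0 if (state[0] > 0) == (action == 1) else 0.0


def run_episode(env, policy):
    """Run one pass over the data set; return the total reward and the step count."""
    state = env.reset()
    total, steps, done = 0.0, 0, False
    while not done:
        state, reward, done = env.step(policy(state))
        total += reward
        steps += 1
    return total, steps
```

For example, `run_episode(ReplayEnv(states, toy_reward), policy)` replays every state once and sums the rewards. Any training loop that expects `reset` and `step` could then be pointed at this wrapper, with `policy` replaced by the network being trained.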

Posted 5 years ago

Thank you for the welcome. My question is vague because I expect an equally vague (general) answer. I'll edit the question to add more context about the problem, but I'm pretty sure my question, although maybe vague, is answerable.

POSTED BY: Daniel Casasampera

Posted 5 years ago

Welcome to Wolfram Community! Please make sure you know the rules: https://wolfr.am/READ-1ST Your post is too vague. Please describe your subject extensively providing the details, examples, code, and other relevant ideas, so it is clear what exactly you are looking for.

POSTED BY: EDITORIAL BOARD
