Hello, I am new to the world of machine learning, and I would like to learn it by working on a project.

For my project I would like to train a simple NN (dense layers) without supervision. I tried to frame my problem as a reinforcement learning problem, but I noticed a key difference.

As far as I am aware, the reinforcement learning framework defines an agent that interacts with an environment. At each time step the agent chooses one action from a finite set of possible actions, that action affects the environment, and a reward value is given to the agent. The point of reinforcement learning is to train the agent to perform the best actions in the environment (the actions that end up maximizing the total reward).
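To make my understanding of that loop concrete, here is a minimal sketch in Python. Everything in it (the `ToyEnvironment` class, the target position 3, the two policies) is a made-up toy for illustration, not part of my actual problem. The point is only that `step` lets the action change the next state.

```python
import random


class ToyEnvironment:
    """Hypothetical 1-D world: the state is a position, and actions move it."""

    def __init__(self):
        self.position = 0

    def step(self, action):
        # The action changes the state of the environment.
        self.position += action
        # Made-up reward: the closer to position 3, the better.
        reward = -abs(self.position - 3)
        return self.position, reward


def run_episode(policy, n_steps=10):
    """Let the agent (policy) interact with the environment and sum the rewards."""
    env = ToyEnvironment()
    state, total_reward = env.position, 0.0
    for _ in range(n_steps):
        action = policy(state)            # agent picks from the finite set {-1, 0, 1}
        state, reward = env.step(action)  # environment responds with new state and reward
        total_reward += reward
    return total_reward


def random_policy(state):
    """Untrained agent: ignores the state."""
    return random.choice([-1, 0, 1])


def greedy_policy(state):
    """Hand-written agent that walks towards position 3 and stays there."""
    if state < 3:
        return 1
    if state > 3:
        return -1
    return 0


if __name__ == "__main__":
    print("random:", run_episode(random_policy))
    print("greedy:", run_episode(greedy_policy))
```

Training would mean replacing the hand-written policy with one whose parameters are adjusted to increase the total reward.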

My question is:

Is there a way to do reinforcement learning with an agent that has no impact on the states of the environment?

In my problem I have:

  • A data set containing all the states of the environment. Each state is a vector of 130 elements with values between -1 and 1. (The states are fixed in advance; what the agent does has no effect on them.)

  • A function that, given a particular state and action, outputs a numerical reward. (The reward also depends on some of the previous state-action pairs.)

What I want to accomplish:

  • To find the optimal weights of the NN (the agent) so that the cumulative reward over the data set is maximal.
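Here is a rough sketch of the setup in Python with NumPy, in case it helps to pin down what I mean. Most of it is assumed: the random `states` array stands in for my data set, the `reward` function is a hypothetical placeholder for my real one, and the three actions and the hidden size are arbitrary. The `hill_climb` function is only a naive random search I added to show what "finding weights that maximize the cumulative reward" could look like; I am not claiming it is the right algorithm.

```python
import numpy as np

N_STATES, STATE_DIM, N_ACTIONS, HIDDEN = 200, 130, 3, 16

# Placeholder data set: in the real problem the states are fixed and given.
states = np.random.default_rng(0).uniform(-1.0, 1.0, size=(N_STATES, STATE_DIM))


def init_weights(rng):
    """Random weights for a dense net: STATE_DIM -> HIDDEN -> N_ACTIONS."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(STATE_DIM, HIDDEN)),
        "b1": np.zeros(HIDDEN),
        "W2": rng.normal(0.0, 0.1, size=(HIDDEN, N_ACTIONS)),
        "b2": np.zeros(N_ACTIONS),
    }


def act(weights, state):
    """Agent: pick the action with the largest network output."""
    hidden = np.tanh(state @ weights["W1"] + weights["b1"])
    return int(np.argmax(hidden @ weights["W2"] + weights["b2"]))


def reward(state, action, prev_action):
    """Hypothetical reward: depends on the state, the action, and the previous action."""
    signed_action = action - 1  # map actions {0, 1, 2} to {-1, 0, +1}
    switch_cost = 0.05 if action != prev_action else 0.0
    return signed_action * state[0] - switch_cost


def cumulative_reward(weights, states):
    """Run the agent over the fixed state sequence; actions never change the states."""
    total, prev_action = 0.0, 1
    for state in states:
        action = act(weights, state)
        total += reward(state, action, prev_action)
        prev_action = action
    return total


def hill_climb(states, rng, n_iters=200, sigma=0.05):
    """Naive random search: keep a perturbed copy of the weights only if it scores higher."""
    best = init_weights(rng)
    initial_score = best_score = cumulative_reward(best, states)
    for _ in range(n_iters):
        candidate = {k: v + sigma * rng.normal(size=v.shape) for k, v in best.items()}
        score = cumulative_reward(candidate, states)
        if score > best_score:
            best, best_score = candidate, score
    return best, initial_score, best_score


if __name__ == "__main__":
    _, before, after = hill_climb(states, np.random.default_rng(1))
    print("cumulative reward before:", before, "after:", after)
```

Because the reward depends on the previous action, the cumulative reward has to be computed by walking through the states in order, even though the states themselves do not react to the agent.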

Is there a machine learning algorithm that can perform this task? Is this task even possible?

Welcome to Wolfram Community! Please make sure you know the rules:
Your post is too vague. Please describe your subject extensively providing the details, examples, code, and other relevant ideas, so it is clear what exactly you are looking for.

Thank you for the welcome. My question is vague because I expect an equally vague (general) answer. I'll edit the question to add more context about the problem, but I'm pretty sure my question, although maybe vague, is answerable.

