
# Understanding how to train Neural Networks for Control Theory Q-learning and SARSA


### #1 Sevren (Members)

Posted 14 August 2013 - 10:27 AM

Hi everyone. I've been learning about reinforcement learning for a little while now, in an attempt to create an agent I could use in a game, e.g. driving a car around a track. I want to learn how to combine a neural network with an RL algorithm such as Q-learning or SARSA.

Normally in error-backpropagation neural networks you have both an input and a given target.
For the XOR pattern, for example, the inputs are (0,0), (0,1), (1,0), (1,1) and the target is either 0 or 1. Since the target is given, it is easy for me to see where to plug in the values for my error-backprop function. The problem for me now is that in my test problems (mountain car or pendulum) I am only given the state variables, so how do I go about using error backpropagation?
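To make the supervised case concrete, here is a minimal sketch (my own illustration, not code from the thread) of the usual pattern: known inputs, known targets, error = target minus output, then backprop. The network size and learning rate are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR training set: every input pattern comes with a known target.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])

# 2-4-1 network: tanh hidden layer, linear output (sizes are my assumption).
W1 = rng.normal(scale=0.5, size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(scale=0.5, size=(4, 1)); b2 = np.zeros(1)
lr = 0.1

def forward(X):
    h = np.tanh(X @ W1 + b1)          # hidden activations
    return h, h @ W2 + b2             # linear output

_, y0 = forward(X)
initial_mse = float(np.mean((T - y0) ** 2))

for epoch in range(2000):
    h, y = forward(X)
    err = T - y                        # supervised error: target minus output
    dh = (err @ W2.T) * (1 - h ** 2)   # backprop through tanh
    # Gradient descent on squared error (written as ascent on -0.5*err^2).
    W2 += lr * h.T @ err;  b2 += lr * err.sum(axis=0)
    W1 += lr * X.T @ dh;   b1 += lr * dh.sum(axis=0)

_, y1 = forward(X)
final_mse = float(np.mean((T - y1) ** 2))
```

The point is only that the target `T` is handed to you up front; in the RL setting below that target has to be constructed from the reward and the network's own next-state predictions.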

Since I first want to build an agent that solves mountain car as a test, is this the right set of steps?

S = [-0.5; 0] as the initial state (the input into my neural network)

1. Create a network (2 inputs, X hidden units, 3 outputs): 2 inputs for position and velocity, and either 1 output or 3 outputs corresponding to the actions. The hidden activation function is sigmoid (tanh) and the output is linear (purelin).

2. Run the state values for position and velocity through the network (feed-forward) and get 3 Q-values as output (3 outputs because that is how many actions I have).

3. Select an action A using epsilon-greedy: either a random action, or the one with the best Q-value for this state.

4. Execute action A in the problem and receive the new state S' and a reward.

5. Run S' through the neural network and obtain the Q(S', ·) values.

Now I guess I need to compute a target value. Given the Q-learning update Q(s,a) = Q(s,a) + alpha*[reward + gamma*max_a' Q(s',a') - Q(s,a)],
I think my target output is calculated as QTarget = reward + gamma*max_a' Q(s',a'), right?

So that means I now choose the max Q-value from step 5 and plug it into the QTarget equation.

Do I then calculate an error as in the original backprop algorithm?

So Error = QTarget - Q(S,A)?

And then resume the normal neural-network backprop weight updates?
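Taken together, the steps above can be sketched roughly like this. This is a minimal, untuned sketch of my reading of the procedure: the mountain-car dynamics follow the classic formulation, and the network size, seed, and learning constants are my own assumptions, not anything from the thread. Note that the TD error is applied only to the output unit of the chosen action; the other outputs get zero error.

```python
import numpy as np

# --- Minimal mountain-car dynamics (classic formulation, assumed here) ---
def step(pos, vel, action):            # action in {0, 1, 2}: push left / coast / push right
    vel += 0.001 * (action - 1) - 0.0025 * np.cos(3 * pos)
    vel = np.clip(vel, -0.07, 0.07)
    pos = np.clip(pos + vel, -1.2, 0.6)
    if pos <= -1.2:
        vel = 0.0                      # hit the left wall
    done = pos >= 0.5                  # reached the goal
    return pos, vel, -1.0, done        # reward is -1 per step until the goal

# --- 2 inputs -> H hidden (tanh) -> 3 linear outputs, one Q-value per action ---
rng = np.random.default_rng(1)
H = 20
W1 = rng.normal(scale=0.1, size=(2, H)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.1, size=(H, 3)); b2 = np.zeros(3)

def forward(s):
    h = np.tanh(s @ W1 + b1)
    return h, h @ W2 + b2              # hidden activations, Q(s, ·)

alpha, gamma, eps = 0.01, 0.99, 0.1    # arbitrary untuned constants

for episode in range(10):
    pos, vel, done, t = -0.5, 0.0, False, 0
    while not done and t < 1000:
        s = np.array([pos, vel])
        h, q = forward(s)                                        # step 2: feed-forward
        a = int(rng.integers(3)) if rng.random() < eps else int(np.argmax(q))  # step 3
        pos, vel, r, done = step(pos, vel, a)                    # step 4: act, observe S', r
        _, q2 = forward(np.array([pos, vel]))                    # step 5: Q(S', ·)
        target = r if done else r + gamma * np.max(q2)           # QTarget
        delta = np.zeros(3)
        delta[a] = target - q[a]       # error only on the chosen action's output
        dh = (W2 @ delta) * (1 - h ** 2)
        W2 += alpha * np.outer(h, delta); b2 += alpha * delta    # normal backprop update
        W1 += alpha * np.outer(s, dh);    b1 += alpha * dh
        t += 1
```

With a fixed target computed from the network's own next-state output, this is exactly the supervised update with QTarget standing in for the given target; whether it converges on mountain car depends heavily on the constants above.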

Thanks, Sevren

Edited by Sevren, 14 August 2013 - 10:29 AM.
