Function approximation of continuous multi-dimensional state space for Q-Learning

3 comments, last by StubbornDuck 10 years, 5 months ago

Hello,

I am trying to simulate a robot that I want to build (in Unity, hah). To make it walk, I'm trying to use Q-Learning. It's actually working to some extent, but its walking performance could be a lot better. It suffers from an extremely crude approximation of the true utility function: a linear weighted sum of all the sensor input variables. The weights (parameters) are updated through Q-Learning, which is preferable to e.g. genetic algorithms in my case, since I intend to implement the same algorithm on the robot's microcontroller and need something that adapts quickly in real time.
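To make the setup concrete, here is a minimal sketch (not the poster's actual code) of Q-learning with a linear function approximator: one weight vector per action, updated toward the TD target after each step. The names (ALPHA, GAMMA, the feature lists) are illustrative stand-ins for the sensor inputs and tuning constants described above.

```python
ALPHA = 0.01   # learning rate (illustrative value)
GAMMA = 0.95   # discount factor (illustrative value)

def q_value(weights, features):
    # Linear approximation: Q(s, a) = sum_i w_i * f_i(s)
    return sum(w * f for w, f in zip(weights, features))

def q_update(weights_by_action, action, features, reward, next_features):
    # Q-learning TD target: reward plus the discounted greedy value
    # of the next state, taken over all actions.
    best_next = max(q_value(w, next_features)
                    for w in weights_by_action.values())
    error = (reward + GAMMA * best_next
             - q_value(weights_by_action[action], features))
    # Gradient step on the linear weights for the taken action.
    weights_by_action[action] = [
        w + ALPHA * error * f
        for w, f in zip(weights_by_action[action], features)
    ]
```

Each update is O(number of features), which is what makes this attractive on a microcontroller compared to population-based methods.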

Now, my AI book mentions two ways I might proceed with this:

1. I might try to find and compute more features of the perceived state to supply as input variables. This would let me continue with the crude approach, but it's problematic because it requires me to identify the features myself. Furthermore, I already have way more inputs than I would like - for instance, just to get the robot walking at all I currently have around 15 sensors (accelerometers, pulse encoders, etc.). Although I can probably trim this down later, it makes for a more expensive and inefficient hardware implementation. Much better to solve in software whatever can be solved in software.

2. It should be possible to discretize the continuous variables into several parameters to obtain a much better utility approximation. The cart-pole balancing problem is mentioned, and indeed I checked out the BOXES experiment (Michie and Chambers). Understanding that paper will take some time, though, since its terminology differs from my book's and it's also fairly old. Does anyone have a hint about newer implementations or papers that might perform better than BOXES, or that are easier to understand?

Can you afford a better approximation of your utility function by learning a larger set of parameters? For example, you can put nonlinearities into your weighted sum: instead of summing feature weight * feature value, you could sum feature weight * (feature value - reference value)^feature exponent (three times as many parameters to learn). Other nonlinearity types might be more suitable for certain features.
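The suggestion above can be sketched as follows. This is a hypothetical illustration, not code from the thread; it assumes integer exponents so that negative bases are well-defined (the non-integer case is the sign question discussed in the replies).

```python
def nonlinear_sum(values, weights, references, exponents):
    # Each feature now contributes weight * (value - reference) ** exponent,
    # so there are three learnable parameters per feature instead of one.
    total = 0.0
    for x, w, r, p in zip(values, weights, references, exponents):
        total += w * (x - r) ** p
    return total
```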

Omae Wa Mou Shindeiru

Your input is much appreciated. I can definitely add more parameters to the sum this way, and will try it out. One thing, though: how would I handle the case where (feature value - reference value) is below zero? Or should I just clamp the lower bound to 0? I will have to double-check the parameter update step to make sure it doesn't go crazy from this discontinuity. Also, a one-sided bound seems somewhat wrong to me - it means that for a bipolar value the behavior will be asymmetric (although I could deal with that by splitting magnitude and direction into two different variables).

What the authors of BOXES did to really enable generalized learning over several dimensions was to use basis functions of several variables. For instance, they might have 3^4 * 5^2 states/parameters from four variables with three reference points of interest each plus two variables with five reference points each. I am visualizing this as some sort of multi-dimensional grid. From what I understand, this greatly improved the quality of learning, at the cost of an exponentially increasing number of parameters. Like I said, though, my aim is to have as few sensors as possible, and I prefer heavy computation over hardware cost. So I am still looking into this possibility.
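The "multi-dimensional grid" reading of BOXES can be sketched like this: bin each continuous variable against its cut points, then combine the per-variable bin indices into one flat cell index, so each cell of the grid holds its own parameter. The thresholds and shapes here are made up for illustration (note that in this sketch k cut points give k + 1 bins per variable, so the grid size is the product of the (k_i + 1) terms).

```python
import bisect

def box_index(values, thresholds_per_var):
    # Map a tuple of continuous values to a single flat cell index,
    # row-major over the per-variable bins.
    index = 0
    for x, cuts in zip(values, thresholds_per_var):
        bins = len(cuts) + 1                      # bins for this variable
        index = index * bins + bisect.bisect_right(cuts, x)
    return index
```

The exponential blow-up the post mentions shows up directly: the table indexed by `box_index` has one entry per cell, i.e. the product of all the bin counts.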

 


To use noninteger exponents, you need to take the absolute value of the input; both leaving the end result positive and matching the sign of the input are valid choices, and you can pick whichever makes more sense for each feature (or throw both variants into the weighted sum if you aren't sure).
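The two options described above can be written out as follows; the function names are made up for this sketch.

```python
def abs_power(x, p):
    # Leave the end result positive: loses the sign information.
    return abs(x) ** p

def signed_power(x, p):
    # Match the sign of the input: behaves like an odd function.
    s = 1.0 if x >= 0 else -1.0
    return s * abs(x) ** p
```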


Walking appears to be largely a classification problem (given an intent, e.g. turning right, learn the set of states in which the robot should move each actuator in a certain way); what is the role of function approximation?

Omae Wa Mou Shindeiru

Bear with me, I'm not a mathematician, hehe. However, from what I could gather from the book, the value of function approximation is being able to generalize over the whole multi-dimensional space rather than over each dimension separately, leading to more accurate learning, to put it simply.

Taking the absolute value is something I considered, though again it means we lose the sign information, and the derivative of the abs function is discontinuous at x = 0 (I guess it doesn't matter in practice, though). But yeah, we can put the sign into a different variable, I guess.
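The "put the sign into a different variable" idea amounts to feeding the learner two features per signed input, so magnitude and direction get independent weights. A trivial illustrative sketch:

```python
def split_feature(x):
    # Decompose one signed input into (magnitude, sign) features.
    magnitude = abs(x)
    sign = 1.0 if x >= 0 else -1.0
    return magnitude, sign
```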

