# RL for Dubins Car

## Recommended Posts

##### Share on other sites
Timkin    864
Doh, the reference book I want is at home on the dining table...

Off the top of my head, you don't have to discretise the state space, you just need to solve a Hamilton-Jacobi-Bellman equation for the value function you choose. Although, if I remember correctly, the Dubins Car involves non-holonomy, so that might make it tougher. I've got a book on viscosity solutions of HJB equations at home which covers this sort of problem. I'll post something on it tonight.

If you want to specifically use Q-learning to learn a stationary policy, then yes, it would be easier if you discretised the state space, since you then have merely a lookup table for your optimal control. However, I'm not convinced that's the best way to solve this problem... unless there's some other reason constraining you to discrete spaces.

Two other quick questions: (1) from your post, it sounds like you're trying to do waypoint tracking. Have you instead considered having the vehicle try to follow a specified parametric trajectory (such as a spline curve)? This is a slightly easier problem since you're looking for a policy to minimise the distance-to-track error. Of course, any waypoint course is also a piecewise linear spline! ;)

(2) You're from Melbourne... and doing a PhD... where?

Cheers,

Timkin

##### Share on other sites
My PhD is through Swinburne, but I'm not on site much.

The reason for discretising the state space is that the research problem doesn't really have anything to do with the Dubins car - we're working on much more complex vehicles. Mainly AUVs, but with facility for UAVs (including Aerosondes). Dubins was going to be the first step, then move onto more relevant models like a Dubins car with steering acceleration before starting work on our vehicles (which are currently gain-scheduled linear models). At this point I'm assuming that the problem will be stationary.

My thesis is concentrated on obstacle avoidance for AUVs. I am particularly interested in trying to get my head around the work of Frazzoli on Maneuver Automata http://rigoletto.seas.ucla.edu/papers/Frazzoli.PhD01.pdf . One point he made in the thesis is that if you can do optimal control in an obstacle-free environment, you can use those results as a lower-bound metric to improve the performance of probabilistic path planners like RRTs when you do have obstacles. Because of the application I've got in mind, tracking rate commands is much more important than tracking waypoints, but this aspect of my problem is more about having some type of estimate of the optimal cost-to-go.

I've seen lots of references to the HJB equations, but I'm really unsure of where I should even start looking if I want to set up the equations for the Dubins car, let alone anything more complex.

Any references or suggested reading would be really appreciated.

##### Share on other sites
Timkin    864
Sorry for the delay in a followup to my earlier post... I'm flat out getting a presentation ready for a one day conference tomorrow... but I'll be back to my semi-normal state on Tuesday and hopefully I'll have a few moments to post some more thoughts.

Cheers,

Timkin

##### Share on other sites
Timkin    864
Okay, I've got a few moments... phew...

Where are you at with this problem? Same place as last week? A few thoughts...

To partition rho, first map it into a bounded space, using say r=tanh(rho), then discretise this space using a uniform partition set. While this gives you a uniform partitioning of your discrete space (alpha,beta,r), it provides a non-uniform discretisation of the original state space. One problem you may face is that the learning will be insensitive to control variations far from the target. The other choice is to make time a state variable and give it a non-uniform discretisation as a function of rho. I probably wouldn't go down that path though.
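A minimal sketch of that tanh trick (the bin count of 20 is an arbitrary choice for illustration):

```python
import numpy as np

def rho_to_bin(rho, n_bins=20):
    """Map the unbounded range variable rho >= 0 into a bounded
    variable r = tanh(rho) in [0, 1), then index a uniform partition
    of r. Uniform bins in r give a non-uniform discretisation of rho:
    fine resolution near the target, coarse far away."""
    r = np.tanh(rho)                       # squash [0, inf) into [0, 1)
    return min(int(r * n_bins), n_bins - 1)

# Close-in rho values land in distinct bins; distant ones collapse
# together, which is exactly the far-from-target insensitivity above.
print(rho_to_bin(0.1), rho_to_bin(0.5), rho_to_bin(3.0), rho_to_bin(10.0))
```

Note that everything beyond rho of about 3 ends up in the last bin, which demonstrates the insensitivity problem concretely.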

Have you managed to generate a discrete map of the form sp = G(sq,am) (for states s and controls a) from your equations of motion?
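In case it helps, here's one hedged sketch of such a map for the standard Dubins car (unit forward speed, bounded turn rate); the grid cell size, timestep, and heading bin count are all assumptions for illustration, not values from your problem:

```python
import math

# Dubins car: constant forward speed V, bounded turn rate u.
V, DT = 1.0, 0.1
ACTIONS = (-1.0, 0.0, 1.0)    # turn rate u in rad/s: left, straight, right

def step(state, u):
    """One Euler step of the Dubins equations of motion:
    x' = V cos(theta), y' = V sin(theta), theta' = u."""
    x, y, th = state
    return (x + V * math.cos(th) * DT,
            y + V * math.sin(th) * DT,
            (th + u * DT) % (2 * math.pi))

def G(sq, am, cell=0.05, n_th=36):
    """Discrete transition map sp = G(sq, am): integrate the continuous
    dynamics for one timestep, then snap (x, y, theta) onto a uniform
    grid to get the successor cell index."""
    x, y, th = step(sq, ACTIONS[am])
    return (round(x / cell), round(y / cell),
            int(th / (2 * math.pi) * n_th) % n_th)
```

The snap-to-grid step is where discretisation error creeps in; a finer timestep or a higher-order integrator would reduce it but not remove it.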

As for the HJB approach... you basically need to write down your equations of motion for the system in Hamiltonian coordinates (generally not a trivial task). This is much the same as casting a problem in Lagrangian coordinates (and indeed, one way to get the Hamiltonian dynamics is to first write down the Lagrangian and then transform to the Hamiltonian using the known transformations between these two systems). Why is this a good thing to do? Because given the Hamiltonian, the optimal control problem in continuous space is trivially expressed as a dynamic programming problem. Q-learning is basically the discrete space version of the discounted HJB equation. Why do I think this is a good path to follow? Because, when you end up dealing with UAVs and AUVs, you'll be dealing with vehicles in a non-zero background flow. This will royally screw with any discretisation scheme and you'll need to solve your problem in its natural continuous space taking into account the asymmetry of motion induced by this flow.
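To make the "Q-learning is the discrete discounted HJB equation" point concrete, here's a toy tabular sketch. A 1-D chain stands in for the discretised (alpha,beta,r) space; the episode counts and learning rate are arbitrary illustrative choices:

```python
import random
from collections import defaultdict

# Tabular Q-learning: the discrete, discounted counterpart of the
# HJB / dynamic-programming recursion. A 1-D chain with a goal at
# one end stands in for the discretised state space of the car.
N, GOAL = 10, 9
ALPHA, GAMMA = 0.5, 0.95
Q = defaultdict(float)                    # lookup table Q[(state, action)]

def step(s, a):                           # actions: 0 = left, 1 = right
    sp = max(0, min(N - 1, s + (1 if a == 1 else -1)))
    return sp, (1.0 if sp == GOAL else 0.0)

random.seed(0)
for _ in range(500):                      # episodes
    s = 0
    for _ in range(200):                  # cap episode length
        a = random.randrange(2)           # off-policy: uniform exploration
        sp, r = step(s, a)
        # Bellman backup:
        # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(sp, 0)], Q[(sp, 1)])
                              - Q[(s, a)])
        s = sp
        if s == GOAL:
            break

# The learned stationary policy is just a table lookup per state.
policy = [max((0, 1), key=lambda a: Q[(s, a)]) for s in range(N)]
```

The backup line is the whole point: it's the discounted Bellman (discrete HJB) recursion applied one sampled transition at a time, and the resulting policy is exactly the lookup table mentioned earlier in the thread.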

I'd enjoy the opportunity to chat to you about your project as well. Send me an email to the address in my profile and I'll send you my work contact details. In particular I'd like to find out what Swinburne is doing in the area of UAVs and AUVs and see how that matches up with our work at Deakin.

Cheers,

Timkin