cossie

Using a GA to control weights in a Neural Network - rating fitness


Hey,

I'm working on a pathfinding project in which I'm trying to evolve a bot with a multi-layered feed-forward neural network as a brain that will navigate towards a goal in a 2D grid world. I'm using a genetic algorithm to optimise the weights of each neural network. I have my network and GA coded (although I plan to implement and test various forms of selection, crossover and mutation), but the simulation is running quite poorly at the moment. I was wondering if anyone could offer any advice/opinions on the best way to rate fitness?

At present, when a bot makes a move that takes it further away (in terms of relative x and y distances) I decrement its fitness, and for moves that take it closer I increment its fitness. So its fitness is constantly being assessed; my thinking behind this was to hopefully generate a bot that will take the most direct path to the target, rather than one that "will get there, but it might take a while". When a trial ends (usually 1000 loops) I copy the bot's fitness rating into a corresponding genome, run my GA and then replace the bot's old network weights with the new weights.

Should I rate the fitness at the end of each trial run (about 1000 loops) based on how close the bot ended up to the target instead?

Thanks for any tips,
Cossie

[Edited by - cossie on April 22, 2007 12:15:57 PM]
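For concreteness, the per-move scheme described above could be sketched like this (function and variable names are mine, and I'm assuming Manhattan distance on the grid):

```python
def manhattan(a, b):
    """Grid (Manhattan) distance between two (x, y) cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def update_fitness(fitness, old_pos, new_pos, goal):
    """Per-move scheme from the post: increment fitness for a move
    that brings the bot closer to the goal, decrement for a move
    that takes it further away, leave it unchanged otherwise."""
    before = manhattan(old_pos, goal)
    after = manhattan(new_pos, goal)
    if after < before:
        return fitness + 1
    if after > before:
        return fitness - 1
    return fitness
```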



I would use a rating something like

int(FinalDistanceToTarget) - (1.0 / Penalties)

where Penalties might be the amount of energy consumed on the trip (more energy used to step uphill), or the accumulation of observed bad behavior (perhaps a penalty for banging into things, and so forth).
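A minimal sketch of that rating, assuming lower ratings are better; the names are illustrative, and I've capped the zero-penalty bonus to avoid dividing by zero:

```python
def rating(final_distance_to_target, penalties):
    """Lower rating = better bot. Accumulated penalties shrink the
    1/penalties bonus, so heavily penalised bots rate worse; zero
    penalties gets the maximum bonus (capped at 1.0 here to avoid
    dividing by zero)."""
    bonus = 1.0 / penalties if penalties > 0 else 1.0
    return int(final_distance_to_target) - bonus
```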

Just decrement the fitness after every move, and stop the run if the bot reaches the goal. Bots that reach the goal faster will have higher fitness, and that's all that matters. If a bot fails to reach the goal during the allotted time, you could penalize it further based on how close it managed to get. This fitness function actually only depends on information that would be available at the end of the run, so you can perform the run and then calculate the fitness afterwards.
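A sketch of this end-of-run scoring, with a plain step function standing in for the NN-controlled bot (the names and the scale of the shortfall penalty are my assumptions):

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def run_trial(step_fn, start, goal, max_steps=1000):
    """Run a bot for at most max_steps, stopping early at the goal.
    step_fn maps the current (x, y) position to the next one and
    stands in for the NN-controlled bot. Fitness uses end-of-run
    information only: a faster arrival scores higher, and a bot
    that never arrives is penalised by its final distance."""
    pos = start
    for step in range(1, max_steps + 1):
        pos = step_fn(pos)
        if pos == goal:
            return max_steps - step
    return -manhattan(pos, goal)
```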

Another possibility is to reward the bot based on how close it gets to the goal. The fitness function should reflect the desired behavior of the bot. In a maze-like environment, where the bot can get close to the goal yet still be far away in terms of navigating the maze, it could learn to just reach a cell near the goal and stay around there to be rewarded, if the fitness function is not designed carefully.

Quote:
Original post by cossie
I'm working on a pathfinding project in which I'm trying to evolve a bot with multi-layered feed forward neural network as a brain, that will navigate towards a goal in a 2D grid world.

Why? Unless you're doing this purely for interest's sake, or there are particularly difficult features of the domain and bot that require neuro-control, there are far better ways to approach such a simple control problem... like Optimal Control (specifically policy/value iteration, Q-learning, TD-learning). Assuming you have a reason for pursuing neuro-control though...

Quote:
Original post by cossie
At present when a bot makes a move that takes it further away (in terms of relative x and y distances) I decrement its fitness, and for moves that take it closer, I increment its fitness.


You're assuming that the optimal move is always toward the goal. This will only be the case if there are no obstructions between the goal and the bot. If you are working in an open space, then don't reward moving toward the goal. Rather, reward specific action selection, such as decreasing the relative angle to the goal (angle between current heading and direction to goal). You should also reward speed changes separately, so if the bot is far from the goal, reward speed increases. If you want the bot to arrive at the goal, reward speed decreases. However, do NOT encode the goal location specifically. Work in the error space (that is, distance to target space). You don't want it learning different behaviours for separate goal locations.
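A hedged sketch of this kind of reward shaping in the error space. I'm assuming a continuous state of (x, y, heading, speed), and the 5.0 near/far threshold is arbitrary:

```python
import math

def shaping_reward(prev_state, state, goal):
    """Reward action effects in the error space rather than raw
    proximity: +1 for reducing the relative angle between heading and
    the direction to the goal, +1 for speeding up while far from the
    goal, +1 for slowing down when near it. A state is
    (x, y, heading, speed); the 5.0 near/far threshold is arbitrary."""
    def rel_angle(s):
        to_goal = math.atan2(goal[1] - s[1], goal[0] - s[0])
        return abs((to_goal - s[2] + math.pi) % (2 * math.pi) - math.pi)

    reward = 0.0
    if rel_angle(state) < rel_angle(prev_state):
        reward += 1.0                                # turned toward the goal
    dist = math.hypot(goal[0] - state[0], goal[1] - state[1])
    if dist > 5.0 and state[3] > prev_state[3]:      # far away: reward speeding up
        reward += 1.0
    if dist <= 5.0 and state[3] < prev_state[3]:     # close: reward slowing down
        reward += 1.0
    return reward
```

Note that the reward depends only on relative quantities (angle and distance to the goal), never on the goal's absolute coordinates, per the advice above.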

If you're not working in an open (unobstructed) space, then the above is useless and indeed you're not going to solve the problem with this approach, because the learned controller will depend on starting position and the goal position and hence will not be useful given a different scenario.

Quote:

Should I rate the fitness at the end of each trial run (about 1000 loops) based on how close the bot ended up to the target instead?


This is the age-old problem of choosing between batch (offline) learning and sequential (online) learning. Batch learning is generally better in terms of the quality of the final product, but suffers from several problems relating to computational complexity. Online learning suffers from a specific issue related to the training data, in that you only ever see data along the controlled trajectory. There are far more issues than this and if you truly want to understand the problem, I suggest you look up topics in machine learning. A good start is the book Reinforcement Learning.

Cheers,

Timkin

[Edited by - Timkin on April 22, 2007 11:44:43 PM]

Thanks for all the replies, much appreciated.


Quote:

Why? Unless you're doing this purely for interest's sake, or there are particularly difficult features of the domain and bot that require neuro-control, there are far better ways to approach such a simple control problem... like Optimal Control (specifically policy/value iteration, Q-learning, TD-learning). Assuming you have a reason for pursuing neuro-control though...


I knew someone was going to bring this up. It's purely for my own interest - I've spent enough time tinkering with A* etc. and I wanted to find out what is involved in trying to code a neural network to perform path finding.


Quote:
A good start is the book Reinforcement Learning.


Thanks a million Timkin! I've only had a chance to look at the table of contents but the book looks to cover a lot of what I'd like to know!

Quote:
Original post by cossie
I wanted to find out what is involved in trying to code a neural network to perform path finding.

That's not going to be a trivial task. Keep in mind that at its core, an artificial neural network is just an arbitrary function approximator (with limitations based on structure); a mapping from an input space to an output space. If you want it to produce a path as its output, then you need to be able to represent the problem of generating a path as one of mapping from an input space to an output space. Then you need to obtain suitable training data to learn the network structure and parameters (or fix the structure and learn only the parameters).

Off the top of my head I can't see how to do this for pathfinding using a standard multilayer perceptron. You might have success using a Kohonen network to represent the path domain, but even that's going to be tricky/unusual and I don't know of any appropriate research looking into this. I have seen self-organising maps successfully applied to very difficult (traditionally intractable) optimisation problems, such as n-queens. Pathfinding as search is just an optimisation problem, so that might be an approach to try.

Good luck with it.

Maybe I'm over-simplifying the problem, but rather than a complete mapping between input space and output space, and outputting a complete path (as I think Timkin is suggesting or addressing) why can't the neural network simply evaluate which direction to move next?

e.g., let's assume you have a tiled domain with a start location, a goal location, and various obstacles or differential costs to move in certain regions. The NN would have inputs for all tiles in the domain that are flagged with specific values, or you could have inputs such as angle to goal, obstacles, etc. Outputs would consist of 1 neuron per possible direction of travel - 4 for the cardinal directions and 4 more if you allow diagonal movement. By iteratively evaluating the neural network and taking the direction corresponding to the largest output, I would guess that a path would emerge. You would not get a complete path as output, but rather you would find what the path is after you've achieved the goal.
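A rough sketch of that iterative evaluation, assuming a `net` callable that maps the relative offset to the goal onto four output activations (a hypothetical interface, not any particular library):

```python
def choose_direction(outputs):
    """Map the four output neurons (N, E, S, W) to a grid move by
    taking the neuron with the largest activation."""
    directions = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # N, E, S, W
    best = max(range(4), key=lambda i: outputs[i])
    return directions[best]

def walk(net, pos, goal, max_steps=100):
    """Iteratively evaluate the network to trace out a path.
    `net` maps the relative offset (dx, dy) to the goal onto four
    output activations - a stand-in for the evolved NN."""
    path = [pos]
    for _ in range(max_steps):
        if pos == goal:
            break
        dx, dy = choose_direction(net((goal[0] - pos[0], goal[1] - pos[1])))
        pos = (pos[0] + dx, pos[1] + dy)
        path.append(pos)
    return path
```

The path is only known after the walk finishes, exactly as described: the network never outputs a path, just one move at a time.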

fup over at AI-Junkie has an interesting demo that is similar and might be useful to check out. In his Minesweepers project, the minesweepers have to cover the largest amount of area or they have to locate mines. He uses a GA to evolve NNs that control the tracks on the minesweepers, so the outputs are on the order of left/right track direction/speed. The inputs are composed of various sensors on the minesweepers to detect obstacles, and others related to the angle and distance to mines (if I remember correctly).

I've also put together a PacMan AI application in which the AI controller for PacMan is an evolving NN. This is somewhat different, obviously, as the goal is to maximize PacMan's score. Using a standard maze with no ghosts, he gets quite a good score and navigates around the maze quite nicely. The best results have come from inputs based on a tiled window around PacMan that extends 6 tiles in each direction. Each tile gives multiple inputs to the network - wall/no wall, bad ghost/no bad ghost, edible ghost/no edible ghost, dot/no dot, power up/no power up. You can check out the code, get the executable, and watch a demo at http://sourceforge.net/projects/aipac.

I hope some of this helps.

-Kirk

Quote:

Maybe I'm over-simplifying the problem, but rather than a complete mapping between input space and output space, and outputting a complete path (as I think Timkin is suggesting or addressing) why can't the neural network simply evaluate which direction to move next?

e.g., let's assume you have a tiled domain with a start location, a goal location, and various obstacles or differential costs to move in certain regions. The NN would have inputs for all tiles in the domain that are flagged with specific values, or you could have inputs such as angle to goal, obstacles, etc. Outputs would consist of 1 neuron per possible direction of travel - 4 for the cardinal directions and 4 more if you allow diagonal movement. By iteratively evaluating the neural network and taking the direction corresponding to the largest output, I would guess that a path would emerge. You would not get a complete path as output, but rather you would find what the path is after you've achieved the goal.


That is pretty much my aim. I have 2 inputs for the relative x and y distance to the target at the moment, although I plan to expand it with sensors for the 4 cells around the bot for obstacle avoidance - but first I want to get it navigating towards a target.

The network has 4 outputs for the cardinal directions, and the output with the strongest "signal" is the direction the bot moves in.

Given the setup that you have, I would expect that within a few generations you should get a satisfying result. What happens to your bots over time?

As for how to evaluate fitness, the ultimate task is to minimize the distance to the goal tile. From your first description of how to determine fitness, I would suspect this would leave you with lots of random movements. If I have 10 random steps away from the goal followed by 10 random steps toward the goal, I have a 0 fitness or essentially a neutral rating. I would suspect that by running for a fixed number of steps and evaluating the fitness at the end of the run as distance from the goal, you should get bots that make it to the goal.

The problem, as you've mentioned, is that for 1000 steps you could get a wandering bot that hits the goal at 1000 steps being rated no better than the bot that hit the goal in 25 steps. To alleviate this, you might add a time factor. Perhaps minimizing [(final distance to goal + 1) * time to get there]. By using the product you have a better chance of minimizing both factors simultaneously. I've also added 1 to the final distance to the goal in order to prevent a bot that hits the goal from eliminating its time penalty. Another approach would be to have a decreasing number of loops - start at 1000 then decrease it by 1 every 10 generations, for example.
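The product-based rating might look like this (a minimal sketch; both factors are to be minimised, and the names are mine):

```python
def combined_fitness(final_distance_to_goal, steps_taken):
    """To be minimised: the +1 keeps the time factor alive even for
    bots that land exactly on the goal (distance 0), so a 25-step
    solution still beats a 1000-step one."""
    return (final_distance_to_goal + 1) * steps_taken
```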

Do you have any demos of how the bots behave? It might be useful to visualize it, or if you could describe the results you're getting, that might help. As I mentioned, I would expect some reasonable behavior in only a few generations.

-Kirk



Quote:
Original post by kirkd
Maybe I'm over-simplifying the problem, but rather than a complete mapping between input space and output space, and outputting a complete path (as I think Timkin is suggesting or addressing) why can't the neural network simply evaluate which direction to move next?


...and that would be neuro-control, which is certainly achievable... however the OP stated he wanted to do pathfinding using a neural network, hence my skepticism.

If we're abandoning the idea of pathfinding and considering instead trajectory control, then certainly you can use an ANN; it's overkill for simple, tiled worlds, but it's doable. If you'd like some literature references, I have a ridiculously huge pile of papers on my desk on this topic. 8) Most of them though are directed at more complex problems in neuro-control, but the principles are the same.

My personal recommendation would be to use a fuzzy-neural approach, as this will be easier to learn a sufficiently compact mapping for the simple domain you are considering. It will also scale reasonably to more complex domains.

Cheers,

Timkin
