

pars

To Timkin and anybody else interested in Reinforcement Learning in Pacman



I have been working on an implementation of Reinforcement Learning in Pacman for my final year project. You might remember that I asked a few questions on this topic a few months ago. Since Timkin was interested to know whether RL in Pacman works or not, I decided to post this topic.

I can say that my project was a success. As you can imagine, at the beginning the ghosts are dumb and just move around randomly, but after going through a series of training episodes their behaviour starts to improve, and if enough training is done they manage to catch Pacman efficiently.

The training, however, took ages: training the ghosts for 1 million episodes took around 20 hours. I saved all the data in a file, though, so that it can be loaded into memory the next time the game is run, and the training does not need to be repeated.

In the training I used only one ghost. Not because it was impossible to include all four ghosts, but because for RL to be effective the training would have to run for a much higher number of episodes, which would take ages. I also omitted the power capsules, because when Pacman eats one the ghosts' assumptions about their world change drastically, which would make the learning much more complicated. Frankly, if I had enough time, I would investigate this issue further.

If anyone is interested, I can put the results for this project on the web, together with all the details, but only after my exams (they start tomorrow).
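A minimal sketch of the save/load idea, for anyone curious how the trained data might be persisted between runs: the file name and table layout below are illustrative assumptions, not the project's actual format.

import pickle

def save_table(Q, path="ghost_q.pkl"):
    """Write the learned table (a dict keyed by (state, action)) to disk."""
    with open(path, "wb") as f:
        pickle.dump(Q, f)

def load_table(path="ghost_q.pkl"):
    """Load a previously trained table, or start from scratch if none exists."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return {}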

Oh, and I would like to thank everyone who answered my questions when I had problems.


I for one would like to see your results. Of course I'm sure we'll all wait patiently while you concentrate on exams.......are they up yet?......how about now?....now?....

-Kirk

quote:
Original post by Kylotan
(since I'd argue that 90% of the battle with creating any AI system is finding a decent representation of what you're modelling).



Nah...more like 99%


I'm curious too. Two things:

1) Do you think the slow learning is due to the fact that Pacman is not a static MDP, depending on how you model it? How much of the dynamic nature of the game did your representation manage to "factor out"?

2) What algorithm are you actually using to compute the state values?

Alex



Artificial Intelligence Depot - Maybe it's not all about graphics...

alexjc,

I used the Sarsa algorithm.

I don't fully understand your first question. Well, I suppose the slow learning is probably because of the high number of states in the game. It has a 19x19 maze, and if we don't count the walls there should be around 60,000 possible states with Pacman and only one ghost present in the maze. Now, each state has 4 possible Q-values, so there would be around 240,000 possible Q-values. For the learning to be effective, each value needs to be updated a few times, so I suppose the ghost has to be trained for around a million episodes.
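For anyone unfamiliar with Sarsa, a minimal self-contained sketch of the tabular one-step update follows. The learning rate, discount factor, and state encoding (ghost position plus Pacman position) are assumptions for illustration, not the project's actual values.

from collections import defaultdict

ALPHA = 0.1   # learning rate (assumed value)
GAMMA = 0.9   # discount factor (assumed value)

Q = defaultdict(float)   # (state, action) -> estimated return; 0.0 if unseen

def sarsa_update(state, action, reward, next_state, next_action):
    """One-step Sarsa backup: Q(s,a) += alpha * (r + gamma*Q(s',a') - Q(s,a))."""
    td_target = reward + GAMMA * Q[(next_state, next_action)]
    Q[(state, action)] += ALPHA * (td_target - Q[(state, action)])

# Example with made-up positions: a state is (ghost position, Pacman position).
s, a = ((3, 4), (10, 2)), "up"
s2, a2 = ((3, 3), (10, 2)), "left"
sarsa_update(s, a, reward=0.0, next_state=s2, next_action=a2)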

As for training all four ghosts together, well, I think you can guess how long that would take.

pars,

For the algorithm, I'd suggest trying the Monte Carlo approach. TD(0) algorithms - like Sarsa - take a very long time to propagate the state values, as they use a backup of depth 1. With a Monte Carlo algorithm you can do arbitrarily deep backups and propagate the reward much more quickly. It seems Pacman is well suited to the episodic learning scheme anyway. As a bonus, you can give hints to the episode-generation policy to improve the learning.
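For comparison with the depth-1 Sarsa backup, here is a rough sketch of an every-visit Monte Carlo update: once an episode ends, the discounted return is backed up to every state-action pair that was visited. Reward values and state encoding are made up for illustration.

from collections import defaultdict

GAMMA = 0.9
Q = defaultdict(float)      # (state, action) -> value estimate
visits = defaultdict(int)   # how many returns have been averaged in so far

def monte_carlo_update(episode):
    """episode: list of (state, action, reward) tuples in the order they occurred."""
    G = 0.0
    # Walk backwards so G holds the discounted return from each step onwards.
    for state, action, reward in reversed(episode):
        G = reward + GAMMA * G
        visits[(state, action)] += 1
        # Incremental average of observed returns (every-visit Monte Carlo).
        Q[(state, action)] += (G - Q[(state, action)]) / visits[(state, action)]

# Example: a tiny two-step episode in which the ghost catches Pacman at the end.
monte_carlo_update([
    (((3, 4), (4, 4)), "right", 0.0),
    (((4, 4), (4, 4)), "stay", 1.0),   # terminal reward for the catch
])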

As for the first question, I wasn't sure if you'd modelled ALL the states, but it seems you have. If you hadn't, I was wondering if the problem was actually "solvable" using RL algorithms (would the probabilities be stationary).


I count 130,321 states (19x19 positions for the ghost x 19x19 for Pacman) ignoring obstacles: more complex than I would have thought! That said, you don't really need the Q-values, as you know the neighbouring states. Since you have part of the world model (the state transitions), learning the state values V(s) is possible, and would be at least 4x quicker.
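To illustrate that point: given a move model (which the ghost effectively has, since it knows which state each move leads to), the policy can simply pick the action whose successor state has the highest learned value, so no per-action Q-values are needed. The next_state() model below is a hypothetical stand-in that ignores walls.

from collections import defaultdict

V = defaultdict(float)   # state -> learned value estimate

def next_state(state, action):
    """Hypothetical deterministic move model: shift the ghost one cell; walls ignored."""
    (gx, gy), pacman = state
    dx, dy = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}[action]
    return ((gx + dx, gy + dy), pacman)

def greedy_action(state, legal_actions):
    """Pick the move whose successor state has the highest learned value."""
    return max(legal_actions, key=lambda a: V[next_state(state, a)])

# Example: ghost at (3, 4), Pacman at (10, 2).
print(greedy_action(((3, 4), (10, 2)), ["up", "down", "left", "right"]))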

As for training multiple ghosts, I think it would actually be easier. The ghosts in my Pacman don't seem to pay attention to each other, so it's the same state space. It would be like having multiple agents exploring the same state space, so it would learn more quickly too.
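A small sketch of that shared-table idea, again with assumed names and values: because each ghost ignores the others, every ghost's transition on a game tick can back up into the same value table, so the table gets more updates per episode.

from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9   # assumed values
V = defaultdict(float)    # one value table shared by every ghost

def td0_update(state, reward, next_state):
    """One TD(0) backup toward r + gamma * V(s')."""
    V[state] += ALPHA * (reward + GAMMA * V[next_state] - V[state])

def step_all_ghosts(transitions):
    """transitions: one (state, reward, next_state) tuple per ghost for this tick."""
    for s, r, s2 in transitions:
        td0_update(s, r, s2)

# Example tick with two ghosts chasing the same Pacman (made-up positions).
step_all_ghosts([
    (((3, 4), (10, 2)), 0.0, ((3, 3), (10, 2))),
    (((15, 15), (10, 2)), 0.0, ((14, 15), (10, 2))),
])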


Anyway, sounds like great fun! Did you code the pacman yourself or use an existing codebase?



Artificial Intelligence Depot - Maybe it's not all about graphics...
