CzarKirk

Planning Methods


Hi, I need to make use of planning for a multi-agent system, but I am not so familiar with this area of AI and would appreciate suggestions of techniques I can use.

Time is split into discrete steps, and agents are assigned tasks. Each task has a start time, an end time (deadline), a reward value (paid only upon completion), and the number of turns the task must be worked on to complete it. Agents can only work on one task at a time, but tasks can be stopped and resumed without penalty. Agents may be assigned future tasks that they cannot work on yet, so they can take these into consideration in their plan, but agents can be assigned any arbitrary task at any time. Agents work on tasks individually; tasks cannot be shared or delegated.

For example, an agent could have two short tasks and one long task, and due to deadline constraints can complete only the two short tasks or the one long task. The agent should pick the set that gives the highest reward value. The second short task might not be available from the beginning, but only after a (small) number of turns. This means a short-sighted agent wouldn't take that task's reward value into consideration and might take a suboptimal path.

Any recommendations on kinds of planning methods suitable for this problem would be appreciated.

[Edited by - CzarKirk on March 20, 2010 6:55:18 AM]
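To make the example concrete, here is a minimal brute-force sketch of a single agent's problem (the `Task` structure, task names, and numbers are my own illustration, not from the post): it enumerates every per-turn choice and shows that the two short tasks together beat the single long task, even though the second short task is not available at the start.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Task:
    name: str
    start: int      # first turn the task may be worked on
    deadline: int   # last turn (inclusive) on which work counts
    reward: int     # paid only upon completion
    work: int       # turns of work required to complete

def best_schedule(tasks, horizon):
    """Exhaustively try every per-turn choice (one task or idle)."""
    best_reward, best_plan = 0, None
    for plan in product([None] + tasks, repeat=horizon):
        worked = {}
        for t, task in enumerate(plan):
            if task is not None and task.start <= t <= task.deadline:
                worked[task] = worked.get(task, 0) + 1
        reward = sum(task.reward for task, n in worked.items() if n >= task.work)
        if reward > best_reward:
            best_reward, best_plan = reward, plan
    return best_reward, best_plan

# The example from the post: two short tasks and one long one,
# where the second short task only becomes available at turn 2.
short_a = Task("short_a", start=0, deadline=3, reward=3, work=2)
short_b = Task("short_b", start=2, deadline=5, reward=3, work=2)
long_c  = Task("long_c",  start=0, deadline=5, reward=5, work=5)

reward, plan = best_schedule([short_a, short_b, long_c], horizon=6)
```

A greedy agent that only looks at tasks available at turn 0 would see rewards 3 and 5 and commit to the long task, ending up with 5 instead of 6 — exactly the short-sightedness described above.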

Hi, thanks. I've been looking into STRIPS but I'm not sure how to represent tasks or the goal state, as there will be lots of numbers and not many predicates. There is no special goal state, just to maximise the reward value. How do I represent things such as time using STRIPS?

Sounds like a job for dynamic programming. Sort of like a knapsack problem, but the knapsack size is the amount of time left. If new tasks can be added arbitrarily, you could assume some sort of probability distribution on new tasks and incorporate the expected value into the problem.
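That dynamic-programming formulation might look something like this sketch (the `Task` fields and example numbers are my own illustration): the state is the current turn plus the remaining work on each task, and memoisation keeps the table small.

```python
from collections import namedtuple
from functools import lru_cache

Task = namedtuple("Task", "start deadline reward work")

def plan_value(tasks, horizon):
    """Max total reward achievable; state = (turn, remaining work per task)."""
    @lru_cache(maxsize=None)
    def value(t, remaining):
        if t == horizon:
            return 0
        best = value(t + 1, remaining)  # option: idle this turn
        for i, task in enumerate(tasks):
            # Can only work a task that still needs work and is in its window.
            if remaining[i] == 0 or not (task.start <= t <= task.deadline):
                continue
            nxt = list(remaining)
            nxt[i] -= 1
            gained = task.reward if nxt[i] == 0 else 0  # paid on completion
            best = max(best, gained + value(t + 1, tuple(nxt)))
        return best
    return value(0, tuple(task.work for task in tasks))

# Same example as in the original post: two short tasks, one long task.
tasks = (Task(0, 3, 3, 2), Task(2, 5, 3, 2), Task(0, 5, 5, 5))
best = plan_value(tasks, horizon=6)
```

Because schedules that leave the same remaining work collapse into one state, the memoised table stays far smaller than a brute-force enumeration of per-turn choices.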

Quote:
Original post by CzarKirk
Hi, thanks. I've been looking into STRIPS but I'm not sure how to represent tasks or the goal state, as there will be lots of numbers and not many predicates. There is no special goal state, just to maximise the reward value. How do I represent things such as time using STRIPS?


Usually goal states are just those for which a certain predicate evaluates to true.

I'm actually not used to thinking of reward values in STRIPS, but of course if you want to think of things this way, one way to encode the goal states is to just make the reward in non-goal states "minus-infinity."

Once you include the "frame axioms", STRIPS essentially becomes a state-space search that you can solve with BFS/DFS/A*, with each state being the binary vector of the truth values of the predicates (every predicate evaluates to either true or false). Traditionally, this vector is represented as a list from which things are added or removed, but from my point of view that's just a sparse vector representation.
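A minimal sketch of that view (entirely a toy example of my own): predicates are strings, a state is a frozen set of the true ones, and each action is a name plus precondition/add/delete sets.

```python
from collections import deque

# A STRIPS-style action is applicable when pre ⊆ state; the successor
# state is (state - delete_list) | add_list.
def successors(state, actions):
    for name, pre, add, delete in actions:
        if pre <= state:
            yield name, (state - delete) | add

def bfs_plan(start, goal_pred, actions):
    """Breadth-first search over predicate sets; returns a list of action names."""
    start = frozenset(start)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal_pred(state):
            return plan
        for name, nxt in successors(state, actions):
            nxt = frozenset(nxt)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [name]))
    return None

# Toy domain (made up for illustration): get through a locked door.
actions = [
    ("pick_up_key", frozenset({"at_door"}), frozenset({"has_key"}), frozenset()),
    ("open_door", frozenset({"at_door", "has_key"}), frozenset({"door_open"}), frozenset()),
]
plan = bfs_plan({"at_door"}, lambda s: "door_open" in s, actions)
```

Swapping the `deque` for a priority queue keyed on path cost plus a heuristic turns this same loop into A*.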

Here is a crazy idea I've been mulling over. The most successful modern computer Go programs use something called Monte Carlo Tree Search, and I think this could be applied to planning in most games.

The basic idea is to have some very fast model of what agents would do in any given circumstance; these models should involve some randomness. So if you have a symbolic representation of your game (which you probably need for any type of planning), you can just simulate forward what's going to happen, Monte Carlo style. At the end of the simulation (you have to decide somehow when that is), you look at some utility function that tells you how happy each agent is with the result. You then run many such simulations and, among the actions the agent has available right now, pick the one whose simulations have the highest expected utility for the agent.

So far I have only described the Monte Carlo part. The "Tree Search" part comes from accumulating statistics about the utility of actions in positions early in the simulations, which are visited many times when you run thousands of them. You can then use that information to do better than the simplistic probabilistic model when reacting in those particular situations.
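The Monte Carlo part alone (no tree statistics yet) can be sketched like this; every callback here (`legal_actions`, `step`, `utility`, the rollout policy) is an assumption about how your symbolic game model might be exposed, and the toy game is my own.

```python
import random

def flat_monte_carlo(state, legal_actions, step, rollout_policy, utility,
                     horizon, n_sims=200):
    """Flat Monte Carlo: for each action available now, run random
    rollouts to the horizon and keep the action whose rollouts have
    the best average utility."""
    def rollout(s, depth):
        while depth < horizon:
            acts = legal_actions(s)
            if not acts:
                break
            s = step(s, rollout_policy(s, acts))
            depth += 1
        return utility(s)

    best_action, best_mean = None, float("-inf")
    for a in legal_actions(state):
        mean = sum(rollout(step(state, a), 1) for _ in range(n_sims)) / n_sims
        if mean > best_mean:
            best_action, best_mean = a, mean
    return best_action

# Toy game: walk on a number line for `horizon` turns; utility is the
# final position, so +1 is clearly the better first move.
random.seed(0)
choice = flat_monte_carlo(
    state=0,
    legal_actions=lambda s: [+1, -1],
    step=lambda s, a: s + a,
    rollout_policy=lambda s, acts: random.choice(acts),
    utility=lambda s: s,
    horizon=5,
)
```

The "Tree Search" part would then replace the uniform `rollout_policy` near the root with action statistics accumulated across simulations, as in UCT.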

I love this technique because it allows the agents to reason about what adversaries might do in reaction to their own actions, it allows for collaboration (if agents have similar goals), and it allows for random outcomes (if I shoot at the guard from a distance, some fraction of the time I'll kill him, and if I don't, he'll be alerted — and the simulations can capture that). It just seems really flexible, and it has proven to be very powerful in computer Go, which is a really hard problem.

EDIT: Too bad I can't show you a working example of a video game where this technique works great. My job gets in the way of my hobbies... ;)

[Edited by - alvaro on March 24, 2010 10:44:34 AM]

Alvaro, that sounds interesting and definitely feasible. The downside is that simulation/prediction can be very expensive...

Do you have any links to papers about applying the Monte Carlo tree search to Go?

Actually, searching for "Monte Carlo Tree Search" turns up some decent papers, including this one that shows that applying this technique to video games is not as original an idea as I thought. :)

The people who created MoGo wrote a good explanation of what it does. That's where I learned a lot about how UCT (the original MCTS algorithm) works.

Alvaro, if I understand you, doing that turns the AI into a reactive planner, and then you lose one of the main advantages of planning: consistency of actions. Because you don't "keep" a plan, there is no way to make sure that chained actions are consistent. The Total War games have a big problem with this in their world-map AI...
