Planning Methods

Started by
20 comments, last by wodinoneeye 14 years ago
Hi, I need to make use of planning for a mult-agent system, but I am not so familiar with this area of AI and would appreciate suggestions of techniques I can use. Time is split into discrete steps and agents have assigned to them tasks, which have a start time, an end time (deadline), a reward value (paid only upon completion) and the number of turns the task must be worked upon for completion. Agents can only work on one task at a time, but tasks can be stopped and resumed without penalty. Agents may be assigned future tasks that they cannot work on yet, so they can take these into consideration for their plan, but agents can be assigned any arbitrary task at any time. Agents work on tasks individually, they cannot be shared or delegated. For example, an agent could have 2 short tasks and one long task, and due to deadline constracts can complete only 2 short tasks or one long task. The agent would pick the set that gives the highest reward value. The second short task might not be available from the beginning, but only after a (small) number of turns. This means a short-sighted agent wouldn't take its reward value into consideration and might take an inoptimal path. Any recommendations on kinds of planning methods suitable for this problem would be appreciated. [Edited by - CzarKirk on March 20, 2010 6:55:18 AM]
Advertisement
I always recommend Steve LaValle's Planning Book (free online).
STRIPS is always a good place to begin.
Hi, thanks. I've been looking into STRIPS but I'm not sure how to represent tasks or the goal state, as there will be lots of numbers and not many predicates. There is no special goal state, just to maximise the reward value. How do I represent things such as time using STRIPS?
Sounds like a job for dynamic programming. Sort of like a knapsack problem, but the knapsack size is the amount of time left. If new tasks can be added arbitrarily, you could assume some sort of probability distribution on new tasks and incorporate the expected value into the problem.
Quote:Original post by CzarKirk
Hi, thanks. I've been looking into STRIPS but I'm not sure how to represent tasks or the goal state, as there will be lots of numbers and not many predicates. There is no special goal state, just to maximise the reward value. How do I represent things such as time using STRIPS?


Usually goal states are just those for which a certain predicate evaluates to true.

I'm actually not used to thinking of reward values in STRIPS, but of course if you want to think of things this way, one way to encode the goal states is to just make the reward in non-goal states "minus-infinity."

Once you include the "frame axioms" STRIPS essentially become a state-space search which you can solve with BFS/DFS/A*, with each state being the binary vector of the different values of predicates (every predicate evaluates to either true or false). Traditionally, this vector is represented as a list from which things are added or removed, but from my point of view, that's just a sparse vector representation.
Here is some crazy idea I've been mulling over. The most successful modern computer-go programs use something called Monte Carlo Tree Search, and I think this could be applied to planning in most games. The basic idea is to have some very fast model of what agents would do in any given circumstance. These models should involve some randomness. So if you have a symbolic representation of your game (which you probably need for any type of planning), you can just simulate forward what's going to happen, Monte Carlo style. At the end of the simulation (you have to decide somehow when that is), you look at some utility function that tells you how happy each agent is with the result. You will then run many such simulations and, among the actions that an agent has available right now, you'll pick the one that is at the beginning of simulations with the highest expected utility for the agent. Well, so far I just described the Monte Carlo part. The "Tree Search" part comes from accumulating statistics about the utility of actions in positions early in the simulations, which we will visit many times if we run thousands of simulations. You can then use that information to do better than using your simplistic probabilistic model for reacting in that particular situation.

I love this technique because it allows the agents to reason about what adversaries might do in reaction to their own actions, it allows for collaboration (if agents have similar goals), it allows for random outcomes (if I shoot at the guard from a distance, some fraction of the time I'll kill him, and if I don't he'll be alerted, and the simulations can capture that)... It just seems really flexible and it has proven to be very powerful in computer go, which is a really hard problem.

EDIT: Too bad I can't show you a working example of a video game where this technique works great. My job gets in the way of my hobbies... ;)

[Edited by - alvaro on March 24, 2010 10:44:34 AM]
Alvaro, that sounds interesting and definitely feasible. The down side is that simulation/prediction can be very expensive...

Do you have any links to papers about applying the Monte Carlo tree search to Go?

Join us in Vienna for the nucl.ai Conference 2015, on July 20-22... Don't miss it!

Actually, searching for `Monte Carlo Tree Search' you get some decent papers, including this one that shows that applying this technique to video games is not as original an idea as I thought. :)

The people who created MoGo wrote a good explanation of what it does. That's where I learned a lot of how UCT (the original MCTS algorithm) works.
Alvaro, if I understand you, by doing that, you would turn the AI into a reactive planner. And then you would lose one of the main advantages of using planning, which is consistency of actions. Because you dont "keep" a plan, there is no way to make sure that chained actions are consistent. Total war games have a big problem with this on their world map AI...

This topic is closed to new replies.

Advertisement