Planning Methods

CzarKirk
Hi, I need to make use of planning for a multi-agent system, but I am not very familiar with this area of AI and would appreciate suggestions of techniques I could use. Time is split into discrete steps, and agents are assigned tasks, each of which has a start time, an end time (deadline), a reward value (paid only upon completion) and the number of turns of work required to complete it. Agents can only work on one task at a time, but tasks can be stopped and resumed without penalty. Agents may be assigned future tasks that they cannot work on yet, so they can take these into consideration in their plan, but agents can also be assigned any arbitrary task at any time. Agents work on tasks individually; tasks cannot be shared or delegated.

For example, an agent could have two short tasks and one long task, and due to deadline constraints can complete either the two short tasks or the one long task. The agent should pick whichever set gives the highest total reward. The second short task might not be available from the beginning, but only after a (small) number of turns, so a short-sighted agent wouldn't take its reward into consideration and might choose a suboptimal path.

Any recommendations on kinds of planning methods suitable for this problem would be appreciated.

[Edited by - CzarKirk on March 20, 2010 6:55:18 AM]
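To make that concrete, here is roughly how the task data might look (Python; the field names and numbers are just illustrative):

from collections import namedtuple

# Hypothetical task record; times are in whole turns.
Task = namedtuple("Task", ["start", "deadline", "reward", "work"])

# The example scenario: two short tasks (the second only released a few
# turns in) and one long task that conflicts with them.
short_a = Task(start=0, deadline=4, reward=5, work=2)
short_b = Task(start=3, deadline=8, reward=5, work=2)
long_c  = Task(start=0, deadline=8, reward=8, work=7)

# In the 8 turns before the last deadline the agent can finish both short
# tasks (reward 10) or the long task alone (reward 8), but not all three.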

Hi, thanks. I've been looking into STRIPS but I'm not sure how to represent tasks or the goal state, as there will be lots of numbers and not many predicates. There is no special goal state, just to maximise the reward value. How do I represent things such as time using STRIPS?

Sounds like a job for dynamic programming. Sort of like a knapsack problem, but the knapsack size is the amount of time left. If new tasks can be added arbitrarily, you could assume some sort of probability distribution on new tasks and incorporate the expected value into the problem.
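For example, here is a minimal sketch of that kind of DP, assuming for simplicity that every task is already available (tasks that only become available later would need extra state). Tasks are processed in deadline order and dp[t] holds the best reward achievable with t turns of work committed so far:

def best_reward(tasks):
    """tasks: list of (work_turns, deadline, reward); all available now.

    Knapsack-style DP over total committed work time: process tasks in
    deadline order, and only allow a task to finish by its own deadline.
    Returns the maximum total reward of a feasible set of tasks.
    """
    tasks = sorted(tasks, key=lambda t: t[1])          # by deadline
    horizon = max((d for _, d, _ in tasks), default=0)
    NEG = float("-inf")
    dp = [NEG] * (horizon + 1)                         # dp[t]: best reward using t turns
    dp[0] = 0.0
    for work, deadline, reward in tasks:
        for t in range(min(deadline, horizon), work - 1, -1):
            if dp[t - work] != NEG:
                dp[t] = max(dp[t], dp[t - work] + reward)
    return max(dp)

# The example from the original post: two short tasks vs. one long one.
print(best_reward([(2, 4, 5), (2, 8, 5), (7, 8, 8)]))  # -> 10.0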

Quote:
Original post by CzarKirk
Hi, thanks. I've been looking into STRIPS but I'm not sure how to represent tasks or the goal state, as there will be lots of numbers and not many predicates. There is no special goal state, just to maximise the reward value. How do I represent things such as time using STRIPS?


Usually goal states are just those for which a certain predicate evaluates to true.

I'm actually not used to thinking of reward values in STRIPS, but of course if you want to think of things this way, one way to encode the goal states is to just make the reward in non-goal states "minus-infinity."

Once you include the "frame axioms", STRIPS essentially becomes a state-space search that you can solve with BFS/DFS/A*, with each state being the binary vector of the values of all the predicates (every predicate evaluates to either true or false). Traditionally, this vector is represented as a list to which things are added and from which they are removed, but from my point of view, that's just a sparse vector representation.
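For what it's worth, here is a tiny sketch of that view in Python. The predicates and actions are made up, states are frozensets of the predicates that currently hold, and the search is a plain BFS:

from collections import deque

# A STRIPS-style action: applicable when all preconditions hold;
# applying it removes the delete list and adds the add list.
actions = {
    "move_a_b": (frozenset({"at_a"}), frozenset({"at_a"}), frozenset({"at_b"})),
    "pick_key": (frozenset({"at_b", "key_at_b"}), frozenset({"key_at_b"}), frozenset({"have_key"})),
    "open_door": (frozenset({"at_b", "have_key"}), frozenset(), frozenset({"door_open"})),
}

def successors(state):
    for name, (pre, dele, add) in actions.items():
        if pre <= state:
            yield name, (state - dele) | add

def bfs_plan(start, goal_pred):
    """Breadth-first search over predicate sets; goal_pred marks goal states."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal_pred(state):
            return plan
        for name, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [name]))
    return None

start = frozenset({"at_a", "key_at_b"})
print(bfs_plan(start, lambda s: "door_open" in s))
# -> ['move_a_b', 'pick_key', 'open_door']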

Here is a crazy idea I've been mulling over. The most successful modern computer Go programs use something called Monte Carlo Tree Search (MCTS), and I think this could be applied to planning in most games.

The basic idea is to have some very fast model of what agents would do in any given circumstance. These models should involve some randomness. So if you have a symbolic representation of your game (which you probably need for any type of planning), you can just simulate forward what's going to happen, Monte Carlo style. At the end of the simulation (you have to decide somehow when that is), you look at some utility function that tells you how happy each agent is with the result. You then run many such simulations and, among the actions the agent has available right now, you pick the one that starts the simulations with the highest expected utility for that agent.

So far I have only described the Monte Carlo part. The "Tree Search" part comes from accumulating statistics about the utility of actions in positions early in the simulations, which will be visited many times if you run thousands of simulations. You can then use that information to do better than your simplistic probabilistic model for reacting in those particular situations.
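In case it helps, here is a bare-bones sketch of UCT, the usual flavour of MCTS, just to show the shape of the algorithm. The game object (with actions(s), step(s, a), is_terminal(s), utility(s) and a rollout(s) policy) is a placeholder for whatever fast symbolic model of your game you have; this is a sketch, not production code.

import math
import random

class Node:
    def __init__(self, state, actions, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children = []
        self.untried = list(actions)        # actions not yet expanded from here
        self.visits, self.total = 0, 0.0

    def ucb_child(self, c=1.4):
        # UCB1: average utility plus an exploration bonus for rarely tried children.
        return max(self.children,
                   key=lambda ch: ch.total / ch.visits
                                  + c * math.sqrt(math.log(self.visits) / ch.visits))

def uct_search(game, root_state, iterations=1000):
    """game supplies: actions(s), step(s, a), is_terminal(s), utility(s), rollout(s)."""
    root = Node(root_state, game.actions(root_state))
    for _ in range(iterations):
        node = root
        # 1. Selection: walk down fully expanded nodes using UCB1.
        while not node.untried and node.children:
            node = node.ucb_child()
        # 2. Expansion: add one child for an untried action.
        if node.untried:
            a = node.untried.pop(random.randrange(len(node.untried)))
            s = game.step(node.state, a)
            node.children.append(Node(s, game.actions(s), parent=node, action=a))
            node = node.children[-1]
        # 3. Simulation: play out quickly with the cheap randomized model.
        s = node.state
        while not game.is_terminal(s):
            s = game.step(s, game.rollout(s))
        reward = game.utility(s)
        # 4. Backpropagation: push the result back up the chosen path.
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    # Commit to the most-visited root action.
    return max(root.children, key=lambda ch: ch.visits).action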

I love this technique because it allows the agents to reason about what adversaries might do in reaction to their own actions, it allows for collaboration (if agents have similar goals), and it allows for random outcomes (if I shoot at the guard from a distance, some fraction of the time I'll kill him, and if I don't he'll be alerted, and the simulations can capture that)... It just seems really flexible, and it has proven to be very powerful in computer Go, which is a really hard problem.

EDIT: Too bad I can't show you a working example of a video game where this technique works great. My job gets in the way of my hobbies... ;)

[Edited by - alvaro on March 24, 2010 10:44:34 AM]

Alvaro, that sounds interesting and definitely feasible. The down side is that simulation/prediction can be very expensive...

Do you have any links to papers about applying the Monte Carlo tree search to Go?

Actually, searching for "Monte Carlo Tree Search" you get some decent papers, including this one that shows that applying this technique to video games is not as original an idea as I thought. :)

The people who created MoGo wrote a good explanation of what it does. That's where I learned a lot about how UCT (the original MCTS algorithm) works.

Alvaro, if I understand you, by doing that you would turn the AI into a reactive planner. And then you would lose one of the main advantages of using planning, which is consistency of actions. Because you don't "keep" a plan, there is no way to make sure that chained actions are consistent. Total War games have a big problem with this in their world-map AI...

Steadtler, I wouldn't use the term reactive planner here. It generally means a plan that's triggered reactively but then carried out to the end without making new decisions (like most behavior trees). That doesn't seem to be the case here.

Alvaro's suggestion is a full planner, but he seems to suggest executing it every frame. I'm not sure there's a word for that; it almost becomes a reactive policy, but one informed by runtime sampling.

Quote:
Original post by alexjc
Steadtler, I wouldn't use the term reactive planner here. It generally means a plan that's triggered reactively but then carried out to the end without making new decisions (like most behavior trees). That doesn't seem to be the case here.

Alvaro's suggestion is a full planner, but he seems to suggest executing it every frame. I'm not sure there's a word for that; it almost becomes a reactive policy, but one informed by runtime sampling.


Terminology is a bitch in AI... I'm talking about the definition of a reactive planner from Rich & Knight: a system that has a notion of the future but continuously retakes its decision every frame, which seems to fit Alvaro's suggestion.

Quote:
Original post by Steadtler
Terminology is a bitch in AI... I'm talking about the definition of a reactive planner from Rich & Knight: a system that has a notion of the future but continuously retakes its decision every frame.


Ah, I see! It's not a great definition IMHO. Most reactive systems seem to fit that definition... After all, the whole point of an intelligent decision is to understand what's going to happen, either by design or by a runtime/offline technique.

I prefer the definition of "picking a plan reactively without deliberating into the future." How you execute that plan is a different piece of terminology altogether...

I wonder what Norvig has to say about this!

Funny thing is, as I remember, Rich & Knight disputed the people who came up with the term "reactive planner", as it doesn't have to involve planning. Sorry for getting sidetracked here. I like systems where I can control when and how to replan, which is something often overlooked.

Rich & Knight should be punished for trying to redefine a term... :-) It's bad enough already without people trying to overload common terms!


I agree with your premise, though. In the case of Alvaro's system, you could get the same "control" by taking the current plan into account as part of the search process. Even if the search is run every frame, you get a way to persist with the same plan... KILLZONE 2 does this.

Yes, I am suggesting reconsidering often; I don't know if it has to be on every frame. Picking a sequence of actions and executing them without reconsideration doesn't seem appropriate unless the environment is extremely simplistic (deterministic results of actions, no other unpredictable agents in the scene...).

I am not sure the algorithm I propose is a "planner" in a traditional sense of the word. Some actions will be taken only because the agent is expecting to perform some other actions in the future. In this sense it is planning things, and we would probably think of what the agent is doing as following some plan, even if the plan doesn't exist as a concrete data structure anywhere in the code.

If the algorithm is set up right, the agent shouldn't abandon a good [perceived] plan without a good reason. The last action should have brought the agent closer to reaching some goal, so the agent would be more likely than before to see what it's supposed to do to achieve it.

Re. this confusion over terminology:

I think that the words used in the decision and control community avoid confusion nicely: a controller is open-loop if it is a function from the current time to an action (i.e., a sequence of actions), and closed-loop if it is a mapping from the current state to an action.

I think this is a better distinction, because "planning" (looking into the future, usually using dynamic programming) can be used to create closed-loop controllers; such a mapping from state to action is also known as a policy. An example of this is the policy computed by value iteration, or the brushfire algorithm for single-source pathfinding; I think both of these things can properly be called planning.
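For instance, here is a tiny value-iteration sketch on a made-up deterministic grid (the grid, rewards and discount are all just placeholders); the output is a policy, i.e. a mapping from state to action, rather than a fixed action sequence:

# Tiny deterministic grid MDP: states are (x, y), reaching the goal cell
# pays +1, each step costs a little, and value iteration yields a
# closed-loop policy you can query from any state.
ACTIONS = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0)}
W, H, GOAL, GAMMA = 4, 3, (3, 0), 0.9

def step(s, a):
    dx, dy = ACTIONS[a]
    return (min(max(s[0] + dx, 0), W - 1), min(max(s[1] + dy, 0), H - 1))

def reward(s, a, s2):
    return 1.0 if s2 == GOAL else -0.04

states = [(x, y) for x in range(W) for y in range(H)]
V = {s: 0.0 for s in states}
for _ in range(100):                      # value-iteration sweeps
    V = {s: (0.0 if s == GOAL else
             max(reward(s, a, step(s, a)) + GAMMA * V[step(s, a)]
                 for a in ACTIONS))
         for s in states}

# The policy: for whatever state you happen to be in, look up the best action.
policy = {s: max(ACTIONS, key=lambda a: reward(s, a, step(s, a)) + GAMMA * V[step(s, a)])
          for s in states if s != GOAL}
print(policy[(0, 2)])   # -> 'N' (a first step toward the goal at (3, 0))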

Alvaro, in static worlds you shouldn't get oscillation, for the reasons you mention. But in dynamic worlds (and game environments) things can change and induce oscillation: a target moves just out of range into the next high-level area, or another object gets in the way and makes the current option a bit more expensive. These things need some kind of "momentum" factor to overcome. It's not hard to do, but you need to do it.

Emergent, I like the open-loop/closed-loop concept. Thanks for the insights!

Quote:
Original post by alexjc
Alvaro, in static worlds you shouldn't get oscillation, for the reasons you mention. But in dynamic worlds (and game environments) things can change and induce oscillation: a target moves just out of range into the next high-level area, or another object gets in the way and makes the current option a bit more expensive. These things need some kind of "momentum" factor to overcome. It's not hard to do, but you need to do it.


Oh, that's what you were talking about! Yes, you probably need to introduce hysteresis by adding some sort of penalty for changing your mind.
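Something as simple as this when re-scoring candidate plans each update, say (the switch_penalty value is made up and would need tuning):

def choose_plan(candidates, scores, current_plan, switch_penalty=0.1):
    """Pick the best-scoring plan, but charge a small penalty for
    abandoning the plan we are already committed to (hysteresis)."""
    def effective(plan):
        s = scores[plan]
        return s if plan == current_plan else s - switch_penalty
    return max(candidates, key=effective)

# The agent only changes its mind if the new plan beats the old one
# by more than the penalty:
print(choose_plan(["guard", "patrol"], {"guard": 0.55, "patrol": 0.60}, "guard"))  # -> 'guard'
print(choose_plan(["guard", "patrol"], {"guard": 0.55, "patrol": 0.70}, "guard"))  # -> 'patrol'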

Quote:
Original post by CzarKirk
Ok all that is too advanced for me right now :(


You're trying to do multiple-task, multiple-agent planning with soft goals right from the get-go. Begin instead with single (hard) goal, single-agent planning and work your way up from there.



The usual difficulty is in the choice being made -- carrying out the decided action is usually trivial by comparison, though it can still have costs when simpler AI methods like pathfinding are used to carry out the details.

The situation has to be symbolized (cognition).

The symbols have to be factored/judged for relevance (including context combinations of the situational symbols).

Solutions to achieve the given multiple (often conflicting) goals have to be matched to the situation (making estimations of success -- cost/risk versus payoff for moving towards achieving multiple goals).

Prioritization using a unified measurement for the evaluation system (apples vs oranges vs bananas; see the sketch below).

Possibly frequent reevaluations as the situation changes, weighed against the practicality of continuing the planned course. Optimizations to prevent brute-force reevaluation...

Uncertainty adds another dimension to be evaluated.
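For the "unified measurement" point, a very rough sketch of what that might look like (the goals, weights, actions and numbers are all invented for illustration):

# Score each candidate action against several (possibly conflicting) goals,
# with each goal's contribution already normalized to a common 0..1 scale,
# then weight them into a single comparable number.
GOAL_WEIGHTS = {"earn_reward": 0.6, "stay_safe": 0.3, "save_time": 0.1}

def score_action(goal_scores):
    """goal_scores: dict goal -> score normalized to 0..1."""
    return sum(GOAL_WEIGHTS[g] * goal_scores.get(g, 0.0) for g in GOAL_WEIGHTS)

candidates = {
    "work_long_task":  {"earn_reward": 0.8, "stay_safe": 0.9, "save_time": 0.2},
    "work_short_task": {"earn_reward": 0.5, "stay_safe": 0.9, "save_time": 0.8},
}
best = max(candidates, key=lambda a: score_action(candidates[a]))
print(best)  # -> 'work_long_task' with these made-up numbers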
