Planner-based AI and action costs

Hi,

I'm working on my Master's thesis, which is about tracking and predicting the motion of objects. In my research I came across Bayesian networks. So-called decision graphs are an extension of them. The nice thing is that you can easily build those graphs. (Wikipedia: Decision Tree)
Might be worth looking into.
“Always program as if the person who will be maintaining your program is a violent psychopath who knows where you live”
Quote:Original post by Baiame
Timkin, I guess what makes it tricky is how abstract (or maybe just complex) the attributes are. For squad or higher-level tactics, you'd have to consider the enemy's relative positions, firing range, numbers, defensive strength (cover, armour, etc.), and mobility. If you knew all this, you could make good decisions for almost any tactic; but I'm not sure how best to represent the data.


This is the heart of the problem for any real time decision system that is context aware. Essentially what you are asking for is a value function over (action,state) pairs (a cost function problem can be easily cast as a value function problem and vice-versa). Actually what you ideally want is a policy (a pairing of states and optimal actions) which covers the game state space. You could try learning such a value function or policy during a training period, but to be honest, I wouldn't go that way (at least not initially). This is an intractable problem in all but the simplest of games. This is why people go with heuristic solutions, or 'canned plans' devised during design time.
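
To make the terminology concrete, here's a rough sketch of what a value function and a policy look like as plain data structures; all the names here are mine, not from any particular library:

# A value function maps (state, action) pairs to numbers; a policy maps each
# state straight to the action currently considered best in that state.
# 'State' is just a hashable snapshot of whatever features you track.

from typing import Dict, Hashable, Tuple

State = Hashable
Action = str

value_function: Dict[Tuple[State, Action], float] = {}   # Q(s, a)
policy: Dict[State, Action] = {}                          # pi(s)

def greedy_policy(states, actions, q):
    # Derive a policy by picking the highest-valued action in each state.
    return {s: max(actions, key=lambda a: q.get((s, a), float("-inf")))
            for s in states}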

You could improve upon this by using a plan library, with heuristics for choosing a particular plan in any given scenario. You can also perform replanning by maintaining an estimate of which canned plan is best at any given time and always executing the current best one.
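
In code, the plan-library idea can be as simple as scoring each canned plan against the current world state every so often and only switching when a different plan wins by a clear margin. A rough sketch (all names invented, not from any existing engine):

# Minimal plan-library sketch: each plan carries a designer-written heuristic
# score function; we re-evaluate periodically and switch only when another
# plan beats the current one by a margin, to avoid thrashing between plans.

class PlanLibrary:
    def __init__(self, plans, switch_margin=0.1):
        self.plans = plans                  # list of (name, score_fn, steps)
        self.switch_margin = switch_margin
        self.current = None
        self.current_score = float("-inf")

    def reevaluate(self, world_state):
        best = max(self.plans, key=lambda p: p[1](world_state))
        best_score = best[1](world_state)
        if self.current is None or best_score > self.current_score + self.switch_margin:
            self.current, self.current_score = best, best_score
        return self.current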


Quote:I guess I'm trying to minimize casualties and, to a lesser extent, maximize the chance of success. Reordering actions sounds a bit complex. I think I'll just continually check the current results of the action and, if it's failing horribly, cancel it (which can trigger some relatively simple and immutable "regroup" goal).


You might find some benefit from reading some literature on partial order planning and schedule debugging.

Cheers,

Timkin
I was thinking yesterday that it must be possible to do GOAP backwards. That is, compare the AI agent's currently perceived state to the ideal (goal) state, and get a list of the compatible or necessary actions (based on their effects). Then you can sort them out by their preconditions and effects. Of course, there'd usually be a number of different ways of satisfying the goal, but at least you wouldn't need to search through the tens of thousands of possible states. Not that this is actually relevant to my question.

Now that I've thought about it a bit more, I see that extremely low-fidelity simulations should suffice for the action costs.
Quote:Original post by Baiame
I was thinking yesterday that it must be possible to do GOAP backwards. That is, compare the AI agent's currently perceived state to the ideal (goal) state, and get a list of the compatible or necessary actions (based on their effects). Then you can sort them out by their preconditions and effects.


If you use backtracking, as Orkin does for that matter, that's pretty much how GOAP works!

^ Really? Huh, didn't know that. Guess it makes sense though. Pathfinding through all the possible states would be excessive.
Quote:Original post by Baiame
^ Really? Huh, didn't know that. Guess it makes sense though. Pathfinding through all the possible states would be excessive.


It would be! So basically, you find all unsolved goals, and you only apply actions that may solve one or more of the goals of the current state. After applying an action you add its unsatisfied preconditions to the new goal list, and voilà: a new state. There are a few things you need to worry about, but that's the core of it.

For a FEAR-like game, in the most extreme situations you should not expand more than a few dozen states. In the best situations, you will expand only 1 state.
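
For what it's worth, here's a bare-bones sketch of that regressive search, assuming preconditions, effects and goals are all just sets of facts. Everything here is invented for illustration; a real GOAP planner would also carry per-action costs and search with A* rather than plain depth-first:

# Regressive (backward) planner sketch: an action is only considered if its
# effects satisfy at least one outstanding goal; applying it removes those
# goals and adds the action's unmet preconditions as new goals.

from collections import namedtuple

Action = namedtuple("Action", ["name", "preconditions", "effects"])  # sets of facts

def regress(goals, actions, current_state, plan=None, depth=8):
    plan = plan or []
    unsolved = goals - current_state
    if not unsolved:
        return plan                   # every goal already holds in the current state
    if depth == 0:
        return None                   # give up on overly long chains
    for act in actions:
        solved = act.effects & unsolved
        if not solved:
            continue                  # this action helps with none of the goals
        new_goals = (unsolved - solved) | (act.preconditions - current_state)
        result = regress(new_goals, actions, current_state, [act] + plan, depth - 1)
        if result is not None:
            return result
    return None

Depth-first with a depth cap is just the simplest thing that works on paper; as I understand it, Orkin's planner runs A* over essentially this same regression space, with each action contributing a cost.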
I would strongly suggest looking at HTN planning. I still haven't had time to implement a full HTN system, but on paper it addresses many of these problems. Personally, I believe it is a more intuitive way to structure behaviors for designers, who in my experience think in behavior sequences anyway.
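
To give a flavour of why it maps well to how designers think, an HTN domain is essentially a table of methods that decompose an abstract task into an ordered sequence of subtasks, picked by a precondition check. A toy sketch (all task names invented):

# Toy HTN-style decomposition: compound tasks map to ordered subtask lists
# chosen by a precondition; primitive tasks are the leaves the planner emits.

methods = {
    "attack_enemy": [
        # (precondition, ordered subtasks)
        (lambda s: s.get("has_ranged_weapon"), ["move_to_cover", "fire_at_enemy"]),
        (lambda s: True,                       ["move_to_enemy", "melee_attack"]),
    ],
}

def decompose(task, state):
    if task not in methods:                   # primitive task: emit it directly
        return [task]
    for precondition, subtasks in methods[task]:
        if not precondition(state):
            continue
        plan = []
        for sub in subtasks:
            sub_plan = decompose(sub, state)
            if sub_plan is None:
                break                         # this method doesn't work out
            plan += sub_plan
        else:
            return plan
    return None                               # no applicable method

# decompose("attack_enemy", {"has_ranged_weapon": True})
# -> ["move_to_cover", "fire_at_enemy"]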

FEAR's predecessors, NOLF2/TRON 2.0, did have dynamic priority, and we actually expected to add this to FEAR/Condemned. In practice, it was typically simpler to create multiple actions any time we wanted something like this. Part of this was that getting the priorities ordered to meet designer-specified behavior in the first place was hard; as soon as you start making these numbers fuzzy, you introduce more complexity. The one place we might have used this is pathing, as the plan duration varied significantly depending on how far the AI needed to travel.
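
The pathing case is also the easiest one to put a number on; even something as crude as scaling a base cost by estimated travel distance already makes far-away options lose out to nearby ones. A sketch (numbers and names are made up):

# Crude sketch: let an action's cost grow with how far the agent must travel.

def action_cost(base_cost, agent_pos, target_pos, cost_per_metre=0.05):
    dx = target_pos[0] - agent_pos[0]
    dy = target_pos[1] - agent_pos[1]
    distance = (dx * dx + dy * dy) ** 0.5     # straight-line estimate, not a real path
    return base_cost + cost_per_metre * distance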
The problem is that 'cost' is hard to calculate from the large number of factors present in a situation. Even if you have full knowledge of all the objects that will interact and their capabilities, you will likely still have uncertainty about what actions they will take (the opponent's potential tactics become a factor...). Cost itself may be several values with varying significance depending on the situation and goals: using up resources to reach the goal may be acceptable in one case and less desirable when preserving resources is itself one of the goals. You have to weigh taking damage (losing capability) or using up ammo against gaining a 'useful' tactical position or lessening the opposition's capabilities. Time may also be a resource that could be lost (e.g. in a multi-scenario campaign).
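
One common way to cope with that is to keep the cost factors separate and only combine them with goal-dependent weights at decision time; the factor names and weights below are purely illustrative:

# Sketch: cost kept as separate components, combined with weights that change
# with the current goal (a 'conserve resources' goal bumps the ammo weight, etc.).

def combined_cost(components, weights):
    # components/weights are dicts keyed by factor name, e.g. 'damage', 'ammo', 'time'
    return sum(weights.get(k, 0.0) * v for k, v in components.items())

assault_weights  = {"damage": 1.0, "ammo": 0.2, "time": 0.5}
conserve_weights = {"damage": 1.5, "ammo": 1.0, "time": 0.2}

attack_option = {"damage": 30.0, "ammo": 12.0, "time": 8.0}
# The same option costs 36.4 under assault weights but 58.6 under conserve weights.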

Risk is another consideration. It may take a high risk (of not succeeding, of losing resources) to enable the possibility of success when less risky approaches have no chance of success.

Metrics to judge any situation systematically (for cost/risk/gain) can be quite complex and hard to boil down to simple evaluation functions that can be applied at runtime. Just judging the value of gaining a position on a map depends on your known capabilities, your enemy's, and how the terrain configuration affects those capabilities.

A static situation can be pre-analysed extensively, but a (more) general solution is a major challenge even in a greatly simplified game world.

Expect a need for constant reevaluation as the situation changes; you may require probing tactics to try to gain information, or the cheap acquisition of a 'good' position which can be exploited. You can't really plan everything out ahead of time in a non-deterministic system. Complex plans fall apart very rapidly when key points are not achieved or get delayed. More versatile approaches which inherently give more options to adapt to a changing situation are usually more successful.








--------------------------------------------Ratings are Opinion, not Fact
^ Thanks for those thoughts. Having several different cost values seems a good idea. I don't think I'll bother with "risk", as I don't see how I could implement it.
Quote:Original post by BrianL
I would strongly suggest looking at HTN planning. I still haven't had time to implement a full HTN system, but on paper it addresses many of these problems. Personally, I believe it is a more intuitive way to structure behaviors for designers, who in my experience think in behavior sequences anyway.

Thanks for the suggestion, I'll look into it.
A brief outline of risk:

Risk can be considered in terms of the behaviour of an individual with regard to betting. Consider a simple lottery offering a probability p of winning L dollars and a probability 1-p of winning nothing. The expected reward from this lottery is pL dollars. Now, assume that I make you an offer of pL dollars not to play in the lottery. You are risk-neutral if you are indifferent between the lottery and my offer of pL. You are risk-taking if you would prefer the lottery, or would prefer it even against a higher offer (>pL). Finally, you are risk-averse if you would always accept my offer, or even a lower offer (<pL), not to play the lottery. To summarise, a risk-taking person will always gamble if the possible win is greater than the offer, a risk-averse person will always prefer the sure thing, and a risk-neutral person is in the middle.

So what does this mean for a goal-directed planning agent? It means that if the outcomes of actions are uncertain, a risk-taking agent will always choose actions that offer maximum utility (minimum cost); a risk-averse agent will choose actions with the highest probability of success; and a risk-neutral agent will choose actions that maximise expected utility (minimise expected cost).
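
In code, those three attitudes just amount to three different scoring rules over an action's possible outcomes. A small sketch, with made-up outcome data:

# Choosing an action under uncertainty with different risk attitudes.
# Each action is a list of (probability, utility) outcomes.

def best_utility(outcomes):
    return max(u for p, u in outcomes)                   # risk-taking: chase the best case

def success_probability(outcomes, threshold=0.0):
    return sum(p for p, u in outcomes if u > threshold)  # risk-averse: prefer the sure thing

def expected_utility(outcomes):
    return sum(p * u for p, u in outcomes)               # risk-neutral: expected value

actions = {
    "flank":      [(0.3, 100.0), (0.7, -20.0)],
    "hold_cover": [(0.9,  20.0), (0.1,  -5.0)],
}

risk_taking  = max(actions, key=lambda a: best_utility(actions[a]))         # 'flank'
risk_averse  = max(actions, key=lambda a: success_probability(actions[a]))  # 'hold_cover'
risk_neutral = max(actions, key=lambda a: expected_utility(actions[a]))     # 'hold_cover' (17.5 vs 16)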

Cheers,

Timkin
