
fup

Intelligent Agents - Planning


Does anyone have any good references to papers or articles that specifically discuss plan formulation within an agent architecture? Most architectures, like dMARS etc., use hard-coded plans for every possible intention, but I find this unsatisfactory - it always eats away at the back of my mind. Of course an agent may learn by observing and inferring, but this is only a partial solution. I'm trying to figure out some way of getting an agent to learn how to formulate plans without the aid of a teacher of some sort. So if any of you know anything, please let me know.

Stimulate

quote:
Original post by fup
Does anyone have any good references to papers or articles which specifically discuss plan formulation within an agent architecture?



Perhaps I'm misunderstanding what you are asking for, fup, but there's tonnes of literature on agent planning. Most of the approaches to solving POMDPs (Partially Observable Markov Decision Processes) are considered from the agent perspective.

I take it what you are looking for is an agent that, given a goal (or set of goals) in an environment, can formulate a plan using an action library to achieve the stated goal(s). Is this correct, or am I missing something?

Could you elaborate by providing an example problem that you are looking to solve and suggesting the sorts of limitations you want to place on the agent's knowledge?

Cheers,

Timkin

Ah, Timkin to the rescue! ;0) I think you've got the gist of what I'm after. I've had difficulty researching this topic because a) finding the correct keywords to put into search engines is a nightmare, and b) none of the papers I've read give references or citations to any relevant papers. (And I've read quite a lot on this subject, including all the stuff by Rao, Georgeff and Wooldridge.)

I use an adaptation of the BDI architecture. In my implementation - and in all others I have seen, including more advanced architectures like PRS and dMARS - when an agent has decided on an intention it chooses a plan from a hard-coded plan library based upon a cost function. It then divides the plan into the relevant atoms, again based upon a cost function.
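
Roughly, that selection step looks something like this (a minimal Python sketch; the Plan structure, cost values and intention names are invented for illustration, not taken from PRS or dMARS):

from dataclasses import dataclass

@dataclass
class Plan:
    achieves: str        # the intention this plan satisfies
    steps: list          # the atomic actions ("atoms"), in order
    base_cost: float     # static cost estimate for the plan

def select_plan(intention, plan_library, cost_fn):
    """Return the lowest-cost plan in the library that achieves the intention."""
    candidates = [p for p in plan_library if p.achieves == intention]
    if not candidates:
        return None      # no hard-coded plan covers this intention
    return min(candidates, key=cost_fn)

# Example usage with a made-up library and a trivial cost function.
library = [
    Plan("get_food", ["goto_shop", "buy_food", "return_home"], 3.0),
    Plan("get_food", ["hunt", "cook", "return_home"], 5.0),
]
print(select_plan("get_food", library, cost_fn=lambda p: p.base_cost).steps)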

This works well, but I'd like to find a way an agent can learn to formulate its own plans. I'm sure there must be research out there, but I'll be damned if I can find it!

I'll check out POMDPs. Do you have any good references? (The less technical, the better - I hate wading through pomp.)





Stimulate

The least technical reference to POMDPs would be the "POMDPs for Dummies" tutorial.
I'm not sure if this is what you are after, because it is rather low-level planning. If you are more interested in planning agents you can check some of these at citeseer. As you will see, most planning involves cooperation (multi-agent systems) or negotiation (agent brokers).
The learning part in these architectures often involves modelling the environment (including other agents) to form a belief state (just as POMDP approaches do).
I do not know what you mean by "formulating a plan without ... teacher". Just formulating wouldn't be enough to learn. The agent should at least be able to execute the plan to find out whether it was a good plan. This feedback can be used to come up with a better plan using genetic or reinforcement-learning approaches. Whether this is possible, and already exists, depends on your particular agent architecture and the environment of your application domain...
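
For example, a minimal tabular Q-learning sketch of that execute-and-improve loop (the env_step interface, states, actions and parameter values are placeholders, not tied to any particular agent architecture):

import random
from collections import defaultdict

def q_learning(env_step, states, actions, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    """env_step(state, action) -> (next_state, reward, done); a placeholder interface."""
    Q = defaultdict(float)                       # (state, action) -> estimated value
    for _ in range(episodes):
        state = random.choice(states)
        for _ in range(50):                      # cap the episode length
            if random.random() < epsilon:        # explore occasionally
                action = random.choice(actions)
            else:                                # otherwise act greedily
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env_step(state, action)
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
            if done:
                break
    return Q                                     # the greedy policy w.r.t. Q is the learned "plan"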

quote:
Original post by fup
Ah, Timkin to the rescue! ;0) I think you've got the gist of what I'm after. I've had difficulty researching this topic because a) finding the correct keywords to put into search engines is a nightmare, and b) none of the papers I've read give references or citations to any relevant papers. (And I've read quite a lot on this subject, including all the stuff by Rao, Georgeff and Wooldridge.)



Hehe... the usual problem with relying on the Net for information. 1 billion channels and still nothing on!

quote:
Original post by fup
I use an adaptation of the BDI architecture. In my implementation - and in all others I have seen, including more advanced architectures like PRS and dMARS - when an agent has decided on an intention it chooses a plan from a hard-coded plan library based upon a cost function. It then divides the plan into the relevant atoms, again based upon a cost function.

This works well but I'd like to find a way an agent can learn to formulate its own plans. I'm sure there must be research out there, but I'll be damned if I can find it!



There are essentially two paradigms for agents formulating actions to achieve goals: planning agents and reactive agents.

The latter is characterised by an agent function. Examples include subsumption architectures, classifier systems and other finite state automata with simple local stimulus-response mappings. The problem with reactive agents is that while they can maximise a local utility function, they provide no guarantees about the global optimality of the solution, and therefore cannot guarantee satisfying a goal using only local actions. Pengi is a good example of this (Agre & Chapman).
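
In code, a reactive agent is little more than a lookup from percept to action. A toy sketch (the rules are invented for illustration; this is not Pengi's actual behaviour):

def reactive_agent(percept):
    """percept: dict of boolean sensor readings -> one action string."""
    rules = [                                  # highest-priority rule first
        (lambda p: p.get("enemy_adjacent"), "flee"),
        (lambda p: p.get("food_adjacent"),  "eat"),
        (lambda p: p.get("obstacle_ahead"), "turn_left"),
    ]
    for condition, action in rules:
        if condition(percept):
            return action
    return "wander"                            # default behaviour

print(reactive_agent({"obstacle_ahead": True}))   # -> turn_left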

Planning agents, on the other hand, formulate plans prior to action. These agents generally have a preference structure for one plan over another, using either utility theory or value functions; the latter usually when reinforcement learning is involved. Plans are either selected from a set of canned plans in a plan library, or developed using search (either state-space search or plan-space search). Search-based plan construction is more common for agents acting in complex domains where there is uncertainty in their state.

Let's assume for a moment that you want to do planning in a simulated environment, so there is complete information about the state of the domain available to the agent. This information may contain noise and the transition function for the domain may be stochastic. In this case, an optimal policy (a plan that describes an optimal action in every state of the domain) can be found using policy iteration. This turns out to be the same policy you would find if you performed value iteration. This corresponds to solving an arbitrary Markov Decision Process (MDP). An agent designed to solve an MDP is just a Finite State Automaton.
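
A minimal value-iteration sketch for this fully observable case (the states, actions, transition table P and reward table R are abstract placeholders; this is illustrative, not a production MDP solver):

def value_iteration(states, actions, P, R, gamma=0.9, theta=1e-6):
    """P[s][a] is a list of (probability, next_state); R[s][a] is the immediate reward."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Greedy policy: the "plan" prescribing an optimal action in every state.
    policy = {
        s: max(actions, key=lambda a: R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a]))
        for s in states
    }
    return V, policy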

If the information given to the agent conveys only information about a subset of the states of the domain, then you have a POMDP. These are tougher to solve because the states of hidden variables must be inferred, which adds a large branching factor to the search tree.
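
The usual bookkeeping for the partially observable case is a belief state: a probability distribution over the hidden states, updated by Bayes' rule after each action and observation. A minimal discrete sketch (the transition model T and observation model O are assumed inputs):

def belief_update(belief, action, observation, T, O):
    """belief: dict state -> prob; T[s][action]: dict next_state -> prob;
    O[next_state][action]: dict observation -> prob."""
    new_belief = {}
    for s2 in belief:
        # Predict: how likely is s2 after taking the action from each possible state?
        predicted = sum(belief[s] * T[s][action].get(s2, 0.0) for s in belief)
        # Correct: weight by how well s2 explains the observation.
        new_belief[s2] = O[s2][action].get(observation, 0.0) * predicted
    total = sum(new_belief.values())
    return {s: p / total for s, p in new_belief.items()} if total > 0 else new_belief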

quote:
Original post by fup
I'll check out POMDPs. Do you have any good references? (the less technical, the better - I hate wading through pomp)



Unfortunately you're going to have to get your hands dirty if you want to understand this stuff and especially if you want to implement it. I'll try and direct you to the better papers on the subject. A good place to start with MDPs and POMDPs is Russell & Norvig, "Artificial Intelligence: A Modern Approach". If it's not already on your shelf it should be! This gives a basic overview of the topic with a little math and a fair amount of explanation.

After that, look at the following...

This paper is a must read for anyone looking at planning:

@misc{Dean94,
author = "Dean, T.",
title = "Decision-theoretic planning and markov decision processes",
url = "citeseer.nj.nec.com/33431.html",
year = "1994"
}

Other excellent works

@InProceedings{Basye89,
author = "Kenneth Basye and Thomas Dean and Jeffrey Scott Vitter",
title = "Coping with uncertainty in map learning",
booktitle = "Proceedings of the Eleventh International Joint
Conference on Artificial Intelligence (IJCAI)",
address = "Detroit, MI",
year = "1989"
}

@inproceedings{CassandraEtAl94,
author = "Cassandra, A. R., Kaebling, L.P., and Littman, M. L.",
title = "Acting optimally in partially observable stochastic domains",
booktitle = "Proceedings of the Twelfth National Conference on Artificial Intelligence",
year = 1994,
publisher = "AAAI Press",
pages = "1023-1028"
}


@inproceedings{CassandraEtAl96,
author = "Cassandra, A. R. and Kaelbling, L. P. and Kurien, J. A.",
title = "Acting under {U}ncertainty: {D}iscrete {B}ayesian {M}odels for {M}obile-{R}obot {N}avigation",
booktitle = "Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS-96)",
publisher = "IEEE",
year = 1996

@inproceedings{Parr95,
author = "Parr, R., and Russell, S.",
title = "Approximating optimal policies for partially observable stochastic domains",
booktitle = "Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95)",
publisher = "Morgan Kaufmann",
year = 1995,
pages = "1088--1094"
}


@book{Dean91,
author = "Thomas Dean and Michael P. Wellman",
title = "Planning and control",
publisher = "Morgan Kaufman Publishers",
address = "San Mateo, Ca.",
year = "1991"
}

A good work on reinforcement learning

@inproceedings{Rodriguez00,
author = "Rodriguez, A., Parr, R., and Koller, D.",
title = "Reinforcement learning using approximate belief states",
booktitle = "Advances in Neural Information Processing Systems 12",
publisher = "MIT Press",
year = 2000
}

Two classic papers everyone should at least glance at!

@book{Howard60,
author = "Howard, R. A.",
booktitle = "Dynamic Programming and Markov Processes",
publisher = "MIT Press",
address = "Cambridge, Massachusetts",
year = 1960
}

@book{Bellman57,
author = "Bellman, R. E.",
title = "Dynamic Programming",
publisher = "Princeton University Press",
address = "Princeton, New Jersey",
year = 1957
}


I could go on and list many more papers, however these are enough to get you going. In particular, look at Tom Dean's work and that of his students. Leslie Kaelbling does (did) a lot of work on planning in stochastic domains.


Good luck with this and feel free to ask questions if you hit any logs in the road.

Cheers,

Timkin

[edited by - Timkin on July 29, 2002 9:02:27 PM]

Timkin: Thanks for the references. Hopefully I'll find some clues when I read them.

Your preamble to the references was unnecessary though Timkin. Like I think I said, I already know a fair bit about, and use, agents based on the BDI architecture. Maybe you wrote your agent descriptions for the benefit of other readers but it comes across as though you are trying to teach me how to suck eggs. (btw, this is the point I was trying to make when you got upset the other week when a poster questioned your understanding of vectors...)

The BDI model of agency implements agents that do not have complete state information. They must infer from their beliefs about the world. They formulate beliefs every time they interact with objects in their environment (using whatever senses they are imbued with). Therefore each agent must build some sort of decision tree based upon this incomplete (and occasionally incorrect) information in order to infer knowledge about its world. Agents may also infer beliefs from observing other agents interact with the environment. (They may also infer plans based upon their observations of other agents.)
For further information see Rao & Georgeff, "BDI Agents: From Theory to Practice", 1995 [citeseer], or "Reasoning about Rational Agents" by Michael Wooldridge (MIT Press) [Amazon or a university library].
Implementations: for the best of the bunch, look up PRS, dMARS, JACK and JAM. All of these, to the best of my knowledge, use fixed plan libraries.


Argus: I have no idea what STRIPS is. Please elaborate.

smilydon: Thanks for trying, but that's exactly the sort of problem I mentioned. Using keywords such as "planning + agents" just comes up with far too many references.





Stimulate

[edited by - fup on July 30, 2002 7:36:12 AM]

You shouldn't use citeseer like Google. The keywords are just to get started, and then you do a kind of tree search. You pick one entry from the top of the keyword list and then you get context information about that paper, like "visitors who looked at this paper also looked at..." or similar documents at the sentence level.
Then you select new papers at that level that seem interesting.
This is the way I use citeseer (and I'm really happy about the tabs in Mozilla). It is a fast way to explore an area of research and can also be very inspirational. Looking for papers is not only about finding that paper, but also about the things you learn while looking (hmm, sounds a bit Taoist)...

To add to Timkin's list:
You can look at Mahadevan's more recent publications (here). They are also about planning in stochastic domains. The hierarchical POMDP work looks interesting.
PHAMs also seem interesting for doing the learning (see, e.g., the work of Andre and Russell here).

Edit: oops forgot a " in the url..

[edited by - smilydon on July 30, 2002 4:28:37 PM]

Fup: STRIPS is one of a number of architectures for plan generation - it's old and very simple. However, I don't know if it's useful for what you're doing, particularly if you don't have complete state information. In fact, now I'm sure you require something different.

Basically, it's just a scheme for searching a space for a plan. You define the actions one can take and how they alter the state, and then you can search for a path to the goal in whatever fashion suits your problem (backwards chaining is usually best).
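
A bare-bones sketch of the idea (this one searches forwards with breadth-first search rather than the backward chaining mentioned above, and the two actions are invented purely for illustration):

from collections import deque

def strips_plan(initial, goal, actions):
    """initial, goal: sets of facts; actions: dict name -> (preconditions, add_list, delete_list)."""
    start = frozenset(initial)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:                          # all goal facts hold
            return plan
        for name, (pre, add, delete) in actions.items():
            if pre <= state:                       # action is applicable
                nxt = frozenset((state - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None                                    # no plan found

actions = {
    "pick_up_key": ({"at_door", "key_at_door"}, {"has_key"}, {"key_at_door"}),
    "open_door":   ({"at_door", "has_key"},     {"door_open"}, set()),
}
print(strips_plan({"at_door", "key_at_door"}, {"door_open"}, actions))
# -> ['pick_up_key', 'open_door']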

I'm sure you know about this kind of planning in some form or another, but just in case.


[edited by - Argus on July 30, 2002 6:01:36 PM]

Smilydon: Yep, that's how I use Citeseer normally, and like you say, it is extremely inspirational (too bloody inspirational sometimes, as I'm sure you know!) - but this is the first time I've struggled to get the exact info I'm after. 'Planning' is really too vague a word in this context - I think that's the problem. Anyway, thanks to everyone's suggestions I now have another huge pile of printed papers sitting by my desk to read through.

Argus: thanks for the explanation. I'd not heard of STRIPS before, although I'm familiar with the technique. And yes, it's way too simple. One of the big problems is that the agents not only have incomplete state knowledge, they also don't know how some of the actions will alter the state. They have to discover this information by experimentation (or by communicating with another agent which has already calculated the utility of that particular action).
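
One simple way to start on the experimentation side might be to keep frequency counts of observed (state, action) -> next-state transitions and treat the empirical distribution as a learned model a planner can then use. A rough sketch (state and action types are left abstract, purely for illustration):

from collections import defaultdict

class TransitionModel:
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}

    def record(self, state, action, next_state):
        """Call after every experiment/interaction with the environment."""
        self.counts[(state, action)][next_state] += 1

    def distribution(self, state, action):
        """Empirical P(s' | s, a); empty dict if the pair has never been tried."""
        outcomes = self.counts[(state, action)]
        total = sum(outcomes.values())
        return {s2: c / total for s2, c in outcomes.items()} if total else {}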



[edited by - fup on July 30, 2002 11:54:02 PM]

Hey, good extra links, smilydon! I've not seen many of those before. Some of the other papers look interesting too. Have you had personal experience with this type of problem, or are these links just a result of your encyclopaedic knowledge?

PS. It's my birthday tomorrow. I've taken a few days off so my brain cells may not recover from the battering I'm going to give them 'til early next week.

See you all then.

[edited by - fup on July 30, 2002 11:53:41 PM]

quote:
Original post by fup
Your preamble to the references was unnecessary though Timkin.



It wasn't aimed at you but rather at everyone else's benefit... but I can see how it could look that way, since I neglected to mention that. You have my sincere apologies for the egg-sucking exercise!

As a follow-up, you might want to look into the work done by Michael Kearns at AT&T Labs. Also, the Intelligent Agent Lab at Melbourne University does work in cooperation with DSTO here in Australia on using BDI for agents in simulated environments. Interestingly enough, Emma Norling, who is doing her PhD with the IALab, is married to Peter Wallis, who is a long-time researcher on the JACK project (which arose from work at the AAII here in Australia). I had a great chat with Peter a few months ago... it seems JACK is looking for other application domains and they wanted to have a crack at applying it to UAV flight management, hence his picking my brain. The work that Emma is doing would certainly interest you, fup, as it overlaps with some of the stuff you have been doing in exploration and map learning.

I hope this helps.

Good luck,

Timkin
