How to do feature extraction & weighting (long)
So Timkin and i have been discussing some stuff off-line about analogical reasoning, determining similarity, feature extraction, feature weighting and stuff that would be useful for learning tactics in games. i figured we might be getting close enough to something useful/actionable that we might as well bring this back to the newsgroups.
So to make this understandable, let's do some examples. There are two armies, one made of 50 bowmen and one made of 50 light infantry. Let's say you control the infantry guys. Will you win or will the bowmen? That depends in part on strength - maybe you both do 100-150 points of damage. But if that's all it depended on, you'd figure you have a 50/50 chance.
But like rock/paper/scissors, everyone has their strengths and weaknesses relative to another piece. Bowmen, perhaps, do well on ranged attacks but fight at half strength hand to hand. And their ranged attacks only work if they have line of sight (no trees, bushes, etc. in the way) and if the enemy doesn't have overhead cover.
So two issues come up. First, how do you know what "features" of the world are worth tracking? i mean, after all, you can track absolute position, relative position, number of pieces, type of pieces, color of pieces, time of day, type of terrain, player name, etc. Some of those things are kinda important, some are very important, some aren't important at all.
And to complicate things, some are only important when other conditions are true. In our example, the amount of overhead coverage isn't important if the infantrymen are standing next to the bowmen (melee range). Rivers and bottlenecks are important, but only if the pieces are melee pieces and not ranged. You get the idea.
Why do we want to know this? First, should you attack or run away? Second, if you have multiple avenues of approach (say, one with overhead cover, one across an open plain), which should you choose? Third, if you know what terrain you're fighting in or what type of enemy you'll face, what pieces should you build or buy? Fourth, let's say mom made you hire your idiot cousin to write the AI battle rules and you don't trust him to get it right. Maybe he'll forget a couple of cases or won't take feedback from the playtesters. How can you automatically learn the rules, or the first pass at the rules?
So the issue here is: how do we learn what features to pay attention to and how much to weight each one? A tricky topic, but we can at least take a first stab at it.
And i think the key to solving a problem is to define some good questions:
1. How do you know which features (raw data) are worth tracking?
2. How do you know which features to derive?
3. How do you describe the raw data and the important features?
4. How do you get enough data to learn from?
As for #1, the approach we've discussed so far is decision trees. Probably the standard one where you use "information theory". i don't know if that's the right approach but it's a starting point.
For those who don't know decision trees and information gain, there are several tutorials in books and some free software available (notably ID3 and its big brother C4.5; i know you can get ID3 in C++ and prolog and probably everything else; Wiki has it in open source java and i did an implementation in C# in roughly 30 minutes - it's not hard). But here's the basic idea:
First, get a list of traits and one predicted trait that you have to fill in (it's supervised learning, meaning you have to say what the right answers are). The outcome is a classifier. It can be yes/no or an enumeration (say, one of five types of restaurant, one of 20 specified spells, one of 800 unit types, whatever - just no unlimited values like strings or numbers).
Second, get a big list of examples, since that's what you have to run the decision tree building algorithm against. If we were doing action types (say, attack melee, attack ranged, attack spell, run away and defend), you might have 50 examples of combat situations you've been in, and you will have scored (by hand, probably) the proper response.
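To make the data format concrete, here's a toy version of such a training set. The attribute names and the hand-scored answers are all made up for illustration - your real traits would come from your game:

```python
# A tiny hand-scored training set, one flat record per combat situation.
# Each record has the same fixed set of traits plus the supervised answer ("action").
examples = [
    {"range": "far",   "los": True,  "cover": False, "outnumbered": False, "action": "attack_ranged"},
    {"range": "far",   "los": False, "cover": False, "outnumbered": False, "action": "attack_melee"},
    {"range": "close", "los": True,  "cover": True,  "outnumbered": False, "action": "attack_melee"},
    {"range": "close", "los": True,  "cover": True,  "outnumbered": True,  "action": "run_away"},
    {"range": "far",   "los": True,  "cover": True,  "outnumbered": True,  "action": "defend"},
]
```

In practice you'd have 50+ of these, but the shape is the point: fixed-length, single-level records with an enumerated outcome.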
OK, now step three. For each trait, use "information theory" to rank them. That means you find the trait that results in the least entropy, which means you find the best discriminant, which means you figure out what trait best divides the group into smaller groups where everyone in the smaller group has the same outcome (or result, or classification, or whatever you want to call it). There's a formula for that but it's pretty simple really. In our example, we have 50 test cases where each case can have one (and only one) of 5 possible recommendations. The worst trait is the one that divides the test cases into groups where the cases have different recommendations. The best trait is the one that divides the group into groups where everyone has the same recommendation. One of the better descriptions is at http://www.cse.unsw.edu.au/~billw/cs9414/notes/ml/06prop/id3/id3.html
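For the curious, that formula really is short. Here's a sketch of the entropy and information-gain calculations, assuming cases stored as dicts with an "action" outcome (the field name is just an assumption for illustration):

```python
import math
from collections import Counter

def entropy(examples, target="action"):
    """Shannon entropy of the outcome labels, in bits. 0 means everyone
    in the group has the same recommendation; higher means more mixed."""
    counts = Counter(e[target] for e in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attribute, target="action"):
    """How much splitting on `attribute` reduces entropy.
    The best discriminant is the attribute that maximizes this."""
    total = len(examples)
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e for e in examples if e[attribute] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(examples, target) - remainder
```

ID3 just picks the max-gain attribute, splits, and recurses on each subgroup until the groups are pure (or you run out of attributes).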
The outcome will be a tree which boils down to a series of IF THEN statements (connected by ANDs, and where each trait has only one possible value or continuous value range). You could write those IF THEN statements yourself, but then you'd have to know what to write. This auto-generates the logic for you.
For those following the AI stuff, we're making decisions through a sequence of filters. First check to see how close you are. Then, if you're far away, check for overhead cover. Third, check for whatever. You get the idea. An alternative is to check the combination of all the values at the same time using something like nearest neighbor (which is what is used in case based reasoning). That's a pretty simple thing to implement too (i've written an open source version in Java and you can find the algorithm all over the place). You gotta decide whether to use a sequence of filtering questions or a combination of values because the two approaches are mutually exclusive. Me personally, i really like the CBR (NN) approach and i think it works better for some of the products i've worked on in the past (notably product recommendation engines), but when i did human tests (for evaluating FPS situations) i found that humans seemed to make decisions the first way, by walking through a series of questions. So know your requirements before you pick an approach.
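A minimal sketch of the nearest-neighbor (CBR-style) alternative, assuming flat cases stored as dicts and hand-picked per-trait weights. Picking those weights well is, of course, the whole feature-weighting problem this thread is about:

```python
def similarity(case_a, case_b, weights):
    """Weighted fraction of matching traits between two flat cases.
    `weights` maps trait name -> importance; mismatched or missing traits score 0."""
    matched = sum(w for trait, w in weights.items()
                  if case_a.get(trait) == case_b.get(trait))
    return matched / sum(weights.values())

def nearest_neighbor(new_case, case_base, weights):
    """Return the stored case most similar to the new situation,
    considering all traits at once instead of filtering sequentially."""
    return max(case_base, key=lambda c: similarity(new_case, c, weights))
```

The retrieved case's scored response is then your recommendation for the new situation.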
OK, so that *might* help us determine what features to use. It works well enough for certain problems but i haven't tried it on a fancy pants real world game so i can't promise anything. i'm sure something will come up. i know from experience you will occasionally get bad matches, especially if you do pruning or otherwise try to avoid overfitting. It's not uncommon that you'll use all the attributes and end up with a group where 10 of the members have the right classification but one doesn't. Not sure why that happens. For manually scored data, it's normally (in my experience) that you're missing an attribute. For automatically scored data (try something and see if it worked or not) the discrepancy is normally because of probability and chance (you roll your 20 sided die and sometimes you win, sometimes you lose).
Timkin had a concern which i listed as question 4 - how do you get enough data to learn the right decisions? Games have a lot of possible choices, so your issues are 1) you might not try some of the combinations (oops, a logic hole), 2) you might not try enough times and so not get good data and 3) it could take a while. i personally don't think it would be an issue but, hey, until you try it you never know. The more decisions you have, the more you have to test and the more time it takes. But i figure you can have two AIs doing random moves as fast as possible for a few hours to get a big knowledge base. But hmm, you'd have to manually score that, which isn't very good. Who wants to score 3 million cases? The best case is you find one case for each possible situation (an exemplar case). But that means you have to manually make sure all possible combinations are covered and then score them, and by the time you've done all that work maybe you would have been better off just writing the IF THEN statements yourself. Hmm, gotta give this more thought.
OK, so here's the problem i have - question #2, derived attributes. This is something i learned while doing some tests on how humans determine similarity between situations in a game. Given a map, some good guys and bad guys and various traits of those guys (weapons, armor, HP, position, direction, etc.), i found that the 10 people i tested first matched on the map, then matched on a derived attribute - ((ratio of good guys to bad guys) and (sum of good guys and bad guys)). Actually, i suppose it was a derived combination of two derived attributes (three levels of derivation). And by derived i mean that it's not in the raw data. How do you get a stupid computer to figure out what attributes to combine and how to combine them?
One option, a really obvious option, is to define a couple of ways to massage data (ratios, sums, whatever) and then apply them to every attribute and permutation of attributes. Those become traits just like the raw data and then you run the decision tree generator on all of them. The DT picks the best attributes (probably only 4 or 5) and throws the rest away. You do that off line before you write the game, so you won't do this during run time. Even so, how long would that take to run? i honestly don't know. It sounds like a long time, although i said that before about a java program i wrote to de-noise, Porterize and find bigrams and trigrams in a book and that only took a few seconds, so who knows. i should probably write a test program to see how practical this is.
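That brute-force option might look something like this: apply a couple of massage functions (sums and ratios here, as in the good-guys/bad-guys example) to every pair of numeric attributes and let the DT generator sort out which derived traits matter. The naming scheme is just an illustration:

```python
from itertools import combinations

def derive_features(case, numeric_attrs):
    """Append sum and ratio traits for every pair of numeric attributes.
    The derived values become ordinary traits; the offline DT build
    decides which (if any) of them are worth keeping."""
    derived = dict(case)
    for a, b in combinations(numeric_attrs, 2):
        derived[f"sum({a},{b})"] = case[a] + case[b]
        if case[b] != 0:  # skip undefined ratios
            derived[f"ratio({a},{b})"] = case[a] / case[b]
    return derived
```

With k numeric attributes and m massage functions this adds on the order of m * k^2 traits per case, which is why it's an offline, before-you-ship step.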
Another option is to, um... Actually, i don't know.
Gee, this post isn't long enough yet, so let's talk about Q3, representation.
Timkin asked "Do we need a "robust description language"?" After all, planners have one (PDDL) and so do constraint based reasoners (OCL i think; it's been a while) and kinda/sorta multi-agent systems (KQML). Do we need a scene/situation description language that a DT or whatever could run on?
i'll go off topic a bit to mention that a couple of people (academic) have written systems to compute similarity between "stuff". RIBL and Bisson's matcher use standard first order (predicate) logic. TROPES uses objects. SME, i believe, requires the descriptions to be written in lisp. Most decision tree implementations (like CBR ones) take data in a fixed flat file format like pipe delimited fields (yes, the number of fields is fixed and no, using XML to add hierarchical data isn't an option that i know of). i've also seen the input as a Prolog fixed length array, which is just a comma delimited file.
So do we need a scene description language? Honestly, i don't know. If there were a fixed, known format, lots of people could write DT generators and we could conceivably get critical mass. Maybe. i don't know. The input for DTs is actually pretty well known - a fixed length, single level (non-hierarchical) array.
Now in my code, and presumably a lot of other domains, the raw data is in objects, which means someone has to write something to convert the source data to these fixed length arrays. Which basically means extracting the data you think is important, doing whatever derivations you need and then pulling it into a flat file or array. Alternately, i suppose, you could write a DT generator that knew which methods on which objects to pull the data from and how to derive proper attributes, but my first guess is that's a bad idea (mixing of roles, makes portability/reuse a problem). But i'd convert the DT to a series of IF THEN rules and maybe have that call the objects directly.
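A sketch of that conversion step, with a hypothetical Unit object and an invented field schema standing in for whatever your game actually stores:

```python
from dataclasses import dataclass

@dataclass
class Unit:
    """Hypothetical domain object; your game's real classes go here."""
    kind: str
    hp: int
    x: int
    y: int
    has_cover: bool

def flatten_unit(u):
    """Pull the traits you think are important into a fixed-length,
    single-level record - the shape a DT generator expects."""
    return [u.kind, u.hp, u.x, u.y, u.has_cover]

def to_pipe_delimited(u):
    """Same record as a pipe-delimited line for a flat-file DT tool."""
    return "|".join(str(field) for field in flatten_unit(u))
```

The field choice and ordering live in this one conversion layer, so the DT generator itself stays portable across domains.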
What else can a description language do? Hmm, better let Timkin answer that one 'cause i'm sure there's more to it than what i've mentioned.
Oh well, long enough post. It's probably better that i stop and go write some sample code and then come up with some realistic sample domains. As they say, the ideas are easy, the devil is in the details.
-baylor
I can't believe I read the whole thing! 8^)
Anyway, the decision tree idea seems to be a good one and information gain is as good as any other possible induction rule. It's actually been shown empirically that the decision rule is not all that important, but that's a topic for another day. The fundamental problem you're going to have, which I think you've already realized, is that you need data with known endpoints. In other words, you need to generate a training set that consists of known attributes and whether it was a winner or not. This may be a serious difficulty in this scenario. As for the time required, I wouldn't worry about that at all, as induction of a decision tree is very fast.
One aspect you mentioned that caught my attention was the exemplar case. Suppose you have a given number of different "matches" which consist of a pairing between a good guy and a bad guy. In keeping with your example, you have a bowman vs. an infantryman. You've already identified a series of features which determine the outcome in an exemplar case - line of sight, overhead cover, close range, etc. If you were to work with the various features that you've already identified in a probabilistic way, you might be able to come up with something very reasonable. For example, given no overhead cover, 100% line of sight, and a range of 100 yards, the bowman has a 95% chance of hitting. Now, given the same with 10% overhead cover, you have an 85% chance of hitting. And so on, and so on. The idea I have is to link the various features you've already identified to a probability score, then let random chance take over.
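A toy sketch of that idea - a hit-chance function that scales a base probability by the features, then a die roll. The formula and its numbers are assumptions for illustration (they happen to reproduce the 95% and ~85% figures above); range and other features could be folded in the same multiplicative way:

```python
import random

def hit_chance(cover_fraction, los_fraction):
    """Link the features to a probability score: start from a base 95%
    chance and discount it by partial cover and partial line of sight."""
    p = 0.95 * los_fraction * (1.0 - cover_fraction)
    return max(0.0, min(1.0, p))  # clamp to a valid probability

def bowman_hits(cover_fraction, los_fraction, rng=random.random):
    """Roll against the computed chance - 'let random chance take over'."""
    return rng() < hit_chance(cover_fraction, los_fraction)
```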
I hope that makes sense. This should be an interesting discussion...
-Kirk
I've written a chapter about this. Information theory all the way!
I like to think of feature extraction as a design problem (especially in computer games), so I'm not sure decision trees are necessary to pick the n best static features...
Alex
So Alex, where is this chapter? I'd certainly love to read it!
I've looked briefly at the proposed MPEG-7 codec, which incorporates a protocol for describing scenes. However, I don't think it is appropriate for games, since it requires an operator to encode the description of the video sequence.
My personal opinion is that a good scene description language (SDL) is needed before one can start to perform classification, primarily because without one, we are left with expressing the scene in terms of the state of the game. The result is that a) the state may be unbounded (and hence classification not possible in a finite decision tree or equivalent); b) the important features may be (possibly non-linear) functions of the state (for example, the ratio of the relative densities of archers to footmen); and c) we may wish to perform inference on what is perceived of the state, not of the actual state (this is certainly possible using the state if we permit an uncertainty representation - like probabilities - and perform inference... although it is computationally expensive). The alternative to this last point would be to limit the perceptions of the agent to only certain features... for example, the agent may perceive that there are many more archers than footmen (but not exactly how many of each... just a fuzzy classification).
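The fuzzy-perception alternative in that last point could be as simple as binning the raw ratio into qualitative labels - the agent perceives "many more archers than footmen" rather than the exact counts. The thresholds here are arbitrary illustration values:

```python
def perceive_ratio(archers, footmen):
    """Map the exact state (unit counts) to a fuzzy perception.
    The agent reasons over these labels, not the raw numbers."""
    r = archers / max(footmen, 1)  # avoid division by zero
    if r > 2.0:
        return "many more archers"
    if r > 1.2:
        return "more archers"
    if r > 0.8:
        return "about even"
    if r > 0.5:
        return "more footmen"
    return "many more footmen"
```

This also bounds the state for classification: five labels instead of an unbounded pair of integers.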
Once we have a scene depicted in our SDL, it is a fairly trivial matter to find other scenes from our past experience that are similar to the current scene. Thus, classification is a purely computational task given the SDL.
As to feature extraction (i.e., knowing what to classify on)... I want to assume that we don't have lots of data to play with. If we do, the solution is well defined. But again, as Baylor pointed out, what if we forgot to test a given scenario or context when we created the data? We're stuffed. I want to limit the problem to a priori estimation of the important features of a scene... or to learning these through trial and error (reinforcement).
Please feel free to share your thoughts on this idea.
Thanks,
Timkin