AI that uses stats to determine if two variables are independent or not

Started by
7 comments, last by Dwiel 21 years, 2 months ago
Hello, I have learned about chi-squared testing in statistics class and believe that it has a very powerful application in the AI field. For those of you who don't know what a chi-squared test is (pronounced: KI like hi, but with a K), it is used to determine if two variables are independent or not. This is only one of the few things it can accomplish, but it is the one we are interested in at the moment. What you do is give it a table of counts such as:

music preference | Rap | other |
--------------------------------
< 30 years old   |  25 |   75  | 
>= 30 years old  |  2  |   107 |
 
The numbers in the cells are the number of people in the sample that fall into that category. NOTE: no means or proportions can be used here. ONLY COUNTS! What the chi-squared test would tell us here is that the two variables, music preference and age, are not independent.

Now I hear you asking: "How in the world does this help us?" Well, if we could have our AI test many variables against each other and be able to tell how dependent they are on each other, the AI would be able to 'know' which variables affect each other. In this way, the AI would be able to tell that, say, there is a large correlation between the armor strength of a unit in an RTS and damage taken. It would then be able to carry out further (very basic) regression techniques and determine what the correlation is. It would then 'know' that more armor means less damage taken and base its following actions on this knowledge. Because we can tell if two variables are related to each other, we can narrow down which variables affect each other and keep the AI from trying to find patterns between two completely unrelated variables.

Any ideas? Is this already used?

Tazzel3d ~ Dwiel

EDIT: tried to use pre tags instead of code tags

[edited by - Tazzel3d on January 31, 2003 5:51:22 PM]
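For anyone who wants to try the test on the table above, here is a minimal sketch in Python. The function name `chi_squared` and the variable names are my own choices for illustration, and the p-value shortcut using `math.erfc` only holds for a 2x2 table (one degree of freedom).

```python
import math

def chi_squared(table):
    """Return (statistic, degrees_of_freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect in this cell if the variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

# Rows: < 30 years old, >= 30 years old. Columns: Rap, other.
counts = [[25, 75], [2, 107]]
stat, dof = chi_squared(counts)

# With one degree of freedom the upper-tail probability has a closed form.
p_value = math.erfc(math.sqrt(stat / 2))
print(stat, dof, p_value)
```

The statistic comes out at roughly 24.9, far above the usual 5% critical value of 3.841 for one degree of freedom, which is why the test rejects independence for this table.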
Yeah, statistics class gave me some ideas too when I took it. I forgot about them, so I can't tell you how well they work, but go for it!
Yes, there is a large overlap between statistics and machine learning. Chi-square and similar statistics have been used directly in machine learning algorithms for years. In my data mining work, for instance, one tool I use to look for relationships in data uses a statistical calculation to decide whether individual variables are worth further investigation or not.

[edited by - Predictor on February 1, 2003 8:45:05 AM]
Independence between variables is only interesting if you want to compute joint probabilities. In the case of independent variables you can simply multiply their individual probabilities. This is something you want to know when you work on raw data, like measurements or values in a large database.
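The multiplication rule can be checked directly on the music table from the first post. This is only a quick sketch with variable names I made up; it compares the observed joint proportion with the product of the two marginal proportions.

```python
# Counts from the first post: rows are age groups, columns are Rap / other.
under_30 = {"rap": 25, "other": 75}
over_30 = {"rap": 2, "other": 107}
total = sum(under_30.values()) + sum(over_30.values())

p_under_30 = sum(under_30.values()) / total
p_rap = (under_30["rap"] + over_30["rap"]) / total

# Under independence these two numbers should be close to each other.
p_joint_observed = under_30["rap"] / total
p_joint_if_independent = p_under_30 * p_rap

print(p_joint_observed, p_joint_if_independent)
```

Here the observed joint proportion is about twice the product of the marginals, so multiplying the individual probabilities would give the wrong answer for this data.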

The example of the armor strength only tells you that the armor will reduce the damage inflicted. This is something that's already known by the programmer (it's what the armor is used for). If you want to simplify the AI you should look at higher-level abstract concepts that are not immediately clear to the programmer, like "life expectancy under enemy fire". The value of LifeExp depends on the unit's hit points (hp), the unit's armor strength (as) and the enemy's weapon strength (ws). If you can compute LifeExp you can make a simple AI like:

    if (LifeExp(your_as, your_hp, his_ws) > LifeExp(his_as, his_hp, your_ws)) {
        // you will win
        Attack();
    } else {
        // you will lose
        RunAway();  // or GetHelp();
    }
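A runnable version of that decision might look like the sketch below. The formula inside `life_exp` (hit points divided by damage per hit, with at least 1 damage) is purely an assumption for illustration, as are the dictionary keys and the example units; a real game would plug in its own combat rules.

```python
def life_exp(armor, hit_points, enemy_weapon):
    """Hypothetical life expectancy: number of enemy hits a unit survives."""
    damage_per_hit = max(enemy_weapon - armor, 1)  # assumed damage rule
    return hit_points / damage_per_hit

def choose_action(me, enemy):
    """Attack only if we are expected to outlast the enemy."""
    mine = life_exp(me["as"], me["hp"], enemy["ws"])
    theirs = life_exp(enemy["as"], enemy["hp"], me["ws"])
    return "attack" if mine > theirs else "run_away"

# Made-up units: as = armor strength, hp = hit points, ws = weapon strength.
tank = {"as": 5, "hp": 100, "ws": 10}
scout = {"as": 2, "hp": 80, "ws": 8}

print(choose_action(tank, scout))
print(choose_action(scout, tank))
```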


The point here is that LifeExp should DEPEND on as, hp and ws. You can make the units more interesting by giving them different speeds and giving the weapons different ranges. These can also be included in LifeExp. Including something like unit color in LifeExp is useless when there is NO correlation between color and LifeExp, so then you shouldn't use it. If you generate test fights between different units with different equipment, you can use chi-square to test for dependencies between LifeExp and each attribute. Attributes that turn out to be independent of LifeExp can be ignored.
The other way around is also possible. As a game designer you want to balance your units. If you find out that one unit is too weak compared to the other units, then you can find out which attributes have to be changed (changing an independent attribute is pointless). For a simple game you can guess which attributes have to be changed. For very large RTS or RPG games with dozens of units, armor types and weapons, statistical tricks like chi-squared become essential for balancing the game.
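The test-fight idea can be sketched roughly as follows. Everything here is an assumption made for illustration: the `life_exp` formula, the attribute ranges, the threshold for a "long" life, and the helper names. The fights are simulated with a fixed seed, and each attribute is reduced to a yes/no flag so that a 2x2 table of counts can be tested.

```python
import random

def life_exp(armor, hit_points, enemy_weapon):
    """Hypothetical life expectancy: hits survived under an assumed damage rule."""
    return hit_points / max(enemy_weapon - armor, 1)

def chi_squared(table):
    """Chi-squared statistic for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return sum(
        (obs - row_totals[i] * col_totals[j] / total) ** 2
        / (row_totals[i] * col_totals[j] / total)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )

def dependence_on(attribute, fights, threshold):
    """Build a 2x2 table (attribute flag vs. long/short life) and test it."""
    table = [[0, 0], [0, 0]]
    for fight in fights:
        row = 0 if fight[attribute] else 1
        col = 0 if fight["life"] >= threshold else 1
        table[row][col] += 1
    return chi_squared(table)

rng = random.Random(0)  # fixed seed so the run is repeatable
fights = []
for _ in range(2000):
    armor = rng.randint(0, 5)
    weapon = rng.randint(6, 12)
    fights.append({
        "high_armor": armor >= 3,
        "is_red": rng.random() < 0.5,  # color plays no role in the fight
        "life": life_exp(armor, 100, weapon),
    })

CRITICAL_5_PERCENT_DF1 = 3.841
threshold = 100 / 6  # "long life" means surviving hits of 6 damage or less
armor_stat = dependence_on("high_armor", fights, threshold)
color_stat = dependence_on("is_red", fights, threshold)
print(armor_stat, color_stat)
```

An attribute whose statistic stays below the critical value is a candidate for leaving out of LifeExp. Keep in mind that about 1 in 20 truly independent attributes will still cross the 5% line by chance.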

BTW: machine learning == statistics with cool buzzwords
Thanx for the thoughts guys!

I do know that the example I gave, which used the chi2 to test if armor and damage taken are related, was just an example. I was showing that basic relationships can be shown to exist between two variables. Expanding on your points, I believe a very good way of using the life expectancy example is to use the chi2 to determine if there are relationships between variables and then use, say, an NN to find the relationship. This allows your NN to search for relationships between variables we know have relationships instead of blindly guessing. One thing I'm not too sure about, though, is whether the chi2 idea, where you find the expected counts of cells, will work with more than 2 variables. In my AP stat class we have never discussed 3-variable chi2. Does the same basic idea still work? Where you find the expected counts for the cells assuming that the variables are independent and then see how far off they are from the actual data?
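For what it's worth, the expected-count idea does carry over to three variables when the hypothesis is that all three are mutually independent: the expected count of a cell is the product of its three marginal totals divided by the square of the grand total. Below is a small sketch of that; the function name and both tables of counts are made up for illustration.

```python
def chi_squared_3way(counts):
    """counts[i][j][k] -> (statistic, dof) for a test of mutual independence."""
    I, J, K = len(counts), len(counts[0]), len(counts[0][0])
    cells = [(i, j, k) for i in range(I) for j in range(J) for k in range(K)]
    total = sum(counts[i][j][k] for i, j, k in cells)
    # Marginal totals for each of the three variables.
    a = [sum(counts[i][j][k] for j in range(J) for k in range(K)) for i in range(I)]
    b = [sum(counts[i][j][k] for i in range(I) for k in range(K)) for j in range(J)]
    c = [sum(counts[i][j][k] for i in range(I) for j in range(J)) for k in range(K)]
    stat = 0.0
    for i, j, k in cells:
        expected = a[i] * b[j] * c[k] / total ** 2
        stat += (counts[i][j][k] - expected) ** 2 / expected
    dof = I * J * K - I - J - K + 2
    return stat, dof

# Made-up table whose cells are exact products of marginal weights.
independent = [[[2, 1], [6, 3]], [[4, 2], [12, 6]]]
# Made-up table with an obvious pattern between the variables.
dependent = [[[10, 1], [1, 10]], [[1, 10], [10, 1]]]

print(chi_squared_3way(independent))
print(chi_squared_3way(dependent))
```

A large statistic only says that the three variables are not all mutually independent. It does not say which of them are related, so pairwise tests are still needed to narrow that down.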

Thanx for the responses everyone!

Tazzel3d ~ Dwiel
Tazzel3D:

Unfortunately you haven't come up with anything new, although you should not be discouraged by that, since you have realised an important use for the notion of statistical in/dependence.

In/dependence in systems of variables has long been studied in AI. In the past 20 years Belief Networks have been used to represent these systems and they have been applied to games. However, the use of statistical representations of causality extends back hundreds of years to the work of Pascal, Bayes, Gauss and others.

To give you a little more to think about (and to research if you desire), you can arrive at independence in a slightly more efficient manner.

Take a set of variables for which you have collected data on the frequency of events (occurrences of values of the variables). You can easily compute the mean vector for the set and the covariance matrix. The inverse of the covariance matrix is the precision (or concentration) matrix of the set. For jointly Gaussian variables, zero off-diagonal elements in this matrix represent pairs of variables that are conditionally independent given all the others. It is natural, although not correct, to infer that non-zero elements represent causal relationships between variables. They don't, though; they merely indicate correlations between events of that value (perhaps due to a common cause).
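Here is a rough sketch of that calculation in plain Python, using simulated data rather than real measurements. The chain X -> Y -> Z, the sample size, and all function names are assumptions for illustration. The inverse of the covariance matrix (often called the precision matrix) is rescaled into partial correlations so that "close to zero" is easier to judge.

```python
import math
import random

def covariance_matrix(rows):
    """Return (mean vector, sample covariance matrix) for a list of samples."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    cov = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(dims)] for i in range(dims)]
    return means, cov

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Simulated chain: X influences Y, Y influences Z, X has no direct link to Z.
rng = random.Random(1)
samples = []
for _ in range(5000):
    x = rng.gauss(0, 1)
    y = x + rng.gauss(0, 1)
    z = y + rng.gauss(0, 1)
    samples.append((x, y, z))

means, cov = covariance_matrix(samples)
precision = invert(cov)

# Ordinary correlation of X and Z versus their partial correlation given Y.
corr_xz = cov[0][2] / math.sqrt(cov[0][0] * cov[2][2])
partial_xz = -precision[0][2] / math.sqrt(precision[0][0] * precision[2][2])
print(corr_xz, partial_xz)
```

X and Z are clearly correlated, yet their entry in the inverse covariance matrix is close to zero, because all of their relationship runs through Y. With estimated matrices the entry is never exactly zero, so in practice some threshold or significance test has to decide what counts as zero.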

If you were to draw a diagram with each variable in the set as a node (circle) and each non-zero entry a line between its two corresponding nodes, then you would have created a graph representation of the set of variables.
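Turning such a matrix into a graph takes only a few lines. In this sketch the matrix is a hand-written example, namely the exact inverse of the covariance matrix [[1, 1, 1], [1, 2, 2], [1, 2, 3]], and the function name and tolerance are my own assumptions.

```python
def edges_from_matrix(names, matrix, tolerance=1e-9):
    """List an undirected edge for every non-zero off-diagonal entry."""
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(matrix[i][j]) > tolerance:
                edges.append((names[i], names[j]))
    return edges

names = ["X", "Y", "Z"]
inverse_cov = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 1.0],
]

edges = edges_from_matrix(names, inverse_cov)
print(edges)
```

The result is an undirected graph. Choosing directions for the edges takes extra knowledge or assumptions about the domain, since the matrix itself is symmetric.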

A particular form of such a graph, known as a DAG (Directed, Acyclic Graph) is the basis of a Bayesian belief network and can be used for all manner of very interesting artificial intelligence tasks.

If you're interested in using this sort of thing in games I highly recommend that you read up on Bayesian networks. A very good - and very accessible - starting point is Finn Jensen's book, 'An Introduction to Bayesian Networks'. Russell & Norvig's 'Artificial Intelligence: A Modern Approach' also gives a good introduction, although it isn't quite as accessible as a first read. There's an abundance of material online and I'd be happy to point you in a particular direction if you want more information on something specific.

Cheers,

Timkin


Timkin:
Wow! Thanx for the info! I'll definitely read up on that stuff! I love AI. I did see you mention some specific references. Which one would you recommend I read first? Are any of them more or less advanced or understandable? I'll look on the web for some resources too. Are there any pages on the web that are especially good for someone who doesn't know much about the subject yet?

Thanx a lot!
I love this place!

Tazzel3d ~ Dwiel
If you want to learn this stuff, you're better off getting a hold of a book... they usually give a more complete view of a topic... and it's good to have them in your bookshelf later on, as a reference. If you're just starting out, I would recommend Finn Jensen's book. You don't need to understand too much probability theory to get into it... and it has clear examples that you can work by hand.

Cheers,

Timkin
A search for "belief networks" returned a large number of results on Google. Here's one that looks good for an introduction. I'm reading it myself right now.

http://www.anc.ed.ac.uk/~amos/belief.html

This topic is closed to new replies.
