Learning AI

Well, I've created a neat little AI game (right now it's Tic-Tac-Toe, but it can be ported to any board- or card-based game) that basically begins life knowing nothing about the strategy of how to win at Tic-Tac-Toe, only the rules of the game, and he "learns" how to win from experience playing a computer opponent or a human. It's a neat program that basically creates a very large state-based network, and after it's played enough games you cannot beat it, and he always plays the optimal spots.

I'm in the middle of porting it over to Checkers (I have to change the interface), but I was wondering if anyone out there has any experience with this, and if so, could you point me to something like this, or maybe offer some advice? I'd like to see other things of this nature.

Thanks, Will
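The post doesn't include any code, but here is a minimal C sketch of the kind of state/experience table it describes; the names, the base-3 encoding and the exact bookkeeping are illustrative assumptions, not the actual TheBrain engine.

```c
/* Minimal sketch of a state-based "learn from experience" table for
 * tic-tac-toe, in the spirit of the post above.  All names and details
 * are assumptions, not the poster's actual engine ("TheBrain"). */
#include <stdio.h>

#define NUM_STATES 19683   /* 3^9 possible board encodings */
#define NUM_MOVES  9       /* one possible move per square */

/* Per state-action record: how often did this move lead to a win/loss? */
static int wins[NUM_STATES][NUM_MOVES];
static int losses[NUM_STATES][NUM_MOVES];

/* Encode a board (0 = empty, 1 = X, 2 = O) as a base-3 number. */
static int encode(const int board[9])
{
    int key = 0, i;
    for (i = 0; i < 9; ++i)
        key = key * 3 + board[i];
    return key;
}

/* Pick the legal move with the best win record seen so far; unexplored
 * moves (no record yet) are preferred over moves known to lose. */
static int pick_move(const int board[9])
{
    int key = encode(board);
    int best = -1, i;
    double best_score = -1e9;
    for (i = 0; i < 9; ++i) {
        if (board[i] != 0) continue;                 /* not a legal move */
        int games = wins[key][i] + losses[key][i];
        double score = (games == 0) ? 0.5            /* optimistic default */
                       : (double)wins[key][i] / games;
        if (score > best_score) { best_score = score; best = i; }
    }
    return best;
}

/* After a finished game, walk back over the moves we made and
 * credit or blame each state-action pair. */
static void learn(const int states[][9], const int moves[], int n, int won)
{
    int i;
    for (i = 0; i < n; ++i) {
        int key = encode(states[i]);
        if (won) ++wins[key][moves[i]];
        else     ++losses[key][moves[i]];
    }
}

int main(void)
{
    int empty[9] = {0};
    printf("first move from empty board: %d\n", pick_move(empty));
    return 0;
}
```

The optimistic default for unexplored moves is one simple way to get the "try something new if everything known has lost" behaviour discussed later in the thread.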

Sorry, I can't really help with the original request, but I did have a question: is your state-based learning intelligent enough to spot equivalent states? For example, in tic-tac-toe, this state:

o | |
----------
o | x | o
----------
| x |

Is effectively the same state as

| | o
----------
o | x | o
----------
| x |

In this case, you have a pair of states that are vertically symmetrical. There are other forms of symmetry that would yield an effectively equivalent state, too.

The benefit of spotting these equivalent states is obviously a smaller state space to search through, which should speed up any calculations.
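To make the idea concrete, here is a hedged C sketch of one way to collapse symmetric boards to a single representative; the permutation table, encoding and function names are illustrative only, not anything from the original engine.

```c
/* Sketch of the symmetry idea above: map a tic-tac-toe board to a
 * canonical representative by trying all 8 rotations/reflections and
 * keeping the smallest encoding. */
#include <stdio.h>

/* The 8 symmetries of a 3x3 board, as permutations of the 9 cells. */
static const int SYMMETRY[8][9] = {
    {0,1,2,3,4,5,6,7,8},  /* identity                   */
    {6,3,0,7,4,1,8,5,2},  /* rotate 90 clockwise        */
    {8,7,6,5,4,3,2,1,0},  /* rotate 180                 */
    {2,5,8,1,4,7,0,3,6},  /* rotate 270 clockwise       */
    {2,1,0,5,4,3,8,7,6},  /* mirror left-right          */
    {6,7,8,3,4,5,0,1,2},  /* mirror top-bottom          */
    {0,3,6,1,4,7,2,5,8},  /* transpose (main diagonal)  */
    {8,5,2,7,4,1,6,3,0},  /* transpose (anti-diagonal)  */
};

/* Base-3 encoding of a board (0 empty, 1 X, 2 O). */
static int encode(const int board[9])
{
    int key = 0, i;
    for (i = 0; i < 9; ++i)
        key = key * 3 + board[i];
    return key;
}

/* Return the smallest encoding over all symmetric variants, so that the
 * two boards in the post above collapse to a single table entry. */
static int canonical(const int board[9])
{
    int best = -1, s, i;
    for (s = 0; s < 8; ++s) {
        int mapped[9];
        for (i = 0; i < 9; ++i)
            mapped[i] = board[SYMMETRY[s][i]];
        int key = encode(mapped);
        if (best < 0 || key < best)
            best = key;
    }
    return best;
}

int main(void)
{
    /* The two "effectively identical" boards from the post. */
    int a[9] = {2,0,0, 2,1,2, 0,1,0};
    int b[9] = {0,0,2, 2,1,2, 0,1,0};
    printf("%d %d\n", canonical(a), canonical(b));  /* prints the same key twice */
    return 0;
}
```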

Edited by - Kylotan on June 28, 2001 1:53:37 PM

You are correct, those states are exactly the same. But I wrote the AI engine (I call it TheBrain) such that it knows nothing about the particulars of a certain game. This way, it can be ported to any other board game. I don't know of any other board games which are symmetrical like that (I know Checkers is not), so I don't want to optimize it for any particular game.
Besides, tic-tac-toe is a simple game, and there really aren't that many states (3^9 = 19683).

Thanks for the suggestion though.

BeerNutts

Without knowing anything of the methodology of your game agent, or indeed the algorithm you implemented, it is hard to comment on it or discuss it meaningfully! One thought, though...

quote:
Original post by BeerNutts
But I wrote the AI engine (I call it TheBrain) such that it knows nothing about the particulars of a certain game.


There must be at least some level of interface between the game state and the agent. While most people do not recognise this as a part of the agent, generally it is. The specific response behaviour of the agent will be determined by the perceived benefit of its actions (which are interpreted by the interface), particularly if it is a learning agent (i.e., there must be a utility function somewhere in the model, whether stated explicitly or implicitly hidden in the game model). I would be very interested to hear about your agent if you have managed to dissociate the agent architecture from its environment.

Would you mind elaborating just a little on the specific AI methodology you used?

Cheers,

Tim

Well, in a way you could say the agent is separated from the game, but of course it can't be totally separated. There are basically two functions the agent must interact with: the game rules (where is a legal move?) and game over (has someone won the game, or is it a draw?).

Basically, TheBrain plays a game by consulting its past game experiences, and if it has been in the same state as the current game state, it makes a decision based on what move he did next and what it resulted in. Once the game is over, TheBrain traverses each move in the current game, and marks each state based on whether he won or lost.
This way, all the agent needs to know is what moves are valid, and whether the game has been won or lost. It does not need to know any strategies related to a particular game, only how to move and what constitutes winning.

As for the particular game states (e.g., how does it know what the game board's dimensions are?), it currently just uses a two-dimensional array, with the size determined at compile time. So, again, it consults an outside macro (GAME_BOARD_SIZE_X, GAME_BOARD_SIZE_Y) for this property.

OK, when I said "knows nothing about the particular game," I was only partially right. The Brain engine itself knows nothing about what game it plays; rather, he consults an outside source to find a legal move and to see if he's won (basically, he calls GameGetValidMove() and GameCheckGameOver(), where these functions are defined outside the scope of TheBrain).
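To make the shape of that interface concrete, here is a minimal C sketch filled in for tic-tac-toe; the names GameGetValidMove(), GameCheckGameOver(), GAME_BOARD_SIZE_X and GAME_BOARD_SIZE_Y come from the post, but the signatures, return codes and bodies below are assumptions, not TheBrain's real code.

```c
/* Hypothetical game-side interface the agent could consult. */
#include <stdio.h>

#define GAME_BOARD_SIZE_X 3
#define GAME_BOARD_SIZE_Y 3

enum { EMPTY = 0, PLAYER_X = 1, PLAYER_O = 2 };
enum { IN_PROGRESS = 0, X_WINS = 1, O_WINS = 2, DRAW = 3 };

/* The agent asks the game: is placing a piece at (x, y) a legal move? */
int GameGetValidMove(int board[GAME_BOARD_SIZE_Y][GAME_BOARD_SIZE_X], int x, int y)
{
    return board[y][x] == EMPTY;
}

/* Has someone won, or is the board full? */
int GameCheckGameOver(int board[GAME_BOARD_SIZE_Y][GAME_BOARD_SIZE_X])
{
    static const int lines[8][3][2] = {   /* rows, columns, diagonals as (y, x) */
        {{0,0},{0,1},{0,2}}, {{1,0},{1,1},{1,2}}, {{2,0},{2,1},{2,2}},
        {{0,0},{1,0},{2,0}}, {{0,1},{1,1},{2,1}}, {{0,2},{1,2},{2,2}},
        {{0,0},{1,1},{2,2}}, {{0,2},{1,1},{2,0}},
    };
    int i, x, y, full = 1;
    for (i = 0; i < 8; ++i) {
        int a = board[lines[i][0][0]][lines[i][0][1]];
        int b = board[lines[i][1][0]][lines[i][1][1]];
        int c = board[lines[i][2][0]][lines[i][2][1]];
        if (a != EMPTY && a == b && b == c)
            return (a == PLAYER_X) ? X_WINS : O_WINS;
    }
    for (y = 0; y < GAME_BOARD_SIZE_Y; ++y)
        for (x = 0; x < GAME_BOARD_SIZE_X; ++x)
            if (board[y][x] == EMPTY) full = 0;
    return full ? DRAW : IN_PROGRESS;
}

int main(void)
{
    int board[GAME_BOARD_SIZE_Y][GAME_BOARD_SIZE_X] = {
        { PLAYER_X, PLAYER_X, PLAYER_X },
        { PLAYER_O, PLAYER_O, EMPTY    },
        { EMPTY,    EMPTY,    EMPTY    },
    };
    printf("move (2,2) legal: %d\n", GameGetValidMove(board, 2, 2));
    printf("game over code:   %d\n", GameCheckGameOver(board));
    return 0;
}
```

Porting to another board game would then mean swapping in a different implementation of these two functions (and the board-size macros) while leaving the learning engine alone.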

Nutts

Guest Anonymous Poster
Well, I've just written a simple card game which I've copied off my mobile, called Bantumi, and I'm writing some AI for it now. I'm going to implement a learning neural net and some genetic algorithms so I can leave it overnight while really clever NNs get generated and teach themselves.

Hey, someone said something about NNs. I've been trying to understand how they work, why they work, and how to use them. Please explain!

Zach Dwiel

A neural network is basically an array of nodes (neurons, like in your brain) and an array of link pathways which join up the nodes. You've got input links which feed values into the NN and output links which give the output.

Each link calculates the value it's going to send to its node by multiplying its input by a specific weight value (a property of each link), and sends the multiplied value to its target node.

Each node then has a threshold, so if the sum of all its inputs reaches this value it fires out of its output links.

These values then hopefully filter down from input links -> nodes -> links, etc. -> output links.

You can then get your output, typically by using the output channel with the greatest value.

If that sounds complicated then it's probably my explaining, because they are really simple to get up and running. It's just the training which is the bugger.
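Here is a bare-bones C sketch of the feed-forward pass just described: weighted links, a threshold at each node, and "take the strongest output". The layer sizes, weights and threshold are arbitrary illustrative values, and, as noted above, the training step is the hard part and is not shown.

```c
/* Tiny feed-forward pass: input -> hidden (threshold) -> output, pick max. */
#include <stdio.h>

#define NUM_INPUTS  3
#define NUM_HIDDEN  4
#define NUM_OUTPUTS 2

static double w_ih[NUM_HIDDEN][NUM_INPUTS] = {   /* input -> hidden weights  */
    { 0.5, -0.2,  0.1 }, { 0.3, 0.8, -0.5 },
    {-0.7,  0.2,  0.9 }, { 0.1, 0.4,  0.4 },
};
static double w_ho[NUM_OUTPUTS][NUM_HIDDEN] = {  /* hidden -> output weights */
    {  0.6, -0.1, 0.3, 0.2 },
    { -0.4,  0.7, 0.1, 0.5 },
};
static double threshold = 0.5;   /* a node "fires" when its summed input reaches this */

/* Run one forward pass and return the index of the strongest output node. */
static int feed_forward(const double input[NUM_INPUTS])
{
    double hidden[NUM_HIDDEN], output[NUM_OUTPUTS];
    int i, j, best = 0;

    for (j = 0; j < NUM_HIDDEN; ++j) {            /* each link multiplies by its weight */
        double sum = 0.0;
        for (i = 0; i < NUM_INPUTS; ++i)
            sum += w_ih[j][i] * input[i];
        hidden[j] = (sum >= threshold) ? 1.0 : 0.0;   /* simple threshold activation */
    }
    for (j = 0; j < NUM_OUTPUTS; ++j) {
        double sum = 0.0;
        for (i = 0; i < NUM_HIDDEN; ++i)
            sum += w_ho[j][i] * hidden[i];
        output[j] = sum;
        if (output[j] > output[best]) best = j;   /* "use the channel with the greatest value" */
    }
    return best;
}

int main(void)
{
    double input[NUM_INPUTS] = { 1.0, 0.0, 1.0 };
    printf("chosen output: %d\n", feed_forward(input));
    return 0;
}
```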

Thanks for the info, but I guess I just don't grasp how it all works. I guess I'll try to do a basic implementation of a net, but I just realized that I don't know what kind of inputs and outputs neural nets receive and give. Thanks for the basic definition though. Time to go surfing for neural nets I guess!

quote:
Original post by BeerNutts

OK, when I said "knows nothing about the particular game," I was only partially right. The Brain engine itself knows nothing about what game it plays; rather, he consults an outside source to find a legal move and to see if he's won (basically, he calls GameGetValidMove() and GameCheckGameOver(), where these functions are defined outside the scope of TheBrain).


This gets back to a basic question about AI. Would we be at all intelligent if we didn't have sensors? I'm not going to divert this thread by following on with such a discussion, although maybe I will start another thread.

It seems to me that there is a flaw in your model.

quote:
Original post by BeerNutts
Basically, TheBrain plays a game by consulting its past game experiences, and if it has been in the same state as the current game state, it makes a decision based on what move he did next and what it resulted in. Once the game is over, TheBrain traverses each move in the current game, and marks each state based on whether he won or lost.
This way, all the agent needs to know is what moves are valid, and whether the game has been won or lost. It does not need to know any strategies related to a particular game, only how to move and what constitutes winning.



This ignores the fact that there is another player involved. It is certainly possible that state XYZ occurs both in games that I win and in games that my opponent wins. Let's assume that XYZ occurs more often in games I win than in games I lose. Now assume that I have seen this state only once, and I took the optimal action but lost the game due to my opponent's later actions: this is a false-negative training case. I.e., it suggests to the brain that I will lose from this state when, based on likelihoods computed from the full game tree, I should probably win.

How do you account for this?

Cheers,

Tim

Now, I'm certainly no AI expert by any stretch of the imagination, but I can't help but think that the point you make, Timkin, applies equally to absolutely any learning system that models non-deterministic state transitions. It's not only computerised learning that suffers this way; human and animal learning work this way too. False associations are a common theme in many phobias, etc. I expect the idea is, as with any statistical method, that repeated use will create a more representative data set which should 'iron out' any blips in the data.

quote:
Original post by Kylotan
Now, I'm certainly no AI expert by any stretch of the imagination, but I can't help but think that the point you make, Timkin, applies equally to absolutely any learning system that models non-deterministic state transitions.



Well, certainly the issue of false positives and false negatives is relevant to any classification/learning problem, and yes, primarily those based on probabilities. What I was suggesting, though, was that simply choosing an action based on whether the game was won or lost the last time that action was performed is not really an indicator of the viability of the action. This gets back to the problem of payoff in games. Payoff only occurs when a final state is reached (win or loss). We need some way of determining which states really led to the final state and which states were merely detours of no consequence. By marking all visited states (and all associated actions) as forming either a win or a loss strategy, we unduly reward or penalise those detour states (which may indeed be an important part of a different win/loss strategy).

Cheers,

Tim

Timkin and Kylotan,

Wonderful points brought up by both of you. Here's what I was thinking when I created TheBrain. It is a crude attempt at modelling (at a very low level) the experiences we as humans face. If we touch a hot stove, we get burned, so we know not to touch it again. That is exactly what I wanted to model, except that in a board game the brain doesn't care about anything except winning and losing (like you said). But realize the brain doesn't just use the "last action" from a state. It uses all the actions performed from that state, and uses all the past moves as decisions. If all the actions from that state resulted in an undesirable end, then it will choose a new, unexplored action (if available). So, like Kylotan said, if it plays enough, it should come to pick the optimal solution.

Currently, with the tic-tac-toe game, starting from an empty brain (never played before), it takes about 400 games against a "random" computer enemy (makes random moves, except moves where he can win) before the brain will pick the optimal actions to win.

I haven't thought about finding ways to remove "detour moves", but I really don't care about that either. Like I said earlier, all the brain cares about is winning and losing, and if he takes a few detour moves that lead to a win, then he and I are both happy.

Nutts

My question is: can't you just weight the path from state to state and then "pay off" all paths that lead to the same win? More to the path that is shortest, less to the long one, on the theory that longer paths are more likely to include mistakes or allow a comeback?

Smile, it won't let you live longer, but it is funny when they die with a smile.

Edited by - adtheice on July 6, 2001 2:09:12 AM

Talking more about the issue of pay-off (reward) - as this is fundamental to the performance of any learning system - here are two strategies you can try out.

Both are applied after a result is obtained.

The first pays off each action at an exponentially declining rate based on its distance from the final game state. This basically means that actions that set the stage for a win are given a small reward while actions that actually achieve the win are given a larger reward.

The second method requires computing the minimal plan (set of actions) from the start to the particular end state (after it is known), and then rewarding the actual plan using a base amount that is reduced by how far the actual plan was from the optimal plan. For instance, if the ratio of the number of actions in the actual plan to the optimal plan is 'x', then the payoff might be (for an exponentially declining payoff)

payoff = base_payoff * e^(1 - x)

Note: 'x' is always >= 1, so the maximum multiplier on the base is 1 and the minimum is zero (when the ratio of moves is infinite).
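A quick C sketch of both schemes, for anyone who wants to try them out; the decay constant, base payoff and function names are illustrative choices rather than part of the suggestion above.

```c
/* Two ways of paying off the moves of a finished game. */
#include <stdio.h>
#include <math.h>

/* Scheme 1: pay off each action at an exponentially declining rate based
 * on its distance from the final game state. */
static double payoff_by_distance(double base_payoff, int distance_from_end,
                                 double decay)
{
    return base_payoff * exp(-decay * distance_from_end);
}

/* Scheme 2: payoff = base_payoff * e^(1 - x), where x is the ratio of the
 * number of actions in the actual plan to the number in the optimal plan
 * (x >= 1, so the multiplier is at most 1 and falls toward 0). */
static double payoff_by_plan_ratio(double base_payoff, int actual_moves,
                                   int optimal_moves)
{
    double x = (double)actual_moves / (double)optimal_moves;
    return base_payoff * exp(1.0 - x);
}

int main(void)
{
    int d;
    for (d = 0; d < 5; ++d)   /* the winning move gets the most, earlier moves less */
        printf("distance %d -> payoff %.3f\n", d, payoff_by_distance(1.0, d, 0.5));
    printf("7 moves vs optimal 5 -> payoff %.3f\n", payoff_by_plan_ratio(1.0, 7, 5));
    return 0;
}
```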

Hope these ideas help.

Cheers,

Tim

Timkin,

Thanks for the suggestion. I particularly like the idea of giving more points to states closer to the win. I wanted to keep the AI somewhat simple, and this option is still simple but should prove more fruitful in the final product.

Thanks for the help,
Nutts

Guys, from what I've read so far, you are talking about a well-established learning model named 'reinforcement learning'.
Its main features are that the agent has no model of the world and the reward is 'delayed' (i.e., it only arrives at the end of the match).
A famous agent for playing backgammon, Gerald Tesauro's TD-Gammon, used a reinforcement learning technique known as temporal difference learning to learn to play backgammon well enough to defeat world-class players.
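For anyone curious, here is a minimal C sketch of the one-step temporal-difference (TD(0)) value update that family of methods is built on; it is a generic tabular illustration with made-up state indices and parameters, not TD-Gammon's neural-network implementation.

```c
/* One-step temporal-difference (TD(0)) update over a tabular value function. */
#include <stdio.h>

#define NUM_STATES 19683          /* e.g. the 3^9 tic-tac-toe encodings */

static double value[NUM_STATES];  /* estimated chance of winning from each state */

/* After observing a transition state -> next_state with immediate reward r,
 * nudge V(state) toward r + gamma * V(next_state). */
static void td_update(int state, int next_state, double reward,
                      double alpha, double gamma)
{
    double td_error = reward + gamma * value[next_state] - value[state];
    value[state] += alpha * td_error;
}

int main(void)
{
    /* Hypothetical example: state 42 led to state 99, which eventually
     * turned out to be a winning position (reward 1.0 at the end of the game). */
    value[99] = 1.0;
    td_update(42, 99, 0.0, 0.1, 0.9);
    printf("V(42) after one update: %.3f\n", value[42]);
    return 0;
}
```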

Yes, we are talking about reinforcement learning. In particular, we are discussing 'supervised learning'. Supervised learning is applicable to many different areas of AI, not just game playing.

Cheers,

Timkin

Just to sort out terms: supervised learning is not the same as reinforcement learning, nor is one distinctly a subset of the other (though they are related).
Supervised learning occurs when the right answer is known and can be presented to the system so that the system can adjust itself towards that answer, i.e. the system answers 7, the right answer is 9, and the system adjusts itself so that next time it will answer closer to 9 than 7.
Reinforcement learning occurs when the exact answer isn't known, only a general sense of how right the system's answer is, i.e. the system answers seven and the feedback is "not bad, 8.5 out of ten".
Unsupervised learning occurs when the system is set up so that each part of it adjusts itself according to the values of the system around it, without any kind of explicit feedback from an outside teacher. An example is Hebbian learning in an ANN, where the synchronicity of the firing of two connected neurons affects the weighting of the connection between them.
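A toy C sketch of the three feedback styles applied to a single weight, just to make the distinction concrete; the learning rate and the exact update forms here are illustrative choices, not reference implementations.

```c
/* Three flavours of feedback, shown as updates to one weight. */
#include <stdio.h>

static double w = 0.5;
static const double rate = 0.1;

/* Supervised: we know the right answer, so move toward it directly. */
static void supervised(double input, double answer, double target)
{
    w += rate * (target - answer) * input;
}

/* Reinforcement: we only get a score, so scale the adjustment by it. */
static void reinforcement(double input, double score_out_of_ten)
{
    w += rate * (score_out_of_ten / 10.0 - 0.5) * input;
}

/* Unsupervised (Hebbian): strengthen the link when both ends fire together. */
static void hebbian(double pre_activity, double post_activity)
{
    w += rate * pre_activity * post_activity;
}

int main(void)
{
    supervised(1.0, 7.0, 9.0);     /* "the system answers 7, the right answer is 9" */
    reinforcement(1.0, 8.5);       /* "not bad, 8.5 out of ten"                     */
    hebbian(1.0, 1.0);             /* two connected neurons firing together         */
    printf("w = %.3f\n", w);
    return 0;
}
```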

Just thought I'd clarify.

Mike

Of course, there is the perspective that supervised learning, reinforcement learning and unsupervised learning form a hierarchy based on the level of uncertainty in the value function. So while you are correct to point out that supervised learning is not a subset of reinforcement learning (as a reading of my last post may imply), it is reasonable to associate the two based on the amount of certainty one holds in an appropriate reward for a given association.

Of course, the implementation is different (in general), so I'm not going to push this associative model further (and the pun is intended, if you catch it!)

Cheers,

Tim

BeerNutts, how about applying your engine to a tic-tac-toe variant where four or five in a row is needed to win?

Consider a board that has no boundaries. If one player goes first in 3- or 4-in-a-row tic-tac-toe, that player is guaranteed to win, but in 5-in-a-row tic-tac-toe the first player is not guaranteed a win. At least as far as I know.

Making 3- or 4-in-a-row tic-tac-toe was easy for me, since all the possible moves can be hard-coded, but with 5 that's not possible. I started working on a search algorithm but never finished it because I lost interest. It's not an original idea, of course; I have played a number of good and bad 5-in-a-row tic-tac-toe programs, but I never met one that could not lose.

Are you talking about a 5x5 tic-tac-toe board where 5 in a row is required for a win, or an NxN board where 5 in a row is required for a win?

If it's the former, then the search is fairly trivial, as it's only about 10^12 states (3^25). If you want hard, try finding a solution in a search space of ~10^620 states! That's more states than particles in the universe! It has been done, though, and only took two weeks of CPU time on an average desktop PC!

Cheers,

Timkin

Edited by - Timkin on July 19, 2001 1:36:02 AM

If you want to go for the evolutionary approach, then pitching one brain against another is a good way. Have multiple populations in a GA and run a co-evolutionary experiment where individuals in each population gain fitness according to how they fare against members of another population. By keeping more than two populations you can ensure that the individuals aren't specialising to win against specific strategies and are staying more general.
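Here is a rough C sketch of that co-evolutionary fitness idea: several populations, each individual scored by playing members of the other populations. The population sizes, the play_game() stub and the scoring are placeholders, not a specific implementation.

```c
/* Co-evolutionary fitness evaluation across multiple GA populations. */
#include <stdio.h>
#include <stdlib.h>

#define NUM_POPULATIONS 3
#define POP_SIZE        8
#define GAMES_PER_EVAL  4

typedef struct {
    double genes[16];   /* whatever encodes a "brain" in the real system */
    double fitness;
} Individual;

static Individual pops[NUM_POPULATIONS][POP_SIZE];

/* Placeholder: play one game and return 1 if 'a' beats 'b', else 0. */
static int play_game(const Individual *a, const Individual *b)
{
    (void)a; (void)b;
    return rand() % 2;
}

/* Score every individual against opponents drawn from *other* populations,
 * so no one can specialise against a single fixed strategy. */
static void evaluate(void)
{
    int p, i, g;
    for (p = 0; p < NUM_POPULATIONS; ++p) {
        for (i = 0; i < POP_SIZE; ++i) {
            Individual *me = &pops[p][i];
            me->fitness = 0.0;
            for (g = 0; g < GAMES_PER_EVAL; ++g) {
                int op = (p + 1 + rand() % (NUM_POPULATIONS - 1)) % NUM_POPULATIONS;
                Individual *them = &pops[op][rand() % POP_SIZE];
                me->fitness += play_game(me, them);
            }
        }
    }
}

int main(void)
{
    evaluate();
    printf("fitness of pops[0][0]: %.1f\n", pops[0][0].fitness);
    return 0;
}
```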

Mike
