fast TD function approximation

Started by
10 comments, last by Timkin 16 years, 11 months ago
hey, so I'm writing this program that is going to learn a pretty complex game (the state space is huge; I currently have 40-170 inputs, half of which are continuous, and I might need to add more). Currently I plan on using TD(lambda) with the simple back-prop neural net algorithm. From experience, however, this is very slow (even though it's one of the faster algorithms), and it isn't guaranteed to converge. I know residual algorithms provide a tradeoff between speed and convergence, but I think, at least to start off, I'd rather have an algorithm that is faster but possibly less accurate. Are there any out there that can beat the gradient-descent neural net algorithm as far as speed?
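For readers unfamiliar with the method being discussed, here is a minimal sketch of one TD(lambda) update step on a linear value function V(s) = w · φ(s). The feature vectors, step sizes, and reward are invented placeholders, not anything from the poster's actual setup (which uses a neural net rather than a linear approximator):

```python
def td_lambda_update(w, e, phi_s, phi_s_next, reward,
                     alpha=0.1, gamma=0.9, lam=0.8):
    """One TD(lambda) step: compute the TD error, decay and accumulate
    the eligibility traces, then nudge the weights along the traces."""
    v_s = sum(wi * xi for wi, xi in zip(w, phi_s))
    v_next = sum(wi * xi for wi, xi in zip(w, phi_s_next))
    delta = reward + gamma * v_next - v_s                      # TD error
    e = [gamma * lam * ei + xi for ei, xi in zip(e, phi_s)]    # traces
    w = [wi + alpha * delta * ei for wi, ei in zip(w, e)]      # update
    return w, e

# Starting from zero weights and traces, a reward of 1 credits the
# features active in the current state.
w, e = td_lambda_update([0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0], 1.0)
```

With a neural net, the same TD error delta is back-propagated through the network instead of applied to a single weight vector, which is where most of the per-step cost goes.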
There are a lot more ways to do function approximation than using neural networks. The primary difference is the type of functions you approximate with, and generally the closer those functions are to what you are trying to approximate, the better it will work.

The fastest method I can think of would be a straight linear transformation, which is basically equivalent to a neural net with no hidden layers and no firing requirements on the nodes, so it can be implemented with a single matrix of weights. A backprop-like technique can be used to adjust the weights if there are a lot of control points, but you can also solve a system of equations to find the transformation if you have the same number of control points as inputs. That should be a lot faster than a neural network, but of course that method will only work well if the function you're trying to approximate is fairly close to linear.

Neural networks are only good if you have no idea what kind of function you are trying to approximate, or if the function is very similar to the functions neural networks produce. If you know your function is going to be linear or polynomial or sinusoidal or whatever, then you'll get the best approximation and performance if you adjust your technique accordingly.
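The "solve a system of equations" route mentioned above can be illustrated with a toy example: with as many control points as inputs, the linear weights come straight out of a solve, no iterative training needed. The 2x2 system and target values here are made up for illustration:

```python
def solve_2x2(a, b, c, d, e, f):
    """Solve [[a, b], [c, d]] @ [x, y] = [e, f] by Cramer's rule."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("control points are linearly dependent")
    return ((e * d - b * f) / det, (a * f - e * c) / det)

# Fit weights w so that w . x hits the targets at two control points:
# x1 = (1, 0) -> 2.0 and x2 = (0, 1) -> 3.0.
w = solve_2x2(1.0, 0.0, 0.0, 1.0, 2.0, 3.0)
```

For more inputs you would use a general linear solver (e.g. Gaussian elimination or a least-squares fit when control points outnumber inputs), but the one-shot nature of the solve is the same.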
Nah, there's no way this function is linear. Even if I altered the inputs so that they required no real processing (which would end up being over 1000 inputs), I'd be surprised if it were linear. The program is going to learn to play poker. Right now I have it set up to represent cards in either 1 or 2 inputs, with no real input about probable hands. That alone would probably require significant non-linear processing.
This doesn't sound like a problem you should be using function approximation for...
This function is going to be extremely non-linear, and much more complex than it has to be. There's no practical reason to force the system to learn about what combinations constitute a complete poker hand. Learning other parts of the game will be hard enough. I think I'd use features for the inputs that correspond to what hands the player has or is close to having. Possibly train a separate AI for each phase of the game so it doesn't need to learn how different parts of the game affect strategy (each AI might also have different feature recognizers depending on what type of poker this is).

Unless the whole point of this is to get the AI to recognize hands. In that case, yeah, it's extremely non-linear and discontinuous, and neural networks could possibly have problems with local maxima.
Yeah, see, I thought about all of that.

First, while inputting probabilities about the hand rather than just the hand itself seems like a good idea, it has some drawbacks. The main one is that there's no good way to fully describe the state without inputting the entire hand and having the neural net figure out how to process it. Yes, I could input the probability of getting each of the 7 different hand types, but that still isn't fully descriptive. I would rather allow the neural net to figure out how to process it on its own, instead of hardcoding it.

Second, I've thought about splitting it up in different ways. Making one neural net to process different hands and another to find the expected gain from some game state seems like a good idea too. However, what would the first neural net output? It couldn't just output a rating for the hand, because the specific board is also used to determine possible hands for your opponents.

Although these both seem like good ideas, I think they would make the final neural net inferior, and there's no good way to split up the tasks without taking out possible functionality.

Finally, I haven't really decided on any design yet; the main point of this post was to see if there are any algorithms faster than the TD(lambda) neural net.

Alrecenk, how is this not good for function approximation? This problem is PERFECT for function approximation. Yes, it would end up being very complex, and I'd have to train many different nets in order to get one that converges to a good maximum, but I think function approximation is a good way to solve this.
It's not good for such general function approximation because of how extremely non-linear it is and how easy it is to perform significant amounts of preprocessing. The AI will need to learn that 5,6,7,8,9 and 6,7,8,9,10 are very good hands but that 5,7,8,9,10; 5,6,8,9,10; 5,6,7,9,10; and 5,6,7,8,10 are bad hands, unless all the cards are of the same suit in which case it is a very good hand. It will need to be able to recognize these good hands to some extent in order to know not to discard any of the cards.

You could represent a hand by having 52 inputs, one for each card, and setting them to 0 or 1 depending on if that card is in the hand or not. Imagine teaching a human to play poker with a deck of cards where each card just has a number from 1 to 52 on it, with the mapping from numbers to actual cards determined at random and not accessible to the player. Now imagine that your only method of teaching them is to have them play the game and watch other people play the game (with this bizarre deck of cards). Learning to play well this way would be extremely difficult for a human. I can't imagine a neural network doing much better.
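The 52-input representation described above is simple to write down. This is just a sketch of that encoding; the card-to-index mapping (suit * 13 + rank) is an arbitrary choice for illustration:

```python
def encode_hand(cards):
    """Map a hand to 52 binary inputs, one per card in the deck.
    cards: iterable of (suit 0-3, rank 0-12) tuples."""
    inputs = [0.0] * 52
    for suit, rank in cards:
        inputs[suit * 13 + rank] = 1.0
    return inputs

# Example: a pair (same rank, two suits) plus two high cards.
hand = [(0, 0), (1, 0), (2, 11), (3, 12)]
x = encode_hand(hand)
```

The encoding itself is trivial; the point of the post above is that structure like "these two inputs are the same rank" is invisible in this representation, and the network has to rediscover it from scratch.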
Personally I'd consider representing hands (of my player and opponents) as a set of discrete jump processes with observations and use a discrete recurrent architecture to learn a value function over filtrations of these processes (if you're not sure what a filtration is, look up any good quality text on stochastic calculus and economics... I can recommend Klebaner). This value function can then be used within a decision process to determine appropriate play; i.e., bet (check/call/raise) or fold.

The key point of poker is that the cards you've observed dictate the filtration of the set of possible hands, and that each player's filtration is different. One typically tries to infer the filtration of other players by observing their betting strategy, but this is a nonlinear feedback system and so is easily corrupted (i.e., by the use of bluffs). Since you never get to see the hand of someone who folds (or of a blind winner), you don't get to validate their filtration against your inference of it, making it an ongoing problem from hand to hand. If you observe a player for long enough, though, you'll learn to correlate their betting behaviours with their value function, and if you see enough terminal hands you can even estimate their betting policy. Not an easy problem, but one that separates the pros from the wannabes.
I could see training a network for something like betting strategy, but there is no reason (other than personal entertainment) to train a network to evaluate hands, since the game comes with a well-defined means of evaluating hands. A network for betting strategy would only need the current quality of the hand, the potential quality of the opponents' hands, and the bets/betting behaviors, not necessarily anything about the cards themselves. The quality of a 5-card hand with no possible changes is trivial to calculate (look up the value in a table of hand types). The quality of a non-full hand could be the sum of (the quality of each possible hand * the chance of that hand occurring), or an approximation of something like that. The quality of a full hand with the chance to change is the maximum partial-hand quality that could be achieved by dropping cards from that full hand. That'll change some depending on the type of poker, but the point is that the quality of a hand can be directly calculated/estimated without any kind of learning AI.
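The expected-quality calculation described above can be sketched directly. This toy version uses a made-up miniature deck and treats "quality" as an arbitrary callable; a real poker evaluator and 52-card deck would slot into the same shape:

```python
from itertools import combinations

def expected_quality(partial, deck, quality, draws_needed):
    """Average completed-hand quality over every way to finish the hand:
    sum of (quality of completion) / (number of completions), i.e. the
    expectation under uniformly likely draws."""
    remaining = [c for c in deck if c not in partial]
    completions = list(combinations(remaining, draws_needed))
    total = sum(quality(tuple(sorted(partial + list(draw))))
                for draw in completions)
    return total / len(completions)

# Toy example: 4-card "deck", hand quality = sum of card values.
# Holding [1, 2] and drawing 1 more card gives hands (1,2,3) or (1,2,4).
q = expected_quality([1, 2], [1, 2, 3, 4], sum, 1)   # (6 + 7) / 2 = 6.5
```

For real hand sizes the combination count grows quickly, which is why the post above mentions approximating this quantity rather than always enumerating it exactly.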
One could argue though - and quite rightly given the many failed attempts - that using expected utility measures for hand quality in a game like poker is the wrong way to go. Maximising EU is a model for rational decision processes. Poker is not easily represented as a game involving rational players because it involves bluffing and feedback in the value estimation problem. Advanced strategies such as intentionally losing mediocre pots to fool your opponents in their estimation of your betting strategy also come into it... and at the end of the day, even small amounts of uncertainty can offer significant differences in the perceived value of a particular course of play for a given player. If all players took a risk neutral stance, poker would be an easy game to win! ;)

Cheers,

Timkin

