Blind Poker Artificial Neural Network


I am making a game called Blind Poker, and I am going to try using an artificial neural network (ANN) to make decisions for the computer player; that is, the ANN will choose the state changes (transitions from one game state to the next).

There are 52 cards in the deck, and there are 9 states that each card can be in: player 1's hand, player 2's hand, player 3's hand, player 4's hand, player 5's hand, discard pile, top of discard pile, top of pickup pile, and uncovered. 9 states take 4 bits to encode, so my neural network will have 52*4 = 208 input neurons and 208 output neurons.
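Here is a rough sketch of how I might pack that encoding into the input vector (the enum names are placeholders, not my actual game code):

#include <vector>

// Hypothetical state labels; nine states fit in the values 0..8, i.e. 4 bits.
enum CardState
{
    PLAYER_1_HAND, PLAYER_2_HAND, PLAYER_3_HAND, PLAYER_4_HAND,
    PLAYER_5_HAND, DISCARD_PILE, TOP_OF_DISCARD, TOP_OF_PICKUP,
    UNCOVERED
};

// Pack all 52 card states into 52*4 = 208 binary inputs.
std::vector<float> encode_4bit(const std::vector<CardState>& states)
{
    std::vector<float> input(states.size() * 4, 0.0f);

    for (size_t i = 0; i < states.size(); i++)
        for (int bit = 0; bit < 4; bit++)
            input[i*4 + bit] = ((states[i] >> bit) & 1) ? 1.0f : 0.0f;

    return input;
}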

The biggest problem that I face now is deciding on how many hidden layers there should be, and how many neurons per hidden layer there should be.

I'll train the ANN initially by facing it off against a computer player that picks pseudorandom state changes. Once the ANN has won a significant number of games, I will copy it a bunch of times and gently perturb the weights of these copies. Then I'll play the copies against the original ANN and see which one comes out on top, copy that winner and perturb it, and so on, until some significant number of generations has been spawned.
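In code, the perturb-and-select step would look something like this rough sketch (the match-playing function is a hypothetical stub):

#include <random>
#include <vector>

// Hypothetical placeholder: the real version would pit the two networks
// against each other over many games and return true if the candidate wins.
bool candidate_wins(const std::vector<float>& champion_weights,
                    const std::vector<float>& candidate_weights)
{
    (void)champion_weights;
    (void)candidate_weights;
    return false; // stub for the sketch
}

// Copy the weights and gently perturb each one with Gaussian noise.
std::vector<float> perturb(const std::vector<float>& weights,
                           std::mt19937& rng, float sigma = 0.01f)
{
    std::normal_distribution<float> noise(0.0f, sigma);
    std::vector<float> copy = weights;

    for (float& w : copy)
        w += noise(rng);

    return copy;
}

// One generation: spawn perturbed copies, keep whichever wins.
std::vector<float> next_generation(std::vector<float> champion,
                                   size_t num_copies, std::mt19937& rng)
{
    for (size_t i = 0; i < num_copies; i++)
    {
        const std::vector<float> candidate = perturb(champion, rng);

        if (candidate_wins(champion, candidate))
            champion = candidate;
    }

    return champion;
}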

Is there anything you'd do differently?


The whole setup seems like a non-starter to me. How can you have those 208 inputs? If your game is anything like the poker I know, your agent will not have knowledge of the complete game state, so it shouldn't be given the complete game state as input.

Start by thinking about what information the agent really knows and what actions are available. Then figure out how to provide the input to the ANN and how to get the output. For instance, if you list the hand so far as a sequence of events, you can feed those events to a recurrent neural network (RNN) and then use a softmax output layer to get a probability distribution over the possible actions.
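For example, a minimal softmax over the raw output activations might look like this (just a sketch, not tied to any particular library):

#include <algorithm>
#include <cmath>
#include <vector>

// Convert raw output-layer activations ("logits") into a probability
// distribution over actions. Subtracting the max first keeps exp() stable.
std::vector<float> softmax(const std::vector<float>& logits)
{
    const float max_logit = *std::max_element(logits.begin(), logits.end());

    std::vector<float> probs(logits.size());
    float sum = 0.0f;

    for (size_t i = 0; i < logits.size(); i++)
    {
        probs[i] = std::exp(logits[i] - max_logit);
        sum += probs[i];
    }

    for (float& p : probs)
        p /= sum;

    return probs;
}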

Once you have a reasonable ANN architecture, you need to train it. If you have access to a database of human games, it would be a good idea to start by doing supervised learning to try to predict what the human will do next. You can also train it by reinforcement learning, but that's usually harder, particularly in an adversarial setting like this. I have some half-baked ideas on how to do it, but it wouldn't be trivial.

 

Thank you for the input.

OK, so the ANN basically counts cards. At the beginning of the game, the majority of the cards are in the 'uncovered' state. State changes occur as the game progresses. If a card's state is 'uncovered', then it could be in player 1's hand, player 2's hand, etc., or in the pickup pile. That's a great deal of hidden information. No cheating is involved.

As for the 208 input neurons, that is a very tiny number compared to some image-recognition ANNs, which have four input neurons per pixel (e.g. R, G, B, A). Even a small image, like 64*64 pixels, has a 'whopping' 64*64*4 = 16384 input neurons. It's slow, yes, but it works well when the learning rate is set to 0.1.

 

8 hours ago, sjhalayka said:

9 states take 4 bits to encode.

This is true, but it's not likely to be the way you want to proceed, because this would mean that you have certain inputs that are true for completely unrelated states. e.g. bit 1 would be true for cards in player 1's hand, and player 3's hand. This means the NN has to work harder to tell the difference between those 2 states, training layers purely to separate out your 4-bit representation into the 9 separate states again.

Instead, you probably want "one-hot" encoding; 9 input neurons for each card, with only one of them 'hot', and that represents which of the 9 states it's in.
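A minimal sketch of that encoding, assuming state indices 0..8 (52 cards * 9 states = 468 inputs):

#include <vector>

const int NUM_CARDS = 52;
const int NUM_STATES = 9;

// One-hot: 9 inputs per card, exactly one of them set to 1.
// card_states[card] is the card's state index, 0..8.
std::vector<float> encode_one_hot(const std::vector<int>& card_states)
{
    std::vector<float> input(NUM_CARDS * NUM_STATES, 0.0f);

    for (int card = 0; card < NUM_CARDS; card++)
        input[card * NUM_STATES + card_states[card]] = 1.0f;

    return input;
}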

 

8 hours ago, sjhalayka said:

The biggest problem that I face now is deciding on how many hidden layers there should be, and how many neurons per hidden layer there should be.

This is all part and parcel of working with neural networks. If you know about things like bias vs variance, overfitting/underfitting, learning rates, error rates, then you can do the following:

  1. Pick reasonable estimates for these hyperparameters
  2. Perform some training with that neural network
  3. Observe how well it learns, and see whether it is converging, failing to converge, converging too slowly, etc.
  4. Adjust hyperparameters according to these observations
  5. Repeat from step 2.

If you don't know about some of those things, it's time to do some research, because it's a deep subject and not easy to cover here succinctly.
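In code, that loop looks roughly like this sketch (the training function is a hypothetical stand-in for your own):

#include <cstdio>

// Hypothetical stand-in for your own training code; a real version would
// run one backpropagation pass over the training set and return the MSE.
float train_one_epoch(float learning_rate)
{
    return 1.0f / (1.0f + learning_rate); // dummy value for the sketch
}

// The tune-train-observe loop from the list above.
void tuning_loop()
{
    float learning_rate = 0.1f; // step 1: a reasonable starting guess
    float previous_mse = 1e30f;

    for (int epoch = 0; epoch < 10000; epoch++)
    {
        const float mse = train_one_epoch(learning_rate); // step 2

        if (epoch % 100 == 0) // step 3: watch how the error behaves
            printf("epoch %d, MSE %f\n", epoch, mse);

        if (mse >= previous_mse)   // stalled or diverging?
            learning_rate *= 0.5f; // step 4: adjust a hyperparameter

        previous_mse = mse; // step 5: repeat
    }
}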

 

8 hours ago, sjhalayka said:

I'll train the ANN initially by facing it off against a computer player that picks pseudorandom state changes.

This is a bad idea. A neural network learns by comparing its own output to a desired output, and adjusting accordingly. A pseudo-random state change on the opponent's part is not going to produce reasonable states to compare against. And you've not mentioned how you'll measure the error, which is the important part.

 

8 hours ago, sjhalayka said:

Once the ANN has won a significant number of games, I will copy it a bunch of times and gently perturb the weights of these copies.

This is also a really bad idea, because you've thrown away the key benefit of a neural network - i.e. the fact that errors in the output layer can be propagated back through the network to adjust each of the layers to improve the response - and vastly increased the search space of your system, by multiplying the number of networks you need to train. On a game with a state space as large as poker, you will probably find that the universe ends far sooner than your neural network becomes usable.

 

What you should do is something more like this:

  • Use knowledge of poker rules and a database of past poker games to generate a training set of records of the form (game state, chosen action, score/benefit); see the sketch after this list.
  • Train on this data so that, when in the same situation, the computer takes the actions that scored positively and avoids the actions that scored negatively.
  • Ensure the network is not overfitting, and is therefore able to generalise to similar situations.
  • Generate extra training data, e.g. by playing against existing computer programs, or even by having two programs play each other.
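As a concrete sketch, each record in that training set could look like this (the field names are illustrative):

#include <vector>

// One training example: the game state the agent saw, the action it
// took, and how well that action worked out.
struct TrainingExample
{
    std::vector<float> game_state; // e.g. one-hot encoded card states
    int chosen_action;             // index into the set of legal actions
    float score;                   // positive = good outcome, negative = bad
};

typedef std::vector<TrainingExample> TrainingSet;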

Being the resident poker expert in here, I've got a feeling we can't continue helping you until you actually describe this game to us. It doesn't feel like poker from your brief mentions. Perhaps a variant of draw? If it is, then an ANN seems all but intractable for this use.

Dave Mark - President and Lead Designer of Intrinsic Algorithm LLC
Professional consultant on game AI, mathematical modeling, simulation modeling
Co-founder and 10 year advisor of the GDC AI Summit
Author of the book, Behavioral Mathematics for Game AI
Blogs I write:
IA News - What's happening at IA | IA on AI - AI news and notes | Post-Play'em - Observations on AI of games I play

"Reducing the world to mathematical equations!"

To get a handle on how the game works, I recommend that you try the finished Windows version:

https://github.com/sjhalayka/Windows_blind_poker/raw/master/poker.zip

Disclaimer: Run it through a virus scanner, or better yet, run it on a disposable virtual machine (e.g. VirtualBox with graphics acceleration enabled).

The game pretty much shows you the way to go: the cards you can click on get green borders, and the borders turn red while the computer players make their moves.

The game is called Blind Poker because you can't see anything at the beginning; the goal of the game is to get the best poker hand.

I really appreciate everyone's questions and comments.

Attached is an image of the table. There are 4 players, and on the bottom are the discard pile (left) and the pickup pile (right).

 

[Attached image: Screen Shot 2017-08-12 at 9.02.18 PM.png]

6 hours ago, Kylotan said:

This is true, but it's not likely to be the way you want to proceed, because this would mean that you have certain inputs that are true for completely unrelated states. e.g. bit 1 would be true for cards in player 1's hand, and player 3's hand. This means the NN has to work harder to tell the difference between those 2 states, training layers purely to separate out your 4-bit representation into the 9 separate states again.

Instead, you probably want "one-hot" encoding; 9 input neurons for each card, with only one of them 'hot', and that represents which of the 9 states it's in.

Thank you. I will consider one-hot encoding. I believe that the book I was reading, Practical Neural Network Recipes in C++, preferred the bit encoding over the one-hot encoding, but I'm having trouble finding which page that is on.

 

6 hours ago, Kylotan said:

This is all part and parcel of working with neural networks. If you know about things like bias vs variance, overfitting/underfitting, learning rates, error rates, then you can do the following:

  1. Pick reasonable estimates for these hyperparameters
  2. Perform some training with that neural network
  3. Observe how well it learns, and see whether it is converging, failing to converge, converging too slowly, etc.
  4. Adjust hyperparameters according to these observations
  5. Repeat from step 2.

If you don't know about some of those things, it's time to do some research, because it's a deep subject and not easy to cover here succinctly.

Thanks again, for the insight and recommendations.

 

6 hours ago, Kylotan said:

This is a bad idea. A neural network learns by comparing its own output to a desired output, and adjusting accordingly. A pseudo-random state change on the opponent's part is not going to produce reasonable states to compare against. And you've not mentioned how you'll measure the error, which is the important part.

I get the error from the back propagation process. I can use the mean squared error, with some low target error like 0.01. Perhaps I'm not sure what you mean.

 

6 hours ago, Kylotan said:

This is also a really bad idea, because you've thrown away the key benefit of a neural network - i.e. the fact that errors in the output layer can be propagated back through the network to adjust each of the layers to improve the response - and vastly increased the search space of your system, by multiplying the number of networks you need to train. On a game with a state space as large as poker, you will probably find that the universe ends far sooner than your neural network becomes usable.

That doesn't sound very promising. 

 

6 hours ago, Kylotan said:

What you should do is something more like this:

  • Use knowledge of poker rules and a database of past poker games to generate a training set of records of the form (game state, chosen action, score/benefit).
  • Train on this data so that, when in the same situation, the computer takes the actions that scored positively and avoids the actions that scored negatively.
  • Ensure the network is not overfitting, and is therefore able to generalise to similar situations.
  • Generate extra training data, e.g. by playing against existing computer programs, or even by having two programs play each other.

Sounds good. Thanks once again for the list of things to check out.

6 minutes ago, sjhalayka said:

I get the error from the back propagation process.

No, you need the error for the back propagation process - it is the error that gets back-propagated!

Your original post makes no mention of error; it just applies some sort of genetic algorithm to the nets to try to increase their quality. When that approach is used, it is used instead of backpropagation. It's also rarely done these days, because backpropagation is far more effective.

29 minutes ago, Kylotan said:

No, you need the error for the back propagation process - it is the error that gets back-propagated!

Your original post makes no mention of error; it just applies some sort of genetic algorithm to the nets to try to increase their quality. When that approach is used, it is used instead of backpropagation. It's also rarely done these days, because backpropagation is far more effective.

Sorry, you're right. I should have said that my back propagation function returns the mean squared error, in my code: I calculate it at the beginning of the function and return it at the end. An example that I wrote, focusing on the XOR operation, is at: https://github.com/sjhalayka/ann_xor
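In other words, something like this simplified sketch (not the exact code from the repository):

#include <vector>

// Mean squared error between the network's output and the target output.
// This is the value my back propagation function returns each pass.
float mean_squared_error(const std::vector<float>& output,
                         const std::vector<float>& target)
{
    float sum = 0.0f;

    for (size_t i = 0; i < output.size(); i++)
    {
        const float diff = output[i] - target[i];
        sum += diff * diff;
    }

    return sum / output.size();
}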

In the game, I will only back propagate when the ANN wins a game. The relevant state changes will be represented by the input and output neurons during the back propagation process.

I see what you mean about shortcutting the back propagation process. Perhaps I should cut out the mutation process altogether. It does seem flimsy.

 

 

There is a chapter in Practical Neural Network Recipes in C++ called Eluding Local Minima II: Genetic Optimization. Perhaps I should study that chapter before I proceed.
