sjhalayka

Blind Poker Artificial Neural Network


I am making a game called Blind Poker, and I am going to try using an artificial neural network (ANN) to make decisions for the computer player; that is, the AI will choose the state changes (the transitions from one game state to the next).

There are 52 cards in the deck, and each card can be in one of 9 states: player 1's hand, player 2's hand, player 3's hand, player 4's hand, player 5's hand, the discard pile, the top of the discard pile, the top of the pickup pile, and uncovered. Nine states take 4 bits to encode. So my neural network will have 52*4 = 208 input neurons and 208 output neurons.
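For concreteness, here's a rough sketch of how I'd pack the card states into those 208 inputs (the names here are made up for illustration, not from my actual code):

    #include <bitset>
    #include <vector>

    // The 9 possible states a card can be in; the values 0..8 fit in 4 bits.
    enum CardState
    {
        PLAYER_1_HAND, PLAYER_2_HAND, PLAYER_3_HAND, PLAYER_4_HAND,
        PLAYER_5_HAND, DISCARD_PILE, TOP_OF_DISCARD, TOP_OF_PICKUP,
        UNCOVERED
    };

    // Pack all 52 card states into 52*4 = 208 inputs (each 0.0 or 1.0).
    std::vector<double> encode_4bit(const std::vector<CardState> &states)
    {
        std::vector<double> inputs(52 * 4, 0.0);

        for (size_t card = 0; card < states.size(); card++)
        {
            std::bitset<4> bits(static_cast<unsigned long>(states[card]));

            for (size_t bit = 0; bit < 4; bit++)
                inputs[card * 4 + bit] = bits[bit] ? 1.0 : 0.0;
        }

        return inputs;
    }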

The biggest problem that I face now is deciding how many hidden layers to use, and how many neurons per hidden layer.

I'll train the ANN initially by facing it off against a computer player that picks pseudorandom state changes. Once the ANN has won a significant number of games, I will copy it a bunch of times and gently perturb the weights of these copies. Then I'll play these copies against the original ANN to see which one comes out on top, copy the winner, and perturb it again. So on and so forth, until some significant number of generations has been spawned.
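The "gentle perturbation" step would be something simple like adding small Gaussian noise to each weight. A minimal sketch, treating the weights as one flat vector for simplicity (my real network class will differ):

    #include <random>
    #include <vector>

    // Return a mutated copy of a network's weights: each weight gets a
    // small amount of Gaussian noise added. 'sigma' controls how gentle
    // the perturbation is.
    std::vector<double> perturb_weights(const std::vector<double> &weights, double sigma)
    {
        static std::mt19937 gen(std::random_device{}());
        std::normal_distribution<double> noise(0.0, sigma);

        std::vector<double> mutated = weights;

        for (double &w : mutated)
            w += noise(gen);

        return mutated;
    }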

Is there anything you'd do differently?


The whole setup seems like a non-starter to me. How can you have those 208 inputs? If your game is anything like the poker I know, your agent will not have knowledge of the complete game state, so it shouldn't be given the complete game state as input.

Start by thinking about what information the agent really has and what actions are available. Then figure out how to provide the input to the ANN and how to get the output. For instance, if you list the hand so far as a sequence of events, you can feed those events to an RNN and then use a softmax output layer to produce a probability distribution over the possible actions.
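To illustrate the output side: a softmax layer just turns the network's raw action scores into a probability distribution. A minimal sketch, independent of any particular library:

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // Convert raw output scores ("logits") into a probability
    // distribution over the possible actions. Assumes 'logits'
    // is non-empty.
    std::vector<double> softmax(const std::vector<double> &logits)
    {
        // Subtract the maximum score for numerical stability.
        double max_logit = *std::max_element(logits.begin(), logits.end());

        std::vector<double> probs(logits.size());
        double sum = 0.0;

        for (size_t i = 0; i < logits.size(); i++)
        {
            probs[i] = std::exp(logits[i] - max_logit);
            sum += probs[i];
        }

        for (double &p : probs)
            p /= sum;

        return probs;
    }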

Once you have a reasonable ANN architecture, you need to train it. If you have access to a database of human games, it would be a good idea to start by doing supervised learning to try to predict what the human will do next. You can also train it by reinforcement learning, but that's usually harder, particularly in an adversarial setting like this. I have some half-baked ideas on how to do it, but it wouldn't be trivial.

 


Thank you for the input.

OK, so the ANN basically counts cards. At the beginning of the game, the majority of the cards are in the 'uncovered' state, and the state changes occur as the game progresses. If the state of a card is 'uncovered', then it could be in player 1's hand, player 2's hand, etc., or in the pickup pile. That's a great deal of hidden information. No cheating is involved.

As for the 208 input neurons, that is a very tiny number compared to some image recognition ANNs, which have four input neurons per pixel (e.g. R, G, B, A). Even a small image, like 64*64 pixels, gives a 'whopping' 64*64*4 = 16384 input neurons. It's slow, yes, but it works well when the learning rate is set to 0.1.

 

8 hours ago, sjhalayka said:

Nine states take 4 bits to encode.

This is true, but it's not likely to be the way you want to proceed, because this would mean that you have certain inputs that are true for completely unrelated states. e.g. bit 1 would be true for cards in player 1's hand, and player 3's hand. This means the NN has to work harder to tell the difference between those 2 states, training layers purely to separate out your 4-bit representation into the 9 separate states again.

Instead, you probably want "one-hot" encoding; 9 input neurons for each card, with only one of them 'hot', and that represents which of the 9 states it's in.
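In code, the idea is something like this (a rough sketch; adapt it to however you actually represent cards and states):

    #include <vector>

    // One-hot encode 52 cards x 9 states into 52*9 = 468 inputs.
    // 'states' holds a value in the range [0, 8] for each of the 52 cards.
    std::vector<double> encode_one_hot(const std::vector<int> &states)
    {
        const size_t NUM_STATES = 9;
        std::vector<double> inputs(52 * NUM_STATES, 0.0);

        for (size_t card = 0; card < states.size(); card++)
            inputs[card * NUM_STATES + states[card]] = 1.0; // exactly one 'hot' input per card

        return inputs;
    }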

 

8 hours ago, sjhalayka said:

The biggest problem that I face now is deciding how many hidden layers to use, and how many neurons per hidden layer.

This is all part and parcel of working with neural networks. If you know about things like bias vs variance, overfitting/underfitting, learning rates, error rates, then you can do the following:

  1. Pick reasonable estimates for these hyperparameters
  2. Perform some training with that neural network
  3. Observe how well it learns, and see whether it is converging, failing to converge, converging too slowly, etc.
  4. Adjust hyperparameters according to these observations
  5. Repeat from step 2.

If you don't know about some of those things, it's time to do some research, because it's a deep subject and not easy to cover here succinctly.
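As a trivial sketch of that loop, where train_and_evaluate is a placeholder stand-in for whatever training harness you build:

    #include <cstdio>
    #include <initializer_list>

    // Placeholder: swap in your real training code here. It should train
    // a network of the given shape and return its validation error.
    double train_and_evaluate(int hidden_layers, int neurons_per_layer, double learning_rate)
    {
        // ... build the network, train it, measure the error ...
        return 0.0;
    }

    int main()
    {
        // Step 1: pick reasonable starting estimates, then sweep nearby values.
        for (int layers : {1, 2, 3})
        {
            for (int neurons : {64, 128, 256})
            {
                // Steps 2 and 3: train, then observe how well it converges.
                double error = train_and_evaluate(layers, neurons, 0.1);
                std::printf("%d layers, %d neurons: error %f\n", layers, neurons, error);
                // Steps 4 and 5: adjust the ranges above and repeat.
            }
        }

        return 0;
    }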

 

8 hours ago, sjhalayka said:

I'll train the ANN initially by facing it off against a computer player that picks pseudorandom state changes.

This is a bad idea. A neural network learns by comparing its own output to a desired output, and adjusting accordingly. A pseudo-random state change on the opponent's part is not going to produce reasonable states to compare against. And you've not mentioned how you'll measure the error, which is the important part.

 

8 hours ago, sjhalayka said:

Once the ANN has won a significant number of games, I will copy it a bunch of times and gently perturb the weights of these copies.

This is also a really bad idea, because you've thrown away the key benefit of a neural network - i.e. the fact that errors in the output layer can be propagated back through the network to adjust each of the layers to improve the response - and vastly increased the search space of your system, by multiplying the number of networks you need to train. On a game with a state space as large as poker, you will probably find that the universe ends far sooner than your neural network becomes usable.

 

What you should do is something more like this:

  • Use knowledge of poker rules and a database of past poker games to generate a training set: a set of (game state, chosen action, score/benefit) tuples.
  • Train on this data so that, in the same situation, the computer takes the actions that scored positively and avoids the actions that scored negatively.
  • Ensure the network is not overfitting, and is therefore able to generalise to similar situations.
  • Generate extra training data, e.g. by playing against existing computer programs, or even by having two programs play each other.
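Concretely, each entry in that training set might look something like the following (the field types are placeholders for whatever representation you settle on):

    #include <vector>

    // One training example: the game state the agent saw, the action that
    // was chosen, and how well that choice turned out.
    struct TrainingExample
    {
        std::vector<double> game_state; // e.g. one-hot encoded card states
        int chosen_action;              // index into the set of legal actions
        double score;                   // positive = good outcome, negative = bad
    };

    // The training set is then just a collection of these examples.
    typedef std::vector<TrainingExample> TrainingSet;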


Being the resident poker expert in here, I've got a feeling we can't continue helping you until you actually describe this game to us. It doesn't feel like poker from your brief mentions. Perhaps a variant of draw poker? If it is, then an ANN seems all but intractable for this use.


To get a handle on how the game works, I recommend that you try the finished Windows version:

https://github.com/sjhalayka/Windows_blind_poker/raw/master/poker.zip

Disclaimer: Run it through a virus scanner, or better yet, run it on a disposable virtual machine (e.g. VirtualBox with graphics acceleration enabled).

The game pretty much shows you the way to go: green borders mark the cards that you can click on, and the borders turn red while the computer players make their moves.

The game is called Blind Poker because you can't see anything at the beginning, and the goal of the game is to end up with the best poker hand.

I really appreciate everyone's questions and comments.

Attached is an image of the table. There are 4 players, and on the bottom are the discard pile (left) and the pickup pile (right).

 

[Attached image: Screen Shot 2017-08-12 at 9.02.18 PM.png]

6 hours ago, Kylotan said:

This is true, but it's not likely to be the way you want to proceed, because this would mean that you have certain inputs that are true for completely unrelated states. e.g. bit 1 would be true for cards in player 1's hand, and player 3's hand. This means the NN has to work harder to tell the difference between those 2 states, training layers purely to separate out your 4-bit representation into the 9 separate states again.

Instead, you probably want "one-hot" encoding; 9 input neurons for each card, with only one of them 'hot', and that represents which of the 9 states it's in.

Thank you. I will consider one-hot encoding. I believe that the book I was reading, Practical Neural Network Recipes in C++, preferred bit encoding over one-hot encoding, but I'm having trouble finding the page where this is discussed.

 

6 hours ago, Kylotan said:

This is all part and parcel of working with neural networks. If you know about things like bias vs variance, overfitting/underfitting, learning rates, error rates, then you can do the following:

  1. Pick reasonable estimates for these hyperparameters
  2. Perform some training with that neural network
  3. Observe how well it learns, and see whether it is converging, failing to converge, converging too slowly, etc.
  4. Adjust hyperparameters according to these observations
  5. Repeat from step 2.

If you don't know about some of those things, it's time to do some research, because it's a deep subject and not easy to cover here succinctly.

Thanks again, for the insight and recommendations.

 

6 hours ago, Kylotan said:

This is a bad idea. A neural network learns by comparing its own output to a desired output, and adjusting accordingly. A pseudo-random state change on the opponent's part is not going to produce reasonable states to compare against. And you've not mentioned how you'll measure the error, which is the important part.

I get the error from the back propagation process. I can use the mean squared error and aim for some low error like 0.01. Perhaps I'm misunderstanding what you mean.

 

6 hours ago, Kylotan said:

This is also a really bad idea, because you've thrown away the key benefit of a neural network - i.e. the fact that errors in the output layer can be propagated back through the network to adjust each of the layers to improve the response - and vastly increased the search space of your system, by multiplying the number of networks you need to train. On a game with a state space as large as poker, you will probably find that the universe ends far sooner than your neural network becomes usable.

That doesn't sound very promising. 

 

6 hours ago, Kylotan said:

What you should do is something more like this:

  • Use knowledge of poker rules and a database of past poker games to generate a training set: a set of (game state, chosen action, score/benefit) tuples.
  • Train on this data so that, in the same situation, the computer takes the actions that scored positively and avoids the actions that scored negatively.
  • Ensure the network is not overfitting, and is therefore able to generalise to similar situations.
  • Generate extra training data, e.g. by playing against existing computer programs, or even by having two programs play each other.

Sounds good. Thanks once again for the list of things to check out.

6 minutes ago, sjhalayka said:

I get the error from the back propagation process.

No, you need the error for the back propagation process - it is the error that gets back-propagated!

Your original post makes no mention of error; it just applies some sort of genetic algorithm to the nets to try to improve their quality. When that approach is used, it is used instead of backpropagation. It's also rarely done these days because backpropagation is far more effective.

29 minutes ago, Kylotan said:

No, you need the error for the back propagation process - it is the error that gets back-propagated!

Your original post makes no mention of error; it just applies some sort of genetic algorithm to the nets to try to improve their quality. When that approach is used, it is used instead of backpropagation. It's also rarely done these days because backpropagation is far more effective.

Sorry, you're right. I should have said that I get the mean squared error returned from the back propagation function in my code. I calculate it at the beginning of the function and return it at the end. An example that I wrote, focusing on the XOR operation, is at: https://github.com/sjhalayka/ann_xor
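The calculation is roughly this (a simplified sketch, not the exact code from that repo):

    #include <vector>

    // Mean squared error between the network's output and the target
    // output; this is the value my back propagation function returns.
    // Assumes the two vectors have the same, non-zero size.
    double mean_squared_error(const std::vector<double> &output, const std::vector<double> &target)
    {
        double sum = 0.0;

        for (size_t i = 0; i < output.size(); i++)
        {
            double diff = output[i] - target[i];
            sum += diff * diff;
        }

        return sum / output.size();
    }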

In the game, I will only back propagate when the ANN wins. The relevant state changes will be represented by the input and output neurons during the back propagation process.
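Roughly, I'd record the state changes as (input, output) pairs during play and feed them to back propagation only if the game ends in a win. A sketch (the backpropagate call is hypothetical, standing in for my ANN class):

    #include <vector>

    struct Transition
    {
        std::vector<double> inputs;  // encoded card states before the move
        std::vector<double> outputs; // encoded card states after the move
    };

    // Accumulated over the course of one game.
    std::vector<Transition> game_history;

    void on_game_over(bool ann_won)
    {
        if (ann_won)
        {
            for (const Transition &t : game_history)
            {
                (void)t; // placeholder: backpropagate(t.inputs, t.outputs) would go here
            }
        }

        // Either way, start fresh for the next game.
        game_history.clear();
    }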

I see what you mean about shortcutting the back propagation process. Perhaps I should cut out the mutation process altogether. It does seem flimsy.

 

 


There is a chapter in Practical Neural Network Recipes in C++ called "Eluding Local Minima II: Genetic Optimization". Perhaps I should study that chapter before I proceed.

