
Blind Poker Artificial Neural Network


sjhalayka    3

I am making a game called Blind Poker, and I am going to try using an artificial neural network (ANN) to make decisions for the computer player; that is, the AI will choose state changes (transitions from one game state to the next).

There are 52 cards in the deck, and there are 9 states that each card can be in: player 1's hand, player 2's hand, player 3's hand, player 4's hand, player 5's hand, discard pile, top of discard pile, top of pickup pile, and uncovered. 9 states take 4 bits to encode, so my neural network will have 52*4 = 208 input neurons and 208 output neurons.
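
To make this concrete, here is a minimal sketch of the encoding I have in mind (the enum and all of the names here are placeholders, not my actual code):

    #include <array>
    #include <cstddef>

    // Hypothetical card states; the names and ordering are placeholders.
    enum card_state
    {
        PLAYER_1_HAND, PLAYER_2_HAND, PLAYER_3_HAND,
        PLAYER_4_HAND, PLAYER_5_HAND,
        DISCARD_PILE, TOP_OF_DISCARD_PILE, TOP_OF_PICKUP_PILE,
        UNCOVERED // 9 states in total, so 4 bits per card
    };

    const size_t NUM_CARDS = 52;
    const size_t BITS_PER_CARD = 4;

    // Pack each card's state into 4 bits, exposed as 0/1 input neurons:
    // 52 cards * 4 bits = 208 inputs.
    std::array<float, NUM_CARDS * BITS_PER_CARD>
    encode_binary(const std::array<card_state, NUM_CARDS> &states)
    {
        std::array<float, NUM_CARDS * BITS_PER_CARD> input = {};

        for (size_t i = 0; i < NUM_CARDS; i++)
            for (size_t b = 0; b < BITS_PER_CARD; b++)
                input[i * BITS_PER_CARD + b] = ((states[i] >> b) & 1) ? 1.0f : 0.0f;

        return input;
    }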

The biggest problem that I face now is deciding on how many hidden layers there should be, and how many neurons per hidden layer there should be.

I'll train the ANN initially by facing it off against a computer player that picks pseudorandom state changes. Once the ANN has won a significant number of games, I will copy it a bunch of times and gently perturb the weights of these copies. Then I'll play these copies against the original ANN and see which one comes out on top, copy that winner, perturb it, and so on, until some significant number of generations has been spawned.

Is there anything you'd do differently?

alvaro    21266

The whole setup seems like a non-starter to me. How can you have those 208 inputs? If your game is anything like the poker I know, your agent will not have knowledge of the complete game state, so it shouldn't be given the complete game state as input.

Start by thinking what information the agent really knows about and what actions are available. Then figure out how to provide the input to the ANN and how to get the output. For instance, if you list the hand so far as a sequence of events, you can feed those events to an RNN and then use a softmax output layer with a probability distribution over the possible actions.
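
To illustrate the shape of the idea, here is a bare-bones sketch of a single RNN step feeding into a softmax over the actions. The sizes, the event encoding, and the weight layout are all made up for illustration; in practice you would use an existing library rather than hand-roll this:

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Softmax turns raw scores into a probability distribution.
    std::vector<float> softmax(const std::vector<float> &z)
    {
        float max_z = z[0];
        for (size_t i = 1; i < z.size(); i++)
            if (z[i] > max_z) max_z = z[i];

        std::vector<float> p(z.size());
        float sum = 0;

        for (size_t i = 0; i < z.size(); i++)
        {
            p[i] = std::exp(z[i] - max_z); // subtract the max for numerical stability
            sum += p[i];
        }

        for (size_t i = 0; i < p.size(); i++)
            p[i] /= sum;

        return p;
    }

    // One step of a vanilla RNN: h = tanh(W_xh * x + W_hh * h), followed by
    // a softmax over W_hy * h. Biases are omitted for brevity, and the
    // row-major weight matrices would be trained, not hand-filled.
    struct rnn
    {
        size_t n_in, n_hidden, n_out;
        std::vector<float> W_xh, W_hh, W_hy;

        std::vector<float> step(const std::vector<float> &x, std::vector<float> &h) const
        {
            std::vector<float> h_new(n_hidden, 0.0f);

            for (size_t i = 0; i < n_hidden; i++)
            {
                float a = 0;
                for (size_t j = 0; j < n_in; j++)     a += W_xh[i * n_in + j] * x[j];
                for (size_t j = 0; j < n_hidden; j++) a += W_hh[i * n_hidden + j] * h[j];
                h_new[i] = std::tanh(a);
            }

            h = h_new;

            std::vector<float> z(n_out, 0.0f);

            for (size_t i = 0; i < n_out; i++)
                for (size_t j = 0; j < n_hidden; j++)
                    z[i] += W_hy[i * n_hidden + j] * h[j];

            return softmax(z); // probability distribution over the possible actions
        }
    };

You would feed the events of the hand to step() one at a time, carrying the hidden state h forward, and read the action probabilities off the final output.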

Once you have a reasonable ANN architecture, you need to train it. If you have access to a database of human games, it would be a good idea to start by doing supervised learning to try to predict what the human will do next. You can also train it by reinforcement learning, but that's usually harder, particularly in an adversarial setting like this. I have some half-baked ideas on how to do it, but it wouldn't be trivial.

 

sjhalayka    3

Thank you for the input.

OK, so the ANN basically counts cards. At the beginning of the game, the majority of the cards will be in the 'uncovered' state. The state changes occur as the game progresses. If the state of a card is 'uncovered', then it could be in player 1's hand, player 2's hand, etc., or in the pickup pile. That's a great deal of missing knowledge. No cheating is involved.

As for the 208 neurons, that is a very tiny number compared to some image-recognition ANNs, which have four input neurons per pixel (e.g. R, G, B, A). Even a small image, like 64*64 pixels, has a 'whopping' 64*64*4 = 16384 input neurons. It's slow, yes, but it works well when the learning rate is set to 0.1.

 

Kylotan    10010
8 hours ago, sjhalayka said:

9 states take 4 bits to encode.

This is true, but it's not likely to be the way you want to proceed, because this would mean that you have certain inputs that are true for completely unrelated states. e.g. bit 1 would be true for cards in player 1's hand, and player 3's hand. This means the NN has to work harder to tell the difference between those 2 states, training layers purely to separate out your 4-bit representation into the 9 separate states again.

Instead, you probably want "one-hot" encoding; 9 input neurons for each card, with only one of them 'hot', and that represents which of the 9 states it's in.
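
A minimal sketch of that encoding, using the 52 cards and 9 states described above (the names are illustrative):

    #include <array>
    #include <cstddef>

    const size_t NUM_CARDS = 52;
    const size_t NUM_STATES = 9;

    // One-hot encoding: 9 inputs per card, exactly one of which is 1.
    // 52 cards * 9 states = 468 inputs. Each entry of states is assumed
    // to be a state index in the range 0..8.
    std::array<float, NUM_CARDS * NUM_STATES>
    encode_one_hot(const std::array<int, NUM_CARDS> &states)
    {
        std::array<float, NUM_CARDS * NUM_STATES> input = {};

        for (size_t i = 0; i < NUM_CARDS; i++)
            input[i * NUM_STATES + states[i]] = 1.0f;

        return input;
    }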

 

8 hours ago, sjhalayka said:

The biggest problem that I face now is deciding on how many hidden layers there should be, and how many neurons per hidden layer there should be.

This is all part and parcel of working with neural networks. If you know about things like bias vs variance, overfitting/underfitting, learning rates, error rates, then you can do the following:

  1. Pick reasonable estimates for these hyperparameters
  2. Perform some training with that neural network
  3. Observe how well it learns, and see whether it is converging, failing to converge, converging too slowly, etc.
  4. Adjust hyperparameters according to these observations
  5. Repeat from step 2.

If you don't know about some of those things, it's time to do some research, because it's a deep subject and not easy to cover here succinctly.

 

8 hours ago, sjhalayka said:

I'll train the ANN initially by facing it off against a computer player that picks pseudorandom state changes.

This is a bad idea. A neural network learns by comparing its own output to a desired output, and adjusting accordingly. A pseudo-random state change on the opponent's part is not going to produce reasonable states to compare against. And you've not mentioned how you'll measure the error, which is the important part.

 

8 hours ago, sjhalayka said:

Once the ANN has won a significant number of games, I will copy it a bunch of times and gently perturb the weights of these copies.

This is also a really bad idea, because you've thrown away the key benefit of a neural network - i.e. the fact that errors in the output layer can be propagated back through the network to adjust each of the layers to improve the response - and vastly increased the search space of your system, by multiplying the number of networks you need to train. On a game with a state space as large as poker, you will probably find that the universe ends far sooner than your neural network becomes usable.

 

What you should do is something more like this:

  • use knowledge of poker rules and a database of past poker games to generate a training set of data that is a set of (game state, the chosen action, score/benefit); see the sketch after this list.
  • Train on this data so that, when in the same situation, the computer would take the actions that scored positively, and will not take the actions that scored negatively
  • Ensure the network is not overfitting, and is therefore able to generalise to similar situations
  • Generate extra training data, e.g. by playing against existing computer programs, or even by having 2 programs play each other.
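
In code, each record from the first bullet might look something like this (the fields and types are only a sketch):

    #include <vector>

    // One supervised training example built from a logged game. The exact
    // fields and types are a sketch, not a prescription.
    struct training_example
    {
        std::vector<float> game_state; // e.g. the one-hot card states described above
        int chosen_action;             // index into the enumerated legal actions
        float score;                   // how well that action worked out
    };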

IADaveMark    3741

Being the resident poker expert in here, I've got a feeling we can't continue helping you until you actually describe this game to us. It doesn't feel like poker from your brief mentions. Perhaps a variant of draw? If it is, then an ANN seems almost impossibly intractable for use.

sjhalayka    3

To get a handle on how the game works, I recommend that you try the finished Windows version:

https://github.com/sjhalayka/Windows_blind_poker/raw/master/poker.zip

Disclaimer: Run it through a virus scanner, or better yet, run it on a disposable virtual machine (e.g. VirtualBox with graphics acceleration enabled).

The game pretty much shows you the way to go: the cards that you can click on are given green borders, and the borders go red while the computer players make their moves.

The game is called Blind Poker because you can't see anything at the beginning, and the goal of the game is to get the best poker hand.

I really appreciate everyone's questions and comments.

Attached is an image of the table. There are 4 players, and on the bottom are the discard pile (left) and the pickup pile (right).

 

Screen Shot 2017-08-12 at 9.02.18 PM.png

sjhalayka    3
6 hours ago, Kylotan said:

This is true, but it's not likely to be the way you want to proceed, because this would mean that you have certain inputs that are true for completely unrelated states. e.g. bit 1 would be true for cards in player 1's hand, and player 3's hand. This means the NN has to work harder to tell the difference between those 2 states, training layers purely to separate out your 4-bit representation into the 9 separate states again.

Instead, you probably want "one-hot" encoding; 9 input neurons for each card, with only one of them 'hot', and that represents which of the 9 states it's in.

Thank you. I will consider one-hot encoding. I believe that the book I was reading, Practical Neural Network Recipes in C++, preferred the bit encoding over the one-hot encoding, though I'm having trouble finding which page that's on.

 

6 hours ago, Kylotan said:

This is all part and parcel of working with neural networks. If you know about things like bias vs variance, overfitting/underfitting, learning rates, error rates, then you can do the following:

  1. Pick reasonable estimates for these hyperparameters
  2. Perform some training with that neural network
  3. Observe how well it learns, and see whether it is converging, failing to converge, converging too slowly, etc.
  4. Adjust hyperparameters according to these observations
  5. Repeat from step 2.

If you don't know about some of those things, it's time to do some research, because it's a deep subject and not easy to cover here succinctly.

Thanks again, for the insight and recommendations.

 

6 hours ago, Kylotan said:

This is a bad idea. A neural network learns by comparing its own output to a desired output, and adjusting accordingly. A pseudo-random state change on the opponent's part is not going to produce reasonable states to compare against. And you've not mentioned how you'll measure the error, which is the important part.

I get the error from the back propagation process. I can use the mean squared error and some low error rate like 0.01. Perhaps I'm misunderstanding what you mean.
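
For reference, this is a minimal sketch of the mean squared error calculation I mean (not my actual code):

    #include <cassert>
    #include <cstddef>
    #include <vector>

    // Mean squared error between the network's output and the target output.
    float mean_squared_error(const std::vector<float> &actual,
                             const std::vector<float> &target)
    {
        assert(actual.size() == target.size() && !actual.empty());

        float sum = 0;

        for (size_t i = 0; i < actual.size(); i++)
        {
            const float d = actual[i] - target[i];
            sum += d * d;
        }

        return sum / static_cast<float>(actual.size());
    }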

 

6 hours ago, Kylotan said:

This is also a really bad idea, because you've thrown away the key benefit of a neural network - i.e. the fact that errors in the output layer can be propagated back through the network to adjust each of the layers to improve the response - and vastly increased the search space of your system, by multiplying the number of networks you need to train. On a game with a state space as large as poker, you will probably find that the universe ends far sooner than your neural network becomes usable.

That doesn't sound very promising. 

 

6 hours ago, Kylotan said:

What you should do is something more like this:

  • use knowledge of poker rules and a database of past poker games to generate a training set of data that is a set of (game state, the chosen action, score/benefit).
  • Train on this data so that, when in the same situation, the computer would take the actions that scored positively, and will not take the actions that scored negatively
  • Ensure the network is not overfitting, and is therefore able to generalise to similar situations
  • Generate extra training data, e.g. by playing against existing computer programs, or even by having 2 programs play each other.

Sounds good. Thanks once again for the list of things to check out.

Kylotan    10010
6 minutes ago, sjhalayka said:

I get the error from the back propagation process.

No, you need the error for the back propagation process - it is the error that gets back-propagated!

Your original post makes no mention of error, and just applies some sort of genetic algorithm to the nets to try and increase their quality. When this is done, this is done instead of backpropagation. It's also rarely done these days because backpropagation is more effective by far.

sjhalayka    3
29 minutes ago, Kylotan said:

No, you need the error for the back propagation process - it is the error that gets back-propagated!

Your original post makes no mention of error, and just applies some sort of genetic algorithm to the nets to try and increase their quality. When this is done, this is done instead of backpropagation. It's also rarely done these days because backpropagation is more effective by far.

Sorry, you're right. I should have said that I get the mean squared error returned from the back propagation function in my code: I calculate it at the beginning of the function, and return it at the end. An example that I wrote, focusing on the XOR operation, is at: https://github.com/sjhalayka/ann_xor

In the game I will only back propagate when the ANN wins a game. The relevant state changes will be represented by the input and output neurons during the back propagation process.
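
Here is a rough sketch of that plan; ann_t and its train() member are placeholders standing in for my actual network class:

    #include <cstddef>
    #include <utility>
    #include <vector>

    // Placeholder for the real network class; train() would run one round
    // of back propagation against a single input/target pair.
    struct ann_t
    {
        void train(const std::vector<float> &input,
                   const std::vector<float> &target)
        {
            // back propagation goes here
        }
    };

    // Record every (input, output) pair chosen during a game, then only
    // back propagate the pairs if the game was won; losing games are
    // simply discarded.
    void finish_game(ann_t &ann,
                     const std::vector<std::pair<std::vector<float>, std::vector<float> > > &moves,
                     const bool won)
    {
        if (!won)
            return;

        for (size_t i = 0; i < moves.size(); i++)
            ann.train(moves[i].first, moves[i].second);
    }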

I see what you mean about shortcutting the back propagation process. Perhaps I should cut out the mutation process altogether. It does seem flimsy.

 

 

sjhalayka    3

@Kylotan: Rather than test for error, I could abort after x number of games, or at some high enough win/loss ratio. Indeed, I'm having trouble knowing when to stop the learning process. Millions of games would be played, for sure, seeing how the odds of landing a royal flush in a 5-card hand are something like 649,739:1.

@alvaro: I think I see what was meant by knowledge in advance: the training process must check whether the proposed state change given by the ANN is even valid. Thus, all of the possible state changes must be enumerated, so that there is something to check the proposed state change against. If the proposed state change is invalid, then one valid state change is picked pseudorandomly from the enumeration. Anyway, the main point is that the possible state changes must be known to the AI in advance. That's cheating, I suppose, but it's the only way that I can see it working. I can also see this being part of the equation when the socket server has to check whether a player's proposed state change is valid.
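
A sketch of what I mean, where state_change and the enumeration of legal moves are placeholders:

    #include <cstddef>
    #include <cstdlib>
    #include <vector>

    // A state change, reduced to its essentials for this sketch.
    struct state_change
    {
        int card_index;
        int new_state;
    };

    bool operator==(const state_change &a, const state_change &b)
    {
        return a.card_index == b.card_index && a.new_state == b.new_state;
    }

    // Accept the ANN's proposal if it appears in the enumeration of legal
    // state changes; otherwise fall back to a pseudorandom legal one.
    // Assumes legal is non-empty (the rules always permit at least one move).
    state_change validate(const state_change &proposed,
                          const std::vector<state_change> &legal)
    {
        for (size_t i = 0; i < legal.size(); i++)
            if (legal[i] == proposed)
                return proposed;

        return legal[rand() % legal.size()];
    }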

Kylotan    10010
15 hours ago, sjhalayka said:

Practical Neural Network Recipes in C++

This book is 24 years old. The state of the art has advanced SIGNIFICANTLY since then, not to mention the computing power available. Do not use that book as an example of what to do today.

In particular, genetic optimisation of neural networks made sense in the 90s; it mostly doesn't, now.

 

12 hours ago, sjhalayka said:

Rather than test for error, I could abort after x number of games

This sounds like you don't really understand what the inputs and outputs of the net should be. If you expect to train the network by running entire games and then scoring it with a Win or a Lose, you're throwing away tons of useful information and wasting time. You need a more effective way of judging the value of each move.

 

12 hours ago, sjhalayka said:

I'm having trouble knowing when to stop the learning process

And this indicates that you need to study this aspect. Andrew Ng's online course explains how to observe how well a system is learning and how to adjust it to improve.

IADaveMark    3741

I admit that I didn't look at the game link above, but I'm really having a hard time imagining how a NN (of any flavor) would be a good fit here. If you want to learn NNs, pick another game. If you want to do the AI for this game, pick another technique.

sjhalayka    3
11 hours ago, Kylotan said:

This book is 24 years old. The state of the art has advanced SIGNIFICANTLY since then, not to mention the computing power available. Do not use that book as an example of what to do today.

In particular, genetic optimisation of neural networks made sense in the 90s; it mostly doesn't, now.

Do you have any recommendations on books to get?

 

11 hours ago, Kylotan said:

This sounds like you don't really understand what the inputs and outputs of the net should be. If you expect to train the network by running entire games and then scoring it with a Win or a Lose, you're throwing away tons of useful information and wasting time. You need a more effective way of judging the value of each move.

Well, I'm pretty sure that the inputs and outputs will be states, with the input/output pair counted as a state change. When a move is made, the AI gives a proposed state change. If that proposed state change is invalid, then a state change is picked pseudorandomly from a list of known valid state changes. When a game is won, all of the state changes picked during the game are run through the back propagation process. When a game is lost, all of the state changes are abandoned.

Do you know of a more efficient way?

 

11 hours ago, Kylotan said:

And this indicates that you need to study this aspect. Andrew Ng's online course explains how to observe how well a system is learning and how to adjust it to improve.

OK.

5 hours ago, IADaveMark said:

I admit that I didn't look at the game link above, but I'm really having a hard time imagining how a NN (of any flavor) would be a good fit here. If you want to learn NNs, pick another game. If you want to do the AI for this game, pick another technique.

That's ok. What games would work with a NN? What kind of AI might work better for a poker game?

I believe that I understand what you mean by intractable.

sjhalayka    3
6 minutes ago, IADaveMark said:

OK... so you are really, really sold on doing something with a NN? That's your entire focus here?

NNs are largely only good at pattern matching (as you seem to understand). That is, "given this exact set of inputs, what's the best output?" 

I'm not really sold on an NN solution, and definitely less so since this discussion. :)

I'm just going to do hard-coded AI, using if-else statements and the like. I do appreciate you steering me away from using an NN, which doesn't seem to be the best solution.

ApochPiQ    23064

Note that what you're talking about is not really what most people consider "poker" - it's (from what I can tell) a hand-assembly game where hands are scored using poker rules.

This is actually a very important distinction.

Playing, say, Texas Hold 'Em with an ANN requires one specific set of game knowledge for the NN to be effective. Playing your game requires a very different representation of game state and game awareness.

So your links are basically like finding a recipe for cake and deciding that the ingredients would make a good hamburger, because cake and hamburgers are both eaten. You can't cherry-pick one part of the system (NNs) and slap it into a disparate game and expect to get equally good results.

What people are trying to tell you in this thread is that you need to decide what you are doing. Either focus on NNs and apply them to a game they are suited for (i.e. NOT your game) or focus on your game and find a technique that works for it.

sjhalayka    3

I've pretty much decided to use functions and conditional statements to construct an AI.

The ANN, in the end before I gave up on it, consisted of 468 input neurons to encode the card states, and one output neuron to encode a single which-way choice. I was going to try one hidden layer of x = sqrt(num input neurons * num output neurons) = sqrt(468 * 1) ≈ 22 hidden neurons. This was a lot simpler than the original version.

IADaveMark    3741

You have such a thin view of the current state space. Also, your decision space is pretty much guided by "does this card make my hand better?" In Texas Hold 'Em, the state space is based on your current hand, AND what the other players could make with the board, AND what their betting says about their hands, AND the math of the pot odds for calls, or how you can force them into sub-optimal pot odds decisions based on YOUR bet sizing, etc. There is a lot more going on there. Additionally, the decision space is much wider (particularly in no limit hold'em), since bet sizing is such a major part of the game.

Now I'm not a fan of using ANNs in hold'em, but that's because I hate the "feeling around in the dark" approach for something that is mathematically definable right from the start.

(Note, I'm interested in this... )

 

sjhalayka    3

 

    // Rank a hand by treating the sorted faces as digits of a base-13
    // number: higher faces in a sorted hand give a higher rank. Suits
    // are ignored, so hands with the same faces rank equally.
    size_t rank_hand(const size_t player_index) const
    {
        if(player_index >= NUM_PLAYERS)
            return 0;
        
        size_t ret = 0;
        
        // Sort a copy so that the stored hand is left untouched
        vector<card> temp_hand = players_hands[player_index];
        
        sort(temp_hand.begin(), temp_hand.end());

        // Note: FACE_A is defined to be 12, so there are 13 face values
        size_t offset = FACE_A + 1;
        
        for(size_t i = 0; i < NUM_CARDS_PER_HAND; i++)
        {
            // Accumulate each face as the next base-13 digit
            ret += temp_hand[i].face * offset;
            offset *= FACE_A + 1;
        }
        
        return ret;
    }

