Blind Poker Artificial Neural Network


I've pretty much decided to use functions and conditional statements to construct an AI.

The ANN, in the end before I gave up on it, consisted of 468 input neurons to encode the card states and one output neuron to encode a which-way choice. I was going to try one hidden layer of x = sqrt(num input neurons * num output neurons) = sqrt(468 * 1) ≈ 22 hidden neurons. This was a lot simpler than the original version.
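
For what it's worth, here is a minimal sketch of that sizing calculation, assuming the geometric-mean rule of thumb for the hidden layer size (the constant names are mine, not taken from the repository):

    #include <cmath>
    #include <cstddef>
    using namespace std;

    // Layer sizes for the simplified network described above; the
    // geometric-mean heuristic is an assumption on my part
    const size_t num_input_neurons = 468; // card state encoding
    const size_t num_output_neurons = 1;  // the which-way choice

    // sqrt(468 * 1) = 21.63..., rounded up to 22 hidden neurons
    const size_t num_hidden_neurons =
        size_t(ceil(sqrt(double(num_input_neurons * num_output_neurons))));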

4 hours ago, sjhalayka said:

Thanks for the information. Now I'm hungry.

I got that comment a lot when the article first came out, but mostly because the GDMag art staff had a blast with the layout. Mexican food everywhere!

Dave Mark - President and Lead Designer of Intrinsic Algorithm LLC
Professional consultant on game AI, mathematical modeling, simulation modeling
Co-founder and 10 year advisor of the GDC AI Summit
Author of the book, Behavioral Mathematics for Game AI
Blogs I write:
IA News - What's happening at IA | IA on AI - AI news and notes | Post-Play'em - Observations on AI of games I play

"Reducing the world to mathematical equations!"

What makes Blind Poker so different from Texas Hold 'Em that using an ANN no longer fits as a possible solution? Not to beat a dead horse or anything.

You have such a thin view of the current state space. Also, your decision space is pretty much guided by "does this card make my hand better?" In Texas Hold 'Em, the state space is based on your current hand, AND what the other players could make with the board, AND what their betting says about their hand, AND the math of the pot odds for calls, or how you can force them into sub-optimal pot odds decisions based on YOUR bet sizing, etc. There is a lot more going on there. Additionally, your decision space is much wider (particularly in no-limit hold 'em), since bet sizing is such a major part of the game.

Now I'm not a fan of using ANNs in hold'em, but that's because I hate the "feeling around in the dark" approach for something that is mathematically definable right from the start.

(Note, I'm interested in this... )

 



    // Computes a comparable score for a player's hand by encoding the
    // sorted card faces as a base-13 number (13 = FACE_A + 1 faces)
    size_t rank_hand(const size_t player_index) const
    {
        if(player_index >= NUM_PLAYERS)
            return 0;
        
        size_t ret = 0;
        
        vector<card> temp_hand = players_hands[player_index];
        
        // Sort ascending so that the highest face lands in the most
        // significant digit of the encoding below
        sort(temp_hand.begin(), temp_hand.end());

        // Note: FACE_A is defined to be 12, so there are 13 faces;
        // starting the offset at 13 rather than 1 scales every hand's
        // score by the same constant, which preserves the ordering
        size_t offset = FACE_A + 1;
        
        for(size_t i = 0; i < NUM_CARDS_PER_HAND; i++)
        {
            ret += temp_hand[i].face * offset;
            offset *= FACE_A + 1;
        }
        
        return ret;
    }

There's knowledge to be gained by considering the plays of any winner, not just when you win.

There's also knowledge to be gained by considering the plays of any loser, not just when you lose.

So I took a couple of days off and went the ANN route. The code is up at: https://github.com/sjhalayka/bpai

I play 1 pseudorandom computer player versus 4 ANNs, for 5 players altogether. No training is needed for the player who wins. It's when an ANN loses that its choices are negated (e.g. if(0 == floor(value + 0.5)) { value = 1; } else { value = 0; }) and fed into the backpropagation function. Since the error rate settles down to some constant value rather than converging to zero, the learning process is terminated solely by a maximum number of training sessions.
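
Roughly, the negation step looks like the sketch below. Note that the neural_network class and its method names here are hypothetical stand-ins, not the actual bpai interface:

    #include <cmath>
    #include <vector>
    using namespace std;

    // Hypothetical stand-in for the real network class in the repository
    struct neural_network
    {
        float forward(const vector<float> &inputs);
        void backpropagate(const vector<float> &inputs, const float target);
    };

    // When an ANN loses, flip its rounded output and use the flipped
    // value as the backpropagation target
    void train_on_loss(neural_network &net, const vector<float> &inputs)
    {
        float value = net.forward(inputs); // single output neuron, 0..1

        if(0 == floor(value + 0.5f))
            value = 1; // the network chose 0, so train it toward 1
        else
            value = 0; // the network chose 1, so train it toward 0

        net.backpropagate(inputs, value);
    }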

For the 4-player down to 2-player games, I play 3 down to 1 ANNs, respectively. Altogether, for the 2-, 3-, 4-, and 5-player games, there are 1 + 2 + 3 + 4 = 10 ANNs.

... now back to the hard-coded AI.

I updated the code to include one-hot encoding, using a #define. 
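
As an illustration of what that switch might look like (the define name, the card-state count, and the encoding layout below are my assumptions, not necessarily the repository's exact scheme):

    #include <cstddef>
    #include <vector>
    using namespace std;

    // Hypothetical number of states a card can be in; 9 states per card
    // would be consistent with the 468 = 52 * 9 inputs mentioned earlier
    const size_t NUM_CARD_STATES = 9;

    #define ONE_HOT

    #ifdef ONE_HOT
        // One input neuron per possible state; exactly one is set to 1
        void encode_card_state(vector<float> &inputs, const size_t state)
        {
            for(size_t i = 0; i < NUM_CARD_STATES; i++)
                inputs.push_back(i == state ? 1.0f : 0.0f);
        }
    #else
        // A single input neuron, with the state packed into one scaled value
        void encode_card_state(vector<float> &inputs, const size_t state)
        {
            inputs.push_back(float(state) / float(NUM_CARD_STATES - 1));
        }
    #endif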

In order to play the game, the computer player needs to know the best possible rank (Royal Flush, etc.) that its current hand can make. Cards in one's hand that have not yet been shown are treated as wild cards. When filling in those wild cards, only cards that are not currently shown anywhere on the table are considered.

Once the best possible rank is obtained, the player temporarily takes a card from either the discard pile or the pickup pile and runs that temporary hand through the best-possible-rank function. If this temporary rank is less than the current best possible rank, the card is not taken on a permanent basis.
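
In code, the pile decision might look something like this; best_possible_rank and the other names are placeholders standing in for the real functions:

    #include <cstddef>
    using namespace std;

    struct card; // defined elsewhere in the program

    // Hypothetical helpers, assumed to exist elsewhere: both treat the
    // player's unseen cards as wild cards, as described above
    size_t best_possible_rank(const size_t player_index);
    size_t best_possible_rank_with(const size_t player_index, const card &candidate);

    // Take a card from the discard or pickup pile only if doing so
    // does not lower the hand's best possible rank
    bool should_take_card(const size_t player_index, const card &candidate)
    {
        const size_t current_rank = best_possible_rank(player_index);
        const size_t temp_rank = best_possible_rank_with(player_index, candidate);

        return temp_rank >= current_rank;
    }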

This is not perfect, but it does the trick far better than a computer player who chooses cards from the discard or pickup pile pseudorandomly.
