
Álvaro

Posted 05 January 2013 - 04:22 PM

I think minimax won't help you at all here. Monte Carlo methods, however, should do the job nicely.

The first thing you need to have is a probabilistic model of how players make decisions that can be evaluated very fast. It doesn't have to be perfect; you can actually start with something that assigns equal probabilities to all the choices available, and then make obviously bad moves much less likely and obviously good ones much more likely. Always normalize the probabilities to make sure they add up to 1.
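To make that concrete, here is a minimal Python sketch of such a model. The move names and the `toy_score` heuristic are made up for illustration; a real game would plug in its own cheap scoring function.

```python
import random

def move_probabilities(moves, score):
    """Turn cheap heuristic scores into probabilities that add up to 1."""
    # Clamp to a tiny positive value so no move gets probability exactly 0.
    weights = [max(score(m), 1e-9) for m in moves]
    total = sum(weights)
    return [w / total for w in weights]

def sample_move(moves, score, rng=random):
    """Draw one move according to the model (used during playouts)."""
    probs = move_probabilities(moves, score)
    return rng.choices(moves, weights=probs, k=1)[0]

def toy_score(move):
    """Hypothetical heuristic: start from equal weights, then adjust."""
    if move == "blunder":
        return 0.1   # obviously bad: much less likely
    if move == "strong":
        return 5.0   # obviously good: much more likely
    return 1.0       # everything else keeps the baseline weight

moves = ["blunder", "neutral", "strong"]
probs = move_probabilities(moves, toy_score)
```

The same `move_probabilities` call serves both purposes: sampling future decisions and looking up the probability of a decision somebody already made.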

Armed with this fast probabilistic model, you proceed to run simulations as follows:

(1) Create a random permutation of cards that matches the cards you have seen so far.

(2) Replay the hand from the beginning up to the current point. Multiply the probabilities of all the decisions the players have made (use the fast probabilistic model for this). The resulting number is called the likelihood of the observed decisions given the card permutation from (1).

(3) Now play a hypothetical move, among the moves you are considering (we'll discuss how to pick this move later, but for now think of it as a random move).

(4) Play the hand to the end, using the fast probabilistic model for all future decisions.

Accumulate statistics of how often you win or lose with each move selected at (3). Some simulations are more relevant than others: Use the likelihood computed in (2) as the weight of the simulation.
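Here is a rough Python sketch of steps (1) to (4) plus the weighted statistics. The game is a toy I made up just to have something runnable: a 10-card deck, the opponent holds one hidden card and has been seen to bet or check, and we choose between "call" and "fold". The `bet_probability` model and the payoffs are assumptions, not part of any real game.

```python
import random
from collections import defaultdict

RANKS = list(range(1, 11))  # toy deck: one card of each rank 1..10

def bet_probability(card):
    """Toy decision model: the opponent bets more often with higher cards."""
    return card / 11.0

def simulate(my_card, observed_action, moves, n_sims, rng):
    weight_sum = defaultdict(float)   # sum of likelihoods per move
    reward_sum = defaultdict(float)   # sum of likelihood * reward per move
    unseen = [c for c in RANKS if c != my_card]
    for _ in range(n_sims):
        # (1) Deal hidden cards consistent with what we have seen.
        opp_card = rng.choice(unseen)
        # (2) Likelihood of the observed decision under the fast model.
        p_bet = bet_probability(opp_card)
        likelihood = p_bet if observed_action == "bet" else 1.0 - p_bet
        # (3) Pick a hypothetical move (uniformly at random for now).
        move = rng.choice(moves)
        # (4) Play to the end; in this toy game that is just a showdown.
        if move == "fold":
            reward = -1.0
        else:
            reward = 2.0 if my_card > opp_card else -2.0
        # Accumulate statistics, weighted by the likelihood from (2).
        weight_sum[move] += likelihood
        reward_sum[move] += likelihood * reward
    return {m: reward_sum[m] / weight_sum[m] for m in moves if weight_sum[m] > 0}

ev = simulate(my_card=5, observed_action="bet", moves=["call", "fold"],
              n_sims=20000, rng=random.Random(0))
```

In this toy setup the weighted estimate for "call" should approach -1.2, which is worse than folding at -1. Without the weights it would look like about -0.22, because the simulations would ignore the fact that a bet makes high cards more likely.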

After you have played a number of simulations, you'll have a pretty good idea of which moves are promising and which ones aren't, and evaluating the bad moves many times over is just a waste of time. So when you get to (3) you want to give the strong moves a larger probability of being picked. There is a theoretical construct that is very close to this situation, called a multi-armed bandit, and the theory developed for those can be useful. In particular, there is a policy called UCB1 that is described in an early paper about the computer Go program MoGo, which consists of always picking the move that maximizes a formula that goes something like this (from memory):

expected_reward[i] + some_constant * sqrt(log(total_number_of_simulations) / number_of_simulations[i])

So you need to keep track of how many simulations have been played for each move and also the total number of simulations. I think you should "count" these simulations weighted by the likelihood, so you are actually using the sum of the likelihoods of the simulations instead of the count.
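A possible Python sketch of that selection rule, using the standard UCB1 shape (mean plus `c * sqrt(log(total) / n)`) with likelihood sums in place of counts. The dictionary layout and the default `c` are my assumptions. I use `log(1 + total)` because weighted totals can be below 1, where a plain log would go negative.

```python
import math

def ucb1_pick(moves, reward_sum, weight_sum, c=1.0):
    """Pick the move with the highest upper confidence bound.

    reward_sum[m] is the sum of likelihood * reward for move m,
    weight_sum[m] is the sum of likelihoods (the weighted "count").
    """
    total = sum(weight_sum.get(m, 0.0) for m in moves)
    best_move, best_value = None, -math.inf
    for m in moves:
        n = weight_sum.get(m, 0.0)
        if n <= 0.0:
            return m  # an untried move always gets explored first
        mean = reward_sum[m] / n
        bonus = c * math.sqrt(math.log(1.0 + total) / n)
        if mean + bonus > best_value:
            best_move, best_value = m, mean + bonus
    return best_move
```

The constant `c` trades off exploration against exploitation and needs tuning per game; it should be on the same scale as your rewards.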

When you run out of time, you can play the move with the highest expected reward, or the move that was tried the most times in your simulations. Or you may want to take a little more time for this move if these two criteria don't agree.
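In Python, checking both criteria could look something like this, assuming the same hypothetical `reward_sum` and `weight_sum` dictionaries (likelihood-weighted totals per move):

```python
def final_choice(moves, reward_sum, weight_sum):
    """Return (best by expected reward, most tried, whether they agree)."""
    tried = [m for m in moves if weight_sum.get(m, 0.0) > 0.0]
    best_mean = max(tried, key=lambda m: reward_sum[m] / weight_sum[m])
    most_tried = max(tried, key=lambda m: weight_sum[m])
    return best_mean, most_tried, best_mean == most_tried
```

If the third value comes back `False`, that is the signal to spend a bit more time simulating before committing.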

That should be enough to get you started. As you certainly know if you have programmed AI for board games before, there are many decisions to be made as you build your program. For instance, it might be better to generate card permutations in (1) using some version of importance sampling, because for some games the average likelihood of a random card permutation might be extremely low. Or you may want to reuse the situation from (1) to evaluate several moves...
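To illustrate the importance sampling idea, here is a small Python sketch for a toy case where the only hidden information is a single card. The function name, the uniform prior over unseen cards and the `proposal_weight` callback are all assumptions for the example. The deal is drawn from a proposal distribution instead of uniformly, and the simulation weight is corrected by prior / proposal so the weighted averages still estimate the same thing.

```python
import random

def importance_weighted_deal(unseen, likelihood, proposal_weight, rng):
    """Draw a hidden card from a proposal and return (card, simulation weight)."""
    q_raw = [proposal_weight(c) for c in unseen]
    q_total = sum(q_raw)
    card = rng.choices(unseen, weights=q_raw, k=1)[0]
    q = proposal_weight(card) / q_total   # probability under the proposal
    p = 1.0 / len(unseen)                 # uniform prior over unseen cards
    # Weight = likelihood of the observed decisions * prior / proposal.
    return card, likelihood(card) * p / q

def toy_likelihood(card):
    """Hypothetical model: chance that the opponent bet holding this card."""
    return card / 11.0

unseen = [1, 2, 3, 4, 6, 7, 8, 9, 10]
rng = random.Random(0)
# Using the likelihood itself as the proposal makes every weight equal,
# so no simulation is wasted on a deal that barely counts.
samples = [importance_weighted_deal(unseen, toy_likelihood, toy_likelihood, rng)
           for _ in range(1000)]
```

In a real game you usually can't sample from the exact likelihood, but any cheap proposal that favours plausible deals should cut down on near-zero-weight simulations.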

Do you have a particular card game in mind?

