AI for a simple shedding card game


Hello,

We have a game with 4 players, where the players take turns discarding cards that must be higher than the ones previously discarded. The card combinations are based on poker, and discarded cards are visible to everyone. The last player with remaining cards loses. I have the gameplay code written, and the AI can already list the valid combinations it can play from its hand.

I'm planning to use Monte Carlo playouts.
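Roughly, the plan is a fixed number of random playouts per candidate move, something like this C-style sketch (game_state_t, move_t, MAX_MOVES and the helper functions are stand-ins for the code I already have):

    /* Flat Monte Carlo: for each valid move, deal the unseen cards
     * randomly and play the game out to the end many times. */
    #define PLAYOUTS_PER_MOVE 10000

    move_t pick_move(const game_state_t *state)
    {
        move_t moves[MAX_MOVES];
        int n = list_valid_moves(state, moves);
        int best = 0;
        double best_rate = -1.0;

        for (int m = 0; m < n; ++m) {
            int wins = 0;
            for (int i = 0; i < PLAYOUTS_PER_MOVE; ++i) {  /* do 10,000 times */
                game_state_t copy = *state;
                deal_unseen_cards_randomly(&copy);
                apply_move(&copy, &moves[m]);
                wins += random_playout(&copy);  /* 1 if we won, 0 otherwise */
            }
            double rate = (double)wins / PLAYOUTS_PER_MOVE;
            if (rate > best_rate) {
                best_rate = rate;
                best = m;
            }
        }
        return moves[best];
    }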

What do you think?

Monte Carlo sounds like an excellent approach here, but unlike in your pseudocode, you need to be very smart about where you direct your playouts. Having a fixed "Do 10,000 times" loop in the code is very inefficient and wastes a lot of the playouts.

An important aspect of Monte Carlo is to construct/expand a game tree while the playouts are being played. You should never settle for just playing a fixed number of playouts at the root of the tree (or you can, but it won't be as effective). This is the machinery that makes the result converge to the best play as you increase the number of playouts.

You'll also want to estimate the uncertainty of each branch. For a move M, if you play 1000 random samples and always lose, versus playing 1000 random samples and winning 50% of the time, you know statistically that you should invest more samples in the latter case, and never play the first move. These statistics should guide the playouts, balancing the search toward uncertain branches. UCT tree search is one of the recent key advances in the field. This is also an interesting read that gives some more references.
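For reference, UCT's selection rule is based on UCB1, which picks the move j that maximizes

    x_j + sqrt(2 * ln(n) / n_j)

where x_j is the average reward observed for move j so far, n_j is the number of times move j has been sampled, and n is the total number of samples. The first term favors moves that have done well; the second term grows for moves that have been neglected, which is exactly the balancing described above.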

I'm not that familiar with Monte Carlo, although I fully understand the pseudocode.

What do you mean by "construct/expand a game tree while the playouts are being played"? I'm thinking this means I should build a game tree and then randomly play out from the leaves.

I'll be reading about UCT and Monte Carlo in general; I hope it will be fruitful.

Yeah, even with Monte Carlo, it's customary to maintain a game tree. Instead of exploring the tree fully, as with traditional minimax, the tree is expanded selectively, in the areas the Monte Carlo simulations favor. When the tree search hits a leaf node (the moves after that node are not tracked by the tree), the game is continued with random play, i.e. a random playout to the end of the game is performed.
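To make that concrete, the bookkeeping per tree node can be quite small; a sketch in C (the names are made up, not from any particular library):

    /* One node of the selectively-expanded game tree. */
    typedef struct node {
        move_t       move;          /* the move that leads to this node     */
        int          visits;        /* playouts that passed through here    */
        int          wins;          /* how many of those playouts we won    */
        struct node *children;      /* allocated only when expanded         */
        int          num_children;  /* 0 means leaf: finish with a playout  */
    } node_t;

Selection walks down from the root using the visit/win statistics, expands a leaf when it reaches one, finishes the game with a random playout, and then adds the result to the visits/wins counters of every node along the path.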

As clb points out, there is a large improvement available over sampling every move the same number of times. There are good moves and bad moves, and after sampling them a few thousand times you should have a pretty good idea of which are which, and you can exploit this by sampling promising moves more often. This idea is formalized in the multi-armed bandit problem. There is an algorithm called UCB1 that is a good starting point.
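In code the selection step is tiny; a sketch, assuming you keep per-move win and play counts in arrays:

    #include <math.h>

    /* UCB1 value of move i: observed win rate plus an exploration bonus
     * that grows for moves that have been sampled rarely. */
    double ucb1(const int *wins, const int *plays, int total, int i)
    {
        if (plays[i] == 0)
            return 1e9;  /* make sure every move is tried at least once */
        return (double)wins[i] / plays[i]
             + sqrt(2.0 * log((double)total) / plays[i]);
    }

    /* Index of the move to sample next. */
    int ucb1_select(const int *wins, const int *plays, int total, int n)
    {
        int best = 0;
        for (int i = 1; i < n; ++i)
            if (ucb1(wins, plays, total, i) > ucb1(wins, plays, total, best))
                best = i;
        return best;
    }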

The idea of building a tree is important for full-information games, but I don't think it will get you very far in a card game of this type, since a node depends not only on the previous moves, but also on the plausible card configuration generated for that simulation.


UCB1 is basically formula (1.1) in that paper.


Yes, I just realized how simple UCB1 is. I'm reading another paper/presentation: http://www.cs.bham.a...ctures/ucb1.pdf. I just made a simple C implementation of UCB1, using the rand() function. It does sample the more rewarding options more often.

This is making more sense to me. I'll start coding again tomorrow.

EDIT: I just realized that the tree in the progressiveMCTS paper is not the game tree I thought it was. I thought it was something similar to a minimax tree, in which each layer is a turn of a different player. It is not: all of it is the turn of the current player. Is this thinking correct? I'm confused about how this would work, though: all of the results are backpropagated to the root, and the root is always visited, so the root would always be chosen as the move.

EDIT 2: It does say to pick the child that was visited the most. Again, more feelings of realization. Based on the simple rule that a leaf is expanded upon simulation, does this mean each leaf is simulated only once?

(Struck out in an edit: Then again, trees might not be worth it; it's probably better to just list every possible move and run UCB1 on them.)

Where do you use rand() in UCB1? The original algorithm is deterministic (which is kind of strange for a Monte Carlo technique, but the randomness comes from the results of the playouts).

Start by implementing UCB1 at the root. It's basically the same as the pseudo-code at the beginning of the thread, but instead of sampling each move the same number of times, you always sample the move that has the highest UCB1 value:
    While there is time {
        Select the move M that maximizes the UCB1 formula
        Generate a plausible configuration for the cards of the opponents
        Play M
        Run the rest of the game randomly (room for improvement here)
        Count how often you won
    }
    Pick the move that was picked most often
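As a rough C sketch of that loop (game_state_t, move_t, MAX_MOVES and the extern functions stand in for the game code you said you already have; the names are made up):

    #include <math.h>
    #include <time.h>

    extern int  list_valid_moves(const game_state_t *s, move_t *out);
    extern void deal_unseen_cards_randomly(game_state_t *s);
    extern void apply_move(game_state_t *s, const move_t *m);
    extern int  random_playout(game_state_t *s);  /* returns 1 if we won */

    move_t pick_move_ucb1(const game_state_t *state, double seconds)
    {
        move_t moves[MAX_MOVES];
        int n = list_valid_moves(state, moves);
        int wins[MAX_MOVES] = {0}, plays[MAX_MOVES] = {0};
        int total = 0;
        clock_t deadline = clock() + (clock_t)(seconds * CLOCKS_PER_SEC);

        while (clock() < deadline) {
            /* Select the move M that maximizes the UCB1 formula. */
            int m = 0;
            double best = -1.0;
            for (int i = 0; i < n; ++i) {
                double v = (plays[i] == 0) ? 1e9
                    : (double)wins[i] / plays[i]
                      + sqrt(2.0 * log((double)total) / plays[i]);
                if (v > best) { best = v; m = i; }
            }
            /* Generate a plausible configuration for the opponents' cards,
             * play M, then finish the game randomly. */
            game_state_t copy = *state;
            deal_unseen_cards_randomly(&copy);
            apply_move(&copy, &moves[m]);
            wins[m]  += random_playout(&copy);
            plays[m] += 1;
            total    += 1;
        }

        /* Pick the move that was sampled most often. */
        int best_m = 0;
        for (int i = 1; i < n; ++i)
            if (plays[i] > plays[best_m])
                best_m = i;
        return moves[best_m];
    }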


Don't worry about the tree part for now, but I'll be happy to explain it if it still doesn't make sense after you've read and thought about it for a while.

By the way, is this for some sort of simplified Tichu variant? I love Tichu!

I think I understand the tree part now. The expansion is not directly tied to the number of simulations; for example, I can run 1000 simulations, then add a branch, then run another 1000 simulations and add another branch. I don't think I can apply this to my game, though, since a player has at most 13 moves at the start and that number drops quickly; I'd run out of moves for the tree. Your pseudocode is what I had in mind.

I'm not familiar with Tichu.

Here's my UCB1 code; it's a bit rough, I apologize.
http://pastebin.com/rddWUSzF

The tree part of Monte Carlo Tree Search (or more precisely the UCT algorithm) is basically using the UCB1 mechanism not only at the root, but also at other nodes that you end up visiting many times.

The problem with this in the case of card games is that you deal the unseen cards randomly at each simulation, so you will rarely visit the same situation twice: from the point of view of the next player, the situation is different after each plausible configuration is dealt.


Hello,

We have a game with 4 players, where the players take turns discarding cards that must be higher than the ones previously discarded. The card combinations are based on poker, and discarded cards are visible to everyone. The last player with remaining cards loses. I have the gameplay code written, and the AI can already list the valid combinations it can play from its hand.


I'd like to toss out an idea, but before I can (as it might be moot depending on the answers), I'd like a bit of clarification on the rules, if I understand them correctly:

You can discard any number of cards, so long as they form a valid poker combination, and it must be higher than the last combination discarded? Exactly how is that value calculated? Also, how many cards are players dealt at the start?
