This post is related to a comment Alvaro made in a thread about AI in games of perfect information.
I'm interested in implementing Monte Carlo Tree search for a game that involves random outcomes. For some moves the outcome is fixed. For others there could be 6 or more different outcomes depending on some unknown value. The probabilities of each outcome occuring can change as the game progresses-- in this way it is similar to Poker.
I understand MCTS in terms of growing the tree for moves with a fixed outcome, but I'm confused about nodes that have a random outcome. Do I try to expand these nodes, or do I simply use the node to as a starting point for a number of simulated games?
If they are expanded, how? Is each possible outcome explored as a separate node? Are the probability distributions stored, and the node essentially 'randomized' each time it is visited (which would totally mess up the sanity of child nodes?) I can see the value in expanding them (to a degree) but it seems extermely risky given the nature of UCB-- it may attach priority to a very unlikely outcome.
Any help would be very much appreciated. Thanks!
Will
5 replies to this topic
Ad:
#2 Members - Reputation: 1689
Posted 01 February 2012 - 01:11 PM
You do expand those nodes as if the result of the die were a move by some player, but you have to make sure that instead of using UCB to select a "move" in future searches, you simply pick a random move at this type of node.
Does that clear up all your questions?
Does that clear up all your questions?
#3 Moderators - Reputation: 1247
Posted 02 February 2012 - 08:34 AM
Put another way, those branches of the tree represent "what could possibly happen based on the relative likelihood that they will happen."
Dave Mark - President and Lead Designer of Intrinsic Algorithm LLC
Blogs I write:
IA News - What's happening at IA | IA on AI - AI news and notes | Post-Play'em - Observations on AI of games I play
"Reducing the world to mathematical equations!"
Professional consultant on game AI, mathematical modeling, simulation modeling
Co-advisor of the GDC AI Summit
Co-founder of the AI Game Programmers Guild
Author of the book, Behavioral Mathematics for Game AI
Blogs I write:
IA News - What's happening at IA | IA on AI - AI news and notes | Post-Play'em - Observations on AI of games I play
"Reducing the world to mathematical equations!"
#4 Members - Reputation: 118
Posted 02 February 2012 - 09:34 AM
alvaro, on 01 February 2012 - 01:11 PM, said:
You do expand those nodes as if the result of the die were a move by some player, but you have to make sure that instead of using UCB to select a "move" in future searches, you simply pick a random move at this type of node. Does that clear up all your questions?
I think so.. So to clarify, I expand until I hit a 'chance' node. When I hit a chance node, I roll the dice, and expand as if that were the move. The next time I run in to that node, I roll the dice again.
#5 Members - Reputation: 1689
Posted 02 February 2012 - 09:53 AM
The way I have done it before, the board structure knows whose turn it is. I consider die rolls to be moves by a player called "Fortuna" (my wife is a classicist). The UCT tree expands a position where Fortuna is to move just as it does any other position. When I store the statistics for each move (wins and visits), the wins are stored from the point of view of the player to move. Since Fortuna doesn't care who wins, I always count the result of the playout as 0.5 from her point of view. The UCB rule will then always select a move among the ones that have been explored the least. I add a bit of randomness to the UCB score of each move so we don't always explore "1" first or have some other weird bias like that.
EDIT: Of course, in the non-UCT part of the playout, Fortuna picks random moves.
EDIT: Of course, in the non-UCT part of the playout, Fortuna picks random moves.


















