MCTS with a Draw reward

Started by
1 comment, last by alvaro 5 years, 7 months ago

If I use MCTS with rewards of -1, 0, and 1 for a loss, a draw, and a win respectively, can I use the UCT formula as is?

uct = node.rewards/(node.visits+1.0) + explorationRate * sqrt(ln(node.parent.visits) / (node.visits+1.0))

Afterwards, do I still return the most-visited node as the best move?


That seems reasonable. The formula only needs an estimate of the expected reward at each node, and node.rewards/(node.visits+1) is a fine one, whether the rewards are 0/1 or -1/0/1.

A minor matter of naming: I normally call that the "UCB1 formula", not the "UCT formula". UCT is the algorithm resulting from using the UCB1 formula at every node of an expanding tree.
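For anyone who wants to see it spelled out, here is a minimal Python sketch of the formula from the question and of the final most-visited choice. The `Node` class, the function names, and the default exploration rate of 1.4 are my own assumptions for illustration, not anything from the original post. It also assumes each node's rewards are summed from the point of view of the player choosing among the children.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node; field names mirror the formula in the question."""
    rewards: float = 0.0  # sum of backed-up results: -1 loss, 0 draw, +1 win
    visits: int = 0
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def ucb1(node: Node, exploration_rate: float = 1.4) -> float:
    """UCB1 score as written in the question, including the +1 on visits."""
    # Estimated expected reward; a draw simply contributes 0 to the sum.
    mean = node.rewards / (node.visits + 1.0)
    # Assumption: clamp the parent count to 1 so log() is defined on a fresh parent.
    parent_visits = max(1, node.parent.visits)
    bonus = math.sqrt(math.log(parent_visits) / (node.visits + 1.0))
    return mean + exploration_rate * bonus


def select_child(parent: Node) -> Node:
    """During the search, descend into the child with the highest UCB1 score."""
    return max(parent.children, key=ucb1)


def best_move(root: Node) -> Node:
    """After the search, return the most-visited child of the root."""
    return max(root.children, key=lambda child: child.visits)
```

Note that the two choices can differ: `select_child` includes the exploration bonus, so it may prefer a rarely visited child, while `best_move` ignores the bonus and looks only at visit counts.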

 

This topic is closed to new replies.
