mudslinger

MCTS with a Draw reward

If I use MCTS with rewards of -1, 0, and 1 for a loss, a draw, and a win respectively, can I use the UCT formula as is?

uct = node.rewards/(node.visits+1.0) + explorationRate * sqrt(ln(node.parent.visits) / (node.visits+1.0))

Afterwards, do I still return the most-visited node as the best move?

That seems reasonable. All the formula needs is an estimate of the expected value of the reward distribution at each node, and node.rewards/(node.visits+1) is a sensible one.
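To make the estimate concrete, here is a minimal sketch. The function name `mean_estimate` is made up for illustration and does not come from the thread. With rewards of -1, 0, and 1, dividing by visits+1 amounts to averaging in one extra virtual draw, so the estimate is pulled slightly toward 0 and an unvisited node scores 0 instead of causing a division by zero.

```python
def mean_estimate(rewards: float, visits: int) -> float:
    """Estimate of a node's expected reward, using the same
    rewards / (visits + 1) term as the formula in the question."""
    return rewards / (visits + 1.0)


# Unvisited node: no division by zero, estimate is neutral.
print(mean_estimate(0, 0))  # 0.0

# 3 wins and 1 loss give rewards = 2 over visits = 4.
# The plain average would be 0.5; the +1 shrinks it toward 0.
print(mean_estimate(2, 4))  # 0.4
```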

A minor matter of naming: I normally call that the "UCB1 formula", not the "UCT formula". UCT is the algorithm resulting from using the UCB1 formula at every node of an expanding tree.
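As a rough sketch of how this fits together, the code below applies the formula from the question when choosing which child to descend into, and picks the final move by visit count. The `Node` class, the function names, and the exploration rate of 1.4 are assumptions made for this example, not something given in the thread. Since the rewards span [-1, 1] rather than [0, 1], the exploration rate will likely need tuning for the game at hand.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical minimal tree node; a real one would also store a move/state.
    rewards: float = 0.0   # sum of -1 / 0 / +1 playout results
    visits: int = 0
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def ucb1(node: Node, exploration_rate: float = 1.4) -> float:
    """UCB1 score of a child node, following the formula in the question."""
    # Clamp to 1 so we never take the log of 0 (an assumption of this sketch).
    parent_visits = max(node.parent.visits, 1)
    exploit = node.rewards / (node.visits + 1.0)
    explore = exploration_rate * math.sqrt(
        math.log(parent_visits) / (node.visits + 1.0)
    )
    return exploit + explore


def select_child(parent: Node, exploration_rate: float = 1.4) -> Node:
    """Selection step during the search: child with the highest UCB1 score."""
    return max(parent.children, key=lambda c: ucb1(c, exploration_rate))


def best_move(root: Node) -> Node:
    """Final decision after the search: the most-visited child of the root."""
    return max(root.children, key=lambda c: c.visits)


# Small hand-built example with made-up statistics.
root = Node(visits=30)
a = Node(rewards=6.0, visits=20, parent=root)
b = Node(rewards=-2.0, visits=9, parent=root)
c = Node(rewards=1.0, visits=1, parent=root)
root.children = [a, b, c]

print(select_child(root) is c)  # True: the barely explored child gets the next visit
print(best_move(root) is a)     # True: the most-visited child is the move to play
```

Note that the two choices differ on purpose: UCB1 decides where to spend the next simulation, while the visit count decides which move to actually play.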

 
