UCB1 is basically formula (1.1) in that paper.
Yes, I just realized how simple UCB1 is. I'm reading another paper/presentation: http://www.cs.bham.a...ctures/ucb1.pdf. I just made a simple C implementation of UCB1 using the rand() function, and it does sample the more rewarding options more often.
This is making more sense to me. I'll start coding again tomorrow.
EDIT: I just realized that the tree in the progressive MCTS paper is not the game tree I thought it was. I assumed it was something like a min-max tree, where each layer is a turn for alternating players. It is not: all of it is the current player's turn. Is this thinking correct? I'm confused about how this would work, though: all of the results are backpropagated to the root, and the root is always visited, so wouldn't the root always be chosen as the move?
EDIT2: It does say the move chosen is the child that was visited the most, so the root itself is never a candidate.
Again, more feelings of realization. Based on that simple rule, since a leaf is expanded when it is simulated, does this mean each leaf is simulated only once?
Then again, trees might not be worth it; it's probably simpler to just list every possible move and run UCB1 on them.
Edited by mudslinger, 11 October 2012 - 04:15 PM.