UCB1 is basically formula (1.1) in that paper.
Yes, I just realized how simple UCB1 is. I'm reading another paper/presentation: http://www.cs.bham.a...ctures/ucb1.pdf. I made a simple C implementation of UCB1 using the rand() function, and it does try the more rewarding functions more often.
This is making more sense to me. I'll start coding again tomorrow.
EDIT: I just realized that the tree in the progressiveMCTS paper is not the game tree I thought it was. I thought it was something similar to a minimax tree, in which each layer is one player's turn. It is not; all of it is the turn of the current player. Is this thinking correct? I'm confused about how this would work, though: all of the results are backpropagated to the root, and the root is always visited, so the root would always be chosen as the move.
EDIT2: It does say to pick the child that was visited the most. Again, more feelings of realization. Going by the simple rule, since a leaf is expanded upon simulation, does this mean each leaf is simulated only once?
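The "most visited child" rule can be sketched roughly like this. The `Node` layout and the function name are hypothetical, not taken from the paper. The point of the sketch is that only the root's children are compared, so the root's own visit count never enters the choice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree node (field names are illustrative only). */
typedef struct Node {
    int          visits;      /* how many simulations passed through this node */
    double       value_sum;   /* summed results backpropagated to this node */
    struct Node *children;    /* array of child nodes, one per move */
    int          n_children;
} Node;

/* Return the index of the root's most visited child, or -1 if the root
   has no children. Ties go to the lowest index. */
int most_visited_child(const Node *root) {
    assert(root != NULL);
    int best = -1;
    int best_visits = -1;
    for (int i = 0; i < root->n_children; i++) {
        if (root->children[i].visits > best_visits) {
            best_visits = root->children[i].visits;
            best = i;
        }
    }
    return best;
}
```

The returned index is the move to play. The root itself is just the current position, so it is never a candidate.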