First of all sorry for possible mistakes. English is not my native language...
I'm trying to implement MCTS with UCT, to do an intelligence capable of play Go game. However, I have an important question about the algorithm. I know there are 4 phases: Selection, Expansion, Simulation and Backtracking.
My question is about Selection phase. I've been reading some papers, documentation and so on, and from what I understand, if a move has not been tried before, we must priorize it before anything else. That's make sense to me, but at the same time, I don't undertstand how to grow the tree in depth with this rule. For example, let's suppose we have 3 possible moves. We initialitze the tree, with just the root node.
- 1st playout: 0 moves. No childs yet. So we choose randomly 1 of the three possible moves, we add as a child of the root node, and we make a simulation from it. Then we update this new node incrementing the visit counter and the win counter (if playout has been won).
- 2nd and 3rd playout: We have 1 child in root node. But there are two moves that have not been tried yet. So, we must priorize them, isn't it? At the end of 3rd playout, we'll have 3 childs at first level of the tree, with 1 simulation each one.
- 4th playout: Okay. Now, in the selection phase, we travel down the tree and use UCT formula to select the first node (for example). We have three untried moves here because any simulation has been tried before through this node yet.
And here comes my question: everytime we get an unexploedr node, we will have to try the 3 possible moves before going deeper? That means that we will have always a branching factor of 3 for all nodes, isn't it? However in the next image (an example I've found in the Internet), the selected node is part of a level with only 2 of the 3 possible moves. How is that possible?
Am I doing this right? Or I'm missing something?
Thanks in advance!