Monte Carlo Tree Search - Fatalistic play

17 comments, last by ddyer 9 years, 4 months ago

Hi,

I'm currently writing the AI for a 2-player (non-deterministic, perfect-information, asymmetrical) board game. I've successfully used Negamax and Expectimax before, but for my current game it's very hard to write an evaluation function; hence, MCTS.

But I've noticed that the AI will often just suicide, or ignore moves that would make it lose the game. I'm using a fairly high sample count, and massively increasing it doesn't change the play much: still the same mistakes.

Currently, I simulate and return 1 for an AI win and -1 for a player win, but I don't know if this is enough. The algorithm will find the best path to victory, and that path includes the worst moves that the player could make.

I *think* this is the problem I'm having: the AI is expecting a best-case scenario for the player's moves. Since the simulations are random and I only check the win rate, the lines where the player makes the worst possible moves get counted too, so the AI's score comes out as high as possible.

Does that make any sense? I've been reading loads of papers about this, but none of them seem to mention it at all. Am I thinking about this the wrong way?

Any help would be highly appreciated, I can show code if needed.

Thanks!


I *think* this is the problem I'm having: the AI is expecting a best-case scenario for the player's moves. Since the simulations are random and I only check the win rate, the lines where the player makes the worst possible moves get counted too, so the AI's score comes out as high as possible.

Does that make any sense?


Only if you did it wrong. In nodes where it's the player's turn to move, you should be trying to get the best possible score for the player.

In nodes where it's the player's turn to move, you should be trying to get the best possible score for the player.

Currently, I'm adding 1 to the score when the AI wins a simulated game and -1 when the AI loses one, regardless of whose turn it is.

As you said, the two opponents should each be trying to maximize their own score, but I'm not sure how to add this to my implementation.

EDIT: One thing that comes to mind is editing my Select method: if it's the player's turn, select the child with the lowest UCT score, and the one with the highest for the AI. Would that work?

I imagine you are using something like UCB1 to select the next move while in the tree. In that formula, "wins" and "losses" are to be interpreted from the point of view of the player to move. The formula you are maximizing probably has two terms: something that looks like an average score and something that looks like an error estimate multiplied by a slowly growing function. In that case, you can flip the sign of the average-score term, but don't do anything to the error-estimate term.

If that doesn't make things clear, perhaps you should post some code or some formulas.
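
To make that concrete, here is a minimal sketch of the selection step described above (hypothetical names like Node, TotalValue and AiToMove; this is not the code from this thread). Results are assumed to be stored from the AI's point of view, so the average-score term is negated on the player's turns, while the error-estimate term keeps its sign:

using System;
using System.Collections.Generic;

class Node
{
	public double TotalValue;          // sum of playout results, stored from the AI's point of view
	public int Visits;
	public bool AiToMove;              // whose turn it is at this node
	public List<Node> Children = new List<Node>();

	const double C = 1.41421356;       // exploration constant; sqrt(2) is a common default
	const double Epsilon = 1e-6;       // avoids division by zero on unvisited children

	public Node SelectChild()
	{
		Node best = null;
		double bestValue = double.MinValue;

		foreach (Node child in Children)
		{
			// Average score from the AI's perspective; flip the sign when the
			// player is to move here, so each side maximizes its own outcome.
			double average = child.TotalValue / (child.Visits + Epsilon);
			double exploitation = AiToMove ? average : -average;

			// The error-estimate term is left untouched.
			double exploration = C * Math.Sqrt(Math.Log(Visits + 1) / (child.Visits + Epsilon));

			double uct = exploitation + exploration;
			if (uct > bestValue)
			{
				bestValue = uct;
				best = child;
			}
		}

		return best;
	}
}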

In general I'd say that I agree with Alvaro. ;)

Depending on your game, the trees may go very deep, so the search can't explore enough nodes in the time allotted. Perhaps you need a domain-specific estimate of the win/loss ratio based on the board state to optimise the search for previously unexplored nodes?
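
As a sketch of that idea (hypothetical names throughout; evaluateBoard stands for an assumed domain-specific heuristic returning a value in [-1, 1], nothing from this thread): seed unvisited children with the heuristic instead of treating them as unknown, so the tree policy tries plausible moves first.

using System;

class BoardState { }   // placeholder type, for illustration only

class PriorNode
{
	public double TotalValue;
	public int Visits;
	public BoardState State = new BoardState();

	// Use the domain heuristic as a prior for unvisited children, so the
	// tree policy prefers promising moves before any playout statistics exist.
	public double AverageOrPrior(Func<BoardState, double> evaluateBoard)
	{
		if (Visits == 0)
			return evaluateBoard(State);

		return TotalValue / Visits;
	}
}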


I imagine you are using something like UCB1 to select the next move while in the tree. In that formula, "wins" and "losses" are to be interpreted from the point of view of the player to move. The formula you are maximizing probably has two terms: something that looks like an average score and something that looks like an error estimate multiplied by a slowly growing function. In that case, you can flip the sign of the average-score term, but don't do anything to the error-estimate term.

I don't really see how that makes sense: Select just determines which nodes will be expanded/simulated. By changing the formula, all you do is change the exploration/exploitation rate for the player/AI without changing how the player/AI is scored. Or am I missing something?


Depending on your game, the trees may go very deep, so the search can't explore enough nodes in the time allotted. Perhaps you need a domain-specific estimate of the win/loss ratio based on the board state to optimise the search for previously unexplored nodes?

I'm not using a fixed time allotment but a fixed sample count: it's important that the difficulty stays the same regardless of hardware quality.

Thanks for the replies, guys! I'll update this post in a bit with some source code; I'm currently busy with a bunch of things.

Hey guys,

Here's my source code for the MCTS AI. It's almost entirely domain-independent, with a lot of general performance optimizations. Since performance isn't really an issue (I can simulate about 4,000 games per second), and increasing the sample size does NOT make the AI play better (with the issues I currently have), I'm not adding domain-dependent knowledge for now. Also, my goal is not to make the AI as strong as possible: this is made for casual to semi-casual players.


class TreeNodeTablut
{
	public void SelectAction()
	{
		LinkedList<TreeNodeTablut> visited = new LinkedList<TreeNodeTablut>();

		TreeNodeTablut cur = this;

		visited.AddLast(this);

		while (!cur.IsLeaf() && !cur._isTerminalNode)
		{
			cur = cur.Select();
			visited.AddLast(cur);
		}

		if (cur.Expand())
		{
			// Node has been expanded
			if (cur.Children != null)
			{
				cur = cur.Select();
			}
			else // Maximum expansion depth reached: Simulate on the currently selected parent node
			{ }

			visited.AddLast(cur);
			float value = Simulate(ref cur);

			foreach (TreeNodeTablut node in visited)
			{
				node.UpdateStats(value);
			}
		}
		else
		{
			return;
		}
	}

	// Select will select the best-looking child
	// Optimization: Make the move on the selected child during select. NOT during expand on all children
	private TreeNodeTablut Select()
	{
		TreeNodeTablut selected = null;

		if (Children == null)
		{
			return null;
		}
		
		double bestValue = double.MinValue;

		foreach (TreeNodeTablut c in Children)
		{
			double uctValue = c.TotValue / (c.Visits + epsilon) + Math.Sqrt(Math.Log(Visits + 1) / (c.Visits + epsilon)) + MBRandom.RandomFloat * epsilon;

			if (uctValue > bestValue)
			{
				selected = c;
				bestValue = uctValue;
			}
		}

		// Performance optimization: Now finish the expansion by making a move and storing the resulting board-state
		// This is only useful when using large search depths, as all nodes will be visited up to a certain depth
		if (selected != null && selected.BoardState.PawnInformation == null)
		{
			Board.UndoMove(ref selected.Parent.BoardState);

			PawnTablut pawn = (selected.CurrentPlayerTurn == PlayerTurn.PlayerOne ? Board.PlayerOneUnits : Board.PlayerTwoUnits)[selected.PawnIndex] as PawnTablut;

			Move move;
			move.GoalTile = Board.Tiles[selected.goalTile.X][selected.goalTile.Y];
			move.Unit = pawn;

			bool madeCapture;
			bool madeMove = Board.AIMakeMove(move, out madeCapture);

			selected.BoardState = Board.TakeBoardSnapshot();
		}

		return selected;
	}

	public bool Expand()
	{
		// Maximum expansion depth reached: don't create child nodes
		if (Depth == SearchDepth)
		{
			return true;
		}

		// Set the board up for this node
		Board.UndoMove(ref BoardState);

		// All moves for every pawn
		List<List<Move>> moves = Board.CalculateAllValidMoves(nextPlayerTurn == PlayerTurn.PlayerOne ? -1 : 1);

		// Some pawns might have no moves, but they are still in the list (with 0 moves), filter those out.
		// The pawns with 0 moves are still stored for performance optimization reasons
		List<List<Move>> orderedMoves = moves.OrderByDescending(o => o.Count).TakeWhile(m => m.Count != 0).ToList();

		// Create a child for every available move. This is less efficient than pruning some moves out, but it guarantees that every move (and subsequent moves) can be evaluated
		int moveCount = Board.GetTotalMovesAvailable(ref orderedMoves);
		if (moveCount > 0)
		{
			Children = new List<TreeNodeTablut>(moveCount);

			int index = 0;
			foreach (List<Move> pawnMoves in orderedMoves)
			{
				foreach (Move move in pawnMoves)
				{
					PawnTablut pawn = move.Unit as PawnTablut;
					TileTablut tile = move.GoalTile as TileTablut;

					Children.Add(new TreeNodeTablut());
					Children[index].OpeningMove = move;
					Children[index].CurrentPlayerTurn = NextPlayerTurn;
					Children[index].nextPlayerTurn = (nextPlayerTurn == PlayerTurn.PlayerOne ? PlayerTurn.PlayerTwo : PlayerTurn.PlayerOne);
					Children[index].Depth = Depth + 1;

					// Performance stuff: We don't make a move for every child, only store the information required to make that move.
					// In the select phase, we will then make the move.
					// Advantages: 
					//  - No need to undo the move, since we will continue with the board in the simulation
					//  - Far fewer moves to make on the board, since not every node will be visited, especially after a depth of 3/4
					//  - A lot less memory required, since we don't store the board-state for each move
					Children[index].Parent = this;
					Children[index].PawnIndex = pawn.Index;
					Children[index].goalTile = new IntVec2(tile.X, tile.Y);

					++index;
				}
			}
		}
		else
		{
			_isTerminalNode = true;
			return false;
		}

		return true;
	}

	private bool IsLeaf()
	{
		return Children == null;
	}

	private float Simulate(ref TreeNodeTablut tn)
	{
		int moveID, pawnID = -1;

		State state = Board.AICheckGameEnded();

		int simDepth = 0;

		List<List<Move>> validMovesAI = Board.CalculateAllValidMoves(1);
		List<List<Move>> validMovesPlayer = Board.CalculateAllValidMoves(-1);

		PlayerTurn saveTurn = tn.CurrentPlayerTurn;

		while (state == State.Empty)
		{
			List<List<Move>> moves = (tn.CurrentPlayerTurn == PlayerTurn.PlayerOne ? validMovesPlayer : validMovesAI);

			if (moves.Count > 0)
			{
				pawnID = MBRandom.RandomInt(0, moves.Count - 1);

				moveID = MBRandom.RandomInt(0, moves[pawnID].Count - 1);

				PawnTablut pawn = moves[pawnID][moveID].Unit as PawnTablut;
				TileTablut tile = moves[pawnID][moveID].GoalTile as TileTablut;

				int prevX = pawn.X;
				int prevY = pawn.Y;
				bool madeCapture, madeMove;
				madeMove = Board.AIMakeMove(moves[pawnID][moveID], out madeCapture);
				int curX = pawn.X;
				int curY = pawn.Y;

				state = Board.AICheckGameEnded();

				// Only recalculate the moves if the game isn't won yet
				if (state == State.Empty)
				{
					// Performance optimization: Start from the previous moves state, and only update the affected pawns
					Board.UpdateAllValidMoves(ref validMovesAI, ref validMovesPlayer, ref pawn, prevX, prevY, curX, curY);
				}
			}

			tn.CurrentPlayerTurn = (tn.CurrentPlayerTurn == PlayerTurn.PlayerOne ? PlayerTurn.PlayerTwo : PlayerTurn.PlayerOne);

			simDepth++;
		}

		tn.CurrentPlayerTurn = saveTurn;
   
		int score = 1;
		
		// Avoid fatalistic play. AI has no trouble seeing winning moves, but it assumes too much that the player won't win if AI can win too.
		// The deeper the simulation goes, the less we value the outcome.
		if (state == State.PlayerWon)
		{
			switch (tn.Depth + simDepth)
			{
				case 1:
					score = 20;
					break;
				case 2:
					score = 15;
					break;
				case 3:
					score = 10;
					break;
				case 4:
					score = 5;
					break;
				case 5:
					score = 3;
					break;
				default:
					score = 1;
					break;
			}
		}

		score *= (state == State.AIWon ? 1 : -1);

		// Maximize score player, minimize for AI during player turn
		//score *= (saveTurn == PlayerTurn.PlayerOne ? -1 : 1);

		return score;
	}

	private void UpdateStats(float value)
	{
		Visits++;

		TotValue += value;
	}
}

And here's what starts the tree search:


public void CalculateNextMove(out PawnTablut pawn, out TileTablut tile)
{
	pawn = null;
	tile = null;

	TreeNodeTablut.Board = _board;
	BoardTablutInformation boardInfo = _board.TakeBoardSnapshot();
	TreeNodeTablut tn = new TreeNodeTablut();
	tn.BoardState = boardInfo;
	tn.CurrentPlayerTurn = boardInfo.Turn;
	tn.NextPlayerTurn = boardInfo.Turn;
	tn.Depth = 0;

	for (int i = 0; i < _sampleCount; i++)
	{
		tn.SelectAction();
	}

	// Put pawns back in their previous positions
	_board.UndoMove(ref boardInfo);

	float bestScore = float.MinValue;
	Move bestMove;
	bestMove.GoalTile = null;
	bestMove.Unit = null;
	int index = 0;
	int bestIndex = -1;

	// Select best child based on win-rate
	foreach (TreeNodeTablut node in tn.Children)
	{
		float nodeScore = node.TotValue / node.Visits;

		if (nodeScore > bestScore)
		{
			bestScore = nodeScore;
			bestMove.GoalTile = node.OpeningMove.GoalTile;
			bestMove.Unit = node.OpeningMove.Unit;
			bestIndex = index;
		}
		++index;
	}
	if (bestMove.Unit != null && bestMove.GoalTile != null)
	{
		pawn = bestMove.Unit as PawnTablut;
		tile = bestMove.GoalTile as TileTablut;
	}

	tn = null;
}

I hope that helps, thanks again!

In situations where you will always lose, every move looks the same, so the choice of move is likely to appear random. You can tweak this by adding a small bias to the evaluation function, so that losing in N moves is slightly better than losing in N-1 moves.
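
A minimal sketch of that bias (made-up names, and assuming playouts normally return +1 for an AI win and -1 for an AI loss): discount a loss slightly by how long it took, with the bias capped so it can never turn a loss into a win.

using System;

static class PlayoutScore
{
	public static double BiasedResult(bool aiWon, int movesUntilEnd)
	{
		const double biasPerMove = 0.01;   // small, and capped below, so it never outweighs a real win

		if (aiWon)
			return 1.0;

		// Losing in N moves scores slightly better than losing in N-1 moves,
		// so in hopeless positions the AI puts up the longest resistance.
		return -1.0 + Math.Min(biasPerMove * movesUntilEnd, 0.5);
	}
}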

Of course, the #1 probability is that you have a bug propagating the win/loss values up the tree.

---visit my game site http://www.boardspace.net - free online strategy games


Of course, the #1 probability is that you have a bug propagating the win/loss values up the tree.

I think there's a bug, or at least something that I'm missing. :p

Does anyone have any ideas or suggestions about what the problem could be?

Thanks again!



The easiest bug to make (and to fix) is failing to adjust the value of the outcome for each node in the tree while traversing up to the root. In simple two-player games, this means negating the outcome at each node as you ascend the tree.
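
A minimal sketch of that backpropagation (a hypothetical PathNode with just the two statistics fields, not the OP's TreeNodeTablut): the playout result is stored from the perspective of the player to move at each node, which in a two-player game means flipping its sign at every ply on the way back to the root.

using System.Collections.Generic;

class PathNode
{
	public double TotalValue;
	public int Visits;
}

static class Backprop
{
	// 'value' is the playout result from the point of view of the player
	// to move at the leaf (the last node in the visited path).
	public static void Backpropagate(List<PathNode> pathRootToLeaf, double value)
	{
		for (int i = pathRootToLeaf.Count - 1; i >= 0; i--)
		{
			pathRootToLeaf[i].Visits++;
			pathRootToLeaf[i].TotalValue += value;
			value = -value;   // the parent's mover is the opponent
		}
	}
}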

---visit my game site http://www.boardspace.net - free online strategy games

This topic is closed to new replies.
