Chess Programming Part V: Advanced Search
search move position depth ply program queen capture value
In this next-to-last article, we will examine advanced search-related techniques which can speed up and/or strengthen your chess-playing program. In most cases, the concepts (if not the actual code) also apply to a variety of other 2-player games; adapting them to your particular problem, however, shall remain an exercise for the proverbial reader.
So far, all of the search algorithms we have looked at examine a position's consequences to a fixed "depth". However, this is rarely a good thing. For example, suppose that your program uses an iterative-deepening alpha-beta algorithm with maximum depth 5-ply. Now look at these cases:
- Along a certain line of play, you discover a position where one of the players is checkmated or stalemated at depth 3. Obviously, you don't want to keep searching, because the final state of the game has been resolved. Not only would searching to depth 5 be a colossal waste of effort, it may also allow the machine to finagle its way into an illegal solution!
- Now, suppose that, at depth 5, you capture a pawn. The program would be likely to score this position in a favorable light, and your program might decide that the continuation leading to it is a useful one. However, if you had looked one ply further, you might have discovered that capturing the pawn left your queen undefended. Oops.
- Finally, suppose that your queen is trapped. No matter what you do, she will be captured by the opponent at ply 4, except for one specific case where her death will happen at ply 6. If you search to depth 5, the continuations where the queen is captured at ply 4 will be examined accurately, and scored as likely disasters. However, the unique line of play where the queen is only captured at ply 6 (outside of the search tree) doesn't reveal the capture to the machine, which thinks that the queen is safe and gives it a much better score! Now, if all you have to do to push the queen capture outside of the search tree is delay the opponent with a diversion, doing so may be worth the risk: although it could damage your position, it might also cause the opponent to make a mistake and allow the queen to escape. But what if you can only delay the queen capture by sacrificing a rook? To the machine, losing a rook at ply 4 is less damaging than losing a queen, so it will merrily throw its good piece away and "hide" the too-horrible-to-mention queen capture beyond its search horizon. (During its next turn, of course, the machine will discover that the queen must now be captured at ply 4 in all cases, and that it has wasted a rook for no gain.) Hans Berliner described this "horizon effect" a long time ago, and it is the most effective justification for the "quiescence search" described in the next section.
There are two ways to assess a position's value: dynamic evaluation (i.e., look at what it may lead to) and static evaluation (i.e., see what it looks like on its own, irrespective of consequences). Dynamic evaluation is performed through search; as we have just mentioned, static evaluation is only feasible when the position is not likely to undergo an overwhelming change of balance in the near future. Such relatively stable positions are called "quiet" or "quiescent", and they are identified via "quiescence search".
The basic concept of Quiescence Search is the following: once the program has searched everything to a fixed depth (say, 6-ply), we continue each line of play selectively, by searching "non-quiescent" moves only, until we find a quiescent position, and only then apply the evaluator.
Finding a quiet position requires some knowledge about the game. For example, which moves are likely to cause a drastic change in the balance of power on the board? For chess, material balance tends to be the overwhelming consideration in the evaluator, so anything that changes material is fair game: captures (especially those of major pieces) and pawn promotions certainly qualify, while checks may also be worth a look (just in case they might lead to checkmate). In checkers, captures and promotions also seem like reasonable choices. In Othello, every single move is a capture, and "material balance" can change so much in so little time that it might be argued that there are no quiet positions at all!
My own program uses a simple quiescence search which extends all lines of play (after a full-width search to depth X) by looking exclusively at captures. Since there are usually not that many legal captures in a given position, the branching factor in the quiescence search tends to be small (4-6 on average, and quickly converging to 0 as pieces are eaten on both sides). Nevertheless, the quiescence search algorithm is called on a LOT of positions, and so it tends to swallow 50% or more of the entire processing time. Make sure that you need such a scheme in your own game before committing to it.
Only when no capture is possible does my program apply its evaluator. The result is a selectively-extended search tree which is anything but fixed-depth, and which defeats most of the nasty consequences of the "horizon effect".
The All-Important Null-Move
One of the most effective ways to speed up a chess program is to introduce the concept of a null move into the equation.
The null move consists, quite simply, of skipping a turn and letting the opponent play two moves in a row. In the overwhelming majority of positions, doing nothing is a bone-head idea: you should (almost) always be able to do *something* to improve your lot. (To be honest, there are a few "damned if I do, damned if I don't" positions where the null move would actually be your best bet, and the computer will not play them correctly, but such "zugzwang" positions are hopeless anyway, so the loss of performance is not very traumatic.)
Allowing the computer to try a null move during search has several advantages related to speed and accuracy. For example:
- Suppose that a position is so overwhelmingly in your favor that, even if you skipped your turn, the opponent couldn't respond with anything that would help. (In program terms, you would get a beta cutoff even without making a move.) Suppose further that this position is scheduled to be searched to depth N. The null move, in effect, takes out an entire ply of the search tree (you are searching only the null move instead of all your legal ones) and if your branching factor is B, searching the null move is equivalent to looking at a single depth N-1 subtree instead of B of them. With B=35 as in the typical chess middlegame, null-move search may only consume 3% of the resources required by a full depth-N examination. If the null move search reveals that you are still too strong even without playing (i.e., it creates a cutoff), you have saved 97% of your effort; if not, you must examine your own legal moves as usual, and have only wasted an extra 3%. On average, the gain is enormous.
- Now, suppose that, during quiescence search, you reach a position where your only legal capture is rook-takes-pawn, which is immediately followed by the opponent's knight-takes-rook. You'd be a lot better off not making the capture, and playing any other non-capture move, right? You can simulate this situation by inserting the null move into the quiescence search: if, in a given position during quiescence search, it is revealed that the null move is better than any capture, you can assume that continuing with captures from this position is a bad idea, and that since the best move is a quiet one, this is a position where the evaluation function itself should be applied!
Aspirated Search and MTD(f)
Plain old alphabeta assumes nothing about a position's ultimate minimax value. It looks at *everything*, no matter how preposterous. However, if you have a pretty good idea of what the value will turn out to be (for example, because you are running an iterative-deepening scheme and have received the previous iteration's results), you might be able to identify lines of play that are so out of whack with your expectations that they can't possibly be right, and cut them off pre-emptively.
For example, suppose that you have reason to believe that a position's value will be close to 0, because it is very well balanced. Now, suppose that an internal node's preliminary evaluation is at +20,000. You can cutoff with reasonable confidence.
This is the idea behind "aspiration search", a variant of alphabeta in which, instead of using +INFINITY and -INFINITY as the initial bounds of the search, you set a small window around the expected value instead. If the actual value happens to fall within the window, you win: you'll get it without error, and faster than you would otherwise (because of the many extra cutoffs). If not, the algorithm will fail, but the error will be easy to detect (because the minimax value you'll receive will be equal to one of the bounds); you'll have to waste a bit of time re-searching with a wider window. If the former case happens more often than the latter, you win on average. Obviously, the better your initial guess of the expected value, the more useful this technique is.
In the mid 1990's, researcher Aske Plaat extended aspiration search to its logical conclusion: what if you called an aspirated alphabeta with a search window of width equal to zero? It would fail all the time, of course... But it would do so *very quickly*, because it would cutoff every path almost immediately. Now, if the failure indicates that the actual value is lower than your estimate, you can try again, with another zero-width window around a smaller estimate, etc. In a sense, you could then use alphabeta to perform a binary search into the space of all possible minimax values, until you reach the only call which will *not* fail because the zero-width window will be centered on the position's actual value!
This brilliant idea, presented in a paper available on the web at http://theory.lcs.mi...plaat/mtdf.html, has been embodied in the MTD(f) search algorithm, which is all of 10 lines long. Tacked on top of an alphabeta implementation equipped with a transposition table, MTD(f) is incredibly efficient and highly parallel-friendly. It also works better with "coarse-grain" (and therefore probably simpler and faster) evaluators: it is easy to see that it takes fewer probes to zero in on the actual value in a binary search if the smallest "atom" of value is equal to, say, 0.1 pawns rather than 0.001 pawns.
There are other alphabeta variants in wider use (namely, the infamous NegaScout; I would rather teach General Relativity to orangutangs than get into that mess) but Plaat insists that MTD(f) is the most efficient algorithm in existence today and I'll take his word for it. My own program uses MTD(f); you'll be able to marvel at the algorithm's simplicity very shortly!
One last thing before we leave the topic of search: in chess, some moves are obviously better than others, and it may not be necessary to waste too much time searching for alternatives.
For example, suppose that after running your iterative algorithm to depth N-1, you discover that one of your moves is worth +9000 (i.e., a capture of the opponent's queen) and all others are below 0. If saving time is a consideration, like in tournaments, you may want to bypass the whole depth N search and only look at the best move to depth N instead: if this extra ply does not lower its evaluation much, then you assume that the other moves won't be able to catch up, and you stop searching early. (Remember: if there are 35 valid moves at each ply on average, you may have just saved 97% of your total effort!)
Deep Blue's team has pushed this idea one step further and implemented the concept of "singular extensions". If, at some point in the search, a move seems to be a lot better than all of the alternatives, it will be searched an extra ply just to make sure that there are no hidden traps there. (This is a vast oversimplification of the whole process, of course, but that's the basic idea.) Singular extensions are costly: adding an extra ply to a node roughly doubles the number of leaves in the tree, causing a commensurate increase in the number of calls to the evaluator; in other words, Deep Blue's specialized hardware can afford it, my cranky Java code can't. But it's hard to argue with the results, isn't it?
In Part VI, we wrap up the series with a discussion of evaluation functions, the code which actually tells your program whether a given board position is good or bad. This is an immense topic, and people can (and do) spend years refining their own evaluators, so we will have to content ourselves with a rather high-level discussion of the types of features which should be examined and their relative importance. If everything goes according to plan, I should also have some Java code for you to sink your teeth into at about that time, so stick around, won't you?
François Dominic Laramée, September 2000