Preface
A number of people have asked me about the algorithms I used in my chess and reversi programs.
What you see here has been put together in an attempt to answer these questions. Please feel
free to contact me if you still have questions
after reading this!
Please remember that the methods presented here serve only as an introduction; they are by no means the only ones. The alpha-beta pruning described here can be improved upon, and a number of other algorithms exist as well.
The Analysis Function
The analysis function A(p) assigns a number to the board position p.
Higher values of A(p) indicate that the position
favors blue, while lower values mean that red is winning. Hence, for a position p in which
blue has won, A(p) is ∞, while A(p) is
-∞ if red has won.
A draw position can be given the value 0.
The Game Tree
The first algorithm that comes to mind is to always make the move which yields the highest analysis
value. However, this will not work very well in practice, because as mentioned before, the analysis
is only a guess as to how good the position is. Consider for example, an analysis function for the
game of chess which uses only the material value (sum of piece values) for each player. A "greedy"
algorithm which always chooses the move that maximizes the analysis might easily capture a pawn with
its queen, ignoring the fact that the queen will be captured in the next move!
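To make this concrete, here is a minimal Python sketch of such a material-only analysis function. The piece values and the two-list board representation are illustrative assumptions only, not part of any real chess program.

```python
# Hypothetical material-only analysis: positive favors blue, negative favors red,
# matching the convention for A(p) above.
PIECE_VALUES = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def material_analysis(blue_pieces, red_pieces):
    """Sum of blue's piece values minus the sum of red's."""
    blue = sum(PIECE_VALUES[p] for p in blue_pieces)
    red = sum(PIECE_VALUES[p] for p in red_pieces)
    return blue - red
```

With only material counted, a greedy search sees queen-takes-pawn as a gain of one point, regardless of whether the queen is lost on the very next move.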
The MiniMax Algorithm
The minimax algorithm is used to find accurate values for the board positions. We assume
that each player always plays his/her best move in any given position. With this assumption, two observations
lead to the minimax algorithm: We have accurate analyses for leaves, and the value of a node can be
determined accurately from its children's values. The value for leaves is easily determined, because
in these positions the game has ended.
For a small game like Tic-tac-toe, we can build and search the entire tree, all the way down to the actual leaves. However, this is clearly impossible for a more complicated game like reversi or chess. Here is where we finally use the analysis function. Since we can't search the entire tree, we only consider the first few levels. In this manner we pretend that the nodes below a certain level are leaves, and determine their value using the analysis function.
We can implement the procedure recursively, using two functions, one for red and one for blue.
int BlueValue(Board b, int depth) {
    if (GameOver(b) or depth > MaxDepth)
        return Analysis(b)
    int max = -infinity
    for each legal move m in board b {
        copy b to c
        make move m in board c
        int x = RedValue(c, depth+1)
        if (x > max) max = x
    }
    return max
}

int RedValue(Board b, int depth) {
    if (GameOver(b) or depth > MaxDepth)
        return Analysis(b)
    int min = infinity
    for each legal move m in board b {
        copy b to c
        make move m in board c
        int x = BlueValue(c, depth+1)
        if (x < min) min = x
    }
    return min
}
Here MaxDepth is a constant defining the maximum search depth. Each of these functions is initially called with depth=0, and the value returned is the value of the node corresponding to board b. Now, in order to find the best move from any position, we just need to choose the child with the best value (minimum or maximum, depending on the color to move).
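As a sanity check, the two-function scheme above can be sketched in Python over a toy tree, where an inner node is a list of children and a leaf is just its analysis value. (This does away with the depth limit and the Analysis call; that is a simplification for illustration, not a real game.)

```python
def blue_value(node):
    """Blue is the maximizer; a leaf is a plain number."""
    if isinstance(node, (int, float)):
        return node
    return max(red_value(child) for child in node)

def red_value(node):
    """Red is the minimizer."""
    if isinstance(node, (int, float)):
        return node
    return min(blue_value(child) for child in node)

def best_blue_move(node):
    """Index of the child (move) with the highest value for blue."""
    scores = [red_value(child) for child in node]
    return scores.index(max(scores))
```

For the tree [[6, 9], [4, -2]] with blue to move, the left child is worth min(6, 9) = 6 and the right min(4, -2) = -2, so blue's value is 6 and the best move is the left one.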
The NegaMax Algorithm
NegaMax isn't really a different algorithm; it's just a more elegant implementation. The problem
with the above algorithm is that we have two different functions that are essentially doing the
exact same thing. Negamax merges these two into one, by always considering the children of the
node N, from N's point of view. That is, we multiply values in alternate rows by -1, so that
both players become maximizers. This concept is illustrated in Figure 3, which essentially depicts
the same tree as Figure 2. The resulting values are as they would be if we were to evaluate
node values using the minimax algorithm above, and then multiply blue rows by -1. Note that it is
not necessary to do this for the root of the tree.
One way to implement negamax is shown below.
 1  const int sign[2] = {1, -1}   //0 is blue, 1 is red
 2
 3  int NegaMax(Board b, int depth, int color) {
 4      if (GameOver(b) or depth > MaxDepth)
 5          return sign[color]*Analysis(b)
 6      int max = -infinity
 7      for each legal move m in board b {
 8          copy b to c
 9          make move m in board c
10          int x = -NegaMax(c, depth+1, 1-color)   //Note the "-" before "NegaMax"
11          if (x > max) max = x
12      }
13      return max
14  }

The function NegaMax returns the value of the node b, as seen by the player indicated by color. That is, the better the position is for color, the greater the value that it returns. Note that this is exactly the opposite of what we see in Figure 3. (In Figure 3, nodes were considered from their parent's point of view.) However, this is fixed in line 10, where the value of the child is reversed (multiplied by -1). In this manner, node values are processed as in Figure 3, with the exception of the root. Reversing the root, as mentioned before, is not important; in fact, it can be better if we don't reverse it. This way, the true value (the value it would have had in minimax) of a node b with color c is returned by NegaMax(b,0,c).
Note also that we multiply the analysis value of leaves by sign[color] in line 5. Details like this are easy to get wrong. A common mistake in negamax is to mix up the signs and end up with a program that always finds the worst possible move for each player. Even worse, you might end up with a program that always finds the best move for one color, but the worst move for the other.
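One good defense against such sign mistakes is to test negamax against a plain minimax on a small tree. The Python sketch below uses the same toy-tree simplification as before (inner nodes are lists, leaves are numbers analyzed from blue's point of view).

```python
def minimax(node, blue_to_move):
    """Reference implementation: returns the value from blue's point of view."""
    if isinstance(node, (int, float)):
        return node
    values = [minimax(child, not blue_to_move) for child in node]
    return max(values) if blue_to_move else min(values)

def negamax(node, sign):
    """sign is +1 when blue moves, -1 when red; value is from the mover's view."""
    if isinstance(node, (int, float)):
        return sign * node  # the sign[color] multiplication on the leaves
    return max(-negamax(child, -sign) for child in node)
```

On the tree [[6, 9], [4, -2]], negamax(tree, +1) must equal minimax(tree, True), and negamax(tree, -1) must equal -minimax(tree, False); if either comparison fails, the signs are mixed up somewhere.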
Alpha-Beta Pruning and Branching
So now we have a search algorithm. How fast do you think it will be? How many levels can you
search in a reasonable time? Most games easily allow more than 10-20 possible
moves from any position, so the number of nodes we have to search grows exponentially
with the number of levels. In a game like chess, where a position can sometimes have
over 50 or even 100 legal moves, the problem is even more drastic. Using what we've
covered so far, we can only search 3-4 levels within a reasonable time.
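The arithmetic is easy to check. With a branching factor b and search depth d, a full search visits on the order of b^d nodes; the little Python helper below (an illustration, not part of the search itself) counts them exactly for a uniform tree.

```python
def nodes_searched(branching, depth):
    """Total nodes in a uniform tree: 1 + b + b^2 + ... + b^depth."""
    return sum(branching ** k for k in range(depth + 1))
```

At b = 20, searching four levels already means 168,421 nodes, and every extra level multiplies the work by roughly another factor of 20.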
Let's follow our algorithm on the tree of Figure 4, assuming that nodes are processed from left to right. While trying to find the value of A, we go down to B and determine its value to be 6, the minimum of 6 and 9. Then control is passed to C. From there we compute E to be 4, the maximum of 4 and -2. Since C is a minimizer, we now know that the value of C is at most 4. Therefore, C will never be chosen by A as the maximum: A already has the child B with value 6, which is greater than any value C could end up with. Hence, searching F and G is useless, and we can immediately stop processing (prune) C. Control then passes to D.
The same thing can happen if A is a minimizer. In general, we can define lower and upper bounds for the value of a node, called alpha and beta, respectively. One of these bounds (depending on the player's color) does not change within the node, and if the node's value crosses this bound, we can prune the node. The other bound, however, changes as the node's value is updated.
This can be implemented using minimax as below.
int BlueValue(Board b, int depth, int alpha, int beta) {
    if (GameOver(b) or depth > MaxDepth)
        return Analysis(b)
    int max = -infinity
    for each legal move m in board b {
        copy b to c
        make move m in board c
        int x = RedValue(c, depth+1, alpha, beta)
        if (x > max) max = x
        if (x > alpha) alpha = x
        if (alpha >= beta) return alpha
    }
    return max
}

int RedValue(Board b, int depth, int alpha, int beta) {
    if (GameOver(b) or depth > MaxDepth)
        return Analysis(b)
    int min = infinity
    for each legal move m in board b {
        copy b to c
        make move m in board c
        int x = BlueValue(c, depth+1, alpha, beta)
        if (x < min) min = x
        if (x < beta) beta = x
        if (alpha >= beta) return beta
    }
    return min
}
Note that we can write alpha>=beta instead of alpha>beta, which slightly increases our program's speed. These functions are initially called with depth=0, alpha=-∞, and beta=∞.
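To see the cutoff from the walkthrough above in action, here is a Python sketch of alpha-beta over the same kind of toy tree (inner nodes as lists, leaves as numbers; the leaf counter is added only to make the pruning visible).

```python
INF = float("inf")

def alphabeta(node, blue_to_move, alpha, beta, stats):
    """Minimax with alpha-beta cutoffs; stats[0] counts leaf evaluations."""
    if isinstance(node, (int, float)):
        stats[0] += 1
        return node
    if blue_to_move:
        best = -INF
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta, stats))
            if best > alpha:
                alpha = best
            if alpha >= beta:
                break  # cutoff: red will never allow this line
        return best
    else:
        best = INF
        for child in node:
            best = min(best, alphabeta(child, True, alpha, beta, stats))
            if best < beta:
                beta = best
            if alpha >= beta:
                break  # cutoff: blue already has a better alternative
        return best
```

On the tree [[6, 9], [[4, -2], [7], [8]]] (B = min(6, 9), then E = max(4, -2) under C, with F and G as C's remaining children), the root value is 6 and only four leaves are ever evaluated; F and G are pruned exactly as described.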
The negamax version can be written as below.
const int sign[2] = {1, -1}   //0 is blue, 1 is red

int NegaMax(Board b, int depth, int color, int alpha, int beta) {
    if (GameOver(b) or depth > MaxDepth)
        return sign[color]*Analysis(b)
    int max = -infinity
    for each legal move m in board b {
        copy b to c
        make move m in board c
        int x = -NegaMax(c, depth+1, 1-color, -beta, -alpha)
        if (x > max) max = x
        if (x > alpha) alpha = x
        if (alpha >= beta) return alpha
    }
    return max
}

Note that alpha and beta are replaced with -beta and -alpha, respectively, when NegaMax calls itself.
Alpha-Beta pruning, used properly, will usually allow us to search at least 2-3 extra levels. However, pruning alone is not enough! Let's look back for a moment at Figure 4. Why was C pruned? C was pruned because B, which was searched before C, had a value better than any value C could possibly have. We knew that C would not be chosen, because we had already found a better move (B) at A. Thus, in order to maximize the number of nodes that get pruned, we try to check the moves that "look better" first. That is, we process the children of a node in the order of how good we expect each move to be. This is called branching. Here, we use another, cheaper heuristic to estimate how good each move is.
The best way to find out which method to use is to experiment! See which works best for your program. Once we have the move values, we can sort them in decreasing order and process them in that order. However, a heap may prove more efficient than a full sort: when pruning takes place we leave the function early, so the remaining moves never need to be sorted at all.
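A minimal Python sketch of such ordering: quick_score here is a hypothetical cheap estimate of a move (for instance, the value of a captured piece), an assumption for illustration rather than a fixed part of the algorithm.

```python
def ordered_moves(board, legal_moves, quick_score):
    """Return the moves best-first, so that likely cutoffs happen early.

    quick_score(board, move) should be much cheaper than a full search;
    higher scores mean more promising moves for the side to move.
    """
    return sorted(legal_moves, key=lambda m: quick_score(board, m), reverse=True)
```

The search loop then iterates over ordered_moves(b, moves, quick_score) instead of the raw move list.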
Another method sometimes used to increase speed is to divide the number returned by the analysis function by a constant (for example, by 10). This reduces the range of values that the nodes in the game tree can take, and can therefore result in more pruning. However, it also makes the analysis less accurate, so it should be used with care.
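A one-line Python sketch of the idea (the divisor of 10 is just the example value from above):

```python
def scaled_analysis(raw_value, divisor=10):
    """Coarsen the analysis so that nearby raw scores become equal.

    int(raw / divisor) truncates toward zero, keeping the scale symmetric
    for positive and negative values.
    """
    return int(raw_value / divisor)
```

Raw scores of 63 and 68 both become 6; once sibling values tie like this, the >= test in the pruning condition can trigger a cutoff where > would not.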
The Horizon Effect
Remember why we decided to use "look-ahead" with game trees in the first place? The problem
was that we couldn't detect dangerous situations: we might play a move whose bad consequences
the analysis function couldn't immediately see. So why shouldn't the same problem still exist
when we're looking several moves ahead? After all, we're still using the analysis function to
guess the value of most nodes.
References
This introduction has been written using my personal experience, bits of information
I've picked up from here and there, and algorithms I learned from the
following sources.