First of all, it is important to understand the concept of state space. The state space of a problem is the tree or graph of the possible states the problem passes through during its resolution, with each action (represented by an arc) leading from one state node to another. In the case of tic-tac-toe, the action is making a move (by x or by o), and the state space has approximately 9! = 362,880 nodes, because from the initial empty state there are 9 possible states (since x plays first), under each of which there are 8 possible states (for o), then 7 more possible states (for the next move by x), and so on until a termination condition (someone's victory, or a draw with the board completely full):
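To make this branching concrete, here is a minimal Python sketch (my illustration, not code from the question) of a successor function, assuming the board is represented as a list of 9 cells holding 'x', 'o', or None:

from math import factorial

EMPTY_BOARD = [None] * 9  # 3x3 board flattened into 9 cells

def successors(board, player):
    """Generate every state reachable by `player` making one move."""
    for i, cell in enumerate(board):
        if cell is None:              # empty cell: a legal move
            child = board[:]          # copy the current state
            child[i] = player
            yield child

print(sum(1 for _ in successors(EMPTY_BOARD, 'x')))  # 9 children at the root
print(factorial(9))                                  # 362880, the 9! bound above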
Minimax is still a tree search algorithm (in AI, this is called searching the state space), but it has one fundamental characteristic: it assumes that the adversary is also an intelligent and rational entity (that is, one that always acts/plays in the best possible way, trying to maximize its own performance)¹.
¹Thus, it does not work well if the opponent plays randomly (even if only occasionally).
It does this by choosing nodes during the search so as to maximize a gain value on its own moves and minimize that same gain value on the opponent's moves (note the notation I use, in which a blue circle means a choice to maximize and a red square means a choice to minimize):
Minimizing the gain value on the opponent's moves means assuming that the opponent will also play optimally (that is, trying to decrease the computer's gain).
In the case of tic-tac-toe, it is common to value each state as +1 for states that are a victory (or lead to victory), -1 for states that are a defeat (or lead to defeat), and 0 for states that are a draw (or lead to a draw). Thus, when executed, the algorithm does the following:
- Expand the state space from the current state of the game (up to a configured depth; more details later)
- From the bottom up, propagate the state values, maximizing on its own moves and minimizing on the opponent's moves
- Until the current state is reached, at which point the optimal path is already known (see the sketch after this list)
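As an illustration, here is a minimal recursive minimax sketch in Python for tic-tac-toe, reusing successors() from the sketch above (again my illustration, using the +1/-1/0 values just described):

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'x' or 'o' if someone completed a line, else None."""
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player, computer):
    """Minimax value of `board` with `player` to move: +1 a win for the
    computer, -1 a loss, 0 a draw."""
    w = winner(board)
    if w is not None:
        return +1 if w == computer else -1
    children = list(successors(board, player))
    if not children:                   # full board and no winner: a draw
        return 0
    opponent = 'o' if player == 'x' else 'x'
    values = [minimax(child, opponent, computer) for child in children]
    # Maximize on the computer's own moves, minimize on the opponent's.
    return max(values) if player == computer else min(values)

# Perfect play by both sides makes tic-tac-toe a draw:
print(minimax([None] * 9, 'x', 'x'))   # 0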
The following animation illustrates how this process occurs. At the last level of the tree there is a blue node, which is a computer move (and so should be maximized). Below it are two possible states, one a win and the other a loss (for example), so the algorithm chooses the highest value (maximizing its own move). This procedure is carried out progressively up through the higher levels, until a red node is reached (which is an opponent's move, and so must be minimized). There, the lowest value is chosen (minimizing on the opponent's move). Eventually all the child nodes of the root node (which is the current state) will have been computed, and it only remains to choose the one that maximizes the value (after all, it is once again the computer's move).
The values used in this example work well for tic-tac-toe, but other games can use different values (because there may be more or less interesting states, and this can be measured on a scale specific to the problem, at the designer's discretion). The implementation of the algorithm is based on classical tree/graph traversal algorithms, such as depth-first search or breadth-first search.
In the case of tic-tac-toe, the tree has a maximum depth of 9 levels (at most 9 moves, in the case of a draw), so it is worth using depth-first search. In other games (such as the sliding puzzle), processing the whole tree can be computationally prohibitive. For that reason, there is a technique to avoid exploring unnecessary nodes, called Alpha-Beta Pruning. The idea of this tree pruning is to keep two variables, alpha with the best maximization value (the highest value found so far) and beta with the best minimization value (the lowest value found so far). Then, as nodes are opened, the values obtained are compared against alpha and beta to decide whether to skip exploring the remaining sibling nodes.
Consider the following animation with a fictional example:
- The depth-first search started at the first blue node (the current state), descended to the first red node at the second level (the leftmost one), and then descended to its children, opening them and computing the value of each state (in the example, the values 3, 9 and 7, respectively).
- Since the node in question (the node of interest, i.e., the parent node) is red, it is an opponent's move that will be minimized. Thus, the value 3 is chosen and carried over to the next node at the same level (the middle red node at the second level). The value of beta is set to 3.
- The depth-first search opens the first child node, obtaining the value 2, which is less than the current minimum (beta = 3). The values of the other sibling nodes at this level are not yet known (and that may mean their eventual sub-nodes have not been processed to compute them!). But it does not matter, because the resulting value (called u in the graph) will necessarily be less than or equal to 2, since the red node is minimizing (understand that the opponent will play so that the computer wins at most 2 in this branch of the tree anyway). So the remaining nodes at this level are not explored, avoiding unnecessary processing. And beta is updated to 2.
- Finally, the same is done with the last node at the second level (the red node on the right). The depth-first search again opens the first child node, which yields 14. If this were the only child node, the branch would be interesting, because it would have a higher value (better for the computer) than the minimum obtained so far (beta equal to 2). But since there are more child nodes, the next node at the same level must be opened. There the value 1 is obtained, which is less than the current minimum (beta = 2). As before, a minimizing value lower than the current one was found, so the processing of this branch can stop, since whatever the other nodes contain does not matter: the gain value obtained here will be less than or equal to 1.
- With these minimizations by the opponent in hand, the current level is processed (the first level, the current state of the game), seeking to maximize the gain (since it is the computer's move). Therefore, the value chosen is 3, which indicates that the ideal path (the one that maximizes the computer's long-term gain) goes through the left branch.
In other words, that’s the equivalent of doing:
minimax = max{ min{3, 9, 7}, min{2, ?, ?}, min{14, 1, ?, ?} }
minimax = max{ 3, x, y } with x <= 2 and y <= 1
Therefore:
minimax = max{ 3, 2, 1 }
minimax = 3
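To make the pruning concrete, here is a minimal alpha-beta sketch in Python over that same tree, written as nested lists of leaf gains (my illustration; the '?' leaves get a placeholder 0, which is fine precisely because they are never visited):

def alphabeta(node, maximizing, alpha=float('-inf'), beta=float('inf')):
    """Alpha-beta over a tree of nested lists; leaves are gain values."""
    if not isinstance(node, list):       # leaf: just return its gain
        return node
    if maximizing:
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)    # best maximization value so far
            if alpha >= beta:
                break                    # prune the remaining siblings
        return value
    else:
        value = float('inf')
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)      # best minimization value so far
            if alpha >= beta:
                break                    # prune the remaining siblings
        return value

tree = [[3, 9, 7], [2, 0, 0], [14, 1, 0, 0]]   # the example above
print(alphabeta(tree, True))                   # 3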
This example focuses on pruning nodes when minimizing, but it is the same when maximizing is in effect, only reversing the gain comparison (and using the other variable):
minimax = min{ max{2, 5, 8}, max{15, ?, ?}, max{7, 9, 22, ?} }
minimax = min{ 8, x, y } with x >= 15 and y >= 22
minimax = min{ 8, 15, 22 }
minimax = 8
Note: observe in this other example that when the node with gain 9 is opened (in the third branch, on the right), processing does not stop, because the value of alpha, which was initially 8, was changed to 15 in the second branch; because of that, the third branch needs to keep being opened until the value 22 is found.
In the case of really large state spaces², there is also the possibility of limiting the search to a configured depth (for example, the next 8 moves). In that case, the values of the nodes at the lowest level are simply estimated, without exploring their child states. In chess, for example, this estimate can be made by weighting values based on the power of the pieces (a pawn is worth 1, a pawn in the center is worth 2, a knight is worth 3, a queen is worth 9, etc.) and/or by comparing the current state with states known to be good for the computer (extracted from opening and endgame books, games with win statistics, etc.).
²Like chess (where games average 40 moves and each state has on average 35 possible continuations), whose estimated state space is on the order of 10^61 nodes (very close to the number of atoms in the universe!)
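A minimal sketch of that depth-limited variant, assuming hypothetical children(state) and evaluate(state) callbacks supplied by the game (the weights below are just the classic chess values from the paragraph above):

def minimax_limited(state, depth, maximizing, children, evaluate):
    """Depth-limited minimax: below `depth`, estimate the state with
    `evaluate` instead of expanding its child states."""
    kids = children(state)
    if depth == 0 or not kids:           # depth limit reached, or terminal state
        return evaluate(state)           # heuristic estimate, not an exact value
    values = [minimax_limited(k, depth - 1, not maximizing, children, evaluate)
              for k in kids]
    return max(values) if maximizing else min(values)

# Hypothetical chess-style heuristic: material balance from the computer's side
# (pawn 1, knight 3, bishop 3, rook 5, queen 9).
PIECE_WEIGHTS = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9}

def evaluate(state):
    """Hypothetical evaluation, assuming state = (my pieces, their pieces)."""
    mine, theirs = state
    return (sum(PIECE_WEIGHTS.get(p, 0) for p in mine)
            - sum(PIECE_WEIGHTS.get(p, 0) for p in theirs))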
Have you seen this: https://isjavado.wordpress.com/2012/06/22/jogo-da-velha-intelligents-artificial-algoritmo-minimax/ ? It seems to be exactly what you’re looking for.
– gustavox
Yes, I had already seen it; I looked at several before finally coming to ask here. In fact, I have been searching for a week and have not found a clear one. That code there, as you can imagine, is very complicated for someone with my poor programming knowledge to abstract just the MINIMAX from.
– Andiie