How to program Artificial Intelligence with Minimax


This is my first question here. I recently wrote a tic-tac-toe game ("Jogo da Velha") in Java and it is already working; now I want to add decision-making capability, i.e. artificial intelligence, to it.

I’ve found that the best way to do this is with Minimax, but I couldn’t find a good starter guide on the internet.

I would like to know how to get started with Minimax, or perhaps see a commented Minimax implementation.

Any other suggestion of an AI approach for my little game is also welcome. Thank you all! :D

  • Have you seen this: https://isjavado.wordpress.com/2012/06/22/jogo-da-velha-intelligents-artificial-algoritmo-minimax/ ? It seems to be exactly what you’re looking for.

  • Yes, I had already seen it. I looked at several before finally coming to ask here; in fact I have been searching for a week and have not found anything clear. That code, as you can imagine, is very hard for someone with my limited programming knowledge to boil down to just the Minimax part.

1 answer

First of all, it is important to understand the concept of state space. The state space of a problem is the tree or graph of the possible states of the problem along its resolution, with each action (represented by an arc) leading from one state node to another. In the case of tic-tac-toe, an action is making a move (by X or by O), and the state space has approximately 9! = 362,880 nodes: from the initial empty state there are 9 possible states (since X moves first), below each of which there are 8 possible states (for O), then 7 more possible states (for X's next move), and so on until a termination condition is reached (a win for one of the players, or a draw with the board completely filled):

(image: state-space tree of tic-tac-toe)
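
Just to make the idea of states and arcs concrete, here is a minimal sketch in Java of how the successors of a board can be enumerated. The names (Board, successors) are illustrative, not taken from the question's code:

// Minimal sketch: a tic-tac-toe state and the generation of its successor states.
// The names (Board, successors) are illustrative, not taken from the question's code.
import java.util.ArrayList;
import java.util.List;

class Board {
    char[] cells = "         ".toCharArray(); // 9 cells: 'X', 'O' or ' ' (empty)

    // Each empty cell is one arc of the state space: placing 'player' there
    // produces a new child state.
    List<Board> successors(char player) {
        List<Board> children = new ArrayList<>();
        for (int i = 0; i < 9; i++) {
            if (cells[i] == ' ') {
                Board child = new Board();
                child.cells = this.cells.clone();
                child.cells[i] = player;
                children.add(child);
            }
        }
        return children; // 9 children from the empty board, then 8, then 7, ...
    }
}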

Minimax is also a tree search algorithm (in AI this is called search in the state space), but it has one fundamental characteristic: it assumes that the adversary is also an intelligent and rational agent (that is, one that always acts/plays in the best possible way, trying to maximize its own performance)¹.

¹ Because of this, it does not work as well if the opponent plays randomly (even if only occasionally).

It does this by choosing nodes during the search so as to maximize a gain value on its own moves and to minimize that gain value on the opponent's moves (note the notation I use, in which a blue circle means a choice to maximize and a red square means a choice to minimize):

(image: tree with maximizing nodes drawn as blue circles and minimizing nodes as red squares)

Minimizing the gain value on the opponent's moves means assuming that the opponent will also play optimally (that is, trying to reduce the computer's gain).

In the case of tic-tac-toe, it is common to value each state as +1 for states that are a win (or lead to a win), -1 for states that are a loss (or lead to a loss), and 0 for draw states (or states that lead to a draw). Thus, during execution the algorithm does the following:

  1. It expands the state space from the current state of the game (up to a configured depth; more details later).
  2. From the bottom up, it propagates the state values, maximizing on its own moves and minimizing on the opponent's moves.
  3. It continues until it reaches the current state, at which point the optimal path is already known.

The following animation illustrates how this process occurs. At the last level of the tree there is a blue node, which is a computer move (and therefore should be maximized). Below it are two possible states, one a win and the other a loss (for example), so the algorithm chooses the highest value (it maximizes on its own move). This procedure is performed progressively up the levels until a red node is reached (which is an opponent's move and must be minimized). There, the lowest value is chosen (it minimizes on the opponent's move). Eventually all child nodes of the root node (the current state) will have been computed, so it just picks the one that maximizes the value (after all, it is again the computer's move).

(image: animation of values being propagated up the minimax tree)
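
To make this value propagation concrete, here is a minimal minimax sketch in Java using the +1/-1/0 values described above. The helpers hasWon and isFull are just illustrative stand-ins for whatever the question's game already uses to detect a win or a full board:

// Minimal minimax sketch for tic-tac-toe using the values +1 (win), -1 (loss) and 0 (draw).
// 'board' is a char[9] with 'X', 'O' or ' '. hasWon and isFull are illustrative helpers.
static int minimax(char[] board, char player, char computer) {
    char opponent = (computer == 'X') ? 'O' : 'X';
    if (hasWon(board, computer)) return +1;   // state that is (or leads to) a win
    if (hasWon(board, opponent)) return -1;   // state that is (or leads to) a loss
    if (isFull(board))           return  0;   // draw

    boolean maximizing = (player == computer);
    int best = maximizing ? Integer.MIN_VALUE : Integer.MAX_VALUE;
    for (int i = 0; i < 9; i++) {
        if (board[i] != ' ') continue;
        board[i] = player;                                        // make the move
        int value = minimax(board, (player == 'X') ? 'O' : 'X', computer);
        board[i] = ' ';                                           // undo the move
        best = maximizing ? Math.max(best, value) : Math.min(best, value);
    }
    return best;
}

static boolean hasWon(char[] b, char p) {
    int[][] lines = {{0,1,2},{3,4,5},{6,7,8},{0,3,6},{1,4,7},{2,5,8},{0,4,8},{2,4,6}};
    for (int[] l : lines)
        if (b[l[0]] == p && b[l[1]] == p && b[l[2]] == p) return true;
    return false;
}

static boolean isFull(char[] b) {
    for (char c : b) if (c == ' ') return false;
    return true;
}

At the root, the computer then simply tries each empty cell and keeps the move whose minimax value is the highest.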

The values used in this example are suitable for tic-tac-toe, but other games can use different values (because there may be more or less interesting states, and this can be measured on a scale particular to the problem, at the designer's choice). The implementation of the algorithm is based on classic tree/graph traversal algorithms, such as depth-first search or breadth-first search.

In the case of tic-tac-toe, the tree has a maximum depth of 9 levels (at most 9 moves, in case of a draw), so it is worth using depth-first search. In other games (such as the sliding puzzle), processing the whole tree can be computationally prohibitive. Therefore, there is a technique to avoid exploring unnecessary nodes, called Alpha-Beta Pruning. The idea of this tree pruning is to keep two variables, alpha with the best maximization value (highest value found so far) and beta with the best minimization value (lowest value found so far). Then, as nodes are expanded, the values obtained are compared with alpha and beta to decide whether the remaining sibling nodes even need to be explored.

Consider the following animation with a fictional example:

  1. The depth-first search started at the first blue node (the current state), descended to the first red node of the second level (the leftmost one), and then descended to its children, expanding them and computing the value of each state (in the example, the values 3, 9 and 7, respectively).
  2. Since the node in question (the node of interest, i.e., the parent node) is red, it is an opponent's move and will be minimized. Thus, the value 3 is chosen and carried over to the next node of the same level (the middle red node of the second level). The value of beta is set to 3.
  3. The depth-first search expands the first child node, obtaining the value 2, which is less than the current minimum (beta = 3). The value of the other sibling nodes at this level is not yet known (and they may even have sub-nodes that would need to be processed to compute those values!). But it does not matter, because the resulting value (called u in the figure) will necessarily be less than or equal to 2, since the red node minimizes (understand it as: the opponent will play so that in this branch of the tree the computer wins at most 2 anyway). So the remaining nodes of this level are not explored, avoiding unnecessary processing, and beta is updated to 2.
  4. Finally, the same is done with the last node of the second level (the red node on the right). The depth-first search again expands the first child node, which yields 14. If this were the only child node, this branch would be interesting, because it would have a value higher (and better for the computer) than the minimum obtained so far (beta equal to 2). But since there are more child nodes, the next node at the same level has to be expanded. There, the value 1 is obtained, which is less than the current minimum (beta = 2). As before, a minimizing value lower than the current one was found, so processing of this branch can be stopped: it does not matter what the other nodes contain, the gain obtainable here will be less than or equal to 1.
  5. With these minimizations of the opponent done, the current level is processed (the first level, the current state of the game), where the goal is to maximize the gain (since it is the computer's move). Therefore, the value chosen is 3, which indicates that the optimal path (the one that maximizes the computer's long-term gain) is through the left branch.

(image: animation of the alpha-beta pruning example)

In other words, this is equivalent to doing:

minimax = max{ min{3, 9, 7}, min{2, ?, ?}, min{14, 1, ?, ?} }
minimax = max{ 3, x, y } with x <= 2 and y <= 1

Therefore:

minimax = max{ 3, 2, 1 }
minimax = 3

This example shows pruning at nodes that minimize, but the same thing happens at nodes that maximize; only the comparison of the gain is reversed (and the other variable is used):

minimax = min{ max{2, 5, 8}, max{15, ?, ?}, max{7, 9, ?, ?} }
minimax = min{ 8, x, y } with x >= 15 and y >= 9
minimax = min{ 8, 15, 9 }
minimax = 8

Note: in this other example, as soon as the node with value 15 is expanded in the second branch, that branch can already be abandoned, because its maximized value will necessarily be greater than or equal to 15, which is already worse (for the minimizing choice at the root) than the 8 obtained in the first branch. The same happens in the third branch as soon as the value 9 is found.
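
As a sketch (reusing the illustrative hasWon/isFull helpers from the earlier example), the pruning amounts to carrying the two bounds through the recursion and abandoning a branch as soon as it can no longer affect the result:

// Minimax with alpha-beta pruning (same illustrative hasWon/isFull helpers as before).
static int minimax(char[] board, char player, char computer, int alpha, int beta) {
    char opponent = (computer == 'X') ? 'O' : 'X';
    if (hasWon(board, computer)) return +1;
    if (hasWon(board, opponent)) return -1;
    if (isFull(board))           return  0;

    boolean maximizing = (player == computer);
    int best = maximizing ? Integer.MIN_VALUE : Integer.MAX_VALUE;
    for (int i = 0; i < 9; i++) {
        if (board[i] != ' ') continue;
        board[i] = player;
        int value = minimax(board, (player == 'X') ? 'O' : 'X', computer, alpha, beta);
        board[i] = ' ';
        if (maximizing) {
            best  = Math.max(best, value);
            alpha = Math.max(alpha, best);   // best maximizing value found so far
        } else {
            best  = Math.min(best, value);
            beta  = Math.min(beta, best);    // best minimizing value found so far
        }
        if (beta <= alpha) break;            // the remaining siblings cannot change the result
    }
    return best;
}
// Initial call: minimax(board, computer, computer, Integer.MIN_VALUE, Integer.MAX_VALUE)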

In the case of really large state spaces², there is also the possibility of limiting the search to a configured depth (for example, the next 8 moves). In that case, the values of the nodes at the lowest level are simply estimated, without the need to explore their child states. In chess, for example, this estimate can be made based on a weighting of the pieces' values (a pawn is worth 1, a pawn in the center is worth 2, a knight is worth 3, a queen is worth 9, etc.) and/or on comparing the current state with states known to be good for the computer (extracted from books of openings and endgames, games with winning statistics, etc.).

² Like chess (where a game lasts about 40 moves on average and the average branching factor per state is about 35), whose estimated state space is on the order of 10^61 nodes (very close to the number of atoms in the universe!).
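
As an illustration of this depth-limiting idea, here is a minimal sketch in which the State interface and the evaluate function are hypothetical (in chess, evaluate could be the weighted piece count described above). When the depth budget runs out, the heuristic estimate replaces the exact expansion:

import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical game interface, just for this sketch.
interface State {
    boolean isTerminal();          // game over?
    int utility();                 // exact value of a finished game (win/loss/draw)
    List<State> successors();      // states reachable with one move
}

class DepthLimitedMinimax {
    // 'evaluate' is a hypothetical heuristic (in chess, e.g., a weighted piece count).
    static int minimax(State s, int depth, boolean maximizing, ToIntFunction<State> evaluate) {
        if (s.isTerminal()) return s.utility();            // exact value
        if (depth == 0)     return evaluate.applyAsInt(s); // depth budget exhausted: estimate

        int best = maximizing ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (State child : s.successors()) {
            int value = minimax(child, depth - 1, !maximizing, evaluate);
            best = maximizing ? Math.max(best, value) : Math.min(best, value);
        }
        return best;
    }
}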

  • It was a very useful read; it cleared some things up and improved my understanding of how the logic works. Now I am going to rack my brain until I find a way to turn this logic into code, because I can picture the process happening but still can't figure out how to implement it. Anyway, thank you very much, it was an excellent orientation :D

  • I understand this difficulty; it is natural. But I think it is important to understand the algorithm well, because that makes it easier to follow the example implementations that exist on the Internet. If you are having trouble, I suggest you start by implementing the depth-first search algorithm (without getting into Minimax itself). It will help you understand the principle of the thing. :) Good luck!
