USING ADAPTIVE LEARNING ALGORITHMS TO MAKE COMPLEX STRATEGICAL DECISIONS

ASHANTHI MEENA SERALATHAN

Haverford College, Computer Science Department
Advisors: Dr. Deepak Kumar, Dr. Douglas Blank, Dr. Stephen Lindell

April 25, 2011 (Spring 2011)

Contents

1. Introduction
1.1. Chess
1.2. Deep Thought and Deep Blue
1.3. Minimax Search
1.4. α-β Pruning
1.5. The Horizon Effect and Quiescence Search
1.6. Verified Null-Move Pruning
1.7. Neural Networks
1.8. Evolutionary Processes and Genetic Algorithms
2. Previous Work
2.1. Temporal Difference Learning and TDLEAF(λ)
2.2. Explanation-Based Neural Networks
2.3. Population Dynamics
3. Experimental Design
3.1. Content of Chromosomes
3.2. Crossing-Over
3.3. Chess Engine
3.4. Results
3.5. Analysis
3.6. Future Work
3.7. Conclusion
References
Appendix A: Player Chromosomes
Appendix B: Code

Abstract. Traditionally, artificial intelligence (AI) algorithms have not been built on particularly adaptive principles. Systems were created using complex collections of rules crafted specifically for the purpose at hand, and their flexibility was wholly dependent on whatever flexibility the programmer incorporated into the rules. This thesis therefore examines several different algorithms for decision-making, particularly for playing chess. It surveys a number of techniques for creating a chess-playing system, and then begins an altered implementation of the genetic-algorithm-inspired Population Dynamics approach, which trains a system to rank board states in a game of chess using a chromosome that includes more genes than the original algorithm. While still a work in progress, the process of creating the system has already demonstrated some advantages over other algorithms for learning chess evaluation functions (such as the flexibility of the algorithm), and further work could yield interesting insight into whether a chess system built on a modified version of Population Dynamics can reach a skill level comparable to other chess systems, or even to human players.

1. Introduction

This thesis focuses on a number of different strategies for making complex decisions: Deep Blue, an example of a chess-playing system that used no adaptive learning strategies, as well as three popular algorithms for learning evaluation functions for chess. All three learning algorithms rely on a minimax search (sped up with various methods of pruning the move space at a given turn) that returns the best move found. Each develops its evaluation function in a distinctly different way: two (Explanation-Based Neural Networks and TDLEAF(λ)) use neural networks in some manner, while one (Population Dynamics) uses genetic-algorithm techniques. Finally, this thesis begins to analyze the effects of altering the Population Dynamics algorithm, both by slightly changing how it searches the space of chess boards and by adding more parameters to the player's chromosome, so that the system determines even more about the factors that are important to playing chess. It must be noted that these algorithms all have uses outside of chess, and could be slightly altered to fit a number of different applications, such as natural language processing and generic decision-making systems.

1.1. Chess. Chess is an interesting game to choose for studying learning algorithms (due to its complexity), as it has yet to be solved. Winning is not simply a matter of having the hardware to quickly find the moves for a perfect strategy: some amount of strategy is required for a system to play on par with intelligent beings.

Figure 1. A typical chess board layout.

In chess there are six different types of pieces, each with a different set of rules for how it can move (see Figure 1). Some pieces, such as kings, can move only one space at a time, while others, such as queens, can move several spaces in several directions. If a piece can move to a space occupied by an opponent piece, it may capture that piece. One piece, the king, is not allowed to be captured, and a player whose king is threatened (called a "check") is required to find a way to get its king out of harm's way. The object of the game is thus to capture the king, which requires trapping it so that no matter how the opposing player moves, it cannot save its king from capture (called a "checkmate"). Along with this, various stalemate criteria may come into play, including a player who is not in check being unable to move without putting itself in check. In addition, there are special moves that certain pieces can make only under specific circumstances, such as castling. The system therefore has to keep track of each piece, where that piece can move, and whether a piece's movement will put its own king in check, and then (as there will often be dozens of possible moves for each player) it must make strategic decisions about which moves would be most advantageous at that point in the game. How the computer chooses these moves is the topic of this thesis.

1.2. Deep Thought and Deep Blue. The first truly successful chess-playing artificial intelligence (AI) systems were Deep Thought and Deep Blue (created at Carnegie Mellon University in 1988 and at IBM in 1996, respectively) [1]. Both relied heavily on optimized system hardware, with the intent of performing very fast searches, and on carefully researched evaluation functions to rank boards. Deep Blue in particular was a sophisticated parallel system built on 30 IBM RS/6000 SP processors containing a total of 480 chess chips dedicated to speeding up the search for possible moves (each chip could examine roughly two million possible positions per second). One of the 30 processors served as the master, which started searches through the move space and pushed deeper searches onto the other processors; these would then use their chess chips to perform the last few levels of the search before evaluating individual boards and sending their findings back up to the master. Deep Blue also relied on pre-computed, heavily researched evaluation functions built on over 8000 features specific to chess games (such as how safe the king was, where specific pieces were located, and so on). By combining hardware specifically designed for optimized chess playing with a carefully constructed evaluation and highly optimized minimax searches through the possible move space, Deep Blue played at a level competitive with human Grandmasters, and in 1997 it even beat one (Garry Kasparov). It averaged 126 million positions scanned per second (ranging from 100 million to 200 million positions per second, depending on the point in the game).
While certainly a breakthrough in the field of AI, it must be noted that although it beat a human player, Deep Blue was given nearly every piece of information it could ever need to play chess against a human, along with the fastest, most sophisticated hardware that could be mustered at the time, and was left to compute its moves automatically from all the hard-coded information it was handed. In consequence, the Deep Blue approach as a whole was rather inflexible in its application to other real-world problems: each rule was hand-crafted by Deep Blue's programmers solely to improve the chess-playing process, and would not easily transfer to any other problem.

1.3. Minimax Search. Minimax search relies on an efficient algorithm for making sure the player's move is optimized even if the opponent moves in the manner most beneficial to itself [2]. It is applicable to any two-player game with a finite number of possible moves in which good choices by an opponent directly detract from the player's ability to win, and it is thus commonly used in chess to determine whether a certain move is best regardless of how good the opponent's next few moves are. The algorithm begins by finding all the player's possible moves and expanding the search tree at each node (in a manner similar to iterative deepening, or a bounded breadth-first search). When a move is expanded, the possible moves of the opponent are found, and each is expanded in the same fashion. The algorithm alternates between expanding possible player moves and possible opponent moves until the search reaches a bound, at which point the moves generated are all scored by an evaluation function (generally provided to the system). Bounds typically allow for at least one search ply (a ply being the expansion of player moves followed by the expansion of opponent moves; in other words, two levels in the search tree). Once each move has been scored, it is compared to all the other moves at that level of the search tree, and the best-scoring move is passed back up the tree, where its score is compared to the best score at the next-highest level in similar fashion. By passing these values up, the system gets a total score for each path (representing how effective the player's move along that part of the tree may be). The player's moves are scored positively and the opponent's moves negatively; therefore, if the opponent's move is better than the last move the player made, the rating for that path of the search tree will be negative, and vice versa. At the end, each of the player's next moves will have been scored, and the best-scoring move is chosen (see Figure 2).

Figure 2. Pseudocode for the Minimax algorithm.

This algorithm is an effective way of determining how good a move will be, and for how long a good move may remain a good choice in the future.
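The idea in Figure 2 can be condensed into a short Python sketch (Python being the language of the engine described in Section 3.3). The children and evaluate callables here are hypothetical stand-ins for the engine's move generator and evaluation function, not code from the thesis:

```python
def minimax(state, depth, maximizing, children, evaluate):
    """Depth-bounded minimax over a generic two-player game.

    children(state, maximizing) yields successor states for the side to move;
    evaluate(state) scores a board from the player's point of view.
    """
    successors = list(children(state, maximizing))
    if depth == 0 or not successors:      # search bound reached, or terminal state
        return evaluate(state)
    if maximizing:                        # player's turn: take the best score for us
        return max(minimax(s, depth - 1, False, children, evaluate)
                   for s in successors)
    return min(minimax(s, depth - 1, True, children, evaluate)   # opponent's turn
               for s in successors)
```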

However, it requires looking at every possible move of both players over the span of multiple turns, which can easily amount to an enormous number of moves to evaluate, depending on the game and on how the game has progressed. In consequence, other algorithms are used along with the minimax algorithm to reduce the number of moves that must actually be evaluated.

1.4. α-β Pruning. α-β pruning is a method of reducing the number of moves the system must evaluate, by determining whether the net score of a particular search path could ever be better than that of other paths around it. It works roughly the same as a normal minimax search, except that it incorporates two new variables, α and β: α tracks the best score the player is already guaranteed, while β tracks the best score the opponent is already guaranteed. As the scores of each move are returned up the search tree, the system also updates α (when the moves being scored are the player's) or β (when they are the opponent's). If α is ever not smaller than β while the system is scoring moves, the remaining moves at that node can no longer affect the final choice: the opponent already has a better option elsewhere in the tree, so the likelihood that this branch will ultimately return a useful score is low enough that the algorithm can simply cut it and its subtree from the search entirely. A symmetric comparison is made when the system is scoring the opponent's moves (see Figure 3).

Figure 3. Pseudocode for the α-β pruning algorithm.

Using this method, the system can cut a large number of moves out of the search, and therefore will not waste time calculating scores for bad moves. More specifically, α-β pruning can reduce the average branching factor of the search to about $\sqrt[4]{b^3}$, and with perfect move ordering may reduce it to $\sqrt{b}$ (where b is the number of children of a particular node) [2].
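A minimal sketch of the same search with α-β cutoffs added, again using hypothetical children and evaluate callables rather than the thesis's own code:

```python
def alphabeta(state, depth, alpha, beta, maximizing, children, evaluate):
    """Minimax with alpha-beta cutoffs. alpha is the best score already
    guaranteed to the player, beta the best already guaranteed to the opponent."""
    successors = list(children(state, maximizing))
    if depth == 0 or not successors:
        return evaluate(state)
    if maximizing:
        value = float("-inf")
        for s in successors:
            value = max(value, alphabeta(s, depth - 1, alpha, beta,
                                         False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:        # opponent already has a better option: prune
                break
        return value
    value = float("inf")
    for s in successors:
        value = min(value, alphabeta(s, depth - 1, alpha, beta,
                                     True, children, evaluate))
        beta = min(beta, value)
        if alpha >= beta:            # player already has a better option: prune
            break
    return value
```

A call such as alphabeta(board, 4, float("-inf"), float("inf"), True, children, evaluate) scores the same move the plain minimax would choose, while skipping branches that cannot change the answer.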

The main concerns with this algorithm are that, even with pruning, a game like chess still presents a very large number of moves to evaluate; chess is estimated to have on the order of $10^{43}$ game states, suggesting that even with an optimal reduction the program may have to scan hundreds of thousands of game states in order to make a move [2]. In addition, the algorithm can sometimes cut branches that would actually have led to worthwhile moves (often because the search was not carried far enough to make an informed decision about when to cut the branch). These limitations are typically not enough to abandon the algorithm, however, and the concerns can be mitigated with the addition of other algorithms to the search process, such as quiescence search and null-move pruning.

1.5. The Horizon Effect and Quiescence Search. In games such as chess, in which the nature of a game can change in a matter of moves, it can be crucial to know the outcome of a heated part of the game if the system hopes to choose the best move in such a situation. This is described as the horizon effect: the system is only able to see so far into the future, and therefore cannot understand the full effects of its choices beyond its search depth. As stated before, a normal α-β-pruned minimax search uses a strict search bound that does not take the nature of the game into account, and thus it is very easy for the system to miss crucial information that would be revealed one move later (i.e., over the horizon). This is where quiescence is effective: using quiescence techniques, the system extends the search on a particular branch if, by the time it hits the normal bound, the game has not "stabilized" [3]. A stable game can mean anything from neither player being in check to no particular pieces being in jeopardy. If the game is found to be unstable for any reason, the search is extended a little further, and this process repeats until the search reaches a stable game state. In this way the system is less likely to miss important consequences of its moves, as it continues the search at exactly those points in the game where making the right move is crucial.
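A sketch of how the bound test changes under quiescence. The is_stable predicate is a hypothetical stand-in for whatever stability criteria the engine uses (e.g., neither king in check), and a hard floor keeps an unstable line from extending forever:

```python
def quiescent_search(state, depth, maximizing, children, evaluate, is_stable,
                     max_extension=8):
    """Minimax whose depth bound is only honoured on 'quiet' positions.

    Instead of scoring any position at depth 0, the search keeps extending
    (up to max_extension extra plies) until is_stable(state) holds.
    """
    successors = list(children(state, maximizing))
    if not successors:
        return evaluate(state)
    if depth <= 0 and (is_stable(state) or depth <= -max_extension):
        return evaluate(state)            # quiet position, or hard floor reached
    pick = max if maximizing else min
    return pick(quiescent_search(s, depth - 1, not maximizing, children,
                                 evaluate, is_stable, max_extension)
                for s in successors)
```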
1.6. Verified Null-Move Pruning. To shrink the number of game states examined even further, an algorithm called null-move pruning may be implemented [4]. Null-move pruning is based on the assumption that at any point in the game, it should be more beneficial to make a move than to do nothing (i.e., to make a null move). The algorithm begins with the player testing a null move: it pretends the player did not make a move, and continues a reduced-depth search (typically with depth reduction factor R = 2) down this new branch of the search tree. This results in a score reflecting the null move, which can be compared to the score of the game before the null move was made. If the score of the null move turns out to be higher (i.e., better) than the score prior to the null move, the system ignores the rest of this game branch, under the assumption that since it was better to do nothing than to make a move, the branch is undesirable. Moves that fall into this category are called "fail-high" moves. While this algorithm works well in the middle game, when it is rare that not making a move will be beneficial, it can be detrimental once the game moves closer to the end, as fewer pieces on the board can mean less benefit in moving them. The horizon effect can also come into play when the algorithm does not search far enough along the null-move branch to realize that the null move is not as beneficial in the far future as it seems in the near future. Thus the algorithm of verified null-move pruning was developed; it allows further reduction of the null-move search while alleviating the effects of the horizon effect and of the nature of chess endgames. A sketch of the standard null-move test follows.
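This sketch is written in negamax style (scores are always from the side to move, so the null-move result is negated). The pass_turn and search_score callables are illustrative stand-ins for engine internals:

```python
R = 2  # depth reduction for the standard null-move search (the verified variant uses R = 3)

def fails_high(state, depth, beta, search_score, pass_turn):
    """Standard null-move test: hand the move to the opponent without moving.

    pass_turn(state) returns the same position with the other side to move;
    search_score(state, depth) runs a reduced-depth search and returns a score
    from the side to move's point of view. If doing nothing already reaches
    beta, the branch 'fails high' and can be pruned.
    """
    if depth - 1 - R < 0:
        return False                       # too shallow to test reliably
    null_score = -search_score(pass_turn(state), depth - 1 - R)
    return null_score >= beta              # doing nothing already beats beta
```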

Verified null-move pruning does not cut a branch immediately after finding a fail-high move; rather, it uses the result of a standard null-move search as a sort of flag that informs the rest of the normal minimax search. It first performs a null-move search with R = 3 and determines whether the result indicates that a node is a fail-high; if so, then rather than cutting off the node, the depth of the overall search is reduced by a ply, and standard null-move pruning is performed on that node's subtree (i.e., any node found to be a fail-high after this point will be cut; if the algorithm finds no need to cut any nodes, the depth of the search is increased by a ply again, and the search continues as though the node had never been flagged as a fail-high). This way, a node's branch is only cut off if it consistently shows signs of leading toward a detrimental move, rather than at the first indication that the node may be a poor choice. This also allows for a shorter null-move search (R = 3 rather than R = 2), which more than makes up for the extra verification that occurs in the algorithm (see Figure 4).

Figure 4. Pseudocode for the Verified Null-Move Pruning algorithm.

1.7. Neural Networks. Many machine learning algorithms employ some form of neural network. Based on a model of how synapses in the human brain function, a neural network is essentially a fully connected, weighted, directed graph with three levels ("layers") that each have their own function within the network [5]. The first layer, known as the input layer, receives the information given to the network and sends it to the second layer, called the hidden layer. As the information moves from the input layer to the hidden layer, it is altered by the weights on each edge from the input nodes to the hidden nodes. Since every input node is connected to every hidden node and each edge has its own weight, each hidden node receives a different transformation of the same data (in other words, each hidden node possesses a different perspective on the data).

The data is then manipulated by functions within the hidden nodes, and passed from the hidden nodes to the output nodes (after being altered again by the set of weights between the hidden and output layers). Each output node is, before the network runs, mapped to a specific value or answer to the question the network is attempting to solve (such as which move to perform given a board). In the case of chess, if the network is set up to take in a board and output what the board would look like after a chosen move, the output layer could represent every possible move that could be generated from the input board (with the number of nodes in the layer equalling the number of possible moves). The system would know which move to choose based on which node yielded the best value; typically, the best value is deemed to be the largest. This means that once a board is given to the network, a value is sent to each output node, and the node with the highest value represents the move that the system should make (see Figure 5 for a diagram of a neural network).

Figure 5. A neural network is a complex weighted graph that can be used to retain information about data propagated through it.

Neural networks are typically initialized with a random set of weights (as the best weights to use generally change depending on the problem being solved). Because of this, the network needs to be taught how to solve the problem correctly, and must know how to change its understanding of the data (i.e., change its weights) accordingly so that it may correctly answer the problem. There are several different ways to train a network and update its weights; typically, a value is added to each weight that is proportional to how incorrect the network's answer was in comparison to what the trainer specifies as the correct answer (a procedure known as error backpropagation).
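The structure just described fits in a few lines. The sketch below initializes random weights, runs a forward pass through one hidden layer, and applies a simplified error-proportional update to the output weights only; full backpropagation would also push the error back through the hidden weights:

```python
import math
import random

def make_layer(n_in, n_out):
    """Random initial weights for one fully connected layer."""
    return [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]

def forward(x, w_hidden, w_out):
    """One pass through input -> hidden -> output; returns (outputs, hidden)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    out = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    return out, hidden

def train_step(x, target, w_hidden, w_out, lr=0.1):
    """Simplified error-correction step: nudge each output weight in
    proportion to how wrong that output node was."""
    out, hidden = forward(x, w_hidden, w_out)
    for j, row in enumerate(w_out):
        error = target[j] - out[j]
        for k in range(len(row)):
            row[k] += lr * error * hidden[k]
```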

1.8. Evolutionary Processes and Genetic Algorithms. Genetic algorithms are another subset of algorithms often used for machine learning, and are based on the biological phenomenon of evolution [6]. In the algorithm, a population is generated in which each member serves as a random possible answer to the question the system is trying to answer. Members have the ability to reproduce asexually, and may be removed from the population over the course of the population's lifetime, to be replaced in future generations by the fitter members of the population. Sometimes this fitness value is a probability that the member makes it to the next generation, related to the strength of the member's genes; other times it depends on criteria the member must fulfill in order to live past its current generation, such as successfully performing a procedure based on the contents of its personal genetic code. Each member has its own "gene," typically a collection of values which, when coupled with an equation or other method of evaluation, produces an answer that may or may not answer the question accurately. Answers are evaluated in some manner and, as in Darwinian evolution, the best answers are allowed to persist in the population while the worst answers are eventually removed (see Figure 6). Members of the population eventually create progeny, which are either clones of their parents or have their "genes" altered in ways analogous to biological genetics (i.e., mutations and cross-overs). Such alterations are random and typically have a specified probability of occurring within the population. As long as the alterations are performed gradually, the population will eventually converge so that all the members give a similar answer to the question; and as long as the alterations are random, there will be enough variety in the population that the system makes a fairly informed decision about what the answer should be.

Figure 6. After being evaluated, the genes that best solved the problem are allowed to persist and reproduce, while the worst genes are given less of a chance of persisting in the population, and are less able to reproduce.
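A bare-bones sketch of that loop, assuming chromosomes are lists of numbers and that fitness and mutate are supplied by the problem at hand (these are the only problem-specific pieces, as the next paragraph notes):

```python
import random

def evolve(population, fitness, mutate, generations=100):
    """Generational GA sketch: the fitter half of the population persists;
    the rest are replaced with mutated copies of randomly chosen survivors.

    population: list of chromosomes (lists of numbers);
    fitness(member) -> float; mutate(member) -> new, perturbed chromosome.
    """
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[:len(population) // 2]
        replacements = [mutate(random.choice(survivors))
                        for _ in range(len(population) - len(survivors))]
        population = survivors + replacements
    return max(population, key=fitness)
```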

Genetic algorithms are viewed as more flexible than many algorithms because the system does not need significant knowledge of the domain of a problem in order to attempt to solve it. The only portion of the algorithm that relies on any context from the question is the portion that assesses the fitness of a particular member, so the programmer does not have to integrate knowledge of the specific problem into every portion of the system. As such, a system trained on genetic principles may be used to solve multiple unrelated problems simply by altering the method of testing a member's fitness to suit the problem at hand, rather than having to adjust the internal mechanisms of the algorithm to suit the nature of the new problem.

2. Previous Work

2.1. Temporal Difference Learning and TDLEAF(λ). Temporal difference learning focuses on training a system to make informed decisions about how to react to something by being able to predict the future cost of its decisions [7]. The algorithm requires a set of end states that can be evaluated in some way (in a game like chess, the states reached by a set of moves), and sets each state to have a probability value p(x_t, x_{t+1}, a) corresponding to the likelihood that the state will occur (for example, the likelihood that a move will occur in a chess game). Using this, the system can assign a reward value to each end state and use these values to make the best decision. The system also scores itself depending on whether its predictions were successful (1 when successful, 0 when a neutral outcome is reached, and -1 when unsuccessful), and can use this value to adjust the reward values for each end state.

The TDLEAF(λ) algorithm adapts this strategy to a chess-playing system that uses a minimax algorithm to evaluate moves. Rather than simply being given a set of states, the algorithm takes the leaves from a minimax search and assigns reward values to each, and then scores other moves based on their relationship to these leaves. In this sense, the reward values are propagated up the minimax tree, so that the system gets a better understanding of how a possible move will turn out, based on how its predicted reward values have turned out. The resulting program is markedly slower than Deep Blue (it plays roughly 6,000 times slower), yet it is still able to play at a highly sophisticated level. After setting most of its evaluation-function parameters to 0 (leaving only piece rankings in the equation) and allowing the system to play human and computer chess players, it played at a B-grade human level (around 1650); once the TDLEAF(λ) part of the system was enabled, the chess system's rating rose about 500 points (to 2150) in its first 300 games, and it became even better afterward. And because the system learns all the parameters of its evaluation function, it can be left to develop an optimal evaluation function over time, rather than requiring extensive research on how to set each parameter (as was necessary with Deep Blue). The approach has also been extremely successful with games such as Backgammon.
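A tabular sketch of the underlying TD(λ) update over one finished game may help. TDLEAF(λ) applies the same idea to the leaves of a minimax search and to the parameters of an evaluation function rather than to a simple table, so this is a simplified stand-in:

```python
from collections import defaultdict

def td_lambda_episode(values, episode, outcome, alpha=0.1, lam=0.7):
    """One-game TD(lambda) update of a table of state values.

    values: defaultdict(float) mapping state -> predicted reward;
    episode: the sequence of positions seen during the game;
    outcome: +1 (win), 0 (draw), or -1 (loss), matching the scores above.
    """
    trace = defaultdict(float)            # eligibility: how much credit each
    for t in range(len(episode)):         # visited state gets for later errors
        last = (t == len(episode) - 1)
        target = outcome if last else values[episode[t + 1]]
        delta = target - values[episode[t]]       # temporal-difference error
        trace[episode[t]] += 1.0
        for s in trace:
            values[s] += alpha * delta * trace[s]
            trace[s] *= lam                       # older states decay by lambda
```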

2.2. Explanation-Based Neural Networks. The explanation-based model for learning chess was motivated by the desire to reduce the amount of time needed to train a network that is going to play chess [8]. Explanation-based methods rely on "domain knowledge" of the problem being tackled in order to form generalized opinions as to how to carry out the task.

Figure 7. Components of the Explanation-based algorithm.

In the case of chess, this model is utilized with neural networks: one network, called the chess model M, keeps track of this domain-specific knowledge by training itself on large databases of grandmaster chess games, learning how to take in a board and predict what will follow. Specifically, it receives a board s_t (the board at the start of the player's turn) and maps it to s_{t+2} (the board at the start of the player's next turn). The system then uses M to affect how it learns an evaluation function V (represented by a separate neural network, trained using the TD algorithm), such that

V_target(s_t) ≈ γ V(s_{t+2}),    ∂V_target(s_t)/∂s_t ≈ γ (∂V(s_{t+2})/∂s_{t+2}) (∂M(s_t)/∂s_t),

where V_target(s_t) is the value the evaluation function is trained toward at board s_t, γ is a discount factor that favors early accuracy over late accuracy (i.e., a stronger starting game rather than a strong endgame), V(s_{t+2}) is the value the evaluation function predicts will be valid at the player's next turn, and M(s_t) is based on the information M has about board s_t, drawn from the chess games it was trained on. This algorithm saves time because V is the network that will be used in the chess-playing system, and with this approach it does not have to spend time training on games first-hand every time a player must be trained (instead, a researcher trains M once, and uses it to train V repeatedly as necessary). See Figure 7 for a diagram of this structure.

This algorithm was tested against GNU Chess (a popular and well-developed open-source chess application), and won approximately 13% of its games. The largest problems with the system were its extremely poor opening strategy and its inability to play out short endgame strategies; however, the system was able to generalize how to protect important pieces and how to make intelligent trades.
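The training target can be read directly off the first equation above. In this sketch, V and M are callables standing in for the two networks, and the board encoding is left abstract; in the full method the slopes (gradients) of M are also transferred to V, which is what lets M's domain knowledge generalize from few first-hand games:

```python
def ebnn_target(V, M, s_t, gamma=0.9):
    """Training target for the evaluation network V at board s_t.

    M(s_t) is the chess model's prediction of the board two half-moves ahead;
    the target is that board's discounted predicted value.
    """
    s_next = M(s_t)                # predicted board at the player's next turn
    return gamma * V(s_next)       # V is then nudged toward this value
```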

2.3. Population Dynamics. One of the only chess-playing learning algorithms that does not rely on a neural network framework, this approach uses an algorithm based on evolutionary principles to calculate what the system's evaluation function should be [9]. It uses a minimax search with α-β pruning and quiescence. The minimax search evaluates boards based on the following equation:

score(board) = \sum_{y=0}^{5} W[y] \left( N[y]_{\text{white}} - N[y]_{\text{black}} \right)

where N is a vector containing the number of pawns, knights, bishops, rooks, queens, and legal moves for the current board, and W holds the weights of all these parameters. A small constant "bonus" is granted in cases where the player controls the center, or when pawns have the potential to be promoted.

The algorithm then randomly generates a population P consisting of chess-playing systems that each have their own randomized weight vector (or chromosome), v_i = {weight_knight, weight_bishop, weight_rook, weight_queen, weight_legalmove, weight_pawn, weight_king}. weight_pawn and weight_king are always fixed (the pawn's weight to 1, the king's to an effectively unbeatable value) and, being fixed, do not need to be added to the chromosome or used in the calculations.

Each member plays against another member of the population twice (once as Black and once as White). Opponents are selected based on the relative strength of the two systems: each player is stored in a vector V, and a variable c iterates through the vector. Member V[c] is pitted against a randomly selected player from the remainder of the vector (V[c+1] through V[µ]), and then c is incremented. When the vector has been iterated through, a generation has passed, and the vector is reversed, which guarantees that the best player of that generation is placed at the beginning of the vector. This ordering holds because the later players will have played more games than their predecessors, and thus will have improved more rapidly; it also ensures that the best player persists in the population longer than other players, as it will not be challenged often enough to allow for many unlucky losses.

The winner of the two games is allowed to stay in the population, while the losing member is expelled from P and a mutated version of the winner is put in its place. If there is a draw, both members are left in P and mutated. The mutated members of the population have their chromosomes changed by first calculating the standard deviation of each parameter,

\sigma_{(y)} = \sqrt{ \frac{ \sum_{n=1}^{\mu} \left( V_{n(y)} - \text{average}_{(y)} \right)^2 }{ \mu - 1 } }, \quad \forall y \in v

(where y is a particular evaluation parameter (i.e., gene), V_{n(y)} is that parameter for player V_n, average_{(y)} is the average of that parameter across the entire population, and µ is the size of the population), and then using this value in a final mutation equation:

V_{(y)} = V_{(y)} + \left( \text{RND}(0 \ldots 1) - 0.5 \right) \cdot R \cdot \sigma_{(y)}, \quad \forall y \in v

where R is a fixed value determined by how well the player performed:

R = \begin{cases} 2 & \text{if the player won both games} \\ 1 & \text{if the player lost and is becoming a mutated version of the winner} \\ 0.2 & \text{if the player won one match and tied the other} \\ 0.5 & \text{if both players tied both times} \end{cases}

In this sense, the mutation of the genes reflects the trends in the population, and isn't purely random, which helps keep the values close enough that they can converge at some point. This also allows the values to mutate differently depending on how well the player has played recently.
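The two equations above translate almost directly into Python; note that statistics.stdev already uses the sample (µ - 1) denominator. This is a minimal sketch, assuming chromosomes are stored as equal-length lists of floats:

```python
import random
import statistics

def mutate(population, index, R):
    """Kendall-and-Whitwell-style mutation of one chromosome in place.

    Each gene is shifted by a random amount scaled by R and by that gene's
    sample standard deviation across the whole population, so mutation
    follows population trends rather than being purely random.
    """
    for y in range(len(population[index])):
        sigma = statistics.stdev(member[y] for member in population)
        population[index][y] += (random.random() - 0.5) * R * sigma
```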

Figure 8. In the Population Dynamics algorithm, each member has an array of parameters that it uses to determine its next move; the best set of parameters leads to wins, which in turn leads to the member remaining in the population.

In Kendall and Whitwell's experiment [9], µ = 50, and the genes were randomly initialized within a fixed range. One member of the population was seeded with values known to be optimal (knight = 3, bishop = 3, rook = 5, queen = 9, legal moves = 1), both to speed up learning and to prove that the best member of the population would not be lost during the learning process. Their experiment yielded the following results:

Table 1. Kendall and Whitwell's Best Players (best-player and population-average values for the Knight, Bishop, Rook, Queen, and Legal Moves genes).

3. Experimental Design

For the purposes of this experiment, the Population Dynamics algorithm was implemented and tested. Due to the inherently flexible nature of genetic algorithms in general, it was deemed the simplest to alter, relative to many of the other algorithms, which would either require extensive prior knowledge of how best to analyze each and every board, or would require even more training time and calculation to properly train the chess-playing system.

3.1. Content of Chromosomes. Kendall and Whitwell's algorithm [9] uses piece rankings and the number of legal moves as genes in its chromosomes, and incorporates control of the center and pawn promotion as constants in the search algorithm itself; the effects of removing those constant bonuses and adding them as genes of each member of the population, along with genes for other, more abstract strategies or types of moves (such as castling), were not examined in their experiment. Considering that the algorithm as it stands does not allow the system to truly figure out how important the fixed parameters are (in comparison to, say, how important it is to keep bishops in play or to maintain a significant number of legal moves), it would seem that an alteration of this sort could only improve the algorithm: instead of relying on the programmer's intuition to set these values so that the system neither ignores the parameters too often nor gives them too much priority, the system itself can not only come up with its own values for these parameters, but do so in a manner that naturally relates them to the other factors the system attends to while playing. Adding to the number of genes in each chromosome should not cause a problem either, as genetic algorithms in general are designed to manipulate large amounts of data at once (a typical algorithm may alter hundreds of weights simultaneously), so it is doubtful that the addition of these two factors, along with a few abstract moves, would hinder the system's ability to converge to a final set of genes. Thus, for the purposes of this experiment,

v_i = {weight_knight, weight_bishop, weight_rook, weight_queen, weight_legalmove, weight_controlofcenter, weight_promotion, weight_castling}.

3.2. Crossing-Over. In Kendall and Whitwell's experiment [9], crossing-over is not incorporated in the training process at all. In order to examine the effects of crossing over on such an algorithm (such as whether it speeds up the rate of convergence, or whether it creates a fitter population), another population is trained in which chromosomes have a fixed probability of 0.1 of performing a crossover during the mutation process. Crossovers occur at only one point in the chromosome, and involve a swap of genetic material at a random point on the chromosome, as sketched below.
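A sketch of that single-point crossover, assuming chromosomes are lists of gene weights:

```python
import random

def maybe_crossover(a, b, p=0.1):
    """With probability p, pick a random cut point and swap the genetic
    material past it between chromosomes a and b (in place)."""
    if random.random() < p:
        cut = random.randrange(1, len(a))     # cut strictly inside the chromosome
        a[cut:], b[cut:] = b[cut:], a[cut:]
    return a, b
```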
3.3. Chess Engine. The chess game engine is a modified version of a ChessBoard program [10], which handles finding valid moves and checking the state of the game, but does not come with a built-in artificial intelligence mechanism for choosing moves. The system is written in the Python programming language. It is extended with the population dynamics system (which handles training the population, keeping track of player chromosomes, and performing mutations), as well as a minimax search algorithm that uses α-β pruning. For the sake of speeding up training while testing the strength of the algorithm, the search depth was reduced to 1-ply, and quiescence was disabled. The evaluation function is preserved from the original algorithm, with the exception that for control of center, promotion, and castling, the weights in the chromosomes are controlled by boolean values (i.e., v_6 = weight_controlofcenter * 1 if the player has control of the center; if not, v_6 = weight_controlofcenter * 0, and the weight is ignored). The way these values are calculated differs from the way the original genes are used simply because these genes are concerned only with whether the player still has the possibility of performing the move, and not with immediately quantifiable values, such as how many moves the player has.
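A sketch of that gated evaluation. The argument names are illustrative, not from the thesis code: the first five genes weight the white-minus-black differences in piece and mobility counts, and the last three are gated by 0/1 flags indicating whether the feature currently holds for the player:

```python
def evaluate(counts_white, counts_black, flags, v):
    """Score a board with the 8-gene chromosome v described above.

    counts_white/counts_black: (knights, bishops, rooks, queens, legal moves);
    flags: (control_of_center, promotion_possible, castling_possible) as 0/1.
    """
    material = sum(w * (cw - cb)
                   for w, cw, cb in zip(v[:5], counts_white, counts_black))
    positional = sum(w * flag for w, flag in zip(v[5:], flags))  # gated genes
    return material + positional
```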

3.4. Results. In Kendall and Whitwell's experiment [9], the initial standard deviation of the chromosomes in the population was 3.8; because three new genes are incorporated in each chromosome here, the initial standard deviation for these players begins higher, at around 10 (10.01 for the population without crossing over, and 9.84 for the population with crossing over).

A player with 51 members in its population was trained for 13 generations, by which point it had converged to a small average gene deviation.

Table 2. Player A (best-player and population-average values, rounded to two decimal places, for the Knight, Bishop, Rook, Queen, Legal Moves, Control of Center, Promotion, and Castling genes).

These genes indicate that this player values having many Queens first and foremost; it is then concerned with the number of Bishops, castling, promotion, the number of Rooks, control of the center, the number of legal moves, and the number of Knights, in that order. The average player has roughly the same goals in mind.

A player with 51 members in its population and crossing-over enabled was also trained; the addition of crossing over may have made it more difficult for the system to converge as quickly, as even after twice as many generations had passed, the system had not converged (the genes' average deviation remained high). By the end of the 26th generation the population had the following characteristics:

Table 3. Player B (best-player and population-average values for the Knight, Bishop, Rook, Queen, Legal Moves, Control of Center, Promotion, and Castling genes).

These genes indicate that this player values having many Knights more than anything else; it is then concerned with control of the center, promotion potential, the number of Rooks, castling potential, the number of legal moves, the number of Queens, and finally the number of Bishops, in that order. The average player mostly agrees with this hierarchy (the average chain is Knights, control of center, promotion, legal moves, castling, Rooks, Queens, and Bishops; Rooks and legal moves swap precedence).

In both of these models, winning members were not mutated (only the replacement chromosomes were mutated, at a rate of 0.5). See Appendix A for tables of all the genes of both populations.

For the purpose of testing their ability to play chess (regardless of whether or not their genes were optimal), these two players were both pitted against random players (i.e., players who choose moves at random), using the same two-game format used in the training. The results are as follows:

Table 4. Player A v. Random Player

                 Result                      Moves Required
Game 1 (White)   Lost                        140
Game 2 (Black)   Draw (3-Repetition Rule)    180

Table 5. Player B v. Random Player

                 Result                      Moves Required
Game 1 (White)   Lost                        232
Game 2 (Black)   Draw (3-Repetition Rule)    189

These results remain consistent if the players are allowed a rematch; they consistently lose their first game and then tie the second. These players also played against opponents pre-fitted with the piece-rank genes used in the seed for Kendall and Whitwell's experiment ([3, 5, 5, 9, 1, 8, 1, 1]):

Table 6. Player A v. Seeded Player

                 Result                      Moves Required
Game 1 (White)   Draw (3-Repetition Rule)    12
Game 2 (Black)   Draw (3-Repetition Rule)    47

Table 7. Player B v. Seeded Player

                 Result                      Moves Required
Game 1 (White)   Draw (3-Repetition Rule)    12
Game 2 (Black)   Draw (3-Repetition Rule)    66

For each player, the results are identical if the player and the seed play again. A similar set of games was played between the seeded player and a random player:

Table 8. Seeded Player v. Random Player

                 Result                      Moves Required
Game 1 (White)   Draw (3-Repetition Rule)    110
Game 2 (Black)   Lost

3.5. Analysis. As can be seen from the data, the results of the system's evolution differ greatly from the results of Kendall and Whitwell. Many of the genes for piece ranking are inflated to a significant degree in comparison to the "ideal" values. It is currently unclear whether this is the result of the added genes having an effect on the values in general, or whether the system as it stands is evolving incorrectly. It is possible, for example, that because the piece-ranking values are now dependent not only on each other but also on the new factors, the values in general deviate from what one would expect with only the five piece rankings in play.

It is also possible that the system, as currently written, is simply changing too quickly, causing abnormally high (or low) values for many of the genes. And while the results of the played games would indicate that the non-optimal values indeed lead to a non-optimal level of gameplay (as the players could not win against a random player), the fact that the seeded player also could not beat the random player suggests that the weights may not be the only cause of weak play. There may currently be a bug in the search procedure that does not allow a player to truly choose its best move, causing even the seeded player to have difficulty beating the random player. While it could be the case that the 1-ply search is proving to be a detriment to the system, as the low search bound makes it more difficult for the system to make an informed decision about future moves, one would assume that a weakly informed decision would still fare, on average, better than a completely uninformed decision, which is not the case with these players (they consistently lose as White and draw as Black). It also seems odd that the trained players lose when playing White (the color that moves first, and is therefore commonly expected to have a slight advantage) and draw when playing Black. More work would have to be done to determine how weak the system's opening move is, as well as why a weak opening would cause poor gameplay when the player moves first and a slightly better result when the player moves second.

3.6. Future Work. In order to verify the reason for the inflated genes, more training sessions would have to be run, and the average best player of each system compared, to confirm that the inflated values in these preliminary training sessions reflect the system's behavior in general. To determine whether the new genes are the source of the weight inflation, it would be useful to go back and train a system that does not take the new parameters into account, and compare its results to the ideal set of weights. If they are similar to the ideal values, then it is possible that the inflated values are close to normal, given the relationship of the new parameters to piece rankings; if they are equally inflated, then it could be concluded that the system itself is not evolving properly. Altering how the system's members mutate (lowering the rate of mutation or the magnitude of the mutations, for example) would likely be the best way to slow down the amount of change in the population and keep the values closer to the ideal ones.

The addition of more parameters to the chromosomes would also make for a more interesting system, which would hopefully yield a successful style of play. Adding parameters that take more complex strategies, such as knight-forking, into account would test the strength of the system when faced with abstract concepts and decisions (as opposed to the relatively straightforward strategies and parameters incorporated in the current chromosomes). Finally, in order to truly test the strength of the system's chess abilities, it would need to play against a wide variety of different players (human or computer). The system should also be compared to various other systems, including systems that incorporate neural networks or other structures in order to train their players.
Through these tests, one could determine the advantages and disadvantages this approach holds over other techniques, and could determine whether this approach is (or is not) the best approach to creating systems that can solve complex problems.

3.7. Conclusion. While there are not yet results indicating that genetic algorithms create better predictors for tasks such as chess than other learning techniques, the flexibility of the algorithm already appears evident, suggesting that the technique is worth further study. The results thus far indicate that it may be the underlying playing algorithm, rather than the training algorithm, that is causing non-optimal results; if this is the case, and the learning algorithm itself does not need to be tweaked to create a more optimal chess-playing population, this would reaffirm the notion that genetic algorithms (and in particular, the Population Dynamics model) can be applied to a wide variety of problems with minimal need to hand-tailor them to the task at hand. This flexibility differs widely from other techniques for playing chess, which either require the computer scientist to personally tweak every chess parameter until it is optimal, or require using another sort of learning system, such as a neural network, that is more computationally costly (as it solves a large system of equations at once [5], rather than one equation dependent on one set of weights [9]) and that requires careful observation on the part of the computer scientist to ensure it does not overtrain. Additionally, Kendall and Whitwell's results [9] suggest that a system trained with genetic algorithms can be trained to play chess at a sophisticated level, indicating that the learning algorithm itself is capable of producing useful results for chess, and that it is still worthwhile to study.

References

[1] Campbell, M., Hoane Jr., A.J., and Hsu, F.-h. 2002. Deep Blue. Artificial Intelligence, 134(1-2): 57-83.
[2] Nilsson, Nils J. 1998. Artificial Intelligence: A New Synthesis. Morgan Kaufmann Publishers, San Francisco.
[3]
[4] Tabibi, O.D. and Netanyahu, N.S. 2002. Verified Null-Move Pruning. ICGA Journal, 25(3).
[5] Cheng, B. and Titterington, D.M. 1994. Neural Networks: A Review from a Statistical Perspective. Statistical Science, 9(1).
[6] Koza, John R. 1992. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA.
[7] Tesauro, G. 1995. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3): 58-68.
[8] Thrun, S. 1995. Learning to play the game of chess. In G. Tesauro, D. Touretzky, and T. Leen (eds.), Advances in Neural Information Processing Systems 7. MIT Press, Cambridge, MA.
[9] Kendall, G. and Whitwell, G. 2001. An evolutionary approach for the tuning of a chess evaluation function using population dynamics. In Proceedings of the 2001 Congress on Evolutionary Computation (CEC-2001), IEEE Press.
[10]

5. Appendix A: Player Chromosomes

Table 9. Population A (per-player values for the Knight, Bishop, Rook, Queen, Legal Moves, Control of Center, Promotion, and Castling genes, for each of the 51 players).


Decision making, Markov decision processes Decision making, Markov decision processes Solved tasks Collected by: Jiří Kléma, klema@fel.cvut.cz Spring 2017 The main goal: The text presents solved tasks to support labs in the A4B33ZUI course. 1 Simple

More information

Learning Tetris. 1 Tetris. February 3, 2009

Learning Tetris. 1 Tetris. February 3, 2009 Learning Tetris Matt Zucker Andrew Maas February 3, 2009 1 Tetris The Tetris game has been used as a benchmark for Machine Learning tasks because its large state space (over 2 200 cell configurations are

More information

Introduction to Arti Intelligence

Introduction to Arti Intelligence Introduction to Arti Intelligence cial Lecture 4: Constraint satisfaction problems 1 / 48 Constraint satisfaction problems: Today Exploiting the representation of a state to accelerate search. Backtracking.

More information

Lecture 23: Reinforcement Learning

Lecture 23: Reinforcement Learning Lecture 23: Reinforcement Learning MDPs revisited Model-based learning Monte Carlo value function estimation Temporal-difference (TD) learning Exploration November 23, 2006 1 COMP-424 Lecture 23 Recall:

More information

Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D

Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D 7.91 Lecture #5 Database Searching & Molecular Phylogenetics Michael Yaffe B C D B C D (((,B)C)D) Outline Distance Matrix Methods Neighbor-Joining Method and Related Neighbor Methods Maximum Likelihood

More information

Selecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden

Selecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden 1 Selecting Efficient Correlated Equilibria Through Distributed Learning Jason R. Marden Abstract A learning rule is completely uncoupled if each player s behavior is conditioned only on his own realized

More information

Introduction to Fall 2009 Artificial Intelligence Final Exam

Introduction to Fall 2009 Artificial Intelligence Final Exam CS 188 Introduction to Fall 2009 Artificial Intelligence Final Exam INSTRUCTIONS You have 3 hours. The exam is closed book, closed notes except a two-page crib sheet. Please use non-programmable calculators

More information

Designing Information Devices and Systems I Spring 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way

Designing Information Devices and Systems I Spring 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way EECS 16A Designing Information Devices and Systems I Spring 018 Lecture Notes Note 1 1.1 Introduction to Linear Algebra the EECS Way In this note, we will teach the basics of linear algebra and relate

More information

Sequential Decision Problems

Sequential Decision Problems Sequential Decision Problems Michael A. Goodrich November 10, 2006 If I make changes to these notes after they are posted and if these changes are important (beyond cosmetic), the changes will highlighted

More information

Approximate Optimal-Value Functions. Satinder P. Singh Richard C. Yee. University of Massachusetts.

Approximate Optimal-Value Functions. Satinder P. Singh Richard C. Yee. University of Massachusetts. An Upper Bound on the oss from Approximate Optimal-Value Functions Satinder P. Singh Richard C. Yee Department of Computer Science University of Massachusetts Amherst, MA 01003 singh@cs.umass.edu, yee@cs.umass.edu

More information

In Advances in Neural Information Processing Systems 6. J. D. Cowan, G. Tesauro and. Convergence of Indirect Adaptive. Andrew G.

In Advances in Neural Information Processing Systems 6. J. D. Cowan, G. Tesauro and. Convergence of Indirect Adaptive. Andrew G. In Advances in Neural Information Processing Systems 6. J. D. Cowan, G. Tesauro and J. Alspector, (Eds.). Morgan Kaufmann Publishers, San Fancisco, CA. 1994. Convergence of Indirect Adaptive Asynchronous

More information

CS188: Artificial Intelligence, Fall 2009 Written 2: MDPs, RL, and Probability

CS188: Artificial Intelligence, Fall 2009 Written 2: MDPs, RL, and Probability CS188: Artificial Intelligence, Fall 2009 Written 2: MDPs, RL, and Probability Due: Thursday 10/15 in 283 Soda Drop Box by 11:59pm (no slip days) Policy: Can be solved in groups (acknowledge collaborators)

More information

Scout, NegaScout and Proof-Number Search

Scout, NegaScout and Proof-Number Search Scout, NegaScout and Proof-Number Search Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Introduction It looks like alpha-beta pruning is the best we can do for a generic searching

More information

Sexual Reproduction *

Sexual Reproduction * OpenStax-CNX module: m45465 1 Sexual Reproduction * OpenStax This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 3.0 Abstract By the end of this section, you

More information

Today s Outline. Recap: MDPs. Bellman Equations. Q-Value Iteration. Bellman Backup 5/7/2012. CSE 473: Artificial Intelligence Reinforcement Learning

Today s Outline. Recap: MDPs. Bellman Equations. Q-Value Iteration. Bellman Backup 5/7/2012. CSE 473: Artificial Intelligence Reinforcement Learning CSE 473: Artificial Intelligence Reinforcement Learning Dan Weld Today s Outline Reinforcement Learning Q-value iteration Q-learning Exploration / exploitation Linear function approximation Many slides

More information

Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm

Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Michail G. Lagoudakis Department of Computer Science Duke University Durham, NC 2778 mgl@cs.duke.edu

More information

Forming Neural Networks through Ecient and Adaptive. Coevolution. Abstract

Forming Neural Networks through Ecient and Adaptive. Coevolution. Abstract Evolutionary Computation, 5(4), 1998. (In Press). Forming Neural Networks through Ecient and Adaptive Coevolution David E. Moriarty Information Sciences Institute University of Southern California 4676

More information

Computing the acceptability semantics. London SW7 2BZ, UK, Nicosia P.O. Box 537, Cyprus,

Computing the acceptability semantics. London SW7 2BZ, UK, Nicosia P.O. Box 537, Cyprus, Computing the acceptability semantics Francesca Toni 1 and Antonios C. Kakas 2 1 Department of Computing, Imperial College, 180 Queen's Gate, London SW7 2BZ, UK, ft@doc.ic.ac.uk 2 Department of Computer

More information

Introduction. Spatial Multi-Agent Systems. The Need for a Theory

Introduction. Spatial Multi-Agent Systems. The Need for a Theory Introduction Spatial Multi-Agent Systems A spatial multi-agent system is a decentralized system composed of numerous identically programmed agents that either form or are embedded in a geometric space.

More information

Artificial Intelligence Methods (G5BAIM) - Examination

Artificial Intelligence Methods (G5BAIM) - Examination Question 1 a) According to John Koza there are five stages when planning to solve a problem using a genetic program. What are they? Give a short description of each. (b) How could you cope with division

More information

Reinforcement Learning: the basics

Reinforcement Learning: the basics Reinforcement Learning: the basics Olivier Sigaud Université Pierre et Marie Curie, PARIS 6 http://people.isir.upmc.fr/sigaud August 6, 2012 1 / 46 Introduction Action selection/planning Learning by trial-and-error

More information

Non-Convex Optimization. CS6787 Lecture 7 Fall 2017

Non-Convex Optimization. CS6787 Lecture 7 Fall 2017 Non-Convex Optimization CS6787 Lecture 7 Fall 2017 First some words about grading I sent out a bunch of grades on the course management system Everyone should have all their grades in Not including paper

More information

CS221 Practice Midterm

CS221 Practice Midterm CS221 Practice Midterm Autumn 2012 1 ther Midterms The following pages are excerpts from similar classes midterms. The content is similar to what we ve been covering this quarter, so that it should be

More information

NEUROEVOLUTION. Contents. Evolutionary Computation. Neuroevolution. Types of neuro-evolution algorithms

NEUROEVOLUTION. Contents. Evolutionary Computation. Neuroevolution. Types of neuro-evolution algorithms Contents Evolutionary Computation overview NEUROEVOLUTION Presenter: Vlad Chiriacescu Neuroevolution overview Issues in standard Evolutionary Computation NEAT method Complexification in competitive coevolution

More information

CS 188 Introduction to Fall 2007 Artificial Intelligence Midterm

CS 188 Introduction to Fall 2007 Artificial Intelligence Midterm NAME: SID#: Login: Sec: 1 CS 188 Introduction to Fall 2007 Artificial Intelligence Midterm You have 80 minutes. The exam is closed book, closed notes except a one-page crib sheet, basic calculators only.

More information

AND AND AND AND NOT NOT

AND AND AND AND NOT NOT Genetic Programming Using a Minimum Description Length Principle Hitoshi IBA 1 Hugo de GARIS 2 Taisuke SATO 1 1)Machine Inference Section, Electrotechnical Laboratory 1-1-4 Umezono, Tsukuba-city, Ibaraki,

More information

ECLT 5810 Classification Neural Networks. Reference: Data Mining: Concepts and Techniques By J. Hand, M. Kamber, and J. Pei, Morgan Kaufmann

ECLT 5810 Classification Neural Networks. Reference: Data Mining: Concepts and Techniques By J. Hand, M. Kamber, and J. Pei, Morgan Kaufmann ECLT 5810 Classification Neural Networks Reference: Data Mining: Concepts and Techniques By J. Hand, M. Kamber, and J. Pei, Morgan Kaufmann Neural Networks A neural network is a set of connected input/output

More information

Local Search & Optimization

Local Search & Optimization Local Search & Optimization CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2017 Soleymani Artificial Intelligence: A Modern Approach, 3 rd Edition, Chapter 4 Outline

More information

Please bring the task to your first physics lesson and hand it to the teacher.

Please bring the task to your first physics lesson and hand it to the teacher. Pre-enrolment task for 2014 entry Physics Why do I need to complete a pre-enrolment task? This bridging pack serves a number of purposes. It gives you practice in some of the important skills you will

More information

Examiners Report/ Principal Examiner Feedback. Summer GCE Core Mathematics C3 (6665) Paper 01

Examiners Report/ Principal Examiner Feedback. Summer GCE Core Mathematics C3 (6665) Paper 01 Examiners Report/ Principal Examiner Feedback Summer 2013 GCE Core Mathematics C3 (6665) Paper 01 Edexcel and BTEC Qualifications Edexcel and BTEC qualifications come from Pearson, the UK s largest awarding

More information

Designing Information Devices and Systems I Fall 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way

Designing Information Devices and Systems I Fall 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way EECS 16A Designing Information Devices and Systems I Fall 018 Lecture Notes Note 1 1.1 Introduction to Linear Algebra the EECS Way In this note, we will teach the basics of linear algebra and relate it

More information

PRIME GENERATING LUCAS SEQUENCES

PRIME GENERATING LUCAS SEQUENCES PRIME GENERATING LUCAS SEQUENCES PAUL LIU & RON ESTRIN Science One Program The University of British Columbia Vancouver, Canada April 011 1 PRIME GENERATING LUCAS SEQUENCES Abstract. The distribution of

More information

Lecture 2. Judging the Performance of Classifiers. Nitin R. Patel

Lecture 2. Judging the Performance of Classifiers. Nitin R. Patel Lecture 2 Judging the Performance of Classifiers Nitin R. Patel 1 In this note we will examine the question of how to udge the usefulness of a classifier and how to compare different classifiers. Not only

More information

Planning With Information States: A Survey Term Project for cs397sml Spring 2002

Planning With Information States: A Survey Term Project for cs397sml Spring 2002 Planning With Information States: A Survey Term Project for cs397sml Spring 2002 Jason O Kane jokane@uiuc.edu April 18, 2003 1 Introduction Classical planning generally depends on the assumption that the

More information

Extracting Provably Correct Rules from Artificial Neural Networks

Extracting Provably Correct Rules from Artificial Neural Networks Extracting Provably Correct Rules from Artificial Neural Networks Sebastian B. Thrun University of Bonn Dept. of Computer Science III Römerstr. 64, D-53 Bonn, Germany E-mail: thrun@cs.uni-bonn.de thrun@cmu.edu

More information

Chapter 1 Introduction

Chapter 1 Introduction Chapter 1 Introduction 1.1 Introduction to Chapter This chapter starts by describing the problems addressed by the project. The aims and objectives of the research are outlined and novel ideas discovered

More information

Discrete Tranformation of Output in Cellular Automata

Discrete Tranformation of Output in Cellular Automata Discrete Tranformation of Output in Cellular Automata Aleksander Lunøe Waage Master of Science in Computer Science Submission date: July 2012 Supervisor: Gunnar Tufte, IDI Norwegian University of Science

More information

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling 10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel

More information

Scout and NegaScout. Tsan-sheng Hsu.

Scout and NegaScout. Tsan-sheng Hsu. Scout and NegaScout Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Abstract It looks like alpha-beta pruning is the best we can do for an exact generic searching procedure.

More information

Adversarial Search & Logic and Reasoning

Adversarial Search & Logic and Reasoning CSEP 573 Adversarial Search & Logic and Reasoning CSE AI Faculty Recall from Last Time: Adversarial Games as Search Convention: first player is called MAX, 2nd player is called MIN MAX moves first and

More information

Looking Under the EA Hood with Price s Equation

Looking Under the EA Hood with Price s Equation Looking Under the EA Hood with Price s Equation Jeffrey K. Bassett 1, Mitchell A. Potter 2, and Kenneth A. De Jong 1 1 George Mason University, Fairfax, VA 22030 {jbassett, kdejong}@cs.gmu.edu 2 Naval

More information

Hierarchical Evolution of Neural Networks. The University of Texas at Austin. Austin, TX Technical Report AI January 1996.

Hierarchical Evolution of Neural Networks. The University of Texas at Austin. Austin, TX Technical Report AI January 1996. Hierarchical Evolution of Neural Networks David E. Moriarty and Risto Miikkulainen Department of Computer Sciences The University of Texas at Austin Austin, TX 78712 moriarty,risto@cs.utexas.edu Technical

More information

Using NEAT to Stabilize an Inverted Pendulum

Using NEAT to Stabilize an Inverted Pendulum Using NEAT to Stabilize an Inverted Pendulum Awjin Ahn and Caleb Cochrane May 9, 2014 Abstract The inverted pendulum balancing problem is a classic benchmark problem on which many types of control implementations

More information

The exam is closed book, closed calculator, and closed notes except your one-page crib sheet.

The exam is closed book, closed calculator, and closed notes except your one-page crib sheet. CS 188 Spring 2017 Introduction to Artificial Intelligence Midterm V2 You have approximately 80 minutes. The exam is closed book, closed calculator, and closed notes except your one-page crib sheet. Mark

More information

Reading Selection: How do species change over time?

Reading Selection: How do species change over time? Reading Selection: How do species change over time? 1. Who first studied how species change over time? Jean-Baptiste Lamarck (1744-1829) and Charles Darwin (1809-1882) both had ideas about how life on

More information

How do species change over time?

How do species change over time? Who first studied how species change over time? How do species change over time? Jean-Baptiste Lamarck (1744-1829) and Charles Darwin (1809-1882) both had ideas about how life on earth changed over time.

More information

On the errors introduced by the naive Bayes independence assumption

On the errors introduced by the naive Bayes independence assumption On the errors introduced by the naive Bayes independence assumption Author Matthijs de Wachter 3671100 Utrecht University Master Thesis Artificial Intelligence Supervisor Dr. Silja Renooij Department of

More information

Artificial Intelligence (AI) Common AI Methods. Training. Signals to Perceptrons. Artificial Neural Networks (ANN) Artificial Intelligence

Artificial Intelligence (AI) Common AI Methods. Training. Signals to Perceptrons. Artificial Neural Networks (ANN) Artificial Intelligence Artificial Intelligence (AI) Artificial Intelligence AI is an attempt to reproduce intelligent reasoning using machines * * H. M. Cartwright, Applications of Artificial Intelligence in Chemistry, 1993,

More information

A fast algorithm to generate necklaces with xed content

A fast algorithm to generate necklaces with xed content Theoretical Computer Science 301 (003) 477 489 www.elsevier.com/locate/tcs Note A fast algorithm to generate necklaces with xed content Joe Sawada 1 Department of Computer Science, University of Toronto,

More information

Heuristic Search Algorithms

Heuristic Search Algorithms CHAPTER 4 Heuristic Search Algorithms 59 4.1 HEURISTIC SEARCH AND SSP MDPS The methods we explored in the previous chapter have a serious practical drawback the amount of memory they require is proportional

More information

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 Star Joins A common structure for data mining of commercial data is the star join. For example, a chain store like Walmart keeps a fact table whose tuples each

More information

Lecture 6: Backpropagation

Lecture 6: Backpropagation Lecture 6: Backpropagation Roger Grosse 1 Introduction So far, we ve seen how to train shallow models, where the predictions are computed as a linear function of the inputs. We ve also observed that deeper

More information

A Simple Model for Sequences of Relational State Descriptions

A Simple Model for Sequences of Relational State Descriptions A Simple Model for Sequences of Relational State Descriptions Ingo Thon, Niels Landwehr, and Luc De Raedt Department of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, 3001 Heverlee,

More information

Evolutionary Design I

Evolutionary Design I Evolutionary Design I Jason Noble jasonn@comp.leeds.ac.uk Biosystems group, School of Computing Evolutionary Design I p.1/29 This lecture Harnessing evolution in a computer program How to construct a genetic

More information

T o p 5 r o u t e s t o M E D I C I N E w i t h a d e g r e e i n B I O M E D I C A L S C I E N C E S

T o p 5 r o u t e s t o M E D I C I N E w i t h a d e g r e e i n B I O M E D I C A L S C I E N C E S T o p 5 r o u t e s t o M E D I C I N E w i t h a d e g r e e i n B I O M E D I C A L S C I E N C E S B R O U G H T T O Y O U B Y E U R O P A M E D I C I N E ( V E R Y ) Q U I C K I N T R O My name's Habib

More information

ARTIFICIAL NEURAL NETWORK PART I HANIEH BORHANAZAD

ARTIFICIAL NEURAL NETWORK PART I HANIEH BORHANAZAD ARTIFICIAL NEURAL NETWORK PART I HANIEH BORHANAZAD WHAT IS A NEURAL NETWORK? The simplest definition of a neural network, more properly referred to as an 'artificial' neural network (ANN), is provided

More information

Algebra Exam. Solutions and Grading Guide

Algebra Exam. Solutions and Grading Guide Algebra Exam Solutions and Grading Guide You should use this grading guide to carefully grade your own exam, trying to be as objective as possible about what score the TAs would give your responses. Full

More information

Deep Reinforcement Learning. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 19, 2017

Deep Reinforcement Learning. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 19, 2017 Deep Reinforcement Learning STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 19, 2017 Outline Introduction to Reinforcement Learning AlphaGo (Deep RL for Computer Go)

More information

1/sqrt(B) convergence 1/B convergence B

1/sqrt(B) convergence 1/B convergence B The Error Coding Method and PICTs Gareth James and Trevor Hastie Department of Statistics, Stanford University March 29, 1998 Abstract A new family of plug-in classication techniques has recently been

More information

Ajit Narayanan and Mark Moore. University of Exeter

Ajit Narayanan and Mark Moore. University of Exeter Quantum-inspired genetic algorithms Ajit Narayanan and Mark Moore Department of Computer Science University of Exeter Exeter, United Kingdom, EX4 4PT ajit@dcs.exeter.ac.uk Abstract A novel evolutionary

More information

Evolutionary computation

Evolutionary computation Evolutionary computation Andrea Roli andrea.roli@unibo.it Dept. of Computer Science and Engineering (DISI) Campus of Cesena Alma Mater Studiorum Università di Bologna Outline 1 Basic principles 2 Genetic

More information

Exam EDAF May 2011, , Vic1. Thore Husfeldt

Exam EDAF May 2011, , Vic1. Thore Husfeldt Exam EDAF05 25 May 2011, 8.00 13.00, Vic1 Thore Husfeldt Instructions What to bring. You can bring any written aid you want. This includes the course book and a dictionary. In fact, these two things are

More information

Chapter 7. Evolutionary Game Theory

Chapter 7. Evolutionary Game Theory From the book Networks, Crowds, and Markets: Reasoning about a Highly Connected World. By David Easley and Jon Kleinberg. Cambridge University Press, 2010. Complete preprint on-line at http://www.cs.cornell.edu/home/kleinber/networks-book/

More information

Forecasting & Futurism

Forecasting & Futurism Article from: Forecasting & Futurism December 2013 Issue 8 A NEAT Approach to Neural Network Structure By Jeff Heaton Jeff Heaton Neural networks are a mainstay of artificial intelligence. These machine-learning

More information

Uncertainty. Michael Peters December 27, 2013

Uncertainty. Michael Peters December 27, 2013 Uncertainty Michael Peters December 27, 20 Lotteries In many problems in economics, people are forced to make decisions without knowing exactly what the consequences will be. For example, when you buy

More information

CSC 4510 Machine Learning

CSC 4510 Machine Learning 10: Gene(c Algorithms CSC 4510 Machine Learning Dr. Mary Angela Papalaskari Department of CompuBng Sciences Villanova University Course website: www.csc.villanova.edu/~map/4510/ Slides of this presenta(on

More information

CPSC 320 Sample Solution, Reductions and Resident Matching: A Residentectomy

CPSC 320 Sample Solution, Reductions and Resident Matching: A Residentectomy CPSC 320 Sample Solution, Reductions and Resident Matching: A Residentectomy August 25, 2017 A group of residents each needs a residency in some hospital. A group of hospitals each need some number (one

More information

A Cooperation Strategy for Decision-making Agents

A Cooperation Strategy for Decision-making Agents A Cooperation Strategy for Decision-making Agents Eduardo Camponogara Institute for Complex Engineered Systems Carnegie Mellon University Pittsburgh, PA 523-389 camponog+@cs.cmu.edu Abstract The style

More information