USING ADAPTIVE LEARNING ALGORITHMS TO MAKE COMPLEX STRATEGICAL DECISIONS

ASHANTHI MEENA SERALATHAN

Haverford College, Computer Science Department
Advisors: Dr. Deepak Kumar, Dr. Douglas Blank, Dr. Stephen Lindell

April 25, 2011 (Spring 2011)

Contents

1. Introduction
1.1. Chess
1.2. Deep Thought and Deep Blue
1.3. Minimax Search
1.4. α-β Pruning
1.5. The Horizon Effect and Quiescence Search
1.6. Verified Null-Move Pruning
1.7. Neural Networks
1.8. Evolutionary Processes and Genetic Algorithms
2. Previous Work
2.1. Temporal Difference Learning and TDLEAF(λ)
2.2. Explanation-Based Neural Networks
2.3. Population Dynamics
3. Experimental Design
3.1. Content of Chromosomes
3.2. Crossing-Over
3.3. Chess Engine
3.4. Results
3.5. Analysis
3.6. Future Work
3.7. Conclusion
References
Appendix A: Player Chromosomes
Appendix B: Code

Abstract. Traditionally, artificial intelligence (AI) algorithms have not been built on particularly adaptive principles. Systems were created using complex collections of rules crafted specifically for the purpose at hand, and their flexibility was wholly dependent on whatever flexibility the programmer incorporated into the rules. This thesis therefore examines several different algorithms for decision-making, particularly for playing chess. It surveys a number of techniques for creating a chess-playing system, and then begins an altered implementation of the genetic-algorithm-inspired Population Dynamics approach, which trains a system to rank board states in a game of chess using a chromosome that includes more genes than the original algorithm. While still a work in progress, the process of creating the system has already demonstrated some advantages over other algorithms for learning chess evaluation functions (such as the flexibility of the algorithm), and further work could yield interesting insight into whether a chess system built on a modified version of Population Dynamics can reach a skill level comparable to other chess systems, or even to human players.

1. Introduction

This thesis focuses on a number of different strategies for making complex decisions: Deep Blue, an example of a chess-playing system that used no adaptive learning strategies, as well as three popular algorithms for learning evaluation functions for chess. All three learning algorithms rely on a minimax search (sped up with various methods of pruning the move space at a given turn) that returns the best move found. Each develops its evaluation function in a distinctly different way: two (Explanation-Based Neural Networks and TDLEAF(λ)) use neural networks in some manner, while one (Population Dynamics) uses genetic-algorithm techniques. Finally, this thesis begins to analyze the effects of altering the Population Dynamics algorithm, both by slightly changing how it searches the space of chess boards and by adding more parameters to the player's chromosome, so that the system determines even more about the factors that are important to playing chess. It must be noted that these algorithms all have uses outside of chess, and could be slightly altered to fit a number of different applications, such as natural language processing and generic decision-making systems.

1.1. Chess. Chess is an interesting game to choose for studying learning algorithms (due to its complexity), as it has yet to be solved. Winning is not simply a matter of having the hardware to quickly find the moves for a perfect strategy: some amount of strategy is required for a system to play on par with intelligent beings.

Figure 1. A typical chess board layout.

In chess there are six different types of pieces, each with a different set of rules for how it can move (see Figure 1). Some pieces, such as kings, can move only one space at a time, while others, such as queens, can move several spaces in several directions. If a piece can move to a space occupied by an opponent piece, it may capture that piece. One piece, the king, is not allowed to be captured, and a player whose king is threatened (called a "check") is required to find a way to get its king out of harm's way. The object of the game is thus to capture the king, which requires trapping it so that no matter how the opposing player moves, it cannot save its king from capture (called a "checkmate"). Along with this, various stalemate criteria may come into play, including a player who is not in check being unable to move without putting itself in check. In addition, there are special moves that certain pieces can make only under specific circumstances, such as castling. The system therefore has to keep track of each piece, where that piece can move, and whether a piece's movement will put its own king in check, and then (as there will often be dozens of possible moves for each player) it must make strategic decisions about which moves would be most advantageous at that point in the game. How the computer chooses these moves is the topic of this thesis.

1.2. Deep Thought and Deep Blue. The first truly successful chess-playing artificial intelligence (AI) systems were Deep Thought and Deep Blue (created at Carnegie Mellon University in 1988 and at IBM in 1996, respectively) [1]. Both relied heavily on optimized system hardware, with the intent of performing very fast searches, and on carefully researched evaluation functions to rank boards. Deep Blue in particular was a sophisticated parallel system built on 30 IBM RS/6000 SP processors containing a total of 480 chess chips dedicated to speeding up the search for possible moves (each chip could examine roughly two million possible positions per second). One of the 30 processors served as the master, which started searches through the move space and pushed deeper searches onto the other processors; these would then use their chess chips to perform the last few levels of the search before evaluating individual boards and sending their findings back up to the master. Deep Blue also relied on pre-computed, heavily researched evaluation functions built on over 8000 features specific to chess games (such as how safe the king was, where specific pieces were located, and so on). By combining hardware specifically designed for optimized chess playing with a carefully constructed evaluation and highly optimized minimax searches through the possible move space, Deep Blue played at a level competitive with human Grandmasters, and in 1997 it even beat one (Garry Kasparov). It averaged 126 million positions scanned per second (ranging from 100 million to 200 million positions per second, depending on the point in the game).
While certainly a breakthrough in the field of AI, it must be noted that although it beat a human player, Deep Blue was given nearly every piece of information it could ever need to play chess against a human, along with the fastest, most sophisticated hardware that could be mustered at the time, and was left to compute its moves automatically from all the hard-coded information it was handed. In consequence, the Deep Blue approach as a whole was rather inflexible in its application to other real-world problems: each rule was hand-crafted by Deep Blue's programmers solely to improve the chess-playing process, and would not easily transfer to any other problem.

1.3. Minimax Search. Minimax search relies on an efficient algorithm for making sure the player's move is optimized even if the opponent moves in the manner most beneficial to itself [2]. It is applicable to any two-player game with a finite number of possible moves in which good choices by an opponent directly detract from the player's ability to win, and it is thus commonly used in chess to determine whether a certain move is best regardless of how good the opponent's next few moves are. The algorithm begins by finding all the player's possible moves and expanding the search tree at each node (in a manner similar to iterative deepening, or a bounded breadth-first search). When a move is expanded, the possible moves of the opponent are found, and each is expanded in the same fashion. The algorithm alternates between expanding possible player moves and possible opponent moves until the search reaches a bound, at which point the moves generated are all scored by an evaluation function (generally provided to the system). Bounds typically allow for at least one search ply (a ply being the expansion of player moves followed by the expansion of opponent moves; in other words, two levels in the search tree). Once each move has been scored, it is compared to all the other moves at that level of the search tree, and the best-scoring move is passed back up the tree, where its score is compared to the best score at the next-highest level in similar fashion. By passing these values up, the system gets a total score for each path (representing how effective the player's move along that part of the tree may be). The player's moves are scored positively and the opponent's moves negatively; therefore, if the opponent's move is better than the last move the player made, the rating for that path of the search tree will be negative, and vice versa. At the end, each of the player's next moves will have been scored, and the best-scoring move is chosen (see Figure 2).

Figure 2. Pseudocode for the Minimax algorithm.

This algorithm is an effective way of determining how good a move will be, and for how long a good move may remain a good choice in the future.
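The idea in Figure 2 can be condensed into a short Python sketch (Python being the language of the engine described in Section 3.3). The children and evaluate callables here are hypothetical stand-ins for the engine's move generator and evaluation function, not code from the thesis:

```python
def minimax(state, depth, maximizing, children, evaluate):
    """Depth-bounded minimax over a generic two-player game.

    children(state, maximizing) yields successor states for the side to move;
    evaluate(state) scores a board from the player's point of view.
    """
    successors = list(children(state, maximizing))
    if depth == 0 or not successors:      # search bound reached, or terminal state
        return evaluate(state)
    if maximizing:                        # player's turn: take the best score for us
        return max(minimax(s, depth - 1, False, children, evaluate)
                   for s in successors)
    return min(minimax(s, depth - 1, True, children, evaluate)   # opponent's turn
               for s in successors)
```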

However, it requires looking at every possible move of both players over the span of multiple turns, which can easily amount to an enormous number of moves to evaluate, depending on the game and on how the game has progressed. In consequence, other algorithms are used along with the minimax algorithm to reduce the number of moves that must actually be evaluated.

1.4. α-β Pruning. α-β pruning is a method of reducing the number of moves the system must evaluate, by determining whether the net score of a particular search path could ever be better than that of other paths around it. It works roughly the same as a normal minimax search, except that it incorporates two new variables, α and β: α tracks the best score the player is already guaranteed, while β tracks the best score the opponent is already guaranteed. As the scores of each move are returned up the search tree, the system also updates α (when the moves being scored are the player's) or β (when they are the opponent's). If α is ever not smaller than β while the system is scoring moves, the remaining moves at that node can no longer affect the final choice: the opponent already has a better option elsewhere in the tree, so the likelihood that this branch will ultimately return a useful score is low enough that the algorithm can simply cut it and its subtree from the search entirely. A symmetric comparison is made when the system is scoring the opponent's moves (see Figure 3).

Figure 3. Pseudocode for the α-β pruning algorithm.

Using this method, the system can cut a large number of moves out of the search, and therefore will not waste time calculating scores for bad moves. More specifically, α-β pruning can reduce the average branching factor of the search to about $\sqrt[4]{b^3}$, and with perfect move ordering may reduce it to $\sqrt{b}$ (where b is the number of children of a particular node) [2].
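A minimal sketch of the same search with α-β cutoffs added, again using hypothetical children and evaluate callables rather than the thesis's own code:

```python
def alphabeta(state, depth, alpha, beta, maximizing, children, evaluate):
    """Minimax with alpha-beta cutoffs. alpha is the best score already
    guaranteed to the player, beta the best already guaranteed to the opponent."""
    successors = list(children(state, maximizing))
    if depth == 0 or not successors:
        return evaluate(state)
    if maximizing:
        value = float("-inf")
        for s in successors:
            value = max(value, alphabeta(s, depth - 1, alpha, beta,
                                         False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:        # opponent already has a better option: prune
                break
        return value
    value = float("inf")
    for s in successors:
        value = min(value, alphabeta(s, depth - 1, alpha, beta,
                                     True, children, evaluate))
        beta = min(beta, value)
        if alpha >= beta:            # player already has a better option: prune
            break
    return value
```

A call such as alphabeta(board, 4, float("-inf"), float("inf"), True, children, evaluate) scores the same move the plain minimax would choose, while skipping branches that cannot change the answer.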

The main concerns with this algorithm are that, even with pruning, a game like chess still presents a very large number of moves to evaluate; chess is estimated to have on the order of $10^{43}$ game states, suggesting that even with an optimal reduction the program may have to scan hundreds of thousands of game states in order to make a move [2]. In addition, the algorithm can sometimes cut branches that would actually have led to worthwhile moves (often because the search was not carried far enough to make an informed decision about when to cut the branch). These limitations are typically not enough to abandon the algorithm, however, and the concerns can be mitigated with the addition of other algorithms to the search process, such as quiescence search and null-move pruning.

1.5. The Horizon Effect and Quiescence Search. In games such as chess, in which the nature of a game can change in a matter of moves, it can be crucial to know the outcome of a heated part of the game if the system hopes to choose the best move in such a situation. This is described as the horizon effect: the system is only able to see so far into the future, and therefore cannot understand the full effects of its choices beyond its search depth. As stated before, a normal α-β-pruned minimax search uses a strict search bound that does not take the nature of the game into account, and thus it is very easy for the system to miss crucial information that would be revealed one move later (i.e., over the horizon). This is where quiescence is effective: using quiescence techniques, the system extends the search on a particular branch if, by the time it hits the normal bound, the game has not "stabilized" [3]. A stable game can mean anything from neither player being in check to no particular pieces being in jeopardy. If the game is found to be unstable for any reason, the search is extended a little further, and this process repeats until the search reaches a stable game state. In this way the system is less likely to miss important consequences of its moves, as it continues the search at exactly those points in the game where making the right move is crucial.
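A sketch of how the bound test changes under quiescence. The is_stable predicate is a hypothetical stand-in for whatever stability criteria the engine uses (e.g., neither king in check), and a hard floor keeps an unstable line from extending forever:

```python
def quiescent_search(state, depth, maximizing, children, evaluate, is_stable,
                     max_extension=8):
    """Minimax whose depth bound is only honoured on 'quiet' positions.

    Instead of scoring any position at depth 0, the search keeps extending
    (up to max_extension extra plies) until is_stable(state) holds.
    """
    successors = list(children(state, maximizing))
    if not successors:
        return evaluate(state)
    if depth <= 0 and (is_stable(state) or depth <= -max_extension):
        return evaluate(state)            # quiet position, or hard floor reached
    pick = max if maximizing else min
    return pick(quiescent_search(s, depth - 1, not maximizing, children,
                                 evaluate, is_stable, max_extension)
                for s in successors)
```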
1.6. Verified Null-Move Pruning. To shrink the number of game states examined even further, an algorithm called null-move pruning may be implemented [4]. Null-move pruning is based on the assumption that at any point in the game, it should be more beneficial to make a move than to do nothing (i.e., to make a null move). The algorithm begins with the player testing a null move: it pretends the player did not make a move, and continues a reduced-depth search (typically with depth reduction factor R = 2) down this new branch of the search tree. This results in a score reflecting the null move, which can be compared to the score of the game before the null move was made. If the score of the null move turns out to be higher (i.e., better) than the score prior to the null move, the system ignores the rest of this game branch, under the assumption that since it was better to do nothing than to make a move, the branch is undesirable. Moves that fall into this category are called "fail-high" moves. While this algorithm works well in the middle game, when it is rare that not making a move will be beneficial, it can be detrimental once the game moves closer to the end, as fewer pieces on the board can mean less benefit in moving them. The horizon effect can also come into play when the algorithm does not search far enough along the null-move branch to realize that the null move is not as beneficial in the far future as it seems in the near future. Thus the algorithm of verified null-move pruning was developed; it allows further reduction of the null-move search while alleviating the effects of the horizon effect and of the nature of chess endgames. A sketch of the standard null-move test follows.
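This sketch is written in negamax style (scores are always from the side to move, so the null-move result is negated). The pass_turn and search_score callables are illustrative stand-ins for engine internals:

```python
R = 2  # depth reduction for the standard null-move search (the verified variant uses R = 3)

def fails_high(state, depth, beta, search_score, pass_turn):
    """Standard null-move test: hand the move to the opponent without moving.

    pass_turn(state) returns the same position with the other side to move;
    search_score(state, depth) runs a reduced-depth search and returns a score
    from the side to move's point of view. If doing nothing already reaches
    beta, the branch 'fails high' and can be pruned.
    """
    if depth - 1 - R < 0:
        return False                       # too shallow to test reliably
    null_score = -search_score(pass_turn(state), depth - 1 - R)
    return null_score >= beta              # doing nothing already beats beta
```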

Verified null-move pruning does not cut a branch immediately after finding a fail-high move; rather, it uses the result of a standard null-move search as a sort of flag that informs the rest of the normal minimax search. It first performs a null-move search with R = 3 and determines whether the result indicates that a node is a fail-high; if so, then rather than cutting off the node, the depth of the overall search is reduced by a ply, and standard null-move pruning is performed on that node's subtree (i.e., any node found to be a fail-high after this point will be cut; if the algorithm finds no need to cut any nodes, the depth of the search is increased by a ply again, and the search continues as though the node had never been flagged as a fail-high). This way, a node's branch is only cut off if it consistently shows signs of leading toward a detrimental move, rather than at the first indication that the node may be a poor choice. This also allows for a shorter null-move search (R = 3 rather than R = 2), which more than makes up for the extra verification that occurs in the algorithm (see Figure 4).

Figure 4. Pseudocode for the Verified Null-Move Pruning algorithm.

1.7. Neural Networks. Many machine learning algorithms employ some form of neural network. Based on a model of how synapses in the human brain function, a neural network is essentially a fully connected, weighted, directed graph with three levels ("layers") that each have their own function within the network [5]. The first layer, known as the input layer, receives the information given to the network and sends it to the second layer, called the hidden layer. As the information moves from the input layer to the hidden layer, it is altered by the weights on each edge from the input nodes to the hidden nodes. Since every input node is connected to every hidden node and each edge has its own weight, each hidden node receives a different transformation of the same data (in other words, each hidden node possesses a different perspective on the data).

The data is then manipulated by functions within the hidden nodes, and passed from the hidden nodes to the output nodes (after being altered again by the set of weights between the hidden and output layers). Each output node is, before the network runs, mapped to a specific value or answer to the question the network is attempting to solve (such as which move to perform given a board). In the case of chess, if the network is set up to take in a board and output what the board would look like after a chosen move, the output layer could represent every possible move that could be generated from the input board (with the number of nodes in the layer equalling the number of possible moves). The system would know which move to choose based on which node yielded the best value; typically, the best value is deemed to be the largest. This means that once a board is given to the network, a value is sent to each output node, and the node with the highest value represents the move that the system should make (see Figure 5 for a diagram of a neural network).

Figure 5. A neural network is a complex weighted graph that can be used to retain information about data propagated through it.

Neural networks are typically initialized with a random set of weights (as the best weights to use generally change depending on the problem being solved). Because of this, the network needs to be taught how to solve the problem correctly, and must know how to change its understanding of the data (i.e., change its weights) accordingly so that it may correctly answer the problem. There are several different ways to train a network and update its weights; typically, a value is added to each weight that is proportional to how incorrect the network's answer was in comparison to what the trainer specifies as the correct answer (a procedure known as error backpropagation).
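The structure just described fits in a few lines. The sketch below initializes random weights, runs a forward pass through one hidden layer, and applies a simplified error-proportional update to the output weights only; full backpropagation would also push the error back through the hidden weights:

```python
import math
import random

def make_layer(n_in, n_out):
    """Random initial weights for one fully connected layer."""
    return [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]

def forward(x, w_hidden, w_out):
    """One pass through input -> hidden -> output; returns (outputs, hidden)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    out = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    return out, hidden

def train_step(x, target, w_hidden, w_out, lr=0.1):
    """Simplified error-correction step: nudge each output weight in
    proportion to how wrong that output node was."""
    out, hidden = forward(x, w_hidden, w_out)
    for j, row in enumerate(w_out):
        error = target[j] - out[j]
        for k in range(len(row)):
            row[k] += lr * error * hidden[k]
```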

1.8. Evolutionary Processes and Genetic Algorithms. Genetic algorithms are another subset of algorithms often used for machine learning, and are based on the biological phenomenon of evolution [6]. In the algorithm, a population is generated in which each member serves as a random possible answer to the question the system is trying to answer. Members have the ability to reproduce asexually, and may be removed from the population over the course of the population's lifetime, to be replaced in future generations by the fitter members of the population. Sometimes this fitness value is a probability that the member makes it to the next generation, related to the strength of the member's genes; other times it depends on criteria the member must fulfill in order to live past its current generation, such as successfully performing a procedure based on the contents of its personal genetic code. Each member has its own "gene," typically a collection of values which, when coupled with an equation or other method of evaluation, produces an answer that may or may not answer the question accurately. Answers are evaluated in some manner and, as in Darwinian evolution, the best answers are allowed to persist in the population while the worst answers are eventually removed (see Figure 6). Members of the population eventually create progeny, which are either clones of their parents or have their "genes" altered in ways analogous to biological genetics (i.e., mutations and cross-overs). Such alterations are random and typically have a specified probability of occurring within the population. As long as the alterations are performed gradually, the population will eventually converge so that all the members give a similar answer to the question; and as long as the alterations are random, there will be enough variety in the population that the system makes a fairly informed decision about what the answer should be.

Figure 6. After being evaluated, the genes that best solved the problem are allowed to persist and reproduce, while the worst genes are given less of a chance of persisting in the population, and are less able to reproduce.
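A bare-bones sketch of that loop, assuming chromosomes are lists of numbers and that fitness and mutate are supplied by the problem at hand (these are the only problem-specific pieces, as the next paragraph notes):

```python
import random

def evolve(population, fitness, mutate, generations=100):
    """Generational GA sketch: the fitter half of the population persists;
    the rest are replaced with mutated copies of randomly chosen survivors.

    population: list of chromosomes (lists of numbers);
    fitness(member) -> float; mutate(member) -> new, perturbed chromosome.
    """
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[:len(population) // 2]
        replacements = [mutate(random.choice(survivors))
                        for _ in range(len(population) - len(survivors))]
        population = survivors + replacements
    return max(population, key=fitness)
```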

Genetic algorithms are viewed as more flexible than many algorithms because the system does not need significant knowledge of the domain of a problem in order to attempt to solve it. The only portion of the algorithm that relies on any context from the question is the portion that assesses the fitness of a particular member, so the programmer does not have to integrate knowledge of the specific problem into every portion of the system. As such, a system trained on genetic principles may be used to solve multiple unrelated problems simply by altering the method of testing a member's fitness to suit the problem at hand, rather than having to adjust the internal mechanisms of the algorithm to suit the nature of the new problem.

2. Previous Work

2.1. Temporal Difference Learning and TDLEAF(λ). Temporal difference learning focuses on training a system to make informed decisions about how to react to something by being able to predict the future cost of its decisions [7]. The algorithm requires a set of end states that can be evaluated in some way (in a game like chess, the states reached by a set of moves), and sets each state to have a probability value p(x_t, x_{t+1}, a) corresponding to the likelihood that the state will occur (for example, the likelihood that a move will occur in a chess game). Using this, the system can assign a reward value to each end state and use these values to make the best decision. The system also scores itself depending on whether its predictions were successful (1 when successful, 0 when a neutral outcome is reached, and -1 when unsuccessful), and can use this value to adjust the reward values for each end state.

The TDLEAF(λ) algorithm adapts this strategy to a chess-playing system that uses a minimax algorithm to evaluate moves. Rather than simply being given a set of states, the algorithm takes the leaves from a minimax search and assigns reward values to each, and then scores other moves based on their relationship to these leaves. In this sense, the reward values are propagated up the minimax tree, so that the system gets a better understanding of how a possible move will turn out, based on how its predicted reward values have turned out. The resulting program is markedly slower than Deep Blue (it plays roughly 6,000 times slower), yet it is still able to play at a highly sophisticated level. After setting most of its evaluation-function parameters to 0 (leaving only piece rankings in the equation) and allowing the system to play human and computer chess players, it played at a B-grade human level (around 1650); once the TDLEAF(λ) part of the system was enabled, the chess system's rating rose about 500 points (to 2150) in its first 300 games, and it became even better afterward. And because the system learns all the parameters of its evaluation function, it can be left to develop an optimal evaluation function over time, rather than requiring extensive research on how to set each parameter (as was necessary with Deep Blue). The approach has also been extremely successful with games such as Backgammon.
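A tabular sketch of the underlying TD(λ) update over one finished game may help. TDLEAF(λ) applies the same idea to the leaves of a minimax search and to the parameters of an evaluation function rather than to a simple table, so this is a simplified stand-in:

```python
from collections import defaultdict

def td_lambda_episode(values, episode, outcome, alpha=0.1, lam=0.7):
    """One-game TD(lambda) update of a table of state values.

    values: defaultdict(float) mapping state -> predicted reward;
    episode: the sequence of positions seen during the game;
    outcome: +1 (win), 0 (draw), or -1 (loss), matching the scores above.
    """
    trace = defaultdict(float)            # eligibility: how much credit each
    for t in range(len(episode)):         # visited state gets for later errors
        last = (t == len(episode) - 1)
        target = outcome if last else values[episode[t + 1]]
        delta = target - values[episode[t]]       # temporal-difference error
        trace[episode[t]] += 1.0
        for s in trace:
            values[s] += alpha * delta * trace[s]
            trace[s] *= lam                       # older states decay by lambda
```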

2.2. Explanation-Based Neural Networks. The explanation-based model for learning chess was motivated by the desire to reduce the amount of time needed to train a network that is going to play chess [8]. Explanation-based methods rely on "domain knowledge" of the problem being tackled in order to form generalized opinions as to how to carry out the task.

Figure 7. Components of the Explanation-based algorithm.

In the case of chess, this model is utilized with neural networks: one network, called the chess model M, keeps track of this domain-specific knowledge by training itself on large databases of grandmaster chess games, learning how to take in a board and predict what will follow. Specifically, it receives a board s_t (the board at the start of the player's turn) and maps it to s_{t+2} (the board at the start of the player's next turn). The system then uses M to affect how it learns an evaluation function V (represented by a separate neural network, trained using the TD algorithm), such that

V_target(s_t) ≈ γ V(s_{t+2}),    ∂V_target(s_t)/∂s_t ≈ γ (∂V(s_{t+2})/∂s_{t+2}) (∂M(s_t)/∂s_t),

where V_target(s_t) is the value the evaluation function is trained toward at board s_t, γ is a discount factor that favors early accuracy over late accuracy (i.e., a stronger starting game rather than a strong endgame), V(s_{t+2}) is the value the evaluation function predicts will be valid at the player's next turn, and M(s_t) is based on the information M has about board s_t, drawn from the chess games it was trained on. This algorithm saves time because V is the network that will be used in the chess-playing system, and with this approach it does not have to spend time training on games first-hand every time a player must be trained (instead, a researcher trains M once, and uses it to train V repeatedly as necessary). See Figure 7 for a diagram of this structure.

This algorithm was tested against GNU Chess (a popular and well-developed open-source chess application), and won approximately 13% of its games. The largest problems with the system were its extremely poor opening strategy and its inability to play out short endgame strategies; however, the system was able to generalize how to protect important pieces and how to make intelligent trades.
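The training target can be read directly off the first equation above. In this sketch, V and M are callables standing in for the two networks, and the board encoding is left abstract; in the full method the slopes (gradients) of M are also transferred to V, which is what lets M's domain knowledge generalize from few first-hand games:

```python
def ebnn_target(V, M, s_t, gamma=0.9):
    """Training target for the evaluation network V at board s_t.

    M(s_t) is the chess model's prediction of the board two half-moves ahead;
    the target is that board's discounted predicted value.
    """
    s_next = M(s_t)                # predicted board at the player's next turn
    return gamma * V(s_next)       # V is then nudged toward this value
```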

2.3. Population Dynamics. One of the only chess-playing learning algorithms that does not rely on a neural network framework, this approach uses an algorithm based on evolutionary principles to calculate what the system's evaluation function should be [9]. It uses a minimax search with α-β pruning and quiescence. The minimax search evaluates boards based on the following equation:

score(board) = \sum_{y=0}^{5} W[y] \left( N[y]_{\text{white}} - N[y]_{\text{black}} \right)

where N is a vector containing the number of pawns, knights, bishops, rooks, queens, and legal moves for the current board, and W holds the weights of all these parameters. A small constant "bonus" is granted in cases where the player controls the center, or when pawns have the potential to be promoted.

The algorithm then randomly generates a population P consisting of chess-playing systems that each have their own randomized weight vector (or chromosome), v_i = {weight_knight, weight_bishop, weight_rook, weight_queen, weight_legalmove, weight_pawn, weight_king}. weight_pawn and weight_king are always fixed (the pawn's weight to 1, the king's to an effectively unbeatable value) and, being fixed, do not need to be added to the chromosome or used in the calculations.

Each member plays against another member of the population twice (once as Black and once as White). Opponents are selected based on the relative strength of the two systems: each player is stored in a vector V, and a variable c iterates through the vector. Member V[c] is pitted against a randomly selected player from the remainder of the vector (V[c+1] through V[µ]), and then c is incremented. When the vector has been iterated through, a generation has passed, and the vector is reversed, which guarantees that the best player of that generation is placed at the beginning of the vector. This ordering holds because the later players will have played more games than their predecessors, and thus will have improved more rapidly; it also ensures that the best player persists in the population longer than other players, as it will not be challenged often enough to allow for many unlucky losses.

The winner of the two games is allowed to stay in the population, while the losing member is expelled from P and a mutated version of the winner is put in its place. If there is a draw, both members are left in P and mutated. The mutated members of the population have their chromosomes changed by first calculating the standard deviation of each parameter,

\sigma_{(y)} = \sqrt{ \frac{ \sum_{n=1}^{\mu} \left( V_{n(y)} - \text{average}_{(y)} \right)^2 }{ \mu - 1 } }, \quad \forall y \in v

(where y is a particular evaluation parameter (i.e., gene), V_{n(y)} is that parameter for player V_n, average_{(y)} is the average of that parameter across the entire population, and µ is the size of the population), and then using this value in a final mutation equation:

V_{(y)} = V_{(y)} + \left( \text{RND}(0 \ldots 1) - 0.5 \right) \cdot R \cdot \sigma_{(y)}, \quad \forall y \in v

where R is a fixed value determined by how well the player performed:

R = \begin{cases} 2 & \text{if the player won both games} \\ 1 & \text{if the player lost and is becoming a mutated version of the winner} \\ 0.2 & \text{if the player won one match and tied the other} \\ 0.5 & \text{if both players tied both times} \end{cases}

In this sense, the mutation of the genes reflects the trends in the population, and isn't purely random, which helps keep the values close enough that they can converge at some point. This also allows the values to mutate differently depending on how well the player has played recently.
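The two equations above translate almost directly into Python; note that statistics.stdev already uses the sample (µ - 1) denominator. This is a minimal sketch, assuming chromosomes are stored as equal-length lists of floats:

```python
import random
import statistics

def mutate(population, index, R):
    """Kendall-and-Whitwell-style mutation of one chromosome in place.

    Each gene is shifted by a random amount scaled by R and by that gene's
    sample standard deviation across the whole population, so mutation
    follows population trends rather than being purely random.
    """
    for y in range(len(population[index])):
        sigma = statistics.stdev(member[y] for member in population)
        population[index][y] += (random.random() - 0.5) * R * sigma
```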

Figure 8. In the Population Dynamics algorithm, each member has an array of parameters that it uses to determine its next move; the best set of parameters leads to wins, which in turn leads to the member remaining in the population.

In Kendall and Whitwell's experiment [9], µ = 50, and the genes were randomly initialized within a fixed range. One member of the population was seeded with values known to be optimal (knight = 3, bishop = 3, rook = 5, queen = 9, legal moves = 1), both to speed up learning and to prove that the best member of the population would not be lost during the learning process. Their experiment yielded the following results:

Table 1. Kendall and Whitwell's Best Players (best-player and population-average values for the Knight, Bishop, Rook, Queen, and Legal Moves genes).

3. Experimental Design

For the purposes of this experiment, the Population Dynamics algorithm was implemented and tested. Due to the inherently flexible nature of genetic algorithms in general, it was deemed the simplest to alter, relative to many of the other algorithms, which would either require extensive prior knowledge of how best to analyze each and every board, or would require even more training time and calculation to properly train the chess-playing system.

3.1. Content of Chromosomes. Kendall and Whitwell's algorithm [9] uses piece rankings and the number of legal moves as genes in its chromosomes, and incorporates control of the center and pawn promotion as constants in the search algorithm itself; the effects of removing those constant bonuses and adding them as genes of each member of the population, along with genes for other, more abstract strategies or types of moves (such as castling), were not examined in their experiment. Considering that the algorithm as it stands does not allow the system to truly figure out how important the fixed parameters are (in comparison to, say, how important it is to keep bishops in play or to maintain a significant number of legal moves), it would seem that an alteration of this sort could only improve the algorithm: instead of relying on the programmer's intuition to set these values so that the system neither ignores the parameters too often nor gives them too much priority, the system itself can not only come up with its own values for these parameters, but do so in a manner that naturally relates them to the other factors the system attends to while playing. Adding to the number of genes in each chromosome should not cause a problem either, as genetic algorithms in general are designed to manipulate large amounts of data at once (a typical algorithm may alter hundreds of weights simultaneously), so it is doubtful that the addition of these two factors, along with a few abstract moves, would hinder the system's ability to converge to a final set of genes. Thus, for the purposes of this experiment,

v_i = {weight_knight, weight_bishop, weight_rook, weight_queen, weight_legalmove, weight_controlofcenter, weight_promotion, weight_castling}.

3.2. Crossing-Over. In Kendall and Whitwell's experiment [9], crossing-over is not incorporated in the training process at all. In order to examine the effects of crossing over on such an algorithm (such as whether it speeds up the rate of convergence, or whether it creates a fitter population), another population is trained in which chromosomes have a fixed probability of 0.1 of performing a crossover during the mutation process. Crossovers occur at only one point in the chromosome, and involve a swap of genetic material at a random point on the chromosome, as sketched below.
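A sketch of that single-point crossover, assuming chromosomes are lists of gene weights:

```python
import random

def maybe_crossover(a, b, p=0.1):
    """With probability p, pick a random cut point and swap the genetic
    material past it between chromosomes a and b (in place)."""
    if random.random() < p:
        cut = random.randrange(1, len(a))     # cut strictly inside the chromosome
        a[cut:], b[cut:] = b[cut:], a[cut:]
    return a, b
```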
3.3. Chess Engine. The chess game engine is a modified version of a ChessBoard program [10], which handles finding valid moves and checking the state of the game, but does not come with a built-in artificial intelligence mechanism for choosing moves. The system is written in the Python programming language. It is extended with the population dynamics system (which handles training the population, keeping track of player chromosomes, and performing mutations), as well as a minimax search algorithm that uses α-β pruning. For the sake of speeding up training while testing the strength of the algorithm, the search depth was reduced to 1-ply, and quiescence was disabled. The evaluation function is preserved from the original algorithm, with the exception that for control of center, promotion, and castling, the weights in the chromosomes are controlled by boolean values (i.e., v_6 = weight_controlofcenter * 1 if the player has control of the center; if not, v_6 = weight_controlofcenter * 0, and the weight is ignored). The way these values are calculated differs from the way the original genes are used simply because these genes are concerned only with whether the player still has the possibility of performing the move, and not with immediately quantifiable values, such as how many moves the player has.
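A sketch of that gated evaluation. The argument names are illustrative, not from the thesis code: the first five genes weight the white-minus-black differences in piece and mobility counts, and the last three are gated by 0/1 flags indicating whether the feature currently holds for the player:

```python
def evaluate(counts_white, counts_black, flags, v):
    """Score a board with the 8-gene chromosome v described above.

    counts_white/counts_black: (knights, bishops, rooks, queens, legal moves);
    flags: (control_of_center, promotion_possible, castling_possible) as 0/1.
    """
    material = sum(w * (cw - cb)
                   for w, cw, cb in zip(v[:5], counts_white, counts_black))
    positional = sum(w * flag for w, flag in zip(v[5:], flags))  # gated genes
    return material + positional
```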

3.4. Results. In Kendall and Whitwell's experiment [9], the initial standard deviation of the chromosomes in the population was 3.8; because three new genes are incorporated in each chromosome here, the initial standard deviation for these players begins higher, at around 10 (10.01 for the population without crossing over, and 9.84 for the population with crossing over).

A player with 51 members in its population was trained for 13 generations, by which point it had converged to a small average gene deviation.

Table 2. Player A (best-player and population-average values, rounded to two decimal places, for the Knight, Bishop, Rook, Queen, Legal Moves, Control of Center, Promotion, and Castling genes).

These genes indicate that this player values having many Queens first and foremost; it is then concerned with the number of Bishops, castling, promotion, the number of Rooks, control of the center, the number of legal moves, and the number of Knights, in that order. The average player has roughly the same goals in mind.

A player with 51 members in its population and crossing-over enabled was also trained; the addition of crossing over may have made it more difficult for the system to converge as quickly, as even after twice as many generations had passed, the system had not converged (the genes' average deviation remained high). By the end of the 26th generation the population had the following characteristics:

Table 3. Player B (best-player and population-average values for the Knight, Bishop, Rook, Queen, Legal Moves, Control of Center, Promotion, and Castling genes).

These genes indicate that this player values having many Knights more than anything else; it is then concerned with control of the center, promotion potential, the number of Rooks, castling potential, the number of legal moves, the number of Queens, and finally the number of Bishops, in that order. The average player mostly agrees with this hierarchy (the average chain is Knights, control of center, promotion, legal moves, castling, Rooks, Queens, and Bishops; Rooks and legal moves swap precedence).

In both of these models, winning members were not mutated (only the replacement chromosomes were mutated, at a rate of 0.5). See Appendix A for tables of all the genes of both populations.

For the purpose of testing their ability to play chess (regardless of whether or not their genes were optimal), these two players were both pitted against random players (i.e., players who choose moves at random), using the same two-game format used in the training. The results are as follows:

Table 4. Player A v. Random Player

                 Result                      Moves Required
Game 1 (White)   Lost                        140
Game 2 (Black)   Draw (3-Repetition Rule)    180

Table 5. Player B v. Random Player

                 Result                      Moves Required
Game 1 (White)   Lost                        232
Game 2 (Black)   Draw (3-Repetition Rule)    189

These results remain consistent if the players are allowed a rematch; they consistently lose their first game and then tie the second. These players also played against opponents pre-fitted with the piece-rank genes used in the seed for Kendall and Whitwell's experiment ([3, 5, 5, 9, 1, 8, 1, 1]):

Table 6. Player A v. Seeded Player

                 Result                      Moves Required
Game 1 (White)   Draw (3-Repetition Rule)    12
Game 2 (Black)   Draw (3-Repetition Rule)    47

Table 7. Player B v. Seeded Player

                 Result                      Moves Required
Game 1 (White)   Draw (3-Repetition Rule)    12
Game 2 (Black)   Draw (3-Repetition Rule)    66

For each player, the results are identical if the player and the seed play again. A similar set of games was played between the seeded player and a random player:

Table 8. Seeded Player v. Random Player

                 Result                      Moves Required
Game 1 (White)   Draw (3-Repetition Rule)    110
Game 2 (Black)   Lost

3.5. Analysis. As can be seen from the data, the results of the system's evolution differ greatly from the results of Kendall and Whitwell. Many of the genes for piece ranking are inflated to a significant degree in comparison to the "ideal" values. It is currently unclear whether this is the result of the added genes having an effect on the values in general, or whether the system as it stands is evolving incorrectly. It is possible, for example, that because the piece-ranking values are now dependent not only on each other but also on the new factors, the values in general deviate from what one would expect with only the five piece rankings in play.

It is also possible that the system, as currently written, is simply changing too quickly, causing abnormally high (or low) values for many of the genes. And while the results of the played games would indicate that the non-optimal values indeed lead to a non-optimal level of gameplay (as the players could not win against a random player), the fact that the seeded player also could not beat the random player suggests that the weights may not be the only cause of weak play. There may currently be a bug in the search procedure that does not allow a player to truly choose its best move, causing even the seeded player to have difficulty beating the random player. While it could be the case that the 1-ply search is proving to be a detriment to the system, as the low search bound makes it more difficult for the system to make an informed decision about future moves, one would assume that a weakly informed decision would still fare, on average, better than a completely uninformed decision, which is not the case with these players (they consistently lose as White and draw as Black). It also seems odd that the trained players lose when playing White (the color that moves first, and is therefore commonly expected to have a slight advantage) and draw when playing Black. More work would have to be done to determine how weak the system's opening move is, as well as why a weak opening would cause poor gameplay when the player moves first and a slightly better result when the player moves second.

3.6. Future Work. In order to verify the reason for the inflated genes, more training sessions would have to be run, and the average best player of each system compared, to confirm that the inflated values in these preliminary training sessions reflect the system's behavior in general. To determine whether the new genes are the source of the weight inflation, it would be useful to go back and train a system that does not take the new parameters into account, and compare its results to the ideal set of weights. If they are similar to the ideal values, then it is possible that the inflated values are close to normal, given the relationship of the new parameters to piece rankings; if they are equally inflated, then it could be concluded that the system itself is not evolving properly. Altering how the system's members mutate (lowering the rate of mutation or the magnitude of the mutations, for example) would likely be the best way to slow down the amount of change in the population and keep the values closer to the ideal ones.

The addition of more parameters to the chromosomes would also make for a more interesting system, which would hopefully yield a successful style of play. Adding parameters that take more complex strategies, such as knight-forking, into account would test the strength of the system when faced with abstract concepts and decisions (as opposed to the relatively straightforward strategies and parameters incorporated in the current chromosomes). Finally, in order to truly test the strength of the system's chess abilities, it would need to play against a wide variety of different players (human or computer). The system should also be compared to various other systems, including systems that incorporate neural networks or other structures in order to train their players.
Through these tests, one could determine the advantages and disadvantages this approach holds over other techniques, and could determine whether this approach is (or is not) the best approach to creating systems that can solve complex problems.

3.7. Conclusion. While there are not yet results indicating that genetic algorithms create better predictors for tasks such as chess than other learning techniques, the flexibility of the algorithm already appears evident, suggesting that the technique is worth further study. The results thus far indicate that it may be the underlying playing algorithm, rather than the training algorithm, that is causing non-optimal results; if this is the case, and the learning algorithm itself does not need to be tweaked to create a more optimal chess-playing population, this would reaffirm the notion that genetic algorithms (and in particular, the Population Dynamics model) can be applied to a wide variety of problems with minimal need to hand-tailor them to the task at hand. This flexibility differs widely from other techniques for playing chess, which either require the computer scientist to personally tweak every chess parameter until it is optimal, or require using another sort of learning system, such as a neural network, that is more computationally costly (as it solves a large system of equations at once [5], rather than one equation dependent on one set of weights [9]) and that requires careful observation on the part of the computer scientist to ensure it does not overtrain. Additionally, Kendall and Whitwell's results [9] suggest that a system trained with genetic algorithms can be trained to play chess at a sophisticated level, indicating that the learning algorithm itself is capable of producing useful results for chess, and that it is still worthwhile to study.

References

[1] Campbell, M., Hoane Jr., A.J., and Hsu, F.-h. 2002. Deep Blue. Artificial Intelligence, 134(1-2): 57-83.
[2] Nilsson, Nils J. 1998. Artificial Intelligence: A New Synthesis. Morgan Kaufmann Publishers, San Francisco.
[3]
[4] Tabibi, O.D. and Netanyahu, N.S. 2002. Verified Null-Move Pruning. ICGA Journal, 25(3).
[5] Cheng, B. and Titterington, D.M. 1994. Neural Networks: A Review from a Statistical Perspective. Statistical Science, 9(1).
[6] Koza, John R. 1992. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA.
[7] Tesauro, G. 1995. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3): 58-68.
[8] Thrun, S. 1995. Learning to play the game of chess. In G. Tesauro, D. Touretzky, and T. Leen (eds.), Advances in Neural Information Processing Systems 7. MIT Press, Cambridge, MA.
[9] Kendall, G. and Whitwell, G. 2001. An evolutionary approach for the tuning of a chess evaluation function using population dynamics. In Proceedings of the 2001 Congress on Evolutionary Computation (CEC-2001), IEEE Press.
[10]

5. Appendix A: Player Chromosomes

Table 9. Population A (per-player values for the Knight, Bishop, Rook, Queen, Legal Moves, Control of Center, Promotion, and Castling genes, for each of the 51 players).


Decision making, Markov decision processes Decision making, Markov decision processes Solved tasks Collected by: Jiří Kléma, klema@fel.cvut.cz Spring 2017 The main goal: The text presents solved tasks to support labs in the A4B33ZUI course. 1 Simple

More information

Learning Tetris. 1 Tetris. February 3, 2009

Learning Tetris. 1 Tetris. February 3, 2009 Learning Tetris Matt Zucker Andrew Maas February 3, 2009 1 Tetris The Tetris game has been used as a benchmark for Machine Learning tasks because its large state space (over 2 200 cell configurations are

More information

Introduction to Arti Intelligence

Introduction to Arti Intelligence Introduction to Arti Intelligence cial Lecture 4: Constraint satisfaction problems 1 / 48 Constraint satisfaction problems: Today Exploiting the representation of a state to accelerate search. Backtracking.

More information

Lecture 23: Reinforcement Learning

Lecture 23: Reinforcement Learning Lecture 23: Reinforcement Learning MDPs revisited Model-based learning Monte Carlo value function estimation Temporal-difference (TD) learning Exploration November 23, 2006 1 COMP-424 Lecture 23 Recall:

More information

Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D

Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D 7.91 Lecture #5 Database Searching & Molecular Phylogenetics Michael Yaffe B C D B C D (((,B)C)D) Outline Distance Matrix Methods Neighbor-Joining Method and Related Neighbor Methods Maximum Likelihood

More information

Selecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden

Selecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden 1 Selecting Efficient Correlated Equilibria Through Distributed Learning Jason R. Marden Abstract A learning rule is completely uncoupled if each player s behavior is conditioned only on his own realized

More information

Introduction to Fall 2009 Artificial Intelligence Final Exam

Introduction to Fall 2009 Artificial Intelligence Final Exam CS 188 Introduction to Fall 2009 Artificial Intelligence Final Exam INSTRUCTIONS You have 3 hours. The exam is closed book, closed notes except a two-page crib sheet. Please use non-programmable calculators

More information

Designing Information Devices and Systems I Spring 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way

Designing Information Devices and Systems I Spring 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way EECS 16A Designing Information Devices and Systems I Spring 018 Lecture Notes Note 1 1.1 Introduction to Linear Algebra the EECS Way In this note, we will teach the basics of linear algebra and relate

More information

Sequential Decision Problems

Sequential Decision Problems Sequential Decision Problems Michael A. Goodrich November 10, 2006 If I make changes to these notes after they are posted and if these changes are important (beyond cosmetic), the changes will highlighted

More information

Approximate Optimal-Value Functions. Satinder P. Singh Richard C. Yee. University of Massachusetts.

Approximate Optimal-Value Functions. Satinder P. Singh Richard C. Yee. University of Massachusetts. An Upper Bound on the oss from Approximate Optimal-Value Functions Satinder P. Singh Richard C. Yee Department of Computer Science University of Massachusetts Amherst, MA 01003 singh@cs.umass.edu, yee@cs.umass.edu

More information

In Advances in Neural Information Processing Systems 6. J. D. Cowan, G. Tesauro and. Convergence of Indirect Adaptive. Andrew G.

In Advances in Neural Information Processing Systems 6. J. D. Cowan, G. Tesauro and. Convergence of Indirect Adaptive. Andrew G. In Advances in Neural Information Processing Systems 6. J. D. Cowan, G. Tesauro and J. Alspector, (Eds.). Morgan Kaufmann Publishers, San Fancisco, CA. 1994. Convergence of Indirect Adaptive Asynchronous

More information

CS188: Artificial Intelligence, Fall 2009 Written 2: MDPs, RL, and Probability

CS188: Artificial Intelligence, Fall 2009 Written 2: MDPs, RL, and Probability CS188: Artificial Intelligence, Fall 2009 Written 2: MDPs, RL, and Probability Due: Thursday 10/15 in 283 Soda Drop Box by 11:59pm (no slip days) Policy: Can be solved in groups (acknowledge collaborators)

More information

Scout, NegaScout and Proof-Number Search

Scout, NegaScout and Proof-Number Search Scout, NegaScout and Proof-Number Search Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Introduction It looks like alpha-beta pruning is the best we can do for a generic searching

More information

Sexual Reproduction *

Sexual Reproduction * OpenStax-CNX module: m45465 1 Sexual Reproduction * OpenStax This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 3.0 Abstract By the end of this section, you

More information

Today s Outline. Recap: MDPs. Bellman Equations. Q-Value Iteration. Bellman Backup 5/7/2012. CSE 473: Artificial Intelligence Reinforcement Learning

Today s Outline. Recap: MDPs. Bellman Equations. Q-Value Iteration. Bellman Backup 5/7/2012. CSE 473: Artificial Intelligence Reinforcement Learning CSE 473: Artificial Intelligence Reinforcement Learning Dan Weld Today s Outline Reinforcement Learning Q-value iteration Q-learning Exploration / exploitation Linear function approximation Many slides

More information

Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm

Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Michail G. Lagoudakis Department of Computer Science Duke University Durham, NC 2778 mgl@cs.duke.edu

More information

Forming Neural Networks through Ecient and Adaptive. Coevolution. Abstract

Forming Neural Networks through Ecient and Adaptive. Coevolution. Abstract Evolutionary Computation, 5(4), 1998. (In Press). Forming Neural Networks through Ecient and Adaptive Coevolution David E. Moriarty Information Sciences Institute University of Southern California 4676

More information

Computing the acceptability semantics. London SW7 2BZ, UK, Nicosia P.O. Box 537, Cyprus,

Computing the acceptability semantics. London SW7 2BZ, UK, Nicosia P.O. Box 537, Cyprus, Computing the acceptability semantics Francesca Toni 1 and Antonios C. Kakas 2 1 Department of Computing, Imperial College, 180 Queen's Gate, London SW7 2BZ, UK, ft@doc.ic.ac.uk 2 Department of Computer

More information

Introduction. Spatial Multi-Agent Systems. The Need for a Theory

Introduction. Spatial Multi-Agent Systems. The Need for a Theory Introduction Spatial Multi-Agent Systems A spatial multi-agent system is a decentralized system composed of numerous identically programmed agents that either form or are embedded in a geometric space.

More information

Artificial Intelligence Methods (G5BAIM) - Examination

Artificial Intelligence Methods (G5BAIM) - Examination Question 1 a) According to John Koza there are five stages when planning to solve a problem using a genetic program. What are they? Give a short description of each. (b) How could you cope with division

More information

Reinforcement Learning: the basics

Reinforcement Learning: the basics Reinforcement Learning: the basics Olivier Sigaud Université Pierre et Marie Curie, PARIS 6 http://people.isir.upmc.fr/sigaud August 6, 2012 1 / 46 Introduction Action selection/planning Learning by trial-and-error

More information

Non-Convex Optimization. CS6787 Lecture 7 Fall 2017

Non-Convex Optimization. CS6787 Lecture 7 Fall 2017 Non-Convex Optimization CS6787 Lecture 7 Fall 2017 First some words about grading I sent out a bunch of grades on the course management system Everyone should have all their grades in Not including paper

More information

CS221 Practice Midterm

CS221 Practice Midterm CS221 Practice Midterm Autumn 2012 1 ther Midterms The following pages are excerpts from similar classes midterms. The content is similar to what we ve been covering this quarter, so that it should be

More information

NEUROEVOLUTION. Contents. Evolutionary Computation. Neuroevolution. Types of neuro-evolution algorithms

NEUROEVOLUTION. Contents. Evolutionary Computation. Neuroevolution. Types of neuro-evolution algorithms Contents Evolutionary Computation overview NEUROEVOLUTION Presenter: Vlad Chiriacescu Neuroevolution overview Issues in standard Evolutionary Computation NEAT method Complexification in competitive coevolution

More information

CS 188 Introduction to Fall 2007 Artificial Intelligence Midterm

CS 188 Introduction to Fall 2007 Artificial Intelligence Midterm NAME: SID#: Login: Sec: 1 CS 188 Introduction to Fall 2007 Artificial Intelligence Midterm You have 80 minutes. The exam is closed book, closed notes except a one-page crib sheet, basic calculators only.

More information

AND AND AND AND NOT NOT

AND AND AND AND NOT NOT Genetic Programming Using a Minimum Description Length Principle Hitoshi IBA 1 Hugo de GARIS 2 Taisuke SATO 1 1)Machine Inference Section, Electrotechnical Laboratory 1-1-4 Umezono, Tsukuba-city, Ibaraki,

More information

ECLT 5810 Classification Neural Networks. Reference: Data Mining: Concepts and Techniques By J. Hand, M. Kamber, and J. Pei, Morgan Kaufmann

ECLT 5810 Classification Neural Networks. Reference: Data Mining: Concepts and Techniques By J. Hand, M. Kamber, and J. Pei, Morgan Kaufmann ECLT 5810 Classification Neural Networks Reference: Data Mining: Concepts and Techniques By J. Hand, M. Kamber, and J. Pei, Morgan Kaufmann Neural Networks A neural network is a set of connected input/output

More information

Local Search & Optimization

Local Search & Optimization Local Search & Optimization CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2017 Soleymani Artificial Intelligence: A Modern Approach, 3 rd Edition, Chapter 4 Outline

More information

Please bring the task to your first physics lesson and hand it to the teacher.

Please bring the task to your first physics lesson and hand it to the teacher. Pre-enrolment task for 2014 entry Physics Why do I need to complete a pre-enrolment task? This bridging pack serves a number of purposes. It gives you practice in some of the important skills you will

More information

Examiners Report/ Principal Examiner Feedback. Summer GCE Core Mathematics C3 (6665) Paper 01

Examiners Report/ Principal Examiner Feedback. Summer GCE Core Mathematics C3 (6665) Paper 01 Examiners Report/ Principal Examiner Feedback Summer 2013 GCE Core Mathematics C3 (6665) Paper 01 Edexcel and BTEC Qualifications Edexcel and BTEC qualifications come from Pearson, the UK s largest awarding

More information

Designing Information Devices and Systems I Fall 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way

Designing Information Devices and Systems I Fall 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way EECS 16A Designing Information Devices and Systems I Fall 018 Lecture Notes Note 1 1.1 Introduction to Linear Algebra the EECS Way In this note, we will teach the basics of linear algebra and relate it

More information

PRIME GENERATING LUCAS SEQUENCES

PRIME GENERATING LUCAS SEQUENCES PRIME GENERATING LUCAS SEQUENCES PAUL LIU & RON ESTRIN Science One Program The University of British Columbia Vancouver, Canada April 011 1 PRIME GENERATING LUCAS SEQUENCES Abstract. The distribution of

More information

Lecture 2. Judging the Performance of Classifiers. Nitin R. Patel

Lecture 2. Judging the Performance of Classifiers. Nitin R. Patel Lecture 2 Judging the Performance of Classifiers Nitin R. Patel 1 In this note we will examine the question of how to udge the usefulness of a classifier and how to compare different classifiers. Not only

More information

Planning With Information States: A Survey Term Project for cs397sml Spring 2002

Planning With Information States: A Survey Term Project for cs397sml Spring 2002 Planning With Information States: A Survey Term Project for cs397sml Spring 2002 Jason O Kane jokane@uiuc.edu April 18, 2003 1 Introduction Classical planning generally depends on the assumption that the

More information

Extracting Provably Correct Rules from Artificial Neural Networks

Extracting Provably Correct Rules from Artificial Neural Networks Extracting Provably Correct Rules from Artificial Neural Networks Sebastian B. Thrun University of Bonn Dept. of Computer Science III Römerstr. 64, D-53 Bonn, Germany E-mail: thrun@cs.uni-bonn.de thrun@cmu.edu

More information

Chapter 1 Introduction

Chapter 1 Introduction Chapter 1 Introduction 1.1 Introduction to Chapter This chapter starts by describing the problems addressed by the project. The aims and objectives of the research are outlined and novel ideas discovered

More information

Discrete Tranformation of Output in Cellular Automata

Discrete Tranformation of Output in Cellular Automata Discrete Tranformation of Output in Cellular Automata Aleksander Lunøe Waage Master of Science in Computer Science Submission date: July 2012 Supervisor: Gunnar Tufte, IDI Norwegian University of Science

More information

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling 10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel

More information

Scout and NegaScout. Tsan-sheng Hsu.

Scout and NegaScout. Tsan-sheng Hsu. Scout and NegaScout Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Abstract It looks like alpha-beta pruning is the best we can do for an exact generic searching procedure.

More information

Adversarial Search & Logic and Reasoning

Adversarial Search & Logic and Reasoning CSEP 573 Adversarial Search & Logic and Reasoning CSE AI Faculty Recall from Last Time: Adversarial Games as Search Convention: first player is called MAX, 2nd player is called MIN MAX moves first and

More information

Looking Under the EA Hood with Price s Equation

Looking Under the EA Hood with Price s Equation Looking Under the EA Hood with Price s Equation Jeffrey K. Bassett 1, Mitchell A. Potter 2, and Kenneth A. De Jong 1 1 George Mason University, Fairfax, VA 22030 {jbassett, kdejong}@cs.gmu.edu 2 Naval

More information

Hierarchical Evolution of Neural Networks. The University of Texas at Austin. Austin, TX Technical Report AI January 1996.

Hierarchical Evolution of Neural Networks. The University of Texas at Austin. Austin, TX Technical Report AI January 1996. Hierarchical Evolution of Neural Networks David E. Moriarty and Risto Miikkulainen Department of Computer Sciences The University of Texas at Austin Austin, TX 78712 moriarty,risto@cs.utexas.edu Technical

More information

Using NEAT to Stabilize an Inverted Pendulum

Using NEAT to Stabilize an Inverted Pendulum Using NEAT to Stabilize an Inverted Pendulum Awjin Ahn and Caleb Cochrane May 9, 2014 Abstract The inverted pendulum balancing problem is a classic benchmark problem on which many types of control implementations

More information

The exam is closed book, closed calculator, and closed notes except your one-page crib sheet.

The exam is closed book, closed calculator, and closed notes except your one-page crib sheet. CS 188 Spring 2017 Introduction to Artificial Intelligence Midterm V2 You have approximately 80 minutes. The exam is closed book, closed calculator, and closed notes except your one-page crib sheet. Mark

More information

Reading Selection: How do species change over time?

Reading Selection: How do species change over time? Reading Selection: How do species change over time? 1. Who first studied how species change over time? Jean-Baptiste Lamarck (1744-1829) and Charles Darwin (1809-1882) both had ideas about how life on

More information

How do species change over time?

How do species change over time? Who first studied how species change over time? How do species change over time? Jean-Baptiste Lamarck (1744-1829) and Charles Darwin (1809-1882) both had ideas about how life on earth changed over time.

More information

On the errors introduced by the naive Bayes independence assumption

On the errors introduced by the naive Bayes independence assumption On the errors introduced by the naive Bayes independence assumption Author Matthijs de Wachter 3671100 Utrecht University Master Thesis Artificial Intelligence Supervisor Dr. Silja Renooij Department of

More information

Artificial Intelligence (AI) Common AI Methods. Training. Signals to Perceptrons. Artificial Neural Networks (ANN) Artificial Intelligence

Artificial Intelligence (AI) Common AI Methods. Training. Signals to Perceptrons. Artificial Neural Networks (ANN) Artificial Intelligence Artificial Intelligence (AI) Artificial Intelligence AI is an attempt to reproduce intelligent reasoning using machines * * H. M. Cartwright, Applications of Artificial Intelligence in Chemistry, 1993,

More information

A fast algorithm to generate necklaces with xed content

A fast algorithm to generate necklaces with xed content Theoretical Computer Science 301 (003) 477 489 www.elsevier.com/locate/tcs Note A fast algorithm to generate necklaces with xed content Joe Sawada 1 Department of Computer Science, University of Toronto,

More information

Heuristic Search Algorithms

Heuristic Search Algorithms CHAPTER 4 Heuristic Search Algorithms 59 4.1 HEURISTIC SEARCH AND SSP MDPS The methods we explored in the previous chapter have a serious practical drawback the amount of memory they require is proportional

More information

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 Star Joins A common structure for data mining of commercial data is the star join. For example, a chain store like Walmart keeps a fact table whose tuples each

More information

Lecture 6: Backpropagation

Lecture 6: Backpropagation Lecture 6: Backpropagation Roger Grosse 1 Introduction So far, we ve seen how to train shallow models, where the predictions are computed as a linear function of the inputs. We ve also observed that deeper

More information

A Simple Model for Sequences of Relational State Descriptions

A Simple Model for Sequences of Relational State Descriptions A Simple Model for Sequences of Relational State Descriptions Ingo Thon, Niels Landwehr, and Luc De Raedt Department of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, 3001 Heverlee,

More information

Evolutionary Design I

Evolutionary Design I Evolutionary Design I Jason Noble jasonn@comp.leeds.ac.uk Biosystems group, School of Computing Evolutionary Design I p.1/29 This lecture Harnessing evolution in a computer program How to construct a genetic

More information

T o p 5 r o u t e s t o M E D I C I N E w i t h a d e g r e e i n B I O M E D I C A L S C I E N C E S

T o p 5 r o u t e s t o M E D I C I N E w i t h a d e g r e e i n B I O M E D I C A L S C I E N C E S T o p 5 r o u t e s t o M E D I C I N E w i t h a d e g r e e i n B I O M E D I C A L S C I E N C E S B R O U G H T T O Y O U B Y E U R O P A M E D I C I N E ( V E R Y ) Q U I C K I N T R O My name's Habib

More information

ARTIFICIAL NEURAL NETWORK PART I HANIEH BORHANAZAD

ARTIFICIAL NEURAL NETWORK PART I HANIEH BORHANAZAD ARTIFICIAL NEURAL NETWORK PART I HANIEH BORHANAZAD WHAT IS A NEURAL NETWORK? The simplest definition of a neural network, more properly referred to as an 'artificial' neural network (ANN), is provided

More information

Algebra Exam. Solutions and Grading Guide

Algebra Exam. Solutions and Grading Guide Algebra Exam Solutions and Grading Guide You should use this grading guide to carefully grade your own exam, trying to be as objective as possible about what score the TAs would give your responses. Full

More information

Deep Reinforcement Learning. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 19, 2017

Deep Reinforcement Learning. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 19, 2017 Deep Reinforcement Learning STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 19, 2017 Outline Introduction to Reinforcement Learning AlphaGo (Deep RL for Computer Go)

More information

1/sqrt(B) convergence 1/B convergence B

1/sqrt(B) convergence 1/B convergence B The Error Coding Method and PICTs Gareth James and Trevor Hastie Department of Statistics, Stanford University March 29, 1998 Abstract A new family of plug-in classication techniques has recently been

More information

Ajit Narayanan and Mark Moore. University of Exeter

Ajit Narayanan and Mark Moore. University of Exeter Quantum-inspired genetic algorithms Ajit Narayanan and Mark Moore Department of Computer Science University of Exeter Exeter, United Kingdom, EX4 4PT ajit@dcs.exeter.ac.uk Abstract A novel evolutionary

More information

Evolutionary computation

Evolutionary computation Evolutionary computation Andrea Roli andrea.roli@unibo.it Dept. of Computer Science and Engineering (DISI) Campus of Cesena Alma Mater Studiorum Università di Bologna Outline 1 Basic principles 2 Genetic

More information

Exam EDAF May 2011, , Vic1. Thore Husfeldt

Exam EDAF May 2011, , Vic1. Thore Husfeldt Exam EDAF05 25 May 2011, 8.00 13.00, Vic1 Thore Husfeldt Instructions What to bring. You can bring any written aid you want. This includes the course book and a dictionary. In fact, these two things are

More information

Chapter 7. Evolutionary Game Theory

Chapter 7. Evolutionary Game Theory From the book Networks, Crowds, and Markets: Reasoning about a Highly Connected World. By David Easley and Jon Kleinberg. Cambridge University Press, 2010. Complete preprint on-line at http://www.cs.cornell.edu/home/kleinber/networks-book/

More information

Forecasting & Futurism

Forecasting & Futurism Article from: Forecasting & Futurism December 2013 Issue 8 A NEAT Approach to Neural Network Structure By Jeff Heaton Jeff Heaton Neural networks are a mainstay of artificial intelligence. These machine-learning

More information

Uncertainty. Michael Peters December 27, 2013

Uncertainty. Michael Peters December 27, 2013 Uncertainty Michael Peters December 27, 20 Lotteries In many problems in economics, people are forced to make decisions without knowing exactly what the consequences will be. For example, when you buy

More information

CSC 4510 Machine Learning

CSC 4510 Machine Learning 10: Gene(c Algorithms CSC 4510 Machine Learning Dr. Mary Angela Papalaskari Department of CompuBng Sciences Villanova University Course website: www.csc.villanova.edu/~map/4510/ Slides of this presenta(on

More information

CPSC 320 Sample Solution, Reductions and Resident Matching: A Residentectomy

CPSC 320 Sample Solution, Reductions and Resident Matching: A Residentectomy CPSC 320 Sample Solution, Reductions and Resident Matching: A Residentectomy August 25, 2017 A group of residents each needs a residency in some hospital. A group of hospitals each need some number (one

More information

A Cooperation Strategy for Decision-making Agents

A Cooperation Strategy for Decision-making Agents A Cooperation Strategy for Decision-making Agents Eduardo Camponogara Institute for Complex Engineered Systems Carnegie Mellon University Pittsburgh, PA 523-389 camponog+@cs.cmu.edu Abstract The style

More information