CMSC 474, Game Theory
4b. Game-Tree Search
Dana Nau, University of Maryland
Nau: Game Theory 1
Finite perfect-information zero-sum games

• Finite:
  – finitely many agents, actions, states, and histories
• Perfect information:
  – every agent knows all of the players' utility functions, all of the players' actions and what they do, and the history and current state
  – no simultaneous actions; agents move one at a time
• Constant-sum (or zero-sum):
  – there is a constant k such that regardless of how the game ends, Σ_{i=1..n} u_i = k
  – for every such game, there's an equivalent game in which k = 0
Examples

• Deterministic:
  – chess, checkers
  – go, gomoku
  – reversi (othello)
  – tic-tac-toe, qubic, connect-four
  – mancala (awari, kalah)
  – nine men's morris (merelles, morels, mill)
• Stochastic:
  – backgammon, monopoly, yahtzee, parcheesi, roulette, craps
• For now, we'll consider just the deterministic games
Outline

• A brief history of work on this topic
• Restatement of the Minimax Theorem
• Game trees
• The minimax algorithm
• α-β pruning
• Resource limits, approximate evaluation

Most of this isn't in the game-theory book. For further information, look at one of the following:
  – the private materials page
  – Russell & Norvig's Artificial Intelligence: A Modern Approach (there are 3 editions of this book; in the 2nd edition, it's Chapter 6)
Brief History

1846  (Babbage) designed a machine to play tic-tac-toe
1928  (von Neumann) minimax theorem
1944  (von Neumann & Morgenstern) backward induction
1950  (Shannon) minimax algorithm (finite-horizon search)
1951  (Turing) program (on paper) for playing chess
1952–57  (Samuel) checkers program capable of beating its creator
1956  (McCarthy) pruning to allow deeper minimax search
1957  (Bernstein) first complete chess program, on an IBM 704 vacuum-tube computer; could examine about 350 positions/minute
1967  (Greenblatt) first program to compete in human chess tournaments: 3 wins, 3 draws, 12 losses
1992  (Schaeffer) Chinook won the 1992 US Open checkers tournament
1994  (Schaeffer) Chinook became world checkers champion; Tinsley (human champion) withdrew for health reasons
1997  (Hsu et al.) Deep Blue won a 6-game match vs. world chess champion Kasparov
2007  (Schaeffer et al.) Checkers solved: with perfect play, it's a draw (10^14 calculations over 18 years)
Restatement of the Minimax Theorem

Suppose agents 1 and 2 use strategies s and t in a 2-person game G.
• Let u(s,t) = u_1(s,t) = −u_2(s,t)
• Call the agents Max and Min (they want to maximize and minimize u)

Minimax Theorem: If G is a two-person finite zero-sum game, then there are strategies s* and t*, and a number v called G's minimax value, such that
• if Min uses t*, Max's expected utility is ≤ v, i.e., max_s u(s,t*) = v
• if Max uses s*, Max's expected utility is ≥ v, i.e., min_t u(s*,t) = v

Corollary 1:
• u(s*,t*) = v
• (s*,t*) is a Nash equilibrium; s* and t* are also called perfect play for Max and Min
• s* (or t*) is Max's (or Min's) minimax strategy and maximin strategy

Corollary 2: If G is a perfect-information game, then there are subgame-perfect pure strategies s* and t* that satisfy the theorem.
No Non-Credible Threats

Recall this example from Chapter 4: if the agents can announce their strategies beforehand, agent 1 might want to announce (B,H).
• This is a threat that if 2 chooses F, agent 1 will choose a move that hurts both agents
• H isn't a credible threat: if agent 2 plays F anyway, it wouldn't be rational for agent 1 to play H

[Figure: extensive-form game in which agent 1 chooses A or B, agent 2 then chooses C/D or E/F, and agent 1 may then choose G or H; the leaf payoffs are (3,8), (8,3), (5,5), (2,10), and (1,0)]

In zero-sum games, non-credible threats cannot occur:
• any move M that hurts agent 2 will help agent 1
• if agent 1 has an opportunity to play M, it would be rational to play it
Game Tree Terminology

• Root node: where the game starts
• Max (or Min) node: a node where it's Max's (or Min's) move
  – usually we draw Max nodes as squares, Min nodes as circles
• A node's children: the possible next nodes
  – children(h) = {σ(h,a) : a ∈ χ(h)}
• Terminal node: a node where the game ends

[Figure: a game tree with a Max root, alternating levels of Min nodes and Max nodes, and terminal nodes at the bottom]
Number of Nodes

• Let b = maximum branching factor
• Let D = height of the tree (maximum depth of any terminal node)
• If D is even and the root is a Max node, then
  – the number of Max nodes is 1 + b^2 + b^4 + … + b^(D−2) = O(b^D)
  – the number of Min nodes is b + b^3 + b^5 + … + b^(D−1) = O(b^D)
• What if D is odd?
Number of Pure Strategies

• Pure strategy for Max: at every Max node, choose one branch
• O(b^D) Max nodes, b choices at each of them ⇒ O(b^(b^D)) pure strategies
  – in the preceding tree, how many pure strategies for Max?
• Many of them are equivalent (they differ only at unreachable nodes)
  – how many distinct pure strategies are there?
• What about Min?
Number of Distinct Pure Strategies

• At every reachable Max node, choose one branch
  – number of reachable Max nodes ≤ b^0 + b^1 + b^2 + … + b^((D−2)/2) = O(b^(D/2))
• b choices at each of them ⇒ O(b^(b^(D/2))) distinct pure strategies
• Again, what about Min?
Finding the Minimax Strategy

• Brute-force way to find the minimax strategy for Max:
  – construct the sets S and T of all distinct strategies for Max and Min, then choose
      s* = arg max_{s∈S} min_{t∈T} u(s,t)
• Complexity analysis:
  – need to construct and store O(b^(b^(D/2))) distinct strategies
  – each distinct strategy has size O(b^(D/2))
  – thus space complexity is O(b^(D/2) · b^(b^(D/2)))
  – time complexity is slightly worse
• But there's an easier way to find the minimax strategy
• Notation: v(h) = minimax value of the subgame at node h; if h is terminal, then v(h) = u(h)
Minimax

Backward-induction algorithm (Chapter 4) for 2-player zero-sum games.
• Returns v(h); it can easily be modified to return both v(h) and a strategy

function Minimax(h)
    if h ∈ Z then return v(h)
    else if ρ(h) = Max then return max{Minimax(σ(h,a)) : a ∈ χ(h)}
    else return min{Minimax(σ(h,a)) : a ∈ χ(h)}

[Figure: a Max root with value 3 and moves a1, a2, a3 leading to Min nodes with values 3, 2, 2; the Min nodes' leaf values are (3, 12, 8), (2, 4, 6), and (14, 5, 2)]
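The slide's pseudocode can be sketched in Python on the example tree. The tree encoding (nested lists, leaves as integers) is an assumption for illustration, not part of the original pseudocode.

```python
def minimax(node, is_max):
    """Return the minimax value of `node` (an int leaf or a list of children)."""
    if isinstance(node, int):          # terminal node: return its utility
        return node
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)

# The tree from the figure: a Max root over three Min nodes
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree, True))   # → 3, the root's minimax value
```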
Complexity Analysis

• Space complexity = (maximum path length) × (space to store each level of the path) = O(bD)
• Time complexity = size of the game tree = O(b^D)
  – where b = branching factor, D = height of the game tree
• This is a lot better than O(b^(D/2) · b^(b^(D/2)))
• But it still isn't good enough for games like chess
  – b ≈ 35, D ≈ 100 for reasonable chess games
  – b^D ≈ 35^100 ≈ 10^154 nodes
• Number of particles in the universe ≈ 10^87
  – 10^154 nodes is about 10^67 times the number of particles in the universe; no way to examine every node!
Limited-Depth Minimax (Wiener, 1948)

function LD-Minimax(h, d)
    if h ∈ Z then return v(h)
    else if d = 0 then return e(h)
    else if ρ(h) = Max then return max_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)
    else return min_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)

• The minimax algorithm with an upper bound d on the search depth
  – e(h) is a static evaluation function: it returns an estimate of v(h)
• Returns an approximation of v(h)
  – if d ≥ height of node h, returns v(h) exactly
• Space complexity = O(min(bD, bd)), where D = height of node h
• Time complexity = O(min(b^D, b^d))
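A sketch of LD-Minimax in Python, using the same nested-list tree encoding as before. The toy static evaluator (which just scores an internal node by its first leaf) is an assumption for illustration; a real evaluator would use domain features.

```python
def ld_minimax(node, depth, is_max, e):
    """Depth-limited minimax: exact at terminals, e(node) at the depth cutoff."""
    if isinstance(node, int):             # terminal: exact utility
        return node
    if depth == 0:                        # depth cutoff: estimate with e
        return e(node)
    vals = [ld_minimax(c, depth - 1, not is_max, e) for c in node]
    return max(vals) if is_max else min(vals)

def first_leaf(node):                     # toy static evaluator (an assumption)
    while not isinstance(node, int):
        node = node[0]
    return node

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(ld_minimax(tree, 1, True, first_leaf))  # → 14: the depth-1 estimate
print(ld_minimax(tree, 2, True, first_leaf))  # → 3: d ≥ height, exact value
```

With d = 1 the estimate (14) differs from the exact value (3), illustrating that the returned value is only an approximation until d reaches the node's height.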
Evaluation Functions

• e(h) is often a weighted sum of features:
  – e(h) = w_1 f_1(h) + w_2 f_2(h) + … + w_n f_n(h)
• E.g., in chess:
  – 1·(white pawns − black pawns) + 3·(white knights − black knights) + …
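A minimal sketch of such a weighted sum for chess material. The particular piece counts below are made-up numbers for illustration; the weights are the conventional pawn/knight/bishop/rook/queen values.

```python
def material_eval(counts, weights):
    """e(h) = sum_i w_i * f_i(h), where f_i = (white count - black count)."""
    return sum(w * (white - black)
               for w, (white, black) in zip(weights, counts))

# features: (white, black) counts of pawns, knights, bishops, rooks, queens
counts = [(8, 7), (2, 2), (2, 1), (2, 2), (1, 1)]
weights = [1, 3, 3, 5, 9]
print(material_eval(counts, weights))   # → 4: one extra pawn + one extra bishop
```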
How to Use Limited-Depth Minimax

function LD-Choice(h, d)
    if ρ(h) = Max then return arg max_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)
    else return arg min_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)

function LD-Minimax(h, d)
    if h ∈ Z then return v(h)
    else if d = 0 then return e(h)
    else if ρ(h) = Max then return max_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)
    else return min_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)

• LD-Choice returns only a single action; call it at every move
  – why?
Exact Values for e Don't Matter

[Figure: two Max-over-Min trees with the same structure; the first has leaf values 1, 2, 2, 4 and the second 1, 20, 20, 400, and Max's choice is the same in both]

• Behavior is preserved under any monotonic transformation of e
  – only the order matters
Multiplayer Games

• The Max-n algorithm: backward induction for n players, with cutoff depth d and evaluation function e
• A node's evaluation is a payoff profile v; v[i] is player i's payoff
• Approximates the SPE payoff profile
  – exact if d ≥ height of h

function Maxn(h, d)
    if h ∈ Z then return u(h)
    if d = 0 then return e(h)
    V = {Maxn(σ(h,a), d−1) : a ∈ χ(h)}
    return arg max_{v∈V} v[ρ(h)]

function Maxn-Choice(h, d)
    return arg max_{a∈χ(h)} Maxn(σ(h,a), d−1)[ρ(h)]

[Figure: a 3-player tree in which player 1 moves to q, r, or s, player 2 moves next, and the leaves t–y carry the payoff profiles (2,4,4), (6,3,1), (5,2,2), (3,5,2), (4,5,1), (1,4,5); backing up gives q = (2,4,4), r = (3,5,2), s = (4,5,1), so player 1 moves to s]
Multiplayer Games

• The Paranoid algorithm: compute an approximate maxmin value for player i, with cutoff depth d and evaluation function e
  – exact maxmin value if d ≥ h's height
• Compute the payoff for i, assuming i wants to maximize it and the others want to minimize it

function Paranoid(i, h, d)
    if h ∈ Z then return u(h)[i]
    if d = 0 then return e(h)[i]
    if ρ(h) = i then return max_{a∈χ(h)} Paranoid(i, σ(h,a), d−1)
    else return min_{a∈χ(h)} Paranoid(i, σ(h,a), d−1)

function Paranoid-Choice(i, h, d)
    if ρ(h) = i then return arg max_{a∈χ(h)} Paranoid(i, σ(h,a), d−1)
    else return error

[Figure: the same 3-player tree; for player 1 the backed-up values are q = 2, r = 3, s = 1, so player 1 moves to r with value 3]
Discussion

• Neither the Max-n algorithm nor the Paranoid algorithm has been entirely successful
• This is partly due to dynamic relationships among the players:
  – human players may hold grudges
  – informal alliances may form and dissolve over time; e.g., the players who are behind may gang up on the player who's winning
• These relationships can greatly influence the players' strategies
  – Max-n and Paranoid don't model these relationships
• To play well in a multiplayer game, ways are needed to
  – decide when/whether to cooperate with others
  – deduce each player's attitude toward the other players
Pruning

• Let's go back to 2-player zero-sum games
  – Minimax and LD-Minimax both examine nodes that don't need to be examined

[Figure: Max root a with value 3; its first Min child b has leaves 3, 12, 8; at the second Min child, after the first leaf 2, the remaining leaves are marked X (they need not be examined)]
Pruning

• b is better for Max than f is
• If Max is rational, then Max will never choose f
• So don't examine any more nodes below f
  – they can't affect v(a)

[Figure: Max root a = 3; Min node b = 3 with leaves 3, 12, 8; Min node f's first leaf g = 2, so f ≤ 2 and its remaining leaves (marked X) are pruned]
Pruning

• We don't know yet whether h is better or worse than b

[Figure: the third Min node h's first leaf i = 14, so all we know is h ≤ 14]
Pruning

• We still don't know whether h is better or worse than b

[Figure: h's second leaf j = 5, so h ≤ 5, which is still above b's value of 3]
Pruning

• h is worse than b
  – no need to examine any more nodes below h
• v(a) = 3

[Figure: h's third leaf k = 2, so h ≤ 2 < 3; any remaining nodes below h are marked X, and v(a) = 3]
Alpha Cutoff

• Squares are Max nodes, circles are Min nodes
• Let α = max(a,b,c), and suppose d < α
• To reach s, the game must go through p, q, and r
• By moving elsewhere at one of those nodes, Max can get v ≥ α
• If the game ever reaches node s, then Min can achieve v ≤ d < what Max can get elsewhere
  – Max will never let that happen
  – we don't need to know anything more about s
• What if d = α?

[Figure: a path of Max nodes p, q, r with alternative subtrees of values a, b, c, leading to a Min node s with v ≤ d]
Beta Cutoff

• Squares are Max nodes, circles are Min nodes
• Let β = min(a,b,c), and suppose d > β
• To reach s, the game must go through p, q, and r
• By moving elsewhere at one of those nodes, Min can achieve v ≤ β
• If the game ever reaches node s, then Max can achieve v ≥ d > what Min can get elsewhere
  – Min will never let that happen
  – we don't need to know anything more about s
• What if d = β?

[Figure: a path of Min nodes p, q, r with alternative subtrees of values a, b, c, leading to a Max node s with v ≥ d]
Alpha-Beta Pruning

function Alpha-Beta(h, d, α, β)
    if h ∈ Z then return u(h)
    else if d = 0 then return e(h)
    else if ρ(h) = Max then
        v ← −∞
        for every a ∈ χ(h) do
            v ← max(v, Alpha-Beta(σ(h,a), d−1, α, β))
            if v ≥ β then return v
            else α ← max(α, v)
        return v
    else
        v ← +∞
        for every a ∈ χ(h) do
            v ← min(v, Alpha-Beta(σ(h,a), d−1, α, β))
            if v ≤ α then return v
            else β ← min(β, v)
        return v

[Figure: an example tree with root a and nodes b–m, used on the following slides to trace the algorithm]
Alpha-Beta Pruning (worked example)

[Figures: a step-by-step trace of the Alpha-Beta pseudocode on the example tree. Node b's subtree returns 7, so α = 7 at the root a. In c's subtree, node e returns 5 ≤ α, causing an alpha cutoff (the rest of e's children are pruned). Node i's children evaluate to 0 and 8, so i returns 8 and β = 8. Below m, a child evaluates to 9 ≥ β, causing a beta cutoff. Node d's value becomes 8, and v(a) = 8.]
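The Alpha-Beta pseudocode translates directly to Python. The nested-list tree encoding and the leaf counter (which shows that pruning visits fewer leaves than plain minimax) are additions for illustration, not part of the original pseudocode.

```python
def alpha_beta(node, is_max, alpha=float("-inf"), beta=float("inf"),
               visited=None):
    """Alpha-Beta search over a nested-list tree (leaves are ints)."""
    if isinstance(node, int):                 # terminal: exact utility
        if visited is not None:
            visited.append(node)
        return node
    v = float("-inf") if is_max else float("inf")
    for child in node:
        w = alpha_beta(child, not is_max, alpha, beta, visited)
        if is_max:
            v = max(v, w)
            if v >= beta:                     # beta cutoff
                return v
            alpha = max(alpha, v)
        else:
            v = min(v, w)
            if v <= alpha:                    # alpha cutoff
                return v
            beta = min(beta, v)
    return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen = []
print(alpha_beta(tree, True, visited=seen))   # → 3, same value as minimax
print(len(seen))                              # → 7 of the 9 leaves examined
```

The cutoffs match the earlier pruning slides: after the second subtree's first leaf (2 ≤ α = 3), the leaves 4 and 6 are never visited.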
Properties of Alpha-Beta

• Alpha-beta pruning reasons about which computations are relevant
  – a form of metareasoning

Theorem:
• If the value returned by Minimax(h,d) is in [α,β], then Alpha-Beta(h,d,α,β) returns the same value
• If the value returned by Minimax(h,d) is ≤ α, then Alpha-Beta(h,d,α,β) returns a value ≤ α
• If the value returned by Minimax(h,d) is ≥ β, then Alpha-Beta(h,d,α,β) returns a value ≥ β

Corollary: Alpha-Beta(h, d, −∞, ∞) returns the same value as Minimax(h,d); Alpha-Beta(h, ∞, −∞, ∞) returns v(h).
Node Ordering

• Deeper lookahead (larger d) usually gives better decisions
  – there are pathological games where it doesn't, but those are rare
• Compared to LD-Minimax, how much farther ahead can Alpha-Beta look?
• Best case:
  – children of Max nodes are searched in greatest-value-first order, and children of Min nodes are searched in least-value-first order
  – Alpha-Beta's time complexity is O(b^(d/2)): this doubles the solvable depth
• Worst case:
  – children of Max nodes are searched in least-value-first order, and children of Min nodes are searched in greatest-value-first order
  – like LD-Minimax, Alpha-Beta visits all nodes of depth ≤ d: time complexity O(b^d)
Node Ordering

• How to get closer to the best case:
  – every time you expand a state s, apply e to its children
  – when it's Max's move, sort the children in order of largest e first
  – when it's Min's move, sort the children in order of smallest e first
• Suppose we have 100 seconds and explore 10^4 nodes/second
  – 10^6 nodes per move
  – put this into the form b^(d/2): 10^6 ≈ 35^(8/2)
  – best-case Alpha-Beta reaches depth 8: a pretty good chess program
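The sorting step can be sketched as follows, assuming a nested-list tree and a cheap static evaluator `e` (here a toy first-leaf evaluator, an assumption for illustration):

```python
def e(node):                       # toy static evaluator (an assumption)
    while isinstance(node, list):
        node = node[0]
    return node

def ordered_children(node, is_max):
    """Best-first order: largest e first at Max nodes, smallest first at Min."""
    return sorted(node, key=e, reverse=is_max)

tree = [[2, 4, 6], [3, 12, 8], [14, 5, 2]]
print([e(c) for c in ordered_children(tree, True)])   # → [14, 3, 2]
```

Searching the children in this order makes early cutoffs more likely, moving Alpha-Beta toward its O(b^(d/2)) best case.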
Other Modifications

• Several other modifications can improve the accuracy or the computation time:
  – quiescence search and biasing
  – transposition tables
  – thinking on the opponent's time
  – table lookup of "book moves"
  – iterative deepening
  – forward pruning
Quiescence Search and Biasing

• In a game like checkers or chess, the evaluation is based largely on material (pieces)
  – the evaluation is likely to be inaccurate if there are pending captures
• Search deeper, to reach a position where there aren't pending captures
  – evaluations will be more accurate there
• That creates another problem:
  – you're searching some paths to an even depth and others to an odd depth
  – paths that end just after your opponent's move will look worse than paths that end just after your move
• To compensate, add or subtract a number called the biasing factor
Transposition Tables

• Often there are multiple paths to the same state (i.e., the state space is really a graph rather than a tree)
• Idea:
  – when you compute s's minimax value, store it in a hash table
  – when you visit s again, retrieve its value rather than computing it again
• The hash table is called a transposition table
• Problem: there are too many states to store all of them
  – store some, rather than all
  – try to store the ones that you're most likely to need
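A minimal sketch of the idea: memoize minimax values by state. The game here (a trivial subtraction game: take 1 or 2 objects, taking the last one wins) is an assumption chosen because its states repeat heavily across paths.

```python
def minimax_tt(n, is_max, table):
    """Minimax value (+1 Max wins, -1 Min wins) with a transposition table."""
    key = (n, is_max)
    if key in table:                      # transposition: reuse the stored value
        return table[key]
    if n == 0:                            # the previous mover took the last object
        v = -1 if is_max else 1
    else:
        vals = [minimax_tt(n - k, not is_max, table)
                for k in (1, 2) if k <= n]
        v = max(vals) if is_max else min(vals)
    table[key] = v
    return v

table = {}
print(minimax_tt(21, True, table))    # → -1: 21 is a multiple of 3, a loss
print(len(table))                     # entries stored: far fewer than the
                                      # exponentially many unmemoized tree nodes
```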
Thinking on the Opponent's Time

• Suppose you're at node a
  – use alpha-beta to estimate v(b) and v(c)
  – c looks better, so move there
• Consider your estimates of v(f) and v(g)
  – they suggest your opponent is likely to move to f
  – while waiting for the opponent to move, start an alpha-beta search below f
• If the opponent does move to f, then you've already done a lot of the work of figuring out your next move

[Figure: node a with children b = 3 (whose children are d = 5 and e = 3) and c = 8 (whose children are f = 8 and g = 17)]
Book Moves

• In some games, experts have spent lots of time analyzing openings
  – sequences of moves one might play at the start of the game
  – best responses to those sequences
• Some of these are cataloged in standard reference works
  – e.g., the Encyclopaedia of Chess Openings
• Store these in a lookup table
  – respond almost immediately, as long as the opponent sticks to a sequence that's in the book
• A technique humans can use when playing against such a program:
  – deliberately make a move that isn't in the book
  – this may weaken the human's position, but the computer will (i) start taking longer and (ii) stop playing as well
Iterative Deepening

• How deeply should you search a game tree?
  – when you call Alpha-Beta(h, d, −∞, ∞), what should you use for d?
• Small d ⇒ you don't make as good a decision
• Large d ⇒ you run out of time without knowing what move to make
• Solution: iterative deepening

for d = 1 by 1 until you run out of time
    m ← the move returned by Alpha-Beta(h, d, −∞, ∞)

• Why this works:
  – time complexity is O(b^1 + b^2 + … + b^d) = O(b^d)
  – for large b, each iteration takes much more time than all of the previous iterations put together
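A sketch of the loop on top of depth-limited minimax. The nested-list tree, the toy evaluator, the time budget handling, and the depth cap (which stops the loop on this tiny tree) are all assumptions for illustration.

```python
import time

def depth_limited(node, depth, is_max):
    """Depth-limited minimax with a crude first-leaf evaluator (an assumption)."""
    if isinstance(node, int):
        return node
    if depth == 0:
        return node[0] if isinstance(node[0], int) else 0
    vals = [depth_limited(c, depth - 1, not is_max) for c in node]
    return max(vals) if is_max else min(vals)

def id_choice(node, budget_seconds):
    """Search depth 1, 2, 3, ...; keep the move from the deepest completed search."""
    deadline = time.monotonic() + budget_seconds
    best, depth = 0, 1
    while time.monotonic() < deadline and depth < 10:   # cap for this tiny tree
        vals = [depth_limited(c, depth - 1, False) for c in node]
        best = max(range(len(vals)), key=vals.__getitem__)
        depth += 1
    return best

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(id_choice(tree, 0.05))   # → 0: the deepest completed search picks move 0
```

Note that the depth-1 iteration would pick move 2 (its first leaf, 14, looks best), but the deeper iterations correct this; the move actually played always comes from the deepest search that finished.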
Forward Pruning

• Tapered search:
  – instead of looking at all of a node's children, just look at the n best ones (the n highest e(h) values)
  – decrease the value of n as you go deeper in the tree
  – drawback: may exclude an important move at a low level of the tree

[Figure: a node whose children have e-values 5, −3, 8, 17, and 2; with n = 3, the children valued −3 and 2 are ignored]

• Marginal forward pruning:
  – ignore every child h such that e(h) is smaller than the current best value from the nodes you've already visited
  – not reliable; should be avoided
Forward Pruning

• Until the mid-1970s, most chess programs tried to search the same way humans think
  – use extensive chess knowledge at each node to select a few plausible moves, and prune the others
  – serious tactical shortcomings
• Brute-force search programs did better, and dominated computer chess for about 20 years
• Early 1990s: development of some forward-pruning techniques that worked well
  – e.g., null-move pruning
• Today most chess programs use some kind of forward pruning
  – null-move pruning is one of the most popular
Game-Tree Search in Practice

• Checkers: in 1994, Chinook ended the 40-year reign of human world champion Marion Tinsley
  – Tinsley withdrew for health reasons, and died a few months later
• In 2007, checkers was solved: with perfect play, it's a draw
  – this took 10^14 calculations over 18 years; search-space size ≈ 5 × 10^20
• Chess: in 1997, Deep Blue defeated Garry Kasparov in a six-game match
  – Deep Blue searches 200 million positions per second
  – uses very sophisticated evaluation, and undisclosed methods for extending some lines of search up to 40 ply
• Othello: human champions don't compete against computers
  – the computers are too good
• Go: in 2006, good amateurs could beat the best go programs, even with a 9-stone handicap
  – go programs have improved a lot during the past 7 years
Rules of Go (Abbreviated)

• Go board: 19 × 19 locations (intersections on a grid)
• Black and White take turns; Black has the first move
• Each move consists of placing a stone at an unoccupied location
  – you just put stones on the board; you don't move them around
• Adjacent stones of the same color are called a string
  – liberties are the empty locations next to the string
  – a string is removed if its number of liberties is 0
• Score: territories (number of occupied or surrounded locations)
Why Go Is Difficult for Computers

• A game tree's size grows exponentially with both its depth and its branching factor
• The game tree for go:
  – branching factor ≈ 200
  – game length ≈ 250 to 300 moves
  – number of paths in the game tree ≈ 10^525 to 10^620
• Much too big for a normal game-tree search

[Figure: example trees with b = 2, 3, 4, illustrating growth with branching factor]
Why Go Is Difficult for Computers

• Writing an evaluation function for chess:
  – mainly piece count, plus some positional considerations: isolated/doubled pawns, rooks on open files (columns), pawns in the center of the board, etc.
  – as the game progresses, pieces get removed ⇒ evaluation gets easier
• Writing an evaluation function for go is much more complicated:
  – whether or not a group is alive
  – which stones can be connected to one another
  – the extent to which a position has influence, or can be attacked
• As the game progresses, stones get added ⇒ evaluation gets more complicated
Game-Tree Search in Go

• During the past five years, go programs have gotten much better
• Main reason: Monte Carlo roll-outs
• Basic idea:
  – generate a list of potential moves A = {a_1, …, a_n}
  – for each move a_i in A: starting at σ(h,a_i), generate a set G_i of random games in which the two players make all their moves at random
  – choose the move a_i that produced the best results
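The basic idea can be sketched on a much smaller game than go. The game here (the subtraction game: take 1 or 2 objects, taking the last one wins), the number of roll-outs, and the fixed random seed are assumptions for illustration.

```python
import random

def rollout(n, my_turn):
    """Play the rest of the game at random; return 1 if 'I' win, else 0."""
    while n > 0:
        n -= random.choice([1, 2] if n >= 2 else [1])
        my_turn = not my_turn
    return 0 if my_turn else 1    # whoever made the last move wins

def mc_choice(n, games_per_move=2000, rng_seed=0):
    """Choose the move whose random roll-outs produced the best results."""
    random.seed(rng_seed)
    moves = [k for k in (1, 2) if k <= n]
    scores = [sum(rollout(n - k, False) for _ in range(games_per_move))
              for k in moves]
    return moves[max(range(len(moves)), key=scores.__getitem__)]

print(mc_choice(4))   # → 1: taking 1 leaves the opponent at a losing 3
```

Even with purely random play below each candidate move, the averages separate the good move from the bad one: from 4, taking 1 wins about 75% of random continuations, while taking 2 wins only about 50%.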
Multi-Armed Bandit

• A statistical model of sequential experiments
  – the name comes from a traditional slot machine (a "one-armed bandit")
• Multiple actions:
  – each action provides a reward drawn from a probability distribution associated with that specific action
  – objective: maximize the expected utility of a sequence of actions
• Exploitation vs. exploration dilemma:
  – exploitation: choosing an action that you already know about, because you think it's likely to give you a high reward
  – exploration: choosing an action that you don't know much about, in hopes that it will produce a better reward than the actions you already know about
UCB (Upper Confidence Bound) Algorithm

Let
• x̄_i = average reward you've gotten from arm i
• t_i = number of times you've tried arm i
• t = Σ_i t_i

loop:
• if there are one or more arms that have not been played, then play one of them
• else play the arm i that has the highest value of x̄_i + √(2 (log t) / t_i)
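A sketch of the algorithm on Bernoulli arms. The arm payoff probabilities, number of plays, and fixed seed are assumptions for illustration.

```python
import math
import random

def ucb_play(arm_probs, plays, rng_seed=0):
    """Run UCB for `plays` steps on Bernoulli arms; return the play counts."""
    random.seed(rng_seed)
    n = len(arm_probs)
    counts = [0] * n          # t_i: times arm i was tried
    totals = [0.0] * n        # total reward obtained from arm i
    for t in range(1, plays + 1):
        if 0 in counts:       # play an arm that hasn't been played yet
            i = counts.index(0)
        else:                 # play the arm maximizing x_i + sqrt(2 log t / t_i)
            i = max(range(n),
                    key=lambda j: totals[j] / counts[j]
                    + math.sqrt(2 * math.log(t) / counts[j]))
        counts[i] += 1
        totals[i] += 1.0 if random.random() < arm_probs[i] else 0.0
    return counts

counts = ucb_play([0.2, 0.5, 0.8], plays=3000)
print(counts)   # the 0.8 arm gets the large majority of the plays
```

This shows the exploitation/exploration balance directly: the confidence term √(2 log t / t_i) grows for neglected arms, so every arm keeps being sampled occasionally, but play concentrates on the best one.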
UCT (UCB for Trees)

• An adaptation of UCB for game-tree search in go
  – first used in MoGo in 2006
  – MoGo won the go tournament at the 2007 Computer Olympiad, and several other computer go tournaments
  – MoGo won one (of 3) 9×9 blitz go games against a human professional
• UCT is now used in most computer go programs
• I've had some trouble finding a clear and unambiguous description of UCT
  – but I think the basic algorithm is as follows
Basic UCT Algorithm

Start with an empty tree T.     (T = {nodes we've evaluated})

loop:
• h ← the start of the game
• while h ∈ T do     (this loop ends when we reach a node we haven't evaluated)
    if h has one or more children that aren't already in T
        then h ← any child of h that isn't in T
        else h ← arg max_{k ∈ children(h)} x̄_k + √(2 (log t_h) / t_k)
• play a random game starting at h; v ← the game's payoff profile
• add h to T; t_h ← 1; x̄_h ← v[ρ(h)]
• for every g ∈ ancestors(h) do
    x̄_g ← (t_g x̄_g + v[ρ(g)]) / (t_g + 1)     (i.e., x̄_g = average payoff to ρ(g) of all games below g)
    t_g ← t_g + 1
Properties

• As the number of iterations goes to infinity, the evaluation x̄_h of the root node converges to its minimax value
• Question:
  – how can this be, since x̄_h is the average of the values we've gotten below h?
Properties

• As the number of iterations goes to infinity, the evaluation x̄_h of the root node converges to its minimax value
• Question:
  – how can this be, since x̄_h is the average of the values we've gotten below h?
• Look at how h is chosen:

Start with an empty tree T.
loop:
• h ← the start of the game
• while h ∈ T do
    if h has one or more children that aren't already in T
        then h ← any child of h that isn't in T
        else h ← arg max_{k ∈ children(h)} x̄_k + √(2 (log t_h) / t_k)
• …

(The UCB rule plays the best-looking child more and more often, so as t_h → ∞ the average x̄_h is dominated by the games through the best child.)
Some Improvements

• As written, UCT won't evaluate any nodes below h until all of h's siblings have been visited
  – not good; we want to explore some subtrees more deeply than others
• So, replace this:

    if h has one or more children that aren't already in T
        then h ← any child of h that isn't in T
        else h ← arg max_{k ∈ children(h)} x̄_k + √(2 (log t_h) / t_k)

  with this:

    h ← arg max_{k ∈ children(h)} y_k
    where y_k = x̄_k + √(2 (log t_h) / t_k) if k is already in T,
          y_k = some constant c otherwise
Some Improvements

• UCB assumes all the levers are independent
  – that's usually not true in practice
• Similarly, UCT assumes the values of sibling nodes are independent
• So, modify x̄_h using some information computed from x̄_h's siblings and grandparent
Two-Player Zero-Sum Stochastic Games

• Example: backgammon
• Two agents take turns
• At each turn, the set of available moves depends on the results of rolling the dice
  – each die specifies how far to move one of your pieces (except if you roll doubles)
  – if your piece would land on a location with 2 or more of the opponent's pieces, you can't move there
  – if your piece lands on a location with 1 of the opponent's pieces, the opponent's piece must start over

[Figure: a backgammon board, with points numbered 24 down to 13 across the top]
Backgammon Game Tree

• Players' moves have deterministic outcomes
• Dice rolls have stochastic outcomes

[Figure: a tree with alternating levels MAX, DICE, MIN, DICE, MAX, …, TERMINAL; each dice node branches on the 21 distinct rolls, with probability 1/36 for doubles (e.g., 1-1, 6-6) and 1/18 for the others (e.g., 1-2, 6-5); terminal payoffs appear at the bottom]
ExpectiMinimax

function ExpectiMinimax(h, d)
    if h ∈ Z then return v(h)
    else if d = 0 then return e(h)
    else if ρ(h) = Max then return max_{a∈χ(h)} ExpectiMinimax(σ(h,a), d−1)
    else if ρ(h) = Min then return min_{a∈χ(h)} ExpectiMinimax(σ(h,a), d−1)
    else return Σ_{a∈χ(h)} Pr[a|h] · ExpectiMinimax(σ(h,a), d−1)

• Returns the minimax expected utility
  – can be modified to return the actions
• Can also be modified to do α-β pruning
  – but it's more complicated and less effective than in deterministic games

[Figure: a small tree of Max, chance (probability 0.5), and Min nodes; the root's value is 3]
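A sketch of the recursion on a tree whose nodes are tagged "max", "min", or "chance" (chance children carry probabilities). The encoding and the particular small tree, which is in the spirit of the slide's figure rather than an exact copy of it, are assumptions for illustration.

```python
def expectiminimax(node):
    """Nodes: ("leaf", v), ("max"/"min", children), ("chance", [(p, child)...])."""
    kind = node[0]
    if kind == "leaf":
        return node[1]
    if kind == "max":
        return max(expectiminimax(c) for c in node[1])
    if kind == "min":
        return min(expectiminimax(c) for c in node[1])
    # chance node: probability-weighted average of the children
    return sum(p * expectiminimax(c) for p, c in node[1])

leaf = lambda v: ("leaf", v)
tree = ("max", [
    ("chance", [(0.5, ("min", [leaf(2), leaf(4)])),
                (0.5, ("min", [leaf(0), leaf(2)]))]),
    ("chance", [(0.5, ("min", [leaf(2), leaf(4)])),
                (0.5, ("min", [leaf(7), leaf(4)]))]),
])
print(expectiminimax(tree))   # → 3.0: max(0.5·2 + 0.5·0, 0.5·2 + 0.5·4)
```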
In Practice

• Dice rolls increase the branching factor
  – 21 possible rolls with 2 dice
  – given the dice roll, about 20 legal moves on average (for some dice rolls, it can be much higher)
  – at depth 4, the number of nodes is 20 × (21 × 20)^3 ≈ 1.5 × 10^9
  – as depth increases, the probability of reaching a given node shrinks ⇒ the value of lookahead is diminished
• α-β pruning is much less effective
• TD-Gammon uses depth-2 search plus a very good evaluation function
  – world-champion level
  – the evaluation function was created automatically, using a machine-learning technique called temporal-difference learning; hence the "TD" in TD-Gammon
Summary

• Two-player zero-sum perfect-information games:
  – the maximin and minimax strategies are the same
  – we only need to look at pure strategies
  – we can do a game-tree search: minimax values, alpha-beta pruning
• In sufficiently complicated games, perfection is unattainable:
  – limited search depth, static evaluation function
  – Monte Carlo roll-outs
• Game-tree search can be modified for games in which there are stochastic outcomes