CMSC 474, Game Theory
Slide 1: CMSC 474, Game Theory
4b. Game-Tree Search
Dana Nau, University of Maryland
Nau: Game Theory
Slide 2: Finite Perfect-Information Zero-Sum Games
- Finite: finitely many agents, actions, states, and histories
- Perfect information:
  - Every agent knows all of the players' utility functions, all of the players' actions and what they do, and the history and current state
  - No simultaneous actions; agents move one at a time
- Constant sum (or zero-sum):
  - There is a constant k such that regardless of how the game ends, Σ_{i=1,...,n} u_i = k
  - For every such game, there's an equivalent game in which k = 0
Slide 3: Examples
- Deterministic:
  - chess, checkers
  - go, gomoku
  - reversi (othello)
  - tic-tac-toe, qubic, connect-four
  - mancala (awari, kalah)
  - nine men's morris (merelles, morels, mill)
- Stochastic:
  - backgammon, monopoly, yahtzee, parcheesi, roulette, craps
- For now, we'll consider just the deterministic games
Slide 4: Outline
- A brief history of work on this topic
- Restatement of the Minimax Theorem
- Game trees
- The minimax algorithm
- α-β pruning
- Resource limits, approximate evaluation
- Most of this isn't in the game-theory book. For further information, look at one of the following:
  - The private materials page
  - Russell & Norvig's Artificial Intelligence: A Modern Approach (there are 3 editions of this book; in the 2nd edition, it's Chapter 6)
Slide 5: Brief History
- 1846 (Babbage): designed a machine to play tic-tac-toe
- 1928 (von Neumann): minimax theorem
- 1944 (von Neumann & Morgenstern): backward induction
- 1950 (Shannon): minimax algorithm (finite-horizon search)
- 1951 (Turing): program (on paper) for playing chess
- (Samuel): checkers program capable of beating its creator
- 1956 (McCarthy): pruning to allow deeper minimax search
- 1957 (Bernstein): first complete chess program, on an IBM 704 vacuum-tube computer; it could examine about 350 positions/minute
- 1967 (Greenblatt): first program to compete in human chess tournaments; 3 wins, 3 draws, 12 losses
- 1992 (Schaeffer): Chinook won the 1992 US Open checkers tournament
- 1994 (Schaeffer): Chinook became world checkers champion; Tinsley (the human champion) withdrew for health reasons
- 1997 (Hsu et al.): Deep Blue won a 6-game match vs. world chess champion Kasparov
- 2007 (Schaeffer et al.): checkers solved; with perfect play, it's a draw (the proof took calculations spanning over 18 years)
Slide 6: Restatement of the Minimax Theorem
- Suppose agents 1 and 2 use strategies s and t on a 2-person game G
  - Let u(s,t) = u₁(s,t) = −u₂(s,t)
  - Call the agents Max and Min (they want to maximize and minimize u)
- Minimax Theorem: If G is a two-person finite zero-sum game, then there are strategies s* and t*, and a number v called G's minimax value, such that
  - If Min uses t*, Max's expected utility is v, i.e., max_s u(s, t*) = v
  - If Max uses s*, Max's expected utility is v, i.e., min_t u(s*, t) = v
- Corollary 1:
  - u(s*, t*) = v
  - (s*, t*) is a Nash equilibrium; s* and t* are also called perfect play for Max and Min
  - s* (or t*) is Max's (or Min's) minimax strategy and maximin strategy
- Corollary 2: If G is a perfect-information game, then there are subgame-perfect pure strategies s* and t* that satisfy the theorem.
Slide 7: No Non-Credible Threats
- Recall this example from Chapter 4:
- If the agents can announce their strategies beforehand, agent 1 might want to announce (B,H)
  - A threat that if agent 2 chooses F, agent 1 will choose a move that hurts both agents
  - H isn't a credible threat: if agent 2 plays F anyway, it wouldn't be rational for agent 1 to play H
- [figure: extensive-form game tree with payoffs (3,8), (8,3), (5,5), (2,10), (1,0)]
- In zero-sum games, non-credible threats cannot occur
  - Any move M that hurts agent 2 will help agent 1
  - If agent 1 has an opportunity to play M, it would be rational to play it
Slide 8: Game Tree Terminology
- Root node: where the game starts
- Max (or Min) node: a node where it's Max's (or Min's) move
  - Max nodes are usually drawn as squares, Min nodes as circles
- A node's children: the possible next nodes
  - children(h) = {σ(h,a) : a ∈ χ(h)}
- Terminal node: a node where the game ends
- [figure: example game tree with alternating layers of Max nodes, Min nodes, and terminal nodes]
Slide 9: Number of Nodes
- Let b = maximum branching factor
- Let D = height of the tree (maximum depth of any terminal node)
- If D is even and the root is a Max node, then
  - the number of Max nodes is 1 + b² + b⁴ + ⋯ + b^(D−2) = O(b^D)
  - the number of Min nodes is b + b³ + ⋯ + b^(D−1) = O(b^D)
- What if D is odd?
Slide 10: Number of Pure Strategies
- Pure strategy for Max: at every Max node, choose one branch
- O(b^D) Max nodes with b choices at each of them ⇒ O(b^(b^D)) pure strategies
  - In the tree shown on the slide, how many pure strategies for Max?
- Many of them are equivalent (they differ only at unreachable nodes)
  - How many distinct pure strategies are there?
- What about Min?
Slide 11: Number of Distinct Pure Strategies
- At every reachable Max node, choose one branch
  - Number of reachable Max nodes ≤ b⁰ + b¹ + ⋯ + b^((D−2)/2) = O(b^(D/2))
- b choices at each of them ⇒ O(b^(b^(D/2))) distinct pure strategies
- Again, what about Min?
Slide 12: Finding the Minimax Strategy
- Brute-force way to find the minimax strategy for Max:
  - Construct the sets S and T of all distinct strategies for Max and Min, then choose s* = arg max_{s∈S} min_{t∈T} u(s,t)
- Complexity analysis:
  - Need to construct and store O(b^(b^(D/2))) distinct strategies
  - Each distinct strategy has size O(b^(D/2))
  - Thus space complexity is O(b^(D/2) · b^(b^(D/2)))
  - Time complexity is slightly worse
- But there's an easier way to find the minimax strategy
- Notation: v(h) = minimax value of the subgame at node h; if h is terminal, then v(h) = u(h)
Slide 13: Minimax
- The backward induction algorithm (Chapter 4) for 2-player zero-sum games
  - Returns v(h); it can easily be modified to return both v(h) and a corresponding action

    function Minimax(h)
      if h ∈ Z then return u(h)
      else if ρ(h) = Max then return max{Minimax(σ(h,a)) : a ∈ χ(h)}
      else return min{Minimax(σ(h,a)) : a ∈ χ(h)}

- [figure: two-ply example tree with Max to move at the root (actions a1, a2, a3) and Min at the second level; minimax value 3]
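A runnable Python sketch of the backward-induction procedure above. The tuple-based tree encoding (a number for a terminal node, a ("max"|"min", children) pair for an internal node) and the leaf values are my own illustration, not the slides' (Z, ρ, σ, χ) notation:

```python
# Minimax by backward induction on an explicit game tree.
# A node is either a number (terminal utility for Max) or a
# ("max"|"min", [children]) pair -- a toy encoding for illustration.

def minimax(h):
    if isinstance(h, (int, float)):          # terminal node: return its utility
        return h
    player, children = h
    values = [minimax(c) for c in children]
    return max(values) if player == "max" else min(values)

# A two-ply example: Max to move at the root, three Min nodes below.
tree = ("max", [
    ("min", [3, 12, 8]),
    ("min", [2, 4, 6]),
    ("min", [14, 5, 2]),
])
print(minimax(tree))   # minimax value of the root: 3
```

Each Min node takes the minimum of its leaves (3, 2, and 2 here), and the root takes the maximum of those, giving 3.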
Slide 14: Complexity Analysis
- Space complexity = (maximum path length) × (space needed to store the path) = O(bD)
- Time complexity = size of the game tree = O(b^D)
  - where b = branching factor and D = height of the game tree
- This is a lot better than O(b^(D/2) · b^(b^(D/2)))
- But it still isn't good enough for games like chess
  - b ≈ 35, D ≈ 100 for reasonable chess games
  - b^D is vastly more than the number of particles in the universe; there is no way to examine every node
Slide 15: Limited-Depth Minimax (Wiener, 1948)

    function LD-Minimax(h, d)
      if h ∈ Z then return v(h)
      else if d = 0 then return e(h)
      else if ρ(h) = Max then return max_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)
      else return min_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)

- The minimax algorithm with an upper bound d on the search depth
  - e(h) is a static evaluation function: it returns an estimate of v(h)
- Returns an approximation of v(h)
  - If d ≥ the height of node h, returns v(h) exactly
- Space complexity = O(min(bD, bd)), where D = height of node h
- Time complexity = O(min(b^D, b^d))
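A Python sketch of LD-Minimax under the same toy tree encoding as before (numbers are terminals, ("max"|"min", children) pairs are internal nodes; the evaluation functions passed in are placeholders):

```python
# Depth-limited minimax: cut off the search at depth d and fall back on a
# static evaluation function e(h). Tree encoding and e are illustrative.

def ld_minimax(h, d, e):
    if isinstance(h, (int, float)):       # terminal node: exact utility
        return h
    if d == 0:                            # depth cutoff: estimate with e
        return e(h)
    player, children = h
    vals = [ld_minimax(c, d - 1, e) for c in children]
    return max(vals) if player == "max" else min(vals)

tree = ("max", [("min", [3, 12, 8]), ("min", [2, 4, 6])])
print(ld_minimax(tree, 2, e=lambda h: 0))   # d >= height, so exact: 3
print(ld_minimax(tree, 0, e=lambda h: 7))   # d = 0, so just e(root): 7
```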
Slide 16: Evaluation Functions
- e(h) is often a weighted sum of features:
  - e(h) = w₁f₁(h) + w₂f₂(h) + ⋯ + wₙfₙ(h)
- E.g., in chess:
  - 1·(number of white pawns − number of black pawns) + 3·(white knights − black knights) + ⋯
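A minimal sketch of such a weighted-sum evaluation; the dictionary-based position encoding and the feature names are made up for illustration:

```python
# e(h) = w1*f1(h) + w2*f2(h) + ... as in the chess example on the slide.

def evaluate(position, weights, features):
    return sum(w * f(position) for w, f in zip(weights, features))

pos = {"wp": 6, "bp": 5, "wn": 2, "bn": 1}      # toy piece counts
features = [lambda p: p["wp"] - p["bp"],        # pawn difference
            lambda p: p["wn"] - p["bn"]]        # knight difference
print(evaluate(pos, [1, 3], features))          # 1*1 + 3*1 = 4
```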
Slide 17: How to Use Limited-Depth Minimax

    function LD-Choice(h, d)
      if ρ(h) = Max then return arg max_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)
      else return arg min_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)

    function LD-Minimax(h, d)
      if h ∈ Z then return v(h)
      else if d = 0 then return e(h)
      else if ρ(h) = Max then return max_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)
      else return min_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)

- LD-Choice only returns a single action; call it at every move
  - Why?
Slide 18: Exact Values for e Don't Matter
- [figure: the same two-ply tree evaluated with two different evaluation functions that rank the leaves identically; the same move is chosen in both cases]
- Behavior is preserved under any monotonic transformation of e
  - Only the order of the values matters
Slide 19: Multiplayer Games
- The Max^n algorithm: backward induction for n players, with cutoff depth d and evaluation function e
- Each node's evaluation is a payoff profile v; v[i] is player i's payoff
- Computes an approximate SPE payoff profile
  - Exact if d ≥ the height of h

    function Maxn(h, d)
      if h ∈ Z then return u(h)
      if d = 0 then return e(h)
      V = {Maxn(σ(h,a), d−1) : a ∈ χ(h)}
      return arg max_{v∈V} v[ρ(h)]

    function Maxn-Choice(h, d)
      return arg max_{a∈χ(h)} Maxn(σ(h,a), d−1)[ρ(h)]

- [figure: three-player example tree with internal nodes p, q, r, s and leaf payoff profiles (2,4,4), (6,3,1), (5,2,2), (3,5,2), (4,5,1), (1,4,5)]
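A Python sketch of Max^n (without the depth cutoff, for brevity). The encoding is my own: a terminal node is a tuple of numbers (the payoff profile), and an internal node is a (player-index, children) pair; the example payoffs echo some of the profiles in the slide's figure:

```python
# Max^n: each node's value is a payoff profile, and the player to move
# picks the child profile that maximizes their own component.

def maxn(h):
    if isinstance(h, tuple) and all(isinstance(x, (int, float)) for x in h):
        return h                         # terminal: payoff profile
    player, children = h
    profiles = [maxn(c) for c in children]
    return max(profiles, key=lambda v: v[player])

# Three-player toy example (player indices 0, 1, 2):
tree = (0, [                             # player 1 (index 0) to move
    (1, [(2, 4, 4), (6, 3, 1)]),         # player 2 picks what's best for them
    (1, [(3, 5, 2), (1, 4, 5)]),
])
print(maxn(tree))                        # (3, 5, 2)
```

Player 2 picks (2,4,4) over (6,3,1) and (3,5,2) over (1,4,5); player 1 then prefers (3,5,2) because 3 > 2.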
Slide 20: Multiplayer Games
- The Paranoid algorithm: computes an approximate maxmin value for player i, with cutoff depth d and evaluation function e
  - Exact maxmin value if d ≥ h's height
- Computes the payoff for i, assuming i wants to maximize it and the others want to minimize it

    function Paranoid(i, h, d)
      if h ∈ Z then return u(h)[i]
      if d = 0 then return e(h)[i]
      if ρ(h) = i then return max_{a∈χ(h)} Paranoid(i, σ(h,a), d−1)
      else return min_{a∈χ(h)} Paranoid(i, σ(h,a), d−1)

    function Paranoid-Choice(i, h, d)
      if ρ(h) = i then return arg max_{a∈χ(h)} Paranoid(i, σ(h,a), d−1)
      else return error

- [figure: the same three-player example tree, evaluated from player 1's paranoid point of view]
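A matching Python sketch of the paranoid evaluation, using the same invented tuple encoding as the Max^n sketch (terminal = payoff-profile tuple, internal = (player-index, children)):

```python
# Paranoid: player i maximizes their own payoff while all other players
# are treated as one minimizing opponent over that payoff.

def paranoid(i, h):
    if isinstance(h, tuple) and all(isinstance(x, (int, float)) for x in h):
        return h[i]                      # terminal: keep only i's payoff
    player, children = h
    vals = [paranoid(i, c) for c in children]
    return max(vals) if player == i else min(vals)

tree = (0, [(1, [(2, 4, 4), (6, 3, 1)]),
            (1, [(3, 5, 2), (1, 4, 5)])])
print(paranoid(0, tree))                 # player 1's maxmin value here: 2
```

Note how pessimistic this is: player 2's nodes minimize player 1's payoff (giving 2 and 1), so player 1's guaranteed value is 2, less than the 3 that Max^n predicts.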
Slide 21: Discussion
- Neither the Max^n algorithm nor the Paranoid algorithm has been entirely successful
- This is partly due to dynamic relationships among the players:
  - Human players may hold grudges
  - Informal alliances may form and dissolve over time; e.g., the players who are behind may gang up on the player who's winning
- These relationships can greatly influence the players' strategies
  - Max^n and Paranoid don't model these relationships
- To play well in a multi-player game, ways are needed to
  - decide when/whether to cooperate with the other players
  - deduce each player's attitude toward the other players
Slide 22: Pruning
- Let's go back to 2-player zero-sum games
  - Minimax and LD-Minimax both examine nodes that don't need to be examined
- [figure: Max node a with Min child b; b's leaves c, d, e give v(b) = 3]

Slide 23: Pruning
- b is better for Max than f is
- If Max is rational, then Max will never choose f
- So don't examine any more nodes below f
  - They can't affect v(a)
- [figure: a second Min child f of a; after a leaf g of value 2, the remaining children of f are pruned (marked X)]

Slide 24: Pruning
- We don't yet know whether h is better or worse than b
- [figure: a third Min child h of a; its first leaf i has value 14]

Slide 25: Pruning
- We still don't know whether h is better or worse than b
- [figure: h's second leaf j has value 5]

Slide 26: Pruning
- h is worse than b
  - We don't need to examine any more nodes below h
- v(a) = 3
- [figure: h's next leaf k has value 2, so v(h) ≤ 2 < 3; the rest of h's children are pruned, and the minimax value of a is 3]
Slide 27: Alpha Cutoff
- Squares are Max nodes, circles are Min nodes
- Let α = max(a, b, c), and suppose d < α
- To reach s, the game must go through p, q, and r
- By moving elsewhere at one of those nodes, Max can get v ≥ α
- If the game ever reaches node s, then Min can achieve v ≤ d < what Max can get elsewhere
  - Max will never let that happen
  - We don't need to know anything more about s
- What if d = α?
- [figure: Max nodes p, q, r with alternative values a, b, c on a path down to a Min node s with value ≤ d]
Slide 28: Beta Cutoff
- Squares are Max nodes, circles are Min nodes
- Let β = min(a, b, c), and suppose d > β
- To reach s, the game must go through p, q, and r
- By moving elsewhere at one of those nodes, Min can achieve v ≤ β
- If the game ever reaches node s, then Max can achieve v ≥ d > what Min can get elsewhere
  - Min will never let that happen
  - We don't need to know anything more about s
- What if d = β?
- [figure: Min nodes p, q, r with alternative values a, b, c on a path down to a Max node s with value ≥ d]
Slide 29: Alpha-Beta Pruning

    function Alpha-Beta(h, d, α, β)
      if h ∈ Z then return u(h)
      else if d = 0 then return e(h)
      else if ρ(h) = Max then
        v ← −∞
        for every a ∈ χ(h) do
          v ← max(v, Alpha-Beta(σ(h,a), d−1, α, β))
          if v ≥ β then return v
          else α ← max(α, v)
      else
        v ← ∞
        for every a ∈ χ(h) do
          v ← min(v, Alpha-Beta(σ(h,a), d−1, α, β))
          if v ≤ α then return v
          else β ← min(β, v)
      return v

- [figure: example tree with nodes a through m, used on the following slides to trace the algorithm]
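A runnable Python sketch of the pseudocode above, using the same toy tree encoding as the earlier minimax sketches (the depth limit is omitted for brevity):

```python
# Alpha-beta pruning: alpha = best value Max can force so far,
# beta = best value Min can force so far; prune when they cross.

def alphabeta(h, alpha=float("-inf"), beta=float("inf")):
    if isinstance(h, (int, float)):
        return h
    player, children = h
    if player == "max":
        v = float("-inf")
        for c in children:
            v = max(v, alphabeta(c, alpha, beta))
            if v >= beta:                # beta cutoff: Min won't allow this
                return v
            alpha = max(alpha, v)
    else:
        v = float("inf")
        for c in children:
            v = min(v, alphabeta(c, alpha, beta))
            if v <= alpha:               # alpha cutoff: Max won't allow this
                return v
            beta = min(beta, v)
    return v

tree = ("max", [("min", [3, 12, 8]), ("min", [2, 4, 6]), ("min", [14, 5, 2])])
print(alphabeta(tree))                   # same value as plain minimax: 3
```

On this tree the second Min node is cut off after its first leaf (2 ≤ α = 3), so the leaves 4 and 6 are never examined.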
Slides 30-40: Alpha-Beta Pruning (worked example)
- [figures: a step-by-step trace of Alpha-Beta on the example tree, with the same pseudocode repeated on each slide. The trace raises α to 7 after the root's first subtree b, takes an alpha cutoff below node j once a subtree value of 5 cannot exceed α = 7, and later takes a beta cutoff below node m with β = 8 when a child evaluates to 9; pruned branches are marked with X.]
Slide 41: Properties of Alpha-Beta
- Alpha-beta pruning reasons about which computations are relevant
  - A form of metareasoning
- Theorem:
  - If the value returned by Minimax(h, d) is in [α, β], then Alpha-Beta(h, d, α, β) returns the same value
  - If the value returned by Minimax(h, d) is ≤ α, then Alpha-Beta(h, d, α, β) returns a value ≤ α
  - If the value returned by Minimax(h, d) is ≥ β, then Alpha-Beta(h, d, α, β) returns a value ≥ β
- Corollary:
  - Alpha-Beta(h, d, −∞, ∞) returns the same value as Minimax(h, d)
  - Alpha-Beta(h, ∞, −∞, ∞) returns v(h)
Slide 42: Node Ordering
- Deeper lookahead (larger d) usually gives better decisions
  - There are pathological games where it doesn't, but those are rare
- Compared to LD-Minimax, how much farther ahead can Alpha-Beta look?
- Best case:
  - The children of Max nodes are searched in greatest-value-first order, and the children of Min nodes are searched in least-value-first order
  - Alpha-Beta's time complexity is O(b^(d/2)), which doubles the solvable depth
- Worst case:
  - The children of Max nodes are searched in least-value-first order, and the children of Min nodes are searched in greatest-value-first order
  - Like LD-Minimax, Alpha-Beta visits all nodes of depth ≤ d: time complexity O(b^d)
Slide 43: Node Ordering
- How to get closer to the best case:
  - Every time you expand a state s, apply e to its children
  - When it's Max's move, sort the children in order of largest e first
  - When it's Min's move, sort the children in order of smallest e first
- Suppose we have 100 seconds and can explore 10⁴ nodes/second
  - That's 10⁶ nodes per move
  - Put this into the form b^(d/2): 10⁶ ≈ 35^(8/2)
  - In the best case, Alpha-Beta reaches depth 8: a pretty good chess program
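The sorting step can be sketched in a few lines; the tree encoding and the trivial evaluation function are illustrative assumptions, and in a real program `order_children` would be called on each node just before the alpha-beta recursion:

```python
# Heuristic node ordering: search likely-best children first so that
# alpha-beta's bounds tighten early and more branches get pruned.

def order_children(children, player, e):
    # Largest e first at Max nodes, smallest e first at Min nodes.
    return sorted(children, key=e, reverse=(player == "max"))

# Toy e: a terminal's value is itself, anything else scores 0.
e = lambda h: h if isinstance(h, (int, float)) else 0
print(order_children([3, 12, 8], "max", e))   # [12, 8, 3]
print(order_children([3, 12, 8], "min", e))   # [3, 8, 12]
```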
Slide 44: Other Modifications
- Several other modifications can improve the accuracy or computation time:
  - quiescence search and biasing
  - transposition tables
  - thinking on the opponent's time
  - table lookup of book moves
  - iterative deepening
  - forward pruning
Slide 45: Quiescence Search and Biasing
- In a game like checkers or chess, the evaluation is based largely on material (pieces)
  - The evaluation is likely to be inaccurate if there are pending captures
- Search deeper to reach a position where there aren't pending captures
  - Evaluations will be more accurate there
- That creates another problem:
  - You're searching some paths to an even depth and others to an odd depth
  - Paths that end just after your opponent's move will look worse than paths that end just after your move
- To compensate, add or subtract a number called the biasing factor
Slide 46: Transposition Tables
- Often there are multiple paths to the same state (i.e., the state space is really a graph rather than a tree)
- Idea:
  - When you compute a state s's minimax value, store it in a hash table
  - If you visit s again, retrieve its value rather than computing it again
- The hash table is called a transposition table
- Problem: there are too many states to store all of them
  - Store some of them, rather than all
  - Try to store the ones that you're most likely to need
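A minimal memoized-minimax sketch of the idea. The toy tree encoding is as in the earlier sketches, and the `repr`-based key is a stand-in for a real position hash (chess programs typically use something like Zobrist hashing, and bound the table's size rather than storing everything):

```python
# Transposition table: cache minimax values by position key so that a state
# reached along several move orders is evaluated only once.

def minimax_tt(h, table):
    k = repr(h)                          # stand-in for hashing the position
    if k in table:
        return table[k]                  # transposition hit: reuse the value
    if isinstance(h, (int, float)):
        v = h
    else:
        player, children = h
        vals = [minimax_tt(c, table) for c in children]
        v = max(vals) if player == "max" else min(vals)
    table[k] = v
    return v

shared = ("min", [3, 12, 8])             # the same state, reachable two ways
tree = ("max", [shared, ("min", [2, 4, 6]), shared])
table = {}
print(minimax_tt(tree, table))           # 3; `shared` is evaluated only once
```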
Slide 47: Thinking on the Opponent's Time
- Suppose you're at node a
  - Use alpha-beta to estimate v(b) and v(c)
  - c looks better, so move there
- Consider your estimates of v(f) and v(g)
  - They suggest your opponent is likely to move to f
  - While waiting for the opponent to move, start an alpha-beta search below f
- If the opponent moves to f, then you've already done a lot of the work of figuring out your next move
- [figure: root a with children b (value 3) and c (value 8); below c, nodes f (value 8) and g (value 17)]
Slide 48: Book Moves
- In some games, experts have spent lots of time analyzing openings
  - Sequences of moves one might play at the start of the game
  - Best responses to those sequences
- Some of these are cataloged in standard reference works
  - e.g., the Encyclopaedia of Chess Openings
- Store these in a lookup table
  - Respond almost immediately, as long as the opponent sticks to a sequence that's in the book
- A technique humans can use when playing against such a program:
  - Deliberately make a move that isn't in the book
  - This may weaken the human's position, but the computer will (i) start taking longer and (ii) stop playing as well
Slide 49: Iterative Deepening
- How deeply should you search a game tree? When you call Alpha-Beta(h, d, −∞, ∞), what should you use for d?
  - Small d ⇒ you don't make as good a decision
  - Large d ⇒ you run out of time without knowing what move to make
- Solution: iterative deepening

    for d = 1 by 1 until you run out of time:
      m ← the move returned by Alpha-Beta(h, d, −∞, ∞)

- Why this works:
  - Time complexity is O(b¹ + b² + ⋯ + b^d) = O(b^d)
  - For large b, each iteration takes much more time than all of the previous iterations put together
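A self-contained Python sketch of the loop, using the depth-limited minimax from earlier in place of Alpha-Beta to keep it short. The time budget, depth cap, and tree encoding are all illustrative assumptions:

```python
import time

def ld_minimax(h, d, e):
    if isinstance(h, (int, float)):
        return h
    if d == 0:
        return e(h)
    player, children = h
    vals = [ld_minimax(c, d - 1, e) for c in children]
    return max(vals) if player == "max" else min(vals)

def id_choice(h, e, budget_s=0.05):
    """Search depth 1, 2, ... until time runs out; keep the latest move."""
    deadline = time.monotonic() + budget_s
    player, children = h
    best, d = 0, 1
    while time.monotonic() < deadline and d <= 10:   # cap keeps the demo finite
        vals = [ld_minimax(c, d - 1, e) for c in children]
        best = vals.index(max(vals) if player == "max" else min(vals))
        d += 1
    return best                                      # index of the chosen move

tree = ("max", [("min", [3, 12, 8]), ("min", [2, 4, 6])])
print(id_choice(tree, e=lambda h: 0))                # first move (index 0)
```

If the clock expires mid-iteration, the move from the last completed depth is still available, which is the whole point of deepening iteratively.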
Slide 50: Forward Pruning
- Tapered search:
  - Instead of looking at all of a node's children, just look at the n best ones (the n highest e(h) values)
  - Decrease the value of n as you go deeper in the tree
  - Drawback: this may exclude an important move at a low level of the tree
- [figure: a node whose children have e-values 5, −3, 8, and 17; with n = 3, the lowest-valued child is ignored]
- Marginal forward pruning:
  - Ignore every child h such that e(h) is smaller than the current best value from the nodes you've already visited
  - Not reliable; should be avoided
Slide 51: Forward Pruning
- Until the mid-1970s, most chess programs tried to search the same way humans think:
  - Use extensive chess knowledge at each node to select a few plausible moves, and prune the others
  - These programs had serious tactical shortcomings
- Brute-force search programs did better, and dominated computer chess for about 20 years
- Early 1990s: development of some forward-pruning techniques that worked well
  - e.g., null-move pruning
- Today most chess programs use some kind of forward pruning
  - Null-move pruning is one of the most popular
Slide 52: Game-Tree Search in Practice
- Checkers: in 1994, Chinook ended the 40-year reign of human world champion Marion Tinsley
  - Tinsley withdrew for health reasons and died a few months later
- In 2007, checkers was solved: with perfect play, it's a draw
  - This took calculations spanning over 18 years, over an enormous search space
- Chess: in 1997, Deep Blue defeated Garry Kasparov in a six-game match
  - Deep Blue searched 200 million positions per second
  - It used very sophisticated evaluation, and undisclosed methods for extending some lines of search up to 40 ply
- Othello: human champions don't compete against computers
  - The computers are too good
- Go: in 2006, good amateurs could beat the best go programs, even with a 9-stone handicap
  - Go programs have improved a lot during the past 7 years
Slide 53: Rules of Go (Abbreviated)
- Go board: locations (intersections on a grid)
- Black and White take turns; Black has the first move
- Each move consists of placing a stone at an unoccupied location
  - You just put stones on the board; you don't move them around
- Adjacent stones of the same color are called a string
  - Liberties are the empty locations next to the string
  - A string is removed if its number of liberties is 0
- Score: territories (number of occupied or surrounded locations)
Slide 54: Why Go Is Difficult for Computers
- A game tree's size grows exponentially with both its depth and its branching factor
- The game tree for go:
  - branching factor ≈ 200
  - game length ≈ 250 to 300 moves
  - the number of paths in the game tree is astronomically large
- Much too big for a normal game-tree search
- [figure: trees with branching factors b = 2, 3, 4, illustrating exponential growth]
Slide 55: Why Go Is Difficult for Computers
- Writing an evaluation function for chess:
  - Mainly piece count, plus some positional considerations: isolated/doubled pawns, rooks on open files (columns), pawns in the center of the board, etc.
  - As the game progresses, pieces get removed ⇒ evaluation gets easier
- Writing an evaluation function for go is much more complicated:
  - whether or not a group is alive
  - which stones can be connected to one another
  - the extent to which a position has influence or can be attacked
- As the game progresses, pieces get added ⇒ evaluation gets more complicated
Slide 56: Game-Tree Search in Go
- During the past five years, go programs have gotten much better
- The main reason: Monte Carlo roll-outs
- Basic idea:
  - Generate a list of potential moves A = {a₁, ..., aₙ}
  - For each move aᵢ in A: starting at σ(h, aᵢ), generate a set of random games Gᵢ in which the two players make all of their moves at random
  - Choose the move aᵢ that produced the best results
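The three steps above can be sketched on the toy tree encoding from the earlier examples (the leaf values, sample count, and fixed seed are illustrative assumptions, not anything from go):

```python
import random

def rollout(h, rng):
    """Play to a terminal by choosing moves uniformly at random."""
    while not isinstance(h, (int, float)):
        _, children = h
        h = rng.choice(children)         # both players move at random
    return h

def mc_choice(h, n_games=200, seed=0):
    """Score each candidate move by the mean payoff of random playouts."""
    rng = random.Random(seed)
    _, children = h
    means = [sum(rollout(c, rng) for _ in range(n_games)) / n_games
             for c in children]
    return means.index(max(means))       # move with the best average result

tree = ("max", [("min", [3, 12, 8]), ("min", [2, 4, 6]), ("min", [1, 1, 1])])
print(mc_choice(tree))                   # first move: mean payoff about 7.7
```

Note that roll-outs score a move by its average outcome, not its minimax value, so against a perfect opponent the estimate can be optimistic; UCT (below) addresses this by steering the playouts.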
Slide 57: Multi-Armed Bandit
- A statistical model of sequential experiments
  - The name comes from a traditional slot machine (a one-armed bandit)
- There are multiple actions
  - Each action provides a reward drawn from a probability distribution associated with that specific action
  - Objective: maximize the expected utility of a sequence of actions
- Exploitation vs. exploration dilemma:
  - Exploitation: choosing an action that you already know about, because you think it's likely to give you a high reward
  - Exploration: choosing an action that you don't know much about, in hopes that it will turn out to produce a better reward than the actions you already know about
Slide 58: UCB (Upper Confidence Bound) Algorithm
- Let:
  - x_i = average reward you've gotten from arm i
  - t_i = number of times you've tried arm i
  - t = Σ_i t_i
- Loop:
  - If there are one or more arms that have not been played, then play one of them
  - Else play the arm i that has the highest value of x_i + √(2(log t)/t_i)
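A runnable Python sketch of this loop. The arm reward distributions (Gaussian with unit variance), pull budget, and seed are invented for the demo:

```python
import math
import random

def ucb_run(arm_means, n_pulls, seed=0):
    """Play n_pulls rounds of UCB; return how often each arm was pulled."""
    rng = random.Random(seed)
    counts = [0] * len(arm_means)
    means = [0.0] * len(arm_means)       # x_i: running average reward of arm i
    for t in range(1, n_pulls + 1):
        if 0 in counts:
            i = counts.index(0)          # play every unplayed arm once first
        else:                            # then maximize x_i + sqrt(2 log t / t_i)
            i = max(range(len(arm_means)),
                    key=lambda j: means[j] + math.sqrt(2 * math.log(t) / counts[j]))
        reward = rng.gauss(arm_means[i], 1.0)
        counts[i] += 1
        means[i] += (reward - means[i]) / counts[i]   # incremental average
    return counts

counts = ucb_run([0.1, 0.5, 0.9], n_pulls=500)
print(counts)    # the 0.9-mean arm should receive most of the pulls
```

The exploration bonus shrinks as an arm's count grows, so poor arms are still sampled occasionally (logarithmically often) but the best arm dominates.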
Slide 59: UCT (UCB for Trees)
- An adaptation of UCB for game-tree search in go
  - First used in Mogo in 2006
  - Mogo won the go tournament at the 2007 Computer Olympiad, and several other computer go tournaments
  - Mogo won one (of 3) 9x9 blitz go games against a human professional
- UCT is now used in most computer go programs
- I've had some trouble finding a clear and unambiguous description of UCT
  - But I think the basic algorithm is as follows
Slide 60: Basic UCT Algorithm
- Start with an empty tree, T (T = {nodes we've evaluated})
- Loop:
  - h ← the start of the game
  - while h ∈ T do (this loop ends when we reach a node we haven't evaluated):
      if h has one or more children that aren't already in T
        then h ← any child of h that isn't in T
        else h ← arg max_{k ∈ children(h)} x_k + √(2(log t_h)/t_k)
  - Play a random game starting at h; v ← the game's payoff profile
  - Add h to T; t_h ← 1; x_h ← v[ρ(h)]
  - For every g ∈ ancestors(h) do:
      x_g ← (t_g · x_g + v[ρ(g)]) / (t_g + 1)
      t_g ← t_g + 1
    (i.e., x_g = average payoff to ρ(g) of all games below g)
Slide 61: Properties
- As the number of iterations goes to infinity, the evaluation x_h of the root node converges to its minimax value
- Question:
  - How can this be, since x_h is the average of the values we've gotten below h?
Slide 62: Properties
- As the number of iterations goes to infinity, the evaluation x_h of the root node converges to its minimax value
- Question: how can this be, since x_h is the average of the values we've gotten below h?
- Look at how h is chosen:

    Start with an empty tree, T
    loop:
      h ← the start of the game
      while h ∈ T do
        if h has one or more children that aren't already in T
          then h ← any child of h that isn't in T
          else h ← arg max_{k ∈ children(h)} x_k + √(2(log t_h)/t_k)
      ...

- As the counts grow, this selection rule plays the best child an ever larger fraction of the time, so the average x_h becomes dominated by that child's value
Slide 63: Some Improvements
- As written, UCT won't evaluate any nodes below h until all of h's siblings have been visited
  - Not good: we want to explore some subtrees more deeply than others
- So, replace this:

    if h has one or more children that aren't already in T
      then h ← any child of h that isn't in T
      else h ← arg max_{k ∈ children(h)} x_k + √(2(log t_h)/t_k)

- with this:

    h ← arg max_{k ∈ children(h)} y_k

  where y_k = x_k + √(2(log t_h)/t_k) if k is already in T, and y_k = some constant c otherwise
Slide 64: Some Improvements
- UCB assumes all of the levers are independent
- Similarly, UCT assumes the values of sibling nodes are independent
  - That's usually not true in practice
- So, modify x_h using some information computed from x_h's siblings and grandparent
Slide 65: Two-Player Zero-Sum Stochastic Games
- Example: backgammon
- Two agents take turns
- At each turn, the set of available moves depends on the results of rolling the dice
  - Each die specifies how far to move one of your pieces (except when you roll doubles)
  - If your piece would land on a location with 2 or more of the opponent's pieces, you can't move there
  - If your piece lands on a location with 1 of the opponent's pieces, the opponent's piece must start over
Slide 66: Backgammon Game Tree
- Players' moves have deterministic outcomes
- Dice rolls have stochastic outcomes
- [figure: tree alternating MAX, DICE, MIN, DICE, MAX, ..., TERMINAL layers; the chance branches are labeled with rolls 1,1 (probability 1/36), 1,2 (1/18), ..., 6,5 (1/18), 6,6 (1/36)]
Slide 67: ExpectiMinimax

    function ExpectiMinimax(h, d)
      if h ∈ Z then return v(h)
      else if d = 0 then return e(h)
      else if ρ(h) = Max then return max_{a∈χ(h)} ExpectiMinimax(σ(h,a), d−1)
      else if ρ(h) = Min then return min_{a∈χ(h)} ExpectiMinimax(σ(h,a), d−1)
      else return Σ_{a∈χ(h)} Pr[a|h] · ExpectiMinimax(σ(h,a), d−1)

- Returns the minimax expected utilities
  - Can be modified to return the actions
- Can also be modified to do α-β pruning
  - But it's more complicated and less effective than in deterministic games
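A Python sketch of the algorithm (without the depth limit), extending the earlier toy tree encoding with a "chance" node kind whose children are (probability, subtree) pairs; the example tree and probabilities are invented:

```python
# ExpectiMinimax: Max and Min nodes behave as in minimax, while a chance
# node's value is the probability-weighted average of its children.

def expectiminimax(h):
    if isinstance(h, (int, float)):
        return h
    kind, children = h
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    # chance node: children are (probability, subtree) pairs
    return sum(p * expectiminimax(c) for p, c in children)

tree = ("max", [
    ("chance", [(0.5, ("min", [2, 8])), (0.5, ("min", [6, 4]))]),
    ("chance", [(1.0, ("min", [4, 5]))]),
])
print(expectiminimax(tree))    # max(0.5*2 + 0.5*4, 1.0*4) = 4.0
```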
Slide 68: In Practice
- Dice rolls increase the branching factor
  - 21 possible rolls with 2 dice
  - Given the dice roll, about 20 legal moves on average (for some dice rolls, it can be much higher)
  - At depth 4, the number of nodes is 20 · (21 · 20)³
- As depth increases, the probability of reaching a given node shrinks
  - The value of lookahead is diminished
  - α-β pruning is much less effective
- TDGammon uses depth-2 search plus a very good evaluation function
  - World-champion level
  - The evaluation function was created automatically using a machine-learning technique called Temporal Difference learning, hence the "TD" in TDGammon
Slide 69: Summary
- Two-player zero-sum perfect-information games:
  - The maximin and minimax strategies are the same
  - We only need to look at pure strategies
  - We can do a game-tree search: minimax values, alpha-beta pruning
- In sufficiently complicated games, perfection is unattainable
  - Limited search depth, static evaluation functions
  - Monte Carlo roll-outs
- Game-tree search can be modified for games in which there are stochastic outcomes
More informationCS 188: Artificial Intelligence Fall 2008
CS 188: Artificial Intelligence Fall 2008 Lecture 7: Expectimax Search 9/18/2008 Dan Klein UC Berkeley Many slides over the course adapted from either Stuart Russell or Andrew Moore 1 1 Evaluation for
More informationAlpha-Beta Pruning: Algorithm and Analysis
Alpha-Beta Pruning: Algorithm and Analysis Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Introduction Alpha-beta pruning is the standard searching procedure used for solving
More informationAlpha-Beta Pruning: Algorithm and Analysis
Alpha-Beta Pruning: Algorithm and Analysis Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Introduction Alpha-beta pruning is the standard searching procedure used for 2-person
More informationAlpha-Beta Pruning: Algorithm and Analysis
Alpha-Beta Pruning: Algorithm and Analysis Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Introduction Alpha-beta pruning is the standard searching procedure used for 2-person
More informationCS 188 Introduction to Fall 2007 Artificial Intelligence Midterm
NAME: SID#: Login: Sec: 1 CS 188 Introduction to Fall 2007 Artificial Intelligence Midterm You have 80 minutes. The exam is closed book, closed notes except a one-page crib sheet, basic calculators only.
More informationSummary. Agenda. Games. Intelligence opponents in game. Expected Value Expected Max Algorithm Minimax Algorithm Alpha-beta Pruning Simultaneous Game
Summary rtificial Intelligence and its applications Lecture 4 Game Playing Search onstraint Satisfaction Problems From start state to goal state onsider constraints Professor Daniel Yeung danyeung@ieee.org
More informationCS 188: Artificial Intelligence Spring 2007
CS 188: Artificial Intelligence Spring 2007 Lecture 8: Logical Agents - I 2/8/2007 Srini Narayanan ICSI and UC Berkeley Many slides over the course adapted from Dan Klein, Stuart Russell or Andrew Moore
More informationCS 570: Machine Learning Seminar. Fall 2016
CS 570: Machine Learning Seminar Fall 2016 Class Information Class web page: http://web.cecs.pdx.edu/~mm/mlseminar2016-2017/fall2016/ Class mailing list: cs570@cs.pdx.edu My office hours: T,Th, 2-3pm or
More informationNotes on induction proofs and recursive definitions
Notes on induction proofs and recursive definitions James Aspnes December 13, 2010 1 Simple induction Most of the proof techniques we ve talked about so far are only really useful for proving a property
More informationReinforcement Learning. Spring 2018 Defining MDPs, Planning
Reinforcement Learning Spring 2018 Defining MDPs, Planning understandability 0 Slide 10 time You are here Markov Process Where you will go depends only on where you are Markov Process: Information state
More informationIS-ZC444: ARTIFICIAL INTELLIGENCE
IS-ZC444: ARTIFICIAL INTELLIGENCE Lecture-07: Beyond Classical Search Dr. Kamlesh Tiwari Assistant Professor Department of Computer Science and Information Systems, BITS Pilani, Pilani, Jhunjhunu-333031,
More informationCS 4100 // artificial intelligence. Recap/midterm review!
CS 4100 // artificial intelligence instructor: byron wallace Recap/midterm review! Attribution: many of these slides are modified versions of those distributed with the UC Berkeley CS188 materials Thanks
More informationPOLYNOMIAL SPACE QSAT. Games. Polynomial space cont d
T-79.5103 / Autumn 2008 Polynomial Space 1 T-79.5103 / Autumn 2008 Polynomial Space 3 POLYNOMIAL SPACE Polynomial space cont d Polynomial space-bounded computation has a variety of alternative characterizations
More informationIndustrial Organization Lecture 3: Game Theory
Industrial Organization Lecture 3: Game Theory Nicolas Schutz Nicolas Schutz Game Theory 1 / 43 Introduction Why game theory? In the introductory lecture, we defined Industrial Organization as the economics
More informationMinimax strategies, alpha beta pruning. Lirong Xia
Minimax strategies, alpha beta pruning Lirong Xia Reminder ØProject 1 due tonight Makes sure you DO NOT SEE ERROR: Summation of parsed points does not match ØProject 2 due in two weeks 2 How to find good
More informationAn Analysis of Forward Pruning. to try to understand why programs have been unable to. pruning more eectively. 1
Proc. AAAI-94, to appear. An Analysis of Forward Pruning Stephen J. J. Smith Dana S. Nau Department of Computer Science Department of Computer Science, and University of Maryland Institute for Systems
More informationComputing Minmax; Dominance
Computing Minmax; Dominance CPSC 532A Lecture 5 Computing Minmax; Dominance CPSC 532A Lecture 5, Slide 1 Lecture Overview 1 Recap 2 Linear Programming 3 Computational Problems Involving Maxmin 4 Domination
More informationBasic Game Theory. Kate Larson. January 7, University of Waterloo. Kate Larson. What is Game Theory? Normal Form Games. Computing Equilibria
Basic Game Theory University of Waterloo January 7, 2013 Outline 1 2 3 What is game theory? The study of games! Bluffing in poker What move to make in chess How to play Rock-Scissors-Paper Also study of
More informationCSC242: Intro to AI. Lecture 7 Games of Imperfect Knowledge & Constraint Satisfaction
CSC242: Intro to AI Lecture 7 Games of Imperfect Knowledge & Constraint Satisfaction What is This? 25" 20" 15" 10" 5" 0" Quiz 1 25" B 20" 15" 10" F D C A 5" F 0" Moral Many people cannot learn from lectures
More informationPlaying Abstract games with Hidden States (Spatial and Non-Spatial).
Playing Abstract games with Hidden States (Spatial and Non-Spatial). Gregory Calbert, Hing-Wah Kwok Peter Smet, Jason Scholz, Michael Webb VE Group, C2D, DSTO. Report Documentation Page Form Approved OMB
More informationComputing Minmax; Dominance
Computing Minmax; Dominance CPSC 532A Lecture 5 Computing Minmax; Dominance CPSC 532A Lecture 5, Slide 1 Lecture Overview 1 Recap 2 Linear Programming 3 Computational Problems Involving Maxmin 4 Domination
More informationLecture 1: March 7, 2018
Reinforcement Learning Spring Semester, 2017/8 Lecture 1: March 7, 2018 Lecturer: Yishay Mansour Scribe: ym DISCLAIMER: Based on Learning and Planning in Dynamical Systems by Shie Mannor c, all rights
More informationProperties of Forward Pruning in Game-Tree Search
Properties of Forward Pruning in Game-Tree Search Yew Jin Lim and Wee Sun Lee School of Computing National University of Singapore {limyewji,leews}@comp.nus.edu.sg Abstract Forward pruning, or selectively
More informationEvolutionary Computation: introduction
Evolutionary Computation: introduction Dirk Thierens Universiteit Utrecht The Netherlands Dirk Thierens (Universiteit Utrecht) EC Introduction 1 / 42 What? Evolutionary Computation Evolutionary Computation
More informationFinal. Introduction to Artificial Intelligence. CS 188 Spring You have approximately 2 hours and 50 minutes.
CS 188 Spring 2014 Introduction to Artificial Intelligence Final You have approximately 2 hours and 50 minutes. The exam is closed book, closed notes except your two-page crib sheet. Mark your answers
More informationCS599 Lecture 1 Introduction To RL
CS599 Lecture 1 Introduction To RL Reinforcement Learning Introduction Learning from rewards Policies Value Functions Rewards Models of the Environment Exploitation vs. Exploration Dynamic Programming
More information1 Extensive Form Games
1 Extensive Form Games De nition 1 A nite extensive form game is am object K = fn; (T ) ; P; A; H; u; g where: N = f0; 1; :::; ng is the set of agents (player 0 is nature ) (T ) is the game tree P is the
More informationCounting. 1 Sum Rule. Example 1. Lecture Notes #1 Sept 24, Chris Piech CS 109
1 Chris Piech CS 109 Counting Lecture Notes #1 Sept 24, 2018 Based on a handout by Mehran Sahami with examples by Peter Norvig Although you may have thought you had a pretty good grasp on the notion of
More informationCS885 Reinforcement Learning Lecture 7a: May 23, 2018
CS885 Reinforcement Learning Lecture 7a: May 23, 2018 Policy Gradient Methods [SutBar] Sec. 13.1-13.3, 13.7 [SigBuf] Sec. 5.1-5.2, [RusNor] Sec. 21.5 CS885 Spring 2018 Pascal Poupart 1 Outline Stochastic
More informationScout, NegaScout and Proof-Number Search
Scout, NegaScout and Proof-Number Search Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Introduction It looks like alpha-beta pruning is the best we can do for a generic searching
More informationDeep Reinforcement Learning. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 19, 2017
Deep Reinforcement Learning STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 19, 2017 Outline Introduction to Reinforcement Learning AlphaGo (Deep RL for Computer Go)
More informationMath Models of OR: Branch-and-Bound
Math Models of OR: Branch-and-Bound John E. Mitchell Department of Mathematical Sciences RPI, Troy, NY 12180 USA November 2018 Mitchell Branch-and-Bound 1 / 15 Branch-and-Bound Outline 1 Branch-and-Bound
More informationScout and NegaScout. Tsan-sheng Hsu.
Scout and NegaScout Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Abstract It looks like alpha-beta pruning is the best we can do for an exact generic searching procedure.
More information4: Dynamic games. Concordia February 6, 2017
INSE6441 Jia Yuan Yu 4: Dynamic games Concordia February 6, 2017 We introduce dynamic game with non-simultaneous moves. Example 0.1 (Ultimatum game). Divide class into two groups at random: Proposers,
More informationNotes on Coursera s Game Theory
Notes on Coursera s Game Theory Manoel Horta Ribeiro Week 01: Introduction and Overview Game theory is about self interested agents interacting within a specific set of rules. Self-Interested Agents have
More informationNondeterministic finite automata
Lecture 3 Nondeterministic finite automata This lecture is focused on the nondeterministic finite automata (NFA) model and its relationship to the DFA model. Nondeterminism is an important concept in the
More informationBasics of reinforcement learning
Basics of reinforcement learning Lucian Buşoniu TMLSS, 20 July 2018 Main idea of reinforcement learning (RL) Learn a sequential decision policy to optimize the cumulative performance of an unknown system
More informationCS221 Practice Midterm
CS221 Practice Midterm Autumn 2012 1 ther Midterms The following pages are excerpts from similar classes midterms. The content is similar to what we ve been covering this quarter, so that it should be
More informationCOMP3702/7702 Artificial Intelligence Week1: Introduction Russell & Norvig ch.1-2.3, Hanna Kurniawati
COMP3702/7702 Artificial Intelligence Week1: Introduction Russell & Norvig ch.1-2.3, 3.1-3.3 Hanna Kurniawati Today } What is Artificial Intelligence? } Better know what it is first before committing the
More informationSolving Extensive Form Games
Chapter 8 Solving Extensive Form Games 8.1 The Extensive Form of a Game The extensive form of a game contains the following information: (1) the set of players (2) the order of moves (that is, who moves
More informationA Polynomial-time Nash Equilibrium Algorithm for Repeated Games
A Polynomial-time Nash Equilibrium Algorithm for Repeated Games Michael L. Littman mlittman@cs.rutgers.edu Rutgers University Peter Stone pstone@cs.utexas.edu The University of Texas at Austin Main Result
More informationSatisfaction Equilibrium: Achieving Cooperation in Incomplete Information Games
Satisfaction Equilibrium: Achieving Cooperation in Incomplete Information Games Stéphane Ross and Brahim Chaib-draa Department of Computer Science and Software Engineering Laval University, Québec (Qc),
More information1. Counting. Chris Piech and Mehran Sahami. Oct 2017
. Counting Chris Piech and Mehran Sahami Oct 07 Introduction Although you may have thought you had a pretty good grasp on the notion of counting at the age of three, it turns out that you had to wait until
More informationDeep Reinforcement Learning
Martin Matyášek Artificial Intelligence Center Czech Technical University in Prague October 27, 2016 Martin Matyášek VPD, 2016 1 / 50 Reinforcement Learning in a picture R. S. Sutton and A. G. Barto 2015
More informationAdministration. CSCI567 Machine Learning (Fall 2018) Outline. Outline. HW5 is available, due on 11/18. Practice final will also be available soon.
Administration CSCI567 Machine Learning Fall 2018 Prof. Haipeng Luo U of Southern California Nov 7, 2018 HW5 is available, due on 11/18. Practice final will also be available soon. Remaining weeks: 11/14,
More informationComputational aspects of two-player zero-sum games Course notes for Computational Game Theory Section 4 Fall 2010
Computational aspects of two-player zero-sum games Course notes for Computational Game Theory Section 4 Fall 010 Peter Bro Miltersen January 16, 011 Version 1. 4 Finite Perfect Information Games Definition
More informationCMU Lecture 12: Reinforcement Learning. Teacher: Gianni A. Di Caro
CMU 15-781 Lecture 12: Reinforcement Learning Teacher: Gianni A. Di Caro REINFORCEMENT LEARNING Transition Model? State Action Reward model? Agent Goal: Maximize expected sum of future rewards 2 MDP PLANNING
More informationMarks. bonus points. } Assignment 1: Should be out this weekend. } Mid-term: Before the last lecture. } Mid-term deferred exam:
Marks } Assignment 1: Should be out this weekend } All are marked, I m trying to tally them and perhaps add bonus points } Mid-term: Before the last lecture } Mid-term deferred exam: } This Saturday, 9am-10.30am,
More informationRollout-based Game-tree Search Outprunes Traditional Alpha-beta
Journal of Machine Learning Research vol:1 8, 2012 Submitted 6/2012 Rollout-based Game-tree Search Outprunes Traditional Alpha-beta Ari Weinstein Michael L. Littman Sergiu Goschin aweinst@cs.rutgers.edu
More informationalgorithms Alpha-Beta Pruning and Althöfer s Pathology-Free Negamax Algorithm Algorithms 2012, 5, ; doi: /a
Algorithms 01, 5, 51-58; doi:10.3390/a504051 Article OPEN ACCESS algorithms ISSN 1999-4893 www.mdpi.com/journal/algorithms Alpha-Beta Pruning and Althöfer s Pathology-Free Negamax Algorithm Ashraf M. Abdelbar
More informationCOMP3702/7702 Artificial Intelligence Lecture 11: Introduction to Machine Learning and Reinforcement Learning. Hanna Kurniawati
COMP3702/7702 Artificial Intelligence Lecture 11: Introduction to Machine Learning and Reinforcement Learning Hanna Kurniawati Today } What is machine learning? } Where is it used? } Types of machine learning
More informationDescriptive Statistics (And a little bit on rounding and significant digits)
Descriptive Statistics (And a little bit on rounding and significant digits) Now that we know what our data look like, we d like to be able to describe it numerically. In other words, how can we represent
More informationLimits of Computation
The real danger is not that computers will begin to think like men, but that men will begin to think like computers Limits of Computation - Sydney J. Harris What makes you believe now that I am just talking
More informationPrinciples of Computing, Carnegie Mellon University. The Limits of Computing
The Limits of Computing Intractability Limits of Computing Announcement Final Exam is on Friday 9:00am 10:20am Part 1 4:30pm 6:10pm Part 2 If you did not fill in the course evaluations please do it today.
More informationTwo hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE. Date: Thursday 17th May 2018 Time: 09:45-11:45. Please answer all Questions.
COMP 34120 Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE AI and Games Date: Thursday 17th May 2018 Time: 09:45-11:45 Please answer all Questions. Use a SEPARATE answerbook for each SECTION
More informationBalancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm
Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Michail G. Lagoudakis Department of Computer Science Duke University Durham, NC 2778 mgl@cs.duke.edu
More informationQuantum Games. Quantum Strategies in Classical Games. Presented by Yaniv Carmeli
Quantum Games Quantum Strategies in Classical Games Presented by Yaniv Carmeli 1 Talk Outline Introduction Game Theory Why quantum games? PQ Games PQ penny flip 2x2 Games Quantum strategies 2 Game Theory
More informationChapter 4 Deliberation with Temporal Domain Models
Last update: April 10, 2018 Chapter 4 Deliberation with Temporal Domain Models Section 4.4: Constraint Management Section 4.5: Acting with Temporal Models Automated Planning and Acting Malik Ghallab, Dana
More informationLearning an Effective Strategy in a Multi-Agent System with Hidden Information
Learning an Effective Strategy in a Multi-Agent System with Hidden Information Richard Mealing Supervisor: Jon Shapiro Machine Learning and Optimisation Group School of Computer Science University of Manchester
More informationComputer Science 385 Analysis of Algorithms Siena College Spring Topic Notes: Limitations of Algorithms
Computer Science 385 Analysis of Algorithms Siena College Spring 2011 Topic Notes: Limitations of Algorithms We conclude with a discussion of the limitations of the power of algorithms. That is, what kinds
More informationCSC304 Lecture 5. Game Theory : Zero-Sum Games, The Minimax Theorem. CSC304 - Nisarg Shah 1
CSC304 Lecture 5 Game Theory : Zero-Sum Games, The Minimax Theorem CSC304 - Nisarg Shah 1 Recap Last lecture Cost-sharing games o Price of anarchy (PoA) can be n o Price of stability (PoS) is O(log n)
More information1 Definitions and Things You Know
We will discuss an algorithm for finding stable matchings (not the one you re probably familiar with). The Instability Chaining Algorithm is the most similar algorithm in the literature to the one actually
More informationSynthesis weakness of standard approach. Rational Synthesis
1 Synthesis weakness of standard approach Rational Synthesis 3 Overview Introduction to formal verification Reactive systems Verification Synthesis Introduction to Formal Verification of Reactive Systems
More informationMicroeconomics. 2. Game Theory
Microeconomics 2. Game Theory Alex Gershkov http://www.econ2.uni-bonn.de/gershkov/gershkov.htm 18. November 2008 1 / 36 Dynamic games Time permitting we will cover 2.a Describing a game in extensive form
More informationLearning Tetris. 1 Tetris. February 3, 2009
Learning Tetris Matt Zucker Andrew Maas February 3, 2009 1 Tetris The Tetris game has been used as a benchmark for Machine Learning tasks because its large state space (over 2 200 cell configurations are
More informationN/4 + N/2 + N = 2N 2.
CS61B Summer 2006 Instructor: Erin Korber Lecture 24, 7 Aug. 1 Amortized Analysis For some of the data structures we ve discussed (namely hash tables and splay trees), it was claimed that the average time
More information5.2 Infinite Series Brian E. Veitch
5. Infinite Series Since many quantities show up that cannot be computed exactly, we need some way of representing it (or approximating it). One way is to sum an infinite series. Recall that a n is the
More informationEvolutionary Dynamics and Extensive Form Games by Ross Cressman. Reviewed by William H. Sandholm *
Evolutionary Dynamics and Extensive Form Games by Ross Cressman Reviewed by William H. Sandholm * Noncooperative game theory is one of a handful of fundamental frameworks used for economic modeling. It
More information1 Trees. Listing 1: Node with two child reference. public class ptwochildnode { protected Object data ; protected ptwochildnode l e f t, r i g h t ;
1 Trees The next major set of data structures belongs to what s called Trees. They are called that, because if you try to visualize the structure, it kind of looks like a tree (root, branches, and leafs).
More informationToday s Outline. Recap: MDPs. Bellman Equations. Q-Value Iteration. Bellman Backup 5/7/2012. CSE 473: Artificial Intelligence Reinforcement Learning
CSE 473: Artificial Intelligence Reinforcement Learning Dan Weld Today s Outline Reinforcement Learning Q-value iteration Q-learning Exploration / exploitation Linear function approximation Many slides
More informationAppendix. Mathematical Theorems
Appendix Mathematical Theorems This appendix introduces some theorems required in the discussion in Chap. 5 of the asymptotic behavior of minimax values in Pb-game models. Throughout this appendix, b is
More informationCS188: Artificial Intelligence, Fall 2009 Written 2: MDPs, RL, and Probability
CS188: Artificial Intelligence, Fall 2009 Written 2: MDPs, RL, and Probability Due: Thursday 10/15 in 283 Soda Drop Box by 11:59pm (no slip days) Policy: Can be solved in groups (acknowledge collaborators)
More informationNormal-form games. Vincent Conitzer
Normal-form games Vincent Conitzer conitzer@cs.duke.edu 2/3 of the average game Everyone writes down a number between 0 and 100 Person closest to 2/3 of the average wins Example: A says 50 B says 10 C
More informationGame Theory and its Applications to Networks - Part I: Strict Competition
Game Theory and its Applications to Networks - Part I: Strict Competition Corinne Touati Master ENS Lyon, Fall 200 What is Game Theory and what is it for? Definition (Roger Myerson, Game Theory, Analysis
More informationArtificial Intelligence Methods (G5BAIM) - Examination
Question 1 a) According to John Koza there are five stages when planning to solve a problem using a genetic program. What are they? Give a short description of each. (b) How could you cope with division
More informationExpected Value II. 1 The Expected Number of Events that Happen
6.042/18.062J Mathematics for Computer Science December 5, 2006 Tom Leighton and Ronitt Rubinfeld Lecture Notes Expected Value II 1 The Expected Number of Events that Happen Last week we concluded by showing
More informationExtensive games (with perfect information)
Extensive games (with perfect information) (also referred to as extensive-form games or dynamic games) DEFINITION An extensive game with perfect information has the following components A set N (the set
More informationCS 301. Lecture 18 Decidable languages. Stephen Checkoway. April 2, 2018
CS 301 Lecture 18 Decidable languages Stephen Checkoway April 2, 2018 1 / 26 Decidable language Recall, a language A is decidable if there is some TM M that 1 recognizes A (i.e., L(M) = A), and 2 halts
More informationECO 199 GAMES OF STRATEGY Spring Term 2004 Precepts Week 7 March Questions GAMES WITH ASYMMETRIC INFORMATION QUESTIONS
ECO 199 GAMES OF STRATEGY Spring Term 2004 Precepts Week 7 March 22-23 Questions GAMES WITH ASYMMETRIC INFORMATION QUESTIONS Question 1: In the final stages of the printing of Games of Strategy, Sue Skeath
More information6.207/14.15: Networks Lecture 10: Introduction to Game Theory 2
6.207/14.15: Networks Lecture 10: Introduction to Game Theory 2 Daron Acemoglu and Asu Ozdaglar MIT October 14, 2009 1 Introduction Outline Mixed Strategies Existence of Mixed Strategy Nash Equilibrium
More informationRefinements - change set of equilibria to find "better" set of equilibria by eliminating some that are less plausible
efinements efinements - change set of equilibria to find "better" set of equilibria by eliminating some that are less plausible Strategic Form Eliminate Weakly Dominated Strategies - Purpose - throwing
More informationQuadratic Equations Part I
Quadratic Equations Part I Before proceeding with this section we should note that the topic of solving quadratic equations will be covered in two sections. This is done for the benefit of those viewing
More informationAlgorithms, Games, and Networks January 17, Lecture 2
Algorithms, Games, and Networks January 17, 2013 Lecturer: Avrim Blum Lecture 2 Scribe: Aleksandr Kazachkov 1 Readings for today s lecture Today s topic is online learning, regret minimization, and minimax
More informationCS 662 Sample Midterm
Name: 1 True/False, plus corrections CS 662 Sample Midterm 35 pts, 5 pts each Each of the following statements is either true or false. If it is true, mark it true. If it is false, correct the statement
More informationLanguages, regular languages, finite automata
Notes on Computer Theory Last updated: January, 2018 Languages, regular languages, finite automata Content largely taken from Richards [1] and Sipser [2] 1 Languages An alphabet is a finite set of characters,
More informationLecture 15: Bandit problems. Markov Processes. Recall: Lotteries and utilities
Lecture 15: Bandit problems. Markov Processes Bandit problems Action values (and now to compute them) Exploration-exploitation trade-off Simple exploration strategies -greedy Softmax (Boltzmann) exploration
More informationARTIFICIAL INTELLIGENCE. Reinforcement learning
INFOB2KI 2018-2019 Utrecht University The Netherlands ARTIFICIAL INTELLIGENCE Reinforcement learning Lecturer: Silja Renooij These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html
More informationReinforcement Learning
Reinforcement Learning Function approximation Mario Martin CS-UPC May 18, 2018 Mario Martin (CS-UPC) Reinforcement Learning May 18, 2018 / 65 Recap Algorithms: MonteCarlo methods for Policy Evaluation
More informationQuiz 1 Solutions. Problem 2. Asymptotics & Recurrences [20 points] (3 parts)
Introduction to Algorithms October 13, 2010 Massachusetts Institute of Technology 6.006 Fall 2010 Professors Konstantinos Daskalakis and Patrick Jaillet Quiz 1 Solutions Quiz 1 Solutions Problem 1. We
More informationSampling from Bayes Nets
from Bayes Nets http://www.youtube.com/watch?v=mvrtaljp8dm http://www.youtube.com/watch?v=geqip_0vjec Paper reviews Should be useful feedback for the authors A critique of the paper No paper is perfect!
More informationSkylines. Yufei Tao. ITEE University of Queensland. INFS4205/7205, Uni of Queensland
Yufei Tao ITEE University of Queensland Today we will discuss problems closely related to the topic of multi-criteria optimization, where one aims to identify objects that strike a good balance often optimal
More information