CMSC 474, Game Theory


1. CMSC 474, Game Theory
4b. Game-Tree Search
Dana Nau, University of Maryland
Nau: Game Theory 1

2. Finite Perfect-Information Zero-Sum Games
- Finite: finitely many agents, actions, states, and histories
- Perfect information:
  - Every agent knows all of the players' utility functions, all of the players' actions and what they do, and the history and current state
  - No simultaneous actions: agents move one at a time
- Constant-sum (or zero-sum):
  - There is a constant k such that regardless of how the game ends, Σ_{i=1,…,n} u_i = k
  - For every such game, there's an equivalent game in which k = 0

3. Examples
- Deterministic: chess, checkers; go, gomoku; reversi (othello); tic-tac-toe, qubic, connect-four; mancala (awari, kalah); nine men's morris (merelles, morels, mill)
- Stochastic: backgammon, monopoly, yahtzee, parcheesi, roulette, craps
- For now, we'll consider just the deterministic games

4. Outline
- A brief history of work on this topic
- Restatement of the Minimax Theorem
- Game trees
- The minimax algorithm
- α-β pruning
- Resource limits, approximate evaluation
Most of this isn't in the game-theory book. For further information, look at one of the following:
- The private materials page
- Russell & Norvig's Artificial Intelligence: A Modern Approach (there are 3 editions; in the 2nd edition, it's Chapter 6)

5. Brief History
1846 (Babbage): designed a machine to play tic-tac-toe
1928 (von Neumann): minimax theorem
1944 (von Neumann & Morgenstern): backward induction
1950 (Shannon): minimax algorithm (finite-horizon search)
1951 (Turing): program (on paper) for playing chess
1952 (Samuel): checkers program capable of beating its creator
1956 (McCarthy): pruning to allow deeper minimax search
1957 (Bernstein): first complete chess program, on an IBM 704 vacuum-tube computer; it could examine about 350 positions/minute
1967 (Greenblatt): first program to compete in human chess tournaments: 3 wins, 3 draws, 12 losses
1992 (Schaeffer): Chinook won the 1992 US Open checkers tournament
1994 (Schaeffer): Chinook became world checkers champion; Tinsley (the human champion) withdrew for health reasons
1997 (Hsu et al.): Deep Blue won a 6-game match vs. world chess champion Kasparov
2007 (Schaeffer et al.): checkers solved: with perfect play, it's a draw; this took calculations spread over 18 years

6. Restatement of the Minimax Theorem
- Suppose agents 1 and 2 use strategies s and t in a 2-person game G
  - Let u(s,t) = u_1(s,t) = −u_2(s,t)
  - Call the agents Max and Min (they want to maximize and minimize u)
Minimax Theorem: If G is a two-person finite zero-sum game, then there are strategies s* and t*, and a number v called G's minimax value, such that
  - If Min uses t*, Max's best expected utility is v, i.e., max_s u(s, t*) = v
  - If Max uses s*, Max's worst expected utility is v, i.e., min_t u(s*, t) = v
Corollary 1:
  - u(s*, t*) = v
  - (s*, t*) is a Nash equilibrium; s* and t* are also called perfect play for Max and Min
  - s* (or t*) is Max's (or Min's) minimax strategy and maximin strategy
Corollary 2: If G is a perfect-information game, then there are subgame-perfect pure strategies s* and t* that satisfy the theorem.

7. No Non-Credible Threats
- Recall this example from Chapter 4:
[Game-tree figure: agent 1 moves, then agent 2, then agent 1 again; payoffs include (3,8), (8,3), (5,5), (2,10), (1,0)]
- If the agents can announce their strategies beforehand, agent 1 might want to announce (B,H)
  - A threat that if 2 chooses F, agent 1 will choose a move that hurts both agents
  - H isn't a credible threat: if agent 2 plays F anyway, it wouldn't be rational for agent 1 to play H
- In zero-sum games, non-credible threats cannot occur
  - Any move M that hurts 2 will help 1
  - If agent 1 has an opportunity to play M, it would be rational to play it

8. Game Tree Terminology
- Root node: where the game starts
- Max (or Min) node: a node where it's Max's (or Min's) move
  - Usually we draw Max nodes as squares, Min nodes as circles
- A node's children: the possible next nodes
  - children(h) = {σ(h,a) : a ∈ χ(h)}
- Terminal node: a node where the game ends
[Figure: a game tree with alternating layers of Max nodes, Min nodes, and terminal nodes]

9. Number of Nodes
- Let b = maximum branching factor
- Let D = height of the tree (maximum depth of any terminal node)
- If D is even and the root is a Max node, then
  - The number of Max nodes is 1 + b² + b⁴ + ⋯ + b^(D−2) = O(b^D)
  - The number of Min nodes is b + b³ + b⁵ + ⋯ + b^(D−1) = O(b^D)
- What if D is odd?
[Figure: the same game tree of alternating Max, Min, and terminal layers]

10. Number of Pure Strategies
- Pure strategy for Max: at every Max node, choose one branch
- O(b^D) Max nodes, with b choices at each of them ⇒ O(b^(b^D)) pure strategies
  - In the following tree, how many pure strategies for Max?
- Many of them are equivalent (they differ only at unreachable nodes)
  - How many distinct pure strategies are there?
- What about Min?
[Figure: the same game tree of alternating Max, Min, and terminal layers]

11. Number of Distinct Pure Strategies
- At every reachable Max node, choose one branch
  - Number of reachable Max nodes ≤ b⁰ + b¹ + b² + ⋯ + b^((D−2)/2) = O(b^(D/2))
- b choices at each of them ⇒ O(b^(b^(D/2))) distinct pure strategies
- Again, what about Min?
[Figure: the same game tree of alternating Max, Min, and terminal layers]

12. Finding the Minimax Strategy
- Brute-force way to find the minimax strategy for Max:
  - Construct the sets S and T of all distinct strategies for Max and Min, then choose s* = arg max_{s∈S} min_{t∈T} u(s,t)
- Complexity analysis:
  - Need to construct and store O(b^(b^(D/2))) distinct strategies
  - Each distinct strategy has size O(b^(D/2))
  - Thus the space complexity is O(b^(D/2) · b^(b^(D/2)))
  - The time complexity is slightly worse
- But there's an easier way to find the minimax strategy
- Notation: v(h) = minimax value of the subgame at node h; if h is terminal then v(h) = u(h)

13. Minimax
- Backward-induction algorithm (Chapter 4) for 2-player zero-sum games
  - Returns v(h); it can easily be modified to return both v(h) and a strategy

function Minimax(h)
  if h ∈ Z then return v(h)
  else if ρ(h) = Max then return max{Minimax(σ(h,a)) : a ∈ χ(h)}
  else return min{Minimax(σ(h,a)) : a ∈ χ(h)}

[Figure: a two-ply example tree; Max chooses among a1, a2, a3, Min replies; the root's minimax value is 3]
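As a concrete illustration, here is a minimal runnable sketch of the Minimax pseudocode in Python. The tree encoding is an assumption for illustration, not from the slides: a terminal node is just its utility for Max, and an internal node is a pair ("max", children) or ("min", children).

```python
# Minimal sketch of the Minimax pseudocode, under an assumed toy encoding:
# a terminal is a number (its utility to Max); an internal node is
# ("max", children) or ("min", children).

def minimax(h):
    if isinstance(h, (int, float)):   # terminal node: return its utility
        return h
    player, children = h
    values = [minimax(c) for c in children]
    return max(values) if player == "max" else min(values)

# A two-ply tree like the one in the figure (Russell & Norvig's example):
tree = ("max", [("min", [3, 12, 8]), ("min", [2, 4, 6]), ("min", [14, 5, 2])])
```

On this tree, each Min node takes the minimum of its leaves (3, 2, 2) and the root takes the maximum, giving the minimax value 3.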

14. Complexity Analysis
- Space complexity = (maximum path length) × (space needed to store the path) = O(bD)
- Time complexity = size of the game tree = O(b^D)
  - where b = branching factor, D = height of the game tree
- This is a lot better than O(b^(D/2) · b^(b^(D/2)))
- But it still isn't good enough for games like chess
  - b ≈ 35, D ≈ 100 for reasonable chess games
  - 35^100 ≈ 10^154 nodes, vastly more than the number of particles in the universe (roughly 10^87): there's no way to examine every node

15. Limited-Depth Minimax (Wiener, 1948)

function LD-Minimax(h, d)
  if h ∈ Z then return v(h)
  else if d = 0 then return e(h)
  else if ρ(h) = Max then return max_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)
  else return min_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)

- The minimax algorithm with an upper bound d on the search depth
  - e(h) is a static evaluation function: it returns an estimate of v(h)
- Returns an approximation of v(h)
  - If d ≥ the height of node h, it returns v(h) exactly
- Space complexity = O(min(bD, bd)), where D = height of node h
- Time complexity = O(min(b^D, b^d))

16. Evaluation Functions
- e(h) is often a weighted sum of features
  - e(h) = w_1 f_1(h) + w_2 f_2(h) + ⋯ + w_n f_n(h)
- E.g., in chess:
  - 1·(white pawns − black pawns) + 3·(white knights − black knights) + ⋯
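A weighted-sum evaluation function is just a dot product of weights and feature values. The sketch below is illustrative: the two material features and their weights (1 for pawns, 3 for knights, as on the slide) are the only assumptions.

```python
# Sketch of a weighted-sum static evaluation e(h) = w1*f1(h) + ... + wn*fn(h).
# The chess material features below are illustrative assumptions.

def evaluate(features, weights):
    """Weighted sum of feature values."""
    return sum(w * f for w, f in zip(weights, features))

# Features: (pawn difference, knight difference), weighted 1 and 3.
# White is up one pawn but down one knight:
e = evaluate(features=[1, -1], weights=[1, 3])   # 1*1 + 3*(-1) = -2
```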

17. How to Use Limited-Depth Minimax

function LD-choice(h, d)
  if ρ(h) = Max then return arg max_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)
  else return arg min_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)

function LD-Minimax(h, d)
  if h ∈ Z then return v(h)
  else if d = 0 then return e(h)
  else if ρ(h) = Max then return max_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)
  else return min_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)

- LD-choice only returns a single action; call it at every move
  - Why?

18. Exact Values for e Don't Matter
[Figure: two game trees whose leaf evaluations differ by a monotonic transformation but that produce the same move choice]
- Behavior is preserved under any monotonic transformation of e
  - Only the order matters

19. Multiplayer Games
- The Max^n algorithm: backward induction for n players, with cutoff depth d and evaluation function e
- Each node evaluation is a payoff profile v, where v[i] is player i's payoff
- Approximates the SPE payoff profile
  - Exact if d ≥ the height of h

function Maxn(h, d)
  if h ∈ Z then return u(h)
  if d = 0 then return e(h)
  V = {Maxn(σ(h,a), d−1) : a ∈ χ(h)}
  return arg max_{v∈V} v[ρ(h)]

function Maxn-choice(h, d)
  return arg max_{a∈χ(h)} Maxn(σ(h,a), d−1)[ρ(h)]

[Figure: a three-player tree; at player 1's move the child profiles are (2,4,4), (3,5,2), (4,5,1), so player 1 moves toward (4,5,1); the leaf profiles include (2,4,4), (6,3,1), (5,2,2), (3,5,2), (4,5,1), (1,4,5)]

20. Multiplayer Games
- The Paranoid algorithm: compute an approximate maxmin value for player i, with cutoff depth d and evaluation function e
  - The exact maxmin value if d ≥ h's height
- Compute the payoff for i, assuming i wants to maximize it and the others all want to minimize it

function Paranoid(i, h, d)
  if h ∈ Z then return u(h)[i]
  if d = 0 then return e(h)[i]
  if ρ(h) = i then return max_{a∈χ(h)} Paranoid(i, σ(h,a), d−1)
  else return min_{a∈χ(h)} Paranoid(i, σ(h,a), d−1)

function Paranoid-choice(i, h, d)
  if ρ(h) = i then return arg max_{a∈χ(h)} Paranoid(i, σ(h,a), d−1)
  else return error

[Figure: the same three-player tree, evaluated for player 1; player 1 moves toward the child with the highest paranoid value]

21. Discussion
- Neither the Max^n algorithm nor the Paranoid algorithm has been entirely successful
- This is partly due to dynamic relationships among the players:
  - Human players may hold grudges
  - Informal alliances may form and dissolve over time; e.g., the players who are behind may gang up on the player who's winning
- These relationships can greatly influence the players' strategies
  - Max^n and Paranoid don't model these relationships
- To play well in a multi-player game, ways are needed to
  - Decide when/whether to cooperate with others
  - Deduce each player's attitude toward the other players

22. Pruning
- Let's go back to 2-player zero-sum games
  - Minimax and LD-minimax both examine nodes that don't need to be examined
[Figure: Max node a over Min node b, whose children c, d, e give v(b) = 3]

23. Pruning
- b is better for Max than f is
- If Max is rational, then Max will never choose f
- So don't examine any more nodes below f
  - They can't affect v(a)
[Figure: f's first child g has value 2, so v(f) ≤ 2 < v(b) = 3; f's remaining children are crossed out]

24. Pruning
- We don't yet know whether h is better or worse than b
[Figure: h's first child i has value 14, so all we know is v(h) ≤ 14]

25. Pruning
- We still don't know whether h is better or worse than b
[Figure: h's second child j has value 5, so v(h) ≤ 5]

26. Pruning
- h is worse than b
  - We don't need to examine any more nodes below h
- v(a) = 3
[Figure: h's third child k has value 2, so v(h) ≤ 2 < v(b) = 3; the root's value is v(a) = 3]

27. Alpha Cutoff
- Squares are Max nodes, circles are Min nodes
- Let α = max(a, b, c), and suppose d < α
- To reach s, the game must go through p, q, and r
- By moving elsewhere at one of those nodes, Max can get v ≥ α
- If the game ever reaches node s, then Min can achieve v ≤ d < what Max can get elsewhere
  - Max will never let that happen
  - We don't need to know anything more about s
- What if d = α?
[Figure: Max nodes p, q, r along a path, with alternative subtrees of values a, b, c, leading to Min node s where v ≤ d]

28. Beta Cutoff
- Squares are Max nodes, circles are Min nodes
- Let β = min(a, b, c), and suppose d > β
- To reach s, the game must go through p, q, and r
- By moving elsewhere at one of those nodes, Min can achieve v ≤ β
- If the game ever reaches node s, then Max can achieve v ≥ d > what Min can get elsewhere
  - Min will never let that happen
  - We don't need to know anything more about s
- What if d = β?
[Figure: Min nodes p, q, r along a path, with alternative subtrees of values a, b, c, leading to Max node s where v ≥ d]

29. Alpha-Beta Pruning

function Alpha-Beta(h, d, α, β)
  if h ∈ Z then return u(h)
  else if d = 0 then return e(h)
  else if ρ(h) = Max then
    v ← −∞
    for every a ∈ χ(h) do
      v ← max(v, Alpha-Beta(σ(h,a), d−1, α, β))
      if v ≥ β then return v
      else α ← max(α, v)
  else
    v ← ∞
    for every a ∈ χ(h) do
      v ← min(v, Alpha-Beta(σ(h,a), d−1, α, β))
      if v ≤ α then return v
      else β ← min(β, v)
  return v

[Figure: an example tree with nodes a, b, c, d, e, f, g, h, i, j, k, l, m, used as a running trace in the following slides]
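Here is a runnable Python sketch of the Alpha-Beta pseudocode. For simplicity it searches to full depth (no depth cutoff or e(h)), and it assumes a toy tree encoding that is not from the slides: a terminal is a number, and an internal node is ("max", children) or ("min", children).

```python
import math

# Sketch of Alpha-Beta pruning on an assumed toy encoding: a terminal is a
# number; an internal node is ("max", children) or ("min", children).
# The depth cutoff and static evaluation e(h) are omitted for brevity.

def alpha_beta(h, alpha=-math.inf, beta=math.inf):
    if isinstance(h, (int, float)):       # terminal: return its utility
        return h
    player, children = h
    if player == "max":
        v = -math.inf
        for c in children:
            v = max(v, alpha_beta(c, alpha, beta))
            if v >= beta:                 # beta cutoff
                return v
            alpha = max(alpha, v)
    else:
        v = math.inf
        for c in children:
            v = min(v, alpha_beta(c, alpha, beta))
            if v <= alpha:                # alpha cutoff
                return v
            beta = min(beta, v)
    return v

tree = ("max", [("min", [3, 12, 8]), ("min", [2, 4, 6]), ("min", [14, 5, 2])])
```

On this tree, alpha_beta returns the same value as plain minimax (3), but the cutoffs let it skip some leaves: after the second Min node's first leaf (2 ≤ α = 3), the rest of that subtree is pruned.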

30. Alpha-Beta Pruning (trace)
[Same pseudocode as the previous slide. Trace: b evaluates to 7, so at the root a, α ← 7]

31. Alpha-Beta Pruning (trace continued)
[Trace: the search descends into the subtree under c]

32. Alpha-Beta Pruning (trace continued)
[Trace: g evaluates to 5]

33. Alpha-Beta Pruning (trace continued)
[Trace: h evaluates to −3, so v(f) = 5]

34. Alpha-Beta Pruning (trace continued)
[Trace: an alpha cutoff occurs: a Min node's value is at most 5 < α = 7, so its remaining children are pruned]

35. Alpha-Beta Pruning (trace continued)
[Trace: the search moves on past the pruned subtree]

36. Alpha-Beta Pruning (trace continued)
[Trace: i evaluates to 0]

37. Alpha-Beta Pruning (trace continued)
[Trace: k evaluates to 8, so at m, β ← 8]

38. Alpha-Beta Pruning (trace continued)
[Trace: a beta cutoff occurs at m: l evaluates to 9 ≥ β = 8, so m's remaining children are pruned]

39. Alpha-Beta Pruning (trace continued)
[Trace: the value 8 propagates upward, and α is updated to 8 below c]

40. Alpha-Beta Pruning (trace continued)
[Trace: the root's α is updated from 7 to 8, and the search finishes]

41. Properties of Alpha-Beta
- Alpha-beta pruning reasons about which computations are relevant
  - A form of metareasoning
Theorem:
- If the value returned by Minimax(h, d) is in [α, β], then Alpha-Beta(h, d, α, β) returns the same value
- If the value returned by Minimax(h, d) is ≤ α, then Alpha-Beta(h, d, α, β) returns a value ≤ α
- If the value returned by Minimax(h, d) is ≥ β, then Alpha-Beta(h, d, α, β) returns a value ≥ β
Corollary: Alpha-Beta(h, d, −∞, ∞) returns the same value as Minimax(h, d); Alpha-Beta(h, ∞, −∞, ∞) returns v(h)

42. Node Ordering
- Deeper lookahead (larger d) usually gives better decisions
  - There are pathological games where it doesn't, but those are rare
- Compared to LD-minimax, how much farther ahead can Alpha-Beta look?
- Best case:
  - The children of Max nodes are searched in greatest-value-first order, and the children of Min nodes in least-value-first order
  - Alpha-Beta's time complexity is O(b^(d/2)): this doubles the solvable depth
- Worst case:
  - The children of Max nodes are searched in least-value-first order, and the children of Min nodes in greatest-value-first order
  - Like LD-minimax, Alpha-Beta visits all nodes of depth ≤ d: time complexity O(b^d)

43. Node Ordering
- How to get closer to the best case:
  - Every time you expand a state s, apply e to its children
  - When it's Max's move, sort the children in order of largest e first
  - When it's Min's move, sort the children in order of smallest e first
- Suppose we have 100 seconds and can explore 10^4 nodes/second
  - That's 10^6 nodes per move
  - Put this into the form b^(d/2): 10^6 ≈ 35^(8/2)
  - In the best case, Alpha-Beta reaches depth 8: a pretty good chess program
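The sorting step above is a one-liner. The sketch below assumes the toy setting where children can be passed straight to the static evaluation e; it is illustrative, not code from the slides.

```python
# Sketch of the move-ordering heuristic: sort each node's children by the
# static evaluation e before recursing, so Alpha-Beta is closer to its
# best case. The encoding (children passed directly to e) is an assumption.

def order_children(children, player, e):
    """Largest e first at Max nodes, smallest e first at Min nodes."""
    return sorted(children, key=e, reverse=(player == "max"))

# With leaves that evaluate to themselves:
ordered = order_children([3, 12, 8], "max", lambda c: c)   # [12, 8, 3]
```

Plugging order_children into the Alpha-Beta loop changes which cutoffs fire, but not the returned value.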

44. Other Modifications
Several other modifications can improve the accuracy or the computation time:
- Quiescence search and biasing
- Transposition tables
- Thinking on the opponent's time
- Table lookup of book moves
- Iterative deepening
- Forward pruning

45. Quiescence Search and Biasing
- In a game like checkers or chess, the evaluation is based largely on material (pieces)
  - The evaluation is likely to be inaccurate if there are pending captures
- Search deeper, to reach a position where there aren't pending captures
  - Evaluations will be more accurate there
- That creates another problem:
  - You're searching some paths to an even depth and others to an odd depth
  - Paths that end just after your opponent's move will look worse than paths that end just after your move
- To compensate, add or subtract a number called the biasing factor

46. Transposition Tables
- Often there are multiple paths to the same state (i.e., the state space is really a graph rather than a tree)
- Idea:
  - When you compute a state s's minimax value, store it in a hash table
  - If you visit s again, retrieve its value rather than computing it again
- The hash table is called a transposition table
- Problem: there are too many states to store all of them
  - Store some, rather than all
  - Try to store the ones you're most likely to need
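The idea can be sketched as memoized minimax. This sketch assumes hashable nested-tuple states (not from the slides); real programs hash positions with Zobrist keys and bound the table's size, as the slide notes.

```python
# Sketch of a transposition table: cache minimax values by state.
# States here are hashable nested tuples (an assumption): a terminal is a
# number, an internal node is ("max", children) or ("min", children).

table = {}   # the transposition table: state -> minimax value

def minimax_tt(h):
    if isinstance(h, (int, float)):
        return h
    if h in table:                      # transposition-table hit
        return table[h]
    player, children = h
    vals = [minimax_tt(c) for c in children]
    v = max(vals) if player == "max" else min(vals)
    table[h] = v                        # store for later visits
    return v

tree = ("max", (("min", (3, 12, 8)), ("min", (2, 4, 6)), ("min", (14, 5, 2))))
```

An unbounded dict stands in for the bounded replacement scheme a real program would use.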

47. Thinking on the Opponent's Time
- Suppose you're at node a
  - Use alpha-beta to estimate v(b) and v(c)
  - c looks better, so move there
- Consider your estimates of v(f) and v(g)
  - They suggest your opponent is likely to move to f
  - While waiting for the opponent to move, start an alpha-beta search below f
- If the opponent does move to f, then you've already done a lot of the work of figuring out your next move
[Figure: at node a, children b and c have values 3 and 8; below c, f has value 8 and g has value 17]

48. Book Moves
- In some games, experts have spent lots of time analyzing openings
  - Sequences of moves one might play at the start of the game
  - Best responses to those sequences
- Some of these are cataloged in standard reference works
  - e.g., the Encyclopaedia of Chess Openings
- Store these in a lookup table
  - Respond almost immediately, as long as the opponent sticks to a sequence that's in the book
- A technique humans can use when playing against such a program:
  - Deliberately make a move that isn't in the book
  - This may weaken the human's position, but the computer will (i) start taking longer and (ii) stop playing as well

49. Iterative Deepening
- How deeply should you search a game tree? When you call Alpha-Beta(h, d, −∞, ∞), what should you use for d?
  - Small d ⇒ you don't make as good a decision
  - Large d ⇒ you run out of time without knowing what move to make
- Solution: iterative deepening

  for d = 1 by 1 until you run out of time:
    m ← the move returned by Alpha-Beta(h, d, −∞, ∞)

- Why this works:
  - The time complexity is O(b¹ + b² + ⋯ + b^d) = O(b^d)
  - For large b, each iteration takes much more time than all of the previous iterations put together
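The loop above can be sketched as follows. The helper best_move_at_depth is a stand-in assumption for a depth-limited Alpha-Beta that returns a move; it is not code from the slides.

```python
import time

# Sketch of the iterative-deepening loop: keep deepening until a time budget
# runs out, always keeping the move from the last *completed* search.
# `best_move_at_depth(h, d)` is an assumed stand-in for a depth-d Alpha-Beta
# search that returns a move.

def iterative_deepening(h, best_move_at_depth, budget_seconds, max_depth=64):
    deadline = time.monotonic() + budget_seconds
    move = None
    for d in range(1, max_depth + 1):
        if time.monotonic() >= deadline:
            break                          # out of time: keep the last result
        move = best_move_at_depth(h, d)    # result of a completed depth-d search
    return move
```

A production version would also abort a search in progress at the deadline; this sketch only checks the clock between depths.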

50. Forward Pruning
- Tapered search:
  - Instead of looking at all of a node's children, just look at the n best ones (the n highest e(h) values)
  - Decrease the value of n as you go deeper in the tree
  - Drawback: this may exclude an important move at a low level of the tree
[Figure: a node with several children; with n = 3, the children with the lowest evaluations are ignored]
- Marginal forward pruning:
  - Ignore every child h such that e(h) is smaller than the current best value from the nodes you've already visited
  - Not reliable; it should be avoided

51. Forward Pruning
- Until the mid-1970s, most chess programs tried to search the same way humans think
  - Use extensive chess knowledge at each node to select a few plausible moves, and prune the others
  - These programs had serious tactical shortcomings
- Brute-force search programs did better, and dominated computer chess for about 20 years
- Early 1990s: development of some forward-pruning techniques that worked well
  - e.g., null-move pruning
- Today most chess programs use some kind of forward pruning
  - Null-move pruning is one of the most popular

52. Game-Tree Search in Practice
- Checkers: In 1994, Chinook ended the 40-year reign of human world champion Marion Tinsley
  - Tinsley withdrew for health reasons, and died a few months later
- In 2007, checkers was solved: with perfect play, it's a draw
  - This took calculations spread over 18 years; the search space has about 5 × 10^20 positions
- Chess: In 1997, Deep Blue defeated Garry Kasparov in a six-game match
  - Deep Blue searched 200 million positions per second
  - It used very sophisticated evaluation, and undisclosed methods for extending some lines of search up to 40 ply
- Othello: human champions don't compete against computers: the computers are too good
- Go: in 2006, good amateurs could beat the best go programs, even with a 9-stone handicap
  - Go programs have improved a lot during the past 7 years

53. Rules of Go (Abbreviated)
- Go board: 361 locations (intersections on a 19×19 grid)
- Black and White take turns; Black has the first move
- Each move consists of placing a stone at an unoccupied location
  - You just put stones on the board; you don't move them around
- Adjacent stones of the same color are called a string
  - Liberties are the empty locations next to the string
  - A string is removed if its number of liberties is 0
- Score: territories (the number of occupied or surrounded locations)

54. Why Go is Difficult for Computers
- A game tree's size grows exponentially with both its depth and its branching factor
- The game tree for go:
  - branching factor ≈ 200
  - game length ≈ 250 to 300 moves
  - number of paths in the game tree ≈ 200^250 to 200^300, i.e., roughly 10^575 to 10^690
- Much too big for a normal game-tree search
[Figure: small trees with b = 2, b = 3, b = 4, illustrating growth with the branching factor]

55. Why Go is Difficult for Computers
- Writing an evaluation function for chess:
  - Mainly piece count, plus some positional considerations: isolated/doubled pawns, rooks on open files (columns), pawns in the center of the board, etc.
  - As the game progresses, pieces get removed ⇒ evaluation gets easier
- Writing an evaluation function for Go is much more complicated:
  - whether or not a group is alive
  - which stones can be connected to one another
  - the extent to which a position has influence, or can be attacked
- As the game progresses, pieces get added ⇒ evaluation gets more complicated

56. Game-Tree Search in Go
- During the past five years, go programs have gotten much better
- Main reason: Monte Carlo roll-outs
- Basic idea:
  - Generate a list of potential moves A = {a_1, …, a_n}
  - For each move a_i in A: starting at σ(h, a_i), generate a set of random games G_i in which the two players make all their moves at random
  - Choose the move a_i that produced the best results
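The flat roll-out idea can be sketched in a few lines. Everything game-specific here is an assumption: the rollout function stands in for "play a random game starting at σ(h, a_i) and return its payoff," and the toy two-move game exists only to exercise the code.

```python
import random

# Sketch of flat Monte Carlo roll-outs: for each candidate move, play many
# random games and pick the move with the best average result.
# `rollout(move, rng)` is an assumed stand-in for playing one random game
# after that move and returning its payoff.

def monte_carlo_choice(moves, rollout, n_games=200, rng=random):
    def average_payoff(a):
        return sum(rollout(a, rng) for _ in range(n_games)) / n_games
    return max(moves, key=average_payoff)

# Toy illustration: one move wins 80% of random playouts, the other 30%.
def rollout(move, rng):
    p_win = 0.8 if move == "good" else 0.3
    return 1 if rng.random() < p_win else 0

rng = random.Random(0)   # seeded for reproducibility
best = monte_carlo_choice(["bad", "good"], rollout, n_games=500, rng=rng)
```

With enough playouts the sample averages separate the moves reliably; UCT (below in the slides) improves on this by spending more playouts on the more promising moves.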

57. Multi-Arm Bandit
- A statistical model of sequential experiments
  - The name comes from a traditional slot machine (a one-armed bandit)
- Multiple actions
  - Each action provides a reward drawn from a probability distribution associated with that specific action
  - Objective: maximize the expected utility of a sequence of actions
- Exploitation vs. exploration dilemma:
  - Exploitation: choosing an action that you already know about, because you think it's likely to give you a high reward
  - Exploration: choosing an action that you don't know much about, in hopes that it may produce a better reward than the actions you already know about

58. UCB (Upper Confidence Bound) Algorithm
- Let
  - x̄_i = average reward you've gotten from arm i
  - t_i = number of times you've tried arm i
  - t = Σ_i t_i
- loop:
  - if there are one or more arms that have not been played, then play one of them
  - else play the arm i that has the highest value of x̄_i + √(2(log t)/t_i)
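Here is a small runnable sketch of the UCB rule above. The class only tracks the statistics and picks arms; generating rewards is left to the caller, and the class itself is an illustrative assumption, not code from the slides.

```python
import math

# Sketch of the UCB rule: play each arm once, then always play the arm
# maximizing x̄_i + sqrt(2 * ln(t) / t_i).

class UCB:
    def __init__(self, n_arms):
        self.counts = [0] * n_arms     # t_i: plays of each arm
        self.means = [0.0] * n_arms    # x̄_i: average reward of each arm

    def select(self):
        for i, c in enumerate(self.counts):
            if c == 0:                 # unplayed arms first
                return i
        t = sum(self.counts)
        scores = [m + math.sqrt(2 * math.log(t) / c)
                  for m, c in zip(self.means, self.counts)]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1          # incremental average update
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
```

The exploration bonus √(2 ln t / t_i) shrinks as an arm is played more, so under-explored arms eventually get tried again even when their average looks poor.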

59. UCT (UCB for Trees)
- An adaptation of UCB for game-tree search in go
  - First used in Mogo in 2006
  - Mogo won the go tournament at the 2007 Computer Olympiad, and several other computer go tournaments
  - Mogo won one (of 3) 9x9 blitz go games against a human professional
- UCT is now used in most computer go programs
- I've had some trouble finding a clear and unambiguous description of UCT
  - But I think the basic algorithm is as follows

60. Basic UCT Algorithm
- Start with an empty tree T        (T = {nodes we've evaluated})
- loop:
  - h ← the start of the game
  - while h ∈ T do        (this loop ends when we reach a node we haven't evaluated)
    - if h has one or more children that aren't already in T, then h ← any child of h that isn't in T
    - else h ← arg max_{k∈children(h)} x̄_k + √(2(log t_h)/t_k)
  - play a random game starting at h; v ← the game's payoff profile
  - add h to T; t_h ← 1; x̄_h ← v[ρ(h)]
  - for every g ∈ ancestors(h) do
    - x̄_g ← (t_g · x̄_g + v[ρ(g)]) / (t_g + 1)        (i.e., x̄_g = the average payoff to ρ(g) of all games below g)
    - t_g ← t_g + 1

61. Properties
- As the number of iterations goes to infinity, the evaluation x̄_h of the root node converges to its minimax value
- Question:
  - How can this be, since x̄_h is the average of the values we've gotten below h?

62. Properties
- As the number of iterations goes to infinity, the evaluation x̄_h of the root node converges to its minimax value
- Question:
  - How can this be, since x̄_h is the average of the values we've gotten below h?
- Look at how h is chosen:

  Start with an empty tree T
  loop:
    h ← the start of the game
    while h ∈ T do
      if h has one or more children that aren't already in T, then h ← any child of h that isn't in T
      else h ← arg max_{k∈children(h)} x̄_k + √(2(log t_h)/t_k)
    …

63. Some Improvements
- As written, UCT won't evaluate any nodes below h until all of h's siblings have been visited
  - Not good: we want to explore some subtrees more deeply than others
- So, replace this:

    if h has one or more children that aren't already in T, then h ← any child of h that isn't in T
    else h ← arg max_{k∈children(h)} x̄_k + √(2(log t_h)/t_k)

  with this:

    h ← arg max_{k∈children(h)} y_k

  where y_k = x̄_k + √(2(log t_h)/t_k) if k is already in T, and y_k = some constant c otherwise

64. Some Improvements
- UCB assumes all the levers are independent
- Similarly, UCT assumes the values of sibling nodes are independent
  - That's usually not true in practice
- So, modify x̄_h using some information computed from x̄_h's siblings and grandparent

65. Two-Player Zero-Sum Stochastic Games
- Example: Backgammon
- Two agents take turns
- At each turn, the set of available moves depends on the results of rolling the dice
  - Each die specifies how far to move one of your pieces (except when you roll doubles)
  - If your piece would land on a location with 2 or more of the opponent's pieces, you can't move there
  - If your piece lands on a location with 1 of the opponent's pieces, the opponent's piece must start over

66. Backgammon Game Tree
[Figure: alternating layers MAX, DICE, MIN, DICE, MAX, …, TERMINAL; each dice node has 21 distinct outcomes, e.g., 1-1 with probability 1/36, 1-2 with probability 1/18, …, 6-6 with probability 1/36]
- Players' moves have deterministic outcomes
- Dice rolls have stochastic outcomes

67. ExpectiMinimax

function ExpectiMinimax(h, d)
  if h ∈ Z then return v(h)
  else if d = 0 then return e(h)
  else if ρ(h) = Max then return max_{a∈χ(h)} ExpectiMinimax(σ(h,a), d−1)
  else if ρ(h) = Min then return min_{a∈χ(h)} ExpectiMinimax(σ(h,a), d−1)
  else return Σ_{a∈χ(h)} Pr[a | h] · ExpectiMinimax(σ(h,a), d−1)

- Returns the minimax expected utilities
  - Can be modified to return the actions
- Can also be modified to do α-β pruning
  - But it's more complicated and less effective than in deterministic games
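Here is a runnable sketch of the ExpectiMinimax pseudocode, searching to full depth. The tree encoding is an assumption for illustration: a terminal is a number, a deterministic node is ("max", children) or ("min", children), and a chance node is ("chance", [(probability, child), …]).

```python
# Sketch of ExpectiMinimax on an assumed toy encoding with chance nodes:
# ("chance", [(prob, child), ...]). Depth cutoff and e(h) are omitted.

def expectiminimax(h):
    if isinstance(h, (int, float)):
        return h
    kind, children = h
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    # chance node: probability-weighted average of the children's values
    return sum(p * expectiminimax(c) for p, c in children)

# A fair coin flip followed by Max's choice: value = 0.5*4 + 0.5*10 = 7
tree = ("chance", [(0.5, ("max", [2, 4])), (0.5, ("max", [0, 10]))])
```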

68. In Practice
- Dice rolls increase the branching factor
  - 21 possible rolls with 2 dice
  - Given the dice roll, about 20 legal moves on average (for some dice rolls it can be much higher)
  - At depth 4, the number of nodes is 20 × (21 × 20)³ ≈ 1.5 × 10⁹
  - As depth increases, the probability of reaching a given node shrinks, so the value of lookahead is diminished
- α-β pruning is much less effective
- TDGammon uses depth-2 search plus a very good evaluation function
  - World-champion level
  - The evaluation function was created automatically using a machine-learning technique called Temporal Difference learning, hence the TD in TDGammon

69. Summary
- Two-player zero-sum perfect-information games:
  - the maximin and minimax strategies are the same
  - we only need to look at pure strategies
  - we can do a game-tree search: minimax values, alpha-beta pruning
- In sufficiently complicated games, perfection is unattainable:
  - limited search depth, static evaluation function
  - Monte Carlo roll-outs
- Game-tree search can be modified for games in which there are stochastic outcomes


More information

Adversarial Search & Logic and Reasoning

Adversarial Search & Logic and Reasoning CSEP 573 Adversarial Search & Logic and Reasoning CSE AI Faculty Recall from Last Time: Adversarial Games as Search Convention: first player is called MAX, 2nd player is called MIN MAX moves first and

More information

Evaluation for Pacman. CS 188: Artificial Intelligence Fall Iterative Deepening. α-β Pruning Example. α-β Pruning Pseudocode.

Evaluation for Pacman. CS 188: Artificial Intelligence Fall Iterative Deepening. α-β Pruning Example. α-β Pruning Pseudocode. CS 188: Artificial Intelligence Fall 2008 Evaluation for Pacman Lecture 7: Expectimax Search 9/18/2008 [DEMO: thrashing, smart ghosts] Dan Klein UC Berkeley Many slides over the course adapted from either

More information

CS 188: Artificial Intelligence Fall 2008

CS 188: Artificial Intelligence Fall 2008 CS 188: Artificial Intelligence Fall 2008 Lecture 7: Expectimax Search 9/18/2008 Dan Klein UC Berkeley Many slides over the course adapted from either Stuart Russell or Andrew Moore 1 1 Evaluation for

More information

Alpha-Beta Pruning: Algorithm and Analysis

Alpha-Beta Pruning: Algorithm and Analysis Alpha-Beta Pruning: Algorithm and Analysis Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Introduction Alpha-beta pruning is the standard searching procedure used for solving

More information

Alpha-Beta Pruning: Algorithm and Analysis

Alpha-Beta Pruning: Algorithm and Analysis Alpha-Beta Pruning: Algorithm and Analysis Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Introduction Alpha-beta pruning is the standard searching procedure used for 2-person

More information

Alpha-Beta Pruning: Algorithm and Analysis

Alpha-Beta Pruning: Algorithm and Analysis Alpha-Beta Pruning: Algorithm and Analysis Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Introduction Alpha-beta pruning is the standard searching procedure used for 2-person

More information

CS 188 Introduction to Fall 2007 Artificial Intelligence Midterm

CS 188 Introduction to Fall 2007 Artificial Intelligence Midterm NAME: SID#: Login: Sec: 1 CS 188 Introduction to Fall 2007 Artificial Intelligence Midterm You have 80 minutes. The exam is closed book, closed notes except a one-page crib sheet, basic calculators only.

More information

Summary. Agenda. Games. Intelligence opponents in game. Expected Value Expected Max Algorithm Minimax Algorithm Alpha-beta Pruning Simultaneous Game

Summary. Agenda. Games. Intelligence opponents in game. Expected Value Expected Max Algorithm Minimax Algorithm Alpha-beta Pruning Simultaneous Game Summary rtificial Intelligence and its applications Lecture 4 Game Playing Search onstraint Satisfaction Problems From start state to goal state onsider constraints Professor Daniel Yeung danyeung@ieee.org

More information

CS 188: Artificial Intelligence Spring 2007

CS 188: Artificial Intelligence Spring 2007 CS 188: Artificial Intelligence Spring 2007 Lecture 8: Logical Agents - I 2/8/2007 Srini Narayanan ICSI and UC Berkeley Many slides over the course adapted from Dan Klein, Stuart Russell or Andrew Moore

More information

CS 570: Machine Learning Seminar. Fall 2016

CS 570: Machine Learning Seminar. Fall 2016 CS 570: Machine Learning Seminar Fall 2016 Class Information Class web page: http://web.cecs.pdx.edu/~mm/mlseminar2016-2017/fall2016/ Class mailing list: cs570@cs.pdx.edu My office hours: T,Th, 2-3pm or

More information

Notes on induction proofs and recursive definitions

Notes on induction proofs and recursive definitions Notes on induction proofs and recursive definitions James Aspnes December 13, 2010 1 Simple induction Most of the proof techniques we ve talked about so far are only really useful for proving a property

More information

Reinforcement Learning. Spring 2018 Defining MDPs, Planning

Reinforcement Learning. Spring 2018 Defining MDPs, Planning Reinforcement Learning Spring 2018 Defining MDPs, Planning understandability 0 Slide 10 time You are here Markov Process Where you will go depends only on where you are Markov Process: Information state

More information

IS-ZC444: ARTIFICIAL INTELLIGENCE

IS-ZC444: ARTIFICIAL INTELLIGENCE IS-ZC444: ARTIFICIAL INTELLIGENCE Lecture-07: Beyond Classical Search Dr. Kamlesh Tiwari Assistant Professor Department of Computer Science and Information Systems, BITS Pilani, Pilani, Jhunjhunu-333031,

More information

CS 4100 // artificial intelligence. Recap/midterm review!

CS 4100 // artificial intelligence. Recap/midterm review! CS 4100 // artificial intelligence instructor: byron wallace Recap/midterm review! Attribution: many of these slides are modified versions of those distributed with the UC Berkeley CS188 materials Thanks

More information

POLYNOMIAL SPACE QSAT. Games. Polynomial space cont d

POLYNOMIAL SPACE QSAT. Games. Polynomial space cont d T-79.5103 / Autumn 2008 Polynomial Space 1 T-79.5103 / Autumn 2008 Polynomial Space 3 POLYNOMIAL SPACE Polynomial space cont d Polynomial space-bounded computation has a variety of alternative characterizations

More information

Industrial Organization Lecture 3: Game Theory

Industrial Organization Lecture 3: Game Theory Industrial Organization Lecture 3: Game Theory Nicolas Schutz Nicolas Schutz Game Theory 1 / 43 Introduction Why game theory? In the introductory lecture, we defined Industrial Organization as the economics

More information

Minimax strategies, alpha beta pruning. Lirong Xia

Minimax strategies, alpha beta pruning. Lirong Xia Minimax strategies, alpha beta pruning Lirong Xia Reminder ØProject 1 due tonight Makes sure you DO NOT SEE ERROR: Summation of parsed points does not match ØProject 2 due in two weeks 2 How to find good

More information

An Analysis of Forward Pruning. to try to understand why programs have been unable to. pruning more eectively. 1

An Analysis of Forward Pruning. to try to understand why programs have been unable to. pruning more eectively. 1 Proc. AAAI-94, to appear. An Analysis of Forward Pruning Stephen J. J. Smith Dana S. Nau Department of Computer Science Department of Computer Science, and University of Maryland Institute for Systems

More information

Computing Minmax; Dominance

Computing Minmax; Dominance Computing Minmax; Dominance CPSC 532A Lecture 5 Computing Minmax; Dominance CPSC 532A Lecture 5, Slide 1 Lecture Overview 1 Recap 2 Linear Programming 3 Computational Problems Involving Maxmin 4 Domination

More information

Basic Game Theory. Kate Larson. January 7, University of Waterloo. Kate Larson. What is Game Theory? Normal Form Games. Computing Equilibria

Basic Game Theory. Kate Larson. January 7, University of Waterloo. Kate Larson. What is Game Theory? Normal Form Games. Computing Equilibria Basic Game Theory University of Waterloo January 7, 2013 Outline 1 2 3 What is game theory? The study of games! Bluffing in poker What move to make in chess How to play Rock-Scissors-Paper Also study of

More information

CSC242: Intro to AI. Lecture 7 Games of Imperfect Knowledge & Constraint Satisfaction

CSC242: Intro to AI. Lecture 7 Games of Imperfect Knowledge & Constraint Satisfaction CSC242: Intro to AI Lecture 7 Games of Imperfect Knowledge & Constraint Satisfaction What is This? 25" 20" 15" 10" 5" 0" Quiz 1 25" B 20" 15" 10" F D C A 5" F 0" Moral Many people cannot learn from lectures

More information

Playing Abstract games with Hidden States (Spatial and Non-Spatial).

Playing Abstract games with Hidden States (Spatial and Non-Spatial). Playing Abstract games with Hidden States (Spatial and Non-Spatial). Gregory Calbert, Hing-Wah Kwok Peter Smet, Jason Scholz, Michael Webb VE Group, C2D, DSTO. Report Documentation Page Form Approved OMB

More information

Computing Minmax; Dominance

Computing Minmax; Dominance Computing Minmax; Dominance CPSC 532A Lecture 5 Computing Minmax; Dominance CPSC 532A Lecture 5, Slide 1 Lecture Overview 1 Recap 2 Linear Programming 3 Computational Problems Involving Maxmin 4 Domination

More information

Lecture 1: March 7, 2018

Lecture 1: March 7, 2018 Reinforcement Learning Spring Semester, 2017/8 Lecture 1: March 7, 2018 Lecturer: Yishay Mansour Scribe: ym DISCLAIMER: Based on Learning and Planning in Dynamical Systems by Shie Mannor c, all rights

More information

Properties of Forward Pruning in Game-Tree Search

Properties of Forward Pruning in Game-Tree Search Properties of Forward Pruning in Game-Tree Search Yew Jin Lim and Wee Sun Lee School of Computing National University of Singapore {limyewji,leews}@comp.nus.edu.sg Abstract Forward pruning, or selectively

More information

Evolutionary Computation: introduction

Evolutionary Computation: introduction Evolutionary Computation: introduction Dirk Thierens Universiteit Utrecht The Netherlands Dirk Thierens (Universiteit Utrecht) EC Introduction 1 / 42 What? Evolutionary Computation Evolutionary Computation

More information

Final. Introduction to Artificial Intelligence. CS 188 Spring You have approximately 2 hours and 50 minutes.

Final. Introduction to Artificial Intelligence. CS 188 Spring You have approximately 2 hours and 50 minutes. CS 188 Spring 2014 Introduction to Artificial Intelligence Final You have approximately 2 hours and 50 minutes. The exam is closed book, closed notes except your two-page crib sheet. Mark your answers

More information

CS599 Lecture 1 Introduction To RL

CS599 Lecture 1 Introduction To RL CS599 Lecture 1 Introduction To RL Reinforcement Learning Introduction Learning from rewards Policies Value Functions Rewards Models of the Environment Exploitation vs. Exploration Dynamic Programming

More information

1 Extensive Form Games

1 Extensive Form Games 1 Extensive Form Games De nition 1 A nite extensive form game is am object K = fn; (T ) ; P; A; H; u; g where: N = f0; 1; :::; ng is the set of agents (player 0 is nature ) (T ) is the game tree P is the

More information

Counting. 1 Sum Rule. Example 1. Lecture Notes #1 Sept 24, Chris Piech CS 109

Counting. 1 Sum Rule. Example 1. Lecture Notes #1 Sept 24, Chris Piech CS 109 1 Chris Piech CS 109 Counting Lecture Notes #1 Sept 24, 2018 Based on a handout by Mehran Sahami with examples by Peter Norvig Although you may have thought you had a pretty good grasp on the notion of

More information

CS885 Reinforcement Learning Lecture 7a: May 23, 2018

CS885 Reinforcement Learning Lecture 7a: May 23, 2018 CS885 Reinforcement Learning Lecture 7a: May 23, 2018 Policy Gradient Methods [SutBar] Sec. 13.1-13.3, 13.7 [SigBuf] Sec. 5.1-5.2, [RusNor] Sec. 21.5 CS885 Spring 2018 Pascal Poupart 1 Outline Stochastic

More information

Scout, NegaScout and Proof-Number Search

Scout, NegaScout and Proof-Number Search Scout, NegaScout and Proof-Number Search Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Introduction It looks like alpha-beta pruning is the best we can do for a generic searching

More information

Deep Reinforcement Learning. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 19, 2017

Deep Reinforcement Learning. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 19, 2017 Deep Reinforcement Learning STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 19, 2017 Outline Introduction to Reinforcement Learning AlphaGo (Deep RL for Computer Go)

More information

Math Models of OR: Branch-and-Bound

Math Models of OR: Branch-and-Bound Math Models of OR: Branch-and-Bound John E. Mitchell Department of Mathematical Sciences RPI, Troy, NY 12180 USA November 2018 Mitchell Branch-and-Bound 1 / 15 Branch-and-Bound Outline 1 Branch-and-Bound

More information

Scout and NegaScout. Tsan-sheng Hsu.

Scout and NegaScout. Tsan-sheng Hsu. Scout and NegaScout Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Abstract It looks like alpha-beta pruning is the best we can do for an exact generic searching procedure.

More information

4: Dynamic games. Concordia February 6, 2017

4: Dynamic games. Concordia February 6, 2017 INSE6441 Jia Yuan Yu 4: Dynamic games Concordia February 6, 2017 We introduce dynamic game with non-simultaneous moves. Example 0.1 (Ultimatum game). Divide class into two groups at random: Proposers,

More information

Notes on Coursera s Game Theory

Notes on Coursera s Game Theory Notes on Coursera s Game Theory Manoel Horta Ribeiro Week 01: Introduction and Overview Game theory is about self interested agents interacting within a specific set of rules. Self-Interested Agents have

More information

Nondeterministic finite automata

Nondeterministic finite automata Lecture 3 Nondeterministic finite automata This lecture is focused on the nondeterministic finite automata (NFA) model and its relationship to the DFA model. Nondeterminism is an important concept in the

More information

Basics of reinforcement learning

Basics of reinforcement learning Basics of reinforcement learning Lucian Buşoniu TMLSS, 20 July 2018 Main idea of reinforcement learning (RL) Learn a sequential decision policy to optimize the cumulative performance of an unknown system

More information

CS221 Practice Midterm

CS221 Practice Midterm CS221 Practice Midterm Autumn 2012 1 ther Midterms The following pages are excerpts from similar classes midterms. The content is similar to what we ve been covering this quarter, so that it should be

More information

COMP3702/7702 Artificial Intelligence Week1: Introduction Russell & Norvig ch.1-2.3, Hanna Kurniawati

COMP3702/7702 Artificial Intelligence Week1: Introduction Russell & Norvig ch.1-2.3, Hanna Kurniawati COMP3702/7702 Artificial Intelligence Week1: Introduction Russell & Norvig ch.1-2.3, 3.1-3.3 Hanna Kurniawati Today } What is Artificial Intelligence? } Better know what it is first before committing the

More information

Solving Extensive Form Games

Solving Extensive Form Games Chapter 8 Solving Extensive Form Games 8.1 The Extensive Form of a Game The extensive form of a game contains the following information: (1) the set of players (2) the order of moves (that is, who moves

More information

A Polynomial-time Nash Equilibrium Algorithm for Repeated Games

A Polynomial-time Nash Equilibrium Algorithm for Repeated Games A Polynomial-time Nash Equilibrium Algorithm for Repeated Games Michael L. Littman mlittman@cs.rutgers.edu Rutgers University Peter Stone pstone@cs.utexas.edu The University of Texas at Austin Main Result

More information

Satisfaction Equilibrium: Achieving Cooperation in Incomplete Information Games

Satisfaction Equilibrium: Achieving Cooperation in Incomplete Information Games Satisfaction Equilibrium: Achieving Cooperation in Incomplete Information Games Stéphane Ross and Brahim Chaib-draa Department of Computer Science and Software Engineering Laval University, Québec (Qc),

More information

1. Counting. Chris Piech and Mehran Sahami. Oct 2017

1. Counting. Chris Piech and Mehran Sahami. Oct 2017 . Counting Chris Piech and Mehran Sahami Oct 07 Introduction Although you may have thought you had a pretty good grasp on the notion of counting at the age of three, it turns out that you had to wait until

More information

Deep Reinforcement Learning

Deep Reinforcement Learning Martin Matyášek Artificial Intelligence Center Czech Technical University in Prague October 27, 2016 Martin Matyášek VPD, 2016 1 / 50 Reinforcement Learning in a picture R. S. Sutton and A. G. Barto 2015

More information

Administration. CSCI567 Machine Learning (Fall 2018) Outline. Outline. HW5 is available, due on 11/18. Practice final will also be available soon.

Administration. CSCI567 Machine Learning (Fall 2018) Outline. Outline. HW5 is available, due on 11/18. Practice final will also be available soon. Administration CSCI567 Machine Learning Fall 2018 Prof. Haipeng Luo U of Southern California Nov 7, 2018 HW5 is available, due on 11/18. Practice final will also be available soon. Remaining weeks: 11/14,

More information

Computational aspects of two-player zero-sum games Course notes for Computational Game Theory Section 4 Fall 2010

Computational aspects of two-player zero-sum games Course notes for Computational Game Theory Section 4 Fall 2010 Computational aspects of two-player zero-sum games Course notes for Computational Game Theory Section 4 Fall 010 Peter Bro Miltersen January 16, 011 Version 1. 4 Finite Perfect Information Games Definition

More information

CMU Lecture 12: Reinforcement Learning. Teacher: Gianni A. Di Caro

CMU Lecture 12: Reinforcement Learning. Teacher: Gianni A. Di Caro CMU 15-781 Lecture 12: Reinforcement Learning Teacher: Gianni A. Di Caro REINFORCEMENT LEARNING Transition Model? State Action Reward model? Agent Goal: Maximize expected sum of future rewards 2 MDP PLANNING

More information

Marks. bonus points. } Assignment 1: Should be out this weekend. } Mid-term: Before the last lecture. } Mid-term deferred exam:

Marks. bonus points. } Assignment 1: Should be out this weekend. } Mid-term: Before the last lecture. } Mid-term deferred exam: Marks } Assignment 1: Should be out this weekend } All are marked, I m trying to tally them and perhaps add bonus points } Mid-term: Before the last lecture } Mid-term deferred exam: } This Saturday, 9am-10.30am,

More information

Rollout-based Game-tree Search Outprunes Traditional Alpha-beta

Rollout-based Game-tree Search Outprunes Traditional Alpha-beta Journal of Machine Learning Research vol:1 8, 2012 Submitted 6/2012 Rollout-based Game-tree Search Outprunes Traditional Alpha-beta Ari Weinstein Michael L. Littman Sergiu Goschin aweinst@cs.rutgers.edu

More information

algorithms Alpha-Beta Pruning and Althöfer s Pathology-Free Negamax Algorithm Algorithms 2012, 5, ; doi: /a

algorithms Alpha-Beta Pruning and Althöfer s Pathology-Free Negamax Algorithm Algorithms 2012, 5, ; doi: /a Algorithms 01, 5, 51-58; doi:10.3390/a504051 Article OPEN ACCESS algorithms ISSN 1999-4893 www.mdpi.com/journal/algorithms Alpha-Beta Pruning and Althöfer s Pathology-Free Negamax Algorithm Ashraf M. Abdelbar

More information

COMP3702/7702 Artificial Intelligence Lecture 11: Introduction to Machine Learning and Reinforcement Learning. Hanna Kurniawati

COMP3702/7702 Artificial Intelligence Lecture 11: Introduction to Machine Learning and Reinforcement Learning. Hanna Kurniawati COMP3702/7702 Artificial Intelligence Lecture 11: Introduction to Machine Learning and Reinforcement Learning Hanna Kurniawati Today } What is machine learning? } Where is it used? } Types of machine learning

More information

Descriptive Statistics (And a little bit on rounding and significant digits)

Descriptive Statistics (And a little bit on rounding and significant digits) Descriptive Statistics (And a little bit on rounding and significant digits) Now that we know what our data look like, we d like to be able to describe it numerically. In other words, how can we represent

More information

Limits of Computation

Limits of Computation The real danger is not that computers will begin to think like men, but that men will begin to think like computers Limits of Computation - Sydney J. Harris What makes you believe now that I am just talking

More information

Principles of Computing, Carnegie Mellon University. The Limits of Computing

Principles of Computing, Carnegie Mellon University. The Limits of Computing The Limits of Computing Intractability Limits of Computing Announcement Final Exam is on Friday 9:00am 10:20am Part 1 4:30pm 6:10pm Part 2 If you did not fill in the course evaluations please do it today.

More information

Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE. Date: Thursday 17th May 2018 Time: 09:45-11:45. Please answer all Questions.

Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE. Date: Thursday 17th May 2018 Time: 09:45-11:45. Please answer all Questions. COMP 34120 Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE AI and Games Date: Thursday 17th May 2018 Time: 09:45-11:45 Please answer all Questions. Use a SEPARATE answerbook for each SECTION

More information

Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm

Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Michail G. Lagoudakis Department of Computer Science Duke University Durham, NC 2778 mgl@cs.duke.edu

More information

Quantum Games. Quantum Strategies in Classical Games. Presented by Yaniv Carmeli

Quantum Games. Quantum Strategies in Classical Games. Presented by Yaniv Carmeli Quantum Games Quantum Strategies in Classical Games Presented by Yaniv Carmeli 1 Talk Outline Introduction Game Theory Why quantum games? PQ Games PQ penny flip 2x2 Games Quantum strategies 2 Game Theory

More information

Chapter 4 Deliberation with Temporal Domain Models

Chapter 4 Deliberation with Temporal Domain Models Last update: April 10, 2018 Chapter 4 Deliberation with Temporal Domain Models Section 4.4: Constraint Management Section 4.5: Acting with Temporal Models Automated Planning and Acting Malik Ghallab, Dana

More information

Learning an Effective Strategy in a Multi-Agent System with Hidden Information

Learning an Effective Strategy in a Multi-Agent System with Hidden Information Learning an Effective Strategy in a Multi-Agent System with Hidden Information Richard Mealing Supervisor: Jon Shapiro Machine Learning and Optimisation Group School of Computer Science University of Manchester

More information

Computer Science 385 Analysis of Algorithms Siena College Spring Topic Notes: Limitations of Algorithms

Computer Science 385 Analysis of Algorithms Siena College Spring Topic Notes: Limitations of Algorithms Computer Science 385 Analysis of Algorithms Siena College Spring 2011 Topic Notes: Limitations of Algorithms We conclude with a discussion of the limitations of the power of algorithms. That is, what kinds

More information

CSC304 Lecture 5. Game Theory : Zero-Sum Games, The Minimax Theorem. CSC304 - Nisarg Shah 1

CSC304 Lecture 5. Game Theory : Zero-Sum Games, The Minimax Theorem. CSC304 - Nisarg Shah 1 CSC304 Lecture 5 Game Theory : Zero-Sum Games, The Minimax Theorem CSC304 - Nisarg Shah 1 Recap Last lecture Cost-sharing games o Price of anarchy (PoA) can be n o Price of stability (PoS) is O(log n)

More information

1 Definitions and Things You Know

1 Definitions and Things You Know We will discuss an algorithm for finding stable matchings (not the one you re probably familiar with). The Instability Chaining Algorithm is the most similar algorithm in the literature to the one actually

More information

Synthesis weakness of standard approach. Rational Synthesis

Synthesis weakness of standard approach. Rational Synthesis 1 Synthesis weakness of standard approach Rational Synthesis 3 Overview Introduction to formal verification Reactive systems Verification Synthesis Introduction to Formal Verification of Reactive Systems

More information

Microeconomics. 2. Game Theory

Microeconomics. 2. Game Theory Microeconomics 2. Game Theory Alex Gershkov http://www.econ2.uni-bonn.de/gershkov/gershkov.htm 18. November 2008 1 / 36 Dynamic games Time permitting we will cover 2.a Describing a game in extensive form

More information

Learning Tetris. 1 Tetris. February 3, 2009

Learning Tetris. 1 Tetris. February 3, 2009 Learning Tetris Matt Zucker Andrew Maas February 3, 2009 1 Tetris The Tetris game has been used as a benchmark for Machine Learning tasks because its large state space (over 2 200 cell configurations are

More information

N/4 + N/2 + N = 2N 2.

N/4 + N/2 + N = 2N 2. CS61B Summer 2006 Instructor: Erin Korber Lecture 24, 7 Aug. 1 Amortized Analysis For some of the data structures we ve discussed (namely hash tables and splay trees), it was claimed that the average time

More information

5.2 Infinite Series Brian E. Veitch

5.2 Infinite Series Brian E. Veitch 5. Infinite Series Since many quantities show up that cannot be computed exactly, we need some way of representing it (or approximating it). One way is to sum an infinite series. Recall that a n is the

More information

Evolutionary Dynamics and Extensive Form Games by Ross Cressman. Reviewed by William H. Sandholm *

Evolutionary Dynamics and Extensive Form Games by Ross Cressman. Reviewed by William H. Sandholm * Evolutionary Dynamics and Extensive Form Games by Ross Cressman Reviewed by William H. Sandholm * Noncooperative game theory is one of a handful of fundamental frameworks used for economic modeling. It

More information

1 Trees. Listing 1: Node with two child reference. public class ptwochildnode { protected Object data ; protected ptwochildnode l e f t, r i g h t ;

1 Trees. Listing 1: Node with two child reference. public class ptwochildnode { protected Object data ; protected ptwochildnode l e f t, r i g h t ; 1 Trees The next major set of data structures belongs to what s called Trees. They are called that, because if you try to visualize the structure, it kind of looks like a tree (root, branches, and leafs).

More information

Today s Outline. Recap: MDPs. Bellman Equations. Q-Value Iteration. Bellman Backup 5/7/2012. CSE 473: Artificial Intelligence Reinforcement Learning

Today s Outline. Recap: MDPs. Bellman Equations. Q-Value Iteration. Bellman Backup 5/7/2012. CSE 473: Artificial Intelligence Reinforcement Learning CSE 473: Artificial Intelligence Reinforcement Learning Dan Weld Today s Outline Reinforcement Learning Q-value iteration Q-learning Exploration / exploitation Linear function approximation Many slides

More information

Appendix. Mathematical Theorems

Appendix. Mathematical Theorems Appendix Mathematical Theorems This appendix introduces some theorems required in the discussion in Chap. 5 of the asymptotic behavior of minimax values in Pb-game models. Throughout this appendix, b is

More information

CS188: Artificial Intelligence, Fall 2009 Written 2: MDPs, RL, and Probability

CS188: Artificial Intelligence, Fall 2009 Written 2: MDPs, RL, and Probability CS188: Artificial Intelligence, Fall 2009 Written 2: MDPs, RL, and Probability Due: Thursday 10/15 in 283 Soda Drop Box by 11:59pm (no slip days) Policy: Can be solved in groups (acknowledge collaborators)

More information

Normal-form games. Vincent Conitzer

Normal-form games. Vincent Conitzer Normal-form games Vincent Conitzer conitzer@cs.duke.edu 2/3 of the average game Everyone writes down a number between 0 and 100 Person closest to 2/3 of the average wins Example: A says 50 B says 10 C

More information

Game Theory and its Applications to Networks - Part I: Strict Competition

Game Theory and its Applications to Networks - Part I: Strict Competition Game Theory and its Applications to Networks - Part I: Strict Competition Corinne Touati Master ENS Lyon, Fall 200 What is Game Theory and what is it for? Definition (Roger Myerson, Game Theory, Analysis

More information

Artificial Intelligence Methods (G5BAIM) - Examination

Artificial Intelligence Methods (G5BAIM) - Examination Question 1 a) According to John Koza there are five stages when planning to solve a problem using a genetic program. What are they? Give a short description of each. (b) How could you cope with division

More information

Expected Value II. 1 The Expected Number of Events that Happen
