CMSC 474, Game Theory
Slide 1: CMSC 474, Game Theory
4b. Game-Tree Search
Dana Nau, University of Maryland
Nau: Game Theory
Slide 2: Finite Perfect-Information Zero-Sum Games
- Finite: finitely many agents, actions, states, and histories
- Perfect information:
  - Every agent knows all of the players' utility functions, all of the players' actions and what they do, and the history and current state
  - No simultaneous actions; agents move one at a time
- Constant sum (or zero-sum):
  - There is a constant k such that regardless of how the game ends, Σ_{i=1,...,n} u_i = k
  - For every such game, there's an equivalent game in which k = 0
Slide 3: Examples
- Deterministic:
  - chess, checkers
  - go, gomoku
  - reversi (othello)
  - tic-tac-toe, qubic, connect-four
  - mancala (awari, kalah)
  - nine men's morris (merelles, morels, mill)
- Stochastic:
  - backgammon, monopoly, yahtzee, parcheesi, roulette, craps
- For now, we'll consider just the deterministic games
Slide 4: Outline
- A brief history of work on this topic
- Restatement of the Minimax Theorem
- Game trees
- The minimax algorithm
- α-β pruning
- Resource limits, approximate evaluation
- Most of this isn't in the game-theory book. For further information, look at one of the following:
  - The private materials page
  - Russell & Norvig's Artificial Intelligence: A Modern Approach (there are 3 editions of this book; in the 2nd edition, it's Chapter 6)
Slide 5: Brief History
- 1846 (Babbage): designed a machine to play tic-tac-toe
- 1928 (von Neumann): minimax theorem
- 1944 (von Neumann & Morgenstern): backward induction
- 1950 (Shannon): minimax algorithm (finite-horizon search)
- 1951 (Turing): program (on paper) for playing chess
- (Samuel): checkers program capable of beating its creator
- 1956 (McCarthy): pruning to allow deeper minimax search
- 1957 (Bernstein): first complete chess program, on an IBM 704 vacuum-tube computer; it could examine about 350 positions/minute
- 1967 (Greenblatt): first program to compete in human chess tournaments; 3 wins, 3 draws, 12 losses
- 1992 (Schaeffer): Chinook won the 1992 US Open checkers tournament
- 1994 (Schaeffer): Chinook became world checkers champion; Tinsley (the human champion) withdrew for health reasons
- 1997 (Hsu et al.): Deep Blue won a 6-game match vs. world chess champion Kasparov
- 2007 (Schaeffer et al.): checkers solved; with perfect play, it's a draw (the proof took calculations spanning over 18 years)
Slide 6: Restatement of the Minimax Theorem
- Suppose agents 1 and 2 use strategies s and t on a 2-person game G
  - Let u(s,t) = u₁(s,t) = −u₂(s,t)
  - Call the agents Max and Min (they want to maximize and minimize u)
- Minimax Theorem: If G is a two-person finite zero-sum game, then there are strategies s* and t*, and a number v called G's minimax value, such that
  - If Min uses t*, Max's expected utility is v, i.e., max_s u(s, t*) = v
  - If Max uses s*, Max's expected utility is v, i.e., min_t u(s*, t) = v
- Corollary 1:
  - u(s*, t*) = v
  - (s*, t*) is a Nash equilibrium; s* and t* are also called perfect play for Max and Min
  - s* (or t*) is Max's (or Min's) minimax strategy and maximin strategy
- Corollary 2: If G is a perfect-information game, then there are subgame-perfect pure strategies s* and t* that satisfy the theorem.
Slide 7: No Non-Credible Threats
- Recall this example from Chapter 4:
- If the agents can announce their strategies beforehand, agent 1 might want to announce (B,H)
  - A threat that if agent 2 chooses F, agent 1 will choose a move that hurts both agents
  - H isn't a credible threat: if agent 2 plays F anyway, it wouldn't be rational for agent 1 to play H
- [figure: extensive-form game tree with payoffs (3,8), (8,3), (5,5), (2,10), (1,0)]
- In zero-sum games, non-credible threats cannot occur
  - Any move M that hurts agent 2 will help agent 1
  - If agent 1 has an opportunity to play M, it would be rational to play it
Slide 8: Game Tree Terminology
- Root node: where the game starts
- Max (or Min) node: a node where it's Max's (or Min's) move
  - Max nodes are usually drawn as squares, Min nodes as circles
- A node's children: the possible next nodes
  - children(h) = {σ(h,a) : a ∈ χ(h)}
- Terminal node: a node where the game ends
- [figure: example game tree with alternating layers of Max nodes, Min nodes, and terminal nodes]
Slide 9: Number of Nodes
- Let b = maximum branching factor
- Let D = height of the tree (maximum depth of any terminal node)
- If D is even and the root is a Max node, then
  - the number of Max nodes is 1 + b² + b⁴ + ⋯ + b^(D−2) = O(b^D)
  - the number of Min nodes is b + b³ + ⋯ + b^(D−1) = O(b^D)
- What if D is odd?
Slide 10: Number of Pure Strategies
- Pure strategy for Max: at every Max node, choose one branch
- O(b^D) Max nodes with b choices at each of them ⇒ O(b^(b^D)) pure strategies
  - In the tree shown on the slide, how many pure strategies for Max?
- Many of them are equivalent (they differ only at unreachable nodes)
  - How many distinct pure strategies are there?
- What about Min?
Slide 11: Number of Distinct Pure Strategies
- At every reachable Max node, choose one branch
  - Number of reachable Max nodes ≤ b⁰ + b¹ + ⋯ + b^((D−2)/2) = O(b^(D/2))
- b choices at each of them ⇒ O(b^(b^(D/2))) distinct pure strategies
- Again, what about Min?
Slide 12: Finding the Minimax Strategy
- Brute-force way to find the minimax strategy for Max:
  - Construct the sets S and T of all distinct strategies for Max and Min, then choose s* = arg max_{s∈S} min_{t∈T} u(s,t)
- Complexity analysis:
  - Need to construct and store O(b^(b^(D/2))) distinct strategies
  - Each distinct strategy has size O(b^(D/2))
  - Thus space complexity is O(b^(D/2) · b^(b^(D/2)))
  - Time complexity is slightly worse
- But there's an easier way to find the minimax strategy
- Notation: v(h) = minimax value of the subgame at node h; if h is terminal, then v(h) = u(h)
Slide 13: Minimax
- The backward induction algorithm (Chapter 4) for 2-player zero-sum games
  - Returns v(h); it can easily be modified to return both v(h) and a corresponding action

    function Minimax(h)
      if h ∈ Z then return u(h)
      else if ρ(h) = Max then return max{Minimax(σ(h,a)) : a ∈ χ(h)}
      else return min{Minimax(σ(h,a)) : a ∈ χ(h)}

- [figure: two-ply example tree with Max to move at the root (actions a1, a2, a3) and Min at the second level; minimax value 3]
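A runnable Python sketch of the backward-induction procedure above. The tuple-based tree encoding (a number for a terminal node, a ("max"|"min", children) pair for an internal node) and the leaf values are my own illustration, not the slides' (Z, ρ, σ, χ) notation:

```python
# Minimax by backward induction on an explicit game tree.
# A node is either a number (terminal utility for Max) or a
# ("max"|"min", [children]) pair -- a toy encoding for illustration.

def minimax(h):
    if isinstance(h, (int, float)):          # terminal node: return its utility
        return h
    player, children = h
    values = [minimax(c) for c in children]
    return max(values) if player == "max" else min(values)

# A two-ply example: Max to move at the root, three Min nodes below.
tree = ("max", [
    ("min", [3, 12, 8]),
    ("min", [2, 4, 6]),
    ("min", [14, 5, 2]),
])
print(minimax(tree))   # minimax value of the root: 3
```

Each Min node takes the minimum of its leaves (3, 2, and 2 here), and the root takes the maximum of those, giving 3.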
Slide 14: Complexity Analysis
- Space complexity = (maximum path length) × (space needed to store the path) = O(bD)
- Time complexity = size of the game tree = O(b^D)
  - where b = branching factor and D = height of the game tree
- This is a lot better than O(b^(D/2) · b^(b^(D/2)))
- But it still isn't good enough for games like chess
  - b ≈ 35, D ≈ 100 for reasonable chess games
  - b^D is vastly more than the number of particles in the universe; there is no way to examine every node
Slide 15: Limited-Depth Minimax (Wiener, 1948)

    function LD-Minimax(h, d)
      if h ∈ Z then return v(h)
      else if d = 0 then return e(h)
      else if ρ(h) = Max then return max_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)
      else return min_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)

- The minimax algorithm with an upper bound d on the search depth
  - e(h) is a static evaluation function: it returns an estimate of v(h)
- Returns an approximation of v(h)
  - If d ≥ the height of node h, returns v(h) exactly
- Space complexity = O(min(bD, bd)), where D = height of node h
- Time complexity = O(min(b^D, b^d))
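A Python sketch of LD-Minimax under the same toy tree encoding as before (numbers are terminals, ("max"|"min", children) pairs are internal nodes; the evaluation functions passed in are placeholders):

```python
# Depth-limited minimax: cut off the search at depth d and fall back on a
# static evaluation function e(h). Tree encoding and e are illustrative.

def ld_minimax(h, d, e):
    if isinstance(h, (int, float)):       # terminal node: exact utility
        return h
    if d == 0:                            # depth cutoff: estimate with e
        return e(h)
    player, children = h
    vals = [ld_minimax(c, d - 1, e) for c in children]
    return max(vals) if player == "max" else min(vals)

tree = ("max", [("min", [3, 12, 8]), ("min", [2, 4, 6])])
print(ld_minimax(tree, 2, e=lambda h: 0))   # d >= height, so exact: 3
print(ld_minimax(tree, 0, e=lambda h: 7))   # d = 0, so just e(root): 7
```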
Slide 16: Evaluation Functions
- e(h) is often a weighted sum of features:
  - e(h) = w₁f₁(h) + w₂f₂(h) + ⋯ + wₙfₙ(h)
- E.g., in chess:
  - 1·(number of white pawns − number of black pawns) + 3·(white knights − black knights) + ⋯
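A minimal sketch of such a weighted-sum evaluation; the dictionary-based position encoding and the feature names are made up for illustration:

```python
# e(h) = w1*f1(h) + w2*f2(h) + ... as in the chess example on the slide.

def evaluate(position, weights, features):
    return sum(w * f(position) for w, f in zip(weights, features))

pos = {"wp": 6, "bp": 5, "wn": 2, "bn": 1}      # toy piece counts
features = [lambda p: p["wp"] - p["bp"],        # pawn difference
            lambda p: p["wn"] - p["bn"]]        # knight difference
print(evaluate(pos, [1, 3], features))          # 1*1 + 3*1 = 4
```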
Slide 17: How to Use Limited-Depth Minimax

    function LD-Choice(h, d)
      if ρ(h) = Max then return arg max_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)
      else return arg min_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)

    function LD-Minimax(h, d)
      if h ∈ Z then return v(h)
      else if d = 0 then return e(h)
      else if ρ(h) = Max then return max_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)
      else return min_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)

- LD-Choice only returns a single action; call it at every move
  - Why?
Slide 18: Exact Values for e Don't Matter
- [figure: the same two-ply tree evaluated with two different evaluation functions that rank the leaves identically; the same move is chosen in both cases]
- Behavior is preserved under any monotonic transformation of e
  - Only the order of the values matters
Slide 19: Multiplayer Games
- The Max^n algorithm: backward induction for n players, with cutoff depth d and evaluation function e
- Each node's evaluation is a payoff profile v; v[i] is player i's payoff
- Computes an approximate SPE payoff profile
  - Exact if d ≥ the height of h

    function Maxn(h, d)
      if h ∈ Z then return u(h)
      if d = 0 then return e(h)
      V = {Maxn(σ(h,a), d−1) : a ∈ χ(h)}
      return arg max_{v∈V} v[ρ(h)]

    function Maxn-Choice(h, d)
      return arg max_{a∈χ(h)} Maxn(σ(h,a), d−1)[ρ(h)]

- [figure: three-player example tree with internal nodes p, q, r, s and leaf payoff profiles (2,4,4), (6,3,1), (5,2,2), (3,5,2), (4,5,1), (1,4,5)]
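A Python sketch of Max^n (without the depth cutoff, for brevity). The encoding is my own: a terminal node is a tuple of numbers (the payoff profile), and an internal node is a (player-index, children) pair; the example payoffs echo some of the profiles in the slide's figure:

```python
# Max^n: each node's value is a payoff profile, and the player to move
# picks the child profile that maximizes their own component.

def maxn(h):
    if isinstance(h, tuple) and all(isinstance(x, (int, float)) for x in h):
        return h                         # terminal: payoff profile
    player, children = h
    profiles = [maxn(c) for c in children]
    return max(profiles, key=lambda v: v[player])

# Three-player toy example (player indices 0, 1, 2):
tree = (0, [                             # player 1 (index 0) to move
    (1, [(2, 4, 4), (6, 3, 1)]),         # player 2 picks what's best for them
    (1, [(3, 5, 2), (1, 4, 5)]),
])
print(maxn(tree))                        # (3, 5, 2)
```

Player 2 picks (2,4,4) over (6,3,1) and (3,5,2) over (1,4,5); player 1 then prefers (3,5,2) because 3 > 2.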
Slide 20: Multiplayer Games
- The Paranoid algorithm: computes an approximate maxmin value for player i, with cutoff depth d and evaluation function e
  - Exact maxmin value if d ≥ h's height
- Computes the payoff for i, assuming i wants to maximize it and the others want to minimize it

    function Paranoid(i, h, d)
      if h ∈ Z then return u(h)[i]
      if d = 0 then return e(h)[i]
      if ρ(h) = i then return max_{a∈χ(h)} Paranoid(i, σ(h,a), d−1)
      else return min_{a∈χ(h)} Paranoid(i, σ(h,a), d−1)

    function Paranoid-Choice(i, h, d)
      if ρ(h) = i then return arg max_{a∈χ(h)} Paranoid(i, σ(h,a), d−1)
      else return error

- [figure: the same three-player example tree, evaluated from player 1's paranoid point of view]
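A matching Python sketch of the paranoid evaluation, using the same invented tuple encoding as the Max^n sketch (terminal = payoff-profile tuple, internal = (player-index, children)):

```python
# Paranoid: player i maximizes their own payoff while all other players
# are treated as one minimizing opponent over that payoff.

def paranoid(i, h):
    if isinstance(h, tuple) and all(isinstance(x, (int, float)) for x in h):
        return h[i]                      # terminal: keep only i's payoff
    player, children = h
    vals = [paranoid(i, c) for c in children]
    return max(vals) if player == i else min(vals)

tree = (0, [(1, [(2, 4, 4), (6, 3, 1)]),
            (1, [(3, 5, 2), (1, 4, 5)])])
print(paranoid(0, tree))                 # player 1's maxmin value here: 2
```

Note how pessimistic this is: player 2's nodes minimize player 1's payoff (giving 2 and 1), so player 1's guaranteed value is 2, less than the 3 that Max^n predicts.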
Slide 21: Discussion
- Neither the Max^n algorithm nor the Paranoid algorithm has been entirely successful
- This is partly due to dynamic relationships among the players:
  - Human players may hold grudges
  - Informal alliances may form and dissolve over time; e.g., the players who are behind may gang up on the player who's winning
- These relationships can greatly influence the players' strategies
  - Max^n and Paranoid don't model these relationships
- To play well in a multi-player game, ways are needed to
  - decide when/whether to cooperate with the other players
  - deduce each player's attitude toward the other players
Slide 22: Pruning
- Let's go back to 2-player zero-sum games
  - Minimax and LD-Minimax both examine nodes that don't need to be examined
- [figure: Max node a with Min child b; b's leaves c, d, e give v(b) = 3]

Slide 23: Pruning
- b is better for Max than f is
- If Max is rational, then Max will never choose f
- So don't examine any more nodes below f
  - They can't affect v(a)
- [figure: a second Min child f of a; after a leaf g of value 2, the remaining children of f are pruned (marked X)]

Slide 24: Pruning
- We don't yet know whether h is better or worse than b
- [figure: a third Min child h of a; its first leaf i has value 14]

Slide 25: Pruning
- We still don't know whether h is better or worse than b
- [figure: h's second leaf j has value 5]

Slide 26: Pruning
- h is worse than b
  - We don't need to examine any more nodes below h
- v(a) = 3
- [figure: h's next leaf k has value 2, so v(h) ≤ 2 < 3; the rest of h's children are pruned, and the minimax value of a is 3]
Slide 27: Alpha Cutoff
- Squares are Max nodes, circles are Min nodes
- Let α = max(a, b, c), and suppose d < α
- To reach s, the game must go through p, q, and r
- By moving elsewhere at one of those nodes, Max can get v ≥ α
- If the game ever reaches node s, then Min can achieve v ≤ d < what Max can get elsewhere
  - Max will never let that happen
  - We don't need to know anything more about s
- What if d = α?
- [figure: Max nodes p, q, r with alternative values a, b, c on a path down to a Min node s with value ≤ d]
Slide 28: Beta Cutoff
- Squares are Max nodes, circles are Min nodes
- Let β = min(a, b, c), and suppose d > β
- To reach s, the game must go through p, q, and r
- By moving elsewhere at one of those nodes, Min can achieve v ≤ β
- If the game ever reaches node s, then Max can achieve v ≥ d > what Min can get elsewhere
  - Min will never let that happen
  - We don't need to know anything more about s
- What if d = β?
- [figure: Min nodes p, q, r with alternative values a, b, c on a path down to a Max node s with value ≥ d]
Slide 29: Alpha-Beta Pruning

    function Alpha-Beta(h, d, α, β)
      if h ∈ Z then return u(h)
      else if d = 0 then return e(h)
      else if ρ(h) = Max then
        v ← −∞
        for every a ∈ χ(h) do
          v ← max(v, Alpha-Beta(σ(h,a), d−1, α, β))
          if v ≥ β then return v
          else α ← max(α, v)
      else
        v ← ∞
        for every a ∈ χ(h) do
          v ← min(v, Alpha-Beta(σ(h,a), d−1, α, β))
          if v ≤ α then return v
          else β ← min(β, v)
      return v

- [figure: example tree with nodes a through m, used on the following slides to trace the algorithm]
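A runnable Python sketch of the pseudocode above, using the same toy tree encoding as the earlier minimax sketches (the depth limit is omitted for brevity):

```python
# Alpha-beta pruning: alpha = best value Max can force so far,
# beta = best value Min can force so far; prune when they cross.

def alphabeta(h, alpha=float("-inf"), beta=float("inf")):
    if isinstance(h, (int, float)):
        return h
    player, children = h
    if player == "max":
        v = float("-inf")
        for c in children:
            v = max(v, alphabeta(c, alpha, beta))
            if v >= beta:                # beta cutoff: Min won't allow this
                return v
            alpha = max(alpha, v)
    else:
        v = float("inf")
        for c in children:
            v = min(v, alphabeta(c, alpha, beta))
            if v <= alpha:               # alpha cutoff: Max won't allow this
                return v
            beta = min(beta, v)
    return v

tree = ("max", [("min", [3, 12, 8]), ("min", [2, 4, 6]), ("min", [14, 5, 2])])
print(alphabeta(tree))                   # same value as plain minimax: 3
```

On this tree the second Min node is cut off after its first leaf (2 ≤ α = 3), so the leaves 4 and 6 are never examined.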
Slides 30-40: Alpha-Beta Pruning (worked example)
- [figures: a step-by-step trace of Alpha-Beta on the example tree, with the same pseudocode repeated on each slide. The trace raises α to 7 after the root's first subtree b, takes an alpha cutoff below node j once a subtree value of 5 cannot exceed α = 7, and later takes a beta cutoff below node m with β = 8 when a child evaluates to 9; pruned branches are marked with X.]
Slide 41: Properties of Alpha-Beta
- Alpha-beta pruning reasons about which computations are relevant
  - A form of metareasoning
- Theorem:
  - If the value returned by Minimax(h, d) is in [α, β], then Alpha-Beta(h, d, α, β) returns the same value
  - If the value returned by Minimax(h, d) is ≤ α, then Alpha-Beta(h, d, α, β) returns a value ≤ α
  - If the value returned by Minimax(h, d) is ≥ β, then Alpha-Beta(h, d, α, β) returns a value ≥ β
- Corollary:
  - Alpha-Beta(h, d, −∞, ∞) returns the same value as Minimax(h, d)
  - Alpha-Beta(h, ∞, −∞, ∞) returns v(h)
Slide 42: Node Ordering
- Deeper lookahead (larger d) usually gives better decisions
  - There are pathological games where it doesn't, but those are rare
- Compared to LD-Minimax, how much farther ahead can Alpha-Beta look?
- Best case:
  - The children of Max nodes are searched in greatest-value-first order, and the children of Min nodes are searched in least-value-first order
  - Alpha-Beta's time complexity is O(b^(d/2)), which doubles the solvable depth
- Worst case:
  - The children of Max nodes are searched in least-value-first order, and the children of Min nodes are searched in greatest-value-first order
  - Like LD-Minimax, Alpha-Beta visits all nodes of depth ≤ d: time complexity O(b^d)
Slide 43: Node Ordering
- How to get closer to the best case:
  - Every time you expand a state s, apply e to its children
  - When it's Max's move, sort the children in order of largest e first
  - When it's Min's move, sort the children in order of smallest e first
- Suppose we have 100 seconds and can explore 10⁴ nodes/second
  - That's 10⁶ nodes per move
  - Put this into the form b^(d/2): 10⁶ ≈ 35^(8/2)
  - In the best case, Alpha-Beta reaches depth 8: a pretty good chess program
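The sorting step can be sketched in a few lines; the tree encoding and the trivial evaluation function are illustrative assumptions, and in a real program `order_children` would be called on each node just before the alpha-beta recursion:

```python
# Heuristic node ordering: search likely-best children first so that
# alpha-beta's bounds tighten early and more branches get pruned.

def order_children(children, player, e):
    # Largest e first at Max nodes, smallest e first at Min nodes.
    return sorted(children, key=e, reverse=(player == "max"))

# Toy e: a terminal's value is itself, anything else scores 0.
e = lambda h: h if isinstance(h, (int, float)) else 0
print(order_children([3, 12, 8], "max", e))   # [12, 8, 3]
print(order_children([3, 12, 8], "min", e))   # [3, 8, 12]
```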
Slide 44: Other Modifications
- Several other modifications can improve the accuracy or computation time:
  - quiescence search and biasing
  - transposition tables
  - thinking on the opponent's time
  - table lookup of book moves
  - iterative deepening
  - forward pruning
Slide 45: Quiescence Search and Biasing
- In a game like checkers or chess, the evaluation is based largely on material (pieces)
  - The evaluation is likely to be inaccurate if there are pending captures
- Search deeper to reach a position where there aren't pending captures
  - Evaluations will be more accurate there
- That creates another problem:
  - You're searching some paths to an even depth and others to an odd depth
  - Paths that end just after your opponent's move will look worse than paths that end just after your move
- To compensate, add or subtract a number called the biasing factor
Slide 46: Transposition Tables
- Often there are multiple paths to the same state (i.e., the state space is really a graph rather than a tree)
- Idea:
  - When you compute a state s's minimax value, store it in a hash table
  - If you visit s again, retrieve its value rather than computing it again
- The hash table is called a transposition table
- Problem: there are too many states to store all of them
  - Store some of them, rather than all
  - Try to store the ones that you're most likely to need
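A minimal memoized-minimax sketch of the idea. The toy tree encoding is as in the earlier sketches, and the `repr`-based key is a stand-in for a real position hash (chess programs typically use something like Zobrist hashing, and bound the table's size rather than storing everything):

```python
# Transposition table: cache minimax values by position key so that a state
# reached along several move orders is evaluated only once.

def minimax_tt(h, table):
    k = repr(h)                          # stand-in for hashing the position
    if k in table:
        return table[k]                  # transposition hit: reuse the value
    if isinstance(h, (int, float)):
        v = h
    else:
        player, children = h
        vals = [minimax_tt(c, table) for c in children]
        v = max(vals) if player == "max" else min(vals)
    table[k] = v
    return v

shared = ("min", [3, 12, 8])             # the same state, reachable two ways
tree = ("max", [shared, ("min", [2, 4, 6]), shared])
table = {}
print(minimax_tt(tree, table))           # 3; `shared` is evaluated only once
```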
Slide 47: Thinking on the Opponent's Time
- Suppose you're at node a
  - Use alpha-beta to estimate v(b) and v(c)
  - c looks better, so move there
- Consider your estimates of v(f) and v(g)
  - They suggest your opponent is likely to move to f
  - While waiting for the opponent to move, start an alpha-beta search below f
- If the opponent moves to f, then you've already done a lot of the work of figuring out your next move
- [figure: root a with children b (value 3) and c (value 8); below c, nodes f (value 8) and g (value 17)]
Slide 48: Book Moves
- In some games, experts have spent lots of time analyzing openings
  - Sequences of moves one might play at the start of the game
  - Best responses to those sequences
- Some of these are cataloged in standard reference works
  - e.g., the Encyclopaedia of Chess Openings
- Store these in a lookup table
  - Respond almost immediately, as long as the opponent sticks to a sequence that's in the book
- A technique humans can use when playing against such a program:
  - Deliberately make a move that isn't in the book
  - This may weaken the human's position, but the computer will (i) start taking longer and (ii) stop playing as well
Slide 49: Iterative Deepening
- How deeply should you search a game tree? When you call Alpha-Beta(h, d, −∞, ∞), what should you use for d?
  - Small d ⇒ you don't make as good a decision
  - Large d ⇒ you run out of time without knowing what move to make
- Solution: iterative deepening

    for d = 1 by 1 until you run out of time:
      m ← the move returned by Alpha-Beta(h, d, −∞, ∞)

- Why this works:
  - Time complexity is O(b¹ + b² + ⋯ + b^d) = O(b^d)
  - For large b, each iteration takes much more time than all of the previous iterations put together
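A self-contained Python sketch of the loop, using the depth-limited minimax from earlier in place of Alpha-Beta to keep it short. The time budget, depth cap, and tree encoding are all illustrative assumptions:

```python
import time

def ld_minimax(h, d, e):
    if isinstance(h, (int, float)):
        return h
    if d == 0:
        return e(h)
    player, children = h
    vals = [ld_minimax(c, d - 1, e) for c in children]
    return max(vals) if player == "max" else min(vals)

def id_choice(h, e, budget_s=0.05):
    """Search depth 1, 2, ... until time runs out; keep the latest move."""
    deadline = time.monotonic() + budget_s
    player, children = h
    best, d = 0, 1
    while time.monotonic() < deadline and d <= 10:   # cap keeps the demo finite
        vals = [ld_minimax(c, d - 1, e) for c in children]
        best = vals.index(max(vals) if player == "max" else min(vals))
        d += 1
    return best                                      # index of the chosen move

tree = ("max", [("min", [3, 12, 8]), ("min", [2, 4, 6])])
print(id_choice(tree, e=lambda h: 0))                # first move (index 0)
```

If the clock expires mid-iteration, the move from the last completed depth is still available, which is the whole point of deepening iteratively.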
Slide 50: Forward Pruning
- Tapered search:
  - Instead of looking at all of a node's children, just look at the n best ones (the n highest e(h) values)
  - Decrease the value of n as you go deeper in the tree
  - Drawback: this may exclude an important move at a low level of the tree
- [figure: a node whose children have e-values 5, −3, 8, and 17; with n = 3, the lowest-valued child is ignored]
- Marginal forward pruning:
  - Ignore every child h such that e(h) is smaller than the current best value from the nodes you've already visited
  - Not reliable; should be avoided
Slide 51: Forward Pruning
- Until the mid-1970s, most chess programs tried to search the same way humans think:
  - Use extensive chess knowledge at each node to select a few plausible moves, and prune the others
  - These programs had serious tactical shortcomings
- Brute-force search programs did better, and dominated computer chess for about 20 years
- Early 1990s: development of some forward-pruning techniques that worked well
  - e.g., null-move pruning
- Today most chess programs use some kind of forward pruning
  - Null-move pruning is one of the most popular
Slide 52: Game-Tree Search in Practice
- Checkers: in 1994, Chinook ended the 40-year reign of human world champion Marion Tinsley
  - Tinsley withdrew for health reasons and died a few months later
- In 2007, checkers was solved: with perfect play, it's a draw
  - This took calculations spanning over 18 years, over an enormous search space
- Chess: in 1997, Deep Blue defeated Garry Kasparov in a six-game match
  - Deep Blue searched 200 million positions per second
  - It used very sophisticated evaluation, and undisclosed methods for extending some lines of search up to 40 ply
- Othello: human champions don't compete against computers
  - The computers are too good
- Go: in 2006, good amateurs could beat the best go programs, even with a 9-stone handicap
  - Go programs have improved a lot during the past 7 years
Slide 53: Rules of Go (Abbreviated)
- Go board: locations (intersections on a grid)
- Black and White take turns; Black has the first move
- Each move consists of placing a stone at an unoccupied location
  - You just put stones on the board; you don't move them around
- Adjacent stones of the same color are called a string
  - Liberties are the empty locations next to the string
  - A string is removed if its number of liberties is 0
- Score: territories (number of occupied or surrounded locations)
Slide 54: Why Go Is Difficult for Computers
- A game tree's size grows exponentially with both its depth and its branching factor
- The game tree for go:
  - branching factor ≈ 200
  - game length ≈ 250 to 300 moves
  - the number of paths in the game tree is astronomically large
- Much too big for a normal game-tree search
- [figure: trees with branching factors b = 2, 3, 4, illustrating exponential growth]
Slide 55: Why Go Is Difficult for Computers
- Writing an evaluation function for chess:
  - Mainly piece count, plus some positional considerations: isolated/doubled pawns, rooks on open files (columns), pawns in the center of the board, etc.
  - As the game progresses, pieces get removed ⇒ evaluation gets easier
- Writing an evaluation function for go is much more complicated:
  - whether or not a group is alive
  - which stones can be connected to one another
  - the extent to which a position has influence or can be attacked
- As the game progresses, pieces get added ⇒ evaluation gets more complicated
Slide 56: Game-Tree Search in Go
- During the past five years, go programs have gotten much better
- The main reason: Monte Carlo roll-outs
- Basic idea:
  - Generate a list of potential moves A = {a₁, ..., aₙ}
  - For each move aᵢ in A: starting at σ(h, aᵢ), generate a set of random games Gᵢ in which the two players make all of their moves at random
  - Choose the move aᵢ that produced the best results
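The three steps above can be sketched on the toy tree encoding from the earlier examples (the leaf values, sample count, and fixed seed are illustrative assumptions, not anything from go):

```python
import random

def rollout(h, rng):
    """Play to a terminal by choosing moves uniformly at random."""
    while not isinstance(h, (int, float)):
        _, children = h
        h = rng.choice(children)         # both players move at random
    return h

def mc_choice(h, n_games=200, seed=0):
    """Score each candidate move by the mean payoff of random playouts."""
    rng = random.Random(seed)
    _, children = h
    means = [sum(rollout(c, rng) for _ in range(n_games)) / n_games
             for c in children]
    return means.index(max(means))       # move with the best average result

tree = ("max", [("min", [3, 12, 8]), ("min", [2, 4, 6]), ("min", [1, 1, 1])])
print(mc_choice(tree))                   # first move: mean payoff about 7.7
```

Note that roll-outs score a move by its average outcome, not its minimax value, so against a perfect opponent the estimate can be optimistic; UCT (below) addresses this by steering the playouts.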
Slide 57: Multi-Armed Bandit
- A statistical model of sequential experiments
  - The name comes from a traditional slot machine (a one-armed bandit)
- There are multiple actions
  - Each action provides a reward drawn from a probability distribution associated with that specific action
  - Objective: maximize the expected utility of a sequence of actions
- Exploitation vs. exploration dilemma:
  - Exploitation: choosing an action that you already know about, because you think it's likely to give you a high reward
  - Exploration: choosing an action that you don't know much about, in hopes that it will turn out to produce a better reward than the actions you already know about
Slide 58: UCB (Upper Confidence Bound) Algorithm
- Let:
  - x_i = average reward you've gotten from arm i
  - t_i = number of times you've tried arm i
  - t = Σ_i t_i
- Loop:
  - If there are one or more arms that have not been played, then play one of them
  - Else play the arm i that has the highest value of x_i + √(2(log t)/t_i)
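A runnable Python sketch of this loop. The arm reward distributions (Gaussian with unit variance), pull budget, and seed are invented for the demo:

```python
import math
import random

def ucb_run(arm_means, n_pulls, seed=0):
    """Play n_pulls rounds of UCB; return how often each arm was pulled."""
    rng = random.Random(seed)
    counts = [0] * len(arm_means)
    means = [0.0] * len(arm_means)       # x_i: running average reward of arm i
    for t in range(1, n_pulls + 1):
        if 0 in counts:
            i = counts.index(0)          # play every unplayed arm once first
        else:                            # then maximize x_i + sqrt(2 log t / t_i)
            i = max(range(len(arm_means)),
                    key=lambda j: means[j] + math.sqrt(2 * math.log(t) / counts[j]))
        reward = rng.gauss(arm_means[i], 1.0)
        counts[i] += 1
        means[i] += (reward - means[i]) / counts[i]   # incremental average
    return counts

counts = ucb_run([0.1, 0.5, 0.9], n_pulls=500)
print(counts)    # the 0.9-mean arm should receive most of the pulls
```

The exploration bonus shrinks as an arm's count grows, so poor arms are still sampled occasionally (logarithmically often) but the best arm dominates.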
Slide 59: UCT (UCB for Trees)
- An adaptation of UCB for game-tree search in go
  - First used in Mogo in 2006
  - Mogo won the go tournament at the 2007 Computer Olympiad, and several other computer go tournaments
  - Mogo won one (of 3) 9x9 blitz go games against a human professional
- UCT is now used in most computer go programs
- I've had some trouble finding a clear and unambiguous description of UCT
  - But I think the basic algorithm is as follows
Slide 60: Basic UCT Algorithm
- Start with an empty tree, T (T = {nodes we've evaluated})
- Loop:
  - h ← the start of the game
  - while h ∈ T do (this loop ends when we reach a node we haven't evaluated):
      if h has one or more children that aren't already in T
        then h ← any child of h that isn't in T
        else h ← arg max_{k ∈ children(h)} x_k + √(2(log t_h)/t_k)
  - Play a random game starting at h; v ← the game's payoff profile
  - Add h to T; t_h ← 1; x_h ← v[ρ(h)]
  - For every g ∈ ancestors(h) do:
      x_g ← (t_g · x_g + v[ρ(g)]) / (t_g + 1)
      t_g ← t_g + 1
    (i.e., x_g = average payoff to ρ(g) of all games below g)
Slide 61: Properties
- As the number of iterations goes to infinity, the evaluation x_h of the root node converges to its minimax value
- Question:
  - How can this be, since x_h is the average of the values we've gotten below h?
Slide 62: Properties
- As the number of iterations goes to infinity, the evaluation x_h of the root node converges to its minimax value
- Question: how can this be, since x_h is the average of the values we've gotten below h?
- Look at how h is chosen:

    Start with an empty tree, T
    loop:
      h ← the start of the game
      while h ∈ T do
        if h has one or more children that aren't already in T
          then h ← any child of h that isn't in T
          else h ← arg max_{k ∈ children(h)} x_k + √(2(log t_h)/t_k)
      ...

- As the counts grow, this selection rule plays the best child an ever larger fraction of the time, so the average x_h becomes dominated by that child's value
Slide 63: Some Improvements
- As written, UCT won't evaluate any nodes below h until all of h's siblings have been visited
  - Not good: we want to explore some subtrees more deeply than others
- So, replace this:

    if h has one or more children that aren't already in T
      then h ← any child of h that isn't in T
      else h ← arg max_{k ∈ children(h)} x_k + √(2(log t_h)/t_k)

- with this:

    h ← arg max_{k ∈ children(h)} y_k

  where y_k = x_k + √(2(log t_h)/t_k) if k is already in T, and y_k = some constant c otherwise
Slide 64: Some Improvements
- UCB assumes all of the levers are independent
- Similarly, UCT assumes the values of sibling nodes are independent
  - That's usually not true in practice
- So, modify x_h using some information computed from x_h's siblings and grandparent
Slide 65: Two-Player Zero-Sum Stochastic Games
- Example: backgammon
- Two agents take turns
- At each turn, the set of available moves depends on the results of rolling the dice
  - Each die specifies how far to move one of your pieces (except when you roll doubles)
  - If your piece would land on a location with 2 or more of the opponent's pieces, you can't move there
  - If your piece lands on a location with 1 of the opponent's pieces, the opponent's piece must start over
Slide 66: Backgammon Game Tree
- Players' moves have deterministic outcomes
- Dice rolls have stochastic outcomes
- [figure: tree alternating MAX, DICE, MIN, DICE, MAX, ..., TERMINAL layers; the chance branches are labeled with rolls 1,1 (probability 1/36), 1,2 (1/18), ..., 6,5 (1/18), 6,6 (1/36)]
Slide 67: ExpectiMinimax

    function ExpectiMinimax(h, d)
      if h ∈ Z then return v(h)
      else if d = 0 then return e(h)
      else if ρ(h) = Max then return max_{a∈χ(h)} ExpectiMinimax(σ(h,a), d−1)
      else if ρ(h) = Min then return min_{a∈χ(h)} ExpectiMinimax(σ(h,a), d−1)
      else return Σ_{a∈χ(h)} Pr[a|h] · ExpectiMinimax(σ(h,a), d−1)

- Returns the minimax expected utilities
  - Can be modified to return the actions
- Can also be modified to do α-β pruning
  - But it's more complicated and less effective than in deterministic games
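A Python sketch of the algorithm (without the depth limit), extending the earlier toy tree encoding with a "chance" node kind whose children are (probability, subtree) pairs; the example tree and probabilities are invented:

```python
# ExpectiMinimax: Max and Min nodes behave as in minimax, while a chance
# node's value is the probability-weighted average of its children.

def expectiminimax(h):
    if isinstance(h, (int, float)):
        return h
    kind, children = h
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    # chance node: children are (probability, subtree) pairs
    return sum(p * expectiminimax(c) for p, c in children)

tree = ("max", [
    ("chance", [(0.5, ("min", [2, 8])), (0.5, ("min", [6, 4]))]),
    ("chance", [(1.0, ("min", [4, 5]))]),
])
print(expectiminimax(tree))    # max(0.5*2 + 0.5*4, 1.0*4) = 4.0
```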
Slide 68: In Practice
- Dice rolls increase the branching factor
  - 21 possible rolls with 2 dice
  - Given the dice roll, about 20 legal moves on average (for some dice rolls, it can be much higher)
  - At depth 4, the number of nodes is 20 · (21 · 20)³
- As depth increases, the probability of reaching a given node shrinks
  - The value of lookahead is diminished
  - α-β pruning is much less effective
- TDGammon uses depth-2 search plus a very good evaluation function
  - World-champion level
  - The evaluation function was created automatically using a machine-learning technique called Temporal Difference learning, hence the "TD" in TDGammon
Slide 69: Summary
- Two-player zero-sum perfect-information games:
  - The maximin and minimax strategies are the same
  - We only need to look at pure strategies
  - We can do a game-tree search: minimax values, alpha-beta pruning
- In sufficiently complicated games, perfection is unattainable
  - Limited search depth, static evaluation functions
  - Monte Carlo roll-outs
- Game-tree search can be modified for games in which there are stochastic outcomes
More informationCS 188: Artificial Intelligence Fall 2008
CS 188: Artificial Intelligence Fall 2008 Lecture 7: Expectimax Search 9/18/2008 Dan Klein UC Berkeley Many slides over the course adapted from either Stuart Russell or Andrew Moore 1 1 Evaluation for
More informationAlpha-Beta Pruning: Algorithm and Analysis
Alpha-Beta Pruning: Algorithm and Analysis Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Introduction Alpha-beta pruning is the standard searching procedure used for solving
More informationAlpha-Beta Pruning: Algorithm and Analysis
Alpha-Beta Pruning: Algorithm and Analysis Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Introduction Alpha-beta pruning is the standard searching procedure used for 2-person
More informationAlpha-Beta Pruning: Algorithm and Analysis
Alpha-Beta Pruning: Algorithm and Analysis Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Introduction Alpha-beta pruning is the standard searching procedure used for 2-person
More informationCS 188 Introduction to Fall 2007 Artificial Intelligence Midterm
NAME: SID#: Login: Sec: 1 CS 188 Introduction to Fall 2007 Artificial Intelligence Midterm You have 80 minutes. The exam is closed book, closed notes except a one-page crib sheet, basic calculators only.
More informationSummary. Agenda. Games. Intelligence opponents in game. Expected Value Expected Max Algorithm Minimax Algorithm Alpha-beta Pruning Simultaneous Game
Summary rtificial Intelligence and its applications Lecture 4 Game Playing Search onstraint Satisfaction Problems From start state to goal state onsider constraints Professor Daniel Yeung danyeung@ieee.org
More informationCS 188: Artificial Intelligence Spring 2007
CS 188: Artificial Intelligence Spring 2007 Lecture 8: Logical Agents - I 2/8/2007 Srini Narayanan ICSI and UC Berkeley Many slides over the course adapted from Dan Klein, Stuart Russell or Andrew Moore
More informationCS 570: Machine Learning Seminar. Fall 2016
CS 570: Machine Learning Seminar Fall 2016 Class Information Class web page: http://web.cecs.pdx.edu/~mm/mlseminar2016-2017/fall2016/ Class mailing list: cs570@cs.pdx.edu My office hours: T,Th, 2-3pm or
More informationNotes on induction proofs and recursive definitions
Notes on induction proofs and recursive definitions James Aspnes December 13, 2010 1 Simple induction Most of the proof techniques we ve talked about so far are only really useful for proving a property
More informationReinforcement Learning. Spring 2018 Defining MDPs, Planning
Reinforcement Learning Spring 2018 Defining MDPs, Planning understandability 0 Slide 10 time You are here Markov Process Where you will go depends only on where you are Markov Process: Information state
More informationIS-ZC444: ARTIFICIAL INTELLIGENCE
IS-ZC444: ARTIFICIAL INTELLIGENCE Lecture-07: Beyond Classical Search Dr. Kamlesh Tiwari Assistant Professor Department of Computer Science and Information Systems, BITS Pilani, Pilani, Jhunjhunu-333031,
More informationCS 4100 // artificial intelligence. Recap/midterm review!
CS 4100 // artificial intelligence instructor: byron wallace Recap/midterm review! Attribution: many of these slides are modified versions of those distributed with the UC Berkeley CS188 materials Thanks
More informationPOLYNOMIAL SPACE QSAT. Games. Polynomial space cont d
T-79.5103 / Autumn 2008 Polynomial Space 1 T-79.5103 / Autumn 2008 Polynomial Space 3 POLYNOMIAL SPACE Polynomial space cont d Polynomial space-bounded computation has a variety of alternative characterizations
More informationIndustrial Organization Lecture 3: Game Theory
Industrial Organization Lecture 3: Game Theory Nicolas Schutz Nicolas Schutz Game Theory 1 / 43 Introduction Why game theory? In the introductory lecture, we defined Industrial Organization as the economics
More informationMinimax strategies, alpha beta pruning. Lirong Xia
Minimax strategies, alpha beta pruning Lirong Xia Reminder ØProject 1 due tonight Makes sure you DO NOT SEE ERROR: Summation of parsed points does not match ØProject 2 due in two weeks 2 How to find good
More informationAn Analysis of Forward Pruning. to try to understand why programs have been unable to. pruning more eectively. 1
Proc. AAAI-94, to appear. An Analysis of Forward Pruning Stephen J. J. Smith Dana S. Nau Department of Computer Science Department of Computer Science, and University of Maryland Institute for Systems
More informationComputing Minmax; Dominance
Computing Minmax; Dominance CPSC 532A Lecture 5 Computing Minmax; Dominance CPSC 532A Lecture 5, Slide 1 Lecture Overview 1 Recap 2 Linear Programming 3 Computational Problems Involving Maxmin 4 Domination
More informationBasic Game Theory. Kate Larson. January 7, University of Waterloo. Kate Larson. What is Game Theory? Normal Form Games. Computing Equilibria
Basic Game Theory University of Waterloo January 7, 2013 Outline 1 2 3 What is game theory? The study of games! Bluffing in poker What move to make in chess How to play Rock-Scissors-Paper Also study of
More informationCSC242: Intro to AI. Lecture 7 Games of Imperfect Knowledge & Constraint Satisfaction
CSC242: Intro to AI Lecture 7 Games of Imperfect Knowledge & Constraint Satisfaction What is This? 25" 20" 15" 10" 5" 0" Quiz 1 25" B 20" 15" 10" F D C A 5" F 0" Moral Many people cannot learn from lectures
More informationPlaying Abstract games with Hidden States (Spatial and Non-Spatial).
Playing Abstract games with Hidden States (Spatial and Non-Spatial). Gregory Calbert, Hing-Wah Kwok Peter Smet, Jason Scholz, Michael Webb VE Group, C2D, DSTO. Report Documentation Page Form Approved OMB
More informationComputing Minmax; Dominance
Computing Minmax; Dominance CPSC 532A Lecture 5 Computing Minmax; Dominance CPSC 532A Lecture 5, Slide 1 Lecture Overview 1 Recap 2 Linear Programming 3 Computational Problems Involving Maxmin 4 Domination
More informationLecture 1: March 7, 2018
Reinforcement Learning Spring Semester, 2017/8 Lecture 1: March 7, 2018 Lecturer: Yishay Mansour Scribe: ym DISCLAIMER: Based on Learning and Planning in Dynamical Systems by Shie Mannor c, all rights
More informationProperties of Forward Pruning in Game-Tree Search
Properties of Forward Pruning in Game-Tree Search Yew Jin Lim and Wee Sun Lee School of Computing National University of Singapore {limyewji,leews}@comp.nus.edu.sg Abstract Forward pruning, or selectively
More informationEvolutionary Computation: introduction
Evolutionary Computation: introduction Dirk Thierens Universiteit Utrecht The Netherlands Dirk Thierens (Universiteit Utrecht) EC Introduction 1 / 42 What? Evolutionary Computation Evolutionary Computation
More informationFinal. Introduction to Artificial Intelligence. CS 188 Spring You have approximately 2 hours and 50 minutes.
CS 188 Spring 2014 Introduction to Artificial Intelligence Final You have approximately 2 hours and 50 minutes. The exam is closed book, closed notes except your two-page crib sheet. Mark your answers
More informationCS599 Lecture 1 Introduction To RL
CS599 Lecture 1 Introduction To RL Reinforcement Learning Introduction Learning from rewards Policies Value Functions Rewards Models of the Environment Exploitation vs. Exploration Dynamic Programming
More information1 Extensive Form Games
1 Extensive Form Games De nition 1 A nite extensive form game is am object K = fn; (T ) ; P; A; H; u; g where: N = f0; 1; :::; ng is the set of agents (player 0 is nature ) (T ) is the game tree P is the
More informationCounting. 1 Sum Rule. Example 1. Lecture Notes #1 Sept 24, Chris Piech CS 109
1 Chris Piech CS 109 Counting Lecture Notes #1 Sept 24, 2018 Based on a handout by Mehran Sahami with examples by Peter Norvig Although you may have thought you had a pretty good grasp on the notion of
More informationCS885 Reinforcement Learning Lecture 7a: May 23, 2018
CS885 Reinforcement Learning Lecture 7a: May 23, 2018 Policy Gradient Methods [SutBar] Sec. 13.1-13.3, 13.7 [SigBuf] Sec. 5.1-5.2, [RusNor] Sec. 21.5 CS885 Spring 2018 Pascal Poupart 1 Outline Stochastic
More informationScout, NegaScout and Proof-Number Search
Scout, NegaScout and Proof-Number Search Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Introduction It looks like alpha-beta pruning is the best we can do for a generic searching
More informationDeep Reinforcement Learning. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 19, 2017
Deep Reinforcement Learning STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 19, 2017 Outline Introduction to Reinforcement Learning AlphaGo (Deep RL for Computer Go)
More informationMath Models of OR: Branch-and-Bound
Math Models of OR: Branch-and-Bound John E. Mitchell Department of Mathematical Sciences RPI, Troy, NY 12180 USA November 2018 Mitchell Branch-and-Bound 1 / 15 Branch-and-Bound Outline 1 Branch-and-Bound
More informationScout and NegaScout. Tsan-sheng Hsu.
Scout and NegaScout Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Abstract It looks like alpha-beta pruning is the best we can do for an exact generic searching procedure.
More information4: Dynamic games. Concordia February 6, 2017
INSE6441 Jia Yuan Yu 4: Dynamic games Concordia February 6, 2017 We introduce dynamic game with non-simultaneous moves. Example 0.1 (Ultimatum game). Divide class into two groups at random: Proposers,
More informationNotes on Coursera s Game Theory
Notes on Coursera s Game Theory Manoel Horta Ribeiro Week 01: Introduction and Overview Game theory is about self interested agents interacting within a specific set of rules. Self-Interested Agents have
More informationNondeterministic finite automata
Lecture 3 Nondeterministic finite automata This lecture is focused on the nondeterministic finite automata (NFA) model and its relationship to the DFA model. Nondeterminism is an important concept in the
More informationBasics of reinforcement learning
Basics of reinforcement learning Lucian Buşoniu TMLSS, 20 July 2018 Main idea of reinforcement learning (RL) Learn a sequential decision policy to optimize the cumulative performance of an unknown system
More informationCS221 Practice Midterm
CS221 Practice Midterm Autumn 2012 1 ther Midterms The following pages are excerpts from similar classes midterms. The content is similar to what we ve been covering this quarter, so that it should be
More informationCOMP3702/7702 Artificial Intelligence Week1: Introduction Russell & Norvig ch.1-2.3, Hanna Kurniawati
COMP3702/7702 Artificial Intelligence Week1: Introduction Russell & Norvig ch.1-2.3, 3.1-3.3 Hanna Kurniawati Today } What is Artificial Intelligence? } Better know what it is first before committing the
More informationSolving Extensive Form Games
Chapter 8 Solving Extensive Form Games 8.1 The Extensive Form of a Game The extensive form of a game contains the following information: (1) the set of players (2) the order of moves (that is, who moves
More informationA Polynomial-time Nash Equilibrium Algorithm for Repeated Games
A Polynomial-time Nash Equilibrium Algorithm for Repeated Games Michael L. Littman mlittman@cs.rutgers.edu Rutgers University Peter Stone pstone@cs.utexas.edu The University of Texas at Austin Main Result
More informationSatisfaction Equilibrium: Achieving Cooperation in Incomplete Information Games
Satisfaction Equilibrium: Achieving Cooperation in Incomplete Information Games Stéphane Ross and Brahim Chaib-draa Department of Computer Science and Software Engineering Laval University, Québec (Qc),
More information1. Counting. Chris Piech and Mehran Sahami. Oct 2017
. Counting Chris Piech and Mehran Sahami Oct 07 Introduction Although you may have thought you had a pretty good grasp on the notion of counting at the age of three, it turns out that you had to wait until
More informationDeep Reinforcement Learning
Martin Matyášek Artificial Intelligence Center Czech Technical University in Prague October 27, 2016 Martin Matyášek VPD, 2016 1 / 50 Reinforcement Learning in a picture R. S. Sutton and A. G. Barto 2015
More informationAdministration. CSCI567 Machine Learning (Fall 2018) Outline. Outline. HW5 is available, due on 11/18. Practice final will also be available soon.
Administration CSCI567 Machine Learning Fall 2018 Prof. Haipeng Luo U of Southern California Nov 7, 2018 HW5 is available, due on 11/18. Practice final will also be available soon. Remaining weeks: 11/14,
More informationComputational aspects of two-player zero-sum games Course notes for Computational Game Theory Section 4 Fall 2010
Computational aspects of two-player zero-sum games Course notes for Computational Game Theory Section 4 Fall 010 Peter Bro Miltersen January 16, 011 Version 1. 4 Finite Perfect Information Games Definition
More informationCMU Lecture 12: Reinforcement Learning. Teacher: Gianni A. Di Caro
CMU 15-781 Lecture 12: Reinforcement Learning Teacher: Gianni A. Di Caro REINFORCEMENT LEARNING Transition Model? State Action Reward model? Agent Goal: Maximize expected sum of future rewards 2 MDP PLANNING
More informationMarks. bonus points. } Assignment 1: Should be out this weekend. } Mid-term: Before the last lecture. } Mid-term deferred exam:
Marks } Assignment 1: Should be out this weekend } All are marked, I m trying to tally them and perhaps add bonus points } Mid-term: Before the last lecture } Mid-term deferred exam: } This Saturday, 9am-10.30am,
More informationRollout-based Game-tree Search Outprunes Traditional Alpha-beta
Journal of Machine Learning Research vol:1 8, 2012 Submitted 6/2012 Rollout-based Game-tree Search Outprunes Traditional Alpha-beta Ari Weinstein Michael L. Littman Sergiu Goschin aweinst@cs.rutgers.edu
More informationalgorithms Alpha-Beta Pruning and Althöfer s Pathology-Free Negamax Algorithm Algorithms 2012, 5, ; doi: /a
Algorithms 01, 5, 51-58; doi:10.3390/a504051 Article OPEN ACCESS algorithms ISSN 1999-4893 www.mdpi.com/journal/algorithms Alpha-Beta Pruning and Althöfer s Pathology-Free Negamax Algorithm Ashraf M. Abdelbar
More informationCOMP3702/7702 Artificial Intelligence Lecture 11: Introduction to Machine Learning and Reinforcement Learning. Hanna Kurniawati
COMP3702/7702 Artificial Intelligence Lecture 11: Introduction to Machine Learning and Reinforcement Learning Hanna Kurniawati Today } What is machine learning? } Where is it used? } Types of machine learning
More informationDescriptive Statistics (And a little bit on rounding and significant digits)
Descriptive Statistics (And a little bit on rounding and significant digits) Now that we know what our data look like, we d like to be able to describe it numerically. In other words, how can we represent
More informationLimits of Computation
The real danger is not that computers will begin to think like men, but that men will begin to think like computers Limits of Computation - Sydney J. Harris What makes you believe now that I am just talking
More informationPrinciples of Computing, Carnegie Mellon University. The Limits of Computing
The Limits of Computing Intractability Limits of Computing Announcement Final Exam is on Friday 9:00am 10:20am Part 1 4:30pm 6:10pm Part 2 If you did not fill in the course evaluations please do it today.
More informationTwo hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE. Date: Thursday 17th May 2018 Time: 09:45-11:45. Please answer all Questions.
COMP 34120 Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE AI and Games Date: Thursday 17th May 2018 Time: 09:45-11:45 Please answer all Questions. Use a SEPARATE answerbook for each SECTION
More informationBalancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm
Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Michail G. Lagoudakis Department of Computer Science Duke University Durham, NC 2778 mgl@cs.duke.edu
More informationQuantum Games. Quantum Strategies in Classical Games. Presented by Yaniv Carmeli
Quantum Games Quantum Strategies in Classical Games Presented by Yaniv Carmeli 1 Talk Outline Introduction Game Theory Why quantum games? PQ Games PQ penny flip 2x2 Games Quantum strategies 2 Game Theory
More informationChapter 4 Deliberation with Temporal Domain Models
Last update: April 10, 2018 Chapter 4 Deliberation with Temporal Domain Models Section 4.4: Constraint Management Section 4.5: Acting with Temporal Models Automated Planning and Acting Malik Ghallab, Dana
More informationLearning an Effective Strategy in a Multi-Agent System with Hidden Information
Learning an Effective Strategy in a Multi-Agent System with Hidden Information Richard Mealing Supervisor: Jon Shapiro Machine Learning and Optimisation Group School of Computer Science University of Manchester
More informationComputer Science 385 Analysis of Algorithms Siena College Spring Topic Notes: Limitations of Algorithms
Computer Science 385 Analysis of Algorithms Siena College Spring 2011 Topic Notes: Limitations of Algorithms We conclude with a discussion of the limitations of the power of algorithms. That is, what kinds
More informationCSC304 Lecture 5. Game Theory : Zero-Sum Games, The Minimax Theorem. CSC304 - Nisarg Shah 1
CSC304 Lecture 5 Game Theory : Zero-Sum Games, The Minimax Theorem CSC304 - Nisarg Shah 1 Recap Last lecture Cost-sharing games o Price of anarchy (PoA) can be n o Price of stability (PoS) is O(log n)
More information1 Definitions and Things You Know
We will discuss an algorithm for finding stable matchings (not the one you re probably familiar with). The Instability Chaining Algorithm is the most similar algorithm in the literature to the one actually
More informationSynthesis weakness of standard approach. Rational Synthesis
1 Synthesis weakness of standard approach Rational Synthesis 3 Overview Introduction to formal verification Reactive systems Verification Synthesis Introduction to Formal Verification of Reactive Systems
More informationMicroeconomics. 2. Game Theory
Microeconomics 2. Game Theory Alex Gershkov http://www.econ2.uni-bonn.de/gershkov/gershkov.htm 18. November 2008 1 / 36 Dynamic games Time permitting we will cover 2.a Describing a game in extensive form
More informationLearning Tetris. 1 Tetris. February 3, 2009
Learning Tetris Matt Zucker Andrew Maas February 3, 2009 1 Tetris The Tetris game has been used as a benchmark for Machine Learning tasks because its large state space (over 2 200 cell configurations are
More informationN/4 + N/2 + N = 2N 2.
CS61B Summer 2006 Instructor: Erin Korber Lecture 24, 7 Aug. 1 Amortized Analysis For some of the data structures we ve discussed (namely hash tables and splay trees), it was claimed that the average time
More information5.2 Infinite Series Brian E. Veitch
5. Infinite Series Since many quantities show up that cannot be computed exactly, we need some way of representing it (or approximating it). One way is to sum an infinite series. Recall that a n is the
More informationEvolutionary Dynamics and Extensive Form Games by Ross Cressman. Reviewed by William H. Sandholm *
Evolutionary Dynamics and Extensive Form Games by Ross Cressman Reviewed by William H. Sandholm * Noncooperative game theory is one of a handful of fundamental frameworks used for economic modeling. It
More information1 Trees. Listing 1: Node with two child reference. public class ptwochildnode { protected Object data ; protected ptwochildnode l e f t, r i g h t ;
1 Trees The next major set of data structures belongs to what s called Trees. They are called that, because if you try to visualize the structure, it kind of looks like a tree (root, branches, and leafs).
More informationToday s Outline. Recap: MDPs. Bellman Equations. Q-Value Iteration. Bellman Backup 5/7/2012. CSE 473: Artificial Intelligence Reinforcement Learning
CSE 473: Artificial Intelligence Reinforcement Learning Dan Weld Today s Outline Reinforcement Learning Q-value iteration Q-learning Exploration / exploitation Linear function approximation Many slides
More informationAppendix. Mathematical Theorems
Appendix Mathematical Theorems This appendix introduces some theorems required in the discussion in Chap. 5 of the asymptotic behavior of minimax values in Pb-game models. Throughout this appendix, b is
More informationCS188: Artificial Intelligence, Fall 2009 Written 2: MDPs, RL, and Probability
CS188: Artificial Intelligence, Fall 2009 Written 2: MDPs, RL, and Probability Due: Thursday 10/15 in 283 Soda Drop Box by 11:59pm (no slip days) Policy: Can be solved in groups (acknowledge collaborators)
More informationNormal-form games. Vincent Conitzer
Normal-form games Vincent Conitzer conitzer@cs.duke.edu 2/3 of the average game Everyone writes down a number between 0 and 100 Person closest to 2/3 of the average wins Example: A says 50 B says 10 C
More informationGame Theory and its Applications to Networks - Part I: Strict Competition
Game Theory and its Applications to Networks - Part I: Strict Competition Corinne Touati Master ENS Lyon, Fall 200 What is Game Theory and what is it for? Definition (Roger Myerson, Game Theory, Analysis
More informationArtificial Intelligence Methods (G5BAIM) - Examination
Question 1 a) According to John Koza there are five stages when planning to solve a problem using a genetic program. What are they? Give a short description of each. (b) How could you cope with division
More informationExpected Value II. 1 The Expected Number of Events that Happen
6.042/18.062J Mathematics for Computer Science December 5, 2006 Tom Leighton and Ronitt Rubinfeld Lecture Notes Expected Value II 1 The Expected Number of Events that Happen Last week we concluded by showing
More informationExtensive games (with perfect information)
Extensive games (with perfect information) (also referred to as extensive-form games or dynamic games) DEFINITION An extensive game with perfect information has the following components A set N (the set
More informationCS 301. Lecture 18 Decidable languages. Stephen Checkoway. April 2, 2018
CS 301 Lecture 18 Decidable languages Stephen Checkoway April 2, 2018 1 / 26 Decidable language Recall, a language A is decidable if there is some TM M that 1 recognizes A (i.e., L(M) = A), and 2 halts
More informationECO 199 GAMES OF STRATEGY Spring Term 2004 Precepts Week 7 March Questions GAMES WITH ASYMMETRIC INFORMATION QUESTIONS
ECO 199 GAMES OF STRATEGY Spring Term 2004 Precepts Week 7 March 22-23 Questions GAMES WITH ASYMMETRIC INFORMATION QUESTIONS Question 1: In the final stages of the printing of Games of Strategy, Sue Skeath
More information6.207/14.15: Networks Lecture 10: Introduction to Game Theory 2
6.207/14.15: Networks Lecture 10: Introduction to Game Theory 2 Daron Acemoglu and Asu Ozdaglar MIT October 14, 2009 1 Introduction Outline Mixed Strategies Existence of Mixed Strategy Nash Equilibrium
More informationRefinements - change set of equilibria to find "better" set of equilibria by eliminating some that are less plausible
efinements efinements - change set of equilibria to find "better" set of equilibria by eliminating some that are less plausible Strategic Form Eliminate Weakly Dominated Strategies - Purpose - throwing
More informationQuadratic Equations Part I
Quadratic Equations Part I Before proceeding with this section we should note that the topic of solving quadratic equations will be covered in two sections. This is done for the benefit of those viewing
More informationAlgorithms, Games, and Networks January 17, Lecture 2
Algorithms, Games, and Networks January 17, 2013 Lecturer: Avrim Blum Lecture 2 Scribe: Aleksandr Kazachkov 1 Readings for today s lecture Today s topic is online learning, regret minimization, and minimax
More informationCS 662 Sample Midterm
Name: 1 True/False, plus corrections CS 662 Sample Midterm 35 pts, 5 pts each Each of the following statements is either true or false. If it is true, mark it true. If it is false, correct the statement
More informationLanguages, regular languages, finite automata
Notes on Computer Theory Last updated: January, 2018 Languages, regular languages, finite automata Content largely taken from Richards [1] and Sipser [2] 1 Languages An alphabet is a finite set of characters,
More informationLecture 15: Bandit problems. Markov Processes. Recall: Lotteries and utilities
Lecture 15: Bandit problems. Markov Processes Bandit problems Action values (and now to compute them) Exploration-exploitation trade-off Simple exploration strategies -greedy Softmax (Boltzmann) exploration
More informationARTIFICIAL INTELLIGENCE. Reinforcement learning
INFOB2KI 2018-2019 Utrecht University The Netherlands ARTIFICIAL INTELLIGENCE Reinforcement learning Lecturer: Silja Renooij These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html
More informationReinforcement Learning
Reinforcement Learning Function approximation Mario Martin CS-UPC May 18, 2018 Mario Martin (CS-UPC) Reinforcement Learning May 18, 2018 / 65 Recap Algorithms: MonteCarlo methods for Policy Evaluation
More informationQuiz 1 Solutions. Problem 2. Asymptotics & Recurrences [20 points] (3 parts)
Introduction to Algorithms October 13, 2010 Massachusetts Institute of Technology 6.006 Fall 2010 Professors Konstantinos Daskalakis and Patrick Jaillet Quiz 1 Solutions Quiz 1 Solutions Problem 1. We
More informationSampling from Bayes Nets
from Bayes Nets http://www.youtube.com/watch?v=mvrtaljp8dm http://www.youtube.com/watch?v=geqip_0vjec Paper reviews Should be useful feedback for the authors A critique of the paper No paper is perfect!
More informationSkylines. Yufei Tao. ITEE University of Queensland. INFS4205/7205, Uni of Queensland
Yufei Tao ITEE University of Queensland Today we will discuss problems closely related to the topic of multi-criteria optimization, where one aims to identify objects that strike a good balance often optimal
More information