
CMSC 474, Game Theory
4b. Game-Tree Search
Dana Nau, University of Maryland

Finite perfect-information zero-sum games
- Finite:
  - finitely many agents, actions, states, and histories
- Perfect information:
  - Every agent knows all of the players' utility functions, all of the players' actions and what they do, and the history and current state
  - No simultaneous actions: agents move one at a time
- Constant sum (or zero-sum):
  - There is a constant k such that regardless of how the game ends, Σ_{i=1,…,n} u_i = k
  - For every such game, there's an equivalent game in which k = 0 (subtract k/n from each u_i)

Examples
- Deterministic:
  - chess, checkers
  - go, gomoku
  - reversi (othello)
  - tic-tac-toe, qubic, connect-four
  - mancala (awari, kalah)
  - nine men's morris (merelles, morels, mill)
- Stochastic:
  - backgammon, monopoly, yahtzee, parcheesi, roulette, craps
- For now, we'll consider just the deterministic games

Outline
- A brief history of work on this topic
- Restatement of the Minimax Theorem
- Game trees
- The minimax algorithm
- α-β pruning
- Resource limits, approximate evaluation
- Most of this isn't in the game-theory book. For further information, look at one of the following:
  - The private materials page
  - Russell & Norvig's Artificial Intelligence: A Modern Approach. There are 3 editions of this book; in the 2nd edition, it's Chapter 6.

Brief History
- 1846 (Babbage): designed a machine to play tic-tac-toe
- 1928 (von Neumann): minimax theorem
- 1944 (von Neumann & Morgenstern): backward induction
- 1950 (Shannon): minimax algorithm (finite-horizon search)
- 1951 (Turing): program (on paper) for playing chess
- 1952-57 (Samuel): checkers program capable of beating its creator
- 1956 (McCarthy): pruning to allow deeper minimax search
- 1957 (Bernstein): first complete chess program, on an IBM 704 vacuum-tube computer; could examine about 350 positions/minute
- 1967 (Greenblatt): first program to compete in human chess tournaments; 3 wins, 3 draws, 12 losses
- 1992 (Schaeffer): Chinook won the 1992 US Open checkers tournament
- 1994 (Schaeffer): Chinook became world checkers champion; Tinsley (the human champion) withdrew for health reasons
- 1997 (Hsu et al.): Deep Blue won a 6-game match vs. world chess champion Kasparov
- 2007 (Schaeffer et al.): checkers solved: with perfect play, it's a draw; 10^14 calculations over 18 years

Restatement of the Minimax Theorem
- Suppose agents 1 and 2 use strategies s and t on a 2-person game G
  - Let u(s,t) = u_1(s,t) = −u_2(s,t)
  - Call the agents Max and Min (they want to maximize and minimize u)
- Minimax Theorem: If G is a two-person finite zero-sum game, then there are strategies s* and t*, and a number v called G's minimax value, such that
  - If Min uses t*, Max's expected utility is ≤ v, i.e., max_s u(s,t*) = v
  - If Max uses s*, Max's expected utility is ≥ v, i.e., min_t u(s*,t) = v
- Corollary 1:
  - u(s*,t*) = v
  - (s*,t*) is a Nash equilibrium; s* and t* are also called perfect play for Max and Min
  - s* (or t*) is Max's (or Min's) minimax strategy and maximin strategy
- Corollary 2: If G is a perfect-information game, then there are subgame-perfect pure strategies s* and t* that satisfy the theorem.

No Non-Credible Threats
- Recall this example from Chapter 4 (a game tree in which agent 1 moves first, choosing A or B, with leaf payoffs (3,8), (8,3), (5,5), (2,10), (1,0)):
- If the agents can announce their strategies beforehand, agent 1 might want to announce (B,H)
  - A threat that if 2 chooses F, agent 1 will choose a move that hurts both agents
  - H isn't a credible threat
  - If agent 2 plays F anyway, it wouldn't be rational for agent 1 to play H
- In zero-sum games, non-credible threats cannot occur
  - Any move M that hurts 2 will help 1
  - If agent 1 has an opportunity to play M, it would be rational to play it

Game Tree Terminology
- Root node: where the game starts
- Max (or Min) node: a node where it's Max's (or Min's) move
  - Usually we draw Max nodes as squares, Min nodes as circles
- A node's children: the possible next nodes
  - children(h) = {σ(h,a) : a ∈ χ(h)}
- Terminal node: a node where the game ends
(Figure: an example tree with a Max node at the root, alternating levels of Min nodes and Max nodes below it, and terminal nodes at the bottom.)
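As a concrete illustration of this terminology, here is one minimal way the abstract notation might be rendered in Python. The Node class and the example tree below are ours, not the slides': player stands for ρ(h), the keys of children are χ(h), and looking up children[a] gives σ(h,a). The leaf values are the ones used in the minimax example a few slides below.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)   # eq=False keeps nodes hashable (useful later)
class Node:
    """A game-tree node h: player is rho(h), children maps each action
    a in chi(h) to sigma(h, a), and utility is u(h) at terminal nodes."""
    player: str = "Max"                      # "Max" or "Min"
    children: dict = field(default_factory=dict)
    utility: float = None                    # set only at terminal nodes

    def is_terminal(self):
        return not self.children

leaf = lambda u: Node(utility=u)
tree = Node("Max", {
    "a1": Node("Min", {"a11": leaf(3), "a12": leaf(12), "a13": leaf(8)}),
    "a2": Node("Min", {"a21": leaf(2), "a22": leaf(4), "a23": leaf(6)}),
    "a3": Node("Min", {"a31": leaf(14), "a32": leaf(5), "a33": leaf(2)}),
})
```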

Number of Nodes
- Let b = maximum branching factor
- Let D = height of the tree (maximum depth of any terminal node)
- If D is even and the root is a Max node, then
  - The number of Max nodes is 1 + b^2 + b^4 + … + b^(D−2) = O(b^D)
  - The number of Min nodes is b + b^3 + b^5 + … + b^(D−1) = O(b^D)
- What if D is odd?

Number of Pure Strategies
- Pure strategy for Max: at every Max node, choose one branch
- O(b^D) Max nodes, b choices at each of them => O(b^(b^D)) pure strategies
  - In the tree above, how many pure strategies for Max?
- Many of them are equivalent (they differ only at unreachable nodes)
  - How many distinct pure strategies are there?
- What about Min?

Number of Distinct Pure Strategies
- At every reachable Max node, choose one branch
  - Number of reachable Max nodes ≤ b^0 + b^1 + b^2 + … + b^((D−2)/2) = O(b^(D/2))
- b choices at each of them => O(b^(b^(D/2))) distinct pure strategies
- Again, what about Min?

Finding the Minimax Strategy
- Brute-force way to find a minimax strategy for Max:
  - Construct the sets S and T of all distinct strategies for Max and Min, then choose
    s* = arg max_{s∈S} min_{t∈T} u(s,t)
- Complexity analysis:
  - Need to construct and store O(b^(b^(D/2))) distinct strategies
  - Each distinct strategy has size O(b^(D/2))
  - Thus space complexity is O(b^(b^(D/2)) · b^(D/2)); time complexity is slightly worse
- But there's an easier way to find the minimax strategy
- Notation: v(h) = minimax value of the subgame at node h
  - If h is terminal then v(h) = u(h)

Minimax
- Backward induction algorithm (Chapter 4) for 2-player zero-sum games
  - Returns v(h); it can easily be modified to return both v(h) and a best move

function Minimax(h)
  if h ∈ Z then return v(h)
  else if ρ(h) = Max then return max{Minimax(σ(h,a)) : a ∈ χ(h)}
  else return min{Minimax(σ(h,a)) : a ∈ χ(h)}

(Example tree: Max chooses a1, a2, or a3; the three Min nodes below have leaf values {3, 12, 8}, {2, 4, 6}, and {14, 5, 2}, so they evaluate to 3, 2, and 2, and the root's minimax value is 3.)
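Here is that pseudocode transcribed into Python (our transcription, not the slides'), reusing the Node sketch from earlier:

```python
def minimax(h):
    """Backward induction for a 2-player zero-sum game: returns v(h)."""
    if h.is_terminal():
        return h.utility
    values = [minimax(child) for child in h.children.values()]
    return max(values) if h.player == "Max" else min(values)

print(minimax(tree))   # the example tree's minimax value: 3
```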

Complexity Analysis
- Space complexity = (maximum path length) × (space needed to store the path) = O(bD)
- Time complexity = size of the game tree = O(b^D)
  - where b = branching factor, D = height of the game tree
- This is a lot better than O(b^(b^(D/2)) · b^(D/2))
- But it still isn't good enough for games like chess
  - b ≈ 35, D ≈ 100 for reasonable chess games
  - b^D ≈ 35^100 ≈ 10^154 nodes
- Number of particles in the universe ≈ 10^87
  - 10^154 nodes is about 10^67 times the number of particles in the universe: there is no way to examine every node!

Limited-Depth Minimax (Wiener, 1948)

function LD-Minimax(h, d)
  if h ∈ Z then return v(h)
  else if d = 0 then return e(h)
  else if ρ(h) = Max then return max_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)
  else return min_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)

- The minimax algorithm with an upper bound d on the search depth
  - e(h) is a static evaluation function: it returns an estimate of v(h)
- Returns an approximation of v(h)
  - If d ≥ height of node h, returns v(h) exactly
- Space complexity = O(min(bD, bd)), where D = height of node h
- Time complexity = O(min(b^D, b^d))
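A Python rendering of LD-Minimax (ours), again on the Node sketch; e can be any static evaluation function:

```python
def ld_minimax(h, d, e):
    """Depth-limited minimax: exact v(h) if d >= height(h); otherwise
    the static estimate e(h) is used at the depth cutoff."""
    if h.is_terminal():
        return h.utility
    if d == 0:
        return e(h)
    values = [ld_minimax(child, d - 1, e) for child in h.children.values()]
    return max(values) if h.player == "Max" else min(values)
```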

Evaluation Functions
- e(h) is often a weighted sum of features
  - e(h) = w_1·f_1(h) + w_2·f_2(h) + … + w_n·f_n(h)
- E.g., in chess:
  - 1 · (white pawns − black pawns) + 3 · (white knights − black knights) + …
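For instance, the classical material-count evaluation might be sketched like this in Python; the piece weights are the conventional ones, but the counts dictionary and function name are our own illustration:

```python
WEIGHTS = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def material_eval(counts):
    """e(h) = sum_i w_i * f_i(h), where feature f_i is the difference
    (number of white pieces - number of black pieces) of one type."""
    return sum(w * (counts.get(("white", p), 0) - counts.get(("black", p), 0))
               for p, w in WEIGHTS.items())

# Up a knight but down a pawn:
print(material_eval({("white", "knight"): 1, ("black", "pawn"): 1}))   # 2
```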

How to Use Limited-Depth Minimax

function LD-Choice(h, d)
  if ρ(h) = Max then return arg max_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)
  else return arg min_{a∈χ(h)} LD-Minimax(σ(h,a), d−1)

(LD-Minimax is as on the previous slide.)

- LD-Choice only returns a single action; call it at every move
  - Why?

Exact Values for e Don't Matter
(Figure: two game trees whose leaf values are related by a monotonic transformation, e.g. leaves 1, 2, 2, 4 versus 1, 20, 20, 400; minimax chooses the same move in both.)
- Behavior is preserved under any monotonic transformation of e
  - Only the order matters

Multiplayer Games
- The Max-n algorithm: backward induction for n players, with cutoff depth d and evaluation function e
- Node evaluation is a payoff profile v; v[i] is player i's payoff
- Approximate SPE payoff profile
  - Exact if d ≥ height of h

function Maxn(h, d)
  if h ∈ Z then return u(h)
  if d = 0 then return e(h)
  V = {Maxn(σ(h,a), d−1) : a ∈ χ(h)}
  return the v ∈ V that maximizes v[ρ(h)]

function Maxn-Choice(h, d)
  return arg max_{a∈χ(h)} Maxn(σ(h,a), d−1)[ρ(h)]

(Example: a 3-player tree in which player 1 moves at the root p and player 2 at nodes q, r, s, with payoff profiles (2,4,4), (6,3,1), (5,2,2), (3,5,2), (4,5,1), (1,4,5) at the leaves. The profiles propagate up as (2,4,4), (3,5,2), (4,5,1) at q, r, s, and player 1 moves to s.)
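A Python sketch of Max-n (ours): here a node's player is an index 0…n−1, terminal utilities are payoff tuples, and e returns an estimated profile:

```python
def maxn(h, d, e):
    """Max-n: returns a payoff profile; each player maximizes its own
    component of the profile at the nodes where it moves."""
    if h.is_terminal():
        return h.utility              # a tuple: one payoff per player
    if d == 0:
        return e(h)
    profiles = [maxn(child, d - 1, e) for child in h.children.values()]
    return max(profiles, key=lambda v: v[h.player])
```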

Multiplayer Games
- The Paranoid algorithm: compute an approximate maxmin value for player i, with cutoff depth d and evaluation function e
  - Exact maxmin value if d ≥ h's height
- Compute the payoff for i, assuming i wants to maximize it and the others want to minimize it

function Paranoid(i, h, d)
  if h ∈ Z then return u(h)[i]
  if d = 0 then return e(h)[i]
  if ρ(h) = i then return max_{a∈χ(h)} Paranoid(i, σ(h,a), d−1)
  else return min_{a∈χ(h)} Paranoid(i, σ(h,a), d−1)

function Paranoid-Choice(i, h, d)
  if ρ(h) = i then return arg max_{a∈χ(h)} Paranoid(i, σ(h,a), d−1)
  else return error

(Same example tree: player 1's payoffs at the leaves are 2, 6, 5, 3, 4, 1; treating q, r, s as minimizers gives 2, 3, 1, so player 1 moves to r.)
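And the corresponding Python sketch (ours), under the same conventions as the Max-n sketch:

```python
def paranoid(i, h, d, e):
    """Approximate maxmin value for player i: i maximizes its payoff,
    and all the other players are assumed to jointly minimize it."""
    if h.is_terminal():
        return h.utility[i]
    if d == 0:
        return e(h)[i]
    values = [paranoid(i, child, d - 1, e) for child in h.children.values()]
    return max(values) if h.player == i else min(values)
```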

Discussion
- Neither the Max-n algorithm nor the Paranoid algorithm has been entirely successful
- This is partly due to dynamic relationships among players
  - Human players may hold grudges
  - Informal alliances may form and dissolve over time, e.g., the players who are behind may gang up on the player who's winning
- These relationships can greatly influence the players' strategies
  - Max-n and Paranoid don't model these relationships
- To play well in a multi-player game, ways are needed to
  - Decide when/whether to cooperate with others
  - Deduce each player's attitude toward the other players

Pruning
- Let's go back to 2-player zero-sum games
  - Minimax and LD-minimax both examine nodes that don't need to be examined
(Example tree: Max root a; Min node b with leaf children c, d, e = 3, 12, 8; a second Min node whose first leaf is 2 and whose remaining leaves are marked X.)

Pruning
- b is better for Max than f is
- If Max is rational, then Max will never choose f
- So don't examine any more nodes below f
  - They can't affect v(a)
(In the figure, f's first leaf g = 2, so v(f) ≤ 2 < 3 = v(b); f's remaining children are pruned.)

Pruning
- We don't know yet whether h is better or worse than b
(A third Min node h has been generated; its first leaf i = 14, so all we know so far is v(h) ≤ 14.)

Pruning
- We still don't know whether h is better or worse than b
(h's second leaf j = 5, so now v(h) ≤ 5, which is still more than v(b) = 3.)

Pruning
- h is worse than b
  - We don't need to examine any more nodes below h
- v(a) = 3
(h's next leaf k = 2, so v(h) ≤ 2 < 3 = v(b), and h's remaining children are pruned; hence v(a) = 3.)

Alpha Cutoff
- Squares are Max nodes, circles are Min nodes
- Let α = max(a, b, c), and suppose d < α
- To reach s, the game must go through p, q, and r
- By moving elsewhere at one of those nodes, Max can get v ≥ α
- If the game ever reaches node s, then Min can achieve v ≤ d, which is less than what Max can get elsewhere
  - Max will never let that happen
  - We don't need to know anything more about s
- What if d = α?
(Figure: Max nodes p, q, r along a path, with alternative moves of value a, b, c; below them a Min node s with a child of value d.)

Beta Cutoff
- Squares are Max nodes, circles are Min nodes
- Let β = min(a, b, c), and suppose d > β
- To reach s, the game must go through p, q, and r
- By moving elsewhere at one of those nodes, Min can achieve v ≤ β
- If the game ever reaches node s, then Max can achieve v ≥ d, which is more than what Min can get elsewhere
  - Min will never let that happen
  - We don't need to know anything more about s
- What if d = β?
(Figure: the dual situation, with Min nodes p, q, r and a Max node s.)

Alpha-Beta Pruning

function Alpha-Beta(h, d, α, β)
  if h ∈ Z then return u(h)
  else if d = 0 then return e(h)
  else if ρ(h) = Max then
    v ← −∞
    for every a ∈ χ(h) do
      v ← max(v, Alpha-Beta(σ(h,a), d−1, α, β))
      if v ≥ β then return v
      else α ← max(α, v)
    return v
  else
    v ← +∞
    for every a ∈ χ(h) do
      v ← min(v, Alpha-Beta(σ(h,a), d−1, α, β))
      if v ≤ α then return v
      else β ← min(β, v)
    return v

(The next several slides trace this algorithm on an example tree with root a, children b and c, and descendants d, e, f, g, h, i, j, k, l, m.)

Alpha-Beta Pruning (worked example, slides 30-40)
- Tracing the algorithm on the example tree:
  - b's subtree evaluates to 7, so after b returns, α = 7 at the root a
  - Below c, a Min node whose first child evaluates to 5 returns immediately, since 5 ≤ α = 7: an alpha cutoff (its remaining child j is never examined)
  - Later, the Min node m sets β = 8 after its child k returns 8; the next child returns 9 ≥ β: a beta cutoff, pruning m's remaining children
  - In the end c's subtree returns 8, and v(a) = max(7, 8) = 8
- A Python rendering of the algorithm follows.
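Here is the pseudocode in Python (our transcription), reusing the Node sketch and usable in place of ld_minimax:

```python
import math

def alpha_beta(h, d, alpha, beta, e):
    """Alpha-beta search: returns the same value as ld_minimax(h, d, e)
    whenever that value lies in [alpha, beta]."""
    if h.is_terminal():
        return h.utility
    if d == 0:
        return e(h)
    if h.player == "Max":
        v = -math.inf
        for child in h.children.values():
            v = max(v, alpha_beta(child, d - 1, alpha, beta, e))
            if v >= beta:            # beta cutoff
                return v
            alpha = max(alpha, v)
    else:
        v = math.inf
        for child in h.children.values():
            v = min(v, alpha_beta(child, d - 1, alpha, beta, e))
            if v <= alpha:           # alpha cutoff
                return v
            beta = min(beta, v)
    return v

# With the full window it agrees with minimax on the example tree:
print(alpha_beta(tree, 10, -math.inf, math.inf, e=lambda h: 0))   # 3
```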

Properties of Alpha-Beta
- Alpha-beta pruning reasons about which computations are relevant
  - A form of metareasoning
- Theorem:
  - If the value returned by Minimax(h, d) is in [α, β], then Alpha-Beta(h, d, α, β) returns the same value
  - If the value returned by Minimax(h, d) is ≤ α, then Alpha-Beta(h, d, α, β) returns a value ≤ α
  - If the value returned by Minimax(h, d) is ≥ β, then Alpha-Beta(h, d, α, β) returns a value ≥ β
- Corollary:
  - Alpha-Beta(h, d, −∞, ∞) returns the same value as Minimax(h, d)
  - Alpha-Beta(h, ∞, −∞, ∞) returns v(h)

Node Ordering
- Deeper lookahead (larger d) usually gives better decisions
  - There are pathological games where it doesn't, but those are rare
- Compared to LD-minimax, how much farther ahead can Alpha-Beta look?
- Best case:
  - Children of Max nodes are searched in greatest-value-first order; children of Min nodes are searched in least-value-first order
  - Alpha-Beta's time complexity is O(b^(d/2)), which doubles the solvable depth
- Worst case:
  - Children of Max nodes are searched in least-value-first order; children of Min nodes are searched in greatest-value-first order
  - Like LD-minimax, Alpha-Beta visits all nodes of depth ≤ d: time complexity O(b^d)

Node Ordering
- How to get closer to the best case:
  - Every time you expand a state s, apply e to its children
  - When it's Max's move, sort the children in order of largest e first
  - When it's Min's move, sort the children in order of smallest e first
- Suppose we have 100 seconds and explore 10^4 nodes/second
  - That's 10^6 nodes per move
  - Put this into the form b^(d/2): 10^6 ≈ 35^(8/2)
  - So best-case Alpha-Beta reaches depth 8: a pretty good chess program
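The sorting heuristic is a one-liner in Python (ours); it can be dropped into the alpha_beta sketch above in place of h.children.values():

```python
def ordered_children(h, e):
    """Search the statically best-looking children first: largest e
    first at Max nodes, smallest e first at Min nodes."""
    return sorted(h.children.values(), key=e, reverse=(h.player == "Max"))
```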

Other Modifications
- Several other modifications can improve the accuracy or the computation time:
  - Quiescence search and biasing
  - Transposition tables
  - Thinking on the opponent's time
  - Table lookup of "book moves"
  - Iterative deepening
  - Forward pruning

Quiescence Search and Biasing
- In a game like checkers or chess, the evaluation is based largely on material (pieces)
  - The evaluation is likely to be inaccurate if there are pending captures
- Search deeper to reach a position where there aren't pending captures
  - Evaluations will be more accurate there
- That creates another problem
  - You're searching some paths to an even depth, others to an odd depth
  - Paths that end just after your opponent's move will look worse than paths that end just after your move
- To compensate, add or subtract a number called the biasing factor
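A rough, simplified Python sketch of quiescence search at the depth cutoff (ours): instead of returning e(h) immediately when d = 0, keep searching "noisy" moves until the position is quiet. Here noisy_moves is an assumed game-specific function that returns the children reached by pending captures (empty when the position is quiet):

```python
def quiescence(h, alpha, beta, e, noisy_moves):
    """Called in place of e(h) at the depth cutoff. The static value
    ("stand pat") is a baseline; noisy continuations may revise it."""
    if h.is_terminal():
        return h.utility
    v = e(h)                                   # stand-pat value
    if h.player == "Max":
        for child in noisy_moves(h):
            if v >= beta:
                return v                       # beta cutoff
            alpha = max(alpha, v)
            v = max(v, quiescence(child, alpha, beta, e, noisy_moves))
    else:
        for child in noisy_moves(h):
            if v <= alpha:
                return v                       # alpha cutoff
            beta = min(beta, v)
            v = min(v, quiescence(child, alpha, beta, e, noisy_moves))
    return v
```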

Transposition Tables
- Often there are multiple paths to the same state (i.e., the state space is really a graph rather than a tree)
- Idea:
  - When you compute s's minimax value, store it in a hash table
  - When you visit s again, retrieve its value rather than computing it again
- The hash table is called a transposition table
- Problem: there are too many states to store all of them
  - Store some, rather than all
  - Try to store the ones you're most likely to need
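A sketch (ours) of alpha-beta with a transposition table; key is an assumed function that hashes a game state. A real program also records whether each stored value is exact or only an alpha/beta bound, and evicts old entries; this sketch glosses over both:

```python
import math

table = {}   # transposition table: state key -> (searched depth, value)

def ab_tt(h, d, alpha, beta, e, key):
    """Alpha-beta that caches values by state and reuses a cached value
    only if it was computed to at least the depth now required."""
    k = key(h)
    if k in table and table[k][0] >= d:
        return table[k][1]
    if h.is_terminal():
        v = h.utility
    elif d == 0:
        v = e(h)
    elif h.player == "Max":
        v = -math.inf
        for child in h.children.values():
            v = max(v, ab_tt(child, d - 1, alpha, beta, e, key))
            if v >= beta:
                break                # beta cutoff
            alpha = max(alpha, v)
    else:
        v = math.inf
        for child in h.children.values():
            v = min(v, ab_tt(child, d - 1, alpha, beta, e, key))
            if v <= alpha:
                break                # alpha cutoff
            beta = min(beta, v)
    table[k] = (d, v)
    return v
```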

Thinking on the Opponent's Time
- Suppose you're at node a
  - Use alpha-beta to estimate v(b) and v(c)
  - c looks better, so move there
- Consider your estimates of v(f) and v(g)
  - They suggest your opponent is likely to move to f
  - While waiting for the opponent to move, start an alpha-beta search below f
- If the opponent does move to f, then you've already done a lot of the work of figuring out your next move
(Figure: Max root a; Min children b = 3, from d = 5 and e = 3, and c = 8, from f = 8 and g = 17.)

Book Moves
- In some games, experts have spent lots of time analyzing openings
  - Sequences of moves one might play at the start of the game
  - Best responses to those sequences
- Some of these are cataloged in standard reference works
  - e.g., the Encyclopaedia of Chess Openings
- Store these in a lookup table
  - Respond almost immediately, as long as the opponent sticks to a sequence that's in the book
- A technique humans can use when playing against such a program:
  - Deliberately make a move that isn't in the book
  - This may weaken the human's position, but the computer will (i) start taking longer and (ii) stop playing as well

Iterative Deepening
- How deeply should you search a game tree?
  - When you call Alpha-Beta(h, d, −∞, ∞), what should you use for d?
- Small d => you don't make as good a decision
- Large d => you run out of time without knowing what move to make
- Solution: iterative deepening

for d = 1 by 1 until you run out of time:
  m ← the move returned by Alpha-Beta(h, d, −∞, ∞)

- Why this works:
  - Time complexity is O(b^1 + b^2 + … + b^d) = O(b^d)
  - For large b, each iteration takes much more time than all of the previous iterations put together
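In Python (our sketch; for brevity it only checks the clock between iterations, whereas a real program also aborts the iteration in progress at the deadline):

```python
import math, time

def id_choice(h, e, seconds):
    """Iterative deepening: repeat alpha-beta with d = 1, 2, 3, ...
    and keep the best move from the last completed iteration."""
    deadline = time.monotonic() + seconds
    best, d = None, 1
    while time.monotonic() < deadline:
        best = max(h.children.items(),
                   key=lambda kv: alpha_beta(kv[1], d - 1,
                                             -math.inf, math.inf, e))[0]
        d += 1
    return best
```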

Forward Pruning
- Tapered search:
  - Instead of looking at all of a node's children, just look at the n best ones: the n highest e(h) values
  - Decrease the value of n as you go deeper in the tree
  - Drawback: it may exclude an important move at a low level of the tree
  - (Figure: with n = 3, children with static values 17, 8, and 5 are searched, while those with values 2 and −3 are ignored.)
- Marginal forward pruning:
  - Ignore every child h whose e(h) is smaller than the current best values from the nodes you've already visited
  - Not reliable; it should be avoided
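Tapered search is a small variation on the node-ordering helper above (ours):

```python
def tapered_children(h, n, e):
    """Forward pruning: search only the n children that look best
    statically; the caller shrinks n as the search gets deeper."""
    best_first = sorted(h.children.values(), key=e,
                        reverse=(h.player == "Max"))
    return best_first[:n]
```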

Forward Pruning
- Until the mid-1970s, most chess programs tried to search the same way humans think
  - Use extensive chess knowledge at each node to select a few plausible moves, and prune the others
  - Serious tactical shortcomings
- Brute-force search programs did better, and dominated computer chess for about 20 years
- Early 1990s: development of some forward-pruning techniques that worked well
  - Null-move pruning
- Today most chess programs use some kind of forward pruning
  - Null-move pruning is one of the most popular

Game-Tree Search in Practice
- Checkers: in 1994, Chinook ended the 40-year reign of human world champion Marion Tinsley
  - Tinsley withdrew for health reasons, and died a few months later
- In 2007, checkers was solved: with perfect play, it's a draw
  - This took 10^14 calculations over 18 years; the search space has size ≈ 5 × 10^20
- Chess: in 1997, Deep Blue defeated Garry Kasparov in a six-game match
  - Deep Blue searched 200 million positions per second
  - It used very sophisticated evaluation, and undisclosed methods for extending some lines of search up to 40 ply
- Othello: human champions don't compete against computers
  - The computers are too good
- Go: in 2006, good amateurs could beat the best go programs, even with a 9-stone handicap
  - Go programs have improved a lot during the past 7 years

Rules of Go (Abbreviated)
- Go board: 19 × 19 locations (intersections on a grid)
- Black and White take turns; Black has the first move
- Each move consists of placing a stone at an unoccupied location
  - You just put stones on the board; you don't move them around
- Adjacent stones of the same color are called a string
  - Liberties are the empty locations next to the string
  - A string is removed if its number of liberties is 0
- Score: territories (number of occupied or surrounded locations)

Why Go is Difficult for Computers
- A game tree's size grows exponentially with both its depth and its branching factor
- The game tree for go:
  - branching factor ≈ 200
  - game length ≈ 250 to 300 moves
  - number of paths in the game tree ≈ 10^525 to 10^620
- Much too big for a normal game-tree search
(Figure: example trees with b = 2, 3, 4.)

Why Go is Difficult for Computers
- Writing an evaluation function for chess:
  - Mainly piece count, plus some positional considerations: isolated/doubled pawns, rooks on open files (columns), pawns in the center of the board, etc.
  - As the game progresses, pieces get removed => evaluation gets easier
- Writing an evaluation function for Go is much more complicated:
  - whether or not a group is alive
  - which stones can be connected to one another
  - the extent to which a position has influence, or can be attacked
- As the game progresses, pieces get added => evaluation gets more complicated

Game-Tree Search in Go
- During the past five years, go programs have gotten much better
- Main reason: Monte Carlo roll-outs
- Basic idea:
  - Generate a list of potential moves A = {a_1, …, a_n}
  - For each move a_i in A: starting at σ(h, a_i), generate a set of random games G_i in which the two players make all their moves at random
  - Choose the move a_i that produced the best results
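A minimal "flat" Monte Carlo sketch in Python (ours), on the Node sketch from earlier; payoff_for is an assumed function mapping a terminal node to the payoff of the player choosing the move:

```python
import random

def random_playout(h, payoff_for):
    """Finish the game with uniformly random moves for both players."""
    while not h.is_terminal():
        h = random.choice(list(h.children.values()))
    return payoff_for(h)

def monte_carlo_choice(h, payoff_for, n_games=100):
    """Pick the move whose random games gave the best average payoff."""
    def avg(child):
        return sum(random_playout(child, payoff_for)
                   for _ in range(n_games)) / n_games
    return max(h.children.items(), key=lambda kv: avg(kv[1]))[0]
```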

Multi-Armed Bandit
- A statistical model of sequential experiments
  - The name comes from a traditional slot machine (a "one-armed bandit")
- Multiple actions
  - Each action provides a reward drawn from a probability distribution associated with that specific action
  - Objective: maximize the expected utility of a sequence of actions
- Exploitation vs. exploration dilemma:
  - Exploitation: choosing an action you already know about, because you think it's likely to give you a high reward
  - Exploration: choosing an action you don't know much about, in hopes that it may produce a better reward than the actions you already know about

UCB (Upper Confidence Bound) Algorithm
- Let:
  - x̄_i = average reward you've gotten from arm i
  - t_i = number of times you've tried arm i
  - t = Σ_i t_i
- loop:
  - if there are one or more arms that have not been played, then play one of them
  - else play the arm i that has the highest value of x̄_i + √(2 (log t) / t_i)
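In Python (our sketch):

```python
import math, random

def ucb_choice(avg, tries):
    """UCB1 arm selection: avg[i] is the average reward from arm i,
    and tries[i] is the number of times arm i has been played."""
    unplayed = [i for i, n in enumerate(tries) if n == 0]
    if unplayed:
        return random.choice(unplayed)
    t = sum(tries)
    return max(range(len(avg)),
               key=lambda i: avg[i] + math.sqrt(2 * math.log(t) / tries[i]))
```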

UCT (UCB for Trees)
- An adaptation of UCB for game-tree search in go
  - First used in Mogo in 2006
  - Mogo won the go tournament at the 2007 Computer Olympiad, and several other computer go tournaments
  - Mogo won one (of 3) 9x9 blitz go games against a human professional
- UCT is now used in most computer go programs
- I've had some trouble finding a clear and unambiguous description of UCT
  - But I think the basic algorithm is as follows

Basic UCT Algorithm
- Start with an empty tree T (T = {nodes we've evaluated})
- loop:
  - h ← the start of the game
  - while h ∈ T do (this loop ends when we reach a node we haven't evaluated):
    - if h has one or more children that aren't already in T, then h ← any child of h that isn't in T
    - else h ← arg max_{k ∈ children(h)} x̄_k + √(2 (log t_h) / t_k)
  - play a random game starting at h; v ← the game's payoff profile
  - add h to T; t_h ← 1; x̄_h ← v[ρ(h)]
  - for every g ∈ ancestors(h) do:
    - x̄_g ← (t_g · x̄_g + v[ρ(g)]) / (t_g + 1); t_g ← t_g + 1
    - i.e., x̄_g = average payoff to ρ(g) of all the games below g
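A Python sketch of one iteration (ours, and only a best-effort reading of the slide): nodes are as in the Max-n sketch (player is an index, terminal utility is a payoff profile), T is a set of evaluated nodes, and the statistics x̄ and t live in a dict keyed by node:

```python
import math, random

class Stats:
    def __init__(self):
        self.t = 0      # visit count t_h
        self.x = 0.0    # avg payoff, to the player to move at h, below h

def uct_iteration(root, T, stats):
    """One pass of the UCT loop: descend through T by UCB, then play a
    random game, then back the payoff profile up the visited path."""
    h, path = root, [root]
    while h in T and not h.is_terminal():
        fresh = [c for c in h.children.values() if c not in T]
        if fresh:
            h = random.choice(fresh)
        else:
            lt = math.log(stats[h].t)
            h = max(h.children.values(),
                    key=lambda k: stats[k].x + math.sqrt(2 * lt / stats[k].t))
        path.append(h)
    g = h
    while not g.is_terminal():                  # random playout from h
        g = random.choice(list(g.children.values()))
    v = g.utility                               # the game's payoff profile
    T.add(h)
    for node in path:                           # update h and its ancestors
        s = stats.setdefault(node, Stats())
        s.x = (s.t * s.x + v[node.player]) / (s.t + 1)
        s.t += 1
```

After enough iterations, one reasonable move choice at the root is the child with the largest visit count.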

Properties
- As the number of iterations goes to infinity, the evaluation x̄_h of the root node converges to its minmax value
- Question:
  - How can this be, since x̄_h is the average of the values we've gotten below h?

Properties
- As the number of iterations goes to infinity, the evaluation x̄_h of the root node converges to its minmax value
- How can this be, since x̄_h is the average of the values we've gotten below h?
- Look at how h is chosen: once all of a node's children are in T, UCT picks the child k maximizing x̄_k + √(2 (log t_h) / t_k)
  - As the counts grow, the exploration term shrinks, so almost all later samples come from the best child, and the average x̄_h converges to that child's value

Some Improvements
- As written, UCT won't evaluate any nodes below h until all of h's siblings have been visited
  - Not good: we want to explore some subtrees more deeply than others
- So, replace this:
  - if h has one or more children that aren't already in T, then h ← any child of h that isn't in T; else h ← arg max_{k ∈ children(h)} x̄_k + √(2 (log t_h) / t_k)
- with this:
  - h ← arg max_{k ∈ children(h)} y_k
  - where y_k = x̄_k + √(2 (log t_h) / t_k) if k is already in T, and y_k = some constant c otherwise

Some Improvements
- UCB assumes all the levers are independent
- Similarly, UCT assumes the values of sibling nodes are independent
  - That's usually not true in practice
- So, modify x̄_h using some information computed from h's siblings and grandparent

Two-Player Zero-Sum Stochastic Games
- Example: backgammon
- Two agents take turns
- At each turn, the set of available moves depends on the results of rolling the dice
  - Each die specifies how far to move one of your pieces (except if you roll doubles)
  - If your piece would land on a location with 2 or more of the opponent's pieces, you can't move there
  - If your piece lands on a location with 1 opponent piece, the opponent's piece must start over

Backgammon Game Tree
- The players' moves have deterministic outcomes
- Dice rolls have stochastic outcomes
(Figure: the tree alternates MAX, DICE, MIN, DICE, … levels; each dice node has 21 distinct rolls as children, with probability 1/36 for doubles such as 1,1 or 6,6 and 1/18 for the others such as 1,2 or 6,5; terminal payoffs appear at the bottom.)

ExpectiMinimax

function ExpectiMinimax(h, d)
  if h ∈ Z then return v(h)
  else if d = 0 then return e(h)
  else if ρ(h) = Max then return max_{a∈χ(h)} ExpectiMinimax(σ(h,a), d−1)
  else if ρ(h) = Min then return min_{a∈χ(h)} ExpectiMinimax(σ(h,a), d−1)
  else return Σ_{a∈χ(h)} Pr[a | h] · ExpectiMinimax(σ(h,a), d−1)

- Returns the minimax expected utilities
  - Can be modified to return the actions
- Can also be modified to do α-β pruning
  - But it's more complicated and less effective than in deterministic games
(Figure: a small example tree whose chance nodes each have two children with probability 0.5.)
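In Python (ours): here chance nodes are marked with player = "Chance" and carry an assumed probs dict giving Pr[a | h] for each action a:

```python
def expectiminimax(h, d, e):
    """ld_minimax extended with chance nodes, which return the
    probability-weighted average of their children's values."""
    if h.is_terminal():
        return h.utility
    if d == 0:
        return e(h)
    vals = {a: expectiminimax(c, d - 1, e) for a, c in h.children.items()}
    if h.player == "Max":
        return max(vals.values())
    if h.player == "Min":
        return min(vals.values())
    return sum(h.probs[a] * v for a, v in vals.items())   # chance node
```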

In Practice
- Dice rolls increase the branching factor
  - 21 possible rolls with 2 dice
  - Given the dice roll, about 20 legal moves on average (for some dice rolls, it can be much higher)
  - At depth 4, the number of nodes is 20 × (21 × 20)^3 ≈ 1.2 × 10^9
  - As depth increases, the probability of reaching a given node shrinks, so the value of lookahead is diminished
- α-β pruning is much less effective
- TDGammon uses depth-2 search plus a very good evaluation function
  - World-champion level
  - The evaluation function was created automatically, using a machine-learning technique called Temporal Difference learning; hence the "TD" in TDGammon

Summary
- Two-player zero-sum perfect-information games
  - the maximin and minimax strategies are the same
  - we only need to look at pure strategies
  - we can do a game-tree search: minimax values, alpha-beta pruning
- In sufficiently complicated games, perfection is unattainable
  - limited search depth, static evaluation function
  - Monte Carlo roll-outs
- Game-tree search can be modified for games in which there are stochastic outcomes