
Game playing (Chapter 6)

Outline
- Minimax
- α-β pruning
- UCT for games

Game tree (2-player, deterministic, turns)

Minimax
Perfect play for deterministic, perfect-information games.
Idea: choose the move to the position with the highest minimax value, i.e., the best achievable payoff against best play.
E.g., a 2-ply game (figure not reproduced).

Minimax algorithm

function Minimax-Decision(state) returns an action
  inputs: state, current state in game
  return the a in Actions(state) maximizing Min-Value(Result(a, state))

function Max-Value(state) returns a utility value
  if Terminal-Test(state) then return Utility(state)
  v ← −∞
  for a, s in Successors(state) do v ← Max(v, Min-Value(s))
  return v

function Min-Value(state) returns a utility value
  if Terminal-Test(state) then return Utility(state)
  v ← +∞
  for a, s in Successors(state) do v ← Min(v, Max-Value(s))
  return v
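The pseudocode above can be sketched in a few lines of Python. The nested-list encoding of the game tree (an internal node is a list of children, a leaf is its utility) is an illustrative assumption standing in for Actions/Result/Terminal-Test, not part of the slides; the leaf values mirror the classic 2-ply example with minimax value 3.

```python
# Minimax over a toy game tree: an internal node is a list of children,
# a leaf is its integer utility. The list encoding is an assumption made
# here for illustration; it replaces Actions/Result/Terminal-Test.

def max_value(state):
    if isinstance(state, int):      # Terminal-Test: leaves are plain ints
        return state                # Utility
    return max(min_value(s) for s in state)

def min_value(state):
    if isinstance(state, int):
        return state
    return min(max_value(s) for s in state)

def minimax_decision(state):
    """Return the index of the root move maximizing Min-Value of its result."""
    return max(range(len(state)), key=lambda a: min_value(state[a]))

# 2-ply example: three MIN nodes over leaves; the minimax value is
# max(min(3,12,8), min(2,4,6), min(14,5,2)) = 3, reached by the first move.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
```

Because the tree is explored depth-first, the space cost is O(bm) even though time is O(b^m), matching the properties slide below.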

Properties of minimax
Complete?? Yes, if the tree is finite (chess has specific rules for this)
Optimal?? Yes, against an optimal opponent. Otherwise??
Time complexity?? O(b^m)
Space complexity?? O(bm) (depth-first exploration)
For chess, b ≈ 35, m ≈ 100 for reasonable games: exact solution completely infeasible.
But do we need to explore every path?

α-β pruning example (worked step by step across five slides; figures not reproduced)

Why is it called α-β?
α is the best value (to MAX) found so far off the current path.
If v is worse than α, MAX will avoid it: prune that branch.
β is defined similarly for MIN.

The α-β algorithm

function Alpha-Beta-Decision(state) returns an action
  return the a in Actions(state) maximizing Min-Value(Result(a, state))

function Max-Value(state, α, β) returns a utility value
  inputs: state, current state in game
    α, the value of the best alternative for MAX along the path to state
    β, the value of the best alternative for MIN along the path to state
  if Terminal-Test(state) then return Utility(state)
  v ← −∞
  for a, s in Successors(state) do
    v ← Max(v, Min-Value(s, α, β))
    if v ≥ β then return v
    α ← Max(α, v)
  return v

function Min-Value(state, α, β) returns a utility value
  same as Max-Value but with the roles of α, β reversed
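A minimal Python sketch of the α-β pseudocode, using the same toy list-based tree encoding as before (an assumption for illustration; leaves are ints). Pruning changes which branches are visited but not the value returned.

```python
# α-β pruning over a toy game tree: internal node = list of children,
# leaf = integer utility. The encoding is an illustrative assumption.
import math

def alphabeta_max(state, alpha=-math.inf, beta=math.inf):
    if isinstance(state, int):       # Terminal-Test
        return state
    v = -math.inf
    for s in state:
        v = max(v, alphabeta_min(s, alpha, beta))
        if v >= beta:                # MIN above would never allow this branch
            return v                 # prune the remaining siblings
        alpha = max(alpha, v)
    return v

def alphabeta_min(state, alpha=-math.inf, beta=math.inf):
    if isinstance(state, int):
        return state
    v = math.inf
    for s in state:
        v = min(v, alphabeta_max(s, alpha, beta))
        if v <= alpha:               # MAX above already has something this good
            return v
        beta = min(beta, v)
    return v

# Same 2-ply example tree as before; α-β returns the same value, 3,
# while skipping some leaves of the second and third MIN nodes.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
```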

Properties of α-β
Pruning does not affect the final result.
Good move ordering improves the effectiveness of pruning; with perfect ordering, time complexity drops to O(b^{m/2}), doubling the solvable depth.
A simple example of the value of reasoning about which computations are relevant (a form of metareasoning).
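Both properties can be checked empirically by counting leaf evaluations on two orderings of the same small tree (the trees and the leaf counter are illustrative additions, not from the slides): the value is identical, but examining the best MIN subtree first prunes more.

```python
# Count leaf evaluations of α-β under good vs. bad child ordering.
# Tree encoding (lists of children, int leaves) is an assumption.
import math

leaves_seen = 0   # illustrative instrumentation: leaves evaluated so far

def ab(state, alpha, beta, maximizing):
    global leaves_seen
    if isinstance(state, int):
        leaves_seen += 1
        return state
    if maximizing:
        v = -math.inf
        for s in state:
            v = max(v, ab(s, alpha, beta, False))
            if v >= beta:
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for s in state:
        v = min(v, ab(s, alpha, beta, True))
        if v <= alpha:
            return v
        beta = min(beta, v)
    return v

def run(tree):
    """Return (minimax value, number of leaves evaluated)."""
    global leaves_seen
    leaves_seen = 0
    v = ab(tree, -math.inf, math.inf, True)
    return v, leaves_seen

good = [[9, 8, 7], [2, 1, 0], [3, 2, 1]]   # best MIN subtree first: α = 7 early
bad  = [[3, 2, 1], [2, 1, 0], [9, 8, 7]]   # best MIN subtree last: weak α bound
```

Here `run(good)` evaluates 5 leaves and `run(bad)` evaluates 8, yet both return the value 7, illustrating that ordering affects only the work done, never the result.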

Resource limits
Standard approach:
- Use Cutoff-Test instead of Terminal-Test, e.g., a depth limit (perhaps with quiescence search)
- Use Eval instead of Utility, i.e., an evaluation function that estimates the desirability of a position
Suppose we have 100 seconds and explore 10^4 nodes/second: 10^6 nodes per move. Since 35^{8/2} ≈ 10^6, α-β reaches depth 8: a pretty good chess program.
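The depth-8 estimate follows directly from the slide's numbers and the O(b^{m/2}) bound for well-ordered α-β; a short calculation confirms it:

```python
# Check the resource-limit arithmetic from the slide.
import math

nodes_per_move = 100 * 10**4   # 100 seconds at 10^4 nodes/second = 10^6 nodes
b = 35                          # branching factor for chess (slide's estimate)

# Plain minimax explores ~ b^m nodes, so m = log(nodes) / log(b) ~ 3.9.
depth_minimax = math.log(nodes_per_move) / math.log(b)

# Well-ordered alpha-beta explores ~ b^(m/2), doubling the reachable depth.
depth_alphabeta = 2 * depth_minimax    # ~ 7.8, i.e., depth 8
```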

Evaluation functions
For chess, typically a linear weighted sum of features:
Eval(s) = w_1 f_1(s) + w_2 f_2(s) + ... + w_n f_n(s)
e.g., w_1 = 9 with f_1(s) = (number of white queens) - (number of black queens), etc.
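A linear material evaluation of this form might be sketched as follows; the dict-of-piece-counts position encoding and the non-queen weights are conventional illustrative assumptions, not from the slide.

```python
# Linear evaluation Eval(s) = sum_i w_i * f_i(s), with each feature
# f_i = (white count of piece i) - (black count of piece i).
# The position encoding (dicts of piece counts) is assumed for illustration;
# only the queen weight 9 comes from the slide, the rest are conventional.

WEIGHTS = {"queen": 9, "rook": 5, "bishop": 3, "knight": 3, "pawn": 1}

def material_eval(white, black):
    """Positive values favor White, negative favor Black."""
    return sum(w * (white.get(p, 0) - black.get(p, 0))
               for p, w in WEIGHTS.items())

# Example: White is up a queen, Black is up a rook: 9 - 5 = +4 for White.
```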

Upper Confidence Tree (UCT) for games
The standard backup updates all ancestors of the expanded leaf v_l as
  n(v) ← n(v) + 1   (count of how often v has been played)
  Q(v) ← Q(v) + Δ   (sum of rewards received)
In games, use a negamax backup: while iterating upward, flip the sign of the reward at each level.
Survey of MCTS applications: Browne et al., "A Survey of Monte Carlo Tree Search Methods", 2012.
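The negamax backup can be sketched as a walk from the expanded leaf to the root, flipping the reward's sign at every ply; the minimal `Node` class here is an illustrative assumption.

```python
# Negamax backup in MCTS: increment visit counts and add the rollout
# reward with alternating sign on the path from leaf to root.
# The Node class is a minimal illustrative stand-in for a real search tree.

class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.n = 0       # n(v): visit count
        self.q = 0.0     # Q(v): accumulated signed reward

def negamax_backup(leaf, reward):
    node, r = leaf, reward
    while node is not None:
        node.n += 1      # n(v) <- n(v) + 1
        node.q += r      # Q(v) <- Q(v) + r
        r = -r           # flip sign: each level is the opponent's perspective
        node = node.parent

# A 3-node path: a reward of +1 at the leaf is -1 for its parent
# and +1 again for the root.
root = Node(); child = Node(root); leaf = Node(child)
negamax_backup(leaf, 1.0)
```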

Brief notes on game theory
(Small) zero-sum games can be represented by a payoff matrix. U_ji denotes the utility of player 1 if she chooses the pure (= deterministic) strategy i and player 2 chooses the pure strategy j.
Zero-sum games: U_ji = −U_ij, i.e., U^T = −U.
Finding a minimax-optimal mixed strategy p is a linear program:
  max_w w   s.t.   Up ≥ w1,  Σ_i p_i = 1,  p ≥ 0
Note that Up ≥ w1 implies min_j (Up)_j ≥ w.
Guaranteed payoff of player 1: max_p min_q q^T U p
Minimax theorem: max_p min_q q^T U p = min_q max_p q^T U p
Minimax theorem ⇒ an optimal p with w ≥ 0 exists.
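For a tiny two-strategy game, the quantity max_p min_j (Up)_j can be approximated by a grid search over mixed strategies p, avoiding an LP solver entirely. Matching pennies (value 0 at p = (1/2, 1/2)) serves as a check; the grid-search approach and the specific payoff matrix are illustrative assumptions, not from the slide, which solves the general case as a linear program.

```python
# Approximate the minimax value max_p min_j (Up)_j of a small zero-sum
# game by grid search over p = (x, 1-x). This is a crude stand-in for
# the LP "max w s.t. Up >= w1, sum_i p_i = 1, p >= 0" on the slide.

def worst_case(U, p):
    """min_j (Up)_j: player 2's best response to the mixed strategy p.
    Rows of U are indexed by player 2's pure strategy j, columns by i."""
    return min(sum(U[j][i] * p[i] for i in range(len(p)))
               for j in range(len(U)))

def minimax_value(U, steps=1000):
    """Grid search over two-strategy mixtures p = (x, 1-x)."""
    best = max(range(steps + 1),
               key=lambda k: worst_case(U, (k / steps, 1 - k / steps)))
    p = (best / steps, 1 - best / steps)
    return worst_case(U, p), p

# Matching pennies: player 1 wins (+1) on a match, loses (-1) otherwise.
# Its value is 0, attained by the uniform mixture (1/2, 1/2).
U = [[1, -1], [-1, 1]]
```

By the minimax theorem, the value found this way coincides with min_q max_p q^T U p, so neither player can guarantee more than 0 in matching pennies.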