Summary. Agenda: Games and intelligent opponents in games; Expected Value; Expected Max Algorithm; Minimax Algorithm; Alpha-beta Pruning; Simultaneous Games.

Artificial Intelligence and its Applications
Lecture 4: Game Playing
Professor Daniel Yeung (danyeung@ieee.org), Dr. Patrick Chan (patrickchan@ieee.org)
South China University of Technology, China

Summary of course topics (in order of increasing difficulty):
- Search: from start state to goal state
- Constraint Satisfaction Problems: consider constraints
- Game Playing: consider an adversary
- Markov Decision Processes: consider uncertainty
- Reinforcement Learning: no information is given

Agenda:
- Games: intelligent opponents in a game
- Expected Value
- Expected Max Algorithm
- Minimax Algorithm
- Alpha-beta Pruning
- Simultaneous Games

Search and Games
- In a search problem, the outcome depends only on your own decisions.
- A game is a competition with an adversary: the opponent acts in response to your decisions.
- Decisions must be made in limited time, so approximation is needed.
- Game playing is a traditional hallmark of intelligence and a model for many applications: military confrontations, negotiation, auctions, ...

Formal model of a game:
- State s; s_start: starting state
- IsEnd(s): whether s is an end state (game over)
- Actions(s): possible actions from state s
- Succ(s, a): resulting state if action a is chosen in state s
- Utility(s): the agent's utility for end state s
- Player(s) ∈ Players: the player who controls state s, with Players = {agent, opp}

Example (chess):
- Players: {white, black}
- State s: position of all pieces
- IsEnd(s): checkmate or draw
- Actions(s): legal chess moves that Player(s) can make
- Utility(s): +1 if white wins, 0 if draw, -1 if black wins

Question (bin game): you choose one of the bins (e.g. one containing -50 and 50); your opponent then chooses a number from your chosen bin; your goal is to maximize the chosen number. Which bin to choose depends on the attitude of the opponent: stochastic (acts based on probability), against you, or helpful (unlikely).
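As a concrete illustration of this formal model, here is a minimal Python sketch of the bin game. This code is not from the slides; the state encoding and helper names are assumptions chosen for clarity.

```python
# Illustrative sketch of the bin-game model; the state encoding is an assumption.
BINS = {"A": [-50, 50], "B": [1, 3], "C": [-5, 15]}  # example bin contents

def is_end(state):            # IsEnd(s): the game is over once a number is chosen
    return isinstance(state, int)

def player(state):            # Player(s): agent picks a bin, then opp picks a number
    return "agent" if state == "start" else "opp"

def actions(state):           # Actions(s)
    return list(BINS) if state == "start" else list(range(len(BINS[state])))

def succ(state, action):      # Succ(s, a)
    return action if state == "start" else BINS[state][action]

def utility(state):           # Utility(s): the chosen number
    return state
```

The later algorithm sketches in this lecture work directly on small hand-built trees for brevity rather than on this model.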

Expected Value
- Similar to a search problem, build a game tree; each root-to-leaf path is a possible outcome of the game.
- Each node is a decision point for a player (you or your opponent).
- A chance node, denoted by a circle, takes an action with some probability.

Stochastic policies: π_p(s, a) ∈ [0, 1] is the probability of player p taking action a in state s.

For a two-player game, the value of the game is

V_{agent,opp}(s) =
  Utility(s)                                                    if IsEnd(s)
  Σ_{a ∈ Actions(s)} π_agent(s, a) · V_{agent,opp}(Succ(s, a))   if Player(s) = agent
  Σ_{a ∈ Actions(s)} π_opp(s, a) · V_{agent,opp}(Succ(s, a))     if Player(s) = opp

Example 1 (bins {-50, 50}, {1, 3}, {-5, 15}): assume π_agent(s_start, a) = 1/3 for each of the three bins and π_opp(s, L) = π_opp(s, R) = 0.5 for any s. The value of the game is
V_{agent,opp}(S_1) = 1/3 · V(S_11) + 1/3 · V(S_12) + 1/3 · V(S_13)
= 1/3 × (0.5 × (-50) + 0.5 × 50) + 1/3 × (0.5 × 1 + 0.5 × 3) + 1/3 × (0.5 × (-5) + 0.5 × 15)
= 7/3

Example 2: assume π_agent(s_start, A) = 1 (the first bin is always chosen) and π_opp(s, L) = π_opp(s, R) = 0.5 for any s. Then
V_{agent,opp}(S_1) = 1 · V(S_11) + 0 · V(S_12) + 0 · V(S_13)
= 1 × (0.5 × (-50) + 0.5 × 50) + 0 × (0.5 × 1 + 0.5 × 3) + 0 × (0.5 × (-5) + 0.5 × 15)
= 0
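For readers who prefer code, here is a minimal sketch of the expected-value computation on Example 1. It is an illustration under an assumed tree data structure, not code from the slides.

```python
# Sketch: value of a game when both players follow stochastic policies.
# A leaf is a utility; an internal node is a list of (probability, child) pairs,
# where the probabilities come from the acting player's policy.
def expected_value(node):
    if isinstance(node, (int, float)):   # IsEnd(s): return Utility(s)
        return node
    return sum(p * expected_value(child) for p, child in node)

# Example 1: agent picks each bin with probability 1/3, opponent picks uniformly.
tree = [(1/3, [(0.5, -50), (0.5, 50)]),
        (1/3, [(0.5, 1),   (0.5, 3)]),
        (1/3, [(0.5, -5),  (0.5, 15)])]
print(expected_value(tree))   # 7/3 ≈ 2.33
```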

Expected Value, Example 3: assume π_opp(s_11, L) = 0.4, π_opp(s_11, R) = 0.6; π_opp(s_12, L) = π_opp(s_12, R) = 0.5; π_opp(s_13, L) = 0.2, π_opp(s_13, R) = 0.8. Which action should we choose? The values of the three actions are
V_{agent,opp}(A) = 0.4 × (-50) + 0.6 × 50 = 10
V_{agent,opp}(B) = 0.5 × 1 + 0.5 × 3 = 2
V_{agent,opp}(C) = 0.2 × (-5) + 0.8 × 15 = 11
We should take the action with the maximum value, so the best action is C.

Expected Max Algorithm
- The Expected Max algorithm selects the action maximizing the value over actions; max nodes are denoted by an upward-pointing triangle.

V_{max,opp}(s) =
  Utility(s)                                                  if IsEnd(s)
  max_{a ∈ Actions(s)} V_{max,opp}(Succ(s, a))                 if Player(s) = agent
  Σ_{a ∈ Actions(s)} π_opp(s, a) · V_{max,opp}(Succ(s, a))      if Player(s) = opp

Example: assume π_opp(s, L) = π_opp(s, R) = 0.5 for any s, with leaf pairs (-10, 20), (10, 30), and (5, 15). The chance-node values are 5, 20, and 10, so the agent chooses the second action and V_{max,opp}(s_start) = 20.

Minimax (motivation)
- Unfortunately, we never know what our opponent will actually do.
- Assuming they act randomly may be too optimistic.
- The worst case should be considered.
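A compact code sketch of the Expected Max (expectimax) recursion on the example tree above. This is an illustrative assumption, not the lecturer's code.

```python
# Sketch: expectimax on hand-built trees. A node is either a utility (leaf),
# ("max", [children...]) for the agent, or ("chance", [(probability, child), ...])
# for a stochastic opponent.
def expectimax(node):
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        return max(expectimax(c) for c in children)
    return sum(p * expectimax(c) for p, c in children)   # chance node

# Expected Max example: leaf pairs (-10, 20), (10, 30), (5, 15), uniform opponent.
tree = ("max", [("chance", [(0.5, -10), (0.5, 20)]),
                ("chance", [(0.5, 10),  (0.5, 30)]),
                ("chance", [(0.5, 5),   (0.5, 15)])])
print(expectimax(tree))   # 20.0 -> the agent chooses the second action
```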

Minimax
- Minimax assumes the opponent selects the worst action for the agent.
- Min nodes are denoted by an upside-down triangle.

V_{max,min}(s) =
  Utility(s)                                      if IsEnd(s)
  max_{a ∈ Actions(s)} V_{max,min}(Succ(s, a))     if Player(s) = agent
  min_{a ∈ Actions(s)} V_{max,min}(Succ(s, a))     if Player(s) = opp

Example 1: which action will the agent choose under minimax? With bins {-50, 50}, {1, 3}, {-5, 15}, the min values of the three bins are -50, 1, and -5, so the agent chooses the second bin and V_{max,min}(s_start) = 1.

Node summary:
- Chance node: weighted sum of its children
- Max node: maximum of its children
- Min node: minimum of its children

Example 2: a larger tree mixing max, min, and chance nodes (leaf values on the slide include 2, 5, 7, 10, -50, 8, and 22); the same three rules are applied bottom-up.
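A minimal minimax sketch on the same kind of toy tree (an illustrative assumption, not the slide's pseudocode).

```python
# Sketch: minimax. A node is a utility (leaf), ("max", children) for the agent,
# or ("min", children) for the adversarial opponent.
def minimax(node):
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    values = [minimax(c) for c in children]
    return max(values) if kind == "max" else min(values)

# Example 1: bins {-50, 50}, {1, 3}, {-5, 15}.
tree = ("max", [("min", [-50, 50]), ("min", [1, 3]), ("min", [-5, 15])])
print(minimax(tree))   # 1 -> the agent should choose the second bin
```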

New Game: Rules
- You choose one of the three bins.
- Then flip a coin: if heads, move one bin to the left (with wrap-around); if tails, stick with your choice.
- Your opponent chooses a number from that bin.
- Your goal is to maximize the chosen number.

Now we have three parties: Players = {agent, coin, opp}.

V_{max,coin,min}(s) =
  Utility(s)                                                       if IsEnd(s)
  max_{a ∈ Actions(s)} V_{max,coin,min}(Succ(s, a))                  if Player(s) = agent
  Σ_{a ∈ Actions(s)} π_coin(s, a) · V_{max,coin,min}(Succ(s, a))      if Player(s) = coin
  min_{a ∈ Actions(s)} V_{max,coin,min}(Succ(s, a))                  if Player(s) = opp

Example (bins {-50, 50}, {1, 3}, {-5, 15}, fair coin):
V_{max,coin,min}(s_start)
= max( E[min(-50, 50), min(-5, 15)], E[min(1, 3), min(-50, 50)], E[min(-5, 15), min(1, 3)] )
= max( E[-50, -5], E[1, -50], E[-5, 1] )
= max( -27.5, -24.5, -2 )
= -2
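A minimal expectiminimax sketch for the coin game above (an illustrative assumption; the tree construction encodes the wrap-around rule directly).

```python
# Sketch: expectiminimax. Node kinds: "max" (agent), "chance" (coin), "min" (opponent);
# leaves are utilities.
def expectiminimax(node):
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    return sum(p * expectiminimax(c) for p, c in children)   # coin flip

bins = [("min", [-50, 50]), ("min", [1, 3]), ("min", [-5, 15])]
# Choosing bin i: with probability 0.5 stay on bin i, with 0.5 move one bin left (wrap).
tree = ("max", [("chance", [(0.5, bins[i]), (0.5, bins[i - 1])]) for i in range(3)])
print(expectiminimax(tree))   # -2.0 -> pick the third bin
```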

Time Complexity
- Once a game is modeled as a tree, standard tree-search techniques can be used.
- Complexity: space O(d), time O(b^(2d)), where b is the branching factor and d is the depth (each player moves d times).
- Even for a simple game like Tic-Tac-Toe, the tree is very complicated (full game tree: https://commons.wikimedia.org/wiki/File:Tic-tac-toe-full-game-tree-x-rational.jpg).
- Example: chess has b ≈ 35 and d ≈ 50, giving roughly 35^100 ≈ 2.55 × 10^154 nodes; the path to reach a utility is long, and the time/space complexity is far too large in practice.

Advanced Methods: how to speed up minimax?
- Evaluation functions: do not compute the TRUE utility but approximate it using domain-specific knowledge.
- Alpha-beta pruning: a general-purpose technique that ignores unnecessary paths yet still computes the exact answer.

Evaluation functions (idea): the tree from s_start to s_end is very tall; instead of searching all the way to a terminal utility (e.g. 1 for a win), stop at a maximum depth d_max and evaluate the state s reached there.

Advanced Method: Evaluation Function
- Limited-depth tree search: stop at maximum depth d_max.
- Eval(s) approximates the value of V_{max,min}(s) at the cut-off (it may be very inaccurate).

V_{max,min}(s, d) =
  Utility(s)                                             if IsEnd(s)
  Eval(s)                                                if d = 0
  max_{a ∈ Actions(s)} V_{max,min}(Succ(s, a), d - 1)     if Player(s) = agent
  min_{a ∈ Actions(s)} V_{max,min}(Succ(s, a), d - 1)     if Player(s) = opp

Example (chess): Eval(s) = material + mobility + king-safety + center-control
- Material: 10^100 (K − K') + 9 (Q − Q') + 5 (R − R') + 3 (B − B') + 3 (N − N') + 1 (P − P'), where K, Q, R, B, N, P count our kings, queens, rooks, bishops, knights, and pawns, and the primed symbols count the opponent's pieces.
- Mobility: 0.1 × (our number of legal moves − the opponent's number of legal moves)
- King-safety: keeping the king safe is good.
- Center-control: controlling the center of the board is good.

Advanced Method: Alpha-beta Pruning
- In some cases, visiting some branches of the minimax tree is unnecessary.
- Example: the opponent always takes the minimal value; once the right min node reveals a utility of 2, its value cannot exceed 2, which is already worse than the 3 guaranteed on the left, so there is no need to investigate that branch further (slide tree leaves: 3, 5, 2, 10).
- Prune a node if its value v cannot lie in the interval bounded by α and β, i.e. if ¬(α ≤ v ≤ β), where α_s is a lower bound on the value of max node s and β_s is an upper bound on the value of min node s.
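A minimal sketch of depth-limited minimax with an evaluation function at the cut-off. This is an illustrative assumption; the crude averaging heuristic below is invented for the example and is not taken from the slides.

```python
# Sketch: minimax cut off at a maximum depth, falling back on Eval(s).
def depth_limited_minimax(node, depth, eval_fn):
    if isinstance(node, (int, float)):           # IsEnd(s): true utility
        return node
    if depth == 0:                               # cut-off: approximate with Eval(s)
        return eval_fn(node)
    kind, children = node
    values = [depth_limited_minimax(c, depth - 1, eval_fn) for c in children]
    return max(values) if kind == "max" else min(values)

# A deliberately crude Eval: the average of a subtree's immediate leaf values.
def crude_eval(node):
    _, children = node
    leaves = [c for c in children if isinstance(c, (int, float))]
    return sum(leaves) / len(leaves) if leaves else 0.0

tree = ("max", [("min", [-50, 50]), ("min", [1, 3]), ("min", [-5, 15])])
print(depth_limited_minimax(tree, 1, crude_eval))   # 5.0, but the true minimax value is 1
```

The cut-off search prefers the third bin here even though full minimax prefers the second, illustrating the slide's warning that Eval(s) may be very inaccurate.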

Advanced Method: Alpha-beta Pruning, Examples
- Example 1 (worked tree on the slide with leaf values such as 5, 7, 9, 8, 3, 2, 4): the last node can be pruned because its (α, β) interval no longer overlaps with that of every ancestor, where α_s is a lower bound on the value of max node s and β_s is an upper bound on the value of min node s.
- Example 2 (leaf values 8, 4, 9, 7, 3): once one branch guarantees 8, a sibling branch whose value cannot be bigger than 8 need not be checked, but branches whose value may still equal 8 must still be examined.
- Examples 3 and 4 (worked trees with leaf values such as 9, 7, 8, 3, 20, 50, -2, 99, 1, 5, 15): the slides step through the same pruning rule and mark where to prune.
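A minimal alpha-beta pruning sketch (an illustrative assumption, not the slide's pseudocode). It returns the same value as plain minimax but skips branches whose value cannot fall inside the (α, β) window.

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf):
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        value = -math.inf
        for c in children:
            value = max(value, alphabeta(c, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:        # remaining siblings cannot matter: prune
                break
        return value
    value = math.inf
    for c in children:
        value = min(value, alphabeta(c, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break
    return value

# The first pruning example from the slides: after the left subtree gives 3,
# the right min node can stop as soon as it sees 2 (its value can never exceed 2),
# so the leaf 10 is never visited.
tree = ("max", [("min", [3, 5]), ("min", [2, 10])])
print(alphabeta(tree))   # 3
```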

Simultaneous Games
- The games so far are turn-based; in a simultaneous game both players act at the same time.

Example: Two-finger Morra. Rules: players A and B each show 1 or 2 fingers. If both show 1, B gives A 2 dollars. If both show 2, B gives A 4 dollars. Otherwise, A gives B 3 dollars.

Payoff V(a, b) to player A, where a, b ∈ Actions:

            B shows 1   B shows 2
A shows 1        2          -3
A shows 2       -3           4

Types of strategy:
- Pure strategy: always do the same action, i.e. π(b) = 1 and π(a) = 0 for every a ≠ b, a ∈ Actions. E.g. "always show 1": π = [1, 0].
- General (mixed) strategy: take each action with some probability 0 ≤ π(a) ≤ 1 for a ∈ Actions. E.g. uniformly random: π = [1/2, 1/2].

Simultaneous Game: Expected Value
The value of the game if A follows π_A and B follows π_B is

V(π_A, π_B) = Σ_{a,b} π_A(a) · π_B(b) · V(a, b)

Example: π_A = [1, 0], π_B = [1/2, 1/2]:
V(π_A, π_B) = (1 × 1/2 × 2) + (1 × 1/2 × (-3)) + (0 × 1/2 × (-3)) + (0 × 1/2 × 4) = -1/2

Summary of course topics (in order of increasing difficulty):
- Search: from start state to goal state
- Constraint Satisfaction Problems: consider constraints
- Game Playing: consider an adversary
- Markov Decision Processes: consider uncertainty
- Reinforcement Learning: no information is given
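A small code sketch of this simultaneous-game value computation for Two-finger Morra, using the payoff matrix above (illustrative, not from the slides).

```python
# Sketch: V(pi_A, pi_B) for Two-finger Morra; V[a][b] is A's payoff when
# A plays its a-th action and B plays its b-th action.
V = [[2, -3],
     [-3, 4]]

def game_value(pi_a, pi_b):
    # V(pi_A, pi_B) = sum over a, b of pi_A(a) * pi_B(b) * V(a, b)
    return sum(pi_a[a] * pi_b[b] * V[a][b]
               for a in range(len(pi_a)) for b in range(len(pi_b)))

# Example from the slide: A always shows 1 finger, B plays uniformly at random.
print(game_value([1, 0], [0.5, 0.5]))   # -0.5
```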