
CS 188: Artificial Intelligence
Adversarial Search II

Instructor: Anca Dragan, University of California, Berkeley
[These slides adapted from Dan Klein and Pieter Abbeel]

Minimax Example
[Game tree figure: minimax tree with leaf values 3, 12, 8, 2, 4, 6, 14, 5, 2]

Resource Limits

Game Tree Pruning

Minimax Pruning
[Game tree figure: minimax tree with some branches pruned; visible leaf values 3, 12, 8, 2, 14, 5, 2]

Alpha-Beta Pruning
o General configuration (MIN version):
  o We're computing the MIN-VALUE at some node n
  o We're looping over n's children, so n's estimate of the children's min is dropping
  o Who cares about n's value? MAX, at choice points above n
  o Let a be the best value that MAX can get at any choice point along the current path from the root
  o If n becomes worse than a, MAX will avoid it, so we can stop considering n's other children (it's already bad enough that it won't be played)
o The MAX version is symmetric
[Figure: alternating MAX and MIN layers on the path from the root down to the MIN node n, with a the best MAX value on that path]

Alpha-Beta Implementation

α: MAX's best option on path to root
β: MIN's best option on path to root

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β: return v
        α = max(α, v)
    return v

def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α: return v
        β = min(β, v)
    return v
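A minimal runnable Python version of the same pseudocode, folding max-value and min-value into one recursive function. The state interface (is_terminal, utility, is_max_turn, successors) is a hypothetical placeholder, not a course-provided API:

    import math

    def alpha_beta_value(state, alpha, beta):
        """Minimax value of state with alpha-beta pruning (a sketch)."""
        if state.is_terminal():
            return state.utility()
        if state.is_max_turn():
            v = -math.inf
            for succ in state.successors():
                v = max(v, alpha_beta_value(succ, alpha, beta))
                if v >= beta:            # MIN above will never let play reach here
                    return v
                alpha = max(alpha, v)
            return v
        else:
            v = math.inf
            for succ in state.successors():
                v = min(v, alpha_beta_value(succ, alpha, beta))
                if v <= alpha:           # MAX above already has something better
                    return v
                beta = min(beta, v)
            return v

    # Initial call from the root:
    # alpha_beta_value(root, -math.inf, math.inf)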

Alpha-Beta Pruning Properties
o This pruning has no effect on the minimax value computed for the root!
o Values of intermediate nodes might be wrong
  o Important: children of the root may have the wrong value
  o So the most naïve version won't let you do action selection
o Good child ordering improves the effectiveness of pruning
o With perfect ordering:
  o Time complexity drops to O(b^(m/2))
  o Doubles the solvable depth!
  o Full search of, e.g., chess is still hopeless

Alpha-Beta Quiz

Alpha-Beta Quiz 2

Resource Limits

Resource Limits
o Problem: In realistic games, we cannot search all the way to the leaves!
o Solution: depth-limited search
  o Instead, search only to a limited depth in the tree
  o Replace terminal utilities with an evaluation function for non-terminal positions
o Example:
  o Suppose we have 100 seconds and can explore 10K nodes/sec
  o So we can check 1M nodes per move
  o α-β reaches about depth 8: a decent chess program
o Guarantee of optimal play is gone
o More plies make a BIG difference
o Use iterative deepening for an anytime algorithm
[Figure: depth-limited game tree; values below the cutoff are replaced by estimates]
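A minimal Python sketch of depth-limited minimax, under the same hypothetical state interface as the alpha-beta sketch above; evaluate is any function that scores non-terminal states:

    def depth_limited_value(state, depth, evaluate):
        """Minimax cut off at a fixed depth; estimates replace utilities."""
        if state.is_terminal():
            return state.utility()
        if depth == 0:
            return evaluate(state)    # evaluation function instead of true utility
        values = [depth_limited_value(s, depth - 1, evaluate)
                  for s in state.successors()]
        return max(values) if state.is_max_turn() else min(values)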

Depth Matters
o Evaluation functions are always imperfect
o The deeper in the tree the evaluation function is buried, the less the quality of the evaluation function matters
o An important example of the tradeoff between complexity of features and complexity of computation
[Demo: depth limited (L6D4, L6D5)]

Video of Demo Limited Depth (2)

Video of Demo Limited Depth (10)

Evaluation Functions

Evaluation Functions
o Evaluation functions score non-terminals in depth-limited search
o Ideal function: returns the actual minimax value of the position
o In practice: typically a weighted linear sum of features:
  Eval(s) = w1 f1(s) + w2 f2(s) + ... + wn fn(s)
o e.g. f1(s) = (num white queens - num black queens), etc.
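As an illustration, here is a tiny Python evaluation function in that weighted-linear form. The feature extractors and the weights are made up for the example, not taken from the lecture:

    def evaluate(state):
        """Weighted linear evaluation: Eval(s) = w1*f1(s) + w2*f2(s) + ..."""
        features = [
            state.num_white_queens() - state.num_black_queens(),  # f1(s)
            state.num_white_pawns() - state.num_black_pawns(),    # f2(s), hypothetical
        ]
        weights = [9.0, 1.0]  # assumed material weights, for illustration only
        return sum(w * f for w, f in zip(weights, features))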

Uncertain Outcomes

Worst-Case vs. Average Case
[Figure: a game tree over the leaf values 10, 10, 9, 100, backed up worst-case (min) on one side and average-case (chance) on the other]
Idea: Uncertain outcomes are controlled by chance, not an adversary!

Expectimax Search
o Why wouldn't we know what the result of an action will be?
  o Explicit randomness: rolling dice
  o Unpredictable opponents: the ghosts respond randomly
  o Actions can fail: when moving a robot, wheels might slip
o Values should now reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes
o Expectimax search: compute the average score under optimal play
  o Max nodes as in minimax search
  o Chance nodes are like min nodes, but the outcome is uncertain
  o Calculate their expected utilities, i.e. take the weighted average (expectation) of children
o Later, we'll learn how to formalize the underlying uncertain-result problems as Markov Decision Processes
[Demo: min vs exp (L7D1,2)]

Video of Demo Minimax vs Expectimax (Min)

Video of Demo Minimax vs Expectimax (Exp)

Expectimax Pseudocode

def value(state):
    if the state is a terminal state: return the state's utility
    if the next agent is MAX: return max-value(state)
    if the next agent is EXP: return exp-value(state)

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v

def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v
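The same expectimax backup as runnable Python, a minimal sketch. The hypothetical state interface matches the earlier sketches, except that successors() is assumed to yield (probability, successor) pairs; the probability is ignored at MAX nodes:

    def expectimax_value(state):
        """Expectimax: MAX nodes take the best child, chance nodes average."""
        if state.is_terminal():
            return state.utility()
        if state.is_max_turn():
            # MAX node: best achievable value over children
            return max(expectimax_value(s) for _, s in state.successors())
        # Chance node: probability-weighted average (expectation) of children
        return sum(p * expectimax_value(s) for p, s in state.successors())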

Expectimax Pseudocode (worked example)

def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v

Example: a chance node with children of values 8, 24, and -12, reached with probabilities 1/2, 1/3, and 1/6:

v = (1/2)(8) + (1/3)(24) + (1/6)(-12) = 4 + 8 - 2 = 10

Expectimax Example
[Game tree figure: expectimax tree with leaf values 3, 12, 9, 2, 4, 6, 15, 6, 0]

Expectimax Pruning?
[Game tree figure: partially explored chance node with leaf values 3, 12, 9, 2]
In general, no: without bounds on leaf values, an unseen outcome can always raise or lower a chance node's average, so no child can be safely skipped.

Depth-Limited Expectimax
[Figure: depth-limited expectimax tree; the backed-up numbers 400, 300, 492, and 362 are estimates of the true expectimax value (which would require a lot of work to compute)]

Probabilities

Reminder: Probabilities
o A random variable represents an event whose outcome is unknown
o A probability distribution is an assignment of weights to outcomes
o Example: Traffic on freeway
  o Random variable: T = whether there's traffic
  o Outcomes: T in {none, light, heavy}
  o Distribution: P(T=none) = 0.25, P(T=light) = 0.50, P(T=heavy) = 0.25
o Some laws of probability (more later):
  o Probabilities are always non-negative
  o Probabilities over all possible outcomes sum to one
o As we get more evidence, probabilities may change:
  o P(T=heavy) = 0.25, P(T=heavy | Hour=8am) = 0.60
o We'll talk about methods for reasoning about and updating probabilities later

Reminder: Expectations
o The expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes
o Example: How long to get to the airport?
  E[time] = 0.25 × 20 min + 0.50 × 30 min + 0.25 × 60 min = 35 min
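The same computation as a short Python check (values taken straight from the slide):

    times = [20, 30, 60]        # minutes
    probs = [0.25, 0.50, 0.25]  # P(each travel time)
    expected = sum(p * t for p, t in zip(probs, times))
    print(expected)             # 35.0 minutes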

What Probabilities to Use?
o In expectimax search, we have a probabilistic model of how the opponent (or environment) will behave in any state
  o The model could be a simple uniform distribution (roll a die)
  o The model could be sophisticated and require a great deal of computation
  o We have a chance node for any outcome out of our control: opponent or environment
  o The model might say that adversarial actions are likely!
o For now, assume each chance node magically comes along with probabilities that specify the distribution over its outcomes
o Having a probabilistic belief about another agent's action does not mean that the agent is flipping any coins!

Quiz: Informed Probabilities
o Let's say you know that your opponent is actually running a depth-2 minimax, using the result 80% of the time, and moving randomly otherwise
o Question: What tree search should you use?
o Answer: Expectimax!
  o To figure out EACH chance node's probabilities, you have to run a simulation of your opponent
  o This kind of thing gets very slow very quickly
  o Even worse if you have to simulate your opponent simulating you... except for minimax, which has the nice property that it all collapses into one game tree

Modeling Assumptions

The Dangers of Optimism and Pessimism
o Dangerous optimism: assuming chance when the world is adversarial
o Dangerous pessimism: assuming the worst case when it's not likely

Assumptions vs. Reality

                     Adversarial Ghost            Random Ghost
Minimax Pacman       Won 5/5, Avg. Score: 483     Won 5/5, Avg. Score: 493
Expectimax Pacman    Won 1/5, Avg. Score: -303    Won 5/5, Avg. Score: 503

Results from playing 5 games. Pacman used depth-4 search with an eval function that avoids trouble; the ghost used depth-2 search with an eval function that seeks Pacman.
[Demos: world assumptions (L7D3,4,5,6)]

Video of Demo World Assumptions Random Ghost Expectimax Pacman

Video of Demo World Assumptions Adversarial Ghost Minimax Pacman

Video of Demo World Assumptions Adversarial Ghost Expectimax Pacman

Video of Demo World Assumptions Random Ghost Minimax Pacman

Other Game Types

Mixed Layer Types
o E.g. backgammon
o Expectiminimax
  o The environment is an extra "random agent" player that moves after each min/max agent
  o Each node computes the appropriate combination of its children (see the sketch below)
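A minimal Python sketch of the expectiminimax backup, one branch per layer type. The layer() method returning 'max', 'min', or 'chance' is a hypothetical interface, and chance-node successors() is assumed to yield (probability, successor) pairs:

    def expectiminimax_value(state):
        """Interleaved MAX, MIN, and chance layers, as in backgammon."""
        if state.is_terminal():
            return state.utility()
        layer = state.layer()
        if layer == 'max':
            return max(expectiminimax_value(s) for s in state.successors())
        if layer == 'min':
            return min(expectiminimax_value(s) for s in state.successors())
        # Chance layer (e.g. a dice roll): expected value over outcomes
        return sum(p * expectiminimax_value(s) for p, s in state.successors())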

Example: Backgammon
o Dice rolls increase b: 21 possible rolls with 2 dice
o Backgammon has about 20 legal moves
o Depth 2: 20 × (21 × 20)^3 ≈ 1.2 × 10^9 nodes
o As depth increases, the probability of reaching a given search node shrinks
  o So the usefulness of search is diminished
  o So limiting depth is less damaging
  o But pruning is trickier
o Historic AI: TD-Gammon used depth-2 search + a very good evaluation function + reinforcement learning: world-champion-level play
Image: Wikipedia

Multi-Agent Utilities
o What if the game is not zero-sum, or has multiple players?
o Generalization of minimax:
  o Terminals have utility tuples
  o Node values are also utility tuples
  o Each player maximizes its own component
  o Can give rise to cooperation and competition dynamically
[Figure: three-player game tree; root backed-up value (1,6,6), terminal utility tuples (7,1,2), (6,1,2), (7,2,1), (5,1,7), (1,5,2), (7,7,1), (5,2,5)]
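A short Python sketch of this generalization: node values are tuples, and the player to move picks the child whose tuple is best in its own component. utility() returning a tuple and player() returning an index are hypothetical interface assumptions:

    def multi_player_value(state):
        """Each player maximizes its own component of the utility tuple."""
        if state.is_terminal():
            return state.utility()            # e.g. (7, 1, 2) for three players
        i = state.player()                    # index of the player to move
        return max((multi_player_value(s) for s in state.successors()),
                   key=lambda utilities: utilities[i])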