CSE 573: Artificial Intelligence


CSE 573: Artificial Intelligence, Autumn 2010. Lecture 5: Expectimax Search, 10/14/2008. Luke Zettlemoyer. Most slides over the course adapted from either Dan Klein, Stuart Russell, or Andrew Moore.

Announcements. PS1 is due tomorrow, 5pm; DropBox instructions are on the assignment page. No late assignments; email Luke for an extension (requires a good reason). PS2 will go out soon. MDP/RL readings will be assigned from the RL book by Sutton & Barto, freely available online: http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html

Outline. Review adversarial search. Trace alpha/beta. Review probabilities / expectations. Expectimax search (one and two player). Rational preferences.

Adversarial Games. Deterministic, zero-sum games: tic-tac-toe, chess, checkers. One player maximizes the result; the other minimizes it. Minimax search: a state-space search tree in which players alternate turns. Each node has a minimax value: the best achievable utility against a rational adversary. Terminal values are part of the game; minimax values are computed recursively from them. [Figure: example minimax tree with max and min layers.]
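
A minimal Python sketch of that recursive computation (illustrative only; the explicit-list tree encoding below is an assumption, not part of the slides):

def minimax(node, maximizing=True):
    # A node is either a number (a terminal utility) or a list of child nodes.
    if isinstance(node, (int, float)):
        return node                  # terminal values come from the game itself
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Toy tree: MAX chooses among three MIN nodes; the root's minimax value is 3.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree))   # -> 3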

Recap: Resource Limits. We cannot search to the leaves, so we use depth-limited search: search only to a limited depth of the tree and replace terminal utilities with an evaluation function for non-terminal positions. The guarantee of optimal play is gone. Replanning agents: search to choose the next action, then replan each new turn in response to the new state.

Evaluation Functions. A function that scores non-terminal positions. The ideal function returns the true utility of the position. Typically a weighted linear sum of features: number of pawns, rooks, etc.
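
A hedged sketch of that weighted-linear-sum idea (the feature names and weights below are invented for illustration):

def evaluate(state, weights, features):
    # Score a non-terminal state as a weighted sum of feature values.
    return sum(w * f(state) for w, f in zip(weights, features))

# Hypothetical chess-like material features, from the current player's point of view.
features = [lambda s: s["my_pawns"] - s["their_pawns"],
            lambda s: s["my_rooks"] - s["their_rooks"]]
weights = [1.0, 5.0]
print(evaluate({"my_pawns": 6, "their_pawns": 5, "my_rooks": 2, "their_rooks": 1},
               weights, features))   # -> 6.0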

Pruning for Minimax

Alpha-Beta Pseudocode. Inputs: state, the current game state; α, the value of the best alternative for MAX on the path to state; β, the value of the best alternative for MIN on the path to state. Returns: a utility value.

function MAX-VALUE(state, α, β)
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← −∞
    for a, s in SUCCESSORS(state) do
        v ← MAX(v, MIN-VALUE(s, α, β))
        if v ≥ β then return v
        α ← MAX(α, v)
    return v

function MIN-VALUE(state, α, β)
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← +∞
    for a, s in SUCCESSORS(state) do
        v ← MIN(v, MAX-VALUE(s, α, β))
        if v ≤ α then return v
        β ← MIN(β, v)
    return v
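
For reference, a small executable Python version of the same pseudocode, reusing the explicit-tree encoding from the earlier minimax sketch (an assumption, not the slides' own code):

import math

def max_value(node, alpha, beta):
    if isinstance(node, (int, float)):
        return node                          # terminal utility
    v = -math.inf
    for child in node:
        v = max(v, min_value(child, alpha, beta))
        if v >= beta:                        # MIN already has a better alternative above
            return v
        alpha = max(alpha, v)
    return v

def min_value(node, alpha, beta):
    if isinstance(node, (int, float)):
        return node
    v = math.inf
    for child in node:
        v = min(v, max_value(child, alpha, beta))
        if v <= alpha:                       # MAX already has a better alternative above
            return v
        beta = min(beta, v)
    return v

# The root value is 3; the second branch is cut off as soon as the 2 is seen.
print(max_value([[3, 12, 8], [2, 14, 5], [14, 5, 2]], -math.inf, math.inf))   # -> 3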

Alpha-Beta Pruning Example. [Figure: worked trace of alpha-beta on a small tree, showing how the (α, β) bounds tighten at each node and which leaves are pruned.] α is MAX's best alternative here or above; β is MIN's best alternative here or above.

Alpha-Beta Pruning Properties. The pruning has no effect on the final result at the root. Values of intermediate nodes might be wrong, but they are bounds. Good child ordering improves the effectiveness of pruning; with perfect ordering, time complexity drops to O(b^(m/2)), which doubles the solvable depth. Full search of, e.g., chess is still hopeless.

Expectimax Search Trees. What if we don't know what the result of an action will be? E.g., in solitaire the next card is unknown; in Minesweeper, the mine locations; in Pacman, the ghosts act randomly. We can do expectimax search: chance nodes are like min nodes, except the outcome is uncertain, so we calculate expected utilities. Max nodes behave as in minimax search; chance nodes take the average (expectation) of their children's values. [Figure: small expectimax tree with a max node over chance nodes.] Later, we'll learn how to formalize the underlying problem as a Markov Decision Process.

Maximum Expected Utility Why should we average utilities? Why not minimax?

Which Algorithm? Minimax: no point in trying. (3-ply lookahead; ghosts move randomly.)

Which Algorithm? Expectimax: wins some of the time. (3-ply lookahead; ghosts move randomly.)

Maximum Expected Utility. Why should we average utilities? Why not minimax? Principle of maximum expected utility: an agent should choose the action that maximizes its expected utility, given its knowledge. This is a general principle for decision making, and is often taken as the definition of rationality. We'll see this idea over and over in this course! Let's decompress this definition.

Reminder: Probabilities. A random variable represents an event whose outcome is unknown. A probability distribution is an assignment of weights to outcomes. Example: traffic on the freeway. Random variable: T = whether there's traffic. Outcomes: T in {none, light, heavy}. Distribution: P(T=none) = 0.25, P(T=light) = 0.55, P(T=heavy) = 0.20. Some laws of probability (more later): probabilities are always non-negative, and probabilities over all possible outcomes sum to one. As we get more evidence, probabilities may change: P(T=heavy) = 0.20, but P(T=heavy | Hour=8am) = 0.60. We'll talk about methods for reasoning with and updating probabilities later.
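
A tiny illustration of the traffic example in plain Python (the dict representation, and the none/light entries of the conditional distribution, are assumptions added to round out the example):

P_T = {"none": 0.25, "light": 0.55, "heavy": 0.20}
assert all(p >= 0 for p in P_T.values())          # probabilities are non-negative
assert abs(sum(P_T.values()) - 1.0) < 1e-9        # and sum to one

# Evidence changes probabilities: the heavy-traffic entry matches the slide;
# the other two are made up so the conditional distribution sums to one.
P_T_given_8am = {"none": 0.10, "light": 0.30, "heavy": 0.60}
print(P_T["heavy"], P_T_given_8am["heavy"])       # 0.2 0.6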

What are Probabilities? Objectivist / frequentist answer: averages over repeated experiments. E.g., empirically estimating P(rain) from historical observation; e.g., Pacman's estimate of what the ghost will do, given what it has done in the past. An assertion about how future experiments will go (in the limit). Makes one think of inherently random events, like rolling dice. Subjectivist / Bayesian answer: degrees of belief about unobserved variables. E.g., an agent's belief that it's raining, given the temperature; e.g., Pacman's belief that the ghost will turn left, given the state. We often learn probabilities from past experiences (more later). New evidence updates beliefs (more later).

Uncertainty Everywhere. Not just for games of chance! I'm sick: will I sneeze this minute? Email contains "FREE!": is it spam? Tooth hurts: do I have a cavity? Is 60 minutes enough to get to the airport? The robot rotated its wheel three times; how far did it advance? Safe to cross the street? (Look both ways!) Sources of uncertainty in random variables: inherently random processes (dice, etc.), insufficient or weak evidence, ignorance of underlying processes, unmodeled variables. The world's just noisy: it doesn't behave according to plan!

Reminder: Expectations. We can define a function f(x) of a random variable X. The expected value of a function is its average value, weighted by the probability distribution over inputs. Example: how long to get to the airport? Length of driving time as a function of traffic: L(none) = 20, L(light) = 30, L(heavy) = 60. What is my expected driving time? Notation: E_P(T)[L(T)]. Here, take P(T) = {none: 0.25, light: 0.5, heavy: 0.25}. Then E[L(T)] = L(none)·P(none) + L(light)·P(light) + L(heavy)·P(heavy) = (20 × 0.25) + (30 × 0.5) + (60 × 0.25) = 35.
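
The same computation as a couple of lines of Python (just the slide's numbers; the only assumption is the dict encoding):

P_T = {"none": 0.25, "light": 0.50, "heavy": 0.25}
L = {"none": 20, "light": 30, "heavy": 60}
expected_time = sum(P_T[t] * L[t] for t in P_T)   # weighted average of driving times
print(expected_time)   # -> 35.0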

Utilities. Utilities are functions from outcomes (states of the world) to real numbers that describe an agent's preferences. Where do utilities come from? In a game, they may be simple (+1/-1). Utilities summarize the agent's goals. Theorem: any set of preferences between outcomes can be summarized as a utility function (provided the preferences meet certain conditions). In general, we hard-wire utilities and let actions emerge (why don't we let agents decide their own utilities?). More on utilities soon.

Expectimax Search. In expectimax search, we have a probabilistic model of how the opponent (or environment) will behave in any state. The model could be a simple uniform distribution (roll a die), or it could be sophisticated and require a great deal of computation. We have a node for every outcome that is out of our control: opponent or environment. The model might say that adversarial actions are likely! For now, assume that for any state we magically have a distribution that assigns probabilities to opponent actions / environment outcomes.

Expectimax Pseudocode.

def value(s):
    if s is a max node: return maxvalue(s)
    if s is an exp node: return expvalue(s)
    if s is a terminal node: return evaluation(s)

def maxvalue(s):
    values = [value(s') for s' in successors(s)]
    return max(values)

def expvalue(s):
    values = [value(s') for s' in successors(s)]
    weights = [probability(s, s') for s' in successors(s)]
    return expectation(values, weights)
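
A self-contained runnable version of this pseudocode (the node encoding is an assumption: a number is a terminal value, a list of children is a max node, and a list of (probability, child) pairs is a chance node):

def value(node):
    if isinstance(node, (int, float)):
        return node                      # terminal: evaluation / utility
    if isinstance(node[0], tuple):
        return exp_value(node)           # chance node: (probability, child) pairs
    return max_value(node)               # otherwise a max node

def max_value(node):
    return max(value(child) for child in node)

def exp_value(node):
    return sum(p * value(child) for p, child in node)

# MAX chooses between two coin-flip chance nodes with expectations 7.0 and 5.5.
tree = [[(0.5, 8), (0.5, 6)], [(0.5, 10), (0.5, 1)]]
print(value(tree))   # -> 7.0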

Expectimax for Pacman. Notice that we've gotten away from thinking that the ghosts are trying to minimize Pacman's score; instead, they are now part of the environment, and Pacman has a belief (a distribution) over how they will act. Quiz: can we see minimax as a special case of expectimax? Quiz: what would Pacman's computation look like if we assumed that the ghosts were doing 1-ply minimax and taking the result 80% of the time, otherwise moving randomly? (One possible encoding of this ghost model is sketched below.)
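
One possible encoding of the second quiz's ghost model, purely as a sketch (the helper names and the exact way of mixing the 80/20 behavior are assumptions, not an official answer):

def ghost_action_distribution(actions, values_for_pacman):
    # actions: the ghost's legal moves; values_for_pacman[a]: Pacman's value if the ghost plays a.
    # The ghost plays its 1-ply minimizing move with probability 0.8 and otherwise moves uniformly.
    minimizing = min(actions, key=lambda a: values_for_pacman[a])
    uniform = 0.2 / len(actions)
    return {a: uniform + (0.8 if a == minimizing else 0.0) for a in actions}

print(ghost_action_distribution(["left", "right"], {"left": -5, "right": 3}))
# -> roughly {'left': 0.9, 'right': 0.1}; these become the weights at a chance node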

Expectimax for Pacman. Results from playing 5 games. Pacman does depth-4 search with an evaluation function that avoids trouble; the minimizing ghost does depth-2 search with an evaluation function that seeks Pacman.

                     Minimizing Ghost                Random Ghost
Minimax Pacman       Won 5/5, Avg. Score: 493        Won 5/5, Avg. Score: 483
Expectimax Pacman    Won 1/5, Avg. Score: -303       Won 5/5, Avg. Score: 503

Expectimax Pruning? Not easy. Exact: need bounds on possible values. Approximate: sample high-probability branches.

Expectimax Evaluation. Evaluation functions quickly return an estimate of a node's true value (which value: expectimax or minimax?). For minimax, the evaluation function's scale doesn't matter; we just want better states to have higher evaluations (get the ordering right). We call this insensitivity to monotonic transformations. For expectimax, we need magnitudes to be meaningful: e.g., applying x → x² maps leaf values 0, 40, 20, 30 to 0, 1600, 400, 900, which can change which action looks best (see the small demonstration below).
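
A small demonstration of that magnitude point, assuming uniform 50/50 chance nodes (the slide's figure does not spell the probabilities out):

def expectation(chance_node):
    return sum(p * v for p, v in chance_node)

left, right = [(0.5, 0), (0.5, 40)], [(0.5, 20), (0.5, 30)]
left_sq, right_sq = [(0.5, 0), (0.5, 1600)], [(0.5, 400), (0.5, 900)]

print(expectation(left), expectation(right))        # 20.0 25.0  -> right looks best
print(expectation(left_sq), expectation(right_sq))  # 800.0 650.0 -> after squaring, left looks best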

Mixed Layer Types. E.g., backgammon: expectiminimax. The environment is an extra player that moves after each agent. Chance nodes take expectations; otherwise it is like minimax.
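
A minimal expectiminimax sketch in the same spirit (the tagged-tuple tree encoding is an assumption for illustration):

def expectiminimax(node):
    # A number is a terminal; ('max', children) and ('min', children) are player nodes;
    # ('chance', [(p, child), ...]) is an environment move.
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    return sum(p * expectiminimax(c) for p, c in children)

# MAX moves, then a 50/50 chance node, then MIN replies.
tree = ('max', [('chance', [(0.5, ('min', [3, 9])), (0.5, ('min', [6, 8]))]),
                ('chance', [(0.5, ('min', [1, 2])), (0.5, ('min', [10, 10]))])])
print(expectiminimax(tree))   # -> 5.5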

Stochastic Two-Player. Dice rolls increase b: 21 possible rolls with 2 dice, and backgammon has about 20 legal moves. Depth 4 already means 20 × (21 × 20)^3 ≈ 1.2 × 10^9 nodes. As depth increases, the probability of reaching a given node shrinks, so the value of lookahead is diminished and limiting depth is less damaging; but pruning is less possible. TD-Gammon uses depth-2 search plus a very good evaluation function plus reinforcement learning: world-champion-level play.

Non-Zero-Sum Games. Similar to minimax: utilities are now tuples, and each player maximizes their own entry at each node. Values are propagated (backed up) from children as before. This can give rise to cooperation and competition dynamically. [Figure: game tree whose terminal nodes carry 3-player utility tuples such as (1,2,6) and (7,7,1).]

Preferences. An agent chooses among prizes (A, B, etc.) and lotteries (situations with uncertain prizes). Notation: a lottery over outcomes A and B is written [p, A; (1-p), B]; A ≻ B means A is preferred to B, and A ~ B means indifference between A and B.

Rational Preferences. We want some constraints on preferences before we call them rational. For example, an agent with intransitive preferences can be induced to give away all its money: if B ≻ C, an agent holding C would pay (say) 1 cent to get B; if A ≻ B, an agent holding B would pay (say) 1 cent to get A; if C ≻ A, an agent holding A would pay (say) 1 cent to get C. The agent ends up back where it started, three cents poorer.

Rational Preferences. The preferences of a rational agent must obey constraints, the axioms of rationality: orderability, transitivity, continuity, substitutability, and monotonicity. Theorem: rational preferences imply behavior describable as maximization of expected utility.

MEU Principle. Theorem [Ramsey, 1931; von Neumann & Morgenstern, 1944]: given any preferences satisfying these constraints, there exists a real-valued function U such that U(A) ≥ U(B) exactly when A is (weakly) preferred to B, and the utility of a lottery is its expected utility: U([p1, S1; ...; pn, Sn]) = Σ_i p_i U(S_i). Maximum expected utility (MEU) principle: choose the action that maximizes expected utility. Note: an agent can be entirely rational (consistent with MEU) without ever representing or manipulating utilities and probabilities, e.g., a lookup table for perfect tic-tac-toe, or a reflex vacuum cleaner.

Utility Scales. Normalized utilities: u+ = 1.0, u- = 0.0. Micromorts: a one-millionth chance of death; useful for paying to reduce product risks, etc. QALYs: quality-adjusted life years; useful for medical decisions involving substantial risk. Note: behavior is invariant under positive linear transformation.

Money. Money does not behave as a utility function. Given a lottery L, define its expected monetary value EMV(L). Usually U(L) < U(EMV(L)), i.e., people are risk-averse. Utility curve: for what probability p am I indifferent between a sure prize x and a lottery [p, $M; (1-p), $0] for large M? [Figure: typical empirical utility-of-money data, extrapolated with risk-prone behavior.]

Example: Human Rationality? Famous example of Allais (1953) A: [0.8,$4k; 0.2,$0] B: [1.0,$3k; 0.0,$0] C: [0.2,$4k; 0.8,$0] D: [0.25,$3k; 0.75,$0] Most people prefer B > A, C > D But if U($0) = 0, then B > A U($3k) > 0.8 U($4k) C > D 0.8 U($4k) > U($3k)