CS 343: Artificial Intelligence


CS 343: Artificial Intelligence
Decision Networks and Value of Perfect Information
Prof. Scott Niekum, The University of Texas at Austin
[These slides are based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

Decision Networks

Decision Networks
[Diagram: the umbrella decision network, with nodes Umbrella, Weather, and Forecast.]

Decision Networks

MEU: choose the action which maximizes the expected utility given the evidence.
Can directly operationalize this with decision networks: Bayes nets with nodes for utility and actions. Lets us calculate the expected utility for each action.
New node types:
Chance nodes (just like BNs)
Actions (rectangles, cannot have parents, act as observed evidence)
Utility node (diamond, depends on action and chance nodes)

Decision Networks

Action selection:
Instantiate all evidence
Set action node(s) each possible way
Calculate the posterior over all parents of the utility node, given the evidence
Calculate the expected utility for each action
Choose the maximizing action
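
A minimal Python sketch of this action-selection loop for the umbrella network (the table layout and helper names are mine; the numbers come from the slides that follow):

    # Decision-network action selection: for each action setting,
    # compute the expected utility under the posterior and take the max.
    P_W = {'sun': 0.7, 'rain': 0.3}                    # prior P(Weather)
    U = {('leave', 'sun'): 100, ('leave', 'rain'): 0,
         ('take', 'sun'): 20, ('take', 'rain'): 70}    # U(Action, Weather)

    def expected_utility(action, posterior):
        # EU(a) = sum_w P(w | evidence) * U(a, w)
        return sum(p * U[(action, w)] for w, p in posterior.items())

    def best_action(posterior, actions=('leave', 'take')):
        return max(actions, key=lambda a: expected_utility(a, posterior))

    print(best_action(P_W))                            # 'leave' (EU 70 vs. 35)
    print(best_action({'sun': 0.34, 'rain': 0.66}))    # with Forecast=bad: 'take'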

Decision Networks

EU(leave) = 0.7·100 + 0.3·0 = 70
EU(take) = 0.7·20 + 0.3·70 = 35
Optimal decision = leave

W     P(W)
sun   0.7
rain  0.3

A      W     U(A,W)
leave  sun   100
leave  rain  0
take   sun   20
take   rain  70

Decisions as Outcome Trees
[Tree diagram: root node Umbrella with evidence set {}, branches take/leave; each branch leads to a Weather chance node with outcomes sun/rain, ending in utilities U(t,s), U(t,r), U(l,s), U(l,r).]
Almost exactly like expectimax / MDPs. What's changed?

Example: Decision Networks (evidence: Forecast = bad)

EU(leave | F=bad) = 0.34·100 + 0.66·0 = 34
EU(take | F=bad) = 0.34·20 + 0.66·70 = 53
Optimal decision = take

W     P(W | F=bad)
sun   0.34
rain  0.66

A      W     U(A,W)
leave  sun   100
leave  rain  0
take   sun   20
take   rain  70

Decisions as Outcome Trees (with evidence Forecast = bad)
[Tree diagram: the same tree as before, but every node carries the evidence set {b}; the Weather branches sun/rain end in U(t,s), U(t,r), U(l,s), U(l,r).]

Ghostbusters Decision Network
[Diagram: action node Bust, chance node Ghost Location, and an m×n grid of sensor nodes Sensor(1,1) through Sensor(m,n), each a child of Ghost Location.]

Ghostbusters: Where to measure?

Value of Information

Value of Information

Idea: compute the value of acquiring evidence. Can be done directly from the decision network.

Example: buying oil drilling rights
Two blocks A and B, exactly one has oil, worth k
You can drill in one location
Prior probabilities 0.5 each, mutually exclusive
Drilling in either A or B has EU = k/2, so MEU = k/2

O    P(O)
a    1/2
b    1/2

D    O    U(D,O)
a    a    k
a    b    0
b    a    0
b    b    k

Question: what is the value of information of OilLoc?
Value of knowing which of A or B has oil
Value is the expected gain in MEU from the new information
Survey may say oil in a or oil in b, probability 0.5 each
If we know OilLoc, MEU is k (either way)
Gain in MEU from knowing OilLoc: VPI(OilLoc) = k − k/2 = k/2
Fair price of the information: k/2

VPI Example: Weather

MEU with no evidence: best action is leave, MEU = 0.7·100 + 0.3·0 = 70
MEU if forecast is bad: best action is take, MEU = 0.34·20 + 0.66·70 = 53
MEU if forecast is good: computed the same way from P(W | F=good)

W     P(W)
sun   0.7
rain  0.3

A      W     U(A,W)
leave  sun   100
leave  rain  0
take   sun   20
take   rain  70

W     P(W | F=bad)
sun   0.34
rain  0.66

Forecast distribution:
F      P(F)
good   0.59
bad    0.41

Value of Information

Assume we have evidence E = e. Value if we act now:
MEU(e) = max_a Σ_s P(s | e) U(s, a)

Assume we see that E' = e'. Value if we act then:
MEU(e, e') = max_a Σ_s P(s | e, e') U(s, a)

BUT E' is a random variable whose value is unknown, so we don't know what e' will be.

Expected value if E' is revealed and then we act:
MEU(e, E') = Σ_{e'} P(e' | e) MEU(e, e')

Value of information: how much MEU goes up by revealing E' first and then acting, over acting now:
VPI(E' | e) = MEU(e, E') − MEU(e)
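
A minimal Python sketch of this computation on the oil example above (the function and variable names, and the choice k = 1000, are mine for illustration):

    # VPI from the definitions: MEU acting now vs. expected MEU after
    # the evidence is revealed. Tables are the oil example's, with k = 1000.
    k = 1000.0
    P_oil = {'a': 0.5, 'b': 0.5}                       # prior P(OilLoc)
    U = {('a', 'a'): k, ('a', 'b'): 0.0,
         ('b', 'a'): 0.0, ('b', 'b'): k}               # U(DrillLoc, OilLoc)

    def meu(posterior, actions=('a', 'b')):
        # MEU(e) = max_a sum_s P(s | e) U(a, s)
        return max(sum(p * U[(a, s)] for s, p in posterior.items())
                   for a in actions)

    meu_now = meu(P_oil)                               # k/2 = 500.0
    # If OilLoc is revealed, each outcome o (prob 0.5) collapses the
    # posterior onto o, and the best drill choice then earns k.
    meu_revealed = sum(P_oil[o] * meu({o: 1.0}) for o in P_oil)  # k = 1000.0
    print(meu_revealed - meu_now)                      # VPI(OilLoc) = k/2 = 500.0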

VPI Properties

Nonnegative
Nonadditive (typically, but not always)
Order-independent
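
Written out (these are the standard formulations of the three properties, in the notation of the VPI definition above; the slide itself shows only the headings):

For all E', e: VPI(E' | e) ≥ 0
Nonadditive, typically: VPI(E_j, E_k | e) ≠ VPI(E_j | e) + VPI(E_k | e)
Order-independent: VPI(E_j, E_k | e) = VPI(E_j | e) + VPI(E_k | e, E_j) = VPI(E_k | e) + VPI(E_j | e, E_k)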

Quick VPI Questions

The soup of the day is either clam chowder or split pea, but you wouldn't order either one. What's the value of knowing which it is?
There are two kinds of plastic forks at a picnic. One kind is slightly sturdier. What's the value of knowing which?
You're playing the lottery. The prize will be $0 or $100. You can play any number between 1 and 100 (chance of winning is 1%). What is the value of knowing the winning number?

Value of Imperfect Information?

No such thing.
Information corresponds to the observation of a node in the decision network.
If data is noisy, that just means we don't observe the original variable, but another variable which is a noisy version of the original one.

VPI Question
[Diagram: decision network with action node DrillLoc and chance nodes OilLoc, Scout, and ScoutingReport.]
VPI(OilLoc)?
VPI(ScoutingReport)?
VPI(Scout)?
VPI(Scout | ScoutingReport)?

POMDPs

POMDPs

MDPs have:
States S
Actions A
Transition function P(s' | s, a) (or T(s, a, s'))
Rewards R(s, a, s')

POMDPs add:
Observations O
Observation function P(o | s) (or O(s, o))

POMDPs are MDPs over belief states b (distributions over S).
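
A minimal sketch of the belief update that makes this work, assuming dictionary-based tables (all names and numbers below are illustrative, not from the slides): after taking action a and observing o, b'(s') ∝ P(o | s') Σ_s P(s' | s, a) b(s).

    # Belief-state update for a POMDP.
    def belief_update(b, a, o, T, O):
        # b: state -> prob; T[(s, a)]: s' -> prob; O[s]: o -> prob
        new_b = {}
        for s, p in b.items():
            for sp, pt in T[(s, a)].items():
                new_b[sp] = new_b.get(sp, 0.0) + p * pt * O[sp].get(o, 0.0)
        z = sum(new_b.values())           # normalizer; equals P(o | b, a)
        return {sp: p / z for sp, p in new_b.items()}

    # Tiny two-state example:
    T = {('L', 'stay'): {'L': 0.9, 'R': 0.1},
         ('R', 'stay'): {'L': 0.1, 'R': 0.9}}
    O = {'L': {'ping': 0.8, 'quiet': 0.2},
         'R': {'ping': 0.3, 'quiet': 0.7}}
    b = {'L': 0.5, 'R': 0.5}
    print(belief_update(b, 'stay', 'ping', T, O))   # {'L': ~0.727, 'R': ~0.273}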

Demo: Ghostbusters with VPI

Example: Ghostbusters
In (static) Ghostbusters:
Belief state determined by the evidence to date {e}
Tree really over evidence sets
Probabilistic reasoning needed to predict what new evidence will be gained, given past evidence and the action taken

Solving POMDPs
One way: use truncated expectimax to compute approximate values of actions.
What if you only considered busting, or one sense followed by a bust? You get a VPI-based agent (sketched below)!
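
A sketch of that two-option comparison, bust now vs. one sense then bust (helper names and toy numbers are mine, not from the demo):

    def value_bust_now(b, U):
        # Best immediate bust: max_a sum_s b(s) * U(a, s)
        actions = {a for (a, s) in U}
        return max(sum(b[s] * U[(a, s)] for s in b) for a in actions)

    def value_sense_then_bust(b, sensor, U):
        # sum_o P(o | b) * value of the best bust under the posterior after o
        total = 0.0
        for o in {o for s in sensor for o in sensor[s]}:
            p_o = sum(b[s] * sensor[s].get(o, 0.0) for s in b)
            if p_o == 0.0:
                continue
            posterior = {s: b[s] * sensor[s].get(o, 0.0) / p_o for s in b}
            total += p_o * value_bust_now(posterior, U)
        return total

    b = {'L': 0.5, 'R': 0.5}
    U = {('L', 'L'): 10, ('L', 'R'): -5, ('R', 'L'): -5, ('R', 'R'): 10}
    sensor = {'L': {'ping': 0.8, 'quiet': 0.2},
              'R': {'ping': 0.3, 'quiet': 0.7}}
    # Sense first when its expected value exceeds busting now
    # (the gap is exactly the VPI of the sensor reading).
    print(value_bust_now(b, U), value_sense_then_bust(b, sensor, U))  # 2.5 6.25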

Video of Demo Ghostbusters with VPI

More Generally

General solutions map belief functions to actions
Can divide regions of belief space (the set of belief functions) into policy regions (gets complex quickly)
Can build approximate policies using discretization methods
Can factor belief functions in various ways
Overall, POMDPs are very hard (in fact, PSPACE-hard)
Most real problems are POMDPs, but we can rarely solve them in general!