CS 188: Artificial Intelligence

Lecture 19: Decision Diagrams
Pieter Abbeel --- UC Berkeley
Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew Moore

Decision Networks
MEU: choose the action which maximizes the expected utility given the evidence.
Can directly operationalize this with decision networks:
- Bayes nets with nodes for utility and actions
- Lets us calculate the expected utility for each action
New node types:
- Chance nodes (just like BNs)
- Actions (rectangles, cannot have parents, act as observed evidence)
- Utility node (diamond, depends on action and chance nodes)
(Network diagram: Umbrella action node, Weather and Forecast chance nodes, utility node U)

Decision Networks
Action selection:
- Instantiate all evidence
- Set action node(s) each possible way
- Calculate posteriors for all parents of the utility node, given the evidence
- Calculate expected utility for each action
- Choose the maximizing action
(Network diagram: Umbrella, Weather, Forecast)

Example: Decision Networks

W    P(W)
sun  0.7
rain 0.3

A     W     U(A,W)
leave sun   100
leave rain  0
take  sun   20
take  rain  70

Umbrella = leave: EU(leave) = 0.7 * 100 + 0.3 * 0 = 70
Umbrella = take:  EU(take)  = 0.7 * 20 + 0.3 * 70 = 35
Optimal decision = leave
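
A minimal sketch of this action-selection loop in Python for the umbrella network above; the dictionaries and function names are illustrative, not from any CS 188 codebase.

```python
# Umbrella example: pick the action maximizing expected utility.
P_W = {"sun": 0.7, "rain": 0.3}                    # P(W), no evidence

U = {("leave", "sun"): 100, ("leave", "rain"): 0,  # U(A, W) from the table
     ("take", "sun"): 20,  ("take", "rain"): 70}

def expected_utility(action, posterior):
    """EU(a) = sum_w P(w | evidence) * U(a, w)."""
    return sum(p * U[(action, w)] for w, p in posterior.items())

def best_action(posterior, actions=("leave", "take")):
    """MEU action: the one with the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(a, posterior))

print(expected_utility("leave", P_W))  # ≈ 70.0
print(expected_utility("take", P_W))   # ≈ 35.0
print(best_action(P_W))                # leave
```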

Decisions as Outcome Trees
(Tree diagram: from the empty evidence set {}, choose take or leave; each choice leads to a Weather chance node over sun/rain, ending in U(t,s), U(t,r), U(l,s), U(l,r))
Almost exactly like expectimax / MDPs. What's changed? There is no sequence of decisions: a single action, one chance node, then a utility.

Example: Decision Networks
Evidence: Forecast = bad

W    P(W | F=bad)
sun  0.34
rain 0.66

A     W     U(A,W)
leave sun   100
leave rain  0
take  sun   20
take  rain  70

Umbrella = leave: EU(leave | F=bad) = 0.34 * 100 + 0.66 * 0 = 34
Umbrella = take:  EU(take | F=bad)  = 0.34 * 20 + 0.66 * 70 = 53
Optimal decision = take
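
Reusing the sketch from above, conditioning on the forecast just means swapping in the posterior P(W | F=bad):

```python
P_W_given_bad = {"sun": 0.34, "rain": 0.66}
print(expected_utility("leave", P_W_given_bad))  # ≈ 34.0
print(expected_utility("take", P_W_given_bad))   # ≈ 53.0
print(best_action(P_W_given_bad))                # take: the decision flips
```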

Decisions as Outcome Trees
(Tree diagram: same structure as before, but every node is now conditioned on the evidence set {b} for Forecast = bad, ending in U(t,s), U(t,r), U(l,s), U(l,r))

Value of Information
Idea: compute the value of acquiring evidence. Can be done directly from the decision network.
Example: buying oil drilling rights
- Two blocks A and B, exactly one has oil, worth k
- You can drill in one location
- Prior probabilities 0.5 each, mutually exclusive
- Drilling in either A or B has EU = k/2, MEU = k/2
Question: what's the value of information of OilLoc?
- Value of knowing which of A or B has oil
- Value is the expected gain in MEU from the new info
- Survey may say "oil in a" or "oil in b", prob 0.5 each
- If we know OilLoc, MEU is k (either way)
- Gain in MEU from knowing OilLoc: VPI(OilLoc) = k - k/2 = k/2
- Fair price of the information: k/2

O  P(O)
a  1/2
b  1/2

D  O  U(D,O)
a  a  k
a  b  0
b  a  0
b  b  k

(Network diagram: DrillLoc action node, OilLoc chance node, utility node U)
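
The oil example is small enough to check numerically. A hedged sketch (taking k = 1 for concreteness; names are illustrative):

```python
k = 1.0
P_oil = {"a": 0.5, "b": 0.5}                     # prior over OilLoc
U_drill = {("a", "a"): k, ("a", "b"): 0.0,       # U(DrillLoc, OilLoc)
           ("b", "a"): 0.0, ("b", "b"): k}

def meu(posterior):
    """Max over drill locations of expected utility under the posterior."""
    return max(sum(p * U_drill[(d, o)] for o, p in posterior.items())
               for d in ("a", "b"))

meu_now = meu(P_oil)                                         # k/2
# Revealing OilLoc = o collapses the posterior to a point mass on o:
meu_revealed = sum(P_oil[o] * meu({o: 1.0}) for o in P_oil)  # k either way
print(meu_revealed - meu_now)                                # VPI(OilLoc) = 0.5
```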

VPI Example: Weather
- MEU with no evidence
- MEU if forecast is bad
- MEU if forecast is good
- Forecast distribution
(Network diagram: Umbrella, Weather, Forecast)

A     W     U(A,W)
leave sun   100
leave rain  0
take  sun   20
take  rain  70

F     P(F)
good  0.59
bad   0.41

Value of Information
Assume we have evidence E = e. Value if we act now:
    MEU(e) = max_a Σ_s P(s | e) U(s, a)
Assume we see that E' = e'. Value if we act then:
    MEU(e, e') = max_a Σ_s P(s | e, e') U(s, a)
BUT E' is a random variable whose value is unknown, so we don't know what e' will be.
Expected value if E' is revealed and then we act:
    MEU(e, E') = Σ_{e'} P(e' | e) MEU(e, e')
Value of information: how much MEU goes up by revealing E' first then acting, over acting now:
    VPI(E' | e) = MEU(e, E') − MEU(e)
                = [P(+e' | e) MEU(e, +e') + P(-e' | e) MEU(e, -e')] − MEU(e)   (binary case)
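
These formulas complete the weather exercise above. One caveat: the transcript never gives P(W | F=good); the sketch below reconstructs it from consistency with P(W) and P(F) (0.7 = 0.59 * P(sun|good) + 0.41 * 0.34 gives P(sun|good) ≈ 0.95), so treat those two numbers as an assumption.

```python
U = {("leave", "sun"): 100, ("leave", "rain"): 0,
     ("take", "sun"): 20,  ("take", "rain"): 70}

def meu(posterior):
    """MEU(e) = max_a sum_w P(w | e) U(a, w)."""
    return max(sum(p * U[(a, w)] for w, p in posterior.items())
               for a in ("leave", "take"))

P_F = {"good": 0.59, "bad": 0.41}
P_W_given_F = {"bad":  {"sun": 0.34, "rain": 0.66},
               "good": {"sun": 0.95, "rain": 0.05}}  # reconstructed, see above

meu_now = meu({"sun": 0.7, "rain": 0.3})                       # 70 (leave)
meu_forecast = sum(P_F[f] * meu(P_W_given_F[f]) for f in P_F)  # ≈ 77.8
print(meu_forecast - meu_now)                                  # VPI(Forecast) ≈ 7.8
```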

VPI Properties
- Nonnegative: VPI(E' | e) ≥ 0 for any E' and e
- Nonadditive: consider, e.g., obtaining E_j twice
- Order-independent

Quick VPI Questions
- The soup of the day is either clam chowder or split pea, but you wouldn't order either one. What's the value of knowing which it is?
- There are two kinds of plastic forks at a picnic. One kind is slightly sturdier. What's the value of knowing which?
- You're playing the lottery. The prize will be $0 or $100. You can play any number between 1 and 100 (chance of winning is 1%). What is the value of knowing the winning number?
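
One way to sanity-check the third question (a hedged back-of-the-envelope, not from the slides): acting now, every number has expected winnings 0.01 * $100 = $1, so MEU = $1; if the winning number is revealed first, you play it and win $100 for sure, so MEU = $100 and VPI = $99. The soup question sits at the opposite extreme: no answer would change your order, so the information is worth $0.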

POMDPs
MDPs have:
- States S
- Actions A
- Transition function P(s' | s, a) (or T(s, a, s'))
- Rewards R(s, a, s')
POMDPs add:
- Observations O
- Observation function P(o | s) (or O(s, o))
POMDPs are MDPs over belief states b (distributions over S).
(Diagram: the MDP search tree over s, (s, a), (s, a, s'); the POMDP tree over b, (b, a), o, b')
We'll be able to say more in a few lectures.

Example: Ghostbusters
In (static) Ghostbusters:
- Belief state determined by the evidence to date {e}
- Tree really over evidence sets
- Probabilistic reasoning needed to predict new evidence given past evidence

Solving POMDPs
One way: use truncated expectimax to compute approximate values of actions. What if you only considered busting or one sense followed by a bust? You get a VPI-based agent!
(Tree diagram: from evidence {e}, either bust, with value U(bust, {e}), or sense, observe e', then bust, with value U(bust, {e, e'}))
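
A minimal sketch of the belief update that drives a POMDP agent, with a made-up two-cell toy model (all numbers and names are illustrative, not the Ghostbusters assignment's API): after taking action a and observing o, b'(s') ∝ P(o | s') * Σ_s P(s' | s, a) b(s).

```python
def belief_update(b, a, o, T, Z):
    """b: state -> prob; T[(s, a)]: dict s' -> prob; Z[(s, o)]: P(o | s)."""
    new_b = {}
    successors = sorted({s2 for dist in T.values() for s2 in dist})
    for s2 in successors:
        predicted = sum(T[(s, a)].get(s2, 0.0) * p for s, p in b.items())
        new_b[s2] = Z[(s2, o)] * predicted    # weight prediction by obs model
    total = sum(new_b.values())               # normalize to a distribution
    return {s: p / total for s, p in new_b.items()}

# Toy model: a ghost in one of two cells, a noisy "beep" sensor.
T = {("left", "wait"):  {"left": 0.9, "right": 0.1},
     ("right", "wait"): {"left": 0.1, "right": 0.9}}
Z = {("left", "beep"): 0.8, ("right", "beep"): 0.2}

print(belief_update({"left": 0.5, "right": 0.5}, "wait", "beep", T, Z))
# ≈ {'left': 0.8, 'right': 0.2}
```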

More Generally
General solutions map belief functions to actions:
- Can divide regions of belief space (the set of belief functions) into policy regions (gets complex quickly)
- Can build approximate policies using discretization methods
- Can factor belief functions in various ways
Overall, POMDPs are very (actually PSPACE-) hard. Most real problems are POMDPs, but we can rarely solve them in general!