CS 188: Artificial Intelligence
Lecture 19: Decision Diagrams
Pieter Abbeel --- UC Berkeley
Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew Moore

Decision Networks
MEU: choose the action which maximizes the expected utility given the evidence.
Can directly operationalize this with decision networks:
- Bayes nets with nodes for utility and actions
- Lets us calculate the expected utility for each action
New node types:
- Chance nodes (just like BNs)
- Actions (rectangles, cannot have parents, act as observed evidence)
- Utility node (diamond, depends on action and chance nodes)
[Network diagram: Umbrella (action), Weather (chance), Forecast (chance), Utility]
Decision Networks
Action selection:
- Instantiate all evidence
- Set action node(s) each possible way
- Calculate posterior for all parents of utility node, given the evidence
- Calculate expected utility for each action
- Choose maximizing action
[Network diagram: Umbrella, Weather, Forecast]

Example: Decision Networks
Umbrella = leave vs. Umbrella = take. Optimal decision = leave.

W     P(W)
sun   0.7
rain  0.3

A      W     U(A,W)
leave  sun   100
leave  rain  0
take   sun   20
take   rain  70
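The action-selection procedure above can be sketched in a few lines of Python. This is a minimal sketch, not course code; the distribution and utilities are the ones in the slide's tables.

```python
# MEU action selection for the umbrella decision network.
# P(W) and U(A, W) are taken from the slide's tables.

P_W = {"sun": 0.7, "rain": 0.3}                       # P(W)
U = {("leave", "sun"): 100, ("leave", "rain"): 0,
     ("take", "sun"): 20,  ("take", "rain"): 70}      # U(A, W)

def expected_utility(action, p_w):
    """EU(a) = sum over w of P(w) * U(a, w)."""
    return sum(p * U[(action, w)] for w, p in p_w.items())

eu = {a: expected_utility(a, P_W) for a in ("leave", "take")}
best = max(eu, key=eu.get)
# EU(leave) = 0.7*100 = 70, EU(take) = 0.7*20 + 0.3*70 = 35
print(best)  # leave
```

With no evidence observed, leaving the umbrella wins (70 vs. 35), matching the slide's "Optimal decision = leave".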
Decisions as Outcome Trees
[Tree: root {}, branches take / leave, then a Weather {} node each, then sun / rain leaves: U(t,s), U(t,r), U(l,s), U(l,r)]
Almost exactly like expectimax / MDPs. What's changed?

Example: Decision Networks
Umbrella = leave vs. Umbrella = take, given Forecast = bad. Optimal decision = take.

W     P(W | F=bad)
sun   0.34
rain  0.66

A      W     U(A,W)
leave  sun   100
leave  rain  0
take   sun   20
take   rain  70
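Re-running the same expected-utility computation with the posterior P(W | F=bad) from the slide shows how conditioning on evidence flips the decision. A minimal sketch:

```python
# Same umbrella network, but the bad forecast shifts mass toward rain,
# so the expected utilities are computed under P(W | F=bad).

P_W_given_bad = {"sun": 0.34, "rain": 0.66}           # from the slide
U = {("leave", "sun"): 100, ("leave", "rain"): 0,
     ("take", "sun"): 20,  ("take", "rain"): 70}

def expected_utility(action, p_w):
    return sum(p * U[(action, w)] for w, p in p_w.items())

eu = {a: expected_utility(a, P_W_given_bad) for a in ("leave", "take")}
# EU(leave | bad) = 0.34*100 = 34
# EU(take | bad)  = 0.34*20 + 0.66*70 = 53
print(max(eu, key=eu.get))  # take
```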
Decisions as Outcome Trees
[Tree: root {b}, branches take / leave, then a W {b} node each, then sun / rain leaves: U(t,s), U(t,r), U(l,s), U(l,r)]

Value of Information
Idea: compute value of acquiring evidence. Can be done directly from a decision network.
Example: buying oil drilling rights
- Two blocks A and B, exactly one has oil, worth k
- You can drill in one location
- Prior probabilities 0.5 each, mutually exclusive
- Drilling in either A or B has EU = k/2, MEU = k/2
Question: what's the value of information of OilLoc?
- Value of knowing which of A or B has oil
- Value is expected gain in MEU from new info
- Survey may say "oil in a" or "oil in b", prob 0.5 each
- If we know OilLoc, MEU is k (either way)
- Gain in MEU from knowing OilLoc: VPI(OilLoc) = k/2
- Fair price of information: k/2
[Network diagram: DrillLoc (action), OilLoc (chance), Utility]

O  P(O)
a  1/2
b  1/2

D  O  U(D,O)
a  a  k
a  b  0
b  a  0
b  b  k
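The oil-drilling VPI argument can be checked numerically. A minimal sketch; the slide leaves k symbolic, so k = 1000 here is just an illustrative stake:

```python
# VPI of OilLoc in the oil-drilling example. k = 1000 is an arbitrary
# illustrative value; the slide keeps k symbolic.

k = 1000
P_oil = {"a": 0.5, "b": 0.5}                          # P(OilLoc)
U = {("a", "a"): k, ("a", "b"): 0,
     ("b", "a"): 0, ("b", "b"): k}                    # U(DrillLoc, OilLoc)

def meu(p_oil):
    """Max over drill locations of expected utility under p_oil."""
    return max(sum(p * U[(d, o)] for o, p in p_oil.items())
               for d in ("a", "b"))

meu_now = meu(P_oil)                                  # k/2: either block ties
# If OilLoc is revealed (each outcome prob 0.5), we drill the right block:
meu_after = sum(P_oil[o] * meu({o: 1.0}) for o in P_oil)   # = k
vpi = meu_after - meu_now
print(vpi)  # 500.0, i.e. k/2
```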
VPI Example: Weather
- MEU with no evidence
- MEU if forecast is bad
- MEU if forecast is good
- Forecast distribution
[Network diagram: Umbrella, Weather, Forecast]

A      W     U(A,W)
leave  sun   100
leave  rain  0
take   sun   20
take   rain  70

F     P(F)
good  0.59
bad   0.41

Value of Information
Assume we have evidence E = e. Value if we act now:
  MEU({+e}) = max_a sum_s P(s | +e) U(s, a)
Assume we see that E' = e'. Value if we act then:
  MEU({+e, +e'}) = max_a sum_s P(s | +e, +e') U(s, a)
BUT E' is a random variable whose value is unknown, so we don't know what e' will be.
Expected value if E' is revealed and then we act:
  MEU({+e}, E') = P(+e' | +e) MEU({+e, +e'}) + P(-e' | +e) MEU({+e, -e'})
Value of information: how much MEU goes up by revealing E' first then acting, over acting now:
  VPI(E' | +e) = MEU({+e}, E') - MEU({+e})
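Putting the VPI definition to work on the weather example: the slide gives P(W), P(F), and P(W | F=bad) but not P(W | F=good), which follows by total probability (0.7 = 0.59·x + 0.41·0.34, so x = 0.95). A sketch under that derived value:

```python
# VPI(Forecast) for the umbrella network. P(W | F=good) = {sun: 0.95}
# is derived from P(W), P(F), and P(W | F=bad) by total probability;
# it is not stated on the slide.

U = {("leave", "sun"): 100, ("leave", "rain"): 0,
     ("take", "sun"): 20,  ("take", "rain"): 70}
P_W = {"sun": 0.7, "rain": 0.3}
P_F = {"good": 0.59, "bad": 0.41}
P_W_given_F = {"good": {"sun": 0.95, "rain": 0.05},
               "bad":  {"sun": 0.34, "rain": 0.66}}

def meu(p_w):
    """Max over actions of expected utility under p_w."""
    return max(sum(p * U[(a, w)] for w, p in p_w.items())
               for a in ("leave", "take"))

meu_now = meu(P_W)                                    # 70 (leave)
meu_with_f = sum(P_F[f] * meu(P_W_given_F[f]) for f in P_F)
vpi = meu_with_f - meu_now
# MEU(good) = 95, MEU(bad) = 53; 0.59*95 + 0.41*53 - 70 = 7.78
print(round(vpi, 2))
```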
VPI Properties
- Nonnegative
- Nonadditive --- consider, e.g., obtaining E_j twice
- Order-independent

Quick VPI Questions
- The soup of the day is either clam chowder or split pea, but you wouldn't order either one. What's the value of knowing which it is?
- There are two kinds of plastic forks at a picnic. One kind is slightly sturdier. What's the value of knowing which?
- You're playing the lottery. The prize will be $0 or $100. You can play any number between 1 and 100 (chance of winning is 1%). What is the value of knowing the winning number?
POMDPs
MDPs have:
- States S
- Actions A
- Transition function P(s' | s, a) (or T(s, a, s'))
- Rewards R(s, a, s')
POMDPs add:
- Observations O
- Observation function P(o | s) (or O(s, o))
POMDPs are MDPs over belief states b (distributions over S).
We'll be able to say more in a few lectures.

Example: Ghostbusters
In (static) Ghostbusters:
- Belief state determined by evidence to date {e}
- Tree really over evidence sets
- Probabilistic reasoning needed to predict new evidence given past evidence

Solving POMDPs
One way: use truncated expectimax to compute approximate value of actions.
What if you only considered busting or one sense followed by a bust? You get a VPI-based agent!
[Tree: {e} branches bust / sense; bust yields U(bust, {e}); sense yields e', evidence set {e, e'}, then bust yields U(bust, {e, e'})]
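The belief state b that a POMDP agent carries is updated by Bayes' rule after each observation. A minimal sketch, with a made-up two-cell ghost-sensing model (the states, observations, and probabilities are hypothetical, not from the slides):

```python
# Belief update for a (static) POMDP: after observing o,
# b'(s) is proportional to P(o | s) * b(s).

def belief_update(b, o, P_o_given_s):
    """Bayes-update a belief b (dict: state -> prob) on observation o."""
    unnorm = {s: P_o_given_s[s][o] * p for s, p in b.items()}
    z = sum(unnorm.values())
    return {s: p / z for s, p in unnorm.items()}

# Hypothetical model: a ghost is in cell 1 or cell 2; a sensor beeps
# more often when the ghost is in cell 1.
b = {1: 0.5, 2: 0.5}
P_o_given_s = {1: {"beep": 0.9, "quiet": 0.1},
               2: {"beep": 0.2, "quiet": 0.8}}
b2 = belief_update(b, "beep", P_o_given_s)
print(b2)  # cell 1 now much more likely: {1: ~0.818, 2: ~0.182}
```

Each "sense" branch in the truncated-expectimax tree above corresponds to one such update of the evidence set.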
More Generally
General solutions map belief functions to actions.
- Can divide regions of belief space (set of belief functions) into policy regions (gets complex quickly)
- Can build approximate policies using discretization methods
- Can factor belief functions in various ways
Overall, POMDPs are very (actually PSPACE-) hard.
Most real problems are POMDPs, but we can rarely solve them in general!