CS 188 Introduction to Artificial Intelligence Fall 2018 Note 7

These lecture notes are heavily based on notes originally written by Nikhil Sharma.

Decision Networks

In the third note, we learned about game trees and algorithms such as minimax and expectimax, which we used to determine optimal actions that maximized our expected utility. Then in the sixth note, we discussed Bayes nets and how we can use evidence we know to run probabilistic inference to make predictions. Now we'll discuss a combination of both Bayes nets and expectimax known as a decision network, which we can use to model the effect of various actions on utilities based on an overarching graphical probabilistic model. Let's dive right in with the anatomy of a decision network:

- Chance nodes - Chance nodes in a decision network behave identically to those in Bayes nets. Each outcome in a chance node has an associated probability, which can be determined by running inference on the underlying Bayes net it belongs to. We'll represent these with ovals.
- Action nodes - Action nodes are nodes that we have complete control over; they represent a choice between any of a number of actions which we have the power to choose from. We'll represent action nodes with rectangles.
- Utility nodes - Utility nodes are children of some combination of action and chance nodes. They output a utility based on the values taken on by their parents, and are represented as diamonds in our decision networks.

Consider a situation where you are deciding whether or not to take an umbrella when you are leaving for class in the morning, and you know there's a forecasted 30% chance of rain. Should you take the umbrella? If there were an 80% chance of rain, would your answer change? This situation is ideal for modeling with a decision network, and we do it as follows:
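The three node types described above can be sketched as plain data structures — an illustrative encoding of the anatomy, not code from the course (the class and field names are made up for this sketch):

```python
# Illustrative sketch of the three decision-network node types.
# These class and field names are hypothetical, not from the notes.
from dataclasses import dataclass

@dataclass
class ChanceNode:
    """Drawn as an oval; its distribution comes from Bayes net inference."""
    name: str
    outcomes: dict  # outcome -> probability

@dataclass
class ActionNode:
    """Drawn as a rectangle; we choose its value freely."""
    name: str
    actions: list

@dataclass
class UtilityNode:
    """Drawn as a diamond; maps its parents' values to a utility."""
    name: str
    table: dict  # (action, outcome) -> utility
```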
As we've done throughout this course with the various modeling techniques and algorithms we've discussed, our goal with decision networks is again to select the action which yields the maximum expected utility (MEU). This can be done with a fairly straightforward and intuitive procedure:

1. Start by instantiating all evidence that's known, and run inference to calculate the posterior probabilities of all chance node parents of the utility node into which the action node feeds.
2. Go through each possible action and compute the expected utility of taking that action given the posterior probabilities computed in the previous step. The expected utility of taking action a given evidence e and n chance nodes is computed with the following formula:

   EU(a | e) = Σ_{x_1, ..., x_n} P(x_1, ..., x_n | e) U(a, x_1, ..., x_n)

   where each x_i represents a value that the i-th chance node can take on. We simply take a weighted sum over the utilities of each outcome under our given action, with weights corresponding to the probabilities of each outcome.
3. Finally, select the action which yielded the highest expected utility to get the MEU.

Let's see how this actually looks by calculating the optimal action (should we leave or take our umbrella) for our weather example, using both the conditional probability table for the weather given a bad weather forecast (forecast is our evidence variable) and the utility table given our action and the weather:
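The procedure above can be sketched in a few lines — a minimal illustration assuming the posterior over the chance-node parents of the utility node has already been computed by inference on the underlying Bayes net (the function and variable names here are ours, not from the notes):

```python
# A minimal sketch of MEU action selection. Assumes `posterior` holds the
# already-inferred probabilities P(x | e) for each outcome x, and `utility`
# maps (action, outcome) pairs to utilities. Names are illustrative.

def expected_utility(action, posterior, utility):
    """EU(a | e): sum over outcomes x of P(x | e) * U(a, x)."""
    return sum(p * utility[(action, x)] for x, p in posterior.items())

def meu(actions, posterior, utility):
    """Return (argmax action, MEU) by maximizing expected utility."""
    best = max(actions, key=lambda a: expected_utility(a, posterior, utility))
    return best, expected_utility(best, posterior, utility)
```

Given a posterior and a utility table, `meu` performs exactly steps 2 and 3: score each action by its expected utility, then take the argmax.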
Note that we have omitted the inference computation for the posterior probabilities P(W | F = bad), but we could compute these using any of the inference algorithms we discussed for Bayes nets. Instead, here we simply assume the above table of posterior probabilities P(W | F = bad) as given. Going through both our actions and computing expected utilities yields:

EU(leave | bad) = Σ_w P(w | bad) U(leave, w)
                = 0.34 · 100 + 0.66 · 0 = 34

EU(take | bad) = Σ_w P(w | bad) U(take, w)
               = 0.34 · 20 + 0.66 · 70 = 53

All that's left to do is take the maximum over these computed expected utilities to determine the MEU:

MEU(F = bad) = max_a EU(a | bad) = 53

The action that yields the maximum expected utility is take, and so this is the action recommended to us by the decision network. More formally, the action that yields the MEU can be determined by taking the argmax over expected utilities.

Outcome Trees

We mentioned at the start of this note that decision networks involve some expectimax-esque elements, so let's discuss what exactly that means. We can unravel the selection of the action that maximizes expected utility in a decision network as an outcome tree. Our weather forecast example from above unravels into the following outcome tree:
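The worked computation above can be checked numerically — a quick sketch using the posterior P(W | F = bad) and utility values from the example (sun/rain outcomes, leave/take actions):

```python
# Numeric check of the umbrella example, using the posterior P(W | F = bad)
# and the utility table from the worked computation above.
p_w_given_bad = {"sun": 0.34, "rain": 0.66}
U = {("leave", "sun"): 100, ("leave", "rain"): 0,
     ("take", "sun"): 20, ("take", "rain"): 70}

# EU(a | bad) for each action: a posterior-weighted sum of utilities.
eu = {a: sum(p * U[(a, w)] for w, p in p_w_given_bad.items())
      for a in ("leave", "take")}

best_action = max(eu, key=eu.get)  # argmax over expected utilities
meu_bad = eu[best_action]          # MEU(F = bad)
```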
The root node at the top is a maximizer node, just like in expectimax, and is controlled by us. We select an action, which takes us to the next level in the tree, controlled by chance nodes. At this level, chance nodes resolve to different utility nodes at the final level, with probabilities corresponding to the posterior probabilities derived from probabilistic inference run on the underlying Bayes net. What exactly makes this different from vanilla expectimax? The only real difference is that for outcome trees we annotate our nodes with what we know at any given moment (inside the curly braces).

The Value of Perfect Information

In everything we've covered up to this point, we've generally assumed that our agent has all the information it needs for a particular problem and/or has no way to acquire new information. In practice, this is hardly the case, and one of the most important parts of decision making is knowing whether or not it's worth gathering more evidence to help decide which action to take. Observing new evidence almost always has some cost, whether it be in terms of time, money, or some other medium. In this section, we'll talk about a very important concept - the value of perfect information (VPI) - which mathematically quantifies the amount an agent's maximum expected utility is expected to increase if it observes some new evidence. We can compare the VPI of learning some new information with the cost associated with observing that information to make decisions about whether or not it's worthwhile to observe.

General Formula

Rather than simply presenting the formula for computing the value of perfect information for new evidence, let's walk through an intuitive derivation. We know from our above definition that the value of perfect information is the amount our maximum expected utility is expected to increase if we decide to observe new evidence.
We know our current maximum expected utility given our current evidence e:

MEU(e) = max_a Σ_s P(s | e) U(s, a)

Additionally, we know that if we observed some new evidence e' before acting, the maximum expected utility of our action at that point would become

MEU(e, e') = max_a Σ_s P(s | e, e') U(s, a)
However, note that we don't know what new evidence we'll get. For example, if we didn't know the weather forecast beforehand and chose to observe it, the forecast we observe might be either good or bad. Because we don't know what new evidence e' we'll get, we must represent it as a random variable E'. How do we represent the new MEU we'll get if we choose to observe a new variable, when we don't know what the evidence gained from observation will tell us? The answer is to compute the expected value of the maximum expected utility, which, while being a mouthful, is the natural way to go:

MEU(e, E') = Σ_{e'} P(e' | e) MEU(e, e')

Observing a new evidence variable yields a different MEU for each observed value, with probabilities corresponding to the probabilities of observing each value, and so by computing MEU(e, E') as above, we compute what we expect our new MEU will be if we choose to observe new evidence. We're just about done now - returning to our definition for VPI, we want to find the amount our MEU is expected to increase if we choose to observe new evidence. We know our current MEU and the expected value of the new MEU if we choose to observe, so the expected MEU increase is simply the difference of these two terms! Indeed,

VPI(E' | e) = MEU(e, E') − MEU(e)

where we can read VPI(E' | e) as "the value of observing new evidence E' given our current evidence e". Let's work our way through an example by revisiting our weather scenario one last time. If we don't observe any evidence, then our maximum expected utility can be computed as follows:

MEU(∅) = max_a EU(a) = max_a Σ_w P(w) U(a, w)
       = max{0.7 · 100 + 0.3 · 0, 0.7 · 20 + 0.3 · 70}
       = max{70, 35}
       = 70

Note that the convention when we have no evidence is to write MEU(∅), denoting that our evidence is the empty set. Now let's say that we are deciding whether or not to observe the weather forecast. We've already computed that MEU(F = bad) = 53, and let's assume that running an identical computation for F = good
yields MEU(F = good) = 95. We are now ready to compute MEU(e, E'):

MEU(e, E') = MEU(F) = Σ_{e'} P(e' | e) MEU(e, e')
           = Σ_f P(F = f) MEU(F = f)
           = P(F = good) MEU(F = good) + P(F = bad) MEU(F = bad)
           = 0.59 · 95 + 0.41 · 53
           = 77.78

Hence we conclude VPI(F) = MEU(F) − MEU(∅) = 77.78 − 70 = 7.78.

Properties of VPI

The value of perfect information has several very important properties, namely:

- Nonnegativity: ∀ E', e  VPI(E' | e) ≥ 0. Observing new information always allows you to make a more informed decision, and so your maximum expected utility can only increase (or stay the same if the information is irrelevant for the decision you must make).
- Nonadditivity: VPI(E_j, E_k | e) ≠ VPI(E_j | e) + VPI(E_k | e) in general. This is probably the trickiest of the three properties to understand intuitively. It's true because generally observing some new evidence E_j might change how much we care about E_k; therefore we can't simply add the VPI of observing E_j to the VPI of observing E_k to get the VPI of observing both of them. Rather, the VPI of observing two new evidence variables is equivalent to observing one, incorporating it into our current evidence, and then observing the other. This is encapsulated by the order-independence property of VPI, described below.
- Order-independence: VPI(E_j, E_k | e) = VPI(E_j | e) + VPI(E_k | e, E_j) = VPI(E_k | e) + VPI(E_j | e, E_k). Observing multiple new evidence variables yields the same gain in maximum expected utility regardless of the order of observation. This should be a fairly straightforward assumption - because we don't actually take any action until after observing any new evidence variables, it doesn't actually matter whether we observe the new evidence variables together or in some arbitrary sequential order.
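The full VPI computation for the weather example can be reproduced in a few lines — a sketch using the quantities derived above (the prior P(W), the forecast distribution P(F) with P(F = good) = 0.59 and P(F = bad) = 0.41, and the two conditional MEUs):

```python
# Reproducing the VPI computation for the weather example, using the
# quantities from the derivation above.
p_w = {"sun": 0.7, "rain": 0.3}
U = {("leave", "sun"): 100, ("leave", "rain"): 0,
     ("take", "sun"): 20, ("take", "rain"): 70}

# MEU(empty set): max over actions of the prior-weighted utility.
meu_empty = max(sum(p * U[(a, w)] for w, p in p_w.items())
                for a in ("leave", "take"))

# Expected MEU after observing the forecast F: weight each conditional
# MEU by the probability of seeing that forecast value.
p_f = {"good": 0.59, "bad": 0.41}
meu_given_f = {"good": 95, "bad": 53}
meu_forecast = sum(p_f[f] * meu_given_f[f] for f in p_f)

# VPI(F) = MEU(F) - MEU(empty set)
vpi_forecast = meu_forecast - meu_empty
```

Note that the final assertion of nonnegativity holds here, as the first VPI property guarantees it must.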