Artificial Intelligence Markov Decision Problems


Tyler Simpson
6 years ago

Readings: Nilsson - briefly mentioned; Russell and Norvig - Chapter 17

Example: probabilistic blocksworld

Each action has probabilistic outcomes, and each outcome takes some time in minutes (= cost):
- move: success with probability 0.1, failure with probability 0.9
- paint: success with probability 1.0

How to determine the average plan-execution time of a given plan

$t_i$ = average plan-execution time until the goal state is reached if the agent starts in state $i$ and follows the plan

There is one linear equation per state. If the plan executes in state $i$ an action with cost $c$ that succeeds with probability $p$ (leading to state $j$) and otherwise leaves the state unchanged, then

$t_i = c + p \, t_j + (1 - p) \, t_i$

For a move action this reads $t_i = c_{move} + 0.1 \, t_j + 0.9 \, t_i$, for a paint action $t_i = c_{paint} + 1.0 \, t_j$, and for the goal state $t = 0$. Solving the system yields all $t_i$; the value for the start state is the average plan-execution time of the plan.

[Figure: four alternative plans for the blocksworld example, built from move and paint actions, each annotated with its average plan-execution time in minutes.]
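The linear equations above can be solved by back-substitution when the plan is a simple sequence of steps. The following is a minimal sketch under stated assumptions: a failed step is retried and leaves the state unchanged, the success probability of move (0.1) follows the example, and the execution times (1 minute for move, 3 minutes for paint) are hypothetical values chosen for illustration. The names `average_plan_time`, `MOVE`, and `PAINT` are not from the slides.

```python
def average_plan_time(plan):
    """Expected execution time of a linear plan.

    plan: list of (cost_minutes, success_probability) steps in execution order.
    Assumption: a failed step costs the same time, leaves the state unchanged,
    and is simply retried.
    """
    t_next = 0.0  # the goal state needs no further time
    for cost, p_success in reversed(plan):
        # t_i = cost + p * t_next + (1 - p) * t_i  =>  t_i = cost / p + t_next
        t_next = cost / p_success + t_next
    return t_next


# Hypothetical action models: the times are illustrative assumptions.
MOVE = (1.0, 0.1)   # 1 minute per attempt, succeeds with probability 0.1
PAINT = (3.0, 1.0)  # 3 minutes, always succeeds

if __name__ == "__main__":
    print(average_plan_time([MOVE]))         # one move
    print(average_plan_time([PAINT, MOVE]))  # paint, then move
```

Under these assumptions a single move needs ten attempts on average, so a plan with fewer moves can be faster even if it adds a paint step.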

How to (almost) determine a plan with minimal average plan-execution time

[Figure: decision tree over block configurations, alternating choice nodes (move, paint) and chance nodes.]

- The decision tree is infinite!
- The values associated with chance nodes that belong to the same configuration should be the same
  ==> the actions associated with the corresponding choice nodes should be the same
  ==> whenever the configuration of the blocks is the same, one wants to execute the same action (= policy)

Deterministic planning and search (actions are deterministic)
- actions have deterministic effects
- state and action uniquely determine the successor state
- states are completely observable
- a plan is a sequence of actions (= path)
- minimize total cost
- the optimal plan is acyclic

Probabilistic planning and search = Markov Decision Problems (MDPs) (actions are probabilistic, for example the robot can drift)
- Markov property
- actions have probabilistic effects
- state and action uniquely determine a probability distribution over successor states
- states are completely observable
- a plan is a mapping from states to actions (= policy)
- minimize expected total cost
- the optimal plan can be cyclic

How to find a plan
- Deterministic planning and search: a plan is a sequence of actions (= path) and can be found using (forward or backward) search.
- Markov Decision Problems: a plan is a mapping from states to actions (= policy). To find it, determine the expected goal distances of all states, then greedily assign to each state the action that decreases the expected goal distance the most.

Deterministic planning and search: notation

- $s$ = state
- $a$ = action
- $A(s)$ = set of actions that can be executed in state $s$
- $succ(s,a)$ = the state that results from the execution of action $a$ in state $s$
- $c(s,a)$ = the cost that results from the execution of action $a$ in state $s$
- $gd(s)$ = goal distance of state $s$

$gd(s) = 0$ if $s$ is a goal state
$gd(s) = \min_{a \in A(s)} \left( c(s,a) + gd(succ(s,a)) \right)$ if $s$ is not a goal state

$\pi(s)$ = the optimal action to execute in state $s$
$\pi(s) = \arg\min_{a \in A(s)} \left( c(s,a) + gd(succ(s,a)) \right)$ if $s$ is not a goal state

Markov Decision Problems: notation

- $s$ = state
- $a$ = action
- $A(s)$ = set of actions that can be executed in state $s$
- $succ(s,a)$ = the set of states that can result from the execution of action $a$ in state $s$
- $c(s,a)$ = the cost that results from the execution of action $a$ in state $s$
- $p(s' \mid s,a)$ = the probability that state $s'$ results from the execution of action $a$ in state $s$
- $gd(s)$ = expected goal distance of state $s$

Bellman equation:

$gd(s) = 0$ if $s$ is a goal state
$gd(s) = \min_{a \in A(s)} \left( c(s,a) + \sum_{s' \in succ(s,a)} p(s' \mid s,a) \, gd(s') \right)$ if $s$ is not a goal state

$\pi(s)$ = the optimal action to execute in state $s$
$\pi(s) = \arg\min_{a \in A(s)} \left( c(s,a) + \sum_{s' \in succ(s,a)} p(s' \mid s,a) \, gd(s') \right)$ if $s$ is not a goal state

[Figure: example graphs for deterministic planning and search and for a Markov Decision Problem whose action outcomes have probability 0.5.]

Given the expected goal distances, we can use the definition to check them, but calculating them is a chicken-and-egg problem.

Deterministic planning and search: iterative computation

- $gd(s)$ = goal distance of state $s$ (= minimal cost until a goal state is reached if execution starts in state $s$)
- $gd_i(s)$ = minimal cost until a goal state is reached or $i$ actions have been executed if execution starts in state $s$
- for $i$ larger than some constant: $gd(s) = gd_i(s)$ (= once $gd_i(s) = gd_{i-1}(s)$ for all states $s$)

$gd_0(s) = 0$
$gd_i(s) = 0$ if $s$ is a goal state
$gd_i(s) = \min_{a \in A(s)} \left( c(s,a) + gd_{i-1}(succ(s,a)) \right)$ if $s$ is not a goal state
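The remark that given expected goal distances can be checked against the definition can be made concrete. The sketch below assumes a hypothetical three-state MDP (`TOY_MDP`, not from the slides) stored as a dictionary mapping each non-goal state to its actions, where each action is a pair of a cost and a list of `(probability, successor)` outcomes. The function names are illustrative.

```python
def bellman_backup(mdp, goals, gd, s):
    """Right-hand side of the Bellman equation for state s."""
    if s in goals:
        return 0.0
    return min(
        cost + sum(p * gd[s2] for p, s2 in outcomes)
        for cost, outcomes in mdp[s].values()
    )


def satisfies_bellman(mdp, goals, gd, tol=1e-9):
    """True if every value in gd equals its own Bellman backup."""
    return all(abs(gd[s] - bellman_backup(mdp, goals, gd, s)) <= tol for s in gd)


# Hypothetical MDP: state -> action -> (cost, [(probability, successor), ...])
TOY_MDP = {
    "s0": {
        "safe": (2.0, [(1.0, "s1")]),
        "risky": (1.0, [(0.5, "goal"), (0.5, "s0")]),
    },
    "s1": {"go": (1.0, [(1.0, "goal")])},
}

if __name__ == "__main__":
    print(satisfies_bellman(TOY_MDP, {"goal"}, {"s0": 2.0, "s1": 1.0, "goal": 0.0}))
    print(satisfies_bellman(TOY_MDP, {"goal"}, {"s0": 3.0, "s1": 1.0, "goal": 0.0}))
```

Checking is a single pass over the states. Finding the values is harder because the value of `s0` appears on both sides of its own equation.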

Markov Decision Problems: iterative computation

- $s$ = state
- $a$ = action
- $A(s)$ = set of actions that can be executed in state $s$
- $succ(s,a)$ = the set of states that can result from the execution of action $a$ in state $s$
- $c(s,a)$ = the cost that results from the execution of action $a$ in state $s$
- $p(s' \mid s,a)$ = the probability that state $s'$ results from the execution of action $a$ in state $s$
- $gd(s)$ = expected goal distance of state $s$
- $gd_i(s)$ = minimal expected cost until a goal state is reached or $i$ actions have been executed if execution starts in state $s$

$gd(s) = \lim_{i \to \infty} gd_i(s)$ (not necessarily reached after a finite amount of time)

$gd_0(s) = 0$
$gd_i(s) = 0$ if $s$ is a goal state
$gd_i(s) = \min_{a \in A(s)} \left( c(s,a) + \sum_{s' \in succ(s,a)} p(s' \mid s,a) \, gd_{i-1}(s') \right)$ if $s$ is not a goal state

Value Iteration

Value Iteration maintains approximations of the goal distances (= values).

1. $i := 0$.
2. Set (for all $s \in S$) $gd_i(s) = 0$.
3. $i := i + 1$.
4. Set (for all $s \in S$) $gd_i(s) = 0$ if $s$ is a goal state, and $gd_i(s) = \min_{a \in A(s)} \left( c(s,a) + \sum_{s' \in succ(s,a)} p(s' \mid s,a) \, gd_{i-1}(s') \right)$ if $s$ is not a goal state.
5. If (for some $s \in S$) $gd_i(s) - gd_{i-1}(s) >$ small constant, go to 3.
6. Set (for all $s \in S$ that are not goal states) $\pi(s) = \arg\min_{a \in A(s)} \left( c(s,a) + \sum_{s' \in succ(s,a)} p(s' \mid s,a) \, gd_i(s') \right)$.

[Figure: example of Value Iteration without discounting on a small MDP with outcome probabilities of 0.5, tabulating $gd_i(s)$ for each state over the iterations $i$, then comparing the expected costs of the actions available in the start state to decide which one to execute.]
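The six steps of Value Iteration can be sketched as follows. This is an illustrative implementation, not the lecture's code: the MDP encoding, the hypothetical `TOY_MDP`, and the default threshold `epsilon` are assumptions, and the stopping test uses the absolute difference between successive approximations.

```python
def value_iteration(mdp, goals, epsilon=1e-9):
    """Return (gd, policy) for a goal-directed MDP without discounting.

    mdp: state -> action -> (cost, [(probability, successor), ...])
    goals: set of goal states (their goal distance is 0)
    """
    def q(s, a, values):
        cost, outcomes = mdp[s][a]
        return cost + sum(p * values[s2] for p, s2 in outcomes)

    gd = {s: 0.0 for s in set(mdp) | set(goals)}  # steps 1-2: gd_0(s) = 0
    while True:
        # steps 3-4: compute gd_i from gd_{i-1}
        new_gd = {
            s: 0.0 if s in goals else min(q(s, a, gd) for a in mdp[s])
            for s in gd
        }
        # step 5: stop once no value changed by more than epsilon
        converged = all(abs(new_gd[s] - gd[s]) <= epsilon for s in gd)
        gd = new_gd
        if converged:
            break
    # step 6: greedy policy with respect to the final approximation
    policy = {
        s: min(mdp[s], key=lambda a: q(s, a, gd)) for s in gd if s not in goals
    }
    return gd, policy


# Hypothetical MDP used only for illustration.
TOY_MDP = {
    "s0": {
        "safe": (2.0, [(1.0, "s1")]),
        "risky": (1.0, [(0.5, "goal"), (0.5, "s0")]),
    },
    "s1": {"go": (1.0, [(1.0, "goal")])},
}

if __name__ == "__main__":
    gd, policy = value_iteration(TOY_MDP, {"goal"})
    print(gd, policy)
```

In this toy problem the cyclic action `risky` is optimal in `s0`, which illustrates why an optimal MDP plan can be cyclic and why the values are only reached in the limit.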

Policy Iteration

Policy Iteration maintains a policy.

1. $i := 0$.
2. Set (for all $s \in S$ that are not goal states) $\pi_i(s)$ to an arbitrary action in $A(s)$.
3. Set $gd_i(s)$ to the average plan-execution time until a goal state is reached if the agent starts in state $s$ and follows policy $\pi_i$.
4. $i := i + 1$.
5. Set (for all $s \in S$ that are not goal states) $\pi_i(s) = \arg\min_{a \in A(s)} \left( c(s,a) + \sum_{s' \in succ(s,a)} p(s' \mid s,a) \, gd_{i-1}(s') \right)$.
6. If (for some $s \in S$ that is not a goal state) $\pi_i(s)$ does not equal $\pi_{i-1}(s)$, go to 3.
7. Set (for all $s \in S$ that are not goal states) $\pi(s) = \pi_i(s)$.

Note: The initial policy $\pi_0$ has to guarantee that the agent reaches a goal state with probability one no matter which state it is started in.

[Figure: example of Policy Iteration without discounting on a small MDP with outcome probabilities of 0.5. Three successive policies are shown; for each one the values $gd_i$ are obtained from the linear equations for that policy, and the final policy determines which action to execute in the start state.]

Extension: no goal

What if there is no goal (living in the world)?
- One can no longer minimize the expected cost until the goal is reached.
- One cannot minimize the expected total cost either, because the expected total cost of every policy is infinite.

Here:
- can minimize the expected cost per action execution
- can minimize the expected total discounted cost
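The steps of Policy Iteration can be sketched as follows. Two choices in this sketch are assumptions rather than part of the slides: step 3 is carried out by repeated sweeps instead of solving the linear equations exactly, and a policy entry is changed only when another action is strictly better, which avoids switching back and forth between equally good actions. The MDP encoding and `TOY_MDP` are hypothetical.

```python
def q_value(mdp, gd, s, a):
    """Expected cost of executing a in s and then incurring gd afterwards."""
    cost, outcomes = mdp[s][a]
    return cost + sum(p * gd[s2] for p, s2 in outcomes)


def evaluate_policy(mdp, goals, policy, tol=1e-12):
    """Step 3: expected cost-to-goal under a fixed policy (iterative sweeps).

    Terminates only if the policy reaches a goal state with probability one.
    """
    gd = {s: 0.0 for s in set(mdp) | set(goals)}
    while True:
        delta = 0.0
        for s, a in policy.items():
            new = q_value(mdp, gd, s, a)
            delta = max(delta, abs(new - gd[s]))
            gd[s] = new
        if delta <= tol:
            return gd


def policy_iteration(mdp, goals, initial_policy):
    policy = dict(initial_policy)
    while True:
        gd = evaluate_policy(mdp, goals, policy)
        changed = False
        for s in policy:  # step 5: greedy improvement
            best = min(mdp[s], key=lambda a: q_value(mdp, gd, s, a))
            if q_value(mdp, gd, s, best) < q_value(mdp, gd, s, policy[s]) - 1e-9:
                policy[s] = best
                changed = True
        if not changed:  # step 6: stop when the policy is stable
            return policy, gd


# Hypothetical MDP used only for illustration.
TOY_MDP = {
    "s0": {
        "safe": (2.0, [(1.0, "s1")]),
        "risky": (1.0, [(0.5, "goal"), (0.5, "s0")]),
    },
    "s1": {"go": (1.0, [(1.0, "goal")])},
}

if __name__ == "__main__":
    print(policy_iteration(TOY_MDP, {"goal"}, {"s0": "safe", "s1": "go"}))
```

The initial policy `{"s0": "safe", "s1": "go"}` reaches the goal with probability one from every state, as the note requires.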

Extension: no goal - total discounted cost

$\gamma$ = discount factor

One can minimize the expected total discounted cost. Assume $\gamma = 0.9$.

[Figure: two example policies annotated with their expected total discounted costs.]

Interpretation as interest: if the interest rate is $(1-\gamma)/\gamma$ (for $0 < \gamma < 1$), how much money do I need to pay someone right now so that there is no difference to paying a given sequence of yearly installments?
- $x$ dollars right now are worth $(1 + (1-\gamma)/\gamma) \, x = x/\gamma$ dollars in a year.
- So, $y$ dollars in a year are worth $\gamma y$ dollars right now.
- Answer: the sum of the installments, where the installment due in $k$ years is weighted by $\gamma^k$.

Properties of discounting:
- Discounting makes the total cost finite: if every action execution costs $c$, the expected total discounted cost is $c + \gamma c + \gamma^2 c + \ldots = c/(1-\gamma)$.
- Discounting smooths out the horizon.
- Discounting can be interpreted as the probability of dying.

Two interpretations of the discount factor:
- Discounting: if the interest rate is $(1-\gamma)/\gamma$, then $y$ dollars in a year are worth $\gamma y$ dollars right now.
- Dying: if I die later this year with probability $1-\gamma$, then the expected value of $y$ dollars in a year is $\gamma y$ right now.

Markov Decision Problems with discounting: notation

- $\gamma$ = discount factor ($0 < \gamma < 1$); if there is a goal, one can set $\gamma = 1$ (no discounting)
- $s$ = state
- $a$ = action
- $A(s)$ = set of actions that can be executed in state $s$
- $succ(s,a)$ = the set of states that can result from the execution of action $a$ in state $s$
- $c(s,a)$ = the cost that results from the execution of action $a$ in state $s$
- $p(s' \mid s,a)$ = the probability that state $s'$ results from the execution of action $a$ in state $s$
- $gd(s)$ = minimal expected discounted total cost if execution starts in state $s$
- $gd_i(s)$ = minimal expected discounted total cost until a goal state is reached or $i$ actions have been executed if execution starts in state $s$

$gd(s) = 0$ if $s$ is a goal state
$gd(s) = \min_{a \in A(s)} \left( c(s,a) + \gamma \sum_{s' \in succ(s,a)} p(s' \mid s,a) \, gd(s') \right)$ if $s$ is not a goal state

$\pi(s)$ = the optimal action to execute in state $s$
$\pi(s) = \arg\min_{a \in A(s)} \left( c(s,a) + \gamma \sum_{s' \in succ(s,a)} p(s' \mid s,a) \, gd(s') \right)$ if $s$ is not a goal state

$gd_0(s) = 0$ for all $s$
$gd_i(s) = 0$ if $s$ is a goal state
$gd_i(s) = \min_{a \in A(s)} \left( c(s,a) + \gamma \sum_{s' \in succ(s,a)} p(s' \mid s,a) \, gd_{i-1}(s') \right)$ if $s$ is not a goal state
$gd(s) = \lim_{i \to \infty} gd_i(s)$

Value Iteration with or without discounting:
- $gd(s)$ does not necessarily converge after a finite amount of time.
- $\pi(s)$ converges after a finite amount of time if $gd(s)$ is approximated with $gd_i(s)$ for all $s$.
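The arithmetic behind discounting can be checked numerically. The sketch below is illustrative only; the function names and the choice of a constant cost of 1 per action are assumptions. It compares a long truncated discounted sum with the closed form $c/(1-\gamma)$ and confirms that the interest rate $(1-\gamma)/\gamma$ makes $y$ dollars in a year worth $\gamma y$ dollars now.

```python
def discounted_total(costs, gamma):
    """Sum of costs where the cost incurred at step k is weighted by gamma**k."""
    return sum(c * gamma ** k for k, c in enumerate(costs))


def interest_rate(gamma):
    """Interest rate that corresponds to the discount factor gamma."""
    return (1.0 - gamma) / gamma


if __name__ == "__main__":
    gamma = 0.9
    c = 1.0  # assumed constant cost per action execution
    print(discounted_total([c] * 2000, gamma))  # long truncated sum
    print(c / (1.0 - gamma))                    # closed form c / (1 - gamma)
    # y dollars next year discounted by one year of interest equals gamma * y
    y = 100.0
    print(y / (1.0 + interest_rate(gamma)), gamma * y)
```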

[Figure: example of Value Iteration with discount factor 0.9 on the same small MDP with outcome probabilities of 0.5, tabulating $gd_i(s)$ over the iterations $i$ and comparing the actions available in the start state to decide which one to execute.]

(In general, the optimal action depends on the discount factor!)

Learning for optimization: reinforcement learning with Markov Decision Process models

[Figure: exam example.]

Goal: find a policy (behavior) that maximizes the expected total discounted reward, even in the presence of delayed rewards. If you do not know the action outcomes (rewards and probabilities), this is reinforcement learning.

Approach: estimate the probabilities and rewards, then use value iteration.
- The probabilities $p(s' \mid s,a)$ are estimated from how many times each outcome was observed.
- This raises the exploration/exploitation tradeoff.
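Estimating the probabilities and rewards from experience can be sketched with simple frequency counts. This is one straightforward way to do it, assuming experience arrives as `(state, action, reward, successor)` tuples; the function name and the sample data are hypothetical.

```python
from collections import defaultdict


def estimate_model(experience):
    """Estimate p(s' | s, a) and the mean reward of (s, a) from observed samples.

    experience: iterable of (state, action, reward, successor) tuples.
    Returns (probs, rewards) keyed by (state, action).
    """
    counts = defaultdict(lambda: defaultdict(int))
    reward_sums = defaultdict(float)
    for s, a, r, s2 in experience:
        counts[(s, a)][s2] += 1
        reward_sums[(s, a)] += r

    probs, rewards = {}, {}
    for sa, successors in counts.items():
        n = sum(successors.values())
        # relative frequency of each observed successor state
        probs[sa] = {s2: k / n for s2, k in successors.items()}
        rewards[sa] = reward_sums[sa] / n
    return probs, rewards


if __name__ == "__main__":
    # Hypothetical observations: the same action tried four times in state "s0".
    samples = [
        ("s0", "risky", -1.0, "goal"),
        ("s0", "risky", -1.0, "s0"),
        ("s0", "risky", -1.0, "s0"),
        ("s0", "risky", -1.0, "s0"),
    ]
    print(estimate_model(samples))
```

State-action pairs that were never tried have no estimate at all, which is where the exploration/exploitation tradeoff enters: the agent has to try actions that currently look worse in order to learn their outcomes.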

Learning for optimization: reinforcement learning with Markov Decision Process models

Approach: use Q-learning.

If you execute action $a$ in state $s$, receive cost $c$, and make a transition to state $s'$, then update

$Q(s,a) = Q(s,a) + \alpha \left( c + \gamma V(s') - Q(s,a) \right)$

- $\alpha$ = learning rate
- $\gamma$ = discount factor ($0 < \gamma < 1$)
- $V(s') = \min_{a' \in A(s')} Q(s',a')$
- $Q(s,a)$ = minimal expected discounted total cost until a goal state is reached if execution starts in state $s$ and the first action executed is $a$
- $V(s)$ = minimal expected discounted total cost until a goal state is reached if execution starts in state $s$ (= $gd(s)$)

Q-learning

1. Initialize $Q(s,a) = 0$ for all states $s$ and actions $a$.
2. $s$ := the current state.
3. If $s$ is a goal state, then stop.
4. Choose an action $a$ to execute in the current state $s$. (The action believed to be best is $a := \arg\min_{a \in A(s)} Q(s,a)$.)
5. Execute action $a$. Observe the cost $c$ and the successor state $s'$.
6. Update $Q(s,a) = Q(s,a) + \alpha \left( c + \gamma V(s') - Q(s,a) \right)$.
7. Go to 2.

[Figure: worked Q-value update for an action with a given cost and two outcomes of probability 0.5 each, showing the Q-values before and after the update.]
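The Q-learning steps above can be sketched as a tabular learner that minimizes cost. Several details here are assumptions made for illustration: action selection is epsilon-greedy, learning is organized into repeated episodes from a fixed start state, $V$ of a goal state is taken to be 0, and the environment `toy_step` is a hypothetical simulator whose outcomes the learner does not know in advance.

```python
import random


def q_learning(actions, goals, step, start, episodes=5000,
               alpha=0.1, gamma=0.9, epsilon=0.2, seed=0, max_steps=1000):
    """Tabular Q-learning for cost minimization.

    actions: state -> list of executable actions
    step(s, a, rng) -> (cost, successor): the environment, unknown to the learner
    """
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in actions for a in actions[s]}  # step 1

    def V(s):
        return 0.0 if s in goals else min(Q[(s, a)] for a in actions[s])

    for _ in range(episodes):
        s = start                                  # step 2
        for _ in range(max_steps):
            if s in goals:                         # step 3
                break
            if rng.random() < epsilon:             # step 4: explore ...
                a = rng.choice(actions[s])
            else:                                  # ... or exploit
                a = min(actions[s], key=lambda b: Q[(s, b)])
            c, s2 = step(s, a, rng)                # step 5
            Q[(s, a)] += alpha * (c + gamma * V(s2) - Q[(s, a)])  # step 6
            s = s2                                 # step 7
    return Q


# Hypothetical environment used only for illustration.
ACTIONS = {"s0": ["safe", "risky"], "s1": ["go"]}


def toy_step(s, a, rng):
    if s == "s0" and a == "safe":
        return 2.0, "s1"
    if s == "s0" and a == "risky":
        return 1.0, ("goal" if rng.random() < 0.5 else "s0")
    return 1.0, "goal"  # "go" in s1 always reaches the goal


if __name__ == "__main__":
    Q = q_learning(ACTIONS, {"goal"}, toy_step, "s0")
    for key in sorted(Q):
        print(key, round(Q[key], 2))
```

With a constant learning rate the Q-values of stochastic actions keep fluctuating around their expected values, so the printed numbers depend on the random seed.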
