Improving Anytime Point-Based Value Iteration Using Principled Point Selections

Size: px

Start display at page:

Download "Improving Anytime Point-Based Value Iteration Using Principled Point Selections"

Melina Thomas
5 years ago
Views:

1 In In Proceedngs of the Twenteth Interntonl Jont Conference on Artfcl Intellgence (IJCAI-7) Improvng Anytme Pont-Bsed Vlue Iterton Usng Prncpled Pont Selectons Mchel R. Jmes, Mchel E. Smples, nd Dmtr A. Dolgov AI nd Robotcs Group Techncl Reserch, Toyot Techncl Center USA {mchel.jmes, mchel.smples, Abstrct Plnnng n prtlly-observble dynmcl systems (such s POMDPs nd PSRs) s computtonlly chllengng tsk. Populr pproxmton technques tht hve proved successful re pont-bsed plnnng methods ncludng pontbsed vlue terton (PBVI), whch works by pproxmtng the soluton t fnte set of ponts. These pont-bsed methods typclly re nytme lgorthms, whereby n ntl soluton s obtned usng smll set of ponts, nd the soluton my be ncrementlly mproved by ncludng ddtonl ponts. We ntroduce fmly of nytme PBVI lgorthms tht use the nformton present n the current soluton for dentfyng nd ddng new ponts tht hve the potentl to best mprove the next soluton. We motvte nd present two dfferent methods for choosng ponts nd evlute ther performnce emprclly, demonstrtng tht hgh-qulty solutons cn be obtned wth sgnfcntly fewer ponts thn prevous PBVI pproches. Introducton Pont-bsed plnnng lgorthms Pneu et l., 23; Spn nd Vlsss, 2] for prtlly-observble dynmcl systems hve become populr due to ther reltvely good performnce nd becuse they cn be mplemented s nytme lgorthms. It hs been suggested (e.g., n Pneu et l., 2]), tht nytme pont-bsed plnnng lgorthms could beneft sgnfcntly from the prncpled selecton of ponts to dd. We provde severl methods for usng nformton collected durng vlue terton (specfclly, chrcterstcs of the vlue functon) to choose new ddtonl ponts. We show tht propertes of the vlue functon llow the computton of n upper bound on the potentl mprovement (gn) resultng from the ddton of sngle pont. We rgue tht the ddton of ponts wth mxml gn produces good pproxmtons of the vlue functon, nd ths rgument s emprclly verfed. Further, we defne n optmzton problem tht fnds the pont wth the mxml gn for gven regon. These new pproches re emprclly compred to trdtonl nytme pont-bsed plnnng. The results show tht our lgorthms choose ddtonl ponts tht sgnfcntly mprove the resultng pproxmton of the vlue functon. Such mprovement s potentlly benefcl n two wys. Frst, computton tme cn be reduced becuse there re mny fewer ponts for whch computton must be performed. However, ths beneft s sometmes (but not lwys) offset by the ddtonl tme requred to select new ponts. Another potentl beneft s reduced memory storge for smller sets of ponts. Currently, the soluton of most systems do not requre lrge numbers of ponts, but storge beneft my become more mportnt s the sze of problems ncreses such tht more ponts re requred. 2 Bckground The plnnng methods developed here re for prtllyobservble, dscrete-tme, fnte-spce dynmcl systems n whch ctons re chosen from set A, observtons from set O, nd rewrds from set R. There s exctly one cton, one observton, nd one rewrd per tme-step plnnng lgorthms ttempt to fnd polcy tht mxmzes the longterm dscounted rewrd wth dscount fctor γ, ). These pont-bsed plnnng methods re sutble for t lest two clsses of models tht cn represent such dynmcl systems, the well-known prtlly observble Mrkov decson processes (POMDP), nd the newer predctve stte representton (PSR) Sngh et l., 24]. For the ske of fmlrty, we use POMDPs here, but t should be noted tht the methods presented here (s well s other plnnng methods for such dynmcl systems) cn just s esly be ppled to PSRs (see Jmes et l., 24] for detls). 2. POMDPs POMDPs re models of dynmcl systems bsed on underlyng ltent vrbles clled nomnl sttes s S. At ny tme step, the gent s n one nomnl stte s. Upon tkng n cton A the gent receves rewrd r(s, ) tht s functon of the stte nd cton, trnstons to new stte s ccordng to stochstc trnston model pr(s s, ), nd receves n observton o O ccordng to stochstc observton model pr(o s, ). For compctness, we defne the vector r s r (s) = r(s, ). The notton v(e) for vector v refers the vlue of entry e n v.

2 The gent does not drectly observe the nomnl stte s, nd so, must mntn some sort of memory to fully chrcterze ts current stte. In POMDPs, ths s typclly done usng the belef stte: probblty dstrbuton over nomnl sttes. The belef stte b summrzes the entre hstory by computng the probblty of beng n ech nomnl stte b(s). Ths computton of the new belef stte b o reched fter tkng cton nd gettng observton o from belef stte b s gven n the followng updte: b o = s pr(o s, ) s pr(s s, )b(s). pr(o, b) Ths equton s used s the gent ntercts wth the dynmcl system to mntn the gent s current belef stte. 2.2 Pont-bsed Vlue Iterton Algorthms Pont-bsed plnnng lgorthms, such s PBVI nd Perseus, for prtlly observble dynmcl systems perform plnnng by constructng n pproxmton of the vlue functon. By updtng the vlue functon for one pont, the vlue-functon pproxmton for nerby ponts s lso mproved. The bsc de s to clculte the vlue functon (nd ts grdent) only t smll number of ponts, nd the vlue functon for the other ponts wll be crred long. Ths pont-bsed pproch uses only frcton of the computtonl resources to construct the pproxmte vlue functon, s compred to exct methods. There re two mn sets of vectors used: the set P of ponts, ech of whch s belef stte vector; nd the set S n whch represents the pproxmte vlue functon V n t step n. The vlue functon s pecewse lner convex functon over belef sttes, defned s the upper surfce of the vectors α S n. The vlue for belef stte b s V n (b) = mx α S n b T α. The Bellmn equton s used to defne the updte for the next vlue functon: V n+ (b) = mx b T r + γ ] pr(o, b)v n (b o ) o = mx = mx γ o = mx where b T r + γ o b T r + pr(o, b) mx α S n (b o ) T α ] mx pr(o s, ) α S n s s b T r + γ ] mx {go o } bt go ] pr(s s, )b(s)α (s ) g o(s) = s pr(o s, )pr(s s, )α (s ), () for α S n. These g vectors re defned s such becuse they led to computtonl svngs. The computton of α vectors for V n+ (b) uses the so-clled bckup opertor. bckup(b) = rgmx b T g, b (2) {g b } A where g b = r + γ o rgmx {g o } bt g o. In PBVI, the bckup s ppled to ech pont n P, s follows. functon onesteppbvibckup(p, S old ). S new = 2. For ech b P compute α = bckup(b) nd set S new = S new α 3. return S new For Perseus, n ech terton the current set of α vectors s trnsformed to new set of α vectors by rndomly choosng ponts from the current set P nd pplyng the bckup opertor to them. Ths process s: functon onestepperseusbckup(p, S old ). S new =, P = P, where P lsts the non-mproved ponts 2. Smple b unformly t rndom from P, compute α = bckup(b) 3. If b T α V n (b) then dd α to S new, else dd α = rgmx α S old b T α to S new 4. Compute P = {b P : V n+ (b) ɛ < V n (b)}. If P s empty return S new, otherwse go to step 2 Gven ths, the bsc (non-nytme) lgorthms re, for * PBVI, Perseus]: functon bsc*(). P = pckintlponts(), S = mnmlalph(), n = 2. S n+ = onestep*bckup(p, S n ) 3. ncrement n, goto 2 where pckintlponts() typclly uses rndom wlk wth dstnce-bsed metrcs to choose n ntl set of ponts, nd the functon mnmlalph() returns the α vector wth ll entres equl to mn r R r/( γ). The lgorthm wll stop on ether condton on the vlue functon or on the number of tertons. Ths lgorthm s esly expnded to be n nytme lgorthm by modfyng pckintlponts() to strt wth smll set of ntl ponts nd ncludng step tht tertvely expnds the set of current ponts P. Typclly, hvng current set wth more ponts wll result n better pproxmton of the vlue functon, but wll requre more computton to updte. The nytme versons of the lgorthms re: functon nytme*(). P = pckintlponts(), S = mnmlalph(), n = 2. f(redytoaddponts()) then P = P bestpontstoadd(fndcnddteponts()) 3. S n+ = onestep*bckup(p, S n ) 4. ncrement n, goto 2

3 There re three new functons ntroduced n ths nytme lgorthm: redytoaddponts() whch determnes when the lgorthm dds new ponts to S; fndcnddteponts() whch s the frst step n determnng the ponts to dd; nd best- PontsToAdd() whch tkes the set of cnddte ponts nd determnes set of ponts to dd. In Secton 4, we present vrtons of these functons, ncludng our mn contrbutons, methods for determnng the best ponts to dd bsed on exmnton of the vlue functon. 3 Incrementl PBVI nd Perseus In ths secton we present methods for fndng the best ponts to dd, gven set of cnddte ponts. In some cses, the best ponts wll be subset of the cnddtes, nd n other cses new ponts tht re outsde the set of orgnl cnddtes wll be dentfed s best. In ll cses, we wll mke use of sclr mesure clled the gn of pont, whch s wy to evlute how useful tht pont wll be. We show how the current (pproxmte) vlue functon cn be used to compute useful mesures of gn tht cn then be used n nformed methods of pont selecton. For pont b n the belef spce, the gn g(b) estmtes how much the vlue functon wll mprove. The problem, then, s to construct mesure of gn tht dentfes the ponts tht wll most mprove the pproxmton of the vlue functon. Ths s complex problem, prmrly becuse chngng the vlue functon t one pont cn ffect the vlue of mny (or ll) other ponts, s cn be seen by exmnng the bckup opertor (Equton 2). Chnges my lso propgte over n rbtrry number of tme steps, s the chnges to other ponts propgte to yet other ponts. Therefore, exctly dentfyng the ponts wth best gn s much too computtonlly expensve, nd other mesures of gn must be used. Here, we present two dfferent mesures of gn: one bsed on exmnng the bckup opertor, wrtten g B, nd the other s derved from fndng n upper bound on the mount of gn tht pont my hve. We use lner progrmmng to compute ths gn, wrtten g LP. Gven these defntons of gn, we present two dfferent methods for dentfyng ponts to dd. The smplest method (detled n Secton 4) tkes the cnddte ponts, orders them ccordng to ther gns, nd then selects the top few. A more sophstcted method leverges some nformton present when computng g LP to dentfy ponts tht hve even hgher gn thn the cnddte ponts, nd so returns dfferent ponts thn the orgnl cnddtes. We descrbe how these selecton methods cn be mplemented n Secton 4, followng detled dscusson (Sectons 3. nd 3.2) of the two mesures of gn. 3. Gn Bsed on One-Step Bckup The gn g B mesures the dfference between the current vlue of pont b nd ts vlue ccordng to n α vector computed by one-step bckup of b v Equton 2. Ths mesure does not hve strong theoretcl support, but ntutvely, f the one-step dfference for pont s lrge, then t wll lso mprove the pproxmtons of nerby ponts sgnfcntly. These mprovements wll then hve postve effects on ponts tht led to these ponts (lthough the effect wll be dscounted), mprovng mny prts of the vlue functon. Ths gn s defned s: g B (b) = mx r(b, ) + γ o = b T α V (b), ] pr(o b, )V (b o ) V (b) where α = bckup (b). Note tht ths mesure ncludes the mmedte expected rewrd t b, quntty tht ws not prevously ncluded n the computton of ny α vector. Ths mesure cn mke use of cched g vectors (Equton ), nd so t s nexpensve from the computtonl stndpont. In the followng secton, we descrbe dfferent mesure of gn tht hs stronger theoretcl justfcton, but lso comes t the expensve of hgher computtonl cost. 3.2 Gn Bsed on Vlue-Functon Bounds Ths second mesure of gn tht we ntroduce uses the followng nsght, whch we frst llustrte on system wth two nomnl sttes, then extend the ntuton to three-stte system, nd then descrbe the generl clculton for ny number of sttes. The vlue functon s pecewse lner convex functon consstng of (for two nomnl sttes) the upper surfce of set of lnes defned by α vectors (see Fgure for runnng exmple). The vectors whch defne these lnes correspond to ponts nteror to the regons where the correspondng lnes re mxml (ponts A, B, nd C). Intutvely, the vlue functon s most relevnt t these ponts, nd becomes less relevnt wth ncresng dstnce from the pont. Here, we ssume tht the vlue gven by the α vector s correct t the correspondng pont. Now consder the ncluson of new α vector for new pont (pont X n Fgure ). If the vlue functon for pont A s correct, then the new α vector cnnot chnge the vlue of A to be greter thn ts current vlue; the sme holds for B. Therefore, the new α vector for pont X s upper-bounded by the lne between the vlue for A nd the vlue for B. Thus, the gn for X s the dfference between ths upper bound nd the current vlue of X. For the cse wth three nomnl sttes, the nlogous plnr soluton to the upper bound s not s smple to compute (see Fgure b nd c). To vsulze the problem, consder moble plne beng pushed upwrd from pont X, whch s constrned only to not move bove ny of the exstng ponts tht defne the vlue-functon surfce (A, B, C, nd D n Fgure b). In the exmple depcted n Fgure b, ths plne wll come to rest on the ponts defned by the loctons of A, B, C n the belef spce nd ther vlues. The vlue of tht plne t X then gves the tghtest upper bound for ts vlue f we were to dd new α vector t X. The gn s then computed n the sme mnner s n the two stte cse bove, but fndng the bove-descrbed plne s not strghtforwrd (nd becomes even more dffcult n hgherdmensonl spces). The dffculty s tht n hgher dmensonl spces n contrst to the two stte cse where the mnml upper bound s lwys defned by the ponts mmedtely to the left nd rght of the new pont selectng the ponts tht defne tht mnml upper-boundng surfce s non-trvl tsk. However, ths problem cn be formulted s

4 Vlue g LP (X) A X B C p(s ) () (b) (c) Fgure : (): Vlue functon for dynmcl system wth two nomnl sttes (note tht p(s 2 ) = p(s ), so n xs for p(s 2 ) s unnecessry). The vlue functon s defned by the α vectors correspondng to ponts A, B, nd C. The gn g LP (X) for pont X s the dfference between the upper bound defned by the lne V (A), V (B) nd the current vlue of X. (b,c): Vlue functon for system wth three nomnl sttes. The vlue functon s defned by the α vectors correspondng to ponts A, B, C, nd D, nd the gn g LP (X) for pont X s the dfference between the tghtest upper bound (defned by plne V (A), V (B), V (C) ) nd the current vlue for X. Tht tghtest upper-bound plne s produced by the frst LP (FndGnAndRegonForPont). X s the best pont produced by the second LP (FndBestPontForRegon) for the boundng regon {A, B, C}. lner progrm (LP). We descrbe ths LP n Tble nd refer to t s FndGnAndRegonForPont(b ). Gven cnddte belef pont b ts current sclr vlue v, the current ponts {b...b m }, the current sclr vlues v...v m of these ponts, nd the vlue vectors α...α n, ths mnmzton LP fnds the tghtest upper bound V u (b ) = w v, whch cn then be used to compute the gn g LP (b ) = V u (b ) V (b ). In words, ths mnmzton LP works by fndng set of ponts (these re the set of ponts wth nonzero weghts w ) such tht: ) b cn be expressed convex lner combnton of these ponts, nd ) the vlue functon defned by the plne tht rests on ther vlue-functon ponts s the mnml surfce bove v. We cll the set of such ponts wth non-zero weghts the boundng regon Γ(b ) = {b w > } for pont b ; t hs the property tht the pont b must be nteror to t, nd these boundng regons wll ply n mportnt role n the next secton. There s one subtle ssue regrdng the bove LP, whch s tht not ll ponts of potentl nterest re nteror to the set of current ponts for whch α vectors re defned (n fct erly n the process when there re few ponts, most of the useful ponts wll be outsde the convex hull of current ponts). Ths s problemtc becuse we wnt to consder these ponts (they often correspond to reltvely poorly pproxmted res of the vlue functon), but they re never nteror to ny subset of the current ponts. To resolve ths ssue, we ntroduce the set U of unt bss ponts: the ponts wth ll components but one equl to, nd the remnng entry equl to (the stndrd orthonorml bss n the belef spce). We then ugment the set of current belef ponts n our LP to nclude the unt bss ponts: {b...b m } = {P U}. The vlues v j of the ponts n U \ P re computed s v j = bckup (b j ) b j U \ P, snce these ponts were not updted n onestep*bckup() s were the ponts n P. As before, the vrbles n ths ugmented LP re the sclr weghts w...w m tht defne whch ponts from {P U} form the boundng regon Γ(b ) 3.3 Fndng the Best Pont n Boundng Regon Gven the gns g LP for the cnddte ponts, t s possble to choose ponts to dd bsed on ths crter, but one drwbck to ths s tht we mght dd multple ponts from the sme boundng regon Γ, nd thus wste resources by pproxmtng the vlue functon for multple ponts when just one would work lmost s well. Further, t my be tht none of these ponts s the pont n the regon wth the hghest gn. To ddress these ssues, we ntroduce nother LP clled FndBestPontForRegon(Γ) (Tble ) tht tkes boundng regon nd fnds the pont nteror to tht regon wth mxml g LP. Note tht ths regon my contn unt bss ponts, llowng the new pont to be exteror to the current set of ponts. Ths LP tkes s nput boundng regon Γ(b ) = {b...b l }, sclr vlues v...v l of these ponts, nd the vlue vectors α...α n. The optmzton vrbles re: sclr weghts w...w l, vector b new, whch s the pont to be found, nd the sclr vlue v new, whch s the vlue of b new. The process of fndng the best ponts to dd wll cll ths LP on ll unque regons found by runnng the frst LP (FndGnAndRegonForPont) on ll cnddte ponts. 4 Usng Gns n PBVI nd Perseus We now defne two ncrementl lgorthms, PBVI-I nd Perseus-I, usng the gns g B nd g LP, nd the lner progrms bove. Referencng the nytme*() lgorthm, we explored vrous possbltes for redytoaddponts(), bestpontstoadd(c), nd fndcnddteponts(). Whle mny vrtons exst, ths pper presents only few tht seem most promsng.

5 Tble : The lner progrms used to ) compute the gn nd regon Γ(b ) for pont b gven vlue functon, nd ) fnd the best pont wthn gven regon Γ gven the vlue functon. FndGnAndRegonForPont(b ): Mnmze: w v Gven: Vrbles: Constrnts: b, v, b...b m, v...v m, α...α n w...w m b = w b ; w v v w = ; w..m] Two computtonlly chep possbltes for redytoadd- Ponts() re to dd ponts: ) fter fxed number of tertons, nd ) when the updte process hs pproxmtely converged,.e., the mxml one-step dfference n vlue over ll ponts s below some threshold (mx b P (V n+ (b) V n (b)) ɛ). We hve lso explored two methods for fndcnddteponts(). The frst s to keep lst of ponts tht re one-step successors to the ponts n P, the set {b o b P }. The second method s to do rndom wlk nd use dstnce threshold to choose ponts, just s n pckintlponts(). The gns dscussed n the prevous secton cn then be used for bestpontstoadd(c). Gn g B orders the cnddte ponts C nd the top k re chosen. For gn g LP, the cnddte ponts defne set of unque boundng regons {Γ(b) b C}. For ech regon, the LP FndBestPontFor- Regon() returns the best cnddte pont b. These cnddte ponts re ordered by g LP (b ) nd the top k re chosen. To compre our methods gnst bselne nytme lgorthm, we used the stndrd rndom wlk method, choosng the frst k ponts tht exceeded dstnce threshold. Experments To evlute our prncpled pont selecton lgorthms, we conducted testng on set of stndrd POMDP domns. Three of the problems re smll-to-md szed, nd the fourth (Hllwy) s lrger. Domn defntons my be obtned t Cssndr, 999]. We tested three lgorthms: ) bselne, nytme Perseus usng the dstnce-bsed metrc; ) one-step bckup Perseus-I usng g B ; nd ) LP-bsed Perseus-I usng g LP nd the two LPs for fndng the best ponts. All three lgorthms used the convergence condton wth ɛ = E 3 for redytoaddponts(). The lgorthms for Network, Shuttle, nd 4x3 dded up to 3 ponts t tme, whle Hllwy dded up to 3 ponts t tme. However, choosng ponts usng LPs often dd not dd ll 3 becuse fewer unque regons were found. In the one-step bckup Perseus- I, we used the successor metrc for fndcnddteponts() s t ws esly ntegrted wth lttle overhed, whle n the LPbsed Perseus-I we used rndom wlk wth dstnce metrc s t ntroduced the lest overhed n ths cse. The results were verged over 3 runs, nd the performnce metrc s the verge dscounted rewrd, for gven ntl belef stte. The results re presented n the grphs n Fgure 2, whch re of FndBestPontForRegon(Γ) ( ) Mxmze: w v v new Gven: Vrbles: Constrnts: b...b l, v...v l, α...α n w...w l, v new, b new b new = w b ; w..l] w = ; v new α j b new j..l] three types. The frst type (performnce vs. number of ponts) s mesure of the effectveness of pont selecton. The second type (performnce vs. number tertons) nd the thrd type (performnce vs. CPU tme) both mesure how well the use of prncpled pont selecton ffects the overll performnce, but the thrd type shows the mpct of ddtonl tme ncurred by the pont-selecton process. These grphs show performnce on the x-xs, nd the mount of resource (ponts, tertons, or tme) used to cheve tht performnce on the y-xs. When comprng two lgorthms wth respect to gven resource, the better lgorthm wll be lower. Exmnton of the frst-type grphs shows tht the LPbsed Perseus-I dd good job of dentfyng the best ponts to dd. The resultng polces produced better performnce wth fewer ponts thn ether of the other two methods. Furthermore, on three of the problems, the one-step bckup Perseus- I lso dd sgnfcntly better thn the bselne for choosng ponts. Thus, for ths metrc, our gol of fndng useful ponts hs been cheved. Most promsngly, on the lrgest problem (Hllwy) our lgorthms used sgnfcntly fewer ponts to cheve good performnce. However, the other grphs show tht choosng good ponts does not lwys trnslte nto fster lgorthm. On one problem (Network) both versons of Perseus-I were fster thn the bselne, but on the other problems, the performnce ws ether closely compettve or slghtly worse due to the overhed ncurred by dentfyng good ponts. Whle ths result ws not optml, the fct tht there were sgnfcnt speedups on one problem combned wth the pont selecton results leds us to drw the concluson tht ths method s promsng nd needs further nvestgton nd development. References Cssndr, 999] A. Cssndr. Tony s pomdp pge. pomdp/ndex.html, 999. Izd et l., 2] Msoumeh T. Izd, Ajt V. Rjwde, nd Don Precup. Usng core belefs for pont-bsed vlue terton. In 9th Interntonl Jont Conference on Artfcl Intellgence, 2. Jmes et l., 24] Mchel R. Jmes, Stnder Sngh, nd Mchel L. Lttmn. Plnnng wth predctve stte representtons. In The 24 Interntonl Conference on Mchne Lernng nd Applctons, 24.

6 Ponts vs. Performnce Itertons vs. Performnce 6 Tme vs. Performnce 2 LP 4 Dstnce 4 One Step 3. Network Shuttle x Hllwy Fgure 2: Emprcl comprsons of three methods for selectng new ponts durng vlue terton. Ech grph evlutes performnce gnst one of three metrcs number of ponts, totl number of tertons used n constructng polcy, nd totl CPU tme. The y-xs of ech grph represents the verge vlue of the column s metrc requred to ern prtculr rewrd (x-xs) durng emprcl evluton. Experments were conducted on four dynmcl systems (one per row): Network, Shuttle, 4x3, nd Hllwy. Lower vlues re ndctve of less resource consumpton, nd so re more desrble. Pneu et l., 23] J. Pneu, G. Gordon, nd S. Thrun. Pontbsed vlue terton: An nytme lgorthm for POMDPs. In Proc. 8th Interntonl Jont Conference on Artfcl Intellgence, Acpulco, Mexco, 23. Pneu et l., 2] J. Pneu, G. Gordon, nd S. Thrun. Pontbsed pproxmtons for fst pomdp solvng. Techncl Report SOCS-TR-2.4, McGll Unversty, 2. Shn et l., 2] Guy Shn, Ronen Brfmn, nd Solomon Shmony. Model-bsed onlne lernng of POMDPs. In 6th Euro- pen Conference on Mchne Lernng, 2. Sngh et l., 24] Stnder Sngh, Mchel R. Jmes, nd Mtthew R. Rudry. Predctve stte representtons, new theory for modelng dynmcl systems. In 2th Conference on Uncertnty n Artfcl Intellgence, 24. Spn nd Vlsss, 2] M.T.J Spn nd N. Vlsss. Perseus: Rndomzed pont-bsed vlue terton for pomdps. Journl of Artfcl Intellgence Reserch, 24:9 22, 2.

Partially Observable Systems. 1 Partially Observable Markov Decision Process (POMDP) Formalism

Partially Observable Systems. 1 Partially Observable Markov Decision Process (POMDP) Formalism CS294-40 Lernng for Rootcs nd Control Lecture 10-9/30/2008 Lecturer: Peter Aeel Prtlly Oservle Systems Scre: Dvd Nchum Lecture outlne POMDP formlsm Pont-sed vlue terton Glol methods: polytree, enumerton,