Compact, Convex Upper Bound Iteration for Approximate POMDP Planning
Tao Wang, University of Alberta
Pascal Poupart, University of Waterloo
Michael Bowling and Dale Schuurmans, University of Alberta

Abstract

Partially observable Markov decision processes (POMDPs) are an intuitive and general way to model sequential decision making problems under uncertainty. Unfortunately, even approximate planning in POMDPs is known to be hard, and developing heuristic planners that can deliver reasonable results in practice has proved to be a significant challenge. In this paper, we present a new approach to approximate value iteration for POMDP planning that is based on quadratic rather than piecewise linear function approximators. Specifically, we approximate the optimal value function by a convex upper bound composed of a fixed number of quadratics, and optimize it at each stage by semidefinite programming. We demonstrate that our approach can achieve approximation quality competitive with current techniques while still maintaining a bounded size representation of the function approximator. Moreover, an upper bound on the optimal value function can be preserved if required. Overall, the technique requires computation time and space that is only linear in the number of iterations (horizon time).

Introduction

Partially observable Markov decision processes (POMDPs) are a general model of an agent acting in an environment, where the effects of the agent's actions and the observations it can make about the current state of the environment are both subject to uncertainty. The agent's goals are specified by rewards it receives (as a function of the states it visits and the actions it executes), and an optimal behavior strategy in this context chooses actions, based on the history of observations, that maximize the long term reward of the agent. POMDPs have become an important modeling formalism in robotics and autonomous agent design (Thrun, Burgard, & Fox 2005; Pineau et al. 2003). Much of the current work on robot navigation and mapping, for example, is now based on stochastic transition and observation models (Thrun, Burgard, & Fox 2005; Roy, Gordon, & Thrun 2005).
Moreover, POMDP representations have also been used to design autonomous agents for real world applications, including nursing (Pineau et al. 2003) and elderly assistance (Boger et al. 2005).

Copyright © 2006, American Association for Artificial Intelligence. All rights reserved.

Despite their convenience as a modeling framework, however, POMDPs pose difficult computational problems. It is well known that solving for optimal behavior strategies, or even just approximating optimal strategies, in a POMDP is intractable (Madani, Hanks, & Condon 2003; Mundhenk et al. 2000). Therefore, a lot of work has focused on developing heuristics for computing reasonable behavior strategies for POMDPs. These approaches have generally followed three broad strategies: value function approximation (Hauskrecht 2000; Spaan & Vlassis 2005; Pineau, Gordon, & Thrun 2003; Parr & Russell 1995), policy based optimization (Ng & Jordan 2000; Poupart & Boutilier 2003; 2004; Amato, Bernstein, & Zilberstein 2006), and stochastic sampling (Kearns, Mansour, & Ng 2002; Thrun 2000). In this paper, we focus on the value function approximation approach and contribute a new perspective to this strategy.

Most previous work on value function approximation for POMDPs has focused on representations that explicitly maintain a set of α-vectors or belief states. This is motivated by the fact that the optimal value function, considered as a function of the belief state, is determined by the maximum of a set of linear functions specified by α-vectors, where each α-vector is associated with a specific behavior policy. Since the optimal value function is given by the maximum of a (large) set of α-vectors, it is natural to consider approximating it by a subset of α-vectors, or at least by a small set of linear functions. In fact, even an exact representation of the optimal value function need not keep every α-vector, but only those that are maximal for at least some witness belief state.
Motivated by this characterization, most value function approximation strategies attempt to maintain a smaller subset of α-vectors by focusing on a reduced set of belief states (Spaan & Vlassis 2005; Hauskrecht 2000; Pineau, Gordon, & Thrun 2003). Although much recent progress has been made on α-vector based approximations, a drawback of this approach is that the number of α-vectors stored generally has to grow with the number of value iterations to maintain an adequate approximation (Pineau, Gordon, & Thrun 2003).

In this paper, we consider an alternative approach that drops the notion of an α-vector entirely from the approximation strategy. Instead we exploit the other fundamental observation about the nature of the optimal value function: since it is determined by a belief-state-wise maximum over linear functions, the optimal value function must be a convex function of the belief state (Sondik 1978; Boyd & Vandenberghe 2004). Our strategy, then, is to compute a convex approximation to the optimal value function that is based on quadratic rather than linear functions of the belief state. The advantage of using a quadratic basis for value function approximation is several-fold. First, the size of the representation does not have to grow merely to model an increasing number of facets in the optimal solution; thus we can keep a bounded size representation at each horizon. Second, a quadratic representation allows one to conveniently maintain a provable upper bound on the optimal values in an explicit compact representation, without requiring auxiliary linear programs to retrieve the bound, as in current grid based approaches (Hauskrecht 2000; Smith & Simmons 2005). Third, the computational cost of updating the approximation does not change with the iteration number (either in time or space), so the overall computation time is only linear in the horizon. Finally, as we demonstrate below, despite a significant reduction in representation size, convex quadratics are still able to achieve competitive approximation quality on benchmark POMDP problems.

Background

We begin with Markov decision processes (MDPs), since we will need to exploit some basic concepts from MDPs in our approach below. An MDP is defined by a set of states $S$, a set of actions $A$, a state transition model $p(s'|s,a)$, and a reward model $r(s,a)$. In this setting, a deterministic policy is specified by a function from states to actions, $\pi : S \to A$, and the value function for a policy is defined as the expected future discounted reward the policy obtains from each state

$$V^\pi(s) = E_\pi\left[\sum_{t=0}^{\infty} \gamma^t r(s_t, \pi(s_t)) \,\middle|\, s_0 = s\right]$$

Here the discount factor, $0 \le \gamma < 1$, expresses a tradeoff between short term and long term reward. It is known that there exists a deterministic optimal policy whose value function dominates all other policy values in every state (Bertsekas 1995).
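As a small illustration of the policy value $V^\pi$ defined above, the discounted sum for a fixed deterministic policy in a finite MDP satisfies the linear system $V^\pi = r_\pi + \gamma P_\pi V^\pi$, so it can be computed exactly with one linear solve. The sketch below assumes NumPy and hypothetical array layouts (`P[a, s, s2]` for $p(s'|s,a)$, `R[s, a]` for $r(s,a)$); the function name is illustrative only and not part of the method described here.

```python
import numpy as np


def evaluate_policy(P, R, policy, gamma):
    """Exact value of a deterministic policy pi in a finite MDP (sketch).

    Assumed (hypothetical) layouts: P[a, s, s2] = p(s2 | s, a),
    R[s, a] = r(s, a), policy[s] = integer index of the action pi(s).
    Since 0 <= gamma < 1, V^pi is the unique solution of
    (I - gamma * P_pi) V = r_pi.
    """
    n_states = R.shape[0]
    states = np.arange(n_states)
    P_pi = P[policy, states, :]   # row s holds p(. | s, pi(s))
    r_pi = R[states, policy]      # entry s holds r(s, pi(s))
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)


# A single absorbing state paying reward 1 per step: V = 1 / (1 - gamma).
P = np.ones((1, 1, 1))
R = np.array([[1.0]])
print(evaluate_policy(P, R, np.array([0]), 0.9))  # close to 10
```

A direct solve is only practical for small state spaces; it is shown here because it makes the definition of $V^\pi$ concrete, not as a recommended planner.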
This optimal value function also satisfies the Bellman equation

$$V^*(s) = \max_a \; r(s,a) + \gamma \sum_{s'} p(s'|s,a)\, V^*(s') \qquad (1)$$

Computing the optimal value function for a given MDP can be accomplished in several ways. The two ways we consider below are value iteration and linear programming. Value iteration is based on repeatedly applying the Bellman backup operator, $V_{n+1} = HV_n$, specified by

$$V_{n+1}(s) = \max_a \; r(s,a) + \gamma \sum_{s'} p(s'|s,a)\, V_n(s') \qquad (2)$$

It can be shown that $V_n \to V^*$ in the $L_\infty$ norm, and thus $V^*$ is a fixed point of (2) (Bertsekas 1995). $V^*$ is also the solution to the linear program

$$\min_V \sum_s V(s) \quad \text{s.t.} \quad V(s) \ge r(s,a) + \gamma \sum_{s'} p(s'|s,a)\, V(s') \qquad (3)$$

for all $s \in S$ and $a \in A$.

It turns out that for continuous state spaces, the Bellman equation (1) still characterizes the optimal value function, replacing the transition probabilities with conditional densities and the sums with Lebesgue integrals. Computationally, however, the situation is not so simple for continuous state spaces, since the integrals must now somehow be solved in place of the sums, and (3) is no longer finitely defined. Nevertheless, continuous state spaces are unavoidable when one considers POMDP planning.

POMDPs extend MDPs by introducing an observation model $p(o|a,s')$ that governs how a noisy observation $o \in O$ is related to the underlying state $s'$ and the action $a$. Having access to only noisy observations of the state complicates the problem of choosing optimal actions significantly. The agent now never knows the exact state of the environment, but instead must infer a distribution over possible states, $b(s)$, from the history of observations and actions. Nevertheless, given an action $a$ and an observation $o$, the agent's belief state can be easily updated by Bayes rule

$$b^{(b,a,o)}(s') = p(o|a,s') \sum_s p(s'|s,a)\, b(s) \,/\, z \qquad (4)$$

where $z = p(o|b,a) = \sum_{s'} p(o|a,s') \sum_s p(s'|s,a)\, b(s)$. By the Markov assumption, the belief state is a sufficient representation upon which an optimal behavior strategy can be defined (Sondik 1978). Therefore, a policy is naturally specified in this setting by a function from belief states to actions, $\pi : B \to A$, where $B$ is the set of all possible distributions over the underlying state space $S$ (an $(|S|-1)$-dimensional simplex).
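The Bayes rule update (4) is short enough to sketch directly. The code below assumes NumPy and hypothetical array layouts, `T[a, s, s2]` for $p(s'|s,a)$ and `Z[a, s2, o]` for $p(o|a,s')$; the two-state numbers in the usage lines are made up for illustration and are not one of the benchmark problems.

```python
import numpy as np


def belief_update(b, a, o, T, Z):
    """Bayes-rule belief update, equation (4) (sketch).

    Assumed (hypothetical) layouts: T[a, s, s2] = p(s2 | s, a),
    Z[a, s2, o] = p(o | a, s2). Returns the new belief b^(b,a,o)
    together with the normaliser z = p(o | b, a).
    """
    predicted = b @ T[a]                   # sum_s p(s2 | s, a) b(s), indexed by s2
    unnormalised = Z[a, :, o] * predicted  # multiply in p(o | a, s2)
    z = unnormalised.sum()
    if z <= 0.0:
        raise ValueError("observation o has zero probability under (b, a)")
    return unnormalised / z, z


# Made-up two-state example: a single "listen" action leaves the state unchanged
# and reports the true state with probability 0.85.
T = np.array([np.eye(2)])
Z = np.array([[[0.85, 0.15], [0.15, 0.85]]])
new_b, z = belief_update(np.array([0.5, 0.5]), 0, 0, T, Z)
print(new_b, z)  # expected to be close to [0.85, 0.15] and 0.5
```

Returning $z$ as well is convenient because the same quantity $p(o|b,a)$ reappears as the weight on each successor belief in the belief-space Bellman equation.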
Obviously, for any environment with two or more states there are an infinite number of belief states, and not every policy can be finitely represented. Nevertheless, one can still define the value function of a policy as the expected future discounted reward obtained from each belief state

$$V^\pi(b) = E_\pi\left[\sum_{t=0}^{\infty} \gamma^t r(b_t, \pi(b_t)) \,\middle|\, b_0 = b\right]$$

where $r(b,a) = \sum_s r(s,a)\, b(s)$. Thus, a POMDP can be treated as an MDP over belief states; that is, a continuous state MDP. As before, an optimal policy obtains the maximum value for each belief state, and its value function satisfies the Bellman equation (Sondik 1978)

$$V^*(b) = \max_a \; r(b,a) + \gamma \sum_{b'} p(b'|b,a)\, V^*(b') = \max_a \; r(b,a) + \gamma \sum_o p(o|b,a)\, V^*(b^{(b,a,o)}) \qquad (5)$$

Unfortunately, solving the functional equation (5) for $V^*$ is hard. Known techniques for computing the optimal value function are generally based on value iteration (Cassandra, Littman, & Zhang 1997; Zhang & Zhang 2001), although policy based approaches are also possible (Sondik 1978; Poupart & Boutilier 2003; 2004). As above, value iteration is based on repeatedly applying the Bellman backup operator, $V_{n+1} = HV_n$, to the current value function approximation. In this case, the current lower bound, $V_n$, is represented by a finite set of α-vectors, $\Gamma_n = \{\alpha_\pi : \pi \in \Pi_n\}$, where each α-vector is associated with an $n$-step behavior strategy $\pi$. Given $\Gamma_n$, the value function is represented by

$$V_n(b) = \max_{\alpha_\pi \in \Gamma_n} \alpha_\pi \cdot b$$

At each stage of value iteration, the current lower bound is updated according to the Bellman backup, $V_{n+1} = HV_n$, such that

$$\begin{aligned} V_{n+1}(b) &= \max_a \; r(b,a) + \gamma \sum_o p(o|b,a)\, V_n(b^{(b,a,o)}) && (6)\\ &= \max_a \; r_a \cdot b + \gamma \sum_o \max_{\pi \in \Pi_n} g_{(\pi,a,o)} \cdot b \\ &= \max_{a,\{\pi_o\}} \alpha_{a,\{\pi_o\}} \cdot b && (7) \end{aligned}$$

where $r_a$ denotes the vector with entries $r(s,a)$, and we use the quantities

$$g_{(\pi,a,o)}(s) = \sum_{s'} p(o|a,s')\, p(s'|s,a)\, \alpha_\pi(s'), \qquad \alpha_{a,\{\pi_o\}} = r_a + \gamma \sum_o g_{(\pi_o,a,o)}$$

Once again it is known that $V_n \to V^*$ in the $L_\infty$ norm, and thus $V^*$ is a fixed point of (6) (Sondik 1978). Although the size of the representation for $V_{n+1}$ remains finite, it can be exponentially larger than $V_n$ in the worst case, since enumerating every possibility for $a, \{\pi_o\}$ over $a \in A$, $o \in O$, $\pi_o \in \Pi_n$ yields $|\Pi_{n+1}| = |A|\,|\Pi_n|^{|O|}$ combinations. Many of these α-vectors are not maximal for any belief state, and can be pruned by running a linear program for each one that verifies whether there is a witness belief state for which it is maximal (Cassandra, Littman, & Zhang 1997). Thus, the set of α-vectors, $\Gamma_n$, action strategies, $\Pi_n$, and witness belief states, $B_n$, are all associated 1 to 1. However, even with pruning, exact value iteration cannot be run for many steps, even on small problems.

Value function approximation strategies

Much research has focused on approximating the optimal value function, aimed for the most part at reducing the time and space complexity of the value iteration update. Work in this area has considered various strategies (Hauskrecht 2000), including direct MDP approximations and variants, and using function approximation to fit $V_{n+1}$ over sampled belief states (Parr & Russell 1995; Littman, Cassandra, & Kaelbling 1995). However, two approaches have recently become the most dominant: grid based and belief point approximations. The grid based approach (Gordon 1995; Hauskrecht 2000; Zhou & Hansen 2001; Bonet 2002) maintains a finite collection of belief states along with associated value estimates $\{\langle b, V_n(b)\rangle : b \in B_{grid}\}$. These value estimates are updated by applying the Bellman update on $B_{grid}$.
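The exact α-vector backup of equations (6) and (7) can be made concrete with a small sketch that enumerates every backed-up vector $\alpha_{a,\{\pi_o\}}$ without the pruning step, so its output has $|A|\,|\Gamma_n|^{|O|}$ rows and it is only meant for tiny problems. It assumes NumPy and hypothetical layouts: `alphas[i, s]` holds the $i$-th vector of $\Gamma_n$, `R[s, a]` is $r(s,a)$, `T[a, s, s2]` is $p(s'|s,a)$ and `Z[a, s2, o]` is $p(o|a,s')$. The function names are illustrative.

```python
import itertools

import numpy as np


def backup_alpha_vectors(alphas, R, T, Z, gamma):
    """Enumerate all alpha_{a,{pi_o}} of equation (7), without pruning (sketch).

    Assumed (hypothetical) layouts: alphas[i, s] is the i-th vector of Gamma_n,
    R[s, a] = r(s, a), T[a, s, s2] = p(s2 | s, a), Z[a, s2, o] = p(o | a, s2).
    Returns an array with |A| * k**|O| rows, where k = len(alphas).
    """
    n_actions, n_obs = T.shape[0], Z.shape[2]
    result = []
    for a in range(n_actions):
        # g[o, i, s] = sum_s2 p(o | a, s2) p(s2 | s, a) alphas[i, s2]
        g = np.einsum("sx,xo,ix->ois", T[a], Z[a], alphas)
        # choose one successor vector (strategy pi_o) for each observation o
        for choice in itertools.product(range(len(alphas)), repeat=n_obs):
            back = sum(g[o, choice[o]] for o in range(n_obs))
            result.append(R[:, a] + gamma * back)
    return np.array(result)


def value(alphas, b):
    """V(b) = max over alpha-vectors of alpha . b."""
    return float(np.max(alphas @ b))
```

Pruning dominated vectors, for example with the witness linear programs mentioned above, is omitted for brevity; restricting the loop to the maximising choice at a single belief gives the point-based variant used by belief point methods.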
An important advantage of this approach is that it can maintain an upper bound on the optimal value function. Unfortunately, maintaining a tight bound entails significant computational expense (Hauskrecht 2000). First, $B_{grid}$ must contain all corners of the simplex so that its convex closure spans $B$. Second, each successor belief state in (6) must have its interpolated value estimate minimized by a linear program (Zhou & Hansen 2001). Below we show that this large number of linear programs can be replaced with a single convex optimization.

Unlike the grid based approach, which takes a current belief state in $B_{grid}$ and projects it forward to belief states outside of $B_{grid}$, the belief point approach only considers belief states in a witness set $B_{wit}$ (Pineau, Gordon, & Thrun 2003; Smith & Simmons 2005). Specifically, the belief point approximation maintains a lower bound by keeping a subset of α-vectors associated with these witness belief states. To further explain this approach, let $\hat\Gamma_n = \{\alpha_\pi : \pi \in \hat\Pi_n\}$, so that there is a 1 to 1 correspondence between α-vectors in $\hat\Gamma_n$, action strategies in $\hat\Pi_n$, and belief states in $B_{wit}$. Then the set of α-vectors is updated by applying the Bellman backup, but restricting the choices in (7) to $\pi \in \hat\Pi_n$, and only computing (7) for $b \in B_{wit}$. Thus, the number of α-vectors in each iteration remains bounded and associated with $B_{wit}$.

The quality of both these approaches is strongly determined by the sets of belief points, $B_{grid}$ and $B_{wit}$, that they maintain. For the belief point approach, one generally has to grow the number of belief points at each iteration to maintain an adequate bound on the optimal value function. Pineau et al. (2003) suggested doubling the size at each iteration, but recently a more refined approach was suggested by Smith & Simmons (2005).

Convex quadratic upper bounds

The key observation behind our approach is that one does not need to be confined to piecewise linear approximations. Our intuition is that convex quadratic approximations are particularly well suited for value function approximation in POMDPs.
This is motivated by the fact that each value iteration step produces a maximum over a set of convex functions, yielding a result that is always convex. Thus, one can plausibly use a convex quadratic function to upper bound the maximum over α-vectors, and more generally to upper bound the maximum over any set of back-projected convex value approximations from iteration $n$. Our basic goal then is to retain a compact representation of the value approximation by exploiting the fact that quadratics can be more efficient at approximating a convex upper bound than a set of linear functions; see Figure 1. As with piecewise linear approximations, the quality of the approximation can be improved by taking a maximum over a set of convex quadratics, which would yield a convex piecewise quadratic rather than piecewise linear approximation. In this paper, however, we will focus on the most naive choice, and approximate the value function with a single quadratic in each step of value iteration. The subsequent extension to multiple quadratics is discussed below.

Figure 1: Illustration of a convex quadratic upper bound approximation to a maximum of linear functions $\alpha_\pi$.

An important advantage the quadratic form has over other function approximation representations is that it permits convex minimization of the upper bound, as we demonstrate below. Such a convenient formulation is not readily achievable for other function representations. Also, since we are not compelled to grow the size of the representation at each iteration, we obtain an approach that runs in linear time in the number of value iteration steps.

There are a few drawbacks in dropping the piecewise linear representation, however. One drawback is that we lose the 1 to 1 correspondence between α-vectors and behavior strategies $\pi$, which means that greedy action selection requires a one step look ahead calculation based on (5). The second drawback is that the convex optimization problem we have to solve at each value iteration is more complex than a simple linear program.

Convex upper bound iteration

The main technical challenge we face is to solve for a tight quadratic upper bound on the value function at each stage of value iteration. Interestingly, this can be done effectively with convex optimization as follows. We represent the value function approximation over belief states by a quadratic form

$$\hat V_n(b) = b^\top W_n b + w_n^\top b + \omega_n \qquad (8)$$

where $W_n$ is a square matrix of weights, $w_n$ is a vector of weights, and $\omega_n$ is a scalar offset weight. Equation (8) defines a convex function of the belief state $b$ if and only if the matrix $W_n$ is positive semidefinite (Boyd & Vandenberghe 2004). We denote the semidefinite constraint on $W_n$ by $W_n \succeq 0$.

As shown above, one step of value iteration involves expanding (and back-projecting) a value approximation from stage $n$, and defining the value function at stage $n+1$ by the maximum over the expanded, back-projected set. However, back-projection entails some additional complication in our case, because we do not maintain a set of α-vectors but rather a quadratic function approximation at stage $n$. That is, our approximate value iteration step has to pull the quadratic form through the backup operator. Unfortunately, the result of the backup is no longer quadratic, but a rational (quadratic over linear) function. Fortunately, however, the result of this backup is still convex, as we now show. Let the action-value backup of $\hat V$ be denoted by

$$q_a(b) = r(b,a) + \gamma \sum_o p(o|b,a)\, \hat V(b^{(b,a,o)}) \qquad (9)$$
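Before (9) is expanded, the quadratic form (8) itself can be sketched numerically. The code below assumes NumPy; it evaluates $\hat V(b)$, tests convexity through the eigenvalues of the symmetric part of $W$ (only that part affects $b^\top W b$), and measures how far a candidate quadratic lies above a set of α-vectors on sampled beliefs, which is the upper bound relation pictured in Figure 1. The function names and the two-state example are hypothetical.

```python
import numpy as np


def quad_value(W, w, omega, b):
    """V(b) = b' W b + w' b + omega, equation (8)."""
    return float(b @ W @ b + w @ b + omega)


def is_convex(W, tol=1e-10):
    """(8) is convex on R^|S| iff the symmetric part of W is positive semidefinite."""
    W_sym = 0.5 * (W + W.T)
    return bool(np.linalg.eigvalsh(W_sym).min() >= -tol)


def min_gap_above(W, w, omega, alphas, beliefs):
    """Smallest value of V(b) - max_i alphas[i] . b over the given beliefs.

    A nonnegative result means the quadratic upper bounds the piecewise
    linear function max_i alphas[i] . b at every sampled belief.
    """
    return min(quad_value(W, w, omega, b) - float(np.max(alphas @ b)) for b in beliefs)


# Two states: max(b0, b1) is bounded above by b'b + 0.5 on the simplex.
alphas = np.array([[1.0, 0.0], [0.0, 1.0]])
beliefs = [np.array([x, 1.0 - x]) for x in np.linspace(0.0, 1.0, 101)]
print(is_convex(np.eye(2)), min_gap_above(np.eye(2), np.zeros(2), 0.5, alphas, beliefs))
```

The positive gap in this toy case shows an upper bound that is valid but loose; choosing $W, w, \omega$ to make it tight is exactly the optimization developed below.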
To express this as a function of $b$, we need to expand the definitions of $b^{(b,a,o)}$ and $\hat V$ respectively. First, note that by (4), $b^{(b,a,o)}$ is a ratio of a vector linear function of $b$ over a scalar linear function of $b$; therefore we can represent it by

$$b^{(b,a,o)} = \frac{M_{a,o}\, b}{p(o|b,a)} = \frac{M_{a,o}\, b}{e^\top M_{a,o}\, b} \qquad (10)$$

where $M_{a,o}$ is a matrix such that $M_{a,o}(s',s) = p(o|a,s')\, p(s'|s,a)$, and $e$ denotes the vector of all 1s. Substituting (8) and (10) into (9) yields

$$q_a(b) = r(b,a) + \gamma \sum_o \left[ \frac{b^\top M_{a,o}^\top W M_{a,o}\, b}{e^\top M_{a,o}\, b} + (w + \omega e)^\top M_{a,o}\, b \right]$$

Theorem 1. $q_a(b)$ is convex in $b$.

Proof. First note that $M_{a,o}^\top W M_{a,o} \succeq 0$ if $W \succeq 0$, and therefore it suffices to show that the function $f(b) = (b^\top N b)/(v^\top b)$ is convex under the assumptions $N \succeq 0$ and $v \ge 0$, on the region where $v^\top b > 0$. Note that $N \succeq 0$ implies $N = QQ^\top$ for some $Q$, and therefore $f(b) = (b^\top QQ^\top b)/(v^\top b) = (Q^\top b)^\top (v^\top b\, I)^{-1} (Q^\top b)$. Next, we use a few elementary facts about convexity (Boyd & Vandenberghe 2004). First, a function is convex iff its epigraph is convex, so it suffices to show that the set $\{(b, v^\top b\, I, \delta) : v^\top b\, I \succ 0,\; (Q^\top b)^\top (v^\top b\, I)^{-1} (Q^\top b) \le \delta\}$ is convex. By the Schur complement lemma, we have that $\delta - (Q^\top b)^\top (v^\top b\, I)^{-1} (Q^\top b) \ge 0$ iff

$$\begin{bmatrix} v^\top b\, I & Q^\top b \\ (Q^\top b)^\top & \delta \end{bmatrix} \succeq 0,$$

and therefore $f(b)$ is convex iff the set

$$\left\{ (b, v^\top b\, I, \delta) : v^\top b\, I \succ 0,\; \begin{bmatrix} v^\top b\, I & Q^\top b \\ (Q^\top b)^\top & \delta \end{bmatrix} \succeq 0 \right\}$$

is convex. The result then follows because this set can be written as a linear matrix inequality.

Corollary 1. Given a convex quadratic representation for $\hat V_n$, $\max_a q_a(b)$, and hence $H\hat V_n$, is convex in $b$.

So back-projecting the convex quadratic representation still yields a convex result. Our goal is to optimize a tight quadratic upper bound on the maximum of these convex functions (which of course is still convex). In some approaches below we will use the back-projected action-value functions directly. However, in other cases, it will prove advantageous if we can work with linear upper bounds on the back-projections.

Proposition 1. The tightest linear upper bound on $q_a(b)$ over the belief simplex is given by $q_a(b) \le u_a^\top b$ for the vector $u_a$ such that $u_a^\top 1_s = q_a(1_s)$ for each corner belief state $1_s$.

Algorithmic approach

We would like to solve for a quadratic $\hat V_{n+1}$ at stage $n+1$ that obtains as tight an upper bound on $H\hat V_n$ as possible. To do this, we appeal to the linear program characterization
of the optimal value function (3), which is also expressed as minimizing an upper bound on the back-projected value function. Unfortunately, since we are no longer working with a finite space, we cannot formulate a linear program here, but rather have to pose a generalized semi-infinite program

$$\min_{W,w,\omega} \int \left(b^\top W b + w^\top b + \omega\right) \mu(b)\, db \quad \text{subject to} \quad b^\top W b + w^\top b + \omega \ge q_a(b) \;\; \forall a, b; \quad W \succeq 0 \qquad (11)$$

where $\mu(b)$ is a measure over the space of possible belief states. The semi-infinite program (11) specifies a linear objective subject to linear constraints (albeit infinitely many of them) and a semidefinite constraint, and hence is a convex optimization problem in $W, w, \omega$.

There are two main difficulties in solving this convex optimization problem. First, the objective involves an integral with respect to a measure $\mu(b)$ on belief states. This measure is arbitrary (except that it must have full support on the belief space $B$) and allows one to control the emphasis the minimization places on different regions of the belief space. For simplicity, we assume the measure is a Dirichlet distribution, specified by a vector of prior parameters $\theta(s)$, $s \in S$. The Dirichlet distribution is particularly convenient in this context, since one can specify a uniform distribution over the belief simplex merely by setting $\theta(s) = 1$ for all $s$. Moreover, the required integrals for the Dirichlet have a closed form solution, which allows us to simply precompute the linear coefficients for the weight parameters, by

$$\int \left(b^\top W b + w^\top b + \omega\right) \mu(b)\, db = \langle W, E[bb^\top]\rangle + w^\top E[b] + \omega$$

where $E[b] = \theta/\|\theta\|_1$; $E[bb^\top] = \left(\mathrm{diag}(E[b]) + \|\theta\|_1\, E[b]E[b]^\top\right)/(1 + \|\theta\|_1)$ (Gelman et al. 1995); and $\langle A, B\rangle = \sum_{ij} A_{ij}B_{ij}$. That is, one can specify $\theta$ and compute the linear coefficients ahead of time.

The second and more difficult problem with solving (11) is to find a way to cope with the infinite number of linear constraints. Here, we address the problem with a straightforward constraint generation approach.
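The closed-form Dirichlet moments above can be checked numerically. This sketch assumes NumPy (its `Generator.dirichlet` sampler is used only for a Monte Carlo sanity check); `dirichlet_moments` and `expected_value` are hypothetical names for the precomputed coefficients $E[b]$, $E[bb^\top]$ and the resulting objective of (11).

```python
import numpy as np


def dirichlet_moments(theta):
    """E[b] and E[b b'] for b ~ Dirichlet(theta), all theta(s) > 0."""
    theta = np.asarray(theta, dtype=float)
    t1 = theta.sum()                                   # ||theta||_1
    Eb = theta / t1
    Ebb = (np.diag(Eb) + t1 * np.outer(Eb, Eb)) / (1.0 + t1)
    return Eb, Ebb


def expected_value(W, w, omega, theta):
    """Closed form of the integral of b'Wb + w'b + omega under Dirichlet(theta)."""
    Eb, Ebb = dirichlet_moments(theta)
    return float(np.sum(W * Ebb) + w @ Eb + omega)    # <W, E[bb']> + w'E[b] + omega


# Monte Carlo sanity check with the uniform measure over the simplex (theta = 1).
rng = np.random.default_rng(0)
theta = np.ones(3)
samples = rng.dirichlet(theta, size=100_000)
Eb, Ebb = dirichlet_moments(theta)
mc_Ebb = samples.T @ samples / len(samples)
print(np.abs(mc_Ebb - Ebb).max())   # small sampling error, roughly 1e-3 or less
```

Because both moments depend only on $\theta$, they can be computed once and reused as fixed linear coefficients in every value iteration step.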
The idea is to solve (11) iteratively, by keeping a finite set of constraints, each corresponding to a belief state, and solving the finite semidefinite program

$$\min_{W,w,\omega} \langle W, E[bb^\top]\rangle + w^\top E[b] + \omega \quad \text{subject to} \quad b_i^\top W b_i + w^\top b_i + \omega \ge q_a(b_i) \;\; \forall a,\ \forall i \in C; \quad W \succeq 0 \qquad (12)$$

Given a putative solution, $W, w, \omega$, a new constraint can be obtained by finding a belief state that solves

$$\min_b \; b^\top W b + w^\top b + \omega - q_a(b) \quad \text{subject to} \quad b \ge 0, \; \sum_s b(s) = 1 \qquad (13)$$

for each $a$. If the minimum value is nonnegative for all $a$, then there are no violated constraints and we have a solution to (11). Unfortunately, (13) cannot directly be used for constraint generation, since $q_a(b)$ is a convex function of $b$ (Theorem 1) and hence $-q_a(b)$ is concave, yielding a non-convex objective. Thus, to use (13) for constraint generation we need to follow an alternative approach.

We have pursued three different approaches to this problem thus far. Our first strategy maintains a provable upper bound on the optimal value function by strengthening the constraint threshold with the linear upper bound $u_a^\top b \ge q_a(b)$ from Proposition 1. Replacing $q_a(b)$ with $u_a^\top b$ in (11) and (13) ensures that an upper bound will be maintained, and also reduces (13) to a quadratic program that can be efficiently minimized to produce a belief state with maximum constraint violation. Our second strategy relaxes the upper bound guarantee by substituting $u_a^\top b$ for $q_a(b)$ only in the constraint generation procedure, maintaining an efficient quadratic programming formulation there, but keeping $q_a(b)$ in the main optimization (12). This no longer guarantees an upper bound, but can still produce better approximations in practice because the bounds do not have to be artificially strengthened. Our final strategy side-steps optimal constraint generation entirely, and instead chooses a fixed set of belief states for the constraint set $C$ in (12). In this way, the semidefinite program (12) needs to be solved only once per value iteration step. This strategy does not produce an upper bound either, but the resulting approximation is fast and effective in practice.
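The first strategy's constraint generation step can be sketched end to end for a small POMDP. The code below assumes NumPy and hypothetical layouts (`R[s, a]`, `T[a, s, s2]` for $p(s'|s,a)$, `Z[a, s2, o]` for $p(o|a,s')$). It evaluates $q_a(b)$ for a quadratic through (10), forms $u_a$ from Proposition 1, and minimises the convex quadratic $b^\top W b + w^\top b + \omega - u_a^\top b$ over the simplex. No particular quadratic programming solver is specified above, so projected gradient descent with a sort-based simplex projection is swapped in here purely for illustration; all function names are hypothetical, and the semidefinite program (12) itself is not solved.

```python
import numpy as np


def q_value(b, a, R, T, Z, W, w, omega, gamma):
    """q_a(b) from (9)-(10) for the quadratic V(b) = b'Wb + w'b + omega."""
    total = float(R[:, a] @ b)
    for o in range(Z.shape[2]):
        M = Z[a, :, o][:, None] * T[a].T      # M[s2, s] = p(o|a,s2) p(s2|s,a)
        Mb = M @ b
        p_o = Mb.sum()                        # p(o | b, a) = e' M b
        if p_o > 0.0:                         # impossible observations contribute 0
            total += gamma * (Mb @ W @ Mb / p_o + (w + omega) @ Mb)
    return total


def linear_upper_bound(a, R, T, Z, W, w, omega, gamma):
    """Proposition 1: u_a[s] = q_a(1_s), so u_a' b >= q_a(b) on the simplex."""
    corners = np.eye(R.shape[0])
    return np.array([q_value(c, a, R, T, Z, W, w, omega, gamma) for c in corners])


def project_to_simplex(v):
    """Euclidean projection of v onto {b : b >= 0, sum(b) = 1} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1.0)[0][-1]
    tau = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - tau, 0.0)


def most_violated_belief(W, w, omega, u, iters=2000):
    """Minimise b'Wb + (w - u)'b + omega over the simplex by projected gradient.

    This is (13) with q_a(b) replaced by u'b; it is convex when W is PSD.
    """
    W_sym = 0.5 * (W + W.T)
    # Lipschitz constant of the gradient; the floor avoids dividing by zero when W = 0.
    L = 2.0 * max(np.linalg.eigvalsh(W_sym).max(), 1e-6)
    c = w - u
    b = np.full(len(u), 1.0 / len(u))
    for _ in range(iters):
        b = project_to_simplex(b - (2.0 * W_sym @ b + c) / L)
    return b, float(b @ W_sym @ b + c @ b + omega)


def generate_constraint(prev, cand, R, T, Z, gamma):
    """Return (a, b, violation) minimising V_cand(b) - u_a' b over actions and beliefs.

    prev = (W, w, omega) is the stage-n quadratic used in the backup; cand is the
    current stage-(n+1) candidate. A nonnegative violation means none is violated.
    """
    best = None
    for a in range(T.shape[0]):
        u = linear_upper_bound(a, R, T, Z, *prev, gamma)
        b, val = most_violated_belief(*cand, u)
        if best is None or val < best[2]:
            best = (a, b, val)
    return best
```

A returned belief with negative violation would be added to the constraint set $C$ and (12) re-solved; the projected gradient loop is only a stand-in for a proper quadratic programming solver.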
Finally, to improve approximation quality, one could augment the approximate value function representation with a maximum over a set of quadratics, much as with α-vectors. One natural way to do this would be to maintain a separate quadratic for each action $a$ in (11).

Experimental results

We implemented the proposed approach using SDPT3 (Toh, Todd, & Tutuncu 1999) as the semidefinite program solver for (12). Specifically, in our initial experiments, we have investigated the third (simplest) strategy mentioned above, CQUB, which only uses a random sample of belief states to specify the constraints in $C$. We compared this method to two current value function approximation strategies in the literature: Perseus (Spaan & Vlassis 2005) and PBVI (Pineau, Gordon, & Thrun 2003). Here, both Perseus and PBVI were run with the number of belief states fixed at 1000, whereas the convex quadratic method, CQUB, was run with 100 random belief states. In our initial experiments, we considered the benchmark problems Maze (Hauskrecht 1997), Tiger-grid, Hallway, Hallway2, and Aircraft (available from …). Table 1 gives the problem characteristics. In each case, the number of value iteration steps was fixed as shown in Table 1, and each method was run 10 times to generate an estimate of value function approximation quality. Table 2 shows the results obtained by the various value function approximation strategies on these domains, reporting the expected discounted reward obtained by the greedy policies defined with respect to the value function estimates, as well as the average time and the size of the value function approximation.¹

Table 1: Problem characteristics ($|S|$, $|A|$, $|O|$, and number of value iterations) for Maze, Tiger-grid, Hallway, Hallway2, and Aircraft.

Table 2: Mean discounted reward obtained over 1000 trajectories using the greedy policy for each value function approximation, averaged over 10 runs of value iteration. For each problem, the table reports average reward, run time (s), and size for CQUB, Perseus, and PBVI.

The convex quadratic strategy CQUB performed surprisingly well in these experiments, competing with state of the art value function approximations while only using 100 random belief states for constraint generation in (12). The result is slightly weaker in the Tiger-grid domain, but significantly stronger in the Hallway domains, supporting the thesis that convex quadratics capture value function structure more efficiently than linear approaches.

¹ For Perseus and PBVI, the size is $|S|$ times the number of α-vectors. For CQUB, the size is just $|S|(|S|+1)/2 + |S| + 1$, which corresponds to the number of variables in the quadratic approximator.

Conclusions

We have introduced a new approach to value function approximation for POMDPs that is based on a convex quadratic bound rather than a piecewise linear approximation. We have found that quadratic approximators can achieve highly competitive approximation quality without growing the size of the representation, even while explicitly focusing on only a tiny fraction of the belief states. We expect that this approach can lead to new avenues of research in value approximation for POMDPs. We are currently considering extensions to this approach based on belief state compression (Poupart & Boutilier 2002; 2004; Roy, Gordon, & Thrun 2005) and factored models (Boutilier & Poole 1996; Feng & Hansen 2001; Poupart 2005) to tackle POMDPs with large state spaces. We also plan to combine our quadratic value function approximation with policy based and sampling based approaches.
A further idea we are exploring is the interpretation of convex quadratics as second order Taylor approximations to the optimal value function, which offers further algorithmic approaches with the potential for tight theoretical guarantees on approximation quality.

Acknowledgments

Research supported by the Alberta Ingenuity Centre for Machine Learning, NSERC, MITACS, CFI, and the Canada Research Chairs program.

References

Amato, C.; Bernstein, D.; and Zilberstein, S. 2006. Solving POMDPs using quadratically constrained linear programs. In Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS).
Bertsekas, D. 1995. Dynamic Programming and Optimal Control, volume 2. Athena Scientific.
Boger, J.; Poupart, P.; Hoey, J.; Boutilier, C.; Fernie, G.; and Mihailidis, A. 2005. A decision-theoretic approach to task assistance for persons with dementia. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI).
Bonet, B. 2002. An ε-optimal grid-based algorithm for partially observable Markov decision processes. In Proceedings of the Nineteenth International Conference on Machine Learning (ICML).
Boutilier, C., and Poole, D. 1996. Computing optimal policies for partially observable decision processes using compact representations. In Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI).
Boyd, S., and Vandenberghe, L. 2004. Convex Optimization. Cambridge Univ. Press.
Cassandra, A.; Littman, M.; and Zhang, N. 1997. Incremental pruning: A simple, fast, exact method for partially observable Markov decision processes. In Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI).
Feng, Z., and Hansen, E. A. 2001. Approximate planning for factored POMDPs. In Proceedings of the Sixth European Conference on Planning.
Gelman, A.; Carlin, J.; Stern, H.; and Rubin, D. 1995. Bayesian Data Analysis. Chapman & Hall.
Gordon, G. 1995. Stable function approximation in dynamic programming. In Proceedings of the Twelfth International Conference on Machine Learning (ICML).
Hauskrecht, M. 1997. Incremental methods for computing bounds in partially observable Markov decision processes. In Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI).
Hauskrecht, M. 2000. Value-function approximations for partially observable Markov decision processes. Journal of Artificial Intelligence Research 13.
Kearns, M.; Mansour, Y.; and Ng, A. 2002. A sparse sampling algorithm for near-optimal planning in large Markov decision processes. Machine Learning 49(2-3).
Littman, M.; Cassandra, A.; and Kaelbling, L. 1995. Learning policies for partially observable environments: Scaling up. In Proceedings of the Twelfth International Conference on Machine Learning (ICML).
Madani, O.; Hanks, S.; and Condon, A. 2003. On the undecidability of probabilistic planning and related stochastic optimization problems. Artificial Intelligence 147:5–34.
Mundhenk, M.; Goldsmith, J.; Lusena, C.; and Allender, E. 2000. Complexity of finite-horizon Markov decision processes. Journal of the Association for Computing Machinery 47(4).
Ng, A., and Jordan, M. 2000. Pegasus: A policy search method for large MDPs and POMDPs. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI).
Parr, R., and Russell, S. 1995. Approximating optimal policies for partially observable stochastic domains. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI).
Pineau, J.; Montemerlo, M.; Pollack, M.; Roy, N.; and Thrun, S. 2003. Towards robotic assistants in nursing homes: Challenges and results. Robotics and Autonomous Systems 42.
Pineau, J.; Gordon, G.; and Thrun, S. 2003. Point-based value iteration: An anytime algorithm for POMDPs. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI).
Poupart, P., and Boutilier, C. 2002. Value-directed compression of POMDPs. In Advances in Neural Information Processing Systems (NIPS 15).
Poupart, P., and Boutilier, C. 2003. Bounded finite state controllers. In Advances in Neural Information Processing Systems (NIPS 16).
Poupart, P., and Boutilier, C. 2004. VDCBPI: An approximate scalable algorithm for large POMDPs. In Advances in Neural Information Processing Systems (NIPS 17).
Poupart, P. 2005. Exploiting Structure to Efficiently Solve Large Scale Partially Observable Markov Decision Processes. Ph.D. Dissertation, Department of Computer Science, University of Toronto.
Roy, N.; Gordon, G.; and Thrun, S. 2005. Finding approximate POMDP solutions through belief compression. Journal of Artificial Intelligence Research 23:1–40.
Smith, T., and Simmons, R. 2005. Point-based POMDP algorithms: Improved analysis and implementation. In Proceedings of the Twenty-first Conference on Uncertainty in Artificial Intelligence (UAI).
Sondik, E. 1978. The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs. Operations Research 26.
Spaan, M., and Vlassis, N. 2005. Perseus: Randomized point-based value iteration for POMDPs. Journal of Artificial Intelligence Research 24.
Thrun, S.; Burgard, W.; and Fox, D. 2005. Probabilistic Robotics. MIT Press.
Thrun, S. 2000. Monte Carlo POMDPs. In Advances in Neural Information Processing Systems (NIPS 12).
Toh, K.; Todd, M.; and Tutuncu, R. 1999. SDPT3: A Matlab software package for semidefinite programming. Optimization Methods and Software 11.
Zhang, N., and Zhang, W. 2001. Speeding up the convergence of value iteration in partially observable Markov decision processes. Journal of Artificial Intelligence Research 14.
Zhou, R., and Hansen, E. 2001. An improved grid-based approximation algorithm for POMDPs. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI).
CS 70 Discrete Mthemtics nd Proility Theory Summer 2014 Jmes Cook Note 17 I.I.D. Rndom Vriles Estimting the is of coin Question: We wnt to estimte the proportion p of Democrts in the US popultion, y tking
More informationQUADRATURE is an old-fashioned word that refers to
World Acdemy of Science Engineering nd Technology Interntionl Journl of Mthemticl nd Computtionl Sciences Vol:5 No:7 011 A New Qudrture Rule Derived from Spline Interpoltion with Error Anlysis Hdi Tghvfrd
More informationAn Optimal Best-first Search Algorithm for Solving Infinite Horizon DEC-POMDPs
An Optiml Best-first Serch Algorithm for Solving Infinite Horizon DEC-POMDPs Dniel Szer, Frnçois Chrpillet To cite this version: Dniel Szer, Frnçois Chrpillet. An Optiml Best-first Serch Algorithm for
More informationGenetic Programming. Outline. Evolutionary Strategies. Evolutionary strategies Genetic programming Summary
Outline Genetic Progrmming Evolutionry strtegies Genetic progrmming Summry Bsed on the mteril provided y Professor Michel Negnevitsky Evolutionry Strtegies An pproch simulting nturl evolution ws proposed
More informationFarey Fractions. Rickard Fernström. U.U.D.M. Project Report 2017:24. Department of Mathematics Uppsala University
U.U.D.M. Project Report 07:4 Frey Frctions Rickrd Fernström Exmensrete i mtemtik, 5 hp Hledre: Andres Strömergsson Exmintor: Jörgen Östensson Juni 07 Deprtment of Mthemtics Uppsl University Frey Frctions
More informationAn Analytic Solution to Discrete Bayesian Reinforcement Learning
An Anlytic Solution to Discrete Byesin Reinforcement Lerning Pscl Pouprt ppouprt@cs.uwterloo.c Cheriton School of Computer Science, University of Wterloo, Wterloo, Ontrio, Cnd Nikos Vlssis vlssis@science.uv.nl
More informationTorsion in Groups of Integral Triangles
Advnces in Pure Mthemtics, 01,, 116-10 http://dxdoiorg/1046/pm011015 Pulished Online Jnury 01 (http://wwwscirporg/journl/pm) Torsion in Groups of Integrl Tringles Will Murry Deprtment of Mthemtics nd Sttistics,
More informationThe Minimum Label Spanning Tree Problem: Illustrating the Utility of Genetic Algorithms
The Minimum Lel Spnning Tree Prolem: Illustrting the Utility of Genetic Algorithms Yupei Xiong, Univ. of Mrylnd Bruce Golden, Univ. of Mrylnd Edwrd Wsil, Americn Univ. Presented t BAE Systems Distinguished
More information1 Online Learning and Regret Minimization
2.997 Decision-Mking in Lrge-Scle Systems My 10 MIT, Spring 2004 Hndout #29 Lecture Note 24 1 Online Lerning nd Regret Minimiztion In this lecture, we consider the problem of sequentil decision mking in
More informationChapter 6 Techniques of Integration
MA Techniques of Integrtion Asst.Prof.Dr.Suprnee Liswdi Chpter 6 Techniques of Integrtion Recll: Some importnt integrls tht we hve lernt so fr. Tle of Integrls n+ n d = + C n + e d = e + C ( n ) d = ln
More informationNew Expansion and Infinite Series
Interntionl Mthemticl Forum, Vol. 9, 204, no. 22, 06-073 HIKARI Ltd, www.m-hikri.com http://dx.doi.org/0.2988/imf.204.4502 New Expnsion nd Infinite Series Diyun Zhng College of Computer Nnjing University
More informationDual Formulations for Optimizing Dec-POMDP Controllers
Dul Formultions for Optimizing Dec-POMDP Controllers Aksht Kumr nd Hl Mostf nd Shlomo Zilerstein School of Informtion Systems, Singpore Mngement University, United Technologies Reserch Center College of
More informationCMDA 4604: Intermediate Topics in Mathematical Modeling Lecture 19: Interpolation and Quadrature
CMDA 4604: Intermedite Topics in Mthemticl Modeling Lecture 19: Interpoltion nd Qudrture In this lecture we mke brief diversion into the res of interpoltion nd qudrture. Given function f C[, b], we sy
More informationModel Reduction of Finite State Machines by Contraction
Model Reduction of Finite Stte Mchines y Contrction Alessndro Giu Dip. di Ingegneri Elettric ed Elettronic, Università di Cgliri, Pizz d Armi, 09123 Cgliri, Itly Phone: +39-070-675-5892 Fx: +39-070-675-5900
More informationI1 = I2 I1 = I2 + I3 I1 + I2 = I3 + I4 I 3
2 The Prllel Circuit Electric Circuits: Figure 2- elow show ttery nd multiple resistors rrnged in prllel. Ech resistor receives portion of the current from the ttery sed on its resistnce. The split is
More informationHistory-Based Controller Design and Optimization for Partially Observable MDPs
History-Bsed Controller Design nd Optimiztion for Prtilly Observble MDPs Aksht Kumr School of Informtion Systems Singpore Mngement University kshtkumr@smu.edu.sg Shlomo Zilberstein School of Computer Science
More informationBases for Vector Spaces
Bses for Vector Spces 2-26-25 A set is independent if, roughly speking, there is no redundncy in the set: You cn t uild ny vector in the set s liner comintion of the others A set spns if you cn uild everything
More informationA Fast and Reliable Policy Improvement Algorithm
A Fst nd Relible Policy Improvement Algorithm Ysin Abbsi-Ydkori Peter L. Brtlett Stephen J. Wright Queenslnd University of Technology UC Berkeley nd QUT University of Wisconsin-Mdison Abstrct We introduce
More informationAdministrivia CSE 190: Reinforcement Learning: An Introduction
Administrivi CSE 190: Reinforcement Lerning: An Introduction Any emil sent to me bout the course should hve CSE 190 in the subject line! Chpter 4: Dynmic Progrmming Acknowledgment: A good number of these
More informationMath 270A: Numerical Linear Algebra
Mth 70A: Numericl Liner Algebr Instructor: Michel Holst Fll Qurter 014 Homework Assignment #3 Due Give to TA t lest few dys before finl if you wnt feedbck. Exercise 3.1. (The Bsic Liner Method for Liner
More informationLecture 3. In this lecture, we will discuss algorithms for solving systems of linear equations.
Lecture 3 3 Solving liner equtions In this lecture we will discuss lgorithms for solving systems of liner equtions Multiplictive identity Let us restrict ourselves to considering squre mtrices since one
More informationMarkov Decision Processes
Mrkov Deciion Procee A Brief Introduction nd Overview Jck L. King Ph.D. Geno UK Limited Preenttion Outline Introduction to MDP Motivtion for Study Definition Key Point of Interet Solution Technique Prtilly
More informationBellman Optimality Equation for V*
Bellmn Optimlity Eqution for V* The vlue of stte under n optiml policy must equl the expected return for the best ction from tht stte: V (s) mx Q (s,) A(s) mx A(s) mx A(s) Er t 1 V (s t 1 ) s t s, t s
More informationMOMDPs: a Solution for Modelling Adaptive Management Problems
MOMDPs: Solution for Modelling Adptive Mngement Problems Idine Chdès nd Josie Crwrdine nd Tr G. Mrtin CSIRO Ecosystem Sciences {idine.chdes, josie.crwrdine, tr.mrtin}@csiro.u Smuel Nicol University of
More informationQuadratic Forms. Quadratic Forms
Qudrtic Forms Recll the Simon & Blume excerpt from n erlier lecture which sid tht the min tsk of clculus is to pproximte nonliner functions with liner functions. It s ctully more ccurte to sy tht we pproximte
More informationSection 4: Integration ECO4112F 2011
Reding: Ching Chpter Section : Integrtion ECOF Note: These notes do not fully cover the mteril in Ching, ut re ment to supplement your reding in Ching. Thus fr the optimistion you hve covered hs een sttic
More informationParse trees, ambiguity, and Chomsky normal form
Prse trees, miguity, nd Chomsky norml form In this lecture we will discuss few importnt notions connected with contextfree grmmrs, including prse trees, miguity, nd specil form for context-free grmmrs
More informationContinuous Random Variables Class 5, Jeremy Orloff and Jonathan Bloom
Lerning Gols Continuous Rndom Vriles Clss 5, 8.05 Jeremy Orloff nd Jonthn Bloom. Know the definition of continuous rndom vrile. 2. Know the definition of the proility density function (pdf) nd cumultive
More informationORDER REDUCTION USING POLE CLUSTERING AND FACTOR DIVISION METHOD
Author Nme et. l. / Interntionl Journl of New Technologies in Science nd Engineering Vol., Issue., 7, ISSN 9-78 ORDER REDUCTION USING POLE CLUSTERING AND FACTOR DIVISION METHOD A Chinn Nidu* G Dileep**
More informationCS 188: Artificial Intelligence Fall Announcements
CS 188: Artificil Intelligence Fll 2009 Lecture 20: Prticle Filtering 11/5/2009 Dn Klein UC Berkeley Announcements Written 3 out: due 10/12 Project 4 out: due 10/19 Written 4 proly xed, Project 5 moving
More informationLecture 2: January 27
CS 684: Algorithmic Gme Theory Spring 217 Lecturer: Év Trdos Lecture 2: Jnury 27 Scrie: Alert Julius Liu 2.1 Logistics Scrie notes must e sumitted within 24 hours of the corresponding lecture for full
More informationNumerical Analysis: Trapezoidal and Simpson s Rule
nd Simpson s Mthemticl question we re interested in numericlly nswering How to we evlute I = f (x) dx? Clculus tells us tht if F(x) is the ntiderivtive of function f (x) on the intervl [, b], then I =
More informationFlexible Beam. Objectives
Flexile Bem Ojectives The ojective of this l is to lern out the chllenges posed y resonnces in feedck systems. An intuitive understnding will e gined through the mnul control of flexile em resemling lrge
More informationCS 188 Introduction to Artificial Intelligence Fall 2018 Note 7
CS 188 Introduction to Artificil Intelligence Fll 2018 Note 7 These lecture notes re hevily bsed on notes originlly written by Nikhil Shrm. Decision Networks In the third note, we lerned bout gme trees
More informationChapter 5 Plan-Space Planning
Lecture slides for Automted Plnning: Theory nd Prctice Chpter 5 Pln-Spce Plnning Dn S. Nu CMSC 722, AI Plnning University of Mrylnd, Spring 2008 1 Stte-Spce Plnning Motivtion g 1 1 g 4 4 s 0 g 5 5 g 2
More informationCompiler Design. Fall Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz
University of Southern Cliforni Computer Science Deprtment Compiler Design Fll Lexicl Anlysis Smple Exercises nd Solutions Prof. Pedro C. Diniz USC / Informtion Sciences Institute 4676 Admirlty Wy, Suite
More informationConvert the NFA into DFA
Convert the NF into F For ech NF we cn find F ccepting the sme lnguge. The numer of sttes of the F could e exponentil in the numer of sttes of the NF, ut in prctice this worst cse occurs rrely. lgorithm:
More informationReview of Calculus, cont d
Jim Lmbers MAT 460 Fll Semester 2009-10 Lecture 3 Notes These notes correspond to Section 1.1 in the text. Review of Clculus, cont d Riemnn Sums nd the Definite Integrl There re mny cses in which some
More informationLecture Solution of a System of Linear Equation
ChE Lecture Notes, Dept. of Chemicl Engineering, Univ. of TN, Knoville - D. Keffer, 5/9/98 (updted /) Lecture 8- - Solution of System of Liner Eqution 8. Why is it importnt to e le to solve system of liner
More informationDesigning finite automata II
Designing finite utomt II Prolem: Design DFA A such tht L(A) consists of ll strings of nd which re of length 3n, for n = 0, 1, 2, (1) Determine wht to rememer out the input string Assign stte to ech of
More informationSOME INTEGRAL INEQUALITIES OF GRÜSS TYPE
RGMIA Reserch Report Collection, Vol., No., 998 http://sci.vut.edu.u/ rgmi SOME INTEGRAL INEQUALITIES OF GRÜSS TYPE S.S. DRAGOMIR Astrct. Some clssicl nd new integrl inequlities of Grüss type re presented.
More informationNear-Bayesian Exploration in Polynomial Time
J. Zico Kolter kolter@cs.stnford.edu Andrew Y. Ng ng@cs.stnford.edu Computer Science Deprtment, Stnford University, CA 94305 Abstrct We consider the explortion/exploittion problem in reinforcement lerning
More information1B40 Practical Skills
B40 Prcticl Skills Comining uncertinties from severl quntities error propgtion We usully encounter situtions where the result of n experiment is given in terms of two (or more) quntities. We then need
More informationA Generalization of Two-Player Stackelberg Games to Three Players
A Generliztion of Two-Plyer Stckelerg Gmes to Three Plyers Grrett Andersen 1 Introduction Two-plyer Stckelerg gmes nd their pplictions to security re currently very hot topic in the field of Algorithmic
More informationCalculus Module C21. Areas by Integration. Copyright This publication The Northern Alberta Institute of Technology All Rights Reserved.
Clculus Module C Ares Integrtion Copright This puliction The Northern Alert Institute of Technolog 7. All Rights Reserved. LAST REVISED Mrch, 9 Introduction to Ares Integrtion Sttement of Prerequisite
More informationCHAPTER 1 PROGRAM OF MATRICES
CHPTER PROGRM OF MTRICES -- INTRODUCTION definition of engineering is the science y which the properties of mtter nd sources of energy in nture re mde useful to mn. Thus n engineer will hve to study the
More informationSurface maps into free groups
Surfce mps into free groups lden Wlker Novemer 10, 2014 Free groups wedge X of two circles: Set F = π 1 (X ) =,. We write cpitl letters for inverse, so = 1. e.g. () 1 = Commuttors Let x nd y e loops. The
More informationCoalgebra, Lecture 15: Equations for Deterministic Automata
Colger, Lecture 15: Equtions for Deterministic Automt Julin Slmnc (nd Jurrin Rot) Decemer 19, 2016 In this lecture, we will study the concept of equtions for deterministic utomt. The notes re self contined
More information12.1 Nondeterminism Nondeterministic Finite Automata. a a b ε. CS125 Lecture 12 Fall 2016
CS125 Lecture 12 Fll 2016 12.1 Nondeterminism The ide of nondeterministic computtions is to llow our lgorithms to mke guesses, nd only require tht they ccept when the guesses re correct. For exmple, simple
More informationCS 188: Artificial Intelligence Spring 2007
CS 188: Artificil Intelligence Spring 2007 Lecture 3: Queue-Bsed Serch 1/23/2007 Srini Nrynn UC Berkeley Mny slides over the course dpted from Dn Klein, Sturt Russell or Andrew Moore Announcements Assignment
More information{ } = E! & $ " k r t +k +1
Chpter 4: Dynmic Progrmming Objectives of this chpter: Overview of collection of clssicl solution methods for MDPs known s dynmic progrmming (DP) Show how DP cn be used to compute vlue functions, nd hence,
More informationLinear Systems with Constant Coefficients
Liner Systems with Constnt Coefficients 4-3-05 Here is system of n differentil equtions in n unknowns: x x + + n x n, x x + + n x n, x n n x + + nn x n This is constnt coefficient liner homogeneous system
More informationAssignment 1 Automata, Languages, and Computability. 1 Finite State Automata and Regular Languages
Deprtment of Computer Science, Austrlin Ntionl University COMP2600 Forml Methods for Softwre Engineering Semester 2, 206 Assignment Automt, Lnguges, nd Computility Smple Solutions Finite Stte Automt nd
More informationChapter 4: Dynamic Programming
Chpter 4: Dynmic Progrmming Objectives of this chpter: Overview of collection of clssicl solution methods for MDPs known s dynmic progrmming (DP) Show how DP cn be used to compute vlue functions, nd hence,
More informationNondeterminism and Nodeterministic Automata
Nondeterminism nd Nodeterministic Automt 61 Nondeterminism nd Nondeterministic Automt The computtionl mchine models tht we lerned in the clss re deterministic in the sense tht the next move is uniquely
More informationGeneration of Lyapunov Functions by Neural Networks
WCE 28, July 2-4, 28, London, U.K. Genertion of Lypunov Functions by Neurl Networks Nvid Noroozi, Pknoosh Krimghee, Ftemeh Sfei, nd Hmed Jvdi Abstrct Lypunov function is generlly obtined bsed on tril nd
More information7.8 Improper Integrals
7.8 7.8 Improper Integrls The Completeness Axiom of the Rel Numers Roughly speking, the rel numers re clled complete ecuse they hve no holes. The completeness of the rel numers hs numer of importnt consequences.
More informationP 3 (x) = f(0) + f (0)x + f (0) 2. x 2 + f (0) . In the problem set, you are asked to show, in general, the n th order term is a n = f (n) (0)
1 Tylor polynomils In Section 3.5, we discussed how to pproximte function f(x) round point in terms of its first derivtive f (x) evluted t, tht is using the liner pproximtion f() + f ()(x ). We clled this
More informationINEQUALITIES FOR TWO SPECIFIC CLASSES OF FUNCTIONS USING CHEBYSHEV FUNCTIONAL. Mohammad Masjed-Jamei
Fculty of Sciences nd Mthemtics University of Niš Seri Aville t: http://www.pmf.ni.c.rs/filomt Filomt 25:4 20) 53 63 DOI: 0.2298/FIL0453M INEQUALITIES FOR TWO SPECIFIC CLASSES OF FUNCTIONS USING CHEBYSHEV
More informationThe Shortest Confidence Interval for the Mean of a Normal Distribution
Interntionl Journl of Sttistics nd Proility; Vol. 7, No. 2; Mrch 208 ISSN 927-7032 E-ISSN 927-7040 Pulished y Cndin Center of Science nd Eduction The Shortest Confidence Intervl for the Men of Norml Distriution
More informationDerivations for maximum likelihood estimation of particle size distribution using in situ video imaging
2 TWMCC Texs-Wisconsin Modeling nd Control Consortium 1 Technicl report numer 27-1 Derivtions for mximum likelihood estimtion of prticle size distriution using in situ video imging Pul A. Lrsen nd Jmes
More informationAn LP-Based Heuristic for Optimal Planning
An LP-Bsed Heuristic for Optiml Plnning Menkes vn den Briel 1, J. Benton 2, Suro Kmhmpti 2, nd Thoms Vossen 3 Arizon Stte University, Deprtment of Industril Engineering 1, Deprtment of Computer Science
More informationBest Approximation. Chapter The General Case
Chpter 4 Best Approximtion 4.1 The Generl Cse In the previous chpter, we hve seen how n interpolting polynomil cn be used s n pproximtion to given function. We now wnt to find the best pproximtion to given
More informationTHE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS.
THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS RADON ROSBOROUGH https://intuitiveexplntionscom/picrd-lindelof-theorem/ This document is proof of the existence-uniqueness theorem
More informationReward Shaping for Model-Based Bayesian Reinforcement Learning
Rewrd Shping for Model-Bsed Byesin Reinforcement Lerning Hyeoneun Kim, Woosng Lim, Knghoon Lee, Yung-Kyun Noh nd Kee-Eung Kim Deprtment of Computer Science Kore Advnced Institute of Science nd Technology
More informationState space systems analysis (continued) Stability. A. Definitions A system is said to be Asymptotically Stable (AS) when it satisfies
Stte spce systems nlysis (continued) Stbility A. Definitions A system is sid to be Asymptoticlly Stble (AS) when it stisfies ut () = 0, t > 0 lim xt () 0. t A system is AS if nd only if the impulse response
More informationHamiltonian Cycle in Complete Multipartite Graphs
Annls of Pure nd Applied Mthemtics Vol 13, No 2, 2017, 223-228 ISSN: 2279-087X (P), 2279-0888(online) Pulished on 18 April 2017 wwwreserchmthsciorg DOI: http://dxdoiorg/1022457/pmv13n28 Annls of Hmiltonin
More informationA Variance Analysis for POMDP Policy Evaluation
Proceedings of the Twenty-Third AAAI Conference on Artificil Intelligence (2008) A Vrince Anlysis for POMDP Policy Evlution Mhdi Milni Frd nd Joelle Pineu School of Computer Science McGill University,
More informationWe will see what is meant by standard form very shortly
THEOREM: For fesible liner progrm in its stndrd form, the optimum vlue of the objective over its nonempty fesible region is () either unbounded or (b) is chievble t lest t one extreme point of the fesible
More informationThe Regulated and Riemann Integrals
Chpter 1 The Regulted nd Riemnn Integrls 1.1 Introduction We will consider severl different pproches to defining the definite integrl f(x) dx of function f(x). These definitions will ll ssign the sme vlue
More informationWENJUN LIU AND QUÔ C ANH NGÔ
AN OSTROWSKI-GRÜSS TYPE INEQUALITY ON TIME SCALES WENJUN LIU AND QUÔ C ANH NGÔ Astrct. In this pper we derive new inequlity of Ostrowski-Grüss type on time scles nd thus unify corresponding continuous
More informationCS415 Compilers. Lexical Analysis and. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University
CS415 Compilers Lexicl Anlysis nd These slides re sed on slides copyrighted y Keith Cooper, Ken Kennedy & Lind Torczon t Rice University First Progrmming Project Instruction Scheduling Project hs een posted
More informationMATH34032: Green s Functions, Integral Equations and the Calculus of Variations 1
MATH34032: Green s Functions, Integrl Equtions nd the Clculus of Vritions 1 Section 1 Function spces nd opertors Here we gives some brief detils nd definitions, prticulrly relting to opertors. For further
More informationAN INEQUALITY OF OSTROWSKI TYPE AND ITS APPLICATIONS FOR SIMPSON S RULE AND SPECIAL MEANS. I. Fedotov and S. S. Dragomir
RGMIA Reserch Report Collection, Vol., No., 999 http://sci.vu.edu.u/ rgmi AN INEQUALITY OF OSTROWSKI TYPE AND ITS APPLICATIONS FOR SIMPSON S RULE AND SPECIAL MEANS I. Fedotov nd S. S. Drgomir Astrct. An
More informationA Symbolic Approach to Control via Approximate Bisimulations
A Symolic Approch to Control vi Approximte Bisimultions Antoine Girrd Lortoire Jen Kuntzmnn, Université Joseph Fourier Grenole, Frnce Interntionl Symposium on Innovtive Mthemticl Modelling Tokyo, Jpn,
More information5.7 Improper Integrals
458 pplictions of definite integrls 5.7 Improper Integrls In Section 5.4, we computed the work required to lift pylod of mss m from the surfce of moon of mss nd rdius R to height H bove the surfce of the
More informationCS103B Handout 18 Winter 2007 February 28, 2007 Finite Automata
CS103B ndout 18 Winter 2007 Ferury 28, 2007 Finite Automt Initil text y Mggie Johnson. Introduction Severl childrens gmes fit the following description: Pieces re set up on plying ord; dice re thrown or
More informationSection 6.1 Definite Integral
Section 6.1 Definite Integrl Suppose we wnt to find the re of region tht is not so nicely shped. For exmple, consider the function shown elow. The re elow the curve nd ove the x xis cnnot e determined
More information5: The Definite Integral
5: The Definite Integrl 5.: Estimting with Finite Sums Consider moving oject its velocity (meters per second) t ny time (seconds) is given y v t = t+. Cn we use this informtion to determine the distnce
More informationSection 6: Area, Volume, and Average Value
Chpter The Integrl Applied Clculus Section 6: Are, Volume, nd Averge Vlue Are We hve lredy used integrls to find the re etween the grph of function nd the horizontl xis. Integrls cn lso e used to find
More informationNUMERICAL INTEGRATION
NUMERICAL INTEGRATION How do we evlute I = f (x) dx By the fundmentl theorem of clculus, if F (x) is n ntiderivtive of f (x), then I = f (x) dx = F (x) b = F (b) F () However, in prctice most integrls
More information12.1 Nondeterminism Nondeterministic Finite Automata. a a b ε. CS125 Lecture 12 Fall 2014
CS125 Lecture 12 Fll 2014 12.1 Nondeterminism The ide of nondeterministic computtions is to llow our lgorithms to mke guesses, nd only require tht they ccept when the guesses re correct. For exmple, simple
More informationSuppose we want to find the area under the parabola and above the x axis, between the lines x = 2 and x = -2.
Mth 43 Section 6. Section 6.: Definite Integrl Suppose we wnt to find the re of region tht is not so nicely shped. For exmple, consider the function shown elow. The re elow the curve nd ove the x xis cnnot
More informationexpression simply by forming an OR of the ANDs of all input variables for which the output is
2.4 Logic Minimiztion nd Krnugh Mps As we found ove, given truth tle, it is lwys possile to write down correct logic expression simply y forming n OR of the ANDs of ll input vriles for which the output
More information2.4 Linear Inequalities and Interval Notation
.4 Liner Inequlities nd Intervl Nottion We wnt to solve equtions tht hve n inequlity symol insted of n equl sign. There re four inequlity symols tht we will look t: Less thn , Less thn or
More informationSufficient condition on noise correlations for scalable quantum computing
Sufficient condition on noise correltions for sclble quntum computing John Presill, 2 Februry 202 Is quntum computing sclble? The ccurcy threshold theorem for quntum computtion estblishes tht sclbility
More informationRandom subgroups of a free group
Rndom sugroups of free group Frédérique Bssino LIPN - Lortoire d Informtique de Pris Nord, Université Pris 13 - CNRS Joint work with Armndo Mrtino, Cyril Nicud, Enric Ventur et Pscl Weil LIX My, 2015 Introduction
More information10 Vector Integral Calculus
Vector Integrl lculus Vector integrl clculus extends integrls s known from clculus to integrls over curves ("line integrls"), surfces ("surfce integrls") nd solids ("volume integrls"). These integrls hve
More informationHow do we solve these things, especially when they get complicated? How do we know when a system has a solution, and when is it unique?
XII. LINEAR ALGEBRA: SOLVING SYSTEMS OF EQUATIONS Tody we re going to tlk out solving systems of liner equtions. These re prolems tht give couple of equtions with couple of unknowns, like: 6= x + x 7=
More information