Compact, Convex Upper Bound Iteration for Approximate POMDP Planning


Tao Wang, University of Alberta
Pascal Poupart, University of Waterloo
Michael Bowling and Dale Schuurmans, University of Alberta

Abstract

Partially observable Markov decision processes (POMDPs) are an intuitive and general way to model sequential decision making problems under uncertainty. Unfortunately, even approximate planning in POMDPs is known to be hard, and developing heuristic planners that can deliver reasonable results in practice has proved to be a significant challenge. In this paper, we present a new approach to approximate value-iteration for POMDP planning that is based on quadratic rather than piecewise linear function approximators. Specifically, we approximate the optimal value function by a convex upper bound composed of a fixed number of quadratics, and optimize it at each stage by semidefinite programming. We demonstrate that our approach can achieve competitive approximation quality to current techniques while still maintaining a bounded size representation of the function approximator. Moreover, an upper bound on the optimal value function can be preserved if required. Overall, the technique requires computation time and space that is only linear in the number of iterations (horizon time).

Introduction

Partially observable Markov decision processes (POMDPs) are a general model of an agent acting in an environment, where the effects of the agent's actions and the observations it can make about the current state of the environment are both subject to uncertainty. The agent's goals are specified by rewards it receives (as a function of the states it visits and actions it executes), and an optimal behavior strategy in this context chooses actions, based on the history of observations, that maximize the long term reward of the agent. POMDPs have become an important modeling formalism in robotics and autonomous agent design (Thrun, Burgard, & Fox 2005; Pineau et al. 2003). Much of the current work on robot navigation and mapping, for example, is now based on stochastic transition and observation models (Thrun, Burgard, & Fox 2005; Roy, Gordon, & Thrun 2005).
Moreover, POMDP representations have also been used to design autonomous agents for real world applications, including nursing (Pineau et al. 2003) and elderly assistance (Boger et al. 2005).

Copyright © 2006, American Association for Artificial Intelligence. All rights reserved.

Despite their convenience as a modeling framework however, POMDPs pose difficult computational problems. It is well known that solving for optimal behavior strategies or even just approximating optimal strategies in a POMDP is intractable (Madani, Hanks, & Condon 2003; Mundhenk et al. 2000). Therefore, a lot of work has focused on developing heuristics for computing reasonable behavior strategies for POMDPs. These approaches have generally followed three broad strategies: value function approximation (Hauskrecht 2000; Spaan & Vlassis 2005; Pineau, Gordon, & Thrun 2003; Parr & Russell 1995), policy based optimization (Ng & Jordan 2000; Poupart & Boutilier 2003; 2004; Amato, Bernstein, & Zilberstein 2006), and stochastic sampling (Kearns, Mansour, & Ng 2002; Thrun 2000). In this paper, we focus on the value function approximation approach and contribute a new perspective to this strategy.

Most previous work on value function approximation for POMDPs has focused on representations that explicitly maintain a set of α-vectors or belief states. This is motivated by the fact that the optimal value function, considered as a function of the belief state, is determined by the maximum of a set of linear functions specified by α-vectors where each α-vector is associated with a specific behavior policy. Since the optimal value function is given by the maximum of a (large) set of α-vectors, it is natural to consider approximating it by a subset of α-vectors, or at least a small set of linear functions. In fact, even an exact representation of the optimal value function need not keep every α-vector, but only those that are maximal for at least some witness belief state.
Motivated by this characterization, most value function approximation strategies attempt to maintain a smaller subset of α-vectors by focusing on a reduced set of belief states (Spaan & Vlassis 2005; Hauskrecht 2000; Pineau, Gordon, & Thrun 2003). Although much recent progress has been made on α-vector based approximations, a drawback of this approach is that the number of α-vectors stored generally has to grow with the number of value iterations to maintain an adequate approximation (Pineau, Gordon, & Thrun 2003).

In this paper, we consider an alternative approach that drops the notion of an α-vector entirely from the approximation strategy. Instead we exploit the other fundamental observation about the nature of the optimal value function: since it is determined by a belief-state-wise maximum over linear functions, the optimal value function must be a convex function of the belief state (Sondik 1978; Boyd & Vandenberghe 2004). Our strategy, then, is to compute a convex approximation to the optimal value function that is based on quadratic rather than linear functions of the belief state. The advantage of using a quadratic basis for value function approximation is several-fold: First, the size of the representation does not have to grow merely to model an increasing number of facets in the optimal solution; thus we can keep a bounded size representation at each horizon. Second, a quadratic representation allows one to conveniently maintain a provable upper bound on the optimal values in an explicit compact representation without requiring auxiliary linear programming to be used to retrieve the bound, as in current grid based approaches (Hauskrecht 2000; Smith & Simmons 2005). Third, the computational cost of updating the approximation does not change with iteration number (either in time or space), so the overall computation time is only linear in the horizon. Finally, as we demonstrate below, despite a significant reduction in representation size, convex quadratics are still able to achieve competitive approximation quality on benchmark POMDP problems.

Background

We begin with Markov decision processes (MDPs) since we will need to exploit some basic concepts from MDPs in our approach below. An MDP is defined by a set of states $S$, a set of actions $A$, a state transition model $p(s'|s,a)$, and a reward model $r(s,a)$. In this setting, a deterministic policy is specified by a function from states to actions, $\pi : S \to A$, and the value function for a policy is defined as the expected future discounted reward the policy obtains from each state

$$V^\pi(s) = E_\pi\left[\sum_{t=0}^{\infty} \gamma^t r(s_t, \pi(s_t)) \,\middle|\, s_0 = s\right]$$

Here the discount factor, $0 \le \gamma < 1$, expresses a tradeoff between short term and long term reward. It is known that there exists a deterministic optimal policy whose value function dominates all other policy values in every state (Bertsekas 1995).
This optimal value function also satisfies the Bellman equation

$$V^*(s) = \max_a\; r(s,a) + \gamma \sum_{s'} p(s'|s,a)\, V^*(s') \tag{1}$$

Computing the optimal value function for a given MDP can be accomplished in several ways. The two ways we consider below are value iteration and linear programming. Value iteration is based on repeatedly applying the Bellman backup operator, $V_{n+1} = H V_n$, specified by

$$V_{n+1}(s) = \max_a\; r(s,a) + \gamma \sum_{s'} p(s'|s,a)\, V_n(s') \tag{2}$$

It can be shown that $V_n \to V^*$ in the $L_\infty$ norm, and thus $V^*$ is a fixed point of (2) (Bertsekas 1995). $V^*$ is also the solution to the linear program

$$\min_V \sum_s V(s) \quad \text{s.t.} \quad V(s) \ge r(s,a) + \gamma \sum_{s'} p(s'|s,a)\, V(s') \tag{3}$$

for all $s \in S$ and $a \in A$. It turns out that for continuous state spaces, the Bellman equation (1) still characterizes the optimal value function, replacing the transition probabilities with conditional densities and the sums with Lebesgue integrals. However, computationally, the situation is not so simple for continuous state spaces, since the integrals must now somehow be solved in place of the sums, and (3) is no longer finitely defined. Nevertheless, continuous state spaces are unavoidable when one considers POMDP planning.

POMDPs extend MDPs by introducing an observation model $p(o'|a,s')$ that governs how a noisy observation $o' \in O$ is related to the underlying state $s'$ and the action $a$. Having access to only noisy observations of the state complicates the problem of choosing optimal actions significantly. The agent now never knows the exact state of the environment, but instead must infer a distribution over possible states, $b(s)$, from the history of observations and actions. Nevertheless, given an action $a$ and observation $o'$ the agent's belief state can be easily updated by Bayes rule

$$b'_{(b,a,o')}(s') = p(o'|a,s') \sum_s p(s'|s,a)\, b(s) \,/\, Z \tag{4}$$

where $Z = p(o'|b,a) = \sum_{s'} p(o'|a,s') \sum_s p(s'|s,a)\, b(s)$. By the Markov assumption, the belief state is a sufficient representation upon which an optimal behavior strategy can be defined (Sondik 1978). Therefore, a policy is naturally specified in this setting by a function from belief states to actions, $\pi : B \to A$, where $B$ is the set of all possible distributions over the underlying state space $S$ (an $|S| - 1$ dimensional simplex).
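As a concrete illustration of the Bayes rule update (4), the following is a minimal sketch in plain Python. The function name `belief_update`, the nested-list layouts of `T` and `Z`, and the numbers in the toy model are assumptions made for illustration only; they do not come from the paper.

```python
def belief_update(b, a, o, T, Z):
    """Bayes-rule belief update, following equation (4).

    b: list of probabilities over states.
    T[a][s][s2]: transition probability p(s2 | s, a)   (assumed layout).
    Z[a][s2][o]: observation probability p(o | a, s2)  (assumed layout).
    Returns (new_belief, p_o), where p_o = p(o | b, a) is the normalizer Z in (4).
    """
    n = len(b)
    # Unnormalized posterior: p(o | a, s2) * sum_s p(s2 | s, a) b(s).
    unnorm = [Z[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(n))
              for s2 in range(n)]
    p_o = sum(unnorm)
    if p_o == 0.0:
        raise ValueError("observation has zero probability under this belief and action")
    return [x / p_o for x in unnorm], p_o


# Toy two-state model with one action and two observations (made-up numbers).
T = [[[0.9, 0.1],
      [0.2, 0.8]]]
Z = [[[0.7, 0.3],
      [0.1, 0.9]]]
b_next, p_o = belief_update([0.5, 0.5], 0, 1, T, Z)
```

The normalizer is returned alongside the new belief because the Bellman backup over beliefs weights each successor belief by exactly this probability.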
Obviously for any environment with two or more states there are an infinite number of belief states, and not every policy can be finitely represented. Nevertheless, one can still define the value function of a policy as the expected future discounted reward obtained from each belief state

$$V^\pi(b) = E_\pi\left[\sum_{t=0}^{\infty} \gamma^t r(b_t, \pi(b_t)) \,\middle|\, b_0 = b\right]$$

where $r(b,a) = \sum_s r(s,a)\, b(s)$. Thus, a POMDP can be treated as an MDP over belief states; that is, a continuous state MDP. As before, an optimal policy obtains the maximum value for each belief state, and its value function satisfies the Bellman equation (Sondik 1978)

$$V^*(b) = \max_a\; r(b,a) + \gamma \sum_{b'} p(b'|b,a)\, V^*(b') = \max_a\; r(b,a) + \gamma \sum_{o'} p(o'|b,a)\, V^*(b'_{(b,a,o')}) \tag{5}$$

Unfortunately, solving the functional equation (5) for $V^*$ is hard. Known techniques for computing the optimal value function are generally based on value iteration (Cassandra, Littman, & Zhang 1997; Zhang & Zhang 2001); although policy based approaches are also possible (Sondik 1978; Poupart & Boutilier 2003; 2004). As above, value iteration is based on repeatedly applying a Bellman backup operator, $V_{n+1} = H V_n$, to a current value function approximation. In this case, a current lower bound, $V_n$, is represented by a finite set of α-vectors, $\Gamma_n = \{\alpha_\pi : \pi \in \Pi_n\}$, where each α-vector is associated with an $n$-step behavior strategy $\pi$. Given $\Gamma_n$, the value function is represented by

$$V_n(b) = \max_{\alpha_\pi \in \Gamma_n} b^\top \alpha_\pi$$

At each stage of value iteration, the current lower bound is updated according to the Bellman backup, $V_{n+1} = H V_n$, such that

$$V_{n+1}(b) = \max_a\; r(b,a) + \gamma \sum_{o'} p(o'|b,a)\, V_n(b'_{(b,a,o')}) \tag{6}$$

$$= \max_a\; b^\top r_a + \gamma \sum_{o'} \max_{\pi' \in \Pi_n} b^\top g_{(\pi',a,o')} = \max_{a,\{o' \to \pi'\}} b^\top \alpha_{a,\{o' \to \pi'\}} \tag{7}$$

where we use the quantities

$$g_{(\pi',a,o')}(s) = \sum_{s'} p(o'|a,s')\, p(s'|s,a)\, \alpha_{\pi'}(s'), \qquad \alpha_{a,\{o' \to \pi'\}} = r_a + \gamma \sum_{o'} g_{(\pi',a,o')}$$

Once again it is known that $V_n \to V^*$ in the $L_\infty$ norm, and thus $V^*$ is a fixed point of (6) (Sondik 1978). Although the size of the representation for $V_{n+1}$ remains finite, it can be exponentially larger than $V_n$ in the worst case, since enumerating every possibility for $a, \{o' \to \pi'\}$ over $a \in A$, $o' \in O$, $\pi' \in \Pi_n$, yields $|\Pi_{n+1}| \le |A|\,|\Pi_n|^{|O|}$ combinations. Many of these α-vectors are not maximal for any belief state, and can be pruned by running a linear program for each that verifies whether there is a witness belief state for which it is maximal (Cassandra, Littman, & Zhang 1997). Thus, the set of α-vectors, $\Gamma_n$, action strategies, $\Pi_n$, and witness belief states, $B_n$, are all associated 1 to 1. However, even with pruning, exact value iteration cannot be run for many steps, even on small problems.

Value function approximation strategies

Much research has focused on approximating the optimal value function, aimed for the most part at reducing the time and space complexity of the value iteration update. Work in this area has considered various strategies (Hauskrecht 2000), including direct MDP approximations and variants, and using function approximation to fit $V_{n+1}$ over sampled belief states (Parr & Russell 1995; Littman, Cassandra, & Kaelbling 1995). However, two approaches have recently become the most dominant: grid based and belief point approximations.

The grid based approach (Gordon 1995; Hauskrecht 2000; Zhou & Hansen 2001; Bonet 2002) maintains a finite collection of belief states along with associated value estimates $\{\langle b, V_n(b) \rangle : b \in B_{grid}\}$. These value estimates are updated by applying the Bellman update on $B_{grid}$.
An important advantage of this approach is that it can maintain an upper bound on the optimal value function. Unfortunately, maintaining a tight bound entails significant computational expense (Hauskrecht 2000): First, $B_{grid}$ must contain all corners of the simplex so that its convex closure spans $B$. Second, each successor belief state $b'$ in (6) must have its interpolated value estimate minimized by a linear program (Zhou & Hansen 2001). Below we show that this large number of linear programs can be replaced with a single convex optimization.

Unlike the grid based approach, which takes a current belief state in $B_{grid}$ and projects it forward to belief states outside of $B_{grid}$, the belief point approach only considers belief states in a witness set $B_{wit}$ (Pineau, Gordon, & Thrun 2003; Smith & Simmons 2005). Specifically, the belief point approximation maintains a lower bound by keeping a subset of α-vectors associated with these witness belief states. To further explain this approach, let $\Gamma_n = \{\alpha_\pi : \pi \in \hat\Pi_n\}$, so that there is a 1 to 1 correspondence between α-vectors in $\Gamma_n$, action strategies in $\hat\Pi_n$ and belief states in $B_{wit}$. Then the set of α-vectors is updated by applying the Bellman backup, but restricting the choices in (7) to $\pi' \in \hat\Pi_n$, and only computing (7) for $b \in B_{wit}$. Thus, the number of α-vectors in each iteration remains bounded and associated with $B_{wit}$.

The quality of both these approaches is strongly determined by the sets of belief points, $B_{grid}$ and $B_{wit}$, they maintain. For the belief point approach, one generally has to grow the number of belief points at each iteration to maintain an adequate bound on the optimal value function. Pineau et al. (2003) suggested doubling the size at each iteration, but recently a more refined approach was suggested by Smith & Simmons (2005).

Convex quadratic upper bounds

The key observation behind our approach is that one does not need to be confined to piecewise linear approximations. Our intuition is that convex quadratic approximations are particularly well suited for value function approximation in POMDPs.
This is motivated by the fact that each value iteration step produces a maximum over a set of convex functions, yielding a result that is always convex. Thus, one can plausibly use a convex quadratic function to upper bound the maximum over α-vectors, and more generally to upper bound the maximum over any set of back-projected convex value approximations from iteration $n$. Our basic goal then is to retain a compact representation of the value approximation by exploiting the fact that quadratics can be more efficient at approximating a convex upper bound than a set of linear functions; see Figure 1. As with piecewise linear approximations, the quality of the approximation can be improved by taking a maximum over a set of convex quadratics, which would yield a convex piecewise quadratic rather than piecewise linear approximation. In this paper, however, we will focus on the most naive choice, and approximate the value function with a single quadratic in each step of value iteration. The subsequent extension to multiple quadratics is discussed below.

Figure 1: Illustration of a convex quadratic upper bound approximation to a maximum of linear functions $\alpha_\pi$ (the figure shows three linear functions $\alpha_1$, $\alpha_2$, $\alpha_3$ beneath the convex quadratic upper bound).

An important advantage the quadratic form has over other function approximation representations is that it permits a convex minimization of the upper bound, as we demonstrate below. Such a convenient formulation is not readily achievable for other function representations. Also, since we are not compelled to grow the size of the representation at each iteration, we obtain an approach that runs in linear time in the number of value iteration steps.

There are a few drawbacks in dropping the piecewise linear representation however. One drawback is that we lose the 1 to 1 correspondence between α-vectors and behavior strategies $\pi$, which means that greedy action selection requires a one step look ahead calculation based on (5). The second drawback is that the convex optimization problem we have to solve at each value iteration is more complex than a simple linear program.

Convex upper bound iteration

The main technical challenge we face is to solve for a tight quadratic upper bound on the value function at each stage of value iteration. Interestingly, this can be done effectively with a convex optimization as follows. We represent the value function approximation over belief states by a quadratic form

$$\hat V_n(b) = b^\top W_n b + w_n^\top b + \omega_n \tag{8}$$

where $W_n$ is a square matrix of weights, $w_n$ is a vector of weights, and $\omega_n$ is a scalar offset weight. Equation (8) defines a convex function of belief state $b$ if and only if the matrix $W_n$ is positive semidefinite (Boyd & Vandenberghe 2004). We denote the semidefinite constraint on $W_n$ by $W_n \succeq 0$.

As shown above, one step of value iteration involves expanding (and back-projecting) a value approximation from stage $n$; defining the value function at stage $n+1$ by the maximum over the expanded, back-projected set. However, back-projection entails some additional complication in our case because we do not maintain a set of α-vectors, but rather maintain a quadratic function approximation at stage $n$. That is, our approximate value iteration step has to pull the quadratic form through the backup operator. Unfortunately, the result of a backup is no longer a quadratic, but a rational (quadratic over linear) function. Fortunately, however, the result of this backup is still convex, as we now show.

Let the action-value backup of $\hat V$ be denoted by

$$q_a(b) = r(b,a) + \gamma \sum_{o'} p(o'|b,a)\, \hat V(b'_{(b,a,o')}) \tag{9}$$
To express this as a function of $b$, we need to expand the definitions of $b'_{(b,a,o')}$ and $\hat V_n$ respectively. First, note that $b'_{(b,a,o')}$ is a ratio of a vector linear function of $b$ over a scalar linear function of $b$ by (4), therefore we can represent it by

$$b'_{(b,a,o')} = \frac{M_{a,o'}\, b}{p(o'|b,a)} = \frac{M_{a,o'}\, b}{e^\top M_{a,o'}\, b} \tag{10}$$

where $M_{a,o'}$ is a matrix such that $M_{a,o'}(s', s) = p(o'|a,s')\, p(s'|s,a)$, and $e$ denotes the vector of all 1s. Substituting (8) and (10) into (9) yields

$$q_a(b) = r(b,a) + \gamma \sum_{o'} \left[ \frac{b^\top M_{a,o'}^\top W M_{a,o'}\, b}{e^\top M_{a,o'}\, b} + (w + \omega e)^\top M_{a,o'}\, b \right]$$

Theorem 1. $q_a(b)$ is convex in $b$.

Proof. First note that $M_{a,o'}^\top W M_{a,o'} \succeq 0$ if $W \succeq 0$, and therefore it suffices to show that the function $f(b) = (b^\top N b)/(v^\top b)$ is convex under the assumption $N \succeq 0$ and $v^\top b > 0$. Note that $N \succeq 0$ implies $N = QQ^\top$ for some $Q$, and therefore $f(b) = (b^\top Q Q^\top b)/(v^\top b) = (Q^\top b)^\top (v^\top b\, I)^{-1} (Q^\top b)$. Next, we use a few elementary facts about convexity (Boyd & Vandenberghe 2004). First, a function is convex iff its epigraph is convex, so it suffices to show that the set

$$\left\{ (b,\, v^\top b\, I,\, \delta) \;:\; v^\top b\, I \succ 0,\;\; (Q^\top b)^\top (v^\top b\, I)^{-1} (Q^\top b) \le \delta \right\}$$

is convex. By the Schur complement lemma, we have that $\delta - (Q^\top b)^\top (v^\top b\, I)^{-1} (Q^\top b) \ge 0$ iff

$$\begin{bmatrix} v^\top b\, I & Q^\top b \\ (Q^\top b)^\top & \delta \end{bmatrix} \succeq 0$$

and therefore $f(b)$ is convex iff the set

$$\left\{ (b,\, v^\top b\, I,\, \delta) \;:\; v^\top b\, I \succ 0,\;\; \begin{bmatrix} v^\top b\, I & Q^\top b \\ (Q^\top b)^\top & \delta \end{bmatrix} \succeq 0 \right\}$$

is convex. The result then follows because this set can be written as a linear matrix inequality.

Corollary 1. Given a convex quadratic representation for $\hat V_n$, $\max_a q_a(b)$, and hence $H \hat V_n$, is convex in $b$.

So back-projecting the convex quadratic representation still yields a convex result. Our goal is to optimize a tight quadratic upper bound on the maximum of these convex functions (which of course is still convex). In some approaches below we will use the back-projected action-value functions directly. However, in other cases, it will prove advantageous if we can work with linear upper bounds on the back-projections.

Proposition 1. The tightest linear upper bound on $q_a(b)$ is given by $q_a(b) \le u_a^\top b$ for a vector $u_a$ such that $u_a^\top 1_s = q_a(1_s)$ for each corner belief state $1_s$.

Algorithmic approach

We would like to solve for a quadratic $\hat V_{n+1}$ at stage $n+1$ that obtains as tight an upper bound on $H \hat V_n$ as possible. To do this, we appeal to the linear program characterization of the optimal value function (3) which also is expressed as minimizing an upper bound on the back-projected value function. Unfortunately, here, since we are no longer working with a finite space, we cannot formulate a linear program but rather have to pose a generalized semi-infinite program

$$\min_{W,w,\omega} \int_b \left( b^\top W b + w^\top b + \omega \right) \mu(b)\, db \quad \text{subject to} \quad b^\top W b + w^\top b + \omega \ge q_a(b),\;\; \forall a, b;\quad W \succeq 0 \tag{11}$$

where $\mu(b)$ is a measure over the space of possible belief states. The semi-infinite program (11) specifies a linear objective subject to linear constraints (albeit infinitely many linear constraints); and hence is a convex optimization problem in $W, w, \omega$.

There are two main difficulties in solving this convex optimization problem. First, the objective involves an integral with respect to a measure $\mu(b)$ on belief states. This measure is arbitrary (except that it must have full support on the belief space $B$) and allows one to control the emphasis the minimization places on different regions of the belief space. For simplicity, we assume the measure is a Dirichlet distribution, specified by a vector of prior parameters $\theta(s)$, $\forall s \in S$. The Dirichlet distribution is particularly convenient in this context since one can specify a uniform distribution over the belief simplex merely by setting $\theta(s) = 1$ for all $s$. Moreover, the required integrals for the Dirichlet have a closed form solution, which allows us to simply precompute the linear coefficients for the weight parameters, by

$$\int_b \left( b^\top W b + w^\top b + \omega \right) \mu(b)\, db = \langle W, E[bb^\top] \rangle + w^\top E[b] + \omega$$

where $E[b] = \theta / \|\theta\|_1$; $E[bb^\top] = (\mathrm{diag}(E[b]) + \|\theta\|_1 E[b] E[b]^\top) / (1 + \|\theta\|_1)$ (Gelman et al. 1995); and $\langle A, B \rangle = \sum_{ij} A_{ij} B_{ij}$. That is, one can specify $\theta$ and compute the linear coefficients ahead of time.

The second and more difficult problem with solving (11) is to find a way to cope with the infinite number of linear constraints. Here, we address the problem with a straightforward constraint generation approach.
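The closed-form Dirichlet moments that turn the integral objective of (11) into linear coefficients can be sketched in a few lines of plain Python. The function names `dirichlet_moments` and `expected_quadratic` and the nested-list representation are illustrative assumptions, not part of the paper.

```python
def dirichlet_moments(theta):
    """Return (E[b], E[b b^T]) for b ~ Dirichlet(theta), as nested lists."""
    total = sum(theta)                    # ||theta||_1
    mean = [t / total for t in theta]     # E[b] = theta / ||theta||_1
    n = len(theta)
    # E[b b^T] = (diag(E[b]) + ||theta||_1 E[b] E[b]^T) / (1 + ||theta||_1)
    second = [[(total * mean[i] * mean[j] + (mean[i] if i == j else 0.0))
               / (1.0 + total)
               for j in range(n)] for i in range(n)]
    return mean, second


def expected_quadratic(W, w, omega, theta):
    """Objective value <W, E[b b^T]> + w^T E[b] + omega for given weights."""
    mean, second = dirichlet_moments(theta)
    n = len(theta)
    quad = sum(W[i][j] * second[i][j] for i in range(n) for j in range(n))
    lin = sum(w[i] * mean[i] for i in range(n))
    return quad + lin + omega


# theta(s) = 1 for all s gives the uniform distribution over the belief simplex.
mean, second = dirichlet_moments([1.0, 1.0, 1.0])
```

Because the moments depend only on $\theta$, they can be computed once and reused as the fixed objective coefficients in every value iteration step.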
The idea is to solve (11), iteratively, by keeping a finite set of constraints, each corresponding to a belief state, and solving the finite semidefinite program

$$\min_{W,w,\omega} \langle W, E[bb^\top] \rangle + w^\top E[b] + \omega \quad \text{subject to} \quad b_i^\top W b_i + w^\top b_i + \omega \ge q_a(b_i),\;\; \forall a,\, b_i \in C;\quad W \succeq 0 \tag{12}$$

Given a putative solution, $W, w, \omega$, a new constraint can be obtained by finding a belief state $b$ that solves

$$\min_b\; b^\top W b + w^\top b + \omega - q_a(b) \quad \text{subject to} \quad b \ge 0,\;\; \sum_s b(s) = 1 \tag{13}$$

for each $a$. If the minimum value is nonnegative for all $a$ then there are no violated constraints and we have a solution to (11). Unfortunately, (13) cannot directly be used for constraint generation, since $q_a(b)$ is a convex function of $b$ (Theorem 1) and hence $-q_a(b)$ is concave; yielding a non-convex objective. Thus, to use (13) for constraint generation we need to follow an alternative approach. We have pursued three different approaches to this problem thus far.

Our first strategy maintains a provable upper bound on the optimal value function by strengthening the constraint threshold with the linear upper bound $u_a^\top b \ge q_a(b)$ from Proposition 1. Replacing $q_a(b)$ with $u_a^\top b$ in (11) and (13) ensures that an upper bound will be maintained, but also reduces (13) to a quadratic program that can be efficiently minimized to produce a belief state with maximum constraint violation.

Our second strategy relaxes the upper bound guarantee by only substituting $u_a^\top b$ for $q_a(b)$ in the constraint generation procedure, maintaining an efficient quadratic programming formulation there, but keeping $q_a(b)$ in the main optimization (12). This no longer guarantees an upper bound, but can still produce better approximations in practice because the bounds do not have to be artificially strengthened.

Our final strategy side-steps optimal constraint generation entirely, and instead chooses a fixed set of belief states for the constraint set $C$ in (12). In this way, the semidefinite program (12) needs to be solved only once per value iteration step. This strategy doesn't produce an upper bound either but the resulting approximation is fast and effective in practice.
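To make the constraint thresholds concrete, the sketch below evaluates the action-value backup $q_a(b)$ of a quadratic approximation and the corner-based linear upper bound $u_a$ from Proposition 1, then checks $u_a^\top b \ge q_a(b)$ on a small grid of beliefs. It is an illustrative sketch only: the function names, the nested-list layouts of `R`, `T` and `Z`, and all numbers in the toy model are assumptions, and solving the semidefinite program (12) itself is not shown.

```python
def quad_value(W, w, omega, b):
    """Quadratic form (8): b^T W b + w^T b + omega."""
    n = len(b)
    return (sum(b[i] * W[i][j] * b[j] for i in range(n) for j in range(n))
            + sum(w[i] * b[i] for i in range(n)) + omega)


def q_backup(b, a, R, T, Z, gamma, W, w, omega):
    """Action-value backup (9) of the quadratic approximation.

    Assumed layouts: R[a][s] = r(s, a); T[a][s][s2] = p(s2 | s, a);
    Z[a][s2][o] = p(o | a, s2).
    """
    n = len(b)
    total = sum(R[a][s] * b[s] for s in range(n))   # r(b, a)
    for o in range(len(Z[a][0])):
        # Unnormalized successor belief M_{a,o} b and its mass p(o | b, a).
        mb = [Z[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(n))
              for s2 in range(n)]
        p_o = sum(mb)
        if p_o > 0.0:
            b_next = [x / p_o for x in mb]
            total += gamma * p_o * quad_value(W, w, omega, b_next)
    return total


def linear_upper_bound(a, R, T, Z, gamma, W, w, omega):
    """Vector u_a of Proposition 1: u_a(s) is q_a at the corner belief 1_s."""
    n = len(R[a])
    corners = [[1.0 if i == s else 0.0 for i in range(n)] for s in range(n)]
    return [q_backup(c, a, R, T, Z, gamma, W, w, omega) for c in corners]


# Made-up two-state, one-action, two-observation model; W is positive semidefinite.
R = [[1.0, 0.0]]
T = [[[0.9, 0.1], [0.2, 0.8]]]
Z = [[[0.7, 0.3], [0.1, 0.9]]]
W, w, omega, gamma = [[2.0, -1.0], [-1.0, 2.0]], [0.5, -0.5], 1.0, 0.95

u = linear_upper_bound(0, R, T, Z, gamma, W, w, omega)
grid = [[k / 10.0, 1.0 - k / 10.0] for k in range(11)]
# Gap between the linear bound u_a^T b and the convex backup q_a(b) on the grid.
gaps = [sum(u[s] * b[s] for s in range(2))
        - q_backup(b, 0, R, T, Z, gamma, W, w, omega)
        for b in grid]
```

Under Theorem 1 every entry of `gaps` should be nonnegative, with zero gap at the two corner beliefs. The size of the gap in the interior gives a feel for how much the first strategy strengthens the constraints relative to using $q_a(b)$ directly.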
Finally, to improve approximation quality, one could augment the approximate value function representation with a maximum over a set of quadratics, much as with α-vectors. One natural way to do this would be to maintain a separate quadratic for each action, $a$, in (11).

Experimental results

We implemented the proposed approach using SDPT3 (Toh, Todd, & Tutuncu 1999) as the semidefinite program solver for (12). Specifically, in our initial experiments, we have investigated the third (simplest) strategy mentioned above, CQUB, which only used a random sample of belief states to specify the constraints in $C$. We compared this method to two current value function approximation strategies in the literature: Perseus (Spaan & Vlassis 2005), and PBVI (Pineau, Gordon, & Thrun 2003). Here, both Perseus and PBVI were run with the number of belief states fixed at 1000, whereas the convex quadratic method, CQUB, was run with 100 random belief states.

In our initial experiments, we considered the benchmark problems: Maze (Hauskrecht 1997), Tiger-grid, Hallway, Hallway2, Aircraft, available from [URL not preserved]. Table 1 gives the problem characteristics. In each case, a number of value iteration steps was fixed as shown in Table 1, and each method was run 10 times to generate an estimate of value function approximation quality. Table 2 shows the results obtained by the various value function approximation strategies on these domains, reporting the expected discounted reward obtained by the greedy policies defined with respect to the value function estimates, as well as the average time and the size of the value function approximation. [1]

Table 1: Problem characteristics. (Entries shown as ... are not preserved in this copy.)

Problems      |S|    |A|    |O|    value iters
Maze          ...    ...    ...    ...
Tiger-grid    ...    ...    ...    ...
Hallway       ...    ...    ...    ...
Hallway2      ...    ...    ...    ...
Aircraft      ...    ...    ...    ...

Table 2: Mean discounted reward obtained over 1000 trajectories using the greedy policy for each value function approximation, averaged over 10 runs of value iteration. (Entries shown as ... are not preserved in this copy.)

                            CQUB          Perseus       PBVI
Maze        Avg. reward     ... ± ...     ... ± ...     ... ± 2.0
            Run time (s)    ...           ...           ...
            Size            ...           ...           ...
Tiger-grid  Avg. reward     2.16 ± ...    ... ± ...     ... ± 0.06
            Run time (s)    ...           ...           ...
            Size            ...           ...           ...
Hallway     Avg. reward     0.58 ± ...    ... ± ...     ... ± 0.03
            Run time (s)    ...           ...           ...
            Size            ...           ...           ...
Hallway2    Avg. reward     0.43 ± ...    ... ± ...     ... ± 0.03
            Run time (s)    ...           ...           ...
            Size            ...           ...           ...
Aircraft    Avg. reward     ... ± ...     ... ± ...     ... ± 0.42
            Run time (s)    ...           ...           ...
            Size            ...           ...           ...

[1] For Perseus and PBVI, the size is $|S|$ times the number of α-vectors. For CQUB, the size is just $|S|(|S|+1)/2 + |S| + 1$, which corresponds to the number of variables in the quadratic approximator.

Interestingly, the convex quadratic strategy CQUB performed surprisingly well in these experiments, competing with state of the art value function approximations while only using 100 random belief states for constraint generation in (12). The result is slightly weaker in the Tiger-grid domain, but significantly stronger in the Hallway domains; supporting the thesis that convex quadratics capture value function structure more efficiently than linear approaches.

Conclusions

We have introduced a new approach to value function approximation for POMDPs that is based on a convex quadratic bound rather than a piecewise linear approximation. We have found that quadratic approximators can achieve highly competitive approximation quality without growing the size of the representation, even while explicitly focusing on only a tiny fraction of the belief states. We expect that this approach can lead to new avenues of research in value approximation for POMDPs. We are currently considering extensions to this approach based on belief state compression (Poupart & Boutilier 2002; 2004; Roy, Gordon, & Thrun 2005), and factored models (Boutilier & Poole 1996; Feng & Hansen 2001; Poupart 2005) to tackle POMDPs with large state spaces. We also plan to combine our quadratic value function approximation with policy based and sampling based approaches.
A further idea we are exploring is the interpretation of convex quadratics as second order Taylor approximations to the optimal value function, which offers further algorithmic approaches with the potential for tight theoretical guarantees on approximation quality.

Acknowledgments

Research supported by the Alberta Ingenuity Centre for Machine Learning, NSERC, MITACS, CFI, and the Canada Research Chairs program.

References

Amato, C.; Bernstein, D.; and Zilberstein, S. 2006. Solving POMDPs using quadratically constrained linear programs. In Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS).
Bertsekas, D. 1995. Dynamic Programming and Optimal Control, volume 2. Athena Scientific.
Boger, J.; Poupart, P.; Hoey, J.; Boutilier, C.; Fernie, G.; and Mihailidis, A. 2005. A decision-theoretic approach to task assistance for persons with dementia. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI).
Bonet, B. 2002. An ε-optimal grid-based algorithm for partially observable Markov decision processes. In Proceedings of the Nineteenth International Conference on Machine Learning (ICML).
Boutilier, C., and Poole, D. 1996. Computing optimal policies for partially observable decision processes using compact representations. In Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI).
Boyd, S., and Vandenberghe, L. 2004. Convex Optimization. Cambridge Univ. Press.
Cassandra, A.; Littman, M.; and Zhang, N. 1997. Incremental pruning: A simple, fast, exact method for partially observable Markov decision processes. In Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI).
Feng, Z., and Hansen, E. A. 2001. Approximate planning for factored POMDPs. In Proceedings of the Sixth European Conference on Planning.
Gelman, A.; Carlin, J.; Stern, H.; and Rubin, D. 1995. Bayesian Data Analysis. Chapman & Hall.
Gordon, G. 1995. Stable function approximation in dynamic programming. In Proceedings of the Twelfth International Conference on Machine Learning (ICML).

Hauskrecht, M. 1997. Incremental methods for computing bounds in partially observable Markov decision processes. In Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI).
Hauskrecht, M. 2000. Value-function approximations for partially observable Markov decision processes. Journal of Artificial Intelligence Research 13.
Kearns, M.; Mansour, Y.; and Ng, A. 2002. A sparse sampling algorithm for near-optimal planning in large Markov decision processes. Machine Learning 49(2-3).
Littman, M.; Cassandra, A.; and Kaelbling, L. 1995. Learning policies for partially observable environments: scaling up. In Proceedings of the Twelfth International Conference on Machine Learning (ICML).
Madani, O.; Hanks, S.; and Condon, A. 2003. On the undecidability of probabilistic planning and related stochastic optimization problems. Artificial Intelligence 147:5-34.
Mundhenk, M.; Goldsmith, J.; Lusena, C.; and Allender, E. 2000. Complexity of finite-horizon Markov decision processes. Journal of the Association for Computing Machinery 47(4).
Ng, A., and Jordan, M. 2000. Pegasus: A policy search method for large MDPs and POMDPs. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI).
Parr, R., and Russell, S. 1995. Approximating optimal policies for partially observable stochastic domains. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI).
Pineau, J.; Montemerlo, M.; Pollack, M.; Roy, N.; and Thrun, S. 2003. Towards robotic assistants in nursing homes: Challenges and results. Robotics and Autonomous Systems 42.
Pineau, J.; Gordon, G.; and Thrun, S. 2003. Point-based value iteration: An anytime algorithm for POMDPs. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI).
Poupart, P., and Boutilier, C. 2002. Value-directed compression of POMDPs. In Advances in Neural Information Processing Systems (NIPS 15).
Poupart, P., and Boutilier, C. 2003. Bounded finite state controllers. In Advances in Neural Information Processing Systems (NIPS 16).
Poupart, P., and Boutilier, C. 2004. VDCBPI: An approximate scalable algorithm for large POMDPs. In Advances in Neural Information Processing Systems (NIPS 17).
Poupart, P. 2005. Exploiting Structure to Efficiently Solve Large Scale Partially Observable Markov Decision Processes. Ph.D. Dissertation, Department of Computer Science, University of Toronto.
Roy, N.; Gordon, G.; and Thrun, S. 2005. Finding approximate POMDP solutions through belief compression. Journal of Artificial Intelligence Research 23:1-40.
Smith, T., and Simmons, R. 2005. Point-based POMDP algorithms: Improved analysis and implementation. In Proceedings of the Twenty-first Conference on Uncertainty in Artificial Intelligence (UAI).
Sondik, E. 1978. The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs. Operations Research 26.
Spaan, M., and Vlassis, N. 2005. Perseus: Randomized point-based value iteration for POMDPs. Journal of Artificial Intelligence Research 24.
Thrun, S.; Burgard, W.; and Fox, D. 2005. Probabilistic Robotics. MIT Press.
Thrun, S. 2000. Monte Carlo POMDPs. In Advances in Neural Information Processing Systems (NIPS 12).
Toh, K.; Todd, M.; and Tutuncu, R. 1999. SDPT3 - a Matlab software package for semidefinite programming. Optimization Methods and Software 11.
Zhang, N., and Zhang, W. 2001. Speeding up the convergence of value iteration in partially observable Markov decision processes. Journal of Artificial Intelligence Research 14.
Zhou, R., and Hansen, E. 2001. An improved grid-based approximation algorithm for POMDPs. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI).


More information

p-adic Egyptian Fractions

p-adic Egyptian Fractions p-adic Egyptin Frctions Contents 1 Introduction 1 2 Trditionl Egyptin Frctions nd Greedy Algorithm 2 3 Set-up 3 4 p-greedy Algorithm 5 5 p-egyptin Trditionl 10 6 Conclusion 1 Introduction An Egyptin frction

More information

Discrete Mathematics and Probability Theory Spring 2013 Anant Sahai Lecture 17

Discrete Mathematics and Probability Theory Spring 2013 Anant Sahai Lecture 17 EECS 70 Discrete Mthemtics nd Proility Theory Spring 2013 Annt Shi Lecture 17 I.I.D. Rndom Vriles Estimting the is of coin Question: We wnt to estimte the proportion p of Democrts in the US popultion,

More information

Discrete Mathematics and Probability Theory Summer 2014 James Cook Note 17

Discrete Mathematics and Probability Theory Summer 2014 James Cook Note 17 CS 70 Discrete Mthemtics nd Proility Theory Summer 2014 Jmes Cook Note 17 I.I.D. Rndom Vriles Estimting the is of coin Question: We wnt to estimte the proportion p of Democrts in the US popultion, y tking

More information

QUADRATURE is an old-fashioned word that refers to

QUADRATURE is an old-fashioned word that refers to World Acdemy of Science Engineering nd Technology Interntionl Journl of Mthemticl nd Computtionl Sciences Vol:5 No:7 011 A New Qudrture Rule Derived from Spline Interpoltion with Error Anlysis Hdi Tghvfrd

More information

An Optimal Best-first Search Algorithm for Solving Infinite Horizon DEC-POMDPs

An Optimal Best-first Search Algorithm for Solving Infinite Horizon DEC-POMDPs An Optiml Best-first Serch Algorithm for Solving Infinite Horizon DEC-POMDPs Dniel Szer, Frnçois Chrpillet To cite this version: Dniel Szer, Frnçois Chrpillet. An Optiml Best-first Serch Algorithm for

More information

Genetic Programming. Outline. Evolutionary Strategies. Evolutionary strategies Genetic programming Summary

Genetic Programming. Outline. Evolutionary Strategies. Evolutionary strategies Genetic programming Summary Outline Genetic Progrmming Evolutionry strtegies Genetic progrmming Summry Bsed on the mteril provided y Professor Michel Negnevitsky Evolutionry Strtegies An pproch simulting nturl evolution ws proposed

More information

Farey Fractions. Rickard Fernström. U.U.D.M. Project Report 2017:24. Department of Mathematics Uppsala University

Farey Fractions. Rickard Fernström. U.U.D.M. Project Report 2017:24. Department of Mathematics Uppsala University U.U.D.M. Project Report 07:4 Frey Frctions Rickrd Fernström Exmensrete i mtemtik, 5 hp Hledre: Andres Strömergsson Exmintor: Jörgen Östensson Juni 07 Deprtment of Mthemtics Uppsl University Frey Frctions

More information

An Analytic Solution to Discrete Bayesian Reinforcement Learning

An Analytic Solution to Discrete Bayesian Reinforcement Learning An Anlytic Solution to Discrete Byesin Reinforcement Lerning Pscl Pouprt ppouprt@cs.uwterloo.c Cheriton School of Computer Science, University of Wterloo, Wterloo, Ontrio, Cnd Nikos Vlssis vlssis@science.uv.nl

More information

Torsion in Groups of Integral Triangles

Torsion in Groups of Integral Triangles Advnces in Pure Mthemtics, 01,, 116-10 http://dxdoiorg/1046/pm011015 Pulished Online Jnury 01 (http://wwwscirporg/journl/pm) Torsion in Groups of Integrl Tringles Will Murry Deprtment of Mthemtics nd Sttistics,

More information

The Minimum Label Spanning Tree Problem: Illustrating the Utility of Genetic Algorithms

The Minimum Label Spanning Tree Problem: Illustrating the Utility of Genetic Algorithms The Minimum Lel Spnning Tree Prolem: Illustrting the Utility of Genetic Algorithms Yupei Xiong, Univ. of Mrylnd Bruce Golden, Univ. of Mrylnd Edwrd Wsil, Americn Univ. Presented t BAE Systems Distinguished

More information

1 Online Learning and Regret Minimization

1 Online Learning and Regret Minimization 2.997 Decision-Mking in Lrge-Scle Systems My 10 MIT, Spring 2004 Hndout #29 Lecture Note 24 1 Online Lerning nd Regret Minimiztion In this lecture, we consider the problem of sequentil decision mking in

More information

Chapter 6 Techniques of Integration

Chapter 6 Techniques of Integration MA Techniques of Integrtion Asst.Prof.Dr.Suprnee Liswdi Chpter 6 Techniques of Integrtion Recll: Some importnt integrls tht we hve lernt so fr. Tle of Integrls n+ n d = + C n + e d = e + C ( n ) d = ln

More information

New Expansion and Infinite Series

New Expansion and Infinite Series Interntionl Mthemticl Forum, Vol. 9, 204, no. 22, 06-073 HIKARI Ltd, www.m-hikri.com http://dx.doi.org/0.2988/imf.204.4502 New Expnsion nd Infinite Series Diyun Zhng College of Computer Nnjing University

More information

Dual Formulations for Optimizing Dec-POMDP Controllers

Dual Formulations for Optimizing Dec-POMDP Controllers Dul Formultions for Optimizing Dec-POMDP Controllers Aksht Kumr nd Hl Mostf nd Shlomo Zilerstein School of Informtion Systems, Singpore Mngement University, United Technologies Reserch Center College of

More information

CMDA 4604: Intermediate Topics in Mathematical Modeling Lecture 19: Interpolation and Quadrature

CMDA 4604: Intermediate Topics in Mathematical Modeling Lecture 19: Interpolation and Quadrature CMDA 4604: Intermedite Topics in Mthemticl Modeling Lecture 19: Interpoltion nd Qudrture In this lecture we mke brief diversion into the res of interpoltion nd qudrture. Given function f C[, b], we sy

More information

Model Reduction of Finite State Machines by Contraction

Model Reduction of Finite State Machines by Contraction Model Reduction of Finite Stte Mchines y Contrction Alessndro Giu Dip. di Ingegneri Elettric ed Elettronic, Università di Cgliri, Pizz d Armi, 09123 Cgliri, Itly Phone: +39-070-675-5892 Fx: +39-070-675-5900

More information

I1 = I2 I1 = I2 + I3 I1 + I2 = I3 + I4 I 3

I1 = I2 I1 = I2 + I3 I1 + I2 = I3 + I4 I 3 2 The Prllel Circuit Electric Circuits: Figure 2- elow show ttery nd multiple resistors rrnged in prllel. Ech resistor receives portion of the current from the ttery sed on its resistnce. The split is

More information

History-Based Controller Design and Optimization for Partially Observable MDPs

History-Based Controller Design and Optimization for Partially Observable MDPs History-Bsed Controller Design nd Optimiztion for Prtilly Observble MDPs Aksht Kumr School of Informtion Systems Singpore Mngement University kshtkumr@smu.edu.sg Shlomo Zilberstein School of Computer Science

More information

Bases for Vector Spaces

Bases for Vector Spaces Bses for Vector Spces 2-26-25 A set is independent if, roughly speking, there is no redundncy in the set: You cn t uild ny vector in the set s liner comintion of the others A set spns if you cn uild everything

More information

A Fast and Reliable Policy Improvement Algorithm

A Fast and Reliable Policy Improvement Algorithm A Fst nd Relible Policy Improvement Algorithm Ysin Abbsi-Ydkori Peter L. Brtlett Stephen J. Wright Queenslnd University of Technology UC Berkeley nd QUT University of Wisconsin-Mdison Abstrct We introduce

More information

Administrivia CSE 190: Reinforcement Learning: An Introduction

Administrivia CSE 190: Reinforcement Learning: An Introduction Administrivi CSE 190: Reinforcement Lerning: An Introduction Any emil sent to me bout the course should hve CSE 190 in the subject line! Chpter 4: Dynmic Progrmming Acknowledgment: A good number of these

More information

Math 270A: Numerical Linear Algebra

Math 270A: Numerical Linear Algebra Mth 70A: Numericl Liner Algebr Instructor: Michel Holst Fll Qurter 014 Homework Assignment #3 Due Give to TA t lest few dys before finl if you wnt feedbck. Exercise 3.1. (The Bsic Liner Method for Liner

More information

Lecture 3. In this lecture, we will discuss algorithms for solving systems of linear equations.

Lecture 3. In this lecture, we will discuss algorithms for solving systems of linear equations. Lecture 3 3 Solving liner equtions In this lecture we will discuss lgorithms for solving systems of liner equtions Multiplictive identity Let us restrict ourselves to considering squre mtrices since one

More information

Markov Decision Processes

Markov Decision Processes Mrkov Deciion Procee A Brief Introduction nd Overview Jck L. King Ph.D. Geno UK Limited Preenttion Outline Introduction to MDP Motivtion for Study Definition Key Point of Interet Solution Technique Prtilly

More information

Bellman Optimality Equation for V*

Bellman Optimality Equation for V* Bellmn Optimlity Eqution for V* The vlue of stte under n optiml policy must equl the expected return for the best ction from tht stte: V (s) mx Q (s,) A(s) mx A(s) mx A(s) Er t 1 V (s t 1 ) s t s, t s

More information

MOMDPs: a Solution for Modelling Adaptive Management Problems

MOMDPs: a Solution for Modelling Adaptive Management Problems MOMDPs: Solution for Modelling Adptive Mngement Problems Idine Chdès nd Josie Crwrdine nd Tr G. Mrtin CSIRO Ecosystem Sciences {idine.chdes, josie.crwrdine, tr.mrtin}@csiro.u Smuel Nicol University of

More information

Quadratic Forms. Quadratic Forms

Quadratic Forms. Quadratic Forms Qudrtic Forms Recll the Simon & Blume excerpt from n erlier lecture which sid tht the min tsk of clculus is to pproximte nonliner functions with liner functions. It s ctully more ccurte to sy tht we pproximte

More information

Section 4: Integration ECO4112F 2011

Section 4: Integration ECO4112F 2011 Reding: Ching Chpter Section : Integrtion ECOF Note: These notes do not fully cover the mteril in Ching, ut re ment to supplement your reding in Ching. Thus fr the optimistion you hve covered hs een sttic

More information

Parse trees, ambiguity, and Chomsky normal form

Parse trees, ambiguity, and Chomsky normal form Prse trees, miguity, nd Chomsky norml form In this lecture we will discuss few importnt notions connected with contextfree grmmrs, including prse trees, miguity, nd specil form for context-free grmmrs

More information

Continuous Random Variables Class 5, Jeremy Orloff and Jonathan Bloom

Continuous Random Variables Class 5, Jeremy Orloff and Jonathan Bloom Lerning Gols Continuous Rndom Vriles Clss 5, 8.05 Jeremy Orloff nd Jonthn Bloom. Know the definition of continuous rndom vrile. 2. Know the definition of the proility density function (pdf) nd cumultive

More information

ORDER REDUCTION USING POLE CLUSTERING AND FACTOR DIVISION METHOD

ORDER REDUCTION USING POLE CLUSTERING AND FACTOR DIVISION METHOD Author Nme et. l. / Interntionl Journl of New Technologies in Science nd Engineering Vol., Issue., 7, ISSN 9-78 ORDER REDUCTION USING POLE CLUSTERING AND FACTOR DIVISION METHOD A Chinn Nidu* G Dileep**

More information

CS 188: Artificial Intelligence Fall Announcements

CS 188: Artificial Intelligence Fall Announcements CS 188: Artificil Intelligence Fll 2009 Lecture 20: Prticle Filtering 11/5/2009 Dn Klein UC Berkeley Announcements Written 3 out: due 10/12 Project 4 out: due 10/19 Written 4 proly xed, Project 5 moving

More information

Lecture 2: January 27

Lecture 2: January 27 CS 684: Algorithmic Gme Theory Spring 217 Lecturer: Év Trdos Lecture 2: Jnury 27 Scrie: Alert Julius Liu 2.1 Logistics Scrie notes must e sumitted within 24 hours of the corresponding lecture for full

More information

Numerical Analysis: Trapezoidal and Simpson s Rule

Numerical Analysis: Trapezoidal and Simpson s Rule nd Simpson s Mthemticl question we re interested in numericlly nswering How to we evlute I = f (x) dx? Clculus tells us tht if F(x) is the ntiderivtive of function f (x) on the intervl [, b], then I =

More information

Flexible Beam. Objectives

Flexible Beam. Objectives Flexile Bem Ojectives The ojective of this l is to lern out the chllenges posed y resonnces in feedck systems. An intuitive understnding will e gined through the mnul control of flexile em resemling lrge

More information

CS 188 Introduction to Artificial Intelligence Fall 2018 Note 7

CS 188 Introduction to Artificial Intelligence Fall 2018 Note 7 CS 188 Introduction to Artificil Intelligence Fll 2018 Note 7 These lecture notes re hevily bsed on notes originlly written by Nikhil Shrm. Decision Networks In the third note, we lerned bout gme trees

More information

Chapter 5 Plan-Space Planning

Chapter 5 Plan-Space Planning Lecture slides for Automted Plnning: Theory nd Prctice Chpter 5 Pln-Spce Plnning Dn S. Nu CMSC 722, AI Plnning University of Mrylnd, Spring 2008 1 Stte-Spce Plnning Motivtion g 1 1 g 4 4 s 0 g 5 5 g 2

More information

Compiler Design. Fall Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Fall Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz University of Southern Cliforni Computer Science Deprtment Compiler Design Fll Lexicl Anlysis Smple Exercises nd Solutions Prof. Pedro C. Diniz USC / Informtion Sciences Institute 4676 Admirlty Wy, Suite

More information

Convert the NFA into DFA

Convert the NFA into DFA Convert the NF into F For ech NF we cn find F ccepting the sme lnguge. The numer of sttes of the F could e exponentil in the numer of sttes of the NF, ut in prctice this worst cse occurs rrely. lgorithm:

More information

Review of Calculus, cont d

Review of Calculus, cont d Jim Lmbers MAT 460 Fll Semester 2009-10 Lecture 3 Notes These notes correspond to Section 1.1 in the text. Review of Clculus, cont d Riemnn Sums nd the Definite Integrl There re mny cses in which some

More information

Lecture Solution of a System of Linear Equation

Lecture Solution of a System of Linear Equation ChE Lecture Notes, Dept. of Chemicl Engineering, Univ. of TN, Knoville - D. Keffer, 5/9/98 (updted /) Lecture 8- - Solution of System of Liner Eqution 8. Why is it importnt to e le to solve system of liner

More information

Designing finite automata II

Designing finite automata II Designing finite utomt II Prolem: Design DFA A such tht L(A) consists of ll strings of nd which re of length 3n, for n = 0, 1, 2, (1) Determine wht to rememer out the input string Assign stte to ech of

More information

SOME INTEGRAL INEQUALITIES OF GRÜSS TYPE

SOME INTEGRAL INEQUALITIES OF GRÜSS TYPE RGMIA Reserch Report Collection, Vol., No., 998 http://sci.vut.edu.u/ rgmi SOME INTEGRAL INEQUALITIES OF GRÜSS TYPE S.S. DRAGOMIR Astrct. Some clssicl nd new integrl inequlities of Grüss type re presented.

More information

Near-Bayesian Exploration in Polynomial Time

Near-Bayesian Exploration in Polynomial Time J. Zico Kolter kolter@cs.stnford.edu Andrew Y. Ng ng@cs.stnford.edu Computer Science Deprtment, Stnford University, CA 94305 Abstrct We consider the explortion/exploittion problem in reinforcement lerning

More information

1B40 Practical Skills

1B40 Practical Skills B40 Prcticl Skills Comining uncertinties from severl quntities error propgtion We usully encounter situtions where the result of n experiment is given in terms of two (or more) quntities. We then need

More information

A Generalization of Two-Player Stackelberg Games to Three Players

A Generalization of Two-Player Stackelberg Games to Three Players A Generliztion of Two-Plyer Stckelerg Gmes to Three Plyers Grrett Andersen 1 Introduction Two-plyer Stckelerg gmes nd their pplictions to security re currently very hot topic in the field of Algorithmic

More information

Calculus Module C21. Areas by Integration. Copyright This publication The Northern Alberta Institute of Technology All Rights Reserved.

Calculus Module C21. Areas by Integration. Copyright This publication The Northern Alberta Institute of Technology All Rights Reserved. Clculus Module C Ares Integrtion Copright This puliction The Northern Alert Institute of Technolog 7. All Rights Reserved. LAST REVISED Mrch, 9 Introduction to Ares Integrtion Sttement of Prerequisite

More information

CHAPTER 1 PROGRAM OF MATRICES

CHAPTER 1 PROGRAM OF MATRICES CHPTER PROGRM OF MTRICES -- INTRODUCTION definition of engineering is the science y which the properties of mtter nd sources of energy in nture re mde useful to mn. Thus n engineer will hve to study the

More information

Surface maps into free groups

Surface maps into free groups Surfce mps into free groups lden Wlker Novemer 10, 2014 Free groups wedge X of two circles: Set F = π 1 (X ) =,. We write cpitl letters for inverse, so = 1. e.g. () 1 = Commuttors Let x nd y e loops. The

More information

Coalgebra, Lecture 15: Equations for Deterministic Automata

Coalgebra, Lecture 15: Equations for Deterministic Automata Colger, Lecture 15: Equtions for Deterministic Automt Julin Slmnc (nd Jurrin Rot) Decemer 19, 2016 In this lecture, we will study the concept of equtions for deterministic utomt. The notes re self contined

More information

12.1 Nondeterminism Nondeterministic Finite Automata. a a b ε. CS125 Lecture 12 Fall 2016

12.1 Nondeterminism Nondeterministic Finite Automata. a a b ε. CS125 Lecture 12 Fall 2016 CS125 Lecture 12 Fll 2016 12.1 Nondeterminism The ide of nondeterministic computtions is to llow our lgorithms to mke guesses, nd only require tht they ccept when the guesses re correct. For exmple, simple

More information

CS 188: Artificial Intelligence Spring 2007

CS 188: Artificial Intelligence Spring 2007 CS 188: Artificil Intelligence Spring 2007 Lecture 3: Queue-Bsed Serch 1/23/2007 Srini Nrynn UC Berkeley Mny slides over the course dpted from Dn Klein, Sturt Russell or Andrew Moore Announcements Assignment

More information

{ } = E! & $ " k r t +k +1

{ } = E! & $  k r t +k +1 Chpter 4: Dynmic Progrmming Objectives of this chpter: Overview of collection of clssicl solution methods for MDPs known s dynmic progrmming (DP) Show how DP cn be used to compute vlue functions, nd hence,

More information

Linear Systems with Constant Coefficients

Linear Systems with Constant Coefficients Liner Systems with Constnt Coefficients 4-3-05 Here is system of n differentil equtions in n unknowns: x x + + n x n, x x + + n x n, x n n x + + nn x n This is constnt coefficient liner homogeneous system

More information

Assignment 1 Automata, Languages, and Computability. 1 Finite State Automata and Regular Languages

Assignment 1 Automata, Languages, and Computability. 1 Finite State Automata and Regular Languages Deprtment of Computer Science, Austrlin Ntionl University COMP2600 Forml Methods for Softwre Engineering Semester 2, 206 Assignment Automt, Lnguges, nd Computility Smple Solutions Finite Stte Automt nd

More information

Chapter 4: Dynamic Programming

Chapter 4: Dynamic Programming Chpter 4: Dynmic Progrmming Objectives of this chpter: Overview of collection of clssicl solution methods for MDPs known s dynmic progrmming (DP) Show how DP cn be used to compute vlue functions, nd hence,

More information

Nondeterminism and Nodeterministic Automata

Nondeterminism and Nodeterministic Automata Nondeterminism nd Nodeterministic Automt 61 Nondeterminism nd Nondeterministic Automt The computtionl mchine models tht we lerned in the clss re deterministic in the sense tht the next move is uniquely

More information

Generation of Lyapunov Functions by Neural Networks

Generation of Lyapunov Functions by Neural Networks WCE 28, July 2-4, 28, London, U.K. Genertion of Lypunov Functions by Neurl Networks Nvid Noroozi, Pknoosh Krimghee, Ftemeh Sfei, nd Hmed Jvdi Abstrct Lypunov function is generlly obtined bsed on tril nd

More information

7.8 Improper Integrals

7.8 Improper Integrals 7.8 7.8 Improper Integrls The Completeness Axiom of the Rel Numers Roughly speking, the rel numers re clled complete ecuse they hve no holes. The completeness of the rel numers hs numer of importnt consequences.

More information

P 3 (x) = f(0) + f (0)x + f (0) 2. x 2 + f (0) . In the problem set, you are asked to show, in general, the n th order term is a n = f (n) (0)

P 3 (x) = f(0) + f (0)x + f (0) 2. x 2 + f (0) . In the problem set, you are asked to show, in general, the n th order term is a n = f (n) (0) 1 Tylor polynomils In Section 3.5, we discussed how to pproximte function f(x) round point in terms of its first derivtive f (x) evluted t, tht is using the liner pproximtion f() + f ()(x ). We clled this

More information

INEQUALITIES FOR TWO SPECIFIC CLASSES OF FUNCTIONS USING CHEBYSHEV FUNCTIONAL. Mohammad Masjed-Jamei

INEQUALITIES FOR TWO SPECIFIC CLASSES OF FUNCTIONS USING CHEBYSHEV FUNCTIONAL. Mohammad Masjed-Jamei Fculty of Sciences nd Mthemtics University of Niš Seri Aville t: http://www.pmf.ni.c.rs/filomt Filomt 25:4 20) 53 63 DOI: 0.2298/FIL0453M INEQUALITIES FOR TWO SPECIFIC CLASSES OF FUNCTIONS USING CHEBYSHEV

More information

The Shortest Confidence Interval for the Mean of a Normal Distribution

The Shortest Confidence Interval for the Mean of a Normal Distribution Interntionl Journl of Sttistics nd Proility; Vol. 7, No. 2; Mrch 208 ISSN 927-7032 E-ISSN 927-7040 Pulished y Cndin Center of Science nd Eduction The Shortest Confidence Intervl for the Men of Norml Distriution

More information

Derivations for maximum likelihood estimation of particle size distribution using in situ video imaging

Derivations for maximum likelihood estimation of particle size distribution using in situ video imaging 2 TWMCC Texs-Wisconsin Modeling nd Control Consortium 1 Technicl report numer 27-1 Derivtions for mximum likelihood estimtion of prticle size distriution using in situ video imging Pul A. Lrsen nd Jmes

More information

An LP-Based Heuristic for Optimal Planning

An LP-Based Heuristic for Optimal Planning An LP-Bsed Heuristic for Optiml Plnning Menkes vn den Briel 1, J. Benton 2, Suro Kmhmpti 2, nd Thoms Vossen 3 Arizon Stte University, Deprtment of Industril Engineering 1, Deprtment of Computer Science

More information

Best Approximation. Chapter The General Case

Best Approximation. Chapter The General Case Chpter 4 Best Approximtion 4.1 The Generl Cse In the previous chpter, we hve seen how n interpolting polynomil cn be used s n pproximtion to given function. We now wnt to find the best pproximtion to given

More information

THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS.

THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS. THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS RADON ROSBOROUGH https://intuitiveexplntionscom/picrd-lindelof-theorem/ This document is proof of the existence-uniqueness theorem

More information

Reward Shaping for Model-Based Bayesian Reinforcement Learning

Reward Shaping for Model-Based Bayesian Reinforcement Learning Rewrd Shping for Model-Bsed Byesin Reinforcement Lerning Hyeoneun Kim, Woosng Lim, Knghoon Lee, Yung-Kyun Noh nd Kee-Eung Kim Deprtment of Computer Science Kore Advnced Institute of Science nd Technology

More information

State space systems analysis (continued) Stability. A. Definitions A system is said to be Asymptotically Stable (AS) when it satisfies

State space systems analysis (continued) Stability. A. Definitions A system is said to be Asymptotically Stable (AS) when it satisfies Stte spce systems nlysis (continued) Stbility A. Definitions A system is sid to be Asymptoticlly Stble (AS) when it stisfies ut () = 0, t > 0 lim xt () 0. t A system is AS if nd only if the impulse response

More information

Hamiltonian Cycle in Complete Multipartite Graphs

Hamiltonian Cycle in Complete Multipartite Graphs Annls of Pure nd Applied Mthemtics Vol 13, No 2, 2017, 223-228 ISSN: 2279-087X (P), 2279-0888(online) Pulished on 18 April 2017 wwwreserchmthsciorg DOI: http://dxdoiorg/1022457/pmv13n28 Annls of Hmiltonin

More information

A Variance Analysis for POMDP Policy Evaluation

A Variance Analysis for POMDP Policy Evaluation Proceedings of the Twenty-Third AAAI Conference on Artificil Intelligence (2008) A Vrince Anlysis for POMDP Policy Evlution Mhdi Milni Frd nd Joelle Pineu School of Computer Science McGill University,

More information

We will see what is meant by standard form very shortly

We will see what is meant by standard form very shortly THEOREM: For fesible liner progrm in its stndrd form, the optimum vlue of the objective over its nonempty fesible region is () either unbounded or (b) is chievble t lest t one extreme point of the fesible

More information

The Regulated and Riemann Integrals

The Regulated and Riemann Integrals Chpter 1 The Regulted nd Riemnn Integrls 1.1 Introduction We will consider severl different pproches to defining the definite integrl f(x) dx of function f(x). These definitions will ll ssign the sme vlue

More information

WENJUN LIU AND QUÔ C ANH NGÔ

WENJUN LIU AND QUÔ C ANH NGÔ AN OSTROWSKI-GRÜSS TYPE INEQUALITY ON TIME SCALES WENJUN LIU AND QUÔ C ANH NGÔ Astrct. In this pper we derive new inequlity of Ostrowski-Grüss type on time scles nd thus unify corresponding continuous

More information

CS415 Compilers. Lexical Analysis and. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

CS415 Compilers. Lexical Analysis and. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University CS415 Compilers Lexicl Anlysis nd These slides re sed on slides copyrighted y Keith Cooper, Ken Kennedy & Lind Torczon t Rice University First Progrmming Project Instruction Scheduling Project hs een posted

More information

MATH34032: Green s Functions, Integral Equations and the Calculus of Variations 1

MATH34032: Green s Functions, Integral Equations and the Calculus of Variations 1 MATH34032: Green s Functions, Integrl Equtions nd the Clculus of Vritions 1 Section 1 Function spces nd opertors Here we gives some brief detils nd definitions, prticulrly relting to opertors. For further

More information

AN INEQUALITY OF OSTROWSKI TYPE AND ITS APPLICATIONS FOR SIMPSON S RULE AND SPECIAL MEANS. I. Fedotov and S. S. Dragomir

AN INEQUALITY OF OSTROWSKI TYPE AND ITS APPLICATIONS FOR SIMPSON S RULE AND SPECIAL MEANS. I. Fedotov and S. S. Dragomir RGMIA Reserch Report Collection, Vol., No., 999 http://sci.vu.edu.u/ rgmi AN INEQUALITY OF OSTROWSKI TYPE AND ITS APPLICATIONS FOR SIMPSON S RULE AND SPECIAL MEANS I. Fedotov nd S. S. Drgomir Astrct. An

More information

A Symbolic Approach to Control via Approximate Bisimulations

A Symbolic Approach to Control via Approximate Bisimulations A Symolic Approch to Control vi Approximte Bisimultions Antoine Girrd Lortoire Jen Kuntzmnn, Université Joseph Fourier Grenole, Frnce Interntionl Symposium on Innovtive Mthemticl Modelling Tokyo, Jpn,

More information

5.7 Improper Integrals

5.7 Improper Integrals 458 pplictions of definite integrls 5.7 Improper Integrls In Section 5.4, we computed the work required to lift pylod of mss m from the surfce of moon of mss nd rdius R to height H bove the surfce of the

More information

CS103B Handout 18 Winter 2007 February 28, 2007 Finite Automata

CS103B Handout 18 Winter 2007 February 28, 2007 Finite Automata CS103B ndout 18 Winter 2007 Ferury 28, 2007 Finite Automt Initil text y Mggie Johnson. Introduction Severl childrens gmes fit the following description: Pieces re set up on plying ord; dice re thrown or

More information

Section 6.1 Definite Integral

Section 6.1 Definite Integral Section 6.1 Definite Integrl Suppose we wnt to find the re of region tht is not so nicely shped. For exmple, consider the function shown elow. The re elow the curve nd ove the x xis cnnot e determined

More information

5: The Definite Integral

5: The Definite Integral 5: The Definite Integrl 5.: Estimting with Finite Sums Consider moving oject its velocity (meters per second) t ny time (seconds) is given y v t = t+. Cn we use this informtion to determine the distnce

More information

Section 6: Area, Volume, and Average Value

Section 6: Area, Volume, and Average Value Chpter The Integrl Applied Clculus Section 6: Are, Volume, nd Averge Vlue Are We hve lredy used integrls to find the re etween the grph of function nd the horizontl xis. Integrls cn lso e used to find

More information

NUMERICAL INTEGRATION

NUMERICAL INTEGRATION NUMERICAL INTEGRATION How do we evlute I = f (x) dx By the fundmentl theorem of clculus, if F (x) is n ntiderivtive of f (x), then I = f (x) dx = F (x) b = F (b) F () However, in prctice most integrls

More information

12.1 Nondeterminism Nondeterministic Finite Automata. a a b ε. CS125 Lecture 12 Fall 2014

12.1 Nondeterminism Nondeterministic Finite Automata. a a b ε. CS125 Lecture 12 Fall 2014 CS125 Lecture 12 Fll 2014 12.1 Nondeterminism The ide of nondeterministic computtions is to llow our lgorithms to mke guesses, nd only require tht they ccept when the guesses re correct. For exmple, simple

More information

Suppose we want to find the area under the parabola and above the x axis, between the lines x = 2 and x = -2.

Suppose we want to find the area under the parabola and above the x axis, between the lines x = 2 and x = -2. Mth 43 Section 6. Section 6.: Definite Integrl Suppose we wnt to find the re of region tht is not so nicely shped. For exmple, consider the function shown elow. The re elow the curve nd ove the x xis cnnot

More information

expression simply by forming an OR of the ANDs of all input variables for which the output is

expression simply by forming an OR of the ANDs of all input variables for which the output is 2.4 Logic Minimiztion nd Krnugh Mps As we found ove, given truth tle, it is lwys possile to write down correct logic expression simply y forming n OR of the ANDs of ll input vriles for which the output

More information

2.4 Linear Inequalities and Interval Notation

2.4 Linear Inequalities and Interval Notation .4 Liner Inequlities nd Intervl Nottion We wnt to solve equtions tht hve n inequlity symol insted of n equl sign. There re four inequlity symols tht we will look t: Less thn , Less thn or

More information

Sufficient condition on noise correlations for scalable quantum computing

Sufficient condition on noise correlations for scalable quantum computing Sufficient condition on noise correltions for sclble quntum computing John Presill, 2 Februry 202 Is quntum computing sclble? The ccurcy threshold theorem for quntum computtion estblishes tht sclbility

More information

Random subgroups of a free group

Random subgroups of a free group Rndom sugroups of free group Frédérique Bssino LIPN - Lortoire d Informtique de Pris Nord, Université Pris 13 - CNRS Joint work with Armndo Mrtino, Cyril Nicud, Enric Ventur et Pscl Weil LIX My, 2015 Introduction

More information

10 Vector Integral Calculus

10 Vector Integral Calculus Vector Integrl lculus Vector integrl clculus extends integrls s known from clculus to integrls over curves ("line integrls"), surfces ("surfce integrls") nd solids ("volume integrls"). These integrls hve

More information

How do we solve these things, especially when they get complicated? How do we know when a system has a solution, and when is it unique?

How do we solve these things, especially when they get complicated? How do we know when a system has a solution, and when is it unique? XII. LINEAR ALGEBRA: SOLVING SYSTEMS OF EQUATIONS Tody we re going to tlk out solving systems of liner equtions. These re prolems tht give couple of equtions with couple of unknowns, like: 6= x + x 7=

More information