Point-Based POMDP Algorithms: Improved Analysis and Implementation
Trey Smith and Reid Simmons
Robotics Institute, Carnegie Mellon University, Pittsburgh, PA

Abstract

Existing complexity bounds for point-based POMDP value iteration algorithms focus either on the curse of dimensionality or the curse of history. We derive a new bound that relies on both and uses the concept of discounted reachability; our conclusions may help guide future algorithm design. We also discuss recent improvements to our (point-based) heuristic search value iteration algorithm. Our new implementation calculates tighter initial bounds, avoids solving linear programs, and makes more effective use of sparsity. Empirical results show speedups of more than two orders of magnitude.

1 INTRODUCTION

Partially observable Markov decision processes (POMDPs) constitute a powerful probabilistic model for planning problems that include hidden state and uncertainty in action effects. Recently, several POMDP solution algorithms have been developed that use approximate value iteration with point-based updates. These algorithms have proven to scale very effectively, relying on the fact that performing many fast approximate updates often results in a more useful value function than performing a few exact updates.

Point-based updates are applied over a set $B$ of beliefs drawn from the reachable part of the belief simplex. One can derive a bound on the approximation error that is proportional to the sample spacing of $B$ [Pineau et al., 2003]. The number of points required is driven by the curse of dimensionality: achieving a desired sample spacing requires a number of samples exponential in the dimensionality of the belief simplex.

However, in discounted problems, one can tolerate more approximation error at points that are only reachable after many time steps. This idea, which is not used in the sample spacing argument, is the basis of a second type of convergence result [Zhang and Zhang, 2001, Smith and Simmons, 2004], in which the error bound is derived from the fact that $B$ samples enough of the search tree to some depth.
The number of points required is driven by the curse of history: fully expanding the search tree to depth $t$ requires a number of points exponential in $t$.

This paper presents a new convergence argument that draws on both approaches. Our analysis applies to the case when the sample spacing varies according to what we call discounted reachability, which more accurately reflects the behavior of current algorithms (Fig. 1).

[Figure 1: Sampling strategies for $B$. (a) Uniform density over the reachable beliefs. (b) Non-uniform density reflecting discounted reachability from $b_0$.]

The remainder of the paper discusses recent improvements in our heuristic search value iteration algorithm (HSVI). HSVI is a point-based algorithm that maintains both upper and lower bounds on the optimal value function, allowing it to use effective heuristics for action and observation selection, and to provide provably small regret from the policy it generates [Smith and Simmons, 2004]. The new implementation of HSVI calculates tighter initial bounds, avoids solving linear programs, and makes more effective use of sparsity. Empirical results show speedups of more than two orders of magnitude.

2 POMDP INTRODUCTION

A POMDP models a planning problem that includes hidden state and uncertainty in action effects; the agent is assumed to know the transition model. Formally, a POMDP is described by a finite set of states $S = \{s_1, \ldots, s_{|S|}\}$,
a finite set of actions $A = \{a_1, \ldots, a_{|A|}\}$, a finite set of observations $Z = \{z_1, \ldots, z_{|Z|}\}$, transition probabilities $T^{a,z}(s_i, s_j) = \Pr(s_j \mid s_i, a, z)$, a real-valued reward function $R(s, a)$, a discount factor $\gamma < 1$, and an initial belief $b_0$.

The Markov property of the model ensures that the agent can use a probability distribution $b$ over current states as a sufficient statistic for the history of actions and observations. Geometrically, the space of beliefs is a simplex, denoted $\Delta$. At each stage of forward simulation the current belief can be updated based on the latest action $a$ and observation $z$ using the formula $\tau(b, a, z)$, defined so that the updated belief $b' = \tau(b, a, z)$ satisfies

$$b'(s') = \sum_{s} T^{a,z}(s, s')\, b(s) \tag{1}$$

Only a subset of $\Delta$ is reachable from $b_0$ through repeated applications of $\tau$; this subset is denoted $\bar{\Delta}$.

In general the object of the planning problem is to generate a policy $\pi$ that maximizes expected long-term reward:

$$J^{\pi}(b) = E\left[\, \sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \;\middle|\; b, \pi \right] \tag{2}$$

A globally optimal policy $\pi^*$ is known to exist when $\gamma < 1$ [Howard, 1960]. We are particularly interested in the focused approximation setting, in which one attempts to generate a policy $\hat{\pi}$ that minimizes regret $J^{\pi^*}(b_0) - J^{\hat{\pi}}(b_0)$ when executed starting from $b_0$.

A POMDP is often solved by approximating its optimal value function $V^* = J^{\pi^*}$. Any value function $V$ induces a policy $\pi_V$ in which actions are selected via one-step lookahead. The regret of the policy induced by an approximate value function $\hat{V}$ can be forced arbitrarily small by reducing the max-norm error $\|V^* - \hat{V}\|_\infty$.

Value iteration starts with an initial guess $V_0$ and approximates $V^*$ through repeated application of the Bellman update $V_t \leftarrow H V_{t-1}$, where $H$ is defined as

$$HV(b) = \max_{a} \left[ R(b, a) + \gamma \sum_{b'} \Pr(b' \mid b, a)\, V(b') \right] \tag{3}$$

$V^*$ satisfies Bellman's equation $V^* = HV^*$. When $\gamma < 1$, $H$ is a contraction and $V^*$ is the unique solution.

During value iteration, each $V_t$ is piecewise linear and convex [Sondik, 1971], so it can be represented as a set of vectors $\Gamma_t = \{\alpha_1, \ldots, \alpha_{|\Gamma_t|}\}$, such that $V_t(b) = \max_i (\alpha_i \cdot b)$. There are a number of value iteration algorithms that calculate $H$ exactly by projecting $\alpha$ vectors from $\Gamma_{t-1}$ to $\Gamma_t$ [Sondik, 1971, Cassandra et al., 1997].
Unfortunately, in the worst case the size of the representation grows as $|\Gamma_t| = |A|\,|\Gamma_{t-1}|^{|Z|}$, which rapidly becomes intractable even for modest problem sizes. Despite clever strategies for pruning dominated $\alpha$ vectors, these algorithms have been unable to scale to larger problems. The intractability of exact value iteration has led to the development of a wide variety of approximation techniques, too many to mention here [Aberdeen, 2002].

3 POINT-BASED ALGORITHMS

Point-based value iteration algorithms rely on the fact that performing many fast approximate updates often results in a more useful value function than performing a few exact updates. Their fundamental operation is the point-based update backup($\Gamma$, $b$), which generates a single $\alpha$ vector from $HV$ that is guaranteed to be maximal at $b$ (Alg. 1).

Algorithm 1. $\beta$ = backup($\Gamma$, $b$).
1. $\beta^{a,z} \leftarrow \arg\max_{\alpha \in \Gamma} \left(\alpha \cdot \tau(b, a, z)\right)$
2. $\beta^{a}(s) \leftarrow R(s, a) + \gamma \sum_{s', z} \beta^{a,z}(s')\, T^{a,z}(s, s')$
3. $\beta \leftarrow \arg\max_{\beta^{a}} \left(\beta^{a} \cdot b\right)$

Our analysis focuses on a simple conceptual version of point-based value iteration. We assume there is a fixed finite set of beliefs $B$. At each step the algorithm generates an $\alpha$ vector for every point in $B$, and the set of vectors $\Gamma$ defines a value function through max-projection as described earlier. Denote the value function after $t$ updates as $V^B_t$. The value function is initialized with $V^B_0 = R_{\min}$, and the update rule is $V^B_t \leftarrow H_B V^B_{t-1}$, where the update operator $H_B$ applies the point-based update at every point of $B$:

$$H_B \Gamma = \{\, \mathrm{backup}(\Gamma, b) \mid b \in B \,\} \tag{4}$$

In this case, the approximation error relative to exact value iteration after $t$ updates, $\|V_t - V^B_t\|_\infty$, is known to be bounded proportionally with the sample spacing $\delta(B)$, which is defined to be the maximum 1-norm distance from any point in $\bar{\Delta}$ to $B$ [Pineau et al., 2003]. $B$ thus needs to contain only enough points to cover $\bar{\Delta}$ with uniform sample spacing.

However, current point-based algorithms do not sample $\bar{\Delta}$ uniformly (although the PBVI algorithm makes some attempt to do so). Instead, they collect points for $B$ by a forward simulation process that biases $B$ to contain beliefs that are only a few simulation steps away from $b_0$. This bias arguably helps rather than hurts.
It underlies a second type of convergence argument based on the depth of the search tree. If $B$ contains all the beliefs that result from expanding the search tree to depth $t$ and $t$ updates over $B$ are performed, then the approximation error at $b_0$ is bounded proportionally with $\gamma^t$.
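The point-based update of Alg. 1 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the implementation discussed in this paper: the nested-list layout `T[a][z][s][s2]`, the helper names `dot` and `backup`, and the tiny two-state model are all hypothetical, and `T[a][z][s][s2]` is read here as the joint probability of moving from `s` to `s2` and observing `z`. Step 1 uses the unnormalized updated belief, since scaling a belief by a positive constant does not change which $\alpha$ vector maximizes the dot product.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def backup(Gamma, b, R, T, gamma):
    """Point-based update at belief b (sketch of Alg. 1).

    Gamma: list of alpha vectors (lists of floats).
    R[s][a]: immediate reward.
    T[a][z][s][s2]: assumed joint probability Pr(s2, z | s, a).
    """
    n_states = len(b)
    best_vec, best_val = None, float("-inf")
    for a in range(len(T)):
        beta_a = [R[s][a] for s in range(n_states)]
        for z in range(len(T[a])):
            # Unnormalized tau(b, a, z); normalization does not affect the argmax.
            bz = [sum(T[a][z][s][s2] * b[s] for s in range(n_states))
                  for s2 in range(n_states)]
            # Step 1: best existing alpha vector at the successor belief.
            beta_az = max(Gamma, key=lambda alpha: dot(alpha, bz))
            # Step 2: project that vector back through the model.
            for s in range(n_states):
                beta_a[s] += gamma * sum(beta_az[s2] * T[a][z][s][s2]
                                         for s2 in range(n_states))
        # Step 3: keep the action vector that is maximal at b.
        val = dot(beta_a, b)
        if val > best_val:
            best_vec, best_val = beta_a, val
    return best_vec


# Hypothetical model: 2 states, 2 actions, 1 observation, identity transitions.
identity = [[1.0, 0.0], [0.0, 1.0]]
T = [[identity], [identity]]
R = [[1.0, 0.0],   # state 0: action 0 pays 1
     [0.0, 1.0]]   # state 1: action 1 pays 1
alpha = backup([[0.0, 0.0]], [0.2, 0.8], R, T, 0.5)
print(alpha)
```

With the belief concentrated on state 1, the returned vector is the one generated by action 1, which is the maximal choice at that belief.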
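3.1 NEW THEORETICAL RESULTS

This section presents a new convergence argument that draws on the two earlier approaches. Its use of weighted max-norm machinery in value iteration is closely related to [Munos, 2004]. Our argument reflects current point-based algorithms in that it allows $B$ to be a non-uniform sampling of $\bar{\Delta}$ whose spacing varies according to discounted reachability.

The discounted reachability $\rho : \bar{\Delta} \to \mathbb{R}$ is defined to be $\rho(b) = \gamma^{L}$, where $L$ is the length of the shortest sequence of belief-state transitions from $b_0$ to $b$. $\rho$ satisfies the property that $\rho(b') \ge \gamma \rho(b)$ whenever there is a single-step transition from $b$ to $b'$.

Based on $\rho$, we define a generalized sample spacing measure $\delta_p$ (with $0 \le p < 1$):

$$\delta_p(B) = \max_{b \in \bar{\Delta}} \min_{b' \in B} \frac{\|b - b'\|_1}{[\rho(b)]^{-p}} \tag{5}$$

In order to achieve a small $\delta_p$ value, $B$ must have small 1-norm distance from all points in $\bar{\Delta}$, but its distance from $b$ can be proportionally larger if $[\rho(b)]^{p}$ is small (that is, if $[\rho(b)]^{-p}$ is large).

When sample spacing is bounded in terms of $\delta_p$, $H_B$ does not have the error properties we want under the usual max-norm. We must define a new norm to reflect the fact that $H_B$ induces larger errors where $\rho$ is small. A weighted max-norm is a function $\|\cdot\|_\xi$ such that

$$\|V - V'\|_\xi = \max_{b} \frac{|V(b) - V'(b)|}{\xi(b)} \tag{6}$$

where $\xi > 0$. Not surprisingly, $\|\cdot\|_{\rho^{-p}}$ is the norm we need. Note that when $p = 0$, $\delta_p$ reduces to the uniform spacing measure $\delta$ and $\|\cdot\|_{\rho^{-p}}$ reduces to the max-norm.

We begin by generalizing some well-known results about standard value iteration to the $\rho^{-p}$-norm with $0 \le p < 1$.

Theorem 1. The exact Bellman update $H$ is a contraction under the $\rho^{-p}$-norm with contraction factor $\gamma^{1-p}$.

Proof. Define

$$Q^{V}_{a}(b) = R(b, a) + \gamma \sum_{b'} \Pr(b' \mid b, a)\, V(b') \tag{7}$$

so that $HV = \max_a Q^{V}_{a}$. For any $a$, the mapping $V \mapsto Q^{V}_{a}$ has contraction factor $\gamma^{1-p}$:

\begin{align}
\|Q^{V}_{a} - Q^{V'}_{a}\|_{\rho^{-p}}
  &= \max_{b} \frac{|Q^{V}_{a}(b) - Q^{V'}_{a}(b)|}{[\rho(b)]^{-p}} \tag{8} \\
  &= \max_{b} \frac{\gamma \left| \sum_{b'} \Pr(b' \mid b, a) \left( V(b') - V'(b') \right) \right|}{[\rho(b)]^{-p}} \tag{9} \\
  &\le \max_{b}\; \gamma \sum_{b'} \Pr(b' \mid b, a)\, \frac{|V(b') - V'(b')|}{[\gamma^{-1} \rho(b')]^{-p}} \tag{10} \\
  &\le \max_{b}\; \gamma^{1-p} \sum_{b'} \Pr(b' \mid b, a)\, \|V - V'\|_{\rho^{-p}} \tag{11} \\
  &= \gamma^{1-p}\, \|V - V'\|_{\rho^{-p}} \tag{12}
\end{align}

Now choose an arbitrary $b$. Assume without loss of generality that $HV(b) \ge HV'(b)$. Choose $a^*$ to maximize $Q^{V}_{a}(b)$, and $\bar{a}$ to maximize $Q^{V'}_{a}(b)$.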
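It follows that $Q^{V}_{a^*}(b) \ge Q^{V'}_{\bar{a}}(b) \ge Q^{V'}_{a^*}(b)$, and

\begin{align}
|HV(b) - HV'(b)| &= Q^{V}_{a^*}(b) - Q^{V'}_{\bar{a}}(b) \tag{13} \\
  &\le Q^{V}_{a^*}(b) - Q^{V'}_{a^*}(b) \tag{14} \\
  &\le \max_{a} |Q^{V}_{a}(b) - Q^{V'}_{a}(b)| \tag{15}
\end{align}

Dividing through by $\rho^{-p}(b)$ and maximizing over $b$ yields

\begin{align}
\|HV - HV'\|_{\rho^{-p}} &\le \max_{a} \|Q^{V}_{a} - Q^{V'}_{a}\|_{\rho^{-p}} \tag{16} \\
  &\le \gamma^{1-p}\, \|V - V'\|_{\rho^{-p}}
\end{align}

Theorem 2. Let $\hat{\pi}$ be the one-step lookahead policy induced by an approximate value function $\hat{V}$. The regret from executing $\hat{\pi}$ rather than $\pi^*$, starting from $b_0$, is at most

$$\frac{2\gamma^{1-p}}{1 - \gamma^{1-p}}\, \|V^* - \hat{V}\|_{\rho^{-p}} \tag{17}$$

Proof. Choose an arbitrary $b$. It is easy to check that for any policy $\pi$, $J^{\pi}(b) = Q^{J^{\pi}}_{\pi(b)}(b)$. Also, because $\hat{\pi}$ is the one-step lookahead policy induced by $\hat{V}$, $Q^{\hat{V}}_{\hat{\pi}(b)}(b) = H\hat{V}(b)$. The Bellman equation states that $V^* = HV^*$. Then:

\begin{align}
J^{\pi^*}(b) - J^{\hat{\pi}}(b)
  &= V^*(b) - Q^{J^{\hat{\pi}}}_{\hat{\pi}(b)}(b) \tag{18} \\
  &= V^*(b) - Q^{\hat{V}}_{\hat{\pi}(b)}(b) + Q^{\hat{V}}_{\hat{\pi}(b)}(b) - Q^{J^{\hat{\pi}}}_{\hat{\pi}(b)}(b) \tag{19} \\
  &\le \left| V^*(b) - Q^{\hat{V}}_{\hat{\pi}(b)}(b) \right| + \left| Q^{\hat{V}}_{\hat{\pi}(b)}(b) - Q^{J^{\hat{\pi}}}_{\hat{\pi}(b)}(b) \right| \tag{20} \\
  &\le \left| HV^*(b) - H\hat{V}(b) \right| + \gamma \sum_{b'} \Pr(b' \mid b, \hat{\pi}(b)) \left| \hat{V}(b') - J^{\hat{\pi}}(b') \right| \tag{21} \\
  &\le \left| HV^*(b) - H\hat{V}(b) \right| + \gamma \sum_{b'} \Pr(b' \mid b, \hat{\pi}(b))\, \gamma^{-p} \rho^{-p}(b)\, \|\hat{V} - J^{\hat{\pi}}\|_{\rho^{-p}} \tag{22} \\
  &= \left| HV^*(b) - H\hat{V}(b) \right| + \gamma^{1-p} \rho^{-p}(b)\, \|\hat{V} - J^{\hat{\pi}}\|_{\rho^{-p}} \tag{23}
\end{align}

Dividing through by $\rho^{-p}(b)$ and maximizing over $b$ gives

\begin{align}
&\|J^{\pi^*} - J^{\hat{\pi}}\|_{\rho^{-p}} \tag{24} \\
&\quad\le \|HV^* - H\hat{V}\|_{\rho^{-p}} + \gamma^{1-p}\, \|\hat{V} - J^{\hat{\pi}}\|_{\rho^{-p}} \tag{25} \\
&\quad\le \gamma^{1-p} \left( \|V^* - \hat{V}\|_{\rho^{-p}} + \|\hat{V} - J^{\hat{\pi}}\|_{\rho^{-p}} \right) \tag{26} \\
&\quad\le \gamma^{1-p} \left( \|V^* - \hat{V}\|_{\rho^{-p}} \right. \tag{27} \\
&\qquad\qquad \left. +\, \|\hat{V} - V^*\|_{\rho^{-p}} + \|V^* - J^{\hat{\pi}}\|_{\rho^{-p}} \right) \tag{28} \\
&\quad= \gamma^{1-p} \left( 2\|V^* - \hat{V}\|_{\rho^{-p}} + \|V^* - J^{\hat{\pi}}\|_{\rho^{-p}} \right) \tag{29} \\
&\quad= \gamma^{1-p} \left( 2\|V^* - \hat{V}\|_{\rho^{-p}} + \|J^{\pi^*} - J^{\hat{\pi}}\|_{\rho^{-p}} \right) \tag{30} \\
&\quad= 2\gamma^{1-p}\, \|V^* - \hat{V}\|_{\rho^{-p}} + \gamma^{1-p}\, \|J^{\pi^*} - J^{\hat{\pi}}\|_{\rho^{-p}} \tag{31}
\end{align}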
Solving the recursion,

$$\|J^{\pi^*} - J^{\hat{\pi}}\|_{\rho^{-p}} \le \frac{2\gamma^{1-p}}{1 - \gamma^{1-p}}\, \|V^* - \hat{V}\|_{\rho^{-p}} \tag{32}$$

And since $\rho(b_0) = 1$, we have the desired regret bound:

$$J^{\pi^*}(b_0) - J^{\hat{\pi}}(b_0) \le \frac{2\gamma^{1-p}}{1 - \gamma^{1-p}}\, \|V^* - \hat{V}\|_{\rho^{-p}}$$

It is worth noting (although we lack space to prove it here) that a tighter bound applies when $\hat{V}$ is uniformly improvable [Zhang and Zhang, 2001]. A small modification to $H_B$ would make $V^B_t$ uniformly improvable at the cost of increasing $|\Gamma|$. In that case the regret would be at most $\gamma^{1-p}\, \|V^* - \hat{V}\|_{\rho^{-p}}$.

Having discussed the $\rho^{-p}$-norm behavior of $H$, now we move on to the $\rho^{-p}$-norm behavior of $H_B$ with non-uniform sample spacing $\delta_p$.

Lemma 1. At any update step $t$, the error $\|HV^B_t - H_B V^B_t\|_{\rho^{-p}}$ introduced by a single application of $H_B$ rather than $H$ is at most

$$\frac{(R_{\max} - R_{\min})\, \delta_p(B)}{1 - \gamma^{1-p}} \tag{33}$$

Proof. The argument is analogous to Lemma 1 of [Pineau et al., 2003]. Necessary changes: (1) restrict $b'$ to be drawn from $\bar{\Delta}$, (2) divide throughout by $\rho^{-p}(b')$, and (3) substitute $\gamma^{1-p}$ for $\gamma$ in the denominator to reflect the changed contraction properties of $H$ under the new norm.

Theorem 3. At any update step $t$, the accumulated error $\|V_t - V^B_t\|_{\rho^{-p}}$ is at most

$$\frac{(R_{\max} - R_{\min})\, \delta_p(B)}{(1 - \gamma^{1-p})^2} \tag{34}$$

Proof. The argument is analogous to Theorem 1 of [Pineau et al., 2003]. Necessary changes: (1) replace the max-norm with the $\rho^{-p}$-norm, and (2) replace $\gamma$ with $\gamma^{1-p}$.

Taken together, these results show that the conceptual algorithm can be used to generate a policy with arbitrarily small regret related to $\delta_p(B)$, and they provide a finite bound on the number of updates required to achieve a given regret.

3.2 IMPLICATIONS FOR ALGORITHM DESIGN

The bias of our model toward beliefs with high discounted reachability describes current algorithms more accurately than uniform sampling, at least to the extent that the algorithms perform (typically shallow) forward exploration from the initial belief to generate $B$.

The parameter $p$ arose naturally during our analysis. $p = 0$ corresponds to uniform sampling and the usual max-norm. As $p$ increases, samples grow less dense in areas with low reachability and the norm becomes correspondingly more tolerant.
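To make the role of $p$ concrete, the following Python sketch evaluates the generalized sample spacing on a toy problem. It rests on several assumptions that are not part of the analysis above: the reachable set is replaced by a short hypothetical list of (belief, depth) pairs, the function names `l1` and `delta_p` are invented for the example, and Eq. (5) is read as weighting each 1-norm distance by $[\rho(b)]^{p}$, which is the same as dividing by $[\rho(b)]^{-p}$.

```python
def l1(u, v):
    """1-norm distance between two beliefs."""
    return sum(abs(x - y) for x, y in zip(u, v))


def delta_p(reachable, B, gamma, p):
    """Generalized sample spacing over a finite stand-in for the reachable set.

    reachable: list of (belief, depth) pairs, where depth is the length of the
               shortest transition sequence from b0 (so rho = gamma ** depth).
    B: list of sampled beliefs.
    """
    worst = 0.0
    for b, depth in reachable:
        rho = gamma ** depth
        nearest = min(l1(b, b2) for b2 in B)
        # Distances at beliefs with low reachability are scaled down by rho^p.
        worst = max(worst, (rho ** p) * nearest)
    return worst


# Hypothetical example: the deep belief [0, 1] is not covered by B.
reachable = [([1.0, 0.0], 0), ([0.5, 0.5], 1), ([0.0, 1.0], 4)]
B = [[1.0, 0.0], [0.5, 0.5]]
gamma = 0.5

d_uniform = delta_p(reachable, B, gamma, 0.0)
d_half = delta_p(reachable, B, gamma, 0.5)
print(d_uniform, d_half)
```

In this toy case the uncovered belief sits four steps from $b_0$, so it dominates the uniform spacing ($p = 0$) but contributes far less once its distance is weighted by $[\rho(b)]^{p}$.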
But the results show that there is no free lunch: the higher effective discount factor $\gamma^{1-p}$ under the new norm means that more updates are required and the final error bounds are looser. The new theoretical framework provides a way to analyze this trade-off.

We initially found the concept of discounted reachability surprising. The intuition is that (1) beliefs that are deeper in the search tree are less relevant, and (2) beliefs that can only be reached by low-probability belief transitions are less relevant. But discounted reachability ignores (2) entirely, in that all transitions with non-zero probability are treated equally.

Actually, we started with a different concept of discounted occupancy, in which beliefs are tagged as proportionally less relevant if they can only be reached by low-probability belief transitions. The bias of current algorithms seems to be better described by discounted occupancy, and empirically, treating all transitions with non-zero probability equally hurts performance. But the convergence results we found do not go through when discounted occupancy is used instead of discounted reachability. We hope that more sophisticated future analysis will shed light on this issue.

In summary, these new results take us closer to understanding point-based algorithms. The analysis helps explain important trade-offs in algorithm design, although we have not yet had time to apply it to a working algorithm. The next section changes the topic to recent improvements in our (point-based) HSVI algorithm. Note that those improvements are not based on the theoretical results just presented.

4 IMPROVEMENTS IN HEURISTIC SEARCH VALUE ITERATION

This section discusses recent improvements in our heuristic search value iteration algorithm (HSVI). Relative to our original presentation of HSVI, the new implementation calculates tighter initial bounds, avoids solving linear programs, and makes more effective use of sparsity. Empirical results show speedups of more than two orders of magnitude.
4.1 HSVI OVERVIEW

HSVI is a point-based algorithm that maintains both upper and lower bounds on the optimal value function, allowing it to use effective heuristics for action and observation selection, and to provide provably small regret from the policy it generates. We provide a brief overview here; for more detail refer to [Smith and Simmons, 2004]. We refer to the original version and our current version as HSVI1 and HSVI2, respectively. The differences are covered comprehensively in §4.2.

HSVI is outlined in Algs. 2 and 3. We denote the lower and upper bound functions as $\underline{V}$ and $\bar{V}$, respectively. The interval function $\hat{V}$ refers to them collectively, such that $\hat{V}(b) = [\underline{V}(b), \bar{V}(b)]$ and $\mathrm{width}(\hat{V}(b)) = \bar{V}(b) - \underline{V}(b)$.

Algorithm 2. $\pi$ = HSVI($\epsilon$).
HSVI($\epsilon$) returns a policy $\pi$ whose regret relative to $\pi^*$, starting from $b_0$, is at most $\epsilon$.
1. Initialize the bounds $\hat{V}$.
2. While $\mathrm{width}(\hat{V}(b_0)) > \epsilon$, repeatedly invoke explore($b_0$, $\epsilon$, 0).
3. Having achieved the desired precision, return the direct-control policy $\pi$ corresponding to the lower bound.

Algorithm 3. explore($b$, $\epsilon$, $t$).
explore recursively follows a single path down the search tree until satisfying a termination condition based on the width of the bounds interval. It then performs a series of updates on its way back up to the initial belief.
1. If $\mathrm{width}(\hat{V}(b)) \le \epsilon \gamma^{-t}$, return.
2. Select an action $a^*$ and observation $z^*$ according to the forward exploration heuristics.
3. Call explore($\tau(b, a^*, z^*)$, $\epsilon$, $t + 1$).
4. Perform a point-based update of $\hat{V}$ at belief $b$.

4.1.1 Value Function Representation

HSVI uses the usual $\Gamma$ vector set representation for its lower bound (see §2). Unfortunately, if the upper bound is represented with a vector set, updating by adding a vector does not have the desired effect of improving the bound in the neighborhood of the local update. To accommodate the need for updates, HSVI uses a point set representation for the upper bound. The value at a point $b$ is the projection of $b$ onto the convex hull formed by a finite set $\Upsilon$ of belief/value points $(b_i, \bar{v}_i)$. Updates are performed by adding a new point to the set. In HSVI1, the projection onto the convex hull is calculated by solving a linear program using the commercial CPLEX software package.

4.1.2 Initialization

HSVI1 initializes the lower bound using a conservative estimate of the values of blind policies of the form "always execute action $a$". The smallest possible reward from executing action $a$ is $\min_s R(s, a)$, so a bound on the long-term reward for that policy can be found by evaluating the relevant summation. HSVI1 then maximizes over $a$:

$$\underline{R} = \max_{a} \sum_{t=0}^{\infty} \gamma^{t} \min_{s} R(s, a) = \max_{a} \frac{\min_{s} R(s, a)}{1 - \gamma} \tag{35}$$

The vector set for the initial lower bound $\underline{V}_0$ contains a single vector $\alpha$ such that every $\alpha(s) = \underline{R}$. HSVI1 initializes the upper bound by assuming full observability and solving the MDP version of the problem. This provides upper bound values at the corners of the belief simplex, which form the initial point set.

4.1.3 Local Updates

HSVI performs a local update $L_b$ of the lower bound by adding the result of a point-based update at $b$ to the vector set:

$$L_b \Gamma = \Gamma \cup \{\mathrm{backup}(\Gamma, b)\} \tag{36}$$

It performs a local update $U_b$ of the upper bound by adding the result of a Bellman update at $b$ to the point set:

$$U_b \Upsilon = \Upsilon \cup \{(b, H\bar{V}(b))\} \tag{37}$$

Fig. 2 represents the structure of the bounds representations and the process of locally updating at $b$. In the left side of the figure, the points and dotted lines represent $\bar{V}$ (upper bound points and convex hull). Several solid lines represent the vectors of $\Gamma$. In the right side of the figure, we see the result of updating both bounds at $b$, which involves adding a new point to $\Upsilon$ and a new vector to $\Gamma$, bringing both bounds closer to $V^*$.

[Figure 2: Local update at $b$, showing $\bar{V}$, $V^*$, and $\underline{V}$ before and after the update.]

HSVI periodically prunes dominated elements in both the lower bound vector set and the upper bound point set; we do not discuss the pruning here because it is unaffected by our recent changes.

4.1.4 Forward Exploration Heuristics

This section discusses the heuristics that are used to decide which child of the current node to visit as HSVI works its way forward from the initial belief. Starting from parent node $b$, HSVI must choose an action $a^*$ and an observation $z^*$: the child node to visit is $\tau(b, a^*, z^*)$.
HSVI selects actions greedily based on the upper bound (the IE-MAX heuristic). At a belief $b$, for every action $a$, it can compute an upper bound on the long-term reward from taking that action. It chooses the action with the highest upper bound:

$$a^* = \arg\max_{a} Q^{\bar{V}}_{a}(b) \tag{38}$$

Because the bounds at a parent node are always wider than the bounds at the child with the highest upper bound value, choosing according to IE-MAX is a good way to ensure convergence. In the simpler context where updates do not affect neighboring nodes, it is provably optimal [Kaelbling, 1993].

HSVI uses the weighted excess uncertainty heuristic for observation selection. Excess uncertainty at a belief $b$ with depth $t$ in the search tree is defined to be

$$\mathrm{excess}(b, t) = \mathrm{width}(\hat{V}(b)) - \epsilon \gamma^{-t} \tag{39}$$

Excess uncertainty has the property that if all the children of a node $b$ have negative excess uncertainty, then after an update $b$ will also have negative excess uncertainty. Negative excess uncertainty at the root implies the desired convergence to within $\epsilon$. The weighted excess uncertainty heuristic is designed to focus attention on the child node with the greatest contribution to the excess uncertainty at the parent:

$$z^* = \arg\max_{z} \left[ \Pr(z \mid b, a^*)\, \mathrm{excess}(\tau(b, a^*, z), t + 1) \right] \tag{40}$$

Both the action and observation selection heuristics are designed so that applying them systematically guarantees HSVI convergence in finite time [Smith and Simmons, 2004].

4.2 CHANGES BETWEEN HSVI1 AND HSVI2

We report a series of changes made since our initial presentation of HSVI1. The changes are roughly ordered in terms of their impact on the overall performance. The relative speedup for individual changes is problem-dependent; the reported values were measured informally on the Tag problem. HSVI2 performance is presented in §4.3.

4.2.1 More Effective Use of Sparsity

HSVI1 represents beliefs and transition functions as vectors and matrices in BLAS compressed storage mode [Dongarra et al., 1988]. It uses an off-the-shelf sparse linear algebra package to compute belief transitions and take dot products. That package turned out to be using inappropriate algorithms, slowing down individual operations by as much as 100x.
We addressed the problem in HSVI2 by writing our own simple compressed storage operations, which speed up lower bound updates by about 50x.

HSVI1 represents $\alpha$ vectors in dense storage mode because they tend to have a large number of non-zeros, even when beliefs are sparse. Typically, when $\alpha \leftarrow \mathrm{backup}(\Gamma, b)$ is applied, all of the entries of $\alpha$ must be computed, even if $b$ is sparse and most of the entries have no effect on the value $\alpha \cdot b$. They are required because HSVI may later need to evaluate $\alpha \cdot b'$ where $b'$ has different non-zeros. But if $\alpha$ is optimized for $b$, why should we expect it to be relevant to $b'$, which has different non-zeros and perhaps no overlap with $b$ at all?

This leads to the idea of masked $\alpha$ vectors. In HSVI2, $\alpha \leftarrow \mathrm{backup}(\Gamma, b)$ computes only the entries of $\alpha$ that correspond to non-zeros of $b$. A mask records which entries were computed. If HSVI2 later evaluates $\max_i (\alpha_i \cdot b')$ and $b'$ has a non-zero in a position that was not computed in $\alpha_i$, the dot product $\alpha_i \cdot b'$ is rejected from consideration.

This change can be interpreted geometrically. Sparse beliefs lie in hyperplanes on the boundary of the belief simplex. When a masked $\alpha$ vector is computed using the new backup($\Gamma$, $b$), it applies only to the lowest-dimensional boundary hyperplane containing $b$. Empirically, masked $\alpha$ vectors speed up lower bound updates by about 5x. Note that almost any POMDP value iteration algorithm could make use of this concept.

4.2.2 Avoid Solving Linear Programs

HSVI1 evaluates $\bar{V}(b)$ by computing the exact projection of $b$ onto the convex hull of the points in $\Upsilon$, which involves solving a linear program with the commercial CPLEX software package. Each upper bound update requires several such projections, and the time spent solving linear programs dominates the upper bound update time.

HSVI2 uses an approximate projection onto the convex hull suggested by [Hauskrecht, 2000]. Projection onto the convex hull of a set of points is particularly simple when the set contains only the corners of the belief simplex and one interior point: it can be computed in $O(|S|)$ time.
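The single-interior-point case can be written out directly. The Python sketch below uses the interpolation commonly called the "sawtooth" approximation, which is assumed here to be the construction referred to above; the function name `project_one_point`, the argument layout, and the example numbers are hypothetical and are not taken from HSVI2. It also assumes the interior value lies on or below the corner interpolation, as it would for a useful upper bound point.

```python
def project_one_point(b, corner, b_i, v_i):
    """Upper bound at belief b from corner values plus one interior point.

    corner[s]: upper bound value at the simplex corner for state s.
    (b_i, v_i): interior belief and its upper bound value, assumed to satisfy
                v_i <= sum_s b_i[s] * corner[s].
    Runs in time linear in the number of states.
    """
    # Interpolation using the corners only, evaluated at b and at b_i.
    base_b = sum(p * c for p, c in zip(b, corner))
    base_i = sum(p * c for p, c in zip(b_i, corner))
    # Largest weight that can be put on b_i while the leftover corner
    # weights b[s] - weight * b_i[s] all stay non-negative.
    weight = min(b[s] / b_i[s] for s in range(len(b)) if b_i[s] > 0.0)
    return base_b + weight * (v_i - base_i)


# Hypothetical numbers: both corners have value 10, the midpoint has value 6.
corner = [10.0, 10.0]
b_i, v_i = [0.5, 0.5], 6.0

at_point = project_one_point([0.5, 0.5], corner, b_i, v_i)
at_corner = project_one_point([1.0, 0.0], corner, b_i, v_i)
between = project_one_point([0.75, 0.25], corner, b_i, v_i)
print(at_point, at_corner, between)
```

The result reproduces the stored value at the interior point, the corner value at a corner, and a linear blend in between, which is the behavior one would expect from a projection onto this small convex hull.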
To approximately project onto the overall convex hull, HSVI2 runs this operation for each interior point of $\Upsilon$ and takes the minimum value, requiring $O(|\Upsilon|\,|S|)$ time overall (or less with sparsity). This approximate convex hull has the key properties that (1) it is everywhere greater than the exact convex hull, and (2) the approximation at $b$ is exact if there is an undominated pair $(b, \bar{v}) \in \Upsilon$. Empirically, the approximate projection speeds up upper bound updates by about 100x.

4.2.3 Tighter Initial Bounds

HSVI1 generates an initial lower bound based on a conservative estimate of the values of blind policies. HSVI2 uses a better blind policy value estimate suggested in [Hauskrecht, 1997]. The value $\alpha^{a}$ of each policy "always
take action $a$" is updated in MDP fashion:

$$\alpha^{a}_{t+1}(s) = R(s, a) + \gamma \sum_{s'} \Pr(s' \mid s, a)\, \alpha^{a}_{t}(s') \tag{41}$$

Each update of $|A|$ vectors can be evaluated in $O(|S|^2 |A|)$ time. HSVI2 initializes the vectors $\alpha^{a}_{0}$ using the HSVI1 lower bound, which guarantees that the bound is valid even if the iteration is not run to completion. When the iteration is stopped, the $\alpha^{a}_{t}$ vectors form the initial lower bound $\Gamma$.

HSVI1 generates an initial upper bound based on the value function of the fully observable MDP. HSVI2 uses the fast informed bound (FIB) approximation, which is guaranteed to give a tighter upper bound than the MDP approximation [Hauskrecht, 2000]. FIB iteration keeps one vector $\alpha^{a}$ for each action and uses the following update rule:

$$\alpha^{a}_{t+1}(s) = R(s, a) + \gamma \sum_{z} \max_{a'} \sum_{s'} \Pr(s', z \mid s, a)\, \alpha^{a'}_{t}(s')$$

Each FIB update can be evaluated in $O(|A|^2 |S|^2 |Z|)$ time. As with the lower bound, HSVI2 initializes the upper bound vectors $\alpha^{a}_{0}$ using the HSVI1 upper bound. When FIB iteration is stopped, each corner point corresponding to a state $s$ is initialized to $\max_a \alpha^{a}_{t}(s)$.

Empirically, HSVI2 can run both bound initialization routines to approximate convergence (residual $< 10^{-3}$) in at most a few seconds for all of the problems in our benchmark set. This results in better performance near the beginning of HSVI2 execution, although later in the run the effect is less significant. The change in the lower bound initialization is the more important of the two; the MDP upper bound was already fairly good for most problems.

4.3 HSVI2 PERFORMANCE

Fig. 3 shows HSVI2 reward vs. time for four problems from the scalable POMDP literature. The plotted reward is the average received over 100 or more simulations. We also plot HSVI2's bounds $\underline{V}(b_0)$ and $\bar{V}(b_0)$. HSVI2 was run only once on each problem since it is not stochastic. The platform used was a Pentium-4 running at 3.4 GHz, with 2 GB of RAM (HSVI2 used at most 250 MB of RAM).

The plots show a range of behaviors. RockSample[4,4] is especially easy; the HSVI2 bounds converge after 13 seconds, showing that the solution is optimal.
Hallway2 shows HSVI2 quickly arriving at an apparently near-optimal solution, but its bounds remain loose. Tag and RockSample[10,10] show typical behavior for large problems: the upper bound decrease is slow and steady while the lower bound (and the reward) improve in jumps, plateauing for long periods. RockSample[10,10], with $> 10^5$ states, would be too large for most POMDP algorithms to handle; HSVI2 gains by use of sparsity. It would run out of memory with a problem 5-10 times larger.

[Figure 3: HSVI2 reward vs. wallclock time. Panels: RockSample[4,4] (257s 9a 2o), Hallway2 (93s 5a 17o), Tag (870s 5a 30o), RockSample[10,10] (102,401s 19a 2o). Each panel plots the simulation reward and the bounds.]

Fig. 4 shows running times and solution quality for HSVI and several other algorithms. Note that different algorithms were run on different platforms, so running times are only roughly comparable. The table also shows, for each problem, the 95% confidence interval for reward measurements assuming the variance of HSVI2's best policy and averaging 100 rewards. An algorithm's reward is starred if it is within the confidence interval relative to the best reported value. HSVI2 is within measurement error of the best reported reward for all problems, and its running time is considerably shorter than other algorithms in most cases. The greatest speedup from HSVI1 to HSVI2 was observed on the RockSample[7,8] problem. HSVI2 takes about 6 seconds to surpass the reward reached by HSVI1 after $> 10^4$ seconds. After correcting for running on a processor about 5x faster, this is a $> 300$x speedup.

Other state-of-the-art scalable POMDP algorithms could not be compared to HSVI2 because they were tested on different problems. Among these, two techniques appear especially promising. Exponential-family PCA transforms the POMDP, compresses to a low-dimensional representation in the transformed space, then solves it with a grid-based algorithm. It has demonstrated good results on large-scale robot navigation problems [Roy and Gordon, 2003]. Value-directed compression (VDC) is another compression technique.
It typically produces a less compact representation than E-PCA, but the compressed POMDP retains linear structure and value function convexity, so that it can be solved using almost any POMDP algorithm. The combinations VDC+BPI and VDC+PBVI have demonstrated scalability to huge problem sizes, up to 33 million states [Poupart and Boutilier, 2004].(1) VDC would likely boost HSVI scalability in a similar way.

(1) VDC+PBVI results courtesy of Poupart, personal communication.

Figure 4: Multi-algorithm performance comparison. The original table has the columns Problem (states/actions/observations), Reward, Time (s), and |Γ|; only the entries listed here are recoverable. A starred reward is within the confidence interval of the best reported value.

Tiger-Grid (36s 5a 17o) (±0.14)
  HSVI1 [Smith et al., 2004]: 2.35*
  Perseus [Spaan et al., 2004]: 2.34*
  HSVI2: 2.30*
  PBUA [Poon, 2001]: 2.30*
  PBVI [Pineau et al., 2003]: 2.25*
  BPI [Poupart et al., 2003]: 2.22*
  QMDP: N/A

Hallway (61s 5a 21o) (±0.038)
  PBVI [Pineau et al., 2003]: 0.53*
  PBUA [Poon, 2001]: 0.53*
  HSVI2: 0.52*
  HSVI1 [Smith et al., 2004]: 0.52*
  Perseus [Spaan et al., 2004]: 0.51*
  BPI [Poupart et al., 2003]: 0.51*
  QMDP: N/A

Hallway2 (93s 5a 17o) (±0.048)
  HSVI2: 0.35*
  Perseus [Spaan et al., 2004]: 0.35*
  HSVI1 [Smith et al., 2004]: 0.35*
  PBUA [Poon, 2001]: 0.35*
  PBVI [Pineau et al., 2003]: 0.34*
  BPI [Poupart et al., 2004]: 0.32*
  QMDP: N/A

Tag (870s 5a 30o) (±1.2)
  Perseus [Spaan et al., 2004]: -6.17*
  HSVI2: -6.36*
  HSVI1 [Smith et al., 2004]: -6.37*
  BPI [Poupart et al., 2004]: -6.65*
  PBVI [Pineau et al., 2003]
  QMDP: N/A

RockSample[4,4] (257s 9a 2o) (±1.2)
  HSVI2: 18.0*
  HSVI1 [Smith et al., 2004]: 18.0*
  PBVI [Pineau, pers. communication]: 17.1* 2000?
  QMDP: N/A

RockSample[7,8] (12,545s 13a 2o) (±1.2)
  HSVI2: 20.6*
  HSVI1 [Smith et al., 2004]
  QMDP: N/A

RockSample[10,10] (102,401s 19a 2o) (±1.3)
  HSVI2: 20.4*
  QMDP: 0 57 N/A

5 CONCLUSION

We presented new theoretical results for point-based algorithms, which combine curse of dimensionality and curse of history arguments into an overall bound on the convergence of point-based value iteration with non-uniform sample spacing. In the future we will apply these results to point-based algorithm design.

We also demonstrated improved performance for our HSVI algorithm, with speedups of more than two orders of magnitude and successful scaling to a POMDP with $> 10^5$ states. In the future we would like to combine HSVI with a compact representation technique such as VDC to deal with still larger problems.

Acknowledgments

Thanks to Geoff Gordon and Pascal Poupart for helpful discussions. This work was funded in part by a NASA GSRP Fellowship with Ames Research Center.

References

[Aberdeen, 2002] Aberdeen, D. (2002). A survey of approximate methods for solving partially observable Markov decision processes. Technical report, Research School of Information Science and Engineering, Australia National University.

[Cassandra et al., 1997] Cassandra, A., Littman, M., and Zhang, N. (1997). Incremental pruning: A simple, fast, exact method for partially observable Markov decision processes. In Proc. of UAI.

[Dongarra et al., 1988] Dongarra, J. J., Croz, J. D., Hammarling, S., and Hanson, R. J. (1988). An extended set of FORTRAN basic linear algebra subprograms. ACM Trans. Math. Soft., 14:1-17.

[Hauskrecht, 1997] Hauskrecht, M. (1997). Incremental methods for computing bounds in partially observable Markov decision processes. In Proc. of AAAI, Providence, RI.

[Hauskrecht, 2000] Hauskrecht, M. (2000). Value-function approximations for partially observable Markov decision processes. Journal of Artificial Intelligence Research, 13.

[Howard, 1960] Howard, R. A. (1960). Dynamic Programming and Markov Processes. MIT.

[Kaelbling, 1993] Kaelbling, L. P. (1993). Learning in Embedded Systems. The MIT Press.

[Munos, 2004] Munos, R. (2004). Error bounds for approximate value iteration. Technical Report CMAP 527, École Polytechnique.

[Pineau et al., 2003] Pineau, J., Gordon, G., and Thrun, S. (2003). Point-based value iteration: An anytime algorithm for POMDPs. In Proc. of IJCAI.

[Poon, 2001] Poon, K.-M. (2001). A fast heuristic algorithm for decision-theoretic planning. Master's thesis, The Hong Kong University of Science and Technology.

[Poupart and Boutilier, 2003] Poupart, P. and Boutilier, C. (2003). Bounded finite state controllers. In Proc. of NIPS, Vancouver.

[Poupart and Boutilier, 2004] Poupart, P. and Boutilier, C. (2004). VDCBPI: an approximate scalable algorithm for large scale POMDPs. In Proc. of NIPS, Vancouver.

[Roy and Gordon, 2003] Roy, N. and Gordon, G. (2003). Exponential family PCA for belief compression in POMDPs. In NIPS.

[Smith and Simmons, 2004] Smith, T. and Simmons, R. (2004). Heuristic search value iteration for POMDPs. In Proc. of UAI.

[Sondik, 1971] Sondik, E. J. (1971). The optimal control of partially observable Markov processes. PhD thesis, Stanford University.

[Zhang and Zhang, 2001] Zhang, N. L. and Zhang, W. (2001). Speeding up the convergence of value iteration in partially observable Markov decision processes. Journal of AI Research, 14.
More informationDiscrete Mathematics and Probability Theory Spring 2013 Anant Sahai Lecture 17
EECS 70 Discrete Mthemtics nd Proility Theory Spring 2013 Annt Shi Lecture 17 I.I.D. Rndom Vriles Estimting the is of coin Question: We wnt to estimte the proportion p of Democrts in the US popultion,
More information1 Online Learning and Regret Minimization
2.997 Decision-Mking in Lrge-Scle Systems My 10 MIT, Spring 2004 Hndout #29 Lecture Note 24 1 Online Lerning nd Regret Minimiztion In this lecture, we consider the problem of sequentil decision mking in
More informationDiscrete Mathematics and Probability Theory Summer 2014 James Cook Note 17
CS 70 Discrete Mthemtics nd Proility Theory Summer 2014 Jmes Cook Note 17 I.I.D. Rndom Vriles Estimting the is of coin Question: We wnt to estimte the proportion p of Democrts in the US popultion, y tking
More informationGenetic Programming. Outline. Evolutionary Strategies. Evolutionary strategies Genetic programming Summary
Outline Genetic Progrmming Evolutionry strtegies Genetic progrmming Summry Bsed on the mteril provided y Professor Michel Negnevitsky Evolutionry Strtegies An pproch simulting nturl evolution ws proposed
More informationBases for Vector Spaces
Bses for Vector Spces 2-26-25 A set is independent if, roughly speking, there is no redundncy in the set: You cn t uild ny vector in the set s liner comintion of the others A set spns if you cn uild everything
More informationCS 188: Artificial Intelligence Spring 2007
CS 188: Artificil Intelligence Spring 2007 Lecture 3: Queue-Bsed Serch 1/23/2007 Srini Nrynn UC Berkeley Mny slides over the course dpted from Dn Klein, Sturt Russell or Andrew Moore Announcements Assignment
More informationAdministrivia CSE 190: Reinforcement Learning: An Introduction
Administrivi CSE 190: Reinforcement Lerning: An Introduction Any emil sent to me bout the course should hve CSE 190 in the subject line! Chpter 4: Dynmic Progrmming Acknowledgment: A good number of these
More information1B40 Practical Skills
B40 Prcticl Skills Comining uncertinties from severl quntities error propgtion We usully encounter situtions where the result of n experiment is given in terms of two (or more) quntities. We then need
More informationMath 1B, lecture 4: Error bounds for numerical methods
Mth B, lecture 4: Error bounds for numericl methods Nthn Pflueger 4 September 0 Introduction The five numericl methods descried in the previous lecture ll operte by the sme principle: they pproximte the
More informationTHE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS.
THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS RADON ROSBOROUGH https://intuitiveexplntionscom/picrd-lindelof-theorem/ This document is proof of the existence-uniqueness theorem
More informationThe Minimum Label Spanning Tree Problem: Illustrating the Utility of Genetic Algorithms
The Minimum Lel Spnning Tree Prolem: Illustrting the Utility of Genetic Algorithms Yupei Xiong, Univ. of Mrylnd Bruce Golden, Univ. of Mrylnd Edwrd Wsil, Americn Univ. Presented t BAE Systems Distinguished
More informationCMDA 4604: Intermediate Topics in Mathematical Modeling Lecture 19: Interpolation and Quadrature
CMDA 4604: Intermedite Topics in Mthemticl Modeling Lecture 19: Interpoltion nd Qudrture In this lecture we mke brief diversion into the res of interpoltion nd qudrture. Given function f C[, b], we sy
More informationNondeterminism and Nodeterministic Automata
Nondeterminism nd Nodeterministic Automt 61 Nondeterminism nd Nondeterministic Automt The computtionl mchine models tht we lerned in the clss re deterministic in the sense tht the next move is uniquely
More informationUnit #9 : Definite Integral Properties; Fundamental Theorem of Calculus
Unit #9 : Definite Integrl Properties; Fundmentl Theorem of Clculus Gols: Identify properties of definite integrls Define odd nd even functions, nd reltionship to integrl vlues Introduce the Fundmentl
More informationProperties of Integrals, Indefinite Integrals. Goals: Definition of the Definite Integral Integral Calculations using Antiderivatives
Block #6: Properties of Integrls, Indefinite Integrls Gols: Definition of the Definite Integrl Integrl Clcultions using Antiderivtives Properties of Integrls The Indefinite Integrl 1 Riemnn Sums - 1 Riemnn
More informationQUADRATURE is an old-fashioned word that refers to
World Acdemy of Science Engineering nd Technology Interntionl Journl of Mthemticl nd Computtionl Sciences Vol:5 No:7 011 A New Qudrture Rule Derived from Spline Interpoltion with Error Anlysis Hdi Tghvfrd
More informationExercises with (Some) Solutions
Exercises with (Some) Solutions Techer: Luc Tesei Mster of Science in Computer Science - University of Cmerino Contents 1 Strong Bisimultion nd HML 2 2 Wek Bisimultion 31 3 Complete Lttices nd Fix Points
More information{ } = E! & $ " k r t +k +1
Chpter 4: Dynmic Progrmming Objectives of this chpter: Overview of collection of clssicl solution methods for MDPs known s dynmic progrmming (DP) Show how DP cn be used to compute vlue functions, nd hence,
More informationBellman Optimality Equation for V*
Bellmn Optimlity Eqution for V* The vlue of stte under n optiml policy must equl the expected return for the best ction from tht stte: V (s) mx Q (s,) A(s) mx A(s) mx A(s) Er t 1 V (s t 1 ) s t s, t s
More informationChapter 4: Dynamic Programming
Chpter 4: Dynmic Progrmming Objectives of this chpter: Overview of collection of clssicl solution methods for MDPs known s dynmic progrmming (DP) Show how DP cn be used to compute vlue functions, nd hence,
More informationMath 8 Winter 2015 Applications of Integration
Mth 8 Winter 205 Applictions of Integrtion Here re few importnt pplictions of integrtion. The pplictions you my see on n exm in this course include only the Net Chnge Theorem (which is relly just the Fundmentl
More informationConnected-components. Summary of lecture 9. Algorithms and Data Structures Disjoint sets. Example: connected components in graphs
Prm University, Mth. Deprtment Summry of lecture 9 Algorithms nd Dt Structures Disjoint sets Summry of this lecture: (CLR.1-3) Dt Structures for Disjoint sets: Union opertion Find opertion Mrco Pellegrini
More informationCS103B Handout 18 Winter 2007 February 28, 2007 Finite Automata
CS103B ndout 18 Winter 2007 Ferury 28, 2007 Finite Automt Initil text y Mggie Johnson. Introduction Severl childrens gmes fit the following description: Pieces re set up on plying ord; dice re thrown or
More informationThe practical version
Roerto s Notes on Integrl Clculus Chpter 4: Definite integrls nd the FTC Section 7 The Fundmentl Theorem of Clculus: The prcticl version Wht you need to know lredy: The theoreticl version of the FTC. Wht
More informationW. We shall do so one by one, starting with I 1, and we shall do it greedily, trying
Vitli covers 1 Definition. A Vitli cover of set E R is set V of closed intervls with positive length so tht, for every δ > 0 nd every x E, there is some I V with λ(i ) < δ nd x I. 2 Lemm (Vitli covering)
More informationSection 4: Integration ECO4112F 2011
Reding: Ching Chpter Section : Integrtion ECOF Note: These notes do not fully cover the mteril in Ching, ut re ment to supplement your reding in Ching. Thus fr the optimistion you hve covered hs een sttic
More informationChapter 5 : Continuous Random Variables
STAT/MATH 395 A - PROBABILITY II UW Winter Qurter 216 Néhémy Lim Chpter 5 : Continuous Rndom Vribles Nottions. N {, 1, 2,...}, set of nturl numbers (i.e. ll nonnegtive integers); N {1, 2,...}, set of ll
More informationModel Reduction of Finite State Machines by Contraction
Model Reduction of Finite Stte Mchines y Contrction Alessndro Giu Dip. di Ingegneri Elettric ed Elettronic, Università di Cgliri, Pizz d Armi, 09123 Cgliri, Itly Phone: +39-070-675-5892 Fx: +39-070-675-5900
More informationCS 188: Artificial Intelligence Fall Announcements
CS 188: Artificil Intelligence Fll 2009 Lecture 20: Prticle Filtering 11/5/2009 Dn Klein UC Berkeley Announcements Written 3 out: due 10/12 Project 4 out: due 10/19 Written 4 proly xed, Project 5 moving
More informationQuantum Nonlocality Pt. 2: No-Signaling and Local Hidden Variables May 1, / 16
Quntum Nonloclity Pt. 2: No-Signling nd Locl Hidden Vriles My 1, 2018 Quntum Nonloclity Pt. 2: No-Signling nd Locl Hidden Vriles My 1, 2018 1 / 16 Non-Signling Boxes The primry lesson from lst lecture
More informationChapter 0. What is the Lebesgue integral about?
Chpter 0. Wht is the Lebesgue integrl bout? The pln is to hve tutoril sheet ech week, most often on Fridy, (to be done during the clss) where you will try to get used to the ides introduced in the previous
More informationImproper Integrals. The First Fundamental Theorem of Calculus, as we ve discussed in class, goes as follows:
Improper Integrls The First Fundmentl Theorem of Clculus, s we ve discussed in clss, goes s follows: If f is continuous on the intervl [, ] nd F is function for which F t = ft, then ftdt = F F. An integrl
More information2.4 Linear Inequalities and Interval Notation
.4 Liner Inequlities nd Intervl Nottion We wnt to solve equtions tht hve n inequlity symol insted of n equl sign. There re four inequlity symols tht we will look t: Less thn , Less thn or
More informationChapter 4 Contravariance, Covariance, and Spacetime Diagrams
Chpter 4 Contrvrince, Covrince, nd Spcetime Digrms 4. The Components of Vector in Skewed Coordintes We hve seen in Chpter 3; figure 3.9, tht in order to show inertil motion tht is consistent with the Lorentz
More informationContinuous Random Variables
STAT/MATH 395 A - PROBABILITY II UW Winter Qurter 217 Néhémy Lim Continuous Rndom Vribles Nottion. The indictor function of set S is rel-vlued function defined by : { 1 if x S 1 S (x) if x S Suppose tht
More informationMath 270A: Numerical Linear Algebra
Mth 70A: Numericl Liner Algebr Instructor: Michel Holst Fll Qurter 014 Homework Assignment #3 Due Give to TA t lest few dys before finl if you wnt feedbck. Exercise 3.1. (The Bsic Liner Method for Liner
More informationMinimal DFA. minimal DFA for L starting from any other
Miniml DFA Among the mny DFAs ccepting the sme regulr lnguge L, there is exctly one (up to renming of sttes) which hs the smllest possile numer of sttes. Moreover, it is possile to otin tht miniml DFA
More informationLecture Solution of a System of Linear Equation
ChE Lecture Notes, Dept. of Chemicl Engineering, Univ. of TN, Knoville - D. Keffer, 5/9/98 (updted /) Lecture 8- - Solution of System of Liner Eqution 8. Why is it importnt to e le to solve system of liner
More information5.7 Improper Integrals
458 pplictions of definite integrls 5.7 Improper Integrls In Section 5.4, we computed the work required to lift pylod of mss m from the surfce of moon of mss nd rdius R to height H bove the surfce of the
More informationMath 61CM - Solutions to homework 9
Mth 61CM - Solutions to homework 9 Cédric De Groote November 30 th, 2018 Problem 1: Recll tht the left limit of function f t point c is defined s follows: lim f(x) = l x c if for ny > 0 there exists δ
More informationChapter Five: Nondeterministic Finite Automata. Formal Language, chapter 5, slide 1
Chpter Five: Nondeterministic Finite Automt Forml Lnguge, chpter 5, slide 1 1 A DFA hs exctly one trnsition from every stte on every symol in the lphet. By relxing this requirement we get relted ut more
More informationExam 2, Mathematics 4701, Section ETY6 6:05 pm 7:40 pm, March 31, 2016, IH-1105 Instructor: Attila Máté 1
Exm, Mthemtics 471, Section ETY6 6:5 pm 7:4 pm, Mrch 1, 16, IH-115 Instructor: Attil Máté 1 17 copies 1. ) Stte the usul sufficient condition for the fixed-point itertion to converge when solving the eqution
More informationParse trees, ambiguity, and Chomsky normal form
Prse trees, miguity, nd Chomsky norml form In this lecture we will discuss few importnt notions connected with contextfree grmmrs, including prse trees, miguity, nd specil form for context-free grmmrs
More informationDesigning finite automata II
Designing finite utomt II Prolem: Design DFA A such tht L(A) consists of ll strings of nd which re of length 3n, for n = 0, 1, 2, (1) Determine wht to rememer out the input string Assign stte to ech of
More informationCS 373, Spring Solutions to Mock midterm 1 (Based on first midterm in CS 273, Fall 2008.)
CS 373, Spring 29. Solutions to Mock midterm (sed on first midterm in CS 273, Fll 28.) Prolem : Short nswer (8 points) The nswers to these prolems should e short nd not complicted. () If n NF M ccepts
More information5: The Definite Integral
5: The Definite Integrl 5.: Estimting with Finite Sums Consider moving oject its velocity (meters per second) t ny time (seconds) is given y v t = t+. Cn we use this informtion to determine the distnce
More informationFarey Fractions. Rickard Fernström. U.U.D.M. Project Report 2017:24. Department of Mathematics Uppsala University
U.U.D.M. Project Report 07:4 Frey Frctions Rickrd Fernström Exmensrete i mtemtik, 5 hp Hledre: Andres Strömergsson Exmintor: Jörgen Östensson Juni 07 Deprtment of Mthemtics Uppsl University Frey Frctions
More informationSurface maps into free groups
Surfce mps into free groups lden Wlker Novemer 10, 2014 Free groups wedge X of two circles: Set F = π 1 (X ) =,. We write cpitl letters for inverse, so = 1. e.g. () 1 = Commuttors Let x nd y e loops. The
More informationAn Optimal Best-First Search Algorithm for Solving Infinite Horizon DEC-POMDPs
An Optiml Best-First Serch Algorithm for Solving Infinite Horizon DEC-POMDPs Dniel Szer nd Frnçois Chrpillet INRIA Lorrine - LORIA, MAIA Group, 54506 Vndœuvre-lès-Nncy, Frnce {szer, chrp}@lori.fr http://mi.lori.fr
More informationLinear Systems with Constant Coefficients
Liner Systems with Constnt Coefficients 4-3-05 Here is system of n differentil equtions in n unknowns: x x + + n x n, x x + + n x n, x n n x + + nn x n This is constnt coefficient liner homogeneous system
More informationLAMEPS Limited area ensemble forecasting in Norway, using targeted EPS
Limited re ensemle forecsting in Norwy, using trgeted Mrit H. Jensen, Inger-Lise Frogner* nd Ole Vignes, Norwegin Meteorologicl Institute, (*held the presenttion) At the Norwegin Meteorologicl Institute
More informationLecture 2: January 27
CS 684: Algorithmic Gme Theory Spring 217 Lecturer: Év Trdos Lecture 2: Jnury 27 Scrie: Alert Julius Liu 2.1 Logistics Scrie notes must e sumitted within 24 hours of the corresponding lecture for full
More informationContinuous Random Variables Class 5, Jeremy Orloff and Jonathan Bloom
Lerning Gols Continuous Rndom Vriles Clss 5, 8.05 Jeremy Orloff nd Jonthn Bloom. Know the definition of continuous rndom vrile. 2. Know the definition of the proility density function (pdf) nd cumultive
More informationCMSC 330: Organization of Programming Languages
CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 CMSC 330 1 Types of Finite Automt Deterministic Finite Automt (DFA) Exctly one sequence of steps for ech string All exmples so fr Nondeterministic
More informationGoals: Determine how to calculate the area described by a function. Define the definite integral. Explore the relationship between the definite
Unit #8 : The Integrl Gols: Determine how to clculte the re described by function. Define the definite integrl. Eplore the reltionship between the definite integrl nd re. Eplore wys to estimte the definite
More informationLecture 2e Orthogonal Complement (pages )
Lecture 2e Orthogonl Complement (pges -) We hve now seen tht n orthonorml sis is nice wy to descrie suspce, ut knowing tht we wnt n orthonorml sis doesn t mke one fll into our lp. In theory, the process
More informationFig. 1. Open-Loop and Closed-Loop Systems with Plant Variations
ME 3600 Control ystems Chrcteristics of Open-Loop nd Closed-Loop ystems Importnt Control ystem Chrcteristics o ensitivity of system response to prmetric vritions cn be reduced o rnsient nd stedy-stte responses
More informationMonte Carlo method in solving numerical integration and differential equation
Monte Crlo method in solving numericl integrtion nd differentil eqution Ye Jin Chemistry Deprtment Duke University yj66@duke.edu Abstrct: Monte Crlo method is commonly used in rel physics problem. The
More informationTypes of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. NFA for (a b)*abb.
CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 Types of Finite Automt Deterministic Finite Automt () Exctly one sequence of steps for ech string All exmples so fr Nondeterministic Finite Automt
More informationCS415 Compilers. Lexical Analysis and. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University
CS415 Compilers Lexicl Anlysis nd These slides re sed on slides copyrighted y Keith Cooper, Ken Kennedy & Lind Torczon t Rice University First Progrmming Project Instruction Scheduling Project hs een posted
More informationRiemann is the Mann! (But Lebesgue may besgue to differ.)
Riemnn is the Mnn! (But Lebesgue my besgue to differ.) Leo Livshits My 2, 2008 1 For finite intervls in R We hve seen in clss tht every continuous function f : [, b] R hs the property tht for every ɛ >
More informationNumerical Integration
Chpter 1 Numericl Integrtion Numericl differentition methods compute pproximtions to the derivtive of function from known vlues of the function. Numericl integrtion uses the sme informtion to compute numericl
More informationTypes of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. Comparing DFAs and NFAs (cont.) Finite Automata 2
CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 Types of Finite Automt Deterministic Finite Automt () Exctly one sequence of steps for ech string All exmples so fr Nondeterministic Finite Automt
More informationA study of Pythagoras Theorem
CHAPTER 19 A study of Pythgors Theorem Reson is immortl, ll else mortl. Pythgors, Diogenes Lertius (Lives of Eminent Philosophers) Pythgors Theorem is proly the est-known mthemticl theorem. Even most nonmthemticins
More informationInterpreting Integrals and the Fundamental Theorem
Interpreting Integrls nd the Fundmentl Theorem Tody, we go further in interpreting the mening of the definite integrl. Using Units to Aid Interprettion We lredy know tht if f(t) is the rte of chnge of
More informationDecision Networks. CS 188: Artificial Intelligence Fall Example: Decision Networks. Decision Networks. Decisions as Outcome Trees
CS 188: Artificil Intelligence Fll 2011 Decision Networks ME: choose the ction which mximizes the expected utility given the evidence mbrell Lecture 17: Decision Digrms 10/27/2011 Cn directly opertionlize
More informationNew Expansion and Infinite Series
Interntionl Mthemticl Forum, Vol. 9, 204, no. 22, 06-073 HIKARI Ltd, www.m-hikri.com http://dx.doi.org/0.2988/imf.204.4502 New Expnsion nd Infinite Series Diyun Zhng College of Computer Nnjing University
More informationIntermediate Math Circles Wednesday, November 14, 2018 Finite Automata II. Nickolas Rollick a b b. a b 4
Intermedite Mth Circles Wednesdy, Novemer 14, 2018 Finite Automt II Nickols Rollick nrollick@uwterloo.c Regulr Lnguges Lst time, we were introduced to the ide of DFA (deterministic finite utomton), one
More informationNUMERICAL INTEGRATION. The inverse process to differentiation in calculus is integration. Mathematically, integration is represented by.
NUMERICAL INTEGRATION 1 Introduction The inverse process to differentition in clculus is integrtion. Mthemticlly, integrtion is represented by f(x) dx which stnds for the integrl of the function f(x) with
More informationHow to simulate Turing machines by invertible one-dimensional cellular automata
How to simulte Turing mchines by invertible one-dimensionl cellulr utomt Jen-Christophe Dubcq Déprtement de Mthémtiques et d Informtique, École Normle Supérieure de Lyon, 46, llée d Itlie, 69364 Lyon Cedex
More information1 Nondeterministic Finite Automata
1 Nondeterministic Finite Automt Suppose in life, whenever you hd choice, you could try oth possiilities nd live your life. At the end, you would go ck nd choose the one tht worked out the est. Then you
More informationDesigning Information Devices and Systems I Spring 2018 Homework 8
EECS 16A Designing Informtion Devices nd Systems I Spring 2018 Homework 8 This homework is due Mrch 19, 2018, t 23:59. Self-grdes re due Mrch 22, 2018, t 23:59. Sumission Formt Your homework sumission
More informationChapter 9 Definite Integrals
Chpter 9 Definite Integrls In the previous chpter we found how to tke n ntiderivtive nd investigted the indefinite integrl. In this chpter the connection etween ntiderivtives nd definite integrls is estlished
More informationSUMMER KNOWHOW STUDY AND LEARNING CENTRE
SUMMER KNOWHOW STUDY AND LEARNING CENTRE Indices & Logrithms 2 Contents Indices.2 Frctionl Indices.4 Logrithms 6 Exponentil equtions. Simplifying Surds 13 Opertions on Surds..16 Scientific Nottion..18
More informationReview of Probability Distributions. CS1538: Introduction to Simulations
Review of Proility Distriutions CS1538: Introduction to Simultions Some Well-Known Proility Distriutions Bernoulli Binomil Geometric Negtive Binomil Poisson Uniform Exponentil Gmm Erlng Gussin/Norml Relevnce
More informationLecture 3. In this lecture, we will discuss algorithms for solving systems of linear equations.
Lecture 3 3 Solving liner equtions In this lecture we will discuss lgorithms for solving systems of liner equtions Multiplictive identity Let us restrict ourselves to considering squre mtrices since one
More informationdifferent methods (left endpoint, right endpoint, midpoint, trapezoid, Simpson s).
Mth 1A with Professor Stnkov Worksheet, Discussion #41; Wednesdy, 12/6/217 GSI nme: Roy Zho Problems 1. Write the integrl 3 dx s limit of Riemnn sums. Write it using 2 intervls using the 1 x different
More informationA Fast and Reliable Policy Improvement Algorithm
A Fst nd Relible Policy Improvement Algorithm Ysin Abbsi-Ydkori Peter L. Brtlett Stephen J. Wright Queenslnd University of Technology UC Berkeley nd QUT University of Wisconsin-Mdison Abstrct We introduce
More informationC Dutch System Version as agreed by the 83rd FIDE Congress in Istanbul 2012
04.3.1. Dutch System Version s greed y the 83rd FIDE Congress in Istnul 2012 A Introductory Remrks nd Definitions A.1 Initil rnking list A.2 Order See 04.2.B (Generl Hndling Rules - Initil order) For pirings
More informationNumerical integration
2 Numericl integrtion This is pge i Printer: Opque this 2. Introduction Numericl integrtion is problem tht is prt of mny problems in the economics nd econometrics literture. The orgniztion of this chpter
More informationODE: Existence and Uniqueness of a Solution
Mth 22 Fll 213 Jerry Kzdn ODE: Existence nd Uniqueness of Solution The Fundmentl Theorem of Clculus tells us how to solve the ordinry differentil eqution (ODE) du = f(t) dt with initil condition u() =
More informationTHERMAL EXPANSION COEFFICIENT OF WATER FOR VOLUMETRIC CALIBRATION
XX IMEKO World Congress Metrology for Green Growth September 9,, Busn, Republic of Kore THERMAL EXPANSION COEFFICIENT OF WATER FOR OLUMETRIC CALIBRATION Nieves Medin Hed of Mss Division, CEM, Spin, mnmedin@mityc.es
More informationSolution for Assignment 1 : Intro to Probability and Statistics, PAC learning
Solution for Assignment 1 : Intro to Probbility nd Sttistics, PAC lerning 10-701/15-781: Mchine Lerning (Fll 004) Due: Sept. 30th 004, Thursdy, Strt of clss Question 1. Bsic Probbility ( 18 pts) 1.1 (
More informationList all of the possible rational roots of each equation. Then find all solutions (both real and imaginary) of the equation. 1.
Mth Anlysis CP WS 4.X- Section 4.-4.4 Review Complete ech question without the use of grphing clcultor.. Compre the mening of the words: roots, zeros nd fctors.. Determine whether - is root of 0. Show
More informationAn approximation to the arithmetic-geometric mean. G.J.O. Jameson, Math. Gazette 98 (2014), 85 95
An pproximtion to the rithmetic-geometric men G.J.O. Jmeson, Mth. Gzette 98 (4), 85 95 Given positive numbers > b, consider the itertion given by =, b = b nd n+ = ( n + b n ), b n+ = ( n b n ) /. At ech
More informationReview of Calculus, cont d
Jim Lmbers MAT 460 Fll Semester 2009-10 Lecture 3 Notes These notes correspond to Section 1.1 in the text. Review of Clculus, cont d Riemnn Sums nd the Definite Integrl There re mny cses in which some
More informationSection 6: Area, Volume, and Average Value
Chpter The Integrl Applied Clculus Section 6: Are, Volume, nd Averge Vlue Are We hve lredy used integrls to find the re etween the grph of function nd the horizontl xis. Integrls cn lso e used to find
More informationNOTES ON HILBERT SPACE
NOTES ON HILBERT SPACE 1 DEFINITION: by Prof C-I Tn Deprtment of Physics Brown University A Hilbert spce is n inner product spce which, s metric spce, is complete We will not present n exhustive mthemticl
More information8 Laplace s Method and Local Limit Theorems
8 Lplce s Method nd Locl Limit Theorems 8. Fourier Anlysis in Higher DImensions Most of the theorems of Fourier nlysis tht we hve proved hve nturl generliztions to higher dimensions, nd these cn be proved
More information