Point-Based POMDP Algorithms: Improved Analysis and Implementation

Trey Smith and Reid Simmons
Robotics Institute, Carnegie Mellon University
Pittsburgh, PA

Abstract

Existing complexity bounds for point-based POMDP value iteration algorithms focus either on the curse of dimensionality or the curse of history. We derive a new bound that relies on both and uses the concept of discounted reachability; our conclusions may help guide future algorithm design. We also discuss recent improvements to our (point-based) heuristic search value iteration algorithm. Our new implementation calculates tighter initial bounds, avoids solving linear programs, and makes more effective use of sparsity. Empirical results show speedups of more than two orders of magnitude.

1 INTRODUCTION

Partially observable Markov decision processes (POMDPs) constitute a powerful probabilistic model for planning problems that include hidden state and uncertainty in action effects. Recently, several POMDP solution algorithms have been developed that use approximate value iteration with point-based updates. These algorithms have proven to scale very effectively, relying on the fact that performing many fast approximate updates often results in a more useful value function than performing a few exact updates.

Point-based updates are applied over a set B of beliefs drawn from the reachable part of the belief simplex. One can derive a bound on the approximation error that is proportional to the sample spacing of B [Pineau et al., 2003]. The number of points required is driven by the curse of dimensionality: achieving a desired sample spacing requires a number of samples exponential in the dimensionality of the belief simplex.

However, in discounted problems, one can tolerate more approximation error at points that are only reachable after many time steps. This idea, which is not used in the sample spacing argument, is the basis of a second type of convergence result [Zhang and Zhang, 2001, Smith and Simmons, 2004], in which the error bound is derived from the fact that B samples enough of the search tree to some depth.
The number of points required is driven by the curse of history: fully expanding the search tree to depth t requires a number of points exponential in t.

This paper presents a new convergence argument that draws on both approaches. Our analysis applies to the case when the sample spacing varies according to what we call discounted reachability, which more accurately reflects the behavior of current algorithms (Fig. 1).

Figure 1: Sampling strategies for B. (a) Uniform density over the reachable beliefs. (b) Non-uniform density reflecting discounted reachability.

The remainder of the paper discusses recent improvements in our heuristic search value iteration algorithm (HSVI). HSVI is a point-based algorithm that maintains both upper and lower bounds on the optimal value function, allowing it to use effective heuristics for action and observation selection, and to provide provably small regret from the policy it generates [Smith and Simmons, 2004]. The new implementation of HSVI calculates tighter initial bounds, avoids solving linear programs, and makes more effective use of sparsity. Empirical results show speedups of more than two orders of magnitude.

2 POMDP INTRODUCTION

A POMDP models a planning problem that includes hidden state and uncertainty in action effects; the agent is assumed to know the transition model. Formally, a POMDP is described by a finite set of states $S = \{s_1, \ldots, s_{|S|}\}$, a finite set of actions $A = \{a_1, \ldots, a_{|A|}\}$, a finite set of observations $Z = \{z_1, \ldots, z_{|Z|}\}$, transition probabilities $T^{a,z}(s_i, s_j) = \Pr(s_j \mid s_i, a, z)$, a real-valued reward function $R(s, a)$, a discount factor $\gamma < 1$, and an initial belief $b_0$.

The Markov property of the model ensures that the agent can use a probability distribution $b$ over current states as a sufficient statistic for the history of actions and observations. Geometrically, the space of beliefs is a simplex, denoted $\Delta$. At each stage of forward simulation the current belief can be updated based on the latest action $a$ and observation $z$ using the formula $b' = \tau(b, a, z)$, defined so that

$$b'(s') = \sum_{s} T^{a,z}(s, s')\, b(s) \qquad (1)$$

Only a subset of $\Delta$ is reachable from $b_0$ through repeated applications of $\tau$; this subset is denoted $\bar\Delta$.

In general the object of the planning problem is to generate a policy $\pi$ that maximizes expected long-term reward:

$$J^{\pi}(b) = E\left[\, \sum_{t=0}^{\infty} \gamma^t R(s_t, a_t) \;\middle|\; b, \pi \right] \qquad (2)$$

A globally optimal policy $\pi^*$ is known to exist when $\gamma < 1$ [Howard, 1960]. We are particularly interested in the focused approximation setting, in which one attempts to generate a policy $\hat\pi$ that minimizes the regret $J^{\pi^*}(b_0) - J^{\hat\pi}(b_0)$ when executed starting from $b_0$.

A POMDP is often solved by approximating its optimal value function $V^* = J^{\pi^*}$. Any value function $V$ induces a policy $\pi_V$ in which actions are selected via one-step lookahead. The regret of the policy induced by an approximate value function $\hat V$ can be forced arbitrarily small by reducing the max-norm error $\|V^* - \hat V\|_\infty$.

Value iteration starts with an initial guess $V_0$ and approximates $V^*$ through repeated application of the Bellman update $V_t \leftarrow H V_{t-1}$, where $H$ is defined as

$$HV(b) = \max_{a} \left[ R(b, a) + \gamma \sum_{b'} \Pr(b' \mid b, a)\, V(b') \right] \qquad (3)$$

$V^*$ satisfies Bellman's equation $V^* = HV^*$. When $\gamma < 1$, $H$ is a contraction and $V^*$ is the unique solution.

During value iteration, each $V_t$ is piecewise linear and convex [Sondik, 1971], so it can be represented as a set of vectors $\Gamma_t = \{\alpha_1, \ldots, \alpha_{|\Gamma_t|}\}$, such that $V_t(b) = \max_i (\alpha_i \cdot b)$. There are a number of value iteration algorithms that calculate $H$ exactly by projecting $\alpha$ vectors from $\Gamma_{t-1}$ to $\Gamma_t$ [Sondik, 1971, Cassandra et al., 1997].
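The belief update of Eq. 1 and the vector-set evaluation $V(b) = \max_i (\alpha_i \cdot b)$ can be illustrated with a minimal sketch. It is not the authors' code: the nested-list layout and the function names are hypothetical, and the sketch assumes that the matrix for a fixed action/observation pair stores the joint probability Pr(s', z | s, a), so the result is renormalized explicitly.

```python
def belief_update(b, T_az):
    """Return tau(b, a, z) for one fixed (a, z) pair.

    Assumption of this sketch: T_az[s][s2] holds Pr(s2, z | s, a),
    so the updated belief is renormalized to sum to 1.
    """
    n = len(b)
    unnorm = [sum(T_az[s][s2] * b[s] for s in range(n)) for s2 in range(n)]
    total = sum(unnorm)  # equals Pr(z | b, a) under the joint reading
    if total == 0.0:
        raise ValueError("observation has zero probability under b")
    return [x / total for x in unnorm]


def value(b, gamma_vectors):
    """Evaluate V(b) = max_i (alpha_i . b) for a set of alpha vectors."""
    return max(
        sum(a_s * b_s for a_s, b_s in zip(alpha, b))
        for alpha in gamma_vectors
    )


if __name__ == "__main__":
    b0 = [0.5, 0.5]
    T_az = [[0.8, 0.0], [0.0, 0.2]]  # toy numbers, not from the paper
    b1 = belief_update(b0, T_az)
    print(b1, value(b1, [[1.0, 0.0], [0.0, 2.0]]))
```

Plain lists keep the sketch dependency-free; a sparse representation would be the natural choice for problems of realistic size.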
Unfortunately, in the worst case the size of the representation grows as $|\Gamma_t| = |A|\,|\Gamma_{t-1}|^{|Z|}$, which rapidly becomes intractable even for modest problem sizes. Despite clever strategies for pruning dominated $\alpha$ vectors, these algorithms have been unable to scale to larger problems. The intractability of exact value iteration has led to the development of a wide variety of approximation techniques, too many to mention here [Aberdeen, 2002].

Algorithm 1. $\beta$ = backup($\Gamma$, $b$).
1. $\beta_{a,z} \leftarrow \arg\max_{\alpha \in \Gamma} (\alpha \cdot \tau(b, a, z))$
2. $\beta_a(s) \leftarrow R(s, a) + \gamma \sum_{s', z} \beta_{a,z}(s')\, T^{a,z}(s, s')$
3. $\beta \leftarrow \arg\max_{\beta_a} (\beta_a \cdot b)$

3 POINT-BASED ALGORITHMS

Point-based value iteration algorithms rely on the fact that performing many fast approximate updates often results in a more useful value function than performing a few exact updates. Their fundamental operation is the point-based update backup($\Gamma$, $b$), which generates a single $\alpha$ vector from $HV$ that is guaranteed to be maximal at $b$ (Alg. 1).

Our analysis focuses on a simple conceptual version of point-based value iteration. We assume there is a fixed finite set of beliefs $B$. At each step the algorithm generates an $\alpha$ vector for every point in $B$, and the set of vectors $\Gamma$ defines a value function through max-projection as described earlier. Denote the value function after $t$ updates as $V_t^B$. The value function is initialized with $V_0^B \leftarrow R_{\min}$, and the update rule is $V_t^B \leftarrow H_B V_{t-1}^B$, where the update operator $H_B$ applies the point-based update at every point of $B$:

$$H_B \Gamma = \{\, \mathrm{backup}(\Gamma, b) \mid b \in B \,\} \qquad (4)$$

In this case, the approximation error relative to exact value iteration after $t$ updates, $\|V_t - V_t^B\|_\infty$, is known to be bounded proportionally with the sample spacing $\delta(B)$, which is defined to be the maximum 1-norm distance from any point in $\bar\Delta$ to $B$ [Pineau et al., 2003]. $B$ thus needs to contain only enough points to cover $\bar\Delta$ with uniform sample spacing.

However, current point-based algorithms do not sample $\bar\Delta$ uniformly (although the PBVI algorithm makes some attempt to do so). Instead, they collect points for $B$ by a forward simulation process that biases $B$ to contain beliefs that are only a few simulation steps away from $b_0$. This bias arguably helps rather than hurts. It underlies a second type of convergence argument based on the depth of the search tree. If $B$ contains all the beliefs that result from expanding the search tree to depth $t$ and $t$ updates over $B$ are performed, then the approximation error at $b_0$ is bounded proportionally with $\gamma^t$.
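The three steps of the point-based update in Alg. 1 can be sketched as follows. This is an illustrative reading rather than the authors' implementation: the data layout (`R[s][a]`, `T[a][z][s][s2]`) and all names are hypothetical, and the sketch assumes that `T[a][z][s][s2]` stores Pr(s2, z | s, a), the reading under which step 2 sums over both successor states and observations.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def backup(Gamma, b, R, T, gamma):
    """Point-based backup at belief b, following the three steps of Alg. 1.

    Hypothetical layout: R[s][a] is the reward and T[a][z][s][s2] is
    assumed to be Pr(s2, z | s, a). Gamma is a non-empty list of alpha vectors.
    """
    n_states = len(b)
    best_vec, best_val = None, float("-inf")
    for a in range(len(T)):
        beta_a = [R[s][a] for s in range(n_states)]
        for T_az in T[a]:
            # Unnormalized successor belief; a positive normalizer
            # would not change which alpha vector is maximal.
            b_next = [sum(T_az[s][s2] * b[s] for s in range(n_states))
                      for s2 in range(n_states)]
            # Step 1: best existing vector at the successor belief.
            beta_az = max(Gamma, key=lambda alpha: dot(alpha, b_next))
            # Step 2: accumulate the discounted expected future value.
            for s in range(n_states):
                beta_a[s] += gamma * dot(T_az[s], beta_az)
        # Step 3: keep the action vector that is maximal at b.
        val = dot(beta_a, b)
        if val > best_val:
            best_vec, best_val = beta_a, val
    return best_vec


if __name__ == "__main__":
    R = [[1.0, 0.0], [0.0, 0.0]]
    stay = [[1.0, 0.0], [0.0, 1.0]]
    swap = [[0.0, 1.0], [1.0, 0.0]]
    T = [[stay], [swap]]  # two actions, one observation (toy model)
    print(backup([[1.0, 0.0]], [1.0, 0.0], R, T, 0.5))
```

Applying `backup` to every belief in a fixed set B and collecting the results gives one application of the operator $H_B$ from Eq. 4.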

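3.1 NEW THEORETICAL RESULTS

This section presents a new convergence argument that draws on the two earlier approaches. Its use of weighted max-norm machinery in value iteration is closely related to [Munos, 2004]. Our argument reflects current point-based algorithms in that it allows $B$ to be a non-uniform sampling of $\bar\Delta$ whose spacing varies according to discounted reachability.

The discounted reachability $\rho : \bar\Delta \to \mathbb{R}$ is defined to be $\rho(b) = \gamma^L$, where $L$ is the length of the shortest sequence of belief-state transitions from $b_0$ to $b$. $\rho$ satisfies the property that $\rho(b') \ge \gamma \rho(b)$ whenever there is a single-step transition from $b$ to $b'$.

Based on $\rho$, we define a generalized sample spacing measure $\delta_p$ (with $0 \le p < 1$):

$$\delta_p(B) = \max_{b \in \bar\Delta} \min_{b' \in B} \frac{\|b - b'\|_1}{\rho^{-p}(b)} \qquad (5)$$

In order to achieve a small $\delta_p$ value, $B$ must have small 1-norm distance from all points in $\bar\Delta$, but its distance from $b$ can be proportionally larger if $\rho^p(b)$ is small.

When sample spacing is bounded in terms of $\delta_p$, $H_B$ does not have the error properties we want under the usual max-norm. We must define a new norm to reflect the fact that $H_B$ induces larger errors where $\rho$ is small. A weighted max-norm is a function $\|\cdot\|_\xi$ such that

$$\|V - V'\|_\xi = \max_{b \in \bar\Delta} \frac{|V(b) - V'(b)|}{\xi(b)}, \qquad (6)$$

where $\xi > 0$. Not surprisingly, $\rho^{-p}$ is the norm we need. Note that when $p = 0$, $\delta_p$ reduces to the uniform spacing measure $\delta$ and $\rho^{-p}$ reduces to the max-norm.

We begin by generalizing some well-known results about standard value iteration to the $\rho^{-p}$-norm with $0 \le p < 1$.

Theorem 1. The exact Bellman update $H$ is a contraction under the $\rho^{-p}$-norm with contraction factor $\gamma^{1-p}$.

Proof. Define

$$Q_a^V(b) = R(b, a) + \gamma \sum_{b'} \Pr(b' \mid b, a)\, V(b') \qquad (7)$$

so that $HV = \max_a Q_a^V$. For any $a$, the mapping $V \mapsto Q_a^V$ has contraction factor $\gamma^{1-p}$:

$$\begin{aligned}
\|Q_a^V - Q_a^{V'}\|_{\rho^{-p}} &= \max_b \frac{|Q_a^V(b) - Q_a^{V'}(b)|}{[\rho(b)]^{-p}} && (8)\\
&= \max_b \frac{\gamma \left| \sum_{b'} \Pr(b' \mid b, a)\,\bigl(V(b') - V'(b')\bigr) \right|}{[\rho(b)]^{-p}} && (9)\\
&\le \max_b \gamma \sum_{b'} \Pr(b' \mid b, a)\, \frac{|V(b') - V'(b')|}{[\gamma^{-1}\rho(b')]^{-p}} && (10)\\
&\le \max_b \gamma^{1-p} \sum_{b'} \Pr(b' \mid b, a)\, \|V - V'\|_{\rho^{-p}} && (11)\\
&= \gamma^{1-p}\, \|V - V'\|_{\rho^{-p}} && (12)
\end{aligned}$$

Step (10) uses $\rho(b') \ge \gamma\rho(b)$, which gives $[\rho(b)]^{p} \le [\gamma^{-1}\rho(b')]^{p}$.

Now choose an arbitrary $b \in \bar\Delta$. Assume without loss of generality that $HV(b) \ge HV'(b)$. Choose $a^*$ to maximize $Q_{a^*}^V(b)$, and $\bar a$ to maximize $Q_{\bar a}^{V'}(b)$. It follows that $Q_{a^*}^V(b) \ge Q_{\bar a}^{V'}(b) \ge Q_{a^*}^{V'}(b)$, and

$$\begin{aligned}
|HV(b) - HV'(b)| &= Q_{a^*}^V(b) - Q_{\bar a}^{V'}(b) && (13)\\
&\le Q_{a^*}^V(b) - Q_{a^*}^{V'}(b) && (14)\\
&\le \max_a |Q_a^V(b) - Q_a^{V'}(b)| && (15)
\end{aligned}$$

Dividing through by $\rho^{-p}(b)$ and maximizing over $b$ yields

$$\|HV - HV'\|_{\rho^{-p}} \le \max_a \|Q_a^V - Q_a^{V'}\|_{\rho^{-p}} \le \gamma^{1-p}\, \|V - V'\|_{\rho^{-p}} \qquad (16)$$

Theorem 2. Let $\hat\pi$ be the one-step lookahead policy induced by an approximate value function $\hat V$. The regret from executing $\hat\pi$ rather than $\pi^*$, starting from $b_0$, is at most

$$\frac{2\gamma^{1-p}}{1 - \gamma^{1-p}}\, \|V^* - \hat V\|_{\rho^{-p}} \qquad (17)$$

Proof. Choose an arbitrary $b \in \bar\Delta$. It is easy to check that for any policy $\pi$, $J^\pi(b) = Q_{\pi(b)}^{J^\pi}(b)$. Also, because $\hat\pi$ is the one-step lookahead policy induced by $\hat V$, $Q_{\hat\pi(b)}^{\hat V}(b) = H\hat V(b)$. The Bellman equation states that $V^* = HV^*$. Then:

$$\begin{aligned}
J^{\pi^*}(b) - J^{\hat\pi}(b) &= \bigl| V^*(b) - Q_{\hat\pi(b)}^{J^{\hat\pi}}(b) \bigr| && (18)\\
&= \bigl| V^*(b) - Q_{\hat\pi(b)}^{\hat V}(b) + Q_{\hat\pi(b)}^{\hat V}(b) - Q_{\hat\pi(b)}^{J^{\hat\pi}}(b) \bigr| && (19)\\
&\le \bigl| V^*(b) - Q_{\hat\pi(b)}^{\hat V}(b) \bigr| + \bigl| Q_{\hat\pi(b)}^{\hat V}(b) - Q_{\hat\pi(b)}^{J^{\hat\pi}}(b) \bigr| && (20)\\
&\le |HV^*(b) - H\hat V(b)| + \gamma \sum_{b'} \Pr(b' \mid b, \hat\pi(b))\, |\hat V(b') - J^{\hat\pi}(b')| && (21)\\
&\le |HV^*(b) - H\hat V(b)| + \gamma \sum_{b'} \Pr(b' \mid b, \hat\pi(b))\, \gamma^{-p} \rho^{-p}(b)\, \|\hat V - J^{\hat\pi}\|_{\rho^{-p}} && (22)\\
&\le |HV^*(b) - H\hat V(b)| + \gamma^{1-p} \rho^{-p}(b)\, \|\hat V - J^{\hat\pi}\|_{\rho^{-p}} && (23)
\end{aligned}$$

Dividing through by $\rho^{-p}(b)$ and maximizing over $b$ gives

$$\begin{aligned}
\|J^{\pi^*} - J^{\hat\pi}\|_{\rho^{-p}} & && (24)\\
&\le \|HV^* - H\hat V\|_{\rho^{-p}} + \gamma^{1-p}\, \|\hat V - J^{\hat\pi}\|_{\rho^{-p}} && (25)\\
&\le \gamma^{1-p} \bigl( \|V^* - \hat V\|_{\rho^{-p}} + \|\hat V - J^{\hat\pi}\|_{\rho^{-p}} \bigr) && (26)\\
&\le \gamma^{1-p} \bigl( \|V^* - \hat V\|_{\rho^{-p}} + \|\hat V - V^*\|_{\rho^{-p}} + \|V^* - J^{\hat\pi}\|_{\rho^{-p}} \bigr) && (27, 28)\\
&= \gamma^{1-p} \bigl( 2\|V^* - \hat V\|_{\rho^{-p}} + \|V^* - J^{\hat\pi}\|_{\rho^{-p}} \bigr) && (29)\\
&= \gamma^{1-p} \bigl( 2\|V^* - \hat V\|_{\rho^{-p}} + \|J^{\pi^*} - J^{\hat\pi}\|_{\rho^{-p}} \bigr) && (30)\\
&= 2\gamma^{1-p}\, \|V^* - \hat V\|_{\rho^{-p}} + \gamma^{1-p}\, \|J^{\pi^*} - J^{\hat\pi}\|_{\rho^{-p}} && (31)
\end{aligned}$$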

Solving the recursion,

$$\|J^{\pi^*} - J^{\hat\pi}\|_{\rho^{-p}} \le \frac{2\gamma^{1-p}}{1 - \gamma^{1-p}}\, \|V^* - \hat V\|_{\rho^{-p}} \qquad (32)$$

And since $\rho(b_0) = 1$, we have the desired regret bound:

$$J^{\pi^*}(b_0) - J^{\hat\pi}(b_0) \le \frac{2\gamma^{1-p}}{1 - \gamma^{1-p}}\, \|V^* - \hat V\|_{\rho^{-p}}$$

It is worth noting (although we lack space to prove it here) that a tighter bound applies when $\hat V$ is uniformly improvable [Zhang and Zhang, 2001]. A small modification to $H_B$ would make $V_t^B$ uniformly improvable at the cost of increasing $|\Gamma|$. In that case the regret would be at most $\gamma^{1-p}\, \|V^* - \hat V\|_{\rho^{-p}}$.

Having discussed the $\rho^{-p}$-norm behavior of $H$, now we move on to the $\rho^{-p}$-norm behavior of $H_B$ with non-uniform sample spacing $\delta_p$.

Lemma 1. At any update step $t$, the error $\|HV_t^B - H_B V_t^B\|_{\rho^{-p}}$ introduced by a single application of $H_B$ rather than $H$ is at most

$$\frac{(R_{\max} - R_{\min})\, \delta_p(B)}{1 - \gamma^{1-p}} \qquad (33)$$

Proof. The argument is analogous to Lemma 1 of [Pineau et al., 2003]. Necessary changes: (1) restrict $b'$ to be drawn from $\bar\Delta$, (2) divide throughout by $\rho^{-p}(b')$, and (3) substitute $\gamma^{1-p}$ for $\gamma$ in the denominator to reflect the changed contraction properties of $H$ under the new norm.

Theorem 3. At any update step $t$, the accumulated error $\|V_t - V_t^B\|_{\rho^{-p}}$ is at most

$$\frac{(R_{\max} - R_{\min})\, \delta_p(B)}{(1 - \gamma^{1-p})^2} \qquad (34)$$

Proof. The argument is analogous to Theorem 1 of [Pineau et al., 2003]. Necessary changes: (1) replace the max-norm with the $\rho^{-p}$-norm, and (2) replace $\gamma$ with $\gamma^{1-p}$.

Taken together, these results show that the conceptual algorithm can be used to generate a policy with arbitrarily small regret related to $\delta_p(B)$, and they provide a finite bound on the number of updates required to achieve a given regret.

3.2 IMPLICATIONS FOR ALGORITHM DESIGN

The bias of our model toward beliefs with high discounted reachability describes current algorithms more accurately than uniform sampling, at least to the extent that the algorithms perform (typically shallow) forward exploration from the initial belief to generate $B$.

The parameter $p$ arose naturally during our analysis. $p = 0$ corresponds to uniform sampling and the usual max-norm. As $p$ increases, samples grow less dense in areas with low reachability and the norm becomes correspondingly more tolerant.
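To make the roles of $\rho$ and $p$ concrete, the following sketch computes discounted reachability and the generalized spacing on a small finite stand-in for the reachable belief set. Everything here is an assumption made for illustration: the reachable set is replaced by a hand-written successor graph, the names are hypothetical, and $\delta_p$ is read as the nearest-sample 1-norm distance at $b$ multiplied by $\rho(b)^p$.

```python
from collections import deque


def discounted_reachability(successors, b0, gamma):
    """rho(b) = gamma ** L, with L the breadth-first depth of b from b0.

    `successors` maps a belief label to the labels reachable in one
    step (a finite stand-in for the reachable belief set).
    """
    depth = {b0: 0}
    queue = deque([b0])
    while queue:
        b = queue.popleft()
        for nxt in successors.get(b, ()):
            if nxt not in depth:
                depth[nxt] = depth[b] + 1
                queue.append(nxt)
    return {b: gamma ** d for b, d in depth.items()}


def delta_p(points, B, rho, p):
    """Max over reachable b of (1-norm distance to nearest sample) * rho(b)**p."""
    def dist1(u, v):
        return sum(abs(x - y) for x, y in zip(u, v))
    return max(
        min(dist1(points[b], points[s]) for s in B) * rho[b] ** p
        for b in rho
    )


if __name__ == "__main__":
    successors = {"b0": ["b1"], "b1": ["b2"], "b2": []}
    points = {"b0": (1.0, 0.0), "b1": (0.5, 0.5), "b2": (0.0, 1.0)}
    rho = discounted_reachability(successors, "b0", 0.5)
    B = ["b0", "b1"]
    print(delta_p(points, B, rho, 0.0), delta_p(points, B, rho, 0.5))
```

In this toy chain the unsampled belief `b2` sits two steps from `b0`, so raising `p` from 0 to 0.5 halves the measured spacing even though the sample set is unchanged.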
But the results show that there's no free lunch: the higher effective discount factor $\gamma^{1-p}$ under the new norm means that more updates are required and the final error bounds are looser. The new theoretical framework provides a way to analyze this trade-off.

We initially found the concept of discounted reachability surprising. The intuition is that (1) beliefs that are deeper in the search tree are less relevant, and (2) beliefs that can only be reached by low-probability belief transitions are less relevant. But discounted reachability ignores (2) entirely, in that all transitions with non-zero probability are treated equally.

Actually, we started with a different concept of discounted occupancy, in which beliefs are tagged as proportionally less relevant if they can only be reached by low-probability belief transitions. The bias of current algorithms seems to be better described by discounted occupancy, and empirically, treating all transitions with non-zero probability equally hurts performance. But the convergence results we found do not go through when discounted occupancy is used instead of discounted reachability. We hope that more sophisticated future analysis will shed light on this issue.

In summary, these new results take us closer to understanding point-based algorithms. The analysis helps explain important trade-offs in algorithm design, although we have not yet had time to apply it to a working algorithm. The next section changes the topic to recent improvements in our (point-based) HSVI algorithm. Note that those improvements are not based on the theoretical results just presented.

4 IMPROVEMENTS IN HEURISTIC SEARCH VALUE ITERATION

This section discusses recent improvements in our heuristic search value iteration algorithm (HSVI). Relative to our original presentation of HSVI, the new implementation calculates tighter initial bounds, avoids solving linear programs, and makes more effective use of sparsity. Empirical results show speedups of up to three orders of magnitude.
4.1 HSVI OVERVIEW

HSVI is a point-based algorithm that maintains both upper and lower bounds on the optimal value function, allowing it to use effective heuristics for action and observation selection, and to provide provably small regret from the policy it generates. We provide a brief overview here; for more detail refer to [Smith and Simmons, 2004]. We refer to the original version and our current version as HSVI1 and HSVI2, respectively. The differences are covered comprehensively in Section 4.2.

HSVI is outlined in Algs. 2 and 3. We denote the lower and upper bound functions as $\underline V$ and $\bar V$, respectively. The interval function $\hat V$ refers to them collectively, such that $\hat V(b) = [\underline V(b), \bar V(b)]$ and $\mathrm{width}(\hat V(b)) = \bar V(b) - \underline V(b)$.

Algorithm 2. $\pi$ = HSVI($\epsilon$).
HSVI($\epsilon$) returns a policy $\pi$ whose regret relative to $\pi^*$, starting from $b_0$, is at most $\epsilon$.
1. Initialize the bounds $\hat V$.
2. While $\mathrm{width}(\hat V(b_0)) > \epsilon$, repeatedly invoke explore($b_0$, $\epsilon$, 0).
3. Having achieved the desired precision, return the direct-control policy $\pi$ corresponding to the lower bound.

Algorithm 3. explore($b$, $\epsilon$, $t$).
explore recursively follows a single path down the search tree until satisfying a termination condition based on the width of the bounds interval. It then performs a series of updates on its way back up to the initial belief.
1. If $\mathrm{width}(\hat V(b)) \le \epsilon \gamma^{-t}$, return.
2. Select an action $a^*$ and observation $z^*$ according to the forward exploration heuristics.
3. Call explore($\tau(b, a^*, z^*)$, $\epsilon$, $t + 1$).
4. Perform a point-based update of $\hat V$ at belief $b$.

4.1.1 Value Function Representation

HSVI uses the usual $\Gamma$ vector set representation for its lower bound (see Section 2). Unfortunately, if the upper bound is represented with a vector set, updating by adding a vector does not have the desired effect of improving the bound in the neighborhood of the local update. To accommodate the need for updates, HSVI uses a point set representation for the upper bound. The value at a point $b$ is the projection of $b$ onto the convex hull formed by a finite set $\Upsilon$ of belief/value points $(b_i, v_i)$. Updates are performed by adding a new point to the set. In HSVI1, the projection onto the convex hull is calculated by solving a linear program using the commercial CPLEX software package.

4.1.2 Initialization

HSVI1 initializes the lower bound using a conservative estimate of the values of blind policies of the form "always execute action $a$". The smallest possible reward from executing action $a$ is $\min_s R(s, a)$, so a bound on the long-term reward for that policy can be found by evaluating the relevant summation. HSVI1 then maximizes over $a$:

$$\underline R = \max_a \sum_{t=0}^{\infty} \gamma^t \min_s R(s, a) = \max_a \frac{\min_s R(s, a)}{1 - \gamma} \qquad (35)$$

The vector set for the initial lower bound $\underline V_0$ contains a single vector $\alpha$ such that every $\alpha(s) = \underline R$. HSVI1 initializes the upper bound by assuming full observability and solving the MDP version of the problem. This provides upper bound values at the corners of the belief simplex, which form the initial point set.

4.1.3 Local Updates

HSVI performs a local update $L_b$ of the lower bound by adding the result of a point-based update at $b$ to the vector set:

$$L_b \Gamma = \Gamma \cup \{\mathrm{backup}(\Gamma, b)\} \qquad (36)$$

It performs a local update $U_b$ of the upper bound by adding the result of a Bellman update at $b$ to the point set:

$$U_b \Upsilon = \Upsilon \cup \{(b, H\bar V(b))\} \qquad (37)$$

Figure 2: Local update at $b$.

Fig. 2 represents the structure of the bounds representations and the process of locally updating at $b$. In the left side of the figure, the points and dotted lines represent $\bar V$ (upper bound points and convex hull). Several solid lines represent the vectors of $\Gamma$. In the right side of the figure, we see the result of updating both bounds at $b$, which involves adding a new point to $\Upsilon$ and a new vector to $\Gamma$, bringing both bounds closer to $V^*$.

HSVI periodically prunes dominated elements in both the lower bound vector set and the upper bound point set; we do not discuss the pruning here because it is unaffected by our recent changes.

4.1.4 Forward Exploration Heuristics

This section discusses the heuristics that are used to decide which child of the current node to visit as HSVI works its way forward from the initial belief. Starting from parent node $b$, HSVI must choose an action $a^*$ and an observation $z^*$: the child node to visit is $\tau(b, a^*, z^*)$.

HSVI selects actions greedily based on the upper bound (the IE-MAX heuristic). At a belief $b$, for every action $a$, it can compute an upper bound on the long-term reward from taking that action. It chooses the action with the highest upper bound:

$$a^* = \arg\max_a Q_a^{\bar V}(b) \qquad (38)$$

Because the bounds at a parent node are always wider than the bounds at the child with the highest upper bound value, choosing $a^*$ according to IE-MAX is a good way to ensure convergence. In the simpler context where updates do not affect neighboring nodes, it is provably optimal [Kaelbling, 1993].

HSVI uses the weighted excess uncertainty heuristic for observation selection. Excess uncertainty at a belief $b$ with depth $t$ in the search tree is defined to be

$$\mathrm{excess}(b, t) = \mathrm{width}(\hat V(b)) - \epsilon \gamma^{-t} \qquad (39)$$

Excess uncertainty has the property that if all the children of a node $b$ have negative excess uncertainty, then after an update $b$ will also have negative excess uncertainty. Negative excess uncertainty at the root implies the desired convergence to within $\epsilon$. The weighted excess uncertainty heuristic is designed to focus attention on the child node with the greatest contribution to the excess uncertainty at the parent:

$$z^* = \arg\max_z \bigl[ \Pr(z \mid b, a^*)\, \mathrm{excess}(\tau(b, a^*, z), t + 1) \bigr] \qquad (40)$$

Both the action and observation selection heuristics are designed so that applying them systematically guarantees HSVI convergence in finite time [Smith and Simmons, 2004].

4.2 CHANGES BETWEEN HSVI1 AND HSVI2

We report a series of changes made since our initial presentation of HSVI1. The changes are roughly ordered in terms of their impact on the overall performance. The relative speedup for individual changes is problem-dependent; the reported values were measured informally on the Tag problem. HSVI2 performance is presented in Section 4.3.

4.2.1 More Effective Use of Sparsity

HSVI1 represents beliefs and transition functions as vectors and matrices in BLAS compressed storage mode [Dongarra et al., 1988]. It uses an off-the-shelf sparse linear algebra package to compute belief transitions and take dot products. That package turned out to be using inappropriate algorithms, slowing down individual operations by as much as 100x.
We addressed the problem in HSVI2 by writing our own simple compressed storage operations, which speed up lower bound updates by about 50x.

HSVI1 represents $\alpha$ vectors in dense storage mode because they tend to have a large number of non-zeros, even when beliefs are sparse. Typically, when $\alpha \leftarrow \mathrm{backup}(\Gamma, b)$ is applied, all of the entries of $\alpha$ must be computed, even if $b$ is sparse and most of the entries have no effect on the value $\alpha \cdot b$. They are required because HSVI may later need to evaluate $\alpha \cdot b'$ where $b'$ has different non-zeros. But if $\alpha$ is optimized for $b$, why should we expect it to be relevant to $b'$, which has different non-zeros and perhaps no overlap with $b$ at all?

This leads to the idea of masked $\alpha$ vectors. In HSVI2, $\alpha \leftarrow \mathrm{backup}(\Gamma, b)$ computes only the entries of $\alpha$ that correspond to non-zeros of $b$. A mask records which entries were computed. If HSVI2 later evaluates $\max_i (\alpha_i \cdot b')$ and $b'$ has a non-zero in a position that was not computed in $\alpha_i$, the dot product $\alpha_i \cdot b'$ is rejected from consideration.

This change can be interpreted geometrically. Sparse beliefs lie in hyperplanes on the boundary of the belief simplex. When a masked $\alpha$ vector is computed using the new $\mathrm{backup}(\Gamma, b)$, it applies only to the lowest-dimensional boundary hyperplane containing $b$. Empirically, masked $\alpha$ vectors speed up lower bound updates by about 5x. Note that almost any POMDP value iteration algorithm could make use of this concept.

4.2.2 Avoid Solving Linear Programs

HSVI1 evaluates $\bar V(b)$ by computing the exact projection of $b$ onto the convex hull of the points in $\Upsilon$, which involves solving a linear program with the commercial CPLEX software package. Each upper bound update requires several such projections, and the time spent solving linear programs dominates the upper bound update time.

HSVI2 uses an approximate projection onto the convex hull suggested by [Hauskrecht, 2000]. Projection onto the convex hull of a set of points is particularly simple when the set contains only the corners of the belief simplex and one interior point: it can be computed in $O(|S|)$ time.
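A minimal sketch of the single-interior-point projection follows. It is an illustration under stated assumptions, not HSVI2's code: beliefs are dense lists, `corner_values[s]` is the upper bound at the corner for state `s`, and every function name is hypothetical. The last helper takes the minimum over several interior points, one common way of combining such single-point projections (often called a sawtooth-style bound).

```python
def corner_interpolation(b, corner_values):
    """Linear interpolation of the corner values at belief b."""
    return sum(p * v for p, v in zip(b, corner_values))


def project_one_point(b, corner_values, b_i, v_i):
    """Hull value at b using the simplex corners plus one point (b_i, v_i)."""
    base = corner_interpolation(b, corner_values)
    # How far v_i lies below the corner interpolation at b_i (never positive,
    # so a dominated point leaves the corner interpolation unchanged).
    gap = min(0.0, v_i - corner_interpolation(b_i, corner_values))
    # Largest weight c such that b - c * b_i stays non-negative.
    c = min(b[s] / b_i[s] for s in range(len(b)) if b_i[s] > 0.0)
    return base + c * gap


def approx_upper_bound(b, corner_values, interior_points):
    """Minimum of the single-point projections over all interior points."""
    best = corner_interpolation(b, corner_values)
    for b_i, v_i in interior_points:
        best = min(best, project_one_point(b, corner_values, b_i, v_i))
    return best


if __name__ == "__main__":
    corners = [10.0, 10.0]
    interior = [([0.5, 0.5], 6.0)]  # toy numbers, not from the paper
    print(approx_upper_bound([0.75, 0.25], corners, interior))
```

Each single-point projection touches every state once, which matches the $O(|S|)$ cost quoted above; skipping zero entries of `b_i` is where sparsity would reduce the work further.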
To approximately project onto the overall convex hull, HSVI2 runs this operation for each interior point of $\Upsilon$ and takes the minimum value, requiring $O(|\Upsilon|\,|S|)$ time overall (or less with sparsity). This approximate convex hull has the key properties that (1) it is everywhere greater than the exact convex hull, and (2) the approximation at $b$ is exact if there is an undominated pair $(b, v) \in \Upsilon$. Empirically, the approximate projection speeds up upper bound updates by about 100x.

4.2.3 Tighter Initial Bounds

HSVI1 generates an initial lower bound based on a conservative estimate of the values of blind policies. HSVI2 uses a better blind policy value estimate suggested in [Hauskrecht, 1997]. The value $\alpha^a$ of each policy "always take action $a$" is updated in MDP fashion:

$$\alpha^a_{t+1}(s) = R(s, a) + \gamma \sum_{s'} \Pr(s' \mid s, a)\, \alpha^a_t(s') \qquad (41)$$

Each update of $|A|$ vectors can be evaluated in $O(|S|^2 |A|)$ time. HSVI2 initializes the vectors $\alpha^a_0$ using the HSVI1 lower bound, which guarantees that the bound is valid even if the iteration is not run to completion. When the iteration is stopped, the $\alpha^a_t$ vectors form the initial lower bound $\Gamma$.

HSVI1 generates an initial upper bound based on the value function of the fully observable MDP. HSVI2 uses the fast informed bound (FIB) approximation, which is guaranteed to give a tighter upper bound than the MDP approximation [Hauskrecht, 2000]. FIB iteration keeps one vector $\alpha^a$ for each action and uses the following update rule:

$$\alpha^a_{t+1}(s) = R(s, a) + \gamma \sum_{z} \max_{a'} \sum_{s'} \Pr(s', z \mid s, a)\, \alpha^{a'}_t(s')$$

Each FIB update can be evaluated in $O(|A|^2 |S|^2 |Z|)$ time. As with the lower bound, HSVI2 initializes the upper bound vectors $\alpha^a_0$ using the HSVI1 upper bound. When FIB iteration is stopped, each corner point corresponding to a state $s$ is initialized to $\max_a \alpha^a_t(s)$.

Empirically, HSVI2 can run both bound initialization routines to approximate convergence (residual $< 10^{-3}$) in at most a few seconds for all of the problems in our benchmark set. This results in better performance near the beginning of HSVI2 execution, although later in the run the effect is less significant. The change in the lower bound initialization is the more important of the two; the MDP upper bound was already fairly good for most problems.

4.3 HSVI2 PERFORMANCE

Fig. 3 shows HSVI2 reward vs. time for four problems from the scalable POMDP literature. The plotted reward is the average received over 100 or more simulations. We also plot HSVI2's bounds $\underline V(b_0)$ and $\bar V(b_0)$. HSVI2 was run only once on each problem since it is not stochastic. The platform used was a Pentium-4 running at 3.4 GHz, with 2 GB of RAM (HSVI2 used at most 250 MB of RAM).

The plots show a range of behaviors. RockSample[4,4] is especially easy; the HSVI2 bounds converge after 13 seconds, showing that the solution is optimal.
Hallway2 shows HSVI2 quickly arriving at an apparently near-optimal solution, but its bounds remain loose. Tag and RockSample[10,10] show typical behavior for large problems: the upper bound decrease is slow and steady while the lower bound (and the reward) improve in jumps, plateauing for long periods. RockSample[10,10], with $> 10^5$ states, would be too large for most POMDP algorithms to handle; HSVI2 gains by use of sparsity. It would run out of memory with a problem 5-10 times larger.

Figure 3: HSVI2 reward vs. wallclock time. Panels: RockSample[4,4] (257s 9a 2o), Hallway2 (93s 5a 17o), Tag (870s 5a 30o), RockSample[10,10] (102,401s 19a 2o). Each panel plots the simulation reward and the bounds.

Fig. 4 shows running times and solution quality for HSVI and several other algorithms. Note that different algorithms were run on different platforms, so running times are only roughly comparable. The table also shows, for each problem, the 95% confidence interval for reward measurements assuming the variance of HSVI2's best policy and averaging 100 rewards. An algorithm's reward is starred if it is within the confidence interval relative to the best reported value.

HSVI2 is within measurement error of the best reported reward for all problems, and its running time is considerably shorter than other algorithms in most cases. The greatest speedup from HSVI1 to HSVI2 was observed on the RockSample[7,8] problem. HSVI2 takes about 6 seconds to surpass the reward reached by HSVI1 after $> 10^4$ seconds. After correcting for running on a processor about 5x faster, this is a > 300x speedup.

Other state-of-the-art scalable POMDP algorithms could not be compared to HSVI2 because they were tested on different problems. Among these, two techniques appear especially promising. Exponential-family PCA transforms the POMDP, compresses to a low-dimensional representation in the transformed space, then solves it with a grid-based algorithm. It has demonstrated good results on large-scale robot navigation problems [Roy and Gordon, 2003]. Value-directed compression (VDC) is another compression technique.
It typically produces a less compact representation than E-PCA, but the compressed POMDP retains linear structure and value function convexity, so that it can be solved using almost any POMDP algorithm. The combinations VDC+BPI and VDC+PBVI have demonstrated scalability to huge problem sizes, up to 33 million states [Poupart and Boutilier, 2004].¹ VDC would likely boost HSVI scalability in a similar way.

¹ VDC+PBVI results courtesy of Poupart, personal communication.

Problem (states/actions/observations)        Reward    Time (s)   |Γ|

Tiger-Grid (36s 5a 17o)                      (±0.14)
  HSVI1 [Smith et al., 2004]                 2.35*
  Perseus [Spaan et al., 2004]               2.34*
  HSVI2                                      2.30*
  PBUA [Poon, 2001]                          2.30*
  PBVI [Pineau et al., 2003]                 2.25*
  BPI [Poupart et al., 2003]                 2.22*
  QMDP                                                            N/A

Hallway (61s 5a 21o)                         (±0.038)
  PBVI [Pineau et al., 2003]                 0.53*
  PBUA [Poon, 2001]                          0.53*
  HSVI2                                      0.52*
  HSVI1 [Smith et al., 2004]                 0.52*
  Perseus [Spaan et al., 2004]               0.51*
  BPI [Poupart et al., 2003]                 0.51*
  QMDP                                                            N/A

Hallway2 (93s 5a 17o)                        (±0.048)
  HSVI2                                      0.35*
  Perseus [Spaan et al., 2004]               0.35*
  HSVI1 [Smith et al., 2004]                 0.35*
  PBUA [Poon, 2001]                          0.35*
  PBVI [Pineau et al., 2003]                 0.34*
  BPI [Poupart et al., 2004]                 0.32*
  QMDP                                                            N/A

Tag (870s 5a 30o)                            (±1.2)
  Perseus [Spaan et al., 2004]               -6.17*
  HSVI2                                      -6.36*
  HSVI1 [Smith et al., 2004]                 -6.37*
  BPI [Poupart et al., 2004]                 -6.65*
  PBVI [Pineau et al., 2003]
  QMDP                                                            N/A

RockSample[4,4] (257s 9a 2o)                 (±1.2)
  HSVI2                                      18.0*
  HSVI1 [Smith et al., 2004]                 18.0*
  PBVI [Pineau, pers. communication]         17.1*     2000?
  QMDP                                                            N/A

RockSample[7,8] (12,545s 13a 2o)             (±1.2)
  HSVI2                                      20.6*
  HSVI1 [Smith et al., 2004]
  QMDP                                                            N/A

RockSample[10,10] (102,401s 19a 2o)          (±1.3)
  HSVI2                                      20.4*
  QMDP                                       0         57         N/A

Figure 4: Multi-algorithm performance comparison.

5 CONCLUSION

We presented new theoretical results for point-based algorithms, which combine curse of dimensionality and curse of history arguments into an overall bound on the convergence of point-based value iteration with non-uniform sample spacing. In the future we will apply these results to point-based algorithm design.

We also demonstrated improved performance for our HSVI algorithm, with speedups of more than two orders of magnitude and successful scaling to a POMDP with $> 10^5$ states. In the future we would like to combine HSVI with a compact representation technique such as VDC to deal with still larger problems.

Acknowledgments

Thanks to Geoff Gordon and Pascal Poupart for helpful discussions. This work was funded in part by a NASA GSRP Fellowship with Ames Research Center.

References

[Aberdeen, 2002] Aberdeen, D. (2002). A survey of approximate methods for solving partially observable Markov decision processes. Technical report, Research School of Information Science and Engineering, Australian National University.

[Cassandra et al., 1997] Cassandra, A., Littman, M., and Zhang, N. (1997). Incremental pruning: A simple, fast, exact method for partially observable Markov decision processes. In Proc. of UAI.

[Dongarra et al., 1988] Dongarra, J. J., Croz, J. D., Hammarling, S., and Hanson, R. J. (1988). An extended set of FORTRAN basic linear algebra subprograms. ACM Trans. Math. Soft., 14:1-17.

[Hauskrecht, 1997] Hauskrecht, M. (1997). Incremental methods for computing bounds in partially observable Markov decision processes. In Proc. of AAAI, Providence, RI.

[Hauskrecht, 2000] Hauskrecht, M. (2000). Value-function approximations for partially observable Markov decision processes. Journal of Artificial Intelligence Research, 13.

[Howard, 1960] Howard, R. A. (1960). Dynamic Programming and Markov Processes. MIT.

[Kaelbling, 1993] Kaelbling, L. P. (1993). Learning in Embedded Systems. The MIT Press.

[Munos, 2004] Munos, R. (2004). Error bounds for approximate value iteration. Technical Report CMAP 527, École Polytechnique.

[Pineau et al., 2003] Pineau, J., Gordon, G., and Thrun, S. (2003). Point-based value iteration: An anytime algorithm for POMDPs. In Proc. of IJCAI.

[Poon, 2001] Poon, K.-M. (2001). A fast heuristic algorithm for decision-theoretic planning. Master's thesis, The Hong Kong University of Science and Technology.

[Poupart and Boutilier, 2003] Poupart, P. and Boutilier, C. (2003). Bounded finite state controllers. In Proc. of NIPS, Vancouver.

[Poupart and Boutilier, 2004] Poupart, P. and Boutilier, C. (2004). VDCBPI: an approximate scalable algorithm for large scale POMDPs. In Proc. of NIPS, Vancouver.

[Roy and Gordon, 2003] Roy, N. and Gordon, G. (2003). Exponential family PCA for belief compression in POMDPs. In NIPS.

[Smith and Simmons, 2004] Smith, T. and Simmons, R. (2004). Heuristic search value iteration for POMDPs. In Proc. of UAI.

[Sondik, 1971] Sondik, E. J. (1971). The optimal control of partially observable Markov processes. PhD thesis, Stanford University.

[Zhang and Zhang, 2001] Zhang, N. L. and Zhang, W. (2001). Speeding up the convergence of value iteration in partially observable Markov decision processes. Journal of AI Research, 14.


More information

Reinforcement Learning

Reinforcement Learning Reinforcement Lerning Tom Mitchell, Mchine Lerning, chpter 13 Outline Introduction Comprison with inductive lerning Mrkov Decision Processes: the model Optiml policy: The tsk Q Lerning: Q function Algorithm

More information

Review of Gaussian Quadrature method

Review of Gaussian Quadrature method Review of Gussin Qudrture method Nsser M. Asi Spring 006 compiled on Sundy Decemer 1, 017 t 09:1 PM 1 The prolem To find numericl vlue for the integrl of rel vlued function of rel vrile over specific rnge

More information

I1 = I2 I1 = I2 + I3 I1 + I2 = I3 + I4 I 3

I1 = I2 I1 = I2 + I3 I1 + I2 = I3 + I4 I 3 2 The Prllel Circuit Electric Circuits: Figure 2- elow show ttery nd multiple resistors rrnged in prllel. Ech resistor receives portion of the current from the ttery sed on its resistnce. The split is

More information

Discrete Mathematics and Probability Theory Spring 2013 Anant Sahai Lecture 17

Discrete Mathematics and Probability Theory Spring 2013 Anant Sahai Lecture 17 EECS 70 Discrete Mthemtics nd Proility Theory Spring 2013 Annt Shi Lecture 17 I.I.D. Rndom Vriles Estimting the is of coin Question: We wnt to estimte the proportion p of Democrts in the US popultion,

More information

1 Online Learning and Regret Minimization

1 Online Learning and Regret Minimization 2.997 Decision-Mking in Lrge-Scle Systems My 10 MIT, Spring 2004 Hndout #29 Lecture Note 24 1 Online Lerning nd Regret Minimiztion In this lecture, we consider the problem of sequentil decision mking in

More information

Discrete Mathematics and Probability Theory Summer 2014 James Cook Note 17

Discrete Mathematics and Probability Theory Summer 2014 James Cook Note 17 CS 70 Discrete Mthemtics nd Proility Theory Summer 2014 Jmes Cook Note 17 I.I.D. Rndom Vriles Estimting the is of coin Question: We wnt to estimte the proportion p of Democrts in the US popultion, y tking

More information

Genetic Programming. Outline. Evolutionary Strategies. Evolutionary strategies Genetic programming Summary

Genetic Programming. Outline. Evolutionary Strategies. Evolutionary strategies Genetic programming Summary Outline Genetic Progrmming Evolutionry strtegies Genetic progrmming Summry Bsed on the mteril provided y Professor Michel Negnevitsky Evolutionry Strtegies An pproch simulting nturl evolution ws proposed

More information

Bases for Vector Spaces

Bases for Vector Spaces Bses for Vector Spces 2-26-25 A set is independent if, roughly speking, there is no redundncy in the set: You cn t uild ny vector in the set s liner comintion of the others A set spns if you cn uild everything

More information

CS 188: Artificial Intelligence Spring 2007

CS 188: Artificial Intelligence Spring 2007 CS 188: Artificil Intelligence Spring 2007 Lecture 3: Queue-Bsed Serch 1/23/2007 Srini Nrynn UC Berkeley Mny slides over the course dpted from Dn Klein, Sturt Russell or Andrew Moore Announcements Assignment

More information

Administrivia CSE 190: Reinforcement Learning: An Introduction

Administrivia CSE 190: Reinforcement Learning: An Introduction Administrivi CSE 190: Reinforcement Lerning: An Introduction Any emil sent to me bout the course should hve CSE 190 in the subject line! Chpter 4: Dynmic Progrmming Acknowledgment: A good number of these

More information

1B40 Practical Skills

1B40 Practical Skills B40 Prcticl Skills Comining uncertinties from severl quntities error propgtion We usully encounter situtions where the result of n experiment is given in terms of two (or more) quntities. We then need

More information

Math 1B, lecture 4: Error bounds for numerical methods

Math 1B, lecture 4: Error bounds for numerical methods Mth B, lecture 4: Error bounds for numericl methods Nthn Pflueger 4 September 0 Introduction The five numericl methods descried in the previous lecture ll operte by the sme principle: they pproximte the

More information

THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS.

THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS. THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS RADON ROSBOROUGH https://intuitiveexplntionscom/picrd-lindelof-theorem/ This document is proof of the existence-uniqueness theorem

More information

The Minimum Label Spanning Tree Problem: Illustrating the Utility of Genetic Algorithms

The Minimum Label Spanning Tree Problem: Illustrating the Utility of Genetic Algorithms The Minimum Lel Spnning Tree Prolem: Illustrting the Utility of Genetic Algorithms Yupei Xiong, Univ. of Mrylnd Bruce Golden, Univ. of Mrylnd Edwrd Wsil, Americn Univ. Presented t BAE Systems Distinguished

More information

CMDA 4604: Intermediate Topics in Mathematical Modeling Lecture 19: Interpolation and Quadrature

CMDA 4604: Intermediate Topics in Mathematical Modeling Lecture 19: Interpolation and Quadrature CMDA 4604: Intermedite Topics in Mthemticl Modeling Lecture 19: Interpoltion nd Qudrture In this lecture we mke brief diversion into the res of interpoltion nd qudrture. Given function f C[, b], we sy

More information

Nondeterminism and Nodeterministic Automata

Nondeterminism and Nodeterministic Automata Nondeterminism nd Nodeterministic Automt 61 Nondeterminism nd Nondeterministic Automt The computtionl mchine models tht we lerned in the clss re deterministic in the sense tht the next move is uniquely

More information

Unit #9 : Definite Integral Properties; Fundamental Theorem of Calculus

Unit #9 : Definite Integral Properties; Fundamental Theorem of Calculus Unit #9 : Definite Integrl Properties; Fundmentl Theorem of Clculus Gols: Identify properties of definite integrls Define odd nd even functions, nd reltionship to integrl vlues Introduce the Fundmentl

More information

Properties of Integrals, Indefinite Integrals. Goals: Definition of the Definite Integral Integral Calculations using Antiderivatives

Properties of Integrals, Indefinite Integrals. Goals: Definition of the Definite Integral Integral Calculations using Antiderivatives Block #6: Properties of Integrls, Indefinite Integrls Gols: Definition of the Definite Integrl Integrl Clcultions using Antiderivtives Properties of Integrls The Indefinite Integrl 1 Riemnn Sums - 1 Riemnn

More information

QUADRATURE is an old-fashioned word that refers to

QUADRATURE is an old-fashioned word that refers to World Acdemy of Science Engineering nd Technology Interntionl Journl of Mthemticl nd Computtionl Sciences Vol:5 No:7 011 A New Qudrture Rule Derived from Spline Interpoltion with Error Anlysis Hdi Tghvfrd

More information

Exercises with (Some) Solutions

Exercises with (Some) Solutions Exercises with (Some) Solutions Techer: Luc Tesei Mster of Science in Computer Science - University of Cmerino Contents 1 Strong Bisimultion nd HML 2 2 Wek Bisimultion 31 3 Complete Lttices nd Fix Points

More information

{ } = E! & $ " k r t +k +1

{ } = E! & $  k r t +k +1 Chpter 4: Dynmic Progrmming Objectives of this chpter: Overview of collection of clssicl solution methods for MDPs known s dynmic progrmming (DP) Show how DP cn be used to compute vlue functions, nd hence,

More information

Bellman Optimality Equation for V*

Bellman Optimality Equation for V* Bellmn Optimlity Eqution for V* The vlue of stte under n optiml policy must equl the expected return for the best ction from tht stte: V (s) mx Q (s,) A(s) mx A(s) mx A(s) Er t 1 V (s t 1 ) s t s, t s

More information

Chapter 4: Dynamic Programming

Chapter 4: Dynamic Programming Chpter 4: Dynmic Progrmming Objectives of this chpter: Overview of collection of clssicl solution methods for MDPs known s dynmic progrmming (DP) Show how DP cn be used to compute vlue functions, nd hence,

More information

Math 8 Winter 2015 Applications of Integration

Math 8 Winter 2015 Applications of Integration Mth 8 Winter 205 Applictions of Integrtion Here re few importnt pplictions of integrtion. The pplictions you my see on n exm in this course include only the Net Chnge Theorem (which is relly just the Fundmentl

More information

Connected-components. Summary of lecture 9. Algorithms and Data Structures Disjoint sets. Example: connected components in graphs

Connected-components. Summary of lecture 9. Algorithms and Data Structures Disjoint sets. Example: connected components in graphs Prm University, Mth. Deprtment Summry of lecture 9 Algorithms nd Dt Structures Disjoint sets Summry of this lecture: (CLR.1-3) Dt Structures for Disjoint sets: Union opertion Find opertion Mrco Pellegrini

More information

CS103B Handout 18 Winter 2007 February 28, 2007 Finite Automata

CS103B Handout 18 Winter 2007 February 28, 2007 Finite Automata CS103B ndout 18 Winter 2007 Ferury 28, 2007 Finite Automt Initil text y Mggie Johnson. Introduction Severl childrens gmes fit the following description: Pieces re set up on plying ord; dice re thrown or

More information

The practical version

The practical version Roerto s Notes on Integrl Clculus Chpter 4: Definite integrls nd the FTC Section 7 The Fundmentl Theorem of Clculus: The prcticl version Wht you need to know lredy: The theoreticl version of the FTC. Wht

More information

W. We shall do so one by one, starting with I 1, and we shall do it greedily, trying

W. We shall do so one by one, starting with I 1, and we shall do it greedily, trying Vitli covers 1 Definition. A Vitli cover of set E R is set V of closed intervls with positive length so tht, for every δ > 0 nd every x E, there is some I V with λ(i ) < δ nd x I. 2 Lemm (Vitli covering)

More information

Section 4: Integration ECO4112F 2011

Section 4: Integration ECO4112F 2011 Reding: Ching Chpter Section : Integrtion ECOF Note: These notes do not fully cover the mteril in Ching, ut re ment to supplement your reding in Ching. Thus fr the optimistion you hve covered hs een sttic

More information

Chapter 5 : Continuous Random Variables

Chapter 5 : Continuous Random Variables STAT/MATH 395 A - PROBABILITY II UW Winter Qurter 216 Néhémy Lim Chpter 5 : Continuous Rndom Vribles Nottions. N {, 1, 2,...}, set of nturl numbers (i.e. ll nonnegtive integers); N {1, 2,...}, set of ll

More information

Model Reduction of Finite State Machines by Contraction

Model Reduction of Finite State Machines by Contraction Model Reduction of Finite Stte Mchines y Contrction Alessndro Giu Dip. di Ingegneri Elettric ed Elettronic, Università di Cgliri, Pizz d Armi, 09123 Cgliri, Itly Phone: +39-070-675-5892 Fx: +39-070-675-5900

More information

CS 188: Artificial Intelligence Fall Announcements

CS 188: Artificial Intelligence Fall Announcements CS 188: Artificil Intelligence Fll 2009 Lecture 20: Prticle Filtering 11/5/2009 Dn Klein UC Berkeley Announcements Written 3 out: due 10/12 Project 4 out: due 10/19 Written 4 proly xed, Project 5 moving

More information

Quantum Nonlocality Pt. 2: No-Signaling and Local Hidden Variables May 1, / 16

Quantum Nonlocality Pt. 2: No-Signaling and Local Hidden Variables May 1, / 16 Quntum Nonloclity Pt. 2: No-Signling nd Locl Hidden Vriles My 1, 2018 Quntum Nonloclity Pt. 2: No-Signling nd Locl Hidden Vriles My 1, 2018 1 / 16 Non-Signling Boxes The primry lesson from lst lecture

More information

Chapter 0. What is the Lebesgue integral about?

Chapter 0. What is the Lebesgue integral about? Chpter 0. Wht is the Lebesgue integrl bout? The pln is to hve tutoril sheet ech week, most often on Fridy, (to be done during the clss) where you will try to get used to the ides introduced in the previous

More information

Improper Integrals. The First Fundamental Theorem of Calculus, as we ve discussed in class, goes as follows:

Improper Integrals. The First Fundamental Theorem of Calculus, as we ve discussed in class, goes as follows: Improper Integrls The First Fundmentl Theorem of Clculus, s we ve discussed in clss, goes s follows: If f is continuous on the intervl [, ] nd F is function for which F t = ft, then ftdt = F F. An integrl

More information

2.4 Linear Inequalities and Interval Notation

2.4 Linear Inequalities and Interval Notation .4 Liner Inequlities nd Intervl Nottion We wnt to solve equtions tht hve n inequlity symol insted of n equl sign. There re four inequlity symols tht we will look t: Less thn , Less thn or

More information

Chapter 4 Contravariance, Covariance, and Spacetime Diagrams

Chapter 4 Contravariance, Covariance, and Spacetime Diagrams Chpter 4 Contrvrince, Covrince, nd Spcetime Digrms 4. The Components of Vector in Skewed Coordintes We hve seen in Chpter 3; figure 3.9, tht in order to show inertil motion tht is consistent with the Lorentz

More information

Continuous Random Variables

Continuous Random Variables STAT/MATH 395 A - PROBABILITY II UW Winter Qurter 217 Néhémy Lim Continuous Rndom Vribles Nottion. The indictor function of set S is rel-vlued function defined by : { 1 if x S 1 S (x) if x S Suppose tht

More information

Math 270A: Numerical Linear Algebra

Math 270A: Numerical Linear Algebra Mth 70A: Numericl Liner Algebr Instructor: Michel Holst Fll Qurter 014 Homework Assignment #3 Due Give to TA t lest few dys before finl if you wnt feedbck. Exercise 3.1. (The Bsic Liner Method for Liner

More information

Minimal DFA. minimal DFA for L starting from any other

Minimal DFA. minimal DFA for L starting from any other Miniml DFA Among the mny DFAs ccepting the sme regulr lnguge L, there is exctly one (up to renming of sttes) which hs the smllest possile numer of sttes. Moreover, it is possile to otin tht miniml DFA

More information

Lecture Solution of a System of Linear Equation

Lecture Solution of a System of Linear Equation ChE Lecture Notes, Dept. of Chemicl Engineering, Univ. of TN, Knoville - D. Keffer, 5/9/98 (updted /) Lecture 8- - Solution of System of Liner Eqution 8. Why is it importnt to e le to solve system of liner

More information

5.7 Improper Integrals

5.7 Improper Integrals 458 pplictions of definite integrls 5.7 Improper Integrls In Section 5.4, we computed the work required to lift pylod of mss m from the surfce of moon of mss nd rdius R to height H bove the surfce of the

More information

Math 61CM - Solutions to homework 9

Math 61CM - Solutions to homework 9 Mth 61CM - Solutions to homework 9 Cédric De Groote November 30 th, 2018 Problem 1: Recll tht the left limit of function f t point c is defined s follows: lim f(x) = l x c if for ny > 0 there exists δ

More information

Chapter Five: Nondeterministic Finite Automata. Formal Language, chapter 5, slide 1

Chapter Five: Nondeterministic Finite Automata. Formal Language, chapter 5, slide 1 Chpter Five: Nondeterministic Finite Automt Forml Lnguge, chpter 5, slide 1 1 A DFA hs exctly one trnsition from every stte on every symol in the lphet. By relxing this requirement we get relted ut more

More information

Exam 2, Mathematics 4701, Section ETY6 6:05 pm 7:40 pm, March 31, 2016, IH-1105 Instructor: Attila Máté 1

Exam 2, Mathematics 4701, Section ETY6 6:05 pm 7:40 pm, March 31, 2016, IH-1105 Instructor: Attila Máté 1 Exm, Mthemtics 471, Section ETY6 6:5 pm 7:4 pm, Mrch 1, 16, IH-115 Instructor: Attil Máté 1 17 copies 1. ) Stte the usul sufficient condition for the fixed-point itertion to converge when solving the eqution

More information

Parse trees, ambiguity, and Chomsky normal form

Parse trees, ambiguity, and Chomsky normal form Prse trees, miguity, nd Chomsky norml form In this lecture we will discuss few importnt notions connected with contextfree grmmrs, including prse trees, miguity, nd specil form for context-free grmmrs

More information

Designing finite automata II

Designing finite automata II Designing finite utomt II Prolem: Design DFA A such tht L(A) consists of ll strings of nd which re of length 3n, for n = 0, 1, 2, (1) Determine wht to rememer out the input string Assign stte to ech of

More information

CS 373, Spring Solutions to Mock midterm 1 (Based on first midterm in CS 273, Fall 2008.)

CS 373, Spring Solutions to Mock midterm 1 (Based on first midterm in CS 273, Fall 2008.) CS 373, Spring 29. Solutions to Mock midterm (sed on first midterm in CS 273, Fll 28.) Prolem : Short nswer (8 points) The nswers to these prolems should e short nd not complicted. () If n NF M ccepts

More information

5: The Definite Integral

5: The Definite Integral 5: The Definite Integrl 5.: Estimting with Finite Sums Consider moving oject its velocity (meters per second) t ny time (seconds) is given y v t = t+. Cn we use this informtion to determine the distnce

More information

Farey Fractions. Rickard Fernström. U.U.D.M. Project Report 2017:24. Department of Mathematics Uppsala University

Farey Fractions. Rickard Fernström. U.U.D.M. Project Report 2017:24. Department of Mathematics Uppsala University U.U.D.M. Project Report 07:4 Frey Frctions Rickrd Fernström Exmensrete i mtemtik, 5 hp Hledre: Andres Strömergsson Exmintor: Jörgen Östensson Juni 07 Deprtment of Mthemtics Uppsl University Frey Frctions

More information

Surface maps into free groups

Surface maps into free groups Surfce mps into free groups lden Wlker Novemer 10, 2014 Free groups wedge X of two circles: Set F = π 1 (X ) =,. We write cpitl letters for inverse, so = 1. e.g. () 1 = Commuttors Let x nd y e loops. The

More information

An Optimal Best-First Search Algorithm for Solving Infinite Horizon DEC-POMDPs

An Optimal Best-First Search Algorithm for Solving Infinite Horizon DEC-POMDPs An Optiml Best-First Serch Algorithm for Solving Infinite Horizon DEC-POMDPs Dniel Szer nd Frnçois Chrpillet INRIA Lorrine - LORIA, MAIA Group, 54506 Vndœuvre-lès-Nncy, Frnce {szer, chrp}@lori.fr http://mi.lori.fr

More information

Linear Systems with Constant Coefficients

Linear Systems with Constant Coefficients Liner Systems with Constnt Coefficients 4-3-05 Here is system of n differentil equtions in n unknowns: x x + + n x n, x x + + n x n, x n n x + + nn x n This is constnt coefficient liner homogeneous system

More information

LAMEPS Limited area ensemble forecasting in Norway, using targeted EPS

LAMEPS Limited area ensemble forecasting in Norway, using targeted EPS Limited re ensemle forecsting in Norwy, using trgeted Mrit H. Jensen, Inger-Lise Frogner* nd Ole Vignes, Norwegin Meteorologicl Institute, (*held the presenttion) At the Norwegin Meteorologicl Institute

More information

Lecture 2: January 27

Lecture 2: January 27 CS 684: Algorithmic Gme Theory Spring 217 Lecturer: Év Trdos Lecture 2: Jnury 27 Scrie: Alert Julius Liu 2.1 Logistics Scrie notes must e sumitted within 24 hours of the corresponding lecture for full

More information

Continuous Random Variables Class 5, Jeremy Orloff and Jonathan Bloom

Continuous Random Variables Class 5, Jeremy Orloff and Jonathan Bloom Lerning Gols Continuous Rndom Vriles Clss 5, 8.05 Jeremy Orloff nd Jonthn Bloom. Know the definition of continuous rndom vrile. 2. Know the definition of the proility density function (pdf) nd cumultive

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 CMSC 330 1 Types of Finite Automt Deterministic Finite Automt (DFA) Exctly one sequence of steps for ech string All exmples so fr Nondeterministic

More information

Goals: Determine how to calculate the area described by a function. Define the definite integral. Explore the relationship between the definite

Goals: Determine how to calculate the area described by a function. Define the definite integral. Explore the relationship between the definite Unit #8 : The Integrl Gols: Determine how to clculte the re described by function. Define the definite integrl. Eplore the reltionship between the definite integrl nd re. Eplore wys to estimte the definite

More information

Lecture 2e Orthogonal Complement (pages )

Lecture 2e Orthogonal Complement (pages ) Lecture 2e Orthogonl Complement (pges -) We hve now seen tht n orthonorml sis is nice wy to descrie suspce, ut knowing tht we wnt n orthonorml sis doesn t mke one fll into our lp. In theory, the process

More information

Fig. 1. Open-Loop and Closed-Loop Systems with Plant Variations

Fig. 1. Open-Loop and Closed-Loop Systems with Plant Variations ME 3600 Control ystems Chrcteristics of Open-Loop nd Closed-Loop ystems Importnt Control ystem Chrcteristics o ensitivity of system response to prmetric vritions cn be reduced o rnsient nd stedy-stte responses

More information

Monte Carlo method in solving numerical integration and differential equation

Monte Carlo method in solving numerical integration and differential equation Monte Crlo method in solving numericl integrtion nd differentil eqution Ye Jin Chemistry Deprtment Duke University yj66@duke.edu Abstrct: Monte Crlo method is commonly used in rel physics problem. The

More information

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. NFA for (a b)*abb.

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. NFA for (a b)*abb. CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 Types of Finite Automt Deterministic Finite Automt () Exctly one sequence of steps for ech string All exmples so fr Nondeterministic Finite Automt

More information

CS415 Compilers. Lexical Analysis and. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

CS415 Compilers. Lexical Analysis and. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University CS415 Compilers Lexicl Anlysis nd These slides re sed on slides copyrighted y Keith Cooper, Ken Kennedy & Lind Torczon t Rice University First Progrmming Project Instruction Scheduling Project hs een posted

More information

Riemann is the Mann! (But Lebesgue may besgue to differ.)

Riemann is the Mann! (But Lebesgue may besgue to differ.) Riemnn is the Mnn! (But Lebesgue my besgue to differ.) Leo Livshits My 2, 2008 1 For finite intervls in R We hve seen in clss tht every continuous function f : [, b] R hs the property tht for every ɛ >

More information

Numerical Integration

Numerical Integration Chpter 1 Numericl Integrtion Numericl differentition methods compute pproximtions to the derivtive of function from known vlues of the function. Numericl integrtion uses the sme informtion to compute numericl

More information

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. Comparing DFAs and NFAs (cont.) Finite Automata 2

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. Comparing DFAs and NFAs (cont.) Finite Automata 2 CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 Types of Finite Automt Deterministic Finite Automt () Exctly one sequence of steps for ech string All exmples so fr Nondeterministic Finite Automt

More information

A study of Pythagoras Theorem

A study of Pythagoras Theorem CHAPTER 19 A study of Pythgors Theorem Reson is immortl, ll else mortl. Pythgors, Diogenes Lertius (Lives of Eminent Philosophers) Pythgors Theorem is proly the est-known mthemticl theorem. Even most nonmthemticins

More information

Interpreting Integrals and the Fundamental Theorem

Interpreting Integrals and the Fundamental Theorem Interpreting Integrls nd the Fundmentl Theorem Tody, we go further in interpreting the mening of the definite integrl. Using Units to Aid Interprettion We lredy know tht if f(t) is the rte of chnge of

More information

Decision Networks. CS 188: Artificial Intelligence Fall Example: Decision Networks. Decision Networks. Decisions as Outcome Trees

Decision Networks. CS 188: Artificial Intelligence Fall Example: Decision Networks. Decision Networks. Decisions as Outcome Trees CS 188: Artificil Intelligence Fll 2011 Decision Networks ME: choose the ction which mximizes the expected utility given the evidence mbrell Lecture 17: Decision Digrms 10/27/2011 Cn directly opertionlize

More information

New Expansion and Infinite Series

New Expansion and Infinite Series Interntionl Mthemticl Forum, Vol. 9, 204, no. 22, 06-073 HIKARI Ltd, www.m-hikri.com http://dx.doi.org/0.2988/imf.204.4502 New Expnsion nd Infinite Series Diyun Zhng College of Computer Nnjing University

More information

Intermediate Math Circles Wednesday, November 14, 2018 Finite Automata II. Nickolas Rollick a b b. a b 4

Intermediate Math Circles Wednesday, November 14, 2018 Finite Automata II. Nickolas Rollick a b b. a b 4 Intermedite Mth Circles Wednesdy, Novemer 14, 2018 Finite Automt II Nickols Rollick nrollick@uwterloo.c Regulr Lnguges Lst time, we were introduced to the ide of DFA (deterministic finite utomton), one

More information

NUMERICAL INTEGRATION. The inverse process to differentiation in calculus is integration. Mathematically, integration is represented by.

NUMERICAL INTEGRATION. The inverse process to differentiation in calculus is integration. Mathematically, integration is represented by. NUMERICAL INTEGRATION 1 Introduction The inverse process to differentition in clculus is integrtion. Mthemticlly, integrtion is represented by f(x) dx which stnds for the integrl of the function f(x) with

More information

How to simulate Turing machines by invertible one-dimensional cellular automata

How to simulate Turing machines by invertible one-dimensional cellular automata How to simulte Turing mchines by invertible one-dimensionl cellulr utomt Jen-Christophe Dubcq Déprtement de Mthémtiques et d Informtique, École Normle Supérieure de Lyon, 46, llée d Itlie, 69364 Lyon Cedex

More information

1 Nondeterministic Finite Automata

1 Nondeterministic Finite Automata 1 Nondeterministic Finite Automt Suppose in life, whenever you hd choice, you could try oth possiilities nd live your life. At the end, you would go ck nd choose the one tht worked out the est. Then you

More information

Designing Information Devices and Systems I Spring 2018 Homework 8

Designing Information Devices and Systems I Spring 2018 Homework 8 EECS 16A Designing Informtion Devices nd Systems I Spring 2018 Homework 8 This homework is due Mrch 19, 2018, t 23:59. Self-grdes re due Mrch 22, 2018, t 23:59. Sumission Formt Your homework sumission

More information

Chapter 9 Definite Integrals

Chapter 9 Definite Integrals Chpter 9 Definite Integrls In the previous chpter we found how to tke n ntiderivtive nd investigted the indefinite integrl. In this chpter the connection etween ntiderivtives nd definite integrls is estlished

More information

SUMMER KNOWHOW STUDY AND LEARNING CENTRE

SUMMER KNOWHOW STUDY AND LEARNING CENTRE SUMMER KNOWHOW STUDY AND LEARNING CENTRE Indices & Logrithms 2 Contents Indices.2 Frctionl Indices.4 Logrithms 6 Exponentil equtions. Simplifying Surds 13 Opertions on Surds..16 Scientific Nottion..18

More information

Review of Probability Distributions. CS1538: Introduction to Simulations

Review of Probability Distributions. CS1538: Introduction to Simulations Review of Proility Distriutions CS1538: Introduction to Simultions Some Well-Known Proility Distriutions Bernoulli Binomil Geometric Negtive Binomil Poisson Uniform Exponentil Gmm Erlng Gussin/Norml Relevnce

More information

Lecture 3. In this lecture, we will discuss algorithms for solving systems of linear equations.

Lecture 3. In this lecture, we will discuss algorithms for solving systems of linear equations. Lecture 3 3 Solving liner equtions In this lecture we will discuss lgorithms for solving systems of liner equtions Multiplictive identity Let us restrict ourselves to considering squre mtrices since one

More information

different methods (left endpoint, right endpoint, midpoint, trapezoid, Simpson s).

different methods (left endpoint, right endpoint, midpoint, trapezoid, Simpson s). Mth 1A with Professor Stnkov Worksheet, Discussion #41; Wednesdy, 12/6/217 GSI nme: Roy Zho Problems 1. Write the integrl 3 dx s limit of Riemnn sums. Write it using 2 intervls using the 1 x different

More information
