Robot Planning in Partially Observable Continuous Domains


Josep M. Porta
Institut de Robòtica i Informàtica Industrial (UPC-CSIC)
Llorens i Artigas 4-6, 08028, Barcelona, Spain
Email: porta@iri.upc.edu

Matthijs T. J. Spaan
Informatics Institute, University of Amsterdam
Kruislaan 403, 1098SJ, Amsterdam, The Netherlands
Email: mtjspaan@science.uva.nl

Nikos Vlassis
Informatics Institute, University of Amsterdam
Kruislaan 403, 1098SJ, Amsterdam, The Netherlands
Email: vlassis@science.uva.nl

Abstract

We present a value iteration algorithm for learning to act in Partially Observable Markov Decision Processes (POMDPs) with continuous state spaces. Mainstream POMDP research focuses on the discrete case and this complicates its application to, e.g., robotic problems that are naturally modeled using continuous state spaces. The main difficulty in defining a (belief-based) POMDP in a continuous state space is that expected values over states must be defined using integrals that, in general, cannot be computed in closed form. In this paper, we first show that the optimal finite-horizon value function over the continuous infinite-dimensional POMDP belief space is piecewise linear and convex, and is defined by a finite set of supporting α-functions that are analogous to the α-vectors (hyperplanes) defining the value function of a discrete-state POMDP. Second, we show that, for a fairly general class of POMDP models in which all functions of interest are modeled by Gaussian mixtures, all belief updates and value iteration backups can be carried out analytically and exactly. A crucial difference with respect to the α-vectors of the discrete case is that, in the continuous case, the α-functions will typically grow in complexity (e.g., in the number of components) in each value iteration. Finally, we demonstrate PERSEUS, our previously proposed randomized point-based value iteration algorithm, in a simple robot planning problem with a continuous domain, where encouraging results are observed.

I. INTRODUCTION

A popular formalism for decision making under uncertainty is the Markov Decision Process (MDP) framework [1]. In this paradigm, an agent interacts with a given system by executing actions that change the state of the system stochastically and that provide rewards or penalties to the agent.
The objective of the learning agent is to identify for each state the action that produces the most reward in the long term. When the decision making has to be performed based on uncertain information about the state of the system, the task is naturally formalized as a Partially Observable Markov Decision Process (POMDP) [2]-[6]. POMDPs have often been used as a framework for planning in robotics [7]-[10].

In general, computing the exact solution of a POMDP is an intractable problem [11], [12], even for the discrete case (i.e., discrete sets of states, actions, and observations). Two main factors cause this high computational cost [13]. The first one is the curse of history: the number of action-observation sequences to be considered increases exponentially as we extend the planning horizon. Fortunately, the curse of history can be minimized by limiting ourselves to approximate solutions [13], [14]. The second factor that makes POMDP algorithms inefficient is the curse of dimensionality: the computational cost of discrete-state POMDP algorithms scales with the number of states. Therefore, the finer the granularity of the state space discretization, the higher the cost of solving the POMDP. One insight we can extract from this fact is that it would be desirable to avoid the discretization of the state space. Moreover, real world problems are naturally formalized using continuous spaces. For instance, in a robot navigation problem, the state to be estimated is the pose of the robot that, for a robot moving on a planar surface, is naturally defined in the continuous space of the Cartesian coordinates of the robot and its orientation.

Linear POMDPs with continuous states and quadratic reward functions have a closed solution [15]. However, this is too restrictive a case for many practical purposes. Existing algorithms for continuous-state POMDPs with general reward functions are based on policy search [16], [17] or approximate (grid-based) value iteration [18], [19]. For discrete-state POMDPs, recent promising algorithms are based on point-based value iteration [13], [14]. In this paper, we present a novel approach to solve POMDPs in continuous state spaces via value iteration.
The main difficulty of working in continuous state spaces is that expected values over states must be defined using integrals. These integrals cannot be computed in closed form for general functions and, therefore, only approximation techniques can be used [19]. In our approach, we restrict all functions defined on the state space to a particular, although highly expressive, family of functions: linear combinations of Gaussians. This allows us to evaluate all integrals involved in the value iteration POMDP formulation in closed form. Using this fact, we can adapt to the continuous case the rich machinery developed for discrete-state POMDP value iteration, in particular the point-based algorithms.

This paper is organized as follows. First, in Section II, we review the POMDP framework and the value iteration process for discrete-state POMDPs. In Section III, we generalize the value function representation commonly used in discrete-state POMDPs to continuous-state ones. This allows us to do value iteration for the continuous case. In Section IV, we derive closed formulas for the elements involved in the value iteration framework introduced in Section III, assuming a Gaussian-based representation for the beliefs and the models defining the POMDP. In Section V, we use these closed formulas to define a point-based algorithm for Gaussian-based POMDPs. In Section VI, we present some results with the proposed algorithm and, in Section VII, we summarize our work and point to directions for further research.

II. PRELIMINARIES: POMDPS

A POMDP models an agent interacting with a system using the following elements:

- A set of system states, $S$.
- A set of agent actions, $A$.
- A set of observations, $O$.
- An action (or transition) model defined by $p(s' \mid s, a)$, the probability that the system changes from state $s$ to $s'$ when the agent executes action $a$.
- An observation model defined by $p(o \mid s)$, the probability that the agent observes $o$ when the system reaches state $s$.
- A reward function defined as $r_a(s) \in \mathbb{R}$, the reward obtained by the agent if it executes action $a$ when the system is in state $s$.

At a given moment, the system is in a state, $s$, and the agent executes an action, $a$. As a result, the agent receives a reward, $r$, the system state changes to $s'$ and, then, the agent observes $o$. The knowledge of the agent about the system state is represented as a belief, i.e., a probability distribution over the state space. The initial belief is assumed to be known and, for a discrete set of states, if $b$ is the belief of the agent about the state, the belief after executing action $a$ and observing $o$ is

$$b^{a,o}(s') = \frac{p(o \mid s')}{p(o \mid a, b)} \sum_{s \in S} p(s' \mid s, a)\, b(s). \tag{1}$$

A function mapping beliefs to actions is called a policy. An optimal policy is one that, on average, generates as much reward as possible in the long term. The value function condenses the immediate and delayed reward that can be obtained from a given belief. This function can be expressed in a recursive way

$$V_n(b) = \max_a Q_n(b, a), \tag{2}$$

with

$$Q_n(b, a) = \sum_{s \in S} r_a(s)\, b(s) + \gamma \sum_o p(o \mid b, a)\, V_{n-1}(b^{a,o}), \tag{3}$$

where $n$ is the planning horizon, $S$ and $O$ are assumed discrete and $\gamma \in [0, 1)$ is a discount factor that trades off the importance of the immediate and the delayed reward. The above recursion is usually written in functional form

$$V_n = H\, V_{n-1} \tag{4}$$

and it is known as the Bellman recursion [20].
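As an illustration of the discrete belief update in Eq. 1, the following is a minimal sketch, not the authors' implementation. The array layout (`T[a][s, s2]` for the action model, `Z[s2, o]` for the observation model) and the toy numbers are assumptions made only for this example.

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """Discrete belief update of Eq. 1.

    b : current belief, shape (S,)
    T : T[a][s, s2] = p(s2 | s, a)   (hypothetical layout)
    Z : Z[s2, o]    = p(o | s2)      (hypothetical layout)
    """
    predicted = b @ T[a]                  # sum_s p(s2 | s, a) b(s)
    unnormalized = Z[:, o] * predicted    # multiply by p(o | s2)
    evidence = unnormalized.sum()         # p(o | a, b), the normalizer
    if evidence == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return unnormalized / evidence

# Toy two-state problem; all numbers are illustrative only.
T = {0: np.array([[0.9, 0.1],
                  [0.2, 0.8]])}
Z = np.array([[0.7, 0.3],
              [0.1, 0.9]])
b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, a=0, o=1, T=T, Z=Z)
```

The normalizer computed inside the function is exactly the term $p(o \mid a, b)$ that also weights the future value in Eq. 3.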
This recursion converges to a fixed point $V^*$ that is the optimal value function [1]. An optimal policy $\pi^*$ can be defined as

$$\pi^*(b) = \arg\max_a Q^*(b, a)$$

for $Q^*$ the Q-function associated with the optimal value function, $V^*$. Value iteration for POMDPs [2], [6], [21] generates a sequence of functions $V_i$ using the recurrence in Eq. 4 that progressively approach $V^*$ and computes an approximately optimal policy from the final $V_i$.

At first sight the value function seems intractable, but it can be expressed in a simple form [2]

$$V_n(b) = \max_{\{\alpha_n^i\}_i} \sum_{s} \alpha_n^i(s)\, b(s),$$

with $\{\alpha_n^i\}_i$ a set of vectors. Using this formulation, value iteration algorithms typically focus on the computation of the $\alpha_n$-vectors.

III. POMDPS IN CONTINUOUS STATE SPACES

In this section, we generalize POMDPs to continuous state spaces, while still assuming discrete action and observation spaces. With this formulation, we avoid the necessity of discretizing the state space and, thus, we reduce the chances of being affected by the curse of dimensionality.

In the discrete case, expectations for a given belief are computed by summing over the state space (see Eqs. 1 and 3). The generalization to the continuous case amounts to computing these expected values by integrating instead of summing. Thus we have

$$b^{a,o}(s') = \frac{p(o \mid s')}{p(o \mid a, b)} \int_{s} p(s' \mid s, a)\, b(s)\, ds, \tag{5}$$

and

$$Q_n(b, a) = \int_{s} r_a(s)\, b(s)\, ds + \gamma \sum_o p(o \mid b, a)\, V_{n-1}(b^{a,o}), \tag{6}$$

where $r_a : S \to \mathbb{R}$ is a continuous reward function for action $a$.

With a continuous state space, the belief space is also continuous, as in the discrete case, but now with an infinite number of dimensions. However, there are several properties typical of value functions for discrete state spaces that still hold in the continuous case. Namely, we can prove [22] that (1) the optimal finite-horizon value function is piecewise linear and convex (PWLC) in the belief space, (2) the value function recursion is isotonic, and (3) this recursion is also a contraction (and thus, the iterative computation of the value function for increasing horizons will converge to the optimal value function $V^*$). The PWLC is a basic property since it allows us to represent the value function using a small set of supporting elements.
This kind of representation is the key element to define the value iteration process. To prove this property, we first need to prove the following lemma.

Lemma 1: The value function in continuous-state POMDPs can be expressed as

$$V_n(b) = \max_{\{\alpha_n^i\}_i} \int_{s} \alpha_n^i(s)\, b(s)\, ds,$$

for appropriate α-functions $\alpha_n^i : S \to \mathbb{R}$.

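Proof: The proof, as in the discrete case, is done via induction. For planning horizon 0, we only have to take into account the immediate reward and, thus, we have that

$$V_0(b) = \max_a \int_{s} r_a(s)\, b(s)\, ds,$$

and, therefore, if we define

$$\{\alpha_0^i(s)\}_i = \{r_a(s)\}_{a \in A},$$

we have that, as desired,

$$V_0(b) = \max_{\{\alpha_0^i\}_i} \int_{s} \alpha_0^i(s)\, b(s)\, ds.$$

For the general case, we have that, using Eqs. 2 and 6,

$$V_n(b) = \max_a \Big\{ \int_{s} r_a(s)\, b(s)\, ds + \gamma \sum_o p(o \mid a, b)\, V_{n-1}(b^{a,o}) \Big\},$$

and, by the induction hypothesis,

$$V_{n-1}(b^{a,o}) = \max_{\{\alpha_{n-1}^j\}_j} \int_{s'} \alpha_{n-1}^j(s')\, b^{a,o}(s')\, ds'.$$

From Eq. 5,

$$V_{n-1}(b^{a,o}) = \max_{\{\alpha_{n-1}^j\}_j} \int_{s'} \alpha_{n-1}^j(s')\, \frac{p(o \mid s')}{p(o \mid a, b)} \int_{s} p(s' \mid s, a)\, b(s)\, ds\, ds' = \frac{1}{p(o \mid a, b)} \max_{\{\alpha_{n-1}^j\}_j} \int_{s'} \alpha_{n-1}^j(s')\, p(o \mid s') \int_{s} p(s' \mid s, a)\, b(s)\, ds\, ds',$$

and, therefore, the factor $p(o \mid a, b)$ cancels and

$$V_n(b) = \max_a \Big\{ \int_{s} r_a(s)\, b(s)\, ds + \gamma \sum_o \max_{\{\alpha_{n-1}^j\}_j} \int_{s'} \alpha_{n-1}^j(s')\, p(o \mid s') \int_{s} p(s' \mid s, a)\, b(s)\, ds\, ds' \Big\}$$

$$\phantom{V_n(b)} = \max_a \Big\{ \int_{s} r_a(s)\, b(s)\, ds + \gamma \sum_o \max_{\{\alpha_{n-1}^j\}_j} \int_{s} \Big[ \int_{s'} \alpha_{n-1}^j(s')\, p(o \mid s')\, p(s' \mid s, a)\, ds' \Big] b(s)\, ds \Big\}.$$

At this point, we define

$$\alpha_{a,o}^j(s) = \int_{s'} \alpha_{n-1}^j(s')\, p(o \mid s')\, p(s' \mid s, a)\, ds'. \tag{7}$$

With this, we have that

$$V_n(b) = \max_a \Big\{ \int_{s} r_a(s)\, b(s)\, ds + \gamma \sum_o \max_{\{\alpha_{a,o}^j\}_j} \int_{s} \alpha_{a,o}^j(s)\, b(s)\, ds \Big\},$$

and we define

$$\alpha_{a,o,b} = \arg\max_{\{\alpha_{a,o}^j\}_j} \int_{s} \alpha_{a,o}^j(s)\, b(s)\, ds. \tag{8}$$

Observe that, for a given $a$ and $o$, $\alpha_{a,o,b}$ is just one of the $M$ elements in the set $\{\alpha_{a,o}^j\}_j$. Using a reasoning parallel to that of the enumeration phase of the Monahan algorithm [3], we can have, at most, $|A|\, M^{|O|}$ different selections of $\alpha_{a,o,b}$-functions (one of the $M$ elements for each observation, for each action). The finite cardinality of this set is a crucial point since it proves that we can represent $V_n(b)$ with a finite set of supporting α-functions, despite the infinite dimensionality of the belief space. Using the above, we can write

$$V_n(b) = \max_a \Big\{ \int_{s} r_a(s)\, b(s)\, ds + \gamma \sum_o \int_{s} \alpha_{a,o,b}(s)\, b(s)\, ds \Big\} = \max_a \Big\{ \int_{s} \Big[ r_a(s) + \gamma \sum_o \alpha_{a,o,b}(s) \Big] b(s)\, ds \Big\}.$$

If we define

$$\{\alpha_n^i(s)\}_i = \Big\{ r_a(s) + \gamma \sum_o \alpha_{a,o,b}(s) \Big\}_{a \in A}, \tag{9}$$

we have $V_n$ in the desired form

$$V_n(b) = \max_{\{\alpha_n^i\}_i} \int_{s} \alpha_n^i(s)\, b(s)\, ds, \tag{10}$$

and, thus, the lemma holds.

Lemma 2: The value function is PWLC in the belief space.

Proof: It holds that

$$V_n(b) = \max_{\{\alpha_n^i\}_i} V_n^i(b), \qquad \text{with} \qquad V_n^i(b) = \int_{s} \alpha_n^i(s)\, b(s)\, ds.$$

For a particular $V_n^i$ it clearly holds that

$$V_n^i(\kappa\, b_1 + \lambda\, b_2) = \kappa\, V_n^i(b_1) + \lambda\, V_n^i(b_2),$$

for arbitrary $\kappa$ and $\lambda$. Therefore, each $V_n^i$ is a linear function in $b$. The piecewise linearity part of the property is given by the fact that the $\{\alpha_n^i\}_i$ set is of finite cardinality and, as shown above, $V_n$ is linear for each individual $\alpha_n^i$. Finally, the convexity is given by the fact that we take the maximum of convex (linear) functions when computing the value function and, thus, we obtain a convex function as a result.

Eqs. 7 to 9 constitute the value iteration process for continuous-state POMDPs since they provide a constructive way to determine the elements (i.e., the α-functions) defining $V_n$ from those defining $V_{n-1}$.

IV. GAUSSIAN-BASED POMDPS

In the previous sections, we left as an open point how to actually compute the belief update (Eq. 5), the steps in the value iteration process (Eqs. 7 to 9), and the value for a given belief point (Eq. 10). In this section, we show how these computations are possible assuming that the beliefs as well as the observation, action, and reward models are represented as linear combinations of Gaussians. We first formally introduce our assumptions on the models (Section IV-A) and then we define the belief update (Section IV-B) and the basic value iteration steps (Section IV-C) for Gaussian-based POMDPs. Note that other families of integrable functions could be used to determine the α-functions in closed form, but Gaussian-based models provide a high degree of flexibility and are of common use in many applications, including robotics [23], [24].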
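Before specializing Eqs. 7 to 9 to Gaussian mixtures, it may help to see the same three steps in the discrete-state setting, where every integral becomes a sum. The sketch below is only an illustration under assumed array layouts (`R[a, s]`, `T[a, s, s2]`, `Z[s2, o]`); the function name `point_backup` and the toy numbers are hypothetical and not taken from the paper.

```python
import numpy as np

def point_backup(b, alphas, R, T, Z, gamma):
    """Discrete-state analogue of Eqs. 7-9 for a single belief point b.

    b      : belief, shape (S,)
    alphas : list of alpha-vectors defining V_{n-1}, each of shape (S,)
    R      : R[a, s]     = r_a(s)
    T      : T[a, s, s2] = p(s2 | s, a)
    Z      : Z[s2, o]    = p(o | s2)
    """
    best_alpha, best_value = None, -np.inf
    for a in range(R.shape[0]):
        alpha_a = R[a].astype(float)          # immediate reward term (copy)
        for o in range(Z.shape[1]):
            # Eq. 7: alpha_{a,o}^j(s) = sum_{s2} alpha^j(s2) p(o|s2) p(s2|s,a)
            candidates = [T[a] @ (Z[:, o] * alpha) for alpha in alphas]
            # Eq. 8: keep the candidate with the largest expected value at b
            alpha_aob = max(candidates, key=lambda c: float(c @ b))
            # Eq. 9: add the discounted contribution of this observation
            alpha_a = alpha_a + gamma * alpha_aob
        value = float(alpha_a @ b)
        if value > best_value:                # final max over actions
            best_alpha, best_value = alpha_a, value
    return best_alpha

# Toy problem with two states, two actions, two observations (illustrative).
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])
T = np.stack([np.eye(2), np.eye(2)])          # both actions leave the state unchanged
Z = np.array([[0.8, 0.2],
              [0.3, 0.7]])
b = np.array([0.2, 0.8])
alpha_1 = point_backup(b, [R[0], R[1]], R, T, Z, gamma=0.9)
```

In the continuous case the structure of the loop is unchanged; only the matrix products are replaced by the closed-form Gaussian integrals derived in this section.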

A. Models for Gaussian-based POMDPs

We will assume that belief points are represented as Gaussian mixtures

$$b(s) = \sum_j w_j\, \phi(s \mid s_j, \Sigma_j), \tag{11}$$

with $\phi$ a Gaussian with mean $s_j$ and covariance matrix $\Sigma_j$ and where the mixing weights satisfy $w_j > 0$, $\sum_j w_j = 1$. In the extreme case, Gaussian mixtures with an infinite number of components would be necessary to represent a given point in the continuous, infinite-dimensional belief space. However, only Gaussian mixtures with few components are needed in practical situations.

We assume that our observation model is defined non-parametrically from a set of samples $T = \{(s_i, o_i) \mid i \in [1, N]\}$ with $o_i$ an observation obtained at state $s_i$. Using these samples, the observation model can be defined as

$$p(o \mid s) = \frac{p(s \mid o)\, p(o)}{p(s)},$$

and, assuming a uniform $p(s)$ in the space covered by $T$, and approximating $p(o)$ from the samples in the training set, we have

$$p(o \mid s) \approx \Big[ \frac{1}{N_o} \sum_{i=1}^{N_o} \lambda_i^o\, \phi(s \mid s_i^o, \Sigma_i^o) \Big] \frac{N_o}{N} = \sum_{i=1}^{N_o} w_i^o\, \phi(s \mid s_i^o, \Sigma_i^o),$$

with $s_i^o$ one of the $N_o$ points in $T$ with $o$ as an associated observation and where $w_i^o = \lambda_i^o / N$ and $\Sigma_i^o$ are, respectively, a weighting factor and a covariance matrix associated with that training point. The sets $\{\lambda_i^o\}_i$ and $\{\Sigma_i^o\}_i$ are defined so that

$$p(s) = \sum_o p(s \mid o)\, p(o) = \sum_o \sum_{i=1}^{N_o} w_i^o\, \phi(s \mid s_i^o, \Sigma_i^o)$$

is (approximately) uniform in the area covered by $T$.

As far as the action model is concerned, we assume it is linear-Gaussian

$$p(s' \mid s, a) = \phi(s' \mid s + \Delta(a), \Sigma_a). \tag{12}$$

Non-linear action models can be approximated as it is done, for instance, in the extended Kalman filter or in the unscented Kalman filter [25]. The function $\Delta : A \to S$ implements the transition model of the system.

Finally, the reward can be seen as an observation with an associated scalar value. Therefore, assuming a finite set of possible rewards $R = \{r_i \mid i \in [1, M]\}$, the reward model $p(r \mid s, a)$ for each particular $a$ can be represented in the same way as the observation model

$$p(r \mid s, a) \approx \sum_{i=1}^{M_r} w_i^r\, \phi(s \mid s_i^r, \Sigma_i^r).$$

With that, we have that

$$r_a(s) = \sum_{r \in R} r\, p(r \mid s, a) \approx \sum_{r \in R} r \sum_{i=1}^{M_r} w_i^r\, \phi(s \mid s_i^r, \Sigma_i^r),$$

that is an unnormalized Gaussian mixture.

B. Belief Updates for Gaussian-based POMDPs

The belief update in Eq. 5 can be implemented in our model taking into account that it consists of two steps. The first one is the application of the action model on the current belief state.
This can be computed as the convolution of the Gaussians representing $b(s)$ (Eq. 11) with the Gaussian representing the action model (Eq. 12). This convolution results in

$$\int_{s} p(s' \mid s, a)\, b(s)\, ds = \sum_j w_j\, \phi(s' \mid s_j + \Delta(a), \Sigma_j + \Sigma_a).$$

In the second step of the belief update, the prediction obtained with the action model is corrected using the information provided by the observation model

$$b^{a,o}(s') \propto \Big[ \sum_i w_i^o\, \phi(s' \mid s_i^o, \Sigma_i^o) \Big] \Big[ \sum_j w_j\, \phi(s' \mid s_j + \Delta(a), \Sigma_j + \Sigma_a) \Big] = \sum_{i,j} w_i^o\, w_j\, \phi(s' \mid s_i^o, \Sigma_i^o)\, \phi(s' \mid s_j + \Delta(a), \Sigma_j + \Sigma_a).$$

The product of two Gaussian functions is a scaled Gaussian. Therefore, we have that

$$b^{a,o}(s') \propto \sum_{i,j} w_i^o\, w_j\, \delta_{i,j}^{a,o}\, \phi(s' \mid s_{i,j}^{a,o}, \Sigma_{i,j}^{a,o}),$$

with

$$\delta_{i,j}^{a,o} = \phi(s_j + \Delta(a) \mid s_i^o, \Sigma_i^o + \Sigma_j + \Sigma_a),$$

$$\Sigma_{i,j}^{a,o} = \big( (\Sigma_i^o)^{-1} + (\Sigma_j + \Sigma_a)^{-1} \big)^{-1},$$

$$s_{i,j}^{a,o} = \Sigma_{i,j}^{a,o} \big( (\Sigma_i^o)^{-1} s_i^o + (\Sigma_j + \Sigma_a)^{-1} (s_j + \Delta(a)) \big).$$

The proportionality in the definition of $b^{a,o}(s')$ implies that the weights ($w_i^o\, w_j\, \delta_{i,j}^{a,o}$, $\forall i, j$) should be scaled to sum to one.

C. Backup Operator for Gaussian-based POMDPs

The computation of the mapping $H$ (Eq. 4) for a given belief point $b$ is called a backup. This mapping determines the α-function (or α-vector in the discrete case) to be included in $V_n$ for the belief point under consideration (see Eqs. 7 to 9). A full backup, i.e., a backup for the whole belief space, involves the computation of all relevant α-functions for $V_n$. Full backups are computationally expensive (in the discrete case they involve the use of linear programming in order to determine a sufficient set of points on which to back up), but the backup for a single belief point is relatively cheap. This is exploited by the point-based POMDP algorithms to efficiently approximate $V_n$ on a fixed set of belief points [13], [14]. Next, we describe the backup operator on continuous state spaces that we will use later in the PERSEUS algorithm.

The backup for a given belief point $b$ is

$$\mathrm{backup}(b) = \arg\max_{\{\alpha_n^i\}_i} \int_{s} \alpha_n^i(s)\, b(s)\, ds,$$

where $\alpha_n^i(s)$ is defined in Eqs. 8 and 9 from the $\alpha_{a,o}$-functions (Eq. 7).
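The two-step belief update of Section IV-B (prediction with the action model, then correction with the observation model) can be sketched for a one-dimensional state space, where covariances reduce to scalar variances. This is a minimal illustration, not the authors' code: the function names, the argument layout and the toy numbers are assumptions made for the example.

```python
import numpy as np

def gauss(x, mean, var):
    """One-dimensional Gaussian density phi(x | mean, var)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def belief_update_1d(w, mu, var, delta, var_a, w_o, mu_o, var_o):
    """Gaussian-mixture belief update for a scalar state.

    (w, mu, var)       : weights, means, variances of the belief b
    delta, var_a       : displacement Delta(a) and variance of the action model
    (w_o, mu_o, var_o) : mixture representing p(o | s) for the received o
    Returns weights, means and variances of b^{a,o}, one per (i, j) pair.
    """
    # Step 1: prediction, each component is shifted and widened.
    mu_p = mu + delta
    var_p = var + var_a
    # Step 2: correction, pairwise product of observation component i
    # (rows) with predicted belief component j (columns).
    scale = gauss(mu_p[None, :], mu_o[:, None], var_o[:, None] + var_p[None, :])
    new_w = w_o[:, None] * w[None, :] * scale
    new_var = 1.0 / (1.0 / var_o[:, None] + 1.0 / var_p[None, :])
    new_mu = new_var * (mu_o[:, None] / var_o[:, None] + mu_p[None, :] / var_p[None, :])
    new_w = new_w / new_w.sum()               # rescale weights to sum to one
    return new_w.ravel(), new_mu.ravel(), new_var.ravel()

# Illustrative numbers only: a bimodal belief and a bimodal observation model.
w, mu, var = np.array([0.5, 0.5]), np.array([0.0, 10.0]), np.array([1.0, 1.0])
w_o, mu_o, var_o = np.array([0.5, 0.5]), np.array([2.0, 12.0]), np.array([1.0, 1.0])
new_w, new_mu, new_var = belief_update_1d(w, mu, var, 2.0, 0.5, w_o, mu_o, var_o)
```

Note that the number of components of the updated belief is the product of the component counts of the belief and the observation model, which is why some form of mixture compression becomes necessary when the update is repeated.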

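Lemma 3: The functions $\alpha_n^i(s)$ can be expressed as linear combinations of Gaussians, assuming the sensor, action and reward models are also Gaussian-based.

Proof: This lemma can be proved via induction. For $n = 0$, $\alpha_0^i(s) = r_a(s)$ for a fixed $a$ and thus it is indeed an unnormalized Gaussian mixture. For $n > 0$, we assume that

$$\alpha_{n-1}^j(s') = \sum_k w_k^j\, \phi(s' \mid s_k^j, \Sigma_k^j).$$

Then, with our particular models, $\alpha_{a,o}^j(s)$ in Eq. 7 is the integral of the product of three linear combinations of Gaussians

$$\alpha_{a,o}^j(s) = \int_{s'} \Big[ \sum_k w_k^j\, \phi(s' \mid s_k^j, \Sigma_k^j) \Big] \Big[ \sum_l w_l^o\, \phi(s' \mid s_l^o, \Sigma_l^o) \Big] \phi(s' \mid s + \Delta(a), \Sigma_a)\, ds' = \sum_{k,l} w_k^j\, w_l^o \int_{s'} \phi(s' \mid s_k^j, \Sigma_k^j)\, \phi(s' \mid s_l^o, \Sigma_l^o)\, \phi(s' \mid s + \Delta(a), \Sigma_a)\, ds'.$$

In this case, we have to perform the product of two Gaussians twice, once for $\phi(s' \mid s_k^j, \Sigma_k^j)$ and $\phi(s' \mid s_l^o, \Sigma_l^o)$ to get $\delta_{k,l}^{j,o}\, \phi(s' \mid s_{k,l}^{j,o}, \Sigma_{k,l}^{j,o})$, and once more for $\delta_{k,l}^{j,o}\, \phi(s' \mid s_{k,l}^{j,o}, \Sigma_{k,l}^{j,o})$ and $\phi(s' \mid s + \Delta(a), \Sigma_a)$ to get $\delta_{k,l}^{j,o}\, \beta_{k,l}^{j,o,a}(s)\, \phi(s' \mid \hat{s}, \hat{\Sigma})$. The mean $\hat{s}$ and covariance $\hat{\Sigma}$ of this last Gaussian need not be computed, because it integrates to one over $s'$. The terms $\delta_{k,l}^{j,o}$ and $\beta_{k,l}^{j,o,a}(s)$ can be expressed as

$$\delta_{k,l}^{j,o} = \phi(s_l^o \mid s_k^j, \Sigma_k^j + \Sigma_l^o),$$

$$\beta_{k,l}^{j,o,a}(s) = \phi(s \mid s_{k,l}^{j,o} - \Delta(a), \Sigma_{k,l}^{j,o} + \Sigma_a),$$

with

$$\Sigma_{k,l}^{j,o} = \big[ (\Sigma_k^j)^{-1} + (\Sigma_l^o)^{-1} \big]^{-1},$$

$$s_{k,l}^{j,o} = \Sigma_{k,l}^{j,o} \big[ (\Sigma_k^j)^{-1} s_k^j + (\Sigma_l^o)^{-1} s_l^o \big].$$

With this, we have

$$\alpha_{a,o}^j(s) = \sum_{k,l} w_k^j\, w_l^o\, \delta_{k,l}^{j,o} \int_{s'} \beta_{k,l}^{j,o,a}(s)\, \phi(s' \mid \hat{s}, \hat{\Sigma})\, ds' = \sum_{k,l} w_k^j\, w_l^o\, \delta_{k,l}^{j,o}\, \beta_{k,l}^{j,o,a}(s) \int_{s'} \phi(s' \mid \hat{s}, \hat{\Sigma})\, ds' = \sum_{k,l} w_k^j\, w_l^o\, \delta_{k,l}^{j,o}\, \beta_{k,l}^{j,o,a}(s).$$

Once we have the $\alpha_{a,o}^j$-functions, we can compute the $\alpha_n^i$-functions. To do that, we need to determine the $\alpha_{a,o}^j$ for which $\int_s \alpha_{a,o}^j(s)\, b(s)\, ds$ is maximized. Since the integral of the product of two Gaussian mixtures (in particular an α-function and a belief point) is a rather common operation in the continuous-state POMDP framework, we will denote it by

$$\langle \alpha, b \rangle = \int_{s} \alpha(s)\, b(s)\, ds.$$

This operator can be computed as

$$\langle \alpha, b \rangle = \int_{s} \Big[ \sum_k w_k\, \phi(s \mid s_k, \Sigma_k) \Big] \Big[ \sum_j w_j\, \phi(s \mid s_j, \Sigma_j) \Big] ds = \sum_{k,j} w_k\, w_j \int_{s} \phi(s \mid s_k, \Sigma_k)\, \phi(s \mid s_j, \Sigma_j)\, ds = \sum_{k,j} w_k\, w_j\, \phi(s_j \mid s_k, \Sigma_k + \Sigma_j).$$

Using this operator and Eqs. 8 and 9, we define

$$\{\alpha_n^i(s)\}_i = \Big\{ r_a(s) + \gamma \sum_o \arg\max_{\{\alpha_{a,o}^j\}_j} \langle \alpha_{a,o}^j, b \rangle \Big\}_{a \in A}.$$

Since all elements involved in the definition are linear combinations of Gaussians, so is the final result.

Using the above lemma, the backup function is

$$\mathrm{backup}(b) = \arg\max_{\{\alpha_n^i\}_i} \langle \alpha_n^i, b \rangle,$$

and the value of $V_n$ at $b$ (Eq. 10) is simply

$$V_n(b) = \langle \mathrm{backup}(b), b \rangle.$$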
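The closed form of the $\langle \alpha, b \rangle$ operator is simple enough to check numerically. The following sketch evaluates it for one-dimensional mixtures; the function names and the `(weights, means, variances)` array layout are assumptions made for this illustration only.

```python
import numpy as np

def gauss(x, mean, var):
    """One-dimensional Gaussian density phi(x | mean, var)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def inner(w_alpha, mu_alpha, var_alpha, w_b, mu_b, var_b):
    """<alpha, b> = sum_{k,j} w_k w_j phi(s_j | s_k, Sigma_k + Sigma_j)
    for two one-dimensional Gaussian mixtures (alpha may be unnormalized)."""
    pair = gauss(mu_b[None, :], mu_alpha[:, None],
                 var_alpha[:, None] + var_b[None, :])
    return float(np.sum(w_alpha[:, None] * w_b[None, :] * pair))

def mixture(x, w, mu, var):
    """Evaluate a one-dimensional Gaussian mixture at the points x."""
    return sum(wi * gauss(x, mi, vi) for wi, mi, vi in zip(w, mu, var))

# Illustrative alpha-function (note the negative weight) and belief.
w_alpha, mu_alpha, var_alpha = np.array([2.0, -1.0]), np.array([0.0, 5.0]), np.array([1.0, 4.0])
w_b, mu_b, var_b = np.array([0.3, 0.7]), np.array([1.0, 4.0]), np.array([1.0, 0.5])

closed_form = inner(w_alpha, mu_alpha, var_alpha, w_b, mu_b, var_b)

# Brute-force check of the same integral with a Riemann sum on a fine grid.
xs = np.linspace(-40.0, 40.0, 400001)
numeric = float(np.sum(mixture(xs, w_alpha, mu_alpha, var_alpha)
                       * mixture(xs, w_b, mu_b, var_b)) * (xs[1] - xs[0]))
```

The cost of the closed form is proportional to the product of the two component counts, which is one reason to keep both the α-functions and the beliefs compact.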
V. CONTINUOUS-STATE PERSEUS

In this section, we use the backup operator to extend to the continuous case the point-based value iteration algorithm PERSEUS [14], [26], which has been shown to be very efficient for discrete-state POMDPs. The continuous-state PERSEUS algorithm is shown in Table I.

Point-based POMDP algorithms focus on identifying the α-functions (α-vectors in the discrete case) for a set of likely belief points. The α-functions for this restricted set of belief points generalize over the whole belief space and, thus, they can be used to approximate the value function for any belief point. The result is an approximation of the value function with less error in regions of the belief space where decisions are more likely to be taken.

The value update scheme of PERSEUS implements a randomized approximate version of the value function recursion $V_n = H\, V_{n-1}$ for a set of randomly sampled belief points $B$. First (Table I, line 2), we let the agent randomly explore the environment and collect a set $B$ of reachable belief points. Next (Table I, lines 3-5), we initialize the value function $V_0$ as a single weighted Gaussian with large covariance and with weight $\min\{R\}/(1 - \gamma)$, with $R$ the set of possible rewards.

Starting with $V_0$, PERSEUS performs a number of approximate value function update stages. The definition of the value update process can be seen on lines 10 to 20 in Table I, where $\tilde{B}$ is a set of non-improved points: points for which $V_{n+1}(b)$ is still lower than $V_n(b)$. At the start of each update stage, $V_{n+1}$ is set to $\emptyset$ and $\tilde{B}$ is initialized to $B$. As long as $\tilde{B}$ is not empty, we sample a point $b$ from $\tilde{B}$ and compute the new α-function associated with this point using the backup operator (see Section IV-C and line 14 in Table I). If this α-function improves the value of $b$ (i.e., if $\langle \alpha, b \rangle \geq V_n(b)$, line 15), we add $\alpha$ to $V_{n+1}$ (line 18). The hope is that $\alpha$ improves the value of many other points, and all these points are removed from $\tilde{B}$ (line 19). Often, a small number of vectors will be sufficient to improve $V_n(b)$ $\forall b \in B$, especially in the first steps of value iteration. As long as $\tilde{B}$ is not empty we continue sampling belief points from it and trying to add their α-functions to $V_{n+1}$.

If the $\alpha$ computed by the backup operator does not at least improve the value of $b$ (i.e., $\langle \alpha, b \rangle < V_n(b)$, see line 15 in Table I), we ignore $\alpha$ and insert a copy of the maximizing function of $b$ from $V_n$ in $V_{n+1}$ (lines 16 and 18). Point $b$ is now considered improved and is removed from $\tilde{B}$, together with any other belief points that have the same function as the maximizing one in $V_n$ (line 19). This procedure ensures that $\tilde{B}$ shrinks at each iteration and that the value update stage terminates.

PERSEUS stops when a given convergence criterion holds. This criterion can be based on the stability of the value function, on the stability of the associated policy, or simply on a maximum number of iterations.

TABLE I
THE PERSEUS ALGORITHM. THE backup FUNCTION IS DESCRIBED IN SECTION IV-C.

    Perseus
    Input: A continuous-state POMDP.
    Output: V_n, an approximation to the optimal value function, V*.
     1: Initialize
     2:   B ← A set of randomly sampled belief points.
     3:   α ← (min{R} / (1 − γ)) φ(·, Σ)
     4:   n ← 0
     5:   V_n ← {α}
     6: do
     7:   ∀b ∈ B,
     8:     Function_n(b) ← arg max_{α ∈ V_n} ⟨α, b⟩
     9:     Value_n(b) ← ⟨Function_n(b), b⟩
    10:   V_{n+1} ← ∅
    11:   B̃ ← B
    12:   do
    13:     b ← Point sampled randomly from B̃.
    14:     α ← backup(b)
    15:     if ⟨α, b⟩ < Value_n(b)
    16:       α ← Function_n(b)
    17:     endif
    18:     V_{n+1} ← V_{n+1} ∪ {α}
    19:     B̃ ← B̃ \ {b' ∈ B̃ | ⟨α, b'⟩ ≥ Value_n(b')}
    20:   until B̃ = ∅
    21:   n ← n + 1
    22: until convergence

One point that deserves special consideration when implementing the PERSEUS algorithm is the possible explosion of the number of components in the Gaussian mixtures defining the α-functions for increasing $n$ and of the number of components in the belief representation when the belief update (see Section IV-B) is repeated for many time steps. The larger the number of components, the slower the basic operations of the algorithm. To keep the number of components bounded, we adapted the procedure described in [27] that transforms a given Gaussian mixture with $k$ components to another Gaussian mixture with at most $m$ components, $m < k$, while retaining the initial component structure.

[Fig. 1. A pictorial representation of the test problem (a), the corresponding observation model (b) and the reward model (c). Observations shown: left end, right end, corridor, door. Actions shown: move left, move right, enter door.]

VI. EXPERIMENTS AND RESULTS

To demonstrate the viability of our method we carried out an experiment in a simulated robotic domain. In this problem (see Fig. 1-a), a robot is moving in a corridor with four doors. The robot can detect when it is in front of a door and when it is at the left or right end of the corridor. In any other situation, the robot just detects that it is in a corridor (see Fig. 1-b). The robot can move 2 units to the left or to the right (with $\Sigma_a$ = .5) and can try to enter a door at any point (even when not in front of a door). The target for the robot is to locate the second door from the right and to enter it. The robot only gets positive reward when it enters the target door (see Fig. 1-c). When the robot tries to move further than the end of the corridor (either at the right or at the left) or when it tries to enter the door at a wrong position it gets negative reward.

The set of beliefs $B$ used in the PERSEUS algorithm contains […] unique belief points. Those belief points are collected using random walks departing from a belief including 4 components that approximate a uniform distribution on the whole corridor. The walks of the robot along the corridor are organized in episodes where the robot executes actions until it tries to enter a door or until it executes 25 (movement) actions.
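Returning to the value update stage listed in Table I (lines 7 to 20), its control flow can be sketched independently of how α-functions and beliefs are represented. The sketch below is a hedged illustration, not the authors' implementation: `perseus_stage`, `backup` and `value` are hypothetical names, with `value(alpha, b)` standing for $\langle \alpha, b \rangle$ and `backup(b, V)` for the backup operator of Section IV-C.

```python
import random

def perseus_stage(B, V, backup, value):
    """One value-update stage of Table I (lines 7-20).

    B      : list of belief points
    V      : list of alpha-functions representing V_n
    backup : callable (b, V) -> alpha            (assumed interface)
    value  : callable (alpha, b) -> float, i.e. <alpha, b>
    Returns the list of alpha-functions representing V_{n+1}.
    """
    # Lines 7-9: cache the maximizing function and the value of every point.
    function_n = [max(V, key=lambda alpha: value(alpha, b)) for b in B]
    value_n = [value(f, b) for f, b in zip(function_n, B)]

    V_next = []                          # line 10
    todo = list(range(len(B)))           # line 11: indices of non-improved points
    while todo:                          # lines 12-20
        i = random.choice(todo)          # line 13
        alpha = backup(B[i], V)          # line 14
        if value(alpha, B[i]) < value_n[i]:
            alpha = function_n[i]        # lines 15-16: keep the old function
        V_next.append(alpha)             # line 18
        # Line 19: drop every point whose value is not lowered by alpha.
        todo = [j for j in todo if value(alpha, B[j]) >= value_n[j]] and \
               [j for j in todo if value(alpha, B[j]) < value_n[j]]
    return V_next
```

Because the sampled point itself always satisfies the removal test (either the new α improves it or its old maximizing function is reinserted), the set of non-improved points shrinks at every pass and the loop terminates. With Gaussian mixtures, `value` would be the closed-form $\langle \alpha, b \rangle$ operator and `backup` the operator of Section IV-C.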

The experimental setup is completed by setting $\gamma$ to 0.95, compressing beliefs so that they never contain more than 4 components (i.e., the number of components of the initial belief) and compressing α-functions so that they never have more components than those used to represent the reward function ([…] components).

[Fig. 2. Top: Evolution of the value for all the beliefs in $B$ and the average accumulated discounted reward for […] episodes. Bottom: Number of vectors in $V_n$ and the number of policy changes. Results are averaged for […] repetitions and the bars represent the standard deviation.]

[Fig. 3. Evolution of the belief when following the discovered policy (snapshots A to I). The arrows under the snapshots represent the actions: moving right, moving left and entering the door. On the x-axis the four door locations are indicated.]

Fig. 2 shows the average results obtained after […] runs of the PERSEUS algorithm on this problem. The first plot (top-left) shows that the value computed as $\sum_{b \in B} V(b)$ converges. The second plot (top-right) shows the expected discounted reward averaged for […] episodes with the policy available at the corresponding time slice. The plot indicates that the robot successfully learns to find out its position and to distinguish between the four doors. The next plot (bottom-left) shows the number of α-functions used to represent the value function. We can see that the number of α-functions increases, but is far below […], the maximum possible number of α-functions (if we would back up each point in $B$). In the final plot (bottom-right) we show the number of changes in the policy from one time step to the next one. The changes in the policy are computed as the number of elements in $B$ with a different action from one time slice to the next. The number of policy changes drops to close to zero, indicating convergence with respect to the particular $B$.
Following the learned policy, the robot moves to one of the ends of the corridor to determine its position and then toward the correct door to enter it. The snapshots A to I in Fig. 3 show the evolution of the belief of the robot and the executed actions in each case from the initial stage of the episode to the point at which the target door is reached.

In Fig. 4 we plot the value of beliefs that have only one component, parametrized by the mean and the covariance of this component. We can see that, as the uncertainty about the position of the robot grows (i.e., the covariance is larger), the value of the corresponding belief decreases. The colors/shadings in the figure correspond to the different actions: light-gray for moving to the right, white for entering the door, and dark-gray for moving to the left.

[Fig. 4. Value function for single-component beliefs as a function of the mean ($\mu$) and the covariance ($\sigma$).]

Observe that the advantage of using a continuous state space is that we obtain a scale-invariant solution. If we have to solve the same problem in a longer corridor, we can just scale the Gaussians used in the problem definition and we will obtain the solution with the same cost as we have now. The only difference is that more actions would be needed in each episode to reach the correct door. When discretizing the environment, the granularity has to be in accordance with the size of the actions taken by the robot (±2 left/right) and, thus, the number of states, and consequently the cost of the planning, grows as the environment grows.

VII. CONCLUSIONS AND FUTURE WORK

In this paper, we have shown how to generalize value iteration to continuous-state POMDPs and, in particular, for the case of Gaussian-based beliefs and models. This allowed us to define an efficient point-based value iteration algorithm that seems to be appropriate for planning problems that are often encountered in robotics.

An approach to continuous-state POMDPs that is closely related to ours is presented in [19]. In that work, a belief is represented by a set of weighted samples, which can be regarded as a degenerate version of our Gaussian mixture representation. Additionally, the value function is approximated by a nearest-neighbor interpolation, whereas in our case the value function achieves generalization through a set of α-functions. Also, in the above work a real-time dynamic programming approach is used for updating the value function, with the Bellman backup operator being approximated by sampling from the belief transition model. In our case, value iteration applies on a pre-collected set of beliefs, while the Bellman backup operator is analytically computed given the particular value function representation. Although we have not directly compared our method to the method presented in [19], we expect our method to be faster (since it plans on a fixed set of belief points) and the value function to generalize better over the belief space (through the use of α-functions).

Ongoing work involves extending our framework to continuous action [26] and observation spaces [28], as well as defining approximate belief representations using Monte Carlo techniques [19].

ACKNOWLEDGMENTS

We would like to thank J.J. Verbeek and W. Zajdel for their contributions to the work reported here, and the four reviewers for their detailed comments. J.M. Porta has been partially supported by a Ramón y Cajal contract from the Spanish Ministry for Science and Technology. M.T.J. Spaan and N. Vlassis are supported by PROGRESS, the embedded systems research program of the Dutch organization for Scientific Research NWO, the Dutch Ministry of Economic Affairs and the Technology Foundation STW, project AES 5414. Authors are listed in alphabetical order.

REFERENCES

[1] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley Series in Probability and Mathematical Statistics. John Wiley and Sons, Inc., 1994.
[2] E. J. Sondik, "The Optimal Control of Partially Observable Markov Processes," Ph.D. dissertation, Stanford University, 1971.
[3] G. E. Monahan, "A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms," Management Science, vol. 28, no. 1, pp. 1-16, 1982.
[4] H. T. Cheng, "Algorithms for Partially Observable Markov Decision Processes," Ph.D. dissertation, University of British Columbia, 1988.
[5] A. R. Cassandra, M. L. Littman, and N. L. Zhang, "Incremental Pruning: A Simple, Fast, Exact Algorithm for Partially Observable Markov Decision Processes," in Proceedings of the Thirteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI 97), 1997.
[6] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, "Planning and Acting in Partially Observable Stochastic Domains," Artificial Intelligence, vol. 101, no. 1-2, 1998.
[7] R. Simmons and S. Koenig, "Probabilistic Robot Navigation in Partially Observable Environments," in Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 1995.
[8] A. R. Cassandra, L. P. Kaelbling, and J. A. Kurien, "Acting under Uncertainty: Discrete Bayesian Models for Mobile-Robot Navigation," in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 1996.
[9] G. Theocharous and S. Mahadevan, "Approximate Planning with Hierarchical Partially Observable Markov Decision Processes for Robot Navigation," in IEEE International Conference on Robotics and Automation, 2002.
[10] J. Pineau, M. Montemerlo, M. Pollack, N. Roy, and S. Thrun, "Towards Robotic Assistants in Nursing Homes: Challenges and Results," Robotics and Autonomous Systems, vol. 42, no. 3-4, 2003.
[11] C. Papadimitriou and J. N. Tsitsiklis, "The Complexity of Markov Decision Processes," Mathematics of Operations Research, vol. 12, no. 3, 1987.
[12] O. Madani, S. Hanks, and A. Condon, "On the Undecidability of Probabilistic Planning and Infinite-Horizon Partially Observable Markov Decision Problems," in Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI), 1999.
[13] J. Pineau, G. Gordon, and S. Thrun, "Point-based Value Iteration: An Anytime Algorithm for POMDPs," in International Joint Conference on Artificial Intelligence (IJCAI), 2003.
[14] N. Vlassis and M. T. J. Spaan, "A Fast Point-Based Algorithm for POMDPs," in Proceedings of Annual Machine Learning Conference of Belgium and the Netherlands, Brussels, Belgium, 2004.
[15] D. P. Bertsekas, Dynamic Programming and Optimal Control, 2nd ed. Belmont, MA: Athena Scientific, 2001.
[16] A. Y. Ng and M. Jordan, "PEGASUS: A Policy Search Method for Large MDPs and POMDPs," in Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (UAI), 2000.
[17] D. Aberdeen and J. Baxter, "Scalable Internal-State Policy-Gradient Methods for POMDPs," in Proceedings of the International Conference on Machine Learning (ICML), 2002, pp. 3-10.
[18] N. Roy, G. Gordon, and S. Thrun, "Finding Approximate POMDP Solutions Through Belief Compression," Journal of Artificial Intelligence Research, vol. 23, pp. 1-40, 2005.
[19] S. Thrun, "Monte Carlo POMDPs," in Advances in Neural Information Processing Systems (NIPS), S. Solla, T. Leen, and K.-R. Müller, Eds. MIT Press, 2000.
[20] R. E. Bellman, Dynamic Programming. Princeton University Press, 1957.
[21] M. Hauskrecht, "Value Function Approximations for Partially Observable Markov Decision Processes," Journal of Artificial Intelligence Research, vol. 13, 2000.
[22] J. M. Porta, M. T. J. Spaan, and N. Vlassis, "Value Iteration for Continuous-State POMDPs," IAS Technical Report, University of Amsterdam, Tech. Rep. IAS-UVA-04-04, 2004.
[23] J. J. Leonard and H. F. Durrant-Whyte, "Mobile Robot Localization by Tracking Geometric Beacons," IEEE Transactions on Robotics and Automation, vol. 7, no. 3, 1991.
[24] P. Jensfelt and S. Kristensen, "Active Global Localisation for Mobile Robots Using Multiple Hypothesis Tracking," IEEE Transactions on Robotics and Automation, vol. 17, no. 5, 2001.
[25] S. J. Julier and J. K. Uhlmann, "A New Extension of the Kalman Filter to Nonlinear Systems," in Proceedings of AeroSense: The 11th International Symposium on Aerospace/Defence Sensing, Simulation and Controls, 1997.
[26] M. T. J. Spaan and N. Vlassis, "Perseus: Randomized Point-based Value Iteration for POMDPs," Journal of Artificial Intelligence Research, 2005.
[27] J. Goldberger and S. Roweis, "Hierarchical Clustering of a Mixture Model," in Advances in Neural Information Processing Systems (NIPS), 2005.
[28] J. Hoey and P. Poupart, "Solving POMDPs with Continuous or Large Discrete Observation Spaces," in Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2005.

Robot Planning in Partially Observable Continuous Domains

Robot Planning in Partially Observable Continuous Domains Robot Plnning in Prtilly Obervble Continuou Domin Joep M. Port Intitut de Robòtic i Informàtic Indutril (UPC-CSIC) Lloren i Artig 4-6, 828, Brcelon Spin Emil: port@iri.upc.edu Mtthij T. J. Spn Informtic

More information

Artificial Intelligence Markov Decision Problems

Artificial Intelligence Markov Decision Problems rtificil Intelligence Mrkov eciion Problem ilon - briefly mentioned in hpter Ruell nd orvig - hpter 7 Mrkov eciion Problem; pge of Mrkov eciion Problem; pge of exmple: probbilitic blockworld ction outcome

More information

TP 10:Importance Sampling-The Metropolis Algorithm-The Ising Model-The Jackknife Method

TP 10:Importance Sampling-The Metropolis Algorithm-The Ising Model-The Jackknife Method TP 0:Importnce Smpling-The Metropoli Algorithm-The Iing Model-The Jckknife Method June, 200 The Cnonicl Enemble We conider phyicl ytem which re in therml contct with n environment. The environment i uully

More information

Reinforcement learning

Reinforcement learning Reinforcement lerning Regulr MDP Given: Trnition model P Rewrd function R Find: Policy π Reinforcement lerning Trnition model nd rewrd function initilly unknown Still need to find the right policy Lern

More information

Reinforcement Learning and Policy Reuse

Reinforcement Learning and Policy Reuse Reinforcement Lerning nd Policy Reue Mnuel M. Veloo PEL Fll 206 Reding: Reinforcement Lerning: An Introduction R. Sutton nd A. Brto Probbilitic policy reue in reinforcement lerning gent Fernndo Fernndez

More information

Non-Myopic Multi-Aspect Sensing with Partially Observable Markov Decision Processes

Non-Myopic Multi-Aspect Sensing with Partially Observable Markov Decision Processes Non-Myopic Multi-Apect Sening with Prtilly Oervle Mrkov Deciion Procee Shiho Ji 2 Ronld Prr nd Lwrence Crin Deprtment of Electricl & Computer Engineering 2 Deprtment of Computer Engineering Duke Univerity

More information

Value Iteration for Continuous-State POMDPs

Value Iteration for Continuous-State POMDPs Univeriteit van Amterdam IAS technical report IAS-UVA-04-04 Value Iteration for Continuou-State POMDP Joep M. Porta, Matthij T.J. Spaan, and Niko Vlai Intitut de Robòtica i Informàtica Indutrial (UPC-CSIC)

More information

Markov Decision Processes

Markov Decision Processes Mrkov Deciion Procee A Brief Introduction nd Overview Jck L. King Ph.D. Geno UK Limited Preenttion Outline Introduction to MDP Motivtion for Study Definition Key Point of Interet Solution Technique Prtilly

More information

Reinforcement learning II

Reinforcement learning II CS 1675 Introduction to Mchine Lerning Lecture 26 Reinforcement lerning II Milos Huskrecht milos@cs.pitt.edu 5329 Sennott Squre Reinforcement lerning Bsics: Input x Lerner Output Reinforcement r Critic

More information

Administrivia CSE 190: Reinforcement Learning: An Introduction

Administrivia CSE 190: Reinforcement Learning: An Introduction Administrivi CSE 190: Reinforcement Lerning: An Introduction Any emil sent to me bout the course should hve CSE 190 in the subject line! Chpter 4: Dynmic Progrmming Acknowledgment: A good number of these

More information

{ } = E! & $ " k r t +k +1

{ } = E! & $  k r t +k +1 Chpter 4: Dynmic Progrmming Objectives of this chpter: Overview of collection of clssicl solution methods for MDPs known s dynmic progrmming (DP) Show how DP cn be used to compute vlue functions, nd hence,

More information

Chapter 4: Dynamic Programming

Chapter 4: Dynamic Programming Chpter 4: Dynmic Progrmming Objectives of this chpter: Overview of collection of clssicl solution methods for MDPs known s dynmic progrmming (DP) Show how DP cn be used to compute vlue functions, nd hence,

More information

Policy Gradient Methods for Reinforcement Learning with Function Approximation

Policy Gradient Methods for Reinforcement Learning with Function Approximation Policy Grdient Method for Reinforcement Lerning with Function Approximtion Richrd S. Sutton, Dvid McAlleter, Stinder Singh, Yihy Mnour AT&T Lb Reerch, 180 Prk Avenue, Florhm Prk, NJ 07932 Abtrct Function

More information

Module 6 Value Iteration. CS 886 Sequential Decision Making and Reinforcement Learning University of Waterloo

Module 6 Value Iteration. CS 886 Sequential Decision Making and Reinforcement Learning University of Waterloo Module 6 Vlue Itertion CS 886 Sequentil Decision Mking nd Reinforcement Lerning University of Wterloo Mrkov Decision Process Definition Set of sttes: S Set of ctions (i.e., decisions): A Trnsition model:

More information

Bellman Optimality Equation for V*

Bellman Optimality Equation for V* Bellmn Optimlity Eqution for V* The vlue of stte under n optiml policy must equl the expected return for the best ction from tht stte: V (s) mx Q (s,) A(s) mx A(s) mx A(s) Er t 1 V (s t 1 ) s t s, t s

More information

Reinforcement Learning for Robotic Locomotions

Reinforcement Learning for Robotic Locomotions Reinforcement Lerning for Robotic Locomotion Bo Liu Stnford Univerity 121 Cmpu Drive Stnford, CA 94305, USA bliuxix@tnford.edu Hunzhong Xu Stnford Univerity 121 Cmpu Drive Stnford, CA 94305, USA xuhunvc@tnford.edu

More information

Oracular Partially Observable Markov Decision Processes: A Very Special Case

Oracular Partially Observable Markov Decision Processes: A Very Special Case Orculr Prtilly Obervble Mrkov Deciion Procee: A Very Specil Ce Nichol Armtrong-Crew nd Mnuel Veloo Robotic Intitute, Crnegie Mellon Univerity {nrmtro,veloo}@c.cmu.edu Abtrct We introduce the Orculr Prtilly

More information

Reinforcement Learning

Reinforcement Learning Reinforcement Lerning Tom Mitchell, Mchine Lerning, chpter 13 Outline Introduction Comprison with inductive lerning Mrkov Decision Processes: the model Optiml policy: The tsk Q Lerning: Q function Algorithm

More information

CHOOSING THE NUMBER OF MODELS OF THE REFERENCE MODEL USING MULTIPLE MODELS ADAPTIVE CONTROL SYSTEM

CHOOSING THE NUMBER OF MODELS OF THE REFERENCE MODEL USING MULTIPLE MODELS ADAPTIVE CONTROL SYSTEM Interntionl Crpthin Control Conference ICCC 00 ALENOVICE, CZEC REPUBLIC y 7-30, 00 COOSING TE NUBER OF ODELS OF TE REFERENCE ODEL USING ULTIPLE ODELS ADAPTIVE CONTROL SYSTE rin BICĂ, Victor-Vleriu PATRICIU

More information

19 Optimal behavior: Game theory

19 Optimal behavior: Game theory Intro. to Artificil Intelligence: Dle Schuurmns, Relu Ptrscu 1 19 Optiml behvior: Gme theory Adversril stte dynmics hve to ccount for worst cse Compute policy π : S A tht mximizes minimum rewrd Let S (,

More information

APPENDIX 2 LAPLACE TRANSFORMS

APPENDIX 2 LAPLACE TRANSFORMS APPENDIX LAPLACE TRANSFORMS Thi ppendix preent hort introduction to Lplce trnform, the bic tool ued in nlyzing continuou ytem in the frequency domin. The Lplce trnform convert liner ordinry differentil

More information

2D1431 Machine Learning Lab 3: Reinforcement Learning

2D1431 Machine Learning Lab 3: Reinforcement Learning 2D1431 Mchine Lerning Lb 3: Reinforcement Lerning Frnk Hoffmnn modified by Örjn Ekeberg December 7, 2004 1 Introduction In this lb you will lern bout dynmic progrmming nd reinforcement lerning. It is ssumed

More information

CONTROL SYSTEMS LABORATORY ECE311 LAB 3: Control Design Using the Root Locus

CONTROL SYSTEMS LABORATORY ECE311 LAB 3: Control Design Using the Root Locus CONTROL SYSTEMS LABORATORY ECE311 LAB 3: Control Deign Uing the Root Locu 1 Purpoe The purpoe of thi lbortory i to deign cruie control ytem for cr uing the root locu. 2 Introduction Diturbnce D( ) = d

More information

Efficient Planning in R-max

Efficient Planning in R-max Efficient Plnning in R-mx Mrek Grześ nd Jee Hoey Dvid R. Cheriton School of Computer Science, Univerity of Wterloo 200 Univerity Avenue Wet, Wterloo, ON, N2L 3G1, Cnd {mgrze, jhoey}@c.uwterloo.c ABSTRACT

More information

STABILITY and Routh-Hurwitz Stability Criterion

STABILITY and Routh-Hurwitz Stability Criterion Krdeniz Technicl Univerity Deprtment of Electricl nd Electronic Engineering 6080 Trbzon, Turkey Chpter 8- nd Routh-Hurwitz Stbility Criterion Bu der notlrı dece bu deri ln öğrencilerin kullnımın çık olup,

More information

MArkov decision processes (MDPs) have been widely

MArkov decision processes (MDPs) have been widely Spre Mrkov Deciion Procee with Cul Spre Tlli Entropy Regulriztion for Reinforcement Lerning yungje Lee, Sungjoon Choi, nd Songhwi Oh rxiv:709.0693v3 [c.lg] 3 Oct 07 Abtrct In thi pper, re Mrkov deciion

More information

Bias in Natural Actor-Critic Algorithms

Bias in Natural Actor-Critic Algorithms Bi in Nturl Actor-Critic Algorithm Philip S. Thom pthom@c.um.edu Deprtment of Computer Science, Univerity of Mchuett, Amhert, MA 01002 USA Technicl Report UM-CS-2012-018 Abtrct We how tht two populr dicounted

More information

ARCHIVUM MATHEMATICUM (BRNO) Tomus 47 (2011), Kristína Rostás

ARCHIVUM MATHEMATICUM (BRNO) Tomus 47 (2011), Kristína Rostás ARCHIVUM MAHEMAICUM (BRNO) omu 47 (20), 23 33 MINIMAL AND MAXIMAL SOLUIONS OF FOURH ORDER IERAED DIFFERENIAL EQUAIONS WIH SINGULAR NONLINEARIY Kritín Rotá Abtrct. In thi pper we re concerned with ufficient

More information

20.2. The Transform and its Inverse. Introduction. Prerequisites. Learning Outcomes

20.2. The Transform and its Inverse. Introduction. Prerequisites. Learning Outcomes The Trnform nd it Invere 2.2 Introduction In thi Section we formlly introduce the Lplce trnform. The trnform i only pplied to cul function which were introduced in Section 2.1. We find the Lplce trnform

More information

4-4 E-field Calculations using Coulomb s Law

4-4 E-field Calculations using Coulomb s Law 1/11/5 ection_4_4_e-field_clcultion_uing_coulomb_lw_empty.doc 1/1 4-4 E-field Clcultion uing Coulomb Lw Reding Aignment: pp. 9-98 Specificlly: 1. HO: The Uniform, Infinite Line Chrge. HO: The Uniform Dik

More information

PHYS 601 HW 5 Solution. We wish to find a Fourier expansion of e sin ψ so that the solution can be written in the form

PHYS 601 HW 5 Solution. We wish to find a Fourier expansion of e sin ψ so that the solution can be written in the form 5 Solving Kepler eqution Conider the Kepler eqution ωt = ψ e in ψ We wih to find Fourier expnion of e in ψ o tht the olution cn be written in the form ψωt = ωt + A n innωt, n= where A n re the Fourier

More information

Anytime algorithms for multiagent decision making using coordination graphs

Anytime algorithms for multiagent decision making using coordination graphs Anytime lgorithms for multigent decision mking using coordintion grphs N. Vlssis R. Elhorst J. R. Kok Informtics Institute, University of Amsterdm, The Netherlnds {vlssis,reinhrst,jellekok}@science.uv.nl

More information

Analysis of Variance and Design of Experiments-II

Analysis of Variance and Design of Experiments-II Anlyi of Vrince nd Deign of Experiment-II MODULE VI LECTURE - 7 SPLIT-PLOT AND STRIP-PLOT DESIGNS Dr. Shlbh Deprtment of Mthemtic & Sttitic Indin Intitute of Technology Knpur Anlyi of covrince ith one

More information

1 Online Learning and Regret Minimization

1 Online Learning and Regret Minimization 2.997 Decision-Mking in Lrge-Scle Systems My 10 MIT, Spring 2004 Hndout #29 Lecture Note 24 1 Online Lerning nd Regret Minimiztion In this lecture, we consider the problem of sequentil decision mking in

More information

Compact, Convex Upper Bound Iteration for Approximate POMDP Planning

Compact, Convex Upper Bound Iteration for Approximate POMDP Planning Compct, Convex Upper Bound Itertion for Approximte POMDP Plnning To Wng University of Alert trysi@cs.ulert.c Pscl Pouprt University of Wterloo ppouprt@cs.uwterloo.c Michel Bowling nd Dle Schuurmns University

More information

Decision Networks. CS 188: Artificial Intelligence Fall Example: Decision Networks. Decision Networks. Decisions as Outcome Trees

Decision Networks. CS 188: Artificial Intelligence Fall Example: Decision Networks. Decision Networks. Decisions as Outcome Trees CS 188: Artificil Intelligence Fll 2011 Decision Networks ME: choose the ction which mximizes the expected utility given the evidence mbrell Lecture 17: Decision Digrms 10/27/2011 Cn directly opertionlize

More information

PHYSICS 211 MIDTERM I 22 October 2003

PHYSICS 211 MIDTERM I 22 October 2003 PHYSICS MIDTERM I October 3 Exm i cloed book, cloed note. Ue onl our formul heet. Write ll work nd nwer in exm booklet. The bck of pge will not be grded unle ou o requet on the front of the pge. Show ll

More information

Accelerator Physics. G. A. Krafft Jefferson Lab Old Dominion University Lecture 5

Accelerator Physics. G. A. Krafft Jefferson Lab Old Dominion University Lecture 5 Accelertor Phyic G. A. Krfft Jefferon L Old Dominion Univerity Lecture 5 ODU Accelertor Phyic Spring 15 Inhomogeneou Hill Eqution Fundmentl trnvere eqution of motion in prticle ccelertor for mll devition

More information

The ifs Package. December 28, 2005

The ifs Package. December 28, 2005 The if Pckge December 28, 2005 Verion 0.1-1 Title Iterted Function Sytem Author S. M. Icu Mintiner S. M. Icu Iterted Function Sytem Licene GPL Verion 2 or lter. R topic documented:

More information

CS 188 Introduction to Artificial Intelligence Fall 2018 Note 7

CS 188 Introduction to Artificial Intelligence Fall 2018 Note 7 CS 188 Introduction to Artificil Intelligence Fll 2018 Note 7 These lecture notes re hevily bsed on notes originlly written by Nikhil Shrm. Decision Networks In the third note, we lerned bout gme trees

More information

2. The Laplace Transform

2. The Laplace Transform . The Lplce Trnform. Review of Lplce Trnform Theory Pierre Simon Mrqui de Lplce (749-87 French tronomer, mthemticin nd politicin, Miniter of Interior for 6 wee under Npoleon, Preident of Acdemie Frncie

More information

On the Adders with Minimum Tests

On the Adders with Minimum Tests Proceeding of the 5th Ain Tet Sympoium (ATS '97) On the Adder with Minimum Tet Seiji Kjihr nd Tutomu So Dept. of Computer Science nd Electronic, Kyuhu Intitute of Technology Atrct Thi pper conider two

More information

Package ifs. R topics documented: August 21, Version Title Iterated Function Systems. Author S. M. Iacus.

Package ifs. R topics documented: August 21, Version Title Iterated Function Systems. Author S. M. Iacus. Pckge if Augut 21, 2015 Verion 0.1.5 Title Iterted Function Sytem Author S. M. Icu Dte 2015-08-21 Mintiner S. M. Icu Iterted Function Sytem Etimtor. Licene GPL (>= 2) NeedCompiltion

More information

MOMDPs: a Solution for Modelling Adaptive Management Problems

MOMDPs: a Solution for Modelling Adaptive Management Problems MOMDPs: Solution for Modelling Adptive Mngement Problems Idine Chdès nd Josie Crwrdine nd Tr G. Mrtin CSIRO Ecosystem Sciences {idine.chdes, josie.crwrdine, tr.mrtin}@csiro.u Smuel Nicol University of

More information

Excerpted Section. Consider the stochastic diffusion without Poisson jumps governed by the stochastic differential equation (SDE)

Excerpted Section. Consider the stochastic diffusion without Poisson jumps governed by the stochastic differential equation (SDE) ? > ) 1 Technique in Computtionl Stochtic Dynmic Progrmming Floyd B. Hnon niverity of Illinoi t Chicgo Chicgo, Illinoi 60607-705 Excerpted Section A. MARKOV CHAI APPROXIMATIO Another pproch to finite difference

More information

Chapter 2 Organizing and Summarizing Data. Chapter 3 Numerically Summarizing Data. Chapter 4 Describing the Relation between Two Variables

Chapter 2 Organizing and Summarizing Data. Chapter 3 Numerically Summarizing Data. Chapter 4 Describing the Relation between Two Variables Copyright 013 Peron Eduction, Inc. Tble nd Formul for Sullivn, Sttitic: Informed Deciion Uing Dt 013 Peron Eduction, Inc Chpter Orgnizing nd Summrizing Dt Reltive frequency = frequency um of ll frequencie

More information

NUMERICAL INTEGRATION. The inverse process to differentiation in calculus is integration. Mathematically, integration is represented by.

NUMERICAL INTEGRATION. The inverse process to differentiation in calculus is integration. Mathematically, integration is represented by. NUMERICAL INTEGRATION 1 Introduction The inverse process to differentition in clculus is integrtion. Mthemticlly, integrtion is represented by f(x) dx which stnds for the integrl of the function f(x) with

More information

Numerical integration

Numerical integration 2 Numericl integrtion This is pge i Printer: Opque this 2. Introduction Numericl integrtion is problem tht is prt of mny problems in the economics nd econometrics literture. The orgniztion of this chpter

More information

. The set of these fractions is then obviously Q, and we can define addition and multiplication on it in the expected way by

. The set of these fractions is then obviously Q, and we can define addition and multiplication on it in the expected way by 50 Andre Gthmnn 6. LOCALIZATION Locliztion i very powerful technique in commuttive lgebr tht often llow to reduce quetion on ring nd module to union of mller locl problem. It cn eily be motivted both from

More information

CMDA 4604: Intermediate Topics in Mathematical Modeling Lecture 19: Interpolation and Quadrature

CMDA 4604: Intermediate Topics in Mathematical Modeling Lecture 19: Interpolation and Quadrature CMDA 4604: Intermedite Topics in Mthemticl Modeling Lecture 19: Interpoltion nd Qudrture In this lecture we mke brief diversion into the res of interpoltion nd qudrture. Given function f C[, b], we sy

More information

positive definite (symmetric with positive eigenvalues) positive semi definite (symmetric with nonnegative eigenvalues)

positive definite (symmetric with positive eigenvalues) positive semi definite (symmetric with nonnegative eigenvalues) Chter Liner Qudrtic Regultor Problem inimize the cot function J given by J x' Qx u' Ru dt R > Q oitive definite ymmetric with oitive eigenvlue oitive emi definite ymmetric with nonnegtive eigenvlue ubject

More information

Review of Calculus, cont d

Review of Calculus, cont d Jim Lmbers MAT 460 Fll Semester 2009-10 Lecture 3 Notes These notes correspond to Section 1.1 in the text. Review of Clculus, cont d Riemnn Sums nd the Definite Integrl There re mny cses in which some

More information

Monte Carlo Value Iteration with Macro-Actions

Monte Carlo Value Iteration with Macro-Actions In Advnces in Neurl Informtion Processing Systems (NIPS), 2011 Monte Crlo Vlue Itertion with Mcro-Actions Zhnwei Lim Dvid Hsu Wee Sun Lee Deprtment of Computer Science, Ntionl University of Singpore Singpore,

More information

Jack Simons, Henry Eyring Scientist and Professor Chemistry Department University of Utah

Jack Simons, Henry Eyring Scientist and Professor Chemistry Department University of Utah 1. Born-Oppenheimer pprox.- energy surfces 2. Men-field (Hrtree-Fock) theory- orbitls 3. Pros nd cons of HF- RHF, UHF 4. Beyond HF- why? 5. First, one usully does HF-how? 6. Bsis sets nd nottions 7. MPn,

More information

A Fast and Reliable Policy Improvement Algorithm

A Fast and Reliable Policy Improvement Algorithm A Fst nd Relible Policy Improvement Algorithm Ysin Abbsi-Ydkori Peter L. Brtlett Stephen J. Wright Queenslnd University of Technology UC Berkeley nd QUT University of Wisconsin-Mdison Abstrct We introduce

More information

p-adic Egyptian Fractions

p-adic Egyptian Fractions p-adic Egyptin Frctions Contents 1 Introduction 1 2 Trditionl Egyptin Frctions nd Greedy Algorithm 2 3 Set-up 3 4 p-greedy Algorithm 5 5 p-egyptin Trditionl 10 6 Conclusion 1 Introduction An Egyptin frction

More information

CS 188: Artificial Intelligence Spring 2007

CS 188: Artificial Intelligence Spring 2007 CS 188: Artificil Intelligence Spring 2007 Lecture 3: Queue-Bsed Serch 1/23/2007 Srini Nrynn UC Berkeley Mny slides over the course dpted from Dn Klein, Sturt Russell or Andrew Moore Announcements Assignment

More information

CS 188: Artificial Intelligence Fall Announcements

CS 188: Artificial Intelligence Fall Announcements CS 188: Artificil Intelligence Fll 2009 Lecture 20: Prticle Filtering 11/5/2009 Dn Klein UC Berkeley Announcements Written 3 out: due 10/12 Project 4 out: due 10/19 Written 4 proly xed, Project 5 moving

More information

VSS CONTROL OF STRIP STEERING FOR HOT ROLLING MILLS. M.Okada, K.Murayama, Y.Anabuki, Y.Hayashi

VSS CONTROL OF STRIP STEERING FOR HOT ROLLING MILLS. M.Okada, K.Murayama, Y.Anabuki, Y.Hayashi V ONTROL OF TRIP TEERING FOR OT ROLLING MILL M.Okd.Murym Y.Anbuki Y.yhi Wet Jpn Work (urhiki Ditrict) JFE teel orportion wkidori -chome Mizuhim urhiki 7-85 Jpn Abtrct: trip teering i one of the mot eriou

More information

Uncertain Dynamic Systems on Time Scales

Uncertain Dynamic Systems on Time Scales Journl of Uncertin Sytem Vol.9, No.1, pp.17-30, 2015 Online t: www.ju.org.uk Uncertin Dynmic Sytem on Time Scle Umber Abb Hhmi, Vile Lupulecu, Ghu ur Rhmn Abdu Slm School of Mthemticl Science, GCU Lhore

More information

Name Solutions to Test 3 November 8, 2017

Name Solutions to Test 3 November 8, 2017 Nme Solutions to Test 3 November 8, 07 This test consists of three prts. Plese note tht in prts II nd III, you cn skip one question of those offered. Some possibly useful formuls cn be found below. Brrier

More information

Generation of Lyapunov Functions by Neural Networks

Generation of Lyapunov Functions by Neural Networks WCE 28, July 2-4, 28, London, U.K. Genertion of Lypunov Functions by Neurl Networks Nvid Noroozi, Pknoosh Krimghee, Ftemeh Sfei, nd Hmed Jvdi Abstrct Lypunov function is generlly obtined bsed on tril nd

More information

Chapter 0. What is the Lebesgue integral about?

Chapter 0. What is the Lebesgue integral about? Chpter 0. Wht is the Lebesgue integrl bout? The pln is to hve tutoril sheet ech week, most often on Fridy, (to be done during the clss) where you will try to get used to the ides introduced in the previous

More information

Math 2142 Homework 2 Solutions. Problem 1. Prove the following formulas for Laplace transforms for s > 0. a s 2 + a 2 L{cos at} = e st.

Math 2142 Homework 2 Solutions. Problem 1. Prove the following formulas for Laplace transforms for s > 0. a s 2 + a 2 L{cos at} = e st. Mth 2142 Homework 2 Solution Problem 1. Prove the following formul for Lplce trnform for >. L{1} = 1 L{t} = 1 2 L{in t} = 2 + 2 L{co t} = 2 + 2 Solution. For the firt Lplce trnform, we need to clculte:

More information

Research Article Generalized Hyers-Ulam Stability of the Second-Order Linear Differential Equations

Research Article Generalized Hyers-Ulam Stability of the Second-Order Linear Differential Equations Hindwi Publihing Corportion Journl of Applied Mthemtic Volume 011, Article ID 813137, 10 pge doi:10.1155/011/813137 Reerch Article Generlized Hyer-Ulm Stbility of the Second-Order Liner Differentil Eqution

More information

Line and Surface Integrals: An Intuitive Understanding

Line and Surface Integrals: An Intuitive Understanding Line nd Surfce Integrls: An Intuitive Understnding Joseph Breen Introduction Multivrible clculus is ll bout bstrcting the ides of differentition nd integrtion from the fmilir single vrible cse to tht of

More information

CS667 Lecture 6: Monte Carlo Integration 02/10/05

CS667 Lecture 6: Monte Carlo Integration 02/10/05 CS667 Lecture 6: Monte Crlo Integrtion 02/10/05 Venkt Krishnrj Lecturer: Steve Mrschner 1 Ide The min ide of Monte Crlo Integrtion is tht we cn estimte the vlue of n integrl by looking t lrge number of

More information

PRACTICE EXAM 2 SOLUTIONS

PRACTICE EXAM 2 SOLUTIONS MASSACHUSETTS INSTITUTE OF TECHNOLOGY Deprtment of Phyic Phyic 8.01x Fll Term 00 PRACTICE EXAM SOLUTIONS Proble: Thi i reltively trihtforwrd Newton Second Lw problem. We et up coordinte ytem which i poitive

More information

Math 1B, lecture 4: Error bounds for numerical methods

Math 1B, lecture 4: Error bounds for numerical methods Mth B, lecture 4: Error bounds for numericl methods Nthn Pflueger 4 September 0 Introduction The five numericl methods descried in the previous lecture ll operte by the sme principle: they pproximte the

More information

1 The Riemann Integral

1 The Riemann Integral The Riemnn Integrl. An exmple leding to the notion of integrl (res) We know how to find (i.e. define) the re of rectngle (bse height), tringle ( (sum of res of tringles). But how do we find/define n re

More information

8 Laplace s Method and Local Limit Theorems

8 Laplace s Method and Local Limit Theorems 8 Lplce s Method nd Locl Limit Theorems 8. Fourier Anlysis in Higher DImensions Most of the theorems of Fourier nlysis tht we hve proved hve nturl generliztions to higher dimensions, nd these cn be proved

More information

Bernoulli Numbers Jeff Morton

Bernoulli Numbers Jeff Morton Bernoulli Numbers Jeff Morton. We re interested in the opertor e t k d k t k, which is to sy k tk. Applying this to some function f E to get e t f d k k tk d k f f + d k k tk dk f, we note tht since f

More information

Numerical Integration

Numerical Integration Chpter 1 Numericl Integrtion Numericl differentition methods compute pproximtions to the derivtive of function from known vlues of the function. Numericl integrtion uses the sme informtion to compute numericl

More information

Point-Based POMDP Algorithms: Improved Analysis and Implementation

Point-Based POMDP Algorithms: Improved Analysis and Implementation Point-Bsed POMDP Algorithms: Improved Anlysis nd Implementtion Trey Smith nd Reid Simmons Rootics Institute, Crnegie Mellon University Pittsurgh, PA 15213 Astrct Existing complexity ounds for point-sed

More information

Acceptance Sampling by Attributes

Acceptance Sampling by Attributes Introduction Acceptnce Smpling by Attributes Acceptnce smpling is concerned with inspection nd decision mking regrding products. Three spects of smpling re importnt: o Involves rndom smpling of n entire

More information

Planning to Be Surprised: Optimal Bayesian Exploration in Dynamic Environments

Planning to Be Surprised: Optimal Bayesian Exploration in Dynamic Environments Plnning to Be Surprised: Optiml Byesin Explortion in Dynmic Environments Yi Sun, Fustino Gomez, nd Jürgen Schmidhuber IDSIA, Glleri 2, Mnno, CH-6928, Switzerlnd {yi,tino,juergen}@idsi.ch Abstrct. To mximize

More information

EE Control Systems LECTURE 8

EE Control Systems LECTURE 8 Coyright F.L. Lewi 999 All right reerved Udted: Sundy, Ferury, 999 EE 44 - Control Sytem LECTURE 8 REALIZATION AND CANONICAL FORMS A liner time-invrint (LTI) ytem cn e rereented in mny wy, including: differentil

More information

Linear predictive coding

Linear predictive coding Liner predictive coding Thi ethod cobine liner proceing with clr quntiztion. The in ide of the ethod i to predict the vlue of the current ple by liner cobintion of previou lredy recontructed ple nd then

More information

Numerical Integration

Numerical Integration Chpter 5 Numericl Integrtion Numericl integrtion is the study of how the numericl vlue of n integrl cn be found. Methods of function pproximtion discussed in Chpter??, i.e., function pproximtion vi the

More information

MIXED MODELS (Sections ) I) In the unrestricted model, interactions are treated as in the random effects model:

MIXED MODELS (Sections ) I) In the unrestricted model, interactions are treated as in the random effects model: 1 2 MIXED MODELS (Sections 17.7 17.8) Exmple: Suppose tht in the fiber breking strength exmple, the four mchines used were the only ones of interest, but the interest ws over wide rnge of opertors, nd

More information

Frobenius numbers of generalized Fibonacci semigroups

Frobenius numbers of generalized Fibonacci semigroups Frobenius numbers of generlized Fiboncci semigroups Gretchen L. Mtthews 1 Deprtment of Mthemticl Sciences, Clemson University, Clemson, SC 29634-0975, USA gmtthe@clemson.edu Received:, Accepted:, Published:

More information

Analysis of Single Domain Particles. Kevin Hayden UCSB Winter 03

Analysis of Single Domain Particles. Kevin Hayden UCSB Winter 03 Anlyi of Single Domin Prticle Kevin Hyden UCSB Winter 3 Prefce Thi pper cme bout becue of my curioity with mgnet. I think every child begin to wonder bout the mgic within two piece of metl tht tick to

More information

Laplace s equation in Cylindrical Coordinates

Laplace s equation in Cylindrical Coordinates Prof. Dr. I. Ner Phy 571, T-131 -Oct-13 Lplce eqution in Cylindricl Coordinte 1- Circulr cylindricl coordinte The circulr cylindricl coordinte (, φ, z ) re relted to the rectngulr Crtein coordinte ( x,

More information

Transfer Functions. Chapter 5. Transfer Functions. Derivation of a Transfer Function. Transfer Functions

Transfer Functions. Chapter 5. Transfer Functions. Derivation of a Transfer Function. Transfer Functions 5/4/6 PM : Trnfer Function Chpter 5 Trnfer Function Defined G() = Y()/U() preent normlized model of proce, i.e., cn be ued with n input. Y() nd U() re both written in devition vrible form. The form of

More information

M. A. Pathan, O. A. Daman LAPLACE TRANSFORMS OF THE LOGARITHMIC FUNCTIONS AND THEIR APPLICATIONS

M. A. Pathan, O. A. Daman LAPLACE TRANSFORMS OF THE LOGARITHMIC FUNCTIONS AND THEIR APPLICATIONS DEMONSTRATIO MATHEMATICA Vol. XLVI No 3 3 M. A. Pthn, O. A. Dmn LAPLACE TRANSFORMS OF THE LOGARITHMIC FUNCTIONS AND THEIR APPLICATIONS Abtrct. Thi pper del with theorem nd formul uing the technique of

More information

Low-order simultaneous stabilization of linear bicycle models at different forward speeds

Low-order simultaneous stabilization of linear bicycle models at different forward speeds 203 Americn Control Conference (ACC) Whington, DC, USA, June 7-9, 203 Low-order imultneou tbiliztion of liner bicycle model t different forwrd peed A. N. Gündeş nd A. Nnngud 2 Abtrct Liner model of bicycle

More information

History-Based Controller Design and Optimization for Partially Observable MDPs

History-Based Controller Design and Optimization for Partially Observable MDPs History-Bsed Controller Design nd Optimiztion for Prtilly Observble MDPs Aksht Kumr School of Informtion Systems Singpore Mngement University kshtkumr@smu.edu.sg Shlomo Zilberstein School of Computer Science

More information

COUNTING DESCENTS, RISES, AND LEVELS, WITH PRESCRIBED FIRST ELEMENT, IN WORDS

COUNTING DESCENTS, RISES, AND LEVELS, WITH PRESCRIBED FIRST ELEMENT, IN WORDS COUNTING DESCENTS, RISES, AND LEVELS, WITH PRESCRIBED FIRST ELEMENT, IN WORDS Sergey Kitev The Mthemtic Intitute, Reykvik Univerity, IS-03 Reykvik, Icelnd ergey@rui Toufik Mnour Deprtment of Mthemtic,

More information

New Expansion and Infinite Series

New Expansion and Infinite Series Interntionl Mthemticl Forum, Vol. 9, 204, no. 22, 06-073 HIKARI Ltd, www.m-hikri.com http://dx.doi.org/0.2988/imf.204.4502 New Expnsion nd Infinite Series Diyun Zhng College of Computer Nnjing University

More information

How to simulate Turing machines by invertible one-dimensional cellular automata

How to simulate Turing machines by invertible one-dimensional cellular automata How to simulte Turing mchines by invertible one-dimensionl cellulr utomt Jen-Christophe Dubcq Déprtement de Mthémtiques et d Informtique, École Normle Supérieure de Lyon, 46, llée d Itlie, 69364 Lyon Cedex

More information
