Motion Planning under Uncertainty using Iterative Local Optimization in Belief Space


Jur van den Berg¹, Sachin Patil², Ron Alterovitz²

¹ School of Computing, University of Utah, berg@cs.utah.edu.
² Dept. of Computer Science, University of North Carolina at Chapel Hill, {sachin, ron}@cs.unc.edu.

Abstract

We present a new approach to motion planning under sensing and motion uncertainty by computing a locally optimal solution to a continuous partially observable Markov decision process (POMDP). Our approach represents beliefs (the distributions of the robot's state estimate) by Gaussian distributions and is applicable to robot systems with non-linear dynamics and observation models. The method follows the general POMDP solution framework in which we approximate the belief dynamics using an extended Kalman filter and represent the value function by a quadratic function that is valid in the vicinity of a nominal trajectory through belief space. Using a belief space variant of iterative LQG (iLQG), our approach iterates with second-order convergence towards a linear control policy over the belief space that is locally optimal with respect to a user-defined cost function. Unlike previous work, our approach does not assume maximum-likelihood observations, does not assume fixed estimator or control gains, takes into account obstacles in the environment, and does not require discretization of the state and action spaces. The running time of the algorithm is polynomial ($O[n^6]$) in the dimension $n$ of the state space. We demonstrate the potential of our approach in simulation for holonomic and non-holonomic robots maneuvering through environments with obstacles with noisy and partial sensing and with non-linear dynamics and observation models.

1 Introduction

As a robot moves through an environment to accomplish a task, uncertainty may arise in (1) the robot's motion due to unmodeled or unpredictable external forces, and (2) the robot's sensing of its state due to noisy or incomplete sensor measurements.
These forms of uncertainty are common in a variety of practical robotics tasks, including guiding aerial vehicles in turbulent conditions, maneuvering mobile robots in unfamiliar terrain, and robotically steering flexible medical needles to clinical targets in soft tissues. Explicitly considering motion and sensing uncertainty when computing motion plans can improve the quality of computed plans. The objective of motion planning under uncertainty is to plan motions for a robot such that the expected cost (as defined by a user-specified cost function) is minimized. Optimal plans typically limit the information that is lost due to motion uncertainty and move the robot through regions of the state space where information on the state is gained. Optimal solutions will maximize, for instance, the probability of reaching a specified goal location while avoiding collisions with obstacles. To fully consider the impact of uncertainty in motion and sensing, a motion planner should not merely compute a static path through the robot's configuration space but rather a control policy that defines the motion to perform given any current state information. A key challenge is that the robot often cannot directly observe its current state but instead estimates a distribution

over the set of possible states (i.e., its belief state) based on sensor measurements that are both noisy and partial (i.e., only a subset of the state vector can be sensed). The problem of computing a control policy over the space of belief states is formally described as a partially observable Markov decision process (POMDP), on which a large body of work is available in the literature. Solutions to POMDPs are known to be extremely complex [19], since the belief space (over which a control policy is to be computed) is, in the most general formulation, an infinite-dimensional space of all possible probability distributions over the finite-dimensional state space. Solutions based on discrete or discretized state and action spaces are inherently subject to the curse of dimensionality, and have only been successfully applied to very small and low-dimensional state spaces.

In this paper, we present a method that takes as input a feasible trajectory and improves it by computing a locally optimal trajectory and a corresponding control policy that together minimize the expected value of a user-specified cost metric in the presence of motion and sensing uncertainty. To accomplish this, our method computes a locally optimal solution to a POMDP problem with continuous state and action spaces and non-linear dynamics and observation models, where we assume a belief can be represented by a Gaussian distribution. This POMDP formulation is applicable to a wide range of robot motion planning problems. Our approach uses a belief space variant of iterative linear-quadratic Gaussian (iLQG) to perform value iteration, where the value function is approximated using a quadratization around a nominal trajectory, and the belief dynamics are approximated using an extended Kalman filter (any non-linear Gaussian filter can in fact be used). The result is a linear control policy over the belief space that is valid in the vicinity of the nominal trajectory. By executing the control policy, a new nominal trajectory is created, around which a new control policy is constructed.
This process continues with second-order convergence towards a locally optimal solution to the POMDP problem. Unlike general POMDP solvers that have an exponential running time, our approach does not rely on discretizations and has a running time that is polynomial ($O[n^6]$) in the dimension $n$ of the state space.

Our approach combines, generalizes, and overcomes the limitations of previous work that has addressed the same problem of creating applicable approximations to the POMDP problem. Most previous work on POMDPs assumes maximum-likelihood observations to enable or simplify computing a control policy. This assumption has no formal justification, yet seems to produce reasonable results. Our approach does not assume maximum-likelihood observations, but can relatively easily be adapted such that it does. We use this to study the impact of the maximum-likelihood observation assumption on the resulting control policies and discuss the impact on plans computed using iterative local optimization. Our results indicate that not making this assumption results, on average, in better control policies (i.e., they have lower expected cost). Furthermore, our approach does not assume fixed estimator or control gains, and takes into account obstacles in the environment. We do assume that the dynamics and observation models and cost functions are sufficiently smooth, and that the belief about the state of the robot is well described by only its mean and its variance. We show the potential of our approach in several illustrative scenarios involving robots with non-linear dynamics and observation models moving through environments containing obstacles and relying on limited and partial sensing.

2 Previous Work

Partially observable Markov decision processes (POMDPs) [24] provide a principled mathematical framework for planning under uncertainty. They are known to be of extreme complexity [19], and can only be directly applied to problems with small and low-dimensional state spaces [16]. Recently, several POMDP algorithms have been developed that use approximate value iteration with point-based updates [1, 17, 20, 18].
These have been shown to scale up to medium-sized domains. However, they rely on discretizing the state space or the action space, making them

inevitably subject to the curse of dimensionality. The methods of [23, 4, 9, 6] handle continuous state and action spaces, but maintain a global (discrete) representation of the value function over the belief space. In contrast, our approach is continuous and approximates the value function in parametric form only in the regions of the belief space that are relevant to solving the problem, allowing for a running time polynomial in the dimension of the state.

Another class of works, to which our method is directly related, assumes a linear-quadratic Gaussian (LQG) framework to find locally optimal feedback policies. In the basic LQG derivation [2], motion and sensing uncertainty have no impact on the resulting policy. As shown in [25], the LQG framework can be extended such that it accounts for state and control dependent motion noise, but still implicitly assumes full observation (or an independent estimator) of the state. Several approaches have been proposed to include partial and noisy observations such that the controller will actively choose actions to gain information about the state. Belief roadmaps [22] and icLQG [10] combine an iterative LQG approach with a roadmap, but this approach does not result in (locally) optimal solutions. The approaches of [21, 7, 8] are similar to ours and incorporate the variance into an augmented state and use the LQG framework to find a locally optimal control policy. The main difference is that these approaches assume maximum-likelihood observations to make the belief propagation deterministic. LQG-MP [26] removes this assumption, but only evaluates the probability of success of a given trajectory, rather than constructing an optimal one. Belief trees [5] overcome this limitation by combining a variant of LQG-MP with RRT* to find an optimal trajectory through belief space. A great advantage of this approach is that it finds a globally optimal solution. Vitus and Tomlin [31] propose an alternative solution that involves solving a chance constrained optimal control problem.
However, these approaches do not really solve a POMDP as they assume fixed control gains along each section of the trajectory independent of the context. The work of [15] takes into account state and control dependent motion and observation noise by an interleaved iteration of the estimator and the controller, converging to a local optimum. While this approach is asymptotically faster than ours, it does not allow for obstacles in the environment and results in a controller that is optimal only under the assumption of fixed estimator gains. Our approach combines and generalizes these approaches as it does not assume maximum-likelihood observations, does not assume fixed control or estimator gains, and takes into account the existence of obstacles in the environment to compute locally optimal policies that minimize the expected value of a user-defined cost function.

This paper is an extended version of a preliminary paper presented by the authors in [28], which used stochastic differential dynamic programming (sDDP) rather than iLQG for the value iteration, but otherwise presents the same global approach. Also, to improve numerical stability compared to [28], in this paper we use the principal square root of the variance, rather than the variance itself, in the definition of the belief. Qualitatively, iLQG is asymptotically faster than sDDP ($O[n^6]$ rather than $O[n^7]$) and numerically more stable (regularization of matrices to maintain positive-semidefiniteness of the value function is not necessary with iLQG). Our experimental results include a quantitative comparison between the two approaches.

3 Preliminaries and Definitions

We begin by defining POMDPs in their most general formulation (following [24]). Then, we specifically state the instance of the problem we discuss in this paper.

3.1 General POMDPs

Let $X \subset \mathbb{R}^n$ be the space of all possible states $x$ of the robot, $U \subset \mathbb{R}^m$ be the space of all possible control inputs $u$ of the robot, and $Z \subset \mathbb{R}^k$ be the space of all possible sensor measurements $z$ the robot may receive. General POMDPs take as input a stochastic dynamics and observation

model, here given in probabilistic notation:

$$x_{t+1} \sim p[x_{t+1} \mid x_t, u_t], \qquad z_t \sim p[z_t \mid x_t], \tag{1}$$

where $x_t \in X$, $u_t \in U$, and $z_t \in Z$ are the robot's state, control input, and received measurement at time step $t$, respectively. The belief $b[x_t]$ of the robot is defined as the distribution of the state $x_t$ given all past control inputs and sensor measurements:

$$b[x_t] = p[x_t \mid u_0, \ldots, u_{t-1}, z_1, \ldots, z_t]. \tag{2}$$

Given a control input $u_t$ and a measurement $z_{t+1}$, the belief is propagated using Bayesian filtering:

$$b[x_{t+1}] = \eta\, p[z_{t+1} \mid x_{t+1}] \int p[x_{t+1} \mid x_t, u_t]\, b[x_t]\, dx_t, \tag{3}$$

where $\eta$ is a normalizer independent of $x_{t+1}$. Denoting belief $b[x_t]$ by $b_t$, and the space of all possible beliefs by $B \subset \{X \to \mathbb{R}\}$, the belief dynamics defined by Eq. (3) can be written as a function $\beta : B \times U \times Z \to B$:

$$b_{t+1} = \beta[b_t, u_t, z_{t+1}]. \tag{4}$$

Now, the challenge of the POMDP problem is to find a control policy $\pi_t : B \to U$ for all $0 \leq t < l$, where $l$ is the time horizon (i.e. the index of the final time step), such that selecting the controls $u_t = \pi_t[b_t]$ minimizes the objective function:

$$\mathop{\mathrm{E}}_{z_1, \ldots, z_l}\Big[ c_l[b_l] + \sum_{t=0}^{l-1} c_t[b_t, u_t] \Big], \tag{5}$$

for given immediate cost functions $c_l$ and $c_t$. The expectation is taken because the measurements are stochastic. A general solution approach uses value iteration [24], a backward recursion procedure, to find the control policy $\pi_t$ for each time step $t$:

$$v_l[b_l] = c_l[b_l], \tag{6}$$

$$v_t[b_t] = \min_{u_t}\Big( c_t[b_t, u_t] + \mathop{\mathrm{E}}_{z_{t+1}}\big[ v_{t+1}[\beta[b_t, u_t, z_{t+1}]] \big] \Big), \tag{7}$$

$$\pi_t[b_t] = \mathop{\mathrm{argmin}}_{u_t}\Big( c_t[b_t, u_t] + \mathop{\mathrm{E}}_{z_{t+1}}\big[ v_{t+1}[\beta[b_t, u_t, z_{t+1}]] \big] \Big), \tag{8}$$

where $v_t[b_t] : B \to \mathbb{R}$ is called the value function at time step $t$.

3.2 Problem Definition

The complexity of POMDPs stems from the fact that $B$, the space of all beliefs, is infinite-dimensional, and that in general the value function cannot be expressed in parametric form. We address these challenges in our approach by representing beliefs by Gaussian distributions, approximating the belief dynamics using an extended Kalman filter, and approximating the value function by a quadratization around a nominal trajectory through the belief space.
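To make the Bayesian belief update of Eq. (3) concrete, the following is a minimal sketch for a finite state space, where the integral becomes a sum. The paper itself works with continuous states, so the discretization, the function name `bayes_filter_update`, and the two-state numbers are illustrative assumptions only.

```python
def bayes_filter_update(belief, transition, likelihood):
    """Discrete analogue of Eq. (3).

    belief[x]           : b[x_t]
    transition[x][x2]   : p[x_{t+1} = x2 | x_t = x, u_t] for the chosen control
    likelihood[x2]      : p[z_{t+1} | x_{t+1} = x2] for the received measurement
    """
    n = len(belief)
    # Prediction: sum over x_t of p[x_{t+1} | x_t, u_t] * b[x_t].
    predicted = [sum(transition[x][x2] * belief[x] for x in range(n))
                 for x2 in range(n)]
    # Correction: weight by the measurement likelihood, then normalize (eta).
    unnormalized = [likelihood[x2] * predicted[x2] for x2 in range(n)]
    eta = 1.0 / sum(unnormalized)
    return [eta * value for value in unnormalized]


# Hypothetical two-state example (all numbers are made up for illustration).
prior = [0.5, 0.5]
transition = [[0.9, 0.1],
              [0.2, 0.8]]
likelihood = [0.75, 0.25]
posterior = bayes_filter_update(prior, transition, likelihood)
```

The posterior concentrates on the state that both the motion model and the measurement favor, which is the behavior the Gaussian approximation of the following sections reproduces in continuous spaces.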
Specifically, we assume we are given a (non-linear) stochastic dynamics and observation model, here given in state-transition notation:

$$x_{t+1} = f[x_t, u_t, m_t], \qquad m_t \sim \mathcal{N}[0, I], \tag{9}$$

$$z_t = h[x_t, n_t], \qquad n_t \sim \mathcal{N}[0, I], \tag{10}$$

where $m_t$ is the motion noise and $n_t$ is the measurement noise, each drawn from an independent Gaussian distribution with (without loss of generality) zero mean and unit variance. Note that the motion and sensing uncertainty can be state and control input dependent through manipulations on $m_t$ and $n_t$ within the functions $f$ and $h$, respectively.

The belief, denoted $b_t = (\hat{x}_t, \sqrt{\Sigma_t})$, is assumed to be defined by the mean $\hat{x}_t$ and the principal square root $\sqrt{\Sigma_t}$ of the variance $\Sigma_t$ of a Gaussian distribution $\mathcal{N}[\hat{x}_t, \Sigma_t]$ of the state $x_t$. We use the square root for numerical robustness of the algorithm we present below. Similar to the general POMDP case, our objective is to find a control policy $u_t = \pi_t[b_t]$ that minimizes the cost function $\mathrm{E}\big[ c_l[b_l] + \sum_{t=0}^{l-1} c_t[b_t, u_t] \big]$. In our case, we assume in addition positive-(semi)definiteness for the Hessian matrices of the immediate cost functions $c_t$:

$$\frac{\partial^2 c_l}{\partial b\, \partial b}[b] \geq 0, \qquad \frac{\partial^2 c_t}{\partial u\, \partial u}[b, u] > 0, \qquad \begin{bmatrix} \dfrac{\partial^2 c_t}{\partial b\, \partial b}[b, u] & \dfrac{\partial^2 c_t}{\partial b\, \partial u}[b, u] \\ \dfrac{\partial^2 c_t}{\partial u\, \partial b}[b, u] & \dfrac{\partial^2 c_t}{\partial u\, \partial u}[b, u] \end{bmatrix} \geq 0, \tag{11}$$

for all $b$, $u$ and $t$. Further, we assume that the initial belief $b_0 = (\hat{x}_0, \sqrt{\Sigma_0})$ is given.

4 Approach

To compute a locally optimal solution to the Gaussian POMDP problem as formulated above, we follow the general solution approach as sketched in Section 3.1. First, we approximate the belief dynamics using an extended Kalman filter. Second, we approximate the value function using a quadratic function that is locally valid in the vicinity of a nominal trajectory through the belief space. We then use a belief-space variant of iterative LQG to perform the value iteration, which results in a linear control policy over the belief space that is locally optimal around the nominal trajectory. We then iteratively generate new nominal trajectories by executing the control policy, and repeat the process until convergence to a locally optimal solution to the POMDP problem. We discuss each of these steps in this section, and analyze the running time of our algorithm.

4.1 Bayesian Filter and Belief Dynamics

Given a current belief $b_t = (\hat{x}_t, \sqrt{\Sigma_t})$, a control input $u_t$, and a measurement $z_{t+1}$, the belief evolves using a Bayesian filter.
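We approximate the Bayesian filter by an extended Kalman filter (EKF), which is applicable to Gaussian beliefs (we note that any other non-linear Gaussian filter, such as the unscented Kalman filter [12], can be used as well). The EKF is widely used for state estimation of non-linear systems [32], and uses the first-order approximation that for any vector-valued function $f[x]$ of a stochastic variable $x$ we have:

$$\mathrm{E}[f[x]] \approx f[\mathrm{E}[x]], \qquad \mathrm{Var}[f[x]] \approx \frac{\partial f}{\partial x}[\mathrm{E}[x]]\; \mathrm{Var}[x]\; \frac{\partial f}{\partial x}[\mathrm{E}[x]]^T. \tag{12}$$

Given $\hat{x}_t$ and $\sqrt{\Sigma_t}$ that define the current belief, the EKF update equations are then given by:

$$\hat{x}_{t+1} = f[\hat{x}_t, u_t, 0] + K_t \big( z_{t+1} - h[f[\hat{x}_t, u_t, 0], 0] \big), \tag{13}$$

$$\sqrt{\Sigma_{t+1}} = \sqrt{\Gamma_t - K_t H_t \Gamma_t}, \tag{14}$$

where

$$\Gamma_t = A_t \sqrt{\Sigma_t} \big( A_t \sqrt{\Sigma_t} \big)^T + M_t M_t^T, \qquad A_t = \frac{\partial f}{\partial x}[\hat{x}_t, u_t, 0], \qquad M_t = \frac{\partial f}{\partial m}[\hat{x}_t, u_t, 0], \tag{15}$$

$$K_t = \Gamma_t H_t^T \big( H_t \Gamma_t H_t^T + N_t N_t^T \big)^{-1}, \qquad H_t = \frac{\partial h}{\partial x}[f[\hat{x}_t, u_t, 0], 0], \qquad N_t = \frac{\partial h}{\partial n}[f[\hat{x}_t, u_t, 0], 0]. \tag{16}$$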
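As a hedged illustration of Eqs. (13) to (16), the sketch below performs one EKF belief update for a scalar state, so every matrix reduces to a number. The model functions `f` and `h`, their noise scales, and the numeric inputs are assumptions made up for this example; the Jacobians are taken by central differences.

```python
import math


def ekf_belief_update(x_hat, sqrt_sigma, u, z_next, f, h, eps=1e-6):
    """One EKF step on a scalar Gaussian belief (x_hat, sqrt_sigma)."""
    x_pred = f(x_hat, u, 0.0)
    # Jacobians of Eqs. (15) and (16), here scalars, by central differences.
    A = (f(x_hat + eps, u, 0.0) - f(x_hat - eps, u, 0.0)) / (2 * eps)
    M = (f(x_hat, u, eps) - f(x_hat, u, -eps)) / (2 * eps)
    H = (h(x_pred + eps, 0.0) - h(x_pred - eps, 0.0)) / (2 * eps)
    N = (h(x_pred, eps) - h(x_pred, -eps)) / (2 * eps)
    gamma = (A * sqrt_sigma) ** 2 + M ** 2           # Gamma_t
    K = gamma * H / (H * gamma * H + N ** 2)         # Kalman gain K_t
    x_new = x_pred + K * (z_next - h(x_pred, 0.0))   # Eq. (13)
    sqrt_sigma_new = math.sqrt(gamma - K * H * gamma)  # Eq. (14)
    return x_new, sqrt_sigma_new


# Hypothetical linear models, chosen only to keep the arithmetic checkable.
def f(x, u, m):
    return x + 0.5 * u + 0.1 * m


def h(x, n):
    return x + 0.2 * n


x_new, s_new = ekf_belief_update(x_hat=0.0, sqrt_sigma=1.0, u=1.0, z_next=0.7, f=f, h=h)
```

In this example the measurement noise is small relative to the prior variance, so the square root of the variance shrinks after the update.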

Note that all of these matrices are functions of $b_t$ and $u_t$. Equations (13) and (14) define the (non-linear) belief dynamics. The second term of Eq. (13), called the innovation term, depends on the measurement $z_{t+1}$. Since the measurement is unknown in advance, the belief dynamics are stochastic. Using Eq. (10) and the assumptions of Eq. (12), the innovation term is distributed according to $\mathcal{N}[0, K_t H_t \Gamma_t]$.

We define the belief $b_t = \begin{bmatrix} \hat{x}_t \\ \mathrm{vec}[\sqrt{\Sigma_t}] \end{bmatrix}$ as a true vector, containing the mean $\hat{x}_t$ and the columns of $\sqrt{\Sigma_t}$. Obviously, in our implementation we exploit the symmetry of $\sqrt{\Sigma_t}$ to eliminate the redundancy. Then, the stochastic belief dynamics are given by:

$$b_{t+1} = g[b_t, u_t] + W[b_t, u_t]\, w_t, \qquad w_t \sim \mathcal{N}[0, I_n], \tag{17}$$

where $n$ is the dimension $\dim[x]$ of the state, and:

$$g[b_t, u_t] = \begin{bmatrix} f[\hat{x}_t, u_t, 0] \\ \mathrm{vec}\big[\sqrt{\Gamma_t - K_t H_t \Gamma_t}\big] \end{bmatrix}, \qquad W[b_t, u_t] = \begin{bmatrix} \sqrt{K_t H_t \Gamma_t} \\ 0 \end{bmatrix}. \tag{18}$$

4.2 Value Iteration

We perform value iteration backward in time to find a locally optimal control policy. When using value iteration (dynamic programming) over discrete states one usually stores the value of each possible state. In the case of a continuous state this is not possible. Instead, we assume that we have an initial (nominal) trajectory given. For each time step $t$ we calculate an approximation of the value function around the state the robot is in at time step $t$ when following the nominal trajectory. As the value function at time step $t$ depends on the value function at time step $t + 1$, this is done in a backward iterative process starting at the final time step $l$. Using the approximated value function, we can also calculate an optimal policy for each time step. Using this optimal policy we generate a new nominal trajectory by starting at the initial state and applying this optimal policy forward in time. The process is then repeated using the new nominal trajectory, and ultimately converges to a locally optimal solution.

We use a belief-space variant of iterative LQG [25] to perform the value iteration. We approximate the value function $v_t[b]$ as a quadratic function that is approximately valid around a given nominal trajectory in belief space.
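Let the nominal trajectory be given as a series of beliefs and control inputs $(\bar{b}_0, \bar{u}_0, \ldots, \bar{b}_{l-1}, \bar{u}_{l-1}, \bar{b}_l)$ such that $\bar{b}_{t+1} = g[\bar{b}_t, \bar{u}_t]$ for $t \in 0 \ldots l-1$ (we will discuss initialization and iterative convergence of the nominal trajectory to a locally optimal trajectory in the next subsection). The value function is then approximated as:

$$v_t[b] \approx \tfrac{1}{2} (b - \bar{b}_t)^T S_t (b - \bar{b}_t) + (b - \bar{b}_t)^T \mathbf{s}_t + s_t, \tag{19}$$

with $S_t \geq 0$, where $\mathbf{s}_t$ is a vector and $s_t$ a scalar. For the final time step $t = l$, the value function $v_l$ (see Eq. (6)) is approximated by setting

$$S_l = \frac{\partial^2 c_l}{\partial b\, \partial b}[\bar{b}_l], \qquad \mathbf{s}_l^T = \frac{\partial c_l}{\partial b}[\bar{b}_l], \qquad s_l = c_l[\bar{b}_l], \tag{20}$$

which amounts to a second-order Taylor expansion of $c_l$ around the point $\bar{b}_l$. The value functions and the control policies for the time steps $l > t \geq 0$ are computed by backward recursion.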

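We proceed by combining Eqs. (7), (17), and (19):

$$\begin{aligned}
v_t[b] &= \min_u \Big( c_t[b, u] + \mathrm{E}\big[ v_{t+1}[g[b, u] + W[b, u] w_t] \big] \Big) \\
&= \min_u \Big( c_t[b, u] + \mathrm{E}\big[ \tfrac{1}{2} (g[b, u] + W[b, u] w_t - \bar{b}_{t+1})^T S_{t+1} (g[b, u] + W[b, u] w_t - \bar{b}_{t+1}) \\
&\qquad\qquad + (g[b, u] + W[b, u] w_t - \bar{b}_{t+1})^T \mathbf{s}_{t+1} + s_{t+1} \big] \Big) \\
&= \min_u \Big( c_t[b, u] + \tfrac{1}{2} (g[b, u] - \bar{b}_{t+1})^T S_{t+1} (g[b, u] - \bar{b}_{t+1}) + (g[b, u] - \bar{b}_{t+1})^T \mathbf{s}_{t+1} + s_{t+1} \\
&\qquad\qquad + \tfrac{1}{2} \mathrm{tr}\big[ W[b, u]^T S_{t+1} W[b, u] \big] \Big) \qquad (21) \\
&= \min_u \Big( c_t[b, u] + \tfrac{1}{2} (g[b, u] - \bar{b}_{t+1})^T S_{t+1} (g[b, u] - \bar{b}_{t+1}) + (g[b, u] - \bar{b}_{t+1})^T \mathbf{s}_{t+1} + s_{t+1} \\
&\qquad\qquad + \tfrac{1}{2} \sum_{i=1}^{n} W^{(i)}[b, u]^T S_{t+1} W^{(i)}[b, u] \Big), \qquad (22)
\end{aligned}$$

where $W^{(i)}[b, u]$ refers to the $i$'th column of matrix $W[b, u]$ (note that $W[b, u]$ has $n$ columns, where $n$ is the dimension of the state). The trace term in Eq. (21) follows from the fact that $\mathrm{E}[x^T Q x] = \mathrm{E}[x]^T Q\, \mathrm{E}[x] + \mathrm{tr}[Q\, \mathrm{Var}[x]]$ for any stochastic variable $x$, and that $\mathrm{tr}[Q X X^T] = \mathrm{tr}[X^T Q X]$. It is this term that ensures that the stochastic nature of the belief dynamics is accounted for in the value iteration. Eq. (22) follows from the fact that $\mathrm{tr}[X^T Q X] = \sum_i X^{(i)T} Q X^{(i)}$.

To approximate the optimal value of $u$ as a function of $b$ we linearize the belief dynamics and each of the columns of $W[b, u]$ about the belief $\bar{b}_t$ and control input $\bar{u}_t$ of the nominal trajectory. Given that $\bar{b}_{t+1} = g[\bar{b}_t, \bar{u}_t]$, we get:

$$g[b, u] - \bar{b}_{t+1} \approx F_t (b - \bar{b}_t) + G_t (u - \bar{u}_t), \tag{23}$$

$$W^{(i)}[b, u] \approx \mathbf{e}_t^i + F_t^i (b - \bar{b}_t) + G_t^i (u - \bar{u}_t), \tag{24}$$

where

$$F_t = \frac{\partial g}{\partial b}[\bar{b}_t, \bar{u}_t], \qquad G_t = \frac{\partial g}{\partial u}[\bar{b}_t, \bar{u}_t], \tag{25}$$

$$\mathbf{e}_t^i = W^{(i)}[\bar{b}_t, \bar{u}_t], \qquad F_t^i = \frac{\partial W^{(i)}}{\partial b}[\bar{b}_t, \bar{u}_t], \qquad G_t^i = \frac{\partial W^{(i)}}{\partial u}[\bar{b}_t, \bar{u}_t]. \tag{26}$$

The immediate cost function $c_t[b, u]$ is quadratized about $\bar{b}_t$ and $\bar{u}_t$:

$$c_t[b, u] \approx \frac{1}{2} \begin{bmatrix} b - \bar{b}_t \\ u - \bar{u}_t \end{bmatrix}^T \begin{bmatrix} Q_t & P_t^T \\ P_t & R_t \end{bmatrix} \begin{bmatrix} b - \bar{b}_t \\ u - \bar{u}_t \end{bmatrix} + \begin{bmatrix} b - \bar{b}_t \\ u - \bar{u}_t \end{bmatrix}^T \begin{bmatrix} \mathbf{q}_t \\ \mathbf{r}_t \end{bmatrix} + p_t, \tag{27}$$

where

$$Q_t = \frac{\partial^2 c_t}{\partial b\, \partial b}[\bar{b}_t, \bar{u}_t], \quad R_t = \frac{\partial^2 c_t}{\partial u\, \partial u}[\bar{b}_t, \bar{u}_t], \quad P_t = \frac{\partial^2 c_t}{\partial u\, \partial b}[\bar{b}_t, \bar{u}_t], \quad \mathbf{q}_t^T = \frac{\partial c_t}{\partial b}[\bar{b}_t, \bar{u}_t], \quad \mathbf{r}_t^T = \frac{\partial c_t}{\partial u}[\bar{b}_t, \bar{u}_t], \quad p_t = c_t[\bar{b}_t, \bar{u}_t]. \tag{28}$$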
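The step from the trace term of Eq. (21) to the column sum of Eq. (22) can be checked numerically. The sketch below is a small illustration with made-up $2 \times 2$ matrices `S` and `W` (not values from the paper) and hypothetical helper names; it compares $\mathrm{tr}[S W W^T]$ with $\sum_i W^{(i)T} S W^{(i)}$.

```python
def matmul(A, B):
    """Plain-list matrix product A * B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def column_sum_form(W, S):
    """Sum over columns W^(i) of W^(i)^T S W^(i), as in Eq. (22)."""
    total = 0.0
    for col in transpose(W):  # each row of W^T is one column of W
        S_col = [sum(S[r][k] * col[k] for k in range(len(col)))
                 for r in range(len(S))]
        total += sum(c * sc for c, sc in zip(col, S_col))
    return total


# Illustrative values only: S plays the role of S_{t+1}, W of W[b, u].
S = [[2.0, 1.0],
     [1.0, 3.0]]
W = [[1.0, 2.0],
     [0.0, -1.0]]

trace_form = trace(matmul(S, matmul(W, transpose(W))))  # tr[S W W^T]
column_form = column_sum_form(W, S)
noise_penalty = 0.5 * column_form  # the extra cost added in Eq. (22)
```

Both forms agree, and the resulting penalty is the amount by which the stochastic belief dynamics raise the value function relative to a deterministic rollout.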

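Filling in Eqs. (23), (24), and (27) into Eq. (22), we get:

$$\begin{aligned}
v_t[b] \approx \min_u \Big( & \frac{1}{2} \begin{bmatrix} b - \bar{b}_t \\ u - \bar{u}_t \end{bmatrix}^T \begin{bmatrix} Q_t & P_t^T \\ P_t & R_t \end{bmatrix} \begin{bmatrix} b - \bar{b}_t \\ u - \bar{u}_t \end{bmatrix} + \begin{bmatrix} b - \bar{b}_t \\ u - \bar{u}_t \end{bmatrix}^T \begin{bmatrix} \mathbf{q}_t \\ \mathbf{r}_t \end{bmatrix} + p_t \\
& + \tfrac{1}{2} \big( F_t (b - \bar{b}_t) + G_t (u - \bar{u}_t) \big)^T S_{t+1} \big( F_t (b - \bar{b}_t) + G_t (u - \bar{u}_t) \big) \\
& + \big( F_t (b - \bar{b}_t) + G_t (u - \bar{u}_t) \big)^T \mathbf{s}_{t+1} + s_{t+1} \\
& + \tfrac{1}{2} \sum_{i=1}^{n} \big( \mathbf{e}_t^i + F_t^i (b - \bar{b}_t) + G_t^i (u - \bar{u}_t) \big)^T S_{t+1} \big( \mathbf{e}_t^i + F_t^i (b - \bar{b}_t) + G_t^i (u - \bar{u}_t) \big) \Big) \\
= \min_u \Big( & \frac{1}{2} \begin{bmatrix} b - \bar{b}_t \\ u - \bar{u}_t \end{bmatrix}^T \begin{bmatrix} C_t & E_t^T \\ E_t & D_t \end{bmatrix} \begin{bmatrix} b - \bar{b}_t \\ u - \bar{u}_t \end{bmatrix} + \begin{bmatrix} b - \bar{b}_t \\ u - \bar{u}_t \end{bmatrix}^T \begin{bmatrix} \mathbf{c}_t \\ \mathbf{d}_t \end{bmatrix} + e_t \Big), \qquad (29)
\end{aligned}$$

where

$$C_t = Q_t + F_t^T S_{t+1} F_t + \sum_{i=1}^{n} F_t^{iT} S_{t+1} F_t^i, \qquad \mathbf{c}_t = \mathbf{q}_t + F_t^T \mathbf{s}_{t+1} + \sum_{i=1}^{n} F_t^{iT} S_{t+1} \mathbf{e}_t^i, \tag{30}$$

$$D_t = R_t + G_t^T S_{t+1} G_t + \sum_{i=1}^{n} G_t^{iT} S_{t+1} G_t^i, \qquad \mathbf{d}_t = \mathbf{r}_t + G_t^T \mathbf{s}_{t+1} + \sum_{i=1}^{n} G_t^{iT} S_{t+1} \mathbf{e}_t^i, \tag{31}$$

$$E_t = P_t + G_t^T S_{t+1} F_t + \sum_{i=1}^{n} G_t^{iT} S_{t+1} F_t^i, \qquad e_t = p_t + s_{t+1} + \frac{1}{2} \sum_{i=1}^{n} \mathbf{e}_t^{iT} S_{t+1} \mathbf{e}_t^i. \tag{32}$$

Equation (29) is then solved by expanding the terms, taking the derivative with respect to $u$ and equating to 0 (for $u$ to be actually a minimum, $D_t$ must be positive-definite; given the assumptions of Eq. (11), this is necessarily the case). We then get the solution:

$$u - \bar{u}_t = -D_t^{-1} E_t (b - \bar{b}_t) - D_t^{-1} \mathbf{d}_t. \tag{33}$$

Hence, the control policy for time step $t$ is linear and given by:

$$u_t = \pi_t[b_t] = L_t (b_t - \bar{b}_t) + \mathbf{l}_t + \bar{u}_t, \qquad L_t = -D_t^{-1} E_t, \qquad \mathbf{l}_t = -D_t^{-1} \mathbf{d}_t. \tag{34}$$

Filling Eq. (33) back into Eq. (29) gives the value function $v_t[b]$ as a function of only $b$ in the form of Eq. (19). Expanding and collecting terms gives:

$$S_t = C_t - E_t^T D_t^{-1} E_t, \qquad \mathbf{s}_t = \mathbf{c}_t - E_t^T D_t^{-1} \mathbf{d}_t, \qquad s_t = e_t - \tfrac{1}{2} \mathbf{d}_t^T D_t^{-1} \mathbf{d}_t. \tag{35}$$

This recursion then continues by computing a control policy for time step $t - 1$.

4.3 Iteration to a Locally Optimal Control Policy

The above value iteration gives a control policy that is valid in the vicinity of the given nominal trajectory. To let the control policy converge to a local optimum, we iteratively update the nominal trajectory using the most recent control policy [11]. Given the initial belief $b_0 = (\hat{x}_0, \sqrt{\Sigma_0})$, and an (arbitrary) initial nominal trajectory $(\bar{b}_0^{(0)}, \bar{u}_0^{(0)}, \ldots, \bar{b}_{l-1}^{(0)}, \bar{u}_{l-1}^{(0)}, \bar{b}_l^{(0)})$ (such that $\bar{b}_0^{(0)} = b_0$ and $\bar{b}_{t+1}^{(0)} = g[\bar{b}_t^{(0)}, \bar{u}_t^{(0)}]$ for $t \in 0 \ldots l-1$), which can be obtained using RRT motion planning [13], for instance, we proceed as follows.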
Using the value iteration procedure as described above given the nominal trajectory of the $i$'th iteration, we find the control policy, i.e. the matrices $L_t^{(i)}$ and vectors $\mathbf{l}_t^{(i)}$ for the $i$'th iteration. We then compute the nominal trajectory $(\bar{b}_t^{(i+1)}, \bar{u}_t^{(i+1)})$ of the $(i+1)$'th iteration

(starting with $i = 0$) by forward integrating the control policy in the deterministic (zero-noise) belief dynamics:

$$\bar{b}_0^{(i+1)} = b_0, \qquad \bar{u}_t^{(i+1)} = L_t^{(i)} \big( \bar{b}_t^{(i+1)} - \bar{b}_t^{(i)} \big) + \mathbf{l}_t^{(i)} + \bar{u}_t^{(i)}, \qquad \bar{b}_{t+1}^{(i+1)} = g[\bar{b}_t^{(i+1)}, \bar{u}_t^{(i+1)}]. \tag{36}$$

We then recompute the control policy, and reiterate. This lets the control policy converge to a locally optimal trajectory with a second-order convergence rate [14].

4.4 Ensuring Convergence

To ensure that the above algorithm in fact converges to a locally optimal control policy, the algorithm must be augmented with line search. As with Newton's method for finding roots of a function, second-order convergence of the above algorithm is only achieved if the current nominal trajectory is already close to the locally optimal trajectory. If the current nominal trajectory is far away from the local optimum, the approach may overshoot local minima, which significantly slows down convergence, or even results in divergence.

To address this issue, we subtly change the algorithm following [33]. We limit the increment to the control policy by adding a parameter $\varepsilon$ to Eq. (33): $(u - \bar{u}_t) = L_t (b - \bar{b}_t) + \varepsilon\, \mathbf{l}_t$. Initially, $\varepsilon = 1$, but each time that the control policy creates a trajectory with higher expected cost than the previous nominal trajectory, the trajectory is rejected, $\varepsilon$ is divided in half, and a new trajectory is created. When a new trajectory is accepted, $\varepsilon$ is reset to 1. This change is equivalent to using backtracking line search to limit the step size in Newton's method and guarantees convergence to a locally optimal control policy [33].

An issue that remains is how to compute the expected cost of a given nominal trajectory. In deterministic iLQG, one simply evaluates its cost using the given immediate cost functions $c_t[b, u]$. In our case however, the dynamics are stochastic, so one has to compute the expected cost. We do this as follows. Let $L_t^{(i)}$ and $\varepsilon\, \mathbf{l}_t^{(i)}$ define the control policy in the $i$'th iteration. A candidate nominal trajectory for iteration $i + 1$ is now generated by applying this control policy with respect to the nominal trajectory of iteration $i$, according to Eq. (36).
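The accept/reject rule on $\varepsilon$ can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `cost_of_candidate` is a hypothetical callback standing in for "generate the candidate trajectory with step size $\varepsilon$ and evaluate its expected cost", and the scalar quadratic used to exercise it is a toy stand-in for that expected cost.

```python
def backtracking_eps(current_cost, cost_of_candidate, min_eps=1e-8):
    """Halve eps until the candidate has lower cost than the current trajectory.

    cost_of_candidate(eps) is assumed to build the candidate trajectory for
    step size eps and return its expected cost (hypothetical callback).
    Returns (eps, cost) on acceptance, or (None, current_cost) if no eps
    above min_eps improves the cost.
    """
    eps = 1.0  # eps is reset to 1 at the start of every iteration
    while eps >= min_eps:
        candidate_cost = cost_of_candidate(eps)
        if candidate_cost < current_cost:
            return eps, candidate_cost  # accept the candidate
        eps *= 0.5  # reject and retry with half the step
    return None, current_cost


# Toy stand-in: cost theta^2 with a step that overshoots the minimum at eps = 1.
theta, step = 1.0, -3.0


def toy_cost(value):
    return value * value


eps, new_cost = backtracking_eps(toy_cost(theta),
                                 lambda e: toy_cost(theta + e * step))
```

In the toy run the full step is rejected because it overshoots, and the first halved step is accepted. The `min_eps` cutoff is an added safeguard for this sketch and is not part of the description above.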
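With the step-size parameter $\varepsilon$ included, Eq. (36) gives:

$$\bar{u}_t^{(i+1)} - \bar{u}_t^{(i)} = L_t^{(i)} \big( \bar{b}_t^{(i+1)} - \bar{b}_t^{(i)} \big) + \varepsilon\, \mathbf{l}_t^{(i)}. \tag{37}$$

The control policy of iteration $i$ itself is defined as

$$\begin{aligned}
u - \bar{u}_t^{(i)} &= L_t^{(i)} \big( b - \bar{b}_t^{(i)} \big) + \varepsilon\, \mathbf{l}_t^{(i)}, \\
\big( u - \bar{u}_t^{(i+1)} \big) + \big( \bar{u}_t^{(i+1)} - \bar{u}_t^{(i)} \big) &= L_t^{(i)} \big( (b - \bar{b}_t^{(i+1)}) + (\bar{b}_t^{(i+1)} - \bar{b}_t^{(i)}) \big) + \varepsilon\, \mathbf{l}_t^{(i)}, \\
\big( u - \bar{u}_t^{(i+1)} \big) + L_t^{(i)} \big( \bar{b}_t^{(i+1)} - \bar{b}_t^{(i)} \big) + \varepsilon\, \mathbf{l}_t^{(i)} &= L_t^{(i)} \big( (b - \bar{b}_t^{(i+1)}) + (\bar{b}_t^{(i+1)} - \bar{b}_t^{(i)}) \big) + \varepsilon\, \mathbf{l}_t^{(i)}, \\
u - \bar{u}_t^{(i+1)} &= L_t^{(i)} \big( b - \bar{b}_t^{(i+1)} \big). \qquad (38)
\end{aligned}$$

Hence, Eq. (38) gives the control policy of iteration $i$ relative to a candidate trajectory of iteration $i + 1$. We now compute the expected cost of the candidate nominal trajectory $(\bar{b}_t^{(i+1)}, \bar{u}_t^{(i+1)})$ as follows. Quadratizing the immediate cost functions and linearizing the belief dynamics about the candidate trajectory of iteration $i + 1$ according to Eqs. (23) to (28), in combination with the control policy of Eq. (38), allows us to recursively update the value function along the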

candidate trajectory as:

$$S_t = Q_t + L_t^T R_t L_t + L_t^T P_t + P_t^T L_t + (F_t + G_t L_t)^T S_{t+1} (F_t + G_t L_t) + \sum_{j=1}^{n} (F_t^j + G_t^j L_t)^T S_{t+1} (F_t^j + G_t^j L_t), \tag{39}$$

$$\mathbf{s}_t = \mathbf{q}_t + L_t^T \mathbf{r}_t + (F_t + G_t L_t)^T \mathbf{s}_{t+1} + \sum_{j=1}^{n} (F_t^j + G_t^j L_t)^T S_{t+1} \mathbf{e}_t^j, \tag{40}$$

$$s_t = p_t + s_{t+1} + \frac{1}{2} \sum_{j=1}^{n} \mathbf{e}_t^{jT} S_{t+1} \mathbf{e}_t^j, \tag{41}$$

where $L_t = L_t^{(i)}$. The value $s_0$ now gives the expected cost of the candidate nominal trajectory with respect to the control policy of the current nominal trajectory (note that the vectors $\mathbf{s}_t$ are inconsequential for the expected cost, and need not be computed). If this expected cost is lower than the expected cost of the current nominal trajectory, the candidate nominal trajectory is accepted, $\varepsilon$ is reset to 1, and the iteration continues. Otherwise, $\varepsilon$ is divided in half, and the search for a new nominal trajectory continues. Since the vectors $\mathbf{l}_t$ point in the direction of the gradient of the expected cost, a positive $\varepsilon$ that generates a new trajectory with lower expected cost will always be found. When the magnitude of the vectors $\mathbf{l}_t$ vanishes (or drops below a preset small value), the iteration stops and the current nominal trajectory and its control policy are a locally optimal solution.

4.5 Running Time Analysis

Let us analyze the running time of our algorithm. The dimension of the state is $n$, and we assume for the sake of analysis that the dimensions of the control inputs and the measurements are $O[n]$. As the belief contains the (square root of the) covariance matrix of the state, the dimension of a belief is $O[n^2]$. The bottleneck of the running time lies in the computation of the matrix $C_t$ in Eq. (30). Evaluating the product $F_t^T S_{t+1} F_t$ in Eq. (30) of matrices of $O[n^2] \times O[n^2]$ dimension takes $O[n^6]$ time. Also, computing the matrix $Q_t$ of Eq. (28), which contains $O[n^4]$ entries, using numerical differentiation (central differences) can be done in $O[n^6]$ time assuming that $c_t[b, u]$ can be evaluated in $O[n^2]$ time. Further, the product $F_t^{iT} S_{t+1} F_t^i$ is evaluated $n$ times, but each can be evaluated in $O[n^5]$ time, since each $F_t^i$ only contains non-zero entries in the upper $n \times O[n^2]$ block of the matrix (see the definition of $W[b, u]$ in Eq. (18)). Note that linearizing the belief dynamics, i.e.
computing the matrices $F_t$, $G_t$, $F_t^i$ and $G_t^i$ using numerical differentiation (central differences) can be done in $O[n^5]$ time, as it involves evaluating the belief dynamics (which takes $O[n^3]$ time for the EKF (and also for the UKF)) $O[n^2]$ times. Hence, this does not form a bottleneck of the computation. A complete cycle of value iteration takes $l$ steps ($l$ being the index of the final time step), bringing the complexity to $O[l n^6]$. The number of such cycles needed to obtain convergence cannot be expressed in terms of $n$ or $l$, but as noted before, our algorithm converges with a second-order rate to a local optimum.

5 Environments with Obstacles

We presented our approach above for general immediate cost functions $c_l[b]$ and $c_t[b, u]$ (with the assumptions of Eq. (11)). In typical LQG-style cost functions, the existence of obstacles in the environment is not incorporated, while we may want to minimize the probability of colliding with them. We incorporate obstacles into the cost functions as follows.

Figure 1: Plots of the function $f[\sigma] = -\log \gamma[n/2, \sigma^2/2]$ for $n = \{1, 2, 3\}$ (horizontal axis: $\sigma$; vertical axis: $f$).

Let $O \subset X$ be the region of the state space that is occupied by obstacles. Given a belief $b_t = (\hat{x}_t, \sqrt{\Sigma_t})$, the probability of colliding with an obstacle is given by the integral over $O$ of the probability-density function of $\mathcal{N}[\hat{x}_t, \Sigma_t]$. As described in [26], this probability can be approximated by using a collision-checker to compute the number $\sigma[b_t]$ of standard deviations one may deviate from the mean before an obstacle is hit (it takes one geometric distance computation to compute this number, and does not involve Monte Carlo sampling). A lower bound on the probability of not colliding is then given by $\gamma[n/2, \sigma[b_t]^2/2]$, where $\gamma$ is the regularized gamma function, and $n$ the dimension of the state. A lower bound on the total probability of not colliding along a trajectory is subsequently computed as $\prod_{t=0}^{l-1} \gamma[n/2, \sigma[b_t]^2/2]$, and this number should be maximized. To fit this objective within the minimizing and additive nature of the POMDP objective function, we note that maximizing a product is equivalent to minimizing the sum of the negative logarithms of the factors. Hence, we add to $c_t[b, u]$ the term $f[\sigma[b]] = -\log \gamma[n/2, \sigma[b]^2/2]$ to account for the probability of colliding with obstacles (note that $f[\sigma[b]] > 0$ and $\frac{\partial^2 f}{\partial \sigma\, \partial \sigma} > 0$; see Fig. 1), potentially multiplied by a scaling factor to allow trading off with respect to other costs (such as the magnitude of the control input).

While the above approach works well, it should be noted that in order to compute the Hessian of $c_t[b, u]$ at $\bar{b}_t$ (i.e. computing the matrix $Q_t$ as is done in Eq. (28)), a total of $O[n^4]$ collision-checks with respect to the obstacles need to be performed, since the obstacle term $f[\sigma[b]]$ is part of $c_t[b, u]$. As this can be prohibitively costly, we can instead approximate the Hessian of $f[\sigma[b]]$ using linearizations, which involves only $O[n^2]$ collision checks.
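For the three dimensions plotted in Fig. 1, the regularized gamma function has closed forms in terms of `math.erf` and `math.exp`, so the obstacle term can be evaluated with the standard library alone. The sketch below assumes that $\gamma$ denotes the regularized lower incomplete gamma function; the function names are hypothetical and the restriction to $n \in \{1, 2, 3\}$ is a simplification made for this example.

```python
import math


def prob_no_collision_lower_bound(sigma, n):
    """gamma[n/2, sigma^2/2] (regularized lower incomplete gamma) for n in {1, 2, 3}."""
    x = sigma * sigma / 2.0
    if n == 1:
        return math.erf(math.sqrt(x))
    if n == 2:
        return 1.0 - math.exp(-x)
    if n == 3:
        return math.erf(math.sqrt(x)) - 2.0 * math.sqrt(x / math.pi) * math.exp(-x)
    raise ValueError("this sketch only covers n in {1, 2, 3}")


def obstacle_cost(sigma, n):
    """f[sigma] = -log gamma[n/2, sigma^2/2], the term added to c_t[b, u]."""
    return -math.log(prob_no_collision_lower_bound(sigma, n))


# One standard deviation of clearance in 1-D gives the familiar 68.27% bound.
p_1d = prob_no_collision_lower_bound(1.0, 1)
costs_2d = [obstacle_cost(s, 2) for s in (0.5, 1.0, 2.0, 4.0)]
```

The cost is positive and falls steeply as the clearance $\sigma$ grows, so beliefs whose uncertainty ellipsoid stays well clear of obstacles contribute almost nothing to the objective.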
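To this end, let us approximate $f[\sigma]$ by a second-order Taylor expansion about $\sigma[\bar{b}_t]$:

$$f[\sigma[b]] \approx \tfrac{1}{2}\, a\, (\sigma[b] - \sigma[\bar{b}_t])^2 + b\, (\sigma[b] - \sigma[\bar{b}_t]) + f[\sigma[\bar{b}_t]], \tag{42}$$

where the scalars $a = \frac{\partial^2 f}{\partial \sigma\, \partial \sigma}[\sigma[\bar{b}_t]]$ and $b = \frac{\partial f}{\partial \sigma}[\sigma[\bar{b}_t]]$ (note that this requires only one collision-check; the scalar coefficient $b$ multiplying the parentheses is distinct from the belief $b$ inside $\sigma[b]$). Now, we approximate $(\sigma[b] - \sigma[\bar{b}_t])$ using a first-order Taylor expansion about $\bar{b}_t$:

$$\sigma[b] - \sigma[\bar{b}_t] \approx (b - \bar{b}_t)^T \mathbf{a}, \tag{43}$$

where the vector $\mathbf{a}^T = \frac{\partial \sigma}{\partial b}[\bar{b}_t]$ (note that this requires $O[n^2]$ collision-checks). By substituting Eq. (43) in Eq. (42), we get

$$f[\sigma[b]] \approx \tfrac{1}{2} (b - \bar{b}_t)^T (a\, \mathbf{a} \mathbf{a}^T) (b - \bar{b}_t) + (b - \bar{b}_t)^T (b\, \mathbf{a}) + f[\sigma[\bar{b}_t]]. \tag{44}$$

Hence, $a\, \mathbf{a} \mathbf{a}^T$ is an approximate Hessian of the obstacle term $f[\sigma[b]]$ of $c_t[b, u]$ that requires only $O[n^2]$ collision-checks to compute. In addition, since $a > 0$, this Hessian is guaranteed to be positive-semidefinite, as mandated by Eq. (11).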
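A small numerical sketch of Eqs. (42) to (44) follows. It assumes a 2-D state for the obstacle term (so $f[\sigma] = -\log(1 - e^{-\sigma^2/2})$) and uses a hypothetical clearance function `clearance` for a belief `[mean, sqrt_variance]` in front of a wall at position 3; in a real planner that function would be a collision-checker query. All derivatives are taken by central differences.

```python
import math


def obstacle_cost(sigma):
    """f[sigma] for a 2-D state: -log(1 - exp(-sigma^2 / 2))."""
    return -math.log1p(-math.exp(-sigma * sigma / 2.0))


def clearance(belief):
    """Hypothetical sigma[b]: standard deviations between the mean and a wall at 3."""
    mean, sqrt_var = belief
    return (3.0 - mean) / sqrt_var


def approx_obstacle_hessian(belief, sigma_fn, f, h=1e-3):
    """Return (a * a_vec a_vec^T, b * a_vec, a, b, a_vec) as in Eq. (44)."""
    s0 = sigma_fn(belief)
    a = (f(s0 + h) - 2.0 * f(s0) + f(s0 - h)) / (h * h)  # second derivative of f
    b = (f(s0 + h) - f(s0 - h)) / (2.0 * h)              # first derivative of f
    a_vec = []
    for i in range(len(belief)):  # two sigma evaluations per belief component
        up, down = list(belief), list(belief)
        up[i] += h
        down[i] -= h
        a_vec.append((sigma_fn(up) - sigma_fn(down)) / (2.0 * h))
    hessian = [[a * ai * aj for aj in a_vec] for ai in a_vec]
    gradient = [b * ai for ai in a_vec]
    return hessian, gradient, a, b, a_vec


hessian, gradient, a, b, a_vec = approx_obstacle_hessian([1.0, 0.5], clearance, obstacle_cost)
```

The clearance function is queried only twice per belief component, which matches the $O[n^2]$ collision-check count, and the rank-one structure makes the result positive-semidefinite whenever the scalar `a` is positive.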

6 Results

We evaluate our approach in simulation applied to robot motion planning scenarios involving stochastic dynamics, measurement models with state and control-dependent noise, and spatially-varying sensing capabilities. We consider three scenarios: (i) a 2-D point robot with linear dynamics, (ii) a non-holonomic, car-like robot with second-order dynamics, and (iii) an aircraft-like robot navigating in a 3-D environment.

Our method takes as input a collision-free trajectory to the goal. A naïve trajectory computed using an uncertainty-unaware planner might stray very close to the obstacles in the environment and accumulate considerable uncertainty during execution. We show that our method improves the input trajectory to compute a locally optimal trajectory and a corresponding control policy that safely guides the robot to the goal, even in the presence of large motion uncertainty and measurement noise.

In each of the following experiments, we use the following definitions of $c_l[b_l]$ and $c_t[b_t, u_t]$ in the cost function to be minimized (Eq. (5)):

$$c_l[b_l] = \hat{x}_l^T Q_l \hat{x}_l + \mathrm{tr}\big[ \sqrt{\Sigma_l}\, Q_l \sqrt{\Sigma_l} \big], \tag{45}$$

$$c_t[b_t, u_t] = u_t^T R_t u_t + \mathrm{tr}\big[ \sqrt{\Sigma_t}\, Q_t \sqrt{\Sigma_t} \big] + f[\sigma[b_t]], \tag{46}$$

for given $Q_t \geq 0$ and $R_t > 0$. The term $\hat{x}_l^T Q_l \hat{x}_l + \mathrm{tr}[\sqrt{\Sigma_l}\, Q_l \sqrt{\Sigma_l}] = \mathrm{E}[x_l^T Q_l x_l]$ encodes the final cost of arriving at the goal, $u_t^T R_t u_t$ penalizes the control effort along the trajectory, $\mathrm{tr}[\sqrt{\Sigma_t}\, Q_t \sqrt{\Sigma_t}]$ penalizes the uncertainty, and $f[\sigma[b_t]]$ encodes the obstacle cost term (if applicable). Using the approximation of Eq. (44) for $f[\sigma[b_t]]$, the above cost functions are in accordance with the assumptions of Eq. (11), and their Hessians can be constructed in $O[n^4]$ time, so this does not present a bottleneck for the running time.

All the performance results presented in this section are based on a C++ implementation running on a 3.33 GHz Intel® i7™ PC. For each scenario, we evaluate the performance of our approach and the quality of the computed control policy. We also separately consider environments with and without obstacles to demonstrate that our approach can handle both types of environments.
We compare and analyze the performance and convergence characteristics of the approach presented in this paper with those of our preliminary approach based on stochastic differential dynamic programming (sDDP) [28]. We also analyze the effect of assuming maximum-likelihood observations [21, 7, 8] on the computed locally optimal trajectory and corresponding control policy.

6.1 2-D Point Robot

We consider the case of a point robot moving in a 2-D environment with the following linear dynamics model with control-dependent motion noise:

$$\mathbf{x}_{t+1} = \mathbf{f}[\mathbf{x}_t, \mathbf{u}_t, \mathbf{m}_t] = \mathbf{x}_t + \tau \mathbf{u}_t + M[\mathbf{u}_t] \, \mathbf{m}_t, \tag{47}$$

where the state $\mathbf{x}_t = (x, y) \in \mathbb{R}^2$ is the robot's position, the control input $\mathbf{u}_t \in \mathbb{R}^2$ is the robot's velocity, $\tau$ is the duration of a time step, and the matrix $M[\mathbf{u}_t]$ scales the motion noise $\mathbf{m}_t$ proportional to the control input $\mathbf{u}_t$. The robot localizes itself using noisy measurements from sensors in the environment, the reliability of which varies as a function of the robot's position $\mathbf{x}_t$. The robot is able to obtain reliable measurements in the bright region of the environment, but the measurements become noisier as the robot moves into the dark regions. This gives the following linear observation model with spatially-varying noise:

$$\mathbf{z}_t = \mathbf{h}[\mathbf{x}_t, \mathbf{n}_t] = \mathbf{x}_t + N[\mathbf{x}_t] \, \mathbf{n}_t, \tag{48}$$

where the measurement vector $\mathbf{z}_t \in \mathbb{R}^2$ consists of noisy measurements of the robot's position and the matrix $N[\mathbf{x}_t]$ scales the measurement noise based on a function of the robot's position. We use state and control cost matrices of $Q_t = I$, $R_t = I$, and the final cost matrix $Q_l = 10 l I$ in our experiments, where $l$ is the number of sections in the initial trajectory. The method converges when the difference between the expected costs of successive iterations falls below a user-specified epsilon threshold.

Figure 2: Point robot moving in a 2-D light-dark domain without obstacles (adapted from Platt et al. [21]). (a) Initial trajectory: the method is initialized with a naïve straight-line trajectory to the goal. (b) Locally optimal solution: the nominal trajectory and associated beliefs of the solution (shown in black), and the trajectory obtained by applying the computed feedback policy to a robot with an initial belief that is considerably different from the initial belief used for computing the control policy (shown in red).

6.1.1 Light-Dark Domain (No Obstacles)

We consider the light-dark domain scenario suggested by Platt et al. [21]. The measurement noise (modeled by the matrix $N[\mathbf{x}_t]$) varies as a quadratic function of the robot's horizontal coordinate $x$ (as shown in Fig. 2). We initialize our method with a naïve straight-line trajectory from the initial position to the goal (Fig. 2(a)). Fig. 2(b) shows the nominal trajectory and the associated beliefs of the solution computed using our approach. The locally optimal nominal trajectory leads the robot to the horizontal coordinate where the measurement noise is minimum, in order to better localize itself, before proceeding to the goal. For this example, the initial nominal trajectory has an expected cost of 49.7, and the trajectory converges to a (local) optimum with an expected cost of 9.61 in 42 iterations, requiring a total computation time of seconds.

To evaluate the quality of the computed control policy, we also computed the actual expected costs across simulation runs that use the computed feedback policy to compensate for artificial motion and measurement noise.
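A simulation-based evaluation of this kind relies on sampling the noise models of Eqs. (47) and (48). The sketch below shows one way to do that under stated assumptions: the specific forms of M[u] and N[x], the constants `alpha`, `x_light`, and `floor`, and the time step are all hypothetical, since the text gives only their qualitative behavior. It also replaces the computed feedback policy with a fixed open-loop control sequence, so it illustrates the Monte Carlo averaging only, not the belief-space policy itself.

```python
import random

TAU = 1.0  # hypothetical time-step duration


def motion_noise_scale(u, alpha=0.1):
    """Assumed form of M[u]: isotropic, proportional to the control magnitude."""
    return alpha * (u[0] ** 2 + u[1] ** 2) ** 0.5


def measurement_noise_scale(x, x_light=5.0, floor=0.01):
    """Assumed form of N[x]: quadratic in the horizontal distance to x_light."""
    return 0.5 * (x[0] - x_light) ** 2 + floor


def step(x, u, rng):
    """One step of Eq. (47): x + tau * u + M[u] m, with m ~ N(0, I)."""
    s = motion_noise_scale(u)
    return (x[0] + TAU * u[0] + s * rng.gauss(0.0, 1.0),
            x[1] + TAU * u[1] + s * rng.gauss(0.0, 1.0))


def observe(x, rng):
    """Eq. (48): x + N[x] n, with n ~ N(0, I)."""
    s = measurement_noise_scale(x)
    return (x[0] + s * rng.gauss(0.0, 1.0), x[1] + s * rng.gauss(0.0, 1.0))


def mean_final_distance(x0, controls, goal, runs, seed=0):
    """Monte Carlo estimate of the mean distance to the goal after execution."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        x = x0
        for u in controls:
            x = step(x, u, rng)
        total += ((x[0] - goal[0]) ** 2 + (x[1] - goal[1]) ** 2) ** 0.5
    return total / runs


# Measurements are far less noisy at the light coordinate than in the dark.
print(measurement_noise_scale((5.0, 0.0)), measurement_noise_scale((0.0, 0.0)))
print(mean_final_distance((0.0, 0.0), [(1.0, 0.5)] * 4, (4.0, 2.0), runs=200))
```

Replacing the open-loop controls with a policy that maps the filtered belief to a control would turn this into the closed-loop evaluation described in the text.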
The actual expected cost for the computed control policy was 9.46 units.

To demonstrate the effectiveness of the control policy computed by our method, we apply the computed feedback policy to a robot with a belief that is considerably different from the belief with which our method is initialized. The resulting trajectory is indicated in red in Fig. 2(b). The computed policy initially leads the robot towards the light region and quickly rectifies the trajectory once a better estimate of the belief is obtained there. The basin of attraction of the control policy is wide enough to avoid the need for replanning.

6.1.2 Light-Dark Domain (With Obstacles)

We consider the light-dark domain scenario with obstacles as suggested in Bry and Roy [5]. In this scenario, the measurement noise (modeled by the matrix $N[\mathbf{x}_t]$) varies as a sigmoid function

of the robot's horizontal coordinate $x$ (as shown in Fig. 3). We initialize our method with a collision-free initial trajectory computed using an RRT planner [13] (Fig. 3(a)). Fig. 3(b) shows the nominal trajectory and the associated beliefs of the solution computed using our approach. The nominal trajectory leads the robot to the region of the environment with reliable sensing for better localization, before moving the robot through the narrow passage to arrive at the goal. For this example, the initial trajectory has an expected cost of and the trajectory converges to a local optimum with expected cost of in 66 iterations, which requires a total computation time of seconds. To evaluate the quality of the computed control policy, we also computed the actual expected costs across simulation runs that use the computed feedback policy to compensate for artificial motion and measurement noise. The actual expected cost was 13.8 units.

Figure 3: Point robot moving in a 2-D light-dark domain with obstacles. (a) Initial trajectory: an initial collision-free trajectory is computed using an RRT planner. (b) Locally optimal solution: nominal trajectory and the associated beliefs of the solution computed using our approach. The robot moves away from the goal to better localize itself before reaching the goal with significantly reduced uncertainty. (c) Execution traces of the robot's true state drawn from different initial beliefs while following the computed control policy.

To demonstrate the effectiveness of the control policy computed by our method, we apply the computed feedback policy to a robot with a belief that is considerably different from the belief with which our method is initialized. Fig. 3(c) shows traces of the true state of the robot $\mathbf{x}_t$ across 5 simulations where the initial state of the robot $\mathbf{x}_0$ is sampled from a different initial belief to evaluate the robustness of the control policy. Even if the initial belief is considerably different from the initial belief used to compute the solution, the control policy is able to safely guide the robot to the goal.
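Each execution trace of this kind starts from a true state drawn from a Gaussian initial belief. A minimal sketch of that sampling step is shown below; the mean and covariance are hypothetical (the text does not list the values used), and the helper names are illustrative. The sample is formed as the mean plus a Cholesky factor of the covariance applied to standard normal draws.

```python
import math
import random


def cholesky_2x2(S):
    """Lower-triangular L with L L^T = S for a symmetric positive-definite 2x2 S."""
    l11 = math.sqrt(S[0][0])
    l21 = S[1][0] / l11
    l22 = math.sqrt(S[1][1] - l21 ** 2)
    return [[l11, 0.0], [l21, l22]]


def sample_initial_state(mean, cov, rng):
    """Draw x0 ~ N(mean, cov) for a 2-D Gaussian belief."""
    L = cholesky_2x2(cov)
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (mean[0] + L[0][0] * e1,
            mean[1] + L[1][0] * e1 + L[1][1] * e2)


rng = random.Random(0)
# Hypothetical initial belief, for illustration only.
belief_mean = (2.0, 2.0)
belief_cov = [[4.0, 2.0], [2.0, 3.0]]
starts = [sample_initial_state(belief_mean, belief_cov, rng) for _ in range(5)]
for x0 in starts:
    print(x0)
```

Each sampled start would then be rolled forward under the control policy to produce one trace.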
We also evaluated our method quantitatively by computing the percentage of executions in which the robot was able to avoid obstacles across 1000 simulation executions for 10 random initial beliefs. In our experiments, in 93% (standard deviation: 3%) of the executions, the robot was able to safely traverse the narrow passage without colliding with obstacles.

Our solution also agrees with the solution found by Bry and Roy [5] for this experiment. Our method directly optimizes the trajectory rather than relying on an optimal sampling-based planner in belief space, resulting in computation times that are an order of magnitude faster. Our method also does not assume fixed control gains along each section of the nominal trajectory. However, the method of Bry and Roy is able to find a globally-optimal solution (given the fixed control gains), whereas our method computes a locally optimal solution given an initial trajectory.

6.2 Non-Holonomic Car-Like Robot

We consider the case of a non-holonomic car-like robot navigating in a 2-D environment with noisy and partial sensing of the robot's state. The state $\mathbf{x}_t = (x, y, \theta, v) \in \mathbb{R}^4$ of the robot consists of its position $(x, y)$, its orientation $\theta$, and its speed $v$. The control input vector $\mathbf{u}_t = (a, \phi)$

consists of an acceleration $a$ and the steering wheel angle $\phi$. This gives the following non-linear dynamics model:

$$\mathbf{x}_{t+1} = \mathbf{f}[\mathbf{x}_t, \mathbf{u}_t, \mathbf{m}_t] = \begin{bmatrix} x + \tau v \cos\theta \\ y + \tau v \sin\theta \\ \theta + \tau v \tan[\phi]/d \\ v + \tau a \end{bmatrix} + M[\mathbf{u}_t] \, \mathbf{m}_t, \tag{49}$$

where $\tau$ is the duration of a time step, $d$ is the length of the car-like robot, and $M[\mathbf{u}_t]$ scales the motion noise $\mathbf{m}_t$ proportional to the control input $\mathbf{u}_t$.

Figure 4: Car-like robot moving in a 2-D light-dark domain without obstacles (adapted from Platt et al. [21]). (a) Initial trajectory: the method is initialized with a naïve trajectory to the goal computed using an RRT planner. (b) Locally optimal solution: the nominal trajectory and associated beliefs of the solution computed using our approach (shown in black), and the trajectory obtained by applying the computed feedback policy to a robot with a belief that is considerably different from the belief used for method initialization (shown in red).

6.2.1 Light-Dark Domain (No Obstacles)

We again consider the light-dark domain scenario suggested by Platt et al. [21]. In this scenario, the robot's ability to sense its state is both partial (the robot is only capable of sensing its position but not its velocity or orientation) and noisy. The measurement noise (modeled by the matrix $N[\mathbf{x}_t]$) varies as a quadratic function of the robot's horizontal coordinate $x$ (as shown in Fig. 4). This gives the following observation model with spatially-varying noise:

$$\mathbf{z}_t = \mathbf{h}[\mathbf{x}_t, \mathbf{n}_t] = \begin{bmatrix} x \\ y \end{bmatrix} + N[\mathbf{x}_t] \, \mathbf{n}_t, \tag{50}$$

where the measurement vector $\mathbf{z}_t \in \mathbb{R}^2$ consists of noisy measurements of the robot's position, and the matrix $N[\mathbf{x}_t]$ scales the measurement noise based on a function of the robot's horizontal coordinate $x$.

We initialize our method with a naïve trajectory to the goal computed using an RRT planner [13] (Fig. 4(a)). We use state and control cost matrices of $Q_t = I$, $R_t = I$, and the final cost matrix $Q_l = 10 l I$ in our experiments, where $l$ is the number of sections in the initial trajectory. Fig. 4(b) shows the nominal trajectory and the associated beliefs of the solution computed using our approach.
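The car-like models of Eqs. (49) and (50) can be written down directly. The sketch below is an illustration, not the authors' implementation: the function names are hypothetical, and the `noise` arguments stand in for the already-scaled terms M[u] m and N[x] n, whose exact form the text does not specify.

```python
import math


def car_dynamics(state, control, tau, d, noise=(0.0, 0.0, 0.0, 0.0)):
    """Eq. (49): one time step of the car-like robot.

    state   -- (x, y, theta, v): position, orientation, speed
    control -- (a, phi): acceleration and steering wheel angle
    tau, d  -- time-step duration and car length
    noise   -- stand-in for the scaled motion noise M[u] m
    """
    x, y, theta, v = state
    a, phi = control
    nominal = (x + tau * v * math.cos(theta),
               y + tau * v * math.sin(theta),
               theta + tau * v * math.tan(phi) / d,
               v + tau * a)
    return tuple(s + w for s, w in zip(nominal, noise))


def position_observation(state, noise=(0.0, 0.0)):
    """Eq. (50): only the position is sensed; `noise` stands in for N[x] n."""
    return (state[0] + noise[0], state[1] + noise[1])


# Hypothetical values: driving straight along the x-axis while accelerating.
nxt = car_dynamics((0.0, 0.0, 0.0, 1.0), (0.5, 0.0), tau=0.1, d=1.0)
print(nxt)
print(position_observation(nxt))
```

The observation drops orientation and speed entirely, which is what makes the sensing partial: those components of the belief can only be corrected indirectly through the dynamics.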
The locally optimal nominal trajectory leads the robot to the horizontal coordinate where the measurement noise is minimum, in order to better localize itself, before proceeding to the goal. For this example, the initial trajectory has an expected cost of and the trajectory converges to a local optimum with an expected cost of 7.6 in 81 iterations, which requires a total computation time of 2.07 seconds.

We also apply the computed feedback policy to a robot with a belief that is considerably different from the belief with which our method is initialized. The resulting trajectory is shown

in red in Fig. 4(b). Since the belief is considerably different from the assumed belief used for method initialization, the control policy leads the robot to mimic the computed nominal trajectory, but once the robot has localized itself in the light region of the environment, the control policy reliably leads the robot to the goal.

6.2.2 Domain With Spatially Varying Sensing (With Obstacles)

We also consider a scenario in which the car-like robot estimates its location using signal measurements from two beacons $b_1$ and $b_2$ placed in the environment at locations $(x_1, y_1)$ and $(x_2, y_2)$, respectively. The strength of the signal decays quadratically with the distance to the beacon. The robot also measures its current speed using an on-board speedometer. The measurement uncertainty is scaled by a constant matrix $N$. This gives us the following non-linear observation model:

$$\mathbf{z}_t = \mathbf{h}[\mathbf{x}_t, \mathbf{n}_t] = \begin{bmatrix} 1/((x - x_1)^2 + (y - y_1)^2 + 1) \\ 1/((x - x_2)^2 + (y - y_2)^2 + 1) \\ v \end{bmatrix} + N \mathbf{n}_t, \tag{51}$$

where the vector $\mathbf{z}_t \in \mathbb{R}^3$ consists of two readings of signal strengths from the beacons and a speed measurement from the speedometer.

Figure 5: A car-like robot moving in a 2-D light-dark domain with obstacles. The robot obtains measurements from two beacons (marked by blue squares) and an on-board speedometer. (a) Initial trajectory: an initial collision-free trajectory is computed using an RRT planner. (b) Locally optimal solution: nominal trajectory and the associated beliefs of the solution computed using our approach. The robot localizes itself by moving closer to the beacon(s) before reaching the goal. The final nominal trajectory also follows the medial axis of the narrow passage to minimize the possibility of colliding with obstacles. (c) Solution for $Q_t = 10I$: nominal trajectory computed by varying the cost matrices. The robot tries to reduce the uncertainty in its state by visiting both the beacons. (d) Initial trajectory: a different initial trajectory results in a different locally optimal solution. (e) Locally optimal solution: our method is able to improve trajectories within a single homotopy class.

Fig. 5(a) visually illustrates the quadratic decay in the beacon signal strengths in the environment. The robot is able to obtain reliable measurements in the bright regions of the environment, but the measurements become relatively noisier as the robot moves into the dark regions due to the decreased signal-to-noise ratio.
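The noise-free part of the beacon observation model in Eq. (51) can be sketched as follows. The function names and the beacon coordinates are hypothetical and chosen for illustration; the `noise` argument stands in for the term N n. The signal equals 1 at the beacon itself and falls off with the squared distance, so under a constant noise scale the signal-to-noise ratio drops as the robot moves away.

```python
def beacon_signal(x, y, bx, by):
    """Signal strength 1 / (squared distance to the beacon + 1), as in Eq. (51)."""
    return 1.0 / ((x - bx) ** 2 + (y - by) ** 2 + 1.0)


def beacon_observation(state, beacon1, beacon2, noise=(0.0, 0.0, 0.0)):
    """Eq. (51): two beacon signal strengths and the speedometer reading.

    state            -- (x, y, theta, v) of the car-like robot
    beacon1, beacon2 -- beacon locations (x_1, y_1) and (x_2, y_2)
    noise            -- stand-in for the scaled measurement noise N n
    """
    x, y, _theta, v = state
    z = (beacon_signal(x, y, *beacon1), beacon_signal(x, y, *beacon2), v)
    return tuple(zi + ni for zi, ni in zip(z, noise))


# Hypothetical layout: robot on top of beacon 1, three units from beacon 2.
z = beacon_observation((0.0, 0.0, 0.0, 2.0), (0.0, 0.0), (3.0, 0.0))
print(z)
```

Note that orientation does not appear in the measurement at all, so it is observable only through its effect on subsequent positions.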

We initialize our method with a collision-free trajectory to the goal computed using an RRT planner [13] (Fig. 5(a)). We use state and control cost matrices of $Q_t = I$, $R_t = I$, and the final cost matrix $Q_l = 10 l I$ in our experiments. Fig. 5(b) shows the nominal trajectory and the associated beliefs of the solution computed using our approach. The nominal trajectory leads the robot to the region of the environment with reliable sensing for better localization, before moving the robot through the narrow passage to arrive at the goal. In contrast to the initial trajectory (Fig. 5(a)), the locally optimal trajectory also moves away from the obstacles and takes a safer path to the goal. For this example, the initial trajectory has an expected cost of and the trajectory converges to a local optimum with an expected cost of in 19 iterations, which requires a total computation time of 9.57 seconds.

The cost matrices $Q_t$ and $R_t$ determine the relative weighting between minimizing uncertainty in the robot state and minimizing control effort in the objective function. Fig. 5(c) shows the nominal trajectory of the solution computed by changing the cost matrix to $Q_t = 10I$. Notice that the trajectory visits both the beacons for better localization and minimizing uncertainty, at the expense of additional control effort. Figs. 5(d) and 5(e) show the nominal trajectory when a different initial trajectory is provided as input. The presence of obstacles in the environment forces our method to locally optimize trajectories within a single homotopy class.

6.3 3-D Aircraft

Figure 6: An aircraft-like robot with omni-directional acceleration moving in a 3-D environment with obstacles with partial and noisy sensing. The motion uncertainty is considerably lower at higher altitudes (indicated by the yellow region). (a) Initial trajectory: an initial collision-free trajectory is computed using an RRT planner. (b) Locally optimal solution: nominal trajectory and the associated beliefs of the solution computed using our approach. The nominal trajectory is locally optimized such that the robot spends a large proportion of the trajectory at higher altitudes to reduce uncertainty, before reaching the goal. (c) A different trajectory initialization results in local improvement within its initial homotopy class, resulting in a locally optimal nominal trajectory (d).

We consider the case of an aircraft-like robot with partial and noisy sensing maneuvering in a 3-D environment with obstacles. We consider a simplified model of an aircraft that has omni-directional acceleration. This model can be used to approximate the kinematic constraints on the aircraft as long as the aircraft is moving with non-zero speed [27]. The state $\mathbf{x}_t = (x, y, z, v_x, v_y, v_z) \in \mathbb{R}^6$ of the robot consists of its position $\mathbf{p}_t = (x, y, z)$ and its velocity $\mathbf{v}_t = (v_x, v_y, v_z)$. The control input vector $\mathbf{u}_t = (a_x, a_y, a_z)$ comprises the omni-directional acceleration applied to the robot. This gives the following dynamics model:

$$\mathbf{x}_{t+1} = \mathbf{f}[\mathbf{x}_t, \mathbf{u}_t, \mathbf{m}_t] = \begin{bmatrix} \mathbf{p}_t + \tau \mathbf{v}_t + \tfrac{1}{2} \tau^2 \mathbf{u}_t \\ \mathbf{v}_t + \tau \mathbf{u}_t \end{bmatrix} + M[\mathbf{p}_t] \, \mathbf{m}_t, \tag{52}$$

where $\tau$ is the duration of a time step, and $M[\mathbf{p}_t]$ scales the motion noise $\mathbf{m}_t$ proportional to the robot's position $\mathbf{p}_t$. We set motion uncertainty to be much lower at higher altitudes, approximately modeling the effect of atmospheric and weather conditions on the robot motion. The uncertainty steadily increases as the altitude of the robot decreases (Fig. 6). We also assume the following stochastic observation model based on partial and noisy sensing:

$$\mathbf{z}_t = \mathbf{h}[\mathbf{x}_t, \mathbf{n}_t] = \mathbf{p}_t + N \mathbf{n}_t, \tag{53}$$

where the measurement vector $\mathbf{z}_t \in \mathbb{R}^3$ consists of noisy measurements of the robot's position, and the measurement noise is scaled by a constant matrix $N$.

We initialize our method with a collision-free trajectory to the goal computed using an RRT planner [13] (Fig. 6(a)). Fig. 6(b) shows the nominal trajectory and the associated beliefs of the solution computed using our approach. The robot spends a considerable proportion of the nominal trajectory at higher altitudes in order to reduce the uncertainty, before arriving at the goal. In contrast to the initial trajectory (Fig. 6(a)), the locally optimal trajectory is also smoother in terms of the applied control inputs and stays away from the obstacles to take a safer path to the goal. For this example, the initial trajectory has an expected cost of and the trajectory converges to a local optimum with a considerably lower expected cost of in 47 iterations, which requires a total computation time of 41.8 seconds.

Figs. 6(c) and 6(d) show the nominal trajectory when a different initial trajectory is provided as input. The presence of obstacles in the environment forces our method to locally optimize trajectories within a single homotopy class. Our method is still able to locally force the robot to ascend to a higher altitude to reduce the uncertainty, before descending and going around the obstacle to arrive at the goal.

6.4 Comparison between iLQG and sDDP

We quantitatively compared our approach, which uses value iteration based on iLQG, with our preliminary approach, which uses value iteration based on stochastic differential dynamic programming (sDDP) [28].
In Table 1, we compare the number of iterations required for convergence and the optimal expected cost for each of the considered scenarios for both methods. Qualitatively, the iLQG-based method is asymptotically faster than the sDDP-based method ($O[n^6]$ rather than $O[n^7]$) and numerically more stable, even when the sDDP method is implemented with the square root of the variance in the belief (sDDP requires regularization of matrices to maintain positive-semidefiniteness of the value function). As expected, each iteration of the iLQG method ($O[n^6]$) takes less time than an equivalent sDDP iteration ($O[n^7]$). The differences are more pronounced as the dimensionality of the belief space increases, as is evident in the aircraft scenario. On the other hand, sDDP converges in fewer iterations than iLQG. This is because sDDP uses direct computation of the Hessians of the value function, while iLQG computes the Hessians based on a linearization of the belief dynamics (which truncates some second-order terms compared to sDDP).

In all experiments, iLQG and sDDP yield almost identical solutions, whose difference is hardly appreciable visually, and the optimal expected costs that iLQG and sDDP converge to are almost identical. To evaluate the difference between the two methods, we also compute the actual expected costs across simulation runs that use the computed feedback policy to compensate for artificially simulated motion uncertainty and measurement noise. The differences in the actual expected costs are minimal, which suggests that the control policies computed by the two methods are similar. This is what one would expect; the slight differences that do appear are a result of numerical variations between the methods, and in a few cases this causes the approaches to converge to different local optima. Overall, our experiments indicate that iLQG is preferable over sDDP because it scales better to higher dimensional problems and is numerically more stable since the iLQG method

does not require regularization to ensure that the Hessians are positive semi-definite [28]. The inherent complexity of the method is still too high for robots with complex dynamics and high-dimensional state spaces, and algorithmic improvements in the method and efficient implementations thereof present interesting research directions.

Table 1: Comparison of iLQG and sDDP.

Scenario | Initial exp. cost | Method | Num. iter. | Time (s) | Time per iter. (s) | Optimal exp. cost | Actual exp. cost
Point (no obs) | | iLQG | | | | |
Point (no obs) | | sDDP | | | | |
Point (obs) | | iLQG | | | | |
Point (obs) | | sDDP | | | | |
Car (no obs) | | iLQG | | | | |
Car (no obs) | | sDDP | | | | |
Car (obs) | | iLQG | | | | |
Car (obs) | | sDDP | | | | |
Aircraft | | iLQG | | | | |
Aircraft | | sDDP | | | | |

6.5 Effect Of Assuming Maximum-Likelihood Observations

We analyze the effect of assuming maximum-likelihood observations, as made in prior work [21, 7, 8], on the computed locally optimal trajectory and corresponding control policy. We reproduce this assumption in our method by ignoring all the terms in the value iteration that pertain to the matrix $W[\mathbf{b}, \mathbf{u}]$, which determines the stochastic nature of the belief dynamics given by Eq. (17). More specifically, we can reproduce the assumption by removing the terms containing the sum-quantifiers in Eqs. (30), (31), and (32). This has the net result of considering deterministic belief dynamics, as is the case when maximum-likelihood observations are assumed.

We consider an illustrative example that considers a point robot moving in a 2-D domain with obstacles, as shown in Fig. 7(a). We consider the same stochastic dynamics model for the robot as in the 2-D point robot scenario above. We also consider the light-dark domain scenario suggested by Platt et al. [21], where the measurement noise varies as a quadratic function of the robot's horizontal coordinate $x$ (as shown in Fig. 7(a)). We use state and control cost matrices of $Q_t = I$, $R_t = 3I$, and the final cost matrix $Q_l = 10I$ in our experiments.

We computed 100 random trajectories using an RRT planner [13] and used the trajectories to initialize our method with and without assuming maximum-likelihood observations. In the case of maximum-likelihood observations, the mean initial cost is units with a standard deviation of 35 units.
The mean final cost at convergence is 17.7 units with a standard deviation of 1.5 units. It is important to note that the final cost is based on deterministic belief dynamics and is exactly known. We also computed the final expected cost of the computed control policy using value iteration assuming stochastic belief dynamics, as outlined earlier. The mean expected cost of the policy at convergence is 23.1 units with a standard deviation of 2 units. This indicates that there is a mismatch between the final cost assuming deterministic belief dynamics and the actual expected cost of the computed policies when executed under motion and sensing uncertainty.

We also ran our method on the same 100 trajectories without assuming maximum-likelihood observations. The mean initial cost is 35,371 units with a standard deviation of 41,522 units, while the mean expected cost at convergence is 21.3 units with a standard deviation of 1.9 units. For this scenario, our method, which does not assume maximum-likelihood observations, yielded an average expected cost 8.5% better than the method making the maximum-likelihood

assumption.

To demonstrate the effectiveness of the control policy computed with and without assuming maximum-likelihood observations, we evaluated each control policy quantitatively by computing the percentage of executions in which the robot was able to avoid obstacles across simulation executions assuming artificial motion and measurement noise. In our experiments, the control policies computed assuming maximum-likelihood observations result in an average of 324 collisions (standard deviation: 87), while the control policies computed by our method result in an average of 252 collisions (standard deviation: 72). This demonstrates that not assuming maximum-likelihood observations reduces the number of collisions by approximately 25% for the considered scenario.

Figure 7: An illustrative example that considers a point robot moving in a 2-D domain with obstacles. (a) Initial trajectory: an initial collision-free trajectory is computed using an RRT planner. (b) With maximum-likelihood assumption: nominal trajectory and the associated beliefs of the solution computed using our method under the assumption of maximum-likelihood observations. The optimization results in a nominal trajectory that does not lead the robot all the way to the horizontal coordinate where the measurement noise is minimum. (c) Without maximum-likelihood assumption: solution computed without making the maximum-likelihood observation assumption. The optimization is able to find a different locally optimal trajectory and policy that allows the robot to localize itself with certainty before arriving at the goal region with reduced uncertainty.

We visualize the difference in the two cases in Figs. 7(b) and 7(c) using an illustrative example from the 100 random scenarios considered in our experiments. As shown in Fig. 7(b), the nominal trajectory for the case in which we assume maximum-likelihood observations does not lead the robot all the way to the horizontal coordinate where the measurement noise is minimum.
This results in a higher expected cost of 24.4 units at convergence and higher uncertainty in the state of the robot as the robot traverses the narrow passage. In contrast, the solution computed without making the maximum-likelihood observation assumption is able to find a different locally optimal trajectory and policy that allows the robot to localize itself with greater certainty before arriving at the goal region with reduced uncertainty (see Fig. 7(c)). The expected cost at convergence in this case is 16.9 units.

We note that a lower expected cost is not guaranteed: among the 100 random initial trajectories there are also cases in which the solution computed with the maximum-likelihood assumption has a better expected cost than the solution computed without the assumption. As in the scenario of the figure, this is very likely the result of the two methods converging to different local optima. Overall, our results indicate that not making the maximum-likelihood assumption gives, on average, better control policies. However, depending on the application, the impact of the assumption may be relatively limited. This raises the question of whether the assumption can be formally justified and its negative impact bounded. In the case of our method, making the


More information

KINEMATICS IN ONE DIMENSION

KINEMATICS IN ONE DIMENSION KINEMATICS IN ONE DIMENSION PREVIEW Kinemaics is he sudy of how hings move how far (disance and displacemen), how fas (speed and velociy), and how fas ha how fas changes (acceleraion). We say ha an objec

More information

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter Sae-Space Models Iniializaion, Esimaion and Smoohing of he Kalman Filer Iniializaion of he Kalman Filer The Kalman filer shows how o updae pas predicors and he corresponding predicion error variances when

More information

Lecture 9: September 25

Lecture 9: September 25 0-725: Opimizaion Fall 202 Lecure 9: Sepember 25 Lecurer: Geoff Gordon/Ryan Tibshirani Scribes: Xuezhi Wang, Subhodeep Moira, Abhimanu Kumar Noe: LaTeX emplae couresy of UC Berkeley EECS dep. Disclaimer:

More information

Overview. COMP14112: Artificial Intelligence Fundamentals. Lecture 0 Very Brief Overview. Structure of this course

Overview. COMP14112: Artificial Intelligence Fundamentals. Lecture 0 Very Brief Overview. Structure of this course OMP: Arificial Inelligence Fundamenals Lecure 0 Very Brief Overview Lecurer: Email: Xiao-Jun Zeng x.zeng@mancheser.ac.uk Overview This course will focus mainly on probabilisic mehods in AI We shall presen

More information

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model Modal idenificaion of srucures from roving inpu daa by means of maximum likelihood esimaion of he sae space model J. Cara, J. Juan, E. Alarcón Absrac The usual way o perform a forced vibraion es is o fix

More information

Introduction to Mobile Robotics

Introduction to Mobile Robotics Inroducion o Mobile Roboics Bayes Filer Kalman Filer Wolfram Burgard Cyrill Sachniss Giorgio Grisei Maren Bennewiz Chrisian Plagemann Bayes Filer Reminder Predicion bel p u bel d Correcion bel η p z bel

More information

Planning in POMDPs. Dominik Schoenberger Abstract

Planning in POMDPs. Dominik Schoenberger Abstract Planning in POMDPs Dominik Schoenberger d.schoenberger@sud.u-darmsad.de Absrac This documen briefly explains wha a Parially Observable Markov Decision Process is. Furhermore i inroduces he differen approaches

More information

Recursive Least-Squares Fixed-Interval Smoother Using Covariance Information based on Innovation Approach in Linear Continuous Stochastic Systems

Recursive Least-Squares Fixed-Interval Smoother Using Covariance Information based on Innovation Approach in Linear Continuous Stochastic Systems 8 Froniers in Signal Processing, Vol. 1, No. 1, July 217 hps://dx.doi.org/1.2266/fsp.217.112 Recursive Leas-Squares Fixed-Inerval Smooher Using Covariance Informaion based on Innovaion Approach in Linear

More information

12: AUTOREGRESSIVE AND MOVING AVERAGE PROCESSES IN DISCRETE TIME. Σ j =

12: AUTOREGRESSIVE AND MOVING AVERAGE PROCESSES IN DISCRETE TIME. Σ j = 1: AUTOREGRESSIVE AND MOVING AVERAGE PROCESSES IN DISCRETE TIME Moving Averages Recall ha a whie noise process is a series { } = having variance σ. The whie noise process has specral densiy f (λ) = of

More information

Tom Heskes and Onno Zoeter. Presented by Mark Buller

Tom Heskes and Onno Zoeter. Presented by Mark Buller Tom Heskes and Onno Zoeer Presened by Mark Buller Dynamic Bayesian Neworks Direced graphical models of sochasic processes Represen hidden and observed variables wih differen dependencies Generalize Hidden

More information

WEEK-3 Recitation PHYS 131. of the projectile s velocity remains constant throughout the motion, since the acceleration a x

WEEK-3 Recitation PHYS 131. of the projectile s velocity remains constant throughout the motion, since the acceleration a x WEEK-3 Reciaion PHYS 131 Ch. 3: FOC 1, 3, 4, 6, 14. Problems 9, 37, 41 & 71 and Ch. 4: FOC 1, 3, 5, 8. Problems 3, 5 & 16. Feb 8, 018 Ch. 3: FOC 1, 3, 4, 6, 14. 1. (a) The horizonal componen of he projecile

More information

CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK

CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK 175 CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK 10.1 INTRODUCTION Amongs he research work performed, he bes resuls of experimenal work are validaed wih Arificial Neural Nework. From he

More information

Multi-scale 2D acoustic full waveform inversion with high frequency impulsive source

Multi-scale 2D acoustic full waveform inversion with high frequency impulsive source Muli-scale D acousic full waveform inversion wih high frequency impulsive source Vladimir N Zubov*, Universiy of Calgary, Calgary AB vzubov@ucalgaryca and Michael P Lamoureux, Universiy of Calgary, Calgary

More information

2. Nonlinear Conservation Law Equations

2. Nonlinear Conservation Law Equations . Nonlinear Conservaion Law Equaions One of he clear lessons learned over recen years in sudying nonlinear parial differenial equaions is ha i is generally no wise o ry o aack a general class of nonlinear

More information

20. Applications of the Genetic-Drift Model

20. Applications of the Genetic-Drift Model 0. Applicaions of he Geneic-Drif Model 1) Deermining he probabiliy of forming any paricular combinaion of genoypes in he nex generaion: Example: If he parenal allele frequencies are p 0 = 0.35 and q 0

More information

2.160 System Identification, Estimation, and Learning. Lecture Notes No. 8. March 6, 2006

2.160 System Identification, Estimation, and Learning. Lecture Notes No. 8. March 6, 2006 2.160 Sysem Idenificaion, Esimaion, and Learning Lecure Noes No. 8 March 6, 2006 4.9 Eended Kalman Filer In many pracical problems, he process dynamics are nonlinear. w Process Dynamics v y u Model (Linearized)

More information

23.2. Representing Periodic Functions by Fourier Series. Introduction. Prerequisites. Learning Outcomes

23.2. Representing Periodic Functions by Fourier Series. Introduction. Prerequisites. Learning Outcomes Represening Periodic Funcions by Fourier Series 3. Inroducion In his Secion we show how a periodic funcion can be expressed as a series of sines and cosines. We begin by obaining some sandard inegrals

More information

Robot Motion Model EKF based Localization EKF SLAM Graph SLAM

Robot Motion Model EKF based Localization EKF SLAM Graph SLAM Robo Moion Model EKF based Localizaion EKF SLAM Graph SLAM General Robo Moion Model Robo sae v r Conrol a ime Sae updae model Noise model of robo conrol Noise model of conrol Robo moion model

More information

Ground Rules. PC1221 Fundamentals of Physics I. Kinematics. Position. Lectures 3 and 4 Motion in One Dimension. A/Prof Tay Seng Chuan

Ground Rules. PC1221 Fundamentals of Physics I. Kinematics. Position. Lectures 3 and 4 Motion in One Dimension. A/Prof Tay Seng Chuan Ground Rules PC11 Fundamenals of Physics I Lecures 3 and 4 Moion in One Dimension A/Prof Tay Seng Chuan 1 Swich off your handphone and pager Swich off your lapop compuer and keep i No alking while lecure

More information

Chapter 2. First Order Scalar Equations

Chapter 2. First Order Scalar Equations Chaper. Firs Order Scalar Equaions We sar our sudy of differenial equaions in he same way he pioneers in his field did. We show paricular echniques o solve paricular ypes of firs order differenial equaions.

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Noes for EE7C Spring 018: Convex Opimizaion and Approximaion Insrucor: Moriz Hard Email: hard+ee7c@berkeley.edu Graduae Insrucor: Max Simchowiz Email: msimchow+ee7c@berkeley.edu Ocober 15, 018 3

More information

Lecture 20: Riccati Equations and Least Squares Feedback Control

Lecture 20: Riccati Equations and Least Squares Feedback Control 34-5 LINEAR SYSTEMS Lecure : Riccai Equaions and Leas Squares Feedback Conrol 5.6.4 Sae Feedback via Riccai Equaions A recursive approach in generaing he marix-valued funcion W ( ) equaion for i for he

More information

m = 41 members n = 27 (nonfounders), f = 14 (founders) 8 markers from chromosome 19

m = 41 members n = 27 (nonfounders), f = 14 (founders) 8 markers from chromosome 19 Sequenial Imporance Sampling (SIS) AKA Paricle Filering, Sequenial Impuaion (Kong, Liu, Wong, 994) For many problems, sampling direcly from he arge disribuion is difficul or impossible. One reason possible

More information

Online Appendix to Solution Methods for Models with Rare Disasters

Online Appendix to Solution Methods for Models with Rare Disasters Online Appendix o Soluion Mehods for Models wih Rare Disasers Jesús Fernández-Villaverde and Oren Levinal In his Online Appendix, we presen he Euler condiions of he model, we develop he pricing Calvo block,

More information

Speaker Adaptation Techniques For Continuous Speech Using Medium and Small Adaptation Data Sets. Constantinos Boulis

Speaker Adaptation Techniques For Continuous Speech Using Medium and Small Adaptation Data Sets. Constantinos Boulis Speaker Adapaion Techniques For Coninuous Speech Using Medium and Small Adapaion Daa Ses Consaninos Boulis Ouline of he Presenaion Inroducion o he speaker adapaion problem Maximum Likelihood Sochasic Transformaions

More information

LAPLACE TRANSFORM AND TRANSFER FUNCTION

LAPLACE TRANSFORM AND TRANSFER FUNCTION CHBE320 LECTURE V LAPLACE TRANSFORM AND TRANSFER FUNCTION Professor Dae Ryook Yang Spring 2018 Dep. of Chemical and Biological Engineering 5-1 Road Map of he Lecure V Laplace Transform and Transfer funcions

More information

Technical Report Doc ID: TR March-2013 (Last revision: 23-February-2016) On formulating quadratic functions in optimization models.

Technical Report Doc ID: TR March-2013 (Last revision: 23-February-2016) On formulating quadratic functions in optimization models. Technical Repor Doc ID: TR--203 06-March-203 (Las revision: 23-Februar-206) On formulaing quadraic funcions in opimizaion models. Auhor: Erling D. Andersen Convex quadraic consrains quie frequenl appear

More information

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature On Measuring Pro-Poor Growh 1. On Various Ways of Measuring Pro-Poor Growh: A Shor eview of he Lieraure During he pas en years or so here have been various suggesions concerning he way one should check

More information

Georey E. Hinton. University oftoronto. Technical Report CRG-TR February 22, Abstract

Georey E. Hinton. University oftoronto.   Technical Report CRG-TR February 22, Abstract Parameer Esimaion for Linear Dynamical Sysems Zoubin Ghahramani Georey E. Hinon Deparmen of Compuer Science Universiy oftorono 6 King's College Road Torono, Canada M5S A4 Email: zoubin@cs.orono.edu Technical

More information

10. State Space Methods

10. State Space Methods . Sae Space Mehods. Inroducion Sae space modelling was briefly inroduced in chaper. Here more coverage is provided of sae space mehods before some of heir uses in conrol sysem design are covered in he

More information

2016 Possible Examination Questions. Robotics CSCE 574

2016 Possible Examination Questions. Robotics CSCE 574 206 Possible Examinaion Quesions Roboics CSCE 574 ) Wha are he differences beween Hydraulic drive and Shape Memory Alloy drive? Name one applicaion in which each one of hem is appropriae. 2) Wha are he

More information

MATH 5720: Gradient Methods Hung Phan, UMass Lowell October 4, 2018

MATH 5720: Gradient Methods Hung Phan, UMass Lowell October 4, 2018 MATH 5720: Gradien Mehods Hung Phan, UMass Lowell Ocober 4, 208 Descen Direcion Mehods Consider he problem min { f(x) x R n}. The general descen direcions mehod is x k+ = x k + k d k where x k is he curren

More information

Effects of Coordinate Curvature on Integration

Effects of Coordinate Curvature on Integration Effecs of Coordinae Curvaure on Inegraion Chrisopher A. Lafore clafore@gmail.com Absrac In his paper, he inegraion of a funcion over a curved manifold is examined in he case where he curvaure of he manifold

More information

Comparing Means: t-tests for One Sample & Two Related Samples

Comparing Means: t-tests for One Sample & Two Related Samples Comparing Means: -Tess for One Sample & Two Relaed Samples Using he z-tes: Assumpions -Tess for One Sample & Two Relaed Samples The z-es (of a sample mean agains a populaion mean) is based on he assumpion

More information

Integration Over Manifolds with Variable Coordinate Density

Integration Over Manifolds with Variable Coordinate Density Inegraion Over Manifolds wih Variable Coordinae Densiy Absrac Chrisopher A. Lafore clafore@gmail.com In his paper, he inegraion of a funcion over a curved manifold is examined in he case where he curvaure

More information

Article from. Predictive Analytics and Futurism. July 2016 Issue 13

Article from. Predictive Analytics and Futurism. July 2016 Issue 13 Aricle from Predicive Analyics and Fuurism July 6 Issue An Inroducion o Incremenal Learning By Qiang Wu and Dave Snell Machine learning provides useful ools for predicive analyics The ypical machine learning

More information

ACE 562 Fall Lecture 5: The Simple Linear Regression Model: Sampling Properties of the Least Squares Estimators. by Professor Scott H.

ACE 562 Fall Lecture 5: The Simple Linear Regression Model: Sampling Properties of the Least Squares Estimators. by Professor Scott H. ACE 56 Fall 005 Lecure 5: he Simple Linear Regression Model: Sampling Properies of he Leas Squares Esimaors by Professor Sco H. Irwin Required Reading: Griffihs, Hill and Judge. "Inference in he Simple

More information

Two Coupled Oscillators / Normal Modes

Two Coupled Oscillators / Normal Modes Lecure 3 Phys 3750 Two Coupled Oscillaors / Normal Modes Overview and Moivaion: Today we ake a small, bu significan, sep owards wave moion. We will no ye observe waves, bu his sep is imporan in is own

More information

R t. C t P t. + u t. C t = αp t + βr t + v t. + β + w t

R t. C t P t. + u t. C t = αp t + βr t + v t. + β + w t Exercise 7 C P = α + β R P + u C = αp + βr + v (a) (b) C R = α P R + β + w (c) Assumpions abou he disurbances u, v, w : Classical assumions on he disurbance of one of he equaions, eg. on (b): E(v v s P,

More information

Book Corrections for Optimal Estimation of Dynamic Systems, 2 nd Edition

Book Corrections for Optimal Estimation of Dynamic Systems, 2 nd Edition Boo Correcions for Opimal Esimaion of Dynamic Sysems, nd Ediion John L. Crassidis and John L. Junins November 17, 017 Chaper 1 This documen provides correcions for he boo: Crassidis, J.L., and Junins,

More information

Trajectory planning in Cartesian space

Trajectory planning in Cartesian space Roboics 1 Trajecory planning in Caresian space Prof. Alessandro De Luca Roboics 1 1 Trajecories in Caresian space in general, he rajecory planning mehods proposed in he join space can be applied also in

More information

Christos Papadimitriou & Luca Trevisan November 22, 2016

Christos Papadimitriou & Luca Trevisan November 22, 2016 U.C. Bereley CS170: Algorihms Handou LN-11-22 Chrisos Papadimiriou & Luca Trevisan November 22, 2016 Sreaming algorihms In his lecure and he nex one we sudy memory-efficien algorihms ha process a sream

More information

Math Week 14 April 16-20: sections first order systems of linear differential equations; 7.4 mass-spring systems.

Math Week 14 April 16-20: sections first order systems of linear differential equations; 7.4 mass-spring systems. Mah 2250-004 Week 4 April 6-20 secions 7.-7.3 firs order sysems of linear differenial equaions; 7.4 mass-spring sysems. Mon Apr 6 7.-7.2 Sysems of differenial equaions (7.), and he vecor Calculus we need

More information

( ) ( ) if t = t. It must satisfy the identity. So, bulkiness of the unit impulse (hyper)function is equal to 1. The defining characteristic is

( ) ( ) if t = t. It must satisfy the identity. So, bulkiness of the unit impulse (hyper)function is equal to 1. The defining characteristic is UNIT IMPULSE RESPONSE, UNIT STEP RESPONSE, STABILITY. Uni impulse funcion (Dirac dela funcion, dela funcion) rigorously defined is no sricly a funcion, bu disribuion (or measure), precise reamen requires

More information

Exponential Weighted Moving Average (EWMA) Chart Under The Assumption of Moderateness And Its 3 Control Limits

Exponential Weighted Moving Average (EWMA) Chart Under The Assumption of Moderateness And Its 3 Control Limits DOI: 0.545/mjis.07.5009 Exponenial Weighed Moving Average (EWMA) Char Under The Assumpion of Moderaeness And Is 3 Conrol Limis KALPESH S TAILOR Assisan Professor, Deparmen of Saisics, M. K. Bhavnagar Universiy,

More information

Robust estimation based on the first- and third-moment restrictions of the power transformation model

Robust estimation based on the first- and third-moment restrictions of the power transformation model h Inernaional Congress on Modelling and Simulaion, Adelaide, Ausralia, 6 December 3 www.mssanz.org.au/modsim3 Robus esimaion based on he firs- and hird-momen resricions of he power ransformaion Nawaa,

More information

Class Meeting # 10: Introduction to the Wave Equation

Class Meeting # 10: Introduction to the Wave Equation MATH 8.5 COURSE NOTES - CLASS MEETING # 0 8.5 Inroducion o PDEs, Fall 0 Professor: Jared Speck Class Meeing # 0: Inroducion o he Wave Equaion. Wha is he wave equaion? The sandard wave equaion for a funcion

More information

A Hop Constrained Min-Sum Arborescence with Outage Costs

A Hop Constrained Min-Sum Arborescence with Outage Costs A Hop Consrained Min-Sum Arborescence wih Ouage Coss Rakesh Kawara Minnesoa Sae Universiy, Mankao, MN 56001 Email: Kawara@mnsu.edu Absrac The hop consrained min-sum arborescence wih ouage coss problem

More information

IB Physics Kinematics Worksheet

IB Physics Kinematics Worksheet IB Physics Kinemaics Workshee Wrie full soluions and noes for muliple choice answers. Do no use a calculaor for muliple choice answers. 1. Which of he following is a correc definiion of average acceleraion?

More information

Module 2 F c i k c s la l w a s o s f dif di fusi s o i n

Module 2 F c i k c s la l w a s o s f dif di fusi s o i n Module Fick s laws of diffusion Fick s laws of diffusion and hin film soluion Adolf Fick (1855) proposed: d J α d d d J (mole/m s) flu (m /s) diffusion coefficien and (mole/m 3 ) concenraion of ions, aoms

More information

d 1 = c 1 b 2 - b 1 c 2 d 2 = c 1 b 3 - b 1 c 3

d 1 = c 1 b 2 - b 1 c 2 d 2 = c 1 b 3 - b 1 c 3 and d = c b - b c c d = c b - b c c This process is coninued unil he nh row has been compleed. The complee array of coefficiens is riangular. Noe ha in developing he array an enire row may be divided or

More information

Augmented Reality II - Kalman Filters - Gudrun Klinker May 25, 2004

Augmented Reality II - Kalman Filters - Gudrun Klinker May 25, 2004 Augmened Realiy II Kalman Filers Gudrun Klinker May 25, 2004 Ouline Moivaion Discree Kalman Filer Modeled Process Compuing Model Parameers Algorihm Exended Kalman Filer Kalman Filer for Sensor Fusion Lieraure

More information

PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD

PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD HAN XIAO 1. Penalized Leas Squares Lasso solves he following opimizaion problem, ˆβ lasso = arg max β R p+1 1 N y i β 0 N x ij β j β j (1.1) for some 0.

More information

t is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t...

t is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t... Mah 228- Fri Mar 24 5.6 Marix exponenials and linear sysems: The analogy beween firs order sysems of linear differenial equaions (Chaper 5) and scalar linear differenial equaions (Chaper ) is much sronger

More information

SZG Macro 2011 Lecture 3: Dynamic Programming. SZG macro 2011 lecture 3 1

SZG Macro 2011 Lecture 3: Dynamic Programming. SZG macro 2011 lecture 3 1 SZG Macro 2011 Lecure 3: Dynamic Programming SZG macro 2011 lecure 3 1 Background Our previous discussion of opimal consumpion over ime and of opimal capial accumulaion sugges sudying he general decision

More information

Traveling Waves. Chapter Introduction

Traveling Waves. Chapter Introduction Chaper 4 Traveling Waves 4.1 Inroducion To dae, we have considered oscillaions, i.e., periodic, ofen harmonic, variaions of a physical characerisic of a sysem. The sysem a one ime is indisinguishable from

More information

Testing for a Single Factor Model in the Multivariate State Space Framework

Testing for a Single Factor Model in the Multivariate State Space Framework esing for a Single Facor Model in he Mulivariae Sae Space Framework Chen C.-Y. M. Chiba and M. Kobayashi Inernaional Graduae School of Social Sciences Yokohama Naional Universiy Japan Faculy of Economics

More information

23.5. Half-Range Series. Introduction. Prerequisites. Learning Outcomes

23.5. Half-Range Series. Introduction. Prerequisites. Learning Outcomes Half-Range Series 2.5 Inroducion In his Secion we address he following problem: Can we find a Fourier series expansion of a funcion defined over a finie inerval? Of course we recognise ha such a funcion

More information

Numerical Dispersion

Numerical Dispersion eview of Linear Numerical Sabiliy Numerical Dispersion n he previous lecure, we considered he linear numerical sabiliy of boh advecion and diffusion erms when approimaed wih several spaial and emporal

More information

Linear Response Theory: The connection between QFT and experiments

Linear Response Theory: The connection between QFT and experiments Phys540.nb 39 3 Linear Response Theory: The connecion beween QFT and experimens 3.1. Basic conceps and ideas Q: How do we measure he conduciviy of a meal? A: we firs inroduce a weak elecric field E, and

More information

Unsteady Flow Problems

Unsteady Flow Problems School of Mechanical Aerospace and Civil Engineering Unseady Flow Problems T. J. Craf George Begg Building, C41 TPFE MSc CFD-1 Reading: J. Ferziger, M. Peric, Compuaional Mehods for Fluid Dynamics H.K.

More information

Solution: b All the terms must have the dimension of acceleration. We see that, indeed, each term has the units of acceleration

Solution: b All the terms must have the dimension of acceleration. We see that, indeed, each term has the units of acceleration PHYS 54 Tes Pracice Soluions Spring 8 Q: [4] Knowing ha in he ne epression a is acceleraion, v is speed, is posiion and is ime, from a dimensional v poin of view, he equaion a is a) incorrec b) correc

More information

Anno accademico 2006/2007. Davide Migliore

Anno accademico 2006/2007. Davide Migliore Roboica Anno accademico 2006/2007 Davide Migliore migliore@ele.polimi.i Today Eercise session: An Off-side roblem Robo Vision Task Measuring NBA layers erformance robabilisic Roboics Inroducion The Bayesian

More information

Math 333 Problem Set #2 Solution 14 February 2003

Math 333 Problem Set #2 Solution 14 February 2003 Mah 333 Problem Se #2 Soluion 14 February 2003 A1. Solve he iniial value problem dy dx = x2 + e 3x ; 2y 4 y(0) = 1. Soluion: This is separable; we wrie 2y 4 dy = x 2 + e x dx and inegrae o ge The iniial

More information

Lecture 4 Kinetics of a particle Part 3: Impulse and Momentum

Lecture 4 Kinetics of a particle Part 3: Impulse and Momentum MEE Engineering Mechanics II Lecure 4 Lecure 4 Kineics of a paricle Par 3: Impulse and Momenum Linear impulse and momenum Saring from he equaion of moion for a paricle of mass m which is subjeced o an

More information

Basilio Bona ROBOTICA 03CFIOR 1

Basilio Bona ROBOTICA 03CFIOR 1 Indusrial Robos Kinemaics 1 Kinemaics and kinemaic funcions Kinemaics deals wih he sudy of four funcions (called kinemaic funcions or KFs) ha mahemaically ransform join variables ino caresian variables

More information

Rapid Termination Evaluation for Recursive Subdivision of Bezier Curves

Rapid Termination Evaluation for Recursive Subdivision of Bezier Curves Rapid Terminaion Evaluaion for Recursive Subdivision of Bezier Curves Thomas F. Hain School of Compuer and Informaion Sciences, Universiy of Souh Alabama, Mobile, AL, U.S.A. Absrac Bézier curve flaening

More information

Unit Root Time Series. Univariate random walk

Unit Root Time Series. Univariate random walk Uni Roo ime Series Univariae random walk Consider he regression y y where ~ iid N 0, he leas squares esimae of is: ˆ yy y y yy Now wha if = If y y hen le y 0 =0 so ha y j j If ~ iid N 0, hen y ~ N 0, he

More information

Presentation Overview

Presentation Overview Acion Refinemen in Reinforcemen Learning by Probabiliy Smoohing By Thomas G. Dieerich & Didac Busques Speaer: Kai Xu Presenaion Overview Bacground The Probabiliy Smoohing Mehod Experimenal Sudy of Acion

More information

EXERCISES FOR SECTION 1.5

EXERCISES FOR SECTION 1.5 1.5 Exisence and Uniqueness of Soluions 43 20. 1 v c 21. 1 v c 1 2 4 6 8 10 1 2 2 4 6 8 10 Graph of approximae soluion obained using Euler s mehod wih = 0.1. Graph of approximae soluion obained using Euler

More information

Probabilistic Robotics SLAM

Probabilistic Robotics SLAM Probabilisic Roboics SLAM The SLAM Problem SLAM is he process by which a robo builds a map of he environmen and, a he same ime, uses his map o compue is locaion Localizaion: inferring locaion given a map

More information

Robotics I. April 11, The kinematics of a 3R spatial robot is specified by the Denavit-Hartenberg parameters in Tab. 1.

Robotics I. April 11, The kinematics of a 3R spatial robot is specified by the Denavit-Hartenberg parameters in Tab. 1. Roboics I April 11, 017 Exercise 1 he kinemaics of a 3R spaial robo is specified by he Denavi-Harenberg parameers in ab 1 i α i d i a i θ i 1 π/ L 1 0 1 0 0 L 3 0 0 L 3 3 able 1: able of DH parameers of

More information