arxiv: v3 [cs.sy] 28 Oct 2015


Model Predictive Path Integral Control using Covariance Variable Importance Sampling

Grady Williams, Andrew Aldrich, and Evangelos A. Theodorou

Abstract: In this paper we develop a Model Predictive Path Integral (MPPI) control algorithm based on a generalized importance sampling scheme and perform parallel optimization via sampling using a Graphics Processing Unit (GPU). The proposed generalized importance sampling scheme allows for changes in the drift and diffusion terms of stochastic diffusion processes and plays a significant role in the performance of the model predictive control algorithm. We compare the proposed algorithm in simulation with a model predictive control version of differential dynamic programming.

I. INTRODUCTION

The path integral optimal control framework [7], [5], [6] provides a mathematically sound methodology for developing optimal control algorithms based on stochastic sampling of trajectories. The key idea in this framework is that the value function for the optimal control problem is transformed using the Feynman-Kac lemma [], [8] into an expectation over all possible trajectories, which is known as a path integral. This transformation allows stochastic optimal control problems to be solved with a Monte-Carlo approximation using forward sampling of stochastic diffusion processes.

There have been a variety of algorithms developed in the path integral control setting. The most straightforward application of path integral control is when the iterative feedback control law suggested in [5] is implemented in its open loop formulation. This requires that sampling takes place only from the initial state of the optimal control problem. A more effective approach is to use the path integral control framework to find the parameters of a feedback control policy. This can be done by sampling in policy parameter space; these methods are known as Policy Improvement with Path Integrals [4]. Another approach to finding the parameters of a policy is to attempt to directly sample from the optimal distribution defined by the value function [3].
Other methods along similar threads of research include [], [7].

Another way that the path integral control framework can be applied is in a model predictive control setting. In this setting an open-loop control sequence is constantly optimized in the background while the machine is simultaneously executing the best guess that the controller has. An issue with this approach is that many trajectories must be sampled in real-time, which is difficult when the system has complex dynamics. One way around this problem is to drastically simplify the system under consideration by using a hierarchical scheme [4], and use path integral control to generate trajectories for a point mass which is then followed by a low level controller. Even though this approach may be successful for certain applications, it is limited in the kinds of behaviors that it can generate since it does not consider the full non-linearity of the dynamics. A more efficient approach is to take advantage of the parallel nature of sampling and use a graphics processing unit (GPU) [9] to sample thousands of trajectories from the nonlinear dynamics.

A major issue in the path integral control framework is that the expectation is taken with respect to the uncontrolled dynamics of the system. This is problematic since the probability of sampling a low cost trajectory using the uncontrolled dynamics is typically very low. This problem becomes more drastic when the underlying dynamics are nonlinear and sampled trajectories can become trapped in undesirable parts of the state space. It has previously been demonstrated how to change the mean of the sampling distribution using Girsanov's theorem [5], [6]; this can then be used to develop an iterative algorithm. However, the variance of the sampling distribution has always remained unchanged.

Footnote: This research has been supported by NSF Grant No. NRI. The authors are with the Autonomous Control and Decision Systems Laboratory at the Georgia Institute of Technology, Atlanta, GA, USA. Email: gradyrw@gatech.edu
Although in some simple simulated scenarios changing the variance is not necessary, in many cases the natural variance of a system will be too low to produce useful deviations from the current trajectory. Previous methods have either dealt with this problem by artificially adding noise into the system and then optimizing the noisy system [], [4], or simply ignored the problem entirely and sampled from whatever distribution worked best [], [9]. Although these approaches can be successful, both are problematic in that the optimization either takes place with respect to the wrong system or the resulting algorithm ignores the theoretical basis of path integral control.

The approach we take here generalizes these approaches in that it enables both the mean and variance of the sampling distribution to be changed by the control designer, without violating the underlying assumptions made in the path integral derivation. This enables the algorithm to converge fast enough that it can be applied in a model predictive control setting. After deriving the model predictive path integral control (MPPI) algorithm, we compare it with an existing model predictive control formulation based on differential dynamic programming (DDP) [6], [3], [8]. DDP is one of the most powerful techniques for trajectory optimization. It relies on a first or second order approximation of the dynamics and a quadratic approximation of the cost along a nominal trajectory; it then computes a second order approximation of the value function, which it uses to generate the control.

II. PATH INTEGRAL CONTROL

In this section we review the path integral optimal control framework [7]. Let $x_t \in \mathbb{R}^N$ denote the state of a dynamical system at time $t$, $u(x_t, t) \in \mathbb{R}^m$ denotes a control input for the system, $\tau : [t, T] \to \mathbb{R}^n$ represents a trajectory of the system, and $dw \in \mathbb{R}^p$ is a Brownian disturbance. In the path integral control framework we suppose that the dynamics take the form:

$$dx = f(x_t, t)\,dt + G(x_t, t)\,u(x_t, t)\,dt + B(x_t, t)\,dw \tag{1}$$

In other words, the dynamics are affine in control and subject to an affine Brownian disturbance. We also assume that $G$ and $B$ are partitioned as:

$$G(x_t, t) = \begin{pmatrix} 0 \\ G_c(x_t, t) \end{pmatrix}; \qquad B(x_t, t) = \begin{pmatrix} 0 \\ B_c(x_t, t) \end{pmatrix} \tag{2}$$

Expectations taken with respect to (1) are denoted as $E_Q[\cdot]$. We will also be interested in taking expectations with respect to the uncontrolled dynamics of the system (i.e. (1) with $u \equiv 0$). These will be denoted $E_P[\cdot]$. We suppose that the cost function for the optimal control problem has a quadratic control cost and an arbitrary state-dependent cost. Let $\phi(x_T)$ denote the terminal cost, $q(x_t, t)$ a state dependent running cost, and define $R(x_t, t)$ as a positive definite matrix. The value function $V(x_t, t)$ for this optimal control problem is then defined as:

$$V(x_t, t) = \min_u E_Q\left[\phi(x_T) + \int_t^T \left(q(x_t, t) + \frac{1}{2}u^T R(x_t, t)\,u\right) dt\right] \tag{3}$$

The Stochastic Hamilton-Jacobi-Bellman equation [], [] for the type of system in (1) and for the cost function in (3) is given as:

$$-\partial_t V = q(x_t, t) + f(x_t, t)^T V_x - \frac{1}{2}V_x^T G(x_t, t)R(x_t, t)^{-1}G(x_t, t)^T V_x + \frac{1}{2}\mathrm{tr}\left(B(x_t, t)B(x_t, t)^T V_{xx}\right) \tag{4}$$

where the optimal control is expressed as:

$$u^* = -R(x_t, t)^{-1}G(x_t, t)^T V_x \tag{5}$$

The solution to this backwards PDE yields the value function for the stochastic optimal control problem, which is then used to generate the optimal control. Unfortunately, classical methods for solving partial differential equations of this nature suffer from the curse of dimensionality and are intractable for systems with more than a few state variables.
The approach we take in the path integral control framework is to transform the backwards PDE into a path integral, which is an expectation over all possible trajectories of the system. This expectation can then be approximated by forward sampling of the stochastic dynamics. In order to effect this transformation we apply an exponential transformation of the value function:

$$V(x, t) = -\lambda \log \Psi(x, t) \tag{6}$$

Here $\lambda$ is a positive constant. We also have to assume a relationship between the cost and noise in the system (as well as $\lambda$) through the equation:

$$B_c(x_t, t)B_c(x_t, t)^T = \lambda\, G_c(x_t, t)R(x_t, t)^{-1}G_c(x_t, t)^T \tag{7}$$

The main restriction implied by this assumption is that $B(x_t, t)$ has the same rank as $R(x_t, t)$. This limits the noise in the system to only affect state variables that are directly actuated (i.e. the noise is control dependent). There are a wide variety of systems which naturally fall into this description, so the assumption is not too restrictive. However, there are interesting systems for which this description does not hold (i.e. if there are known strong disturbances on indirectly actuated state variables or if the dynamics are only partially known). By making this assumption and performing the exponential transformation of the value function, the stochastic HJB equation is transformed into the linear partial differential equation:

$$\partial_t \Psi = \frac{\Psi(x_t, t)}{\lambda}\,q(x_t, t) - f(x_t, t)^T \Psi_x - \frac{1}{2}\mathrm{tr}\left(\Sigma(x_t, t)\Psi_{xx}\right) \tag{8}$$

Here we have denoted the covariance matrix $B_c(x_t, t)B_c(x_t, t)^T$ as $\Sigma(x_t, t)$. This equation is known as the backward Chapman-Kolmogorov PDE. We can then apply the Feynman-Kac lemma, which relates backward PDEs of this type to path integrals through the equation:

$$\Psi(x_t, t) = E_P\left[\exp\left(-\frac{1}{\lambda}\int_t^T q(x, t)\,dt\right)\Psi(x_T, T)\right] \tag{9}$$

Note that the expectation (which is the path integral) is taken with respect to $P$, which is the uncontrolled dynamics of the system. By recognizing that the term $\Psi(x_T)$ is the transformed terminal cost $e^{-\frac{1}{\lambda}\phi(x_T)}$, we can re-write this expression as:

$$\Psi(x_t, t) = E_P\left[\exp\left(-\frac{1}{\lambda}S(\tau)\right)\right] \tag{10}$$

where $S(\tau) = \phi(x_T) + \int_t^T q(x_t, t)\,dt$ is the cost-to-go of the state dependent cost of a trajectory.
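As a sanity check of (10), the expectation can be estimated by forward sampling the uncontrolled dynamics and averaging $\exp(-S(\tau)/\lambda)$. The Python sketch below does this for a hypothetical scalar system that is not taken from the paper: $dx = \sigma\,dw$ with zero running cost and terminal cost $\phi(x) = x^2/2$, for which the expectation has a closed form. The function names and all parameter values are illustrative assumptions.

```python
import math
import random


def estimate_psi(x0, sigma, lam, horizon, n_steps, n_samples, seed=0):
    """Monte-Carlo estimate of Psi(x0, t) = E_P[exp(-S(tau)/lam)], as in (10).

    Hypothetical example system (an assumption, not from the paper):
    dx = sigma * dw, running cost q = 0, terminal cost phi(x) = x**2 / 2.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    total = 0.0
    for _ in range(n_samples):
        x = x0
        for _ in range(n_steps):
            # Euler-Maruyama step of the uncontrolled dynamics (u = 0).
            x += sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        cost_to_go = 0.5 * x * x  # S(tau) = phi(x_T) because q = 0 here
        total += math.exp(-cost_to_go / lam)
    return total / n_samples


def exact_psi(x0, sigma, lam, horizon):
    """Closed form of the same expectation for this Gaussian example."""
    var = sigma * sigma * horizon
    return math.exp(-x0 * x0 / (2.0 * (lam + var))) / math.sqrt(1.0 + var / lam)


psi_mc = estimate_psi(x0=1.0, sigma=1.0, lam=1.0, horizon=1.0,
                      n_steps=10, n_samples=20000)
psi_exact = exact_psi(x0=1.0, sigma=1.0, lam=1.0, horizon=1.0)
value_estimate = -1.0 * math.log(psi_mc)  # V = -lam * log(Psi) from (6), lam = 1
```

The sampled and closed-form values should agree up to Monte-Carlo error, and the last line recovers a value-function estimate through the transformation (6).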
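Lastly we have to compute the gradient of $\Psi$ with respect to the initial state $x_t$. This can be done analytically and is a straightforward, albeit lengthy, computation, so we omit it and refer the interested reader to [4]. After taking the gradient we obtain:

$$u^*\,dt = \mathcal{G}(x_t, t)\,\frac{E_P\left[\exp\left(-\frac{1}{\lambda}S(\tau)\right)B(x_t, t)\,dw\right]}{E_P\left[\exp\left(-\frac{1}{\lambda}S(\tau)\right)\right]} \tag{11}$$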

Here the matrix $\mathcal{G}(x_t, t)$ is defined as:

$$\mathcal{G}(x_t, t) = R(x_t, t)^{-1}G_c(x_t, t)^T\left(G_c(x_t, t)R(x_t, t)^{-1}G_c(x_t, t)^T\right)^{-1} \tag{12}$$

Note that if $G_c(x_t, t)$ is square (which is the case if the system is not over actuated) this reduces to $G_c(x_t, t)^{-1}$. Equation (11) is the path integral form of the optimal control. The fundamental difference between this form of the optimal control and classical optimal control theory is that instead of relying on a backwards in time process, this formula requires the evaluation of an expectation which can be approximated using forward sampling of stochastic differential equations.

A. Discrete Approximation

Equation (11) provides an expression for the optimal control in terms of a path integral. However, these equations are for continuous time, and in order to sample trajectories on a computer we need discrete time approximations. We first discretize the dynamics of the system. We have that $x_{t+1} = x_t + dx_t$, where $dx_t$ is defined as:

$$dx_t = \left(f(x_t, t) + G(x_t, t)u(x_t, t)\right)\Delta t + B(x_t, t)\,\epsilon\sqrt{\Delta t} \tag{13}$$

The term $\epsilon$ is a vector of standard normal Gaussian random variables. For the uncontrolled dynamics of the system we have:

$$dx_t = f(x_t, t)\Delta t + B(x_t, t)\,\epsilon\sqrt{\Delta t} \tag{14}$$

Another way we can express $B(x_t, t)\,dw$ which will be useful is as:

$$B(x_t, t)\,dw \approx dx_t - f(x_t, t)\Delta t \tag{15}$$

Lastly we say:

$$S(\tau) \approx \phi(x_T) + \sum_{i=1}^{N} q(x_t, t)\Delta t$$

where $N = (T - t)/\Delta t$. Then by defining $p$ as the probability induced by the discrete time uncontrolled dynamics we can approximate (11) as:

$$u^* = \mathcal{G}(x_t, t)\,\frac{E_p\left[\exp\left(-\frac{1}{\lambda}S(\tau)\right)\left(\frac{dx_t}{\Delta t} - f(x_t, t)\right)\right]}{E_p\left[\exp\left(-\frac{1}{\lambda}S(\tau)\right)\right]} \tag{16}$$

Note that we have moved the $\Delta t$ term multiplying $u$ over to the right-hand side of the equation and inserted it into the expectation.

III. GENERALIZED IMPORTANCE SAMPLING

Equation (16) provides an implementable method for approximating the optimal control via random sampling of trajectories. By drawing many samples from $p$ the expectation can be evaluated using a Monte-Carlo approximation. In practice, this approach is unlikely to succeed. The problem is that $p$ is typically an inefficient distribution to sample from (i.e. the cost-to-go will be high for most trajectories sampled from $p$).
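For a low-dimensional toy problem the estimator (16) can nonetheless be evaluated directly, which makes its structure concrete. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: it uses a hypothetical scalar system with $f = 0$, $G_c = 1$, $B_c = \sigma$, zero running cost and terminal cost $\phi(x) = c\,x^2/2$, sets $\lambda = \sigma^2 R$ so that assumption (7) holds, and compares the sampled control with the closed-form linear-quadratic solution.

```python
import math
import random


def path_integral_control(x0, sigma, r_cost, c_term, horizon, n_steps,
                          n_samples, seed=0):
    """Estimate u* at the initial state with the sampled form (16).

    Hypothetical scalar system (an assumption, not from the paper):
    f = 0, G_c = 1, B_c = sigma, q = 0, phi(x) = c_term * x**2 / 2.
    lam is fixed by assumption (7): B_c**2 = lam * G_c**2 / R.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    lam = sigma * sigma * r_cost
    numerator = 0.0
    denominator = 0.0
    for _ in range(n_samples):
        x = x0
        first_dx = 0.0
        for step in range(n_steps):
            # One step of the uncontrolled dynamics (14) with f = 0.
            dx = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            if step == 0:
                first_dx = dx
            x += dx
        weight = math.exp(-0.5 * c_term * x * x / lam)  # exp(-S(tau)/lam)
        numerator += weight * (first_dx / dt)  # dx_t / dt - f, with f = 0
        denominator += weight
    return numerator / denominator  # script-G equals 1 / G_c = 1 here


def exact_lq_control(x0, r_cost, c_term, horizon):
    """Closed-form optimal control of this linear-quadratic example."""
    return -(c_term / r_cost) * x0 / (1.0 + c_term * horizon / r_cost)


u_mc = path_integral_control(x0=1.0, sigma=1.0, r_cost=1.0, c_term=1.0,
                             horizon=1.0, n_steps=10, n_samples=40000)
u_exact = exact_lq_control(x0=1.0, r_cost=1.0, c_term=1.0, horizon=1.0)
```

Even in this benign case the estimate is noisy, because the first increment is divided by the step size; this is one way to see why sampling from the uncontrolled dynamics scales poorly.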
Intuitively, sampling from the uncontrolled dynamics corresponds to turning a machine on and waiting for the natural noise in the system dynamics to produce interesting behavior. In order to efficiently approximate the controls, we require the ability to sample from a distribution which is likely to produce low cost trajectories. In previous applications of path integral control [5], [6] the mean of the sampling distribution has been changed, which allows for an iterative update law. However, the variance of the sampling distribution has always remained unchanged. In well engineered systems, where the natural variance of the system is very low, changing the mean is insufficient since the state space is never aggressively explored. In the following derivation we provide a method for changing both the initial control input and the variance of the sampling distribution.

A. Likelihood Ratio

We suppose that we have a sampling distribution with non-zero control input and a changed variance, which we denote as $q$, and we would like to approximate (16) using samples from $q$ as opposed to $p$. Now if we write the expectation term in (16) in integral form we get:

$$\frac{\int \exp\left(-\frac{1}{\lambda}S(\tau)\right)\left(\frac{dx_t}{\Delta t} - f(x_t, t)\right)p(\tau)\,d\tau}{\int \exp\left(-\frac{1}{\lambda}S(\tau)\right)p(\tau)\,d\tau} \tag{17}$$

where we are abusing notation and using $\tau$ to represent the discrete trajectory $(x_{t_0}, x_{t_1}, \dots, x_{t_N})$. Next we multiply both integrals by $1 = \frac{q(\tau)}{q(\tau)}$ to get:

$$\frac{\int \exp\left(-\frac{1}{\lambda}S(\tau)\right)\left(\frac{dx_t}{\Delta t} - f(x_t, t)\right)\frac{q(\tau)}{q(\tau)}\,p(\tau)\,d\tau}{\int \exp\left(-\frac{1}{\lambda}S(\tau)\right)\frac{q(\tau)}{q(\tau)}\,p(\tau)\,d\tau} \tag{18}$$

And we can then write this as an expectation with respect to $q$:

$$\frac{E_q\left[\exp\left(-\frac{1}{\lambda}S(\tau)\right)\left(\frac{dx_t}{\Delta t} - f(x_t, t)\right)\frac{p(\tau)}{q(\tau)}\right]}{E_q\left[\exp\left(-\frac{1}{\lambda}S(\tau)\right)\frac{p(\tau)}{q(\tau)}\right]} \tag{19}$$

We now have the expectation in terms of a sampling distribution $q$ for which we can choose:

1) The initial control sequence from which to sample around.
2) The variance of the exploration noise, which determines how aggressively the state space is explored.

However, we now have an extra term to compute, $\frac{p(\tau)}{q(\tau)}$. This is known as the likelihood ratio (or Radon-Nikodym derivative) between the distributions $p$ and $q$. In order to derive an expression for this term we first have to derive equations for the probability density functions of $p(\tau)$ and $q(\tau)$ individually.
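The change of sampling distribution in (17)-(19) is ordinary self-normalized importance sampling. The Python sketch below illustrates it on a one-dimensional stand-in, which is an assumption made for illustration only: $p = N(0, 1)$ plays the role of the uncontrolled distribution, the weight is $\exp(-S(x))$ with the hypothetical cost $S(x) = 2(x - 3)^2$, and samples are drawn from a proposal $q$ whose mean and variance both differ from those of $p$.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a scalar Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def weighted_mean_under_p(n_samples, q_mean, q_std, seed=0):
    """Estimate E_p[w(x) x] / E_p[w(x)] from samples of q, as in (19).

    Hypothetical setup (an assumption): p = N(0, 1) and
    w(x) = exp(-S(x)) with S(x) = 2 * (x - 3)**2.
    """
    rng = random.Random(seed)
    numerator = 0.0
    denominator = 0.0
    for _ in range(n_samples):
        x = rng.gauss(q_mean, q_std)  # sample from q, not from p
        ratio = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, q_mean, q_std)  # p / q
        weight = math.exp(-2.0 * (x - 3.0) ** 2) * ratio
        numerator += weight * x
        denominator += weight
    return numerator / denominator


estimate = weighted_mean_under_p(n_samples=5000, q_mean=2.0, q_std=1.5)
exact = 3.0 * 4.0 / (1.0 + 4.0)  # mean of N(0, 1) tilted by exp(-2 (x - 3)^2)
```

Because the normalizing constants of $p$ and $q$ appear in both the numerator and the denominator, only the ratio of the two densities matters, which is the property exploited later when the likelihood ratio is folded into the running cost.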
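We can do this by deriving the probability density function for the general discrete time diffusion process $P(\tau)$, corresponding to the dynamics:

$$dx_t = \left(f(x_t, t) + G(x_t, t)u(x_t, t)\right)\Delta t + B(x_t, t)\,\epsilon\sqrt{\Delta t} \tag{20}$$

The goal is to find $P(\tau) = P(x_{t_0}, x_{t_1}, \dots, x_{t_N})$. By conditioning and using the Markov property of the state space this probability becomes:

$$P(x_{t_0}, x_{t_1}, \dots, x_{t_N}) = \prod_{i=1}^{N} P(x_{t_i} \mid x_{t_{i-1}}) \tag{21}$$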

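Now recall that a portion of the state space has deterministic dynamics and that we have partitioned the diffusion matrix as:

$$B(x_t, t) = \begin{pmatrix} 0 \\ B_c(x_t, t) \end{pmatrix} \tag{22}$$

We can partition the state variables $x$ into the deterministic and non-deterministic variables $x^a_t$ and $x^c_t$ respectively. The next step is to condition on $x^a_{t+1} = F_a(x_t, t) = x^a_t + \left(f_a(x_t, t) + G_a(x_t, t)u_t\right)dt$, since if this does not hold $P(\tau)$ is zero. We thus need to compute:

$$\prod_{i=1}^{N} P\left(x^c_{t_i} \mid x_{t_{i-1}},\; x^a_{t_i} = F_a(x_{t_{i-1}}, t_{i-1})\right) \tag{23}$$

And from the dynamics equations we know that each of these one-step transitions is Gaussian with mean $\left(f_c(x_{t_i}, t_i) + G_c(x_{t_i}, t_i)u(x_{t_i}, t_i)\right)\Delta t$ and variance $\Sigma_i \Delta t$, where:

$$\Sigma_i = B_c(x_{t_i}, t_i)B_c(x_{t_i}, t_i)^T \tag{24}$$

We then define $z_i = \frac{dx^c_{t_i}}{\Delta t} - f_c(x_{t_i}, t_i)$ and $\mu_i = G_c(x_{t_i}, t_i)u(x_{t_i}, t_i)$. Applying the definition of the Gaussian distribution with these terms yields:

$$P(\tau) = \prod_{i=1}^{N} \frac{\exp\left(-\frac{\Delta t}{2}(z_i - \mu_i)^T \Sigma_i^{-1}(z_i - \mu_i)\right)}{(2\pi)^{n/2}\,|\Sigma_i \Delta t|^{1/2}} \tag{25}$$

And then using basic rules of exponents this probability becomes:

$$P(\tau) = Z(\tau)^{-1}\exp\left(-\frac{\Delta t}{2}\sum_{i=1}^{N}(z_i - \mu_i)^T \Sigma_i^{-1}(z_i - \mu_i)\right) \tag{26}$$

where $Z(\tau) = \prod_{i=1}^{N}(2\pi)^{n/2}\,|\Sigma_i \Delta t|^{1/2}$. With this equation in hand we are now ready to compute the likelihood ratio between two diffusion processes.

Theorem 1: Let $p(\tau)$ be the probability density function for trajectories under the uncontrolled discrete time dynamics:

$$dx_t = f(x_t, t)\Delta t + B(x_t, t)\,\epsilon\sqrt{\Delta t} \tag{27}$$

And let $q(\tau)$ be the probability density function for trajectories under the controlled dynamics with an adjusted variance:

$$dx_t = \left(f(x_t, t) + G(x_t, t)u(x_t, t)\right)\Delta t + B_E(x_t, t)\,\epsilon\sqrt{\Delta t} \tag{28}$$

where the adjusted variance has the form:

$$B_E(x_t, t) = \begin{pmatrix} 0 \\ A_t B_c(x_t, t) \end{pmatrix}$$

And define $z_i$, $\mu_i$, and $\Sigma_i$ as before. Let $Q_i$ be defined as:

$$Q_i = (z_i - \mu_i)^T \Gamma_i^{-1}(z_i - \mu_i) + 2\mu_i^T \Sigma_i^{-1}(z_i - \mu_i) + \mu_i^T \Sigma_i^{-1}\mu_i \tag{29}$$

where $\Gamma_i$ is:

$$\Gamma_i = \left(\Sigma_i^{-1} - \left(A_{t_i}^T \Sigma_i A_{t_i}\right)^{-1}\right)^{-1} \tag{30}$$

Then under the condition that each $A_{t_i}$ is invertible and each $\Gamma_i$ is invertible, the likelihood ratio for the two distributions is:

$$\left(\prod_{i=1}^{N} |A_{t_i}|\right)\exp\left(-\frac{\Delta t}{2}\sum_{i=1}^{N} Q_i\right) \tag{31}$$

Proof: In discrete time the probability of a trajectory is formulated according to (26).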
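As a numerical cross-check of the statement of Theorem 1 (independent of the algebra in the proof), the one-step ratio can be compared with a direct ratio of Gaussian densities in the scalar case. This Python sketch assumes the convention that $z_i$ has variance $\Sigma_i/\Delta t$ under $p$ and $A^2\Sigma_i/\Delta t$ under $q$; the function names and the test values are hypothetical.

```python
import math


def log_normal_pdf(x, mean, var):
    """Log-density of a scalar Gaussian with the given variance."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var


def log_ratio_direct(z, mu, sigma, a, dt):
    """log(p/q) for one step, from the two Gaussian densities of z.

    Scalar case: under p, z ~ N(0, sigma/dt); under q, z ~ N(mu, a*a*sigma/dt),
    where sigma stands for the scalar Sigma_i = B_c**2.
    """
    log_p = log_normal_pdf(z, 0.0, sigma / dt)
    log_q = log_normal_pdf(z, mu, a * a * sigma / dt)
    return log_p - log_q


def log_ratio_theorem(z, mu, sigma, a, dt):
    """log(p/q) for one step from Theorem 1: log|A| - (dt/2) * Q_i."""
    gamma_inv = 1.0 / sigma - 1.0 / (a * a * sigma)  # scalar form of (30)
    q_i = (gamma_inv * (z - mu) ** 2
           + 2.0 * mu * (z - mu) / sigma
           + mu * mu / sigma)  # scalar form of (29)
    return math.log(abs(a)) - 0.5 * dt * q_i


args = dict(z=0.7, mu=-0.3, sigma=0.5, a=2.0, dt=0.02)
difference = log_ratio_direct(**args) - log_ratio_theorem(**args)
```

The two expressions should agree to floating-point precision for any $a \neq 0$; the theorem additionally requires $\Gamma_i$ to be invertible, which in the scalar case excludes $a = \pm 1$.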
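We thus have $p(\tau)$ equal to:

$$p(\tau) = \frac{\exp\left(-\frac{\Delta t}{2}\sum_{i=1}^{N} z_i^T \Sigma_i^{-1} z_i\right)}{Z_p(\tau)} \tag{32}$$

and $q(\tau)$ equal to:

$$q(\tau) = \frac{\exp\left(-\frac{\Delta t}{2}\sum_{i=1}^{N}(z_i - \mu_i)^T\left(A_{t_i}^T \Sigma_i A_{t_i}\right)^{-1}(z_i - \mu_i)\right)}{Z_q(\tau)} \tag{33}$$

Then dividing these two equations we have $\frac{p(\tau)}{q(\tau)}$ as:

$$\prod_{i=1}^{N}\frac{(2\pi)^{n/2}\,|A_{t_i}^T \Sigma_i A_{t_i}\Delta t|^{1/2}}{(2\pi)^{n/2}\,|\Sigma_i \Delta t|^{1/2}}\;\exp\left(-\frac{\Delta t}{2}\sum_{i=1}^{N}\zeta_i\right) \tag{34}$$

where $\zeta_i$ is:

$$\zeta_i = z_i^T \Sigma_i^{-1} z_i - (z_i - \mu_i)^T\left(A_{t_i}^T \Sigma_i A_{t_i}\right)^{-1}(z_i - \mu_i) \tag{35}$$

Using basic rules of determinants it is easy to see that the term outside the exponent reduces to:

$$\prod_{j=1}^{N}\frac{(2\pi)^{n/2}\,|A_{t_j}^T \Sigma_j A_{t_j}\Delta t|^{1/2}}{(2\pi)^{n/2}\,|\Sigma_j \Delta t|^{1/2}} = \prod_{j=1}^{N}|A_{t_j}| \tag{36}$$

So we need only show that $\zeta_i$ reduces to $Q_i$. Observe that at every timestep we have the difference between two quadratic functions of $z_i$, so we can complete the square to combine this into a single quadratic function. If we recall the definition of $\Gamma_i$ from above, and define $\Lambda_i = A_{t_i}^T \Sigma_i A_{t_i}$, then completing the square yields:

$$\zeta_i = \left(z_i + \Gamma_i\Lambda_i^{-1}\mu_i\right)^T\Gamma_i^{-1}\left(z_i + \Gamma_i\Lambda_i^{-1}\mu_i\right) - \mu_i^T\Lambda_i^{-1}\mu_i - \left(\Gamma_i\Lambda_i^{-1}\mu_i\right)^T\Gamma_i^{-1}\left(\Gamma_i\Lambda_i^{-1}\mu_i\right) \tag{37}$$

Now we expand out the first quadratic term to get:

$$\zeta_i = z_i^T\Gamma_i^{-1}z_i + 2\mu_i^T\Lambda_i^{-1}z_i + \underline{\mu_i^T\Lambda_i^{-1}\Gamma_i\Lambda_i^{-1}\mu_i} - \mu_i^T\Lambda_i^{-1}\mu_i - \underline{\left(\Gamma_i\Lambda_i^{-1}\mu_i\right)^T\Gamma_i^{-1}\left(\Gamma_i\Lambda_i^{-1}\mu_i\right)} \tag{38}$$

Notice that the two underlined terms are the same, except for the sign, so they cancel out and we are left with:

$$\zeta_i = z_i^T\Gamma_i^{-1}z_i + 2\mu_i^T\Lambda_i^{-1}z_i - \mu_i^T\Lambda_i^{-1}\mu_i \tag{39}$$

Now define $\tilde{z}_i = z_i - \mu_i$, and then re-write this equation in terms of $\tilde{z}_i$:

$$\zeta_i = (\tilde{z}_i + \mu_i)^T\Gamma_i^{-1}(\tilde{z}_i + \mu_i) + 2\mu_i^T\Lambda_i^{-1}(\tilde{z}_i + \mu_i) - \mu_i^T\Lambda_i^{-1}\mu_i \tag{40}$$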

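This expands out to:

$$\zeta_i = \tilde{z}_i^T\Gamma_i^{-1}\tilde{z}_i + 2\mu_i^T\Gamma_i^{-1}\tilde{z}_i + \mu_i^T\Gamma_i^{-1}\mu_i + 2\mu_i^T\Lambda_i^{-1}\tilde{z}_i + 2\mu_i^T\Lambda_i^{-1}\mu_i - \mu_i^T\Lambda_i^{-1}\mu_i \tag{41}$$

which then simplifies to:

$$\zeta_i = \tilde{z}_i^T\Gamma_i^{-1}\tilde{z}_i + 2\mu_i^T\Gamma_i^{-1}\tilde{z}_i + \mu_i^T\Gamma_i^{-1}\mu_i + 2\mu_i^T\Lambda_i^{-1}\tilde{z}_i + \mu_i^T\Lambda_i^{-1}\mu_i \tag{42}$$

Now recall that $\Gamma_i^{-1} = \Sigma_i^{-1} - \Lambda_i^{-1}$, so we can split the quadratic terms in $\Gamma_i^{-1}$ into the $\Sigma_i^{-1}$ and $\Lambda_i^{-1}$ components. Doing this yields:

$$\zeta_i = \tilde{z}_i^T\Gamma_i^{-1}\tilde{z}_i + 2\mu_i^T\Sigma_i^{-1}\tilde{z}_i - \underline{2\mu_i^T\Lambda_i^{-1}\tilde{z}_i} + \mu_i^T\Sigma_i^{-1}\mu_i - \underline{\mu_i^T\Lambda_i^{-1}\mu_i} + \underline{2\mu_i^T\Lambda_i^{-1}\tilde{z}_i} + \underline{\mu_i^T\Lambda_i^{-1}\mu_i} \tag{43}$$

and by noting that the underlined terms cancel out we see that we are left with:

$$\zeta_i = \tilde{z}_i^T\Gamma_i^{-1}\tilde{z}_i + 2\mu_i^T\Sigma_i^{-1}\tilde{z}_i + \mu_i^T\Sigma_i^{-1}\mu_i \tag{44}$$

which is the same as:

$$(z_i - \mu_i)^T\Gamma_i^{-1}(z_i - \mu_i) + 2\mu_i^T\Sigma_i^{-1}(z_i - \mu_i) + \mu_i^T\Sigma_i^{-1}\mu_i \tag{45}$$

And so $\zeta_i = Q_i$, which completes the proof.

The key difference between this proof and earlier path integral works which use an application of Girsanov's theorem to sample from a non-zero control input is that this theorem allows for a change in the variance as well. In the expression for the likelihood ratio derived here the last two terms ($2\mu_i^T\Sigma_i^{-1}(z_i - \mu_i) + \mu_i^T\Sigma_i^{-1}\mu_i$) are exactly the terms from Girsanov's theorem. The first term, $(z_i - \mu_i)^T\Gamma_i^{-1}(z_i - \mu_i)$, which can be interpreted as penalizing over-aggressive exploration, is the only additional term.

B. Likelihood Ratio as Additional Running Cost

The form of the likelihood ratio just derived is easily incorporated into the path integral control framework by folding it into the cost-to-go as an extra running cost. Note that the likelihood ratio appears in both the numerator and denominator of (16). Therefore, any terms which do not depend on the state can be factored out of the expectation and canceled. This removes the numerically troublesome normalizing term $\prod_{j=1}^{N}|A_{t_j}|$, so only the summation of $Q_i$ remains. Recall that $\Sigma_i = \lambda\, G_c(x_t, t)R(x_t, t)^{-1}G_c(x_t, t)^T$. This implies that:

$$\Gamma_i = \lambda\left(\left(G_c(x_t, t)R(x_t, t)^{-1}G_c(x_t, t)^T\right)^{-1} - \left(A_{t_i}^T G_c(x_t, t)R(x_t, t)^{-1}G_c(x_t, t)^T A_{t_i}\right)^{-1}\right)^{-1} \tag{46}$$

Now define $H = G_c(x_t, t)R(x_t, t)^{-1}G_c(x_t, t)^T$ and $\tilde{\Gamma}_i = \frac{1}{\lambda}\Gamma_i$. We then have:

$$Q_i = \frac{1}{\lambda}\left((z_i - \mu_i)^T\tilde{\Gamma}_i^{-1}(z_i - \mu_i) + 2\mu_i^T H^{-1}(z_i - \mu_i) + \mu_i^T H^{-1}\mu_i\right) \tag{47}$$

Then by re-defining the running cost $q(x_t, t)$ as:

$$\tilde{q}(x, u, dx) = q(x_t, t) + \frac{1}{2}(z_i - \mu_i)^T\tilde{\Gamma}_i^{-1}(z_i - \mu_i) + \mu_i^T H^{-1}(z_i - \mu_i) + \frac{1}{2}\mu_i^T H^{-1}\mu_i \tag{48}$$

and $\tilde{S}(\tau) = \phi(x_T) + \sum_{j=1}^{N}\tilde{q}(x, u, dx)\Delta t$, we have:

$$u^*_t = \mathcal{G}(x_t, t)\,\frac{E_q\left[\exp\left(-\frac{1}{\lambda}\tilde{S}(\tau)\right)\left(\frac{dx_t}{\Delta t} - f(x_t, t)\right)\right]}{E_q\left[\exp\left(-\frac{1}{\lambda}\tilde{S}(\tau)\right)\right]} \tag{49}$$

Also note that $dx_t$ is now equal to:

$$dx_t = \left(f(x_t, t) + G(x_t, t)u(x_t, t)\right)\Delta t + B(x_t, t)\,\epsilon\sqrt{\Delta t} \tag{50}$$

So we can re-write $\frac{dx_t}{\Delta t} - f(x_t, t)$ as:

$$G(x_t, t)u(x_t, t) + B(x_t, t)\frac{\epsilon}{\sqrt{\Delta t}} \tag{51}$$

And then, since $G(x_t, t)u(x_t, t)$ does not depend on the expectation, we can pull it out and get the iterative update law:

$$u^*_t = \mathcal{G}(x_t, t)G(x_t, t)u(x_t, t) + \mathcal{G}(x_t, t)\,\frac{E_q\left[\exp\left(-\frac{1}{\lambda}\tilde{S}(\tau)\right)B(x_t, t)\frac{\epsilon}{\sqrt{\Delta t}}\right]}{E_q\left[\exp\left(-\frac{1}{\lambda}\tilde{S}(\tau)\right)\right]} \tag{52}$$

C. Special Case

The update law (52) is applicable for a very general class of systems. In this section we examine a special case which we use for all of our experiments. We consider dynamics of the form:

$$dx_t = f(x_t, t)\Delta t + G(x_t, t)\left(u(x_t, t)\Delta t + \frac{1}{\sqrt{\rho}}\,\epsilon\sqrt{\Delta t}\right) \tag{53}$$

And for the sampling distribution we set $A$ equal to $\sqrt{\nu}\,I$. We also assume that $G_c(x_t, t)$ is a square invertible matrix. This reduces $\mathcal{G}(x_t, t)$ to $G_c(x_t, t)^{-1}$. Next the dynamics can be re-written as:

$$dx_t = \left(f(x_t, t) + G(x_t, t)\left(u(x_t, t) + \frac{1}{\sqrt{\rho}}\frac{\epsilon}{\sqrt{\Delta t}}\right)\right)\Delta t \tag{54}$$

Then we can interpret $\frac{1}{\sqrt{\rho}}\frac{\epsilon}{\sqrt{\Delta t}}$ as a random change in the control input. To emphasize this we will denote this term as $\delta u = \frac{1}{\sqrt{\rho}}\frac{\epsilon}{\sqrt{\Delta t}}$. We then have $B(x_t, t)\frac{\epsilon}{\sqrt{\Delta t}} = G(x_t, t)\,\delta u$. This yields the iterative update law as:

$$u(x_t, t)^* = u(x_t, t) + \frac{E_q\left[\exp\left(-\frac{1}{\lambda}\tilde{S}(\tau)\right)\delta u\right]}{E_q\left[\exp\left(-\frac{1}{\lambda}\tilde{S}(\tau)\right)\right]} \tag{55}$$

which can be approximated as:

$$u(x_{t_i}, t_i)^* \approx u(x_{t_i}, t_i) + \frac{\sum_{k=1}^{K}\exp\left(-\frac{1}{\lambda}\tilde{S}(\tau_{i,k})\right)\delta u_{i,k}}{\sum_{k=1}^{K}\exp\left(-\frac{1}{\lambda}\tilde{S}(\tau_{i,k})\right)} \tag{56}$$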

Here $K$ is the number of random samples (termed rollouts) and $\tilde{S}(\tau_{i,k})$ is the cost-to-go of the $k$-th rollout from time $t_i$ onward. This expression is simply a reward-weighted average of random variations in the control input. Next we investigate what the likelihood ratio addition to the running cost is. For these dynamics we have the following simplifications:

$$z_i - \mu_i = G_c(x_t, t)\,\delta u$$
$$\tilde{\Gamma}_i^{-1} = (1 - \nu^{-1})\,G_c(x_t, t)^{-T}R(x_t, t)\,G_c(x_t, t)^{-1}$$
$$H^{-1} = G_c(x_t, t)^{-T}R(x_t, t)\,G_c(x_t, t)^{-1}$$

Given these simplifications $\tilde{q}$ reduces to:

$$\tilde{q}(x, u, dx) = q(x_t, t) + \frac{1 - \nu^{-1}}{2}\,\delta u^T R\,\delta u + u^T R\,\delta u + \frac{1}{2}u^T R\,u \tag{57}$$

This means that the introduction of the likelihood ratio simply introduces the original control cost from the optimal control formulation into the sampling cost, which originally only included state-dependent terms.

IV. MODEL PREDICTIVE CONTROL ALGORITHM

We apply the iterative path integral control update law, with the generalized importance sampling term, in a model predictive control setting. In this setting optimization and execution occur simultaneously: the trajectory is optimized and then a single control is executed, then the trajectory is re-optimized using the un-executed portion of the previous trajectory to warm-start the optimization. This scheme has two key requirements:

1) Rapid convergence to a good control input.
2) The ability to sample a large number of trajectories in real-time.

The first requirement is essential because the algorithm does not have the luxury of waiting until the trajectory has converged before executing. The new importance sampling term enables tuning of the exploration variance, which allows for rapid convergence; this is demonstrated in the experiments of Section V. The second requirement, sampling a large number of trajectories in real-time, is satisfied by implementing the random sampling of trajectories on a GPU. The algorithm is given in Algorithm 1; in the parallel GPU implementation the sampling for loop (for k ← 0 to K−1) is run completely in parallel.

V. EXPERIMENTS

We tested the model predictive path integral control algorithm (MPPI) on three simulated platforms: 1) a cart-pole, 2) a miniature race car, and 3) a quadrotor attempting to navigate an obstacle filled environment.
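Before turning to the individual experiments, the receding-horizon scheme of Section IV, combined with the update (56) and the running cost (57), can be sketched in a few lines of Python. This is a minimal serial illustration under stated assumptions and not the authors' GPU implementation: the plant is a hypothetical scalar integrator $dx = (u + \delta u)\Delta t$ with state cost $q(x) = x^2$, the exploration noise standard deviation `noise_std` is chosen directly instead of being derived from $\rho$ and $\nu$, the cost-to-go is accumulated backwards over each rollout, and every name and parameter value is illustrative.

```python
import math
import random


def augmented_cost(q_state, u, du, r_cost, nu):
    """Running cost (57): state cost plus the likelihood-ratio terms."""
    return (q_state
            + 0.5 * (1.0 - 1.0 / nu) * r_cost * du * du
            + r_cost * u * du
            + 0.5 * r_cost * u * u)


def weighted_update(u, costs_to_go, dus, lam):
    """Update (56): cost-weighted average of the sampled control variations."""
    s_min = min(costs_to_go)  # subtracting the minimum cancels in the ratio
    weights = [math.exp(-(s - s_min) / lam) for s in costs_to_go]
    return u + sum(w * d for w, d in zip(weights, dus)) / sum(weights)


def mppi_step(x0, controls, rng, n_rollouts=200, dt=0.05, lam=2.0,
              r_cost=0.1, nu=10.0, noise_std=2.0):
    """One optimization pass for the toy system dx = (u + du) dt, q(x) = x**2."""
    n = len(controls)
    noise = [[rng.gauss(0.0, noise_std) for _ in range(n)]
             for _ in range(n_rollouts)]
    costs = []
    for k in range(n_rollouts):  # this loop is the part that parallelizes
        x = x0
        running = []
        for i in range(n):
            running.append(augmented_cost(x * x, controls[i], noise[k][i],
                                          r_cost, nu) * dt)
            x += (controls[i] + noise[k][i]) * dt
        # Cost-to-go from each timestep onward, with terminal cost phi = x**2.
        to_go = [0.0] * n
        tail = x * x
        for i in reversed(range(n)):
            tail += running[i]
            to_go[i] = tail
        costs.append(to_go)
    return [weighted_update(controls[i],
                            [costs[k][i] for k in range(n_rollouts)],
                            [noise[k][i] for k in range(n_rollouts)], lam)
            for i in range(n)]


def run_mpc(x0=2.0, n_steps=80, horizon=20, dt=0.05, u_init=0.0, seed=0):
    """Receding-horizon loop: optimize, execute the first control, shift."""
    rng = random.Random(seed)
    controls = [u_init] * horizon
    x = x0
    for _ in range(n_steps):
        controls = mppi_step(x, controls, rng, dt=dt)
        x += controls[0] * dt  # execute the first control on a noise-free plant
        controls = controls[1:] + [u_init]  # warm start the next optimization
    return x


final_state = run_mpc()
```

Subtracting the minimum cost before exponentiating leaves the weighted average unchanged while avoiding underflow, and the shift of the control sequence at the end of each iteration is what provides the warm start described above.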
For the race car and quadrotor we used a model predictive control version of the differential dynamic programming (DDP) algorithm as a baseline comparison. In all of these experiments the controller operates at 5 Hz; this means that the open loop control sequence is re-optimized every milliseconds.

Algorithm 1: Model Predictive Path Integral Control
    Given: K: number of samples;
           N: number of timesteps;
           (u_0, u_1, ..., u_{N−1}): initial control sequence;
           Δt, x_{t_0}, f, G, B, ν: system/sampling dynamics;
           φ, q, R, λ: cost parameters;
           u_init: value to initialize new controls to;
    while task not completed do
        for k ← 0 to K−1 do
            x = x_{t_0};
            for i ← 0 to N−1 do
                x_{i+1} = x_i + (f + G(u_i + δu_{i,k})) Δt;
                S̃(τ_{i+1,k}) = S̃(τ_{i,k}) + q̃;
        for i ← 0 to N−1 do
            u_i ← u_i + [ Σ_k exp(−(1/λ) S̃(τ_{i,k})) δu_{i,k} / Σ_k exp(−(1/λ) S̃(τ_{i,k})) ];
        send to actuators(u_0);
        for i ← 0 to N−2 do
            u_i = u_{i+1};
        u_{N−1} = u_init;
        update the current state after receiving feedback;
        check for task completion;

A. Cart-Pole

For the cart-pole swing-up task we used the state cost: qx = p cosθ + θ + ṗ, where p is the position of the cart, ṗ is the velocity, and θ, θ̇ are the angle and angular velocity of the pole. The control input is desired velocity, which maps to velocity through the equation: p = u ṗ. The disturbance parameter ρ was set equal. and the control cost was R =. We ran the MPPI controller for seconds with a second optimization horizon. The controller has to swing up the pole and keep it balanced for the rest of the second horizon. The exploration variance parameter, ν, was varied between and 5. The MPPI controller is able to swing up the pole faster with increasing exploration variance. Fig. 1 illustrates the performance of the MPPI controller as the exploration variance and the number of rollouts are changed. Using only the natural variance of the system for exploration is insufficient in this task; in that case (not shown in the figure) the controller is never able to swing up the pole, which results in a cost around.

Fig. 1. Average running cost for the cart-pole swing-up task as a function of the exploration variance ν and the number of rollouts. Using only the natural system variance the MPC algorithm does not converge in this scenario. [Plot: average running cost versus number of rollouts (log scale), one curve per value of ν.]

B. Race Car

In the race car task the goal was to minimize the objective function: qx = d + v x 7.. Where d is defined as: d = x 3 + y 6, and v_x is the forward (in body frame) velocity of the car. This cost ensures that the car stays on an elliptical track while maintaining a forward speed of 7 meters/sec. We use a non-linear dynamics model [5] which takes into account the highly non-linear interactions between tires and the ground. The exploration variance was set to a constant ν times the natural variance of the system. The MPPI controller is able to enter turns at close to the desired speed of 7 m/s and then slide through the turn. The DDP solution does not attempt to slide and significantly reduces its forward velocity before entering the turn; this results in a higher average cost compared to the MPPI controller. Fig. 2 shows the cost comparison between MPPI and MPC-DDP, and Figures 3 and 4 show samples of the trajectories taken by the two algorithms as well as the velocity profiles.

Fig. 2. Performance comparison in terms of average cost between MPPI and MPC-DDP as the exploration variance ν changes from 5 to 3 and the number of rollouts changes from to. Only with a very large increase in the exploration variance is MPPI able to outperform MPC-DDP. Note that the cost is capped at. [Plot: average running cost versus number of rollouts (log scale); curves: DDP solution and one MPPI curve per value of ν.]

Fig. 3. Comparison of DDP (left) and MPPI (right) performing a cornering maneuver along an ellipsoid track. MPPI is able to make a much tighter turn while carrying more speed in and out of the corner than DDP. The direction of travel is counterclockwise. [Panels: MPC-DDP, MPPI.]

Fig. 4. Comparison of DDP (left) and MPPI (right) performing a cornering maneuver along an ellipsoid track. MPPI is able to make a much tighter turn while carrying more speed in and out of the corner than DDP. [Plot: velocity (m/s) versus time (s); curves: DDP v_x, DDP v_y, MPPI v_x, MPPI v_y.]

C. Quadrotor

The quadrotor task was to fly through a field filled with cylindrical obstacles as fast as possible. We used the quadrotor dynamics model from [9]. This is a non-linear model which includes position, velocity, Euler angles, angular acceleration, and the rotor dynamics. We randomly generated three forests, one where obstacles are on average 3 meters apart, the second one 4 meters apart, and the third 5 meters apart. We then separately created cost functions for both MPPI and DDP which guide the quadrotor through the forest as quickly as possible. The cost function for MPPI was of the form: qx =.5p x p des x +.5p y p des y + 5p z p des z + 5ψ + v +35 exp d + C, where (p_x, p_y, p_z) denotes the position of the vehicle, ψ denotes the yaw angle in radians, v is velocity, and d is the distance to the closest obstacle. C is a variable which indicates whether the vehicle has crashed into the ground or an obstacle. Additionally, when C indicates a crash, the rollout stops simulating the dynamics and the vehicle remains where it is for the rest of the time horizon. We found that the crash indicator term is not useful for the MPC-DDP based controller; this is not surprising since the discontinuity it creates is difficult to approximate with a quadratic function. The term in the cost for avoiding obstacles in the MPC-DDP controller consists purely of a large exponential term: N = exp d. Note that this sum is over all the obstacles in the proximity of the vehicle, whereas the MPPI controller only has to consider the closest obstacle.

Since the MPPI controller can explicitly reason about crashing (as opposed to just staying away from obstacles), it is able to travel both faster and closer to obstacles than the MPC-DDP controller. Fig. 7 shows the difference in time between the two algorithms and Fig. 6 the trajectories taken by MPC-DDP and one of the MPPI runs on the forest with obstacles placed on average 4 meters away.

Fig. 5. Left: sample DDP trajectory through 4m obstacle field. Right: sample MPPI trajectory through the same field. Since the MPPI controller can directly reason about the shape of the obstacles it is able to safely pass through the field taking a much more direct route.

Fig. 6. Time to navigate forest. Comparison between MPPI and DDP. [Plot: time to completion (s) versus density setting of forest (3m, 4m, 5m); series: MPPI, MPC-DDP.]

Fig. 7. Simulated forest environment used in the quadrotor navigation task.

VI. CONCLUSION

In this paper we have developed a model predictive path integral control algorithm which is able to outperform a state-of-the-art DDP method on two difficult control tasks. The algorithm is based on stochastic sampling of system trajectories and requires no derivatives of either the dynamics or costs of the system. This enables the algorithm to naturally take into account non-linear dynamics, such as a non-linear tire model [5]. It is also able to handle cost functions which are intuitively appealing, such as an impulse cost for hitting an obstacle, but are difficult for traditional approaches that rely on a smooth gradient signal to perform optimization. The two keys to achieving this level of performance with a sampling based method are:

1) The derivation of the generalized likelihood ratio between discrete time diffusion processes.
2) The use of a GPU to sample thousands of trajectories in real-time.

The derivation of the likelihood ratio enables the designer of the algorithm to tune the exploration variance in the path integral control framework, whereas previous methods have only allowed for the mean of the distribution to be changed.
Tuning the exploration variance is critical in achieving a high level of performance since the natural variance of the system is typically too low to achieve good performance. The experiments considered in this work only consider changing the variance by a constant multiple times the natural variance of the system. In this special case the introduction of the likelihood ratio corresponds to adding in a control cost when evaluating the cost-to-go of a trajectory. A direction for future research is to investigate how to automatically adjust the variance online. Doing so could enable the algorithm to switch from aggressively exploring the state space when performing aggressive maneuvers to exploring more conservatively for performing very precise maneuvers.

REFERENCES

[] W. H. Fleming and H. M. Soner. Controlled Markov processes and viscosity solutions. Applications of mathematics. Springer, New York, nd edition, 6.
[] A. Friedman. Stochastic Differential Equations And Applications. Academic Press, 975.
[3] Vicenç Gómez, Hilbert J Kappen, Jan Peters, and Gerhard Neumann. Policy search for path integral control. In Machine Learning and Knowledge Discovery in Databases, pages Springer, 4.
[4] Vicenç Gómez, Sep Thijssen, Hilbert J Kappen, Stephen Hailes, and Andrew Symington. Real-time stochastic optimal control for multi-agent quadrotor swarms. arXiv preprint arXiv:5.4548, 5.
[5] R.Y Hindiyeh. Dynamics and Control of Drifting in Automobiles. PhD thesis, Stanford University, March 3.
[6] D. H. Jacobson and D. Q. Mayne. Differential dynamic programming. American Elsevier Pub. Co., New York, 97.
[7] H. J. Kappen. Linear theory for control of nonlinear stochastic systems. Phys Rev Lett, 95:, 5. Journal Article United States.
[8] I. Karatzas and S. E. Shreve. Brownian Motion and Stochastic Calculus (Graduate Texts in Mathematics). Springer, nd edition, August 99.
[9] Nathan Michael, Daniel Mellinger, Quentin Lindsey, and Vijay Kumar. The grasp multiple micro-uav testbed. Robotics & Automation Magazine, IEEE, 73:56 65,.
[] E. Rombokas, M. Malhotra, E.A. Theodorou, E. Todorov, and Y. Matsuoka. Reinforcement learning and synergistic control of the act hand. IEEE/ASME Transactions on Mechatronics, 8: , 3.
[] R. F. Stengel. Optimal control and estimation. Dover books on advanced mathematics. Dover Publications, New York, 994.
[] F. Stulp, J. Buchli, E. Theodorou, and S. Schaal. Reinforcement learning of full-body humanoid motor skills. In Proceedings of th IEEE-RAS International Conference on Humanoid Robots (Humanoids), pages 45 4, Dec.
[3] E. Theodorou, Y. Tassa, and E. Todorov. Stochastic differential dynamic programming. In American Control Conference,, pages 5 3,.
[4] E. A. Theodorou, J. Buchli, and S. Schaal. A generalized path integral approach to reinforcement learning. Journal of Machine Learning Research, :337 38,.
[5] E.A. Theodorou and E. Todorov. Relative entropy and free energy dualities: Connections to path integral and kl control. In the Proceedings of IEEE Conference on Decision and Control, pages , Dec.
[6] Evangelos A. Theodorou. Nonlinear stochastic control and information theoretic dualities: Connections, interdependencies and thermodynamic interpretations. Entropy, 75: , 5.
[7] Sep Thijssen and HJ Kappen. Path integral control and state-dependent feedback. Physical Review E, 93:34, 5.
[8] E. Todorov and W. Li. A generalized iterative lqg method for locally-optimal feedback control of constrained nonlinear stochastic systems. pages 3 36, 5.
[9] G. Williams, E. Rombokas, and T. Daniel. Gpu based path integral control with learned dynamics. In Neural Information Processing Systems - ALR Workshop, 4.

Erratum: A Generalized Path Integral Control Approach to Reinforcement Learning

Erratum: A Generalized Path Integral Control Approach to Reinforcement Learning Journal of Machne Learnng Research 00-9 Submtted /0; Publshed 7/ Erratum: A Generalzed Path Integral Control Approach to Renforcement Learnng Evangelos ATheodorou Jonas Buchl Stefan Schaal Department of

More information

Numerical Heat and Mass Transfer

Numerical Heat and Mass Transfer Master degree n Mechancal Engneerng Numercal Heat and Mass Transfer 06-Fnte-Dfference Method (One-dmensonal, steady state heat conducton) Fausto Arpno f.arpno@uncas.t Introducton Why we use models and

More information

EEE 241: Linear Systems

EEE 241: Linear Systems EEE : Lnear Systems Summary #: Backpropagaton BACKPROPAGATION The perceptron rule as well as the Wdrow Hoff learnng were desgned to tran sngle layer networks. They suffer from the same dsadvantage: they

More information

Generalized Linear Methods

Generalized Linear Methods Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set

More information

Lecture 12: Discrete Laplacian

Lecture 12: Discrete Laplacian Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly

More information

DUE: WEDS FEB 21ST 2018

DUE: WEDS FEB 21ST 2018 HOMEWORK # 1: FINITE DIFFERENCES IN ONE DIMENSION DUE: WEDS FEB 21ST 2018 1. Theory Beam bendng s a classcal engneerng analyss. The tradtonal soluton technque makes smplfyng assumptons such as a constant

More information

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:

More information

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve

More information

CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE

CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE Analytcal soluton s usually not possble when exctaton vares arbtrarly wth tme or f the system s nonlnear. Such problems can be solved by numercal tmesteppng

More information

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results. Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson

More information

Lecture Notes on Linear Regression

Lecture Notes on Linear Regression Lecture Notes on Lnear Regresson Feng L fl@sdueducn Shandong Unversty, Chna Lnear Regresson Problem In regresson problem, we am at predct a contnuous target value gven an nput feature vector We assume

More information

Markov Chain Monte Carlo Lecture 6

Markov Chain Monte Carlo Lecture 6 where (x 1,..., x N ) X N, N s called the populaton sze, f(x) f (x) for at least one {1, 2,..., N}, and those dfferent from f(x) are called the tral dstrbutons n terms of mportance samplng. Dfferent ways

More information

Week 5: Neural Networks

Week 5: Neural Networks Week 5: Neural Networks Instructor: Sergey Levne Neural Networks Summary In the prevous lecture, we saw how we can construct neural networks by extendng logstc regresson. Neural networks consst of multple

More information

Hidden Markov Models & The Multivariate Gaussian (10/26/04)

Hidden Markov Models & The Multivariate Gaussian (10/26/04) CS281A/Stat241A: Statstcal Learnng Theory Hdden Markov Models & The Multvarate Gaussan (10/26/04) Lecturer: Mchael I. Jordan Scrbes: Jonathan W. Hu 1 Hdden Markov Models As a bref revew, hdden Markov models

More information

Lecture 10 Support Vector Machines II

Lecture 10 Support Vector Machines II Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed

More information

Difference Equations

Difference Equations Dfference Equatons c Jan Vrbk 1 Bascs Suppose a sequence of numbers, say a 0,a 1,a,a 3,... s defned by a certan general relatonshp between, say, three consecutve values of the sequence, e.g. a + +3a +1

More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

Section 8.3 Polar Form of Complex Numbers

Section 8.3 Polar Form of Complex Numbers 80 Chapter 8 Secton 8 Polar Form of Complex Numbers From prevous classes, you may have encountered magnary numbers the square roots of negatve numbers and, more generally, complex numbers whch are the

More information

Supporting Information

Supporting Information Supportng Informaton The neural network f n Eq. 1 s gven by: f x l = ReLU W atom x l + b atom, 2 where ReLU s the element-wse rectfed lnear unt, 21.e., ReLUx = max0, x, W atom R d d s the weght matrx to

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton

More information

Research Article Green s Theorem for Sign Data

Research Article Green s Theorem for Sign Data Internatonal Scholarly Research Network ISRN Appled Mathematcs Volume 2012, Artcle ID 539359, 10 pages do:10.5402/2012/539359 Research Artcle Green s Theorem for Sgn Data Lous M. Houston The Unversty of

More information

Support Vector Machines. Vibhav Gogate The University of Texas at dallas

Support Vector Machines. Vibhav Gogate The University of Texas at dallas Support Vector Machnes Vbhav Gogate he Unversty of exas at dallas What We have Learned So Far? 1. Decson rees. Naïve Bayes 3. Lnear Regresson 4. Logstc Regresson 5. Perceptron 6. Neural networks 7. K-Nearest

More information

Lecture 21: Numerical methods for pricing American type derivatives

Lecture 21: Numerical methods for pricing American type derivatives Lecture 21: Numercal methods for prcng Amercan type dervatves Xaoguang Wang STAT 598W Aprl 10th, 2014 (STAT 598W) Lecture 21 1 / 26 Outlne 1 Fnte Dfference Method Explct Method Penalty Method (STAT 598W)

More information

Linear Feature Engineering 11

Linear Feature Engineering 11 Lnear Feature Engneerng 11 2 Least-Squares 2.1 Smple least-squares Consder the followng dataset. We have a bunch of nputs x and correspondng outputs y. The partcular values n ths dataset are x y 0.23 0.19

More information

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4) I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes

More information

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009 College of Computer & Informaton Scence Fall 2009 Northeastern Unversty 20 October 2009 CS7880: Algorthmc Power Tools Scrbe: Jan Wen and Laura Poplawsk Lecture Outlne: Prmal-dual schema Network Desgn:

More information

2 Finite difference basics

2 Finite difference basics Numersche Methoden 1, WS 11/12 B.J.P. Kaus 2 Fnte dfference bascs Consder the one- The bascs of the fnte dfference method are best understood wth an example. dmensonal transent heat conducton equaton T

More information

Logistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI

Logistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI Logstc Regresson CAP 561: achne Learnng Instructor: Guo-Jun QI Bayes Classfer: A Generatve model odel the posteror dstrbuton P(Y X) Estmate class-condtonal dstrbuton P(X Y) for each Y Estmate pror dstrbuton

More information

The Feynman path integral

The Feynman path integral The Feynman path ntegral Aprl 3, 205 Hesenberg and Schrödnger pctures The Schrödnger wave functon places the tme dependence of a physcal system n the state, ψ, t, where the state s a vector n Hlbert space

More information

Appendix B. The Finite Difference Scheme

Appendix B. The Finite Difference Scheme 140 APPENDIXES Appendx B. The Fnte Dfference Scheme In ths appendx we present numercal technques whch are used to approxmate solutons of system 3.1 3.3. A comprehensve treatment of theoretcal and mplementaton

More information

Kernel Methods and SVMs Extension

Kernel Methods and SVMs Extension Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general

More information

Additional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty

Additional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty Addtonal Codes usng Fnte Dfference Method Benamn Moll 1 HJB Equaton for Consumpton-Savng Problem Wthout Uncertanty Before consderng the case wth stochastc ncome n http://www.prnceton.edu/~moll/ HACTproect/HACT_Numercal_Appendx.pdf,

More information

ECE559VV Project Report

ECE559VV Project Report ECE559VV Project Report (Supplementary Notes Loc Xuan Bu I. MAX SUM-RATE SCHEDULING: THE UPLINK CASE We have seen (n the presentaton that, for downlnk (broadcast channels, the strategy maxmzng the sum-rate

More information

On an Extension of Stochastic Approximation EM Algorithm for Incomplete Data Problems. Vahid Tadayon 1

On an Extension of Stochastic Approximation EM Algorithm for Incomplete Data Problems. Vahid Tadayon 1 On an Extenson of Stochastc Approxmaton EM Algorthm for Incomplete Data Problems Vahd Tadayon Abstract: The Stochastc Approxmaton EM (SAEM algorthm, a varant stochastc approxmaton of EM, s a versatle tool

More information

MMA and GCMMA two methods for nonlinear optimization

MMA and GCMMA two methods for nonlinear optimization MMA and GCMMA two methods for nonlnear optmzaton Krster Svanberg Optmzaton and Systems Theory, KTH, Stockholm, Sweden. krlle@math.kth.se Ths note descrbes the algorthms used n the author s 2007 mplementatons

More information

1 Convex Optimization

1 Convex Optimization Convex Optmzaton We wll consder convex optmzaton problems. Namely, mnmzaton problems where the objectve s convex (we assume no constrants for now). Such problems often arse n machne learnng. For example,

More information

Formulas for the Determinant

Formulas for the Determinant page 224 224 CHAPTER 3 Determnants e t te t e 2t 38 A = e t 2te t e 2t e t te t 2e 2t 39 If 123 A = 345, 456 compute the matrx product A adj(a) What can you conclude about det(a)? For Problems 40 43, use

More information

1 Derivation of Rate Equations from Single-Cell Conductance (Hodgkin-Huxley-like) Equations

1 Derivation of Rate Equations from Single-Cell Conductance (Hodgkin-Huxley-like) Equations Physcs 171/271 -Davd Klenfeld - Fall 2005 (revsed Wnter 2011) 1 Dervaton of Rate Equatons from Sngle-Cell Conductance (Hodgkn-Huxley-lke) Equatons We consder a network of many neurons, each of whch obeys

More information

Global Sensitivity. Tuesday 20 th February, 2018

Global Sensitivity. Tuesday 20 th February, 2018 Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

Transfer Functions. Convenient representation of a linear, dynamic model. A transfer function (TF) relates one input and one output: ( ) system

Transfer Functions. Convenient representation of a linear, dynamic model. A transfer function (TF) relates one input and one output: ( ) system Transfer Functons Convenent representaton of a lnear, dynamc model. A transfer functon (TF) relates one nput and one output: x t X s y t system Y s The followng termnology s used: x y nput output forcng

More information

Time-Varying Systems and Computations Lecture 6

Time-Varying Systems and Computations Lecture 6 Tme-Varyng Systems and Computatons Lecture 6 Klaus Depold 14. Januar 2014 The Kalman Flter The Kalman estmaton flter attempts to estmate the actual state of an unknown dscrete dynamcal system, gven nosy

More information

Integrals and Invariants of Euler-Lagrange Equations

Integrals and Invariants of Euler-Lagrange Equations Lecture 16 Integrals and Invarants of Euler-Lagrange Equatons ME 256 at the Indan Insttute of Scence, Bengaluru Varatonal Methods and Structural Optmzaton G. K. Ananthasuresh Professor, Mechancal Engneerng,

More information

Gaussian Mixture Models

Gaussian Mixture Models Lab Gaussan Mxture Models Lab Objectve: Understand the formulaton of Gaussan Mxture Models (GMMs) and how to estmate GMM parameters. You ve already seen GMMs as the observaton dstrbuton n certan contnuous

More information

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011 Stanford Unversty CS359G: Graph Parttonng and Expanders Handout 4 Luca Trevsan January 3, 0 Lecture 4 In whch we prove the dffcult drecton of Cheeger s nequalty. As n the past lectures, consder an undrected

More information

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal Inner Product Defnton 1 () A Eucldean space s a fnte-dmensonal vector space over the reals R, wth an nner product,. Defnton 2 (Inner Product) An nner product, on a real vector space X s a symmetrc, blnear,

More information

NON-CENTRAL 7-POINT FORMULA IN THE METHOD OF LINES FOR PARABOLIC AND BURGERS' EQUATIONS

NON-CENTRAL 7-POINT FORMULA IN THE METHOD OF LINES FOR PARABOLIC AND BURGERS' EQUATIONS IJRRAS 8 (3 September 011 www.arpapress.com/volumes/vol8issue3/ijrras_8_3_08.pdf NON-CENTRAL 7-POINT FORMULA IN THE METHOD OF LINES FOR PARABOLIC AND BURGERS' EQUATIONS H.O. Bakodah Dept. of Mathematc

More information

2 STATISTICALLY OPTIMAL TRAINING DATA 2.1 A CRITERION OF OPTIMALITY We revew the crteron of statstcally optmal tranng data (Fukumzu et al., 1994). We

2 STATISTICALLY OPTIMAL TRAINING DATA 2.1 A CRITERION OF OPTIMALITY We revew the crteron of statstcally optmal tranng data (Fukumzu et al., 1994). We Advances n Neural Informaton Processng Systems 8 Actve Learnng n Multlayer Perceptrons Kenj Fukumzu Informaton and Communcaton R&D Center, Rcoh Co., Ltd. 3-2-3, Shn-yokohama, Yokohama, 222 Japan E-mal:

More information

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number

More information

Physics 5153 Classical Mechanics. D Alembert s Principle and The Lagrangian-1

Physics 5153 Classical Mechanics. D Alembert s Principle and The Lagrangian-1 P. Guterrez Physcs 5153 Classcal Mechancs D Alembert s Prncple and The Lagrangan 1 Introducton The prncple of vrtual work provdes a method of solvng problems of statc equlbrum wthout havng to consder the

More information

ELASTIC WAVE PROPAGATION IN A CONTINUOUS MEDIUM

ELASTIC WAVE PROPAGATION IN A CONTINUOUS MEDIUM ELASTIC WAVE PROPAGATION IN A CONTINUOUS MEDIUM An elastc wave s a deformaton of the body that travels throughout the body n all drectons. We can examne the deformaton over a perod of tme by fxng our look

More information

Chapter 13: Multiple Regression

Chapter 13: Multiple Regression Chapter 13: Multple Regresson 13.1 Developng the multple-regresson Model The general model can be descrbed as: It smplfes for two ndependent varables: The sample ft parameter b 0, b 1, and b are used to

More information

Hidden Markov Models

Hidden Markov Models Hdden Markov Models Namrata Vaswan, Iowa State Unversty Aprl 24, 204 Hdden Markov Model Defntons and Examples Defntons:. A hdden Markov model (HMM) refers to a set of hdden states X 0, X,..., X t,...,

More information

DETERMINATION OF TEMPERATURE DISTRIBUTION FOR ANNULAR FINS WITH TEMPERATURE DEPENDENT THERMAL CONDUCTIVITY BY HPM

DETERMINATION OF TEMPERATURE DISTRIBUTION FOR ANNULAR FINS WITH TEMPERATURE DEPENDENT THERMAL CONDUCTIVITY BY HPM Ganj, Z. Z., et al.: Determnaton of Temperature Dstrbuton for S111 DETERMINATION OF TEMPERATURE DISTRIBUTION FOR ANNULAR FINS WITH TEMPERATURE DEPENDENT THERMAL CONDUCTIVITY BY HPM by Davood Domr GANJI

More information

Markov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement

Markov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement Markov Chan Monte Carlo MCMC, Gbbs Samplng, Metropols Algorthms, and Smulated Annealng 2001 Bonformatcs Course Supplement SNU Bontellgence Lab http://bsnuackr/ Outlne! Markov Chan Monte Carlo MCMC! Metropols-Hastngs

More information

A Robust Method for Calculating the Correlation Coefficient

A Robust Method for Calculating the Correlation Coefficient A Robust Method for Calculatng the Correlaton Coeffcent E.B. Nven and C. V. Deutsch Relatonshps between prmary and secondary data are frequently quantfed usng the correlaton coeffcent; however, the tradtonal

More information

LOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin

LOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin Proceedngs of the 007 Wnter Smulaton Conference S G Henderson, B Bller, M-H Hseh, J Shortle, J D Tew, and R R Barton, eds LOW BIAS INTEGRATED PATH ESTIMATORS James M Calvn Department of Computer Scence

More information

Grover s Algorithm + Quantum Zeno Effect + Vaidman

Grover s Algorithm + Quantum Zeno Effect + Vaidman Grover s Algorthm + Quantum Zeno Effect + Vadman CS 294-2 Bomb 10/12/04 Fall 2004 Lecture 11 Grover s algorthm Recall that Grover s algorthm for searchng over a space of sze wors as follows: consder the

More information

P R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering /

P R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering / Theory and Applcatons of Pattern Recognton 003, Rob Polkar, Rowan Unversty, Glassboro, NJ Lecture 4 Bayes Classfcaton Rule Dept. of Electrcal and Computer Engneerng 0909.40.0 / 0909.504.04 Theory & Applcatons

More information

VQ widely used in coding speech, image, and video

VQ widely used in coding speech, image, and video at Scalar quantzers are specal cases of vector quantzers (VQ): they are constraned to look at one sample at a tme (memoryless) VQ does not have such constrant better RD perfomance expected Source codng

More information

Canonical transformations

Canonical transformations Canoncal transformatons November 23, 2014 Recall that we have defned a symplectc transformaton to be any lnear transformaton M A B leavng the symplectc form nvarant, Ω AB M A CM B DΩ CD Coordnate transformatons,

More information

Parametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010

Parametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010 Parametrc fractonal mputaton for mssng data analyss Jae Kwang Km Survey Workng Group Semnar March 29, 2010 1 Outlne Introducton Proposed method Fractonal mputaton Approxmaton Varance estmaton Multple mputaton

More information

A linear imaging system with white additive Gaussian noise on the observed data is modeled as follows:

A linear imaging system with white additive Gaussian noise on the observed data is modeled as follows: Supplementary Note Mathematcal bacground A lnear magng system wth whte addtve Gaussan nose on the observed data s modeled as follows: X = R ϕ V + G, () where X R are the expermental, two-dmensonal proecton

More information

Solution Thermodynamics

Solution Thermodynamics Soluton hermodynamcs usng Wagner Notaton by Stanley. Howard Department of aterals and etallurgcal Engneerng South Dakota School of nes and echnology Rapd Cty, SD 57701 January 7, 001 Soluton hermodynamcs

More information

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U) Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of

More information

The Study of Teaching-learning-based Optimization Algorithm

The Study of Teaching-learning-based Optimization Algorithm Advanced Scence and Technology Letters Vol. (AST 06), pp.05- http://dx.do.org/0.57/astl.06. The Study of Teachng-learnng-based Optmzaton Algorthm u Sun, Yan fu, Lele Kong, Haolang Q,, Helongang Insttute

More information

Relevance Vector Machines Explained

Relevance Vector Machines Explained October 19, 2010 Relevance Vector Machnes Explaned Trstan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introducton Ths document has been wrtten n an attempt to make Tppng s [1] Relevance Vector Machnes

More information

We present the algorithm first, then derive it later. Assume access to a dataset {(x i, y i )} n i=1, where x i R d and y i { 1, 1}.

We present the algorithm first, then derive it later. Assume access to a dataset {(x i, y i )} n i=1, where x i R d and y i { 1, 1}. CS 189 Introducton to Machne Learnng Sprng 2018 Note 26 1 Boostng We have seen that n the case of random forests, combnng many mperfect models can produce a snglodel that works very well. Ths s the dea

More information

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could

More information

Why feed-forward networks are in a bad shape

Why feed-forward networks are in a bad shape Why feed-forward networks are n a bad shape Patrck van der Smagt, Gerd Hrznger Insttute of Robotcs and System Dynamcs German Aerospace Center (DLR Oberpfaffenhofen) 82230 Wesslng, GERMANY emal smagt@dlr.de

More information

DO NOT DO HOMEWORK UNTIL IT IS ASSIGNED. THE ASSIGNMENTS MAY CHANGE UNTIL ANNOUNCED.

DO NOT DO HOMEWORK UNTIL IT IS ASSIGNED. THE ASSIGNMENTS MAY CHANGE UNTIL ANNOUNCED. EE 539 Homeworks Sprng 08 Updated: Tuesday, Aprl 7, 08 DO NOT DO HOMEWORK UNTIL IT IS ASSIGNED. THE ASSIGNMENTS MAY CHANGE UNTIL ANNOUNCED. For full credt, show all work. Some problems requre hand calculatons.

More information

Appendix B: Resampling Algorithms

Appendix B: Resampling Algorithms 407 Appendx B: Resamplng Algorthms A common problem of all partcle flters s the degeneracy of weghts, whch conssts of the unbounded ncrease of the varance of the mportance weghts ω [ ] of the partcles

More information

COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS

COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS Avalable onlne at http://sck.org J. Math. Comput. Sc. 3 (3), No., 6-3 ISSN: 97-537 COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS

More information

Physics 5153 Classical Mechanics. Principle of Virtual Work-1

Physics 5153 Classical Mechanics. Principle of Virtual Work-1 P. Guterrez 1 Introducton Physcs 5153 Classcal Mechancs Prncple of Vrtual Work The frst varatonal prncple we encounter n mechancs s the prncple of vrtual work. It establshes the equlbrum condton of a mechancal

More information

The Geometry of Logit and Probit

The Geometry of Logit and Probit The Geometry of Logt and Probt Ths short note s meant as a supplement to Chapters and 3 of Spatal Models of Parlamentary Votng and the notaton and reference to fgures n the text below s to those two chapters.

More information

A PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS

A PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS HCMC Unversty of Pedagogy Thong Nguyen Huu et al. A PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS Thong Nguyen Huu and Hao Tran Van Department of mathematcs-nformaton,

More information

On the correction of the h-index for career length

On the correction of the h-index for career length 1 On the correcton of the h-ndex for career length by L. Egghe Unverstet Hasselt (UHasselt), Campus Depenbeek, Agoralaan, B-3590 Depenbeek, Belgum 1 and Unverstet Antwerpen (UA), IBW, Stadscampus, Venusstraat

More information

The equation of motion of a dynamical system is given by a set of differential equations. That is (1)

The equation of motion of a dynamical system is given by a set of differential equations. That is (1) Dynamcal Systems Many engneerng and natural systems are dynamcal systems. For example a pendulum s a dynamcal system. State l The state of the dynamcal system specfes t condtons. For a pendulum n the absence

More information

1 Matrix representations of canonical matrices

1 Matrix representations of canonical matrices 1 Matrx representatons of canoncal matrces 2-d rotaton around the orgn: ( ) cos θ sn θ R 0 = sn θ cos θ 3-d rotaton around the x-axs: R x = 1 0 0 0 cos θ sn θ 0 sn θ cos θ 3-d rotaton around the y-axs:

More information

Design and Optimization of Fuzzy Controller for Inverse Pendulum System Using Genetic Algorithm

Design and Optimization of Fuzzy Controller for Inverse Pendulum System Using Genetic Algorithm Desgn and Optmzaton of Fuzzy Controller for Inverse Pendulum System Usng Genetc Algorthm H. Mehraban A. Ashoor Unversty of Tehran Unversty of Tehran h.mehraban@ece.ut.ac.r a.ashoor@ece.ut.ac.r Abstract:

More information

Inductance Calculation for Conductors of Arbitrary Shape

Inductance Calculation for Conductors of Arbitrary Shape CRYO/02/028 Aprl 5, 2002 Inductance Calculaton for Conductors of Arbtrary Shape L. Bottura Dstrbuton: Internal Summary In ths note we descrbe a method for the numercal calculaton of nductances among conductors

More information

Solving Nonlinear Differential Equations by a Neural Network Method

Solving Nonlinear Differential Equations by a Neural Network Method Solvng Nonlnear Dfferental Equatons by a Neural Network Method Luce P. Aarts and Peter Van der Veer Delft Unversty of Technology, Faculty of Cvlengneerng and Geoscences, Secton of Cvlengneerng Informatcs,

More information

Primer on High-Order Moment Estimators

Primer on High-Order Moment Estimators Prmer on Hgh-Order Moment Estmators Ton M. Whted July 2007 The Errors-n-Varables Model We wll start wth the classcal EIV for one msmeasured regressor. The general case s n Erckson and Whted Econometrc

More information

Speeding up Computation of Scalar Multiplication in Elliptic Curve Cryptosystem

Speeding up Computation of Scalar Multiplication in Elliptic Curve Cryptosystem H.K. Pathak et. al. / (IJCSE) Internatonal Journal on Computer Scence and Engneerng Speedng up Computaton of Scalar Multplcaton n Ellptc Curve Cryptosystem H. K. Pathak Manju Sangh S.o.S n Computer scence

More information

A new Approach for Solving Linear Ordinary Differential Equations

A new Approach for Solving Linear Ordinary Differential Equations , ISSN 974-57X (Onlne), ISSN 974-5718 (Prnt), Vol. ; Issue No. 1; Year 14, Copyrght 13-14 by CESER PUBLICATIONS A new Approach for Solvng Lnear Ordnary Dfferental Equatons Fawz Abdelwahd Department of

More information

Lecture 14: Bandits with Budget Constraints

Lecture 14: Bandits with Budget Constraints IEOR 8100-001: Learnng and Optmzaton for Sequental Decson Makng 03/07/16 Lecture 14: andts wth udget Constrants Instructor: Shpra Agrawal Scrbed by: Zhpeng Lu 1 Problem defnton In the regular Mult-armed

More information

AN IMPROVED PARTICLE FILTER ALGORITHM BASED ON NEURAL NETWORK FOR TARGET TRACKING

AN IMPROVED PARTICLE FILTER ALGORITHM BASED ON NEURAL NETWORK FOR TARGET TRACKING AN IMPROVED PARTICLE FILTER ALGORITHM BASED ON NEURAL NETWORK FOR TARGET TRACKING Qn Wen, Peng Qcong 40 Lab, Insttuton of Communcaton and Informaton Engneerng,Unversty of Electronc Scence and Technology

More information

10-701/ Machine Learning, Fall 2005 Homework 3

10-701/ Machine Learning, Fall 2005 Homework 3 10-701/15-781 Machne Learnng, Fall 2005 Homework 3 Out: 10/20/05 Due: begnnng of the class 11/01/05 Instructons Contact questons-10701@autonlaborg for queston Problem 1 Regresson and Cross-valdaton [40

More information

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons

More information
