Planning in POMDPs. Dominik Schoenberger Abstract
This document briefly explains what a Partially Observable Markov Decision Process is. Furthermore, it introduces the different approaches available to solve this class of problems. It also gives an overview of techniques for selecting belief points, which make value backups more efficient.

1 The Partially Observable Markov Decision Process

While agents used in classical planning are concerned only with environments that are fully observable, the real world of robotic applications is generally a place where it is not possible to observe the whole environment and take deterministic actions. The observed environment might not even be static. For planning under such uncertainty, it is necessary to improve robustness by explicitly reasoning about the type of uncertainty that can occur. The Partially Observable Markov Decision Process (POMDP) has become possibly the most general representation of this problem.

1.1 Benefits of POMDPs

This is because it combines the most essential features for planning under uncertainty. Whereas other frameworks handle either no uncertainty or only stochastic action effects, POMDPs handle uncertainty in both action effects and state observability. The latter is done by expressing partial state observations over information states instead of world states, since these world states are not directly observable. Here the measurements of noisy and imperfect sensors are used to calculate the information states, which form the beliefs a system has about its world state. These information states are represented by probability distributions over world states. Many POMDP algorithms form plans by optimizing a value function, which allows a numerical trade-off between alternative ways to satisfy a goal, even multiple interacting goals, and the comparison of actions with different costs or rewards. POMDPs are unique in doing this for information states instead of world states. POMDPs produce a universal plan by giving a full policy for action selection, which prescribes the choice of action for any possible information state and alleviates the need for replanning.
This makes the execution faster.

1.2 Disadvantages of POMDPs

Nonetheless, this is also the main drawback of POMDPs, because generating a universal plan has a high computational complexity. Most algorithms for exact planning in POMDPs optimize the value function over all possible beliefs, which is known to be PSPACE-complete. That means that many POMDP domains with only a few states, actions and sensor observations are already computationally intractable. By comparison, propositional planning with plans of bounded length is only NP-complete. To speed up POMDP solving, a commonly used technique is to perform value backups only at a set of belief points.

Another problem, which has long been a key impediment to using POMDPs in practical applications, is that, if the value function is expressed by a set of vectors, a vector in this set can be fully dominated by a set of other vectors (see Figure 1). Pruning these dominated vectors can be expensive.
Figure 1: Value function vector $\alpha_2$ is dominated by a combination of $\alpha_1$ and $\alpha_3$.

1.3 Basic POMDP terminology

The POMDP is formally defined by six distinct quantities, denoted $\{S, A, T, Z, O, R\}$. These represent the following:

States S: $s$ denotes a state of the world, and the finite set of all states of a world is denoted $S = \{s_0, s_1, \dots\}$, while the state at time $t$ is denoted $s_t$, with $t$ being a discrete time index. Since the state of a world is not directly observable in POMDPs, an agent can only estimate which state it is in by computing a belief over the state space $S$.

Actions A: The agent is given a set of actions, denoted $A = \{a_0, a_1, \dots\}$, which it can use to act in the world. These actions affect the state of the world stochastically, so choosing the right action is a function of history, and that makes choosing the right action the core problem in POMDPs.

Observations Z: Since a belief over the world's state $s$ is needed, the agent derives this belief from sensor measurements. A set of measurements taken at the same time is called an observation $z$. The set of all observations is denoted $Z = \{z_0, z_1, \dots\}$, where the observation at time $t$ is denoted $z_t$. Any observation $z$ is usually an incomplete projection of a world state $s$, corrupted by sensor noise.

Reward function R: The function $R(s, a) : S \times A \to \mathbb{R}$ assigns the reward of performing an action $a$ at a state $s$. The agent tries to collect as much reward as possible over time, which means it tries to maximize

$$E\Big[\sum_{t=t_0}^{T} \gamma^{\,t-t_0}\, r_t\Big]$$

where $E[\cdot]$ is the mathematical expectation, $0 \le \gamma < 1$ is a discount factor ensuring that the sum is finite, and $r_t$ is the reward at time $t$.

State transition probability distribution T: Given that the agent is in state $s$ and selects action $a$, the probability of transitioning to state $s'$ is

$$T(s, a, s') := \Pr(s_t = s' \mid s_{t-1} = s,\, a_{t-1} = a), \quad \forall (s, a, s').$$

$T$ is a conditional probability distribution, which means that $\sum_{s' \in S} T(s, a, s') = 1$ for all $(s, a)$. $T$ is also time-invariant.
Observation probability distribution O: After executing action $a$ and arriving in state $s$, the probability that the agent will perceive observation $z$ is

$$O(s, a, z) := \Pr(z_t = z \mid s_t = s,\, a_{t-1} = a), \quad \forall (s, a, z).$$

$O$ is also a conditional probability distribution, with $\sum_{z \in Z} O(s, a, z) = 1$ for all $(s, a)$, and it is also time-invariant.

Footnote 1: Graphic from Pineau, Gordon & Thrun, Anytime Point-Based Approximations for POMDPs, Figure 1.

1.4 Belief computation

Since POMDPs are instances of Markov processes, the current world state $s_t$ is sufficient to predict the future independent of the past $\{s_0, s_1, \dots, s_{t-1}\}$. Unfortunately, the agent in a POMDP can only perceive observations $\{z_0, \dots, z_t\}$, because the state is not directly observable. This is why the agent has to compute a belief of the world state instead, using a complete trace of all observations and all actions ever executed. That trace is called a history, here at time $t$:

$$h_t := \{z_0, a_0, z_1, \dots, z_{t-1}, a_{t-1}, z_t\}$$

If an initial state probability distribution $b_0(s) := \Pr(s_0 = s)$ is available to the agent, the history can also be summarized via a belief distribution

$$b_t(s) := \Pr(s_t = s \mid z_t, a_{t-1}, z_{t-1}, \dots, a_0, b_0)$$

instead of being represented explicitly. The belief $b_t$ can be calculated recursively using only the last belief $b_{t-1}$, the last action $a_{t-1}$ and the current observation $z_t$. The belief update equation $\tau()$ is defined as follows, equivalent to that of the Bayes filter:

$$\tau(b_{t-1}, a_{t-1}, z_t) = b_t(s') = \frac{\sum_{s \in S} O(s', a_{t-1}, z_t)\, T(s, a_{t-1}, s')\, b_{t-1}(s)}{\Pr(z_t \mid b_{t-1}, a_{t-1})}$$

The denominator is a normalizing constant.

1.5 Optimal policy computation

Computing a policy for selecting actions is the central objective in a POMDP. The policy $\pi(b) \to a$ chooses action $a$ at a belief distribution $b$. Since the agent wants to maximize the expected future discounted cumulative reward, the optimal policy is

$$\pi^*(b_{t_0}) = \arg\max_{\pi} E_{\pi}\Big[\sum_{t=t_0}^{T} \gamma^{\,t-t_0}\, r_t \,\Big|\, b_{t_0}\Big]$$

A straightforward approach to finding an optimal policy is to apply multiple iterations to compute increasingly more accurate values for each belief state $b$. For this a value function $V$ is needed, which maps belief states to values.
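The belief update $\tau(b, a, z)$ from Section 1.4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nested-list layout `T[s][a][s2]` and `O[s2][a][z]`, the function name `belief_update`, and the two-state "listen" example are hypothetical and not taken from the original text; in particular the observation model is assumed to be indexed by the successor state.

```python
def belief_update(b, a, z, T, O):
    """Bayes-filter update tau(b, a, z) for a discrete POMDP.

    Assumed layout (hypothetical): T[s][a][s2] = Pr(s2 | s, a) and
    O[s2][a][z] = Pr(z | s2, a), with the observation model indexed
    by the successor state s2.
    """
    n = len(b)
    unnormalized = [
        O[s2][a][z] * sum(T[s][a][s2] * b[s] for s in range(n))
        for s2 in range(n)
    ]
    pr_z = sum(unnormalized)  # Pr(z | b, a), the normalizing constant
    if pr_z == 0.0:
        raise ValueError("observation has zero probability under (b, a)")
    return [u / pr_z for u in unnormalized]


# Hypothetical example: two states, one "listen" action (index 0) that
# leaves the state unchanged and reports it correctly with probability 0.85.
T = [[[1.0, 0.0]], [[0.0, 1.0]]]
O = [[[0.85, 0.15]], [[0.15, 0.85]]]
b1 = belief_update([0.5, 0.5], a=0, z=0, T=T, O=O)
print(b1)  # the belief shifts towards state 0
```

Starting from the uniform belief, a single observation pointing at state 0 moves the belief to roughly 0.85 for that state, which is the sensor accuracy assumed in the example.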
The initial value function is:

$$V_0(b) = \max_{a} \sum_{s \in S} R(s, a)\, b(s)$$

For each iteration $t$ the value function is computed recursively; it maximizes the sum of all future rewards for any belief state $b$ within $t$ time steps:

$$V_t(b) = \max_{a} \Big[ \sum_{s \in S} R(s, a)\, b(s) + \gamma \sum_{z \in Z} \Pr(z \mid a, b)\, V_{t-1}(\tau(b, a, z)) \Big]$$

This way it produces a policy that is optimal under the same planning horizon $t$:

$$\pi_t^*(b) = \arg\max_{a} \Big[ \sum_{s \in S} R(s, a)\, b(s) + \gamma \sum_{z \in Z} \Pr(z \mid a, b)\, V_{t-1}(\tau(b, a, z)) \Big]$$

Each of these value functions at any planning horizon $t$ can be expressed by a set of vectors $\Gamma_t = \{\alpha_0, \alpha_1, \dots, \alpha_m\}$, each vector representing an $|S|$-dimensional hyperplane and defining the value function over a bounded region of the belief space:

$$V_t(b) = \max_{\alpha \in \Gamma_t} \sum_{s \in S} \alpha(s)\, b(s)$$

Then each of these α-vectors is associated with an action $a$ to create a policy that already assumes optimal behavior for the following steps:
$$V_t(b) = \max_{a \in A} \Big[ \sum_{s \in S} R(s, a)\, b(s) + \gamma \sum_{z \in Z} \max_{\alpha \in \Gamma_{t-1}} \sum_{s \in S} \sum_{s' \in S} T(s, a, s')\, O(s', a, z)\, \alpha(s')\, b(s) \Big]$$

$V_t(b)$ cannot be computed directly for each belief because there are infinitely many beliefs. However, the corresponding set $\Gamma_t$ can be generated by a sequence of operations on the previous set $\Gamma_{t-1}$. For each action $a$ the set $\Gamma_t^{a,*}$ is computed, along with the intermediate set $\Gamma_t^{a,z}$ for each action $a$ and each observation $z$:

$$\Gamma_t^{a,*} \leftarrow \alpha^{a,*}(s) = R(s, a)$$

$$\Gamma_t^{a,z} \leftarrow \alpha_i^{a,z}(s) = \gamma \sum_{s' \in S} T(s, a, s')\, O(s', a, z)\, \alpha_i(s'), \quad \forall \alpha_i \in \Gamma_{t-1}$$

Next the cross-sum over observations $\Gamma_t^a$, $\forall a \in A$, is created, including one $\alpha^{a,z}$ from each $\Gamma_t^{a,z}$:

$$\Gamma_t^a = \Gamma_t^{a,*} + \Gamma_t^{a,z_1} \oplus \Gamma_t^{a,z_2} \oplus \dots$$

and the union of all $\Gamma_t^a$ sets is taken:

$$\Gamma_t = \bigcup_{a \in A} \Gamma_t^a$$

In this form the pieces of the solution for the value function at horizon $t$ can be backed up. To extract the value function from the set $\Gamma_t$, the α-vectors are applied to the equation for $V_t(b)$ from above:

$$V_t(b) = \max_{\alpha \in \Gamma_t} \sum_{s \in S} \alpha(s)\, b(s)$$

1.6 Point-based value backup

While there are many different approaches to selecting the belief points to be updated, the update procedure itself is standard for all of them and is implemented as a sequence of operations on a set of α-vectors. Since the update of the value function is only applied at a fixed set of belief points $B = \{b_0, b_1, \dots, b_q\}$, there is a corresponding set of vectors $\{\alpha_0, \alpha_1, \dots, \alpha_q\}$ containing at most one vector for each belief. It is assumed that the belief points in a region around $b$ have the same action choice and also lead to the same facets of $V_{t-1}$ as the point $b$ itself. For each such point, only one α-vector is kept when the point-based backup is applied to a given solution set $\Gamma_{t-1}$.
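Before turning to the point-based equations, the exact backup of Section 1.5 can be sketched for comparison. The code below is a minimal illustration without pruning; the array layout (`T[s][a][s2]`, `O[s2][a][z]`, `R[s][a]`), the function names and the tiny two-state model are assumptions made for this example only.

```python
from itertools import product


def exact_backup(gamma_prev, T, O, R, discount):
    """One exact (unpruned) backup: builds Gamma_t from Gamma_{t-1}.

    Assumed layout (hypothetical): T[s][a][s2], O[s2][a][z], R[s][a].
    """
    n_s, n_a, n_z = len(R), len(R[0]), len(O[0][0])
    gamma_t = []
    for a in range(n_a):
        # Gamma^{a,z}: project every previous alpha-vector through (a, z).
        projections = [
            [
                [discount * sum(T[s][a][s2] * O[s2][a][z] * alpha[s2]
                                for s2 in range(n_s))
                 for s in range(n_s)]
                for alpha in gamma_prev
            ]
            for z in range(n_z)
        ]
        # Cross-sum: one projected vector per observation, plus R(., a).
        for choice in product(*projections):
            gamma_t.append([R[s][a] + sum(vec[s] for vec in choice)
                            for s in range(n_s)])
    return gamma_t


def value(gamma, b):
    """V(b) = max over alpha-vectors of the dot product alpha . b."""
    return max(sum(x * y for x, y in zip(alpha, b)) for alpha in gamma)


# Hypothetical model: action 0 keeps the state, action 1 swaps it.
T = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
O = [[[0.8, 0.2], [0.8, 0.2]], [[0.2, 0.8], [0.2, 0.8]]]
R = [[1.0, 0.0], [0.0, 0.5]]
gamma0 = [[R[s][a] for s in range(2)] for a in range(2)]
gamma1 = exact_backup(gamma0, T, O, R, discount=0.9)
print(len(gamma1), value(gamma1, [1.0, 0.0]))
```

Without pruning, the cross-sum yields $|A| \cdot |\Gamma_{t-1}|^{|Z|}$ vectors per backup (8 in this example), which illustrates why exact value iteration becomes intractable so quickly.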
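To obtain the next solution set $\Gamma_t$, the set $\Gamma_t^{a,*}$ is generated for all actions, and the same is done for $\Gamma_t^{a,z}$ for all actions and observations:

$$\Gamma_t^{a,*} \leftarrow \alpha^{a,*}(s) = R(s, a)$$

$$\Gamma_t^{a,z} \leftarrow \alpha_i^{a,z}(s) = \gamma \sum_{s' \in S} T(s, a, s')\, O(s', a, z)\, \alpha_i(s'), \quad \forall \alpha_i \in \Gamma_{t-1}$$

Next, instead of a cross-sum, a simple summation is calculated to get $\Gamma_t^a$, $\forall a \in A$:

$$\Gamma_t^a \leftarrow \alpha_b^a = \Gamma_t^{a,*} + \sum_{z \in Z} \arg\max_{\alpha \in \Gamma_t^{a,z}} \Big( \sum_{s \in S} \alpha(s)\, b(s) \Big), \quad \forall b \in B$$

Finally, the best action is needed for each belief point:

$$\alpha_b = \arg\max_{\alpha_b^a,\; a \in A} \Big( \sum_{s \in S} \alpha_b^a(s)\, b(s) \Big), \quad \forall b \in B$$

and the solution set is created from these:

$$\Gamma_t = \bigcup_{b \in B} \alpha_b$$

Although the operations above preserve only the best α-vector for each belief point $b \in B$, an estimate of the value function at any belief, including beliefs not in $B$, can be calculated from $\Gamma_t$ by using again:

$$V_t(b) = \max_{\alpha \in \Gamma_t} \sum_{s \in S} \alpha(s)\, b(s)$$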
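The point-based backup described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the nested-list layout (`T[s][a][s2]`, `O[s2][a][z]`, `R[s][a]`), the function names and the small two-state model are assumptions introduced for the example.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def point_based_backup(B, gamma_prev, T, O, R, discount):
    """One point-based backup: at most one alpha-vector per belief in B.

    Assumed layout (hypothetical): T[s][a][s2], O[s2][a][z], R[s][a].
    """
    n_s, n_a, n_z = len(R), len(R[0]), len(O[0][0])
    # Gamma^{a,z}: projections of all previous alpha-vectors.
    proj = {
        (a, z): [
            [discount * sum(T[s][a][s2] * O[s2][a][z] * alpha[s2]
                            for s2 in range(n_s))
             for s in range(n_s)]
            for alpha in gamma_prev
        ]
        for a in range(n_a) for z in range(n_z)
    }
    gamma_t = []
    for b in B:
        candidates = []
        for a in range(n_a):
            # alpha_b^a = R(., a) + sum over z of the best projection at b.
            alpha_ab = [R[s][a] for s in range(n_s)]
            for z in range(n_z):
                best = max(proj[(a, z)], key=lambda vec: dot(vec, b))
                alpha_ab = [x + y for x, y in zip(alpha_ab, best)]
            candidates.append(alpha_ab)
        # Keep only the vector of the best action at this belief point.
        alpha_b = max(candidates, key=lambda vec: dot(vec, b))
        if alpha_b not in gamma_t:
            gamma_t.append(alpha_b)
    return gamma_t


def value(gamma, b):
    return max(dot(alpha, b) for alpha in gamma)


# Hypothetical model: action 0 keeps the state, action 1 swaps it.
T = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
O = [[[0.8, 0.2], [0.8, 0.2]], [[0.2, 0.8], [0.2, 0.8]]]
R = [[1.0, 0.0], [0.0, 0.5]]
B = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
gamma0 = [[R[s][a] for s in range(2)] for a in range(2)]
gamma1 = point_based_backup(B, gamma0, T, O, R, discount=0.9)
print(len(gamma1), value(gamma1, [1.0, 0.0]))
```

The returned set never holds more than $|B|$ vectors, in contrast to the cross-sum of the exact backup, while `value` can still be evaluated at beliefs outside $B$.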
2 Point-based algorithms

2.1 Exact point-based algorithms

Methods of this type typically cannot scale beyond a handful of states, actions and observations. These earlier techniques use point-based backups to optimize the value function over limited parts of the belief tree, looking for beliefs where the value function is not optimal. Therefore all reachable beliefs have to be considered, which makes this an expensive approach. Nonetheless, it is guaranteed to deliver the optimal solution.

2.2 Approximate point-based algorithms

2.2.1 Point-Based Value Iteration

Two main components are needed to achieve an anytime solution to large POMDP domains. These are the belief set selection and the point-based update procedure. The Point-Based Value Iteration algorithm (PBVI) starts with an initial set of belief points, to which it applies a first series of backups. It then grows the belief tree and does a new series of backup operations including old and new beliefs. This is repeated until a satisfactory solution is obtained. In this way PBVI gradually trades off computation time against solution quality. Even though it is not guaranteed that the value function improves with the addition of belief points, PBVI decreases or at least maintains the error bound with each step.

2.2.2 The Perseus algorithm

Perseus always uses randomly chosen points that are added to the belief tree. Value updates are not done all at once; instead, the points are randomly sampled and their values are updated one at a time. Because one updated vector of the value function can also improve the value of nearby points, these points are then removed from the sampling set as well. The algorithm continues until the value of all points has been improved.

2.2.3 Heuristic Search Value Iteration

The Heuristic Search Value Iteration algorithm (HSVI) keeps a lower and an upper bound for the value function, which it uses to select belief points. To perform a value update, it only updates the direct predecessors of the selected belief.
The HSVI algorithm offers anytime performance.

2.2.4 Real Time Belief Space Search

The Real Time Belief Space Search approach (RTBSS) constructs a new belief reachability tree by using the current point as the top node and terminating the tree at a fixed depth. This way, the value of each node can be calculated recursively over the finite planning horizon. The algorithm also deletes subtrees that exceed a calculated bound, compared to other subtrees. At this point, another algorithm like PBVI can be used to compute a lower bound, which improves the pruning of subtrees and thereby also the quality of the RTBSS solution. This approach computes results fast, although their quality is not as good as that of algorithms like PBVI or Perseus.

2.3 Strategies for selecting belief points

There are different methods for selecting new belief points. It is useful to check first whether the beliefs that are considered for a backup are actually reachable. Therefore a subset of reachable beliefs is created, starting with a known initial belief (see Figure 2). This subset should be sufficiently small for computational tractability and large enough for good value function approximation.
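The notion of a reachable belief subset can be made concrete with a small sketch that expands the belief tree from an initial belief to a fixed depth. The layout `T[s][a][s2]` and `O[s2][a][z]`, the function names, the rounding used to detect duplicates and the two-state model are all assumptions made for illustration.

```python
def successors(b, T, O):
    """All one-step successor beliefs tau(b, a, z) with Pr(z | b, a) > 0.

    Assumed layout (hypothetical): T[s][a][s2] and O[s2][a][z].
    """
    n_s, n_a, n_z = len(b), len(T[0]), len(O[0][0])
    result = []
    for a in range(n_a):
        predicted = [sum(T[s][a][s2] * b[s] for s in range(n_s))
                     for s2 in range(n_s)]
        for z in range(n_z):
            unnormalized = [O[s2][a][z] * predicted[s2] for s2 in range(n_s)]
            pr_z = sum(unnormalized)
            if pr_z > 0.0:
                result.append([u / pr_z for u in unnormalized])
    return result


def reachable_beliefs(b0, T, O, depth):
    """Beliefs reachable from b0 within `depth` steps (rounded to dedupe)."""
    seen = {tuple(round(p, 9) for p in b0)}
    frontier = [list(b0)]
    for _ in range(depth):
        next_frontier = []
        for b in frontier:
            for b2 in successors(b, T, O):
                key = tuple(round(p, 9) for p in b2)
                if key not in seen:
                    seen.add(key)
                    next_frontier.append(b2)
        frontier = next_frontier
    return [list(key) for key in seen]


# Hypothetical model: action 0 keeps the state, action 1 swaps it.
T = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
O = [[[0.8, 0.2], [0.8, 0.2]], [[0.2, 0.8], [0.2, 0.8]]]
tree = reachable_beliefs([0.5, 0.5], T, O, depth=1)
print(sorted(tree))
```

Even in this tiny example the full tree branches by $|A| \cdot |Z|$ per level before duplicates are merged, which is why the strategies below add only selected points.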
Figure 2: The belief tree shown includes reachable beliefs only.

2.3.1 Random Belief Selection

The simplest way to sample a new belief point is to choose it randomly from the entire belief simplex. The only thing to take care of here is to ensure a uniform coverage. This strategy works well in small domains, but since it cannot provide a good coverage of the belief simplex with a reasonable number of points, it exhibits poor performance in large domains.

2.3.2 Stochastic Simulation with Random Action

A better strategy is to add points along the belief tree. To generate these, an action is simulated, making a single-step forward trajectory from belief points already in the tree. Since this action is selected randomly, the belief tree will still be very large, especially when the branching factor is high.

2.3.3 Stochastic Simulation with Greedy Action

If the action is chosen such that the expected value gain at the new belief point is the largest of all value gains seen from the current belief, it is called a greedy action. Here the ε-greedy exploration strategy known from reinforcement learning is used to give the probability with which the greedy action is selected. Then the single-step forward simulation is done using the selected action.

2.3.4 Stochastic Simulation with Exploratory Action

Because POMDP algorithms perform best with a uniformly dense set of reachable beliefs, the new belief to be added to the belief tree should improve the worst-case density. To do this, the simulation with exploratory action does a single-step forward simulation with each action, but then keeps only the one point that is farthest away from all other belief points already in the belief tree.

2.3.5 Greedy Error Reduction

The most successful strategy for selecting new belief points tries to reduce the expected error. It first calculates the additional error introduced by a single belief point backup for each possible new point in the tree. Then the existing point with the largest error bound is needed, for which it is important to take the reachability probability of this point into account as well.
Finally, among that point's descendants, the one that would minimize the new error bound is selected (see Figure 3).

Footnote 2: Graphic from Pineau, Gordon & Thrun, Anytime Point-Based Approximations for POMDPs, Figure 2.
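As an illustration of the simulation-based strategies, the following sketch implements one expansion step in the spirit of Stochastic Simulation with Exploratory Action: for every belief already in the set, each action is simulated once and only the resulting belief farthest from the existing points is kept. The L1 distance, the data layout (`T[s][a][s2]`, `O[s2][a][z]`), the function names and the decision to also measure distance to points added earlier in the same step are assumptions of this sketch, not details given in the text.

```python
import random


def belief_update(b, a, z, T, O):
    """tau(b, a, z) with T[s][a][s2] and O[s2][a][z] (assumed layout)."""
    n = len(b)
    unnormalized = [
        O[s2][a][z] * sum(T[s][a][s2] * b[s] for s in range(n))
        for s2 in range(n)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def l1_distance(b1, b2):
    return sum(abs(x - y) for x, y in zip(b1, b2))


def expand_exploratory(B, T, O, rng):
    """Add at most one new belief per existing belief in B."""
    n_s, n_a, n_z = len(T), len(T[0]), len(O[0][0])
    new_points = []
    for b in B:
        best, best_dist = None, 0.0
        for a in range(n_a):
            # Single-step forward simulation: s ~ b, s2 ~ T, z ~ O.
            s = rng.choices(range(n_s), weights=b)[0]
            s2 = rng.choices(range(n_s), weights=T[s][a])[0]
            z = rng.choices(range(n_z), weights=O[s2][a])[0]
            candidate = belief_update(b, a, z, T, O)
            dist = min(l1_distance(candidate, other)
                       for other in B + new_points)
            if dist > best_dist:
                best, best_dist = candidate, dist
        if best is not None:
            new_points.append(best)
    return B + new_points


# Hypothetical model: action 0 keeps the state, action 1 swaps it.
T = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
O = [[[0.8, 0.2], [0.8, 0.2]], [[0.2, 0.8], [0.2, 0.8]]]
B0 = [[0.5, 0.5]]
B1 = expand_exploratory(B0, T, O, random.Random(0))
print(B1)
```

Replacing the farthest-point rule by a uniformly random action choice would give the random-action variant; a greedy variant would need the current value function and is omitted here.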
Figure 3: The marked points are the candidates to be added next.

3 Grid-based algorithms

3.1 Grid-based approximation

Many approaches are known for approximating the value function using a finite set of belief points. As the name grid-based approximation suggests, here the points are distributed over the belief space according to a grid pattern. The value of points not on the grid is specified by an interpolation-extrapolation rule that matches them to neighboring grid points. Thereby the convexity of the value function of POMDPs is ignored.

3.2 Strategies for selecting grid-points

To select the grid points needed here, an easy way is to lay a grid with fixed resolution over the belief simplex. Then only neighboring grid points are used to calculate the value interpolation. This is done quickly, but the number of points grows exponentially with the dimensionality of the belief space. Even simpler is the approach that selects random points over the whole belief tree, but that makes interpolation a lot harder. Neither method is ideal when beliefs are not uniformly distributed, which is the case in many real-life problems.

Further, there are approaches called non-regular grid approximations. One of them does single-step stochastic simulations starting at the corner points of the belief simplex to generate additional belief points. Another approach also builds a grid but starts at critical points of the belief simplex and then uses a heuristic to estimate the usefulness of intermediate points, which it adds step by step. A third one interpolates over the values at critical points of the grid. Though these methods require fewer beliefs, they are more expensive because interpolation over non-grid points requires searching over all grid points, rather than just neighboring ones.

A better approach creates sub-samples of the fixed-resolution grid fields where needed and in this way obtains a variable resolution over the whole grid. So it can sample some parts more densely while grid points are restricted to lie on the fixed-resolution grid.
The disadvantage of this algorithm is that it requires a large number of grid points to perform well. Another good algorithm can be applied to POMDPs with ε-optimality; it requires a thoroughly covered belief simplex, and therefore exponentially many grid points are needed. But the algorithm is really fast because it interpolates only over the nearest neighbor of a one-step successor belief for each grid point.

References

[1] Joelle Pineau, Geoff Gordon and Sebastian Thrun (2006). Anytime point-based approximations for large POMDPs. Journal of Artificial Intelligence Research, Vol. 27.

Footnote 3: Graphic from Pineau, Gordon & Thrun, Anytime Point-Based Approximations for POMDPs, Figure 3.
More informationBU Macro BU Macro Fall 2008, Lecture 4
Dynamic Programming BU Macro 2008 Lecure 4 1 Ouline 1. Cerainy opimizaion problem used o illusrae: a. Resricions on exogenous variables b. Value funcion c. Policy funcion d. The Bellman equaion and an
More informationMaintenance Models. Prof. Robert C. Leachman IEOR 130, Methods of Manufacturing Improvement Spring, 2011
Mainenance Models Prof Rober C Leachman IEOR 3, Mehods of Manufacuring Improvemen Spring, Inroducion The mainenance of complex equipmen ofen accouns for a large porion of he coss associaed wih ha equipmen
More informationIn this chapter the model of free motion under gravity is extended to objects projected at an angle. When you have completed it, you should
Cambridge Universiy Press 978--36-60033-7 Cambridge Inernaional AS and A Level Mahemaics: Mechanics Coursebook Excerp More Informaion Chaper The moion of projeciles In his chaper he model of free moion
More informationRapid Termination Evaluation for Recursive Subdivision of Bezier Curves
Rapid Terminaion Evaluaion for Recursive Subdivision of Bezier Curves Thomas F. Hain School of Compuer and Informaion Sciences, Universiy of Souh Alabama, Mobile, AL, U.S.A. Absrac Bézier curve flaening
More informationMODULE - 9 LECTURE NOTES 2 GENETIC ALGORITHMS
1 MODULE - 9 LECTURE NOTES 2 GENETIC ALGORITHMS INTRODUCTION Mos real world opimizaion problems involve complexiies like discree, coninuous or mixed variables, muliple conflicing objecives, non-lineariy,
More informationEnsamble methods: Bagging and Boosting
Lecure 21 Ensamble mehods: Bagging and Boosing Milos Hauskrech milos@cs.pi.edu 5329 Senno Square Ensemble mehods Mixure of expers Muliple base models (classifiers, regressors), each covers a differen par
More informationNotes for Lecture 17-18
U.C. Berkeley CS278: Compuaional Complexiy Handou N7-8 Professor Luca Trevisan April 3-8, 2008 Noes for Lecure 7-8 In hese wo lecures we prove he firs half of he PCP Theorem, he Amplificaion Lemma, up
More informationEXERCISES FOR SECTION 1.5
1.5 Exisence and Uniqueness of Soluions 43 20. 1 v c 21. 1 v c 1 2 4 6 8 10 1 2 2 4 6 8 10 Graph of approximae soluion obained using Euler s mehod wih = 0.1. Graph of approximae soluion obained using Euler
More informationKriging Models Predicting Atrazine Concentrations in Surface Water Draining Agricultural Watersheds
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Kriging Models Predicing Arazine Concenraions in Surface Waer Draining Agriculural Waersheds Paul L. Mosquin, Jeremy Aldworh, Wenlin Chen Supplemenal Maerial Number
More informationHamilton- J acobi Equation: Explicit Formulas In this lecture we try to apply the method of characteristics to the Hamilton-Jacobi equation: u t
M ah 5 2 7 Fall 2 0 0 9 L ecure 1 0 O c. 7, 2 0 0 9 Hamilon- J acobi Equaion: Explici Formulas In his lecure we ry o apply he mehod of characerisics o he Hamilon-Jacobi equaion: u + H D u, x = 0 in R n
More informationIntroduction to Mobile Robotics
Inroducion o Mobile Roboics Bayes Filer Kalman Filer Wolfram Burgard Cyrill Sachniss Giorgio Grisei Maren Bennewiz Chrisian Plagemann Bayes Filer Reminder Predicion bel p u bel d Correcion bel η p z bel
More informationLearning to Discover: A Bayesian Approach
Learning o Discover: A Bayesian Approach Zheng Wen Deparmen of Elecrical Engineering Sanford Universiy Sanford, CA zhengwen@sanford.edu Branislav Kveon and Sandilya Bhamidipai Technicolor Labs Palo Alo,
More informationSpeaker Adaptation Techniques For Continuous Speech Using Medium and Small Adaptation Data Sets. Constantinos Boulis
Speaker Adapaion Techniques For Coninuous Speech Using Medium and Small Adapaion Daa Ses Consaninos Boulis Ouline of he Presenaion Inroducion o he speaker adapaion problem Maximum Likelihood Sochasic Transformaions
More informationLearning to Take Concurrent Actions
Learning o Take Concurren Acions Khashayar Rohanimanesh Deparmen of Compuer Science Universiy of Massachuses Amhers, MA 0003 khash@cs.umass.edu Sridhar Mahadevan Deparmen of Compuer Science Universiy of
More informationDesigning Information Devices and Systems I Spring 2019 Lecture Notes Note 17
EES 16A Designing Informaion Devices and Sysems I Spring 019 Lecure Noes Noe 17 17.1 apaciive ouchscreen In he las noe, we saw ha a capacior consiss of wo pieces on conducive maerial separaed by a nonconducive
More informationOnline Convex Optimization Example And Follow-The-Leader
CSE599s, Spring 2014, Online Learning Lecure 2-04/03/2014 Online Convex Opimizaion Example And Follow-The-Leader Lecurer: Brendan McMahan Scribe: Sephen Joe Jonany 1 Review of Online Convex Opimizaion
More informationUnit Root Time Series. Univariate random walk
Uni Roo ime Series Univariae random walk Consider he regression y y where ~ iid N 0, he leas squares esimae of is: ˆ yy y y yy Now wha if = If y y hen le y 0 =0 so ha y j j If ~ iid N 0, hen y ~ N 0, he
More informationSubway stations energy and air quality management
Subway saions energy and air qualiy managemen wih sochasic opimizaion Trisan Rigau 1,2,4, Advisors: P. Carpenier 3, J.-Ph. Chancelier 2, M. De Lara 2 EFFICACITY 1 CERMICS, ENPC 2 UMA, ENSTA 3 LISIS, IFSTTAR
More informationNon-parametric techniques. Instance Based Learning. NN Decision Boundaries. Nearest Neighbor Algorithm. Distance metric important
on-parameric echniques Insance Based Learning AKA: neares neighbor mehods, non-parameric, lazy, memorybased, or case-based learning Copyrigh 2005 by David Helmbold 1 Do no fi a model (as do LTU, decision
More informationWORLD. States AGENT. Actions
Planning and Acing in Parially Observable Sochasic Domains Leslie Pack Kaelbling 1;2 Compuer Science Deparmen Brown Universiy, Box 1910 Providence, RI 02912-1910, USA Michael L. Liman 3 Duke Universiy
More informationAn recursive analytical technique to estimate time dependent physical parameters in the presence of noise processes
WHAT IS A KALMAN FILTER An recursive analyical echnique o esimae ime dependen physical parameers in he presence of noise processes Example of a ime and frequency applicaion: Offse beween wo clocks PREDICTORS,
More informationCS376 Computer Vision Lecture 6: Optical Flow
CS376 Compuer Vision Lecure 6: Opical Flow Qiing Huang Feb. 11 h 2019 Slides Credi: Krisen Grauman and Sebasian Thrun, Michael Black, Marc Pollefeys Opical Flow mage racking 3D compuaion mage sequence
More informationOff-policy TD(λ) with a true online equivalence
Off-policy TD(λ) wih a rue online equivalence Hado van Hassel A Rupam Mahmood Richard S Suon Reinforcemen Learning and Arificial Inelligence Laboraory Universiy of Albera, Edmonon, AB T6G 2E8 Canada Absrac
More informationChapter 7: Solving Trig Equations
Haberman MTH Secion I: The Trigonomeric Funcions Chaper 7: Solving Trig Equaions Le s sar by solving a couple of equaions ha involve he sine funcion EXAMPLE a: Solve he equaion sin( ) The inverse funcions
More informationApplication of a Stochastic-Fuzzy Approach to Modeling Optimal Discrete Time Dynamical Systems by Using Large Scale Data Processing
Applicaion of a Sochasic-Fuzzy Approach o Modeling Opimal Discree Time Dynamical Sysems by Using Large Scale Daa Processing AA WALASZE-BABISZEWSA Deparmen of Compuer Engineering Opole Universiy of Technology
More informationLecture 1 Overview. course mechanics. outline & topics. what is a linear dynamical system? why study linear systems? some examples
EE263 Auumn 27-8 Sephen Boyd Lecure 1 Overview course mechanics ouline & opics wha is a linear dynamical sysem? why sudy linear sysems? some examples 1 1 Course mechanics all class info, lecures, homeworks,
More informationLecture 2 October ε-approximation of 2-player zero-sum games
Opimizaion II Winer 009/10 Lecurer: Khaled Elbassioni Lecure Ocober 19 1 ε-approximaion of -player zero-sum games In his lecure we give a randomized ficiious play algorihm for obaining an approximae soluion
More informationA Primal-Dual Type Algorithm with the O(1/t) Convergence Rate for Large Scale Constrained Convex Programs
PROC. IEEE CONFERENCE ON DECISION AND CONTROL, 06 A Primal-Dual Type Algorihm wih he O(/) Convergence Rae for Large Scale Consrained Convex Programs Hao Yu and Michael J. Neely Absrac This paper considers
More informationProbabilistic Robotics SLAM
Probabilisic Roboics SLAM The SLAM Problem SLAM is he process by which a robo builds a map of he environmen and, a he same ime, uses his map o compue is locaion Localizaion: inferring locaion given a map
More informationEnergy Storage Benchmark Problems
Energy Sorage Benchmark Problems Daniel F. Salas 1,3, Warren B. Powell 2,3 1 Deparmen of Chemical & Biological Engineering 2 Deparmen of Operaions Research & Financial Engineering 3 Princeon Laboraory
More information23.2. Representing Periodic Functions by Fourier Series. Introduction. Prerequisites. Learning Outcomes
Represening Periodic Funcions by Fourier Series 3. Inroducion In his Secion we show how a periodic funcion can be expressed as a series of sines and cosines. We begin by obaining some sandard inegrals
More informationNon-parametric techniques. Instance Based Learning. NN Decision Boundaries. Nearest Neighbor Algorithm. Distance metric important
on-parameric echniques Insance Based Learning AKA: neares neighbor mehods, non-parameric, lazy, memorybased, or case-based learning Copyrigh 2005 by David Helmbold 1 Do no fi a model (as do LDA, logisic
More informationCMU-Q Lecture 3: Search algorithms: Informed. Teacher: Gianni A. Di Caro
CMU-Q 5-38 Lecure 3: Search algorihms: Informed Teacher: Gianni A. Di Caro UNINFORMED VS. INFORMED SEARCH Sraegy How desirable is o be in a cerain inermediae sae for he sake of (effecively) reaching a
More informationAppendix to Creating Work Breaks From Available Idleness
Appendix o Creaing Work Breaks From Available Idleness Xu Sun and Ward Whi Deparmen of Indusrial Engineering and Operaions Research, Columbia Universiy, New York, NY, 127; {xs2235,ww24}@columbia.edu Sepember
More informationPhysics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle
Chaper 2 Newonian Mechanics Single Paricle In his Chaper we will review wha Newon s laws of mechanics ell us abou he moion of a single paricle. Newon s laws are only valid in suiable reference frames,
More informationPENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD
PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD HAN XIAO 1. Penalized Leas Squares Lasso solves he following opimizaion problem, ˆβ lasso = arg max β R p+1 1 N y i β 0 N x ij β j β j (1.1) for some 0.
More informationRANDOM LAGRANGE MULTIPLIERS AND TRANSVERSALITY
ECO 504 Spring 2006 Chris Sims RANDOM LAGRANGE MULTIPLIERS AND TRANSVERSALITY 1. INTRODUCTION Lagrange muliplier mehods are sandard fare in elemenary calculus courses, and hey play a cenral role in economic
More informationFinal Spring 2007
.615 Final Spring 7 Overview The purpose of he final exam is o calculae he MHD β limi in a high-bea oroidal okamak agains he dangerous n = 1 exernal ballooning-kink mode. Effecively, his corresponds o
More informationThe equation to any straight line can be expressed in the form:
Sring Graphs Par 1 Answers 1 TI-Nspire Invesigaion Suden min Aims Deermine a series of equaions of sraigh lines o form a paern similar o ha formed by he cables on he Jerusalem Chords Bridge. Deermine he
More informationt is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t...
Mah 228- Fri Mar 24 5.6 Marix exponenials and linear sysems: The analogy beween firs order sysems of linear differenial equaions (Chaper 5) and scalar linear differenial equaions (Chaper ) is much sronger
More informationLinear Response Theory: The connection between QFT and experiments
Phys540.nb 39 3 Linear Response Theory: The connecion beween QFT and experimens 3.1. Basic conceps and ideas Q: How do we measure he conduciviy of a meal? A: we firs inroduce a weak elecric field E, and
More information