Planning in POMDPs
Dominik Schoenberger

Abstract

This document briefly explains what a Partially Observable Markov Decision Process is. Furthermore, it introduces the different approaches available to solve this class of problems. It also gives an overview of techniques for selecting belief points that make value backups more efficient.

1 The Partially Observable Markov Decision Process

While agents used in classical planning are concerned only with environments that are fully observable, the real world of robotic applications is generally a place where it is not possible to observe the whole environment or to take deterministic actions. The observed environment might not even be static. For planning under such uncertainty, it is necessary to improve robustness by explicitly reasoning about the types of uncertainty that can occur. The Partially Observable Markov Decision Process (POMDP) has become possibly the most general representation of this problem.

1.1 Benefits of POMDPs

This is because it combines the most essential features for planning under uncertainty. Whereas other frameworks handle either no uncertainty or only stochastic action effects, POMDPs handle uncertainty in both action effects and state observability. The latter is done by expressing partial state observability through information states instead of world states, since the world states are not directly observable. The measurements of noisy and imperfect sensors are used to calculate the information states, which form the beliefs a system has over its world state. These information states are represented by probability distributions over world states.

Many POMDP algorithms form plans by optimizing a value function. This allows a numerical trade-off between alternative ways to satisfy a goal, even multiple interacting goals, and the comparison of actions with different costs or rewards. POMDPs are unique in doing this over information states rather than world states. POMDPs produce a universal plan by giving a full policy for action selection, which prescribes the choice of action for any possible information state and alleviates the need for replanning. This makes execution faster.

1.2 Disadvantages of POMDPs

Nonetheless, this is also the main drawback of POMDPs, because generating a universal plan has a high computational complexity. Most algorithms for exact planning in POMDPs optimize the value function over all possible beliefs, a problem known to be PSPACE-complete for finite-horizon problems. As a result, many POMDP domains with only a few states, actions and sensor observations are already computationally intractable. By comparison, propositional planning (for plans of polynomially bounded length) is only NP-complete. To speed up POMDP solving, a commonly used technique is to perform value backups only at a selected set of belief points.

Another problem, which has long been a key impediment to using POMDPs in practical applications, is that, if the value function is expressed by a set of vectors, a vector in this set can be fully dominated by a set of other vectors (see Figure 1). Pruning these dominated vectors can be expensive.
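The difference between a vector that is dominated by one other vector and a vector that is dominated only by a combination of vectors can be illustrated with a small sketch. It assumes a two-state problem, so that a belief is fully described by $p = b(s_1)$, and it only samples beliefs on a grid, which is an approximation and not the exact linear-programming test used for pruning. The three vectors and the helper names `value`, `pointwise_dominated` and `useful_on_grid` are hypothetical and not taken from the source.

```python
def value(alpha, b):
    """Expected value of vector alpha at belief b: sum_s alpha(s) * b(s)."""
    return sum(a_s * b_s for a_s, b_s in zip(alpha, b))


def pointwise_dominated(alpha, others):
    """True if one single other vector is at least as good in every state."""
    return any(all(o_s >= a_s for o_s, a_s in zip(other, alpha))
               for other in others)


def useful_on_grid(alpha, others, steps=100):
    """True if alpha is strictly best at one of the sampled beliefs.

    Assumption: two states only, beliefs sampled as (1 - p, p) on a grid.
    """
    for k in range(steps + 1):
        p = k / steps
        b = (1.0 - p, p)
        if all(value(alpha, b) > value(other, b) for other in others):
            return True
    return False


# Hypothetical vectors: alpha_2 lies below the upper envelope of the others.
alpha_1, alpha_2, alpha_3 = (1.0, 0.0), (0.4, 0.4), (0.0, 1.0)

# No single vector dominates alpha_2 in every state ...
print(pointwise_dominated(alpha_2, [alpha_1, alpha_3]))
# ... yet alpha_2 is never the maximizing vector on the sampled beliefs.
print(useful_on_grid(alpha_2, [alpha_1, alpha_3]))
print(useful_on_grid(alpha_1, [alpha_2, alpha_3]))
```

The cheap pointwise test misses this case, which is why exact pruning has to reason about all beliefs at once and becomes expensive.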

Figure 1: Value function vector $\alpha_2$ is dominated by a combination of $\alpha_1$ and a second $\alpha$-vector. (Graphic from Pineau, Gordon & Thrun, Anytime Point-Based Approximations for POMDPs, Figure 1.)

1.3 Basic POMDP terminology

The POMDP is formally defined by six distinct quantities, denoted $\{S, A, T, Z, O, R\}$. These represent the following:

States S: A state of the world is denoted $s$, and the finite set of all states of a world is denoted $S = \{s_0, s_1, \ldots\}$. The state at time $t$ is denoted $s_t$, with $t$ being a discrete time index. Since the state of the world is not directly observable in POMDPs, an agent can only estimate which state it is in by computing a belief over the state space $S$.

Actions A: The agent is given a set of actions, denoted $A = \{a_0, a_1, \ldots\}$, which it can use to act in the world. These actions affect the state of the world stochastically. Choosing the right action is a function of the history, and this choice is the core problem in POMDPs.

Observations Z: Since a belief over the world state $s$ is needed, the agent derives this belief from sensor measurements. A set of measurements taken at the same time is called an observation $z$. The set of all observations is denoted $Z = \{z_0, z_1, \ldots\}$, where the observation at time $t$ is denoted $z_t$. Any observation $z_t$ is usually an incomplete and noisy projection of the world state $s_t$.

Reward function R: The function $R(s, a) : S \times A \to \mathbb{R}$ assigns the reward of performing an action $a$ in a state $s$. The agent tries to collect as much reward as possible over time, which means it tries to maximize

$$E\left[\sum_{t=t_0}^{T} \gamma^{t-t_0} r_t\right]$$

where $E[\cdot]$ is the mathematical expectation, $0 \le \gamma < 1$ is a discount factor ensuring that the sum is finite, and $r_t$ is the reward at time $t$.

State transition probability distribution T: Given that the agent is in state $s$ and selects action $a$, the probability of transitioning to state $s'$ is

$$T(s, a, s') := \Pr(s_t = s' \mid s_{t-1} = s, a_{t-1} = a), \quad \forall (s, a, s').$$

$T$ is a conditional probability distribution, which means that $\sum_{s' \in S} T(s, a, s') = 1$, $\forall (s, a)$. $T$ is also time-invariant.

Observation probability distribution O: Upon executing action $a$ in state $s$, the probability that the agent will perceive observation $z$ is

$$O(s, a, z) := \Pr(z_t = z \mid s_{t-1} = s, a_{t-1} = a), \quad \forall (s, a, z).$$

$O$ is also a conditional probability distribution, with $\sum_{z \in Z} O(s, a, z) = 1$, $\forall (s, a)$, and it is also time-invariant.

1.4 Belief computation

Since POMDPs are instances of Markov processes, the current world state $s_t$ is sufficient to predict the future independently of the past $\{s_0, s_1, \ldots, s_{t-1}\}$. Unfortunately, the agent in a POMDP can only perceive observations $\{z_0, \ldots, z_t\}$, because the state is not directly observable. This is why the agent instead has to compute a belief over the world state, using a complete trace of all observations and all actions ever executed. That trace is called a history, here at time $t$:

$$h_t := \{z_0, a_0, z_1, \ldots, z_{t-1}, a_{t-1}, z_t\}$$

If an initial state probability distribution $b_0(s) := \Pr(s_0 = s)$ is available to the agent, the history can also be summarized via a belief distribution $b_t(s) := \Pr(s_t = s \mid z_t, a_{t-1}, z_{t-1}, \ldots, a_0, b_0)$ instead of being represented explicitly. The belief $b_t$ can be calculated recursively using only the last belief $b_{t-1}$, the last action $a_{t-1}$ and the current observation $z_t$. The belief update equation $\tau(\cdot)$ is defined as follows, equivalent to that of the Bayes filter:

$$\tau(b_{t-1}, a_{t-1}, z_t) = b_t(s') = \frac{\sum_{s \in S} O(s, a_{t-1}, z_t)\, T(s, a_{t-1}, s')\, b_{t-1}(s)}{\Pr(z_t \mid b_{t-1}, a_{t-1})}$$

The denominator is a normalizing constant.

1.5 Optimal policy computation

Computing a policy for selecting actions is the central objective in a POMDP. The policy $\pi(b) \to a$ chooses an action $a$ for a belief distribution $b$. Since the agent wants to maximize the expected future discounted cumulative reward, the optimal policy is

$$\pi^*(b_{t_0}) = \arg\max_{\pi} E_{\pi}\left[\sum_{t=t_0}^{T} \gamma^{t-t_0} r_t \,\middle|\, b_{t_0}\right]$$

A straightforward approach to finding an optimal policy is to apply multiple iterations that compute increasingly accurate values for each belief state $b$. For this, a value function $V$ is needed, which maps belief states to values.
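Both the belief update $\tau$ and its normalizer $\Pr(z \mid b, a)$ are reused by the value recursion, so a minimal sketch of the update may help. It follows this document's convention that the observation depends on the previous state $s$, and it assumes nested lists indexed as `T[s][a][s_next]` and `O[s][a][z]`. The function name `belief_update` and the tiny two-state model are hypothetical.

```python
def belief_update(b, a, z, T, O):
    """Return (b_next, Pr(z | b, a)) for belief b, action a, observation z.

    Assumed layout: T[s][a][s_next] and O[s][a][z], where O depends on the
    previous state s, following the definition used in this document.
    """
    n = len(b)
    unnormalized = [
        sum(O[s][a][z] * T[s][a][s_next] * b[s] for s in range(n))
        for s_next in range(n)
    ]
    pr_z = sum(unnormalized)  # the normalizing constant Pr(z | b, a)
    if pr_z == 0.0:
        raise ValueError("observation has zero probability for this belief and action")
    return [u / pr_z for u in unnormalized], pr_z


# Hypothetical model: one action that leaves the state unchanged and
# reports the true state with probability 0.85.
T = [[[1.0, 0.0]], [[0.0, 1.0]]]
O = [[[0.85, 0.15]], [[0.15, 0.85]]]

b_next, pr_z = belief_update([0.5, 0.5], a=0, z=0, T=T, O=O)
print(b_next, pr_z)
```

Starting from a uniform belief, observing `z = 0` shifts most of the probability mass to the first state, and the returned normalizer is the probability of having seen that observation.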
The initial value function is:

$$V_0(b) = \max_{a} \sum_{s \in S} R(s, a)\, b(s)$$

For each iteration $t$, the value function is computed recursively and maximizes the sum of all future rewards for any belief state $b$ within $t$ time steps:

$$V_t(b) = \max_{a} \left[\sum_{s \in S} R(s, a)\, b(s) + \gamma \sum_{z \in Z} \Pr(z \mid a, b)\, V_{t-1}(\tau(b, a, z))\right]$$

This way it produces a policy that is optimal for the same planning horizon $t$:

$$\pi_t^*(b) = \arg\max_{a} \left[\sum_{s \in S} R(s, a)\, b(s) + \gamma \sum_{z \in Z} \Pr(z \mid a, b)\, V_{t-1}(\tau(b, a, z))\right]$$

Each of these value functions, at any planning horizon $t$, can be expressed by a set of vectors $\Gamma_t = \{\alpha_0, \alpha_1, \ldots, \alpha_m\}$, each vector representing an $|S|$-dimensional hyperplane and defining the value function over a bounded region of the belief space:

$$V_t(b) = \max_{\alpha \in \Gamma_t} \sum_{s \in S} \alpha(s)\, b(s)$$

Each of these $\alpha$-vectors is then associated with an action $a$ to create a policy that already assumes optimal behavior for the following $t-1$ steps:

$$V_t(b) = \max_{a \in A} \left[\sum_{s \in S} R(s, a)\, b(s) + \gamma \sum_{z \in Z} \max_{\alpha \in \Gamma_{t-1}} \sum_{s \in S} \sum_{s' \in S} T(s, a, s')\, O(s, a, z)\, \alpha(s')\, b(s)\right]$$

$V_t(b)$ cannot be computed directly for each belief, because there are infinitely many beliefs. However, the corresponding set $\Gamma_t$ can be generated by a sequence of operations on the previous set $\Gamma_{t-1}$. For each action $a$ and each observation $z$, the set $\Gamma_t^{a,*}$ is computed along with the intermediate sets $\Gamma_t^{a,z}$:

$$\Gamma_t^{a,*} \leftarrow \alpha^{a,*}(s) = R(s, a)$$

$$\Gamma_t^{a,z} \leftarrow \alpha_i^{a,z}(s) = \gamma \sum_{s' \in S} T(s, a, s')\, O(s, a, z)\, \alpha_i(s'), \quad \forall \alpha_i \in \Gamma_{t-1}$$

Next, the cross-sum over observations $\Gamma_t^a$, $\forall a \in A$, is created, including one $\alpha^{a,z}$ from each $\Gamma_t^{a,z}$:

$$\Gamma_t^a = \Gamma_t^{a,*} + \Gamma_t^{a,z_1} \oplus \Gamma_t^{a,z_2} \oplus \ldots$$

and the union is taken of all $\Gamma_t^a$ sets:

$$\Gamma_t = \bigcup_{a \in A} \Gamma_t^a$$

In this form, the pieces of the solution for the value function at horizon $t$ can be backed up. To extract the value function from the set $\Gamma_t$, the $\alpha$-vectors are applied to the equation for $V_t(b)$ from above:

$$V_t(b) = \max_{\alpha \in \Gamma_t} \sum_{s \in S} \alpha(s)\, b(s)$$

1.6 Point-based value backup

While there are many different approaches to selecting the belief points to be updated, the procedure for performing the update is standard for all of them and is implemented as a sequence of operations on a set of $\alpha$-vectors. Since the update of the value function is only applied at a fixed set of belief points $B = \{b_0, b_1, \ldots, b_q\}$, there is a corresponding set of vectors $\{\alpha_0, \alpha_1, \ldots, \alpha_q\}$ containing at most one vector for each belief. It is assumed that the belief points in a region around $b$ have the same action choice and also lead to the same facets of $V_{t-1}$ as the point $b$ itself. For each point, only one $\alpha$-vector is maintained, obtained from the given solution set $\Gamma_{t-1}$ by the point-based backup.
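To see why restricting the update to a fixed set of belief points is attractive, the exact backup of Section 1.5 can be sketched first. The cross-sum picks one projected vector per observation, so without pruning one exact step generates $|A|\,|\Gamma_{t-1}|^{|Z|}$ vectors. The sketch below assumes the layout `T[s][a][s_next]`, `O[s][a][z]`, `R[s][a]`, and uses a hypothetical two-state "guess the state" model; the name `exact_backup` is illustrative, and no pruning is performed.

```python
from itertools import product


def exact_backup(Gamma_prev, S, A, Z, T, O, R, gamma):
    """One exact value backup: returns Gamma_t built from Gamma_{t-1}, unpruned."""
    Gamma = []
    for a in A:
        alpha_star = [R[s][a] for s in S]  # the single vector in Gamma^{a,*}
        # Gamma^{a,z}: one projected vector per alpha_i in Gamma_prev
        projected = {
            z: [[gamma * sum(T[s][a][s2] * O[s][a][z] * alpha[s2] for s2 in S)
                 for s in S]
                for alpha in Gamma_prev]
            for z in Z
        }
        # Cross-sum: choose one vector from each Gamma^{a,z} and add them up.
        for choice in product(*(projected[z] for z in Z)):
            Gamma.append([alpha_star[s] + sum(v[s] for v in choice) for s in S])
    return Gamma


def value(Gamma, b):
    """V(b) = max over alpha in Gamma of sum_s alpha(s) b(s)."""
    return max(sum(x * y for x, y in zip(alpha, b)) for alpha in Gamma)


# Hypothetical model: the state never changes, action a is rewarded if it
# names the true state, and the observation is correct with probability 0.8.
S, A, Z = range(2), range(2), range(2)
T = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
O = [[[0.8, 0.2], [0.8, 0.2]], [[0.2, 0.8], [0.2, 0.8]]]
R = [[1.0, 0.0], [0.0, 1.0]]

Gamma_0 = [[R[s][a] for s in S] for a in A]  # V_0(b) = max_a sum_s R(s,a) b(s)
Gamma_1 = exact_backup(Gamma_0, S, A, Z, T, O, R, gamma=0.9)
print(len(Gamma_1), value(Gamma_1, (0.5, 0.5)))
```

With two actions, two observations and two previous vectors, a single step already yields eight vectors, most of which are never maximal at any belief.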
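To obtain the next solution set $\Gamma_t$ with the point-based backup, the set $\Gamma_t^{a,*}$ is generated for all actions:

$$\Gamma_t^{a,*} \leftarrow \alpha^{a,*}(s) = R(s, a)$$

and the same is done for $\Gamma_t^{a,z}$, for all actions and observations:

$$\Gamma_t^{a,z} \leftarrow \alpha_i^{a,z}(s) = \gamma \sum_{s' \in S} T(s, a, s')\, O(s, a, z)\, \alpha_i(s'), \quad \forall \alpha_i \in \Gamma_{t-1}$$

Next, instead of a cross-sum, a simple summation is calculated to get $\Gamma_t^a$, $\forall a \in A$:

$$\Gamma_t^a \leftarrow \alpha_b^a = \Gamma_t^{a,*} + \sum_{z \in Z} \arg\max_{\alpha \in \Gamma_t^{a,z}} \left(\sum_{s \in S} \alpha(s)\, b(s)\right), \quad \forall b \in B$$

Finally, the best action is needed for each belief point:

$$\alpha_b = \arg\max_{\alpha_b^a,\; a \in A} \left(\sum_{s \in S} \alpha_b^a(s)\, b(s)\right), \quad \forall b \in B$$

and the solution set is created from these:

$$\Gamma_t = \bigcup_{b \in B} \alpha_b$$

Although the operations above preserve only the best $\alpha$-vector for each belief point $b \in B$, an estimate of the value function at any belief $b \notin B$ can be calculated from $\Gamma_t$ by again using:

$$V_t(b) = \max_{\alpha \in \Gamma_t} \sum_{s \in S} \alpha(s)\, b(s)$$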
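The point-based backup above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a tuned implementation: the model is stored as nested lists `T[s][a][s_next]`, `O[s][a][z]`, `R[s][a]`, the observation depends on the previous state as in this document's definition, and the names `dot`, `project` and `point_based_backup` as well as the toy model are hypothetical.

```python
def dot(alpha, b):
    """sum_s alpha(s) * b(s)."""
    return sum(x * y for x, y in zip(alpha, b))


def project(alpha, a, z, S, T, O, gamma):
    """alpha^{a,z}(s) = gamma * sum_{s'} T(s,a,s') O(s,a,z) alpha(s')."""
    return [gamma * sum(T[s][a][s2] * O[s][a][z] * alpha[s2] for s2 in S)
            for s in S]


def point_based_backup(B, Gamma_prev, S, A, Z, T, O, R, gamma):
    """Return Gamma_t with at most one alpha-vector per belief point in B."""
    Gamma = []
    for b in B:
        best = None
        for a in A:
            alpha_ab = [R[s][a] for s in S]  # start from Gamma^{a,*}
            for z in Z:
                # Simple summation: add only the projected vector that is
                # best at this particular belief point.
                candidates = [project(alpha, a, z, S, T, O, gamma)
                              for alpha in Gamma_prev]
                best_z = max(candidates, key=lambda v: dot(v, b))
                alpha_ab = [x + y for x, y in zip(alpha_ab, best_z)]
            if best is None or dot(alpha_ab, b) > dot(best, b):
                best = alpha_ab  # best action so far for this belief
        if best not in Gamma:
            Gamma.append(best)
    return Gamma


# Hypothetical model: the state never changes, action a is rewarded if it
# names the true state, and the observation is correct with probability 0.8.
S, A, Z = range(2), range(2), range(2)
T = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
O = [[[0.8, 0.2], [0.8, 0.2]], [[0.2, 0.8], [0.2, 0.8]]]
R = [[1.0, 0.0], [0.0, 1.0]]

B = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
Gamma_0 = [[R[s][a] for s in S] for a in A]
Gamma_1 = point_based_backup(B, Gamma_0, S, A, Z, T, O, R, gamma=0.9)
v_mid = max(dot(alpha, (0.5, 0.5)) for alpha in Gamma_1)
print(len(Gamma_1), v_mid)
```

The work per backup is polynomial in $|B|$, $|A|$, $|Z|$, $|S|$ and $|\Gamma_{t-1}|$, and the size of the result never exceeds $|B|$.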

2 Point-based algorithms

2.1 Exact point-based algorithms

This type of method typically cannot scale beyond a handful of states, actions and observations. Earlier techniques of this kind use point-based backups to optimize the value function over limited parts of the belief tree, looking for beliefs where the value function is not optimal. To do so, all reachable beliefs have to be considered, which makes this an expensive approach. Nonetheless, it is guaranteed to deliver the optimal solution.

2.2 Approximate point-based algorithms

2.2.1 Point-Based Value Iteration

Two main components are needed to achieve an anytime solution to large POMDP domains: the belief set selection and the point-based update procedure, both of which are part of this algorithm. The Point-Based Value Iteration algorithm (PBVI) starts with an initial set of belief points, to which it applies a first series of backups. It then grows the belief tree and performs a new series of backup operations including old and new beliefs. This is repeated until a satisfactory solution is obtained. In this way, PBVI gradually trades off computation time and solution quality. Even though it is not guaranteed that the value function improves with the addition of belief points, the error bound decreases, or at least does not increase, with each step.

2.2.2 The Perseus algorithm

Perseus always uses randomly chosen points that are added to the belief tree. Value updates are not done all at once; instead, points are randomly sampled and their values are updated one at a time. Because one updated value function vector can also improve the value of nearby points, these points are then removed from the sampling set as well. The algorithm continues until the value of all points has been improved.

2.2.3 Heuristic Search Value Iteration

The Heuristic Search Value Iteration algorithm (HSVI) keeps a lower and an upper bound on the value function, which it uses to select belief points. To perform a value update, it only updates the direct predecessors of the selected belief. The HSVI algorithm offers anytime performance.

2.2.4 Real Time Belief Space Search

The Real Time Belief Space Search approach (RTBSS) constructs a new belief reachability tree by using the current belief point as the top node and terminating the tree at a fixed depth. This way, the value of each node can be calculated recursively over the finite planning horizon. The algorithm also eliminates subtrees by calculating a bound on their value and comparing it to the value of other subtrees. At this point, another algorithm such as PBVI can be used to compute a lower bound, which improves the pruning of subtrees and thereby the quality of the RTBSS solution. This approach is able to compute results fast, although their quality is not as good as that of algorithms like PBVI or Perseus.

2.3 Strategies for selecting belief points

There are different methods for selecting new belief points. It is useful to check first whether the beliefs considered for a backup are actually reachable. Therefore, a subset of reachable beliefs is created, starting from a known initial belief (see Figure 2). This subset should be sufficiently small for computational tractability and large enough for good value function approximation.
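Growing the set of reachable beliefs from a known initial belief can be sketched as a one-step expansion over all actions and observations. This is an illustrative sketch only: it assumes the layout `T[s][a][s_next]` and `O[s][a][z]` with the observation depending on the previous state, it merges beliefs that agree after rounding, and the names `successor` and `expand_reachable` and the toy model are hypothetical.

```python
def successor(b, a, z, T, O):
    """Belief update tau(b, a, z); returns None if Pr(z | b, a) is zero."""
    n = len(b)
    unnormalized = [
        sum(O[s][a][z] * T[s][a][s2] * b[s] for s in range(n))
        for s2 in range(n)
    ]
    pr_z = sum(unnormalized)
    if pr_z == 0.0:
        return None
    return [u / pr_z for u in unnormalized]


def expand_reachable(B, A, Z, T, O, digits=9):
    """Return B plus every new one-step successor of a belief in B."""
    def key(b):
        return tuple(round(x, digits) for x in b)

    result = [key(b) for b in B]
    seen = set(result)
    for b in B:
        for a in A:
            for z in Z:
                succ = successor(b, a, z, T, O)
                if succ is not None and key(succ) not in seen:
                    seen.add(key(succ))
                    result.append(key(succ))
    return result


# Hypothetical model: the state never changes; action 0 observes the state
# with 80% accuracy, action 1 yields an uninformative observation.
A, Z = range(2), range(2)
T = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
O = [[[0.8, 0.2], [0.5, 0.5]], [[0.2, 0.8], [0.5, 0.5]]]

level_1 = expand_reachable([(0.5, 0.5)], A, Z, T, O)
level_2 = expand_reachable(level_1, A, Z, T, O)
print(level_1)
print(len(level_2))
```

Even in this toy model the set keeps growing with depth, which is why the strategies below add only a few selected successors per step.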

Figure 2: The shown belief tree includes reachable beliefs only. (Graphic from Pineau, Gordon & Thrun, Anytime Point-Based Approximations for POMDPs, Figure 2.)

2.3.1 Random Belief Selection

The simplest way to sample a new belief point is to choose it randomly from the entire belief simplex. The only thing to take care of here is to ensure uniform coverage. This strategy works well in small domains, but since it cannot provide good coverage of the belief simplex with a reasonable number of points, it exhibits poor performance in large domains.

2.3.2 Stochastic Simulation with Random Action

A better strategy is to add points along the belief tree. To generate these, an action is simulated, making a single-step forward trajectory from belief points already in the tree. Since this action is selected randomly, the belief tree will still be very large, especially when the branching factor is high.

2.3.3 Stochastic Simulation with Greedy Action

If the action is chosen such that the expected value gain at the new belief point is the largest of all value gains seen from the current belief, it is called a greedy action. Here the ε-greedy exploration strategy known from reinforcement learning is used to set the probability with which the greedy action is selected. Then the single-step forward simulation is done using the selected action.

2.3.4 Stochastic Simulation with Exploratory Action

Because POMDP algorithms perform best with a uniformly dense set of reachable beliefs, the new belief added to the belief tree should improve the worst-case density. To do this, the simulation with exploratory action does a single-step forward simulation with each action, but then keeps only the one point that is farthest away from all belief points already in the belief tree.

2.3.5 Greedy Error Reduction

The most successful strategy for selecting new belief points tries to reduce the expected error. It first calculates the additional error introduced by a single belief point backup for each possible new point in the tree. Then the existing point with the largest error bound is determined, for which it is important to take the reachability probability of this point into account as well. Finally, among that point's descendants, the one that would minimize the new error bound is selected (see Figure 3).
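As an illustration of the simulation-based strategies, the exploratory-action variant of Section 2.3.4 can be sketched as follows. The sketch makes several assumptions that are not in the source: distance is measured with the L1 norm, the observation is sampled from `O[s][a]` with `s` drawn from the current belief (matching this document's definition of $O$), and the names `belief_update`, `l1_distance` and `explore_step` as well as the toy model are hypothetical.

```python
import random


def belief_update(b, a, z, T, O):
    """tau(b, a, z) with T[s][a][s_next] and O[s][a][z] (assumed layout)."""
    n = len(b)
    unnormalized = [
        sum(O[s][a][z] * T[s][a][s2] * b[s] for s in range(n))
        for s2 in range(n)
    ]
    pr_z = sum(unnormalized)
    return [u / pr_z for u in unnormalized]


def l1_distance(b1, b2):
    return sum(abs(x - y) for x, y in zip(b1, b2))


def explore_step(b, B, A, T, O, rng):
    """Simulate one step per action from b and keep the farthest successor.

    Returns the successor with the largest L1 distance to its nearest
    neighbor in B, or None if every successor already lies in B.
    """
    best, best_dist = None, 0.0
    for a in A:
        s = rng.choices(range(len(b)), weights=b)[0]              # hidden state
        z = rng.choices(range(len(O[s][a])), weights=O[s][a])[0]  # observation
        b_a = belief_update(b, a, z, T, O)
        dist = min(l1_distance(b_a, other) for other in B)
        if dist > best_dist:
            best, best_dist = b_a, dist
    return best


# Hypothetical model: the state never changes; action 0 observes the state
# with 80% accuracy, action 1 yields an uninformative observation.
T = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
O = [[[0.8, 0.2], [0.5, 0.5]], [[0.2, 0.8], [0.5, 0.5]]]

B = [(0.5, 0.5)]
new_belief = explore_step(B[0], B, range(2), T, O, random.Random(0))
print(new_belief)
```

In this model the uninformative action always leads back to the current belief, so the informative action's successor is the one that gets added, whichever observation happens to be sampled.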

Figure 3: The marked points are the candidates to be added next. (Graphic from Pineau, Gordon & Thrun, Anytime Point-Based Approximations for POMDPs, Figure 3.)

3 Grid-based algorithms

3.1 Grid-based approximation

Many approaches are known for approximating the value function using a finite set of belief points. As the name grid-based approximation suggests, the points here are distributed over the belief space according to a grid pattern. The value of points not on the grid is specified by an interpolation-extrapolation rule matching them to neighboring grid points. In doing so, the convexity of the value function of POMDPs is ignored.

3.2 Strategies for selecting grid points

An easy way to select the grid points is to lay a grid with fixed resolution over the belief simplex. Then only neighboring grid points are needed to calculate the value interpolation. This is done quickly, but the number of points grows exponentially with the dimensionality of the belief space. Even simpler is the approach that selects random points over the whole belief space, but that makes interpolation a lot harder. Neither method is ideal when beliefs are not uniformly distributed, which is characteristic of many real-life problems.

Furthermore, there are approaches called non-regular grid approximations. One of them does single-step stochastic simulations starting at the corner points of the belief simplex to generate additional belief points. Another approach also builds a grid, but starts at critical points of the belief simplex and then uses a heuristic to estimate the usefulness of intermediate points, which it adds step by step. A third one interpolates over the values at critical points of the grid. Though these methods require fewer beliefs, they are more expensive, because interpolation over non-grid points requires searching over all grid points rather than just neighboring ones.

A better approach creates sub-samples of the fixed-resolution grid where needed and in this way obtains a variable resolution over the whole grid. It can thus sample some parts more densely while grid points are restricted to lie on the fixed-resolution grid. The disadvantage of this algorithm is that it requires a large number of grid points to perform well.

Another algorithm provides ε-optimality for POMDPs, but requires a thoroughly covered belief simplex, so exponentially many grid points are needed. However, the algorithm is fast, because it interpolates only over the nearest neighbor of the one-step successor belief for each grid point.

References

[1] Joelle Pineau, Geoff Gordon and Sebastian Thrun (2006). Anytime point-based approximations for large POMDPs. Journal of Artificial Intelligence Research, Vol. 27.

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB Elecronic Companion EC.1. Proofs of Technical Lemmas and Theorems LEMMA 1. Le C(RB) be he oal cos incurred by he RB policy. Then we have, T L E[C(RB)] 3 E[Z RB ]. (EC.1) Proof of Lemma 1. Using he marginal

More information

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017 Two Popular Bayesian Esimaors: Paricle and Kalman Filers McGill COMP 765 Sep 14 h, 2017 1 1 1, dx x Bel x u x P x z P Recall: Bayes Filers,,,,,,, 1 1 1 1 u z u x P u z u x z P Bayes z = observaion u =

More information

An introduction to the theory of SDDP algorithm

An introduction to the theory of SDDP algorithm An inroducion o he heory of SDDP algorihm V. Leclère (ENPC) Augus 1, 2014 V. Leclère Inroducion o SDDP Augus 1, 2014 1 / 21 Inroducion Large scale sochasic problem are hard o solve. Two ways of aacking

More information

Sequential Importance Resampling (SIR) Particle Filter

Sequential Importance Resampling (SIR) Particle Filter Paricle Filers++ Pieer Abbeel UC Berkeley EECS Many slides adaped from Thrun, Burgard and Fox, Probabilisic Roboics 1. Algorihm paricle_filer( S -1, u, z ): 2. Sequenial Imporance Resampling (SIR) Paricle

More information

Notes on Kalman Filtering

Notes on Kalman Filtering Noes on Kalman Filering Brian Borchers and Rick Aser November 7, Inroducion Daa Assimilaion is he problem of merging model predicions wih acual measuremens of a sysem o produce an opimal esimae of he curren

More information

RL Lecture 7: Eligibility Traces. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1

RL Lecture 7: Eligibility Traces. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1 RL Lecure 7: Eligibiliy Traces R. S. Suon and A. G. Baro: Reinforcemen Learning: An Inroducion 1 N-sep TD Predicion Idea: Look farher ino he fuure when you do TD backup (1, 2, 3,, n seps) R. S. Suon and

More information

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles Diebold, Chaper 7 Francis X. Diebold, Elemens of Forecasing, 4h Ediion (Mason, Ohio: Cengage Learning, 006). Chaper 7. Characerizing Cycles Afer compleing his reading you should be able o: Define covariance

More information

Vehicle Arrival Models : Headway

Vehicle Arrival Models : Headway Chaper 12 Vehicle Arrival Models : Headway 12.1 Inroducion Modelling arrival of vehicle a secion of road is an imporan sep in raffic flow modelling. I has imporan applicaion in raffic flow simulaion where

More information

L07. KALMAN FILTERING FOR NON-LINEAR SYSTEMS. NA568 Mobile Robotics: Methods & Algorithms

L07. KALMAN FILTERING FOR NON-LINEAR SYSTEMS. NA568 Mobile Robotics: Methods & Algorithms L07. KALMAN FILTERING FOR NON-LINEAR SYSTEMS NA568 Mobile Roboics: Mehods & Algorihms Today s Topic Quick review on (Linear) Kalman Filer Kalman Filering for Non-Linear Sysems Exended Kalman Filer (EKF)

More information

Zürich. ETH Master Course: L Autonomous Mobile Robots Localization II

Zürich. ETH Master Course: L Autonomous Mobile Robots Localization II Roland Siegwar Margaria Chli Paul Furgale Marco Huer Marin Rufli Davide Scaramuzza ETH Maser Course: 151-0854-00L Auonomous Mobile Robos Localizaion II ACT and SEE For all do, (predicion updae / ACT),

More information

1 Review of Zero-Sum Games

1 Review of Zero-Sum Games COS 5: heoreical Machine Learning Lecurer: Rob Schapire Lecure #23 Scribe: Eugene Brevdo April 30, 2008 Review of Zero-Sum Games Las ime we inroduced a mahemaical model for wo player zero-sum games. Any

More information

Exploiting Symmetries in POMDPs for Point-Based Algorithms

Exploiting Symmetries in POMDPs for Point-Based Algorithms Proceedings of he Tweny-Third AAAI Conference on Arificial Inelligence (2008) Exploiing Symmeries in POMDPs for Poin-Based Algorihms Kee-Eung Kim Deparmen of Compuer Science Korea Advanced Insiue of Science

More information

3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon

3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon 3..3 INRODUCION O DYNAMIC OPIMIZAION: DISCREE IME PROBLEMS A. he Hamilonian and Firs-Order Condiions in a Finie ime Horizon Define a new funcion, he Hamilonian funcion, H. H he change in he oal value of

More information

Tom Heskes and Onno Zoeter. Presented by Mark Buller

Tom Heskes and Onno Zoeter. Presented by Mark Buller Tom Heskes and Onno Zoeer Presened by Mark Buller Dynamic Bayesian Neworks Direced graphical models of sochasic processes Represen hidden and observed variables wih differen dependencies Generalize Hidden

More information

Estimation of Poses with Particle Filters

Estimation of Poses with Particle Filters Esimaion of Poses wih Paricle Filers Dr.-Ing. Bernd Ludwig Chair for Arificial Inelligence Deparmen of Compuer Science Friedrich-Alexander-Universiä Erlangen-Nürnberg 12/05/2008 Dr.-Ing. Bernd Ludwig (FAU

More information

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter Sae-Space Models Iniializaion, Esimaion and Smoohing of he Kalman Filer Iniializaion of he Kalman Filer The Kalman filer shows how o updae pas predicors and he corresponding predicion error variances when

More information

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model Modal idenificaion of srucures from roving inpu daa by means of maximum likelihood esimaion of he sae space model J. Cara, J. Juan, E. Alarcón Absrac The usual way o perform a forced vibraion es is o fix

More information

INTRODUCTION TO MACHINE LEARNING 3RD EDITION

INTRODUCTION TO MACHINE LEARNING 3RD EDITION ETHEM ALPAYDIN The MIT Press, 2014 Lecure Slides for INTRODUCTION TO MACHINE LEARNING 3RD EDITION alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/~ehem/i2ml3e CHAPTER 2: SUPERVISED LEARNING Learning a Class

More information

Article from. Predictive Analytics and Futurism. July 2016 Issue 13

Article from. Predictive Analytics and Futurism. July 2016 Issue 13 Aricle from Predicive Analyics and Fuurism July 6 Issue An Inroducion o Incremenal Learning By Qiang Wu and Dave Snell Machine learning provides useful ools for predicive analyics The ypical machine learning

More information

Simulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010

Simulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Simulaion-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Week Descripion Reading Maerial 2 Compuer Simulaion of Dynamic Models Finie Difference, coninuous saes, discree ime Simple Mehods Euler Trapezoid

More information

A Shooting Method for A Node Generation Algorithm

A Shooting Method for A Node Generation Algorithm A Shooing Mehod for A Node Generaion Algorihm Hiroaki Nishikawa W.M.Keck Foundaion Laboraory for Compuaional Fluid Dynamics Deparmen of Aerospace Engineering, Universiy of Michigan, Ann Arbor, Michigan

More information

Presentation Overview

Presentation Overview Acion Refinemen in Reinforcemen Learning by Probabiliy Smoohing By Thomas G. Dieerich & Didac Busques Speaer: Kai Xu Presenaion Overview Bacground The Probabiliy Smoohing Mehod Experimenal Sudy of Acion

More information

Online Appendix to Solution Methods for Models with Rare Disasters

Online Appendix to Solution Methods for Models with Rare Disasters Online Appendix o Soluion Mehods for Models wih Rare Disasers Jesús Fernández-Villaverde and Oren Levinal In his Online Appendix, we presen he Euler condiions of he model, we develop he pricing Calvo block,

More information

Efficient POMDP Forward Search by Predicting the Posterior Belief Distribution Ruijie He and Nicholas Roy

Efficient POMDP Forward Search by Predicting the Posterior Belief Distribution Ruijie He and Nicholas Roy Compuer Science and Arificial Inelligence Laboraory Technical Repor MIT-CSAIL-TR-2009-044 Sepember 23, 2009 Efficien POMDP Forward Search by Predicing he Poserior Belief Disribuion Ruijie He and Nicholas

More information

Announcements. Recap: Filtering. Recap: Reasoning Over Time. Example: State Representations for Robot Localization. Particle Filtering

Announcements. Recap: Filtering. Recap: Reasoning Over Time. Example: State Representations for Robot Localization. Particle Filtering Inroducion o Arificial Inelligence V22.0472-001 Fall 2009 Lecure 18: aricle & Kalman Filering Announcemens Final exam will be a 7pm on Wednesday December 14 h Dae of las class 1.5 hrs long I won ask anyhing

More information

Air Traffic Forecast Empirical Research Based on the MCMC Method

Air Traffic Forecast Empirical Research Based on the MCMC Method Compuer and Informaion Science; Vol. 5, No. 5; 0 ISSN 93-8989 E-ISSN 93-8997 Published by Canadian Cener of Science and Educaion Air Traffic Forecas Empirical Research Based on he MCMC Mehod Jian-bo Wang,

More information

STATE-SPACE MODELLING. A mass balance across the tank gives:

STATE-SPACE MODELLING. A mass balance across the tank gives: B. Lennox and N.F. Thornhill, 9, Sae Space Modelling, IChemE Process Managemen and Conrol Subjec Group Newsleer STE-SPACE MODELLING Inroducion: Over he pas decade or so here has been an ever increasing

More information

Lecture 33: November 29

Lecture 33: November 29 36-705: Inermediae Saisics Fall 2017 Lecurer: Siva Balakrishnan Lecure 33: November 29 Today we will coninue discussing he boosrap, and hen ry o undersand why i works in a simple case. In he las lecure

More information

Chapter 2. First Order Scalar Equations

Chapter 2. First Order Scalar Equations Chaper. Firs Order Scalar Equaions We sar our sudy of differenial equaions in he same way he pioneers in his field did. We show paricular echniques o solve paricular ypes of firs order differenial equaions.

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION SUPPLEMENTARY INFORMATION DOI: 0.038/NCLIMATE893 Temporal resoluion and DICE * Supplemenal Informaion Alex L. Maren and Sephen C. Newbold Naional Cener for Environmenal Economics, US Environmenal Proecion

More information

Decentralized Stochastic Control with Partial History Sharing: A Common Information Approach

Decentralized Stochastic Control with Partial History Sharing: A Common Information Approach 1 Decenralized Sochasic Conrol wih Parial Hisory Sharing: A Common Informaion Approach Ashuosh Nayyar, Adiya Mahajan and Demoshenis Tenekezis arxiv:1209.1695v1 [cs.sy] 8 Sep 2012 Absrac A general model

More information

Christos Papadimitriou & Luca Trevisan November 22, 2016

Christos Papadimitriou & Luca Trevisan November 22, 2016 U.C. Bereley CS170: Algorihms Handou LN-11-22 Chrisos Papadimiriou & Luca Trevisan November 22, 2016 Sreaming algorihms In his lecure and he nex one we sudy memory-efficien algorihms ha process a sream

More information

3.1 More on model selection

3.1 More on model selection 3. More on Model selecion 3. Comparing models AIC, BIC, Adjused R squared. 3. Over Fiing problem. 3.3 Sample spliing. 3. More on model selecion crieria Ofen afer model fiing you are lef wih a handful of

More information

Expert Advice for Amateurs

Expert Advice for Amateurs Exper Advice for Amaeurs Ernes K. Lai Online Appendix - Exisence of Equilibria The analysis in his secion is performed under more general payoff funcions. Wihou aking an explici form, he payoffs of he

More information

1. An introduction to dynamic optimization -- Optimal Control and Dynamic Programming AGEC

1. An introduction to dynamic optimization -- Optimal Control and Dynamic Programming AGEC This documen was generaed a :45 PM 8/8/04 Copyrigh 04 Richard T. Woodward. An inroducion o dynamic opimizaion -- Opimal Conrol and Dynamic Programming AGEC 637-04 I. Overview of opimizaion Opimizaion is

More information

14 Autoregressive Moving Average Models

14 Autoregressive Moving Average Models 14 Auoregressive Moving Average Models In his chaper an imporan parameric family of saionary ime series is inroduced, he family of he auoregressive moving average, or ARMA, processes. For a large class

More information
