RL Lecture 7: Eligibility Traces. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1
2 N-step TD Prediction
Idea: Look farther into the future when you do the TD backup (1, 2, 3, …, n steps)
3 Mathematics of N-step TD Prediction
Monte Carlo: R_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + … + γ^{T−t−1} r_T
TD: R_t^{(1)} = r_{t+1} + γ V_t(s_{t+1})   (use V to estimate the remaining return)
2-step return: R_t^{(2)} = r_{t+1} + γ r_{t+2} + γ² V_t(s_{t+2})
n-step return: R_t^{(n)} = r_{t+1} + γ r_{t+2} + … + γ^{n−1} r_{t+n} + γ^n V_t(s_{t+n})
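The n-step return above can be sketched in a few lines of Python. This is an illustrative helper, not code from the slides: it assumes `rewards[k]` holds r_{t+k+1} and `values[k]` holds V(s_{t+k+1}), with the terminal state's value (the last entry) equal to 0, so asking for n past the end of the episode falls back to the full Monte Carlo return.

```python
def n_step_return(rewards, values, n, gamma):
    # R_t^{(n)} = r_{t+1} + γ r_{t+2} + ... + γ^{n-1} r_{t+n} + γ^n V(s_{t+n}).
    # rewards[k] = r_{t+k+1}; values[k] = V(s_{t+k+1}); values[-1] = 0 (terminal).
    n = min(n, len(rewards))                       # past the end: MC return
    ret = sum(gamma ** k * rewards[k] for k in range(n))
    return ret + gamma ** n * values[n - 1]
```

For example, with rewards [1, 2, 3], values [0.5, 0.25, 0.0] and γ = 0.5, the 1-step return is 1.25 and the full (3-step, MC) return is 2.75.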
4 Learning with N-step Backups
Backup (on-line or off-line): ΔV_t(s_t) = α [ R_t^{(n)} − V_t(s_t) ]
Error reduction property of n-step returns:
max_s | E_π{ R_t^{(n)} | s_t = s } − V^π(s) | ≤ γ^n max_s | V(s) − V^π(s) |
(maximum error using the n-step return ≤ γ^n × maximum error using V)
Using this, you can show that n-step methods converge
5 Random Walk Examples
How does 2-step TD work here? How about 3-step TD?
6 A Larger Example
Task: 19-state random walk
Do you think there is an optimal n (for everything)?
7 Averaging N-step Returns
n-step methods were introduced to help with understanding TD(λ)
Idea: back up an average of several returns, e.g. half of the 2-step and half of the 4-step return:
R_t^{avg} = ½ R_t^{(2)} + ½ R_t^{(4)}
This is one backup, called a complex backup: draw each component and label it with the weight for that component
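A minimal sketch of this complex backup in Python. `n_step_return` is an illustrative helper (not from the slides) that assumes `rewards[k]` = r_{t+k+1} and `values[k]` = V(s_{t+k+1}) with a terminal value of 0; the weights (½, ½) are the slide's example, and any weights summing to 1 give a valid complex backup.

```python
def n_step_return(rewards, values, n, gamma):
    # rewards[k] = r_{t+k+1}; values[k] = V(s_{t+k+1}); values[-1] = 0 (terminal)
    n = min(n, len(rewards))
    return sum(gamma ** k * rewards[k] for k in range(n)) + gamma ** n * values[n - 1]

def averaged_return(rewards, values, gamma):
    # R_avg = 1/2 R^(2) + 1/2 R^(4), the slide's example of a complex backup
    return 0.5 * n_step_return(rewards, values, 2, gamma) \
         + 0.5 * n_step_return(rewards, values, 4, gamma)
```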
8 Forward View of TD(λ)
TD(λ) is a method for averaging all n-step backups, weighting the n-step backup by λ^{n−1}
λ-return: R_t^λ = (1 − λ) Σ_{n=1}^{∞} λ^{n−1} R_t^{(n)}
Backup using the λ-return: ΔV_t(s_t) = α [ R_t^λ − V_t(s_t) ]
9 λ-return Weighting Function
10 Relation to TD(0) and MC
The λ-return can be rewritten as:
R_t^λ = (1 − λ) Σ_{n=1}^{T−t−1} λ^{n−1} R_t^{(n)} + λ^{T−t−1} R_t
(the sum covers the returns until termination; the last term covers the return after termination)
If λ = 1, you get MC: R_t^λ = R_t
If λ = 0, you get TD(0): R_t^λ = R_t^{(1)}
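The truncated form above is straightforward to compute directly, which also makes the two limits easy to check. A minimal sketch, under the same illustrative conventions as before: `rewards[k]` = r_{t+k+1}, `values[k]` = V(s_{t+k+1}), terminal value 0.

```python
def lambda_return(rewards, values, gamma, lam):
    # R^λ = (1−λ) Σ_{n=1}^{T−t−1} λ^{n−1} R^{(n)} + λ^{T−t−1} R_t
    T = len(rewards)                    # steps remaining until termination

    def n_step(n):                      # R_t^{(n)}, with values[-1] = 0
        return sum(gamma ** k * rewards[k] for k in range(n)) \
               + gamma ** n * values[n - 1]

    head = (1 - lam) * sum(lam ** (n - 1) * n_step(n) for n in range(1, T))
    return head + lam ** (T - 1) * n_step(T)
```

With λ = 0 this returns the one-step return (TD(0)); with λ = 1 it returns the full return (MC), as the slide states.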
11 Forward View of TD(λ) II
Look forward from each state to determine its update from future states and rewards:
12 λ-return on the Random Walk
Same 19-state random walk as before
Why do you think intermediate values of λ are best?
13 Backward View of TD(λ)
The forward view was for theory; the backward view is for mechanism
New variable called the eligibility trace, e_t(s)
On each step, decay all traces by γλ and increment the trace for the current state by 1 (accumulating trace):
e_t(s) = γλ e_{t−1}(s)        if s ≠ s_t
e_t(s) = γλ e_{t−1}(s) + 1    if s = s_t
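One step of the accumulating-trace update can be sketched directly from the two cases above (the dict-of-floats representation is an illustrative choice, not from the slides):

```python
def accumulate_trace(e, s_t, gamma, lam):
    # Decay every trace by γλ, then add 1 to the current state's trace.
    for s in e:
        e[s] *= gamma * lam
    e[s_t] += 1.0
    return e
```

Note that a state visited twice in quick succession ends up with a trace greater than 1, which is what motivates replacing traces later in the lecture.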
14 On-line Tabular TD(λ)
Initialize V(s) arbitrarily and e(s) = 0, for all s ∈ S
Repeat (for each episode):
  Initialize s
  Repeat (for each step of episode):
    a ← action given by π for s
    Take action a, observe reward r and next state s′
    δ ← r + γ V(s′) − V(s)
    e(s) ← e(s) + 1
    For all s:
      V(s) ← V(s) + α δ e(s)
      e(s) ← γλ e(s)
    s ← s′
  Until s is terminal
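The loop above translates almost line for line into Python. A minimal sketch: the 5-state random walk environment, the equiprobable policy, and all parameter values here are illustrative choices, not from the slide (nonterminal states 1..5, terminals 0 and 6, reward +1 only on exiting right, V(terminal) = 0).

```python
import random

def td_lambda(episodes=500, alpha=0.1, gamma=1.0, lam=0.8, seed=0):
    # On-line tabular TD(λ) with accumulating traces, following the slide's loop.
    rng = random.Random(seed)
    V = {s: 0.0 for s in range(7)}
    for _ in range(episodes):
        e = {s: 0.0 for s in range(7)}      # traces reset at episode start
        s = 3
        while s not in (0, 6):
            s2 = s + rng.choice((-1, 1))    # a <- action given by pi for s
            r = 1.0 if s2 == 6 else 0.0
            delta = r + gamma * V[s2] - V[s]
            e[s] += 1.0                     # accumulating trace
            for x in range(1, 6):           # for all nonterminal states
                V[x] += alpha * delta * e[x]
                e[x] *= gamma * lam
            s = s2
    return V
```

Run with these defaults, the estimates approach the true values s/6 for states 1..5 (0 at the left terminal, 1 at the right).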
15 Backward View
δ_t = r_{t+1} + γ V(s_{t+1}) − V(s_t)
Shout δ_t backwards over time
The strength of your voice decreases with temporal distance by γλ
16 Relation of Backwards View to MC & TD(0)
Using the update rule ΔV_t(s) = α δ_t e_t(s):
As before, if you set λ to 0, you get TD(0)
If you set λ to 1, you get MC, but in a better way:
  Can apply TD(1) to continuing tasks
  Works incrementally and on-line (instead of waiting until the end of the episode)
17 Forward View = Backward View
The forward (theoretical) view of TD(λ) is equivalent to the backward (mechanistic) view for off-line updating
The book shows:
Σ_{t=0}^{T−1} ΔV_t^{TD}(s) = Σ_{t=0}^{T−1} α I_{s s_t} Σ_{k=t}^{T−1} (γλ)^{k−t} δ_k = Σ_{t=0}^{T−1} ΔV_t^λ(s_t) I_{s s_t}
(backward updates on the left, forward updates on the right; the algebra is shown in the book)
On-line updating with small α is similar
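The equivalence can be checked numerically on a single recorded episode. A sketch under illustrative assumptions: `states[t]` is the nonterminal state visited at time t, `rewards[t]` is r_{t+1}, V is held fixed for the whole episode (off-line updating), and V(terminal) = 0. The backward side accumulates trace updates; the forward side accumulates λ-return updates computed independently from the n-step returns.

```python
def forward_and_backward_updates(states, rewards, V, alpha, gamma, lam):
    T = len(rewards)
    v = lambda t: V[states[t]] if t < T else 0.0     # V(terminal) = 0
    delta = [rewards[t] + gamma * v(t + 1) - v(t) for t in range(T)]

    # Backward view: accumulating traces, updates summed but not applied.
    backward = {s: 0.0 for s in V}
    e = {s: 0.0 for s in V}
    for t in range(T):
        e[states[t]] += 1.0
        for s in V:
            backward[s] += alpha * delta[t] * e[s]
            e[s] *= gamma * lam

    # Forward view: λ-return computed from the n-step returns themselves.
    def n_step(t, n):
        g = sum(gamma ** k * rewards[t + k] for k in range(n))
        return g + gamma ** n * v(t + n)

    forward = {s: 0.0 for s in V}
    for t in range(T):
        rem = T - t
        lam_ret = (1 - lam) * sum(lam ** (n - 1) * n_step(t, n)
                                  for n in range(1, rem)) \
                  + lam ** (rem - 1) * n_step(t, rem)
        forward[states[t]] += alpha * (lam_ret - V[states[t]])
    return backward, forward
```

For off-line updating with accumulating traces the two totals agree exactly, even when states are revisited within the episode.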
18 On-line versus Off-line on Random Walk
Same 19-state random walk
On-line performs better over a broader range of parameters
19 Control: Sarsa(λ)
Save eligibility traces for state–action pairs instead of just states:
e_t(s,a) = γλ e_{t−1}(s,a) + 1   if s = s_t and a = a_t
e_t(s,a) = γλ e_{t−1}(s,a)       otherwise
Q_{t+1}(s,a) = Q_t(s,a) + α δ_t e_t(s,a)
δ_t = r_{t+1} + γ Q_t(s_{t+1}, a_{t+1}) − Q_t(s_t, a_t)
20 Sarsa(λ) Algorithm
Initialize Q(s,a) arbitrarily and e(s,a) = 0, for all s, a
Repeat (for each episode):
  Initialize s, a
  Repeat (for each step of episode):
    Take action a, observe r, s′
    Choose a′ from s′ using policy derived from Q (e.g. ε-greedy)
    δ ← r + γ Q(s′, a′) − Q(s, a)
    e(s,a) ← e(s,a) + 1
    For all s, a:
      Q(s,a) ← Q(s,a) + α δ e(s,a)
      e(s,a) ← γλ e(s,a)
    s ← s′; a ← a′
  Until s is terminal
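A minimal Python sketch of the Sarsa(λ) loop. The environment (a 5-state corridor: states 0..4, actions −1/+1, the left wall reflects, reward +1 on reaching the terminal state 4) and all parameter values are illustrative, not the slide's gridworld.

```python
import random

def sarsa_lambda(episodes=300, alpha=0.2, gamma=0.9, lam=0.8, eps=0.1, seed=1):
    # Tabular Sarsa(λ) with accumulating traces, following the slide's loop.
    rng = random.Random(seed)
    actions = (-1, +1)
    Q = {(s, a): 0.0 for s in range(5) for a in actions}

    def eps_greedy(s):                      # ties broken at random
        if rng.random() < eps:
            return rng.choice(actions)
        best = max(Q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if Q[(s, a)] == best])

    for _ in range(episodes):
        e = {sa: 0.0 for sa in Q}
        s, a = 0, eps_greedy(0)
        while True:
            s2 = max(0, min(4, s + a))
            r = 1.0 if s2 == 4 else 0.0
            if s2 == 4:                     # terminal: no bootstrap term
                delta, a2 = r - Q[(s, a)], None
            else:
                a2 = eps_greedy(s2)
                delta = r + gamma * Q[(s2, a2)] - Q[(s, a)]
            e[(s, a)] += 1.0
            for sa in Q:
                Q[sa] += alpha * delta * e[sa]
                e[sa] *= gamma * lam
            if s2 == 4:
                break
            s, a = s2, a2
    return Q
```

After training, the learned action values prefer moving right toward the goal, illustrating how one rewarded trial propagates credit back along the trace.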
21 Sarsa(λ) Gridworld Example
With one trial, the agent has much more information about how to get to the goal (though not necessarily the best way)
Can considerably accelerate learning
22 Three Approaches to Q(λ)
How can we extend this to Q-learning?
If you mark every state–action pair as eligible, you back up over a non-greedy policy
Watkins: zero out the eligibility trace after a non-greedy action; do the max when backing up at the first non-greedy choice:
e_t(s,a) = γλ e_{t−1}(s,a) + 1   if s = s_t, a = a_t, and Q_{t−1}(s_t,a_t) = max_a Q_{t−1}(s_t,a)
e_t(s,a) = 0                      if Q_{t−1}(s_t,a_t) ≠ max_a Q_{t−1}(s_t,a)
e_t(s,a) = γλ e_{t−1}(s,a)        otherwise
Q_{t+1}(s,a) = Q_t(s,a) + α δ_t e_t(s,a)
δ_t = r_{t+1} + γ max_{a′} Q_t(s_{t+1}, a′) − Q_t(s_t, a_t)
23 Watkins's Q(λ)
Initialize Q(s,a) arbitrarily and e(s,a) = 0, for all s, a
Repeat (for each episode):
  Initialize s, a
  Repeat (for each step of episode):
    Take action a, observe r, s′
    Choose a′ from s′ using policy derived from Q (e.g. ε-greedy)
    a* ← argmax_b Q(s′, b)   (if a′ ties for the max, then a* ← a′)
    δ ← r + γ Q(s′, a*) − Q(s, a)
    e(s,a) ← e(s,a) + 1
    For all s, a:
      Q(s,a) ← Q(s,a) + α δ e(s,a)
      If a′ = a*, then e(s,a) ← γλ e(s,a)
      else e(s,a) ← 0
    s ← s′; a ← a′
  Until s is terminal
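A sketch of Watkins's Q(λ) in Python, on an illustrative 5-state corridor (states 0..4, actions −1/+1, left wall reflects, reward +1 on reaching terminal state 4; environment and parameters are not from the slide). Here the trace cut is applied at the start of the step that takes a non-greedy action, which is equivalent to the slide's cut at the end of the previous step.

```python
import random

def watkins_q_lambda(episodes=300, alpha=0.2, gamma=0.9, lam=0.8, eps=0.1, seed=2):
    rng = random.Random(seed)
    actions = (-1, +1)
    Q = {(s, a): 0.0 for s in range(5) for a in actions}
    for _ in range(episodes):
        e = {sa: 0.0 for sa in Q}
        s = 0
        while s != 4:
            best = max(Q[(s, b)] for b in actions)
            if rng.random() < eps:
                a = rng.choice(actions)               # exploratory
            else:
                a = rng.choice([b for b in actions if Q[(s, b)] == best])
            if Q[(s, a)] != best:
                e = {sa: 0.0 for sa in Q}             # cut traces: non-greedy action
            s2 = max(0, min(4, s + a))
            r = 1.0 if s2 == 4 else 0.0
            best_next = 0.0 if s2 == 4 else max(Q[(s2, b)] for b in actions)
            delta = r + gamma * best_next - Q[(s, a)]  # back up toward the max
            e[(s, a)] += 1.0
            for sa in Q:
                Q[sa] += alpha * delta * e[sa]
                e[sa] *= gamma * lam
            s = s2
    return Q
```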
24 Peng's Q(λ)
Disadvantage of Watkins's method: early in learning, the eligibility trace will be cut (zeroed out) frequently, resulting in little advantage from traces
Peng: back up the max action except at the end; never cut traces
Disadvantage: complicated to implement
25 Naïve Q(λ)
Idea: is it really a problem to back up exploratory actions?
Never zero traces; always back up the max at the current action (unlike Peng's or Watkins's)
Is this truly naïve?
Works well in preliminary empirical studies
What is the backup diagram?
26 Comparison Task
Compared Watkins's, Peng's, and Naïve (called McGovern's here) Q(λ) on several tasks
See McGovern and Sutton (1997), "Towards a Better Q(λ)," for other tasks and results (stochastic tasks, continuing tasks, etc.)
Deterministic gridworld with obstacles: 10×10 gridworld, 25 randomly generated obstacles, 30 runs
α = 0.05, γ = 0.9, λ = 0.9, ε = 0.05, accumulating traces
From McGovern and Sutton (1997), "Towards a Better Q(λ)"
27 Comparison Results
From McGovern and Sutton (1997), "Towards a Better Q(λ)"
28 Convergence of the Q(λ)'s
None of the methods is proven to converge (much extra credit if you can prove any of them)
Watkins's is thought to converge to Q*
Peng's is thought to converge to a mixture of Q^π and Q*
Naïve: Q*?
29 Eligibility Traces for Actor-Critic Methods
Critic: on-policy learning of V^π; use TD(λ) as described before
Actor: needs eligibility traces for each state–action pair
We change the update equation
p_{t+1}(s,a) = p_t(s,a) + α δ_t   if a = a_t and s = s_t;   p_t(s,a) otherwise
to
p_{t+1}(s,a) = p_t(s,a) + α δ_t e_t(s,a)
We can change the other actor-critic update
p_{t+1}(s,a) = p_t(s,a) + α δ_t [1 − π_t(s,a)]   if a = a_t and s = s_t;   p_t(s,a) otherwise
to
p_{t+1}(s,a) = p_t(s,a) + α δ_t e_t(s,a), where
e_t(s,a) = γλ e_{t−1}(s,a) + 1 − π_t(s_t,a_t)   if s = s_t and a = a_t;   γλ e_{t−1}(s,a) otherwise
30 Replacing Traces
Using accumulating traces, frequently visited states can have eligibilities greater than 1
This can be a problem for convergence
Replacing traces: instead of adding 1 when you visit a state, set that trace to 1:
e_t(s) = γλ e_{t−1}(s)   if s ≠ s_t
e_t(s) = 1               if s = s_t
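The one-line difference from the accumulating trace is the assignment instead of the increment. A minimal sketch (the dict representation is an illustrative choice):

```python
def replacing_trace(e, s_t, gamma, lam):
    # Decay the other traces by γλ, but *set* the current state's trace
    # to 1 rather than adding 1, so no trace ever exceeds 1.
    for s in e:
        e[s] *= gamma * lam
    e[s_t] = 1.0
    return e
```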
31 Replacing Traces Example
Same 19-state random walk task as before
Replacing traces perform better than accumulating traces over more values of λ
32 Why Replacing Traces?
Replacing traces can significantly speed learning
They can make the system perform well for a broader set of parameters
Accumulating traces can do poorly on certain types of tasks
Why is this task particularly onerous for accumulating traces?
33 More Replacing Traces
Off-line TD(1) with replacing traces is identical to first-visit MC
Extension to action values: when you revisit a state, what should you do with the traces for the other actions?
Singh and Sutton say to set them to zero:
e_t(s,a) = 1                 if s = s_t and a = a_t
e_t(s,a) = 0                 if s = s_t and a ≠ a_t
e_t(s,a) = γλ e_{t−1}(s,a)   if s ≠ s_t
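The three cases translate directly; a minimal sketch over a dict keyed by (state, action) pairs (an illustrative representation):

```python
def replacing_trace_sa(e, s_t, a_t, gamma, lam):
    # On a visit to s_t: set the taken action's trace to 1, zero the traces
    # of the other actions at s_t, and decay everything else by γλ.
    for (s, a) in e:
        if s == s_t:
            e[(s, a)] = 1.0 if a == a_t else 0.0
        else:
            e[(s, a)] *= gamma * lam
    return e
```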
34 Implementation Issues
Could require much more computation
But most eligibility traces are VERY close to zero
If you implement it in Matlab, the backup is only one line of code and is very fast (Matlab is optimized for matrices)
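The same point carries over to NumPy, to give a Python analogue of the slide's Matlab remark: with the value table and the traces stored as arrays, the backup over all states is just two vectorized statements (the function name and arguments are illustrative).

```python
import numpy as np

def vectorized_backup(V, e, delta, alpha, gamma, lam):
    # V(s) <- V(s) + α δ e(s) and e(s) <- γλ e(s), for all s at once.
    V += alpha * delta * e
    e *= gamma * lam
    return V, e
```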
35 Variable λ
Can generalize to a variable λ_t:
e_t(s) = γλ_t e_{t−1}(s)        if s ≠ s_t
e_t(s) = γλ_t e_{t−1}(s) + 1    if s = s_t
Here λ_t is a function of time; could define λ_t = λ(s_t) or λ_t = λ_τ
36 Conclusions
Eligibility traces provide an efficient, incremental way to combine MC and TD
  Include advantages of MC (can deal with lack of the Markov property)
  Include advantages of TD (use the TD error, bootstrap)
Can significantly speed learning
Do have a cost in computation
37 Something Here is Not Like the Other
More informationPolicy Evaluation Using the Ω-Return
Policy Evaluaion Using he Ω-Reurn Philip S. Thomas Universiy of Massachuses Amhers Carnegie Mellon Universiy Georgios Theocharous Adobe Research Sco Niekum Universiy of Texas a Ausin George Konidaris Duke
More informationMore Digital Logic. t p output. Low-to-high and high-to-low transitions could have different t p. V in (t)
EECS 4 Spring 23 Lecure 2 EECS 4 Spring 23 Lecure 2 More igial Logic Gae delay and signal propagaion Clocked circui elemens (flip-flop) Wriing a word o memory Simplifying digial circuis: Karnaugh maps
More informationKinematics Vocabulary. Kinematics and One Dimensional Motion. Position. Coordinate System in One Dimension. Kinema means movement 8.
Kinemaics Vocabulary Kinemaics and One Dimensional Moion 8.1 WD1 Kinema means movemen Mahemaical descripion of moion Posiion Time Inerval Displacemen Velociy; absolue value: speed Acceleraion Averages
More informationSZG Macro 2011 Lecture 3: Dynamic Programming. SZG macro 2011 lecture 3 1
SZG Macro 2011 Lecure 3: Dynamic Programming SZG macro 2011 lecure 3 1 Background Our previous discussion of opimal consumpion over ime and of opimal capial accumulaion sugges sudying he general decision
More informationINSTANTANEOUS VELOCITY
INSTANTANEOUS VELOCITY I claim ha ha if acceleraion is consan, hen he elociy is a linear funcion of ime and he posiion a quadraic funcion of ime. We wan o inesigae hose claims, and a he same ime, work
More informationLet us start with a two dimensional case. We consider a vector ( x,
Roaion marices We consider now roaion marices in wo and hree dimensions. We sar wih wo dimensions since wo dimensions are easier han hree o undersand, and one dimension is a lile oo simple. However, our
More informationLecture 2 October ε-approximation of 2-player zero-sum games
Opimizaion II Winer 009/10 Lecurer: Khaled Elbassioni Lecure Ocober 19 1 ε-approximaion of -player zero-sum games In his lecure we give a randomized ficiious play algorihm for obaining an approximae soluion
More informationState-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter
Sae-Space Models Iniializaion, Esimaion and Smoohing of he Kalman Filer Iniializaion of he Kalman Filer The Kalman filer shows how o updae pas predicors and he corresponding predicion error variances when
More informationProbabilistic Robotics
Probabilisic Roboics Bayes Filer Implemenaions Gaussian filers Bayes Filer Reminder Predicion bel p u bel d Correcion bel η p z bel Gaussians : ~ π e p N p - Univariae / / : ~ μ μ μ e p Ν p d π Mulivariae
More informationReinforcement Learning: A Tutorial. Scope of Tutorial. 1 Introduction
Reinforcemen Learning: A Tuorial Mance E. Harmon WL/AACF 224 Avionics Circle Wrigh Laboraory Wrigh-Paerson AFB, OH 45433 mharmon@acm.org Sephanie S. Harmon Wrigh Sae Universiy 56-8 Mallard Glen Drive Cenerville,
More informationFrom Complex Fourier Series to Fourier Transforms
Topic From Complex Fourier Series o Fourier Transforms. Inroducion In he previous lecure you saw ha complex Fourier Series and is coeciens were dened by as f ( = n= C ne in! where C n = T T = T = f (e
More information4.2 The Fourier Transform
4.2. THE FOURIER TRANSFORM 57 4.2 The Fourier Transform 4.2.1 Inroducion One way o look a Fourier series is ha i is a ransformaion from he ime domain o he frequency domain. Given a signal f (), finding
More information15-889e Policy Search: Gradient Methods Emma Brunskill. All slides from David Silver (with EB adding minor modificafons), unless otherwise noted
15-889e Policy Search: Gradient Methods Emma Brunskill All slides from David Silver (with EB adding minor modificafons), unless otherwise noted Outline 1 Introduction 2 Finite Difference Policy Gradient
More informationa 10.0 (m/s 2 ) 5.0 Name: Date: 1. The graph below describes the motion of a fly that starts out going right V(m/s)
Name: Dae: Kinemaics Review (Honors. Physics) Complee he following on a separae shee of paper o be urned in on he day of he es. ALL WORK MUST BE SHOWN TO RECEIVE CREDIT. 1. The graph below describes he
More informationEchocardiography Project and Finite Fourier Series
Echocardiography Projec and Finie Fourier Series 1 U M An echocardiagram is a plo of how a porion of he hear moves as he funcion of ime over he one or more hearbea cycles If he hearbea repeas iself every
More informationFinish reading Chapter 2 of Spivak, rereading earlier sections as necessary. handout and fill in some missing details!
MAT 257, Handou 6: Ocober 7-2, 20. I. Assignmen. Finish reading Chaper 2 of Spiva, rereading earlier secions as necessary. handou and fill in some missing deails! II. Higher derivaives. Also, read his
More informationHamilton- J acobi Equation: Explicit Formulas In this lecture we try to apply the method of characteristics to the Hamilton-Jacobi equation: u t
M ah 5 2 7 Fall 2 0 0 9 L ecure 1 0 O c. 7, 2 0 0 9 Hamilon- J acobi Equaion: Explici Formulas In his lecure we ry o apply he mehod of characerisics o he Hamilon-Jacobi equaion: u + H D u, x = 0 in R n
More informationRandom Walk with Anti-Correlated Steps
Random Walk wih Ani-Correlaed Seps John Noga Dirk Wagner 2 Absrac We conjecure he expeced value of random walks wih ani-correlaed seps o be exacly. We suppor his conjecure wih 2 plausibiliy argumens and
More informationA Reinforcement Learning Approach for Collaborative Filtering
A Reinforcemen Learning Approach for Collaboraive Filering Jungkyu Lee, Byonghwa Oh 2, Jihoon Yang 2, and Sungyong Park 2 Cyram Inc, Seoul, Korea jklee@cyram.com 2 Sogang Universiy, Seoul, Korea {mrfive,yangjh,parksy}@sogang.ac.kr
More informationKINEMATICS IN ONE DIMENSION
KINEMATICS IN ONE DIMENSION PREVIEW Kinemaics is he sudy of how hings move how far (disance and displacemen), how fas (speed and velociy), and how fas ha how fas changes (acceleraion). We say ha an objec
More information