CSE/NB 528 Lecture 14: Reinforcement Learning (Chapter 9)


1 CSE/NB 528 Lecture 14: Reinforcement Learning (Chapter 9)
Image from http://clasdean.la.asu.edu/news/images/ubep2001/neuron3.jpg
Lecture figures are from Dayan & Abbott's book: http://people.brandeis.edu/~abbott/book/index.html

2 Today's Agenda
Reinforcement Learning: What is reinforcement learning?
Classical conditioning: learning to salivate (predicting reward).
Predicting delayed rewards: Temporal Difference learning.
Learning to act: Q-learning and the Actor-Critic architecture.

3 Some Supervised Learning Demos on the Web
Function approximation: http://neuron.eng.wayne.edu/bpfuncionapprox/bpfuncionapprox.html
Pattern recognition: http://eecs.wsu.edu/~cook/ai/lecures/apples/hnn/jrec.html
Image compression: http://neuron.eng.wayne.edu/bpimagecompression9plus/bp9plus.html
Backpropagation for control (ball balancing): http://neuron.eng.wayne.edu/bpballbalancing/ball5.html

4 Humans don't get exact supervisory signals (commands for muscles) for learning to talk, walk, ride a bicycle, play the piano, drive, etc.
We learn by trial and error (and by watching others).
We might get "rewards and punishments" along the way.
Enter Reinforcement Learning.

5 The Reinforcement Learning "Agent"
[Diagram: the Agent receives State u and Reward r from the Environment and sends Action a to the Environment.]

6 The Reinforcement Learning Framework
Unsupervised learning: learn the hidden causes of inputs.
Supervised learning: learn a function based on training examples of (input, desired output) pairs.
Reinforcement learning: learn the best action for any given state so as to maximize total expected (future) reward.
Learn by trial and error. Reinforcement learning is intermediate between unsupervised and supervised learning: instead of an explicit teaching signal (or desired output), you get rewards or punishments.
Inspired by classical conditioning experiments (remember Pavlov's hyper-salivating dog?).

7 Early Results: Pavlov and his Dog
Classical (Pavlovian) conditioning experiments.
Training: Bell → Food. After: Bell → Salivate.
The conditioned stimulus (bell) predicts future reward (food).
http://employees.csbsju.edu/creed/pb/pdoganim.html

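8 Predicting Reward
Stimulus $u = 0$ or $1$. Expected reward $v = wu$. Delivered reward $= r$.
Learn $w$ by minimizing $(r - v)^2$:
$$w \rightarrow w + \varepsilon\,(r - v)\,u$$
(same as the delta rule; also called the Rescorla-Wagner rule). Prediction error: $\delta = r - v$.
For small $\varepsilon$ and $u = 1$, $w \rightarrow w + \varepsilon\,(r - w)$, so the average value of $w$ is $\langle w \rangle = \langle r \rangle$.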
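
The update on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `rescorla_wagner`, the default `epsilon = 0.1`, and the 100-trial example are assumptions chosen for the sketch.

```python
def rescorla_wagner(rewards, stimuli, epsilon=0.1, w=0.0):
    """Return the weight w after each trial under w -> w + epsilon*(r - v)*u."""
    history = []
    for r, u in zip(rewards, stimuli):
        v = w * u                    # expected reward for this trial
        delta = r - v                # prediction error
        w = w + epsilon * delta * u  # no change when the stimulus is absent (u = 0)
        history.append(w)
    return history


# Assumed example: stimulus present (u = 1) and reward r = 1 on every trial.
trace = rescorla_wagner(rewards=[1.0] * 100, stimuli=[1] * 100)
print(round(trace[0], 3), round(trace[-1], 3))
```

With a constant reward the weight climbs geometrically toward r, which is the "average value of w equals average r" statement in its simplest case.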

9 Predicting Reward during Conditioning
[Figure: learned prediction when reward r is present (conditioning; $r = 1$, $\varepsilon = 0.5$), when reward is removed (extinction), and when reward is presented on 50% of the trials.]
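
The three conditions named on this slide can be reproduced with the same delta rule. The sketch below assumes the stimulus is present on every trial (u = 1) and uses ε = 0.5 as stated on the slide; the trial counts, the random seed, and the helper name `run_trials` are illustrative assumptions.

```python
import random


def run_trials(rewards, epsilon=0.5, w=0.0):
    """Delta rule with u = 1 on every trial: w -> w + epsilon * (r - w)."""
    ws = []
    for r in rewards:
        w += epsilon * (r - w)
        ws.append(w)
    return ws


# Conditioning: reward present on every trial, so w rises toward 1.
acquisition = run_trials([1.0] * 20)

# Extinction: reward removed, starting from the conditioned weight; w decays to 0.
extinction = run_trials([0.0] * 20, w=acquisition[-1])

# Partial reinforcement: reward on a random 50% of trials (assumed seed).
rng = random.Random(0)
partial_rewards = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(2000)]
partial = run_trials(partial_rewards)
mean_w = sum(partial[200:]) / len(partial[200:])
print(acquisition[-1], extinction[-1], mean_w)
```

Under partial reinforcement the weight keeps fluctuating from trial to trial, but its long-run average sits near the reward probability.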

10 Predicting Delayed Rewards
In more realistic cases, reward is typically delivered at the end (when you know whether you succeeded or not).
Time: $0 \le t \le T$, with stimulus $u(t)$ and reward $r(t)$ at each time step $t$ (note: $r(t)$ can be zero).
Key idea: make the output $v(t)$ predict the total expected future reward starting from time $t$:
$$v(t) = \left\langle \sum_{\tau=0}^{T-t} r(t+\tau) \right\rangle$$

11 Learning to Predict Delayed Rewards
Use a set of modifiable weights $w(\tau)$ and predict based on all past stimuli $u(t)$:
$$v(t) = \sum_{\tau=0}^{t} w(\tau)\,u(t-\tau)$$
Would like to find the $w(\tau)$ that minimize:
$$\left( \sum_{\tau=0}^{T-t} r(t+\tau) - v(t) \right)^2$$
Can we minimize this using gradient descent and the delta rule? Yes, BUT the future rewards are not yet available at time $t$.
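
To make the two quantities on this slide concrete, the sketch below computes the target (total future reward) and the linear prediction for one short trial. The stimulus and reward sequences and the hand-picked weights `w` are hypothetical values chosen only for illustration.

```python
def total_future_reward(r, t):
    """Target at time t: sum of r(t + tau) for tau = 0 .. T - t."""
    return sum(r[t:])


def predict(w, u, t):
    """v(t) = sum over tau = 0 .. t of w(tau) * u(t - tau)."""
    return sum(w[tau] * u[t - tau] for tau in range(t + 1))


# Hypothetical trial with T = 5: stimulus at t = 1, reward at t = 4.
u = [0, 1, 0, 0, 0, 0]
r = [0, 0, 0, 0, 1, 0]
w = [1, 1, 1, 1, 0, 0]   # hand-picked weights, not learned

targets = [total_future_reward(r, t) for t in range(len(r))]
predictions = [predict(w, u, t) for t in range(len(u))]
squared_errors = [(g - v) ** 2 for g, v in zip(targets, predictions)]
print(targets, predictions, squared_errors)
```

Note that computing `targets` requires the whole reward sequence in advance, which is exactly the obstacle the slide points out.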

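12 Temporal Difference (TD) Learning
Key idea: rewrite the squared error to get rid of future terms:
$$\left( \sum_{\tau=0}^{T-t} r(t+\tau) - v(t) \right)^2 = \left( r(t) + \sum_{\tau=0}^{T-t-1} r(t+1+\tau) - v(t) \right)^2 \approx \left( r(t) + v(t+1) - v(t) \right)^2$$
Temporal Difference (TD) learning: for each time step $t$, do: for all $\tau$ ($0 \le \tau \le t$), do:
$$w(\tau) \rightarrow w(\tau) + \varepsilon\,[\,r(t) + v(t+1) - v(t)\,]\,u(t-\tau), \qquad v(t) = \sum_{\tau=0}^{t} w(\tau)\,u(t-\tau)$$
Here $r(t) + v(t+1)$ is the expected future reward, $v(t)$ is the prediction, and the bracketed difference is the prediction error $\delta$.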
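
The TD rule can be sketched directly from the update above. This is an illustrative implementation under stated assumptions: the prediction beyond the end of the trial is taken to be zero, the trial is shortened to 25 steps with the stimulus at t = 5 and the reward at t = 15, and `epsilon = 0.3` and the function name `td_learning` are choices made for the sketch.

```python
def td_learning(u, r, epsilon=0.3, n_trials=500):
    """Run TD learning over repeated trials; return weights and last-trial errors."""
    T = len(u)
    w = [0.0] * T

    def v(t):
        # v(t) = sum_{tau=0}^{t} w(tau) u(t - tau); assumed zero past the trial end
        if t >= T:
            return 0.0
        return sum(w[tau] * u[t - tau] for tau in range(t + 1))

    delta = [0.0] * T
    for _ in range(n_trials):
        for t in range(T):
            delta[t] = r[t] + v(t + 1) - v(t)   # prediction error at time t
            for tau in range(t + 1):
                w[tau] += epsilon * delta[t] * u[t - tau]
    return w, delta


u = [0.0] * 25
u[5] = 1.0      # stimulus
r = [0.0] * 25
r[15] = 1.0     # delayed reward

w, delta = td_learning(u, r)
print([round(d, 3) for d in delta])
```

On the first trial the only nonzero error is at the time of the reward. After many trials the error at the reward disappears and a positive error remains just before the stimulus, where the prediction jumps from zero.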

13 Predicting Delayed Reward: TD Learning
Stimulus at $t = 100$ and reward at $t = 200$.
[Figure: prediction error $\delta$ for each time step (over many trials).]

14 Reward Prediction Error Signal in Monkeys?
Dopaminergic cells in the Ventral Tegmental Area (VTA). Reward prediction error? $[\,r(t) + v(t+1) - v(t)\,]$
[Figure panels: Before Training; After Training, with error $[\,0 + v(t+1) - v(t)\,]$ and no error when $v(t) = r(t) + v(t+1)$.]

15 More Evidence for Prediction Error Signals
Dopaminergic cells in VTA.
Negative error: $r(t) = 0$ and $v(t+1) = 0$, so $[\,r(t) + v(t+1) - v(t)\,] = -v(t)$.

16 That's great, but how does all that math help me get food in a maze?

17 Using Reward Predictions to Select Actions
Suppose you have computed a "value" for each action: $Q(a)$ = value (predicted reward) for executing action $a$. It is higher if the action yields more reward, lower otherwise.
Can select actions probabilistically according to their value:
$$P(a) = \frac{\exp(\beta\,Q(a))}{\sum_{a'} \exp(\beta\,Q(a'))}$$
High $\beta$ selects actions with the highest $Q$ value. Low $\beta$ selects more uniformly.
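
The selection rule on this slide is a softmax over action values. A minimal sketch follows; the function name `softmax_policy` and the example Q values are assumptions, and subtracting the maximum before exponentiating is a standard numerical-stability step that does not change the probabilities.

```python
import math


def softmax_policy(q_values, beta):
    """P(a) = exp(beta * Q(a)) / sum over a' of exp(beta * Q(a'))."""
    m = max(q_values)  # subtracting the max avoids overflow for large beta
    exps = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


q = [1.0, 2.0]                    # hypothetical action values
for beta in (0.0, 1.0, 50.0):
    print(beta, softmax_policy(q, beta))
```

With β = 0 both actions are equally likely; as β grows the probability mass concentrates on the action with the highest Q.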

18 Simple Example: Bee Foraging
Experiment: bees select either yellow (y) or blue (b) flowers based on nectar reward.
Idea: value of yellow/blue = average reward obtained so far (delta rule, a running average):
$$Q(y) \rightarrow Q(y) + \varepsilon\,(r_y - Q(y)), \qquad Q(b) \rightarrow Q(b) + \varepsilon\,(r_b - Q(b))$$
$$P(y) = \frac{\exp(\beta\,Q(y))}{\exp(\beta\,Q(y)) + \exp(\beta\,Q(b))}, \qquad P(b) = 1 - P(y)$$
Yum! http://svi.cps.utexas.edu/bee_on_flower_original.htm
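
The bee model combines the running-average update with softmax flower choice. The simulation below is a sketch under assumptions: rewards are deterministic (2 and 1, swapped halfway through), both values start at zero, and `epsilon = 0.1`, the visit counts, and the seed are illustrative choices rather than values taken from the slide.

```python
import math
import random


def p_yellow(q_y, q_b, beta):
    """P(y) = exp(beta*Qy) / (exp(beta*Qy) + exp(beta*Qb)), written in logistic form."""
    return 1.0 / (1.0 + math.exp(-beta * (q_y - q_b)))


def simulate_bee(beta, epsilon=0.1, visits_per_phase=100, seed=0):
    rng = random.Random(seed)
    q = {"y": 0.0, "b": 0.0}
    choices = []
    for visit in range(2 * visits_per_phase):
        # Phase 1: r_y = 2, r_b = 1; phase 2: the rewards are swapped.
        if visit < visits_per_phase:
            rewards = {"y": 2.0, "b": 1.0}
        else:
            rewards = {"y": 1.0, "b": 2.0}
        flower = "y" if rng.random() < p_yellow(q["y"], q["b"], beta) else "b"
        q[flower] += epsilon * (rewards[flower] - q[flower])   # delta rule
        choices.append(flower)
    return q, choices


q_explore, choices_explore = simulate_bee(beta=1.0)
q_exploit, choices_exploit = simulate_bee(beta=50.0)
print(choices_explore.count("y"), choices_exploit.count("y"))
```

With β = 1 the bee keeps sampling both flowers, so it can notice the swap; with β = 50 it commits almost entirely to whichever flower first looked better.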

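19 Simulating Bees
[Figure: simulated foraging with $r_y = 2$, $r_b = 1$ at first and then $r_y = 1$, $r_b = 2$. Panels show $Q(y)$ and $Q(b)$ for $\beta = 1$ (exploration possible) and $\beta = 50$ (mostly exploitation).]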

20 Forget bees, how do I get to the food in the maze?

21 Selecting Actions when Reward is Delayed
States: A, B, or C. Possible actions at any state: Left (L) or Right (R).
If you randomly choose to go L or R (random "policy"), what is the value $v$ of each state?

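22 Policy Evaluation
For the random policy, each action is taken with probability 1/2, so the value of a state is the average over its two actions: $v(B)$ and $v(C)$ are each half the sum of the two rewards reachable from that state, and $v(A) = \frac{1}{2}\,(v(B) + v(C))$.
Can learn this using TD learning. Let $v(u) = w(u)$. For a move (location $u$, action $a$) $\rightarrow$ new location $u'$:
$$w(u) \rightarrow w(u) + \varepsilon\,[\,r_a(u) + v(u') - v(u)\,]$$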
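
A sketch of TD policy evaluation for a three-state maze is given below. The reward values are an assumption for illustration (0 and 5 at the two exits of B, 2 and 0 at the two exits of C, matching the maze example in Dayan & Abbott's book); the `MAZE` table, the function name, and the learning parameters are likewise hypothetical. Because the learned weights keep fluctuating, the function reports their average over the second half of the trials.

```python
import random

# Assumed maze: (state, action) -> (reward, next state); None means the trial ends.
MAZE = {
    ("A", "L"): (0.0, "B"), ("A", "R"): (0.0, "C"),
    ("B", "L"): (0.0, None), ("B", "R"): (5.0, None),
    ("C", "L"): (2.0, None), ("C", "R"): (0.0, None),
}


def evaluate_random_policy(n_trials=20000, epsilon=0.05, seed=0):
    rng = random.Random(seed)
    w = {"A": 0.0, "B": 0.0, "C": 0.0}
    avg = dict.fromkeys(w, 0.0)
    half = n_trials // 2
    for trial in range(n_trials):
        u = "A"
        while u is not None:
            a = rng.choice("LR")                     # random policy
            r, u_next = MAZE[(u, a)]
            v_next = w[u_next] if u_next is not None else 0.0
            w[u] += epsilon * (r + v_next - w[u])    # TD update
            u = u_next
        if trial >= half:
            for s in w:
                avg[s] += w[s] / (n_trials - half)
    return avg


values = evaluate_random_policy()
print({s: round(x, 2) for s, x in values.items()})
```

For these assumed rewards the averaged estimates should settle near 2.5 for B, 1 for C, and 1.75 for A, the averages implied by the random policy.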

23 Maze Value Learning for Random Policy
Once I know the values, I can pick the action that leads to the higher-valued state!

24 Selecting Actions based on Values
Values act as surrogate immediate rewards.
A locally optimal choice leads to a globally optimal policy for "Markov" environments.
Related to Dynamic Programming in CS (see appendix in text).

25 Q-learning
A simple method for action selection based on action values (or Q values) $Q(x, a)$, where $x$ is a state and $a$ is an action.
1. Let $u$ be the current state. Select an action $a$ according to:
$$P(a) = \frac{\exp(\beta\,Q(u, a))}{\sum_{a'} \exp(\beta\,Q(u, a'))}$$
2. Execute $a$ and record the new state $u'$ and reward $r$. Update $Q$:
$$Q(u, a) \rightarrow Q(u, a) + \varepsilon\,\left( r + \max_{a'} Q(u', a') - Q(u, a) \right)$$
3. Repeat until an end state is reached.
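
The three steps above can be sketched on a small maze. The maze layout and rewards are assumptions carried over for illustration (0 and 5 at the exits of B, 2 and 0 at the exits of C), as are the names `MAZE`, `softmax_choice`, and `q_learning` and the parameter values; the value of an end state is taken to be zero.

```python
import math
import random

# Assumed maze: (state, action) -> (reward, next state); None means an end state.
MAZE = {
    ("A", "L"): (0.0, "B"), ("A", "R"): (0.0, "C"),
    ("B", "L"): (0.0, None), ("B", "R"): (5.0, None),
    ("C", "L"): (2.0, None), ("C", "R"): (0.0, None),
}
ACTIONS = ("L", "R")


def softmax_choice(q_u, beta, rng):
    """Step 1: pick an action with probability proportional to exp(beta * Q(u, a))."""
    m = max(q_u.values())
    weights = [math.exp(beta * (q_u[a] - m)) for a in ACTIONS]
    return rng.choices(ACTIONS, weights=weights)[0]


def q_learning(n_trials=5000, epsilon=0.1, beta=1.0, seed=0):
    rng = random.Random(seed)
    q = {u: {a: 0.0 for a in ACTIONS} for u in "ABC"}
    for _ in range(n_trials):
        u = "A"
        while u is not None:                      # step 3: repeat until an end state
            a = softmax_choice(q[u], beta, rng)
            r, u_next = MAZE[(u, a)]              # step 2: execute and observe
            best_next = max(q[u_next].values()) if u_next is not None else 0.0
            q[u][a] += epsilon * (r + best_next - q[u][a])
            u = u_next
    return q


q = q_learning()
greedy = {u: max(ACTIONS, key=lambda a: q[u][a]) for u in "ABC"}
print(q, greedy)
```

Because the update bootstraps from the best next action rather than the action actually taken, Q(A, L) converges toward the best reward reachable through B even while the agent is still exploring.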

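26 Another Variant: Actor-Critic Learning
Two separate components: the Actor (maintains the policy) and the Critic (maintains the value of each state).
1. Critic learning ("policy evaluation"): value of state $u$ is $v(u) = w(u)$, with
$$w(u) \rightarrow w(u) + \varepsilon\,[\,r_a(u) + v(u') - v(u)\,]$$
(same as the TD rule).
2. Actor learning ("policy improvement"): use
$$P(a; u) = \frac{\exp(\beta\,Q_a(u))}{\sum_{b} \exp(\beta\,Q_b(u))}$$
to select an action $a$ in $u$. Then, for all $a'$:
$$Q_{a'}(u) \rightarrow Q_{a'}(u) + \varepsilon\,[\,r_a(u) + v(u') - v(u)\,]\,\left( \delta_{aa'} - P(a'; u) \right)$$
3. Interleave 1 and 2.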
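
The interleaved critic and actor updates can be sketched as follows. As before, the maze and its rewards (0 and 5 at the exits of B, 2 and 0 at the exits of C) are assumptions for illustration, and the names `MAZE`, `policy`, and `actor_critic`, the shared learning rate, and the trial count are hypothetical choices.

```python
import math
import random

# Assumed maze: (state, action) -> (reward, next state); None means the trial ends.
MAZE = {
    ("A", "L"): (0.0, "B"), ("A", "R"): (0.0, "C"),
    ("B", "L"): (0.0, None), ("B", "R"): (5.0, None),
    ("C", "L"): (2.0, None), ("C", "R"): (0.0, None),
}
ACTIONS = ("L", "R")


def policy(q_u, beta):
    """P(a; u) = exp(beta * Q_a(u)) / sum over b of exp(beta * Q_b(u))."""
    m = max(q_u.values())
    exps = {a: math.exp(beta * (q_u[a] - m)) for a in ACTIONS}
    total = sum(exps.values())
    return {a: exps[a] / total for a in ACTIONS}


def actor_critic(n_trials=5000, epsilon=0.1, beta=1.0, seed=0):
    rng = random.Random(seed)
    v = {u: 0.0 for u in "ABC"}                          # critic: state values
    q = {u: {a: 0.0 for a in ACTIONS} for u in "ABC"}    # actor: action preferences
    for _ in range(n_trials):
        u = "A"
        while u is not None:
            p = policy(q[u], beta)
            a = rng.choices(ACTIONS, weights=[p[b] for b in ACTIONS])[0]
            r, u_next = MAZE[(u, a)]
            delta = r + (v[u_next] if u_next is not None else 0.0) - v[u]
            v[u] += epsilon * delta                      # 1. critic (TD rule)
            for b in ACTIONS:                            # 2. actor, for all a'
                q[u][b] += epsilon * delta * ((1.0 if b == a else 0.0) - p[b])
            u = u_next
    return v, {u: policy(q[u], beta)["L"] for u in "ABC"}


v, p_left = actor_critic()
print(v, p_left)
```

The same prediction error drives both components: the critic uses it to refine the state values, and the actor uses it to raise the preference for the chosen action when the outcome was better than expected.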

27 Actor-Critic Learning in the Maze Task
[Figure: probability of going Left at a location.]

28 Demo of Reinforcement Learning in a Robot
From http://sysplan.nams.kyushuu.ac.jp/gen/papers/javademoml97/robodemo.html

29 Things to do: work on the mini-project. Next class: Course Summary.


More information

72 Calculus and Structures

72 Calculus and Structures 72 Calculus and Srucures CHAPTER 5 DISTANCE AND ACCUMULATED CHANGE Calculus and Srucures 73 Copyrigh Chaper 5 DISTANCE AND ACCUMULATED CHANGE 5. DISTANCE a. Consan velociy Le s ake anoher look a Mary s

More information

Lecture 3: Exponential Smoothing

Lecture 3: Exponential Smoothing NATCOR: Forecasing & Predicive Analyics Lecure 3: Exponenial Smoohing John Boylan Lancaser Cenre for Forecasing Deparmen of Managemen Science Mehods and Models Forecasing Mehod A (numerical) procedure

More information

Inventory Control of Perishable Items in a Two-Echelon Supply Chain

Inventory Control of Perishable Items in a Two-Echelon Supply Chain Journal of Indusrial Engineering, Universiy of ehran, Special Issue,, PP. 69-77 69 Invenory Conrol of Perishable Iems in a wo-echelon Supply Chain Fariborz Jolai *, Elmira Gheisariha and Farnaz Nojavan

More information

Solutions Problem Set 3 Macro II (14.452)

Solutions Problem Set 3 Macro II (14.452) Soluions Problem Se 3 Macro II (14.452) Francisco A. Gallego 04/27/2005 1 Q heory of invesmen in coninuous ime and no uncerainy Consider he in nie horizon model of a rm facing adjusmen coss o invesmen.

More information

Introduction to Probability and Statistics Slides 4 Chapter 4

Introduction to Probability and Statistics Slides 4 Chapter 4 Inroducion o Probabiliy and Saisics Slides 4 Chaper 4 Ammar M. Sarhan, asarhan@mahsa.dal.ca Deparmen of Mahemaics and Saisics, Dalhousie Universiy Fall Semeser 8 Dr. Ammar Sarhan Chaper 4 Coninuous Random

More information

Lecture 33: November 29

Lecture 33: November 29 36-705: Inermediae Saisics Fall 2017 Lecurer: Siva Balakrishnan Lecure 33: November 29 Today we will coninue discussing he boosrap, and hen ry o undersand why i works in a simple case. In he las lecure

More information

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter Sae-Space Models Iniializaion, Esimaion and Smoohing of he Kalman Filer Iniializaion of he Kalman Filer The Kalman filer shows how o updae pas predicors and he corresponding predicion error variances when

More information

Answers to Exercises in Chapter 7 - Correlation Functions

Answers to Exercises in Chapter 7 - Correlation Functions M J Robers - //8 Answers o Exercises in Chaper 7 - Correlaion Funcions 7- (from Papoulis and Pillai) The random variable C is uniform in he inerval (,T ) Find R, ()= u( C), ()= C (Use R (, )= R,, < or

More information

An recursive analytical technique to estimate time dependent physical parameters in the presence of noise processes

An recursive analytical technique to estimate time dependent physical parameters in the presence of noise processes WHAT IS A KALMAN FILTER An recursive analyical echnique o esimae ime dependen physical parameers in he presence of noise processes Example of a ime and frequency applicaion: Offse beween wo clocks PREDICTORS,

More information

Learning Objectives: Practice designing and simulating digital circuits including flip flops Experience state machine design procedure

Learning Objectives: Practice designing and simulating digital circuits including flip flops Experience state machine design procedure Lab 4: Synchronous Sae Machine Design Summary: Design and implemen synchronous sae machine circuis and es hem wih simulaions in Cadence Viruoso. Learning Objecives: Pracice designing and simulaing digial

More information

Learning to Take Concurrent Actions

Learning to Take Concurrent Actions Learning o Take Concurren Acions Khashayar Rohanimanesh Deparmen of Compuer Science Universiy of Massachuses Amhers, MA 0003 khash@cs.umass.edu Sridhar Mahadevan Deparmen of Compuer Science Universiy of

More information

( ) a system of differential equations with continuous parametrization ( T = R + These look like, respectively:

( ) a system of differential equations with continuous parametrization ( T = R + These look like, respectively: XIII. DIFFERENCE AND DIFFERENTIAL EQUATIONS Ofen funcions, or a sysem of funcion, are paramerized in erms of some variable, usually denoed as and inerpreed as ime. The variable is wrien as a funcion of

More information

Code-specific policy gradient rules for spiking neurons

Code-specific policy gradient rules for spiking neurons Code-specific policy gradien rules for spiking neurons Henning Sprekeler Guillaume Hennequin Wulfram Gersner Laboraory for Compuaional Neuroscience École Polyechnique Fédérale de Lausanne 115 Lausanne

More information

Chapter 2. First Order Scalar Equations

Chapter 2. First Order Scalar Equations Chaper. Firs Order Scalar Equaions We sar our sudy of differenial equaions in he same way he pioneers in his field did. We show paricular echniques o solve paricular ypes of firs order differenial equaions.

More information

Macroeconomic Theory Ph.D. Qualifying Examination Fall 2005 ANSWER EACH PART IN A SEPARATE BLUE BOOK. PART ONE: ANSWER IN BOOK 1 WEIGHT 1/3

Macroeconomic Theory Ph.D. Qualifying Examination Fall 2005 ANSWER EACH PART IN A SEPARATE BLUE BOOK. PART ONE: ANSWER IN BOOK 1 WEIGHT 1/3 Macroeconomic Theory Ph.D. Qualifying Examinaion Fall 2005 Comprehensive Examinaion UCLA Dep. of Economics You have 4 hours o complee he exam. There are hree pars o he exam. Answer all pars. Each par has

More information

More Digital Logic. t p output. Low-to-high and high-to-low transitions could have different t p. V in (t)

More Digital Logic. t p output. Low-to-high and high-to-low transitions could have different t p. V in (t) EECS 4 Spring 23 Lecure 2 EECS 4 Spring 23 Lecure 2 More igial Logic Gae delay and signal propagaion Clocked circui elemens (flip-flop) Wriing a word o memory Simplifying digial circuis: Karnaugh maps

More information

On-line Adaptive Optimal Timing Control of Switched Systems

On-line Adaptive Optimal Timing Control of Switched Systems On-line Adapive Opimal Timing Conrol of Swiched Sysems X.C. Ding, Y. Wardi and M. Egersed Absrac In his paper we consider he problem of opimizing over he swiching imes for a muli-modal dynamic sysem when

More information

NEWTON S SECOND LAW OF MOTION

NEWTON S SECOND LAW OF MOTION Course and Secion Dae Names NEWTON S SECOND LAW OF MOTION The acceleraion of an objec is defined as he rae of change of elociy. If he elociy changes by an amoun in a ime, hen he aerage acceleraion during

More information

Georey E. Hinton. University oftoronto. Technical Report CRG-TR February 22, Abstract

Georey E. Hinton. University oftoronto.   Technical Report CRG-TR February 22, Abstract Parameer Esimaion for Linear Dynamical Sysems Zoubin Ghahramani Georey E. Hinon Deparmen of Compuer Science Universiy oftorono 6 King's College Road Torono, Canada M5S A4 Email: zoubin@cs.orono.edu Technical

More information