CSE/NB 528 Lecture 14: Reinforcement Learning (Chapter 9)
- Dorcas Paul
1 CSE/NB 528 Lecture 14: Reinforcement Learning (Chapter 9). Image from http://clasdean.la.asu.edu/news/images/ubep2001/neuron3.jpg. Lecture figures are from Dayan & Abbott's book: http://people.brandeis.edu/~abbott/book/index.html
2 Today's Agenda: Reinforcement Learning. What is reinforcement learning? Classical conditioning: learning to salivate to predict reward. Predicting delayed rewards: temporal difference (TD) learning. Learning to act: Q-learning and the actor-critic architecture.
3 Some Supervised Learning Demos on the Web. Function approximation: http://neuron.eng.wayne.edu/bpfunctionapprox/bpfunctionapprox.html. Pattern recognition: http://eecs.wsu.edu/~cook/ai/lectures/applets/hnn/jrec.html. Image compression: http://neuron.eng.wayne.edu/bpimagecompression9plus/bp9plus.html. Backpropagation for control (ball balancing): http://neuron.eng.wayne.edu/bpballbalancing/ball5.html
4 Humans don't get exact supervisory signals (commands for muscles) for learning to talk, walk, ride a bicycle, play the piano, drive, etc. We learn by trial-and-error and by watching others, and might get rewards and punishments along the way. Enter Reinforcement Learning.
5 The Reinforcement Learning Agent (figure): the agent receives state u and reward r from the environment and responds with action a.
6 The Reinforcement Learning Framework. Unsupervised learning: learn the hidden causes of inputs. Supervised learning: learn a function from training examples of (input, desired output) pairs. Reinforcement learning: learn the best action for any given state so as to maximize total expected future reward, by trial and error. It is intermediate between unsupervised and supervised learning: instead of an explicit teaching signal or desired output, you get rewards or punishments. Inspired by classical conditioning experiments (remember Pavlov's hyper-salivating dog?).
7 Early Results: Pavlov and his Dog. Classical Pavlovian conditioning experiments. Training: Bell + Food. After: Bell → Salivate. The conditioned stimulus (bell) predicts the future reward (food). http://employees.csbsju.edu/tcreed/pb/pdoganim.html
8 Predicting Reward. Stimulus u = 0 or 1; expected reward v = wu; delivered reward r. Learn w by minimizing (r − v)², giving the update w → w + ε(r − v)u, with prediction error δ = r − v. For small ε and u = 1 this becomes w → w + ε(r − w), so the average value of w equals ⟨r⟩, the mean delivered reward. Same as the delta rule; also called the Rescorla-Wagner rule.
9 Predicting Reward during Conditioning (figure): with the reward present (conditioning, r = 1, ε = 0.5), w rises toward 1; with the reward removed (extinction), w decays back toward 0; with the reward presented on 50% of the trials, w settles near the average reward.
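The Rescorla-Wagner update above is easy to simulate. A minimal Python sketch (the function name is ours; ε = 0.5 and the three reward schedules follow the slides):

```python
import random

def rescorla_wagner(rewards, eps=0.5, w0=0.0):
    """Delta rule / Rescorla-Wagner: w -> w + eps*(r - w), i.e. u = 1 on every trial."""
    w, history = w0, []
    for r in rewards:
        w += eps * (r - w)   # prediction error delta = r - v, with v = w*u and u = 1
        history.append(w)
    return history

# Conditioning: reward r = 1 on every trial -> w approaches 1
acq = rescorla_wagner([1.0] * 20)
# Extinction: reward removed (r = 0) -> w decays back toward 0
ext = rescorla_wagner([0.0] * 20, w0=acq[-1])
# Partial reinforcement: reward on 50% of trials -> w hovers near the mean reward
random.seed(0)
partial = rescorla_wagner([1.0 if random.random() < 0.5 else 0.0 for _ in range(200)])
```

Running the three schedules back to back reproduces the acquisition, extinction, and partial-reinforcement curves described on the slide.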
10 Predicting Delayed Rewards. In more realistic cases, the reward is typically delivered at the end, when you know whether you succeeded or not. Time runs from 0 to T, with stimulus u(t) and reward r(t) at each time step (note: r(t) can be zero). Key idea: make the output v(t) predict the total expected future reward starting from time t: v(t) = ⟨ Σ_{τ=0}^{T−t} r(t+τ) ⟩.
11 Learning to Predict Delayed Rewards. Use a set of modifiable weights w(τ) and predict based on all past stimuli u: v(t) = Σ_{τ=0}^{t} w(τ) u(t−τ). We would like to find the w(τ) that minimize ⟨ ( Σ_{τ=0}^{T−t} r(t+τ) − v(t) )² ⟩. Can we minimize this using gradient descent and the delta rule? Yes, BUT the future rewards are not yet available.
12 Temporal Difference (TD) Learning. Key idea: rewrite the squared error to get rid of the future terms: Σ_{τ=0}^{T−t} r(t+τ) = r(t) + Σ_{τ=0}^{T−t−1} r(t+1+τ) ≈ r(t) + v(t+1), i.e. expected future reward ≈ current reward plus the next prediction. TD learning: for each time step t, compute the prediction error δ(t) = r(t) + v(t+1) − v(t) and, for all 0 ≤ τ ≤ t, update w(τ) → w(τ) + ε δ(t) u(t−τ), where v(t) = Σ_{τ=0}^{t} w(τ) u(t−τ).
13 Predicting Delayed Reward: TD Learning (figure). Stimulus at t = 100 and reward at t = 200; the prediction error δ at each time step is shown over many trials.
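The TD rule above can be sketched in Python for exactly this protocol (stimulus at t = 100, reward at t = 200); the trial count and learning rate are illustrative:

```python
import numpy as np

T = 250                        # time steps per trial
t_stim, t_reward = 100, 200    # stimulus at t = 100, reward at t = 200 (as on the slide)
eps = 0.2                      # learning rate (illustrative)

u = np.zeros(T); u[t_stim] = 1.0      # stimulus u(t)
r = np.zeros(T); r[t_reward] = 1.0    # delayed reward r(t)
w = np.zeros(T)                       # weights w(tau) on past stimuli

for trial in range(1000):
    v = np.convolve(u, w)[:T]                 # v(t) = sum_tau w(tau) u(t - tau)
    delta = np.zeros(T)
    delta[:-1] = r[:-1] + v[1:] - v[:-1]      # delta(t) = r(t) + v(t+1) - v(t)
    for t in range(T - 1):
        w[: t + 1] += eps * delta[t] * u[t::-1]   # w(tau) += eps*delta(t)*u(t-tau), 0 <= tau <= t

v = np.convolve(u, w)[:T]
delta = np.zeros(T)
delta[:-1] = r[:-1] + v[1:] - v[:-1]
# After training, v(t) is ~1 between stimulus and reward, and the prediction
# error has migrated from the reward time to (just before) the stimulus.
```

Plotting delta across trials reproduces the figure's effect: the error peak moves backward from the reward to the predictive stimulus.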
14 Reward Prediction Error Signal in Monkeys? Dopaminergic cells in the Ventral Tegmental Area (VTA) behave like the reward prediction error δ(t) = r(t) + v(t+1) − v(t). Before training, the cells fire at the time of the reward; after training, they fire at the reward-predicting stimulus, and at the now-predicted reward there is no error, since r(t) + v(t+1) − v(t) = 0 when v(t) correctly predicts r(t) + v(t+1).
15 More Evidence for Prediction Error Signals. Dopaminergic cells in VTA show a negative error when an expected reward is withheld: with r(t) = 0 and v(t+1) = 0, δ(t) = [r(t) + v(t+1) − v(t)] = −v(t).
16 That's great, but how does all that math help me get food in a maze?
17 Using Reward Predictions to Select Actions. Suppose you have computed a value Q(a) for each action a: Q(a) = predicted reward for executing action a (higher if the action yields more reward, lower otherwise). You can then select actions probabilistically according to their values: P(a) = exp(βQ(a)) / Σ_{a'} exp(βQ(a')). High β selects the action with the highest Q value; low β selects more uniformly.
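A minimal sketch of this softmax action selection (the function name and example Q values are ours):

```python
import math
import random

def softmax_action(q, beta):
    """Select action a with probability P(a) = exp(beta*Q(a)) / sum_a' exp(beta*Q(a'))."""
    m = max(q.values())  # subtract the max for numerical stability (probabilities unchanged)
    weights = {a: math.exp(beta * (v - m)) for a, v in q.items()}
    x = random.random() * sum(weights.values())
    for a, wt in weights.items():
        x -= wt
        if x <= 0:
            return a
    return a  # fallback for floating-point rounding

q = {"left": 1.0, "right": 2.0}
random.seed(0)
# High beta -> near-greedy (almost always "right"); low beta -> near-uniform exploration
picks_greedy = sum(softmax_action(q, beta=50.0) == "right" for _ in range(1000))
picks_uniform = sum(softmax_action(q, beta=0.01) == "right" for _ in range(1000))
```

With β = 50 the higher-valued action wins essentially every time; with β = 0.01 the two actions are chosen about equally often.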
18 Simple Example: Bee Foraging. Experiment: bees select either yellow (y) or blue (b) flowers based on nectar reward. Idea: the value of yellow/blue is the average reward obtained so far, learned with the delta rule (a running average): Q_y → Q_y + ε(r_y − Q_y), Q_b → Q_b + ε(r_b − Q_b). Flowers are then chosen by softmax: P(y) = exp(βQ_y) / [exp(βQ_y) + exp(βQ_b)], P(b) = 1 − P(y). Yum! http://svi.cps.utexas.edu/bee_on_flower_original.htm
19 Simulating Bees (figure). Rewards start at r_y = 2, r_b = 1 and are then swapped to r_y = 1, r_b = 2. With β = 1, exploration is possible; with β = 50, mostly exploitation.
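The bee simulation can be sketched as follows. The reward values and mid-run swap follow the slides; the function name, ε, trial count, and seed are illustrative:

```python
import math
import random

def run_bees(beta, eps=0.1, trials=200, seed=1):
    """Bee foraging: delta-rule running averages of nectar reward plus softmax choice.
    Rewards start at (yellow=2, blue=1) and are swapped halfway through."""
    random.seed(seed)
    q = {"y": 0.0, "b": 0.0}
    choices = []
    for t in range(trials):
        rewards = {"y": 2.0, "b": 1.0} if t < trials // 2 else {"y": 1.0, "b": 2.0}
        p_y = math.exp(beta * q["y"]) / (math.exp(beta * q["y"]) + math.exp(beta * q["b"]))
        flower = "y" if random.random() < p_y else "b"
        q[flower] += eps * (rewards[flower] - q[flower])  # delta rule / running average
        choices.append(flower)
    return choices

explore = run_bees(beta=1.0)    # beta = 1: both flowers keep being sampled
exploit = run_bees(beta=50.0)   # beta = 50: locks onto one flower once values separate
```

Comparing the two choice sequences shows the exploration/exploitation trade-off that the figure illustrates.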
20 Forget bees, how do I get to the food in the maze?
21 Selecting Actions when Reward is Delayed. States: A, B, or C. Possible actions at any state: Left (L) or Right (R). If you randomly choose to go L or R (a random policy), what is the value v of each state?
22 Policy Evaluation. For the random policy, the value of each state is the average total reward obtainable from it; in the maze, for example, v(A) = ½[v(B) + v(C)]. These values can be learned using TD learning: let v(u) = w(u); after taking action a at location u and arriving at the new location u′, update w(u) → w(u) + ε[r_a(u) + v(u′) − v(u)].
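A sketch of this TD policy evaluation on a small three-state maze. The lecture's actual arm rewards are not legible in this transcription, so the leaf rewards below are made up for illustration:

```python
import random

# Hypothetical maze: from A, Left leads to B and Right leads to C; from B and C,
# each action enters a terminal arm with the (illustrative) reward shown below.
REWARDS = {("B", "L"): 0.0, ("B", "R"): 5.0, ("C", "L"): 2.0, ("C", "R"): 0.0}
NEXT = {("A", "L"): "B", ("A", "R"): "C"}

def td_policy_evaluation(eps=0.02, trials=20000, seed=0):
    """TD(0) evaluation of the random policy: w(u) -> w(u) + eps*[r + v(u') - v(u)]."""
    random.seed(seed)
    w = {"A": 0.0, "B": 0.0, "C": 0.0}
    for _ in range(trials):
        u = "A"
        while u is not None:
            a = random.choice("LR")            # random policy
            r = REWARDS.get((u, a), 0.0)
            u_next = NEXT.get((u, a))          # None once a terminal arm is entered
            v_next = w[u_next] if u_next else 0.0
            w[u] += eps * (r + v_next - w[u])  # TD update with v(u) = w(u)
            u = u_next
    return w

w = td_policy_evaluation()
# With these rewards the exact values are v(B) = 2.5, v(C) = 1.0, v(A) = 1.75.
```

The learned w's fluctuate around the exact random-policy values, consistent with v(A) = ½[v(B) + v(C)].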
23 Maze Value Learning for the Random Policy (figure). Once I know the values, I can pick the action that leads to the higher-valued state!
24 Selecting Actions based on Values. Values act as surrogate immediate rewards: a locally optimal choice leads to a globally optimal policy for Markov environments. Related to dynamic programming in computer science (see the appendix in the text).
25 Q-Learning. A simple method for action selection based on action values (Q values) Q(x, a), where x is a state and a is an action.
1. Let u be the current state. Select an action a according to P(a) = exp(βQ(u, a)) / Σ_{a'} exp(βQ(u, a')).
2. Execute a and record the new state u′ and reward r. Update: Q(u, a) → Q(u, a) + ε[r + max_{a'} Q(u′, a') − Q(u, a)].
3. Repeat until an end state is reached.
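The three steps above can be sketched as tabular Q-learning on a toy three-state maze (the leaf rewards are assumed for illustration, not the lecture's numbers):

```python
import math
import random

# Illustrative maze: A -> B (Left) or C (Right); B and C lead to terminal arms.
REWARDS = {("B", "L"): 0.0, ("B", "R"): 5.0, ("C", "L"): 2.0, ("C", "R"): 0.0}
NEXT = {("A", "L"): "B", ("A", "R"): "C"}

def q_learning(beta=2.0, eps=0.1, trials=3000, seed=0):
    """Tabular Q-learning with softmax action selection."""
    random.seed(seed)
    q = {(s, a): 0.0 for s in "ABC" for a in "LR"}
    for _ in range(trials):
        u = "A"
        while u is not None:
            # Step 1: softmax over the two actions available in state u
            wts = {a: math.exp(beta * q[(u, a)]) for a in "LR"}
            a = "L" if random.random() < wts["L"] / (wts["L"] + wts["R"]) else "R"
            # Step 2: execute, observe reward and new state, update Q
            r = REWARDS.get((u, a), 0.0)
            u_next = NEXT.get((u, a))       # None means an end state was reached
            max_next = max(q[(u_next, b)] for b in "LR") if u_next else 0.0
            q[(u, a)] += eps * (r + max_next - q[(u, a)])   # Q-learning update
            u = u_next                      # Step 3: repeat until an end state
    return q

q = q_learning()
# With these rewards, the greedy policy goes Left at A (toward B), then Right at B.
```

Note that Q(A, L) learns the delayed value of reaching B (via max over Q(B, ·)), even though the immediate reward for that move is zero.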
26 Another Variant: Actor-Critic Learning. Two separate components: the actor maintains the policy and the critic maintains the value of each state.
1. Critic learning ("policy evaluation"): the value of state u is v(u) = w(u); update w(u) → w(u) + ε[r_a(u) + v(u′) − v(u)] (same as the TD rule).
2. Actor learning ("policy improvement"): select an action a in state u according to P(a′; u) = exp(βQ_{a′}(u)) / Σ_b exp(βQ_b(u)); then, for all a′, update Q_{a′}(u) → Q_{a′}(u) + ε[r_a(u) + v(u′) − v(u)][δ_{aa′} − P(a′; u)].
3. Interleave 1 and 2.
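The interleaved critic/actor updates can be sketched on a toy three-state maze (the leaf rewards are assumed for illustration; β, ε, and trial counts are ours):

```python
import math
import random

# Illustrative maze: A -> B (Left) or C (Right); B and C lead to terminal arms.
REWARDS = {("B", "L"): 0.0, ("B", "R"): 5.0, ("C", "L"): 2.0, ("C", "R"): 0.0}
NEXT = {("A", "L"): "B", ("A", "R"): "C"}

def actor_critic(beta=1.0, eps=0.1, trials=5000, seed=0):
    random.seed(seed)
    w = {s: 0.0 for s in "ABC"}                       # critic: v(u) = w(u)
    q = {(s, a): 0.0 for s in "ABC" for a in "LR"}    # actor: action preferences Q_a(u)
    for _ in range(trials):
        u = "A"
        while u is not None:
            # actor's softmax policy at state u
            probs = {a: math.exp(beta * q[(u, a)]) for a in "LR"}
            z = sum(probs.values())
            probs = {a: p / z for a, p in probs.items()}
            a = "L" if random.random() < probs["L"] else "R"
            r = REWARDS.get((u, a), 0.0)
            u_next = NEXT.get((u, a))                            # None = terminal arm
            delta = r + (w[u_next] if u_next else 0.0) - w[u]    # TD error
            w[u] += eps * delta                                  # 1. critic update
            for a2 in "LR":                                      # 2. actor update
                q[(u, a2)] += eps * delta * ((a2 == a) - probs[a2])
            u = u_next
    return w, q

w, q = actor_critic()
```

After training, the actor prefers the higher-reward action at each state, and the critic's values reflect the improved policy.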
27 Actor-Critic Learning in the Maze Task (figure): the probability of going Left at each location over the course of learning.
28 Demo of Reinforcement Learning in a Robot, from http://sysplan.nams.kyushu-u.ac.jp/gen/papers/javademoml97/robodemo.html
29 Things to do: Work on the mini-project. Next class: Course Summary.
More information