CSE/NB 528 Lecture 14: From Supervised to Reinforcement Learning (Chapter 9) R. Rao, 528: Lecture 14
- Brendan Doyle
Transcription
1 CSE/NB 528 Lecture 14: From Supervised to Reinforcement Learning (Chapter 9)
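2 Recall from last time: Sigmoid Networks. Output: $v = g(\mathbf{w}^T \mathbf{u}) = g\left(\sum_i w_i u_i\right)$, with input nodes $\mathbf{u} = (u_1\; u_2\; u_3)^T$. Sigmoid output function: $g(a) = \frac{1}{1 + e^{-\beta a}}$. The sigmoid is a non-linear "squashing" function: it squashes its input to lie between 0 and 1. The parameter $\beta$ controls the slope.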
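As a minimal sketch of the sigmoid unit on this slide, the following plain-Python snippet computes $v = g(\mathbf{w}^T \mathbf{u})$ for several slopes. The weight and input values are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(a, beta=1.0):
    """Squashing function g(a) = 1 / (1 + exp(-beta * a)), with values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-beta * a))

def unit_output(w, u, beta=1.0):
    """Output v = g(w . u) of a single sigmoid unit."""
    a = sum(wi * ui for wi, ui in zip(w, u))
    return sigmoid(a, beta)

if __name__ == "__main__":
    w = [0.5, -1.0, 2.0]  # hypothetical weights
    u = [1.0, 0.0, 0.5]   # hypothetical inputs
    for beta in (0.5, 1.0, 5.0):
        # A larger beta gives a steeper sigmoid, so the output saturates faster.
        print(beta, unit_output(w, u, beta))
```

With these values the weighted sum is positive, so the output moves closer to 1 as the slope parameter grows.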
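3 What should we optimize? Given training examples $(\mathbf{u}^m, d^m)$, $m = 1, \ldots, N$, define the output error function $E(\mathbf{w}) = \frac{1}{2} \sum_m (d^m - v^m)^2$, where $v^m = g(\mathbf{w}^T \mathbf{u}^m)$. How would you change $\mathbf{w}$ so that $E(\mathbf{w})$ is minimized?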
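4 Learning the Synaptic Weights. How would you change $\mathbf{w}$ so that $E(\mathbf{w})$ is minimized? Gradient descent: change $\mathbf{w}$ in proportion to $-dE/d\mathbf{w}$ (why?): $\mathbf{w} \to \mathbf{w} - \epsilon \frac{dE}{d\mathbf{w}}$, where $\frac{dE}{d\mathbf{w}} = -\sum_m (d^m - v^m)\, g'(\mathbf{w}^T \mathbf{u}^m)\, \mathbf{u}^m$. Here $(d^m - v^m)$ is the error (delta) and $g'$ is the derivative of the sigmoid. Also known as the "delta rule" or "LMS (least mean square) rule".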
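The delta rule can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the lecture's own code: the training set (logical OR with a constant bias input), the learning rate `eps`, the step count, and the choice $\beta = 1$ are all assumptions made for the example.

```python
import math

def g(a):
    """Sigmoid with beta = 1 (an assumption for this sketch)."""
    return 1.0 / (1.0 + math.exp(-a))

def dot(w, u):
    return sum(wi * ui for wi, ui in zip(w, u))

def error(w, data):
    """E(w) = 1/2 * sum_m (d^m - v^m)^2."""
    return 0.5 * sum((d - g(dot(w, u))) ** 2 for u, d in data)

def delta_rule_step(w, data, eps):
    """One batch gradient-descent step: w <- w - eps * dE/dw."""
    grad = [0.0] * len(w)
    for u, d in data:
        v = g(dot(w, u))
        delta = (d - v) * v * (1.0 - v)  # error times g'(a), using g' = g (1 - g)
        for i, ui in enumerate(u):
            grad[i] -= delta * ui        # dE/dw_i = -sum_m delta^m * u_i^m
    return [wi - eps * gi for wi, gi in zip(w, grad)]

# Hypothetical training set: logical OR, with a constant first input acting as bias.
OR_DATA = [([1.0, 0.0, 0.0], 0.0), ([1.0, 0.0, 1.0], 1.0),
           ([1.0, 1.0, 0.0], 1.0), ([1.0, 1.0, 1.0], 1.0)]

def train(data, eps=0.5, steps=2000):
    w = [0.0] * len(data[0][0])
    for _ in range(steps):
        w = delta_rule_step(w, data, eps)
    return w

if __name__ == "__main__":
    w = train(OR_DATA)
    print("weights:", w, "error:", error(w, OR_DATA))
```

The identity $g'(a) = g(a)(1 - g(a))$ holds for $\beta = 1$, which is why the derivative can be computed from the output alone.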
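5 But wait: what if we have multiple layers? $v_i = g\left(\sum_j W_{ji}\, g\left(\sum_k w_{kj} u_k\right)\right)$. Output $\mathbf{v} = (v_1\; v_2\; \ldots\; v_J)^T$; desired output $= \mathbf{d}$. The delta rule can be used to adapt the hidden-output weights $W$. How do we adapt the input-hidden weights $w$? (No desired output is provided for the hidden layer.) Input $\mathbf{u} = (u_1\; u_2\; \ldots\; u_K)^T$.
6 Enter the backpropagation algorithm. (Actually, nothing but the chain rule from calculus.)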
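7 Uppermost layer (delta rule). $v_i = g\left(\sum_j W_{ji} x_j\right)$ and $E(W, w) = \frac{1}{2} \sum_i (d_i - v_i)^2$. Learning rule for hidden-output weights $W$: $W_{ji} \to W_{ji} - \epsilon \frac{dE}{dW_{ji}}$ {gradient descent}, with $\frac{dE}{dW_{ji}} = -(d_i - v_i)\, g'\left(\sum_j W_{ji} x_j\right) x_j$ {delta rule}.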
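8 Backpropagation: inner layer (chain rule). $E(W, w) = \frac{1}{2} \sum_i (d_i - v_i)^2$, with $v_i^m = g\left(\sum_j W_{ji} x_j^m\right)$ and $x_j^m = g\left(\sum_k w_{kj} u_k^m\right)$. Learning rule for input-hidden weights $w$: $w_{kj} \to w_{kj} - \epsilon \frac{dE}{dw_{kj}}$. But $\frac{dE}{dw_{kj}} = \frac{dE}{dx_j} \cdot \frac{dx_j}{dw_{kj}}$ {chain rule}, so $\frac{dE}{dw_{kj}} = -\sum_{m,i} (d_i^m - v_i^m)\, g'\left(\sum_j W_{ji} x_j^m\right) W_{ji} \cdot g'\left(\sum_k w_{kj} u_k^m\right) u_k^m$.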
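The two learning rules (delta rule for the output layer, chain rule for the inner layer) can be sketched for a single training example as follows. The layer sizes, weight values, and learning rate are hypothetical; the indexing follows the slides, with `w[k][j]` connecting input k to hidden unit j and `W[j][i]` connecting hidden unit j to output i, and the sigmoid slope is assumed to be 1.

```python
import math

def g(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(W, w, u):
    """x_j = g(sum_k w_kj u_k) and v_i = g(sum_j W_ji x_j)."""
    K, J, I = len(w), len(W), len(W[0])
    x = [g(sum(w[k][j] * u[k] for k in range(K))) for j in range(J)]
    v = [g(sum(W[j][i] * x[j] for j in range(J))) for i in range(I)]
    return x, v

def loss(W, w, u, d):
    """E(W, w) = 1/2 * sum_i (d_i - v_i)^2 for one example."""
    _, v = forward(W, w, u)
    return 0.5 * sum((di - vi) ** 2 for di, vi in zip(d, v))

def backprop(W, w, u, d):
    """Return (dE/dW, dE/dw) for one example."""
    K, J, I = len(w), len(W), len(W[0])
    x, v = forward(W, w, u)
    # Output deltas: (d_i - v_i) * g'(.), with g' = v (1 - v) for the sigmoid.
    delta_out = [(d[i] - v[i]) * v[i] * (1.0 - v[i]) for i in range(I)]
    dW = [[-delta_out[i] * x[j] for i in range(I)] for j in range(J)]
    # Chain rule: dE/dx_j = -sum_i delta_out_i * W_ji, then dx_j/dw_kj = g'(.) u_k.
    dE_dx = [-sum(delta_out[i] * W[j][i] for i in range(I)) for j in range(J)]
    dw = [[dE_dx[j] * x[j] * (1.0 - x[j]) * u[k] for j in range(J)] for k in range(K)]
    return dW, dw

def sgd_step(W, w, u, d, eps):
    """Gradient-descent update of both weight matrices, in place."""
    dW, dw = backprop(W, w, u, d)
    for j in range(len(W)):
        for i in range(len(W[0])):
            W[j][i] -= eps * dW[j][i]
    for k in range(len(w)):
        for j in range(len(w[0])):
            w[k][j] -= eps * dw[k][j]

if __name__ == "__main__":
    W = [[0.3, -0.2], [0.1, 0.4], [-0.5, 0.2]]  # hypothetical: 3 hidden, 2 outputs
    w = [[0.2, -0.1, 0.4], [0.7, 0.3, -0.6]]    # hypothetical: 2 inputs, 3 hidden
    u, d = [1.0, 0.5], [1.0, 0.0]
    for step in range(5):
        print(step, loss(W, w, u, d))
        sgd_step(W, w, u, d, eps=0.5)
```

A useful sanity check for any backprop implementation is to compare these analytic gradients with finite-difference estimates of the loss.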
9 Example: Learning to Drive
10 Example Network. Get steering angle: training output $\mathbf{d} = (d_1\; d_2\; \ldots\; d_{30})$. Get current camera image: training input $\mathbf{u} = (u_1\; u_2\; \ldots\; u_{960})$ = image pixels. (Pomerleau, 1992)
11 Training the network using backprop. Start with random weights $W$, $w$. Given input $\mathbf{u}$, the network produces output $\mathbf{v}$. Use backprop to learn $W$ and $w$ that minimize the total error over all output units (labeled $i$): $E(W, w) = \frac{1}{2} \sum_i (d_i - v_i)^2$.
12 Learning to Drive using Backprop. [Figure: one of the learned "road features" $w_i$.]
13 ALVINN (Autonomous Land Vehicle in a Neural Network). Trained using human driver + camera images. After learning: drove up to 70 mph on highway; up to [...] miles without intervention; drove cross-country largely autonomously. (Pomerleau)
14 But that doesn't help me find food in a maze.
15 Humans and animals in general don't get exact supervisory signals (commands for muscles) for learning to talk, walk, ride a bicycle, play the piano, drive, etc. We learn by trial and error (with hints from others). We might get "rewards and punishments" along the way. Enter Reinforcement Learning.
16 The Reinforcement Learning "Agent". [Figure: the agent receives state u and reward r from the environment and sends action a to the environment.]
17 The Reinforcement Learning Framework. Unsupervised learning: learn the hidden causes of inputs. Supervised learning: learn a function based on training examples of (input, desired output) pairs. Reinforcement learning: learn the best action for any given state so as to maximize total expected (future) reward. It is intermediate between unsupervised and supervised learning: instead of an explicit teaching signal (or desired output), you get rewards or punishments. Inspired by classical conditioning experiments.
18 Early Results: Pavlov and his Dog. Classical (Pavlovian) conditioning experiments. Training: Bell → Food. After: Bell → Salivate. The conditioned stimulus (bell) predicts future reward (food). (http://employees.csbsju.edu/creed/pb/pdoganim.html)
19 Predicting Delayed Rewards. Reward is typically delivered at the end (when you know whether you succeeded or not). Time: $0 \le t \le T$, with stimulus $u(t)$ and reward $r(t)$ at each time step $t$. (Note: $r(t)$ can be zero at some time points.) Key idea: make the output $v(t)$ predict the total expected future reward starting from time $t$: $v(t) = \left\langle \sum_{\tau=0}^{T-t} r(t+\tau) \right\rangle$.
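To make the prediction target concrete, the sketch below computes the total future reward from each time step for one trial. It assumes a single hypothetical reward sequence; the expectation in the slide would correspond to averaging such sums over many trials.

```python
def reward_to_go(r):
    """Return v with v[t] = r[t] + r[t+1] + ... + r[T] for a single trial."""
    v = [0.0] * len(r)
    running = 0.0
    # Accumulate from the end of the trial backwards.
    for t in range(len(r) - 1, -1, -1):
        running += r[t]
        v[t] = running
    return v

if __name__ == "__main__":
    rewards = [0.0, 0.0, 1.0, 0.0, 2.0]  # hypothetical trial; reward arrives late
    print(reward_to_go(rewards))
```

Note that the target at an early time step depends on rewards that have not yet been delivered at that step.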
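20 Learning to Predict Delayed Rewards. Use a set of modifiable weights $w(\tau)$ and predict based on all past stimuli $u(t)$: $v(t) = \sum_{\tau=0}^{t} w(\tau)\, u(t-\tau)$. We would like to find the weights (or filter) $w(\tau)$ that minimize $\left\langle \left( \sum_{\tau=0}^{T-t} r(t+\tau) - v(t) \right)^2 \right\rangle$. Can we minimize this using gradient descent and the delta rule? Yes, BUT the future rewards are not yet available.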
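21 Temporal Difference (TD) Learning. Key idea: rewrite the squared error to get rid of future terms: $\left( \sum_{\tau=0}^{T-t} r(t+\tau) - v(t) \right)^2 = \left( r(t) + \sum_{\tau=0}^{T-t-1} r(t+1+\tau) - v(t) \right)^2 \approx \left( r(t) + v(t+1) - v(t) \right)^2$. TD learning rule: $w(\tau) \to w(\tau) + \epsilon\, [r(t) + v(t+1) - v(t)]\, u(t-\tau)$, where $r(t) + v(t+1)$ is the expected future reward and $v(t)$ is the prediction.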
22 Predicting Delayed Reward: TD Learning. Stimulus at t = 100 and reward at t = 200. [Figure: prediction error for each time step, over many trials.]
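A small simulation of the TD rule is sketched below. To keep the pure-Python loop fast it uses a shortened trial, with the stimulus at t = 10 and the reward at t = 20 instead of 100 and 200; these times, the unit-pulse stimulus and reward, the learning rate, and the number of trials are all assumptions made for illustration.

```python
def td_trial(w, u, r, eps):
    """Run one trial of TD learning, updating w in place; return the TD errors."""
    T = len(u)

    def v(t):
        # Prediction v(t) = sum_{tau=0..t} w(tau) u(t - tau); zero past the trial end.
        if t >= T:
            return 0.0
        return sum(w[tau] * u[t - tau] for tau in range(t + 1))

    deltas = []
    for t in range(T):
        delta = r[t] + v(t + 1) - v(t)          # temporal difference error
        for tau in range(t + 1):
            w[tau] += eps * delta * u[t - tau]  # w(tau) <- w(tau) + eps * delta * u(t - tau)
        deltas.append(delta)
    return deltas

def run(trials=200, T=30, t_stim=10, t_reward=20, eps=0.5):
    u = [0.0] * T
    u[t_stim] = 1.0
    r = [0.0] * T
    r[t_reward] = 1.0
    w = [0.0] * T
    history = [td_trial(w, u, r, eps) for _ in range(trials)]
    return w, history

if __name__ == "__main__":
    w, history = run()
    print("first trial, error at reward time:", history[0][20])
    print("last trial, error at reward time:", history[-1][20])
    print("last trial, error just before stimulus:", history[-1][9])
```

In this sketch the prediction error starts at the time of the reward and, over trials, moves back to the step at which the stimulus first affects the prediction (here the step just before the stimulus, because the error at time t includes v(t+1)).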
23 Reward Prediction Error in the Primate Brain? Dopaminergic cells in the Ventral Tegmental Area (VTA). Reward prediction error? $[r(t) + v(t+1) - v(t)]$. [Figure: responses before training and after training, annotated with $[0 + v(t+1) - v(t)]$ and "No error": $v(t) = r(t) + v(t+1)$.]
24 More Evidence for Prediction Error Signals. Dopaminergic cells in VTA. Negative error: $r(t) = 0$, $v(t+1) = 0$, so $[r(t) + v(t+1) - v(t)] = -v(t)$.
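The sign of the prediction error $[r(t) + v(t+1) - v(t)]$ in different situations can be checked with a few lines of arithmetic. The numbers below are hypothetical, with a unit reward and a fully learned prediction of 1.

```python
def td_error(r, v_next, v_now):
    """Prediction error r(t) + v(t+1) - v(t)."""
    return r + v_next - v_now

if __name__ == "__main__":
    # Unpredicted reward (before training): nothing was predicted, reward arrives.
    print("unpredicted reward:", td_error(r=1.0, v_next=0.0, v_now=0.0))
    # Fully predicted reward (after training): v(t) = r(t) + v(t+1), so no error.
    print("predicted reward:", td_error(r=1.0, v_next=0.0, v_now=1.0))
    # Predicted reward withheld: r(t) = 0 and v(t+1) = 0, so the error is -v(t).
    print("omitted reward:", td_error(r=0.0, v_next=0.0, v_now=1.0))
```

Only the last case gives a negative error, which matches the expression $-v(t)$ on the slide.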
25 That's great, but how does all that math help me get food in a maze?
26 Using Reward Predictions to Select Actions. Suppose you have computed a "value" for each action: $Q(a)$ = value (predicted reward) for executing action $a$; it is higher if the action yields more reward, lower otherwise. You can select actions probabilistically according to their value: $P(a) = \frac{\exp(\beta Q(a))}{\sum_{a'} \exp(\beta Q(a'))}$. High $\beta$ selects actions with the highest Q value; low $\beta$ selects more uniformly.
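The probabilistic selection rule can be sketched as below. The action names and Q values are hypothetical, and subtracting the maximum Q before exponentiating is an implementation choice for numerical stability that leaves the probabilities unchanged.

```python
import math
import random

def softmax_probs(Q, beta):
    """P(a) = exp(beta * Q(a)) / sum_a' exp(beta * Q(a')), for a dict Q of action values."""
    q_max = max(Q.values())
    exps = {a: math.exp(beta * (q - q_max)) for a, q in Q.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}

def select_action(Q, beta, rng=random):
    """Sample an action according to the softmax probabilities."""
    probs = softmax_probs(Q, beta)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions])[0]

if __name__ == "__main__":
    Q = {"L": 1.0, "R": 2.0}  # hypothetical action values
    for beta in (0.0, 1.0, 50.0):
        print(beta, softmax_probs(Q, beta))
    print(select_action(Q, beta=1.0))
```

With beta = 0 the two actions are equally likely, and with beta = 50 almost all the probability goes to the action with the higher value.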
27 Simple Example: Bee Foraging. Experiment: bees select either yellow (y) or blue (b) flowers based on nectar reward. Idea: value of yellow/blue = average reward obtained so far. $Q(y) \to Q(y) + \epsilon\,(r_y - Q(y))$ and $Q(b) \to Q(b) + \epsilon\,(r_b - Q(b))$ (delta rule; running average). $P(y) = \frac{\exp(\beta Q(y))}{\exp(\beta Q(y)) + \exp(\beta Q(b))}$ and $P(b) = 1 - P(y)$. (Image: http://svi.cps.utexas.edu/bee_on_flower_original.htm)
28 Simulating Bees. [Figure: learned values $Q(y)$ and $Q(b)$ under two settings of the mean rewards $r_y$ and $r_b$.] $\beta = 1$: exploration possible. $\beta = 50$: mostly exploitation.
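A bee-foraging simulation along these lines might look like the sketch below. The reward schedule is an assumption (yellow pays 2 and blue pays 1 for the first half of the visits, then the payoffs swap), as are the deterministic rewards, the learning rate, the number of visits, and the random seed.

```python
import math
import random

def p_yellow(Q, beta):
    """P(y) = exp(beta Q(y)) / (exp(beta Q(y)) + exp(beta Q(b))), in logistic form."""
    return 1.0 / (1.0 + math.exp(-beta * (Q["y"] - Q["b"])))

def simulate(beta, eps=0.1, visits=200, seed=0):
    """Simulate flower choices; return the final values and the list of choices."""
    rng = random.Random(seed)
    Q = {"y": 0.0, "b": 0.0}
    choices = []
    for n in range(visits):
        # Assumed schedule: the better flower changes halfway through.
        reward = {"y": 2.0, "b": 1.0} if n < visits // 2 else {"y": 1.0, "b": 2.0}
        flower = "y" if rng.random() < p_yellow(Q, beta) else "b"
        Q[flower] += eps * (reward[flower] - Q[flower])  # running-average (delta rule) update
        choices.append(flower)
    return Q, choices

if __name__ == "__main__":
    for beta in (1.0, 50.0):
        Q, choices = simulate(beta)
        late_blue = choices[150:].count("b") / 50.0
        print("beta =", beta, "final Q:", Q, "fraction blue in last 50 visits:", late_blue)
```

Comparing the two runs shows the trade-off named on the slide: a low beta keeps sampling both flowers and so can track the change, while a high beta commits to whichever flower currently looks better.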
29 Forget bees, how do I get to the food in the maze?
30 Selecting Actions when Reward is Delayed. States: A, B, or C. Possible actions at any state: Left (L) or Right (R). If you randomly choose to go L or R (random "policy"), what is the value v of each state?
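31 Policy Evaluation. For the random policy: $v(B)$, $v(C)$, and $v(A)$ (the last expressed in terms of $v(B)$ and $v(C)$). Can learn this using TD learning: let $v(u) = w(u)$, and for (location, action) → new location, $(u, a) \to u'$: $w(u) \to w(u) + \epsilon\,[r_a(u) + v(u') - v(u)]$.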
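Policy evaluation with the TD rule can be sketched on a small three-state maze. The transcript does not preserve the maze rewards, so the layout below is an assumption: A leads to B (left) or C (right) with no reward, B pays 0 (left) or 5 (right), and C pays 2 (left) or 0 (right), after which the trial ends. The learning rate, trial count, and seed are also illustrative.

```python
import random

# (state, action) -> (next state, reward); None marks the end of a trial.
# These transitions and rewards are assumed for illustration.
MAZE = {
    ("A", "L"): ("B", 0.0), ("A", "R"): ("C", 0.0),
    ("B", "L"): (None, 0.0), ("B", "R"): (None, 5.0),
    ("C", "L"): (None, 2.0), ("C", "R"): (None, 0.0),
}

def evaluate_random_policy(trials=10000, eps=0.01, seed=0):
    """Learn v(u) = w(u) for the random policy with the TD rule."""
    rng = random.Random(seed)
    w = {"A": 0.0, "B": 0.0, "C": 0.0}
    for _ in range(trials):
        u = "A"
        while u is not None:
            a = rng.choice("LR")                  # random policy: L or R with equal probability
            u_next, r = MAZE[(u, a)]
            v_next = w[u_next] if u_next is not None else 0.0
            w[u] += eps * (r + v_next - w[u])     # w(u) <- w(u) + eps [r_a(u) + v(u') - v(u)]
            u = u_next
    return w

if __name__ == "__main__":
    print(evaluate_random_policy())
```

Under these assumed rewards the exact random-policy values are v(B) = 2.5, v(C) = 1, and v(A) = 1.75; the learned values fluctuate around them because the learning rate is held constant.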
32 Maze Value Learning for Random Policy. Once I know the values, I can pick the action that leads to the higher valued state!
33 Selecting Actions based on Values. [Figure: maze annotated with state values 2.5 and 1.] Values act as surrogate immediate rewards, so a locally optimal choice leads to a globally optimal policy (for "Markov" environments). Related to Dynamic Programming in CS (see appendix in text).
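Picking the action that leads to the higher valued state can be sketched as a one-step lookahead. The maze transitions and rewards below are assumed for illustration (they are not preserved in the transcript), and the value of A is computed here as the average of the two values 2.5 and 1 shown on the slide.

```python
# (state, action) -> (next state, reward); None marks the end of a trial.
# These transitions and rewards are assumed for illustration.
MAZE = {
    ("A", "L"): ("B", 0.0), ("A", "R"): ("C", 0.0),
    ("B", "L"): (None, 0.0), ("B", "R"): (None, 5.0),
    ("C", "L"): (None, 2.0), ("C", "R"): (None, 0.0),
}

# Random-policy values: 2.5 and 1 appear on the slide; 1.75 is their average.
VALUES = {"A": 1.75, "B": 2.5, "C": 1.0}

def greedy_action(u, values):
    """Choose the action maximizing immediate reward plus the value of the next state."""
    def score(a):
        u_next, r = MAZE[(u, a)]
        return r + (values[u_next] if u_next is not None else 0.0)
    return max(("L", "R"), key=score)

if __name__ == "__main__":
    for u in "ABC":
        print(u, greedy_action(u, VALUES))
```

At A both actions give zero immediate reward, so the choice is driven entirely by the learned values, which is the sense in which values act as surrogate immediate rewards.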
34 Actor-Critic Learning. Two separate components: Actor (maintains policy) and Critic (maintains value of each state). 1. Critic learning ("policy evaluation"): value of state $u$ is $v(u) = w(u)$; $w(u) \to w(u) + \epsilon\,[r_a(u) + v(u') - v(u)]$ (same as TD rule). 2. Actor learning ("policy improvement"): $P(a; u) = \frac{\exp(\beta Q_a(u))}{\sum_b \exp(\beta Q_b(u))}$ (use this to select an action $a$ in $u$); for all $a'$: $Q_{a'}(u) \to Q_{a'}(u) + \epsilon\,[r_a(u) + v(u') - v(u)]\,(\delta_{aa'} - P(a'; u))$. 3. Interleave 1 and 2.
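The three steps above can be sketched on the same kind of three-state maze. As before, the maze rewards are an assumption (B pays 0 or 5, C pays 2 or 0), and so are the shared learning rate for actor and critic, beta = 1, the trial count, and the seed.

```python
import math
import random

# (state, action) -> (next state, reward); None marks the end of a trial.
# These transitions and rewards are assumed for illustration.
MAZE = {
    ("A", "L"): ("B", 0.0), ("A", "R"): ("C", 0.0),
    ("B", "L"): (None, 0.0), ("B", "R"): (None, 5.0),
    ("C", "L"): (None, 2.0), ("C", "R"): (None, 0.0),
}
ACTIONS = ("L", "R")

def policy(Q, u, beta):
    """P(a; u) = exp(beta Q_a(u)) / sum_b exp(beta Q_b(u))."""
    exps = [math.exp(beta * Q[(u, a)]) for a in ACTIONS]
    z = sum(exps)
    return {a: e / z for a, e in zip(ACTIONS, exps)}

def actor_critic(trials=5000, eps=0.1, beta=1.0, seed=0):
    rng = random.Random(seed)
    w = {u: 0.0 for u in "ABC"}                         # critic: state values
    Q = {(u, a): 0.0 for u in "ABC" for a in ACTIONS}   # actor: action preferences
    for _ in range(trials):
        u = "A"
        while u is not None:
            P = policy(Q, u, beta)
            a = "L" if rng.random() < P["L"] else "R"
            u_next, r = MAZE[(u, a)]
            v_next = w[u_next] if u_next is not None else 0.0
            delta = r + v_next - w[u]
            w[u] += eps * delta                          # 1. critic update (TD rule)
            for b in ACTIONS:                            # 2. actor update for every action
                kronecker = 1.0 if b == a else 0.0
                Q[(u, b)] += eps * delta * (kronecker - P[b])
            u = u_next                                   # 3. interleave: both updates each step
    return w, Q

if __name__ == "__main__":
    w, Q = actor_critic()
    for u in "ABC":
        print(u, "P(Left) =", policy(Q, u, 1.0)["L"], "value =", w[u])
```

With these assumed rewards the agent should come to prefer Left at A, Right at B, and Left at C. State C is visited less and less as the preference for Left at A grows, so its policy is learned more slowly.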
35 Actor-Critic Learning in the Maze Task. [Figure: probability of going Left at a location.]
36 Demo of Reinforcement Learning in a Robot (from http://sysplan.nams.kyushu-u.ac.jp/gen/papers/javademoml97/robodemo.html)
37 Things to do: Finish homework 3. Work on group project. Next week: Prof. Emo Todorov on motor control.