Presentation Overview

Size: px
Start display at page:

Download "Presentation Overview"

Transcription

1 Acion Refinemen in Reinforcemen Learning by Probabiliy Smoohing By Thomas G. Dieerich & Didac Busques Speaer: Kai Xu Presenaion Overview Bacground The Probabiliy Smoohing Mehod Experimenal Sudy of Acion Refinemen Conclusion Bacground -- Model Based Reinforcemen Learning (MBRL) Experience gained during exploring is employed o learn he models of he sae-acion ransiion funcion and he reward funcion From he learned model, he opimal policy can be compued by many good algorihms MBRL is appropriae when he sae and acion space and relaively small and finie, and each exploring acion is expensive. Bacground -- Mehods To Reduce he Need For Training Daa By incorporae some ind of prior nowledge Previous Sudy: Absracion nowledge across he saes So ha he RL can generalize across saes Absracion nowledge across he acions (in his paper) The RL assumes similar acions will have similar ransiion effecs and rewards Bacground -- Acion Refinemen Recall how human learns Bad Kongfu Masers each he sudens all he rics a he beginning. The sudens have o spend a long ime o grasp all of hem Acion Refinemen Good Kongfu Masers each he sudens only he basic acions a he beginning. Afer he sudens grasp he basic sill he each hem he subleies among differen similar acions. The sudens grasp all he rics in a much shorer ime.

2 Acion Refinemen An RL algorihm iniially reas a se of similar acions as a single absracion Laer, refines ha absracion acion ino individual acions. The Probabiliy Smoohing Mehod Bacground The Probabiliy Smoohing Mehod Experimenal Sudy of Acion Refinemen Conclusion The Probabiliy Smoohing Model The Probabiliy Smoohing Model Conex: The agen is ineracing wih an unnown bu observable Marovian environmen. The environmen conains a finie sae se S, and a finie acion se A. The programmer groups se A ino L disjoin acion ses A, A,..., A L. Acions in he same subses are similar. Le N ( denoe he # of imes acion a has been execued in sae s. Le N ( a, s') denoe he # of imes his resuls in a ransiion o sae s. Le W ( a, s') denoe he oal rewards received when a caused a ransiion from s o s. Define he probabiliy smoohing model M such ha P ( s' = R ( a, s') = Al Al λ N (, s' ) Al Al λ N ( ) λ W (, s' ) λ N ( ) Deermine Smoohing Parameer Suppose he rue ransiion probabiliy from s o s afer execuing acion a is P( s', and he esimae o his probabiliy is P ( s' We wan o find a proper λ such ha P ( s' would be a consisen esimaor for he rue probabiliy. To deermine which esimaor is more appropriae, we need o define he error measure as he following J ( = [ P( s' P ( s' ] s' So he problem is o find a λ which minimizes J( λ Derivaion of Opimal Smoohing Parameers in he simples case Le s suppose here are only wo similar acion and a The curren sae is s There are only wo possible resuling sae s ' and s' ' Acion a has been applied on sae s for N imes. For H imes i ransi o sae s Acion a has been applied on sae s for N imes. For H imes i ransi o sae s a

3 Derivaion of Opimal Smoohing Parameers in he simples case Suppose he rue ransiion probabiliy from s o s afer exe a is p. Alhough H/N is an esimaor for p, i requires large number of rials. So we should use he smoohing model: H + λh pˆ = N + λn Derivaion of Opimal Smoohing Parameers in he simples case Afer calculaion, we find he mos appropriae smoohing parameer V λ = where N ε + V ε = p p Properies of using his λ : V = p ( p ), lim pˆ N V = p ( p ) = p ˆ and lim lim p N N = p Derivaion of Opimal Smoohing Parameers in he simples case Deermine he Level of Smoohing in Pracice Therefore, he probabiliy smoohing will converge o he opimal policy. This model can be expand o cases such as here are more han similar acions here are more han possible resuling saes We can use he resuling λ o build good esimaor for he reward. Big problem: In mos pracical case we will never now he rue value for p, p, or ε A naive approach for choosing λ would be esimae p by H/N, esimae p by H/N Bu when he rial number is small, he variance o hese esimaes are very high. The resul is poor. So he paper proposed o use defaul smoohing, in which we assume he defaul values of p, p, and ε, and plug in he value of N from he real daa. Deermine he Level of Smoohing in Pracice Deermine he Level of Smoohing in Pracice The auhor proposed o use defaul values p = 0., p = 0.5, ε = p p = 0.05 for he simples case. They wor well when < 0.5 for all values of p For cases ha here are more han possible resuling sae he auhor proposed o use defaul values ε V = 0.09, V = 0.75, ε = 0.005

4 Experimenal Sudy of Acion Refinemen Bacground The Probabiliy Smoohing Mehod Experimenal Sudy of Acion Refinemen Conclusion Experimenal Sudy of Acion Refinemen -- Conex A oy maze wih 8 nonerminal saes and erminal saes. 6 acions from he crossproduc of he 4 compass direcions wih 4 modifiers. Acions are grouped ino 4 ses. To measure he performance of a policy, we compue he π value funcion V and sum he value of all 8 non-erminal saes The opimal policy has oal value of Experimenal Sudy of Acion Refinemen -- Compare Wih No Smoohing Mehod Experimenal Sudy of Acion Refinemen -- Compare wih fixed smoohing & four-acion Comparison of probabiliy smoohing, and no smoohing ( λ = 0 ) The probabiliy smoohing model is much beer Comparison of fixed smoohing ( λ = ) four-acion mehod probabiliy smoohing Afer 9.3 exploraion sep four-acion mehod and probabiliy smoohing mehod bea fixed smoohing. Afer 3 sep probabiliy smoohing mehod wins. Experimenal Sudy of Acion Refinemen -- Conclusion Experimenal Sudy of Acion Refinemen -- Sensiiviy To The Size of Acion Ses Conclusion for he previous experimen The probabiliy smoohing mehod is vasly superior o no-smoohing mehod. If large raining se is available, probabiliy smoohing mehod is beer han fixsmoohing and four-acion mehod. Vary he # of acion ses from,, 4, 8, and 6 Wih similar acions be grouped ogeher o he exen possible. 6 separae acion ses gives high variance. One single acion se gives high bias. 4 ses and ses gave he bes performance during he early par of he curve.

5 Experimenal Sudy of Acion Refinemen -- Sensiiviy To The Acion Se Correcness Conclusions A inermediae sample size (4), even random groupings give beer performance han no smoohing. A large sample size he bias in he random and bad groupings leads o worse performance han eiher no smoohing or well-chosen acion ses. Probabiliy Smoohing Mehod is inroduced o acion refinemen o speed up RL applicaions by pariion acions ino ses of similar acions. I significanly eases he designing of a se of good acions in RL. Probabiliy smoohing parameer is deermined by defaul smoohing and he corresponding # of rials. Good prior acion se pariion is criical o he performance.

6 Acion Refinemen applied o he Robo Navigaion Problem Acion Refinemen in Reinforcemen Learning by Probabiliy Smoohing Thomas G. Dieerich, Didac Busques Ramon Lopez de Manara Carles Sierra Robo navigaes by finding visual landmars Robo s camera has a viewing angle of 60 degrees The space around he robo was pariioned ino six 60-degree secors Commens by : Sameer Ape Acion Refinemen applied o he Robo Navigaion Problem An acion called Move While looing for Landmars (MLL) was defined Robo moves forward while aiming is camera in one of he six secors o search for new visual landmars Can define six MLL acion one for each secor and le robo decide which secor o examine wih he camera Acion Refinemen applied o he Robo Navigaion Problem Designers do no now which of hese acions would be mos useful Include all hese acions in he MDP and le RL sysem deermine which acions are useful Problem : Large amoun of exploraion required o learn a good policy Train he robo several imes,each ime wih a differen se of acions Problem : Even more raining experiences required Acion Refinemen applied o he Robo Navigaion Problem Soluion : Acion Refinemen We now ha differen varians of he MLL acion have similar behavior Iniially rea hese similar acions as a single absrac acion Laer allow he learning algorihm o refine absrac acion ino individual acions

7 Direc vs. Model-Based Reinforcemen Learning Daa efficiency Crieria -- Commenary on Kai Xu s presenaion -- Commened by Ruinan Lu -- Reference: paper by C. Aeson, e al. Compuing efficiency Problem for comparison of he wo approaches: single pendulum swing-up Mae i swing! Model-based RL Known reward funcion: r( θ, τ ) = (( θ θ ) + τ ) d θ : angle of he pendulum θ d : desired angle for he invered verical sae τ : moor orque : ime sep Equaion of moion: θ = τ 9.8cos( θ ) Q-learning Resuls Q( x, u ) = Q( x, u ) + α[ r( x, u ) + γ Q( x α : learning rae γ : discoun facor x : sae vecor u : conrol vecor, u ) Q( x, u )] e( x, u ) + + Opimal acion: arg min Q( x, u) u

8 Conclusions Simple Dynamics favor MRL Exploraory acion is expensive Exploraion is performed on a physical sysem Cases favor Direc RL More raining experiences Learner ineracs wih an inexpensive simulaor

RL Lecture 7: Eligibility Traces. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1

RL Lecture 7: Eligibility Traces. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1 RL Lecure 7: Eligibiliy Traces R. S. Suon and A. G. Baro: Reinforcemen Learning: An Inroducion 1 N-sep TD Predicion Idea: Look farher ino he fuure when you do TD backup (1, 2, 3,, n seps) R. S. Suon and

More information

Speaker Adaptation Techniques For Continuous Speech Using Medium and Small Adaptation Data Sets. Constantinos Boulis

Speaker Adaptation Techniques For Continuous Speech Using Medium and Small Adaptation Data Sets. Constantinos Boulis Speaker Adapaion Techniques For Coninuous Speech Using Medium and Small Adapaion Daa Ses Consaninos Boulis Ouline of he Presenaion Inroducion o he speaker adapaion problem Maximum Likelihood Sochasic Transformaions

More information

3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon

3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon 3..3 INRODUCION O DYNAMIC OPIMIZAION: DISCREE IME PROBLEMS A. he Hamilonian and Firs-Order Condiions in a Finie ime Horizon Define a new funcion, he Hamilonian funcion, H. H he change in he oal value of

More information

Zürich. ETH Master Course: L Autonomous Mobile Robots Localization II

Zürich. ETH Master Course: L Autonomous Mobile Robots Localization II Roland Siegwar Margaria Chli Paul Furgale Marco Huer Marin Rufli Davide Scaramuzza ETH Maser Course: 151-0854-00L Auonomous Mobile Robos Localizaion II ACT and SEE For all do, (predicion updae / ACT),

More information

CSE/NB 528 Lecture 14: Reinforcement Learning (Chapter 9)

CSE/NB 528 Lecture 14: Reinforcement Learning (Chapter 9) CSE/NB 528 Lecure 14: Reinforcemen Learning Chaper 9 Image from hp://clasdean.la.asu.edu/news/images/ubep2001/neuron3.jpg Lecure figures are from Dayan & Abbo s book hp://people.brandeis.edu/~abbo/book/index.hml

More information

Lecture 20: Riccati Equations and Least Squares Feedback Control

Lecture 20: Riccati Equations and Least Squares Feedback Control 34-5 LINEAR SYSTEMS Lecure : Riccai Equaions and Leas Squares Feedback Conrol 5.6.4 Sae Feedback via Riccai Equaions A recursive approach in generaing he marix-valued funcion W ( ) equaion for i for he

More information

Chapter 21. Reinforcement Learning. The Reinforcement Learning Agent

Chapter 21. Reinforcement Learning. The Reinforcement Learning Agent CSE 47 Chaper Reinforcemen Learning The Reinforcemen Learning Agen Agen Sae u Reward r Acion a Enironmen CSE AI Faculy Why reinforcemen learning Programming an agen o drie a car or fly a helicoper is ery

More information

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle Chaper 2 Newonian Mechanics Single Paricle In his Chaper we will review wha Newon s laws of mechanics ell us abou he moion of a single paricle. Newon s laws are only valid in suiable reference frames,

More information

Sequential Importance Resampling (SIR) Particle Filter

Sequential Importance Resampling (SIR) Particle Filter Paricle Filers++ Pieer Abbeel UC Berkeley EECS Many slides adaped from Thrun, Burgard and Fox, Probabilisic Roboics 1. Algorihm paricle_filer( S -1, u, z ): 2. Sequenial Imporance Resampling (SIR) Paricle

More information

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter Sae-Space Models Iniializaion, Esimaion and Smoohing of he Kalman Filer Iniializaion of he Kalman Filer The Kalman filer shows how o updae pas predicors and he corresponding predicion error variances when

More information

Learning to Take Concurrent Actions

Learning to Take Concurrent Actions Learning o Take Concurren Acions Khashayar Rohanimanesh Deparmen of Compuer Science Universiy of Massachuses Amhers, MA 0003 khash@cs.umass.edu Sridhar Mahadevan Deparmen of Compuer Science Universiy of

More information

CSE/NB 528 Lecture 14: From Supervised to Reinforcement Learning (Chapter 9) R. Rao, 528: Lecture 14

CSE/NB 528 Lecture 14: From Supervised to Reinforcement Learning (Chapter 9) R. Rao, 528: Lecture 14 CSE/NB 58 Lecure 14: From Supervised o Reinforcemen Learning Chaper 9 1 Recall from las ime: Sigmoid Neworks Oupu v T g w u g wiui w Inpu nodes u = u 1 u u 3 T i Sigmoid oupu funcion: 1 g a 1 a e 1 ga

More information

Ensamble methods: Bagging and Boosting

Ensamble methods: Bagging and Boosting Lecure 21 Ensamble mehods: Bagging and Boosing Milos Hauskrech milos@cs.pi.edu 5329 Senno Square Ensemble mehods Mixure of expers Muliple base models (classifiers, regressors), each covers a differen par

More information

Matlab and Python programming: how to get started

Matlab and Python programming: how to get started Malab and Pyhon programming: how o ge sared Equipping readers he skills o wrie programs o explore complex sysems and discover ineresing paerns from big daa is one of he main goals of his book. In his chaper,

More information

20. Applications of the Genetic-Drift Model

20. Applications of the Genetic-Drift Model 0. Applicaions of he Geneic-Drif Model 1) Deermining he probabiliy of forming any paricular combinaion of genoypes in he nex generaion: Example: If he parenal allele frequencies are p 0 = 0.35 and q 0

More information

L07. KALMAN FILTERING FOR NON-LINEAR SYSTEMS. NA568 Mobile Robotics: Methods & Algorithms

L07. KALMAN FILTERING FOR NON-LINEAR SYSTEMS. NA568 Mobile Robotics: Methods & Algorithms L07. KALMAN FILTERING FOR NON-LINEAR SYSTEMS NA568 Mobile Roboics: Mehods & Algorihms Today s Topic Quick review on (Linear) Kalman Filer Kalman Filering for Non-Linear Sysems Exended Kalman Filer (EKF)

More information

Notes on Kalman Filtering

Notes on Kalman Filtering Noes on Kalman Filering Brian Borchers and Rick Aser November 7, Inroducion Daa Assimilaion is he problem of merging model predicions wih acual measuremens of a sysem o produce an opimal esimae of he curren

More information

0.1 MAXIMUM LIKELIHOOD ESTIMATION EXPLAINED

0.1 MAXIMUM LIKELIHOOD ESTIMATION EXPLAINED 0.1 MAXIMUM LIKELIHOOD ESTIMATIO EXPLAIED Maximum likelihood esimaion is a bes-fi saisical mehod for he esimaion of he values of he parameers of a sysem, based on a se of observaions of a random variable

More information

Ensamble methods: Boosting

Ensamble methods: Boosting Lecure 21 Ensamble mehods: Boosing Milos Hauskrech milos@cs.pi.edu 5329 Senno Square Schedule Final exam: April 18: 1:00-2:15pm, in-class Term projecs April 23 & April 25: a 1:00-2:30pm in CS seminar room

More information

Mechanical Fatigue and Load-Induced Aging of Loudspeaker Suspension. Wolfgang Klippel,

Mechanical Fatigue and Load-Induced Aging of Loudspeaker Suspension. Wolfgang Klippel, Mechanical Faigue and Load-Induced Aging of Loudspeaker Suspension Wolfgang Klippel, Insiue of Acousics and Speech Communicaion Dresden Universiy of Technology presened a he ALMA Symposium 2012, Las Vegas

More information

Augmented Reality II - Kalman Filters - Gudrun Klinker May 25, 2004

Augmented Reality II - Kalman Filters - Gudrun Klinker May 25, 2004 Augmened Realiy II Kalman Filers Gudrun Klinker May 25, 2004 Ouline Moivaion Discree Kalman Filer Modeled Process Compuing Model Parameers Algorihm Exended Kalman Filer Kalman Filer for Sensor Fusion Lieraure

More information

Financial Econometrics Kalman Filter: some applications to Finance University of Evry - Master 2

Financial Econometrics Kalman Filter: some applications to Finance University of Evry - Master 2 Financial Economerics Kalman Filer: some applicaions o Finance Universiy of Evry - Maser 2 Eric Bouyé January 27, 2009 Conens 1 Sae-space models 2 2 The Scalar Kalman Filer 2 21 Presenaion 2 22 Summary

More information

WEEK-3 Recitation PHYS 131. of the projectile s velocity remains constant throughout the motion, since the acceleration a x

WEEK-3 Recitation PHYS 131. of the projectile s velocity remains constant throughout the motion, since the acceleration a x WEEK-3 Reciaion PHYS 131 Ch. 3: FOC 1, 3, 4, 6, 14. Problems 9, 37, 41 & 71 and Ch. 4: FOC 1, 3, 5, 8. Problems 3, 5 & 16. Feb 8, 018 Ch. 3: FOC 1, 3, 4, 6, 14. 1. (a) The horizonal componen of he projecile

More information

Waveform Transmission Method, A New Waveform-relaxation Based Algorithm. to Solve Ordinary Differential Equations in Parallel

Waveform Transmission Method, A New Waveform-relaxation Based Algorithm. to Solve Ordinary Differential Equations in Parallel Waveform Transmission Mehod, A New Waveform-relaxaion Based Algorihm o Solve Ordinary Differenial Equaions in Parallel Fei Wei Huazhong Yang Deparmen of Elecronic Engineering, Tsinghua Universiy, Beijing,

More information

Robust estimation based on the first- and third-moment restrictions of the power transformation model

Robust estimation based on the first- and third-moment restrictions of the power transformation model h Inernaional Congress on Modelling and Simulaion, Adelaide, Ausralia, 6 December 3 www.mssanz.org.au/modsim3 Robus esimaion based on he firs- and hird-momen resricions of he power ransformaion Nawaa,

More information

Estimation of Poses with Particle Filters

Estimation of Poses with Particle Filters Esimaion of Poses wih Paricle Filers Dr.-Ing. Bernd Ludwig Chair for Arificial Inelligence Deparmen of Compuer Science Friedrich-Alexander-Universiä Erlangen-Nürnberg 12/05/2008 Dr.-Ing. Bernd Ludwig (FAU

More information

INTRODUCTION TO MACHINE LEARNING 3RD EDITION

INTRODUCTION TO MACHINE LEARNING 3RD EDITION ETHEM ALPAYDIN The MIT Press, 2014 Lecure Slides for INTRODUCTION TO MACHINE LEARNING 3RD EDITION alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/~ehem/i2ml3e CHAPTER 2: SUPERVISED LEARNING Learning a Class

More information

Introduction D P. r = constant discount rate, g = Gordon Model (1962): constant dividend growth rate.

Introduction D P. r = constant discount rate, g = Gordon Model (1962): constant dividend growth rate. Inroducion Gordon Model (1962): D P = r g r = consan discoun rae, g = consan dividend growh rae. If raional expecaions of fuure discoun raes and dividend growh vary over ime, so should he D/P raio. Since

More information

A Dynamic Model of Economic Fluctuations

A Dynamic Model of Economic Fluctuations CHAPTER 15 A Dynamic Model of Economic Flucuaions Modified for ECON 2204 by Bob Murphy 2016 Worh Publishers, all righs reserved IN THIS CHAPTER, OU WILL LEARN: how o incorporae dynamics ino he AD-AS model

More information

Georey E. Hinton. University oftoronto. Technical Report CRG-TR February 22, Abstract

Georey E. Hinton. University oftoronto.   Technical Report CRG-TR February 22, Abstract Parameer Esimaion for Linear Dynamical Sysems Zoubin Ghahramani Georey E. Hinon Deparmen of Compuer Science Universiy oftorono 6 King's College Road Torono, Canada M5S A4 Email: zoubin@cs.orono.edu Technical

More information

Vehicle Arrival Models : Headway

Vehicle Arrival Models : Headway Chaper 12 Vehicle Arrival Models : Headway 12.1 Inroducion Modelling arrival of vehicle a secion of road is an imporan sep in raffic flow modelling. I has imporan applicaion in raffic flow simulaion where

More information

Probabilistic Robotics

Probabilistic Robotics Probabilisic Roboics Bayes Filer Implemenaions Gaussian filers Bayes Filer Reminder Predicion bel p u bel d Correcion bel η p z bel Gaussians : ~ π e p N p - Univariae / / : ~ μ μ μ e p Ν p d π Mulivariae

More information

Excel-Based Solution Method For The Optimal Policy Of The Hadley And Whittin s Exact Model With Arma Demand

Excel-Based Solution Method For The Optimal Policy Of The Hadley And Whittin s Exact Model With Arma Demand Excel-Based Soluion Mehod For The Opimal Policy Of The Hadley And Whiin s Exac Model Wih Arma Demand Kal Nami School of Business and Economics Winson Salem Sae Universiy Winson Salem, NC 27110 Phone: (336)750-2338

More information

Planning in POMDPs. Dominik Schoenberger Abstract

Planning in POMDPs. Dominik Schoenberger Abstract Planning in POMDPs Dominik Schoenberger d.schoenberger@sud.u-darmsad.de Absrac This documen briefly explains wha a Parially Observable Markov Decision Process is. Furhermore i inroduces he differen approaches

More information

Tournament selection in zeroth-level classifier systems based on. average reward reinforcement learning

Tournament selection in zeroth-level classifier systems based on. average reward reinforcement learning ournamen selecion in zeroh-level classifier sysems based on average reward reinforcemen learning Zang Zhaoxiang, Li Zhao, Wang Junying, Dan Zhiping zxzang@gmail.com; zangzx@hus.edu.cn (Hubei Key Laboraory

More information

Overview. COMP14112: Artificial Intelligence Fundamentals. Lecture 0 Very Brief Overview. Structure of this course

Overview. COMP14112: Artificial Intelligence Fundamentals. Lecture 0 Very Brief Overview. Structure of this course OMP: Arificial Inelligence Fundamenals Lecure 0 Very Brief Overview Lecurer: Email: Xiao-Jun Zeng x.zeng@mancheser.ac.uk Overview This course will focus mainly on probabilisic mehods in AI We shall presen

More information

Robot Motion Model EKF based Localization EKF SLAM Graph SLAM

Robot Motion Model EKF based Localization EKF SLAM Graph SLAM Robo Moion Model EKF based Localizaion EKF SLAM Graph SLAM General Robo Moion Model Robo sae v r Conrol a ime Sae updae model Noise model of robo conrol Noise model of conrol Robo moion model

More information

Final Spring 2007

Final Spring 2007 .615 Final Spring 7 Overview The purpose of he final exam is o calculae he MHD β limi in a high-bea oroidal okamak agains he dangerous n = 1 exernal ballooning-kink mode. Effecively, his corresponds o

More information

Simulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010

Simulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Simulaion-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Week Descripion Reading Maerial 2 Compuer Simulaion of Dynamic Models Finie Difference, coninuous saes, discree ime Simple Mehods Euler Trapezoid

More information

GMM - Generalized Method of Moments

GMM - Generalized Method of Moments GMM - Generalized Mehod of Momens Conens GMM esimaion, shor inroducion 2 GMM inuiion: Maching momens 2 3 General overview of GMM esimaion. 3 3. Weighing marix...........................................

More information

CptS 570 Machine Learning School of EECS Washington State University. CptS Machine Learning 1

CptS 570 Machine Learning School of EECS Washington State University. CptS Machine Learning 1 CpS 570 Machine Learning School of EECS Washingon Sae Universiy CpS 570 - Machine Learning 1 Form of underlying disribuions unknown Bu sill wan o perform classificaion and regression Semi-parameric esimaion

More information

Reliability of Technical Systems

Reliability of Technical Systems eliabiliy of Technical Sysems Main Topics Inroducion, Key erms, framing he problem eliabiliy parameers: Failure ae, Failure Probabiliy, Availabiliy, ec. Some imporan reliabiliy disribuions Componen reliabiliy

More information

Numerical Dispersion

Numerical Dispersion eview of Linear Numerical Sabiliy Numerical Dispersion n he previous lecure, we considered he linear numerical sabiliy of boh advecion and diffusion erms when approimaed wih several spaial and emporal

More information

Application of a Stochastic-Fuzzy Approach to Modeling Optimal Discrete Time Dynamical Systems by Using Large Scale Data Processing

Application of a Stochastic-Fuzzy Approach to Modeling Optimal Discrete Time Dynamical Systems by Using Large Scale Data Processing Applicaion of a Sochasic-Fuzzy Approach o Modeling Opimal Discree Time Dynamical Sysems by Using Large Scale Daa Processing AA WALASZE-BABISZEWSA Deparmen of Compuer Engineering Opole Universiy of Technology

More information

Hamilton- J acobi Equation: Explicit Formulas In this lecture we try to apply the method of characteristics to the Hamilton-Jacobi equation: u t

Hamilton- J acobi Equation: Explicit Formulas In this lecture we try to apply the method of characteristics to the Hamilton-Jacobi equation: u t M ah 5 2 7 Fall 2 0 0 9 L ecure 1 0 O c. 7, 2 0 0 9 Hamilon- J acobi Equaion: Explici Formulas In his lecure we ry o apply he mehod of characerisics o he Hamilon-Jacobi equaion: u + H D u, x = 0 in R n

More information

True Online Temporal-Difference Learning. A. Rupam Mahmood Patrick M. Pilarski

True Online Temporal-Difference Learning. A. Rupam Mahmood Patrick M. Pilarski True Online Temporal-Difference Learning True Online Temporal-Difference Learning Harm van Seijen harm.vanseijen@ualbera.ca A. Rupam Mahmood ashique@ualbera.ca Parick M. Pilarski parick.pilarski@ualbera.ca

More information

Linear Gaussian State Space Models

Linear Gaussian State Space Models Linear Gaussian Sae Space Models Srucural Time Series Models Level and Trend Models Basic Srucural Model (BSM Dynamic Linear Models Sae Space Model Represenaion Level, Trend, and Seasonal Models Time Varying

More information

A Reinforcement Learning Approach for Collaborative Filtering

A Reinforcement Learning Approach for Collaborative Filtering A Reinforcemen Learning Approach for Collaboraive Filering Jungkyu Lee, Byonghwa Oh 2, Jihoon Yang 2, and Sungyong Park 2 Cyram Inc, Seoul, Korea jklee@cyram.com 2 Sogang Universiy, Seoul, Korea {mrfive,yangjh,parksy}@sogang.ac.kr

More information

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power Alpaydin Chaper, Michell Chaper 7 Alpaydin slides are in urquoise. Ehem Alpaydin, copyrigh: The MIT Press, 010. alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/ ehem/imle All oher slides are based on Michell.

More information

The Optimal Stopping Time for Selling an Asset When It Is Uncertain Whether the Price Process Is Increasing or Decreasing When the Horizon Is Infinite

The Optimal Stopping Time for Selling an Asset When It Is Uncertain Whether the Price Process Is Increasing or Decreasing When the Horizon Is Infinite American Journal of Operaions Research, 08, 8, 8-9 hp://wwwscirporg/journal/ajor ISSN Online: 60-8849 ISSN Prin: 60-8830 The Opimal Sopping Time for Selling an Asse When I Is Uncerain Wheher he Price Process

More information

Article from. Predictive Analytics and Futurism. July 2016 Issue 13

Article from. Predictive Analytics and Futurism. July 2016 Issue 13 Aricle from Predicive Analyics and Fuurism July 6 Issue An Inroducion o Incremenal Learning By Qiang Wu and Dave Snell Machine learning provides useful ools for predicive analyics The ypical machine learning

More information

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power Alpaydin Chaper, Michell Chaper 7 Alpaydin slides are in urquoise. Ehem Alpaydin, copyrigh: The MIT Press, 010. alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/ ehem/imle All oher slides are based on Michell.

More information

Recursive Least-Squares Fixed-Interval Smoother Using Covariance Information based on Innovation Approach in Linear Continuous Stochastic Systems

Recursive Least-Squares Fixed-Interval Smoother Using Covariance Information based on Innovation Approach in Linear Continuous Stochastic Systems 8 Froniers in Signal Processing, Vol. 1, No. 1, July 217 hps://dx.doi.org/1.2266/fsp.217.112 Recursive Leas-Squares Fixed-Inerval Smooher Using Covariance Informaion based on Innovaion Approach in Linear

More information

The Rosenblatt s LMS algorithm for Perceptron (1958) is built around a linear neuron (a neuron with a linear

The Rosenblatt s LMS algorithm for Perceptron (1958) is built around a linear neuron (a neuron with a linear In The name of God Lecure4: Percepron and AALIE r. Majid MjidGhoshunih Inroducion The Rosenbla s LMS algorihm for Percepron 958 is buil around a linear neuron a neuron ih a linear acivaion funcion. Hoever,

More information

1 Review of Zero-Sum Games

1 Review of Zero-Sum Games COS 5: heoreical Machine Learning Lecurer: Rob Schapire Lecure #23 Scribe: Eugene Brevdo April 30, 2008 Review of Zero-Sum Games Las ime we inroduced a mahemaical model for wo player zero-sum games. Any

More information

An recursive analytical technique to estimate time dependent physical parameters in the presence of noise processes

An recursive analytical technique to estimate time dependent physical parameters in the presence of noise processes WHAT IS A KALMAN FILTER An recursive analyical echnique o esimae ime dependen physical parameers in he presence of noise processes Example of a ime and frequency applicaion: Offse beween wo clocks PREDICTORS,

More information

Notes for Lecture 17-18

Notes for Lecture 17-18 U.C. Berkeley CS278: Compuaional Complexiy Handou N7-8 Professor Luca Trevisan April 3-8, 2008 Noes for Lecure 7-8 In hese wo lecures we prove he firs half of he PCP Theorem, he Amplificaion Lemma, up

More information

KEY. Math 334 Midterm I Fall 2008 sections 001 and 003 Instructor: Scott Glasgow

KEY. Math 334 Midterm I Fall 2008 sections 001 and 003 Instructor: Scott Glasgow 1 KEY Mah 4 Miderm I Fall 8 secions 1 and Insrucor: Sco Glasgow Please do NOT wrie on his eam. No credi will be given for such work. Raher wrie in a blue book, or on our own paper, preferabl engineering

More information

Robotics I. April 11, The kinematics of a 3R spatial robot is specified by the Denavit-Hartenberg parameters in Tab. 1.

Robotics I. April 11, The kinematics of a 3R spatial robot is specified by the Denavit-Hartenberg parameters in Tab. 1. Roboics I April 11, 017 Exercise 1 he kinemaics of a 3R spaial robo is specified by he Denavi-Harenberg parameers in ab 1 i α i d i a i θ i 1 π/ L 1 0 1 0 0 L 3 0 0 L 3 3 able 1: able of DH parameers of

More information

2016 Possible Examination Questions. Robotics CSCE 574

2016 Possible Examination Questions. Robotics CSCE 574 206 Possible Examinaion Quesions Roboics CSCE 574 ) Wha are he differences beween Hydraulic drive and Shape Memory Alloy drive? Name one applicaion in which each one of hem is appropriae. 2) Wha are he

More information

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé Bias in Condiional and Uncondiional Fixed Effecs Logi Esimaion: a Correcion * Tom Coupé Economics Educaion and Research Consorium, Naional Universiy of Kyiv Mohyla Academy Address: Vul Voloska 10, 04070

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Probabilisic reasoning over ime So far, we ve mosly deal wih episodic environmens Excepions: games wih muliple moves, planning In paricular, he Bayesian neworks we ve seen so far describe

More information

Biol. 356 Lab 8. Mortality, Recruitment, and Migration Rates

Biol. 356 Lab 8. Mortality, Recruitment, and Migration Rates Biol. 356 Lab 8. Moraliy, Recruimen, and Migraion Raes (modified from Cox, 00, General Ecology Lab Manual, McGraw Hill) Las week we esimaed populaion size hrough several mehods. One assumpion of all hese

More information

STATE-SPACE MODELLING. A mass balance across the tank gives:

STATE-SPACE MODELLING. A mass balance across the tank gives: B. Lennox and N.F. Thornhill, 9, Sae Space Modelling, IChemE Process Managemen and Conrol Subjec Group Newsleer STE-SPACE MODELLING Inroducion: Over he pas decade or so here has been an ever increasing

More information

SPH3U: Projectiles. Recorder: Manager: Speaker:

SPH3U: Projectiles. Recorder: Manager: Speaker: SPH3U: Projeciles Now i s ime o use our new skills o analyze he moion of a golf ball ha was ossed hrough he air. Le s find ou wha is special abou he moion of a projecile. Recorder: Manager: Speaker: 0

More information

Chapter 2. Models, Censoring, and Likelihood for Failure-Time Data

Chapter 2. Models, Censoring, and Likelihood for Failure-Time Data Chaper 2 Models, Censoring, and Likelihood for Failure-Time Daa William Q. Meeker and Luis A. Escobar Iowa Sae Universiy and Louisiana Sae Universiy Copyrigh 1998-2008 W. Q. Meeker and L. A. Escobar. Based

More information

OBJECTIVES OF TIME SERIES ANALYSIS

OBJECTIVES OF TIME SERIES ANALYSIS OBJECTIVES OF TIME SERIES ANALYSIS Undersanding he dynamic or imedependen srucure of he observaions of a single series (univariae analysis) Forecasing of fuure observaions Asceraining he leading, lagging

More information

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB Elecronic Companion EC.1. Proofs of Technical Lemmas and Theorems LEMMA 1. Le C(RB) be he oal cos incurred by he RB policy. Then we have, T L E[C(RB)] 3 E[Z RB ]. (EC.1) Proof of Lemma 1. Using he marginal

More information

Navneet Saini, Mayank Goyal, Vishal Bansal (2013); Term Project AML310; Indian Institute of Technology Delhi

Navneet Saini, Mayank Goyal, Vishal Bansal (2013); Term Project AML310; Indian Institute of Technology Delhi Creep in Viscoelasic Subsances Numerical mehods o calculae he coefficiens of he Prony equaion using creep es daa and Herediary Inegrals Mehod Navnee Saini, Mayank Goyal, Vishal Bansal (23); Term Projec

More information

Lecture Notes 3: Quantitative Analysis in DSGE Models: New Keynesian Model

Lecture Notes 3: Quantitative Analysis in DSGE Models: New Keynesian Model Lecure Noes 3: Quaniaive Analysis in DSGE Models: New Keynesian Model Zhiwei Xu, Email: xuzhiwei@sju.edu.cn The moneary policy plays lile role in he basic moneary model wihou price sickiness. We now urn

More information

Problemas das Aulas Práticas

Problemas das Aulas Práticas Mesrado Inegrado em Engenharia Elecroécnica e de Compuadores Conrolo em Espaço de Esados Problemas das Aulas Práicas J. Miranda Lemos Fevereiro de 3 Translaed o English by José Gaspar, 6 J. M. Lemos, IST

More information

Single and Double Pendulum Models

Single and Double Pendulum Models Single and Double Pendulum Models Mah 596 Projec Summary Spring 2016 Jarod Har 1 Overview Differen ypes of pendulums are used o model many phenomena in various disciplines. In paricular, single and double

More information

A variational radial basis function approximation for diffusion processes.

A variational radial basis function approximation for diffusion processes. A variaional radial basis funcion approximaion for diffusion processes. Michail D. Vreas, Dan Cornford and Yuan Shen {vreasm, d.cornford, y.shen}@ason.ac.uk Ason Universiy, Birmingham, UK hp://www.ncrg.ason.ac.uk

More information

Theory of! Partial Differential Equations!

Theory of! Partial Differential Equations! hp://www.nd.edu/~gryggva/cfd-course/! Ouline! Theory o! Parial Dierenial Equaions! Gréar Tryggvason! Spring 011! Basic Properies o PDE!! Quasi-linear Firs Order Equaions! - Characerisics! - Linear and

More information

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017 Two Popular Bayesian Esimaors: Paricle and Kalman Filers McGill COMP 765 Sep 14 h, 2017 1 1 1, dx x Bel x u x P x z P Recall: Bayes Filers,,,,,,, 1 1 1 1 u z u x P u z u x z P Bayes z = observaion u =

More information

Introduction to Mobile Robotics

Introduction to Mobile Robotics Inroducion o Mobile Roboics Bayes Filer Kalman Filer Wolfram Burgard Cyrill Sachniss Giorgio Grisei Maren Bennewiz Chrisian Plagemann Bayes Filer Reminder Predicion bel p u bel d Correcion bel η p z bel

More information

Reinforcement Learning: A Tutorial. Scope of Tutorial. 1 Introduction

Reinforcement Learning: A Tutorial. Scope of Tutorial. 1 Introduction Reinforcemen Learning: A Tuorial Mance E. Harmon WL/AACF 224 Avionics Circle Wrigh Laboraory Wrigh-Paerson AFB, OH 45433 mharmon@acm.org Sephanie S. Harmon Wrigh Sae Universiy 56-8 Mallard Glen Drive Cenerville,

More information

System of Linear Differential Equations

System of Linear Differential Equations Sysem of Linear Differenial Equaions In "Ordinary Differenial Equaions" we've learned how o solve a differenial equaion for a variable, such as: y'k5$e K2$x =0 solve DE yx = K 5 2 ek2 x C_C1 2$y''C7$y

More information

Chapter 15: Phenomena. Chapter 15 Chemical Kinetics. Reaction Rates. Reaction Rates R P. Reaction Rates. Rate Laws

Chapter 15: Phenomena. Chapter 15 Chemical Kinetics. Reaction Rates. Reaction Rates R P. Reaction Rates. Rate Laws Chaper 5: Phenomena Phenomena: The reacion (aq) + B(aq) C(aq) was sudied a wo differen emperaures (98 K and 35 K). For each emperaure he reacion was sared by puing differen concenraions of he 3 species

More information

Supplementary Document

Supplementary Document Saisica Sinica (2013): Preprin 1 Supplemenary Documen for Funcional Linear Model wih Zero-value Coefficien Funcion a Sub-regions Jianhui Zhou, Nae-Yuh Wang, and Naisyin Wang Universiy of Virginia, Johns

More information

(a) Set up the least squares estimation procedure for this problem, which will consist in minimizing the sum of squared residuals. 2 t.

(a) Set up the least squares estimation procedure for this problem, which will consist in minimizing the sum of squared residuals. 2 t. Insrucions: The goal of he problem se is o undersand wha you are doing raher han jus geing he correc resul. Please show your work clearly and nealy. No credi will be given o lae homework, regardless of

More information

Západočeská Univerzita v Plzni, Czech Republic and Groupe ESIEE Paris, France

Západočeská Univerzita v Plzni, Czech Republic and Groupe ESIEE Paris, France ADAPTIVE SIGNAL PROCESSING USING MAXIMUM ENTROPY ON THE MEAN METHOD AND MONTE CARLO ANALYSIS Pavla Holejšovsá, Ing. *), Z. Peroua, Ing. **), J.-F. Bercher, Prof. Assis. ***) Západočesá Univerzia v Plzni,

More information

Longest Common Prefixes

Longest Common Prefixes Longes Common Prefixes The sandard ordering for srings is he lexicographical order. I is induced by an order over he alphabe. We will use he same symbols (,

More information

Problem Set 5. Graduate Macro II, Spring 2017 The University of Notre Dame Professor Sims

Problem Set 5. Graduate Macro II, Spring 2017 The University of Notre Dame Professor Sims Problem Se 5 Graduae Macro II, Spring 2017 The Universiy of Nore Dame Professor Sims Insrucions: You may consul wih oher members of he class, bu please make sure o urn in your own work. Where applicable,

More information

A unit root test based on smooth transitions and nonlinear adjustment

A unit root test based on smooth transitions and nonlinear adjustment MPRA Munich Personal RePEc Archive A uni roo es based on smooh ransiions and nonlinear adjusmen Aycan Hepsag Isanbul Universiy 5 Ocober 2017 Online a hps://mpra.ub.uni-muenchen.de/81788/ MPRA Paper No.

More information

Retrieval Models. Boolean and Vector Space Retrieval Models. Common Preprocessing Steps. Boolean Model. Boolean Retrieval Model

Retrieval Models. Boolean and Vector Space Retrieval Models. Common Preprocessing Steps. Boolean Model. Boolean Retrieval Model 1 Boolean and Vecor Space Rerieval Models Many slides in his secion are adaped from Prof. Joydeep Ghosh (UT ECE) who in urn adaped hem from Prof. Dik Lee (Univ. of Science and Tech, Hong Kong) Rerieval

More information

Hamilton- J acobi Equation: Weak S olution We continue the study of the Hamilton-Jacobi equation:

Hamilton- J acobi Equation: Weak S olution We continue the study of the Hamilton-Jacobi equation: M ah 5 7 Fall 9 L ecure O c. 4, 9 ) Hamilon- J acobi Equaion: Weak S oluion We coninue he sudy of he Hamilon-Jacobi equaion: We have shown ha u + H D u) = R n, ) ; u = g R n { = }. ). In general we canno

More information

Energy Storage Benchmark Problems

Energy Storage Benchmark Problems Energy Sorage Benchmark Problems Daniel F. Salas 1,3, Warren B. Powell 2,3 1 Deparmen of Chemical & Biological Engineering 2 Deparmen of Operaions Research & Financial Engineering 3 Princeon Laboraory

More information

Understanding the asymptotic behaviour of empirical Bayes methods

Understanding the asymptotic behaviour of empirical Bayes methods Undersanding he asympoic behaviour of empirical Bayes mehods Boond Szabo, Aad van der Vaar and Harry van Zanen EURANDOM, 11.10.2011. Conens 2/20 Moivaion Nonparameric Bayesian saisics Signal in Whie noise

More information

Lecture 1 Overview. course mechanics. outline & topics. what is a linear dynamical system? why study linear systems? some examples

Lecture 1 Overview. course mechanics. outline & topics. what is a linear dynamical system? why study linear systems? some examples EE263 Auumn 27-8 Sephen Boyd Lecure 1 Overview course mechanics ouline & opics wha is a linear dynamical sysem? why sudy linear sysems? some examples 1 1 Course mechanics all class info, lecures, homeworks,

More information

Some Basic Information about M-S-D Systems

Some Basic Information about M-S-D Systems Some Basic Informaion abou M-S-D Sysems 1 Inroducion We wan o give some summary of he facs concerning unforced (homogeneous) and forced (non-homogeneous) models for linear oscillaors governed by second-order,

More information

Efficient Reinforcement Learning with Multiple Reward Functions for Randomized Controlled Trial Analysis

Efficient Reinforcement Learning with Multiple Reward Functions for Randomized Controlled Trial Analysis Efficien Reinforcemen Learning wih Muliple Reward Funcions for Randomized Conrolled rial Analysis Daniel J. Lizoe Michael Bowling Susan A. Murphy Universiy of Michigan, Ann Arbor, MI 48109 USA danjl@umich.edu

More information

PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD

PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD HAN XIAO 1. Penalized Leas Squares Lasso solves he following opimizaion problem, ˆβ lasso = arg max β R p+1 1 N y i β 0 N x ij β j β j (1.1) for some 0.

More information

Mathematical Theory and Modeling ISSN (Paper) ISSN (Online) Vol 3, No.3, 2013

Mathematical Theory and Modeling ISSN (Paper) ISSN (Online) Vol 3, No.3, 2013 Mahemaical Theory and Modeling ISSN -580 (Paper) ISSN 5-05 (Online) Vol, No., 0 www.iise.org The ffec of Inverse Transformaion on he Uni Mean and Consan Variance Assumpions of a Muliplicaive rror Model

More information

Book Corrections for Optimal Estimation of Dynamic Systems, 2 nd Edition

Book Corrections for Optimal Estimation of Dynamic Systems, 2 nd Edition Boo Correcions for Opimal Esimaion of Dynamic Sysems, nd Ediion John L. Crassidis and John L. Junins November 17, 017 Chaper 1 This documen provides correcions for he boo: Crassidis, J.L., and Junins,

More information

IB Physics Kinematics Worksheet

IB Physics Kinematics Worksheet IB Physics Kinemaics Workshee Wrie full soluions and noes for muliple choice answers. Do no use a calculaor for muliple choice answers. 1. Which of he following is a correc definiion of average acceleraion?

More information

Estimation of Investment in Residential and Nonresidential Structures v2.0

Estimation of Investment in Residential and Nonresidential Structures v2.0 Esimaion of Invesmen in Residenial and Nonresidenial Srucures v2.0 Ocober 2015 In he REMI model, he invesmen expendiures depends on he gap beween he opimal capial socks and he acual capial socks. The general

More information

Math 10B: Mock Mid II. April 13, 2016

Math 10B: Mock Mid II. April 13, 2016 Name: Soluions Mah 10B: Mock Mid II April 13, 016 1. ( poins) Sae, wih jusificaion, wheher he following saemens are rue or false. (a) If a 3 3 marix A saisfies A 3 A = 0, hen i canno be inverible. True.

More information

CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK

CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK 175 CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK 10.1 INTRODUCTION Amongs he research work performed, he bes resuls of experimenal work are validaed wih Arificial Neural Nework. From he

More information

Errata (1 st Edition)

Errata (1 st Edition) P Sandborn, os Analysis of Elecronic Sysems, s Ediion, orld Scienific, Singapore, 03 Erraa ( s Ediion) S K 05D Page 8 Equaion (7) should be, E 05D E Nu e S K he L appearing in he equaion in he book does

More information