Batch RL Via Least Squares Policy Iteration

Size: px
Start display at page:

Download "Batch RL Via Least Squares Policy Iteration"

Transcription

1 Batch RL Va Leat Square Polcy Iteraton Alan Fern * Baed n part on lde by Ronald Parr

2 Overvew Motvaton LSPI Dervaton from LSTD Expermental reult

3 Onlne veru Batch RL Onlne RL: ntegrate data collecton and optmzaton Select acton n envronment and at the ame tme update parameter baed on each oberved,a,,r) Batch RL: decouple data collecton and optmzaton Frt generate/collect experence n the envronment gvng a data et of tate-acton-reward-tate par {,a,r, )} Ue the fxed et of experence to optmze/learn a polcy Onlne v. Batch: Batch algorthm are often more data effcent and table Batch algorthm gnore the exploraton-explotaton problem, and do ther bet wth the data they have

4 Batch RL Motvaton There are many applcaton that naturally ft the batch RL model Medcal Treatment Optmzaton: Input: collecton of treatment epode for an alment gvng equence of obervaton and acton ncludng outcome Ouput: a treatment polcy, deally better than current practce Emergency Repone Optmzaton: Input: collecton of emergency repone epode gvng movement of emergency reource before, durng, and after 911 call Output: emergency repone polcy

5 LSPI LSPI a model-free batch RL algorthm Learn a lnear approxmaton of Q-functon table and effcent Never dverge or gve meanngle anwer LSPI can be appled to a dataet regardle of how t wa collected But garbage n, garbage out. Leat-Square Polcy Iteraton, Mchal Lagoudak and Ronald Parr, Journal of Machne Learnng Reearch JMLR), Vol. 4, 2003, pp

6 Notaton S: tate pace, : ndvdual tate R,a): reward for takng acton a n tate g: dcount factor V): tate value functon P,a) = T,a, ): tranton functon Q,a) : tate-acton value functon Polcy: ) a Objectve: Maxmze expected, total, dcounted reward E t 0 g t rt

7 Projecton Approach to Approxmaton Recall the tandard Bellman equaton: V * ) max R, a) g or equvalently V * T[ V * ] Bellman operator a where T[.] the Recall from value teraton, the ub-optmalty of a value functon can be bounded n term of the Bellman error: V T[V ] Th motvate tryng to fnd an approxmate value functon wth mall Bellman error ' P ', a) V * ')

8 Projecton Approach to Approxmaton Suppoe that we have a pace of repreentable value functon E.g. the pace of lnear functon over gven feature Let P be a projecton operator for that pace Project any value functon n or outde of the pace) to cloet value functon n the pace Fxed Pont Bellman Equaton wth approxmaton ˆ * V T[ Vˆ * ] Dependng on pace th wll have a mall Bellman error LSPI wll attempt to arrve at uch a value functon Aume lnear approxmaton and leat-quare projecton

9

10 Projected Value Iteraton Naïve Idea: try computng projected fxed pont ung VI Exact VI: terate Bellman backup) V T[ V 1 ] Projected VI: terated projected Bellman backup): ˆ V 1 T[ Vˆ ] Project exact Bellman backup to cloet functon n our retrcted functon pace exact Bellman backup produced value functon)

11 Example: Projected Bellman Backup Retrct pace to lnear functon over a ngle feature f: Vˆ ) wf ) Suppoe jut two tate 1 and 2 wth: Suppoe exact backup of V gve: T Vˆ ˆ ] 1) 2, T[ V ] ) 2 [ 2 f 1 )=1, f 2 )=2 Can we repreent th exact backup n our lnear pace? No f 1 )=1 f 2 )=2

12 Example: Projected Bellman Backup Retrct pace to lnear functon over a ngle feature f: Vˆ ) wf ) Suppoe jut two tate 1 and 2 wth: Suppoe exact backup of V gve: T Vˆ ˆ ] 1) 2, T[ V ] ) 2 [ 2 f 1 )=1, f 2 )=2 The backup can t be repreented va our lnear functon: ˆ V 1 T[ Vˆ ] f 1 )=1 f 2 )=2 ˆ 1 ) 1.333f ) V projected backup jut leat-quare ft to exact backup

13 Problem: Stablty Exact value teraton tablty enured by contracton property of Bellman backup: V T[ V 1 ] I the projected Bellman backup a contracton: ˆ V 1 T[ Vˆ ]?

14 Example: Stablty Problem [Berteka & Ttkl 1996] Problem: Mot projecton lead to backup that are not contracton and untable 1 2 Reward all zero, ngle acton, g = 0.9: V* = 0 Conder lnear approx. w/ ngle feature f wth weght w. Vˆ ) wf ) Optmal w = 0 nce V*=0

15 Example: Stablty Problem f 1 )=1 1 2 f 2 )=2 V 1 ) = w V 2 ) = 2w weght value at teraton From V perform projected backup for each tate ˆ T[ V ] 1) gv 2) 1. 8 ˆ T[ V ] 2) gv 2) 1. 8 ˆ ˆ Can t be repreented n our pace o fnd w +1 that gve leat-quare approx. to exact backup After ome math we can get: w +1 = 1.2 w w w What doe th mean?

16 Example: Stablty Problem Vx) Vˆ 3 Vˆ 2 1 Vˆ 0 Vˆ Iteraton # 1 2 S Each teraton of Bellman backup make approxmaton wore! Even for th trval problem projected VI dverge.

17 Undertandng the Problem What went wrong? Exact Bellman backup reduce error n max-norm Leat quare = projecton) non-expanve n L 2 norm May ncreae max-norm dtance Concluon: Alternatng value teraton and functon approxmaton rky bune

18 Overvew Motvaton LSPI Dervaton from Leat-Square Temporal Dfference Learnng Expermental reult

19 How doe LSPI fx thee? LSPI perform approxmate polcy teraton PI nvolve polcy evaluaton and polcy mprovement Ue a varant of leat-quare temporal dfference learnng LSTD) for approx. polcy evaluaton [Bratdke & Barto 96] Stablty: LSTD drectly olve for the fxed pont of the approxmate Bellman equaton for polcy value Wth ngular-value decompoton SVD), th alway well defned Data effcency LSTD fnd bet approxmaton for any fnte data et Make a ngle pa over the data for each polcy Can be mplemented ncrementally

20 OK, What LSTD? Leat Square Temporal Dfference Learnng Aume lnear value functon approxmaton of K feature The f k are arbtrary feature functon of tate Some vector notaton k k w k V ) ) ˆ f ) ˆ ) ˆ ˆ 1 n V V V w k w w 1 ) 1 ) n k k k f f f f K f 1

21 Dervng LSTD Vˆ w agn a value to every tate = K ba functon f 1 1) f 2 1)... f 1 2) f 2 2).. # tate Vˆ a lnear functon n the column pace of f 1 f k, that, Vˆ w 1 f 1 w K f K.

22 Suppoe we know value of polcy Want: w V Leat quare weght mnmze quared error w T 1 ) T V Sometme called peudonvere Leat quare projecton then Vˆ w T 1 ) V Textbook leat quare projecton operator T

23 But we don t know V Recall fxed-pont equaton for polce Wll olve a projected fxed-pont equaton: Subttutng leat quare projecton nto th gve: Solvng for w: g V P R V ˆ ˆ w P R w T T g 1 ) R P w T T T 1 ) g )), )), )), )),, )), )), n n n n n n n P P P P P R R R ' ') )), ' )), ) V P R V g

24 Almot there w T g T P) 1 T R Matrx to nvert only K x K But Expenve to contruct matrx e.g. P S x S ) We don t know P We don t know R

25 Ung Sample for Suppoe we have tate tranton ample of the polcy runnng n the MDP: {,a,r, )} Idea: Replace enumeraton of tate wth ampled tate K ba functon f 1 1) f 2 1)... ˆ f 1 2) f 2 2).. ample tate.

26 Ung Sample for R Suppoe we have tate tranton ample of the polcy runnng n the MDP: {,a,r, )} Idea: Replace enumeraton of reward wth ampled reward r 1 R = r 2... ample

27 Ung Sample for P Idea: Replace expectaton over next tate wth ampled next tate. K ba functon f 1 1 ) f 2 1 )... P f 1 2 ) f 2 2 )... from,a,r, )

28 Puttng t Together LSTD need to compute: w g P) The hard part of whch B the kxk matrx: Both B and b can be computed ncrementally for each,a,r, ) ample: ntalze to zero) R B T T 1 T 1 T T B g P) T b R B j b B j b from prevou lde b f ) f ) gf ) f ') r f ) j j

29 LSTD Algorthm Collect data by executng trajectore of current polcy For each,a,r, ) ample: B b j b B j f ) f ) gf ) f ') r f, a) j j w B 1 b

30 LSTD Summary Doe Ok 2 ) work per datum Lnear n amount of data. Approache model-baed anwer n lmt Fndng fxed pont requre nvertng matrx Fxed pont almot alway ext Stable; effcent

31 Approxmate Polcy Iteraton wth LSTD Polcy Iteraton: terate between polcy mprovement and polcy evaluaton Idea: ue LSTD for approxmate polcy evaluaton n PI Start wth random weght w.e. value functon) Repeat Untl Convergence ) = greedy ) // polcy mprovement Evaluate Vˆ, w ung LSTD Generate ample trajectore of Ue LSTD to produce new weght w w gve an approx. value functon of ) )

32 What Break? No way to execute greedy polcy wthout a model Approxmaton baed by current polcy We only approxmate value of tate we ee when executng the current polcy LSTD a weghted approxmaton toward thoe tate Can reult n Learn-forget cycle of polcy teraton Drve off the road; learn that t bad New polcy never doe th; forget that t bad Not truly a batch method Data mut be collected from current polcy for LSTD

33 LSPI LSPI mlar to prevou loop but replace LSTD wth a new algorthm LSTDQ LSTD: produce a value functon Requre ample from polcy under conderaton LSTDQ: produce a Q-functon for current polcy Can learn Q-functon for polcy from any reaonable) et of ample---ometme called an off-polcy method No need to collect ample from current polcy Dconnect polcy evaluaton from data collecton Permt reue of data acro teraton! Truly a batch method.

34 Implementng LSTDQ Both LSTD and LSTDQ compute: But LSTDQ ba functon are ndexed by acton defne greedy polcy: For each,a,r, ) ample: B j b w Q ˆ, a) w f, a) w B 1 B b b j k T T B P) f, a) f, a) f, a) f ', ')) r f, a) k k ˆ w ) arg max Q, a) j a arg max a j w ˆ Q w w ', a)

35 Runnng LSPI There a Matlab mplementaton avalable! 1. Collect a databae of,a,r, ) experence th the magc tep) 2. Start wth random weght = random polcy) 3. Repeat Evaluate current polcy agant databae Run LSTDQ to generate new et of weght New weght mply new Q-functon and hence new polcy Replace current weght wth new weght Untl convergence

36 Reult: Bcycle Rdng Watch random controller operate bke Collect ~40,000,a,r, ) ample Pck 20 mple feature functon 5 acton) Make 5-10 pae over data PI tep) Reward wa baed on dtance to goal + goal achevement Reult: Controller that balance and rde to goal

37 Bcycle Trajectore

38 What about Q-learnng? Ran Q-learnng wth ame feature Ued experence replay for data effcency

39 Q-learnng Reult

40 LSPI Robutne

41 Some key pont LSPI a batch RL algorthm Can generate trajectory data anyway you want Induce a polcy baed on global optmzaton over full dataet Very table wth no parameter that need tweakng

42 So, what the bad new? LSPI doe not addre the exploraton problem It decouple data collecton from polcy optmzaton Th often not a major ue, but can be n ome cae k 2 can ometme be bg Lot of torage Matrx nveron can be expenve Bcycle needed hapng reward Stll haven t olved Feature electon ue for all machne learnng, but RL eem even more entve)

Batch Reinforcement Learning

Batch Reinforcement Learning Batch Renforcement Learnng Alan Fern * Baed n part on lde by Ronald Parr Overvew What batch renforcement learnng? Leat Square Polcy Iteraton Ftted Q-teraton Batch DQN Onlne veru Batch RL Onlne RL: ntegrate

More information

Additional File 1 - Detailed explanation of the expression level CPD

Additional File 1 - Detailed explanation of the expression level CPD Addtonal Fle - Detaled explanaton of the expreon level CPD A mentoned n the man text, the man CPD for the uterng model cont of two ndvdual factor: P( level gen P( level gen P ( level gen 2 (.).. CPD factor

More information

Small signal analysis

Small signal analysis Small gnal analy. ntroducton Let u conder the crcut hown n Fg., where the nonlnear retor decrbed by the equaton g v havng graphcal repreentaton hown n Fg.. ( G (t G v(t v Fg. Fg. a D current ource wherea

More information

The Essential Dynamics Algorithm: Essential Results

The Essential Dynamics Algorithm: Essential Results @ MIT maachuett nttute of technology artfcal ntellgence laboratory The Eental Dynamc Algorthm: Eental Reult Martn C. Martn AI Memo 003-014 May 003 003 maachuett nttute of technology, cambrdge, ma 0139

More information

Improvements on Waring s Problem

Improvements on Waring s Problem Improvement on Warng Problem L An-Png Bejng, PR Chna apl@nacom Abtract By a new recurve algorthm for the auxlary equaton, n th paper, we wll gve ome mprovement for Warng problem Keyword: Warng Problem,

More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

Harmonic oscillator approximation

Harmonic oscillator approximation armonc ocllator approxmaton armonc ocllator approxmaton Euaton to be olved We are fndng a mnmum of the functon under the retrcton where W P, P,..., P, Q, Q,..., Q P, P,..., P, Q, Q,..., Q lnwgner functon

More information

Team. Outline. Statistics and Art: Sampling, Response Error, Mixed Models, Missing Data, and Inference

Team. Outline. Statistics and Art: Sampling, Response Error, Mixed Models, Missing Data, and Inference Team Stattc and Art: Samplng, Repone Error, Mxed Model, Mng Data, and nference Ed Stanek Unverty of Maachuett- Amhert, USA 9/5/8 9/5/8 Outlne. Example: Doe-repone Model n Toxcology. ow to Predct Realzed

More information

Lecture Notes on Linear Regression

Lecture Notes on Linear Regression Lecture Notes on Lnear Regresson Feng L fl@sdueducn Shandong Unversty, Chna Lnear Regresson Problem In regresson problem, we am at predct a contnuous target value gven an nput feature vector We assume

More information

Specification -- Assumptions of the Simple Classical Linear Regression Model (CLRM) 1. Introduction

Specification -- Assumptions of the Simple Classical Linear Regression Model (CLRM) 1. Introduction ECONOMICS 35* -- NOTE ECON 35* -- NOTE Specfcaton -- Aumpton of the Smple Clacal Lnear Regreon Model (CLRM). Introducton CLRM tand for the Clacal Lnear Regreon Model. The CLRM alo known a the tandard lnear

More information

1 Convex Optimization

1 Convex Optimization Convex Optmzaton We wll consder convex optmzaton problems. Namely, mnmzaton problems where the objectve s convex (we assume no constrants for now). Such problems often arse n machne learnng. For example,

More information

AP Statistics Ch 3 Examining Relationships

AP Statistics Ch 3 Examining Relationships Introducton To tud relatonhp between varable, we mut meaure the varable on the ame group of ndvdual. If we thnk a varable ma eplan or even caue change n another varable, then the eplanator varable and

More information

CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE

CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE Analytcal soluton s usually not possble when exctaton vares arbtrarly wth tme or f the system s nonlnear. Such problems can be solved by numercal tmesteppng

More information

Root Locus Techniques

Root Locus Techniques Root Locu Technque ELEC 32 Cloed-Loop Control The control nput u t ynthezed baed on the a pror knowledge of the ytem plant, the reference nput r t, and the error gnal, e t The control ytem meaure the output,

More information

Lecture 10 Support Vector Machines II

Lecture 10 Support Vector Machines II Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed

More information

2.3 Least-Square regressions

2.3 Least-Square regressions .3 Leat-Square regreon Eample.10 How do chldren grow? The pattern of growth vare from chld to chld, o we can bet undertandng the general pattern b followng the average heght of a number of chldren. Here

More information

Chapter 6 The Effect of the GPS Systematic Errors on Deformation Parameters

Chapter 6 The Effect of the GPS Systematic Errors on Deformation Parameters Chapter 6 The Effect of the GPS Sytematc Error on Deformaton Parameter 6.. General Beutler et al., (988) dd the frt comprehenve tudy on the GPS ytematc error. Baed on a geometrc approach and aumng a unform

More information

Additional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty

Additional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty Addtonal Codes usng Fnte Dfference Method Benamn Moll 1 HJB Equaton for Consumpton-Savng Problem Wthout Uncertanty Before consderng the case wth stochastc ncome n http://www.prnceton.edu/~moll/ HACTproect/HACT_Numercal_Appendx.pdf,

More information

LOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin

LOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin Proceedngs of the 007 Wnter Smulaton Conference S G Henderson, B Bller, M-H Hseh, J Shortle, J D Tew, and R R Barton, eds LOW BIAS INTEGRATED PATH ESTIMATORS James M Calvn Department of Computer Scence

More information

Chapter 11. Supplemental Text Material. The method of steepest ascent can be derived as follows. Suppose that we have fit a firstorder

Chapter 11. Supplemental Text Material. The method of steepest ascent can be derived as follows. Suppose that we have fit a firstorder S-. The Method of Steepet cent Chapter. Supplemental Text Materal The method of teepet acent can be derved a follow. Suppoe that we have ft a frtorder model y = β + β x and we wh to ue th model to determne

More information

EEL 6266 Power System Operation and Control. Chapter 3 Economic Dispatch Using Dynamic Programming

EEL 6266 Power System Operation and Control. Chapter 3 Economic Dispatch Using Dynamic Programming EEL 6266 Power System Operaton and Control Chapter 3 Economc Dspatch Usng Dynamc Programmng Pecewse Lnear Cost Functons Common practce many utltes prefer to represent ther generator cost functons as sngle-

More information

find (x): given element x, return the canonical element of the set containing x;

find (x): given element x, return the canonical element of the set containing x; COS 43 Sprng, 009 Dsjont Set Unon Problem: Mantan a collecton of dsjont sets. Two operatons: fnd the set contanng a gven element; unte two sets nto one (destructvely). Approach: Canoncal element method:

More information

EEE 241: Linear Systems

EEE 241: Linear Systems EEE : Lnear Systems Summary #: Backpropagaton BACKPROPAGATION The perceptron rule as well as the Wdrow Hoff learnng were desgned to tran sngle layer networks. They suffer from the same dsadvantage: they

More information

Statistical Properties of the OLS Coefficient Estimators. 1. Introduction

Statistical Properties of the OLS Coefficient Estimators. 1. Introduction ECOOMICS 35* -- OTE 4 ECO 35* -- OTE 4 Stattcal Properte of the OLS Coeffcent Etmator Introducton We derved n ote the OLS (Ordnary Leat Square etmator ˆβ j (j, of the regreon coeffcent βj (j, n the mple

More information

A Result on a Cyclic Polynomials

A Result on a Cyclic Polynomials Gen. Math. Note, Vol. 6, No., Feruary 05, pp. 59-65 ISSN 9-78 Copyrght ICSRS Pulcaton, 05.-cr.org Avalale free onlne at http:.geman.n A Reult on a Cyclc Polynomal S.A. Wahd Department of Mathematc & Stattc

More information

Generalized Linear Methods

Generalized Linear Methods Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

Improvements on Waring s Problem

Improvements on Waring s Problem Imrovement on Warng Problem L An-Png Bejng 85, PR Chna al@nacom Abtract By a new recurve algorthm for the auxlary equaton, n th aer, we wll gve ome mrovement for Warng roblem Keyword: Warng Problem, Hardy-Lttlewood

More information

MULTIPLE REGRESSION ANALYSIS For the Case of Two Regressors

MULTIPLE REGRESSION ANALYSIS For the Case of Two Regressors MULTIPLE REGRESSION ANALYSIS For the Cae of Two Regreor In the followng note, leat-quare etmaton developed for multple regreon problem wth two eplanator varable, here called regreor (uch a n the Fat Food

More information

Scattering of two identical particles in the center-of. of-mass frame. (b)

Scattering of two identical particles in the center-of. of-mass frame. (b) Lecture # November 5 Scatterng of two dentcal partcle Relatvtc Quantum Mechanc: The Klen-Gordon equaton Interpretaton of the Klen-Gordon equaton The Drac equaton Drac repreentaton for the matrce α and

More information

Two Approaches to Proving. Goldbach s Conjecture

Two Approaches to Proving. Goldbach s Conjecture Two Approache to Provng Goldbach Conecture By Bernard Farley Adved By Charle Parry May 3 rd 5 A Bref Introducton to Goldbach Conecture In 74 Goldbach made h mot famou contrbuton n mathematc wth the conecture

More information

Ensemble Methods: Boosting

Ensemble Methods: Boosting Ensemble Methods: Boostng Ncholas Ruozz Unversty of Texas at Dallas Based on the sldes of Vbhav Gogate and Rob Schapre Last Tme Varance reducton va baggng Generate new tranng data sets by samplng wth replacement

More information

CHAPTER IV RESEARCH FINDING AND ANALYSIS

CHAPTER IV RESEARCH FINDING AND ANALYSIS CHAPTER IV REEARCH FINDING AND ANALYI A. Descrpton of Research Fndngs To fnd out the dfference between the students who were taught by usng Mme Game and the students who were not taught by usng Mme Game

More information

Lecture 10 Support Vector Machines. Oct

Lecture 10 Support Vector Machines. Oct Lecture 10 Support Vector Machnes Oct - 20-2008 Lnear Separators Whch of the lnear separators s optmal? Concept of Margn Recall that n Perceptron, we learned that the convergence rate of the Perceptron

More information

Supplement: Proofs and Technical Details for The Solution Path of the Generalized Lasso

Supplement: Proofs and Technical Details for The Solution Path of the Generalized Lasso Supplement: Proofs and Techncal Detals for The Soluton Path of the Generalzed Lasso Ryan J. Tbshran Jonathan Taylor In ths document we gve supplementary detals to the paper The Soluton Path of the Generalzed

More information

Singular Value Decomposition: Theory and Applications

Singular Value Decomposition: Theory and Applications Sngular Value Decomposton: Theory and Applcatons Danel Khashab Sprng 2015 Last Update: March 2, 2015 1 Introducton A = UDV where columns of U and V are orthonormal and matrx D s dagonal wth postve real

More information

Kernel Methods and SVMs Extension

Kernel Methods and SVMs Extension Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general

More information

Some Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS)

Some Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS) Some Comments on Acceleratng Convergence of Iteratve Sequences Usng Drect Inverson of the Iteratve Subspace (DIIS) C. Davd Sherrll School of Chemstry and Bochemstry Georga Insttute of Technology May 1998

More information

Reinforcement learning

Reinforcement learning Renforcement learnng Nathanel Daw Gatsby Computatonal Neuroscence Unt daw @ gatsby.ucl.ac.uk http://www.gatsby.ucl.ac.uk/~daw Mostly adapted from Andrew Moore s tutorals, copyrght 2002, 2004 by Andrew

More information

Errors for Linear Systems

Errors for Linear Systems Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch

More information

Lecture 21: Numerical methods for pricing American type derivatives

Lecture 21: Numerical methods for pricing American type derivatives Lecture 21: Numercal methods for prcng Amercan type dervatves Xaoguang Wang STAT 598W Aprl 10th, 2014 (STAT 598W) Lecture 21 1 / 26 Outlne 1 Fnte Dfference Method Explct Method Penalty Method (STAT 598W)

More information

Linear Feature Engineering 11

Linear Feature Engineering 11 Lnear Feature Engneerng 11 2 Least-Squares 2.1 Smple least-squares Consder the followng dataset. We have a bunch of nputs x and correspondng outputs y. The partcular values n ths dataset are x y 0.23 0.19

More information

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011 Stanford Unversty CS359G: Graph Parttonng and Expanders Handout 4 Luca Trevsan January 3, 0 Lecture 4 In whch we prove the dffcult drecton of Cheeger s nequalty. As n the past lectures, consder an undrected

More information

Outline and Reading. Dynamic Programming. Dynamic Programming revealed. Computing Fibonacci. The General Dynamic Programming Technique

Outline and Reading. Dynamic Programming. Dynamic Programming revealed. Computing Fibonacci. The General Dynamic Programming Technique Outlne and Readng Dynamc Programmng The General Technque ( 5.3.2) -1 Knapsac Problem ( 5.3.3) Matrx Chan-Product ( 5.3.1) Dynamc Programmng verson 1.4 1 Dynamc Programmng verson 1.4 2 Dynamc Programmng

More information

Problem #1. Known: All required parameters. Schematic: Find: Depth of freezing as function of time. Strategy:

Problem #1. Known: All required parameters. Schematic: Find: Depth of freezing as function of time. Strategy: BEE 3500 013 Prelm Soluton Problem #1 Known: All requred parameter. Schematc: Fnd: Depth of freezng a functon of tme. Strategy: In thee mplfed analy for freezng tme, a wa done n cla for a lab geometry,

More information

Neural networks. Nuno Vasconcelos ECE Department, UCSD

Neural networks. Nuno Vasconcelos ECE Department, UCSD Neural networs Nuno Vasconcelos ECE Department, UCSD Classfcaton a classfcaton problem has two types of varables e.g. X - vector of observatons (features) n the world Y - state (class) of the world x X

More information

CHAPTER 9 LINEAR MOMENTUM, IMPULSE AND COLLISIONS

CHAPTER 9 LINEAR MOMENTUM, IMPULSE AND COLLISIONS CHAPTER 9 LINEAR MOMENTUM, IMPULSE AND COLLISIONS 103 Phy 1 9.1 Lnear Momentum The prncple o energy conervaton can be ued to olve problem that are harder to olve jut ung Newton law. It ued to decrbe moton

More information

CS4495/6495 Introduction to Computer Vision. 3C-L3 Calibrating cameras

CS4495/6495 Introduction to Computer Vision. 3C-L3 Calibrating cameras CS4495/6495 Introducton to Computer Vson 3C-L3 Calbratng cameras Fnally (last tme): Camera parameters Projecton equaton the cumulatve effect of all parameters: M (3x4) f s x ' 1 0 0 0 c R 0 I T 3 3 3 x1

More information

CSC 411 / CSC D11 / CSC C11

CSC 411 / CSC D11 / CSC C11 18 Boostng s a general strategy for learnng classfers by combnng smpler ones. The dea of boostng s to take a weak classfer that s, any classfer that wll do at least slghtly better than chance and use t

More information

NUMERICAL DIFFERENTIATION

NUMERICAL DIFFERENTIATION NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the

More information

CS 2750 Machine Learning. Lecture 5. Density estimation. CS 2750 Machine Learning. Announcements

CS 2750 Machine Learning. Lecture 5. Density estimation. CS 2750 Machine Learning. Announcements CS 750 Machne Learnng Lecture 5 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square CS 750 Machne Learnng Announcements Homework Due on Wednesday before the class Reports: hand n before

More information

Lecture 3: Dual problems and Kernels

Lecture 3: Dual problems and Kernels Lecture 3: Dual problems and Kernels C4B Machne Learnng Hlary 211 A. Zsserman Prmal and dual forms Lnear separablty revsted Feature mappng Kernels for SVMs Kernel trck requrements radal bass functons SVM

More information

MMA and GCMMA two methods for nonlinear optimization

MMA and GCMMA two methods for nonlinear optimization MMA and GCMMA two methods for nonlnear optmzaton Krster Svanberg Optmzaton and Systems Theory, KTH, Stockholm, Sweden. krlle@math.kth.se Ths note descrbes the algorthms used n the author s 2007 mplementatons

More information

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009 College of Computer & Informaton Scence Fall 2009 Northeastern Unversty 20 October 2009 CS7880: Algorthmc Power Tools Scrbe: Jan Wen and Laura Poplawsk Lecture Outlne: Prmal-dual schema Network Desgn:

More information

Lecture 4. Instructor: Haipeng Luo

Lecture 4. Instructor: Haipeng Luo Lecture 4 Instructor: Hapeng Luo In the followng lectures, we focus on the expert problem and study more adaptve algorthms. Although Hedge s proven to be worst-case optmal, one may wonder how well t would

More information

Solutions to exam in SF1811 Optimization, Jan 14, 2015

Solutions to exam in SF1811 Optimization, Jan 14, 2015 Solutons to exam n SF8 Optmzaton, Jan 4, 25 3 3 O------O -4 \ / \ / The network: \/ where all lnks go from left to rght. /\ / \ / \ 6 O------O -5 2 4.(a) Let x = ( x 3, x 4, x 23, x 24 ) T, where the varable

More information

Numerical Algorithms for Visual Computing 2008/09 Example Solutions for Assignment 4. Problem 1 (Shift invariance of the Laplace operator)

Numerical Algorithms for Visual Computing 2008/09 Example Solutions for Assignment 4. Problem 1 (Shift invariance of the Laplace operator) Numercal Algorthms for Vsual Computng 008/09 Example Solutons for Assgnment 4 Problem (Shft nvarance of the Laplace operator The Laplace equaton s shft nvarant,.e., nvarant under translatons x x + a, y

More information

COS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture #16 Scribe: Yannan Wang April 3, 2014

COS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture #16 Scribe: Yannan Wang April 3, 2014 COS 511: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture #16 Scrbe: Yannan Wang Aprl 3, 014 1 Introducton The goal of our onlne learnng scenaro from last class s C comparng wth best expert and

More information

Deep Reinforcement Learning with Experience Replay Based on SARSA

Deep Reinforcement Learning with Experience Replay Based on SARSA Deep Renforcement Learnng wth Experence Replay Baed on SARSA Dongbn Zhao, Hatao Wang, Kun Shao and Yuanheng Zhu Key Laboratory of Management and Control for Complex Sytem Inttute of Automaton Chnee Academy

More information

Differentiating Gaussian Processes

Differentiating Gaussian Processes Dfferentatng Gaussan Processes Andrew McHutchon Aprl 17, 013 1 Frst Order Dervatve of the Posteror Mean The posteror mean of a GP s gven by, f = x, X KX, X 1 y x, X α 1 Only the x, X term depends on the

More information

IV. Performance Optimization

IV. Performance Optimization IV. Performance Optmzaton A. Steepest descent algorthm defnton how to set up bounds on learnng rate mnmzaton n a lne (varyng learnng rate) momentum learnng examples B. Newton s method defnton Gauss-Newton

More information

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results. Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson

More information

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 16

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 16 STAT 39: MATHEMATICAL COMPUTATIONS I FALL 218 LECTURE 16 1 why teratve methods f we have a lnear system Ax = b where A s very, very large but s ether sparse or structured (eg, banded, Toepltz, banded plus

More information

Vector Norms. Chapter 7 Iterative Techniques in Matrix Algebra. Cauchy-Bunyakovsky-Schwarz Inequality for Sums. Distances. Convergence.

Vector Norms. Chapter 7 Iterative Techniques in Matrix Algebra. Cauchy-Bunyakovsky-Schwarz Inequality for Sums. Distances. Convergence. Vector Norms Chapter 7 Iteratve Technques n Matrx Algebra Per-Olof Persson persson@berkeley.edu Department of Mathematcs Unversty of Calforna, Berkeley Math 128B Numercal Analyss Defnton A vector norm

More information

Social Studies 201 Notes for March 18, 2005

Social Studies 201 Notes for March 18, 2005 1 Social Studie 201 Note for March 18, 2005 Etimation of a mean, mall ample ize Section 8.4, p. 501. When a reearcher ha only a mall ample ize available, the central limit theorem doe not apply to the

More information

CS : Algorithms and Uncertainty Lecture 17 Date: October 26, 2016

CS : Algorithms and Uncertainty Lecture 17 Date: October 26, 2016 CS 29-128: Algorthms and Uncertanty Lecture 17 Date: October 26, 2016 Instructor: Nkhl Bansal Scrbe: Mchael Denns 1 Introducton In ths lecture we wll be lookng nto the secretary problem, and an nterestng

More information

Matrix Approximation and Projective Clustering via Iterative Sampling

Matrix Approximation and Projective Clustering via Iterative Sampling Matrx Approxmaton and Projectve Cluterng va Iteratve Samplng Lu Rademacher Santoh Vempala Grant Wang Abtract We preent two new reult for the problem of approxmatng a gven real m n matrx A by a rank-k matrx

More information

Lecture 20: November 7

Lecture 20: November 7 0-725/36-725: Convex Optmzaton Fall 205 Lecturer: Ryan Tbshran Lecture 20: November 7 Scrbes: Varsha Chnnaobreddy, Joon Sk Km, Lngyao Zhang Note: LaTeX template courtesy of UC Berkeley EECS dept. Dsclamer:

More information

Supporting Information

Supporting Information Supportng Informaton The neural network f n Eq. 1 s gven by: f x l = ReLU W atom x l + b atom, 2 where ReLU s the element-wse rectfed lnear unt, 21.e., ReLUx = max0, x, W atom R d d s the weght matrx to

More information

1 Derivation of Point-to-Plane Minimization

1 Derivation of Point-to-Plane Minimization 1 Dervaton of Pont-to-Plane Mnmzaton Consder the Chen-Medon (pont-to-plane) framework for ICP. Assume we have a collecton of ponts (p, q ) wth normals n. We want to determne the optmal rotaton and translaton

More information

Chapter 12. Ordinary Differential Equation Boundary Value (BV) Problems

Chapter 12. Ordinary Differential Equation Boundary Value (BV) Problems Chapter. Ordnar Dfferental Equaton Boundar Value (BV) Problems In ths chapter we wll learn how to solve ODE boundar value problem. BV ODE s usuall gven wth x beng the ndependent space varable. p( x) q(

More information

CS 229, Public Course Problem Set #3 Solutions: Learning Theory and Unsupervised Learning

CS 229, Public Course Problem Set #3 Solutions: Learning Theory and Unsupervised Learning CS9 Problem Set #3 Solutons CS 9, Publc Course Problem Set #3 Solutons: Learnng Theory and Unsupervsed Learnng. Unform convergence and Model Selecton In ths problem, we wll prove a bound on the error of

More information

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons

More information

Social Studies 201 Notes for November 14, 2003

Social Studies 201 Notes for November 14, 2003 1 Social Studie 201 Note for November 14, 2003 Etimation of a mean, mall ample ize Section 8.4, p. 501. When a reearcher ha only a mall ample ize available, the central limit theorem doe not apply to the

More information

Introduction to Algorithms

Introduction to Algorithms Introducton to Algorthms 6.046J/8.40J Lecture 7 Prof. Potr Indyk Data Structures Role of data structures: Encapsulate data Support certan operatons (e.g., INSERT, DELETE, SEARCH) Our focus: effcency of

More information

Some modelling aspects for the Matlab implementation of MMA

Some modelling aspects for the Matlab implementation of MMA Some modellng aspects for the Matlab mplementaton of MMA Krster Svanberg krlle@math.kth.se Optmzaton and Systems Theory Department of Mathematcs KTH, SE 10044 Stockholm September 2004 1. Consdered optmzaton

More information

Zhongping Jiang Tengfei Liu

Zhongping Jiang Tengfei Liu Zhongpng Jang Tengfe Lu 4 F.L. Lews, NAI Moncref-O Donnell Char, UTA Research Insttute (UTARI) The Unversty of Texas at Arlngton, USA and Qan Ren Consultng Professor, State Key Laboratory of Synthetcal

More information

Lecture 12: Discrete Laplacian

Lecture 12: Discrete Laplacian Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly

More information

Lecture 4: Constant Time SVD Approximation

Lecture 4: Constant Time SVD Approximation Spectral Algorthms and Representatons eb. 17, Mar. 3 and 8, 005 Lecture 4: Constant Tme SVD Approxmaton Lecturer: Santosh Vempala Scrbe: Jangzhuo Chen Ths topc conssts of three lectures 0/17, 03/03, 03/08),

More information

p 1 c 2 + p 2 c 2 + p 3 c p m c 2

p 1 c 2 + p 2 c 2 + p 3 c p m c 2 Where to put a faclty? Gven locatons p 1,..., p m n R n of m houses, want to choose a locaton c n R n for the fre staton. Want c to be as close as possble to all the house. We know how to measure dstance

More information

Support Vector Machines CS434

Support Vector Machines CS434 Support Vector Machnes CS434 Lnear Separators Many lnear separators exst that perfectly classfy all tranng examples Whch of the lnear separators s the best? Intuton of Margn Consder ponts A, B, and C We

More information

How Strong Are Weak Patents? Joseph Farrell and Carl Shapiro. Supplementary Material Licensing Probabilistic Patents to Cournot Oligopolists *

How Strong Are Weak Patents? Joseph Farrell and Carl Shapiro. Supplementary Material Licensing Probabilistic Patents to Cournot Oligopolists * How Strong Are Weak Patents? Joseph Farrell and Carl Shapro Supplementary Materal Lcensng Probablstc Patents to Cournot Olgopolsts * September 007 We study here the specal case n whch downstream competton

More information

Introduction. Modeling Data. Approach. Quality of Fit. Likelihood. Probabilistic Approach

Introduction. Modeling Data. Approach. Quality of Fit. Likelihood. Probabilistic Approach Introducton Modelng Data Gven a et of obervaton, we wh to ft a mathematcal model Model deend on adutable arameter traght lne: m + c n Polnomal: a + a + a + L+ a n Choce of model deend uon roblem Aroach

More information

j=0 s t t+1 + q t are vectors of length equal to the number of assets (c t+1 ) q t +1 + d i t+1 (1) (c t+1 ) R t+1 1= E t β u0 (c t+1 ) R u 0 (c t )

j=0 s t t+1 + q t are vectors of length equal to the number of assets (c t+1 ) q t +1 + d i t+1 (1) (c t+1 ) R t+1 1= E t β u0 (c t+1 ) R u 0 (c t ) 1 Aet Prce: overvew Euler equaton C-CAPM equty premum puzzle and rk free rate puzzle Law of One Prce / No Arbtrage Hanen-Jagannathan bound reoluton of equty premum puzzle Euler equaton agent problem X

More information

Quick Visit to Bernoulli Land

Quick Visit to Bernoulli Land Although we have een the Bernoull equaton and een t derved before, th next note how t dervaton for an uncopreble & nvcd flow. The dervaton follow that of Kuethe &Chow ot cloely (I lke t better than Anderon).

More information

Chapter Newton s Method

Chapter Newton s Method Chapter 9. Newton s Method After readng ths chapter, you should be able to:. Understand how Newton s method s dfferent from the Golden Secton Search method. Understand how Newton s method works 3. Solve

More information

Start Point and Trajectory Analysis for the Minimal Time System Design Algorithm

Start Point and Trajectory Analysis for the Minimal Time System Design Algorithm Start Pont and Trajectory Analy for the Mnmal Tme Sytem Degn Algorthm ALEXANDER ZEMLIAK, PEDRO MIRANDA Department of Phyc and Mathematc Puebla Autonomou Unverty Av San Claudo /n, Puebla, 757 MEXICO Abtract:

More information

Separation Axioms of Fuzzy Bitopological Spaces

Separation Axioms of Fuzzy Bitopological Spaces IJCSNS Internatonal Journal of Computer Scence and Network Securty VOL3 No October 3 Separaton Axom of Fuzzy Btopologcal Space Hong Wang College of Scence Southwet Unverty of Scence and Technology Manyang

More information

Nodal analysis of finite square resistive grids and the teaching effectiveness of students projects

Nodal analysis of finite square resistive grids and the teaching effectiveness of students projects 2 nd World Conference on Technology and Engneerng Educaton 2 WIETE Lublana Slovena 5-8 September 2 Nodal analyss of fnte square resstve grds and the teachng effectveness of students proects P. Zegarmstrz

More information

Chapter - 2. Distribution System Power Flow Analysis

Chapter - 2. Distribution System Power Flow Analysis Chapter - 2 Dstrbuton System Power Flow Analyss CHAPTER - 2 Radal Dstrbuton System Load Flow 2.1 Introducton Load flow s an mportant tool [66] for analyzng electrcal power system network performance. Load

More information

arxiv: v1 [cs.gt] 15 Jan 2019

arxiv: v1 [cs.gt] 15 Jan 2019 Model and algorthm for tme-content rk-aware Markov game Wenje Huang, Pham Vet Ha and Wllam B. Hakell January 16, 2019 arxv:1901.04882v1 [c.gt] 15 Jan 2019 Abtract In th paper, we propoe a model for non-cooperatve

More information

Technical Note: Capacity Constraints Across Nests in Assortment Optimization Under the Nested Logit Model

Technical Note: Capacity Constraints Across Nests in Assortment Optimization Under the Nested Logit Model Techncal Note: Capacty Constrants Across Nests n Assortment Optmzaton Under the Nested Logt Model Jacob B. Feldman, Huseyn Topaloglu School of Operatons Research and Informaton Engneerng, Cornell Unversty,

More information

Solution Methods for Time-indexed MIP Models for Chemical Production Scheduling

Solution Methods for Time-indexed MIP Models for Chemical Production Scheduling Ian Davd Lockhart Bogle and Mchael Farweather (Edtor), Proceedng of the 22nd European Sympoum on Computer Aded Proce Engneerng, 17-2 June 212, London. 212 Elever B.V. All rght reerved. Soluton Method for

More information

This appendix presents the derivations and proofs omitted from the main text.

This appendix presents the derivations and proofs omitted from the main text. Onlne Appendx A Appendx: Omtted Dervaton and Proof Th appendx preent the dervaton and proof omtted from the man text A Omtted dervaton n Secton Mot of the analy provded n the man text Here, we formally

More information

Relaxation Methods for Iterative Solution to Linear Systems of Equations

Relaxation Methods for Iterative Solution to Linear Systems of Equations Relaxaton Methods for Iteratve Soluton to Lnear Systems of Equatons Gerald Recktenwald Portland State Unversty Mechancal Engneerng Department gerry@pdx.edu Overvew Techncal topcs Basc Concepts Statonary

More information

Erratum: A Generalized Path Integral Control Approach to Reinforcement Learning

Erratum: A Generalized Path Integral Control Approach to Reinforcement Learning Journal of Machne Learnng Research 00-9 Submtted /0; Publshed 7/ Erratum: A Generalzed Path Integral Control Approach to Renforcement Learnng Evangelos ATheodorou Jonas Buchl Stefan Schaal Department of

More information

Variable Structure Control ~ Basics

Variable Structure Control ~ Basics Varable Structure Control ~ Bac Harry G. Kwatny Department of Mechancal Engneerng & Mechanc Drexel Unverty Outlne A prelmnary example VS ytem, ldng mode, reachng Bac of dcontnuou ytem Example: underea

More information

Support Vector Machines CS434

Support Vector Machines CS434 Support Vector Machnes CS434 Lnear Separators Many lnear separators exst that perfectly classfy all tranng examples Whch of the lnear separators s the best? + + + + + + + + + Intuton of Margn Consder ponts

More information

General viscosity iterative method for a sequence of quasi-nonexpansive mappings

General viscosity iterative method for a sequence of quasi-nonexpansive mappings Avalable onlne at www.tjnsa.com J. Nonlnear Sc. Appl. 9 (2016), 5672 5682 Research Artcle General vscosty teratve method for a sequence of quas-nonexpansve mappngs Cuje Zhang, Ynan Wang College of Scence,

More information

Neuro-Adaptive Design - I:

Neuro-Adaptive Design - I: Lecture 36 Neuro-Adaptve Desgn - I: A Robustfyng ool for Dynamc Inverson Desgn Dr. Radhakant Padh Asst. Professor Dept. of Aerospace Engneerng Indan Insttute of Scence - Bangalore Motvaton Perfect system

More information