Batch RL Via Least Squares Policy Iteration
1 Batch RL via Least Squares Policy Iteration. Alan Fern. (Based in part on slides by Ronald Parr.)
2 Overview: Motivation; LSPI; Derivation from LSTD; Experimental results.
3 Online versus Batch RL. Online RL: integrates data collection and optimization; select actions in the environment and at the same time update parameters based on each observed (s, a, s', r). Batch RL: decouples data collection and optimization; first generate/collect experience in the environment, giving a data set of state-action-reward-state pairs {(s, a, r, s')}, then use the fixed set of experience to optimize/learn a policy. Online vs. Batch: batch algorithms are often more data efficient and stable, but batch algorithms ignore the exploration-exploitation problem and do their best with the data they have.
4 Batch RL Motivation. There are many applications that naturally fit the batch RL model. Medical Treatment Optimization. Input: a collection of treatment episodes for an ailment, giving sequences of observations and actions, including outcomes. Output: a treatment policy, ideally better than current practice. Emergency Response Optimization. Input: a collection of emergency response episodes, giving movement of emergency resources before, during, and after 911 calls. Output: an emergency response policy.
5 LSPI. LSPI is a model-free batch RL algorithm. It learns a linear approximation of the Q-function, is stable and efficient, and never diverges or gives meaningless answers. LSPI can be applied to a dataset regardless of how it was collected; but garbage in, garbage out. Least-Squares Policy Iteration, Michail Lagoudakis and Ronald Parr, Journal of Machine Learning Research (JMLR), Vol. 4, 2003.
6 Notation. S: state space; s: individual state; R(s, a): reward for taking action a in state s; γ: discount factor; V(s): state value function; P(s' | s, a) = T(s, a, s'): transition function; Q(s, a): state-action value function; policy: π(s) = a. Objective: maximize the expected total discounted reward E[Σ_{t≥0} γ^t r_t].
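The objective above has a direct numerical reading. A minimal sketch (the reward sequence and the function name are made up here for illustration):

```python
# Discounted return: sum over t of gamma^t * r_t for an observed reward sequence.
def discounted_return(rewards, gamma):
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total

# Rewards 1, 0, 2 with gamma = 0.9: 1 + 0 + 0.81 * 2 = 2.62
print(discounted_return([1.0, 0.0, 2.0], 0.9))
```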
7 Projection Approach to Approximation. Recall the standard Bellman equation: V*(s) = max_a [ R(s, a) + γ Σ_{s'} P(s' | s, a) V*(s') ], or equivalently V* = T[V*], where T[.] is the Bellman operator. Recall from value iteration that the sub-optimality of a value function can be bounded in terms of the Bellman error ||V − T[V]||. This motivates trying to find an approximate value function with small Bellman error.
8 Projection Approach to Approximation. Suppose that we have a space of representable value functions, e.g. the space of linear functions over given features. Let Π be a projection operator for that space: it projects any value function (in or outside of the space) to the closest value function in the space. Fixed-point Bellman equation with approximation: V̂* = Π T[V̂*]. Depending on the space, this will have a small Bellman error. LSPI will attempt to arrive at such a value function, assuming linear approximation and least-squares projection.
10 Projected Value Iteration. Naïve idea: try computing the projected fixed point using VI. Exact VI: iterate the Bellman backup V_{i+1} = T[V_i]. Projected VI: iterate the projected Bellman backup V̂_{i+1} = Π T[V̂_i], where T[V̂_i] is the exact Bellman backup (a value function) and Π projects it to the closest function in our restricted function space.
11 Example: Projected Bellman Backup. Restrict the space to linear functions over a single feature f: V̂(s) = w f(s). Suppose just two states s1 and s2 with f(s1) = 1, f(s2) = 2, and suppose the exact backup of V̂ gives T[V̂](s1) = 2, T[V̂](s2) = 2. Can we represent this exact backup in our linear space? No.
12 Example: Projected Bellman Backup. Restrict the space to linear functions over a single feature f: V̂(s) = w f(s). Suppose just two states s1 and s2 with f(s1) = 1, f(s2) = 2, and suppose the exact backup of V̂ gives T[V̂](s1) = 2, T[V̂](s2) = 2. The backup can't be represented via our linear function, so we take V̂_{i+1} = Π T[V̂_i] ≈ 1.333 f(s): the projected backup is just the least-squares fit to the exact backup.
13 Problem: Stability. For exact value iteration, stability is ensured by the contraction property of the Bellman backup V_{i+1} = T[V_i]. Is the projected Bellman backup V̂_{i+1} = Π T[V̂_i] a contraction?
14 Example: Stability Problem [Bertsekas & Tsitsiklis 1996]. Problem: most projections lead to backups that are not contractions and are unstable. Two states s1 → s2, all rewards zero, a single action, γ = 0.9, so V* = 0. Consider a linear approximation with a single feature f and weight w: V̂(s) = w f(s). The optimal w is 0, since V* = 0.
15 Example: Stability Problem. With f(s1) = 1 and f(s2) = 2, we have V̂(s1) = w_i and V̂(s2) = 2 w_i, where w_i is the weight value at iteration i. From V̂, perform the projected backup for each state: T[V̂](s1) = γ V̂(s2) = 1.8 w_i and T[V̂](s2) = γ V̂(s2) = 1.8 w_i. This can't be represented in our space, so we find the w_{i+1} that gives the least-squares approximation to the exact backup. After some math we get w_{i+1} = 1.2 w_i. What does this mean?
16 Example: Stability Problem. (Figure: the successive approximations V̂_0, V̂_1, V̂_2, V̂_3 plotted over the state space, growing with the iteration number.) Each iteration of the Bellman backup makes the approximation worse! Even for this trivial problem, projected VI diverges.
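The divergence can be reproduced in a few lines. This is a sketch, not from the slides: it assumes the unweighted least-squares projection and the transitions implied by the backups above (both states move to s2). The exact growth factor per iteration depends on how the projection weights the states, but the weight diverges rather than shrinking toward the optimal w = 0.

```python
import numpy as np

gamma = 0.9
Phi = np.array([[1.0], [2.0]])    # feature matrix: f(s1) = 1, f(s2) = 2 (K = 1)

w = np.array([1.0])               # any nonzero starting weight
history = [float(w[0])]
for _ in range(20):
    V = Phi @ w                                       # current approximate values
    target = np.array([gamma * V[1], gamma * V[1]])   # exact backup, zero reward
    w, *_ = np.linalg.lstsq(Phi, target, rcond=None)  # projected (least-squares) backup
    history.append(float(w[0]))

print(history[:4])                # the weight grows geometrically
```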
17 Understanding the Problem. What went wrong? The exact Bellman backup reduces error in the max-norm. Least squares (= projection) is non-expansive in the L2 norm, but may increase max-norm distance. Conclusion: alternating value iteration and function approximation is risky business.
18 Overview: Motivation; LSPI; Derivation from Least-Squares Temporal Difference Learning; Experimental results.
19 How does LSPI fix these? LSPI performs approximate policy iteration (PI involves policy evaluation and policy improvement), using a variant of least-squares temporal difference learning (LSTD) for approximate policy evaluation [Bradtke & Barto 96]. Stability: LSTD directly solves for the fixed point of the approximate Bellman equation for the policy value; with singular-value decomposition (SVD), this is always well defined. Data efficiency: LSTD finds the best approximation for any finite data set, makes a single pass over the data for each policy, and can be implemented incrementally.
20 OK, What is LSTD? Least-Squares Temporal Difference Learning. Assume a linear value function approximation over K features: V̂(s) = Σ_k w_k f_k(s), where the f_k are arbitrary feature functions of states. Some vector notation: V̂ = (V̂(s_1), ..., V̂(s_n))ᵀ, w = (w_1, ..., w_K)ᵀ, and f_k = (f_k(s_1), ..., f_k(s_n))ᵀ.
21 Deriving LSTD. V̂ = Φ w assigns a value to every state, where Φ is the (# states) × K matrix of basis functions whose row for state s is (f_1(s), f_2(s), ..., f_K(s)). V̂ is a linear function in the column space of f_1, ..., f_K; that is, V̂ = w_1 f_1 + ... + w_K f_K.
22 Suppose we know the value V_π of the policy. We want Φ w ≈ V_π. The least-squares weights minimize the squared error: w = (Φᵀ Φ)⁻¹ Φᵀ V_π, where (Φᵀ Φ)⁻¹ Φᵀ is sometimes called the pseudoinverse. The least-squares projection is then V̂ = Φ w = Φ (Φᵀ Φ)⁻¹ Φᵀ V_π, the textbook least-squares projection operator.
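The projection formula is easy to check numerically. A minimal sketch with made-up features and a made-up value vector (neither is from the slides); the defining property of the least-squares projection is that the residual is orthogonal to every feature column:

```python
import numpy as np

# 4 states, K = 2 features: row s of Phi holds (f_1(s), f_2(s)).
Phi = np.array([[1.0, 0.0],
                [2.0, 1.0],
                [0.0, 1.0],
                [1.0, 3.0]])
V = np.array([1.0, 3.0, 2.0, 5.0])           # known policy values (made up)

w = np.linalg.solve(Phi.T @ Phi, Phi.T @ V)  # w = (Phi^T Phi)^-1 Phi^T V
V_hat = Phi @ w                              # projection of V onto span(Phi)

print(Phi.T @ (V - V_hat))                   # residual is orthogonal to the features
```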
23 But we don't know V_π... Recall the fixed-point equation for policies: V_π(s) = R(s, π(s)) + γ Σ_{s'} P(s' | s, π(s)) V_π(s'), or in matrix form V_π = R + γ P V_π, where R = (R(s_1, π(s_1)), ..., R(s_n, π(s_n)))ᵀ and P is the matrix of transition probabilities P(s_j | s_i, π(s_i)). We will solve a projected fixed-point equation: substituting the least-squares projection into this gives Φ w = Φ (Φᵀ Φ)⁻¹ Φᵀ (R + γ P Φ w). Solving for w: w = (Φᵀ (Φ − γ P Φ))⁻¹ Φᵀ R.
24 Almost there... w = (Φᵀ (Φ − γ P Φ))⁻¹ Φᵀ R. The matrix to invert is only K × K. But it is expensive to construct the matrices (e.g. P is |S| × |S|), we don't know P, and we don't know R.
25 Using Samples for Φ. Suppose we have state transition samples of the policy running in the MDP: {(s, a, r, s')}. Idea: replace the enumeration of states with the sampled states, giving Φ̂ with one row per sample: the row for a sample starting in state s is (f_1(s), ..., f_K(s)).
26 Using Samples for R. Suppose we have state transition samples of the policy running in the MDP: {(s, a, r, s')}. Idea: replace the enumeration of rewards with the sampled rewards: R̂ = (r_1, r_2, ...)ᵀ, one entry per sample.
27 Using Samples for P. Idea: replace the expectation over next states with the sampled next states, giving an estimate of P Φ with one row per sample: the row for sample (s, a, r, s') is (f_1(s'), ..., f_K(s')).
28 Putting it Together. LSTD needs to compute w = (Φᵀ (Φ − γ P Φ))⁻¹ Φᵀ R = B⁻¹ b. The hard part is B, the K × K matrix B = Φᵀ (Φ − γ P Φ); the K-vector b = Φᵀ R. Both B and b can be computed incrementally for each (s, a, r, s') sample (initialized to zero): B_ij ← B_ij + f_i(s) (f_j(s) − γ f_j(s')) and b_i ← b_i + f_i(s) r.
29 LSTD Algorithm. Collect data by executing trajectories of the current policy. For each (s, a, r, s') sample: B_ij ← B_ij + f_i(s) (f_j(s) − γ f_j(s')) and b_i ← b_i + f_i(s) r. Then w = B⁻¹ b.
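The update rules above translate directly into code. A minimal sketch, assuming samples arrive as (s, r, s') tuples and that `feats` (a name made up here, not from the slides) maps a state to its K-dimensional feature vector:

```python
import numpy as np

def lstd(samples, feats, gamma, K):
    """LSTD: accumulate B and b over (s, r, s') samples, then solve w = B^-1 b."""
    B = np.zeros((K, K))
    b = np.zeros(K)
    for s, r, s_next in samples:
        f, f_next = feats(s), feats(s_next)
        B += np.outer(f, f - gamma * f_next)  # B_ij += f_i(s) (f_j(s) - gamma f_j(s'))
        b += r * f                            # b_i  += f_i(s) r
    return np.linalg.solve(B, b)              # w = B^-1 b

# Sanity check: a single self-looping state with reward 1 and f(s) = (1,)
# has value 1 / (1 - gamma) = 10, which LSTD recovers from one sample.
w = lstd([(0, 1.0, 0)], feats=lambda s: np.array([1.0]), gamma=0.9, K=1)
print(w)
```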
30 LSTD Summary. Does O(K²) work per datum, so it is linear in the amount of data, and approaches the model-based answer in the limit. Finding the fixed point requires inverting a matrix; the fixed point almost always exists. Stable; efficient.
31 Approximate Policy Iteration with LSTD. Policy iteration: iterate between policy improvement and policy evaluation. Idea: use LSTD for approximate policy evaluation in PI. Start with random weights w (i.e. a random value function). Repeat until convergence: π(s) = greedy(V̂_w, s) // policy improvement; then evaluate π using LSTD: generate sample trajectories of π and use LSTD to produce new weights w (w gives an approximate value function of π).
32 What Breaks? There is no way to execute the greedy policy without a model. The approximation is biased by the current policy: we only approximate values of states we see when executing the current policy, and LSTD is a weighted approximation toward those states. This can result in a learn-forget cycle of policy iteration: drive off the road, learn that it is bad; the new policy never does this, so we forget that it is bad. And it is not truly a batch method: data must be collected from the current policy for LSTD.
33 LSPI. LSPI is similar to the previous loop, but replaces LSTD with a new algorithm, LSTDQ. LSTD produces a value function and requires samples from the policy under consideration. LSTDQ produces a Q-function for the current policy and can learn the Q-function for a policy from any (reasonable) set of samples; it is sometimes called an off-policy method. There is no need to collect samples from the current policy: this disconnects policy evaluation from data collection and permits reuse of data across iterations. Truly a batch method.
34 Implementing LSTDQ. Both LSTD and LSTDQ compute w = B⁻¹ b, but LSTDQ's basis functions are indexed by actions: Q̂(s, a) = Σ_k w_k f_k(s, a), and w defines the greedy policy π̂_w(s) = argmax_a Q̂_w(s, a). For each (s, a, r, s') sample: B_ij ← B_ij + f_i(s, a) (f_j(s, a) − γ f_j(s', π̂_w(s'))) and b_i ← b_i + f_i(s, a) r. Then w = B⁻¹ b.
35 Running LSPI. There is a Matlab implementation available! 1. Collect a database of (s, a, r, s') experiences (this is the magic step). 2. Start with random weights (= a random policy). 3. Repeat until convergence: evaluate the current policy against the database by running LSTDQ to generate a new set of weights; the new weights imply a new Q-function and hence a new policy; replace the current weights with the new weights.
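The loop above can be sketched end to end. This is an illustrative reconstruction in Python, not the authors' Matlab code; `feats(s, a)` (state-action features), the dataset format, and the tiny test MDP are all assumptions made here:

```python
import numpy as np

def lstdq(data, feats, policy, gamma, K):
    """Evaluate `policy` from an arbitrary batch of (s, a, r, s') samples."""
    B = np.zeros((K, K))
    b = np.zeros(K)
    for s, a, r, s_next in data:
        f = feats(s, a)
        f_next = feats(s_next, policy(s_next))    # f(s', pi(s'))
        B += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(B, b)

def lspi(data, feats, actions, gamma, K, iters=10):
    """LSPI: alternate greedy improvement and LSTDQ evaluation on one fixed dataset."""
    w = np.zeros(K)                               # initial weights = initial policy
    for _ in range(iters):
        greedy = lambda s: max(actions, key=lambda a: feats(s, a) @ w)
        w_new = lstdq(data, feats, greedy, gamma, K)
        if np.allclose(w_new, w):                 # weights stopped changing
            break
        w = w_new
    return w

# Tiny two-state MDP (made up): action 0 stays, action 1 switches states;
# reward 1 whenever the next state is state 1. One sample per (s, a) pair.
data = [(0, 0, 0.0, 0), (0, 1, 1.0, 1), (1, 0, 1.0, 1), (1, 1, 0.0, 0)]
feats = lambda s, a: np.eye(4)[2 * s + a]         # one-hot (s, a) features
w = lspi(data, feats, actions=[0, 1], gamma=0.9, K=4)
print(w)   # Q-values of the greedy fixed point
```

With one-hot features and one sample per state-action pair, LSTDQ solves the exact Bellman evaluation equations, so the loop reaches the optimal policy (go to state 1 and stay) in a couple of iterations.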
36 Results: Bicycle Riding. Watch a random controller operate the bike. Collect ~40,000 (s, a, r, s') samples. Pick 20 simple feature functions (5 actions). Make 5-10 passes over the data (PI steps). Reward was based on distance to goal + goal achievement. Result: a controller that balances and rides to the goal.
37 Bicycle Trajectories
38 What about Q-learning? Ran Q-learning with the same features, using experience replay for data efficiency.
39 Q-learning Results
40 LSPI Robustness
41 Some key points. LSPI is a batch RL algorithm: you can generate the trajectory data any way you want, and it induces a policy based on a global optimization over the full dataset. Very stable, with no parameters that need tweaking.
42 So, what's the bad news? LSPI does not address the exploration problem: it decouples data collection from policy optimization. This is often not a major issue, but can be in some cases. K² can sometimes be big: lots of storage, and the matrix inversion can be expensive. The bicycle needed shaping rewards. And we still haven't solved feature selection (an issue for all of machine learning, but RL seems even more sensitive).
More informationChapter Newton s Method
Chapter 9. Newton s Method After readng ths chapter, you should be able to:. Understand how Newton s method s dfferent from the Golden Secton Search method. Understand how Newton s method works 3. Solve
More informationStart Point and Trajectory Analysis for the Minimal Time System Design Algorithm
Start Pont and Trajectory Analy for the Mnmal Tme Sytem Degn Algorthm ALEXANDER ZEMLIAK, PEDRO MIRANDA Department of Phyc and Mathematc Puebla Autonomou Unverty Av San Claudo /n, Puebla, 757 MEXICO Abtract:
More informationSeparation Axioms of Fuzzy Bitopological Spaces
IJCSNS Internatonal Journal of Computer Scence and Network Securty VOL3 No October 3 Separaton Axom of Fuzzy Btopologcal Space Hong Wang College of Scence Southwet Unverty of Scence and Technology Manyang
More informationNodal analysis of finite square resistive grids and the teaching effectiveness of students projects
2 nd World Conference on Technology and Engneerng Educaton 2 WIETE Lublana Slovena 5-8 September 2 Nodal analyss of fnte square resstve grds and the teachng effectveness of students proects P. Zegarmstrz
More informationChapter - 2. Distribution System Power Flow Analysis
Chapter - 2 Dstrbuton System Power Flow Analyss CHAPTER - 2 Radal Dstrbuton System Load Flow 2.1 Introducton Load flow s an mportant tool [66] for analyzng electrcal power system network performance. Load
More informationarxiv: v1 [cs.gt] 15 Jan 2019
Model and algorthm for tme-content rk-aware Markov game Wenje Huang, Pham Vet Ha and Wllam B. Hakell January 16, 2019 arxv:1901.04882v1 [c.gt] 15 Jan 2019 Abtract In th paper, we propoe a model for non-cooperatve
More informationTechnical Note: Capacity Constraints Across Nests in Assortment Optimization Under the Nested Logit Model
Techncal Note: Capacty Constrants Across Nests n Assortment Optmzaton Under the Nested Logt Model Jacob B. Feldman, Huseyn Topaloglu School of Operatons Research and Informaton Engneerng, Cornell Unversty,
More informationSolution Methods for Time-indexed MIP Models for Chemical Production Scheduling
Ian Davd Lockhart Bogle and Mchael Farweather (Edtor), Proceedng of the 22nd European Sympoum on Computer Aded Proce Engneerng, 17-2 June 212, London. 212 Elever B.V. All rght reerved. Soluton Method for
More informationThis appendix presents the derivations and proofs omitted from the main text.
Onlne Appendx A Appendx: Omtted Dervaton and Proof Th appendx preent the dervaton and proof omtted from the man text A Omtted dervaton n Secton Mot of the analy provded n the man text Here, we formally
More informationRelaxation Methods for Iterative Solution to Linear Systems of Equations
Relaxaton Methods for Iteratve Soluton to Lnear Systems of Equatons Gerald Recktenwald Portland State Unversty Mechancal Engneerng Department gerry@pdx.edu Overvew Techncal topcs Basc Concepts Statonary
More informationErratum: A Generalized Path Integral Control Approach to Reinforcement Learning
Journal of Machne Learnng Research 00-9 Submtted /0; Publshed 7/ Erratum: A Generalzed Path Integral Control Approach to Renforcement Learnng Evangelos ATheodorou Jonas Buchl Stefan Schaal Department of
More informationVariable Structure Control ~ Basics
Varable Structure Control ~ Bac Harry G. Kwatny Department of Mechancal Engneerng & Mechanc Drexel Unverty Outlne A prelmnary example VS ytem, ldng mode, reachng Bac of dcontnuou ytem Example: underea
More informationSupport Vector Machines CS434
Support Vector Machnes CS434 Lnear Separators Many lnear separators exst that perfectly classfy all tranng examples Whch of the lnear separators s the best? + + + + + + + + + Intuton of Margn Consder ponts
More informationGeneral viscosity iterative method for a sequence of quasi-nonexpansive mappings
Avalable onlne at www.tjnsa.com J. Nonlnear Sc. Appl. 9 (2016), 5672 5682 Research Artcle General vscosty teratve method for a sequence of quas-nonexpansve mappngs Cuje Zhang, Ynan Wang College of Scence,
More informationNeuro-Adaptive Design - I:
Lecture 36 Neuro-Adaptve Desgn - I: A Robustfyng ool for Dynamc Inverson Desgn Dr. Radhakant Padh Asst. Professor Dept. of Aerospace Engneerng Indan Insttute of Scence - Bangalore Motvaton Perfect system
More information