Lecture 6: Learning for Control (Generalised Linear Regression)

Contents: Linear Methods for Regression; Least Squares and the Gauss-Markov theorem; Recursive Least Squares.
(Lecture 6: RLSC - Prof. Sethu Vijayakumar)

Linear Regression Model

$$y = f(\mathbf{x}) = w_0 + \sum_{j=1}^{m} w_j x_j = \mathbf{w}^T \mathbf{x} \quad \text{: Linear Model}$$

where $\mathbf{x} = (x_1, x_2, \ldots, x_m, 1)^T$ is the input vector and $\mathbf{w} = (w_1, w_2, \ldots, w_m, w_0)^T$ are the regression parameters.

The linear model either assumes that the regression function f(x) is linear, or that the linear model is a reasonable approximation.

The inputs x can be:
1. Quantitative inputs
2. Transformations of quantitative inputs such as log, square root etc.
3. Basis expansions (e.g. polynomial representation): $x_2 = x_1^2,\ x_3 = x_1^3, \ldots$
4. Interactions between variables: $x_3 = x_1 x_2$
5. Dummy coding of levels of a qualitative input

In all these cases, the model is linear in the parameters, even though the final function itself may not be linear.

Power of Linear Models

$$y = g\left(f(\mathbf{x}, \mathbf{w})\right), \qquad f(\mathbf{x}, \mathbf{w}) = \mathbf{w}^T \mathbf{x} = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + \ldots + w_d x_d$$

If g() is linear, only linear functions can be modeled. However, if x is actually preprocessed, complicated functions can be realized: map $\mathbf{x} \mapsto \mathbf{z}(\mathbf{x})$ and fit $y = \mathbf{w}^T \mathbf{z}$, for example $\mathbf{z} = (x, x^2, x^3, \ldots, x^d)^T$.
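To make the preprocessing idea concrete, here is a small sketch (not from the lecture) that fits a nonlinear target with a model that is linear in the parameters by expanding x into polynomial features; the degree, the data and the variable names are illustrative assumptions.

```python
# Minimal sketch: a linear-in-parameters model fits a nonlinear function of x
# once x is preprocessed into polynomial features z = (x, x^2, ..., x^d, 1).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(200)   # nonlinear target plus noise

d = 5                                                 # polynomial degree (assumed choice)
Z = np.column_stack([x**k for k in range(1, d + 1)] + [np.ones_like(x)])

# Ordinary least squares on the expanded features: linear in w, nonlinear in x.
w, *_ = np.linalg.lstsq(Z, y, rcond=None)
y_hat = Z @ w
print("training MSE:", np.mean((y - y_hat) ** 2))
```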

Least Squares Optimization

Least Squares Cost Function: minimize the cost

$$J(\mathbf{w}) = \frac{1}{2} \sum_{i=1}^{N} \left( y_i - f(\mathbf{x}_i) \right)^2 = \frac{1}{2} \left( \mathbf{y} - X\mathbf{w} \right)^T \left( \mathbf{y} - X\mathbf{w} \right)$$

where $X = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N]^T$, $\mathbf{y} = (y_1, \ldots, y_N)^T$ and N = # of training data.

Solution:

$$\frac{\partial J}{\partial \mathbf{w}} = 0 \;\Rightarrow\; X^T (\mathbf{y} - X\mathbf{w}) = 0 \;\Rightarrow\; \hat{\mathbf{w}} = (X^T X)^{-1} X^T \mathbf{y}$$

What are we really doing?

Least Squares Solution: $\hat{\mathbf{w}} = (X^T X)^{-1} X^T \mathbf{y}$, with predictions $y_{pred} = \mathbf{x}_{pred}^T \hat{\mathbf{w}}$.

We seek the linear function of X that minimizes the sum of the squared residuals from y (linear least squares fitting).
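As a sanity check on the closed-form solution above, the following sketch solves the normal equations directly on synthetic data; the names (X, y, w_hat) and the data-generating choices are illustrative, not from the lecture.

```python
# Batch least squares via the normal equations: w_hat = (X^T X)^(-1) X^T y.
# X has one row per data point, with a trailing column of ones for the bias term.
import numpy as np

rng = np.random.default_rng(1)
N, m = 100, 3
X = np.column_stack([rng.standard_normal((N, m)), np.ones(N)])
w_true = np.array([2.0, -1.0, 0.5, 0.3])
y = X @ w_true + 0.05 * rng.standard_normal(N)

w_hat = np.linalg.solve(X.T @ X, X.T @ y)   # assumes X^T X is well conditioned
y_pred = X @ w_hat                           # predictions at the training inputs
print("residual cost J =", 0.5 * np.sum((y - y_pred) ** 2))
```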

More insights into the LS solution

The Pseudo-Inverse: $\hat{\mathbf{w}} = (X^T X)^{-1} X^T \mathbf{y} = X^{+} \mathbf{y}$. Pseudo-inverses are a special solution to an infinite set of solutions of a non-unique inverse problem (which we talked about in the previous lecture).

The matrix inversion above may still be ill-defined if $X^T X$ is close to singular, and so-called Ridge Regression needs to be applied:

$$\hat{\mathbf{w}}_{ridge} = (X^T X + \lambda I)^{-1} X^T \mathbf{y}$$

Multiple Outputs: just like multiple single-output regressions, $W = (X^T X)^{-1} X^T Y$.
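The ridge modification is easy to sketch in code; the function name ridge_fit, the regularisation value lam and the near-collinear data below are assumptions for illustration, and the same call handles multiple outputs simply by stacking them as columns of Y.

```python
# Ridge regression sketch: regularise X^T X when it is close to singular.
import numpy as np

def ridge_fit(X, Y, lam=1e-3):
    """Return W minimising ||Y - X W||^2 + lam * ||W||^2."""
    m = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ Y)

# Nearly collinear inputs: the second column is almost a copy of the first.
rng = np.random.default_rng(2)
x1 = rng.standard_normal(50)
X = np.column_stack([x1, x1 + 1e-8 * rng.standard_normal(50), np.ones(50)])
Y = np.column_stack([3 * x1 + 1.0, -x1 + 0.5])   # two outputs, fitted jointly
W = ridge_fit(X, Y, lam=1e-3)
print(W.shape)   # (3, 2): one weight vector per output column
```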

Geometrical Interpretation of LS

Residual vector: the residual $\mathbf{y} - X\hat{\mathbf{w}}$ is orthogonal to the space spanned by the columns of X, since $\partial J / \partial \mathbf{w} = 0$ implies $X^T(\mathbf{y} - X\hat{\mathbf{w}}) = 0$. Hence $\hat{\mathbf{y}} = X\hat{\mathbf{w}}$ is the optimal reconstruction of y in the range of X.
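A quick numerical check of this orthogonality, on illustrative data (not from the lecture):

```python
# The residual of a least squares fit is orthogonal to every column of X,
# i.e. X^T (y - X w_hat) is zero up to floating point round-off.
import numpy as np

rng = np.random.default_rng(3)
X = np.column_stack([rng.standard_normal((30, 2)), np.ones(30)])
y = rng.standard_normal(30)
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(X.T @ (y - X @ w_hat))   # approximately [0, 0, 0]
```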

Physical Interpretation of LS

Picture each data point attached to the fitted line by a vertical spring:
- all springs have the same spring constant
- points far away generate more force (danger of outliers)
- springs are vertical
- the solution is the minimum energy configuration achieved by the springs

Minimum variance unbiased estimator

Gauss-Markov theorem: the Least Squares estimate of the parameters has the smallest variance among all linear unbiased estimates. Least Squares estimates are therefore also called BLUE estimates: Best Linear Unbiased Estimators.

$$\hat{\mathbf{w}} = (X^T X)^{-1} X^T \mathbf{y} = H \mathbf{y} \quad \text{: Least Squares Estimate, where } H = (X^T X)^{-1} X^T$$

In other words, the Gauss-Markov theorem says that there is no other matrix C such that the estimator formed by $\tilde{\mathbf{w}} = C\mathbf{y}$ will be both unbiased and have a smaller variance than $\hat{\mathbf{w}}$.

$\hat{\mathbf{w}}$ (the Least Squares Estimate) is an unbiased estimate, since $E(\hat{\mathbf{w}}) = \mathbf{w}$ (Homework!!).

Gauss-Markov theorem (Proof)

$$E(\tilde{\mathbf{w}}) = E(C\mathbf{y}) = E\left(C(X\mathbf{w} + \boldsymbol{\varepsilon})\right) = E(CX\mathbf{w} + C\boldsymbol{\varepsilon}) = CX\mathbf{w} + C\,E(\boldsymbol{\varepsilon}) = CX\mathbf{w}$$

For an unbiased estimate: $E(\tilde{\mathbf{w}}) = \mathbf{w} \;\Rightarrow\; CX = I$.

$$\begin{aligned}
\mathrm{Var}(\tilde{\mathbf{w}}) &= E\left[(\tilde{\mathbf{w}} - E(\tilde{\mathbf{w}}))(\tilde{\mathbf{w}} - E(\tilde{\mathbf{w}}))^T\right] = E\left[(\tilde{\mathbf{w}} - \mathbf{w})(\tilde{\mathbf{w}} - \mathbf{w})^T\right] \\
&= E\left[(C\mathbf{y} - \mathbf{w})(C\mathbf{y} - \mathbf{w})^T\right] = E\left[(CX\mathbf{w} + C\boldsymbol{\varepsilon} - \mathbf{w})(CX\mathbf{w} + C\boldsymbol{\varepsilon} - \mathbf{w})^T\right] \\
&= E\left[(C\boldsymbol{\varepsilon})(C\boldsymbol{\varepsilon})^T\right] \quad \ldots \text{ since } CX = I \\
&= C\,E[\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^T]\,C^T = \sigma^2 C C^T
\end{aligned}$$

Gauss-Markov theorem (Proof, continued)

We want to show that $\mathrm{Var}(\hat{\mathbf{w}}) \le \mathrm{Var}(\tilde{\mathbf{w}})$. Let $C = D + (X^T X)^{-1} X^T$. Then

$$CX = \left(D + (X^T X)^{-1} X^T\right) X = I \quad \text{since } CX = I \;\Rightarrow\; DX + I = I \;\Rightarrow\; DX = 0$$

$$\begin{aligned}
\mathrm{Var}(\tilde{\mathbf{w}}) &= \sigma^2 C C^T = \sigma^2 \left(D + (X^T X)^{-1} X^T\right)\left(D + (X^T X)^{-1} X^T\right)^T \\
&= \sigma^2 \left( D D^T + D X (X^T X)^{-1} + (X^T X)^{-1} (DX)^T + (X^T X)^{-1} X^T X (X^T X)^{-1} \right) \\
&= \sigma^2 D D^T + \sigma^2 (X^T X)^{-1} \quad \ldots \text{ since } DX = 0 \\
&= \sigma^2 D D^T + \mathrm{Var}(\hat{\mathbf{w}})
\end{aligned}$$

It is thus sufficient to show that the diagonal elements of $D D^T$ are non-negative, which is true by definition. Hence, proved.

Biased vs unbiased

Bias-Variance decomposition of the error:

$$E\left[(y - \hat{f}(\mathbf{x}))^2\right] = \sigma^2_{noise} + \left( E[\hat{y}] - f(\mathbf{x}) \right)^2 + E\left[(\hat{y} - E[\hat{y}])^2\right] = \text{var(noise)} + \text{bias}^2 + \text{var(estimate)}$$

where $\hat{y} = \hat{f}(\mathbf{x})$.

The Gauss-Markov theorem says that Least Squares achieves the estimate with the minimum variance (and hence, the minimum Mean Squared Error) among all the unbiased estimates (bias = 0).

Does that mean that we should always work with unbiased estimators? No!! There may exist some biased estimators with a smaller net mean squared error: they trade a little bias for a larger reduction in variance (see the sketch below).

Variable Subset Selection and Shrinkage are methods (which we will explore soon) that introduce bias and try to reduce the variance of the estimate.
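A small Monte Carlo sketch of this trade-off: over many noisy re-draws of the targets, a ridge (shrinkage) estimator with a little bias can beat unbiased least squares in total squared error on the weights. The set-up below (few points, nearly collinear inputs, λ = 1) is an assumption chosen purely to make the effect visible, not a result from the lecture.

```python
# Biased vs unbiased: compare the squared error in w of OLS and ridge estimates
# over repeated noisy data sets drawn around a fixed design matrix X.
import numpy as np

rng = np.random.default_rng(4)
w_true = np.array([1.0, -1.0, 0.5])
base = rng.standard_normal(15)
X = np.column_stack([base, base + 0.05 * rng.standard_normal(15), np.ones(15)])

err_ls, err_ridge = [], []
for _ in range(2000):
    y = X @ w_true + 0.5 * rng.standard_normal(15)
    w_ls = np.linalg.solve(X.T @ X, X.T @ y)                    # unbiased
    w_r = np.linalg.solve(X.T @ X + 1.0 * np.eye(3), X.T @ y)   # biased (shrinkage)
    err_ls.append(np.sum((w_ls - w_true) ** 2))
    err_ridge.append(np.sum((w_r - w_true) ** 2))

print("OLS mean squared error  :", np.mean(err_ls))     # high variance here
print("ridge mean squared error:", np.mean(err_ridge))  # usually smaller in this set-up
```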

Recursive Least Squares

The Sherman-Morrison-Woodbury theorem:

$$(A + \mathbf{z}\mathbf{z}^T)^{-1} = A^{-1} - \frac{A^{-1}\mathbf{z}\mathbf{z}^T A^{-1}}{1 + \mathbf{z}^T A^{-1} \mathbf{z}}$$

More general, the Matrix Inversion theorem:

$$(A + BC)^{-1} = A^{-1} - A^{-1} B \left( I + C A^{-1} B \right)^{-1} C A^{-1}$$

Recursive Least Squares Update:

Initialize $P^0 = \frac{1}{\gamma} I$ with $\gamma$ small (note that $P^n \approx (X^T X)^{-1}$).

For every new data point $(\mathbf{x}, y)$ (note that x includes the bias term):

$$P^{n+1} = \frac{1}{\lambda}\left( P^n - \frac{P^n \mathbf{x}\mathbf{x}^T P^n}{\lambda + \mathbf{x}^T P^n \mathbf{x}} \right), \qquad \text{where } \lambda = 1 \text{ if no forgetting, } \lambda < 1 \text{ if forgetting}$$

$$\mathbf{w}^{n+1} = \mathbf{w}^n + P^{n+1} \mathbf{x} \left( y - \mathbf{x}^T \mathbf{w}^n \right)$$
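The update above is compact enough to sketch directly in code; the class name RLS and the default values of gamma (initialisation scale) and lam (forgetting factor) are illustrative assumptions, not part of the lecture.

```python
# Recursive least squares with an optional forgetting factor lam (lam = 1: no forgetting).
import numpy as np

class RLS:
    def __init__(self, dim, gamma=1e-6, lam=1.0):
        self.P = np.eye(dim) / gamma      # P^0 = (1/gamma) I, so P approximates (X^T X)^(-1)
        self.w = np.zeros(dim)
        self.lam = lam

    def update(self, x, y):
        # Sherman-Morrison step: rank-one update of P, no matrix inversion needed.
        Px = self.P @ x
        self.P = (self.P - np.outer(Px, Px) / (self.lam + x @ Px)) / self.lam
        # Weight update driven by the prediction error on the new point.
        self.w = self.w + self.P @ x * (y - x @ self.w)
        return self.w
```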

Recursive Least Squares (cont'd)

Some amazing facts about recursive least squares:
- Results for W are EXACTLY the same as for the normal least squares update (batch update) after every data point has been added once! (no iterations)
- NO matrix inversion necessary anymore
- NO learning rate necessary
- Guaranteed convergence to the optimal W (linear regression is an optimal estimator under many conditions)
- The forgetting factor allows us to forget data in case of changing target functions
- Computational load is larger than the batch version of linear regression
- But don't get fooled: if the data is singular, you will still have problems!
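The first bullet can be checked numerically. The self-contained sketch below runs one recursive pass (no forgetting) and compares the result against the batch normal-equation solution; the data and the initialisation value are illustrative assumptions.

```python
# One recursive pass over the data reproduces the batch least squares weights,
# up to the tiny bias introduced by the finite initialisation of P.
import numpy as np

rng = np.random.default_rng(5)
X = np.column_stack([rng.standard_normal((200, 2)), np.ones(200)])
y = X @ np.array([1.5, -0.7, 0.2]) + 0.1 * rng.standard_normal(200)

P = np.eye(3) / 1e-8                      # P^0 = (1/gamma) I with gamma small
w = np.zeros(3)
for x_i, y_i in zip(X, y):                # single pass, one rank-one update per point
    Px = P @ x_i
    P = P - np.outer(Px, Px) / (1.0 + x_i @ Px)
    w = w + P @ x_i * (y_i - x_i @ w)

w_batch = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(w, w_batch, atol=1e-4))   # True
```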