Lecture VI Regression

Lecture VI: Regression (Linear Methods for Regression)

Contents: Linear Methods for Regression; Least Squares and the Gauss-Markov theorem; Recursive Least Squares.

Lecture VI: MLSC - Dr. Sethu Vijayakumar

Linear Regression Model

$$y = f(\mathbf{x}) + \varepsilon = w_0 + \sum_{j=1}^{m} w_j x_j + \varepsilon \quad \text{: Linear Model}$$

where $\mathbf{x} = (x_1, x_2, \ldots, x_m)$ is the input vector and $\mathbf{w} = (w_0, w_1, \ldots, w_m)$ are the regression parameters.

The linear model either assumes that the regression function f(x) is linear, or that the linear model is a reasonable approximation. The inputs can be:

1. Quantitative inputs
2. Transformations of quantitative inputs, such as log, square root etc.
3. Basis expansions (e.g. polynomial representation): $x_2 = x_1^2,\ x_3 = x_1^3, \ldots$
4. Interactions between variables: $x_3 = x_1 x_2$
5. Dummy coding of the levels of a qualitative input

In all these cases, the model is linear in the parameters, even though the final function itself may not be linear.
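
As a concrete illustration (not part of the original slides), the sketch below builds a design matrix from made-up raw inputs using some of the transformations listed above; all variable names are hypothetical. The fitted function is nonlinear in the raw inputs, yet the model remains linear in the parameters w.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
area = rng.uniform(20, 200, n)             # quantitative input (made up)
rooms = rng.integers(1, 6, n)              # second quantitative input
district = rng.integers(0, 3, n)           # qualitative input with 3 levels

# Columns: bias, raw input, transformation, interaction, dummy coding
X = np.column_stack([
    np.ones(n),                            # constant term w_0
    area,                                  # quantitative input
    np.log(area),                          # transformation of a quantitative input
    area * rooms,                          # interaction between variables
    (district == 1).astype(float),         # dummy coding (level 0 is the baseline)
    (district == 2).astype(float),
])

# Synthetic target; the fit below is still linear in the parameters w
y = 3.0 + 0.05 * area + 2.0 * np.log(area) + rng.normal(0, 0.5, n)
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)
```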

Power of Linear Models

$$y(\mathbf{x}) = f(\mathbf{x}, \mathbf{w}) = g(\mathbf{w}^T \mathbf{x} + w_0)$$

If g() is linear, only linear functions can be modeled. However, if the input is suitably preprocessed, complicated functions can be realized:

$$\Phi(\mathbf{z}) = \begin{bmatrix} \phi_1(\mathbf{z}) \\ \phi_2(\mathbf{z}) \\ \vdots \\ \phi_d(\mathbf{z}) \end{bmatrix}, \qquad \text{example: } \Phi(z) = \begin{bmatrix} 1 \\ z \\ z^2 \\ \vdots \\ z^d \end{bmatrix}$$

Lecture VI: MLSC - Dr. Sethu Vijayakumar
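
A minimal sketch of the polynomial preprocessing Φ(z) = [1, z, z², ..., z^d] from this slide; the data are synthetic and chosen only for illustration. A nonlinear function of z is fitted, but the model stays linear in the parameters.

```python
import numpy as np

def poly_features(z, d):
    """Phi(z) = [1, z, z^2, ..., z^d] applied row-wise to a 1-D array z."""
    return np.vander(z, d + 1, increasing=True)

rng = np.random.default_rng(1)
z = np.linspace(-1, 1, 100)
y = np.sin(3 * z) + rng.normal(0, 0.1, z.size)    # nonlinear target, made up

Phi = poly_features(z, d=5)                        # (100, 6) design matrix
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)        # linear in the parameters w
print("fitted polynomial coefficients:", w)
```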

Least Squares Optimization

Least Squares Cost Function: minimize the cost

$$J(\mathbf{w}) = \sum_{i=1}^{N} \left(y_i - \hat{y}(\mathbf{x}_i)\right)^2 = (\mathbf{y} - \mathbf{X}\mathbf{w})^T (\mathbf{y} - \mathbf{X}\mathbf{w}), \qquad \text{where } N = \#\text{ of training data } \{(\mathbf{x}_i, y_i)\}$$

Solution:

$$\frac{\partial J}{\partial \mathbf{w}} = -2\,\mathbf{X}^T(\mathbf{y} - \mathbf{X}\mathbf{w}) = 0 \quad \Rightarrow \quad \mathbf{w} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$
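
A short NumPy sketch of the closed-form solution above, on synthetic data. In practice one solves the normal equations (or calls lstsq, which uses an SVD) rather than forming the explicit inverse.

```python
import numpy as np

rng = np.random.default_rng(2)
N, m = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, m))])   # design matrix with bias column
w_true = np.array([1.0, 2.0, -0.5, 0.3])
y = X @ w_true + rng.normal(0, 0.1, N)

# w = (X^T X)^{-1} X^T y, computed via a linear solve instead of an explicit inverse
w_ls = np.linalg.solve(X.T @ X, X.T @ y)
print("least squares estimate:", w_ls)

# Equivalent, numerically more robust route
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(w_ls, w_lstsq)
```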

What are we really doing?

Least Squares Solution: $\mathbf{w} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$, with predictions $y_{pred} = \mathbf{x}_{pred}^T\,\mathbf{w}$.

We seek the linear function of $\mathbf{X}$ that minimizes the sum of the squared residuals from $\mathbf{y}$: linear least squares fitting.

More insights into the LS solution

The Pseudo-Inverse: $\mathbf{X}^{+} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$

Pseudo-inverses are a special solution to an infinite set of solutions of a non-unique inverse problem (which we talked about in the previous lecture).

The matrix inversion above may still be ill-defined if $\mathbf{X}^T\mathbf{X}$ is close to singular, and so-called Ridge Regression needs to be applied:

$$\mathbf{w} = (\mathbf{X}^T\mathbf{X} + \gamma\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}, \qquad \text{where } \gamma \ll 1$$

Multiple Outputs: just like multiple single-output regressions,

$$\mathbf{W} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$$
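
A small sketch (assumed example data) of the pseudo-inverse and the ridge-regularized solution when $\mathbf{X}^T\mathbf{X}$ is near-singular; two columns are made almost collinear on purpose.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50
x1 = rng.normal(size=N)
x2 = x1 + 1e-6 * rng.normal(size=N)          # nearly collinear column -> X^T X close to singular
X = np.column_stack([np.ones(N), x1, x2])
y = 1.0 + x1 + rng.normal(0, 0.1, N)

# Pseudo-inverse solution X^+ y (SVD-based, handles rank deficiency)
w_pinv = np.linalg.pinv(X) @ y

# Ridge regression: w = (X^T X + gamma I)^{-1} X^T y with a small gamma
gamma = 1e-3
w_ridge = np.linalg.solve(X.T @ X + gamma * np.eye(X.shape[1]), X.T @ y)

print("pseudo-inverse:", w_pinv)
print("ridge         :", w_ridge)
```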

Geometrical Interpretation of LS

The fitted vector $\hat{\mathbf{y}} = \mathbf{X}\mathbf{w}$ lies in the subspace S spanned by the columns of $\mathbf{X}$; the vector of residual errors is orthogonal to $\hat{\mathbf{y}}$.

Residual vector: $\mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{X}\mathbf{w}$

$\mathbf{y} - \hat{\mathbf{y}}$ is orthogonal to the space spanned by the columns of $\mathbf{X}$, since

$$\frac{\partial J}{\partial \mathbf{w}} = 0 \;\Rightarrow\; \mathbf{X}^T(\mathbf{y} - \mathbf{X}\mathbf{w}) = \mathbf{0}$$

$\hat{\mathbf{y}}$ is an orthogonal projection of $\mathbf{y}$ onto S, and hence $\hat{\mathbf{y}}$ is the optimal reconstruction of $\mathbf{y}$ in the range of $\mathbf{X}$.
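
The orthogonality property is easy to verify numerically; this short check on synthetic data confirms that $\mathbf{X}^T(\mathbf{y} - \mathbf{X}\mathbf{w})$ vanishes at the least squares solution.

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 2))])
y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(0, 0.2, 30)

w, *_ = np.linalg.lstsq(X, y, rcond=None)
residual = y - X @ w

# Residual is orthogonal to every column of X (up to floating point error)
print(X.T @ residual)            # ~ [0, 0, 0]
assert np.allclose(X.T @ residual, 0, atol=1e-9)
```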

Physical Interpretation of LS

Imagine each data point connected to the fitted line by a spring: all springs have the same spring constant, so points far away generate more force (danger of outliers), and the springs are vertical. The least squares solution is the minimum energy solution achieved by the springs.

Minimum variance unbiased estimator

Gauss-Markov Theorem: the Least Squares estimate of the parameters $\mathbf{w}$ has the smallest variance among all linear unbiased estimates. Least Squares estimates are therefore also called BLUE estimates: Best Linear Unbiased Estimators.

$$\hat{\mathbf{w}} = \mathbf{H}\mathbf{y} \text{ : Least Squares Estimate, where } \mathbf{H} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$$

In other words, the Gauss-Markov theorem says that there is no other matrix $\mathbf{C}$ such that the estimator formed by $\tilde{\mathbf{w}} = \mathbf{C}\mathbf{y}$ will be both unbiased and have a smaller variance than $\hat{\mathbf{w}}$.

$\hat{\mathbf{w}}$ (the Least Squares Estimate) is an unbiased estimate, since $E(\hat{\mathbf{w}}) = \mathbf{w}$ (Homework!!)

Gauss-Markov Theorem (Proof)

$$E(\tilde{\mathbf{w}}) = E(\mathbf{C}\mathbf{y}) = E(\mathbf{C}(\mathbf{X}\mathbf{w} + \boldsymbol{\varepsilon})) = E(\mathbf{C}\mathbf{X}\mathbf{w} + \mathbf{C}\boldsymbol{\varepsilon}) = \mathbf{C}\mathbf{X}\mathbf{w} + \mathbf{C}\,E(\boldsymbol{\varepsilon}) = \mathbf{C}\mathbf{X}\mathbf{w}$$

For an unbiased estimate: $E(\tilde{\mathbf{w}}) = \mathbf{w} \;\Rightarrow\; \mathbf{C}\mathbf{X} = \mathbf{I}$

$$\begin{aligned}
\mathrm{Var}(\tilde{\mathbf{w}}) &= E\big[(\tilde{\mathbf{w}} - E(\tilde{\mathbf{w}}))(\tilde{\mathbf{w}} - E(\tilde{\mathbf{w}}))^T\big] \\
&= E\big[(\mathbf{C}\mathbf{y} - \mathbf{w})(\mathbf{C}\mathbf{y} - \mathbf{w})^T\big] \\
&= E\big[(\mathbf{C}\mathbf{X}\mathbf{w} + \mathbf{C}\boldsymbol{\varepsilon} - \mathbf{w})(\mathbf{C}\mathbf{X}\mathbf{w} + \mathbf{C}\boldsymbol{\varepsilon} - \mathbf{w})^T\big] \\
&= E\big[(\mathbf{C}\boldsymbol{\varepsilon})(\mathbf{C}\boldsymbol{\varepsilon})^T\big] \quad \ldots \text{ since } \mathbf{C}\mathbf{X} = \mathbf{I} \\
&= \mathbf{C}\,E[\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^T]\,\mathbf{C}^T = \sigma^2\,\mathbf{C}\mathbf{C}^T
\end{aligned}$$

Gauss-Markov Theorem (Proof, cont'd)

We want to show that $\mathrm{Var}(\hat{\mathbf{w}}) \le \mathrm{Var}(\tilde{\mathbf{w}})$.

Let $\mathbf{C} = \mathbf{D} + (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$. Since $\mathbf{C}\mathbf{X} = \mathbf{I}$, we have $(\mathbf{D} + (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T)\mathbf{X} = \mathbf{I}$, i.e. $\mathbf{D}\mathbf{X} + \mathbf{I} = \mathbf{I}$, so $\mathbf{D}\mathbf{X} = \mathbf{0}$.

$$\begin{aligned}
\mathrm{Var}(\tilde{\mathbf{w}}) &= \sigma^2\,\mathbf{C}\mathbf{C}^T \\
&= \sigma^2\,\big(\mathbf{D} + (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\big)\big(\mathbf{D} + (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\big)^T \\
&= \sigma^2\,\big(\mathbf{D}\mathbf{D}^T + (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} + \mathbf{D}\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} + (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{D}^T\big) \\
&= \sigma^2\,\mathbf{D}\mathbf{D}^T + \sigma^2\,(\mathbf{X}^T\mathbf{X})^{-1} \quad \ldots \text{ since } \mathbf{D}\mathbf{X} = \mathbf{0} \\
&= \sigma^2\,\mathbf{D}\mathbf{D}^T + \mathrm{Var}(\hat{\mathbf{w}})
\end{aligned}$$

It is thus sufficient to show that the diagonal elements of $\sigma^2\,\mathbf{D}\mathbf{D}^T$ are non-negative. This is true by definition. Hence, proved.
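
A numerical illustration of this argument (not part of the original slides): construct a matrix D with DX = 0, form the alternative unbiased estimator C = D + (X^T X)^{-1} X^T, and check that its per-parameter variances are never smaller than those of least squares.

```python
import numpy as np

rng = np.random.default_rng(5)
N, m = 40, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, m - 1))])
sigma2 = 0.25

XtX_inv = np.linalg.inv(X.T @ X)
H = XtX_inv @ X.T                                  # least squares estimator matrix for w

# Any D of the form M (I - X (X^T X)^{-1} X^T) satisfies D X = 0
M = rng.normal(size=(m, N))
D = M @ (np.eye(N) - X @ H)
assert np.allclose(D @ X, 0)

C = D + H                                          # alternative unbiased linear estimator
var_ls  = sigma2 * XtX_inv                         # Var(w_hat)
var_alt = sigma2 * C @ C.T                         # Var(w_tilde) = sigma^2 C C^T

# Diagonal (per-parameter) variances of the alternative estimator are never smaller
print(np.diag(var_alt) - np.diag(var_ls))          # all entries >= 0
assert np.all(np.diag(var_alt) - np.diag(var_ls) >= -1e-12)
```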

Biased vs. unbiased

Bias-Variance decomposition of the error:

$$E\big\{(y - \hat{y}(\mathbf{x}))^2\big\} = \sigma_\varepsilon^2 + \big(E\{\hat{y}(\mathbf{x})\} - f(\mathbf{x})\big)^2 + E\big\{(\hat{y}(\mathbf{x}) - E\{\hat{y}(\mathbf{x})\})^2\big\} = \mathrm{var(noise)} + \mathrm{bias}^2 + \mathrm{var(estimate)}$$

The Gauss-Markov theorem says that Least Squares achieves the estimate with the minimum variance (and hence the minimum Mean Squared Error) among all the unbiased estimates (bias = 0).

Does that mean that we should always work with unbiased estimators?? No!! There may exist biased estimators with a smaller net mean squared error: they trade a little bias for a larger reduction in variance.

Variable Subset Selection and Shrinkage are methods (which we will explore soon) that introduce bias and try to reduce the variance of the estimate.
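
A small Monte Carlo sketch (made-up data) of this trade-off: ridge regression is biased, but with strongly correlated inputs its net mean squared error in the parameters can be lower than that of the unbiased least squares estimate.

```python
import numpy as np

rng = np.random.default_rng(6)
N, trials, gamma = 30, 2000, 1.0
x1 = rng.normal(size=N)
x2 = x1 + 0.05 * rng.normal(size=N)               # strongly correlated inputs -> high LS variance
X = np.column_stack([np.ones(N), x1, x2])
w_true = np.array([1.0, 1.0, 1.0])

err_ls, err_ridge = [], []
for _ in range(trials):
    y = X @ w_true + rng.normal(0, 1.0, N)        # fresh noise each trial
    w_ls = np.linalg.solve(X.T @ X, X.T @ y)
    w_rr = np.linalg.solve(X.T @ X + gamma * np.eye(3), X.T @ y)
    err_ls.append(np.sum((w_ls - w_true) ** 2))
    err_ridge.append(np.sum((w_rr - w_true) ** 2))

print("mean squared parameter error, LS   :", np.mean(err_ls))
print("mean squared parameter error, ridge:", np.mean(err_ridge))   # typically smaller here
```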

Recursive Least Squares

The Sherman-Morrison-Woodbury theorem:

$$(\mathbf{A} + \mathbf{z}\mathbf{z}^T)^{-1} = \mathbf{A}^{-1} - \frac{\mathbf{A}^{-1}\mathbf{z}\mathbf{z}^T\mathbf{A}^{-1}}{1 + \mathbf{z}^T\mathbf{A}^{-1}\mathbf{z}}$$

More general, the Matrix Inversion theorem:

$$(\mathbf{A} + \mathbf{B}\mathbf{C})^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{B}(\mathbf{I} + \mathbf{C}\mathbf{A}^{-1}\mathbf{B})^{-1}\mathbf{C}\mathbf{A}^{-1}$$

Recursive Least Squares Update:

Initialize $\mathbf{P} = \frac{1}{\gamma}\mathbf{I}$ where $\gamma \ll 1$ (note that $\mathbf{P}$ plays the role of $(\mathbf{X}^T\mathbf{X})^{-1}$).

For every new data point $(\mathbf{x}, y)$ (note that $\mathbf{x}$ includes the bias term):

$$\mathbf{P}^{n+1} = \frac{1}{\lambda}\left(\mathbf{P}^{n} - \frac{\mathbf{P}^{n}\mathbf{x}\mathbf{x}^T\mathbf{P}^{n}}{\lambda + \mathbf{x}^T\mathbf{P}^{n}\mathbf{x}}\right), \qquad \lambda = 1 \text{ if no forgetting}, \quad \lambda < 1 \text{ if forgetting}$$

$$\mathbf{w}^{n+1} = \mathbf{w}^{n} + \mathbf{P}^{n+1}\mathbf{x}\,\big(y - \mathbf{x}^T\mathbf{w}^{n}\big)$$
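
A compact NumPy sketch of the update equations above (written for this note, not taken from the course code); set the forgetting factor lam to 1.0 for plain RLS.

```python
import numpy as np

class RecursiveLeastSquares:
    """Recursive least squares with an optional forgetting factor."""

    def __init__(self, dim, gamma=1e-6, lam=1.0):
        self.P = np.eye(dim) / gamma          # P ~ (X^T X)^{-1}, initialised large
        self.w = np.zeros(dim)
        self.lam = lam                        # lam = 1: no forgetting, lam < 1: forgetting

    def update(self, x, y):
        """One update with input x (including the bias entry) and target y."""
        Px = self.P @ x
        self.P = (self.P - np.outer(Px, Px) / (self.lam + x @ Px)) / self.lam
        self.w = self.w + self.P @ x * (y - x @ self.w)
        return self.w
```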

Recursive Least Squares (cont'd)

Some amazing facts about recursive least squares:

- Results for $\mathbf{w}$ are EXACTLY the same as for the normal (batch) least squares update after every data point has been added once! (no iterations)
- NO matrix inversion necessary anymore
- NO learning rate necessary
- Guaranteed convergence to the optimal $\mathbf{w}$ (linear regression is an optimal estimator under many conditions)
- The forgetting factor $\lambda$ allows us to forget data in case of changing target functions
- Computational load is larger than the batch version of linear regression
- But don't get fooled: if the data is singular, you will still have problems!
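
The first claim (agreement with batch least squares after one pass) can be checked directly; this usage example assumes the RecursiveLeastSquares class sketched above and synthetic data.

```python
import numpy as np

rng = np.random.default_rng(7)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
y = X @ np.array([0.3, -1.2, 2.0]) + rng.normal(0, 0.1, 200)

rls = RecursiveLeastSquares(dim=3, gamma=1e-8, lam=1.0)
for x_i, y_i in zip(X, y):
    rls.update(x_i, y_i)

w_batch, *_ = np.linalg.lstsq(X, y, rcond=None)
print(rls.w, w_batch)      # agree up to the effect of the tiny initial gamma * I regularizer
```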