Linear Regression Linear Regression with Shrinkage. Some slides are due to Tommi Jaakkola, MIT AI Lab

Similar documents
CS 2750 Machine Learning. Lecture 8. Linear regression. CS 2750 Machine Learning. Linear regression. is a linear combination of input components x

CS 2750 Machine Learning. Lecture 7. Linear regression. CS 2750 Machine Learning. Linear regression. is a linear combination of input components x

Linear Regression Linear Regression with Shrinkage

Multivariate Transformation of Variables and Maximum Likelihood Estimation

Supervised learning: Linear regression Logistic regression

CS 1675 Introduction to Machine Learning Lecture 12 Support vector machines

Objectives of Multiple Regression

Maximum Likelihood Estimation

Linear Regression Linear Regression with Shrinkage

Support vector machines II

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS

Overview. Basic concepts of Bayesian learning. Most probable model given data Coin tosses Linear regression Logistic regression

Binary classification: Support Vector Machines

( ) = ( ) ( ) Chapter 13 Asymptotic Theory and Stochastic Regressors. Stochastic regressors model

Generative classification models

6.867 Machine Learning

Linear Regression Linear Regression with Shrinkage

Bayes (Naïve or not) Classifiers: Generative Approach

Special Instructions / Useful Data

An Introduction to. Support Vector Machine

Point Estimation: definition of estimators

Machine Learning. Introduction to Regression. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012

ENGI 4421 Joint Probability Distributions Page Joint Probability Distributions [Navidi sections 2.5 and 2.6; Devore sections

Radial Basis Function Networks

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #1

4. Standard Regression Model and Spatial Dependence Tests

Training Sample Model: Given n observations, [[( Yi, x i the sample model can be expressed as (1) where, zero and variance σ

LINEAR REGRESSION ANALYSIS

12.2 Estimating Model parameters Assumptions: ox and y are related according to the simple linear regression model

Qualifying Exam Statistical Theory Problem Solutions August 2005

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS

Lecture 3. Sampling, sampling distributions, and parameter estimation

Chapter 14 Logistic Regression Models

Feature Selection: Part 2. 1 Greedy Algorithms (continued from the last lecture)

Likelihood Ratio, Wald, and Lagrange Multiplier (Score) Tests. Soccer Goals in European Premier Leagues

ENGI 3423 Simple Linear Regression Page 12-01

Line Fitting and Regression

Chapter Two. An Introduction to Regression ( )

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. x, where. = y - ˆ " 1

ECONOMETRIC THEORY. MODULE VIII Lecture - 26 Heteroskedasticity

Summary of the lecture in Biostatistics

Dimensionality Reduction and Learning

TESTS BASED ON MAXIMUM LIKELIHOOD

Linear Regression with One Regressor

Simple Linear Regression


Lecture Notes Forecasting the process of estimating or predicting unknown situations

9.1 Introduction to the probit and logit models

Probability and. Lecture 13: and Correlation

Part I: Background on the Binomial Distribution

Lecture 7. Confidence Intervals and Hypothesis Tests in the Simple CLR Model

Chapter 2 Supplemental Text Material

best estimate (mean) for X uncertainty or error in the measurement (systematic, random or statistical) best

Introduction to local (nonparametric) density estimation. methods

Correlation and Simple Linear Regression

CHAPTER 3 POSTERIOR DISTRIBUTIONS

Ordinary Least Squares Regression. Simple Regression. Algebra and Assumptions.

Simulation Output Analysis

ECON 5360 Class Notes GMM

Bayes Estimator for Exponential Distribution with Extension of Jeffery Prior Information

Lecture Notes Types of economic variables

STK4011 and STK9011 Autumn 2016

Support vector machines

A New Family of Transformations for Lifetime Data

Kernel-based Methods and Support Vector Machines

Functions of Random Variables

Chapter 4 (Part 1): Non-Parametric Classification (Sections ) Pattern Classification 4.3) Announcements

Dimensionality reduction Feature selection

Multiple Choice Test. Chapter Adequacy of Models for Regression

15-381: Artificial Intelligence. Regression and neural networks (NN)

Econometric Methods. Review of Estimation

ε. Therefore, the estimate

ESS Line Fitting

Chapter 8. Inferences about More Than Two Population Central Values

CHAPTER VI Statistical Analysis of Experimental Data

Taylor s Series and Interpolation. Interpolation & Curve-fitting. CIS Interpolation. Basic Scenario. Taylor Series interpolates at a specific

Lecture Notes 2. The ability to manipulate matrices is critical in economics.

Chapter 13 Student Lecture Notes 13-1

X X X E[ ] E X E X. is the ()m n where the ( i,)th. j element is the mean of the ( i,)th., then

Lecture 9: Tolerant Testing

Lecture Note to Rice Chapter 8

Midterm Exam 1, section 2 (Solution) Thursday, February hour, 15 minutes

Lecture 1: Introduction to Regression

Simple Linear Regression

Lecture 2: The Simple Regression Model

Regression and the LMS Algorithm

CSE 5526: Introduction to Neural Networks Linear Regression

3D Geometry for Computer Graphics. Lesson 2: PCA & SVD

Classification : Logistic regression. Generative classification model.

Lecture 8: Linear Regression

Lecture 1: Introduction to Regression

Introduction to Matrices and Matrix Approach to Simple Linear Regression

Principal Components. Analysis. Basic Intuition. A Method of Self Organized Learning

Analysis of Variance with Weibull Data

Lecture 3 Probability review (cont d)

Homework Solution (#5)

Unsupervised Learning and Other Neural Networks

ENGI 4421 Propagation of Error Page 8-01

Fundamentals of Regression Analysis

Logistic regression (continued)

Transcription:

Lear Regresso Lear Regresso th Shrkage Some sldes are due to Tomm Jaakkola, MIT AI Lab

Itroducto The goal of regresso s to make quattatve real valued predctos o the bass of a vector of features or attrbutes. Examples: house prces, stock values, survval tme, fuel effcec of cars, etc. Predctg vehcle fuel effcec mpg from 8 attrbutes:

A geerc regresso problem The put attrbutes are gve as fxed legth vectors that could come from dfferet sources: puts, trasformato of puts log, square root, etc or bass fuctos. The outputs are assumed to be real valued R th some possble restrctos. Gve d trag samples D {x,...x, } from uko dstrbuto Px,, the goal s to mmze the predcto error loss o e examples x, dra at radom from the same Px,. A a example of a loss fucto: L f x, f x squared loss our predcto for x M x R

Regresso Fucto We eed to defe a class of fuctos tpes of predctos e make. Lear predcto: f x;, + x here, are the parameters e eed estmate.

Lear Regresso Tpcall e have a set of trag data x,... x, from hch e estmate the parameters,. x x,.., xm Each s a vector of measuremets for th case. or h x { h x,..., h x } a bass expaso of x M x f x; M t + jh j x h x j defe h x

Bass Fuctos There are ma bass fuctos e ca use e.g. Polomal h x Radal bass fuctos Sgmodal h j j x x j h j x x µ j σ s Sples, Fourer, Wavelets, etc exp xµ j s

Estmato Crtero We eed a fttg/estmato crtero to select approprate values for the parameters, based o the trag set D { x,... x, } For example, e ca use the emprcal loss: J, f x;, ote: the loss s the same as evaluato

Emprcal loss: motvato Ideall, e ould lke to fd the parameters, that mmze the expected loss ulmted trag data: here the expectato s over samples from Px,. ~,, ;, x f E J P x, Whe the umber of trag examples s large: P x x f x f E ~,, ;, ; Expected loss Emprcal loss

Lear Regresso Estmato Mmze the emprcal squared loss x f J, ;, x B settg the dervatves th respect to to zero e get ecessar codtos for the optmal parameter values., x x J, x J,

Iterpretato The optmalt codtos x x x ε esure that the predcto error x s decorrelated th a lear fucto of the puts.

...... X Lear Regresso: Matrx Form samples M+ M+... x x X X, t x J

Lear Regresso Soluto B settg the dervatves of to zero, X t X t e get the soluto: t X X X ˆ t t X X X The soluto s a lear fucto of the outputs.

Statstcal ve of lear regresso I a statstcal regresso model e model both the fucto ad ose Observed output fucto + ose x f x; +ε here, e.g., ε ~ N, σ Whatever e caot capture th our chose faml of fuctos ll be terpreted as ose

Statstcal ve of lear regresso fx; s trg to capture the mea of the observatos gve the put x: E [ x] E[ f x; + ε x] f x; here E[ x] s the codtoal expectato of gve x, evaluated accordg to the model ot accordg to the uderlg dstrbuto of X

Statstcal ve of lear regresso Accordg to our statstcal model x f x; +ε, ε ~ N, σ the outputs gve x are ormall dstrbuted th mea f x; ad varace σ : p x,, σ exp f x; πσ σ e model the ucertat the predctos, ot just the mea

Maxmum lkelhood estmato Gve observatos e fd the parameters that maxmze the lkelhood of the outputs: { },,...,, x x D x p L,,, σ σ Maxmze log-lkelhood k k x f ; exp σ πσ k k x f L ; log, log σ πσ σ mmze

Maxmum lkelhood estmato Thus MLE arg m f x; But the emprcal squared loss s J f x; Least-squares Lear Regresso s MLE for Gaussa ose!!!

Lear Regresso s t good? Smple model Straghtforard soluto BUT MLS s ot a good estmator for predcto error The matrx X T X could be ll codtoed Iputs are correlated Iput dmeso s large Trag set s small

Lear Regresso - Example I ths example the output s geerated b the model here ε s a small hte Gaussa ose:.8x+. x +ε Three trag sets th dfferet correlatos Three trag sets th dfferet correlatos betee the to puts ere radoml chose, ad the lear regresso soluto as appled.

Lear Regresso Example Ad the results are x RSS.9.8,.9.5.5 -.5 - -.5 RSS.9465 b8.8776,.93 < - - 3 x x 3 - - RSS.4 RSS.4955.4, b8.43483.56,.56833 < - - 3 x x RSS.47 3.36, -9. 3 RSS.75469 b83.363,-9.96 < - - 3 x x,x ucorrelated x,x correlated x,x strogl correlated Strog correlato ca cause the coeffcets to be ver large, ad the ll cacel each other to acheve a good RSS. - -

Lear Regresso What ca be doe?

Shrkage methods Rdge Regresso Lasso PCR Prcpal Compoets Regresso PLS Partal Least Squares

Shrkage methods Before e proceed: Sce the follog methods are ot varat uder put scale, e ll assume the put s ormalzed mea, varace : The offset x j x j x j x j s alas estmated as ad e ll ork th cetered meag - x σ j j / N

Rdge Regresso Rdge regresso shrks the regresso coeffcets b mposg a pealt o ther sze also called eght deca I rdge regresso, e add a quadratc pealt o the eghts: N M M J xj j + λ j j j here λ s a tug parameter that cotrols the amout of shrkage. The sze costrat prevets the pheomeo of ldl large coeffcets that cacel each other from occurrg.

Rdge Regresso Soluto Rdge regresso matrx form: J The soluto s ˆ t t X X +λ t t X X + λ I X LS t ˆ X X X t rdge M The soluto adds a postve costat to the dagoal of X T X before verso. Ths makes the problem o sgular eve f X does ot have full colum rak. For orthogoal puts the rdge estmates are the scaled verso of least squares estmates: rdge LS ˆ γ ˆ γ

Rdge Regresso sghts The matrx X ca be represeted b t s SVD: X UDV U s a N*M matrx, t s colums spa the colum space of X D s a M*M dagoal matrx of sgular values V s a M*M matrx, t s colums spa the ro space of X Lets see ho ths method looks the Prcpal Compoets coordates of X T

Rdge Regresso sghts X ' XV X T XV V X ' ' The least squares soluto s gve b: T T T T VDU UDV VDU D U T ls T ˆ ' V ˆ V ˆ ' ls The Rdge Regresso s smlarl gve b: rdge V T ˆ rdge V T T T VDU UDV + λi T ls D + λi DU D + λi D ˆ' VDU ˆ ls T Dagoal t X X X t

Rdge Regresso sghts rdge ˆ' j d j d j ˆ ' +λ I the PCA axes, the rdge coeffcets are just scaled LS coeffcets! The coeffcets that correspod to smaller put varace drectos are scaled do more. ls j d d

Rdge Regresso I the follog smulatos, quattes are plotted versus the quatt df λ d M j j d j + λ Ths mootoc decreasg fucto s the effectve degrees of freedom of the rdge regresso ft.

Rdge Regresso Smulato results: 4 dmesos,,3,4 No correlato b 6 5 4 3 Rdge Regresso Strog correlato b 6 5 4 3 Rdge Regresso RSS 6 5 4 3.5.5.5 3 3.5 Effectve Dmeso.5.5.5 3 3.5 Effectve Dmeso RSS 6 5 4 3 3 4 Effectve Dmeso.5.5.5 3 3.5 4 Effectve Dmeso

Rdge regresso s MAP th Gaussa pror J log P D P t log Ν x, σ Ν, τ t t X X + + cost σ τ Ths s the same objectve fucto that rdge solves, usg λσ /τ Rdge: J t t X X +λ

Lasso Lasso s a shrkage method lke rdge, th a subtle but mportat dfferece. It s defed b ˆ lasso arg m β N M x j j j subject to M There s a smlart to the rdge regresso problem: the L rdge regresso pealt s replaced b the L lasso pealt. j The L pealt makes the soluto o-lear ad requres a quadratc programmg algorthm to compute t. j t

Lasso If t s chose to be larger the t : t t M ls ˆ j the the lasso estmato s detcal to the least squares. O the other had, for sa tt /, the least squares coeffcets are shruk b about 5% o average. For coveece e ll plot the smulato results versus shrkage factor s: s t t M ls ˆ j t

Lasso Smulato results: No correlato b 6 5 4 3 Lasso Strog correlato b 6 5 4 3 Lasso RSS 6 5 4 3..4.6.8 Shrkage factor..4.6.8 Shrkage factor RSS 4 3..4.6.8 Shrkage factor..4.6.8 Shrkage factor

L vs L pealtes Cotours of the LS error fucto L L + β t β β β +β t I Lasso the costrat rego has corers; he the soluto hts a corer the correspodg coeffcets becomes he M> more tha oe.

Problem: Fgure plots lear regresso results o the bass of ol three data pots. We used varous tpes of regularzato to obta the plots see belo but got cofused about hch plot correspods to hch regularzato method. Please assg each plot to oe ad ol oe of the follog regularzato method.