Ridge Regression. Flachs, Munkholt og Skotte. May 4, 2009

Size: px
Start display at page:

Download "Ridge Regression. Flachs, Munkholt og Skotte. May 4, 2009"

Transcription

1 Ridge Regression Flachs, Munkholt og Skotte May 4, 2009 As in usual regression we consider a pair of random variables (X, Y ) with values in R p R and assume that for some (β 0, β) R +p it holds that E(Y X) = β 0 + X j β j = β 0 + X β. j= Let X be the N p matrix of N p-dimensional covariates and let Y denote the N-dimensional vector of observations. With X j = N N i= x ij and Y = N N i= y i we shall use the notation [ ] =., Y = Y, X = X X p and X = X where is N -dimensional. Initially we define the ridge regression estimate ( ˆβ 0, ˆβ) as the pair (β 0, β) that minimizes the penalized residual sum of squares f λ (β 0, β) = (Y β 0 Xβ) (Y β 0 Xβ) + λβ β. Note that the intercept β 0 is not penalized. Question. Show that the ridge regression problem is equivalent to the problem of finding a minimizer ( ˆβ 0 c, ˆβ c ) for g λ (β0, c β c ) = (Y β0 c (X X)β c ) (Y β0 c (X X)β c ) + λ(β c ) β c. Proof. Consider the bijective re-parametrization φ : R +p R +p defined by φ(β 0, β) = (β 0 + Xβ, β).

2 Then f λ (β 0, β) = (Y β 0 Xβ) (Y β 0 Xβ) + λβ β = (Y (β 0 + Xβ) (X X)β) (Y (β 0 + Xβ) (X X)β) + λβ β = g λ (φ(β 0, β)). Thus the two problems are equivalent and the solutions satisfy ˆβ c = ˆβ and ˆβ c 0 = ˆβ 0 + X ˆβ. In the centered version we have that d dβ c 0 Necessarily ˆβ 0 c = Y. Now we have g λ (β c 0, β c ) = 2(Y β c 0 (X X)β c ) = 2 n (y i β0 c i= = 2n(Y β c 0) (x ij X j )βj) c D β cg λ ( ˆβ c 0, β c ) = 2(Y Y (X X)β c ) (X X) + 2λ(β c ) = 2(Y Y) (X X) + 2(β c ) (X X) (X X) + 2λ(β c ) ( ) = 2(Y Y) (X X) + 2(β c ) (X X) (X X) + λi By transposing we see that a minimizer ˆβ c (or ˆβ) must satisfy i= ( (X X) (X X) + λi) ˆβc = (X X) (Y Y). It is therefore possible to assume that the matrix of covariates and the observation vector has been centered, such that the average of each column is zero, if we first let β c 0 = Y. In the following we let X = X X and y = Y Y. Define for λ > 0 the ridge regression estimate ˆβ(λ) as the β that minimize RSS λ (β) = (y Xβ) (y Xβ) + λβ β. Question 2. With X = UDV the singular value decomposition of X, show for λ > 0 that if ˆβ(λ) is a minimizer of RSS λ (β) then ˆβ(λ) ˆβ(λ) = i= ( + λ)2 y u i u i y where d i, i =,..., p are the singular values and u i, i =,..., p are the columns U in the SVD. 2

3 Proof. It follows from the calculations above that ˆβ(λ) satisfy (X X + λi) ˆβ(λ) = X y. When λ > 0 the term is invertible regardless of the rank of X. decomposition X = UDV, we see that Using the singular value ˆβ(λ) = (X X + λi) X y = (V D 2 V + λv V ) V DU y = (V (D 2 + λi)v ) V DU y = V (D 2 + λi) DU y This gives us that ˆβ(λ) ˆβ(λ) = y UD(D 2 + λi) 2 DU y = ( + λ)2 y u i u i y, i= using that for a p-dimensional vector b = (b,..., b p ) and a p p diagonal matrix A it holds that bab = a j b 2 j. j= The ordinary least square solution is obtained by minimizing (y βx) (y βx). The solution is only unique when X has full rank, but any solution satisfy Define t = X Xβ = X y. min β β β:x Xβ=X y Question 3. Show that ˆβ(λ) ˆβ(λ) < t for λ > 0. And that the function λ s(λ) := ˆβ(λ) ˆβ(λ) is continuous, strictly decreasing function, with s(λ) 0 for λ. 3

4 Proof. Consider s : (0, ) (0, ) defined by Then s is continuous with and since s(λ) = ˆβ(λ) ˆβ(λ) = i= ( + λ)2 y u i u i y d dλ s(λ) = y u i u 2 i y ( + < 0 λ)3 ( i= + λ)2 0 when λ. we have that s(λ) is strictly decreasing and goes to zero for λ. Let ˆβ be a minimizer of (y Xβ) (y Xβ), if the minimizer is not unique, consider the one with the smallest norm. For any λ > 0 we have that RSS λ ( ˆβ(λ)) RSS λ ( ˆβ), combining this we see that 0 RSS λ ( ˆβ) RSS λ ( ˆβ(λ)) = (y X ˆβ) (y X ˆβ) (y X ˆβ(λ)) (y X ˆβ(λ)) + λ ˆβ ˆβ λ ˆβ(λ) ˆβ(λ) λ ˆβ ˆβ λ ˆβ(λ) ˆβ(λ). Thus for λ > 0 the ordinary least squares estimate ˆβ (also the one with the smallest norm if there are more than one) satisfy ˆβ ˆβ ˆβ(λ) ˆβ(λ), taking the limit λ 0 on both sides gives Thus for any λ > 0 we see that ˆβ(λ) ˆβ(λ) = i= t di 0 = i ( + λ)2 y u i u i y < Y u i u i Y. di 0 = i Y u i u i Y t. Question 4. Show for λ > 0 that a minimizer of RSS λ (β) also is a minimizer of (y Xβ) (y Xβ), subject to the constraint β β s(λ). 4

5 Let β be a minimizer of RSS λ (β). Observe that β β = s(λ). Let β 2 satisfy β2 β 2 s(λ). Then 0 RSS λ (β 2 ) RSS λ (β ) = (y Xβ 2 ) (y Xβ 2 ) (y Xβ ) (y Xβ ) + λβ2 β 2 λs(λ) (y Xβ 2 ) (y Xβ 2 ) (y Xβ ) (y Xβ ) Thus β is also a minimizer for the constrained least squares problem. Question 5. Show for λ > 0 that a minimizer of (y Xβ) (y Xβ), subject to the constraint β β s(λ), is also a minimizer of RSS λ (β). Argue that the constraint minimization problem above yields the ordinary least squares estimate whenever s t. Let β be a minimizer of (y Xβ) (y Xβ) subject to the constraint β β s(λ). Then (y Xβ ) (y Xβ ) (y X ˆβ(λ)) (y X ˆβ(λ)) since ˆβ(λ) satisfy the restriction. This gives RSS λ (β ) (y X ˆβ(λ)) (y X ˆβ(λ)) + λβ β RSS λ ( ˆβ(λ)). If s t (at least one of) the ordimary least square estimate(s) is contained in the restriction set, thus the minimizer of the constrained problem will be the least squares estimate. The ridge regression estimate can therefore be seen as a ordinary least squares estimate on a parameter set restricted by β β s. The translation between the two models is data dependent, since s is given by λ in a data dependent manner. The predicted values in the ordinary least squares regression are ŷ = X ˆβ(0) = X(X X) X y, in the case where X has full rank p, else replace with a generalized inverse. Question 6. Show that for the projection P = X(X X) X onto the column space of X, we have tr(p ) = p and P 2 = P. 5

6 By rules of the trace we have By direct calculation tr(p ) = tr(x(x X) X ) = tr(x X(X X) ) = tr(i p ) = p. P 2 = X(X X) X X(X X) X = X(X X) X = P. The predicted values in the ridge regression are ŷ = X(X X + λi) X y. Question 7. Show for λ > 0 that for the so-called smoother S λ = X(X X + λi) X we have By properties of the trace tr(s λ ) = i= + λ < p and S2 λ S λ. tr(s λ ) = tr(x(x X + λi) X ) = tr(x X(X X + λi) ) = tr(v D 2 V (V D 2 V + λi) ) = tr(v D 2 V (V (D 2 + λi)v ) ) = tr(v D 2 V (V ) (D 2 + λi) V ) = tr(d 2 (D 2 + λi) ) and since +λ < this gives tr(s λ ) = i= + λ < p. To show that S 2 λ S λ we must show that S λ S 2 λ is positive semi-definite. Observe that S λ S 2 λ = X(X X + λi) X X(X X + λi) X X(X X + λi) X = X(X X + λi) ((X X + λi) X X)(X X + λi) X = X(X X + λi) (λi)(x X + λi) X = λx(x X + λi) 2 X The matrix (X X + λi) 2 is positive definite and since any b R N \{0} gives X b R p, we se that b (S λ Sλ 2 )b 0. 6

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 1 Ridge Regression Ridge regression and the Lasso are two forms of regularized

More information

GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil

GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis Massimiliano Pontil 1 Today s plan SVD and principal component analysis (PCA) Connection

More information

A Short Introduction to the Lasso Methodology

A Short Introduction to the Lasso Methodology A Short Introduction to the Lasso Methodology Michael Gutmann sites.google.com/site/michaelgutmann University of Helsinki Aalto University Helsinki Institute for Information Technology March 9, 2016 Michael

More information

Theoretical Exercises Statistical Learning, 2009

Theoretical Exercises Statistical Learning, 2009 Theoretical Exercises Statistical Learning, 2009 Niels Richard Hansen April 20, 2009 The following exercises are going to play a central role in the course Statistical learning, block 4, 2009. The exercises

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

3. For a given dataset and linear model, what do you think is true about least squares estimates? Is Ŷ always unique? Yes. Is ˆβ always unique? No.

3. For a given dataset and linear model, what do you think is true about least squares estimates? Is Ŷ always unique? Yes. Is ˆβ always unique? No. 7. LEAST SQUARES ESTIMATION 1 EXERCISE: Least-Squares Estimation and Uniqueness of Estimates 1. For n real numbers a 1,...,a n, what value of a minimizes the sum of squared distances from a to each of

More information

Linear Regression (9/11/13)

Linear Regression (9/11/13) STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter

More information

Properties of Matrices and Operations on Matrices

Properties of Matrices and Operations on Matrices Properties of Matrices and Operations on Matrices A common data structure for statistical analysis is a rectangular array or matris. Rows represent individual observational units, or just observations,

More information

. a m1 a mn. a 1 a 2 a = a n

. a m1 a mn. a 1 a 2 a = a n Biostat 140655, 2008: Matrix Algebra Review 1 Definition: An m n matrix, A m n, is a rectangular array of real numbers with m rows and n columns Element in the i th row and the j th column is denoted by

More information

Eigenvalues and diagonalization

Eigenvalues and diagonalization Eigenvalues and diagonalization Patrick Breheny November 15 Patrick Breheny BST 764: Applied Statistical Modeling 1/20 Introduction The next topic in our course, principal components analysis, revolves

More information

Linear Algebra - Part II

Linear Algebra - Part II Linear Algebra - Part II Projection, Eigendecomposition, SVD (Adapted from Sargur Srihari s slides) Brief Review from Part 1 Symmetric Matrix: A = A T Orthogonal Matrix: A T A = AA T = I and A 1 = A T

More information

Statistics 203: Introduction to Regression and Analysis of Variance Penalized models

Statistics 203: Introduction to Regression and Analysis of Variance Penalized models Statistics 203: Introduction to Regression and Analysis of Variance Penalized models Jonathan Taylor - p. 1/15 Today s class Bias-Variance tradeoff. Penalized regression. Cross-validation. - p. 2/15 Bias-variance

More information

STAT 462-Computational Data Analysis

STAT 462-Computational Data Analysis STAT 462-Computational Data Analysis Chapter 5- Part 2 Nasser Sadeghkhani a.sadeghkhani@queensu.ca October 2017 1 / 27 Outline Shrinkage Methods 1. Ridge Regression 2. Lasso Dimension Reduction Methods

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 MA 575 Linear Models: Cedric E Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 1 Within-group Correlation Let us recall the simple two-level hierarchical

More information

STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method.

STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method. STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method. Rebecca Barter May 5, 2015 Linear Regression Review Linear Regression Review

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Maximum Likelihood Estimation Merlise Clyde STA721 Linear Models Duke University August 31, 2017 Outline Topics Likelihood Function Projections Maximum Likelihood Estimates Readings: Christensen Chapter

More information

Singular Value Decomposition

Singular Value Decomposition Chapter 6 Singular Value Decomposition In Chapter 5, we derived a number of algorithms for computing the eigenvalues and eigenvectors of matrices A R n n. Having developed this machinery, we complete our

More information

Regularization: Ridge Regression and the LASSO

Regularization: Ridge Regression and the LASSO Agenda Wednesday, November 29, 2006 Agenda Agenda 1 The Bias-Variance Tradeoff 2 Ridge Regression Solution to the l 2 problem Data Augmentation Approach Bayesian Interpretation The SVD and Ridge Regression

More information

4 Multiple Linear Regression

4 Multiple Linear Regression 4 Multiple Linear Regression 4. The Model Definition 4.. random variable Y fits a Multiple Linear Regression Model, iff there exist β, β,..., β k R so that for all (x, x 2,..., x k ) R k where ε N (, σ

More information

Lecture 14: Shrinkage

Lecture 14: Shrinkage Lecture 14: Shrinkage Reading: Section 6.2 STATS 202: Data mining and analysis October 27, 2017 1 / 19 Shrinkage methods The idea is to perform a linear regression, while regularizing or shrinking the

More information

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8 Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall

More information

A Modern Look at Classical Multivariate Techniques

A Modern Look at Classical Multivariate Techniques A Modern Look at Classical Multivariate Techniques Yoonkyung Lee Department of Statistics The Ohio State University March 16-20, 2015 The 13th School of Probability and Statistics CIMAT, Guanajuato, Mexico

More information

Multivariate Statistical Analysis

Multivariate Statistical Analysis Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 4 for Applied Multivariate Analysis Outline 1 Eigen values and eigen vectors Characteristic equation Some properties of eigendecompositions

More information

Lecture 6: Methods for high-dimensional problems

Lecture 6: Methods for high-dimensional problems Lecture 6: Methods for high-dimensional problems Hector Corrada Bravo and Rafael A. Irizarry March, 2010 In this Section we will discuss methods where data lies on high-dimensional spaces. In particular,

More information

Linear Systems. Carlo Tomasi. June 12, r = rank(a) b range(a) n r solutions

Linear Systems. Carlo Tomasi. June 12, r = rank(a) b range(a) n r solutions Linear Systems Carlo Tomasi June, 08 Section characterizes the existence and multiplicity of the solutions of a linear system in terms of the four fundamental spaces associated with the system s matrix

More information

Linear Models in Machine Learning

Linear Models in Machine Learning CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,

More information

Data Mining Stat 588

Data Mining Stat 588 Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic

More information

Study Notes on Matrices & Determinants for GATE 2017

Study Notes on Matrices & Determinants for GATE 2017 Study Notes on Matrices & Determinants for GATE 2017 Matrices and Determinates are undoubtedly one of the most scoring and high yielding topics in GATE. At least 3-4 questions are always anticipated from

More information

Estimation of the Response Mean. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 27

Estimation of the Response Mean. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 27 Estimation of the Response Mean Copyright c 202 Dan Nettleton (Iowa State University) Statistics 5 / 27 The Gauss-Markov Linear Model y = Xβ + ɛ y is an n random vector of responses. X is an n p matrix

More information

CS 143 Linear Algebra Review

CS 143 Linear Algebra Review CS 143 Linear Algebra Review Stefan Roth September 29, 2003 Introductory Remarks This review does not aim at mathematical rigor very much, but instead at ease of understanding and conciseness. Please see

More information

CS540 Machine learning Lecture 5

CS540 Machine learning Lecture 5 CS540 Machine learning Lecture 5 1 Last time Basis functions for linear regression Normal equations QR SVD - briefly 2 This time Geometry of least squares (again) SVD more slowly LMS Ridge regression 3

More information

Chapter 5 Matrix Approach to Simple Linear Regression

Chapter 5 Matrix Approach to Simple Linear Regression STAT 525 SPRING 2018 Chapter 5 Matrix Approach to Simple Linear Regression Professor Min Zhang Matrix Collection of elements arranged in rows and columns Elements will be numbers or symbols For example:

More information

Weighted Least Squares

Weighted Least Squares Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w

More information

MATH 3795 Lecture 10. Regularized Linear Least Squares.

MATH 3795 Lecture 10. Regularized Linear Least Squares. MATH 3795 Lecture 10. Regularized Linear Least Squares. Dmitriy Leykekhman Fall 2008 Goals Understanding the regularization. D. Leykekhman - MATH 3795 Introduction to Computational Mathematics Linear Least

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information

Matrix Algebra, part 2

Matrix Algebra, part 2 Matrix Algebra, part 2 Ming-Ching Luoh 2005.9.12 1 / 38 Diagonalization and Spectral Decomposition of a Matrix Optimization 2 / 38 Diagonalization and Spectral Decomposition of a Matrix Also called Eigenvalues

More information

MARN 5898 Regularized Linear Least Squares.

MARN 5898 Regularized Linear Least Squares. MARN 5898 Regularized Linear Least Squares. Dmitriy Leykekhman Spring 2010 Goals Understanding the regularization. D. Leykekhman - MARN 5898 Parameter estimation in marine sciences Linear Least Squares

More information

14 Multiple Linear Regression

14 Multiple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 14 Multiple Linear Regression 14.1 The multiple linear regression model In simple linear regression, the response variable y is expressed in

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Chapter 3. Matrices. 3.1 Matrices

Chapter 3. Matrices. 3.1 Matrices 40 Chapter 3 Matrices 3.1 Matrices Definition 3.1 Matrix) A matrix A is a rectangular array of m n real numbers {a ij } written as a 11 a 12 a 1n a 21 a 22 a 2n A =.... a m1 a m2 a mn The array has m rows

More information

11 Hypothesis Testing

11 Hypothesis Testing 28 11 Hypothesis Testing 111 Introduction Suppose we want to test the hypothesis: H : A q p β p 1 q 1 In terms of the rows of A this can be written as a 1 a q β, ie a i β for each row of A (here a i denotes

More information

Linear Systems. Carlo Tomasi

Linear Systems. Carlo Tomasi Linear Systems Carlo Tomasi Section 1 characterizes the existence and multiplicity of the solutions of a linear system in terms of the four fundamental spaces associated with the system s matrix and of

More information

STAT Homework 8 - Solutions

STAT Homework 8 - Solutions STAT-36700 Homework 8 - Solutions Fall 208 November 3, 208 This contains solutions for Homework 4. lease note that we have included several additional comments and approaches to the problems to give you

More information

STAT5044: Regression and Anova. Inyoung Kim

STAT5044: Regression and Anova. Inyoung Kim STAT5044: Regression and Anova Inyoung Kim 2 / 51 Outline 1 Matrix Expression 2 Linear and quadratic forms 3 Properties of quadratic form 4 Properties of estimates 5 Distributional properties 3 / 51 Matrix

More information

MLES & Multivariate Normal Theory

MLES & Multivariate Normal Theory Merlise Clyde September 6, 2016 Outline Expectations of Quadratic Forms Distribution Linear Transformations Distribution of estimates under normality Properties of MLE s Recap Ŷ = ˆµ is an unbiased estimate

More information

STAT 350: Geometry of Least Squares

STAT 350: Geometry of Least Squares The Geometry of Least Squares Mathematical Basics Inner / dot product: a and b column vectors a b = a T b = a i b i a b a T b = 0 Matrix Product: A is r s B is s t (AB) rt = s A rs B st Partitioned Matrices

More information

Spatial Process Estimates as Smoothers: A Review

Spatial Process Estimates as Smoothers: A Review Spatial Process Estimates as Smoothers: A Review Soutir Bandyopadhyay 1 Basic Model The observational model considered here has the form Y i = f(x i ) + ɛ i, for 1 i n. (1.1) where Y i is the observed

More information

Lecture 6: Geometry of OLS Estimation of Linear Regession

Lecture 6: Geometry of OLS Estimation of Linear Regession Lecture 6: Geometry of OLS Estimation of Linear Regession Xuexin Wang WISE Oct 2013 1 / 22 Matrix Algebra An n m matrix A is a rectangular array that consists of nm elements arranged in n rows and m columns

More information

Data Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods.

Data Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods. TheThalesians Itiseasyforphilosopherstoberichiftheychoose Data Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods Ivan Zhdankin

More information

Solutions for Econometrics I Homework No.1

Solutions for Econometrics I Homework No.1 Solutions for Econometrics I Homework No.1 due 2006-02-20 Feldkircher, Forstner, Ghoddusi, Grafenhofer, Pichler, Reiss, Yan, Zeugner Exercise 1.1 Structural form of the problem: 1. q d t = α 0 + α 1 p

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

3 Multiple Linear Regression

3 Multiple Linear Regression 3 Multiple Linear Regression 3.1 The Model Essentially, all models are wrong, but some are useful. Quote by George E.P. Box. Models are supposed to be exact descriptions of the population, but that is

More information

variability of the model, represented by σ 2 and not accounted for by Xβ

variability of the model, represented by σ 2 and not accounted for by Xβ Posterior Predictive Distribution Suppose we have observed a new set of explanatory variables X and we want to predict the outcomes ỹ using the regression model. Components of uncertainty in p(ỹ y) variability

More information

Estimating Estimable Functions of β. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 17

Estimating Estimable Functions of β. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 17 Estimating Estimable Functions of β Copyright c 202 Dan Nettleton (Iowa State University) Statistics 5 / 7 The Response Depends on β Only through Xβ In the Gauss-Markov or Normal Theory Gauss-Markov Linear

More information

Properties of the least squares estimates

Properties of the least squares estimates Properties of the least squares estimates 2019-01-18 Warmup Let a and b be scalar constants, and X be a scalar random variable. Fill in the blanks E ax + b) = Var ax + b) = Goal Recall that the least squares

More information

FIRST MIDTERM EXAM ECON 7801 SPRING 2001

FIRST MIDTERM EXAM ECON 7801 SPRING 2001 FIRST MIDTERM EXAM ECON 780 SPRING 200 ECONOMICS DEPARTMENT, UNIVERSITY OF UTAH Problem 2 points Let y be a n-vector (It may be a vector of observations of a random variable y, but it does not matter how

More information

Improving ridge regression via model selection and focussed fine-tuning

Improving ridge regression via model selection and focussed fine-tuning Università degli Studi di Milano-Bicocca SCUOLA DI ECONOMIA E STATISTICA Corso di Laurea Magistrale in Scienze Statistiche ed Economiche Tesi di laurea magistrale Improving ridge regression via model selection

More information

The prediction of house price

The prediction of house price 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Foundations of Computer Vision

Foundations of Computer Vision Foundations of Computer Vision Wesley. E. Snyder North Carolina State University Hairong Qi University of Tennessee, Knoxville Last Edited February 8, 2017 1 3.2. A BRIEF REVIEW OF LINEAR ALGEBRA Apply

More information

Quantitative Analysis of Financial Markets. Summary of Part II. Key Concepts & Formulas. Christopher Ting. November 11, 2017

Quantitative Analysis of Financial Markets. Summary of Part II. Key Concepts & Formulas. Christopher Ting. November 11, 2017 Summary of Part II Key Concepts & Formulas Christopher Ting November 11, 2017 christopherting@smu.edu.sg http://www.mysmu.edu/faculty/christophert/ Christopher Ting 1 of 16 Why Regression Analysis? Understand

More information

Singular Value Decomposition and Principal Component Analysis (PCA) I

Singular Value Decomposition and Principal Component Analysis (PCA) I Singular Value Decomposition and Principal Component Analysis (PCA) I Prof Ned Wingreen MOL 40/50 Microarray review Data per array: 0000 genes, I (green) i,i (red) i 000 000+ data points! The expression

More information

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1 Inverse of a Square Matrix For an N N square matrix A, the inverse of A, 1 A, exists if and only if A is of full rank, i.e., if and only if no column of A is a linear combination 1 of the others. A is

More information

High-dimensional regression modeling

High-dimensional regression modeling High-dimensional regression modeling David Causeur Department of Statistics and Computer Science Agrocampus Ouest IRMAR CNRS UMR 6625 http://www.agrocampus-ouest.fr/math/causeur/ Course objectives Making

More information

Advanced Quantitative Methods: ordinary least squares

Advanced Quantitative Methods: ordinary least squares Advanced Quantitative Methods: Ordinary Least Squares University College Dublin 31 January 2012 1 2 3 4 5 Terminology y is the dependent variable referred to also (by Greene) as a regressand X are the

More information

MSG500/MVE190 Linear Models - Lecture 15

MSG500/MVE190 Linear Models - Lecture 15 MSG500/MVE190 Linear Models - Lecture 15 Rebecka Jörnsten Mathematical Statistics University of Gothenburg/Chalmers University of Technology December 13, 2012 1 Regularized regression In ordinary least

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis Laurenz Wiskott Institute for Theoretical Biology Humboldt-University Berlin Invalidenstraße 43 D-10115 Berlin, Germany 11 March 2004 1 Intuition Problem Statement Experimental

More information

UNIVERSITETET I OSLO

UNIVERSITETET I OSLO UNIVERSITETET I OSLO Det matematisk-naturvitenskapelige fakultet Examination in: STK4030 Modern data analysis - FASIT Day of examination: Friday 13. Desember 2013. Examination hours: 14.30 18.30. This

More information

x 21 x 22 x 23 f X 1 X 2 X 3 ε

x 21 x 22 x 23 f X 1 X 2 X 3 ε Chapter 2 Estimation 2.1 Example Let s start with an example. Suppose that Y is the fuel consumption of a particular model of car in m.p.g. Suppose that the predictors are 1. X 1 the weight of the car

More information

Biostatistics Advanced Methods in Biostatistics IV

Biostatistics Advanced Methods in Biostatistics IV Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu Lecture 12 1 / 36 Tip + Paper Tip: As a statistician the results

More information

The Statistical Property of Ordinary Least Squares

The Statistical Property of Ordinary Least Squares The Statistical Property of Ordinary Least Squares The linear equation, on which we apply the OLS is y t = X t β + u t Then, as we have derived, the OLS estimator is ˆβ = [ X T X] 1 X T y Then, substituting

More information

New insights into best linear unbiased estimation and the optimality of least-squares

New insights into best linear unbiased estimation and the optimality of least-squares Journal of Multivariate Analysis 97 (2006) 575 585 www.elsevier.com/locate/jmva New insights into best linear unbiased estimation and the optimality of least-squares Mario Faliva, Maria Grazia Zoia Istituto

More information

Ma 3/103: Lecture 24 Linear Regression I: Estimation

Ma 3/103: Lecture 24 Linear Regression I: Estimation Ma 3/103: Lecture 24 Linear Regression I: Estimation March 3, 2017 KC Border Linear Regression I March 3, 2017 1 / 32 Regression analysis Regression analysis Estimate and test E(Y X) = f (X). f is the

More information

Well-developed and understood properties

Well-developed and understood properties 1 INTRODUCTION TO LINEAR MODELS 1 THE CLASSICAL LINEAR MODEL Most commonly used statistical models Flexible models Well-developed and understood properties Ease of interpretation Building block for more

More information

Multiple linear regression: estimation and model fitting

Multiple linear regression: estimation and model fitting Multiple linear regression: estimation and model fitting January 25 Introduction The goal of today s class is to set up a multiple regression model in terms of matrices and then solve for the regression

More information

This model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that

This model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that Linear Regression For (X, Y ) a pair of random variables with values in R p R we assume that E(Y X) = β 0 + with β R p+1. p X j β j = (1, X T )β j=1 This model of the conditional expectation is linear

More information

Math 533 Extra Hour Material

Math 533 Extra Hour Material Math 533 Extra Hour Material A Justification for Regression The Justification for Regression It is well-known that if we want to predict a random quantity Y using some quantity m according to a mean-squared

More information

INDIAN INSTITUTE OF SCIENCE STOCHASTIC HYDROLOGY. Lecture -30 Course Instructor : Prof. P. P. MUJUMDAR Department of Civil Engg., IISc.

INDIAN INSTITUTE OF SCIENCE STOCHASTIC HYDROLOGY. Lecture -30 Course Instructor : Prof. P. P. MUJUMDAR Department of Civil Engg., IISc. INDIAN INSTITUTE OF SCIENCE STOCHASTIC HYDROLOGY Lecture -30 Course Instructor : Prof. P. P. MUJUMDAR Department of Civil Engg., IISc. Summary of the previous lecture IDF relationship Procedure for creating

More information

EC3062 ECONOMETRICS. THE MULTIPLE REGRESSION MODEL Consider T realisations of the regression equation. (1) y = β 0 + β 1 x β k x k + ε,

EC3062 ECONOMETRICS. THE MULTIPLE REGRESSION MODEL Consider T realisations of the regression equation. (1) y = β 0 + β 1 x β k x k + ε, THE MULTIPLE REGRESSION MODEL Consider T realisations of the regression equation (1) y = β 0 + β 1 x 1 + + β k x k + ε, which can be written in the following form: (2) y 1 y 2.. y T = 1 x 11... x 1k 1

More information

235 Final exam review questions

235 Final exam review questions 5 Final exam review questions Paul Hacking December 4, 0 () Let A be an n n matrix and T : R n R n, T (x) = Ax the linear transformation with matrix A. What does it mean to say that a vector v R n is an

More information

Linear Algebra Review. Vectors

Linear Algebra Review. Vectors Linear Algebra Review 9/4/7 Linear Algebra Review By Tim K. Marks UCSD Borrows heavily from: Jana Kosecka http://cs.gmu.edu/~kosecka/cs682.html Virginia de Sa (UCSD) Cogsci 8F Linear Algebra review Vectors

More information

Chapter 7. Canonical Forms. 7.1 Eigenvalues and Eigenvectors

Chapter 7. Canonical Forms. 7.1 Eigenvalues and Eigenvectors Chapter 7 Canonical Forms 7.1 Eigenvalues and Eigenvectors Definition 7.1.1. Let V be a vector space over the field F and let T be a linear operator on V. An eigenvalue of T is a scalar λ F such that there

More information

Xβ is a linear combination of the columns of X: Copyright c 2010 Dan Nettleton (Iowa State University) Statistics / 25 X =

Xβ is a linear combination of the columns of X: Copyright c 2010 Dan Nettleton (Iowa State University) Statistics / 25 X = The Gauss-Markov Linear Model y Xβ + ɛ y is an n random vector of responses X is an n p matrix of constants with columns corresponding to explanatory variables X is sometimes referred to as the design

More information

STK-IN4300 Statistical Learning Methods in Data Science

STK-IN4300 Statistical Learning Methods in Data Science Outline of the lecture STK-I4300 Statistical Learning Methods in Data Science Riccardo De Bin debin@math.uio.no Model Assessment and Selection Cross-Validation Bootstrap Methods Methods using Derived Input

More information

Matrix Approach to Simple Linear Regression: An Overview

Matrix Approach to Simple Linear Regression: An Overview Matrix Approach to Simple Linear Regression: An Overview Aspects of matrices that you should know: Definition of a matrix Addition/subtraction/multiplication of matrices Symmetric/diagonal/identity matrix

More information

Weighted Least Squares

Weighted Least Squares Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w

More information

Learning with Singular Vectors

Learning with Singular Vectors Learning with Singular Vectors CIS 520 Lecture 30 October 2015 Barry Slaff Based on: CIS 520 Wiki Materials Slides by Jia Li (PSU) Works cited throughout Overview Linear regression: Given X, Y find w:

More information

Stat 159/259: Linear Algebra Notes

Stat 159/259: Linear Algebra Notes Stat 159/259: Linear Algebra Notes Jarrod Millman November 16, 2015 Abstract These notes assume you ve taken a semester of undergraduate linear algebra. In particular, I assume you are familiar with the

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 1 Random Vectors Let a 0 and y be n 1 vectors, and let A be an n n matrix. Here, a 0 and A are non-random, whereas y is

More information

Total Least Squares Approach in Regression Methods

Total Least Squares Approach in Regression Methods WDS'08 Proceedings of Contributed Papers, Part I, 88 93, 2008. ISBN 978-80-7378-065-4 MATFYZPRESS Total Least Squares Approach in Regression Methods M. Pešta Charles University, Faculty of Mathematics

More information

Designing Information Devices and Systems II

Designing Information Devices and Systems II EECS 16B Fall 2016 Designing Information Devices and Systems II Linear Algebra Notes Introduction In this set of notes, we will derive the linear least squares equation, study the properties symmetric

More information

Basic Distributional Assumptions of the Linear Model: 1. The errors are unbiased: E[ε] = The errors are uncorrelated with common variance:

Basic Distributional Assumptions of the Linear Model: 1. The errors are unbiased: E[ε] = The errors are uncorrelated with common variance: 8. PROPERTIES OF LEAST SQUARES ESTIMATES 1 Basic Distributional Assumptions of the Linear Model: 1. The errors are unbiased: E[ε] = 0. 2. The errors are uncorrelated with common variance: These assumptions

More information

CS 246 Review of Linear Algebra 01/17/19

CS 246 Review of Linear Algebra 01/17/19 1 Linear algebra In this section we will discuss vectors and matrices. We denote the (i, j)th entry of a matrix A as A ij, and the ith entry of a vector as v i. 1.1 Vectors and vector operations A vector

More information

STAT 540: Data Analysis and Regression

STAT 540: Data Analysis and Regression STAT 540: Data Analysis and Regression Wen Zhou http://www.stat.colostate.edu/~riczw/ Email: riczw@stat.colostate.edu Department of Statistics Colorado State University Fall 205 W. Zhou (Colorado State

More information

Linear Regression and Its Applications

Linear Regression and Its Applications Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start

More information

Topic 7 - Matrix Approach to Simple Linear Regression. Outline. Matrix. Matrix. Review of Matrices. Regression model in matrix form

Topic 7 - Matrix Approach to Simple Linear Regression. Outline. Matrix. Matrix. Review of Matrices. Regression model in matrix form Topic 7 - Matrix Approach to Simple Linear Regression Review of Matrices Outline Regression model in matrix form - Fall 03 Calculations using matrices Topic 7 Matrix Collection of elements arranged in

More information

Linear Algebra Primer

Linear Algebra Primer Linear Algebra Primer David Doria daviddoria@gmail.com Wednesday 3 rd December, 2008 Contents Why is it called Linear Algebra? 4 2 What is a Matrix? 4 2. Input and Output.....................................

More information

High-dimensional regression

High-dimensional regression High-dimensional regression Advanced Methods for Data Analysis 36-402/36-608) Spring 2014 1 Back to linear regression 1.1 Shortcomings Suppose that we are given outcome measurements y 1,... y n R, and

More information

Basis Penalty Smoothers. Simon Wood Mathematical Sciences, University of Bath, U.K.

Basis Penalty Smoothers. Simon Wood Mathematical Sciences, University of Bath, U.K. Basis Penalty Smoothers Simon Wood Mathematical Sciences, University of Bath, U.K. Estimating functions It is sometimes useful to estimate smooth functions from data, without being too precise about the

More information

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A =

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = 30 MATHEMATICS REVIEW G A.1.1 Matrices and Vectors Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = a 11 a 12... a 1N a 21 a 22... a 2N...... a M1 a M2... a MN A matrix can

More information

ANALYSIS OF VARIANCE AND QUADRATIC FORMS

ANALYSIS OF VARIANCE AND QUADRATIC FORMS 4 ANALYSIS OF VARIANCE AND QUADRATIC FORMS The previous chapter developed the regression results involving linear functions of the dependent variable, β, Ŷ, and e. All were shown to be normally distributed

More information