Bayes Estimators & Ridge Regression

Readings: Christensen, Chapter 14

Merlise Clyde
September 29, 2015

How Good are Estimators?

Quadratic loss for estimating $\beta$ using an estimator (action) $a$:
$$L(\beta, a) = (\beta - a)^T (\beta - a)$$

Consider our expected loss (before we see the data) of taking the action $a$.

Under OLS or the reference prior, the expected mean squared error is
$$E_Y[(\beta - \hat\beta)^T (\beta - \hat\beta)] = \sigma^2 \operatorname{tr}[(X^T X)^{-1}] = \sigma^2 \sum_{j=1}^p \lambda_j^{-1}$$
where $\lambda_1, \dots, \lambda_p$ are the eigenvalues of $X^T X$. If the smallest $\lambda_j \to 0$, then the MSE $\to \infty$.
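
To make the collinearity effect concrete, here is a small R sketch (a toy example of our own, not from the slides) that computes $\sigma^2 \sum_j \lambda_j^{-1}$ for a nearly singular design:

set.seed(1)
n <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)          # nearly collinear with x1
X <- cbind(x1, x2)
sigma2 <- 1
lambda <- eigen(crossprod(X))$values    # eigenvalues of X'X
sigma2 * sum(1 / lambda)                # expected MSE of OLS: huge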

Is the g-prior any better?

Under the g-prior, the posterior mean is $\frac{g}{1+g}\hat\beta$, with expected loss
$$E\left[L\left(\beta, \tfrac{g}{1+g}\hat\beta\right)\right] = \sigma^2 \left(\frac{g}{1+g}\right)^2 \operatorname{tr}[(X^T X)^{-1}] + \frac{\beta^T\beta}{(1+g)^2} = \frac{1}{(1+g)^2}\left(\sigma^2 g^2 \sum_j \lambda_j^{-1} + \beta^T\beta\right)$$

Aside: comparing with the reference-prior risk $\sigma^2 \sum_j \lambda_j^{-1}$, the g-prior estimator is better whenever
$$g > \frac{1}{2}\left(\frac{\beta^T\beta}{\sigma^2 \sum_j \lambda_j^{-1}} - 1\right)$$
But we still have risk going to infinity as the smallest $\lambda_j \to 0$.
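
A short R sketch (with hypothetical values for $\sigma^2$, the eigenvalues, and $\beta$) comparing the two risks as $g$ grows:

risk_ols <- function(sigma2, lambda) sigma2 * sum(1 / lambda)
risk_g <- function(g, sigma2, lambda, beta) {
  (g^2 * sigma2 * sum(1 / lambda) + sum(beta^2)) / (1 + g)^2
}
sigma2 <- 1; lambda <- c(5, 1, 0.01); beta <- c(1, -2, 0.5)   # hypothetical
g <- c(1, 10, 100, 1000)
cbind(g, g_prior = sapply(g, risk_g, sigma2, lambda, beta),
      ols = risk_ols(sigma2, lambda))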

Canonical Orthogonal Representation & Ridge Regression

Assume that $X$ has been centered and standardized so that $X^T X = \operatorname{corr}(X)$ (use the scale function in R).

Write $X = U_p L V^T$ (singular value decomposition), where $U_p^T U_p = I_p$, $V$ is a $p \times p$ orthogonal matrix, and $L$ is diagonal with singular values $l_1, \dots, l_p$. Then
$$Y = \mathbf{1}\alpha + U_p L V^T \beta + \epsilon$$

Let $U = [\mathbf{1}/\sqrt{n} \;\; U_p \;\; U_{n-p-1}]$, an $n \times n$ orthogonal matrix, and rotate by $U^T$:
$$U^T Y = U^T \mathbf{1}\alpha + U^T U_p L V^T \beta + U^T \epsilon$$
$$\tilde{Y} \equiv U^T Y = \begin{pmatrix} \sqrt{n} & \mathbf{0}_p^T \\ \mathbf{0}_p & L \\ \mathbf{0}_{n-p-1} & \mathbf{0}_{(n-p-1) \times p} \end{pmatrix} \begin{pmatrix} \alpha \\ \gamma \end{pmatrix} + \tilde{\epsilon}, \qquad \gamma \equiv V^T \beta$$

Orthogonal Regression

In the rotated model $\tilde{Y} = U^T Y$ above, least squares gives:

$\hat\alpha = \bar{y}$ (since $\tilde{Y}_1 = \sqrt{n}\,\bar{y}$)

$\hat\gamma = (L^T L)^{-1} L^T U_p^T Y$, i.e. $\hat\gamma_i = \tilde{y}_i / l_i$ for $i = 1, \dots, p$

$\operatorname{Var}(\hat\gamma_i) = \sigma^2 / l_i^2$

Directions $U_i$ in $X$-space with small singular values $l_i$ have the largest variances: unstable directions.
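
A minimal R sketch of the canonical form (a toy example with our own variable names; the later sketches reuse these objects):

set.seed(42)
n <- 30; p <- 3
X <- matrix(rnorm(n * p), n, p)
Y <- 1 + X %*% c(2, -1, 0.5) + rnorm(n)
Xs <- scale(X)                          # center and standardize
s <- svd(Xs)                            # Xs = U L V', singular values in s$d
gamma_hat <- c(crossprod(s$u, Y) / s$d) # gamma_hat_i = u_i' Y / l_i
beta_hat <- s$v %*% gamma_hat           # transform back: beta = V gamma
cbind(beta_hat, coef(lm(Y ~ Xs))[-1])   # matches OLS on the standardized design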

Ridge Regression & Independent Prior

(Another) normal conjugate prior distribution on $\gamma$:
$$\gamma \mid \phi \sim N\left(\mathbf{0}_p, \frac{1}{\phi k} I_p\right)$$

Posterior mean:
$$\tilde\gamma = (L^T L + kI)^{-1} L^T U_p^T Y = (L^T L + kI)^{-1} L^T L \hat\gamma$$
$$\tilde\gamma_i = \frac{l_i^2}{l_i^2 + k}\hat\gamma_i = \frac{\lambda_i}{\lambda_i + k}\hat\gamma_i$$

When $\lambda_i \to 0$, $\tilde\gamma_i \to 0$. When $k \to 0$ we get OLS back, but if $k$ gets too big the posterior mean goes to zero.

Transform Back

Transform back with $\tilde\beta = V\tilde\gamma$, which gives
$$\tilde\beta = (X^T X + kI)^{-1} X^T X \hat\beta$$
Note the importance of standardizing first.

Is there a value of $k$ for which ridge is better in terms of expected MSE than OLS? How to choose $k$? Hoerl & Kennard (1970).
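
Continuing the toy example from the earlier sketch, this verifies in R that the direct ridge formula and the canonical-coordinate route agree (k = 2 is an arbitrary choice):

k <- 2
beta_ridge <- solve(crossprod(Xs) + k * diag(p), crossprod(Xs, Y))  # (X'X + kI)^{-1} X'Y
gamma_tilde <- (s$d^2 / (s$d^2 + k)) * gamma_hat                    # shrink each gamma_hat_i
all.equal(c(beta_ridge), c(s$v %*% gamma_tilde))                    # TRUE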

MSE

One can show (HW) that $E[(\tilde\beta - \beta)^T(\tilde\beta - \beta)] = E[(\tilde\gamma - \gamma)^T(\tilde\gamma - \gamma)]$.

$\operatorname{Var}(\tilde\gamma_i) = \sigma^2 l_i^2 / (l_i^2 + k)^2$

The bias of $\tilde\gamma_i$ is $-k\gamma_i / (l_i^2 + k)$, so
$$\operatorname{MSE}(k) = \sigma^2 \sum_i \frac{l_i^2}{(l_i^2 + k)^2} + k^2 \sum_i \frac{\gamma_i^2}{(l_i^2 + k)^2}$$

The derivative with respect to $k$ is negative at $k = 0$, hence the function is decreasing in a neighborhood of zero. Since $k = 0$ is OLS, this means there is a value of $k > 0$ that is always better than OLS.
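
A sketch of the MSE curve in R, with hypothetical $l_i$, $\gamma_i$, and $\sigma^2$ (not values from the slides):

mse_k <- function(k, l, gamma, sigma2) {
  sigma2 * sum(l^2 / (l^2 + k)^2) + k^2 * sum(gamma^2 / (l^2 + k)^2)
}
l <- c(3, 1, 0.1); gamma <- c(1, 1, 1); sigma2 <- 1   # hypothetical
ks <- seq(0, 2, length.out = 400)
mse <- sapply(ks, mse_k, l, gamma, sigma2)
mse[1]                      # k = 0 recovers the OLS risk sigma^2 * sum(1/l^2)
ks[which.min(mse)]          # an interior k has smaller expected MSE
plot(ks, mse, type = "l", xlab = "k", ylab = "expected MSE")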

Alternative Motivation

If $\hat\beta$ is unconstrained, expect high variance when $X$ is nearly singular.

Let $Y_c = (I - P_{\mathbf{1}})Y$ and let $X_c$ be the centered and standardized $X$ matrix. Control how large the coefficients may grow:
$$\min_\beta (Y_c - X_c\beta)^T (Y_c - X_c\beta) \quad \text{subject to} \quad \sum_j \beta_j^2 \le t$$

Equivalent quadratic programming problem (a penalized likelihood):
$$\min_\beta \|Y_c - X_c\beta\|^2 + k\|\beta\|^2$$
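
Continuing the toy example, a quick R check (names are ours) that the penalized criterion is minimized by the ridge closed form:

Yc <- Y - mean(Y)
pen_rss <- function(b, k) sum((Yc - Xs %*% b)^2) + k * sum(b^2)
k <- 2
fit <- optim(rep(0, p), pen_rss, k = k, method = "BFGS")
cbind(optim = fit$par,
      closed_form = c(solve(crossprod(Xs) + k * diag(p), crossprod(Xs, Yc))))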

Picture

[Figure not reproduced in this transcription.]

Longley Data

[Figure of the Longley variables Employed, GNP.deflator, GNP, Unemployed, Armed.Forces, Population, and Year; not reproduced.]

OLS

> longley.lm = lm(Employed ~ ., data=longley)
> summary(longley.lm)

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)  -3.482e+03  8.904e+02  -3.911 0.003560 **
GNP.deflator  1.506e-02  8.492e-02   0.177 0.863141
GNP          -3.582e-02  3.349e-02  -1.070 0.312681
Unemployed   -2.020e-02  4.884e-03  -4.136 0.002535 **
Armed.Forces -1.033e-02  2.143e-03  -4.822 0.000944 ***
Population   -5.110e-02  2.261e-01  -0.226 0.826212
Year          1.829e+00  4.555e-01   4.016 0.003037 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3049 on 9 degrees of freedom
Multiple R-squared: 0.9955, Adjusted R-squared: 0.9925
F-statistic: 330.3 on 6 and 9 DF,  p-value: 4.984e-10

Ridge Trace

[Plot of the ridge coefficient paths, t(x$coef), against x$lambda; figure not reproduced.]

Generalized Cross-validation

> select(lm.ridge(Employed ~ ., data=longley, lambda=seq(0, 0.1, [step lost])))
modified HKB estimator is [value lost in transcription]
modified L-W estimator is [value lost in transcription]
smallest value of GCV at 0.0028
> longley.rreg = lm.ridge(Employed ~ ., data=longley, lambda=0.0028)
> coef(longley.rreg)
[Most coefficient values were lost in transcription; the Year coefficient is 1.557e+00, shrunk from the OLS value of 1.829e+00.]

Testimators

Goldstein & Smith (1974) have shown that if $0 \le h_i \le 1$ and $\tilde\gamma_i = h_i \hat\gamma_i$, then $\tilde\gamma_i$ has smaller MSE than $\hat\gamma_i$ whenever
$$\frac{\gamma_i^2}{\operatorname{Var}(\hat\gamma_i)} < \frac{1 + h_i}{1 - h_i}$$

Case: if $\gamma_i^2 < \operatorname{Var}(\hat\gamma_i) = \sigma^2 / l_i^2$, then $h_i = 0$ and $\tilde\gamma_i = 0$ is better.

Apply: estimate $\sigma^2$ with SSE/(n - p - 1) and $\gamma_i$ with $\hat\gamma_i$; set $h_i = 0$ if the t-statistic is less than 1 in absolute value. This is a "testimator"; see also Sclove (JASA 1968) and Copas (JRSS B 1983).
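
A sketch of the testimator rule in R, continuing the toy example (the |t| < 1 cutoff follows the slide):

ols_fit <- lm(Y ~ Xs)
sigma2_hat <- summary(ols_fit)$sigma^2          # SSE / (n - p - 1)
t_stat <- gamma_hat / sqrt(sigma2_hat / s$d^2)  # since Var(gamma_hat_i) = sigma^2 / l_i^2
gamma_test <- ifelse(abs(t_stat) < 1, 0, gamma_hat)
beta_test <- s$v %*% gamma_test                 # back to the beta scale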

Generalized Ridge

Instead of $\gamma_i \overset{iid}{\sim} N(0, \sigma^2/k)$, take $\gamma_i \overset{ind}{\sim} N(0, \sigma^2/k_i)$.

Then the condition of Goldstein & Smith becomes
$$\gamma_i^2 < \sigma^2 \left[\frac{2}{k_i} + \frac{1}{l_i^2}\right]$$

If $l_i$ is small, almost any $k_i$ will improve over OLS; if $l_i^2$ is large, then only very small values of $k_i$ will give an improvement.

Prior on $k_i$? Induced prior on $\beta$? With $K = \operatorname{diag}(k_1, \dots, k_p)$,
$$\gamma \sim N(\mathbf{0}, \sigma^2 K^{-1}) \implies \beta = V\gamma \sim N(\mathbf{0}, \sigma^2 V K^{-1} V^T)$$
which is not diagonal. Loss of invariance.
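
A one-glance R check (with hypothetical $k_i$, continuing the toy example) that the induced covariance $V K^{-1} V^T$ has nonzero off-diagonals:

k_i <- c(0.1, 1, 10)                         # hypothetical component-specific precisions
round(s$v %*% diag(1 / k_i) %*% t(s$v), 3)   # with sigma^2 = 1; not diagonal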

Related: Regression on PCA

Principal components of $X$ may be obtained via the singular value decomposition $X = U_p L V^T$; the $l_i^2$ are the eigenvalues $\lambda_i$ of $X^T X$.

$$Y = \mathbf{1}\alpha + U_p L V^T \beta + \epsilon = \mathbf{1}\alpha + F\gamma + \epsilon$$

The columns $F_i = l_i U_i$ of $F = U_p L = XV$ are the principal components of the multivariate data $X_1, \dots, X_p$.

If the direction $F_i$ is ill-defined ($l_i = 0$, or $\lambda_i < \epsilon$ for some small tolerance $\epsilon$), then we may decide not to use $F_i$ in the model. This is equivalent to setting
$$\tilde\gamma_i = \hat\gamma_i \text{ if } l_i \ge \epsilon, \qquad \tilde\gamma_i = 0 \text{ if } l_i < \epsilon$$
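
Continuing the toy example, principal-components regression as truncation of the canonical coefficients (the tolerance eps is an arbitrary choice):

eps <- 0.5
gamma_pcr <- ifelse(s$d >= eps, gamma_hat, 0)  # drop ill-defined directions
beta_pcr <- s$v %*% gamma_pcr
beta_pcr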

Summary

OLS can clearly be dominated by other estimators for estimating $\beta$. This leads to Bayes-like estimators:

- choice of penalties or prior hyper-parameters
- hierarchical model with prior on $k_i$
- shrinkage, dimension reduction & variable selection
- what loss function? estimation versus prediction? (Copas 1983)
