Bayes Estimators & Ridge Regression
Readings: Christensen, Chapter 14
Merlise Clyde
September 29, 2015
How Good are Estimators?

Quadratic loss for estimating β using estimator a:

$L(\beta, a) = (\beta - a)^T (\beta - a)$

Consider our expected loss (before we see the data) of taking an action a.

Under OLS or the reference prior, the expected mean squared error is

$E_Y[(\beta - \hat\beta)^T (\beta - \hat\beta)] = \sigma^2 \operatorname{tr}[(X^T X)^{-1}] = \sigma^2 \sum_{j=1}^p \lambda_j^{-1}$

where the $\lambda_j$ are the eigenvalues of $X^T X$. If the smallest $\lambda_j \to 0$, then the MSE $\to \infty$.
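To see the blow-up numerically, here is a minimal R sketch with simulated, nearly collinear predictors (all names and values here are made up for illustration):

set.seed(42)
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)         # nearly collinear with x1
Xbad <- cbind(x1, x2)
lambda <- eigen(crossprod(Xbad))$values # eigenvalues of X^T X
sigma2 <- 1
sigma2 * sum(1 / lambda)                # expected MSE of OLS: huge,
                                        # driven by the tiny eigenvalue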
Is the g-prior any better?

Under the g-prior the posterior mean is $\frac{g}{1+g}\hat\beta$, with expected loss

$E\!\left[L\!\left(\beta, \tfrac{g}{1+g}\hat\beta\right)\right] = E_Y\!\left[\left(\beta - \tfrac{g}{1+g}\hat\beta\right)^T\left(\beta - \tfrac{g}{1+g}\hat\beta\right)\right] = \left(\frac{g}{1+g}\right)^2 \sigma^2 \operatorname{tr}[(X^T X)^{-1}] + \frac{\beta^T\beta}{(1+g)^2} = \frac{1}{(1+g)^2}\left(\sigma^2 g^2 \sum_j \lambda_j^{-1} + \beta^T\beta\right)$

Aside: the g-prior is better than the reference prior if $g > \frac{1}{2}\left(\frac{\beta^T\beta}{\sigma^2 \sum_j \lambda_j^{-1}} - 1\right)$.

But the risk still goes to infinity as $\lambda_j \to 0$.
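The crossover condition follows from comparing the two risks directly (a derivation sketch; the slide's exact expression is garbled in the transcription):

$\frac{1}{(1+g)^2}\left(\sigma^2 g^2 \sum_j \lambda_j^{-1} + \beta^T\beta\right) < \sigma^2 \sum_j \lambda_j^{-1} \;\Longleftrightarrow\; \beta^T\beta < (1 + 2g)\,\sigma^2 \sum_j \lambda_j^{-1} \;\Longleftrightarrow\; g > \frac{1}{2}\left(\frac{\beta^T\beta}{\sigma^2 \sum_j \lambda_j^{-1}} - 1\right)$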
Canonical Orthogonal Representation & Ridge Regression

Assume that X has been centered and standardized so that $X^T X = \operatorname{corr}(X)$ (use the scale function in R).

Write $X = U_p L V^T$ (singular value decomposition), where $U_p^T U_p = I_p$, $V$ is a $p \times p$ orthogonal matrix, and $L$ is diagonal with singular values $l_1, \ldots, l_p$.

$Y = \mathbf{1}\alpha + U_p L V^T \beta + \epsilon$

Let $U = [\mathbf{1}/\sqrt{n} \;\; U_p \;\; U_{n-p-1}]$, an $n \times n$ orthogonal matrix, and rotate by $U^T$:

$U^T Y = U^T \mathbf{1}\alpha + U^T U_p L V^T \beta + U^T \epsilon$

$\tilde Y \equiv U^T Y = \begin{pmatrix} \sqrt{n} & 0_p^T \\ 0_p & L \\ 0_{n-p-1} & 0_{(n-p-1) \times p} \end{pmatrix} \begin{pmatrix} \alpha \\ \gamma \end{pmatrix} + \tilde\epsilon$

where $\gamma = V^T \beta$.
Orthogonal Regression

In the rotated model $\tilde Y = U^T Y$ the estimates are:

$\hat\alpha = \bar y$

$\hat\gamma = (L^T L)^{-1} L^T U_p^T Y$, or componentwise $\hat\gamma_i = \tilde y_i / l_i$ for $i = 1, \ldots, p$, where $\tilde y_i$ is the $i$th element of $U_p^T Y$

$\operatorname{Var}(\hat\gamma_i) = \sigma^2 / l_i^2$

Directions $U_j$ in X-space with small singular values $l_j$ have the largest variances: these are the unstable directions.
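A minimal R sketch of the canonical coordinates (simulated data; the names are illustrative):

set.seed(1)
n <- 100; p <- 3
X <- scale(matrix(rnorm(n * p), n, p))   # centered and standardized
Y <- 2 + X %*% c(1, 0.5, 0) + rnorm(n)
s <- svd(X)                              # X = U_p L V^T
Up <- s$u; L <- s$d; V <- s$v
alpha.hat <- mean(Y)
gamma.hat <- crossprod(Up, Y) / L        # gamma-hat_i = u_i^T Y / l_i
beta.hat  <- V %*% gamma.hat             # beta-hat = V gamma-hat
cbind(beta.hat, coef(lm(Y ~ X))[-1])     # agrees with OLS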
Ridge Regression & Independent Prior

(Another) normal conjugate prior distribution on γ:

$\gamma \mid \phi \sim N\!\left(0_p, \tfrac{1}{\phi k} I_p\right)$

Posterior mean:

$\tilde\gamma = (L^T L + kI)^{-1} L^T U_p^T Y = (L^T L + kI)^{-1} L^T L \hat\gamma$

$\tilde\gamma_i = \frac{l_i^2}{l_i^2 + k}\,\hat\gamma_i = \frac{\lambda_i}{\lambda_i + k}\,\hat\gamma_i$

When $\lambda_i \to 0$, $\tilde\gamma_i \to 0$. When $k \to 0$ we get OLS back, but if k gets too big the posterior mean goes to zero.
Transform

Transform back: $\tilde\beta = V\tilde\gamma$, i.e.

$\tilde\beta = (X^T X + kI)^{-1} X^T X \hat\beta$

Note the importance of standardizing.

Is there a value of k for which ridge is better in terms of expected MSE than OLS?

Choice of k? Hoerl & Kennard (1970).
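A sketch checking that the canonical and direct formulas agree (continuing the simulated X, Y, and SVD objects from the sketch above; k is arbitrary):

k <- 0.1
gamma.ridge <- (L^2 / (L^2 + k)) * gamma.hat   # shrink each component
beta.ridge  <- V %*% gamma.ridge               # transform back
beta.direct <- solve(crossprod(X) + k * diag(p), crossprod(X) %*% beta.hat)
cbind(beta.ridge, beta.direct)                 # identical columns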
MSE

Can show (HW) that $E[(\beta - \tilde\beta)^T(\beta - \tilde\beta)] = E[(\gamma - \tilde\gamma)^T(\gamma - \tilde\gamma)]$

$\operatorname{Var}(\tilde\gamma_i) = \sigma^2 l_i^2 / (l_i^2 + k)^2$

The bias of $\tilde\gamma_i$ is $-k\gamma_i/(l_i^2 + k)$

$\text{MSE}(k) = \sigma^2 \sum_i \frac{l_i^2}{(l_i^2 + k)^2} + k^2 \sum_i \frac{\gamma_i^2}{(l_i^2 + k)^2}$

The derivative with respect to k is negative at k = 0, hence the function is decreasing there. Since k = 0 is OLS, this means that there is a value of k > 0 that will always be better than OLS.
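A sketch of this curve with hypothetical values (the numbers are made up; the point is the dip below the OLS risk at k = 0):

l2     <- c(25, 4, 0.01)      # squared singular values, one near zero
gamma  <- c(1, 1, 1)
sigma2 <- 1
mse <- function(k)
  sigma2 * sum(l2 / (l2 + k)^2) + k^2 * sum(gamma^2 / (l2 + k)^2)
mse(0)                        # OLS risk: sigma2 * sum(1 / l2)
ks <- seq(0, 1, length.out = 200)
plot(ks, sapply(ks, mse), type = "l", xlab = "k", ylab = "MSE(k)")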
Alternative Motivation

If $\hat\beta$ is unconstrained, expect high variance with nearly singular X.

Let $Y_c = (I - P_1)Y$ and let $X_c$ be the centered and standardized X matrix.

Control how large the coefficients may grow:

$\min_\beta (Y_c - X_c\beta)^T (Y_c - X_c\beta) \quad \text{subject to } \sum_j \beta_j^2 \le t$

Equivalent quadratic programming problem (penalized likelihood):

$\min_\beta \|Y_c - X_c\beta\|^2 + k\|\beta\|^2$
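A numeric check that the penalized problem has the ridge solution (a sketch reusing the simulated X and Y; optim versus the closed form):

Yc  <- Y - mean(Y)
k   <- 0.1
obj <- function(b) sum((Yc - X %*% b)^2) + k * sum(b^2)
b.optim  <- optim(rep(0, p), obj, method = "BFGS")$par
b.closed <- solve(crossprod(X) + k * diag(p), crossprod(X, Yc))
cbind(b.optim, b.closed)      # agree up to optimizer tolerance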
Longley Data

[Pairs plot of the Longley variables: Employed, GNP.deflator, GNP, Unemployed, Armed.Forces, Population, Year]
OLS

> longley.lm = lm(Employed ~ ., data=longley)
> summary(longley.lm)
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)  -3.482e+03  8.904e+02  -3.911 0.003560 **
GNP.deflator  1.506e-02  8.492e-02   0.177 0.863141
GNP          -3.582e-02  3.349e-02  -1.070 0.312681
Unemployed   -2.020e-02  4.884e-03  -4.136 0.002535 **
Armed.Forces -1.033e-02  2.143e-03  -4.822 0.000944 ***
Population   -5.110e-02  2.261e-01  -0.226 0.826212
Year          1.829e+00  4.555e-01   4.016 0.003037 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3049 on 9 degrees of freedom
Multiple R-squared: 0.9955, Adjusted R-squared: 0.9925
F-statistic: 330.3 on 6 and 9 DF, p-value: 4.984e-10
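The near-singularity that destabilizes these estimates is easy to inspect (standard R calls on the built-in longley data; XL is just a local name for the predictor matrix):

XL <- scale(as.matrix(longley[, -7]))  # predictors only, Employed dropped
round(cor(XL), 2)                      # several correlations near 1
svd(XL)$d                              # smallest singular value near 0
kappa(XL, exact = TRUE)                # large condition number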
Ridge Trace

[Plot of the ridge coefficient paths, t(x$coef) against x$lambda]
Generalized Cross-validation

> select(lm.ridge(Employed ~ ., data=longley, lambda=seq(0, 0.1, 0.0001)))
modified HKB estimator is [value lost in transcription]
modified L-W estimator is [value lost in transcription]
smallest value of GCV at 0.0028
> longley.rreg = lm.ridge(Employed ~ ., data=longley, lambda=0.0028)
> coef(longley.rreg)
[most estimates lost in transcription; Population is of order 1e-01 in magnitude, and Year is 1.557e+00, shrunk from the OLS value 1.829e+00]
Testimators

Goldstein & Smith (1974) have shown that if $0 \le h_i \le 1$ and $\tilde\gamma_i = h_i \hat\gamma_i$, then $\tilde\gamma_i$ has smaller MSE than $\hat\gamma_i$ whenever

$\frac{\gamma_i^2}{\operatorname{Var}(\hat\gamma_i)} < \frac{1 + h_i}{1 - h_i}$

Case: if $\gamma_i^2 < \operatorname{Var}(\hat\gamma_i) = \sigma^2/l_i^2$, then $h_i = 0$ and $\tilde\gamma_i = 0$ is better.

Apply: estimate $\sigma^2$ with SSE/(n - p - 1) and $\gamma_i$ with $\hat\gamma_i$. Set $h_i = 0$ if the t-statistic is less than 1 in absolute value.

"testimator" - see also Sclove (JASA 1968) and Copas (JRSS B 1983)
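A sketch of the testimator rule in the canonical coordinates (continuing the simulated objects above; sigma^2 is estimated by the residual mean square):

fit <- lm(Y ~ X)
s2  <- summary(fit)$sigma^2                # SSE / (n - p - 1)
t.stat <- gamma.hat / sqrt(s2 / L^2)       # t_i = gamma-hat_i / se_i
gamma.test <- ifelse(abs(t.stat) < 1, 0, gamma.hat)  # h_i = 0 if |t_i| < 1
beta.test  <- V %*% gamma.test             # back to the beta scale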
Generalized Ridge

Instead of $\gamma_j \overset{iid}{\sim} N(0, \sigma^2/k)$, take $\gamma_j \overset{ind}{\sim} N(0, \sigma^2/k_j)$.

Then the condition of Goldstein & Smith becomes

$\gamma_i^2 < \sigma^2\left[\frac{2}{k_i} + \frac{1}{l_i^2}\right]$

If $l_i$ is small, almost any $k_i$ will improve over OLS; if $l_i^2$ is large, then only very small values of $k_i$ will give an improvement.

Prior on $k_i$? Induced prior on $\beta$?

$\gamma_j \overset{ind}{\sim} N(0, \sigma^2/k_j) \;\Longrightarrow\; \beta \sim N(0, \sigma^2 V K^{-1} V^T)$

which is not diagonal. Loss of invariance.
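The displayed condition follows from the Goldstein & Smith bound with the generalized-ridge shrinkage factor (a derivation sketch): here $h_i = l_i^2/(l_i^2 + k_i)$, so

$\frac{1 + h_i}{1 - h_i} = \frac{2 l_i^2 + k_i}{k_i}, \qquad \gamma_i^2 < \frac{\sigma^2}{l_i^2} \cdot \frac{2 l_i^2 + k_i}{k_i} = \sigma^2\left[\frac{2}{k_i} + \frac{1}{l_i^2}\right]$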
Related: Regression on PCA

Principal components of X may be obtained via the singular value decomposition $X = U_p L V^T$; the $l_i^2 = \lambda_i$ are the eigenvalues of $X^T X$.

$Y = \mathbf{1}\alpha + U_p L V^T \beta + \epsilon = \mathbf{1}\alpha + F\gamma + \epsilon$

The columns $F_i = l_i U_i$ of $F = U_p L$ are the principal components of the multivariate data $X_1, \ldots, X_p$.

If the direction $F_i$ is ill-defined ($l_i = 0$, or $\lambda_i < \epsilon$), then we may decide to not use $F_i$ in the model. This is equivalent to setting

$\tilde\gamma_i = \hat\gamma_i \text{ if } l_i \ge \epsilon, \qquad \tilde\gamma_i = 0 \text{ if } l_i < \epsilon$
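A principal-components-regression sketch (continuing the simulated objects; the cutoff eps is an arbitrary choice):

eps  <- 0.1
keep <- L >= eps                     # retain well-determined directions
gamma.pcr <- gamma.hat * keep        # zero out components with small l_i
beta.pcr  <- V %*% gamma.pcr
Fmat <- Up %*% diag(L)               # principal components F = U_p L
coef(lm(Y ~ Fmat[, keep]))           # recovers the retained gamma-hat_i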
Summary

OLS can clearly be dominated by other estimators for estimating β.

This leads to Bayes-like estimators:
- choice of penalties or prior hyperparameters
- hierarchical model with prior on k_i
- shrinkage, dimension reduction & variable selection
- what loss function? Estimation versus prediction? Copas (1983)