Final Review
Yang Feng (Columbia University)
http://www.stat.columbia.edu/~yangfeng

Outline
1 Multiple Linear Regression (Estimation, Inference)
2 Special Topics for Multiple Regression
  Extra Sums of Squares
  Standardized Version of the Multiple Regression Model
3 Polynomial and Interaction Regression Models
4 Model Selection
5 Remedial Measures for Multiple Linear Regression Models


General Regression Model in Matrix Terms

$$\mathbf{Y} = \begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix}, \qquad
\mathbf{X} = \begin{bmatrix} 1 & X_{11} & X_{12} & \cdots & X_{1,p-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{n,p-1} \end{bmatrix},$$

$$\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \vdots \\ \beta_{p-1} \end{bmatrix}, \qquad
\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$

General Linear Regression in Matrix Terms

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

with $E(\boldsymbol{\varepsilon}) = \mathbf{0}$ and

$$\sigma^2\{\boldsymbol{\varepsilon}\} = \begin{bmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{bmatrix}$$

We have $E(\mathbf{Y}) = \mathbf{X}\boldsymbol{\beta}$ and $\sigma^2\{\mathbf{Y}\} = \sigma^2\mathbf{I}$.

Least Squares Solution

The matrix normal equations can be derived directly from the minimization of
$$Q(\boldsymbol{\beta}) = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$$
with respect to $\boldsymbol{\beta}$:
$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}, \qquad \hat{\mathbf{Y}} = \mathbf{X}\mathbf{b}$$
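Not part of the original slides: a minimal numpy sketch of the normal-equations solution. The sample size, design matrix, and true coefficients are synthetic, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3                          # n cases, p parameters (incl. intercept)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Normal equations: b = (X'X)^{-1} X'Y; solve() is numerically safer than inv()
b = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ b                         # fitted values
print(b)                              # close to beta_true
```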

Hat Matrix: Puts the Hat on Y

We can also express the fitted values directly in terms of the X and Y matrices,
$$\hat{\mathbf{Y}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$
and we can further define H, the hat matrix:
$$\hat{\mathbf{Y}} = \mathbf{H}\mathbf{Y}, \qquad \mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$$
The hat matrix plays an important role in diagnostics for regression analysis.

Hat Matrix Properties

1. The hat matrix is symmetric.
2. The hat matrix is idempotent, i.e. $\mathbf{H}\mathbf{H} = \mathbf{H}$.

Important idempotent matrix property: for a symmetric and idempotent matrix A, rank(A) = trace(A), the number of non-zero eigenvalues of A.
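An editor's numerical check (not from the slides) of the three hat-matrix properties above; the design matrix is arbitrary synthetic data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix H = X (X'X)^{-1} X'
print(np.allclose(H, H.T))              # True: symmetric
print(np.allclose(H @ H, H))            # True: idempotent, HH = H
print(np.trace(H))                      # = rank(H) = p (here 3.0)
```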

Residuals

The residuals, like the fitted values $\hat{\mathbf{Y}}$, can be expressed as linear combinations of the response observations $Y_i$:
$$\mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}} = \mathbf{Y} - \mathbf{H}\mathbf{Y} = (\mathbf{I} - \mathbf{H})\mathbf{Y}$$
Also remember $\mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}} = \mathbf{Y} - \mathbf{X}\mathbf{b}$; these are equivalent.

Covariance of Residuals

Starting with $\mathbf{e} = (\mathbf{I} - \mathbf{H})\mathbf{Y}$, we see that
$$\sigma^2\{\mathbf{e}\} = (\mathbf{I} - \mathbf{H})\,\sigma^2\{\mathbf{Y}\}\,(\mathbf{I} - \mathbf{H})'$$
but
$$\sigma^2\{\mathbf{Y}\} = \sigma^2\{\boldsymbol{\varepsilon}\} = \sigma^2\mathbf{I}$$
which means that
$$\sigma^2\{\mathbf{e}\} = \sigma^2(\mathbf{I} - \mathbf{H})\mathbf{I}(\mathbf{I} - \mathbf{H})' = \sigma^2(\mathbf{I} - \mathbf{H})(\mathbf{I} - \mathbf{H})$$
and since $\mathbf{I} - \mathbf{H}$ is symmetric and idempotent, we have
$$\sigma^2\{\mathbf{e}\} = \sigma^2(\mathbf{I} - \mathbf{H})$$

Quadratic Forms

In general, a quadratic form is defined by
$$\mathbf{Y}'\mathbf{A}\mathbf{Y} = \sum_i \sum_j a_{ij} Y_i Y_j, \qquad a_{ij} = a_{ji}$$
with A the matrix of the quadratic form. The ANOVA sums SSTO, SSE and SSR can all be arranged into quadratic forms:
$$SSTO = \mathbf{Y}'\Big(\mathbf{I} - \tfrac{1}{n}\mathbf{J}\Big)\mathbf{Y}, \qquad
SSE = \mathbf{Y}'(\mathbf{I} - \mathbf{H})\mathbf{Y}, \qquad
SSR = \mathbf{Y}'\Big(\mathbf{H} - \tfrac{1}{n}\mathbf{J}\Big)\mathbf{Y}$$
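A small sketch (added for this review, with synthetic data) that evaluates the three quadratic forms directly and verifies the ANOVA identity SSTO = SSE + SSR.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)
J = np.ones((n, n))                   # matrix of all ones
I = np.eye(n)

SSTO = Y @ (I - J / n) @ Y
SSE = Y @ (I - H) @ Y
SSR = Y @ (H - J / n) @ Y
print(np.isclose(SSTO, SSE + SSR))    # True: the decomposition holds
```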

Inference

Since $\sigma^2\{\mathbf{Y}\} = \sigma^2\mathbf{I}$, we can write
$$\sigma^2\{\mathbf{b}\} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,\sigma^2\mathbf{I}\,\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$$
And
$$E(\mathbf{b}) = E\big((\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}\big) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'E(\mathbf{Y}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \boldsymbol{\beta}$$

Inference

The estimated variance-covariance matrix is
$$s^2\{\mathbf{b}\} = MSE\,(\mathbf{X}'\mathbf{X})^{-1}$$
Then we have
$$\frac{b_k - \beta_k}{s\{b_k\}} \sim t(n - p), \qquad k = 0, 1, \ldots, p-1$$
$1-\alpha$ confidence intervals:
$$b_k \pm t(1 - \alpha/2;\, n - p)\, s\{b_k\}$$
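An illustrative sketch (not from the slides) of these confidence intervals with numpy and scipy; the data are synthetic and the 95% level is an arbitrary choice.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
e = Y - X @ b
MSE = e @ e / (n - p)
se_b = np.sqrt(MSE * np.diag(XtX_inv))          # s{b_k}
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - p)    # t(1 - alpha/2; n - p)
for bk, se in zip(b, se_b):
    print(f"{bk:7.3f}  [{bk - t_crit * se:7.3f}, {bk + t_crit * se:7.3f}]")
```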

t Test

Tests for $\beta_k$:
$$H_0: \beta_k = 0 \quad \text{vs.} \quad H_a: \beta_k \neq 0$$
Test statistic:
$$t^* = \frac{b_k}{s\{b_k\}}$$
Decision rule: if $|t^*| \le t(1 - \alpha/2;\, n - p)$, conclude $H_0$; otherwise, conclude $H_a$.

F Test for Regression

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0$$
$$H_a: \text{not all } \beta_k\ (k = 1, \ldots, p-1) \text{ equal zero}$$
Test statistic:
$$F^* = \frac{MSR}{MSE}$$
Decision rule: if $F^* \le F(1 - \alpha;\, p-1,\, n-p)$, conclude $H_0$; if $F^* > F(1 - \alpha;\, p-1,\, n-p)$, conclude $H_a$.
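A companion sketch (synthetic data, added for this review) computing the overall F statistic and its p-value via the F distribution's survival function.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

b = np.linalg.solve(X.T @ X, X.T @ Y)
SSE = np.sum((Y - X @ b) ** 2)
SSTO = np.sum((Y - Y.mean()) ** 2)
SSR = SSTO - SSE

F = (SSR / (p - 1)) / (SSE / (n - p))   # F* = MSR / MSE
p_value = stats.f.sf(F, p - 1, n - p)   # P(F(p-1, n-p) > F*)
print(F, p_value)
```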

R² and Adjusted R²

The coefficient of multiple determination R² is defined as
$$R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}, \qquad 0 \le R^2 \le 1$$
R² never decreases when more variables are added. Therefore, the adjusted R²:
$$R_a^2 = 1 - \frac{SSE/(n-p)}{SSTO/(n-1)} = 1 - \frac{n-1}{n-p}\cdot\frac{SSE}{SSTO}$$
$R_a^2$ may decrease when p is large.
Coefficient of multiple correlation: $R = \sqrt{R^2}$, always the positive square root!
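For concreteness, a short sketch (not from the slides; synthetic data) computing both quantities from SSE and SSTO.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

b = np.linalg.solve(X.T @ X, X.T @ Y)
SSE = np.sum((Y - X @ b) ** 2)
SSTO = np.sum((Y - Y.mean()) ** 2)

R2 = 1 - SSE / SSTO
R2_adj = 1 - (n - 1) / (n - p) * SSE / SSTO   # penalizes extra parameters
print(R2, R2_adj)
```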


Extra Sums of Squares

Definition: the marginal decrease in SSE when one or several predictor variables are added to the regression model, given that the other variables are already in the model. Examples:
$$SSR(X_1 \mid X_2) = SSE(X_2) - SSE(X_1, X_2) = SSR(X_1, X_2) - SSR(X_2)$$
$$SSR(X_3 \mid X_1, X_2) = SSE(X_1, X_2) - SSE(X_1, X_2, X_3) = SSR(X_1, X_2, X_3) - SSR(X_1, X_2)$$
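A minimal sketch (added here, with a made-up two-predictor data set) computing SSR(X1 | X2) as the difference of two SSE values, exactly as in the definition above.

```python
import numpy as np

def sse(X, Y):
    """SSE from an OLS fit of Y on X (X includes the intercept column)."""
    b = np.linalg.lstsq(X, Y, rcond=None)[0]
    e = Y - X @ b
    return e @ e

rng = np.random.default_rng(0)
n = 80
X1, X2 = rng.normal(size=n), rng.normal(size=n)
Y = 1 + 2 * X1 + 0.5 * X2 + rng.normal(scale=0.4, size=n)
ones = np.ones(n)

sse_2 = sse(np.column_stack([ones, X2]), Y)        # SSE(X2)
sse_12 = sse(np.column_stack([ones, X1, X2]), Y)   # SSE(X1, X2)
print("SSR(X1|X2) =", sse_2 - sse_12)              # extra sum of squares
```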

ANOVA Table

Various software packages can provide extra sums of squares for regression analysis. These are usually given in the order in which the input variables are provided to the system.
[Figure: ANOVA table with sequential extra sums of squares]

Summary of Tests Concerning Regression Coefficients

Test whether all $\beta_k = 0$.
Test whether a single $\beta_k = 0$.
Test whether some $\beta_k = 0$.
Tests involving relationships among coefficients, for example:
$$H_0: \beta_1 = \beta_2 \quad \text{vs.} \quad H_a: \beta_1 \neq \beta_2$$
$$H_0: \beta_1 = 3,\ \beta_2 = 5 \quad \text{vs.} \quad H_a: \text{otherwise}$$
Key point in all tests: form the full model and the reduced model.

Coefficients of Partial Determination

Recall the coefficient of determination: R² measures the proportionate reduction in the variation of Y achieved by introducing the entire set of X variables.
Partial determination: measures the marginal contribution of one X variable when all the others are already in the model.

Two Predictor Variables

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$$
The coefficient of partial determination between Y and X₁, given X₂ in the model, is denoted $R^2_{Y1|2}$:
$$R^2_{Y1|2} = \frac{SSE(X_2) - SSE(X_1, X_2)}{SSE(X_2)} = \frac{SSR(X_1 \mid X_2)}{SSE(X_2)}$$
Likewise:
$$R^2_{Y2|1} = \frac{SSE(X_1) - SSE(X_1, X_2)}{SSE(X_1)} = \frac{SSR(X_2 \mid X_1)}{SSE(X_1)}$$

General Case

$$R^2_{Y1|23} = \frac{SSR(X_1 \mid X_2, X_3)}{SSE(X_2, X_3)}, \qquad
R^2_{Y4|123} = \frac{SSR(X_4 \mid X_1, X_2, X_3)}{SSE(X_1, X_2, X_3)}$$

Coefficients of Partial Correlation

A coefficient of partial correlation is the square root of the corresponding coefficient of partial determination, taking the same sign as the regression coefficient!


Standardized Multiple Regression

Transformed variables:
$$Y_i^* = \frac{1}{\sqrt{n-1}}\left(\frac{Y_i - \bar{Y}}{s_y}\right), \qquad
X_{ik}^* = \frac{1}{\sqrt{n-1}}\left(\frac{X_{ik} - \bar{X}_k}{s_k}\right), \quad k = 1, \ldots, p-1$$

Standardized Regression Model

The regression model using the transformed variables:
$$Y_i^* = \beta_1^* X_{i1}^* + \cdots + \beta_{p-1}^* X_{i,p-1}^* + \varepsilon_i^*$$
Notice that there is no need for an intercept; the model reduces to a standard linear regression problem.

Standardized Regression Model

The solution $\mathbf{b}^* = (b_1^*, b_2^*, \ldots, b_{p-1}^*)'$ can be related to the solution of the untransformed regression problem through
$$b_k = \left(\frac{s_y}{s_k}\right) b_k^*, \quad k = 1, \ldots, p-1$$
$$b_0 = \bar{Y} - b_1\bar{X}_1 - \cdots - b_{p-1}\bar{X}_{p-1}$$
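A round-trip sketch (not part of the slides; synthetic data) applying the correlation transformation, fitting without an intercept, and recovering the original-scale coefficients via the relations above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
X = rng.normal(size=(n, 2))
Y = 1 + 2 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n)

# Correlation transformation
sy, sk = Y.std(ddof=1), X.std(axis=0, ddof=1)
Ys = (Y - Y.mean()) / (np.sqrt(n - 1) * sy)
Xs = (X - X.mean(axis=0)) / (np.sqrt(n - 1) * sk)

# No-intercept fit on the transformed variables
b_star = np.linalg.lstsq(Xs, Ys, rcond=None)[0]

# Back-transform: b_k = (s_y / s_k) b*_k, and b_0 from the means
b = (sy / sk) * b_star
b0 = Y.mean() - b @ X.mean(axis=0)
print(b0, b)    # approximately 1, [2, -0.5]
```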

Multicollinearity

Usually we still have a good fit to the data, and prediction can remain good. However:
The estimated regression coefficients tend to have large sampling variability when the predictor variables are highly correlated.
Some regression coefficients may not be statistically significant even though a definite statistical relation exists.
The common interpretation of a regression coefficient is NOT fully applicable any more.
Example: regress Y on both X₁ and X₂. It is possible that when individual t-tests are performed, neither β₁ nor β₂ is significant, yet the F-test for β₁ and β₂ jointly is still significant.


One Predictor Variable: Second Order

$$Y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \varepsilon_i, \qquad x_i = X_i - \bar{X}$$
X is centered because of the possible high correlation between X and X².
Regression function: $E\{Y\} = \beta_0 + \beta_1 x + \beta_{11} x^2$, a quadratic response function.
β₀ is the mean response when x = 0, i.e., X = X̄. β₁ is called the linear effect; β₁₁ is called the quadratic effect.

One Predictor Variable: Third Order

$$Y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \beta_{111} x_i^3 + \varepsilon_i, \qquad x_i = X_i - \bar{X}$$

One Predictor Variable: Higher Orders

Employed with special caution: such models tend to overfit and predict poorly.

Two Predictors: Second Order

$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{11} x_{i1}^2 + \beta_{22} x_{i2}^2 + \beta_{12} x_{i1} x_{i2} + \varepsilon_i$$
where $x_{i1} = X_{i1} - \bar{X}_1$ and $x_{i2} = X_{i2} - \bar{X}_2$. The coefficient β₁₂ is called the interaction effect coefficient; more on interaction later. The three-predictor second-order model is similar.

Implementation of Polynomial Regression Models

Fitting: easy, since a polynomial model can be seen as a multiple regression model, so ordinary least squares applies.
Determining the order: a very important step! For
$$Y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \beta_{111} x_i^3 + \varepsilon_i$$
we naturally want to test whether or not β₁₁₁ = 0, or whether or not both β₁₁ = 0 and β₁₁₁ = 0. How to do the test?

Extra Sum of Squares

Decompose SSR into SSR(x), SSR(x² | x) and SSR(x³ | x, x²).
Test whether β₁₁₁ = 0: use SSR(x³ | x, x²).
Test whether both β₁₁ = 0 and β₁₁₁ = 0: use SSR(x², x³ | x).

Interpretation of Regression Models with Interactions

$$E\{Y\} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2$$
The change in mean response with a unit increase in X₁ when X₂ is held constant is β₁ + β₃X₂. Similarly, for a unit increase in X₂ with X₁ held constant: β₂ + β₃X₁.

Implementation of Interaction Regression Models

Center the predictor variables to avoid high multicollinearity: $x_{ik} = X_{ik} - \bar{X}_k$.
Use prior knowledge to reduce the number of interactions: with 8 predictors there are already 28 pairwise interaction terms; for p predictors the number is p(p−1)/2.

Qualitative Predictors

Examples: gender (male or female), purchase status (yes or no), disability status (not disabled, partly disabled, fully disabled).
A qualitative variable with c classes is represented by c − 1 indicator variables, each taking the values 0 and 1.


Six Criteria

$R_p^2$, $R_{a,p}^2$, $C_p$, $AIC_p$, $BIC_p$ ($SBC_p$), $PRESS_p$.
Denote the total number of X variables by P − 1, so there are P parameters in total; here $1 \le p \le P$.
Coefficient of multiple determination:
$$R_p^2 = 1 - \frac{SSE_p}{SSTO}$$
Adjusted coefficient of multiple determination:
$$R_{a,p}^2 = 1 - \frac{n-1}{n-p}\cdot\frac{SSE_p}{SSTO} = 1 - \frac{MSE_p}{SSTO/(n-1)}$$

Mallows' $C_p$ Criterion

Concerned with the total mean squared error of the fitted values. Writing
$$(\hat{Y}_i - \mu_i)^2 = \big[(E\{\hat{Y}_i\} - \mu_i) + (\hat{Y}_i - E\{\hat{Y}_i\})\big]^2$$
we have
$$E(\hat{Y}_i - \mu_i)^2 = (E\{\hat{Y}_i\} - \mu_i)^2 + \sigma^2\{\hat{Y}_i\}$$
Criterion measure:
$$\Gamma_p = \frac{1}{\sigma^2}\left[\sum_{i=1}^n \big(E\{\hat{Y}_i\} - \mu_i\big)^2 + \sum_{i=1}^n \sigma^2\{\hat{Y}_i\}\right]$$
A good estimator of $\Gamma_p$ is
$$C_p = \frac{SSE_p}{MSE(X_1, \ldots, X_{P-1})} - (n - 2p)$$
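A hedged sketch (synthetic data, added for this review) of the $C_p$ estimator: the full model supplies the MSE in the denominator, and a submodel containing all the truly active predictors should give $C_p$ near p.

```python
import numpy as np

def sse(X, Y):
    b = np.linalg.lstsq(X, Y, rcond=None)[0]
    return np.sum((Y - X @ b) ** 2)

rng = np.random.default_rng(0)
n = 80
Xfull = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])  # P - 1 = 4 predictors
Y = Xfull @ np.array([1, 2, -1, 0, 0]) + rng.normal(scale=0.5, size=n)

P = Xfull.shape[1]
mse_full = sse(Xfull, Y) / (n - P)     # MSE(X1, ..., X_{P-1})

cols = [0, 1, 2]                       # intercept + X1 + X2, so p = 3
p = len(cols)
Cp = sse(Xfull[:, cols], Y) / mse_full - (n - 2 * p)
print("C_p =", Cp)                     # close to p when the submodel is unbiased
```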

AIC and BIC

The Akaike information criterion (AIC) and the Bayesian information criterion (BIC, also called the Schwarz criterion, SBC in the book) are two criteria that penalize model complexity. In the linear regression setting,
$$AIC_p = n\log SSE_p - n\log n + 2p$$
$$BIC_p = n\log SSE_p - n\log n + (\log n)\,p$$
Roughly, you can think of these two criteria as penalizing models with many parameters (p, in the case of linear regression).
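A direct transcription of the two formulas into a helper (an editor's sketch; the two compared submodels are synthetic).

```python
import numpy as np

def aic_bic(X, Y):
    """AIC_p and BIC_p for an OLS fit; X includes the intercept column."""
    n, p = X.shape
    b = np.linalg.lstsq(X, Y, rcond=None)[0]
    sse = np.sum((Y - X @ b) ** 2)
    aic = n * np.log(sse) - n * np.log(n) + 2 * p
    bic = n * np.log(sse) - n * np.log(n) + np.log(n) * p
    return aic, bic

rng = np.random.default_rng(0)
n = 80
X1, X2 = rng.normal(size=n), rng.normal(size=n)
Y = 1 + 2 * X1 + rng.normal(scale=0.4, size=n)   # X2 is irrelevant
ones = np.ones(n)

print(aic_bic(np.column_stack([ones, X1]), Y))       # smaller is better
print(aic_bic(np.column_stack([ones, X1, X2]), Y))   # extra parameter penalized
```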

$PRESS_p$, or Leave-One-Out Cross-Validation

The $PRESS_p$ (prediction sum of squares) criterion measures how well a subset model can predict the observed responses $Y_i$. Let $\hat{Y}_{i(i)}$ be the fitted value for case i from a model fit with case i left out. The $PRESS_p$ criterion is then given by summing over all n cases:
$$PRESS_p = \sum_{i=1}^n \big(Y_i - \hat{Y}_{i(i)}\big)^2$$
$PRESS_p$ values can be calculated without doing n separate regression runs.

$PRESS_p$, or Leave-One-Out Cross-Validation

If we let $d_i$ be the deleted residual for the i-th case, then
$$d_i = Y_i - \hat{Y}_{i(i)} = \frac{e_i}{1 - h_{ii}}$$
where $e_i$ is the ordinary residual for the i-th case and $h_{ii}$ is the i-th diagonal element of the hat matrix, obtained directly from
$$h_{ii} = \mathbf{X}_i'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_i$$
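A sketch (not from the slides; synthetic data) showing the one-fit shortcut and confirming it against n brute-force leave-one-out fits.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.4, size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)
e = Y - H @ Y
press = np.sum((e / (1 - np.diag(H))) ** 2)   # PRESS without refitting
print(press)

# Brute-force check: n separate leave-one-out fits give the same value
loo = 0.0
for i in range(n):
    mask = np.arange(n) != i
    bi = np.linalg.lstsq(X[mask], Y[mask], rcond=None)[0]
    loo += (Y[i] - X[i] @ bi) ** 2
print(np.isclose(press, loo))                 # True
```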

Stepwise Regression Methods

An automatic search procedure that identifies a single "best" model; it comes in several different formats.

Forward Stepwise Regression

A (greedy) procedure for identifying variables to include in the regression model. Repeat until finished:
1 Fit a simple linear regression model for each of the P − 1 X variables considered for inclusion. For each, compute the t* statistic for testing whether or not the slope is zero: $t_k^* = b_k / s\{b_k\}$. (Remember $b_k$ is the estimate of $\beta_k$ and $s\{b_k\}$ is its estimated standard deviation.)
2 Pick the largest of the P − 1 $t_k^*$'s and include the corresponding X variable in the regression model if $t_k^*$ exceeds some significance level.
3 If the number of X variables included in the regression model is greater than one, check whether the model would be improved by dropping variables (using the t-test and a threshold again).

Forward Stepwise Regression (cont.)

Other criteria can be used to decide which variables to add or delete, such as the F-test (full model vs. reduced model), AIC (the default option in R), BIC, or $C_p$.
Usually much more efficient than best subset regression.

Forward Regression

A simplified version of forward stepwise regression with no deletion step: once a variable is in, it stays in from then on.
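A minimal sketch of forward regression (no deletion step), added for this review, using AIC as the add criterion rather than the t-test; the data and the set of candidate predictors are synthetic.

```python
import numpy as np

def aic(X, Y):
    """AIC_p = n log(SSE_p) - n log(n) + 2p for an OLS fit."""
    n, p = X.shape
    b = np.linalg.lstsq(X, Y, rcond=None)[0]
    sse = np.sum((Y - X @ b) ** 2)
    return n * np.log(sse) - n * np.log(n) + 2 * p

rng = np.random.default_rng(0)
n, k = 80, 5
Xs = rng.normal(size=(n, k))
Y = 1 + 2 * Xs[:, 0] - 1.5 * Xs[:, 2] + rng.normal(scale=0.5, size=n)

ones = np.ones((n, 1))
selected, remaining = [], list(range(k))
best = aic(ones, Y)                   # start from the intercept-only model
improved = True
while improved and remaining:
    improved = False
    # AIC of each candidate model that adds one more variable
    scores = {j: aic(np.column_stack([ones, Xs[:, selected + [j]]]), Y)
              for j in remaining}
    j_star = min(scores, key=scores.get)
    if scores[j_star] < best:         # add only if AIC improves; never delete
        best = scores[j_star]
        selected.append(j_star)
        remaining.remove(j_star)
        improved = True
print("selected columns:", selected)  # expect [0, 2]
```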

Backward Elimination

Start from the full model with P − 1 variables and iteratively check whether any variable should be deleted from the model, by some given criterion. This time, there is no addition step!


Unequal Error Variance

$$Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$$
Here the $\varepsilon_i$ are independent $N(0, \sigma_i^2)$. (Originally: the $\varepsilon_i$ are independent $N(0, \sigma^2)$.) In matrix form:
$$\sigma^2\{\boldsymbol{\varepsilon}\} = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{bmatrix}$$

Known Error Variance

Define weights $w_i = 1/\sigma_i^2$ and denote
$$\mathbf{W} = \begin{bmatrix} w_1 & 0 & \cdots & 0 \\ 0 & w_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & w_n \end{bmatrix}$$
The weighted least squares (and maximum likelihood) estimator is
$$\mathbf{b}_w = (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{Y}$$
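A weighted least squares sketch (added here; the error standard deviations are synthetic and taken as known, matching this slide's assumption).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
X = np.column_stack([np.ones(n), rng.uniform(1, 5, size=n)])
sigma_i = 0.2 * X[:, 1]                 # known, unequal error SDs
Y = X @ np.array([1.0, 2.0]) + rng.normal(scale=sigma_i)

W = np.diag(1 / sigma_i**2)             # weights w_i = 1 / sigma_i^2
b_w = np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)
print(b_w)                              # close to [1, 2]
```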

Error Variance Known up to a Proportionality Constant

Using $w_i = k \cdot \tfrac{1}{\sigma_i^2}$ for any constant k > 0 yields the same estimator.

Unknown Error Variances

In reality, one rarely knows the variances $\sigma_i^2$. Two approaches:
Estimation of the variance function or standard deviation function.
Use of replicates or near replicates.

Estimation of the Variance Function or Standard Deviation Function

Four steps (which can be iterated several times to reach convergence):
1 Fit the regression model by unweighted least squares and analyze the residuals.
2 Estimate the variance function or the standard deviation function by regressing either the squared residuals or the absolute residuals on the appropriate predictor(s). (We know that the variance of $\varepsilon_i$ is $\sigma_i^2 = E(\varepsilon_i^2) - (E(\varepsilon_i))^2 = E(\varepsilon_i^2)$, hence the squared residual $e_i^2$ is an estimator of $\sigma_i^2$.)
3 Use the fitted values from the estimated variance or standard deviation function to obtain the weights $w_i$.
4 Estimate the regression coefficients using these weights.
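A sketch of these four steps (not from the slides): for the toy data below the variance is assumed roughly linear in the predictor, so step 2 regresses $e_i^2$ on X; that linear variance form is an assumption made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(1, 5, size=n)])
Y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.2 * X[:, 1])  # SD grows with X

b = np.linalg.lstsq(X, Y, rcond=None)[0]           # step 1: unweighted OLS
for _ in range(3):                                 # iterate steps 2-4
    e2 = (Y - X @ b) ** 2
    g = np.linalg.lstsq(X, e2, rcond=None)[0]      # step 2: regress e_i^2 on X
    var_hat = np.clip(X @ g, 1e-8, None)           # fitted variance function
    W = np.diag(1 / var_hat)                       # step 3: weights w_i
    b = np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)  # step 4: WLS estimate
print(b)
```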

Ridge Estimators (Multicollinearity)

OLS: $(\mathbf{X}'\mathbf{X})\mathbf{b} = \mathbf{X}'\mathbf{Y}$. Transformed by the correlation transformation: $\mathbf{r}_{XX}\mathbf{b} = \mathbf{r}_{YX}$.
Ridge estimator, for a constant $c \ge 0$:
$$(\mathbf{r}_{XX} + c\mathbf{I})\,\mathbf{b}^R = \mathbf{r}_{YX}$$
With c = 0 this is OLS; for c > 0 the estimator is biased, but much more stable.
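To close, a ridge sketch (added for this review) on deliberately near-collinear synthetic data, using the correlation transformation from the standardized-regression slides; the grid of c values is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
X1 = rng.normal(size=n)
X2 = X1 + rng.normal(scale=0.05, size=n)   # nearly collinear with X1
X = np.column_stack([X1, X2])
Y = 1 + 2 * X1 + 2 * X2 + rng.normal(scale=0.3, size=n)

# Correlation transformation (see the standardized-regression slides)
Xs = (X - X.mean(axis=0)) / (np.sqrt(n - 1) * X.std(axis=0, ddof=1))
Ys = (Y - Y.mean()) / (np.sqrt(n - 1) * Y.std(ddof=1))

r_XX = Xs.T @ Xs                            # correlation matrix of the X's
r_YX = Xs.T @ Ys

for c in [0.0, 0.01, 0.1]:                  # c = 0 is OLS; c > 0 stabilizes
    b_R = np.linalg.solve(r_XX + c * np.eye(2), r_YX)
    print(c, b_R)
```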