22s:152 Applied Linear Regression
Generalized Least Squares

Returning to a continuous response variable Y...

Ordinary Least Squares Estimation

- The classical models we have fit so far with a continuous response Y have been fit using ordinary least squares.

- The model
      Y_i = β_0 + β_1 x_1i + ... + β_k x_ki + ε_i,  with ε_i iid N(0, σ²),
  is fit by minimizing the RSS,
      Σ_{i=1}^n (Y_i - Ŷ_i)² = Σ_{i=1}^n (Y_i - (β̂_0 + β̂_1 x_1i + ... + β̂_k x_ki))².

- In matrix notation, we can write this model as
      Y = Xβ + ε  with ε ~ N_n(0, Σ),
  where Xβ is the mean structure, ε is the error structure, and
      Σ = σ² I_{n×n}
  (σ² down the entire diagonal, 0 everywhere off the diagonal).

- The variance of the vector Y is denoted Σ:
      V(Y) = V(Xβ + ε) = V(ε) = Σ.
  This Σ shows the independence of the observations (the off-diagonals are 0) and the constant variance (σ² down the entire diagonal).

- In matrix notation, the Ordinary Least Squares (OLS) estimate of the regression coefficients β is
      β̂ = (X'X)⁻¹ X'Y,
  where (X'X)⁻¹ is the inverse of X'X when X is of full rank. The estimate of σ² is
      σ̂² = RSS / (n - k - 1).

- But what if the observations are NOT independent, or there is NOT constant variance (the assumptions for OLS)? That is, what if V(Y) ≠ σ² I_{n×n}?

- Then the appropriate estimation method for the regression coefficients may be Generalized Least Squares estimation.
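As a quick illustration of the matrix formulas above, the following R sketch computes the OLS estimates by hand and checks them against lm(). The simulated data (x1, x2, y) are hypothetical and serve only to show the algebra.

## Minimal sketch (hypothetical simulated data): OLS via the matrix formula,
## checked against lm().
set.seed(1)
n.obs <- 30
x1 <- runif(n.obs)
x2 <- runif(n.obs)
y  <- 1 + 2*x1 - 3*x2 + rnorm(n.obs, sd = 0.5)

X        <- cbind(1, x1, x2)                   # design matrix with intercept column
beta.hat <- solve(t(X) %*% X) %*% t(X) %*% y   # (X'X)^{-1} X'Y
rss      <- sum((y - X %*% beta.hat)^2)
sig2.hat <- rss / (n.obs - 2 - 1)              # RSS / (n - k - 1), here k = 2

cbind(matrix.formula = as.vector(beta.hat),
      lm.fit         = coef(lm(y ~ x1 + x2)))  # the two columns should agree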

CASE 1: Non-constant variance, but independence holds

- In this situation we have a similar model,
      Y = Xβ + ε  with ε ~ N_n(0, Σ),
  except that Σ = diag(σ_1², σ_2², ..., σ_n²), i.e. ε_i ~ N(0, σ_i²).
  [Different observations i have different variances σ_i².]

- Suppose we can write the variance of Y_i as a multiple of a common variance σ²,
      V(Y_i) = V(ε_i) = σ_i² = (1/w_i) σ²,
  and we say observation i has weight w_i.

- The weights are inversely proportional to the variances of the errors (w_i = σ²/σ_i²). An observation with a smaller variance has a larger weight.

- Then V(Y) = Σ with
      Σ = σ² diag(1/w_1, 1/w_2, ..., 1/w_n) = σ² W⁻¹,
  where W is an n×n diagonal matrix of weights.

- This special case where Σ is diagonal is very useful, and is known as weighted least squares.

Weighted Least Squares

- A special case of Generalized Least Squares.
- Useful when errors have different variances but are all uncorrelated (independent).
- Assumes that we have some way of knowing the relative variances (or weights) associated with each observation.
- Associates a weight w_i with each observation.
- Chooses β̂ = (β̂_0, β̂_1, ..., β̂_k) to minimize
      Σ_{i=1}^n w_i [Y_i - (β̂_0 + β̂_1 x_1i + ... + β̂_k x_ki)]².
- In matrix form, the Generalized Least Squares estimates are
      β̂ = (X'WX)⁻¹ X'WY   and   σ̂_w² = Σ_{i=1}^n w_i (Y_i - Ŷ_i)² / (n - k - 1).
- Notice the similarity to the OLS form, but now with the W.

Example situations:

1. If the data have been summarized and the ith response is the average of n_i observations each with constant variance σ², then Var(Y_i) = σ²/n_i and w_i = n_i.

2. If the variance is proportional to some predictor x_i, then Var(Y_i) = x_i σ² and w_i = 1/x_i.
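As a sketch of the matrix form above, the R code below computes (X'WX)⁻¹X'WY by hand and checks it against lm() with the weights argument. The data and weights are hypothetical and simulated so that Var(ε_i) = σ²/w_i.

## Minimal sketch (hypothetical simulated data): weighted least squares via the
## matrix formula, checked against lm(..., weights = ).
set.seed(2)
n.obs <- 40
x <- runif(n.obs, 0, 10)
w <- sample(1:5, n.obs, replace = TRUE)            # known relative weights
y <- 2 + 0.5*x + rnorm(n.obs, sd = 1/sqrt(w))      # Var(e_i) = sigma^2 / w_i, sigma = 1

X      <- cbind(1, x)
W      <- diag(w)
beta.w <- solve(t(X) %*% W %*% X) %*% t(X) %*% W %*% y   # (X'WX)^{-1} X'WY
sig2.w <- sum(w * (y - X %*% beta.w)^2) / (n.obs - 1 - 1) # weighted RSS / (n - k - 1)

cbind(matrix.formula = as.vector(beta.w),
      lm.fit         = coef(lm(y ~ x, weights = w)))      # the two columns should agree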

Example: Apple Shoots data

- Using trees planted in 1933 and 1934, Bland took samples of shoots from apple trees every few days throughout the 1971 growing season (about 106 days).
- He counted the number of stem units per shoot. This measurement was thought to help understand the growth of the trees (fruiting and branching).
- We do not know the number of stem units for every shoot, but we know the average number of stem units per shoot for all samples taken on a given day.
- We are interested in modeling the relationship between day of collection (observed) and number of stem units on a sample (not directly observed).

VARIABLES
  day    days from dormancy (day of collection)
  n      number of shoots collected
  y      number of stem units per shoot
  ybar   average number of stem units per shoot for shoots collected on that day (i.e. y/n)

Notice we do not have y, and the number of shoots sampled varies from day to day.

> applelong
   day  n  ybar
1    0  5 10.20
2    3  5 10.40
3    7  5 10.60
4   13  6 12.50
5   18  5 12.00
6   24  4 15.00
7   25  6 15.17
8   32  5 17.00
9   38  7 18.71
10  42  9 19.22
11  44 10 20.00
12  49 19 20.32
13  52 14 22.07
14  55 11 22.64
15  58  9 22.78
16  61 14 23.93
17  69 10 25.50
18  73 12 25.08
19  76  9 26.67
20  88  7 28.00
21 100 10 31.67
22 106  7 32.14

Plot the relationship between ybar and day:

[Figure: scatterplot of ybar (about 10 to 32) versus day (0 to 106), showing a roughly linear increasing trend.]

- If these were individual y observations, we could fit our usual linear model.
- But some of the observations provide more information on the conditional mean (n_i larger), and others provide less information on the conditional mean (n_i smaller).
- If we assume a constant variance σ² for the simple linear regression model of y regressed on day, then the ybar observations have a non-constant variance related to n_i, with Var(ybar_i) = σ²/n_i.
- We can fit this model using Weighted Least Squares estimation with w_i = n_i.
- We'll use our usual lm() function, but include the weights option.

> lmout=lm(ybar ~ day, weights=n)
> summary(lmout)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.973754   0.314272   31.74   <2e-16 ***
day         0.217330   0.005339   40.71   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.929 on 20 degrees of freedom
Multiple R-Squared: 0.9881,  Adjusted R-squared: 0.9875
F-statistic: 1657 on 1 and 20 DF,  p-value: < 2.2e-16

These estimates β̂_0 and β̂_1 coincide with the simple linear regression model, but we've accounted for the non-constant variance in our observations. The common σ̂ = 1.929.

If we plot the absolute value of the raw residual e_i against the number of observations on the day, n_i, we see that the observations with higher n_i tend to have lower variability:

> plot(n, abs(lmout$residuals), ylab="abs(residual)")

[Figure: abs(residual) versus n; larger n tends to go with smaller absolute residuals.]
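A complementary check (a minimal sketch, assuming the applelong data frame and the weighted fit lmout from above are in the workspace): after weighting, the scaled residuals sqrt(w_i)·e_i should no longer show a trend in spread with n_i, even though the raw residuals plotted above do.

## Sketch: weighted residuals sqrt(w_i) * e_i from the WLS fit should look
## roughly homoscedastic across n, unlike the raw residuals plotted above.
## Assumes applelong (day, n, ybar) and lmout = lm(ybar ~ day, weights = n).
wres <- sqrt(applelong$n) * residuals(lmout)
plot(applelong$n, abs(wres),
     xlab = "n (shoots collected that day)",
     ylab = "abs(weighted residual)")       # little or no trend expected here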

CASE 2: Non-independence due to Time Correlation

- When we model the mean structure with ordinary least squares (OLS), the mean structure explains the general trends in the data with respect to our dependent variable and the independent variables.
- The leftover noise, or errors, are assumed to have no pattern (we have diagnostic plots to check this). For one thing, the errors are assumed to be independent.
- Suppose observations have been collected over time, and observations taken closer in time are more alike than observations taken further apart in time.
- This is a time-correlation situation, and we can see the correlation in the errors by plotting the residuals against time.

Example: Time as independent variable

The following scatterplot shows a positive linear trend in Y with respect to time, for Time = 1, 2, 3, ..., 50. Let's look at the ordinary least squares fit.

[Figure: two panels, "OLS fit" (Y versus time with the fitted line) and "Residuals" (OLS residuals versus time).]

There is a pattern in the residuals suggesting residuals near to each other are similar (positively correlated). If a residual is positive, there's a good chance its neighboring residual is also positive.

A lag plot of the residuals gives us information on this: we plot each residual e_i against the previous residual in time, e_{i-1}.
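A lag plot like this can be made directly from the residuals; the sketch below (assuming lmout is an OLS fit to time-ordered data, as in the simulation that follows) also computes the lag-1 correlation, which should be near 0 when the errors are independent.

## Sketch: lag plot of residuals and the lag-1 correlation, as a quick check for
## serial correlation.  Assumes lmout is an OLS fit to time-ordered data.
e <- residuals(lmout)
m <- length(e)
plot(e[-m], e[-1], xlab = "previous residual e_{i-1}", ylab = "residual e_i")
cor(e[-m], e[-1])   # close to 0 if the errors look independent;
                    # large and positive for AR(1)-type data like this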

Plotting each residual against the previous residual:

[Figure: lag plot of e_i versus e_{i-1} for the simulated data, showing a clear positive association.]

- So there is positive correlation in the lagged residuals.
- The assumption of independence (an assumption of OLS) is violated.
- We can instead move away from OLS and incorporate this correlation into our modeling.

Autocorrelation

- Autoregressive model: model a series in terms of its own past behavior.
- The first-order autoregressive model, AR(1):
      Y_t = β_0 + β_1 x_t + ε_t   for t = 1, ..., T,
  with
      ε_t = ρ ε_{t-1} + u_t   and   u_t ~ N(0, σ²).
- |ρ| < 1 is the autocorrelation parameter; it tells how strongly sequential observations are correlated.
- The t-th and (t-j)-th errors are also correlated, but not as strongly: corr(ε_t, ε_{t-j}) = ρ^j.

A simulation of AR(1) data from n = 50 uniformly spaced time points with a positive linear trend (with β_1 = 2) can bring insight into the AR(1) process. The data for the plots on the previous pages were made from this code:

## Generate x-values:
> n=50
> time=1:n

## Assign parameters:
> sigma=3
> rho=0.95
> beta=2

## Get the start point at t=1 for the time series
## and allocate space for the data vectors:
> y=rep(0,n)
> e=rep(0,n)
> e[1]=rnorm(1,0,sigma)
> y[1]=beta*time[1]+e[1]

## Use the AR(1) process to sequentially generate y-values:
> for (i in 2:n){
+   e[i]=rho*e[i-1]+rnorm(1,0,sigma)
+   y[i]=beta*time[i]+e[i]
+ }

There is also a test for time-correlated errors called the Durbin-Watson test. It specifically looks for AR(1) errors, and uses H_0: ρ = 0 vs H_A: ρ ≠ 0. The test statistic is

      d = Σ_{t=2}^n (e_t - e_{t-1})² / Σ_{t=1}^n e_t².

A small d indicates positive autocorrelation, and d near 2 suggests no positive autocorrelation.

Testing the simulated AR(1) data (lmout here is the OLS fit to the simulated data):

> library(car)
> durbinWatsonTest(lmout)
 lag Autocorrelation D-W Statistic p-value
   1        0.852054     0.2674768       0
 Alternative hypothesis: rho != 0

Reject H_0; there is positive autocorrelation.
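As a sketch of the formula above, d can also be computed directly from the residuals (assuming lmout is the fit to the simulated data); values well below 2 point to positive autocorrelation.

## Sketch: the Durbin-Watson statistic computed by hand from the OLS residuals,
## following the formula above.  Assumes lmout is the fit to the simulated data.
e <- residuals(lmout)
d <- sum(diff(e)^2) / sum(e^2)   # sum_{t=2}^n (e_t - e_{t-1})^2 / sum_t e_t^2
d                                # should match the D-W statistic reported by car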

The mean structure in the AR(1) model is the same as in OLS, but we model the errors differently:

      Y = Xβ + ε  with ε ~ N_n(0, Σ),

where the n×n covariance matrix has entries Σ_ts = κ ρ^|t-s|, i.e.

      Σ = κ · [ 1        ρ        ρ²      ...  ρ^(n-1)
                ρ        1        ρ       ...  ρ^(n-2)
                ρ²       ρ        1       ...  ρ^(n-3)
                ...
                ρ^(n-1)  ρ^(n-2)  ρ^(n-3) ...  1       ],

and Var(ε_t) = κ = σ² / (1 - ρ²).

We again have a Generalized Least Squares estimate for β:

      β̂ = (X'Σ⁻¹X)⁻¹ X'Σ⁻¹Y.

Notice the similarity to the OLS form, but now with the Σ⁻¹.
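A minimal sketch of this estimator, reusing y, time, n, rho and sigma from the simulation code above: build the AR(1) covariance matrix Σ and apply the GLS formula directly. Here ρ and σ are treated as known, whereas in practice they must be estimated (as gls() does in the stock example below).

## Sketch: GLS by hand for the simulated AR(1) data, using the known simulation
## parameters.  Assumes y, time, n, rho and sigma from the simulation code above.
kappa <- sigma^2 / (1 - rho^2)
Sigma <- kappa * rho^abs(outer(time, time, "-"))   # Sigma_ts = kappa * rho^|t-s|

X       <- cbind(1, time)                          # intercept + linear time trend
Sig.inv <- solve(Sigma)
beta.gls <- solve(t(X) %*% Sig.inv %*% X) %*% t(X) %*% Sig.inv %*% y
beta.gls                                           # compare with coef(lm(y ~ time))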

> lmout=lm(dayvalues~day) > summary(lmout) Coefficients: Estimate Std Error t value Pr(> t ) (Intercept) 92653292 0235187 39395 <2e-16 *** day 0027864 0001556 1790 <2e-16 *** --- Signif codes: 0 *** 0001 ** 001 * 005 01 1 Residual standard error: 1894 on 259 degrees of freedom Multiple R-Squared: 05531,Adjusted R-squared: 05514 F-statistic: 3206 on 1 and 259 DF, p-value: < 22e-16 dayvalues 90 92 94 96 98 100 102 A plot of the residuals vs fitted also show the time-based correlation > plot(lmout$fittedvalues,lmout$residuals,pch=16) > abline(h=0) lmout$residuals!6!4!2 0 2 4 94 96 98 100 lmout$fittedvalues Residuals that are positive tend to be near other positive residuals, and vice versa for negative residuals 0 50 100 150 200 250 day 25 26 This is more apparent in a lag plot where we plot a residual vs its neighboring residual: e i vs e i 1 > lagplot(lmout$residuals,dolines=false) We can use the Durbin-Watson test to formally test for time dependence (uses the relationship between e i and e i 1 ) > library(car) > durbinwatson(lmout) lmout$residuals!6!4!2 0 2 4 lag Autocorrelation D-W Statistic p-value 1 09025303 01684356 0 Alternative hypothesis: rho!= 0 The test strongly rejects the null of independence (H 0 : ρ = 0) We will fit a first-order autoregressive model to the data, or an AR(1)!6!4!2 0 2 4 lag 1 There is a positive correlation in the lag residuals (residuals tend to be more like their near neighbors) 27 28

We will fit a first-order autoregressive model, AR(1), to the data.

Fitting the AR(1) model

The gls function [generalized least squares] in the nlme library [non-linear mixed effects] fits regression models with a variety of correlated-error and non-constant error-variance structures.

> library(nlme)
## The ~1 below says the data are in time order:
> glsout=gls(dayvalues~day, correlation=corAR1(form = ~1))
> summary(glsout)
Generalized least squares fit by REML
  Model: dayvalues ~ day
  Data: NULL
       AIC      BIC    logLik
   615.644 629.8713 -303.822

Correlation Structure: AR(1)
 Formula: ~1
 Parameter estimate(s):
       Phi
 0.9412842

Coefficients:
               Value Std.Error  t-value p-value
(Intercept) 92.12831 1.4044915 65.59549  0.0000
day          0.02897 0.0090084  3.21585  0.0015

Residual standard error: 2.26733
Degrees of freedom: 261 total; 259 residual

- Day is a significant linear predictor for stock price.
- ρ̂ = 0.9413, so sequential observations are strongly correlated.

Comments:

1. When you have many covariates, you can plot the residuals from the OLS fitted model against time as a time-correlation diagnostic. If there is time-correlation, this plot will show a pattern rather than a random scatter.

2. Including time as a predictor does not necessarily remove time-correlated errors. As in the soccho example, time was a predictor in the OLS model, which meant there was a general trend over time, but there was still correlation in the errors after time was included.
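Following up on comment 2, a quick side-by-side look at the two fits of the soccho data makes the practical consequence concrete: the point estimates for day are similar, but the GLS standard error is several times larger once the serial correlation is accounted for. This is a sketch; it assumes lmout and glsout from above, and that summary.gls stores its coefficient table in the tTable component, as in current versions of nlme.

## Sketch: compare OLS and AR(1)-GLS coefficient tables for the soccho fits.
## Assumes lmout (OLS) and glsout (gls with corAR1) from above.
summary(lmout)$coefficients   # OLS: SE(day) about 0.0016
summary(glsout)$tTable        # GLS: SE(day) about 0.0090 (much larger)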