22s:152 Applied Linear Regression: Generalized Least Squares



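Returning to a continuous response variable Y

Ordinary Least Squares Estimation

The classical models we have fit so far with a continuous response Y have been fit using ordinary least squares. The model

$$Y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki} + \epsilon_i \quad \text{with } \epsilon_i \overset{iid}{\sim} N(0, \sigma^2)$$

is fit by minimizing the residual sum of squares (RSS),

$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} \left(Y_i - (\hat\beta_0 + \hat\beta_1 x_{1i} + \cdots + \hat\beta_k x_{ki})\right)^2.$$

In matrix notation, we can write this model as

$$Y = X\beta + \epsilon \quad \text{with } \epsilon \sim N_n(0, \Sigma),$$

where $X\beta$ is the mean structure, $\epsilon$ is the error structure, and

$$\Sigma = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix}_{n \times n} \quad \text{or} \quad \Sigma = \sigma^2 I_{n \times n}.$$

The variance of the vector Y is denoted $\Sigma$, or

$$V(Y) = V(X\beta + \epsilon) = V(\epsilon) = \Sigma.$$

This $\Sigma$ shows the independence of the observations (off-diagonals are 0) and the constant variance ($\sigma^2$ down the entire diagonal).

In matrix notation, we can show the Ordinary Least Squares (OLS) estimates for the regression coefficients $\beta$ as

$$\hat\beta = (X'X)^{-1} X'Y,$$

where $(X'X)^{-1}$ represents the inverse of $X'X$ when X is of full rank. And the estimate for $\sigma^2$ is

$$\hat\sigma^2 = \frac{RSS}{n - k - 1}.$$

But what if the observations are NOT independent, or there is NOT constant variance (the assumptions for OLS)? That is, what if $V(Y) \neq \sigma^2 I_{n \times n}$?

Then the appropriate estimation method for the regression coefficients may be Generalized Least Squares Estimation.

CASE 1: Non-constant variance, but independence holds

In this situation we have a similar model

$$Y = X\beta + \epsilon \quad \text{with } \epsilon \sim N_n(0, \Sigma),$$

except

$$\Sigma = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}_{n \times n} \quad \text{or} \quad \epsilon_i \sim N(0, \sigma_i^2)$$

[different observations i have different variances $\sigma_i^2$].

Suppose we can write the variance of $Y_i$ as a multiple of a common variance $\sigma^2$,

$$V(Y_i) = V(\epsilon_i) = \sigma_i^2 = \left(\frac{1}{w_i}\right) \sigma^2,$$

and we say observation i has weight $w_i$.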

The weights are inversely proportional to the variance of the errors ($w_i = \sigma^2 / \sigma_i^2$). An observation with a smaller variance has a larger weight.

Then $V(Y) = \Sigma$ and

$$\Sigma = \sigma^2 \begin{pmatrix} \frac{1}{w_1} & 0 & \cdots & 0 \\ 0 & \frac{1}{w_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{w_n} \end{pmatrix}_{n \times n} = \sigma^2 W^{-1},$$

where W is an $n \times n$ diagonal matrix of weights.

This special case where $\Sigma$ is diagonal is very useful, and is known as weighted least squares.

Weighted Least Squares

A special case of Generalized Least Squares.
Useful when errors have different variances but are all uncorrelated (independent).
Assumes that we have some way of knowing the relative variances (or weights) associated with each observation.
Associates a weight $w_i$ with each observation.
Chooses $\hat\beta = (\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k)$ to minimize

$$\sum_{i=1}^{n} w_i \left[Y_i - (\hat\beta_0 + \hat\beta_1 x_{1i} + \cdots + \hat\beta_k x_{ki})\right]^2.$$

In matrix form the Generalized Least Squares estimates for this weighted case are:

$$\hat\beta = (X'WX)^{-1} X'WY, \qquad \hat\sigma^2 = \frac{\sum_{i=1}^{n} w_i (Y_i - \hat{Y}_i)^2}{n - k - 1}.$$

Notice the similarity to the OLS form, but now with the W.

Example situations:

1. If the data have been summarized and the ith response is the average of $n_i$ observations, each with constant variance $\sigma^2$, then $Var(Y_i) = \sigma^2 / n_i$ and $w_i = n_i$.
2. If the variance is proportional to some predictor $x_i$, then $Var(Y_i) = x_i \sigma^2$ and $w_i = 1 / x_i$.

Example: Apple Shoots data

Using trees planted in 1933 and 1934, Bland took samples of shoots from apple trees every few days throughout the 1971 growing season (about 106 days). He counted the number of stem units per shoot. This measurement was thought to help understand the growth of the trees (fruiting and branching).

We do not know the number of stem units for every shoot, but we know the average number of stem units per shoot for all samples on a given day.

We are interested in modeling the relationship between day of collection (observed) and number of stem units on a sample (not directly observed).

VARIABLES

day    days from dormancy (day of collection)
n      number of shoots collected
y      number of stem units per shoot
ybar   average number of stem units per shoot for shoots collected on that day (i.e. y/n)

Notice that we do not have y, and that the number of shoots sampled varies from day to day.

> applelong
  day  n  ybar
  ...

Plot the relationship between ybar and day.

[Figure: scatterplot of ybar versus day]

If these were individual y observations, we could fit our usual linear model. But some of the observations provide more information on the conditional mean ($n_i$ larger), and others provide less information on the conditional mean ($n_i$ smaller).

If we assume a constant variance $\sigma^2$ for the simple linear regression model of y regressed on day, then these ybar observations have a non-constant variance related to $n_i$, with

$$Var(ybar_i) = \frac{\sigma^2}{n_i}.$$

We can fit this model using Weighted Least Squares estimation with $w_i = n_i$. We'll use our usual lm() function, but include the weights option.

> lmout=lm(ybar ~ day,weights=n)
> summary(lmout)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)      ...        ...     ...   <2e-16 ***
day              ...        ...     ...   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*'

Residual standard error: 1.929 on 20 degrees of freedom
Multiple R-Squared: 0.9881, Adjusted R-squared: ...
F-statistic: 1657 on 1 and 20 DF, p-value: < 2.2e-16

These estimates $\hat\beta_0$ and $\hat\beta_1$ correspond to the simple linear regression model of y on day, but we've accounted for the non-constant variance in our observations. The estimate of the common $\sigma$ is the residual standard error, $\hat\sigma = 1.929$.
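As a rough check on the matrix formula for weighted least squares, the sketch below compares lm() with the weights option against a direct computation of $(X'WX)^{-1}X'WY$ and of the weighted $\hat\sigma$. The data are simulated for illustration only: the values of day, n, sigma and the true coefficients are assumptions, not the apple shoots data.

```r
## Hypothetical data: each ybar is the mean of n[i] observations,
## so Var(ybar[i]) = sigma^2 / n[i] and the weights are w[i] = n[i].
set.seed(1)
day   <- seq(0, 100, by = 5)
n     <- sample(5:30, length(day), replace = TRUE)
sigma <- 2
ybar  <- 10 + 0.2 * day + rnorm(length(day), 0, sigma / sqrt(n))

## Weighted least squares through lm()
fit <- lm(ybar ~ day, weights = n)

## Weighted least squares through the matrix formula
X <- cbind(1, day)          # design matrix with intercept column
W <- diag(n)                # diagonal matrix of weights
beta_hat <- solve(t(X) %*% W %*% X, t(X) %*% W %*% ybar)

## sigma-hat from the weighted residual sum of squares, n - k - 1 = n - 2
res       <- ybar - X %*% beta_hat
sigma_hat <- sqrt(sum(n * res^2) / (length(ybar) - 2))

print(cbind(lm = coef(fit), matrix = as.numeric(beta_hat)))
print(c(lm = summary(fit)$sigma, matrix = sigma_hat))
```

The two routes should agree up to numerical error, which is one way to see that the weights argument of lm() is playing the role of the diagonal of W.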

If we plot the absolute value of the raw residual $e_i$ against the number of observations on the day, $n_i$, we see that the observations with higher $n_i$ tend to have lower variability:

> plot(n,abs(lmout$residuals),ylab="abs(residual)")

[Figure: abs(residual) versus n]

CASE 2: Non-independence due to Time Correlation

When we model the mean structure with ordinary least squares (OLS), the mean structure explains the general trends in the data with respect to our dependent variable and the independent variables.

The leftover noise, or errors, are assumed to have no pattern (we have diagnostic plots to check this). For one thing, the errors are assumed to be independent.

Suppose observations have been collected over time, and observations taken closer in time are more alike than observations taken further apart in time. This is a time-correlation situation. And we can see the correlation in the errors by plotting the residuals against time.

Example: Time as independent variable

The following scatterplot shows a positive linear trend in Y with respect to time for Time = 1, 2, 3, ...

[Figure: scatterplot of y versus time]

Let's look at the ordinary least squares fit.

[Figure: OLS fit (y versus time) and Residuals (lmout$residuals versus lmout$fitted.values)]

There is a pattern in the residuals suggesting that residuals near to each other are similar (positively correlated).

If a residual is positive, there's a good chance its neighboring residual is also positive. A lag plot of the residuals gives us information on this:

residual: $e_i$
previous residual in time: $e_{i-1}$

Plotting each residual against the previous residual:

[Figure: lmout$residuals versus lag 1]

So, there is positive correlation in the lag residuals. The assumption of independence is violated (with respect to the assumptions of OLS).

We can instead move away from OLS, and incorporate this correlation into our modeling.

Autocorrelation

Autoregressive model: model a series in terms of its own past behavior.

The first-order autoregressive model, AR(1):

$$Y_t = \beta_0 + \beta_1 x_t + \epsilon_t \quad \text{for } t = 1, \ldots, T$$

with

$$\epsilon_t = \rho\,\epsilon_{t-1} + u_t \quad \text{and} \quad u_t \sim N(0, \sigma^2).$$

$|\rho| < 1$ is the autocorrelation parameter; it tells how strongly the sequential observations are correlated.

The $t$th and the $(t-j)$th errors are also correlated, but not as strongly:

$$corr(\epsilon_t, \epsilon_{t-j}) = \rho^j.$$

A simulation of AR(1) data from n = 50 uniformly spaced time points with a positive linear trend (with $\beta_1 = 2$) can bring insight into the AR(1) process:

## Generate x-values:
> n=50
> time=1:n
## Assign parameters:
> sigma=3
> rho=0.95
> beta=2
## Get start point at t=1 for time series
## and allocate space for data vectors:
> y=rep(0,n)
> e=rep(0,n)
> e[1]=rnorm(1,0,sigma)
> y[1]=beta*time[1]+e[1]
## Use AR(1) process to sequentially generate y-values:
> for (i in 2:n){
    e[i]=rho*e[i-1]+rnorm(1,0,sigma)
    y[i]=beta*time[i]+e[i]
  }

The data for the plots on the previous pages were made from this code.

There is also a test for time-correlated errors called the Durbin-Watson test. It actually looks for AR(1) errors, and uses

$$H_0: \rho = 0 \quad \text{vs.} \quad H_A: \rho \neq 0.$$

The test statistic:

$$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$

A small d (well below 2) indicates positive autocorrelation, and d near 2 suggests no autocorrelation.

Testing the simulated AR(1) data:

> library(car)
## durbinWatsonTest() was called durbin.watson() in older versions of car
> durbinWatsonTest(lmout)
 lag Autocorrelation D-W Statistic p-value
 ...
 Alternative hypothesis: rho != 0

Reject $H_0$; there is positive correlation.
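To make the Durbin-Watson statistic concrete, the sketch below computes d directly from OLS residuals on freshly simulated AR(1) data, using base R only. The seed and parameter values are illustrative assumptions, and `r1` (the lag-1 sample autocorrelation of the residuals) is a name introduced here; no p-value is computed, since that is what the car package adds.

```r
## Simulate a linear trend with AR(1) errors (illustrative values)
set.seed(2)
n     <- 50
time  <- 1:n
rho   <- 0.95
sigma <- 3
beta  <- 2
e     <- numeric(n)
e[1]  <- rnorm(1, 0, sigma)
for (i in 2:n) e[i] <- rho * e[i - 1] + rnorm(1, 0, sigma)
y <- beta * time + e

## OLS fit and its residuals
fit <- lm(y ~ time)
r   <- unname(residuals(fit))

## Durbin-Watson statistic: sum of squared successive differences
## divided by the residual sum of squares
d <- sum(diff(r)^2) / sum(r^2)

## Lag-1 sample autocorrelation of the residuals
r1 <- sum(r[-1] * r[-n]) / sum(r^2)

print(c(d = d, two_times_one_minus_r1 = 2 * (1 - r1)))
```

Expanding the squared differences gives d = 2(1 - r1) - (e_1^2 + e_n^2) / sum(e_t^2), so d is roughly 2(1 - r1): positive autocorrelation pushes d toward 0, negative autocorrelation pushes it toward 4, and no autocorrelation leaves it near 2.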

The mean structure in the AR(1) model is the same as in OLS, but we model the errors differently:

$$Y = X\beta + \epsilon \quad \text{with } \epsilon \sim N_n(0, \Sigma)$$

and

$$V(Y) = \Sigma = \begin{pmatrix} \kappa & \kappa\rho & \kappa\rho^2 & \cdots & \kappa\rho^{n-2} & \kappa\rho^{n-1} \\ \kappa\rho & \kappa & \kappa\rho & \cdots & \kappa\rho^{n-3} & \kappa\rho^{n-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ \kappa\rho^{n-2} & \kappa\rho^{n-3} & \kappa\rho^{n-4} & \cdots & \kappa & \kappa\rho \\ \kappa\rho^{n-1} & \kappa\rho^{n-2} & \kappa\rho^{n-3} & \cdots & \kappa\rho & \kappa \end{pmatrix}_{n \times n}$$

and

$$Var(\epsilon_t) = \kappa = \frac{\sigma^2}{1 - \rho^2}.$$

And we again have a Generalized Least Squares estimate for $\beta$:

$$\hat\beta = (X'\Sigma^{-1}X)^{-1} X'\Sigma^{-1}Y$$

Notice the similarity to the OLS form, but now with the $\Sigma^{-1}$.

Example: Daily value of stock

The dataset soccho for this example shows the value of 1 unit of the CREF Social Choice stock fund on each day of a year starting on 10/21/99.

We're interested in fitting a linear model over time. But the independence assumption for OLS is probably violated. We can use the Durbin-Watson test to determine whether this is true. If so, we will fit an AR(1) model to the data.

> head(soccho)
   account unitval     date
1 CREFsoci     ... 10/21/99
2 CREFsoci     ... 10/22/99
3 CREFsoci     ... 10/23/99
4 CREFsoci     ... 10/24/99
5 CREFsoci     ... 10/25/99
6 CREFsoci     ... 10/26/99

As there are weekend days in the data set, we will first remove these:

> n=length(soccho$unitval)
> n
[1] 365
## 10/21/99 is a Thursday, get indices for removal:
> a1=(seq(1,365,7)+3)
> a1=a1[-length(a1)]
> a2=(seq(1,365,7)+4)
> a2=a2[-length(a2)]
> a=sort(c(a1,a2))
## Subset data down to weekdays:
> dayvalues=soccho$unitval[-a]
> day=1:length(dayvalues)
> length(dayvalues)
[1] 261
> plot(day,dayvalues,pch=16)

[Figure: scatterplot of dayvalues versus day]

It's pretty apparent that there is time-based correlation in the data, but we will fit a regular linear model assuming independence and then test for correlation over time.

> lmout=lm(dayvalues~day)
> summary(lmout)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)      ...        ...     ...   <2e-16 ***
day              ...        ...     ...   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*'

Residual standard error: 1.894 on 259 degrees of freedom
Multiple R-Squared: 0.5531, Adjusted R-squared: ...
F-statistic: 320.6 on 1 and 259 DF, p-value: < 2.2e-16

[Figure: dayvalues versus day]

A plot of the residuals vs. fitted values also shows the time-based correlation.

> plot(lmout$fitted.values,lmout$residuals,pch=16)
> abline(h=0)

[Figure: lmout$residuals versus lmout$fitted.values]

Residuals that are positive tend to be near other positive residuals, and vice versa for negative residuals.

This is more apparent in a lag plot, where we plot a residual vs. its neighboring residual: $e_i$ vs. $e_{i-1}$.

> lag.plot(lmout$residuals,do.lines=FALSE)

[Figure: lmout$residuals versus lag 1]

There is a positive correlation in the lag residuals (residuals tend to be more like their near neighbors).

We can use the Durbin-Watson test to formally test for time dependence (it uses the relationship between $e_i$ and $e_{i-1}$).

> library(car)
## durbinWatsonTest() was called durbin.watson() in older versions of car
> durbinWatsonTest(lmout)
 lag Autocorrelation D-W Statistic p-value
 ...
 Alternative hypothesis: rho != 0

The test strongly rejects the null of independence ($H_0: \rho = 0$).

We will fit a first-order autoregressive model, or AR(1), to the data.

Fitting the AR(1) model

The gls function [generalized least squares] in the nlme library [non-linear mixed effects] fits regression models with a variety of correlated-error and non-constant error-variance structures.

> library(nlme)
## The ~1 below says the data is in order by time
> glsout=gls(dayvalues~day, correlation=corAR1(form = ~1))
> summary(glsout)
Generalized least squares fit by REML
  Model: dayvalues ~ day
  Data: NULL
  AIC BIC logLik
  ...

Correlation Structure: AR(1)
 Formula: ~1
 Parameter estimate(s):
 Phi
 ...

Coefficients:
            Value Std.Error t-value p-value
(Intercept)   ...
day           ...

Residual standard error: ...
Degrees of freedom: 261 total; 259 residual

Day is a significant linear predictor for stock price.

$\hat\rho = 0.9413$, and sequential observations are strongly correlated.

Comments:

1. When you have many covariates, you can plot the residuals from the OLS fitted model against time as a time-correlation diagnostic. If there is time-correlation, this plot will show a pattern rather than a random scatter.
2. Including time as a predictor does not necessarily remove time-correlated errors. In the soccho example, time was a predictor in the OLS model, which accounted for the general trend over time, but there was still correlation in the errors after time was included.
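For intuition about what the generalized least squares estimate is doing, here is a minimal base-R sketch that builds the AR(1) covariance matrix and evaluates $(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}Y$ directly. It assumes $\rho$ and $\sigma$ are known, which gls() does not (it estimates $\rho$ from the data), and it uses simulated data with illustrative parameter values rather than the soccho data. The names `Sigma`, `beta_gls` and `beta_white` are introduced here for illustration.

```r
## Simulate a linear trend with stationary AR(1) errors (illustrative values)
set.seed(3)
n     <- 50
time  <- 1:n
rho   <- 0.9
sigma <- 3
kappa <- sigma^2 / (1 - rho^2)          # Var(e_t) for a stationary AR(1)
e     <- numeric(n)
e[1]  <- rnorm(1, 0, sqrt(kappa))       # start from the stationary distribution
for (t in 2:n) e[t] <- rho * e[t - 1] + rnorm(1, 0, sigma)
y <- 5 + 2 * time + e

## AR(1) covariance matrix: entry (i, j) is kappa * rho^|i - j|
Sigma <- kappa * rho^abs(outer(time, time, "-"))

## GLS estimate from the matrix formula
X        <- cbind(1, time)
Sinv     <- solve(Sigma)
beta_gls <- solve(t(X) %*% Sinv %*% X, t(X) %*% Sinv %*% y)

## Equivalent view: with Sigma = L L', OLS on the transformed data
## L^{-1} y and L^{-1} X has independent, constant-variance errors
L          <- t(chol(Sigma))
beta_white <- lm.fit(solve(L, X), solve(L, y))$coefficients

print(cbind(gls = as.numeric(beta_gls), whitened_ols = as.numeric(beta_white)))
```

The agreement between the two columns illustrates why GLS can be thought of as OLS after a transformation that removes the correlation in the errors.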


More information

Introduction to the Analysis of Hierarchical and Longitudinal Data

Introduction to the Analysis of Hierarchical and Longitudinal Data Introduction to the Analysis of Hierarchical and Longitudinal Data Georges Monette, York University with Ye Sun SPIDA June 7, 2004 1 Graphical overview of selected concepts Nature of hierarchical models

More information

R 2 and F -Tests and ANOVA

R 2 and F -Tests and ANOVA R 2 and F -Tests and ANOVA December 6, 2018 1 Partition of Sums of Squares The distance from any point y i in a collection of data, to the mean of the data ȳ, is the deviation, written as y i ȳ. Definition.

More information

MLR Model Selection. Author: Nicholas G Reich, Jeff Goldsmith. This material is part of the statsteachr project

MLR Model Selection. Author: Nicholas G Reich, Jeff Goldsmith. This material is part of the statsteachr project MLR Model Selection Author: Nicholas G Reich, Jeff Goldsmith This material is part of the statsteachr project Made available under the Creative Commons Attribution-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-sa/3.0/deed.en

More information

Nonstationary time series models

Nonstationary time series models 13 November, 2009 Goals Trends in economic data. Alternative models of time series trends: deterministic trend, and stochastic trend. Comparison of deterministic and stochastic trend models The statistical

More information

Density Temp vs Ratio. temp

Density Temp vs Ratio. temp Temp Ratio Density 0.00 0.02 0.04 0.06 0.08 0.10 0.12 Density 0.0 0.2 0.4 0.6 0.8 1.0 1. (a) 170 175 180 185 temp 1.0 1.5 2.0 2.5 3.0 ratio The histogram shows that the temperature measures have two peaks,

More information

Applied Regression Analysis

Applied Regression Analysis Applied Regression Analysis Chapter 3 Multiple Linear Regression Hongcheng Li April, 6, 2013 Recall simple linear regression 1 Recall simple linear regression 2 Parameter Estimation 3 Interpretations of

More information

Lecture 3: Multiple Regression

Lecture 3: Multiple Regression Lecture 3: Multiple Regression R.G. Pierse 1 The General Linear Model Suppose that we have k explanatory variables Y i = β 1 + β X i + β 3 X 3i + + β k X ki + u i, i = 1,, n (1.1) or Y i = β j X ji + u

More information

Homework 2. For the homework, be sure to give full explanations where required and to turn in any relevant plots.

Homework 2. For the homework, be sure to give full explanations where required and to turn in any relevant plots. Homework 2 1 Data analysis problems For the homework, be sure to give full explanations where required and to turn in any relevant plots. 1. The file berkeley.dat contains average yearly temperatures for

More information

1 Forecasting House Starts

1 Forecasting House Starts 1396, Time Series, Week 5, Fall 2007 1 In this handout, we will see the application example on chapter 5. We use the same example as illustrated in the textbook and fit the data with several models of

More information

Lecture 1: Linear Models and Applications

Lecture 1: Linear Models and Applications Lecture 1: Linear Models and Applications Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Introduction to linear models Exploratory data analysis (EDA) Estimation

More information

One-way ANOVA (Single-Factor CRD)

One-way ANOVA (Single-Factor CRD) One-way ANOVA (Single-Factor CRD) STAT:5201 Week 3: Lecture 3 1 / 23 One-way ANOVA We have already described a completed randomized design (CRD) where treatments are randomly assigned to EUs. There is

More information

1 Introduction 1. 2 The Multiple Regression Model 1

1 Introduction 1. 2 The Multiple Regression Model 1 Multiple Linear Regression Contents 1 Introduction 1 2 The Multiple Regression Model 1 3 Setting Up a Multiple Regression Model 2 3.1 Introduction.............................. 2 3.2 Significance Tests

More information

Weighted Least Squares

Weighted Least Squares Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w

More information

14 Multiple Linear Regression

14 Multiple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 14 Multiple Linear Regression 14.1 The multiple linear regression model In simple linear regression, the response variable y is expressed in

More information

Solution to Series 6

Solution to Series 6 Dr. M. Dettling Applied Series Analysis SS 2014 Solution to Series 6 1. a) > r.bel.lm summary(r.bel.lm) Call: lm(formula = NURSING ~., data = d.beluga) Residuals: Min 1Q

More information

Introduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017

Introduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017 Introduction to Regression Analysis Dr. Devlina Chatterjee 11 th August, 2017 What is regression analysis? Regression analysis is a statistical technique for studying linear relationships. One dependent

More information

Introduction to Linear Regression Rebecca C. Steorts September 15, 2015

Introduction to Linear Regression Rebecca C. Steorts September 15, 2015 Introduction to Linear Regression Rebecca C. Steorts September 15, 2015 Today (Re-)Introduction to linear models and the model space What is linear regression Basic properties of linear regression Using

More information

Weighted Least Squares

Weighted Least Squares Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w

More information

Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A =

Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A = Matrices and vectors A matrix is a rectangular array of numbers Here s an example: 23 14 17 A = 225 0 2 This matrix has dimensions 2 3 The number of rows is first, then the number of columns We can write

More information

1 Least Squares Estimation - multiple regression.

1 Least Squares Estimation - multiple regression. Introduction to multiple regression. Fall 2010 1 Least Squares Estimation - multiple regression. Let y = {y 1,, y n } be a n 1 vector of dependent variable observations. Let β = {β 0, β 1 } be the 2 1

More information

Handout 4: Simple Linear Regression

Handout 4: Simple Linear Regression Handout 4: Simple Linear Regression By: Brandon Berman The following problem comes from Kokoska s Introductory Statistics: A Problem-Solving Approach. The data can be read in to R using the following code:

More information

BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression

BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression Introduction to Correlation and Regression The procedures discussed in the previous ANOVA labs are most useful in cases where we are interested

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 14 1 / 64 Data structure and Model t1 t2 tn i 1st subject y 11 y 12 y 1n1 2nd subject

More information

Contents. 1 Review of Residuals. 2 Detecting Outliers. 3 Influential Observations. 4 Multicollinearity and its Effects

Contents. 1 Review of Residuals. 2 Detecting Outliers. 3 Influential Observations. 4 Multicollinearity and its Effects Contents 1 Review of Residuals 2 Detecting Outliers 3 Influential Observations 4 Multicollinearity and its Effects W. Zhou (Colorado State University) STAT 540 July 6th, 2015 1 / 32 Model Diagnostics:

More information

Motivation for multiple regression

Motivation for multiple regression Motivation for multiple regression 1. Simple regression puts all factors other than X in u, and treats them as unobserved. Effectively the simple regression does not account for other factors. 2. The slope

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 2 Jakub Mućk Econometrics of Panel Data Meeting # 2 1 / 26 Outline 1 Fixed effects model The Least Squares Dummy Variable Estimator The Fixed Effect (Within

More information

Modeling the Covariance

Modeling the Covariance Modeling the Covariance Jamie Monogan University of Georgia February 3, 2016 Jamie Monogan (UGA) Modeling the Covariance February 3, 2016 1 / 16 Objectives By the end of this meeting, participants should

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 4 Jakub Mućk Econometrics of Panel Data Meeting # 4 1 / 30 Outline 1 Two-way Error Component Model Fixed effects model Random effects model 2 Non-spherical

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

ST505/S697R: Fall Homework 2 Solution.

ST505/S697R: Fall Homework 2 Solution. ST505/S69R: Fall 2012. Homework 2 Solution. 1. 1a; problem 1.22 Below is the summary information (edited) from the regression (using R output); code at end of solution as is code and output for SAS. a)

More information

Lecture 5: Clustering, Linear Regression

Lecture 5: Clustering, Linear Regression Lecture 5: Clustering, Linear Regression Reading: Chapter 10, Sections 3.1-3.2 STATS 202: Data mining and analysis October 4, 2017 1 / 22 .0.0 5 5 1.0 7 5 X2 X2 7 1.5 1.0 0.5 3 1 2 Hierarchical clustering

More information

Formulary Applied Econometrics

Formulary Applied Econometrics Department of Economics Formulary Applied Econometrics c c Seminar of Statistics University of Fribourg Formulary Applied Econometrics 1 Rescaling With y = cy we have: ˆβ = cˆβ With x = Cx we have: ˆβ

More information

Regression Analysis II

Regression Analysis II Regression Analysis II Measures of Goodness of fit Two measures of Goodness of fit Measure of the absolute fit of the sample points to the sample regression line Standard error of the estimate An index

More information

11.1 Gujarati(2003): Chapter 12

11.1 Gujarati(2003): Chapter 12 11.1 Gujarati(2003): Chapter 12 Time Series Data 11.2 Time series process of economic variables e.g., GDP, M1, interest rate, echange rate, imports, eports, inflation rate, etc. Realization An observed

More information

Chapter 12: Multiple Linear Regression

Chapter 12: Multiple Linear Regression Chapter 12: Multiple Linear Regression Seungchul Baek Department of Statistics, University of South Carolina STAT 509: Statistics for Engineers 1 / 55 Introduction A regression model can be expressed as

More information

Empirical Market Microstructure Analysis (EMMA)

Empirical Market Microstructure Analysis (EMMA) Empirical Market Microstructure Analysis (EMMA) Lecture 3: Statistical Building Blocks and Econometric Basics Prof. Dr. Michael Stein michael.stein@vwl.uni-freiburg.de Albert-Ludwigs-University of Freiburg

More information

The Multiple Regression Model Estimation

The Multiple Regression Model Estimation Lesson 5 The Multiple Regression Model Estimation Pilar González and Susan Orbe Dpt Applied Econometrics III (Econometrics and Statistics) Pilar González and Susan Orbe OCW 2014 Lesson 5 Regression model:

More information

Lecture 9 STK3100/4100

Lecture 9 STK3100/4100 Lecture 9 STK3100/4100 27. October 2014 Plan for lecture: 1. Linear mixed models cont. Models accounting for time dependencies (Ch. 6.1) 2. Generalized linear mixed models (GLMM, Ch. 13.1-13.3) Examples

More information

STAT 540: Data Analysis and Regression

STAT 540: Data Analysis and Regression STAT 540: Data Analysis and Regression Wen Zhou http://www.stat.colostate.edu/~riczw/ Email: riczw@stat.colostate.edu Department of Statistics Colorado State University Fall 205 W. Zhou (Colorado State

More information