
STAT 501 - Regression Methods
Unit 9 Examples

Example 1: Quake Data

Let y_t = the annual number of worldwide earthquakes with magnitude greater than 7 on the Richter scale, for n = 99 years. Figure 1 gives a time series plot showing a slowly cycling pattern (gradual increases and decreases) for this dataset.

Figure 1: Minitab output for a time series plot over the 99-year time period for the quake dataset.

Identifying the Order of an Autoregression Model

Figure 2 gives a plot of the number of quakes versus the number of quakes in the previous year. (In Minitab, we used Stat > Time Series > Lag to create the column called lag1quakes.) There is a moderate linear pattern, suggesting that the first-order autoregression model

y_t = β_0 + β_1 y_{t-1} + ε_t

could be useful. Figure 3 gives a plot of the PACF (partial autocorrelation function), which can be interpreted to mean that a first-order autoregression may be sufficient. The vertical scale gives the value of the partial correlation, and the horizontal scale gives the lag (the time span between values). The only correlation of notable size is at lag 1.
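The lag-plot/PACF diagnosis above can be sketched in a few lines of Python. This is a minimal illustration on simulated AR(1) data (the quake series itself is not reproduced here); it uses the fact that the partial autocorrelation at lag p equals the last coefficient of a least-squares AR(p) fit:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate an AR(1) series y_t = 10 + 0.6*y_{t-1} + u_t (a stand-in for the quake counts)
n = 500
y = np.empty(n)
y[0] = 25.0
for t in range(1, n):
    y[t] = 10 + 0.6 * y[t - 1] + rng.normal(0, 2)

def ar_fit(y, p):
    """Least-squares fit of y_t on y_{t-1}, ..., y_{t-p}; returns the lag coefficients."""
    Y = y[p:]
    X = np.column_stack([np.ones(len(Y))] + [y[p - k:-k] for k in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return beta[1:]  # drop the intercept

# The partial autocorrelation at lag p is the last coefficient of an AR(p) fit.
pacf1 = ar_fit(y, 1)[-1]
pacf2 = ar_fit(y, 2)[-1]
print(f"PACF(1) = {pacf1:.3f}, PACF(2) = {pacf2:.3f}")
# A large PACF(1) together with a near-zero PACF(2) points to a first-order autoregression.
```

On real data, a PACF plot such as Minitab's also shows significance bounds of roughly ±2/√n for judging which spikes are notable.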

Figure 2: Scatterplot of the number of earthquakes versus the number of earthquakes in the previous year (lag 1).

The next step is to do a simple regression with the number of quakes as the response variable and the number of quakes in the previous year as the predictor variable. The results are given in Figure 4, and we see that the predictor in this case is highly significant.

Finally, we obtain the autocorrelations within the series of residuals from the model estimated in Figure 4. These autocorrelations are given in Figure 5. The vertical scale (negative or positive) gives the value of the correlation, and the horizontal scale gives the lag. There are only weak correlations, so the residuals from the first-order autoregression model could reasonably be assumed to be independent of each other.

Simple Regression with Autoregressive (Autocorrelated) Errors

Chapter 12 of the textbook considers the problem of autocorrelation within the errors (so they are not independent over time) when the y-variable and x-variable(s) are measured over time. It is assumed that the errors may follow a first-order autoregression model. Thus, the overall model is:

y_t = β_0 + β_1 x_t + ε_t
ε_t = ρ ε_{t-1} + u_t,

where we assume the u_t are iid with mean 0 and variance σ^2, and that each u_t is independent of the past errors ε_{t-1}, ε_{t-2}, .... It can be shown that the parameter ρ equals the correlation between ε_t and ε_{t-1}.

The big issue is that if the residuals are dependent in time, then standard errors for the coefficients calculated under the assumption of independent residuals will misstate the true standard errors.

Examining Whether This Model May Be Necessary

The steps in Minitab are:

1. Start by doing an ordinary regression. Store the residuals.

Figure 3: PACF plot for the earthquake data.

Figure 4: Minitab output pertaining to the first-order autoregression model.

Figure 5: Autocorrelations for the model estimated in Figure 4.

2. Plot the residuals in time order (either using Minitab's graph options within Regression, or with a time series plot of the stored residuals: Graph > Time Series Plot). A slowly undulating time series plot (long sequences of residuals on the same side of zero) indicates a correlation between e_t and e_{t-1}.

3. Use Stat > Time Series > Lag to create a column of lagged residuals e_{t-1}. Plot e_t versus e_{t-1}. A linear pattern indicates autocorrelation in the errors.

4. Calculate the correlation between e_t and e_{t-1} (Stat > Basic Statistics > Correlation). Examine its statistical significance.

Example 2: Oil Data

The data are U.S. oil and gas price index values for 82 months. There is a strong linear pattern in the relationship between the two variables, as can be seen in Figure 6. We start the analysis by doing a simple linear regression; Minitab results for this analysis are given in Figure 7.

The residuals in time order show a dependent pattern (see the plot in Figure 8). The slow cyclical pattern that we see happens because there is a tendency for residuals to keep the same algebraic sign for several consecutive months. We also used Stat > Time Series > Lag to create a column of the lag 1 residuals. The correlation coefficient between the residuals and the lagged residuals is 0.829 (calculated using Stat > Basic Statistics > Correlation; see the bottom of Figure 8).

So the overall analysis strategy in the presence of autocorrelated errors is as follows:

1. Do an ordinary regression.
2. Identify the difficulty in the model (autocorrelated errors).
3. Using the stored residuals from the linear regression, use regression to estimate the model for the errors, ε_t = ρ ε_{t-1} + u_t, where the u_t are iid with mean 0 and variance σ^2.
4. Adjust the parameter estimates and their standard errors from the original regression.

A Method for Adjusting the Original Parameter Estimates (Cochrane-Orcutt Method)

Let ρ̂ = the estimated lag 1 autocorrelation in the residuals from the ordinary regression (in the U.S. oil example, ρ̂ = 0.829).

Let y*_t = y_t − ρ̂ y_{t-1}. This will be used as the response variable.

Let x*_t = x_t − ρ̂ x_{t-1}. This will be used as the predictor variable.

Do an ordinary regression between y*_t and x*_t. This model should have time-independent residuals. The sample slope from this regression directly estimates β_1, the slope of the relationship between the original y and x.
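The Cochrane-Orcutt steps can be sketched in Python. The data below are simulated (hypothetical true values β_0 = 5, β_1 = 2, ρ = 0.8), not the oil data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate y_t = 5 + 2*x_t + eps_t with AR(1) errors eps_t = 0.8*eps_{t-1} + u_t
n = 300
x = np.linspace(0, 10, n)
eps = np.empty(n)
eps[0] = rng.normal(0, 1)
for t in range(1, n):
    eps[t] = 0.8 * eps[t - 1] + rng.normal(0, 1)
y = 5 + 2 * x + eps

def ols(x, y):
    """Simple-regression intercept and slope by least squares."""
    b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# Step 1: ordinary regression; store the residuals
b0, b1 = ols(x, y)
e = y - (b0 + b1 * x)

# Step 2: estimate rho as the lag-1 autocorrelation of the residuals
rho_hat = np.corrcoef(e[1:], e[:-1])[0, 1]

# Step 3: Cochrane-Orcutt transformed variables y*_t and x*_t
y_star = y[1:] - rho_hat * y[:-1]
x_star = x[1:] - rho_hat * x[:-1]

# Step 4: regress y* on x*; the slope estimates beta_1 directly,
# and the intercept is back-transformed by dividing by (1 - rho_hat)
b0_star, b1_star = ols(x_star, y_star)
b0_adj = b0_star / (1 - rho_hat)
print(f"rho_hat = {rho_hat:.3f}, slope = {b1_star:.3f}, intercept = {b0_adj:.3f}")
```

With this seed the estimates land close to the true values (ρ near 0.8, slope near 2, intercept near 5), and the residuals of the transformed regression are approximately time-independent.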

Figure 6: Scatterplot of gas prices versus oil prices.

Figure 7: Regression analysis for the U.S. oil data.

The correct estimate of the intercept for the original y versus x relationship is calculated as β̂_0 = β̂*_0 / (1 − ρ̂), where β̂*_0 is the sample intercept obtained from the regression done with the modified variables.

Returning to the U.S. oil data, ρ̂ = 0.829 and the modified variables are ynew = y_t − 0.829 y_{t-1} and xnew = x_t − 0.829 x_{t-1}. The regression results are given in Figure 9.

Parameter Estimates for the Original Model

Our real goal is to estimate the original model y_t = β_0 + β_1 x_t + ε_t. The estimates come from the results just given:

β̂_1 = 1.08073
β̂_0 = 1.412 / (1 − 0.829) = 8.257

These estimates give the sample regression model

y_t = 8.257 + 1.08073 x_t + ε_t, with ε_t = 0.829 ε_{t-1} + u_t,

where the u_t are iid with mean 0 and variance σ^2.

Figure 8: A plot of the residuals in time order, along with the correlation results, for the U.S. oil data.

Figure 9: Regression results for the U.S. oil data using the modified variables.

Correct Standard Errors for the Coefficients

The correct standard error for the slope is taken directly from the regression with the modified variables. The correct standard error for the intercept is

s.e.(β̂_0) = s.e.(β̂*_0) / (1 − ρ̂).

Coefficient   Correct Estimate           Correct Standard Error      Wrong Estimate   Wrong Standard Error
Intercept     1.412/(1−0.829) = 8.257    2.529/(1−0.829) = 14.79     −31.349          5.29
Slope         1.08073                    0.05960                     1.17677          0.02305

Table 1: Correct and wrong estimates and standard errors for the coefficients.

Table 1 compares the correct standard errors to the incorrect values based on the ordinary regression. The correct estimates come from the work done in this section of the notes; the wrong estimates are the regression estimates reported in Figure 7.
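As a quick arithmetic check on Table 1 and the intercept back-transform, using only the numbers reported in the notes:

```python
# Figures reported in the notes for the modified-variable regression
rho_hat = 0.829
b0_star = 1.412        # intercept from the transformed regression
se_b0_star = 2.529     # its standard error

# Back-transform the intercept and its standard error by dividing by (1 - rho_hat)
b0 = b0_star / (1 - rho_hat)
se_b0 = se_b0_star / (1 - rho_hat)
print(f"intercept = {b0:.3f}, s.e. = {se_b0:.2f}")   # prints intercept = 8.257, s.e. = 14.79
```

The slope and its standard error need no adjustment; they are read directly from the transformed regression.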

Notice that the correct standard errors are larger than the incorrect values. If ordinary least squares estimation is used when the errors are autocorrelated, the standard errors often are underestimated. It is important to note, though, that this does not always happen: underestimation of the standard errors is an on-average tendency over all problems.

Prediction Issues

When calculating predicted values, it is important to utilize ε_t = ρ ε_{t-1} + u_t as part of the process. In the U.S. oil example,

ŷ_t = 8.257 + 1.08073 x_t + 0.829 e_{t-1}.

Values of ŷ_t are computed iteratively:

1. Assume e_0 = 0 (the error before t = 1 is 0). Compute ŷ_1 and e_1 = y_1 − ŷ_1.
2. Use the value of e_1 when computing ŷ_2 = 8.257 + 1.08073 x_2 + 0.829 e_1.
3. Determine e_2 = y_2 − ŷ_2, and use that value when computing ŷ_3 = 8.257 + 1.08073 x_3 + 0.829 e_2.
4. Iterate.
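The iterative prediction scheme just described can be written as a short loop. The x and y values below are made up for illustration; the coefficients 8.257, 1.08073, and 0.829 are those estimated above:

```python
import numpy as np

# Fitted model from the notes: y_t = 8.257 + 1.08073*x_t, with errors e_t = 0.829*e_{t-1} + u_t
b0, b1, rho = 8.257, 1.08073, 0.829

# Hypothetical observed series (stand-ins for the oil and gas price indexes)
x = np.array([20.0, 21.5, 22.0, 23.1, 24.0])
y = np.array([30.1, 32.0, 32.4, 33.9, 35.2])

y_hat = np.empty(len(y))
e_prev = 0.0                                   # e_0 = 0: no error before t = 1
for t in range(len(y)):
    y_hat[t] = b0 + b1 * x[t] + rho * e_prev   # one-step-ahead prediction
    e_prev = y[t] - y_hat[t]                   # this residual feeds the next prediction
print(np.round(y_hat, 3))
```

Each prediction reuses the previous step's residual, which is exactly the "compute ŷ_t, then e_t, then ŷ_{t+1}" iteration in the list above.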