REGRESSION WITH CORRELATED ERRORS

C.A. Glasbey


SYSTEMATIC RESIDUALS

When data exhibit systematic departures from a fitted regression line (see for example Figs 1 and 2), either the regression function is inappropriate, or the errors are correlated, or both. In most cases it is assumed that the function is deficient, and it is changed. But there are situations where the assumption of independent errors is not wholly plausible. For example, some sources of error will persist over several observations when repeated measurements are made on a single experimental unit. Systematic departures may be modelled either by another regression function, or by correlated errors. To illustrate, consider

    y_i = a + b x_i + c sin(x_i) + e_i,    i = 1, ..., n,

where a, b, x_1, ..., x_n are constants, c is normally distributed with mean 0 and variance τ², and e_1, ..., e_n are independently normally distributed with means 0 and variances σ². Two models are equally valid for the y's: either they are independently normally distributed with means a + b x_i + c sin(x_i) and variances σ², or they are correlated, with means a + b x_i, variances τ² sin²(x_i) + σ², and covariances τ² sin(x_i) sin(x_j) between y_i and y_j. In general the modelling objective determines the choice: for a simple summary it may be preferable for the regression function to explain all systematic variability, whereas a correlated stochastic component may be of more assistance in understanding the data-generating mechanism. A succinct summary of data is often achieved by using the regression function to describe the long-term trends, and the correlations the short-term fluctuations. This will be the assumed case from now on.

CORRELATED ERRORS

In the presence of correlated errors, ordinary least squares regression parameter estimators remain unbiased, but they may be inefficient, and the conventional estimators of the variances of

these estimators are usually biased. For example, consider

    y_i = a + b (i - 5.5) + e_i,    i = 1, ..., 10,

where e_1, ..., e_10 are normally distributed with means 0 and variances 1, and the correlation between e_i and e_j is r^|i-j|. The least squares estimators of a and b are

    â = 0.1 (y_1 + y_2 + ... + y_10),
    b̂ = 0.055 (y_10 - y_1) + 0.042 (y_9 - y_2) + 0.030 (y_8 - y_3) + 0.018 (y_7 - y_4) + 0.006 (y_6 - y_5).

If r = 0.9 then var(â) = 0.73 and var(b̂) = 0.017. The generalised least squares estimators, that is those with minimum variance, give

    b̃ = 0.107 (y_10 - y_1) + 0.003 (y_9 - y_2) + 0.002 (y_8 - y_3) + 0.001 (y_7 - y_4) + 0.0004 (y_6 - y_5),

with var(ã) = 0.68 and var(b̃) = 0.015. Therefore â is 93% efficient and b̂ is 88% efficient. However, the conventional estimators of the variances, derived under the assumption of independent e's, have expectations 0.016 (for var(â)) and 0.0020 (for var(b̂)), and are therefore very biased. Results for other values of r are given below.
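The variances quoted above for r = 0.9, and the results tabulated below for other values of r, can be reproduced numerically. A minimal sketch in Python with NumPy (not part of the original paper; the 10-point design and unit error variance are as specified above):

```python
import numpy as np

# Design: intercept plus centred linear trend, i = 1, ..., 10
n = 10
i = np.arange(1, n + 1)
X = np.column_stack([np.ones(n), i - 5.5])
XtX_inv = np.linalg.inv(X.T @ X)
H = X @ XtX_inv @ X.T                                   # hat matrix

for r in [-0.9, -0.6, -0.3, 0.0, 0.3, 0.6, 0.9]:
    Sigma = r ** np.abs(np.subtract.outer(i, i))        # corr(e_i, e_j) = r^|i-j|
    # True covariance of the least squares estimators under correlated errors
    cov_ls = XtX_inv @ X.T @ Sigma @ X @ XtX_inv
    # Covariance of the generalised least squares (minimum variance) estimators
    cov_gls = np.linalg.inv(X.T @ np.linalg.inv(Sigma) @ X)
    # Expectation of the naive variance estimate s^2 (X'X)^-1 for a,
    # using E[s^2] = tr((I - H) Sigma) / (n - 2)
    e_var_a = np.trace((np.eye(n) - H) @ Sigma) / (n - 2) * XtX_inv[0, 0]
    print(f"r={r:5.1f}  var(a): LS={cov_ls[0, 0]:.3f}  GLS={cov_gls[0, 0]:.3f}  "
          f"naive E={e_var_a:.3f}   var(b): LS={cov_ls[1, 1]:.4f}  GLS={cov_gls[1, 1]:.4f}")
# For r = 0.9 this gives var(a): LS 0.728, GLS 0.679 and var(b): LS 0.0172, GLS 0.0151,
# matching the 0.73, 0.68, 0.017 and 0.015 quoted above.
```

The efficiency and bias columns of the tables below follow as 100 × GLS/LS and 100 × (naive − LS)/LS respectively.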

Results for the intercept a (entries ×1000; LS = least squares, GLS = generalised least squares; bias is that of the conventionally estimated variance relative to the true LS variance):

    r     var(â), LS   var(â), GLS   efficiency %   E(est. var(â))   bias %
  -0.9         9             6            68             121           1318
  -0.6        30            27            91             116            292
  -0.3        57            56            98             110             91
   0.0       100           100           100             100              0
   0.3       173           171            99              85            -51
   0.6       325           308            95              59            -82
   0.9       728           679            93              16            -98

Results for the slope b (entries ×10000):

    r     var(b̂), LS   var(b̂), GLS   efficiency %   E(est. var(b̂))   bias %
  -0.9        31             9            28             146            370
  -0.6        48            38            81             141            196
  -0.3        78            75            96             133             70
   0.0       121           121           100             121              0
   0.3       182           176            97             103            -44
   0.6       250           226            90              71            -72
   0.9       172           151            88              20            -88

These results show that least squares estimators are not too inefficient provided that r is positive, but that estimated variances can be severely biased, typically downwards for positive r. The simplest solution, in general, is to discard the biased standard errors. This approach is most useful when no estimate of precision is required, for example when data are available from independent units and within-unit variability is of little importance (Rowell and Walters, 1976). Alternatively, if it can be assumed that the errors arose from a particular stochastic model, any unknown parameters can be estimated jointly with the regression ones by maximising the likelihood. Models may be either empirical or mechanistic. The mechanistic approach requires knowledge of the processes by which the data were generated, whereas the empirical method is purely data-based. Seber and Wild (1989) discuss and review a wide range of possible models.

EMPIRICAL MODELS

To adopt an empirical approach, gross assumptions have to be made about the correlation structure. Otherwise there are

far too many possible models for one to be identified from a single set of observations. For serially-structured data it may be reasonable to assume that the error process is stationary, in which case the correlations between errors depend solely on the time separation between them. Further, it may be possible to assume that the errors are a sample record from an autoregressive-moving average process of low order (Gallant and Goebel, 1976). However, maximum likelihood estimators are not as efficient as generalised least squares would suggest, because the covariance parameters have to be estimated. For example, in the first-order autoregressive case considered previously, the maximum likelihood estimator of r can be approximated by the lag-one autocorrelation of the regression residuals,

    r̂ = (ê_1 ê_2 + ê_2 ê_3 + ... + ê_9 ê_10) / (ê_1² + ê_2² + ... + ê_10²),

where ê_1, ..., ê_10 are the regression residuals, and an approximately unbiased estimate of the error variance σ² is obtained from their sum of squares. Commencing from the least squares estimates, r and σ² are estimated and used to re-estimate the regression parameters by generalised least squares; then r and σ² are re-estimated, and so on until convergence. From 1000 simulations with r = 0.9, var(â) = 0.73, which is no less than the variance of the least squares estimator. Further, standard errors are biased: E(est. var(â)) = 0.08. This is mainly a consequence of r being underestimated: the average value from the simulations was 0.2. Residual maximum likelihood estimation (Cooper and Thompson, 1977) is more complicated, but does reduce the bias in r, the average estimate being 0.6. Also, r was sometimes estimated to be 1, which implied that a could not be estimated with any accuracy at all. However, the slope parameter was still estimable, and

resulted in var(b̂) = 0.016 and E(est. var(b̂)) = 0.012. Little improvement in efficiency has been gained over least squares, but the standard errors are of the right magnitude. An alternative is simply to modify the least squares standard errors, using the fact that the least squares estimator is linear, b̂ = Σ w_i y_i, so that

    var(b̂) = Σ_i Σ_j w_i w_j cov(y_i, y_j),

which can be estimated by substituting σ̂² r̂^|i-j| for cov(y_i, y_j). From the simulations, E(est. var(b̂)) = 0.013. If the assumed error model is incorrect, then least squares can be more efficient than the supposed maximum likelihood estimation. For example, if the above case is modified so that y_1 and y_10 are independent of the remaining eight data values, but the first-order autoregressive model is still assumed to hold, then var(b̂) = 0.012 for least squares but 0.021 for the supposed maximum likelihood estimator.

MECHANISTIC MODELS

The mechanisms by which errors are correlated are sometimes known precisely; for instance, when data are obtained by applying algebraic operations such as summing, averaging or differencing to independent observations. On other occasions the covariances are known to within a few parameters. Many non-linear regression equations have some justification in terms of underlying deterministic models such as differential equations (Sandland and McGilchrist, 1979) or compartment systems (Matis and Wehrly, 1979). By making these models stochastic, regression functions and error processes can be generated with shared parameters. If a stochastic model is appropriate then, in fitting it to data,

the most efficient parameter estimators are obtained, because the regression curve is fitted efficiently, and also because extra information on the parameter values is recovered from the error covariances. Glasbey (1988) used two data sets to illustrate the methods and the problems encountered. Drug-induced currents in ion channels, as in Fig 1, were satisfactorily represented by a stochastic compartment system. First-order linear stochastic difference equations were used to model the milk yield of cows, as for example in Fig 2. In this case the proposed model did not fit adequately, and the results it produced were potentially very misleading. In general, for mechanistic error models it is recommended that (i) an independent observation error is included, (ii) the goodness-of-fit is tested by refitting the model with separate parameters in the regression and error components, and (iii) a stochastic model is not used unless it has a sound scientific basis. It should not be assumed that a particular stochastic model is appropriate simply because its deterministic counterpart fits well. Sources of error are usually many and varied, and it is safer to model them separately from the regression model.

CONCLUSIONS

Maximum likelihood estimators which are based on the wrong error variance model may be even less efficient, and the estimated standard errors even more biased, than the ordinary least squares ones (Engle, 1974). Moreover, regression parameters can have a different interpretation when errors are modelled by a correlated process. The regression model and the error model jointly describe a data set, so a change in the error model forces a compensatory change in the regression model. For example, the fitted regression will make larger systematic departures from the data if errors are assumed to be highly correlated than if they are assumed to be independent.
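The first point can be illustrated with the earlier misspecification example (y_1 and y_10 independent of the rest, first-order autoregression still assumed). The sketch below is not from the paper: it uses the generalised least squares weights for fixed r = 0.9 as a stand-in for the full maximum likelihood fit, whereas the paper's 0.021 came from simulations with estimated parameters; the qualitative ordering against the least squares value of 0.012 is the same.

```python
import numpy as np

n = 10
i = np.arange(1, n + 1)
X = np.column_stack([np.ones(n), i - 5.5])

# Assumed covariance: AR(1) correlation r = 0.9 among all ten errors
Sigma_assumed = 0.9 ** np.abs(np.subtract.outer(i, i))
# True covariance: y1 and y10 independent of the middle eight AR(1) values
Sigma_true = Sigma_assumed.copy()
Sigma_true[0, 1:] = Sigma_true[1:, 0] = 0.0
Sigma_true[-1, :-1] = Sigma_true[:-1, -1] = 0.0

# Both estimators of b are linear in y, so their true variance is w' Sigma_true w
w_ls = (np.linalg.inv(X.T @ X) @ X.T)[1]              # least squares weights for b
P = np.linalg.inv(Sigma_assumed)
w_wrong = (np.linalg.inv(X.T @ P @ X) @ X.T @ P)[1]   # weights that trust the AR(1) model

var_ls = w_ls @ Sigma_true @ w_ls         # ~0.012, as in the text
var_wrong = w_wrong @ Sigma_true @ w_wrong
print(var_ls, var_wrong)                  # the "efficient" weights do worse
```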
Therefore, although the estimation of regression parameters with almost any choice of variance matrix has become computationally easy, it is beset with statistical difficulties.
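The alternating estimation scheme described under EMPIRICAL MODELS (least squares fit, estimate r and σ² from the residuals, re-fit by generalised least squares, repeat until convergence) can be sketched as follows. The function name and the simulated data are illustrative assumptions, not Glasbey's code:

```python
import numpy as np

def iterated_gls(y, X, n_iter=20):
    """Alternate between estimating the regression parameters and the
    first-order autocorrelation r of the errors (a feasible GLS sketch)."""
    n = len(y)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]    # commence from least squares
    for _ in range(n_iter):
        e = y - X @ beta                           # current residuals
        r = (e[1:] @ e[:-1]) / (e @ e)             # approximate ML estimate of r
        Sigma = r ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
        W = np.linalg.inv(Sigma)
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)   # generalised least squares
    return beta, r

# Illustrative data: the 10-point design with AR(1) errors, r = 0.9
rng = np.random.default_rng(1)
i = np.arange(1, 11)
X = np.column_stack([np.ones(10), i - 5.5])
Sigma_true = 0.9 ** np.abs(np.subtract.outer(i, i))
y = 2.0 + 0.5 * (i - 5.5) + np.linalg.cholesky(Sigma_true) @ rng.normal(size=10)

beta, r_hat = iterated_gls(y, X)
print(beta, r_hat)
```

In the paper's simulations this kind of scheme underestimated r (an average of 0.2 when the true value was 0.9), which is why the resulting standard errors remained biased.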

Fig 1. Current through end-plate membrane of muscle fibre after a voltage jump, and least squares fit of exponential curve. [Axes: Current (nA) against Time (ms).]

Fig 2. Daily milk yields of a cow at weekly intervals, and least squares fit of a lactation curve (exponential plus linear trend). [Axes: Milk yield (kg/day) against Time (weeks).]

REFERENCES

Cooper, D.M. & Thompson, R. (1977) A note on the estimation of the autoregressive-moving average process, Biometrika, 64, 625-628.
Engle, R.F. (1974) Specification of the disturbance for efficient estimation, Econometrica, 42, 135-146.
Gallant, A.R. & Goebel, J.J. (1976) Nonlinear regression with autocorrelated errors, Journal of the American Statistical Association, 71, 961-967.
Glasbey, C.A. (1988) Examples of regression with serially correlated errors, The Statistician, 37, 277-291.
Matis, J.H. & Wehrly, T.E. (1979) Stochastic models of compartmental systems, Biometrics, 35, 199-220.
Rowell, J.G. & Walters, D.E. (1976) Analysing data with repeated observations on each experimental unit, Journal of Agricultural Science, Cambridge, 87, 423-432.
Sandland, R.L. & McGilchrist, C.A. (1979) Stochastic growth curve analysis, Biometrics, 35, 255-271.
Seber, G.A.F. & Wild, C.J. (1989) Nonlinear Regression, New York, Wiley, chapters 6-8.

Scottish Agricultural Statistics Service
JCMB, The King's Buildings
Edinburgh EH9 3JZ
Scotland
