
Regression with Time Series Errors

David A. Dickey, North Carolina State University

Abstract: The basic assumptions of regression are reviewed. Graphical and statistical methods for checking the assumptions are presented using a sales example. Departures from independence in time series data are emphasized and illustrated in the example. Several products from SAS Institute for analyzing regressions with time series errors are illustrated. The importance of the stochastic properties of the model input variables is emphasized. Forecasts from several models for the example data are compared.

1. Introduction: Regression is a tool that allows one to model the relationship between a response variable Y, which might be a mail order company's sales, and some explanatory variables usually denoted X_j, where X_1 might be the cost of one item from the company, X_2 the cost of a similar item from a competitor company, and X_3 the number of phone calls coming in to the company's switchboard. A typical regression model for this situation is

Y = β_0 + β_1 X_1 + β_2 X_2 + β_3 X_3 + e

where the regression coefficients, the β's, are unknown. You would like to estimate these β's, for if you could, you would then have an equation for predicting a future Y from the associated X's. Notice that even if the regression coefficients were known, such a prediction would require knowledge of future X values. For example, if t represents time and X = t, then part of our model consists of a simple linear time trend and there will be no surprises when we try to extend the time sequence 1, 2, 3, ..., n into the future. On the other hand, if X_t is the number of incoming phone calls at time t, then forecasting to time n+1 would require that some value be inserted for X_{n+1}, and this value will itself likely be a forecast. These two examples represent deterministic and stochastic explanatory variables, respectively.
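To make the estimation target concrete, here is a minimal Python sketch (mine, not part of the paper, which uses SAS) that fits Y = β_0 + β_1 X_1 + β_2 X_2 + β_3 X_3 + e by ordinary least squares via the normal equations; the data and coefficient values are made up for illustration.

```python
import random

def ols(X, y):
    """Ordinary least squares: solve the normal equations (X'X) b = X'y
    by Gaussian elimination with partial pivoting. Each row of X includes
    a leading 1 for the intercept."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    c = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        c[i], c[p] = c[p], c[i]
        for r in range(i + 1, k):
            m = A[r][i] / A[i][i]
            for j in range(i, k):
                A[r][j] -= m * A[i][j]
            c[r] -= m * c[i]
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b

rng = random.Random(1)
beta = [2.0, 0.5, -0.3, 1.2]          # "true" beta0..beta3, made up
X = [[1.0, rng.uniform(0, 10), rng.uniform(0, 10), rng.uniform(0, 10)]
     for _ in range(200)]
y = [sum(bj * xj for bj, xj in zip(beta, row)) + rng.gauss(0, 1) for row in X]
b = ols(X, y)
print([round(v, 2) for v in b])       # estimates close to the true betas
```

This is exactly the minimization PROC REG and PROC GLM perform; the point of the rest of the paper is what goes wrong with the accompanying standard errors when the errors e are autocorrelated.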
The nature of the X variables will affect the forecast accuracy - obviously a person forecasting with a known future X is better off than one who must estimate that future X. Thus a problem we will need to deal with, if we want to put some sort of error bounds on our forecasts, is the incorporation of our level of uncertainty about the future X values.

The usual way of estimating the β's is the method used in PROC GLM and PROC REG. The method is referred to as ordinary least squares in that it finds estimates b_j of the β_j parameters that minimize the error sum of squares

SSE = Σ (t=1 to n) (Y_t − (b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3))^2.

This SSE is a function of the estimates b_j, and much of the subject of calculus is concerned with finding values of arguments, like these b_j, that minimize a function, SSE in our case. Thus we have mathematical tools, relatively easy to implement on the computer, that allow us to find the minimizing values. This is what PROC REG and PROC GLM are set up to do. Furthermore, statistical theory allows us to compute measures of uncertainty called standard errors for these b_j estimates and the resulting forecasts if certain conditions are satisfied.

Note the expression "if certain conditions are satisfied." It is this with which we are concerned here. In this paper we review these "certain conditions," indicate why they might be violated when data are taken over time, present methods for checking these conditions, and finally present corrections that can be applied if the conditions are violated. The corrections that we speak of are implemented in SAS Institute's PROC AUTOREG. Throughout the paper we will use an artificial example in which X_t represents the number of phone calls in week t to a mail order company and Y_t is the number of shipments for that week. Figure 1 shows the data over a 3 year period. We are interested in estimating the company's growth, estimating the number of shipments generated per phone call, and forecasting phone calls and sales two weeks into the future.

2. Checking the usual assumptions. Our model is Y_t = β_0 + β_1 X_t + β_2 t + e_t. We assume

(A) Normality: the errors all come from normal distributions.
(B) Homogeneity: these normal distributions all have mean 0 and the same variance, σ^2.
(C) Independence: the correlation between e_i and e_j is 0 (for i not equal to j).

We can check the normality assumption by drawing histograms and normal probability plots of the residuals. In Figures 2-4 we see a histogram of the residuals, a hanging histogram in which each bar becomes a line segment at the former bar midpoint, this line being hung from the normal curve rather than rising from the horizontal axis, and a plot of the residuals against their normal scores. These are very easy to produce using the following code:

proc capability graphics;
histogram r / normal hanging vref=0;
histogram r / normal;
qqplot r / normal (mu=est sigma=est);

The histograms look reasonably normal and the quantile-quantile plot reasonably straight.
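The normal scores behind such a quantile-quantile plot are easy to compute outside SAS as well. A small stdlib-only Python sketch (mine; the Blom-type plotting positions (i − 0.375)/(n + 0.25) are a common convention, not necessarily the one SAS uses):

```python
import random
from statistics import NormalDist

def normal_scores(resids):
    """Pair each sorted residual with its approximate expected normal
    quantile, using plotting positions (i - 0.375) / (n + 0.25)."""
    n = len(resids)
    nd = NormalDist()
    qs = [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(sorted(resids), qs))

rng = random.Random(7)
r = [rng.gauss(0, 20) for _ in range(156)]   # stand-in for regression residuals
pairs = normal_scores(r)
# For roughly normal residuals, plotting the sorted residuals against these
# scores gives an approximately straight line with slope near sigma.
```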
PROC CAPABILITY also presents tests of the normality hypothesis, but the theory behind these assumes independence, an assumption we have yet to check. Not shown is a simple plot of residuals against predicted values. Because this looks uniform (as opposed to megaphone shaped), this check on the homogeneous variance assumption does not give us cause for concern. The regression and subsequent calculation of residuals was accomplished with this code:

proc reg;
model y = t x / dw;
proc reg;
model y = t / dw;

where Y is sales, X phone calls, and t week number. The previous residual analysis was from the first regression. The advantage of the second regression is that only future values of t would be needed for forecasting, whereas for the first model we would need to know, or at least estimate, next week's phone calls to forecast sales. Notice the dw options. These request the "Durbin-Watson" statistic, which is a test for autocorrelation, that is, correlation between successive residuals. Autocorrelation is a commonly occurring violation of the independence assumption when data are taken over time. The option also gives an estimate b of the first order autocorrelation. We get dw = 1.407 and b = .283 for the first model, dw = .969 and b = .497 for the second model.

3. The Durbin-Watson statistic and first order autocorrelation. The Durbin-Watson statistic is

dw = Σ (t=2 to n) (r_t − r_{t−1})^2 / Σ (t=1 to n) r_t^2

where r_t is the residual at time t. If e_t represents white noise (an uncorrelated sequence), then we find these expected values:

E{(e_t − e_{t−1})^2} = E{e_t^2 − 2 e_t e_{t−1} + e_{t−1}^2} = σ^2 + 0 + σ^2 = 2 E{e_t^2}.

Thus Σ (t=2 to n) (e_t − e_{t−1})^2 / Σ (t=1 to n) e_t^2 should be near 2; that is, the Durbin-Watson statistic should be near 2 if calculated on a white noise sequence. If there is positive correlation between neighboring e's, then e_t and e_{t−1} would be more alike than in the white noise case, so that e_t − e_{t−1} would be smaller in magnitude and thus dw would move toward 0. The first order correlation in the residuals is

b = Σ (t=2 to n) (r_t − r̄)(r_{t−1} − r̄) / Σ (t=1 to n) (r_t − r̄)^2,

which is very close to what one would get by simply inserting r_t and r_{t−1} into the standard formula for a correlation. If, as in our example, the regression contains an intercept, then r̄ = 0. It is well known that if b is computed on a white noise series and if the sample size n is reasonably large, then Z = √n b / √(1 − b^2) is approximately a N(0, 1) random variable, so that for large samples, values of |Z| exceeding 1.96 would give us reason to suspect that autocorrelation is present. A bit of algebra demonstrates that the Durbin-Watson statistic is roughly equal to 2(1 − b), so that from our Z we could get a large sample approximate distribution for the Durbin-Watson statistic. The real contribution of Durbin and Watson was to show how to get the exact finite sample distribution of the statistic dw. Unfortunately the Durbin-Watson theory shows that the exact distribution depends on the values of the X explanatory variables in the regression, so that each new problem encountered would require a new table of critical values.

However, if none of the X variables are lagged Y values and the errors are normal, they were able to calculate bounds for all critical values. Thus if you enter the tables of Durbin and Watson for a certain sample size and number of explanatory variables, you will see upper and lower bounds for the true critical value. A dw to the left of the lower bound is clearly less than the critical value and thus too close to 0 to accept the independence hypothesis (under which dw should be near 2). A dw to the right of the upper bound makes it clear that dw is closer to 2 than is the critical value, so we cannot reject independence. A dw between the bounds just tells you that the calculated dw and the critical value are between these numbers, so you have no idea how they are placed relative to each other. Durbin and Watson also gave a computationally intensive way of computing p-values using the observed X's. We will see how to get p-values from PROC AUTOREG. It should be noted that the restriction that lagged Y's not be included in the explanatory X variables still holds, so that a model with lagged Y's, explicitly or implicitly among the explanatory variables, would not produce exact p-values using the Durbin-Watson method. Because the large sample Z approximation is reasonably good, the tables of Durbin and Watson typically do not extend to very large n values, so for our example residuals we use

Z = √n b / √(1 − b^2), getting √156 (.283) / √(1 − .283^2) = 3.7 for the first model, and √156 (.497) / √(1 − .497^2) = 7.2 for the second model. Using 1.96 as a critical value, we have strong evidence for autocorrelation in our example. For our example we have normal residuals with homogeneous variance, but they are clearly not independent for either of our models.

4. Adjusting for autocorrelation. Suppose we have a simple linear regression model Y_t = β_0 + β_1 X_1t + a_t where, instead of white noise, our error term a_t satisfies a model such as a_t = ρ a_{t−1} + e_t. What happens if we just ignore the autocorrelation?

(A) The estimates of the regression coefficients are still unbiased.
(B) The estimates of the regression coefficients vary more from sample to sample than do the best estimates, but still may be reasonably efficient.
(C) Estimates of standard errors for coefficients and anything computed from them (t statistics, p-values and confidence intervals, for example) are biased - often badly biased.

Using our simple linear regression and an order 1 error process a_t = ρ a_{t−1} + e_t, we note that the equation holds at both times t and t−1, so that, multiplying the lagged equation through by the autoregressive parameter ρ,

Y_t = β_0 + β_1 X_1t + a_t
ρ Y_{t−1} = ρ β_0 + ρ β_1 X_{1,t−1} + ρ a_{t−1},

and subtracting, we have the transformed model

(Y_t − ρ Y_{t−1}) = β_0 (1 − ρ) + β_1 (X_1t − ρ X_{1,t−1}) + (a_t − ρ a_{t−1}).

More generally, the error term may satisfy a model such as

a_t = α_1 a_{t−1} + α_2 a_{t−2} + ... + α_p a_{t−p} + e_t.

This error model is called autoregressive of order p, where the order p refers to the number of lagged a's appearing in the equation for a_t. If p = 1 then the first order autocorrelation coefficient from PROC REG is a reasonable estimate of ρ, but in higher order models the relationship between the autoregressive coefficients (the α's) and the autocorrelations is much more convoluted.
Now we note the following points about this transformed model:

(A) It is a linear model in the transformed variables (in parentheses).
(B) It has the same coefficients as the original model.
(C) Its error term is (a_t − ρ a_{t−1}), and since a_t = ρ a_{t−1} + e_t this error is just e_t; that is, this new model satisfies all the usual regression assumptions!
(D) It has n − 1, not n, observations.
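The transformed regression just described can be sketched in a few lines of stdlib-only Python (my illustration of the idea, not PROC AUTOREG's actual computation): estimate ρ from ordinary least squares residuals, build the transformed variables, and re-fit. All parameter values are made up.

```python
import random

def simple_ols(x, y):
    """Closed-form least squares for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

# Simulated data: Y_t = 10 + 2*X_t + a_t with a_t = 0.5*a_{t-1} + e_t.
rng = random.Random(5)
x, y, a = [], [], 0.0
for _ in range(500):
    a = 0.5 * a + rng.gauss(0, 1)
    xt = rng.uniform(0, 10)
    x.append(xt)
    y.append(10.0 + 2.0 * xt + a)

# Step 1: ordinary least squares, then estimate rho from the residuals
# (r-bar = 0 because the regression contains an intercept).
a0, b1 = simple_ols(x, y)
r = [yi - a0 - b1 * xi for xi, yi in zip(x, y)]
rho = sum(r[t] * r[t - 1] for t in range(1, len(r))) / sum(v * v for v in r)

# Step 2: regress (Y_t - rho*Y_{t-1}) on (X_t - rho*X_{t-1}); the intercept
# of the transformed model estimates beta0*(1 - rho), with n - 1 rows.
ystar = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
xstar = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
c0, beta1 = simple_ols(xstar, ystar)
beta0 = c0 / (1 - rho)
print(round(rho, 2), round(beta0, 1), round(beta1, 2))
```

The estimates land near the true values ρ = 0.5, β_0 = 10, β_1 = 2, and crucially the transformed model's residuals are (approximately) white noise, so its standard errors can be trusted.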

Note that point (C) implies that running an ordinary regression would suffice if we knew ρ (or could approximate it well). We can recover the lost observation by noting that

√(1 − ρ^2) Y_1 = β_0 √(1 − ρ^2) + β_1 √(1 − ρ^2) X_11 + √(1 − ρ^2) a_1.

This works because Var(a_t) = σ^2 / (1 − ρ^2), so the error term √(1 − ρ^2) a_1 has variance σ^2 like the others. What we will do is regress column 1 on columns 2 and 3 in this table, using the first order autocorrelation of the residuals from an initial ordinary least squares regression as an estimate of ρ:

√(1 − ρ^2) Y_1    √(1 − ρ^2)    √(1 − ρ^2) X_1
Y_2 − ρ Y_1       1 − ρ         X_2 − ρ X_1
Y_3 − ρ Y_2       1 − ρ         X_3 − ρ X_2
...
Y_n − ρ Y_{n−1}   1 − ρ         X_n − ρ X_{n−1}

These new estimates of the β's can be used to compute new residuals, a new estimate of ρ, new columns in the table, and the whole process can be iterated until the estimates stabilize. Alternatively, one can simply notice at the outset that this whole procedure amounts to an attempt to minimize the error sum of squares in a nonlinear model and thus use standard nonlinear search techniques (i.e. full blown maximum likelihood estimation of ρ and the β's simultaneously instead of the alternating technique above). Either way, we have estimated the model

Y_t = ρ Y_{t−1} + β_0 (1 − ρ) + β_1 (X_1t − ρ X_{1,t−1}) + e_t,

which is seen to be a model that includes lagged Y's among the explanatory variables. Now it is possible that the model needs more than 1 lagged residual. The procedure is essentially the same; we just need more terms in the transformation, and more than 1 observation at the beginning needs to be recovered in a special way.

5. PROC AUTOREG for the sales data. We use PROC AUTOREG on our sales data with the following code:

proc autoreg;
model y = t x / nlag=4 backstep dwprob partial;

The output begins with ordinary least squares, used to get the residuals and model them. Next come autocorrelations of the residuals. The lag j autocorrelation is simply an estimate of the correlation between a_t and a_{t−j} computed from the ordinary least squares residuals r_t (think of r_t as an estimate of a_t).
The j-th partial autocorrelation can be interpreted as an estimate of the last coefficient in the regression of a_t on a_{t−1}, ..., a_{t−j}, and thus would be 0 after the appropriate number of autoregressive lags are included in the model. Time series experts use autocorrelations and partial autocorrelations to diagnose the nature of correlation in the residuals. The drop in the autocorrelations after lag 1 is more dramatic than that in the partial autocorrelations. This suggests that a model other than autoregressive for the error terms might be considered, thus taking us into the realm of PROC ARIMA, which we discuss later.

Estimates of Autocorrelations

Lag   Covariance    Correlation
0     446.1458       1.000000
1     126.196        0.282858
2     -17.9134      -0.040151
3      27.51148      0.061665
4       8.913212     0.019978
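The diagnostics used in Sections 3-5 are straightforward to compute directly. A stdlib-only Python sketch (mine, illustrative; not SAS's exact algorithms) of the Durbin-Watson statistic, the large-sample Z, the autocorrelations, and the partial autocorrelations via the Durbin-Levinson recursion, demonstrated on a simulated AR(1) series:

```python
import math
import random

def durbin_watson(r):
    """dw = sum_{t=2..n}(r_t - r_{t-1})^2 / sum_{t=1..n} r_t^2 (Section 3)."""
    num = sum((r[t] - r[t - 1]) ** 2 for t in range(1, len(r)))
    return num / sum(v * v for v in r)

def z_stat(b, n):
    """Large-sample Z = sqrt(n) * b / sqrt(1 - b^2) for a lag 1 correlation b."""
    return math.sqrt(n) * b / math.sqrt(1 - b * b)

def acf(x, maxlag):
    """Autocorrelations r_1..r_maxlag of a series."""
    n, m = len(x), sum(x) / len(x)
    d = sum((v - m) ** 2 for v in x)
    return [sum((x[t] - m) * (x[t - k] - m) for t in range(k, n)) / d
            for k in range(1, maxlag + 1)]

def pacf(x, maxlag):
    """Partial autocorrelations via the Durbin-Levinson recursion; the k-th
    value estimates the last coefficient in a regression of x_t on
    x_{t-1}, ..., x_{t-k}."""
    r = acf(x, maxlag)
    prev, out = [r[0]], [r[0]]
    for k in range(2, maxlag + 1):
        num = r[k - 1] - sum(prev[j] * r[k - 2 - j] for j in range(k - 1))
        den = 1 - sum(prev[j] * r[j] for j in range(k - 1))
        pk = num / den
        prev = [prev[j] - pk * prev[k - 2 - j] for j in range(k - 1)] + [pk]
        out.append(pk)
    return out

# Reproduce the Section 3 Z values for b = .283 and b = .497 with n = 156:
print(round(z_stat(0.283, 156), 1), round(z_stat(0.497, 156), 1))  # 3.7 7.2

# On a simulated AR(1) series with coefficient 0.5, the autocorrelations
# decay geometrically while the partial autocorrelations cut off after
# lag 1, and dw is well below 2 (near 2*(1 - 0.5) = 1):
rng = random.Random(2)
x, a = [], 0.0
for _ in range(5000):
    a = 0.5 * a + rng.gauss(0, 1)
    x.append(a)
p = pacf(x, 4)
dw = durbin_watson(x)
```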

Partial Autocorrelations

1     0.282858
2    -0.130610
3     0.123244
4    -0.047635

The next part of the output is a backward elimination of insignificant autoregressive parameters (the α's), starting with the least significant.

Backward Elimination of Autoregressive Terms

Lag   Estimate     t-ratio   Prob
4      0.047635     0.5821   0.5614
3     -0.123244    -1.5210   0.1304
2      0.130610     1.6188   0.1076

We are left, then, with just a lag 1 autoregressive model. The procedure next summarizes the model:

Estimates of the Autoregressive Parameters

Lag   Coefficient    Std Error   t Ratio
1     -0.28285812    0.077798    -3.636

Yule-Walker Estimates

SSE         63800.85    DFE             152
MSE         419.7425    Root MSE        20.48762
SBC         1401.124    AIC             1388.924
Reg Rsq     0.7101      Durbin-Watson   1.8850
Total Rsq   0.7764      PROB < DW       0.2013

The autoregressive coefficient -0.2829 divided by its standard error is t = -3.636, and since our sample size is reasonably large, a t exceeding 1.96 in magnitude is considered significant. The error mean square 419.7 estimates σ^2, the variance of e_t; SAS writes the autoregressive model with the opposite sign, so the printed coefficient -0.2829 corresponds to our error model a_t = .2829 a_{t−1} + e_t. The SBC and AIC are information criteria that can be used to compare models with differing numbers of parameters. The model delivering the smallest information criterion would be selected. The criteria trade the fit of the model off against its complexity, just as a person might trade the functionality of a new computer against its cost in deciding which machine to purchase. We observe that the Durbin-Watson statistic on the transformed model is now close to 2, and an approximate p-value larger than 0.05 appears. This is not an exact p-value, since the transformed model implicitly uses an estimated coefficient on lagged Y's to predict the current Y and thus does not satisfy Durbin and Watson's assumptions. Finally, there are 2 R-square values. This is because, in predicting Y one step ahead, we can use a prediction based on the X's and their coefficients only, or we can add to that a forecast of the next residual based on our autoregressive error model and the most recently observed residual(s).
The percentages of variation explained under these scenarios are the regression R-square and total R-square, respectively. In that sense, the total R-square is a predictability R-square, while the regression R-square tells how much of the predictability is associated with the X variables (which may be difficult or expensive to obtain - especially future values of them). The procedure next uses the estimated ρ to get the transformed variables, e.g. Y_t − 0.2829 Y_{t−1}, and runs ordinary least squares on the transformed variables. Because of the transformation, the resulting coefficients, standard errors, etc. are correct and, except for the fact that ρ is estimated instead of known, the X coefficients are the best linear unbiased estimates of the parameters. The model parameters part of the output is given next. A portion of this is shown here:

Variable    DF   B Value     t Ratio   Approx Prob
Intercept   1    5.571991     0.882    0.3793
T           1    0.038574     0.732    0.4656
X           1    0.947411    18.220    0.0001

We see that the time trend (T), which seemed to appear in our graph, does not seem important after X is included in our model. Note that this does not say that there is no increase in sales, only that there is no increase beyond what would have been predicted from incoming phone calls. Phone calls (X) are strongly significant. We might have been happier had the significance results been reversed, since we need to supply next week's phone volume to predict next week's sales in a model involving X. What are our choices in terms of forecasting? One option is to somehow get estimates of future X values. Here are some forecasts of phone volume, which actually came from SAS PROC ARIMA. They have been appended to our original data, and you see their Y values are missing.

T     DATE       X         Y
154   12/16/94   153.000   141
155   12/23/94   146.000   159
156   12/30/94   180.000   220
157   01/06/95   157.292     .
158   01/13/95   141.705     .
159   01/20/95   131.008     .
160   01/27/95   123.665     .
161   02/03/95   118.625     .

Do we really think X will be 157.292 next week? Of course not; this is just a forecast, so the use of this X value in computing a forecast for Y will add to the inaccuracy of the forecast. On the other hand, since we stopped in week 156, the use of t = 157 for next week's t in our model will introduce no inaccuracy in the forecast. Choosing SBC as a criterion we have

Model     SBC    MSE    AR(1) coefficient
X and t   1401   420    -.2829
X only    1397   418    -.2873
t only    1607   1658   -.4972

so the model with X only is clearly preferred. One confusing phenomenon that often occurs is that, although statistical theory indicates that the estimates from our transformed model should be better than the ordinary least squares estimates that ignore autocorrelation, the ordinary least squares printout shows smaller standard errors than the ones shown in PROC AUTOREG. How can that be if the PROC AUTOREG method provides better estimates?
The answer is simple - the ordinary least squares numbers are not good estimates of the standard errors and are often, but not always, deceptively small. In other words, by ignoring autocorrelation we are using inferior estimates of the parameters, but the standard errors falsely indicate that they are superior. The three models estimated by PROC AUTOREG (standard errors in parentheses) are:

Y_t = 5.5720 + 0.0386 t + 0.9474 X_t
      (6.3191) (0.0527)   (0.052)

Y_t = 7.3681 + 0.9589 X_t
      (5.8111) (0.0498)

Y_t = 83.8472 + 0.3388 t
      (11.0956) (0.1222)

6. Forecasting. The model with only X is

Y_t = 7.3681 + 0.9589 X_t + a_t
      (5.8111) (0.050)

with a_t = .2873 a_{t−1} + e_t, and we are at the last week of year 3 in our data, so t = 156. Standard errors are in parentheses. The last observation was Y_156 = 220 and X_156 = 180, so the residual, an estimate of a_156, would be r_156 = 220 − 7.3681 − .9589(180) = 40.030, and we predict a_157 as .2873(40.030) = 11.501. Now if we assume X_157 will be 157.292, then we predict Y_157 to be 7.3681 + .9589(157.292) + 11.501 = 158.195 + 11.501 = 169.696. This is the first forecast in the PROC AUTOREG output dataset, and an associated standard error 20.9 is used to compute a 95% prediction interval. The problem is that this standard error is computed assuming that X will be exactly 157.292 in the next time interval. Our true level of uncertainty in the forecast of Y would certainly be influenced by the variability in our imputed X value.

7. Using PROC ARIMA. We can model the whole process, X and Y, in PROC ARIMA. We first compute a model for X, which PROC ARIMA estimates as

X_t − 107.6 = .6864 (X_{t−1} − 107.6) + e_t.

Now PROC ARIMA can fit and forecast the same model as PROC AUTOREG; however, it gives you the option of using the X model to provide forecasts of future X's, and it incorporates the uncertainty in those X's in the Y forecasts. Our forecasts of future X's were from PROC ARIMA, so the forecasts from the two procedures will match, but the forecast error variances will differ. We close by presenting graphs of forecasts and their standard errors from several scenarios. In Figure 5, the forecasts from PROC AUTOREG and PROC ARIMA with their intervals are overlaid. The difference in widths of the forecast intervals illustrates the magnitude of error that is being ignored when one treats future X's as known when they are in fact forecasts. Figure 6 shows the forecasts and intervals from the model that uses time t as the explanatory variable. Here we can properly treat future t's as known, but pay a price in that the model does not fit as well.
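The one-step forecast arithmetic of Section 6, and the mean-reverting forecasts from the AR(1) model for X above, can be checked with a few lines of Python. This uses the rounded coefficients printed in the paper, so the results reproduce the paper's numbers only to rounding (the five-step X path drifts slightly from the printed PROC ARIMA forecasts for the same reason):

```python
# Section 6: one-step-ahead forecast of Y with AR(1) error correction.
b0, b1, phi = 7.3681, 0.9589, 0.2873          # printed estimates
y156, x156, x157 = 220.0, 180.0, 157.292
r156 = y156 - b0 - b1 * x156                  # residual, estimates a_156
a157 = phi * r156                             # predicted next error
y157 = b0 + b1 * x157 + a157                  # one-step forecast of Y
print(round(r156, 3), round(a157, 3), round(y157, 3))  # 40.03 11.501 169.696

# Section 7: iterate the AR(1) model for X forward from X_156 = 180;
# the forecasts decay geometrically back toward the historic mean 107.6,
# as always happens with a stationary model.
mean, alpha, xf = 107.6, 0.6864, 180.0
path = []
for _ in range(5):
    xf = mean + alpha * (xf - mean)
    path.append(round(xf, 3))
print(path)
```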
The forecast plot from the ARIMA model for X is overlaid on Figure 6 (it gives the lower forecasts and intervals), and it is interesting to note that although this model fit substantially worse according to our statistical tests, once we admit that there is error in our forecasts of X, our forecasts and error bands are similar in both models. In the long term, the model with t will give forecasts that trend linearly upward, while the forecasts with the ARIMA model will return to the historic mean, as will always happen with a stationary ARIMA model. Finally, PROC ARIMA allows the fitting of a larger class of error models than autoregressive. A lag 1 moving average model also provides an excellent fit to the error series for the sales data. Using the autoregressive model for X, a linear relationship between Y and X, and an order 1 moving average error term, our estimated model becomes

Y_t = 8.607 + 0.95 X_t + e_t + .37 e_{t−1}
X_t − 107.6 = 0.686 (X_{t−1} − 107.6) + u_t

Because the error term e_t + .37 e_{t−1} is a moving average, we estimated this model in PROC ARIMA:

proc arima data=a;
i var=x noprint;
e p=1 ml;
i var=y crosscor=(x) nlag=5;
e input = (x) q=1 ml;
f lead=5 out=out5 id=t;

Because our input variable is modeled, we will get a crosscorrelation plot which has been prewhitened. It is a plot of correlations between Y at time t and X at time t − j which has been cleared of any indirect correlations caused by autocorrelation in the X series. It is thus a picture of how changes in X are incorporated into the Y series. The plot of the crosscorrelations for our example will look similar to this:

Crosscorrelations

Lag    Corr
-5    -0.065    .  *|  .
-4     0.043    .   |* .
-3    -0.090    . **|  .
-2    -0.060    .  *|  .
-1     0.008    .   |  .
 0     0.797    .   |**********
 1     0.057    .   |* .
 2    -0.010    .   |  .
 3    -0.053    .  *|  .
 4     0.135    .   |***
 5    -0.124    . **|  .

It is seen that there is no lagged correlation, only contemporaneous correlation. The moving average error structure gave error mean square 408.7 as compared to 418.3, so it is the best fit of all the models considered here by that criterion.

SAS is the registered trademark of SAS Institute Inc. in the USA and other countries. TM indicates USA registration. "." marks two standard errors.