Decision 411: Class 5. Where we've been so far

Decision 411: Class 5

HW#2 discussion
Introduction to regression forecasting
Example: rolling back the beer tax

Where we've been so far

Thus far we have looked at the most basic models for predicting future values of a time series Y from its own history: the mean model, the random walk model, and smoothing/averaging models, possibly with seasonal adjustment. These basic models assume that future values of Y are some sort of linear function of its past values, so we've also discussed the use of nonlinear data transformations (logging, deflating) to cover more possibilities. We've also studied basic principles and tools for testing the assumptions of models and comparing the forecasting accuracy of different models.

Where we're going next

Next we will consider models for predicting future values of Y as linear functions of already-known values of some other variable X, or possibly several other variables (X1, X2, etc.). These more general linear forecasting models are called regression models (for reasons to be explained). In some cases the X's could be lagged (previous) values of Y, but in general they are other variables whose movements are in some way predictive of movements in Y. Our same general tools for testing and comparing models will still apply, but now there will be more assumptions to test and more models to compare.

Game plan for this week

Today (videos #3-6):
Major concepts: correlation, R-squared & all that
Regression tools for time series data
Regression procedures available in Statgraphics
Example: rolling back the beer tax

Friday (videos #15-16):
Seasonality revisited: dummy variables
Selecting regressors: manual vs. stepwise
Modeling issues & diagnostic tests
More examples

Game plan for next week

Tuesday, September 25: Quiz
Nonlinear transformations
Not-so-simple regression
Multiplicative regression

Friday, September 25 (videos #17-18):
Advanced regression techniques: ANOVA, general linear models, logistic regression, etc.

Linear regression

Linear regression is the most widely used (and abused!) of all statistical techniques. It is about the fitting of straight lines to data.

General equation of a (simple) regression line: Y = constant + beta*X

Why assume linear relationships?

Linear relationships are the simplest non-trivial relationships (start simple!). "True" relationships between variables are often at least approximately linear over the range of interest.

Linearization of relationships

Alternatively, we may be able to transform the variables in such a way as to "linearize" the relationships. Nonlinear transformations (log, power, reciprocal, deflation, differences, percentage differences, ratios of variables, etc.) are therefore an important tool of regression modeling, but use them with care and with good motivation! (See the sketch after the examples below.)

Examples:

Sales $$ = constant + beta * Advertising $$
Δ Units sold = constant + beta * Δ Coupons distributed
% Return on stock = constant + beta * % Return on market
Log(Population) = constant + beta * Time
Temperature(t) = constant + beta * Temperature(t-1)
Δ WebHits(t) = constant + beta * Δ WebHits(t-1)

(Δ denotes "change in," i.e., delta, DIFF)
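To make the Log(Population) example concrete, here is a minimal sketch (Python, with synthetic data; the series and growth rate are hypothetical) of how logging turns exponential growth into a straight line that ordinary least squares can fit:

```python
# Sketch: exponential growth is linear on the log scale. Synthetic data.
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(50)                                        # time index
pop = 1000 * np.exp(0.02 * t + rng.normal(0, 0.01, 50))  # noisy exponential growth

# Fitting pop = a + b*t directly would be misspecified; log(pop) = c + d*t is linear.
d, c = np.polyfit(t, np.log(pop), 1)                     # least-squares line on logged data
print(f"log(Population) = {c:.3f} + {d:.4f}*Time (growth rate about {d:.1%} per period)")
```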

History of regression

Regression was so named by Sir Francis Galton, a 19th-century scientist & adventurer. Galton initially gained fame for his African explorations and wrote best-selling books on wilderness exploration that introduced the sleeping bag & other wilderness gear to the Western world (still in print).

Galton (warts and all)

He was also a pioneer in the collection & analysis of biometric, anthropometric & psychometric data, inspired by the evolution theory of Darwin. He invented weather maps and pioneered the scientific study of tea-brewing. He was also wrong about some things (e.g., eugenics). His disciple, Karl Pearson, worked out the mathematics of correlation and regression. (Look him up in Google or Wikipedia, also Galton.org.)

Galton's observations

A taller-than-average parent tends to have a taller-than-average child, but the child is likely to be less tall than the parent relative to its own generation:

Parent's height = x standard deviations from the mean
Child's predicted height = rx standard deviations from the mean

...where r is a number less than 1 in magnitude: the coefficient of correlation between the heights of parents and children. This is a "regression toward mediocrity," or in modern terms a "regression to the mean."

[Figure: the first regression line (1877)]

Graphical interpretation of regression

If you standardize the X and Y variables by converting them to units of standard deviations from their own means, the prediction line passes through the origin and has a slope equal to the correlation coefficient, r. Thus, the line "regresses" back toward the X-axis, because this minimizes the squared errors in predicting Y from X.

[Scatterplot: standardize(Y) vs. standardize(X)]

On a standardized plot of Y vs. X, where the units are standard deviations from the mean, the data distribution is roughly symmetric around the 45-degree line, but the line for predicting Y from X regresses toward the X axis because this minimizes the squared error in the Y direction. The slope of the regression line on the standardized plot is the correlation r (= 0.46 in this case).
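As a quick check of the claim that the standardized slope equals r, here is a minimal sketch in Python with synthetic data (the slide's 0.46 correlation is not reproduced; any dataset illustrates the identity):

```python
# Sketch: on standardized variables the least-squares slope equals r
# and the intercept is zero. Synthetic data.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 0.5 * x + rng.normal(size=500)          # some linearly related pair

zx = (x - x.mean()) / x.std()               # standardize: mean 0, sd 1
zy = (y - y.mean()) / y.std()

slope, intercept = np.polyfit(zx, zy, 1)
r = np.corrcoef(x, y)[0, 1]
print(f"slope = {slope:.4f}, r = {r:.4f}")  # equal (up to rounding)
print(f"intercept = {intercept:.1e}")       # essentially zero: line through origin
```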

Graphical interpretation 3

[Scatterplot: standardize(Y) vs. standardize(X), with the line for predicting X from Y]

If we instead wanted to predict X from Y, the line would regress to the Y axis instead! (This line would minimize the squared error measured in the X direction.)

Graphical interpretation of regression with time series data

In a simple regression of two time series, the forecast plot of Y is just a shifted and rescaled copy of the time series plot of X. In a multiple regression of time series, the forecast plot of Y is a weighted sum* of the plots of the X's. In either case, the time pattern in Y should look like some of the time patterns in the X variables: trends and peaks and valleys and spikes in Y ideally should have their counterparts somewhere among the X's.

* Weights can be positive or negative.

Regression is inescapable

Your kids will probably be less exceptional than you and your spouse, for better or worse. Your performance on the final exam in a course will probably be less exceptional than your score on the midterm. A ballplayer's performance during the 2nd half of a season will probably be less exceptional than in the 1st half. The hottest mutual funds of the last 5 years will be less hot in the next 5 years.

Regression is inescapable, cont'd

Your forecasting models will always produce sequences of forecasts that are smoother (less variable) than the actual data. This doesn't mean the future is guaranteed to be more mediocre (less interesting) than the past, but that's the way to bet!

Why do predictions regress?

Is there a restoring force that pulls everything back to the mean? No! It's a purely statistical phenomenon. Every observation of a random process is part "signal" (a predictable or inheritable component) and part "noise" (a random, unpredictable, zero-mean component). Here's why: an observation that is exceptional (far above or below the mean) is likely to be composed of a signal and a noise term with the same sign (both positive or both negative). If the high- (or low-) achiever performs again (or has offspring), the expected signal will be just as strong and in the same direction, but the expected noise term will be zero. Hence the second observation is likely (not guaranteed, just likely) to be closer to the mean.
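The signal-plus-noise argument is easy to verify by simulation. This minimal sketch (synthetic data on an arbitrary scale) gives two "performances" the same signal but independent noise, and shows that extreme first scores are followed, on average, by scores closer to the mean:

```python
# Sketch: regression to the mean from signal + independent noise. Synthetic data.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
signal = rng.normal(0, 1, n)           # stable, "inheritable" component
score1 = signal + rng.normal(0, 1, n)  # first observation = signal + fresh noise
score2 = signal + rng.normal(0, 1, n)  # second observation = same signal, new noise

top = score1 > 2.0                     # exceptional first performances
print(f"mean first score of top group:  {score1[top].mean():.2f}")
print(f"mean second score of top group: {score2[top].mean():.2f}")  # closer to 0
```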

Underlying assumptions of regression

Linear relationship between variables
Constant variance of errors (homoscedasticity)
Normally distributed errors
Independent errors (no autocorrelation)
Stationary process (stable correlations over time)

These need to be tested! Error statistics and confidence intervals for forecasts are not reliable if the assumptions are badly violated.

Sufficient statistics for regression

Regression analysis depends only on the following summary statistics of the data:

Means of all variables
Variances (or standard deviations) of all variables
Covariances (or correlations) between all pairs of variables

Given only these statistics, you can calculate all the coefficient estimates, standard errors, and forecasts for any regression model that might be fitted to any combination of the variables! (However, you still ought to look at residual plots, etc.)

Variance measures the tendency of a variable to vary (away from its mean)

Population variance: VARP(Y) = AVG((Y − AVG(Y))²)

...the population variance is the average squared deviation of Y from its own mean.

Sample variance: VAR(Y) = (n/(n−1)) × VARP(Y)

...an unbiased estimate of the true variance based on a finite sample of size n. The factor n/(n−1) adjusts for the degree of freedom for error that was used up in calculating the mean from the same sample.

Our forecasting task is to "explain" the variance in Y. Why does it vary in the way that it does, i.e., why isn't it always constant?

Covariance measures the tendency of two variables to vary together

Population covariance: COVP(X, Y) = AVG((X − AVG(X))(Y − AVG(Y)))

...the average product of the deviations of X and Y from their respective means. If Y and X tend to be on the same side of their respective means at the same time (both above or both below), the average product of deviations is positive. If they tend to be on opposite sides of their own means at any given time, the average product is negative. If their variations around their own means are unrelated, the average product is zero.

Sample covariance

Sample covariance: COV(X,Y) = (n/(n−1)) × COVP(X,Y)

...an unbiased estimate of the true covariance based on a sample of size n, analogous to the sample variance.

Correlation

The correlation coefficient is the covariance standardized by dividing by the product of standard deviations:

r = COV(X,Y)/(STDEV(X) × STDEV(Y)) = COVP(X,Y)/(STDEVP(X) × STDEVP(Y)) = CORREL(X,Y) in Excel

It measures the strength of the linear relationship between X & Y on a relative scale of −1 to +1. When the correlation is significantly different from zero, variations in X can help to predict variations in Y.
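Here is a minimal sketch (Python, synthetic data) of the definitions above, computed from first principles and checked against numpy's built-ins:

```python
# Sketch: population vs. sample variance/covariance, and correlation. Synthetic data.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(10, 2, 200)
y = 3 + 0.5 * x + rng.normal(0, 1, 200)
n = len(x)

varp_y = np.mean((y - y.mean()) ** 2)             # population variance: VARP(Y)
var_y = (n / (n - 1)) * varp_y                    # sample variance: VAR(Y)
covp = np.mean((x - x.mean()) * (y - y.mean()))   # population covariance: COVP(X,Y)
cov = (n / (n - 1)) * covp                        # sample covariance: COV(X,Y)

# Correlation: covariance standardized by the product of standard deviations.
# The n/(n-1) factors cancel, so population and sample versions give the same r.
r = covp / (x.std() * y.std())                    # np.std defaults to population (ddof=0)
print(f"r = {r:.4f}, numpy check = {np.corrcoef(x, y)[0, 1]:.4f}")
print(f"VAR(Y) = {var_y:.3f}, numpy check = {y.var(ddof=1):.3f}")
```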

Simple regression formulas

Model assumption: Y_t = β0 + β1·X_t + ε_t, where β0 is the intercept, β1 is the slope, and the ε_t are independent, identically normally distributed errors.

Prediction equation: Ŷ_t = β̂0 + β̂1·X_t

Least squares coefficient estimates:

β̂1 = COV(X,Y)/VAR(X) = r × (STDEV(Y)/STDEV(X))
β̂0 = AVG(Y) − β̂1 × AVG(X)

We have exact formulas for the coefficient estimates; we don't need to use Solver to minimize squared error. The slope coefficient is just the correlation multiplied by the ratio of standard deviations!

Multiple regression formulas

The formulas for coefficient estimates and forecast standard errors for the multiple regression model are merely matrix versions of the preceding formulas. If you're interested in the gory details, see the "Regression formulas" worksheet (SIMPREG.XLS) posted on the Course Outline web page (lecture 5 links).
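A minimal sketch (Python, synthetic data) of the exact least-squares formulas, checked against numpy's least-squares fit:

```python
# Sketch: slope = r * STDEV(Y)/STDEV(X); the fitted line passes through the means.
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(5, 1.5, 300)
y = 2 + 1.2 * x + rng.normal(0, 1, 300)

r = np.corrcoef(x, y)[0, 1]
b1 = r * y.std(ddof=1) / x.std(ddof=1)   # slope from summary statistics
b0 = y.mean() - b1 * x.mean()            # intercept: AVG(Y) - b1*AVG(X)

slope_np, intercept_np = np.polyfit(x, y, 1)
print(f"formula: b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"polyfit: b0 = {intercept_np:.4f}, b1 = {slope_np:.4f}")  # identical
```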

Standard error of the regression

The standard error of the regression, a.k.a. standard error of the estimate, is the RMSE adjusted for the # of coefficients estimated:

s = sqrt((n−1)/(n−2)) × STDEV(e) = sqrt( ((n−1)/(n−2)) × (1 − r²) × VAR(Y) )

Here sqrt((n−1)/(n−2)) is the adjustment for the # of coefficients estimated (2), STDEV(e) is the sample standard deviation of the residuals (errors), (1 − r²) is the fraction of variance unexplained, and VAR(Y) is the original sample variance.

s is the estimated standard deviation of the true error process (ε_t), and in general it is slightly larger than the sample standard deviation of the residuals, due to the adjustment for additional coefficients estimated besides the constant. All the other standard errors (for coefficients, means, forecasts, etc.) are proportional to this quantity.

Standard errors of the coefficient estimates

Standard error of the slope coefficient:

SE(β̂1) = s / (sqrt(n) × STDEVP(X))

t-statistic of the slope coefficient:

t = β̂1 / SE(β̂1)

The p-value of the t-stat is TDIST(t, n−2, 2) in Excel. The larger the sample size (n), the more precise the coefficient estimate.
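A minimal sketch (Python with scipy, synthetic data) of these formulas; scipy's t.sf plays the role of Excel's TDIST(t, n−2, 2):

```python
# Sketch: standard error of the regression, SE of the slope, t-stat, p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 60
x = rng.normal(0, 2, n)
y = 1 + 0.8 * x + rng.normal(0, 1.5, n)

b1, b0 = np.polyfit(x, y, 1)
e = y - (b0 + b1 * x)                   # residuals

s = np.sqrt(np.sum(e ** 2) / (n - 2))   # standard error of the regression
se_b1 = s / (np.sqrt(n) * x.std())      # SE of slope: s / (sqrt(n) * STDEVP(X))
t = b1 / se_b1                          # t-statistic of the slope
p = 2 * stats.t.sf(abs(t), df=n - 2)    # two-tailed p-value, like TDIST(t, n-2, 2)
print(f"s = {s:.3f}, SE(b1) = {se_b1:.4f}, t = {t:.2f}, p = {p:.2g}")
```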

Standard error of the mean

The standard error of the mean at X = X_t is the standard deviation of the error in estimating the true height of the regression line at that point:

SE_mean = s × sqrt( 1/n + (X_t − AVG(X))² / (n × VARP(X)) )

The 1/n term is the same as the standard error of the mean in the mean model; the (X_t − AVG(X))²/(n × VARP(X)) term is a correction factor for the distance of X_t from the mean.

Standard error of the forecast

The standard error of the forecast is:

SE_fcst = sqrt( s² + SE_mean² )

The s² term measures the noise (unexplained variation) in the data; the SE_mean² term measures the error in estimating the height of the true regression line at X = X_t. Note that this is almost the same formula we used for the mean model in class 1. The only difference is that calculating SE_mean is slightly more complicated here: it depends on the value of X_t.

Lower bounds on standard errors

s/sqrt(n) is a lower bound on the standard error of the mean (equalled only when X = AVG(X)).

s × sqrt(1 + 1/n) is the corresponding lower bound on the standard error of the forecast.

Key point: the standard errors of the forecasts for Y are larger for values of X that are farther from the mean, i.e., farther from the center of the data distribution.

Confidence limits

Confidence limits for a forecast are obtained by adding and subtracting the appropriate multiples of the forecast standard error (as usual). For large n (>20) a rough 95% confidence interval is plus or minus 2 standard errors. The exact number of standard errors for a 95% interval, for any n, is given by TINV(.05, n−2) in Excel. A 50% interval is roughly 1/3 as wide (plus or minus 2/3 of a standard error).
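To tie the last two slides together, here is a minimal sketch (Python with scipy, synthetic data, and a hypothetical X_t) that computes SE_mean and SE_fcst at a given X_t and forms exact t-based confidence limits; scipy's t.ppf plays the role of Excel's TINV:

```python
# Sketch: forecast standard error and exact 95% confidence limits at X = X_t.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 40
x = rng.normal(10, 3, n)
y = 5 + 2 * x + rng.normal(0, 2, n)

b1, b0 = np.polyfit(x, y, 1)
e = y - (b0 + b1 * x)
s = np.sqrt(np.sum(e ** 2) / (n - 2))       # standard error of the regression

x_t = 18.0                                  # a point far from the mean of X
se_mean = s * np.sqrt(1 / n + (x_t - x.mean()) ** 2 / (n * x.var()))
se_fcst = np.sqrt(s ** 2 + se_mean ** 2)    # noise + line-estimation error

t_mult = stats.t.ppf(0.975, df=n - 2)       # exact multiplier, like TINV(.05, n-2)
fcst = b0 + b1 * x_t
lo, hi = fcst - t_mult * se_fcst, fcst + t_mult * se_fcst
print(f"forecast = {fcst:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
print(f"SE_fcst = {se_fcst:.3f} vs. lower bound {s * np.sqrt(1 + 1/n):.3f}")
```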

[Plot: regression line with 95% confidence interval for the mean at X_t = 210]

Note that confidence intervals are wider when X is far from the center; this probably understates the danger of over-extrapolating a linear model!

[Plot: regression line with 95% confidence interval for the forecast at X_t = 210]

The confidence interval for the forecast reflects both the "parameter risk" concerning the slope & intercept of the regression line and the intrinsic risk of random variations around it.

Strange but true

For any regression model:

VAR(Y) = VAR(Ŷ) + VAR(e)

Total variance = explained variance + unexplained variance.

For a simple regression model:

VAR(Ŷ)/VAR(Y) = r²

i.e., the fraction of variance explained = r squared.

R-squared

The term "R squared" refers to the fraction of variance explained, i.e., the ratio VAR(Ŷ)/VAR(Y), regardless of the number of regressors. It measures the improvement of the regression model over the mean model for Y. A bigger R-squared is usually better, for the same Y, but R-squared should not be used to compare models that may have used different transformations of Y and/or different data samples. R-squared can be very misleading for regressions involving time series data: 90% or more is not necessarily good, and 10% or less is not necessarily bad.
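A minimal sketch (Python, synthetic data) of the "strange but true" variance decomposition and of R-squared as the fraction of variance explained:

```python
# Sketch: VAR(Y) = VAR(Yhat) + VAR(e), and R-squared = VAR(Yhat)/VAR(Y) = r^2.
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(0, 1, 500)
y = 1 + 2 * x + rng.normal(0, 2, 500)

b1, b0 = np.polyfit(x, y, 1)
yhat = b0 + b1 * x
e = y - yhat

print(f"VAR(Y)            = {y.var():.3f}")
print(f"VAR(Yhat)+VAR(e)  = {yhat.var() + e.var():.3f}")  # equal: the decomposition
r2 = yhat.var() / y.var()                                 # fraction of variance explained
print(f"R-squared = {r2:.3f}, r^2 check = {np.corrcoef(x, y)[0, 1] ** 2:.3f}")
```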

Example: rolling back the beer tax

Suppose the 1991 beer tax had been rolled back in July 2007, resulting in an immediate 10-point drop in the beer price index. What would be the expected effect on per capita real consumption ("BeerPerCapita")? What would we predict for the consumption rate in July 2007? (The June 2007 rate is $268.65 per year, SAAR, in year-2000 beer dollars.)

In search of a linear model

Variables: BeerPerCapita, BeerRelPrice. What will happen to per capita consumption in July 2007?

[Scatterplot: BeerPerCapita vs. BeerRelPrice, with the post-tax-hike anomaly and the assumed relative price drop in June marked]

A scatterplot of BeerPerCapita vs. relative price (BeerPrice/CPI) reveals a strong negative correlation and a highly linear relationship (except for the mid-90's anomaly). The actual correlation is −0.94, which suggests that 88% of the variance in BeerPerCapita can be explained by BeerRelPrice.

Summary statistics of variables

Here are summary stats and correlations of the two variables, obtained with the Multiple-Variable Analysis procedure. Note that the standard deviation of BeerPerCapita is $39.15, which is essentially the forecast standard error we would get by using the mean model to predict it. How much better can we do with a regression model? Well, there is a very strong negative correlation of −0.94 with BeerRelPrice, and the square of the correlation is the fraction by which the error variance can be reduced by regressing BeerPerCapita on BeerRelPrice instead of using the mean model.

Variable definitions:
BeerPerCapita = constant × Beer/(BeerPrice × Population)
BeerRelPrice = BeerPrice/CPI

Fitting a simple regression model: Relate/Multiple Factors/Multiple Regression on the Statgraphics menu

Typical regression output:

Standard error of the regression, a.k.a. standard error of the estimate: the RMSE adjusted for the # of coefficients estimated. This is the bottom line, IF it is really representative of future accuracy.

R-squared & adjusted* R-squared: not the bottom line!

Coefficients & their standard errors, t-stats (= coeff./std. error) & p-values: used to test whether some variables are insignificant in the presence of the others.

Residual plots and diagnostic tests: used to test assumptions of linearity, normality, no autocorrelation, etc.

* Adjusted for the # of coefficients in the same way as the standard error of the regression, to be able to compare among models with different #'s of coefficients.

What to look for in regression output

Error measures: smaller is better.
t-statistics of coefficients: greater than 2 in magnitude (p-values < 0.05) means the variables appear "significant."*
Economic interpretations of coefficients.
Residual plots & diagnostic tests:
Residuals vs. predicted (nonlinearity?)
Normal probability plot (skew? fat tails? outliers?)
Residuals vs. time (for time series data)
Residual autocorrelation plot (for time series data)

* Not a hard and fast rule, but variables that don't pass this test can often be removed without being missed. If a variable's presence in the model is strongly supported by intuition or theory, then a low t-stat may be OK: its effect may just be hard to measure.

Basic regression output

The "Interval plot" option plots the regression line vs. the dependent variable or time index. R-squared = 88% as expected, and the slope coefficient (−280.8) is highly significant (t-stat = −64). The standard error of the regression is $13.80, much less than the original standard deviation, but still a lot of error in predicting next month's per capita consumption! The Durbin-Watson stat and lag-1 autocorrelation are also very bad! (DW should be close to 2, not zero; lag-1 autocorrelation should be close to zero, not 1!)

Deconstruction of R-squared

The variance of the dependent variable is (39.15)² ≈ 1533. This is the error variance you would get by using the mean model. The variance of the regression forecast errors is the square of the regression standard error, which is (13.8)² = 190. The fraction of the original variance that remains unexplained is 190/1533 ≈ 12%, hence the fraction explained is 88%. This is the reduction in error variance compared to using the mean model instead.

What's the Durbin-Watson statistic?

It's just an alternative statistic for measuring lag-1 autocorrelation in the residuals, which is popular in regression analysis for obscure historical reasons.

0 < DW < 4, and ideally DW ≈ 2.
DW ≈ 2(1 − r1), where r1 = lag-1 autocorrelation.

r1 is easier to interpret: a good value is close to 0, and r1² is roughly the percentage of further variance reduction that could be achieved by fine-tuning to reduce the autocorrelation.
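A minimal sketch (Python, synthetic autocorrelated residuals) of the Durbin-Watson statistic and its relation to the lag-1 autocorrelation:

```python
# Sketch: DW = sum of squared successive differences / sum of squares, and DW ~ 2*(1 - r1).
import numpy as np

rng = np.random.default_rng(8)
e = np.zeros(300)
for t in range(1, 300):                       # AR(1) residuals, positively autocorrelated
    e[t] = 0.7 * e[t - 1] + rng.normal()

dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)  # Durbin-Watson statistic
r1 = np.corrcoef(e[:-1], e[1:])[0, 1]          # lag-1 autocorrelation
print(f"DW = {dw:.2f}, 2*(1 - r1) = {2 * (1 - r1):.2f}")  # approximately equal
```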

Economic interpretation of model

The slope coefficient of −280.8 suggests that a 0.01 decrease in the relative price (X) should increase consumption (Y) by $2.81. The proposed tax rollback would decrease the relative price by 0.049 (from 0.569 to 0.520). Thus, predicted consumption in July '07 will increase by 0.049 × 280.8 = $13.76 from its predicted June '07 value. But the model's prediction for June '07 is already way off! Hence the prediction for July '07 is less than the actual June '07 value, despite the price drop.

Forecasting equation of model 1

Forecasting equation of this model: Y_t = constant − 280.8 × X_t

For July '07: Y_t = constant − 280.8 × (0.52)

The forecast for Y depends (only) on the current value of X, not on recent values of Y.

[Table: BeerRelPrice (X) and BeerPerCapita (Y) for May '07, June '07, and July '07, with the July value of Y to be forecast (??)]

The forecast ("Reports") report

The multiple regression procedure automatically shows forecasts (on the "Reports" report) if future values are provided for the independent variable(s). Here, a July 2007 value of 0.52 was plugged in for BeerRelPrice on the data spreadsheet, and the resulting forecast for BeerPerCapita is $240.20, which is $28.45 below the current value of $268.65. The upper 95% limit of $267.4 is even below the current value!

[Plot: regression line with 95% confidence limits; the last data point (268.65) lies above the 95% CI for the forecast at BeerRelPrice = 0.52]

Here's the plot of the regression line with 95% confidence limits for the forecasts. This is the "Interval plots" chart drawn with 95% intervals for predicted values (a right-mouse-button option). The interval for the July '07 prediction is at the upper left, where BeerRelPrice = 0.52.

The plot of residuals vs. row number (time) shows severe autocorrelation, i.e., long runs of values with the same sign, as foretold by the bad DW stat and lag-1 autocorrelation, and the most recent errors have been especially large. The plot of predicted values (red) vs. row number ("Interval plot") shows a poor fit to the data, and the predicted jump in July '07 falls well short of the June '07 value. The predicted values are actually just a shifted and rescaled version of BeerRelPrice.

Regression option in Forecasting procedure

Here model E is specified as a "mean + 1 regressor" model. You can also fit the same regression model in the Forecast/User-Specified Model procedure. Choose the "Mean" model type and hit the "Regression" button to add independent variables. This approach allows you to use the model-comparison features and additional residual diagnostics in the forecasting procedure. Caveat: no more than 4 independent variables are allowed here.

Same regression results and forecast, but the normal probability plot and autocorrelation plot of the residuals look terrible, and the comparison with simpler time series models is not flattering!

Conclusion (so far...)

Although this model provides a plausible estimate of the macroeconomic relationship between relative price and per capita consumption (assuming that the long-term upward trend in consumption is entirely caused by the long-term downward trend in relative price!), it does not do a very plausible job of forecasting the near future. Why not? It is a cross-sectional model that does not exploit the time dimension in the data: it predicts consumption for a randomly chosen relative price. Due to other, unmodeled factors, the data wanders away from the regression line and does not return very quickly; errors are strongly correlated.

How to incorporate the time dimension in a regression model?

Some possible approaches:

Predict changes instead of levels (i.e., use a first-difference transformation)
Use lagged variables (recent values of dependent and independent variables) as additional regressors, to serve as proxies for the effects of unmodeled variables*
Use an autocorrelation correction (e.g., Cochrane-Orcutt or an ARIMA error structure) as a proxy for unmodeled factors*

* We'll discuss these in later classes.

Let's look at monthly changes

Here's a plot of the original BeerPerCapita series, obtained in the Time Series/Descriptive Methods procedure. No transformations have been performed yet.

On the right-mouse-button Analysis Options panel, entering a 1 in the "Differencing" box performs a first-difference transformation. Now we are seeing the plot of month-to-month changes in BeerPerCapita.

Here are time series plots of both BeerPerCapita and BeerRelPrice, before and after a first-difference transformation. Note that the differenced series appear to be "stationary": no trend, constant variance, etc. The circled point in the lower right is the assumed price impact of the tax rollback in July. (This 4-chart arrangement was made by pasting the plots into the StatGallery.)
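Outside Statgraphics, the same first-difference transformation is one line of pandas. This minimal sketch uses synthetic data standing in for the BeerPerCapita series (the values are hypothetical):

```python
# Sketch: diff() turns a trended (nonstationary) level series into month-to-month changes.
import numpy as np
import pandas as pd

rng = np.random.default_rng(9)
idx = pd.date_range("2000-01", periods=90, freq="MS")
level = pd.Series(230 + 0.4 * np.arange(90) + rng.normal(0, 2.6, 90).cumsum(),
                  index=idx, name="BeerPerCapita")

change = level.diff()                     # first difference: Y_t - Y_{t-1}
print(level.tail(3))
print(change.tail(3))                     # stationary-looking changes, no trend
print(f"sd of levels: {level.std():.2f}, sd of changes: {change.std():.2f}")
```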

[Scatterplot: diff(BeerPerCapita) vs. diff(BeerRelPrice), drawn with the Plot/Scatterplot/X-Y Plot procedure]

The scatterplot of the differenced variables indicates a weaker but still significant negative correlation (−0.33). The two circled points in the lower right are the drops in Jan. and Feb. due to the beer tax increase. Even when these two points are de-selected, there is still a significant negative correlation.

Statistics of the differenced variables

The correlations and other summary stats of the differenced variables were computed using the Describe/Numeric Data/Multiple-Variable Analysis procedure. Note that the standard deviation of diff(BeerPerCapita) is only $2.64. This is roughly the forecast standard error you would get by using a random walk with drift model to predict BeerPerCapita, because the random walk model merely predicts that each change will equal the mean change. Hence we can already see that the forecast standard error of the RW model is smaller than that of the original cross-sectional regression by roughly a factor of 5. However, let's see if we can improve on the RW model by regressing diff(BeerPerCapita) on diff(BeerRelPrice).

Simple regression of diff(BeerPerCapita) on diff(BeerRelPrice)

In a simple regression of the differenced variables, the change in BeerPerCapita is predicted from the change in BeerRelPrice. This is a "micro" prediction rather than a "macro" prediction. Our predicted level of BeerPerCapita in the next period will be equal to the current level plus the predicted change.

The regression standard error is vastly superior! There is still some autocorrelation, but not nearly as bad. The estimated coefficient of diff(BeerRelPrice) is in the same ballpark as the coefficient of BeerRelPrice in the earlier model. Hence a similar change in consumption per unit change in relative price is predicted. However, this model is directly predicting the change, not the level. The predicted change in July '07 is positive (+$12.65), in line with intuition. But what happened to R-squared? It's fallen to around 11%! (Horrors)

What happened to R-squared??

The previous model explained 88% of the variance in the monthly level of BeerPerCapita. Because BeerPerCapita is a nonstationary, trended variable, it has a lot of variance to explain! This model directly predicts the change in BeerPerCapita, which is a stationary series with a much lower variance to begin with. Hence, less variance remains to be explained by this regression model, and an R-squared of only 11% is actually a much better performance.

Another way to look at it: when the dependent variable is undifferenced, R-squared measures the reduction in error variance compared to using the mean model. When the dependent variable is differenced, R-squared measures the reduction in error variance compared to using the random walk with drift model on the original variable. Here, a random walk model (or another simple time series model) would have been a much better reference point for predictions of monthly per capita beer consumption. The regression of differenced variables is a "walk" model in which the steps are not completely random: they depend on the change in price!

Deconstruction of R-squared

The variance of the differenced dependent variable is (2.637)² ≈ 6.95. This is the error variance you would get by using the random walk with drift model on the original undifferenced variable. The variance of the regression forecast errors is the square of the regression standard error, which is now (2.494)² = 6.22. The fraction of the original variance that remains unexplained is 6.22/6.95 ≈ 89%, hence the fraction explained is 11%. This is not a huge improvement over the random walk model in terms of forecast accuracy, but it does allow us to factor in the price sensitivity of consumers.

Forecasting equation for model 2

Forecasting equation for the change in Y:

(Y_t − Y_{t−1}) = constant + coefficient × (X_t − X_{t−1})

For July '07: (Y_t − Y_{t−1}) = constant + coefficient × (0.520 − 0.569) = +12.65

Undifferenced forecast for the new level of Y:

Y_t = Y_{t−1} + 12.65 = 268.65 + 12.65 = 281.30

The ultimate forecast from this model "steps off" from the last actual value of Y, as in the random walk model, but now the step size depends on the change in X.

[Table: recent values of BeerRelPrice (X) and BeerPerCapita (Y), with the new forecast stepping off the last actual value]
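A minimal sketch (Python, synthetic data) of "model 2": regress the change in Y on the change in X, then step the forecast off the last actual level, random-walk style. The coefficient values here are illustrative, not the ones from the slides:

```python
# Sketch: differenced regression, then an undifferenced ("stepped-off") forecast.
import numpy as np

rng = np.random.default_rng(10)
n = 120
x = 0.60 - 0.0005 * np.arange(n) + rng.normal(0, 0.004, n).cumsum()  # relative price
y = 400 - 280 * x + rng.normal(0, 2.5, n).cumsum()                   # per-capita level

dx, dy = np.diff(x), np.diff(y)
b1, b0 = np.polyfit(dx, dy, 1)     # (Y_t - Y_{t-1}) = b0 + b1*(X_t - X_{t-1})

dx_next = -0.049                   # assumed price drop, as in the tax-rollback story
dy_hat = b0 + b1 * dx_next         # predicted change
y_next = y[-1] + dy_hat            # undifferenced forecast: last level + change
print(f"predicted change = {dy_hat:+.2f}, forecast level = {y_next:.2f}")
```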

Same model in Forecasting procedure

There are several ways in which the differenced regression model can be fitted in the Forecasting procedure. The simplest way is to specify it as an ARIMA model with 1 order of nonseasonal differencing plus 1 regressor and a constant term. The first-difference transformation is applied to both variables prior to fitting the regression model.

Almost the same regression results and forecast (slightly different estimation procedure), and the normal probability plot and autocorrelation plot of the residuals are much better (not perfect, but acceptable). The differenced regression model (B) is best on all error measures, but not by a large margin.

More fine-tuning??

The differenced model still has a technically significant negative lag-1 autocorrelation. Because it is negative, it means the model is over-reacting rather than under-reacting to recent changes in the data. By the r-squared rule, the square of this autocorrelation is the fraction of the remaining variance that might be explained via more fine-tuning (e.g., adding lagged variables). This is not a large improvement: it corresponds to about a 2.5% further reduction in standard error, hence a 2.5% shrinkage in confidence intervals.

Class 5 recap

"Regression to mediocrity" is inescapable.
Correlations and scatterplots help to reveal the strengths of linear relationships.
How to interpret regression output & test residuals.
Much of the variance in the original data may be explainable merely by an appropriate transformation of the data, such as a first-difference transformation applied to nonstationary time series variables.
R-squared is not the bottom line!


More information

Chapter 14 Simple Linear Regression (A)

Chapter 14 Simple Linear Regression (A) Chapter 14 Simple Linear Regression (A) 1. Characteristics Managerial decisions often are based on the relationship between two or more variables. can be used to develop an equation showing how the variables

More information

Introduction to Regression

Introduction to Regression Introduction to Regression ιατµηµατικό Πρόγραµµα Μεταπτυχιακών Σπουδών Τεχνο-Οικονοµικά Συστήµατα ηµήτρης Φουσκάκης Introduction Basic idea: Use data to identify relationships among variables and use these

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

Stat 500 Midterm 2 12 November 2009 page 0 of 11

Stat 500 Midterm 2 12 November 2009 page 0 of 11 Stat 500 Midterm 2 12 November 2009 page 0 of 11 Please put your name on the back of your answer book. Do NOT put it on the front. Thanks. Do not start until I tell you to. The exam is closed book, closed

More information

Introduction to Regression

Introduction to Regression Introduction to Regression Using Mult Lin Regression Derived variables Many alternative models Which model to choose? Model Criticism Modelling Objective Model Details Data and Residuals Assumptions 1

More information

Week 9: An Introduction to Time Series

Week 9: An Introduction to Time Series BUS41100 Applied Regression Analysis Week 9: An Introduction to Time Series Dependent data, autocorrelation, AR and periodic regression models Max H. Farrell The University of Chicago Booth School of Business

More information

Important note: Transcripts are not substitutes for textbook assignments. 1

Important note: Transcripts are not substitutes for textbook assignments. 1 In this lesson we will cover correlation and regression, two really common statistical analyses for quantitative (or continuous) data. Specially we will review how to organize the data, the importance

More information

Forecasting. Simon Shaw 2005/06 Semester II

Forecasting. Simon Shaw 2005/06 Semester II Forecasting Simon Shaw s.c.shaw@maths.bath.ac.uk 2005/06 Semester II 1 Introduction A critical aspect of managing any business is planning for the future. events is called forecasting. Predicting future

More information

1 Correlation and Inference from Regression

1 Correlation and Inference from Regression 1 Correlation and Inference from Regression Reading: Kennedy (1998) A Guide to Econometrics, Chapters 4 and 6 Maddala, G.S. (1992) Introduction to Econometrics p. 170-177 Moore and McCabe, chapter 12 is

More information

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression.

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression. 10/3/011 Functional Connectivity Correlation and Regression Variance VAR = Standard deviation Standard deviation SD = Unbiased SD = 1 10/3/011 Standard error Confidence interval SE = CI = = t value for

More information

Lecture 10: F -Tests, ANOVA and R 2

Lecture 10: F -Tests, ANOVA and R 2 Lecture 10: F -Tests, ANOVA and R 2 1 ANOVA We saw that we could test the null hypothesis that β 1 0 using the statistic ( β 1 0)/ŝe. (Although I also mentioned that confidence intervals are generally

More information