Decision 411: Class 5. Where we've been so far


Decision 411: Class 5. HW#2 discussion; introduction to regression forecasting; example: rolling back the beer tax.

Where we've been so far. Thus far we have looked at the most basic models for predicting future values of a time series Y from its own history: the mean model, the random walk model, and smoothing/averaging models, possibly with seasonal adjustment. These basic models assume that future values of Y are some sort of linear function of its past values, so we've also discussed the use of nonlinear data transformations (logging, deflating) to cover more possibilities. We've also studied basic principles and tools for testing the assumptions of models and comparing the forecasting accuracy of different models.

Where we're going next. Next we will consider models for predicting future values of Y as linear functions of already-known values of some other variable X, or possibly several other variables (X1, X2, etc.). These more general linear forecasting models are called regression models (for reasons to be explained). In some cases the X's could be lagged (previous) values of Y, but in general they are other variables whose movements are in some way predictive of movements in Y. Our same general tools for testing and comparing models will still apply, but now there will be more assumptions to test and more models to compare.

Game plan for this week. Today (videos #3-6): major concepts (correlation, R-squared, and all that); regression tools for time series data; regression procedures available in Statgraphics; example: rolling back the beer tax. Friday (videos #15-16): seasonality revisited (dummy variables); selecting regressors: manual vs. stepwise; modeling issues & diagnostic tests; more examples.

Game plan for next week. Tuesday, September 25: quiz; nonlinear transformations; not-so-simple regression; multiplicative regression. Friday, September 25 (videos #17-18): advanced regression techniques: ANOVA, general linear models, logistic regression, etc.

Linear regression is the most widely used (and abused!) of all statistical techniques. It is about the fitting of straight lines to data. [Scatterplots of Y1 vs. X, without and with a fitted line.] General equation of a (simple) regression line: Y = constant + beta*X.

Why assume linear relationships? Linear relationships are the simplest non-trivial relationships (start simple!), and "true" relationships between variables are often at least approximately linear over the range of interest. [Illustrative plots of Y vs. X.]

Linearization of relationships. Alternatively, we may be able to transform the variables in such a way as to linearize the relationships. Nonlinear transformations (log, power, reciprocal, deflation, differences, percentage differences, ratios of variables, etc.) are therefore an important tool of regression modeling, but use them with care and with good motivation!

Examples:
Sales $$ = constant + beta * Advertising $$
Δ Units sold = constant + beta * Δ Coupons distributed
% Return on stock = constant + beta * % Return on market
Log(Population) = constant + beta * Time
Temperature(t) = constant + beta * Temperature(t-1)
Δ WebHits(t) = constant + beta * Δ WebHits(t-1)
(Δ denotes "change in", i.e., delta, DIFF.)

History of regression. Regression was so named by Sir Francis Galton, a 19th-century scientist and adventurer. Galton initially gained fame for his African explorations and wrote best-selling books on wilderness exploration that introduced the sleeping bag and other wilderness gear to the Western world (still in print).

Galton (warts and all). He was also a pioneer in the collection and analysis of biometric, anthropometric, and psychometric data, inspired by the evolution theory of Darwin. He invented weather maps and pioneered the scientific study of tea-brewing. He was also wrong about some things (e.g., eugenics). His disciple, Karl Pearson, worked out the mathematics of correlation and regression. (Look him up in Google or Wikipedia, also Galton.org.)

Galton's observations. A taller-than-average parent tends to have a taller-than-average child, but the child is likely to be less tall than the parent relative to its own generation. If the parent's height is x standard deviations from the mean, the child's predicted height is rx standard deviations from the mean, where r is a number less than 1 in magnitude: the coefficient of correlation between heights of parents and children. This is a "regression toward mediocrity," or in modern terms a "regression to the mean."

The first regression line (1877).

Graphical interpretation of regression. If you standardize the X and Y variables by converting them to units of standard deviations from their own means, the prediction line passes through the origin and has a slope equal to the correlation coefficient, r. Thus, the line regresses back toward the X axis, because this minimizes the squared errors in predicting Y from X.

[Plot: standardize(Y3) vs. standardize(X), with the 45-degree line and the regression line.] On a standardized plot of Y vs. X, where the units are standard deviations from the mean, the data distribution is roughly symmetric around the 45-degree line, but the line for predicting Y from X regresses toward the X axis because this minimizes the squared error in the Y direction. The slope of the regression line on the standardized plot is the correlation r (= 0.46 in this case).
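The course tools here are Excel and Statgraphics; for readers who want to verify the claim numerically, below is a minimal Python/numpy sketch with made-up illustrative data showing that the least-squares slope of standardized Y on standardized X equals the correlation coefficient r.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.5 * x + rng.normal(size=500)          # illustrative data with a moderate linear relationship

# Standardize both variables: units of standard deviations from their own means
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()

r = np.corrcoef(x, y)[0, 1]
slope_zy_on_zx = np.polyfit(zx, zy, 1)[0]   # least-squares slope of standardized Y on standardized X

print(r, slope_zy_on_zx)                    # the two numbers agree: the standardized slope is r (< 1)
```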

Graphical interpretation, continued. [Same standardized plot.] If we instead wanted to predict X from Y, the line would regress to the Y axis instead! (This line would minimize the squared error measured in the X direction.)

Graphical interpretation of regression with time series data. In a simple regression of two time series, the forecast plot of Y is just a shifted and rescaled copy of the time series plot of X. In a multiple regression of time series, the forecast plot of Y is a weighted sum* of the plots of the X's. In either case, the time pattern in Y should look like some of the time patterns in the X variables: trends and peaks and valleys and spikes in Y ideally should have their counterparts somewhere among the X's. (*Weights can be positive or negative.)

Regression is inescapable. Your kids will probably be less exceptional than you and your spouse, for better or worse. Your performance on the final exam in a course will probably be less exceptional than your score on the midterm. A ballplayer's performance during the 2nd half of a season will probably be less exceptional than in the 1st half. The hottest mutual funds of the last 5 years will be less hot in the next 5 years.

Regression is inescapable, cont'd. Your forecasting models will always produce sequences of forecasts that are smoother (less variable) than the actual data. This doesn't mean the future is guaranteed to be more mediocre (less interesting) than the past, but that's the way to bet!

Why do predictions regress? Is there a restoring force that pulls everything back to the mean? No! It's a purely statistical phenomenon. Every observation of a random process is part signal (a predictable or inheritable component) and part noise (a random, unpredictable, zero-mean component). Here's why: an observation that is exceptional (far above or below the mean) is likely to be composed of a signal and a noise term with the same sign (both positive or both negative). If the high- (or low-) achiever performs again (or has offspring), the expected signal will be just as strong and in the same direction, but the expected noise term will be zero. Hence the second observation is likely (not guaranteed, just likely) to be closer to the mean.
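This signal-plus-noise argument is easy to check by simulation. The following is a small illustrative Python sketch (not part of the course materials); the variable names and the equal signal and noise variances are assumptions chosen only to make the effect easy to see.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
signal = rng.normal(size=n)          # predictable / inheritable component
noise1 = rng.normal(size=n)          # noise in the first observation
noise2 = rng.normal(size=n)          # independent noise in the second observation

obs1 = signal + noise1
obs2 = signal + noise2               # repeat performance / offspring: same signal, fresh noise

exceptional = obs1 > 2.0             # cases where the first observation was far above the mean
print(obs1[exceptional].mean())      # average of the exceptional first observations (well above 2)
print(obs2[exceptional].mean())      # their second observations are, on average, about half as far
                                     # from the mean: regression to the mean, with no restoring force
```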

Underlying assumptions of regression: a linear relationship between variables; constant variance of errors (homoscedasticity); normally distributed errors; independent errors (no autocorrelation); a stationary process (stable correlations over time). These need to be tested! Error statistics and confidence intervals for forecasts are not reliable if the assumptions are badly violated.

Sufficient statistics for regression. Regression analysis depends only on the following summary statistics of the data: the means of all variables, the variances (or standard deviations) of all variables, and the covariances (or correlations) between all pairs of variables. Given only these statistics, you can calculate all the coefficient estimates, standard errors, and forecasts for any regression model that might be fitted to any combination of the variables! (However, you still ought to look at residual plots, etc.)

Variance measures the tendency of a variable to vary (away from its mean).
Population variance: VARP(Y) = AVG((Y - AVG(Y))^2) ... the population variance is the average squared deviation of Y from its own mean.
Sample variance: VAR(Y) = (n/(n-1)) * VARP(Y) ... an unbiased estimate of the true variance based on a finite sample of size n. The factor n/(n-1) adjusts for the degree of freedom for error that was used up in calculating the mean from the same sample.
Our forecasting task is to explain the variance in Y. Why does it vary in the way that it does, i.e., why isn't it always constant?

Covariance measures the tendency of two variables to vary together.
Population covariance: COVP(X, Y) = AVG((X - AVG(X))(Y - AVG(Y))) ... the average product of the deviations of X and Y from their respective means.
If Y and X tend to be on the same side of their respective means at the same time (both above or both below), the average product of deviations is positive. If they tend to be on opposite sides of their own means at any given time, the average product is negative. If their variations around their own means are unrelated, the average product is zero.

Sample covariance: COV(X,Y) = (n/(n-1)) * COVP(X,Y) ... an unbiased estimate of the true covariance based on a sample of size n, analogous to the sample variance.

Correlation. The correlation coefficient is the covariance standardized by dividing by the product of standard deviations:
r = COV(X,Y)/(STDEV(X)*STDEV(Y)) = COVP(X,Y)/(STDEVP(X)*STDEVP(Y)) = CORREL(X,Y) in Excel.
It measures the strength of the linear relationship between X and Y on a relative scale of -1 to +1. When the correlation is significantly different from zero, variations in X can help to predict variations in Y.
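The slides state these quantities in Excel-function notation; here is a minimal Python/numpy sketch, with tiny made-up data, showing the same population/sample versions of variance and covariance and the correlation coefficient.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 5.0, 8.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 9.0])
n = len(x)

varp_y = np.mean((y - y.mean()) ** 2)                 # population variance: AVG((Y - AVG(Y))^2)
var_y = n / (n - 1) * varp_y                          # sample variance, same as np.var(y, ddof=1)

covp_xy = np.mean((x - x.mean()) * (y - y.mean()))    # population covariance
cov_xy = n / (n - 1) * covp_xy                        # sample covariance, same as np.cov(x, y)[0, 1]

# Correlation: covariance divided by the product of standard deviations (either convention works)
r = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))
print(var_y, cov_xy, r, np.corrcoef(x, y)[0, 1])      # r matches numpy's CORREL-style result
```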

Simple regression formulas.
Model assumption: Y_t = β0 + β1*X_t + ε_t, where β0 is the intercept, β1 is the slope, and ε_t is an independent, identically normally distributed error.
Prediction equation: Ŷ_t = β̂0 + β̂1*X_t.
Least squares coefficient estimates:
β̂1 = COV(X,Y)/VAR(X) = r*(STDEV(Y)/STDEV(X))
β̂0 = AVG(Y) - β̂1*AVG(X)
We have exact formulas for the coefficient estimates; we don't need to use Solver to minimize squared error. The slope coefficient is just the correlation multiplied by the ratio of standard deviations!

Multiple regression formulas. The formulas for coefficient estimates and forecast standard errors for the multiple regression model are merely matrix versions of the preceding formulas. If you're interested in the gory details, see the Regression formulas worksheet (SIMPREG.XLS) posted on the Course Outline web page (lecture 5 links).
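To see that these closed-form estimates agree with an ordinary least-squares fit, here is a small illustrative Python sketch (toy data, not from the course):

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 5.0, 8.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 9.0])

r = np.corrcoef(x, y)[0, 1]
beta1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)   # COV(X,Y)/VAR(X)
beta1_alt = r * (np.std(y, ddof=1) / np.std(x, ddof=1))   # r * STDEV(Y)/STDEV(X): the same number
beta0 = y.mean() - beta1 * x.mean()                        # AVG(Y) - beta1*AVG(X)

print(beta1, beta1_alt, beta0)
print(np.polyfit(x, y, 1))                                 # direct least-squares fit: [beta1, beta0]
```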

Standard error of the regression. The standard error of the regression, a.k.a. the standard error of the estimate, is the RMSE adjusted for the number of coefficients estimated:
s = SQRT((n-1)/(n-2)) * STDEV(e) = SQRT(((n-1)/(n-2)) * (1 - r^2) * VAR(Y))
where STDEV(e) is the sample standard deviation of the residuals (errors), (n-1)/(n-2) is the adjustment for the number of coefficients estimated (2), (1 - r^2) is the fraction of variance unexplained, and VAR(Y) is the original sample variance.
s is the estimated standard deviation of the true error process (ε_t), and in general it is slightly larger than the sample standard deviation of the residuals, due to the adjustment for additional coefficients estimated besides the constant. All the other standard errors (for coefficients, means, forecasts, etc.) are proportional to this quantity.

Standard errors of the coefficient estimates. Standard error of the slope coefficient:
SE(β̂1) = (1/SQRT(n)) * s/STDEVP(X)
t-statistic of the slope coefficient: t(β̂1) = β̂1 / SE(β̂1).
The p-value of the t-stat is TDIST(t, n-2, 2) in Excel. The larger the sample size (n), the more precise the coefficient estimate.
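A minimal Python sketch of these formulas, again with illustrative toy data (scipy's t distribution plays the role of Excel's TDIST here):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 4.0, 5.0, 8.0, 9.0, 11.0, 12.0])
y = np.array([2.1, 2.9, 5.2, 4.4, 8.8, 9.1, 10.9, 12.3])
n = len(x)

r = np.corrcoef(x, y)[0, 1]
beta1 = r * np.std(y, ddof=1) / np.std(x, ddof=1)
beta0 = y.mean() - beta1 * x.mean()
e = y - (beta0 + beta1 * x)                          # residuals

# Standard error of the regression: RMSE adjusted for the 2 coefficients estimated
s = np.sqrt((n - 1) / (n - 2) * (1 - r**2) * np.var(y, ddof=1))

se_beta1 = (1 / np.sqrt(n)) * s / np.std(x)          # s / (SQRT(n) * STDEVP(X))
t_stat = beta1 / se_beta1
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)      # two-sided p-value, like TDIST(t, n-2, 2)

print(s, np.std(e, ddof=1))                          # s is slightly larger than STDEV(e)
print(se_beta1, t_stat, p_value)
```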

Standard error of the mean. The standard error of the mean at X = X_t is the standard deviation of the error in estimating the true height of the regression line at that point:
SE_mean = (s/SQRT(n)) * SQRT(1 + (X_t - AVG(X))^2 / VARP(X))
The factor s/SQRT(n) is the same as the standard error of the mean in the mean model; the second factor is a correction for the distance of X_t from the mean.

Standard error of the forecast. The standard error of the forecast is
SE_fcst = SQRT(s^2 + SE_mean^2) = s * SQRT(1 + (1/n)*(1 + (X_t - AVG(X))^2 / VARP(X)))
The s^2 term measures the noise (unexplained variation) in the data; the SE_mean^2 term measures the error in estimating the height of the true regression line at X = X_t. Note that this is almost the same formula we used for the mean model in class 1. The only difference is that calculating SE_mean is slightly more complicated here: it depends on the value of X_t.

Lower bounds on standard errors. s/SQRT(n) is a lower bound on the standard error of the mean (equalled only when X_t = AVG(X)), and s*SQRT(1 + 1/n) is the corresponding lower bound on the standard error of the forecast. Key point: the standard errors of the forecasts for Y are larger for values of X that are farther from the mean, i.e., farther from the center of the data distribution.

Confidence limits. Confidence limits for a forecast are obtained by adding and subtracting the appropriate multiples of the forecast standard error (as usual). For large n (>20) a rough 95% confidence interval is plus or minus 2 standard errors. The exact number of standard errors for a 95% interval, for any n, is given by TINV(.05, n-2) in Excel. A 50% interval is roughly 1/3 as wide (plus or minus 2/3 standard error).
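Putting the last three slides together, here is an illustrative Python sketch that computes SE_mean, SE_fcst, and 95% confidence limits for a forecast at an assumed point X_t (toy data; scipy's t quantile plays the role of Excel's TINV):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 4.0, 5.0, 8.0, 9.0, 11.0, 12.0])
y = np.array([2.1, 2.9, 5.2, 4.4, 8.8, 9.1, 10.9, 12.3])
n = len(x)

r = np.corrcoef(x, y)[0, 1]
beta1 = r * np.std(y, ddof=1) / np.std(x, ddof=1)
beta0 = y.mean() - beta1 * x.mean()
s = np.sqrt((n - 1) / (n - 2) * (1 - r**2) * np.var(y, ddof=1))   # std. error of the regression

x_t = 10.0                                            # point at which we want a forecast (assumed)
se_mean = (s / np.sqrt(n)) * np.sqrt(1 + (x_t - x.mean())**2 / np.var(x))
se_fcst = np.sqrt(s**2 + se_mean**2)                  # noise variance + line-estimation variance

t_mult = stats.t.ppf(0.975, df=n - 2)                 # exact multiplier, like TINV(.05, n-2)
forecast = beta0 + beta1 * x_t
print(forecast - t_mult * se_fcst, forecast + t_mult * se_fcst)   # 95% confidence limits
```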

[Plot: Y2 vs. X with the 95% confidence interval for the mean, evaluated at X_t = 210.] Note that confidence intervals are wider when X is far from the center; this probably understates the danger of over-extrapolating a linear model!

[Plot: Y2 vs. X with the 95% confidence interval for the forecast, evaluated at X_t = 210.] The confidence interval for the forecast reflects both the parameter risk concerning the slope and intercept of the regression line and the intrinsic risk of random variations around it.

Strange but true. For any regression model: VAR(Y) = VAR(Ŷ) + VAR(e), i.e., total variance = explained variance + unexplained variance. For a simple regression model: VAR(Ŷ)/VAR(Y) = r^2, i.e., the fraction of variance explained equals r squared.

R-squared. The term R-squared refers to the fraction of variance explained, i.e., the ratio VAR(Ŷ)/VAR(Y), regardless of the number of regressors. It measures the improvement of the regression model over the mean model for Y. A bigger R-squared is usually better, for the same Y, but R-squared should not be used to compare models that may have used different transformations of Y and/or different data samples. R-squared can be very misleading for regressions involving time series data: 90% or more is not necessarily good, and 10% or less is not necessarily bad.
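Both identities can be verified numerically. A minimal Python sketch with toy data (not from the course):

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 5.0, 8.0, 9.0, 11.0, 12.0])
y = np.array([2.1, 2.9, 5.2, 4.4, 8.8, 9.1, 10.9, 12.3])

beta1, beta0 = np.polyfit(x, y, 1)
y_hat = beta0 + beta1 * x
e = y - y_hat

# Total variance = explained variance + unexplained variance (same variance convention throughout)
print(np.var(y), np.var(y_hat) + np.var(e))

# Fraction of variance explained = R-squared; for simple regression it equals r^2
r = np.corrcoef(x, y)[0, 1]
print(np.var(y_hat) / np.var(y), r**2)
```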

Example: rolling back the beer tax. Suppose the 1991 beer tax had been rolled back in July 2007, resulting in an immediate 10-point drop in the beer price index (from 118.23 to 108.23). What would be the expected effect on per capita real consumption (BeerPerCapita)? What would we predict for the consumption rate in July 2007? (The June 2007 rate is $268.65 per year SAAR in year-2000 beer dollars.)

In search of a linear model. [Time series plot of BeerPerCapita and BeerRelPrice, 1960-2010, with the post-tax-hike anomaly and the assumed relative price drop in June '07 marked; scatterplot of BeerPerCapita vs. BeerRelPrice.] What will happen to per capita consumption in July 2007? The scatterplot of BeerPerCapita vs. relative price (BeerPrice/CPI) reveals a strong negative correlation and a highly linear relationship (except for the mid-90's anomaly). The actual correlation is -0.94, which suggests that 0.94^2 ≈ 88% of the variance in BeerPerCapita can be explained by BeerRelPrice.

Summary statistics of variables. Here are summary stats and correlations of the two variables, obtained with the Multiple-Variable Analysis procedure. Note that the standard deviation of BeerPerCapita is $39.15, which is essentially the forecast standard error we would get by using the mean model to predict it. How much better can we do with a regression model? Well, there is a very strong negative correlation of -0.94 with BeerRelPrice, and the square of the correlation is the fraction by which the error variance can be reduced by regressing BeerPerCapita on BeerRelPrice instead of using the mean model.
Variable definitions: BeerPerCapita = 100000*Beer/(BeerPrice*Population); BeerRelPrice = BeerPrice/CPI.

Fitting a simple regression model: Relate/Multiple Factors/Multiple Regression on the Statgraphics menu.

Typical regression output. The standard error of the regression, a.k.a. the standard error of the estimate, is the RMSE adjusted for the number of coefficients estimated; it is the bottom line, IF it is really representative of future accuracy. R-squared and adjusted* R-squared are not the bottom line! Coefficients and their standard errors, t-stats (= coeff./std. error), and p-values are used to test whether some variables are insignificant in the presence of the others. Residual plots and diagnostic tests are used to test the assumptions of linearity, normality, no autocorrelation, etc. (*Adjusted for the number of coefficients in the same way as the standard error of the regression, to be able to compare among models with different numbers of coefficients.)

What to look for in regression output: error measures (smaller is better); t-statistics of coefficients greater than 2 in magnitude (p-values < 0.05), which indicate that variables appear significant*; economic interpretations of coefficients; and residual plots and diagnostic tests: residuals vs. predicted (nonlinearity?), the normal probability plot (skew? fat tails? outliers?), residuals vs. time (for time series data), and the residual autocorrelation plot (for time series data). *Not a hard and fast rule, but variables that don't pass this test can often be removed without being missed. If a variable's presence in the model is strongly supported by intuition or theory, then a low t-stat may be OK: its effect may just be hard to measure.

Basic regression output. The Interval Plot option plots the regression line vs. the dependent variable or time index. R-squared = 88% as expected, and the slope coefficient (-280.8) is highly significant (t-stat = -64). The standard error of the regression is $13.80, much less than the original standard deviation, but still a lot of error in predicting next month's per capita consumption! The Durbin-Watson stat and lag-1 autocorrelation are also very bad! (DW should be close to 2, not zero; the lag-1 autocorrelation should be close to zero, not 1!)

Deconstruction of R-squared. The variance of the dependent variable is (39.15)^2 ≈ 1533. This is the error variance you would get by using the mean model. The variance of the regression forecast errors is the square of the regression standard error, which is (13.8)^2 ≈ 190. The fraction of the original variance that remains unexplained is 190/1533 ≈ 12%, hence the fraction explained is 88%. This is the reduction in error variance compared to using the mean model instead.

What's the Durbin-Watson statistic? It's just an alternative statistic for measuring lag-1 autocorrelation in the residuals, which is popular in regression analysis for obscure historical reasons. 0 < DW < 4, and ideally DW ≈ 2. DW ≈ 2(1 - r1), where r1 = the lag-1 autocorrelation. r1 is easier to interpret: a good value is close to 0, and r1^2 is roughly the percentage of further variance reduction that could be achieved by fine-tuning to reduce the autocorrelation.
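The DW ≈ 2(1 - r1) relationship is easy to check on a simulated residual series. A small illustrative Python sketch (the AR(1)-style residuals are an assumption, chosen only to produce visible autocorrelation):

```python
import numpy as np

rng = np.random.default_rng(1)
e = np.zeros(200)
for t in range(1, 200):
    e[t] = 0.7 * e[t - 1] + rng.normal()               # residuals with strong lag-1 autocorrelation

dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)          # Durbin-Watson statistic
r1 = np.corrcoef(e[:-1], e[1:])[0, 1]                  # lag-1 autocorrelation of the residuals

print(dw, 2 * (1 - r1))                                # approximately equal: DW ~ 2*(1 - r1)
```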

Economic interpretation of model. The slope coefficient of -280.8 suggests that a 0.01 decrease in the relative price (X) should increase consumption (Y) by $2.81. The proposed tax rollback would decrease the relative price by 0.049 (from 0.569 to 0.520). Thus, predicted consumption in July '07 will increase by 0.049*280.8 = $13.76 from its predicted June '07 value. But the model's prediction for June '07 is already way off! Hence the prediction for July '07 is less than the actual June '07 value, despite the price drop.

Forecasting equation of model 1. Forecasting equation of this model: Y_t = 386.3 - 280.8*X_t. For July '07: Y_t = 386.3 - 280.8*(0.52) ≈ 240.20. The forecast for Y depends (only) on the current value of X, not on recent values of Y.
BeerRelPrice (X): May '07 = 0.565, June '07 = 0.569, July '07 = 0.520
BeerPerCapita (Y): May '07 = 272.33, June '07 = 268.65, July '07 = 240.20??

The forecast ("Reports") report. The multiple regression procedure automatically shows forecasts (on the Reports report) if future values are provided for the independent variable(s). Here, a July 2007 value of 0.52 was plugged in for BeerRelPrice on the data spreadsheet, and the resulting forecast for BeerPerCapita is $240.20, which is $28.45 below the current value of $268.65. The upper 95% limit of $267.4 is even below the current value!

[Plot: regression line with 95% confidence limits for the forecasts; the last data point (268.65) and the 95% CI for the forecast at BeerRelPrice = 0.52 are marked.] This is the Interval Plots chart drawn with 95% intervals for predicted values (a right-mouse-button option). The interval for the July '07 prediction is at the upper left, where BeerRelPrice = 0.52.

The plot of residuals vs. row number (time) shows severe autocorrelation, i.e., long runs of values with the same sign, as foretold by the bad DW stat and lag-1 autocorrelation, and the most recent errors have been especially large. The plot of predicted values (red) vs. row number (the Interval Plot) shows a poor fit to the data, and the predicted jump in July '07 falls well short of the June '07 value. The predicted values are actually just a shifted and rescaled version of BeerRelPrice.

Regression option in the Forecasting procedure. Here model E is specified as a mean + 1 regressor model. You can also fit the same regression model in the Forecast/User-Specified Model procedure. Choose the Mean model type and hit the Regression button to add independent variables. This approach allows you to use the model-comparison features and additional residual diagnostics in the forecasting procedure. Caveat: no more than 4 independent variables are allowed here.

We get the same regression results and forecast, but the normal probability plot and autocorrelation plot of the residuals look terrible, and the comparison with simpler time series models is not flattering!

Conclusion (so far). Although this model provides a plausible estimate of the macroeconomic relationship between relative price and per capita consumption (assuming that the long-term upward trend in consumption is entirely caused by the long-term downward trend in relative price!), it does not do a very plausible job of forecasting the near future. Why not? It is a cross-sectional model that does not exploit the time dimension in the data: it predicts consumption for a randomly chosen relative price. Due to other, unmodeled factors, the data wanders away from the regression line and does not return very quickly, so the errors are strongly correlated.

How to incorporate the time dimension in a regression model? Some possible approaches: predict changes instead of levels (i.e., use a first-difference transformation); use lagged variables (recent values of dependent and independent variables) as additional regressors, to serve as proxies for effects of unmodeled variables*; or use an autocorrelation correction (e.g., Cochrane-Orcutt or an ARIMA error structure) as a proxy for unmodeled factors*. (*We'll discuss these in later classes.)

Let's look at monthly changes. Here's a plot of the original BeerPerCapita series obtained in the Time Series/Descriptive Methods procedure. No transformations have been performed yet.

On the right-mouse-button Analysis Options panel, entering a 1 in the Differencing box performs a first-difference transformation. Now we are seeing the plot of month-to-month changes in BeerPerCapita. Here are time series plots of both BeerPerCapita and BeerRelPrice, before and after a first-difference transformation. Note that the differenced series appear to be "stationary": no trend, constant variance, etc. The circled point in the lower right is the assumed price impact of the tax rollback in July 2007. (This 4-chart arrangement was made by pasting the plots into the StatGallery.)
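The first-difference transformation itself is a one-liner outside Statgraphics as well. Below is a minimal pandas sketch; the column names and the short made-up series are assumptions for illustration, standing in for the actual BeerPerCapita and BeerRelPrice data.

```python
import pandas as pd

# Illustrative monthly values; in the real example these columns would be read from the data file
df = pd.DataFrame({
    "BeerPerCapita": [270.1, 268.4, 271.9, 269.8, 272.3, 268.7],
    "BeerRelPrice":  [0.571, 0.569, 0.566, 0.568, 0.565, 0.569],
})

# First-difference transformation: month-to-month changes (the first row becomes NaN and is dropped)
d = df.diff().dropna().rename(columns=lambda c: "diff(" + c + ")")

print(d)
print(d.corr())   # correlation of the differenced variables
```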

[Scatterplot of diff(BeerPerCapita) vs. diff(BeerRelPrice).] Here the Plot/Scatterplot/X-Y Plot procedure was used to plot diff(BeerPerCapita) vs. diff(BeerRelPrice). The scatterplot of the differenced variables indicates a weaker but still significant negative correlation (-0.33). The two circled points in the lower right are the drops in Jan. and Feb. 1991 due to the beer tax increase. Even when these two points are de-selected, there is still a significant negative correlation of -0.28.

Statistics of the differenced variables. The correlations and other summary stats of the differenced variables were computed using the Describe/Numeric Data/Multiple-Variable Analysis procedure. Note that the standard deviation of diff(BeerPerCapita) is only $2.637. This is roughly the forecast standard error you would get by using a random-walk-with-drift model to predict BeerPerCapita, because the random walk model merely predicts that each change will equal the mean change. Hence we can already see that the forecast standard error of the RW model is smaller than that of the original cross-sectional regression by roughly a factor of 5. However, let's see if we can improve on the RW model by regressing diff(BeerPerCapita) on diff(BeerRelPrice).

Simple regression of diff(BeerPerCapita) on diff(BeerRelPrice). In a simple regression of the differenced variables, the change in BeerPerCapita is predicted from the change in BeerRelPrice. This is a "micro" prediction rather than a "macro" prediction: our predicted level of BeerPerCapita in the next period will be equal to the current level plus the predicted change. The regression standard error is vastly superior! There is still some autocorrelation, but not nearly as bad. The estimated coefficient of diff(BeerRelPrice) is -257.7, in the same ballpark as the coefficient of BeerRelPrice in the earlier model. Hence a similar change in consumption per unit change in relative price is predicted. However, this model is directly predicting the change, not the level. The predicted change in July '07 is positive (+$12.65), in line with intuition. But what happened to R-squared? It's fallen to around 11%! (Horrors)

What happened to R-squared? The previous model explained 88% of the variance in the monthly level of BeerPerCapita. Because BeerPerCapita is a nonstationary, trended variable, it has a lot of variance to explain! This model directly predicts the change in BeerPerCapita, which is a stationary series with a much lower variance to begin with. Hence, less variance remains to be explained by this regression model, and an R-squared of only 11% is actually a much better performance. Another way to look at it: when the dependent variable is undifferenced, R-squared measures the reduction in error variance compared to using the mean model. When the dependent variable is differenced, R-squared measures the reduction in error variance compared to using the random-walk-with-drift model on the original variable. Here, a random walk model (or another simple time series model) would have been a much better reference point for predictions of monthly per capita beer consumption. The regression of differenced variables is a walk model in which the steps are not completely random: they depend on the change in price!

Deconstruction of R-squared. The variance of the differenced dependent variable is (2.637)^2 ≈ 6.95. This is the error variance you would get by using the random-walk-with-drift model on the original undifferenced variable. The variance of the regression forecast errors is the square of the regression standard error, which is now (2.494)^2 ≈ 6.22. The fraction of the original variance that remains unexplained is 6.22/6.95 ≈ 89%, hence the fraction explained is 11%. This is not a huge improvement over the random walk model in terms of forecast accuracy, but it does allow us to factor in the price sensitivity of consumers.

Forecasting equation for model 2. Forecasting equation for the change in Y: (Y_t - Y_t-1) = 0.0893 - 257.7*(X_t - X_t-1). For July '07: (Y_t - Y_t-1) = 0.0893 - 257.7*(0.520 - 0.569) ≈ 12.65. Undifferenced forecast for the new level of Y: Y_t = Y_t-1 + 12.65 = 268.65 + 12.65 = 281.30. The ultimate forecast from this model steps off from the last actual value of Y, as in the random walk model, but now the step size depends on the change in X.
BeerRelPrice (X): May '07 = 0.565, June '07 = 0.569, July '07 = 0.520
BeerPerCapita (Y): May '07 = 272.33, June '07 = 268.65, July '07 = 281.30!!
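For readers who want to retrace the arithmetic, here is a tiny Python sketch that reproduces the R-squared deconstruction for model 2 and the step-off forecast, using only the summary numbers quoted on the slides (the predicted change of 12.65 is taken directly from the slide rather than recomputed from rounded coefficients):

```python
sd_diff_y = 2.637        # standard deviation of diff(BeerPerCapita), from the summary stats
s_regression = 2.494     # standard error of the differenced regression

unexplained = s_regression**2 / sd_diff_y**2             # fraction of variance left unexplained
print(sd_diff_y**2, s_regression**2)                     # ~6.95 and ~6.22
print(unexplained, 1 - unexplained)                      # ~0.89 unexplained, ~0.11 explained

# Undifferenced forecast: step off from the last actual level by the predicted change
last_actual = 268.65          # June '07 BeerPerCapita
predicted_change = 12.65      # model 2's predicted change for July '07 (from the slide)
print(last_actual + predicted_change)                    # 281.30
```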

Same model in the Forecasting procedure. There are several ways in which the differenced regression model can be fitted in the Forecasting procedure. The simplest way is to specify it as an ARIMA model with 1 order of nonseasonal differencing plus 1 regressor and a constant term. The first-difference transformation is applied to both variables prior to fitting the regression model. We get almost the same regression results and forecast (slightly different estimation procedure), and the normal probability plot and autocorrelation plot of the residuals are much better (not perfect, but acceptable). The differenced regression model (B) is best on all error measures, but not by a large margin.

More fine-tuning?? The differenced model still has a technically significant lag-1 autocorrelation of -0.23. Because it is negative, it means the model is over-reacting rather than under-reacting to recent changes in the data. By the r-squared rule, this suggests that 0.23^2 ≈ 0.05, i.e., about 5% of the remaining variance might be explained via more fine-tuning (e.g., adding lagged variables). This is not a large improvement: it corresponds to about a 2.5% further reduction in standard error, hence a 2.5% shrinkage in confidence intervals.

Class 5 recap. Regression to mediocrity is inescapable. Correlations and scatterplots help to reveal the strengths of linear relationships. We saw how to interpret regression output and test residuals. Much of the variance in the original data may be explainable merely by an appropriate transformation of the data, such as a first-difference transformation applied to nonstationary time series variables. R-squared is not the bottom line!