NATCOR Regression Modelling for Time Series

Universität Hamburg, Institut für Wirtschaftsinformatik, Prof. Dr. D.B. Preßmar
Professor Robert Fildes
The material presented has been developed with the substantial help of Dr. Sven Crone.

Forecasting Methods RECAP: Extrapolative vs. Explanatory (Causal) Methods
* Extrapolative methods use only past historical data to recognise patterns and extrapolate them into the future. [Figure: volume plotted against time]
* Explanatory (causal) methods use additional explanatory variables to predict the future. Relationships between dependent and independent variables are identified and assumed to hold in the future. [Figure labels: Promotion, BOGOF, Price]
There is no a priori superiority of either approach: it depends on the problem and the data.

Using regression in time series forecasting research
* In market modelling
  o Diffusion of new technologies
  o Optimization of the marketing mix, promotions
* Revenue management
  o In retail, hotels, transport
  o All models depend on demand
* One important issue: decoupling the optimization from the forecasting is suboptimal
  o Best models/policies under certainty with sensitivity testing are often far from best under forecast uncertainty
* In operations
  o Collaboration
* In forecasting and econometric research
  o Aggregation, non-linearity, conditions where causal methods add value
  o Analysis of behavioural and field experiments

Agenda: Regression
1. Recap: Multiple Regression
   * Concept of Causal Models
   * Choosing the variables
2. Recap: Specifying Regression Models
   * Estimating a multivariate Regression Model
   * Validating a Regression Model
3. Extending the basic Regression for Time Series
   * Modelling (deterministic) Seasonality
   * Modelling Outliers & Level Shifts
   * Including Dummy Variables for Promotions / Events
   * Including Time Lags
4. The Model building Process

CAUSAL FORECASTING: a simple regression model
Inputs or explanatory variables: $X_1, X_2, \ldots$
Output or dependent variable: $Y$
$$Y_t = \alpha + \beta_1 X_{1t} + \beta_2 X_{2t} + \cdots + \text{error}_t$$
Past relationships in the data are identified and assumed to hold in the future.
Estimated model:
$$Y_t = a + b_1 X_{1t} + b_2 X_{2t} + \cdots + \text{error}_t$$

Deciding the Causal Variables
Variable to be forecast: retail petrol price.
How do you decide the variables?
* Prior research
* Experience
* Experts
* Brainstorming

Graphical Analysis and Preliminary Model Development
Suppose we are interested in the level of gas prices as a function of various explanatory variables. Observe gas prices ($Y_t$) over $n$ time periods, $t = 1, 2, \ldots, n$.
Step 1: DDD (Draw the D- Diagram), a time plot of Y against time.
Step 2: Produce a scatter plot of Y against each explanatory variable $X_j$.
o For step 2, identify possible variables, e.g.:
  * Personal Disposable Income
  * Unemployment
  * S&P 500 Index
  * Price of crude oil
Data: Jan 1996 to Dec 2010

A Case Study on the Price of Gasoline
Suppose we are interested in predicting the price of (unleaded regular) gasoline (at the pump), given the price of crude oil at the refinery. We examine monthly data. The price of crude oil takes some time to have its effect on the pump price, so we lag the price of crude by one month.
Define the variables: Y = Unleaded, X = L_crude.
Q: Why else might we use a lagged value for the X variable?
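The one-month lag described above can be sketched in a few lines. This is only an illustration: the helper name `lag` and all the numbers are made up for the example and are not the course data.

```python
def lag(series, k=1):
    """Shift a series k periods back; the first k entries have no lagged value."""
    n = len(series)
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and len(series)")
    return [None] * k + list(series[:n - k])

# Hypothetical monthly crude prices and pump prices (illustrative only).
crude = [18.9, 19.1, 21.3, 23.5, 21.2]
unleaded = [1.09, 1.09, 1.14, 1.23, 1.28]

# L_crude: last month's crude price aligned with this month's pump price.
l_crude = lag(crude, 1)

# Drop the first month, which has no lagged value, before fitting a model.
pairs = [(y, x) for y, x in zip(unleaded, l_crude) if x is not None]
```

Lagging costs one observation per lag, which is why a sample of n months leaves n - 1 usable rows when X is lagged once.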

Graphical Analysis and Preliminary Model Development (figure)
Data: Jan 1996 to Dec 2010

Regression Correlation
Correlation does not imply causation!!!
1. The solution to fire fighting! The more firemen fighting a fire, the more damage there is going to be. Therefore firemen cause damage.
2. Reducing crime levels is simple... Since the 1950s, both the atmospheric CO2 level and global temperature have increased sharply. Hotter weather always leads to higher crime levels. Hence, reducing atmospheric CO2 will lead to decreasing crime.
But check out Bernard Shaw in Ord and Fildes.

Graphical Analysis and Preliminary Model Development
[Correlation table: in each cell the first row is the Pearson correlation and the second row is the P-value.]
The Pearson correlation measures the linear relationship.
Data: Jan 1996 to Dec 2010
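As a reminder of what the Pearson correlation does and does not capture, here is a minimal sketch in plain Python. The function name `pearson` and the toy data are assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)

# A perfect linear relationship gives r = 1.
r_linear = pearson([1, 2, 3, 4], [2, 4, 6, 8])

# A perfect but symmetric nonlinear relationship (y = x^2) gives r = 0:
# the coefficient only measures the linear part of the association.
xs = [-2, -1, 0, 1, 2]
r_quadratic = pearson(xs, [v ** 2 for v in xs])
```

The second case is why the scatter plots from Step 2 should be inspected alongside the correlation table.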

Agenda: Regression
1. Recap: Multiple Regression
   * Concept of Causal Models
   * Choosing the variables
2. Recap: Specifying Regression Models
   * Estimating a multivariate Regression Model
   * Validating a Regression Model
3. Extending the basic Regression for Time Series
   * Modelling (deterministic) Seasonality
   * Modelling Outliers & Level Shifts
   * Including Dummy Variables for Promotions / Events
   * Including Time Lags
4. The Model building Process (Lecture 5)

The Multiple Regression Model
Assume a linear relation between $Y$ and $X_1, \ldots, X_p$:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$$
where:
* $\beta_0$ = intercept (value of $Y$ when all $X_j = 0$)
* $\beta_j$ = expected effect of $X_j$ on $Y$, all other factors fixed
* $\varepsilon$ = random error
Expected value of $Y$ given the $\{X_j\}$:
$$E(Y \mid X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$$
So Y = [Expected value] + [Random error], or Observed = Signal + Noise.

The Method of Ordinary Least Squares (OLS)
Define error = Observed - Fitted:
$$e_i = Y_i - E(Y_i \mid X_i)$$
Estimate the intercept and slope coefficients by minimizing the sum of squared errors (SSE). That is, choose the coefficients $\{b_0, b_1, \ldots, b_K\}$ to minimize:
$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i} - \cdots - b_K X_{Ki}\right)^2$$

Least Squares Estimators
* Equivalent to Maximum Likelihood
* Unbiased: on average the estimators are centred on the true parameter value, $E(\hat{\beta}) = \beta$
* Efficient: there is no other linear estimator with a smaller variance, $\operatorname{var}(\hat{\beta}) \le \operatorname{var}(\beta^*)$ for any other estimator $\beta^*$
* Consistency: as the sample size $n$ increases (to $\infty$), the estimator $\hat{\beta}_n \to \beta$

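Regression: Multivariate Models
Ordinary Least Squares Linear Regression (OLS): the maths
$$\mathbf{y}_{(t \times 1)} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_t \end{pmatrix}, \quad X_{(t \times (p+1))} = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & x_{t1} & \cdots & x_{tp} \end{pmatrix}, \quad \boldsymbol{\beta}_{((p+1) \times 1)} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad \boldsymbol{\varepsilon}_{(t \times 1)} = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_t \end{pmatrix}$$
Written out row by row:
$$y_1 = \beta_0 + x_{11}\beta_1 + x_{12}\beta_2 + x_{13}\beta_3 + \cdots + \varepsilon_1$$
$$y_2 = \beta_0 + x_{21}\beta_1 + x_{22}\beta_2 + x_{23}\beta_3 + \cdots + \varepsilon_2$$
$$\vdots$$
$$y_t = \beta_0 + x_{t1}\beta_1 + x_{t2}\beta_2 + x_{t3}\beta_3 + \cdots + \varepsilon_t$$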

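Regression: Multivariate Models
Ordinary Least Squares Linear Regression (OLS): the maths
$$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$
Minimize $\mathbf{e}'\mathbf{e}$, a scalar:
$$\hat{\boldsymbol{\beta}} = (X'X)^{-1}X'\mathbf{y}$$
TIP! Where is the constant? If a column of $X$ is constant (every entry equal to 1), its coefficient $\beta_0$ is the constant.
$X'$ = transpose of matrix $X$ (switch between rows and columns)
$X^{-1}$ = inverse of matrix $X$, $XX^{-1} = I$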
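The matrix formula can be checked numerically. The sketch below assumes NumPy is available; the function name `ols` and the noise-free toy data are illustrative choices, not part of the course material. It solves the normal equations $(X'X)b = X'y$ directly rather than forming the inverse, which gives the same estimate.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y = b0 + X b + e.

    A column of ones is prepended to X so that the first
    coefficient returned is the constant b0.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), X])
    # Normal equations: (X'X) b = X'y
    return np.linalg.solve(X1.T @ X1, X1.T @ y)

# Toy data generated without noise from y = 1 + 2*x1 - 3*x2,
# so the estimates should recover (1, 2, -3) exactly up to rounding.
X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 3.0], [4.0, 2.0]])
y = 1 + 2 * X[:, 0] - 3 * X[:, 1]
b = ols(X, y)
```

For real data, `np.linalg.lstsq` or a statistics package is usually preferable because it is more robust when the columns of X are nearly collinear.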

Testing Individual Coefficients
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$$
Test of a coefficient:
* $H_0$: slope coefficient for $X_i$ is zero [$X_i$ does not add value to the model, given the other variables in the model]
* $H_A$: slope coefficient for $X_i$ is not zero [$X_i$ adds value, given the other variables in the model]
Test statistic: $t = b_i / SE(b_i)$, where $b_i$ = slope coefficient and $SE(b_i)$ = standard error of $b_i$.

Testing the Model
The estimated coefficients are random and unbiased.
The decision rule for testing the individual model coefficients is:
o Reject $H_0$ if $P < \alpha$,
o where $P$ is the observed significance level, and
o $\alpha$ is the significance level used for testing, typically 0.05.
o The rule implies that we do not reject $H_0$ if $P > \alpha$.
o The rule applies for all testing, including the F test for the model.
If a variable is insignificant under the null hypothesis of no linear effect, i.e. $b = 0$, what do we then do?
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$$
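The test statistic and the decision rule are simple enough to write down directly. In this sketch the function names are invented for illustration, and the coefficient 2.200 with standard error 0.1068 is an assumed example input. The P-value itself is taken as given (from software output or tables) rather than computed.

```python
def t_statistic(b, se):
    """t = b / SE(b) for a single regression coefficient."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return b / se

def reject_null(p_value, alpha=0.05):
    """Decision rule: reject H0 when the observed significance level P < alpha."""
    return p_value < alpha

# Assumed example: coefficient 2.200 with standard error 0.1068.
t_crude = t_statistic(2.200, 0.1068)

# P-values as they would be read from regression output (illustrative).
keep_first = reject_null(0.003)   # P < 0.05, so H0 is rejected
keep_second = reject_null(0.190)  # P > 0.05, so H0 is not rejected
```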

The Multiple Regression Model
Example 8.1: The Regression Model for Unleaded Gasoline Prices
The regression equation is
Unleaded = 101 + 2.20 L_crude - 16.4 L_Unemp - 0.0406 L_SP500 + 0.0125 L_PDI

Predictor   Coef       SE Coef    T       P
Constant    101.37     21.63      4.69    0.000
L_crude     2.200      0.1068     20.60   0.000
L_Unemp     -16.397    3.752      -4.37   0.000
L_SP500     -0.04063   0.01360    -2.99   0.003
L_PDI       0.01254    0.002324   5.40    0.000

Q: Interpretation: does the model make sense?

How can we compare different models? Let's look at fit.
$$\hat{Y}_i = a + bX_i, \qquad e_i = Y_i - \hat{Y}_i = \text{residual}$$

Date       Unleaded   Crude   ResCrude   ResPDI
5-Jan-96   .09        8.85    0.34       -0.00
5-Feb-96   .089       9.09    0.37       -0.08
5-Mar-96   .37        2.33    0.350      -0.20
5-Apr-96   .23        23.5    0.430      -0.085
5-May-96   .279       2.7     0.464      0.026
5-Jun-96   .256       20.42   0.426      0.024
5-Jul-96   .227       2.3     0.396      -0.029
5-Aug-96   .207       2.9     0.366      -0.066
5-Sep-96   .202       23.97   0.348      -0.26
5-Oct-96   .204       24.88   0.342      -0.49
5-Nov-96   .232       23.7    0.359      -0.089
5-Dec-96   .235       25.23   0.350      -0.28
RMSE                          0.376      0.097

What are good measures of the fit of our model? The individual residuals tell us everything, but can we summarise their information?
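One common way to summarise a column of residuals in a single number is the root mean squared error. The sketch below uses the simple divide-by-n definition, which is an assumption: some packages divide by the residual degrees of freedom instead. The residual values are made up.

```python
import math

def rmse(residuals):
    """Root mean squared error: sqrt of the average squared residual."""
    if not residuals:
        raise ValueError("need at least one residual")
    return math.sqrt(sum(e * e for e in residuals) / len(residuals))

# Hypothetical residuals from two competing models on the same months.
res_model_a = [0.3, 0.4, 0.35, 0.45]
res_model_b = [-0.1, 0.05, -0.02, 0.08]

# The model with the smaller RMSE fits the sample more closely.
better = "B" if rmse(res_model_b) < rmse(res_model_a) else "A"
```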

Partition of the Sum of Squares: How Good is the Fitted Line?
Total Sum of Squares:
$$SST = S_{YY} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$
Sum of Squared Errors (Unexplained Variation):
$$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
Sum of Squares accounted for by the regression equation:
$$SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$$
The sums of squares are partitioned: SST = SSR + SSE.
Our aim: to build a model that explains the variation in Y, SST.
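The partition can be verified numerically for a fitted least-squares line. The sketch below is plain Python for the single-regressor case; the function names and the five data points are illustrative assumptions.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for the least-squares line through (x, y)."""
    a, b = fit_line(x, y)
    my = sum(y) / len(y)
    fitted = [a + b * xi for xi in x]
    sst = sum((yi - my) ** 2 for yi in y)
    ssr = sum((fi - my) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return sst, ssr, sse

# Made-up data lying close to a straight line.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
sst, ssr, sse = sums_of_squares(x, y)

# Share of the variation in Y explained by the regression.
r_squared = ssr / sst
```

The identity SST = SSR + SSE holds only for least-squares fits that include a constant; for an arbitrary line the cross term does not vanish.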

ERROR VARIANCE
The residual variation is also used to calculate the error variance (and standard deviation):
$$s^2 = \hat{\sigma}^2 = \frac{\sum_{i} (Y_i - \hat{Y}_i)^2}{n - p}$$
It measures the accuracy with which the model explains the data. The accuracy of a prediction using the model depends on it.
MINITAB and SPSS carry out the calculation automatically, calculating a confidence interval, $I$, i.e. 90% of the time the actual value lies within the range $[\hat{Y} - I, \hat{Y} + I]$. You need to select the 'confidence' probability within which you expect the observation to lie.

F-test
$$F = \frac{\text{Explained MS}}{\text{Unexplained MS}} = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 / K}{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 / (n - K - 1)} = \frac{\text{Explained SS} / \text{Explained df}}{\text{Unexplained SS} / \text{Unexplained df}}$$
MS = Mean Square
SS = Sum of Squares
df = degrees of freedom
m = number of parameters (coefficients in the equation) = K + 1, with K explanatory variables and a constant
n = number of observations
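The F ratio follows directly from the two sums of squares and their degrees of freedom. A minimal sketch, with an invented function name and made-up inputs:

```python
def f_statistic(explained_ss, unexplained_ss, n, k):
    """F = (Explained SS / K) / (Unexplained SS / (n - K - 1)).

    n is the number of observations and k the number of explanatory
    variables (a constant is assumed to be in the model as well).
    """
    df_explained = k
    df_unexplained = n - k - 1
    if df_explained <= 0 or df_unexplained <= 0:
        raise ValueError("need k >= 1 and n > k + 1")
    return (explained_ss / df_explained) / (unexplained_ss / df_unexplained)

# Made-up example: 25 observations, 4 explanatory variables.
f_value = f_statistic(explained_ss=80.0, unexplained_ss=20.0, n=25, k=4)
```

The resulting value would be compared with the F distribution on (K, n - K - 1) degrees of freedom, using the same P < alpha decision rule as for the t tests.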

SUMMARY STATISTICS
The 'F' statistic and its associated p-value measure the overall explanatory power of the explanatory (independent) variables. The model should be chosen so that at least some elements of it are likely to have strong explanatory power, so the F test is not particularly important.
The 't' tests measure the impact of individual variables.
Methods of comparing alternative models are therefore needed. Commonly used:
* $R^2$: a measure of the overall adequacy of the model
* $\hat{\sigma}$ or $s$: the standard deviation of the error term, which measures the uncertainty around the predictions, i.e. the model fit

TESTING THE ADEQUACY OF A MODEL
Is this a good model? Criteria include:
* Interpretation of the estimated model
* Strength of relationship between Xs and Y
* Overall adequacy of the model
* Validity of assumptions
* Errors in predictions
No simple rules.

STRENGTH OF THE RELATIONSHIP
Does the independent variable (X) affect the dependent variable (Y)?
Linear model is:
$$Y = \beta_0 + \beta_1 X + (\text{error})$$
A unit change in X produces a (linear additive) impact of $\beta_1$ in Y.
Interpreting the model: it should make sense, i.e. correct sign, low standard error.
Elasticity: the proportionate change in Y relative to the proportionate change in X is measured as:
$$\frac{\Delta Y / Y}{\Delta X / X}$$
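For a linear model the elasticity is not constant: since $\Delta Y / \Delta X = \beta_1$, the ratio above becomes $\beta_1 X / Y$ and depends on the point at which it is evaluated. A small sketch with invented numbers:

```python
def point_elasticity(slope, x, y):
    """Elasticity of Y with respect to X at the point (x, y) for a linear model.

    (dY/Y) / (dX/X) = (dY/dX) * (X/Y) = slope * x / y
    """
    if y == 0:
        raise ValueError("elasticity is undefined when y is zero")
    return slope * x / y

# Hypothetical values: slope 0.02, X = 50, Y = 2.0.
# A 1% rise in X is then associated with roughly a 0.5% rise in Y.
elasticity = point_elasticity(0.02, 50.0, 2.0)
```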

Checking the Assumptions: Assumption of normality
* Plot a histogram
* Plot a P-P plot
Statistical tests:
1. Kolmogorov-Smirnov
2. Chi-square

Checking the Assumptions: Assumption of constant variance
Most of the residuals should be contained in a fixed corridor (constant variance).
Statistical tests:
1. Split residuals in two sub-samples
2. Hypothesis testing:
   a. $H_0: \sigma_1 = \sigma_2$
   b. $H_A: \sigma_1 \neq \sigma_2$
Also check for:
1. Systematic patterns
2. Trend
3. Seasonality
None should be present!
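The split-sample check can be sketched as a ratio of the two sub-sample variances, similar in spirit to the Goldfeld-Quandt test. The function name and the residuals below are illustrative assumptions; the ratio would be compared with an F critical value for the two sub-sample degrees of freedom.

```python
from statistics import variance

def variance_ratio(residuals):
    """Split residuals into two halves (in time order) and return
    the larger sample variance divided by the smaller one."""
    n = len(residuals)
    if n < 4:
        raise ValueError("need at least four residuals")
    half = n // 2
    v1 = variance(residuals[:half])
    v2 = variance(residuals[half:])
    if min(v1, v2) == 0:
        raise ValueError("a sub-sample has zero variance")
    return max(v1, v2) / min(v1, v2)

# Made-up residuals whose spread triples in the second half.
residuals = [1, -1, 1, -1, 3, -3, 3, -3]
ratio = variance_ratio(residuals)
```

A ratio close to 1 is consistent with constant variance; a large ratio, as here, points to heteroscedasticity.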

Checking the Assumptions: Unpredictability of residuals
Plot residuals against:
1. Predicted values ($\hat{Y}$)
2. All explanatory variables (X)
3. Time
There should be no patterns, linear or nonlinear.
Statistical tests:
1. Correlation
[Figure annotations: "What is this point? Outlier?" and "Looks more or less random"]

Checking the Assumptions: Detecting serial correlation in residuals
Residual Autocorrelation Function (ACF)
[Figure: sample autocorrelation plotted against lag, annotated "Strong serial correlation: model invalid!"]
* Residuals should be uncorrelated
* There should be no significant lags
* Can be used to check for season/trend (there should be none!)
Statistical tests: Ljung-Box test. The statistic $LB$ is built from the squared autocorrelations $r_k^2$, where $r_k = \operatorname{corr}(e_t, e_{t-k})$ is the $k$th order autocorrelation coefficient, and is compared with a $\chi^2$ distribution with the appropriate degrees of freedom.
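A sketch of the residual ACF and the Ljung-Box statistic follows. It uses the standard textbook form $Q = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)$, which is an assumption here since the exact formula shown on the slide is not recoverable. Function names and data are illustrative.

```python
def autocorrelation(residuals, k):
    """Lag-k sample autocorrelation r_k of the residual series."""
    n = len(residuals)
    if not 0 < k < n:
        raise ValueError("lag k must satisfy 0 < k < n")
    mean = sum(residuals) / n
    denom = sum((e - mean) ** 2 for e in residuals)
    if denom == 0:
        raise ValueError("residuals are constant")
    num = sum((residuals[t] - mean) * (residuals[t - k] - mean)
              for t in range(k, n))
    return num / denom

def ljung_box(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag (standard form)."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )

# A zig-zag residual pattern has strong negative lag-1 autocorrelation.
zigzag = [1.0, -1.0] * 4
r1 = autocorrelation(zigzag, 1)
q1 = ljung_box(zigzag, 1)
```

Q is usually compared with a chi-square distribution whose degrees of freedom equal the number of lags tested; a large Q signals that the residuals are not uncorrelated.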

Checking the Assumptions

Analysis of Residuals for Gas Price Data
* Residuals appear to be approximately normal (Probability Plot and Histogram), but there are some outliers
  o Check the original data to identify the outliers and to determine possible explanations
* Model does not capture time dependence: zig-zag pattern in Residuals vs. Order
* Errors are not homoscedastic: see Residuals vs. Fitted Value
* Increased volatility in the later part of the series: see Residuals vs. Order
* Some evidence of seasonal pattern: look for peaks every 12 months in Residuals vs. Order

Analysis of Residuals for Gas Price Data
Graph residuals. The Autocorrelation Function (ACF) allows us to check for dependence at a range of possible lags.
Ljung-Box statistic: $LB$ is built from the squared autocorrelations $r_k^2$, where $r_k = \operatorname{corr}(e_t, e_{t-k})$.
NB: The Durbin-Watson test examines only first order autocorrelation and is invalid with lags. Its values lie between 0 and 4, centred at 2.
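For comparison, the Durbin-Watson statistic can be computed from the residuals in one line. The sketch uses the standard definition $DW = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$; the function name and the toy residuals are assumptions for illustration.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 for uncorrelated residuals,
    towards 0 for positive and towards 4 for negative
    first-order autocorrelation."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    denom = sum(e * e for e in residuals)
    if denom == 0:
        raise ValueError("all residuals are zero")
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    return num / denom

# Zig-zag residuals (negative autocorrelation) push DW above 2.
dw_zigzag = durbin_watson([1, -1, 1, -1])

# Residuals that never change sign or size push DW to 0.
dw_flat = durbin_watson([2, 2, 2, 2])
```

Because it only looks at adjacent residuals, DW would miss a purely seasonal pattern at lag 12 that the ACF would reveal.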

Forecasting with Multiple Regression
Example: One-step-ahead forecasts for gas prices
We use the four-variable model for gas prices as an illustration: Crude, Unemployment, S&P, PDI.
Estimate the model to December 2008. One-step-ahead forecasts were generated from the equation: the forecast for January 2009 uses the December 2008 values of the explanatory variables.
The regression model is
Unleaded = 1.01 + 0.0220 L_crude - 0.164 L_Unemp - 0.000406 L_S&P + 0.000125 L_PDI
The values for December 2008 are as follows: L_crude: 41.12, L_Unemp: 7.3, L_S&P: 888.6, L_PDI: 12257.7
The forecast for January 2009 is then
F = 1.01 + 0.0220 * 41.12 - 0.164 * 7.3 - 0.000406 * 888.6 + 0.000125 * 12257.7 = 1.889.
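The forecast arithmetic can be reproduced in a few lines. The coefficient and input values below are assumptions: they are the figures as reconstructed for this example (constant 1.01; December 2008 crude 41.12, unemployment 7.3, S&P 888.6, PDI 12257.7) and should be checked against the original data before reuse. The dictionary keys and the function name are invented for the sketch.

```python
# Assumed (reconstructed) coefficients of the four-variable model.
coefficients = {
    "const": 1.01,
    "L_crude": 0.0220,
    "L_Unemp": -0.164,
    "L_SP": -0.000406,
    "L_PDI": 0.000125,
}

# Assumed December 2008 values of the explanatory variables.
december_2008 = {
    "L_crude": 41.12,
    "L_Unemp": 7.3,
    "L_SP": 888.6,
    "L_PDI": 12257.7,
}

def point_forecast(coefs, inputs):
    """Constant plus the sum of coefficient * input over all variables."""
    return coefs["const"] + sum(coefs[name] * value
                                for name, value in inputs.items())

forecast_jan_2009 = point_forecast(coefficients, december_2008)
```

With these inputs the result rounds to 1.889. Small differences from software output are to be expected because the coefficients are rounded.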

Forecasting with Multiple Regression
Given values of the inputs $\{X_{n+1,1}, X_{n+1,2}, \ldots, X_{n+1,K}\}$, the point forecast is given by:
$$F_{n+1} = b_0 + b_1 X_{n+1,1} + b_2 X_{n+1,2} + \cdots + b_K X_{n+1,K}$$
The Prediction Interval is given by:
$$F_{n+1} \pm t_{\alpha/2}(n - K - 1) \cdot SE(Y_{n+1} - F_{n+1})$$
where $t$ denotes the appropriate percentage point from t-tables. This assumes the Xs are known.
Example: K = 4 and n = 155, so DF = 150. The SE for the point forecast is found to be 0.1695 (use R, not SPSS). Using $t_{0.025}(150) = 1.976$, we find that the 95 percent prediction interval is 1.8975 ± (1.976)(0.1695) = 1.8975 ± 0.335 = [1.563, 2.233].
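The interval calculation is a one-liner once the forecast, its standard error and the t value are available. In this sketch the t value is read from tables rather than computed, the function name is invented, and the inputs (forecast 1.8975, t = 1.976, SE = 0.1695) are the example's figures as reconstructed, so treat them as assumptions.

```python
def prediction_interval(forecast, t_crit, se):
    """Return (lower, upper) = forecast -/+ t_crit * se."""
    if se < 0:
        raise ValueError("standard error must be non-negative")
    half_width = t_crit * se
    return forecast - half_width, forecast + half_width

# Assumed inputs: 150 degrees of freedom, 95 percent interval.
lower, upper = prediction_interval(forecast=1.8975, t_crit=1.976, se=0.1695)
```

The centre of 1.8975 differs slightly from a forecast computed with rounded coefficients, presumably because the software uses the unrounded estimates.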

Issues in Multivariate Regression
* Establishing the variables to include
  o Testing for insignificant coefficients (& variables)
* Estimation, summary statistics and testing (same as in the bivariate, single explanatory variable case)
* Simplifying the model
  o Removing inappropriate variables
  o Avoiding spurious relationships
* Comparing different models

Summary - A Multivariate Example: The process of model building
* Establish a default model
* Graph the data: any features that need explaining?
* Estimate models
* Check assumptions: diagnostic checks
* Compare models
* Interpretation
* Summary statistics (is the model fit for purpose?)
* Revise model

Gas prices model building I
1. Graph focus variable: retail price
2. Identify possible explanatory variables
   * L_Personal Disposable Income
   * L_Unemployment
   * L_S&P 500 Index
   * L_Price of crude oil
   * Seasonal dummies
   * And?
3. Estimated model is (+ seasonal dummies):

Coefficients (SPSS output, unstandardized coefficients)
Model      B           Std. Error   t        Sig.
L_Crude    .024        .001         35.279   .000
L_PDI      4.994E-05   .000         3.903    .000
L_Unemp    -.014       .011         -1.316   .190
L_Prod     .003        .004         .775     .439
L_SP       4.078E-05   .000         .567     .571

Gas prices model building II
* Summary stats
* Diagnostics
  o Outliers?
  o Residuals (approx normal), homoscedastic
  o Serial (auto) correlation? Tests, graphs; add lags
  o Relationship with Xs?
* Stability: split?
* Forecasting performance: compared to other regression specifications + univariate benchmark

Take-Aways
* Start the modeling process by careful consideration of available theory and previous empirical studies.
* Carry out a full preliminary analysis of the data to look for associations and for unusual observations.
* Test both the overall model and the individual components.
* Examine the validity of the underlying assumptions.
* Make sure that the model is sensible with respect to the signs and magnitudes of the slope coefficients.
* Use a hold-out sample to evaluate forecasting performance.

Questions?
Basic reference: Ord, K. and Fildes, R. (2013) Principles of Business Forecasting, South-Western, Cengage.