Univariate ARIMA Models


ARIMA Model Building Steps:

Identification: Using graphs, statistics, ACFs and PACFs, transformations, etc. to achieve stationarity and tentatively identify patterns and model components.

Estimation: Determine the coefficients, estimated through software implementations of least squares and maximum likelihood methods.

Diagnostics: Using graphs, statistics, and ACFs and PACFs of the residuals to verify whether the model is valid. If valid, use the chosen model; otherwise repeat the Identification, Estimation and Diagnostics steps.

Forecast: Using graphs, simple statistics and confidence intervals to determine the validity of the forecast, and track model performance to detect out-of-control situations.

(Flow chart: Identification -> Estimation -> Diagnostics; if good -> Forecast, if bad -> revised model, back to Identification.)
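As a minimal sketch of this identify-estimate-diagnose-forecast cycle, assuming Python with statsmodels and pandas (the toy random-walk data, the tentative order (1,1,1), and the lag choices are illustrative assumptions, not part of the original notes):

```python
# A minimal sketch of the Box-Jenkins cycle with statsmodels.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
y = pd.Series(np.cumsum(rng.standard_normal(120)))  # toy series

# 1. Identification: difference toward stationarity, inspect ACF/PACF.
plot_acf(y.diff().dropna(), lags=20)
plot_pacf(y.diff().dropna(), lags=20)
plt.show()

# 2. Estimation: fit a tentative ARIMA(p, d, q) by maximum likelihood.
fit = ARIMA(y, order=(1, 1, 1)).fit()
print(fit.summary())

# 3. Diagnostics: residuals of an adequate model should be white noise.
print(acorr_ljungbox(fit.resid, lags=[10], model_df=2))

# 4. Forecast: point forecasts plus confidence intervals.
fc = fit.get_forecast(steps=12)
print(fc.predicted_mean)
print(fc.conf_int())
```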

ARIMA Model Assumptions: In ARIMA terms, a time series is a linear function of past actual values and random shocks, that is Y_t = f(Y_{t-k}, e_{t-k}) + e_t, where k > 0. In ARIMA modelling we do not have a forecasting model a priori; the model is only settled once Model Identification takes place. ARIMA helps us choose the right model to fit the time series. Put as a flow chart: Y_t (actual) -> ARIMA machine (model of the pattern) -> Ŷ_t and e_t (accurate forecasts and white-noise residuals/errors) -> decide the right empirical model.

ARIMA Notation:

AR(p), where p = order of autoregression (indicates a weighted average over past observations).

I(d), where d = order of integration, i.e. differencing (indicates a linear or polynomial trend).

MA(q), where q = order of moving average (indicates a weighted average over past errors).
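To make the notation concrete, here is a small simulation sketch of the three building blocks; the parameter values phi = 0.7 and theta = 0.4 are arbitrary illustrative choices:

```python
# Sketch: simulating the three ARIMA building blocks by hand.
import numpy as np

rng = np.random.default_rng(1)
n, phi, theta = 200, 0.7, 0.4
e = rng.standard_normal(n)

ar1 = np.zeros(n)  # AR(1): Y_t = phi * Y_{t-1} + e_t
ma1 = np.zeros(n)  # MA(1): Y_t = e_t + theta * e_{t-1}
for t in range(1, n):
    ar1[t] = phi * ar1[t - 1] + e[t]
    ma1[t] = e[t] + theta * e[t - 1]

walk = np.cumsum(e)  # I(1): Y_t = Y_{t-1} + e_t, a random walk
```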

ARIMA Processes:

1. Autoregressive Process, ARIMA(1,0,0): Y_t = φY_{t-1} + e_t, or with a constant, Y_t = μ + φY_{t-1} + e_t.

Bound of stationarity: the absolute value of φ is less than 1 (-1 < φ < 1). If φ = 1, the model becomes ARIMA(0,1,0), which is non-stationary. If φ > 1, the past values Y_{t-k} and e_{t-k} have greater and greater influence on Y_t, implying the series is non-stationary with an ever-increasing mean. To sum up: if the bound of stationarity does not hold, the series is not autoregressive; it is either drifting or trending, and first-differencing should be used to model the series as stationary.

Autoregressive Process, ARIMA(p,0,0): Y_t = μ + φ_1 Y_{t-1} + φ_2 Y_{t-2} + ... + φ_p Y_{t-p} + e_t, or Y_t = φ_1 Y_{t-1} + φ_2 Y_{t-2} + ... + φ_p Y_{t-p} + e_t.

2. Moving Average Process, ARIMA(0,0,1): The general principle of moving average processes is that a random shock persists for exactly q observations and is then gone. Using the black-box input-output analogy, we can think of a first-order moving average process as a box: the random shock e_t enters the box, is joined with a portion θ of the preceding random shock e_{t-1}, and leaves the box as the time series observation Y_t. The rest of the preceding random shock e_{t-1} has already leaked out of the system, along with all prior random shocks back into the distant past. So an ARIMA(0,0,1) process can be expressed as:

Y_t = e_t + θe_{t-1}, or with a constant, Y_t = μ + e_t + θe_{t-1}
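The "shock persists for exactly q observations" principle shows up directly in a tiny simulation; a minimal sketch, assuming the plus-sign MA(1) form above with an illustrative theta = 0.6:

```python
# Sketch: in an MA(1), a shock persists for exactly one further observation.
import numpy as np

theta = 0.6
e = np.zeros(8)
e[3] = 1.0                            # a single unit shock at t = 3
e_lag = np.concatenate(([0.0], e[:-1]))
y = e + theta * e_lag                 # Y_t = e_t + theta * e_{t-1}
print(y)                              # nonzero only at t = 3 and t = 4
```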

Writing an ARIMA(0,0,1) process at two points in time, Y_t = e_t + θe_{t-1} and Y_{t-1} = e_{t-1} + θe_{t-2}, and substituting the expression for e_{t-1} (that is, e_{t-1} = Y_{t-1} - θe_{t-2}) into the expression for Y_t:

Y_t = e_t + θ(Y_{t-1} - θe_{t-2}) = θY_{t-1} + e_t - θ²e_{t-2}

Similarly, for e_{t-2} = Y_{t-2} - θe_{t-3}:

Y_t = θY_{t-1} - θ²(Y_{t-2} - θe_{t-3}) + e_t = θY_{t-1} - θ²Y_{t-2} + e_t + θ³e_{t-3}

And continuing this substitution back in time:

Y_t = Σ_{i=1}^∞ (-1)^{i+1} θ^i Y_{t-i} + e_t

Therefore, an ARIMA(0,0,1) process can be expressed identically as an infinite sum of exponentially weighted (in absolute value) past observations of the process. Any ARIMA(0,0,q) can likewise be expressed as an infinite series of exponentially weighted past observations. Given this relationship, it is clear that the values of moving average parameters must be constrained between -1 and 1 (-1 < θ < 1).

Bound of invertibility: the absolute value of θ is less than 1 (-1 < θ < 1). If this does not hold, the model is non-invertible.

Moving Average Process, ARIMA(0,0,q): Y_t = e_t + θ_1 e_{t-1} + θ_2 e_{t-2} + ... + θ_q e_{t-q}. The important feature of such an ARIMA(0,0,q) is that the shocks e_{t-1} to e_{t-q} are unobserved and have to be estimated using the available sample data. In practice it is usual to keep q at a small value, often 1 or 2.

3. Integrated Processes: ARIMA(0,1,0)

Random Walk Process, ARIMA(0,1,0): Y_t = Y_{t-1} + e_t, i.e. Y_t - Y_{t-1} = e_t, i.e. ΔY_t = e_t. All future values are expected to equal the last known actual value (a numerical check is sketched below).

Deterministic Trend Process, ARIMA(0,1,0) with constant: Y_t = Y_{t-1} + T + e_t, i.e. ΔY_t = T + e_t, where T is the trend. For a forecast horizon of m periods, Y_{t+m} = Y_{t+m-1} + T + e_{t+m}, so the m-step-ahead forecast is Ŷ_{t+m} = Y_t + mT.

4. ARIMA(p,0,q): Y_t = φ_1 Y_{t-1} + φ_2 Y_{t-2} + ... + φ_p Y_{t-p} + e_t + θ_1 e_{t-1} + θ_2 e_{t-2} + ... + θ_q e_{t-q}
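The random-walk forecasting property noted above, that all future values are expected to equal the last known actual value, can be checked numerically; a minimal sketch assuming statsmodels and toy data:

```python
# Sketch: for a random walk, every forecast equals the last observed value.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(3)
y = np.cumsum(rng.standard_normal(200))  # ARIMA(0,1,0) data
fit = ARIMA(y, order=(0, 1, 0)).fit()
print(fit.forecast(steps=5))             # five identical values
print("last observation:", y[-1])
```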

5. ARIMA(p,1,q), written in terms of the first difference W_t = ΔY_t = Y_t - Y_{t-1}:

W_t = φ_1 W_{t-1} + φ_2 W_{t-2} + ... + φ_p W_{t-p} + e_t + θ_1 e_{t-1} + θ_2 e_{t-2} + ... + θ_q e_{t-q}

Invertibility. The infinite MA representation of ARIMA(1,0,1), Y_t = φY_{t-1} + e_t + θe_{t-1}:

Y_t = φY_{t-1} + e_t + θe_{t-1}
    = φ(φY_{t-2} + e_{t-1} + θe_{t-2}) + e_t + θe_{t-1}
    = φ²Y_{t-2} + e_t + (φ+θ)e_{t-1} + φθe_{t-2}
    = φ²(φY_{t-3} + e_{t-2} + θe_{t-3}) + e_t + (φ+θ)e_{t-1} + φθe_{t-2}
    = φ³Y_{t-3} + e_t + (φ+θ)e_{t-1} + φ(φ+θ)e_{t-2} + φ²θe_{t-3}

Continuing the substitution back in time, the leading Y term dies out (since |φ| < 1), leaving:

Y_t = e_t + (φ+θ)e_{t-1} + φ(φ+θ)e_{t-2} + φ²(φ+θ)e_{t-3} + ... + φ^{n-1}(φ+θ)e_{t-n} + ...

that is, Y_t = e_t + Σ_{j=1}^∞ φ^{j-1}(φ+θ) e_{t-j}

The infinite AR representation of ARIMA(1,0,1):

e_t = Y_t - φY_{t-1} - θe_{t-1}
    = Y_t - φY_{t-1} - θ(Y_{t-1} - φY_{t-2} - θe_{t-2})
    = Y_t - (φ+θ)Y_{t-1} + θφY_{t-2} + θ²e_{t-2}
    = Y_t - (φ+θ)Y_{t-1} + θφY_{t-2} + θ²(Y_{t-2} - φY_{t-3} - θe_{t-3})
    = Y_t - (φ+θ)Y_{t-1} + θ(φ+θ)Y_{t-2} - θ²φY_{t-3} - θ³e_{t-3}

Continuing the substitution, the trailing e term dies out (since |θ| < 1), leaving:

e_t = Y_t - (φ+θ)Y_{t-1} + θ(φ+θ)Y_{t-2} - θ²(φ+θ)Y_{t-3} + ...

that is, e_t = Y_t - Σ_{j=1}^∞ (-1)^{j-1} θ^{j-1}(φ+θ) Y_{t-j}
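These two infinite representations can be computed numerically; a minimal sketch using statsmodels' arma2ma and arma2ar helpers, with illustrative values phi = 0.6 and theta = 0.3 (note that statsmodels writes lag polynomials with a leading 1 and negated AR coefficients):

```python
# Sketch: psi-weights (infinite MA) and pi-weights (infinite AR) of an
# ARMA(1,1): Y_t = phi*Y_{t-1} + e_t + theta*e_{t-1}.
import numpy as np
from statsmodels.tsa.arima_process import arma2ar, arma2ma

phi, theta = 0.6, 0.3
ar = np.array([1, -phi])   # AR polynomial 1 - phi*L
ma = np.array([1, theta])  # MA polynomial 1 + theta*L

psi = arma2ma(ar, ma, lags=6)  # Y_t = sum_j psi_j * e_{t-j}
pi = arma2ar(ar, ma, lags=6)   # AR-representation weights on Y_{t-j}
print(psi)  # 1, (phi+theta), phi*(phi+theta), phi^2*(phi+theta), ...
print(pi)   # sign convention follows statsmodels' polynomial form
```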

Identification: ARIMA Model Identification Tools

1. Autocorrelation function (ACF):

ACF(k) = cov(Y_t, Y_{t-k}) / var(Y_t) = [ Σ_{t=k+1}^{n} (Y_t - Ȳ)(Y_{t-k} - Ȳ) ] / [ Σ_{t=1}^{n} (Y_t - Ȳ)² ]

2. Partial autocorrelation function (PACF): The PACF measures the additional correlation between Y_t and Y_{t-k} after adjustments have been made for the intermediate values Y_{t-1}, ..., Y_{t-k+1}. The PACF is closely related to the ACF, and its values also lie between -1 and +1. The specific computational procedures for PACFs are complicated, but these formulas do not need to be understood in order to use PACFs in the model identification phase.

3. Graph plot: Autocorrelations and partial autocorrelations are generally displayed either as a table of values or as a plot of the coefficients in a correlogram. (Note: the two ARIMA(0,0,1) rows below follow the Box-Jenkins sign convention Y_t = e_t - θ_1 e_{t-1}; with the plus-sign convention used in the derivations of these notes, the signs of the lag-1 peaks are reversed.)

Process (Model) | ACFs | PACFs
ARIMA(0,0,0) | No significant lags | No significant lags
ARIMA(0,1,0) | Linear decline from lag 1, with many lags significant | Single significant peak at lag 1
ARIMA(1,0,0), 1 > φ_1 > 0 | Exponential decline, with the first two or many lags significant | Single significant positive peak at lag 1
ARIMA(1,0,0), -1 < φ_1 < 0 | Alternating exponential decline starting with a negative peak at ACF(1) | Single significant negative peak at lag 1
ARIMA(0,0,1), 1 > θ_1 > 0 | Single significant negative peak at lag 1 | Exponential decline of negative values, with the first two or many lags significant
ARIMA(0,0,1), -1 < θ_1 < 0 | Single significant positive peak at lag 1 | Alternating exponential decline starting with a positive peak at PACF(1)
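These patterns are easy to reproduce on simulated data; a minimal sketch using statsmodels, with illustrative coefficients of 0.7:

```python
# Sketch: checking the identification table on simulated AR(1) and MA(1) data.
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(4)
ar1 = ArmaProcess(ar=[1, -0.7], ma=[1]).generate_sample(
    500, distrvs=rng.standard_normal)
ma1 = ArmaProcess(ar=[1], ma=[1, 0.7]).generate_sample(
    500, distrvs=rng.standard_normal)

# AR(1): ACF decays exponentially, PACF cuts off after lag 1.
print(acf(ar1, nlags=5))
print(pacf(ar1, nlags=5))
# MA(1): ACF cuts off after lag 1, PACF decays.
print(acf(ma1, nlags=5))
print(pacf(ma1, nlags=5))
```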

Theoretical ACFs for an ARIMA(0,0,0) or white noise process: Y_t = e_t, or Y_t = μ + e_t.

Cov(Y_t, Y_{t-1}) = E(Y_t Y_{t-1}) = E(e_t e_{t-1}) = 0
Var(Y_t) = E(Y_t²) = σ_e²
ACF(1) = Cov(Y_t, Y_{t-1}) / Var(Y_t) = 0

Since cov(Y_t, Y_{t-k}) = 0 for all k, ACF(k) = 0, which indicates the series is white noise.

Theoretical ACFs for an ARIMA(0,1,0): ΔY_t = e_t, i.e. Y_t = Y_{t-1} + e_t.

Cov(Y_t, Y_{t-1}) = E(Y_t Y_{t-1}) = E[(Y_{t-1} + e_t)Y_{t-1}] = E(Y_{t-1}Y_{t-1}) + E(Y_{t-1}e_t) = E(Y_{t-1}²)
Var(Y_t) = E(Y_t²) ≈ Var(Y_{t-1})
ACF(1) = Cov(Y_t, Y_{t-1}) / Var(Y_t) ≈ 1

Similarly, Cov(Y_t, Y_{t-2}) = E(Y_t Y_{t-2}) = E[(Y_{t-2} + e_{t-1} + e_t)Y_{t-2}] = E(Y_{t-2}²) + 0 + 0, so ACF(2) ≈ 1.

Since cov(Y_t, Y_{t-k}) ≠ 0 for all k, ACF(k) ≈ 1 at every lag; the ACF dies out only very slowly, and it is the first difference ΔY_t = e_t that is white noise.
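A sample-ACF sketch matching these two results (toy data, lags chosen arbitrarily):

```python
# Sketch: sample ACFs are near 0 for white noise and near 1, dying out
# very slowly, for a random walk.
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(5)
e = rng.standard_normal(1000)
print(acf(e, nlags=3))             # approx [1, 0, 0, 0]
print(acf(np.cumsum(e), nlags=3))  # all values close to 1
```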

Theoretical ACFs for an ARIMA(1,0,0) process: Y_t = φY_{t-1} + e_t.

Multiplying both sides by Y_{t-1} and taking expected values:

E(Y_t Y_{t-1}) = E[(φY_{t-1} + e_t)Y_{t-1}] = E(φY_{t-1}² + Y_{t-1}e_t), where E(Y_t Y_{t-1}) = COV(Y_t, Y_{t-1})
COV(Y_t, Y_{t-1}) = φE(Y_{t-1}²) + E(Y_{t-1}e_t)

Since E(Y_{t-1}²) means VAR(Y_{t-1}), and Y_{t-1} and e_t are statistically independent of each other in ARIMA(1,0,0), and owing to stationarity VAR(Y_{t-k}) = VAR(Y_t), we get:

COV(Y_t, Y_{t-1}) = φVAR(Y_{t-1}) + 0 = φVAR(Y_t)

Then ACF(1) = COV(Y_t, Y_{t-1}) / VAR(Y_t) = φVAR(Y_t) / VAR(Y_t) = φ.

Next, continuing in a similar fashion for ACF(2), multiplying both sides by Y_{t-2} and taking expected values:

E(Y_t Y_{t-2}) = E[Y_{t-2}(φY_{t-1} + e_t)], where E(Y_t Y_{t-2}) = COV(Y_t, Y_{t-2})
COV(Y_t, Y_{t-2}) = E[Y_{t-2}(φ(φY_{t-2} + e_{t-1}) + e_t)], where Y_{t-1} = φY_{t-2} + e_{t-1}
COV(Y_t, Y_{t-2}) = φ²E(Y_{t-2}²) + φE(Y_{t-2}e_{t-1}) + E(Y_{t-2}e_t)

By similar derivations we get:

COV(Y_t, Y_{t-2}) = φ²VAR(Y_{t-2}) + 0 + 0 = φ²VAR(Y_t)

Then ACF(2) = COV(Y_t, Y_{t-2}) / VAR(Y_t) = φ²VAR(Y_t) / VAR(Y_t) = φ².

Similarly derived, for k > 0 the general formula for ACF(k) is:

ACF(k) = COV(Y_t, Y_{t-k}) / VAR(Y_t) = φ^k

Since -1 < φ < 1, the ACFs of an ARIMA(1,0,0) process decline exponentially.

Boundary of stationarity: -1 < φ_1 < 1.

If φ_1 = 0.5:
ACF(1) = φ_1 = 0.5
ACF(2) = φ_1² = 0.25
ACF(3) = φ_1³ = 0.125
ACF(k) = φ_1^k = 0.5^k

[Figure: correlogram of ACF(k) = 0.5^k against lag k, positive spikes shrinking exponentially toward zero.]

If φ_1 = -0.5:
ACF(1) = φ_1 = -0.5
ACF(2) = φ_1² = 0.25
ACF(3) = φ_1³ = -0.125
ACF(k) = φ_1^k = (-0.5)^k

[Figure: correlogram of ACF(k) = (-0.5)^k against lag k, spikes alternating in sign while shrinking exponentially.]
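Both correlograms can be generated from the theoretical ACF; a minimal sketch using statsmodels' ArmaProcess helper (an assumed tool choice, not part of the notes):

```python
# Sketch: the theoretical ACF of an AR(1) equals phi**k.
from statsmodels.tsa.arima_process import ArmaProcess

for phi in (0.5, -0.5):
    print(ArmaProcess(ar=[1, -phi], ma=[1]).acf(lags=4))
    print([phi ** k for k in range(4)])  # matches term by term
```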

Theoretical ACFs for an ARIMA(0,0,1) process: For an ARIMA(0,0,1) model, Y_t is statistically independent of Y_{t-k} for k > 1. Also, because they are white noise, e_t and e_{t-k} are statistically independent. The model is:

Y_t = e_t + θe_{t-1}

Multiplying both sides by Y_{t-1} = e_{t-1} + θe_{t-2} and taking expected values:

E(Y_t Y_{t-1}) = E[(e_t + θe_{t-1})(e_{t-1} + θe_{t-2})], where E(Y_t Y_{t-1}) means COV(Y_t, Y_{t-1})
COV(Y_t, Y_{t-1}) = E(e_t e_{t-1} + θe_t e_{t-2} + θe_{t-1}² + θ²e_{t-1}e_{t-2})
COV(Y_t, Y_{t-1}) = θE(e_{t-1}²)

Then, find VAR(Y_t):

VAR(Y_t) = E(Y_t²) = E[(e_t + θe_{t-1})²] = E(e_t² + 2θe_t e_{t-1} + θ²e_{t-1}²) = E(e_t²) + θ²E(e_{t-1}²) = (1 + θ²)E(e_t²), since E(e_t²) = E(e_{t-1}²)

Then ACF(1) = COV(Y_t, Y_{t-1}) / VAR(Y_t) = θE(e_{t-1}²) / (1 + θ²)E(e_t²) = θ / (1 + θ²) ≠ 0

Using a similar method to derive ACF(k): for ACF(2), multiplying both sides by Y_{t-2} = e_{t-2} + θe_{t-3} and taking expected values:

E(Y_t Y_{t-2}) = E[(e_t + θe_{t-1})(e_{t-2} + θe_{t-3})], where E(Y_t Y_{t-2}) means COV(Y_t, Y_{t-2})
COV(Y_t, Y_{t-2}) = E(e_t e_{t-2} + θe_t e_{t-3} + θe_{t-1}e_{t-2} + θ²e_{t-1}e_{t-3}) = 0 + 0 + 0 + 0 = 0

Then ACF(2) = COV(Y_t, Y_{t-2}) / VAR(Y_t) = 0 / VAR(Y_t) = 0.

General conclusion: ACF(k) = 0 for k > 1.

Exercise: Derive ACF(k) for ARIMA(2,0,0) and ARIMA(0,0,2). (A numerical check for both is sketched below.)
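To check the ACF(1) = θ/(1+θ²) result and the exercise answers numerically, a minimal sketch with illustrative coefficients (theta = 0.8 for the MA(1); the AR(2)/MA(2) values are arbitrary but stationary and invertible):

```python
# Sketch: theoretical ACFs for MA(1), AR(2) and MA(2).
from statsmodels.tsa.arima_process import ArmaProcess

theta = 0.8
print(ArmaProcess(ar=[1], ma=[1, theta]).acf(lags=4))  # lag 1 only
print(theta / (1 + theta ** 2))                        # approx 0.4878

# Exercise check: ARIMA(2,0,0) and ARIMA(0,0,2).
print(ArmaProcess(ar=[1, -0.5, -0.3], ma=[1]).acf(lags=6))
print(ArmaProcess(ar=[1], ma=[1, 0.5, 0.3]).acf(lags=6))
```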

Diagnostics: In the model-building process, if an ARIMA(p,d,q) model is chosen (based on the ACFs and PACFs), some checks on the model's adequacy are required. A residual analysis is usually based on the fact that the residuals of an adequate model should be approximately white noise. Therefore, we need to check the significance of the residual autocorrelations by comparing them with approximate two-standard-error bounds, i.e. ±2/√n.

Ljung-Box (1978) statistic: The Q-statistic is an objective diagnostic measure of white noise for a time series, assessing whether there are patterns in a group of autocorrelations:

Q = n(n+2) Σ_{i=1}^{K} ACF²(i) / (n - i), with (K - p - q) degrees of freedom

H0: the ACFs are not significantly different from white-noise ACFs (i.e. ACFs = 0).
H1: the ACFs are statistically different from white-noise ACFs (i.e. ACFs ≠ 0).

Decision rule:
If Q ≤ chi-square: do not reject H0; the ACF patterns are white noise.
If Q > chi-square: reject H0; the ACF patterns are not white noise.

If a model is rejected at this stage, the model-building cycle has to be repeated. Note: this test only makes sense if K > p + q.
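A minimal Ljung-Box sketch, assuming statsmodels; the ARMA(1,1) data-generating process, the fitted order, and K = 10 lags are arbitrary illustrative choices:

```python
# Sketch: Ljung-Box Q on the residuals of a fitted ARIMA(1,0,1).
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.arima_process import ArmaProcess

rng = np.random.default_rng(6)
y = ArmaProcess(ar=[1, -0.6], ma=[1, 0.3]).generate_sample(
    300, distrvs=rng.standard_normal)

fit = ARIMA(y, order=(1, 0, 1)).fit()
# K = 10 lags; K - p - q = 8 degrees of freedom via model_df = p + q = 2.
print(acorr_ljungbox(fit.resid, lags=[10], model_df=2))
```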

Two criteria for model selection:

Akaike's information criterion (AIC): AIC = n log σ̂² + 2(p + q), where σ̂² is the estimated variance of e_t.

Schwarz's Bayesian information criterion (BIC, SC or SBC): BIC = n log σ̂² + (p + q) log(n)

Both criteria are likelihood-based and represent a different trade-off between "fit", as measured by the log-likelihood value, and "parsimony", as measured by the number of free parameters, p + q. If a constant is included in the model, the number of parameters increases to p + q + 1. Usually the model with the smallest AIC or BIC value is preferred. While the two criteria differ in their trade-off between fit and parsimony, the BIC can be preferred because it is consistent: as the sample grows, it will almost surely select the true model, provided the true model is among the candidates. (An order-selection sketch appears at the end of these notes.)

Characteristics of a Good Forecasting Model:

1. It fits the past data well: plots of actual against fitted values look good, adjusted R² is high, the RSE is low relative to other models, and the MAPE is good.
2. It forecasts the future and withheld data well.
3. It is parsimonious: simple but effective, not having too many coefficients.
4. The estimated coefficients φ and θ are statistically significant, with no redundant or unnecessary terms.
5. The model is stationary and invertible (i.e. 1 > φ, θ > -1).
6. No significant patterns are left in the ACFs and PACFs of the residuals.
7. The residuals are white noise.
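The order-selection sketch referenced above, assuming statsmodels: a small grid of candidate (p, d, q) orders compared by BIC (the grid bounds and toy data are illustrative):

```python
# Sketch: pick the candidate order with the smallest BIC.
import warnings

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

warnings.simplefilter("ignore")  # some candidate fits warn about convergence

rng = np.random.default_rng(7)
y = np.cumsum(rng.standard_normal(200))  # toy series

orders = [(p, d, q) for p in range(3) for d in (0, 1) for q in range(3)]
best = min(orders, key=lambda order: ARIMA(y, order=order).fit().bic)
print("order with smallest BIC:", best)
```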