Econ 623 Econometrics II Topic 2: Stationary Time Series

1 Introduction

In the regression model we can model the error term as an autoregressive (AR(1)) process; that is, we can use the past value of the error term to explain its current value. This idea can be generalized to any variable, for example

$$Y_t = \phi Y_{t-1} + e_t.$$

Time series model: a series is modeled only in terms of its own past values, a time trend and some disturbance. A series which is modeled only in terms of its own past values and some disturbance is often referred to as a time series model.

Why time series? What is the difference between regression models and time series models?

Regression: assume one variable is related to another variable. Once we know the relationship, we know how to make inferences and how to use one or more variables to predict another variable. In the regression world, different variables are related to each other according to economic theory, intuition or expectation.

Time series: assume one variable is related to its own history. Something triggers the series to evolve over time. We make no attempt to discover the relationships between different variables, perhaps because those relationships are too complicated or the underlying economic theory is not clear. Instead, we try to uncover the underlying dynamic behavior of one variable over time (the time series pattern generated by the system). The mechanism which produces an economic variable over time is referred to as the data generating process (DGP). If we know the DGP, we know everything about the time series properties of the variable. Hence we can make inferences about the variable and use its realised values to forecast its future values, since we believe the new observations will also be generated by the same DGP.

Regression view: $\text{GDP} = f(\text{monetary policy, fiscal policy, inflation, real interest rate, imports, exports}, \ldots)$

Time series view: $\text{GDP}_t = f(\text{GDP}_{t-1}, \text{GDP}_{t-2}, \ldots)$

In general: $X_t = f(X_{t-1}, X_{t-2}, \ldots, \varepsilon_t, \varepsilon_{t-1}, \ldots)$

2 Important Concepts

A stochastic process is a set of random variables, typically indexed by time. $\{X_t\}_{t=1}^T$ is a Gaussian white noise process if $X_t \sim iid\,N(0, \sigma^2)$.

A realization is a set of $T$ observations, say $\{X_t^{(1)}\}_{t=1}^T$. If we have $K$ independent realizations, i.e. $\{X_t^{(k)}\}_{t=1}^T$, $k = 1, \ldots, K$, then $X_t^{(1)}, \ldots, X_t^{(K)}$ can be described as a sample of $K$ realizations of the random variable $X_t$. This random variable has a density, typically called the marginal or unconditional density, $f(x)$. From the unconditional density one can obtain the unconditional moments. For example, the unconditional mean is
$$E(X_t) = \int_{-\infty}^{+\infty} x f(x)\,dx \equiv \mu_t.$$
By the SLLN for an iid sequence, $\operatorname{plim}_{K\to\infty} \frac{1}{K}\sum_{k=1}^K X_t^{(k)} = E(X_t)$. Similarly, one can define the unconditional variance.

Given a particular realization $\{X_t^{(1)}\}_{t=1}^T$, construct a vector $Y_t^{(1)}$ which contains the $j+1$ most recent observations on $X$:
$$Y_t^{(1)} = \begin{bmatrix} X_t^{(1)} \\ \vdots \\ X_{t-j}^{(1)} \end{bmatrix}.$$
As before, if we have $K$ independent realizations of this vector, $Y_t^{(k)}$, $k = 1, \ldots, K$, the random vector has a joint density $f(x_t, \ldots, x_{t-j})$, from which one can obtain the autocovariances. For example, the $j$th autocovariance of $X_t$ is
$$\gamma_{jt} = \int_{-\infty}^{+\infty}\!\!\cdots\!\int_{-\infty}^{+\infty} (x_t - E(X_t))(x_{t-j} - E(X_{t-j}))\, f(x_t, \ldots, x_{t-j})\, dx_t \cdots dx_{t-j} = E[(X_t - \mu_t)(X_{t-j} - \mu_{t-j})].$$
By the SLLN, $\operatorname{plim}_{K\to\infty} \frac{1}{K}\sum_{k=1}^K (X_t^{(k)} - \mu_t)(X_{t-j}^{(k)} - \mu_{t-j}) = E[(X_t - \mu_t)(X_{t-j} - \mu_{t-j})]$.
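To make the ensemble idea concrete, here is a minimal Python sketch (not part of the original notes; the AR(1) parameter and the values of $K$, $T$, $t$ and $j$ are illustrative assumptions) that approximates the unconditional mean and the $j$th autocovariance by averaging across $K$ independent realizations at a fixed date $t$.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T, phi = 10_000, 200, 0.5   # illustrative choices

# K independent realizations of the same AR(1) process, stored as a K x T array.
X = np.zeros((K, T))
for t in range(1, T):
    X[:, t] = phi * X[:, t - 1] + rng.standard_normal(K)

t0, j = 150, 2   # a fixed date and a lag
# Ensemble (cross-realization) averages approximate the unconditional moments.
print(X[:, t0].mean())                               # ~ E(X_t) = 0
print(np.mean(X[:, t0] * X[:, t0 - j]),              # ~ gamma_j
      phi**j / (1 - phi**2))                         # theoretical gamma_j when sigma_e^2 = 1
```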

$\{X_t\}_{t=1}^T$ is a covariance stationary process if

1. $E(X_t) = \mu < \infty$ for all $t$;
2. $Var(X_t) = \sigma^2 < \infty$ for all $t$;
3. $Cov(X_t, X_{t-j}) = \gamma_j$ for all $t$ and all $j \geq 0$ (so $\gamma_j$ does not depend on $t$).

What happens when we only work with one realization? For example, what are the properties of the time series average $\frac{1}{T}\sum_{t=1}^T X_t^{(1)}$? Note that the SLLN for iid sequences is in general no longer applicable, since $X_t^{(1)}$ is not independent over $t$. Whether $\frac{1}{T}\sum_{t=1}^T X_t^{(1)}$ converges to $\mu$ for a stationary process has to do with ergodicity. A covariance stationary process is said to be ergodic for the mean if $\operatorname{plim} \frac{1}{T}\sum_{t=1}^T X_t^{(1)} = \mu$ as $T \to \infty$. This requires the autocovariances $\gamma_j$ to go to zero sufficiently quickly as $j$ becomes large (for example $\sum_{j=0}^{\infty} |\gamma_j| < \infty$ for covariance stationary processes). Similarly, a covariance stationary process is said to be ergodic for second moments if $\operatorname{plim} \frac{1}{T-j}\sum_{t=j+1}^T (X_t^{(1)} - \mu)(X_{t-j}^{(1)} - \mu) = \gamma_j$.
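As a complement to the ensemble sketch above, the following illustration (again with assumed parameter values) checks ergodicity numerically: time averages over a single long realization of a stationary AR(1) approach the same unconditional moments.

```python
import numpy as np

rng = np.random.default_rng(1)
T, phi = 200_000, 0.5

# One long realization of a stationary, ergodic AR(1) with sigma_e^2 = 1.
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi * x[t - 1] + rng.standard_normal()

# Ergodic for the mean: the time average over ONE realization converges to mu = 0.
print(x.mean())

# Ergodic for second moments: the time average of (X_t)(X_{t-j}) converges to gamma_j.
j = 2
print(np.mean(x[j:] * x[:-j]), phi**j / (1 - phi**2))
```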

Partial correlation coefficient

Three variables: $X_1, X_2, X_3$. Let $\rho_{12}$ be the correlation coefficient between $X_1$ and $X_2$; define $\rho_{13}$ and $\rho_{23}$ similarly. The partial correlation coefficient between $X_1$ and $X_3$, with the influence of $X_2$ removed, is defined as
$$\frac{\rho_{13} - \rho_{12}\rho_{23}}{\sqrt{1-\rho_{12}^2}\,\sqrt{1-\rho_{23}^2}}.$$
In the time series context, with $X_1 = X_t$, $X_2 = X_{t-1}$ and $X_3 = X_{t-2}$, it can be estimated by the OLS estimator of $\beta_2$ in the regression model
$$X_t = \beta_0 + \beta_1 X_{t-1} + \beta_2 X_{t-2} + e_t.$$
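The equivalence between the partial-correlation formula and the OLS coefficient can be checked numerically. The sketch below is illustrative only (the AR(1) data generating process and parameter values are assumptions); for an AR(1), the partial correlation at lag 2 should be close to zero.

```python
import numpy as np

rng = np.random.default_rng(2)
T, phi = 50_000, 0.6

# Simulate an AR(1) so that the true lag-2 partial correlation is zero.
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi * x[t - 1] + rng.standard_normal()

x1, x2, x3 = x[2:], x[1:-1], x[:-2]          # X_t, X_{t-1}, X_{t-2}
rho12 = np.corrcoef(x1, x2)[0, 1]
rho13 = np.corrcoef(x1, x3)[0, 1]
rho23 = np.corrcoef(x2, x3)[0, 1]

# Partial correlation of X_1 and X_3 with the influence of X_2 removed.
partial = (rho13 - rho12 * rho23) / np.sqrt((1 - rho12**2) * (1 - rho23**2))

# OLS estimate of beta_2 in X_t = b0 + b1 X_{t-1} + b2 X_{t-2} + e_t.
Z = np.column_stack([np.ones_like(x1), x2, x3])
beta = np.linalg.lstsq(Z, x1, rcond=None)[0]

print(partial, beta[2])   # the two estimates should be very close (and near zero here)
```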

ACF (autocorrelation function): a function of the lag $j$,
$$\rho_j = \gamma_j / \gamma_0, \qquad \gamma_0 = Var(X_t), \qquad \rho_0 = 1.$$
Its graphical representation is called the correlogram. The ACF at lag $j$ can be estimated by its sample counterpart,
$$\hat\rho_j = \frac{\sum_{t=j+1}^T (X_t - \bar X)(X_{t-j} - \bar X)}{\sum_{t=1}^T (X_t - \bar X)^2},$$
which is a consistent estimator of the ACF.

PACF (partial ACF): a function of the lag $k$. The PACF at lag $k$ is the correlation between $X_t$ and $X_{t-k}$ with the influence of $X_{t-1}, \ldots, X_{t-k+1}$ removed. It can be estimated by running the regression
$$X_t = \alpha_0(k) + \alpha_1(k) X_{t-1} + \ldots + \alpha_k(k) X_{t-k} + e_t$$
and taking the coefficient on $X_{t-k}$: $\hat\alpha_k(k)$ is a consistent estimator of the PACF. The graphical representation is called the partial correlogram.
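A minimal implementation of these two estimators follows directly from the formulas above. This is a sketch for teaching purposes (function names are mine), not a replacement for a statistics library.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample ACF: rho_hat_j = sum_{t=j+1}^T (X_t - Xbar)(X_{t-j} - Xbar) / sum_t (X_t - Xbar)^2."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    denom = (dev**2).sum()
    return np.array([(dev[j:] * dev[:len(x) - j]).sum() / denom for j in range(max_lag + 1)])

def sample_pacf(x, max_lag):
    """Sample PACF at k: the OLS coefficient on X_{t-k} in a regression of X_t on lags 1..k."""
    x = np.asarray(x, dtype=float)
    out = [1.0]
    for k in range(1, max_lag + 1):
        y = x[k:]
        lags = [x[k - i:len(x) - i] for i in range(1, k + 1)]
        Z = np.column_stack([np.ones_like(y)] + lags)
        out.append(np.linalg.lstsq(Z, y, rcond=None)[0][-1])
    return np.array(out)
```

Applied to a simulated AR(1) series, `sample_acf` decays roughly like $\phi^j$ while `sample_pacf` is close to zero beyond lag 1, matching the patterns discussed in the next section.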

3 Autoregressive and Moving Average (ARMA) Models

AR(1): $X_t = \phi X_{t-1} + e_t$, $e_t \sim iid\,N(0, \sigma_e^2)$

$X_t = e_t + \phi e_{t-1} + \phi^2 e_{t-2} + \cdots$ if $|\phi| < 1$.  (*)

$E(X_t) = 0$, $Var(X_t) = \dfrac{\sigma_e^2}{1-\phi^2}$ if $|\phi| < 1$.

$\rho_j = \phi^j$ for all $j$: the ACF decays geometrically with the number of lags.

Since $\sum_{j=0}^{\infty} |\gamma_j| < \infty$ when $|\phi| < 1$, the process is ergodic for the mean.

$\alpha_1(1) = \phi$, $\alpha_k(k) = 0$ for $k > 1$: the PACF cuts off after the first lag.

AR(1) with a drift: $X_t = \mu + \phi X_{t-1} + e_t$, $e_t \sim iid\,N(0, \sigma_e^2)$

$(1 - \phi L) X_t = \mu + e_t$

$X_t = \dfrac{\mu}{1-\phi} + e_t + \phi e_{t-1} + \phi^2 e_{t-2} + \cdots$ if $|\phi| < 1$

- covariance stationary if $|\phi| < 1$;
- not covariance stationary if $|\phi| \geq 1$;
- a unit root process if $\phi = 1$. A more general unit root process is $X_t = \mu + X_{t-1} + u_t$ where $u_t$ is stationary;
- explosive if $|\phi| > 1$.

$E(X_t) = \dfrac{\mu}{1-\phi}$, $Var(X_t) = \dfrac{\sigma_e^2}{1-\phi^2}$ if stationary.

$\rho_j = \phi^j$ for all $j$: the ACF decays geometrically with the number of lags.

Since $\sum_{j=0}^{\infty} |\gamma_j| < \infty$ when $|\phi| < 1$, the process is ergodic for the mean.

$\alpha_1(1) = \phi$, $\alpha_k(k) = 0$ for $k > 1$: the PACF cuts off after the first lag.

[Figure: simulated AR(1) with phi = 0.9 (series x1), with its sample ACF and PACF]

Estimation of $\phi$ in $X_t = \phi X_{t-1} + e_t$, $e_t \sim iid\,N(0, \sigma_e^2)$:

1. OLS = conditional maximum likelihood (ML) under Gaussianity.
2. Conditional ML or exact ML if we know the distribution function of $e$; here $X_{t+1} \mid X_t \sim N(\phi X_t, \sigma_e^2)$.
3. $\hat\phi_{OLS}$ has no known exact distribution, even under the normality assumption on $e_t$. Phillips (1977, Econometrica) derived an Edgeworth expansion to approximate the finite sample distribution of $\hat\phi_{OLS}$. $E(\hat\phi_{OLS})$ does not have a closed-form expression; White (1961, Biometrika) showed that $E(\hat\phi_{OLS}) - \phi \approx -\dfrac{2\phi}{T}$.
4. $\hat\phi_{OLS}$ is consistent. We have to use the asymptotic distribution to test a hypothesis.

[Figure: simulated AR(1) with phi = 0.6 (series x2), with its sample ACF and PACF]

Estimation of $\phi$ in $X_t = \mu + \phi X_{t-1} + e_t$, $e_t \sim iid\,N(0, \sigma_e^2)$:

1. OLS = conditional maximum likelihood (ML) under Gaussianity.
2. Conditional ML or exact ML if we know the distribution function of $e$; here $X_{t+1} \mid X_t \sim N(\mu + \phi X_t, \sigma_e^2)$.
3. $\hat\phi_{OLS}$ has no known exact distribution, even under the normality assumption on $e_t$. Tanaka (1983, Econometrica) derived an Edgeworth expansion to approximate the finite sample distribution of $\hat\phi_{OLS}$. $E(\hat\phi_{OLS})$ does not have a closed-form expression; Kendall (1954, Biometrika) showed that $E(\hat\phi_{OLS}) - \phi \approx -\dfrac{1+3\phi}{T}$.
4. $\hat\phi_{OLS}$ is consistent, and here $\hat\phi_{OLS} = \hat\phi_{MLE}$, with $\sqrt{T}\,(\hat\phi_{MLE} - \phi) \xrightarrow{d} N(0, 1-\phi^2)$, which can be used to test a hypothesis.
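The asymptotic distribution in point 4 is what a t-type test uses in practice. Below is an illustrative sketch (the simulated data, parameter values and null hypothesis are assumptions made up for the example) that tests $H_0: \phi = 0.6$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
T, mu_true, phi_true = 500, 1.0, 0.6

# Simulate an AR(1) with intercept.
x = np.zeros(T)
for t in range(1, T):
    x[t] = mu_true + phi_true * x[t - 1] + rng.standard_normal()

# OLS of X_t on a constant and X_{t-1}.
y = x[1:]
Z = np.column_stack([np.ones(T - 1), x[:-1]])
mu_hat, phi_hat = np.linalg.lstsq(Z, y, rcond=None)[0]

# Test H0: phi = 0.6 using sqrt(T)(phi_hat - phi_0) ->d N(0, 1 - phi^2).
phi_0 = 0.6
z_stat = np.sqrt(T) * (phi_hat - phi_0) / np.sqrt(1 - phi_hat**2)
p_value = 2 * (1 - stats.norm.cdf(abs(z_stat)))
print(phi_hat, z_stat, p_value)
```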

[Figure: simulated AR(1) with phi = -0.9 (series x3), with its sample ACF and PACF]

Deriving the finite sample bias of $\hat\phi_{OLS}$ in the AR(1) model without intercept:

1. From Equation (5.3.19) in Priestley (1981), for a zero-mean Gaussian process,
$$E(X_t X_{t+r} X_s X_{s+r+v}) = E(X_t X_{t+r})E(X_s X_{s+r+v}) + E(X_t X_s)E(X_{t+r} X_{s+r+v}) + E(X_t X_{s+r+v})E(X_{t+r} X_s).$$

2. The following summation identities hold:
$$\sum_{t,s}\phi^{|t-s|} = T\,\frac{1+\phi}{1-\phi} + \frac{2\phi^{T+1}-2\phi}{(1-\phi)^2}, \qquad \sum_{t,s}\phi^{|t-s|+|t-s-1|} = \frac{2T\phi}{1-\phi^2} + \frac{(1+\phi^2)(\phi^{2T+1}-\phi)}{(1-\phi^2)^2}.$$

3. Write $\hat\phi_{OLS} = \dfrac{\frac{1}{T}\sum X_t X_{t-1}}{\frac{1}{T}\sum X_{t-1}^2} = \dfrac{U_T}{V_T}$, with $U_T = \frac{1}{T}\sum X_t X_{t-1}$ and $V_T = \frac{1}{T}\sum X_{t-1}^2$.

4. Denoting $Var(X_t)$ by $\sigma_X^2$, we have $Cov(X_t, X_{t+i}) = \phi^{|i|}\sigma_X^2$, $E(U_T) = \phi\sigma_X^2$ and $E(V_T) = \sigma_X^2$.

5. We also have
$$Cov(U_T, V_T) = \frac{1}{T^2}\sum_{t,s} Cov(X_t^2, X_s X_{s+1}) = \frac{2\sigma_X^4}{T^2}\sum_{t,s}\phi^{|t-s|+|t-s-1|} = \frac{2\sigma_X^4}{T^2}\left[\frac{2T\phi}{1-\phi^2} + \frac{(1+\phi^2)(\phi^{2T+1}-\phi)}{(1-\phi^2)^2}\right], \quad (1)$$
$$Var(V_T) = \frac{1}{T^2}\sum_{t,s} Cov(X_t^2, X_s^2) = \frac{2\sigma_X^4}{T^2}\sum_{t,s}\phi^{2|t-s|} = \frac{2\sigma_X^4}{T^2}\left[T\,\frac{1+\phi^2}{1-\phi^2} + \frac{2\phi^{2T+2}-2\phi^2}{(1-\phi^2)^2}\right]. \quad (2)$$

6. Taking a Taylor expansion and keeping the first two terms, we have
$$E\big(\hat\phi_{OLS}\big) = E\left(\frac{U_T}{V_T}\right) = \frac{E(U_T)}{E(V_T)} - \frac{Cov(U_T, V_T)}{E^2(V_T)} + \frac{E(U_T)\,Var(V_T)}{E^3(V_T)} + o(T^{-1}) \quad (3)$$
$$= \phi - \frac{2\phi}{T} + o(T^{-1}).$$
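A quick Monte Carlo check of the approximation $E(\hat\phi_{OLS}) \approx \phi - 2\phi/T$ is given below; the parameter values and number of replications are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
T, phi, reps = 50, 0.6, 20_000

est = np.empty(reps)
for r in range(reps):
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    # OLS without intercept: phi_hat = sum_t X_t X_{t-1} / sum_t X_{t-1}^2
    est[r] = (x[1:] @ x[:-1]) / (x[:-1] @ x[:-1])

print(est.mean(), phi - 2 * phi / T)   # simulated mean of phi_hat vs. the O(1/T) approximation
```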

AR(p): $X_t = \mu + \sum_{i=1}^p \phi_i X_{t-i} + e_t$, $e_t \sim iid\,N(0, \sigma_e^2)$

Covariance stationary if the roots of $1 - \phi_1 z - \cdots - \phi_p z^p = 0$ lie outside the unit circle (i.e. the absolute values of all roots are greater than unity).

$E(X_t) = \dfrac{\mu}{1 - \sum_{i=1}^p \phi_i}$

$\alpha_k(k) = 0$ for $k > p$: the PACF cuts off after the $p$th lag.

The ACFs for the first $p$ lags are more complicated than those of the AR(1) process. After the first $p$ lags, however, the ACF decays geometrically as the number of lags increases.

$\sum_{i=1}^p \phi_i = 1$ implies a unit root.
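The stationarity condition can be checked mechanically by computing the roots of the AR polynomial. A small sketch (the helper name and the example coefficients are mine, not from the notes):

```python
import numpy as np

def is_covariance_stationary(phis):
    """True if all roots of 1 - phi_1 z - ... - phi_p z^p = 0 lie outside the unit circle."""
    phis = np.asarray(phis, dtype=float)
    # np.roots expects coefficients from the highest power down: -phi_p, ..., -phi_1, 1
    coeffs = np.r_[-phis[::-1], 1.0]
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

print(is_covariance_stationary([0.6, 0.3]))   # True: a stationary AR(2)
print(is_covariance_stationary([0.7, 0.3]))   # False: phi_1 + phi_2 = 1, a unit root
```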

[Figure: simulated AR(2) with phi1 = 0.6, phi2 = 0.3 (series x6), with its sample ACF and PACF]

Estimation:

1. OLS = conditional ML under Gaussianity.
2. Conditional ML or exact ML if we know the distribution function of $e$.
3. The $\hat\phi$'s are consistent. In the AR(2),
$$\sqrt{T}\,(\hat\phi_{OLS} - \phi) \xrightarrow{d} N\!\left(\begin{bmatrix}0\\0\end{bmatrix},\; \begin{bmatrix} 1-\phi_2^2 & -\phi_1(1+\phi_2) \\ -\phi_1(1+\phi_2) & 1-\phi_2^2 \end{bmatrix}\right),$$
which can be used to test hypotheses.

MA(1): $X_t = \mu + e_t - \theta e_{t-1}$, $e_t \sim iid\,(0, \sigma_e^2)$

Always covariance stationary.

$E(X_t) = \mu$, $Var(X_t) = \sigma_e^2(1+\theta^2)$

$\rho_1 = \dfrac{-\theta}{1+\theta^2}$, $\rho_h = 0$ for $h > 1$: the ACF cuts off after the first lag.

The ACF remains the same when $\theta$ is replaced by $1/\theta$.

$\alpha_k(k) = \dfrac{-\theta^k(1-\theta^2)}{1-\theta^{2(k+1)}}$: the PACF does not cut off.

If $|\theta| < 1$, the MA(1) process is invertible. In this case the PACF decays geometrically (in absolute value) with the number of lags.

Relationship between MA(1) (with $|\theta| < 1$) and AR($\infty$): inverting $X_t - \mu = (1-\theta L)e_t$ gives $e_t = \sum_{j=0}^{\infty}\theta^j(X_{t-j}-\mu)$, so an invertible MA(1) can be written as an AR($\infty$).

Estimation:

1. ML if we know the distribution function of $e$; here $X_t \mid e_{t-1} \sim N(\mu - \theta e_{t-1}, \sigma_e^2)$.

[Figure: simulated MA(1) with theta = 0.6 (series x4), with its sample ACF and PACF]

MA(q): $X_t = \mu + e_t - \theta_1 e_{t-1} - \cdots - \theta_q e_{t-q}$, $e_t \sim iid\,N(0, \sigma_e^2)$

Always covariance stationary.

The PACFs for the first $q$ lags are more complicated than those of the MA(1) process. After the first $q$ lags, however, the PACF decays geometrically to 0 as the number of lags increases.

$\rho_j = 0$ for $j > q$: the ACF cuts off after the $q$th lag.

Estimation:

1. ML if we know the distribution function of $e$.
2. $\hat\theta_{MLE}$ is consistent. For the MA(1), $\sqrt{T}\,(\hat\theta_{MLE} - \theta) \xrightarrow{d} N(0, 1-\theta^2)$, which can be used to test a hypothesis.

[Figure: simulated MA(1) with theta = -0.6 (series x5), with its sample ACF and PACF]

MA($\infty$): $X_t = \mu + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \cdots$, $e_t \sim iid\,N(0, \sigma_e^2)$

Covariance stationary if $\sum_{j=0}^{\infty} \theta_j^2 < \infty$.

Ergodic for the mean if $\sum_{j=0}^{\infty} |\theta_j| < \infty$.

Relationship between AR(1) and MA($\infty$): if $|\phi| < 1$, the AR(1) process can be written as the MA($\infty$) process $X_t = e_t + \phi e_{t-1} + \phi^2 e_{t-2} + \cdots$, as in (*).

[Figure: simulated MA(2) with theta1 = 0.6, theta2 = 0.3 (series x7), with its sample ACF and PACF]

ARMA(1,1): $X_t = \mu + \phi X_{t-1} + e_t - \theta e_{t-1}$, $e_t \sim iid\,N(0, \sigma_e^2)$

$\rho_1 = \dfrac{(\phi-\theta)(1-\phi\theta)}{1 - 2\phi\theta + \theta^2}$, $\rho_h = \phi\,\rho_{h-1}$ for $h > 1$.

The first autocorrelation depends on the parameters of both the AR part and the MA part. After the first lag, the ACF decays geometrically to 0, with the rate of decline given by the AR parameter; this is similar to the AR(1). In contrast to the AR(1), however, the PACF does not cut off; this is similar to the MA(1).

Estimation:

1. ML if we know the distribution function of $e$.

ARMA(0,0), Gaussian white noise: $X_t = \mu + e_t$, $e_t \sim iid\,N(0, \sigma_e^2)$
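The ACF pattern of the ARMA(1,1) can be verified by simulation. The sketch below (with parameter values chosen to mirror the phi = 0.8, theta = -0.6 figure; the simulation setup is my own illustration) compares the sample autocorrelations with the formulas above.

```python
import numpy as np

rng = np.random.default_rng(5)
T, phi, theta = 200_000, 0.8, -0.6

# Simulate ARMA(1,1): X_t = phi X_{t-1} + e_t - theta e_{t-1} (mu = 0).
e = rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi * x[t - 1] + e[t] - theta * e[t - 1]

rho1 = (phi - theta) * (1 - phi * theta) / (1 - 2 * phi * theta + theta**2)
print(np.corrcoef(x[1:], x[:-1])[0, 1], rho1)          # rho_1
print(np.corrcoef(x[2:], x[:-2])[0, 1], phi * rho1)    # rho_2 = phi * rho_1
```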

[Figure: simulated ARMA(1,1) with phi = 0.8, theta = -0.6 (series x8), with its sample ACF and PACF]

ARMA(p,q): $X_t = \sum_{i=1}^p \phi_i X_{t-i} + e_t - \sum_{i=1}^q \theta_i e_{t-i}$, $e_t \sim iid\,N(0, \sigma_e^2)$, or $\Phi(L)X_t = \Theta(L)e_t$.

$X_t = \dfrac{\Theta(L)}{\Phi(L)}\, e_t$ (MA representation) under some conditions; $e_t = \dfrac{\Phi(L)}{\Theta(L)}\, X_t$ (AR representation) under some conditions.

The ACF behaves similarly to that of an AR(p) process; the PACF behaves similarly to that of an MA(q) process.

Estimation:

1. ML if we know the distribution function of $e$.

Autocovariance generating function: when $\gamma_j$ is absolutely summable, the autocovariance generating function is defined as $g(z) = \sum_{j=-\infty}^{\infty} \gamma_j z^j$. The population spectrum is defined as $s(\varpi) = \frac{1}{2\pi}\, g(e^{-i\varpi})$.

Wold Representation (Decomposition): for any zero-mean covariance stationary process $X_t$, we can always find $\{e_t\}$ and $\{v_t\}$ such that
$$X_t = \sum_{j=0}^{\infty} c_j e_{t-j} + v_t,$$
where $\{c_j\}$ is a real sequence with $\sum_{j=0}^{\infty} c_j^2 < \infty$ and $c_0 = 1$, $e_t \sim WN(0, \sigma^2)$, $v_t$ is purely deterministic, and $WN$ stands for white noise.

4 Forecasting Based on ARMA Processes

How do we forecast a series that follows an ARMA(p,q) process with known coefficients?

$\{X_1, \ldots, X_T\}$: historical observations

$X_{T+h}$: (unknown) value of $X$ in future period $T+h$

$\hat X_{T+h}$: forecast of $X_{T+h}$ based on $\{X_1, \ldots, X_T\}$ (the $h$-period-ahead forecast)

$\hat e_{T+h} = X_{T+h} - \hat X_{T+h}$: forecast error

Mean squared error (MSE): $E(X_{T+h} - \hat X_{T+h})^2$

The minimum MSE forecast of $X_{T+h}$ based on $\{X_1, \ldots, X_T\}$ is $\hat X_{T+h} = E(X_{T+h} \mid X_1, \ldots, X_T)$. This is known as the optimal forecast.

Proof: Without loss of generality, assume $h = 1$. Let $g(X_1, \ldots, X_T)$ be any forecast other than the conditional mean. Its MSE is
$$E[X_{T+1} - g(X_1,\ldots,X_T)]^2 = E[X_{T+1} - E(X_{T+1}\mid X_1,\ldots,X_T) + E(X_{T+1}\mid X_1,\ldots,X_T) - g(X_1,\ldots,X_T)]^2$$
$$= E[X_{T+1} - E(X_{T+1}\mid X_1,\ldots,X_T)]^2 + 2E\{[X_{T+1} - E(X_{T+1}\mid X_1,\ldots,X_T)][E(X_{T+1}\mid X_1,\ldots,X_T) - g(X_1,\ldots,X_T)]\}$$
$$\quad + E[E(X_{T+1}\mid X_1,\ldots,X_T) - g(X_1,\ldots,X_T)]^2.$$
Define $\eta_{T+1} = [X_{T+1} - E(X_{T+1}\mid X_1,\ldots,X_T)][E(X_{T+1}\mid X_1,\ldots,X_T) - g(X_1,\ldots,X_T)]$. Conditional on $X_1,\ldots,X_T$, both $E(X_{T+1}\mid X_1,\ldots,X_T)$ and $g(X_1,\ldots,X_T)$ are known constants, hence $E(\eta_{T+1}\mid X_1,\ldots,X_T) = 0$. Therefore $E(\eta_{T+1}) = E[E(\eta_{T+1}\mid X_1,\ldots,X_T)] = 0$ and
$$E[X_{T+1} - g(X_1,\ldots,X_T)]^2 = E[X_{T+1} - E(X_{T+1}\mid X_1,\ldots,X_T)]^2 + E[E(X_{T+1}\mid X_1,\ldots,X_T) - g(X_1,\ldots,X_T)]^2.$$
The second term cannot be smaller than zero, so $g(X_1,\ldots,X_T) = E(X_{T+1}\mid X_1,\ldots,X_T)$ makes the MSE smallest.

1-step-ahead forecast: $h = 1$. Multi-step-ahead forecast: $h > 1$.

4.0.1 Forecasting based on AR(1)

Minimum mean squared error (MSE) forecast of $X_{T+1}$:
$$E(X_{T+1}\mid X_1,\ldots,X_T) = \mu + \phi X_T + E(e_{T+1}\mid X_1,\ldots,X_T) = \mu + \phi X_T.$$
Forecast error variance: $\sigma_e^2$.

Minimum MSE forecast of $X_{T+2}$: $E(X_{T+2}\mid X_1,\ldots,X_T) = \mu + \phi\mu + \phi^2 X_T$.

Minimum MSE forecast of $X_{T+h}$: $E(X_{T+h}\mid X_1,\ldots,X_T) = \mu + \phi\mu + \cdots + \phi^{h-1}\mu + \phi^h X_T$.
Forecast error variance: $\sigma_e^2(1 + \phi^2 + \phi^4 + \cdots + \phi^{2(h-1)})$.

The minimum MSE forecast is known as the optimal forecast.
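These formulas translate directly into code. A minimal sketch follows (the function name and the numerical values are illustrative assumptions); as $h$ grows, the forecast reverts to the unconditional mean $\mu/(1-\phi)$ and the error variance approaches $\sigma_e^2/(1-\phi^2)$.

```python
def ar1_forecast(x_T, mu, phi, sigma2_e, h):
    """h-step-ahead minimum-MSE forecast and forecast-error variance for
    X_t = mu + phi X_{t-1} + e_t, given the last observation x_T."""
    point = mu * sum(phi**i for i in range(h)) + phi**h * x_T
    variance = sigma2_e * sum(phi**(2 * i) for i in range(h))
    return point, variance

# Illustrative values: last observation 2.0, mu = 0.5, phi = 0.9, sigma_e^2 = 1.
for h in (1, 2, 5, 50):
    print(h, ar1_forecast(x_T=2.0, mu=0.5, phi=0.9, sigma2_e=1.0, h=h))
```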

4.0.2 Forecasting based on MA(1)

Minimum MSE (optimal) forecast of $X_{T+1}$:
$$E(X_{T+1}\mid X_1,\ldots,X_T) = \mu + E(e_{T+1}\mid X_1,\ldots,X_T) - \theta\, E(e_T\mid X_1,\ldots,X_T) = \mu - \theta\, E(e_T\mid X_1,\ldots,X_T),$$
where $e_T$ can be derived recursively from $\{X_1, \ldots, X_T\}$ under some assumption about $e_0$. For example, we can set $e_0 = E(e_0) = 0$. Such an approximation has a negligible effect on $\hat X_{T+h}$ as the sample size approaches infinity. However, when the sample size is small the effect can be large, and this leads to a forecast that is not optimal in finite samples. The optimal finite sample forecast can be obtained with the Kalman filter.

Minimum MSE (optimal) forecast of $X_{T+2}$:
$$E(X_{T+2}\mid X_1,\ldots,X_T) = \mu + E(e_{T+2}\mid X_1,\ldots,X_T) - \theta\, E(e_{T+1}\mid X_1,\ldots,X_T) = \mu.$$

Minimum MSE (optimal) forecast of $X_{T+h}$: $E(X_{T+h}\mid X_1,\ldots,X_T) = \mu$ for $h \geq 2$.
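The recursion that recovers $e_T$ under the approximation $e_0 = 0$ is easy to implement. A sketch (the function name and the simulated example are my own illustrations):

```python
import numpy as np

def ma1_one_step_forecast(x, mu, theta):
    """One-step-ahead forecast mu - theta*e_T for X_t = mu + e_t - theta e_{t-1},
    recovering residuals recursively under the approximation e_0 = 0."""
    e = 0.0
    for x_t in x:
        e = x_t - mu + theta * e          # e_t = X_t - mu + theta * e_{t-1}
    return mu - theta * e

# Check on simulated MA(1) data where the true e_t are known.
rng = np.random.default_rng(6)
T, mu, theta = 500, 0.0, 0.6
eps = rng.standard_normal(T)
x = mu + eps - theta * np.r_[0.0, eps[:-1]]   # here e_0 is literally 0
print(ma1_one_step_forecast(x, mu, theta), mu - theta * eps[-1])   # the two coincide
```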

4.0.3 Forecasting based on ARMA(1,1)

Minimum MSE (optimal) forecast of $X_{T+1}$:
$$E(X_{T+1}\mid X_1,\ldots,X_T) = \mu + \phi X_T + E(e_{T+1}\mid X_1,\ldots,X_T) - \theta\, E(e_T\mid X_1,\ldots,X_T) = \mu + \phi X_T - \theta\, E(e_T\mid X_1,\ldots,X_T).$$

Minimum MSE (optimal) forecast of $X_{T+h}$ for $h \geq 2$, obtained by iterating $E(X_{T+h}\mid X_1,\ldots,X_T) = \mu + \phi\, E(X_{T+h-1}\mid X_1,\ldots,X_T)$:
$$E(X_{T+h}\mid X_1,\ldots,X_T) = \mu + \phi\mu + \cdots + \phi^{h-1}\mu + \phi^h X_T - \phi^{h-1}\theta\, E(e_T\mid X_1,\ldots,X_T).$$

5 Box-Jenkins Method

This method applies to stationary ARMA time series. Procedure:

1. Once the data are stationary, identify plausible values of $(p, q)$, say $(p_1, q_1), (p_2, q_2), \ldots, (p_k, q_k)$, by examining the sample ACF and sample PACF.
2. Estimate these $k$ ARMA processes.
3. Choose one model using model selection criteria, such as AIC or SC (BIC).
4. Validation: hypothesis testing, checking the residuals.
5. Forecast with the selected model.
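For reference, here is a compact sketch of these five steps using the statsmodels package. The notes do not prescribe any software, so treat this as one possible illustration; the simulated series, candidate orders and lag choices are assumptions made for the example.

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

# A stationary series to model (here simulated from an ARMA(1,1) for illustration).
rng = np.random.default_rng(7)
T, phi, theta = 1000, 0.8, -0.6
e = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi * y[t - 1] + e[t] - theta * e[t - 1]

# Step 1: inspect the sample ACF and PACF to shortlist (p, q) candidates.
print(acf(y, nlags=5))
print(pacf(y, nlags=5))

# Steps 2-3: estimate the candidate ARMA(p, q) models and compare AIC/BIC.
candidates = [(1, 0), (0, 1), (1, 1), (2, 1)]
fits = {(p, q): ARIMA(y, order=(p, 0, q)).fit() for p, q in candidates}
best = min(fits, key=lambda k: fits[k].aic)
print(best, {k: round(v.aic, 1) for k, v in fits.items()})

# Step 4: validation -- residuals of the chosen model should look like white noise.
print(acorr_ljungbox(fits[best].resid, lags=[10]))

# Step 5: forecast with the selected model.
print(fits[best].forecast(steps=5))
```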