Model Identification for Time Series with Dependent Innovations


Shuhao Chen¹, Wanli Min² and Rong Chen¹,³
¹Rutgers University, ²IBM Singapore and ³Peking University

Abstract: This paper investigates the impact of dependent but uncorrelated innovations (errors) on the traditional autoregressive moving average (ARMA) order determination schemes, such as the autocorrelation function (ACF), partial autocorrelation function (PACF), extended autocorrelation function (EACF) and the unit-root test. ARMA models with iid innovations have been studied extensively and are well understood, but their properties under dependent but uncorrelated innovations are much less studied. In the presence of such innovations, we show that the ACF, PACF and EACF are significantly impacted, while the unit-root test is not affected. We also propose a new order determination scheme that accounts for these impacts when analyzing time series with uncorrelated but dependent innovations.

Key words and phrases: time series, order determination, uncorrelated but dependent errors, ACF, PACF, EACF, unit-root test, GARCH.

1 Introduction

Time series analysis is widely used in many fields including econometrics, finance, engineering and meteorology. The most commonly used time series model is the autoregressive moving average (ARMA) model. More specifically, an ARMA(p, q) model for a univariate time series $X_t$ takes the form

$$\Phi(B) X_t = \Psi(B) \varepsilon_t, \qquad (1)$$

with $\Phi(B) = \phi(B) U(B)$, where $\Phi(B) = 1 - \Phi_1 B - \cdots - \Phi_p B^p$, $U(B) = 1 - U_1 B - \cdots - U_d B^d$, $\phi(B) = 1 - \phi_1 B - \cdots - \phi_{p-d} B^{p-d}$ and $\Psi(B) = 1 + \psi_1 B + \cdots + \psi_q B^q$ are polynomials in the back-shift operator $B$, defined by $B X_t = X_{t-1}$ and $B \varepsilon_t = \varepsilon_{t-1}$. We require that all zeros of $U(B)$ lie on the unit circle and that those of $\phi(B)$ and $\Psi(B)$ lie outside the unit circle. Here $\varepsilon_t$ is the innovation term. Note that when $U(B) = (1-B)^d$, model (1) is the autoregressive integrated moving average ARIMA(p-d, d, q) model. If we further assume $\Psi(B) = 1$, model (1) becomes the ARI(p-d, d) model.

The first step, and one of the key steps, in building a time series model is order determination. In the literature, order determination schemes for time series models with independent and identically distributed (iid) innovations $\{\varepsilon_t\}$ are well studied. Box and Jenkins (1976) introduced the autocorrelation function (ACF) and the partial autocorrelation function (PACF). The Akaike Information Criterion (AIC) of Akaike (1974) and the Bayesian Information Criterion (BIC) of Schwarz (1978) are two goodness-of-fit measures of an estimated model that facilitate model selection. Tsay and Tiao (1984) proposed the extended autocorrelation function (EACF) for order determination of the ARMA(p, q) model. On the other hand, Dickey and Fuller (1979) studied unit-root behavior and derived the asymptotic distribution of a unit-root test statistic. The standard order determination procedure combines these two techniques: use the unit-root test to decide whether to difference the series (i.e., set $Y_t = X_t - X_{t-1}$), then apply the ACF/PACF/EACF procedures to the differenced series $Y_t$ to obtain the AR and MA orders p and q, respectively. Other models and order determination schemes include the R and S array approach of Gray, Kelley and McIntire (1978), the corner method of Beguin, Gourieroux and Monfort (1980), and the smallest canonical correlation (SCAN) method of Tsay and Tiao (1985). Most existing order determination methods assume that the innovations $\{\varepsilon_t\}$ are iid and/or have constant conditional variance, i.e., $E[\varepsilon_t^2 \mid \mathcal{F}_{t-1}] = \sigma^2$, excluding many interesting processes such as the generalized autoregressive conditional heteroscedastic (GARCH) models of Bollerslev (1986) and the stochastic volatility (SV) model of Melino and Turnbull (1990), both often used in financial time series modeling. In these models the innovation series is not autocorrelated but is auto-dependent. It therefore becomes interesting and important to revisit the classical order determination schemes in the presence of auto-dependent but uncorrelated innovations. Specifically, in this paper we study order determination schemes for MA, AR, ARMA and ARI models in the presence of uncorrelated but dependent innovations. The ACF is a simple order determination tool for MA processes, while for AR processes the PACF is effective. For ARMA processes, the EACF has been shown to be very useful in identifying the AR and MA orders, and it also works on differenced sequences of ARIMA processes. We investigate how these schemes are impacted by the type of innovations considered here.
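To fix ideas, the standard pipeline can be sketched in a few lines (a minimal illustration in Python, assuming numpy and statsmodels are available; the random-walk input is ours, purely for demonstration):

```python
# A minimal sketch of the classical Box-Jenkins identification pipeline,
# assuming numpy and statsmodels; the simulated input is illustrative only.
import numpy as np
from statsmodels.tsa.stattools import adfuller, acf, pacf

rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(1000))  # e.g., a random walk

# Step 1: ADF unit-root test; difference if the unit-root null is not rejected.
adf_stat, pvalue = adfuller(x)[:2]
y = np.diff(x) if pvalue > 0.05 else x

# Step 2: inspect sample ACF/PACF of the (possibly differenced) series.
print("ACF :", np.round(acf(y, nlags=10), 3))
print("PACF:", np.round(pacf(y, nlags=10), 3))
```

The significance bands implicitly used in this pipeline are derived under iid innovations; the sections below examine what happens to them when the innovations are only uncorrelated.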

There have been several studies in this area. Min (2004) investigated the effect of dependent innovations on pure AR and MA time series; Yang and Zhang (2008) discussed the unit-root test with GARCH(1,1) innovations, one of the most commonly used dependent but uncorrelated innovation processes. Our study is carried out under a more unified framework which allows for a wide class of uncorrelated but dependent innovations.

The rest of the paper is organized as follows. In section 2, we present a detailed study of how dependent but uncorrelated innovations affect the properties of the key statistics in classical time series model identification procedures. Based on these findings, we propose suitable methods and establish their theoretical validity. Simulation studies and applications to real examples are presented in section 3. Proofs are given in the Appendix. For simplicity, we employ the following notation and terms throughout: $\xrightarrow{dist}$ means convergence in distribution; $\xrightarrow{P}$ means convergence in probability; a.s. means almost sure convergence; $:=$ denotes definition; the terms errors and innovations are used interchangeably. For a set of random variables $\{X_n\}$ and a corresponding set of functions $\{f_n\}$, the notation $X_n = O_p(f_n)$ means that $X_n/f_n$ is bounded in probability.

2 Method

In this paper we consider the following structure of dependent but uncorrelated innovations. Let $\{e_t\}$ be iid random variables and $F$ be a measurable function such that the innovation $\varepsilon_t = F(e_t, e_{t-1}, \ldots)$ is a well-defined random variable. We assume the following condition.

Condition C.1: $E(\varepsilon_t \mid \mathcal{F}_{t-1}) = 0$ and $E(\varepsilon_t^2) = \sigma^2$ for all $t \in \mathbb{Z}$, where $\mathcal{F}_t = \sigma(e_t, e_{t-1}, \ldots)$ is the $\sigma$-field generated by the sequence $\{e_t\}$, representing the information available up to time $t$.

The error series $\{\varepsilon_t\}$ is uncorrelated since $E(\varepsilon_t \varepsilon_{t-1}) = E(E(\varepsilon_t \varepsilon_{t-1} \mid \mathcal{F}_{t-1})) = E(\varepsilon_{t-1} E(\varepsilon_t \mid \mathcal{F}_{t-1})) = 0$. The second part of condition C.1 is weaker than the traditional condition $E(\varepsilon_t^2 \mid \mathcal{F}_{t-1}) = \sigma^2$, as can be seen from the iterated expectation $E(\varepsilon_t^2) = E(E(\varepsilon_t^2 \mid \mathcal{F}_{t-1}))$. The stronger condition $E(\varepsilon_t^2 \mid \mathcal{F}_{t-1}) = \sigma^2$ also implies zero autocorrelations of the sequence $\{\varepsilon_t^2\}$, i.e., $Cov(\varepsilon_t^2, \varepsilon_{t-1}^2) = 0$. Condition C.1, however, allows nonzero autocorrelations of $\{\varepsilon_t^2\}$.
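To make condition C.1 concrete, the following minimal sketch (Python/numpy; parameter values are illustrative only) simulates GARCH(1,1) innovations, which fall into this class, and checks empirically that $\{\varepsilon_t\}$ is essentially uncorrelated while $\{\varepsilon_t^2\}$ is not:

```python
# Minimal sketch: GARCH(1,1) innovations are uncorrelated but dependent.
# Assumes numpy; parameter values are illustrative only.
import numpy as np

rng = np.random.default_rng(1)
n, a0, a, b = 100_000, 0.1, 0.2, 0.7
eps = np.zeros(n)
g = a0 / (1 - a - b)  # start at the unconditional variance
for t in range(1, n):
    g = a0 + a * eps[t - 1] ** 2 + b * g   # conditional variance recursion
    eps[t] = np.sqrt(g) * rng.standard_normal()

def lag1_corr(z):
    z = z - z.mean()
    return (z[:-1] * z[1:]).mean() / (z * z).mean()

print("lag-1 ACF of eps   :", round(lag1_corr(eps), 4))    # approximately 0
print("lag-1 ACF of eps^2 :", round(lag1_corr(eps**2), 4))  # clearly positive
```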

Another important characterization of the uncorrelated but dependent innovations $\{\varepsilon_t\}$ is related to the projections defined in Wu and Min (2005): $\mathcal{P}_t \eta = E(\eta \mid \mathcal{F}_t) - E(\eta \mid \mathcal{F}_{t-1})$. For instance, $\mathcal{P}_1 \varepsilon_i \varepsilon_j = E(\varepsilon_i \varepsilon_j \mid \mathcal{F}_1) - E(\varepsilon_i \varepsilon_j \mid \mathcal{F}_0)$. Denote by $\|\cdot\|$ and $\|\cdot\|_p$ the $L_2$ and $L_p$ norms, respectively. We assume the following condition.

Condition C.2: $E(\varepsilon_n^4) < \infty$, $\sum_{t=1}^{\infty} \|\mathcal{P}_1 \varepsilon_t\|_4 < \infty$ and $\sum_{i,j=0}^{\infty} \|\mathcal{P}_1 (\varepsilon_i \varepsilon_j)\| < \infty$.

Remark 1 The intuition behind condition C.2 is that the projection of the future $\varepsilon_i \varepsilon_j$ onto the space $\mu_1 \ominus \mu_0 = \{Z : \|Z\| < \infty,\ Z \text{ is } \mathcal{F}_1\text{-measurable and } E(Z \mid \mathcal{F}_0) = 0\}$ has a small magnitude; namely, the future depends only weakly on the current state, or the current state depends only weakly on previous states. This setup includes many interesting stochastic processes, such as the ARCH(p) model, the GARCH(p, q) model, and the stochastic volatility model. For example, Min (2004) showed that the GARCH(p, q) model satisfies condition C.2.

2.1 Moving Average Model

One of the fundamental building blocks of time series modeling is the moving average (MA) model. Specifically, a moving average process $X_t$ of order q, namely MA(q), is defined as

$$X_t = \varepsilon_t + \psi_1 \varepsilon_{t-1} + \cdots + \psi_q \varepsilon_{t-q}. \qquad (2)$$

It is a special case of the ARMA model (1) with $\Phi(B) = 1$. The autocovariance function and autocorrelation function (ACF) are defined as

$$\gamma(h) = Cov(x_t, x_{t+h}) \quad \text{and} \quad \rho(h) = Corr(x_t, x_{t+h}), \qquad (3)$$

respectively. The ACF has the unique feature that for an MA(q) process it cuts off at lag q. Since $\{\varepsilon_t\}$ are uncorrelated, we have $\gamma(h) = \sigma^2 \sum_{i=0}^{q} \psi_i \psi_{i+h}$, where $\psi_0 = 1$ and $\psi_j = 0$ for $j > q$. Hence $\gamma(q+i) = \rho(q+i) = 0$ for $i > 0$, and the ACF can be used effectively to determine the order q of an MA(q) model. Let $x_1, \ldots, x_n$ be a sample of the MA(q) process (2). We denote the sample autocovariance and sample autocorrelation functions respectively by

$$\hat\gamma(h) = \frac{1}{n} \sum_{t=1}^{n-h} x_t x_{t+h} \quad \text{and} \quad \hat\rho(h) = \hat\gamma(h)/\hat\gamma(0). \qquad (4)$$

It is known (e.g., Brockwell and Davis, 1986) that $\sqrt{n}(\hat\rho(h) - \rho(h)) \xrightarrow{dist} N(0, W)$, where $W$ is given by the celebrated Bartlett formula

$$W = \sum_{k=-\infty}^{\infty} \{\rho^2(k) + \rho(h-k)\rho(h+k) + 2\rho^2(h)\rho^2(k) - 4\rho(h)\rho(k)\rho(k-h)\}. \qquad (5)$$

If we further assume $\varepsilon_t \sim iid(0, \sigma^2)$, the Bartlett formula (5) implies that for an MA(q) model, $W = 1 + 2\rho^2(1) + \cdots + 2\rho^2(q)$ when $h > q$. However, when $\{\varepsilon_t\}$ are dependent, the validity of using $n^{-1} W$ to estimate the variance of the sample ACF needs to be reexamined. In the following, we establish the asymptotic properties of $\hat\rho(h)$ under conditions C.1 and C.2 on $\{\varepsilon_t\}$. Specifically, we show that the cut-off property (Lemma 1) and the asymptotic joint normality of $\hat\gamma(h)$ still hold, but the asymptotic variance is different (Theorem 1).

Lemma 1 Let $\{X_t\}$ be an MA(q) process whose innovations $\{\varepsilon_t\}$ satisfy condition C.1. For the autocorrelation function $\rho(h)$ defined in (3), we have $\rho(q+1) = \rho(q+2) = \cdots = 0$.

The proof is trivial, since the cut-off property involves only correlation, not dependence.

Theorem 1 Let $\{X_t\}$ be an MA(q) process whose innovations $\{\varepsilon_t\}$ satisfy conditions C.1 and C.2, with autocorrelation function $\rho(h)$ defined in (3) and sample autocorrelation $\hat\rho(h)$ defined in (4). Then for all $h \geq 1$,

$$\sqrt{n}(\hat\rho(1) - \rho(1), \ldots, \hat\rho(h) - \rho(h))' \xrightarrow{dist} N(0, \Sigma_{h \times h}),$$

where $\Sigma$ assumes a complex form given explicitly in the proof in the Appendix. In particular, for $h > q$,

$$Var(\hat\rho(h)) \approx \frac{1}{n\gamma^2(0)} \big(\sigma(0,h) + 2\sigma(1,h) + \cdots + 2\sigma(q,h)\big), \qquad (6)$$

where $\sigma(d,h) = E(x_0 x_h x_d x_{d+h})$. If it is further assumed that the ACF of $\{\varepsilon_t^2\}$ is nonnegative and that $E(\varepsilon_t^2 \varepsilon_{t-i} \varepsilon_{t-j}) = 0$ for $i \neq j$, then for any $h > q$,

$$\sqrt{n}\,\hat\rho(h) \xrightarrow{dist} N(0, \delta_h^2), \qquad \delta_h^2 \geq 1 + 2\sum_{k=1}^{q} \rho^2(k). \qquad (7)$$

The inequality holds strictly if the ACFs are positive.

The proof of the theorem is given in the Appendix.

Remark 2 The theorem applies to GARCH(p, q) processes with finite fourth moments and to stochastic volatility models (see the definition in (15)), as they satisfy the conditions of the theorem together with the additional positive-ACF condition.

Based on the theorem, one can estimate the standard error of $\hat\rho(h)$ by (6), with $\sigma(d,h)$ replaced by its estimate. However, the order q of the MA process is unknown. Hence, we use the following estimator:

$$\widehat{Var}(\hat\rho(h)) = \frac{1}{n\hat\gamma^2(0)} \big(\hat\sigma(0,h) + 2\hat\sigma(1,h) + \cdots + 2\hat\sigma(h-1,h)\big). \qquad (8)$$

This is justified by the fact that $E(x_t x_{t+h} x_s x_{s+h}) = 0$ for $|s-t| > q$ for an MA(q) process; hence $\sigma(q+1,h) = \sigma(q+2,h) = \cdots = \sigma(h-1,h) = 0$ for $h > q$, so including $\hat\sigma(q+1,h), \ldots, \hat\sigma(h-1,h)$ does not affect the estimator of the asymptotic variance. In section 3.1.1, we perform a comparison study between the variance estimator above and the standard variance formula $\frac{1}{n}(1 + 2\hat\rho^2(1) + \cdots + 2\hat\rho^2(h-1))$ for iid innovations. Because of the difference in the variances, order identification via the ACF needs to be adjusted; a demonstration is given in section 3.1.3. Note that although our estimator is built for dependent but uncorrelated innovations, it automatically applies to iid innovations as well.
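A direct implementation of estimator (8) takes only a few lines (a sketch in Python/numpy; the function name and edge-handling are ours, not the paper's code):

```python
# Sketch of the robust variance estimator (8) for the sample ACF,
# assuming numpy; x is the observed series, h the lag of interest.
import numpy as np

def robust_acf_var(x, h):
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    gamma0 = (x * x).mean()
    # sigma_hat(d, h) = average of x_t x_{t+h} x_{t+d} x_{t+d+h} over t
    # (up to edge effects at the end of the sample)
    def sigma_hat(d):
        T = n - h - d
        return (x[:T] * x[h:h + T] * x[d:d + T] * x[d + h:d + h + T]).mean()
    s = sigma_hat(0) + 2.0 * sum(sigma_hat(d) for d in range(1, h))
    return s / (n * gamma0 ** 2)
```

The statistic $\hat\rho^2(h)/\widehat{Var}(\hat\rho(h))$ can then be compared against $\chi^2_1$ quantiles, as is done in section 3.1.1.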

2.2 Autoregressive Model

An autoregressive process $X_t$ of order p, namely AR(p), follows

$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + \varepsilon_t.$$

It is common practice to test the significance of the sample partial autocorrelation function (PACF) to identify the order of an AR(p) series. The PACF $\phi_{h,h}$ is defined by $\phi_{0,0} = 1$ and

$$\phi_{h,h} = \text{the last component of } \phi_h = \Gamma_h^{-1} \gamma_h, \quad h \geq 1, \qquad (9)$$

where $\Gamma_h = [\gamma(i-j)]_{i,j=1}^{h}$ and $\gamma_h = (\gamma(1), \gamma(2), \ldots, \gamma(h))'$. It is estimated by replacing $\gamma(i)$ with its estimate $\hat\gamma(i)$ in the above formula. It can be shown (e.g., Brockwell and Davis, 1986) that the partial autocorrelation $\phi_{h,h}$ at lag h may be regarded as the correlation between $X_t$ and $X_{t-h}$, adjusted for the intervening observations $X_{t-1}, \ldots, X_{t-h+1}$. Since an AR(p) model imposes a linear relationship between $X_t$ and $X_{t-1}, \ldots, X_{t-p}$, it is easily seen that when $h > p$, $X_t$ and $X_{t-h}$ are conditionally uncorrelated given $X_{t-1}, \ldots, X_{t-h+1}$; hence the cut-off property of the PACF of AR processes. If one further assumes Gaussian white noise, i.e., $\varepsilon_t \sim iid\ N(0, \sigma^2)$, it is well known that for $h > p$, writing $s = h - p \geq 1$, the vector $\sqrt{n}(\hat\phi_{p+1,p+1}, \ldots, \hat\phi_{p+s,p+s})$ has an asymptotic joint normal distribution with unit variances. Namely,

$$\sqrt{n}(\hat\phi_{p+1,p+1}, \ldots, \hat\phi_{p+s,p+s})' \xrightarrow{dist} N(0, I_{s \times s}), \qquad (10)$$

where $I$ is the s-dimensional identity matrix. However, similarly to the MA(q) case, when $\{\varepsilon_t\}$ are dependent, the covariance matrix of the asymptotic distribution of the PACF differs from that of the iid case.

Lemma 2 Let $\{X_t\}$ be an AR(p) process whose innovations $\{\varepsilon_t\}$ satisfy condition C.1. For the PACF $\phi_{h,h}$ defined in (9), we have $\phi_{p+1,p+1} = \phi_{p+2,p+2} = \cdots = 0$.

In analogy to (10), the next theorem establishes the asymptotic properties of the sample PACF $\hat\phi_{h,h}$, $h > p$, of an AR(p) process with dependent innovations.

Theorem 2 Let $\hat\phi_{h,h}$ be the lag-h sample PACF of a stationary and invertible AR(p) time series $\{X_t\}$ whose innovations $\{\varepsilon_t\}$ satisfy conditions C.1 and C.2. Then for all $h > p$ (with $s = h - p \geq 1$),

$$\sqrt{n}(\hat\phi_{p+1,p+1}, \ldots, \hat\phi_{p+s,p+s})' \xrightarrow{dist} N(0, \Xi_{s \times s}),$$

where $\Xi$ has a complex expression given in the proof of the theorem in the Appendix. If we further assume that the ACF of $\{\varepsilon_t^2\}$ is nonnegative and that $E(\varepsilon_t^2 \varepsilon_{t-i} \varepsilon_{t-j}) = 0$ for $i \neq j$, then

$$\Xi_{(l,l)} \geq 1, \quad l \in \{1, \ldots, s\}. \qquad (11)$$

The strict inequality holds if all ACFs are positive.

Remark 3 Dependent but uncorrelated innovations, such as GARCH innovations, inflate the variance of the sample PACF relative to the iid case, in which the asymptotic variance is $1/n$. Thus, if we still use the traditional $(1-\alpha)$ confidence interval to test the lag-h PACF with $h > p$, the type I error will exceed $\alpha$. This over-rejection leads to specifying an AR(h) model with h greater than the true order p.

We can obtain the variance estimator $\widehat{Var}(\hat\phi_{h,h})$ of the sample lag-h PACF in the following manner. First, we obtain the estimates $\hat\phi_1, \ldots, \hat\phi_h$ and the corresponding residuals $\hat\varepsilon_t = x_t - \hat\phi_1 x_{t-1} - \cdots - \hat\phi_h x_{t-h}$. Second, from the sample autocovariance matrix $\hat\Gamma_h$ and the sample version of $\Omega_h = Cov\big((\varepsilon_1 x_0, \varepsilon_1 x_{-1}, \ldots, \varepsilon_1 x_{1-h})'\big)$, we obtain an estimate of $\Gamma_h^{-1} \Omega_h \Gamma_h^{-1}$, whose last diagonal entry divided by n is the variance estimator. See the proof of Theorem 2 in the Appendix for more details. In section 3.1.2, we perform a comparison study between the variance calculation above and the standard variance $1/n$ of the iid case. It demonstrates that the order identification scheme for AR(p) processes using the PACF needs to be adjusted to the updated variance. An illustration of order identification via the PACF under different innovations is given in section 3.1.3.
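The sandwich recipe above can be sketched as follows (Python/numpy; we use a Yule-Walker fit of the AR(h) model, which is one consistent choice, not necessarily the authors'):

```python
# Sketch of the robust variance estimator for the lag-h sample PACF
# (the Gamma^{-1} Omega Gamma^{-1} recipe above); assumes numpy.
import numpy as np

def robust_pacf_var(x, h):
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    gamma = np.array([(x[: n - k] * x[k:]).mean() for k in range(h + 1)])
    idx = np.abs(np.subtract.outer(np.arange(h), np.arange(h)))
    Gamma = gamma[idx]                        # [gamma(i-j)], i,j = 1..h
    phi = np.linalg.solve(Gamma, gamma[1:])   # Yule-Walker fit of AR(h)
    # residuals of the fitted AR(h): eps_t = x_t - sum_k phi_k x_{t-k}
    lags = np.column_stack([x[h - k: n - k] for k in range(1, h + 1)])
    eps = x[h:] - lags @ phi
    V = eps[:, None] * lags                   # rows: eps_t * (x_{t-1},...,x_{t-h})
    Omega = V.T @ V / len(eps)                # sample version of Omega_h
    G_inv = np.linalg.inv(Gamma)
    return (G_inv @ Omega @ G_inv)[h - 1, h - 1] / n
```

No centering of $V$ is needed since $E(\varepsilon_t x_{t-j}) = 0$ under condition C.1.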

2.3 Autoregressive and Moving Average Model

The ARMA model (1) is a hybrid of the autoregressive and moving average models. Tsay and Tiao (1984) proposed the extended autocorrelation function (EACF) technique to identify the orders of a stationary or non-stationary ARMA process based on iterated least squares estimates of the autoregressive parameters. It relies on the fact that if $X_t$ follows an ARMA(p, q) model, then the filtered process $Y_t = X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p}$ follows an MA(q) process $Y_t = \varepsilon_t + \psi_1 \varepsilon_{t-1} + \cdots + \psi_q \varepsilon_{t-q}$.

Under dependent but uncorrelated innovations, in analogy to the procedure described in Tsay and Tiao (1984), we propose the following order determination scheme.

1. For each candidate AR order s, obtain consistent estimates of the AR coefficients, $\hat\phi_1, \hat\phi_2, \ldots, \hat\phi_s$.

2. Compute $Y_t = X_t - \hat\phi_1 X_{t-1} - \hat\phi_2 X_{t-2} - \cdots - \hat\phi_s X_{t-s}$. If $s = p$, then $Y_t$ should approximately follow an MA(q) sequence. On the other hand, if we overfit the AR part by m (i.e., $s = p + m$), then $Y_t$ should be an MA(q + m) process.

3. Apply the order determination procedure for moving average sequences described above to $\{Y_t\}$ to identify the MA order, build an EACF table as proposed in Tsay and Tiao (1984), and finalize the AR and MA order selections.

Since the major difference between this scheme and that of Tsay and Tiao (1984) lies in how the significance of the ACF is assessed, as elaborated in Section 2.1, we skip further discussion here; a minimal sketch of one EACF cell under the modified inference follows. The impact of different innovation terms on EACF-based inference is illustrated in Section 3.1.3.
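The sketch below computes one cell of the resulting EACF table (Python/numpy; `robust_acf_var` is the helper sketched in section 2.1). For step 1 we plug in ordinary least squares purely as a placeholder; the paper follows Tsay and Tiao's iterated least squares, which is what guarantees consistency when $q > 0$:

```python
# Sketch of one cell of the modified EACF: fit AR(s) (step 1, placeholder
# OLS), filter the series (step 2), then test the lag-h ACF of the filtered
# series against chi2_1 using the robust variance (8) (step 3).
import numpy as np

def eacf_cell(x, s, h):
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    if s > 0:
        X = np.column_stack([x[s - k: n - k] for k in range(1, s + 1)])
        phi, *_ = np.linalg.lstsq(X, x[s:], rcond=None)   # step 1 (placeholder)
        y = x[s:] - X @ phi                               # step 2: filtering
    else:
        y = x
    rho = (y[: len(y) - h] * y[h:]).sum() / (y * y).sum()  # lag-h sample ACF
    return rho ** 2 / robust_acf_var(y, h)                 # step 3
```

Scanning s and h over a grid of candidate orders and marking cells where the statistic exceeds the $\chi^2_1$ critical value reproduces the usual EACF table of 0s and 2s.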

2.4 Autoregressive Integrated Model

The autoregressive integrated (ARI) model can be regarded as an extension of the AR model. The Augmented Dickey-Fuller (ADF) test (Dickey and Fuller, 1979; Said and Dickey, 1984) is important for detecting whether a process is unit-root nonstationary. In this section we study the ADF test under uncorrelated but dependent innovations. Consider the ARI model

$$X_t = \Phi_1 X_{t-1} + \Phi_2 X_{t-2} + \cdots + \Phi_p X_{t-p} + \varepsilon_t. \qquad (12)$$

Following the ADF formulation, we rewrite (12) as

$$X_t = \rho X_{t-1} + \sum_{k=1}^{p-1} a_k \Delta X_{t-k} + \varepsilon_t, \qquad (13)$$

where $\Delta X_{t-k} = X_{t-k} - X_{t-k-1}$, $\rho = \sum_{i=1}^{p} \Phi_i$ and $a_k = -\sum_{j=k+1}^{p} \Phi_j$. The unit-root problem for this model focuses on testing $H_0: \rho = 1$ against $H_1: \rho < 1$. The ADF test statistic is $T_n = n(\hat\rho_n - 1)$, where $\hat\rho_n$ is the least squares estimator of $\rho$ when $X_t$ is regressed on $X_{t-1}, \Delta X_{t-1}, \ldots, \Delta X_{t-p+1}$ via (13),

$$\hat\rho_n = \frac{\sum_{t=p+1}^{n} X_{t-1}\big(X_t - \sum_{k=1}^{p-1} \hat a_k \Delta X_{t-k}\big)}{\sum_{t=p+1}^{n} X_{t-1}^2}.$$

Dickey and Fuller (1979) proved that under the null hypothesis,

$$T_n = n(\hat\rho_n - 1) \xrightarrow{dist} \frac{1}{2}\Big(1 - \sum_{k=1}^{p-1} a_k\Big)\,\Gamma^{-1}(T^2 - 1),$$

where $(\Gamma, T) = \big(\sum_{i=1}^{\infty} \gamma_i^2 Z_i^2,\ \sum_{i=1}^{\infty} \sqrt{2}\,\gamma_i Z_i\big)$, $\gamma_i^2 = 4[(2i-1)\pi]^{-2}$, and $\{Z_i\}_{i=1}^{\infty}$ is a sequence of iid $N(0,1)$ random variables. Chan and Wei (1988) showed that the limiting distribution can also be represented in terms of a standard Brownian motion $W(t)$:

$$T_n = n(\hat\rho_n - 1) \xrightarrow{dist} \Big(1 - \sum_{k=1}^{p-1} a_k\Big)\, \frac{\frac{1}{2}(W^2(1) - 1)}{\int_0^1 W^2(t)\,dt}.$$

When the innovations $\{\varepsilon_t\}$ are not iid but are dependent and uncorrelated random variables satisfying conditions C.1 and C.2, we show below that the same limiting distribution of the test statistic $T_n$ still applies. Before stating the results, we need the following definition from Wu and Min (2005) and some lemmas.

Definition 1 The process $Y_n = g(\mathcal{F}_n)$, where g is a measurable function, is said to be $L_p$ ($p \geq 0$) weakly dependent with order r ($r \geq 0$) if $E(|Y_n|^p) < \infty$ and

$$\sum_{n=1}^{\infty} n^r \|\mathcal{P}_1 Y_n\|_p < \infty. \qquad (14)$$

If (14) holds with $r = 0$, then $Y_n$ is said to be $L_p$ weakly dependent.

We impose an additional condition on the innovations $\{\varepsilon_t\}$.

Condition C.3: The process $\{\varepsilon_t\}$ is $L_\alpha$ weakly dependent with order 1, for some $\alpha > 2$.

This innovation assumption includes a large class of nonlinear processes and substantially relaxes the iid or martingale difference assumptions. In particular, the condition is satisfied if the innovations follow a GARCH process, a random coefficient AR process, a bilinear AR process, etc. For more details, see Wu and Min (2005). The next theorem shows that the limiting distribution of the test statistic $T_n$ does not change when the errors are dependent but uncorrelated.

Theorem 3 If $\rho = 1$, then for a time series $\{X_t\}$ following (12) with errors $\{\varepsilon_t\}$ satisfying conditions C.1, C.2 and C.3, we have $W_n \xrightarrow{dist} W$, where $W_n$ is the normalized partial-sum process defined in the Appendix, and therefore

$$\Big(1 - \sum_{k=1}^{p-1} a_k\Big)^{-1} T_n \xrightarrow{dist} \frac{\int_0^1 W(t)\,dW(t)}{\int_0^1 W^2(t)\,dt}.$$

Remark 4 By a theorem of Fuller (1995), $\hat a_k$ is a consistent estimator of $a_k$, so $\hat a_k$ can be used in estimating the limiting distribution, where $(\hat\rho, \hat a_1, \ldots, \hat a_k)$ are the regression coefficients obtained by regressing $X_t$ on $X_{t-1}, \Delta X_{t-1}, \ldots, \Delta X_{t-p+1}$.

Remark 5 Theorem 3 cannot easily be extended to ARIMA models, since equation (12) is no longer a simple regression model when an MA part is involved. This topic is left for further study.

3 Empirical Study

In this section, we use both simulations and real data to demonstrate the results and methods developed in the previous section. Simulation results for AR/MA/ARMA/ARI models are provided first, to validate the theorems. Only part of the results are presented here; results for other parameter settings of the GARCH model and for other innovation models, such as the stochastic volatility model, are available upon request. For the real data sets, we go through the entire order determination and estimation process. We denote the EACF proposed by Tsay and Tiao (1984) as the original EACF, and the adjusted EACF for time series with uncorrelated but dependent innovations as the modified EACF.

3.1 Simulation

Here we use GARCH(1,1) innovations as the dependent but uncorrelated errors, since GARCH is among the most common and useful models for this type of error. Namely, we consider the model

$$X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \varepsilon_t + \psi_1 \varepsilon_{t-1} + \cdots + \psi_q \varepsilon_{t-q}, \qquad \varepsilon_t = \sqrt{g_t}\,\epsilon_t, \quad g_t = \alpha_0 + \alpha \varepsilon_{t-1}^2 + \beta g_{t-1},$$

where $\{\epsilon_t\}$ are iid standard normal.

3.1.1 Moving Average Model

We consider two formulae for estimating $Var(\hat\rho(h))$:

1. $V(h) = \frac{1}{n}\big(1 + 2\hat\rho^2(1) + \cdots + 2\hat\rho^2(h-1)\big)$;

2. $V^*(h) = \frac{1}{n\hat\gamma^2(0)}\big(\hat\sigma(0,h) + 2\hat\sigma(1,h) + \cdots + 2\hat\sigma(h-1,h)\big)$.

Correspondingly, we denote the statistics $\hat\rho^2(h)/V(h)$ and $\hat\rho^2(h)/V^*(h)$ by $T(h)$ and $T^*(h)$, respectively. First, we use the following MA(1) model for simulation:

$$X_t = \varepsilon_t + \psi_1 \varepsilon_{t-1}, \qquad \varepsilon_t = \sqrt{g_t}\,\epsilon_t, \quad g_t = \alpha_0 + \alpha \varepsilon_{t-1}^2 + \beta g_{t-1},$$

with $\psi_1 = -0.4$ and GARCH settings $\alpha = 0.2$, $\beta = 0.7$. We simulate 2000 replications of time series of length 1000 each. In Table 1, we report the empirical percentiles of $T(h)$ and $T^*(h)$ along with those of $\chi^2_1$, for $h = 2$ and 3. Since the series are simulated from an MA(1) model, the true value of $\rho(h)$ is zero and the statistics should asymptotically follow $\chi^2_1$ if the variance is estimated correctly. The last column shows the percentage of rejections when the $\chi^2_1$ 95th percentile is used as the critical value for testing for non-zero ACF coefficients. It can be seen from Table 1 that $T^*(h)$ closely follows the chi-square distribution while $T(h)$ is much larger. Since the numerator $\hat\rho^2(h)$ is the same for both $T(h)$ and $T^*(h)$, this shows that the variance estimate $V^*(h)$ is more accurate. The level of the test based on $T^*(h)$ is also more accurate.
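The Monte Carlo above can be compressed into a short script (Python/numpy; `robust_acf_var` is the helper sketched in section 2.1, and the seed and $\alpha_0$ are arbitrary):

```python
# Sketch of the MA(1)+GARCH(1,1) Monte Carlo behind Table 1; assumes numpy
# and the robust_acf_var helper sketched earlier.
import numpy as np

rng = np.random.default_rng(7)

def simulate_ma1_garch(n, psi=-0.4, a0=0.1, a=0.2, b=0.7):
    eps, g = np.zeros(n + 1), a0 / (1 - a - b)
    for t in range(1, n + 1):
        g = a0 + a * eps[t - 1] ** 2 + b * g
        eps[t] = np.sqrt(g) * rng.standard_normal()
    return eps[1:] + psi * eps[:-1]          # X_t = eps_t + psi * eps_{t-1}

h, stats = 2, []
for _ in range(2000):
    x = simulate_ma1_garch(1000)
    x -= x.mean()
    rho = (x[:-h] * x[h:]).sum() / (x * x).sum()   # sample ACF at lag h
    stats.append(rho ** 2 / robust_acf_var(x, h))  # T*(h)

print("empirical 95th percentile of T*(h):", np.percentile(stats, 95))
# compare with the chi2_1 value 3.84; T(h), built on V(h), sits far above it.
```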

Table 1: Empirical percentiles of $T(h)$ and $T^*(h)$ for a simulated MA(1) series $X_t = \varepsilon_t - 0.4\varepsilon_{t-1}$, where the innovation $\{\varepsilon_t\}$ is GARCH(1,1) with $\alpha = 0.2$, $\beta = 0.7$.

[Table body lost in extraction; the columns were Mean, S.D., the 50%, 75%, 90%, 95% and 99% percentiles, and the rejection percentage p, for rows $T(2)$, $T(3)$, $\chi^2_1$, $T^*(2)$ and $T^*(3)$.]

In order to examine the behavior of $T(h)$ and $T^*(h)$ when $h \leq q$, we use the following MA(3) model for a second simulation:

$$X_t = \varepsilon_t + \psi_1 \varepsilon_{t-1} + \psi_2 \varepsilon_{t-2} + \psi_3 \varepsilon_{t-3}, \qquad \varepsilon_t = \sqrt{g_t}\,\epsilon_t, \quad g_t = \alpha_0 + \alpha \varepsilon_{t-1}^2 + \beta g_{t-1},$$

with $\psi_1 = 0.8$, $\psi_2 = -0.8$, $\psi_3 = 0.8$ and two different GARCH settings: i) $\alpha = 0.1$, $\beta = 0.8$; ii) $\alpha = 0.5$, $\beta = 0.2$. Again, we simulate 2000 replications of time series of length 1000 each. The empirical percentiles of $T(h)$ and $T^*(h)$ are compared with those of $\chi^2_1$ in Table 2.

Table 2: Empirical percentiles of $T(h)$ and $T^*(h)$ for a simulated MA(3) series $X_t = \varepsilon_t + 0.8\varepsilon_{t-1} - 0.8\varepsilon_{t-2} + 0.8\varepsilon_{t-3}$, where the innovation $\varepsilon_t$ is GARCH(1,1) with $\alpha = 0.5$, $\beta = 0.2$.

[Table body lost in extraction; the columns were Mean, S.D., the 50%, 75%, 90%, 95% and 99% percentiles, and the rejection percentage p, for rows $T(2)$-$T(5)$, $\chi^2_1$, and $T^*(2)$-$T^*(5)$.]

We make the following observations.

1. For $h \leq q$, both $T(h)$ and $T^*(h)$ differ significantly from $\chi^2_1$, since $\hat\rho^2(h)/\widehat{Var}(\hat\rho(h))$ is asymptotically non-central chi-square distributed: the asymptotic mean of $\hat\rho(h)$ is $\rho(h) \neq 0$ when $h \leq q$.

2. $T(3)$ and $T^*(3)$ are relatively larger than the others simply because $|\rho(3)|$ is larger. Theoretically, $\rho(1) = -0.164$, $\rho(2) = -0.055$, $\rho(3) = 0.274$, and $\rho(4) = \rho(5) = 0$.

3. For $h > q$, the performance of $T(h)$ and $T^*(h)$ is similar to what we observed in the MA(1) simulation: $T^*(h)$ follows the $\chi^2_1$ distribution closely while $T(h)$ is much larger.

4. While the power of the test based on $T^*(h)$ is comparable with that of $T(h)$, its size is more accurate. Since order determination is based on the change from non-zero to zero ACF coefficients, $T^*(h)$ is certainly more reliable.

Having confirmed that the variance of $\hat\rho(h)$ may change in the presence of dependent innovations, we need to use the new order identification scheme discussed in the previous section. Since the MA model is a special case of the ARMA model, we incorporate the interpretation of the results of the new scheme, compared with the classical one, into our discussion of the ARMA model and only list the result table here (Table 3). All the interpretation in section 3.1.3 applies here as well.

3.1.2 Autoregressive Model

To compare the variance estimator under the iid assumption with our proposed estimator, we consider two formulae for estimating $Var(\hat\phi_{h,h})$:

1. $V(h,h) = \frac{1}{n}$;

2. $V^*(h,h) = \frac{1}{n}\big[\hat\Gamma_h^{-1} \hat\Omega_h \hat\Gamma_h^{-1}\big]_{(h,h)}$.

Correspondingly, we denote the statistics $\hat\phi_{h,h}^2/V(h,h)$ and $\hat\phi_{h,h}^2/V^*(h,h)$ by $T(h)$ and $T^*(h)$, respectively. We use an AR(1) model for the first simulation experiment in this setting:

$$X_t = \phi_1 X_{t-1} + \varepsilon_t, \qquad \varepsilon_t = \sqrt{g_t}\,\epsilon_t, \quad g_t = \alpha_0 + \alpha \varepsilon_{t-1}^2 + \beta g_{t-1},$$

with $\phi_1 = 0.9$ and GARCH setting $\alpha = 0.5$, $\beta = 0.2$.

Table 3: EACF method comparison for a simulated MA(3) series with different types of innovations: $X_t = \varepsilon_t + 0.8\varepsilon_{t-1} - 0.8\varepsilon_{t-2} + 0.8\varepsilon_{t-3}$, where the innovation types are: (1) iid; (2) GARCH(1,1) with $\alpha = 0.1$, $\beta = 0.8$; (3) GARCH(1,1) with $\alpha = 0.5$, $\beta = 0.2$. The time series length is 1000 and the number of replications is 1000. Each number (in %) is the percentage of replications in which the order (row) was identified by the method (column); each column sums to 100%.

                Innovation (1)        Innovation (2)        Innovation (3)
Models        original  modified    original  modified    original  modified
MA(2)            0.00%     0.00%       0.00%     0.00%       0.00%     0.00%
MA(3)           85.10%    89.80%      76.50%    87.70%      69.40%    85.00%
MA(4)            6.60%     4.10%       9.30%     5.90%      13.90%     5.10%
MA(5)            2.60%     1.90%       6.10%     2.20%       6.10%     1.90%
ARMA(1,3)        0.20%     0.00%       0.30%     0.10%       0.40%     0.20%
ARMA(1,4)        2.10%     1.60%       3.00%     0.50%       3.80%     0.00%
ARMA(1,5)        0.00%     0.00%       0.00%     0.00%       0.00%     1.00%
ARMA(2,3)        0.00%     0.00%       0.00%     0.00%       0.00%     0.00%
ARMA(2,4)        0.00%     0.00%       0.00%     0.00%       0.00%     0.00%
ARMA(2,5)        0.00%     0.00%       0.00%     0.00%       0.00%     0.00%
ARMA(3,3)        0.00%     0.00%       0.00%     0.00%       0.00%     0.00%
ARMA(3,4)        0.00%     0.00%       0.00%     0.00%       0.00%     0.00%
Others           3.40%     2.60%       4.80%     3.60%       6.40%     6.80%

We simulate 2000 replications of time series of length 1000 each. The empirical percentiles of $T(h)$ and $T^*(h)$, along with those of $\chi^2_1$, are listed in Table 4. The results are very similar to those of Table 1 and lead to similar conclusions.

Table 4: Empirical percentiles of $T(h)$ and $T^*(h)$ for a simulated AR(1) series $X_t = 0.9X_{t-1} + \varepsilon_t$, where the innovation $\{\varepsilon_t\}$ is GARCH(1,1) with $\alpha = 0.5$, $\beta = 0.2$.

[Table body lost in extraction; the columns were Mean, S.D., the 50%, 75%, 90%, 95% and 99% percentiles, and the rejection percentage p, for rows $T(2)$, $T(3)$, $\chi^2_1$, $T^*(2)$ and $T^*(3)$.]

For $h \leq p$, we use the following AR(3) model for a second simulation:

$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \phi_3 X_{t-3} + \varepsilon_t, \qquad \varepsilon_t = \sqrt{g_t}\,\epsilon_t, \quad g_t = \alpha_0 + \alpha \varepsilon_{t-1}^2 + \beta g_{t-1},$$

with $\phi_1 = 0.8$, $\phi_2 = -0.8$, $\phi_3 = 0.8$ and two different GARCH settings: i) $\alpha = 0.1$, $\beta = 0.8$; ii) $\alpha = 0.5$, $\beta = 0.2$. Again, we simulate 2000 replications of time series of length 1000 each. The empirical percentiles of $T(h)$ and $T^*(h)$ are compared with those of $\chi^2_1$ in Table 5. Again, the results are very similar to those in Table 2.

Table 5: Empirical percentiles of $T(h)$ and $T^*(h)$ for a simulated AR(3) series $X_t = 0.8X_{t-1} - 0.8X_{t-2} + 0.8X_{t-3} + \varepsilon_t$, where the innovation $\varepsilon_t$ is GARCH(1,1) with $\alpha = 0.5$, $\beta = 0.2$.

[Table body lost in extraction; the columns were Mean, S.D., the 50%, 75%, 90%, 95% and 99% percentiles, and the rejection percentage p, for rows $T(2)$-$T(5)$, $\chi^2_1$, and $T^*(2)$-$T^*(5)$.]

Table 6: EACF method comparison for a simulated AR(3) series with different types of innovations: $X_t = 0.8X_{t-1} - 0.8X_{t-2} + 0.8X_{t-3} + \varepsilon_t$, where the innovation types are: (1) iid; (2) GARCH(1,1) with $\alpha = 0.1$, $\beta = 0.8$; (3) GARCH(1,1) with $\alpha = 0.5$, $\beta = 0.2$. The time series length is 1000 and the number of replications is 1000. Each number (in %) is the percentage of replications in which the order (row) was identified by the method (column); each column sums to 100%.

                Innovation (1)        Innovation (2)        Innovation (3)
Models        original  modified    original  modified    original  modified
AR(2)            0.00%     0.00%       0.00%     0.00%       0.00%     0.00%
ARMA(2,1)        0.00%     0.00%       0.00%     0.00%       0.00%     0.00%
ARMA(2,2)        0.00%     0.00%       0.00%     0.00%       0.00%     0.00%
AR(3)           57.90%    64.80%      48.40%    69.20%      33.60%    70.60%
ARMA(3,1)        8.50%    10.40%      10.00%     8.60%      16.50%     7.40%
ARMA(3,2)        5.30%     5.00%       7.70%     4.10%      10.10%     3.90%
AR(4)            0.70%     0.60%       0.90%     0.50%       0.60%     1.00%
ARMA(4,1)       18.20%    13.70%      19.30%    13.10%      18.20%    12.60%
ARMA(4,2)        0.00%     0.00%       0.00%     0.00%       0.00%     0.00%
AR(5)            1.20%     0.30%       1.30%     0.30%       1.60%     1.10%
ARMA(5,1)        0.00%     0.00%       0.00%     0.00%       0.00%     0.00%
ARMA(5,2)        0.00%     0.00%       0.00%     0.00%       0.00%     0.00%
Others           8.20%     5.20%      12.40%     4.20%      19.40%     3.40%

Having confirmed that the variance of $\hat\phi_{h,h}$ may change in the presence of dependent innovations, we need to use the new order identification scheme discussed in the previous section. Since the AR model is also a special case of the ARMA model, we incorporate the interpretation of the results of the new scheme, compared with the classical one, into our discussion of the ARMA model and only report the results in Table 6. All the interpretation in section 3.1.3 applies here as well.

3.1.3 Autoregressive Moving Average Model

Here we compare the inference based on the original EACF with that based on the modified EACF, for time series with either iid innovations or uncorrelated but dependent innovations. The experiment is designed as follows.

1. Simulate ARMA(1,1) time series with three kinds of innovations: iid, GARCH (with two different parameter settings), and a stochastic volatility model (given below). The length of each time series is 1000, with 1000 replications.

2. For each generated time series, estimate the EACF table and use the original inference procedure (under the iid assumption) to identify the model order.

3. Over the 1000 replications, count the frequency (percentage) with which each candidate model is selected by the original EACF procedure.

4. Repeat steps 2 and 3 using the modified EACF procedure.

The parameters of the ARMA(1,1) mean equation are 0.8 and 0.5 for the AR and MA coefficients, respectively. We use four types of innovations: (i) iid; (ii) GARCH(1,1) with $\alpha = 0.1$, $\beta = 0.8$; (iii) GARCH(1,1) with $\alpha = 0.5$, $\beta = 0.2$; (iv) innovations following a stochastic volatility model with $\alpha_1 = 0.5$, defined as

$$\varepsilon_t = \sigma_t e_t, \qquad (1 - \alpha_1 B) \ln(\sigma_t^2) = v_t, \qquad (15)$$

where the $e_t$ are iid $N(0,1)$, the $v_t$ are iid $N(0, \sigma_v^2)$, and $\{e_t\}$ and $\{v_t\}$ are independent. The reason we simulate two settings of the GARCH model is that the GARCH effect differs with the parameters. Min (2004) showed that $Var(\hat\rho)$ is significantly impacted when $Cov(\varepsilon_0^2, \varepsilon_{q-k+i}^2)/E^2(\varepsilon_0^2)$ is large. For instance, if $\{\varepsilon_t\}$ is a GARCH(1,1) process, as in our simulation, then

$$Cov(\varepsilon_0^2, \varepsilon_1^2)/\sigma^4 = 2\alpha + \frac{6\alpha^2(\alpha + \beta/3)}{1 - 2\alpha^2 - (\alpha + \beta)^2}.$$

Given $\alpha = 0.5$ and $\beta = 0.2$ the ratio is 86; with $\alpha = 0.1$ and $\beta = 0.8$, the ratio is only about 0.33. The simulation results are reported in Table 7, showing the percentage of each candidate model being selected based on either the original or the modified inference from the EACF table. From Table 7, it is seen that the modified EACF procedure has performance similar to that of the original EACF for pure ARMA models, but outperforms it for ARMA+GARCH models. We also run a set of formal tests to see this more clearly. Since the same sets of simulated time series are used for the original and modified EACF procedures, we create a 2×2 contingency table for each innovation type and run a chi-square test to compare identification power. The results are shown in Table 8, with each table and the corresponding chi-square statistic and its p-value for each innovation type.

Table 7: EACF method comparison for a simulated ARMA(1,1) series with different types of innovations: $X_t = 0.8X_{t-1} + \varepsilon_t + 0.5\varepsilon_{t-1}$, where the innovation types are: (i) iid; (ii) GARCH(1,1) with $\alpha = 0.1$, $\beta = 0.8$; (iii) GARCH(1,1) with $\alpha = 0.5$, $\beta = 0.2$; (iv) stochastic volatility with $\alpha_1 = 0.5$. The time series length is 1000 and the number of replications is 1000. Each number (in %) is the percentage of replications in which the order (row) was identified by the method (column); each column sums to 100%.

             Innovation (i)       Innovation (ii)      Innovation (iii)     Innovation (iv)
Models      original modified    original modified    original modified    original modified
white noise    0.00%    0.00%       0.00%    0.00%       0.00%    0.00%       0.00%    0.00%
MA(1)          0.00%    0.00%       0.00%    0.00%       0.00%    0.00%       0.00%    0.00%
MA(2)          0.00%    0.00%       0.00%    0.00%       0.00%    0.20%       0.00%    0.00%
MA(3)          0.00%    0.00%       0.00%    0.00%       0.00%    0.20%       0.00%    0.00%
AR(1)          0.00%    0.00%       0.00%    0.00%       0.00%    0.30%       0.00%    0.00%
ARMA(1,1)     81.40%   82.40%      76.90%   82.70%      66.20%   87.60%      70.10%   84.30%
ARMA(1,2)      9.90%    8.50%      11.50%    7.60%      16.30%    4.40%      10.80%    7.40%
ARMA(1,3)      4.10%    3.90%       5.30%    4.50%       7.50%    3.00%       6.10%    3.70%
AR(2)          0.00%    0.00%       0.00%    0.00%       0.00%    0.10%       0.00%    0.00%
ARMA(2,1)      0.20%    0.40%       0.40%    0.10%       0.80%    0.20%       0.40%    0.00%
ARMA(2,2)      2.70%    3.10%       3.60%    3.30%       3.50%    2.10%       7.10%    3.00%
ARMA(2,3)      0.80%    0.30%       0.80%    0.40%       0.70%    0.40%       1.50%    0.50%
AR(3)          0.00%    0.00%       0.00%    0.10%       0.10%    0.20%       0.10%    0.00%
ARMA(3,1)      0.00%    0.00%       0.10%    0.00%       0.20%    0.10%       0.00%    0.00%
ARMA(3,2)      0.10%    0.20%       0.10%    0.20%       0.60%    0.20%       0.20%    0.10%
ARMA(3,3)      0.50%    0.90%       0.70%    0.50%       1.90%    0.50%       2.30%    0.50%
Others         0.30%    0.30%       0.60%    0.60%       2.20%    0.50%       1.40%    0.50%
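The contingency-table comparison reported in Table 8 can be sketched as follows (Python with scipy; the counts below are illustrative placeholders patterned on the innovation (iii) column of Table 7, not the paper's actual cross-tabulation):

```python
# Sketch of the 2x2 contingency comparison between the original and
# modified EACF procedures; assumes scipy. Counts are placeholders.
from scipy.stats import chi2_contingency

#                 original  modified
table = [[662, 876],        # replications identifying ARMA(1,1)
         [338, 124]]        # replications identifying something else
chi2, pvalue, dof, _ = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p-value = {pvalue:.2e}")
```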

Table 8: EACF procedure comparison by 2×2 contingency tables and chi-square tests.

[Table body largely lost in extraction: for each innovation type (i)-(iv), the table cross-classified the counts of ARMA(1,1) and non-ARMA(1,1) identifications by the original and modified procedures, with the chi-square statistic and p-value; the recoverable p-values are 7.13E-30 for innovation (iii) and 3.78E-14 for innovation (iv).]

From Tables 7 and 8, we see that both procedures work equally well with iid innovations (large p-value), confirming that the two algorithms are almost equivalent for iid error terms. With GARCH innovations of relatively weak dependence (innovation ii), the improvement of the modified EACF over the original EACF is significant, but not as strong as in the more strongly dependent GARCH case (innovation iii). For innovations following the stochastic volatility model, the effect is quite similar to that in the GARCH case. Hence the modified EACF procedure works for this type of innovations as well.

3.1.4 Autoregressive Integrated Model

In this subsection, we study the ADF test for ARI processes with dependent but uncorrelated innovations. As discussed in the previous section, the ADF test continues to work in this case, and we illustrate this conclusion with the following simulation. We use the following ARI(1,1) model as the mean equation:

$$\Delta X_t = \phi\,\Delta X_{t-1} + \varepsilon_t, \qquad \varepsilon_t = \sqrt{g_t}\,\epsilon_t, \quad g_t = \alpha_0 + \alpha \varepsilon_{t-1}^2 + \beta g_{t-1}.$$

The innovations follow a GARCH model with two parameter settings: $\alpha = 0.1$, $\beta = 0.8$ and $\alpha = 0.5$, $\beta = 0.2$. The results of 5%-level tests over 10,000 replications are reported in Table 9. The table also includes results for an ARI(1,0) model, to check the power of the test.
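One replication of this experiment might look as follows (a sketch in Python, assuming numpy and statsmodels; `regression="n"` requests no deterministic terms and is spelled "nc" in older statsmodels versions):

```python
# Sketch: ADF test on an ARI(1,1) series with GARCH(1,1) errors;
# one replication of the Table 9 experiment, second GARCH setting.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(42)
n, phi, a0, a, b = 1000, 0.8, 0.1, 0.5, 0.2

eps, g = np.zeros(n), a0 / (1 - a - b)
for t in range(1, n):
    g = a0 + a * eps[t - 1] ** 2 + b * g
    eps[t] = np.sqrt(g) * rng.standard_normal()

dx = np.zeros(n)
for t in range(1, n):
    dx[t] = phi * dx[t - 1] + eps[t]   # differenced series follows AR(1)
x = np.cumsum(dx)                      # integrate once: unit-root series

stat, pvalue = adfuller(x, regression="n")[:2]
print(f"ADF stat = {stat:.3f}, p-value = {pvalue:.3f}")
# under H0, the 5%-level test should reject only about 5% of the time
```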

Table 9: Results of the ADF test; 10,000 replications.

Model      φ     Innovations                  Percentage of rejection
ARI(1,0)   0.95  iid                          [lost]
ARI(1,1)   0.95  iid                          5.09%
ARI(1,1)   0.95  GARCH, α = 0.1, β = 0.8      [lost]
ARI(1,1)   0.95  GARCH, α = 0.5, β = 0.2      [lost]
ARI(1,0)   0.8   iid                          [lost]
ARI(1,1)   0.8   iid                          4.92%
ARI(1,1)   0.8   GARCH, α = 0.1, β = 0.8      [lost]
ARI(1,1)   0.8   GARCH, α = 0.5, β = 0.2      [lost]
ARI(1,0)   0.5   iid                          [lost]
ARI(1,1)   0.5   iid                          5.03%
ARI(1,1)   0.5   GARCH, α = 0.1, β = 0.8      [lost]
ARI(1,1)   0.5   GARCH, α = 0.5, β = 0.2      [lost]

[Entries marked [lost] were dropped in extraction; the GARCH parameter values are restored from the settings stated in the text.]

With this very large number of replications, we confirm that the ADF test maintains roughly the correct level under the null hypothesis, regardless of the type of innovations, as our theorem indicates.

3.2 Real Data

In this section we analyze the following two real data sets using our modified EACF procedure.

3.2.1 General Motors Stock Price

We analyze the log-return series of the General Motors company (GM) for the period from January 2, 1996 to May 8, 2006, shown in Figure 1. Order determination schemes are then applied. The EACF significance tables with the original and the modified estimated variances are shown in Table 10, where 0 indicates an insignificant EACF coefficient at the 5% level and 2 indicates a significant one. The original inference procedure indicates an ARMA(2,2) model, while the modified EACF procedure indicates a white noise model.

When the ARMA(2,2)+GARCH(1,1) model of the form

$$X_t = \mu + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \varepsilon_t + \psi_1 \varepsilon_{t-1} + \psi_2 \varepsilon_{t-2}, \qquad \varepsilon_t = \sqrt{g_t}\,\epsilon_t, \quad g_t = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 g_{t-1},$$

is fitted to the data, none of the AR and MA coefficients is significant, as shown in Table 11. This indicates that the modified EACF procedure provides more reliable model identification results. Although starting with a wrong model does not necessarily lead to an inaccurate final model, one has to go through a more tedious and effort-consuming estimation and model checking process.

Figure 1: General Motors stock return series: daily log returns computed from the closing price series.

Table 10: Comparison of EACF tables for the GM return data.

[Table bodies lost in extraction: the original and modified EACF significance tables (rows: AR orders 1-6; columns: MA orders 1-7) of 0/2 indicators described in the text.]
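For readers who wish to reproduce a fit of this type, a hedged two-step sketch follows (Python, assuming the statsmodels and arch packages; the data file name is hypothetical). This is not the authors' estimation procedure: the mean and variance equations are fitted sequentially rather than jointly, which is usually close but not identical.

```python
# Two-step sketch of ARMA(2,2)+GARCH(1,1) estimation on log returns;
# assumes statsmodels and the arch package; the file name is hypothetical.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from arch import arch_model

returns = np.loadtxt("gm_log_returns.txt")        # hypothetical data file

mean_fit = ARIMA(returns, order=(2, 0, 2)).fit()  # ARMA(2,2) mean equation
resid = mean_fit.resid

# GARCH(1,1) on the mean-equation residuals (rescaled: arch prefers %-scale)
vol_fit = arch_model(100 * resid, vol="Garch", p=1, q=1, mean="Zero").fit(disp="off")
print(mean_fit.summary())
print(vol_fit.summary())
```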

Table 11: Fitting results for the GM data set using the ARMA(2,2)+GARCH(1,1) model.

[Table body garbled in extraction; the columns were Estimate, Std. Error, t value, Pr(>|t|) and significance, for rows µ, φ₁, φ₂, ψ₁, ψ₂, α₀, α₁, β₁. The recoverable pattern: the GARCH parameters α₁ (p on the order of 1e-09) and β₁ (p < 2e-16) are highly significant, while µ and the AR/MA coefficients are not.]

3.2.2 Traffic Volume in a Large City

Time series analysis is widely used in traffic control applications. Large amounts of traffic information, including speed and volume, have been collected, but accurate prediction remains a challenge. Here we analyze traffic volume data collected in a major city in China. Each observation records the traffic volume (number of vehicles) passing a specific location during a 3-minute interval. The data set contains 8 weeks of data, with 480 observations per day. We use the first seven weeks for model identification and estimation, and the last week to test the prediction results. Figure 2 shows four of the eight weeks of the time series. Clearly the series possesses strong daily and weekly seasonality. To remove the seasonality and stabilize the variance, we subtract from each observation the average of the volumes observed at the same time in each of the seven weeks, and then divide by the standard deviation of these seven volumes. Again, we apply both the original EACF procedure and the modified EACF procedure to the resulting series; Table 12 shows the result. According to the EACF tables, the modified EACF procedure identifies an ARMA(1,1) model, while the original EACF procedure identifies an ARMA(1,4) model if the significant entries are treated strictly. The estimation results for both models are shown in Table 13. The estimation shows that only the last of the three extra MA terms in the ARMA(1,4)+GARCH(1,1) model is actually significant. The two models have very comparable BIC values (2.485 and 2.487). To further compare the two models, we obtain multi-step prediction errors using the observations of the final week.
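The seasonal standardization described above amounts to a per-slot z-score against the seven training weeks (a sketch in Python/numpy; the file name and array layout are assumptions):

```python
# Sketch of the seasonal standardization: for each within-week time slot,
# subtract the 7-week training mean and divide by the 7-week training std.
import numpy as np

vol = np.load("traffic_volume.npy")      # hypothetical, shape (8, 7*480)
train, test = vol[:7], vol[7]

mu = train.mean(axis=0)                  # per-slot mean over 7 weeks
sd = train.std(axis=0, ddof=1)           # per-slot standard deviation
z_train = (train - mu) / sd              # standardized training series
z_test = (test - mu) / sd                # eighth week, for prediction checks
```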

Figure 2: Traffic volume data from a large city; the four panels plot traffic volume against time for weeks 1 through 4.

Table 12: Comparison of EACF tables for the traffic data set.

[Table bodies lost in extraction: the original and modified EACF significance tables (rows: AR orders 1-6; columns: MA orders 1-8) described in the text.]

Table 13: Estimation results for the traffic data, ARMA(1,4)+GARCH(1,1) versus ARMA(1,1)+GARCH(1,1).

[Table body garbled in extraction; for each model the columns were Est, Std. Err, t value and p value, for rows ar1, ma1, ma2, ma3, ma4, ω, α₁, β₁. The recoverable pattern: in both models ar1 (approximately 0.99) and ma1 (approximately -0.88) are highly significant (p < 2e-16), as are the GARCH parameters α₁ and β₁ (p < 2e-16); ma2 and ma3 are insignificant, while ma4 is significant (p on the order of 1e-05).]

Table 14 shows the sums of squares of prediction errors, defined as $\sum_t (\hat x_t(d) - x_{t+d})^2$, where both the predictions and the true observations are based on the transformed series. Here the d-step-ahead prediction $\hat x_t(d)$ is a prediction of $x_{t+d}$ made at time t with observations up to $x_t$, with the model parameters estimated from the first seven weeks of data. It shows that the simpler model identified by the modified EACF procedure performs slightly better than the more complex one.

Table 14: Sums of squares of prediction errors. The numbers are based on sums of n = 3360 squared prediction errors obtained from the eighth week's transformed series.

[Table body lost in extraction; the rows were the ARMA(1,4)+GARCH(1,1) and ARMA(1,1)+GARCH(1,1) models and the columns were the prediction steps d.]

Appendix: Proof of the Theorems

Proof of Theorem 1: Given conditions C.1 and C.2, Theorem 3 of Wu and Min (2005) implies that, for a given h,

$$\sqrt{n}(\hat\gamma(0) - \gamma(0), \hat\gamma(1) - \gamma(1), \ldots, \hat\gamma(h) - \gamma(h))' \xrightarrow{dist} N(0, \Pi_{(h+1)\times(h+1)}),$$

where $\Pi$ is the covariance matrix of the random vector $\sum_{t=-\infty}^{\infty} \mathcal{P}_1(x_t x_t, x_{t-1} x_t, \ldots, x_{t-h} x_t)'$.

Introducing the function $H: \mathbb{R}^{h+1} \to \mathbb{R}^h$, $H((x_0, \ldots, x_h)') = (x_1/x_0, \ldots, x_h/x_0)'$, the Delta method shows that the sample autocorrelations $\hat\rho(1), \ldots, \hat\rho(h)$ are asymptotically jointly normal, i.e.,

$$\sqrt{n}(\hat\rho(1) - \rho(1), \ldots, \hat\rho(h) - \rho(h))' \xrightarrow{dist} N(0, \Sigma_{h \times h}),$$

with $\Sigma = D \Pi D'$, where $D \in \mathbb{R}^{h \times (h+1)}$ is the gradient of $H(\cdot)$ evaluated at $(\gamma(0), \ldots, \gamma(h))'$, i.e., $D = \frac{1}{\gamma(0)}[-\rho_h, I_h]$ with $\rho_h = (\rho_1, \ldots, \rho_h)'$ and $I_h$ the $h \times h$ identity matrix. This completes the proof of the first part of Theorem 1.

For $h > q$, by the fact that $\rho(h) = 0$ for an MA(q) process and the above expression for $\Sigma$, we have $\sqrt{n}\,\hat\rho(h) \xrightarrow{dist} N(0, \delta_h^2)$, where $\delta_h^2 = \|\xi\|^2/\gamma^2(0)$ and $\xi = \sum_{t=-\infty}^{\infty} \mathcal{P}_1 x_{t-h} x_t$. Since $\{x_t\}$ follows the MA(q) process $x_t = \sum_{i=1}^{q} \psi_i \varepsilon_{t-i} + \varepsilon_t$, we have $\xi = \varepsilon_1 \sum_{i=0}^{q} \psi_i x_{1-h+i}$ for $h > q$, and therefore

$$\|\xi\|^2 = E\Big(\varepsilon_1 \sum_{i=0}^{q} \psi_i x_{1-h+i}\Big)^2 = E\Big[\varepsilon_1^2 \Big(\sum_{i=0}^{\infty} \kappa_i \varepsilon_{-i}\Big)^2\Big] \quad \text{(MA($\infty$) representation of $\{x_t\}$)}$$
$$= \sum_{i,j=0}^{\infty} \kappa_i \kappa_j E(\varepsilon_1^2 \varepsilon_{-i} \varepsilon_{-j}) = \sum_{i=0}^{\infty} \kappa_i^2 E(\varepsilon_1^2 \varepsilon_{-i}^2) \quad \text{($i \neq j$ terms vanish by assumption)}$$
$$= \sum_{i=0}^{\infty} \kappa_i^2 E(\varepsilon_1^2) E(\varepsilon_{-i}^2) + \sum_{i=0}^{\infty} \kappa_i^2\, Cov(\varepsilon_1^2, \varepsilon_{-i}^2).$$

The first term above is

$$E(\varepsilon_1^2) \sum_{i=0}^{\infty} \kappa_i^2 E(\varepsilon_{-i}^2) = E(\varepsilon_1^2)\, E\Big(\sum_{i=0}^{\infty} \kappa_i \varepsilon_{-i}\Big)^2 = E(\varepsilon_1^2)\, E\Big(\sum_{i=0}^{q} \psi_i x_{1-h+i}\Big)^2$$
$$= \sigma^2 \Big(\sum_{i=0}^{q} \psi_i^2\, \gamma(0) + 2 \sum_{k=1}^{q} \sum_{i=0}^{q} \psi_i \psi_{i+k}\, \gamma(k)\Big) = \gamma^2(0) + 2 \sum_{k=1}^{q} \gamma^2(k).$$

The last equality follows from $\sigma^2 \sum_{i=0}^{q} \psi_i^2 = \gamma(0)$ and $\sigma^2 \sum_{i=0}^{q} \psi_i \psi_{i+k} = \gamma(k)$. Thus we have

$$\delta_h^2 - \Big(1 + 2\sum_{k=1}^{q} \rho^2(k)\Big) = \frac{\|\xi\|^2}{\gamma^2(0)} - \Big(1 + 2\sum_{k=1}^{q} \rho^2(k)\Big) = \sum_{i=0}^{\infty} \kappa_i^2\, Cov(\varepsilon_1^2, \varepsilon_{-i}^2)/\gamma^2(0).$$

Hence $\delta_h^2 \geq 1 + 2\sum_{k=1}^{q} \rho^2(k)$ whenever $\sum_{i=0}^{\infty} \kappa_i^2\, Cov(\varepsilon_1^2, \varepsilon_{-i}^2)/\gamma^2(0)$ is nonnegative, which is the case when all ACFs of the $\{\varepsilon_t^2\}$ series are nonnegative. If all the ACFs are strictly positive, then $\delta_h^2 > 1 + 2\sum_{k=1}^{q} \rho^2(k)$. In addition, when $E(\varepsilon_n^2 \mid \mathcal{F}_{n-1}) = \sigma^2$, we have exact equality, the same as in the iid case.

To estimate the variance of $\hat\gamma(h)$ for $h > q$, note that

$$Var(\hat\gamma(h)) = Var\Big(\frac{1}{n}\sum_{t=1}^{n-h} x_t x_{t+h}\Big) = \frac{1}{n^2}\Big(\sum_{t=1}^{n-h} Var(x_t x_{t+h}) + \sum_{t=1}^{n-h}\sum_{s=1, s\neq t}^{n-h} Cov(x_t x_{t+h}, x_s x_{s+h})\Big)$$

which, by the MA(q) structure and symmetry, equals

$$\frac{1}{n^2}\Big((n-h)\,Var(x_0 x_h) + 2\sum_{t=1}^{q} (n-t)\,Cov(x_0 x_h, x_t x_{t+h})\Big) \approx \frac{1}{n}\Big(Var(x_0 x_h) + 2\sum_{t=1}^{q} Cov(x_0 x_h, x_t x_{t+h})\Big),$$

and, since $E(x_0 x_h) = 0$,

$$Var(\hat\gamma(h)) \approx \frac{1}{n}\sum_{|d| \leq q} \big(E(x_0 x_h x_d x_{d+h}) - E(x_0 x_h)E(x_d x_{d+h})\big) = \frac{1}{n}\sum_{|d| \leq q} \sigma(d,h). \qquad (16)$$

Second, recalling (4), we have $\hat\rho(h) = \hat\gamma(h)/\hat\gamma(0)$. By Theorem 3 of Wu and Min (2005),

$$\begin{pmatrix} \hat\gamma(0) \\ \hat\gamma(h) \end{pmatrix} \xrightarrow{dist} N\Big(\begin{pmatrix} \gamma(0) \\ \gamma(h) \end{pmatrix},\ n^{-1} V\Big). \qquad (17)$$

By the Delta method (see Casella and Berger, 2002), we have, for $h > q$,

$$Var(\hat\rho(h)) = Var\Big(\frac{\hat\gamma(h)}{\hat\gamma(0)}\Big) \approx \Big(-\frac{\gamma(h)}{\gamma^2(0)},\ \frac{1}{\gamma(0)}\Big)\, [n^{-1}V]\, \Big(-\frac{\gamma(h)}{\gamma^2(0)},\ \frac{1}{\gamma(0)}\Big)'$$
$$\overset{\gamma(h)=0}{=} \frac{n^{-1} V_{22}}{\gamma^2(0)} = \frac{Var(\hat\gamma(h))}{\gamma^2(0)} \overset{(16)}{\approx} \frac{1}{n\gamma^2(0)}\big(\sigma(0,h) + 2\sigma(1,h) + \cdots + 2\sigma(q,h)\big).$$

Proof of Theorem 2: For any given $k \in (p, p+s]$, we adopt the notation $X_k \in \mathbb{R}^{n \times k}$ and $\varepsilon \in \mathbb{R}^{n \times 1}$:

$$X_k = \begin{pmatrix} x_0 & x_{-1} & \cdots & x_{1-k} \\ x_1 & x_0 & \cdots & x_{2-k} \\ \vdots & \vdots & & \vdots \\ x_{n-1} & x_{n-2} & \cdots & x_{n-k} \end{pmatrix}, \qquad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix},$$

where $\{x_t : t = 1, \ldots, n\}$ is the observed sequence and, in the definition of $X_k$, we set $x_l = 0$ for $l \leq 0$.

We first argue the asymptotic joint normality of $(X_{p+s}'\varepsilon)/\sqrt{n}$. Indeed, under conditions C.1 and C.2, we can apply Theorem 3 of Wu and Min (2005) to the new sequences $\{x_{t-j} + \varepsilon_{t+1}\}$ and $\{x_{t-j} - \varepsilon_{t+1}\}$ for any given j. Noticing that $x_{t-j}\varepsilon_{t+1} = [(x_{t-j} + \varepsilon_{t+1})^2 - (x_{t-j} - \varepsilon_{t+1})^2]/4$, we have $\sum_{t=1}^{\infty} \|\mathcal{P}_1(x_{t-j}\varepsilon_{t+1})\| < \infty$, which yields the asymptotic normality of $\sum_{t=1}^{n} x_{t-j}\varepsilon_{t+1}/\sqrt{n}$. Through the Cramér-Wold device, we obtain the asymptotic joint normality of $\{\sum_{t=1}^{n} x_{t-j}\varepsilon_{t+1}/\sqrt{n} : j = 1, \ldots, p+s\}$, i.e., of $(X_{p+s}'\varepsilon)/\sqrt{n}$. In particular, the mean vector of the asymptotic joint normal distribution is zero, since $E(x_{t-j}\varepsilon_{t+1}) = 0$. On the other hand, the lag-k sample PACF $\hat\phi_{k,k}$ is obtained from the OLS regression $x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + \cdots + \phi_k x_{t-k}$, where the true values of $\phi_{p+1}, \ldots, \phi_k$ are all zero. For each $k \in (p, p+s]$, $\sqrt{n}\,\hat\phi_{k,k} = [(X_k'X_k/n)^{-1}]_{(k\text{th row})}\,(X_k'\varepsilon)/\sqrt{n}$ is a linear transformation of the vector $X_{p+s}'\varepsilon/\sqrt{n}$. Hence the vector $\sqrt{n}(\hat\phi_{p+1,p+1}, \ldots, \hat\phi_{p+s,p+s})$ also approaches a linear transformation of $X_{p+s}'\varepsilon/\sqrt{n}$. Since $(X_{p+s}'\varepsilon)/\sqrt{n}$ is asymptotically jointly normal with zero mean, it follows that

$$\sqrt{n}(\hat\phi_{p+1,p+1}, \ldots, \hat\phi_{p+s,p+s})' \xrightarrow{dist} N(0, \Xi_{s \times s}).$$

This completes the proof of the first part of Theorem 2. To show (11), for simplicity of notation let $k = p + l$ hereafter; $\Xi_{(l,l)}$ is the asymptotic variance of the lag-k sample PACF $\hat\phi_{k,k}$. We first note that, by Theorem 3 of Wu and Min (2005), $(X_k'\varepsilon)/\sqrt{n}$ has an asymptotic normal distribution with covariance matrix $\Omega_k = E(\xi\xi')$, where

$$\xi = \sum_{t=-\infty}^{\infty} \big(\mathcal{P}_1(\varepsilon_t x_{t-1}), \ldots, \mathcal{P}_1(\varepsilon_t x_{t-k})\big)' = (\varepsilon_1 x_0, \varepsilon_1 x_{-1}, \ldots, \varepsilon_1 x_{1-k})'.$$

Since $\sqrt{n}\,\hat\phi_{k,k} = [(X_k'X_k/n)^{-1}]_{(k\text{th row})}\,(X_k'\varepsilon)/\sqrt{n}$, we have $\Xi_{(l,l)} = [\Gamma_k^{-1}\Omega_k\Gamma_k^{-1}]_{(k,k)}$, the (k, k) entry of the matrix $\Gamma_k^{-1}\Omega_k\Gamma_k^{-1}$, where $\Gamma_k := [\gamma(i-j)]_{1 \leq i,j \leq k}$. We will show that $\Omega_k \geq \sigma^2\Gamma_k$, i.e., that $\Omega_k - \sigma^2\Gamma_k$ is positive semidefinite. To this end, consider any non-random vector $C = (c_0, c_1, \ldots, c_{1-k})' \in \mathbb{R}^k$ with positive $L_2$ norm:

$$C'\Omega_k C = C'\,Cov\big((\varepsilon_1 x_0, \varepsilon_1 x_{-1}, \ldots, \varepsilon_1 x_{1-k})'\big)\,C = E[\varepsilon_1^2 (c_0 x_0 + c_1 x_{-1} + \cdots + c_{1-k} x_{1-k})^2]$$
$$= E\Big[\varepsilon_1^2\Big(\sum_{i=0}^{\infty} \kappa_i \varepsilon_{-i}\Big)^2\Big] \quad \text{(MA($\infty$) representation of $\{x_t\}$)}$$
$$= \sum_{i,j=0}^{\infty} \kappa_i \kappa_j E(\varepsilon_1^2 \varepsilon_{-i}\varepsilon_{-j}) = \sum_{i=0}^{\infty} \kappa_i^2 E(\varepsilon_1^2 \varepsilon_{-i}^2) \quad \text{($i \neq j$ terms vanish by assumption)}$$
$$\geq \sum_{i=0}^{\infty} \kappa_i^2 E(\varepsilon_1^2) E(\varepsilon_{-i}^2) \quad \text{(ACF of $\{\varepsilon_t^2\}$ nonnegative by assumption)}$$
$$= \sigma^2 \sum_{i=0}^{\infty} \kappa_i^2 E(\varepsilon_{-i}^2) = \sigma^2 E[(c_0 x_0 + c_1 x_{-1} + \cdots + c_{1-k} x_{1-k})^2] = \sigma^2 C'\Gamma_k C > 0.$$

So $\Gamma_k^{-1}\Omega_k\Gamma_k^{-1} \geq \Gamma_k^{-1}\sigma^2\Gamma_k\Gamma_k^{-1} = \sigma^2\Gamma_k^{-1}$. It is known that for an AR(p) process the (k, k) entry of $\Gamma_k^{-1}$ is $1/\sigma^2$ for $k > p$. The result follows in view of $[\Gamma_k^{-1}\Omega_k\Gamma_k^{-1}]_{(k,k)} \geq \sigma^2[\Gamma_k^{-1}]_{(k,k)} = 1$. If all ACFs are positive, then the inequality in the above derivation holds strictly, since $\Omega_k > \sigma^2\Gamma_k$.

Let $S_n = \sum_{t=1}^{n} \varepsilon_t$; then $\|S_n\|_2^2 = E(\sum_{t=1}^{n} \varepsilon_t)^2 = E(\sum_{t=1}^{n} \varepsilon_t^2)$. Denote $V_n^2 := \|S_n\|_2^2$. For $k = 0, \ldots, n$, set

$$W_n(t) = \begin{cases} S_k/V_n, & t = k/n; \\ S_k/V_n + (nt-k)\,\varepsilon_{k+1}/V_n, & k/n \leq t \leq (k+1)/n. \end{cases}$$

Lemma 3 For a time series $\{X_t\}$ following (12), let $\theta_i$ be the lag-i coefficient of the MA($\infty$) representation of $\{X_t\}$, $\Theta_n = \sum_{i=0}^{n} \theta_i$ and $B_n^2 = \sum_{i=0}^{n-1} \Theta_i^2$. Assume that the error series $\{\varepsilon_t\}$ satisfies condition C.3, that $\sum_{i=1}^{\infty} (\Theta_{n+i} - \Theta_i)^2 = o(B_n^2)$ and that $B_n \to \infty$. Then $W_n \xrightarrow{dist} W$.

Remark 6 The condition $\sum_{i=1}^{\infty} (\Theta_{n+i} - \Theta_i)^2 = o(B_n^2)$ concerns the coefficients of the mean equation and is automatically satisfied for stationary and invertible time series.

Proof of Lemma 3: See Theorem 1 of Wu and Min (2005).

Proof of Theorem 3: Recall that $S_n = \sum_{t=1}^{n} \varepsilon_t$ and $V_n^2 = \|S_n\|_2^2$. Also recall that $\hat\rho_n$, the least squares estimator of $\rho$, has the form

$$\hat\rho_n = \frac{\sum_{t=p+1}^{n} X_{t-1}\big(X_t - \sum_{k=1}^{p-1} \hat a_k \Delta X_{t-k}\big)}{\sum_{t=p+1}^{n} X_{t-1}^2}.$$

Under the null hypothesis $\rho = 1$, since no MA part is involved in $X_t$, the consistency $\hat a_k \to a_k$ from standard multiple regression theory immediately gives

$$T_n = n(\hat\rho_n - 1) = n\,\frac{\sum_{t=p+1}^{n} X_{t-1}\big(X_t - X_{t-1} - \sum_{k=1}^{p-1} \hat a_k \Delta X_{t-k}\big)}{\sum_{t=p+1}^{n} X_{t-1}^2} = \frac{\frac{1}{n}\sum_{t=p+1}^{n} X_{t-1}\varepsilon_t + o_p(1)}{\frac{1}{n^2}\sum_{t=p+1}^{n} X_{t-1}^2}.$$

Denote $c = (1 - \sum_{k=1}^{p-1} a_k)^{-1}$, and use the same polynomial decomposition given by the corresponding theorem of Fuller (1995). Although the conditions on the innovations are different, they do not affect this particular part of the argument. Using the same arguments, which still hold under our assumptions, we obtain

$$\frac{1}{n^2}\Big[\sum_{t=p+1}^{n} X_{t-1}^2 - c^2 \sum_{t=p+1}^{n} S_{t-1}^2\Big] = o_p(1); \qquad \frac{1}{n}\Big[\sum_{t=p+1}^{n} X_{t-1}\varepsilon_t - c \sum_{t=p+1}^{n} S_{t-1}\varepsilon_t\Big] = o_p(1).$$


More information

Arma-Arch Modeling Of The Returns Of First Bank Of Nigeria

Arma-Arch Modeling Of The Returns Of First Bank Of Nigeria Arma-Arch Modeling Of The Returns Of First Bank Of Nigeria Emmanuel Alphonsus Akpan Imoh Udo Moffat Department of Mathematics and Statistics University of Uyo, Nigeria Ntiedo Bassey Ekpo Department of

More information

Multivariate Time Series: VAR(p) Processes and Models

Multivariate Time Series: VAR(p) Processes and Models Multivariate Time Series: VAR(p) Processes and Models A VAR(p) model, for p > 0 is X t = φ 0 + Φ 1 X t 1 + + Φ p X t p + A t, where X t, φ 0, and X t i are k-vectors, Φ 1,..., Φ p are k k matrices, with

More information

3. ARMA Modeling. Now: Important class of stationary processes

3. ARMA Modeling. Now: Important class of stationary processes 3. ARMA Modeling Now: Important class of stationary processes Definition 3.1: (ARMA(p, q) process) Let {ɛ t } t Z WN(0, σ 2 ) be a white noise process. The process {X t } t Z is called AutoRegressive-Moving-Average

More information

TIME SERIES ANALYSIS AND FORECASTING USING THE STATISTICAL MODEL ARIMA

TIME SERIES ANALYSIS AND FORECASTING USING THE STATISTICAL MODEL ARIMA CHAPTER 6 TIME SERIES ANALYSIS AND FORECASTING USING THE STATISTICAL MODEL ARIMA 6.1. Introduction A time series is a sequence of observations ordered in time. A basic assumption in the time series analysis

More information

Parameter estimation: ACVF of AR processes

Parameter estimation: ACVF of AR processes Parameter estimation: ACVF of AR processes Yule-Walker s for AR processes: a method of moments, i.e. µ = x and choose parameters so that γ(h) = ˆγ(h) (for h small ). 12 novembre 2013 1 / 8 Parameter estimation:

More information

Lesson 13: Box-Jenkins Modeling Strategy for building ARMA models

Lesson 13: Box-Jenkins Modeling Strategy for building ARMA models Lesson 13: Box-Jenkins Modeling Strategy for building ARMA models Facoltà di Economia Università dell Aquila umberto.triacca@gmail.com Introduction In this lesson we present a method to construct an ARMA(p,

More information

The autocorrelation and autocovariance functions - helpful tools in the modelling problem

The autocorrelation and autocovariance functions - helpful tools in the modelling problem The autocorrelation and autocovariance functions - helpful tools in the modelling problem J. Nowicka-Zagrajek A. Wy lomańska Institute of Mathematics and Computer Science Wroc law University of Technology,

More information

STAT STOCHASTIC PROCESSES. Contents

STAT STOCHASTIC PROCESSES. Contents STAT 3911 - STOCHASTIC PROCESSES ANDREW TULLOCH Contents 1. Stochastic Processes 2 2. Classification of states 2 3. Limit theorems for Markov chains 4 4. First step analysis 5 5. Branching processes 5

More information

Some Time-Series Models

Some Time-Series Models Some Time-Series Models Outline 1. Stochastic processes and their properties 2. Stationary processes 3. Some properties of the autocorrelation function 4. Some useful models Purely random processes, random

More information

Time Series Analysis -- An Introduction -- AMS 586

Time Series Analysis -- An Introduction -- AMS 586 Time Series Analysis -- An Introduction -- AMS 586 1 Objectives of time series analysis Data description Data interpretation Modeling Control Prediction & Forecasting 2 Time-Series Data Numerical data

More information

Stat 248 Lab 2: Stationarity, More EDA, Basic TS Models

Stat 248 Lab 2: Stationarity, More EDA, Basic TS Models Stat 248 Lab 2: Stationarity, More EDA, Basic TS Models Tessa L. Childers-Day February 8, 2013 1 Introduction Today s section will deal with topics such as: the mean function, the auto- and cross-covariance

More information

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED by W. Robert Reed Department of Economics and Finance University of Canterbury, New Zealand Email: bob.reed@canterbury.ac.nz

More information

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley Time Series Models and Inference James L. Powell Department of Economics University of California, Berkeley Overview In contrast to the classical linear regression model, in which the components of the

More information

Introduction to Stochastic processes

Introduction to Stochastic processes Università di Pavia Introduction to Stochastic processes Eduardo Rossi Stochastic Process Stochastic Process: A stochastic process is an ordered sequence of random variables defined on a probability space

More information

Time Series Outlier Detection

Time Series Outlier Detection Time Series Outlier Detection Tingyi Zhu July 28, 2016 Tingyi Zhu Time Series Outlier Detection July 28, 2016 1 / 42 Outline Time Series Basics Outliers Detection in Single Time Series Outlier Series Detection

More information

Stochastic Modelling Solutions to Exercises on Time Series

Stochastic Modelling Solutions to Exercises on Time Series Stochastic Modelling Solutions to Exercises on Time Series Dr. Iqbal Owadally March 3, 2003 Solutions to Elementary Problems Q1. (i) (1 0.5B)X t = Z t. The characteristic equation 1 0.5z = 0 does not have

More information

13. Time Series Analysis: Asymptotics Weakly Dependent and Random Walk Process. Strict Exogeneity

13. Time Series Analysis: Asymptotics Weakly Dependent and Random Walk Process. Strict Exogeneity Outline: Further Issues in Using OLS with Time Series Data 13. Time Series Analysis: Asymptotics Weakly Dependent and Random Walk Process I. Stationary and Weakly Dependent Time Series III. Highly Persistent

More information

Introduction to Time Series Analysis. Lecture 11.

Introduction to Time Series Analysis. Lecture 11. Introduction to Time Series Analysis. Lecture 11. Peter Bartlett 1. Review: Time series modelling and forecasting 2. Parameter estimation 3. Maximum likelihood estimator 4. Yule-Walker estimation 5. Yule-Walker

More information

Econometría 2: Análisis de series de Tiempo

Econometría 2: Análisis de series de Tiempo Econometría 2: Análisis de series de Tiempo Karoll GOMEZ kgomezp@unal.edu.co http://karollgomez.wordpress.com Segundo semestre 2016 IX. Vector Time Series Models VARMA Models A. 1. Motivation: The vector

More information

{ } Stochastic processes. Models for time series. Specification of a process. Specification of a process. , X t3. ,...X tn }

{ } Stochastic processes. Models for time series. Specification of a process. Specification of a process. , X t3. ,...X tn } Stochastic processes Time series are an example of a stochastic or random process Models for time series A stochastic process is 'a statistical phenomenon that evolves in time according to probabilistic

More information

Time Series 4. Robert Almgren. Oct. 5, 2009

Time Series 4. Robert Almgren. Oct. 5, 2009 Time Series 4 Robert Almgren Oct. 5, 2009 1 Nonstationarity How should you model a process that has drift? ARMA models are intrinsically stationary, that is, they are mean-reverting: when the value of

More information

7. Integrated Processes

7. Integrated Processes 7. Integrated Processes Up to now: Analysis of stationary processes (stationary ARMA(p, q) processes) Problem: Many economic time series exhibit non-stationary patterns over time 226 Example: We consider

More information

Basics: Definitions and Notation. Stationarity. A More Formal Definition

Basics: Definitions and Notation. Stationarity. A More Formal Definition Basics: Definitions and Notation A Univariate is a sequence of measurements of the same variable collected over (usually regular intervals of) time. Usual assumption in many time series techniques is that

More information

1 Time Series Concepts and Challenges

1 Time Series Concepts and Challenges Forecasting Time Series Data Notes from Rebecca Sela Stern Business School Spring, 2004 1 Time Series Concepts and Challenges The linear regression model (and most other models) assume that observations

More information

Multivariate Time Series Analysis and Its Applications [Tsay (2005), chapter 8]

Multivariate Time Series Analysis and Its Applications [Tsay (2005), chapter 8] 1 Multivariate Time Series Analysis and Its Applications [Tsay (2005), chapter 8] Insights: Price movements in one market can spread easily and instantly to another market [economic globalization and internet

More information

Econ 623 Econometrics II Topic 2: Stationary Time Series

Econ 623 Econometrics II Topic 2: Stationary Time Series 1 Introduction Econ 623 Econometrics II Topic 2: Stationary Time Series In the regression model we can model the error term as an autoregression AR(1) process. That is, we can use the past value of the

More information

FORECASTING SUGARCANE PRODUCTION IN INDIA WITH ARIMA MODEL

FORECASTING SUGARCANE PRODUCTION IN INDIA WITH ARIMA MODEL FORECASTING SUGARCANE PRODUCTION IN INDIA WITH ARIMA MODEL B. N. MANDAL Abstract: Yearly sugarcane production data for the period of - to - of India were analyzed by time-series methods. Autocorrelation

More information

STOR 356: Summary Course Notes

STOR 356: Summary Course Notes STOR 356: Summary Course Notes Richard L. Smith Department of Statistics and Operations Research University of North Carolina Chapel Hill, NC 7599-360 rls@email.unc.edu February 19, 008 Course text: Introduction

More information

University of Oxford. Statistical Methods Autocorrelation. Identification and Estimation

University of Oxford. Statistical Methods Autocorrelation. Identification and Estimation University of Oxford Statistical Methods Autocorrelation Identification and Estimation Dr. Órlaith Burke Michaelmas Term, 2011 Department of Statistics, 1 South Parks Road, Oxford OX1 3TG Contents 1 Model

More information

Oil price volatility in the Philippines using generalized autoregressive conditional heteroscedasticity

Oil price volatility in the Philippines using generalized autoregressive conditional heteroscedasticity Oil price volatility in the Philippines using generalized autoregressive conditional heteroscedasticity Carl Ceasar F. Talungon University of Southern Mindanao, Cotabato Province, Philippines Email: carlceasar04@gmail.com

More information

7. Integrated Processes

7. Integrated Processes 7. Integrated Processes Up to now: Analysis of stationary processes (stationary ARMA(p, q) processes) Problem: Many economic time series exhibit non-stationary patterns over time 226 Example: We consider

More information

Lecture 1: Fundamental concepts in Time Series Analysis (part 2)

Lecture 1: Fundamental concepts in Time Series Analysis (part 2) Lecture 1: Fundamental concepts in Time Series Analysis (part 2) Florian Pelgrin University of Lausanne, École des HEC Department of mathematics (IMEA-Nice) Sept. 2011 - Jan. 2012 Florian Pelgrin (HEC)

More information

EASTERN MEDITERRANEAN UNIVERSITY ECON 604, FALL 2007 DEPARTMENT OF ECONOMICS MEHMET BALCILAR ARIMA MODELS: IDENTIFICATION

EASTERN MEDITERRANEAN UNIVERSITY ECON 604, FALL 2007 DEPARTMENT OF ECONOMICS MEHMET BALCILAR ARIMA MODELS: IDENTIFICATION ARIMA MODELS: IDENTIFICATION A. Autocorrelations and Partial Autocorrelations 1. Summary of What We Know So Far: a) Series y t is to be modeled by Box-Jenkins methods. The first step was to convert y t

More information

Vector autoregressions, VAR

Vector autoregressions, VAR 1 / 45 Vector autoregressions, VAR Chapter 2 Financial Econometrics Michael Hauser WS17/18 2 / 45 Content Cross-correlations VAR model in standard/reduced form Properties of VAR(1), VAR(p) Structural VAR,

More information

Econometrics I: Univariate Time Series Econometrics (1)

Econometrics I: Univariate Time Series Econometrics (1) Econometrics I: Dipartimento di Economia Politica e Metodi Quantitativi University of Pavia Overview of the Lecture 1 st EViews Session VI: Some Theoretical Premises 2 Overview of the Lecture 1 st EViews

More information

Contents. 1 Time Series Analysis Introduction Stationary Processes State Space Modesl Stationary Processes 8

Contents. 1 Time Series Analysis Introduction Stationary Processes State Space Modesl Stationary Processes 8 A N D R E W T U L L O C H T I M E S E R I E S A N D M O N T E C A R L O I N F E R E N C E T R I N I T Y C O L L E G E T H E U N I V E R S I T Y O F C A M B R I D G E Contents 1 Time Series Analysis 5

More information

7. Forecasting with ARIMA models

7. Forecasting with ARIMA models 7. Forecasting with ARIMA models 309 Outline: Introduction The prediction equation of an ARIMA model Interpreting the predictions Variance of the predictions Forecast updating Measuring predictability

More information

Note: The primary reference for these notes is Enders (2004). An alternative and more technical treatment can be found in Hamilton (1994).

Note: The primary reference for these notes is Enders (2004). An alternative and more technical treatment can be found in Hamilton (1994). Chapter 4 Analysis of a Single Time Series Note: The primary reference for these notes is Enders (4). An alternative and more technical treatment can be found in Hamilton (994). Most data used in financial

More information

Introduction to ARMA and GARCH processes

Introduction to ARMA and GARCH processes Introduction to ARMA and GARCH processes Fulvio Corsi SNS Pisa 3 March 2010 Fulvio Corsi Introduction to ARMA () and GARCH processes SNS Pisa 3 March 2010 1 / 24 Stationarity Strict stationarity: (X 1,

More information

A SARIMAX coupled modelling applied to individual load curves intraday forecasting

A SARIMAX coupled modelling applied to individual load curves intraday forecasting A SARIMAX coupled modelling applied to individual load curves intraday forecasting Frédéric Proïa Workshop EDF Institut Henri Poincaré - Paris 05 avril 2012 INRIA Bordeaux Sud-Ouest Institut de Mathématiques

More information

Heteroskedasticity; Step Changes; VARMA models; Likelihood ratio test statistic; Cusum statistic.

Heteroskedasticity; Step Changes; VARMA models; Likelihood ratio test statistic; Cusum statistic. 47 3!,57 Statistics and Econometrics Series 5 Febrary 24 Departamento de Estadística y Econometría Universidad Carlos III de Madrid Calle Madrid, 126 2893 Getafe (Spain) Fax (34) 91 624-98-49 VARIANCE

More information

Stochastic Processes

Stochastic Processes Stochastic Processes Stochastic Process Non Formal Definition: Non formal: A stochastic process (random process) is the opposite of a deterministic process such as one defined by a differential equation.

More information

Forecasting using R. Rob J Hyndman. 2.4 Non-seasonal ARIMA models. Forecasting using R 1

Forecasting using R. Rob J Hyndman. 2.4 Non-seasonal ARIMA models. Forecasting using R 1 Forecasting using R Rob J Hyndman 2.4 Non-seasonal ARIMA models Forecasting using R 1 Outline 1 Autoregressive models 2 Moving average models 3 Non-seasonal ARIMA models 4 Partial autocorrelations 5 Estimation

More information

STAT 443 (Winter ) Forecasting

STAT 443 (Winter ) Forecasting Winter 2014 TABLE OF CONTENTS STAT 443 (Winter 2014-1141) Forecasting Prof R Ramezan University of Waterloo L A TEXer: W KONG http://wwkonggithubio Last Revision: September 3, 2014 Table of Contents 1

More information

Unit Root and Cointegration

Unit Root and Cointegration Unit Root and Cointegration Carlos Hurtado Department of Economics University of Illinois at Urbana-Champaign hrtdmrt@illinois.edu Oct 7th, 016 C. Hurtado (UIUC - Economics) Applied Econometrics On the

More information

Lecture 3: Autoregressive Moving Average (ARMA) Models and their Practical Applications

Lecture 3: Autoregressive Moving Average (ARMA) Models and their Practical Applications Lecture 3: Autoregressive Moving Average (ARMA) Models and their Practical Applications Prof. Massimo Guidolin 20192 Financial Econometrics Winter/Spring 2018 Overview Moving average processes Autoregressive

More information

1 Linear Difference Equations

1 Linear Difference Equations ARMA Handout Jialin Yu 1 Linear Difference Equations First order systems Let {ε t } t=1 denote an input sequence and {y t} t=1 sequence generated by denote an output y t = φy t 1 + ε t t = 1, 2,... with

More information

Part 1. Multiple Choice (50 questions, 1 point each) Part 2. Problems/Short Answer (10 questions, 5 points each)

Part 1. Multiple Choice (50 questions, 1 point each) Part 2. Problems/Short Answer (10 questions, 5 points each) GROUND RULES: This exam contains two parts: Part 1. Multiple Choice (50 questions, 1 point each) Part 2. Problems/Short Answer (10 questions, 5 points each) The maximum number of points on this exam is

More information

Part II. Time Series

Part II. Time Series Part II Time Series 12 Introduction This Part is mainly a summary of the book of Brockwell and Davis (2002). Additionally the textbook Shumway and Stoffer (2010) can be recommended. 1 Our purpose is to

More information

γ 0 = Var(X i ) = Var(φ 1 X i 1 +W i ) = φ 2 1γ 0 +σ 2, which implies that we must have φ 1 < 1, and γ 0 = σ2 . 1 φ 2 1 We may also calculate for j 1

γ 0 = Var(X i ) = Var(φ 1 X i 1 +W i ) = φ 2 1γ 0 +σ 2, which implies that we must have φ 1 < 1, and γ 0 = σ2 . 1 φ 2 1 We may also calculate for j 1 4.2 Autoregressive (AR) Moving average models are causal linear processes by definition. There is another class of models, based on a recursive formulation similar to the exponentially weighted moving

More information

ECON 616: Lecture 1: Time Series Basics

ECON 616: Lecture 1: Time Series Basics ECON 616: Lecture 1: Time Series Basics ED HERBST August 30, 2017 References Overview: Chapters 1-3 from Hamilton (1994). Technical Details: Chapters 2-3 from Brockwell and Davis (1987). Intuition: Chapters

More information

Univariate ARIMA Models

Univariate ARIMA Models Univariate ARIMA Models ARIMA Model Building Steps: Identification: Using graphs, statistics, ACFs and PACFs, transformations, etc. to achieve stationary and tentatively identify patterns and model components.

More information

10. Time series regression and forecasting

10. Time series regression and forecasting 10. Time series regression and forecasting Key feature of this section: Analysis of data on a single entity observed at multiple points in time (time series data) Typical research questions: What is the

More information

Problem Set 2: Box-Jenkins methodology

Problem Set 2: Box-Jenkins methodology Problem Set : Box-Jenkins methodology 1) For an AR1) process we have: γ0) = σ ε 1 φ σ ε γ0) = 1 φ Hence, For a MA1) process, p lim R = φ γ0) = 1 + θ )σ ε σ ε 1 = γ0) 1 + θ Therefore, p lim R = 1 1 1 +

More information

Model selection using penalty function criteria

Model selection using penalty function criteria Model selection using penalty function criteria Laimonis Kavalieris University of Otago Dunedin, New Zealand Econometrics, Time Series Analysis, and Systems Theory Wien, June 18 20 Outline Classes of models.

More information

Covariances of ARMA Processes

Covariances of ARMA Processes Statistics 910, #10 1 Overview Covariances of ARMA Processes 1. Review ARMA models: causality and invertibility 2. AR covariance functions 3. MA and ARMA covariance functions 4. Partial autocorrelation

More information

NANYANG TECHNOLOGICAL UNIVERSITY SEMESTER II EXAMINATION MAS451/MTH451 Time Series Analysis TIME ALLOWED: 2 HOURS

NANYANG TECHNOLOGICAL UNIVERSITY SEMESTER II EXAMINATION MAS451/MTH451 Time Series Analysis TIME ALLOWED: 2 HOURS NANYANG TECHNOLOGICAL UNIVERSITY SEMESTER II EXAMINATION 2012-2013 MAS451/MTH451 Time Series Analysis May 2013 TIME ALLOWED: 2 HOURS INSTRUCTIONS TO CANDIDATES 1. This examination paper contains FOUR (4)

More information

Estimation and application of best ARIMA model for forecasting the uranium price.

Estimation and application of best ARIMA model for forecasting the uranium price. Estimation and application of best ARIMA model for forecasting the uranium price. Medeu Amangeldi May 13, 2018 Capstone Project Superviser: Dongming Wei Second reader: Zhenisbek Assylbekov Abstract This

More information

Part III Example Sheet 1 - Solutions YC/Lent 2015 Comments and corrections should be ed to

Part III Example Sheet 1 - Solutions YC/Lent 2015 Comments and corrections should be  ed to TIME SERIES Part III Example Sheet 1 - Solutions YC/Lent 2015 Comments and corrections should be emailed to Y.Chen@statslab.cam.ac.uk. 1. Let {X t } be a weakly stationary process with mean zero and let

More information

Non-Stationary Time Series and Unit Root Testing

Non-Stationary Time Series and Unit Root Testing Econometrics II Non-Stationary Time Series and Unit Root Testing Morten Nyboe Tabor Course Outline: Non-Stationary Time Series and Unit Root Testing 1 Stationarity and Deviation from Stationarity Trend-Stationarity

More information

TIME SERIES AND FORECASTING. Luca Gambetti UAB, Barcelona GSE Master in Macroeconomic Policy and Financial Markets

TIME SERIES AND FORECASTING. Luca Gambetti UAB, Barcelona GSE Master in Macroeconomic Policy and Financial Markets TIME SERIES AND FORECASTING Luca Gambetti UAB, Barcelona GSE 2014-2015 Master in Macroeconomic Policy and Financial Markets 1 Contacts Prof.: Luca Gambetti Office: B3-1130 Edifici B Office hours: email:

More information

Ch. 14 Stationary ARMA Process

Ch. 14 Stationary ARMA Process Ch. 14 Stationary ARMA Process A general linear stochastic model is described that suppose a time series to be generated by a linear aggregation of random shock. For practical representation it is desirable

More information

Lecture 3: Autoregressive Moving Average (ARMA) Models and their Practical Applications

Lecture 3: Autoregressive Moving Average (ARMA) Models and their Practical Applications Lecture 3: Autoregressive Moving Average (ARMA) Models and their Practical Applications Prof. Massimo Guidolin 20192 Financial Econometrics Winter/Spring 2018 Overview Moving average processes Autoregressive

More information

Ch. 15 Forecasting. 1.1 Forecasts Based on Conditional Expectations

Ch. 15 Forecasting. 1.1 Forecasts Based on Conditional Expectations Ch 15 Forecasting Having considered in Chapter 14 some of the properties of ARMA models, we now show how they may be used to forecast future values of an observed time series For the present we proceed

More information