
Supplementary Material for "Structural-Factor Modeling of High-Dimensional Time Series: Another Look at Approximate Factor Models with Diverging Eigenvalues"

Zhaoxing Gao and Ruey S. Tsay
Booth School of Business, University of Chicago

August 23, 2018

This online supplement consists of two parts. The first part analyzes another real example to further illustrate the usefulness of the proposed method, and the second part presents the technical proofs of the theorems in the main article.

1 An Additional Real Example

In this section, we apply the proposed method to another real example to illustrate further its usefulness in practice. In this example, the dimension is greater than the sample size.

Example 4. Consider the half-hourly temperature data observed at the Adelaide Airport in Adelaide, Australia, from Sunday to Saturday of each week between July 6, 1997 and March 31, 2007. The data are available in the R package fds; see also Shang and Hyndman (2011) for details. There are 48 observations per day, 7 days a week, and 508 weeks. From the original data we stack the observations of each week together, from Sunday to Saturday, and treat the temperature data of each week as a time series. Consequently, we have 508 time series, each with sample size 336. The left panel of Figure A1 shows the time plots of the 48 observations of the 508 Mondays, from which we see that, as expected, the temperature series exhibit a certain diurnal pattern: temperatures are lower at night and higher in the afternoon. To remove such a diurnal pattern and any possible trend, we take the first difference $y_t = \tilde{y}_t - \tilde{y}_{t-1}$, where

$\tilde{y}_t \in \mathbb{R}^{508}$ and $t = 1, \ldots, 336$. The right panel of Figure A1 shows the time plots of $y_t$, which appear to be weakly stationary with some volatility clustering. Thus, in this example, we employ $y_t$ with $n = 335$ and $p = 508$. We apply the white noise test $T(m)$ to the data with $k_0 = 5$ in Equation (2.9), $m = 10$, and the 97.5%-quantile 3.68 of the Gumbel distribution, and obtain an estimated number of factors $\hat{r} = 6$. We also calculate the eigenvalues of $\hat{S}$. Figures A2(a) and (b) show the first 10 eigenvalues of the sample covariance matrix of $y_t$ and of $\hat{S}$, respectively. We see clearly that the largest eigenvalue of $\hat{S}$ is larger than that of the covariance matrix of the data, which again supports the assumption that the largest eigenvalues of the noise covariance matrix diverge for large $p$. Figure A2(c) plots the ratios of the eigenvalues of $\hat{S}$ and shows that the largest gap between the eigenvalues occurs between $\hat{\mu}_1$ and $\hat{\mu}_2$. But following the proposed procedure, we choose $K = \min\{p, n, 10\} = 10$. The spectral densities of the 6 estimated factors are given in Figure A3, and they hardly change if we vary $K$ from 1 to 10. From the spectral densities, we see that the estimated factors are all non-trivial processes and that the variance of the first factor is extremely large compared to those of the others. On the other hand, we calculate the eigenvalues of the covariance matrices of the estimated factors $\hat{x}_t$ and of the data $y_t$; the largest 6 eigenvalues are (72.88, 17.37, 11.17, 8.69, 7.68, 4.56) and (73.46, 19.13, 11.85, 11.11, 10.89, 9.27), respectively. We see that the eigenvalues of the covariance matrix of the factors estimated via the proposed method are almost at the same levels as the largest 6 eigenvalues of the covariance matrix of the data. This is reasonable and different from the result of principal component analysis because the proposed method can mitigate some effects of the idiosyncratic noises. We also apply the methods of BN and LYB to $y_t$. The estimated numbers of factors are $\hat{r} = 6$ and $\hat{r} = 1$, respectively, by the principal component analysis and the ratio-based method. The spectral densities of the first 6 principal component factors are shown in Figure A4, from which we see that they are different from those in Figure A3. Again, this is understandable because the principal component analysis only employs the sample covariance matrix. Since the eigenvalues of the covariance matrix of $\hat{A}_1\hat{x}_t$ obtained by principal component analysis are the same as the first 6 eigenvalues of $\hat{\Sigma}_y$, they are likely influenced by the noise components, complicating the interpretation of the common factors. On the other hand, the eigenvalues of the covariance matrix of the factors estimated via the proposed method are not only close to the first 6 eigenvalues of $\hat{\Sigma}_y$, but are also less affected by the noises, some effects of which are mitigated by the projected PCA.
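To make the construction above concrete, the following R sketch reproduces the data preparation and the eigen-analysis in a simplified form. It assumes that the fds data objects sundaytempairport, ..., saturdaytempairport each store the 48 half-hourly temperatures of the 508 weeks in their $y component (these object names and dimensions are assumptions to be checked on your installation), and it uses the eigenvectors of the lag-covariance matrix directly; it is an illustration of the quantities involved, not the exact estimation procedure of the main article.

```r
# Sketch of the data construction and eigen-analysis described above.
# Assumptions (not from the original text): the fds data objects
# sundaytempairport, ..., saturdaytempairport each hold a 48 x 508 matrix
# of half-hourly temperatures in their $y component; check with dim().
library(fds)

days <- list(sundaytempairport, mondaytempairport, tuesdaytempairport,
             wednesdaytempairport, thursdaytempairport, fridaytempairport,
             saturdaytempairport)

# Stack Sunday-to-Saturday blocks into weekly curves: 336 x 508,
# rows = within-week half-hours (time), columns = weeks (series).
ytilde <- do.call(rbind, lapply(days, function(d) d$y))
y <- diff(ytilde)                 # first difference: n = 335, p = 508

# Lag-k sample autocovariance matrix of y_t.
lag_cov <- function(y, k) {
  n  <- nrow(y)
  yc <- scale(y, center = TRUE, scale = FALSE)
  crossprod(yc[(k + 1):n, , drop = FALSE], yc[1:(n - k), , drop = FALSE]) / n
}

k0    <- 5
M_hat <- Reduce(`+`, lapply(1:k0, function(k) {
  S_k <- lag_cov(y, k)
  S_k %*% t(S_k)
}))

r_hat  <- 6                                   # from the white noise test T(m)
eig_M  <- eigen(M_hat, symmetric = TRUE)
A1_hat <- eig_M$vectors[, 1:r_hat]            # estimated loading space
B1_hat <- eig_M$vectors[, (r_hat + 1):ncol(y)]

Sigma_y <- lag_cov(y, 0)                      # sample covariance of y_t
S_hat   <- Sigma_y %*% B1_hat %*% t(B1_hat) %*% t(Sigma_y)
mu      <- eigen(S_hat, symmetric = TRUE, only.values = TRUE)$values[1:10]
mu[2:10] / mu[1:9]                            # eigenvalue ratios, cf. Figure A2(c)
```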

Figure A1: Left: The 30-minute temperatures of the 508 Mondays in Example 4; Right: Time plots of the first difference of the observed weekly data.

For the ratio-based method of LYB, the estimated number of factors is $\hat{r} = 1$. However, the spectral densities of the first 6 transformed series $\hat{u}_{1t}, \ldots, \hat{u}_{6t}$ obtained via the eigen-analysis, shown in Figure A5, strongly suggest that $\hat{u}_{2t}$ to $\hat{u}_{6t}$ are not white noise, a contradiction with the assumptions made in LYB.

Finally, we also compare the forecast performance of the three methods. We estimate the models using data in the time span $[1, \tau]$ with $\tau = 287, \ldots, 335 - h$ for the $h$-step ahead forecast; that is, we use the data from Sundays up to Fridays to predict the temperatures of Saturdays. The forecast error is defined similarly to (4.6) in Example 3, with the dimension and the sample size replaced by those of this example. The estimated numbers of factors are quite stable across the sub-samples and, therefore, we set $\hat{r} = 6$, $6$, and $1$ for the methods of GT, BN, and LYB, respectively. Table A1 reports the 1-step to 3-step ahead forecast errors using AR(1), AR(2), and AR(3) models; the smallest error in each step is shown in boldface. From the table, we see that our method is also capable of producing accurate forecasts if we search over $K$ properly. As an illustration, the point-wise 1-step ahead forecast errors are given in Figure A6, where the benchmark values are obtained in the same way as those in Example 3. From the table, we see that the three methods all perform well and are close to each other, and they are much better than the benchmark model.
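The expanding-window exercise can be sketched in the same spirit. The snippet below, which reuses y and lag_cov from the previous sketch, is a simplified, hypothetical illustration of the GT column only: the factors are taken as $\hat{A}_1'y_t$ rather than the projected estimator of the paper, a least-squares VAR(1) without intercept is fitted to them, and a plain root mean squared error stands in for the measure (4.6) of the main article.

```r
# Simplified sketch of the expanding-window forecast comparison for the GT
# column of Table A1 (illustration only; reuses y and lag_cov from above).
h <- 1                                       # forecast horizon
fc_err <- sapply(287:(335 - h), function(tau) {
  y_sub <- y[1:tau, ]                        # data in the span [1, tau]

  # Re-estimate the loading space on the sub-sample (same steps as above).
  M_sub <- Reduce(`+`, lapply(1:5, function(k) {
    S_k <- lag_cov(y_sub, k)
    S_k %*% t(S_k)
  }))
  A1 <- eigen(M_sub, symmetric = TRUE)$vectors[, 1:6]
  x  <- y_sub %*% A1                         # simplified factor estimates

  # Least-squares VAR(1) for the 6 factors (no intercept: y_t is differenced,
  # so the factor means are close to zero), iterated h steps ahead.
  X    <- x[-nrow(x), , drop = FALSE]
  Y    <- x[-1, , drop = FALSE]
  Phi  <- solve(crossprod(X), crossprod(X, Y))
  xfut <- x[nrow(x), ]
  for (s in 1:h) xfut <- drop(xfut %*% Phi)

  yhat <- drop(A1 %*% xfut)                  # forecast of y_{tau + h}
  sqrt(mean((y[tau + h, ] - yhat)^2))        # stand-in for the measure (4.6)
})
mean(fc_err)                                 # average h-step forecast error
```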

Figure A2: (a) The first 10 eigenvalues $\hat{\gamma}_i$ of the sample covariance matrix of the series $y_t$ in Example 4; (b) the first 10 eigenvalues $\hat{\mu}_i$ of $\hat{S}$; (c) the ratios $\hat{\mu}_{i+1}/\hat{\mu}_i$ of the eigenvalues of $\hat{S}$.

Figure A3: Sample spectral densities of the 6 estimated factors $\hat{x}_{1t}, \ldots, \hat{x}_{6t}$ using the proposed methodology with $K = 10$ in Example 4.

Figure A4: Sample spectral densities of the 6 estimated factors ($\hat{x}_{1t}, \ldots, \hat{x}_{6t}$) using the principal component analysis of Bai and Ng (2002) in Example 4.

Figure A5: Sample spectral densities of the first 6 transformed series ($\hat{u}_{1t}, \ldots, \hat{u}_{6t}$) using the eigen-analysis in Example 4.

Table A1: The 1-step, 2-step, and 3-step ahead forecasting errors in Example 4 with different VAR models; standard errors are in parentheses. GT denotes our method, BN denotes the principal component analysis in Bai and Ng (2002), and LYB is the method in Lam et al. (2011). Boldface numbers denote the smallest value for each model. For GT, results are reported for $K = 1, \ldots, 10$.

Figure A6: Time plots of the 1-step ahead point-wise forecast errors using AR(1) models in Example 4 with $K = 1$. GT denotes our method, BN denotes the principal component analysis in Bai and Ng (2002), and LYB is the method in Lam et al. (2011). The benchmark forecasts are also shown.

2 Proofs of Theorems

In the proofs, we use $C$ as a generic constant whose value may change from place to place. We start with some useful lemmas. The following lemma is a theorem in Golub and Van Loan (1996); it is stated explicitly because it plays an important role in establishing our theorems. See also Johnstone and Lu (2009) and Lam et al. (2011).

Lemma 1. Suppose $A$ and $A + E$ are $n \times n$ symmetric matrices and that $Q = [Q_1, Q_2]$ ($Q_1 \in \mathbb{R}^{n \times r}$ and $Q_2 \in \mathbb{R}^{n \times (n-r)}$) is an orthogonal matrix such that $\mathrm{span}(Q_1)$ is an invariant subspace for $A$ (that is, $A\,\mathrm{span}(Q_1) \subset \mathrm{span}(Q_1)$). Partition the matrices $Q'AQ$ and $Q'EQ$ as follows:
$$Q'AQ = \begin{pmatrix} D_1 & 0 \\ 0 & D_2 \end{pmatrix} \quad \text{and} \quad Q'EQ = \begin{pmatrix} E_{11} & E_{21}' \\ E_{21} & E_{22} \end{pmatrix}.$$
If $\mathrm{sep}(D_1, D_2) = \min_{\lambda \in \lambda(D_1),\, \mu \in \lambda(D_2)} |\lambda - \mu| > 0$, where $\lambda(M)$ denotes the set of eigenvalues of the matrix $M$, and $\|E\|_2 \le \mathrm{sep}(D_1, D_2)/5$, then there exists a matrix $P \in \mathbb{R}^{(n-r)\times r}$ with
$$\|P\|_2 \le \frac{4}{\mathrm{sep}(D_1, D_2)}\|E_{21}\|_2$$
such that the columns of $\hat{Q}_1 = (Q_1 + Q_2 P)(I + P'P)^{-1/2}$ define an orthonormal basis for a subspace that is invariant for $A + E$.

For any matrix $A$, let $\sigma_i(A)$ be the $i$-th largest singular value of $A$ and $\sigma_{\min}(A)$ be the minimum non-zero singular value. We provide some well-known and useful inequalities in the following lemma. Lemma 2(i)-(ii) can be found in, for example, Golub and Van Loan (1996) and Bernstein (2009).

Lemma 2. (i) Let $A, B \in \mathbb{R}^{m \times n}$. For $i = 1, \ldots, \min\{m, n\}$,
$$(\sigma_i(A) - \sigma_1(B))_+ \le \sigma_i(A + B) \le \sigma_i(A) + \sigma_1(B),$$
where $x_+ = x$ if $x > 0$ and $0$ otherwise.
(ii) Let $A \in \mathbb{R}^{n \times m}$ and $B \in \mathbb{R}^{m \times l}$. Then
$$\sigma_m(A)\,\sigma_{\min\{n,m,l\}}(B) \le \sigma_{\min\{n,m,l\}}(AB) \le \sigma_1(A)\,\sigma_{\min\{n,m,l\}}(B).$$
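For readers who want a quick sanity check, the singular-value inequalities in Lemma 2 can be verified numerically on random matrices, as in the short R sketch below (purely illustrative).

```r
# Quick numerical check of the singular-value inequalities in Lemma 2
# on random matrices (illustrative only).
set.seed(1)

# Lemma 2(i): (sigma_i(A) - sigma_1(B))_+ <= sigma_i(A + B) <= sigma_i(A) + sigma_1(B).
m <- 8; n <- 6
A <- matrix(rnorm(m * n), m, n)
B <- matrix(rnorm(m * n), m, n)
sA  <- svd(A)$d
sB1 <- svd(B)$d[1]
sAB <- svd(A + B)$d
all(pmax(sA - sB1, 0) <= sAB & sAB <= sA + sB1)        # TRUE

# Lemma 2(ii): sigma_m(A) sigma_q(B) <= sigma_q(AB) <= sigma_1(A) sigma_q(B),
# with A an n x m matrix, B an m x l matrix, and q = min(n, m, l).
n <- 8; m <- 6; l <- 5
A <- matrix(rnorm(n * m), n, m)
B <- matrix(rnorm(m * l), m, l)
q <- min(n, m, l)
sA  <- svd(A)$d            # singular values of A: sigma_1 >= ... >= sigma_m
sB  <- svd(B)$d
sAB <- svd(A %*% B)$d
(sA[m] * sB[q] <= sAB[q]) && (sAB[q] <= sA[1] * sB[q]) # TRUE
```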

Proof of Theorem 1. As $p$ is finite, we have the following facts: $\|\Sigma_y(k)\|_2 \le C$ and $\sigma_r(\Sigma_y(k)) \ge C > 0$ for $1 \le k \le k_0$, and therefore $\lambda_r(M) \ge C > 0$. Let $\hat\sigma_{ij}(k)$ and $\sigma_{ij}(k)$ be the $(i,j)$-th elements of $\hat\Sigma_y(k)$ and $\Sigma_y(k)$, respectively. Then,
$$\hat\sigma_{i,j}(k) - \sigma_{i,j}(k) = \frac{1}{n}\sum_{t=1}^{n-k}\left\{ y_{i,t+k}y_{j,t} - E(y_{i,t+k}y_{j,t}) \right\} - \frac{\bar y_{j,\cdot}}{n}\sum_{t=1}^{n-k} y_{i,t+k} - \frac{\bar y_{i,\cdot}}{n}\sum_{t=1}^{n-k} y_{j,t} + \frac{n-k}{n}\bar y_{i,\cdot}\bar y_{j,\cdot} - \frac{k}{n}E(y_{i,t+k}y_{j,t}) = I_1 + I_2 + I_3 + I_4 + I_5, \quad (A.1)$$
where $\bar y_{i,\cdot} = n^{-1}\sum_{t=1}^n y_{i,t}$ and $\bar y_{j,\cdot} = n^{-1}\sum_{t=1}^n y_{j,t}$. By Assumptions 1-2 and Proposition 2.5 of Fan and Yao (2003),
$$E\left|\frac{1}{n}\sum_{t=1}^{n-k}\{y_{i,t+k}y_{j,t} - E(y_{i,t+k}y_{j,t})\}\right|^2 = \frac{1}{n^2}\sum_{t=1}^{n-k} E[\{y_{i,t+k}y_{j,t} - E(y_{i,t+k}y_{j,t})\}^2] + \frac{1}{n^2}\sum_{t_1 \neq t_2} E[\{y_{i,t_1+k}y_{j,t_1} - E(y_{i,t_1+k}y_{j,t_1})\}\{y_{i,t_2+k}y_{j,t_2} - E(y_{i,t_2+k}y_{j,t_2})\}] \le \frac{C}{n} + \frac{C}{n^2}\sum_{t_1 \neq t_2}\alpha(|t_1 - t_2|)^{1-2/\gamma} \le \frac{C}{n} + \frac{C}{n}\sum_{u=1}^{n}\alpha(u)^{1-2/\gamma} \le \frac{C}{n}. \quad (A.2)$$
Thus, $I_1 = O_p(n^{-1/2})$. By a similar argument, we have $I_2 = O_p(n^{-1})$, $I_3 = O_p(n^{-1})$, $I_4 = O_p(n^{-1})$, and $I_5 = O_p(n^{-1})$. Therefore, $\|\hat\Sigma_y(k) - \Sigma_y(k)\|_2 \le \|\hat\Sigma_y(k) - \Sigma_y(k)\|_F = O_p(n^{-1/2})$, and
$$\|\hat M - M\|_2 \le \sum_{k=1}^{k_0}\left\{ \|\hat\Sigma_y(k) - \Sigma_y(k)\|_2^2 + 2\|\Sigma_y(k)\|_2\,\|\hat\Sigma_y(k) - \Sigma_y(k)\|_2 \right\} = O_p(n^{-1/2}). \quad (A.3)$$
Note that
$$(A_1, B_1)'\, M\, (A_1, B_1) = \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} \quad (A.4)$$
with $\mathrm{sep}(D, 0) = \lambda_r(M)$. Letting $A = M$ and $E = \hat M - M$ in Lemma 1, there exists a

matrix $P \in \mathbb{R}^{(p-r)\times r}$ such that
$$\|P\|_2 \le \frac{4}{\mathrm{sep}(D, 0)}\|E_{21}\|_2 \le \frac{4}{\mathrm{sep}(D, 0)}\|E\|_2 = O_p(n^{-1/2}),$$
and $\hat A_1 = (A_1 + B_1 P)(I + P'P)^{-1/2}$ is an estimator for $A_1$. Then we have
$$\|\hat A_1 - A_1\|_2 = \|(A_1(I - (I + P'P)^{1/2}) + B_1 P)(I + P'P)^{-1/2}\|_2 \le \|I - (I + P'P)^{1/2}\|_2 + \|P\|_2 \le 2\|P\|_2 = O_p(n^{-1/2}).$$
Similarly, we also have $\|\hat B_1 - B_1\|_2 = O_p(n^{-1/2})$, where $B_1$ is associated with the zero eigenvalues (the null space) of $M$. To show that $\|\hat B_2 - B_2\|_2 = O_p(n^{-1/2})$, by a similar argument as above, we only need to show that $\|\hat S - S\|_2 = O_p(n^{-1/2})$. Note that
$$\|\hat\Sigma_y \hat B_1 - \Sigma_y B_1\|_2 \le \|\hat\Sigma_y - \Sigma_y\|_2\,\|\hat B_1\|_2 + \|\Sigma_y\|_2\,\|\hat B_1 - B_1\|_2 = O_p(n^{-1/2}),$$
and hence
$$\|\hat S - S\|_2 = \|\hat\Sigma_y \hat B_1 \hat B_1' \hat\Sigma_y' - \Sigma_y B_1 B_1' \Sigma_y'\|_2 = O_p(n^{-1/2}).$$
Furthermore, we observe that
$$\hat A_1 \hat x_t - A_1 x_t = \hat A_1(\hat B_2'\hat A_1)^{-1}\hat B_2' y_t - A_1 x_t = \hat A_1(\hat B_2'\hat A_1)^{-1}\hat B_2'(A_1 - \hat A_1)x_t + (\hat A_1 - A_1)x_t + \hat A_1(\hat B_2'\hat A_1)^{-1}(\hat B_2 - B_2)' A_2 e_t. \quad (A.5)$$
Thus, $\|\hat A_1 \hat x_t - A_1 x_t\|_2 = O_p(\|\hat A_1 - A_1\|_2 + \|\hat B_2 - B_2\|_2) = O_p(n^{-1/2})$. This completes the proof.

Proof of Theorem 2. We only prove the result for $\hat A_1$. Note that
$$\{D(\mathcal{M}(\hat A_1), \mathcal{M}(A_1))\}^2 = \frac{1}{r}\,\mathrm{tr}\{\hat A_1'(I_p - A_1 A_1')\hat A_1\} \le \|\hat A_1'(I_p - A_1 A_1')\hat A_1\|_2 = \|\hat A_1'(\hat A_1\hat A_1' - A_1 A_1')\hat A_1\|_2,$$
and
$$\hat A_1'(\hat A_1\hat A_1' - A_1 A_1')\hat A_1 = -\hat A_1'(\hat A_1 - A_1)(\hat A_1 - A_1)'\hat A_1 + (\hat A_1 - A_1)'(\hat A_1 - A_1),$$

which implies $\{D(\mathcal{M}(\hat A_1), \mathcal{M}(A_1))\}^2 \le 2\|\hat A_1 - A_1\|_2^2$. The conclusion follows from the above inequality and Theorem 1. This completes the proof.

Lemma 3. If Assumptions 1-6 hold, then $\|\Sigma_y(k)\|_2 = O_p(p^{1-\delta_1} + \kappa_{\max} p^{1-\delta_1/2-\delta_2/2})$ for $1 \le k \le k_0$, and $\|\Sigma_y\|_2 = O_p(p^{1-\delta_1} + p^{1-\delta_2})$.

Proof. Note that $\Sigma_y(k) = L_1\Sigma_f(k)L_1' + L_1\Sigma_{f\varepsilon}(k)L_2'$. By Assumptions 4-5, $L_1$ can be equivalently decomposed as $L_1 = U_1 D_1 V_1'$ with $U_1'U_1 = I_r$, $V_1'V_1 = I_r$, and $D_1$ a diagonal matrix whose diagonal elements are all of order $p^{(1-\delta_1)/2}$. $U_1$ is not necessarily the same as $A_1$, but $\mathcal{M}(U_1) = \mathcal{M}(A_1)$. Therefore,
$$\|\Sigma_y(k)\|_2 \le \|L_1\|_2^2\|\Sigma_f(k)\|_2 + \|L_1\|_2\|L_2\|_2\|\Sigma_{f\varepsilon}(k)\|_2 \le Cp^{1-\delta_1} + C\kappa_{\max} p^{1-\delta_1/2-\delta_2/2}.$$
The proof for $\Sigma_y$ is similar. This completes the proof.

Lemma 4. (i) If Assumptions 1-7 hold, then $\|\hat\Sigma_y(k) - \Sigma_y(k)\|_2 = O_p(p n^{-1/2})$ for $0 \le k \le k_0$, where $\hat\Sigma_y(0) = \hat\Sigma_y$ and $\Sigma_y(0) = \Sigma_y$.
(ii) If Assumptions 1-8 hold, then, for $0 \le k \le k_0$,
$$\|\hat\Sigma_y(k) - \Sigma_y(k)\|_2 = \begin{cases} O_p(\max\{p^{1-\delta_1/2}n^{-1/2},\, p^{1-\delta_2/2}n^{-1/2}\}), & \text{if } p = O(n), \\ O_p(\max\{p^{1-\delta_1/2}n^{-1/2},\, p^{1-\delta_2/2}n^{-1/2},\, p n^{-1}\}), & \text{if } n = O(p). \end{cases}$$

In particular, if $p^{\delta_1/2}n^{-1/2} = o(1)$ and $p^{\delta_2/2}n^{-1/2} = o(1)$,
$$\|\hat\Sigma_y(k) - \Sigma_y(k)\|_2 = \begin{cases} O_p(p^{1-\delta_1/2}n^{-1/2}), & \text{if } \delta_1 \le \delta_2, \\ O_p(p^{1-\delta_2/2}n^{-1/2}), & \text{if } \delta_1 > \delta_2. \end{cases}$$

Proof. We only show the result for $1 \le k \le k_0$, since the case of $k = 0$ can be handled in a similar way.
(i) Let $A_2 = (A_{21}, A_{22})$, $D_2 = \mathrm{diag}(D_{21}, D_{22})$ and $V_2 = (V_{21}, V_{22})$ with $A_{21} \in \mathbb{R}^{p\times K}$, $A_{22} \in \mathbb{R}^{p\times(v-K)}$, $V_{21} \in \mathbb{R}^{v\times K}$, $V_{22} \in \mathbb{R}^{v\times(v-K)}$, $D_{21} = \mathrm{diag}(d_1, \ldots, d_K)$ and $D_{22} = \mathrm{diag}(d_{K+1}, \ldots, d_v)$. Then $L_2 = A_{21}D_{21}V_{21}' + A_{22}D_{22}V_{22}'$ and Model (2.1) becomes
$$y_t = L_1 f_t + A_{21}D_{21} z_t + A_{22}D_{22}V_{22}'\varepsilon_t,$$
where $z_t = (z_{1t}, \ldots, z_{Kt})' := V_{21}'\varepsilon_t \in \mathbb{R}^K$ with $E|z_{it}|^{2\gamma} \le C$. It follows from the above equation that
$$\hat\Sigma_y(k) = L_1\hat\Sigma_f(k)L_1' + L_1\hat\Sigma_{fz}(k)D_{21}A_{21}' + L_1\hat\Sigma_{f\varepsilon}(k)V_{22}D_{22}A_{22}' + A_{21}D_{21}\hat\Sigma_{zf}(k)L_1' + A_{21}D_{21}\hat\Sigma_z(k)D_{21}A_{21}' + A_{21}D_{21}\hat\Sigma_{z\varepsilon}(k)V_{22}D_{22}A_{22}' + A_{22}D_{22}V_{22}'\hat\Sigma_{\varepsilon f}(k)L_1' + A_{22}D_{22}V_{22}'\hat\Sigma_{\varepsilon z}(k)D_{21}A_{21}' + A_{22}D_{22}V_{22}'\hat\Sigma_\varepsilon(k)V_{22}D_{22}A_{22}' = J_1 + \cdots + J_9, \quad (A.6)$$
and $\Sigma_y(k) = L_1\Sigma_f(k)L_1' + L_1\Sigma_{fz}(k)D_{21}A_{21}' + L_1\Sigma_{f\varepsilon}(k)V_{22}D_{22}A_{22}'$. Note that
$$\|J_1 - L_1\Sigma_f(k)L_1'\|_2 \le \|L_1\|_2^2\|\hat\Sigma_f(k) - \Sigma_f(k)\|_2 = O_p(p^{1-\delta_1}n^{-1/2}),$$
$$\|J_2 - L_1\Sigma_{fz}(k)D_{21}A_{21}'\|_2 \le \|L_1\|_2\|D_{21}\|_2\|\hat\Sigma_{fz}(k) - \Sigma_{fz}(k)\|_2 = O_p(p^{1-\delta_1/2-\delta_2/2}n^{-1/2}),$$
$$\|J_3 - L_1\Sigma_{f\varepsilon}(k)V_{22}D_{22}A_{22}'\|_2 \le C\|L_1\|_2\|\hat\Sigma_{f\varepsilon}(k) - \Sigma_{f\varepsilon}(k)\|_2 = O_p(p^{1-\delta_1/2}n^{-1/2}).$$
Without further assumptions on $\varepsilon_t$, we can show that
$$\|J_4\|_2 = O_p(p^{1-\delta_1/2-\delta_2/2}n^{-1/2}), \quad \|J_5\|_2 = O_p(p^{1-\delta_2}n^{-1/2}), \quad \|J_6\|_2 = O_p(p^{1-\delta_2/2}n^{-1/2}),$$

$$\|J_7\|_2 = O_p(p^{1-\delta_1/2}n^{-1/2}), \quad \|J_8\|_2 = O_p(p^{1-\delta_2/2}n^{-1/2}), \quad \|J_9\|_2 = O_p(p n^{-1/2}).$$
Therefore,
$$\|\hat\Sigma_y(k) - \Sigma_y(k)\|_2 = O_p(\|\hat\Sigma_\varepsilon(k)\|_2) = O_p(p n^{-1/2}). \quad (A.7)$$
(ii) On the other hand, if Assumption 8 holds for $\varepsilon_t$, then by the covariance estimation theorem in Vershynin (2018),
$$\|\hat\Sigma_\varepsilon - I_v\|_2 = O_p(\sqrt{p/n} + p/n), \quad (A.8)$$
and hence $\|\hat\Sigma_\varepsilon\|_2 \le 1 + O_p(\sqrt{p/n} + p/n)$. Thus,
$$\|\hat\Sigma_\varepsilon(k)\|_2 \le \|\hat\Sigma_\varepsilon\|_2 = \begin{cases} O_p(1), & \text{if } p = O(n), \\ O_p(p n^{-1}), & \text{if } n = O(p), \end{cases}$$
where the first inequality above can be found in the proof of Theorem 2 in Lam et al. (2011). Therefore, if $p = O(n)$, we can further show that
$$\|J_6\|_2 = O_p(\min\{p^{1-\delta_2/2}n^{-1/2},\, p^{(1-\delta_2)/2}\}) = O_p(p^{1-\delta_2/2}n^{-1/2}),$$
$$\|J_8\|_2 = O_p(\min\{p^{1-\delta_2/2}n^{-1/2},\, p^{(1-\delta_2)/2}\}) = O_p(p^{1-\delta_2/2}n^{-1/2}), \quad \|J_9\|_2 = O_p(1).$$
Gathering all the rates of $J_1, \ldots, J_9$ together, we obtain
$$\|\hat\Sigma_y(k) - \Sigma_y(k)\|_2 = O_p(\max\{p^{1-\delta_1/2}n^{-1/2},\, p^{1-\delta_2/2}n^{-1/2}\}).$$
Similarly, if $n = O(p)$, we have
$$\|J_5\|_2 = O_p(\min\{p^{1-\delta_2}n^{-1/2},\, p^{2-\delta_2}n^{-1}\}) = O_p(p^{1-\delta_2}n^{-1/2}),$$
$$\|J_6\|_2 = O_p(\min\{p^{1-\delta_2/2}n^{-1/2},\, p^{3/2-\delta_2/2}n^{-1}\}) = O_p(p^{1-\delta_2/2}n^{-1/2}),$$
$$\|J_8\|_2 = O_p(\min\{p^{1-\delta_2/2}n^{-1/2},\, p^{3/2-\delta_2/2}n^{-1}\}) = O_p(p^{1-\delta_2/2}n^{-1/2}), \quad \|J_9\|_2 = O_p(p n^{-1}).$$
Then, gathering all the rates of $J_1, \ldots, J_9$ together, we obtain
$$\|\hat\Sigma_y(k) - \Sigma_y(k)\|_2 = O_p(\max\{p^{1-\delta_1/2}n^{-1/2},\, p^{1-\delta_2/2}n^{-1/2},\, p n^{-1}\}).$$
This completes the proof.
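The operator-norm rate in (A.8) is easy to visualize by simulation. The sketch below, which is purely illustrative, generates i.i.d. standard normal (hence sub-Gaussian) data and compares the error $\|\hat\Sigma_\varepsilon - I\|_2$ with $\sqrt{p/n} + p/n$ over a small grid of dimensions.

```r
# Illustration of the operator-norm rate in (A.8): for i.i.d. sub-Gaussian
# (here standard normal) data, the sample covariance satisfies
# || Sigma_hat - I ||_2 = O_p( sqrt(p/n) + p/n ).  Illustrative only.
set.seed(123)
n <- 400
op_err <- function(p) {
  X <- matrix(rnorm(n * p), n, p)
  S <- crossprod(X) / n                       # sample covariance (mean zero)
  max(abs(eigen(S - diag(p), symmetric = TRUE, only.values = TRUE)$values))
}
p_grid <- c(50, 100, 200, 400, 800)
cbind(p     = p_grid,
      error = sapply(p_grid, op_err),
      bound = sqrt(p_grid / n) + p_grid / n)  # error tracks the bound up to a constant
```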

Lemma 5. (i) Let Assumptions 1-7 hold. If either $p^{\delta_1}n^{-1/2} = o(1)$ or $\kappa_{\max}^{-1}p^{\delta_1/2+\delta_2/2}n^{-1/2} = o(1)$, then
$$\|\hat M - M\|_2 = O_p(p^{2-\delta_1}n^{-1/2} + \kappa_{\max}p^{2-\delta_1/2-\delta_2/2}n^{-1/2}).$$
(ii) Let Assumptions 1-8 hold, $p^{\delta_1/2}n^{-1/2} = o(1)$ and $p^{\delta_2/2}n^{-1/2} = o(1)$. If $\delta_1 \le \delta_2$, then
$$\|\hat M - M\|_2 = O_p(p^{2-3\delta_1/2}n^{-1/2} + \kappa_{\max}p^{2-\delta_1-\delta_2/2}n^{-1/2}).$$
If $\delta_1 > \delta_2$, then
$$\|\hat M - M\|_2 = \begin{cases} O_p(p^{2-\delta_2}n^{-1} + p^{2-\delta_1-\delta_2/2}n^{-1/2}), & \text{if } \kappa_{\max} = 0, \\ O_p(\kappa_{\max}p^{2-\delta_1/2-\delta_2}n^{-1/2}), & \text{if } \kappa_{\max} \gg 0. \end{cases}$$

Proof. (i) By (A.3) and Lemmas 3 and 4,
$$\|\hat M - M\|_2 \le Cp^2 n^{-1} + Cpn^{-1/2}(p^{1-\delta_1} + \kappa_{\max}p^{1-\delta_1/2-\delta_2/2}) \le Cp^2 n^{-1} + Cp^{2-\delta_1}n^{-1/2} + C\kappa_{\max}p^{2-\delta_1/2-\delta_2/2}n^{-1/2} \le Cp^{2-\delta_1}n^{-1/2} + C\kappa_{\max}p^{2-\delta_1/2-\delta_2/2}n^{-1/2},$$
where we assume either $p^{\delta_1}n^{-1/2} = o(1)$ or $\kappa_{\max}^{-1}p^{\delta_1/2+\delta_2/2}n^{-1/2} = o(1)$.
(ii) If $\delta_1 \le \delta_2$, by Lemmas 3 and 4(ii) and a similar argument as above,
$$\|\hat M - M\|_2 \le Cp^{2-\delta_1}n^{-1} + Cp^{2-3\delta_1/2}n^{-1/2} + C\kappa_{\max}p^{2-\delta_1-\delta_2/2}n^{-1/2} \le Cp^{2-3\delta_1/2}n^{-1/2} + C\kappa_{\max}p^{2-\delta_1-\delta_2/2}n^{-1/2},$$
where we assume $p^{\delta_1/2}n^{-1/2} = o(1)$. If $\delta_1 > \delta_2$,
$$\|\hat M - M\|_2 \le Cp^{2-\delta_2}n^{-1} + Cp^{2-\delta_1-\delta_2/2}n^{-1/2} + C\kappa_{\max}p^{2-\delta_1/2-\delta_2}n^{-1/2}.$$
If $\kappa_{\max} = 0$, i.e., $f_t$ and $\varepsilon_s$ are independent for all $t$ and $s$, then
$$\|\hat M - M\|_2 \le Cp^{2-\delta_2}n^{-1} + Cp^{2-\delta_1-\delta_2/2}n^{-1/2}.$$

If $\kappa_{\max} \gg 0$, i.e., $f_t$ and $\varepsilon_{t-j}$ are correlated for some $j > 0$, then
$$\|\hat M - M\|_2 \le C\kappa_{\max}p^{2-\delta_1/2-\delta_2}n^{-1/2},$$
under the assumption that $p^{\delta_1/2}n^{-1/2} = o(1)$. This completes the proof.

Lemma 6. If Assumptions 1-7 hold, then
$$\lambda_{\min}(M) \ge \begin{cases} Cp^{2(1-\delta_1)}, & \text{if } \kappa_{\max}p^{\delta_1/2-\delta_2/2} = o(1), \\ C\kappa_{\min}^2 p^{2-\delta_1-\delta_2}, & \text{if } r \le K \text{ and } \kappa_{\min}^{-1}p^{\delta_2/2-\delta_1/2} = o(1), \\ C\kappa_{\min}^2 p^{1-\delta_1}, & \text{if } r > K \text{ and } \kappa_{\min}^{-1}p^{(1-\delta_1)/2} = o(1), \end{cases} \quad (A.9)$$
where $\kappa_{\min}$ and $\kappa_{\max}$ are defined in Assumption 6 and $K$ is given in Assumption 5.

Proof. Note that
$$M = \sum_{k=1}^{k_0} [A_1\Sigma_x(k)A_1' + A_1\Sigma_{xe}(k)A_2'][A_1\Sigma_x(k)A_1' + A_1\Sigma_{xe}(k)A_2']'.$$
By Weyl's inequality, for any $1 \le k \le k_0$,
$$\lambda_{\min}(M) \ge \lambda_{\min}\{[A_1\Sigma_x(k)A_1' + A_1\Sigma_{xe}(k)A_2'][A_1\Sigma_x(k)A_1' + A_1\Sigma_{xe}(k)A_2']'\} = \{\sigma_r(A_1\Sigma_x(k)A_1' + A_1\Sigma_{xe}(k)A_2')\}^2.$$
In addition, we have $\sigma_1(A_1\Sigma_x(k)A_1') \asymp \sigma_r(A_1\Sigma_x(k)A_1') \asymp p^{1-\delta_1}$, $\sigma_1(A_1\Sigma_{xe}(k)A_2') = O_p(\kappa_{\max}p^{1-\delta_1/2-\delta_2/2})$, and, by Lemma 2(ii),
$$\sigma_r(A_1\Sigma_{xe}(k)A_2') \ge \begin{cases} C\kappa_{\min}p^{1-\delta_1/2-\delta_2/2}, & \text{if } r \le K, \\ C\kappa_{\min}p^{(1-\delta_1)/2}, & \text{if } r > K. \end{cases}$$
Combining these bounds via Lemma 2(i), we obtain (A.9). This completes the proof.

Lemma 7. If Assumptions 1-7 hold, then $\lambda_K(S) \ge Cp^{2-2\delta_2}$.

Proof. Note that $\Sigma_y B_1 = A_2\Sigma_e A_2' B_1 = A_2 D_2^2 A_2' B_1$. By Assumption 7(ii) and Lemma 2(ii), $\sigma_K(A_2 D_2^2 A_2' B_1) \ge Cp^{1-\delta_2}$, and hence $\lambda_K(S) = \sigma_K(\Sigma_y B_1)^2 \ge Cp^{2-2\delta_2}$. This completes the proof.

Lemma 8. (i) Let Assumptions 1-7 hold. If $p^{\delta_1}n^{-1/2} = o(1)$ or $p^{\delta_2}n^{-1/2} = o(1)$, then
$$\|\hat S - S\|_2 = O_p(p^{2-\delta_1}n^{-1/2} + p^{2-\delta_2}n^{-1/2} + (p^{2-2\delta_1} + p^{2-2\delta_2})\|\hat B_1 - B_1\|_2).$$
(ii) Let Assumptions 1-8 hold, $p^{\delta_1/2}n^{-1/2} = o(1)$ and $p^{\delta_2/2}n^{-1/2} = o(1)$. If $\delta_1 \le \delta_2$, then
$$\|\hat S - S\|_2 \le Cp^{2-3\delta_1/2}n^{-1/2} + Cp^{2-2\delta_1}\|\hat B_1 - B_1\|_2.$$
If $\delta_1 > \delta_2$, then
$$\|\hat S - S\|_2 \le Cp^{2-3\delta_2/2}n^{-1/2} + Cp^{2-2\delta_2}\|\hat B_1 - B_1\|_2.$$

Proof. (i) We first note that
$$\|\hat\Sigma_y \hat B_1 - \Sigma_y B_1\|_2 \le \|\hat\Sigma_y - \Sigma_y\|_2 + \|\Sigma_y\|_2\|\hat B_1 - B_1\|_2,$$
and hence
$$\|\hat\Sigma_y \hat B_1 - \Sigma_y B_1\|_2^2 \le 2\|\hat\Sigma_y - \Sigma_y\|_2^2 + 2\|\Sigma_y\|_2^2\|\hat B_1 - B_1\|_2^2.$$
Therefore,
$$\|\hat S - S\|_2 \le \|\hat\Sigma_y \hat B_1 - \Sigma_y B_1\|_2^2 + 2\|\Sigma_y B_1\|_2\,\|\hat\Sigma_y \hat B_1 - \Sigma_y B_1\|_2 \le C\|\hat\Sigma_y - \Sigma_y\|_2^2 + C\|\Sigma_y\|_2\|\hat\Sigma_y - \Sigma_y\|_2 + C\|\Sigma_y\|_2^2\|\hat B_1 - B_1\|_2 =: R_1 + R_2 + R_3, \quad (A.10)$$

where we assume $\|\hat B_1 - B_1\|_2 = o_p(1)$ as $n \to \infty$. By Lemmas 3 and 4(i),
$$R_1 \le Cp^2 n^{-1}, \quad R_2 \le C(pn^{-1/2})(p^{1-\delta_1} + p^{1-\delta_2}) \le Cp^{2-\delta_1}n^{-1/2} + Cp^{2-\delta_2}n^{-1/2},$$
and
$$R_3 \le C(p^{2-2\delta_1} + p^{2-2\delta_2})\|\hat B_1 - B_1\|_2.$$
The result follows from the assumption that $p^{\delta_1}n^{-1/2} = o(1)$ or $p^{\delta_2}n^{-1/2} = o(1)$.
(ii) When $p^{\delta_1/2}n^{-1/2} = o(1)$ and $p^{\delta_2/2}n^{-1/2} = o(1)$, if $\delta_1 \le \delta_2$, by Lemma 4(ii),
$$R_1 \le Cp^{2-\delta_1}n^{-1}, \quad R_2 \le Cp^{2-3\delta_1/2}n^{-1/2}, \quad R_3 \le Cp^{2-2\delta_1}\|\hat B_1 - B_1\|_2.$$
Then $\|\hat S - S\|_2 \le Cp^{2-3\delta_1/2}n^{-1/2} + Cp^{2-2\delta_1}\|\hat B_1 - B_1\|_2$. If $\delta_1 > \delta_2$,
$$R_1 \le Cp^{2-\delta_2}n^{-1}, \quad R_2 \le Cp^{2-3\delta_2/2}n^{-1/2}, \quad R_3 \le Cp^{2-2\delta_2}\|\hat B_1 - B_1\|_2.$$
Then $\|\hat S - S\|_2 \le Cp^{2-3\delta_2/2}n^{-1/2} + Cp^{2-2\delta_2}\|\hat B_1 - B_1\|_2$. This completes the proof.

Proof of Theorem 3. Letting $A = M$ and $E = \hat M - M$ in Lemma 1, we can obtain
$$\|\hat A_1 - A_1\|_2 \le \frac{\|\hat M - M\|_2}{\lambda_{\min}(M)}, \quad \|\hat B_1 - B_1\|_2 \le \frac{\|\hat M - M\|_2}{\lambda_{\min}(M)}, \quad \text{and} \quad \|\hat B_2 - B_2\|_2 \le \frac{\|\hat S - S\|_2}{\lambda_K(S)}.$$
Theorem 3 can then be shown by an elementary argument based on Lemmas 5-8; we omit the details. This completes the proof.

Proof of Theorem 4. The proof is similar to that of Theorem 3.
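The eigen-space perturbation bound used above can be illustrated numerically: the sketch below builds a rank-$r$ symmetric matrix $M$ with a known leading eigen-space, perturbs it by a small symmetric $E$, and compares the subspace discrepancy of the leading eigenvectors of $M + E$ with $\|E\|_2/\lambda_{\min}(M)$. The construction is purely illustrative.

```r
# Numerical illustration of the eigen-space perturbation bound behind the
# proof of Theorem 3: the leading eigen-space of M + E deviates from that of
# M by (roughly) at most a constant multiple of ||E||_2 / lambda_min(M),
# where lambda_min(M) is the smallest nonzero eigenvalue of M.
set.seed(7)
p <- 50; r <- 3
Q  <- qr.Q(qr(matrix(rnorm(p * p), p, p)))     # random p x p orthogonal matrix
A1 <- Q[, 1:r]                                 # true r-dimensional eigen-space
M  <- A1 %*% diag(c(10, 8, 6)) %*% t(A1)       # rank-r symmetric M, lambda_min = 6

G <- matrix(rnorm(p * p), p, p)
E <- 0.05 * (G + t(G)) / 2                     # small symmetric perturbation
A1_hat <- eigen(M + E, symmetric = TRUE)$vectors[, 1:r]

# Subspace discrepancy D(M(A1_hat), M(A1)) as in the proof of Theorem 2.
D_sub  <- sqrt(sum(diag(t(A1_hat) %*% (diag(p) - A1 %*% t(A1)) %*% A1_hat)) / r)
E_norm <- max(abs(eigen(E, symmetric = TRUE, only.values = TRUE)$values))

c(discrepancy = D_sub, perturbation_ratio = E_norm / 6)   # 6 = lambda_min(M)
```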

Proof of Theorem 5. There exists $R \in \mathbb{R}^{(p-K)\times r}$ with $R'R = I_r$ such that $\hat B_2 = B_2 R$. By (A.5),
$$p^{-1/2}\|\hat A_1\hat x_t - A_1 x_t\|_2 \le p^{-1/2}\|(A_1 - \hat A_1)x_t\|_2 + p^{-1/2}\|\hat B_2' A_2 e_t\|_2 \le p^{-1/2}\|(A_1 - \hat A_1)x_t\|_2 + p^{-1/2}\|R'B_2' A_{21}D_{21}z_t\|_2 + p^{-1/2}\|R'B_2' A_{22}D_{22}V_{22}'\varepsilon_t\|_2 = N_1 + N_2 + N_3. \quad (A.11)$$
Note that
$$N_1 \le p^{-1/2}\|\hat A_1 - A_1\|_2\|x_t\|_2 \le Cp^{-\delta_1/2}\|\hat A_1 - A_1\|_2, \quad N_2 \le Cp^{-1/2}\|D_{21}\|_2\|\hat B_2 - B_2\|_2 \le Cp^{-\delta_2/2}\|\hat B_2 - B_2\|_2.$$
Letting $R = (r_1, \ldots, r_r)$ and $\xi_t = B_2' A_{22}D_{22}V_{22}'\varepsilon_t$, we have $\mathrm{Var}(r_i'\xi_t) = r_i' B_2' A_{22}D_{22}^2 A_{22}' B_2 r_i = O(1)$, and hence $r_i'\xi_t = O_p(1)$. Therefore,
$$N_3 \le p^{-1/2}\left\{\sum_{i=1}^r (r_i'\xi_t)^2\right\}^{1/2} = O_p(p^{-1/2}).$$
Theorem 5 follows from the rates of $N_1$, $N_2$, and $N_3$. This completes the proof.

Proof of Theorem 6. (i) The analysis of the power can be found in Chang et al. (2017), and we only need to show the consistency of the test. Let $u_t = G'y_t$ as in Section 2.3, and let $u_{it}$ be the $i$-th element of $u_t$. By the proof of Theorem 3 in Chang et al. (2017), we only need to show that
$$\frac{1}{n}\sum_{t=1}^n (\hat u_{it} - u_{it})^2 = o_p(1), \quad \text{for } r+1 \le i \le p. \quad (A.12)$$
Equivalently, (A.12) can be expressed as
$$\frac{1}{n}\sum_{t=1}^n (\hat b_j' y_t - b_j' y_t)^2 = o_p(1), \quad \text{for } 1 \le j \le v.$$
Since $\|\hat b_j - b_j\|_2 \le \|\hat B_1 - B_1\|_2$, we need to guarantee that $\|\hat B_1 - B_1\|_2^2\,\|\Sigma_y\|_2 = o_p(1)$, which is the condition stated in Theorem 6(i).

(ii) To show the consistency of the test statistic $T(m)$ in Tsay (2018), a sufficient condition is the consistency of the sample covariance matrix of $\hat B_1' y_t$ used in the PCA, i.e., $\|\hat B_1'\hat\Sigma_y\hat B_1 - B_1'\Sigma_y B_1\|_2 = o_p(1)$. Note that
$$\|\hat B_1'\hat\Sigma_y\hat B_1 - B_1'\Sigma_y B_1\|_2 \le C\|\hat B_1 - B_1\|_2\|\Sigma_y\|_2 + C\|\hat\Sigma_y - \Sigma_y\|_2,$$
and therefore we only require the upper bound on the right-hand side of the above inequality to be $o_p(1)$. The power analysis is standard since the limiting distribution is based on standard extreme-value theory, and we omit the argument here. This completes the proof.

References

Bai, J., and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, 70.

Bernstein, D. S. (2009). Matrix Mathematics: Theory, Facts, and Formulas. Princeton University Press.

Chang, J., Yao, Q., and Zhou, W. (2017). Testing for high-dimensional white noise using maximum cross-correlations. Biometrika, 104(1).

Fan, J., and Yao, Q. (2003). Nonlinear Time Series: Nonparametric and Parametric Methods. Springer-Verlag, New York.

Golub, G. H., and Van Loan, C. F. (1996). Matrix Computations. Johns Hopkins University Press.

Johnstone, I. M., and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association, 104(486).

Lam, C., Yao, Q., and Bathia, N. (2011). Estimation of latent factors for high-dimensional time series. Biometrika, 98.

Shang, H. L., and Hyndman, R. J. (2011). fds: Functional Data Sets Package in R. Vienna, Austria: R Development Core Team.

Tsay, R. S. (2018). Testing for serial correlations in high-dimensional time series via extreme value theory. Manuscript, University of Chicago.

Vershynin, R. (2018). High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press.
