
Supplementary Material for "Structural-Factor Modeling of High-Dimensional Time Series: Another Look at Approximate Factor Models with Diverging Eigenvalues"

Zhaoxing Gao and Ruey S. Tsay
Booth School of Business, University of Chicago

August 23, 2018

This online supplement consists of two parts. The first part analyzes another real example to further illustrate the usefulness of the proposed method, and the second part presents the technical proofs of the theorems in the main article.

1 An Additional Real Example

In this section, we apply the proposed method to another real example to illustrate further its usefulness in practice. In this example, the dimension is greater than the sample size.

Example 4. Consider the half-hourly temperature data observed at the Adelaide Airport in Adelaide, Australia, from Sunday to Saturday of each week between July 6, 1997 and March 31, 2007. The data are available in the R package fds; see also Shang and Hyndman (2011) for details. There are 48 observations per day, 7 days a week, and 508 weeks. From the original data we stack the observations of each week together, from Sunday to Saturday, and treat the temperature data of each week as a time series. Consequently, we have 508 time series, each with sample size 336. The left panel of Figure A1 shows the time plots of the 48 observations of the 508 Mondays, from which we see that, as expected, the temperature series exhibit a certain diurnal pattern: temperatures are lower at night and higher in the afternoon. To remove such a diurnal pattern and any possible trend, we take the first difference $y_t = \tilde{y}_t - \tilde{y}_{t-1}$, where

$\tilde{y}_t \in \mathbb{R}^{508}$ and $t = 1, \ldots, 336$. The right panel of Figure A1 shows the time plots of $y_t$, which appear to be weakly stationary with some volatility clustering. Thus, in this example, we employ $y_t$ with $n = 335$ and $p = 508$. We apply the white noise test $T(m)$ to the data with $k_0 = 5$ in Equation (2.9), $m = 10$, and the 97.5%-quantile 3.68 of the Gumbel distribution, and obtain an estimated number of factors $\hat{r} = 6$. We also calculate the eigenvalues of $\hat{S}$. Figures A2(a) and (b) show the first 10 eigenvalues of the sample covariance matrix of $y_t$ and of $\hat{S}$, respectively. We see clearly that the largest eigenvalue of $\hat{S}$ is larger than that of the covariance matrix of the data, which again supports the assumption that the largest eigenvalues of the noise covariance matrix diverge for large $p$. Figure A2(c) plots the ratios of the eigenvalues of $\hat{S}$ and shows that the largest gap between the eigenvalues occurs between $\hat{\mu}_1$ and $\hat{\mu}_2$. But following the proposed procedure, we choose $K = \min\{p, n, 10\} = 10$. The spectral densities of the 6 estimated factors are given in Figure A3, and they hardly change if we vary $K$ from 1 to 10. From the spectral densities, we see that the estimated factors are all non-trivial processes and that the variance of the first factor is extremely large compared to those of the others. On the other hand, we calculate the eigenvalues of the covariance matrices of the estimated factors $\hat{x}_t$ and of the data $y_t$; the largest 6 eigenvalues are (72.88, 17.37, 11.17, 8.69, 7.68, 4.56) and (73.46, 19.13, 11.85, 11.11, 10.89, 9.27), respectively. We see that the eigenvalues of the covariance matrix of the factors estimated via the proposed method are almost at the same levels as the largest 6 eigenvalues of the covariance matrix of the data. This is reasonable and different from the result of principal component analysis because the proposed method can mitigate some effects of the idiosyncratic noises. We also apply the methods of BN and LYB to $y_t$. The estimated numbers of factors are $\hat{r} = 6$ and $\hat{r} = 1$, respectively, by the principal component analysis and the ratio-based method. The spectral densities of the first 6 principal component factors are shown in Figure A4, from which we see that they are different from those in Figure A3. Again, this is understandable because the principal component analysis only employs the sample covariance matrix. Since the eigenvalues of the covariance matrix of $\hat{A}_1\hat{x}_t$ obtained by principal component analysis are the same as the first 6 eigenvalues of $\hat{\Sigma}_y$, they are likely influenced by the noise components, complicating the interpretation of the common factors. On the other hand, the eigenvalues of the covariance matrix of the factors estimated via the proposed method are not only close to the first 6 eigenvalues of $\hat{\Sigma}_y$, but are also less affected by the noises, some effects of which are mitigated by the projected PCA.
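To make the construction above concrete, the following R sketch reproduces the data preparation and the eigen-analysis in a simplified form. It assumes that the fds data objects sundaytempairport, ..., saturdaytempairport each store the 48 half-hourly temperatures of the 508 weeks in their $y component (these object names and dimensions are assumptions to be checked on your installation), and it uses the eigenvectors of the lag-covariance matrix directly; it is an illustration of the quantities involved, not the exact estimation procedure of the main article.

```r
# Sketch of the data construction and eigen-analysis described above.
# Assumptions (not from the original text): the fds data objects
# sundaytempairport, ..., saturdaytempairport each hold a 48 x 508 matrix
# of half-hourly temperatures in their $y component; check with dim().
library(fds)

days <- list(sundaytempairport, mondaytempairport, tuesdaytempairport,
             wednesdaytempairport, thursdaytempairport, fridaytempairport,
             saturdaytempairport)

# Stack Sunday-to-Saturday blocks into weekly curves: 336 x 508,
# rows = within-week half-hours (time), columns = weeks (series).
ytilde <- do.call(rbind, lapply(days, function(d) d$y))
y <- diff(ytilde)                 # first difference: n = 335, p = 508

# Lag-k sample autocovariance matrix of y_t.
lag_cov <- function(y, k) {
  n  <- nrow(y)
  yc <- scale(y, center = TRUE, scale = FALSE)
  crossprod(yc[(k + 1):n, , drop = FALSE], yc[1:(n - k), , drop = FALSE]) / n
}

k0    <- 5
M_hat <- Reduce(`+`, lapply(1:k0, function(k) {
  S_k <- lag_cov(y, k)
  S_k %*% t(S_k)
}))

r_hat  <- 6                                   # from the white noise test T(m)
eig_M  <- eigen(M_hat, symmetric = TRUE)
A1_hat <- eig_M$vectors[, 1:r_hat]            # estimated loading space
B1_hat <- eig_M$vectors[, (r_hat + 1):ncol(y)]

Sigma_y <- lag_cov(y, 0)                      # sample covariance of y_t
S_hat   <- Sigma_y %*% B1_hat %*% t(B1_hat) %*% t(Sigma_y)
mu      <- eigen(S_hat, symmetric = TRUE, only.values = TRUE)$values[1:10]
mu[2:10] / mu[1:9]                            # eigenvalue ratios, cf. Figure A2(c)
```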

Figure A1: Left: The 30-minute temperatures of the 508 Mondays in Example 4; Right: Time plots of the first difference of the observed weekly data.

For the ratio-based method of LYB, the estimated number of factors is $\hat{r} = 1$. However, the spectral densities of the first 6 transformed series $\hat{u}_{1t}, \ldots, \hat{u}_{6t}$ obtained via the eigen-analysis, shown in Figure A5, strongly suggest that $\hat{u}_{2t}$ to $\hat{u}_{6t}$ are not white noise, a contradiction with the assumptions made in LYB.

Finally, we also compare the forecast performance of the three methods. We estimate the models using data in the time span $[1, \tau]$ with $\tau = 287, \ldots, 335 - h$ for the $h$-step ahead forecast; that is, we use the data from Sundays up to Fridays to predict the temperatures of Saturdays. The forecast error is defined similarly to (4.6) in Example 3, with the dimension and the sample size replaced by those of this example. The estimated numbers of factors are quite stable across the sub-samples and, therefore, we set $\hat{r} = 6$, $6$, and $1$ for the methods of GT, BN, and LYB, respectively. Table A1 reports the 1-step to 3-step ahead forecast errors using AR(1), AR(2), and AR(3) models; the smallest error in each step is shown in boldface. From the table, we see that our method is also capable of producing accurate forecasts if we search over $K$ properly. As an illustration, the point-wise 1-step ahead forecast errors are given in Figure A6, where the benchmark values are obtained in the same way as those in Example 3. From the table, we see that the three methods all perform well and are close to each other, and they are much better than the benchmark model.
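The expanding-window exercise can be sketched in the same spirit. The snippet below, which reuses y and lag_cov from the previous sketch, is a simplified, hypothetical illustration of the GT column only: the factors are taken as $\hat{A}_1'y_t$ rather than the projected estimator of the paper, a least-squares VAR(1) without intercept is fitted to them, and a plain root mean squared error stands in for the measure (4.6) of the main article.

```r
# Simplified sketch of the expanding-window forecast comparison for the GT
# column of Table A1 (illustration only; reuses y and lag_cov from above).
h <- 1                                       # forecast horizon
fc_err <- sapply(287:(335 - h), function(tau) {
  y_sub <- y[1:tau, ]                        # data in the span [1, tau]

  # Re-estimate the loading space on the sub-sample (same steps as above).
  M_sub <- Reduce(`+`, lapply(1:5, function(k) {
    S_k <- lag_cov(y_sub, k)
    S_k %*% t(S_k)
  }))
  A1 <- eigen(M_sub, symmetric = TRUE)$vectors[, 1:6]
  x  <- y_sub %*% A1                         # simplified factor estimates

  # Least-squares VAR(1) for the 6 factors (no intercept: y_t is differenced,
  # so the factor means are close to zero), iterated h steps ahead.
  X    <- x[-nrow(x), , drop = FALSE]
  Y    <- x[-1, , drop = FALSE]
  Phi  <- solve(crossprod(X), crossprod(X, Y))
  xfut <- x[nrow(x), ]
  for (s in 1:h) xfut <- drop(xfut %*% Phi)

  yhat <- drop(A1 %*% xfut)                  # forecast of y_{tau + h}
  sqrt(mean((y[tau + h, ] - yhat)^2))        # stand-in for the measure (4.6)
})
mean(fc_err)                                 # average h-step forecast error
```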

Figure A2: (a) The first 10 eigenvalues $\hat{\gamma}_i$ of the sample covariance matrix of the series $y_t$ in Example 4; (b) the first 10 eigenvalues $\hat{\mu}_i$ of $\hat{S}$; (c) the ratios $\hat{\mu}_{i+1}/\hat{\mu}_i$ of the eigenvalues of $\hat{S}$.

Figure A3: Sample spectral densities of the 6 estimated factors $\hat{x}_{1t}, \ldots, \hat{x}_{6t}$ using the proposed methodology with $K = 10$ in Example 4.

Figure A4: Sample spectral densities of the 6 estimated factors ($\hat{x}_{1t}, \ldots, \hat{x}_{6t}$) using the principal component analysis of Bai and Ng (2002) in Example 4.

Figure A5: Sample spectral densities of the first 6 transformed series ($\hat{u}_{1t}, \ldots, \hat{u}_{6t}$) using the eigen-analysis in Example 4.

Table A1: The 1-step, 2-step, and 3-step ahead forecasting errors in Example 4 with different VAR models; standard errors are in parentheses. GT denotes our method, BN denotes the principal component analysis in Bai and Ng (2002), and LYB is the method in Lam et al. (2011). Boldface numbers denote the smallest value for each model. For GT, results are reported for $K = 1, \ldots, 10$.

Figure A6: Time plots of the 1-step ahead point-wise forecast errors using AR(1) models in Example 4 with $K = 1$. GT denotes our method, BN denotes the principal component analysis in Bai and Ng (2002), and LYB is the method in Lam et al. (2011). The benchmark forecasts are also shown.

2 Proofs of Theorems

In the proofs, we use $C$ as a generic constant whose value may change from place to place. We start with some useful lemmas. The following lemma is a theorem in Golub and Van Loan (1996); it is stated explicitly because it plays an important role in establishing our theorems. See also Johnstone and Lu (2009) and Lam et al. (2011).

Lemma 1. Suppose $A$ and $A + E$ are $n \times n$ symmetric matrices and that $Q = [Q_1, Q_2]$ ($Q_1 \in \mathbb{R}^{n \times r}$ and $Q_2 \in \mathbb{R}^{n \times (n-r)}$) is an orthogonal matrix such that $\mathrm{span}(Q_1)$ is an invariant subspace for $A$ (that is, $A\,\mathrm{span}(Q_1) \subset \mathrm{span}(Q_1)$). Partition the matrices $Q'AQ$ and $Q'EQ$ as follows:
$$Q'AQ = \begin{pmatrix} D_1 & 0 \\ 0 & D_2 \end{pmatrix} \quad \text{and} \quad Q'EQ = \begin{pmatrix} E_{11} & E_{21}' \\ E_{21} & E_{22} \end{pmatrix}.$$
If $\mathrm{sep}(D_1, D_2) = \min_{\lambda \in \lambda(D_1),\, \mu \in \lambda(D_2)} |\lambda - \mu| > 0$, where $\lambda(M)$ denotes the set of eigenvalues of the matrix $M$, and $\|E\|_2 \le \mathrm{sep}(D_1, D_2)/5$, then there exists a matrix $P \in \mathbb{R}^{(n-r)\times r}$ with
$$\|P\|_2 \le \frac{4}{\mathrm{sep}(D_1, D_2)}\|E_{21}\|_2$$
such that the columns of $\hat{Q}_1 = (Q_1 + Q_2 P)(I + P'P)^{-1/2}$ define an orthonormal basis for a subspace that is invariant for $A + E$.

For any matrix $A$, let $\sigma_i(A)$ be the $i$-th largest singular value of $A$ and $\sigma_{\min}(A)$ be the minimum non-zero singular value. We provide some well-known and useful inequalities in the following lemma. Lemma 2(i)-(ii) can be found in, for example, Golub and Van Loan (1996) and Bernstein (2009).

Lemma 2. (i) Let $A, B \in \mathbb{R}^{m \times n}$. For $i = 1, \ldots, \min\{m, n\}$,
$$(\sigma_i(A) - \sigma_1(B))_+ \le \sigma_i(A + B) \le \sigma_i(A) + \sigma_1(B),$$
where $x_+ = x$ if $x > 0$ and $0$ otherwise.
(ii) Let $A \in \mathbb{R}^{n \times m}$ and $B \in \mathbb{R}^{m \times l}$. Then
$$\sigma_m(A)\,\sigma_{\min\{n,m,l\}}(B) \le \sigma_{\min\{n,m,l\}}(AB) \le \sigma_1(A)\,\sigma_{\min\{n,m,l\}}(B).$$
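For readers who want a quick sanity check, the singular-value inequalities in Lemma 2 can be verified numerically on random matrices, as in the short R sketch below (purely illustrative).

```r
# Quick numerical check of the singular-value inequalities in Lemma 2
# on random matrices (illustrative only).
set.seed(1)

# Lemma 2(i): (sigma_i(A) - sigma_1(B))_+ <= sigma_i(A + B) <= sigma_i(A) + sigma_1(B).
m <- 8; n <- 6
A <- matrix(rnorm(m * n), m, n)
B <- matrix(rnorm(m * n), m, n)
sA  <- svd(A)$d
sB1 <- svd(B)$d[1]
sAB <- svd(A + B)$d
all(pmax(sA - sB1, 0) <= sAB & sAB <= sA + sB1)        # TRUE

# Lemma 2(ii): sigma_m(A) sigma_q(B) <= sigma_q(AB) <= sigma_1(A) sigma_q(B),
# with A an n x m matrix, B an m x l matrix, and q = min(n, m, l).
n <- 8; m <- 6; l <- 5
A <- matrix(rnorm(n * m), n, m)
B <- matrix(rnorm(m * l), m, l)
q <- min(n, m, l)
sA  <- svd(A)$d            # singular values of A: sigma_1 >= ... >= sigma_m
sB  <- svd(B)$d
sAB <- svd(A %*% B)$d
(sA[m] * sB[q] <= sAB[q]) && (sAB[q] <= sA[1] * sB[q]) # TRUE
```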

Proof of Theorem 1. As $p$ is finite, we have the following facts: $\|\Sigma_y(k)\|_2 \le C$ and $\sigma_r(\Sigma_y(k)) \ge C > 0$ for $1 \le k \le k_0$, and therefore $\lambda_r(M) \ge C > 0$. Let $\hat\sigma_{ij}(k)$ and $\sigma_{ij}(k)$ be the $(i,j)$-th elements of $\hat\Sigma_y(k)$ and $\Sigma_y(k)$, respectively. Then,
$$\hat\sigma_{i,j}(k) - \sigma_{i,j}(k) = \frac{1}{n}\sum_{t=1}^{n-k}\left\{ y_{i,t+k}y_{j,t} - E(y_{i,t+k}y_{j,t}) \right\} - \frac{\bar y_{j,\cdot}}{n}\sum_{t=1}^{n-k} y_{i,t+k} - \frac{\bar y_{i,\cdot}}{n}\sum_{t=1}^{n-k} y_{j,t} + \frac{n-k}{n}\bar y_{i,\cdot}\bar y_{j,\cdot} - \frac{k}{n}E(y_{i,t+k}y_{j,t}) = I_1 + I_2 + I_3 + I_4 + I_5, \quad (A.1)$$
where $\bar y_{i,\cdot} = n^{-1}\sum_{t=1}^n y_{i,t}$ and $\bar y_{j,\cdot} = n^{-1}\sum_{t=1}^n y_{j,t}$. By Assumptions 1-2 and Proposition 2.5 of Fan and Yao (2003),
$$E\left|\frac{1}{n}\sum_{t=1}^{n-k}\{y_{i,t+k}y_{j,t} - E(y_{i,t+k}y_{j,t})\}\right|^2 = \frac{1}{n^2}\sum_{t=1}^{n-k} E[\{y_{i,t+k}y_{j,t} - E(y_{i,t+k}y_{j,t})\}^2] + \frac{1}{n^2}\sum_{t_1 \neq t_2} E[\{y_{i,t_1+k}y_{j,t_1} - E(y_{i,t_1+k}y_{j,t_1})\}\{y_{i,t_2+k}y_{j,t_2} - E(y_{i,t_2+k}y_{j,t_2})\}] \le \frac{C}{n} + \frac{C}{n^2}\sum_{t_1 \neq t_2}\alpha(|t_1 - t_2|)^{1-2/\gamma} \le \frac{C}{n} + \frac{C}{n}\sum_{u=1}^{n}\alpha(u)^{1-2/\gamma} \le \frac{C}{n}. \quad (A.2)$$
Thus, $I_1 = O_p(n^{-1/2})$. By a similar argument, we have $I_2 = O_p(n^{-1})$, $I_3 = O_p(n^{-1})$, $I_4 = O_p(n^{-1})$, and $I_5 = O_p(n^{-1})$. Therefore, $\|\hat\Sigma_y(k) - \Sigma_y(k)\|_2 \le \|\hat\Sigma_y(k) - \Sigma_y(k)\|_F = O_p(n^{-1/2})$, and
$$\|\hat M - M\|_2 \le \sum_{k=1}^{k_0}\left\{ \|\hat\Sigma_y(k) - \Sigma_y(k)\|_2^2 + 2\|\Sigma_y(k)\|_2\,\|\hat\Sigma_y(k) - \Sigma_y(k)\|_2 \right\} = O_p(n^{-1/2}). \quad (A.3)$$
Note that
$$(A_1, B_1)'\, M\, (A_1, B_1) = \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} \quad (A.4)$$
with $\mathrm{sep}(D, 0) = \lambda_r(M)$. Letting $A = M$ and $E = \hat M - M$ in Lemma 1, there exists a

matrix $P \in \mathbb{R}^{(p-r)\times r}$ such that
$$\|P\|_2 \le \frac{4}{\mathrm{sep}(D, 0)}\|E_{21}\|_2 \le \frac{4}{\mathrm{sep}(D, 0)}\|E\|_2 = O_p(n^{-1/2}),$$
and $\hat A_1 = (A_1 + B_1 P)(I + P'P)^{-1/2}$ is an estimator for $A_1$. Then we have
$$\|\hat A_1 - A_1\|_2 = \|(A_1(I - (I + P'P)^{1/2}) + B_1 P)(I + P'P)^{-1/2}\|_2 \le \|I - (I + P'P)^{1/2}\|_2 + \|P\|_2 \le 2\|P\|_2 = O_p(n^{-1/2}).$$
Similarly, we also have $\|\hat B_1 - B_1\|_2 = O_p(n^{-1/2})$, where $B_1$ is associated with the zero eigenvalues (the null space) of $M$. To show that $\|\hat B_2 - B_2\|_2 = O_p(n^{-1/2})$, by a similar argument as above, we only need to show that $\|\hat S - S\|_2 = O_p(n^{-1/2})$. Note that
$$\|\hat\Sigma_y \hat B_1 - \Sigma_y B_1\|_2 \le \|\hat\Sigma_y - \Sigma_y\|_2\,\|\hat B_1\|_2 + \|\Sigma_y\|_2\,\|\hat B_1 - B_1\|_2 = O_p(n^{-1/2}),$$
and hence
$$\|\hat S - S\|_2 = \|\hat\Sigma_y \hat B_1 \hat B_1' \hat\Sigma_y' - \Sigma_y B_1 B_1' \Sigma_y'\|_2 = O_p(n^{-1/2}).$$
Furthermore, we observe that
$$\hat A_1 \hat x_t - A_1 x_t = \hat A_1(\hat B_2'\hat A_1)^{-1}\hat B_2' y_t - A_1 x_t = \hat A_1(\hat B_2'\hat A_1)^{-1}\hat B_2'(A_1 - \hat A_1)x_t + (\hat A_1 - A_1)x_t + \hat A_1(\hat B_2'\hat A_1)^{-1}(\hat B_2 - B_2)' A_2 e_t. \quad (A.5)$$
Thus, $\|\hat A_1 \hat x_t - A_1 x_t\|_2 = O_p(\|\hat A_1 - A_1\|_2 + \|\hat B_2 - B_2\|_2) = O_p(n^{-1/2})$. This completes the proof.

Proof of Theorem 2. We only prove the result for $\hat A_1$. Note that
$$\{D(\mathcal{M}(\hat A_1), \mathcal{M}(A_1))\}^2 = \frac{1}{r}\,\mathrm{tr}\{\hat A_1'(I_p - A_1 A_1')\hat A_1\} \le \|\hat A_1'(I_p - A_1 A_1')\hat A_1\|_2 = \|\hat A_1'(\hat A_1\hat A_1' - A_1 A_1')\hat A_1\|_2,$$
and
$$\hat A_1'(\hat A_1\hat A_1' - A_1 A_1')\hat A_1 = -\hat A_1'(\hat A_1 - A_1)(\hat A_1 - A_1)'\hat A_1 + (\hat A_1 - A_1)'(\hat A_1 - A_1),$$

which implies $\{D(\mathcal{M}(\hat A_1), \mathcal{M}(A_1))\}^2 \le 2\|\hat A_1 - A_1\|_2^2$. The conclusion follows from the above inequality and Theorem 1. This completes the proof.

Lemma 3. If Assumptions 1-6 hold, then $\|\Sigma_y(k)\|_2 = O_p(p^{1-\delta_1} + \kappa_{\max} p^{1-\delta_1/2-\delta_2/2})$ for $1 \le k \le k_0$, and $\|\Sigma_y\|_2 = O_p(p^{1-\delta_1} + p^{1-\delta_2})$.

Proof. Note that $\Sigma_y(k) = L_1\Sigma_f(k)L_1' + L_1\Sigma_{f\varepsilon}(k)L_2'$. By Assumptions 4-5, $L_1$ can be equivalently decomposed as $L_1 = U_1 D_1 V_1'$ with $U_1'U_1 = I_r$, $V_1'V_1 = I_r$, and $D_1$ a diagonal matrix whose diagonal elements are all of order $p^{(1-\delta_1)/2}$. $U_1$ is not necessarily the same as $A_1$, but $\mathcal{M}(U_1) = \mathcal{M}(A_1)$. Therefore,
$$\|\Sigma_y(k)\|_2 \le \|L_1\|_2^2\|\Sigma_f(k)\|_2 + \|L_1\|_2\|L_2\|_2\|\Sigma_{f\varepsilon}(k)\|_2 \le Cp^{1-\delta_1} + C\kappa_{\max} p^{1-\delta_1/2-\delta_2/2}.$$
The proof for $\Sigma_y$ is similar. This completes the proof.

Lemma 4. (i) If Assumptions 1-7 hold, then $\|\hat\Sigma_y(k) - \Sigma_y(k)\|_2 = O_p(p n^{-1/2})$ for $0 \le k \le k_0$, where $\hat\Sigma_y(0) = \hat\Sigma_y$ and $\Sigma_y(0) = \Sigma_y$.
(ii) If Assumptions 1-8 hold, then, for $0 \le k \le k_0$,
$$\|\hat\Sigma_y(k) - \Sigma_y(k)\|_2 = \begin{cases} O_p(\max\{p^{1-\delta_1/2}n^{-1/2},\, p^{1-\delta_2/2}n^{-1/2}\}), & \text{if } p = O(n), \\ O_p(\max\{p^{1-\delta_1/2}n^{-1/2},\, p^{1-\delta_2/2}n^{-1/2},\, p n^{-1}\}), & \text{if } n = O(p). \end{cases}$$

In particular, if $p^{\delta_1/2}n^{-1/2} = o(1)$ and $p^{\delta_2/2}n^{-1/2} = o(1)$,
$$\|\hat\Sigma_y(k) - \Sigma_y(k)\|_2 = \begin{cases} O_p(p^{1-\delta_1/2}n^{-1/2}), & \text{if } \delta_1 \le \delta_2, \\ O_p(p^{1-\delta_2/2}n^{-1/2}), & \text{if } \delta_1 > \delta_2. \end{cases}$$

Proof. We only show the result for $1 \le k \le k_0$, since the case of $k = 0$ can be handled in a similar way.
(i) Let $A_2 = (A_{21}, A_{22})$, $D_2 = \mathrm{diag}(D_{21}, D_{22})$ and $V_2 = (V_{21}, V_{22})$ with $A_{21} \in \mathbb{R}^{p\times K}$, $A_{22} \in \mathbb{R}^{p\times(v-K)}$, $V_{21} \in \mathbb{R}^{v\times K}$, $V_{22} \in \mathbb{R}^{v\times(v-K)}$, $D_{21} = \mathrm{diag}(d_1, \ldots, d_K)$ and $D_{22} = \mathrm{diag}(d_{K+1}, \ldots, d_v)$. Then $L_2 = A_{21}D_{21}V_{21}' + A_{22}D_{22}V_{22}'$ and Model (2.1) becomes
$$y_t = L_1 f_t + A_{21}D_{21} z_t + A_{22}D_{22}V_{22}'\varepsilon_t,$$
where $z_t = (z_{1t}, \ldots, z_{Kt})' := V_{21}'\varepsilon_t \in \mathbb{R}^K$ with $E|z_{it}|^{2\gamma} \le C$. It follows from the above equation that
$$\hat\Sigma_y(k) = L_1\hat\Sigma_f(k)L_1' + L_1\hat\Sigma_{fz}(k)D_{21}A_{21}' + L_1\hat\Sigma_{f\varepsilon}(k)V_{22}D_{22}A_{22}' + A_{21}D_{21}\hat\Sigma_{zf}(k)L_1' + A_{21}D_{21}\hat\Sigma_z(k)D_{21}A_{21}' + A_{21}D_{21}\hat\Sigma_{z\varepsilon}(k)V_{22}D_{22}A_{22}' + A_{22}D_{22}V_{22}'\hat\Sigma_{\varepsilon f}(k)L_1' + A_{22}D_{22}V_{22}'\hat\Sigma_{\varepsilon z}(k)D_{21}A_{21}' + A_{22}D_{22}V_{22}'\hat\Sigma_\varepsilon(k)V_{22}D_{22}A_{22}' = J_1 + \cdots + J_9, \quad (A.6)$$
and $\Sigma_y(k) = L_1\Sigma_f(k)L_1' + L_1\Sigma_{fz}(k)D_{21}A_{21}' + L_1\Sigma_{f\varepsilon}(k)V_{22}D_{22}A_{22}'$. Note that
$$\|J_1 - L_1\Sigma_f(k)L_1'\|_2 \le \|L_1\|_2^2\|\hat\Sigma_f(k) - \Sigma_f(k)\|_2 = O_p(p^{1-\delta_1}n^{-1/2}),$$
$$\|J_2 - L_1\Sigma_{fz}(k)D_{21}A_{21}'\|_2 \le \|L_1\|_2\|D_{21}\|_2\|\hat\Sigma_{fz}(k) - \Sigma_{fz}(k)\|_2 = O_p(p^{1-\delta_1/2-\delta_2/2}n^{-1/2}),$$
$$\|J_3 - L_1\Sigma_{f\varepsilon}(k)V_{22}D_{22}A_{22}'\|_2 \le C\|L_1\|_2\|\hat\Sigma_{f\varepsilon}(k) - \Sigma_{f\varepsilon}(k)\|_2 = O_p(p^{1-\delta_1/2}n^{-1/2}).$$
Without further assumptions on $\varepsilon_t$, we can show that
$$\|J_4\|_2 = O_p(p^{1-\delta_1/2-\delta_2/2}n^{-1/2}), \quad \|J_5\|_2 = O_p(p^{1-\delta_2}n^{-1/2}), \quad \|J_6\|_2 = O_p(p^{1-\delta_2/2}n^{-1/2}),$$

$$\|J_7\|_2 = O_p(p^{1-\delta_1/2}n^{-1/2}), \quad \|J_8\|_2 = O_p(p^{1-\delta_2/2}n^{-1/2}), \quad \|J_9\|_2 = O_p(p n^{-1/2}).$$
Therefore,
$$\|\hat\Sigma_y(k) - \Sigma_y(k)\|_2 = O_p(\|\hat\Sigma_\varepsilon(k)\|_2) = O_p(p n^{-1/2}). \quad (A.7)$$
(ii) On the other hand, if Assumption 8 holds for $\varepsilon_t$, then by the covariance estimation theorem in Vershynin (2018),
$$\|\hat\Sigma_\varepsilon - I_v\|_2 = O_p(\sqrt{p/n} + p/n), \quad (A.8)$$
and hence $\|\hat\Sigma_\varepsilon\|_2 \le 1 + O_p(\sqrt{p/n} + p/n)$. Thus,
$$\|\hat\Sigma_\varepsilon(k)\|_2 \le \|\hat\Sigma_\varepsilon\|_2 = \begin{cases} O_p(1), & \text{if } p = O(n), \\ O_p(p n^{-1}), & \text{if } n = O(p), \end{cases}$$
where the first inequality above can be found in the proof of Theorem 2 in Lam et al. (2011). Therefore, if $p = O(n)$, we can further show that
$$\|J_6\|_2 = O_p(\min\{p^{1-\delta_2/2}n^{-1/2},\, p^{(1-\delta_2)/2}\}) = O_p(p^{1-\delta_2/2}n^{-1/2}),$$
$$\|J_8\|_2 = O_p(\min\{p^{1-\delta_2/2}n^{-1/2},\, p^{(1-\delta_2)/2}\}) = O_p(p^{1-\delta_2/2}n^{-1/2}), \quad \|J_9\|_2 = O_p(1).$$
Gathering all the rates of $J_1, \ldots, J_9$ together, we obtain
$$\|\hat\Sigma_y(k) - \Sigma_y(k)\|_2 = O_p(\max\{p^{1-\delta_1/2}n^{-1/2},\, p^{1-\delta_2/2}n^{-1/2}\}).$$
Similarly, if $n = O(p)$, we have
$$\|J_5\|_2 = O_p(\min\{p^{1-\delta_2}n^{-1/2},\, p^{2-\delta_2}n^{-1}\}) = O_p(p^{1-\delta_2}n^{-1/2}),$$
$$\|J_6\|_2 = O_p(\min\{p^{1-\delta_2/2}n^{-1/2},\, p^{3/2-\delta_2/2}n^{-1}\}) = O_p(p^{1-\delta_2/2}n^{-1/2}),$$
$$\|J_8\|_2 = O_p(\min\{p^{1-\delta_2/2}n^{-1/2},\, p^{3/2-\delta_2/2}n^{-1}\}) = O_p(p^{1-\delta_2/2}n^{-1/2}), \quad \|J_9\|_2 = O_p(p n^{-1}).$$
Then, gathering all the rates of $J_1, \ldots, J_9$ together, we obtain
$$\|\hat\Sigma_y(k) - \Sigma_y(k)\|_2 = O_p(\max\{p^{1-\delta_1/2}n^{-1/2},\, p^{1-\delta_2/2}n^{-1/2},\, p n^{-1}\}).$$
This completes the proof.
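The operator-norm rate in (A.8) is easy to visualize by simulation. The sketch below, which is purely illustrative, generates i.i.d. standard normal (hence sub-Gaussian) data and compares the error $\|\hat\Sigma_\varepsilon - I\|_2$ with $\sqrt{p/n} + p/n$ over a small grid of dimensions.

```r
# Illustration of the operator-norm rate in (A.8): for i.i.d. sub-Gaussian
# (here standard normal) data, the sample covariance satisfies
# || Sigma_hat - I ||_2 = O_p( sqrt(p/n) + p/n ).  Illustrative only.
set.seed(123)
n <- 400
op_err <- function(p) {
  X <- matrix(rnorm(n * p), n, p)
  S <- crossprod(X) / n                       # sample covariance (mean zero)
  max(abs(eigen(S - diag(p), symmetric = TRUE, only.values = TRUE)$values))
}
p_grid <- c(50, 100, 200, 400, 800)
cbind(p     = p_grid,
      error = sapply(p_grid, op_err),
      bound = sqrt(p_grid / n) + p_grid / n)  # error tracks the bound up to a constant
```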

Lemma 5. (i) Let Assumptions 1-7 hold. If either $p^{\delta_1}n^{-1/2} = o(1)$ or $\kappa_{\max}^{-1}p^{\delta_1/2+\delta_2/2}n^{-1/2} = o(1)$, then
$$\|\hat M - M\|_2 = O_p(p^{2-\delta_1}n^{-1/2} + \kappa_{\max}p^{2-\delta_1/2-\delta_2/2}n^{-1/2}).$$
(ii) Let Assumptions 1-8 hold, $p^{\delta_1/2}n^{-1/2} = o(1)$ and $p^{\delta_2/2}n^{-1/2} = o(1)$. If $\delta_1 \le \delta_2$, then
$$\|\hat M - M\|_2 = O_p(p^{2-3\delta_1/2}n^{-1/2} + \kappa_{\max}p^{2-\delta_1-\delta_2/2}n^{-1/2}).$$
If $\delta_1 > \delta_2$, then
$$\|\hat M - M\|_2 = \begin{cases} O_p(p^{2-\delta_2}n^{-1} + p^{2-\delta_1-\delta_2/2}n^{-1/2}), & \text{if } \kappa_{\max} = 0, \\ O_p(\kappa_{\max}p^{2-\delta_1/2-\delta_2}n^{-1/2}), & \text{if } \kappa_{\max} \gg 0. \end{cases}$$

Proof. (i) By (A.3) and Lemmas 3 and 4,
$$\|\hat M - M\|_2 \le Cp^2 n^{-1} + Cpn^{-1/2}(p^{1-\delta_1} + \kappa_{\max}p^{1-\delta_1/2-\delta_2/2}) \le Cp^2 n^{-1} + Cp^{2-\delta_1}n^{-1/2} + C\kappa_{\max}p^{2-\delta_1/2-\delta_2/2}n^{-1/2} \le Cp^{2-\delta_1}n^{-1/2} + C\kappa_{\max}p^{2-\delta_1/2-\delta_2/2}n^{-1/2},$$
where we assume either $p^{\delta_1}n^{-1/2} = o(1)$ or $\kappa_{\max}^{-1}p^{\delta_1/2+\delta_2/2}n^{-1/2} = o(1)$.
(ii) If $\delta_1 \le \delta_2$, by Lemmas 3 and 4(ii) and a similar argument as above,
$$\|\hat M - M\|_2 \le Cp^{2-\delta_1}n^{-1} + Cp^{2-3\delta_1/2}n^{-1/2} + C\kappa_{\max}p^{2-\delta_1-\delta_2/2}n^{-1/2} \le Cp^{2-3\delta_1/2}n^{-1/2} + C\kappa_{\max}p^{2-\delta_1-\delta_2/2}n^{-1/2},$$
where we assume $p^{\delta_1/2}n^{-1/2} = o(1)$. If $\delta_1 > \delta_2$,
$$\|\hat M - M\|_2 \le Cp^{2-\delta_2}n^{-1} + Cp^{2-\delta_1-\delta_2/2}n^{-1/2} + C\kappa_{\max}p^{2-\delta_1/2-\delta_2}n^{-1/2}.$$
If $\kappa_{\max} = 0$, i.e., $f_t$ and $\varepsilon_s$ are independent for all $t$ and $s$, then
$$\|\hat M - M\|_2 \le Cp^{2-\delta_2}n^{-1} + Cp^{2-\delta_1-\delta_2/2}n^{-1/2}.$$

If $\kappa_{\max} \gg 0$, i.e., $f_t$ and $\varepsilon_{t-j}$ are correlated for some $j > 0$, then
$$\|\hat M - M\|_2 \le C\kappa_{\max}p^{2-\delta_1/2-\delta_2}n^{-1/2},$$
under the assumption that $p^{\delta_1/2}n^{-1/2} = o(1)$. This completes the proof.

Lemma 6. If Assumptions 1-7 hold, then
$$\lambda_{\min}(M) \ge \begin{cases} Cp^{2(1-\delta_1)}, & \text{if } \kappa_{\max}p^{\delta_1/2-\delta_2/2} = o(1), \\ C\kappa_{\min}^2 p^{2-\delta_1-\delta_2}, & \text{if } r \le K \text{ and } \kappa_{\min}^{-1}p^{\delta_2/2-\delta_1/2} = o(1), \\ C\kappa_{\min}^2 p^{1-\delta_1}, & \text{if } r > K \text{ and } \kappa_{\min}^{-1}p^{(1-\delta_1)/2} = o(1), \end{cases} \quad (A.9)$$
where $\kappa_{\min}$ and $\kappa_{\max}$ are defined in Assumption 6 and $K$ is given in Assumption 5.

Proof. Note that
$$M = \sum_{k=1}^{k_0} [A_1\Sigma_x(k)A_1' + A_1\Sigma_{xe}(k)A_2'][A_1\Sigma_x(k)A_1' + A_1\Sigma_{xe}(k)A_2']'.$$
By Weyl's inequality, for any $1 \le k \le k_0$,
$$\lambda_{\min}(M) \ge \lambda_{\min}\{[A_1\Sigma_x(k)A_1' + A_1\Sigma_{xe}(k)A_2'][A_1\Sigma_x(k)A_1' + A_1\Sigma_{xe}(k)A_2']'\} = \{\sigma_r(A_1\Sigma_x(k)A_1' + A_1\Sigma_{xe}(k)A_2')\}^2.$$
In addition, we have $\sigma_1(A_1\Sigma_x(k)A_1') \asymp \sigma_r(A_1\Sigma_x(k)A_1') \asymp p^{1-\delta_1}$, $\sigma_1(A_1\Sigma_{xe}(k)A_2') = O_p(\kappa_{\max}p^{1-\delta_1/2-\delta_2/2})$, and, by Lemma 2(ii),
$$\sigma_r(A_1\Sigma_{xe}(k)A_2') \ge \begin{cases} C\kappa_{\min}p^{1-\delta_1/2-\delta_2/2}, & \text{if } r \le K, \\ C\kappa_{\min}p^{(1-\delta_1)/2}, & \text{if } r > K. \end{cases}$$
Combining these bounds via Lemma 2(i), we obtain (A.9). This completes the proof.

Lemma 7. If Assumptions 1-7 hold, then $\lambda_K(S) \ge Cp^{2-2\delta_2}$.

Proof. Note that $\Sigma_y B_1 = A_2\Sigma_e A_2' B_1 = A_2 D_2^2 A_2' B_1$. By Assumption 7(ii) and Lemma 2(ii), $\sigma_K(A_2 D_2^2 A_2' B_1) \ge Cp^{1-\delta_2}$, and hence $\lambda_K(S) = \sigma_K(\Sigma_y B_1)^2 \ge Cp^{2-2\delta_2}$. This completes the proof.

Lemma 8. (i) Let Assumptions 1-7 hold. If $p^{\delta_1}n^{-1/2} = o(1)$ or $p^{\delta_2}n^{-1/2} = o(1)$, then
$$\|\hat S - S\|_2 = O_p(p^{2-\delta_1}n^{-1/2} + p^{2-\delta_2}n^{-1/2} + (p^{2-2\delta_1} + p^{2-2\delta_2})\|\hat B_1 - B_1\|_2).$$
(ii) Let Assumptions 1-8 hold, $p^{\delta_1/2}n^{-1/2} = o(1)$ and $p^{\delta_2/2}n^{-1/2} = o(1)$. If $\delta_1 \le \delta_2$, then
$$\|\hat S - S\|_2 \le Cp^{2-3\delta_1/2}n^{-1/2} + Cp^{2-2\delta_1}\|\hat B_1 - B_1\|_2.$$
If $\delta_1 > \delta_2$, then
$$\|\hat S - S\|_2 \le Cp^{2-3\delta_2/2}n^{-1/2} + Cp^{2-2\delta_2}\|\hat B_1 - B_1\|_2.$$

Proof. (i) We first note that
$$\|\hat\Sigma_y \hat B_1 - \Sigma_y B_1\|_2 \le \|\hat\Sigma_y - \Sigma_y\|_2 + \|\Sigma_y\|_2\|\hat B_1 - B_1\|_2,$$
and hence
$$\|\hat\Sigma_y \hat B_1 - \Sigma_y B_1\|_2^2 \le 2\|\hat\Sigma_y - \Sigma_y\|_2^2 + 2\|\Sigma_y\|_2^2\|\hat B_1 - B_1\|_2^2.$$
Therefore,
$$\|\hat S - S\|_2 \le \|\hat\Sigma_y \hat B_1 - \Sigma_y B_1\|_2^2 + 2\|\Sigma_y B_1\|_2\,\|\hat\Sigma_y \hat B_1 - \Sigma_y B_1\|_2 \le C\|\hat\Sigma_y - \Sigma_y\|_2^2 + C\|\Sigma_y\|_2\|\hat\Sigma_y - \Sigma_y\|_2 + C\|\Sigma_y\|_2^2\|\hat B_1 - B_1\|_2 =: R_1 + R_2 + R_3, \quad (A.10)$$

where we assume $\|\hat B_1 - B_1\|_2 = o_p(1)$ as $n \to \infty$. By Lemmas 3 and 4(i),
$$R_1 \le Cp^2 n^{-1}, \quad R_2 \le C(pn^{-1/2})(p^{1-\delta_1} + p^{1-\delta_2}) \le Cp^{2-\delta_1}n^{-1/2} + Cp^{2-\delta_2}n^{-1/2},$$
and
$$R_3 \le C(p^{2-2\delta_1} + p^{2-2\delta_2})\|\hat B_1 - B_1\|_2.$$
The result follows from the assumption that $p^{\delta_1}n^{-1/2} = o(1)$ or $p^{\delta_2}n^{-1/2} = o(1)$.
(ii) When $p^{\delta_1/2}n^{-1/2} = o(1)$ and $p^{\delta_2/2}n^{-1/2} = o(1)$, if $\delta_1 \le \delta_2$, by Lemma 4(ii),
$$R_1 \le Cp^{2-\delta_1}n^{-1}, \quad R_2 \le Cp^{2-3\delta_1/2}n^{-1/2}, \quad R_3 \le Cp^{2-2\delta_1}\|\hat B_1 - B_1\|_2.$$
Then $\|\hat S - S\|_2 \le Cp^{2-3\delta_1/2}n^{-1/2} + Cp^{2-2\delta_1}\|\hat B_1 - B_1\|_2$. If $\delta_1 > \delta_2$,
$$R_1 \le Cp^{2-\delta_2}n^{-1}, \quad R_2 \le Cp^{2-3\delta_2/2}n^{-1/2}, \quad R_3 \le Cp^{2-2\delta_2}\|\hat B_1 - B_1\|_2.$$
Then $\|\hat S - S\|_2 \le Cp^{2-3\delta_2/2}n^{-1/2} + Cp^{2-2\delta_2}\|\hat B_1 - B_1\|_2$. This completes the proof.

Proof of Theorem 3. Letting $A = M$ and $E = \hat M - M$ in Lemma 1, we can obtain
$$\|\hat A_1 - A_1\|_2 \le \frac{\|\hat M - M\|_2}{\lambda_{\min}(M)}, \quad \|\hat B_1 - B_1\|_2 \le \frac{\|\hat M - M\|_2}{\lambda_{\min}(M)}, \quad \text{and} \quad \|\hat B_2 - B_2\|_2 \le \frac{\|\hat S - S\|_2}{\lambda_K(S)}.$$
Theorem 3 can then be shown by an elementary argument based on Lemmas 5-8; we omit the details. This completes the proof.

Proof of Theorem 4. The proof is similar to that of Theorem 3.
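The eigen-space perturbation bound used above can be illustrated numerically: the sketch below builds a rank-$r$ symmetric matrix $M$ with a known leading eigen-space, perturbs it by a small symmetric $E$, and compares the subspace discrepancy of the leading eigenvectors of $M + E$ with $\|E\|_2/\lambda_{\min}(M)$. The construction is purely illustrative.

```r
# Numerical illustration of the eigen-space perturbation bound behind the
# proof of Theorem 3: the leading eigen-space of M + E deviates from that of
# M by (roughly) at most a constant multiple of ||E||_2 / lambda_min(M),
# where lambda_min(M) is the smallest nonzero eigenvalue of M.
set.seed(7)
p <- 50; r <- 3
Q  <- qr.Q(qr(matrix(rnorm(p * p), p, p)))     # random p x p orthogonal matrix
A1 <- Q[, 1:r]                                 # true r-dimensional eigen-space
M  <- A1 %*% diag(c(10, 8, 6)) %*% t(A1)       # rank-r symmetric M, lambda_min = 6

G <- matrix(rnorm(p * p), p, p)
E <- 0.05 * (G + t(G)) / 2                     # small symmetric perturbation
A1_hat <- eigen(M + E, symmetric = TRUE)$vectors[, 1:r]

# Subspace discrepancy D(M(A1_hat), M(A1)) as in the proof of Theorem 2.
D_sub  <- sqrt(sum(diag(t(A1_hat) %*% (diag(p) - A1 %*% t(A1)) %*% A1_hat)) / r)
E_norm <- max(abs(eigen(E, symmetric = TRUE, only.values = TRUE)$values))

c(discrepancy = D_sub, perturbation_ratio = E_norm / 6)   # 6 = lambda_min(M)
```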

Proof of Theorem 5. There exists $R \in \mathbb{R}^{(p-K)\times r}$ with $R'R = I_r$ such that $\hat B_2 = B_2 R$. By (A.5),
$$p^{-1/2}\|\hat A_1\hat x_t - A_1 x_t\|_2 \le p^{-1/2}\|(A_1 - \hat A_1)x_t\|_2 + p^{-1/2}\|\hat B_2' A_2 e_t\|_2 \le p^{-1/2}\|(A_1 - \hat A_1)x_t\|_2 + p^{-1/2}\|R'B_2' A_{21}D_{21}z_t\|_2 + p^{-1/2}\|R'B_2' A_{22}D_{22}V_{22}'\varepsilon_t\|_2 = N_1 + N_2 + N_3. \quad (A.11)$$
Note that
$$N_1 \le p^{-1/2}\|\hat A_1 - A_1\|_2\|x_t\|_2 \le Cp^{-\delta_1/2}\|\hat A_1 - A_1\|_2, \quad N_2 \le Cp^{-1/2}\|D_{21}\|_2\|\hat B_2 - B_2\|_2 \le Cp^{-\delta_2/2}\|\hat B_2 - B_2\|_2.$$
Letting $R = (r_1, \ldots, r_r)$ and $\xi_t = B_2' A_{22}D_{22}V_{22}'\varepsilon_t$, we have $\mathrm{Var}(r_i'\xi_t) = r_i' B_2' A_{22}D_{22}^2 A_{22}' B_2 r_i = O(1)$, and hence $r_i'\xi_t = O_p(1)$. Therefore,
$$N_3 \le p^{-1/2}\left\{\sum_{i=1}^r (r_i'\xi_t)^2\right\}^{1/2} = O_p(p^{-1/2}).$$
Theorem 5 follows from the rates of $N_1$, $N_2$, and $N_3$. This completes the proof.

Proof of Theorem 6. (i) The analysis of the power can be found in Chang et al. (2017), and we only need to show the consistency of the test. Let $u_t = G'y_t$ as in Section 2.3, and let $u_{it}$ be the $i$-th element of $u_t$. By the proof of Theorem 3 in Chang et al. (2017), we only need to show that
$$\frac{1}{n}\sum_{t=1}^n (\hat u_{it} - u_{it})^2 = o_p(1), \quad \text{for } r+1 \le i \le p. \quad (A.12)$$
Equivalently, (A.12) can be expressed as
$$\frac{1}{n}\sum_{t=1}^n (\hat b_j' y_t - b_j' y_t)^2 = o_p(1), \quad \text{for } 1 \le j \le v.$$
Since $\|\hat b_j - b_j\|_2 \le \|\hat B_1 - B_1\|_2$, we need to guarantee that $\|\hat B_1 - B_1\|_2^2\,\|\Sigma_y\|_2 = o_p(1)$, which is the condition stated in Theorem 6(i).

(ii) To show the consistency of the test statistic $T(m)$ in Tsay (2018), a sufficient condition is the consistency of the sample covariance matrix of $\hat B_1' y_t$ used in the PCA, i.e., $\|\hat B_1'\hat\Sigma_y\hat B_1 - B_1'\Sigma_y B_1\|_2 = o_p(1)$. Note that
$$\|\hat B_1'\hat\Sigma_y\hat B_1 - B_1'\Sigma_y B_1\|_2 \le C\|\hat B_1 - B_1\|_2\|\Sigma_y\|_2 + C\|\hat\Sigma_y - \Sigma_y\|_2,$$
and therefore we only require the upper bound on the right-hand side of the above inequality to be $o_p(1)$. The power analysis is standard since the limiting distribution is based on standard extreme-value theory, and we omit the argument here. This completes the proof.

References

Bai, J., and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, 70.

Bernstein, D. S. (2009). Matrix Mathematics: Theory, Facts, and Formulas. Princeton University Press.

Chang, J., Yao, Q., and Zhou, W. (2017). Testing for high-dimensional white noise using maximum cross-correlations. Biometrika, 104(1).

Fan, J., and Yao, Q. (2003). Nonlinear Time Series: Nonparametric and Parametric Methods. Springer-Verlag, New York.

Golub, G. H., and Van Loan, C. F. (1996). Matrix Computations. Johns Hopkins University Press.

Johnstone, I. M., and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association, 104(486).

Lam, C., Yao, Q., and Bathia, N. (2011). Estimation of latent factors for high-dimensional time series. Biometrika, 98.

Shang, H. L., and Hyndman, R. J. (2011). fds: Functional Data Sets Package in R. Vienna, Austria: R Development Core Team.

Tsay, R. S. (2018). Testing for serial correlations in high-dimensional time series via extreme value theory. Manuscript, University of Chicago.

Vershynin, R. (2018). High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press.
