
Statistica Sinica: Supplement

IMPUTED FACTOR REGRESSION FOR HIGH-DIMENSIONAL BLOCK-WISE MISSING DATA

Yanqing Zhang, Niansheng Tang and Annie Qu

Yunnan University and University of Illinois at Urbana-Champaign

Supplementary Material

S1 Assumptions for the theoretical properties of the proposed method

Let $\|A\| = \{\mathrm{tr}(A'A)\}^{1/2}$ denote the norm of a matrix $A$, and let $C$ denote a generic positive constant. The following assumptions are needed to establish the theoretical properties in Section 3.3.

(C1) Factors: $E\|w_t\|^4 \leq C < \infty$, $n_1^{-1}\sum_{t=1}^{n_1} W_{(1)t}W_{(1)t}' \to_p \Sigma_w$ and $n^{-1}\sum_{t=1}^{n} w_tw_t' \to_p \Sigma_w$ for some $r \times r$ positive definite matrix $\Sigma_w$, where $W_{(1)t}$ is the $r \times 1$ vector given by the $t$th row of $W_{(1)} \in R^{n_1 \times r}$.

(C2) Factor loadings: $\|\lambda_i\| \leq \bar{\lambda} < \infty$ and $\|\Lambda'\Lambda/q - \Sigma_{\Lambda}\| \to 0$ for some positive finite value $\bar{\lambda}$ and some $r \times r$ positive definite matrix $\Sigma_{\Lambda}$.

(C3) Weak dependence: (i) $E(e_{ti}) = 0$, $E|e_{ti}|^8 \leq C$; (ii) $E(e_{ti}e_{lj}) = \tau_{ij}$

if $t = l$ and $0$ otherwise, $q^{-1}\sum_{i=1}^q \tau_{ii} \leq C$, $q^{-1}\sum_{i=1}^q\sum_{j=1}^q |\tau_{ij}| \leq C$, $\sum_{j=1}^q |\tau_{ij}| \leq C$, $q^{-1}\sum_{i=1}^q\sum_{j=1}^q \tau_{ij}^2 \leq C$; (iii) $q^{-1}\sum_{i=1}^q\sum_{j=1}^q |E(e_{ti}^2e_{tj}^2) - \tau_{ij}^2| \leq C$; (iv) $E(\|q^{-1/2}\sum_{i=1}^q \{e_{ti}e_{li} - E(e_{ti}e_{li})\}\|^4) \leq C$ for every $(t, l)$.

(C4) $E\{q^{-1}\sum_{i=1}^q \|n_1^{-1/2}\sum_{t=1}^{n_1} W_{(1)t}e_{(1)ti}\|^2\} \leq C$.

(C5) For any $t = 1, \ldots, n_1$, $E(\|(n_1q)^{-1/2}\sum_{i=1}^q\sum_{t=1}^{n_1} W_{(1)t}\lambda_i'e_{(1)ti}\|^2) \leq C$ and $E(\|(n_1q)^{-1/2}\sum_{i=1}^q\sum_{l=1}^{n_1} W_{(1)l}[e_{(1)li}e_{(1)ti} - E\{e_{(1)li}e_{(1)ti}\}]\|^2) \leq C$.

(C6) Predicting model: $E(\varepsilon_t) = 0$, $E(\varepsilon_t^2) \leq C < \infty$, $n^{-1}\sum_t w_t\varepsilon_t \to_p 0$ and $\|\alpha\| < \infty$.

Assumption (C1) is standard for factor models and allows the components of the factor variables to be correlated. Assumption (C2) ensures that each factor makes a nontrivial contribution to the variance of $z_t$. Here we consider only non-random factor loadings, for simplicity. Assumption (C3) is the weak-correlation assumption. Given Assumption (C3)(i), the remaining parts of (C3) are satisfied if the $e_{ti}$'s are independent over $i$. Assumptions (C3)(iii) and (iv) imply that the fourth and the eighth moments are bounded, respectively; thus, the proposed method is applicable to sub-Gaussian cases. The condition $\sum_{j=1}^q |\tau_{ij}| \leq C$ for all $i$ in Assumption (C3)(ii) implies that the eigenvalues of the covariance matrix of the random error $e_i$ are bounded, since the largest eigenvalue of the covariance matrix is bounded by

$\max_i \sum_{j=1}^q |\tau_{ij}|$. Assumption (C4) allows weak dependence between the factors and the random errors. When the factors and errors are independent, which is a standard assumption for conventional factor models, Assumption (C4) is implied by Assumptions (C1) and (C3), although independence is not required for (C4) to hold. Assumption (C5) is not restrictive, since the sums involved in (C5) are of zero-mean random variables. Assumption (C6) is a standard set of conditions implying consistency of the ordinary least squares estimator in the predicting model.

S2 The accuracy of screening for high-dimensional block-wise missing data

Let $M_o$ be the collection of indices of the nonzero parameters in the true sparse model $y = \sum_{l=1}^p X_l\beta_l + \varepsilon$, and let $p_o$ be the number of elements in $M_o$. Define $\Sigma = \mathrm{cov}(X)$ and $\tilde{X}_o^{(i)} = X_o^{(i)}\Sigma_{(i)}^{-1/2}$ for $i = 1, \ldots, K$, where $X_o^{(i)}$ is the $n_{o(i)} \times s_i$ matrix of observed values from the $i$th data source and $\Sigma_{(i)}$ is the corresponding submatrix of $\Sigma$. Thus, the covariance matrix of the transformed matrix $\tilde{X}_o^{(i)}$ is $I_{s_i}$. The following assumptions are needed for the accuracy of screening.

(A1) $\mathrm{Var}(y) = O(1)$, and for some $\kappa \geq 0$ and $c_1, c_2 > 0$,
$$\min_{l \in M_o} |\beta_l| \geq c_1 \min_{1\leq i\leq K} n_{o(i)}^{-\kappa} \quad\text{and}\quad \min_{l \in M_o} |\mathrm{cov}(\beta_l^{-1}y, X_l)| \geq c_2.$$

(A2) For $i = 1, \ldots, K$, $\tilde{X}_o^{(i)}$ has a continuous and spherically symmetric distribution, and there exist some $c_3 > 1$ and $C_1 > 0$ such that the deviation inequality
$$\Pr\{\lambda_{\max}(s_i^{-1}\tilde{X}_o^{(i)}\tilde{X}_o^{(i)\prime}) > c_3 \ \text{or}\ \lambda_{\min}(s_i^{-1}\tilde{X}_o^{(i)}\tilde{X}_o^{(i)\prime}) < 1/c_3\} \leq \exp(-C_1 n_{o(i)})$$
holds, where $\lambda_{\max}(\cdot)$ and $\lambda_{\min}(\cdot)$ denote the largest and smallest eigenvalues of a matrix, respectively. Also, $\varepsilon \sim N(0, \sigma^2)$ for some $\sigma > 0$.

(A3) There are some $\tau \geq 0$ and $c_4 > 0$ such that $\lambda_{\max}(\Sigma_{(i)}) \leq c_4 n_{o(i)}^{\tau}$ for $i = 1, \ldots, K$.

(A4) $p > n$ and $\log(p) = O(n^{\varrho})$ for some $\varrho \in (0, 1 - 2\kappa)$, where $\kappa$ is given by Condition (A1).

The following lemma establishes the accuracy of screening for high-dimensional block-wise missing data.

Lemma 1 (accuracy of screening). Under Conditions (A1)-(A4), if $2\kappa + \tau < 1$, then there is some $\gamma \in (0, 1)$ such that, when $\gamma \to 0$ in such a way that $\gamma n^{1-2\kappa-\tau} \to \infty$ as $n \to \infty$, we have, for some $C > 0$,
$$\Pr(M_o \subset M_{\gamma}) = 1 - O\big(\exp[-C\min\{\min_{1\leq i\leq K} s_i,\ \min_{1\leq i\leq K} n_{o(i)}^{1-2\kappa}/\log(n_{o(i)})\}]\big),$$

where $M_{\gamma} = \{1 \leq i \leq p : \omega_i \text{ is among the first } [\gamma p] \text{ largest of all}\}$.

Proof. The lemma can be proved by an argument similar to that of Fan and Lv (2008). The key difference is that Lemma 4 and Lemma 5 of Fan and Lv (2008) take different forms in our setting. In our proof, Lemma 4 becomes: for $i = 1, \ldots, K$ and any $C > 0$, there are constants $c_1$ and $c_2$ with $0 < c_1 < 1 < c_2$ such that
$$\Pr\big(\langle S^{(i)}e_1, e_1\rangle < c_1 n_{o(i)}/s_i \ \text{or}\ > c_2 n_{o(i)}/s_i\big) \leq 4\exp\{-C\min(n_{o(i)}, s_i)\},$$
where $S^{(i)} = (\tilde{X}_{o(i)}'\tilde{X}_{o(i)})^{+}\tilde{X}_{o(i)}'\tilde{X}_{o(i)}$ and $e_1 \in R^{s_i}$ is a unit vector with $1$ in the first entry and $0$ elsewhere. Lemma 5 becomes: let $S^{(i)}e_1 = (V_{(i)1}, \ldots, V_{(i)s_i})'$ for $i = 1, \ldots, K$; given that the first coordinate $V_{(i)1} = v$, the random vector $(V_{(i)2}, \ldots, V_{(i)s_i})'$ is uniformly distributed on the sphere $S^{s_i-2}\{(v - v^2)^{1/2}\}$; moreover, for any $C > 0$, there is some $c > 1$ such that
$$\Pr\big(|V_{(i)2}| > c\, n_{o(i)}^{1/2}s_i^{-1}|A|\big) \leq 3\exp\{-C\min(n_{o(i)}, s_i)\},$$
where $A$ is an independent $N(0, 1)$-distributed random variable. With these versions of Lemma 4 and Lemma 5, we are able to prove the lemma along the lines of Fan and Lv (2008).
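The screening rule of Lemma 1 can be illustrated numerically. The sketch below is ours, not the authors' implementation: it assumes the marginal utility $\omega_i$ is the squared marginal sample correlation between $X_i$ and $y$, computed on a single fully observed block, and it retains $M_\gamma$, the indices of the $[\gamma p]$ largest $\omega_i$. All variable names are illustrative.

```python
import numpy as np

def sis_screen(X, y, gamma):
    """Rank columns of X by squared marginal sample correlation with y
    (one concrete choice of marginal utility omega_i) and return the
    indices of the [gamma * p] largest -- the set M_gamma of Lemma 1."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = (Xc * yc[:, None]).sum(axis=0) ** 2
    den = (Xc ** 2).sum(axis=0) * (yc ** 2).sum()
    omega = num / den                      # squared marginal correlations
    keep = int(gamma * p)                  # [gamma p]
    return np.argsort(omega)[::-1][:keep]

# Toy check: 5 active predictors out of p = 1000, n = 200 observations.
rng = np.random.default_rng(0)
n, p = 200, 1000
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0                             # M_o = {0, 1, 2, 3, 4}
y = X @ beta + rng.standard_normal(n)
M_gamma = sis_screen(X, y, gamma=0.05)     # keep the top 50 indices
```

With high probability the retained set contains all of $M_o$, which is the content of Lemma 1; the exact $\omega_i$ used in the paper may differ from this choice.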

S3 Proof of Theorems

For the proofs of the theorems, we introduce some notation. Let $\|A\| = \{\mathrm{tr}(A'A)\}^{1/2}$ denote the norm of a matrix $A$, let $T_j$ be the collection of row indices in the $j$th data group $Z_{(j)}$, and let $M_{oj}$ and $M_{mj}$ be the collections of column indices of the observed data and the missing data in the $j$th data group $Z_{(j)}$, respectively. Denote $\delta = \min(\sqrt{n_1}, \sqrt{q})$ and $H = (\Lambda'\Lambda/q)(W_{(1)}'\tilde{W}_{(1)}/n_1)\tilde{V}_{(1)}^{-1}$, where $\tilde{V}_{(1)}$ is the $r \times r$ diagonal matrix of the first $r$ largest eigenvalues of $Z_{(1)}Z_{(1)}'/(n_1q)$ in decreasing order. Let $\tilde{D}_t = \{(d_{ti} - e_{ti})\delta_{ti};\ i = 1, \ldots, q\}$ for $t = 1, \ldots, n$, where $\delta_{ti} = 1$ if $Z_{ti}$ is missing and $\delta_{ti} = 0$ otherwise, and $d_{ti} = \tilde{\lambda}_i'\tilde{w}_t - \lambda_i'w_t$. We now provide the proofs of several lemmas and of the theorems. Lemmas 2, 4 and 5 are lemmas from Bai (2003), and Lemma 3 is Theorem 1 in Bai and Ng (2002); they are needed subsequently in the proofs of the theorems.

Lemma 2. Under Assumptions (C1)-(C4), as $n_1, q \to \infty$:
(i) $n_1^{-1}\tilde{W}_{(1)}'\{Z_{(1)}Z_{(1)}'/(n_1q)\}\tilde{W}_{(1)} = \tilde{V}_{(1)} \to_p V$,
(ii) $(\tilde{W}_{(1)}'W_{(1)}/n_1)(\Lambda'\Lambda/q)(W_{(1)}'\tilde{W}_{(1)}/n_1) \to_p V$,
where $V$ is the diagonal matrix consisting of the eigenvalues of $\Sigma_{\Lambda}\Sigma_w$.
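The notation above can be made concrete with a small numerical sketch (ours, not code from the paper): on a fully observed block, $\tilde{W}_{(1)}$ is $\sqrt{n_1}$ times the matrix of the first $r$ eigenvectors of $Z_{(1)}Z_{(1)}'/(n_1q)$, so that $\tilde{W}_{(1)}'\tilde{W}_{(1)}/n_1 = I_r$, $\tilde{V}_{(1)}$ collects the corresponding leading eigenvalues, and the loadings are $\tilde{\Lambda} = Z_{(1)}'\tilde{W}_{(1)}/n_1$. The variable names below are ours.

```python
import numpy as np

# Simulate one fully observed block Z_(1) = W Lambda' + e.
rng = np.random.default_rng(1)
n1, q, r = 150, 80, 2
W = rng.standard_normal((n1, r))             # true factors
Lam = rng.standard_normal((q, r))            # true loadings
Z = W @ Lam.T + 0.1 * rng.standard_normal((n1, q))

# Principal-component estimates.
G = Z @ Z.T / (n1 * q)                       # Z_(1) Z_(1)' / (n1 q)
eigval, eigvec = np.linalg.eigh(G)           # eigenvalues in ascending order
idx = np.argsort(eigval)[::-1][:r]           # first r largest
V_tilde = np.diag(eigval[idx])               # r x r diagonal V_(1)~
W_tilde = np.sqrt(n1) * eigvec[:, idx]       # normalization W'W/n1 = I_r
Lam_tilde = Z.T @ W_tilde / n1               # Lambda_tilde = Z' W_tilde / n1

# The identities used repeatedly in Section S3:
assert np.allclose(W_tilde.T @ W_tilde / n1, np.eye(r), atol=1e-8)
assert np.allclose(W_tilde.T @ G @ W_tilde / n1, V_tilde, atol=1e-8)
```

Note that $\tilde{W}_{(1)}$ and $\tilde{\Lambda}$ recover $W\Lambda'$ only up to the rotation $H$, which is exactly why the lemmas below bound $\tilde{W}_{(1)t} - H'W_{(1)t}$ and $\tilde{\lambda}_i - H^{-1}\lambda_i$ rather than raw differences.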

Lemma 3. Under Assumptions (C1)-(C4), we have $n_1^{-1}\sum_{u=1}^{n_1} \|\tilde{W}_{(1)u} - H'W_{(1)u}\|^2 = O_p(\delta^{-2})$.

Lemma 4. Under Assumptions (C1)-(C5), we have $n_1^{-1}(\tilde{W}_{(1)} - W_{(1)}H)'W_{(1)} = O_p(\delta^{-2})$.

Lemma 5. Under Assumptions (C1)-(C5), we have $n_1^{-1}\sum_{u=1}^{n_1} (\tilde{W}_{(1)u} - H'W_{(1)u})e_{(1)ui} = O_p(\delta^{-2})$ for $i = 1, \ldots, q$.

Lemma 6. Under Assumptions (C1)-(C5), we have
$$\sum_{i \in M_{oj}} (\tilde{\lambda}_i - H^{-1}\lambda_i)e_{li} = O_p\Big\{\frac{\sqrt{q_j}}{\min(\sqrt{n_1}, \sqrt{q})}\Big\}$$
for $l \in T_j$, $j = 2, \ldots, k$.

Proof. From $\tilde{\Lambda} = Z_{(1)}'\tilde{W}_{(1)}/n_1$ and $Z_{(1)} = W_{(1)}\Lambda' + e_{(1)}$, we have $\tilde{\Lambda} = \Lambda W_{(1)}'\tilde{W}_{(1)}/n_1 + e_{(1)}'\tilde{W}_{(1)}/n_1$. Writing $\tilde{W}_{(1)} = \tilde{W}_{(1)} - W_{(1)}H + W_{(1)}H$ and using $\tilde{W}_{(1)}'\tilde{W}_{(1)}/n_1 = I_r$, we obtain
$$\tilde{\lambda}_i = \tilde{W}_{(1)}'W_{(1)}\lambda_i/n_1 + \sum_{t=1}^{n_1}\tilde{W}_{(1)t}e_{(1)ti}/n_1 = \tilde{W}_{(1)}'(W_{(1)} - \tilde{W}_{(1)}H^{-1})\lambda_i/n_1 + H^{-1}\lambda_i + \sum_{t=1}^{n_1}(\tilde{W}_{(1)t} - H'W_{(1)t})e_{(1)ti}/n_1 + H'\sum_{t=1}^{n_1}W_{(1)t}e_{(1)ti}/n_1$$
$$= (\tilde{W}_{(1)} - W_{(1)}H)'(W_{(1)} - \tilde{W}_{(1)}H^{-1})\lambda_i/n_1 + H'W_{(1)}'(W_{(1)} - \tilde{W}_{(1)}H^{-1})\lambda_i/n_1 + H^{-1}\lambda_i + \sum_{t=1}^{n_1}(\tilde{W}_{(1)t} - H'W_{(1)t})e_{(1)ti}/n_1 + H'\sum_{t=1}^{n_1}W_{(1)t}e_{(1)ti}/n_1.$$

Thus, we have
$$\tilde{\lambda}_i - H^{-1}\lambda_i = H'\sum_{u=1}^{n_1}W_{(1)u}e_{(1)ui}/n_1 + (\tilde{W}_{(1)} - W_{(1)}H)'(W_{(1)} - \tilde{W}_{(1)}H^{-1})\lambda_i/n_1 + H'W_{(1)}'(W_{(1)} - \tilde{W}_{(1)}H^{-1})\lambda_i/n_1 + \sum_{u=1}^{n_1}(\tilde{W}_{(1)u} - H'W_{(1)u})e_{(1)ui}/n_1. \tag{S3.1}$$
Thus, $\sum_{i \in M_{oj}}(\tilde{\lambda}_i - H^{-1}\lambda_i)e_{li} = I_1 + I_2 + I_3 + I_4$, where
$$I_1 = \sum_{i \in M_{oj}} H'\sum_{u=1}^{n_1}W_{(1)u}e_{(1)ui}e_{li}/n_1, \qquad I_2 = \sum_{i \in M_{oj}} (\tilde{W}_{(1)} - W_{(1)}H)'(W_{(1)} - \tilde{W}_{(1)}H^{-1})\lambda_ie_{li}/n_1,$$
$$I_3 = \sum_{i \in M_{oj}} H'W_{(1)}'(W_{(1)} - \tilde{W}_{(1)}H^{-1})\lambda_ie_{li}/n_1, \qquad I_4 = \sum_{i \in M_{oj}}\sum_{u=1}^{n_1}(\tilde{W}_{(1)u} - H'W_{(1)u})e_{(1)ui}e_{li}/n_1.$$
Note that $\|H\| = O_p(1)$, because $\|H\| \leq \|\Lambda'\Lambda/q\|\,\|W_{(1)}'W_{(1)}/n_1\|^{1/2}\,\|\tilde{W}_{(1)}'\tilde{W}_{(1)}/n_1\|^{1/2}\,\|\tilde{V}_{(1)}^{-1}\|$ and each of these matrix norms is stochastically bounded by Assumptions (C1) and (C2), together with $\tilde{W}_{(1)}'\tilde{W}_{(1)}/n_1 = I_r$ and Lemma 2. By Assumptions (C3) and (C4), we have
$$\|I_1\| \leq \|H\|\sqrt{\frac{q_j}{n_1}}\Big\{q_j^{-1}\sum_{i \in M_{oj}}\Big\|n_1^{-1/2}\sum_{u=1}^{n_1}W_{(1)u}e_{(1)ui}\Big\|^2\Big\}^{1/2}\Big\{q_j^{-1}\sum_{i \in M_{oj}}e_{li}^2\Big\}^{1/2} = O_p\Big(\sqrt{\frac{q_j}{n_1}}\Big).$$
By Assumptions (C2) and (C3), we have $\|\sum_{i \in M_{oj}}\lambda_ie_{li}/\sqrt{q_j}\| = O_p(1)$, since $E(\sum_{i \in M_{oj}}\lambda_ie_{li}/\sqrt{q_j}) = 0$ and $E(\|\sum_{i \in M_{oj}}\lambda_ie_{li}/\sqrt{q_j}\|^2) \leq \bar{\lambda}^2\sum_{i \in M_{oj}}\tau_{ii}/q_j$

$\leq C$. By Lemma 3, we obtain
$$\|I_2\| \leq \Big(n_1^{-1}\sum_{u=1}^{n_1}\|\tilde{W}_{(1)u} - H'W_{(1)u}\|^2\Big)\|H^{-1}\|\,\sqrt{q_j}\,\Big\|q_j^{-1/2}\sum_{i \in M_{oj}}\lambda_ie_{li}\Big\| = O_p(\sqrt{q_j}\,\delta^{-2}).$$
By Lemma 4, we have
$$\|I_3\| \leq \|H\|\,\Big\|n_1^{-1}\sum_{u=1}^{n_1}W_{(1)u}(\tilde{W}_{(1)u} - H'W_{(1)u})'\Big\|\,\|H^{-1}\|\,\sqrt{q_j}\,\Big\|q_j^{-1/2}\sum_{i \in M_{oj}}\lambda_ie_{li}\Big\| = O_p(\sqrt{q_j}\,\delta^{-2}).$$
By Lemma 5 and Assumption (C3), we have
$$\|I_4\| \leq \sqrt{q_j}\Big\{q_j^{-1}\sum_{i \in M_{oj}}e_{li}^2\Big\}^{1/2}\Big\{q_j^{-1}\sum_{i \in M_{oj}}\Big\|n_1^{-1}\sum_{u=1}^{n_1}(\tilde{W}_{(1)u} - H'W_{(1)u})e_{(1)ui}\Big\|^2\Big\}^{1/2} = O_p(\sqrt{q_j}\,\delta^{-2}).$$
Thus, we obtain
$$\sum_{i \in M_{oj}}(\tilde{\lambda}_i - H^{-1}\lambda_i)e_{li} = O_p\Big\{\frac{\sqrt{q_j}}{\min(\sqrt{n_1}, \sqrt{q})}\Big\}.$$

Lemma 7. Under Assumptions (C1)-(C5), we have
$$\sum_{i \in M_{oj}}\|\tilde{\lambda}_i - H^{-1}\lambda_i\|^2 = O_p\Big\{\frac{q_j}{\min(n_1, q^2)}\Big\}$$
for $j = 2, \ldots, k$.

Proof. By equation (S3.1) and the inequality $\|x + y + z + u\|^2 \leq 4(\|x\|^2 + \|y\|^2 + \|z\|^2 + \|u\|^2)$,

we have $\sum_{i \in M_{oj}}\|\tilde{\lambda}_i - H^{-1}\lambda_i\|^2 \leq 4(J_1 + J_2 + J_3 + J_4)$, where
$$J_1 = \sum_{i \in M_{oj}}\Big\|H'\sum_{u=1}^{n_1}W_{(1)u}e_{(1)ui}/n_1\Big\|^2, \qquad J_2 = \sum_{i \in M_{oj}}\big\|(\tilde{W}_{(1)} - W_{(1)}H)'(\tilde{W}_{(1)} - W_{(1)}H)H^{-1}\lambda_i/n_1\big\|^2,$$
$$J_3 = \sum_{i \in M_{oj}}\big\|H'W_{(1)}'(\tilde{W}_{(1)} - W_{(1)}H)H^{-1}\lambda_i/n_1\big\|^2, \qquad J_4 = \sum_{i \in M_{oj}}\Big\|\sum_{u=1}^{n_1}(\tilde{W}_{(1)u} - H'W_{(1)u})e_{(1)ui}/n_1\Big\|^2.$$
By Assumption (C4), we have
$$J_1 \leq \frac{\|H\|^2q_j}{n_1}\Big(q_j^{-1}\sum_{i \in M_{oj}}\Big\|n_1^{-1/2}\sum_{u=1}^{n_1}W_{(1)u}e_{(1)ui}\Big\|^2\Big) = O_p\Big(\frac{q_j}{n_1}\Big).$$
By Assumption (C2) and Lemma 3, we have
$$J_2 \leq \Big\|n_1^{-1}\sum_{u=1}^{n_1}(\tilde{W}_{(1)u} - H'W_{(1)u})(\tilde{W}_{(1)u} - H'W_{(1)u})'\Big\|^2\|H^{-1}\|^2\Big(\sum_{i \in M_{oj}}\|\lambda_i\|^2\Big) \leq \Big(n_1^{-1}\sum_{u=1}^{n_1}\|\tilde{W}_{(1)u} - H'W_{(1)u}\|^2\Big)^2 O_p(1)\,q_j\bar{\lambda}^2 = O_p(q_j\delta^{-4}).$$
By Assumption (C2) and Lemma 4, we have
$$J_3 \leq \|H\|^2\Big\|n_1^{-1}\sum_{u=1}^{n_1}W_{(1)u}(\tilde{W}_{(1)u} - H'W_{(1)u})'\Big\|^2\|H^{-1}\|^2\Big(\sum_{i \in M_{oj}}\|\lambda_i\|^2\Big) = O_p(q_j\delta^{-4}).$$
By Lemma 5, we have $J_4 = O_p(q_j\delta^{-4})$. Then we obtain
$$\sum_{i \in M_{oj}}\|\tilde{\lambda}_i - H^{-1}\lambda_i\|^2 = O_p\Big\{\frac{q_j}{\min(n_1, q^2)}\Big\}.$$

Lemma 8. Under Assumptions (C1)-(C5), as $q, n_1 \to \infty$, we have

(i) $\|\tilde{W}_{(1)t} - H'W_{(1)t}\| = O_p\{1/\min(n_1, \sqrt{q})\}$ for $t = 1, \ldots, n_1$,
(ii) $\|\tilde{\lambda}_i - H^{-1}\lambda_i\| = O_p\{1/\min(\sqrt{n_1}, q)\}$ for $i = 1, \ldots, q$,
(iii) $\|\tilde{W}_{(j)t} - H'W_{(j)t}\| = O_p\{q_j/(q\min(\sqrt{n_1}, \sqrt{q_j}))\}$ for $j = 2, \ldots, k$ and $t = 1, \ldots, n_j$.

Proof. Since $\tilde{W}_{(1)}$ and $\tilde{\lambda}_i$ are obtained from the completely observed data block $Z_{(1)}$, the proofs of parts (i) and (ii) are similar to those in Bai (2003); the details are omitted. Consider part (iii). Since $\tilde{W}_{(j)} = Z_{o(j)}\tilde{\Lambda}_{o(j)}(\tilde{\Lambda}_{o(j)}'\tilde{\Lambda}_{o(j)})^{-1}$ and $Z_{o(j)} = W_{(j)}\Lambda_{o(j)}' + e_{o(j)}$ for $j = 2, \ldots, k$, we have, for $t = 1, \ldots, n_j$,
$$\tilde{W}_{(j)t} = (\tilde{\Lambda}_{o(j)}'\tilde{\Lambda}_{o(j)})^{-1}\tilde{\Lambda}_{o(j)}'\Lambda_{o(j)}W_{(j)t} + (\tilde{\Lambda}_{o(j)}'\tilde{\Lambda}_{o(j)})^{-1}\tilde{\Lambda}_{o(j)}'e_{o(j)t}$$
$$= H'W_{(j)t} + (\tilde{\Lambda}_{o(j)}'\tilde{\Lambda}_{o(j)})^{-1}\tilde{\Lambda}_{o(j)}'(\Lambda_{o(j)} - \tilde{\Lambda}_{o(j)}H')W_{(j)t} + (\tilde{\Lambda}_{o(j)}'\tilde{\Lambda}_{o(j)})^{-1}(\tilde{\Lambda}_{o(j)} - \Lambda_{o(j)}(H')^{-1})'e_{o(j)t} + (\tilde{\Lambda}_{o(j)}'\tilde{\Lambda}_{o(j)})^{-1}H^{-1}\Lambda_{o(j)}'e_{o(j)t}.$$
That is,
$$\tilde{W}_{(j)t} - H'W_{(j)t} = (\tilde{\Lambda}_{o(j)}'\tilde{\Lambda}_{o(j)})^{-1}\big\{(\tilde{\Lambda}_{o(j)} - \Lambda_{o(j)}(H')^{-1})'(\Lambda_{o(j)} - \tilde{\Lambda}_{o(j)}H')W_{(j)t} + H^{-1}\Lambda_{o(j)}'(\Lambda_{o(j)} - \tilde{\Lambda}_{o(j)}H')W_{(j)t} + (\tilde{\Lambda}_{o(j)} - \Lambda_{o(j)}(H')^{-1})'e_{o(j)t} + H^{-1}\Lambda_{o(j)}'e_{o(j)t}\big\} = (\tilde{\Lambda}_{o(j)}'\tilde{\Lambda}_{o(j)})^{-1}(I_1 + I_2 + I_3 + I_4).$$

By Lemma 2, we have $\tilde{\Lambda}_{o(j)}'\tilde{\Lambda}_{o(j)} = \sum_{i=1}^{q_j}\tilde{\lambda}_{o(j)i}\tilde{\lambda}_{o(j)i}' \leq \sum_{i=1}^{q}\tilde{\lambda}_i\tilde{\lambda}_i' = \tilde{\Lambda}'\tilde{\Lambda} = q\{n_1^{-1}\tilde{W}_{(1)}'(Z_{(1)}Z_{(1)}'/(n_1q))\tilde{W}_{(1)}\} = O_p(q)$. Thus, by Assumption (C1) and Lemma 7, we have
$$\|I_1\| = \Big\|\sum_{i \in M_{oj}}(\tilde{\lambda}_i - H^{-1}\lambda_i)(\tilde{\lambda}_i - H^{-1}\lambda_i)'H'W_{(j)t}\Big\| \leq \Big(\sum_{i \in M_{oj}}\|\tilde{\lambda}_i - H^{-1}\lambda_i\|^2\Big)O_p(1) = O_p\Big\{\frac{q_j}{\min(n_1, q^2)}\Big\}.$$
By Assumptions (C1)-(C2) and Lemma 7, we have
$$\|I_2\| = \Big\|H^{-1}\sum_{i \in M_{oj}}\lambda_i(\tilde{\lambda}_i - H^{-1}\lambda_i)'H'W_{(j)t}\Big\| \leq O_p(1)\Big(\sum_{i \in M_{oj}}\|\lambda_i\|^2\Big)^{1/2}\Big(\sum_{i \in M_{oj}}\|\tilde{\lambda}_i - H^{-1}\lambda_i\|^2\Big)^{1/2} \leq O_p(1)\,\sqrt{q_j}\,\bar{\lambda}\,O_p\Big\{\frac{\sqrt{q_j}}{\min(\sqrt{n_1}, q)}\Big\} = O_p\Big\{\frac{q_j}{\min(\sqrt{n_1}, q)}\Big\}.$$
By Lemma 6, we have
$$I_3 = \sum_{i \in M_{oj}}(\tilde{\lambda}_i - H^{-1}\lambda_i)e_{ti} = O_p\Big\{\frac{\sqrt{q_j}}{\min(\sqrt{n_1}, \sqrt{q})}\Big\} \quad\text{for } t \in T_j.$$
By Assumptions (C2) and (C3), we have
$$\|I_4\| \leq \|H^{-1}\|\,\sqrt{q_j}\,\Big\|q_j^{-1/2}\sum_{i \in M_{oj}}\lambda_ie_{ti}\Big\| = O_p(\sqrt{q_j}).$$
Thus, we have
$$\|\tilde{W}_{(j)t} - H'W_{(j)t}\| = O_p\Big\{\frac{q_j}{q\min(\sqrt{n_1}, \sqrt{q_j})}\Big\}.$$

The proof of part (iii) is completed.

Proof of Theorem 1. Based on Lemma 8, we can obtain
$$\tilde{\lambda}_i'\tilde{w}_t - \lambda_i'w_t = (\tilde{\lambda}_i - H^{-1}\lambda_i)'(\tilde{w}_t - H'w_t) + (\tilde{\lambda}_i - H^{-1}\lambda_i)'H'w_t + \lambda_i'(H^{-1})'(\tilde{w}_t - H'w_t) = O_p\Big\{\frac{\sqrt{q_j}}{\min(\sqrt{n_1q_j}, q)}\Big\}.$$
The proof of the theorem is completed.

Lemma 9. Under Assumptions (C1), (C2) and (C3), we have $\|\Lambda'\tilde{D}_t/q\| = O_p\{1/\min(\sqrt{n_1}, \sqrt{q})\}$ for $t \in T_j$, $j = 2, \ldots, k$.

Proof. Based on the definition of $\tilde{D}_t$, we have, for $t \in T_j$ ($j = 2, \ldots, k$),
$$\Lambda'\tilde{D}_t/q = \sum_{i=1}^{q}\lambda_i(d_{ti} - e_{ti})\delta_{ti}/q = \sum_{i \in M_{mj}}\lambda_i(d_{ti} - e_{ti})/q \leq \Big\|\sum_{i \in M_{mj}}\lambda_id_{ti}/q\Big\| + \Big\|\sum_{i \in M_{mj}}\lambda_ie_{ti}/q\Big\|.$$
Based on Assumptions (C1) and (C2), we have
$$q^{-1}\sum_{i \in M_{mj}}\lambda_id_{ti} = q^{-1}\sum_{i \in M_{mj}}\lambda_i(\tilde{\lambda}_i'\tilde{w}_t - \lambda_i'w_t) = q^{-1}\sum_{i \in M_{mj}}\lambda_i\{(\tilde{\lambda}_i - H^{-1}\lambda_i)'(\tilde{w}_t - H'w_t) + (\tilde{\lambda}_i - H^{-1}\lambda_i)'H'w_t + \lambda_i'(H^{-1})'(\tilde{w}_t - H'w_t)\},$$
so that
$$\Big\|q^{-1}\sum_{i \in M_{mj}}\lambda_id_{ti}\Big\| \leq \frac{\bar{\lambda}(q - q_j)}{q}\Big\{\frac{1}{q - q_j}\sum_{i \in M_{mj}}\|\tilde{\lambda}_i - H^{-1}\lambda_i\|\Big\}\|\tilde{w}_t - H'w_t\| + \frac{\bar{\lambda}(q - q_j)}{q}\Big\{\frac{1}{q - q_j}\sum_{i \in M_{mj}}\|\tilde{\lambda}_i - H^{-1}\lambda_i\|\Big\}\|H'w_t\| + \Big\|q^{-1}\sum_{i=1}^{q}\lambda_i\lambda_i'\Big\|\,\|H^{-1}\|\,\|\tilde{w}_t - H'w_t\| = O_p\Big(\frac{1}{\sqrt{n_1}}\Big) + O_p\Big(\frac{\sqrt{q_j}}{q}\Big).$$
By Assumptions (C2) and (C3), we have $\|q^{-1}\sum_{i \in M_{mj}}\lambda_ie_{ti}\| = O_p(1/\sqrt{q})$. Thus,

we obtain $\|\Lambda'\tilde{D}_t/q\| = O_p\{1/\min(\sqrt{n_1}, \sqrt{q})\}$. The proof of the lemma is completed.

Lemma 10. Under Assumptions (C1)-(C5), we have
$$\frac{1}{nq^2}\sum_{l=1}^{n}|e_l'\tilde{D}_t|^2 = O_p\Big\{\frac{1}{\min(n_1, q)}\Big\} \quad\text{for } t \in T_j,\ j = 2, \ldots, k.$$

Proof. Based on the definition of $\tilde{D}_t$, we have, for $t \in T_j$, $j = 2, \ldots, k$,
$$\frac{1}{nq^2}\sum_{l=1}^{n}|e_l'\tilde{D}_t|^2 = \frac{1}{nq^2}\sum_{l=1}^{n}\Big|\sum_{i=1}^{q}e_{li}(d_{ti} - e_{ti})\delta_{ti}\Big|^2 \leq \frac{2}{nq^2}\sum_{l=1}^{n}\Big|\sum_{i \in M_{mj}}e_{li}d_{ti}\Big|^2 + \frac{2}{nq^2}\sum_{l=1}^{n}\Big|\sum_{i \in M_{mj}}e_{li}e_{ti}\Big|^2. \tag{S3.2}$$
By Assumption (C3), we have
$$\frac{1}{nq^2}\sum_{l=1}^{n}\Big|\sum_{i \in M_{mj}}e_{li}d_{ti}\Big|^2 \leq \Big(\frac{1}{nq}\sum_{l=1}^{n}\sum_{i \in M_{mj}}e_{li}^2\Big)\Big(\frac{1}{q}\sum_{i \in M_{mj}}d_{ti}^2\Big).$$
Based on Lemmas 7 and 8, we can obtain
$$\frac{1}{q}\sum_{i \in M_{mj}}d_{ti}^2 = \frac{1}{q}\sum_{i \in M_{mj}}(\tilde{\lambda}_i'\tilde{w}_t - \lambda_i'w_t)^2 = \frac{1}{q}\sum_{i \in M_{mj}}\{(\tilde{\lambda}_i - H^{-1}\lambda_i)'(\tilde{w}_t - H'w_t) + (\tilde{\lambda}_i - H^{-1}\lambda_i)'H'w_t + \lambda_i'(H^{-1})'(\tilde{w}_t - H'w_t)\}^2$$
$$\leq \frac{C(q - q_j)}{q}\Big\{\Big(\frac{1}{q - q_j}\sum_{i \in M_{mj}}\|\tilde{\lambda}_i - H^{-1}\lambda_i\|^2\Big)\|\tilde{w}_t - H'w_t\|^2 + \Big(\frac{1}{q - q_j}\sum_{i \in M_{mj}}\|\tilde{\lambda}_i - H^{-1}\lambda_i\|^2\Big)\|H\|^2\|w_t\|^2 + \bar{\lambda}^2\|H^{-1}\|^2\|\tilde{w}_t - H'w_t\|^2\Big\} = O_p\Big(\frac{1}{n_1}\Big) + O_p\Big(\frac{q_j}{q^2}\Big). \tag{S3.3}$$

Thus, we have $(nq^2)^{-1}\sum_{l=1}^{n}|\sum_{i \in M_{mj}}e_{li}d_{ti}|^2 \leq O_p(1/n_1) + O_p(q_j/q^2)$. By Assumption (C3), we have
$$E\Big\{\frac{1}{nq}\sum_{l=1}^{n}\Big|\sum_{i \in M_{mj}}e_{li}e_{ti}\Big|^2\Big\} = \frac{1}{nq}\sum_{l=1}^{n}\sum_{i \in M_{mj}}\sum_{u \in M_{mj}}E(e_{li}e_{ti}e_{lu}e_{tu}) \leq \frac{1}{q}\sum_{i \in M_{mj}}\sum_{u \in M_{mj}}\tau_{iu}^2 + \frac{1}{nq}\sum_{i \in M_{mj}}\sum_{u \in M_{mj}}|E(e_{ti}^2e_{tu}^2) - \tau_{iu}^2| \leq C.$$
Thus, $(nq^2)^{-1}\sum_{l=1}^{n}|\sum_{i \in M_{mj}}e_{li}e_{ti}|^2 = O_p(1/q)$. From equation (S3.2), we have $(nq^2)^{-1}\sum_{l=1}^{n}|e_l'\tilde{D}_t|^2 = O_p\{1/\min(n_1, q)\}$. The proof of the lemma is completed.

Lemma 11. Under Assumptions (C1)-(C5), we have
$$\frac{1}{nq}\sum_{l=1}^{n}\|\Lambda'\tilde{D}_l\|^2 = O_p\Big\{\frac{\max_{2\leq j\leq k}(n_j)}{n}\Big\}.$$

Proof. Based on the definition of $\tilde{D}_l$, we have
$$\frac{1}{nq}\sum_{l=1}^{n}\|\Lambda'\tilde{D}_l\|^2 \leq \frac{2}{nq}\sum_{l=1}^{n}\Big\|\sum_{i=1}^{q}\lambda_id_{li}\delta_{li}\Big\|^2 + \frac{2}{nq}\sum_{l=1}^{n}\Big\|\sum_{i=1}^{q}\lambda_ie_{li}\delta_{li}\Big\|^2. \tag{S3.4}$$
Then, we have
$$\frac{1}{nq}\sum_{l=1}^{n}\Big\|\sum_{i=1}^{q}\lambda_id_{li}\delta_{li}\Big\|^2 \leq \frac{C}{n}\sum_{l=1}^{n}\sum_{i=1}^{q}\|\lambda_i\|^2d_{li}^2\delta_{li} \leq \frac{C}{n}\sum_{l=1}^{n}\sum_{i=1}^{q}\|\lambda_i\|^2\|\tilde{\lambda}_i - H^{-1}\lambda_i\|^2\|\tilde{w}_l - H'w_l\|^2\delta_{li} + \frac{C}{n}\sum_{l=1}^{n}\sum_{i=1}^{q}\|\lambda_i\|^2\|\tilde{\lambda}_i - H^{-1}\lambda_i\|^2\|H\|^2\|w_l\|^2\delta_{li} + \frac{C}{n}\sum_{l=1}^{n}\sum_{i=1}^{q}\|\lambda_i\lambda_i'\|^2\|H^{-1}\|^2\|\tilde{w}_l - H'w_l\|^2\delta_{li} = I_1 + I_2 + I_3.$$

By Lemmas 7 and 8, we have
$$I_1 \leq \frac{C\bar{\lambda}^2}{n}\sum_{l=1}^{n}\Big(\sum_{i=1}^{q}\|\tilde{\lambda}_i - H^{-1}\lambda_i\|^2\delta_{li}\Big)\|\tilde{w}_l - H'w_l\|^2 = \frac{C\bar{\lambda}^2}{n}\sum_{l=1}^{n}\sum_{j=2}^{k}\Big(\sum_{i \in M_{mj}}\|\tilde{\lambda}_i - H^{-1}\lambda_i\|^2\Big)\|\tilde{w}_l - H'w_l\|^2I(l \in T_j) = \sum_{j=2}^{k}O_p\Big\{\frac{n_jq_j^2}{nq^2\min(n_1, q_j)\min(n_1, q^2)}\Big\},$$
where $I(\cdot)$ is the indicator function. Similarly, we have
$$I_2 \leq \frac{C\bar{\lambda}^2}{n}\sum_{l=1}^{n}\sum_{j=2}^{k}\Big\{\sum_{i \in M_{mj}}\|\tilde{\lambda}_i - H^{-1}\lambda_i\|^2I(l \in T_j)\Big\}\|H\|^2\|w_l\|^2 = \sum_{j=2}^{k}O_p\Big\{\frac{n_j}{n\min(n_1, q^2)}\Big\},$$
and
$$I_3 \leq \frac{C}{n}\sum_{i=1}^{q}\|\lambda_i\lambda_i'\|^2\sum_{l=1}^{n}\sum_{j=2}^{k}\|\tilde{w}_l - H'w_l\|^2I(l \in T_j) = \sum_{j=2}^{k}O_p\Big\{\frac{n_jq_j^2}{nq^2\min(n_1, q_j)}\Big\}.$$
Thus, $\frac{1}{nq}\sum_{l=1}^{n}\|\sum_{i=1}^{q}\lambda_id_{li}\delta_{li}\|^2 \leq O_p\{\max_{2\leq j\leq k}(n_j)/(nn_1)\} + O_p\{\max_{2\leq j\leq k}(n_jq_j)/(nq^2)\}$. By Assumptions (C2) and (C3), we have
$$E\Big\{\frac{1}{nq}\sum_{l=1}^{n}\Big\|\sum_{i=1}^{q}\lambda_ie_{li}\delta_{li}\Big\|^2\Big\} = \frac{1}{nq}\sum_{l=1}^{n}\sum_{i=1}^{q}\sum_{u=1}^{q}E(\lambda_i'\lambda_ue_{li}e_{lu})\delta_{li}\delta_{lu} \leq \frac{\bar{\lambda}^2}{nq}\sum_{l=1}^{n}\sum_{i=1}^{q}\sum_{u=1}^{q}|E(e_{li}e_{lu})|\delta_{li}\delta_{lu} = \frac{\bar{\lambda}^2}{nq}\sum_{l=1}^{n}\sum_{j=2}^{k}\Big\{\sum_{i \in M_{mj}}\sum_{u \in M_{mj}}|\tau_{iu}|\Big\}I(l \in T_j) = O_p\Big\{\frac{\max_{2\leq j\leq k}(n_j)}{n}\Big\}.$$

From inequality (S3.4), we have $(nq)^{-1}\sum_{l=1}^{n}\|\Lambda'\tilde{D}_l\|^2 = O_p\{\max_{2\leq j\leq k}(n_j)/n\}$. The proof of the lemma is completed.

Lemma 12. Under Assumptions (C1)-(C5), we have
$$\frac{1}{nq^2}\sum_{l=1}^{n}\|\tilde{D}_l'e_t\|^2 = O_p\Big\{\frac{\max_{2\leq j\leq k}(n_j)}{n\min(n_1, q)}\Big\} \quad\text{for } t = 1, \ldots, n.$$

Proof. Based on the definition of $\tilde{D}_l$, we have
$$\frac{1}{nq^2}\sum_{l=1}^{n}\|\tilde{D}_l'e_t\|^2 \leq \frac{2}{nq^2}\sum_{l=1}^{n}\Big|\sum_{i=1}^{q}d_{li}e_{ti}\delta_{li}\Big|^2 + \frac{2}{nq^2}\sum_{l=1}^{n}\Big|\sum_{i=1}^{q}e_{li}e_{ti}\delta_{li}\Big|^2, \tag{S3.5}$$
where
$$\frac{1}{nq^2}\sum_{l=1}^{n}\Big|\sum_{i=1}^{q}d_{li}e_{ti}\delta_{li}\Big|^2 \leq \frac{1}{n}\sum_{l=1}^{n}\Big(\frac{1}{q}\sum_{i=1}^{q}d_{li}^2\delta_{li}\Big)\Big(\frac{1}{q}\sum_{i=1}^{q}e_{ti}^2\Big).$$
By Assumption (C3), we know $E\{q^{-1}\sum_{i=1}^{q}e_{ti}^2\} \leq C$. From (S3.3), we have
$$\frac{1}{n}\sum_{l=1}^{n}\frac{1}{q}\sum_{i=1}^{q}d_{li}^2\delta_{li} = \frac{1}{n}\sum_{l=1}^{n}\sum_{j=2}^{k}\Big(\frac{1}{q}\sum_{i \in M_{mj}}d_{li}^2\Big)I(l \in T_j) \leq O_p\Big\{\frac{\max_{2\leq j\leq k}(n_j)}{nn_1}\Big\} + O_p\Big\{\frac{\max_{2\leq j\leq k}(n_jq_j)}{nq^2}\Big\}.$$
Thus, $(nq^2)^{-1}\sum_{l=1}^{n}|\sum_{i=1}^{q}d_{li}e_{ti}\delta_{li}|^2 \leq O_p\{\max_{2\leq j\leq k}(n_j)/(nn_1)\} + O_p\{\max_{2\leq j\leq k}(n_jq_j)/(nq^2)\}$. By Assumption (C3), we have
$$E\Big\{\frac{1}{nq}\sum_{l=1}^{n}\Big|\sum_{i=1}^{q}e_{li}e_{ti}\delta_{li}\Big|^2\Big\} = \frac{1}{nq}\sum_{l=1}^{n}\sum_{i=1}^{q}\sum_{u=1}^{q}E(e_{li}e_{ti}e_{lu}e_{tu})\delta_{li}\delta_{lu} = \frac{1}{nq}\sum_{l=1}^{n}\sum_{j=2}^{k}\Big\{\sum_{i \in M_{mj}}\sum_{u \in M_{mj}}E(e_{li}e_{ti}e_{lu}e_{tu})\Big\}I(l \in T_j) \leq \sum_{j=2}^{k}\Big\{\frac{n_j}{nq}\sum_{i \in M_{mj}}\sum_{u \in M_{mj}}\tau_{iu}^2 + \frac{1}{nq}\sum_{i \in M_{mj}}\sum_{u \in M_{mj}}|E(e_{ti}^2e_{tu}^2) - \tau_{iu}^2|\Big\} = O_p\Big\{\frac{\max_{2\leq j\leq k}(n_j)}{n}\Big\}.$$
Thus, $(nq^2)^{-1}\sum_{l=1}^{n}|\sum_{i=1}^{q}e_{li}e_{ti}\delta_{li}|^2 = O_p\{\max_{2\leq j\leq k}(n_j)/(nq)\}$. From inequality (S3.5),

we have $(nq^2)^{-1}\sum_{l=1}^{n}\|\tilde{D}_l'e_t\|^2 = O_p\{\max_{2\leq j\leq k}(n_j)/(n\min(n_1, q))\}$. The proof of the lemma is completed.

Lemma 13. Under Assumptions (C1)-(C5), we have
$$\frac{1}{nq^2}\sum_{l=1}^{n}\|\tilde{D}_l'\tilde{D}_t\|^2 = O_p\Big\{\frac{\max_{2\leq j\leq k}(n_j)}{n\min(n_1, q)}\Big\} \quad\text{for } t \in T_u,\ u = 2, \ldots, k.$$

Proof. Based on the definition of $\tilde{D}_l$, we have
$$\frac{1}{nq^2}\sum_{l=1}^{n}\|\tilde{D}_l'\tilde{D}_t\|^2 = \frac{1}{nq^2}\sum_{l=1}^{n}\Big|\sum_{i=1}^{q}\tilde{D}_{li}\tilde{D}_{ti}\Big|^2 = \frac{1}{nq^2}\sum_{l=1}^{n}\Big|\sum_{i=1}^{q}(d_{li} - e_{li})(d_{ti} - e_{ti})\delta_{li}\delta_{ti}\Big|^2 \leq C\Big\{\frac{1}{nq^2}\sum_{l=1}^{n}\Big|\sum_{i=1}^{q}d_{li}d_{ti}\delta_{li}\delta_{ti}\Big|^2 + \frac{1}{nq^2}\sum_{l=1}^{n}\Big|\sum_{i=1}^{q}d_{li}e_{ti}\delta_{li}\delta_{ti}\Big|^2 + \frac{1}{nq^2}\sum_{l=1}^{n}\Big|\sum_{i=1}^{q}e_{li}d_{ti}\delta_{li}\delta_{ti}\Big|^2 + \frac{1}{nq^2}\sum_{l=1}^{n}\Big|\sum_{i=1}^{q}e_{li}e_{ti}\delta_{li}\delta_{ti}\Big|^2\Big\} = C(I_1 + I_2 + I_3 + I_4).$$
From (S3.3), we have
$$I_1 \leq \frac{1}{n}\sum_{l=1}^{n}\Big(\frac{1}{q}\sum_{i=1}^{q}d_{li}^2\delta_{li}\Big)\Big(\frac{1}{q}\sum_{i=1}^{q}d_{ti}^2\delta_{ti}\Big) = \frac{1}{n}\sum_{l=1}^{n}\sum_{j=2}^{k}\Big(\frac{1}{q}\sum_{i \in M_{mj}}d_{li}^2\Big)I(l \in T_j)\Big(\frac{1}{q}\sum_{i \in M_{mu}}d_{ti}^2\Big) \leq O_p\Big\{\frac{\max_{2\leq j\leq k}(n_j)}{nn_1^2}\Big\} + O_p\Big\{\frac{\max_{2\leq j\leq k}(n_jq_j)}{nn_1q^2}\Big\} + O_p\Big\{\frac{\max_{2\leq j\leq k}(n_jq_j)q_u}{nq^4}\Big\}.$$
Based on Assumption (C3) and (S3.3), we have
$$I_2 \leq \frac{1}{n}\sum_{l=1}^{n}\Big(\frac{1}{q}\sum_{i=1}^{q}d_{li}^2\delta_{li}\Big)\Big(\frac{1}{q}\sum_{i=1}^{q}e_{ti}^2\delta_{ti}\Big) = \frac{1}{n}\sum_{l=1}^{n}\sum_{j=2}^{k}\Big(\frac{1}{q}\sum_{i \in M_{mj}}d_{li}^2\Big)I(l \in T_j)\Big(\frac{1}{q}\sum_{i \in M_{mu}}e_{ti}^2\Big) \leq O_p\Big\{\frac{\max_{2\leq j\leq k}(n_j)}{nn_1}\Big\} + O_p\Big\{\frac{\max_{2\leq j\leq k}(n_jq_j)}{nq^2}\Big\}.$$

Similarly, we have
$$I_3 \leq \frac{1}{n}\sum_{l=1}^{n}\Big(\frac{1}{q}\sum_{i=1}^{q}e_{li}^2\delta_{li}\Big)\Big(\frac{1}{q}\sum_{i=1}^{q}d_{ti}^2\delta_{ti}\Big) = \frac{1}{n}\sum_{l=1}^{n}\sum_{j=2}^{k}\Big(\frac{1}{q}\sum_{i \in M_{mj}}e_{li}^2\Big)I(l \in T_j)\Big(\frac{1}{q}\sum_{i \in M_{mu}}d_{ti}^2\Big) \leq O_p\Big\{\frac{\max_{2\leq j\leq k}(n_j)}{nn_1}\Big\} + O_p\Big\{\frac{\max_{2\leq j\leq k}(n_j)q_u}{nq^2}\Big\}.$$
Based on Assumption (C3), we have
$$I_4 = \frac{1}{nq^2}\sum_{l=1}^{n}\sum_{i=1}^{q}\sum_{a=1}^{q}e_{li}e_{la}e_{ti}e_{ta}\delta_{li}\delta_{la}\delta_{ti}\delta_{ta} = \frac{1}{nq^2}\sum_{j=2}^{k}\sum_{l=1}^{n}\Big\{\sum_{i \in M_{mj}\cap M_{mu}}\sum_{a \in M_{mj}\cap M_{mu}}e_{li}e_{la}e_{ti}e_{ta}\Big\}I(l \in T_j) \leq O_p\Big\{\frac{\max_{2\leq j\leq k}(n_j)}{nq}\Big\},$$
since
$$\frac{1}{q}E\Big\{\sum_{i \in M_{mj}\cap M_{mu}}\sum_{a \in M_{mj}\cap M_{mu}}e_{li}e_{la}e_{ti}e_{ta}\Big\} = \frac{1}{q}\sum_{i \in M_{mj}\cap M_{mu}}\sum_{a \in M_{mj}\cap M_{mu}}\tau_{ia}^2 \leq C \quad\text{if } l \neq t,$$
and
$$\frac{1}{q}E\Big\{\sum_{i \in M_{mj}\cap M_{mu}}\sum_{a \in M_{mj}\cap M_{mu}}e_{li}e_{la}e_{ti}e_{ta}\Big\} \leq \frac{1}{q}\sum_{i \in M_{mj}\cap M_{mu}}\sum_{a \in M_{mj}\cap M_{mu}}|E(e_{ti}^2e_{ta}^2) - \tau_{ia}^2| + \frac{1}{q}\sum_{i \in M_{mj}\cap M_{mu}}\sum_{a \in M_{mj}\cap M_{mu}}\tau_{ia}^2 \leq C \quad\text{if } l = t.$$
Thus, we can obtain $(nq^2)^{-1}\sum_{l=1}^{n}\|\tilde{D}_l'\tilde{D}_t\|^2 = O_p\{\max_{2\leq j\leq k}(n_j)/(n\min(n_1, q))\}$. The proof of the lemma is completed.

Lemma 14. Under Assumptions (C1)-(C5), we have
$$\frac{1}{n}\sum_{t=1}^{n}d_{ti}^2\delta_{ti} = O_p\Big\{\frac{\max_{2\leq j\leq k}(n_jq_j)}{nq^2}\Big\} + O_p\Big\{\frac{1}{\min(q^2, n_1)}\Big\} \quad\text{for } i = 1, \ldots, q.$$

Proof. Based on the definition of $d_{ti}$, we have
$$\frac{1}{n}\sum_{t=1}^{n}d_{ti}^2\delta_{ti} = \frac{1}{n}\sum_{t=1}^{n}(\tilde{\lambda}_i'\tilde{w}_t - \lambda_i'w_t)^2\delta_{ti} \leq C\Big\{\frac{1}{n}\sum_{t=1}^{n}\|\tilde{\lambda}_i - H^{-1}\lambda_i\|^2\|\tilde{w}_t - H'w_t\|^2\delta_{ti} + \frac{1}{n}\sum_{t=1}^{n}\|\tilde{\lambda}_i - H^{-1}\lambda_i\|^2\|H\|^2\|w_t\|^2\delta_{ti} + \frac{1}{n}\sum_{t=1}^{n}\|\lambda_i\|^2\|H^{-1}\|^2\|\tilde{w}_t - H'w_t\|^2\delta_{ti}\Big\} = C(I_1 + I_2 + I_3).$$
Based on Lemma 8, we have
$$I_1 = \frac{1}{n}\sum_{j=2}^{k}\Big(\sum_{t \in T_j}\|\tilde{w}_t - H'w_t\|^2\Big)\|\tilde{\lambda}_i - H^{-1}\lambda_i\|^2I(i \in M_{mj}) \leq \sum_{j=2}^{k}O_p\Big\{\frac{n_jq_j^2}{nq^2\min(n_1, q_j)\min(n_1, q^2)}\Big\}I(i \in M_{mj}).$$
By Assumption (C1), we have $I_2 \leq \|\tilde{\lambda}_i - H^{-1}\lambda_i\|^2\|H\|^2(n^{-1}\sum_{t=1}^{n}\|w_t\|^2) = O_p\{1/\min(n_1, q^2)\}$. By Assumption (C2), we have
$$I_3 = \frac{1}{n}\sum_{j=2}^{k}\sum_{t \in T_j}\|\lambda_i\|^2\|H^{-1}\|^2\|\tilde{w}_t - H'w_t\|^2I(i \in M_{mj}) \leq \sum_{j=2}^{k}O_p\Big\{\frac{n_jq_j^2}{nq^2\min(n_1, q_j)}\Big\}I(i \in M_{mj}).$$
Thus, we can obtain $n^{-1}\sum_{t=1}^{n}d_{ti}^2\delta_{ti} \leq O_p\{\max_{2\leq j\leq k}(n_jq_j)/(nq^2)\} + O_p\{1/\min(q^2, n_1)\}$. The proof of the lemma is completed.

Proof of Theorem 2. Using the definition of $\hat{W}$, we have $(nq)^{-1}\hat{Z}\hat{Z}'\hat{W} = \hat{W}\hat{V}$, where $\hat{V}$ is a diagonal matrix whose diagonal elements are the first $r$

eigenvalues of $\hat{Z}\hat{Z}'/(nq)$ in decreasing order. We have
$$\hat{w}_t = \frac{1}{nq}\hat{V}^{-1}\hat{W}'\hat{Z}\hat{Z}_t = \frac{1}{nq}\hat{V}^{-1}\hat{W}'(Z + \tilde{D})(Z_t + \tilde{D}_t) = \frac{1}{nq}\hat{V}^{-1}\hat{W}'(W\Lambda' + e + \tilde{D})(\Lambda w_t + e_t + \tilde{D}_t)$$
$$= \frac{1}{nq}\hat{V}^{-1}\{\hat{W}'W\Lambda'\Lambda w_t + \hat{W}'W\Lambda'e_t + \hat{W}'W\Lambda'\tilde{D}_t + \hat{W}'e\Lambda w_t + \hat{W}'ee_t + \hat{W}'e\tilde{D}_t + \hat{W}'\tilde{D}\Lambda w_t + \hat{W}'\tilde{D}e_t + \hat{W}'\tilde{D}\tilde{D}_t\}.$$
Thus, we obtain
$$\hat{w}_t - H'w_t = \hat{V}^{-1}\Big\{\frac{1}{nq}\hat{W}'W\Lambda'e_t + \frac{1}{nq}\hat{W}'W\Lambda'\tilde{D}_t + \frac{1}{nq}\hat{W}'e\Lambda w_t + \frac{1}{nq}\hat{W}'ee_t + \frac{1}{nq}\hat{W}'e\tilde{D}_t + \frac{1}{nq}\hat{W}'\tilde{D}\Lambda w_t + \frac{1}{nq}\hat{W}'\tilde{D}e_t + \frac{1}{nq}\hat{W}'\tilde{D}\tilde{D}_t\Big\} = \hat{V}^{-1}\sum_{i=1}^{8}I_i.$$
Using the definition of $\tilde{D}_t$, we have
$$\hat{w}_t - H'w_t = \begin{cases}\hat{V}^{-1}(I_1 + I_3 + I_4 + I_6 + I_7), & t \in T_1,\\ \hat{V}^{-1}\sum_{i=1}^{8}I_i, & t \in T_u,\ u = 2, \ldots, k.\end{cases}$$
By Assumptions (C1), (C2) and (C3), together with $\hat{W}'\hat{W}/n = I_r$, we have
$$\|I_1\| = \Big\|\frac{1}{nq}\hat{W}'W\Lambda'e_t\Big\| \leq \frac{1}{\sqrt{q}}\Big\|\frac{\hat{W}'\hat{W}}{n}\Big\|^{1/2}\Big\|\frac{W'W}{n}\Big\|^{1/2}\Big\|\frac{1}{\sqrt{q}}\sum_{i=1}^{q}\lambda_ie_{ti}\Big\| = O_p\Big(\frac{1}{\sqrt{q}}\Big),$$
where $\|q^{-1/2}\sum_{i=1}^{q}\lambda_ie_{ti}\| = O_p(1)$, since $E\{q^{-1/2}\sum_{i=1}^{q}\lambda_ie_{ti}\} = 0$ and $E(\|q^{-1/2}\sum_{i=1}^{q}\lambda_ie_{ti}\|^2) \leq \bar{\lambda}^2q^{-1}\sum_{i=1}^{q}\tau_{ii} \leq C$. Based on Assumption (C1)

and (C3) with $\hat{W}'\hat{W}/n = I_r$, we have
$$\|I_3\| = \Big\|\frac{1}{nq}\hat{W}'e\Lambda w_t\Big\| = \Big\|\frac{1}{nq}\sum_{l=1}^{n}\hat{w}_le_l'\Lambda w_t\Big\| \leq \frac{1}{\sqrt{q}}\Big(\frac{1}{n}\sum_{l=1}^{n}\|\hat{w}_l\|^2\Big)^{1/2}\|w_t\|\Big(\frac{1}{n}\sum_{l=1}^{n}\Big\|\frac{1}{\sqrt{q}}\sum_{i=1}^{q}\lambda_ie_{li}\Big\|^2\Big)^{1/2} = O_p\Big(\frac{1}{\sqrt{q}}\Big).$$
Based on Assumption (C3), we can obtain
$$\|I_4\| = \Big\|\frac{1}{nq}\hat{W}'ee_t\Big\| = \Big\|\frac{1}{nq}\sum_{l=1}^{n}\hat{w}_le_l'e_t\Big\| \leq \frac{1}{\sqrt{q}}\Big(\frac{1}{n}\sum_{l=1}^{n}\|\hat{w}_l\|^2\Big)^{1/2}\Big(\frac{1}{nq}\sum_{l=1}^{n}\Big|\frac{e_l'e_t}{\sqrt{q}}\Big|^2\Big)^{1/2} = O_p\Big(\frac{1}{\sqrt{q}}\Big),$$
where $(nq)^{-1}\sum_{l=1}^{n}|e_l'e_t/\sqrt{q}|^2 = O_p(1)$, since
$$E\Big\{\frac{1}{nq}\sum_{l=1}^{n}\Big|\frac{e_l'e_t}{\sqrt{q}}\Big|^2\Big\} = \frac{1}{nq^2}\sum_{i=1}^{q}\sum_{j=1}^{q}\sum_{l=1}^{n}E(e_{li}e_{lj}e_{ti}e_{tj}) = \frac{1}{nq^2}\sum_{i=1}^{q}\sum_{j=1}^{q}\{n\tau_{ij}^2 + E(e_{ti}^2e_{tj}^2) - \tau_{ij}^2\} = \frac{1}{q^2}\sum_{i=1}^{q}\sum_{j=1}^{q}\tau_{ij}^2 + \frac{1}{nq^2}\sum_{i=1}^{q}\sum_{j=1}^{q}\{E(e_{ti}^2e_{tj}^2) - \tau_{ij}^2\} \leq C.$$
Based on Assumption (C1), $\hat{W}'\hat{W}/n = I_r$ and Lemma 9, we have, for $t \in T_u$, $u = 2, \ldots, k$,
$$\|I_2\| = \Big\|\frac{1}{nq}\hat{W}'W\Lambda'\tilde{D}_t\Big\| \leq \Big\|\frac{\hat{W}'\hat{W}}{n}\Big\|^{1/2}\Big\|\frac{W'W}{n}\Big\|^{1/2}\Big\|\frac{\Lambda'\tilde{D}_t}{q}\Big\| = O_p(1)\Big\|\frac{\Lambda'\tilde{D}_t}{q}\Big\| = O_p\Big\{\frac{1}{\min(\sqrt{n_1}, \sqrt{q})}\Big\}.$$
For $\hat{W}'\hat{W}/n = I_r$ and Lemma 10, we have
$$\|I_5\| \leq \Big(\frac{1}{n}\sum_{l=1}^{n}\|\hat{w}_l\|^2\Big)^{1/2}\Big(\frac{1}{nq^2}\sum_{l=1}^{n}|e_l'\tilde{D}_t|^2\Big)^{1/2} = O_p(1)\Big(\frac{1}{nq^2}\sum_{l=1}^{n}|e_l'\tilde{D}_t|^2\Big)^{1/2} = O_p\Big\{\frac{1}{\min(\sqrt{n_1}, \sqrt{q})}\Big\}$$
for $t \in T_u$, $u = 2, \ldots, k$. By Assumption (C1), $\hat{W}'\hat{W}/n = I_r$ and Lemma 11,

we have
$$\|I_6\| = \Big\|\frac{1}{nq}\sum_{l=1}^{n}\hat{w}_l\tilde{D}_l'\Lambda w_t\Big\| \leq \frac{1}{\sqrt{q}}\Big(\frac{1}{n}\sum_{l=1}^{n}\|\hat{w}_l\|^2\Big)^{1/2}\|w_t\|\Big(\frac{1}{nq}\sum_{l=1}^{n}\|\Lambda'\tilde{D}_l\|^2\Big)^{1/2} = O_p\Big[\Big\{\frac{\max_{2\leq j\leq k}(n_j)}{nq}\Big\}^{1/2}\Big].$$
Similarly, by Lemma 12, we have
$$\|I_7\| \leq \Big(\frac{1}{n}\sum_{l=1}^{n}\|\hat{w}_l\|^2\Big)^{1/2}\Big(\frac{1}{nq^2}\sum_{l=1}^{n}\|\tilde{D}_l'e_t\|^2\Big)^{1/2} = O_p\Big[\Big\{\frac{\max_{2\leq j\leq k}(n_j)}{n\min(n_1, q)}\Big\}^{1/2}\Big].$$
Based on Lemma 13, we can obtain
$$\|I_8\| \leq \Big(\frac{1}{n}\sum_{l=1}^{n}\|\hat{w}_l\|^2\Big)^{1/2}\Big(\frac{1}{nq^2}\sum_{l=1}^{n}\|\tilde{D}_l'\tilde{D}_t\|^2\Big)^{1/2} = O_p\Big[\Big\{\frac{\max_{2\leq j\leq k}(n_j)}{n\min(n_1, q)}\Big\}^{1/2}\Big]$$
for $t \in T_u$, $u = 2, \ldots, k$. By an argument similar to that of Lemma A.1 in Bai (2003), we can obtain that $\hat{V}^{-1}$ is bounded. Thus, we have
$$\hat{w}_t - H'w_t = O_p\Big(\frac{1}{\sqrt{q}}\Big) + O_p\Big[\Big\{\frac{\max_{2\leq j\leq k}(n_j)}{nn_1}\Big\}^{1/2}\Big] \quad\text{for } t \in T_1,$$
and $\hat{w}_t - H'w_t = O_p\{1/\min(\sqrt{q}, \sqrt{n_1})\}$ for $t \in T_u$, $u = 2, \ldots, k$. The proof of part (i) of Theorem 2 is completed.

We now prove part (ii). From $\hat{\Lambda} = \hat{Z}'\hat{W}/n = \sum_{t=1}^{n}\hat{Z}_t\hat{w}_t'/n$ and $\hat{Z}_t = \Lambda w_t + e_t + \tilde{D}_t$, we have $\hat{\lambda}_i = n^{-1}\sum_{t=1}^{n}\hat{w}_tw_t'\lambda_i + n^{-1}\sum_{t=1}^{n}\hat{w}_te_{ti} + n^{-1}\sum_{t=1}^{n}\hat{w}_t\tilde{D}_{ti}$. Writing $w_t = w_t - (H^{-1})'\hat{w}_t + (H^{-1})'\hat{w}_t$ and using $\hat{W}'\hat{W}/n = I_r$,

we have
$$\hat{\lambda}_i - H^{-1}\lambda_i = -\frac{1}{n}\sum_{t=1}^{n}(\hat{w}_t - H'w_t)(\hat{w}_t - H'w_t)'H^{-1}\lambda_i - \frac{1}{n}\sum_{t=1}^{n}H'w_t(\hat{w}_t - H'w_t)'H^{-1}\lambda_i + \frac{1}{n}\sum_{t=1}^{n}(\hat{w}_t - H'w_t)e_{ti} + \frac{1}{n}\sum_{t=1}^{n}H'w_te_{ti} + \frac{1}{n}\sum_{t=1}^{n}(\hat{w}_t - H'w_t)\tilde{D}_{ti} + \frac{1}{n}\sum_{t=1}^{n}H'w_t\tilde{D}_{ti} = J_1 + J_2 + J_3 + J_4 + J_5 + J_6.$$
Assumptions (C1) and (C2), together with $\hat{W}'\hat{W}/n = I_r$, imply that $\|H\| \leq \|\Lambda'\Lambda/q\|\,\|W'W/n\|^{1/2}\,\|\hat{W}'\hat{W}/n\|^{1/2}\,\|\hat{V}^{-1}\| = O_p(1)$. Following part (i) of Theorem 2, we have
$$\|J_1\| \leq \frac{1}{n}\sum_{t=1}^{n}\|\hat{w}_t - H'w_t\|^2\,\|H^{-1}\lambda_i\| = O_p\Big\{\frac{1}{\min(q, n_1)}\Big\}.$$
Based on Assumptions (C1)-(C2) and $\hat{W}'\hat{W}/n = I_r$, we have
$$\|J_2\| \leq \|H\|\Big(\frac{1}{n}\sum_{t=1}^{n}\|w_t\|^2\Big)^{1/2}\Big(\frac{1}{n}\sum_{t=1}^{n}\|\hat{w}_t - H'w_t\|^2\Big)^{1/2}\|H^{-1}\lambda_i\| = O_p\Big\{\frac{1}{\min(\sqrt{q}, \sqrt{n_1})}\Big\}.$$
By Assumption (C3), we have
$$\|J_3\| \leq \Big(\frac{1}{n}\sum_{t=1}^{n}\|\hat{w}_t - H'w_t\|^2\Big)^{1/2}\Big(\frac{1}{n}\sum_{t=1}^{n}e_{ti}^2\Big)^{1/2} = O_p\Big\{\frac{1}{\min(\sqrt{q}, \sqrt{n_1})}\Big\}.$$
By Assumption (C4), we have $J_4 = H'n^{-1}\sum_{t=1}^{n}w_te_{ti} = O_p(1/\sqrt{n_1})$. Based on the definition of $\tilde{D}_t$, we have
$$\|J_5\| = \Big\|\frac{1}{n}\sum_{t=1}^{n}(\hat{w}_t - H'w_t)(d_{ti} - e_{ti})\delta_{ti}\Big\| \leq \Big(\frac{1}{n}\sum_{t=1}^{n}\|\hat{w}_t - H'w_t\|^2\delta_{ti}\Big)^{1/2}\Big(\frac{1}{n}\sum_{t=1}^{n}d_{ti}^2\delta_{ti}\Big)^{1/2} + \Big(\frac{1}{n}\sum_{t=1}^{n}\|\hat{w}_t - H'w_t\|^2\delta_{ti}\Big)^{1/2}\Big(\frac{1}{n}\sum_{t=1}^{n}e_{ti}^2\delta_{ti}\Big)^{1/2},$$

where
$$\frac{1}{n}\sum_{t=1}^{n}\|\hat{w}_t - H'w_t\|^2\delta_{ti} = \sum_{j=2}^{k}\frac{1}{n}\sum_{t \in T_j}\|\hat{w}_t - H'w_t\|^2I(i \in M_{mj}) \leq O_p\Big\{\frac{\max_{2\leq j\leq k}(n_j)}{n\min(q, n_1)}\Big\} = O_p\Big\{\frac{1}{\min(q, n_1)}\Big\}.$$
By Assumption (C3), we have $n^{-1}\sum_{t=1}^{n}e_{ti}^2\delta_{ti} = O_p(1)$. Based on Lemma 14, we have $J_5 = O_p\{1/\min(\sqrt{q}, \sqrt{n_1})\}$. Similarly, we can prove that $J_6 = O_p\{1/\min(\sqrt{q}, \sqrt{n_1})\}$. Thus, we can obtain $\hat{\lambda}_i - H^{-1}\lambda_i = O_p\{1/\min(\sqrt{q}, \sqrt{n_1})\}$. The proof of part (ii) is completed.

Proof of Theorem 3. Based on the definition of $\hat{\alpha}$, we have
$$\hat{\alpha} = (\hat{W}'\hat{W})^{-1}\hat{W}'Y = \Big(\sum_{t=1}^{n}\hat{w}_t\hat{w}_t'\Big)^{-1}\sum_{t=1}^{n}\hat{w}_ty_t$$
$$= \Big\{\sum_{t=1}^{n}(\hat{w}_t - H'w_t)(\hat{w}_t - H'w_t)' + \sum_{t=1}^{n}(\hat{w}_t - H'w_t)w_t'H + H'\sum_{t=1}^{n}w_t(\hat{w}_t - H'w_t)' + H'\sum_{t=1}^{n}w_tw_t'H\Big\}^{-1}\Big\{\sum_{t=1}^{n}(\hat{w}_t - H'w_t)y_t + H'\sum_{t=1}^{n}w_ty_t\Big\}$$
$$= \Big\{H'\sum_{t=1}^{n}w_tw_t'H + o_p(n)\Big\}^{-1}\Big\{H'\sum_{t=1}^{n}w_tw_t'\alpha + H'\sum_{t=1}^{n}w_t\varepsilon_t + o_p(n)\Big\} = H^{-1}\alpha + H^{-1}\Big(\sum_{t=1}^{n}w_tw_t'\Big)^{-1}\sum_{t=1}^{n}w_t\varepsilon_t + o_p(1),$$
since $\|H\| = O_p(1)$ and Assumption (C1) together with part (i) of Theorem 2 hold. Thus, from Assumption (C6), we can obtain
$$\hat{\alpha} - H^{-1}\alpha = H^{-1}\Big(\sum_{t=1}^{n}w_tw_t'\Big)^{-1}\sum_{t=1}^{n}w_t\varepsilon_t + o_p(1) \to_p 0.$$

The proof of part (i) is completed. Similarly, we have
$$\hat{\alpha}'\hat{w}_t - \alpha'w_t = \hat{\alpha}'\hat{w}_t - (H^{-1}\alpha)'(H'w_t) = (\hat{\alpha} - H^{-1}\alpha)'(\hat{w}_t - H'w_t) + (\hat{\alpha} - H^{-1}\alpha)'H'w_t + (H^{-1}\alpha)'(\hat{w}_t - H'w_t) \to_p 0.$$
The proof of part (ii) is completed.

Bibliography

Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica 70, 191-221.

Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica 71, 135-171.

Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society B 70, 849-911.


Statistica Sinica Preprint No: SS-2018-0008
Title: Imputed Factor Regression for High-dimensional Block-wise Missing Data
Manuscript ID: SS-2018-0008
URL: http://www.stat.sinica.edu.tw/statistica/
DOI: 10.5705/ss.202018.0008


More information

1 Regression with High Dimensional Data

1 Regression with High Dimensional Data 6.883 Learning with Combinatorial Structure ote for Lecture 11 Instructor: Prof. Stefanie Jegelka Scribe: Xuhong Zhang 1 Regression with High Dimensional Data Consider the following regression problem:

More information

Linear Ordinary Differential Equations

Linear Ordinary Differential Equations MTH.B402; Sect. 1 20180703) 2 Linear Ordinary Differential Equations Preliminaries: Matrix Norms. Denote by M n R) the set of n n matrix with real components, which can be identified the vector space R

More information

Factor-Adjusted Robust Multiple Test. Jianqing Fan (Princeton University)

Factor-Adjusted Robust Multiple Test. Jianqing Fan (Princeton University) Factor-Adjusted Robust Multiple Test Jianqing Fan Princeton University with Koushiki Bose, Qiang Sun, Wenxin Zhou August 11, 2017 Outline 1 Introduction 2 A principle of robustification 3 Adaptive Huber

More information

Chapter 17: Undirected Graphical Models

Chapter 17: Undirected Graphical Models Chapter 17: Undirected Graphical Models The Elements of Statistical Learning Biaobin Jiang Department of Biological Sciences Purdue University bjiang@purdue.edu October 30, 2014 Biaobin Jiang (Purdue)

More information

Ma 227 Review for Systems of DEs

Ma 227 Review for Systems of DEs Ma 7 Review for Systems of DEs Matrices Basic Properties Addition and subtraction: Let A a ij mn and B b ij mn.then A B a ij b ij mn 3 A 6 B 6 4 7 6 A B 6 4 3 7 6 6 7 3 Scaler Multiplication: Let k be

More information

Linear Systems and Matrices

Linear Systems and Matrices Department of Mathematics The Chinese University of Hong Kong 1 System of m linear equations in n unknowns (linear system) a 11 x 1 + a 12 x 2 + + a 1n x n = b 1 a 21 x 1 + a 22 x 2 + + a 2n x n = b 2.......

More information

2 Two-Point Boundary Value Problems

2 Two-Point Boundary Value Problems 2 Two-Point Boundary Value Problems Another fundamental equation, in addition to the heat eq. and the wave eq., is Poisson s equation: n j=1 2 u x 2 j The unknown is the function u = u(x 1, x 2,..., x

More information

Multivariate Statistics Random Projections and Johnson-Lindenstrauss Lemma

Multivariate Statistics Random Projections and Johnson-Lindenstrauss Lemma Multivariate Statistics Random Projections and Johnson-Lindenstrauss Lemma Suppose again we have n sample points x,..., x n R p. The data-point x i R p can be thought of as the i-th row X i of an n p-dimensional

More information

high-dimensional inference robust to the lack of model sparsity

high-dimensional inference robust to the lack of model sparsity high-dimensional inference robust to the lack of model sparsity Jelena Bradic (joint with a PhD student Yinchu Zhu) www.jelenabradic.net Assistant Professor Department of Mathematics University of California,

More information

High-dimensional Ordinary Least-squares Projection for Screening Variables

High-dimensional Ordinary Least-squares Projection for Screening Variables 1 / 38 High-dimensional Ordinary Least-squares Projection for Screening Variables Chenlei Leng Joint with Xiangyu Wang (Duke) Conference on Nonparametric Statistics for Big Data and Celebration to Honor

More information

Kernel Method: Data Analysis with Positive Definite Kernels

Kernel Method: Data Analysis with Positive Definite Kernels Kernel Method: Data Analysis with Positive Definite Kernels 2. Positive Definite Kernel and Reproducing Kernel Hilbert Space Kenji Fukumizu The Institute of Statistical Mathematics. Graduate University

More information

Kernel Methods. Machine Learning A W VO

Kernel Methods. Machine Learning A W VO Kernel Methods Machine Learning A 708.063 07W VO Outline 1. Dual representation 2. The kernel concept 3. Properties of kernels 4. Examples of kernel machines Kernel PCA Support vector regression (Relevance

More information

The inverse of a tridiagonal matrix

The inverse of a tridiagonal matrix Linear Algebra and its Applications 325 (2001) 109 139 www.elsevier.com/locate/laa The inverse of a tridiagonal matrix Ranjan K. Mallik Department of Electrical Engineering, Indian Institute of Technology,

More information

Improving linear quantile regression for

Improving linear quantile regression for Improving linear quantile regression for replicated data arxiv:1901.0369v1 [stat.ap] 16 Jan 2019 Kaushik Jana 1 and Debasis Sengupta 2 1 Imperial College London, UK 2 Indian Statistical Institute, Kolkata,

More information

Inference After Variable Selection

Inference After Variable Selection Department of Mathematics, SIU Carbondale Inference After Variable Selection Lasanthi Pelawa Watagoda lasanthi@siu.edu June 12, 2017 Outline 1 Introduction 2 Inference For Ridge and Lasso 3 Variable Selection

More information

STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song

STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song Presenter: Jiwei Zhao Department of Statistics University of Wisconsin Madison April

More information

Variable Selection for Highly Correlated Predictors

Variable Selection for Highly Correlated Predictors Variable Selection for Highly Correlated Predictors Fei Xue and Annie Qu Department of Statistics, University of Illinois at Urbana-Champaign WHOA-PSI, Aug, 2017 St. Louis, Missouri 1 / 30 Background Variable

More information

CHAPTER 10 Shape Preserving Properties of B-splines

CHAPTER 10 Shape Preserving Properties of B-splines CHAPTER 10 Shape Preserving Properties of B-splines In earlier chapters we have seen a number of examples of the close relationship between a spline function and its B-spline coefficients This is especially

More information

Supplementary Materials: Martingale difference correlation and its use in high dimensional variable screening by Xiaofeng Shao and Jingsi Zhang

Supplementary Materials: Martingale difference correlation and its use in high dimensional variable screening by Xiaofeng Shao and Jingsi Zhang Supplementary Materials: Martingale difference correlation and its use in high dimensional variable screening by Xiaofeng Shao and Jingsi Zhang The supplementary material contains some additional simulation

More information

Monte-Carlo MMD-MA, Université Paris-Dauphine. Xiaolu Tan

Monte-Carlo MMD-MA, Université Paris-Dauphine. Xiaolu Tan Monte-Carlo MMD-MA, Université Paris-Dauphine Xiaolu Tan tan@ceremade.dauphine.fr Septembre 2015 Contents 1 Introduction 1 1.1 The principle.................................. 1 1.2 The error analysis

More information

ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS

ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS Jian Huang 1, Joel L. Horowitz 2, and Shuangge Ma 3 1 Department of Statistics and Actuarial Science, University

More information

Statistics & Data Sciences: First Year Prelim Exam May 2018

Statistics & Data Sciences: First Year Prelim Exam May 2018 Statistics & Data Sciences: First Year Prelim Exam May 2018 Instructions: 1. Do not turn this page until instructed to do so. 2. Start each new question on a new sheet of paper. 3. This is a closed book

More information

Next tool is Partial ACF; mathematical tools first. The Multivariate Normal Distribution. e z2 /2. f Z (z) = 1 2π. e z2 i /2

Next tool is Partial ACF; mathematical tools first. The Multivariate Normal Distribution. e z2 /2. f Z (z) = 1 2π. e z2 i /2 Next tool is Partial ACF; mathematical tools first. The Multivariate Normal Distribution Defn: Z R 1 N(0,1) iff f Z (z) = 1 2π e z2 /2 Defn: Z R p MV N p (0, I) if and only if Z = (Z 1,..., Z p ) (a column

More information

ANOVA: Analysis of Variance - Part I

ANOVA: Analysis of Variance - Part I ANOVA: Analysis of Variance - Part I The purpose of these notes is to discuss the theory behind the analysis of variance. It is a summary of the definitions and results presented in class with a few exercises.

More information

Covariance Stationary Time Series. Example: Independent White Noise (IWN(0,σ 2 )) Y t = ε t, ε t iid N(0,σ 2 )

Covariance Stationary Time Series. Example: Independent White Noise (IWN(0,σ 2 )) Y t = ε t, ε t iid N(0,σ 2 ) Covariance Stationary Time Series Stochastic Process: sequence of rv s ordered by time {Y t } {...,Y 1,Y 0,Y 1,...} Defn: {Y t } is covariance stationary if E[Y t ]μ for all t cov(y t,y t j )E[(Y t μ)(y

More information

8.1 Concentration inequality for Gaussian random matrix (cont d)

8.1 Concentration inequality for Gaussian random matrix (cont d) MGMT 69: Topics in High-dimensional Data Analysis Falll 26 Lecture 8: Spectral clustering and Laplacian matrices Lecturer: Jiaming Xu Scribe: Hyun-Ju Oh and Taotao He, October 4, 26 Outline Concentration

More information

Linear Algebra review Powers of a diagonalizable matrix Spectral decomposition

Linear Algebra review Powers of a diagonalizable matrix Spectral decomposition Linear Algebra review Powers of a diagonalizable matrix Spectral decomposition Prof. Tesler Math 283 Fall 2018 Also see the separate version of this with Matlab and R commands. Prof. Tesler Diagonalizing

More information

5: MULTIVARATE STATIONARY PROCESSES

5: MULTIVARATE STATIONARY PROCESSES 5: MULTIVARATE STATIONARY PROCESSES 1 1 Some Preliminary Definitions and Concepts Random Vector: A vector X = (X 1,..., X n ) whose components are scalarvalued random variables on the same probability

More information

Def. The euclidian distance between two points x = (x 1,...,x p ) t and y = (y 1,...,y p ) t in the p-dimensional space R p is defined as

Def. The euclidian distance between two points x = (x 1,...,x p ) t and y = (y 1,...,y p ) t in the p-dimensional space R p is defined as MAHALANOBIS DISTANCE Def. The euclidian distance between two points x = (x 1,...,x p ) t and y = (y 1,...,y p ) t in the p-dimensional space R p is defined as d E (x, y) = (x 1 y 1 ) 2 + +(x p y p ) 2

More information

Ch.10 Autocorrelated Disturbances (June 15, 2016)

Ch.10 Autocorrelated Disturbances (June 15, 2016) Ch10 Autocorrelated Disturbances (June 15, 2016) In a time-series linear regression model setting, Y t = x tβ + u t, t = 1, 2,, T, (10-1) a common problem is autocorrelation, or serial correlation of the

More information

On the Skeel condition number, growth factor and pivoting strategies for Gaussian elimination

On the Skeel condition number, growth factor and pivoting strategies for Gaussian elimination On the Skeel condition number, growth factor and pivoting strategies for Gaussian elimination J.M. Peña 1 Introduction Gaussian elimination (GE) with a given pivoting strategy, for nonsingular matrices

More information

II. Determinant Functions

II. Determinant Functions Supplemental Materials for EE203001 Students II Determinant Functions Chung-Chin Lu Department of Electrical Engineering National Tsing Hua University May 22, 2003 1 Three Axioms for a Determinant Function

More information

LECTURE 10 LINEAR PROCESSES II: SPECTRAL DENSITY, LAG OPERATOR, ARMA. In this lecture, we continue to discuss covariance stationary processes.

LECTURE 10 LINEAR PROCESSES II: SPECTRAL DENSITY, LAG OPERATOR, ARMA. In this lecture, we continue to discuss covariance stationary processes. MAY, 0 LECTURE 0 LINEAR PROCESSES II: SPECTRAL DENSITY, LAG OPERATOR, ARMA In this lecture, we continue to discuss covariance stationary processes. Spectral density Gourieroux and Monfort 990), Ch. 5;

More information

Random matrices: Distribution of the least singular value (via Property Testing)

Random matrices: Distribution of the least singular value (via Property Testing) Random matrices: Distribution of the least singular value (via Property Testing) Van H. Vu Department of Mathematics Rutgers vanvu@math.rutgers.edu (joint work with T. Tao, UCLA) 1 Let ξ be a real or complex-valued

More information

1 Multiply Eq. E i by λ 0: (λe i ) (E i ) 2 Multiply Eq. E j by λ and add to Eq. E i : (E i + λe j ) (E i )

1 Multiply Eq. E i by λ 0: (λe i ) (E i ) 2 Multiply Eq. E j by λ and add to Eq. E i : (E i + λe j ) (E i ) Direct Methods for Linear Systems Chapter Direct Methods for Solving Linear Systems Per-Olof Persson persson@berkeleyedu Department of Mathematics University of California, Berkeley Math 18A Numerical

More information

Gaussian Elimination and Back Substitution

Gaussian Elimination and Back Substitution Jim Lambers MAT 610 Summer Session 2009-10 Lecture 4 Notes These notes correspond to Sections 31 and 32 in the text Gaussian Elimination and Back Substitution The basic idea behind methods for solving

More information

MATH 205C: STATIONARY PHASE LEMMA

MATH 205C: STATIONARY PHASE LEMMA MATH 205C: STATIONARY PHASE LEMMA For ω, consider an integral of the form I(ω) = e iωf(x) u(x) dx, where u Cc (R n ) complex valued, with support in a compact set K, and f C (R n ) real valued. Thus, I(ω)

More information

Chapter 3. Linear and Nonlinear Systems

Chapter 3. Linear and Nonlinear Systems 59 An expert is someone who knows some of the worst mistakes that can be made in his subject, and how to avoid them Werner Heisenberg (1901-1976) Chapter 3 Linear and Nonlinear Systems In this chapter

More information

Methods for sparse analysis of high-dimensional data, II

Methods for sparse analysis of high-dimensional data, II Methods for sparse analysis of high-dimensional data, II Rachel Ward May 26, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 55 High dimensional

More information

EQUIVARIANT COHOMOLOGY IN ALGEBRAIC GEOMETRY LECTURE TEN: MORE ON FLAG VARIETIES

EQUIVARIANT COHOMOLOGY IN ALGEBRAIC GEOMETRY LECTURE TEN: MORE ON FLAG VARIETIES EQUIVARIANT COHOMOLOGY IN ALGEBRAIC GEOMETRY LECTURE TEN: MORE ON FLAG VARIETIES WILLIAM FULTON NOTES BY DAVE ANDERSON 1 A. Molev has just given a simple, efficient, and positive formula for the structure

More information

RANK AND PERIMETER PRESERVER OF RANK-1 MATRICES OVER MAX ALGEBRA

RANK AND PERIMETER PRESERVER OF RANK-1 MATRICES OVER MAX ALGEBRA Discussiones Mathematicae General Algebra and Applications 23 (2003 ) 125 137 RANK AND PERIMETER PRESERVER OF RANK-1 MATRICES OVER MAX ALGEBRA Seok-Zun Song and Kyung-Tae Kang Department of Mathematics,

More information

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices)

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Chapter 14 SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Today we continue the topic of low-dimensional approximation to datasets and matrices. Last time we saw the singular

More information

Basic Concepts in Linear Algebra

Basic Concepts in Linear Algebra Basic Concepts in Linear Algebra Grady B Wright Department of Mathematics Boise State University February 2, 2015 Grady B Wright Linear Algebra Basics February 2, 2015 1 / 39 Numerical Linear Algebra Linear

More information

A note on a Marčenko-Pastur type theorem for time series. Jianfeng. Yao

A note on a Marčenko-Pastur type theorem for time series. Jianfeng. Yao A note on a Marčenko-Pastur type theorem for time series Jianfeng Yao Workshop on High-dimensional statistics The University of Hong Kong, October 2011 Overview 1 High-dimensional data and the sample covariance

More information

On prediction and density estimation Peter McCullagh University of Chicago December 2004

On prediction and density estimation Peter McCullagh University of Chicago December 2004 On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating

More information

Linear Algebra Highlights

Linear Algebra Highlights Linear Algebra Highlights Chapter 1 A linear equation in n variables is of the form a 1 x 1 + a 2 x 2 + + a n x n. We can have m equations in n variables, a system of linear equations, which we want to

More information

Course 495: Advanced Statistical Machine Learning/Pattern Recognition

Course 495: Advanced Statistical Machine Learning/Pattern Recognition Course 495: Advanced Statistical Machine Learning/Pattern Recognition Deterministic Component Analysis Goal (Lecture): To present standard and modern Component Analysis (CA) techniques such as Principal

More information

Data Analysis and Manifold Learning Lecture 9: Diffusion on Manifolds and on Graphs

Data Analysis and Manifold Learning Lecture 9: Diffusion on Manifolds and on Graphs Data Analysis and Manifold Learning Lecture 9: Diffusion on Manifolds and on Graphs Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inrialpes.fr http://perception.inrialpes.fr/ Outline of Lecture

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57

More information

Online Exercises for Linear Algebra XM511

Online Exercises for Linear Algebra XM511 This document lists the online exercises for XM511. The section ( ) numbers refer to the textbook. TYPE I are True/False. Lecture 02 ( 1.1) Online Exercises for Linear Algebra XM511 1) The matrix [3 2

More information

High Dimensional Inverse Covariate Matrix Estimation via Linear Programming

High Dimensional Inverse Covariate Matrix Estimation via Linear Programming High Dimensional Inverse Covariate Matrix Estimation via Linear Programming Ming Yuan October 24, 2011 Gaussian Graphical Model X = (X 1,..., X p ) indep. N(µ, Σ) Inverse covariance matrix Σ 1 = Ω = (ω

More information

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015 Part IB Statistics Theorems with proof Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

Lecture 14: SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Lecturer: Sanjeev Arora

Lecture 14: SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Lecturer: Sanjeev Arora princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 14: SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Lecturer: Sanjeev Arora Scribe: Today we continue the

More information

Ma/CS 6b Class 20: Spectral Graph Theory

Ma/CS 6b Class 20: Spectral Graph Theory Ma/CS 6b Class 20: Spectral Graph Theory By Adam Sheffer Eigenvalues and Eigenvectors A an n n matrix of real numbers. The eigenvalues of A are the numbers λ such that Ax = λx for some nonzero vector x

More information

Review of Basic Concepts in Linear Algebra

Review of Basic Concepts in Linear Algebra Review of Basic Concepts in Linear Algebra Grady B Wright Department of Mathematics Boise State University September 7, 2017 Math 565 Linear Algebra Review September 7, 2017 1 / 40 Numerical Linear Algebra

More information

Hamiltonian simulation with nearly optimal dependence on all parameters

Hamiltonian simulation with nearly optimal dependence on all parameters Hamiltonian simulation with nearly optimal dependence on all parameters Dominic Berry + Andrew Childs obin Kothari ichard Cleve olando Somma Quantum simulation by quantum walks Dominic Berry + Andrew Childs

More information

Methods for sparse analysis of high-dimensional data, II

Methods for sparse analysis of high-dimensional data, II Methods for sparse analysis of high-dimensional data, II Rachel Ward May 23, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 47 High dimensional

More information

Lecture 9: Submatrices and Some Special Matrices

Lecture 9: Submatrices and Some Special Matrices Lecture 9: Submatrices and Some Special Matrices Winfried Just Department of Mathematics, Ohio University February 2, 2018 Submatrices A submatrix B of a given matrix A is any matrix that can be obtained

More information

Differential equations

Differential equations Differential equations Math 7 Spring Practice problems for April Exam Problem Use the method of elimination to find the x-component of the general solution of x y = 6x 9x + y = x 6y 9y Soln: The system

More information

Randomized sparsification in NP-hard norms

Randomized sparsification in NP-hard norms 58 Chapter 3 Randomized sparsification in NP-hard norms Massive matrices are ubiquitous in modern data processing. Classical dense matrix algorithms are poorly suited to such problems because their running

More information

Interaction balance in symmetrical factorial designs with generalized minimum aberration

Interaction balance in symmetrical factorial designs with generalized minimum aberration Interaction balance in symmetrical factorial designs with generalized minimum aberration Mingyao Ai and Shuyuan He LMAM, School of Mathematical Sciences, Peing University, Beijing 100871, P. R. China Abstract:

More information

I forgot to mention last time: in the Ito formula for two standard processes, putting

I forgot to mention last time: in the Ito formula for two standard processes, putting I forgot to mention last time: in the Ito formula for two standard processes, putting dx t = a t dt + b t db t dy t = α t dt + β t db t, and taking f(x, y = xy, one has f x = y, f y = x, and f xx = f yy

More information

Lecture 20: Linear model, the LSE, and UMVUE

Lecture 20: Linear model, the LSE, and UMVUE Lecture 20: Linear model, the LSE, and UMVUE Linear Models One of the most useful statistical models is X i = β τ Z i + ε i, i = 1,...,n, where X i is the ith observation and is often called the ith response;

More information

Section 9.2: Matrices. Definition: A matrix A consists of a rectangular array of numbers, or elements, arranged in m rows and n columns.

Section 9.2: Matrices. Definition: A matrix A consists of a rectangular array of numbers, or elements, arranged in m rows and n columns. Section 9.2: Matrices Definition: A matrix A consists of a rectangular array of numbers, or elements, arranged in m rows and n columns. That is, a 11 a 12 a 1n a 21 a 22 a 2n A =...... a m1 a m2 a mn A

More information

Eigenvectors and Reconstruction

Eigenvectors and Reconstruction Eigenvectors and Reconstruction Hongyu He Department of Mathematics Louisiana State University, Baton Rouge, USA hongyu@mathlsuedu Submitted: Jul 6, 2006; Accepted: Jun 14, 2007; Published: Jul 5, 2007

More information

Bootstrapping factor models with cross sectional dependence

Bootstrapping factor models with cross sectional dependence Bootstrapping factor models with cross sectional dependence Sílvia Gonçalves and Benoit Perron University of Western Ontario and Université de Montréal, CIREQ, CIRAO ovember, 06 Abstract We consider bootstrap

More information

Density estimators for the convolution of discrete and continuous random variables

Density estimators for the convolution of discrete and continuous random variables Density estimators for the convolution of discrete and continuous random variables Ursula U Müller Texas A&M University Anton Schick Binghamton University Wolfgang Wefelmeyer Universität zu Köln Abstract

More information

SUPPLEMENTARY MATERIAL FOR FAST COMMUNITY DETECTION BY SCORE. By Jiashun Jin Carnegie Mellon University

SUPPLEMENTARY MATERIAL FOR FAST COMMUNITY DETECTION BY SCORE. By Jiashun Jin Carnegie Mellon University SUPPLEMENTARY MATERIAL FOR FAST COMMUNITY DETECTION BY SCORE By Jiashun Jin Carnegie Mellon University In this supplement we propose some variants of SCORE and present the technical proofs for the main

More information

Matrix Operations: Determinant

Matrix Operations: Determinant Matrix Operations: Determinant Determinants Determinants are only applicable for square matrices. Determinant of the square matrix A is denoted as: det(a) or A Recall that the absolute value of the determinant

More information

ON VARIANCE COVARIANCE COMPONENTS ESTIMATION IN LINEAR MODELS WITH AR(1) DISTURBANCES. 1. Introduction

ON VARIANCE COVARIANCE COMPONENTS ESTIMATION IN LINEAR MODELS WITH AR(1) DISTURBANCES. 1. Introduction Acta Math. Univ. Comenianae Vol. LXV, 1(1996), pp. 129 139 129 ON VARIANCE COVARIANCE COMPONENTS ESTIMATION IN LINEAR MODELS WITH AR(1) DISTURBANCES V. WITKOVSKÝ Abstract. Estimation of the autoregressive

More information

Stochastic Design Criteria in Linear Models

Stochastic Design Criteria in Linear Models AUSTRIAN JOURNAL OF STATISTICS Volume 34 (2005), Number 2, 211 223 Stochastic Design Criteria in Linear Models Alexander Zaigraev N. Copernicus University, Toruń, Poland Abstract: Within the framework

More information

Solution via Laplace transform and matrix exponential

Solution via Laplace transform and matrix exponential EE263 Autumn 2015 S. Boyd and S. Lall Solution via Laplace transform and matrix exponential Laplace transform solving ẋ = Ax via Laplace transform state transition matrix matrix exponential qualitative

More information