Index Models for Sparsely Sampled Functional Data. Supplementary Material.

Peter Radchenko, Xinghao Qiao, and Gareth M. James

May 21, 2014

1 Proof of Theorem 2

We will follow the general argument in the proof of Theorem 3 in Li et al. (2010). Write $d = (d_1, \dots, d_p)$ and $d_c = (d_{c1}, \dots, d_{cp})$, and let $|d|$ and $|d_c|$ denote the corresponding component sums. Let $G_n(d_c)$ denote the difference between the criterion values for the candidate and the true dimension vectors:
$$G_n(d_c) = \log L(d_c) - \log L(d) + |d_c| \log n / [n h_n^{|d_c|}(d)].$$
Note that the set of candidate dimension vectors is finite. Thus, it is sufficient to show that for each vector $d_c$ not equal to $d$ the function $G_n(d_c)$ is positive with probability tending to one. Note that
$$|d_c| \log n / [n h_n^{|d_c|}(d)] = O\big(\log n \, [n^{-4/(|d_c|+4)} + n^{-4/(|d|+4)}]\big) = o(1).$$
If $d_{cj} < d_j$ for at least one $j$, we can choose a positive constant $c$ such that $\log L(d_c) - \log L(d) > c$ with probability tending to one, due to the lack of fit. It follows that $G_n(d_c) > 0$ with probability tending to one.

Now consider the remaining case, in which $d_{cj} \ge d_j$ for all $j$ and $|d_c| > |d|$. In this case $|d_c| \log n / [n h_n^{|d_c|}(d)] > \log n / [n h_n^{|d_c|}(d_c)]$ for all sufficiently large $n$. On the other hand,
$$\log L(d_c) - \log L(d) = \log\Big(1 + \frac{L(d_c) - L(d)}{L(d)}\Big) = \log\big(1 + O_p(1/[n h_n^{|d_c|}(d_c)])\big) = O_p\big(1/[n h_n^{|d_c|}(d_c)]\big),$$
by the classical results on local linear smoothing. Consequently, with probability tending to one, $G_n(d_c) > (1/2) \log n / [n h_n^{|d_c|}(d_c)] > 0$.
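As a quick check on the orders claimed above, the following sketch works the penalty arithmetic under the assumption, consistent with the rates displayed in the proof but not stated explicitly in this extract, that the bandwidth satisfies $h_n(d) \asymp n^{-1/(|d|+4)}$:
$$\frac{|d_c| \log n}{n h_n^{|d_c|}(d_c)} \asymp |d_c| \log n \cdot n^{-1 + |d_c|/(|d_c|+4)} = |d_c| \log n \cdot n^{-4/(|d_c|+4)} \to 0.$$
Thus the penalty term vanishes, yet it still dominates the $O_p(1/[n h_n^{|d_c|}(d_c)])$ likelihood gain available from overfitting, because the latter lacks the $\log n$ factor.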

2 Proof of Theorems 3 and 4

Theorem 3. For simplicity of exposition we will focus on the case of a single predictor. Due to the additive structure of the SIMFE estimation procedure with respect to the predictors, the extension to the general case presents only notational challenges, while the argument itself remains essentially intact. We will also simplify the notation by omitting the superscript containing the predictor index. For example, we will write $\mu_i$ and $\hat P_i$ instead of $\mu_{ij}$ and $\hat P_{ij}$.

Define $\delta_{kh} = [\log n / (n h^k)]^{1/2}$. Observe the relationship $\|A \hat\beta(t) - \beta(t)\| = \|A[\hat\eta] - [\eta]\|_F$, where $[\hat\eta]$ and $[\eta]$ denote the matrices $[\hat\eta_1 \dots \hat\eta_d]^T$ and $[\eta_1 \dots \eta_d]^T$, respectively, and $\|\cdot\|_F$ stands for the Frobenius matrix norm. Hence, we need to show that there exists an invertible matrix $A$ for which
$$\|A[\hat\eta] - [\eta]\|_F = O_p\big(h_{\mathrm{opt}}^4 + \delta_{d h_{\mathrm{opt}}}^2 + n^{-1/2}\big).$$
In the new notation, $\hat\mu_i = (\hat\mu_{i1}, \dots, \hat\mu_{iq})^T$ and $\hat\eta \circ \hat\mu_i = (\hat\eta_1^T \hat\mu_i, \dots, \hat\eta_d^T \hat\mu_i)^T = \hat P_i$. To be able to conveniently apply existing results, we will slightly modify the trimmed objective function, replacing $\hat P_l$ with $\hat P_l - \hat P_i$ and writing this expression in terms of $\hat\eta$ and $\hat\mu$. We will also use $\Delta\hat\mu$ to denote $\hat\mu_l - \hat\mu_i$. The modified objective function,
$$\frac{1}{n} \sum_{i=1}^n \sum_{l=1}^n I_{n_i} \rho_{\hat\eta i} \big(Y_l - a_i - c_i^T (\hat\eta \circ \Delta\hat\mu)\big)^2 K_h(\hat\eta \circ \Delta\hat\mu),$$
however, corresponds to exactly the same SIMFE estimator as the original trimmed function. Define $\check K_t = I_{n_i} \rho_{\hat\eta i} K_{h_t}(\hat\eta \circ \Delta\hat\mu)$, $\hat\Delta = c_i \otimes \Delta\hat\mu$ and $\Delta = c_i \otimes \Delta\mu$. We will use the superscript $(t, \tau)$ to indicate that the estimator corresponds to the $\tau$-th iteration of the algorithm run with the bandwidth $h_t$. A simple manipulation of formula (23), taking into account the modifications to the objective function, yields
$$\hat\eta^{(t,\tau+1)} = \eta_{r1} + \Big\{\sum_{i,l} \check K_t \hat\Delta^{(t,\tau)} \big(\hat\Delta^{(t,\tau)}\big)^T\Big\}^{-1} \sum_{i,l} \check K_t \hat\Delta^{(t,\tau)} \big\{Y_l - a_i^{(t,\tau)} - \eta_{r1}^T \hat\Delta^{(t,\tau)}\big\}. \qquad (1)$$
Here $\eta_{r1}$ corresponds to an arbitrary rotation of $[\eta]$, and the above formula holds for each such rotation.
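Display (1) is the closed form of a weighted least squares step. As a minimal sketch of this manipulation, with $b$ denoting a generic increment (not notation from the paper): holding the weights $\check K_t$ and the local intercepts $a_i^{(t,\tau)}$ fixed, the update solves
$$\hat\eta^{(t,\tau+1)} - \eta_{r1} = \arg\min_b \sum_{i,l} \check K_t \big(Y_l - a_i^{(t,\tau)} - \eta_{r1}^T \hat\Delta^{(t,\tau)} - b^T \hat\Delta^{(t,\tau)}\big)^2,$$
and setting the gradient with respect to $b$ to zero gives the normal equations $\{\sum_{i,l} \check K_t \hat\Delta^{(t,\tau)} (\hat\Delta^{(t,\tau)})^T\} b = \sum_{i,l} \check K_t \hat\Delta^{(t,\tau)} \{Y_l - a_i^{(t,\tau)} - \eta_{r1}^T \hat\Delta^{(t,\tau)}\}$, whose solution is exactly the increment in display (1).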

Let $M$ denote the number of time point configurations at which the predictor functions are observed. We will only consider configurations that are generated with positive probabilities. Denote by $A_k$, $k = 1, \dots, M$, the index set of the observations corresponding to the $k$-th time point configuration. Note that, using the basis representation for the predictors and the projection functions, equation (10) simplifies to
$$Y_i = m_k(\eta_1^T \mu_i, \dots, \eta_d^T \mu_i) + \varepsilon_i, \qquad i \in A_k, \qquad (2)$$
where $E(\varepsilon_i \mid W_i) = 0$. The last equality also implies $E(\varepsilon_i \mid \mu_i) = 0$. This formulation of the SIMFE model allows us to apply some of the theory developed for the MAVE approach.

We will first focus on the right-hand side of display (1), for sufficiently large $n$, with $\hat\Delta$ replaced by $\Delta$, and rewrite it as $\eta_{r1} + \{\sum_{k=1}^M \Sigma_k\}^{-1} \{\sum_{k=1}^M \tilde\Sigma_k\}$, where
$$\Sigma_k = \sum_{i,l \in A_k} \check K_t \Delta^{(t,\tau)} \big(\Delta^{(t,\tau)}\big)^T \quad\text{and}\quad \tilde\Sigma_k = \sum_{i,l \in A_k} \check K_t \Delta^{(t,\tau)} \big(Y_l - a_i^{(t,\tau)} - \eta_{r1}^T \Delta^{(t,\tau)}\big).$$
We will apply Lemma A.5 in the supplemental material of Xia (2008) to each $\tilde\Sigma_k$, and use a natural generalization of Lemma A.4 to handle $\{\sum_{k=1}^M \Sigma_k\}^{-1}$. For each $k \le M$, write $\pi_k$ for the probability of the $k$-th time point configuration, and let $\mu^{(k)}$ denote $\mu_i$ for some $i$ in $A_k$. Define
$$\Phi_n = n^{-1} \sum_{k=1}^M \frac{|A_k|}{n} \sum_{i \in A_k} \rho\big(f(\eta \circ \mu_i)\big) \big(\nabla m_k(\eta \circ \mu_i) \otimes \nu(\mu_i)\big) \varepsilon_i,$$
$$D_1 = \sum_{k=1}^M \pi_k E\Big[\rho^2\big(f(\eta \circ \mu^{(k)})\big) \big(\nabla m_k(\eta \circ \mu^{(k)}) \otimes \nu(\mu^{(k)})\big) \big(\nabla m_k(\eta \circ \mu^{(k)}) \otimes \nu(\mu^{(k)})\big)^T\Big], \qquad (3)$$
and
$$D_2 = 2 \sum_{k=1}^M \pi_k E\Big[\rho^2\big(f(\eta \circ \mu^{(k)})\big) \nabla m_k(\eta \circ \mu^{(k)})^T \nabla m_k(\eta \circ \mu^{(k)}) \, \omega(\mu^{(k)})\Big]^T, \qquad (4)$$
where $\nu(\mu) = \mu - E(\mu \mid \eta \circ \mu)$ and $\omega(\mu) = E(\mu \mu^T \mid \eta \circ \mu) - E(\mu \mid \eta \circ \mu) E(\mu \mid \eta \circ \mu)^T$. We will use the superscript $+$ to denote the Moore-Penrose inverse. By Lemmas A.4 and A.5 in the supplemental material of Xia (2008), there exists a rotation of $[\eta]$, call it $[\eta_{r2}]$, such that
$$\eta_{r1} + \Big\{\sum_{i,l} \check K_t \Delta^{(t,\tau)} \big(\Delta^{(t,\tau)}\big)^T\Big\}^{-1} \sum_{i,l} \check K_t \Delta^{(t,\tau)} \big\{Y_l - a_i^{(t,\tau)} - \eta_{r1}^T \Delta^{(t,\tau)}\big\} = \eta_{r2} + (I - D_2^+ D_1)^{-1} D_2^+ \Phi_n + O_p\big(h_t^4 + \delta_{d h_t}^2\big), \qquad (5)$$
provided $\hat\eta^{(t,\tau)} - \eta_{r1} = o_p(h_t)$. Note that $\Phi_n = O_p(n^{-1/2})$. Due to assumption A3, the unknown parameters, $\mu_\delta$ and $\sigma^2$, are estimated at the usual parametric rate, $n^{-1/2}$, and hence $\hat\mu = \mu(1 + O_p(n^{-1/2}))$. Consequently, equations (1) and (5) imply
$$\hat\eta^{(t,\tau+1)} - \eta_{r2} = O_p\big(h_t^4 + \delta_{d h_t}^2 + n^{-1/2}\big), \qquad (6)$$
as long as $\hat\eta^{(t,\tau)} - \eta_{r1} = o_p(h_t)$. Note that $\delta_{d h_t}^2 = o(h_t)$, and hence the stochastic bound in display (6) can be written as $o_p(h_t)$. It follows that the final estimator for the bandwidth $h_t$, which is also the initial estimator for the bandwidth $h_{t+1}$, satisfies $\hat\eta^{(t+1,0)} - \eta_r = o_p(h_{t+1})$, for some rotation $\eta_r$ of $[\eta]$. Hence, as long as $\hat\eta^{(0,0)} - \eta = o_p(h_0)$ holds for the initialization estimator, we can establish, by induction, that the final SIMFE estimator satisfies
$$\|[\hat\eta] - [\eta_r]\|_F = O_p\big(h_{\mathrm{opt}}^4 + \delta_{d h_{\mathrm{opt}}}^2 + n^{-1/2}\big), \qquad (7)$$
where $[\eta_r]$ is an appropriate rotation of $[\eta]$.
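For completeness, here is the small calculation behind the claim that $\delta_{d h_t}^2 = o(h_t)$. Assume, consistently with the rates used above though not restated here, that the bandwidths stay above the optimal rate, $h_t \gtrsim n^{-1/(d+4)}$; since $\delta_{dh}^2 / h$ is decreasing in $h$, it suffices to check the boundary case $h_t \asymp n^{-1/(d+4)}$:
$$\frac{\delta_{d h_t}^2}{h_t} = \frac{\log n}{n h_t^{d+1}} \asymp \log n \cdot n^{-1 + (d+1)/(d+4)} = \log n \cdot n^{-3/(d+4)} \to 0.$$
The same arithmetic with $h_{\mathrm{opt}}$ shows that $h_{\mathrm{opt}}^4 \asymp n^{-4/(d+4)}$ and $\delta_{d h_{\mathrm{opt}}}^2 \asymp n^{-4/(d+4)} \log n$, which is how the bound in display (7) produces the rate quoted in the proof of Theorem 4 below.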

The required bound for the initialization estimator follows from Theorem 4 in Li et al. (2010), which establishes that the gOPG estimator converges at the rate $O_p(h_0^2 + \delta_{p h_0}^2 h_0^{-1})$. The last stochastic bound is $o_p(h_0)$ by the definitions of $h_0$ and $\delta_{p h_0}$.

Theorem 4. In the case where $n_i = 1$ for all $i$, we can repeat the proof of Theorem 3, treating $S$ as an additional univariate predictor. The reduced dimension increases from $d$ to $d + 1$. As a result, the convergence rate changes from $O_p(n^{-4/(d+4)} \log n + n^{-1/2})$ to $O_p(n^{-4/(d+5)} \log n + n^{-1/2})$. In the general case, some of the expressions in the proof of Theorem 3 need to be modified. However, because the sequence $\{n_i\}$ is bounded, the stochastic bounds (6) and (7) remain the same as in the case $n_i = 1$.

3 Proof of Theorem 5

Partition the rows of the matrix $(I - \Omega_i S_i)(I - \Omega_i S_i)^T + \sigma^2 \Omega_i \Omega_i^T$ into $p$ groups of adjacent rows, so that the size of the $j$-th group is $q_j$. Partition the columns of the same matrix analogously. This corresponds to a partition of the matrix into $p^2$ blocks, $V_{ijk}$, where $j$ is the group index in the row partition and $k$ is the group index in the column partition. Write $v_{ijk}$ for the vectorized form of the block $V_{ijk}$. Recall the definitions of the matrices $\Sigma_i$ and the vectors $\xi_i$ in Sections 2.4 and 3. Each element of $\xi_i$ has the form $\mathrm{cov}(U_{ijl_1}, U_{ikl_2})$ for some indexes $j, k, l_1, l_2$ that satisfy $p \ge k \ge j \ge 1$, $d_j \ge l_1 \ge 1$, $d_k \ge l_2 \ge 1$, and $l_2 \ge l_1$ whenever $k = j$. Consequently, each element of $\xi_i$ can be written as $\eta_{jl_1}^T V_{ijk} \eta_{kl_2}$. Thus, there exists a collection of vectors $\gamma_{jl_1kl_2}$, with the indexes satisfying the inequalities given above, such that for each $i$ the elements of $\xi_i$ have the form $\gamma_{jl_1kl_2}^T v_{ijk}$. We will denote this representation of the vector $\xi_i$ as $\gamma \circ v_i$, where $\gamma$ is the vector constructed by stacking the vectors $\gamma_{jl_1kl_2}$, and $v_i$ is constructed similarly from the $v_{ijk}$.

Using the derivations in Section 2.4 and the notation in Appendix D, we see that $m_{t_i}(P_i)$ can be written as $\check m(\eta \circ \mu_i, \gamma \circ v_i)$, for some function $\check m$ which does not depend on $i$. We can now follow the argument in the proof of Theorem 3, with some small modifications. Our initialization estimator is still gOPG; however, we use $(\hat\mu_i, \hat v_i)$, instead of $\hat\mu_i$, as the predictor vectors. Here the $\hat v_i$ are constructed analogously to the $v_i$, but with the unknown parameters and $\sigma^2$ replaced by their estimates. We initialize the bandwidth as $n^{-1/(\check p + 4)}$, where $\check p$ is the dimension of $(\hat\mu_i, \hat v_i)$. As in the proof of Theorem 3, the parameters and $\sigma^2$ are estimated at the parametric rate, $n^{-1/2}$, and the aforementioned gOPG estimator of $(\eta, \gamma)$ is consistent.
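To make the dimension bookkeeping below concrete, here is the worked count in the single-predictor case ($p = 1$, $d_1 = d$), using only the index constraints stated above. The admissible pairs are $1 \le l_1 \le l_2 \le d$, so $\xi_i$ has
$$\#\{(l_1, l_2) : 1 \le l_1 \le l_2 \le d\} = d + \binom{d}{2} = \frac{d(d+1)}{2}$$
elements, which is exactly the number of additional indices contributed by $\gamma \circ v_i$ in the dimension count $\check d = d + d(d+1)/2$ used in the next step of the proof.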

We no longer partition the observations by time point configuration, but instead treat $v_i$ as an additional predictor. Equation (2) is replaced with $Y_i = \check m(\eta \circ \mu_i, \gamma \circ v_i) + \varepsilon_i$, and the dimensionality of the corresponding model increases from $d$ to $\check d = d + d(d+1)/2$. The rest of the proof is the same as that of Theorem 3, with the appropriate modifications, such as replacing $\hat\eta$ with $(\hat\eta, \hat\gamma)$ and $\hat\mu_i$ with $(\hat\mu_i, \hat v_i)$. Taking into account the increased dimensionality of the problem, the stochastic bound in display (7) changes to $O_p(h_{\mathrm{opt}}^4 + \delta_{\check d h_{\mathrm{opt}}}^2 + n^{-1/2})$, which simplifies to $O_p(n^{-4/(\check d + 4)} \log n + n^{-1/2})$.
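The final simplification is the same bandwidth arithmetic as before. As a sketch, assuming $h_{\mathrm{opt}} \asymp n^{-1/(\check d + 4)}$ (consistent with the initialization bandwidth convention $n^{-1/(\check p + 4)}$ used above),
$$h_{\mathrm{opt}}^4 \asymp n^{-4/(\check d + 4)} \qquad\text{and}\qquad \delta_{\check d h_{\mathrm{opt}}}^2 = \frac{\log n}{n h_{\mathrm{opt}}^{\check d}} \asymp \log n \cdot n^{-4/(\check d + 4)},$$
so both terms are absorbed into $O_p(n^{-4/(\check d + 4)} \log n)$, and the $n^{-1/2}$ term carries over unchanged.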

References

Li, L., Li, B., and Zhu, L.-X. (2010). Groupwise dimension reduction. Journal of the American Statistical Association 105.

Xia, Y. (2008). A multiple-index model and dimension reduction. Journal of the American Statistical Association 103.
