Nonlinear function on function additive model with multiple predictor curves. Xin Qi and Ruiyan Luo. Georgia State University

Size: px

Start display at page:

Download "Nonlinear function on function additive model with multiple predictor curves. Xin Qi and Ruiyan Luo. Georgia State University"

Patience Clark
5 years ago
Views:

1 Statistica Sinica: Supplement Nonlinear function on function additive model with multiple predictor curves Xin Qi and Ruiyan Luo Georgia State University Supplementary Material This supplementary material is organized as follows. Section S1 includes additional figures and tables in simulations and application. In Section S, we provides a set of identifibility conditions for the nonlinear function F (x, s, t) in the model (3.) in the main manuscript. Section S3 provides additional computational details and the choice of tuning parameters. Section S4 provides details of calculating the estimation error for F (x, s, t) in Simulation 1. The proofs of theorems and technical lemmas are provided in Sections S5 and S6, respectively. In this supplementary material, all the labels of equations, figures, tables, sections, and so on, are prefixed with S., such as equations: (S.1), (S.),, sections: S.1.1, and so on. The equation/section/ numbers without prefix S. are for those in the main manuscript.

2 Xin Qi and Ruiyan Luo S1 Additional figures and tables S.1.1 Figures 3 sample curves of X(s) s Figure S.1: Thirty sample curves of X(s) in Simulation 1.

3 S1. ADDITIONAL FIGURES AND TABLES 3 samples curves from Y(t) for F function 1 3 samples curves from Y(t) for F function t t 3 samples curves from Y(t) for F function 3 3 samples curves from Y(t) for F function t t Figure S.: Thirty sample curves of Y (t) for each of the four F (x, s, t) with ρ =.7 and σ = 1 in Simulation 1.

4 Xin Qi and Ruiyan Luo Frequencies of selected λ Frequencies of selected κ e 1 1e The setting : F function 1, σ =.1, ρ = Frequencies of selected λ e 1 1e The setting : F function 1, σ =.1, ρ = Frequencies of selected κ e 1 1e The setting : F function 1, σ =.1, ρ =.7 Frequencies of selected λ e 1 1e The setting : F function 1, σ =.1, ρ =.7 Frequencies of selected κ e 1 1e The setting : F function 1, σ = 1, ρ = Frequencies of selected λ e 1 1e The setting : F function 1, σ = 1, ρ = Frequencies of selected κ e 1 1e The setting : F function 1, σ = 1, ρ = e 1 1e The setting : F function 1, σ = 1, ρ =.7 Figure S.3: The frequencies of the selected tuning parameters for the first F (x, s, t) in Simulation 1.

5 S1. ADDITIONAL FIGURES AND TABLES Frequencies of selected λ Frequencies of selected κ 4 6 1e 1 1e The setting : F function, σ =.1, ρ = Frequencies of selected λ e 1 1e The setting : F function, σ =.1, ρ = Frequencies of selected κ e 1 1e The setting : F function, σ =.1, ρ =.7 Frequencies of selected λ e 1 1e The setting : F function, σ =.1, ρ =.7 Frequencies of selected κ e 1 1e The setting : F function, σ = 1, ρ = Frequencies of selected λ e 1 1e The setting : F function, σ = 1, ρ = Frequencies of selected κ e 1 1e The setting : F function, σ = 1, ρ = e 1 1e The setting : F function, σ = 1, ρ =.7 Figure S.4: The frequencies of the selected tuning parameters for the second F (x, s, t) in Simulation 1.

6 Xin Qi and Ruiyan Luo Frequencies of selected λ Frequencies of selected κ e 1 1e The setting : F function 3, σ =.1, ρ = Frequencies of selected λ e 1 1e The setting : F function 3, σ =.1, ρ = Frequencies of selected κ e 1 1e The setting : F function 3, σ =.1, ρ =.7 Frequencies of selected λ e 1 1e The setting : F function 3, σ =.1, ρ =.7 Frequencies of selected κ e 1 1e The setting : F function 3, σ = 1, ρ = Frequencies of selected λ e 1 1e The setting : F function 3, σ = 1, ρ = Frequencies of selected κ e 1 1e The setting : F function 3, σ = 1, ρ = e 1 1e The setting : F function 3, σ = 1, ρ =.7 Figure S.5: The frequencies of the selected tuning parameters for the third F (x, s, t) in Simulation 1.

7 S1. ADDITIONAL FIGURES AND TABLES Frequencies of selected λ Frequencies of selected κ e 1 1e The setting : F function 4, σ =.1, ρ = Frequencies of selected λ e 1 1e The setting : F function 4, σ =.1, ρ = Frequencies of selected κ e 1 1e The setting : F function 4, σ =.1, ρ =.7 Frequencies of selected λ e 1 1e The setting : F function 4, σ =.1, ρ =.7 Frequencies of selected κ e 1 1e The setting : F function 4, σ = 1, ρ = Frequencies of selected λ 1e 1 1e The setting : F function 4, σ = 1, ρ = e 1 1e The setting : F function 4, σ = 1, ρ = Frequencies of selected κ 1e 1 1e The setting : F function 4, σ = 1, ρ =.7 Figure S.6: The frequencies of the selected tuning parameters for the fourth F (x, s, t) in Simulation 1.

8 Xin Qi and Ruiyan Luo NO CO NMHC NOx t t t t C6H6 Temperature Relative Humidity t t t Figure S.7: The 355 sample curves for each of the seven variables in the air quality data.

9 The estimated intercept functions : µ^(t) S1. ADDITIONAL FIGURES AND TABLES t Figure S.8: Estimated intercept function µ(t) for the daily air quality data. The function F^1 (DISC) (x,s,1) The function F^ (DISC) (x,s,1) The function F^3 (DISC) (x,s,1) x s x s x s The function F^4 (DISC) (x,s,1) The function F^5 (DISC) (x,s,1) The function F^6 (DISC) (x,s,1) x s x s x s Figure S.9: Estimated functions (DISC) F j (x, s, t) given t = 1, 1 j 6, for the daily air quality data.

10 Xin Qi and Ruiyan Luo (DISC) F^3 (x,s,6) (DISC) F^3 (x,s,1) (DISC) F^3 (x,s,18) s= s=6 s=1 s=18 s= s= s=6 s=1 s=18 s= s= s=6 s=1 s=18 s= NOx NOx NOx Figure S.1: The curves daily air quality data. F (DISC) 3 (x, s, t) versus x (predictor NO x ) for different values of s and t in the

11 S1. ADDITIONAL FIGURES AND TABLES S.1. Tables Table S.1: The averages (and standard deviations) of the number K of the selected components and running time (in seconds) of 1 replicates for Simulation 1. The running time was obtained on a computer cluster with Linux system and Intel(R) Xeon(R) CPU GHz. Model σ ρ K time (.) (.986).7.15(.36) (.88).3(.17) 3.98(6.49).7.5(.) 3.48(6.366).(.) 1.13(.187).7 1.9(.7) 1.668(.13) 1.8(1.4) (6.895).7.7(1.18) (7.3).54(.5) (.671).7.75(.44) 1.98(.735).43(.6) 8.387(5.461) (1.1) 3.9(6.96).18(.39) 1.916(.537).7.89(.31) 1.981(.688) 4.7(.13) (6.64).7 3.3(1.39) 36.1(7.136)

12 Xin Qi and Ruiyan Luo Table S.: The averages (and standard deviations) of the number K of the selected components and running time (in seconds) of 1 replicates for Simulations and 3. Simulation Simulation 3 γ ρ curve σ ρ K time K time (.) 39.75(1.48) 3.(.) (13.785).7.(.) 4.173(1.4) 3.(.) 68.14(13.88).14(.67) (8.91).48(1.185) 5.554(46.39).7 4.9(1.31) (3.) 5.9(.357) 55.84(45.438).(.) 4.451(1.57) 3.(.) 64.99(1.47).7.1(.1) 41.71(1.851).74(.44) (1.848).(.) (8.976).39(1.54) (4.778).7 4.3(1.66) (9.49) 3.865(1.8) (39.48) (.) 39.8(1.67) 3.(.) 67.99(13.67).7.(.) 4.654(1.468) 3.(.) 69.3(14.186).3(.3) (9.441).4(1.39) 5.579(45.187) (1.31) (3.65) 5.394(.113) 53.69(45.759).(.) 4.176(1.564) 3.(.) 65.98(1.864).7.(.) (1.596).639(.483) 69.13(1.45).1(.1) 11.36(8.75).6(.676) 4.684(4.64).7 4.3(1.7) 116.4(9.6) 4.43(1.98) 4.397(35.73)

13 S. CONDITIONS FOR THE IDENTIFIABILITY OF THE FUNCTION F (X, S, T ) S Conditions for the identifiability of the function F (x, s, t) To ensure the identifiability of F (x, s, t) in model (3.), we impose a set of conditions on F (x, s, t) and the distribution of X(s) in the following Proposition. Proposition S.1. Suppose that the following conditions are satisfied, (1). X(s) is a Gaussian process. The covariance function Σ X (s, s ) of X(s) is continuous and all the eigenvalues of Σ X (s, s ) are positive. (). The true function F (x, s, t) satisfies the condition (3.3) in the manuscript, and the partial derivative x F (x, s, t) is a continuous function on (x, s, t) (, ) [, 1] [a, b]. If there is another function F (x, s, t) satisfying the conditions in (), and F (X(s), s, t)ds = F (X(s), s, t)ds for all t 1, then we have F (x, s, t) = F (x, s, t). Remark 1. Different conditions exist for the identifiability of F (x, s, t). But we note that any set of identifiability conditions cannot be verified in practice. Indeed, a necessary condition for identifiability is that all the eigenvalues of the covariance function of X(s) are positive. Indeed, if this condition is not satisfied, let β(s) be a nonzero eigenfunction corresponding to the zero eigenvalue. Then we have X(s)β(s)ds = and hence the function F (x, s, t) = F (x, s, t)+{x E[X(s)]}β(s) leads to the same model as the true model. But F (x, s, t) satisfies the condition (3.3) and is different from F (x, s, t). Therefore, F (x, s, t) is not identifiable. In practice, we only have a finite number of sample curves and there are

14 Xin Qi and Ruiyan Luo infinitely many eigenvalues for the covariance function of X(s), we cannot determine if all the eigenvalues are positive. Therefore, any identifiability conditions cannot be verified in practice. Proof of Proposition S.1. We prove this proposition by contradiction. Let (Ω, P ) denote the probability space. Suppose that there exists another function F (x, s, t) which is different from F (x, s, t), and satisfies the conditions in () in this proposition and F (X(ω, s), s, t)ds = F (X(ω, s), s, t)ds for all ω Ω and t 1. Then there exists t such that F (x, s, t ) F (x, s, t ). Define G(x, s) = F (x, s, t ) F (x, s, t ). Then G(x, s) and G(X(ω, s), s)ds = F (X(ω, s), s, t )ds F (X(ω, s), s, t )ds =, (S.1) for all ω. Then for any two ω and ω, we have = G(X(ω, s), s)ds G(X(ω, s), s)ds = X(ω,s) X(ω,s) x G(x, s)dxds. (S.) We show that x G(x, s) is a nonzero function. Otherwise, if x G =, then G(x, s) only depends on s, that is, we have F (x, s, t ) F (x, s, t ) = G(x, s) = h(s) for some function h(s). By the condition (3.3), E[ F (X(s), s, t )] = = E[F (X(s), s, t )] for all s 1, so we have = E[G(X(s), s)] = h(s) and hence G(x, s) = which contradicts to the fact G(x, s). Therefore, x G(x, s) is a nonzero function. Because x G(x, s) = x F (x, s, t ) x F (x, s, t ) is nonzero and continuous due to the continuity of both x F (x, s, t ) and x F (x, s, t ), we can find a rectangle region {(x, s) : x 1 x

15 S. CONDITIONS FOR THE IDENTIFIABILITY OF THE FUNCTION F (X, S, T ) x, s 1 s s }, such that x G(x, s) > δ for some positive constant δ. Without loss of generality, we assume that in this rectangle region, x G(x, s) > δ, (if x G(x, s) is negative, the proof is similar). We define two step functions: x 1 +x if < s < s 1 or s < s < 1 f 1 (s) =, x 1 if s 1 s s x 1 +x if < s < s 1 or s < s < 1 f (s) =, (S.3) x if s 1 s s and plot them in the following Figure S (s 1, x ) (s 1, x 1 ) f (s) f 1 (s) (s, x ) (s, x 1 ) s Figure S.11: The plots of f 1 (s) and f (s) defined in (S.3).

16 Xin Qi and Ruiyan Luo Because x G(x, s) > δ in the region {(x, s) : x 1 x x, s 1 s s }, we have = G(f (s), s)ds {x 1 x x,s 1 s s } G(f 1 (s), s)ds = f (s) x G(x, s)dxds f 1 (s) x G(x, s)(x, s)dxds {x 1 x x,s 1 s s } δdxds > η >, (S.4) where η is any positive constant satisfying η < δ(s s 1 )(x x 1 ) and we will choose a specific η in the following. Consider the Karhunen-Loève expansion: X(s) = ξ k φ k (s), (S.5) where {φ k (s), k 1} is the collection of all the eigenfunctions of Σ X (s, s ) and forms an orthonormal basis of the L [, 1] space. ξ k = X(s)φ k(s)ds, k 1, are uncorrelated random variables and V ar(ξ k ) = ν k, where ν 1 ν > are eigenvalues of Σ X (s, s ) and they are all positive by the condition (1) in this proposition. Because we have assumed that X(s) is a Gaussian process, {ξ k, k 1} are independent normal variables. Now let f 1 (s) = a (1) k φ k(s), f (s) = a () k φ k(s), be the expansions of f 1 (s) and f (s) using the orthonormal basis {φ k (s), k 1}. Define an event { A = ω : } { [X(s, ω) f 1 (s)] ds 1 ω : } [X(s, ω) f (s)] ds 1. (S.6) Let N 1 = sup{ X(s, ω) : s 1, ω A } and N = sup s 1 max{f 1 (s), f (s)} N = max{n 1, N }, C 1 = sup x G(x, s). (S.7) x N, s 1

17 S. CONDITIONS FOR THE IDENTIFIABILITY OF THE FUNCTION F (X, S, T ) We choose an integer M large enough such that k=m+1 E[ξ k] = k=m+1 ν k η 3C 1, k=m+1 {a (1) k } < η, 16C 1 k=m+1 {a () k } < η. 16C 1 (S.8) We define two events A 1 = {ω : [ξ k (ω) a (1) k ] < η/(8c 1 M), 1 k M, A = {ω : [ξ k (ω) a () k ] < η/(8c 1 M), 1 k M, k=m+1 k=m+1 ξ k (ω) η/(16c 1 )}, ξ k (ω) η/(16c 1 )}. (S.9) Because {ξ k, k 1} are independent normal variables, P (A 1 ) = where each P P M P ( ( ) [ξ k (ω) a (1) k ] < η/(8c 1 M) P k=m+1 ξ k (ω) η/(16c 1 ) ( ) [ξ k (ω) a (1) k ] < η/(8c 1 M) is positive, 1 k M, and ( k=m+1 ξ k (ω) η/(16c 1 ) 1 E ( ) k=m+1 ξ k η/(16c 1 ) 1 η/(3c 1) η/(16c 1 ) = 1, = 1 ) = 1 P ( k=m+1 E[ξ k ] η/(16c 1 ) k=m+1 ξ k > η/(16c 1 ) = 1 k=m+1 ν k η/(16c 1 ) ) ), where the first inequality in the second line follows from the Markov s inequality, and the first inequality in the last line follows from the first inequality in (S.8). Therefore, P (A 1 ) is positive and similarly, P (A ) is also positive. Pick up an ω A 1 and an ω A. By

18 Xin Qi and Ruiyan Luo (S.9) and (S.8), and noting that {φ k (s), k 1} is an orthonormal basis, we have [X(s, ω) f 1 (s)] ds = M [ξ k (ω) a (1) k ] + M [ξ k (ω) a (1) k ] + M η/(8c 1 M) + [ ξ k (ω)φ k (s) [ k=m+1 k=m+1 [ k=m+1 ξ k (ω) + η + η + η = 3η. 8C 1 16C 1 16C 1 8C 1 Similarly, a (1) k φ k(s)] ds ξ k (ω)φ k (s) a (1) k φ k(s)] ds ξ k (ω)φ k (s)] ds + [X(s, ω) f (s)] ds 3η 8C 1. [a (1) k ] k=m+1 [ k=m+1 a (1) k φ k(s)] ds (S.1) (S.11) Now we choose η small enough such that 3η/(8C 1 ) < 1. Then by the definition of the event A in (S.6), we have ω, ω A, which, together with (S.7), implies that for any (x, s) with s 1 and x between X(s, ω) and f 1 (s), or x between X(s, ω) and f (s), we have x G(x, s) C 1. (S.1) Now by (S.), X(s, ω) = G(X(s, ω), s)ds G(X(s, ω), s)ds = x G(x, s)dxds X(s,ω) f1 (s) f (s) X(s, ω) = x G(x, s)dxds + x G(x, s)dxds + x G(x, s)dxds X(s,ω) f 1 (s) f (s) f (s) X(s,ω) X(s, ω) x G(x, s)dxds x G(x, s)dxds x G(x, s)dxds f 1 (s) f 1 (s) f (s)

19 η η X(s,ω) f 1 (s) x G(x, s)dxds X(s, ω) f 1 (s) C 1 ds X(s, ω) f (s) x G(x, s)dxds X(s, ω) f (s) C 1 ds S3. COMPUTATIONAL ISSUE (by (S.4)) (by (S.1)) η C 1 X(s, ω) f 1 (s) ds C 1 X(s, ω) f (s) ds (by Cauchy-Schwarz ) η 3C 1η 8C 1 3C 1η 8C 1 = η 4, (S.13) where the last inequality follows from (S.1) and (S.11). Now (S.13) implies that η/4, but η is a positive number. So we got a contradiction. Hence, we must have F (x, s, t) = F (x, s, t). The proposition is proved. S3 Computational issue S.3.1 Solving the optimization problem (3.18) We first calculate the first term, the integrated sum of squared residuals, in the objective function of (3.18). 1 n Y(t) v (t)1 n R k v k (t) 1 n {1 n Y(t)}v (t)dt + l=1 1 n { R l R k }v k (t)v l (t)dt. dt = 1 n Y(t) dt + 1 n { R k Y(t)}v k (t)dt + v (t) dt 1 n {1 n R k }v k (t)v (t)dt (S3.14)

20 Xin Qi and Ruiyan Luo From the definition of Σ in (3.14), if j k, 1 n R j R k = 1 n n l=1 [ ] [ ] r l (Ĝj) r(ĝj) r l (Ĝk) r(ĝk) = Σ(Ĝj, Ĝk) = λ Ĝj, Ĝk H, where the last equality follows from the constraints in the optimization problem (3.15). Similarly, if j = k, 1 n R k R k = 1 n n l=1 [ ] [ ] r l (Ĝk) r(ĝk) r l (Ĝk) r(ĝk) = Σ(Ĝk, Ĝk) = 1 λ Ĝk, Ĝk H, Because in our asymptotic theory, the tuning parameter λ goes to zero as n, the terms λ Ĝj, Ĝk H are typically small. Hence, we have the approximation 1 R R n j k, for all k j, and 1 R R n k k 1. Moreover, for any 1 k K, R k 1 n = [ ] n l=1 r l (Ĝj) r(ĝj). Then by (S3.14), we have the following approximation, 1 n + Y(t) v (t)1 n ȳ(t)v (t)dt {v (t) ȳ(t)} dt R k v k (t) dt 1 n 1 n { R k Y(t)}v k (t)dt + ȳ(t) dt + Y(t) dt + v (t) dt v k (t) dt = 1 n {v k (t) ŵ () k (t)} dt (S3.15) Y(t) dt ŵ () k (t) dt, where ŵ () k (t) = R k Y(t)/n. Thus, the estimates µ(t) and φ k (t) of µ(t), φ k (t), 1 k K, can be obtained by solving the following problems separately: [ b b min v (t) Y (t) dt + κ v (t) a a [ b v k (t) 1 n R k Y(t) dt + κ min v k (t) a v ] (t) dt b a, (S3.16) ] v k(t) dt, 1 k K. (S3.17) =

21 S3. COMPUTATIONAL ISSUE We solve (S3.16) and (S3.17) using B-spline basis expansions as in Section 3. of Luo and Qi (17). S.3. Choice of the number of components and tuning parameters We choose the number of components and the two tuning parameters λ and κ simultaneously based on the following cross-validation procedure. Both λ and κ are chosen from the set {1 1, 1 8, 1 6, 1 4, 1, 1}. For the l-th value of λ, 1 l 6, we first determine a maximum number of components, K max,l. The optimal number of components will be chosen from the integers between 1 and K max,l. In Theorem 3, we choose the first few components with relatively large σ k. As σ k can be estimated by ( σ(l) k ) which is the maximum value of the optimization problem (3.15), we set { K max,l = min k > 1 : ( σ (l) ( σ (l) k ).1 1 ) + + ( σ (l) ) k }. (S3.18) Once we have determined all the K max,l, we use the cross-validation method to determine the tuning parameters (λ, κ) and the optimal number of components, K opt, simultaneously. We summarize the details of the procedure in the following algorithm. Algorithm For the l-th value of the tuning parameter λ, 1 l 6, we determine K max,l using the whole data set and (S3.18).. We randomly split the whole data set into five subsets. For each 1 v 5, we use the v-th subset as the v-th validation set and all other observations as the v-th training set. Then for the l-th value for λ and the v-th training set,

22 Xin Qi and Ruiyan Luo (a) we estimate Ĝ(v,l) k (x, s) for all 1 k K max,l. Then for the i-th value of κ, 1 i 6, we estimate µ (v,l,i) (t) and φ (v,l,i) k (t) for all 1 k K max,l. (b) For each K = 1,, K max,l, we use µ (v,l,i) (t), Ĝ (v,l) (x, s) and k (v,l,i) φ k (t) for 1 k K, to obtain the predicted response curves {ŷ (v,l,i,k) j (t), 1 j N v } for the v-th validation set using (3.19) and then calculate the corresponding validation error e v,l,i,k = N v Ny j=1 m=1 (ŷ(v,l,i,k) j (t m ) y (v) j (t m )) δ m /N v, where {y (v) (t), 1 j N v } are the collection of all the response curves in the v-th validation set. j After we repeat (a)-(b) for all 1 v 5 and 1 l 6, we calculate the average validation error, ē l,i,k = 5 v=1 e v,l,i,k/5. 3. Let ē l,i,k = min 1 l 6,1 i 6,1 K Kmax,l ē l,i,k. Then we choose the l -th value for λ, the i -th value for κ, and the optimal number of components is K opt = K. S4 Details of calculating the estimation error for F (x, s, t) in Simulation 1 In Simulation 1, the functions F (x, s, t) do not satisfy the condition (3.3) and may be unidentifiable. So we consider the following centered function F (x, s, t) = F (x, s, t) E[F (X(s), s, t],

23 S4. DETAILS OF CALCULATING THE ESTIMATION ERROR FOR F (X, S, T ) IN SIMULATION 1 and the models in Simulation 1 can be rewritten as Y (t) = µ (t) + F (X(s), s, t)ds + ε(t), where µ (t) = µ(t)+ E[F (X(s), s, t]ds. To calculate the expectation E[F (X(s), s, t], we note that X(s) is a Gaussian process, and for any s, the marginal distribution of X(s) is the standard normal distribution. So E[F (X(s), s, t] = e x / F (x, s, t)dx/ π which is calculated using a Riemann sum in practice. Based on Proposition S.1 in Section S of this supplementary material, we can show the centered functions F (x, s, t) is identifiable. Actually, F (x, s, t) satisfies the condition (3.3), and hence satisfies the condition () in Proposition S.1. It follows from the results in Section of the book Rasmussen and Williams (5) that all the eigenvalues of the covariance function Σ X (s, s ) = e {1 s s } are positive. Hence, the condition (1) in Proposition S.1 is also satisfied. It follows from Proposition S.1 that F (x, s, t) is identifiable. So we will calculate the estimation error for F (x, s, t). For any method considered in Simulation 1, let F (x, s, t) denote the estimate of F (x, s, t). Then we use F (x, s, t) = F (x, s, t) e x / F (x, s, t)dx/ π as the estimate of F (x, s, t). In practice, we have only a finite number of sample predictor curves. The range only covers a finite region of x. So we cannot obtain any information about F (x, s, t) when x is outside of this finite region, although the function F (x, s, t) is defined in the unbounded region (, ) [, 1] [, 1]. In the following Figure S.1, we plot 1 sample curves of X(s) in the training set of one repeat in Simulation 1. The plot

24 Xin Qi and Ruiyan Luo shows that most values of these sample curves fall between -.5 and.5. 1 sample curves of X(s) in the traing set x=.5 x= s Figure S.1: 1 sample curves of X(s) in the training set of one repeat in Simulation 1. Therefore, one can anticipate that the estimate of F (x, s, t) with x outside the interval [.5,.5] may be less accurate for any method considered in Simulation 1. Therefore, we only consider the estimation error of F (x, s, t) in the finite region [.5,.5] [, 1] [, 1]. Specifically, we calculate the following relative mean squared error (RelMSE) for F (x, s, t): RelMSE = [ ] F (x, s, t) F (x, s, t) dxdsdt F (x, s, t) dxdsdt

25 S4. DETAILS OF CALCULATING THE ESTIMATION ERROR FOR F (X, S, T ) IN SIMULATION 1 where the integrals are calculated using Riemann sum over a dense grid in the region [.5,.5] [, 1] [, 1]. We report the averages and standard deviations of the RelMSEs over 1 repeats in Table S.3 for all 16 settings in Simulation 1. Since the output of pffr.pc does not provide estimate of F (x, s, t), we do not include this method in Table S.3. Table S.3: The averages (standard deviations) of RelMSEs of 1 replicates for Simulation 1. Model σ ρ SigComp.nonlin pffr.nonlin SigComp.lin pffr e-4 ( ).6 ( 1e-4 ) e-4 ( e-4 ).11 ( e-4 ).7.11 ( e-4 ).13 ( 4e-4 ) 6e-4 ( e-4 ).778 (.46 ).47 (.11 ).6 ( 9e-4 ).46 (.18 ).51 ( 9e-4 ).7.8 (.83 ).1445 (.45 ).19 (.8 ).384 ( ) 6e-4 ( e-4 ). ( 7e-4 ).8878 (.913 ) ( ).7.6 ( 9e-4 ).35 (.1 ).9418 (.7316 ) ( ).77 (.1 ).59 (.17 ).894 (.383 ) ( ).7.84 (.15 ).14 (.364 ).938 (.4537 ) ( ) 9e-4 ( 1e-4 ).3 (.119 ).367 (.59 ) ( ).7.6 (.14 ).37 (.9 ).3944 (.519 ) ( ).15 (.38 ).8 (.91 ).3865 (.1979 ) ( ) (.71 ).71 (.79 ).3736 (.54 ) 3.77 ( ).15 ( 3e-4 ).374 (.143 ) 1.44 (.5115 ) ( )).7.41 ( 8e-4 ).43 (.193 ).9444 (.37 ) ( ).177 (.5 ).383 (.14 ) (.647 ) ( ) (.13 ).1617 (.54 ).9547 (.84 ) ( )

26 Xin Qi and Ruiyan Luo S5 Proofs of theorems S.5.1 Proof of Theorem 1 For 1 k K, let H k (x, s) be any function satisfying that H k(x(s), s)ds has a finite second moment and ϕ k (t) be any function in L [a, b]. Let Y new (t) = µ(t) + Y 1 (t) = µ(t) + Y (t) = µ(t) + F (X new (s), s, t)ds + ε new G k (X new (s), s)φ k (t)ds H k (X new (s), s)ϕ k (t)ds denote the response function for X new (s), the predicted response function based on the partial sum K G k(x, s)φ k (t), and the predicted response function based on K H k(x, s)ϕ k (t), respectively. The mean squared prediction error for K G k(x, s)φ k (t) is E [ Y new Y 1 ] L (S5.19) 1 =E µ + F (X new (s), s, )ds + ε new µ G k (X new (s), s)φ k ds L 1 =E F (X new (s), s, )ds G k (X new (s), s)φ k ds + E [ ] ε new L L (because X new (s) and ε new (t) are independent) =E F (X(s), s, )ds G k (X(s), s)φ k ds L + E [ ] ε L (S5.)

27 S5. PROOFS OF THEOREMS (because X new (s) and ε new (t) have the same distributions as X(t) and ε(t)) K =E S r k φ k + E [ ε ] L (by the definition of S(t) and (3.6)). L Similarly, the mean squared prediction error for the µ(t) + K H k(x, s)ϕ k (t) is E [ Y new Y ] K L = E S q k ϕ k + E [ ] ε L where q k = H k(x(s), s)ds. A nice property of the KL expansion is that K r kφ k (t) is the best K-dimensional random approximation of S(t) and has the smallest approximation [ S ] error among all K-dimensional random approximation of S(t). So E K r kφ k L [ S ] E K q kϕ k and hence E [ Y new Y 1 ] [ L E Ynew Y ] L. Because L H k (x, s) and ϕ k (t) are arbitrary, the partial sum K G k(x, s)φ k (t) has the smallest prediction error. Moreover, because {r 1, r, } are uncorrelated random variables with mean zero and variance one and {φ 1, φ, } are orthogonal functions with φ k L = σ k, we have E [ S ] [ r k φ k L = E k=k+1 which, together with (S5.19), leads to E [ Y new Y 1 L ] = r k φ k L ] = k=k+1 L k=k+1 σ k + E[ ε L ]. E [ r k φ k L ] = k=k+1 The inequalities in (3.8) in this theorem have been proved. Under Condition 1 in Section σ k,

28 Xin Qi and Ruiyan Luo 3.4 of asymptotic theory, a straightforward calculation leads to the inequalities in (3.9). S.5. Proof of Theorem We only prove the theorem for k = 1. The proof of the theorem for k > 1 is very similar to that for k = 1 and hence is omitted. Here is the outline of the proof for k = 1. We first show that for any G(x, s) satisfying the constraint Σ(G, G) = 1, we have Λ(G, G) σ 1. Then we prove that G 1 (x, s) satisfies Σ(G 1, G 1 ) = 1 and Λ(G 1, G 1 ) = σ 1, which implies that G 1 (x, s) is the solution to (3.11) and the maximum value of (3.11) is σ 1. Before providing the details of the proof, we recall some definitions and notations. Recall that for any K 1, K r(g k)φ k (t) = K r kφ k (t) is the best K-dimensional approximation to S(t), and S(t) = r(g k )φ k (t), where (S5.1) r(g 1 ), r(g ),, are uncorrelated random variables with mean and variance 1, (S5.) and φ k (t) = σ k φk, where φ 1, φ,, are orthonormal eigenfunctions of S(t). (S5.3) Recall the definitions of Λ(G, G) and Σ(G, G): Λ(G, G) = b a E [S(t)r(G)] dt, Σ(G, G) = E [ (r(g)) ]. (S5.4)

29 S5. PROOFS OF THEOREMS Now let G(x, s) be any function satisfying the following constraint in the optimization problem (3.11): 1 = Σ(G, G) = E [ (r(g)) ]. (S5.5) We have Λ(G, G) = b {E [S(t)r(G)]} dt = b a a { E [r(g k )r(g)] φ k (t)} dt (by (S5.1)) = {E [r(g k )r(g)]} φ k L (because φ 1(t),, φ K (t) are orthogonal) = {E [r(g k )r(g)]} σk (because φ k L = σ k by (S5.3)) {E [r(g k )r(g)]} σ1 σ1e [ r(g) ] (by (S5.)) =σ 1, (by (S5.5)). Therefore, the maximum value of the optimization problem (3.11) is less than or equal to σ 1. On the other hand, by (S5.), Σ(G 1, G 1 ) = E [ (r(g 1 )) ] = 1. Therefore, G 1 (x, s) satisfies the constraint in (3.11) for k = 1. Moreover, by (S5.4) and (S5.1), b { b Λ(G 1, G 1 ) = {E [S(t)r(G 1 )]} dt = E [r(g k )r(g 1 )] φ k (t)} dt = a a {E [r(g k )r(g 1 )]} φ k L (because φ 1(t),, φ K (t), are orthogonal to each other)

30 Xin Qi and Ruiyan Luo =σ 1E [ r(g 1 ) ] (because (S5.) and φ k L = σ k ) =σ 1, (by (S5.)). Therefore, G 1 (x, s) is the solution to (3.11) and the maximum value is σ 1. S.5.3 Proof of Theorem 3 For convenience, we first recall some notations in the main manuscript. We use L and H to denote L and Sobolev space in [, 1] [, 1], respectively, and both of them are Hilbert spaces. Given a function G(x, s) defined in [, 1] [, 1], let G L = G(x, s) dxds, G H = G L + xx G L + xs G L + ss G L, be the L norm and the Sobolev norm, respectively. For two functions G(x, s) and G(x, s) defined in [, 1] [, 1], the inner products in the two Hilbert spaces are denoted by G, G L = G(x, s) G(x, s)dxds, G, G H = { G(x, s) G(x, s) + xx G(x, s) xx G(x, s) + xs G(x, s) xs G(x, s) + ss G(x, s) ss G(x, s) } dxds, respectively. Step 1: Show that Σ(, ), Λ(, ), Σ(, ), and Λ(, ) are all bounded bilinear functions in H

31 S5. PROOFS OF THEOREMS For any G(x, s) H, by the definition of r(g) in (3.5), we have r(g) = = = = X(s) X(s) X(s) x x [ G(X(s), s)ds E G(X(s), s)ds] [ 1 x G(x, s)dxds + G(, s)ds E x G(x, s)dxds E x G(x, s) dxds + E [ [ X(s) X(s) xx G(y, s)dy + x G(, s) dxds xx G(y, s) dydxds + xx G(y, s) dydxds + xx G(y, s) dyds + xx G(y, s) dyds + K x x G(, s) ds X(s) x G(x, s)dxds + x G(x, s)dxds] x G(x, s) dxds [ ] = (K + ) { G(y, s) + xx G(y, s) }dy ds (K + ) {G(y, s) + xx G(y, s) }dyds ] x G(, s) dydxds x G(, s) dydxds [ ] { G(y, s) + xx G(y, s) }dy ds dyds G(, s)ds] x G(x, s) dxds (S5.6) (S5.7) (K + ) G H, (S5.8) where the inequality in (S5.6) follows from the interpolation inequality (Lemma 5.4 in Adams and Fournier (3)) and the inequality in (S5.7) follows from the Cauchy-

32 Xin Qi and Ruiyan Luo Schwarz inequality. By (S5.8) and the definition of Σ(, ) in (3.1), for any G(x, s), G(x, s) H, Σ(G, G) = E[r(G)r( G)] (K + ) G H G H. (S5.9) (S5.9) implies that Σ(, ) is a bounded bilinear function in H. By Theorem 1.8 in Rudin (1991), Σ(, ) defines a bounded linear operator which, for simplicity, is still denoted by Σ: for any G(x, s), G(x, s) H, Σ(G, G) = G, Σ G H. Similarly, we can show that Λ(, ), Σ(, ), and Λ(, ) are all bounded bilinear functions in H. They all define bounded operators in H (still denoted by Λ, Σ and Σ): Σ(G, G) = G, Σ G H, Λ(G, G) = G, Λ G H, Λ(G, G) = G, Λ G H. Then the optimization problem (3.11) can be expressed as max G, ΛG H, (S5.3) G H subject to G, ΣG H = 1 and G k, ΣG H =, 1 k k 1. The optimization problem (3.15) can be expressed as max G, ΛG H, G H subject to G, ΣG H + λ G H = 1 and Ĝk, ΣG H + λ Ĝk, G H =, for all 1 k k 1. This problem can be further expressed as max G, ΛG H, (S5.31) G H subject to G, ( Σ + λi)g H = 1 and Ĝk, ( Σ + λi)g H =,

33 S5. PROOFS OF THEOREMS where I is the identity operator in H. Step : Transform (S5.3) and (S5.31) to eigenvalue problems in H. Because Σ and ( Σ + λi) are all positive definite (that is, all their eigenvalues are nonnegative) and symmetric, their symmetric square root operators uniquely exist (Theorem 1.33 in Rudin (1991)). Let Σ 1/ and ( Σ + λi) 1/ denote the symmetric square root operator of Σ and Σ + λi, respectively. Since all the eigenvalues of Σ + λi are positive (and greater than λ), it is invertible and ( Σ + λi) 1/ is the symmetric square root operator of ( Σ + λi) 1. Lemma S.1. There exist two bounded operators, B and H, in H such that Λ = Σ B Σ + H, Λ = Σ BΣ. (S5.3) Let η k = Σ 1/ G k and η k = ( Σ + λi) 1/ Ĝ k. Then the optimization problem (S5.3) can be transformed to max η, Γη H, s.t. η η H H = 1, η k, η H =, for all 1 k k 1, (S5.33) where Γ = Σ 1/ BΣ 1/ and its solutions are η k s. The optimization problem (S5.31) can be transformed to max η, Γη H, s.t. η η H H = 1, η k, η H =, for all 1 k k 1, (S5.34) where Γ = ( Σ + λi) 1/ ( Σ B Σ + H)( Σ + λi) 1/ = ( Σ + λi) 1/ Σ B Σ( Σ + λi) 1/ + ( Σ +

34 Xin Qi and Ruiyan Luo λi) 1/ H( Σ + λi) 1/, and η k s are the solutions. Based on the two transformations, we provide the convergence rate in the next step. Step 3: Provide the convergence rate for 1 n n l=1 F (X l (s), s, )ds F (X l(s), s, )ds. L For simplicity, we assume that E[F (X(s), s, t)] = for all s, t. Similar to (3.7), the signal function can be written as S l (t) = Ŝ l (t) = = F (X l (s), s, t)ds = F (X l (s), s, t)ds = r l (G k )φ k (t), { Ĝ k (X l (s), s)ds } Ĝ k (s)ds φ k (t) (S5.35) {r l (Ĝk) r(ĝk)} φ k (t). (S5.36) By (S5.35) (S5.36), 1 n Ŝl S l L n = 1 n [r l (Ĝk) r(ĝk)] φ k r l (G k )φ k n l=1 l=1 L 3 n [r l (Ĝk) r(ĝk)] φ () k [r l (G k ) r(g k )]φ k n l=1 L + 3 n () [r n l (Ĝk) r(ĝk)][ φ k φ k ] + 3 n r(g n k )φ k l=1 L l=1 L, (S5.37) where () φ k (t) = R k Y(t)/n is the function in the problem (S3.17) to which φ k is the solution. By the definition (3.7), we have Y l (t) = µ(t) + F (X l(s), s, t)ds + ε l (t) =

35 S5. PROOFS OF THEOREMS µ(t)+ { G k(x l (s), s)φ k (t)} ds+ε l (t) = µ(t)+ r l(g k )φ k (t)+ε l (t). Therefore, φ () k (t) = R k Y(t)/n = = = = l=1 n l=1 [ ] r l (Ĝk) r(ĝk) Y l (t)/n n [ ] r l (Ĝk) r(ĝk) [Y l (t) Ȳ (t)]/n n l=1 j=1 1 [ ] r l (Ĝk) r(ĝk) [r l (G j ) r(g j )] φ j (t) + 1 n n Ĝk, ΣG j H φ j (t) + ( ΞĜk)(t), j=1 n l=1 (S5.38) [ ] r l (Ĝk) r(ĝk) [ε l (t) ε(t)] where Ξ is a operator from H to L [, 1] such that for any G J, ( ΞG)(t) = n l=1 [r l(g) r(g)] [ε l (t) ε(t)] /n. By (S5.38) and noting that φ k (t) s are orthogonal to each other and φ k = σ k, the first term on the right hand side of (S5.37) can be expressed as 3 n [r n l (Ĝk) r(ĝk)] φ () k [r l (G k ) r(g k )]φ k l=1 L 6 n [r l (Ĝk) r(ĝk)] Ĝk, n ΣG j H φ j [r l (G k ) r(g k )]φ k l=1 j=1 + 6 n [r l (Ĝk) r(ĝk)] ΞĜk n =6 j=1 l=1 where Ĝ(a) j σ j Ĝ(a) j L G j, Σ(Ĝ(a) j G j ) H + 6 k =1 Ĝk, ΣĜk H Ĝk, Ξ ΞĜ k H, (S5.39) = K Ĝk, ΣG j H Ĝ k and Ξ is the adjoint operator of Ξ. L Lemma S.. For any ɛ >, there exists an event Ω n,ɛ with probability greater than 1 ɛ,

36 Xin Qi and Ruiyan Luo such that in Ω n,ɛ, for any 1 j K, Ĝ(a) j and for any j > K, Ĝ(a) j G j, Σ(Ĝ(a) j G j ) H D 19 G j, Σ(Ĝ(a) j G j ) H 1 + D δ j + δ K ( K + ( K σ k ) σ k n 1/, ) n 1/, where δ k = min 1 j k (σ j σ j+1), and D 19 and D are two constants which do not depend on n. In the rest of the proof, we only consider the event Ω n,ɛ. By Condition 1, we have { } δ j = min 1 l j (σ l σl+1) C j (θ+1), σj Cj θ, σ j Cj θ, ( K ) D 1K (θ+1). (S5.4) and hence σ k Then by (S5.4) and Lemma S., 6 j=1 ( σ j Ĝ(a) j 6D 19 K j=1 G j, Σ(Ĝ(a) j G j ) H σ j [δ j + D 1 K (θ+1) ] + 6D D [(K θ+3 + K (θ+1) )n 1/ + K θ+1 ] j=k+1 σ j [δ K + D 1K (θ+1) ] ) n 1/ + 6 j=k+1 σ j D 3 [K (θ+1) n 1/ + K θ+1 ], (S5.41) where the inequality in the last line is because (θ + 1) > θ + 3 when θ > 1. By the definition (S6.61) of Ω n,ɛ, similar arguments as in the proof of Lemma S. lead to an

37 estimate of the second term on the right hand side of (S5.39): S5. PROOFS OF THEOREMS 6 Ĝk, ΣĜk H Ĝk, Ξ ΞĜ k H D 5 n 1/, k =1 Therefore, by (S5.39), (S5.41) and (S5.4), we have (S5.4) 3 n n [r l (Ĝk) r(ĝk)] φ () k l=1 [r l (G k ) r(g k )]φ k L D 3 [K (θ+1) n 1/ + K θ+1 ] + D 5 n 1/. (S5.43) By similar arguments, we can obtain the upper bound for the second and third terms on the right hand side of (S5.39), 3 n [r n l (Ĝk) r(ĝk)][ φ k l=1 3 n r(g k )φ k D 6 n 1, n l=1 L () φ k ] D 4 K θ+3 n 1/, L (S5.44) (S5.45) Combining (S5.43), (S5.44) and (S5.45), and noting that K = C K n 1/(3θ+) given in the theorem, we have 1 n n Ŝl S l L D 1n (θ 1)/(3θ+). l=1 which is the inequality (3.) in the theorem. (S5.46) Now we prove the inequality (3.1) in the theorem. Because Y new = µ(t) + Y pred = µ(t) + F (X new (s), s, t)ds + ε new (t), F (X new (s), s, t)ds (S5.47)

38 Xin Qi and Ruiyan Luo and ε new (t) is independent of X(s), Y(t) and X new (s), we have E [ Y pred Y new ] L X(s), Y(t) (S5.48) = µ(t) + F (X new (s), s, t)ds µ(t) F (X new (s), s, t)ds + E [ ] ε L. L The first inequality in (3.1) immediately follows from (S5.48). In order to show the first inequality in (3.1), we just need to show µ(t) + F (X new (s), s, t)ds µ(t) F (X new (s), s, t)ds D n (θ 1)/(3θ+), L which can be proved using similar arguments as in the proof of the inequality (S5.46). We skip the details. S6 Proofs of technical lemmas S.6.1 The proof of Lemma S.1 By the definition (3.7), we have Y l (t) = µ(t) + F (X l(s), s, t)ds + ε l (t) = µ(t) + { G k(x l (s), s)φ k (t)} ds + ε l (t) = µ(t) + r l(g k )φ k (t) + ε l (t), which, together

39 S6. PROOFS OF TECHNICAL LEMMAS with the definition (3.14) of Λ(G, G), leads to G, ΛG H = Λ(G, G) = 1 [ 1 n {r n l (G) r(g)} {Y l (t) Y (t)}] dt l=1 = 1 [ 1 ( n ) n {r n l (G) r(g)} {r l (G k ) r(g k )} φ k (t) + {r l (G) r(g)} {ε l (t) ε l (t)}] dt l=1 l=1 = 1 [ 1 n G, n ΣG n k H φ k (t) + {r l (G) r(g)} {ε l (t) ε l (t)}] dt = l=1 G, ΣG k H σ k + G, HG H = G, ( Σ B Σ + H)G H, where B and H are two bounded operators. Therefore, we have Λ = Σ B Σ+ H. Similarly, we can show that Λ = Σ BΣ. S.6. The proof of Lemma S. We use to denote the operator norm. We split the proof into several steps. Step 1: provide an upper bound for Γ Γ. Γ Γ (S6.49) = ( Σ + λi) 1/ Σ B Σ( Σ + λi) 1/ + ( Σ + λi) 1/ H( Σ + λi) 1/ Σ 1/ BΣ 1/ ( Σ + λi) 1/ Σ B Σ( Σ + λi) 1/ Σ 1/ BΣ 1/ + ( Σ + λi) 1/ H( Σ + λi) 1/ ( Σ + λi) 1/ Σ B ( Σ + λi) 1/ Σ Σ 1/ + ( Σ + λi) 1/ Σ Σ 1/ B Σ 1/ + ( Σ + λi) 1/ H( Σ + λi) 1/.

40 Xin Qi and Ruiyan Luo We first estimate ( Σ + λi) 1/ Σ Σ 1/ ( Σ + λi) 1/ Σ (Σ + λi) 1/ Σ + (Σ + λi) 1/ Σ Σ 1/ = f( Σ) f(σ) + g(σ), (S6.5) where f(z) = (z +λ) 1/ z and g(z) = f(z) z 1/. Because both f(z) and g(z) are analytic functions in the domain G = {z = x + iy : x > λ/} of the complex plane, it follows the theory in Section 1.6 of Rudin (1991) that f( Σ) f(σ) = f(z)(zi Σ) 1 dz C 1 +C +C 3 +C 4 f(z)(zi Σ) 1 dz, C 1 +C +C 3 +C 4 (S6.51) where C i, 1 i 4, are four segments in the domain G shown in Figure S.13.

41 S6. PROOFS OF TECHNICAL LEMMAS y λ + i C (M+1)+i C 1 C 3 x λ i C 4 (M+1) i Figure S.13: The contour for the integration in (S6.51) and M is the larger one of the maximum eigenvalues of Σ and Σ. Now we estimate f(z)(zi Σ) 1 dz f(z)(zi Σ) 1 dz = f(z) [(zi C 1 C 1 C Σ) ] 1 (zi Σ) 1 dz 1 f(z) (zi Σ) 1 (zi Σ) 1 dz. (S6.5) C 1 Define = Σ Σ. (S6.53) Then (zi Σ) 1 (zi Σ) 1 = (zi Σ) 1 ( Σ Σ)(zI Σ) 1 = (zi Σ) 1 (zi Σ) 1.

42 Xin Qi and Ruiyan Luo For any z C 1, (zi Σ) 1 (zi Σ) 1 (zi Σ) 1 (zi Σ) 1 z. where the last inequality follows from the fact that (zi Σ) 1 is less than or equal to the largest one of (z µ 1 ) 1, (z µ ) 1,, and (zi Σ) 1 is less than or equal to the largest one of (z µ 1 ) 1, (z µ ) 1,, (see inequality (3) in Section 1.4 in Rudin (1991)). µ k s and µ k s are the eigenvalues of Σ and Σ, and all of them are nonnegative. Therefore, for any z C 1, (z µ k ) 1 s and (z µ k ) 1 s are all less than or equal to z 1. Then by (S6.5), f(z)(zi Σ) 1 dz f(z)(zi Σ) 1 dz C 1 C 1 f(z) z dz D 3 λ 1/, C 1 where D 3 is a constant. By a similar argument, we can obtain f(z)(zi Σ) 1 dz f(z)(zi Σ) 1 dz D 4. C +C 3 +C 4 C +C 3 +C 4 Therefore, by (S6.51), we have ( Σ + λi) 1/ Σ (Σ + λi) 1/ Σ = f( Σ) f(σ) D 5 λ 1/, (S6.54) By the inequality (3) in Section 1.4 in Rudin (1991), we estimate the second term in (S6.5), (Σ + λi) 1/ Σ Σ 1/ = g(σ) sup g(µ k ) sup f(µ k ) µ 1/ k (S6.55) k 1 k 1 λ µ k = sup ( k 1 µk + λ + ) µ k µk + λ λ1/. (S6.5), (S6.54) and (S6.55) lead to ( Σ + λi) 1/ Σ Σ 1/ D 5 λ 1/ + λ 1/. (S6.56)

43 S6. PROOFS OF TECHNICAL LEMMAS By the same argument as in (S6.55), we have (Σ + λi) 1/ sup k 1 By the central limit theorem in Hilbert space, we have E [ ] = E [ Σ ] Σ With similar arguments as above and (S6.58), we have E 1 µk + λ λ 1/. D 1 n, E [ Ξ ] D n. [ ( Σ ] [ + λi) 1/ H( Σ + λi) 1/ D 6 λ 1/ n 1/ + λ 1 n 1], (S6.57) (S6.58) (S6.59) where D 1, D and D 6 are constants and Ξ is the operator defined in the last line of (S5.38). Now by (S6.49), (S6.56) (S6.59), E[ Γ Γ ] E[ ( Σ + λi) 1/ Σ B ( Σ + λi) 1/ Σ Σ 1/ ] + E[ ( Σ + λi) 1/ Σ Σ 1/ B Σ 1/ ] + E[ ( Σ + λi) 1/ H( Σ + λi) 1/ ] D 7 ( λ 1/ n 1/ + λ 1/ + λ 1/ n 1/ + λ 1 n 1) D 8 n 1/4. (S6.6) where we use the condition λ = C λ n 1/. For any ɛ >, define the event Ω n,ɛ = { Γ Γ D 8 ɛ 1 n 1/4 /3, D 1 ɛ 1 n 1/ /3, and Ξ D ɛ 1 n 1/ /3}. (S6.61) Then by (S6.58), (S6.6) and the Markov inequality, the probability of Ω n,ɛ is greater than 1 ɛ. In the rest of the proof of the lemma, we only consider the event Ω n,ɛ. Step 3: provide upper bounds for η k η k, k 1. We first note that the eigenvalues of Γ are the same as the eigenvalues σ 1 σ of the problem (S5.3) (which is equivalent to the problem (3.11) in the main

44 Xin Qi and Ruiyan Luo manuscript). As the inequalities in (5.) in Hall et al. (7), by the inequality (.1) in Bhatia et al. (1983) and (S6.6), we have sup δ k η k η k 8 Γ Γ D 9 n 1/4, k 1 (S6.6) where we recall that δ k = min 1 j k (σ j σ j+1), the last inequality follows from the definition (S6.61) of Ω n,ɛ and D 9 = 8D 8 ɛ 1 /3. Then it follows that sup k 1 δk 1 η k, η k = 1 sup δk η k η k 1 k 1 D 9n 1/ (S6.63) For any 1 k m, let P be the orthogonal projection operator onto the subspace spanned by { η k : k A}, where A is any subset of positive integers but does not include k. Then by the inequality (.1) in Bhatia et al. (1983), we have δ k Pη k Γ Γ D 9 n 1/4. (S6.64) Step 4: provide upper bounds for σ k G k H and σ k Ĝk H, k 1. Because for any k 1, we have G k (x, s) = F (x, s, t)φ k(t)dt/σ k, G k L = σ 4 k = σ k F L. G k (x, s) dxds = σ 4 k F (x, s, t) dtdxds [ F (x, s, t)φ k (t)dt] dxds φ k (t) dt = σ k F (x, s, t) dtdxds Similarly, we have xx G k L σ k xxf L, xs G k L σ k xsf L and ss G k L σ k ssf L. Hence, we have σ k G k H = σ k[ G k L + xxg k L + xsg k L + ssg k L ] D 1, (S6.65) for all k 1, where D 1 = F L + xx F L + xs F L + ss F L.

45 S6. PROOFS OF TECHNICAL LEMMAS To provide the upper bound for σ 1 Ĝ1, we consider the following inequality G 1, ΛG 1 H G 1, ( Σ + λi)g 1 H Ĝ1, ΛĜ1 H Ĝ1, ( Σ + λi)ĝ1 H = Ĝ1, ΛĜ1 H, (S6.66) where the inequality is because Ĝ1 is the solution to (S5.31) and the equality is because of Ĝ1, ( Σ + λi)ĝ1 = 1. By a tedious calculation, it follows from (S6.66) that in the event Ω n,ɛ, we have σ1 Ĝ1 D 11, where D 11 is a constant. For a general k, we can similarly obtain σ k Ĝk D 11, (S6.67) Step 5: prove the inequalities in the lemma. Because η k = Σ 1/ G k and η k = ( Σ + λi) 1/ Ĝ k, we have where q k = Σ 1/ Ĝ k = η k + q k, Σ1/ Gk = η k + q k, (S6.68) [ ( Σ + λi) 1/ Σ 1/] [ Ĝ k and q k = Σ1/ Σ 1/] G k. By similar arguments as in Step, we can show that ( Σ + λi) 1/ Σ 1/ D 1 n 1/4 and Σ 1/ Σ 1/ D 13 n 1/4. By (S6.65) and (S6.67), For any 1 j K, Ĝ(a) j G j, Σ(Ĝ(a) j q k D 14 σ k n 1/4, q k D 15 σ k n 1/4. (S6.69) G j ) = Σ 1/ Ĝ (a) j Ĝj, ΣG j Σ 1/ Ĝ j Σ 1/ G j + By (S6.6), (S6.63), (S6.69), we have Σ 1/ G j = k j, Ĝk, ΣG j Σ 1/ Ĝ k Σ 1/ G j Ĝk, ΣG j Σ 1/ Ĝ k. (S6.7) Ĝj, ΣG j Σ 1/ Ĝ j Σ 1/ G j = η j + q j, η j + q j ( η j + q j ) (η j + q j )

46 Xin Qi and Ruiyan Luo D 16 [δ j + σ 4 j ]n 1/ D 16 δ j n 1/ (S6.71) where the last inequality is because δ j = min 1 l j (σ l σ l+1 ) σ j. Similarly, k j, D 17 [ Ĝk, ΣG j Σ 1/ Ĝ k = k j, η k, η j η k + ( k j, η k + q k, η j + q j ( η k + q k ) ] ( K q k ) + q j D 18 δ j + σ k ) (S6.7) n 1/ where the last inequality follows from (S6.64), (S6.69) and the fact that K k j, η k, η j η k is the orthogonal projection of η j onto the space spanned by { η k : 1 k K, k j}. Combining (S6.7), (S6.71) and (S6.7) leads to Ĝ(a) j G j, Σ(Ĝ(a) j G j ) D 19 δ j + for any 1 l K. Similarly, for any l > K, we have Ĝ(a) j G j, Σ(Ĝ(a) j G j ) 1 + D δ K ( K + ( K σ k ) σ k n 1/, ) n 1/. Bibliography Adams, R. A. and J. J. Fournier (3). Sobolev spaces, Volume 14 of Pure and Applied Mathematics. Academic Press, Amsterdam. Bhatia, R., C. Davis, and A. McIntosh (1983). Perturbation of spectral subspaces and solution of linear operator equations. Linear Algebra and its Applications 5, Hall, P., J. L. Horowitz, et al. (7). Methodology and convergence rates for functional linear regression. The Annals of Statistics 35 (1), Luo, R. and X. Qi (17). Function-on-function linear regression by signal compression. Journal of the American Statistical Association 11 (518),

47 BIBLIOGRAPHY Rasmussen, C. E. and C. K. I. Williams (5). Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning series), Chapter 4. The MIT Pres. Rudin, W. (1991). Functional Analysis. International series in pure and applied mathematics. McGraw-Hill.

arxiv: v2 [math.pr] 27 Oct 2015

arxiv: v2 [math.pr] 27 Oct 2015 A brief note on the Karhunen-Loève expansion Alen Alexanderian arxiv:1509.07526v2 [math.pr] 27 Oct 2015 October 28, 2015 Abstract We provide a detailed derivation of the Karhunen Loève expansion of a stochastic