arxiv: v1 [math.st] 12 Nov 2012


The Annals of Statistics
2010, Vol. 38, No. 6
DOI: /09-AOS772
© Institute of Mathematical Statistics, 2010

A REPRODUCING KERNEL HILBERT SPACE APPROACH TO FUNCTIONAL LINEAR REGRESSION

By Ming Yuan [1] and T. Tony Cai [2]

Georgia Institute of Technology and University of Pennsylvania

We study in this paper a smoothness regularization method for functional linear regression and provide a unified treatment for both the prediction and estimation problems. By developing a tool on simultaneous diagonalization of two positive definite kernels, we obtain sharper results on the minimax rates of convergence and show that smoothness regularized estimators achieve the optimal rates of convergence for both prediction and estimation under conditions weaker than those for the functional principal components based methods developed in the literature. Despite the generality of the method of regularization, we show that the procedure is easily implementable. Numerical results are obtained to illustrate the merits of the method and to demonstrate the theoretical developments.

Received March 2009; revised November.
[1] Supported in part by NSF Grant DMS-MPSA and NSF CAREER Award DMS.
[2] Supported in part by NSF Grant DMS and NSF FRG Grant DMS.
AMS 2000 subject classifications. Primary 62J05; secondary 62G20.
Key words and phrases. Covariance, eigenfunction, eigenvalue, functional linear regression, minimax, optimal convergence rate, principal component analysis, reproducing kernel Hilbert space, Sacks-Ylvisaker conditions, simultaneous diagonalization, slope function, Sobolev space.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2010, Vol. 38, No. 6. This reprint differs from the original in pagination and typographic detail.

1. Introduction. Consider the following functional linear regression model where the response $Y$ is related to a square integrable random function $X(\cdot)$ through

$$Y = \alpha_0 + \int_{\mathcal{T}} X(t)\beta_0(t)\,dt + \varepsilon. \tag{1}$$

Here $\alpha_0$ is the intercept, $\mathcal{T}$ is the domain of $X(\cdot)$, $\beta_0(\cdot)$ is an unknown slope function and $\varepsilon$ is a centered noise random variable. The domain $\mathcal{T}$ is assumed to be a compact subset of a Euclidean space. Our goal is to estimate $\alpha_0$

and $\beta_0(\cdot)$ as well as to retrieve

$$\eta_0(X) := \alpha_0 + \int_{\mathcal{T}} X(t)\beta_0(t)\,dt \tag{2}$$

based on a set of training data $(x_1,y_1),\ldots,(x_n,y_n)$ consisting of $n$ independent copies of $(X,Y)$. We shall assume that the slope function $\beta_0$ resides in a reproducing kernel Hilbert space (RKHS) $\mathcal{H}$, a subspace of the collection of square integrable functions on $\mathcal{T}$.

In this paper, we investigate the method of regularization for estimating $\eta_0$, as well as $\alpha_0$ and $\beta_0$. Let $\ell_n$ be a data fit functional that measures how well $\eta$ fits the data and $J$ be a penalty functional that assesses the plausibility of $\eta$. The method of regularization estimates $\eta_0$ by

$$\hat\eta_{n\lambda} = \arg\min_{\eta}\,[\ell_n(\eta \mid \text{data}) + \lambda J(\eta)], \tag{3}$$

where the minimization is taken over

$$\Big\{\eta: L_2(\mathcal{T}) \to \mathbb{R} \;\Big|\; \eta(X) = \alpha + \int_{\mathcal{T}} X\beta : \alpha \in \mathbb{R},\ \beta \in \mathcal{H}\Big\}, \tag{4}$$

and $\lambda \ge 0$ is a tuning parameter that balances the fidelity to the data and the plausibility. Equivalently, the minimization can be taken over $(\alpha,\beta)$ instead of $\eta$ to obtain estimates for both the intercept and slope, denoted by $\hat\alpha_{n\lambda}$ and $\hat\beta_{n\lambda}$ hereafter. The most common choice of the data fit functional is the squared error

$$\ell_n(\eta) = \frac{1}{n}\sum_{i=1}^{n} [y_i - \eta(x_i)]^2. \tag{5}$$

In general, $\ell_n$ is chosen such that it is convex in $\eta$ and $E\ell_n(\eta)$ is uniquely minimized by $\eta_0$.

In the context of functional linear regression, the penalty functional can be conveniently defined through the slope function $\beta$ as a squared norm or semi-norm associated with $\mathcal{H}$. The canonical example of $\mathcal{H}$ is the Sobolev space. Without loss of generality, assume that $\mathcal{T} = [0,1]$; the Sobolev space of order $m$ is then defined as

$$W_2^m([0,1]) = \{\beta: [0,1] \to \mathbb{R} \mid \beta, \beta^{(1)}, \ldots, \beta^{(m-1)} \text{ are absolutely continuous and } \beta^{(m)} \in L_2\}.$$

There are many possible norms with which $W_2^m$ can be equipped to make it a reproducing kernel Hilbert space. For example, it can be endowed with the norm

$$\|\beta\|_{W_2^m}^2 = \sum_{q=0}^{m-1}\Big(\int \beta^{(q)}\Big)^2 + \int \big(\beta^{(m)}\big)^2. \tag{6}$$

The readers are referred to Adams (1975) for a thorough treatment of this subject. In this case, a possible choice of the penalty functional is

given by

$$J(\beta) = \int_0^1 [\beta^{(m)}(t)]^2\,dt. \tag{7}$$

Another setting of particular interest is $\mathcal{T} = [0,1]^2$, which naturally occurs when $X$ represents an image. A popular choice in this setting is the thin plate spline where $J$ is given by

$$J(\beta) = \int_0^1\!\!\int_0^1 \left[\Big(\frac{\partial^2\beta}{\partial x_1^2}\Big)^2 + 2\Big(\frac{\partial^2\beta}{\partial x_1\,\partial x_2}\Big)^2 + \Big(\frac{\partial^2\beta}{\partial x_2^2}\Big)^2\right] dx_1\,dx_2, \tag{8}$$

and $(x_1,x_2)$ are the arguments of the bivariate function $\beta$. Other examples of $\mathcal{T}$ include $\mathcal{T} = \{1,2,\ldots,p\}$ for some positive integer $p$, and the unit sphere in a Euclidean space, among others. The readers are referred to Wahba (1990) for common choices of $\mathcal{H}$ and $J$ in these as well as other contexts.

Other than the methods of regularization, a number of alternative estimators have been introduced in recent years for the functional linear regression [James (2002); Cardot, Ferraty and Sarda (2003); Ramsay and Silverman (2005); Yao, Müller and Wang (2005); Ferraty and Vieu (2006); Cai and Hall (2006); Li and Hsing (2007); Hall and Horowitz (2007); Crambes, Kneip and Sarda (2009); Johannes (2009)]. Most of the existing methods are based upon the functional principal component analysis (FPCA). The success of these approaches hinges on the availability of a good estimate of the functional principal components for $X(\cdot)$. In contrast, the aforementioned smoothness regularized estimator avoids this task and therefore circumvents assumptions on the spacing of the eigenvalues of the covariance operator for $X(\cdot)$ as well as Fourier coefficients of $\beta_0$ with respect to the eigenfunctions, which are required by the FPCA-based approaches. Furthermore, as we shall see in the subsequent theoretical analysis, because the regularized estimator does not rely on estimating the functional principal components, stronger results on the convergence rates can be obtained.

Despite the generality of the method of regularization, we show that the estimators can be computed rather efficiently.
We first derive a representer theorem in Section 2 which demonstrates that although the minimization with respect to $\eta$ in (3) is taken over an infinite-dimensional space, the solution can actually be found in a finite-dimensional subspace. This result makes our procedure easily implementable and enables us to take advantage of the existing techniques and algorithms for smoothing splines to compute $\hat\eta_{n\lambda}$, $\hat\beta_{n\lambda}$ and $\hat\alpha_{n\lambda}$.

We then consider in Section 3 the relationship between the eigen structures of the covariance operator for $X(\cdot)$ and the reproducing kernel of the RKHS $\mathcal{H}$. These eigen structures play prominent roles in determining the difficulty of the prediction and estimation problems in functional linear regression. We prove in Section 3 a result on simultaneous diagonalization of the reproducing kernel of the RKHS $\mathcal{H}$ and the covariance operator of

$X(\cdot)$, which provides a powerful machinery for studying the minimax rates of convergence.

Section 4 investigates the rates of convergence of the smoothness regularized estimators. Both the minimax upper and lower bounds are established. The optimal convergence rates are derived in terms of a class of intermediate norms which provide a wide range of measures for the estimation accuracy. In particular, this approach gives a unified treatment for both the prediction of $\eta_0(X)$ and the estimation of $\beta_0$. The results show that the smoothness regularized estimators achieve the optimal rate of convergence for both prediction and estimation under conditions weaker than those for the functional principal components based methods developed in the literature.

The representer theorem makes the regularized estimators easy to implement. Several efficient algorithms are available in the literature that can be used for the numerical implementation of our procedure. Section 5 presents numerical studies to illustrate the merits of the method as well as demonstrate the theoretical developments. All proofs are relegated to Section 6.

2. Representer theorem. The smoothness regularized estimators $\hat\eta_{n\lambda}$ and $\hat\beta_{n\lambda}$ are defined as the solution to a minimization problem over an infinite-dimensional space. Before studying the properties of the estimators, we first show that the minimization is indeed well defined and easily computable thanks to a version of the so-called representer theorem.

Let the penalty functional $J$ be a squared semi-norm on $\mathcal{H}$ such that the null space

$$\mathcal{H}_0 := \{\beta \in \mathcal{H}: J(\beta) = 0\} \tag{9}$$

is a finite-dimensional linear subspace of $\mathcal{H}$ with orthonormal basis $\{\xi_1,\ldots,\xi_N\}$ where $N := \dim(\mathcal{H}_0)$. Denote by $\mathcal{H}_1$ its orthogonal complement in $\mathcal{H}$ such that $\mathcal{H} = \mathcal{H}_0 \oplus \mathcal{H}_1$. Similarly, for any function $f \in \mathcal{H}$, there exists a unique decomposition $f = f_0 + f_1$ such that $f_0 \in \mathcal{H}_0$ and $f_1 \in \mathcal{H}_1$. Note that $\mathcal{H}_1$ forms a reproducing kernel Hilbert space with the inner product of $\mathcal{H}$ restricted to $\mathcal{H}_1$.
Let $K(\cdot,\cdot)$ be the corresponding reproducing kernel of $\mathcal{H}_1$ such that $J(f_1) = \|f_1\|_K^2 = \|f_1\|_{\mathcal{H}}^2$ for any $f_1 \in \mathcal{H}_1$. Hereafter we use the subscript $K$ to emphasize the correspondence between the inner product and its reproducing kernel.

In what follows, we shall assume that $K$ is continuous and square integrable. Note that $K$ is also a nonnegative definite operator on $L_2$. With slight abuse of notation, write

$$(Kf)(\cdot) = \int_{\mathcal{T}} K(\cdot,s)f(s)\,ds. \tag{10}$$

It is known [see, e.g., Cucker and Smale (2001)] that $Kf \in \mathcal{H}_1$ for any $f \in L_2$. Furthermore, for any $f \in \mathcal{H}_1$,

$$\int_{\mathcal{T}} f(t)\beta(t)\,dt = \langle Kf, \beta\rangle_{\mathcal{H}}. \tag{11}$$

This observation allows us to prove the following result which is important to both numerical implementation of the procedure and our theoretical analysis.

Theorem 1. Assume that $\ell_n$ depends on $\eta$ only through $\eta(x_1),\eta(x_2),\ldots,\eta(x_n)$; then there exist $d = (d_1,\ldots,d_N)' \in \mathbb{R}^N$ and $c = (c_1,\ldots,c_n)' \in \mathbb{R}^n$ such that

$$\hat\beta_{n\lambda}(t) = \sum_{k=1}^{N} d_k\xi_k(t) + \sum_{i=1}^{n} c_i (Kx_i)(t). \tag{12}$$

Theorem 1 is a generalization of the well-known representer lemma for smoothing splines (Wahba, 1990). It demonstrates that although the minimization with respect to $\eta$ is taken over an infinite-dimensional space, the solution can actually be found in a finite-dimensional subspace, and it suffices to evaluate the coefficients $c$ and $d$ in (12). Its proof follows a similar argument as that of the corresponding theorem in Wahba (1990), where $\ell_n$ is assumed to be squared error, and is therefore omitted here for brevity.

Consider, for example, the squared error loss. The regularized estimator is given by

$$(\hat\alpha_{n\lambda}, \hat\beta_{n\lambda}) = \arg\min_{\alpha \in \mathbb{R},\, \beta \in \mathcal{H}} \left\{\frac{1}{n}\sum_{i=1}^{n}\Big[y_i - \Big(\alpha + \int_{\mathcal{T}} x_i(t)\beta(t)\,dt\Big)\Big]^2 + \lambda J(\beta)\right\}. \tag{13}$$

It is not hard to see that

$$\hat\alpha_{n\lambda} = \bar y - \int_{\mathcal{T}} \bar x(t)\hat\beta_{n\lambda}(t)\,dt, \tag{14}$$

where $\bar x(t) = n^{-1}\sum_{i=1}^{n} x_i(t)$ and $\bar y = n^{-1}\sum_{i=1}^{n} y_i$ are the sample averages of $x$ and $y$, respectively. Consequently, (13) yields

$$\hat\beta_{n\lambda} = \arg\min_{\beta \in \mathcal{H}} \left\{\frac{1}{n}\sum_{i=1}^{n}\Big[(y_i - \bar y) - \int_{\mathcal{T}} (x_i(t) - \bar x(t))\beta(t)\,dt\Big]^2 + \lambda J(\beta)\right\}. \tag{15}$$

For the purpose of illustration, assume that $\mathcal{H} = W_2^2$ and $J(\beta) = \int (\beta'')^2$. Then $\mathcal{H}_0$ is the linear space spanned by $\xi_1(t) = 1$ and $\xi_2(t) = t$. A popular reproducing kernel associated with $\mathcal{H}_1$ is

$$K(s,t) = \frac{1}{(2!)^2}B_2(s)B_2(t) - \frac{1}{4!}B_4(|s-t|), \tag{16}$$

where $B_m(\cdot)$ is the $m$th Bernoulli polynomial. The readers are referred to Wahba (1990) for further details. Following Theorem 1, it suffices to consider $\beta$ of the following form:

$$\beta(t) = d_1 + d_2 t + \sum_{i=1}^{n} c_i \int_{\mathcal{T}} [x_i(s) - \bar x(s)]K(t,s)\,ds \tag{17}$$

for some $d \in \mathbb{R}^2$ and $c \in \mathbb{R}^n$. Correspondingly,

$$\int_{\mathcal{T}} [X(t) - \bar x(t)]\beta(t)\,dt = d_1\int_{\mathcal{T}} [X(t) - \bar x(t)]\,dt + d_2\int_{\mathcal{T}} [X(t) - \bar x(t)]\,t\,dt + \sum_{i=1}^{n} c_i \int_{\mathcal{T}}\!\int_{\mathcal{T}} [x_i(s) - \bar x(s)]K(t,s)[X(t) - \bar x(t)]\,ds\,dt.$$

Note also that for $\beta$ given in (17),

$$J(\beta) = c'\Sigma c, \tag{18}$$

where $\Sigma = (\Sigma_{ij})$ is an $n \times n$ matrix with

$$\Sigma_{ij} = \int_{\mathcal{T}}\!\int_{\mathcal{T}} [x_i(s) - \bar x(s)]K(t,s)[x_j(t) - \bar x(t)]\,ds\,dt. \tag{19}$$

Denote by $T = (T_{ij})$ an $n \times 2$ matrix whose $(i,j)$ entry is

$$T_{ij} = \int_{\mathcal{T}} [x_i(t) - \bar x(t)]\,t^{j-1}\,dt \tag{20}$$

for $j = 1, 2$. Set $y = (y_1,\ldots,y_n)'$. Then

$$\ell_n(\eta) + \lambda J(\beta) = \frac{1}{n}\|y - (Td + \Sigma c)\|_{\ell_2}^2 + \lambda c'\Sigma c, \tag{21}$$

which is quadratic in $c$ and $d$, and the explicit form of the solution can be easily obtained for such a problem. This computational problem is similar to that behind the smoothing splines. Write $W = \Sigma + n\lambda I$; then the minimizer of (21) is given by

$$d = (T'W^{-1}T)^{-1}T'W^{-1}y, \qquad c = W^{-1}[I - T(T'W^{-1}T)^{-1}T'W^{-1}]y.$$

3. Simultaneous diagonalization. Before studying the asymptotic properties of the regularized estimators $\hat\eta_{n\lambda}$ and $\hat\beta_{n\lambda}$, we first investigate the relationship between the eigen structures of the covariance operator for $X(\cdot)$ and the reproducing kernel of the functional space $\mathcal{H}$. As observed in earlier studies [e.g., Cai and Hall (2006); Hall and Horowitz (2007)], eigen structures play prominent roles in determining the nature of the estimation problem in functional linear regression.

Recall that $K$ is the reproducing kernel of $\mathcal{H}_1$. Because $K$ is continuous and square integrable, it follows from Mercer's theorem [Riesz and Sz-Nagy (1955)] that $K$ admits the following spectral decomposition:

$$K(s,t) = \sum_{k=1}^{\infty} \rho_k\psi_k(s)\psi_k(t). \tag{22}$$
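Returning briefly to the computation of Section 2, the closed-form minimizer of (21) can be sketched numerically as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the curves $x_i$ are assumed to be observed on a common grid in $[0,1]$, all integrals are approximated by the trapezoidal rule, the responses are centered as in (15), and the function names `bernoulli_kernel` and `fit_flr` are hypothetical.

```python
import numpy as np

def bernoulli_kernel(s, t):
    """Kernel (16): B2(s)B2(t)/(2!)^2 - B4(|s-t|)/4!."""
    b2 = lambda u: u**2 - u + 1.0 / 6.0
    b4 = lambda u: u**4 - 2.0 * u**3 + u**2 - 1.0 / 30.0
    return b2(s) * b2(t) / 4.0 - b4(np.abs(s - t)) / 24.0

def fit_flr(x, y, grid, lam):
    """Solve (21) for (d, c); x is n-by-p, row i holding x_i on the grid.

    Assumption: integrals are replaced by trapezoidal quadrature.
    """
    n = x.shape[0]
    h = np.diff(grid)
    w = np.zeros(grid.size)
    w[:-1] += h / 2.0
    w[1:] += h / 2.0                              # trapezoidal weights
    xw = (x - x.mean(axis=0)) * w                 # centered curves times weights
    K = bernoulli_kernel(grid[:, None], grid[None, :])
    Sigma = xw @ K @ xw.T                         # quadrature version of (19)
    T = np.column_stack([xw.sum(axis=1), xw @ grid])  # quadrature version of (20)
    yc = y - y.mean()                             # centered responses, as in (15)
    W = Sigma + n * lam * np.eye(n)
    Winv_T = np.linalg.solve(W, T)
    Winv_y = np.linalg.solve(W, yc)
    d = np.linalg.solve(T.T @ Winv_T, T.T @ Winv_y)
    c = np.linalg.solve(W, yc - T @ d)            # equals W^{-1}[I - T(...)^{-1}T'W^{-1}]y
    beta = d[0] + d[1] * grid + K @ (xw.T @ c)    # (17) evaluated on the grid
    alpha = y.mean() - np.sum(w * x.mean(axis=0) * beta)  # (14)
    return alpha, beta, d, c, Sigma, T
```

The expression used for `c` is algebraically the same as the one displayed after (21), since $c = W^{-1}(y - Td)$; solving linear systems rather than forming explicit inverses is a design choice made here for numerical stability.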

Here $\rho_1 \ge \rho_2 \ge \cdots$ are the eigenvalues of $K$, and $\{\psi_1,\psi_2,\ldots\}$ are the corresponding eigenfunctions, that is,

$$K\psi_k = \rho_k\psi_k, \qquad k = 1,2,\ldots \tag{23}$$

Moreover,

$$\langle\psi_i,\psi_j\rangle_{L_2} = \delta_{ij} \quad\text{and}\quad \langle\psi_i,\psi_j\rangle_K = \delta_{ij}/\rho_j, \tag{24}$$

where $\delta_{ij}$ is the Kronecker delta.

Consider, for example, the univariate Sobolev space $W_2^m([0,1])$ with norm (6) and penalty (7). Observe that

$$\mathcal{H}_1 = \Big\{f \in \mathcal{H}: \int f^{(k)} = 0,\ k = 0,1,\ldots,m-1\Big\}. \tag{25}$$

It is known that [see, e.g., Wahba (1990)]

$$K(s,t) = \frac{1}{(m!)^2}B_m(s)B_m(t) + \frac{(-1)^{m-1}}{(2m)!}B_{2m}(|s-t|). \tag{26}$$

Recall that $B_m$ is the $m$th Bernoulli polynomial. It is known [see, e.g., Micchelli and Wahba (1981)] that in this case, $\rho_k \asymp k^{-2m}$, where for two positive sequences $a_k$ and $b_k$, $a_k \asymp b_k$ means that $a_k/b_k$ is bounded away from $0$ and $\infty$ as $k \to \infty$.

Denote by $C$ the covariance operator for $X$, that is,

$$C(s,t) = E\{[X(s) - E(X(s))][X(t) - E(X(t))]\}. \tag{27}$$

There is a duality between reproducing kernel Hilbert spaces and covariance operators [Stein (1999)]. Similarly to the reproducing kernel $K$, assuming that the covariance operator $C$ is continuous and square integrable, we also have the following spectral decomposition

$$C(s,t) = \sum_{k=1}^{\infty} \mu_k\phi_k(s)\phi_k(t), \tag{28}$$

where $\mu_1 \ge \mu_2 \ge \cdots$ are the eigenvalues and $\{\phi_1,\phi_2,\ldots\}$ are the eigenfunctions such that

$$C\phi_k := \int_{\mathcal{T}} C(\cdot,t)\phi_k(t)\,dt = \mu_k\phi_k, \qquad k = 1,2,\ldots \tag{29}$$

The decay rate of the eigenvalues $\{\mu_k: k \ge 1\}$ can be determined by the smoothness of the covariance operator $C$. More specifically, when $C$ satisfies the so-called Sacks-Ylvisaker conditions of order $s$, where $s$ is a nonnegative integer [Sacks and Ylvisaker (1966, 1968, 1970)], then $\mu_k \asymp k^{-2(s+1)}$. The readers are referred to the original papers by Sacks and Ylvisaker or a more recent paper by Ritter, Wasilkowski and Woźniakowski (1995) for

detailed discussions of the Sacks-Ylvisaker conditions. The conditions are also stated in the Appendix for completeness. Roughly speaking, a covariance operator $C$ is said to satisfy the Sacks-Ylvisaker conditions of order 0 if it is twice differentiable when $s \ne t$ but not differentiable when $s = t$. A covariance operator $C$ satisfies the Sacks-Ylvisaker conditions of order $r$ for an integer $r > 0$ if $\partial^{2r}C(s,t)/(\partial s^r\,\partial t^r)$ satisfies the Sacks-Ylvisaker conditions of order 0. In this paper, we say a covariance operator $C$ satisfies the Sacks-Ylvisaker conditions if $C$ satisfies the Sacks-Ylvisaker conditions of order $r$ for some $r \ge 0$. Various examples of covariance functions are known to satisfy Sacks-Ylvisaker conditions. For example, the Ornstein-Uhlenbeck covariance function $C(s,t) = \exp(-|s-t|)$ satisfies the Sacks-Ylvisaker conditions of order 0. Ritter, Wasilkowski and Woźniakowski (1995) recently showed that covariance functions satisfying the Sacks-Ylvisaker conditions are also intimately related to Sobolev spaces, a fact that is useful for the purpose of simultaneously diagonalizing $K$ and $C$ as we shall see later.

Note that the two sets of eigenfunctions $\{\psi_1,\psi_2,\ldots\}$ and $\{\phi_1,\phi_2,\ldots\}$ may differ from each other. The two kernels $K$ and $C$ can, however, be simultaneously diagonalized. To avoid ambiguity, we shall assume in what follows that $Cf \ne 0$ for any $f \in \mathcal{H}_0$ and $f \ne 0$. When using the squared error loss, this is also a necessary condition to ensure that $E\ell_n(\eta)$ is uniquely minimized even if $\beta$ is known to come from the finite-dimensional space $\mathcal{H}_0$. Under this assumption, we can define a norm $\|\cdot\|_R$ in $\mathcal{H}$ by

$$\|f\|_R^2 = \langle Cf, f\rangle_{L_2} + J(f) = \int_{\mathcal{T}}\!\int_{\mathcal{T}} f(s)C(s,t)f(t)\,ds\,dt + J(f). \tag{30}$$

Note that $\|\cdot\|_R$ is a norm because $\|f\|_R^2$ defined above is a quadratic form and is zero if and only if $f = 0$.

The following proposition shows that when this condition holds, $\|\cdot\|_R$ is well defined on $\mathcal{H}$ and equivalent to its original norm, $\|\cdot\|_{\mathcal{H}}$, in that there exist constants $0 < c_1 < c_2 < \infty$ such that $c_1\|f\|_R \le \|f\|_{\mathcal{H}} \le c_2\|f\|_R$ for all $f \in \mathcal{H}$.
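In particular, $\|f\|_R < \infty$ if and only if $\|f\|_{\mathcal{H}} < \infty$.

Proposition 2. If $Cf \ne 0$ for any $f \in \mathcal{H}_0$ and $f \ne 0$, then $\|\cdot\|_R$ and $\|\cdot\|_{\mathcal{H}}$ are equivalent.

Let $R$ be the reproducing kernel associated with $\|\cdot\|_R$. Recall that $R$ can also be viewed as a positive operator. Denote by $\{(\rho_k',\psi_k'): k \ge 1\}$ the eigenvalues and eigenfunctions of $R$. Then $R$ is a linear map from $L_2$ to $L_2$ such that

$$R\psi_k' = \int_{\mathcal{T}} R(\cdot,t)\psi_k'(t)\,dt = \rho_k'\psi_k', \qquad k = 1,2,\ldots \tag{31}$$

The square root of the positive definite operator can therefore be given as the linear map from $L_2$ to $L_2$ such that

$$R^{1/2}\psi_k' = (\rho_k')^{1/2}\psi_k', \qquad k = 1,2,\ldots \tag{32}$$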
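As a numerical aside on the eigenvalue decay discussed earlier in this section, the claim that a covariance satisfying the Sacks-Ylvisaker conditions of order 0 has $\mu_k \asymp k^{-2}$ can be checked approximately for the Ornstein-Uhlenbeck covariance $C(s,t) = \exp(-|s-t|)$. The sketch below assumes a midpoint-rule (Nyström) discretization on $[0,1]$; the function name `covariance_eigenvalues`, the grid size and the range of $k$ used for the fit are illustrative choices, not part of the paper.

```python
import numpy as np

def covariance_eigenvalues(cov, n_grid=500):
    """Approximate eigenvalues of the integral operator with kernel cov on [0, 1].

    Assumption: midpoint quadrature with n_grid equally spaced nodes.
    """
    t = (np.arange(n_grid) + 0.5) / n_grid       # midpoint nodes
    C = cov(t[:, None], t[None, :]) / n_grid     # kernel matrix times quadrature weight
    return np.sort(np.linalg.eigvalsh(C))[::-1]  # eigenvalues in decreasing order

# Ornstein-Uhlenbeck covariance, which satisfies the order-0 conditions
ou = lambda s, t: np.exp(-np.abs(s - t))
mu = covariance_eigenvalues(ou)

# Least-squares slope of log(mu_k) against log(k) over a moderate range of k
k = np.arange(10, 41)
slope = np.polyfit(np.log(k), np.log(mu[k - 1]), 1)[0]
```

Under these assumptions the fitted `slope` should come out close to $-2$, in line with $\mu_k \asymp k^{-2(s+1)}$ for $s = 0$; the fit is only a finite-range diagnostic and not a proof of the asymptotic rate.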

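Let $\nu_1 \ge \nu_2 \ge \cdots$ be the eigenvalues of the bounded linear operator $R^{1/2}CR^{1/2}$ and $\{\zeta_k: k = 1,2,\ldots\}$ be the corresponding orthogonal eigenfunctions in $L_2$. Write $\omega_k = \nu_k^{-1/2}R^{1/2}\zeta_k$, $k = 1,2,\ldots$ Also let $\langle\cdot,\cdot\rangle_R$ be the inner product associated with $\|\cdot\|_R$, that is, for any $f, g \in \mathcal{H}$,

$$\langle f, g\rangle_R = \tfrac{1}{4}\big(\|f+g\|_R^2 - \|f-g\|_R^2\big). \tag{33}$$

It is not hard to see that

$$\langle\omega_j,\omega_k\rangle_R = \nu_j^{-1/2}\nu_k^{-1/2}\langle R^{1/2}\zeta_j, R^{1/2}\zeta_k\rangle_R = \nu_k^{-1}\langle\zeta_j,\zeta_k\rangle_{L_2} = \nu_k^{-1}\delta_{jk}, \tag{34}$$

and

$$\langle C^{1/2}\omega_j, C^{1/2}\omega_k\rangle_{L_2} = \nu_j^{-1/2}\nu_k^{-1/2}\langle C^{1/2}R^{1/2}\zeta_j, C^{1/2}R^{1/2}\zeta_k\rangle_{L_2} = \nu_j^{-1/2}\nu_k^{-1/2}\langle R^{1/2}CR^{1/2}\zeta_j, \zeta_k\rangle_{L_2} = \delta_{jk}.$$

The following theorem shows that the quadratic forms $\|f\|_R^2 = \langle f,f\rangle_R$ and $\langle Cf,f\rangle_{L_2}$ can be simultaneously diagonalized on the basis of $\{\omega_k: k \ge 1\}$.

Theorem 3. For any $f \in \mathcal{H}$,

$$f = \sum_{k=1}^{\infty} f_k\omega_k \tag{35}$$

in the absolute sense, where $f_k = \nu_k\langle f,\omega_k\rangle_R$. Furthermore, if $\gamma_k = (\nu_k^{-1} - 1)^{-1}$, then

$$\langle f,f\rangle_R = \sum_{k=1}^{\infty} (1 + \gamma_k^{-1})f_k^2 \quad\text{and}\quad \langle Cf,f\rangle_{L_2} = \sum_{k=1}^{\infty} f_k^2. \tag{36}$$

Consequently,

$$J(f) = \langle f,f\rangle_R - \langle Cf,f\rangle_{L_2} = \sum_{k=1}^{\infty} \gamma_k^{-1}f_k^2. \tag{37}$$

Note that $\{(\gamma_k,\omega_k): k \ge 1\}$ can be determined jointly by $\{(\rho_k,\psi_k): k \ge 1\}$ and $\{(\mu_k,\phi_k): k \ge 1\}$. However, in general, neither $\gamma_k$ nor $\omega_k$ can be given in explicit form of $\{(\rho_k,\psi_k): k \ge 1\}$ and $\{(\mu_k,\phi_k): k \ge 1\}$. One notable exception is the case when the operators $C$ and $K$ commute. In particular, the setting $\psi_k = \phi_k$, $k = 1,2,\ldots$, is commonly adopted when studying FPCA-based approaches [see, e.g., Cai and Hall (2006); Hall and Horowitz (2007)].

Proposition 4. Assume that $\psi_k = \phi_k$, $k = 1,2,\ldots$; then $\gamma_k = \rho_k\mu_k$ and $\omega_k = \mu_k^{-1/2}\psi_k$.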
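The construction behind Theorem 3 can be illustrated by a finite-dimensional analogue, in which functions become vectors, $C$ becomes a symmetric positive definite matrix and the penalty $J(f)$ becomes a quadratic form $f'Pf$. This is only a sketch under those assumptions: the matrix `P`, the requirement that it be positive definite (so that every $\gamma_k$ is finite), and the function name are all hypothetical, and the Euclidean inner product stands in for $L_2$.

```python
import numpy as np

def simultaneous_diagonalization(C, P):
    """Finite-dimensional analogue of Theorem 3.

    C: symmetric positive definite matrix (role of the covariance operator).
    P: symmetric positive definite matrix (role of the penalty J), so that
       ||f||_R^2 = f'Cf + f'Pf.
    Returns (nu, gamma, Omega); column k of Omega plays the role of omega_k.
    """
    G = C + P                                  # Gram matrix of the R-norm
    evals, evecs = np.linalg.eigh(G)
    # The reproducing kernel of the R-norm is R = G^{-1}; take its symmetric root.
    R_half = evecs @ np.diag(evals ** -0.5) @ evecs.T
    nu, zeta = np.linalg.eigh(R_half @ C @ R_half)
    order = np.argsort(nu)[::-1]               # nu_1 >= nu_2 >= ...
    nu, zeta = nu[order], zeta[:, order]
    Omega = (R_half @ zeta) / np.sqrt(nu)      # omega_k = nu_k^{-1/2} R^{1/2} zeta_k
    gamma = 1.0 / (1.0 / nu - 1.0)             # gamma_k = (nu_k^{-1} - 1)^{-1}
    return nu, gamma, Omega
```

With this output, `Omega.T @ C @ Omega` should be the identity and `Omega.T @ P @ Omega` should be diagonal with entries $\gamma_k^{-1}$, mirroring (36) and (37) coordinate by coordinate.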

In general, when $\psi_k$ and $\phi_k$ differ, such a relationship no longer holds. The following theorem reveals that similar asymptotic behavior of $\gamma_k$ can still be expected in many practical settings.

Theorem 5. Consider the one-dimensional case when $\mathcal{T} = [0,1]$. If $\mathcal{H}$ is the Sobolev space $W_2^m([0,1])$ endowed with norm (6), and $C$ satisfies the Sacks-Ylvisaker conditions, then $\gamma_k \asymp \mu_k\rho_k$.

Theorem 5 shows that under fairly general conditions $\gamma_k \asymp \mu_k\rho_k$. In this case, there is little difference between the general situation and the special case when $K$ and $C$ share a common set of eigenfunctions when working with the system $\{(\gamma_k,\omega_k),\ k = 1,2,\ldots\}$. This observation is crucial for our theoretical development in the next section.

4. Convergence rates. We now turn to the asymptotic properties of the smoothness regularized estimators. To fix ideas, in what follows, we shall focus on the squared error loss. Recall that in this case

$$(\hat\alpha_{n\lambda}, \hat\beta_{n\lambda}) = \arg\min_{\alpha \in \mathbb{R},\, \beta \in \mathcal{H}} \left\{\frac{1}{n}\sum_{i=1}^{n}\Big[y_i - \Big(\alpha + \int_{\mathcal{T}} x_i(t)\beta(t)\,dt\Big)\Big]^2 + \lambda J(\beta)\right\}. \tag{38}$$

As shown before, the slope function can be equivalently defined as

$$\hat\beta_{n\lambda} = \arg\min_{\beta \in \mathcal{H}} \left\{\frac{1}{n}\sum_{i=1}^{n}\Big[(y_i - \bar y) - \int_{\mathcal{T}} (x_i(t) - \bar x(t))\beta(t)\,dt\Big]^2 + \lambda J(\beta)\right\}, \tag{39}$$

and once $\hat\beta_{n\lambda}$ is computed, $\hat\alpha_{n\lambda}$ is given by

$$\hat\alpha_{n\lambda} = \bar y - \int_{\mathcal{T}} \bar x(t)\hat\beta_{n\lambda}(t)\,dt. \tag{40}$$

In light of this fact, we shall focus our attention on $\hat\beta_{n\lambda}$ in the following discussion for brevity. We shall also assume that the eigenvalues of the reproducing kernel $K$ satisfy $\rho_k \asymp k^{-2r}$ for some $r > 1/2$. Let $\mathcal{F}(s,M,K)$ be the collection of the distributions $F$ of the process $X$ that satisfy the following conditions:

(a) The eigenvalues $\mu_k$ of its covariance operator $C(\cdot,\cdot)$ satisfy $\mu_k \asymp k^{-2s}$ for some $s > 1/2$.

(b) For any function $f \in L_2(\mathcal{T})$,

$$E\Big(\int_{\mathcal{T}} f(t)[X(t) - E(X)(t)]\,dt\Big)^4 \le M\Big[E\Big(\int_{\mathcal{T}} f(t)[X(t) - E(X)(t)]\,dt\Big)^2\Big]^2. \tag{41}$$

(c) When simultaneously diagonalizing $K$ and $C$, $\gamma_k \asymp \rho_k\mu_k$, where $\nu_k = (1 + \gamma_k^{-1})^{-1}$ is the $k$th largest eigenvalue of $R^{1/2}CR^{1/2}$ and $R$ is the reproducing kernel associated with $\|\cdot\|_R$ defined by (30).

The first condition specifies the smoothness of the sample path of $X(\cdot)$. The second condition concerns the fourth moment of a linear functional of $X(\cdot)$. This condition is satisfied with $M = 3$ for a Gaussian process because $\int f(t)X(t)\,dt$ is normally distributed. In the light of Theorem 5, the last condition is satisfied by any covariance function that satisfies the Sacks-Ylvisaker conditions if $\mathcal{H}$ is taken to be $W_2^m$ with norm (6). It is also trivially satisfied if the eigenfunctions of the covariance operator $C$ coincide with those of $K$.

4.1. Optimal rates of convergence. We are now ready to state our main results on the optimal rates of convergence, which are given in terms of a class of intermediate norms between $\|f\|_K$ and

$$\Big(\int_{\mathcal{T}}\!\int_{\mathcal{T}} f(s)C(s,t)f(t)\,ds\,dt\Big)^{1/2}, \tag{42}$$

which enables a unified treatment of both the prediction and estimation problems. For $0 \le a \le 1$ define the norm $\|\cdot\|_a$ by

$$\|f\|_a^2 = \sum_{k=1}^{\infty} (1 + \gamma_k^{-a})f_k^2, \tag{43}$$

where $f_k = \nu_k\langle f,\omega_k\rangle_R$ as shown in Theorem 3. Clearly $\|f\|_0$ reduces to $\langle Cf,f\rangle_{L_2}$ whereas $\|f\|_1 = \|f\|_R$. The convergence rate results given below are valid for all $0 \le a \le 1$. They cover a range of interesting cases including the prediction error and estimation error.

The following result gives the optimal rate of convergence for the regularized estimator $\hat\beta_{n\lambda}$ with an appropriately chosen tuning parameter $\lambda$ under the loss $\|\cdot\|_a$.

Theorem 6. Assume that $E(\varepsilon_i) = 0$ and $\mathrm{Var}(\varepsilon_i) \le M_2$. Suppose the eigenvalues $\rho_k$ of the reproducing kernel $K$ of the RKHS $\mathcal{H}$ satisfy $\rho_k \asymp k^{-2r}$ for some $r > 1/2$. Then the regularized estimator $\hat\beta_{n\lambda}$ with

$$\lambda \asymp n^{-2(r+s)/(2(r+s)+1)} \tag{44}$$

satisfies

$$\lim_{D \to \infty}\ \limsup_{n \to \infty}\ \sup_{F \in \mathcal{F}(s,M,K),\, \beta_0 \in \mathcal{H}} P\big(\|\hat\beta_{n\lambda} - \beta_0\|_a^2 > D\,n^{-2(1-a)(r+s)/(2(r+s)+1)}\big) = 0. \tag{45}$$

Note that the rate of the optimal choice of $\lambda$ does not depend on $a$. Theorem 6 shows that the optimal rate of convergence for the regularized

estimator $\hat\beta_{n\lambda}$ is $n^{-2(1-a)(r+s)/(2(r+s)+1)}$. The following lower bound result demonstrates that this rate of convergence is indeed optimal among all estimators, and consequently the upper bound in equation (45) cannot be improved. Denote by $\mathcal{B}$ the collection of all measurable functions of the observations $(X_1,Y_1),\ldots,(X_n,Y_n)$.

Theorem 7. Under the assumptions of Theorem 6, there exists a constant $d > 0$ such that

$$\lim_{n \to \infty}\ \inf_{\tilde\beta \in \mathcal{B}}\ \sup_{F \in \mathcal{F}(s,M,K),\, \beta_0 \in \mathcal{H}} P\big(\|\tilde\beta - \beta_0\|_a^2 > d\,n^{-2(1-a)(r+s)/(2(r+s)+1)}\big) > 0. \tag{46}$$

Consequently, the regularized estimator $\hat\beta_{n\lambda}$ with $\lambda \asymp n^{-2(r+s)/(2(r+s)+1)}$ is rate optimal.

The results, given in terms of $\|\cdot\|_a$, provide a wide range of measures of the quality of an estimate for $\beta_0$. Observe that

$$\|\tilde\beta - \beta_0\|_0^2 = E_{X^*}\Big(\int \tilde\beta(t)X^*(t)\,dt - \int \beta_0(t)X^*(t)\,dt\Big)^2, \tag{47}$$

where $X^*$ is an independent copy of $X$, and the expectation on the right-hand side is taken over $X^*$. The right-hand side is often referred to as the prediction error in regression. It measures the mean squared prediction error for a random future observation on $X$. From Theorems 6 and 7, we have the following corollary.

Corollary 8. Under the assumptions of Theorem 6, the mean squared optimal prediction error of a slope function estimator over $F \in \mathcal{F}(s,M,K)$ and $\beta_0 \in \mathcal{H}$ is of the order $n^{-2(r+s)/(2(r+s)+1)}$ and it can be achieved by the regularized estimator $\hat\beta_{n\lambda}$ with $\lambda$ satisfying (44).

The result shows that the faster the eigenvalues of the covariance operator $C$ for $X(\cdot)$ decay, the smaller the prediction error.

When $\psi_k = \phi_k$, the prediction error of a slope function estimator $\tilde\beta$ can also be understood as the squared prediction error for a fixed predictor $x^*(\cdot)$ such that $|\langle x^*,\phi_k\rangle_{L_2}| \asymp k^{-s}$, following the discussion in the last section. A similar prediction problem has also been considered by Cai and Hall (2006) for FPCA-based approaches.
In particular, they established a similar minimax lower bound and showed that the lower bound can be achieved by the FPCA-based approach, but with additional assumptions that $\mu_k - \mu_{k+1} \ge C_0^{-1}k^{-2s-1}$, and $2r > 4s + 3$. Our results here indicate that both restrictions are unnecessary for establishing the minimax rate for the prediction error. Moreover, in contrast to the FPCA-based approach, the regularized estimator $\hat\beta_{n\lambda}$ can achieve the optimal rate without the extra requirements.

To illustrate the generality of our results, we consider an example where $\mathcal{T} = [0,1]$, $\mathcal{H} = W_2^m([0,1])$ and the stochastic process $X(\cdot)$ is a Wiener process. It is not hard to see that the covariance operator of $X$, $C(s,t) = \min\{s,t\}$, satisfies the Sacks-Ylvisaker conditions of order 0 and therefore $\mu_k \asymp k^{-2}$. By Corollary 8, the minimax rate of the prediction error in estimating $\beta_0$ is $n^{-(2m+2)/(2m+3)}$. Note that the condition $2r > 4s + 3$ required by Cai and Hall (2006) does not hold here for $m \le 7/2$.

4.2. The special case of $\phi_k = \psi_k$. It is of interest to further look into the case when the operators $C$ and $K$ share a common set of eigenfunctions. As discussed in the last section, we have in this case $\phi_k = \psi_k$ and $\gamma_k \asymp k^{-2(r+s)}$ for all $k \ge 1$. In this context, Theorems 6 and 7 provide bounds for more general prediction problems. Consider estimating $\int x^*\beta_0$ where $x^*$ satisfies $|\langle x^*,\phi_k\rangle_{L_2}| \asymp k^{-s+q}$ for some $0 < q < s - 1/2$. Note that $q < s - 1/2$ is needed to ensure that $x^*$ is square integrable. The squared prediction error

$$\Big(\int \tilde\beta(t)x^*(t)\,dt - \int \beta_0(t)x^*(t)\,dt\Big)^2 \tag{48}$$

is therefore equivalent to $\|\tilde\beta - \beta_0\|_{(s-q)/(r+s)}^2$. The following result is a direct consequence of Theorems 6 and 7.

Corollary 9. Suppose $x^*$ is a function satisfying $|\langle x^*,\phi_k\rangle_{L_2}| \asymp k^{-s+q}$ for some $0 < q < s - 1/2$. Then under the assumptions of Theorem 6,

$$\lim_{n \to \infty}\ \inf_{\tilde\beta \in \mathcal{B}}\ \sup_{F \in \mathcal{F}(s,M,K),\, \beta_0 \in \mathcal{H}} P\Big\{\Big(\int \tilde\beta(t)x^*(t)\,dt - \int \beta_0(t)x^*(t)\,dt\Big)^2 > d\,n^{-2(r+q)/(2(r+s)+1)}\Big\} > 0 \tag{49}$$

for some constant $d > 0$, and the regularized estimator $\hat\beta_{n\lambda}$ with $\lambda$ satisfying (44) achieves the optimal rate of convergence under the prediction error (48).

It is also evident that when $\psi_k = \phi_k$, $\|\cdot\|_{s/(r+s)}$ is equivalent to $\|\cdot\|_{L_2}$. Therefore, Theorems 6 and 7 imply the following result.

Corollary 10. If $\phi_k = \psi_k$ for all $k \ge 1$, then under the assumptions of Theorem 6

$$\lim_{n \to \infty}\ \inf_{\tilde\beta \in \mathcal{B}}\ \sup_{F \in \mathcal{F}(s,M,K),\, \beta_0 \in \mathcal{H}} P\big(\|\tilde\beta - \beta_0\|_{L_2}^2 > d\,n^{-2r/(2(r+s)+1)}\big) > 0 \tag{50}$$

for some constant $d > 0$, and the regularized estimate $\hat\beta_{n\lambda}$ with $\lambda$ satisfying (44) achieves the optimal rate.
This result demonstrates that the faster the eigenvalues of the covariance operator for $X(\cdot)$ decay, the larger the estimation error. The behavior of the estimation error thus differs significantly from that of the prediction error.
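The opposite effects of $s$ on prediction and estimation can be made concrete by tabulating the exponents appearing in Theorems 6 and 7 and Corollaries 8 and 10. The helper names below are hypothetical and the code simply evaluates those formulas; the $L_2$ estimation exponent applies only in the special case $\phi_k = \psi_k$ of Corollary 10.

```python
def rate_exponent(r, s, a):
    """Exponent e such that the minimax rate under ||.||_a^2 is n^(-e).

    Implements 2(1-a)(r+s) / (2(r+s)+1) from Theorems 6 and 7.
    """
    if not (r > 0.5 and s > 0.5 and 0.0 <= a <= 1.0):
        raise ValueError("requires r > 1/2, s > 1/2 and 0 <= a <= 1")
    return 2.0 * (1.0 - a) * (r + s) / (2.0 * (r + s) + 1.0)

def prediction_exponent(r, s):
    """Corollary 8: a = 0 gives 2(r+s) / (2(r+s)+1)."""
    return rate_exponent(r, s, 0.0)

def estimation_exponent(r, s):
    """Corollary 10 (phi_k = psi_k): a = s/(r+s) gives 2r / (2(r+s)+1)."""
    return rate_exponent(r, s, s / (r + s))

def tuning_exponent(r, s):
    """Exponent of n in the choice of lambda in (44); it does not depend on a."""
    return 2.0 * (r + s) / (2.0 * (r + s) + 1.0)

# Example: r = 2 (e.g. H = W_2^2) with s increasing from 1 to 2
for s in (1.0, 1.5, 2.0):
    print(s, prediction_exponent(2.0, s), estimation_exponent(2.0, s))
```

For fixed $r$, the prediction exponent increases with $s$ while the estimation exponent decreases, which is the contrast described in the text.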

Similar results on the lower bound have recently been obtained by Hall and Horowitz (2007), who considered estimating $\beta_0$ under the assumption that $|\langle\beta_0,\phi_k\rangle_{L_2}|$ decays in a polynomial order. Note that this slightly differs from our setting where $\beta_0 \in \mathcal{H}$ means that

$$\sum_{k=1}^{\infty} \rho_k^{-1}\langle\beta_0,\psi_k\rangle_{L_2}^2 = \sum_{k=1}^{\infty} \rho_k^{-1}\langle\beta_0,\phi_k\rangle_{L_2}^2 < \infty. \tag{51}$$

Recall that $\rho_k \asymp k^{-2r}$. Condition (51) is comparable to, and slightly stronger than,

$$|\langle\beta_0,\phi_k\rangle_{L_2}| \le M_0 k^{-r-1/2} \tag{52}$$

for some constant $M_0 > 0$. When further assuming that $2s + 1 < 2r$, and $\mu_k - \mu_{k+1} \ge M_0^{-1}k^{-2s-1}$ for all $k \ge 1$, Hall and Horowitz (2007) obtain the same lower bound as ours. However, we do not require that $2s + 1 < 2r$, which in essence states that $\beta_0$ is smoother than the sample path of $X$. Perhaps, more importantly, we do not require the spacing condition $\mu_k - \mu_{k+1} \ge M_0^{-1}k^{-2s-1}$ on the eigenvalues because we do not need to estimate the corresponding eigenfunctions. Such a condition is impossible to verify even for a standard RKHS.

4.3. Estimating derivatives. Theorems 6 and 7 can also be used for estimating the derivatives of $\beta_0$. A natural estimator of the $q$th derivative of $\beta_0$, $\beta_0^{(q)}$, is $\hat\beta_{n\lambda}^{(q)}$, the $q$th derivative of $\hat\beta_{n\lambda}$. In addition to $\phi_k = \psi_k$, assume that $\|\psi_k^{(q)}/\psi_k\|_\infty \asymp k^q$. This clearly holds when $\mathcal{H} = W_2^m$. In this case

$$\|\tilde\beta^{(q)} - \beta_0^{(q)}\|_{L_2} \le C_0\|\tilde\beta - \beta_0\|_{(s+q)/(r+s)}. \tag{53}$$

The following is then a direct consequence of Theorems 6 and 7.

Corollary 11. Assume that $\phi_k = \psi_k$ and $\|\psi_k^{(q)}/\psi_k\|_\infty \asymp k^q$ for all $k \ge 1$. Then under the assumptions of Theorem 6, for some constant $d > 0$,

$$\lim_{n \to \infty}\ \inf_{\tilde\beta^{(q)} \in \mathcal{B}}\ \sup_{F \in \mathcal{F}(s,M,K),\, \beta_0 \in \mathcal{H}} P\big(\|\tilde\beta^{(q)} - \beta_0^{(q)}\|_{L_2}^2 > d\,n^{-2(r-q)/(2(r+s)+1)}\big) > 0, \tag{54}$$

and the regularized estimate $\hat\beta_{n\lambda}$ with $\lambda$ satisfying (44) achieves the optimal rate.

Finally, we note that although we have focused on the squared error loss here, the method of regularization can be easily extended to handle other goodness of fit measures as well as the generalized functional linear regression [Cardot and Sarda (2005) and Müller and Stadtmüller (2005)]. We shall leave these extensions for future studies.

5. Numerical results. The Representer Theorem given in Section 2 makes the regularized estimators easy to implement. Similarly to smoothness regularized estimators in other contexts [see, e.g., Wahba (1990)], $\hat\eta_{n\lambda}$ and $\hat\beta_{n\lambda}$ can be expressed as a linear combination of a finite number of known basis functions although the minimization in (3) is taken over an infinite-dimensional space. Existing algorithms for smoothing splines can thus be used to compute our regularized estimators $\hat\eta_{n\lambda}$, $\hat\beta_{n\lambda}$ and $\hat\alpha_{n\lambda}$.

To demonstrate the merits of the proposed estimators in finite sample settings, we carried out a set of simulation studies. We adopt the simulation setting of Hall and Horowitz (2007) where $\mathcal{T} = [0,1]$. The true slope function $\beta_0$ is given by

$$\beta_0 = \sum_{k=1}^{50} 4(-1)^{k+1}k^{-2}\phi_k, \tag{55}$$

where $\phi_1(t) = 1$ and $\phi_{k+1}(t) = \sqrt{2}\cos(k\pi t)$ for $k \ge 1$. The random function $X$ was generated as

$$X = \sum_{k=1}^{50} \zeta_k Z_k\phi_k, \tag{56}$$

where $Z_k$ are independently sampled from the uniform distribution on $[-\sqrt{3},\sqrt{3}]$ and $\zeta_k$ are deterministic. It is not hard to see that $\zeta_k^2$ are the eigenvalues of the covariance function of $X$. Following Hall and Horowitz (2007), two sets of $\zeta_k$ were used. In the first set, the eigenvalues are well spaced: $\zeta_k = (-1)^{k+1}k^{-\nu/2}$ with $\nu = 1.1, 1.5, 2$ or $4$. In the second set,

$$\zeta_k = \begin{cases} 1, & k = 1,\\ 0.2(-1)^{k+1}(\,\cdots), & 2 \le k \le 4,\\ 0.2(-1)^{k+1}[(5\lfloor k/5\rfloor)^{-\nu/2} \cdots (k \bmod 5)], & k \ge 5. \end{cases} \tag{57}$$

As in Hall and Horowitz (2007), regression models with $\varepsilon \sim N(0,\sigma^2)$ where $\sigma = 0.5$ and $1$ were considered. To comprehend the effect of sample size, we consider $n = 50, 100, 200$ and $500$. We apply the regularization method to each simulated dataset and examine its estimation accuracy as measured by integrated squared error $\|\hat\beta_{n\lambda} - \beta_0\|_{L_2}^2$ and prediction error $\|\hat\beta_{n\lambda} - \beta_0\|_0^2$. For the purpose of illustration, we take $\mathcal{H} = W_2^2$ and $J(\beta) = \int (\beta'')^2$, for which the detailed estimation procedure is given in Section 2. For each setting, the experiment was repeated 1000 times.
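The data-generating mechanism of (55) and (56) with the well-spaced set of $\zeta_k$ can be sketched as follows. This is an illustrative sketch rather than the authors' simulation code: the intercept is assumed to be zero, the curves are evaluated on a user-supplied grid, the responses use the exact integral $\int X\beta_0$ obtained from the orthonormality of the $\phi_k$, and the function names `cosine_basis` and `simulate` are hypothetical.

```python
import numpy as np

def cosine_basis(grid, n_terms=50):
    """phi_1(t) = 1 and phi_{k+1}(t) = sqrt(2) cos(k pi t); shape (n_terms, len(grid))."""
    k = np.arange(1, n_terms, dtype=float)
    phi = np.sqrt(2.0) * np.cos(np.pi * k[:, None] * grid[None, :])
    return np.vstack([np.ones_like(grid), phi])

def simulate(n, nu, sigma, grid, rng, n_terms=50):
    """Draw (x, y) from model (1) with beta_0 as in (55) and X as in (56).

    Assumptions: alpha_0 = 0 and the well-spaced choice
    zeta_k = (-1)^(k+1) k^(-nu/2).
    """
    k = np.arange(1, n_terms + 1, dtype=float)
    sign = (-1.0) ** (k + 1)
    phi = cosine_basis(grid, n_terms)
    b = 4.0 * sign / k**2                         # coefficients of beta_0 in (55)
    beta0 = b @ phi
    zeta = sign * k ** (-nu / 2.0)
    Z = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(n, n_terms))
    scores = Z * zeta                             # coefficients of X in (56)
    x = scores @ phi                              # curves evaluated on the grid
    signal = scores @ b                           # exact integral of X * beta_0
    y = signal + sigma * rng.normal(size=n)
    return x, y, beta0

rng = np.random.default_rng(2010)
grid = np.linspace(0.0, 1.0, 201)
x, y, beta0 = simulate(n=100, nu=1.5, sigma=0.5, grid=grid, rng=rng)
```

Because the $Z_k$ are uniform on $[-\sqrt{3},\sqrt{3}]$ they have unit variance, so the $k$th score has variance $\zeta_k^2$, which matches the statement that $\zeta_k^2$ are the eigenvalues of the covariance function of $X$.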
As is common in most smoothing methods, the choice of the tuning parameter plays an important role in the performance of the regularized estimators. Data-driven choice of the tuning parameter is a difficult problem. Here we apply the commonly used practical strategy of empirically choosing the value of $\lambda$ through the generalized cross validation. Note that the regularized estimator is a linear estimator in that $\hat y = H(\lambda)y$ where $\hat y = (\hat\eta_{n\lambda}(x_1),\ldots,\hat\eta_{n\lambda}(x_n))'$ and $H(\lambda)$ is the so-called hat matrix depending

on $\lambda$. We then select the tuning parameter $\lambda$ that minimizes

$$\mathrm{GCV}(\lambda) = \frac{(1/n)\|\hat y - y\|_{\ell_2}^2}{(1 - \mathrm{tr}(H(\lambda))/n)^2}. \tag{58}$$

Denote by $\hat\lambda^{\mathrm{GCV}}$ the resulting choice of the tuning parameter.

Fig. 1. Prediction errors of the regularized estimator ($\sigma = 0.5$): $X$ was simulated with a covariance function with well-spaced eigenvalues. The results are averaged over 1000 runs. Black solid lines, red dashed lines, green dotted lines and blue dash-dotted lines correspond to $\nu = 1.1$, $1.5$, $2$ and $4$, respectively. Both axes are in log scale.

We begin with the setting of well-spaced eigenvalues. The left panel of Figure 1 shows the prediction error, $\|\hat\beta_{n\lambda} - \beta_0\|_0^2$, for each combination of $\nu$ value and sample size when $\sigma = 0.5$. The results were averaged over 1000 simulation runs in each setting. Both axes are given in the log scale. The plot suggests that the prediction error converges at a polynomial rate as sample size $n$ increases, which agrees with our theoretical results from the previous section. Furthermore, one can observe that with the same sample size, the prediction error tends to be smaller for larger $\nu$. This also confirms our theoretical development which indicates that the faster the eigenvalues of the covariance operator for $X(\cdot)$ decay, the smaller the prediction error.
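The generalized cross validation criterion (58) can be sketched for the estimator of Section 2 as follows. The sketch assumes centered responses and takes the matrices $\Sigma$ and $T$ of (19) and (20) as given; it uses the identity $y - \hat y = n\lambda c$, which follows from $Wc = y - Td$ with $W = \Sigma + n\lambda I$, so that $H(\lambda) = I - n\lambda M$ with $c = My$. The function names and the grid of candidate values are hypothetical choices, not the authors' code.

```python
import numpy as np

def hat_matrix(Sigma, T, lam):
    """Hat matrix H(lam) with y_hat = T d + Sigma c = H(lam) y."""
    n = Sigma.shape[0]
    W = Sigma + n * lam * np.eye(n)
    Winv = np.linalg.inv(W)
    A = Winv @ T                                   # W^{-1} T
    M = Winv - A @ np.linalg.solve(T.T @ A, A.T)   # c = M y
    return np.eye(n) - n * lam * M                 # since y - y_hat = n * lam * c

def gcv(Sigma, T, y, lam):
    """GCV score (58) for a given tuning parameter."""
    n = y.size
    H = hat_matrix(Sigma, T, lam)
    resid = H @ y - y
    return (resid @ resid / n) / (1.0 - np.trace(H) / n) ** 2

def select_lambda(Sigma, T, y, lambdas):
    """Return the candidate minimizing GCV together with all scores."""
    scores = np.array([gcv(Sigma, T, y, lam) for lam in lambdas])
    return lambdas[int(np.argmin(scores))], scores
```

A typical call would pass a logarithmic grid such as `np.logspace(-8, 0, 30)` as `lambdas`; how fine the grid should be is a practical choice that the text does not specify.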

To better understand the performance of the smoothness regularized estimator and the GCV choice of the tuning parameter, we also recorded the performance of an oracle estimator whose tuning parameter is chosen to minimize the prediction error. This choice of the tuning parameter ensures the optimal performance of the regularized estimator. It is, however, noteworthy that this is not a legitimate statistical estimator since it depends on the knowledge of the unknown slope function $\beta_0$. The right panel of Figure 1 shows the prediction error associated with this choice of tuning parameter. It behaves similarly to the estimate with $\lambda$ chosen by GCV. Note that the comparison between the two panels suggests that GCV generally leads to near-optimal performance.

Fig. 2. Estimation errors of the regularized estimator ($\sigma=0.5$): $X$ was simulated with a covariance function with well-spaced eigenvalues. The results are averaged over 1000 runs. Black solid lines, red dashed lines, green dotted lines and blue dash-dotted lines correspond to $\nu=1.1, 1.5, 2$ and $4$, respectively. Both axes are in log scale.

We now turn to the estimation error. Figure 2 shows the estimation errors, averaged over 1000 simulation runs, with $\lambda$ chosen by GCV or by minimizing the estimation error, for each combination of sample size and $\nu$ value. Similarly to the prediction error, the plots suggest a polynomial rate of convergence of the estimation error when the sample size increases, and GCV again leads to a near-optimal choice of the tuning parameter.

A comparison between Figures 1 and 2 suggests that when $X$ is smoother (larger $\nu$), prediction (as measured by the prediction error) is easier, but estimation (as measured by the estimation error) tends to be harder, which highlights the difference between prediction and estimation in functional linear regression. We also note that this observation is in agreement with our theoretical results from the previous section, where it is shown that the estimation error decreases at the rate of $n^{-2r/(2(r+s)+1)}$, which decelerates as $s$ increases, whereas the prediction error decreases at the rate of $n^{-2(r+s)/(2(r+s)+1)}$, which accelerates as $s$ increases.

Fig. 3. Estimation and prediction errors of the regularized estimator ($\sigma^2=1^2$): $X$ was simulated with a covariance function with well-spaced eigenvalues. The results are averaged over 1000 runs. Black solid lines, red dashed lines, green dotted lines and blue dash-dotted lines correspond to $\nu=1.1, 1.5, 2$ and $4$, respectively. Both axes are in log scale.

Figure 3 reports the prediction and estimation errors when tuned with GCV for the large noise ($\sigma=1$) setting. Observations similar to those for the small noise setting can also be made. Furthermore, notice that the prediction errors are much smaller than the estimation errors, which confirms our finding from the previous section that prediction is an easier problem in the context of functional linear regression.

The numerical results in the setting with closely spaced eigenvalues are qualitatively similar to those in the setting with well-spaced eigenvalues.
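The opposite effect of $s$ on the two rates can be checked with exact arithmetic. The short sketch below simply evaluates the two exponents quoted above; the function names are hypothetical and the values $r=2$, $s\in\{1,2\}$ are arbitrary illustrative choices.

```python
from fractions import Fraction

def estimation_exponent(r, s):
    """Exponent e in the estimation rate n^{-e}, with e = 2r / (2(r+s) + 1)."""
    return Fraction(2 * r, 2 * (r + s) + 1)

def prediction_exponent(r, s):
    """Exponent p in the prediction rate n^{-p}, with p = 2(r+s) / (2(r+s) + 1)."""
    return Fraction(2 * (r + s), 2 * (r + s) + 1)

# For fixed r = 2, increasing s from 1 to 2 slows estimation and speeds up prediction.
slower_estimation = estimation_exponent(2, 2) < estimation_exponent(2, 1)   # 4/9 < 4/7
faster_prediction = prediction_exponent(2, 2) > prediction_exponent(2, 1)   # 8/9 > 6/7
```

Note that the prediction exponent always exceeds the estimation exponent when $s>0$, in line with the observation that prediction is the easier problem.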

Fig. 4. Estimation and prediction errors of the regularized estimator: $X$ was simulated with a covariance function with closely spaced eigenvalues. The results are averaged over 1000 runs. Both axes are in log scale. Note that the y-axes are of different scales across panels.

Figure 4 summarizes the results obtained for the setting with closely spaced eigenvalues. We also note that the performance of the regularization estimate with $\lambda$ tuned with GCV compares favorably with those from Hall and Horowitz (2007) using FPCA-based methods, even though their results are obtained with an optimal rather than a data-driven choice of the tuning parameters.

6. Proofs.

6.1. Proof of Proposition 2. Observe that
\[ \iint_{\mathcal T\times\mathcal T} f(s)C(s,t)f(t)\,ds\,dt\le\mu_1\|f\|_{L_2}^2\le c_1\|f\|_{\mathcal H}^2 \tag{59} \]
for some constant $c_1>0$. Together with the fact that $J(f)\le\|f\|_{\mathcal H}^2$, we conclude that
\[ \|f\|_R^2=\iint_{\mathcal T\times\mathcal T} f(s)C(s,t)f(t)\,ds\,dt+J(f)\le(c_1+1)\|f\|_{\mathcal H}^2. \tag{60} \]

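Recall that $\xi_k$, $k=1,\ldots,N$, are the orthonormal basis of $\mathcal H_0$. Under the assumption of the proposition, the matrix $\Sigma=(\langle C\xi_j,\xi_k\rangle_{\mathcal H})_{1\le j,k\le N}$ is a positive definite matrix. Denote by $\mu'_1\ge\mu'_2\ge\cdots\ge\mu'_N>0$ its eigenvalues. It is clear that for any $f_0\in\mathcal H_0$,
\[ \|f_0\|_R^2\ge\mu'_N\|f_0\|_{\mathcal H}^2. \tag{61} \]
Note also that for any $f_1\in\mathcal H_1$,
\[ \|f_1\|_{\mathcal H}^2=J(f_1)\le\|f_1\|_R^2. \tag{62} \]
For any $f\in\mathcal H$, we can write $f:=f_0+f_1$ where $f_0\in\mathcal H_0$ and $f_1\in\mathcal H_1$. Then
\[ \|f\|_R^2=\iint_{\mathcal T\times\mathcal T} f(s)C(s,t)f(t)\,ds\,dt+\|f_1\|_{\mathcal H}^2. \tag{63} \]
Recall that
\[ \|f_1\|_{\mathcal H}^2\ge\rho_1^{-1}\|f_1\|_{L_2}^2\ge\rho_1^{-1}\mu_1^{-1}\iint_{\mathcal T\times\mathcal T} f_1(s)C(s,t)f_1(t)\,ds\,dt. \tag{64} \]
For brevity, assume that $\rho_1=\mu_1=1$ without loss of generality. By the Cauchy-Schwarz inequality,
\begin{align*}
\|f\|_R^2 &\ge \frac12\iint f(s)C(s,t)f(t)\,ds\,dt+\|f_1\|_{\mathcal H}^2\\
&\ge \frac12\iint f_0(s)C(s,t)f_0(t)\,ds\,dt+\frac32\iint f_1(s)C(s,t)f_1(t)\,ds\,dt\\
&\qquad -\Bigl(\iint f_0(s)C(s,t)f_0(t)\,ds\,dt\Bigr)^{1/2}\Bigl(\iint f_1(s)C(s,t)f_1(t)\,ds\,dt\Bigr)^{1/2}\\
&\ge \frac13\iint f_0(s)C(s,t)f_0(t)\,ds\,dt,
\end{align*}
where we used the fact that $3a^2/2-ab\ge -b^2/6$ in deriving the last inequality. Therefore,
\[ \frac{3}{\mu'_N}\|f\|_R^2\ge\|f_0\|_{\mathcal H}^2. \tag{65} \]
Together with the facts that $\|f\|_{\mathcal H}^2=\|f_0\|_{\mathcal H}^2+\|f_1\|_{\mathcal H}^2$ and
\[ \|f\|_R^2\ge J(f_1)=\|f_1\|_{\mathcal H}^2, \tag{66} \]
we conclude that
\[ \|f\|_R^2\ge(1+3/\mu'_N)^{-1}\|f\|_{\mathcal H}^2. \tag{67} \]
The proof is now complete.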

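6.2. Proof of Theorem 3. First note that
\begin{align*}
R^{-1/2}f &=\sum_{k=1}^\infty\langle R^{-1/2}f,\zeta_k\rangle_{L_2}\zeta_k
=\sum_{k=1}^\infty\langle R^{-1/2}f,\nu_k^{1/2}R^{-1/2}\omega_k\rangle_{L_2}\,\nu_k^{1/2}R^{-1/2}\omega_k\\
&=R^{-1/2}\Bigl(\sum_{k=1}^\infty\nu_k\langle R^{-1/2}f,R^{-1/2}\omega_k\rangle_{L_2}\,\omega_k\Bigr)
=R^{-1/2}\Bigl(\sum_{k=1}^\infty\nu_k\langle f,\omega_k\rangle_R\,\omega_k\Bigr).
\end{align*}
Applying the bounded positive definite operator $R^{1/2}$ to both sides leads to
\[ f=\sum_{k=1}^\infty\nu_k\langle f,\omega_k\rangle_R\,\omega_k. \tag{68} \]
Recall that $\langle\omega_k,\omega_j\rangle_R=\nu_k^{-1}\delta_{kj}$. Therefore,
\begin{align*}
\|f\|_R^2 &=\Bigl\langle\sum_{k=1}^\infty\nu_k\langle f,\omega_k\rangle_R\,\omega_k,\ \sum_{j=1}^\infty\nu_j\langle f,\omega_j\rangle_R\,\omega_j\Bigr\rangle_R\\
&=\sum_{k,j=1}^\infty\nu_k\nu_j\langle f,\omega_k\rangle_R\langle f,\omega_j\rangle_R\langle\omega_k,\omega_j\rangle_R
=\sum_{k=1}^\infty\nu_k\langle f,\omega_k\rangle_R^2.
\end{align*}
Similarly, because $\langle C\omega_k,\omega_j\rangle_{L_2}=\delta_{kj}$,
\begin{align*}
\langle Cf,f\rangle_{L_2} &=\Bigl\langle C\Bigl(\sum_{k=1}^\infty\nu_k\langle f,\omega_k\rangle_R\,\omega_k\Bigr),\ \sum_{j=1}^\infty\nu_j\langle f,\omega_j\rangle_R\,\omega_j\Bigr\rangle_{L_2}\\
&=\Bigl\langle\sum_{k=1}^\infty\nu_k\langle f,\omega_k\rangle_R\,C\omega_k,\ \sum_{j=1}^\infty\nu_j\langle f,\omega_j\rangle_R\,\omega_j\Bigr\rangle_{L_2}\\
&=\sum_{k,j=1}^\infty\nu_k\nu_j\langle f,\omega_k\rangle_R\langle f,\omega_j\rangle_R\langle C\omega_k,\omega_j\rangle_{L_2}
=\sum_{k=1}^\infty\nu_k^2\langle f,\omega_k\rangle_R^2.
\end{align*}

6.3. Proof of Proposition 4. Recall that for any $f\in\mathcal H_0$, $Cf=0$ if and only if $f=0$, which implies that $\mathcal H_0\cap\mathrm{l.s.}\{\phi_k:k\ge1\}^{\perp}=\{0\}$. Together with the fact that $\mathcal H_0\cap\mathcal H_1=\{0\}$, we conclude that $\mathcal H=\mathcal H_1=\mathrm{l.s.}\{\phi_k:k\ge1\}$. It is not hard to see that for any $f,g\in\mathcal H$,
\[ \langle f,g\rangle_R=\iint_{\mathcal T\times\mathcal T} f(s)C(s,t)g(t)\,ds\,dt+\langle f,g\rangle_K. \tag{69} \]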

In particular,
\[ \langle\psi_j,\psi_k\rangle_R=(\mu_k+\rho_k^{-1})\delta_{jk}, \tag{70} \]
which implies that $\{((\mu_k+\rho_k^{-1})^{-1},\psi_k):k\ge1\}$ is also the eigen system of $R$, that is,
\[ R(s,t)=\sum_{k=1}^\infty(\mu_k+\rho_k^{-1})^{-1}\psi_k(s)\psi_k(t). \tag{71} \]
Then
\[ R\psi_k:=\int_{\mathcal T}R(\cdot,t)\psi_k(t)\,dt=(\mu_k+\rho_k^{-1})^{-1}\psi_k,\qquad k=1,2,\ldots. \tag{72} \]
Therefore,
\begin{align*}
R^{1/2}CR^{1/2}\psi_k &=R^{1/2}C\bigl((\mu_k+\rho_k^{-1})^{-1/2}\psi_k\bigr)\\
&=R^{1/2}\bigl(\mu_k(\mu_k+\rho_k^{-1})^{-1/2}\psi_k\bigr)=(1+\rho_k^{-1}\mu_k^{-1})^{-1}\psi_k,
\end{align*}
which implies that $\zeta_k=\psi_k=\phi_k$, $\nu_k=(1+\rho_k^{-1}\mu_k^{-1})^{-1}$ and $\gamma_k=\rho_k\mu_k$. Consequently,
\[ \omega_k=\nu_k^{-1/2}R^{1/2}\psi_k=\nu_k^{-1/2}(\mu_k+\rho_k^{-1})^{-1/2}\psi_k=\mu_k^{-1/2}\psi_k. \tag{73} \]

6.4. Proof of Theorem 5. Recall that $\mathcal H=W_2^m$, which implies that $\rho_k\asymp k^{-2m}$. By Corollary 2 of Ritter, Wasilkowski and Woźniakowski (1995), $\mu_k\asymp k^{-2(s+1)}$. It therefore suffices to show $\gamma_k\asymp k^{-2(s+1+m)}$. The key idea of the proof is a result from Ritter, Wasilkowski and Woźniakowski (1995) indicating that the reproducing kernel Hilbert space associated with $C$ differs from $W_2^{s+1}([0,1])$ only by a finite-dimensional linear space of polynomials.

Denote by $Q_r$ the reproducing kernel for $W_2^r([0,1])$. Observe that $Q_r^{1/2}(L_2)=W_2^r$ [e.g., Cucker and Smale (2001)]. We begin by quantifying the decay rate of $\lambda_k(Q_m^{1/2}Q_{s+1}Q_m^{1/2})$. By Sobolev's embedding theorem, $(Q_{s+1}^{1/2}Q_m^{1/2})(L_2)=Q_{s+1}^{1/2}(W_2^m)=W_2^{m+s+1}$. Therefore, $Q_m^{1/2}Q_{s+1}Q_m^{1/2}$ is equivalent to $Q_{m+s+1}$. Denote by $\lambda_k(Q)$ the $k$th largest eigenvalue of a positive definite operator $Q$. Let $\{h_k:k\ge1\}$ be the eigenfunctions of $Q_{m+s+1}$, that is, $Q_{m+s+1}h_k=\lambda_k(Q_{m+s+1})h_k$, $k=1,2,\ldots$. Denote by $\mathcal F_k$ and $\mathcal F_k^{\perp}$ the linear spaces spanned by $\{h_j:1\le j\le k\}$ and $\{h_j:j\ge k+1\}$, respectively. By the Courant-Fischer-Weyl min-max principle,
\begin{align*}
\lambda_k(Q_m^{1/2}Q_{s+1}Q_m^{1/2}) &\ge\min_{f\in\mathcal F_k}\|Q_{s+1}^{1/2}Q_m^{1/2}f\|_{L_2}^2/\|f\|_{L_2}^2\\
&\ge C_1\min_{f\in\mathcal F_k}\|Q_{m+s+1}^{1/2}f\|_{L_2}^2/\|f\|_{L_2}^2\\
&\ge C_1\lambda_k(Q_{m+s+1})
\end{align*}

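for some constant $C_1>0$. On the other hand,
\begin{align*}
\lambda_k(Q_m^{1/2}Q_{s+1}Q_m^{1/2}) &\le\max_{f\in\mathcal F_{k-1}^{\perp}}\|Q_{s+1}^{1/2}Q_m^{1/2}f\|_{L_2}^2/\|f\|_{L_2}^2\\
&\le C_2\max_{f\in\mathcal F_{k-1}^{\perp}}\|Q_{m+s+1}^{1/2}f\|_{L_2}^2/\|f\|_{L_2}^2\\
&\le C_2\lambda_k(Q_{m+s+1})
\end{align*}
for some constant $C_2>0$. In summary, we have $\lambda_k(Q_m^{1/2}Q_{s+1}Q_m^{1/2})\asymp k^{-2(m+s+1)}$.

As shown by Ritter, Wasilkowski and Woźniakowski [(1995), Theorem 1, page 525], there exist $D$ and $U$ such that $Q_{s+1}=D+U$, $D$ has at most $2(s+1)$ nonzero eigenvalues and $\|U^{1/2}f\|_{L_2}$ is equivalent to $\|C^{1/2}f\|_{L_2}$. Moreover, the eigenfunctions of $D$, denoted by $\{g_1,\ldots,g_d\}$ ($d\le 2(s+1)$), are polynomials of order no greater than $2s+1$. Denote by $\mathcal G$ the space spanned by $\{g_1,\ldots,g_d\}$. Clearly $\mathcal G\subset W_2^m=Q_m^{1/2}(L_2)$. Denote by $\{\tilde h_j:j\ge1\}$ the eigenfunctions of $Q_m^{1/2}Q_{s+1}Q_m^{1/2}$. Let $\tilde{\mathcal F}_k$ and $\tilde{\mathcal F}_k^{\perp}$ be defined similarly as $\mathcal F_k$ and $\mathcal F_k^{\perp}$. Then by the Courant-Fischer-Weyl min-max principle,
\begin{align*}
\lambda_{k-d}(Q_m^{1/2}UQ_m^{1/2}) &\ge\min_{f\in\tilde{\mathcal F}_k\cap Q_m^{1/2}(\mathcal G)^{\perp}}\|U^{1/2}Q_m^{1/2}f\|_{L_2}^2/\|f\|_{L_2}^2\\
&=\min_{f\in\tilde{\mathcal F}_k\cap Q_m^{1/2}(\mathcal G)^{\perp}}\|Q_{s+1}^{1/2}Q_m^{1/2}f\|_{L_2}^2/\|f\|_{L_2}^2\\
&\ge\min_{f\in\tilde{\mathcal F}_k}\|Q_{s+1}^{1/2}Q_m^{1/2}f\|_{L_2}^2/\|f\|_{L_2}^2\\
&\ge C_1\lambda_k(Q_{m+s+1})
\end{align*}
for some constant $C_1>0$. On the other hand,
\begin{align*}
\lambda_{k+d}(Q_m^{1/2}UQ_m^{1/2}) &\le\max_{f\in\tilde{\mathcal F}_{k-1}^{\perp}\cap Q_m^{1/2}(\mathcal G)^{\perp}}\|U^{1/2}Q_m^{1/2}f\|_{L_2}^2/\|f\|_{L_2}^2\\
&=\max_{f\in\tilde{\mathcal F}_{k-1}^{\perp}\cap Q_m^{1/2}(\mathcal G)^{\perp}}\|Q_{s+1}^{1/2}Q_m^{1/2}f\|_{L_2}^2/\|f\|_{L_2}^2\\
&\le\max_{f\in\tilde{\mathcal F}_{k-1}^{\perp}}\|Q_{s+1}^{1/2}Q_m^{1/2}f\|_{L_2}^2/\|f\|_{L_2}^2\\
&\le C_2\lambda_k(Q_{m+s+1})
\end{align*}
for some constant $C_2>0$. Hence $\lambda_k(Q_m^{1/2}UQ_m^{1/2})\asymp k^{-2(m+s+1)}$.

Because $Q_m^{1/2}UQ_m^{1/2}$ is equivalent to $R^{1/2}CR^{1/2}$, following a similar argument as before, by the Courant-Fischer-Weyl min-max principle, we complete the proof.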
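The objects $\nu_k$ and $\omega_k$ that drive the proofs of Theorem 3 and Proposition 4 have a direct finite-dimensional analogue, which can help in reading those arguments. In the sketch below the operators $R$ and $C$ are replaced by symmetric positive definite matrices, the $R$-inner product is taken to be $\langle f,g\rangle_R=f^\top R^{-1}g$, and $\omega_k=\nu_k^{-1/2}R^{1/2}\zeta_k$ with $(\nu_k,\zeta_k)$ the eigenpairs of $R^{1/2}CR^{1/2}$. The function names are hypothetical and the matrices are generic, so unlike in the paper the $\nu_k$ need not lie in $(0,1)$.

```python
import numpy as np

def sym_sqrt(a):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(w)) @ v.T

def simultaneous_diagonalization(r_mat, c_mat):
    """Return nu (descending) and omega with omega' C omega = I and omega' R^{-1} omega = diag(1/nu)."""
    r_half = sym_sqrt(r_mat)
    m = r_half @ c_mat @ r_half
    m = (m + m.T) / 2.0                      # remove rounding asymmetry before eigh
    nu, zeta = np.linalg.eigh(m)
    order = np.argsort(nu)[::-1]
    nu, zeta = nu[order], zeta[:, order]
    omega = (r_half @ zeta) / np.sqrt(nu)    # column k is nu_k^{-1/2} R^{1/2} zeta_k
    return nu, omega
```

With these columns, any vector `f` is recovered as `omega @ (nu * (omega.T @ np.linalg.solve(r_mat, f)))`, which is the matrix version of the expansion (68).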

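6.5. Proof of Theorem 6. We now proceed to prove Theorem 6. The analysis follows a similar spirit as the technique commonly used in the study of the rate of convergence of smoothing splines [see, e.g., Silverman (1982); Cox and O'Sullivan (1990)]. For brevity, we shall assume that $EX(\cdot)=0$ in the rest of the proof. In this case, $\alpha_0$ can be estimated by $\bar y$ and $\beta_0$ by
\[ \hat\beta_{n\lambda}=\mathop{\mathrm{argmin}}_{\beta\in\mathcal H}\Bigl[\frac1n\sum_{i=1}^n\Bigl(y_i-\int_{\mathcal T}x_i(t)\beta(t)\,dt\Bigr)^2+\lambda J(\beta)\Bigr]. \tag{74} \]
The proof below also applies to the more general setting when $EX(\cdot)\ne0$ but with considerable technical obscurity. Recall that
\[ \ell_n(\beta)=\frac1n\sum_{i=1}^n\Bigl(y_i-\int_{\mathcal T}x_i(t)\beta(t)\,dt\Bigr)^2. \tag{75} \]
Observe that
\begin{align*}
\ell_\infty(\beta):=E\ell_n(\beta) &=E\Bigl[Y-\int_{\mathcal T}X(t)\beta(t)\,dt\Bigr]^2\\
&=\sigma^2+\iint_{\mathcal T\times\mathcal T}[\beta(s)-\beta_0(s)]C(s,t)[\beta(t)-\beta_0(t)]\,ds\,dt\\
&=\sigma^2+\|\beta-\beta_0\|_0^2.
\end{align*}
Write
\[ \bar\beta_{\infty\lambda}=\mathop{\mathrm{argmin}}_{\beta\in\mathcal H}\{\ell_\infty(\beta)+\lambda J(\beta)\}. \tag{76} \]
Clearly
\[ \hat\beta_{n\lambda}-\beta_0=(\hat\beta_{n\lambda}-\bar\beta_{\infty\lambda})+(\bar\beta_{\infty\lambda}-\beta_0). \tag{77} \]
We refer to the two terms on the right-hand side as the stochastic error and the deterministic error, respectively.

6.5.1. Deterministic error. Write $\beta_0(\cdot)=\sum_{k=1}^\infty a_k\omega_k(\cdot)$ and $\beta(\cdot)=\sum_{k=1}^\infty b_k\omega_k(\cdot)$. Then Theorem 3 implies that
\[ \ell_\infty(\beta)=\sigma^2+\sum_{k=1}^\infty(b_k-a_k)^2,\qquad J(\beta)=\sum_{k=1}^\infty\gamma_k^{-1}b_k^2. \]
Therefore,
\[ \bar\beta_{\infty\lambda}(\cdot)=\sum_{k=1}^\infty\frac{a_k}{1+\lambda\gamma_k^{-1}}\,\omega_k(\cdot)=:\sum_{k=1}^\infty\bar b_k\omega_k(\cdot). \tag{78} \]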
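The closed form in (78) says that, in the $\omega_k$ coordinates, the population minimizer shrinks each coefficient $a_k$ by the factor $1/(1+\lambda\gamma_k^{-1})$. A small numerical sketch of this coordinatewise minimization follows; the truncation to finitely many coefficients and the sequences used in the usage note are assumptions made purely for illustration.

```python
import numpy as np

def population_minimizer(a, gamma, lam):
    """Coefficients b_k = a_k / (1 + lam / gamma_k) of the penalized population minimizer."""
    return a / (1.0 + lam / gamma)

def penalized_risk(b, a, gamma, lam):
    """sum_k (b_k - a_k)^2 + lam * sum_k b_k^2 / gamma_k (the sigma^2 term is dropped)."""
    return np.sum((b - a) ** 2) + lam * np.sum(b ** 2 / gamma)
```

For instance, with the hypothetical choices `a = k ** -2.0` and `gamma = k ** -4.0` for `k = np.arange(1, 21)`, the gradient `2 * (b - a) + 2 * lam * b / gamma` vanishes at `b = population_minimizer(a, gamma, lam)`, and high-frequency coefficients (small $\gamma_k$) are shrunk the most.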

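It can then be computed that for any $a<1$,
\begin{align*}
\|\bar\beta_{\infty\lambda}-\beta_0\|_a^2 &=\sum_{k=1}^\infty(1+\gamma_k^{-a})(\bar b_k-a_k)^2
=\sum_{k=1}^\infty(1+\gamma_k^{-a})\Bigl(\frac{\lambda\gamma_k^{-1}}{1+\lambda\gamma_k^{-1}}\Bigr)^2a_k^2\\
&\le\lambda^2\sup_k\frac{(1+\gamma_k^{-a})\gamma_k^{-1}}{(1+\lambda\gamma_k^{-1})^2}\sum_{k=1}^\infty\gamma_k^{-1}a_k^2
=\lambda^2J(\beta_0)\sup_k\frac{(1+\gamma_k^{-a})\gamma_k^{-1}}{(1+\lambda\gamma_k^{-1})^2}.
\end{align*}
Now note that
\begin{align*}
\sup_k\frac{(1+\gamma_k^{-a})\gamma_k^{-1}}{(1+\lambda\gamma_k^{-1})^2}
&\le\sup_{x>0}\frac{(1+x^{-a})x^{-1}}{(1+\lambda x^{-1})^2}
\le\sup_{x>0}\frac{x^{-1}}{(1+\lambda x^{-1})^2}+\sup_{x>0}\frac{x^{-(a+1)}}{(1+\lambda x^{-1})^2}\\
&=\frac{1}{\inf_{x>0}(x^{1/2}+\lambda x^{-1/2})^2}+\frac{1}{\inf_{x>0}(x^{(a+1)/2}+\lambda x^{-(1-a)/2})^2}
=\frac{1}{4\lambda}+C_0\lambda^{-(a+1)}.
\end{align*}
Hereafter, we use $C_0$ to denote a generic positive constant. In summary, we have

Lemma 12. If $\lambda$ is bounded from above, then $\|\bar\beta_{\infty\lambda}-\beta_0\|_a^2=O(\lambda^{1-a}J(\beta_0))$.

6.5.2. Stochastic error. Next, we consider the stochastic error $\hat\beta_{n\lambda}-\bar\beta_{\infty\lambda}$. Denote
\begin{align*}
D\ell_n(\beta)f &=-\frac2n\sum_{i=1}^n\Bigl[\Bigl(y_i-\int_{\mathcal T}x_i(t)\beta(t)\,dt\Bigr)\int_{\mathcal T}x_i(t)f(t)\,dt\Bigr],\\
D\ell_\infty(\beta)f &=-2E_X\Bigl(\int_{\mathcal T}X(t)[\beta_0(t)-\beta(t)]\,dt\int_{\mathcal T}X(t)f(t)\,dt\Bigr)\\
&=-2\iint_{\mathcal T\times\mathcal T}[\beta_0(s)-\beta(s)]C(s,t)f(t)\,ds\,dt,\\
D^2\ell_n(\beta)fg &=\frac2n\sum_{i=1}^n\Bigl[\int_{\mathcal T}x_i(t)f(t)\,dt\int_{\mathcal T}x_i(t)g(t)\,dt\Bigr],
\end{align*}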

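\[ D^2\ell_\infty(\beta)fg=2\iint_{\mathcal T\times\mathcal T}f(s)C(s,t)g(t)\,ds\,dt. \]
Also write $\ell_{n\lambda}(\beta)=\ell_n(\beta)+\lambda J(\beta)$ and $\ell_{\infty\lambda}(\beta)=\ell_\infty(\beta)+\lambda J(\beta)$. Denote $G_\lambda=(1/2)D^2\ell_{\infty\lambda}(\bar\beta_{\infty\lambda})$ and
\[ \tilde\beta=\bar\beta_{\infty\lambda}-\tfrac12G_\lambda^{-1}D\ell_{n\lambda}(\bar\beta_{\infty\lambda}). \tag{79} \]
It is clear that
\[ \hat\beta_{n\lambda}-\bar\beta_{\infty\lambda}=(\hat\beta_{n\lambda}-\tilde\beta)+(\tilde\beta-\bar\beta_{\infty\lambda}). \tag{80} \]
We now study the two terms on the right-hand side separately. For brevity, we shall omit the subscripts of $\hat\beta$ and $\bar\beta$ in what follows if no confusion occurs. We begin with $\tilde\beta-\bar\beta$.

Lemma 13. For any $0\le a\le1$,
\[ E\|\tilde\beta-\bar\beta\|_a^2\asymp n^{-1}\lambda^{-(a+1/(2(r+s)))}. \tag{81} \]

Proof. Notice that $D\ell_{n\lambda}(\bar\beta)=D\ell_{n\lambda}(\bar\beta)-D\ell_{\infty\lambda}(\bar\beta)=D\ell_n(\bar\beta)-D\ell_\infty(\bar\beta)$. Therefore
\begin{align*}
E[D\ell_{n\lambda}(\bar\beta)f]^2 &=E[D\ell_n(\bar\beta)f-D\ell_\infty(\bar\beta)f]^2\\
&=\frac4n\operatorname{Var}\Bigl[\Bigl(Y-\int_{\mathcal T}X(t)\bar\beta(t)\,dt\Bigr)\int_{\mathcal T}X(t)f(t)\,dt\Bigr]\\
&\le\frac4nE\Bigl[\Bigl(Y-\int_{\mathcal T}X(t)\bar\beta(t)\,dt\Bigr)\int_{\mathcal T}X(t)f(t)\,dt\Bigr]^2\\
&=\frac4nE\Bigl(\int_{\mathcal T}X(t)[\beta_0(t)-\bar\beta(t)]\,dt\int_{\mathcal T}X(t)f(t)\,dt\Bigr)^2+\frac{4\sigma^2}{n}E\Bigl(\int_{\mathcal T}X(t)f(t)\,dt\Bigr)^2,
\end{align*}
where we used the fact that $\varepsilon=Y-\int X\beta_0$ is uncorrelated with $X$. To bound the first term, an application of the Cauchy-Schwarz inequality yields
\begin{align*}
E\Bigl(\int_{\mathcal T}X(t)[\beta_0(t)-\bar\beta(t)]\,dt\int_{\mathcal T}X(t)f(t)\,dt\Bigr)^2
&\le\Bigl\{E\Bigl(\int_{\mathcal T}X(t)[\beta_0(t)-\bar\beta(t)]\,dt\Bigr)^4E\Bigl(\int_{\mathcal T}X(t)f(t)\,dt\Bigr)^4\Bigr\}^{1/2}\\
&\le M\|\beta_0-\bar\beta\|_0^2\|f\|_0^2,
\end{align*}
where the second inequality holds by the second condition of $\mathcal F(s,M,K)$. Therefore,
\[ E[D\ell_{n\lambda}(\bar\beta)f]^2\le\frac{4M}{n}\|\beta_0-\bar\beta\|_0^2\|f\|_0^2+\frac{4\sigma^2}{n}\|f\|_0^2, \tag{82} \]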

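which by Lemma 12 is further bounded by $(C_0\sigma^2/n)\|f\|_0^2$ for some positive constant $C_0$. Recall that $\|\omega_k\|_0=1$. We have
\[ E[D\ell_{n\lambda}(\bar\beta)\omega_k]^2\le C_0\sigma^2/n. \tag{83} \]
Thus, by the definition of $\tilde\beta$,
\begin{align*}
E\|\tilde\beta-\bar\beta\|_a^2 &=E\Bigl\|\tfrac12G_\lambda^{-1}D\ell_{n\lambda}(\bar\beta)\Bigr\|_a^2
=\frac14E\Bigl[\sum_{k=1}^\infty(1+\gamma_k^{-a})(1+\lambda\gamma_k^{-1})^{-2}(D\ell_{n\lambda}(\bar\beta)\omega_k)^2\Bigr]\\
&\le\frac{C_0\sigma^2}{4n}\sum_{k=1}^\infty(1+\gamma_k^{-a})(1+\lambda\gamma_k^{-1})^{-2}\\
&\asymp\frac{C_0\sigma^2}{4n}\sum_{k=1}^\infty(1+k^{2a(r+s)})(1+\lambda k^{2(r+s)})^{-2}\\
&\asymp\frac{C_0\sigma^2}{4n}\int_1^\infty x^{2a(r+s)}(1+\lambda x^{2(r+s)})^{-2}\,dx\\
&\asymp\frac{C_0\sigma^2}{4n}\int_1^\infty(1+\lambda x^{2(r+s)/(2a(r+s)+1)})^{-2}\,dx\\
&=\frac{C_0\sigma^2}{4n}\lambda^{-(a+1/(2(r+s)))}\int_{\lambda^{a+1/(2(r+s))}}^\infty(1+x^{2(r+s)/(2a(r+s)+1)})^{-2}\,dx\\
&\asymp n^{-1}\lambda^{-(a+1/(2(r+s)))}.
\end{align*}
The proof is now complete.

Now we are in a position to bound $E\|\hat\beta-\tilde\beta\|_a^2$. By definition,
\[ G_\lambda(\hat\beta-\tilde\beta)=\tfrac12D^2\ell_{\infty\lambda}(\bar\beta)(\hat\beta-\tilde\beta). \tag{84} \]
The first-order condition implies that
\[ D\ell_{n\lambda}(\hat\beta)=D\ell_{n\lambda}(\bar\beta)+D^2\ell_{n\lambda}(\bar\beta)(\hat\beta-\bar\beta)=0, \tag{85} \]
where we used the fact that $\ell_{n\lambda}$ is quadratic. Together with the fact that
\[ D\ell_{n\lambda}(\bar\beta)+D^2\ell_{\infty\lambda}(\bar\beta)(\tilde\beta-\bar\beta)=0, \tag{86} \]
we have
\begin{align*}
D^2\ell_{\infty\lambda}(\bar\beta)(\hat\beta-\tilde\beta) &=D^2\ell_{\infty\lambda}(\bar\beta)(\hat\beta-\bar\beta)+D^2\ell_{\infty\lambda}(\bar\beta)(\bar\beta-\tilde\beta)\\
&=D^2\ell_{\infty\lambda}(\bar\beta)(\hat\beta-\bar\beta)-D^2\ell_{n\lambda}(\bar\beta)(\hat\beta-\bar\beta)\\
&=D^2\ell_\infty(\bar\beta)(\hat\beta-\bar\beta)-D^2\ell_n(\bar\beta)(\hat\beta-\bar\beta).
\end{align*}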

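Therefore,
\[ (\hat\beta-\tilde\beta)=\tfrac12G_\lambda^{-1}[D^2\ell_\infty(\bar\beta)(\hat\beta-\bar\beta)-D^2\ell_n(\bar\beta)(\hat\beta-\bar\beta)]. \tag{87} \]
Write
\[ \hat\beta=\sum_{k=1}^\infty\hat b_k\omega_k\quad\text{and}\quad\bar\beta=\sum_{k=1}^\infty\bar b_k\omega_k. \tag{88} \]
Then
\begin{align*}
\|\hat\beta-\tilde\beta\|_a^2 &=\sum_{k=1}^\infty(1+\lambda\gamma_k^{-1})^{-2}(1+\gamma_k^{-a})\Bigl[\sum_{j=1}^\infty(\hat b_j-\bar b_j)\iint\omega_j(s)\Bigl(\frac1n\sum_{i=1}^nx_i(t)x_i(s)-C(s,t)\Bigr)\omega_k(t)\,ds\,dt\Bigr]^2\\
&\le\sum_{k=1}^\infty(1+\lambda\gamma_k^{-1})^{-2}(1+\gamma_k^{-a})\Bigl[\sum_{j=1}^\infty(\hat b_j-\bar b_j)^2(1+\gamma_j^{-c})\Bigr]\\
&\qquad\times\Bigl(\sum_{j=1}^\infty(1+\gamma_j^{-c})^{-1}\Bigl[\iint\omega_j(s)\Bigl(\frac1n\sum_{i=1}^nx_i(t)x_i(s)-C(s,t)\Bigr)\omega_k(t)\,ds\,dt\Bigr]^2\Bigr),
\end{align*}
where the inequality is due to the Cauchy-Schwarz inequality. Note that
\begin{align*}
&E\Bigl(\sum_{j=1}^\infty(1+\gamma_j^{-c})^{-1}\Bigl[\iint\omega_j(s)\Bigl(\frac1n\sum_{i=1}^nx_i(t)x_i(s)-C(s,t)\Bigr)\omega_k(t)\,ds\,dt\Bigr]^2\Bigr)\\
&\quad=\frac1n\sum_{j=1}^\infty(1+\gamma_j^{-c})^{-1}\operatorname{Var}\Bigl(\int_{\mathcal T}\omega_j(t)X(t)\,dt\int_{\mathcal T}\omega_k(t)X(t)\,dt\Bigr)\\
&\quad\le\frac1n\sum_{j=1}^\infty(1+\gamma_j^{-c})^{-1}E\Bigl[\Bigl(\int_{\mathcal T}\omega_j(t)X(t)\,dt\Bigr)^2\Bigl(\int_{\mathcal T}\omega_k(t)X(t)\,dt\Bigr)^2\Bigr]\\
&\quad\le\frac1n\sum_{j=1}^\infty(1+\gamma_j^{-c})^{-1}E\Bigl[\Bigl(\int_{\mathcal T}\omega_j(t)X(t)\,dt\Bigr)^4\Bigr]^{1/2}E\Bigl[\Bigl(\int_{\mathcal T}\omega_k(t)X(t)\,dt\Bigr)^4\Bigr]^{1/2}
\end{align*}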

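\begin{align*}
&\quad\le\frac Mn\sum_{j=1}^\infty(1+\gamma_j^{-c})^{-1}E\Bigl[\Bigl(\int_{\mathcal T}\omega_j(t)X(t)\,dt\Bigr)^2\Bigr]E\Bigl[\Bigl(\int_{\mathcal T}\omega_k(t)X(t)\,dt\Bigr)^2\Bigr]\\
&\quad=\frac Mn\sum_{j=1}^\infty(1+\gamma_j^{-c})^{-1}\asymp n^{-1},
\end{align*}
provided that $c>1/(2(r+s))$. On the other hand,
\begin{align*}
\sum_{k=1}^\infty(1+\lambda\gamma_k^{-1})^{-2}(1+\gamma_k^{-a}) &\le C_0\sum_{k=1}^\infty(1+\lambda k^{2(r+s)})^{-2}(1+k^{2a(r+s)})\\
&\asymp\int_1^\infty(1+\lambda x^{2(r+s)})^{-2}x^{2a(r+s)}\,dx\\
&\asymp\int_1^\infty(1+\lambda x^{2(r+s)/(2a(r+s)+1)})^{-2}\,dx\\
&=\lambda^{-(a+1/(2(r+s)))}\int_{\lambda^{a+1/(2(r+s))}}^\infty(1+x^{2(r+s)/(2a(r+s)+1)})^{-2}\,dx\\
&\asymp\lambda^{-(a+1/(2(r+s)))}.
\end{align*}
To sum up,
\[ \|\hat\beta-\tilde\beta\|_a^2=O_p\bigl(n^{-1}\lambda^{-(a+1/(2(r+s)))}\|\hat\beta-\bar\beta\|_c^2\bigr). \tag{89} \]
In particular, taking $a=c$ yields
\[ \|\hat\beta-\tilde\beta\|_c^2=O_p\bigl(n^{-1}\lambda^{-(c+1/(2(r+s)))}\|\hat\beta-\bar\beta\|_c^2\bigr). \tag{90} \]
If
\[ n^{-1}\lambda^{-(c+1/(2(r+s)))}\to0, \tag{91} \]
then
\[ \|\hat\beta-\tilde\beta\|_c=o_p(\|\hat\beta-\bar\beta\|_c). \tag{92} \]
Together with the triangle inequality,
\[ \|\tilde\beta-\bar\beta\|_c\ge\|\hat\beta-\bar\beta\|_c-\|\hat\beta-\tilde\beta\|_c=(1-o_p(1))\|\hat\beta-\bar\beta\|_c. \tag{93} \]
Therefore,
\[ \|\hat\beta-\bar\beta\|_c=O_p(\|\tilde\beta-\bar\beta\|_c). \tag{94} \]
Together with Lemma 13, we have
\[ \|\hat\beta-\bar\beta\|_c^2=O_p\bigl(n^{-1}\lambda^{-(c+1/(2(r+s)))}\bigr)=o_p(1). \tag{95} \]
Putting it back into (89), we now have: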

Lemma 14. If there also exists some $1/(2(r+s))<c\le1$ such that $n^{-1}\lambda^{-(c+1/(2(r+s)))}\to0$, then
\[ \|\hat\beta-\tilde\beta\|_a^2=o_p\bigl(n^{-1}\lambda^{-(a+1/(2(r+s)))}\bigr). \tag{96} \]

Combining Lemmas 12-14, we have
\[ \lim_{D\to\infty}\limsup_{n\to\infty}\sup_{F\in\mathcal F(s,M,K),\,\beta_0\in\mathcal H}P\bigl(\|\hat\beta_{n\lambda}-\beta_0\|_a^2>Dn^{-2(1-a)(r+s)/(2(r+s)+1)}\bigr)=0 \tag{97} \]
by taking $\lambda\asymp n^{-2(r+s)/(2(r+s)+1)}$.

6.6. Proof of Theorem 7. We now set out to show that $n^{-2(1-a)(r+s)/(2(r+s)+1)}$ is the optimal rate. It follows from a similar argument as that of Hall and Horowitz (2007). Consider a setting where $\psi_k=\phi_k$, $k=1,2,\ldots$. Clearly in this case we also have $\omega_k=\mu_k^{-1/2}\phi_k$. It suffices to show that the rate is optimal in this special case. Recall that $\beta_0=\sum_k a_k\phi_k$. Set
\[ a_k=\begin{cases} L_n^{-1/2}k^{-r}\theta_k, & L_n+1\le k\le2L_n,\\ 0, & \text{otherwise}, \end{cases} \tag{98} \]
where $L_n$ is the integer part of $n^{1/(2(r+s)+1)}$, and $\theta_k$ is either 0 or 1. It is clear that
\[ \|\beta_0\|_K^2\le\sum_{k=L_n+1}^{2L_n}L_n^{-1}=1. \tag{99} \]
Therefore $\beta_0\in\mathcal H$. Now let $X$ admit the following expansion: $X=\sum_k\xi_kk^{-s}\phi_k$, where the $\xi_k$ are independent random variables drawn from a uniform distribution on $[-\sqrt3,\sqrt3]$. Simple algebraic manipulation shows that the distribution of $X$ belongs to $\mathcal F(s,3)$. The observed data are
\[ y_i=\sum_{k=L_n+1}^{2L_n}L_n^{-1/2}k^{-(r+s)}\xi_{ik}\theta_k+\varepsilon_i,\qquad i=1,\ldots,n, \tag{100} \]
where the noise $\varepsilon_i$ is assumed to be independently sampled from $N(0,M_2)$. As shown in Hall and Horowitz (2007),
\[ \liminf_{n\to\infty}\inf_{L_n<j\le2L_n}\inf_{\tilde\theta_j}\sup{}^{*}E(\tilde\theta_j-\theta_j)^2>0, \tag{101} \]
where $\sup^*$ denotes the supremum over all $2^{L_n}$ choices of $(\theta_{L_n+1},\ldots,\theta_{2L_n})$, and $\inf_{\tilde\theta}$ is taken over all measurable functions $\tilde\theta_j$ of the data. Therefore, for

any estimate $\tilde\beta$,
\[ \sup{}^{*}E\|\tilde\beta-\beta_0\|_a^2=\sup{}^{*}\sum_{k=L_n+1}^{2L_n}L_n^{-1}k^{-2(1-a)(r+s)}E(\tilde\theta_k-\theta_k)^2\ge Mn^{-2(1-a)(r+s)/(2(r+s)+1)} \tag{102} \]
for some constant $M>0$. Denote
\[ \tilde{\tilde\theta}_k=\begin{cases} 1, & \tilde\theta_k>1,\\ \tilde\theta_k, & 0\le\tilde\theta_k\le1,\\ 0, & \tilde\theta_k<0. \end{cases} \tag{103} \]
It is easy to see that
\[ \sum_{k=L_n+1}^{2L_n}L_n^{-1}k^{-2(1-a)(r+s)}(\tilde\theta_k-\theta_k)^2\ge\sum_{k=L_n+1}^{2L_n}L_n^{-1}k^{-2(1-a)(r+s)}(\tilde{\tilde\theta}_k-\theta_k)^2. \tag{104} \]
Hence, we can assume that $0\le\tilde\theta_k\le1$ without loss of generality in establishing the lower bound. Subsequently,
\[ \sum_{k=L_n+1}^{2L_n}L_n^{-1}k^{-2(1-a)(r+s)}(\tilde\theta_k-\theta_k)^2\le\sum_{k=L_n+1}^{2L_n}L_n^{-1}k^{-2(1-a)(r+s)}\le L_n^{-2(1-a)(r+s)}. \tag{105} \]
Together with (102), this implies that
\[ \liminf_{n\to\infty}\inf_{\tilde\beta}\sup{}^{*}P\bigl(\|\tilde\beta-\beta_0\|_a^2>dn^{-2(1-a)(r+s)/(2(r+s)+1)}\bigr)>0 \tag{106} \]
for some constant $d>0$.

APPENDIX: SACKS-YLVISAKER CONDITIONS

In Section 3, we discussed the relationship between the smoothness of $C$ and the decay of its eigenvalues. More precisely, the smoothness can be quantified by the so-called Sacks-Ylvisaker conditions. Following Ritter, Wasilkowski and Woźniakowski (1995), denote
\[ \Omega_+=\{(s,t)\in(0,1)^2:s>t\}\quad\text{and}\quad\Omega_-=\{(s,t)\in(0,1)^2:s<t\}. \tag{107} \]

Let $\mathrm{cl}(A)$ be the closure of a set $A$. Suppose that $L$ is a continuous function on $\Omega_+\cup\Omega_-$ such that $L|_{\Omega_j}$ is continuously extendable to $\mathrm{cl}(\Omega_j)$ for $j\in\{+,-\}$. By $L_j$ we denote the extension of $L$ to $[0,1]^2$, which is continuous on $\mathrm{cl}(\Omega_j)$, and on $[0,1]^2\setminus\mathrm{cl}(\Omega_j)$. Furthermore write $M^{(k,l)}(s,t)=(\partial^{k+l}/(\partial s^k\,\partial t^l))M(s,t)$. We say that a covariance function $M$ on $[0,1]^2$ satisfies the Sacks-Ylvisaker conditions of order $r$ if the following three conditions hold:

(A) $L=M^{(r,r)}$ is continuous on $[0,1]^2$, and its partial derivatives up to order 2 are continuous on $\Omega_+\cup\Omega_-$, and they are continuously extendable to $\mathrm{cl}(\Omega_+)$ and $\mathrm{cl}(\Omega_-)$.

(B)
\[ \min_{0\le s\le1}\bigl(L_-^{(1,0)}(s,s)-L_+^{(1,0)}(s,s)\bigr)>0. \tag{108} \]

(C) $L_+^{(2,0)}(s,\cdot)$ belongs to the reproducing kernel Hilbert space spanned by $L$ and furthermore
\[ \sup_{0\le s\le1}\|L_+^{(2,0)}(s,\cdot)\|_L<\infty. \tag{109} \]

REFERENCES

Adams, R. A. (1975). Sobolev Spaces. Academic Press, New York.
Cai, T. and Hall, P. (2006). Prediction in functional linear regression. Ann. Statist.
Cardot, H., Ferraty, F. and Sarda, P. (2003). Spline estimators for the functional linear model. Statist. Sinica.
Cardot, H. and Sarda, P. (2005). Estimation in generalized linear models for functional data via penalized likelihood. J. Multivariate Anal.
Cox, D. D. and O'Sullivan, F. (1990). Asymptotic analysis of penalized likelihood and related estimators. Ann. Statist.
Crambes, C., Kneip, A. and Sarda, P. (2009). Smoothing splines estimators for functional linear regression. Ann. Statist.
Cucker, F. and Smale, S. (2001). On the mathematical foundations of learning. Bull. Amer. Math. Soc.
Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis: Methods, Theory, Applications and Implementations. Springer, New York.
Hall, P. and Horowitz, J. L. (2007). Methodology and convergence rates for functional linear regression. Ann. Statist.
James, G. (2002). Generalized linear models with functional predictors. J. Roy. Statist. Soc. Ser. B.
Johannes, J. (2009). Nonparametric estimation in functional linear models with second order stationary regressors. Unpublished manuscript. Available at abs/ v1.
Li, Y. and Hsing, T. (2007). On the rates of convergence in functional linear regression. J. Multivariate Anal.


More information

Bayesian Nonparametric Point Estimation Under a Conjugate Prior

Bayesian Nonparametric Point Estimation Under a Conjugate Prior University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 5-15-2002 Bayesian Nonparametric Point Estimation Under a Conjugate Prior Xuefeng Li University of Pennsylvania Linda

More information

Inequalities Relating Addition and Replacement Type Finite Sample Breakdown Points

Inequalities Relating Addition and Replacement Type Finite Sample Breakdown Points Inequalities Relating Addition and Replacement Type Finite Sample Breadown Points Robert Serfling Department of Mathematical Sciences University of Texas at Dallas Richardson, Texas 75083-0688, USA Email:

More information
