Sparse estimation for functional semiparametric additive models
Peijun Sang, Richard A. Lockhart, Jiguo Cao
Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC, Canada V5A 1S6

Abstract

We propose a functional semiparametric additive model for the effects of a functional covariate and several scalar covariates on a scalar response. The effect of the functional covariate is modeled nonparametrically, while a linear form is adopted to model the effects of the scalar covariates. This strategy can enhance flexibility in modeling the effect of the functional covariate while maintaining interpretability for the effects of the scalar covariates. We develop a method for estimating the functional semiparametric additive model by smoothing and selecting non-vanishing components for the functional covariate. Asymptotic properties of our method are also established. Two simulation studies are implemented to compare our method with various conventional methods. We demonstrate our method with two real applications.

Keywords: Functional data analysis, Functional linear model, Functional principal component analysis

1. Introduction

High-dimensional data sets of large volume and complex structure are rapidly emerging in various fields. Functional data analysis, due to its great flexibility and wide applications in dealing with high-dimensional data, has received considerable attention. One important problem in functional data analysis is functional linear regression (FLR). One type of FLR models the relationship between a functional covariate and a univariate scalar response of interest. Due to potential lack of fit with FLR models, [24] proposed functional additive models (FAM), in which a scalar response depends on an additive form of the functional principal component (FPC) scores of a functional covariate. A local linear smoother was employed to estimate each component in the additive form, and consistency was established for this estimator. However, in many cases, not only functional covariates but also some scalar covariates may play a role in explaining variation of the response.
For instance, the Tecator dataset (see Section 4.1 for a more detailed description), which consists of three contents (fat, water, and protein) and 100-channel spectral trajectories of absorbance, has been analyzed with various models, where the response of interest is one of the three contents. Previous studies have focused on regressing the response on the spectral trajectories, which can be viewed as a functional covariate. Zhu et al. [36], for example, employed a regularized functional additive model, where scaled FPC scores are treated as covariates to predict the protein content. However, pairwise scatter plots of the three contents suggest that the other two contents are highly correlated with the protein content as well; thus it may be beneficial to add them into the regression model. In light of this fact, we aim to build a model which can incorporate the effects of both the spectral trajectories and the fat and water contents on the prediction of the protein content.

Motivated by the above example, we propose a functional semiparametric additive model (FSAM) to describe the relationship between a functional covariate, a finite number of scalar covariates, and a response variable of interest. In this model, the effect of the functional covariate is represented by its scaled leading FPC scores, while the scalar covariates are modeled linearly. As a result, this model enables us to acquire flexibility in calibrating the effect of the functional covariate while retaining easy interpretation of the effects of the scalar covariates. There are two main difficulties associated with this new model: the first one is the model estimation, and the second concerns theoretical properties.

Corresponding author: jiguo cao@sfu.ca

Preprint submitted to Journal of Multivariate Analysis, June 28, 2018
Obviously, the estimation of the effect of the functional covariate may affect that of the scalar covariates and vice versa. To address this issue, we propose an iterative updating algorithm, similar in spirit to the EM algorithm, to account for the interdependence between these two estimated effects. In addition, only the nonparametric effect of the functional covariate needs to be regularized; this adds additional difficulties in estimation. On the theoretical side, we aim to establish consistency for the parametric part and the nonparametric part, respectively. Separating these two effects is more difficult than developing theoretical properties with only a nonparametric part as in a FAM.

A semiparametric additive model (sometimes described under alternative names such as a partially linear model) can be viewed as a special version of a generalized additive model in which the mean response is assumed to have a linear relationship with one or more of the covariates, while the relation with the other covariates cannot be easily modeled in a parametric form [26, 29]. Numerous methods have been proposed to fit such models. The method of penalized least squares [7, 17, 32] has played a major role in this regard. Chen et al. [4] employed a piecewise polynomial to approximate the nonparametric part and developed asymptotic properties of the least squares estimator of the coefficients in the parametric part. Fan and Gijbels [8] estimated the nonparametric part using a local polynomial and derived asymptotic properties of their estimators as well. A comprehensive review of different approaches to fitting a semiparametric additive model can be found in [15]. For the case when both a functional covariate and scalar covariates are involved in predicting the mean response, Shin [28] considered a functional partially linear model in which the effect of the functional covariate is modeled via a finite-dimensional linear combination of principal component scores. A similar model was proposed by Lu et al. [22] to model the quantile function of the response variable.
Even though both papers derived asymptotic properties of their estimators, they did not consider selection of functional principal components for the functional covariate. Kong et al. [19] extended the above work to the situation where multiple functional covariates and high-dimensional scalar covariates are encountered. The effect of each functional covariate is represented via a truncated linear combination of FPC scores, the truncation level of which is allowed to increase as the sample size increases. To identify important features, reduce variability, and enhance interpretability, they proposed to combine regularization of each functional covariate with a penalty on the high-dimensional scalar covariates. Ivanescu et al. [18] proposed a general regression framework which considered a functional response and two functional covariates.

This article has three main contributions. First, in comparison with previous work on functional partial linear regression, our model allows for a more general representation of the effect of the functional covariate. Second, using a special regularization scheme, our method can select the non-vanishing functional principal components for the functional covariate. Last but not least, we derive asymptotic properties of the estimator.

The remainder of this paper is organized as follows. Section 2 introduces FSAM and our method to estimate FSAM using a special regularization scheme and an iterative updating algorithm. Section 3 evaluates the finite-sample performance of our proposed estimation method in comparison with three alternative methods using simulation studies. In Section 4, our method is demonstrated on two real examples. Some asymptotic results for the proposed estimation method are provided in Section 5. Section 6 concludes this article. Additional results of numerical studies and theoretical proofs are given in the Online Supplement.

2. Model and estimation method

2.1. Functional semiparametric additive model

Let X(t) denote a square integrable stochastic process on a domain I = [0, T] and Y denote a scalar random variable.
A functional regression model characterizes the relationship between the scalar response Y and the random function X(t). A typical example is the functional linear model: Y = \int_I X(t)\beta(t)\,dt, where \beta(t) is a square integrable function on [0, T] as well. To account for the effect of some scalar predictors in a functional regression model, several functional partial linear models have been proposed; see [19, 22, 28]. In these papers, the effect of the functional predictor is modeled nonparametrically while a linear form is adopted to model the effect of the scalar predictors. For instance, Shin [28] considered the following model:

E(Y | X, z) = \int_I X(t)\beta(t)\,dt + z^\top \alpha,   (1)
where z = (z_1, \ldots, z_p)^\top is a p-dimensional scalar covariate and \alpha = (\alpha_1, \ldots, \alpha_p)^\top \in R^p is the corresponding coefficient vector.

Let m(t) and G(s, t) denote the mean function and covariance function of X(t), respectively. The covariance function G(s, t) can be expressed as

G(s, t) = \sum_{k=1}^\infty \lambda_k \psi_k(s) \psi_k(t),

where \lambda_1, \lambda_2, \ldots are the eigenvalues of G, satisfying \lambda_1 \ge \lambda_2 \ge \cdots \ge 0, and \psi_1(t), \psi_2(t), \ldots are the corresponding orthonormal eigenfunctions, which satisfy \int_I \psi_j(t)\psi_k(t)\,dt = 1 if j = k and 0 otherwise. Then the process X(t) admits the Karhunen-Loève expansion:

X(t) = m(t) + \sum_{k=1}^\infty \xi_k \psi_k(t),   (2)

where \xi_k = \int_I \{X(t) - m(t)\}\psi_k(t)\,dt is the kth (uncorrelated) functional principal component (FPC) score. In addition, E(\xi_k \xi_{k'}) = \lambda_k if k = k' and 0 otherwise. (See [14] for an interesting recent use of these ideas for processes on the sphere.) Replacing X(t) in (1) with the expression given in (2), we have

E(Y | X, z) = b + \sum_{k=1}^\infty \xi_k b_k + z^\top \alpha,

where b = \int_I \beta(t) m(t)\,dt and b_k = \int_I \beta(t)\psi_k(t)\,dt. To allow for greater flexibility, the additive components with respect to the FPC scores \xi_1, \xi_2, \ldots in the above equation can take a more general form. Motivated by the idea of a generalized additive model [16], we consider

E(Y | X, z) = b + \sum_{k=1}^\infty f_k(\xi_k) + z^\top \alpha.

This model without scalar predictors was previously studied in [24, 36] to describe the relationship between a scalar response and a functional predictor. For convenience of regularization on each component f_k, we first scale the FPC scores to [0, 1]. One possible approach is to treat \xi_k as having a N(0, \lambda_k) distribution and apply the cumulative distribution function (cdf) of N(0, \lambda_k) to \xi_k, i.e., \zeta_k = \Phi(\xi_k / \sqrt{\lambda_k}), where \Phi is the cdf of N(0, 1). Other cumulative distribution functions could be employed for scaling, but we focus solely on the Gaussian case here. The corresponding additive model becomes:

E(Y | X, z) = b + \sum_{k=1}^\infty f_k(\zeta_k) + z^\top \alpha.   (3)

In addition to making the following regularization scheme easier to implement, there are two main reasons why we consider transferring the FPC scores to a compact domain.
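The scaling step \zeta_k = \Phi(\xi_k / \sqrt{\lambda_k}) is straightforward to implement. The following Python sketch (our illustration with hypothetical names, not the authors' code) maps a matrix of FPC scores to [0, 1]:

```python
import numpy as np
from math import erf, sqrt

def scale_scores(xi, lam):
    """Map FPC scores to [0, 1] via the N(0, lambda_k) cdf.

    xi  : (n, d) array of FPC scores xi_{ik}
    lam : length-d array of eigenvalues lambda_k
    """
    xi = np.asarray(xi, dtype=float)
    lam = np.asarray(lam, dtype=float)
    z = xi / np.sqrt(lam)  # standardize each column by sqrt(lambda_k)
    # Phi(z) = 0.5 * (1 + erf(z / sqrt(2))), the standard normal cdf
    return 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2.0)))
```

A zero score maps to 0.5, and a larger eigenvalue pulls a given raw score toward the centre of [0, 1].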
The first reason concerns theoretical derivations. When each function in a functional space can be represented in terms of spline basis functions such as B-spline bases or reproducing kernel functions, assuming the domain is compact can simplify theoretical derivations. Such examples can be found, e.g., in [23, 30]. Our second reason explains why restricting attention to a compact domain is reasonable. Let h_j denote the transformation h(\xi_j) = \zeta_j and g_j denote the function with argument \zeta_j. Note that if h_j is any strictly monotone, continuous map from R to (0, 1), we may write f_j = g_j \circ h_j with g_j = f_j \circ h_j^{-1}. We assume that there exists an integer d which is large enough that f_k \equiv 0 when k > d. This amounts to assuming that only some of the FPC scores of the functional predictor are relevant to the response. Our truncated model is then given as

E(Y | X, z) = b + \sum_{j=1}^d f_j(\zeta_j) + z^\top \alpha.   (4)

In practice we choose an initial value of d in such a way that at least 99.9% of the variability in X(t) can be explained by the first d FPCs. Let \zeta = (\zeta_1, \ldots, \zeta_d)^\top.
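The 99.9%-of-variability rule for the initial truncation level can be read off the eigenvalue sequence. A minimal sketch (the function name is ours):

```python
import numpy as np

def choose_d(eigvals, frac=0.999):
    """Smallest d such that the first d eigenvalues explain
    at least `frac` of the total variability."""
    ev = np.asarray(eigvals, dtype=float)
    cum = np.cumsum(ev) / ev.sum()  # cumulative fraction of variance
    return int(np.searchsorted(cum, frac) + 1)
```

For example, for a geometric eigenvalue sequence \lambda_k = 31.6 \times 0.5^k truncated at 20 terms (as in the simulation design of Section 3), the rule returns d = 10.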
We also assume that the effect of each transformed score, f_1, \ldots, f_d, is a smooth function. In this paper, we call the effect of X(t), namely f(\zeta) = b + f_1(\zeta_1) + \cdots + f_d(\zeta_d), the nonparametric part of the model and the linear combination z^\top \alpha the parametric part of the model. In addition, the f_j's are called nonparametric components. Model (4) is called a functional semiparametric additive model (FSAM) in this article.

2.2. Estimation method

The objective of this paper is to propose an estimation method which can select and estimate the nonparametric components that are relevant to the response while estimating the effects of the scalar covariates in Model (4). Zhu et al. [36] considered the special case where \alpha is known to be 0 in Model (4); to select and smooth the non-vanishing nonparametric components in the estimation of the nonparametric part, they applied the COmponent Selection and Smoothing Operator (COSSO) proposed by [21]. We provide a brief review of COSSO next.

Let H be the lth-order Sobolev space on [0, 1], defined by

H([0, 1]) = \{h : h^{(\nu)} \text{ is absolutely continuous for all } \nu \in \{0, \ldots, l-1\};\ h^{(l)} \in L^2\}.

Then H is a reproducing kernel Hilbert space (RKHS) equipped with the squared norm

\|h\|^2 = \sum_{\nu=0}^{l-1} \Big\{ \int_0^1 h^{(\nu)}(t)\,dt \Big\}^2 + \int_0^1 \{h^{(l)}(t)\}^2\,dt.   (5)

For a more detailed introduction to this RKHS, one can refer to Section 2.3 in [13]. One can decompose H as H = \{1\} \oplus \bar{H}, where the elements of \bar{H} have been centered. For example, take h(t) = t. Then h(t) = 1/2 + (t - 1/2) and t - 1/2 \in \bar{H}. Assuming that f_1, \ldots, f_d \in H, f(\zeta) lies in the subspace

F^d = \{1\} \oplus \bigoplus_{j=1}^d \bar{H},

i.e., in the direct sum of the space of constant functions and d copies of \bar{H}. This assumption addresses the issue of identifiability for the nonparametric components in Model (4). The COSSO regularization, applied to functions in the RKHS, is used to select and smooth the non-vanishing components when estimating f. Suppose the data consist of n independent and identically distributed triples (X_1, z_1, y_1), \ldots, (X_n, z_n, y_n).
When \alpha in Model (4) is known to be 0, the COSSO estimate of f is defined by minimizing

Q(f) = \frac{1}{n} \sum_{i=1}^n \{y_i - f(\zeta_i)\}^2 + \tau^2 J(f),   (6)

where J(f) = \|P_1 f\| + \cdots + \|P_d f\|, with each P_j f denoting the projection of f onto \bar{H} with argument the jth component of \zeta, and \tau denotes a tuning parameter which controls the trade-off between fidelity to the data and complexity of the model. If P_1 f, \ldots, P_d f are linear, then the minimizer of Q(f) is the LASSO estimate. In general, however, it is the sum of the seminorms \|P_j f\|, i.e., J(f), rather than the L_1 norm of a coefficient vector, that is penalized in Q(f). More specifically, if we represent each P_j f as a linear combination of the reproducing kernel functions, the \|P_j f\|'s are not differentiable with respect to the coefficients. This fact makes minimization of Q(f) an intricate problem. Lin and Zhang [21] argued that introducing an ancillary parameter \theta = (\theta_1, \ldots, \theta_d)^\top can ease the minimization task greatly. As shown in Lemma 2 of [21], minimization of (6) is equivalent to minimizing

H(f, \theta) = \frac{1}{n} \sum_{i=1}^n \{y_i - f(\zeta_i)\}^2 + \lambda_0 \sum_{j=1}^d \theta_j^{-1} \|P_j f\|^2 + \lambda \sum_{j=1}^d \theta_j   (7)

with respect to f and \theta, when f \in F^d, \theta_1 \ge 0, \ldots, \theta_d \ge 0 and \lambda = \tau^4/(4\lambda_0). In (7), both \lambda_0 and \lambda are nonnegative tuning parameters, which control the smoothness and the selection of the estimated nonparametric part, respectively. If \theta_j = 0, then the minimizer satisfies \|P_j f\| = 0, indicating that f_j, the jth component in the nonparametric part, vanishes.

The outline of the algorithm is given as follows. Generally speaking, the FPC scores cannot be observed directly; thus estimating the first d FPC scores for each trajectory X_i is indispensable for estimating the nonparametric part later. Trajectories are usually recorded at a grid of time points, which can differ across subjects, and they are often observed with measurement errors. To address
these issues when estimating FPC scores, we can employ regularized FPCA, proposed by Ramsay and Silverman [25], or PACE, proposed by Yao et al. [35]. Then the estimated scaled FPC scores, denoted \hat{\zeta}_i = (\hat{\zeta}_{i1}, \ldots, \hat{\zeta}_{id})^\top, can be obtained by applying the cdf of a normal distribution with the appropriate variance to the estimated FPC scores.

Now we can implement COSSO. Let R_j denote the n \times n matrix with (s, t) entry R(\hat{\zeta}_{sj}, \hat{\zeta}_{tj}), where R(\cdot, \cdot) is the reproducing kernel of \bar{H}, and write R_\theta for the matrix \theta_1 R_1 + \cdots + \theta_d R_d. For fixed \lambda_0 and \lambda, the minimizer of (7) has the form

f(\zeta) = b + \sum_{i=1}^n c_i \sum_{j=1}^d \theta_j R(\zeta_j, \hat{\zeta}_{ij}).

Thus f = (f(\hat{\zeta}_1), \ldots, f(\hat{\zeta}_n))^\top = 1_n b + R_\theta c, where c = (c_1, \ldots, c_n)^\top and 1_n is the vector of ones of length n. Then the penalty term satisfies

\sum_{j=1}^d \theta_j^{-1} \|P_j f\|^2 = \sum_{j=1}^d \theta_j c^\top R_j c = c^\top R_\theta c.

Now (7) becomes

\min_{b, c, \theta \ge 0_d} \frac{1}{n} (y - 1_n b - R_\theta c)^\top (y - 1_n b - R_\theta c) + \lambda_0 c^\top R_\theta c + \lambda 1_d^\top \theta,   (8)

where y = (y_1, \ldots, y_n)^\top and 0_d denotes the vector consisting of d zeros. To solve (8), we alternately solve for the pair (b, c) with \theta fixed and then solve for \theta with (b, c) fixed. More specifically:

1) When \theta is fixed, solving (8) is equivalent to solving the standard smoothing spline problem

\min_{b, c} \|y - 1_n b - R_\theta c\|^2 + n\lambda_0 c^\top R_\theta c.   (9)

The solution of (9) is similar to a smoothing spline estimate and can be found in [33].

2) With (b, c) fixed, (8) becomes

\min_{\theta \ge 0_d} (v - G\theta)^\top (v - G\theta) + n\lambda 1_d^\top \theta,   (10)

where v = y - (1/2) n\lambda_0 c - 1_n b and G is the n \times d matrix with jth column R_j c. Lin and Zhang [21] suggested considering an equivalent optimization problem: for some M \ge 0, find

\min_\theta \|v - G\theta\|^2, subject to 1_d^\top \theta \le M and \theta \ge 0_d.   (11)

The tuning parameter M in (11) is equivalent to \lambda in (10). Alternatively, the optimization problem (10) can be addressed directly using glmnet in R with the lower bound of the parameters to be estimated set to 0.

Only when the effect of the scalar covariates z can be removed or is known can the above algorithm be implemented. Now we take the unknown effect of z into consideration as well.
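The two inner steps can be sketched directly in numpy. The following is our own illustration, not the paper's implementation: the (b, c)-step solves the bordered smoothing-spline system for (9), and the \theta-step solves the nonnegative-lasso form of (10) by coordinate descent instead of glmnet.

```python
import numpy as np

def bc_step(R_theta, y, lam0):
    # Solve (9): min_{b,c} ||y - 1 b - R_theta c||^2 + n*lam0 * c' R_theta c.
    # Stationarity gives the bordered linear system
    #   (R_theta + n*lam0*I) c + 1 b = y,   1' c = 0.
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = R_theta + n * lam0 * np.eye(n)
    A[:n, n] = 1.0
    A[n, :n] = 1.0
    sol = np.linalg.solve(A, np.append(y, 0.0))
    return sol[n], sol[:n]  # b, c

def theta_step(G, v, n_lam, iters=200):
    # Solve (10): min_{theta >= 0} ||v - G theta||^2 + n_lam * sum(theta)
    # by coordinate descent: soft threshold at n_lam/2, clipped at zero.
    d = G.shape[1]
    theta = np.zeros(d)
    r = v - G @ theta
    for _ in range(iters):
        for j in range(d):
            gj = G[:, j]
            rho = gj @ (r + gj * theta[j])  # fit with theta_j removed
            new = max(0.0, (rho - n_lam / 2.0) / (gj @ gj))
            r += gj * (theta[j] - new)      # keep residual in sync
            theta[j] = new
    return theta
```

With orthogonal columns in G the \theta-step reduces to componentwise soft thresholding at n\lambda/2, which matches the closed form of the nonnegative lasso.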
The estimate of g = f(\zeta) + \alpha^\top z is defined as

\hat{g}_n \in \arg\min_{g(\zeta, z) = b + \sum_{j=1}^d f_j(\zeta_j) + \alpha^\top z} \frac{1}{n} \sum_{i=1}^n \{y_i - g(\zeta_i, z_i)\}^2 + \tau^2 J(g),

where f_j \in \bar{H} and J(g) is set to be J(f) = \|P_1 f\| + \cdots + \|P_d f\|. We use \in rather than = since we do not know whether the minimizer is unique in general; this does not affect the results which follow. Note that the regularization suggested above penalizes only the nonparametric part, leaving the parametric part \alpha^\top z unpenalized.

Difficulties arise when we apply COSSO directly to estimate the nonparametric part f in (4), since the effect of the scalar predictor z needs to be accounted for as well. If the coefficient \alpha of z were known, then a slight modification of COSSO would suffice for the estimation problem: replace y in (9) with y - Z\alpha and v in (10) with y - Z\alpha - n\lambda_0 c/2 - 1_n b, where Z = (z_1, \ldots, z_n)^\top. However, \alpha is unknown as well, and the estimate of the nonparametric part f depends on the value of \alpha, which poses a bottleneck when implementing COSSO to estimate f. To deal with
Algorithm 1 Iterative updating for the regularized functional semiparametric additive model

Step 1: Start with an initial value of \alpha, say \hat{\alpha}^{(0)}, and an initial value of \theta, say \hat{\theta}^{(0)}.
Step 2: Use the current estimates \hat{\alpha}^{(m)} and \hat{\theta}^{(m)} to obtain estimates \hat{b}^{(m+1)} and \hat{c}^{(m+1)} by solving (9), in which y is replaced by y - Z\hat{\alpha}^{(m)}.
Step 3: Use the current estimates \hat{\alpha}^{(m)}, \hat{b}^{(m+1)} and \hat{c}^{(m+1)} to obtain an updated estimate \hat{\theta}^{(m+1)} by solving (11), in which v is replaced by y - Z\hat{\alpha}^{(m)} - (1/2) n\lambda_0 \hat{c}^{(m+1)} - 1_n \hat{b}^{(m+1)}.
Step 4: Use the estimates \hat{b}^{(m+1)}, \hat{c}^{(m+1)} and \hat{\theta}^{(m+1)} to obtain an updated estimate \hat{\alpha}^{(m+1)} by solving a least squares problem.
Step 5: Repeat Steps 2, 3 and 4 until \|\hat{\alpha}^{(m+1)} - \hat{\alpha}^{(m)}\| < \epsilon, where \epsilon is a pre-determined tolerance value.

this problem, we propose an iterative updating algorithm to estimate both the nonparametric and parametric parts and to select and smooth the non-vanishing components in f. After estimating the \zeta_i's via regularized FPCA or PACE, the target function to be minimized can be written as

Q^*(f, \alpha) = \frac{1}{n} \sum_{i=1}^n \{y_i - f(\hat{\zeta}_i) - \alpha^\top z_i\}^2 + \tau^2 \sum_{j=1}^d \|P_j f\|,

where f(\hat{\zeta}_i) = b + \sum_{j=1}^d f_j(\hat{\zeta}_{ij}) denotes the nonparametric part evaluated at the estimated transformation \hat{\zeta}_i. We aim to find \hat{f} \in F^d and \hat{\alpha} \in R^p which minimize the target function Q^*. As illustrated above, minimizing Q^* is equivalent to another minimization problem, viz.

\min_{\alpha, b, c, \theta \ge 0_d} \frac{1}{n} (y - Z\alpha - 1_n b - R_\theta c)^\top (y - Z\alpha - 1_n b - R_\theta c) + \lambda_0 c^\top R_\theta c + \lambda 1_d^\top \theta.   (12)

Algorithm 1 outlines the steps to solve (12). The fitting method presented above is called Functional Semiparametric Additive Model via COmponent Selection and Smoothing Operator (FSAM-COSSO) in this paper. Minimization of (12) turns out to be a convex problem. Our numerical studies show that this algorithm can converge in a few steps with reasonable initial estimates for both \alpha and \theta; these are taken to be the ordinary least squares estimate and 1_d, respectively.

2.3. Tuning parameter selection

Cross-validation (CV) or generalized cross-validation (GCV) can be employed to choose the tuning parameters.
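To convey the structure of Algorithm 1, here is a stripped-down Python sketch (our illustration only: the COSSO inner steps (9)-(11) are replaced by a single kernel-ridge stand-in for the nonparametric fit, so only the alternation between \alpha and the nonparametric part, and the stopping rule of Step 5, are shown):

```python
import numpy as np

def fit_iterative(K, Z, y, lam0=1e-2, tol=1e-8, max_iter=100):
    """Alternate between a nonparametric fit and a least squares
    update of alpha, in the spirit of Algorithm 1.

    K : (n, n) kernel matrix; Z : (n, p) scalar covariates; y : responses.
    """
    n = len(y)
    alpha = np.linalg.lstsq(Z, y, rcond=None)[0]  # OLS starting value
    f_hat = np.zeros(n)
    for _ in range(max_iter):
        # f-step: kernel-ridge stand-in fit on the partial residuals
        c = np.linalg.solve(K + n * lam0 * np.eye(n), y - Z @ alpha)
        f_hat = K @ c
        # alpha-step: least squares of y - f_hat on Z (Step 4)
        alpha_new = np.linalg.lstsq(Z, y - f_hat, rcond=None)[0]
        if np.linalg.norm(alpha_new - alpha) < tol:  # Step 5 stopping rule
            alpha = alpha_new
            break
        alpha = alpha_new
    return alpha, f_hat
```

When the data are exactly linear in Z, the OLS start already solves the problem and the loop exits after one pass; in general each pass refines both parts.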
The following adaptive tuning scheme is a slight modification of the proposal of [21]:

1) In Step 1 of Algorithm 1, the initial value of \theta, \hat{\theta}^{(0)}, is chosen as 1_d. We employ GCV or CV to choose the tuning parameter \lambda_0 when addressing the smoothing spline problem. In the following updating steps, \lambda_0 is fixed at the chosen value.

2) A grid of points in a reasonable range is chosen as candidates for M. CV is employed to choose the optimal value of M. More specifically, the whole data set is randomly split into G folds. The optimal M is chosen as the value which minimizes

CV(M) = \frac{1}{n} \sum_{g=1}^G (\hat{y}_g^{(-g)} - y_g)^\top (\hat{y}_g^{(-g)} - y_g),

where \hat{y}_g^{(-g)} denotes the predicted values for the gth fold of the data when it is removed and the model is fitted using the other G - 1 folds of the data.
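The fold-based criterion CV(M) can be sketched generically. In this Python illustration (ours), `fit_predict` is a hypothetical callable standing in for "fit the model with bound M on the retained folds and predict the held-out fold":

```python
import numpy as np

def cv_score(fit_predict, X, y, M, n_folds=5, seed=1):
    """CV(M) = (1/n) * total squared prediction error over the folds,
    refitting on the other G-1 folds each time."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))            # random split into G folds
    folds = np.array_split(idx, n_folds)
    sse = 0.0
    for g in range(n_folds):
        test = folds[g]
        train = np.concatenate([folds[k] for k in range(n_folds) if k != g])
        yhat = fit_predict(X[train], y[train], X[test], M)
        sse += np.sum((yhat - y[test]) ** 2)
    return sse / len(y)
```

The optimal M is then the grid value minimizing `cv_score` over the candidate grid.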
3. Simulation studies

In this section, two simulation studies are conducted to evaluate the finite-sample performance of our proposed approach and compare it with other alternative methods.

3.1. FSAM (4) with scalar covariates

A functional covariate is generated from the first 20 Fourier basis functions with eigenvalues defined, for each k \in \{1, \ldots, 20\}, by \lambda_k = a b^k; we take a = 31.6 and b = 0.5. More specifically,

X(t) = \mu(t) + \sum_{k=1}^{20} \xi_k \psi_k(t) + e(t),

where \mu(t) = t + \sin(t) denotes the mean function of X(t), \xi_k \sim N(0, \lambda_k), the \psi_k's denote the Fourier basis functions, and the measurement error e(t) follows N(0, 0.01), independently of all the \xi_k's. We generate n = 1000 independent curves in total; each curve is sampled at 200 equally spaced points between 0 and 10. The corresponding scaled FPC scores, the \zeta_{ik}'s, are defined as \zeta_{ik} = \Phi(\xi_{ik}/\sqrt{\lambda_k}) for all i \in \{1, \ldots, n\} and k \in \{1, \ldots, 20\}. Then the response variable y is generated from the model defined, for all i \in \{1, \ldots, n\}, by

y_i = f_1(\zeta_{i1}) + f_2(\zeta_{i2}) + f_4(\zeta_{i4}) + z_i^\top \alpha_0 + \epsilon_i.

In this model,

a) f_1(x) = x e^x - 1, f_2(x) = \cos(2\pi x) and f_4(x) = 3(x - 1/4)^2 - 7/16. They have the common domain [0, 1]. The nonparametric part is f(\zeta) = f_1(\zeta_1) + f_2(\zeta_2) + f_4(\zeta_4), where \zeta = (\zeta_1, \ldots, \zeta_{20})^\top; in other words, the non-vanishing nonparametric components are f_1, f_2 and f_4.

b) z_i = (z_{i1}, z_{i2})^\top is independent of X_i(t); the two components of z_i are independently generated from the U[0, 1] uniform distribution; \alpha_0 = (-1, 2)^\top.

c) \epsilon_i is independent of both X_i(t) and z_i and is generated from N(0, 1).

The signal-to-noise ratio, defined as var\{f(\zeta)\}/var(\epsilon), is around 1.75 under this setup. Among the 1000 data points (X_1(t), z_1, y_1), \ldots, (X_{1000}(t), z_{1000}, y_{1000}), 200 are randomly selected as the training set and the remaining 800 data points are treated as the test set. We used PACE to estimate the FPC scores and then chose d so that the first d estimated FPCs explain 99.9% of the variability in the sample curves of the training set.
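The data-generating mechanism above is easy to reproduce. The sketch below is ours, not the authors' code; in particular, the specific orthonormal Fourier system on [0, 10] is one concrete choice, since the text does not spell out the ordering of the basis functions:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)
n, K, a, b, T = 1000, 20, 31.6, 0.5, 10.0
t = np.linspace(0.0, T, 200)
lam = a * b ** np.arange(1, K + 1)          # lambda_k = a * b^k

def psi(k):
    # Orthonormal Fourier basis on [0, T] (one concrete ordering: sin, cos, ...)
    m = (k + 1) // 2
    if k % 2 == 1:
        return sqrt(2.0 / T) * np.sin(2.0 * np.pi * m * t / T)
    return sqrt(2.0 / T) * np.cos(2.0 * np.pi * m * t / T)

Psi = np.array([psi(k) for k in range(1, K + 1)])            # K x 200
xi = rng.normal(size=(n, K)) * np.sqrt(lam)                  # xi_k ~ N(0, lambda_k)
mu = t + np.sin(t)
X = mu + xi @ Psi + rng.normal(scale=0.1, size=(n, len(t)))  # e(t) ~ N(0, 0.01)

# Scaled scores and response, following the model of this subsection.
zeta = 0.5 * (1.0 + np.vectorize(erf)(xi / np.sqrt(lam) / sqrt(2.0)))
z = rng.uniform(size=(n, 2))
alpha0 = np.array([-1.0, 2.0])
eps = rng.normal(size=n)
y = (zeta[:, 0] * np.exp(zeta[:, 0]) - 1.0        # f1
     + np.cos(2.0 * np.pi * zeta[:, 1])           # f2
     + 3.0 * (zeta[:, 3] - 0.25) ** 2 - 7.0 / 16  # f4
     + z @ alpha0 + eps)
```

The three component functions are centered so that each integrates to zero over [0, 1], which matches the identifiability convention of Section 2.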
We find d is around 20 in all simulation replicates. Let \hat{\zeta}_i = (\hat{\zeta}_{i1}, \ldots, \hat{\zeta}_{id})^\top denote the estimate of \zeta_i. The different methods are then fitted to the triples (\hat{\zeta}_i, z_i, y_i), where i ranges over the training set. The proposed method of this paper, FSAM-COSSO, is implemented to fit Model (4), estimating and selecting the non-vanishing nonparametric components as well as estimating the coefficient vector of the scalar covariate z. In the simulation studies and the real data applications presented in Section 4, we take the order of the Sobolev space to be l = 2, but the proposed algorithm can be extended to more general cases. MARS [11] fits an additive model to (\hat{\zeta}_i, y_i - z_i^\top \alpha_0), assuming that the coefficients in the parametric part are known to be (-1, 2)^\top and y_i - z_i^\top \alpha_0 is the new response. As a comparison, two types of extended FAMs are considered as well. The FSAM-GAMS model denotes a saturated model in which (\hat{\zeta}_{i1}, \ldots, \hat{\zeta}_{id}) are fitted by a generalized additive model (GAM) while z_{i1}, z_{i2} are fitted in a linear form. FSAM-COSSO differs from FSAM-GAMS in that the latter does not take component selection into consideration. In the second extended FAM, assuming that \zeta_1, \zeta_2 and \zeta_4 are known to be the only non-vanishing features and the expressions of f_1, f_2, f_4 are known as well, a multiple linear regression is fitted to (f_1(\hat{\zeta}_{i1}), f_2(\hat{\zeta}_{i2}), f_4(\hat{\zeta}_{i4}), z_i, y_i), in which y_i denotes the response and the explanatory variables consist of f_1(\hat{\zeta}_{i1}), f_2(\hat{\zeta}_{i2}), f_4(\hat{\zeta}_{i4}), z_{i1}, z_{i2}; this model is called FSAM-GAM1 in this paper. The FSAM-PFLR model employs a linear combination of \hat{\zeta}_{i1}, \ldots, \hat{\zeta}_{im}, where m denotes the number of retained FPCs, to represent the effect of X_i(t) on y_i. It is a modified version of the partial functional linear regression proposed by [28], where the effect of the functional predictor is represented by a linear combination of the original FPC scores. The tuning parameter m is chosen based on AIC, as suggested in [28]. To investigate the effect of using the \hat{\zeta}_i's on the estimation of each f_j, we also
Table 1: Summary of the number of selected nonparametric components over the 1000 simulations for each model (rows: MARS, FSAM-GAMS, FSAM-PFLR, FSAM-COSSO, FSAM-COSSO1; columns: counts by model size). Model size indicates the number of nonparametric components selected in the model. In FSAM-GAMS we only retain the significant nonparametric components (p-value less than 0.05). Here we implement the function gam in the R package mgcv to fit FSAM-GAMS. The corresponding p-values of the nonparametric components are available from the function summary.gam. This selection rule applies to FSAM-PFLR as well, where the p-value is available from the function lm.

implement the proposed method with the true scores, the \zeta_i's, to fit the model. This method is denoted FSAM-COSSO1 in the paper. To assess the performance of the above methods, 1000 simulation replicates are conducted to estimate the mean squared prediction error (MSPE) on the test set, defined as \sum_i (y_i - \hat{y}_i)^2 / n_0, with n_0 equal to the size of the test set. Besides prediction accuracy, we can also compare the performance of the methods from the perspective of model fitting; in particular, the number of selected nonparametric components, the frequency with which each \hat{\zeta}_k is selected, and the bias and standard error (SE) of the estimates of \alpha are reported for each method as well.

Tables 1 and 2 summarize the number and frequency of nonparametric components selected by each method over the 1000 simulations, respectively. FSAM-COSSO in most cases selects the correct number of nonparametric components. In contrast, FSAM-GAMS and MARS are prone to retain some irrelevant nonparametric components, which results in more complex models and hence greater variance. Since AIC is employed to select the number of retained FPCs, FSAM-PFLR tends to yield a model with a relatively small size. As a result, even though it is less likely for irrelevant features to be selected, FSAM-PFLR suffers from frequently ignoring relevant features.
Furthermore, FSAM-COSSO not only selects the relevant factors (\zeta_1, \zeta_2, \zeta_4) in almost every simulation but also retains

Table 2: Summary of the frequency with which each nonparametric component (\hat{f}_1, \ldots, \hat{f}_{20}) is selected over the 1000 simulations for each model (rows: MARS, FSAM-GAMS, FSAM-PFLR, FSAM-COSSO, FSAM-COSSO1). In FSAM-GAMS we only retain the significant nonparametric components (p-value less than 0.05). This selection rule applies to FSAM-PFLR as well.
irrelevant features considerably less often compared with MARS and FSAM-GAMS. The similarity between FSAM-COSSO and FSAM-COSSO1 suggests that replacing the true scores with estimates makes little difference in component selection. Tables 1 and 2 therefore demonstrate that FSAM-COSSO enables us to better discover the nonparametric relationship between the functional covariate X(t) and the response when the model is given in the form of (4).

Table 3: Summary of the estimated bias and standard error (SE) of the estimated \alpha for each method, and the mean squared prediction error (MSPE). The statistics are calculated over the 1000 simulations; the MSPE column is the average of the MSPE over the 1000 simulations.

Model        Bias              SE              MSPE
MARS         --                --              1.33
FSAM-GAM1    (-0.025, -0.021)  (0.255, 0.272)  1.15
FSAM-GAMS    (0.032, 0.047)    (0.282, 0.308)  1.41
FSAM-PFLR    (0.034, 0.031)    (0.303, 0.319)  1.67
FSAM-COSSO   (-0.026, -0.028)  (0.262, 0.283)  1.20
FSAM-COSSO1  (-0.022, -0.018)  (0.249, 0.250)  1.11

Table 3 compares the above methods in terms of the estimated bias and SE of the estimates of \alpha and the prediction accuracy as measured by MSPE. To be specific, we estimate the bias and SE of an estimate \hat{\theta} of a parameter \theta with true value \theta_0 by

bias(\hat{\theta}) = \frac{1}{1000} \sum_{i=1}^{1000} (\hat{\theta}_i - \theta_0) and SE(\hat{\theta}) = \Big\{ \frac{1}{1000} \sum_{i=1}^{1000} (\hat{\theta}_i - \bar{\hat{\theta}})^2 \Big\}^{1/2},

in which \hat{\theta}_i denotes the estimate of \theta in the ith simulation and \bar{\hat{\theta}} is the average of the \hat{\theta}_i over the 1000 simulations. FSAM-COSSO compares favorably with the other competitors, except FSAM-GAM1 and FSAM-COSSO1, in terms of prediction accuracy, even though \alpha is assumed known to be \alpha_0 in MARS. In addition, the point estimator of \alpha obtained from FSAM-COSSO is more stable than its counterparts from the other competitors. Even though FSAM-GAM1 outperforms FSAM-COSSO with respect to prediction accuracy and the bias and SE of the estimated \alpha, in practice we usually have no sufficient evidence to point out the non-vanishing nonparametric components in advance, let alone the closed forms of these components.
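The Monte Carlo summaries reported in Table 3 are computed componentwise over the replicates. A small sketch (our code; we use divisor n_sim in the SE, one natural reading of the SE formula in the text):

```python
import numpy as np

def bias_se(estimates, theta0):
    """Componentwise Monte Carlo bias and SE over simulation replicates.

    estimates : (n_sim, p) array, row i = estimate from the ith replicate
    theta0    : length-p true parameter value
    """
    est = np.asarray(estimates, dtype=float)
    bias = est.mean(axis=0) - np.asarray(theta0, dtype=float)
    se = est.std(axis=0)  # divisor n_sim; use ddof=1 for divisor n_sim - 1
    return bias, se
```

Applied to the 1000 replicated estimates of \alpha from each method, this yields the Bias and SE columns of the table.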
The boxplot in Figure 1 provides a more detailed comparison of the prediction errors among the six methods over the 1000 simulations; it shows that FSAM-COSSO has a substantial advantage in prediction when the underlying model is given in the form of (4) but is unknown. The fact that FSAM-COSSO1 outperforms FSAM-GAM1 in prediction further indicates that the proposed algorithm is effective in discovering predictive features of the response. In a randomly selected trial, 20 FPCs are retained initially such that over 99.9% of the variability in the curves can be captured. Figure 2 illustrates how cross-validation is employed to choose the tuning parameter M. Choosing the value of M which minimizes the cross-validation error, FSAM-COSSO correctly selects the three non-vanishing nonparametric components. In addition, the other three panels in Figure 2 display the estimated nonparametric components obtained from using the estimated scores and the true scores, as well as the true nonparametric components. It shows that the estimates from these two methods are close to the true nonparametric functions and there is little disagreement between the two. This observation demonstrates that replacing the true scores with the estimates has little impact on the estimation of the nonparametric components.

3.2. FSAM (4) without scalar covariates

We also generate data in the same setup as in Section 3.1, except that the coefficient vector for the scalar covariate z, \alpha_0, is now set to (0, 0)^\top. This is essentially the model discussed in [36]. Besides the methods employed in Section 3.1, we also apply a method which regresses the scalar response y against \hat{\zeta} with COSSO regularization. This
method is called FSAM-GAM2 in this paper. We also fit the data using the FSAM-GAM1 method, which estimates FSAM (4) by assuming that \zeta_1, \zeta_2 and \zeta_4 are known to be the only non-vanishing features, that the parametric expressions of f_1, f_2, f_4 are known, and that the coefficients of the scalar covariate z are known to be 0. In other words, the FSAM-GAM1 method is essentially a multiple linear regression model with y_i as the response and f_1(\hat{\zeta}_{i1}), f_2(\hat{\zeta}_{i2}), f_4(\hat{\zeta}_{i4}) as the explanatory variables. Results comparing these methods are presented in the supplementary document.

Table S1 in the Online Supplement summarizes the number of nonparametric components selected by each method over the 1000 Monte Carlo runs. There is only a slight difference between FSAM-COSSO and FSAM-GAM2 in terms of selecting relevant components, which suggests that our proposed method can still perform well in component selection even if there is actually no effect from the scalar covariates. Table S2 in the Online Supplement further compares these methods in a more delicate way by providing the frequency of each component selected across the 1000 Monte Carlo runs. As in the scenario where scalar covariates are involved in the model, FSAM-COSSO shows great advantages in retaining irrelevant components far less often compared with MARS and FSAM-PFLR. In addition, the reason why FSAM-PFLR compares favorably with FSAM-COSSO in retaining irrelevant components is that AIC tends to select a relatively small number of scaled FPC scores. More remarkable distinctions among these methods can be found in Figure S1 in the Online Supplement, which depicts the mean squared prediction errors on the test data over the 1000 Monte Carlo runs. The performance of the proposed method, FSAM-COSSO, is slightly inferior to those of FSAM-GAM1 and FSAM-GAM2, but much better than the other methods in prediction accuracy. FSAM-GAM1 and FSAM-GAM2 each know the model structure to some extent in advance, which is why they can achieve greater prediction accuracy.
Our method, however, does not assume that either the linear part or the nonparametric part is known. Thus our proposed method is highly competitive in prediction compared with other fitting methods.

4. Real data applications

In this section, the proposed method (FSAM-COSSO) and several alternative methods are applied to analyze two real datasets: the Tecator data and attention deficit hyperactivity disorder (ADHD) data.

4.1. Tecator data

The Tecator data are recorded on a Tecator Infratec Food and Feed Analyzer working in the wavelength range 850–1050 nm by the Near Infrared Transmission (NIT) principle. The dataset consists of 240 meat samples; a 100-channel spectrum of absorbance (negative base-10 logarithm of the transmittance measured by the spectrometer) is recorded for each sample along with the percentages of three components of the meat: moisture (water), fat and protein. The three contents are determined by analytic chemistry.

Figure 1: Mean squared prediction errors of each method over 1000 simulations.

There has been extensive research on how to predict the contents
Figure 2: The top left panel shows how the cross-validation errors change across a range of plausible values for the tuning parameter M. The other three panels compare the estimated nonparametric components and the true underlying nonparametric components (f_1, f_2, f_4). The blue lines denote the true nonparametric components, while the black and red lines represent the estimated nonparametric components from FSAM-COSSO and FSAM-COSSO1, respectively.

using the spectrum of absorbance; see, e.g., [6, 12, 31, 36]. The objective of this study is to examine the effect of the spectral trajectories and the fat and water contents of the meat sample on the protein content by fitting Model (1). In Model (1), the response of primary interest, Y, is the protein content; both the spectral trajectories, denoted by X(t), and the fat and water contents, denoted by z, are considered as explanatory variables for predicting the protein content. This differs from the method called component selection and estimation for functional additive models (CSEFAM) in [36], where only the spectral trajectories were taken into consideration for predicting the protein content. After applying PACE to the spectral trajectories to obtain the estimated ζ̂_i's, FSAM-COSSO is implemented to fit Model (4), estimating and selecting the non-vanishing nonparametric components for the response as well as estimating the effect of the scalar covariates on the response. The top left panel of Figure 3 presents the spectral trajectories of the 240 meat samples. To assess the performance of each method, 185 out of the 240 meat samples are randomly selected to form the training set and the remaining 55 samples constitute the test set. As suggested by Zhu et al. [36], the first 20 FPCs, accounting for over 99.9% of the total variability in the spectral trajectories, are initially retained to avoid neglecting relevant FPCs.
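The rule of retaining the first d FPCs that capture a given fraction of the total variability can be sketched numerically. The helper below is a simplified illustration (eigenvalues obtained via an SVD of the discretized, centered curves), not the PACE implementation used in the paper; the function name is ours.

```python
import numpy as np

def n_components_for(curves, threshold=0.999):
    """Smallest number of principal components whose eigenvalues account
    for at least `threshold` of the total variance.  `curves` is an
    (n_samples, n_gridpoints) array of discretized functional observations."""
    centered = curves - curves.mean(axis=0)
    # Singular values of the centered data give the covariance eigenvalues
    # (up to the 1/n factor, which cancels in the variance fractions).
    svals = np.linalg.svd(centered, compute_uv=False)
    var = svals ** 2
    frac = np.cumsum(var) / var.sum()
    # First index where the cumulative fraction reaches the threshold.
    return int(np.searchsorted(frac, threshold) + 1)
```

With the Tecator curves this rule gives d = 20 at the 99.9% level; on exactly low-rank data it recovers the true rank.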
In addition, the pairwise scatter plots among the three contents, illustrated in Figure 3, suggest substantial multicollinearity between the fat and water contents and a linear relationship between the protein content and the fat and/or water content. Therefore, only the fat content is used in the parametric part when predicting the protein content. We then apply FSAM-COSSO to estimate and select nonparametric components while estimating the effect of the fat content on the prediction of the protein content. The tuning parameter λ_0 in the iterative updating algorithm is selected by sixfold cross-validation. Fivefold cross-validation suggests that 13 is an optimal choice for M. The estimated nonparametric components are displayed in Figure S2 in the Online Supplement; 15 nonparametric components, f̂_1, ..., f̂_8, f̂_11, f̂_13, ..., f̂_18, are selected from the 20 components. The estimated coefficient of the fat content is -0.19, consistent with the negative correlation between the protein and fat contents shown in the bottom right panel of Figure 3.
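Tuning parameters such as λ_0 and M above are chosen by minimizing a K-fold cross-validation error over a grid of candidates. A minimal sketch of that selection loop, with the number of retained FPC scores standing in for the actual COSSO tuning parameter (the function names are illustrative, not from the paper's code):

```python
import numpy as np

def cv_error(X, y, M, n_folds=5, seed=0):
    """K-fold cross-validated MSE of a least-squares fit that uses only
    the first M columns of X (here: the first M FPC scores)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, n_folds)
    errs = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        beta, *_ = np.linalg.lstsq(X[train, :M], y[train], rcond=None)
        errs.append(np.mean((y[fold] - X[fold, :M] @ beta) ** 2))
    return float(np.mean(errs))

def choose_tuning(X, y, candidates):
    """Pick the candidate value with the smallest cross-validation error."""
    return min(candidates, key=lambda M: cv_error(X, y, M))
```

The same skeleton applies whether the candidate grid indexes a penalty level (as for λ_0 and M in the paper) or a model size, as here.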
Figure 3: The top left panel shows the spectral trajectories recorded on the Tecator Infratec Food and Feed Analyzer. The other three panels depict the pairwise scatter plots among the three contents (fat, water and protein).

MSPE and the quasi-R² of [36] are calculated on the test data set to compare the various methods; the latter is defined by R² = 1 − Σ_{i=1}^n (y_i − ŷ_i)² / Σ_{i=1}^n (y_i − ȳ)². The last 10 FPCs actually explain less than 0.01% of the total variability in the spectral curves. However, they play a critical role in predicting the protein content, justifying why a sufficiently large number of principal components should be retained initially. To demonstrate the importance of the last 10 FPCs, the same model (FSAM-COSSO) is fit with only the first 10 FPCs initially retained. The model neglecting the last 10 FPCs appears to be considerably inferior to the model retaining 20 FPCs initially, in terms of both MSPE and R². We also fit several alternative methods such as MARS and FSAM-GAMS to discover the relationship between the protein content and the explanatory variables, and compare these models with FSAM-COSSO in prediction accuracy and the ability to explain the variability in the response. The effect of the FPCs retained initially on the prediction of the protein content is examined in the alternative methods as well. Furthermore, to investigate how prediction accuracy can be enhanced by incorporating the fat content in the prediction models, models regressing the protein content only on the initially retained ζ̂_i's are fitted and then compared to the corresponding models where both the ζ̂_i's and the fat content are considered as explanatory variables. Table 4 summarizes the prediction errors and proportions of variance explained on the test set for all methods. It can be observed from the table that retaining a sufficiently large number of FPCs initially improves prediction accuracy to a great extent, even though the last 10 FPCs make only a negligible contribution to capturing the variance of the spectral curves.
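The two comparison criteria can be computed directly from test-set predictions. A small helper, following the quasi-R² definition above (illustrative code, not from the paper):

```python
import numpy as np

def mspe(y, y_hat):
    """Mean squared prediction error on a test set."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean((y - y_hat) ** 2))

def quasi_r2(y, y_hat):
    """Quasi-R^2 as defined in the text:
    1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(1 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2))
```

Note that quasi-R² centers by the test-set mean ȳ, so a model no better than predicting that mean scores near zero, and a model worse than it scores negative.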
In addition, each model accounting for the effect of the fat content outperforms its counterpart that does not incorporate this effect, in terms of both prediction accuracy and explaining variability in the response variable. Last but not least, the proposed method demonstrates a far superior ability to predict the response and explain its variance compared with the other methods.
Table 4: Summary of the prediction error and proportion of variance explained on the test set for each model. FAM represents the functional additive model [24] where only the ζ̂_i's are considered as explanatory variables. MARS0 denotes the MARS model considering only the ζ̂_i's as explanatory variables while neglecting the effect of the fat content. d = 10 and d = 20 indicate that 10 and 20 leading FPCs are initially retained, respectively. For each of d = 20 and d = 10, the table reports the MSPE and R² of FSAM-COSSO, CSEFAM, FSAM-GAMS, FAM, MARS and MARS0.

4.2. ADHD data

Attention deficit hyperactivity disorder (ADHD) is the most prevalent neurodevelopmental disorder in school-age children and adolescents (Feldman and Reiff [9]). The key symptoms of ADHD comprise inattention, hyperactivity and impulsivity. Due to the lack of objective measurements in diagnosis, there have been critical concerns about the appropriate diagnosis of ADHD, which is associated with substantial social and economic costs (Ford-Jones [10]). The data were obtained from the ADHD-200 Sample Initiative Project, which aimed to seek objective biological tools like neuroimaging signals to aid the diagnosis of ADHD. Our analysis is based on the data collected from the New York University (NYU) Child Study Center, one of the eight sites in the project. The dataset consists of two main parts. The first part is filtered preprocessed resting-state data using an anatomical automatic labeling atlas, which parcellates the brain into 116 Regions of Interest (ROIs). In each region, the mean blood-oxygen-level dependent (BOLD) signal was recorded at 172 equally spaced time points. The second part is composed of phenotypic features like gender, handedness, diagnosis of ADHD, medication status, ADHD index and IQ measures. A more detailed description of the data can be found in Bellec et al. [2]. Our objective is to use the BOLD signals and phenotypic features to predict the ADHD index, a measure which can reflect the overall level of ADHD symptoms [5].
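The comparison on the ADHD data rests on averaging test-set MSPE over repeated random train/test splits. A minimal sketch of that protocol, with ordinary least squares standing in for the richer competing models (function and variable names are ours):

```python
import numpy as np

def repeated_split_mspe(X, y, n_train, n_splits=100, seed=0):
    """Average test-set MSPE of a least-squares fit over repeated
    random train/test splits of the n available subjects."""
    rng = np.random.default_rng(seed)
    n = len(y)
    errs = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        tr, te = perm[:n_train], perm[n_train:]
        beta, *_ = np.linalg.lstsq(X[tr], y[tr], rcond=None)
        errs.append(np.mean((y[te] - X[te] @ beta) ** 2))
    return float(np.mean(errs))
```

For the ADHD analysis below this corresponds to n = 120 subjects, n_train = 100 and n_splits = 100.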
We focus on 120 subjects in our analysis after removing measurements which failed quality control. The functional predictor is taken as the BOLD signals of the 91st to 108th regions, because they are parcellations of the cerebellum region, which was found to play a role in the development of ADHD [3]. To compare the prediction performance of each method, the 120 subjects are randomly divided into a training set with 100 subjects and a test set with the other 20 subjects. Following this rule, we randomly split the data into training and test sets 100 times. Table 5 summarizes the mean squared prediction error across the 100 splits for each method.

Table 5: Summary of the prediction error on the test set for each model. FLR represents the functional linear model where only the ζ̂_i's are considered as explanatory variables. For both FSAM-PFLR and FLR, the number of retained FPCs is chosen via AIC. MARS0 denotes the MARS model considering only the ζ̂_i's as explanatory variables while neglecting the other phenotypic features. 99.9% and 85% indicate that the first d FPCs initially retained explain 99.9% and 85% of the variability of the curves in the training set, respectively. For each of these two levels, the table reports the MSPE of FSAM-COSSO, CSEFAM, FSAM-PFLR, FLR, MARS and MARS0.

FSAM-COSSO
turns out to be substantially superior to the other methods in terms of prediction accuracy. In addition, accounting for the effects of the phenotypic features greatly improves prediction accuracy for each method. Moreover, for methods other than FSAM-COSSO, retaining a large number of FPC scores initially may impair prediction of the ADHD index, as may be seen by comparing the upper part with the lower part of Table 5. The primary reason for this might be that the BOLD signal is not a strong predictor of the ADHD index; thus incorporating more FPC scores adds considerable prediction variability while making little contribution to reducing bias. However, the negligible difference in the prediction accuracy of FSAM-COSSO between these two scenarios suggests that the proposed method manages to reduce variance via component selection and thus achieves a better trade-off between bias and variance. As a result, the proposed method can still achieve a satisfactory performance in prediction even though a large number of irrelevant FPC scores are retained initially.

5. Asymptotic properties

The following theorem is given only for the case when the true scaled FPC scores ζ are known. It would be desirable to establish the corresponding theorem with the estimated scaled FPC scores, when the functional data are densely observed and may even be contaminated with measurement errors. This problem was considered by Zhu et al. [36], where there were no scalar covariates. Thus it would be natural for us to follow their ideas to develop the theorem. Unfortunately we are not able to follow the proof of their Lemma 2, where a dual problem is used to show that the penalty term J(f) is bounded by a constant independent of the sample size. This statement is essential to show that the derivative of the estimated function is uniformly bounded for each argument. We are thus not able to derive their subsequent results without a bounded derivative. For this reason we present the following theorem assuming the true scaled scores are known.
We observe, however, that the simulation studies show no remarkable differences in performance when the true scores are replaced by estimates. Thus, although we expect that our theorem should extend to estimated scores, we have not succeeded in doing so. We now set out conditions to establish Theorem 1 given that the true scaled FPC scores are known. Suppose that n iid observations are generated from the following model:

y_i = f_0(ζ_i) + α_0' z_i + ε_i,

where ζ_i ∈ [0, 1]^d, α_0 ∈ R^p, and f_0 is assumed to be an element of F_d = {1} ⊕ (⊕_{j=1}^d H̄) (again the sum indicates a direct sum of d copies of H̄), with H = {1} ⊕ H̄ being the lth-order Sobolev space on [0, 1] with the norm defined in (5). Note that the estimated nonparametric part f̂_n and the estimated coefficient α̂ in the parametric part are defined as the solution of the following minimization problem:

(f̂_n, α̂) = argmin_{f ∈ F_d, α ∈ R^p} (1/n) Σ_{i=1}^n {y_i − f(ζ_i) − α'z_i}² + τ_n² J(f).

Consequently, the estimate of the conditional expectation function of y, g_0(ζ, z) = f_0(ζ) + α_0' z, is defined as

ĝ_n = f̂_n + α̂' z.

The empirical norm of g is defined as

‖g‖_n = {(1/n) Σ_{i=1}^n g²(ζ_i, z_i)}^{1/2}  for g ∈ G = {g : g(ζ, z) = f(ζ) + α'z, f ∈ F_d, α ∈ R^p}.

Define h(ζ) = E(z | ζ) and z̃ = z − h(ζ). Let Λ_min(A) and Λ_max(A) denote the minimal and maximal eigenvalues of a matrix A, respectively. The following assumptions are needed.

(A.1) Both ζ and z are statistically independent of ε. Furthermore, E(ε) = 0 and max{E(z̃^(1)), ..., E(z̃^(p))} < ∞, where z̃^(j) denotes the jth component of z̃.
(A.2) Λ_max[var{h(ζ)}] < ∞ and 0 < Λ_min{var(z̃)} ≤ Λ_max{var(z̃)} < ∞. Obviously 0 < Λ_min{var(z)} ≤ Λ_max{var(z)} < ∞ under (A.2).

(A.3) The ε_i's are (uniformly) sub-Gaussian, i.e., there exist constants K and σ_0² such that K²(E e^{ε_i²/K²} − 1) ≤ σ_0².

(A.4) The support of z is compact in R^p.

The main result is the following.

Theorem 1. Assume that Assumptions (A.1)–(A.3) hold.
(i) If 0 < J(f_0) < ∞ and τ_n^{−1} = n^{l/(2l+1)} {J(f_0)}^{(2l−1)/(4l+2)}, then we have ‖ĝ_n − g_0‖_n = {J(f_0)}^{1/(2l+1)} O_P{n^{−l/(2l+1)}} and J(f̂_n) = J(f_0) O_P(1).
(ii) If f_0 is a constant, i.e., J(f_0) = 0, and τ_n^{−1} = n^{1/4}, then we have ‖ĝ_n − g_0‖_n = O_P(n^{−1/2}) and J(f̂_n) = O_P(n^{−1/2}).

Corollary 1. If, in addition to Assumptions (A.1)–(A.3), Assumption (A.4) holds as well, then in either case (i) or (ii) we have ‖f̂_n − f_0‖_n = O_P{n^{−l/(2l+1)}} and ‖α̂ − α_0‖_E = O_P{n^{−l/(2l+1)}}, where ‖·‖_E denotes the Euclidean norm of a vector.

Proofs of Theorem 1 and Corollary 1 are given in the Online Supplement.

6. Conclusions and discussions

Semiparametric additive models are known to possess the flexibility of a nonparametric model and the interpretability of a parametric model. In this paper, we propose a functional semiparametric additive model in which a scalar response is regressed on a functional covariate and finite-dimensional scalar covariates. To achieve flexibility and interpretability simultaneously, the effect of the functional covariate on the mean response is modeled in the framework of FAM, where the additive components are functions of scaled FPC scores, and a linear relationship is assumed between the mean response and the scalar covariates. We also develop an estimation method for both the nonparametric and parametric parts of the proposed model. The estimation procedure consists of three important steps. First, FPCA or PACE is employed to estimate the FPC scores of the functional covariate, which may be subject to measurement errors. Second, we adopt a special regularization scheme (COSSO) to penalize the additive components so as to smooth and select the non-vanishing components.
Third, to address the interdependence between the estimated nonparametric part and the parametric part, we propose an iterative updating algorithm, which is similar in spirit to the EM algorithm. We show that choosing a sufficiently large number of FPCs is essential. On the one hand, this accounts for a great proportion of the variability in the functional covariate. On the other hand, retaining a sufficiently large number of FPCs largely avoids neglecting predictive FPC scores with small variances, since there is no guarantee that the leading FPC scores are necessarily more relevant to the response. The importance of retaining a sufficiently large number of FPCs is demonstrated in the application to the Tecator data, where retaining a smaller number of FPCs results in substantially greater prediction errors. The applications also show that incorporating the effect of scalar covariates can enhance prediction accuracy compared with models that only account for the effect of the functional covariate, when the scalar covariates are predictive of the response variable. The asymptotic theory in our article is based on the assumption that the true scaled FPC scores are known, but in practice these are unavailable. We provide an algorithm for estimating the FPC scores from observed curves, which may be subject to measurement errors, and then estimating both the nonparametric and parametric parts of the model using the estimated FPC scores. It would be desirable to extend the theory to the case where the true scaled FPC scores are not observable. The simulation study suggests that the estimates are still quite close to the true nonparametric and parametric parts when the FPC scores are estimated. Even though this work focuses on regressing a scalar response on a functional covariate and another finite-dimensional covariate, the methodology can be extended to accommodate other scenarios. For example, the framework may be extended to fit a generalized functional semiparametric additive model in which the distribution of the response variable of interest belongs to an exponential family.
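The iterative updating algorithm in the third step alternates between fitting the nonparametric part on partial residuals given the current α̂, and refitting α̂ by least squares given the current f̂. A toy sketch of that alternation, with an unpenalized polynomial fit standing in for the COSSO-penalized additive fit (all names are hypothetical, and the real nonparametric step is far more involved):

```python
import numpy as np

def fit_semiparametric(zeta, z, y, degree=3, n_iter=10):
    """Illustrative alternating scheme for y = f(zeta) + alpha' z + eps.
    Polynomial least squares stands in for the penalized additive fit."""
    B = np.vander(zeta, degree + 1)   # basis for the nonparametric part
    alpha = np.zeros(z.shape[1])
    for _ in range(n_iter):
        # (1) fit f on the partial residuals, holding alpha fixed
        c, *_ = np.linalg.lstsq(B, y - z @ alpha, rcond=None)
        f_hat = B @ c
        # (2) update alpha by least squares on the remaining residuals
        alpha, *_ = np.linalg.lstsq(z, y - f_hat, rcond=None)
    return f_hat, alpha
```

Because each step solves a least-squares subproblem of a jointly convex criterion, the alternation converges to the joint minimizer, mirroring the monotone-descent behavior of the actual algorithm.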
In addition, more than one functional covariate can be incorporated into the model.
More informationBASD HIGH SCHOOL FORMAL LAB REPORT
BASD HIGH SCHOOL FORMAL LAB REPORT *WARNING: After an explanatin f what t include in each sectin, there is an example f hw the sectin might lk using a sample experiment Keep in mind, the sample lab used
More information[COLLEGE ALGEBRA EXAM I REVIEW TOPICS] ( u s e t h i s t o m a k e s u r e y o u a r e r e a d y )
(Abut the final) [COLLEGE ALGEBRA EXAM I REVIEW TOPICS] ( u s e t h i s t m a k e s u r e y u a r e r e a d y ) The department writes the final exam s I dn't really knw what's n it and I can't very well
More informationCHAPTER 24: INFERENCE IN REGRESSION. Chapter 24: Make inferences about the population from which the sample data came.
MATH 1342 Ch. 24 April 25 and 27, 2013 Page 1 f 5 CHAPTER 24: INFERENCE IN REGRESSION Chapters 4 and 5: Relatinships between tw quantitative variables. Be able t Make a graph (scatterplt) Summarize the
More informationWRITING THE REPORT. Organizing the report. Title Page. Table of Contents
WRITING THE REPORT Organizing the reprt Mst reprts shuld be rganized in the fllwing manner. Smetime there is a valid reasn t include extra chapters in within the bdy f the reprt. 1. Title page 2. Executive
More informationCHAPTER 3 INEQUALITIES. Copyright -The Institute of Chartered Accountants of India
CHAPTER 3 INEQUALITIES Cpyright -The Institute f Chartered Accuntants f India INEQUALITIES LEARNING OBJECTIVES One f the widely used decisin making prblems, nwadays, is t decide n the ptimal mix f scarce
More informationthe results to larger systems due to prop'erties of the projection algorithm. First, the number of hidden nodes must
M.E. Aggune, M.J. Dambrg, M.A. El-Sharkawi, R.J. Marks II and L.E. Atlas, "Dynamic and static security assessment f pwer systems using artificial neural netwrks", Prceedings f the NSF Wrkshp n Applicatins
More information(1.1) V which contains charges. If a charge density ρ, is defined as the limit of the ratio of the charge contained. 0, and if a force density f
1.0 Review f Electrmagnetic Field Thery Selected aspects f electrmagnetic thery are reviewed in this sectin, with emphasis n cncepts which are useful in understanding magnet design. Detailed, rigrus treatments
More informationCOMP 551 Applied Machine Learning Lecture 5: Generative models for linear classification
COMP 551 Applied Machine Learning Lecture 5: Generative mdels fr linear classificatin Instructr: Herke van Hf (herke.vanhf@mail.mcgill.ca) Slides mstly by: Jelle Pineau Class web page: www.cs.mcgill.ca/~hvanh2/cmp551
More informationModelling of Clock Behaviour. Don Percival. Applied Physics Laboratory University of Washington Seattle, Washington, USA
Mdelling f Clck Behaviur Dn Percival Applied Physics Labratry University f Washingtn Seattle, Washingtn, USA verheads and paper fr talk available at http://faculty.washingtn.edu/dbp/talks.html 1 Overview
More informationFebruary 28, 2013 COMMENTS ON DIFFUSION, DIFFUSIVITY AND DERIVATION OF HYPERBOLIC EQUATIONS DESCRIBING THE DIFFUSION PHENOMENA
February 28, 2013 COMMENTS ON DIFFUSION, DIFFUSIVITY AND DERIVATION OF HYPERBOLIC EQUATIONS DESCRIBING THE DIFFUSION PHENOMENA Mental Experiment regarding 1D randm walk Cnsider a cntainer f gas in thermal
More informationMidwest Big Data Summer School: Machine Learning I: Introduction. Kris De Brabanter
Midwest Big Data Summer Schl: Machine Learning I: Intrductin Kris De Brabanter kbrabant@iastate.edu Iwa State University Department f Statistics Department f Cmputer Science June 24, 2016 1/24 Outline
More informationHow do scientists measure trees? What is DBH?
Hw d scientists measure trees? What is DBH? Purpse Students develp an understanding f tree size and hw scientists measure trees. Students bserve and measure tree ckies and explre the relatinship between
More informationContributions to the Theory of Robust Inference
Cntributins t the Thery f Rbust Inference by Matías Salibián-Barrera Licenciad en Matemáticas, Universidad de Buens Aires, Argentina, 1994 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
More informationBOUNDED UNCERTAINTY AND CLIMATE CHANGE ECONOMICS. Christopher Costello, Andrew Solow, Michael Neubert, and Stephen Polasky
BOUNDED UNCERTAINTY AND CLIMATE CHANGE ECONOMICS Christpher Cstell, Andrew Slw, Michael Neubert, and Stephen Plasky Intrductin The central questin in the ecnmic analysis f climate change plicy cncerns
More informationTechnical Bulletin. Generation Interconnection Procedures. Revisions to Cluster 4, Phase 1 Study Methodology
Technical Bulletin Generatin Intercnnectin Prcedures Revisins t Cluster 4, Phase 1 Study Methdlgy Release Date: Octber 20, 2011 (Finalizatin f the Draft Technical Bulletin released n September 19, 2011)
More informationChE 471: LECTURE 4 Fall 2003
ChE 47: LECTURE 4 Fall 003 IDEL RECTORS One f the key gals f chemical reactin engineering is t quantify the relatinship between prductin rate, reactr size, reactin kinetics and selected perating cnditins.
More informationWe say that y is a linear function of x if. Chapter 13: The Correlation Coefficient and the Regression Line
Chapter 13: The Crrelatin Cefficient and the Regressin Line We begin with a sme useful facts abut straight lines. Recall the x, y crdinate system, as pictured belw. 3 2 1 y = 2.5 y = 0.5x 3 2 1 1 2 3 1
More informationTree Structured Classifier
Tree Structured Classifier Reference: Classificatin and Regressin Trees by L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stne, Chapman & Hall, 98. A Medical Eample (CART): Predict high risk patients
More informationDead-beat controller design
J. Hetthéssy, A. Barta, R. Bars: Dead beat cntrller design Nvember, 4 Dead-beat cntrller design In sampled data cntrl systems the cntrller is realised by an intelligent device, typically by a PLC (Prgrammable
More information22.54 Neutron Interactions and Applications (Spring 2004) Chapter 11 (3/11/04) Neutron Diffusion
.54 Neutrn Interactins and Applicatins (Spring 004) Chapter (3//04) Neutrn Diffusin References -- J. R. Lamarsh, Intrductin t Nuclear Reactr Thery (Addisn-Wesley, Reading, 966) T study neutrn diffusin
More informationThis section is primarily focused on tools to aid us in finding roots/zeros/ -intercepts of polynomials. Essentially, our focus turns to solving.
Sectin 3.2: Many f yu WILL need t watch the crrespnding vides fr this sectin n MyOpenMath! This sectin is primarily fcused n tls t aid us in finding rts/zers/ -intercepts f plynmials. Essentially, ur fcus
More informationON-LINE PROCEDURE FOR TERMINATING AN ACCELERATED DEGRADATION TEST
Statistica Sinica 8(1998), 207-220 ON-LINE PROCEDURE FOR TERMINATING AN ACCELERATED DEGRADATION TEST Hng-Fwu Yu and Sheng-Tsaing Tseng Natinal Taiwan University f Science and Technlgy and Natinal Tsing-Hua
More informationPhysics 2B Chapter 23 Notes - Faraday s Law & Inductors Spring 2018
Michael Faraday lived in the Lndn area frm 1791 t 1867. He was 29 years ld when Hand Oersted, in 1820, accidentally discvered that electric current creates magnetic field. Thrugh empirical bservatin and
More informationDrought damaged area
ESTIMATE OF THE AMOUNT OF GRAVEL CO~TENT IN THE SOIL BY A I R B O'RN EMS S D A T A Y. GOMI, H. YAMAMOTO, AND S. SATO ASIA AIR SURVEY CO., l d. KANAGAWA,JAPAN S.ISHIGURO HOKKAIDO TOKACHI UBPREFECTRAl OffICE
More informationComparing Several Means: ANOVA. Group Means and Grand Mean
STAT 511 ANOVA and Regressin 1 Cmparing Several Means: ANOVA Slide 1 Blue Lake snap beans were grwn in 12 pen-tp chambers which are subject t 4 treatments 3 each with O 3 and SO 2 present/absent. The ttal
More informationModule 4: General Formulation of Electric Circuit Theory
Mdule 4: General Frmulatin f Electric Circuit Thery 4. General Frmulatin f Electric Circuit Thery All electrmagnetic phenmena are described at a fundamental level by Maxwell's equatins and the assciated
More informationAP Statistics Practice Test Unit Three Exploring Relationships Between Variables. Name Period Date
AP Statistics Practice Test Unit Three Explring Relatinships Between Variables Name Perid Date True r False: 1. Crrelatin and regressin require explanatry and respnse variables. 1. 2. Every least squares
More informationTutorial 4: Parameter optimization
SRM Curse 2013 Tutrial 4 Parameters Tutrial 4: Parameter ptimizatin The aim f this tutrial is t prvide yu with a feeling f hw a few f the parameters that can be set n a QQQ instrument affect SRM results.
More informationA Few Basic Facts About Isothermal Mass Transfer in a Binary Mixture
Few asic Facts but Isthermal Mass Transfer in a inary Miture David Keffer Department f Chemical Engineering University f Tennessee first begun: pril 22, 2004 last updated: January 13, 2006 dkeffer@utk.edu
More informationChapter 15 & 16: Random Forests & Ensemble Learning
Chapter 15 & 16: Randm Frests & Ensemble Learning DD3364 Nvember 27, 2012 Ty Prblem fr Bsted Tree Bsted Tree Example Estimate this functin with a sum f trees with 9-terminal ndes by minimizing the sum
More informationPerfrmance f Sensitizing Rules n Shewhart Cntrl Charts with Autcrrelated Data Key Wrds: Autregressive, Mving Average, Runs Tests, Shewhart Cntrl Chart
Perfrmance f Sensitizing Rules n Shewhart Cntrl Charts with Autcrrelated Data Sandy D. Balkin Dennis K. J. Lin y Pennsylvania State University, University Park, PA 16802 Sandy Balkin is a graduate student
More informationSensible Performance Analysis of Multi-Pass Cross Flow Heat Exchangers
108, 11002 (2017) DOI: 101051/ mateccnf/201710811002 Sensible Perfrmance nalysis f Multi-Pass Crss Flw Heat Exchangers 1 Karthik Silaipillayarputhur, awfiq l-mughanam 2, bdulelah I l-niniya 2 1 PO Bx 380,
More informationEXPERIMENTAL STUDY ON DISCHARGE COEFFICIENT OF OUTFLOW OPENING FOR PREDICTING CROSS-VENTILATION FLOW RATE
EXPERIMENTAL STUD ON DISCHARGE COEFFICIENT OF OUTFLOW OPENING FOR PREDICTING CROSS-VENTILATION FLOW RATE Tmnbu Gt, Masaaki Ohba, Takashi Kurabuchi 2, Tmyuki End 3, shihik Akamine 4, and Tshihir Nnaka 2
More informationPreparation work for A2 Mathematics [2018]
Preparatin wrk fr A Mathematics [018] The wrk studied in Y1 will frm the fundatins n which will build upn in Year 13. It will nly be reviewed during Year 13, it will nt be retaught. This is t allw time
More informationPerturbation approach applied to the asymptotic study of random operators.
Perturbatin apprach applied t the asympttic study f rm peratrs. André MAS, udvic MENNETEAU y Abstract We prve that, fr the main mdes f stchastic cnvergence (law f large numbers, CT, deviatins principles,
More informationParticle Size Distributions from SANS Data Using the Maximum Entropy Method. By J. A. POTTON, G. J. DANIELL AND B. D. RAINFORD
3 J. Appl. Cryst. (1988). 21,3-8 Particle Size Distributins frm SANS Data Using the Maximum Entrpy Methd By J. A. PTTN, G. J. DANIELL AND B. D. RAINFRD Physics Department, The University, Suthamptn S9
More informationHomology groups of disks with holes
Hmlgy grups f disks with hles THEOREM. Let p 1,, p k } be a sequence f distinct pints in the interir unit disk D n where n 2, and suppse that fr all j the sets E j Int D n are clsed, pairwise disjint subdisks.
More informationPure adaptive search for finite global optimization*
Mathematical Prgramming 69 (1995) 443-448 Pure adaptive search fr finite glbal ptimizatin* Z.B. Zabinskya.*, G.R. Wd b, M.A. Steel c, W.P. Baritmpa c a Industrial Engineering Prgram, FU-20. University
More informationAerodynamic Separability in Tip Speed Ratio and Separability in Wind Speed- a Comparison
Jurnal f Physics: Cnference Series OPEN ACCESS Aerdynamic Separability in Tip Speed Rati and Separability in Wind Speed- a Cmparisn T cite this article: M L Gala Sants et al 14 J. Phys.: Cnf. Ser. 555
More informationEngineering Approach to Modelling Metal THz Structures
Terahertz Science and Technlgy, ISSN 1941-7411 Vl.4, N.1, March 11 Invited Paper ngineering Apprach t Mdelling Metal THz Structures Stepan Lucyszyn * and Yun Zhu Department f, Imperial Cllege Lndn, xhibitin
More information