Generalized Functional Concurrent Model

Size: px

Start display at page:

Download "Generalized Functional Concurrent Model"

Delilah Rosemary Evans
6 years ago
Views:

1 Biometrics, Generalized Functional Concurrent Model Janet S. Kim, Arnab Maity, and Ana-Maria Staicu Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695, U.S.A. * jskim3@ncsu.edu ** amaity@ncsu.edu *** ana-maria staicu@ncsu.edu Summary: We propose a flexible regression model to study the association between a functional response and a functional covariate that are observed on the same domain. Our modeling describes the relationship between the mean current response and the covariate by a smooth unknown bi-variate function that depends on both the current value of the covariate and the time point itself. We develop estimation methodology that accommodates realistic scenarios where the covariates are sampled with or without error on a sparse and/or irregular design, and prediction that accounts for unknown model correlation structure. In this framework we also discuss the problem of testing the null hypothesis that the covariate has no association with the response. The proposed methods are evaluated numerically through simulations and two real data applications. Key words: Functional data analysis; F-test; Generalized Functional Concurrent Model; Penalized B-splines; Prediction. This paper has been submitted for consideration for publication in Biometrics

2 Generalized Functional Concurrent Model 1 1. Introduction Regression problems where both the response and the covariate are functional have become more and more common in many scientific fields including medicine, finance, agriculture to name a few. Such regression problems are often called function-on-function regression, and their primary goal is to investigate the association between the response and predictor functional variables. In this article, we propose a flexible concurrent model that relate the current response to the predictor using a smooth unknown bivariate function that depends on the current predictor and the specific time-point. We develop estimation, inference and testing procedures for the functional covariate effect. The performance of the methods is evaluated numerically in finite samples and two data applications, gait study (Olshen et al., 1989; Ramsay and Silverman, 2005) and dietary calcium absorption study (Davis, 2002; Sentürk and Nguyen, 2011). Functional regression models where both the response and the covariate are functional have been long researched in the literature. Functional linear models rely on the assumption that the current response depends on full trajectory of the covariate. The dependence is modeled through a weighted integral of the full covariate trajectory through an unknown bivariate coefficient surface as weight. Estimation procedures for this model have been discussed in Ramsay and Silverman (2005), Yao, Müller, and Wang (2005b) and Wu, Fan, and Müller (2010), among others. The crucial dependence assumption for this type of models may be impractical in many real data situations. To bypass this limitation, one might use the functional historical models (see e.g., Malfait and Ramsay (2003)), where the current response is modeled using only the past observations of the covariate. Such models quantify the relation between the response and the functional covariate/s using a linear relationship via an unknown bivariate coefficient function. Another alternative is to assume a concurrent relationship, where the current response is modeled based on only the current value of the

3 2 Biometrics, 2015 covariate function. Functional linear concurrent models assume a linear relationship; they can be thought of a series of linear regressions for each time point, with the constraint that the regression coefficient is a univariate smooth function of time. Estimation procedure for such models have been described in Ramsay and Silverman (2005) and Sentürk and Nguyen (2011). While the linear approach provides easy interpretation for the estimated coefficient function, it may not capture all the variability of the response in practical situations where the underlying relationship is complex. In this article, we consider the functional concurrent model where we allow the relationship between the response and the covariate function to nonlinear. Specifically, we propose a model where the value of the response variable at a certain time point depends on both the time point and the covariate value at that time point through a smooth bivariate unknown function. Such formulation allows us to capture potential complex relationships between response and predictor functions as well as better capture out of sample prediction performance, as we will observe in our numerical investigation. Our model contains the standard linear concurrent model as a special case. We will show through numerical study that when the true underlying relationship is linear (that is, the linear concurrent model is in fact optimal), fitting our proposed model does not result in loss of prediction accuracy. On the other hand, when the true relationship is nonlinear, fitting the linear concurrent model results in high bias and loss of prediction accuracy. In this paper, we make two main contribution. First, we propose a general nonlinear functional concurrent model to describe association between two functional variables measured on the same domain, and develop prediction of a new response when only a new covariate and its evaluation points are available. We model the unknown regression function using tensor products of B-spline basis functions and develop a penalized least squares estimation procedure in conjunction with difference penalties (Eilers and Marx, 1996; Li and Ruppert,

4 Generalized Functional Concurrent Model ). General implementation methods of the tensor product splines can be found in Marx and Eilers (2005), Wood (2006a) and Wood (2006b). An important innovation of our methodology is that it accounts for correlated residual structure. Specifically, the estimators are calculated based on a working independence assumption, and the analytical form of their standard error, involving the covariance of the error process, is derived. The unknown covariance is then estimated by employing standard functional principal component analysis (FPCA) based methods (see e.g., Yao et al. (2005b); Di et al. (2009); Goldsmith, Greven, and Crainiceanu (2013)) to the resulted residuals. The final standard error bands are based on the analytical expressions with plug-in estimator of the model covariance. We also develop point-wise prediction intervals for new response curves. Second, we develop a testing procedure for evaluating the global effect of the functional covariate, that is, to test the null hypothesis that the functional covariate has no association with the response variable. We propose a F ratio type test statistic (see e.g., Shen and Faraway (2004); Xu et al. (2011)), and propose a resampling based algorithm to construct the null distribution of the test statistics. Our resampling procedure also takes into account the correlated error structure, and thus maintains nominal type I errors. The paper is organized as follows. In Section 2 we introduce the generalized functional concurrent model (GFCM) and its estimation procedure. Section 3 discusses prediction of new response trajectories. Section 4 deals with resampling based hypothesis testing. Section 5 presents how to implement the proposed model and various extensions. In Section 6, we investigate the empirical performance of our method through a simulation experiments and real data applications. Section 7 provides a brief discussion.

5 4 Biometrics, Generalized Functional Concurrent Model 2.1 Modeling Framework Suppose for i = 1, 2,..., n, we observe {(W ij, t ij ) : j = 1, 2,..., m X,i } and {(Y ik, t ik ) : k = 1, 2,..., m Y,i } where W ij s and Y ik s denote the covariate and response, respectively, observed at points t ij and t ik. It is assumed that t ij, t ik T for all i, j and k, and W ij = X i (t ij ) δ ij where X i ( ) is a real-valued, continuous, square-integrable, random curve defined on the compact interval T ; for convenience we take T = [0, 1] throughout the paper. Here δ ij is the measurement error and it is assumed that δ ij are independent and identically distributed with mean zero and variance τ 2. To illustrate ideas, we first consider W ij = X i (t ij ), which is equivalent to τ 2 = 0 and furthermore that t ij = t j, t ik = t k and furthermore that m X,i = m Y,i = m. We treat Y ik = Y i (t ik ) to emphasize the dependence on the time points t ik. Adaptation of our methods to more realistic scenarios where τ 2 > 0 or the different sampling design for X i s and Y i s are discussed in Section 5.2. The main objective of the paper is to provide flexible association models that relate the response at current time point to the current predictor, and to develop a testing procedure to assess whether there is any association between the predictor and response. We introduce the functional generalized concurrent model Y i (t) = F {X i (t), t} ɛ i (t), (1) where F (, ) is an unknown smooth function defined on R T, and ɛ i ( ) is a mean-zero error process uncorrelated with the predictor X i ( ). The standard functional linear concurrent model (Ramsay and Silverman, 2005; Ramsay, Hooker, and Graves, 2009) is a special case of model in (1) with F (x, t) = β 0 (t) xβ 1 (t), where β 0 ( ) and β 1 ( ) are unknown coefficient functions. We introduce two main innovations in (1). First, the general bivariate function F (, ) allows us to model a potentially complicated relationship between Y i ( ) and X i ( ) and extends the effect of the covariate beyond linearity. Second, we also allow the error

6 Generalized Functional Concurrent Model 5 process ɛ i ( ) to have its own covariance structure unlike the standard concurrent models where it is typically assumed that the errors are independent realizations from a white noise process. To be specific, we assume that cov{ɛ i (s), ɛ i (t)} = G(s, t) where G(, ) is a smooth unknown covariance function. The model dependence F (X i (t), t) is inspired from McLean et al. (2014) who used it to describe flexible association models between a scalar response and one or multiple dense functional covariates. Here we consider its application in cases when the response is functional, with unknown covariance structure, and the covariate is also functional data observed densely as well as sparsely with or without error. In the following we will develop an estimation procedure for the model components F (, ) and G(, ), discuss prediction of a new trajectory, and develop a testing procedure to formally assess the association between the response Y i ( ) and the true latent predictor X i ( ) in this general framework. The inferential procedures account for the nontrivial dependence within the subject. Our model and methodology are presented for the setting involving a single functional covariate; nevertheless they can be extended to incorporate other vector-valued covariates via a linear or smooth effect. 2.2 Estimation We model F (, ) using bivariate basis expansion using tensor product of B-splines as basis functions (see e.g., Marx and Eilers (2005); McLean et al. (2014)). Specifically we write F (x, t) = K x Kt k=1 l=1 θ k,lb X,k (x)b T,l (t), where {B X,k (x) : k = 1, 2,..., K x } and {B T,l (t) : l, 2,..., K t } are B-splines defined on [0,1], and θ k,l are unknown parameters. Then, model (1) can be written as Y i (t) = K x Kt k=1 l=1 θ k,lz i,k,l (t) ɛ i (t), (2) where Z i,k,l (t) = B X,k {X i (t)}b T,l (t), and K x and K t are the number of basis functions used. A larger number of basis functions would result in a better but rougher fit, while a small

7 6 Biometrics, 2015 number of basis functions results in overly smooth estimate. As is typical in the literature, we use rich bases to fully capture the complexity of the function and penalize coefficients to ensure smoothness of the resulting fit. Such strategies are discussed in Marx and Eilers (2005), Wood (2006a), Wood (2006b) and Goldsmith et al. (2012). Define the K x K t -vector Z i (t) = [Z i,k,l (t)] l=1,...,k t k=1,...,k x. Similarly, define the K x K t -vector of unknown coefficients Θ = [θ 1,1,..., θ 1,Kt,..., θ Kx,1,..., θ Kx,K t ] T. We can rewrite (2) as Y i (t) = Z i (t)θ ɛ i (t). This model can be viewed as a linear model with unknown parameter vector Θ, except that Y i ( ) is defined as a smooth function, and that the error process ɛ i ( ) is a mean-zero process with covariance G(, ). To prevent overfitting, we propose to estimate the unknown parameter Θ by a penalized criterion ni=1 Y i ( ) Z i ( )Θ 2 Θ T P Θ, where 2 is defined as common L 2 -norm corresponding to the inner product < f, g >= fg, and P is a penalty matrix containing penalty parameters that regularize the trade-off between the goodness of fit and the smoothness of fit. In practice, we observe Y i ( ) and X i ( ) on only a finite and dense grid of points t 1,..., t m. We can approximate the sum of squares as P ENSS(Θ) = n i=1 mj=1 {Y i (t j ) Z i (t j )Θ} 2 /m Θ T P Θ. (3) The case where the grid of points is irregular and sparse is presented in the Web Appendix. The penalty matrix in (3) is defined as P = λ x P T x P x λ t P T t P t and depends on two penalty parameters λ x and λ t. The parameters λ x and λ t control the smoothness in the direction of x and t, respectively. The row penalty P x = D x IKt and column penalty P t = I Kx Dt as defined as in Eilers and Marx (1996), Marx and Eilers (2005) and McLean et al. (2014). Here

8 Generalized Functional Concurrent Model 7 stands for the Kronecker product, IK is the identity matrix with dimension K, and D x and D t are matrices representing the row and column of second order difference penalties. An explicit form of the estimator Θ is readily available for fixed values of the penalty parameters. Define the m-dimensional vectors of response and covariate Y i = [Y i (t 1 ), Y i (t 2 ),..., Y i (t m )] T and X i = [X i (t 1 ), X i (t 2 ),..., X i (t m )] T. Similarly, define the m-dimensional vectors of errors E i = [ɛ i (t 1 ), ɛ i (t 2 ),..., ɛ i (t m )] T. Define matrices Z i such that the j-th row of Z i is Z i (t j ). The estimator Θ can be computed as Θ = { n i=1 Z T i Z i P } 1 { n i=1 Z T i Y i }. (4) In practice, the penalty parameters λ x and λ t can be chosen based on some appropriate criteria. For example one can use generalized cross validation (GCV), restricted maximum likelihood (REML), and maximum likelihood (ML) to estimate the smoothing parameters. Estimation under (3) can be fully implemented in R using functions of the mgcv package (Wood, 2011). 2.3 Variance Estimation The penalized criterion (3) does not account for the possibly correlated error process. While one can use standard software for solving the penalized least square problem, the default variance estimates, based on working independent error structure, may lead to incorrect inference. The variance of the parameter estimate Θ can be found as var( Θ) n = H{ Zi T GZ i }H T i=1 where H = { n i=1 Zi T Z i P } 1 and G = cov(e i ) = [G(t j, t k )] 1 j,k m is the m m covariance matrix evaluated at the observed time points. To address this problem, we assume that the error process has the form ɛ(t) = r(t) e(t), where r(t) is a zero-mean smooth stochastic process and e(t) is a zero-mean white noise measurement error with variance σ 2. Let Σ(s, t) be the autocovariance function of r(t). It

9 8 Biometrics, 2015 follows that the autocovariance of the random deviation ɛ(t), G(s, t) = Σ(s, t) σ 2 I(s = t) where I( ) is the indicator function, is unknown and needs to be estimated. To this end, we assume that Σ admits a spectral decomposition Σ(s, t) = k 1 φ k (s)φ k (t)λ k where {φ k ( ), λ k } are the eigencomponents. We first compute the residuals ɛ ij = Y i (t j ) Z i (t j ) Θ from the model fit, and employ FPCA methods to estimate φ k ( ), λ k, and σ 2. Then we estimate G by Ĝ(s, t) K k=1 λ k φ k (s) φ k (t) σ 2 I(s = t), for any s, t [0, 1], where K is the number of principal components. This truncation point is typically chosen by setting the percent of variance explained by the first few eigencomponents to pre-specified value such as 90% or 95%. 3. Prediction A main focus in this paper is prediction of unobserved response when a new covariate and its evaluation points are given. Suppose that we wish to predict new, unknown, response Y 0,i (t k ) for k = 1, 2,..., m when new observations X 0,i (t j ) (j = 1, 2,..., m) are given. We typically index subjects from new data by i {1, 2,..., n } to differentiate from the index i {1, 2,..., n} of original observed data. Then we assume that the model Y 0,i (t) = F {X 0,i (t), t} ɛ 0,i (t) (5) still holds for the new data, where the error process ɛ 0,i (t) has the same distributional assumption as ɛ i (t) in (1) and is uncorrelated with the new covariate X 0,i (t). For each subject i = 1,..., n, we predict the new response Y 0,i by Ŷ 0,i (t) = K x Kt θ k=1 l=1 k,l Z 0,i,k,l(t), where Z 0,i,k,l(t) = B X,k {X 0,i (t)}b T,l (t), and θ k,l are estimated based on (4). Uncertainty in the prediction depends on how small the difference is between the predicted response Ŷ0,i (t) and the true response Y 0,i (t). We follow an approach similar to Ruppert,

10 Generalized Functional Concurrent Model 9 Wand, and Carroll (2003) to estimate the prediction variance. Specifically, conditional on the new covariate, we have var{y 0,i (t) Ŷ0,i (t)} = var{ɛ 0,i (t)} var{ŷ0,i (t)}. Note that ɛ 0,i (t) in (5) is unobserved error process. However we can calculate the variance of the ɛ 0,i (t) relying on the assumption that the error process in the prediction set have the same covariance structure as in the original observed data set. Define Z 0,i = [Z 0,i,k,l(t)] l=1,...,kt k=1,...,k x. Then the prediction variance becomes var{y 0,i Ŷ 0,i } = G Z 0,i H{ n i=1 Zi T GZ i }H T Z0,i T, (6) where the j-th row of Z 0,i is Z 0,i (t j ), G is the m m covariance matrix evaluated at the observed time points. Note that this calculation involves both indices, i of the new data and i of the original data. The prediction variance can be estimated by plugging-in the sample estimate of G(, ) in (6). One can further define a 100(1 α)% point-wise prediction interval for the new observation Y 0,i (t) by C 1 α,i (t) = Ŷ0,i (t) ± Φ 1 (1 α/2)[ var{y 0,i (t) Ŷ0,i (t)}]1/2 where Φ( ) is the standard Gaussian cumulative distribution function (cdf). In the more general case when the new covariates X 0,i (t j ) are only observed on a sparsely sampled grid, we first compute the FPCA scores for the new covariates via the conditional expectation formula (see e.g., Yao et al. (2005a)) with the estimated eigenfunctions from the training data, and construct smooth versions of the new covariates. Then the procedure described above can be readily applied for prediction. 4. Hypothesis Testing In many situations, testing for association between the response and predictor variables is as important, if not more, as it is to estimate the model components. Often before performing

11 10 Biometrics, 2015 any estimation, it is preferred to test for association first to determine whether there is association to begin with and then a more in-depth analysis is done to determine the form of the relationship. In this section we consider the problem of testing whether the functional predictor variable is associated with the response. Specifically, we want to test H 0 : F {X(t), t} = F 0 (t) versus H 1 : F {X(t), t} = F 1 {X(t), t}, where F 0 (t) is a univariate function of t, and F 1 (x, t) is the bivariate function of x and t. Our testing procedure is based on first modelling the null effect F 0 (t) and then the full model F 1 (x, t) using basis function expressions in a manner that ensures that the null model is nested within the full model. Specifically, we propose to use B T = {B T,0 (t) = 1, B T,l (t), l 1} to model F 0 ( ) under the null model, where B T,l (t) (l 1) are the B- splines evaluated at time point t. To model F 1 (, ) under the full model, we use the same set of basis functions B T for t, but B X = {B X,0 (x) = 1, B X,l (x), l 1} for x, where B X,l (x) (l 1) are the B-splines evaluated at x. Then under the full model, we can write F 1 (x, t) = F 0 (t) k=1 l=0 B X,k (x)b T,l (t)θ k,l. We propose to compare the two models using the F ratio T n = (RSS 0 RSS 1 )/(df 1 df 0 ), (7) RSS 1 /(N df 1 ) where RSS i and df i (i = 0, 1) are the residual sums of squares and the effective degrees-offreedom (Wood, 2006a), respectively, under the null and the full models. Here N denotes the total number of observed data points. In this case, we have n subjects and m observations per subject, and thus, the total number of observed data becomes N = nm. This can be easily generalized when each subject have different number of observations. However, traditional F tests only make sense when asymptotics of test statistics are evident. In our cases, it is difficult to derive asymptotics of proposed test statistic in (7) due to potential uncertainty caused by applying smoothing techniques and penalization. Mostly, the degrees-of-freedom

12 Generalized Functional Concurrent Model 11 in test statistics will be affected by the penalization, and one might not achieve good level and power of a test by simply using the critical values from an F distribution. We instead propose to use a resampling based algorithm to construct an empirical sampling distribution for the test statistic; the steps are as follows. (1) Fit the full model F 1 (x, t) based on the estimation procedure of the GFCM, and obtain residuals { ɛ i (t j )} n i=1 by calculating Y i (t j ) Ŷi(t j ) for each i and j. (2) Fit the null model F 0 (t), and obtain its estimate F 0 (t). (3) Calculate the value of the test statistic in (7) based on the null and the full model fits; call this value T n,obs. (4) Resample B sets of bootstrap residuals E b(t) = { ɛ b,i(t)} n i=1 (b = 1, 2,..., B) with replacement from the residuals { ɛ i (t)} n i=1 obtained in the first step. (5) For b = 1, 2,..., B, generate response curves under the null model as Y b,i = F 0 (t) ɛ b,i(t). This provides B bootstrap data sets {X i (t), Y 1,i(t)} n i=1, {X i (t), Y 2,i(t)} n i=1,..., {X i (t), Y B,i(t)} n i=1. (6) Given the B bootstrap data sets, fit the null and the full models, and evaluate the test statistic in (7) for each data set, {T b } B b=1. These can be viewed as realizations from the distribution of T n under the assumption that H 0 is true. (7) Compute the p-value by p = B 1 B b=1 I{T b T n,obs }. Our proposed resampling algorithm has two advantages. First, the exact form of the null distribution of the test statistic T n is not required; the resampled version of the test statistic approximates the null distribution automatically. Second, our algorithm allows us to account for correlated error process ɛ( ). This is automatically done by sampling the entire residual vectors for each subject; and thus preserving the correlation structure within the residuals. We observed in our numerical study (results not shown) that preserving such correlation

13 12 Biometrics, 2015 structure is of particular importance, as ignoring the correlation results in severely inflated type I error. 5. Model Implementation and Extensions 5.1 Transformation of Functional Covariates One challenge with our estimation approach is that some B-splines might not have observed data on its support; this issue has been also emphasized in McLean et al. (2014). This problem typically arises when the realizations of the functional covariate X i (t j ) are not dense over R. To bypass this difficulty, we propose to first apply point-wise center/scaling transformation of the covariates; it is worthwhile to note that McLean et al. (2014) used a different approach to address this problem. In the generalized concurrent model framework, we define point-wise center/scaling transformation of X(t) by X (t) = {X(t) µ X (t)}/σ X (t) where µ X (t) and σ X (t) are mean and standard deviation of X(t). One can interpret the transformed covariate X (t) as the amount of standard deviation X(t) is away from the mean at time t. In practice, we estimate the mean and the standard deviation by the sample mean µ X (t) and the sample standard deviation σ X (t), respectively, of the covariates. Thus for a fixed point t j we will obtain realizations of the transformed covariates {Xi (t j )} n i=1 based on the sample mean µ X (t j ) and the sample standard deviation σ X (t j ) at the same point. The GFCM based on the transformed covariate X (t) can be written as Y i (t) = F {X i (t), t} ɛ i (t) = k=1 l=1 θ k,lz i,k,l(t) ɛ i (t), (8) where Z i,k,l(t) = B X,k {X i (t)}b T,l (t) and θ k,l are the parameters to be estimated. Note that Z i,k,l(t) and θ k,l are different from the previous quantities denoted by Z i,k,l (t) and θ k,l in model (2). To emphasize the difference, we introduce a new smooth function F in (8). Let

14 Generalized Functional Concurrent Model 13 Z i (t) = [Z i,k,l(t)] l=1,...,k t k=1,...,k x and Θ = [θ 1,1,..., θ 1,K t,..., θ K x,1,..., θ K x,k t ] T be the updated notations for the model components. Then the parameter estimates Θ can be found as Θ = { n i=1 Zi T Zi P } 1 { n i=1 Zi T Y i } (9) where the j-th row of Z i is Z i (t j ). Let G (, ) = cov{x (s), X (t)}. We compute the variance of Θ as var( Θ ) = H { n i=1 Zi T G Zi }H T, where H = { n i=1 Zi T Zi P } 1, and G is the m m covariance matrix of the process X ( ) evaluated at the observed time points t 1,..., t m. The prediction procedure described in Section 3 proceeds as before with the understanding that one now uses the transformed version of the new covariates. The dependence structure between Y i (t) and X i (t) is, in general, not the same as the one between Y i (t) and X i (t). Hence, the model fit F (x, t) of F (x, t) cannot be used directly to investigate the relationship between the response Y i (t) and the covariate X i (t). Instead, we will compare the response Y i (t) and its estimator Ŷi(t) because the estimator will be consistent without being affected by the transformation of the covariates. If the true relationship between Y i (t) and X i (t) is well captured by the model fit, the difference between Y i (t) and Ŷi(t) will be very small. 5.2 Extensions This section discusses modifications of the methodology that are required by realistic situations. In particular, we consider the case when the functional covariate is observed densely with error, or sparsely with or without noise, as well as when the sparseness of the covariate is different from that of the response. Assume first that functional covariate is observed on a dense and regular grid of points but with error; i.e. the observed predictors W ij s are such that W ij = X i (t j ) δ ij and the deviation δ ij has variance τ 2 > 0. Several approaches have been proposed to adjust for the measurement errors; Zhang and Chen (2007) proposed to first smooth each noisy trajectory using local polynomial kernel smoothing and then estimate the mean and standard deviation

15 14 Biometrics, 2015 of the covariate X i (t ij ) by their sample estimators. The recovered trajectories, say X i ( ) will estimate the latent ones X i ( ) with negligible error. The methodology described in Section 2.2 can be applied with X i ( ) in place of X i ( ). Numerical investigation of this approach is included in the simulation section. Then consider the case that the functional covariate is observed on a sparse and irregular grid of points with measurement error, i.e. W ij = X i (t ij ) δ ij. The common assumption made for this setting is that the number of observations m X,i for each subject is small, but ni=1 {t ij } m X,i j=1 is dense in T = [0, 1]. Reconstructing the latent trajectories X i ( ) is based on employing FPCA for sparse design to the observed W ij s. Yao, Müller, and Wang (2005a) proposed to use local linear smoothers to estimate mean and covariance functions. The latent trajectories are predicted using a finite Karhunen-Loève truncation using estimated eigenvalues/eigenfunctions from the spectral decomposition of the estimated covariance and predicted scores using conditional expectation. This method may be further applied to the response variable when the sampling design of the response is sparse as well. An alternative for the latter situation, is to use the standardized prediction of the covariate at the time points t ij at which the response is observed, X i (t ij ) and then continue the estimation using the data {Y ij, X i (t ij ) : j} n i=1. In preliminary investigation, the former method yielded slightly increased type I error rates; the results included in the simulation section use the latter method, which shows good performance in both estimation and testing evaluation. 6. Numerical Study In this section, we investigate the finite sample performance of our proposed methodology. We investigate the prediction accuracy of our method in Section 6.1. The performance of our proposed test is investigated in Section 6.2. Finally in Section 6.3, we assess practical behavior of our method by applying it to the gait data (Olshen et al., 1989; Ramsay and

16 Generalized Functional Concurrent Model 15 Silverman, 2005; Crainiceanu et al., 2014) and the dietary calcium absorption data (Davis, 2002; Sentürk and Nguyen, 2011) sets. 6.1 Prediction Performance Simulation Design. We aim to investigate the prediction accuracy of our method for both within sample and out of sample prediction. To achieve this, we construct training and test data sets assuming both are independent. We generate the covariate X i (t) of training data set by a nonlinear random process a 0i a 1i 2 sin(πt)a2i 2 cos(πt) where a0i N(0, 1), a 1i N(0, ) and a 2i N(0, ) for i = 1, 2,..., n. Assuming that the test data follows the same distribution as the training data, we generate the covariate X i (t) of test data set for i = 1, 2,..., n. Throughout the study, it is assumed that the covariate X i (t) of the training data are not observed directly. Instead we observe W ij = X i (t)wn(0,τ 2 ), where τ = The response Y i ( ) is generated using the model (1) using two choices for function F (, ): a non-linear version F NL (x, t) = 1 x t 2x 2 t and a linear counterpart F L (x, t) = 1 x t. We also consider three different types of the error process for E i : (1) E 1 i N(0, σ 2 ɛ I mi ); (2) E 2 i N(0, σ 2 ɛ Σ) N(0, σ 2 ɛ I mi ) where Σ has AR ρ (1) structure, and (3) E 3 i ξ i1 2 cos(πt) ξi2 2 sin(πt) N(0, σ 2 ɛ I mi ). We set σ 2 ɛ = 0.8 and ρ = 0.2. The random variables ξ i1 and ξ i2 are uncorrelated and are generated from N(0,2) and N(0, ), respectively. For the training data set, we considered three different sample sizes n = 50, 100 and 300. For each sample size setting, we consider two different sampling designs, dense and sparse. In the dense design, we sample m = 81 equally spaced time points for all i. The sparse design assumes that t ij = t ik for all i and the number of observations m i is randomly selected from {20,..., 31} for each i, preserving 24.7% 38.3% of the data. The time points t ij for j = 1, 2,..., m i are uniformly generated from the dense set {t j = (j 1)/80} 81 j=1. In the

17 16 Biometrics, 2015 Web Appendix, we further present the cases with reduced sparseness. For the test set, we use n = 100 and m = 81. For each of these choices we fit both the standard linear functional concurrent model (FCM) and our proposed generalized functional concurrent model (GFCM). When the true model is nonlinear, i.e. F (x, t) = F NL (x, t), one would expect the GFCM to perform better than FCM. On the other hand, if the true model is in fact linear, i.e. F (x, t) = F L (x, t) and FCM is indeed optimal in this case, such a comparison corresponds to whether the GFCM captures the relationship between the covariate and the response as well as FCM Evaluation Criteria. We perform N = 1000 Monte-Carlo simulations. Our performance measures are the root mean squared prediction error (RMSPE), the integrated coverage probability (ICP) and the integrated length (IL) of prediction bands. Define the in-sample RMSPE by where MSPE (r) = (nm) 1 n i=1 mj=1 {Y (r) i RMSPE in = N r=1 [MSPE (r) ] 1 2 /N, (t ij ) Ŷ (r) (t ij )} 2, and Y (r) (t ij ) and its estimate Ŷ (r) i (t ij ) are from the r-th Monte Carlo simulation. The out-of-sample RMSPE, denoted by RMSPE out, is defined similarly. We approximate (1 α) level point-wise prediction intervals to observe coverage probabilities at the nominal level. The ICP at the (1 α) level is given by i i ICP(1 α) = N n r=1 i =1 T I{Y 0,i (t) C(r) 1 α,i (t)}dt/(nn ), where C (r) 1 α,i (t) are the point-wise prediction intervals from the r-th Monte Carlo simulation and I( ) is the indicator function. We observe particular features of the prediction intervals by measuring their length. The IL of the (1 α) level prediction intervals is defined by IL = N n r=1 i =1 T 2MOE(r) 1 α,i (t)dt/(nn )

18 Generalized Functional Concurrent Model 17 with MOE (r) 1 α,i (t) = Φ 1 (1 α/2) ŝd{y (r) 0,i (t) Ŷ 0,i (t)}. Although the IL of the prediction intervals remains small, the band might fluctuate dramatically at some time point, and this will demonstrate a poor performance of variance estimation. Hence, we examine the range of the estimated standard errors (SE). For the prediction intervals, we define the minimum SE by min(se) = N n r=1 i =1 min{2moe (r) 1 α,i (t)}/(nn ). t We define the maximum SE, max(se), similarly. Then, R(SE)= [min(se), max(se)] provides the range of SE at the (1 α) level. For the prediction intervals, the nominal significance level is set as 0.95, 0.90, and Simulation Results. We obtain both the GFCM and the linear FCM fits using K x = K t = 7 cubic B-splines for x and t. When we apply FPCA, we set 99% of variance to be explained by the first few eigencomponents. Table 1 summarizes the predictive performance of the proposed procedure for different simulation scenarios. We first discuss the case when the true function is nonlinear (the top two panels in Table 1). We observe that RMSPE in and RMSPE out from fitting the GFCM are smaller than those from the linear FCM in all scenarios. The ICPs from the GFCM and the linear FCM are fairly close to the nominal levels of 0.95, 0.90, and However, on an average the linear FCM produces larger intervals, indicated by larger IL values, and the wider R(SE) compared to the GFCM. Such patterns confirm that the variability in prediction is not properly captured by the linear FCM when the true model is nonlinear. For less complicated error patterns (e.g., independent versus AR(1)), both models produce smaller prediction errors. However, GFCM still produces smaller errors compared to linear FCM. Also it is evident from Table 1 that the conclusions made above are true for different sample sizes as well as for dense and sparse sampling designs. We now explore the true linear model case (the bottom two panels in Table 1). When

19 18 Biometrics, 2015 the true model is in fact linear, we would expect that the linear FCM would provide the optimal results as it is the correct model to fit in this situation. From the results in Table 1), we observe that the RMSPE in and the RMSPE out from the GFCM and the linear FCM are almost same in all scenarios considered here. We observe a similar pattern when we compare other measures such as IL, R(SE) and ICP as well. This clearly implies that even when the true model is linear and one fits the more complicated model using the GFCM, prediction accuracy and coverage is not lost. Again, the number of subjects, sparseness of the sampling, and the error structure also slightly affect to the numerical result. Nevertheless, when the true model is linear, the overall predictive performance of the GFCM and the linear FCM is almost identical. To better visualize the results, Figure 1 displays prediction bands for the true nonlinear model (top panel) and for the true linear model (bottom panel) for a a single data set observed densely. In the top panel, the prediction bands for the linear FCM (dashed line) are much wider compared to the bands for the GFCM (solid line). This indicates that variance estimation from the linear FCM is less accurate. In the bottom panel, the prediction bands of the GFCM (grey solid line) and the linear FCM (dashed line) are almost identical indicating a similar performance in terms of the variance estimation when the true model is linear. In summary, from our numerical investigation we observe that when the true model is linear, fitting the GFCM does not result in any substantial loss in prediction accuracy over the FCM. However, when the true model is in fact nonlinear, fitting GFCM results is a significant gain in prediction accuracy over the standard FCM. 6.2 Testing Performance We assess the performance of resampling based tests using the same settings in the previous experiment. The true null model is defined as F BS (x, t, d) = 1 2t t 2 d(xt/8). If d is zero, the true model is a univariate function of time point t; otherwise, the true model depends

20 Generalized Functional Concurrent Model 19 on both x and t. Type I error of the test is investigated by setting d = 0, and power of the test correspond to nonzero values of d. We report the performance of the algorithm based on 1000 Monte Carlo simulations and B = 200 bootstraps for each simulation. Given a null model F 0 (t) = F BS (x, t, 0), we first compare the actual test levels to prescribed nominal levels α = 5% and 10%. In Web Table 1, we provide the rejection probabilities averaged over 1000 replications. It is evident from our results that the actual test levels get closer to the nominal level as the sample size increases. With correlated error structures, the test levels still maintain close to the nominal level both in the dense and sparse designs. We now calculate powers of our proposed test at nominal level α = 5%. We consider true null models with d = 0.1, 1, 2, 3, 4, 5 and 6, and results are presented in Figure 2. It is evident from the figure that for each case, the power increases with respect to the increasing values of d. We also notice that the tests based on the densely sampled design are more powerful than the sparsely sampled design, as one would expect. Nevertheless, one might encounter an important issue when implementing our resampling based algorithm. When calculating T n,obs or Tb, we occasionally observe negative RSS 0 RSS 1 both in the densely and the sparsely sampled designs. For those cases, we set RSS 0 RSS 1 = 0. We also recognize negative df 0 df 1 rarely in the sparse design. Such cases are excluded when calculating the actual test level and the power in all cases. 6.3 Applications Gait Data. In a study of gait deficiency, it is important to identify how the joints in hip and knee interact during a gait cycle (Theologis, 2009). Typically, one represents the timing of events occurring during a gait cycle as a percentage of this cycle, where the initial contact of a foot is recorded as 0% and the second contact of the same foot as 100%. As an illustrative example, we analyze longitudinal measurements of hip and knee angles taken from 39 children as they walk through a single gait cycle (Olshen et al., 1989; Ramsay and

21 20 Biometrics, 2015 Silverman, 2005; Crainiceanu et al., 2014). The hip and knee angles are measured at 20 evaluation points {t j } 20 j=1 in [0,1], which are translated from percent values of the cycle. In Web Figure 1, we display the observed individual trajectories of the hip and knee angles. In the analysis, the hip and knee angles are considered as densely observed functional covariate and functional response, respectively. We first employ our resampling based test to investigate whether the hip angles are associated with the knee angles. We select K x = K t = 11 cubic B-splines for x and t to fit the GFCM, and B = 250 bootstrap replications are used. The bootstrap p-value is computed to be less than 0.004, thus we conclude that the hip angle measured at a specific time point has a strong effect on the knee angle at the same time point. To assess how the hip and knee angles are related to each other we fit our proposed GFCM as well as linear FCM. We assess predictive accuracy by splitting the data into training and test sets of size 30 and 9. Assuming that the hip angles are observed with measurement errors, we smooth the covariate by FPCA and then apply the center/scaling transformation. We compare prediction errors obtained by fitting both the GFCM and the linear FCM. Also, as a benchmark model we further fit a linear mixed effect (LME) model Knee angle ij = (β 0 b 0i ) (β 1 b 1i )Hip angle ij ɛ ij where (b 0i, b 1i ) T are the random coefficients from N(0, R) with some covariance matrix R, and ɛ ij are the errors from N(0, σɛ 2 ). In this format, (b 0i, b 1i ) T and ɛ ij are assumed to be independent. For the GFCM and the linear FCM, we report the RMSPE, the ICP, the IL and the R(SE). For the LME model, we report similar measures, but we take average over the repeated measurements instead of integrating over the time domain. The results are collected in Table 2 (top panel). We observe that the LME model provides a poor predictive performance compared to the others, implying that models in the framework of concurrent regression model are obviously

22 Generalized Functional Concurrent Model 21 better. The GFCM presents slightly better predictive performances than the linear FCM, but the difference is negligible. For instance, prediction errors from the GFCM are smaller than the linear FCM. The R(SE) obtained from the GFCM are narrower at all significance levels. Figure 3, top panel, shows prediction bands obtained from the test data sets, and we only detect a small difference between the GFCM (solid line) and the linear FCM (dashed line). The bottom panel in Figure 3 displays a heat map of predicted surface in the GFCM. It is evident that the relationship between the hip angles and the knee angels is simply linear Dietary Calcium Absorption Data. As a sparse data example, we present an application to dietary calcium absorption data set (Davis, 2002; Sentürk and Nguyen, 2011). In a group of 188 patients, dietary and bone measurement tests are conducted approximately every five year, and calcium intake and calcium absorption are measured with respect to their ages. Patients are aged around 35 and 45 in the beginning of the study, and 1 to 4 repeated measurements are collected for each subject after then. Overall ages are ranging approximately from 35 to 64. In the analysis, ages are transformed into the values in [0,1], and the results are considered as evaluation points of the functions. There are dropouts and missed visits among the patients, yielding a sparse and irregular design of sampling. We investigate a relationship between the calcium intake and the calcium absorption measured at the same time point. We applied both the GFCM and the linear FCM by considering that the calcium intake and the calcium absorption are observed functional covariate and functional response, respectively. In Web Figure 2, we display the observed individual trajectories of the calcium intake and the calcium absorption along the patient s age at the visit. To begin, we employ our resampling based testing procedure to test the null hypothesis of no association. We select K x = K t = 11 cubic B-splines for x and t to fit the GFCM, and B = 250 bootstrap replications are used. The p-value of our test is obtained to be less

23 22 Biometrics, 2015 than 0.004, indicating that the calcium intake measured at a certain time point are strongly associated with the calcium absorption at the same time point. We next the analyze the predictive performance of the GFCM and FCM by forming the training and test sets of size 148 and 40, respectively. For the test data set, we smooth the individual trajectories to recover missed observations, thus the resulting covariate and response are smooth and have all observations on its support. Assuming that the calcium intake has a potential measurement errors, we smooth the sparse and noisy covariate of the training data and then apply the center/scaling transformation. To fit the GFCM, we again select K x = K t = 11 cubic B-splines for x and t. For the linear FCM, we use 11 cubic B- splines. Results of predictive accuracy are presented in Table 2 (bottom panel). In the table, the GFCM and the linear FCM are showing similar RMSPE in and RMSPE out, indicating a simple linear association between the calcium intake and absorption. In Figure 4, we display point-wise prediction intervals/bands for three subjects randomly selected from the test data set. In the figure, there are negligible differences between the GFCM (grey solid lines) and the linear FCM (dashed lines), further demonstrating linearity between the variables. 7. Discussion This article proposes a generalized functional concurrent model (GFCM) when both the response and the covariate are functional data observed on the same domain. Estimation and inference are discussed in realistic scenarios involving settings such as complex correlated error structures as well as sparse and/or irregular design, when the covariate is possibly contaminated with noise. The model and estimation framework can be easily extended to accommodate additional covariates via linear or smooth effects. In this flexible framework, the article considers testing for no covariate effect using F test. Bootstrap is used to estimate the null distribution of the test. The proposed methods are investigated numerically through simulations mimicking realistic settings. The results show that the new framework captures

24 Generalized Functional Concurrent Model 23 non-linear relationships between covariate and response, while it performs similarly to the common functional concurrent model, when the relationship is linear. Formally testing for linearity of the unknown relationship is also an interesting problem for future research. Acknowledgements Maity s research was supported by National Institute of Health grant R00 ES and an North Carolina State University faculty Research and Professional Development grant. Staicu s research was supported by National Institute of Health grant RO1 NS Supplementary Materials Web Appendices, Tables, and Figures referenced in Section 2 and Section 6 are available with this paper at the Biometrics website on Wiley Online Library. The computer code implementing our procedure is available with this paper at under the software section. References Crainiceanu, C. M., Reiss, P., Goldsmith, A. J., Huang, L., Huo, L., and Scheipl, F. (2014). Refund: Regression with Functional Data. R package version Davis, C. S. (2002). Statistical Methods for the Analysis of Repeated Measurements. New York, NY: Springer. Di, C.-Z., Crainiceanu, C., Caffo, B., and Punjabi, N. M. (2009). Multilevel functional principal component analysis. Annals of Applied Statistics 4, Eilers, P. H. C. and Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science 11, Goldsmith, J., Bobb, J., Crainiceanu, C., Caffo, B., and Reich, D. (2012). Penalized functional regression. Journal of Computational and Graphical Statistics 20,

25 24 Biometrics, 2015 Goldsmith, J., Greven, S., and Crainiceanu, C. (2013) Corrected confidence bands for functional data using principal components. Biometrics 69, Li, Y. and Ruppert, D. (2008). On the asymptotics of penalized splines. Biometrika 95, Malfait, N. and Ramsay, J. O. (2003). The historical functional linear model. The Canadian Journal of statistics 31, Marx, B. D. and Eilers, P. H. C. (2005). Multivariate penalized signal regression. Technometrics 47, McLean, M. W., Hooker, G., Staicu, A. M., Scheipl, F., and Ruppert, D. (2014). Functional generalized additive models. Journal of Computational and Graphical Statistics 23, Olshen, R. A., Biden, E. N., Wyatt, M. P., and Sutherland, D. H. (1989). Gait analysis and the bootstrap. The Annals of Statistics 17, Ramsay, J. O., Hooker, G., and Graves, S. (2009). Functional Data Analysis in R and Matlab. New York, NY: Springer. Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis. 2nd edition. New York, NY: Springer. Ruppert, D., Wand, M. P., and Carroll, R. J. (2003). Semiparametric Regression. Cambridge, New York: Cambridge University Press. Sentürk, D. and Nguyen, D. V. (2011). Varying coefficient models for sparse noisecontaminated longitudinal data. Statistica Sinica, 21, Shen, Q. and Faraway, J. (2004). An F test for linear models with functional responses. Statistica Sinica, 14, Theologis, T. N. (2009). Children s Orthopaedics and Fractures (Chapter 6). 3rd edition. London: Springer.

26 Generalized Functional Concurrent Model 25 Wood, S. N. (2006a). Generalized Additive Models: An Introduction with R. Boca Raton, Florida: Chapman and Hall/CRC. Wood, S. N. (2006b). Low-rank scale-invariant tensor product smooths for generalized additive mixed models. Biometrics 62, Wood, S. N. (2011). mgcv: Mixed GAM Computation Vehicle with GCV/AIC/REML Smoothness Estimation. R package version Wu, Y., Fan, J., and Müller, H.-G. (2010). Varying-coefficient functional linear regression. Bernoulli 16, Xu, H., Shen, Q., Yang, X., and Shoptaw, S. (2011). A quasi F-test for functional linear models with functional covariates and its application to longitudinal data. Statistics in Medicine 30, Yao, F., Müller, H.-G., and Wang, J.-L. (2005a). Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association 100, Yao, F., Müller, H.-G., and Wang, J.-L. (2005b). Functional linear regression analysis for longitudinal data. The Annals of Statistics 33, Zhang, J.-T. and Chen, J. (2007). Statistical inference for functional data. The Annals of Statistics 35, [Table 1 about here.] [Table 2 about here.] [Figure 1 about here.] [Figure 2 about here.] [Figure 3 about here.] [Figure 4 about here.]

Supplementary Material to General Functional Concurrent Model

Supplementary Material to General Functional Concurrent Model Janet S. Kim Arnab Maity Ana-Maria Staicu June 17, 2016 This Supplementary Material contains six sections. Appendix A discusses modifications