Generalized Functional Concurrent Model

Size: px
Start display at page:

Download "Generalized Functional Concurrent Model"

Transcription

1 Biometrics, Generalized Functional Concurrent Model Janet S. Kim, Arnab Maity, and Ana-Maria Staicu Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695, U.S.A. * jskim3@ncsu.edu ** amaity@ncsu.edu *** ana-maria staicu@ncsu.edu Summary: We propose a flexible regression model to study the association between a functional response and a functional covariate that are observed on the same domain. Our modeling describes the relationship between the mean current response and the covariate by a smooth unknown bi-variate function that depends on both the current value of the covariate and the time point itself. We develop estimation methodology that accommodates realistic scenarios where the covariates are sampled with or without error on a sparse and/or irregular design, and prediction that accounts for unknown model correlation structure. In this framework we also discuss the problem of testing the null hypothesis that the covariate has no association with the response. The proposed methods are evaluated numerically through simulations and two real data applications. Key words: Functional data analysis; F-test; Generalized Functional Concurrent Model; Penalized B-splines; Prediction. This paper has been submitted for consideration for publication in Biometrics

2 Generalized Functional Concurrent Model 1 1. Introduction Regression problems where both the response and the covariate are functional have become more and more common in many scientific fields including medicine, finance, agriculture to name a few. Such regression problems are often called function-on-function regression, and their primary goal is to investigate the association between the response and predictor functional variables. In this article, we propose a flexible concurrent model that relate the current response to the predictor using a smooth unknown bivariate function that depends on the current predictor and the specific time-point. We develop estimation, inference and testing procedures for the functional covariate effect. The performance of the methods is evaluated numerically in finite samples and two data applications, gait study (Olshen et al., 1989; Ramsay and Silverman, 2005) and dietary calcium absorption study (Davis, 2002; Sentürk and Nguyen, 2011). Functional regression models where both the response and the covariate are functional have been long researched in the literature. Functional linear models rely on the assumption that the current response depends on full trajectory of the covariate. The dependence is modeled through a weighted integral of the full covariate trajectory through an unknown bivariate coefficient surface as weight. Estimation procedures for this model have been discussed in Ramsay and Silverman (2005), Yao, Müller, and Wang (2005b) and Wu, Fan, and Müller (2010), among others. The crucial dependence assumption for this type of models may be impractical in many real data situations. To bypass this limitation, one might use the functional historical models (see e.g., Malfait and Ramsay (2003)), where the current response is modeled using only the past observations of the covariate. Such models quantify the relation between the response and the functional covariate/s using a linear relationship via an unknown bivariate coefficient function. Another alternative is to assume a concurrent relationship, where the current response is modeled based on only the current value of the

3 2 Biometrics, 2015 covariate function. Functional linear concurrent models assume a linear relationship; they can be thought of a series of linear regressions for each time point, with the constraint that the regression coefficient is a univariate smooth function of time. Estimation procedure for such models have been described in Ramsay and Silverman (2005) and Sentürk and Nguyen (2011). While the linear approach provides easy interpretation for the estimated coefficient function, it may not capture all the variability of the response in practical situations where the underlying relationship is complex. In this article, we consider the functional concurrent model where we allow the relationship between the response and the covariate function to nonlinear. Specifically, we propose a model where the value of the response variable at a certain time point depends on both the time point and the covariate value at that time point through a smooth bivariate unknown function. Such formulation allows us to capture potential complex relationships between response and predictor functions as well as better capture out of sample prediction performance, as we will observe in our numerical investigation. Our model contains the standard linear concurrent model as a special case. We will show through numerical study that when the true underlying relationship is linear (that is, the linear concurrent model is in fact optimal), fitting our proposed model does not result in loss of prediction accuracy. On the other hand, when the true relationship is nonlinear, fitting the linear concurrent model results in high bias and loss of prediction accuracy. In this paper, we make two main contribution. First, we propose a general nonlinear functional concurrent model to describe association between two functional variables measured on the same domain, and develop prediction of a new response when only a new covariate and its evaluation points are available. We model the unknown regression function using tensor products of B-spline basis functions and develop a penalized least squares estimation procedure in conjunction with difference penalties (Eilers and Marx, 1996; Li and Ruppert,

4 Generalized Functional Concurrent Model ). General implementation methods of the tensor product splines can be found in Marx and Eilers (2005), Wood (2006a) and Wood (2006b). An important innovation of our methodology is that it accounts for correlated residual structure. Specifically, the estimators are calculated based on a working independence assumption, and the analytical form of their standard error, involving the covariance of the error process, is derived. The unknown covariance is then estimated by employing standard functional principal component analysis (FPCA) based methods (see e.g., Yao et al. (2005b); Di et al. (2009); Goldsmith, Greven, and Crainiceanu (2013)) to the resulted residuals. The final standard error bands are based on the analytical expressions with plug-in estimator of the model covariance. We also develop point-wise prediction intervals for new response curves. Second, we develop a testing procedure for evaluating the global effect of the functional covariate, that is, to test the null hypothesis that the functional covariate has no association with the response variable. We propose a F ratio type test statistic (see e.g., Shen and Faraway (2004); Xu et al. (2011)), and propose a resampling based algorithm to construct the null distribution of the test statistics. Our resampling procedure also takes into account the correlated error structure, and thus maintains nominal type I errors. The paper is organized as follows. In Section 2 we introduce the generalized functional concurrent model (GFCM) and its estimation procedure. Section 3 discusses prediction of new response trajectories. Section 4 deals with resampling based hypothesis testing. Section 5 presents how to implement the proposed model and various extensions. In Section 6, we investigate the empirical performance of our method through a simulation experiments and real data applications. Section 7 provides a brief discussion.

5 4 Biometrics, Generalized Functional Concurrent Model 2.1 Modeling Framework Suppose for i = 1, 2,..., n, we observe {(W ij, t ij ) : j = 1, 2,..., m X,i } and {(Y ik, t ik ) : k = 1, 2,..., m Y,i } where W ij s and Y ik s denote the covariate and response, respectively, observed at points t ij and t ik. It is assumed that t ij, t ik T for all i, j and k, and W ij = X i (t ij ) δ ij where X i ( ) is a real-valued, continuous, square-integrable, random curve defined on the compact interval T ; for convenience we take T = [0, 1] throughout the paper. Here δ ij is the measurement error and it is assumed that δ ij are independent and identically distributed with mean zero and variance τ 2. To illustrate ideas, we first consider W ij = X i (t ij ), which is equivalent to τ 2 = 0 and furthermore that t ij = t j, t ik = t k and furthermore that m X,i = m Y,i = m. We treat Y ik = Y i (t ik ) to emphasize the dependence on the time points t ik. Adaptation of our methods to more realistic scenarios where τ 2 > 0 or the different sampling design for X i s and Y i s are discussed in Section 5.2. The main objective of the paper is to provide flexible association models that relate the response at current time point to the current predictor, and to develop a testing procedure to assess whether there is any association between the predictor and response. We introduce the functional generalized concurrent model Y i (t) = F {X i (t), t} ɛ i (t), (1) where F (, ) is an unknown smooth function defined on R T, and ɛ i ( ) is a mean-zero error process uncorrelated with the predictor X i ( ). The standard functional linear concurrent model (Ramsay and Silverman, 2005; Ramsay, Hooker, and Graves, 2009) is a special case of model in (1) with F (x, t) = β 0 (t) xβ 1 (t), where β 0 ( ) and β 1 ( ) are unknown coefficient functions. We introduce two main innovations in (1). First, the general bivariate function F (, ) allows us to model a potentially complicated relationship between Y i ( ) and X i ( ) and extends the effect of the covariate beyond linearity. Second, we also allow the error

6 Generalized Functional Concurrent Model 5 process ɛ i ( ) to have its own covariance structure unlike the standard concurrent models where it is typically assumed that the errors are independent realizations from a white noise process. To be specific, we assume that cov{ɛ i (s), ɛ i (t)} = G(s, t) where G(, ) is a smooth unknown covariance function. The model dependence F (X i (t), t) is inspired from McLean et al. (2014) who used it to describe flexible association models between a scalar response and one or multiple dense functional covariates. Here we consider its application in cases when the response is functional, with unknown covariance structure, and the covariate is also functional data observed densely as well as sparsely with or without error. In the following we will develop an estimation procedure for the model components F (, ) and G(, ), discuss prediction of a new trajectory, and develop a testing procedure to formally assess the association between the response Y i ( ) and the true latent predictor X i ( ) in this general framework. The inferential procedures account for the nontrivial dependence within the subject. Our model and methodology are presented for the setting involving a single functional covariate; nevertheless they can be extended to incorporate other vector-valued covariates via a linear or smooth effect. 2.2 Estimation We model F (, ) using bivariate basis expansion using tensor product of B-splines as basis functions (see e.g., Marx and Eilers (2005); McLean et al. (2014)). Specifically we write F (x, t) = K x Kt k=1 l=1 θ k,lb X,k (x)b T,l (t), where {B X,k (x) : k = 1, 2,..., K x } and {B T,l (t) : l, 2,..., K t } are B-splines defined on [0,1], and θ k,l are unknown parameters. Then, model (1) can be written as Y i (t) = K x Kt k=1 l=1 θ k,lz i,k,l (t) ɛ i (t), (2) where Z i,k,l (t) = B X,k {X i (t)}b T,l (t), and K x and K t are the number of basis functions used. A larger number of basis functions would result in a better but rougher fit, while a small

7 6 Biometrics, 2015 number of basis functions results in overly smooth estimate. As is typical in the literature, we use rich bases to fully capture the complexity of the function and penalize coefficients to ensure smoothness of the resulting fit. Such strategies are discussed in Marx and Eilers (2005), Wood (2006a), Wood (2006b) and Goldsmith et al. (2012). Define the K x K t -vector Z i (t) = [Z i,k,l (t)] l=1,...,k t k=1,...,k x. Similarly, define the K x K t -vector of unknown coefficients Θ = [θ 1,1,..., θ 1,Kt,..., θ Kx,1,..., θ Kx,K t ] T. We can rewrite (2) as Y i (t) = Z i (t)θ ɛ i (t). This model can be viewed as a linear model with unknown parameter vector Θ, except that Y i ( ) is defined as a smooth function, and that the error process ɛ i ( ) is a mean-zero process with covariance G(, ). To prevent overfitting, we propose to estimate the unknown parameter Θ by a penalized criterion ni=1 Y i ( ) Z i ( )Θ 2 Θ T P Θ, where 2 is defined as common L 2 -norm corresponding to the inner product < f, g >= fg, and P is a penalty matrix containing penalty parameters that regularize the trade-off between the goodness of fit and the smoothness of fit. In practice, we observe Y i ( ) and X i ( ) on only a finite and dense grid of points t 1,..., t m. We can approximate the sum of squares as P ENSS(Θ) = n i=1 mj=1 {Y i (t j ) Z i (t j )Θ} 2 /m Θ T P Θ. (3) The case where the grid of points is irregular and sparse is presented in the Web Appendix. The penalty matrix in (3) is defined as P = λ x P T x P x λ t P T t P t and depends on two penalty parameters λ x and λ t. The parameters λ x and λ t control the smoothness in the direction of x and t, respectively. The row penalty P x = D x IKt and column penalty P t = I Kx Dt as defined as in Eilers and Marx (1996), Marx and Eilers (2005) and McLean et al. (2014). Here

8 Generalized Functional Concurrent Model 7 stands for the Kronecker product, IK is the identity matrix with dimension K, and D x and D t are matrices representing the row and column of second order difference penalties. An explicit form of the estimator Θ is readily available for fixed values of the penalty parameters. Define the m-dimensional vectors of response and covariate Y i = [Y i (t 1 ), Y i (t 2 ),..., Y i (t m )] T and X i = [X i (t 1 ), X i (t 2 ),..., X i (t m )] T. Similarly, define the m-dimensional vectors of errors E i = [ɛ i (t 1 ), ɛ i (t 2 ),..., ɛ i (t m )] T. Define matrices Z i such that the j-th row of Z i is Z i (t j ). The estimator Θ can be computed as Θ = { n i=1 Z T i Z i P } 1 { n i=1 Z T i Y i }. (4) In practice, the penalty parameters λ x and λ t can be chosen based on some appropriate criteria. For example one can use generalized cross validation (GCV), restricted maximum likelihood (REML), and maximum likelihood (ML) to estimate the smoothing parameters. Estimation under (3) can be fully implemented in R using functions of the mgcv package (Wood, 2011). 2.3 Variance Estimation The penalized criterion (3) does not account for the possibly correlated error process. While one can use standard software for solving the penalized least square problem, the default variance estimates, based on working independent error structure, may lead to incorrect inference. The variance of the parameter estimate Θ can be found as var( Θ) n = H{ Zi T GZ i }H T i=1 where H = { n i=1 Zi T Z i P } 1 and G = cov(e i ) = [G(t j, t k )] 1 j,k m is the m m covariance matrix evaluated at the observed time points. To address this problem, we assume that the error process has the form ɛ(t) = r(t) e(t), where r(t) is a zero-mean smooth stochastic process and e(t) is a zero-mean white noise measurement error with variance σ 2. Let Σ(s, t) be the autocovariance function of r(t). It

9 8 Biometrics, 2015 follows that the autocovariance of the random deviation ɛ(t), G(s, t) = Σ(s, t) σ 2 I(s = t) where I( ) is the indicator function, is unknown and needs to be estimated. To this end, we assume that Σ admits a spectral decomposition Σ(s, t) = k 1 φ k (s)φ k (t)λ k where {φ k ( ), λ k } are the eigencomponents. We first compute the residuals ɛ ij = Y i (t j ) Z i (t j ) Θ from the model fit, and employ FPCA methods to estimate φ k ( ), λ k, and σ 2. Then we estimate G by Ĝ(s, t) K k=1 λ k φ k (s) φ k (t) σ 2 I(s = t), for any s, t [0, 1], where K is the number of principal components. This truncation point is typically chosen by setting the percent of variance explained by the first few eigencomponents to pre-specified value such as 90% or 95%. 3. Prediction A main focus in this paper is prediction of unobserved response when a new covariate and its evaluation points are given. Suppose that we wish to predict new, unknown, response Y 0,i (t k ) for k = 1, 2,..., m when new observations X 0,i (t j ) (j = 1, 2,..., m) are given. We typically index subjects from new data by i {1, 2,..., n } to differentiate from the index i {1, 2,..., n} of original observed data. Then we assume that the model Y 0,i (t) = F {X 0,i (t), t} ɛ 0,i (t) (5) still holds for the new data, where the error process ɛ 0,i (t) has the same distributional assumption as ɛ i (t) in (1) and is uncorrelated with the new covariate X 0,i (t). For each subject i = 1,..., n, we predict the new response Y 0,i by Ŷ 0,i (t) = K x Kt θ k=1 l=1 k,l Z 0,i,k,l(t), where Z 0,i,k,l(t) = B X,k {X 0,i (t)}b T,l (t), and θ k,l are estimated based on (4). Uncertainty in the prediction depends on how small the difference is between the predicted response Ŷ0,i (t) and the true response Y 0,i (t). We follow an approach similar to Ruppert,

10 Generalized Functional Concurrent Model 9 Wand, and Carroll (2003) to estimate the prediction variance. Specifically, conditional on the new covariate, we have var{y 0,i (t) Ŷ0,i (t)} = var{ɛ 0,i (t)} var{ŷ0,i (t)}. Note that ɛ 0,i (t) in (5) is unobserved error process. However we can calculate the variance of the ɛ 0,i (t) relying on the assumption that the error process in the prediction set have the same covariance structure as in the original observed data set. Define Z 0,i = [Z 0,i,k,l(t)] l=1,...,kt k=1,...,k x. Then the prediction variance becomes var{y 0,i Ŷ 0,i } = G Z 0,i H{ n i=1 Zi T GZ i }H T Z0,i T, (6) where the j-th row of Z 0,i is Z 0,i (t j ), G is the m m covariance matrix evaluated at the observed time points. Note that this calculation involves both indices, i of the new data and i of the original data. The prediction variance can be estimated by plugging-in the sample estimate of G(, ) in (6). One can further define a 100(1 α)% point-wise prediction interval for the new observation Y 0,i (t) by C 1 α,i (t) = Ŷ0,i (t) ± Φ 1 (1 α/2)[ var{y 0,i (t) Ŷ0,i (t)}]1/2 where Φ( ) is the standard Gaussian cumulative distribution function (cdf). In the more general case when the new covariates X 0,i (t j ) are only observed on a sparsely sampled grid, we first compute the FPCA scores for the new covariates via the conditional expectation formula (see e.g., Yao et al. (2005a)) with the estimated eigenfunctions from the training data, and construct smooth versions of the new covariates. Then the procedure described above can be readily applied for prediction. 4. Hypothesis Testing In many situations, testing for association between the response and predictor variables is as important, if not more, as it is to estimate the model components. Often before performing

11 10 Biometrics, 2015 any estimation, it is preferred to test for association first to determine whether there is association to begin with and then a more in-depth analysis is done to determine the form of the relationship. In this section we consider the problem of testing whether the functional predictor variable is associated with the response. Specifically, we want to test H 0 : F {X(t), t} = F 0 (t) versus H 1 : F {X(t), t} = F 1 {X(t), t}, where F 0 (t) is a univariate function of t, and F 1 (x, t) is the bivariate function of x and t. Our testing procedure is based on first modelling the null effect F 0 (t) and then the full model F 1 (x, t) using basis function expressions in a manner that ensures that the null model is nested within the full model. Specifically, we propose to use B T = {B T,0 (t) = 1, B T,l (t), l 1} to model F 0 ( ) under the null model, where B T,l (t) (l 1) are the B- splines evaluated at time point t. To model F 1 (, ) under the full model, we use the same set of basis functions B T for t, but B X = {B X,0 (x) = 1, B X,l (x), l 1} for x, where B X,l (x) (l 1) are the B-splines evaluated at x. Then under the full model, we can write F 1 (x, t) = F 0 (t) k=1 l=0 B X,k (x)b T,l (t)θ k,l. We propose to compare the two models using the F ratio T n = (RSS 0 RSS 1 )/(df 1 df 0 ), (7) RSS 1 /(N df 1 ) where RSS i and df i (i = 0, 1) are the residual sums of squares and the effective degrees-offreedom (Wood, 2006a), respectively, under the null and the full models. Here N denotes the total number of observed data points. In this case, we have n subjects and m observations per subject, and thus, the total number of observed data becomes N = nm. This can be easily generalized when each subject have different number of observations. However, traditional F tests only make sense when asymptotics of test statistics are evident. In our cases, it is difficult to derive asymptotics of proposed test statistic in (7) due to potential uncertainty caused by applying smoothing techniques and penalization. Mostly, the degrees-of-freedom

12 Generalized Functional Concurrent Model 11 in test statistics will be affected by the penalization, and one might not achieve good level and power of a test by simply using the critical values from an F distribution. We instead propose to use a resampling based algorithm to construct an empirical sampling distribution for the test statistic; the steps are as follows. (1) Fit the full model F 1 (x, t) based on the estimation procedure of the GFCM, and obtain residuals { ɛ i (t j )} n i=1 by calculating Y i (t j ) Ŷi(t j ) for each i and j. (2) Fit the null model F 0 (t), and obtain its estimate F 0 (t). (3) Calculate the value of the test statistic in (7) based on the null and the full model fits; call this value T n,obs. (4) Resample B sets of bootstrap residuals E b(t) = { ɛ b,i(t)} n i=1 (b = 1, 2,..., B) with replacement from the residuals { ɛ i (t)} n i=1 obtained in the first step. (5) For b = 1, 2,..., B, generate response curves under the null model as Y b,i = F 0 (t) ɛ b,i(t). This provides B bootstrap data sets {X i (t), Y 1,i(t)} n i=1, {X i (t), Y 2,i(t)} n i=1,..., {X i (t), Y B,i(t)} n i=1. (6) Given the B bootstrap data sets, fit the null and the full models, and evaluate the test statistic in (7) for each data set, {T b } B b=1. These can be viewed as realizations from the distribution of T n under the assumption that H 0 is true. (7) Compute the p-value by p = B 1 B b=1 I{T b T n,obs }. Our proposed resampling algorithm has two advantages. First, the exact form of the null distribution of the test statistic T n is not required; the resampled version of the test statistic approximates the null distribution automatically. Second, our algorithm allows us to account for correlated error process ɛ( ). This is automatically done by sampling the entire residual vectors for each subject; and thus preserving the correlation structure within the residuals. We observed in our numerical study (results not shown) that preserving such correlation

13 12 Biometrics, 2015 structure is of particular importance, as ignoring the correlation results in severely inflated type I error. 5. Model Implementation and Extensions 5.1 Transformation of Functional Covariates One challenge with our estimation approach is that some B-splines might not have observed data on its support; this issue has been also emphasized in McLean et al. (2014). This problem typically arises when the realizations of the functional covariate X i (t j ) are not dense over R. To bypass this difficulty, we propose to first apply point-wise center/scaling transformation of the covariates; it is worthwhile to note that McLean et al. (2014) used a different approach to address this problem. In the generalized concurrent model framework, we define point-wise center/scaling transformation of X(t) by X (t) = {X(t) µ X (t)}/σ X (t) where µ X (t) and σ X (t) are mean and standard deviation of X(t). One can interpret the transformed covariate X (t) as the amount of standard deviation X(t) is away from the mean at time t. In practice, we estimate the mean and the standard deviation by the sample mean µ X (t) and the sample standard deviation σ X (t), respectively, of the covariates. Thus for a fixed point t j we will obtain realizations of the transformed covariates {Xi (t j )} n i=1 based on the sample mean µ X (t j ) and the sample standard deviation σ X (t j ) at the same point. The GFCM based on the transformed covariate X (t) can be written as Y i (t) = F {X i (t), t} ɛ i (t) = k=1 l=1 θ k,lz i,k,l(t) ɛ i (t), (8) where Z i,k,l(t) = B X,k {X i (t)}b T,l (t) and θ k,l are the parameters to be estimated. Note that Z i,k,l(t) and θ k,l are different from the previous quantities denoted by Z i,k,l (t) and θ k,l in model (2). To emphasize the difference, we introduce a new smooth function F in (8). Let

14 Generalized Functional Concurrent Model 13 Z i (t) = [Z i,k,l(t)] l=1,...,k t k=1,...,k x and Θ = [θ 1,1,..., θ 1,K t,..., θ K x,1,..., θ K x,k t ] T be the updated notations for the model components. Then the parameter estimates Θ can be found as Θ = { n i=1 Zi T Zi P } 1 { n i=1 Zi T Y i } (9) where the j-th row of Z i is Z i (t j ). Let G (, ) = cov{x (s), X (t)}. We compute the variance of Θ as var( Θ ) = H { n i=1 Zi T G Zi }H T, where H = { n i=1 Zi T Zi P } 1, and G is the m m covariance matrix of the process X ( ) evaluated at the observed time points t 1,..., t m. The prediction procedure described in Section 3 proceeds as before with the understanding that one now uses the transformed version of the new covariates. The dependence structure between Y i (t) and X i (t) is, in general, not the same as the one between Y i (t) and X i (t). Hence, the model fit F (x, t) of F (x, t) cannot be used directly to investigate the relationship between the response Y i (t) and the covariate X i (t). Instead, we will compare the response Y i (t) and its estimator Ŷi(t) because the estimator will be consistent without being affected by the transformation of the covariates. If the true relationship between Y i (t) and X i (t) is well captured by the model fit, the difference between Y i (t) and Ŷi(t) will be very small. 5.2 Extensions This section discusses modifications of the methodology that are required by realistic situations. In particular, we consider the case when the functional covariate is observed densely with error, or sparsely with or without noise, as well as when the sparseness of the covariate is different from that of the response. Assume first that functional covariate is observed on a dense and regular grid of points but with error; i.e. the observed predictors W ij s are such that W ij = X i (t j ) δ ij and the deviation δ ij has variance τ 2 > 0. Several approaches have been proposed to adjust for the measurement errors; Zhang and Chen (2007) proposed to first smooth each noisy trajectory using local polynomial kernel smoothing and then estimate the mean and standard deviation

15 14 Biometrics, 2015 of the covariate X i (t ij ) by their sample estimators. The recovered trajectories, say X i ( ) will estimate the latent ones X i ( ) with negligible error. The methodology described in Section 2.2 can be applied with X i ( ) in place of X i ( ). Numerical investigation of this approach is included in the simulation section. Then consider the case that the functional covariate is observed on a sparse and irregular grid of points with measurement error, i.e. W ij = X i (t ij ) δ ij. The common assumption made for this setting is that the number of observations m X,i for each subject is small, but ni=1 {t ij } m X,i j=1 is dense in T = [0, 1]. Reconstructing the latent trajectories X i ( ) is based on employing FPCA for sparse design to the observed W ij s. Yao, Müller, and Wang (2005a) proposed to use local linear smoothers to estimate mean and covariance functions. The latent trajectories are predicted using a finite Karhunen-Loève truncation using estimated eigenvalues/eigenfunctions from the spectral decomposition of the estimated covariance and predicted scores using conditional expectation. This method may be further applied to the response variable when the sampling design of the response is sparse as well. An alternative for the latter situation, is to use the standardized prediction of the covariate at the time points t ij at which the response is observed, X i (t ij ) and then continue the estimation using the data {Y ij, X i (t ij ) : j} n i=1. In preliminary investigation, the former method yielded slightly increased type I error rates; the results included in the simulation section use the latter method, which shows good performance in both estimation and testing evaluation. 6. Numerical Study In this section, we investigate the finite sample performance of our proposed methodology. We investigate the prediction accuracy of our method in Section 6.1. The performance of our proposed test is investigated in Section 6.2. Finally in Section 6.3, we assess practical behavior of our method by applying it to the gait data (Olshen et al., 1989; Ramsay and

16 Generalized Functional Concurrent Model 15 Silverman, 2005; Crainiceanu et al., 2014) and the dietary calcium absorption data (Davis, 2002; Sentürk and Nguyen, 2011) sets. 6.1 Prediction Performance Simulation Design. We aim to investigate the prediction accuracy of our method for both within sample and out of sample prediction. To achieve this, we construct training and test data sets assuming both are independent. We generate the covariate X i (t) of training data set by a nonlinear random process a 0i a 1i 2 sin(πt)a2i 2 cos(πt) where a0i N(0, 1), a 1i N(0, ) and a 2i N(0, ) for i = 1, 2,..., n. Assuming that the test data follows the same distribution as the training data, we generate the covariate X i (t) of test data set for i = 1, 2,..., n. Throughout the study, it is assumed that the covariate X i (t) of the training data are not observed directly. Instead we observe W ij = X i (t)wn(0,τ 2 ), where τ = The response Y i ( ) is generated using the model (1) using two choices for function F (, ): a non-linear version F NL (x, t) = 1 x t 2x 2 t and a linear counterpart F L (x, t) = 1 x t. We also consider three different types of the error process for E i : (1) E 1 i N(0, σ 2 ɛ I mi ); (2) E 2 i N(0, σ 2 ɛ Σ) N(0, σ 2 ɛ I mi ) where Σ has AR ρ (1) structure, and (3) E 3 i ξ i1 2 cos(πt) ξi2 2 sin(πt) N(0, σ 2 ɛ I mi ). We set σ 2 ɛ = 0.8 and ρ = 0.2. The random variables ξ i1 and ξ i2 are uncorrelated and are generated from N(0,2) and N(0, ), respectively. For the training data set, we considered three different sample sizes n = 50, 100 and 300. For each sample size setting, we consider two different sampling designs, dense and sparse. In the dense design, we sample m = 81 equally spaced time points for all i. The sparse design assumes that t ij = t ik for all i and the number of observations m i is randomly selected from {20,..., 31} for each i, preserving 24.7% 38.3% of the data. The time points t ij for j = 1, 2,..., m i are uniformly generated from the dense set {t j = (j 1)/80} 81 j=1. In the

17 16 Biometrics, 2015 Web Appendix, we further present the cases with reduced sparseness. For the test set, we use n = 100 and m = 81. For each of these choices we fit both the standard linear functional concurrent model (FCM) and our proposed generalized functional concurrent model (GFCM). When the true model is nonlinear, i.e. F (x, t) = F NL (x, t), one would expect the GFCM to perform better than FCM. On the other hand, if the true model is in fact linear, i.e. F (x, t) = F L (x, t) and FCM is indeed optimal in this case, such a comparison corresponds to whether the GFCM captures the relationship between the covariate and the response as well as FCM Evaluation Criteria. We perform N = 1000 Monte-Carlo simulations. Our performance measures are the root mean squared prediction error (RMSPE), the integrated coverage probability (ICP) and the integrated length (IL) of prediction bands. Define the in-sample RMSPE by where MSPE (r) = (nm) 1 n i=1 mj=1 {Y (r) i RMSPE in = N r=1 [MSPE (r) ] 1 2 /N, (t ij ) Ŷ (r) (t ij )} 2, and Y (r) (t ij ) and its estimate Ŷ (r) i (t ij ) are from the r-th Monte Carlo simulation. The out-of-sample RMSPE, denoted by RMSPE out, is defined similarly. We approximate (1 α) level point-wise prediction intervals to observe coverage probabilities at the nominal level. The ICP at the (1 α) level is given by i i ICP(1 α) = N n r=1 i =1 T I{Y 0,i (t) C(r) 1 α,i (t)}dt/(nn ), where C (r) 1 α,i (t) are the point-wise prediction intervals from the r-th Monte Carlo simulation and I( ) is the indicator function. We observe particular features of the prediction intervals by measuring their length. The IL of the (1 α) level prediction intervals is defined by IL = N n r=1 i =1 T 2MOE(r) 1 α,i (t)dt/(nn )

18 Generalized Functional Concurrent Model 17 with MOE (r) 1 α,i (t) = Φ 1 (1 α/2) ŝd{y (r) 0,i (t) Ŷ 0,i (t)}. Although the IL of the prediction intervals remains small, the band might fluctuate dramatically at some time point, and this will demonstrate a poor performance of variance estimation. Hence, we examine the range of the estimated standard errors (SE). For the prediction intervals, we define the minimum SE by min(se) = N n r=1 i =1 min{2moe (r) 1 α,i (t)}/(nn ). t We define the maximum SE, max(se), similarly. Then, R(SE)= [min(se), max(se)] provides the range of SE at the (1 α) level. For the prediction intervals, the nominal significance level is set as 0.95, 0.90, and Simulation Results. We obtain both the GFCM and the linear FCM fits using K x = K t = 7 cubic B-splines for x and t. When we apply FPCA, we set 99% of variance to be explained by the first few eigencomponents. Table 1 summarizes the predictive performance of the proposed procedure for different simulation scenarios. We first discuss the case when the true function is nonlinear (the top two panels in Table 1). We observe that RMSPE in and RMSPE out from fitting the GFCM are smaller than those from the linear FCM in all scenarios. The ICPs from the GFCM and the linear FCM are fairly close to the nominal levels of 0.95, 0.90, and However, on an average the linear FCM produces larger intervals, indicated by larger IL values, and the wider R(SE) compared to the GFCM. Such patterns confirm that the variability in prediction is not properly captured by the linear FCM when the true model is nonlinear. For less complicated error patterns (e.g., independent versus AR(1)), both models produce smaller prediction errors. However, GFCM still produces smaller errors compared to linear FCM. Also it is evident from Table 1 that the conclusions made above are true for different sample sizes as well as for dense and sparse sampling designs. We now explore the true linear model case (the bottom two panels in Table 1). When

19 18 Biometrics, 2015 the true model is in fact linear, we would expect that the linear FCM would provide the optimal results as it is the correct model to fit in this situation. From the results in Table 1), we observe that the RMSPE in and the RMSPE out from the GFCM and the linear FCM are almost same in all scenarios considered here. We observe a similar pattern when we compare other measures such as IL, R(SE) and ICP as well. This clearly implies that even when the true model is linear and one fits the more complicated model using the GFCM, prediction accuracy and coverage is not lost. Again, the number of subjects, sparseness of the sampling, and the error structure also slightly affect to the numerical result. Nevertheless, when the true model is linear, the overall predictive performance of the GFCM and the linear FCM is almost identical. To better visualize the results, Figure 1 displays prediction bands for the true nonlinear model (top panel) and for the true linear model (bottom panel) for a a single data set observed densely. In the top panel, the prediction bands for the linear FCM (dashed line) are much wider compared to the bands for the GFCM (solid line). This indicates that variance estimation from the linear FCM is less accurate. In the bottom panel, the prediction bands of the GFCM (grey solid line) and the linear FCM (dashed line) are almost identical indicating a similar performance in terms of the variance estimation when the true model is linear. In summary, from our numerical investigation we observe that when the true model is linear, fitting the GFCM does not result in any substantial loss in prediction accuracy over the FCM. However, when the true model is in fact nonlinear, fitting GFCM results is a significant gain in prediction accuracy over the standard FCM. 6.2 Testing Performance We assess the performance of resampling based tests using the same settings in the previous experiment. The true null model is defined as F BS (x, t, d) = 1 2t t 2 d(xt/8). If d is zero, the true model is a univariate function of time point t; otherwise, the true model depends

20 Generalized Functional Concurrent Model 19 on both x and t. Type I error of the test is investigated by setting d = 0, and power of the test correspond to nonzero values of d. We report the performance of the algorithm based on 1000 Monte Carlo simulations and B = 200 bootstraps for each simulation. Given a null model F 0 (t) = F BS (x, t, 0), we first compare the actual test levels to prescribed nominal levels α = 5% and 10%. In Web Table 1, we provide the rejection probabilities averaged over 1000 replications. It is evident from our results that the actual test levels get closer to the nominal level as the sample size increases. With correlated error structures, the test levels still maintain close to the nominal level both in the dense and sparse designs. We now calculate powers of our proposed test at nominal level α = 5%. We consider true null models with d = 0.1, 1, 2, 3, 4, 5 and 6, and results are presented in Figure 2. It is evident from the figure that for each case, the power increases with respect to the increasing values of d. We also notice that the tests based on the densely sampled design are more powerful than the sparsely sampled design, as one would expect. Nevertheless, one might encounter an important issue when implementing our resampling based algorithm. When calculating T n,obs or Tb, we occasionally observe negative RSS 0 RSS 1 both in the densely and the sparsely sampled designs. For those cases, we set RSS 0 RSS 1 = 0. We also recognize negative df 0 df 1 rarely in the sparse design. Such cases are excluded when calculating the actual test level and the power in all cases. 6.3 Applications Gait Data. In a study of gait deficiency, it is important to identify how the joints in hip and knee interact during a gait cycle (Theologis, 2009). Typically, one represents the timing of events occurring during a gait cycle as a percentage of this cycle, where the initial contact of a foot is recorded as 0% and the second contact of the same foot as 100%. As an illustrative example, we analyze longitudinal measurements of hip and knee angles taken from 39 children as they walk through a single gait cycle (Olshen et al., 1989; Ramsay and

21 20 Biometrics, 2015 Silverman, 2005; Crainiceanu et al., 2014). The hip and knee angles are measured at 20 evaluation points {t j } 20 j=1 in [0,1], which are translated from percent values of the cycle. In Web Figure 1, we display the observed individual trajectories of the hip and knee angles. In the analysis, the hip and knee angles are considered as densely observed functional covariate and functional response, respectively. We first employ our resampling based test to investigate whether the hip angles are associated with the knee angles. We select K x = K t = 11 cubic B-splines for x and t to fit the GFCM, and B = 250 bootstrap replications are used. The bootstrap p-value is computed to be less than 0.004, thus we conclude that the hip angle measured at a specific time point has a strong effect on the knee angle at the same time point. To assess how the hip and knee angles are related to each other we fit our proposed GFCM as well as linear FCM. We assess predictive accuracy by splitting the data into training and test sets of size 30 and 9. Assuming that the hip angles are observed with measurement errors, we smooth the covariate by FPCA and then apply the center/scaling transformation. We compare prediction errors obtained by fitting both the GFCM and the linear FCM. Also, as a benchmark model we further fit a linear mixed effect (LME) model Knee angle ij = (β 0 b 0i ) (β 1 b 1i )Hip angle ij ɛ ij where (b 0i, b 1i ) T are the random coefficients from N(0, R) with some covariance matrix R, and ɛ ij are the errors from N(0, σɛ 2 ). In this format, (b 0i, b 1i ) T and ɛ ij are assumed to be independent. For the GFCM and the linear FCM, we report the RMSPE, the ICP, the IL and the R(SE). For the LME model, we report similar measures, but we take average over the repeated measurements instead of integrating over the time domain. The results are collected in Table 2 (top panel). We observe that the LME model provides a poor predictive performance compared to the others, implying that models in the framework of concurrent regression model are obviously

22 Generalized Functional Concurrent Model 21 better. The GFCM presents slightly better predictive performances than the linear FCM, but the difference is negligible. For instance, prediction errors from the GFCM are smaller than the linear FCM. The R(SE) obtained from the GFCM are narrower at all significance levels. Figure 3, top panel, shows prediction bands obtained from the test data sets, and we only detect a small difference between the GFCM (solid line) and the linear FCM (dashed line). The bottom panel in Figure 3 displays a heat map of predicted surface in the GFCM. It is evident that the relationship between the hip angles and the knee angels is simply linear Dietary Calcium Absorption Data. As a sparse data example, we present an application to dietary calcium absorption data set (Davis, 2002; Sentürk and Nguyen, 2011). In a group of 188 patients, dietary and bone measurement tests are conducted approximately every five year, and calcium intake and calcium absorption are measured with respect to their ages. Patients are aged around 35 and 45 in the beginning of the study, and 1 to 4 repeated measurements are collected for each subject after then. Overall ages are ranging approximately from 35 to 64. In the analysis, ages are transformed into the values in [0,1], and the results are considered as evaluation points of the functions. There are dropouts and missed visits among the patients, yielding a sparse and irregular design of sampling. We investigate a relationship between the calcium intake and the calcium absorption measured at the same time point. We applied both the GFCM and the linear FCM by considering that the calcium intake and the calcium absorption are observed functional covariate and functional response, respectively. In Web Figure 2, we display the observed individual trajectories of the calcium intake and the calcium absorption along the patient s age at the visit. To begin, we employ our resampling based testing procedure to test the null hypothesis of no association. We select K x = K t = 11 cubic B-splines for x and t to fit the GFCM, and B = 250 bootstrap replications are used. The p-value of our test is obtained to be less

23 22 Biometrics, 2015 than 0.004, indicating that the calcium intake measured at a certain time point are strongly associated with the calcium absorption at the same time point. We next the analyze the predictive performance of the GFCM and FCM by forming the training and test sets of size 148 and 40, respectively. For the test data set, we smooth the individual trajectories to recover missed observations, thus the resulting covariate and response are smooth and have all observations on its support. Assuming that the calcium intake has a potential measurement errors, we smooth the sparse and noisy covariate of the training data and then apply the center/scaling transformation. To fit the GFCM, we again select K x = K t = 11 cubic B-splines for x and t. For the linear FCM, we use 11 cubic B- splines. Results of predictive accuracy are presented in Table 2 (bottom panel). In the table, the GFCM and the linear FCM are showing similar RMSPE in and RMSPE out, indicating a simple linear association between the calcium intake and absorption. In Figure 4, we display point-wise prediction intervals/bands for three subjects randomly selected from the test data set. In the figure, there are negligible differences between the GFCM (grey solid lines) and the linear FCM (dashed lines), further demonstrating linearity between the variables. 7. Discussion This article proposes a generalized functional concurrent model (GFCM) when both the response and the covariate are functional data observed on the same domain. Estimation and inference are discussed in realistic scenarios involving settings such as complex correlated error structures as well as sparse and/or irregular design, when the covariate is possibly contaminated with noise. The model and estimation framework can be easily extended to accommodate additional covariates via linear or smooth effects. In this flexible framework, the article considers testing for no covariate effect using F test. Bootstrap is used to estimate the null distribution of the test. The proposed methods are investigated numerically through simulations mimicking realistic settings. The results show that the new framework captures

24 Generalized Functional Concurrent Model 23 non-linear relationships between covariate and response, while it performs similarly to the common functional concurrent model, when the relationship is linear. Formally testing for linearity of the unknown relationship is also an interesting problem for future research. Acknowledgements Maity s research was supported by National Institute of Health grant R00 ES and an North Carolina State University faculty Research and Professional Development grant. Staicu s research was supported by National Institute of Health grant RO1 NS Supplementary Materials Web Appendices, Tables, and Figures referenced in Section 2 and Section 6 are available with this paper at the Biometrics website on Wiley Online Library. The computer code implementing our procedure is available with this paper at under the software section. References Crainiceanu, C. M., Reiss, P., Goldsmith, A. J., Huang, L., Huo, L., and Scheipl, F. (2014). Refund: Regression with Functional Data. R package version Davis, C. S. (2002). Statistical Methods for the Analysis of Repeated Measurements. New York, NY: Springer. Di, C.-Z., Crainiceanu, C., Caffo, B., and Punjabi, N. M. (2009). Multilevel functional principal component analysis. Annals of Applied Statistics 4, Eilers, P. H. C. and Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science 11, Goldsmith, J., Bobb, J., Crainiceanu, C., Caffo, B., and Reich, D. (2012). Penalized functional regression. Journal of Computational and Graphical Statistics 20,

25 24 Biometrics, 2015 Goldsmith, J., Greven, S., and Crainiceanu, C. (2013) Corrected confidence bands for functional data using principal components. Biometrics 69, Li, Y. and Ruppert, D. (2008). On the asymptotics of penalized splines. Biometrika 95, Malfait, N. and Ramsay, J. O. (2003). The historical functional linear model. The Canadian Journal of statistics 31, Marx, B. D. and Eilers, P. H. C. (2005). Multivariate penalized signal regression. Technometrics 47, McLean, M. W., Hooker, G., Staicu, A. M., Scheipl, F., and Ruppert, D. (2014). Functional generalized additive models. Journal of Computational and Graphical Statistics 23, Olshen, R. A., Biden, E. N., Wyatt, M. P., and Sutherland, D. H. (1989). Gait analysis and the bootstrap. The Annals of Statistics 17, Ramsay, J. O., Hooker, G., and Graves, S. (2009). Functional Data Analysis in R and Matlab. New York, NY: Springer. Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis. 2nd edition. New York, NY: Springer. Ruppert, D., Wand, M. P., and Carroll, R. J. (2003). Semiparametric Regression. Cambridge, New York: Cambridge University Press. Sentürk, D. and Nguyen, D. V. (2011). Varying coefficient models for sparse noisecontaminated longitudinal data. Statistica Sinica, 21, Shen, Q. and Faraway, J. (2004). An F test for linear models with functional responses. Statistica Sinica, 14, Theologis, T. N. (2009). Children s Orthopaedics and Fractures (Chapter 6). 3rd edition. London: Springer.

26 Generalized Functional Concurrent Model 25 Wood, S. N. (2006a). Generalized Additive Models: An Introduction with R. Boca Raton, Florida: Chapman and Hall/CRC. Wood, S. N. (2006b). Low-rank scale-invariant tensor product smooths for generalized additive mixed models. Biometrics 62, Wood, S. N. (2011). mgcv: Mixed GAM Computation Vehicle with GCV/AIC/REML Smoothness Estimation. R package version Wu, Y., Fan, J., and Müller, H.-G. (2010). Varying-coefficient functional linear regression. Bernoulli 16, Xu, H., Shen, Q., Yang, X., and Shoptaw, S. (2011). A quasi F-test for functional linear models with functional covariates and its application to longitudinal data. Statistics in Medicine 30, Yao, F., Müller, H.-G., and Wang, J.-L. (2005a). Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association 100, Yao, F., Müller, H.-G., and Wang, J.-L. (2005b). Functional linear regression analysis for longitudinal data. The Annals of Statistics 33, Zhang, J.-T. and Chen, J. (2007). Statistical inference for functional data. The Annals of Statistics 35, [Table 1 about here.] [Table 2 about here.] [Figure 1 about here.] [Figure 2 about here.] [Figure 3 about here.] [Figure 4 about here.]

Supplementary Material to General Functional Concurrent Model

Supplementary Material to General Functional Concurrent Model Supplementary Material to General Functional Concurrent Model Janet S. Kim Arnab Maity Ana-Maria Staicu June 17, 2016 This Supplementary Material contains six sections. Appendix A discusses modifications

More information

General Functional Concurrent Model

General Functional Concurrent Model General Functional Concurrent Model Janet S. Kim Arnab Maity Ana-Maria Staicu June 17, 2016 Abstract We propose a flexible regression model to study the association between a functional response and multiple

More information

Diagnostics for Linear Models With Functional Responses

Diagnostics for Linear Models With Functional Responses Diagnostics for Linear Models With Functional Responses Qing Shen Edmunds.com Inc. 2401 Colorado Ave., Suite 250 Santa Monica, CA 90404 (shenqing26@hotmail.com) Hongquan Xu Department of Statistics University

More information

An Introduction to Functional Data Analysis

An Introduction to Functional Data Analysis An Introduction to Functional Data Analysis Chongzhi Di Fred Hutchinson Cancer Research Center cdi@fredhutch.org Biotat 578A: Special Topics in (Genetic) Epidemiology November 10, 2015 Textbook Ramsay

More information

Functional Latent Feature Models. With Single-Index Interaction

Functional Latent Feature Models. With Single-Index Interaction Generalized With Single-Index Interaction Department of Statistics Center for Statistical Bioinformatics Institute for Applied Mathematics and Computational Science Texas A&M University Naisyin Wang and

More information

FUNCTIONAL DATA ANALYSIS. Contribution to the. International Handbook (Encyclopedia) of Statistical Sciences. July 28, Hans-Georg Müller 1

FUNCTIONAL DATA ANALYSIS. Contribution to the. International Handbook (Encyclopedia) of Statistical Sciences. July 28, Hans-Georg Müller 1 FUNCTIONAL DATA ANALYSIS Contribution to the International Handbook (Encyclopedia) of Statistical Sciences July 28, 2009 Hans-Georg Müller 1 Department of Statistics University of California, Davis One

More information

Missing Data Issues in the Studies of Neurodegenerative Disorders: the Methodology

Missing Data Issues in the Studies of Neurodegenerative Disorders: the Methodology Missing Data Issues in the Studies of Neurodegenerative Disorders: the Methodology Sheng Luo, PhD Associate Professor Department of Biostatistics & Bioinformatics Duke University Medical Center sheng.luo@duke.edu

More information

Function-on-Scalar Regression with the refund Package

Function-on-Scalar Regression with the refund Package University of Haifa From the SelectedWorks of Philip T. Reiss July 30, 2012 Function-on-Scalar Regression with the refund Package Philip T. Reiss, New York University Available at: https://works.bepress.com/phil_reiss/28/

More information

Covariate Adjusted Varying Coefficient Models

Covariate Adjusted Varying Coefficient Models Biostatistics Advance Access published October 26, 2005 Covariate Adjusted Varying Coefficient Models Damla Şentürk Department of Statistics, Pennsylvania State University University Park, PA 16802, U.S.A.

More information

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science 1 Likelihood Ratio Tests that Certain Variance Components Are Zero Ciprian M. Crainiceanu Department of Statistical Science www.people.cornell.edu/pages/cmc59 Work done jointly with David Ruppert, School

More information

Simultaneous Confidence Bands for the Coefficient Function in Functional Regression

Simultaneous Confidence Bands for the Coefficient Function in Functional Regression University of Haifa From the SelectedWorks of Philip T. Reiss August 7, 2008 Simultaneous Confidence Bands for the Coefficient Function in Functional Regression Philip T. Reiss, New York University Available

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and

More information

REGRESSING LONGITUDINAL RESPONSE TRAJECTORIES ON A COVARIATE

REGRESSING LONGITUDINAL RESPONSE TRAJECTORIES ON A COVARIATE REGRESSING LONGITUDINAL RESPONSE TRAJECTORIES ON A COVARIATE Hans-Georg Müller 1 and Fang Yao 2 1 Department of Statistics, UC Davis, One Shields Ave., Davis, CA 95616 E-mail: mueller@wald.ucdavis.edu

More information

Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach"

Kneib, Fahrmeir: Supplement to Structured additive regression for categorical space-time data: A mixed model approach Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach" Sonderforschungsbereich 386, Paper 43 (25) Online unter: http://epub.ub.uni-muenchen.de/

More information

Likelihood ratio testing for zero variance components in linear mixed models

Likelihood ratio testing for zero variance components in linear mixed models Likelihood ratio testing for zero variance components in linear mixed models Sonja Greven 1,3, Ciprian Crainiceanu 2, Annette Peters 3 and Helmut Küchenhoff 1 1 Department of Statistics, LMU Munich University,

More information

P -spline ANOVA-type interaction models for spatio-temporal smoothing

P -spline ANOVA-type interaction models for spatio-temporal smoothing P -spline ANOVA-type interaction models for spatio-temporal smoothing Dae-Jin Lee 1 and María Durbán 1 1 Department of Statistics, Universidad Carlos III de Madrid, SPAIN. e-mail: dae-jin.lee@uc3m.es and

More information

Estimation of Sparse Functional Additive Models. with Adaptive Group LASSO

Estimation of Sparse Functional Additive Models. with Adaptive Group LASSO Estimation of Sparse Functional Additive Models with Adaptive Group LASSO Peijun Sang, Liangliang Wang and Jiguo Cao Department of Statistics and Actuarial Science Simon Fraser University Abstract: We

More information

Incorporating Covariates in Skewed Functional Data Models

Incorporating Covariates in Skewed Functional Data Models Incorporating Covariates in Skewed Functional Data Models Meng Li, Ana-Maria Staicu and Howard D. Bondell Department of Statistics, North Carolina State University ana-maria staicu@ncsu.edu November 5,

More information

Modeling Multiple Correlated Functional Outcomes with Spatially. Heterogeneous Shape Characteristics

Modeling Multiple Correlated Functional Outcomes with Spatially. Heterogeneous Shape Characteristics Biometrics 000, 1 25 DOI: 000 000 0000 Modeling Multiple Correlated Functional Outcomes with Spatially Heterogeneous Shape Characteristics Kunlaya Soiaporn 1,, David Ruppert 2,, and Raymond J. Carroll

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University Integrated Likelihood Estimation in Semiparametric Regression Models Thomas A. Severini Department of Statistics Northwestern University Joint work with Heping He, University of York Introduction Let Y

More information

Likelihood Ratio Tests for Dependent Data with Applications to Longitudinal and Functional Data Analysis

Likelihood Ratio Tests for Dependent Data with Applications to Longitudinal and Functional Data Analysis Likelihood Ratio Tests for Dependent Data with Applications to Longitudinal and Functional Data Analysis Ana-Maria Staicu Yingxing Li Ciprian M. Crainiceanu David Ruppert October 13, 2013 Abstract The

More information

Flexible Spatio-temporal smoothing with array methods

Flexible Spatio-temporal smoothing with array methods Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session IPS046) p.849 Flexible Spatio-temporal smoothing with array methods Dae-Jin Lee CSIRO, Mathematics, Informatics and

More information

Estimation of cumulative distribution function with spline functions

Estimation of cumulative distribution function with spline functions INTERNATIONAL JOURNAL OF ECONOMICS AND STATISTICS Volume 5, 017 Estimation of cumulative distribution function with functions Akhlitdin Nizamitdinov, Aladdin Shamilov Abstract The estimation of the cumulative

More information

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

Monte Carlo Studies. The response in a Monte Carlo study is a random variable. Monte Carlo Studies The response in a Monte Carlo study is a random variable. The response in a Monte Carlo study has a variance that comes from the variance of the stochastic elements in the data-generating

More information

Second-Order Inference for Gaussian Random Curves

Second-Order Inference for Gaussian Random Curves Second-Order Inference for Gaussian Random Curves With Application to DNA Minicircles Victor Panaretos David Kraus John Maddocks Ecole Polytechnique Fédérale de Lausanne Panaretos, Kraus, Maddocks (EPFL)

More information

Analysis Methods for Supersaturated Design: Some Comparisons

Analysis Methods for Supersaturated Design: Some Comparisons Journal of Data Science 1(2003), 249-260 Analysis Methods for Supersaturated Design: Some Comparisons Runze Li 1 and Dennis K. J. Lin 2 The Pennsylvania State University Abstract: Supersaturated designs

More information

Analysing geoadditive regression data: a mixed model approach

Analysing geoadditive regression data: a mixed model approach Analysing geoadditive regression data: a mixed model approach Institut für Statistik, Ludwig-Maximilians-Universität München Joint work with Ludwig Fahrmeir & Stefan Lang 25.11.2005 Spatio-temporal regression

More information

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Elizabeth C. Mannshardt-Shamseldin Advisor: Richard L. Smith Duke University Department

More information

Spatial Backfitting of Roller Measurement Values from a Florida Test Bed

Spatial Backfitting of Roller Measurement Values from a Florida Test Bed Spatial Backfitting of Roller Measurement Values from a Florida Test Bed Daniel K. Heersink 1, Reinhard Furrer 1, and Mike A. Mooney 2 1 Institute of Mathematics, University of Zurich, CH-8057 Zurich 2

More information

On prediction and density estimation Peter McCullagh University of Chicago December 2004

On prediction and density estimation Peter McCullagh University of Chicago December 2004 On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating

More information

Some properties of Likelihood Ratio Tests in Linear Mixed Models

Some properties of Likelihood Ratio Tests in Linear Mixed Models Some properties of Likelihood Ratio Tests in Linear Mixed Models Ciprian M. Crainiceanu David Ruppert Timothy J. Vogelsang September 19, 2003 Abstract We calculate the finite sample probability mass-at-zero

More information

Incorporating Covariates in Skewed Functional Data Models

Incorporating Covariates in Skewed Functional Data Models Incorporating Covariates in Skewed Functional Data Models Meng Li, Ana-Maria Staicu and Howard D. Bondell Department of Statistics, North Carolina State University Abstract Most existing models for functional

More information

Interaction effects for continuous predictors in regression modeling

Interaction effects for continuous predictors in regression modeling Interaction effects for continuous predictors in regression modeling Testing for interactions The linear regression model is undoubtedly the most commonly-used statistical model, and has the advantage

More information

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract Journal of Data Science,17(1). P. 145-160,2019 DOI:10.6339/JDS.201901_17(1).0007 WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION Wei Xiong *, Maozai Tian 2 1 School of Statistics, University of

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED by W. Robert Reed Department of Economics and Finance University of Canterbury, New Zealand Email: bob.reed@canterbury.ac.nz

More information

Degradation Modeling and Monitoring of Truncated Degradation Signals. Rensheng Zhou, Nagi Gebraeel, and Nicoleta Serban

Degradation Modeling and Monitoring of Truncated Degradation Signals. Rensheng Zhou, Nagi Gebraeel, and Nicoleta Serban Degradation Modeling and Monitoring of Truncated Degradation Signals Rensheng Zhou, Nagi Gebraeel, and Nicoleta Serban School of Industrial and Systems Engineering, Georgia Institute of Technology Abstract:

More information

Regularization in Cox Frailty Models

Regularization in Cox Frailty Models Regularization in Cox Frailty Models Andreas Groll 1, Trevor Hastie 2, Gerhard Tutz 3 1 Ludwig-Maximilians-Universität Munich, Department of Mathematics, Theresienstraße 39, 80333 Munich, Germany 2 University

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

Lawrence D. Brown* and Daniel McCarthy*

Lawrence D. Brown* and Daniel McCarthy* Comments on the paper, An adaptive resampling test for detecting the presence of significant predictors by I. W. McKeague and M. Qian Lawrence D. Brown* and Daniel McCarthy* ABSTRACT: This commentary deals

More information

Using Estimating Equations for Spatially Correlated A

Using Estimating Equations for Spatially Correlated A Using Estimating Equations for Spatially Correlated Areal Data December 8, 2009 Introduction GEEs Spatial Estimating Equations Implementation Simulation Conclusion Typical Problem Assess the relationship

More information

On the Identifiability of the Functional Convolution. Model

On the Identifiability of the Functional Convolution. Model On the Identifiability of the Functional Convolution Model Giles Hooker Abstract This report details conditions under which the Functional Convolution Model described in Asencio et al. (2013) can be identified

More information

Psychology 282 Lecture #4 Outline Inferences in SLR

Psychology 282 Lecture #4 Outline Inferences in SLR Psychology 282 Lecture #4 Outline Inferences in SLR Assumptions To this point we have not had to make any distributional assumptions. Principle of least squares requires no assumptions. Can use correlations

More information

Tutorial on Functional Data Analysis

Tutorial on Functional Data Analysis Tutorial on Functional Data Analysis Ana-Maria Staicu Department of Statistics, North Carolina State University astaicu@ncsu.edu SAMSI April 5, 2017 A-M Staicu Tutorial on Functional Data Analysis April

More information

SUPPLEMENTARY APPENDICES FOR WAVELET-DOMAIN REGRESSION AND PREDICTIVE INFERENCE IN PSYCHIATRIC NEUROIMAGING

SUPPLEMENTARY APPENDICES FOR WAVELET-DOMAIN REGRESSION AND PREDICTIVE INFERENCE IN PSYCHIATRIC NEUROIMAGING Submitted to the Annals of Applied Statistics SUPPLEMENTARY APPENDICES FOR WAVELET-DOMAIN REGRESSION AND PREDICTIVE INFERENCE IN PSYCHIATRIC NEUROIMAGING By Philip T. Reiss, Lan Huo, Yihong Zhao, Clare

More information

Marginal Screening and Post-Selection Inference

Marginal Screening and Post-Selection Inference Marginal Screening and Post-Selection Inference Ian McKeague August 13, 2017 Ian McKeague (Columbia University) Marginal Screening August 13, 2017 1 / 29 Outline 1 Background on Marginal Screening 2 2

More information

Defect Detection using Nonparametric Regression

Defect Detection using Nonparametric Regression Defect Detection using Nonparametric Regression Siana Halim Industrial Engineering Department-Petra Christian University Siwalankerto 121-131 Surabaya- Indonesia halim@petra.ac.id Abstract: To compare

More information

Restricted Likelihood Ratio Tests in Nonparametric Longitudinal Models

Restricted Likelihood Ratio Tests in Nonparametric Longitudinal Models Restricted Likelihood Ratio Tests in Nonparametric Longitudinal Models Short title: Restricted LR Tests in Longitudinal Models Ciprian M. Crainiceanu David Ruppert May 5, 2004 Abstract We assume that repeated

More information

Effective Computation for Odds Ratio Estimation in Nonparametric Logistic Regression

Effective Computation for Odds Ratio Estimation in Nonparametric Logistic Regression Communications of the Korean Statistical Society 2009, Vol. 16, No. 4, 713 722 Effective Computation for Odds Ratio Estimation in Nonparametric Logistic Regression Young-Ju Kim 1,a a Department of Information

More information

A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models

A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models Jingyi Jessica Li Department of Statistics University of California, Los

More information

Machine Learning for Disease Progression

Machine Learning for Disease Progression Machine Learning for Disease Progression Yong Deng Department of Materials Science & Engineering yongdeng@stanford.edu Xuxin Huang Department of Applied Physics xxhuang@stanford.edu Guanyang Wang Department

More information

SUPPLEMENT TO PARAMETRIC OR NONPARAMETRIC? A PARAMETRICNESS INDEX FOR MODEL SELECTION. University of Minnesota

SUPPLEMENT TO PARAMETRIC OR NONPARAMETRIC? A PARAMETRICNESS INDEX FOR MODEL SELECTION. University of Minnesota Submitted to the Annals of Statistics arxiv: math.pr/0000000 SUPPLEMENT TO PARAMETRIC OR NONPARAMETRIC? A PARAMETRICNESS INDEX FOR MODEL SELECTION By Wei Liu and Yuhong Yang University of Minnesota In

More information

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Xiuming Zhang zhangxiuming@u.nus.edu A*STAR-NUS Clinical Imaging Research Center October, 015 Summary This report derives

More information

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models Tihomir Asparouhov 1, Bengt Muthen 2 Muthen & Muthen 1 UCLA 2 Abstract Multilevel analysis often leads to modeling

More information

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science. Texts in Statistical Science Generalized Linear Mixed Models Modern Concepts, Methods and Applications Walter W. Stroup CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

A Bootstrap Test for Conditional Symmetry

A Bootstrap Test for Conditional Symmetry ANNALS OF ECONOMICS AND FINANCE 6, 51 61 005) A Bootstrap Test for Conditional Symmetry Liangjun Su Guanghua School of Management, Peking University E-mail: lsu@gsm.pku.edu.cn and Sainan Jin Guanghua School

More information

Illustration of the Varying Coefficient Model for Analyses the Tree Growth from the Age and Space Perspectives

Illustration of the Varying Coefficient Model for Analyses the Tree Growth from the Age and Space Perspectives TR-No. 14-06, Hiroshima Statistical Research Group, 1 11 Illustration of the Varying Coefficient Model for Analyses the Tree Growth from the Age and Space Perspectives Mariko Yamamura 1, Keisuke Fukui

More information

Robust Backtesting Tests for Value-at-Risk Models

Robust Backtesting Tests for Value-at-Risk Models Robust Backtesting Tests for Value-at-Risk Models Jose Olmo City University London (joint work with Juan Carlos Escanciano, Indiana University) Far East and South Asia Meeting of the Econometric Society

More information

Aggregated cancer incidence data: spatial models

Aggregated cancer incidence data: spatial models Aggregated cancer incidence data: spatial models 5 ième Forum du Cancéropôle Grand-est - November 2, 2011 Erik A. Sauleau Department of Biostatistics - Faculty of Medicine University of Strasbourg ea.sauleau@unistra.fr

More information

Stochastic Spectral Approaches to Bayesian Inference

Stochastic Spectral Approaches to Bayesian Inference Stochastic Spectral Approaches to Bayesian Inference Prof. Nathan L. Gibson Department of Mathematics Applied Mathematics and Computation Seminar March 4, 2011 Prof. Gibson (OSU) Spectral Approaches to

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Tolerance Bands for Functional Data

Tolerance Bands for Functional Data Tolerance Bands for Functional Data Lasitha N. Rathnayake and Pankaj K. Choudhary 1 Department of Mathematical Sciences, FO 35 University of Texas at Dallas Richardson, TX 75080-3021, USA Abstract Often

More information

Issues on quantile autoregression

Issues on quantile autoregression Issues on quantile autoregression Jianqing Fan and Yingying Fan We congratulate Koenker and Xiao on their interesting and important contribution to the quantile autoregression (QAR). The paper provides

More information

Computational methods for mixed models

Computational methods for mixed models Computational methods for mixed models Douglas Bates Department of Statistics University of Wisconsin Madison March 27, 2018 Abstract The lme4 package provides R functions to fit and analyze several different

More information

Computationally efficient banding of large covariance matrices for ordered data and connections to banding the inverse Cholesky factor

Computationally efficient banding of large covariance matrices for ordered data and connections to banding the inverse Cholesky factor Computationally efficient banding of large covariance matrices for ordered data and connections to banding the inverse Cholesky factor Y. Wang M. J. Daniels wang.yanpin@scrippshealth.org mjdaniels@austin.utexas.edu

More information

Functional Linear Mixed Models for. Irregularly or Sparsely Sampled Data

Functional Linear Mixed Models for. Irregularly or Sparsely Sampled Data Functional Linear Mixed Models for Irregularly or Sparsely Sampled Data Jona Cederbaum 1, Marianne Pouplier 2, Phil Hoole 2, and Sonja Greven 1 arxiv:150801686v1 [statme] 7 Aug 2015 1 Department of Statistics,

More information

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector

More information

Multilevel Cross-dependent Binary Longitudinal Data

Multilevel Cross-dependent Binary Longitudinal Data Multilevel Cross-dependent Binary Longitudinal Data Nicoleta Serban 1 H. Milton Stewart School of Industrial Systems and Engineering Georgia Institute of Technology nserban@isye.gatech.edu Ana-Maria Staicu

More information

Eric Shou Stat 598B / CSE 598D METHODS FOR MICRODATA PROTECTION

Eric Shou Stat 598B / CSE 598D METHODS FOR MICRODATA PROTECTION Eric Shou Stat 598B / CSE 598D METHODS FOR MICRODATA PROTECTION INTRODUCTION Statistical disclosure control part of preparations for disseminating microdata. Data perturbation techniques: Methods assuring

More information

Inversion Base Height. Daggot Pressure Gradient Visibility (miles)

Inversion Base Height. Daggot Pressure Gradient Visibility (miles) Stanford University June 2, 1998 Bayesian Backtting: 1 Bayesian Backtting Trevor Hastie Stanford University Rob Tibshirani University of Toronto Email: trevor@stat.stanford.edu Ftp: stat.stanford.edu:

More information

Linear Model Selection and Regularization

Linear Model Selection and Regularization Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In

More information

Spatial bias modeling with application to assessing remotely-sensed aerosol as a proxy for particulate matter

Spatial bias modeling with application to assessing remotely-sensed aerosol as a proxy for particulate matter Spatial bias modeling with application to assessing remotely-sensed aerosol as a proxy for particulate matter Chris Paciorek Department of Biostatistics Harvard School of Public Health application joint

More information

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures FE661 - Statistical Methods for Financial Engineering 9. Model Selection Jitkomut Songsiri statistical models overview of model selection information criteria goodness-of-fit measures 9-1 Statistical models

More information

Rejoinder. 1 Phase I and Phase II Profile Monitoring. Peihua Qiu 1, Changliang Zou 2 and Zhaojun Wang 2

Rejoinder. 1 Phase I and Phase II Profile Monitoring. Peihua Qiu 1, Changliang Zou 2 and Zhaojun Wang 2 Rejoinder Peihua Qiu 1, Changliang Zou 2 and Zhaojun Wang 2 1 School of Statistics, University of Minnesota 2 LPMC and Department of Statistics, Nankai University, China We thank the editor Professor David

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Institute of Statistics and Econometrics Georg-August-University Göttingen Department of Statistics

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

R-squared for Bayesian regression models

R-squared for Bayesian regression models R-squared for Bayesian regression models Andrew Gelman Ben Goodrich Jonah Gabry Imad Ali 8 Nov 2017 Abstract The usual definition of R 2 (variance of the predicted values divided by the variance of the

More information

Penalized Splines, Mixed Models, and Recent Large-Sample Results

Penalized Splines, Mixed Models, and Recent Large-Sample Results Penalized Splines, Mixed Models, and Recent Large-Sample Results David Ruppert Operations Research & Information Engineering, Cornell University Feb 4, 2011 Collaborators Matt Wand, University of Wollongong

More information

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix Labor-Supply Shifts and Economic Fluctuations Technical Appendix Yongsung Chang Department of Economics University of Pennsylvania Frank Schorfheide Department of Economics University of Pennsylvania January

More information

Dynamic System Identification using HDMR-Bayesian Technique

Dynamic System Identification using HDMR-Bayesian Technique Dynamic System Identification using HDMR-Bayesian Technique *Shereena O A 1) and Dr. B N Rao 2) 1), 2) Department of Civil Engineering, IIT Madras, Chennai 600036, Tamil Nadu, India 1) ce14d020@smail.iitm.ac.in

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

Quantile Regression for Residual Life and Empirical Likelihood

Quantile Regression for Residual Life and Empirical Likelihood Quantile Regression for Residual Life and Empirical Likelihood Mai Zhou email: mai@ms.uky.edu Department of Statistics, University of Kentucky, Lexington, KY 40506-0027, USA Jong-Hyeon Jeong email: jeong@nsabp.pitt.edu

More information

On block bootstrapping areal data Introduction

On block bootstrapping areal data Introduction On block bootstrapping areal data Nicholas Nagle Department of Geography University of Colorado UCB 260 Boulder, CO 80309-0260 Telephone: 303-492-4794 Email: nicholas.nagle@colorado.edu Introduction Inference

More information

Sensitivity and Asymptotic Error Theory

Sensitivity and Asymptotic Error Theory Sensitivity and Asymptotic Error Theory H.T. Banks and Marie Davidian MA-ST 810 Fall, 2009 North Carolina State University Raleigh, NC 27695 Center for Quantitative Sciences in Biomedicine North Carolina

More information

Nonparametric Small Area Estimation via M-quantile Regression using Penalized Splines

Nonparametric Small Area Estimation via M-quantile Regression using Penalized Splines Nonparametric Small Estimation via M-quantile Regression using Penalized Splines Monica Pratesi 10 August 2008 Abstract The demand of reliable statistics for small areas, when only reduced sizes of the

More information

Bootstrapping high dimensional vector: interplay between dependence and dimensionality

Bootstrapping high dimensional vector: interplay between dependence and dimensionality Bootstrapping high dimensional vector: interplay between dependence and dimensionality Xianyang Zhang Joint work with Guang Cheng University of Missouri-Columbia LDHD: Transition Workshop, 2014 Xianyang

More information

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Multilevel Models in Matrix Form Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Today s Lecture Linear models from a matrix perspective An example of how to do

More information

AN F TEST FOR LINEAR MODELS WITH FUNCTIONAL RESPONSES

AN F TEST FOR LINEAR MODELS WITH FUNCTIONAL RESPONSES Statistica Sinica 14(2004), 1239-1257 AN F TEST FOR LINEAR MODELS WITH FUNCTIONAL RESPONSES Qing Shen and Julian Faraway Edmunds.com and University of Michigan Abstract: Linear models where the response

More information

Nonparametric Inference In Functional Data

Nonparametric Inference In Functional Data Nonparametric Inference In Functional Data Zuofeng Shang Purdue University Joint work with Guang Cheng from Purdue Univ. An Example Consider the functional linear model: Y = α + where 1 0 X(t)β(t)dt +

More information

Confidence Estimation Methods for Neural Networks: A Practical Comparison

Confidence Estimation Methods for Neural Networks: A Practical Comparison , 6-8 000, Confidence Estimation Methods for : A Practical Comparison G. Papadopoulos, P.J. Edwards, A.F. Murray Department of Electronics and Electrical Engineering, University of Edinburgh Abstract.

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Estimation of Conditional Kendall s Tau for Bivariate Interval Censored Data

Estimation of Conditional Kendall s Tau for Bivariate Interval Censored Data Communications for Statistical Applications and Methods 2015, Vol. 22, No. 6, 599 604 DOI: http://dx.doi.org/10.5351/csam.2015.22.6.599 Print ISSN 2287-7843 / Online ISSN 2383-4757 Estimation of Conditional

More information

A review of some semiparametric regression models with application to scoring

A review of some semiparametric regression models with application to scoring A review of some semiparametric regression models with application to scoring Jean-Loïc Berthet 1 and Valentin Patilea 2 1 ENSAI Campus de Ker-Lann Rue Blaise Pascal - BP 37203 35172 Bruz cedex, France

More information

SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES

SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES Statistica Sinica 19 (2009), 71-81 SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES Song Xi Chen 1,2 and Chiu Min Wong 3 1 Iowa State University, 2 Peking University and

More information

Bootstrap Approach to Comparison of Alternative Methods of Parameter Estimation of a Simultaneous Equation Model

Bootstrap Approach to Comparison of Alternative Methods of Parameter Estimation of a Simultaneous Equation Model Bootstrap Approach to Comparison of Alternative Methods of Parameter Estimation of a Simultaneous Equation Model Olubusoye, O. E., J. O. Olaomi, and O. O. Odetunde Abstract A bootstrap simulation approach

More information

Derivative Principal Component Analysis for Representing the Time Dynamics of Longitudinal and Functional Data 1

Derivative Principal Component Analysis for Representing the Time Dynamics of Longitudinal and Functional Data 1 Derivative Principal Component Analysis for Representing the Time Dynamics of Longitudinal and Functional Data 1 Short title: Derivative Principal Component Analysis Xiongtao Dai, Hans-Georg Müller Department

More information

Generalized Functional Linear Models with Semiparametric Single-Index Interactions

Generalized Functional Linear Models with Semiparametric Single-Index Interactions Generalized Functional Linear Models with Semiparametric Single-Index Interactions Yehua Li Department of Statistics, University of Georgia, Athens, GA 30602, yehuali@uga.edu Naisyin Wang Department of

More information