Likelihood ratio testing for zero variance components in linear mixed models

Similar documents
RLRsim: Testing for Random Effects or Nonparametric Regression Functions in Additive Mixed Models

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science

Restricted Likelihood Ratio Tests in Nonparametric Longitudinal Models

Some properties of Likelihood Ratio Tests in Linear Mixed Models

Exact Likelihood Ratio Tests for Penalized Splines

On testing an unspecified function through a linear mixed effects model with. multiple variance components

ON EXACT INFERENCE IN LINEAR MODELS WITH TWO VARIANCE-COVARIANCE COMPONENTS

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

A NOTE ON A NONPARAMETRIC REGRESSION TEST THROUGH PENALIZED SPLINES

Likelihood Ratio Tests for Dependent Data with Applications to Longitudinal and Functional Data Analysis

Flexible Spatio-temporal smoothing with array methods

1 Mixed effect models and longitudinal data analysis

Some New Aspects of Dose-Response Models with Applications to Multistage Models Having Parameters on the Boundary

Semi-Nonparametric Inferences for Massive Data

The Restricted Likelihood Ratio Test at the Boundary in Autoregressive Series

Modelling geoadditive survival data

Stat 710: Mathematical Statistics Lecture 31

Bootstrap Procedures for Testing Homogeneity Hypotheses

Function-on-Scalar Regression with the refund Package

Can we do statistical inference in a non-asymptotic way? 1

Penalized Splines, Mixed Models, and Recent Large-Sample Results

Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach"

Score test for random changepoint in a mixed model

Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models

Generalized Functional Concurrent Model

Estimation of cumulative distribution function with spline functions

On the Behaviour of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

SUPPLEMENTARY SIMULATIONS & FIGURES

Semiparametric Mixed Model for Evaluating Pathway-Environment Interaction

Quasi-likelihood Scan Statistics for Detection of

High Dimensional Empirical Likelihood for Generalized Estimating Equations with Dependent Data

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY

Linear Regression. Aarti Singh. Machine Learning / Sept 27, 2010

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances

Hypothesis Testing in Smoothing Spline Models

International Journal of PharmTech Research CODEN (USA): IJPRIF, ISSN: , ISSN(Online): Vol.9, No.9, pp , 2016

Stat 5101 Lecture Notes

Extending the Robust Means Modeling Framework. Alyssa Counsell, Phil Chalmers, Matt Sigal, Rob Cribbie

Nonparametric Small Area Estimation Using Penalized Spline Regression

Missing Data Issues in the Studies of Neurodegenerative Disorders: the Methodology

Efficient Estimation for the Partially Linear Models with Random Effects

Biostatistics Workshop Longitudinal Data Analysis. Session 4 GARRETT FITZMAURICE

A copula goodness-of-t approach. conditional probability integral transform. Daniel Berg 1 Henrik Bakken 2

A TWO-STAGE LINEAR MIXED-EFFECTS/COX MODEL FOR LONGITUDINAL DATA WITH MEASUREMENT ERROR AND SURVIVAL

Local Likelihood Bayesian Cluster Modeling for small area health data. Andrew Lawson Arnold School of Public Health University of South Carolina

Modeling Real Estate Data using Quantile Regression

A nonparametric two-sample wald test of equality of variances

Rejoinder. 1 Phase I and Phase II Profile Monitoring. Peihua Qiu 1, Changliang Zou 2 and Zhaojun Wang 2

Nonparametric Small Area Estimation Using Penalized Spline Regression

Randomized Complete Block Designs

Experimental Design and Data Analysis for Biologists

36-720: Linear Mixed Models

MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30

Spatial bias modeling with application to assessing remotely-sensed aerosol as a proxy for particulate matter

Econometrics with Observational Data. Introduction and Identification Todd Wagner February 1, 2017

SIMEX and TLS: An equivalence result

mboost - Componentwise Boosting for Generalised Regression Models

Generalized, Linear, and Mixed Models

GMM Logistic Regression with Time-Dependent Covariates and Feedback Processes in SAS TM

Statistical Practice. Selecting the Best Linear Mixed Model Under REML. Matthew J. GURKA

Nonparametric Inference In Functional Data

Longitudinal Modeling with Logistic Regression

Nesting and Equivalence Testing

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model

Approaches to Modeling Menstrual Cycle Function

BIOSTATISTICAL METHODS

Results of a simulation of modeling and nonparametric methodology for count data (from patient diary) in drug studies

Power and Sample Size Calculations with the Additive Hazards Model

Regularization in Cox Frailty Models

Intermediate Econometrics

Comparing two independent samples

Generalized Additive Models

Supplementary Material to General Functional Concurrent Model

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao

TESTING FOR HOMOGENEITY IN COMBINING OF TWO-ARMED TRIALS WITH NORMALLY DISTRIBUTED RESPONSES

Some Monte Carlo Evidence for Adaptive Estimation of Unit-Time Varying Heteroscedastic Panel Data Models

Factor Analytic Models of Clustered Multivariate Data with Informative Censoring (refer to Dunson and Perreault, 2001, Biometrics 57, )

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion

Single Index Quantile Regression for Heteroscedastic Data

Linear Mixed Models. One-way layout REML. Likelihood. Another perspective. Relationship to classical ideas. Drawbacks.

Short term PM 10 forecasting: a survey of possible input variables

Modelling Ireland s exchange rates: from EMS to EMU

Course topics (tentative) The role of random effects

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Empirical Likelihood

UNIVERSITY OF TORONTO Faculty of Arts and Science

Semiparametric Methods for Mapping Brain Development

Covariate Adjusted Varying Coefficient Models

A Test of Cointegration Rank Based Title Component Analysis.

Surrogate marker evaluation when data are small, large, or very large Geert Molenberghs

Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview

Confidence intervals for the variance component of random-effects linear models

BIOS 2083: Linear Models

Statistical Inference: The Marginal Model

Subject CS1 Actuarial Statistics 1 Core Principles

P -spline ANOVA-type interaction models for spatio-temporal smoothing

A Conditional Approach to Modeling Multivariate Extremes

Sample size calculations for logistic and Poisson regression models

Transcription:

Likelihood ratio testing for zero variance components in linear mixed models Sonja Greven 1,3, Ciprian Crainiceanu 2, Annette Peters 3 and Helmut Küchenhoff 1 1 Department of Statistics, LMU Munich University, Munich, Germany, sonja.greven@stat.uni-muenchen.de, 2 Department of Biostatistics, Johns Hopkins University, Baltimore, USA, 3 Institute of Epidemiology, GSF-National Research Center for Environment and Health, Neuherberg, Germany Abstract: We consider the problem of testing for zero variance components in linear mixed models. Typical applications include testing for a random intercept or testing for linearity of a smooth function. We propose two approximations to the finite sample null distribution of the restricted likelihood ratio test statistic. Our approach applies to a wider variety of mixed models than previous results, including those with moderate numbers of clusters, unbalanced designs, or nonparametric smoothing. Extensive simulations show that both proposed approximations outperform the 0.5χ 2 0 :0.5χ 2 1 approximation and parametric bootstrap currently used. Our methods are motivated by and applied to the longitudinal epidemiological study Airgene, with the aim of assessing non-linearity of dose-response-functions between ambient air pollution concentrations and inflammation. Keywords: Boundary; Bootstrap; Penalized splines; Nonparametric smoothing; Air pollution. 1 Introduction Linear mixed models are widely used to model longitudinal or clustered data and, more recently, to estimate smoothing parameters for penalized splines using REML or ML. We focus on linear mixed models of the form Y = Xβ + Z 1 b 1 +...+ Z S b S + ε, (1) with random effects b s N(0,σ 2 si Ks ) pairwise independent and independent of ε N(0,σ 2 ε I n), K s columns in Z s, I ν the identity matrix of size ν, and n the sample size. We are interested in testing one of the variance components H 0,s : σs 2 =0 versus H A,s : σs 2 > 0, (2)

2 Likelihood ratio testing for zero variance components in linear mixed models corresponding, for example, to testing for a zero random intercept or testing linearity against a general alternative. This problem is non-standard due to the parameter on the boundary of the parameter space. Stram and Lee (1994), using results from Self and Liang (1987), showed that the Likelihood Ratio Test (LRT) statistic for testing (2) has an asymptotic 0.5χ 2 0 :0.5χ2 1 null distribution if Y can be divided into independent and identically distributed (i.i.d.) subvectors. However, for penalized spline smoothing responses are not independent at least under the alternative, and longitudinal studies often have unbalanced data or only moderate numbers of subjects. Crainiceanu and Ruppert (2004) derived the finite sample and asymptotic null distribution of the LRT and restricted LRT (RLRT) for testing (2) in models with one variance component (S =1),andshowed that it is generally different from 0.5χ 2 0 :0.5χ 2 1.ForS>1, they recommend a parametric bootstrap, which can be computationally very expensive. As the LRT has been seen to have undesirable properties with a high probability mass at zero, we develop two faster approximations to the finite sample null distribution of the RLRT. 2 Two approximations to the RLRT null distribution 2.1 Fast finite sample approximation Our first approximation is inspired by pseudo-likelihood estimation (Gong and Samaniego, 1981), where nuisance parameters are replaced by consistent estimators. Liang and Self (1996) showed that under certain regularity conditions the asymptotic distribution of the pseudo LRT is the same as that of the LRT if the nuisance parameters are known. For our problem, we could view the b i, i s, as nuisance parameters. We assume that under regularity conditions the prediction of i sz i b i is good enough to allow the distribution of the RLRT in model (1) to be closely approximated by the RLRT in the reduced model Ỹ = Xβ + Z s b s + ε, (3) with Ỹ =Y Z i b i assumed known. As model (3) has only one variance i s component σ 2 s, the exact null distribution of the RLRT for testing (2) is known (Crainiceanu and Ruppert, 2004) to be RLRT n d =sup λ 0 { (n p)log [ 1+ N ] n(λ) D n (λ) } K s log(1 + λμ l,n ), (4) where d = denotes equality in distribution, p is the number of columns in X, N n (λ) = K s l=1 λμ l,n 1+λμ l,n w 2 l, D n (λ) = K s l=1 l=1 w 2 l 1+λμ l,n + n p l=k s+1 w 2 l, (5)

Sonja Greven et al. 3 w l,l =1,...,n p, are independent N(0, 1), and μ l,n,l =1,...,K s, are the eigenvalues of the K s K s matrix Z s (I n X(X X) 1 X )Z s.this distribution can be simulated from very efficiently, as the K s eigenvalues need to be computed only once, and speed depends only on K s rather than on the number of observations n. 2.2 Mixture approximation to the Bootstrap If a parametric bootstrap is preferred but computationally intensive, we propose the following parametric approximation to the RLRT distribution RLRT d aud, (6) where U Bernoulli(1 p), D χ 2 1,and d denotes approximate equality in distribution. The flexible family of distributions in (6) contains as a particular case the i.i.d. case asymptotic 0.5χ 2 0 :0.5χ2 1 distribution with a =1andp =0.5, and is just as easy to use. p and a can be estimated from a bootstrap sample, while (6) stabilizes estimation of tail quantiles and thus reduces the necessary bootstrap sample size. Maximum likelihood estimation of p would require the proportion of simulated RLRT values that are exactly zero, and is therefore very sensitive to numerical imprecisions (encountered, for example, with proc MIXED in SAS, and the lme function in R). We thus propose estimation of p and a using the method of moments, after setting all negative values to zero. Note that both our proposed approximations are asymptotically identical to the 0.5χ 2 0 :0.5χ2 1 approximation when the i.i.d. assumption holds. 3 Simulation Study We conducted an extensive simulation study, covering a range of important situations with one or two variance components. An overview is given in Table 1. We varied the number of subjects I = 6, 10 and observations per subject J = 5, 25, 50, 100 for all settings, as well as the value for the respective nuisance variance component σ1 2 =0, 0.1, 1, 10, 100, while all other parameters not restricted to zero under the null were fixed at either 1 or 1. Covariates were sampled from standard normal distributions, with increasing correlation for the case of two smooth functions. Smooth uni- or bivariate functions were modeled using low-rank thin plate splines with smoothing parameters estimated by REML, and with testing of the corresponding variance component translating to testing for linearity f(x) =β 0 + β 1 x in the univariate, and to testing for additivity and linearity f(x 1,x 2 )=β 0 + β 1 x 1 + β 2 x 2 in the bivariate case. 10,000 samples each were simulated from the RLRT null distribution and our two approximations compared to a bootstrap and the 0.5χ 2 0 :0.5χ 2 1 approximation.

4 Likelihood ratio testing for zero variance components in linear mixed models Tested Component Null Hypothesis Nuisance Component 1 Random Intercept Equality of Means - 2 Smooth Function Linearity - 3 Random Intercept Equality of Means Smooth Function 4 Smooth Function Linearity Random Intercept 5 Smooth Function Linearity Smooth Function 6 Random Slope Equality of Slopes Random Intercept 7 Bivariate Smooth Additivity and Linearity Random Intercept 8 Random Intercept Equality of Means Bivariate Smooth TABLE 1. Settings for the simulation study. The fast finite sample approximation produced empirical type I error rates close to the nominal level, comparable to the exact distribution when S = 1. The approximation was usually good even for n = 30; the necessary sample size increased somewhat when random effects were correlated. The aud approximation reduced the necessary bootstrap sample size by between 10% and 90%, with the reduction more pronounced for smaller α levels or p values. The 0.5χ 2 0 :0.5χ2 1 approximation was always very conservative. 4 Testing smooth dose-response-functions The Airgene study was conducted in six European cities between May 03 and July 04. One of its aims is to assess association between inflammatory responses and ambient air pollution concentrations in myocardial infarction survivors. 3 inflammatory blood markers (CRP, Fibrinogen, IL-6) were measured every month repeatedly up to 8 times in 1,003 patients. Air pollution and weather variables were measured concurrently in each city. Patients were genotyped and additional information collected at baseline. Analyses had to account for longitudinal data structure and potential nonlinearity of weather and trend variables, with smooth effects estimated in the mixed model framework. As the shape of the air pollution dose-response functions has important policy implications, one aim of the study was to investigate the functional form of the air pollution effects on inflammation. For illustration, we focus on the effect of PM 10, particulate matter with diameter less than 10 μm, on Fibrinogen in Barcelona. A total of 1074 valid blood samples and PM 10 exposures were available for 183 patients. The model used for the PM 10 -Fibrinogen dose-response function is FIB ij = u i + f(pm10 ij )+ L l=2 β l x ijl + ε ij, ε ij iid N(0,σ 2 ε), (7) where FIB ij is the jth Fibrinogen value of the ith patient, u i iid N(0,σ 2 u) is a random patient intercept, and PM10 indicates the 5-day-average PM 10

Sonja Greven et al. 5 Fibrinogen [g/l] 3.7 3.8 3.9 4.0 4.1 4.2 4.3 20 40 60 80 100 5 day average PM 10 concentration [μg/m 3 ] FIGURE 1. Estimated smooth PM 10-Fibrinogen dose-response function in Barcelona. exposure before blood withdrawal. f(.) is a smooth, unspecified, function estimated using penalized cubic B-Splines and penalizing deviations from linearity (Greven et al., 2006). Linear covariates x l are patient s age, asthma diagnosis, time trend, weekday and air temperature (cubic polynomial). Figure 1 shows the estimated smooth PM 10 effect on Fibrinogen in Barcelona. An important scientific question is whether the dose-response function is linear. This is equivalent to testing (2) in (7), where σs 2 is a variance component controlling the smoothness of f(.). Note that the i.i.d. assumption is violated and that the model includes two variance components. The test statistic for testing linearity of f( ) against a general alternative takes the value RLRT =2.9. Test results for all four approximations are given in Table 2. The fast finite sample approximation reduces computation time by 4 orders of magnitude, while results are similar to a bootstrap. The aud approximation gives results close to the bootstrap even for sample sizes 100 times lower. The 0.5χ 2 0 :0.5χ2 1 approximation is clearly conservative. In all cases, results indicate a significant difference from linearity. 5 Summary We have discussed testing for zero variance components in linear mixed models. Possible applications include, but are not limited to, testing for zero random intercepts or slopes and testing for linearity of a smooth function against a general alternative. For models with one variance component, we recommend directly using the exact null distribution of the RLRT

6 Likelihood ratio testing for zero variance components in linear mixed models Approximation Samples Time p-value Fast finite sample 100,000 35sec 0.023 aud 100 4min 0.026 aud 1,000 40min 0.031 aud 10,000 8h 0.029 0.5χ 2 0 :0.5χ2 1 - - 0.044 Bootstrap 10,000 8h 0.025 TABLE 2. Testing the PM 10 effect on Fibrinogen in Barcelona for linearity. Computation time was measured on a standard PC using Matlab (f.f.s.) / SAS. statistic derived in Crainiceanu and Ruppert (2004), which can be simulated efficiently. For models with more than one variance component, we have proposed two approximations to the finite sample null distribution of the RLRT. Extensive simulations showed superiority of both approximations over the 0.5χ 2 0 :0.5χ2 1 approximation and parametric bootstrap currently used. Our results extend existing methodology to linear mixed models with more than one variance component and lacking independence assumption. We have illustrated the use in testing for linearity of doseresponse-functions for longitudinal data on air pollution health effects. Acknowledgments: The first author was supported by the German Academic Exchange Service (DAAD). We would like to thank Yu-Jen Cheng and Fabian Scheipl for discussions related to the topic of this paper. Special thanks to the Airgene study group. The Airgene study was funded by EU contract QLK4-CT- 2002-O2236. References Crainiceanu, C., and Ruppert, D. (2004). Likelihood ratio tests in linear mixed models with one variance component. Journal of the Royal Statistical Society: Series B, 66(1), 165-185. Gong, G. and Samaniego, F. (1981). Pseudo Maximum Likelihood Estimation: Theory and Applications. The Annals of Statistics, 9(4), 861-869. Greven, S., Küchenhoff, H., and Peters, A. (2006). Additive mixed models with P-Splines. Proceedings of the 21st International Workshop on Statistical Modelling, 201-207. Eds.: Hinde J, Einbeck J and Newell J. Liang, K.-Y. and Self, S.G. (1996). On the Asymptotic Behaviour of the Pseudolikelihood Ratio Test Statistic. Journal of the Royal Statistical Society: Series B, 58(4), 785-796. Self, S.G. and Liang, K.-Y. (1987). Asymptotic Properties of Maximum Likelihood Estimators and Likelihood Ratio Tests Under Nonstandard Conditions. Journal of the American Statistical Association, 82(398), 605-610. Stram, D.O. and Lee, J.-W. (1994). Variance Components Testing in the Longitudinal Mixed Effects Model. Biometrics, 50(3), 1171-1177.