are censored. Because artificial censoring reduces the information available and leads to nonsmooth estimating functions, prior research has noted
|
|
- Oliver Tate
- 6 years ago
- Views:
Transcription
1 ABSTRACT VOCK, DAVID MICHAEL. Advanced Statistical Methods for Complex Longitudinal Data. (Under the direction of Marie Davidian and Anastasios A. Tsiatis.) Longitudinally collected data play an important role in many biomedical applications. As researchers have greater capacity to collect and store large amounts of data over time, biomedical research has shifted from trying to understand the effects of therapy at a single time point, to a more nuanced understanding of how therapy, which could change over time, longitudinally affects a patient. However, many standard methods of analyzing longitudinal data are not adequate to address questions of interest. The first half of the thesis addresses methodological challenges when the longitudinally collected response may be censored. Mixed models are commonly used to represent longitudinal or repeated measures data. An additional complication arises when the response is censored, for example, due to limits of quantification of the assay used. While Gaussian random effects are routinely assumed, little work has characterized the consequences of misspecifying the random effects distribution nor has a more flexible distribution been studied for censored longitudinal data. We show that, in general, maximum likelihood estimators will not be consistent when the random effects density is misspecified, and the effect of misspecification is likely to be greatest when the true random effects density deviates substantially from normality and the number of non-censored observations on each subject is small. We develop a mixed model framework for censored longitudinal data in which the random effects are represented by the flexible seminonparametric (SNP) density and show how to obtain estimates in SAS procedure NLMIXED. Simulations show that this approach can lead to reduction in bias and increase in efficiency relative to assuming Gaussian random effects. The methods are demonstrated on data from a study of hepatitis C virus. In the second portion, we develop methods to assess the effect of organ transplantation, a time-varying treatment, on survival. Because the number of patients waiting for organ transplants exceeds the number of organs available, a better understanding of how transplantation affects the distribution of residual lifetime is needed to improve organ allocation. However, there has been little work to assess the survival benefit of transplantation from a causal perspective. Previous methods developed to estimate the causal effects of treatment in the presence of timevarying confounders have assumed that treatment assignment was independent across patients, which is not true for organ transplantation. We develop a version of G-estimation that accounts for the fact that treatment assignment is not independent across individuals to estimate the parameters of a structural nested failure time model. In addition, G-estimation for failure time models requires the use of artificial censoring, a technique where some subjects observed to fail
2 are censored. Because artificial censoring reduces the information available and leads to nonsmooth estimating functions, prior research has noted that finding the solutions to estimating equations can be difficult. We suggest some computational strategies to mitigate the problems typically encountered with artificial censoring. We derive the asymptotic properties of our estimator and confirm through simulation studies that our method leads to valid inference of the effect of transplantation on the distribution of residual lifetime. We demonstrate our method on the survival benefit of lung transplantation using data from the United Network for Organ Sharing (UNOS).
3 Copyright 2012 by David Michael Vock All Rights Reserved
4 Advanced Statistical Methods for Complex Longitudinal Data by David Michael Vock A dissertation submitted to the Graduate Faculty of North Carolina State University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy Statistics Raleigh, North Carolina 2012 APPROVED BY: Dennis D. Boos Daowen Zhang Marie Davidian Co-chair of Advisory Committee Anastasios A. Tsiatis Co-chair of Advisory Committee
5 DEDICATION To my parents, Janice and Dave; my sister, Emily; and my best friend, Laura. ii
6 BIOGRAPHY Originally from Wheaton, Illinois, David M. Vock is the son of David J. and Janice Vock. David graduated from Wheaton North High School in 2003 and was captain of the two-time Illinois state champion scholastic bowl team for Wheaton North. After high school, David attended St. Olaf College in Northfield, Minnesota where he majored in mathematics and chemistry with a concentration in statistics. While at St. Olaf College, he had the opportunity to study biostatistics and collaborate with World Health Organization researchers in Geneva, Switzerland. David was elected to Phi Beta Kappa and graduated summa cum laude with a Bachelor of Arts degree and distinction in statistics from St. Olaf in Interested in many different academic disciplines, David decided to pursue graduate studies in statistics because, in the words of John Tukey, he could play in everyone s backyard. He began his graduate studies in 2007 at North Carolina State University in Raleigh, North Carolina. David received a Master of Statistics degree from Noth Carolina State University in He actively collaborates with medical researchers at Duke Clinical Research Institute. David s collaborative work has largely focused on hepatitis C and lung transplantation research and has motivated much of his methodological research in longitudinal data analysis, causal inference, and dynamic treatment regimes. He intends to complete his Ph.D. under the direction of Drs. Marie Davidian and Anastasios A. Tsiatis. David has accepted a position as an assistant professor in the Division of Biostatistics at the University of Minnesota School of Public Health. iii
7 ACKNOWLEDGEMENTS Many people have helped and guided me through my graduate studies and dissertation research. I would like to express my gratitude to a few people below while acknowledging that many more people deserve praise. I would like especially to thank my advisors, Drs. Marie Davidian and Butch Tsiatis. They have showed incredible patience in allowing me to explore various research lines and I am very thankful for their latitude. Their expertise and gentle guidance have been immensely helpful in completing this work. I also would like to thank them for offering me a research assistantship to support me during most of my time at North Carolina State. Mostly, I would like to express my gratitude to them for being great faculty role models and mentors. They have guided me through the peer-review process, made sure that I would gain teaching experience, and given me valuable advice on searching for academic jobs. Butch and Marie have shown me how to contribute to interdisciplinary research and how to use collaborative research to motivate methodological work. I wish to thank all the faculty and staff at North Carolina State that have helped me along the way. I want to thank especially the Directors of the Statistics Graduate Program during my first years at North Carolina State, Drs. Pam Arroway and Jacqueline Hughes-Oliver, for being patient in answering all my questions and taking the time to help me find interesting funding opportunities. I am also appreciative of all the advice Dr. Eric Laber has given me about how to interview for faculty jobs and how to be a successful junior faculty member. Finally, I would like to thank Drs. Dennis Boos and Daowen Zhang for serving on my committee, writing letters of recommendation, and being wonderful teachers of some of the most interesting and useful courses I have taken in graduate school, especially Advanced Inference and Mixed Models and Variance Components. I am also thankful for the opportunity to work with Karen Pieper at Duke Clinical Research Institute. She has been an invaluable resource on how to work with clinical investigators and artfully explain advanced methodology to a broad audience. I have learned a lot from her ability to prioritize the most urgent needs of a statistical analysis and to cajole subject area experts to buy-in to rigorous statistical analysis. In addition, I would like to thank all the clinical investigators that I have worked with at Duke. Besides teaching me a lot about their own area of research, they have been patient and nurturing. I am especially grateful for the great education that I received at St. Olaf College that well prepared me for graduate school. I want to thank my mentors and research advisors, Drs. Rebecca Judge, Julie Legler, and Paul Roback. They have been extremely supportive and provided a lot of encouragement that I could be successful in graduate school. iv
8 Finally, I want to thank all my family and friends for their love and support. Simply put, without them I would not have succeeded. My parents have given so much to see that I have been successful and have always been loving and caring. Both my parents and my sister, Emily, have taught me to laugh, stay humble, and never to take myself too seriously. Finally, I would like to thank my girlfriend Laura for her patience, support, and good humor while I wrote this dissertation. v
9 TABLE OF CONTENTS List of Tables viii List of Figures xii Chapter 1 Mixed Model Analysis of Censored Longitudinal Data with Flexible Random Effects Density Introduction Gaussian Linear Mixed Model with Censored Response Semiparametric Linear Mixed Model with Censored Response Estimation Using SAS NLMIXED Simulation Application Discussion Chapter 2 Assessing the Effect of Organ Transplantation on the Distribution of Residual Lifetime Introduction Developing a Statistical Framework Potential Outcome Framework Failure Time Model Observed Data Organ Allocation Model Development of Class of Estimators Proposed Class of Estimators Asymptotic Properties of the Class of Estimators Reasonable Estimators in the Class Nuisance Parameters Censored Survival Times Artificial Censoring Mitigating Problems of Artificial Censoring Simulation Study UNOS Dataset Description of Cohort Model Specification Results Diagnostics and Non-parametric Estimate of Survival Conclusion References Appendices Appendix A Asymptotic Bias of Parameter Estimators for Censored Longitudinal Mixed Models vi
10 Appendix B SAS Code for Censored Longitudinal Models B.1 Obtain Empirical Bayes Estimates B.2 Obtain Grid of Starting Values B.3 SNP Estimation Appendix C Additional Simulations for Mixed Model Analysis of Censored Longitudinal Data Appendix D Joint Distribution of Potential Outcomes and Observed Data Appendix E Consistency and Asymptotic Normality of the Class of Estimators E.1 Proof of Consistency E.2 Proof of Asymptotic Normality Appendix F Improving the Efficiency of Class of Estimators Appendix G Doubly Robust Estimators Appendix H Estimating Nuisance Parameters H.1 Proof: Estimation of Nuisance Parameters Does Not Affect Asymptotic Variance H.2 Suggestions on Estimating ξ Appendix I Organ Transplantation Simulation Details vii
11 LIST OF TABLES Table 1.1 Table 1.2 Table 1.3 Table 1.4 Table 1.5 Table 1.6 Table 1.7 Table 1.8 Proportion of subjects by number of non-censored observation for the four random effects densities considered in the simulations with q = 2 described in Section Simulation results when Gaussian random effects were assumed for all models regardless of the true distribution of the random effects. The simulation included 500 data sets with 500 subjects each. MC Avg: Monte Carlo average of the parameter estimates; MC SD: Monte Carlo standard deviation of the parameter estimates; Avg SE: Average of the standard error estimates; CP: Monte Carlo coverage probability of the 95 percent Wald-type confidence intervals Proportion of subjects by number of non-censored observation when the true random effects density was bimodal and the time points where data were collected was varied. t A = (0, 1, 2, 3, 4) T ; t B = (0, 1, 2, 3, 3.5, 4, 4.5) T ; t C = ( 2, 1, 0, 1, 2, 3, 4) T Simulation results when Gaussian random effects were assumed for the bimodal random effects but t i in (1.2) was varied. The simulation included 500 data sets with 500 subjects each. t A = (0, 1, 2, 3, 4) T ; t B = (0, 1, 2, 3, 3.5, 4, 4.5) T ; t C = ( 2, 1, 0, 1, 2, 3, 4) T ; MC Avg: Monte Carlo average of the parameter estimates; MC SD: Monte Carlo standard deviation of the parameter estimates; Avg SE: Average of the standard error estimates; CP: Monte Carlo coverage probability of the 95 percent Waldtype confidence intervals Proportion of time K was selected using AIC, HQIC, and BIC when the SNP density was used to model the random effects density. The simulation included 500 data sets with 500 subjects each Simulation results when the SNP density was used to model the random effects density. Models with K = 0, K = 1, and K = 2 were fit and K was selected using HQIC. The simulation included 500 data sets with 500 subjects each. MC Avg: Monte Carlo average of the parameter estimates; MC SD: Monte Carlo standard deviation of the parameter estimates; Avg SE: Average of the standard error estimates; Ratio MSE: Ratio of the Monte Carlo mean square error between K = 0 and K selected by the HQIC Proportion of responses censored at each time point for the 811 patients with CT genotype in the IDEAL trial Proportion of subjects by number of uncensored viral load measurements for the 811 patients with the CT genotype in the IDEAL trial Table 1.9 Information criteria and parameter estimates from fitting model (1.10) concerning the IDEAL study with K = 0, K = 1, and K = Table 1.10 Parameter estimates, standard errors, and a 95 percent Wald-type confidence limits from fitting model (1.10) assuming the random effects followed the SNP density with K = viii
12 Table 2.1 Table 2.2 Summary of the parameter estimates for θ 0 by estimator from 1,000 Mote Carlo simulations when the number of organs allocated was 1,250. Converge: Proportion of Monte Carlo datasets the estimator converged; MC mean: Monte Carlo average of the parameter estimates; MC SD: Monte Carlo standard deviation of the parameter estimates; Avg SE: Average of the standard error estimates; Ratio Avg SE:SD: Ratio of the average of the standard error estimates to the standard deviation of the parameter estimates; Ratio MSE: Ratio of the average squared error of the fac estimates to the average squared error of the estimates of the given estimator; CP: Monte Carlo coverage probability of 95 percent Wald-type confidence intervals. Approximate standard errors for the values in the column MC Mean ranged from to , the column MC SD ranged from to , the column Avg SE ranged from to , the column Ratio Avg SE:SD ranged from to 0.045, the column Ratio MSE ranged from to 0.27, the column CP ranged from to The true value of θ 0 is Summary of the parameter estimates for θ 1 by estimator from 1,000 Mote Carlo simulations when the number of organs allocated was 1,250. Converge: Proportion of Monte Carlo datasets the estimator converged; MC mean: Monte Carlo average of the parameter estimates; MC SD: Monte Carlo standard deviation of the parameter estimates; Avg SE: Average of the standard error estimates; Ratio Avg SE:SD: Ratio of the average of the standard error estimates to the standard deviation of the parameter estimates; Ratio MSE: Ratio of the average squared error of the fac estimates to the average squared error of the estimates of the given estimator; CP: Monte Carlo coverage probability of 95 percent Wald-type confidence intervals. Approximate standard errors for the values in the column MC Mean ranged from to , the column MC SD ranged from to , the column Avg SE ranged from to , the column Ratio Avg SE:SD ranged from to 0.045, the column Ratio MSE ranged from to 0.11, the column CP ranged from to The true value of θ 1 is ix
13 Table 2.3 Table 2.4 Table 2.5 Summary of the parameter estimates for θ 0 by estimator from 1,000 Mote Carlo simulations when the number of organs allocated was 2,500. Converge: Proportion of Monte Carlo datasets the estimator converged; MC mean: Monte Carlo average of the parameter estimates; MC SD: Monte Carlo standard deviation of the parameter estimates; Avg SE: Average of the standard error estimates; Ratio Avg SE:SD: Ratio of the average of the standard error estimates to the standard deviation of the parameter estimates; Ratio MSE: Ratio of the average squared error of the fac estimates to the average squared error of the estimates of the given estimator; CP: Monte Carlo coverage probability of 95 percent Wald-type confidence intervals. Approximate standard errors for the values in the column MC Mean ranged from to , the column MC SD ranged from to , the column Avg SE ranged from to , the column Ratio Avg SE:SD ranged from to 0.045, the column Ratio MSE ranged from to 0.13, the column CP ranged from to The true value of θ 0 is Summary of the parameter estimates for θ 1 by estimator from 1,000 Mote Carlo simulations when the number of organs allocated was 2,500. Converge: Proportion of Monte Carlo datasets the estimator converged; MC mean: Monte Carlo average of the parameter estimates; MC SD: Monte Carlo standard deviation of the parameter estimates; Avg SE: Average of the standard error estimates; Ratio Avg SE:SD: Ratio of the average of the standard error estimates to the standard deviation of the parameter estimates; Ratio MSE: Ratio of the average squared error of the fac estimates to the average squared error of the estimates of the given estimator; CP: Monte Carlo coverage probability of 95 percent Wald-type confidence intervals. Approximate standard errors for the values in the column MC Mean ranged from to , the column MC SD ranged from to , the column Avg SE ranged from to , the column Ratio Avg SE:SD ranged from to 0.045, the column Ratio MSE ranged from to 0.090, the column CP ranged from to The true value of θ 1 is Parameter estimates for the model of the propensity to receive an available organ. Group A: primarily COPD; Group B: primarily pulmonary hypertension; Group C: primarily CF; Group D: primarily IPF; Single LTx: single lung transplantation. The reference for Private Insurance and Medicare Insurance is Medicaid or other public funding for payment. The reference for each UNOS Region is Region 11. To allow for a non-linear relationship between LAS and the log propensity to receive treatment, we considered a restricted cubic spline transformation of LAS knot points at 32, 36, 43, and 85. LAS and LAS refer to the non-linear terms of this transformation x
14 Table 2.6 Table B.1 Parameter estimates of the scale accelerated failure time model for the effect of lung transplantation. Group A: primarily COPD; Group B: primarily pulmonary hypertension; Group C: primarily CF; Group D: primarily IPF; Single LTx: single lung transplantation. The reference for Private Insurance and Medicare Insurance is Medicaid or other public funding for payment. LAS has been centered at 30, so the interpretation of the intercept term is the log acceleration factor at a LAS of Conversion between the notation used in Section 1 and the names used in the example code Table C.1 Proportion of subjects by the number of non-censored observations for the four random effect densities considered in the simulations with q = Table C.2 Simulation results when Gaussian random effects were assumed for all models regardless of the true distribution of the random effects. The simulation included 500 data sets with 500 subjects each. MC Avg: Monte Carlo average of the parameter estimates; MC SD: Monte Carlo standard deviation of the parameter estimates; Avg SE: Average of the standard error estimates; CP: Monte Carlo coverage probability of the 95 percent Wald-type confidence intervals Table C.3 Proportion of time K was selected using AIC, HQIC, and BIC for q = 1 simulations when SNP was used to estimate the random effects Table C.4 Simulation results when SNP was used to estimate the random effects for q = 1 simulations. Models with K = 0, K = 1, and K = 2 were fit and K was selected using HQIC. The simulation included 500 data sets with 500 subjects each. MC Avg: Monte Carlo average of the parameter estimates; MC SD: Monte Carlo standard deviation of the parameter estimates; Avg SE: Average of the standard error estimates; Ratio MSE: Ratio of the Monte Carlo mean square error between K = 0 and K selected by the HQIC Table I.1 Quantile (years) of the simulated residual life had the patient never been transplanted by LAS at transplantation. The maximum residual follow-up time is 5.66 years in the simulation xi
15 LIST OF FIGURES Figure 1.1 Figure 1.2 Trajectory of log 10 viral load of 25 randomly selected subjects with genotype CT from the IDEAL study. The lower limit of quantification, log 10 IU/mL, is shown in the green dotted line. For graphical purposes, the lower limit of quantification was imputed for the censored responses.. 2 Contour plot of the true random effects densities from the skewed and bimodal scenarios considered in the simulation described in Section 1.5. In each case, the densities were a mixture of two normals Figure 1.3 Average estimated contour and marginal density plots from fitting 500 data sets with skewed random effects where the density was assumed to follow the SNP density with K selected by HQIC. The blue dotted lines on the marginal density plots are the true marginal density. The true bivariate random effects density is plotted in Figure 1.2 and described in Section Figure 1.4 Average estimated contour and marginal density plots from fitting 500 data sets with bimodal random effects where the density was assumed to follow the SNP density with K selected by HQIC. The blue dotted lines on the marginal density plots are the true marginal density. The true bivariate random effects density is plotted in Figure 1.2 and described in Section Figure 1.5 Contour plot of the bivariate density estimate and marginal density estimates for the subject-specific intercepts and slopes with K = 2 from model (1.10) concerning the IDEAL study Figure 2.1 Figure 2.2 Figure 2.3 Plot of ij (θ, F j) and Ṽ ij (θ, F j) as a function of R ij where Cij (θ, F j) = 500 and ɛ = 50. These are the smooth approximations to ij (θ, F j ) = I{ R ij (θ) < Cij (θ, F j)} and Ṽij(θ, F j ) = min{ R ij (θ), Cij (θ, F j)}, respectively Histogram of all LAS scores each time an organ was offered (left) and LAS scores at transplantation for those patients that were transplanted (right) Kaplan-Meier residual survival curves for patients from the time of listing on the waitlist and transplantation. We have stratified by LAS at listing and transplantation, respectively, for these curves. The waitlist survival was censored at the time the patient was transplanted xii
16 Figure 2.4 Figure 2.5 Figure C.1 Figure C.2 Figure I.1 Logarithm of the estimated scale acceleration factor in the transformation model for the causal effect of lung transplantation in Group A (primarily COPD, left) and Group D (primarily IPF, right) by LAS and transplantation type. The curves here are drawn for patients under 60 years of age, not in UNOS regions 6-8, and not paying for the transplant through private insurance or Medicare. We emphasize that negative (positive) values of the log scale factor suggest that transplantation increases (decreases) lifespan. The histogram in the background gives the number of patients transplanted for that range of LAS and native disease group Kaplan-Meier survival curves of the transformed residual lifetime for all (i, j) pairs by LAS group and native disease. The rows correspond to native disease groups A, C, and D from top to bottom. The solid blue line corresponds to all (i, j) pairs where dn ij = 1 and the red dotted line corresponds to all (i, j) pairs where dn ij = 0. If the transformation model has been correctly specified then the two lines should be approximately the same Estimated random effects density for each of the four random effect densities considered (clockwise from upper left: normal, t 5, bimodal, and skewed) when SNP was used to estimate the random effects and K was selected using HQIC. The estimated density is plotted for 100 randomly selected Monte Carlo data sets. The truth is superimposed in red, and the average of all 500 Monte Carlo data sets is superimposed in blue Histogram of the estimated variance of the random effect when SNP was used to estimate the random effects density and the true random effect density was t 5 (left). The estimated random effects densities for those data sets where the estimate of var(b 0i ) is greater than 4 are also plotted (right) Logarithm of rate parameter from the exponential distribution used to generate the residual lifetime had the patient never been transplanted xiii
17 Chapter 1 Mixed Model Analysis of Censored Longitudinal Data with Flexible Random Effects Density 1.1 Introduction Longitudinal or repeated measures data are commonly represented by mixed effects models. A complication occurs when the response is censored for some of the observations, which often arises when assay measures are collected over time and the assay procedure is subject to limits of quantification (Hughes, 1999; Jacqmin-Gadda et al., 2000; Wu, 2002; Gray et al., 2004; Wang and Fygenson, 2009, and references therein). As an example, we consider data from the Individual Dosing Efficacy versus Flat Dosing to Assess Optimal Pegylated Interferon Therapy (IDEAL) study, which tracked the viral load progression of treatment-naive patients with hepatitis C (Thompson et al., 2010). One objective was to characterize the viral load decline in patients receiving standard treatment, pegylated interferon-alpha and ribavirin, over the first twelve weeks of treatment. Within-subjects, the trajectory of the log 10 viral load over the first 12 weeks is approximately linear, but responses were censored from below at log 10 IU/mL, the lower limit of quantification of the assay used. Over the first 12 weeks, 10.5 percent of all observations were censored, and 35.2 percent were censored at week 12. The trajectories of 25 randomly selected subjects are shown in Figure 1.1. A standard analysis would impute the censored responses as the limit or half limit of quantification and then fit a mixed model for uncensored longitudinal data. However, Jacqmin-Gadda et al. (2000) showed through simulation studies that such crude parameter estimators are badly biased even for modest levels of censoring. A refined analysis would postulate a likelihood that 1
18 Time (weeks) log Viral Load (IU/mL) Figure 1.1: Trajectory of log 10 viral load of 25 randomly selected subjects with genotype CT from the IDEAL study. The lower limit of quantification, log 10 IU/mL, is shown in the green dotted line. For graphical purposes, the lower limit of quantification was imputed for the censored responses. 2
19 accounts for the censoring and obtain the maximum likelihood estimates (MLE). Substantial research has gone into efficient methods to solve for the MLE. Hughes (1999) developed a Monte Carlo Expectation-Maximization (MCEM) estimation procedure to obtain the MLE for linear mixed effects models with censored responses. Wu (2002) extended the work of Hughes by developing a EM algorithm for nonlinear mixed-effects models with censored responses that allowed for covariates to be measured with error. Wu (2004) further extended the EM algorithm to joint models of multiple longitudinal processes when the responses may be censored. Vaida and Liu (2009) derived a closed form expression for the expectation step rather than use Monte Carlo simulation, which greatly reduced the computational time. The estimation procedure developed by Vaida and Liu (2009) is available in the R package lmec. Other computational approaches have been proposed besides the EM algorithm. Fitzgerald et al. (2002) implemented a multiple imputation approach that avoids maximization of a censored likelihood. Jacqmin-Gadda et al. (2000) noted that the full likelihood is the product of the marginal density of the non-censored responses and the conditional distribution of the censored responses, evaluated at the limit of detection, given the non-censored responses. The authors used numerical integration to estimate the conditional distribution and then combined the Simplex algorithm and the Marquardt algorithm in the Fortran language to solve for the MLE. In contrast, both Thiébaut and Jacqmin-Gadda (2004) and Lyles et al. (2000) approximate the full likelihood of mixed effect model by numerically integrating over the random effects densities and then using standard optimizers to obtain parameter estimates. Previous methods cited above using this full-likelihood approach have all assumed Gaussian random effects and intra-subject error. While intra-subject error may be reasonably assumed to be normally distributed, the assumption of Gaussian random effects may be too restrictive in many applications. In the IDEAL study, some patients do not respond to treatment while others show marked declines in the viral load, so the distribution of subject-specific slopes is unlikely to be approximately Gaussian. For non-censored linear mixed effects models, the maximum likelihood estimators for the fixed effects and covariance components are consistent under broad regularity conditions even when the random effects distribution is misspecified (Verbeke and Lesaffre, 1997). For non-linear mixed effects models, which include mixed models with censored responses and generalized linear mixed models, the maximum likelihood estimators will not, in general, be consistent if the random effects distribution is misspecified. The effect of misspecification has been well-studied in generalized linear mixed models. Broadly speaking, researchers have noted that there is considerable bias and loss of efficiency in the maximum likelihood parameter estimators assuming Gaussian random effects when either the true random effects density deviates substantially from normality or the variance of the random effects is large and there is moderate misspecification of the random effects (Neuhaus et al., 1992; Hartford and Davidian, 2000; Heagerty and Kurland, 3
20 2001; Agresti et al., 2004; Litière et al., 2008). The potential effect of misspecification of the random effects distribution when the response is censored has not been examined. A variety of methods have been proposed to relax the Gaussian assumption for the random effects (Magder and Zeger, 1996; Verbeke and Lesaffre, 1997; Kleinman and Ibrahim, 1998; Aitkin, 1999; Zhang and Davidian, 2001; Chen et al., 2002; Lee and Thompson, 2008; Ho and Hu, 2008; Komarek and Lesaffre, 2008; Lin and Lee, 2008; Bandyopadhyay et al., 2010; Lachos et al., 2010); however, none of these methods has been implemented for censored longitudinal data. For linear mixed models, Zhang and Davidian (2001) assumed the random effects follow a smooth density that can be represented by the seminonparametric (SNP) formulation proposed by Gallant and Nychka (1987). They showed the log likelihood can be written in a closed form leading to straightforward estimation. We show how the SNP representation can be implemented for the random effects density in linear mixed models when the response is potentially censored. In Section 1.2, we formulate the censored-response mixed model with Gaussian random effects and show that incorrectly assuming the random effects are Gaussian leads to inconsistent estimators of model parameters. In addition, we suggest general scenarios where estimation is likely to be particularly poor. In Section 1.3, we discuss the censored-response linear mixed model, where the random effects are assumed to follow a smooth, continuous density and introduce the SNP representation. Section 1.4 describes how to fit the SNP model using SAS procedure NLMIXED, how to select the degree of flexility in the model, and how to choose starting values. Section 1.5 presents the results of several simulations. In Section 1.6, we illustrate the method by application to data from the IDEAL study. 1.2 Gaussian Linear Mixed Model with Censored Response For ease of presentation, we assume responses are potentially left-censored; developments are easily generalized to right- or interval-censored data. Let Y ij, i = 1,..., m and j = 1,..., n i be the j th response for subject i that would have been been observed had there been no censoring. Consider the usual linear mixed model for longitudinal data Y ij = v T ijδ + s T ijα i + ɛ ij = x T ijβ + s T ijb i + ɛ ij, (1.1) where v ij = (x T ij, st ij )T ; δ = (β T, γ T ); b i = γ + α i ; β and γ are the p- and q-dimensional fixed effects parameters associated with covariates x ij and s ij, respectively; α i are the q-dimensional, mean zero subject-specific random effect vectors associated with covariates s ij, independent across i; and ɛ ij iid N(0, σ 2 ) are the intra-subject error, which are independent of α i. We adopt the rightmost representation in (1.1) and note that the centered random effects b i can be written 4
21 as b i = µ + RZ i, where µ is a (q 1) vector, R is a (q q) lower triangular matrix, and Z i is a (q 1) vector of random effects. Typically, Z i are assumed to follow a standard q-dimensional normal distribution, which implies b i are normally distributed with mean µ = γ and covariance matrix var(α i ) = RR T. As an example, a special case of (1.1) is the linear random coefficient model with baseline covariate x i given by Y ij = x i β + b 0i + b 1i t ij + ɛ ij, (1.2) where b i = (b 0i, b 1i ) T and s ij = (1, t ij ) T. Due to left-censoring, we observe Q ij which takes the value Y ij for Y ij > l ij and takes the value l ij, the known lower limit of quantification for the j th response on subject i, otherwise. Letting r = vech(r) be the non-zero elements of R, the parameters of interest are θ = (β T, µ T, r T, σ 2 ) T, and the likelihood assuming Gaussian random effects is m L(θ; Q) = i=1 n i j=1 [ {Φ 1 (m ij )} I(Q ij=l ij ) { 1 σ φ 1(m ij ) } I(Q ij >l ij ) ] φ q (z i )dz i, (1.3) where m ij = {Q ij x T ij β st ij (µ + RZ i)}/σ, Q is the vector of all responses observed, and φ n ( ) and Φ n ( ) are the standard n-dimensional normal density and distribution. Alternatively, we can express (1.1) as Y i = V i δ + S i α i + ɛ i, where V i {n i (p + q)} and S i (n i q) are the matrices with rows vij T and st ij, respectively, and Y i = (Y i1,..., Y ini ) T and similarly for ɛ i, l i, and Q i. To simplify the following argument, we assume that the design matrix is fixed, V i = V, and n i = n for i = 1,..., m and also suppress the index i throughout. To express (1.3) using vector notation, let f Y (y; θ) and F Y (y; θ) be the multivariate normal density and distribution, respectively, with mean V δ and covariance matrix Σ = SRR T S T + σ 2 I n. Define f Y1 Y 2 (y 1 y 2 ; θ) and F Y1 Y 2 (y 1 y 2 ; θ) to be the conditional multivariate normal density and distribution, respectively, of the vector Y 1 given the vector Y 2. There are 2 n 2 patterns of censoring/non-censoring that could be observed for an individual with at least one censored and one non-censored observation. Index each of these 2 n 2 distinct censoring patterns by k. Let Y k,o (Y k,c, respectively) be the random vector containing elements of Y that would be observed (censored) under censoring pattern k. If the k th pattern were observed for a particular subject, then the likelihood contribution from that subject assuming normal random effects is given by F Yk,c Y k,o (l k,c Q k,o ; θ)f Yk,o (Q k,o ; θ), where Q k,o is the vector of non-censored responses and l k,c (l k,o, respectively) is the limit of quantification for the censored observations (non-censored observations) under censoring pat- 5
22 tern k. The likelihood contribution from one subject can now be written as L(θ, Q) = f Y (Q; θ) I(Q>l) F Y (l; θ) I(Q=l) {F Yk,c Y k,o (l k,c Q k,o ; θ)f Yk,o (Q k,o ; θ)} I(Q k,c=l k,c,q k,o >l k,o ). 2 n 2 k=1 If the random effects distribution is incorrectly specified, the maximum likelihood estimator ˆθ will converge to value of θ that solves E { / θ log L(θ; Q)} = 0, where the expectation is taken with respect to the true random effects distribution. Assuming that the covariance components are known and using a first-order Taylor series expansion, we show in Appendix A that ˆδ will converge to the value of δ that solves l 0 = D(δ, V )(δ δ) n 2 k=1 [ l k,o lk,c { } V T Σ 1 (q V δ GY (l) ) F Y (l) f Y (q) g Y (q) dq (1.4) V T k,c Σ 1 k,c {q k,c V k,c δ } { GYk,c Y (l } ] k,o k,c q k,o ) F Yk,c Y k,o (l k,c q k,o ) f Y k,c Y k,o (q k,c q k,o ) g Yk,c Y k,o (q k,c q k,o ) g Yk,o (q k,o )dq k,c dq k,o, where g Y (q; δ ) and G Y (q; δ ) are the true marginal density and distribution, respectively, of Y ; all densities in (1.4) are evaluated at δ, the true parameter value; and D(δ, V ), Q k,c, V and Σ k,c are given in the Appendix A. While complex, the approximation reveals important insights on the asymptotic bias when the random effects are incorrectly specified. In the likely case that the random effects distribution is misspecified, the bias is driven by the difference in the true conditional distribution of the censored response given the non-censored response (that is g Yk,c Y k,o (q k,c q k,o )) and the same conditional distribution induced by the Gaussian random effects (that is f Yk,c Y k,o (q k,c q k,o )), weighted by the likelihood of the non-censored response (that is g Yk,o (q k,o )). Practically, this suggests that small deviations from normality will lead to only slight asymptotic bias. If we have correctly specified the intra-subject error distribution, then g Yk,c Y k,o (y k,c y k,o ) will be approximately normal if the dimension of Y k,o is large regardless of the true random effects density. Thus, if there is a large probability of observing many non-censored observations on each subject, then the asymptotic bias is likely to be slight. This suggests that the overall percentage of censored responses is immaterial in determining the asymptotic bias. Instead the absolute number of non-censored responses for each subject is critical. k,c, 6
23 1.3 Semiparametric Linear Mixed Model with Censored Response We offer a summary of semiparametric linear mixed models and refer the reader to Zhang and Davidian (2001) for a complete description. We now assume that the distribution of Z i belongs to the smooth class of continuous densities described by Gallant and Nychka (1987). This class is sufficiently flexible to include skewed, thick- and thin-tailed, and multimodal distributions but does not include densities with jumps, kinks, or oscillations. These densities can be represented as an infinite series but can be approximated by a truncated series. The densities that are part of the truncated series are referred to as seminonparametric or SNP. We thus assume that the density of Z i can be approximated by the SNP representation with degree of truncation K given by h K (z) = PK(z)φ 2 q (z) = (j j q) K a j1,...,j q (z j 1 1 zjq q ) 2 φ q (z), (1.5) where j l 0 for l = 1,..., q and K is the order of the polynomial P K (z). For example, with K = 2 and q = 2, P K (z) = a 00 + a 10 z 1 + a 01 z 2 + a 20 z1 2 + a 11z 1 z 2 + a 02 z2 2. When K = 0, we show below that a 0,...,0 must equal 1, and the density of Z i is a standard q-dimensional normal. K controls the degree of flexibility of the density h K (z); we discuss in Section 1.4 how to select K. The coefficients a j1,...,j q must be chosen so that h K (z) integrates to one. The constraint hk (z)dz = 1 is equivalent to imposing E{PK 2 (U)} = 1, where U follows a standard q- dimensional normal distribution (Zhang and Davidian, 2001). Let a be the d-dimensional vector of coefficients for the polynomial P K (z) and let j 1,..., j q be the subscripts corresponding to the j th element. Then the above constraint { can be rewritten as E{PK 2 } (U)} = at Aa = 1, where A is the matrix with (j, k) element equal to E(U j 1+k 1 1 ) E(U jq+kq 1 ) and U 1 is distributed as a standard normal. Rather than impose a constraint on a, we may rewrite the (d 1) vector a as a function of d 1 parameters. Because A must be positive definite, there exists a matrix B such that A = B 2. If we let c = Ba, then a T Aa = 1 implies c T c = 1. As c and c will result in the same density h K (z), c = (c 1,..., c d ) T must lie in the half-unit sphere of R d. Therefore, we can express c in terms of a polar coordinate transformation, where c 1 = sin(ξ 1 ), c 2 = cos(ξ 1 ) sin(ξ 2 ),..., c d 1 = cos(ξ 1 )... cos(ξ d 2 ) sin(ξ d 1 ),c d = cos(ξ 1 )... cos(ξ d 1 ), π/2 ξ j π/2 for j = 1,..., d 1, and ξ = (ξ 1,..., ξ d 1 ) T. The vector a can be written as B 1 c, which only involves d 1 parameters. The parameters of interest are now θ K SNP = (βt, µ T, r T, σ 2, ξ T ) T, which includes an additional d 1 parameters compared to when normality is assumed. 7
24 Note that the SNP density does not impose E(Z i ) = 0, so that E(b i ) = γ = µ + R{E(Z i )} and Var(b i ) = R{Var(Z i )}R T. The moments of Z i are linear combinations of the moments of a standard normal density, which can be found using recursion formulas. 1.4 Estimation Using SAS NLMIXED In the case where there is no censoring of the response variable, the log likelihood assuming the SNP representation of the random effects can be expressed in a closed form (Zhang and Davidian, 2001). Censoring necessitates numerical integration. For a fixed K, the likelihood of θ K SNP is given by m L(θSNP K ; Q) = i=1 n i j=1 [ {Φ 1 (m ij )} I(Q ij=l ij ) { 1 σ φ 1(m ij ) } I(Q ij >l ij ) ] P 2 K(z i )φ q (z i )dz i (1.6) For each observation i, this requires evaluation of a q-dimensional integral, which, in practice, would likely only be one- or two-dimensional. In contrast, we could integrate over the marginal density of Y i for each censored observation as Jacqmin-Gadda et al. (2000) proposed. However, the dimension of that integral would equal the number of censored observations for subject i, which could be quite large and computationally intractable. Even though the integral in (1.6) is typically only one- or two-dimensional, approximating the integral can be difficult for complex random effects densities such as the SNP density. More generally, we need to be able to approximate the following integral, L i (θ) = n i j=1 { } f Qij Z i (q ij z i ; β, µ, r, x ij ) g Zi (z i )dz i. (1.7) where f Qij Z i (q ij z i ; β, µ, r, x ij ) and g Zi (z i ) are the density of the response conditioned on the random effects and random effects density, respectively. In this application, and g Zi (z i ) = P 2 K (z i)φ q (z i ). f Qij Z i (q ij z i ; β, µ, r, x ij ) = {Φ 1 (m ij )} I(Q ij=l ij ) { 1 σ φ 1(m ij ) } I(Q ij >l ij ) Many of the standard numerical integration techniques perform quite poorly when the random effects density is complex. For example, adaptive Gaussian quadrature requires maximizing the integrand in (1.7) for each of the m subjects at each updated estimate of ˆβ, ˆµ, and ˆr. The maximizations can be too time consuming if g Zi is sufficiently complex as is the case for the SNP density. Traditional non-adaptive Gaussian quadrature centers the quadrature points at the mean of the random effects and scales them based on the variance of the random effects. 8
25 The mode of the integrand in (1.7) may not be near the mean of the random effects and a large number of quadrature points may be needed to achieve a reasonable approximation to the integral. For q 2, the number of quadrature points needed is usually too large to be computationally feasible. Other approximation techniques, such as the Laplace approximation (Wolfinger, 1993), first-order approximation (Beal and Sheiner, 1988), and importance sampling (Pinheiro and Bates, 1995) are often quite poor or infeasible if g bi is non-gaussian. Even though adaptive Gaussian quadrature is computationally infeasible, it is instructive to examine the underpinnings of the integral approximation (Pinheiro and Bates, 1995). Let Ẑi be the empirical Bayes estimate of Z i given the current estimates of β, µ and r, defined as ˆβ (k), ˆµ (k), and ˆr (k), respectively, and G i be the estimated covariance matrix of Ẑi. That is Ẑ i = arg max zi n i j=1 { f Qij Z i (q ij z i ; ˆβ } (k), ˆµ (k), ˆr (k), x ij ) g Zi (z i ) We can write Z i = Ẑi + G i 1/2 U i for the random variable U i and then perform a change of variable to have the integral in (1.7) taken with respect to U i : L i (θ) = n i G 1/2 i j=1 { } f Qij Z i (q ij Ẑi + G 1/2 i u i ; β, µ, r, x ij ) g Zi (Ẑi + G 1/2 i u i )du i. SAS procedure NLMIXED then approximates the integral using standard Gaussian quadrature with quadrature points based on the standard normal kernel. Using the approximation to (1.7) described above, we can then maximize (1.6) with respect to β, µ, and r to obtain ˆβ (k+1), ˆµ (k+1), and ˆr (k+1), respectively. Typically one would iterate, until convergence, between finding the empirical Bayes estimates to re-approximate (1.7), and then updating β, µ, and r. This requires m maximizations at each iteration because the integral in (1.7) is re-centered and re-scaled about the updated empirical Bayes estimates at each iteration. Heuristically, we would like to center and scale the integral one time at something reasonably close to Z i to avoid m maximizations at each iteration. We propose to center and scale the quadrature points at the empirical Bayes estimates and their estimated variance from assuming Gaussian random effects. Here we need to be careful as the Z i are assumed to have mean zero when Gaussian random effects are assumed, but this need not be the case for a flexible random effects density, such as the SNP density. Let ˆb i,g be the empirical Bayes estimate of b i, the centered random effect, from assuming Gaussian random effects and G i,g the estimated variance of the empirical Bayes estimates. 9
26 Then we have µ + RZ i = ˆb i,g + G 1/2 i,g U i Z i = R 1 (ˆb i,g µ) + R 1 G 1/2 i,g U i, which we can substitute into (1.7) and take the integral with respect to U i : L i (θ) = n i R 1 G 1/2 i,g j=1 [f Qij Z i { q ij R 1 (ˆb i,g µ) + R 1 G 1/2 i,g u i; β, µ, r, x ij }] (1.8) { } g Zi R 1 (ˆb i,g µ) + R 1 G 1/2 i,g u i; η du i. We can now use standard Gaussian quadrature with quadrature points based on the standard normal kernel to approximate (1.8). Because SAS procedure NLMIXED does not allow the user to center the quadrature points at arbitrary values, to force NLMIXED to use the integration strategy above we must use programming statements to center the integration as in (1.8). In addition to forcing NLMIXED to center the quadrature points at arbitrary values, we must also trick NLMIXED to allow non-gaussian random effects. The SAS procedure NLMIXED has been developed to obtain maximum likelihood estimates for mixed models with Gaussian random effects. Except when K = 0, the SNP random effects density will not be normally distributed. However, if we consider n i j=1 [ {Φ 1 (m ij )} I(Q ij=l ij ) { 1 σ φ 1(m ij ) } I(Q ij >l ij ) ] P 2 K(z i ) (1.9) to be the likelihood for Q ij conditioned on the random effects, then the random effects Z i can be thought to follow a standard q-dimensional normal distribution (Liu and Yu, 2008). Example code to implement SNP for left-censored mixed models is given in Appendix B. Among the optimization routines available in NLMIXED, dual-quasi Newton optimization works well. The inverse Hessian matrix may be used to obtain standard errors for the parameter estimates and is computed as part of the standard output in NLMIXED. The preceding discussion assumes a fixed K. We treat K as a tuning parameter and select K by visually comparing the estimated densities and computing information criteria evaluated at the estimates, ˆθ SNP K, of θk SNP from model fits for different values of K (Zhang and Davidian, 2001). The information criteria are of the form 2L(ˆθ SNP K, Q) + C(m)p where p is the number of parameters in the model. For the Akaike information criterion (AIC), C(m) = 2; Hannan- Quinn information criterion (HQIC), C(m) = 2 log log m; and Bayesian information criterion 10
Mixed model analysis of censored longitudinal data with flexible random-effects density
Biostatistics (2012), 13, 1, pp. 61 73 doi:10.1093/biostatistics/kxr026 Advance Access publication on September 13, 2011 Mixed model analysis of censored longitudinal data with flexible random-effects
More informationSemiparametric Mixed Effects Models with Flexible Random Effects Distribution
Semiparametric Mixed Effects Models with Flexible Random Effects Distribution Marie Davidian North Carolina State University davidian@stat.ncsu.edu www.stat.ncsu.edu/ davidian Joint work with A. Tsiatis,
More informationSome New Methods for Latent Variable Models and Survival Analysis. Latent-Model Robustness in Structural Measurement Error Models.
Some New Methods for Latent Variable Models and Survival Analysis Marie Davidian Department of Statistics North Carolina State University 1. Introduction Outline 3. Empirically checking latent-model robustness
More informationEstimating Causal Effects of Organ Transplantation Treatment Regimes
Estimating Causal Effects of Organ Transplantation Treatment Regimes David M. Vock, Jeffrey A. Verdoliva Boatman Division of Biostatistics University of Minnesota July 31, 2018 1 / 27 Hot off the Press
More informationA Sampling of IMPACT Research:
A Sampling of IMPACT Research: Methods for Analysis with Dropout and Identifying Optimal Treatment Regimes Marie Davidian Department of Statistics North Carolina State University http://www.stat.ncsu.edu/
More informationWeb-based Supplementary Materials for A Robust Method for Estimating. Optimal Treatment Regimes
Biometrics 000, 000 000 DOI: 000 000 0000 Web-based Supplementary Materials for A Robust Method for Estimating Optimal Treatment Regimes Baqun Zhang, Anastasios A. Tsiatis, Eric B. Laber, and Marie Davidian
More informationThe consequences of misspecifying the random effects distribution when fitting generalized linear mixed models
The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models John M. Neuhaus Charles E. McCulloch Division of Biostatistics University of California, San
More informationThe performance of estimation methods for generalized linear mixed models
University of Wollongong Research Online University of Wollongong Thesis Collection 1954-2016 University of Wollongong Thesis Collections 2008 The performance of estimation methods for generalized linear
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationInferences about Parameters of Trivariate Normal Distribution with Missing Data
Florida International University FIU Digital Commons FIU Electronic Theses and Dissertations University Graduate School 7-5-3 Inferences about Parameters of Trivariate Normal Distribution with Missing
More informationPrerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3
University of California, Irvine 2017-2018 1 Statistics (STATS) Courses STATS 5. Seminar in Data Science. 1 Unit. An introduction to the field of Data Science; intended for entering freshman and transfers.
More informationUsing Estimating Equations for Spatially Correlated A
Using Estimating Equations for Spatially Correlated Areal Data December 8, 2009 Introduction GEEs Spatial Estimating Equations Implementation Simulation Conclusion Typical Problem Assess the relationship
More informationSemiparametric Regression
Semiparametric Regression Patrick Breheny October 22 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/23 Introduction Over the past few weeks, we ve introduced a variety of regression models under
More informationINFORMATION APPROACH FOR CHANGE POINT DETECTION OF WEIBULL MODELS WITH APPLICATIONS. Tao Jiang. A Thesis
INFORMATION APPROACH FOR CHANGE POINT DETECTION OF WEIBULL MODELS WITH APPLICATIONS Tao Jiang A Thesis Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the
More informationFor more information about how to cite these materials visit
Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/
More informationExperimental designs for multiple responses with different models
Graduate Theses and Dissertations Graduate College 2015 Experimental designs for multiple responses with different models Wilmina Mary Marget Iowa State University Follow this and additional works at:
More informationABSTRACT. HOLM, KATHLEEN JENNIFER. Comparison of Optimal Design Methods in Inverse Problems. (Under the direction of H.T. Banks.)
ABSTRACT HOLM, KATHLEEN JENNIFER. Comparison of Optimal Design Methods in Inverse Problems. (Under the direction of H.T. Banks.) This project was initiated with an investigation by Banks et al [6] into
More informationCausal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions
Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Joe Schafer Office of the Associate Director for Research and Methodology U.S. Census
More informationComputational statistics
Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated
More informationOptimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai
Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective Anastasios (Butch) Tsiatis and Xiaofei Bai Department of Statistics North Carolina State University 1/35 Optimal Treatment
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More informationLongitudinal + Reliability = Joint Modeling
Longitudinal + Reliability = Joint Modeling Carles Serrat Institute of Statistics and Mathematics Applied to Building CYTED-HAROSA International Workshop November 21-22, 2013 Barcelona Mainly from Rizopoulos,
More informationStatistical Methods for Alzheimer s Disease Studies
Statistical Methods for Alzheimer s Disease Studies Rebecca A. Betensky, Ph.D. Department of Biostatistics, Harvard T.H. Chan School of Public Health July 19, 2016 1/37 OUTLINE 1 Statistical collaborations
More informationEstimation for state space models: quasi-likelihood and asymptotic quasi-likelihood approaches
University of Wollongong Research Online University of Wollongong Thesis Collection 1954-2016 University of Wollongong Thesis Collections 2008 Estimation for state space models: quasi-likelihood and asymptotic
More informationLifetime prediction and confidence bounds in accelerated degradation testing for lognormal response distributions with an Arrhenius rate relationship
Scholars' Mine Doctoral Dissertations Student Research & Creative Works Spring 01 Lifetime prediction and confidence bounds in accelerated degradation testing for lognormal response distributions with
More informationMultivariate Survival Analysis
Multivariate Survival Analysis Previously we have assumed that either (X i, δ i ) or (X i, δ i, Z i ), i = 1,..., n, are i.i.d.. This may not always be the case. Multivariate survival data can arise in
More informationWeb-based Supplementary Material for A Two-Part Joint. Model for the Analysis of Survival and Longitudinal Binary. Data with excess Zeros
Web-based Supplementary Material for A Two-Part Joint Model for the Analysis of Survival and Longitudinal Binary Data with excess Zeros Dimitris Rizopoulos, 1 Geert Verbeke, 1 Emmanuel Lesaffre 1 and Yves
More informationFULL LIKELIHOOD INFERENCES IN THE COX MODEL
October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach
More informationSEMIPARAMETRIC APPROACHES TO INFERENCE IN JOINT MODELS FOR LONGITUDINAL AND TIME-TO-EVENT DATA
SEMIPARAMETRIC APPROACHES TO INFERENCE IN JOINT MODELS FOR LONGITUDINAL AND TIME-TO-EVENT DATA Marie Davidian and Anastasios A. Tsiatis http://www.stat.ncsu.edu/ davidian/ http://www.stat.ncsu.edu/ tsiatis/
More informationRonald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California
Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationBiost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation
Biost 58 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 5: Review Purpose of Statistics Statistics is about science (Science in the broadest
More informationLikelihood-based inference with missing data under missing-at-random
Likelihood-based inference with missing data under missing-at-random Jae-kwang Kim Joint work with Shu Yang Department of Statistics, Iowa State University May 4, 014 Outline 1. Introduction. Parametric
More informationEstimation of Optimal Treatment Regimes Via Machine Learning. Marie Davidian
Estimation of Optimal Treatment Regimes Via Machine Learning Marie Davidian Department of Statistics North Carolina State University Triangle Machine Learning Day April 3, 2018 1/28 Optimal DTRs Via ML
More informationWEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract
Journal of Data Science,17(1). P. 145-160,2019 DOI:10.6339/JDS.201901_17(1).0007 WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION Wei Xiong *, Maozai Tian 2 1 School of Statistics, University of
More informationPQL Estimation Biases in Generalized Linear Mixed Models
PQL Estimation Biases in Generalized Linear Mixed Models Woncheol Jang Johan Lim March 18, 2006 Abstract The penalized quasi-likelihood (PQL) approach is the most common estimation procedure for the generalized
More informationUNIVERSITY OF CALIFORNIA, SAN DIEGO
UNIVERSITY OF CALIFORNIA, SAN DIEGO Estimation of the primary hazard ratio in the presence of a secondary covariate with non-proportional hazards An undergraduate honors thesis submitted to the Department
More informationOther Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model
Other Survival Models (1) Non-PH models We briefly discussed the non-proportional hazards (non-ph) model λ(t Z) = λ 0 (t) exp{β(t) Z}, where β(t) can be estimated by: piecewise constants (recall how);
More informationPropensity Score Weighting with Multilevel Data
Propensity Score Weighting with Multilevel Data Fan Li Department of Statistical Science Duke University October 25, 2012 Joint work with Alan Zaslavsky and Mary Beth Landrum Introduction In comparative
More informationApproximate Median Regression via the Box-Cox Transformation
Approximate Median Regression via the Box-Cox Transformation Garrett M. Fitzmaurice,StuartR.Lipsitz, and Michael Parzen Median regression is used increasingly in many different areas of applications. The
More informationScore test for random changepoint in a mixed model
Score test for random changepoint in a mixed model Corentin Segalas and Hélène Jacqmin-Gadda INSERM U1219, Biostatistics team, Bordeaux GDR Statistiques et Santé October 6, 2017 Biostatistics 1 / 27 Introduction
More informationMISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30
MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD Copyright c 2012 (Iowa State University) Statistics 511 1 / 30 INFORMATION CRITERIA Akaike s Information criterion is given by AIC = 2l(ˆθ) + 2k, where l(ˆθ)
More informationCausal Inference with Big Data Sets
Causal Inference with Big Data Sets Marcelo Coca Perraillon University of Colorado AMC November 2016 1 / 1 Outlone Outline Big data Causal inference in economics and statistics Regression discontinuity
More informationESTIMATING STATISTICAL CHARACTERISTICS UNDER INTERVAL UNCERTAINTY AND CONSTRAINTS: MEAN, VARIANCE, COVARIANCE, AND CORRELATION ALI JALAL-KAMALI
ESTIMATING STATISTICAL CHARACTERISTICS UNDER INTERVAL UNCERTAINTY AND CONSTRAINTS: MEAN, VARIANCE, COVARIANCE, AND CORRELATION ALI JALAL-KAMALI Department of Computer Science APPROVED: Vladik Kreinovich,
More informationOpen Problems in Mixed Models
xxiii Determining how to deal with a not positive definite covariance matrix of random effects, D during maximum likelihood estimation algorithms. Several strategies are discussed in Section 2.15. For
More informationStatistics: Learning models from data
DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial
More informationComparing Group Means When Nonresponse Rates Differ
UNF Digital Commons UNF Theses and Dissertations Student Scholarship 2015 Comparing Group Means When Nonresponse Rates Differ Gabriela M. Stegmann University of North Florida Suggested Citation Stegmann,
More information401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.
401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis
More informationContents. Part I: Fundamentals of Bayesian Inference 1
Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian
More informationEPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7
Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review
More informationSparse Linear Models (10/7/13)
STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine
More informationBayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang
Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January
More informationLasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices
Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,
More informationStatistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation
Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence
More informationModification and Improvement of Empirical Likelihood for Missing Response Problem
UW Biostatistics Working Paper Series 12-30-2010 Modification and Improvement of Empirical Likelihood for Missing Response Problem Kwun Chuen Gary Chan University of Washington - Seattle Campus, kcgchan@u.washington.edu
More informationEstimating the Mean Response of Treatment Duration Regimes in an Observational Study. Anastasios A. Tsiatis.
Estimating the Mean Response of Treatment Duration Regimes in an Observational Study Anastasios A. Tsiatis http://www.stat.ncsu.edu/ tsiatis/ Introduction to Dynamic Treatment Regimes 1 Outline Description
More informationSemi-Nonparametric Inferences for Massive Data
Semi-Nonparametric Inferences for Massive Data Guang Cheng 1 Department of Statistics Purdue University Statistics Seminar at NCSU October, 2015 1 Acknowledge NSF, Simons Foundation and ONR. A Joint Work
More informationEmpirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design
1 / 32 Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design Changbao Wu Department of Statistics and Actuarial Science University of Waterloo (Joint work with Min Chen and Mary
More informationGeneralized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.
Texts in Statistical Science Generalized Linear Mixed Models Modern Concepts, Methods and Applications Walter W. Stroup CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint
More informationMinimum Hellinger Distance Estimation in a. Semiparametric Mixture Model
Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.
More informationOne-stage dose-response meta-analysis
One-stage dose-response meta-analysis Nicola Orsini, Alessio Crippa Biostatistics Team Department of Public Health Sciences Karolinska Institutet http://ki.se/en/phs/biostatistics-team 2017 Nordic and
More informationPower and Sample Size Calculations with the Additive Hazards Model
Journal of Data Science 10(2012), 143-155 Power and Sample Size Calculations with the Additive Hazards Model Ling Chen, Chengjie Xiong, J. Philip Miller and Feng Gao Washington University School of Medicine
More informationEstimation and Model Selection in Mixed Effects Models Part I. Adeline Samson 1
Estimation and Model Selection in Mixed Effects Models Part I Adeline Samson 1 1 University Paris Descartes Summer school 2009 - Lipari, Italy These slides are based on Marc Lavielle s slides Outline 1
More informationLikelihood and Fairness in Multidimensional Item Response Theory
Likelihood and Fairness in Multidimensional Item Response Theory or What I Thought About On My Holidays Giles Hooker and Matthew Finkelman Cornell University, February 27, 2008 Item Response Theory Educational
More informationT E C H N I C A L R E P O R T A SANDWICH-ESTIMATOR TEST FOR MISSPECIFICATION IN MIXED-EFFECTS MODELS. LITIERE S., ALONSO A., and G.
T E C H N I C A L R E P O R T 0658 A SANDWICH-ESTIMATOR TEST FOR MISSPECIFICATION IN MIXED-EFFECTS MODELS LITIERE S., ALONSO A., and G. MOLENBERGHS * I A P S T A T I S T I C S N E T W O R K INTERUNIVERSITY
More informationVariable selection and machine learning methods in causal inference
Variable selection and machine learning methods in causal inference Debashis Ghosh Department of Biostatistics and Informatics Colorado School of Public Health Joint work with Yeying Zhu, University of
More informationEstimation in Generalized Linear Models with Heterogeneous Random Effects. Woncheol Jang Johan Lim. May 19, 2004
Estimation in Generalized Linear Models with Heterogeneous Random Effects Woncheol Jang Johan Lim May 19, 2004 Abstract The penalized quasi-likelihood (PQL) approach is the most common estimation procedure
More informationA Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness
A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model
More informationPractical considerations for survival models
Including historical data in the analysis of clinical trials using the modified power prior Practical considerations for survival models David Dejardin 1 2, Joost van Rosmalen 3 and Emmanuel Lesaffre 1
More informationClassification. Chapter Introduction. 6.2 The Bayes classifier
Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode
More informationPoint and Interval Estimation for Gaussian Distribution, Based on Progressively Type-II Censored Samples
90 IEEE TRANSACTIONS ON RELIABILITY, VOL. 52, NO. 1, MARCH 2003 Point and Interval Estimation for Gaussian Distribution, Based on Progressively Type-II Censored Samples N. Balakrishnan, N. Kannan, C. T.
More informationSupporting Information for Estimating restricted mean. treatment effects with stacked survival models
Supporting Information for Estimating restricted mean treatment effects with stacked survival models Andrew Wey, David Vock, John Connett, and Kyle Rudser Section 1 presents several extensions to the simulation
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview
Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations
More informationInstitute of Actuaries of India
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2018 Examinations Subject CT3 Probability and Mathematical Statistics Core Technical Syllabus 1 June 2017 Aim The
More informationA Note on Bayesian Inference After Multiple Imputation
A Note on Bayesian Inference After Multiple Imputation Xiang Zhou and Jerome P. Reiter Abstract This article is aimed at practitioners who plan to use Bayesian inference on multiplyimputed datasets in
More informationLikelihood-Based Methods
Likelihood-Based Methods Handbook of Spatial Statistics, Chapter 4 Susheela Singh September 22, 2016 OVERVIEW INTRODUCTION MAXIMUM LIKELIHOOD ESTIMATION (ML) RESTRICTED MAXIMUM LIKELIHOOD ESTIMATION (REML)
More informationInvestigation into the use of confidence indicators with calibration
WORKSHOP ON FRONTIERS IN BENCHMARKING TECHNIQUES AND THEIR APPLICATION TO OFFICIAL STATISTICS 7 8 APRIL 2005 Investigation into the use of confidence indicators with calibration Gerard Keogh and Dave Jennings
More informationApproximating Bayesian Posterior Means Using Multivariate Gaussian Quadrature
Approximating Bayesian Posterior Means Using Multivariate Gaussian Quadrature John A.L. Cranfield Paul V. Preckel Songquan Liu Presented at Western Agricultural Economics Association 1997 Annual Meeting
More informationInternational Journal of PharmTech Research CODEN (USA): IJPRIF, ISSN: , ISSN(Online): Vol.9, No.9, pp , 2016
International Journal of PharmTech Research CODEN (USA): IJPRIF, ISSN: 0974-4304, ISSN(Online): 2455-9563 Vol.9, No.9, pp 477-487, 2016 Modeling of Multi-Response Longitudinal DataUsing Linear Mixed Modelin
More informationGroup Sequential Designs: Theory, Computation and Optimisation
Group Sequential Designs: Theory, Computation and Optimisation Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj 8th International Conference
More informationLecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression:
Biost 518 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture utline Choice of Model Alternative Models Effect of data driven selection of
More informationQuantile Regression for Residual Life and Empirical Likelihood
Quantile Regression for Residual Life and Empirical Likelihood Mai Zhou email: mai@ms.uky.edu Department of Statistics, University of Kentucky, Lexington, KY 40506-0027, USA Jong-Hyeon Jeong email: jeong@nsabp.pitt.edu
More informationAnalysis of Gamma and Weibull Lifetime Data under a General Censoring Scheme and in the presence of Covariates
Communications in Statistics - Theory and Methods ISSN: 0361-0926 (Print) 1532-415X (Online) Journal homepage: http://www.tandfonline.com/loi/lsta20 Analysis of Gamma and Weibull Lifetime Data under a
More informationMultilevel Statistical Models: 3 rd edition, 2003 Contents
Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction
More informationFitting Multidimensional Latent Variable Models using an Efficient Laplace Approximation
Fitting Multidimensional Latent Variable Models using an Efficient Laplace Approximation Dimitris Rizopoulos Department of Biostatistics, Erasmus University Medical Center, the Netherlands d.rizopoulos@erasmusmc.nl
More informationINVERTED KUMARASWAMY DISTRIBUTION: PROPERTIES AND ESTIMATION
Pak. J. Statist. 2017 Vol. 33(1), 37-61 INVERTED KUMARASWAMY DISTRIBUTION: PROPERTIES AND ESTIMATION A. M. Abd AL-Fattah, A.A. EL-Helbawy G.R. AL-Dayian Statistics Department, Faculty of Commerce, AL-Azhar
More informationDeductive Derivation and Computerization of Semiparametric Efficient Estimation
Deductive Derivation and Computerization of Semiparametric Efficient Estimation Constantine Frangakis, Tianchen Qian, Zhenke Wu, and Ivan Diaz Department of Biostatistics Johns Hopkins Bloomberg School
More informationDefault Priors and Effcient Posterior Computation in Bayesian
Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature
More informationModel comparison and selection
BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)
More informationSTATISTICAL INFERENCE IN ACCELERATED LIFE TESTING WITH GEOMETRIC PROCESS MODEL. A Thesis. Presented to the. Faculty of. San Diego State University
STATISTICAL INFERENCE IN ACCELERATED LIFE TESTING WITH GEOMETRIC PROCESS MODEL A Thesis Presented to the Faculty of San Diego State University In Partial Fulfillment of the Requirements for the Degree
More informationMedian Cross-Validation
Median Cross-Validation Chi-Wai Yu 1, and Bertrand Clarke 2 1 Department of Mathematics Hong Kong University of Science and Technology 2 Department of Medicine University of Miami IISA 2011 Outline Motivational
More informationProportional hazards regression
Proportional hazards regression Patrick Breheny October 8 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/28 Introduction The model Solving for the MLE Inference Today we will begin discussing regression
More informationIntroduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016
Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An
More informationMeasuring Social Influence Without Bias
Measuring Social Influence Without Bias Annie Franco Bobbie NJ Macdonald December 9, 2015 The Problem CS224W: Final Paper How well can statistical models disentangle the effects of social influence from
More informationReview Article Analysis of Longitudinal and Survival Data: Joint Modeling, Inference Methods, and Issues
Journal of Probability and Statistics Volume 2012, Article ID 640153, 17 pages doi:10.1155/2012/640153 Review Article Analysis of Longitudinal and Survival Data: Joint Modeling, Inference Methods, and
More informationJoint Modeling of Longitudinal Item Response Data and Survival
Joint Modeling of Longitudinal Item Response Data and Survival Jean-Paul Fox University of Twente Department of Research Methodology, Measurement and Data Analysis Faculty of Behavioural Sciences Enschede,
More informationParadoxical Results in Multidimensional Item Response Theory
UNC, December 6, 2010 Paradoxical Results in Multidimensional Item Response Theory Giles Hooker and Matthew Finkelman UNC, December 6, 2010 1 / 49 Item Response Theory Educational Testing Traditional model
More informationPrevious lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing.
Previous lecture P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing. Interaction Outline: Definition of interaction Additive versus multiplicative
More informationBayesian estimation of the discrepancy with misspecified parametric models
Bayesian estimation of the discrepancy with misspecified parametric models Pierpaolo De Blasi University of Torino & Collegio Carlo Alberto Bayesian Nonparametrics workshop ICERM, 17-21 September 2012
More informationType I and type II error under random-effects misspecification in generalized linear mixed models Link Peer-reviewed author version
Type I and type II error under random-effects misspecification in generalized linear mixed models Link Peer-reviewed author version Made available by Hasselt University Library in Document Server@UHasselt
More information