are censored. Because artificial censoring reduces the information available and leads to nonsmooth estimating functions, prior research has noted

Size: px
Start display at page:

Download "are censored. Because artificial censoring reduces the information available and leads to nonsmooth estimating functions, prior research has noted"

Transcription

1 ABSTRACT VOCK, DAVID MICHAEL. Advanced Statistical Methods for Complex Longitudinal Data. (Under the direction of Marie Davidian and Anastasios A. Tsiatis.) Longitudinally collected data play an important role in many biomedical applications. As researchers have greater capacity to collect and store large amounts of data over time, biomedical research has shifted from trying to understand the effects of therapy at a single time point, to a more nuanced understanding of how therapy, which could change over time, longitudinally affects a patient. However, many standard methods of analyzing longitudinal data are not adequate to address questions of interest. The first half of the thesis addresses methodological challenges when the longitudinally collected response may be censored. Mixed models are commonly used to represent longitudinal or repeated measures data. An additional complication arises when the response is censored, for example, due to limits of quantification of the assay used. While Gaussian random effects are routinely assumed, little work has characterized the consequences of misspecifying the random effects distribution nor has a more flexible distribution been studied for censored longitudinal data. We show that, in general, maximum likelihood estimators will not be consistent when the random effects density is misspecified, and the effect of misspecification is likely to be greatest when the true random effects density deviates substantially from normality and the number of non-censored observations on each subject is small. We develop a mixed model framework for censored longitudinal data in which the random effects are represented by the flexible seminonparametric (SNP) density and show how to obtain estimates in SAS procedure NLMIXED. Simulations show that this approach can lead to reduction in bias and increase in efficiency relative to assuming Gaussian random effects. The methods are demonstrated on data from a study of hepatitis C virus. In the second portion, we develop methods to assess the effect of organ transplantation, a time-varying treatment, on survival. Because the number of patients waiting for organ transplants exceeds the number of organs available, a better understanding of how transplantation affects the distribution of residual lifetime is needed to improve organ allocation. However, there has been little work to assess the survival benefit of transplantation from a causal perspective. Previous methods developed to estimate the causal effects of treatment in the presence of timevarying confounders have assumed that treatment assignment was independent across patients, which is not true for organ transplantation. We develop a version of G-estimation that accounts for the fact that treatment assignment is not independent across individuals to estimate the parameters of a structural nested failure time model. In addition, G-estimation for failure time models requires the use of artificial censoring, a technique where some subjects observed to fail

2 are censored. Because artificial censoring reduces the information available and leads to nonsmooth estimating functions, prior research has noted that finding the solutions to estimating equations can be difficult. We suggest some computational strategies to mitigate the problems typically encountered with artificial censoring. We derive the asymptotic properties of our estimator and confirm through simulation studies that our method leads to valid inference of the effect of transplantation on the distribution of residual lifetime. We demonstrate our method on the survival benefit of lung transplantation using data from the United Network for Organ Sharing (UNOS).

3 Copyright 2012 by David Michael Vock All Rights Reserved

4 Advanced Statistical Methods for Complex Longitudinal Data by David Michael Vock A dissertation submitted to the Graduate Faculty of North Carolina State University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy Statistics Raleigh, North Carolina 2012 APPROVED BY: Dennis D. Boos Daowen Zhang Marie Davidian Co-chair of Advisory Committee Anastasios A. Tsiatis Co-chair of Advisory Committee

5 DEDICATION To my parents, Janice and Dave; my sister, Emily; and my best friend, Laura. ii

6 BIOGRAPHY Originally from Wheaton, Illinois, David M. Vock is the son of David J. and Janice Vock. David graduated from Wheaton North High School in 2003 and was captain of the two-time Illinois state champion scholastic bowl team for Wheaton North. After high school, David attended St. Olaf College in Northfield, Minnesota where he majored in mathematics and chemistry with a concentration in statistics. While at St. Olaf College, he had the opportunity to study biostatistics and collaborate with World Health Organization researchers in Geneva, Switzerland. David was elected to Phi Beta Kappa and graduated summa cum laude with a Bachelor of Arts degree and distinction in statistics from St. Olaf in Interested in many different academic disciplines, David decided to pursue graduate studies in statistics because, in the words of John Tukey, he could play in everyone s backyard. He began his graduate studies in 2007 at North Carolina State University in Raleigh, North Carolina. David received a Master of Statistics degree from Noth Carolina State University in He actively collaborates with medical researchers at Duke Clinical Research Institute. David s collaborative work has largely focused on hepatitis C and lung transplantation research and has motivated much of his methodological research in longitudinal data analysis, causal inference, and dynamic treatment regimes. He intends to complete his Ph.D. under the direction of Drs. Marie Davidian and Anastasios A. Tsiatis. David has accepted a position as an assistant professor in the Division of Biostatistics at the University of Minnesota School of Public Health. iii

7 ACKNOWLEDGEMENTS Many people have helped and guided me through my graduate studies and dissertation research. I would like to express my gratitude to a few people below while acknowledging that many more people deserve praise. I would like especially to thank my advisors, Drs. Marie Davidian and Butch Tsiatis. They have showed incredible patience in allowing me to explore various research lines and I am very thankful for their latitude. Their expertise and gentle guidance have been immensely helpful in completing this work. I also would like to thank them for offering me a research assistantship to support me during most of my time at North Carolina State. Mostly, I would like to express my gratitude to them for being great faculty role models and mentors. They have guided me through the peer-review process, made sure that I would gain teaching experience, and given me valuable advice on searching for academic jobs. Butch and Marie have shown me how to contribute to interdisciplinary research and how to use collaborative research to motivate methodological work. I wish to thank all the faculty and staff at North Carolina State that have helped me along the way. I want to thank especially the Directors of the Statistics Graduate Program during my first years at North Carolina State, Drs. Pam Arroway and Jacqueline Hughes-Oliver, for being patient in answering all my questions and taking the time to help me find interesting funding opportunities. I am also appreciative of all the advice Dr. Eric Laber has given me about how to interview for faculty jobs and how to be a successful junior faculty member. Finally, I would like to thank Drs. Dennis Boos and Daowen Zhang for serving on my committee, writing letters of recommendation, and being wonderful teachers of some of the most interesting and useful courses I have taken in graduate school, especially Advanced Inference and Mixed Models and Variance Components. I am also thankful for the opportunity to work with Karen Pieper at Duke Clinical Research Institute. She has been an invaluable resource on how to work with clinical investigators and artfully explain advanced methodology to a broad audience. I have learned a lot from her ability to prioritize the most urgent needs of a statistical analysis and to cajole subject area experts to buy-in to rigorous statistical analysis. In addition, I would like to thank all the clinical investigators that I have worked with at Duke. Besides teaching me a lot about their own area of research, they have been patient and nurturing. I am especially grateful for the great education that I received at St. Olaf College that well prepared me for graduate school. I want to thank my mentors and research advisors, Drs. Rebecca Judge, Julie Legler, and Paul Roback. They have been extremely supportive and provided a lot of encouragement that I could be successful in graduate school. iv

8 Finally, I want to thank all my family and friends for their love and support. Simply put, without them I would not have succeeded. My parents have given so much to see that I have been successful and have always been loving and caring. Both my parents and my sister, Emily, have taught me to laugh, stay humble, and never to take myself too seriously. Finally, I would like to thank my girlfriend Laura for her patience, support, and good humor while I wrote this dissertation. v

9 TABLE OF CONTENTS List of Tables viii List of Figures xii Chapter 1 Mixed Model Analysis of Censored Longitudinal Data with Flexible Random Effects Density Introduction Gaussian Linear Mixed Model with Censored Response Semiparametric Linear Mixed Model with Censored Response Estimation Using SAS NLMIXED Simulation Application Discussion Chapter 2 Assessing the Effect of Organ Transplantation on the Distribution of Residual Lifetime Introduction Developing a Statistical Framework Potential Outcome Framework Failure Time Model Observed Data Organ Allocation Model Development of Class of Estimators Proposed Class of Estimators Asymptotic Properties of the Class of Estimators Reasonable Estimators in the Class Nuisance Parameters Censored Survival Times Artificial Censoring Mitigating Problems of Artificial Censoring Simulation Study UNOS Dataset Description of Cohort Model Specification Results Diagnostics and Non-parametric Estimate of Survival Conclusion References Appendices Appendix A Asymptotic Bias of Parameter Estimators for Censored Longitudinal Mixed Models vi

10 Appendix B SAS Code for Censored Longitudinal Models B.1 Obtain Empirical Bayes Estimates B.2 Obtain Grid of Starting Values B.3 SNP Estimation Appendix C Additional Simulations for Mixed Model Analysis of Censored Longitudinal Data Appendix D Joint Distribution of Potential Outcomes and Observed Data Appendix E Consistency and Asymptotic Normality of the Class of Estimators E.1 Proof of Consistency E.2 Proof of Asymptotic Normality Appendix F Improving the Efficiency of Class of Estimators Appendix G Doubly Robust Estimators Appendix H Estimating Nuisance Parameters H.1 Proof: Estimation of Nuisance Parameters Does Not Affect Asymptotic Variance H.2 Suggestions on Estimating ξ Appendix I Organ Transplantation Simulation Details vii

11 LIST OF TABLES Table 1.1 Table 1.2 Table 1.3 Table 1.4 Table 1.5 Table 1.6 Table 1.7 Table 1.8 Proportion of subjects by number of non-censored observation for the four random effects densities considered in the simulations with q = 2 described in Section Simulation results when Gaussian random effects were assumed for all models regardless of the true distribution of the random effects. The simulation included 500 data sets with 500 subjects each. MC Avg: Monte Carlo average of the parameter estimates; MC SD: Monte Carlo standard deviation of the parameter estimates; Avg SE: Average of the standard error estimates; CP: Monte Carlo coverage probability of the 95 percent Wald-type confidence intervals Proportion of subjects by number of non-censored observation when the true random effects density was bimodal and the time points where data were collected was varied. t A = (0, 1, 2, 3, 4) T ; t B = (0, 1, 2, 3, 3.5, 4, 4.5) T ; t C = ( 2, 1, 0, 1, 2, 3, 4) T Simulation results when Gaussian random effects were assumed for the bimodal random effects but t i in (1.2) was varied. The simulation included 500 data sets with 500 subjects each. t A = (0, 1, 2, 3, 4) T ; t B = (0, 1, 2, 3, 3.5, 4, 4.5) T ; t C = ( 2, 1, 0, 1, 2, 3, 4) T ; MC Avg: Monte Carlo average of the parameter estimates; MC SD: Monte Carlo standard deviation of the parameter estimates; Avg SE: Average of the standard error estimates; CP: Monte Carlo coverage probability of the 95 percent Waldtype confidence intervals Proportion of time K was selected using AIC, HQIC, and BIC when the SNP density was used to model the random effects density. The simulation included 500 data sets with 500 subjects each Simulation results when the SNP density was used to model the random effects density. Models with K = 0, K = 1, and K = 2 were fit and K was selected using HQIC. The simulation included 500 data sets with 500 subjects each. MC Avg: Monte Carlo average of the parameter estimates; MC SD: Monte Carlo standard deviation of the parameter estimates; Avg SE: Average of the standard error estimates; Ratio MSE: Ratio of the Monte Carlo mean square error between K = 0 and K selected by the HQIC Proportion of responses censored at each time point for the 811 patients with CT genotype in the IDEAL trial Proportion of subjects by number of uncensored viral load measurements for the 811 patients with the CT genotype in the IDEAL trial Table 1.9 Information criteria and parameter estimates from fitting model (1.10) concerning the IDEAL study with K = 0, K = 1, and K = Table 1.10 Parameter estimates, standard errors, and a 95 percent Wald-type confidence limits from fitting model (1.10) assuming the random effects followed the SNP density with K = viii

12 Table 2.1 Table 2.2 Summary of the parameter estimates for θ 0 by estimator from 1,000 Mote Carlo simulations when the number of organs allocated was 1,250. Converge: Proportion of Monte Carlo datasets the estimator converged; MC mean: Monte Carlo average of the parameter estimates; MC SD: Monte Carlo standard deviation of the parameter estimates; Avg SE: Average of the standard error estimates; Ratio Avg SE:SD: Ratio of the average of the standard error estimates to the standard deviation of the parameter estimates; Ratio MSE: Ratio of the average squared error of the fac estimates to the average squared error of the estimates of the given estimator; CP: Monte Carlo coverage probability of 95 percent Wald-type confidence intervals. Approximate standard errors for the values in the column MC Mean ranged from to , the column MC SD ranged from to , the column Avg SE ranged from to , the column Ratio Avg SE:SD ranged from to 0.045, the column Ratio MSE ranged from to 0.27, the column CP ranged from to The true value of θ 0 is Summary of the parameter estimates for θ 1 by estimator from 1,000 Mote Carlo simulations when the number of organs allocated was 1,250. Converge: Proportion of Monte Carlo datasets the estimator converged; MC mean: Monte Carlo average of the parameter estimates; MC SD: Monte Carlo standard deviation of the parameter estimates; Avg SE: Average of the standard error estimates; Ratio Avg SE:SD: Ratio of the average of the standard error estimates to the standard deviation of the parameter estimates; Ratio MSE: Ratio of the average squared error of the fac estimates to the average squared error of the estimates of the given estimator; CP: Monte Carlo coverage probability of 95 percent Wald-type confidence intervals. Approximate standard errors for the values in the column MC Mean ranged from to , the column MC SD ranged from to , the column Avg SE ranged from to , the column Ratio Avg SE:SD ranged from to 0.045, the column Ratio MSE ranged from to 0.11, the column CP ranged from to The true value of θ 1 is ix

13 Table 2.3 Table 2.4 Table 2.5 Summary of the parameter estimates for θ 0 by estimator from 1,000 Mote Carlo simulations when the number of organs allocated was 2,500. Converge: Proportion of Monte Carlo datasets the estimator converged; MC mean: Monte Carlo average of the parameter estimates; MC SD: Monte Carlo standard deviation of the parameter estimates; Avg SE: Average of the standard error estimates; Ratio Avg SE:SD: Ratio of the average of the standard error estimates to the standard deviation of the parameter estimates; Ratio MSE: Ratio of the average squared error of the fac estimates to the average squared error of the estimates of the given estimator; CP: Monte Carlo coverage probability of 95 percent Wald-type confidence intervals. Approximate standard errors for the values in the column MC Mean ranged from to , the column MC SD ranged from to , the column Avg SE ranged from to , the column Ratio Avg SE:SD ranged from to 0.045, the column Ratio MSE ranged from to 0.13, the column CP ranged from to The true value of θ 0 is Summary of the parameter estimates for θ 1 by estimator from 1,000 Mote Carlo simulations when the number of organs allocated was 2,500. Converge: Proportion of Monte Carlo datasets the estimator converged; MC mean: Monte Carlo average of the parameter estimates; MC SD: Monte Carlo standard deviation of the parameter estimates; Avg SE: Average of the standard error estimates; Ratio Avg SE:SD: Ratio of the average of the standard error estimates to the standard deviation of the parameter estimates; Ratio MSE: Ratio of the average squared error of the fac estimates to the average squared error of the estimates of the given estimator; CP: Monte Carlo coverage probability of 95 percent Wald-type confidence intervals. Approximate standard errors for the values in the column MC Mean ranged from to , the column MC SD ranged from to , the column Avg SE ranged from to , the column Ratio Avg SE:SD ranged from to 0.045, the column Ratio MSE ranged from to 0.090, the column CP ranged from to The true value of θ 1 is Parameter estimates for the model of the propensity to receive an available organ. Group A: primarily COPD; Group B: primarily pulmonary hypertension; Group C: primarily CF; Group D: primarily IPF; Single LTx: single lung transplantation. The reference for Private Insurance and Medicare Insurance is Medicaid or other public funding for payment. The reference for each UNOS Region is Region 11. To allow for a non-linear relationship between LAS and the log propensity to receive treatment, we considered a restricted cubic spline transformation of LAS knot points at 32, 36, 43, and 85. LAS and LAS refer to the non-linear terms of this transformation x

14 Table 2.6 Table B.1 Parameter estimates of the scale accelerated failure time model for the effect of lung transplantation. Group A: primarily COPD; Group B: primarily pulmonary hypertension; Group C: primarily CF; Group D: primarily IPF; Single LTx: single lung transplantation. The reference for Private Insurance and Medicare Insurance is Medicaid or other public funding for payment. LAS has been centered at 30, so the interpretation of the intercept term is the log acceleration factor at a LAS of Conversion between the notation used in Section 1 and the names used in the example code Table C.1 Proportion of subjects by the number of non-censored observations for the four random effect densities considered in the simulations with q = Table C.2 Simulation results when Gaussian random effects were assumed for all models regardless of the true distribution of the random effects. The simulation included 500 data sets with 500 subjects each. MC Avg: Monte Carlo average of the parameter estimates; MC SD: Monte Carlo standard deviation of the parameter estimates; Avg SE: Average of the standard error estimates; CP: Monte Carlo coverage probability of the 95 percent Wald-type confidence intervals Table C.3 Proportion of time K was selected using AIC, HQIC, and BIC for q = 1 simulations when SNP was used to estimate the random effects Table C.4 Simulation results when SNP was used to estimate the random effects for q = 1 simulations. Models with K = 0, K = 1, and K = 2 were fit and K was selected using HQIC. The simulation included 500 data sets with 500 subjects each. MC Avg: Monte Carlo average of the parameter estimates; MC SD: Monte Carlo standard deviation of the parameter estimates; Avg SE: Average of the standard error estimates; Ratio MSE: Ratio of the Monte Carlo mean square error between K = 0 and K selected by the HQIC Table I.1 Quantile (years) of the simulated residual life had the patient never been transplanted by LAS at transplantation. The maximum residual follow-up time is 5.66 years in the simulation xi

15 LIST OF FIGURES Figure 1.1 Figure 1.2 Trajectory of log 10 viral load of 25 randomly selected subjects with genotype CT from the IDEAL study. The lower limit of quantification, log 10 IU/mL, is shown in the green dotted line. For graphical purposes, the lower limit of quantification was imputed for the censored responses.. 2 Contour plot of the true random effects densities from the skewed and bimodal scenarios considered in the simulation described in Section 1.5. In each case, the densities were a mixture of two normals Figure 1.3 Average estimated contour and marginal density plots from fitting 500 data sets with skewed random effects where the density was assumed to follow the SNP density with K selected by HQIC. The blue dotted lines on the marginal density plots are the true marginal density. The true bivariate random effects density is plotted in Figure 1.2 and described in Section Figure 1.4 Average estimated contour and marginal density plots from fitting 500 data sets with bimodal random effects where the density was assumed to follow the SNP density with K selected by HQIC. The blue dotted lines on the marginal density plots are the true marginal density. The true bivariate random effects density is plotted in Figure 1.2 and described in Section Figure 1.5 Contour plot of the bivariate density estimate and marginal density estimates for the subject-specific intercepts and slopes with K = 2 from model (1.10) concerning the IDEAL study Figure 2.1 Figure 2.2 Figure 2.3 Plot of ij (θ, F j) and Ṽ ij (θ, F j) as a function of R ij where Cij (θ, F j) = 500 and ɛ = 50. These are the smooth approximations to ij (θ, F j ) = I{ R ij (θ) < Cij (θ, F j)} and Ṽij(θ, F j ) = min{ R ij (θ), Cij (θ, F j)}, respectively Histogram of all LAS scores each time an organ was offered (left) and LAS scores at transplantation for those patients that were transplanted (right) Kaplan-Meier residual survival curves for patients from the time of listing on the waitlist and transplantation. We have stratified by LAS at listing and transplantation, respectively, for these curves. The waitlist survival was censored at the time the patient was transplanted xii

16 Figure 2.4 Figure 2.5 Figure C.1 Figure C.2 Figure I.1 Logarithm of the estimated scale acceleration factor in the transformation model for the causal effect of lung transplantation in Group A (primarily COPD, left) and Group D (primarily IPF, right) by LAS and transplantation type. The curves here are drawn for patients under 60 years of age, not in UNOS regions 6-8, and not paying for the transplant through private insurance or Medicare. We emphasize that negative (positive) values of the log scale factor suggest that transplantation increases (decreases) lifespan. The histogram in the background gives the number of patients transplanted for that range of LAS and native disease group Kaplan-Meier survival curves of the transformed residual lifetime for all (i, j) pairs by LAS group and native disease. The rows correspond to native disease groups A, C, and D from top to bottom. The solid blue line corresponds to all (i, j) pairs where dn ij = 1 and the red dotted line corresponds to all (i, j) pairs where dn ij = 0. If the transformation model has been correctly specified then the two lines should be approximately the same Estimated random effects density for each of the four random effect densities considered (clockwise from upper left: normal, t 5, bimodal, and skewed) when SNP was used to estimate the random effects and K was selected using HQIC. The estimated density is plotted for 100 randomly selected Monte Carlo data sets. The truth is superimposed in red, and the average of all 500 Monte Carlo data sets is superimposed in blue Histogram of the estimated variance of the random effect when SNP was used to estimate the random effects density and the true random effect density was t 5 (left). The estimated random effects densities for those data sets where the estimate of var(b 0i ) is greater than 4 are also plotted (right) Logarithm of rate parameter from the exponential distribution used to generate the residual lifetime had the patient never been transplanted xiii

17 Chapter 1 Mixed Model Analysis of Censored Longitudinal Data with Flexible Random Effects Density 1.1 Introduction Longitudinal or repeated measures data are commonly represented by mixed effects models. A complication occurs when the response is censored for some of the observations, which often arises when assay measures are collected over time and the assay procedure is subject to limits of quantification (Hughes, 1999; Jacqmin-Gadda et al., 2000; Wu, 2002; Gray et al., 2004; Wang and Fygenson, 2009, and references therein). As an example, we consider data from the Individual Dosing Efficacy versus Flat Dosing to Assess Optimal Pegylated Interferon Therapy (IDEAL) study, which tracked the viral load progression of treatment-naive patients with hepatitis C (Thompson et al., 2010). One objective was to characterize the viral load decline in patients receiving standard treatment, pegylated interferon-alpha and ribavirin, over the first twelve weeks of treatment. Within-subjects, the trajectory of the log 10 viral load over the first 12 weeks is approximately linear, but responses were censored from below at log 10 IU/mL, the lower limit of quantification of the assay used. Over the first 12 weeks, 10.5 percent of all observations were censored, and 35.2 percent were censored at week 12. The trajectories of 25 randomly selected subjects are shown in Figure 1.1. A standard analysis would impute the censored responses as the limit or half limit of quantification and then fit a mixed model for uncensored longitudinal data. However, Jacqmin-Gadda et al. (2000) showed through simulation studies that such crude parameter estimators are badly biased even for modest levels of censoring. A refined analysis would postulate a likelihood that 1

18 Time (weeks) log Viral Load (IU/mL) Figure 1.1: Trajectory of log 10 viral load of 25 randomly selected subjects with genotype CT from the IDEAL study. The lower limit of quantification, log 10 IU/mL, is shown in the green dotted line. For graphical purposes, the lower limit of quantification was imputed for the censored responses. 2

19 accounts for the censoring and obtain the maximum likelihood estimates (MLE). Substantial research has gone into efficient methods to solve for the MLE. Hughes (1999) developed a Monte Carlo Expectation-Maximization (MCEM) estimation procedure to obtain the MLE for linear mixed effects models with censored responses. Wu (2002) extended the work of Hughes by developing a EM algorithm for nonlinear mixed-effects models with censored responses that allowed for covariates to be measured with error. Wu (2004) further extended the EM algorithm to joint models of multiple longitudinal processes when the responses may be censored. Vaida and Liu (2009) derived a closed form expression for the expectation step rather than use Monte Carlo simulation, which greatly reduced the computational time. The estimation procedure developed by Vaida and Liu (2009) is available in the R package lmec. Other computational approaches have been proposed besides the EM algorithm. Fitzgerald et al. (2002) implemented a multiple imputation approach that avoids maximization of a censored likelihood. Jacqmin-Gadda et al. (2000) noted that the full likelihood is the product of the marginal density of the non-censored responses and the conditional distribution of the censored responses, evaluated at the limit of detection, given the non-censored responses. The authors used numerical integration to estimate the conditional distribution and then combined the Simplex algorithm and the Marquardt algorithm in the Fortran language to solve for the MLE. In contrast, both Thiébaut and Jacqmin-Gadda (2004) and Lyles et al. (2000) approximate the full likelihood of mixed effect model by numerically integrating over the random effects densities and then using standard optimizers to obtain parameter estimates. Previous methods cited above using this full-likelihood approach have all assumed Gaussian random effects and intra-subject error. While intra-subject error may be reasonably assumed to be normally distributed, the assumption of Gaussian random effects may be too restrictive in many applications. In the IDEAL study, some patients do not respond to treatment while others show marked declines in the viral load, so the distribution of subject-specific slopes is unlikely to be approximately Gaussian. For non-censored linear mixed effects models, the maximum likelihood estimators for the fixed effects and covariance components are consistent under broad regularity conditions even when the random effects distribution is misspecified (Verbeke and Lesaffre, 1997). For non-linear mixed effects models, which include mixed models with censored responses and generalized linear mixed models, the maximum likelihood estimators will not, in general, be consistent if the random effects distribution is misspecified. The effect of misspecification has been well-studied in generalized linear mixed models. Broadly speaking, researchers have noted that there is considerable bias and loss of efficiency in the maximum likelihood parameter estimators assuming Gaussian random effects when either the true random effects density deviates substantially from normality or the variance of the random effects is large and there is moderate misspecification of the random effects (Neuhaus et al., 1992; Hartford and Davidian, 2000; Heagerty and Kurland, 3

20 2001; Agresti et al., 2004; Litière et al., 2008). The potential effect of misspecification of the random effects distribution when the response is censored has not been examined. A variety of methods have been proposed to relax the Gaussian assumption for the random effects (Magder and Zeger, 1996; Verbeke and Lesaffre, 1997; Kleinman and Ibrahim, 1998; Aitkin, 1999; Zhang and Davidian, 2001; Chen et al., 2002; Lee and Thompson, 2008; Ho and Hu, 2008; Komarek and Lesaffre, 2008; Lin and Lee, 2008; Bandyopadhyay et al., 2010; Lachos et al., 2010); however, none of these methods has been implemented for censored longitudinal data. For linear mixed models, Zhang and Davidian (2001) assumed the random effects follow a smooth density that can be represented by the seminonparametric (SNP) formulation proposed by Gallant and Nychka (1987). They showed the log likelihood can be written in a closed form leading to straightforward estimation. We show how the SNP representation can be implemented for the random effects density in linear mixed models when the response is potentially censored. In Section 1.2, we formulate the censored-response mixed model with Gaussian random effects and show that incorrectly assuming the random effects are Gaussian leads to inconsistent estimators of model parameters. In addition, we suggest general scenarios where estimation is likely to be particularly poor. In Section 1.3, we discuss the censored-response linear mixed model, where the random effects are assumed to follow a smooth, continuous density and introduce the SNP representation. Section 1.4 describes how to fit the SNP model using SAS procedure NLMIXED, how to select the degree of flexility in the model, and how to choose starting values. Section 1.5 presents the results of several simulations. In Section 1.6, we illustrate the method by application to data from the IDEAL study. 1.2 Gaussian Linear Mixed Model with Censored Response For ease of presentation, we assume responses are potentially left-censored; developments are easily generalized to right- or interval-censored data. Let Y ij, i = 1,..., m and j = 1,..., n i be the j th response for subject i that would have been been observed had there been no censoring. Consider the usual linear mixed model for longitudinal data Y ij = v T ijδ + s T ijα i + ɛ ij = x T ijβ + s T ijb i + ɛ ij, (1.1) where v ij = (x T ij, st ij )T ; δ = (β T, γ T ); b i = γ + α i ; β and γ are the p- and q-dimensional fixed effects parameters associated with covariates x ij and s ij, respectively; α i are the q-dimensional, mean zero subject-specific random effect vectors associated with covariates s ij, independent across i; and ɛ ij iid N(0, σ 2 ) are the intra-subject error, which are independent of α i. We adopt the rightmost representation in (1.1) and note that the centered random effects b i can be written 4

21 as b i = µ + RZ i, where µ is a (q 1) vector, R is a (q q) lower triangular matrix, and Z i is a (q 1) vector of random effects. Typically, Z i are assumed to follow a standard q-dimensional normal distribution, which implies b i are normally distributed with mean µ = γ and covariance matrix var(α i ) = RR T. As an example, a special case of (1.1) is the linear random coefficient model with baseline covariate x i given by Y ij = x i β + b 0i + b 1i t ij + ɛ ij, (1.2) where b i = (b 0i, b 1i ) T and s ij = (1, t ij ) T. Due to left-censoring, we observe Q ij which takes the value Y ij for Y ij > l ij and takes the value l ij, the known lower limit of quantification for the j th response on subject i, otherwise. Letting r = vech(r) be the non-zero elements of R, the parameters of interest are θ = (β T, µ T, r T, σ 2 ) T, and the likelihood assuming Gaussian random effects is m L(θ; Q) = i=1 n i j=1 [ {Φ 1 (m ij )} I(Q ij=l ij ) { 1 σ φ 1(m ij ) } I(Q ij >l ij ) ] φ q (z i )dz i, (1.3) where m ij = {Q ij x T ij β st ij (µ + RZ i)}/σ, Q is the vector of all responses observed, and φ n ( ) and Φ n ( ) are the standard n-dimensional normal density and distribution. Alternatively, we can express (1.1) as Y i = V i δ + S i α i + ɛ i, where V i {n i (p + q)} and S i (n i q) are the matrices with rows vij T and st ij, respectively, and Y i = (Y i1,..., Y ini ) T and similarly for ɛ i, l i, and Q i. To simplify the following argument, we assume that the design matrix is fixed, V i = V, and n i = n for i = 1,..., m and also suppress the index i throughout. To express (1.3) using vector notation, let f Y (y; θ) and F Y (y; θ) be the multivariate normal density and distribution, respectively, with mean V δ and covariance matrix Σ = SRR T S T + σ 2 I n. Define f Y1 Y 2 (y 1 y 2 ; θ) and F Y1 Y 2 (y 1 y 2 ; θ) to be the conditional multivariate normal density and distribution, respectively, of the vector Y 1 given the vector Y 2. There are 2 n 2 patterns of censoring/non-censoring that could be observed for an individual with at least one censored and one non-censored observation. Index each of these 2 n 2 distinct censoring patterns by k. Let Y k,o (Y k,c, respectively) be the random vector containing elements of Y that would be observed (censored) under censoring pattern k. If the k th pattern were observed for a particular subject, then the likelihood contribution from that subject assuming normal random effects is given by F Yk,c Y k,o (l k,c Q k,o ; θ)f Yk,o (Q k,o ; θ), where Q k,o is the vector of non-censored responses and l k,c (l k,o, respectively) is the limit of quantification for the censored observations (non-censored observations) under censoring pat- 5

22 tern k. The likelihood contribution from one subject can now be written as L(θ, Q) = f Y (Q; θ) I(Q>l) F Y (l; θ) I(Q=l) {F Yk,c Y k,o (l k,c Q k,o ; θ)f Yk,o (Q k,o ; θ)} I(Q k,c=l k,c,q k,o >l k,o ). 2 n 2 k=1 If the random effects distribution is incorrectly specified, the maximum likelihood estimator ˆθ will converge to value of θ that solves E { / θ log L(θ; Q)} = 0, where the expectation is taken with respect to the true random effects distribution. Assuming that the covariance components are known and using a first-order Taylor series expansion, we show in Appendix A that ˆδ will converge to the value of δ that solves l 0 = D(δ, V )(δ δ) n 2 k=1 [ l k,o lk,c { } V T Σ 1 (q V δ GY (l) ) F Y (l) f Y (q) g Y (q) dq (1.4) V T k,c Σ 1 k,c {q k,c V k,c δ } { GYk,c Y (l } ] k,o k,c q k,o ) F Yk,c Y k,o (l k,c q k,o ) f Y k,c Y k,o (q k,c q k,o ) g Yk,c Y k,o (q k,c q k,o ) g Yk,o (q k,o )dq k,c dq k,o, where g Y (q; δ ) and G Y (q; δ ) are the true marginal density and distribution, respectively, of Y ; all densities in (1.4) are evaluated at δ, the true parameter value; and D(δ, V ), Q k,c, V and Σ k,c are given in the Appendix A. While complex, the approximation reveals important insights on the asymptotic bias when the random effects are incorrectly specified. In the likely case that the random effects distribution is misspecified, the bias is driven by the difference in the true conditional distribution of the censored response given the non-censored response (that is g Yk,c Y k,o (q k,c q k,o )) and the same conditional distribution induced by the Gaussian random effects (that is f Yk,c Y k,o (q k,c q k,o )), weighted by the likelihood of the non-censored response (that is g Yk,o (q k,o )). Practically, this suggests that small deviations from normality will lead to only slight asymptotic bias. If we have correctly specified the intra-subject error distribution, then g Yk,c Y k,o (y k,c y k,o ) will be approximately normal if the dimension of Y k,o is large regardless of the true random effects density. Thus, if there is a large probability of observing many non-censored observations on each subject, then the asymptotic bias is likely to be slight. This suggests that the overall percentage of censored responses is immaterial in determining the asymptotic bias. Instead the absolute number of non-censored responses for each subject is critical. k,c, 6

23 1.3 Semiparametric Linear Mixed Model with Censored Response We offer a summary of semiparametric linear mixed models and refer the reader to Zhang and Davidian (2001) for a complete description. We now assume that the distribution of Z i belongs to the smooth class of continuous densities described by Gallant and Nychka (1987). This class is sufficiently flexible to include skewed, thick- and thin-tailed, and multimodal distributions but does not include densities with jumps, kinks, or oscillations. These densities can be represented as an infinite series but can be approximated by a truncated series. The densities that are part of the truncated series are referred to as seminonparametric or SNP. We thus assume that the density of Z i can be approximated by the SNP representation with degree of truncation K given by h K (z) = PK(z)φ 2 q (z) = (j j q) K a j1,...,j q (z j 1 1 zjq q ) 2 φ q (z), (1.5) where j l 0 for l = 1,..., q and K is the order of the polynomial P K (z). For example, with K = 2 and q = 2, P K (z) = a 00 + a 10 z 1 + a 01 z 2 + a 20 z1 2 + a 11z 1 z 2 + a 02 z2 2. When K = 0, we show below that a 0,...,0 must equal 1, and the density of Z i is a standard q-dimensional normal. K controls the degree of flexibility of the density h K (z); we discuss in Section 1.4 how to select K. The coefficients a j1,...,j q must be chosen so that h K (z) integrates to one. The constraint hk (z)dz = 1 is equivalent to imposing E{PK 2 (U)} = 1, where U follows a standard q- dimensional normal distribution (Zhang and Davidian, 2001). Let a be the d-dimensional vector of coefficients for the polynomial P K (z) and let j 1,..., j q be the subscripts corresponding to the j th element. Then the above constraint { can be rewritten as E{PK 2 } (U)} = at Aa = 1, where A is the matrix with (j, k) element equal to E(U j 1+k 1 1 ) E(U jq+kq 1 ) and U 1 is distributed as a standard normal. Rather than impose a constraint on a, we may rewrite the (d 1) vector a as a function of d 1 parameters. Because A must be positive definite, there exists a matrix B such that A = B 2. If we let c = Ba, then a T Aa = 1 implies c T c = 1. As c and c will result in the same density h K (z), c = (c 1,..., c d ) T must lie in the half-unit sphere of R d. Therefore, we can express c in terms of a polar coordinate transformation, where c 1 = sin(ξ 1 ), c 2 = cos(ξ 1 ) sin(ξ 2 ),..., c d 1 = cos(ξ 1 )... cos(ξ d 2 ) sin(ξ d 1 ),c d = cos(ξ 1 )... cos(ξ d 1 ), π/2 ξ j π/2 for j = 1,..., d 1, and ξ = (ξ 1,..., ξ d 1 ) T. The vector a can be written as B 1 c, which only involves d 1 parameters. The parameters of interest are now θ K SNP = (βt, µ T, r T, σ 2, ξ T ) T, which includes an additional d 1 parameters compared to when normality is assumed. 7

24 Note that the SNP density does not impose E(Z i ) = 0, so that E(b i ) = γ = µ + R{E(Z i )} and Var(b i ) = R{Var(Z i )}R T. The moments of Z i are linear combinations of the moments of a standard normal density, which can be found using recursion formulas. 1.4 Estimation Using SAS NLMIXED In the case where there is no censoring of the response variable, the log likelihood assuming the SNP representation of the random effects can be expressed in a closed form (Zhang and Davidian, 2001). Censoring necessitates numerical integration. For a fixed K, the likelihood of θ K SNP is given by m L(θSNP K ; Q) = i=1 n i j=1 [ {Φ 1 (m ij )} I(Q ij=l ij ) { 1 σ φ 1(m ij ) } I(Q ij >l ij ) ] P 2 K(z i )φ q (z i )dz i (1.6) For each observation i, this requires evaluation of a q-dimensional integral, which, in practice, would likely only be one- or two-dimensional. In contrast, we could integrate over the marginal density of Y i for each censored observation as Jacqmin-Gadda et al. (2000) proposed. However, the dimension of that integral would equal the number of censored observations for subject i, which could be quite large and computationally intractable. Even though the integral in (1.6) is typically only one- or two-dimensional, approximating the integral can be difficult for complex random effects densities such as the SNP density. More generally, we need to be able to approximate the following integral, L i (θ) = n i j=1 { } f Qij Z i (q ij z i ; β, µ, r, x ij ) g Zi (z i )dz i. (1.7) where f Qij Z i (q ij z i ; β, µ, r, x ij ) and g Zi (z i ) are the density of the response conditioned on the random effects and random effects density, respectively. In this application, and g Zi (z i ) = P 2 K (z i)φ q (z i ). f Qij Z i (q ij z i ; β, µ, r, x ij ) = {Φ 1 (m ij )} I(Q ij=l ij ) { 1 σ φ 1(m ij ) } I(Q ij >l ij ) Many of the standard numerical integration techniques perform quite poorly when the random effects density is complex. For example, adaptive Gaussian quadrature requires maximizing the integrand in (1.7) for each of the m subjects at each updated estimate of ˆβ, ˆµ, and ˆr. The maximizations can be too time consuming if g Zi is sufficiently complex as is the case for the SNP density. Traditional non-adaptive Gaussian quadrature centers the quadrature points at the mean of the random effects and scales them based on the variance of the random effects. 8

25 The mode of the integrand in (1.7) may not be near the mean of the random effects and a large number of quadrature points may be needed to achieve a reasonable approximation to the integral. For q 2, the number of quadrature points needed is usually too large to be computationally feasible. Other approximation techniques, such as the Laplace approximation (Wolfinger, 1993), first-order approximation (Beal and Sheiner, 1988), and importance sampling (Pinheiro and Bates, 1995) are often quite poor or infeasible if g bi is non-gaussian. Even though adaptive Gaussian quadrature is computationally infeasible, it is instructive to examine the underpinnings of the integral approximation (Pinheiro and Bates, 1995). Let Ẑi be the empirical Bayes estimate of Z i given the current estimates of β, µ and r, defined as ˆβ (k), ˆµ (k), and ˆr (k), respectively, and G i be the estimated covariance matrix of Ẑi. That is Ẑ i = arg max zi n i j=1 { f Qij Z i (q ij z i ; ˆβ } (k), ˆµ (k), ˆr (k), x ij ) g Zi (z i ) We can write Z i = Ẑi + G i 1/2 U i for the random variable U i and then perform a change of variable to have the integral in (1.7) taken with respect to U i : L i (θ) = n i G 1/2 i j=1 { } f Qij Z i (q ij Ẑi + G 1/2 i u i ; β, µ, r, x ij ) g Zi (Ẑi + G 1/2 i u i )du i. SAS procedure NLMIXED then approximates the integral using standard Gaussian quadrature with quadrature points based on the standard normal kernel. Using the approximation to (1.7) described above, we can then maximize (1.6) with respect to β, µ, and r to obtain ˆβ (k+1), ˆµ (k+1), and ˆr (k+1), respectively. Typically one would iterate, until convergence, between finding the empirical Bayes estimates to re-approximate (1.7), and then updating β, µ, and r. This requires m maximizations at each iteration because the integral in (1.7) is re-centered and re-scaled about the updated empirical Bayes estimates at each iteration. Heuristically, we would like to center and scale the integral one time at something reasonably close to Z i to avoid m maximizations at each iteration. We propose to center and scale the quadrature points at the empirical Bayes estimates and their estimated variance from assuming Gaussian random effects. Here we need to be careful as the Z i are assumed to have mean zero when Gaussian random effects are assumed, but this need not be the case for a flexible random effects density, such as the SNP density. Let ˆb i,g be the empirical Bayes estimate of b i, the centered random effect, from assuming Gaussian random effects and G i,g the estimated variance of the empirical Bayes estimates. 9

26 Then we have µ + RZ i = ˆb i,g + G 1/2 i,g U i Z i = R 1 (ˆb i,g µ) + R 1 G 1/2 i,g U i, which we can substitute into (1.7) and take the integral with respect to U i : L i (θ) = n i R 1 G 1/2 i,g j=1 [f Qij Z i { q ij R 1 (ˆb i,g µ) + R 1 G 1/2 i,g u i; β, µ, r, x ij }] (1.8) { } g Zi R 1 (ˆb i,g µ) + R 1 G 1/2 i,g u i; η du i. We can now use standard Gaussian quadrature with quadrature points based on the standard normal kernel to approximate (1.8). Because SAS procedure NLMIXED does not allow the user to center the quadrature points at arbitrary values, to force NLMIXED to use the integration strategy above we must use programming statements to center the integration as in (1.8). In addition to forcing NLMIXED to center the quadrature points at arbitrary values, we must also trick NLMIXED to allow non-gaussian random effects. The SAS procedure NLMIXED has been developed to obtain maximum likelihood estimates for mixed models with Gaussian random effects. Except when K = 0, the SNP random effects density will not be normally distributed. However, if we consider n i j=1 [ {Φ 1 (m ij )} I(Q ij=l ij ) { 1 σ φ 1(m ij ) } I(Q ij >l ij ) ] P 2 K(z i ) (1.9) to be the likelihood for Q ij conditioned on the random effects, then the random effects Z i can be thought to follow a standard q-dimensional normal distribution (Liu and Yu, 2008). Example code to implement SNP for left-censored mixed models is given in Appendix B. Among the optimization routines available in NLMIXED, dual-quasi Newton optimization works well. The inverse Hessian matrix may be used to obtain standard errors for the parameter estimates and is computed as part of the standard output in NLMIXED. The preceding discussion assumes a fixed K. We treat K as a tuning parameter and select K by visually comparing the estimated densities and computing information criteria evaluated at the estimates, ˆθ SNP K, of θk SNP from model fits for different values of K (Zhang and Davidian, 2001). The information criteria are of the form 2L(ˆθ SNP K, Q) + C(m)p where p is the number of parameters in the model. For the Akaike information criterion (AIC), C(m) = 2; Hannan- Quinn information criterion (HQIC), C(m) = 2 log log m; and Bayesian information criterion 10

Mixed model analysis of censored longitudinal data with flexible random-effects density

Mixed model analysis of censored longitudinal data with flexible random-effects density Biostatistics (2012), 13, 1, pp. 61 73 doi:10.1093/biostatistics/kxr026 Advance Access publication on September 13, 2011 Mixed model analysis of censored longitudinal data with flexible random-effects

More information

Semiparametric Mixed Effects Models with Flexible Random Effects Distribution

Semiparametric Mixed Effects Models with Flexible Random Effects Distribution Semiparametric Mixed Effects Models with Flexible Random Effects Distribution Marie Davidian North Carolina State University davidian@stat.ncsu.edu www.stat.ncsu.edu/ davidian Joint work with A. Tsiatis,

More information

Some New Methods for Latent Variable Models and Survival Analysis. Latent-Model Robustness in Structural Measurement Error Models.

Some New Methods for Latent Variable Models and Survival Analysis. Latent-Model Robustness in Structural Measurement Error Models. Some New Methods for Latent Variable Models and Survival Analysis Marie Davidian Department of Statistics North Carolina State University 1. Introduction Outline 3. Empirically checking latent-model robustness

More information

Estimating Causal Effects of Organ Transplantation Treatment Regimes

Estimating Causal Effects of Organ Transplantation Treatment Regimes Estimating Causal Effects of Organ Transplantation Treatment Regimes David M. Vock, Jeffrey A. Verdoliva Boatman Division of Biostatistics University of Minnesota July 31, 2018 1 / 27 Hot off the Press

More information

A Sampling of IMPACT Research:

A Sampling of IMPACT Research: A Sampling of IMPACT Research: Methods for Analysis with Dropout and Identifying Optimal Treatment Regimes Marie Davidian Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

Web-based Supplementary Materials for A Robust Method for Estimating. Optimal Treatment Regimes

Web-based Supplementary Materials for A Robust Method for Estimating. Optimal Treatment Regimes Biometrics 000, 000 000 DOI: 000 000 0000 Web-based Supplementary Materials for A Robust Method for Estimating Optimal Treatment Regimes Baqun Zhang, Anastasios A. Tsiatis, Eric B. Laber, and Marie Davidian

More information

The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models

The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models John M. Neuhaus Charles E. McCulloch Division of Biostatistics University of California, San

More information

The performance of estimation methods for generalized linear mixed models

The performance of estimation methods for generalized linear mixed models University of Wollongong Research Online University of Wollongong Thesis Collection 1954-2016 University of Wollongong Thesis Collections 2008 The performance of estimation methods for generalized linear

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Inferences about Parameters of Trivariate Normal Distribution with Missing Data

Inferences about Parameters of Trivariate Normal Distribution with Missing Data Florida International University FIU Digital Commons FIU Electronic Theses and Dissertations University Graduate School 7-5-3 Inferences about Parameters of Trivariate Normal Distribution with Missing

More information

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3 University of California, Irvine 2017-2018 1 Statistics (STATS) Courses STATS 5. Seminar in Data Science. 1 Unit. An introduction to the field of Data Science; intended for entering freshman and transfers.

More information

Using Estimating Equations for Spatially Correlated A

Using Estimating Equations for Spatially Correlated A Using Estimating Equations for Spatially Correlated Areal Data December 8, 2009 Introduction GEEs Spatial Estimating Equations Implementation Simulation Conclusion Typical Problem Assess the relationship

More information

Semiparametric Regression

Semiparametric Regression Semiparametric Regression Patrick Breheny October 22 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/23 Introduction Over the past few weeks, we ve introduced a variety of regression models under

More information

INFORMATION APPROACH FOR CHANGE POINT DETECTION OF WEIBULL MODELS WITH APPLICATIONS. Tao Jiang. A Thesis

INFORMATION APPROACH FOR CHANGE POINT DETECTION OF WEIBULL MODELS WITH APPLICATIONS. Tao Jiang. A Thesis INFORMATION APPROACH FOR CHANGE POINT DETECTION OF WEIBULL MODELS WITH APPLICATIONS Tao Jiang A Thesis Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the

More information

For more information about how to cite these materials visit

For more information about how to cite these materials visit Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/

More information

Experimental designs for multiple responses with different models

Experimental designs for multiple responses with different models Graduate Theses and Dissertations Graduate College 2015 Experimental designs for multiple responses with different models Wilmina Mary Marget Iowa State University Follow this and additional works at:

More information

ABSTRACT. HOLM, KATHLEEN JENNIFER. Comparison of Optimal Design Methods in Inverse Problems. (Under the direction of H.T. Banks.)

ABSTRACT. HOLM, KATHLEEN JENNIFER. Comparison of Optimal Design Methods in Inverse Problems. (Under the direction of H.T. Banks.) ABSTRACT HOLM, KATHLEEN JENNIFER. Comparison of Optimal Design Methods in Inverse Problems. (Under the direction of H.T. Banks.) This project was initiated with an investigation by Banks et al [6] into

More information

Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions

Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Joe Schafer Office of the Associate Director for Research and Methodology U.S. Census

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective Anastasios (Butch) Tsiatis and Xiaofei Bai Department of Statistics North Carolina State University 1/35 Optimal Treatment

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

Longitudinal + Reliability = Joint Modeling

Longitudinal + Reliability = Joint Modeling Longitudinal + Reliability = Joint Modeling Carles Serrat Institute of Statistics and Mathematics Applied to Building CYTED-HAROSA International Workshop November 21-22, 2013 Barcelona Mainly from Rizopoulos,

More information

Statistical Methods for Alzheimer s Disease Studies

Statistical Methods for Alzheimer s Disease Studies Statistical Methods for Alzheimer s Disease Studies Rebecca A. Betensky, Ph.D. Department of Biostatistics, Harvard T.H. Chan School of Public Health July 19, 2016 1/37 OUTLINE 1 Statistical collaborations

More information

Estimation for state space models: quasi-likelihood and asymptotic quasi-likelihood approaches

Estimation for state space models: quasi-likelihood and asymptotic quasi-likelihood approaches University of Wollongong Research Online University of Wollongong Thesis Collection 1954-2016 University of Wollongong Thesis Collections 2008 Estimation for state space models: quasi-likelihood and asymptotic

More information

Lifetime prediction and confidence bounds in accelerated degradation testing for lognormal response distributions with an Arrhenius rate relationship

Lifetime prediction and confidence bounds in accelerated degradation testing for lognormal response distributions with an Arrhenius rate relationship Scholars' Mine Doctoral Dissertations Student Research & Creative Works Spring 01 Lifetime prediction and confidence bounds in accelerated degradation testing for lognormal response distributions with

More information

Multivariate Survival Analysis

Multivariate Survival Analysis Multivariate Survival Analysis Previously we have assumed that either (X i, δ i ) or (X i, δ i, Z i ), i = 1,..., n, are i.i.d.. This may not always be the case. Multivariate survival data can arise in

More information

Web-based Supplementary Material for A Two-Part Joint. Model for the Analysis of Survival and Longitudinal Binary. Data with excess Zeros

Web-based Supplementary Material for A Two-Part Joint. Model for the Analysis of Survival and Longitudinal Binary. Data with excess Zeros Web-based Supplementary Material for A Two-Part Joint Model for the Analysis of Survival and Longitudinal Binary Data with excess Zeros Dimitris Rizopoulos, 1 Geert Verbeke, 1 Emmanuel Lesaffre 1 and Yves

More information

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

FULL LIKELIHOOD INFERENCES IN THE COX MODEL October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach

More information

SEMIPARAMETRIC APPROACHES TO INFERENCE IN JOINT MODELS FOR LONGITUDINAL AND TIME-TO-EVENT DATA

SEMIPARAMETRIC APPROACHES TO INFERENCE IN JOINT MODELS FOR LONGITUDINAL AND TIME-TO-EVENT DATA SEMIPARAMETRIC APPROACHES TO INFERENCE IN JOINT MODELS FOR LONGITUDINAL AND TIME-TO-EVENT DATA Marie Davidian and Anastasios A. Tsiatis http://www.stat.ncsu.edu/ davidian/ http://www.stat.ncsu.edu/ tsiatis/

More information

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation Biost 58 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 5: Review Purpose of Statistics Statistics is about science (Science in the broadest

More information

Likelihood-based inference with missing data under missing-at-random

Likelihood-based inference with missing data under missing-at-random Likelihood-based inference with missing data under missing-at-random Jae-kwang Kim Joint work with Shu Yang Department of Statistics, Iowa State University May 4, 014 Outline 1. Introduction. Parametric

More information

Estimation of Optimal Treatment Regimes Via Machine Learning. Marie Davidian

Estimation of Optimal Treatment Regimes Via Machine Learning. Marie Davidian Estimation of Optimal Treatment Regimes Via Machine Learning Marie Davidian Department of Statistics North Carolina State University Triangle Machine Learning Day April 3, 2018 1/28 Optimal DTRs Via ML

More information

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract Journal of Data Science,17(1). P. 145-160,2019 DOI:10.6339/JDS.201901_17(1).0007 WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION Wei Xiong *, Maozai Tian 2 1 School of Statistics, University of

More information

PQL Estimation Biases in Generalized Linear Mixed Models

PQL Estimation Biases in Generalized Linear Mixed Models PQL Estimation Biases in Generalized Linear Mixed Models Woncheol Jang Johan Lim March 18, 2006 Abstract The penalized quasi-likelihood (PQL) approach is the most common estimation procedure for the generalized

More information

UNIVERSITY OF CALIFORNIA, SAN DIEGO

UNIVERSITY OF CALIFORNIA, SAN DIEGO UNIVERSITY OF CALIFORNIA, SAN DIEGO Estimation of the primary hazard ratio in the presence of a secondary covariate with non-proportional hazards An undergraduate honors thesis submitted to the Department

More information

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model Other Survival Models (1) Non-PH models We briefly discussed the non-proportional hazards (non-ph) model λ(t Z) = λ 0 (t) exp{β(t) Z}, where β(t) can be estimated by: piecewise constants (recall how);

More information

Propensity Score Weighting with Multilevel Data

Propensity Score Weighting with Multilevel Data Propensity Score Weighting with Multilevel Data Fan Li Department of Statistical Science Duke University October 25, 2012 Joint work with Alan Zaslavsky and Mary Beth Landrum Introduction In comparative

More information

Approximate Median Regression via the Box-Cox Transformation

Approximate Median Regression via the Box-Cox Transformation Approximate Median Regression via the Box-Cox Transformation Garrett M. Fitzmaurice,StuartR.Lipsitz, and Michael Parzen Median regression is used increasingly in many different areas of applications. The

More information

Score test for random changepoint in a mixed model

Score test for random changepoint in a mixed model Score test for random changepoint in a mixed model Corentin Segalas and Hélène Jacqmin-Gadda INSERM U1219, Biostatistics team, Bordeaux GDR Statistiques et Santé October 6, 2017 Biostatistics 1 / 27 Introduction

More information

MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30

MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30 MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD Copyright c 2012 (Iowa State University) Statistics 511 1 / 30 INFORMATION CRITERIA Akaike s Information criterion is given by AIC = 2l(ˆθ) + 2k, where l(ˆθ)

More information

Causal Inference with Big Data Sets

Causal Inference with Big Data Sets Causal Inference with Big Data Sets Marcelo Coca Perraillon University of Colorado AMC November 2016 1 / 1 Outlone Outline Big data Causal inference in economics and statistics Regression discontinuity

More information

ESTIMATING STATISTICAL CHARACTERISTICS UNDER INTERVAL UNCERTAINTY AND CONSTRAINTS: MEAN, VARIANCE, COVARIANCE, AND CORRELATION ALI JALAL-KAMALI

ESTIMATING STATISTICAL CHARACTERISTICS UNDER INTERVAL UNCERTAINTY AND CONSTRAINTS: MEAN, VARIANCE, COVARIANCE, AND CORRELATION ALI JALAL-KAMALI ESTIMATING STATISTICAL CHARACTERISTICS UNDER INTERVAL UNCERTAINTY AND CONSTRAINTS: MEAN, VARIANCE, COVARIANCE, AND CORRELATION ALI JALAL-KAMALI Department of Computer Science APPROVED: Vladik Kreinovich,

More information

Open Problems in Mixed Models

Open Problems in Mixed Models xxiii Determining how to deal with a not positive definite covariance matrix of random effects, D during maximum likelihood estimation algorithms. Several strategies are discussed in Section 2.15. For

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

Comparing Group Means When Nonresponse Rates Differ

Comparing Group Means When Nonresponse Rates Differ UNF Digital Commons UNF Theses and Dissertations Student Scholarship 2015 Comparing Group Means When Nonresponse Rates Differ Gabriela M. Stegmann University of North Florida Suggested Citation Stegmann,

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Contents. Part I: Fundamentals of Bayesian Inference 1

Contents. Part I: Fundamentals of Bayesian Inference 1 Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian

More information

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January

More information

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,

More information

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Statistics - Lecture One. Outline. Charlotte Wickham  1. Basic ideas about estimation Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence

More information

Modification and Improvement of Empirical Likelihood for Missing Response Problem

Modification and Improvement of Empirical Likelihood for Missing Response Problem UW Biostatistics Working Paper Series 12-30-2010 Modification and Improvement of Empirical Likelihood for Missing Response Problem Kwun Chuen Gary Chan University of Washington - Seattle Campus, kcgchan@u.washington.edu

More information

Estimating the Mean Response of Treatment Duration Regimes in an Observational Study. Anastasios A. Tsiatis.

Estimating the Mean Response of Treatment Duration Regimes in an Observational Study. Anastasios A. Tsiatis. Estimating the Mean Response of Treatment Duration Regimes in an Observational Study Anastasios A. Tsiatis http://www.stat.ncsu.edu/ tsiatis/ Introduction to Dynamic Treatment Regimes 1 Outline Description

More information

Semi-Nonparametric Inferences for Massive Data

Semi-Nonparametric Inferences for Massive Data Semi-Nonparametric Inferences for Massive Data Guang Cheng 1 Department of Statistics Purdue University Statistics Seminar at NCSU October, 2015 1 Acknowledge NSF, Simons Foundation and ONR. A Joint Work

More information

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design 1 / 32 Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design Changbao Wu Department of Statistics and Actuarial Science University of Waterloo (Joint work with Min Chen and Mary

More information

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science. Texts in Statistical Science Generalized Linear Mixed Models Modern Concepts, Methods and Applications Walter W. Stroup CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint

More information

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.

More information

One-stage dose-response meta-analysis

One-stage dose-response meta-analysis One-stage dose-response meta-analysis Nicola Orsini, Alessio Crippa Biostatistics Team Department of Public Health Sciences Karolinska Institutet http://ki.se/en/phs/biostatistics-team 2017 Nordic and

More information

Power and Sample Size Calculations with the Additive Hazards Model

Power and Sample Size Calculations with the Additive Hazards Model Journal of Data Science 10(2012), 143-155 Power and Sample Size Calculations with the Additive Hazards Model Ling Chen, Chengjie Xiong, J. Philip Miller and Feng Gao Washington University School of Medicine

More information

Estimation and Model Selection in Mixed Effects Models Part I. Adeline Samson 1

Estimation and Model Selection in Mixed Effects Models Part I. Adeline Samson 1 Estimation and Model Selection in Mixed Effects Models Part I Adeline Samson 1 1 University Paris Descartes Summer school 2009 - Lipari, Italy These slides are based on Marc Lavielle s slides Outline 1

More information

Likelihood and Fairness in Multidimensional Item Response Theory

Likelihood and Fairness in Multidimensional Item Response Theory Likelihood and Fairness in Multidimensional Item Response Theory or What I Thought About On My Holidays Giles Hooker and Matthew Finkelman Cornell University, February 27, 2008 Item Response Theory Educational

More information

T E C H N I C A L R E P O R T A SANDWICH-ESTIMATOR TEST FOR MISSPECIFICATION IN MIXED-EFFECTS MODELS. LITIERE S., ALONSO A., and G.

T E C H N I C A L R E P O R T A SANDWICH-ESTIMATOR TEST FOR MISSPECIFICATION IN MIXED-EFFECTS MODELS. LITIERE S., ALONSO A., and G. T E C H N I C A L R E P O R T 0658 A SANDWICH-ESTIMATOR TEST FOR MISSPECIFICATION IN MIXED-EFFECTS MODELS LITIERE S., ALONSO A., and G. MOLENBERGHS * I A P S T A T I S T I C S N E T W O R K INTERUNIVERSITY

More information

Variable selection and machine learning methods in causal inference

Variable selection and machine learning methods in causal inference Variable selection and machine learning methods in causal inference Debashis Ghosh Department of Biostatistics and Informatics Colorado School of Public Health Joint work with Yeying Zhu, University of

More information

Estimation in Generalized Linear Models with Heterogeneous Random Effects. Woncheol Jang Johan Lim. May 19, 2004

Estimation in Generalized Linear Models with Heterogeneous Random Effects. Woncheol Jang Johan Lim. May 19, 2004 Estimation in Generalized Linear Models with Heterogeneous Random Effects Woncheol Jang Johan Lim May 19, 2004 Abstract The penalized quasi-likelihood (PQL) approach is the most common estimation procedure

More information

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model

More information

Practical considerations for survival models

Practical considerations for survival models Including historical data in the analysis of clinical trials using the modified power prior Practical considerations for survival models David Dejardin 1 2, Joost van Rosmalen 3 and Emmanuel Lesaffre 1

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

Point and Interval Estimation for Gaussian Distribution, Based on Progressively Type-II Censored Samples

Point and Interval Estimation for Gaussian Distribution, Based on Progressively Type-II Censored Samples 90 IEEE TRANSACTIONS ON RELIABILITY, VOL. 52, NO. 1, MARCH 2003 Point and Interval Estimation for Gaussian Distribution, Based on Progressively Type-II Censored Samples N. Balakrishnan, N. Kannan, C. T.

More information

Supporting Information for Estimating restricted mean. treatment effects with stacked survival models

Supporting Information for Estimating restricted mean. treatment effects with stacked survival models Supporting Information for Estimating restricted mean treatment effects with stacked survival models Andrew Wey, David Vock, John Connett, and Kyle Rudser Section 1 presents several extensions to the simulation

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview

Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations

More information

Institute of Actuaries of India

Institute of Actuaries of India Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2018 Examinations Subject CT3 Probability and Mathematical Statistics Core Technical Syllabus 1 June 2017 Aim The

More information

A Note on Bayesian Inference After Multiple Imputation

A Note on Bayesian Inference After Multiple Imputation A Note on Bayesian Inference After Multiple Imputation Xiang Zhou and Jerome P. Reiter Abstract This article is aimed at practitioners who plan to use Bayesian inference on multiplyimputed datasets in

More information

Likelihood-Based Methods

Likelihood-Based Methods Likelihood-Based Methods Handbook of Spatial Statistics, Chapter 4 Susheela Singh September 22, 2016 OVERVIEW INTRODUCTION MAXIMUM LIKELIHOOD ESTIMATION (ML) RESTRICTED MAXIMUM LIKELIHOOD ESTIMATION (REML)

More information

Investigation into the use of confidence indicators with calibration

Investigation into the use of confidence indicators with calibration WORKSHOP ON FRONTIERS IN BENCHMARKING TECHNIQUES AND THEIR APPLICATION TO OFFICIAL STATISTICS 7 8 APRIL 2005 Investigation into the use of confidence indicators with calibration Gerard Keogh and Dave Jennings

More information

Approximating Bayesian Posterior Means Using Multivariate Gaussian Quadrature

Approximating Bayesian Posterior Means Using Multivariate Gaussian Quadrature Approximating Bayesian Posterior Means Using Multivariate Gaussian Quadrature John A.L. Cranfield Paul V. Preckel Songquan Liu Presented at Western Agricultural Economics Association 1997 Annual Meeting

More information

International Journal of PharmTech Research CODEN (USA): IJPRIF, ISSN: , ISSN(Online): Vol.9, No.9, pp , 2016

International Journal of PharmTech Research CODEN (USA): IJPRIF, ISSN: , ISSN(Online): Vol.9, No.9, pp , 2016 International Journal of PharmTech Research CODEN (USA): IJPRIF, ISSN: 0974-4304, ISSN(Online): 2455-9563 Vol.9, No.9, pp 477-487, 2016 Modeling of Multi-Response Longitudinal DataUsing Linear Mixed Modelin

More information

Group Sequential Designs: Theory, Computation and Optimisation

Group Sequential Designs: Theory, Computation and Optimisation Group Sequential Designs: Theory, Computation and Optimisation Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj 8th International Conference

More information

Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression:

Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression: Biost 518 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture utline Choice of Model Alternative Models Effect of data driven selection of

More information

Quantile Regression for Residual Life and Empirical Likelihood

Quantile Regression for Residual Life and Empirical Likelihood Quantile Regression for Residual Life and Empirical Likelihood Mai Zhou email: mai@ms.uky.edu Department of Statistics, University of Kentucky, Lexington, KY 40506-0027, USA Jong-Hyeon Jeong email: jeong@nsabp.pitt.edu

More information

Analysis of Gamma and Weibull Lifetime Data under a General Censoring Scheme and in the presence of Covariates

Analysis of Gamma and Weibull Lifetime Data under a General Censoring Scheme and in the presence of Covariates Communications in Statistics - Theory and Methods ISSN: 0361-0926 (Print) 1532-415X (Online) Journal homepage: http://www.tandfonline.com/loi/lsta20 Analysis of Gamma and Weibull Lifetime Data under a

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

Fitting Multidimensional Latent Variable Models using an Efficient Laplace Approximation

Fitting Multidimensional Latent Variable Models using an Efficient Laplace Approximation Fitting Multidimensional Latent Variable Models using an Efficient Laplace Approximation Dimitris Rizopoulos Department of Biostatistics, Erasmus University Medical Center, the Netherlands d.rizopoulos@erasmusmc.nl

More information

INVERTED KUMARASWAMY DISTRIBUTION: PROPERTIES AND ESTIMATION

INVERTED KUMARASWAMY DISTRIBUTION: PROPERTIES AND ESTIMATION Pak. J. Statist. 2017 Vol. 33(1), 37-61 INVERTED KUMARASWAMY DISTRIBUTION: PROPERTIES AND ESTIMATION A. M. Abd AL-Fattah, A.A. EL-Helbawy G.R. AL-Dayian Statistics Department, Faculty of Commerce, AL-Azhar

More information

Deductive Derivation and Computerization of Semiparametric Efficient Estimation

Deductive Derivation and Computerization of Semiparametric Efficient Estimation Deductive Derivation and Computerization of Semiparametric Efficient Estimation Constantine Frangakis, Tianchen Qian, Zhenke Wu, and Ivan Diaz Department of Biostatistics Johns Hopkins Bloomberg School

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

Model comparison and selection

Model comparison and selection BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)

More information

STATISTICAL INFERENCE IN ACCELERATED LIFE TESTING WITH GEOMETRIC PROCESS MODEL. A Thesis. Presented to the. Faculty of. San Diego State University

STATISTICAL INFERENCE IN ACCELERATED LIFE TESTING WITH GEOMETRIC PROCESS MODEL. A Thesis. Presented to the. Faculty of. San Diego State University STATISTICAL INFERENCE IN ACCELERATED LIFE TESTING WITH GEOMETRIC PROCESS MODEL A Thesis Presented to the Faculty of San Diego State University In Partial Fulfillment of the Requirements for the Degree

More information

Median Cross-Validation

Median Cross-Validation Median Cross-Validation Chi-Wai Yu 1, and Bertrand Clarke 2 1 Department of Mathematics Hong Kong University of Science and Technology 2 Department of Medicine University of Miami IISA 2011 Outline Motivational

More information

Proportional hazards regression

Proportional hazards regression Proportional hazards regression Patrick Breheny October 8 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/28 Introduction The model Solving for the MLE Inference Today we will begin discussing regression

More information

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An

More information

Measuring Social Influence Without Bias

Measuring Social Influence Without Bias Measuring Social Influence Without Bias Annie Franco Bobbie NJ Macdonald December 9, 2015 The Problem CS224W: Final Paper How well can statistical models disentangle the effects of social influence from

More information

Review Article Analysis of Longitudinal and Survival Data: Joint Modeling, Inference Methods, and Issues

Review Article Analysis of Longitudinal and Survival Data: Joint Modeling, Inference Methods, and Issues Journal of Probability and Statistics Volume 2012, Article ID 640153, 17 pages doi:10.1155/2012/640153 Review Article Analysis of Longitudinal and Survival Data: Joint Modeling, Inference Methods, and

More information

Joint Modeling of Longitudinal Item Response Data and Survival

Joint Modeling of Longitudinal Item Response Data and Survival Joint Modeling of Longitudinal Item Response Data and Survival Jean-Paul Fox University of Twente Department of Research Methodology, Measurement and Data Analysis Faculty of Behavioural Sciences Enschede,

More information

Paradoxical Results in Multidimensional Item Response Theory

Paradoxical Results in Multidimensional Item Response Theory UNC, December 6, 2010 Paradoxical Results in Multidimensional Item Response Theory Giles Hooker and Matthew Finkelman UNC, December 6, 2010 1 / 49 Item Response Theory Educational Testing Traditional model

More information

Previous lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing.

Previous lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing. Previous lecture P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing. Interaction Outline: Definition of interaction Additive versus multiplicative

More information

Bayesian estimation of the discrepancy with misspecified parametric models

Bayesian estimation of the discrepancy with misspecified parametric models Bayesian estimation of the discrepancy with misspecified parametric models Pierpaolo De Blasi University of Torino & Collegio Carlo Alberto Bayesian Nonparametrics workshop ICERM, 17-21 September 2012

More information

Type I and type II error under random-effects misspecification in generalized linear mixed models Link Peer-reviewed author version

Type I and type II error under random-effects misspecification in generalized linear mixed models Link Peer-reviewed author version Type I and type II error under random-effects misspecification in generalized linear mixed models Link Peer-reviewed author version Made available by Hasselt University Library in Document Server@UHasselt

More information