Correlation of Bivariate Frailty Models and a New Marginal Weibull Distribution for Correlated Bivariate Survival Data


A dissertation submitted to the Graduate School of the University of Cincinnati in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Mathematical Science of the College of Arts and Sciences

by Min Lin
B.S., Peking University, P.R. China, 99
M.S., University of Cincinnati, 3

Committee Chair: Dr. Siva Sivaganesan

Abstract

Survival analysis is widely used in many different areas. Classic models, such as the Cox proportional hazards model, are frequently used to model univariate survival data. However, in biomedical studies it is not uncommon that each study subject experiences multiple events or that subjects are related within clusters. Such data are called multivariate survival data. Statistical methods for these problems need to describe the dependence of observations within a subject or cluster. The frailty model is one approach to the problem and has become commonly used. Among the frailties, the gamma frailty is frequently used because of its analytic features. However, the gamma frailty model cannot handle highly correlated data in some cases. In this thesis, different parametric survival models with gamma frailty and lognormal frailty are examined in terms of correlation. Overall, lognormal frailty models perform better than gamma frailty models in many survival models. Another approach to the multivariate survival data problem is via parametric distributions that directly address the dependence among the data. In this thesis, a bivariate distribution with marginal Weibull distributions is proposed, and some properties of the distribution are discussed. The Weibull model with lognormal frailty, the Weibull model with gamma frailty, and the marginal Weibull model are also fitted via Bayesian methods, and the results are compared.

Acknowledgments

I would like to thank my doctoral dissertation committee chair, Dr. Siva Sivaganesan, for his advice and guidance through every stage of developing this dissertation, and for his suggestions, corrections, and great patience. Great appreciation also goes to Dr. James Deddens and Dr. Paul Horn, my dissertation committee members, for their valuable time, support, and friendship. I would like to extend my sincere appreciation and gratitude to the people of the Department of Mathematical Science for their help and the good time we spent together. Finally, my thanks go to my family, especially my wife, for the love and encouragement that made the completion of this dissertation possible.

Contents

ABSTRACT
ACKNOWLEDGEMENTS
Contents
List of Tables
List of Figures

1 Introduction

I Bivariate Survival Models Comparison

2 Background
  2.1 Frailty Model
  2.2 Loglinear Random Effect Model
  2.3 Random Effect Model via Parameters
  2.4 Structure of Part I
  2.5 Notation
3 Some Properties of Frailty Model and Possible Extension of Random Effect Model
  3.1 General Properties
  3.2 Frailty Model
  3.3 Loglinear Random Effect Model
4 Bivariate Exponential Models
  4.1 Exponential Model with Frailty
    4.1.1 Exponential Model with Gamma Frailty
    4.1.2 Exponential Model with Lognormal Frailty
  4.2 Loglinear Random Effect Model
  4.3 Discussion
5 Bivariate Weibull Models
  5.1 Weibull Model with Frailty
    5.1.1 Weibull Model with Gamma Frailties
    5.1.2 Weibull Model with Lognormal Frailties
  5.2 Loglinear Random Effect Model
  5.3 Comparison and Discussion
6 Bivariate Piecewise Exponential Models
  6.1 Piecewise Exponential Model with Frailty
    6.1.1 Piecewise Exponential Model with Gamma Frailty
    6.1.2 Piecewise Exponential Model with Lognormal Frailty
  6.2 Discussion of Piecewise Exponential Model
7 Bivariate Extreme Value Models
  7.1 Minimum Extreme Value Model with Frailty
  7.2 Discussion
8 Bivariate Gompertz Model
  8.1 Gompertz Model with Frailty
  8.2 Log Time Survival Model
  8.3 Discussion
9 Bivariate Lognormal Models
  9.1 Lognormal Model with Frailty
  9.2 Log-linear Survival Model
  9.3 Discussion
10 Bivariate Gamma Model
  10.1 Gamma Model with Frailty
    10.1.1 Gamma Model with Gamma Frailty
  10.2 Loglinear Survival Model
  10.3 Discussion
11 Bivariate Loglogistic Model
  11.1 Loglogistic Model with Frailty
  11.2 Log Time Survival Model
  11.3 Discussion

II Bivariate Marginal Weibull Distribution

12 Introduction
  12.1 Bivariate Distribution Approach
  12.2 Bayesian Approach to Survival Analysis
  12.3 Structure of Part II
13 A New Marginal Bivariate Weibull Distribution
  13.1 Distribution
14 Fit Marginal Bivariate Weibull Model Via Bayesian Methods
  14.1 Gibbs Sampling
  14.2 Incorporate Covariates
  14.3 Simulation Results
    14.3.1 Different Initial Points
    14.3.2 Different Priors
    14.3.3 Property of MCMC
15 Comparison of Gamma Frailty Weibull Model, Lognormal Frailty Weibull Model, and Marginal Weibull Model
  15.1 Gibbs Sampling for Weibull Model with Frailty
    15.1.1 Gamma Frailty
    15.1.2 Lognormal Frailty
  15.2 Independent Case
  15.3 Bivariate Marginal Weibull Model
  15.4 Model comparison via DIC
  15.5 Model comparison via Covariates
  15.6 Example
16 Conclusions and Further Research
  16.1 Conclusions
    16.1.1 Frailty Models
    16.1.2 Bivariate Distribution
  16.2 Further Research
    16.2.1 Frailty Models
    16.2.2 Bivariate Distribution

Bibliography

Appendix
A Proof of Theorem..
B Correlation coefficient between piecewise exponential distribution with gamma frailty
C Results of Piecewise Exponential Model with Frailty
  C.1 Correlation coefficients of piecewise exponential model with gamma frailty
  C.2 Simulation results of piecewise exponential model with lognormal frailty
D Correlation Coefficients in Gompertz Model
E Simulation Results of Marginal Weibull Model Fitting
  E.1 Different Initial Points
  E.2 Different Priors
F Model Comparison via DICs
G Model Comparison: Covariate
H Example Data

List of Tables

15.1 Comparison the coefficient of sex
C.1 The Correlation Coefficient of Piecewise Exponential Model with Gamma Frailty - mid-timepoint increases
C.2 The Correlation Coefficient of Logarithm Scale Piecewise Exponential Model with Gamma Frailty - mid-timepoint increases
C.3 The Correlation Coefficient of Piecewise Exponential Model with Gamma Frailty - Fixed α =.5 and t =
C.4 The Correlation Coefficient of Logarithm Scale Piecewise Exponential Model with Gamma Frailty - Fixed α =.5 and t =
C.5 The Correlation Coefficient of Piecewise Exponential Model with Lognormal Frailty - λ = λ
C.6 The Correlation Coefficient of Piecewise Exponential Model with Lognormal Frailty - µ, σ, and t fixed
C.7 The Correlation Coefficient of Piecewise Exponential Model with Lognormal Frailty - σ increase
C.8 The Correlation Coefficient of Piecewise Exponential Model with Lognormal Frailty - µ increase
C.9 The Correlation Coefficient of Piecewise Exponential Model with Lognormal Frailty - t increases
C.10 The Correlation Coefficient of Logarithm Scale Piecewise Exponential Model with Lognormal Frailty
C.11 The Correlation Coefficient of Logarithm Scale Piecewise Exponential Model with Lognormal Frailty
D.1 The Correlation Coefficient of Gompertz Model Gamma Frailty
D.2 The Correlation Coefficient of Gompertz Model Lognormal Frailty (β =.)
D.3 The Correlation Coefficient of Gompertz Model Lognormal Frailty (β =.5)
D.4 The Correlation Coefficient of Gompertz Model Lognormal Frailty (µ = 5)
D.5 The Correlation Coefficient of Gompertz Model Lognormal Frailty - (β.5, σ =
E.1 Estimation of α with different initial points in MCMC
E.2 Estimation of α with different initial points in MCMC
E.3 Estimation of γ with different initial points in MCMC
E.4 Estimation of γ with different initial points in MCMC
E.5 Estimation of ρ with different initial points in MCMC
E.6 Estimation of ρ with different initial points in MCMC
E.7 Estimation of α with different priors (α =.)
E.8 Estimation of α with different priors (α =.)
E.9 Estimation of α with different priors (α = )
E.10 Estimation of α with different priors (α = )
E.11 Estimation of γ with different priors (α =.)
E.12 Estimation of γ with different priors (α =.)
E.13 Estimation of γ with different priors (α = )
E.14 Estimation of γ with different priors (α = )
E.15 Estimation of ρ with different priors (α =.)
E.16 Estimation of ρ with different priors (α =.)
E.17 Estimation of ρ with different priors (α = )
E.18 Estimation of ρ with different priors (α = )
E.19 Estimation of α (Median) with different priors (α =.)
E.20 Estimation of α (Median) with different priors (α =.)
E.21 Estimation of α (Median) with different priors (α = )
E.22 Estimation of α (Median) with different priors (α = )
E.23 Estimation of γ (Median) with different priors (α =.)
E.24 Estimation of γ (Median) with different priors (α =.)
E.25 Estimation of γ (Median) with different priors (α = )
E.26 Estimation of γ (Median) with different priors (α = )
E.27 Estimation of ρ (Median) with different priors (α =.)
E.28 Estimation of ρ (Median) with different priors (α =.)
E.29 Estimation of ρ (Median) with different priors (α = )
E.30 Estimation of ρ (Median) with different priors (α = )
F.1 Model comparison via DIC with marginal Weibull data
F.2 Model comparison via DIC with lognormal frailty data
F.3 Model comparison via DIC with gamma frailty data
F.4 Model comparison via DIC with independent Weibull data
F.5 Model comparison via DIC with marginal Weibull data with different sample size (n=)
G.1 Model comparison of gamma frailty data (η =. α =.5)
G.2 Model comparison of gamma frailty data
G.3 Model comparison of lognormal frailty data (σ =.5 α = )
G.4 Model comparison of lognormal frailty data (σ = α = )
G.5 Model comparison of lognormal frailty data (σ = 5 α = )
G.6 Model comparison of marginal Weibull data (ρ = α =.5)
G.7 Model comparison of marginal Weibull data (ρ = α = )
G.8 Model comparison of marginal Weibull data (ρ = α = 3)
G.9 Model comparison of marginal Weibull data (ρ =.4 )
G.10 Model comparison of marginal Weibull data (ρ =.6 )
G.11 Model comparison of marginal Weibull data (ρ =.36 )
H.1 The Example Data Set of Infection of the Catheter

List of Figures

4-1 Correlation Coefficients of Exponential Model with Lognormal Frailty
5-1 Correlation Coefficients of Weibull Model with Gamma Frailty
5-2 Correlation Coefficients of Weibull Model with Lognormal Frailty
13-1 Correlation Coefficients of the Two Marginal Weibull Distribution
14-1 MCMC Sample of α
14-2 MCMC Sample of γ
14-3 MCMC Sample of ρ

Chapter 1 Introduction

Survival analysis deals with time-to-event data. It is widely used in many fields, including medicine, the environmental sciences, actuarial science, engineering, economics, and the social sciences. Typical examples are the remission time of cancers, the time-to-failure of engineering systems, employment duration, and the length of marriages. There are many models used in survival analysis. For instance, Weibull regression is among the most commonly used parametric survival models. These traditional models allow one to test the significance of covariates and to find the relationship between the lifetime and the covariates. They depend on the assumption that the survival times are conditionally independent given the covariates.

However, the assumption of independence does not always hold. Some survival data may be correlated within clusters. For example, in epidemiology, problems of related events arise in family studies of disease incidence: there may be a tendency for disease to occur within families, either because of shared environmental exposures or because of genetic predisposition. This type of survival data is called multivariate survival data. Independence among the observations within clusters cannot be assumed in these cases, so the traditional survival models may not be appropriate.

Several methods have been proposed to solve multivariate survival problems, such as the marginal models approach proposed by Zeger, et al. [84], Andersen and Gill [4], Wei, et al. [8] and Prentice et al. [68], and the penalized likelihood approach proposed by Prentice and Cai [67]. The frailty model is one of the proposed methods and is becoming increasingly popular in the area. The term frailty was introduced by Vaupel, Manton and Stallard [77] in 1979. In the frailty model, a random effect, called the frailty, that describes the excess risk of distinct clusters is added to the traditional survival model. These frailties can be either specific to an individual cluster or shared by a group. The latter model, called the shared frailty model, is one of the most commonly used frailty models. The idea of frailty is very close to the mixed model in longitudinal data analysis, which explains the intra-cluster correlation among the data via a random effect. The frailty model assumes that the time-to-event data in a cluster are conditionally independent given the frailty. Observations from a cluster with higher risk share a higher frailty, and observations from a cluster with lower risk share a lower frailty. After extending the traditional proportional hazards model to a frailty model, the hazard is the product of the baseline hazard, the frailty, and a non-negative function of the covariates. The frailty model has been developed by Clayton and Cuzick [8], Oakes [6],[63]. Furthermore, Aalen [] provides theoretical and practical motivation for frailty models by discussing the impact of heterogeneity on analysis and by illustrating how a random effect can deal with it.

Naturally, the choice of the distribution of the frailty plays an important role in the model. Theoretically, any non-negative distribution might be used as the frailty. Choices of the frailty distribution based on different baseline hazards have been discussed. For example, gamma frailty has been discussed by Andersen et al. [5], Glidden and Vittinghoff [3], Sahu, Day, Aslanidou and Sinha [7], Bjarnason and Hougaard [7], Clayton [6] and Oakes [6]. Hougaard [38] used a positive stable frailty. Whitmore and Lee [8] used an inverse gamma frailty. Paik [65] used a piecewise gamma frailty. Qiou, Ravishanker and Dey [69] used a positive stable frailty. Zeger, S.L., Liang, K.Y. and Albert, P.S. [84] used a class of gamma frailties.

The gamma distribution has commonly been chosen as the distribution of the frailty because of the relatively simple form of the likelihood and the convenience of computation. However, a question arises when examining the correlation between the times-to-event within a cluster for data from the Weibull model with gamma frailty: in some scenarios the model cannot handle highly correlated survival data within a cluster. On the other hand, lognormal frailty, another commonly used frailty, works better in terms of interpreting the correlation in the Weibull model.

The survival model can also be expressed as a log-linear model. This is an additive model that incorporates covariates via the logarithm of the random variable for time. By extending the loglinear model with an additional random variable term, the multivariate survival problem might also be solved.

To extend the survival model so that it explains correlated survival data, another survival model, the model via parameters of the survival distribution, can also be considered. That is, adding a random variable to the linear model of the parameter(s) can also explain some correlation.

In Part I, we give an extensive discussion of the correlation based on gamma frailty and lognormal frailty with different commonly used survival baselines, and compare some properties with the loglinear model and the survival model via parameters.

Another way to solve correlated survival problems has also been discussed in the literature. It is a parametric approach that explains the dependence in the data using a multivariate distribution. This method is mainly used on bivariate survival data because of the complexity of high-dimensional multivariate distributions, especially for data that are not equally correlated in high-dimensional problems. Freund [3], Marshall and Olkin [53], Block and Basu [8] discussed some bivariate exponential distributions, and Lee [49], Klein, Keiding, and Kamby [46], Lu and Bhattacharyya [5], Ghosh and Gelfand [9] discussed some bivariate Weibull distributions in bivariate survival data settings.

In Part II, a bivariate distribution with marginal Weibull distributions, derived from the bivariate normal distribution, is proposed for the bivariate survival problem. Some properties of the distribution are discussed, as is fitting the proposed distribution via Bayesian methods.

Part I Bivariate Survival Models Comparison

Chapter 2 Background

2.1 Frailty Model

As briefly discussed previously, the frailty model is a method to deal with dependence in survival data. The most common type of frailty model is the shared frailty model, a random effects model in which the frailties are common (shared) among the individuals of a cluster and are randomly distributed across clusters. In the shared frailty model, the times-to-event are assumed to be conditionally independent given the frailty. The hazard in the shared frailty model is the product of the frailty, the baseline hazard, and the hazard contribution of the covariates. This thesis focuses on the properties of the shared frailty model. All frailty models discussed in this thesis are shared frailty models unless specified otherwise.

The frailty model can be described as follows. Let $T_{ij}$ be the event time of the $j$th observation in the $i$th cluster, where $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, n_i$. The conditional hazard function of $T_{ij}$, given the unobserved frailty random variable $w_i$ and the fixed covariate vector $x_{ij}$ for the $j$th observation in the $i$th cluster, can be written as

$$h(t_{ij} \mid w_i, x_{ij}) = h_0(t_{ij})\, w_i\, e^{x_{ij}'\beta},$$

where the function $h_0$ is the common baseline hazard function for all observations and $\beta$ is the parameter vector for the covariates. A common method is to use a non-negative continuous parametric distribution with finite mean for the frailty $w_i$. Elbers and Ridder [] showed that the finite mean assumption plays the same role as the mean zero assumption in the linear regression model. Usually, the parameters of the frailty distribution are chosen so that the mean of the frailty equals 1, so that it has limited effect on the scale of the other parameters in the model. Although many distributions have been proposed as frailties, the most commonly used in practice are the gamma and lognormal frailties. A frailty can be added to many widely used traditional parametric survival models, such as the exponential, piecewise exponential, Weibull, lognormal, gamma, extreme value, and log-logistic models.

2.2 Loglinear Random Effect Model

The survival model can also be expressed as a loglinear model; i.e., the distribution of the survival time on the logarithm scale can be described by a random variable $V$, a location parameter $\mu$, and a scale parameter $\sigma$ as

$$\ln T = Y = \mu + \sigma V,$$

where $V$ usually has a distribution with support $(-\infty, \infty)$. For instance, to obtain the Weibull distribution, let $V$ follow the extreme value distribution with probability density function $f(v) = e^{v - e^{v}}$, and let $\mu = -\frac{\ln \alpha}{\gamma}$ and $\sigma = \frac{1}{\gamma}$. By a change of variables, the probability density function of $T$ is obtained as

$$f(t) = \alpha\gamma t^{\gamma-1} e^{-\alpha t^{\gamma}}.$$

Thus, $T$ follows a Weibull distribution with parameters $\alpha$ and $\gamma$.

The frailty model uses a random variable to describe the excess risk of individual categories, which is natural since the risk ratio is the most common way to interpret survival data. However, dependence can also be introduced through the loglinear model by adding a random effect to it. The model then becomes the loglinear random effect model,

$$\ln T = Y = \mu + \sigma V + \nu,$$

where $\nu$ is a common random variable shared by all individuals within the same cluster and randomly distributed across clusters. The $V$'s are mutually independent across all observations. At the individual level, the model can be written as

$$\ln T_{ij} = Y_{ij} = \mu + \sigma V_{ij} + \nu_i.$$

Clearly, $T_{i1}$ and $T_{i2}$ are not independent, so the model can incorporate some dependence among the survival data. Comparing this model with the frailty model, we see that it is an additive model for the survival time on the logarithm scale. Intuitively, choosing $\nu$ to follow a normal or log-gamma distribution might be comparable to the lognormal frailty model or the gamma frailty model, respectively. In the following chapters, we examine the properties under these two distributions and compare the loglinear model with the frailty model for different baselines.

2.3 Random Effect Model via Parameters

Another commonly used parametric approach to survival modeling is to model the covariates via one of the parameters of the parametric distribution. For instance, in the Weibull model, the covariates might be modeled via the parameter $\alpha$, as $\ln \alpha = x'\beta$.

Ie, the distribution of survival time T has p.d.f. f(t) = e x β γt ( e x β )e γtex β. The similar idea of loglinear random effect model may also be applied to this model. The correlation can be introduced by adding a random effect through the parameters. Again, using Weibull distribution as an example, let ln α = ln α + ν, where ν is the common random effect within the same cluster and randomly distributed across clusters. The dependence is introduced by the similar fashion of loglinear random effect model. Because survival distributions usually have more than one parameter and the random effect can be added on logarithm scale of the parameter as well as original scale of the parameter, the correlation structures vary a lot for these different scenarios. If available, the one has the same distribution as the frailty model of loglinear random effect model will be discussed in the following chapters..4 Structure of Part I The remainder of Part I is organized as follows. Chapter 3 introduces some basic properties in terms of correlation in the frailty model and loglinear random effect model. Chapter 4 to Chapter will discuss and compare the different models in terms of correlation for exponential model, Weibull model, piecewise exponential model, extreme value model, Gompertz model, lognormal model, gamma model, and loglogistic model, for the frailty model, loglinear random effect model, and/or random effect model via parameters..5 Notation Unless specified otherwise, T, T, T will be used for original scale survival data. Y, Y, Y will be used for logarithm scale survival data. µ, σ will be used as position 9

and scale parameters in the loglinear model. $\mu_0$, $\sigma_0$ will be used as location and scale parameters, respectively, in the lognormal frailty. $\alpha_0$ and $\gamma_0$ will be used as shape and scale parameters, respectively, in the gamma frailty. Since the support of the survival random variable $T$ is $t > 0$ in most survival distributions, this condition will not be repeated for the p.d.f., survival function, and hazard function in this part unless specified otherwise.

Chapter 3 Some Properties of Frailty Model and Possible Extension of Random Effect Model

In this chapter, some general properties of the correlation coefficient between paired survival random variables are discussed. The general formula for the joint distribution in the frailty model is derived. The correlation coefficients of the loglinear random effect model are also discussed.

3.1 General Properties

Proposition 1: If two random variables $X_1$ and $X_2$ are conditionally independent given a random variable $\omega$, the correlation coefficient between $X_1$ and $X_2$ is

\[ \rho(X_1, X_2) = \frac{E\big(E(X_1\mid\omega)E(X_2\mid\omega)\big) - E\big(E(X_1\mid\omega)\big)E\big(E(X_2\mid\omega)\big)}{\sqrt{\big(E(E(X_1^2\mid\omega)) - E^2(E(X_1\mid\omega))\big)\big(E(E(X_2^2\mid\omega)) - E^2(E(X_2\mid\omega))\big)}} . \]

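If $X_1$ and $X_2$ have identical distributions given $\omega$, denoted by $X$, then the correlation coefficient between $X_1$ and $X_2$ is

\[ \rho(X_1, X_2) = \frac{\operatorname{Var}(E(X\mid\omega))}{\operatorname{Var}(E(X\mid\omega)) + E(\operatorname{Var}(X\mid\omega))} . \]

Proof:

\[
\begin{aligned}
\rho(X_1, X_2) &= \frac{\operatorname{Cov}(X_1, X_2)}{\sqrt{\operatorname{Var}(X_1)\operatorname{Var}(X_2)}} \\
&= \frac{E(X_1X_2) - E(X_1)E(X_2)}{\sqrt{\big(E(X_1^2) - E^2(X_1)\big)\big(E(X_2^2) - E^2(X_2)\big)}} \\
&= \frac{E\big(E(X_1X_2\mid\omega)\big) - E\big(E(X_1\mid\omega)\big)E\big(E(X_2\mid\omega)\big)}{\sqrt{\big(E(E(X_1^2\mid\omega)) - E^2(E(X_1\mid\omega))\big)\big(E(E(X_2^2\mid\omega)) - E^2(E(X_2\mid\omega))\big)}} \\
&= \frac{E\big(E(X_1\mid\omega)E(X_2\mid\omega)\big) - E\big(E(X_1\mid\omega)\big)E\big(E(X_2\mid\omega)\big)}{\sqrt{\big(E(E(X_1^2\mid\omega)) - E^2(E(X_1\mid\omega))\big)\big(E(E(X_2^2\mid\omega)) - E^2(E(X_2\mid\omega))\big)}} ,
\end{aligned}
\]

where the last step uses the conditional independence of $X_1$ and $X_2$ given $\omega$. If $X_1$ and $X_2$ follow the same distribution, denoted by $X$,

\[
\begin{aligned}
\rho(X_1, X_2) &= \frac{E\big(E^2(X\mid\omega)\big) - E^2\big(E(X\mid\omega)\big)}{E\big(E(X^2\mid\omega)\big) - E^2\big(E(X\mid\omega)\big)} \\
&= \frac{E\big(E^2(X\mid\omega)\big) - E^2\big(E(X\mid\omega)\big)}{E\big(E(X^2\mid\omega)\big) - E\big(E^2(X\mid\omega)\big) + E\big(E^2(X\mid\omega)\big) - E^2\big(E(X\mid\omega)\big)} \\
&= \frac{\operatorname{Var}(E(X\mid\omega))}{\operatorname{Var}(E(X\mid\omega)) + E(\operatorname{Var}(X\mid\omega))} .
\end{aligned}
\]

Notice that the correlation coefficient can also be written as

\[ \rho(X_1, X_2) = \frac{1}{1 + \dfrac{E(\operatorname{Var}(X\mid\omega))}{\operatorname{Var}(E(X\mid\omega))}} . \]

The correlation coefficient depends on the ratio of the expectation of the conditional variance of $X \mid \omega$ to the variance of the conditional expectation of $X \mid \omega$. When $E(\operatorname{Var}(X\mid\omega))$ is relatively large, the correlation is small; when $\operatorname{Var}(E(X\mid\omega))$ is relatively large, the correlation is high.

Proposition 2: If two random variables $X_1$ and $X_2$ are conditionally independent given the frailty $\omega$, and two other random variables $Y_1$ and $Y_2$ are also conditionally independent given $\omega$ with $Y_1\mid\omega = \alpha_1 X_1\mid\omega$ and $Y_2\mid\omega = \alpha_2 X_2\mid\omega$ for positive constants $\alpha_1$ and $\alpha_2$, then the correlation coefficient between $X_1$ and $X_2$ is equal to the correlation coefficient between $Y_1$ and $Y_2$.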
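As an informal numerical illustration of Propositions 1 and 2, the following Python sketch simulates pairs that are conditionally independent given a shared $\omega$. The distributional choices are assumptions made only for this illustration and do not come from the text: $\omega$ follows a gamma distribution with shape 3 and scale 1, and $X_i \mid \omega$ is exponential with mean $\omega$, so that $\operatorname{Var}(E(X\mid\omega)) = 3$, $E(\operatorname{Var}(X\mid\omega)) = E(\omega^2) = 12$, and the formula of Proposition 1 gives $\rho = 3/15 = 0.2$. The helper names `pearson` and `simulate_pairs` are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_pairs(n, shape=3.0, scale=1.0, seed=1):
    """Draw n pairs (X1, X2) that are conditionally independent given omega.

    Illustrative assumption: omega ~ Gamma(shape, scale) and
    X_i | omega ~ Exponential with mean omega.
    """
    rng = random.Random(seed)
    x1, x2 = [], []
    for _ in range(n):
        omega = rng.gammavariate(shape, scale)
        x1.append(rng.expovariate(1.0 / omega))  # rate 1/omega, i.e. mean omega
        x2.append(rng.expovariate(1.0 / omega))
    return x1, x2


# Proposition 1: rho = Var(E(X|omega)) / (Var(E(X|omega)) + E(Var(X|omega))).
var_of_mean = 3.0        # Var(omega) for Gamma(shape=3, scale=1)
mean_of_var = 3.0 + 9.0  # E(omega^2) = Var(omega) + E(omega)^2
rho_formula = var_of_mean / (var_of_mean + mean_of_var)

x1, x2 = simulate_pairs(200_000)
rho_sample = pearson(x1, x2)

# Proposition 2: rescaling by positive constants leaves the correlation unchanged.
y1 = [2.5 * x for x in x1]
y2 = [0.4 * x for x in x2]
rho_rescaled = pearson(y1, y2)

print(rho_formula, rho_sample, rho_rescaled)
```

The sample correlation should be close to the value from the formula, and the rescaled pair should reproduce the same sample correlation up to floating-point error.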

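Proof:

\[
\begin{aligned}
\rho(Y_1, Y_2) &= \frac{E\big(E(Y_1\mid\omega)E(Y_2\mid\omega)\big) - E\big(E(Y_1\mid\omega)\big)E\big(E(Y_2\mid\omega)\big)}{\sqrt{\big(E(E(Y_1^2\mid\omega)) - E^2(E(Y_1\mid\omega))\big)\big(E(E(Y_2^2\mid\omega)) - E^2(E(Y_2\mid\omega))\big)}} \\
&= \frac{E\big(E(\alpha_1X_1\mid\omega)E(\alpha_2X_2\mid\omega)\big) - E\big(E(\alpha_1X_1\mid\omega)\big)E\big(E(\alpha_2X_2\mid\omega)\big)}{\sqrt{\big(E(E((\alpha_1X_1)^2\mid\omega)) - E^2(E(\alpha_1X_1\mid\omega))\big)\big(E(E((\alpha_2X_2)^2\mid\omega)) - E^2(E(\alpha_2X_2\mid\omega))\big)}} \\
&= \frac{E\big(E(X_1\mid\omega)E(X_2\mid\omega)\big) - E\big(E(X_1\mid\omega)\big)E\big(E(X_2\mid\omega)\big)}{\sqrt{\big(E(E(X_1^2\mid\omega)) - E^2(E(X_1\mid\omega))\big)\big(E(E(X_2^2\mid\omega)) - E^2(E(X_2\mid\omega))\big)}} \\
&= \rho(X_1, X_2) .
\end{aligned}
\]

This proposition states that a linear rescaling of the variables does not change the correlation coefficient between the correlated bivariate survival random variables. Hence, some scale parameters do not affect the correlation coefficient between correlated bivariate survival data in the frailty model.

3.2 Frailty Model

For the simple case of the frailty model without covariates, we have the following.

Proposition 3: Given the frailty model $h(t\mid\omega) = \omega h_0(t)$, the conditional p.d.f. of $T\mid\omega$ is

\[ f(t\mid\omega) = \omega h_0(t)\,(S_0(t))^{\omega} = \omega\,(S_0(t))^{\omega-1} f_0(t), \]

where $h_0(t)$, $H_0(t)$, $S_0(t)$, and $f_0(t)$ are the hazard, cumulative hazard, survival, and probability density functions of the survival distribution without frailty.

Proof: The result follows from the definitions of the hazard function and the survival function. From the conditional hazard function

\[ h(t\mid\omega) = \omega h_0(t), \]

the cumulative hazard function, survival function, and p.d.f. can be obtained sequentially as below:

\[
\begin{aligned}
H(t\mid\omega) &= \int_0^t h(u\mid\omega)\,du = \int_0^t \omega h_0(u)\,du = \omega H_0(t), \\
S(t\mid\omega) &= e^{-H(t\mid\omega)} = e^{-\omega H_0(t)} = (S_0(t))^{\omega}, \\
f(t\mid\omega) &= h(t\mid\omega)S(t\mid\omega) = \omega h_0(t)(S_0(t))^{\omega} = \omega (S_0(t))^{\omega-1} f_0(t).
\end{aligned}
\]

Furthermore, in the bivariate case, the joint distribution of $t_1$, $t_2$, and $\omega$ is

\[
\begin{aligned}
f(t_1, t_2, \omega) &= f(t_1\mid\omega) f(t_2\mid\omega) f(\omega) \\
&= \omega (S_0(t_1))^{\omega-1} f_0(t_1)\,\omega (S_0(t_2))^{\omega-1} f_0(t_2)\, f(\omega) \\
&= f_0(t_1) f_0(t_2)\,\omega^2 \big(S_0(t_1)S_0(t_2)\big)^{\omega-1} f(\omega),
\end{aligned}
\]

where $f(\omega)$ is the p.d.f. of the frailty.

Proposition 4: If two random variables $T_1$ and $T_2$ are conditionally independent given a gamma frailty $\omega$ with parameters $\alpha_0$ and $\gamma_0$, the joint distribution of $T_1$ and $T_2$ has probability density function

\[ f(t_1, t_2) = \frac{(\alpha_0+1)\alpha_0\gamma_0^2\, h_0(t_1)h_0(t_2)}{\big(1 - \gamma_0\ln S_0(t_1) - \gamma_0 \ln S_0(t_2)\big)^{\alpha_0+2}}, \]

where $h_0(t_i)$ and $S_0(t_i)$ are the hazard function and survival function without frailty.

Proof: It has already been shown that $f(t\mid\omega) = \omega h_0(t)(S_0(t))^{\omega}$. Given that $\omega$ follows a gamma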

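distribution with parameters $\alpha_0$ and $\gamma_0$, its p.d.f. is

\[ f(\omega) = \frac{1}{\Gamma(\alpha_0)\gamma_0^{\alpha_0}}\,\omega^{\alpha_0-1} e^{-\omega/\gamma_0}, \]

so that

\[
\begin{aligned}
f(t_1, t_2, \omega) &= \omega h_0(t_1)(S_0(t_1))^{\omega}\,\omega h_0(t_2)(S_0(t_2))^{\omega}\,\frac{1}{\Gamma(\alpha_0)\gamma_0^{\alpha_0}}\,\omega^{\alpha_0-1}e^{-\omega/\gamma_0} \\
&= \frac{h_0(t_1)h_0(t_2)}{\Gamma(\alpha_0)\gamma_0^{\alpha_0}}\,\omega^{\alpha_0+1} e^{-\omega\left(\frac{1}{\gamma_0} - \ln S_0(t_1) - \ln S_0(t_2)\right)}, \\
f(t_1, t_2) &= \int_0^{\infty} f(t_1, t_2, \omega)\,d\omega \\
&= \frac{h_0(t_1)h_0(t_2)}{\Gamma(\alpha_0)\gamma_0^{\alpha_0}} \int_0^{\infty} \omega^{\alpha_0+1} e^{-\omega\left(\frac{1}{\gamma_0} - \ln S_0(t_1) - \ln S_0(t_2)\right)}\,d\omega \\
&= \frac{h_0(t_1)h_0(t_2)}{\Gamma(\alpha_0)\gamma_0^{\alpha_0}}\,\Gamma(\alpha_0+2)\left(\frac{1}{\gamma_0} - \ln S_0(t_1) - \ln S_0(t_2)\right)^{-(\alpha_0+2)} \\
&= \frac{(\alpha_0+1)\alpha_0\gamma_0^2\,h_0(t_1)h_0(t_2)}{\big(1 - \gamma_0\ln S_0(t_1) - \gamma_0\ln S_0(t_2)\big)^{\alpha_0+2}} .
\end{aligned}
\]

3.3 Loglinear Random Effect Model

The correlation coefficient is given in the following theorem.

Theorem 3.3.1 Assume $Y_1$ and $Y_2$ are conditionally independent given $\nu$ in the loglinear random effect model, i.e.,

\[ Y_1 = \mu + \sigma V_1 + \nu, \qquad Y_2 = \mu + \sigma V_2 + \nu, \]

where $V_1$ and $V_2$ are independent of each other and of $\nu$. Then

\[ \rho(Y_1, Y_2) = \frac{\operatorname{Var}(\nu)}{\sqrt{\big(\operatorname{Var}(\nu) + \sigma^2\operatorname{Var}(V_1)\big)\big(\operatorname{Var}(\nu) + \sigma^2\operatorname{Var}(V_2)\big)}} . \]

If $V_1$ and $V_2$ follow the same distribution, denoted by $V$, then

\[ \rho(Y_1, Y_2) = \frac{\operatorname{Var}(\nu)}{\operatorname{Var}(\nu) + \sigma^2\operatorname{Var}(V)} . \]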
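Before turning to the proof, the identically distributed case of the theorem can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not in the text: $V_1$ and $V_2$ are standard normal, $\nu \sim N(0, 1.5^2)$, $\mu = 0.3$, and $\sigma = 2$, so the theorem gives $\rho = 2.25/(2.25 + 4) = 0.36$. The function names are hypothetical.

```python
import math
import random


def sample_correlation(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_loglinear(n, mu, sigma, sd_nu, seed=7):
    """Simulate Y_j = mu + sigma * V_j + nu for j = 1, 2 with a shared nu.

    Illustrative assumption: V_1, V_2 ~ N(0, 1) and nu ~ N(0, sd_nu^2),
    all mutually independent.
    """
    rng = random.Random(seed)
    y1, y2 = [], []
    for _ in range(n):
        nu = rng.gauss(0.0, sd_nu)  # random effect shared by the pair
        y1.append(mu + sigma * rng.gauss(0.0, 1.0) + nu)
        y2.append(mu + sigma * rng.gauss(0.0, 1.0) + nu)
    return y1, y2


mu, sigma, sd_nu = 0.3, 2.0, 1.5
# Theorem: rho = Var(nu) / (Var(nu) + sigma^2 * Var(V)), with Var(V) = 1 here.
rho_theorem = sd_nu ** 2 / (sd_nu ** 2 + sigma ** 2 * 1.0)

y1, y2 = simulate_loglinear(100_000, mu, sigma, sd_nu)
rho_sample = sample_correlation(y1, y2)
print(rho_theorem, rho_sample)
```

Changing `mu` in this sketch leaves the sample correlation unchanged, while changing `sd_nu` or `sigma` moves it in the direction the formula predicts.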

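Proof: From

\[ Y_1 = \mu + \sigma V_1 + \nu, \qquad Y_2 = \mu + \sigma V_2 + \nu, \]

we have

\[
\begin{aligned}
E(Y_1\mid\nu) &= \mu + \nu + \sigma E(V_1), \\
E(Y_2\mid\nu) &= \mu + \nu + \sigma E(V_2), \\
E(Y_1^2\mid\nu) &= \mu^2 + \nu^2 + \sigma^2E(V_1^2) + 2\mu\nu + 2\mu\sigma E(V_1) + 2\nu\sigma E(V_1), \\
E(Y_2^2\mid\nu) &= \mu^2 + \nu^2 + \sigma^2E(V_2^2) + 2\mu\nu + 2\mu\sigma E(V_2) + 2\nu\sigma E(V_2), \\
E(Y_1Y_2\mid\nu) &= E(Y_1\mid\nu)E(Y_2\mid\nu) \\
&= \mu^2 + \nu^2 + 2\mu\nu + \sigma^2E(V_1)E(V_2) + \sigma(\mu+\nu)\big(E(V_1) + E(V_2)\big).
\end{aligned}
\]

Therefore,

\[
\begin{aligned}
\operatorname{Cov}(Y_1, Y_2) &= E(Y_1Y_2) - E(Y_1)E(Y_2) \\
&= E\big(E(Y_1Y_2\mid\nu)\big) - E\big(E(Y_1\mid\nu)\big)E\big(E(Y_2\mid\nu)\big) \\
&= E\big(\mu^2 + \nu^2 + 2\mu\nu + \sigma^2E(V_1)E(V_2) + \sigma(\mu+\nu)(E(V_1)+E(V_2))\big) \\
&\quad - E\big(\mu + \nu + \sigma E(V_1)\big)E\big(\mu + \nu + \sigma E(V_2)\big) \\
&= \mu^2 + E(\nu^2) + 2\mu E(\nu) + \sigma^2E(V_1)E(V_2) + \sigma(\mu + E(\nu))(E(V_1)+E(V_2)) \\
&\quad - \big(\mu + E(\nu) + \sigma E(V_1)\big)\big(\mu + E(\nu) + \sigma E(V_2)\big) \\
&= E(\nu^2) - E^2(\nu) \\
&= \operatorname{Var}(\nu).
\end{aligned}
\]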

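\[
\begin{aligned}
\operatorname{Var}(Y_1) &= E(Y_1^2) - E^2(Y_1) \\
&= E\big(E(Y_1^2\mid\nu)\big) - E^2\big(E(Y_1\mid\nu)\big) \\
&= E\big(\mu^2 + \nu^2 + \sigma^2E(V_1^2) + 2\mu\nu + 2\mu\sigma E(V_1) + 2\nu\sigma E(V_1)\big) - \big(E(\mu + \nu + \sigma E(V_1))\big)^2 \\
&= \mu^2 + E(\nu^2) + 2\mu E(\nu) + \sigma^2E(V_1^2) + 2\mu\sigma E(V_1) + 2\sigma E(\nu)E(V_1) - \big(\mu + E(\nu) + \sigma E(V_1)\big)^2 \\
&= E(\nu^2) - E^2(\nu) + \sigma^2\big(E(V_1^2) - E^2(V_1)\big) \\
&= \operatorname{Var}(\nu) + \sigma^2\operatorname{Var}(V_1).
\end{aligned}
\]

Similarly,

\[ \operatorname{Var}(Y_2) = \operatorname{Var}(\nu) + \sigma^2\operatorname{Var}(V_2). \]

So,

\[ \rho = \frac{\operatorname{Cov}(Y_1, Y_2)}{\sqrt{\operatorname{Var}(Y_1)\operatorname{Var}(Y_2)}} = \frac{\operatorname{Var}(\nu)}{\sqrt{\big(\operatorname{Var}(\nu) + \sigma^2\operatorname{Var}(V_1)\big)\big(\operatorname{Var}(\nu) + \sigma^2\operatorname{Var}(V_2)\big)}} . \]

If $V_1$ and $V_2$ follow the same distribution, denoted by $V$, then

\[ \rho = \frac{\operatorname{Var}(\nu)}{\sqrt{\big(\operatorname{Var}(\nu) + \sigma^2\operatorname{Var}(V)\big)\big(\operatorname{Var}(\nu) + \sigma^2\operatorname{Var}(V)\big)}} = \frac{\operatorname{Var}(\nu)}{\operatorname{Var}(\nu) + \sigma^2\operatorname{Var}(V)} . \]

Notice that, in the loglinear model, the correlation coefficient depends on the random effect $\nu$ only through its variance and is not affected by $\mu$. The correlation coefficient can also be written as

\[ \rho = \frac{1}{1 + \dfrac{\sigma^2\operatorname{Var}(V)}{\operatorname{Var}(\nu)}} . \]

If the variance of the random effect is very large, the correlation is close to 1. This is reasonable, since a larger variance of the random effect can be interpreted as a larger impact of the random effect. On the other hand, if the variance of the random effect is relatively small, the correlation is small. In the extreme case where the variance of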

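the random effect $\nu$ goes to 0, the random effect degenerates to a fixed point, the loglinear model becomes an independence model, and the correlation coefficient also goes to 0.

The correlation coefficient on the original scale $T$ is given in the following theorem.

Theorem 3.3.2 Assume $Y_1$ and $Y_2$ are conditionally independent given $\nu$, i.e.,

\[ Y_1 = \ln T_1 = \mu + \sigma V_1 + \nu, \qquad Y_2 = \ln T_2 = \mu + \sigma V_2 + \nu . \]

Then

\[ \rho(T_1, T_2) = \frac{E(e^{\sigma V_1})E(e^{\sigma V_2})\operatorname{Var}(e^{\nu})}{\sqrt{\big(E(e^{2\sigma V_1})E(e^{2\nu}) - (E(e^{\sigma V_1})E(e^{\nu}))^2\big)\big(E(e^{2\sigma V_2})E(e^{2\nu}) - (E(e^{\sigma V_2})E(e^{\nu}))^2\big)}} . \]

If $V_1$ and $V_2$ follow the same distribution, denoted by $V$, then

\[ \rho(T_1, T_2) = \frac{\operatorname{Var}(e^{\nu})}{\operatorname{Var}(e^{\nu}) + E(e^{2\nu})\dfrac{\operatorname{Var}(e^{\sigma V})}{E^2(e^{\sigma V})}} . \]

Proof: From

\[ Y_1 = \ln T_1 = \mu + \sigma V_1 + \nu, \qquad Y_2 = \ln T_2 = \mu + \sigma V_2 + \nu, \]

we have

\[
\begin{aligned}
T_1 &= e^{\mu}e^{\sigma V_1}e^{\nu}, \qquad T_2 = e^{\mu}e^{\sigma V_2}e^{\nu}, \\
E(T_1) &= e^{\mu}E(e^{\sigma V_1})E(e^{\nu}), \qquad E(T_2) = e^{\mu}E(e^{\sigma V_2})E(e^{\nu}), \\
E(T_1T_2) &= e^{2\mu}E(e^{\sigma V_1})E(e^{\sigma V_2})E(e^{2\nu}), \\
\operatorname{Var}(T_1) &= e^{2\mu}E(e^{2\sigma V_1})E(e^{2\nu}) - \big(e^{\mu}E(e^{\sigma V_1})E(e^{\nu})\big)^2 \\
&= e^{2\mu}\big(E(e^{2\sigma V_1})E(e^{2\nu}) - (E(e^{\sigma V_1})E(e^{\nu}))^2\big).
\end{aligned}
\]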

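Similarly,

\[ \operatorname{Var}(T_2) = e^{2\mu}\big(E(e^{2\sigma V_2})E(e^{2\nu}) - (E(e^{\sigma V_2})E(e^{\nu}))^2\big). \]

So,

\[
\begin{aligned}
\operatorname{Cov}(T_1, T_2) &= E(T_1T_2) - E(T_1)E(T_2) \\
&= e^{2\mu}E(e^{\sigma V_1})E(e^{\sigma V_2})E(e^{2\nu}) - e^{\mu}E(e^{\sigma V_1})E(e^{\nu})\,e^{\mu}E(e^{\sigma V_2})E(e^{\nu}) \\
&= e^{2\mu}E(e^{\sigma V_1})E(e^{\sigma V_2})\big(E(e^{2\nu}) - E^2(e^{\nu})\big) \\
&= e^{2\mu}E(e^{\sigma V_1})E(e^{\sigma V_2})\operatorname{Var}(e^{\nu}),
\end{aligned}
\]

\[
\begin{aligned}
\rho(T_1, T_2) &= \frac{\operatorname{Cov}(T_1, T_2)}{\sqrt{\operatorname{Var}(T_1)\operatorname{Var}(T_2)}} \\
&= \frac{e^{2\mu}E(e^{\sigma V_1})E(e^{\sigma V_2})\operatorname{Var}(e^{\nu})}{\sqrt{e^{2\mu}\big(E(e^{2\sigma V_1})E(e^{2\nu}) - (E(e^{\sigma V_1})E(e^{\nu}))^2\big)\,e^{2\mu}\big(E(e^{2\sigma V_2})E(e^{2\nu}) - (E(e^{\sigma V_2})E(e^{\nu}))^2\big)}} \\
&= \frac{E(e^{\sigma V_1})E(e^{\sigma V_2})\operatorname{Var}(e^{\nu})}{\sqrt{\big(E(e^{2\sigma V_1})E(e^{2\nu}) - (E(e^{\sigma V_1})E(e^{\nu}))^2\big)\big(E(e^{2\sigma V_2})E(e^{2\nu}) - (E(e^{\sigma V_2})E(e^{\nu}))^2\big)}} .
\end{aligned}
\]

If $V_1$ and $V_2$ follow the same distribution, denoted by $V$, then

\[
\begin{aligned}
\rho(T_1, T_2) &= \frac{E^2(e^{\sigma V})\operatorname{Var}(e^{\nu})}{E(e^{2\sigma V})E(e^{2\nu}) - (E(e^{\sigma V})E(e^{\nu}))^2} \\
&= \frac{\operatorname{Var}(e^{\nu})}{\dfrac{E(e^{2\sigma V})}{E^2(e^{\sigma V})}E(e^{2\nu}) - E^2(e^{\nu})} \\
&= \frac{\operatorname{Var}(e^{\nu})}{\operatorname{Var}(e^{\nu}) + E(e^{2\nu})\left(\dfrac{E(e^{2\sigma V})}{E^2(e^{\sigma V})} - 1\right)} \\
&= \frac{\operatorname{Var}(e^{\nu})}{\operatorname{Var}(e^{\nu}) + E(e^{2\nu})\dfrac{\operatorname{Var}(e^{\sigma V})}{E^2(e^{\sigma V})}} .
\end{aligned}
\]

The correlation coefficient can also be written as

\[ \rho(T_1, T_2) = \frac{1}{1 + \dfrac{E(e^{2\nu})}{\operatorname{Var}(e^{\nu})}\,\dfrac{\operatorname{Var}(e^{\sigma V})}{E^2(e^{\sigma V})}}, \]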

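where, once the parametric model has been chosen, $\operatorname{Var}(e^{\sigma V})/E^2(e^{\sigma V})$ is fixed, so the choice of the random effect $\nu$ determines the correlation coefficient.

As mentioned before, the gamma distribution and the lognormal distribution are the most commonly used frailty distributions. Choosing a log-gamma distribution or a normal distribution for $\nu$ might make the loglinear model comparable to the frailty model. The correlation coefficients of the loglinear model with a log-gamma or normal random effect can be easily calculated from the previous theorems.

If $\nu$ follows a log-gamma distribution, then

\[
\begin{aligned}
\rho(Y_1, Y_2) &= \frac{\operatorname{Var}(\nu)}{\operatorname{Var}(\nu) + \sigma^2\operatorname{Var}(V)} = \frac{\psi_1(\alpha_0)}{\psi_1(\alpha_0) + \sigma^2\operatorname{Var}(V)}, \\
\rho(T_1, T_2) &= \frac{\operatorname{Var}(e^{\nu})}{\operatorname{Var}(e^{\nu}) + E(e^{2\nu})\dfrac{\operatorname{Var}(e^{\sigma V})}{E^2(e^{\sigma V})}}
= \frac{\alpha_0\gamma_0^2}{\alpha_0\gamma_0^2 + (\alpha_0+1)\alpha_0\gamma_0^2\dfrac{\operatorname{Var}(e^{\sigma V})}{E^2(e^{\sigma V})}}
= \frac{1}{1 + (\alpha_0+1)\dfrac{\operatorname{Var}(e^{\sigma V})}{E^2(e^{\sigma V})}} .
\end{aligned}
\]

The variance of a log-gamma distribution with parameters $\alpha_0$ and $\gamma_0$ is the trigamma function, denoted by $\psi_1(\alpha_0)$, which is the second derivative of the logarithm of the gamma function. The trigamma function is monotone decreasing from $\infty$ to 0 as $\alpha_0$ goes from 0 to $\infty$.

If $\nu$ follows a normal distribution with parameters $\mu_0$ and $\sigma_0$, then

\[
\begin{aligned}
\rho(Y_1, Y_2) &= \frac{\operatorname{Var}(\nu)}{\operatorname{Var}(\nu) + \sigma^2\operatorname{Var}(V)} = \frac{\sigma_0^2}{\sigma_0^2 + \sigma^2\operatorname{Var}(V)}, \\
\rho(T_1, T_2) &= \frac{e^{2\mu_0+\sigma_0^2}\big(e^{\sigma_0^2} - 1\big)}{e^{2\mu_0+\sigma_0^2}\big(e^{\sigma_0^2} - 1\big) + e^{2\mu_0+2\sigma_0^2}\dfrac{\operatorname{Var}(e^{\sigma V})}{E^2(e^{\sigma V})}}
= \frac{e^{\sigma_0^2} - 1}{e^{\sigma_0^2} - 1 + e^{\sigma_0^2}\dfrac{\operatorname{Var}(e^{\sigma V})}{E^2(e^{\sigma V})}}
= \frac{e^{\sigma_0^2} - 1}{e^{\sigma_0^2}\dfrac{E(e^{2\sigma V})}{E^2(e^{\sigma V})} - 1} .
\end{aligned}
\]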
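The closed forms of this chapter for the loglinear random effect model can be collected in a few small functions. The sketch below is illustrative only: the function names are hypothetical, the argument `cv2` stands for the ratio $\operatorname{Var}(e^{\sigma V})/E^2(e^{\sigma V})$, and the numerical example assumes a standard normal $V$ (so that `cv2` equals $e^{\sigma^2} - 1$), which is a choice made here and not one taken from the text. The subscript-0 parameters `alpha0`, `gamma0`, `mu0`, `sigma0` denote the parameters of the random effect.

```python
import math


def rho_log_scale(var_nu, sigma, var_v):
    """rho(Y1, Y2) = Var(nu) / (Var(nu) + sigma^2 Var(V))."""
    return var_nu / (var_nu + sigma ** 2 * var_v)


def rho_original_scale(var_exp_nu, second_moment_exp_nu, cv2):
    """rho(T1, T2) = Var(e^nu) / (Var(e^nu) + E(e^{2 nu}) * cv2),
    where cv2 = Var(e^{sigma V}) / E^2(e^{sigma V})."""
    return var_exp_nu / (var_exp_nu + second_moment_exp_nu * cv2)


def rho_original_scale_log_gamma(alpha0, cv2):
    """Log-gamma nu, i.e. e^nu ~ Gamma(alpha0, gamma0); gamma0 cancels."""
    return 1.0 / (1.0 + (alpha0 + 1.0) * cv2)


def rho_original_scale_normal(sigma0, cv2):
    """Normal nu ~ N(mu0, sigma0^2); mu0 cancels."""
    e = math.exp(sigma0 ** 2)
    return (e - 1.0) / (e - 1.0 + e * cv2)


# Assumed example: V standard normal, sigma = 1, so cv2 = e^{sigma^2} - 1.
sigma = 1.0
cv2_normal_v = math.exp(sigma ** 2) - 1.0

# Normal random effect with sigma0 = 1: specialised and general forms.
mu0, sigma0 = 0.7, 1.0
var_exp_nu = math.exp(2 * mu0 + sigma0 ** 2) * (math.exp(sigma0 ** 2) - 1.0)
second_moment = math.exp(2 * mu0 + 2 * sigma0 ** 2)
rho_normal_general = rho_original_scale(var_exp_nu, second_moment, cv2_normal_v)
rho_normal_special = rho_original_scale_normal(sigma0, cv2_normal_v)

# Log-gamma random effect with alpha0 = 3, gamma0 = 2.
alpha0, gamma0 = 3.0, 2.0
rho_lg_general = rho_original_scale(
    alpha0 * gamma0 ** 2, (alpha0 + 1.0) * alpha0 * gamma0 ** 2, cv2_normal_v
)
rho_lg_special = rho_original_scale_log_gamma(alpha0, cv2_normal_v)

print(rho_log_scale(sigma0 ** 2, sigma, 1.0), rho_normal_special, rho_lg_special)
```

In this assumed example the general and specialised expressions agree, which reflects the cancellation of $\mu_0$ and $\gamma_0$ in the formulas above.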

Chapter 4 Bivariate Exponential Models

In this chapter, the correlation coefficients of frailty models with exponential baseline hazard, the exponential loglinear random effect model, and the model through the exponential parameter are discussed and compared.

4.1 Exponential Model with Frailty

The p.d.f., hazard function, and survival function of the exponential distribution are

\[ f(t) = \lambda e^{-\lambda t}, \qquad h(t) = \lambda, \qquad S(t) = e^{-\lambda t} . \]

Therefore, for the exponential model with frailty $\omega$, the conditional proportional hazard function is

\[ h(t\mid\omega) = \omega\lambda . \]

Hence, the corresponding probability density function and survival function given the frailty $\omega$ are

\[ f(t\mid\omega) = \omega\lambda e^{-\omega\lambda t}, \qquad S(t\mid\omega) = e^{-\omega\lambda t} . \]

In the bivariate case, the p.d.f. and survival functions are

\[ f(t_i\mid\omega) = \omega\lambda_i e^{-\omega\lambda_i t_i}, \qquad S(t_i\mid\omega) = e^{-\omega\lambda_i t_i}, \qquad i = 1, 2 . \]

The following property holds for the exponential model with any frailty.

Theorem 4.1.1 The correlation coefficient between the two conditionally exponential random variables does not depend on the parameters $\lambda_1$ and $\lambda_2$ of the exponential model with frailty.

Proof: Assume $T_1$ and $T_2$ follow exponential distributions with parameters $\lambda_1$ and $\lambda_2$, respectively, and are conditionally independent given a frailty $\omega$. Notice that $T_1\mid\omega = \frac{1}{\lambda_1\omega}X_1$ and $T_2\mid\omega = \frac{1}{\lambda_2\omega}X_2$, where $X_1$ and $X_2$ follow the standard exponential distribution with p.d.f. $f(x) = e^{-x}$. That is, $T_1\mid\omega$ and $T_2\mid\omega$ are obtained by rescaling, by $1/\lambda_1$ and $1/\lambda_2$, variables whose conditional distributions given $\omega$ do not involve $\lambda_1$ or $\lambda_2$. By Proposition 2 in Chapter 3, the correlation coefficient between $T_1$ and $T_2$ is free of $\lambda_1$ and $\lambda_2$. That is, the correlation coefficient depends solely on the choice of frailty in the exponential frailty model.

4.1.1 Exponential Model with Gamma Frailty

The correlation coefficient between two exponential random variables that share the same gamma frailty is given in the following theorem.

Theorem 4.1.2 Let $T_1$ and $T_2$ be two conditionally exponential random variables given the frailty $\omega$, with parameters $\lambda_1\omega$ and $\lambda_2\omega$, respectively. If the shared frailty $\omega$ follows a gamma distribution with parameters $\alpha_0$ and $\gamma_0$, then the correlation coefficient between the two exponential random variables is $1/\alpha_0$ for $\alpha_0 > 2$; the correlation coefficient does not exist for $\alpha_0 \le 2$.

Since the exponential distribution is a special case of the Weibull distribution with shape parameter equal to 1, the theorem can be proved using the proof in Appendix A by substituting the value 1 for both $\alpha_1$ and $\alpha_2$. Notice that the correlation coefficient between two exponential random variables with shared gamma frailty exists only when $\alpha_0 > 2$, and that it equals $1/\alpha_0$, which is less than $1/2$ given $\alpha_0 > 2$. The exponential model with gamma frailty therefore cannot describe highly correlated bivariate data.

4.1.2 Exponential Model with Lognormal Frailty

The p.d.f. of the lognormal frailty is

\[ f(\omega) = \frac{1}{\sqrt{2\pi}\,\sigma_0\,\omega}\, e^{-\frac{1}{2\sigma_0^2}(\log(\omega) - \mu_0)^2} . \]

The correlation coefficient between two exponential random variables with shared lognormal frailty has the following property.

Theorem 4.1.3 The correlation coefficient in the exponential model with lognormal frailty does not depend on the parameter $\mu_0$ of the lognormal frailty.

Proof: Assume $T_1$ and $T_2$ follow exponential distributions with parameters $\lambda_1$ and $\lambda_2$, respectively, and are conditionally independent given a lognormal frailty $\omega$. The correlation

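coefficient between $T_1$ and $T_2$ is

\[
\begin{aligned}
\rho(T_1, T_2) &= \frac{\operatorname{Cov}(T_1, T_2)}{\sqrt{\operatorname{Var}(T_1)\operatorname{Var}(T_2)}} \\
&= \frac{E(T_1T_2) - E(T_1)E(T_2)}{\sqrt{\big(E(T_1^2) - (E(T_1))^2\big)\big(E(T_2^2) - (E(T_2))^2\big)}} \\
&= \frac{E\big(E(T_1T_2\mid\omega)\big) - E\big(E(T_1\mid\omega)\big)E\big(E(T_2\mid\omega)\big)}{\sqrt{\big(E(E(T_1^2\mid\omega)) - (E(E(T_1\mid\omega)))^2\big)\big(E(E(T_2^2\mid\omega)) - (E(E(T_2\mid\omega)))^2\big)}} \\
&= \frac{E\big((\lambda_1\lambda_2\omega^2)^{-1}\big) - E\big((\lambda_1\omega)^{-1}\big)E\big((\lambda_2\omega)^{-1}\big)}{\sqrt{\big(2E((\lambda_1\omega)^{-2}) - E^2((\lambda_1\omega)^{-1})\big)\big(2E((\lambda_2\omega)^{-2}) - E^2((\lambda_2\omega)^{-1})\big)}} \\
&= \frac{(\lambda_1\lambda_2)^{-1}E(\omega^{-2}) - (\lambda_1\lambda_2)^{-1}E^2(\omega^{-1})}{(\lambda_1\lambda_2)^{-1}\sqrt{\big(2E(\omega^{-2}) - E^2(\omega^{-1})\big)\big(2E(\omega^{-2}) - E^2(\omega^{-1})\big)}} \\
&= \frac{E(\omega^{-2}) - E^2(\omega^{-1})}{2E(\omega^{-2}) - E^2(\omega^{-1})} .
\end{aligned}
\]

Since $\omega \sim LN(\mu_0, \sigma_0^2)$, we have

\[ \frac{\omega}{e^{\mu_0}} \sim \frac{e^{N(\mu_0, \sigma_0^2)}}{e^{\mu_0}} \sim \frac{e^{\mu_0 + \sigma_0 Z}}{e^{\mu_0}} \sim e^{\sigma_0 Z} \sim LN(0, \sigma_0^2) =: \omega', \]

i.e., $\omega = e^{\mu_0}\omega'$. Then

\[
\begin{aligned}
\rho(T_1, T_2) &= \frac{E(\omega^{-2}) - E^2(\omega^{-1})}{2E(\omega^{-2}) - E^2(\omega^{-1})} \\
&= \frac{E\big((e^{\mu_0}\omega')^{-2}\big) - E^2\big((e^{\mu_0}\omega')^{-1}\big)}{2E\big((e^{\mu_0}\omega')^{-2}\big) - E^2\big((e^{\mu_0}\omega')^{-1}\big)} \\
&= \frac{E\big((\omega')^{-2}\big) - E^2\big((\omega')^{-1}\big)}{2E\big((\omega')^{-2}\big) - E^2\big((\omega')^{-1}\big)},
\end{aligned}
\]

which is independent of the parameter $\mu_0$.

Rather than evaluating this expression analytically, simulation is used to estimate the correlation. Based on the general property of the exponential frailty model and the property above, the correlation coefficient does not depend on the parameters $\lambda_1$, $\lambda_2$, and $\mu_0$. Without loss of generality, when simulating the correlated exponential data, the parameters $\lambda_1$, $\lambda_2$, and $\mu_0$ were set to 1, 1, and 0, respectively. In the simulation, each dataset contains pairs of correlated exponential data. For each choice of $\sigma_0$, datasets were generated, the correlation coefficient was calculated for each dataset, and the average correlation coefficient over the datasets with the same $\sigma_0$ is used as the estimate of the correlation coefficient for that $\sigma_0$.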
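The simulation procedure described above can be sketched in Python as follows. This is a minimal illustration, not the code used for the reported results: the number of datasets (50), the number of pairs per dataset (500), the seed, and the function names are assumptions made here, while $\lambda_1 = \lambda_2 = 1$ and $\mu_0 = 0$ are used as stated in the text. Writing $\sigma_0$ for the scale parameter of the lognormal frailty, each pair shares one frailty $\omega = e^{\mu_0 + \sigma_0 Z}$ and is drawn from exponential distributions with rates $\lambda_1\omega$ and $\lambda_2\omega$.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_dataset(n_pairs, sigma0, rng, lam1=1.0, lam2=1.0, mu0=0.0):
    """One dataset of exponential pairs sharing a lognormal frailty."""
    t1, t2 = [], []
    for _ in range(n_pairs):
        omega = math.exp(mu0 + sigma0 * rng.gauss(0.0, 1.0))  # shared frailty
        t1.append(rng.expovariate(lam1 * omega))  # T1 | omega ~ Exp(lam1 * omega)
        t2.append(rng.expovariate(lam2 * omega))  # T2 | omega ~ Exp(lam2 * omega)
    return t1, t2


def estimate_correlation(sigma0, n_datasets=50, n_pairs=500, seed=2024):
    """Average the per-dataset sample correlations for one value of sigma0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_datasets):
        t1, t2 = simulate_dataset(n_pairs, sigma0, rng)
        total += pearson(t1, t2)
    return total / n_datasets


# Assumed grid of sigma0 values, for illustration only.
estimates = {s: estimate_correlation(s) for s in (0.05, 0.5, 1.0)}
for s, r in estimates.items():
    print(f"sigma0 = {s}: estimated correlation = {r:.3f}")
```

Averaging over datasets, as in the text, reduces the sampling noise of a single sample correlation; the dataset sizes above can be increased to tighten the estimates.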

Figure 4-1: Correlation Coefficients of Exponential Model with Lognormal Frailty

The simulated correlation coefficients of the exponential model with lognormal frailties are shown in Figure 4-1. The figure suggests that the lognormal frailty model can handle any correlated bivariate exponential data, with a larger $\sigma_0$ in the frailty corresponding to a higher correlation. Since the loglinear model is distributionally equivalent to the frailty model, the correlation coefficient of the survival data on the logarithm scale is discussed in the next section.

4.2 Loglinear Random Effect Model

The loglinear regression model corresponding to the exponential survival model can be written as

\[ Y = \ln(T) = -\ln\lambda + V \]

where $V$ follows the standard extreme value distribution with p.d.f. $f(v) = e^{v - e^{v}}$. By adding the random effect, the loglinear random effect model is

\[ Y = \ln(T) = -\ln\lambda + V + \nu, \]

where $\nu$ is the common random effect within a cluster and is randomly distributed across clusters. Let $\nu = -\ln\omega$. A change of variable immediately shows that $T\mid\omega$ has an exponential distribution with parameter $\lambda\omega$, which is exactly the same distribution as in the frailty model. That is, the exponential model with frailty can also be written as

\[ Y = \ln(T) = -\ln\lambda + V - \ln\omega . \]

If $\omega$ is chosen to follow a gamma distribution, the loglinear random effect model is equivalent to the exponential model with gamma frailty. If $\omega$ follows a lognormal distribution, the loglinear random effect model is equivalent to the exponential model with lognormal frailty.

The correlation coefficient between $Y_1$ and $Y_2$, that is, between $\ln(T_1)$ and $\ln(T_2)$, can be calculated from the formula given in Section 3.3 of Chapter 3. The variance of the standard extreme value distribution is $\pi^2/6$. For the gamma frailty model, the variance of the log-gamma distribution is the trigamma function $\psi_1(\alpha_0)$. Hence, the correlation coefficient is

\[ \frac{\psi_1(\alpha_0)}{\psi_1(\alpha_0) + \pi^2/6} . \]

It depends only on $\alpha_0$, the parameter of the gamma frailty. For the lognormal frailty model, the variance of the normal distribution is $\sigma_0^2$. Hence, the correlation for the lognormal frailty model is

\[ \frac{\sigma_0^2}{\sigma_0^2 + \pi^2/6} . \]

4.3 Discussion

In exponential regression with p.d.f. $f(t) = \lambda e^{-\lambda t}$, replace $\ln\lambda$ by $\ln\lambda + \ln\omega$, where $\omega$ is a random variable. A change of variable shows that $T\mid\omega$ has an exponential distribution with parameter $\lambda\omega$, which is exactly the same as in the frailty model and the loglinear model. When $\omega$ follows a gamma distribution, the parametric model is equivalent to the gamma frailty model. When $\omega$ follows a lognormal distribution, the parametric model is equivalent to the lognormal frailty model. By carefully choosing the parameters, the form of the model, and the distribution of the random variable, the three models (the frailty model, the loglinear survival model, and the model through the parameter) are mathematically the same.

Comparing the gamma frailty model and the lognormal frailty model, we find that the gamma frailty cannot handle highly correlated bivariate exponential data on the original scale, while the lognormal frailty is able to describe highly correlated bivariate exponential data on the original scale. On the other hand, both models can describe highly correlated bivariate exponential data on the logarithm scale.
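The two log-scale correlation formulas of Section 4.2 are easy to evaluate numerically. The sketch below is an illustration under stated assumptions: `alpha0` and `sigma0` denote the gamma frailty shape and the lognormal frailty scale, the function names are hypothetical, and the trigamma function is implemented here with a standard recurrence plus asymptotic expansion because Python's `math` module does not provide it.

```python
import math

# Variance of the standard extreme value distribution.
EXTREME_VALUE_VAR = math.pi ** 2 / 6.0


def trigamma(x):
    """Trigamma function psi_1(x) for x > 0.

    Uses the recurrence psi_1(x) = psi_1(x + 1) + 1 / x^2 to shift the
    argument to at least 6, then the asymptotic expansion
    1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9).
    """
    if x <= 0:
        raise ValueError("x must be positive")
    total = 0.0
    while x < 6.0:
        total += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = 1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 * (1.0 / 42.0 - inv2 / 30.0))
    return total + inv + 0.5 * inv2 + inv * inv2 * series


def rho_log_gamma_frailty(alpha0):
    """Correlation of ln(T1), ln(T2): exponential model with gamma frailty."""
    t = trigamma(alpha0)
    return t / (t + EXTREME_VALUE_VAR)


def rho_log_lognormal_frailty(sigma0):
    """Correlation of ln(T1), ln(T2): exponential model with lognormal frailty."""
    return sigma0 ** 2 / (sigma0 ** 2 + EXTREME_VALUE_VAR)


for a in (0.1, 1.0, 10.0):
    print(f"alpha0 = {a}: rho = {rho_log_gamma_frailty(a):.4f}")
for s in (0.5, 1.0, 3.0):
    print(f"sigma0 = {s}: rho = {rho_log_lognormal_frailty(s):.4f}")
```

Because $\psi_1(1) = \pi^2/6$, the gamma frailty case gives a log-scale correlation of exactly $1/2$ at $\alpha_0 = 1$, and smaller values of $\alpha_0$ give higher correlation. If SciPy is available, `scipy.special.polygamma(1, x)` could replace the hand-written `trigamma`.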

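Chapter 5 Bivariate Weibull Models

In this chapter, the correlation coefficients of the Weibull model with gamma frailties and lognormal frailties are calculated and compared.

5.1 Weibull Model with Frailty

The p.d.f., hazard function, and survival function of the Weibull distribution are

\[ f(t) = \alpha\gamma t^{\alpha-1}e^{-\gamma t^{\alpha}}, \qquad h(t) = \alpha\gamma t^{\alpha-1}, \qquad S(t) = e^{-\gamma t^{\alpha}} . \]

For the Weibull model with frailty $\omega$, the conditional proportional hazard function is

\[ h(t\mid\omega) = \omega\alpha\gamma t^{\alpha-1} . \]

Therefore, the corresponding conditional survival function and p.d.f. are

\[ S(t\mid\omega) = e^{-\omega\gamma t^{\alpha}}, \qquad f(t\mid\omega) = \alpha\omega\gamma t^{\alpha-1}e^{-\omega\gamma t^{\alpha}} . \]

As in the exponential model with frailty, the correlation coefficient is independent of the parameters $\gamma_1$ and $\gamma_2$ for any frailty.

Theorem 5.1.1 The correlation coefficient in the Weibull model with any frailty is independent of the parameters $\gamma_1$ and $\gamma_2$ of the Weibull random variables.

Proof: We show that the correlation coefficient between two random variables with Weibull parameters $\gamma_1$ and $\gamma_2$, respectively, is equal to the correlation coefficient between two random variables with Weibull parameter 1. Hence, the correlation coefficient is independent of the parameters $\gamma_1$ and $\gamma_2$. We know that $Y_1\mid\omega \sim W(\alpha_1, \gamma_1\omega)$ and $Y_2\mid\omega \sim W(\alpha_2, \gamma_2\omega)$. Let $X_1\mid\omega = \gamma_1^{1/\alpha_1}\,Y_1\mid\omega$ and $X_2\mid\omega = \gamma_2^{1/\alpha_2}\,Y_2\mid\omega$. From Proposition 2 in Chapter 3, we know that $\rho(Y_1, Y_2) = \rho(X_1, X_2)$. By a change of variable, $X_1\mid\omega \sim W(\alpha_1, \omega)$ and $X_2\mid\omega \sim W(\alpha_2, \omega)$. Since the distributions of $X_1$ and $X_2$ do not involve $\gamma_1$ and $\gamma_2$, $\rho(X_1, X_2)$ does not involve $\gamma_1$ and $\gamma_2$. Therefore, $\rho(Y_1, Y_2)$ is not a function of $\gamma_1$ and $\gamma_2$.

5.1.1 Weibull Model with Gamma Frailties

The following theorem describes the correlation coefficient between the two random variables from the bivariate Weibull model with gamma frailties.

Theorem 5.1.2 Let $X_1$ and $X_2$ be two conditionally Weibull random variables given the frailty $\omega$, with parameters $(\alpha_1, \gamma_1\omega)$ and $(\alpha_2, \gamma_2\omega)$, respectively. If the shared frailty $\omega$ follows a gamma distribution with parameters $\alpha_0$ and $\gamma_0$, the correlation coefficient of the Weibull model with gamma frailties is

\[ \frac{\Gamma(\frac{1}{\alpha_1})\Gamma(\frac{1}{\alpha_2})\Big(\Gamma(\alpha_0 - \frac{1}{\alpha_1} - \frac{1}{\alpha_2})\Gamma(\alpha_0) - \Gamma(\alpha_0 - \frac{1}{\alpha_1})\Gamma(\alpha_0 - \frac{1}{\alpha_2})\Big)}{\sqrt{\Big(2\alpha_1\Gamma(\alpha_0)\Gamma(\alpha_0 - \frac{2}{\alpha_1})\Gamma(\frac{2}{\alpha_1}) - \Gamma^2(\alpha_0 - \frac{1}{\alpha_1})\Gamma^2(\frac{1}{\alpha_1})\Big)\Big(2\alpha_2\Gamma(\alpha_0)\Gamma(\alpha_0 - \frac{2}{\alpha_2})\Gamma(\frac{2}{\alpha_2}) - \Gamma^2(\alpha_0 - \frac{1}{\alpha_2})\Gamma^2(\frac{1}{\alpha_2})\Big)}} \]

for $\alpha_0 > \max\{\frac{2}{\alpha_1}, \frac{2}{\alpha_2}\}$; otherwise, the correlation coefficient does not exist.

The proof is lengthy, consists mainly of integrals, and can be found in Appendix A.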
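The formula in the theorem above can be evaluated directly with the gamma function from Python's standard library. The sketch below is illustrative: the function name is hypothetical, `alpha1` and `alpha2` denote the two Weibull shape parameters, and `alpha0` denotes the shape parameter of the gamma frailty. Returning `None` when the existence condition fails is a design choice made here.

```python
import math


def weibull_gamma_correlation(alpha1, alpha2, alpha0):
    """Correlation of the bivariate Weibull model with shared gamma frailty.

    alpha1, alpha2: Weibull shape parameters; alpha0: gamma frailty shape.
    Returns None when alpha0 <= max(2/alpha1, 2/alpha2), i.e. when the
    correlation coefficient does not exist.
    """
    if alpha0 <= max(2.0 / alpha1, 2.0 / alpha2):
        return None
    g = math.gamma
    s1, s2 = 1.0 / alpha1, 1.0 / alpha2
    numerator = g(s1) * g(s2) * (
        g(alpha0 - s1 - s2) * g(alpha0) - g(alpha0 - s1) * g(alpha0 - s2)
    )

    def variance_term(alpha):
        # One factor under the square root in the denominator.
        s = 1.0 / alpha
        return (
            2.0 * alpha * g(alpha0) * g(alpha0 - 2.0 * s) * g(2.0 * s)
            - g(alpha0 - s) ** 2 * g(s) ** 2
        )

    return numerator / math.sqrt(variance_term(alpha1) * variance_term(alpha2))


# Exponential special case (both shapes equal to 1): the value is 1 / alpha0.
print(weibull_gamma_correlation(1.0, 1.0, 5.0))
# Outside the region of existence.
print(weibull_gamma_correlation(1.0, 1.0, 2.0))
# A case with unequal Weibull shape parameters.
print(weibull_gamma_correlation(2.0, 3.0, 4.0))
```

With both shape parameters equal to 1 the function reduces to $1/\alpha_0$, which matches the exponential model with gamma frailty. Because `math.gamma` overflows for large arguments, a version based on `math.lgamma` would be more robust for large `alpha0`.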

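Notice that the correlation coefficient is independent of $\gamma_0$, $\gamma_1$, and $\gamma_2$. Also, the correlation coefficient exists if and only if $\alpha_0 > \max\{\frac{2}{\alpha_1}, \frac{2}{\alpha_2}\}$. The formula is very complicated. We consider some special cases of the Weibull distribution to simplify the formula and to understand the correlation coefficient.

Corollary 5.1.3 If two identically distributed Weibull random variables $X_1$ and $X_2$ share a gamma frailty, i.e., they have the same parameters $\alpha$ and $\gamma$, then the correlation coefficient between them is

\[
\frac{\Gamma(\alpha_0)\Gamma(\alpha_0 - \frac{2}{\alpha})\Gamma^2(\frac{1}{\alpha}+1) - \Gamma^2(\alpha_0 - \frac{1}{\alpha})\Gamma^2(\frac{1}{\alpha}+1)}{\Gamma(\alpha_0)\Gamma(\alpha_0 - \frac{2}{\alpha})\Gamma(\frac{2}{\alpha}+1) - \Gamma^2(\alpha_0 - \frac{1}{\alpha})\Gamma^2(\frac{1}{\alpha}+1)}
= \frac{\Gamma(\alpha_0)\Gamma(\alpha_0 - \frac{2}{\alpha}) - \Gamma^2(\alpha_0 - \frac{1}{\alpha})}{\Gamma(\alpha_0)\Gamma(\alpha_0 - \frac{2}{\alpha})\dfrac{\Gamma(\frac{2}{\alpha}+1)}{\Gamma^2(\frac{1}{\alpha}+1)} - \Gamma^2(\alpha_0 - \frac{1}{\alpha})} .
\]

Proof: For the special case $\alpha_1 = \alpha_2 := \alpha$, we have

\[
\begin{aligned}
\rho(X_1, X_2) &= \frac{\Gamma(\alpha_0)\Gamma(\alpha_0 - \frac{2}{\alpha})\Gamma(\frac{1}{\alpha}+1)\Gamma(\frac{1}{\alpha}+1) - \Gamma(\alpha_0 - \frac{1}{\alpha})\Gamma(\frac{1}{\alpha}+1)\Gamma(\alpha_0 - \frac{1}{\alpha})\Gamma(\frac{1}{\alpha}+1)}{\sqrt{\Big(\Gamma(\alpha_0)\Gamma(\alpha_0 - \frac{2}{\alpha})\Gamma(\frac{2}{\alpha}+1) - \Gamma^2(\alpha_0 - \frac{1}{\alpha})\Gamma^2(\frac{1}{\alpha}+1)\Big)\Big(\Gamma(\alpha_0)\Gamma(\alpha_0 - \frac{2}{\alpha})\Gamma(\frac{2}{\alpha}+1) - \Gamma^2(\alpha_0 - \frac{1}{\alpha})\Gamma^2(\frac{1}{\alpha}+1)\Big)}} \\
&= \frac{\Gamma(\alpha_0)\Gamma(\alpha_0 - \frac{2}{\alpha})\Gamma^2(\frac{1}{\alpha}+1) - \Gamma^2(\alpha_0 - \frac{1}{\alpha})\Gamma^2(\frac{1}{\alpha}+1)}{\Gamma(\alpha_0)\Gamma(\alpha_0 - \frac{2}{\alpha})\Gamma(\frac{2}{\alpha}+1) - \Gamma^2(\alpha_0 - \frac{1}{\alpha})\Gamma^2(\frac{1}{\alpha}+1)} \\
&= \frac{\Gamma(\alpha_0)\Gamma(\alpha_0 - \frac{2}{\alpha}) - \Gamma^2(\alpha_0 - \frac{1}{\alpha})}{\Gamma(\alpha_0)\Gamma(\alpha_0 - \frac{2}{\alpha})\dfrac{\Gamma(\frac{2}{\alpha}+1)}{\Gamma^2(\frac{1}{\alpha}+1)} - \Gamma^2(\alpha_0 - \frac{1}{\alpha})} .
\end{aligned}
\]

Comparing the numerator and the denominator of the last expression, the only difference is that the first term of the denominator carries the multiplier $\Gamma(\frac{2}{\alpha}+1)/\Gamma^2(\frac{1}{\alpha}+1)$. When $\alpha$ is very large, both $\frac{1}{\alpha}+1$ and $\frac{2}{\alpha}+1$ go to 1, so that $\Gamma(\frac{2}{\alpha}+1)$ and $\Gamma^2(\frac{1}{\alpha}+1)$ go to 1, and the correlation coefficient can be very close to 1. On the other hand, when $\alpha$ is very small, $\Gamma(\frac{2}{\alpha}+1)/\Gamma^2(\frac{1}{\alpha}+1)$ is much larger than 1, and the correlation coefficient goes to 0. Therefore, the Weibull model with gamma frailties cannot handle highly correlated survival data when $\alpha$ is very small.

Figure 5-1: Correlation Coefficients of Weibull Model with Gamma Frailty

The correlation coefficients for different combinations of $\alpha$ and $\alpha_0$ are calculated and shown in Figure 5-1. The vertical axis is the correlation coefficient, the horizontal axis is the shape parameter $\alpha$ of the Weibull distribution, and different colors represent different choices of the parameter $\alpha_0$ of the gamma frailty. From the plot, we find that the range of the correlation coefficient is limited because of the constraint $\alpha_0 > \frac{2}{\alpha}$.

5.1.2 Weibull Model with Lognormal Frailties

The following property holds for the lognormal frailty model with Weibull baseline.

Theorem 5.1.4 The correlation coefficient in the Weibull model with lognormal frailty is not a function of the parameter $\mu_0$ of the lognormal frailty.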

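Proof: We show that the correlation coefficient between two random variables with lognormal frailty with parameter $\mu_0$ is equal to the correlation coefficient between two random variables with lognormal frailty with parameter $\mu_0 = 0$. If $\omega \sim LN(\mu_0, \sigma_0^2)$, then by a change of variable,

\[ \frac{\omega}{e^{\mu_0}} \sim \frac{e^{N(\mu_0, \sigma_0^2)}}{e^{\mu_0}} \sim \frac{e^{\mu_0 + \sigma_0 Z}}{e^{\mu_0}} \sim e^{\sigma_0 Z} \sim LN(0, \sigma_0^2) =: \omega', \]

i.e., $\omega = e^{\mu_0}\omega'$, and the p.d.f. of $\omega'$ does not involve the parameter $\mu_0$. So,

\[
\begin{aligned}
\rho(Y_1, Y_2)
&= \frac{E\Big(\Gamma(\tfrac{1}{\alpha_1}+1)\Gamma(\tfrac{1}{\alpha_2}+1)\,\omega^{-\frac{1}{\alpha_1}-\frac{1}{\alpha_2}}\Big) - E\Big(\Gamma(\tfrac{1}{\alpha_1}+1)\,\omega^{-\frac{1}{\alpha_1}}\Big)E\Big(\Gamma(\tfrac{1}{\alpha_2}+1)\,\omega^{-\frac{1}{\alpha_2}}\Big)}{\sqrt{\Big(E\big(\Gamma(\tfrac{2}{\alpha_1}+1)\,\omega^{-\frac{2}{\alpha_1}}\big) - E^2\big(\Gamma(\tfrac{1}{\alpha_1}+1)\,\omega^{-\frac{1}{\alpha_1}}\big)\Big)\Big(E\big(\Gamma(\tfrac{2}{\alpha_2}+1)\,\omega^{-\frac{2}{\alpha_2}}\big) - E^2\big(\Gamma(\tfrac{1}{\alpha_2}+1)\,\omega^{-\frac{1}{\alpha_2}}\big)\Big)}} \\
&= \frac{E\Big(\Gamma(\tfrac{1}{\alpha_1}+1)\Gamma(\tfrac{1}{\alpha_2}+1)\,(e^{\mu_0}\omega')^{-\frac{1}{\alpha_1}-\frac{1}{\alpha_2}}\Big) - E\Big(\Gamma(\tfrac{1}{\alpha_1}+1)\,(e^{\mu_0}\omega')^{-\frac{1}{\alpha_1}}\Big)E\Big(\Gamma(\tfrac{1}{\alpha_2}+1)\,(e^{\mu_0}\omega')^{-\frac{1}{\alpha_2}}\Big)}{\sqrt{\Big(E\big(\Gamma(\tfrac{2}{\alpha_1}+1)\,(e^{\mu_0}\omega')^{-\frac{2}{\alpha_1}}\big) - E^2\big(\Gamma(\tfrac{1}{\alpha_1}+1)\,(e^{\mu_0}\omega')^{-\frac{1}{\alpha_1}}\big)\Big)\Big(E\big(\Gamma(\tfrac{2}{\alpha_2}+1)\,(e^{\mu_0}\omega')^{-\frac{2}{\alpha_2}}\big) - E^2\big(\Gamma(\tfrac{1}{\alpha_2}+1)\,(e^{\mu_0}\omega')^{-\frac{1}{\alpha_2}}\big)\Big)}} \\
&= \frac{E\Big(\Gamma(\tfrac{1}{\alpha_1}+1)\Gamma(\tfrac{1}{\alpha_2}+1)\,(\omega')^{-\frac{1}{\alpha_1}-\frac{1}{\alpha_2}}\Big) - E\Big(\Gamma(\tfrac{1}{\alpha_1}+1)\,(\omega')^{-\frac{1}{\alpha_1}}\Big)E\Big(\Gamma(\tfrac{1}{\alpha_2}+1)\,(\omega')^{-\frac{1}{\alpha_2}}\Big)}{\sqrt{\Big(E\big(\Gamma(\tfrac{2}{\alpha_1}+1)\,(\omega')^{-\frac{2}{\alpha_1}}\big) - E^2\big(\Gamma(\tfrac{1}{\alpha_1}+1)\,(\omega')^{-\frac{1}{\alpha_1}}\big)\Big)\Big(E\big(\Gamma(\tfrac{2}{\alpha_2}+1)\,(\omega')^{-\frac{2}{\alpha_2}}\big) - E^2\big(\Gamma(\tfrac{1}{\alpha_2}+1)\,(\omega')^{-\frac{1}{\alpha_2}}\big)\Big)}} ,
\end{aligned}
\]

where the factor $e^{-\mu_0(\frac{1}{\alpha_1}+\frac{1}{\alpha_2})}$ cancels between the numerator and the denominator. There is therefore no $\mu_0$ in the expression of the correlation coefficient.

Rather than evaluating these expectations theoretically, simulation is used to approximate the correlation coefficients. Because $\gamma_1$ and $\gamma_2$ in the Weibull distribution and the parameter $\mu_0$ of the lognormal frailty are irrelevant, without loss of generality, $\gamma_1$ and $\gamma_2$ are set to 1 and $\mu_0$ is set to 0 in the simulation. The simplified distributions become

\[ f(y_i\mid\omega) = \omega\alpha_i\, y_i^{\alpha_i-1} e^{-\omega y_i^{\alpha_i}}, \qquad i = 1, 2, \]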

with the frailty

\[ f(\omega) = \frac{1}{\sqrt{2\pi}\,\sigma_0\,\omega}\, e^{-\frac{1}{2\sigma_0^2}(\log(\omega))^2} . \]

The steps for simulating bivariate Weibull data with lognormal frailty are:

1. Generate a pair of samples $X_1$ and $X_2$ from independent uniform(0, 1) random variables.
2. Generate a sample $Z$ from the standard normal distribution.
3. Define $Y_1 = \big(-\ln(1 - X_1)\big)^{1/\alpha_1} / e^{\sigma_0 Z/\alpha_1}$ and $Y_2 = \big(-\ln(1 - X_2)\big)^{1/\alpha_2} / e^{\sigma_0 Z/\alpha_2}$.

Then the random variables $Y_1$ and $Y_2$ are the required sample. For each combination of $\alpha_1$, $\alpha_2$, $\sigma_0$, sample datasets, each containing the same number of pairs of data, were generated. The correlation coefficient was calculated for each dataset, and the correlation coefficient is estimated by the mean of the correlation coefficients over the datasets with the same combination of $\alpha_1$, $\alpha_2$, $\sigma_0$. For illustration purposes, the results for the cases $\alpha_1 = \alpha_2$ are shown in Figure 5-2. The vertical axis is the correlation coefficient, the horizontal axis is the shape parameter $\alpha$ of the Weibull distribution, and different colors represent different choices of the parameter $\sigma_0$ of the lognormal frailty.

5.2 Loglinear Random Effect Model

For the Weibull regression model with p.d.f. $f(t) = \alpha\gamma t^{\alpha-1}e^{-\gamma t^{\alpha}}$, the corresponding loglinear regression model can be written as

\[ Y = \ln(T) = -\frac{1}{\alpha}\ln\gamma + \frac{1}{\alpha}V, \]

where $V$ has the standard extreme value distribution with p.d.f. $f(v) = e^{v - e^{v}}$.
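Returning to the three-step simulation procedure of Section 5.1.2 for bivariate Weibull data with lognormal frailty, the steps can be sketched in Python as follows. This is a minimal illustration under assumptions: the number of datasets (50), the number of pairs per dataset (500), the seed, and the function names are chosen here and are not the values used for the reported figure. The Weibull parameters $\gamma_1 = \gamma_2 = 1$ and the frailty parameter $\mu_0 = 0$ are used as in the text, and `sigma0` denotes the scale parameter of the lognormal frailty. Step 3 is written as the inverse of the conditional survival function $e^{-\omega y^{\alpha_i}}$ with $\omega = e^{\sigma_0 Z}$.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_pair(alpha1, alpha2, sigma0, rng):
    """One pair (Y1, Y2) of Weibull variables sharing a lognormal frailty."""
    x1, x2 = rng.random(), rng.random()  # step 1: independent uniform(0, 1)
    z = rng.gauss(0.0, 1.0)              # step 2: standard normal
    # Step 3: invert S(y | omega) = exp(-omega * y^alpha), omega = exp(sigma0 * z).
    y1 = (-math.log(1.0 - x1)) ** (1.0 / alpha1) / math.exp(sigma0 * z / alpha1)
    y2 = (-math.log(1.0 - x2)) ** (1.0 / alpha2) / math.exp(sigma0 * z / alpha2)
    return y1, y2


def estimate_correlation(alpha1, alpha2, sigma0, n_datasets=50, n_pairs=500, seed=11):
    """Mean of the per-dataset sample correlations for one parameter combination."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_datasets):
        pairs = [simulate_pair(alpha1, alpha2, sigma0, rng) for _ in range(n_pairs)]
        total += pearson([p[0] for p in pairs], [p[1] for p in pairs])
    return total / n_datasets


# Assumed parameter combinations, for illustration only.
rho_weak = estimate_correlation(2.0, 2.0, 0.05)
rho_strong = estimate_correlation(2.0, 2.0, 1.5)
print(rho_weak, rho_strong)
```

Using `1.0 - x` inside the logarithm keeps its argument strictly positive, since `random()` returns values in the half-open interval [0, 1).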