A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD
Mai Zhou and Hui Fang
University of Kentucky

Empirical likelihood with right-censored data has been studied by Thomas and Grunkemeier (1975), Li (1995), Murphy (1995), Pan (1997) and many others. In this paper we investigate the behavior of two competing hazard formulations of the empirical likelihood: the Poisson and the Binomial empirical likelihood. Simulation results show that the Binomial empirical likelihood has a better chi-square approximation under the null hypothesis.

AMS 1991 Subject Classification: Primary 62G10; secondary 62G05.

Key Words and Phrases: Censored data, Weighted hazards, Wilks' theorem, Chi-square approximation.

1. Introduction

The empirical likelihood (EL) method was first proposed by Thomas and Grunkemeier (1975). Owen (1988, 1990, 2001) developed it into a general methodology, using parameters that are essentially means of the underlying distribution. Pan (1997), Fang (2000) and Pan and Zhou (2002) advocated using parameters that are weighted hazards. Among other things, they showed that with a hazard formulation of the empirical likelihood and a weighted-hazard parameter, the empirical likelihood ratio handles right-censored data easily.

However, there are two competing versions of the hazard formulation of the empirical likelihood: the binomial and the Poisson version. Both are maximized by the well-known Nelson-Aalen estimator, but their maximum values, and their maxima under constraints, differ. In this paper we study how they behave with respect to the chi-square approximation of the empirical likelihood ratio under the null hypothesis. We show that for discrete distributions the binomial EL ratio has a much better chi-square approximation under the null hypothesis. The difference becomes more pronounced as the sample size grows, because the approximation for the Poisson EL does not improve with increasing sample size, whereas the approximation for the binomial EL does.
For continuous distributions both ELs have good chi-square approximations under the null hypothesis, with the Poisson EL sometimes slightly ahead of the binomial EL in small samples. As the sample size increases, both approximations are good.

2. Poisson and Binomial Empirical Likelihood

Suppose $X_1, \ldots, X_n$ are $n$ independent, identically distributed observations. Assume the distribution function of the $X_i$ is $F(t)$ and their cumulative hazard function is $\Lambda(t)$. With right censoring, we only observe

$$T_i = \min(X_i, C_i), \qquad \delta_i = I[X_i \le C_i], \qquad (1)$$

where the $C_i$ are censoring times, assumed to be independent, identically distributed, and independent of the $X_i$'s.

As shown in Pan and Zhou (2002) and Fang (2000), computations are much easier when the empirical likelihood is formulated in terms of the (cumulative) hazard function. The hazard formulation of the censored-data log empirical likelihood, denoted by $\log EL(\Lambda)$, is

$$\log EL(\Lambda) = \sum_i \{ d_i \log v_i + (R_i - d_i) \log(1 - v_i) \}, \qquad (2)$$

where the $t_i$ are the ordered, distinct values of the $T_i$, $d_i = \sum_{j=1}^n I[T_j = t_i]\,\delta_j$, and $R_i = \sum_{j=1}^n I[T_j \ge t_i]$. See, for example, Thomas and Grunkemeier (1975) and Li (1995) for similar notation and definitions. Here $0 < v_i \le 1$ are the discrete hazards at the $t_i$. Following Murphy (1995), we call this version of the empirical likelihood the Binomial likelihood. The maximum of (2) over the $v_i$ is known to be attained at the jumps of the Nelson-Aalen estimator, $v_i = d_i / R_i$.

Fang (2000) considered hypothesis tests and confidence intervals for a parameter $\theta$ defined through the cumulative hazard function,

$$\theta = \int g(t) \log(1 - d\Lambda(t)),$$

where $g(t)$ is a nonnegative function; $\theta$ is thus a functional of the cumulative hazard function. The constraints we impose on the hazards $v_i$ are: for a given function $g(\cdot)$ and constant $\mu$,

$$\sum_{i=1}^{N-1} g(t_i) \log(1 - v_i) = \mu, \qquad (3)$$

where $N$ is the total number of distinct observed values. The last value must be excluded because for discrete hazards we always have $v_N = 1$, which would make $\log(1 - v_N)$ equal to $-\infty$.
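To make (2) and (3) concrete, here is a minimal Python sketch (our own illustration, not the authors' software) that builds $t_i, d_i, R_i$ from $(T_i, \delta_i)$, evaluates the Binomial log EL (2) and computes the ratio statistic for constraint (3). Note the assumptions: the closed form $v_i = d_i/(R_i + \lambda g_i)$ for the constrained maximizer is our own derivation from the stationarity condition of the Lagrangian, the multiplier $\lambda$ is found by plain bisection, $g \ge 0$ and $\mu < 0$ are assumed, and the names `risk_table` and `binomial_el_ratio` are hypothetical.

```python
import math
from collections import Counter


def risk_table(times, status):
    """Ordered distinct times t_i, death counts d_i and risk-set sizes R_i."""
    deaths = Counter(t for t, s in zip(times, status) if s == 1)
    counts = Counter(times)
    at_risk = len(times)  # R_i = #{j : T_j >= t_i}
    t_out, d_out, r_out = [], [], []
    for t in sorted(counts):
        t_out.append(t)
        d_out.append(deaths[t])
        r_out.append(at_risk)
        at_risk -= counts[t]  # everyone observed at t leaves the risk set afterwards
    return t_out, d_out, r_out


def xlogy(x, y):
    """x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def binomial_loglik(v, d, r):
    """Binomial log EL (2): sum of d_i log v_i + (R_i - d_i) log(1 - v_i)."""
    return sum(xlogy(di, vi) + xlogy(ri - di, 1.0 - vi) for vi, di, ri in zip(v, d, r))


def binomial_el_ratio(times, status, g, mu):
    """-2 log EL ratio for constraint (3); assumes g >= 0 and mu < 0."""
    t, d, r = risk_table(times, status)
    # Constraint (3) omits the last distinct time, so give it weight 0.
    gw = [g(ti) if i < len(t) - 1 else 0.0 for i, ti in enumerate(t)]

    def hazards(lam):
        # Our derivation: Lagrangian stationarity gives v_i = d_i / (R_i + lam * g_i).
        return [di / (ri + lam * gi) if di > 0 else 0.0 for di, ri, gi in zip(d, r, gw)]

    def lhs(lam):
        # Left side of (3); only points with g_i > 0 contribute.
        return sum(gi * math.log1p(-vi) for gi, vi in zip(gw, hazards(lam)) if gi > 0)

    active = [(di - ri) / gi for di, ri, gi in zip(d, r, gw) if gi > 0 and di > 0]
    if not active or mu >= 0:
        raise ValueError("need mu < 0 and at least one death with g > 0 before t_N")
    lo = max(active)  # v_i < 1 needs lam > lo; lhs(lam) -> -inf as lam -> lo
    hi = max(1.0, abs(lo))
    while lhs(hi) <= mu:  # lhs increases towards 0 as lam -> +inf
        hi *= 2.0
    for _ in range(200):  # bisection on the increasing function lhs
        mid = 0.5 * (lo + hi)
        if lhs(mid) < mu:
            lo = mid
        else:
            hi = mid
    v_con = hazards(0.5 * (lo + hi))
    v_na = [di / ri for di, ri in zip(d, r)]  # Nelson-Aalen jumps: unconstrained maximizer
    return -2.0 * (binomial_loglik(v_con, d, r) - binomial_loglik(v_na, d, r))
```

On the toy sample `times = [1, 2, 2, 3, 4, 5]`, `status = [1, 1, 0, 1, 0, 1]` with $g(s) = I(s \le 3)$, the Nelson-Aalen value of the left side of (3) is $\log(5/6) + \log(4/5) + \log(2/3) = \log(4/9)$, so `binomial_el_ratio(times, status, g, math.log(4/9))` is zero up to rounding, while any other $\mu < 0$ gives a positive statistic.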

The EL ratio test statistic in terms of hazards is

$$W_2 = -2\{\log \max EL(\Lambda)\ \text{(with constraint (3))} - \log \max EL(\Lambda)\ \text{(without constraint)}\}.$$

The following result establishes a version of Wilks' theorem for $W_2$ under some regularity conditions; for the proof see Fang (2000).

Theorem 1. Suppose $\mu = \int g(t) \log\{1 - d\Lambda(t)\}$. Then the test statistic $W_2$ has asymptotically a $\chi^2$ distribution with 1 degree of freedom.

Remark 1. The integral constraints were originally given as $\theta = \int g(t)\, d\log\{1 - F(t)\}$. The formulation above follows from the suggestive notation $d\log\{1 - F(t)\} = \log\{1 - d\Lambda(t)\}$. The two formulations are identical for both continuous and discrete $F(t)$.

Parallel results for the Poisson likelihood function (defined below) with integral constraints were obtained by Pan and Zhou (2002). The Poisson definition of the log empirical likelihood is

$$\log EL_2 = \sum_{i=1}^n \Big\{ \delta_i \log \Delta\Lambda(T_i) - \sum_{j:\, T_j \le T_i} \Delta\Lambda(T_j) \Big\}. \qquad (4)$$

This is called the Poisson extension of the empirical likelihood because it has the form of the likelihood of a sequence of conditional Poisson trials. Johansen (1983) showed that the Poisson extension corresponds to the probability distribution of a dynamical Poisson process; see also Murphy (1995). Both empirical likelihood functions defined above turn out to be maximized by the jumps of the Nelson-Aalen (NA) estimator,

$$\Delta\hat\Lambda_{NA}(T_i) = \frac{\delta_i}{\sum_{j=1}^n I(T_j \ge T_i)}.$$

Pan and Zhou (2002) studied the limit of the Poisson EL ratio for a general parameter of the form

$$\theta = \int g(t)\, d\Lambda(t) = \sum_i g(T_i)\, \Delta\Lambda(T_i).$$

They also obtained a chi-square limit for the $-2$ log likelihood ratio under mild regularity conditions similar to those of Theorem 1.

3. Comparison of the EL Ratios of Binomial and Poisson Type

We first compare the chi-square approximation for the two types of EL ratio under a discrete distribution. Tied observations arise naturally in biomedical research and other types of studies, and Monte Carlo simulations with discrete distributions can closely mimic such data. To compare the Poisson and Binomial empirical likelihood ratios in the presence of ties, we ran simulations with a discrete exponential distribution. The discrete exponential random variable was generated by rounding a standard exponential random variable to two decimal places; all values greater than 3 were set to 3. As an alternative, the ties were broken by adding small values, and the Poisson empirical likelihood ratio was also applied to the resulting data. For example, observations 2, 2, 2 are replaced by 2, 2 + ε, 2 + 2ε.

The constraints

$$\sum_i I(t_i \le t)\, \Delta\Lambda(t_i) = \theta \qquad (5)$$

and

$$\sum_i I(t_i \le t) \log[1 - \Delta\Lambda(t_i)] = \theta \qquad (6)$$

were used for the Poisson and Binomial empirical likelihood, respectively, where $t$ is a known value. With $t = 1.2$ in both constraints (5) and (6), Monte Carlo simulations of 1000 runs were conducted for all three approaches (Poisson, Poisson with tie-breaking, and Binomial) using the same samples.

Figures 1 and 2 are QQ-plots of the Poisson empirical likelihood ratios against the $\chi^2_{(1)}$ percentiles; Figure 2 shows the results after artificially breaking the tied observations in the samples. The approximation is poor for both Poisson empirical likelihood ratio approaches, with tie-breaking slightly better. Figure 3 shows the results using constraint (6) and the Binomial likelihood; its approximation to $\chi^2_{(1)}$ is very good compared with the Poisson EL ratio. Thus the Binomial empirical likelihood ratio approach is superior to the Poisson approach when the underlying survival time distribution is discrete.

Next, we compare the two empirical likelihoods when the underlying survival time distribution is continuous.
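Returning briefly to the discrete experiment, the Poisson EL ratio and the tie-breaking device can be sketched in the same way. This is again our own minimal Python illustration, not the authors' code. It assumes that with tied data the cumulative hazard in (4) is the sum of jumps at distinct times, so that (4) becomes $\sum_k \{d_k \log h_k - R_k h_k\}$; it uses the analogous Lagrange form $h_k = d_k/(R_k + \lambda g_k)$ (our derivation) for a constraint such as (5); and the helper names and the default `eps` are hypothetical.

```python
import math
from collections import Counter


def risk_table(times, status):
    """Ordered distinct times t_k, death counts d_k and risk-set sizes R_k."""
    deaths = Counter(t for t, s in zip(times, status) if s == 1)
    counts = Counter(times)
    at_risk = len(times)
    t_out, d_out, r_out = [], [], []
    for t in sorted(counts):
        t_out.append(t)
        d_out.append(deaths[t])
        r_out.append(at_risk)
        at_risk -= counts[t]
    return t_out, d_out, r_out


def poisson_loglik(h, d, r):
    """Log EL (4) over distinct times (assumption for ties): sum of d_k log h_k - R_k h_k."""
    return sum((dk * math.log(hk) if dk > 0 else 0.0) - rk * hk for hk, dk, rk in zip(h, d, r))


def poisson_el_ratio(times, status, g, theta):
    """-2 log EL ratio for sum_k g(t_k) h_k = theta, e.g. constraint (5); needs theta > 0, g >= 0."""
    t, d, r = risk_table(times, status)
    gw = [g(tk) for tk in t]

    def hazards(lam):
        # Our derivation: Lagrangian stationarity gives h_k = d_k / (R_k + lam * g_k).
        return [dk / (rk + lam * gk) if dk > 0 else 0.0 for dk, rk, gk in zip(d, r, gw)]

    def lhs(lam):
        return sum(gk * hk for gk, hk in zip(gw, hazards(lam)))

    active = [-rk / gk for dk, rk, gk in zip(d, r, gw) if gk > 0 and dk > 0]
    if not active or theta <= 0:
        raise ValueError("need theta > 0 and at least one death with g > 0")
    lo = max(active)  # h_k > 0 needs lam > lo; lhs(lam) -> +inf as lam -> lo
    hi = max(1.0, abs(lo))
    while lhs(hi) >= theta:  # lhs decreases towards 0 as lam -> +inf
        hi *= 2.0
    for _ in range(200):  # bisection on the decreasing function lhs
        mid = 0.5 * (lo + hi)
        if lhs(mid) > theta:
            lo = mid
        else:
            hi = mid
    h_con = hazards(0.5 * (lo + hi))
    h_na = [dk / rk for dk, rk in zip(d, r)]  # Nelson-Aalen jumps
    return -2.0 * (poisson_loglik(h_con, d, r) - poisson_loglik(h_na, d, r))


def break_ties(times, eps=1e-6):
    """Hypothetical tie-breaker: 2, 2, 2 -> 2, 2 + eps, 2 + 2 * eps, in order of appearance."""
    seen = Counter()
    out = []
    for t in times:
        out.append(t + seen[t] * eps)
        seen[t] += 1
    return out
```

With the toy sample `times = [1, 2, 2, 3, 4, 5]`, `status = [1, 1, 0, 1, 0, 1]` and $g(s) = I(s \le 3)$, the Nelson-Aalen value of the left side of (5) is $1/6 + 1/5 + 1/3 = 0.7$, where `poisson_el_ratio` is zero up to rounding. Calling it on `times` and on `break_ties(times)` corresponds to the two Poisson variants described above.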
Using the standard exponential as the survival time distribution, we ran simulations with $t = 0.67$ or $t = 2.3$ in constraints (5) and (6). The survival function of the standard exponential at these two values of $t$ is approximately 0.5 and 0.1, respectively. The results in Figure 4 indicate that the Poisson extension is comparable to the Binomial extension when $1 - F(t) = 0.5$, and better than the Binomial extension when $1 - F(t) = 0.1$. However, these differences diminish as the sample size $n$ increases. Many more simulations were run; we report only representative ones here.

In conclusion, the EL ratio from the Poisson likelihood is suitable only for survival times from continuous distributions; it cannot handle tied observations when the distribution is discrete. The Poisson likelihood approach gives a very good approximation when the underlying distribution is continuous. According to the results for the continuous distribution, the approximation from the Binomial extension is good when $1 - F(t) = 0.5$. When $1 - F(t) = 0.1$, a larger sample size is needed to reach a good approximation. We think this is due to censoring in the tail. In other words, when there is censoring and the parameter considered is sensitive to tail behavior, the Poisson likelihood may have a better chi-square approximation.

Figure 1: 1000 simulations with constraint (5) and Poisson empirical likelihood ratios. Panels: n = 30, n = 80, n = 150, n = 500.

Figure 2: 1000 simulations with constraint (5) and Poisson empirical likelihood ratios; tied observations in the sample were artificially broken by adding a small value. Panels: n = 30, n = 80, n = 150, n = 500.

Figure 3: 1000 simulations with constraint (6) and Binomial empirical likelihood ratios. Panels: n = 30, n = 80, n = 150, n = 500.

Figure 4: 3000 simulations with the standard exponential distribution. (a) Constraint (5) with $1 - F(t) = 0.5$ and Poisson extension, n = 20. (b) Constraint (6) with $1 - F(t) = 0.5$ and Binomial extension, n = 20. (c) Constraint (5) with $1 - F(t) = 0.1$ and Poisson extension, n = 30. (d) Constraint (6) with $1 - F(t) = 0.1$ and Binomial extension, n = 30.
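Reproducing experiments like these requires the null values of $\theta$ in (5) and (6), which the text does not state. The Python sketch below computes them for the discrete exponential of the first experiment. It is our own calculation, not taken from the paper: it assumes round-to-nearest to two decimals, so that $X = k/100$ exactly when $E \in [(k - 1/2)/100, (k + 1/2)/100)$ for $k \ge 1$ and $X = 0$ when $E < 0.005$, and the function names are hypothetical.

```python
import math


def discrete_exp_hazards(t, step=0.01):
    """Hazards of X = round(E, 2), E ~ Exp(1), at the support points 0, step, ... up to t.

    Assumes round-to-nearest: X = k*step iff E in [(k - 1/2) step, (k + 1/2) step) for k >= 1,
    and X = 0 iff E < step/2. Requires t < 3 so the cap at 3 plays no role.
    """
    if t >= 3:
        raise ValueError("this sketch ignores the cap at 3, so it needs t < 3")

    def surv(k):  # P(X >= k * step)
        return 1.0 if k == 0 else math.exp(-(k - 0.5) * step)

    hazards = []
    k = 0
    while k * step <= t + 1e-12:  # small tolerance for floating-point grid points
        hazards.append(1.0 - surv(k + 1) / surv(k))  # P(X = k*step) / P(X >= k*step)
        k += 1
    return hazards


def null_values(t):
    """Null theta for (5) (Poisson: sum of hazards) and (6) (Binomial: sum of log(1 - h))."""
    h = discrete_exp_hazards(t)
    return sum(h), sum(math.log1p(-hk) for hk in h)
```

For $t = 1.2$ this gives about 1.199 for (5) and, up to rounding, $-0.005 - 120 \times 0.01 = -1.205 = \log P(X > 1.2)$ for (6). For the continuous standard exponential the corresponding null values are $t$ and $-t$; note that $e^{-0.67} \approx 0.51$ and $e^{-2.3} \approx 0.10$.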

4. Concluding Remarks

The two competing empirical likelihood definitions lead to very different behavior of the log likelihood ratio. Practitioners should be aware of this and choose the appropriate one for the data at hand, so that the resulting confidence intervals and p-values are more accurate.

References

Andersen, P.K., Borgan, O., Gill, R. and Keiding, N. (1993). Statistical Models Based on Counting Processes. Springer, New York.
Efron, B. (1967). The Two Sample Problem With Censored Data. Proc. Fifth Berkeley Symp. Math. Statist. Probab. 4, 831-883.
Fang, H. (2000). Binomial Empirical Likelihood with Discrete Censored Data. Ph.D. Dissertation, Department of Statistics, University of Kentucky.
Gill, R. (1989). Non- and Semi-parametric Maximum Likelihood Estimators and the von Mises Method (I). Scand. J. Statist. 16, 97-128.
Johansen, S. (1983). An Extension of Cox's Regression Model. International Statistical Review, 51, 258-262.
Kaplan, E. and Meier, P. (1958). Nonparametric Estimation From Incomplete Observations. J. Amer. Statist. Assoc. 53, 457-481.
Li, G. (1995). On Nonparametric Likelihood Ratio Estimation of Survival Probabilities for Censored Data. Statistics and Probability Letters, 25, 95-104.
Murphy, S. A. (1995). Likelihood Ratio Based Confidence Intervals in Survival Analysis. Journal of the American Statistical Association, 90, 1399-1405.
Murphy, S. and van der Vaart, A. W. (1997). Semiparametric Likelihood Ratio Inference. Ann. Statist. 25, 1471-1509.
Owen, A. (1988). Empirical Likelihood Ratio Confidence Intervals for a Single Functional. Biometrika, 75, 237-249.
Owen, A. B. (1991). Empirical Likelihood for Linear Models. The Annals of Statistics, 19, 1725-1747.
Owen, A. (2001). Empirical Likelihood. Chapman & Hall, London.
Pan, X. (1997). Empirical Likelihood Ratio Method for Censored Data. Ph.D. Dissertation, Department of Statistics, University of Kentucky.
Pan, X.R. and Zhou, M. (2002). Empirical Likelihood in Terms of Cumulative Hazard Function for Censored Data. J. Multivariate Analysis, 80 (1), 166-188.
Thomas, D. R. and Grunkemeier, G. L. (1975). Confidence Interval Estimation of Survival Probabilities for Censored Data. Journal of the American Statistical Association, 70, 865-871.

Department of Statistics, University of Kentucky, Lexington, KY 40506-0027