Nonparametric estimation of linear functionals of a multivariate distribution under multivariate censoring with applications.

Similar documents
Estimation of the Bivariate and Marginal Distributions with Censored Data

Semiparametric maximum likelihood estimation in normal transformation models for bivariate survival data

Product-limit estimators of the survival function with left or right censored data

Olivier Lopez. To cite this version: HAL Id: hal

GOODNESS-OF-FIT TESTS FOR ARCHIMEDEAN COPULA MODELS

AFT Models and Empirical Likelihood

On consistency of Kendall s tau under censoring

Harvard University. Harvard University Biostatistics Working Paper Series

Estimating Bivariate Survival Function by Volterra Estimator Using Dynamic Programming Techniques

ESTIMATION OF KENDALL S TAUUNDERCENSORING

EMPIRICAL LIKELIHOOD ANALYSIS FOR THE HETEROSCEDASTIC ACCELERATED FAILURE TIME MODEL

Nonparametric rank based estimation of bivariate densities given censored data conditional on marginal probabilities

Quantile Regression for Residual Life and Empirical Likelihood

Tests of independence for censored bivariate failure time data

TESTS FOR LOCATION WITH K SAMPLES UNDER THE KOZIOL-GREEN MODEL OF RANDOM CENSORSHIP Key Words: Ke Wu Department of Mathematics University of Mississip

Frailty Models and Copulas: Similarities and Differences

Pairwise dependence diagnostics for clustered failure-time data

Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

Estimation of Conditional Kendall s Tau for Bivariate Interval Censored Data

Empirical Likelihood in Survival Analysis

Review of Multivariate Survival Data

Likelihood ratio confidence bands in nonparametric regression with censored data

Lecture 3. Truncation, length-bias and prevalence sampling

A Measure of Association for Bivariate Frailty Distributions

Outline. Frailty modelling of Multivariate Survival Data. Clustered survival data. Clustered survival data

ASYMPTOTIC PROPERTIES AND EMPIRICAL EVALUATION OF THE NPMLE IN THE PROPORTIONAL HAZARDS MIXED-EFFECTS MODEL

Statistical Analysis of Competing Risks With Missing Causes of Failure

Smooth nonparametric estimation of a quantile function under right censoring using beta kernels

University of California, Berkeley

Approximate Self Consistency for Middle-Censored Data

Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach

Computational treatment of the error distribution in nonparametric regression with right-censored and selection-biased data

Analytical Bootstrap Methods for Censored Data

Conditional Copula Models for Right-Censored Clustered Event Time Data

Lifetime Dependence Modelling using a Generalized Multivariate Pareto Distribution

Lecture 5 Models and methods for recurrent event data

Inconsistency of the MLE for the joint distribution. of interval censored survival times. and continuous marks

Statistical Methods for Alzheimer s Disease Studies

GOODNESS-OF-FIT TEST FOR RANDOMLY CENSORED DATA BASED ON MAXIMUM CORRELATION. Ewa Strzalkowska-Kominiak and Aurea Grané (1)

Empirical Processes & Survival Analysis. The Functional Delta Method

Multistate Modeling and Applications

1 Glivenko-Cantelli type theorems

Estimation and Inference of Quantile Regression. for Survival Data under Biased Sampling

FULL LIKELIHOOD INFERENCES IN THE COX MODEL: AN EMPIRICAL LIKELIHOOD APPROACH

STAT Section 2.1: Basic Inference. Basic Definitions

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints

Efficiency of Profile/Partial Likelihood in the Cox Model

Multivariate Survival Data With Censoring.

Jyh-Jen Horng Shiau 1 and Lin-An Chen 1

Estimation and Goodness of Fit for Multivariate Survival Models Based on Copulas

Survival Analysis: Weeks 2-3. Lu Tian and Richard Olshen Stanford University

Empirical likelihood ratio with arbitrarily censored/truncated data by EM algorithm

Full likelihood inferences in the Cox model: an empirical likelihood approach

Asymptotic Properties of Nonparametric Estimation Based on Partly Interval-Censored Data

Editorial Manager(tm) for Lifetime Data Analysis Manuscript Draft. Manuscript Number: Title: On An Exponential Bound for the Kaplan - Meier Estimator

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Web-based Supplementary Material for. Dependence Calibration in Conditional Copulas: A Nonparametric Approach

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random

Multivariate Survival Analysis

Frontier estimation based on extreme risk measures

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations

GROUPED SURVIVAL DATA. Florida State University and Medical College of Wisconsin

On the generalized maximum likelihood estimator of survival function under Koziol Green model

Regression model with left truncated data

On the Breslow estimator

A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky

Asymptotic Properties of Kaplan-Meier Estimator. for Censored Dependent Data. Zongwu Cai. Department of Mathematics

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model

Inference on Association Measure for Bivariate Survival Data with Hybrid Censoring

From semi- to non-parametric inference in general time scale models

Marginal Screening and Post-Selection Inference

Statistical Inference and Methods

Product-limit estimators of the gap time distribution of a renewal process under different sampling patterns

Product-limit estimators of the survival function for two modified forms of currentstatus

LEAST ABSOLUTE DEVIATIONS ESTIMATION FOR THE ACCELERATED FAILURE TIME MODEL

Chapter 2 Inference on Mean Residual Life-Overview

An augmented inverse probability weighted survival function estimator

Goodness-of-fit tests for randomly censored Weibull distributions with estimated parameters

Least Absolute Deviations Estimation for the Accelerated Failure Time Model. University of Iowa. *

Maximum likelihood estimation of a log-concave density based on censored data

Nonparametric Bayes Estimator of Survival Function for Right-Censoring and Left-Truncation Data

Overview of Extreme Value Theory. Dr. Sawsan Hilal space

Goodness-of-fit tests for the cure rate in a mixture cure model

Part III Measures of Classification Accuracy for the Prediction of Survival Times

Nonparametric Model Construction

Package Rsurrogate. October 20, 2016

Constrained estimation for binary and survival data

The Central Limit Theorem Under Random Truncation

A dimension reduction approach for conditional Kaplan-Meier estimators

Conditional Least Squares and Copulae in Claims Reserving for a Single Line of Business

A Goodness-of-fit Test for Semi-parametric Copula Models of Right-Censored Bivariate Survival Times

A Resampling Method on Pivotal Estimating Functions

Financial Econometrics and Volatility Models Copulas

Quantile Regression for Recurrent Gap Time Data

Analysing geoadditive regression data: a mixed model approach

Mixture modelling of recurrent event times with long-term survivors: Analysis of Hutterite birth intervals. John W. Mac McDonald & Alessandro Rosina

MAS3301 / MAS8311 Biostatistics Part II: Survival

Likelihood Construction, Inference for Parametric Survival Distributions

Outline. Frailty modelling of Multivariate Survival Data. Clustered survival data. Clustered survival data

Transcription:

Chapter 1 Nonparametric estimation of linear functionals of a multivariate distribution under multivariate censoring with applications. 1.1. Introduction The study of dependent durations arises in many fields. For example, in epidemiology, one can be interested in the lifetimes of twins or the times before healing after using two kinds of drugs on sick eyes. In actuarial sciences, provisioning of pension contracts with a reversion clauses requires to model the dependence between the two lifetimes involved in the contract. The reliability of several components of an engine could be studied in industry. Datasets used to perform statistical estimation in such problems often suffer from incompleteness of some observations due to censoring. Several approaches have been proposed to overcome this difficulty [HOU 00]. Among them, the estimation of the cumulative distribution function in a nonparametric way have been studied [HOU 00, DAB 88, WAN 97, LIN 93, PRE 92, TSA 86, PRU 91, LAA 96, AKR 03]. However, most of the estimators present some drawbacks, for instance, Tsai and al. [TSA 86] studied a method based on the conditional survival functions but the almost sure consistency is slow and the estimator is not symmetric in the sense that is not equivariant under reversal of coordinates. Dabrowska proposed a product limit estimator which can be seen as a bivariate Kaplan-Meier estimator [DAB 88]. The multivariate distribution function is rewritten with quantities which are easily estimated using their empirical counterparts, however, the estimator can assign negative Chapter written by Olivier LOPEZ and Philippe SAINT-PIERRE.

2 The Ouvrage-Hermes User s Guide mass to some observations [PRU 91]. Another approach based on slightly modify the observations is proposed by van der Laan [LAA 96]. A nonparametric estimator based on copula modeling is introduced by Wang et Wells [WAN 97] whereas Lin and Ying [LIN 93] focus on univariate censoring. These estimators are designed to estimate the cumulative distribution function but are not adapted to estimate the distribution. They do not define a proper joint distribution as they are not equivariant under reversal coordinates or assigns negative mass to some points. Therefore, one can not used them to estimate functionals of the multivariate distribution [GIL 92, FAN 00]. An estimator of the distribution based on weighted mean of marginal distribution have been introduced by Akritas and Van Keilegom [AKR 03]. This estimator define a proper distribution, however, the convergence rate does not achieve a n-rate, the computation and the generalization to multivariate case seems difficult. More recently, Lopez and Saint-Pierre [LOP 12] introduced an estimator which does not present these drawbacks. This estimator is asymptotically normal, can easily be extended to multivariate censored lifetimes and defines a proper joint distribution. Therefore it provides a valuable tool to study functionals of a bivariate distribution. This paper deals with multivariate right censored survival data but the notations can be quite complex. In order to facilitate the readability, a bivariate framework is considered through the paper even if the proposed estimation method can easily be adapted to deal with multivariate survival data. We are interested in estimating integrals of the distribution of a bivariate right-censored variable (T (1), T (2) ). Let us define F (t (1), t (2) ) = P(T (1) t (1), T (2) t (2) ) the joint distribution function and φ(t (1), t (2) ) a measurable function from R 2 to R such as φ(t (1), t (2) ) df (t (1), t (2) ) <. The aim is to estimate integrals of F of the following form ξ(φ) = E(φ(T (1), T (2) )) = φ(t (1), t (2) )df (t (1), t (2) ). (1.1) Such functionals are involved in several quantities useful when analyzing the relationship between bivariate data. For example, E(T (1) T (2) ) allows to obtain the covariance and the correlation coefficient. The distribution function E(1 T (1) t (1),T (2) t (2)), E(T (1) T (2) ), E(min(T (1), T (2) )) or P(T (1) < T (2) ) are others desirable quantities for the analysis. Kendall s coefficient and others dependance measures [GIL 83, FAN 00] can also be derived and will be discuss later in the paper. In absence of censoring, the estimation of integrals ξ has been studied by many authors. In the case of univariate survival data, among others, Stute has provided several works on the estimation of Kaplan-Meier integrals [STU 95]. In a multivariate framework, only few papers concern the estimation ξ. Indeed, most of the estimators of the bivariate distribution function are not adapted to this question. A class of asymptotically normal estimators designed for univariate censoring has been proposed by Lu and Burke [LU 08]. As regards bivariate censoring, Gill [GIL 92] and

Estimation of multivariate distribution under censoring with applications. 3 Fan and al. [FAN 00] has studied quantities (1.1) by correcting the Dabrowska s estimator [DAB 88] in view to obtain a regular distribution estimator. The nonparametric estimator proposed by Lopez and Saint-Pierre [LOP 12] is well adapted as it allows to obtain a proper joint distribution. Moreover, it present some desirable properties : classical empirical estimator is obtained in absence of censoring, estimates of marginal distribution lead to Kaplan-Meier estimates and it is equivariant under reversal of coordinates. Several applications useful in multivariate analysis can be derived from such estimator. Estimation of several dependence measures as the Kendall s coefficient are discussed. A bootstrap procedure for multivariate survival data is derived. The estimator [LOP 12] which defines a proper distribution (as it do not assign negative mass to some points) can be used to simulate samples. The Efron s resampling procedure [EFR 81] can then be adapted to the multivariate framework. Moreover, a regression model where the response and the covariate are both randomly right-censored is also studied. The rest of the paper is organized as follows. Section 2 introduces the bivariate distribution estimator to estimate the quantities (1.1). Section 3, provides an asymptotic i.i.d. representation for the estimators. Consistency and asymptotic normality are derived. Section 4 is devoted to the estimation of dependence measures, a bootstrap procedure and regression modeling. Finally, these methods are applied to the lifetimes of twins in section 5. 1.2. nonparametric estimation of the distribution The aim is to estimate the distribution of a random vector T = (T (1), T (2) ), subject to bivariate censoring. Therefore, T = (T (1), T (2) ) is not fully observed and the available data in case of right-censoring is an i.i.d. sample (Y j, j ) 1 j n where Y j = (Y (1) j, Y (2) j ) = (inf(t (1) j, C (1) j ), inf(t (2) j, C (2) j )), j = ( (1) j, (2) j ) = (1 T (1) j C (1) j, 1 T (2) j C (2) j and C = (C (1), C (2) ) is the censoring vector. Some identifiability assumptions are needed to derive a consistent estimate of the joint distribution of T = (T (1), T (2) ). Multivariate counterpart of traditional assumptions used in univariate survival analysis are considered in the rest of the paper, ), T and C are independent variables, (1.2) P(T (i) = C (i) ) = 0 for i = 1, 2. (1.3) In absence of censoring, the joint distribution function can easily be estimated using the empirical function F emp (t (1), t (2) ) = 1 n n j=1 1. Unfortunately, T (1) t (2) j t (1),T (2) j

4 The Ouvrage-Hermes User s Guide the empirical distribution is unavailable if T = (T (1), T (2) ) are censored and the empirical distribution of observed variables is a biased estimator. The well known Kaplan-Meier product limit estimate is often used in case of univariate right-censored data. This estimator can be rewritten as a weighted empirical sum [STU 95, SAT 01]. Each observation Y = (Y (1), Y (2) ) is weighted in order to compensate the censoring. Lopez and Saint-Pierre [LOP 12] have extended this approach to bivariate rightcensored lifetimes. The nonparametric estimator of the distribution function is given by ˆF (y (1), y (2) ) = 1 n n j=1 Ŵ j 1 Y (1) j y (1),Y (2) j y (2), (1.4) where the weights Ŵj are converging towards some limit weights W j which must satisfy the following equality E[W j φ(y (1), Y (2) )] = E[φ(T (1), T (2) )]. The weights W j represent the mass assigned to the jth observation to correct the effect of bivariate censoring. Assuming the independence between T and C, one can easily show that W j = = (1) j (2) j E[ (1) j (2) j T (1) = y (1), T (2) = y (2) ] (1.5) (1) j (2) j P(C (1) > y (1), C (2) > y (2) ). (1.6) One can observe that weights only affect mass to doubly uncensored observations. In case of independence between C (1) and C (2), the estimation of weights only consists in estimating the marginal distribution functions. In case of dependence between censoring times, a copula approach can be used to model the cumulative distribution function. In this paper, we suppose that P(C (1) > y (1), C (2) > y (2) ) = C(G 1 (y (1) ), G 2 (y (2) )) (1.7) where C is known copula. Indeed, the copula estimation for censored data has already been considered in the literature ([SHI 95]). Moreover, the theoretical results presented in the next section are still valid if the known copula is replaced by a consistent estimator of C [SHI 95]. In this case it is sufficient to estimate the marginal distribution functions G 1 and G 2 to obtain an estimator of the weights Ŵ j = (1) j (2) j C(Ĝ1(Y (1) (2) j ), Ĝ2(Y j )), (1.8)

Estimation of multivariate distribution under censoring with applications. 5 where Ĝi is the Kaplan-Meier estimator of the survival function of censoring. Indeed, by reversing the labels of T (i) and C (i), the censoring C (i) can be considered as right-censored variable. Therefore, the survival function of the censoring times can be estimated by the Kaplan-Meier estimator, for i = 1, 2, Ĝ i (t) = ( ) 1 dĥ0 (i) i (Y k ), (1.9) k:y (i) k t Ĥ i (Y (i) where Ĥi(t) = n j=1 1 Y (i) j t and Ĥ0 i (t) = n j=1 (1 (i) j )1 Y (i) j t. The estimations of weights are used to define estimators of the cumulative distribution function and integrals of the distribution ˆF (y (1), y (2) ) = 1 n n j=1 ˆξ(φ) = φ(y (1), y (2) )d ˆF (y (1), y (2) ) = 1 n k ) Ŵ j 1 Y (1) j y (1),Y (2) j y (2), (1.10) n j=1 Ŵ j φ(y (1) j, Y (2) j ), (1.11) These estimates can be generalized to multivariate observations using a multivariate copula function. Left-censored data or left and right censored data can also be considered. Indeed, the left-censored counterpart of the Kaplan-Meier estimator proposed by Gomez and al. [GÓM 92] can be used to estimate the marginal survival function in case of left censoring. 1.3. Asymptotic properties The weights used in equations (1.10) and (1.11) depend on the whole sample. Hence, they are not independent and the central limit theorem does not apply. An asymptotic i.i.d. representation for integrals (1.11) can be obtained under some assumptions related to the copula function and to the belonging of the function φ to a Donsker class of functions. Some conditions on regularity of the copula function C are needed. Assumption 1 The function (x 1, x 2 ) C(x 1, x 2 ) is twice continuously differentiable. Moreover, we will denote 1 C(x 1, x 2 ) (resp. 2 C(x 1, x 2 )) the partial derivative of C with respect to x 1 (resp. x 2 ), and we assume that sup 1 C(x 1, x 2 ) + sup 2 C(x 1, x 2 ) + x 1,x 2 x 1,x 2 and that C(x 1, x 2 ) 0 for x 1 0 and x 2 0. i,j {1,2} sup i j C(x 1, x 2 ) <, x 1,x 2

6 The Ouvrage-Hermes User s Guide Assumption 2 Assume that with 0 α i 1 for i = 1, 2. C(x 1, x 2 ) x α1 1 xα2 2, For instance, this assumption holds in the case of independence copula or copula from Frank s family with α 1 = α 2 = 1. Consistency of alternative estimators in multivariate analysis have only been studied on a compact set strictly include in the support of the distribution of Y = (Y (1), Y (2) ). Such result is restrictive, for instance in a regression framework, since only functions which vanish at the tail of the distribution are considered. Tightness arguments similar to ones used in [GIL 83] to study the consistency of Kaplan-Meier estimator on the whole line can be used to extend the consistency to functions taking non-zero values on the whole support of Y. Assumption 3 F is a Donsker class of locally bounded functions with positive locally bounded envelope function Φ such that Φ 2 (y)df (y) <, (1.12) C(G 1 (y (1) ), G 2 (y (2) )) and, for some ε > 0 arbitrary small, [ G 1 α1 1 (y (1) )K 1/2+ε 1 (y (1) ) Φ(y) where for i = 1, 2, G α2 2 (y(2) ) + G1 α2 2 (y (2) )K 1/2+ε G α1 1 (y(1) ) u dg i (t) K i (u) = 0 G i (t) 2 F i (t). ] 2 (y (2) ) df (y) < (1.13) The functions H i, G i and F i represent the survival function of the variables Y (i), C (i) and T (i) respectively. The moment condition (1.12) corresponds to a finite asymptotic variance and condition (1.13) is useful to obtain the tightness of the process. This assumption is a bivariate adaptation of assumption (1.6) described in [STU 95]. It is shown in [STU 95] that such assumption holds for a large class of distributions, for instance, when the lifetime distributions are sub-exponential the function Φ is polynomial. Donsker classes of functions [VAA 96] satisfy a uniform Central Limit Theorem property. Their asymptotic properties is a valuable tool to derive our main theorem. For any function φ L 1 (ξ), it is natural to compare ˆξ(φ) with the following quantity, ξ(φ) = 1 n W j φ(y (1) j, Y (2) j ). n j=1

Estimation of multivariate distribution under censoring with applications. 7 Indeed, ξ(φ) is a sum of i.i.d. quantities which tends to ξ(φ) by the strong law of large numbers. The following i.i.d. representation can be obtained THEOREM. Let F be a class of functions satisfying Assumption 3. Then, under identifiability assumptions 1.2 and 1.7 and under regularity assumptions 1 and 2, ˆξ(φ) ξ(φ) = 1 n n η(y j, j, t)φ(t)c(g 1 (t (1) ), G 2 (t (2) ))df (t) + R n (φ), j=1 with sup φ F R n (φ) = o P (n 1/2 ), and where η is η(y j, j ; y) = + i=1,2 i C(G 1 (y (1) ), G 2 (y (2) )) C(G 1 (y (1) ), G 2 (y (2) )) 2 { (i) (1 j )G i(y (i) j y) H i (Y (i) j ) 1 Y (i) j (i) j 1 Y (i) j >y F i (Y (i) j ) G i (y) u G }] i(u y)df i (u). H i (u)f i (u) The proof of this theorem is obtained and detailed in [LOP 12]. Note that the asymptotic normality of integrals (1.11) can easily be deduced from the i.i.d. representation and the central limit theorem. In some applications, the function φ belongs to a Donsker class of functions with bounded envelope Φ satisfying Φ(y (1), y (2) ) 0 if y (1) > τ (1) or y (2) > τ (2) for τ (i) < inf{y : P(Y (i) > y) = 0}, i = 1, 2. In such cases, the function φ are bounded with compact support strictly included in the support of the distribution, the assumption 3 is satisfied and then the assumption 2 and 3 are not needed to prove the asymptotic representation. 1.4. Statistical applications of functionals 1.4.1. Dependence measures A popular dependence measure between two variables is the Kendall s coefficient of concordance, τ = 2P[(T (1) 1 T (1) 2 )(T (2) 1 T (2) 2 ) > 0] 1. Alternatively, Kendall s coefficient τ can be evaluated by integration of the bivariate distribution function, τ = 4 F (t (1), t (2) )df (t (1), t (2) ) 1. This expression is an integrals of the form (1.1). A nonparametric estimate ˆτ can be obtained using the estimator ˆF. Theorem 1.3 can be used to obtain an asymptotic

8 The Ouvrage-Hermes User s Guide representation for ˆτ ([GRI 13]). The asymptotic normality of ˆτ can be derived from the Central Limit Theorem. THEOREM. Under identifiability assumptions 1.2 and 1.7 and under regularity assumptions 1 and 2 n 1/2 (ˆτ τ) N (0, Q). The analytic expression of the limit covariance matrix Q can be derived from the asymptotic representation and then be estimated from the data by replacing each unknown quantity by its empirical counterpart. Nevertheless, the bootstrap can be recommended to evaluate the law of ˆτ. Kendall s tau estimation helps to evaluate the dependence between the variables. It can also be useful when studying survival copula models under bivariate censoring. Indeed, Kendall s coefficient is related to parameter of association for some copula families, for example, Frank or Clayton copulas. The association parameter can be estimated using maximum likelihood estimation [SHI 95]. However, due to censored data, the optimization is very sensitive to the choice of initial value. Our estimate of Kendall s coefficient can be used as initial value of the association parameter in the maximization algorithm. The integrated hazard correlation, the median concordance [HOU 00] or the dependence measures defined in [GIL 92, FAN 00] can also be estimated using (1.11). 1.4.2. Bootstrap Estimator of integrals (1.1) is proven to be asymptotically normal [LOP 12]. The expression of the asymptotic variance can be derived even if the explicit expression is very complex. The variance can be estimated by replacing the unknown functions by their empirical estimates and asymptotic confidence intervals can be obtained. However, such intervals are often larger than those obtain with bootstrap methods. In case of univariate survival data, a bootstrap procedure introduced by Efron [EFR 81] is consistent and seems to give the best results [AKR 86]. Instead of resampling directly the pairs of observed variables (Y, ), Efron s methodology consist in simulating lifetimes and censoring times using nonparametric estimators of their distributions. The aim is to adapt this approach to a multivariate framework using the estimator (1.10) (which defines a proper distribution). The lifetimes (T (1), T (2) ) are simulated using the estimated distribution ˆF. The assumption (1.7) can be used to estimate the distribution function G of the censoring times (C (1), C (2) ), Ĝ(y (1), y (2) ) = C(Ĝ1(y (1) ), Ĝ2(y (2) )), (1.14) where Ĝ1 and Ĝ2 are the Kaplan-Meier estimators defined in (1.9). Samples of censoring times can be simulated according to the estimator of the distribution Ĝ. One can then construct samples of observed variables and estimated the integral (1.1) on each sample. The bootstrap procedure is as follow:

Estimation of multivariate distribution under censoring with applications. 9 Step 0: Compute estimates ˆF and Ĝ of the distribution functions of lifetimes F and censoring times G using estimators (1.10) and (1.14). Step 1: For b = 1,..., B, 1) Simulate independent variables (T (1) j,b, T (2) j,b ) 1 j n according to the distribution ˆF, 2) Simulate independent variables (C (1) j,b, C(2) j,b ) 1 j n according to the distribution Ĝ, 3) Compute the bth bootstrap sample (Y (1) j,b, Y (2) j,b, (1) j,b, (2) j,b ) 1 j n where for k = 1, 2, Y (k) j,b (k) = inf(t j,b, C(k) j,b ), (k) j,b = 1 T (k) j,b C(k) j,b 4) Compute the integral (1.11) for the bth bootstrap sample ˆξ b. Step 2: Use ˆξ 1,...,ˆξ B to estimate the distribution of ˆξ., The sample ˆξ 1,..., ˆξ B can be used to estimate the distribution, the mean ˆµ(ˆξ) or the variance ˆσ(ˆξ) of ˆξ using the empirical estimate. A large sample confidence interval of the mean µ(ˆξ) can be obtained using a normal approximation, CI 1 (α) = [ˆµ(ˆξ) ± z α/2 ˆσ(ˆξ) n ]. Another confidence interval can be obtained by estimating the quantiles ˆq α/2 and ˆq 1 α/2 using the empirical distribution. A confidence band is then given by CI 2 (α) = [ˆq α/2 ; ˆq 1 α/2 ]. In practice, the total mass of the distributions defined by ˆF and Ĝ can be lower than 1. A normalization by dividing the distribution by the total mass can be required. Such bootstrap procedure can be used to estimate the distribution of test statistics [GRI 13]. Bias corrections and weighted bootstrap approaches could also be adapted to our framework.

10 The Ouvrage-Hermes User s Guide 1.4.3. Linear regression The linear regression model has been studied in the particular case where the covariate is observed and the response is censored [STU 93, DEL 08]. We consider the more general case where the response and the covariate are both randomly censored. Assume that T (1) = f(β, T (2) ) + ɛ where E[ɛ T (2) ] = 0 and f is a known family of functions depending on a finite dimensional parameter β. As the two variables are censored, the nonparametric estimator (1.11) can be used to obtain a consistent least square estimator of β, ˆβ = arg min β φ(t (1), t (2) )d ˆF (t (1), t (2) ) with φ(t (1), t (2) ) = (t (1) f(β, t (2) )) 2. In the case of linear regression, the function φ satisfy the assumptions required to obtain the asymptotic representation of the integrals. One can then deduced an asymptotic representation and the asymptotic normality of ˆβ ([LOP 12]). THEOREM. Under identifiability assumptions 1.2 and 1.7 and under regularity assumptions 1 and 2 n 1/2 ( ˆβ β) N (0, R). The complex expression of the asymptotic covariance matrix R can be derived but can easily be estimated using bootstrap procedure. This approach can be seen as a generalization of the results obtained by [STU 93] when only one variables is censored. Note that prediction measures as the coefficient of determination R 2 can also be estimate using (1.11). A generalized linear model with censored variables can be considered using the same approach. The extension to a regression model with two or more covariates is possible as the estimators can be adapted to multivariate censored data. 1.5. Illustration Danish Twin Registry are used to illustrate the analysis of bivariate survival data using integrals of the distribution. The registry include more than 28000 pairs born from 1870-1952 in Denmark. The ascertainment of twins has taken place using different methods leading to several birth cohorts (1870-1910; 1911-1930; 1931-1952). An accurate information is the zygosity : monozygotic (MZ), dizygotic (DZ), unknown zygosity (UZ). We refer to [SKY 02] for detailed descriptions of the registry and the way it has been collected. The survival function can be estimated using estimator (1.10). In this application the censoring times are supposed to be independent. The estimation of survival function are obtained for each birth cohort and represented using contour plot (Fig. 1.1).

Estimation of multivariate distribution under censoring with applications. 11 Figure 1.1. Contour plot of the survival function for each birth cohort. Kendall s tau coefficient between the twin lifetimes is estimated for each birth cohort as well as the regression of T 1 on T 2. The estimations of Kendall s tau and regression coefficient displayed in the following tables lead to the same conclusion. The dependence between lifetime is stronger for monozygotic and seems to grow with time. Kendall tau 1870-1910 1911-1930 1931-1950 MZ 0.184 0.289 0.303 DZ 0.097 0.135 0.057 UZ 0.162 0.188 0.369 Table 1.1. Estimation of Kendall tau for each birth cohort. Regression coefficient 1870-1910 1911-1930 1931-1950 MZ 0.149 0.280 0. 354 DZ 0.088 0.115 0.037 UZ 0.161 0.212 0.279 Table 1.2. Estimation of regression coefficients for each birth cohort. 1.6. Conclusion We introduce a nonparametric estimator of the distribution of multivariate failure times. This estimator has several interesting properties, it is asymptotically normal, it

12 The Ouvrage-Hermes User s Guide generalize the univariate Kaplan-Meier estimator and the empirical function, it define a proper distribution and it is fast to compute. Using this estimator, integrals according to the distribution of failure times can be estimated consistently. We proposed several practical applications of the integrals of a distribution. Dependence measures as the Kendall s coefficient can be estimated. This measure can be used for studying the dependence but also to estimate the association parameter of a copula. A bootstrap procedure according to Efron s methodology is provided to derive confidence intervals. The regression model with censored response and variables is also considered. The obtained results can be generalized to the multivariate case and can be adapted to left censoring. Acknowledgment This work is supported by French Agence Nationale de la Recherche (ANR) ANR Grant ANR-09-JCJC-0101-01. The authors wish to thank the Danish Twin Register, University of Southern Denmark for providing the data. 1.7. Bibliography [AKR 86] AKRITAS M. G., Bootstrapping the Kaplan-Meier estimator, Assoc., vol. 81, num. 396, p. 1032 1038, 1986. J. Amer. Statist. [AKR 03] AKRITAS M. G., VAN KEILEGOM I., Estimation of bivariate and marginal distributions with censored data, J. R. Stat. Soc. Ser. B Stat. Methodol., vol. 65, num. 2, p. 457 471, 2003. [DAB 88] DABROWSKA D. M., Kaplan-Meier estimate on the plane, Ann. Statist., vol. 16, num. 4, p. 1475 1489, 1988. [DEL 08] DELECROIX M., LOPEZ O., PATILEA V., Nonlinear censored regression using synthetic data, Scand. J. Statist., vol. 35, num. 2, p. 248 265, 2008. [EFR 81] EFRON B., Censored data and the bootstrap, J. Amer. Statist. Assoc., vol. 76, num. 374, p. 312 319, 1981. [FAN 00] FAN J., HSU L., PRENTICE R. L., Dependence estimation over a finite bivariate failure time region, Lifetime Data Anal., vol. 6, num. 4, p. 343 355, 2000. [GIL 83] GILL R., Large sample behaviour of the product-limit estimator on the whole line, Ann. Statist., vol. 11, num. 1, p. 49 58, 1983. [GIL 92] GILL R. D., Multivariate Survival Analysis, vol. 37, p. 19 35, 1992. Teor. Veroyatnost. i Primenen., [GÓM 92] GÓMEZ G., JULIÀ O., UTZET F., Survival analysis for left censored data, Survival analysis: state of the art (Columbus, OH, 1991), vol. 211 of NATO Adv. Sci. Inst. Ser. E Appl. Sci., p. 269 288, Kluwer Acad. Publ., Dordrecht, 1992, With a discussion by Melvin L. Moeschberger and a rejoinder by the authors. [GRI 13] GRIBKOVA S., LOPEZ O., SAINT-PIERRE P., A simplified model for studying bivariate mortality under right-censoring, Journal of Multivariate Analysis, vol. 115, p. 181 192, 2013.

Estimation of multivariate distribution under censoring with applications. 13 [HOU 00] HOUGAARD P., Analysis of multivariate survival data, Statistics for Biology and Health, Springer-Verlag, New York, 2000. [LAA 96] VAN DER LAAN M. J., Efficient estimation in the bivariate censoring model and repairing NPMLE, Ann. Statist., vol. 24, num. 2, p. 596 627, 1996. [LIN 93] LIN D. Y., YING Z., A simple nonparametric estimator of the bivariate survival function under univariate censoring, Biometrika, vol. 80, num. 3, p. 573 581, 1993. [LOP 12] LOPEZ O., SAINT PIERRE P., Bivariate censored regression relying on a new estimator of the joint distribution function, Journal of Statistical Planning and Inference, vol. 142, num. 8, p. 2440 2453, 2012. [LU 08] LU X., BURKE M. D., Nonparametric estimation of linear functionals of a bivariate distribution under univariate censoring, J. Statist. Plann. Inference, vol. 138, num. 10, p. 3238 3256, 2008. [PRE 92] PRENTICE R. L., CAI J., Covariance and survivor function estimation using censored multivariate failure time data, Biometrika, vol. 79, num. 3, p. 495 512, 1992. [PRU 91] PRUITT R. C., On negative mass assigned by the bivariate Kaplan-Meier estimator, Ann. Statist., vol. 19, num. 1, p. 443 453, 1991. [SAT 01] SATTEN G. A., DATTA S., The Kaplan-Meier estimator as an inverse-probabilityof-censoring weighted average, Amer. Statist., vol. 55, num. 3, p. 207 210, 2001. [SHI 95] SHIH J. H., LOUIS T. A., Inferences on the association parameter in copula models for bivariate survival data, Biometrics, vol. 51, num. 4, p. 1384 1399, 1995. [SKY 02] SKYTTHE A., KYVIK K., HOLM N. V., VAUPEL J. W., CHRISTENSEN K., The danish twin registry: 127 birth cohorts of twins, Twin Research, vol. 5, p. 352 357, 2002. [STU 93] STUTE W., Consistent estimation under random censorship when covariables are present, J. Multivariate Anal., vol. 45, num. 1, p. 89 103, 1993. [STU 95] STUTE W., The central limit theorem under random censorship, vol. 23, num. 2, p. 422 439, 1995. Ann. Statist., [TSA 86] TSAI W.-Y., LEURGANS S., CROWLEY J., Nonparametric estimation of a bivariate survival function in the presence of censoring, Ann. Statist., vol. 14, num. 4, p. 1351 1365, 1986. [VAA 96] VAN DER VAART A. W., WELLNER J. A., Weak convergence and empirical processes with applications to statistics, Springer Series in Statistics, Springer-Verlag, New York, 1996. [WAN 97] WANG W., WELLS M. T., Nonparametric estimators of the bivariate survival function under simplified censoring conditions, Biometrika, vol. 84, num. 4, p. 863 880, 1997.

Index bivariate censoring, 3 bootstrap, 8 copula, 4, 8 Danish Twin Registry, 10 dependence measure, 7 Kaplan-Meier, 5 Kendall s coefficient, 7 linear regression, 10 multivariate survival analysis, 2