Error distribution function for parametrically truncated and censored data

Similar documents
Computational treatment of the error distribution in nonparametric regression with right-censored and selection-biased data

Bootstrap of residual processes in regression: to smooth or not to smooth?

Modelling Non-linear and Non-stationary Time Series

Estimation of the Bivariate and Marginal Distributions with Censored Data

Likelihood ratio confidence bands in nonparametric regression with censored data

Nonparametric Estimation of Regression Functions In the Presence of Irrelevant Regressors

I N S T I T U T D E S T A T I S T I Q U E B I O S T A T I S T I Q U E E T S C I E N C E S A C T U A R I E L L E S (I S B A)

UNIVERSITÄT POTSDAM Institut für Mathematik

TESTING FOR THE EQUALITY OF k REGRESSION CURVES

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model

Kernel density estimation with doubly truncated data

41903: Introduction to Nonparametrics

Lecture 3. Truncation, length-bias and prevalence sampling

Nonparametric Model Construction

Semiparametric modeling and estimation of the dispersion function in regression

Nonparametric Methods

Model-free prediction intervals for regression and autoregression. Dimitris N. Politis University of California, San Diego

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao

Regression Discontinuity Design Econometric Issues

Local Polynomial Regression

A GENERALIZED ADDITIVE REGRESSION MODEL FOR SURVIVAL TIMES 1. By Thomas H. Scheike University of Copenhagen

Additive Isotonic Regression

NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT ERROR IN TIME TO EVENT DATA: APPLICATION TO RISK PREDICTION MODELS

Nonparametric Regression

NONPARAMETRIC CONFIDENCE INTERVALS FOR MONOTONE FUNCTIONS. By Piet Groeneboom and Geurt Jongbloed Delft University of Technology

Nonparametric Function Estimation with Infinite-Order Kernels

Practice Exam 1. (A) (B) (C) (D) (E) You are given the following data on loss sizes:

Well-developed and understood properties

Introduction. Linear Regression. coefficient estimates for the wage equation: E(Y X) = X 1 β X d β d = X β

Nonparametric Inference via Bootstrapping the Debiased Estimator

AFT Models and Empirical Likelihood

ECO Class 6 Nonparametric Econometrics

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas

Quantile Regression for Residual Life and Empirical Likelihood

Nonparametric Regression Härdle, Müller, Sperlich, Werwarz, 1995, Nonparametric and Semiparametric Models, An Introduction

Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models

STAT Section 2.1: Basic Inference. Basic Definitions

On Fractile Transformation of Covariates in Regression 1

Efficient Estimation for the Partially Linear Models with Random Effects

Bickel Rosenblatt test

DEPARTMENT MATHEMATIK ARBEITSBEREICH MATHEMATISCHE STATISTIK UND STOCHASTISCHE PROZESSE

Random fraction of a biased sample

Chapter 3: Maximum Likelihood Theory

Smooth nonparametric estimation of a quantile function under right censoring using beta kernels

Tests of independence for censored bivariate failure time data

Practical Bayesian Optimization of Machine Learning. Learning Algorithms

Simple Linear Regression (Part 3)

Inference for Non-stationary Time Series Auto Regression. By Zhou Zhou 1 University of Toronto January 21, Abstract

Nonparametric Regression. Badr Missaoui

STAT 6385 Survey of Nonparametric Statistics. Order Statistics, EDF and Censoring

Asymptotic Statistics-III. Changliang Zou

NADARAYA WATSON ESTIMATE JAN 10, 2006: version 2. Y ik ( x i

Semiparametric modeling and estimation of heteroscedasticity in regression analysis of cross-sectional data

Estimation of the Conditional Variance in Paired Experiments

Goodness-of-fit tests for the cure rate in a mixture cure model

New Local Estimation Procedure for Nonparametric Regression Function of Longitudinal Data

Generated Covariates in Nonparametric Estimation: A Short Review.

A Review on Empirical Likelihood Methods for Regression

Estimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators

Nonparametric Predictive Inference (An Introduction)

Finite Sample Performance of Semiparametric Binary Choice Estimators

Nonparametric two-sample tests of longitudinal data in the presence of a terminal event

Least Squares Model Averaging. Bruce E. Hansen University of Wisconsin. January 2006 Revised: August 2006

Bayesian Nonparametric Accelerated Failure Time Models for Analyzing Heterogeneous Treatment Effects

UNIVERSITY OF CALIFORNIA, SAN DIEGO

Local linear multiple regression with variable. bandwidth in the presence of heteroscedasticity

IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade

Uniform Convergence Rates for Nonparametric Estimation

NONPARAMETRIC DENSITY ESTIMATION WITH RESPECT TO THE LINEX LOSS FUNCTION

Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model.

A Monte Carlo Comparison of Various Semiparametric Type-3 Tobit Estimators

Test for Discontinuities in Nonparametric Regression

Spline Density Estimation and Inference with Model-Based Penalities

Adaptive test of conditional moment inequalities

Non-parametric Inference and Resampling

Inference on distributions and quantiles using a finite-sample Dirichlet process

ECON 4160, Autumn term Lecture 1

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

Pricing and Risk Analysis of a Long-Term Care Insurance Contract in a non-markov Multi-State Model

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued

Introduction to Econometrics Final Examination Fall 2006 Answer Sheet

Fundamental concepts of functional data analysis

Statistica Sinica Preprint No: SS R4

Applied Econometrics (MSc.) Lecture 3 Instrumental Variables

Single-level Models for Binary Responses

Econ 582 Nonparametric Regression

Statistical Properties of Numerical Derivatives

Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests

Nonparametric estimation of extreme risks from heavy-tailed distributions

Universidade de Vigo

Local Linear Estimation with Censored Data

Bootstrap, Jackknife and other resampling methods

ECON 721: Lecture Notes on Nonparametric Density and Regression Estimation. Petra E. Todd

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.

GOODNESS-OF-FIT TEST FOR RANDOMLY CENSORED DATA BASED ON MAXIMUM CORRELATION. Ewa Strzalkowska-Kominiak and Aurea Grané (1)

Linear Regression Model. Badr Missaoui

Supplementary Materials for Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach

Testing for Rank Invariance or Similarity in Program Evaluation: The Effect of Training on Earnings Revisited

Regression model with left truncated data

Multivariate random variables

Transcription:

Error distribution function for parametrically truncated and censored data Géraldine LAURENT Jointly with Cédric HEUCHENNE QuantOM, HEC-ULg Management School - University of Liège Friday, 14 September 2012

The doctor Anders Green et al (1981) studied the mortality of diabetics in the county of Fyn (Denmark) between the 1 July 1973 and 1 January 1982. For these data, we have the survival time of diabetic patient but this time can be partially observed, the age at the diabetic diagnosis, the sex for each patient.

Duration between the diagnosis and the study end (in years) 70 60 50 40 30 20 10 Female diabetics Censored Observed 0 0 20 40 60 80 Age at the diagnosis (in years) Duration between the diagnosis and the study end (in years) 70 60 50 40 30 20 10 Male diabetics Censored Observed 0 0 20 40 60 80 Age at the diagnosis (in years)

Estimation Asymptotic properties Data Analysis

Estimation

We consider the nonparametric regression model where Y is the response variable X is the covariate Y = m(x ) + σ(x )ε m( ) = E[Y ] and σ 2 ( ) = Var[Y ] are unknown smooth functions, ε is independent of X, with E[ε] = 0 and Var[ε] = 1.

The particularity of (X, Y ) is that (X, Y ) is obtained from cross-sectional sampling Y is subject to right censoring. We study the variable Y delimited by T Y C where T is the truncation variable C is the censoring variable.

Real World Time We use as notation F for cdf

Real World Truncation Time Time We use as notation F for cdf

Real World Truncation Time Time We use as notation F for cdf

Intermediate Observed World Y1 C2 Y3 Y4 C5 C6 Truncation Time Time We use as notation H for cdf, n the sample size

Observed World Y1 Y3 Y4 Truncation Time Time We use as notation H for cdf

We want to estimate the error distribution F ε (e) = IP(ε e) with (X, Y ) satisfying T Y C where the distribution F T X is a parametric distribution the distribution F C T X is completely unknown

We suppose that the variables Y and T are independent, conditionally on X for each value x, the support of F Y X ( x) is included into the support of F T X ( x) the lower bound of the T support is zero the variables (T, Y ) and C T are independent, conditionally on T Y, X

We have H X,Y (x, y) = IP(X x, Y y T Y C) Z Z = (E[w(X, Y )]) 1 w(r, s)df X,Y (r, s), r x the weight function w(x, y) is defined by w(x, y) = Z t y s y 1 GC T X (y t x) df T X (t x) where G C T X (z x) = IP(C T z X = x, T Y ).

We easily obtain F X,Y (x, y) = E[w(X, Y )] Therefore, we have Y m(x ) F ε (e) = IP σ(x ) = = ZZ (x,y): ZZ (x,y): y m(x) e σ(x) Z y m(x) e σ(x) u x e Z v y df X,Y (x, y) dh X,Y (u, v). w(u, v) E[w(X, Y )] dh X,Y (x, y) w(x, y)

Thus, the estimator is with ˆF ε (e) = Ê[w(X, Y )] M nx i=1 ˆε i = Y i ˆm(X i ), M = ˆσ(X i ) Ê[w(X, Y )] = 1 M nx i=1 I { ˆε i e, i = 1} ŵ(x i, Y i ) nx i ŵ(x i, Y i ) i, i=1! 1 where the functions ˆm( ), ˆσ( ) and ŵ(, ) are nonparametric estimators.

For G C T X (t x), we use the Beran (1981) estimator defined by Y W i (x, h n ) Ĝ C T X (t x) = 1 1 P nj=1 W j (x, h n )I {Z j Z i } where Z i t, i =0 Z i = min(c i T i, Y i T i ) and i = I {Y i C i } W i (x, h n ) = K x Xi hn Š P n j=1 K x Xj K is a kernel function hn Š are the Nadaraya-Watson weights h n is a bandwidth sequence tending to 0 when n => ŵ(x, y) = Z t y 1 ĜC T X (y t x) df T X (t x) Œ

The estimators of m( ) and σ( ) are given by ˆm(x) = P ni=1 W i (x,h n)y i i ŵ(x,y i ) P ni=1 W i (x,h n) i ŵ(x,y i ), ˆσ 2 (x) = P ni=1 W i (x,h n) i (Y i ˆm(x)) 2 ŵ(x,y i ) P ni=1 W i (x,h n) i ŵ(x,y i ), extension of the estimators in de Uña-Alvarez and Iglesias-Pérez (2010), described in Heuchenne-Laurent (2012).

Asymptotic properties

Under some assumptions not mentioned here, we obtain an asymptotic representation ˆF ε (e) F ε (e) = nx i=1 uniformly in < e < where Γ(Z i, Y i, X i, i, e) + o p (n 1 2 ) Γ(Z i, Y i, X i, i, e) = E[w(X, Y )] w(x i, Y i ) (I {ε i e, i = 1} F ε (e)) + f ε(e)f X (X i ) Ψ(Z i, Y i, X i, i, e) σ(x i ) for a given function Ψ.

Intermediate results used for the proof Technical lemma sup <e< n 1 nx i=1 E[w(X, Y )] w(x i, Y i ) I {ˆε i e, i = 1} E[w(X, Y )] w(x i, Y i ) I {ε i e, i = 1} IP (ˆε e χ n ) + IP (ε e) = o p n 1 2 Asymptotic representation of ˆm(x) m(x) Asymptotic representation of ˆσ(x) σ(x)

Under the assumptions of the previous theorem, the process n 1 2 ( ˆFε (e) F ε (e)) where < e < converges weakly to a zero-mean Gaussian process Ω(e) with a complex covariance function.

Data Analysis

Between 1987 and 1997 the Spanish Institute for Statistics studied the unemployment of active people, and more especially the one of married women. For these data, we note that the time of unemployment will not be completely observed, the age of the woman acts on the future job.

200 180 Censored Observed 160 Unemployment duration (in months) 140 120 100 80 60 40 20 0 0 100 200 300 400 500 600 700 800 900 1000 Woman age (in months)

For the real data, we suppose that the number of time periods is equal to 1009 but only 446 aren t censored. the distribution of T is a uniform one (Wang, 1991); the variable C is defined by C = T + τ where τ is a constant equal to 18 months; Because of the definition of the censoring variable, the weight function is simply defined by w(x, y) = Z y 0 y τ df T X (t x) The Bootstrap approximation gives the value of 70 months as the optimal bandwidth.

Representation of ˆF Y X for various ages. 1 0.9 0.8 Cumulative distribution function 0.7 0.6 0.5 0.4 0.3 0.2 0.1 cdf of Y X on 20 years cdf of Y X on 35 years cdf of Y X on 50 years 0 0 20 40 60 80 100 120 140 160 180 Unemployment time (in months)

For the real data, we suppose that the number diabetics is equal to 716 for female patients and 783 for male patients but only 241 and 256 are not censored for the female diabetics and the male diabetics. for each sex, the distribution of T X = x is supposed to be a gamma one (Wang, 1991) with a decreasing linear function for the mean and a constant function for the variance; the variable C T is supposed to be a variable dependant of the covariate for each sex. The optimal bandwidth obtained by bootstrap approximation is equal to 26 years for the female population and 28 years for the male population.

Representation of Ŝ Y X for various ages. 1 0.9 0.8 0.7 Est of F Y X ( X=15) female patient Est of F Y X ( X=30) female patient Est of F Y X ( X=50) female patient Est of F Y X ( X=15) male patient Est of F Y X ( X=30) male patient Est of F Y X ( X=50) male patient Survival function 0.6 0.5 0.4 0.3 0.2 0.1 0 0 10 20 30 40 50 60 70 Time interval between the age at diagnosis and the death (in years)

Thank you for your attention

BERAN, R. (1981): Nonparametric regression with randomly censored survival data. Technical Report, University of California, Berkeley. de UNA-ALVAREZ, J., IGLESIAS-PEREZ, M.C. (2010): Nonparametric estimation of a conditional distribution from length-biased data. Annals of the Institute of Statistical Mathematics 62, 323-341. HEUCHENNE, C., LAURENT, G. (2012): Nonparametric estimation of conditional moments with right-censored election biased data. Submission Journal of Statistical Planning and Inference. WANG, M.-C. (1991): Nonparametric estimation from cross-sectional survival data. Journal of the American Statistical Association 86, 130-143.