A NOTE ON LAPLACE REGRESSION WITH CENSORED DATA

ROGER KOENKER

Abstract. The Laplace likelihood method for estimating linear conditional quantile functions with right censored data proposed by Bottai and Zhang (2010) is demonstrated to be highly non-robust.

1. Introduction

Bottai and Zhang (2010) have recently proposed a method of estimating linear conditional quantile functions for right censored data based on an asymmetric Laplace likelihood formulation. They assert that their procedure is both more accurate and faster than prior proposals by Portnoy (2003) and Peng and Huang (2008). The objective of this note is to demonstrate that neither of these claims is correct.

2. Inconsistency of the Bottai and Zhang Estimator

The flaw in the proposed estimator is immediately apparent from consideration of the simplest median regression setting, where, in effect, it is assumed that the event times, $T_i$, conditional on covariate vector, $x_i$, arise from the double exponential (Laplace) density,

$$ f(t \mid x) = \frac{1}{4\sigma} \exp\left( -|t - x^\top \beta| / 2\sigma \right), $$

with corresponding distribution function,

$$ F(t \mid x) = \frac{1}{2} \left( 1 + \operatorname{sgn}(t - x^\top \beta) \left[ 1 - \exp\left( -|t - x^\top \beta| / 2\sigma \right) \right] \right). $$

For right censored data with $Y_i = T_i \wedge C_i$ and censoring indicator, $\delta_i = I(T_i < C_i)$, arising from such a parametric model we have the log likelihood,

$$ \ell_n(\beta, \sigma) = \sum_{i=1}^{n} \left\{ \delta_i \log f(y_i \mid x_i) + (1 - \delta_i) \log \bar F(y_i \mid x_i) \right\}, $$

where $\bar F = 1 - F$ denotes the survival function, and we can optimize with respect to $(\beta, \sigma)$. Optimizing leads to a weighted quantile regression procedure with weights depending upon the estimated scale parameter $\hat\sigma$. This weighting is asymptotically the same as that introduced by Portnoy (2003) provided the Laplacian distributional assumption holds. In the uncensored case, so that $\delta_i \equiv 1$, it is easily seen that the estimation of $\beta$ is independent of $\sigma$, reducing to conventional median regression, and $\hat\sigma$ becomes simply the mean absolute error of the median regression fit. Thus, the Laplacian assumption is harmless.
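As a minimal sketch of the censored Laplace log likelihood described above, the following Python evaluates the density, the distribution function, and the objective for given $(\beta, \sigma)$. The function names (`laplace_logpdf`, `laplace_cdf`, `laplace_logsf`, `censored_loglik`) are hypothetical and are not taken from the Bottai-Zhang supplement or from quantreg. Censored observations are assumed to contribute the log survival probability, $\log(1 - F(y_i \mid x_i))$, as is standard for right censoring.

```python
import numpy as np


def laplace_logpdf(t, mu, sigma):
    """log f(t | x) for f = exp(-|t - mu| / (2 sigma)) / (4 sigma), mu = x'beta."""
    r = np.asarray(t, dtype=float) - mu
    return -np.log(4.0 * sigma) - np.abs(r) / (2.0 * sigma)


def laplace_cdf(t, mu, sigma):
    """F(t | x) = (1/2) (1 + sgn(r) [1 - exp(-|r| / (2 sigma))]), r = t - mu."""
    r = np.asarray(t, dtype=float) - mu
    return 0.5 * (1.0 + np.sign(r) * (1.0 - np.exp(-np.abs(r) / (2.0 * sigma))))


def laplace_logsf(t, mu, sigma):
    """log(1 - F(t | x)), written piecewise to avoid log(0) far in the right tail."""
    r = np.asarray(t, dtype=float) - mu
    z = np.abs(r) / (2.0 * sigma)
    upper = np.log(0.5) - z                 # r > 0:  1 - F = exp(-z) / 2
    lower = np.log1p(-0.5 * np.exp(-z))     # r <= 0: 1 - F = 1 - exp(-z) / 2
    return np.where(r > 0, upper, lower)


def censored_loglik(beta, sigma, X, y, delta):
    """Sum of delta_i log f(y_i | x_i) + (1 - delta_i) log(1 - F(y_i | x_i))."""
    mu = np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)
    log_f = laplace_logpdf(y, mu, sigma)
    log_s = laplace_logsf(y, mu, sigma)
    return float(np.sum(np.where(np.asarray(delta) == 1, log_f, log_s)))
```

With every $\delta_i = 1$ the objective reduces to $-n \log(4\sigma) - \sum_i |y_i - x_i^\top \beta| / 2\sigma$, so the maximizing $\beta$ minimizes the sum of absolute residuals whatever the value of $\sigma$. Once some $\delta_i = 0$, the survival terms depend on $\beta$ and $\sigma$ jointly and this separation is lost.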
However, in the censored case, the estimation of $\beta$ and $\sigma$ becomes inextricably linked, so when the distributional assumption fails, the weights produced by the Laplace likelihood are no longer correct and the procedure fails to yield a consistent estimator. Note that this is the case even when the errors are Laplace if one is estimating a quantile other than the median with the asymmetric version of the Laplace likelihood, and in non-iid error conditions the situation is even more problematic since a single scale parameter based on the mean absolute error no longer provides a reliable way to estimate the weighting process. This failing is apparent in the simulation experiments reported in the next section.

Version: March 23, 2011. This note was partially supported by NSF grant SES-8-6. All code required to reproduce the results reported here is available in R from the author by request, or from the url http://www.econ.uiuc.edu/~roger/research/crq/crq.html.

3. Some Simulation Evidence

Bottai and Zhang write, "On the whole, we found that the performance of Portnoy's and Peng and Huang's methods was poorer than that of Wang and Wang (2009)... In all the simulation scenarios considered, Laplace regression was consistently better than Wang and Wang's method in terms of bias and [coverage probability]." To reevaluate the finite sample performance of the procedure we have tried to reproduce some of the simulation results of Bottai and Zhang (2010). This is somewhat difficult, since the simulation design is incompletely described; however, we believe that the setting described below is quite similar to theirs. We focus on the comparison with Portnoy's estimator.

We consider nine distinct models. There are three specifications of the covariate effects,

[M1] $Q_{T_i}(\tau \mid x_i) = 2 + 6 x_{1i} + F_u^{-1}(\tau)$
[M2] $Q_{T_i}(\tau \mid x_i) = 1 + 2 x_{2i} + 3 x_{3i} + F_u^{-1}(\tau)$
[M3] $Q_{T_i}(\tau \mid x_i) = 2 + 6 x_{1i} + x_{1i} F_u^{-1}(\tau)$

and there are three choices of the quantile function, $F_u^{-1}$: Standard Gaussian, Gumbel with scale π/ 3, and $t_3$ with scale 3, designated F1, F2, and F3 respectively in Table 1. The covariates are generated iid from standard uniform, standard Gaussian and Bernoulli (p =.) distributions respectively. The precise nature of the censoring used in the Bottai-Zhang simulations is somewhat unclear, but they comment that the censoring variable, $C_i$, is uniformly distributed on $[0, a]$ with $a$ chosen for each model to make the censoring rate "approximately equal to and 4%" [sic].
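The data generating process just described can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used for the reported results: `simulate_censored` and `draw_u` are hypothetical names, the model coefficients are copied from the displayed specifications as they stand, the lower endpoint of the censoring interval is taken to be 0, and the censoring bound `a`, the Bernoulli probability `p`, and the error sampler (including its scale) are left as arguments rather than fixed.

```python
import numpy as np


def simulate_censored(model, n, a, p, draw_u, rng):
    """Draw (X, y, delta) from design M1, M2 or M3 with uniform right censoring.

    model  : "M1", "M2" or "M3"
    a      : upper endpoint of the uniform censoring distribution (caller-supplied)
    p      : success probability of the Bernoulli covariate x3 (caller-supplied)
    draw_u : callable (rng, n) -> array of n iid errors u_i (caller-supplied)
    rng    : numpy.random.Generator
    """
    x1 = rng.uniform(0.0, 1.0, size=n)             # standard uniform covariate
    x2 = rng.standard_normal(n)                    # standard Gaussian covariate
    x3 = rng.binomial(1, p, size=n).astype(float)  # Bernoulli covariate
    u = np.asarray(draw_u(rng, n), dtype=float)

    if model == "M1":
        t = 2.0 + 6.0 * x1 + u
    elif model == "M2":
        t = 1.0 + 2.0 * x2 + 3.0 * x3 + u
    elif model == "M3":
        t = 2.0 + 6.0 * x1 + x1 * u                # error scale varies with x1
    else:
        raise ValueError("model must be 'M1', 'M2' or 'M3'")

    c = rng.uniform(0.0, a, size=n)                # assumed lower endpoint 0
    y = np.minimum(t, c)                           # observed response, T ∧ C
    delta = (t < c).astype(int)                    # 1 if the event time is observed
    X = np.column_stack([np.ones(n), x1, x2, x3])  # intercept plus three covariates
    return X, y, delta
```

All four columns of `X` are returned for every model, matching the four coefficients estimated in each case; covariates that do not enter a given model simply have true coefficient zero.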
In the simulations reported here, this same uniform censoring is employed with a = 3 for models M1 and M3, and a = for model M2. This yields between 1/4 and 1/3 censoring in each of the nine models investigated. In all cases we estimate four coefficients for the three quartiles and compare performance of the Bottai-Zhang estimator with that of Portnoy's crq estimator as implemented in the quantreg R package of Koenker (2010). Prior comparisons reported in Koenker (2008) indicate that the Peng and Huang (2008) estimator performs very similarly to that of Portnoy (2003). For the implementation of the Bottai-Zhang estimator we have used the code provided in the electronic supplement to their published paper.

Before considering all nine models we conducted a preliminary exercise with the simplest of the nine, model M1 with Gaussian error. Figures 1 and 2 report bias and (scaled) root mean squared error for the Bottai-Zhang estimator and the Portnoy estimator, for four sample sizes: n ∈ {1,, 1, 1, }. There are four parameters for each of the fitted models, and we evaluate performance for the three quartile estimates. Figure 1 reports the absolute value of the bias scaled by 1, for the two estimators. Figure 2 reports root mean squared errors scaled by the square root of sample size $n$. In all cases 1 replications were done.
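The two performance measures can be computed from a matrix of replicated estimates along the following lines. This is a sketch with a hypothetical function name (`bias_and_scaled_rmse`); it returns the unscaled absolute bias, leaving any multiplicative scaling used for display to the caller, and the root mean squared error multiplied by $\sqrt{n}$.

```python
import numpy as np


def bias_and_scaled_rmse(estimates, truth, n):
    """Per-coefficient |bias| and sqrt(n) * RMSE over simulation replications.

    estimates : array of shape (R, k), one row of k coefficient estimates
                per replication
    truth     : array of shape (k,), the true coefficients at the quantile
                being estimated
    n         : sample size used in each replication
    """
    estimates = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    err = estimates - truth                      # broadcast over replications
    abs_bias = np.abs(err.mean(axis=0))          # |mean error| per coefficient
    rmse = np.sqrt((err ** 2).mean(axis=0))      # root mean squared error
    return abs_bias, np.sqrt(n) * rmse
```

For an estimator converging at the usual $\sqrt{n}$ rate the scaled RMSE should stay roughly flat as $n$ grows, whereas a bias that does not vanish makes it increase with $n$.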

As can be seen in Figure 1, the bias of the Portnoy estimate is quite small and stable over the range of sample sizes considered; however, the Bottai-Zhang estimator has quite substantial bias, somewhat attenuated by larger sample sizes, but remaining quite substantial even at the n = 1, sample size. The impact of this bias effect is also apparent in Figure 2, where root mean squared errors, particularly for the nonzero coefficients $\beta_1$ and $\beta_2$ for the first and third quartiles, are increasing in sample size. Note that because the root mean squared errors are scaled by the square root of sample size they should be quite stable with respect to $n$, and we see that this is true for the Portnoy crq estimator, but it is not the case for the Bottai-Zhang Laplace likelihood estimator.

Figure 1. Absolute Values of Bias (scaled by 1), plotted against log(n), for the Portnoy (magenta filled circles) and Bottai-Zhang (blue open circles) censored quantile regression estimators.

Figure 2. Root mean squared error (scaled by $\sqrt{n}$), plotted against log(n), for the Portnoy (magenta filled circles) and Bottai-Zhang (blue open circles) censored quantile regression estimators.

In Table 1 we report comparative performance of the Bottai-Zhang (Laplace likelihood) estimator relative to the Portnoy crq estimator for all nine models. In all cases n = and 1 replications were done. Entries in the table represent the ratio of root mean squared errors for the two methods, with entries greater than one indicating superior performance for the Portnoy estimator. As is evident from the table, the Portnoy method dominates the Bottai-Zhang estimator in all of the cases investigated.

                         M1                      M2                      M3
Coefficient        F1      F2      F3      F1      F2      F3      F1      F2      F3
τ = 0.25
  $\hat\beta_0$    1.19    1.      3.41    1.41    1.34    6.29    1.21    1.19    6.17
  $\hat\beta_1$    1.      1.17    1.94    1.37    1.37    3.7     1.41    1.24    3.26
  $\hat\beta_2$    1.12    1.2     2.78    1.2     1.9     4.87    1.26    1.7     4.93
  $\hat\beta_3$    1.28    1.9     3.46    1.      1.19    .7      1.3     1.16    6.48
τ = 0.50
  $\hat\beta_0$    1.12    1.31    4.4     1.2     2.6     7.34    1.37    1.79    .9
  $\hat\beta_1$    1.7     1.31    2.8     1.17    2.1     2.1     1.27    1.9     3.1
  $\hat\beta_2$    1.      1.1     3.64    1.      1.44    3.13    1.2     1.      4.4
  $\hat\beta_3$    1.8     1.2     2.72    1.22    1.6     3.2     1.3     1.64    4.6
τ = 0.75
  $\hat\beta_0$    2.4     3.82    16.1    2.6     3.3     19.8    3.7     6.1     21.78
  $\hat\beta_1$    2.      3.43    4.82    1.83    3.38    .84     2.71    4.61    6.42
  $\hat\beta_2$    2.26    2.1     9.14    1.42    1.8     14.33   3.73    2.91    .98
  $\hat\beta_3$    1.8     3.7     9.92    1.6     2.18    9.6     2.74    .7      12.71

Table 1. Ratio of Root Mean Squared Errors: Table entries report the ratio of the RMSE for the Bottai-Zhang (Laplace likelihood) estimator to the corresponding Portnoy crq estimator, so entries larger than one indicate superior performance for the Portnoy method. All entries are based on 1 replications.

4. Computational Speed

Finally, we come to the Bottai-Zhang claim that their estimator, as implemented in R with the Nelder-Mead option of the R function optim, is faster than the Portnoy estimator. This is truly puzzling, since in our R timings of a typical instance of the simulation, estimation of the three quartiles is roughly 4 times faster for the Portnoy method than the Bottai-Zhang method when the sample size is, times faster when n = 1, and even at n = 1, is slightly faster despite the fact that it has estimated the model at more than 1 τ's rather than only three. These timings were made without any bootstrapping.

5. Conclusions

We cannot offer any explanation for what went wrong; suffice it to say that both of the central claims of Bottai and Zhang (2010) fail to stand up to closer examination. As for the heuristic interpretation offered for the Laplace likelihood approach, it is worth mentioning again that in the uncensored case their method reduces to conventional quantile regression, so the scale parameter estimation is harmless, but it should also be emphasized that the proposed Nelder-Mead algorithm is a poor alternative to standard linear programming methods. In the censored case, the scale parameter of the Laplace likelihood is not so innocuous: it is unable to adapt to the myriad possibilities that give rise to the linear conditional quantile model, and consequently its performance is inevitably poor.

References

Bottai, M., and J. Zhang (2010): "Laplace regression with censored data," Biometrical Journal, 52, 487-503.
Koenker, R. (2008): "Censored Quantile Regression Redux," Journal of Statistical Software, 27, 1-14.
Koenker, R. (2010): "quantreg: An R Package for Quantile Regression," Version 4.7, http://CRAN.R-project.org/package=quantreg.
Peng, L., and Y. Huang (2008): "Survival Analysis with Quantile Regression Models," Journal of the American Statistical Association, 103, 637-649.
Portnoy, S. (2003): "Censored Quantile Regression," Journal of the American Statistical Association, 98, 1001-1012.
Wang, H. J., and L. Wang (2009): "Locally Weighted Censored Quantile Regression," Journal of the American Statistical Association, 104, 1117-1128.