DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS

ISSN 1440-771X

AUSTRALIA

DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS

An Improved Method for Bandwidth Selection When Estimating ROC Curves

Peter G Hall and Rob J Hyndman

Working Paper 11/2002

An improved method for bandwidth selection when estimating ROC curves

Peter G. Hall$^1$ and Rob J. Hyndman$^{1,2}$

$^1$ Centre for Mathematics and its Applications, Australian National University, Canberra ACT 0200, Australia.
$^2$ Department of Econometrics and Business Statistics, Monash University, VIC 3800, Australia.
Corresponding author: Rob Hyndman (Rob.Hyndman@monash.edu.au).

13 September 2002

Abstract: The receiver operating characteristic (ROC) curve is used to describe the performance of a diagnostic test which classifies observations into two groups. We introduce a new method for selecting bandwidths when computing kernel estimates of ROC curves. Our technique allows for interaction between the distributions of each group of observations and gives substantial improvement in MISE over other proposed methods, especially when the two distributions are very different.

Key words: Bandwidth selection; binary classification; kernel estimator; ROC curve.

JEL classification: C12, C13, C14.

1 INTRODUCTION

A receiver operating characteristic (ROC) curve can be used to describe the performance of a diagnostic test which classifies individuals into either group $G_1$ or group $G_2$. For example, $G_1$ may contain individuals with a disease and $G_2$ those without the disease. We assume that the diagnostic test is based on a continuous measurement $T$ and that a person is classified as $G_1$ if $T \ge \delta$ and $G_2$ otherwise. Let $G(t) = \Pr(T \le t \mid G_1)$ and $F(t) = \Pr(T \le t \mid G_2)$ denote the distribution functions of $T$ for each group. (Thus $F$ is the specificity of the test and $1 - G$ is the sensitivity of the test.) Then the ROC curve is defined as

$$R(p) = 1 - G(F^{-1}(1 - p)), \qquad 0 \le p \le 1.$$

Let $\{X_1, \dots, X_m\}$ and $\{Y_1, \dots, Y_n\}$ denote independent samples of independent data with distribution functions $F$ and $G$ (that is, from $G_2$ and $G_1$), respectively, and let $\hat F$ and $\hat G$ denote their empirical distribution functions. Then a simple estimator of $R(p)$ is

$$\hat R(p) = 1 - \hat G(\hat F^{-1}(1 - p)),$$

although this has the obvious weakness of being a step function while $R(p)$ is smooth.

Zou, W.J. Hall & Shapiro (1997) and Lloyd (1998) proposed a smooth kernel estimator of $R(p)$ as follows. Let $K(x)$ be a continuous density function and $L(x) = \int_{-\infty}^{x} K(u)\,du$. The kernel estimators of $F$ and $G$ are

$$\tilde F(t) = \frac{1}{m} \sum_{i=1}^{m} L\Big(\frac{t - X_i}{h_1}\Big) \quad\text{and}\quad \tilde G(t) = \frac{1}{n} \sum_{i=1}^{n} L\Big(\frac{t - Y_i}{h_2}\Big).$$
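The kernel distribution estimators above can be evaluated directly, and plugging them into the definition of $R(p)$ gives a smooth ROC estimate. The following is a minimal sketch rather than the authors' implementation. It assumes a Gaussian kernel (so that $L$ is the standard normal distribution function), inverts $\tilde F$ by bisection, and uses arbitrary illustrative bandwidths of 0.4; the function names `kernel_cdf`, `kernel_quantile` and `kernel_roc` are hypothetical.

```python
import math
import random

def gaussian_cdf(x):
    """L(x): integral of the standard normal density K up to x (assumed kernel)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def kernel_cdf(t, sample, h):
    """Smoothed distribution estimate: the mean of L((t - X_i) / h)."""
    return sum(gaussian_cdf((t - s) / h) for s in sample) / len(sample)

def kernel_quantile(q, sample, h, tol=1e-10):
    """Invert the increasing kernel distribution estimate by bisection."""
    lo = min(sample) - 10.0 * h
    hi = max(sample) + 10.0 * h
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kernel_cdf(mid, sample, h) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def kernel_roc(p, x, y, h1, h2):
    """Smooth ROC estimate 1 - G~(F~^{-1}(1 - p)) for 0 < p < 1."""
    threshold = kernel_quantile(1.0 - p, x, h1)
    return 1.0 - kernel_cdf(threshold, y, h2)

# Illustrative data only: X has distribution F = N(0,1), Y has distribution G = N(1,1).
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(100)]
y = [random.gauss(1.0, 1.0) for _ in range(100)]
curve = [kernel_roc(k / 10.0, x, y, 0.4, 0.4) for k in range(1, 10)]
```

Because both smoothed distribution functions are increasing, the resulting curve is nondecreasing in $p$, unlike the step-function estimator it is smooth, and its shape depends on the two bandwidths.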

For the sake of simplicity we have used the same kernel for each distribution, although of course this is not strictly necessary. The kernel estimator of $R(p)$ is then

$$\tilde R(p) = 1 - \tilde G(\tilde F^{-1}(1 - p)).$$

Qiu & Le (2001) and Peng & Zhou (2002) have discussed estimators alternative to $\tilde R(p)$.

Lloyd and Yong (1999) were the first to suggest empirical methods for choosing bandwidths $h_1$ and $h_2$ of appropriate size for $\tilde R(p)$, but they treated the problem as one of estimating $F$ and $G$ separately, rather than of estimating the ROC function $R$. We shall show that by adopting the latter approach one can significantly reduce the surplus of mean squared error over its theoretically minimum level. This is particularly true in the practically interesting case where $F$ and $G$ are quite different. In the present paper we introduce and describe a bandwidth choice method which achieves these levels of performance.

A related problem, which leads to bandwidths of the correct order but without the correct constants, is that of smoothing in distribution estimation. See, for example, Mielniczuk, Sarda and Vieu (1989), Sarda (1993), Altman and Léger (1995), and Bowman, Hall and Prvan (1998).

2 METHODOLOGY

2.1 Optimality criterion and optimal bandwidths

If the tails of the distribution $F$ are much lighter than those of $G$ then the error of an estimator of $F$ in its tail can produce a relatively large contribution to the error of the corresponding estimator of $G(F^{-1})$. As a result, if the $L_2$ performance criterion

$$\gamma_1(S) = \int_S E\big[\tilde G(\tilde F^{-1}(p)) - G(F^{-1}(p))\big]^2\,dp \tag{2.1}$$

for a set $S \subseteq [0, 1]$, is not weighted in an appropriate way then choice of the optimal bandwidth in terms of $\gamma_1(S)$ can be driven by relative tail properties of $f$ and $g$. Formula (A.1) in the appendix will provide a theoretical illustration of this phenomenon. We suggest that the weight be chosen equal to $f(F^{-1})$, so that the $L_2$ criterion becomes

$$\gamma(S) = \int_S E\big[\tilde G(\tilde F^{-1}(p)) - G(F^{-1}(p))\big]^2 f(F^{-1}(p))\,dp. \tag{2.2}$$
We shall show in the appendix that for this definition of mean integrated squared error,

$$\gamma(S) \sim \beta(S) \equiv \int_{F^{-1}(S)} \Big\{ E\big[\tilde F(t) - F(t)\big]^2 g^2(t) + E\big[\tilde G(t) - G(t)\big]^2 f^2(t) \Big\}\,dt \tag{2.3}$$

where $F^{-1}(S)$ denotes the set of points $F^{-1}(p)$ with $p \in S$. Note particularly that the right-hand side is additive in the mean squared errors $E(\tilde F - F)^2$ and $E(\tilde G - G)^2$, so that in principle $h_1$ and $h_2$ may be chosen individually, rather than together. That is, if $h_1$ and $h_2$ minimise

$$\beta_1(S) = \int_{F^{-1}(S)} E\big[\tilde F(t) - F(t)\big]^2 g^2(t)\,dt \quad\text{and}\quad \beta_2(S) = \int_{F^{-1}(S)} E\big[\tilde G(t) - G(t)\big]^2 f^2(t)\,dt,$$

respectively, then they provide asymptotic minimisation of $\gamma(S)$.

To express optimality we take $F^{-1}(S)$ equal to the whole real line, obtaining the global criterion $\beta(h_1, h_2) = \beta_1(h_1, h_2) + \beta_2(h_1, h_2)$ where

$$\beta_1(h_1, h_2) = \int E\big[\tilde F(t) - F(t)\big]^2 g^2(t)\,dt \quad\text{and}\quad \beta_2(h_1, h_2) = \int E\big[\tilde G(t) - G(t)\big]^2 f^2(t)\,dt. \tag{2.4}$$

Suppose $K$ is a compactly supported and symmetric probability density, and $f'$ is bounded, continuous and square-integrable. Then arguments similar to those of Azzalini (1981) show that

$$E(\tilde F - F)^2 = m^{-1}\big[(1 - F)F - h_1 \rho f\big] + \big(\tfrac12 \rho_2 h_1^2 f'\big)^2 + o\big(m^{-1} h_1 + h_1^4\big),$$

where

$$\rho = \int \{1 - L(u)\}\,L(u)\,du, \qquad \rho_2 = \int u^2 K(u)\,du.$$

Of course, an analogous formula holds for $E(\tilde G - G)^2$, and so the formulae at (2.4) admit simple asymptotic approximations:

$$\beta_1 = m^{-1} \int (1 - F)F\,g^2 + \delta_1 + o\big(m^{-1} h_1 + h_1^4\big), \qquad \beta_2 = n^{-1} \int (1 - G)G\,f^2 + \delta_2 + o\big(n^{-1} h_2 + h_2^4\big),$$

where

$$\delta_1 = -m^{-1} h_1 \rho \int f g^2 + \tfrac14 \rho_2^2 h_1^4 \int (f' g)^2 \tag{2.5}$$

and

$$\delta_2 = -n^{-1} h_2 \rho \int f^2 g + \tfrac14 \rho_2^2 h_2^4 \int (f g')^2. \tag{2.6}$$

The asymptotically optimal bandwidths are therefore $h_1 = m^{-1/3} c(f, g)$ and $h_2 = n^{-1/3} c(g, f)$, where

$$c(f, g)^3 = \Big\{ \rho \int f(u)\,g^2(u)\,du \Big\} \Big/ \Big\{ \rho_2^2 \int \big[f'(u)\,g(u)\big]^2\,du \Big\}.$$

A conventional plug-in rule for choosing $h_1$ and $h_2$ may be developed directly from these formulae. However, it requires selection of pilot bandwidths for estimating $f$, $g$ and their derivatives. The technique suggested in the next section avoids that difficulty.
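To make the constants in the optimal-bandwidth formula concrete, the sketch below evaluates the two kernel constants and $c(f, g)$ numerically when the densities are known. It is an illustration under stated assumptions: the biweight kernel is chosen here only because it is compactly supported and symmetric, the densities are taken to be $N(0,1)$ and $N(1,1)$, the sample size of 50 is arbitrary, and the names `integrate`, `c_cubed`, `rho` and `rho2` are hypothetical.

```python
import math

def biweight_pdf(u):
    """Biweight kernel K (an assumed choice of compactly supported kernel)."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0

def biweight_cdf(u):
    """L(u): integral of the biweight kernel up to u."""
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.5 + 15.0 / 16.0 * (u - 2.0 * u ** 3 / 3.0 + u ** 5 / 5.0)

def integrate(func, a, b, steps=20000):
    """Composite midpoint rule on [a, b]."""
    width = (b - a) / steps
    return width * sum(func(a + (k + 0.5) * width) for k in range(steps))

# Kernel constants: integral of (1 - L)L, and second moment of K.
rho = integrate(lambda u: (1.0 - biweight_cdf(u)) * biweight_cdf(u), -1.0, 1.0)
rho2 = integrate(lambda u: u * u * biweight_pdf(u), -1.0, 1.0)

def normal_pdf(t, mean):
    return math.exp(-0.5 * (t - mean) ** 2) / math.sqrt(2.0 * math.pi)

def normal_pdf_deriv(t, mean):
    return -(t - mean) * normal_pdf(t, mean)

def c_cubed(dens, dens_deriv, other, lo=-10.0, hi=10.0):
    """rho * int(dens * other^2) divided by rho2^2 * int((dens' * other)^2)."""
    numerator = rho * integrate(lambda u: dens(u) * other(u) ** 2, lo, hi)
    denominator = rho2 ** 2 * integrate(lambda u: (dens_deriv(u) * other(u)) ** 2, lo, hi)
    return numerator / denominator

f = lambda t: normal_pdf(t, 0.0)
f_deriv = lambda t: normal_pdf_deriv(t, 0.0)
g = lambda t: normal_pdf(t, 1.0)
g_deriv = lambda t: normal_pdf_deriv(t, 1.0)

c_fg = c_cubed(f, f_deriv, g) ** (1.0 / 3.0)
c_gf = c_cubed(g, g_deriv, f) ** (1.0 / 3.0)
m = n = 50                      # illustrative sample sizes
h1 = m ** (-1.0 / 3.0) * c_fg   # optimal bandwidth for the F estimator
h2 = n ** (-1.0 / 3.0) * c_gf   # optimal bandwidth for the G estimator
```

In this particular pair of densities $g$ is a reflection of $f$, so $c(f, g)$ and $c(g, f)$ coincide; in general the two constants differ, which is how the interaction between the two distributions enters the bandwidths. In practice $f$ and $g$ are unknown, which is the difficulty the plug-in approach faces.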

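2.2 Empirical choice of bandwidth

Let $\tilde f$ and $\tilde g$ denote leave-one-out kernel estimators of $f^2$ and $g^2$, respectively:

$$\tilde f(x \mid h_1) = \frac{2}{m(m-1)h_1^2} \sum_{1 \le i_1 < i_2 \le m} K\Big(\frac{x - X_{i_1}}{h_1}\Big) K\Big(\frac{x - X_{i_2}}{h_1}\Big),$$

$$\tilde g(y \mid h_2) = \frac{2}{n(n-1)h_2^2} \sum_{1 \le i_1 < i_2 \le n} K\Big(\frac{y - Y_{i_1}}{h_2}\Big) K\Big(\frac{y - Y_{i_2}}{h_2}\Big).$$

Let $\hat f_{-i}(x \mid h_1) = \{(m-1)h_1\}^{-1} \sum_{j \ne i} K\{(x - X_j)/h_1\}$, and define $\hat g_{-i}(y \mid h_2)$ analogously, and let $\tilde f_1$ and $\tilde g_1$ denote the kernel estimators of $(f')^2$ and $(g')^2$, respectively:

$$\tilde f_1(x \mid h_1) = \frac{1}{m(m-1)h_1^4} \sum_{i_1=1}^{m} \sum_{i_2=1}^{m} K'\Big(\frac{x - X_{i_1}}{h_1}\Big) K'\Big(\frac{x - X_{i_2}}{h_1}\Big),$$

$$\tilde g_1(y \mid h_2) = \frac{1}{n(n-1)h_2^4} \sum_{i_1=1}^{n} \sum_{i_2=1}^{n} K'\Big(\frac{y - Y_{i_1}}{h_2}\Big) K'\Big(\frac{y - Y_{i_2}}{h_2}\Big).$$

Note that the latter two estimators include all terms whereas the other estimators are leave-one-out estimators. We include the diagonal terms in the estimators of $(f')^2$ and $(g')^2$ as they act like ridge parameters and produce better empirical performance.

Now let

$$\hat\delta(h_1, h_2) = -m^{-1} h_1 \rho\, \frac{1}{m} \sum_{i=1}^{m} \tilde g(X_i \mid h_2) + \tfrac14 \rho_2^2 h_1^4\, \frac{1}{n} \sum_{i=1}^{n} \tilde f_1(Y_i \mid h_1)\, \hat g_{-i}(Y_i \mid h_2) - n^{-1} h_2 \rho\, \frac{1}{n} \sum_{i=1}^{n} \tilde f(Y_i \mid h_1) + \tfrac14 \rho_2^2 h_2^4\, \frac{1}{m} \sum_{i=1}^{m} \tilde g_1(X_i \mid h_2)\, \hat f_{-i}(X_i \mid h_1).$$

We could choose $h_1$ and $h_2$ to minimize $\hat\delta(h_1, h_2)$. To motivate this approach, note that

$$E\{\hat\delta(h_1, h_2)\} = -m^{-1} h_1 \rho \int (E\tilde g)\, f + \tfrac14 \rho_2^2 h_1^4 \int (E\tilde f_1)(E\hat g_{-i})\, g - n^{-1} h_2 \rho \int (E\tilde f)\, g + \tfrac14 \rho_2^2 h_2^4 \int (E\tilde g_1)(E\hat f_{-i})\, f, \tag{2.7}$$

which indicates that $\hat\delta$ is an almost-unbiased approximation to $\delta = \delta_1 + \delta_2$; compare (2.7) with the sum of the terms at (2.5) and (2.6). The relative size of stochastic error may also be shown to be asymptotically negligible. Indeed, if $m \asymp n$ as $n \to \infty$, if $K$ is compactly supported and has a Hölder-continuous derivative, and if $f$ and $g$ are compactly supported and have three bounded derivatives, then $\hat\delta(h_1, h_2)/\delta(h_1, h_2)$ converges to 1 with probability 1, uniformly in $n^{-1+\epsilon} \le h_1, h_2 \le n^{-\epsilon}$ for each $0 < \epsilon < \tfrac12$, as $n \to \infty$.

However, minimizing $\hat\delta(h_1, h_2)$ leads to some numerical instability. Instead, we constrain the minimization so that $h_1 = \tilde\rho h_2$ where $\tilde\rho = \tilde h_1/\tilde h_2$ and $\tilde h_1$ and $\tilde h_2$ are the bandwidths selected for estimating $F$ and $G$ using the plug-in rule proposed by Lloyd and Yong (1999). Minimizing $\hat\delta(h_1, h_2)$ under this constraint provides values of $h_1$ and $h_2$ which are suitable for estimating $\tilde R(p)$.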
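The empirical criterion and its constrained minimisation can be sketched in a few functions. This is an illustration under several assumptions, not the authors' code: the biweight kernel is used because it is compactly supported with a continuous derivative, the pairwise sum over $i_1 < i_2$ is computed through the identity "square of the sum minus sum of squares", the minimisation is a plain grid search, and the bandwidth ratio is passed in as an argument (here 1.0) because the Lloyd and Yong plug-in rule is not reproduced in this paper. All function names are hypothetical.

```python
import random

def K(u):
    """Biweight kernel (assumed choice): compact support, continuous derivative."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0

def dK(u):
    """Derivative of the biweight kernel."""
    return -15.0 / 4.0 * u * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def L(u):
    """Distribution function of the biweight kernel."""
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.5 + 15.0 / 16.0 * (u - 2.0 * u ** 3 / 3.0 + u ** 5 / 5.0)

STEPS = 20000
RHO = sum((1.0 - L(-1.0 + (k + 0.5) * 2.0 / STEPS)) * L(-1.0 + (k + 0.5) * 2.0 / STEPS)
          for k in range(STEPS)) * 2.0 / STEPS   # midpoint rule for int (1 - L)L
RHO2 = 1.0 / 7.0                                 # second moment of the biweight kernel

def sq_density(t, sample, h):
    """Estimate of the squared density at t: pairwise sum over i1 < i2."""
    k = [K((t - s) / h) for s in sample]
    size = len(sample)
    total = sum(k)
    return (total * total - sum(v * v for v in k)) / (size * (size - 1) * h ** 2)

def sq_derivative(t, sample, h):
    """Estimate of the squared density derivative at t, diagonal terms included."""
    size = len(sample)
    total = sum(dK((t - s) / h) for s in sample)
    return total * total / (size * (size - 1) * h ** 4)

def loo_density(i, sample, h):
    """Leave-one-out kernel density estimate evaluated at sample[i]."""
    t = sample[i]
    others = sum(K((t - s) / h) for j, s in enumerate(sample) if j != i)
    return others / ((len(sample) - 1) * h)

def delta_hat(h1, h2, x, y):
    """Empirical criterion: four terms mirroring delta_1 + delta_2."""
    m, n = len(x), len(y)
    t1 = -h1 * RHO / m * sum(sq_density(xi, y, h2) for xi in x) / m
    t2 = 0.25 * RHO2 ** 2 * h1 ** 4 * sum(
        sq_derivative(y[i], x, h1) * loo_density(i, y, h2) for i in range(n)) / n
    t3 = -h2 * RHO / n * sum(sq_density(yi, x, h1) for yi in y) / n
    t4 = 0.25 * RHO2 ** 2 * h2 ** 4 * sum(
        sq_derivative(x[i], y, h2) * loo_density(i, x, h1) for i in range(m)) / m
    return t1 + t2 + t3 + t4

def select_bandwidths(x, y, ratio, grid):
    """Grid search over h2 with h1 constrained to ratio * h2."""
    best = min(grid, key=lambda h2: delta_hat(ratio * h2, h2, x, y))
    return ratio * best, best

# Illustrative data only; the ratio would normally come from pilot bandwidths.
random.seed(1)
x = [random.gauss(0.0, 1.0) for _ in range(60)]
y = [random.gauss(1.0, 1.0) for _ in range(60)]
grid = [0.4 + 0.1 * k for k in range(17)]
h1, h2 = select_bandwidths(x, y, ratio=1.0, grid=grid)
```

Constraining the search to a single ray $h_1 = \tilde\rho h_2$ turns a two-dimensional minimisation into a one-dimensional one, which is one plausible reading of why the constraint tames the numerical instability mentioned above.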

3 SOME SIMULATIONS

We compare the estimates obtained with our bandwidth selection method outlined above to those obtained by Lloyd and Yong (1999) using their plug-in rule. Let

$$W(p) = E\big[\tilde G(\tilde F^{-1}(p)) - G(F^{-1}(p))\big]^2 f(F^{-1}(p)) \tag{3.1}$$

denote mean squared error. Thus, mean integrated squared error, introduced at (2.2), is given by $\gamma(S) = \int_S W(p)\,dp$. The ideal but practically unattainable minimum of $W(p)$, for a nonrandom bandwidth, can be deduced by simulation, and will be denoted by $W_0(p)$. This value will be compared with its analogue, $W_1(p)$, obtained from (3.1) using the values of $h_1$ and $h_2$ chosen using the method outlined in Section 2.2; and with $W_2(p)$, obtained from (3.1) using the values of $h_1$ and $h_2$ chosen using the plug-in procedure suggested by Lloyd and Yong (1999).

In our first example, illustrated in the first panel of Figure 1, we used Lloyd and Yong's (1999) model, where $F$ and $G$ are $N(0, 1)$ and $N(1, 1)$ respectively. In the second example we chose $F$ and $G$ to be more different; $F$ was $N(0, 1)$ and $G$ was an equal mixture of $N(-2, 1)$ and $N(2, 1)$. In both cases our method offers an improvement, which as expected is greater when the distributions are further apart. The areas under the curves represent the increase in $\gamma(S)$ due to bandwidth selection. In these terms our method improves on that of Lloyd and Yong (1999) by 1.% and 8.6%, in the respective examples.

[Figure 1: two panels, titled Example 1 and Example 2; horizontal axis $p$ from 0 to 1; vertical axis $W_i(x) - W_0(x)$.]

Figure 1: Solid lines: $W_1(p) - W_0(p)$. Dashed lines: $W_2(p) - W_0(p)$.
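A Monte Carlo approximation to $W(p)$ at (3.1) for fixed, nonrandom bandwidths can be sketched as follows. It is not the simulation design used for Figure 1, whose details (kernel, sample sizes, number of replications) are not stated here. The sketch assumes the first example's model, a Gaussian kernel, sample sizes of 40, 50 replications and bandwidths of 0.4, all chosen only for illustration; `weighted_mse` and the other names are hypothetical.

```python
import random
from statistics import NormalDist

STD = NormalDist()           # Gaussian kernel (assumed): L is the standard normal CDF
F = NormalDist(0.0, 1.0)     # distribution of the X sample
G = NormalDist(1.0, 1.0)     # distribution of the Y sample

def kernel_cdf(t, sample, h):
    """Smoothed distribution estimate at t."""
    return sum(STD.cdf((t - s) / h) for s in sample) / len(sample)

def kernel_quantile(q, sample, h, tol=1e-8):
    """Invert the kernel distribution estimate by bisection."""
    lo = min(sample) - 10.0 * h
    hi = max(sample) + 10.0 * h
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kernel_cdf(mid, sample, h) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def weighted_mse(p_values, h1, h2, m, n, reps, seed):
    """Monte Carlo estimate of W(p) for each p, with fixed bandwidths."""
    rng = random.Random(seed)
    totals = [0.0] * len(p_values)
    for _ in range(reps):
        x = [rng.gauss(F.mean, F.stdev) for _ in range(m)]
        y = [rng.gauss(G.mean, G.stdev) for _ in range(n)]
        for k, p in enumerate(p_values):
            estimate = kernel_cdf(kernel_quantile(p, x, h1), y, h2)
            truth = G.cdf(F.inv_cdf(p))
            totals[k] += (estimate - truth) ** 2
    # Multiply the averaged squared error by the weight f(F^{-1}(p)).
    return [F.pdf(F.inv_cdf(p)) * total / reps for p, total in zip(p_values, totals)]

w = weighted_mse([0.25, 0.5, 0.75], 0.4, 0.4, m=40, n=40, reps=50, seed=0)
```

Repeating the call over a grid of bandwidth pairs and taking the pointwise minimum would give a rough stand-in for $W_0(p)$, at a correspondingly higher computational cost.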

APPENDIX: Derivation of (2.3)

Assume that $f$ and $g$ have continuous derivatives and are bounded away from 0 on $S$. Put $A = \tilde F - F$, $B = \tilde G - G$ and $C = \tilde F^{-1} - F^{-1}$, and write $I$ for the identity function. Then by Taylor expansion,

$$I = \tilde F(F^{-1} + C) = I + A(F^{-1}) + C\, f(F^{-1}) + o_p\big(|A(F^{-1})| + |C|\big),$$

whence it follows that $C = -\big[A(F^{-1})/f(F^{-1})\big] + o_p\big(|A(F^{-1})|\big)$. Hence,

$$\tilde G(\tilde F^{-1}) - G(F^{-1}) = B(F^{-1}) - \frac{g(F^{-1})}{f(F^{-1})}\, A(F^{-1}) + o_p\big(|A(F^{-1})| + |B(F^{-1})|\big). \tag{A.1}$$

Note the ratio $g(F^{-1})/f(F^{-1})$ on the right-hand side of (A.1). Since the variance of $A$ equals $m^{-1}(1 - F)F$ to first order, the unweighted criterion $\gamma_1$, defined at (2.1), can be largely determined by the value of $(g/f)^2 (1 - F)F$ in the tails if this quantity is not bounded. Using instead the weighted criterion $\gamma$, defined at (2.2), we may deduce from (A.1), related computations and the independence of the samples that

$$\int_S E\big[\tilde G(\tilde F^{-1}) - G(F^{-1})\big]^2 f(F^{-1}) = [1 + o(1)] \int_{F^{-1}(S)} \big[E(B^2)\, f^2 + E(A^2)\, g^2\big],$$

which is equivalent to (2.3).

REFERENCES

Altman, N. and Léger, C. (1995). Bandwidth selection for kernel distribution function estimation. J. Statist. Plann. Inf. 46, 195-214.

Azzalini, A. (1981). A note on the estimation of a distribution function and quantiles by a kernel method. Biometrika 68, 326-328.

Bowman, A.W., Hall, P. and Prvan, T. (1998). Cross-validation for the smoothing of distribution functions. Biometrika 85, 799-808.

Lloyd, C.J. (1998). The use of smoothed ROC curves to summarise and compare diagnostic systems. J. Amer. Statist. Assoc. 93, 1356-1364.

Lloyd, C.J. and Yong, Z. (1999). Kernel estimators of the ROC curve are better than empirical. Statist. Prob. Letters 44, 221-228.

Mielniczuk, J., Sarda, P. and Vieu, P. (1989). Local data-driven bandwidth choice for density estimation. J. Statist. Plann. Inf. 23, 53-69.

Peng, L. and Zhou, X.-H. (2002). Local linear smoothing of receiver operator characteristic (ROC) curves. J. Statist. Plann. Inf., to appear.

Qiu, P. and Le, C. (2001). ROC curve estimation based on local smoothing. J. Statist. Comput. and Simul. 70, 55-69.

Sarda, P. (1993). Smoothing parameter selection for smooth distribution functions. J. Statist. Plann. Inf. 35, 65-75.

Zou, K.H., Hall, W.J. and Shapiro, D.E. (1997). Smooth non-parametric receiver operating characteristic (ROC) curves for continuous diagnostic tests. Statistics in Medicine 16, 2143-2156.