A new distribution-free quantile estimator


Biometrika (1982), 69, 3, pp. 635-40
Printed in Great Britain

BY FRANK E. HARRELL
Clinical Biostatistics, Duke University Medical Center, Durham, North Carolina, U.S.A.

AND C. E. DAVIS
Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina, U.S.A.

SUMMARY

A new distribution-free estimator $Q_p$ of the $p$th population quantile is formulated, where $Q_p$ is a linear combination of order statistics admitting a jackknife variance estimator having excellent properties. The small sample efficiency of $Q_p$ is studied under a variety of light- and heavy-tailed symmetric and asymmetric distributions. For the distributions and values of $p$ studied, $Q_p$ is generally substantially more efficient than the traditional estimator based on one or two order statistics.

Some key words: Distribution-free estimator; Nonparametric estimator; Order statistic; Percentile; Quantile.

1. INTRODUCTION

The estimation of population quantiles or percentiles is of great interest, particularly when the statistician is unwilling to assume a parametric form for the distribution or even to assume the distribution to be symmetric. Sample quantiles have many desirable properties. However, they also have drawbacks. They are not particularly efficient estimators of location for distributions such as the normal, good estimators of the variance of sample quantiles do not exist for general distributions, sample quantiles may not be jackknifed, and the sample median differs in form and in efficiency depending on the sample size being even or odd. Maritz & Jarrett (1978) have developed an estimator of the sample median that performs well for some distributions.

We propose to estimate the $p$th quantile by a linear combination of the order statistics. For most distributions the new estimator offers a significant gain in efficiency over the traditional one and admits a jackknife variance estimator that performs well. The properties of the estimator and its variance estimator are studied over a wide variety of light- and heavy-tailed symmetric and asymmetric distributions, with emphasis on small sample results.

2. ESTIMATORS

Let $X_1, \ldots, X_n$ denote a random sample of size $n$ from a continuous distribution having distribution function $F(\cdot)$. Let $X_{(1)} \le \ldots \le X_{(n)}$ denote the order statistics of the sample and $X = (X_{(1)}, \ldots, X_{(n)})$. One traditional estimator of the $p$th population quantile $F^{-1}(p)$ is

$$T_p = (1-g)\,X_{(j)} + g\,X_{(j+1)}, \qquad (1)$$

where $(n+1)p = j + g$ and $j$ is the integral part of $(n+1)p$. When $p = \tfrac{1}{2}$, $T_p$ is the usual sample median.

The expected value of the $k$th order statistic is given by

$$E(X_{(k)}) = \frac{1}{\beta(k,\, n-k+1)} \int_{-\infty}^{\infty} x\, F(x)^{k-1} \{1 - F(x)\}^{n-k}\, dF(x).$$

Since $E(X_{\{(n+1)p\}})$ converges to $F^{-1}(p)$ for $p \in (0, 1)$, we take as our estimator of $F^{-1}(p)$ something which estimates $E(X_{\{(n+1)p\}})$ whether or not $(n+1)p$ is an integer, namely

$$Q_p = \frac{1}{\beta\{(n+1)p,\, (n+1)(1-p)\}} \int_0^1 F_n^{-1}(y)\, y^{(n+1)p-1} (1-y)^{(n+1)(1-p)-1}\, dy,$$

where $F_n(x)$ is the sample distribution function, $F_n(x) = n^{-1} \sum_{i=1}^{n} I(X_i \le x)$, $I(A)$ being the indicator function of the set $A$. The estimator can be reexpressed as

$$Q_p = \sum_{i=1}^{n} W_{n,i}\, X_{(i)}, \qquad (2)$$

where

$$W_{n,i} = \frac{1}{\beta\{(n+1)p,\, (n+1)(1-p)\}} \int_{(i-1)/n}^{i/n} y^{(n+1)p-1} (1-y)^{(n+1)(1-p)-1}\, dy$$
$$\phantom{W_{n,i}} = I_{i/n}\{p(n+1),\, (1-p)(n+1)\} - I_{(i-1)/n}\{p(n+1),\, (1-p)(n+1)\} \qquad (3)$$

and $I_x(a, b)$ denotes the incomplete beta function. Maritz & Jarrett (1978) used a similar idea to estimate the second moment of the sample median.

For $p = \tfrac{1}{2}$ or $n \ge 100$ with $p \ne \tfrac{1}{2}$, the weights $W_{n,i}$ in (3) can be adequately calculated with numerical integration using Simpson's rule with 2 intervals between $(i-1)/n$ and $i/n$. For other cases, the incomplete beta function should be calculated exactly. The algorithm of Majumdar & Bhattacharjee (1973) is very efficient for this calculation.

Following David (1981, p. 273), $Q_p$ is asymptotically normally distributed under mild assumptions on $F(\cdot)$. Monte Carlo studies using the Kolmogorov-Smirnov statistic have shown that for the uniform and normal distributions, the normal approximation is adequate for samples as small as 20 for $p = \tfrac{1}{2}$ or 30-50 for $p = 0.95$. For asymmetric distributions such as the exponential, sample sizes as large as 80-100 may be required for $p = 0.9$ or above. For the calculation of tail probabilities using the variance estimator given below and a normal approximation, sample sizes necessary for accurate confidence intervals may be smaller. However, more research is needed in this area.
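As an illustration of (1)-(3), the following is a minimal Python sketch rather than the authors' implementation. It evaluates the incomplete beta function with a standard continued-fraction expansion instead of the Simpson's-rule or Algorithm AS 63 routes mentioned above. The function names (`betainc_reg`, `hd_weights`, `harrell_davis`, `traditional_quantile`) are hypothetical, and the clamping of $T_p$ at the sample extremes is an assumption, since the text does not say what to do when $j$ falls outside $1, \ldots, n-1$.

```python
import math


def betainc_reg(a, b, x):
    """Regularised incomplete beta I_x(a, b), continued fraction (modified Lentz)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    # The continued fraction converges quickly only for x below this threshold;
    # otherwise use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x > (a + 1.0) / (a + b + 2.0):
        return 1.0 - betainc_reg(b, a, 1.0 - x)
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front) / a
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 500):
        m2 = 2 * m
        # Even and odd steps of the continued fraction.
        for aa in (m * (b - m) * x / ((a + m2 - 1.0) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return front * h


def hd_weights(n, p):
    """Weights W_{n,i}, i = 1..n, from equation (3)."""
    a = (n + 1) * p
    b = (n + 1) * (1 - p)
    cdf = [betainc_reg(a, b, i / n) for i in range(n + 1)]
    return [cdf[i] - cdf[i - 1] for i in range(1, n + 1)]


def harrell_davis(sample, p):
    """Q_p of equation (2): weighted sum of the order statistics."""
    xs = sorted(sample)
    return sum(w * x for w, x in zip(hd_weights(len(xs), p), xs))


def traditional_quantile(sample, p):
    """T_p of equation (1), with (n+1)p = j + g."""
    xs = sorted(sample)
    n = len(xs)
    h = (n + 1) * p
    j = math.floor(h)
    g = h - j
    # Assumption (not in the text): clamp to the extremes when j is outside 1..n-1.
    if j < 1:
        return xs[0]
    if j >= n:
        return xs[-1]
    return (1 - g) * xs[j - 1] + g * xs[j]
```

For a sample that is symmetric about its centre, both functions return that centre at $p = \tfrac{1}{2}$; in general $Q_p$ draws on every order statistic, whereas $T_p$ uses at most two.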
In order to calculate the jackknife variance estimate of $Q_p$ (Miller, 1974), consider removing order statistic $j$ from the sample. The resulting estimate is $S_j = s_j^{T} X$, where

$$(s_j)_i = I(i \ne j)\, W_{n-1,\; i - I(i > j)}.$$

The jackknife variance estimator is

$$V_p = \frac{n-1}{n} \sum_{j=1}^{n} (S_j - \bar{S})^2 = (n-1)\Big(n^{-1} \sum_{j=1}^{n} S_j^2 - \bar{S}^2\Big), \qquad (4)$$

where

$$\bar{S} = n^{-1} \sum_{j=1}^{n} S_j = n^{-1} \sum_{i=1}^{n} X_{(i)} \{(i-1)\, W_{n-1,i-1} + (n-i)\, W_{n-1,i}\},$$

$$\sum_{j=1}^{n} S_j^2 = X^{T} A X, \qquad A_{ll} = (l-1)\, W_{n-1,l-1}^2 + (n-l)\, W_{n-1,l}^2,$$

$$A_{lm} = (l-1)\, W_{n-1,l-1} W_{n-1,m-1} + (m-l-1)\, W_{n-1,l} W_{n-1,m-1} + (n-m)\, W_{n-1,l} W_{n-1,m} \quad (m > l),$$

$$A_{ml} = A_{lm}, \qquad W_{n-1,i} = 0 \quad (i < 1 \text{ or } i > n-1).$$

Simulations show that the jackknife version of $Q_p$, while having lower bias than $Q_p$, has larger variance, resulting in an estimator with similar efficiency to $T_p$. The extreme order statistic weights for the jackknife estimator for small $n$ and $p \ne \tfrac{1}{2}$ are sometimes negative, resulting in nearly unbiased estimators of extreme quantiles although having large variance. The jackknifed quantile estimator will not be discussed further, only $V_p$, the associated variance estimator.

Kaigh & Lachenbruch (1982) have proposed a quantile estimator which is the average of subsample $T_p$-like estimators. Their estimator has properties similar to $Q_p$ and does not require numerical integration. It may require larger sample sizes for estimating extreme quantiles, and a variance estimator has not been studied.

3. EFFICIENCY OF $Q_p$ RELATIVE TO $T_p$ FOR VARIOUS DISTRIBUTIONS

To investigate the performance of $Q_p$ with respect to $T_p$ for a wide variety of distributions, the generalized lambda distribution (Ramberg et al., 1979) was considered. The distribution is defined by

$$F^{-1}(p) = \mu + \sigma\{p^{a} - (1-p)^{b}\},$$

where $\mu$ and $\sigma$ are respectively location and scale parameters, set to 0 and 1 for this investigation, and $a$ and $b$ are shape parameters. Table 1 shows the distributions used with the standardized skewness, $\alpha_3$, and kurtosis, $\alpha_4$, values.

Table 1.
Generalized lambda distributions

a          b          $\alpha_3$   $\alpha_4$   Description
1          1          0            1.8          Light-tailed symmetric
0.1349     0.1349     0            3            Normal-like
-0.1359    -0.1359    0            9            Very heavy-tailed symmetric, like t distribution with 5 degrees of freedom
-1         -1         ∞            ∞            Cauchy-like
0.0251     0.0953     0.9          4.2          Medium-tailed asymmetric
0          0.0004     2            9            Exponential-like

The mean squared error of an estimator was used to measure its efficiency, for example

$$\mathrm{MSE}(Q_p) = E\{Q_p - F^{-1}(p)\}^2, \qquad \mathrm{MSE}(T_p) = E\{T_p - F^{-1}(p)\}^2.$$

These mean squared errors were estimated by generating 1000 random samples for each distribution and each $n$ and averaging the squared errors. Random order statistics were generated using the method of Lurie & Hartley (1972) incorporating a Tausworthe uniform random number generator (Whittlesey, 1968).
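The simulation design just described can be sketched as follows. This is a simplified stand-in, not the procedure used in the paper: it draws each observation by inverting the generalized lambda quantile function with Python's built-in generator, instead of the Lurie & Hartley order-statistic method with a Tausworthe generator. All function names are hypothetical, and any estimator with the signature `estimator(sample, p)` can be passed in.

```python
import math
import random


def gld_quantile(u, a, b, mu=0.0, sigma=1.0):
    """Generalized lambda quantile function F^{-1}(u) = mu + sigma*(u^a - (1-u)^b)."""
    return mu + sigma * (u ** a - (1.0 - u) ** b)


def open_uniform(rng):
    """Uniform draw on the open interval (0, 1); avoids 0 ** negative power."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def traditional_quantile(sample, p):
    """T_p = (1-g) X_(j) + g X_(j+1) with (n+1)p = j + g."""
    xs = sorted(sample)
    n = len(xs)
    h = (n + 1) * p
    j = math.floor(h)
    g = h - j
    # Assumption (not in the text): clamp when j falls outside 1..n-1.
    if j < 1:
        return xs[0]
    if j >= n:
        return xs[-1]
    return (1 - g) * xs[j - 1] + g * xs[j]


def estimate_mse(estimator, a, b, n, p, reps=1000, seed=0):
    """Average squared error of estimator(sample, p) about the true quantile."""
    rng = random.Random(seed)
    target = gld_quantile(p, a, b)
    total = 0.0
    for _ in range(reps):
        sample = [gld_quantile(open_uniform(rng), a, b) for _ in range(n)]
        total += (estimator(sample, p) - target) ** 2
    return total / reps


def relative_efficiency(new, old, a, b, n, p, reps=1000, seed=0):
    """Estimated MSE(old) / MSE(new); both see the same simulated samples."""
    return (estimate_mse(old, a, b, n, p, reps, seed)
            / estimate_mse(new, a, b, n, p, reps, seed))
```

Reusing one seed for both estimators means they are compared on identical samples, which reduces the noise in the estimated ratio. To reproduce the comparison in this section, `new` would be an implementation of $Q_p$ from (2) and `old` would be `traditional_quantile`.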

[Figure 1. Panels: (a) a = 1, b = 1; (b) a = 0.1349, b = 0.1349; (d) a = -1, b = -1; (e) a = 0.0251, b = 0.0953; (f) a = 0, b = 0.0004. Curves are drawn for p = 0.05, 0.10, 0.25, 0.5, 0.75, 0.90, 0.95; horizontal axis ticks run from 0 to 60.]

Fig. 1. Simulated mean squared error efficiency of $Q_p$ against $T_p$ for generalized lambda distributions with different parameters of the distribution.

The efficiency of $Q_p$ relative to $T_p$ is $\mathrm{MSE}(T_p)/\mathrm{MSE}(Q_p)$. This was estimated by taking the ratio of estimated mean squared errors. Monte Carlo experiments were performed for $n$ = 6, 10, 16, 23, 45, 60 and $p$ = 0.05, 0.10, 0.25, 0.5, 0.75, 0.90, 0.95. Results for $p > \tfrac{1}{2}$ are not displayed for symmetric distributions. The results are shown in Fig. 1. One additional experiment was performed for the normal distribution for $n = 250$ and $p = \tfrac{1}{2}$, resulting in an estimated relative efficiency of $Q_p$ of 1.07. From Fig. 1 we see that, except for the Cauchy-like distribution, $Q_p$ is generally more than 1.1 times as efficient as $T_p$.
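The jackknife variance estimator (4) of Section 2 can be computed for each simulated sample alongside $Q_p$. The sketch below is a direct leave-one-out implementation under stated assumptions, not the authors' code: it forms each $S_j$ explicitly instead of using the quadratic form $X^{T} A X$, and it takes the weights $W_{n-1,1}, \ldots, W_{n-1,n-1}$ as an argument so that it stays self-contained. The name `jackknife_variance` is hypothetical. In the paper's setting the weights would come from (3) with $n$ replaced by $n-1$.

```python
def jackknife_variance(sample, weights):
    """Jackknife variance (4) of a linear combination of order statistics.

    `weights` holds W_{n-1,1}, ..., W_{n-1,n-1}: the weights the estimator
    assigns to the n-1 order statistics left after one is removed.
    """
    xs = sorted(sample)
    n = len(xs)
    if len(weights) != n - 1:
        raise ValueError("need exactly n-1 weights for a sample of size n")
    s = []
    for j in range(n):
        # Remove order statistic j+1; the survivors keep their order, so
        # X_(i) gets W_{n-1,i} for i below the gap and W_{n-1,i-1} above it.
        kept = xs[:j] + xs[j + 1:]
        s.append(sum(w * x for w, x in zip(weights, kept)))
    s_bar = sum(s) / n
    return (n - 1) / n * sum((sj - s_bar) ** 2 for sj in s)
```

A convenient check, unrelated to quantiles, is to pass equal weights $1/(n-1)$: the estimator is then the sample mean, and (4) reduces to the familiar $s^2/n$.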

4. EXACT EFFICIENCY OF $Q_p$ AND $T_p$ RELATIVE TO PARAMETRIC ESTIMATORS FOR THE NORMAL DISTRIBUTION

For the normal distribution, the uniform minimum variance unbiased estimator of $F^{-1}(p)$ is

$$X_p = \bar{X} + \Phi^{-1}(p)\, s / E(s),$$

where $\bar{X}$ and $s$ are the sample mean and standard deviation respectively, $\Phi(\cdot)$ is the normal distribution function, and

$$E(s) = \{2/(n-1)\}^{1/2}\, \Gamma(\tfrac{1}{2} n) / \Gamma\{\tfrac{1}{2}(n-1)\}.$$

For $2 \le n \le 20$, exact efficiencies of $Q_p$ and $T_p$ can be calculated using the moments of normal order statistics tabled by Sarhan & Greenberg (1962, p. 193). For $p = 0.1$ and $p = 0.5$, the efficiencies are shown in Fig. 2. We do not recommend the use of $Q_p$ for small $n$ and extreme $p$; however, the relative performance of $X_p$ is actually worse in this situation than for larger $n$. From Fig. 2 we see that the new estimator has much to offer over $T_p$, especially for extreme quantiles.

[Figure 2. Panels: (a) p = 0.1; (b) p = 0.5. Horizontal axis ticks run from 0 to 20.]

Fig. 2. Exact mean squared error efficiency of new estimator, $Q_p$, and sample quantile $T_p$, with respect to $X_p$ for the normal distribution with quantiles 0.1 and 0.5.

5. PERFORMANCE OF THE VARIANCE ESTIMATOR $V_p$

For the 1000 repeated samples in the simulations for each distribution and sample size, the sample variance of the simulated $Q_p$ estimator was calculated. This was compared to the sample mean of simulated $V_p$ estimates. The ratio of the mean $V_p$ to the simulated $V(Q_p)$ was used to estimate $E(V_p)/V(Q_p)$ to measure the bias of $V_p$. The results indicated that $E(V_p)$ was seldom different from $V(Q_p)$ by more than a factor of 1.15, even for the Cauchy-like distribution with $n > 16$. Thus confidence intervals for $Q_p$ can be readily constructed using the asymptotic normality of $Q_p$.

The authors are grateful to the referee for constructive comments that improved the clarity of the paper.

REFERENCES

DAVID, H. A. (1981). Order Statistics, 2nd edition. New York: Wiley.
KAIGH, W. D. & LACHENBRUCH, P. A. (1982). A generalized quantile estimator. Comm. Statist. A 11. To appear.
LURIE, D. & HARTLEY, H. O. (1972). Machine-generation of order statistics for Monte Carlo computations. Am. Statistician 26, 26-7. Errata (1972) 26, 56-57.
MAJUMDAR, K. L. & BHATTACHARJEE, G. P. (1973). The incomplete beta integral (Algorithm AS 63). Appl. Statist. 22, 409-11.
MARITZ, J. S. & JARRETT, R. G. (1978). A note on estimating the variance of the sample median. J. Am. Statist. Assoc. 73, 194-6.
MILLER, R. (1974). The jackknife - a review. Biometrika 61, 1-15.
RAMBERG, J. S., DUDEWICZ, E. J., TADIKAMALLA, P. R. & MYKYTKA, E. F. (1979). A probability distribution and its uses in fitting data. Technometrics 21, 201-14.
SARHAN, A. E. & GREENBERG, B. G. (Eds). (1962). Contributions to Order Statistics. New York: Wiley.
WHITTLESEY, J. R. B. (1968). A comparison of the correlational behavior of random number generators for the IBM 360. Comm. Assoc. Comp. Mach. 11, 641-4.

[Received October 1980. Revised February 1982]