A NEW METHOD FOR CONSTRUCTING APPROXIMATE CONFIDENCE INTERVALS FOR M-ESTIMATES. Dennis D. Boos

Dennis D. Boos, Department of Statistics, North Carolina State University. Institute of Statistics Mimeo Series #1198, September 1978.

The empirical function used to define M-estimates of location is similar to a distribution function when $\psi$ is nondecreasing. This similarity allows approximate confidence intervals to be constructed from the "percentiles" of the defining function.

KEY WORDS: M-estimates; Confidence intervals; Quantiles; t statistic.

1. INTRODUCTION

Let $X_1,\dots,X_n$ be a sample from a distribution $F$ and define the location "parameter" $\theta$ to be the solution of

$$\int_{-\infty}^{\infty} \psi(x-\theta)\,dF(x) = 0. \tag{1.1}$$

An M-estimate for $\theta$ is the solution $\hat\theta$ of the empirical analogue to (1.1),

$$\frac{1}{n}\sum_{i=1}^{n} \psi(X_i-\hat\theta) = 0. \tag{1.2}$$

Asymptotic properties of $\hat\theta$ are well known, and the Princeton study (Andrews et al. 1972) suggests that $\sqrt{n}(\hat\theta-\theta)$ approaches normality fairly quickly. Huber (1970), Gross (1976, 1977), and Shorack (1976) have constructed approximate confidence intervals for $\theta$ based on studentization of $\sqrt{n}(\hat\theta-\theta)$ by estimates of the asymptotic standard deviation.

In this paper a new method of constructing approximate confidence intervals for $\theta$ is proposed for the special class of monotone nondecreasing, right continuous $\psi$ functions. The method exploits the fact that

$$\lambda_{F_n}(c) = -\frac{1}{n}\sum_{i=1}^{n}\psi(X_i-c)$$

is like a distribution function (i.e., nondecreasing and right continuous). In particular, the endpoints of the proposed confidence interval are "percentiles" of $\lambda_{F_n}$.

Section 2 gives a motivating example and Section 3 provides the basic ideas and method. In Section 4 Monte Carlo results and comparisons with other results are mentioned. Section 5 shows how to extend to the regression situation and Section 6 is a short summary.
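As an illustration of (1.2), the following minimal sketch computes a Huber M-estimate as the zero of $\lambda_{F_n}$ by bisection, which is possible precisely because $\lambda_{F_n}$ is nondecreasing. The function names (`huber_psi`, `lam`, `m_estimate`), the choice $k = 1.5$, the unit scale, and the sample values are assumptions made for the example and are not taken from the paper.

```python
def huber_psi(t, k=1.5):
    """Huber's psi(t) = max(-k, min(k, t)): nondecreasing and odd."""
    return max(-k, min(k, t))


def lam(c, xs, psi=huber_psi):
    """lambda_{F_n}(c) = -(1/n) * sum_i psi(x_i - c); nondecreasing in c."""
    return -sum(psi(x - c) for x in xs) / len(xs)


def m_estimate(xs, psi=huber_psi, steps=100):
    """Zero of lambda_{F_n}, located by bisection on [min(xs), max(xs)].

    Assumes psi(0) = 0, so that lam(min(xs)) <= 0 <= lam(max(xs)).
    """
    lo, hi = min(xs), max(xs)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if lam(mid, xs, psi) < 0:
            lo = mid          # the zero lies to the right of mid
        else:
            hi = mid          # the zero lies at or to the left of mid
    return 0.5 * (lo + hi)


# Hypothetical sample with one gross outlier.
sample = [0.8, 0.9, 1.0, 1.1, 1.2, 15.0]
theta_hat = m_estimate(sample)   # about 1.3, whereas the sample mean is about 3.33
```

For this sample the five central observations are unclipped and the outlier contributes only $k = 1.5$, so the estimating equation reduces to $5 - 5c + 1.5 = 0$, giving $\hat\theta = 1.3$.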

2. MOTIVATION FROM QUANTILE ESTIMATION

Let $F^{-1}(p) = \inf\{x : F(x) \ge p\}$. Consider estimation of the $p$th quantile $F^{-1}(p)$, $0<p<1$, from a sample having distribution $F$. If $F_n$ is the usual empirical df, then

$$P\bigl(F_n^{-1}(a) \le F^{-1}(p) < F_n^{-1}(b)\bigr) = P\bigl(a \le F_n(F^{-1}(p)) < b\bigr),$$

using the fact that all df's $G$ satisfy $G^{-1}(t) \le x$ iff $t \le G(x)$. For independent $X_i$ the statistic $nF_n(F^{-1}(p))$ is binomial $(n, F(F^{-1}(p)))$. Thus the normal approximation to the binomial and the assumption $F(F^{-1}(p)) \approx p$ lead us to choose

$$a = p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}, \qquad b = p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$

for an approximate $(1-\alpha)$ confidence interval for $F^{-1}(p)$ ($z_\alpha$ is the $100(1-\alpha)$th percentile of the standard normal).

Although exact nonparametric procedures exist for iid samples from continuous distributions, the above method generalizes to quantile estimation in more complicated situations, e.g., stratified sampling from finite populations. The important point for the present discussion is that M-estimation can use the same idea with $F_n$ replaced by $\lambda_{F_n}$.

3. APPROXIMATE CONFIDENCE INTERVALS FOR $\theta$

Let $\psi(t)$ be nondecreasing, right continuous, and strictly positive (negative) for large positive (negative) values of $t$. Two families of such $\psi$ are "Hubers," $\psi(x) = \max(-k, \min(k, x))$, and "$v$th power," $\psi(x) = |x|^{v}\,\mathrm{sgn}(x)$, $0 < v \le 1$. For df's $G$ define

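$$\lambda_G(c) = -\int_{-\infty}^{\infty}\psi(x-c)\,dG(x), \qquad -\infty<c<\infty,$$

and

$$\lambda_G^{-1}(t) = \inf\{c : \lambda_G(c) \ge t\}, \qquad t \in \Bigl(\inf_{-\infty<x<\infty}\lambda_G(x),\ \sup_{-\infty<x<\infty}\lambda_G(x)\Bigr).$$

The parameter and estimate are defined by $\theta = \lambda_F^{-1}(0)$ and $\hat\theta = \lambda_{F_n}^{-1}(0)$. Similar to the case of df's it follows that $\lambda_G^{-1}(t) \le x$ iff $t \le \lambda_G(x)$, and thus

$$P\bigl(\lambda_{F_n}^{-1}(a) \le \theta < \lambda_{F_n}^{-1}(b)\bigr) = P\bigl(a \le \lambda_{F_n}(\theta) < b\bigr) = P\Bigl(\frac{\sqrt{n}\,a}{\sigma_n} \le \frac{\sqrt{n}\,\lambda_{F_n}(\theta)}{\sigma_n} < \frac{\sqrt{n}\,b}{\sigma_n}\Bigr),$$

where $\sigma_n$ is a reasonable estimate of the standard deviation of $\psi(X_1-\theta)$. The statistic of interest,

$$T_n = \frac{\sqrt{n}\,\lambda_{F_n}(\theta)}{\Bigl[\frac{1}{n-1}\sum_{i=1}^{n}\psi^2(X_i-\hat\theta)\Bigr]^{1/2}},$$

has a form very close to a t statistic based on the rv's $\psi(X_i-\theta)$. If the $X_i$ are symmetric about $\theta$ and $\psi(x) = -\psi(-x)$, then we expect $T_n$ to be close to a t distribution with $n-1$ degrees of freedom. Choosing $-\sqrt{n}\,a/\sigma_n = t_{\alpha/2} = \sqrt{n}\,b/\sigma_n$, our proposed approximate $(1-\alpha)$ confidence interval is

$$\Bigl[\lambda_{F_n}^{-1}\bigl(-t_{\alpha/2}\,\sigma_n/\sqrt{n}\bigr),\ \lambda_{F_n}^{-1}\bigl(t_{\alpha/2}\,\sigma_n/\sqrt{n}\bigr)\Bigr). \tag{3.1}$$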
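The construction of (3.1) can be sketched in a few lines: compute $\hat\theta$ and $\sigma_n$, then invert $\lambda_{F_n}$ at $\pm t_{\alpha/2}\sigma_n/\sqrt{n}$. This is a minimal sketch under stated assumptions, not the author's program: the names `lam_inverse` and `m_interval` are hypothetical, the scale is treated as known (unit), the generalized inverse is found by bisection, and the t critical value `t_crit` is supplied by the caller from a table (for example 2.262 for $n = 10$, $\alpha = .05$).

```python
import math


def huber_psi(t, k=1.5):
    """Huber's psi(t) = max(-k, min(k, t))."""
    return max(-k, min(k, t))


def lam(c, xs, psi):
    """lambda_{F_n}(c) = -(1/n) * sum_i psi(x_i - c)."""
    return -sum(psi(x - c) for x in xs) / len(xs)


def lam_inverse(t, xs, psi, steps=100, max_doublings=60):
    """Generalized inverse inf{c : lam(c) >= t}, found by bisection."""
    lo, hi = min(xs) - 1.0, max(xs) + 1.0
    step = hi - lo
    for _ in range(max_doublings):   # widen until lam(lo) < t <= lam(hi)
        low_ok = lam(lo, xs, psi) < t
        high_ok = lam(hi, xs, psi) >= t
        if low_ok and high_ok:
            break
        if not low_ok:
            lo -= step
        if not high_ok:
            hi += step
        step *= 2.0
    else:
        raise ValueError("t lies outside the range of lambda_{F_n}")
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if lam(mid, xs, psi) < t:
            lo = mid
        else:
            hi = mid
    return hi


def m_interval(xs, t_crit, psi=huber_psi):
    """Endpoints of (3.1): lambda^{-1}(-h) and lambda^{-1}(h), h = t_crit*sigma_n/sqrt(n)."""
    n = len(xs)
    theta_hat = lam_inverse(0.0, xs, psi)
    sigma_n = math.sqrt(sum(psi(x - theta_hat) ** 2 for x in xs) / (n - 1))
    h = t_crit * sigma_n / math.sqrt(n)
    return lam_inverse(-h, xs, psi), lam_inverse(h, xs, psi)


# Hypothetical data; 2.571 is the upper 2.5% point of t with 5 degrees of freedom.
lower, upper = m_interval([0.8, 0.9, 1.0, 1.1, 1.2, 15.0], t_crit=2.571)
```

A useful sanity check on the sketch: with $\psi(x) = x$, $\lambda_{F_n}(c) = c - \bar X$, so the interval reduces to the classical $\bar X \pm t_{\alpha/2}\, s/\sqrt{n}$. For bounded $\psi$ and very small $n$ the target $t_{\alpha/2}\sigma_n/\sqrt{n}$ can fall outside the range of $\lambda_{F_n}$, in which case the sketch raises an error rather than returning an endpoint.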

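Under suitable regularity conditions, the asymptotic width of (3.1) is comparable to that of methods based on studentizing $\sqrt{n}(\hat\theta-\theta)$, i.e., [display equation illegible in the source].

It is often desirable that location estimates satisfy

$$\hat\theta(aX_1+b,\dots,aX_n+b) = a\,\hat\theta(X_1,\dots,X_n) + b.$$

For M-estimates the usual procedure is to replace $\psi(x)$ by $\psi(x/\hat\sigma)$, where $\hat\sigma$ is a suitable scale estimate, or to solve simultaneous equations as in Huber's Proposal 2. The above methods carry through exactly and the analogous statistic of interest is

$$T_{n\sigma} = \frac{-\dfrac{1}{\sqrt{n}}\displaystyle\sum_{i=1}^{n}\psi\Bigl(\frac{X_i-\theta}{\hat\sigma}\Bigr)}{\Bigl[\dfrac{1}{n-1}\displaystyle\sum_{i=1}^{n}\psi^2\Bigl(\frac{X_i-\hat\theta}{\hat\sigma}\Bigr)\Bigr]^{1/2}}. \tag{3.2}$$

4. COMPARISONS AND MONTE CARLO RESULTS

For small samples the form (3.2) is more appealing than $\sqrt{n}(\hat\theta-\theta)/S_n$, where $S_n$ is an estimate of the asymptotic standard deviation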

$$[\mathrm{Var}\,IC(X_1)]^{1/2},$$

for the following reason. Although $\hat\theta-\theta$ is approximated by $n^{-1}\sum IC(X_i)$, Boos (1977) shows that this approximation is at best $O_p(n^{-1})$, since $n[\hat\theta-\theta-n^{-1}\sum IC(X_i)]$ has a limit distribution. Thus, proximity of $\sqrt{n}(\hat\theta-\theta)/S_n$ to a t distribution depends on both the t-like statistic $n^{-1/2}\sum IC(X_i)/S_n$ and the approximation of $\hat\theta-\theta$ by $n^{-1}\sum IC(X_i)$. In fact Gross (1976) prefers to avoid use of the t distribution. On the other hand Shorack (1976) seems to get very good t approximations for certain Hampels.

In order to spot check the performance of the approximate confidence intervals based on (3.2), a small Monte Carlo study was performed. Table 1 gives the empirical error probabilities and $\sqrt{n}$ times the expected confidence interval lengths (ECIL) for 10,000 Monte Carlo "samples" generated by the McGill "Super-Duper" random number generator. A different set of 10,000 samples was used for each distribution (normal; logistic; D-EXP = double exponential; T3 = t distribution with 3 degrees of freedom; slash = a standard normal deviate divided by an independent uniform (0,1) deviate) and for each sample size, $n = 10$ and $n = 20$. Only crude Monte Carlo techniques were used, so considerable error may exist in the 3rd decimal of the empirical probabilities and in the 2nd decimal of the ECIL. This is exemplified by the mean, whose exact error probability we know to be .05 for the normal.

SQRT is the M-estimator based on $\psi(x) = |x|^{1/2}\,\mathrm{sgn}(x)$, and Hk = 1.0, 1.5 are Hubers with $k = 1.0, 1.5$ using a normalized interquartile range as an estimate of scale. Hk* = 1.5 is Huber's Proposal 2 with $k = 1.5$. For both $n = 10$ and $n = 20$ the true levels are generally conservative, but Hk = 1.5 and Hk* = 1.5 are fairly close to .05 except for the slash distribution, and each has reasonably short ECIL. It is mildly surprising that the mean is so close to .05 from normal to T3, though the ECIL are expectedly large for heavy tails. SQRT seems to perform worst overall.

Table 1. Empirical Error Probabilities and Expected 95-Percent Confidence Interval Lengths (multiplied by $\sqrt{n}$)

                                n = 10                                            n = 20
Estimator     Normal  Logistic     D-Exp        T3     Slash    Normal  Logistic     D-Exp        T3     Slash

a. Empirical Error Probabilities
Mean            .054      .048      .045      .039      .022      .055      .046      .049      .042      .020
SQRT            .039      .033      .030      .029      .017      .048      .043      .040      .040      .026
Hk=1.0          .046      .040      .035      .036      .031      .055      .048      .045      .048      .043
Hk=1.5          .059      .050      .046      .044      .033      .056      .050      .048      .049      .040
Hk*=1.5         .060      .053      .048      .046      .036      .058      .053      .050      .051      .039

b. Expected 95-Percent Confidence Interval Lengths (multiplied by $\sqrt{n}$)
Mean            4.39      4.34      4.25      6.88    193.79      4.13      4.09      4.06      6.62    128.90
SQRT            4.93      4.69      4.34      6.73     84.35      4.45      4.17      3.73      5.76     32.15
Hk=1.0          4.97      4.65      4.14      6.28     14.60      4.41      4.08      3.54      5.34     11.36
Hk=1.5          4.51      4.30      3.94      5.95     14.61      4.21      3.99      3.62      5.37     12.22
Hk*=1.5         4.45      4.22      3.88      5.84     14.81      4.16      3.93      3.60      5.32     12.62

Table 2 represents Monte Carlo estimates of the percentiles of $T_{n\sigma}$. The percentiles tend to be larger than those of a t distribution for the normal and generally smaller for the heavier-tailed distributions. For $n = 20$ and $\alpha = .05$ all estimates except for SQRT and the mean evaluated at the slash distribution are very close to $t_{.05} = 1.73$ (we should note that the method of calculating the estimated percentiles resulted in considerable error in the second decimal place).

5. REGRESSION

The Huber (1973) regression model is

$$X_i = \sum_{j=1}^{p} c_{ij}\theta_j + U_i, \qquad i = 1,\dots,n,$$

where $E(U_i) = 0$ and the $c_{ij}$ are known coefficients. Let $(\hat\theta_1,\dots,\hat\theta_p,\hat\sigma)$ be solutions of

$$\sum_{i=1}^{n}\psi\Bigl(\frac{X_i-\sum_{k=1}^{p}c_{ik}\hat\theta_k}{\hat\sigma}\Bigr)c_{ij} = 0, \qquad j = 1,\dots,p,$$

$$\frac{1}{n-p}\sum_{i=1}^{n}\psi^2\Bigl(\frac{X_i-\sum_{k=1}^{p}c_{ik}\hat\theta_k}{\hat\sigma}\Bigr) = \beta.$$

Define

$$Q_{n,r}(t) = -\sum_{i=1}^{n}\psi\Bigl(\frac{X_i-\sum_{k\ne r}c_{ik}\hat\theta_k-c_{ir}t}{\hat\sigma}\Bigr)c_{ir}, \qquad r = 1,\dots,p.$$

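Then $\hat\theta_r = Q_{n,r}^{-1}(0)$ and

$$P\bigl(Q_{n,r}^{-1}(a) \le \theta_r < Q_{n,r}^{-1}(b)\bigr) = P\bigl(a \le Q_{n,r}(\theta_r) < b\bigr).$$

By Taylor expansion in $\hat\theta_r-\theta_r$ we find

$$Q_{n,r}(\theta_r) = -\sum_{i=1}^{n}\psi\Bigl(\frac{X_i-\sum_{k=1}^{p}c_{ik}\hat\theta_k+c_{ir}(\hat\theta_r-\theta_r)}{\hat\sigma}\Bigr)c_{ir}$$

$$= -\sum_{i=1}^{n}\psi\Bigl(\frac{X_i-\sum_{k=1}^{p}c_{ik}\hat\theta_k}{\hat\sigma}\Bigr)c_{ir} - \frac{1}{\hat\sigma}\sum_{i=1}^{n}\psi'\Bigl(\frac{X_i-\sum_{k=1}^{p}c_{ik}\hat\theta_k+c_{ir}\theta_r^{*}}{\hat\sigma}\Bigr)c_{ir}^{2}\,(\hat\theta_r-\theta_r).$$

The first term in the above expression is 0, and $\theta_r^{*} \to_p 0$ if $\hat\theta_r \to_p \theta_r$. Thus, under suitable regularity conditions, $n^{-1/2}Q_{n,r}(\theta_r)$ is asymptotically normal with mean 0 and variance [expression illegible in the source]. An approximate confidence interval for $\theta_r$ is

$$\Bigl[Q_{n,r}^{-1}\bigl(-t_{\alpha/2}\,\sigma_n\sqrt{n}\bigr),\ Q_{n,r}^{-1}\bigl(t_{\alpha/2}\,\sigma_n\sqrt{n}\bigr)\Bigr],$$

where [definition of $\sigma_n$ illegible in the source].

The advantage of this method over the usual methods is not clear. (The simplicity of the location model is gone!) Note, though, that use of $\psi'$ is not required and that for least absolute value regression the above method circumvents an estimate of $f(F^{-1}(\tfrac{1}{2}))$.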

6. SUMMARY AND CONCLUSIONS

A new procedure for constructing confidence intervals for a location parameter has been proposed which exploits the monotonicity of a class of $\psi$ functions. The distributional problem is reduced to consideration of a t-like statistic, and Monte Carlo results verify that "Hubers" perform fairly well over a range of distributions and for samples of size $n = 10$ and $n = 20$.
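A spot check of the kind described in Section 4 can be sketched by simulation. The identity $P(\lambda_{F_n}^{-1}(a) \le \theta < \lambda_{F_n}^{-1}(b)) = P(a \le \lambda_{F_n}(\theta) < b)$ means coverage can be assessed from the t-like statistic alone, without inverting $\lambda_{F_n}$. The sketch below makes several simplifying assumptions and does not reproduce the study behind Table 1: it uses Python's built-in generator rather than "Super-Duper", standard normal samples only, a Huber $\psi$ with $k = 1.5$ and known unit scale (the pure location form $T_n$ rather than (3.2)), and hypothetical function names.

```python
import math
import random


def huber_psi(t, k=1.5):
    """Huber's psi(t) = max(-k, min(k, t))."""
    return max(-k, min(k, t))


def m_estimate(xs, psi, steps=60):
    """Zero of lambda_{F_n} by bisection on [min(xs), max(xs)]."""
    lo, hi = min(xs), max(xs)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if sum(psi(x - mid) for x in xs) > 0:   # lambda(mid) < 0: zero is to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def t_like_statistic(xs, theta, psi):
    """T_n = sqrt(n) * lambda_{F_n}(theta) / sigma_n, with sigma_n based on theta_hat."""
    n = len(xs)
    theta_hat = m_estimate(xs, psi)
    sigma_n = math.sqrt(sum(psi(x - theta_hat) ** 2 for x in xs) / (n - 1))
    lam_theta = -sum(psi(x - theta) for x in xs) / n
    return math.sqrt(n) * lam_theta / sigma_n


def empirical_error(n, t_crit, reps=2000, seed=0, psi=huber_psi):
    """Fraction of simulated N(0, 1) samples whose interval misses theta = 0."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # theta is missed exactly when |T_n| exceeds t_crit (ties have probability 0)
        if abs(t_like_statistic(xs, 0.0, psi)) > t_crit:
            misses += 1
    return misses / reps


# 2.093 is the upper 2.5% point of t with 19 degrees of freedom.
error_rate = empirical_error(n=20, t_crit=2.093)
```

With a few thousand replications the Monte Carlo standard error of the estimated error probability is roughly .005, so, as the paper cautions for its own study, the third decimal should not be trusted.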

REFERENCES

Andrews, D. F., et al. (1972), Robust Estimation of Location: Survey and Advances, Princeton, N. J.: Princeton University Press.

Boos, Dennis D. (1977), "Limiting Second Order Distributions for First Order Functionals, with Application to L- and M-Statistics," Institute of Statistics Mimeo Series #1152, North Carolina State University, Raleigh, N. C.

Gross, Alan M. (1976), "Confidence Interval Robustness with Long-Tailed Symmetric Distributions," Journal of the American Statistical Association, 71, 409-416.

Gross, Alan M. (1977), "Confidence Intervals for Bisquare Regression Estimates," Journal of the American Statistical Association, 72, 341-354.

Huber, Peter J. (1970), "Studentizing Robust Estimates," in Nonparametric Techniques in Statistical Inference, ed. Madan L. Puri, Cambridge: Cambridge University Press, 453-463.

Huber, Peter J. (1973), "Robust Regression: Asymptotics, Conjectures, and Monte Carlo," Annals of Statistics, 1, 799-821.

Shorack, Galen R. (1976), "Robust Studentization of Location Estimates," Statistica Neerlandica, 30, 119-142.