MAXIMUM BIAS CURVES FOR ROBUST REGRESSION WITH NON-ELLIPTICAL REGRESSORS


The Annals of Statistics 2001, Vol. 29, No. 1.

By José R. Berrendero and Ruben H. Zamar
Universidad Autónoma de Madrid and University of British Columbia

Maximum bias curves for some regression estimates were previously derived assuming that (i) the intercept term is known and/or (ii) the regressors have an elliptical distribution. We present a single method to obtain the maximum bias curves for a large class of regression estimates. Our results are derived under very mild conditions and, in particular, do not require the restrictive assumptions (i) and (ii) above. Using these results it is shown that the maximum bias curves heavily depend on the shape of the regressors' distribution, which we call the x-configuration. Despite this big effect, the relative performance of different estimates remains unchanged under different x-configurations. We also explore the links between maxbias curves and bias bounds. Finally, we compare the robustness properties of some estimates for the intercept parameter.

1. Introduction. The concept of maximum asymptotic bias was introduced by Huber (1964) for the simple location model. Extensions to scale and location-dispersion models were later obtained by Martin and Zamar (1989, 1993). Adrover (1998) derived the maximum asymptotic bias of dispersion matrices within the class of M-estimates. Martin, Yohai and Zamar (1989), Croux, Rousseeuw and Hössjer (1994, 1996), Hennig (1995) and Berrendero and Romo (1998) considered the linear regression model. See also Donoho and Liu (1988), He and Simpson (1993) and Zamar (1992). We will focus on the regression model, where the maximum bias theory has two main possible applications: (i) the comparison of competing robust regression estimates in terms of their bias behavior and (ii) the estimation of bias bounds for robust estimates in practical situations.
Given two estimates, the one with the smaller maxbias curve is obviously more robust. In particular, its maxbias curve will have a smaller derivative at zero (called gross-error sensitivity) and a larger asymptote (called breakdown point, BP). The comparison of maxbias functions naturally leads to the minimax bias theory, also initiated by Huber's seminal 1964 work. The minimax bias theory seeks estimates which minimize the maximum asymptotic bias

Received June 1999; revised October 2000.
Supported in part by DGESIC Spanish Grant PB.
Supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC).
AMS 2000 subject classification. 62F35.
Key words and phrases. Robust regression, maxbias curve, S-estimates, τ-estimates, R-estimates.

in a certain class of estimates. Huber (1964) showed that the median minimizes the maxbias curve among translation equivariant location estimates. Martin, Yohai and Zamar (1989) showed that the least median of squares estimate (LMS) introduced by Rousseeuw (1984) is nearly minimax in the class of regression M-estimates with general scale. Yohai and Zamar (1993) extended this result to the larger class of residual admissible estimates [see He (1990)].

A main criticism of the maxbias theory when applied to the comparison of regression estimates is that its results apply only to regression-through-the-origin models and/or to neighborhoods of a central regression model with elliptically distributed regressors. We address these criticisms by considering regression models with general regressors and including the intercept. Our results will also help to unify the existing theory because they yield a general methodology applicable to the large class of regression estimates satisfying (3) below. This class includes most regression and affine equivariant estimates for which maxbias curves were known.

Regarding the second application, we notice that, in general, estimates face two sources of uncertainty: sampling variability and bias. In the case of robust estimates, the sampling variability can be assessed by their standard errors (estimated using their asymptotic variances). On the other hand, the bias caused by outliers and other departures from symmetry can be assessed using their maximum asymptotic bias. To fix ideas, consider the location case and the median functional, M(F). Suppose we have a large sample from a distribution F containing at most a fraction ε × 100% of contamination. Suppose we wish to bound the absolute difference D(F) = |M(F) − M(F₀)| between the median M(F) of the contaminated distribution and the median M(F₀) of the core (uncontaminated) distribution.
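The quantity D(F) can be computed exactly for simple contaminations. The following is a minimal numerical sketch, assuming a standard normal core F₀, ε = 0.1 and a point mass at 1.5 as the contaminating distribution (all helper names are ours):

```python
import math

def phi(x):
    # standard normal CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def contaminated_cdf(x, eps=0.1, at=1.5):
    # F = (1 - eps) * N(0,1) + eps * (point mass at `at`)
    return (1.0 - eps) * phi(x) + eps * (1.0 if x >= at else 0.0)

def median_of(cdf, lo=-50.0, hi=50.0, tol=1e-10):
    # invert a monotone CDF at 1/2 by bisection
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if cdf(mid) < 0.5 else (lo, mid)
    return 0.5 * (lo + hi)

bias = median_of(contaminated_cdf)   # D(F) = M(F) - M(F0), since M(F0) = 0
print(round(bias, 4))                # 0.1397
```

Any contamination placed above the contaminated median attains Huber's maxbias Φ⁻¹(1/(2(1 − ε))) for the median, which also equals 0.1397 at ε = 0.1.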
Huber (1964) showed that

(1) B(ε) = sup_{F ∈ V_ε(F₀)} |M(F) − M(F₀)| / σ = F₀⁻¹( 1 / (2(1 − ε)) ),

and therefore D(F) is bounded by σ B(ε). In practice σ is seldom known and must be estimated by a robust scale functional S(F), for example, the median of absolute deviations about the median. Unfortunately, the quantity S(F)B(ε) is not an upper bound for D(F) because S(F) may underestimate σ. For instance, if F = 0.9 N(0,1) + 0.1 δ_{1.5}, then M(F) − M(F₀) = 0.1397 > MAD(F) B(0.1). A quantity K(ε) such that S(F)K(ε) is a bound for D(F) will be called a bias bound. In Section 6 we study the relation between maxbias curves and bias bounds for location and regression estimates. The bias bound is a new theoretical concept which highlights the practical potential of maxbias curves.

In the case of regression estimates we face challenging problems because maxbias curves for regression estimates are derived using a normalized distance (quadratic form) between the asymptotic and the true values of the regression coefficients. The normalization is based on a certain unknown scatter matrix of the regressors. Moreover, the maximum bias curve depends on the joint distribution of the regressors and the available formulas relied on unrealistic assumptions (elliptical regressors and

regression-through-the-origin model). Using the results of Theorem 1 we are able to derive satisfactory (but possibly not optimal) bias bounds for robust regression estimates satisfying (3) below. Similar results for other classes of robust estimates, for example, one-step Newton–Raphson estimates [Simpson, Ruppert and Carroll (1992)], projection estimates [Maronna and Yohai (1993)] and maximum depth estimates [Rousseeuw and Hubert (1999)], would be desirable.

The rest of the paper is organized as follows. In Section 2 we present some notation and technical background. In Section 3 we lay down the theoretical ground for our method. In Section 4 we illustrate our method with elliptical regressors. In Section 5 we consider several cases of non-elliptical regressors. In Section 6 we show how bias bounds can be obtained. In Section 7 we compare the robustness properties of some robust estimates for the intercept parameter. In Section 8 we give some concluding remarks. Finally, we collect all the proofs in the Appendix.

2. Notation and technical background. Consider the linear regression model with p-dimensional regressors and intercept parameter,

y_i = α₀ + θ₀'x_i + σ u_i,   1 ≤ i ≤ n,

where the independent errors u_i have distribution F₀ and are independent of the x_i. We assume that the regressors x_i are independent random vectors with common distribution G₀. The joint distribution of (y_i, x_i) under this model is denoted H₀. To allow for a fraction ε of contamination in the data we assume that the actual distribution H of (y_i, x_i) belongs to the contamination neighborhood

V_ε(H₀) = { H : H = (1 − ε)H₀ + εH̃, H̃ an arbitrary distribution }.

Let T be an ℝ^p-valued regression and affine equivariant functional for the estimation of θ₀, defined on a subset of distribution functions H on ℝ^{p+1} which includes all H in V_ε(H₀) and all the empirical distributions H_n.
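The contamination neighborhood V_ε(H₀) has a direct sampling interpretation: with probability 1 − ε draw from the core regression model, otherwise from an arbitrary contaminating distribution H̃. A minimal sketch, with a point mass at an arbitrary hypothetical point as H̃ (helper names and the contamination point are ours):

```python
import random

random.seed(1)

def sample_contaminated(n, alpha, theta, sigma, eps, p=2,
                        outlier=(10.0, 10.0, 10.0)):
    # Draw (y_i, x_i) from H = (1 - eps) H0 + eps H~, where H0 is the Gaussian
    # regression model y = alpha + theta'x + sigma*u and H~ is a point mass at
    # `outlier` (a (p+1)-tuple (y, x_1, ..., x_p); arbitrary choice here).
    data = []
    for _ in range(n):
        if random.random() < eps:
            data.append(outlier)        # contaminated case
        else:
            x = [random.gauss(0, 1) for _ in range(p)]
            y = alpha + sum(t * xj for t, xj in zip(theta, x)) \
                + sigma * random.gauss(0, 1)
            data.append((y, *x))
    return data
```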
A natural invariant measure of the robustness of T is given by the maximum bias function (maxbias function), which gives the maximum bias caused by a fraction ε of contamination,

(2) B_T(ε) = sup_{H ∈ V_ε(H₀)} { (T(H) − θ₀)' Σ (T(H) − θ₀) }^{1/2} / σ.

The matrix Σ is an affine equivariant scatter matrix of the regressors under G₀. In view of the equivariance of T and the invariance of B_T(ε), we can assume without loss of generality that θ₀ = 0, σ = 1 and Σ = I, and so B_T(ε) = sup_{H ∈ V_ε(H₀)} ‖T(H)‖. We will also assume without loss of generality that α₀ = 0.

So far, derivations of B_T(ε) for regression estimates were done assuming (i) that the regression is through the origin and/or (ii) that G₀ is elliptical. Moreover, the methods and formulas used were rather specific for each particular type of estimate.

We present a method to compute the maxbias curve of regression estimates which is general in several senses. First, it does not assume that the intercept term is known. Second, it does not require the assumption of ellipticity of G₀. Finally, it applies to a broad class of robust regression estimates, namely, those defined as

(3) (T(H), T_α(H)) = arg min_{(θ,α)} J(F_{H,θ,α}),

where J is a robust loss functional and F_{H,θ,α} is the distribution of the absolute residuals |r_i(θ, α)| = |y_i − α − θ'x_i| under H. This definition includes, among others, S-estimates [Rousseeuw and Yohai (1984)], τ-estimates [Yohai and Zamar (1988)] and R-estimates [Hössjer (1994)]. Some examples of estimates which are not residual admissible are GM-estimates [see, e.g., Hampel et al. (1986)], GS-estimates [Croux, Rousseeuw and Hössjer (1994)] and P-estimates [Maronna and Yohai (1993)].

3. The main result. We will assume that the robust loss functional satisfies:

A1. (a) If F and G are two distribution functions on ℝ such that F(u) ≤ G(u) for every u, then J(F) ≥ J(G).
(b) ε-monotonicity. Given two sequences of distribution functions on ℝ, F_n and G_n, which are continuous on ℝ and such that F_n(u) → F(u) and G_n(u) → G(u), where F and G are possibly sub-stochastic and continuous on ℝ, with G(∞) ≥ 1 − ε and
(4) G(u) ≥ F(u) for every u > 0,
then
(5) lim_n J(F_n) ≥ lim_n J(G_n).
Moreover, if (4) holds strictly, then (5) also holds strictly.
(c) If F and G are two distribution functions on ℝ, with F continuous, then

J((1 − ε)F + εδ_∞) = lim_n J((1 − ε)F + εU_n) ≥ J((1 − ε)F + εG),

where U_n stands for the uniform distribution function on (n − 1/n, n + 1/n).

A1(a) is a monotonicity condition that can be easily checked in all the examples of Sections 4 and 5. A1(b) was introduced by Yohai and Zamar (1993), who showed that when J is ε-monotone the corresponding estimate belongs to the general class of residual admissible regression estimates [He (1990)].
The definition of residual admissible estimate is rather involved but, loosely speaking, one can say that the distribution of the absolute residuals produced by a residual admissible estimate cannot be improved (in the sense of stochastic dominance) by using any other set of parameters. Finally, A1(c) is a condition that specifies the form of the contaminations that cause the greatest loss.

We will assume that the errors and regressors satisfy:

A2. F₀ has an even and strictly unimodal density f₀ with f₀(u) > 0 for every u, and P_{G₀}(θ'x = c) < 1 for each θ ≠ 0 and c ∈ ℝ.

Assumption A2 allows for great generality. In particular, it does not require ellipticity nor continuity of the regressors' distribution. The following is our main result.

Theorem 1. Let T be a regression estimate defined by (3). Let c = J((1 − ε)F_{H₀} + εδ_∞) and

(6) m(t) = inf_{‖θ‖=t} inf_α J((1 − ε)F_{H₀,θ,α} + εδ₀),

where δ₀ and δ_∞ are point mass distributions at zero and infinity, respectively. Then, under A1 and A2, the maxbias curve for T is given by

(7) B_T(ε) = m⁻¹(c).

Some general properties of the contamination sensitivity and BP of residual admissible estimates with unknown intercept can now be obtained. These properties are stated in the following corollary.

Corollary 1. Let T be a regression estimate defined by (3). With the same notation and assumptions of Theorem 1:
(a) The slope of B_T(ε) at zero is infinity: lim_{ε→0} B_T(ε)/ε = ∞.
(b) The BP of T, ε*, is given by ε* = inf{ ε > 0 : m(t) < c for all t > 0 }.

Notice that part (a) extends the results by He (1990) and Yohai and Zamar (1993, 1997) to the unknown intercept case. Moreover, part (b) shows how the BP of residual admissible estimates can be characterized through the loss function J. In the next section we illustrate the application of Theorem 1 in the Gaussian case. Non-elliptical distributions are considered in Section 5.

4. Gaussian and elliptical regressors. The following theorem shows that under strong symmetry assumptions on the regressors' distribution the function m(t), defined in (6), can be substantially simplified.

Theorem 2. Assume A1 and A2, and that the distribution of θ'x is symmetric, unimodal and only depends on ‖θ‖ for all θ. Then it holds that

inf_α J((1 − ε)F_{H₀,θ,α} + εδ₀) = J((1 − ε)F_{H₀,θ,0} + εδ₀) = m(‖θ‖).

As a consequence, the infima in (6) are no longer needed and the maxbias function B_T(ε) satisfies

(8) m(B_T(ε)) = J((1 − ε)F_{H₀} + εδ_∞).

The symmetry assumption of Theorem 2 is clearly satisfied by Gaussian regressors and, more generally, by elliptically symmetric and unimodal regressors. Moreover, in the Gaussian case the distribution of y − θ'x is normal with mean 0 and variance 1 + ‖θ‖², and the function m(t) then becomes particularly simple.

4.1. Maxbias curves for S-estimates. Regression S-estimates were defined by Rousseeuw and Yohai (1984) by the property of minimizing a scale M-estimate [see Huber (1964, 1981)]. It can be easily verified that in this case the functional J in (3) is given by J(F_{H,θ,α}) = S(F_{H,θ,α}), where

(9) S(F) = inf{ s > 0 : E_F χ(u/s) ≤ b }.

We assume that the score function χ satisfies:

A3. The function χ is even, bounded, monotone on [0, ∞), continuous at 0 with 0 = χ(0) < χ(∞) = 1, and with at most a finite number of discontinuities.

It can be easily checked that A1(a) and A1(c) hold under A3. Moreover, Yohai and Zamar (1993) showed that S(F) is ε-monotone for all ε > 0. The method given by Theorems 1 and 2 can now be used to strengthen the results in Martin, Yohai and Zamar (1989). First, it is easy to see that formula (3.18) in that paper, derived for S-estimators of regression through the origin, is also valid for the general regression model. Second, Martin, Yohai and Zamar (1989) and Yohai and Zamar (1993) established a minimax bias theory for the regression model. According to this theory, the least median of squares (LMS) estimate [Rousseeuw (1984)] minimizes the maximum bias among residual admissible estimates of regression through the origin. It is now immediate from Theorem 2 that the minimaxity of LMS extends to models including the intercept.

4.2. Maxbias curves for τ-estimates.
The loss functional J in (3) for the case of τ-estimates is given by J(F_{H,θ,α}) = τ²(F_{H,θ,α}), with

τ²(F) = S²(F) E_F χ₂( u / S(F) ),

where S(F) is defined as in (9) with χ = χ₁. If χ₁ and χ₂ satisfy A3 and χ₂ is differentiable with 2χ₂(u) − χ₂'(u)u ≥ 0, then A1(a) and A1(c) hold. Moreover, Yohai and Zamar (1993) showed that τ(F) is ε-monotone for all ε > 0.

As shown by Hössjer (1992), efficient S-estimates have a very low BP. Yohai and Zamar (1988) showed that τ-estimates inherit the BP of the initial S-estimate defined by χ₁, whereas their efficiency is mainly determined by χ₂.

Therefore they can have high efficiency and BP simultaneously. A natural question then is to what extent the τ-estimate inherits the good bias behavior of the initial S-estimate. We can answer this question using Theorem 3 below. Notice that (10) establishes a nice relationship between the maxbias curves of the τ-estimates and their initial S-estimates.

Theorem 3. Under the assumptions of Theorem 1, and assuming that the errors and the regressors are Gaussian, the maxbias curves for τ-estimates, B_τ(ε), are given by

(10) B_τ(ε) = { (1 + B_S²(ε)) H(ε) − 1 }^{1/2},

where B_S(ε) is the maxbias curve of the initial S-estimate based on χ₁ and H(ε) is defined as

H(ε) = [ g( (b − ε)/(1 − ε) ) + ε/(1 − ε) ] / g( b/(1 − ε) ),

with g(s) = g₂(g₁⁻¹(s)) and g_i(s) = E_Φ χ_i(u/s) for i = 1, 2. Here Φ is the standard normal distribution function.

A similar theorem can be stated for elliptical regressors. The maxbias curves for 95% efficient τ-estimates (solid line) and their initial S-estimates (dashed line) are given for the Huber loss functions χ(y) = min{ (y/c)², 1 } in Figure 1(a) and for the Tukey loss functions χ(y) = min{ 3(y/c)² − 3(y/c)⁴ + (y/c)⁶, 1 } in Figure 1(b). The maxbias curves for the corresponding 95% efficient S-estimates are also displayed (dotted-dashed line). The breakdown points are 0.13 (Huber) and 0.12 (Tukey). The question posed above is answered by these figures. The maxbias curves of the efficient τ-estimates are closer to those of the initial S-estimates than those

Fig. 1. Maxbias curves for τ-estimates (solid line), initial S-estimates (dashed line) and efficient S-estimates (dotted-dashed line): (a) Huber score functions; (b) Tukey score functions.

of the efficient S-estimates. Clearly, the gain in efficiency is less costly in the Huber case.

Remark. Our method can also be applied to R-estimates [see Hössjer (1994)]. In this case J(F_{H,θ,α}) = R(F_{H,θ,α}), where

R(F) = ∫ a(F(u)) u^k dF(u).

Berrendero and Zamar (1995) and Croux, Rousseeuw and Hössjer (1996) independently gave formulas for the maxbias curve of R-estimators of regression through the origin. The method given by Theorems 1 and 2 shows that these formulas extend to the general regression model.

5. Non-elliptical regressors. We wish to investigate the effect of non-elliptical regressors on the bias of robust estimates. To that effect, we consider a regression-through-the-origin model and compute the maxbias curve for three different regression S-estimates (θ̂_LMS, θ̂_H and θ̂_L) for several bivariate x-configurations. The corresponding score functions are

χ_LMS(y) = 0 if |y| ≤ c₁, 1 if |y| > c₁,

for θ̂_LMS,

χ_H(y) = min{ (y/c₂)², 1 }

for θ̂_H, and

χ_L(y) = min{ |y|/c₃, 1 }

for θ̂_L. The constants c₁ = 0.674, c₂ = 1.041 and c₃ = 1.47 are chosen so that these estimates have BP equal to 1/2. The multiple integration needed to evaluate J((1 − ε)F_{H₀,θ} + εδ₀) has been carried out using Monte Carlo methods with 10,000 replicates. For each t > 0, the infimum over ‖θ‖ = t has been approximated by evaluating J((1 − ε)F_{H₀,θ_γ} + εδ₀) at θ_γ = (t cos γ, t sin γ) with γ ranging over the grid 0°, 1°, ..., 180°. Finally, the equation m(t) = c has been solved by bisection.

5.1. Kurtosis of the marginal x-distributions. To investigate the effect of the tails of the marginal x-distributions, we compute the maxbias curves for several Student-t bivariate distributions. We found that, when the regressors x₁ and x₂ are independent Student-t random variables with n₁ ≤ n₂ degrees of freedom, the maxbias curve is solely determined by the minimum of the two numbers, namely n₁. Figures 2(a), (b) and (c) correspond to n₁ = 3, n₁ = 6 and n₁ = 12, respectively. Figure 2(d) displays the Gaussian limiting case (n₁ → ∞).
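The grid-and-bisection computation just described can be sketched numerically. The following is a rough sketch for the LMS score with independent Gaussian regressors (helper names are ours; we use fewer replicates and a coarser grid than the paper, and we exploit the fact that for the LMS scale the contaminated medians reduce to quantiles of the clean absolute-residual distribution):

```python
import math
import random

random.seed(0)
EPS = 0.1       # contamination fraction
N_MC = 2_000    # Monte Carlo replicates (the paper uses 10,000)

# One common sample from the core model: u ~ N(0,1), x ~ N_2(0, I).
SAMPLE = [(random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1))
          for _ in range(N_MC)]

def quantile(values, p):
    srt = sorted(values)
    return srt[min(int(p * len(srt)), len(srt) - 1)]

def null_loss(eps=EPS):
    # c = J((1-eps) F_{H0} + eps delta_inf): mass at infinity pushes the
    # median of |residuals| up to the 1/(2(1-eps)) quantile of the clean law.
    return quantile([abs(u) for u, _, _ in SAMPLE], 0.5 / (1.0 - eps))

def m_of_t(t, eps=EPS, n_grid=19):
    # m(t) = inf over ||theta|| = t of J((1-eps) F_{H0,theta} + eps delta_0):
    # mass at zero pulls the median down to the (1/2 - eps)/(1-eps) quantile.
    p = (0.5 - eps) / (1.0 - eps)
    best = float("inf")
    for k in range(n_grid):
        g = math.pi * k / (n_grid - 1)       # gamma grid over [0, 180] degrees
        th1, th2 = t * math.cos(g), t * math.sin(g)
        res = [abs(u - th1 * x1 - th2 * x2) for u, x1, x2 in SAMPLE]
        best = min(best, quantile(res, p))
    return best

def maxbias(eps=EPS, tol=1e-2):
    # solve m(t) = c by bisection; m(t) is increasing in t
    c, lo, hi = null_loss(eps), 0.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if m_of_t(mid, eps) < c else (lo, mid)
    return 0.5 * (lo + hi)
```

For Gaussian regressors the answer can be checked in closed form, since y − θ'x is N(0, 1 + t²); at ε = 0.1 the LMS maxbias is approximately 0.83, and the Monte Carlo sketch should land near that value.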
As n₁ decreases, the maxbias curves of the three estimates increase, more dramatically when

Fig. 2. Maxbias curves of θ̂_LMS (solid line), θ̂_H (dashed line) and θ̂_L (dotted-dashed line) for independent Student-t and Gaussian regressors: (a) Student t with 3 df; (b) Student t with 6 df; (c) Student t with 12 df; (d) standard normal.

n₁ = 3. The maxbias curves are already very close to the limiting Gaussian case when n₁ = 12. Although the maxbias curves behave differently for each x-configuration, the relative maxbias behaviors are remarkably preserved across the considered cases: the maxbias curves of θ̂_LMS and θ̂_H are quite similar and the maxbias curve of θ̂_L is somewhat larger than the other two, in all the considered cases.

5.2. Skewness of the marginal x-distributions. To investigate the effect of unbalanced x-configurations (asymmetric marginals) we compute the maxbias curves for several Chi-square bivariate distributions. In Figures 3(a), (b) and (c) the maxbias curves for two independent Chi-square regressors (x₁, x₂) with several pairs of degrees of freedom (n₁, n₂) are displayed. As the regressors become more asymmetric, the maxbias curves of the three estimates increase. However, when one of the two regressors is

Fig. 3. Maxbias curves of θ̂_LMS (solid line), θ̂_H (dashed line) and θ̂_L (dotted-dashed line) for independent Chi-square and Gaussian regressors: (a) x₁ and x₂ Chi-square with 1 df; (b) x₁ and x₂ Chi-square with 1 and 5 df; (c) x₁ and x₂ Chi-square with 1 and 10 df; (d) standard normal.

nearly Gaussian [see Figure 3(d)] the maxbias curves approach a Gaussian behavior. As in the previous case, the relative maxbias behaviors are preserved across the considered cases.

5.3. Dependent regressors. To investigate the effect of dependent regressors we compute the maxbias curves for two closely related x-configurations. In the two cases x₁ is standard normal and x₂ is Chi-square with one degree of freedom. In the first case x₂ = x₁², which corresponds to a second degree polynomial regression [see Figure 4(a)]. In the second case [Figure 4(b)] x₁ and x₂ are independent. Surprisingly, the maxbias curves are larger in the independent case, although the results for the two cases are strikingly close.

6. Bias bound. To introduce the main ideas we will consider first the simple location model.

Fig. 4. Maxbias curves of θ̂_LMS (solid line), θ̂_H (dashed line) and θ̂_L (dotted-dashed line) for (a) x₁ standard normal and x₂ = x₁², and (b) x₁ standard normal and x₂ Chi-square with one degree of freedom and independent of x₁.

6.1. Location model. Let y_i = θ₀ + σ u_i, i = 1, ..., n, where the errors u_i follow a nominal distribution F₀ satisfying A2. We assume that the actual distribution of the observations y₁, ..., y_n is F ∈ V_ε(F₀), that is, F(y) = (1 − ε)F₀((y − θ₀)/σ) + εF̃(y) for some arbitrary contamination distribution F̃. Let M(F) be a location M-functional to estimate θ₀ and let S(F) be a scale M-functional to estimate σ. Martin and Zamar (1993) give formulas for the maxbias of the location functional,

B_M(ε) = sup_{F ∈ V_ε(F₀)} |M(F) − θ₀| / σ,

and for the implosion and explosion maxbiases of the scale functional,

S⁻(ε) = inf_{F ∈ V_ε(F₀)} S(F)/σ   and   S⁺(ε) = sup_{F ∈ V_ε(F₀)} S(F)/σ.

As argued earlier (in the Introduction), the quantity S(F)B_M(ε) is not a bound for D(F) = |M(F) − θ₀|. On the other hand,

D(F) ≤ σ B_M(ε) = S(F) B_M(ε) (σ / S(F)) ≤ S(F) B_M(ε) / S⁻(ε),

and so K(ε) = B_M(ε)/S⁻(ε) is a bias bound. A refinement (of practical value when ε > 0.2) is provided by the following lemma.

Lemma 1. Let M(F) be an equivariant location functional with maxbias function B_M(ε) and BP 0.5. Let S(F) be a scale M-functional with score function χ satisfying A3 and such that E_{F₀} χ(X) = 1/2. Let γ(t) be defined as the

unique solution to

(1 − ε) E_{F₀} χ( (X − t)/γ(t) ) = 1/2.

Then

K̃(ε) = sup_{0 ≤ t ≤ B_M(ε)} t / γ(t)

is a bias bound for M(F).

Table 1 gives the values of B_M(ε), K(ε) and K̃(ε) for the median when S = MAD, for several values of ε and for F₀ = Φ, the standard normal distribution. Notice that K(ε) and K̃(ε) are larger than B_M(ε) because they take into account the possible underestimation of σ.

Table 1. Maxbias and bias bounds for the median (S = MAD) when F₀ is the standard normal distribution.

6.2. Regression model. In the regression case one would normally be interested in bounding the difference between the estimated regression coefficients and their true values. Notice that B_T(ε) gives an upper bound for the invariant quantity {(T(H) − T(H₀))' Σ (T(H) − T(H₀))}^{1/2}/σ [see (2)], whereas the straight difference ‖T(H) − T(H₀)‖ depends on the units used to measure the response and the regressors. Moreover, as in the location model, to obtain a usable bound we must take into account the possible bias in the estimation of σ and Σ. To fix ideas, we will assume that Σ = Cov_{G₀}(x). Let a₁(G₀), ..., a_p(G₀) be the unit eigenvectors of Σ and λ₁(G₀) ≤ ... ≤ λ_p(G₀) be the corresponding eigenvalues. We have the following result:

Lemma 2. For all H ∈ V_ε(H₀),

‖T(H) − T(H₀)‖ ≤ σ B_T(ε) / λ₁(G₀)^{1/2}.

In applications, the nuisance parameters σ and λ₁(G₀) must be robustly estimated. The residual scale σ can be estimated by a robust residual scale estimate S(F) applied to the distribution of the regression residuals, F_{H,T(H),T_α(H)}, that is,

(11) σ̂(H) = S( F_{H,T(H),T_α(H)} ).

Estimation of λ₁(G₀). Suppose that the core x-configuration is known except for an affine transformation. More precisely, suppose that G₀ is the distribution function of x = A(x̃ − v), where A is an unknown invertible matrix,

v is an unknown vector and x̃ = A⁻¹x + v has some specified canonical distribution function G̃ with E_G̃(x̃) = 0 and Cov_G̃(x̃) = I. Some examples are: (i) the core distribution of x is multivariate normal or Student-t; (ii) the non-contaminated entries of x are linear combinations of some independent random variables (e.g., Student-t random variables with n₁, n₂, ..., n_p degrees of freedom).

We will see that, in order to be useful for the construction of the bias bound, the functional λ̂₁(G) must have the following two properties: (a) it must have a known upper bound for its over-estimation bias; and (b) it must satisfy the inequality λ̂₁(G₀) ≤ λ₁(G₀)^{1/2}. We have found a functional which satisfies these requirements. The definition of this functional resembles the projection pursuit robust principal component methods proposed by Li and Chen (1985). Let ρ be a continuous loss function satisfying A3 and such that

(12) max_{‖a‖=1} inf_t E_G̃ ρ(a'x̃ − t) = 1/2.

For each distribution G, each (unitary) vector a and each real number t, let S_G(a, t) be the solution to

(13) E_G ρ( (a'x − t)/s ) = 1/2,

and let

(14) S_G(a) = inf_t S_G(a, t).

Define

(15) λ̂₁(G) = min_{‖a‖=1} S_G(a).

The following lemma shows that λ̂₁(G) satisfies conditions (a) and (b) above. Statement (c) of the lemma gives a condition under which the estimate is Fisher consistent. Note that this condition holds when G̃ is spherically symmetric. In general, ρ should be chosen so that min_{‖a‖=1} inf_t E_G̃ ρ(a'x̃ − t) is close to 1/2.

Lemma 3. Let x ~ G₀ and z = A⁻¹x + v ~ G̃, and let b = A'a/‖A'a‖ and t̃ = (t + a'Av)/‖A'a‖. Suppose that ρ satisfies A3 and is continuous and strictly monotone at any u such that ρ(u) < 1. Then:
(a) ‖A'a‖ S_G̃(b, t̃) = S_{G₀}(a, t) for all ‖a‖ = 1 and for all t;
(b) ‖A'a‖ S_G̃(b) = S_{G₀}(a) for all ‖a‖ = 1;
(c) λ̂₁(G₀) ≤ λ₁(G₀)^{1/2}, with equality if min_{‖a‖=1} inf_t E_G̃ ρ(a'x̃ − t) = 1/2.

As in the location case, we wish to find a bias bound for T(H), that is, a quantity K(ε) such that

(16) ‖T(H) − T(H₀)‖ ≤ σ̂(H) K(ε) / λ̂₁(G) = â(H) K(ε) for all H ∈ V_ε(H₀),

where K(ε) only depends on H₀ and ε, and â(H) can be consistently estimated. We have the following result:

Theorem 4. Let σ̂(H) and λ̂₁(G) be defined by (11) and (15). Suppose that ρ satisfies A3 and is continuous and strictly monotone at any u such that ρ(u) < 1. Then (16) holds with

(17) K(ε) = max_{‖a‖=1} ( S⁺(a, ε) / σ̂⁻(ε) ) B_T(ε),

where σ̂⁻(ε) is the implosion bias of the scale functional used in (11) [see Martin and Zamar (1993)] and S⁺(a, ε) = N(a, ε)/S_G̃(a), where N(a, ε) = inf_t N(a, ε, t) and N(a, ε, t) satisfies

(1 − ε) E_G̃ ρ( (a'x̃ − t) / N(a, ε, t) ) + ε = 1/2.

If the core distribution of the regressors, G̃, is multivariate normal, then (16) holds with

(18) K(ε) = ( S⁺(ε) / σ̂⁻(ε) ) B_T(ε),

where S⁺(ε) is the explosion bias of the S-scale functional S(F) and σ̂⁻(ε) is as before.

Table 2 gives the values of B_T(ε) and K(ε) when T is the least median of squares, G̃ is Gaussian, and ρ is a jump function. As in Table 1, K(ε) is larger than B_T(ε) because it takes into account the asymptotic bias in the estimation of σ(H) and λ₁(G). Notice that, because the estimation problems are now more involved, the differences observed in this table are larger than those observed in Table 1. However, it should also be noted that the bias bound given in Theorem 4 is probably not optimal. It is an interesting open problem to devise optimal bias bounds so that the differences in Table 2 are minimized.

Often one is interested in linear combinations of the regression coefficients, φ(H₀) = c'θ₀ (e.g., contrasts or predictions), which can be estimated by the

functional φ̂(H) = c'T(H). Applying the results above we have

|φ̂(H) − φ(H₀)| ≤ ‖c‖ K(ε) â(H),

and therefore the bias bound for φ̂(H_n) is ‖c‖ K(ε) â(H_n).

Table 2. Maxbias and bias bound for the least median of squares when G̃ is a Gaussian distribution.

To use in practice the bias bound given by Theorem 4 we should specify (up to an affine transformation) the core distribution G̃ and the amount of contamination ε, and then compute the quantity K(ε). The ratio σ̂(H)/λ̂₁(G) depends on the sampling distributions H and G. Although these distributions are unknown, we have observations drawn from them and, therefore, we can replace H and G by the empirical distributions H_n and G_n. For the bias bound to be asymptotically valid, it is necessary that

â(H_n) = σ̂(H_n)/λ̂₁(G_n) → σ̂(H)/λ̂₁(G) = â(H).

Since most robust scale estimates are consistent, the main concern is to show λ̂₁(G_n) → λ̂₁(G). This is accomplished in the following result:

Theorem 5. Let G_n be the empirical distribution corresponding to a sample x₁, ..., x_n drawn from G. Assume that the loss function ρ is continuous and satisfies A3. Furthermore, assume that ρ and G are such that the functions E_G ρ((a'x − t)/s) and E_{G_n} ρ((a'x − t)/s) are strictly decreasing in s > 0, a.s., for each a ∈ A = {a : ‖a‖ = 1} and each t ∈ ℝ. Then λ̂₁(G_n) → λ̂₁(G) a.s.

7. Intercept maxbias. Let (T(H), T_α(H)) be the regression functional defined by (3). Due to invariance considerations we define the intercept bias and maxbias as

b(H) = |T_α(H) − (α₀ + T(H)'μ)| / σ   and   B_{T_α}(ε) = sup_{H ∈ V_ε(H₀)} b(H),

respectively. Compare with Adrover, Salibian and Zamar (1999). Here μ is a multivariate location parameter for x under G₀. We can then assume without loss of generality that μ = 0, θ₀ = 0, α₀ = 0 and σ = 1, and therefore B_{T_α}(ε) = sup_{H ∈ V_ε(H₀)} |T_α(H)|.
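One natural intercept functional paired with a slope functional T, studied below, is the median of the residual distribution of y − T(H)'x. An empirical sketch (the helper name is ours):

```python
import statistics

def intercept_m(y, X, slope):
    # Median of the residuals y_i - slope'x_i: the empirical version of
    # taking the median of the distribution of y - T(H)'x under H.
    residuals = [yi - sum(b * xij for b, xij in zip(slope, xi))
                 for yi, xi in zip(y, X)]
    return statistics.median(residuals)
```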

Table 3. Upper bounds for the maxbias of α̂_M and lower bounds for the maxbias of several residual admissible intercept estimates: T_LMS (least median of squares); T_H (0.5 BP Huber S-estimate); T_T (0.5 BP Tukey S-estimate); T_{τ,H} (95% efficient and 0.5 BP τ-estimate with Huber score functions); and T_{τ,T} (95% efficient and 0.5 BP τ-estimate with Tukey score functions).

Although we cannot compute the maxbias for T_α, we are able to obtain (see Theorem 6 below) an upper bound for the maxbias of the intercept estimate α̂_M defined as

α̂_M(H) = med( F_{H,T(H)} ),

where F_{H,T(H)} stands for the distribution function of y − T(H)'x under H. We will provide some evidence suggesting that for small ε (ε ≤ 0.15 in our calculations) the intercept maxbias of T_α is larger than that of α̂_M.

Theorem 6. For every u ≥ 0, define

(19) F₁(u) = inf_{t ≤ B_T(ε)} inf_{‖θ‖=t} (1 − ε) F_{H₀,θ}(u),
(20) F₂(u) = sup_{t ≤ B_T(ε)} sup_{‖θ‖=t} (1 − ε) F_{H₀,θ}(u) + ε.

Let B₁(ε) = inf{ u : F₁(u) > 1/2 } and B₂(ε) = inf{ u : F₂(u) > 1/2 } (with B₁(ε) = ∞ and B₂(ε) = ∞, respectively, if the conditions are never satisfied). Define

B̄_{α̂_M}(ε) = max{ B₁(ε), B₂(ε) }.

Then B_{α̂_M}(ε) ≤ B̄_{α̂_M}(ε). Moreover, if the core distribution of the data, H₀, is multivariate normal, then

B̄_{α̂_M}(ε) = (1 + B_T²(ε))^{1/2} Φ⁻¹( 1 / (2(1 − ε)) ).

The second column of Table 3 gives the values of the maxbias upper bound B̄_{α̂_M}(ε) for the Gaussian case. The estimate T(H) used to compute α̂_M is the Huber S-estimate with BP equal to 1/2. The remaining columns of Table 3 give some maxbias lower bounds for several residual admissible estimates, for Gaussian H₀. More precisely, these are the values of T_α((1 − ε)H₀ + εδ_y), where for each case the value of y has been chosen so that T_α((1 − ε)H₀ + εδ_y) is maximized over a grid. From Table 3 we conclude that, for ε ≤ 0.15, B̄_{α̂_M}(ε) ≤ B_{T_α}(ε) for all the considered cases.
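Under the Gaussian-core expression of Theorem 6, the upper bound is an explicit function of B_T(ε) and ε, and is easy to evaluate numerically (a sketch; `phi_inv` is our own bisection-based normal quantile helper):

```python
import math

def phi_inv(p, tol=1e-12):
    # standard normal quantile by bisection on Phi(x) = p, 0 < p < 1
    lo, hi = -10.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 0.5 * (1.0 + math.erf(mid / math.sqrt(2.0))) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def intercept_maxbias_bound(b_t, eps):
    # Gaussian-core bound of Theorem 6:
    # (1 + B_T(eps)^2)^{1/2} * Phi^{-1}( 1 / (2 (1 - eps)) )
    return math.sqrt(1.0 + b_t ** 2) * phi_inv(0.5 / (1.0 - eps))
```

With no slope bias (b_t = 0) the bound reduces to the median's location maxbias Φ⁻¹(1/(2(1 − ε))) ≈ 0.1397 at ε = 0.1, and it grows with the slope maxbias through the factor (1 + B_T²)^{1/2}.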

8. Concluding remarks. As mentioned in the Introduction, the maximum bias theory has two main possible applications: (i) the comparison of competing robust regression estimates in terms of their bias behavior and (ii) the computation of bias bounds for robust estimates in practical situations.

Regarding the first application, our results confirm the relevance of the existing global robustness theory in two ways. First, they show that, under elliptical configurations, taking the intercept parameter equal to zero corresponds to the least favorable situation, and so the maxbias curves for regression-through-the-origin models formally agree with those for the general model. Therefore, the minimaxity of the LMS-estimator established by Martin, Yohai and Zamar (1989) and Yohai and Zamar (1993) is proved to be valid for models including the intercept parameter. Second, our calculation of maxbias curves for several robust regression estimates under various x-configurations indicates that the maxbias rankings obtained using Gaussian regressors also hold for other x-configurations. In summary, comparisons of competing robust estimates done using the Gaussian theory have implications beyond this restrictive setting.

Regarding the second application, we have defined bias bounds and discussed their relation to, and difference from, the maxbias curves. Our results show that in this regard the x-configuration matters, since the maxbias curves changed substantially across the several cases considered. It would be desirable to obtain bias bounds which are locally of order ε [that is, K(ε) = O(ε)]. In order to do that one should look outside the class of residual admissible estimates. However, it is not yet clear how to do this with the needed generality (unknown nuisance parameters, non-elliptical regressors, model including the intercept, etc.). Natural candidates are GM-estimates.
However, recent work by Adrover, Salibian and Zamar (1999) shows that there are enormous technical difficulties and that the BP of GM-estimates may take a big drop when the intercept is in the model. For the p = 1 case considered in the given reference one has:

1. BP = 1/2 when the intercept is known.
2. BP = 1/3 when the intercept is unknown and the location of the regressor is known.
3. BP = 1/4 when the intercept and the location of the regressor are unknown.

Another possibility is one-step Newton-Raphson estimates [see Simpson, Ruppert and Carroll (1992)]. The technical difficulties in this case are also considerable, especially when p > 1. To achieve high BP and local maxbias of order O(ε), the initial estimate must itself have high BP and suitably small local maxbias [see Simpson and Yohai (1998)]. We believe that knowledge about the maxbias behavior of the initial estimate (a residual admissible estimate in all the given proposals) will be useful, if not essential, for the derivation of bias bounds for these estimates.

Bias bounds take into account both the core distribution G (which is specified by the user) and the data [through the estimation of â(H)]. Perhaps it may be desirable to obtain completely empirical bias bounds, that is, solely

based on data. This could be achieved by replacing G, in our bias bounds, by an estimate Ĝ computed from the sample. This approach poses interesting new theoretical and practical problems.

APPENDIX

Lemma 4. Let M(θ) = inf_α J[(1 − ε)F_{H₀,α,θ} + εδ₀]. Under A1(b) and A2, there exists α(θ) such that

M(θ) = J[(1 − ε)F_{H₀,α(θ),θ} + εδ₀].

Moreover, for every t > 0 there exists K_t > 0 such that |α(θ)| ≤ K_t for each θ with ‖θ‖ = t.

Proof. From A1(b) and A2 it follows that J[(1 − ε)F_{H₀,α,θ} + εδ₀] is a continuous function of α and θ. Since, for each u > 0, lim_{|α|→∞} F_{H₀,α,θ}(u) < F_{H₀,0,θ}(u), it also follows from A1(b) that

lim_{|α|→∞} J[(1 − ε)F_{H₀,α,θ} + εδ₀] > J[(1 − ε)F_{H₀,0,θ} + εδ₀].

Therefore, for each θ there exists K_θ > 0 such that the infimum must be attained in the compact set [−K_θ, K_θ] and, therefore, the infimum is actually a minimum. Assume now that the last assertion of this lemma is not true; it is then possible, for some t > 0, to find a sequence (θ_n) with ‖θ_n‖ = t such that lim_n |α(θ_n)| = ∞. Suppose w.l.o.g. that θ_n → θ₀. For each α and u > 0 we have that

lim_n [(1 − ε)F_{H₀,α(θ_n),θ_n}(u) + εδ₀(u)] = ε < (1 − ε)F_{H₀,α,θ₀}(u) + εδ₀(u).

Hence,

(21) lim_n J[(1 − ε)F_{H₀,α(θ_n),θ_n} + εδ₀] ≥ J[(1 − ε)F_{H₀,α,θ₀} + εδ₀].

On the other hand, the definition of α(θ) implies that, for each α,

(22) lim_n J[(1 − ε)F_{H₀,α(θ_n),θ_n} + εδ₀] ≤ J[(1 − ε)F_{H₀,α,θ₀} + εδ₀].

It follows from (21) and (22) that the value of J[(1 − ε)F_{H₀,α,θ₀} + εδ₀] does not depend on α, but this is a contradiction since lim_{|α|→∞} J[(1 − ε)F_{H₀,α,θ₀} + εδ₀] > J[(1 − ε)F_{H₀,0,θ₀} + εδ₀].

Lemma 5. Under A2, for all θ with ‖θ‖ = 1, α, λ > 0 and u > 0, F_{H₀,λα,λθ}(u) is strictly decreasing in λ.

Proof. The reader can check this result by following the lines of the proof of Lemma A.1 in Yohai and Zamar (1993), rewriting the details when necessary.

The following lemma shows that the function m⁻¹ is well defined.

Lemma 6. Let m(t) be as in equation (6). Under A1(b) and A2:

(a) There exist θ_t with ‖θ_t‖ = t and α(θ_t) such that

m(t) = J[(1 − ε)F_{H₀,α(θ_t),θ_t} + εδ₀].

(b) m(t) is strictly increasing.

Proof. With the notation of Lemma 4, notice that

m(t) = inf_{‖θ‖=t} M(θ) = inf_{‖θ‖=t} inf_{|α| ≤ K_t} J[(1 − ε)F_{H₀,α,θ} + εδ₀],

where J[(1 − ε)F_{H₀,α,θ} + εδ₀] is uniformly continuous on the compact set {(α, θ) : ‖θ‖ = t, |α| ≤ K_t}. Therefore, M(θ) is continuous on the compact set {θ : ‖θ‖ = t}, as it is the infimum of an equicontinuous family of functions evaluated at θ. Part (a) follows from this fact.

Let t₁ and t₂ be such that t₁ > t₂ > 0. Define λ = t₂/t₁ < 1. Applying part (a), there exist θ₁ and θ₂ such that m(t₁) = M(θ₁) and m(t₂) = M(θ₂). Since, by Lemma 5, F_{H₀,α(θ₁),θ₁}(u) < F_{H₀,λα(θ₁),λθ₁}(u), it follows from A1(b) and the definition of α(·) that

m(t₁) > J[(1 − ε)F_{H₀,λα(θ₁),λθ₁} + εδ₀] ≥ J[(1 − ε)F_{H₀,α(λθ₁),λθ₁} + εδ₀].

But, by the definition of m(t) and given that ‖λθ₁‖ = t₂,

m(t₂) ≤ M(λθ₁) = J[(1 − ε)F_{H₀,α(λθ₁),λθ₁} + εδ₀].

The last two inequalities prove part (b).

Proof of Theorem 1. Let t₀ be such that c = m(t₀). First, we prove that B_T(ε) ≤ t₀. Let θ be such that ‖θ‖ = t > t₀. It is enough to show that, for every H ∈ V_ε(H₀) and every α, J(F_{H,α,θ}) > J(F_{H,0,0}). It is clear that, for each H ∈ V_ε(H₀), α and u > 0,

(23) F_{H,α,θ}(u) ≤ (1 − ε)F_{H₀,α,θ}(u) + εδ₀(u).

Inequality (23), A1(a), the definition of the function m(t) and Lemma 6(b) imply that, for each H ∈ V_ε(H₀),

(24) J(F_{H,α,θ}) ≥ J[(1 − ε)F_{H₀,α,θ} + εδ₀] ≥ m(t) > m(t₀).

The condition c = m(t₀) and A1(c) imply

(25) m(t₀) = lim_n J[(1 − ε)F_{H₀,0,0} + εδ_{u_n}] ≥ J(F_{H,0,0}).

Finally, inequalities (24) and (25) yield the first part of the result.

Now, we show the inequality B_T(ε) ≥ t₀. Let t̄ be such that t̄ < t₀. The idea of the proof is to find a distribution H ∈ V_ε(H₀) such that ‖T(H)‖ ≥ t̄. The inequality must hold if we can exhibit such a distribution for every t̄ < t₀.

By Lemma 6(a), there exist θ_{t̄} with ‖θ_{t̄}‖ = t̄ and α(θ_{t̄}) such that m(t̄) = J[(1 − ε)F_{H₀,α(θ_{t̄}),θ_{t̄}} + εδ₀]. Define the following sequence of contaminating distributions: H̃_n = δ_{(y_n, x_n)}, where x_n = nθ_{t̄} and y_n is uniformly distributed on the interval (α(θ_{t̄}) + nt̄² − 1/n, α(θ_{t̄}) + nt̄² + 1/n). If F_n is the uniform distribution function on (−1/n, 1/n), then for each θ, u > 0 and α,

(26) F_{H̃_n,α,θ}(u) = F_n[u + α − α(θ_{t̄}) − n(t̄² − θ′θ_{t̄})] − F_n[−u + α − α(θ_{t̄}) − n(t̄² − θ′θ_{t̄})].

Let H_n = (1 − ε)H₀ + εH̃_n. Suppose that sup_n ‖T(H_n)‖ < t̄, in order to find a contradiction. Under this assumption, there exists a convergent subsequence, still denoted by T(H_n), such that lim_n ‖T(H_n)‖ = lim_n ‖θ_n‖ = ‖θ*‖ = t* < t̄. Since t̄² − θ_{t̄}′θ_{t̄} = 0, it follows from (26) that

(27) lim_n F_{H̃_n,α(θ_{t̄}),θ_{t̄}}(u) = 1 for u > 0.

Next, we show that we can also assume w.l.o.g. that the subsequence of intercepts corresponding to θ_n, denoted by α_n, converges to a limit α*. Assume for a moment that lim_n |α_n| = ∞. Then, for each u > 0,

(28) lim_n F_{H_n,α_n,θ_n}(u) = ε lim_n F_{H̃_n,α_n,θ_n}(u) ≤ ε < (1 − ε)F_{H₀,α(θ_{t̄}),θ_{t̄}}(u) + ε = lim_n F_{H_n,α(θ_{t̄}),θ_{t̄}}(u).

For the last equality we apply (27). The strict inequality follows from A2, since

F_{H₀,α(θ_{t̄}),θ_{t̄}}(u) = ∫ P_{F₀}(|y − α(θ_{t̄}) − θ_{t̄}′x| ≤ u | x) dG₀(x) > 0 for u > 0.

From (28) and assumption A1(b) we have that J(F_{H_n,α_n,θ_n}) > J(F_{H_n,α(θ_{t̄}),θ_{t̄}}) for large enough n, and this fact contradicts the definition of (α_n, θ_n). Notice that θ*′θ_{t̄} ≤ t*t̄ < t̄². Hence, t̄² − θ*′θ_{t̄} > 0. It follows from (26) that

(29) lim_n F_{H̃_n,α_n,θ_n}(u) = 0 for u > 0.

From (29) and Lemma 5, we have that, for each u > 0,

(30) lim_n F_{H_n,α_n,θ_n}(u) = (1 − ε)F_{H₀,α*,θ*}(u) ≤ (1 − ε)F_{H₀,0,0}(u) = lim_n [(1 − ε)F_{H₀,0,0}(u) + εδ_{u_n}(u)].

Applying A1(b), A1(c) and inequality (30),

(31) lim_n J(F_{H_n,α_n,θ_n}) ≥ lim_n J[(1 − ε)F_{H₀,0,0} + εδ_{u_n}] = c = m(t₀).

From (27), we have that, for each u > 0,

(32) lim_n F_{H_n,α(θ_{t̄}),θ_{t̄}}(u) = (1 − ε)F_{H₀,α(θ_{t̄}),θ_{t̄}}(u) + εδ₀(u).

From A1(b), equation (32) and Lemma 6(b),

(33) lim_n J[F_{H_n,α(θ_{t̄}),θ_{t̄}}] = J[(1 − ε)F_{H₀,α(θ_{t̄}),θ_{t̄}} + εδ₀] = m(t̄) < m(t₀).

Applying (31) and (33), we have that, for large enough n,

J(F_{H_n,α_n,θ_n}) > J(F_{H_n,α(θ_{t̄}),θ_{t̄}}).

This last inequality is a contradiction, since (α_n, θ_n) = arg min_{(α,θ)} J(F_{H_n,α,θ}). For every t̄ < t₀ we have thus found a sequence of distributions H_n in the neighborhood V_ε(H₀) such that sup_n ‖T(H_n)‖ ≥ t̄.
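In the notation reconstructed above, the engine of Theorem 1 can be collected in a single display (a sketch in our notation, not a verbatim quote of the paper's equation (6)):

```latex
% m(t): least attainable J-value when the slope is biased by t and the
%       contaminating mass sits exactly on the biased fit (point mass at 0);
% c:    limiting J-value at the true fit when the contaminating mass
%       escapes to infinity along a sequence u_n -> infinity.
\[
  m(t) = \inf_{\|\theta\|=t,\;\alpha}
           J\bigl[(1-\varepsilon)F_{H_0,\alpha,\theta}
                  + \varepsilon\,\delta_0\bigr],
  \qquad
  c = \lim_{n\to\infty}
        J\bigl[(1-\varepsilon)F_{H_0,0,0}
               + \varepsilon\,\delta_{u_n}\bigr],
  \qquad
  B_T(\varepsilon) = m^{-1}(c).
\]
```

The first half of the proof shows that no distribution in the neighborhood can push the minimizing slope beyond m⁻¹(c); the second half exhibits contaminating sequences that force a bias arbitrarily close to it.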

Proof of Corollary 1. From Theorem 1 and Lemma 6, there exist θ_{t₀} ∈ ℝᵖ with ‖θ_{t₀}‖ = t₀ and α(θ_{t₀}) such that J[(1 − ε)F_{H₀,α(θ_{t₀}),θ_{t₀}} + εδ₀] = c and B_T(ε) = t₀ = ‖θ_{t₀}‖. Moreover, since α(θ_{t₀}) = 0,

J[(1 − ε)F_{H₀,0,θ_{t₀}} + εδ₀] = J[(1 − ε)F_{H₀,α(θ_{t₀}),θ_{t₀}} + εδ₀] = c = J[(1 − ε)F_{H₀,0,0} + εδ_∞].

Therefore, by A1(b), there exists u₀ such that

(1 − ε)F_{H₀,0,θ_{t₀}}(u₀) + ε ≥ (1 − ε)F_{H₀,0,0}(u₀).

This implies that F_{H₀,0,θ_{t₀}}(u₀) ≥ F_{H₀,0,0}(u₀) − ε/(1 − ε). At this point, we obtain the conclusion of part (a) by reproducing the proof of Theorem 2.1 in Yohai and Zamar (1997).

To prove part (b), notice that, from Theorem 1, ε < ε* implies that B_T(ε) < ∞. On the other hand, ε > ε* implies that m(t) < c for all t > 0. Therefore, following exactly the same lines of the second part of the proof of Theorem 1, we deduce that B_T(ε) ≥ t for all t > 0 and, hence, B_T(ε) = ∞.

Proof of Theorem 2. It is easy to check that, for all u > 0,

F_{H₀,α,θ}(u) = P(−u + α ≤ y − θ′x ≤ u + α).

By the symmetry and unimodality assumptions on F₀ and G₀, we have that, for all α,

F_{H₀,α,θ}(u) ≤ P(−u ≤ y − θ′x ≤ u) = F_{H₀,0,θ}(u),

and therefore, using A1(a),

J[(1 − ε)F_{H₀,α,θ} + εδ₀] ≥ J[(1 − ε)F_{H₀,0,θ} + εδ₀] for all α.

Finally, notice that, under the assumptions of the theorem, F_{H₀,0,θ} only depends on θ through the value of ‖θ‖.

Proof of Theorem 3. We will apply (7) and (8) to obtain the maxbias expression in the statement of the theorem. First, notice that

(34) c = τ²[(1 − ε)F_{H₀} + εδ_∞] = [g⁻¹((b − ε)/(1 − ε))]² [(1 − ε) g₁(g⁻¹((b − ε)/(1 − ε))) + ε].

Define m_S(t) = S[(1 − ε)F_{H₀,θ_t} + εδ₀]; then

(35) m_τ(t) = τ²[(1 − ε)F_{H₀,θ_t} + εδ₀] = m_S²(t) (1 − ε) E_{H₀} χ₂((y − θ_t′x)/m_S(t)) = (1 + t²) [g⁻¹(b/(1 − ε))]² (1 − ε) g₁(g⁻¹(b/(1 − ε)))

and, therefore, from (7), (34) and (35), the condition c = m_τ(B_τ(ε)) implies [see also formula (3.18) in Martin, Yohai and Zamar (1989)] that

B_τ²(ε) = B_S²(ε) H(ε).

Proof of Lemma 1. Since M is location equivariant, we can assume w.l.o.g. that θ₀ = 0 and σ₀ = 1. Let V_ε^t(F₀) = {F ∈ V_ε(F₀) : M(F) = t}. Notice that |M(F)| ≤ B_M(ε) for all F ∈ V_ε(F₀) and, therefore, V_ε(F₀) = ∪_{|t| ≤ B_M(ε)} V_ε^t(F₀). We have that

(36) sup_{F ∈ V_ε(F₀)} |M(F)|/S(F) = sup_{|t| ≤ B_M(ε)} sup_{F ∈ V_ε^t(F₀)} |t|/S(F).

Now, for each F ∈ V_ε^t(F₀), F = (1 − ε)F₀ + εF̃, it holds that

S(F) = inf{s > 0 : (1 − ε)E_{F₀}χ((X − t)/s) + εE_{F̃}χ((X − t)/s) ≤ 1/2},

and therefore S(F) ≥ γ(t). This fact, together with (36), proves the result.

Proof of Lemma 2. There exist coefficients β_i, i = 1, …, p, such that T(H) − T(H₀) = Σ_{i=1}^p β_i a_i(G₀). Moreover, for all H ∈ V_ε(H₀),

B_T²(ε) ≥ [T(H) − T(H₀)]′ Σ(G₀) [T(H) − T(H₀)]/σ₀² = Σ_{i=1}^p β_i² λ_i(G₀)/σ₀² ≥ [λ₁(G₀)/σ₀²] Σ_{i=1}^p β_i² = [λ₁(G₀)/σ₀²] ‖T(H) − T(H₀)‖².

Therefore, sup_{H ∈ V_ε(H₀)} ‖T(H) − T(H₀)‖ ≤ σ₀ B_T(ε)/√λ₁(G₀).
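Lemma 2 turns a maxbias computed in the scatter metric into a Euclidean bound on the slope bias. A toy sketch of the conversion (all numbers are illustrative; restricting to a diagonal scatter matrix is our simplification to keep the example dependency-free):

```python
def euclidean_bias_bound(maxbias, sigma, scatter_diag):
    """Lemma 2 in Euclidean form:
    sup_H ||T(H) - T(H0)|| <= sigma * B_T(eps) / sqrt(lambda_1),
    where lambda_1 is the smallest eigenvalue of the regressor scatter matrix.
    For a diagonal scatter, lambda_1 is just the smallest diagonal entry."""
    lambda_min = min(scatter_diag)
    if lambda_min <= 0.0:
        raise ValueError("scatter matrix must be positive definite")
    return sigma * maxbias / lambda_min ** 0.5

# Illustrative numbers: B_T(eps) = 0.5, sigma = 2, scatter diag(4, 9).
print(euclidean_bias_bound(0.5, 2.0, [4.0, 9.0]))  # prints 0.5
```

The smaller the least eigenvalue of the regressor scatter, the weaker the Euclidean bound, which is one way the x-configuration enters the bias bounds.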

Proof of Lemma 3. Part (a) follows directly from the fact that

E_{G₀}ρ((a′x − t)/s) = E_{G₀*}ρ[(b′z − t/‖A′a‖)/(s/‖A′a‖)].

Part (b) follows because

S_{G₀}(a) = inf_t S_{G₀}(a, t) = ‖A′a‖ inf_t S_{G₀*}(b, t) = ‖A′a‖ S_{G₀*}(b).

To prove (c), first notice that

(37) 1/2 ≤ inf_t E_G ρ[(a′x − t)/S_G(a)],

because otherwise there exists t₁ such that E_G ρ[(a′x − t₁)/S_G(a)] < 1/2 and S_G(a, t₁) < S_G(a). Also, since lim_{|t|→∞} E_G ρ(a′x − t) = 1, there exist 0 < K < ∞ and t* ∈ [−K, K] such that

(38) inf_t E_G ρ(a′x − t) = min_{|t| ≤ K} E_G ρ(a′x − t) = E_G ρ(a′x − t*).

Suppose now that S_G(a) > 1. Then,

(39) inf_t E_G ρ[(a′x − t)/S_G(a)] < E_G ρ(a′x − t*).

If (39) doesn't hold we arrive at a contradiction:

inf_t E_G ρ[(a′x − t)/S_G(a)] ≥ E_G ρ(a′x − t*) > E_G ρ[(a′x − t*)/S_G(a)] ≥ inf_t E_G ρ[(a′x − t)/S_G(a)],

where the strict inequality holds because E_G ρ(a′x − t*) ≤ 1/2 < 1, together with the monotonicity assumption on ρ, implies that the function (in s) E_G ρ[(a′x − t*)/s] is strictly decreasing at s = 1. By (12), (37), (38) and (39),

(40) 1/2 ≤ inf_t E_G ρ[(a′x − t)/S_G(a)] < E_G ρ(a′x − t*) = inf_t E_G ρ(a′x − t) ≤ max_{‖a‖=1} inf_t E_G ρ(a′x − t) = 1/2,

a contradiction. Therefore, S_G(a) ≤ 1. Finally, if Cov_{G₀}(x) = AA′, by (b),

λ̂₁(G₀) = min_{‖a‖=1} S_{G₀}(a) = min_{‖a‖=1} S_{G₀*}(b) ‖A′a‖ ≤ min_{‖a‖=1} ‖A′a‖ = (a₁′AA′a₁)^{1/2} = √λ₁(G₀).

If, in addition, min_{‖b‖=1} inf_t E_{G₀*}ρ(b′z − t) = 1/2, then S_{G₀*}(b) = 1 for all ‖b‖ = 1 and the equality holds.
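Each S_G(a, t) above is an M-scale: the value s solving E_G ρ((a′x − t)/s) = 1/2. The following minimal sketch computes such a scale from a sample of residuals r_i = a′x_i − t (the Tukey-type score, the choice b = 1/2 and the bisection solver are illustrative assumptions, not the paper's specification):

```python
import random

def rho_tukey(u, k=1.0):
    """Bounded (Tukey bisquare-type) score, increasing in |u|, valued in [0, 1]."""
    if abs(u) >= k:
        return 1.0
    w = (u / k) ** 2
    return 3.0 * w - 3.0 * w ** 2 + w ** 3  # equals 1 - (1 - (u/k)^2)^3

def m_scale(residuals, b=0.5, lo=1e-8, hi=1e8, iters=80):
    """Solve (1/n) * sum_i rho(r_i / s) = b for s by bisection.
    The average is decreasing in s, so bisection on s applies."""
    def avg(s):
        return sum(rho_tukey(r / s) for r in residuals) / len(residuals)
    for _ in range(iters):
        mid = (lo * hi) ** 0.5  # geometric midpoint: s spans many magnitudes
        if avg(mid) > b:
            lo = mid  # scale too small: residuals look too large
        else:
            hi = mid
    return (lo * hi) ** 0.5

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(5000)]
s = m_scale(x)  # a fixed positive constant for standard normal residuals
print(round(s, 3))
```

Rescaling the residuals rescales the solution by the same factor, which is exactly the equivariance exploited in parts (a) and (b) of Lemma 3.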

Proof of Theorem 4. Let V_ε(G₀) be the ε-contamination neighborhood of G₀ and let V_ε(G₀*) be the corresponding ε-contamination neighborhood of G₀*. First of all notice that, given x ~ G₀ and z = A⁻¹(x − v) ~ G₀*, by Lemma 3(b),

(41) S_G(a) = ‖A′a‖ S_{G*}(b),

and so

(42) sup_{G ∈ V_ε(G₀)} S_G(a)/S_{G₀}(a) = sup_{G* ∈ V_ε(G₀*)} [S_{G*}(b) ‖A′a‖]/[S_{G₀*}(b) ‖A′a‖] = sup_{G* ∈ V_ε(G₀*)} S_{G*}(b)/S_{G₀*}(b).

Moreover, by Lemma 3(c),

(43) σ₀/√λ₁(G₀) = [σ₀/λ̂₁(G₀)] [λ̂₁(G₀)/√λ₁(G₀)] ≤ [σ₀/λ̂₁(G₀)] sup_{G* ∈ V_ε(G₀*)} S_{G*}(b)/S_{G₀*}(b)

and

(44) σ₀/λ̂₁(G₀) = [σ̂(H)/λ̂₁(G₀)] [σ₀/σ̂(H)].

By (43) and (44),

σ₀/√λ₁(G₀) ≤ [σ̂(H)/λ̂₁(G₀)] [σ₀/σ̂(H)] sup_{G* ∈ V_ε(G₀*)} S_{G*}(b)/S_{G₀*}(b).

In addition, by (42),

(45) sup_{G ∈ V_ε(G₀)} [min_{‖a‖=1} S_G(a)] / [min_{‖a‖=1} S_{G₀}(a)]
≤ min_{‖a‖=1} [sup_{G ∈ V_ε(G₀)} S_G(a)/S_{G₀}(a)] S_{G₀}(a) / min_{‖a‖=1} S_{G₀}(a)
≤ min_{‖a‖=1} S⁺_a(ε) S_{G₀}(a) / min_{‖a‖=1} S_{G₀}(a)
≤ max_{‖a‖=1} S⁺_a(ε).

To obtain the next-to-last inequality notice that, for each ‖a‖ = 1 and t, S_G(a, t) ≤ N_a(ε, t).

In the Gaussian case, we have that S_{G₀}(a) = σ_a = (a′Σa)^{1/2}. Moreover, S_G(a) ≤ s̄_a, where s̄_a satisfies

(1 − ε)E_{G₀}ρ(a′x/s̄_a) + ε = 1/2.

From Theorem 4 in Martin and Zamar (1993) we have that s̄_a = σ_a S⁺(ε) and, therefore,

sup_{G ∈ V_ε(G₀)} λ̂₁(G) = sup_{G ∈ V_ε(G₀)} min_{‖a‖=1} S_G(a) ≤ min_{‖a‖=1} sup_{G ∈ V_ε(G₀)} S_G(a) ≤ min_{‖a‖=1} σ_a S⁺(ε) = √λ₁(G₀) S⁺(ε).

The final assertion of the theorem follows from this inequality.
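The last display, (1 − ε)E_{G₀}ρ(a′x/s̄) + ε = 1/2, determines the explosion factor S⁺(ε). A small numerical sketch (the jump score ρ(u) = 1{|u| > K} is our illustrative choice, picked because it makes the equation solvable in closed form; it is not the score used in the paper):

```python
from statistics import NormalDist

PHI = NormalDist()
K = PHI.inv_cdf(0.75)  # jump point tuned so that S+(0) = 1 at the normal model

def s_plus(eps):
    """Explosion factor S+(eps) for the M-scale with the (illustrative) jump
    score rho(u) = 1{|u| > K}: solves (1 - eps) * P(|Z| > K*s) + eps = 1/2,
    which for this score reduces to the closed form below."""
    if not 0.0 <= eps < 0.5:
        raise ValueError("contamination fraction must be in [0, 1/2)")
    return PHI.inv_cdf((3.0 - 2.0 * eps) / (4.0 * (1.0 - eps))) / K

for eps in (0.0, 0.05, 0.10, 0.20):
    print(round(s_plus(eps), 4))
```

At ε = 0 the factor is 1 by construction, and it increases with the contamination fraction, quantifying how much worst-case contamination can inflate the projection scales.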

The following lemma is needed to prove Theorem 5.

Lemma 7. Under the assumptions of Theorem 5, for each c > 0:

(a) S_{G₀}(a, t) and S_{G_n}(a, t) are uniformly continuous a.s. on (a, t) ∈ A × [−c, c].
(b) E_{G₀}ρ[(a′x − t)/S_{G₀}(a, t)] is uniformly continuous on (a, t) ∈ A × [−c, c].
(c) There exists C > 0 such that S_{G_n}(a) = min_{|t| ≤ C} S_{G_n}(a, t) and S_{G₀}(a) = min_{|t| ≤ C} S_{G₀}(a, t).
(d) S_{G_n}(a) and S_{G₀}(a) are uniformly continuous on a ∈ A.

Proof. From Lemma 3(a) in Martin and Zamar (1993) we know that, for c > 0 and 0 < s̲ < s̄ < ∞, the function ρ((x − t)/s) is uniformly continuous on (x, t, s) ∈ ℝ × [−c, c] × [s̲, s̄]. Let (a₁, t₁, s₁), (a₂, t₂, s₂) ∈ A × [−c, c] × [s̲, s̄]. Since

|E_{G₀}ρ((a₁′x − t₁)/s₁) − E_{G₀}ρ((a₂′x − t₂)/s₂)| ≤ E_{G₀}|ρ((a₁′x − t₁)/s₁) − ρ((a₂′x − t₂)/s₂)|,

E_{G₀}ρ((a′x − t)/s) is uniformly continuous on (a, t, s) ∈ A × [−c, c] × [s̲, s̄]. Let δ > 0 and (a₁, t₁), (a₂, t₂) ∈ A × [−c, c]. Define s₁ = S_{G₀}(a₁, t₁) and s₂ = S_{G₀}(a₂, t₂). Since E_{G₀}ρ((a′x − t)/s) is strictly decreasing in s > 0,

δ₀ = E_{G₀}ρ((a₁′x − t₁)/(s₁ − δ)) − 1/2 > 0.

There exists δ₁ > 0 such that, if ‖a₁ − a₂‖ < δ₁ and |t₁ − t₂| < δ₁, then

|E_{G₀}ρ((a₁′x − t₁)/(s₁ − δ)) − E_{G₀}ρ((a₂′x − t₂)/(s₁ − δ))| < δ₀/2,

which implies that E_{G₀}ρ((a₂′x − t₂)/(s₁ − δ)) > 1/2 and, therefore, s₂ > s₁ − δ. Analogously, it can be shown that there exists δ₂ > 0 such that, if ‖a₁ − a₂‖ < δ₂ and |t₁ − t₂| < δ₂, then s₁ > s₂ − δ. Considering min(δ₁, δ₂), we obtain |s₁ − s₂| < δ, and part (a) follows. (The result for G_n can be proved analogously.)

Part (b) follows straightforwardly from Lemma 3(a) in Martin and Zamar (1993) and part (a).

To prove part (c), notice that S_{G₀}(a) < ∞ and lim_{|t|→∞} S_{G₀}(a, t) = ∞. Therefore, for all a ∈ A there exists c_a > 0 such that S_{G₀}(a) = min_{|t| ≤ c_a} S_{G₀}(a, t) = S_{G₀}(a, t_a). Moreover, there exists C > 0 such that |t_a| ≤ C for all a ∈ A because, if this were not the case, there would exist a sequence (a_k) ⊂ A (we assume a_k → a₀ without loss of generality) such that |t_{a_k}| → ∞. Then,

(46) lim_k S_{G₀}(a_k, t_{a_k}) = ∞.

On the other hand, from the definition of t_a and part (a) we have that

(47) lim_k S_{G₀}(a_k, t_{a_k}) = lim_k S_{G₀}(a_k) = S_{G₀}(a₀) < ∞.

Equation (47) contradicts equation (46). Therefore, S_{G₀}(a) = min_{|t| ≤ C} S_{G₀}(a, t). Analogously, we can prove that S_{G_n}(a) = min_{|t| ≤ C} S_{G_n}(a, t). (We can assume the same value of C without loss of generality.) Finally, both S_{G₀}(a) and S_{G_n}(a) are continuous, since they are defined as the minimum over the compact set [−C, C] of a family of uniformly continuous functions. Since A is compact, both S_{G₀}(a) and S_{G_n}(a) are uniformly continuous on A, and therefore (d) follows.

Proof of Theorem 5. Notice that it is enough to prove

(48) max_{a ∈ A} |S_{G_n}(a) − S_{G₀}(a)| → 0 a.s.

Since A is a compact set, this fact, together with Lemma 7(d), implies the conclusion of the theorem. Let δ > 0, a ∈ A and t ∈ [−C, C]. Since E_{G₀}ρ((a′x − t)/s) is strictly decreasing in s, there exists δ₀ > 0 such that

E_{G₀}ρ[(a′x − t)/(S_{G₀}(a, t) − δ)] − 1/2 ≥ δ₀ > 0.

For any γ ≥ 0, define

B_n(a, t, γ) = {(1/n) Σ_{i=1}^n ρ((a′x_i − t)/(S_{G₀}(a, t) − δ)) ≥ 1/2 + γ}.

By Lemma 7(b), there exist t₁, …, t_m with t_j ∈ [−C, C] such that

∩_{j=1}^m B_n(a, t_j, δ₀/2) ⊂ ∩_{|t| ≤ C} B_n(a, t, 0) a.s.

Now, from Bernstein's inequality, there exists r > 0 such that

P_{G₀}(B_n^c(a, t_j, δ₀/2)) ≤ P_{G₀}{|(1/n) Σ_{i=1}^n ρ((a′x_i − t_j)/(S_{G₀}(a, t_j) − δ)) − E_{G₀}ρ((a′x − t_j)/(S_{G₀}(a, t_j) − δ))| > δ₀/2} ≤ 2 exp(−rnδ₀²).

For all a ∈ A, we have

P_{G₀}{min_{|t| ≤ C} [S_{G_n}(a, t) − S_{G₀}(a, t)] > −δ} ≥ P_{G₀}{∩_{|t| ≤ C} B_n(a, t, 0)} ≥ P_{G₀}{∩_{j=1}^m B_n(a, t_j, δ₀/2)} = 1 − P_{G₀}{∪_{j=1}^m B_n^c(a, t_j, δ₀/2)} ≥ 1 − 2m exp(−rnδ₀²).

Analogously, it can be proved that, for all a ∈ A,

P_{G₀}{max_{|t| ≤ C} [S_{G_n}(a, t) − S_{G₀}(a, t)] < δ} ≥ 1 − 2m exp(−rnδ₀²).

As a consequence, for all a ∈ A,

P_{G₀}{max_{|t| ≤ C} |S_{G_n}(a, t) − S_{G₀}(a, t)| > δ} ≤ 4m exp(−rnδ₀²).

Therefore,

Σ_{n=1}^∞ P_{G₀}{max_{a ∈ A} |S_{G_n}(a) − S_{G₀}(a)| > δ} = Σ_{n=1}^∞ P_{G₀}{|S_{G_n}(a_n) − S_{G₀}(a_n)| > δ}
≤ Σ_{n=1}^∞ P_{G₀}{max_{|t| ≤ C} |S_{G_n}(a_n, t) − S_{G₀}(a_n, t)| > δ/2}
≤ Σ_{n=1}^∞ 4m exp(−rnδ₀²) < ∞.

By the Borel–Cantelli lemma, this inequality implies (48).

Proof of Theorem 6. Let H ∈ V_ε(H₀) and notice that ‖T(H)‖ ≤ B_T(ε). Since, for some arbitrary distribution H̃, F_{H,T(H)} = (1 − ε)F_{H₀,T(H)} + εF_{H̃,T(H)}, we have F₁(u) ≤ F_{H,T(H)}(u) ≤ F₂(u) for all u. Therefore, −B₂(ε) ≤ α̂_M(H) ≤ B₁(ε), and the first part of the result follows.

When G₀ is multivariate normal, the inner infimum and supremum in (19) and (20) are not needed, because F_{H₀,θ}(u) only depends on θ through the length of θ. Indeed, for all ‖θ‖ = t we have

F_{H₀,θ}(u) = P_{H₀}(y − θ′x ≤ u) = Φ(u/√(1 + t²)).

As a consequence, F₁(u) = (1 − ε)Φ(u/√(1 + B_T²(ε))) and F₂(u) = (1 − ε)Φ(u) + ε. Hence,

B₁(ε) = (1 + B_T²(ε))^{1/2} Φ⁻¹(1/(2(1 − ε))) and B₂(ε) = Φ⁻¹(1/(2(1 − ε))).

Therefore, B̄_{α̂_M}(ε) = (1 + B_T²(ε))^{1/2} Φ⁻¹(1/(2(1 − ε))).

REFERENCES

Adrover, J. G. (1998). Minimax bias robust estimation of the dispersion matrix of a multivariate distribution. Ann. Statist.
Adrover, J. G., Salibian, M. and Zamar, R. H. (1999). Robust inference for the simple linear regression model. Unpublished manuscript. Available at ftp://hajek.stat.ubc.ca/pub/ruben/rislrm.
Berrendero, J. R. and Zamar, R. H. (1995). On the maxbias curve of residual admissible robust regression estimates. Working Paper 95-62, Univ. Carlos III de Madrid.
Berrendero, J. R. and Romo, J. (1998). Stability under contamination of robust regression estimates based on differences of residuals. J. Statist. Plann. Inference.
Croux, C., Rousseeuw, P. J. and Hössjer, O. (1994). Generalized S-estimators. J. Amer. Statist. Assoc.
Croux, C., Rousseeuw, P. J. and Van Bael, A. (1996). Positive-breakdown regression by minimizing nested scale estimators. J. Statist. Plann. Inference.


More information

Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests

Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests Biometrika (2014),,, pp. 1 13 C 2014 Biometrika Trust Printed in Great Britain Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests BY M. ZHOU Department of Statistics, University

More information

The Nonparametric Bootstrap

The Nonparametric Bootstrap The Nonparametric Bootstrap The nonparametric bootstrap may involve inferences about a parameter, but we use a nonparametric procedure in approximating the parametric distribution using the ECDF. We use

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Testing Statistical Hypotheses

Testing Statistical Hypotheses E.L. Lehmann Joseph P. Romano, 02LEu1 ttd ~Lt~S Testing Statistical Hypotheses Third Edition With 6 Illustrations ~Springer 2 The Probability Background 28 2.1 Probability and Measure 28 2.2 Integration.........

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

On the Power of Tests for Regime Switching

On the Power of Tests for Regime Switching On the Power of Tests for Regime Switching joint work with Drew Carter and Ben Hansen Douglas G. Steigerwald UC Santa Barbara May 2015 D. Steigerwald (UCSB) Regime Switching May 2015 1 / 42 Motivating

More information

GARCH Models Estimation and Inference. Eduardo Rossi University of Pavia

GARCH Models Estimation and Inference. Eduardo Rossi University of Pavia GARCH Models Estimation and Inference Eduardo Rossi University of Pavia Likelihood function The procedure most often used in estimating θ 0 in ARCH models involves the maximization of a likelihood function

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

The Restricted Likelihood Ratio Test at the Boundary in Autoregressive Series

The Restricted Likelihood Ratio Test at the Boundary in Autoregressive Series The Restricted Likelihood Ratio Test at the Boundary in Autoregressive Series Willa W. Chen Rohit S. Deo July 6, 009 Abstract. The restricted likelihood ratio test, RLRT, for the autoregressive coefficient

More information

More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction

More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction Sankhyā : The Indian Journal of Statistics 2007, Volume 69, Part 4, pp. 700-716 c 2007, Indian Statistical Institute More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order

More information

Closest Moment Estimation under General Conditions

Closest Moment Estimation under General Conditions Closest Moment Estimation under General Conditions Chirok Han Victoria University of Wellington New Zealand Robert de Jong Ohio State University U.S.A October, 2003 Abstract This paper considers Closest

More information

A note on profile likelihood for exponential tilt mixture models

A note on profile likelihood for exponential tilt mixture models Biometrika (2009), 96, 1,pp. 229 236 C 2009 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asn059 Advance Access publication 22 January 2009 A note on profile likelihood for exponential

More information

Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model

Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model 1. Introduction Varying-coefficient partially linear model (Zhang, Lee, and Song, 2002; Xia, Zhang, and Tong, 2004;

More information

MS-E2112 Multivariate Statistical Analysis (5cr) Lecture 4: Measures of Robustness, Robust Principal Component Analysis

MS-E2112 Multivariate Statistical Analysis (5cr) Lecture 4: Measures of Robustness, Robust Principal Component Analysis MS-E2112 Multivariate Statistical Analysis (5cr) Lecture 4:, Robust Principal Component Analysis Contents Empirical Robust Statistical Methods In statistics, robust methods are methods that perform well

More information

Measuring robustness

Measuring robustness Measuring robustness 1 Introduction While in the classical approach to statistics one aims at estimates which have desirable properties at an exactly speci ed model, the aim of robust methods is loosely

More information

Quantile methods. Class Notes Manuel Arellano December 1, Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be

Quantile methods. Class Notes Manuel Arellano December 1, Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be Quantile methods Class Notes Manuel Arellano December 1, 2009 1 Unconditional quantiles Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be Q τ (Y ) q τ F 1 (τ) =inf{r : F

More information

Highly Robust Variogram Estimation 1. Marc G. Genton 2

Highly Robust Variogram Estimation 1. Marc G. Genton 2 Mathematical Geology, Vol. 30, No. 2, 1998 Highly Robust Variogram Estimation 1 Marc G. Genton 2 The classical variogram estimator proposed by Matheron is not robust against outliers in the data, nor is

More information

Journal of Statistical Software

Journal of Statistical Software JSS Journal of Statistical Software April 2013, Volume 53, Issue 3. http://www.jstatsoft.org/ Fast and Robust Bootstrap for Multivariate Inference: The R Package FRB Stefan Van Aelst Ghent University Gert

More information

Financial Econometrics and Volatility Models Copulas

Financial Econometrics and Volatility Models Copulas Financial Econometrics and Volatility Models Copulas Eric Zivot Updated: May 10, 2010 Reading MFTS, chapter 19 FMUND, chapters 6 and 7 Introduction Capturing co-movement between financial asset returns

More information

Dependence. MFM Practitioner Module: Risk & Asset Allocation. John Dodson. September 11, Dependence. John Dodson. Outline.

Dependence. MFM Practitioner Module: Risk & Asset Allocation. John Dodson. September 11, Dependence. John Dodson. Outline. MFM Practitioner Module: Risk & Asset Allocation September 11, 2013 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Asymptotic inference for a nonstationary double ar(1) model

Asymptotic inference for a nonstationary double ar(1) model Asymptotic inference for a nonstationary double ar() model By SHIQING LING and DONG LI Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong maling@ust.hk malidong@ust.hk

More information

Robust negative binomial regression

Robust negative binomial regression Robust negative binomial regression Michel Amiguet, Alfio Marazzi, Marina S. Valdora, Víctor J. Yohai september 2016 Negative Binomial Regression Poisson regression is the standard method used to model

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 6 Jakub Mućk Econometrics of Panel Data Meeting # 6 1 / 36 Outline 1 The First-Difference (FD) estimator 2 Dynamic panel data models 3 The Anderson and Hsiao

More information

ASSESSING A VECTOR PARAMETER

ASSESSING A VECTOR PARAMETER SUMMARY ASSESSING A VECTOR PARAMETER By D.A.S. Fraser and N. Reid Department of Statistics, University of Toronto St. George Street, Toronto, Canada M5S 3G3 dfraser@utstat.toronto.edu Some key words. Ancillary;

More information

Regression Analysis for Data Containing Outliers and High Leverage Points

Regression Analysis for Data Containing Outliers and High Leverage Points Alabama Journal of Mathematics 39 (2015) ISSN 2373-0404 Regression Analysis for Data Containing Outliers and High Leverage Points Asim Kumer Dey Department of Mathematics Lamar University Md. Amir Hossain

More information

Statistical Data Analysis

Statistical Data Analysis DS-GA 0 Lecture notes 8 Fall 016 1 Descriptive statistics Statistical Data Analysis In this section we consider the problem of analyzing a set of data. We describe several techniques for visualizing the

More information

1 Appendix A: Matrix Algebra

1 Appendix A: Matrix Algebra Appendix A: Matrix Algebra. Definitions Matrix A =[ ]=[A] Symmetric matrix: = for all and Diagonal matrix: 6=0if = but =0if 6= Scalar matrix: the diagonal matrix of = Identity matrix: the scalar matrix

More information

Does k-th Moment Exist?

Does k-th Moment Exist? Does k-th Moment Exist? Hitomi, K. 1 and Y. Nishiyama 2 1 Kyoto Institute of Technology, Japan 2 Institute of Economic Research, Kyoto University, Japan Email: hitomi@kit.ac.jp Keywords: Existence of moments,

More information

Asymptotics of minimax stochastic programs

Asymptotics of minimax stochastic programs Asymptotics of minimax stochastic programs Alexander Shapiro Abstract. We discuss in this paper asymptotics of the sample average approximation (SAA) of the optimal value of a minimax stochastic programming

More information

DESCRIPTIVE STATISTICS FOR NONPARAMETRIC MODELS I. INTRODUCTION

DESCRIPTIVE STATISTICS FOR NONPARAMETRIC MODELS I. INTRODUCTION The Annals of Statistics 1975, Vol. 3, No.5, 1038-1044 DESCRIPTIVE STATISTICS FOR NONPARAMETRIC MODELS I. INTRODUCTION BY P. J. BICKEL 1 AND E. L. LEHMANN 2 University of California, Berkeley An overview

More information

Proof. We indicate by α, β (finite or not) the end-points of I and call

Proof. We indicate by α, β (finite or not) the end-points of I and call C.6 Continuous functions Pag. 111 Proof of Corollary 4.25 Corollary 4.25 Let f be continuous on the interval I and suppose it admits non-zero its (finite or infinite) that are different in sign for x tending

More information

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference 1 The views expressed in this paper are those of the authors and do not necessarily reflect the views of the Federal Reserve Board of Governors or the Federal Reserve System. Bayesian Estimation of DSGE

More information

Testing Some Covariance Structures under a Growth Curve Model in High Dimension

Testing Some Covariance Structures under a Growth Curve Model in High Dimension Department of Mathematics Testing Some Covariance Structures under a Growth Curve Model in High Dimension Muni S. Srivastava and Martin Singull LiTH-MAT-R--2015/03--SE Department of Mathematics Linköping

More information

Development of robust scatter estimators under independent contamination model

Development of robust scatter estimators under independent contamination model Development of robust scatter estimators under independent contamination model C. Agostinelli 1, A. Leung, V.J. Yohai 3 and R.H. Zamar 1 Universita Cà Foscàri di Venezia, University of British Columbia,

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

arxiv: v4 [math.st] 16 Nov 2015

arxiv: v4 [math.st] 16 Nov 2015 Influence Analysis of Robust Wald-type Tests Abhik Ghosh 1, Abhijit Mandal 2, Nirian Martín 3, Leandro Pardo 3 1 Indian Statistical Institute, Kolkata, India 2 Univesity of Minnesota, Minneapolis, USA

More information

TITLE : Robust Control Charts for Monitoring Process Mean of. Phase-I Multivariate Individual Observations AUTHORS : Asokan Mulayath Variyath.

TITLE : Robust Control Charts for Monitoring Process Mean of. Phase-I Multivariate Individual Observations AUTHORS : Asokan Mulayath Variyath. TITLE : Robust Control Charts for Monitoring Process Mean of Phase-I Multivariate Individual Observations AUTHORS : Asokan Mulayath Variyath Department of Mathematics and Statistics, Memorial University

More information

Robust multivariate methods in Chemometrics

Robust multivariate methods in Chemometrics Robust multivariate methods in Chemometrics Peter Filzmoser 1 Sven Serneels 2 Ricardo Maronna 3 Pierre J. Van Espen 4 1 Institut für Statistik und Wahrscheinlichkeitstheorie, Technical University of Vienna,

More information

STATISTICS SYLLABUS UNIT I

STATISTICS SYLLABUS UNIT I STATISTICS SYLLABUS UNIT I (Probability Theory) Definition Classical and axiomatic approaches.laws of total and compound probability, conditional probability, Bayes Theorem. Random variable and its distribution

More information

3 Multiple Linear Regression

3 Multiple Linear Regression 3 Multiple Linear Regression 3.1 The Model Essentially, all models are wrong, but some are useful. Quote by George E.P. Box. Models are supposed to be exact descriptions of the population, but that is

More information

A Note on Auxiliary Particle Filters

A Note on Auxiliary Particle Filters A Note on Auxiliary Particle Filters Adam M. Johansen a,, Arnaud Doucet b a Department of Mathematics, University of Bristol, UK b Departments of Statistics & Computer Science, University of British Columbia,

More information

Linear Models 1. Isfahan University of Technology Fall Semester, 2014

Linear Models 1. Isfahan University of Technology Fall Semester, 2014 Linear Models 1 Isfahan University of Technology Fall Semester, 2014 References: [1] G. A. F., Seber and A. J. Lee (2003). Linear Regression Analysis (2nd ed.). Hoboken, NJ: Wiley. [2] A. C. Rencher and

More information

Generalized Multivariate Rank Type Test Statistics via Spatial U-Quantiles

Generalized Multivariate Rank Type Test Statistics via Spatial U-Quantiles Generalized Multivariate Rank Type Test Statistics via Spatial U-Quantiles Weihua Zhou 1 University of North Carolina at Charlotte and Robert Serfling 2 University of Texas at Dallas Final revision for

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)

More information

Efficient and Robust Scale Estimation

Efficient and Robust Scale Estimation Efficient and Robust Scale Estimation Garth Tarr, Samuel Müller and Neville Weber School of Mathematics and Statistics THE UNIVERSITY OF SYDNEY Outline Introduction and motivation The robust scale estimator

More information

Introduction to Robust Statistics. Anthony Atkinson, London School of Economics, UK Marco Riani, Univ. of Parma, Italy

Introduction to Robust Statistics. Anthony Atkinson, London School of Economics, UK Marco Riani, Univ. of Parma, Italy Introduction to Robust Statistics Anthony Atkinson, London School of Economics, UK Marco Riani, Univ. of Parma, Italy Multivariate analysis Multivariate location and scatter Data where the observations

More information

Master s Written Examination - Solution

Master s Written Examination - Solution Master s Written Examination - Solution Spring 204 Problem Stat 40 Suppose X and X 2 have the joint pdf f X,X 2 (x, x 2 ) = 2e (x +x 2 ), 0 < x < x 2

More information

An Empirical Characteristic Function Approach to Selecting a Transformation to Normality

An Empirical Characteristic Function Approach to Selecting a Transformation to Normality Communications for Statistical Applications and Methods 014, Vol. 1, No. 3, 13 4 DOI: http://dx.doi.org/10.5351/csam.014.1.3.13 ISSN 87-7843 An Empirical Characteristic Function Approach to Selecting a

More information

Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data

Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data Sri Lankan Journal of Applied Statistics (Special Issue) Modern Statistical Methodologies in the Cutting Edge of Science Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations

More information

GARCH Models Estimation and Inference

GARCH Models Estimation and Inference GARCH Models Estimation and Inference Eduardo Rossi University of Pavia December 013 Rossi GARCH Financial Econometrics - 013 1 / 1 Likelihood function The procedure most often used in estimating θ 0 in

More information

A characterization of consistency of model weights given partial information in normal linear models

A characterization of consistency of model weights given partial information in normal linear models Statistics & Probability Letters ( ) A characterization of consistency of model weights given partial information in normal linear models Hubert Wong a;, Bertrand Clare b;1 a Department of Health Care

More information

Multiscale Adaptive Inference on Conditional Moment Inequalities

Multiscale Adaptive Inference on Conditional Moment Inequalities Multiscale Adaptive Inference on Conditional Moment Inequalities Timothy B. Armstrong 1 Hock Peng Chan 2 1 Yale University 2 National University of Singapore June 2013 Conditional moment inequality models

More information