Robust estimators based on generalization of trimmed mean


Communications in Statistics - Simulation and Computation

To cite this article: Lukáš Adam & Přemysl Bejda (2017): Robust estimators based on generalization of trimmed mean, Communications in Statistics - Simulation and Computation. Accepted author version posted online: 06 Jun 2017. Published online: 06 Jun 2017.

COMMUNICATIONS IN STATISTICS - SIMULATION AND COMPUTATION, 2017

Robust estimators based on generalization of trimmed mean

Lukáš Adam (a) and Přemysl Bejda (b)
(a) Institute of Information Theory and Automation, Czech Academy of Sciences, Prague, Czech Republic; (b) Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic

ABSTRACT: In this article, we propose new estimators of location. These estimators select a robust set around the geometric median, enlarge it, and compute the (iterative) weighted mean from it. By doing so, we obtain a robust estimator in the sense of the breakdown point which uses more observations than standard estimators. We apply our approach to the concepts of boxplot and bagplot. We work in a general normed vector space and allow multi-valued estimators.

ARTICLE HISTORY: Received 15 February 2016; Accepted 5 May 2017
KEYWORDS: Breakdown point; Estimators; Geometric median; Location; Trimmed mean
MATHEMATICS SUBJECT CLASSIFICATION: 62G35; 62G05

1. Introduction

Robust statistical methods try to weaken the quite restrictive assumptions of classical methods. For instance, in many statistical models it is assumed that the residuals are independent and identically distributed. Moreover, the assumption of normality of the residuals is often added and the Euclidean norm is usually employed. For real data these assumptions are often violated and classical approaches fail. This leads to a necessity to consider robust statistical methods. As classical examples of robust methods we mention replacing the l2 norm by the l1 norm or replacing the mean by the median. Such development was also enabled by the progress in computer technologies, as robust methods are usually more complicated and computationally demanding than classical methods.

In this article, we are interested in robust estimators of the location parameter. One of the first attempts to deal with such estimators are M-estimators, see Huber (1964) or Maronna (1976). They are computationally simple but suffer from a low breakdown point, see Hampel (1971) or Donoho and Huber (1983). The breakdown point, loosely speaking, expresses the fraction of the data which can be arbitrarily modified without affecting the finiteness of the estimator. Its value is at most 1/(d+1), see Maronna et al. (2006), where d is the dimension of the observations. Later, multiple estimators were proposed; among others we mention the following. Minimum volume ellipsoid estimators (MVE; Rousseeuw, 1985) owe their name to the fact that, among all proper ellipsoids containing at least half of the observations, the one given by MVE has minimal volume. However, their efficiency is rather poor. S-estimators (Davies, 1987) have been suggested to overcome the low efficiency of MVE; they combine the approaches of MVE and M-estimators. τ-estimators (Lopuhaä, 1991) also employ the idea of M-estimators but do not require a preliminary scale estimator.

CONTACT: Přemysl Bejda, premyslbejda@gmail.com, Faculty of Mathematics and Physics, Charles University, Sokolovská 83, Prague, Czech Republic. © 2017 Taylor & Francis Group, LLC

The Stahel-Donoho estimator (Donoho, 1982; Stahel, 1981) is based on the idea that any outlier in the multivariate case should be an outlier in some univariate projection.

The advantage of these estimators is their high breakdown point, which is the highest that a shift equivariant estimator can attain. However, their computation usually requires heavy effort. Therefore, Maronna and Zamar (2002) suggested a way of reducing the computational complexity while sustaining the high breakdown point. As a price to pay, one is no longer able to estimate the covariance structure.

In this article, we follow the goal of finding easily computable robust estimators with a high breakdown point. From a sample, we first select a set A of observations which are robust in the sense of the breakdown point. Then we utilize these observations to construct further estimators, for example, by enlarging A and computing the (iterative) weighted mean of the observations from the enlarged set. The set A is based on the geometric median (also called the spatial median), which is a direct generalization of the real median proposed by Haldane (1948). Since the geometric median has breakdown point 1/2, our estimators will be able to keep this property as well. Our estimators enjoy the following nice properties. Instead of considering the space R^d, we work with a general normed vector space X; this opens a natural way to tackle time series by our approach. Our estimators have a high breakdown point and are simple to compute. We partially consider the covariance structure. We are able to work with set-valued estimators instead of single-valued estimators. Note that estimators usually satisfy only several of the above properties; for example, either they have a high breakdown point or they do not take the covariance structure into account at all.

This article is organized as follows: in the first part of Section 2 we define the basic concepts of breakdown point and geometric median. Even though the geometric median may in general be a set and not a single point, most authors do not handle this fact. Because of this, we have decided to work with estimators which are multifunctions (also known as set-valued maps). The second part of Section 2 contains new results. We propose new estimators, discuss their breakdown point, and provide a comparison between our algorithms and M-estimators.

Our notation is as follows: by (X, ‖·‖) we understand a normed vector space. For a set A ⊆ X, we define ‖A‖ := sup_{x ∈ A} ‖x‖. Often we will use the bold notation x = (x_1, ..., x_n) ∈ X^n. By the lower index we understand a component of a vector, while by the upper index we mean an iteration number. A multifunction R : X ⇉ Y is a generalization of a function, where R(x) does not have to be one point but may be a (nonempty) subset of Y. We say that R is bounded on bounded sets if ∪_{x ∈ A} R(x) is a bounded set for all bounded sets A ⊆ X. Since we consider multi-valued estimators, some of the definitions are slightly generalized.

2. New estimators based on generalization of trimmed mean

In this section, we first recall the geometric median and on its basis derive other estimators. The basic idea is to find the geometric median first, then restrict ourselves to a set of neighboring observations, and construct an estimator based only on this restricted set. If this set is chosen in a proper way, the estimator will have a breakdown point of 1/2. The breakdown point is nowadays one of the standard measures of robustness and expresses the minimal proportion of the data which can be corrupted (made arbitrarily distant) before the estimator becomes unbounded.

Definition 2.1. Consider a normed vector space X and an estimator T_n : X^n ⇉ X of some functional T. For x = (x_1, ..., x_n) ∈ X^n and m = 1, ..., n define

A_{m,n}(x) := { x̃ ∈ X^n : x and x̃ have at most m different coordinates },
m_n(T_n, x) := max { m ∈ {1, ..., n} : sup_{x̃ ∈ A_{m,n}(x), z̃ ∈ T_n(x̃), z ∈ T_n(x)} ‖z̃ − z‖ < ∞ }.

Then we say that T_n has the breakdown point

ε_n(T_n, x) := (1/n) m_n(T_n, x).

Finally, for a family of estimators {T_n : X^n ⇉ X} we define the asymptotic breakdown point as

ε := liminf_{n→∞} inf_{x ∈ X^n} ε_n(T_n, x).

We continue with the definition of the geometric median.

Definition 2.2. We define the geometric median as a multifunction T̂_n : X^n ⇉ X satisfying

T̂_n(x_1, ..., x_n) = argmin_{a ∈ X} Σ_{j=1}^n ‖a − x_j‖.   (1)

We now present two examples. The first one shows that the choice of the norm can change the geometric median in a significant way and that the geometric median may indeed be multi-valued. The second one depicts a simple situation where we are able to compute the geometric median; moreover, it will be used later in some proofs.

Example 2.1. Consider X = R² and points x_1 = (1, 0), x_2 = (−1, 0), and x_3 = x_4 = (0, 1). Then it is not difficult to verify that for the following norms we have

(R², ‖·‖_1): T̂_4(x_1, ..., x_4) = conv{(0, 0), (0, 1)},
(R², ‖·‖_2): T̂_4(x_1, ..., x_4) = {(0, 1)},
(R², ‖·‖_∞): T̂_4(x_1, ..., x_4) = {(0, 1)},

where conv stands for the convex hull. We see that for ‖·‖_2 and ‖·‖_∞ the geometric median is determined in a unique way. This is no longer the case for ‖·‖_1.

Example 2.2. Consider x ∈ X and x = (x_1, ..., x_n), where x_1 = ... = x_m = x for some m ≥ n/2. Fix any y ∈ X. Then we have

Σ_{i=1}^n ‖x − x_i‖ = Σ_{i=m+1}^n ‖x − x_i‖
  ≤ Σ_{i=m+1}^n ( ‖x − y‖ + ‖y − x_i‖ )
  = Σ_{i=m+1}^n ‖y − x_i‖ + (n − m) ‖y − x‖
  ≤ Σ_{i=m+1}^n ‖y − x_i‖ + m ‖y − x‖
  ≤ Σ_{i=1}^n ‖y − x_i‖

due to m ≥ n/2. But this means that x ∈ T̂_n(x).
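Definition 2.2 does not come with a prescribed algorithm; one standard way to approximate the geometric median numerically for the Euclidean norm on R^d is the Weiszfeld iteration, sketched below. The function name, tolerance, and iteration cap are illustrative choices, not part of the paper.

```python
import numpy as np

def geometric_median(x, tol=1e-8, max_iter=1000):
    """Approximate Euclidean geometric median of the rows of x via the classical
    Weiszfeld iteration (an illustrative sketch, not the paper's own code)."""
    x = np.asarray(x, dtype=float)
    a = x.mean(axis=0)                       # starting point
    for _ in range(max_iter):
        d = np.linalg.norm(x - a, axis=1)
        if np.any(d < tol):                  # iterate (almost) hit a data point
            return x[np.argmin(d)]
        w = 1.0 / d
        a_new = (w[:, None] * x).sum(axis=0) / w.sum()
        if np.linalg.norm(a_new - a) < tol:  # negligible change, stop
            return a_new
        a = a_new
    return a

# Example 2.1 with the Euclidean norm: the minimizer of (1) is (0, 1).
pts = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
print(geometric_median(pts))                 # approximately [0. 1.]
```

For the l1 norm of Example 2.1 the problem separates into coordinatewise medians, and for other norms a general convex solver would be needed; the sketch above covers only the Euclidean case.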

Recall that a shift equivariant estimator T_n satisfies T_n(x_1 + y, ..., x_n + y) = T_n(x_1, ..., x_n) + y for all x_1, ..., x_n ∈ X and y ∈ X. For single-valued estimators, it has been shown in Maronna et al. (2006, formula (3.5)) that a shift equivariant estimator satisfies

ε_n(T_n, x) ≤ (1/n) ⌊(n − 1)/2⌋.   (2)

Since the geometric median possesses this property, it is not surprising that we obtain formula (2) as well. Moreover, we even obtain equality in this estimate.

Lemma 2.1. For any x = (x_1, ..., x_n) ∈ X^n, the geometric median satisfies

ε_n(T̂_n, x) = (1/n) ⌊(n − 1)/2⌋,

and thus for the asymptotic breakdown point we have ε = 1/2.

Proof. The second statement is an immediate consequence of the first one. Note that the first statement is equivalent to m_n(T̂_n, x) = n_0 with n_0 := ⌊(n − 1)/2⌋. From Example 2.2, we see that m_n(T̂_n, x) < n/2, which further implies m_n(T̂_n, x) ≤ n_0. To finish the proof, it is sufficient to show that m_n(T̂_n, x) ≥ n_0.

Consider thus any x̃ ∈ A_{n_0,n}(x) and denote by I the index set of coordinates where x and x̃ differ and by J its complement. Denote by n_1 the cardinality of I and observe that n_1 ≤ n_0. Denoting further R := max_{i=1,...,n} ‖x_i‖, we have x̃_j = x_j and ‖x̃_j‖ ≤ R for all j ∈ J. Taking any j ∈ J and y ∈ X, we obtain the following estimate:

Σ_{l=1}^n ‖x̃_j − x̃_l‖ ≤ Σ_{l∈I} ‖x̃_j − x̃_l‖ + (n − n_1) 2R
  ≤ Σ_{l∈I} ( ‖x̃_j − y‖ + ‖y − x̃_l‖ ) + (n − n_1) 2R
  = Σ_{l=1}^n ‖y − x̃_l‖ − Σ_{l∈J} ‖y − x̃_l‖ + n_1 ‖x̃_j − y‖ + (n − n_1) 2R
  ≤ Σ_{l=1}^n ‖y − x̃_l‖ − Σ_{l∈J} ( ‖y − x̃_j‖ − ‖x̃_j − x̃_l‖ ) + n_1 ‖x̃_j − y‖ + (n − n_1) 2R
  ≤ Σ_{l=1}^n ‖y − x̃_l‖ + (2n_1 − n) ‖y − x̃_j‖ + 4(n − n_1) R.

Since 2n_1 − n ≤ 2n_0 − n < 0, we obtain that there is R_I > 0 such that for all ‖y‖ ≥ R_I we have

Σ_{l=1}^n ‖x̃_j − x̃_l‖ < Σ_{l=1}^n ‖y − x̃_l‖.

But this means that the geometric median lies in a ball with radius R_I. Since there is only a finite number of possible subsets I, the proof is finished.
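Lemma 2.1 can also be checked empirically. The small experiment below (not from the paper) corrupts m observations of a sample and monitors the norm of the geometric median, reusing the geometric_median sketch above: for n = 11 the estimate stays near the clean data for m = 5 = ⌊(n − 1)/2⌋, while for m = 6 = ⌈n/2⌉ it is carried away with the contamination.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 11, 2
x = rng.normal(size=(n, d))                  # clean sample

for m in [n // 2, (n + 1) // 2]:             # m = 5 and m = 6 for n = 11
    y = x.copy()
    y[:m] += 1e6                             # push m observations far away
    med = geometric_median(y)                # Weiszfeld sketch from above
    print(m, np.linalg.norm(med))            # small for m = 5, huge for m = 6
```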

The next lemma allows us to compute the breakdown point of an estimator.

Lemma 2.2. Consider multifunctions Φ_1 : X^n ⇉ X^m and Φ_2 : X^n × X^m ⇉ X. Assume that the following assumptions are satisfied: (1) all components of Φ_1 have breakdown point at least p; (2) there exists Φ_3 : X^m ⇉ X which is bounded on bounded sets and such that Φ_2(x, y) ⊆ Φ_3(y) for all x ∈ X^n and y ∈ X^m. Then the estimator T_n defined as

T_n(x) := ∪_{y ∈ Φ_1(x)} Φ_2(x, y)

has breakdown point at least p.

Proof. Due to the first assumption, there exists some R > 0 such that for m := ⌊np⌋, all x̃ ∈ A_{m,n}(x), and all y ∈ Φ_1(x̃) we have ‖y‖ ≤ R. But then Φ_2(x̃, y) ⊆ Φ_3(y), which is uniformly bounded due to the second assumption. Thus, the statement has been proved.

We now come to the new estimators. For a set S ⊆ X and a point x = (x_1, ..., x_n) ∈ X^n, we define

L(S, x) := ∪_{y ∈ S} { x_I ∈ X^{⌊(n−1)/2⌋} : I ⊆ {1, ..., n}, max_{i ∈ I} ‖x_i − y‖ ≤ min_{i ∈ {1,...,n}\I} ‖x_i − y‖ },

where x_I denotes the restriction of x to the components in I. The interpretation of this set goes as follows: we select some y ∈ S and choose x_I to be the ⌊(n − 1)/2⌋ observations closest to y. Then L(S, x) is the union of all such subsets with respect to all choices of y ∈ S. Since every such x_I contains fewer than n/2 components of x, this set is stable with respect to perturbations of x whenever less than one half of the observations is contaminated.

We will use L(S, x) to define further estimators. The next theorem says that if we start with the geometric median S = T̂_n(x) and a multifunction R with certain boundedness properties, we obtain an estimator with the same breakdown point as the geometric median.

Theorem 2.1. Consider any multifunction R : X^{⌊(n−1)/2⌋} ⇉ X which is bounded on bounded sets and for which there exists a sequence z_k with ‖z_k‖ → ∞ such that ‖R(z_k, ..., z_k)‖ → ∞. Then for the estimator T^1_n : X^n ⇉ X defined as

T^1_n(x) := ∪_{y ∈ L(T̂_n(x), x)} R(y)   (3)

and for every x = (x_1, ..., x_n) ∈ X^n, we have the following relation:

ε_n(T^1_n, x) = ε_n(T̂_n, x) = (1/n) ⌊(n − 1)/2⌋.

Proof. From Lemma 2.2 with m = ⌊(n − 1)/2⌋, Φ_1(x) = L(T̂_n(x), x), and Φ_2(x, y) = R(y), together with Lemma 2.1, we obtain

ε_n(T^1_n, x) ≥ ε_n(T̂_n, x) = (1/n) ⌊(n − 1)/2⌋.

To show the opposite inequality, realize that the statement is equivalent to m_n(T^1_n, x) ≤ ⌊(n − 1)/2⌋. For contradiction assume that m_n(T^1_n, x) ≥ ⌈n/2⌉. We change the first ⌈n/2⌉ coordinates of x to z_k and denote the perturbed point by x̃_k. Then Example 2.2 tells us that z_k ∈ T̂_n(x̃_k). Due to the definition of L, we see that (z_k, ..., z_k) ∈ L(T̂_n(x̃_k), x̃_k), and the imposed assumption on R implies a contradiction.

If both R and L(T̂_n(x), x) are single-valued functions, expression (3) reduces to

T^1_n(x) = R(L(T̂_n(x), x)).
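In this single-valued case, estimator (3) with R chosen as the mean (the estimator later denoted GM1 in Table 1) can be sketched as follows, assuming the Euclidean norm; geometric_median is the Weiszfeld sketch above, and ties in the ordering are broken arbitrarily by the sort, which amounts to picking one element of L.

```python
import numpy as np

def gm1(x):
    """GM1 sketch: mean of the floor((n-1)/2) observations closest to the
    geometric median (single-valued reading of estimator (3) with R = mean)."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    n0 = (n - 1) // 2
    med = geometric_median(x)                 # Weiszfeld sketch from above
    d = np.linalg.norm(x - med, axis=1)
    keep = np.argsort(d)[:n0]                 # the n0 closest observations
    return x[keep].mean(axis=0)

# Example 2.3 in one dimension: x = (-3, -2, 0, 2, 4), median 0, n0 = 2.
x = np.array([[-3.0], [-2.0], [0.0], [2.0], [4.0]])
print(gm1(x))   # mean of one element of L: either (-2, 0) -> -1 or (0, 2) -> 1
```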

7 6 L. ADAM AND P. BEJDA Moreover, in such a case L ( ˆT n (x), x) denotes one half of observations, which are closest to the geometric median. There are several natural choices for R: for example, mean, weighted mean, or geometric median. One of possible drawbacks of estimator (3)isthatitutilizesonlyhalfoftheoriginaldata.We want to make use of as many observations as possible while maintaining the high breakdown point. To this aim, we first consider a general set S X m for some m N,forexample,we may consider mean of x as a subset of X or L ( ˆT n (x), x) as a subset of X n 1.Thenwe consider some b : X X n [0, ) and enlarge S by defining E b (S, x) := { } x I I ={i x i m j=1 B(y j, b(y j, x))}. (4) y S Here, B(y j, b(y j, x)) stands for a ball around y j with radius b(y j, x).theinterpretationgoes as follows: from S we select y, make balls around all of its components, and select all components of x,whichlieintheunionofthisballs. Example.3. Consider the case of X = R, n = 5, and x = ( 3,, 0,, 4). Then the geometric median equals to ˆT n (x) = 0 and since n 0 = n 1 =, we also have If we consider b 1, then L ( ˆT n (x), x) ={(, 0), (0, )} R. E b (L ( ˆT n (x), x), x) ={( 3,, 0), (0, )}. Note that both elements of E b (L ( ˆT n (x), x), x) areofadifferentdimension. We obtain the following variant of Theorem.1,for which we omit its identical proof. Theorem.. Consider b : X X n [0, ) boundedonboundedsetsinthefirstvariable, uniformly in the second one, any family of multifunctions R s : X s Xfors= 1,...,n, which are all bounded on bounded sets and for which there exists z k such that R s (z k,...,z k ). Then for estimators Tn : X n XandTn 3 : X n Xdefinedas T n (x) := T 3 n (x) := y E b ( ˆT n (x),x) y E b (L ( ˆT n (x),x),x) R dim y (y), R dim y (y) and for every x = (x 1,...,x n ) X n, we have the following relation for breakdown points ε n (T n, x) = ε n (T 3 n, x) = ε n ( ˆT n, x) = 1 n 1. n Function b should neither have too large values (which corresponds to big enlargement of the set in question) because outliers may be close to the non-contaminated data, nor too small values because some information could be missed. We suggest a possible choice in the Appendix. To improve the behavior of the estimators, we implement an iterative procedure. We start with geometric median z 0 = ˆT n (x) and in every iteration k compute a new estimate z k. To do so, we employ (5b) withr being the weighted mean, where the (non-normalized) (5a) (5b)

To improve the behavior of the estimators, we implement an iterative procedure. We start with the geometric median z^0 = T̂_n(x) and in every iteration k compute a new estimate z^k. To do so, we employ (5b) with R being the weighted mean, where the (non-normalized) weights satisfy

w_i(z^{k−1}, y_i) = 1                                                         if y_i ∈ L({z^{k−1}}, x),
w_i(z^{k−1}, y_i) = max_{y_j ∈ L({z^{k−1}}, x)} ( 1 − ‖y_i − y_j‖ / b_k(y_j, x) )   otherwise,   (6)

for some b_k based on L({z^{k−1}}, x). This choice of weights makes use of the possible division of the components of y ∈ E_{b_k}(L({z^{k−1}}, x), x) into two parts: those which belong to L({z^{k−1}}, x) and those which were added by enlarging this set. For the first part, we choose the (non-normalized) weight equal to one; the weight for observations from the second part decreases with increasing distance to L({z^{k−1}}, x). We summarize this approach in Algorithm 2.1. Concerning the termination criterion, any standard criterion may be used, for example, stopping when the (relative) change in z^k is small.

Algorithm 2.1. An estimator based on iterative weighting
Input: observations x = (x_1, ..., x_n)
1: k ← 0, z^0 ← T̂_n(x)
2: while not terminated do
3:   k ← k + 1
4:   determine b_k based on L({z^{k−1}}, x)
5:   pick any y^k ∈ E_{b_k}(L({z^{k−1}}, x), x)
6:   compute w^k according to formula (6) and renormalize the weights so that their sum equals 1
7:   z^k ← Σ_{i=1}^{dim y^k} w^k_i y^k_i
8: end while
9: return estimate x̂ ← z^k

Finally, we would like to point out that every update step in Algorithm 2.1 keeps the stability result which we already mentioned several times in the previous text. For k = 1, we may write z^k ∈ ∪_{y ∈ Φ_1(x)} Φ_2(x, y), where y = y^1, Φ_1(x) := E_{b_1}(L({z^0}, x), x), and Φ_2(x, y) := Σ_{i=1}^{dim y^1} w_i(x, y^1) y^1_i. Then Φ_1 has breakdown point (1/n)⌊(n − 1)/2⌋ and, since ‖Φ_2(x, y)‖ ≤ n max_{y_i ∈ y} ‖y_i‖ holds true, thanks to Lemma 2.2 we obtain that z^1 has the same breakdown point. By applying the same procedure to the subsequent iterations, we obtain the same result for all z^k.
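A compact sketch of Algorithm 2.1 (estimator GM3) follows, under the same simplifying assumptions as before (Euclidean norm, single-valued geometric median) and with the radii b_k taken from the Appendix formula for GM3; the values of α and w, the parameter defaults, and the function name are illustrative, not prescribed by the paper.

```python
import numpy as np
from scipy.stats import chi2

def gm3(x, alpha=0.95, w=0.5, max_iter=50, tol=1e-6):
    """GM3 sketch (Algorithm 2.1): iteratively re-weighted mean around the set
    L({z}, x), with weights (6) and radii b_k as suggested in the Appendix."""
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    n0 = (n - 1) // 2
    z = geometric_median(x)                      # Weiszfeld sketch from above
    for _ in range(max_iter):
        dist = np.linalg.norm(x - z, axis=1)
        order = np.argsort(dist)
        L = x[order[:n0]]                        # the n0 observations closest to z
        # per-component radii b_k(y_j, x) from the Appendix formula for GM3
        factor = (np.sqrt(chi2.ppf(alpha, d) / chi2.ppf(0.5, d)) - 1.0) / (1.0 - w)
        b = np.maximum(factor * np.linalg.norm(L - z, axis=1), 1e-12)
        # weights (6): 1 on L, decreasing with the distance to L, 0 outside E_b(L, x)
        gap = np.linalg.norm(x[:, None, :] - L[None, :, :], axis=2)   # shape (n, n0)
        wts = np.clip(1.0 - gap / b[None, :], 0.0, None).max(axis=1)
        wts[order[:n0]] = 1.0
        z_new = (wts[:, None] * x).sum(axis=0) / wts.sum()
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    return z_new
```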

9 8 L. ADAM AND P. BEJDA Algorithm. M-estimator from Require: observations x = (x 1,...,x n ), weighting functions W 1 and W 1: k 0, z 0 ˆT n (x), dispersion estimate σ 0 : while not terminate do 3: k k + 1 4: ri k x i z k 1 σ k 1 5: w1,i k W 1(ri k), wk,i W (ri k ) andnorm the weights to sum to one 6: z k n i=1 wk 1,i x i, σ k 1 n 7: end while 8: return estimate ˆx z k n i=1 wk,i (x i z k 1 ) Both approaches (iteratively) compute a weighted mean of observations. While M- estimators are based on the maximum likelihood estimate, ours are based on geometric intuition and trimmed mean. If X = R and if W 1 in Algorithm. has finite support, we can say that our estimators belong to the very wide class of M-estimators. This changes for R d though. Under the standard assumption that W 1 is symmetric around zero and non-increasing on rays emanating from zero, the weight of an observation depends only on the distance from z k 1. Thus, two observations have the same weight if and only if their distance to z k 1 is identical. On the other hand, in our approach all points in L ({z k 1 }, x) havethesameweightand this set is enlarged farther for distant observations. Thus, even though none of the algorithms estimates the covariance structure, our algorithm makes at least an attempt to consider it. Of course, there are M-estimators, which along with the location also properly estimate the covariance structure. But this raises the computational complexity and reduces the breakdown point to 1.Tosummarize:wecansaythatouralgorithmstrytopickthebestpropertiesof d+1 M-estimators, on one hand they have high breakdown point and are simple to compute; on the other hand, they at least partially consider the covariance structure. Another advantage of our estimator over M-estimators is a simpler theoretical analysis. To show that our estimator has the limiting breakdown point 1, it is sufficient to apply Lemma., which itself directly follows from the definition of the breakdown point. To the best of our knowledge, such direct application of the definition is not possible for M-estimators, for example, one has to take care of properties of weighting function due to the division by dispersion. 3. Numerical results In this section, we first show numerical performance of our estimators and then show how they comply with the well-known concepts of boxplot and bagplot Numerical performance We consider X = R d with d {1, 15} and compare our algorithms with known estimators; for reader s convenience we summarize the used algorithms in Table 1. To generate the samples, we first generate z i from N(0, 1), thencontaminatethemby some distribution with probability p and finally modify them via a covariance structure. This modification is performed in the following way: we randomly generate a correlation matrix C and diagonal matrix with diagonal elements having distribution U[0.5, 10]. Then we compute covariance matrix V = C, its Cholesky decomposition V = S S,andfinallyset y i = Sz i + μ, whereμ := (0,...,d 1). TogetherweconsiderN = 10,000 samples with

Table 1. Summary of algorithms (the first seven are known algorithms, GM1-GM3 are ours).
Mean: mean
Med: median or geometric median
Trun: α-truncated mean with α = 0.2
Winsor: α-winsorized mean with α = 0.2
M1: Huber M-estimator, see Huber (1981)
M2: Algorithm 2.2 from Maronna et al. (2006)
SD: Stahel-Donoho estimator, see Stahel (1981) and Donoho (1982)
GM1: formula (3), where R is the mean
GM2: formula (5a), where R is the mean
GM3: Algorithm 2.1

Table 2. Value of the loss function (7) for contaminations of N(0, 1) by some other distribution with probability p, for X = R (columns: Mean, Med, Trun, Winsor, M1, M2, GM1, GM2, GM3; rows: contamination by N(0, 100), Cauchy, and uniform distributions at several values of p).

The loss function equals

(1/N) Σ_{i=1}^N Σ_{j=1}^d ( x̂_{i,j} − μ_j )²,   (7)

where x̂_{i,j} denotes the estimate for sample i and coordinate j = 1, ..., d.

We present the results in Table 2 for X = R and in Table 3 for X = R^15. The values show the loss function (7) for the contaminating distribution (first column) with the given probability (second column). Bold numbers are the best values among all estimators and numbers in italics are those within 5% of the best loss function value. For X = R, the results of the M-estimators and of our estimators are comparable, with the M-estimators in a slight lead, which is not surprising due to Remark 2.1. For X = R^15, the performance of the two groups of estimators turns around and our estimators perform better than the examined M-estimators. This is again expected, as our estimators try to take the covariance structure into account, as explained in Remark 2.1. In the latter case, our estimators manage to beat the Stahel-Donoho estimator in almost all cases.

Table 3. Value of the loss function (7) for contaminations of N(0, I) by some other distribution with probability p, for X = R^15 (columns: Mean, Med, M1, SD, GM1, GM2, GM3; rows as in Table 2).
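For illustration only, a toy version of such an evaluation could be run as below, reusing the make_sample and gm1 sketches from above and the squared-deviation reading of (7); the number of replications is kept small and the resulting numbers are not comparable to the tables.

```python
import numpy as np

# Toy evaluation loop in the spirit of loss (7): average squared deviation from mu
# over N simulated samples; make_sample and gm1 are the sketches defined earlier.
N, n, d, p = 200, 100, 15, 0.1
rng = np.random.default_rng(1)
loss = {"Mean": 0.0, "GM1": 0.0}
for _ in range(N):
    x, mu = make_sample(n=n, d=d, p=p, rng=rng)
    loss["Mean"] += np.sum((x.mean(axis=0) - mu) ** 2) / N
    loss["GM1"] += np.sum((gm1(x) - mu) ** 2) / N
print(loss)
```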

Figure 1. Comparison of the classical boxplot (left-hand side of each figure) and our modification (right-hand side of each figure). In both cases, we contaminate N(0, 1) with probability p by the distribution described under the figures.

3.2. Relation to boxplot

The boxplot was proposed for the first time in Tukey (1977). It takes the median, then computes the interquartile range (IQR), which is later widened. Observations which are not present in this widening (known as the whiskers) are considered outliers. We compare the boxplot with our method where, instead of considering the IQR, we take y ∈ L(T̂_n(x), x). Thus, we do not need to take the 25% of observations with a lower value than the median and the 25% of observations with a higher value than the median, but we take the 50% of observations closest to the median. The whiskers are based on E_b({y}, x).

We depict this comparison in Fig. 1. We contaminate the standard normal distribution by some other distribution. In each (sub)figure, the left-hand side is the boxplot and the right-hand side is our modification. Since both approaches differ only in the way in which they treat non-symmetry, both graphs in the left figure are identical. However, if we assume non-symmetric distributions (right figure), then our modification is able to detect outliers better than the standard boxplot.
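A one-dimensional sketch of the modified boxplot just described: the box is the half of the observations closest to the median, and the whiskers enlarge it by the radius b of the Appendix (one-dimensional case); the value of α and the function name are illustrative choices.

```python
import numpy as np
from scipy.stats import norm

def modified_boxplot_limits(x, alpha=0.05):
    """Sketch of the modified boxplot of Section 3.2 (one dimension): the box covers
    the half of the observations closest to the median, and the whiskers enlarge it
    by b chosen as in the Appendix (1D case)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    n0 = (n - 1) // 2
    med = np.median(x)
    closest = x[np.argsort(np.abs(x - med))[:n0]]        # one element of L
    lo, hi = closest.min(), closest.max()
    # K from Lemma A.1 for the standard normal reference distribution
    K = (norm.ppf(1 - alpha / 2) - norm.ppf(alpha / 2)) / (norm.ppf(0.75) - norm.ppf(0.25))
    b = K * (hi - lo) / 2.0
    fence = (lo - b, hi + b)
    outliers = x[(x < fence[0]) | (x > fence[1])]
    return (lo, hi), fence, outliers
```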

3.3. Relation to bagplot

In this subsection, we consider a generalization of the boxplot to more dimensions. For two dimensions, this is known as the bagplot and was studied for the first time in Rousseeuw et al. (1999). For a generalization to functional data see, for example, Sun and Genton (2011). To construct a bagplot, a real number called depth is assigned to every observation, see Zuo and Serfling (2000). The observation with the highest value of depth is called the depth median, and the convex hull of approximately 50% of the observations with the highest depth is called the bag (this corresponds to the IQR for the boxplot). Then the bag is enlarged to the so-called fence (which corresponds to the whiskers for the boxplot). The observations not in the fence are considered outliers.

We now present our modification of the bagplot, illustrated in Fig. 2. Having a sample x, we compute first its geometric median T̂_n(x). For simplicity, we assume that it is defined uniquely. Then we choose an arbitrary y ∈ X^{⌊(n−1)/2⌋} from L(T̂_n(x), x), construct the convex hull containing all coordinates of y, and call this set the bag. In the last step, we enlarge the bag by considering

conv_i { T̂_n(x) + 3(y_i − T̂_n(x)) };

see a similar procedure in Rousseeuw et al. (1999).

Figure 2. Description of the bagplot based on GM2: the observations, the mean, T̂_n(x), GM1, GM2, and the outliers are marked by different symbols; the full and dashed lines are the convex envelopes of L and E, respectively.

In this figure, we also denote GM1 (as the mean of all observations in the bag) and GM2 (as the mean of all observations in the fence). Here, the geometric median corresponds to the depth median, the convex hull to the bag, and its enlargement to the fence. Note that the construction of the fence is very similar to the construction of E_b(L(T̂_n(x), x), x). Moreover, since the construction of GM2 is based on objects which have the limiting breakdown point 1/2, estimator GM2 has the same property. On the other hand, since the bagplot is a generalization of the boxplot, which has a breakdown point of 1/4, we cannot expect the bagplot to have a higher breakdown point than this. Thus, our version of the bagplot deals better with outliers than the original method.

Appendix: Choice of b

In this short section, we derive an estimate of b for the algorithms GM2 and GM3 described in Table 1. Note that, due to the construction of the algorithms, it is sufficient to define b_i := b(x_i, x) for all observations in L(T̂_n(x), x). Since we want to keep GM2 as simple as possible, we consider a constant b, and we relax this assumption for GM3. Moreover, we derive different values for the one- and more-dimensional cases. We start with a technical lemma.

Lemma A.1. Let Y be a random variable with a finite second moment, distribution function G, mean μ, standard deviation σ, and let q_{μ,σ} denote its quantile function. Then for a fixed α ∈ [0, 1], the following ratio does not depend on the values of μ and σ:

K_{G,α} = ( q_{μ,σ}(1 − α/2) − q_{μ,σ}(α/2) ) / ( q_{μ,σ}(0.75) − q_{μ,σ}(0.25) ).

Proof. This follows from the fact that σ q_{0,1}(α) + μ = q_{μ,σ}(α).
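Lemma A.1 is easy to verify numerically for the normal family; the snippet below (an illustration, not part of the paper) evaluates K_{G,α} for two parameter choices and obtains the same value.

```python
from scipy.stats import norm

# Quick numerical check of Lemma A.1 for the normal family: the ratio K_{G, alpha}
# computed from the quantile function does not depend on mu and sigma.
def K(ppf, alpha=0.05):
    return (ppf(1 - alpha / 2) - ppf(alpha / 2)) / (ppf(0.75) - ppf(0.25))

print(K(lambda q: norm.ppf(q)))                    # standard normal
print(K(lambda q: norm.ppf(q, loc=5, scale=3)))    # same value, as the lemma states
```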

For the case of one dimension, consider a random sample x from N(μ, σ²) and denote its median by T̂_n(x). For simplicity, assume that L(T̂_n(x), x) ⊂ R^{⌊(n−1)/2⌋} is a singleton. Then we may estimate q_{μ,σ}(0.75) − q_{μ,σ}(0.25) by max L(T̂_n(x), x) − min L(T̂_n(x), x), and we define b as

b := K_{N(0,1),α} ( max L(T̂_n(x), x) − min L(T̂_n(x), x) ) / 2 ≈ ( q_{μ,σ}(1 − α/2) − q_{μ,σ}(α/2) ) / 2.

This value thus estimates the distance between the chosen quantiles divided by two. If we take all observations whose distance from the median is at most b, then we get, under the assumption of uncontaminated normality, approximately a fraction 1 − α of our observations.

For the multidimensional case R^d, we assume that x follows the N(μ, σ²I) distribution. We use max_{y ∈ L(T̂_n(x), x)} ‖y − T̂_n(x)‖ as an approximation of σ √(χ²_{0.5,d}), where χ²_{0.5,d} is the quantile function of the χ² distribution with d degrees of freedom evaluated at probability 0.5. To include approximately a fraction α ∈ (0, 1) of all observations, we set

b := max_{y ∈ L(T̂_n(x), x)} ‖y − T̂_n(x)‖ · √( χ²_{α,d} / χ²_{0.5,d} ).

For GM3, we consider directly R^d. Assume again that our sample x has distribution N(μ, σ²I); then for x_i from the boundary of L(T̂_n(x), x) we have ‖x_i − T̂_n(x)‖ ≈ σ √(χ²_{0.5,d}). Now, fixing a probability level α ∈ (0, 1) and a weight w ∈ (0, 1), we want all x_e with ‖x_e − T̂_n(x)‖ ≤ σ √(χ²_{α,d}) to have weight (6) at least w. Plugging this into the definition of the weight results in

b_i ≥ ‖x_i − x_e‖ / (1 − w) ≈ ( ‖x_e − T̂_n(x)‖ − ‖x_i − T̂_n(x)‖ ) / (1 − w) = σ ( √(χ²_{α,d}) − √(χ²_{0.5,d}) ) / (1 − w) ≈ ‖x_i − T̂_n(x)‖ ( √(χ²_{α,d}) / √(χ²_{0.5,d}) − 1 ) / (1 − w).

Approximating again the distribution and taking the minimal value of b_i, we set

b_i := ( √(χ²_{α,d}) − √(χ²_{0.5,d}) ) / ( (1 − w) √(χ²_{0.5,d}) ) · ‖x_i − T̂_n(x)‖.

In the numerical experiments, we have chosen a fixed value of α.

Acknowledgment

The authors would like to thank the anonymous reviewers for substantially improving this article.

ORCID: Lukáš Adam

References

Davies, P. L. (1987). Asymptotic behavior of S-estimators of multivariate location parameters and dispersion matrices. The Annals of Statistics 15.

Donoho, D. L. (1982). Breakdown Properties of Multivariate Location Estimators. Ph.D. Qualifying Paper, Harvard University.

Donoho, D. L., Huber, P. J. (1983). The notion of breakdown point. In: Bickel, P. J., Doksum, K. A., Hodges, J. L., eds. A Festschrift for Erich Lehmann. Berkeley, California: Wadsworth.

Haldane, J. B. S. (1948). Note on the median of a multivariate distribution. Biometrika 35(3-4).

Hampel, F. R. (1971). A general qualitative definition of robustness. The Annals of Mathematical Statistics 42(6).

Huber, P. J. (1964). Robust estimation of a location parameter. The Annals of Mathematical Statistics 35(1).

Huber, P. J. (1981). Robust Statistics. New York: John Wiley & Sons.

Lehmann, E. L., Casella, G. (2006). Theory of Point Estimation. New York: Springer Science & Business Media.

Lopuhaä, H. P. (1991). Multivariate τ-estimators for location and scatter. Canadian Journal of Statistics 19.

Maronna, R. (1976). Robust M-estimators of multivariate location and scatter. The Annals of Statistics 4.

Maronna, R., Martin, D., Yohai, V. (2006). Robust Statistics: Theory and Methods. New Jersey: John Wiley & Sons.

Maronna, R. A., Zamar, R. H. (2002). Robust estimates of location and dispersion for high-dimensional datasets. Technometrics 44(4).

Rousseeuw, P. J. (1985). Multivariate estimation with high breakdown point. In: Mathematical Statistics and Applications. Amsterdam: Elsevier.

Rousseeuw, P. J., Ruts, I., Tukey, J. W. (1999). The bagplot: A bivariate boxplot. The American Statistician 53(4).

Stahel, W. A. (1981). Breakdown of Covariance Estimators. Research Report 31. Zürich: Fachgruppe für Statistik, ETH Zürich.

Sun, Y., Genton, M. G. (2011). Functional boxplots. Journal of Computational and Graphical Statistics 20(2).

Tukey, J. W. (1977). Exploratory Data Analysis. Massachusetts: Addison-Wesley Publishing Co.

Zuo, Y., Serfling, R. (2000). General notions of statistical depth function. The Annals of Statistics 28(2).


More information

0.1 Pointwise Convergence

0.1 Pointwise Convergence 2 General Notation 0.1 Pointwise Convergence Let {f k } k N be a sequence of functions on a set X, either complex-valued or extended real-valued. We say that f k converges pointwise to a function f if

More information

ASYMPTOTICS OF REWEIGHTED ESTIMATORS OF MULTIVARIATE LOCATION AND SCATTER. By Hendrik P. Lopuhaä Delft University of Technology

ASYMPTOTICS OF REWEIGHTED ESTIMATORS OF MULTIVARIATE LOCATION AND SCATTER. By Hendrik P. Lopuhaä Delft University of Technology The Annals of Statistics 1999, Vol. 27, No. 5, 1638 1665 ASYMPTOTICS OF REWEIGHTED ESTIMATORS OF MULTIVARIATE LOCATION AND SCATTER By Hendrik P. Lopuhaä Delft University of Technology We investigate the

More information

Letter-value plots: Boxplots for large data

Letter-value plots: Boxplots for large data Letter-value plots: Boxplots for large data Heike Hofmann Karen Kafadar Hadley Wickham Dept of Statistics Dept of Statistics Dept of Statistics Iowa State University Indiana University Rice University

More information

Inference For High Dimensional M-estimates: Fixed Design Results

Inference For High Dimensional M-estimates: Fixed Design Results Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49

More information

REGRESSION ANALYSIS AND ANALYSIS OF VARIANCE

REGRESSION ANALYSIS AND ANALYSIS OF VARIANCE REGRESSION ANALYSIS AND ANALYSIS OF VARIANCE P. L. Davies Eindhoven, February 2007 Reading List Daniel, C. (1976) Applications of Statistics to Industrial Experimentation, Wiley. Tukey, J. W. (1977) Exploratory

More information

Some Background Material

Some Background Material Chapter 1 Some Background Material In the first chapter, we present a quick review of elementary - but important - material as a way of dipping our toes in the water. This chapter also introduces important

More information

Solving Corrupted Quadratic Equations, Provably

Solving Corrupted Quadratic Equations, Provably Solving Corrupted Quadratic Equations, Provably Yuejie Chi London Workshop on Sparse Signal Processing September 206 Acknowledgement Joint work with Yuanxin Li (OSU), Huishuai Zhuang (Syracuse) and Yingbin

More information

robustness, efficiency, breakdown point, outliers, rank-based procedures, least absolute regression

robustness, efficiency, breakdown point, outliers, rank-based procedures, least absolute regression Robust Statistics robustness, efficiency, breakdown point, outliers, rank-based procedures, least absolute regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html

More information

Lecture 12 Robust Estimation

Lecture 12 Robust Estimation Lecture 12 Robust Estimation Prof. Dr. Svetlozar Rachev Institute for Statistics and Mathematical Economics University of Karlsruhe Financial Econometrics, Summer Semester 2007 Copyright These lecture-notes

More information

Optimization and Optimal Control in Banach Spaces

Optimization and Optimal Control in Banach Spaces Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,

More information

REAL RENORMINGS ON COMPLEX BANACH SPACES

REAL RENORMINGS ON COMPLEX BANACH SPACES REAL RENORMINGS ON COMPLEX BANACH SPACES F. J. GARCÍA PACHECO AND A. MIRALLES Abstract. In this paper we provide two ways of obtaining real Banach spaces that cannot come from complex spaces. In concrete

More information

Supplementary Material for Wang and Serfling paper

Supplementary Material for Wang and Serfling paper Supplementary Material for Wang and Serfling paper March 6, 2017 1 Simulation study Here we provide a simulation study to compare empirically the masking and swamping robustness of our selected outlyingness

More information

Topology Proceedings. COPYRIGHT c by Topology Proceedings. All rights reserved.

Topology Proceedings. COPYRIGHT c by Topology Proceedings. All rights reserved. Topology Proceedings Web: http://topology.auburn.edu/tp/ Mail: Topology Proceedings Department of Mathematics & Statistics Auburn University, Alabama 36849, USA E-mail: topolog@auburn.edu ISSN: 0146-4124

More information

DD-Classifier: Nonparametric Classification Procedure Based on DD-plot 1. Abstract

DD-Classifier: Nonparametric Classification Procedure Based on DD-plot 1. Abstract DD-Classifier: Nonparametric Classification Procedure Based on DD-plot 1 Jun Li 2, Juan A. Cuesta-Albertos 3, Regina Y. Liu 4 Abstract Using the DD-plot (depth-versus-depth plot), we introduce a new nonparametric

More information

Robust high-dimensional linear regression: A statistical perspective

Robust high-dimensional linear regression: A statistical perspective Robust high-dimensional linear regression: A statistical perspective Po-Ling Loh University of Wisconsin - Madison Departments of ECE & Statistics STOC workshop on robustness and nonconvexity Montreal,

More information

An introduction to Mathematical Theory of Control

An introduction to Mathematical Theory of Control An introduction to Mathematical Theory of Control Vasile Staicu University of Aveiro UNICA, May 2018 Vasile Staicu (University of Aveiro) An introduction to Mathematical Theory of Control UNICA, May 2018

More information

Robustness of Principal Components

Robustness of Principal Components PCA for Clustering An objective of principal components analysis is to identify linear combinations of the original variables that are useful in accounting for the variation in those original variables.

More information