Robust estimators based on generalization of trimmed mean

Size: px
Start display at page:

Download "Robust estimators based on generalization of trimmed mean"

Transcription

1 Communications in Statistics - Simulation and Computation ISSN: (Print) (Online) Journal homepage: Robust estimators based on generalization of trimmed mean Lukáš Adam & Přemysl Bejda To cite this article: Lukáš Adam & Přemysl Bejda (017): Robust estimators based on generalization of trimmed mean, Communications in Statistics - Simulation and Computation, DOI: / To link to this article: Accepted author version posted online: 06 Jun 017. Published online: 06 Jun 017. Submit your article to this journal Article views: 1 View related articles View Crossmark data Full Terms & Conditions of access and use can be found at Download by: [Institute of Botany] Date: 18 July 017, At: :43

2 COMMUNICATIONS IN STATISTICS SIMULATION AND COMPUTATION 017, VOL. 0, NO. 0, Robust estimators based on generalization of trimmed mean Lukáš Adam a and Přemysl Bejda a a Institute of Information Theory and Automation, Czech Academy of Sciences, Prague, Czech Republic; b Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic ABSTRACT In this article, we propose new estimators of location. These estimators select a robust set around the geometric median, enlarge it, and compute the (iterative) weighted mean from it. By doing so, we obtain a robust estimator in the sense of the breakdown point, which uses more observations than standard estimators. We apply our approach on the concepts of boxplot and bagplot. We work in a general normed vector space and allow multi-valued estimators. 1. Introduction ARTICLE HISTORY Received 15 February 016 Accepted 5 May 017 KEYWORDS Breakdown point; Estimators; Geometric median; Location; Trimmed mean MATHEMATICS SUBJECT CLASSIFICATION 6G35; 6G05 Robust statistical methods try to weaken quite restrictive assumptions of classical methods. Forinstance,inmanystatisticalmodelsitisassumed that residuals are independent and identically distributed. Moreover, the assumption of normality of residuals is often added and the Euclidean norm is usually employed. But for real data these assumptions are often violated and classical approaches fail. This leads to a necessity to consider robust statistical methods. As classical examples of robust methods we mention replacing the l norm by the l 1 norm orreplacingthemeanbythemedian.suchdevelopmentwasalsoenabledbytheprogressin computer technologies as the robust methods are usually more complicated and computationally demanding than the classical methods. In this article, we are interested in robust estimators for the parameter of location. One of thefirstattemptstodealwithsuchestimatorsarem-estimators,seehuber(1964)ormaronna (1976). They are computationally simple but suffer from low breakdown point, see Hampel (1971)orDonohoandHuber(1983). This, loosely speaking, expresses the data fraction, which can be arbitrarily modified without affecting the finiteness of the estimator. Its value 1 d+1 is at most,seemaronnaetal.(006), where d is the dimension of observations. Later, multiple estimators got proposed, among others we mention: minimumvolumeellipsoidestimators(mve,rousseeuw,1985) whose name stems from the fact that among all proper ellipsoids containing at least half of observations, the one given by MVE has minimal volume. However, their efficiency is rather poor. S-estimators (Davies, 1987) have been suggested to overcome the low efficiency of MVE. They combine approaches of MVE and M-estimators. τ-estimators (Lopuhaä, 1991) also employ the idea of M-estimators but they do not require preliminary scale estimator. CONTACT Přemysl Bejda premyslbejda@gmail.com Faculty of Mathematics and Physics, Charles University, Sokolovská 83, Prague , Czech Republic. 017 Taylor & Francis Group, LLC

3 L. ADAM AND P. BEJDA Stahel Donoho estimator (Donoho, 198; Stahel,1981) isbasedontheideathatany outlier in multivariate case should be an outlier in some univariate projection. Theadvantageoftheseestimatorsistheirhighbreakdownpoint,whichisthehighestwhich a shift equivariant estimator can attain. However, their computation usually requires heavy effort. Therefore, Maronna and Zamarb (00)suggestedawayofreducingthecomputational complexity while sustaining the high breakdown point. As a price to pay, one is no longer able to estimate the covariance structure. In this article, we follow the goal of finding easily computable robust estimators with high breakdownpoint.fromasample,wefirstselectasetaofobservations,whicharerobustinthe sense of breakdown point. Then we utilize these observations to construct further estimators, for example, by enlarging A and computing the (iterative) weighted mean of observations fromtheenlargedset.setaisbased on the geometric median (also called the spatial median), which is a direct generalization of the real median proposed by Haldane (1948). Since the geometric median has breakdown point of 1, our estimators will be able to keep this property as well. Our estimators enjoy the following nice properties: Instead of considering the R d space, we work with a general normed vector space X.This opens a natural way to tackle time series by our approach. Our estimators have high breakdown point and are simple to compute. We partially consider the covariance structure. We are able to work with set-valued estimators instead of single-valued estimators. Note that estimators usually only satisfy several of the above properties, for example, either they have high breakdown point or they do not take into account the covariance structure at all. This article is organized as follows: in the first part of Section we define basic concepts of breakdown point and geometric median. Even though geometric median may be a set and not a point in general, most authors do not handle this fact. Because of this, we have decided to work with estimators, which are multifunctions (also known as set-valued maps). The second part of Section contains new results. We propose new estimators, discuss their breakdown point, and provide a comparison between our algorithms and M-estimators. Ournotationisasfollows:by(X, ) we understand a normed vector space. For a set A X, we define A := sup x A x. Often we will use the bold notation for x = (x 1,...,x n ) X n.bythelowerindexweunderstandacomponentofavectorwhileby the upper index, we mean an iteration number. A multifunction R : X Y is a generalization of a function, where R(x) doesnothavetobeonepointbutmaybea(nonempty)subsetof Y. WesaythatR is bounded on bounded sets if x A R(x) is a bounded set for all bounded sets A X. Since we consider multi-valued estimators, some of the definitions are slightly generalized.. New estimators based on generalization of trimmed mean In this section, we first recall the geometric median and on its basis derive other estimators. The basic idea is to find first the geometric median, then restrict ourselves to a set of neighboring observations and construct an estimator based only on this restricted set. If this set is chosen in a proper way, the estimator will have a breakdown point of 1.Thebreakdownpoint is nowadays one of the standard measures of robustness and expresses the minimal proportion of the data, which can be corrupted (made arbitrarily distant) before the estimator becomes unbounded.

4 COMMUNICATIONS IN STATISTICS SIMULATION AND COMPUTATION 3 Definition.1. Consider a normed vector space X and an estimator T n : X n X of some functional T.Forx = (x 1,...,x n ) X n and m = 1,...,n define A m,n (x) := { x X n x and x have at most m different coordinates}, m n (T n, x) := max m {1,...,n} {m sup x Am,n (x), z T n ( x), z T n (x) }. z z < Then we say that T n has the breakdown point ε n (T n, x) := 1 n m n (T n, x). Finally, for family of estimators {T n : X n X} we define the asymptotic breakdown point as ε := lim inf n x X ε n (T n, x). We continue with the definition of geometric median. Definition.. We define the geometric median as a multifunction ˆT n : X n X satisfying ˆT n (x 1,...,x n ) = argmin a X a x j. (1) We present now two examples. The first one shows that the choice of the norm can change the geometric median in a significant way and that the geometric median may indeed be multi-valued.thesecondonedepictsasimplesituationwhereweareabletocomputethe geometric median. Moreover, it will be used later in some proofs. Example.1. Consider X = R and points x 1 = (1, 0), x = ( 1, 0), andx 3 = x 4 = (0, 1). Then it is not difficult to verify that for the following norms, we have (R, 1 ) ˆT 4 (x 1,...,x 4 ) = conv{(0, 0), (0, 1)}, (R, ) ˆT 4 (x 1,...,x 4 ) ={(0, 1)}, (R, ) ˆT 4 (x 1,...,x 4 ) ={(0, 1)}, where conv stands for the convex hull. We see that for and the geometric median is determined in a unique way. This does not hold any more for 1. Example.. Consider x X and x = (x 1,...,x n ),wherex 1 = =x m = x for some m.fixanyy X.Thenwehave n x x i = i=1 = = i=m+1 i=m+1 x x i x y + i=m+1 i=m+1 j=1 x y + y x i + i=m+1 y x i m y x i i=1 y x i +(n m) y x i=1 due to the m n.butthismeansthat x ˆT n (x). y x i i=1 m y x Recall that a shift equivariant estimator T n satisfies T n (x 1 + y,...,x n + y) = T n (x 1,...,x n ) + y for all x 1,...,x n X and y X. For single-valued estimators, it has been i=1

5 4 L. ADAM AND P. BEJDA showninmaronnaetal.(006, formula (3.5)) that a shift equivariant estimator satisfies ε n (T n, x) 1 n 1. () n Since the geometric median possesses this property, it is not surprising, that we obtain formula ()aswell.moreover,weobtainevenequalityinthisestimate. Lemma.1. For any x = (x 1,...,x n ) X n,wehaveforthegeometricmedian ε n ( ˆT n, x) = 1 n 1, n and thus for the asymptotic breakdown point we have ε = 1. Proof. The second statement is an immediate consequence of the first one. Note that the first statement is equivalent to m n ( ˆT n, x) = n 0 with n 0 := n 1.FromExample., weseethat m n ( ˆT n, x) < n,whichfurtherimpliesm n ( ˆT n, x) n 1 n 0.Tofinishtheproof,itissufficient to show that m n ( ˆT n, x) n 0. Consider thus any x A n0,n(x) and denote by I the index set of coordinates where x and x differ and by J its complement. Denote by n 1 the cardinality of I and observe that n 1 n 0. Denoting further R := max i=1,...,n x i,wehave x j = x j R for all j J. Takingany j J and y X,weobtainthefollowingestimate: x j x l x j x l +(n n 1 )R x j y + y x l +(n n 1 )R l I l I l I = y x l y x l +n 1 x j y +(n n 1 )R l=1 l J y x l y x j + x j x l +n 1 x j y +(n n 1 )R l=1 l J l J y x l +(n 1 n) y x j +4(n n 1 )R. l=1 l=1 Since n 1 n n 0 n < 0, we obtain that there is R I > 0suchthatforall y R I we have x j x l < y x l. l=1 But this means that the geometric median lies in a ball with radius R I.Sincethereisonlyfinite number of possible subsets I,wehavefinishedtheproof. The next lemma is allows us to compute the breakdown point of an estimator. Lemma.. Consider multifunctions 1 : X n X m and : X n X m X. Assume that the following assumptions are satisfied: 1. All components of 1 have breakdown point at least p.. There exists 3 : X m Xwhichisboundedonboundedsetssuchthat (x, y) 3 (y) for all x X n and y X m. l=1

6 COMMUNICATIONS IN STATISTICS SIMULATION AND COMPUTATION 5 Then estimator T n defined as has breakdown point at least p. T n (x) := y 1 (x) (x, y) Proof. Due to the first assumption, there exists some R > 0suchthatform := np, all x A m,n (x) and for all y 1 ( x) we have y R. But then we have ( x, y) 3 (y), which is uniformly bounded due to the second assumption. Thus, the statement has been proved. We come now to new estimators. For a set S X and a point x = (x 1,...,x n ) X n,we define L (S, x) := { } x I X n 1 I {1,...,n} : max x i y min x i y, i I i {1,...,n}\I y S where x I denotes the restriction of x to components I. The interpretation of this set goes as follows: we select some y S and choose x I to be the n 1 observations closest to y. Then L (S, x) is the union of all such subsets with respect to all choices of y S. Since every such x I contains less than n components of x, this set is stable with respect to perturbations of x whenever less than one half of observations is contaminated. We will use L (S, x) todefinefurtherestimators.thenexttheoremsaysthatifwestartwith the geometric median S = ˆT n (x) and a multifunction R with certain boundedness properties, we obtain an estimator with the same breakdown point as the geometric median. Theorem.1. Consider any multifunction R : X n 1 X, which is bounded on bounded sets and for which there exists z k such that R(z k,...,z k ).ThenforestimatorT n : X n X defined as T 1 n (x) := y L ( ˆT n (x),x) R(y) (3) and for every x = (x 1,...,x n ) X n, we have the following relation ε n (T 1 n, x) = ε n ( ˆT n, x) = 1 n 1. n Proof. From Lemma. with m = n 1, 1 (x) = L ( ˆT n (x), x),and (x, y) = R(y) and Lemma.1 we obtain ε n (T 1 n, x) ε n ( ˆT n, x) = 1 n 1. n To show the opposite inequality, realize that the statement is equivalent to m n (T n 1, x) n 1. For contradiction assume that m n (T n 1, x) n n.wechangethefirst n coordinates of x to z k and denote the perturbed point by x k.thenexample. tellsusthat z k ˆT n ( x k ).DuetothedefinitionofL,weseethat(z k,...,z k ) L ( ˆT n ( x k ), x k ) and the imposed assumption of R implies a contradiction. If both R and L ( ˆT n (x), x) are single-valued functions, expression (3)reducesto T 1 n (x) = R(L ( ˆT n (x), x)).

7 6 L. ADAM AND P. BEJDA Moreover, in such a case L ( ˆT n (x), x) denotes one half of observations, which are closest to the geometric median. There are several natural choices for R: for example, mean, weighted mean, or geometric median. One of possible drawbacks of estimator (3)isthatitutilizesonlyhalfoftheoriginaldata.We want to make use of as many observations as possible while maintaining the high breakdown point. To this aim, we first consider a general set S X m for some m N,forexample,we may consider mean of x as a subset of X or L ( ˆT n (x), x) as a subset of X n 1.Thenwe consider some b : X X n [0, ) and enlarge S by defining E b (S, x) := { } x I I ={i x i m j=1 B(y j, b(y j, x))}. (4) y S Here, B(y j, b(y j, x)) stands for a ball around y j with radius b(y j, x).theinterpretationgoes as follows: from S we select y, make balls around all of its components, and select all components of x,whichlieintheunionofthisballs. Example.3. Consider the case of X = R, n = 5, and x = ( 3,, 0,, 4). Then the geometric median equals to ˆT n (x) = 0 and since n 0 = n 1 =, we also have If we consider b 1, then L ( ˆT n (x), x) ={(, 0), (0, )} R. E b (L ( ˆT n (x), x), x) ={( 3,, 0), (0, )}. Note that both elements of E b (L ( ˆT n (x), x), x) areofadifferentdimension. We obtain the following variant of Theorem.1,for which we omit its identical proof. Theorem.. Consider b : X X n [0, ) boundedonboundedsetsinthefirstvariable, uniformly in the second one, any family of multifunctions R s : X s Xfors= 1,...,n, which are all bounded on bounded sets and for which there exists z k such that R s (z k,...,z k ). Then for estimators Tn : X n XandTn 3 : X n Xdefinedas T n (x) := T 3 n (x) := y E b ( ˆT n (x),x) y E b (L ( ˆT n (x),x),x) R dim y (y), R dim y (y) and for every x = (x 1,...,x n ) X n, we have the following relation for breakdown points ε n (T n, x) = ε n (T 3 n, x) = ε n ( ˆT n, x) = 1 n 1. n Function b should neither have too large values (which corresponds to big enlargement of the set in question) because outliers may be close to the non-contaminated data, nor too small values because some information could be missed. We suggest a possible choice in the Appendix. To improve the behavior of the estimators, we implement an iterative procedure. We start with geometric median z 0 = ˆT n (x) and in every iteration k compute a new estimate z k. To do so, we employ (5b) withr being the weighted mean, where the (non-normalized) (5a) (5b)

8 COMMUNICATIONS IN STATISTICS SIMULATION AND COMPUTATION 7 weights satisfy 1 ( ) ify i L ({z k 1 }, x), w i (z k 1, y i ) = max 1 y i y j otherwise (6) b k (y j,x) y j L ({z k 1 },x) for some b k based on L ({z k 1 }, x).thischoiceofweightsmakesuseofthepossiblydivisionof components of y E b (L ({z k 1 }, x), x) into two parts: those who belong to L ({z k 1 }, x) and those who were added by enlarging this set. For the first part, we choose the (non-normalized) weight equal to one, the weight for observations from the second part decreases with the increasing distance to L ({z k 1 }, x). We summarize this approach in Algorithm.1. Considering the termination criterion, any standard criterion may be used, for example, if the (relative) change in z k is small. Algorithm.1 An estimator based on iterative weighting Input: observations x = (x 1,...,x n ) 1: k 0, z 0 ˆT n (x) : while not terminate do 3: k k + 1 4: determine b k based on L ({z k 1 }, x) 5: pick any y k E b k(l ({z k 1 }, x), x) 6: compute w k according to formula (6) and renormthem such that their sum equals to 1 7: z k dim y i=1 w k i yk i 8: end while 9: return estimate ˆx z k Finally, we would like to point out that every update step in Algorithm.1 keeps the stability result, which we already mentioned several times in the previous text. For k = 1, we may write z k = y 1 (x) (x, y), where y = y 1, 1 (x) := E b (L ({z 0 }, x), x), and (x, y) := dim y 1 i=1 w i (x, y 1 )yi 1.Then 1 has breakdown point 1 n 1 n and since (x, y) n max yi y y i holds true, thanks to Lemma. we obtain that z 1 hasthesamebreakdownpoint.byapplyingthesameprocedure to subsequent iterations, we obtain the same result for all z k. In the previous text, we highlighted some benefits of our estimators. Note that naturally there are also situations where it is better to use standard estimators. Consider, for example, the one-dimensional double exponential distribution with density 1 exp( x μ ). Then the (geometric) median coincides with the maximum likelihood estimator of μ, seesec. 6.3in Lehmann and Casella (006), and therefore the median is the most efficient estimator. Thus, by employing additional observations apart from the median, we only worsen the quality of an estimator. However, to benefit from such situation, we would have to know the true distribution and know that there is no contamination. Remark.1. There is some connection between our estimators with M-estimators. In Algorithm., we summarize the algorithm from Maronna et al. (006, sec..7.3).

9 8 L. ADAM AND P. BEJDA Algorithm. M-estimator from Require: observations x = (x 1,...,x n ), weighting functions W 1 and W 1: k 0, z 0 ˆT n (x), dispersion estimate σ 0 : while not terminate do 3: k k + 1 4: ri k x i z k 1 σ k 1 5: w1,i k W 1(ri k), wk,i W (ri k ) andnorm the weights to sum to one 6: z k n i=1 wk 1,i x i, σ k 1 n 7: end while 8: return estimate ˆx z k n i=1 wk,i (x i z k 1 ) Both approaches (iteratively) compute a weighted mean of observations. While M- estimators are based on the maximum likelihood estimate, ours are based on geometric intuition and trimmed mean. If X = R and if W 1 in Algorithm. has finite support, we can say that our estimators belong to the very wide class of M-estimators. This changes for R d though. Under the standard assumption that W 1 is symmetric around zero and non-increasing on rays emanating from zero, the weight of an observation depends only on the distance from z k 1. Thus, two observations have the same weight if and only if their distance to z k 1 is identical. On the other hand, in our approach all points in L ({z k 1 }, x) havethesameweightand this set is enlarged farther for distant observations. Thus, even though none of the algorithms estimates the covariance structure, our algorithm makes at least an attempt to consider it. Of course, there are M-estimators, which along with the location also properly estimate the covariance structure. But this raises the computational complexity and reduces the breakdown point to 1.Tosummarize:wecansaythatouralgorithmstrytopickthebestpropertiesof d+1 M-estimators, on one hand they have high breakdown point and are simple to compute; on the other hand, they at least partially consider the covariance structure. Another advantage of our estimator over M-estimators is a simpler theoretical analysis. To show that our estimator has the limiting breakdown point 1, it is sufficient to apply Lemma., which itself directly follows from the definition of the breakdown point. To the best of our knowledge, such direct application of the definition is not possible for M-estimators, for example, one has to take care of properties of weighting function due to the division by dispersion. 3. Numerical results In this section, we first show numerical performance of our estimators and then show how they comply with the well-known concepts of boxplot and bagplot Numerical performance We consider X = R d with d {1, 15} and compare our algorithms with known estimators; for reader s convenience we summarize the used algorithms in Table 1. To generate the samples, we first generate z i from N(0, 1), thencontaminatethemby some distribution with probability p and finally modify them via a covariance structure. This modification is performed in the following way: we randomly generate a correlation matrix C and diagonal matrix with diagonal elements having distribution U[0.5, 10]. Then we compute covariance matrix V = C, its Cholesky decomposition V = S S,andfinallyset y i = Sz i + μ, whereμ := (0,...,d 1). TogetherweconsiderN = 10,000 samples with

10 COMMUNICATIONS IN STATISTICS SIMULATION AND COMPUTATION 9 Table 1. Summary of algorithms. The horizontal line divides them in known and our algorithms. Mean mean Med median or geometric median Trun α-truncated mean with α = 0. Winsor α-winsorized mean with α = 0. M1 Huber M-estimator, see Huber (1981) M Algorithm. from Maronna et al. (006) SD Stahel Donoho estimator, see Stahel (1981)andDonoho (198) GM1 formula (3), where R is themean GM formula (5a), where R is themean GM3 Algorithm.1 Table. Value of loss function (7) for contaminations of N(0, 1) by some other distribution with probability p for X = R. Distribution p Mean Med Trun Winsor M1 M GM1 GM GM N(0, 100) Cauchy U( 10, 10) N(0, 100) Cauchy U( 10, 10) U( 0, 10) n = 100 observations. The loss function equals to 1 N N i=1 d ˆx i, j μ j, (7) j=1 where ˆx i, j denotes an estimate for sample i and coordinate j = 1,...,d. We present the results in Table for X = R and in Table 3 for X = R 15. The values show the loss function (7) for contaminating distribution (first column) with given probability (second column). Bold numbers are the best values among all estimators and numbers in italic are those within 5% of the best loss function value. For X = R, the results between M-estimators and our estimators are comparable with M-estimators in a slight lead, which is not surprising due to Remark.1. ForX = R 15, the performance of both estimators turn around and now our estimators perform better than the examined M-estimators. This is again expected as our estimators try to take into account the covariance structure as explained in Remark.1. For the second case, our estimators manage to beat the Stahel Donoho estimators almost in all cases. Table 3. Value of loss function (7) for contaminations of N(0, I) by some other distribution with probability p for X = R 15. Distribution p Mean Med M1 SD GM1 GM GM N(0, 100) Cauchy U( 10, 10) N(0, 100) Cauchy U( 10, 10) U( 0, 10)

11 10 L. ADAM AND P. BEJDA Figure 1. Comparison of the classical boxplot (left-hand side of each figure) and our modification (righthand side of each figure). In both cases, we contaminate N(0, 1) with a probability p by the distribution described under the figures. 3.. Relation to boxplot Boxplot was proposed for the first time in Tukey (1977). It takes the median, then computes the interquartile range (IQR), which is later widened. Observations, which are not present in this widening (known as whiskers) are considered as outliers. We compare boxplot with our method, where instead of considering IQR, we take y L ( ˆT n (x), x). Thus,wedonot need to take 5% observations with lower value than the median and 5% observations with higher value thanthe median but we take50%observations closest to the median. Whiskers are based on E b ({y}, x). We depict this comparison in Fig. 1.Wecontaminatethestandardnormaldistributionby some other distribution. In each (sub)figure, the left-hand side is the boxplot and the righthand side is our modification. Since both approaches differ in the way in which they treat nonsymmetry, both graphs in the left figure are identical. However, if we assume non-symmetric distributions (right figure), then our modification is able to detect outliers in a better way than the standard boxplot Relation to bagplot In this subsection, we consider generalization of boxplot into more dimensions. For two dimensions, this is known as bagplot and was studied for the first time in Rousseeuw et al. (1999). For generalization to functional data, see, for example, Sun and Genton (011). To construct bagplot, a real number called depth is assigned to every observation, see Zuo and Serfling (000). Then the observation with highest value of depth is called the depth median and the convex hull of approximately 50% observations with the highest depth is called the bag (this corresponds to IQR for boxplot). Then the bag is enlarged to the so-called fence (which corresponds to whiskers for boxplot). The observations not in the fence are considered as outliers. Wenowpresentourmodificationofthebagplot,illustratedinFig.. Havingasample x, we compute first its geometric median ˆT n (x). For simplicity, we assume that it is defined uniquely. Then we choose an arbitrary y X n 1 from L ( ˆT n (x), x), construct a convex hull containing all coordinates of y, and call this set the bag. In the last step, we enlarge the bag by considering conv{ ˆT n (x) + 3(y i ˆT n (x))}, i

12 COMMUNICATIONS IN STATISTICS SIMULATION AND COMPUTATION 11 Figure. Description of the bagplot based on GM. : observations, : mean, : ˆT n (x), fullanddashed line: convex envelope of L and E, respectively, :GM1, : GM, X: outliers. see a similar procedure in Rousseeuw et al. (1999). In this figure, we also denote GM1 (as the mean of all observation in the bag) and GM (as the mean of all observation in the fence). Here, the geometric median corresponds to the depth median, the convex hull to the bag, and its enlargement to the fence. Note that the construction of the fence is very similar to the construction of E b (L ( ˆT n (x), x), x). Moreover, since the construction of GM is based on objects, which have the limiting breakdown point of 1,estimatorGMhasthesameproperty. On the other hand, since bagplot is a generalization of boxplot, which has breakdown point of 1 4,wecannotexpectthebagplottohavehigherbreakdownpointthanthis.Thus,ourversion of bagplot deals better with outliers than the original method. Appendix: Choice of b Inthisshortsection,wederiveanestimateforb for algorithms GM and GM3 described in Table 1.Notethatduetotheconstructionofthealgorithm,itissufficienttodefineb i := b(x i, x) for all observations in L ( ˆT n (x), x). Since we want to keep GM as simple as possible, we consider constant b and relax this assumption for GM3. Moreover, we derive different value for one- and more-dimensional cases. We start with a technical lemma. Lemma A.1. Let Y be a random variable with a finite second moment, distribution function G, mean μ, standarddeviationσ,andletq μ,σ denote its quantile function. Then for a fixed α [0, 1], the following ratio does not depend on the values of μ and σ K G,α = q μ,σ (1 α/) q μ,σ (α/). q μ,σ (0.75) q μ,σ (0.5) Proof. This follows from the fact σ q 0,1 (α) + μ = q μ,σ (α). For the case of one dimension, consider a random sample x from N(μ, σ ) and denote its median by ˆT n (x). For simplicity assume that L ( ˆT n (x), x) R n 1 is a singleton. Then we may estimate q μ,σ (0.75) q μ,σ (0.5) by max L ( ˆT n (x), x) min L ( ˆT n (x), x). Then we define b as b := K N(0,1),α max L ( ˆT n (x), x) min L ( ˆT n (x), x) q μ,σ (1 α/) q μ,σ (α/).

13 1 L. ADAM AND P. BEJDA This value thus estimates the distance between chosen quantiles divided by two. If we take all observations, which distance from median is b, then we get, under assumption of not contaminated normality, approximately 1 α of our observations. For the multidimensional case R d, we assume that x follows the N(μ, σ I) distribution. We use max y L ( ˆT n (x),x) y ˆT n (x) as an approximation of σχ0.5,d,whereχ 0.5,d is the quantile function of χ distribution with d degrees of freedom evaluated at probability 0.5. To include approximately fraction α (0, 1) of all observations, we set b := max y ˆT n (x) χ α,d. y L ( ˆT n (x),x) χ0.5,d ForGM3,weconsiderdirectlyR d. Assume again that our sample x has distribution N(μ, σ I),thenforx i from the boundary of L ( ˆT n (x), x) we have x i ˆT n (x) σ χ 0.5,d. Now fixing given probability level α (0, 1) and weight w (0, 1), wewanttohaveallx e with x e ˆT n (x) χα,d σ to have weight (6)atleastw. But plugging this in the definition of weight results in b i x i x e 1 w x e ˆT n (x) 1 w = σ x i ˆT n (x) 1 w x i ˆT n (x) x i ˆT n (x) 1 w ). ( χ α,d χ 0.5,d σ ) ( χ α,d 1 w χ0.5,d Approximating again the distribution and taking minimum value of b i,weset χ α,d b i := (1 w) χ 0.5,d χ 0.5,d In the numerical experiments, we have chosen α = x i ˆT n (x). Acknowledgment The authors would like to thank anonymous reviewers for substantially improving this article. ORCID Lukáš Adam References Davies, P. L. (1987). Asymptotic behavior of S-estimators of multivariate location parameters and dispersion matrices. The Annals of Statistics 15: Donoho, D. L. (198). Breakdown Properties of Multivariate Location Estimators. Ph.D. Qualifying Paper, Harvard University.

14 COMMUNICATIONS IN STATISTICS SIMULATION AND COMPUTATION 13 Donoho, D. L., Huber, P. J. (1983). The notion of breakdown point. In: Bickel, P. J., Doksum, K. A., and Hodges, J. L., eds. A Festschriftfor Erich Lehman. California, Berkeley: Wadsworth, pp Haldane, J. B. S. (1948). Note on the median of a multivariate distribution. Biometrika 35(3 4): Hampel, F. R. (1971). A general qualitative definition of robustness. The Annals of Mathematical Statistics 4(6): Huber, P. J. (1964). Robust estimation of location parameter. The Annals of Mathematical Statistics 35(1): Huber, P. J. (1981). Robust Statistics. New York: John Wiley & Sons. Lehmann,E. L.,Casella, G. (006). Theory of Point Estimation. New York: Springer Science & Business Media. Lopuhaä, H. P. (1991).Multivariateτ-estimators for location and scatter. Canadian Journal of Statistics 19: Maronna, R. (1976). Robust M-estimators of multivariate location and scatter. The Annals of Statistics 4: Maronna, R., Martin, D., Yohai, V. (006). Robust Statistics Theory. New Jersey: John Wiley & Sons. Maronna, R. A., Zamarb, R. H. (00). Robust estimates of location and dispersion for highdimensional datasets. Technometrics 44(4): Rousseeuw, P. J. (1985). Multivariate estimation with high breakdown point. In: Mathematical Statistics and Applications. Amsterdam: Elsevier, pp Rousseeuw, P. J., Ruts, I., Tukey, J. W. (1999). The bagplot: A bivariate boxplot. The American Statistician 53(4): Stahel, W. A. (1981). Breakdown of Covariance Estimators. Research Report 31. ETH Zürich: Fachgruppe für Statistik. Sun, Y., Genton, M. G. (011). Functional boxplots. Journal of Computational and Graphical Statistics 0(): Tukey, J. W. (1977).Exploratory Data Analysis. Massachusetts: Addison-Wesley Publishing Co. Zuo, Y., Serfling, R. (000). General notions of statistical depth function. The Annals of Statistics 8():

TITLE : Robust Control Charts for Monitoring Process Mean of. Phase-I Multivariate Individual Observations AUTHORS : Asokan Mulayath Variyath.

TITLE : Robust Control Charts for Monitoring Process Mean of. Phase-I Multivariate Individual Observations AUTHORS : Asokan Mulayath Variyath. TITLE : Robust Control Charts for Monitoring Process Mean of Phase-I Multivariate Individual Observations AUTHORS : Asokan Mulayath Variyath Department of Mathematics and Statistics, Memorial University

More information

Robust estimation of scale and covariance with P n and its application to precision matrix estimation

Robust estimation of scale and covariance with P n and its application to precision matrix estimation Robust estimation of scale and covariance with P n and its application to precision matrix estimation Garth Tarr, Samuel Müller and Neville Weber USYD 2013 School of Mathematics and Statistics THE UNIVERSITY

More information

A Comparison of Robust Estimators Based on Two Types of Trimming

A Comparison of Robust Estimators Based on Two Types of Trimming Submitted to the Bernoulli A Comparison of Robust Estimators Based on Two Types of Trimming SUBHRA SANKAR DHAR 1, and PROBAL CHAUDHURI 1, 1 Theoretical Statistics and Mathematics Unit, Indian Statistical

More information

Measuring robustness

Measuring robustness Measuring robustness 1 Introduction While in the classical approach to statistics one aims at estimates which have desirable properties at an exactly speci ed model, the aim of robust methods is loosely

More information

MULTIVARIATE TECHNIQUES, ROBUSTNESS

MULTIVARIATE TECHNIQUES, ROBUSTNESS MULTIVARIATE TECHNIQUES, ROBUSTNESS Mia Hubert Associate Professor, Department of Mathematics and L-STAT Katholieke Universiteit Leuven, Belgium mia.hubert@wis.kuleuven.be Peter J. Rousseeuw 1 Senior Researcher,

More information

Inequalities Relating Addition and Replacement Type Finite Sample Breakdown Points

Inequalities Relating Addition and Replacement Type Finite Sample Breakdown Points Inequalities Relating Addition and Replacement Type Finite Sample Breadown Points Robert Serfling Department of Mathematical Sciences University of Texas at Dallas Richardson, Texas 75083-0688, USA Email:

More information

Detecting outliers in weighted univariate survey data

Detecting outliers in weighted univariate survey data Detecting outliers in weighted univariate survey data Anna Pauliina Sandqvist October 27, 21 Preliminary Version Abstract Outliers and influential observations are a frequent concern in all kind of statistics,

More information

Re-weighted Robust Control Charts for Individual Observations

Re-weighted Robust Control Charts for Individual Observations Universiti Tunku Abdul Rahman, Kuala Lumpur, Malaysia 426 Re-weighted Robust Control Charts for Individual Observations Mandana Mohammadi 1, Habshah Midi 1,2 and Jayanthi Arasan 1,2 1 Laboratory of Applied

More information

THE BREAKDOWN POINT EXAMPLES AND COUNTEREXAMPLES

THE BREAKDOWN POINT EXAMPLES AND COUNTEREXAMPLES REVSTAT Statistical Journal Volume 5, Number 1, March 2007, 1 17 THE BREAKDOWN POINT EXAMPLES AND COUNTEREXAMPLES Authors: P.L. Davies University of Duisburg-Essen, Germany, and Technical University Eindhoven,

More information

Introduction to Real Analysis Alternative Chapter 1

Introduction to Real Analysis Alternative Chapter 1 Christopher Heil Introduction to Real Analysis Alternative Chapter 1 A Primer on Norms and Banach Spaces Last Updated: March 10, 2018 c 2018 by Christopher Heil Chapter 1 A Primer on Norms and Banach Spaces

More information

Introduction Robust regression Examples Conclusion. Robust regression. Jiří Franc

Introduction Robust regression Examples Conclusion. Robust regression. Jiří Franc Robust regression Robust estimation of regression coefficients in linear regression model Jiří Franc Czech Technical University Faculty of Nuclear Sciences and Physical Engineering Department of Mathematics

More information

A Modified M-estimator for the Detection of Outliers

A Modified M-estimator for the Detection of Outliers A Modified M-estimator for the Detection of Outliers Asad Ali Department of Statistics, University of Peshawar NWFP, Pakistan Email: asad_yousafzay@yahoo.com Muhammad F. Qadir Department of Statistics,

More information

Introduction to robust statistics*

Introduction to robust statistics* Introduction to robust statistics* Xuming He National University of Singapore To statisticians, the model, data and methodology are essential. Their job is to propose statistical procedures and evaluate

More information

High Breakdown Point Estimation in Regression

High Breakdown Point Estimation in Regression WDS'08 Proceedings of Contributed Papers, Part I, 94 99, 2008. ISBN 978-80-7378-065-4 MATFYZPRESS High Breakdown Point Estimation in Regression T. Jurczyk Charles University, Faculty of Mathematics and

More information

Efficient and Robust Scale Estimation

Efficient and Robust Scale Estimation Efficient and Robust Scale Estimation Garth Tarr, Samuel Müller and Neville Weber School of Mathematics and Statistics THE UNIVERSITY OF SYDNEY Outline Introduction and motivation The robust scale estimator

More information

ON THE CALCULATION OF A ROBUST S-ESTIMATOR OF A COVARIANCE MATRIX

ON THE CALCULATION OF A ROBUST S-ESTIMATOR OF A COVARIANCE MATRIX STATISTICS IN MEDICINE Statist. Med. 17, 2685 2695 (1998) ON THE CALCULATION OF A ROBUST S-ESTIMATOR OF A COVARIANCE MATRIX N. A. CAMPBELL *, H. P. LOPUHAA AND P. J. ROUSSEEUW CSIRO Mathematical and Information

More information

IMPROVING THE SMALL-SAMPLE EFFICIENCY OF A ROBUST CORRELATION MATRIX: A NOTE

IMPROVING THE SMALL-SAMPLE EFFICIENCY OF A ROBUST CORRELATION MATRIX: A NOTE IMPROVING THE SMALL-SAMPLE EFFICIENCY OF A ROBUST CORRELATION MATRIX: A NOTE Eric Blankmeyer Department of Finance and Economics McCoy College of Business Administration Texas State University San Marcos

More information

Jensen s inequality for multivariate medians

Jensen s inequality for multivariate medians Jensen s inequality for multivariate medians Milan Merkle University of Belgrade, Serbia emerkle@etf.rs Given a probability measure µ on Borel sigma-field of R d, and a function f : R d R, the main issue

More information

Introduction to Robust Statistics. Elvezio Ronchetti. Department of Econometrics University of Geneva Switzerland.

Introduction to Robust Statistics. Elvezio Ronchetti. Department of Econometrics University of Geneva Switzerland. Introduction to Robust Statistics Elvezio Ronchetti Department of Econometrics University of Geneva Switzerland Elvezio.Ronchetti@metri.unige.ch http://www.unige.ch/ses/metri/ronchetti/ 1 Outline Introduction

More information

Asymptotic Relative Efficiency in Estimation

Asymptotic Relative Efficiency in Estimation Asymptotic Relative Efficiency in Estimation Robert Serfling University of Texas at Dallas October 2009 Prepared for forthcoming INTERNATIONAL ENCYCLOPEDIA OF STATISTICAL SCIENCES, to be published by Springer

More information

Stahel-Donoho Estimation for High-Dimensional Data

Stahel-Donoho Estimation for High-Dimensional Data Stahel-Donoho Estimation for High-Dimensional Data Stefan Van Aelst KULeuven, Department of Mathematics, Section of Statistics Celestijnenlaan 200B, B-3001 Leuven, Belgium Email: Stefan.VanAelst@wis.kuleuven.be

More information

YIJUN ZUO. Education. PhD, Statistics, 05/98, University of Texas at Dallas, (GPA 4.0/4.0)

YIJUN ZUO. Education. PhD, Statistics, 05/98, University of Texas at Dallas, (GPA 4.0/4.0) YIJUN ZUO Department of Statistics and Probability Michigan State University East Lansing, MI 48824 Tel: (517) 432-5413 Fax: (517) 432-5413 Email: zuo@msu.edu URL: www.stt.msu.edu/users/zuo Education PhD,

More information

ROBUST TESTS BASED ON MINIMUM DENSITY POWER DIVERGENCE ESTIMATORS AND SADDLEPOINT APPROXIMATIONS

ROBUST TESTS BASED ON MINIMUM DENSITY POWER DIVERGENCE ESTIMATORS AND SADDLEPOINT APPROXIMATIONS ROBUST TESTS BASED ON MINIMUM DENSITY POWER DIVERGENCE ESTIMATORS AND SADDLEPOINT APPROXIMATIONS AIDA TOMA The nonrobustness of classical tests for parametric models is a well known problem and various

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

ROBUST ESTIMATION OF A CORRELATION COEFFICIENT: AN ATTEMPT OF SURVEY

ROBUST ESTIMATION OF A CORRELATION COEFFICIENT: AN ATTEMPT OF SURVEY ROBUST ESTIMATION OF A CORRELATION COEFFICIENT: AN ATTEMPT OF SURVEY G.L. Shevlyakov, P.O. Smirnov St. Petersburg State Polytechnic University St.Petersburg, RUSSIA E-mail: Georgy.Shevlyakov@gmail.com

More information

A ROBUST METHOD OF ESTIMATING COVARIANCE MATRIX IN MULTIVARIATE DATA ANALYSIS G.M. OYEYEMI *, R.A. IPINYOMI **

A ROBUST METHOD OF ESTIMATING COVARIANCE MATRIX IN MULTIVARIATE DATA ANALYSIS G.M. OYEYEMI *, R.A. IPINYOMI ** ANALELE ŞTIINłIFICE ALE UNIVERSITĂłII ALEXANDRU IOAN CUZA DIN IAŞI Tomul LVI ŞtiinŃe Economice 9 A ROBUST METHOD OF ESTIMATING COVARIANCE MATRIX IN MULTIVARIATE DATA ANALYSIS G.M. OYEYEMI, R.A. IPINYOMI

More information

Identification of Multivariate Outliers: A Performance Study

Identification of Multivariate Outliers: A Performance Study AUSTRIAN JOURNAL OF STATISTICS Volume 34 (2005), Number 2, 127 138 Identification of Multivariate Outliers: A Performance Study Peter Filzmoser Vienna University of Technology, Austria Abstract: Three

More information

Highly Robust Variogram Estimation 1. Marc G. Genton 2

Highly Robust Variogram Estimation 1. Marc G. Genton 2 Mathematical Geology, Vol. 30, No. 2, 1998 Highly Robust Variogram Estimation 1 Marc G. Genton 2 The classical variogram estimator proposed by Matheron is not robust against outliers in the data, nor is

More information

Outliers Robustness in Multivariate Orthogonal Regression

Outliers Robustness in Multivariate Orthogonal Regression 674 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS PART A: SYSTEMS AND HUMANS, VOL. 30, NO. 6, NOVEMBER 2000 Outliers Robustness in Multivariate Orthogonal Regression Giuseppe Carlo Calafiore Abstract

More information

Robust Estimation of Cronbach s Alpha

Robust Estimation of Cronbach s Alpha Robust Estimation of Cronbach s Alpha A. Christmann University of Dortmund, Fachbereich Statistik, 44421 Dortmund, Germany. S. Van Aelst Ghent University (UGENT), Department of Applied Mathematics and

More information

Metric Spaces and Topology

Metric Spaces and Topology Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies

More information

A Brief Overview of Robust Statistics

A Brief Overview of Robust Statistics A Brief Overview of Robust Statistics Olfa Nasraoui Department of Computer Engineering & Computer Science University of Louisville, olfa.nasraoui_at_louisville.edu Robust Statistical Estimators Robust

More information

On Semicontinuity of Convex-valued Multifunctions and Cesari s Property (Q)

On Semicontinuity of Convex-valued Multifunctions and Cesari s Property (Q) On Semicontinuity of Convex-valued Multifunctions and Cesari s Property (Q) Andreas Löhne May 2, 2005 (last update: November 22, 2005) Abstract We investigate two types of semicontinuity for set-valued

More information

Computational Connections Between Robust Multivariate Analysis and Clustering

Computational Connections Between Robust Multivariate Analysis and Clustering 1 Computational Connections Between Robust Multivariate Analysis and Clustering David M. Rocke 1 and David L. Woodruff 2 1 Department of Applied Science, University of California at Davis, Davis, CA 95616,

More information

Breakdown points of Cauchy regression-scale estimators

Breakdown points of Cauchy regression-scale estimators Breadown points of Cauchy regression-scale estimators Ivan Mizera University of Alberta 1 and Christine H. Müller Carl von Ossietzy University of Oldenburg Abstract. The lower bounds for the explosion

More information

The nonsmooth Newton method on Riemannian manifolds

The nonsmooth Newton method on Riemannian manifolds The nonsmooth Newton method on Riemannian manifolds C. Lageman, U. Helmke, J.H. Manton 1 Introduction Solving nonlinear equations in Euclidean space is a frequently occurring problem in optimization and

More information

Regression Analysis for Data Containing Outliers and High Leverage Points

Regression Analysis for Data Containing Outliers and High Leverage Points Alabama Journal of Mathematics 39 (2015) ISSN 2373-0404 Regression Analysis for Data Containing Outliers and High Leverage Points Asim Kumer Dey Department of Mathematics Lamar University Md. Amir Hossain

More information

Lecture 14 October 13

Lecture 14 October 13 STAT 383C: Statistical Modeling I Fall 2015 Lecture 14 October 13 Lecturer: Purnamrita Sarkar Scribe: Some one Disclaimer: These scribe notes have been slightly proofread and may have typos etc. Note:

More information

Introduction to Robust Statistics. Anthony Atkinson, London School of Economics, UK Marco Riani, Univ. of Parma, Italy

Introduction to Robust Statistics. Anthony Atkinson, London School of Economics, UK Marco Riani, Univ. of Parma, Italy Introduction to Robust Statistics Anthony Atkinson, London School of Economics, UK Marco Riani, Univ. of Parma, Italy Multivariate analysis Multivariate location and scatter Data where the observations

More information

On the Power of Robust Solutions in Two-Stage Stochastic and Adaptive Optimization Problems

On the Power of Robust Solutions in Two-Stage Stochastic and Adaptive Optimization Problems MATHEMATICS OF OPERATIONS RESEARCH Vol. xx, No. x, Xxxxxxx 00x, pp. xxx xxx ISSN 0364-765X EISSN 156-5471 0x xx0x 0xxx informs DOI 10.187/moor.xxxx.xxxx c 00x INFORMS On the Power of Robust Solutions in

More information

On the Power of Robust Solutions in Two-Stage Stochastic and Adaptive Optimization Problems

On the Power of Robust Solutions in Two-Stage Stochastic and Adaptive Optimization Problems MATHEMATICS OF OPERATIONS RESEARCH Vol. 35, No., May 010, pp. 84 305 issn 0364-765X eissn 156-5471 10 350 084 informs doi 10.187/moor.1090.0440 010 INFORMS On the Power of Robust Solutions in Two-Stage

More information

The Dirichlet s P rinciple. In this lecture we discuss an alternative formulation of the Dirichlet problem for the Laplace equation:

The Dirichlet s P rinciple. In this lecture we discuss an alternative formulation of the Dirichlet problem for the Laplace equation: Oct. 1 The Dirichlet s P rinciple In this lecture we discuss an alternative formulation of the Dirichlet problem for the Laplace equation: 1. Dirichlet s Principle. u = in, u = g on. ( 1 ) If we multiply

More information

Statistical Data Analysis

Statistical Data Analysis DS-GA 0 Lecture notes 8 Fall 016 1 Descriptive statistics Statistical Data Analysis In this section we consider the problem of analyzing a set of data. We describe several techniques for visualizing the

More information

Improvement of The Hotelling s T 2 Charts Using Robust Location Winsorized One Step M-Estimator (WMOM)

Improvement of The Hotelling s T 2 Charts Using Robust Location Winsorized One Step M-Estimator (WMOM) Punjab University Journal of Mathematics (ISSN 1016-2526) Vol. 50(1)(2018) pp. 97-112 Improvement of The Hotelling s T 2 Charts Using Robust Location Winsorized One Step M-Estimator (WMOM) Firas Haddad

More information

Robust model selection criteria for robust S and LT S estimators

Robust model selection criteria for robust S and LT S estimators Hacettepe Journal of Mathematics and Statistics Volume 45 (1) (2016), 153 164 Robust model selection criteria for robust S and LT S estimators Meral Çetin Abstract Outliers and multi-collinearity often

More information

Robust scale estimation with extensions

Robust scale estimation with extensions Robust scale estimation with extensions Garth Tarr, Samuel Müller and Neville Weber School of Mathematics and Statistics THE UNIVERSITY OF SYDNEY Outline The robust scale estimator P n Robust covariance

More information

Outlier detection for skewed data

Outlier detection for skewed data Outlier detection for skewed data Mia Hubert 1 and Stephan Van der Veeken December 7, 27 Abstract Most outlier detection rules for multivariate data are based on the assumption of elliptical symmetry of

More information

Statistical Depth Function

Statistical Depth Function Machine Learning Journal Club, Gatsby Unit June 27, 2016 Outline L-statistics, order statistics, ranking. Instead of moments: median, dispersion, scale, skewness,... New visualization tools: depth contours,

More information

March 25, 2010 CHAPTER 2: LIMITS AND CONTINUITY OF FUNCTIONS IN EUCLIDEAN SPACE

March 25, 2010 CHAPTER 2: LIMITS AND CONTINUITY OF FUNCTIONS IN EUCLIDEAN SPACE March 25, 2010 CHAPTER 2: LIMIT AND CONTINUITY OF FUNCTION IN EUCLIDEAN PACE 1. calar product in R n Definition 1.1. Given x = (x 1,..., x n ), y = (y 1,..., y n ) R n,we define their scalar product as

More information

Nonparametric Modal Regression

Nonparametric Modal Regression Nonparametric Modal Regression Summary In this article, we propose a new nonparametric modal regression model, which aims to estimate the mode of the conditional density of Y given predictors X. The nonparametric

More information

Two Simple Resistant Regression Estimators

Two Simple Resistant Regression Estimators Two Simple Resistant Regression Estimators David J. Olive Southern Illinois University January 13, 2005 Abstract Two simple resistant regression estimators with O P (n 1/2 ) convergence rate are presented.

More information

Set, functions and Euclidean space. Seungjin Han

Set, functions and Euclidean space. Seungjin Han Set, functions and Euclidean space Seungjin Han September, 2018 1 Some Basics LOGIC A is necessary for B : If B holds, then A holds. B A A B is the contraposition of B A. A is sufficient for B: If A holds,

More information

Week 2: Sequences and Series

Week 2: Sequences and Series QF0: Quantitative Finance August 29, 207 Week 2: Sequences and Series Facilitator: Christopher Ting AY 207/208 Mathematicians have tried in vain to this day to discover some order in the sequence of prime

More information

General Foundations for Studying Masking and Swamping Robustness of Outlier Identifiers

General Foundations for Studying Masking and Swamping Robustness of Outlier Identifiers General Foundations for Studying Masking and Swamping Robustness of Outlier Identifiers Robert Serfling 1 and Shanshan Wang 2 University of Texas at Dallas This paper is dedicated to the memory of Kesar

More information

More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction

More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction Sankhyā : The Indian Journal of Statistics 2007, Volume 69, Part 4, pp. 700-716 c 2007, Indian Statistical Institute More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order

More information

Computer projects for Mathematical Statistics, MA 486. Some practical hints for doing computer projects with MATLAB:

Computer projects for Mathematical Statistics, MA 486. Some practical hints for doing computer projects with MATLAB: Computer projects for Mathematical Statistics, MA 486. Some practical hints for doing computer projects with MATLAB: You can save your project to a text file (on a floppy disk or CD or on your web page),

More information

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9 MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended

More information

A Parametric Simplex Algorithm for Linear Vector Optimization Problems

A Parametric Simplex Algorithm for Linear Vector Optimization Problems A Parametric Simplex Algorithm for Linear Vector Optimization Problems Birgit Rudloff Firdevs Ulus Robert Vanderbei July 9, 2015 Abstract In this paper, a parametric simplex algorithm for solving linear

More information

Total Least Squares Approach in Regression Methods

Total Least Squares Approach in Regression Methods WDS'08 Proceedings of Contributed Papers, Part I, 88 93, 2008. ISBN 978-80-7378-065-4 MATFYZPRESS Total Least Squares Approach in Regression Methods M. Pešta Charles University, Faculty of Mathematics

More information

CONVENIENT PRETOPOLOGIES ON Z 2

CONVENIENT PRETOPOLOGIES ON Z 2 TWMS J. Pure Appl. Math., V.9, N.1, 2018, pp.40-51 CONVENIENT PRETOPOLOGIES ON Z 2 J. ŠLAPAL1 Abstract. We deal with pretopologies on the digital plane Z 2 convenient for studying and processing digital

More information

DESCRIPTIVE STATISTICS FOR NONPARAMETRIC MODELS I. INTRODUCTION

DESCRIPTIVE STATISTICS FOR NONPARAMETRIC MODELS I. INTRODUCTION The Annals of Statistics 1975, Vol. 3, No.5, 1038-1044 DESCRIPTIVE STATISTICS FOR NONPARAMETRIC MODELS I. INTRODUCTION BY P. J. BICKEL 1 AND E. L. LEHMANN 2 University of California, Berkeley An overview

More information

S-estimators in mapping applications

S-estimators in mapping applications S-estimators in mapping applications João Sequeira Instituto Superior Técnico, Technical University of Lisbon Portugal Email: joaosilvasequeira@istutlpt Antonios Tsourdos Autonomous Systems Group, Department

More information

RESTRICTED UNIFORM BOUNDEDNESS IN BANACH SPACES

RESTRICTED UNIFORM BOUNDEDNESS IN BANACH SPACES RESTRICTED UNIFORM BOUNDEDNESS IN BANACH SPACES OLAV NYGAARD AND MÄRT PÕLDVERE Abstract. Precise conditions for a subset A of a Banach space X are known in order that pointwise bounded on A sequences of

More information

Allocating Public Goods via the Midpoint Rule

Allocating Public Goods via the Midpoint Rule Allocating Public Goods via the Midpoint Rule Tobias Lindner Klaus Nehring Clemens Puppe Preliminary draft, February 2008 Abstract We study the properties of the following midpoint rule for determining

More information

Generalized Neyman Pearson optimality of empirical likelihood for testing parameter hypotheses

Generalized Neyman Pearson optimality of empirical likelihood for testing parameter hypotheses Ann Inst Stat Math (2009) 61:773 787 DOI 10.1007/s10463-008-0172-6 Generalized Neyman Pearson optimality of empirical likelihood for testing parameter hypotheses Taisuke Otsu Received: 1 June 2007 / Revised:

More information

ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS

ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS MATHEMATICS OF OPERATIONS RESEARCH Vol. 28, No. 4, November 2003, pp. 677 692 Printed in U.S.A. ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS ALEXANDER SHAPIRO We discuss in this paper a class of nonsmooth

More information

Robust Statistics, Revisited

Robust Statistics, Revisited Robust Statistics, Revisited Ankur Moitra (MIT) joint work with Ilias Diakonikolas, Jerry Li, Gautam Kamath, Daniel Kane and Alistair Stewart CLASSIC PARAMETER ESTIMATION Given samples from an unknown

More information

STATISTICS SYLLABUS UNIT I

STATISTICS SYLLABUS UNIT I STATISTICS SYLLABUS UNIT I (Probability Theory) Definition Classical and axiomatic approaches.laws of total and compound probability, conditional probability, Bayes Theorem. Random variable and its distribution

More information

Robustness of location estimators under t- distributions: a literature review

Robustness of location estimators under t- distributions: a literature review IOP Conference Series: Earth and Environmental Science PAPER OPEN ACCESS Robustness of location estimators under t- distributions: a literature review o cite this article: C Sumarni et al 07 IOP Conf.

More information

QUADRATIC MAJORIZATION 1. INTRODUCTION

QUADRATIC MAJORIZATION 1. INTRODUCTION QUADRATIC MAJORIZATION JAN DE LEEUW 1. INTRODUCTION Majorization methods are used extensively to solve complicated multivariate optimizaton problems. We refer to de Leeuw [1994]; Heiser [1995]; Lange et

More information

A Mathematical Programming Approach for Improving the Robustness of LAD Regression

A Mathematical Programming Approach for Improving the Robustness of LAD Regression A Mathematical Programming Approach for Improving the Robustness of LAD Regression Avi Giloni Sy Syms School of Business Room 428 BH Yeshiva University 500 W 185 St New York, NY 10033 E-mail: agiloni@ymail.yu.edu

More information

Depth Measures for Multivariate Functional Data

Depth Measures for Multivariate Functional Data This article was downloaded by: [University of Cambridge], [Francesca Ieva] On: 21 February 2013, At: 07:42 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954

More information

Testing Some Covariance Structures under a Growth Curve Model in High Dimension

Testing Some Covariance Structures under a Growth Curve Model in High Dimension Department of Mathematics Testing Some Covariance Structures under a Growth Curve Model in High Dimension Muni S. Srivastava and Martin Singull LiTH-MAT-R--2015/03--SE Department of Mathematics Linköping

More information

Topological properties

Topological properties CHAPTER 4 Topological properties 1. Connectedness Definitions and examples Basic properties Connected components Connected versus path connected, again 2. Compactness Definition and first examples Topological

More information

arxiv: v1 [math.oc] 22 Sep 2016

arxiv: v1 [math.oc] 22 Sep 2016 EUIVALENCE BETWEEN MINIMAL TIME AND MINIMAL NORM CONTROL PROBLEMS FOR THE HEAT EUATION SHULIN IN AND GENGSHENG WANG arxiv:1609.06860v1 [math.oc] 22 Sep 2016 Abstract. This paper presents the equivalence

More information

AP Statistics Cumulative AP Exam Study Guide

AP Statistics Cumulative AP Exam Study Guide AP Statistics Cumulative AP Eam Study Guide Chapters & 3 - Graphs Statistics the science of collecting, analyzing, and drawing conclusions from data. Descriptive methods of organizing and summarizing statistics

More information

Does Compressed Sensing have applications in Robust Statistics?

Does Compressed Sensing have applications in Robust Statistics? Does Compressed Sensing have applications in Robust Statistics? Salvador Flores December 1, 2014 Abstract The connections between robust linear regression and sparse reconstruction are brought to light.

More information

arxiv: v1 [math.st] 2 Aug 2007

arxiv: v1 [math.st] 2 Aug 2007 The Annals of Statistics 2007, Vol. 35, No. 1, 13 40 DOI: 10.1214/009053606000000975 c Institute of Mathematical Statistics, 2007 arxiv:0708.0390v1 [math.st] 2 Aug 2007 ON THE MAXIMUM BIAS FUNCTIONS OF

More information

Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data

Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data Sri Lankan Journal of Applied Statistics (Special Issue) Modern Statistical Methodologies in the Cutting Edge of Science Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations

More information

Outlier detection for high-dimensional data

Outlier detection for high-dimensional data Biometrika (2015), 102,3,pp. 589 599 doi: 10.1093/biomet/asv021 Printed in Great Britain Advance Access publication 7 June 2015 Outlier detection for high-dimensional data BY KWANGIL RO, CHANGLIANG ZOU,

More information

The properties of L p -GMM estimators

The properties of L p -GMM estimators The properties of L p -GMM estimators Robert de Jong and Chirok Han Michigan State University February 2000 Abstract This paper considers Generalized Method of Moment-type estimators for which a criterion

More information

Filters in Analysis and Topology

Filters in Analysis and Topology Filters in Analysis and Topology David MacIver July 1, 2004 Abstract The study of filters is a very natural way to talk about convergence in an arbitrary topological space, and carries over nicely into

More information

Theorems. Theorem 1.11: Greatest-Lower-Bound Property. Theorem 1.20: The Archimedean property of. Theorem 1.21: -th Root of Real Numbers

Theorems. Theorem 1.11: Greatest-Lower-Bound Property. Theorem 1.20: The Archimedean property of. Theorem 1.21: -th Root of Real Numbers Page 1 Theorems Wednesday, May 9, 2018 12:53 AM Theorem 1.11: Greatest-Lower-Bound Property Suppose is an ordered set with the least-upper-bound property Suppose, and is bounded below be the set of lower

More information

0.1 Pointwise Convergence

0.1 Pointwise Convergence 2 General Notation 0.1 Pointwise Convergence Let {f k } k N be a sequence of functions on a set X, either complex-valued or extended real-valued. We say that f k converges pointwise to a function f if

More information

ASYMPTOTICS OF REWEIGHTED ESTIMATORS OF MULTIVARIATE LOCATION AND SCATTER. By Hendrik P. Lopuhaä Delft University of Technology

ASYMPTOTICS OF REWEIGHTED ESTIMATORS OF MULTIVARIATE LOCATION AND SCATTER. By Hendrik P. Lopuhaä Delft University of Technology The Annals of Statistics 1999, Vol. 27, No. 5, 1638 1665 ASYMPTOTICS OF REWEIGHTED ESTIMATORS OF MULTIVARIATE LOCATION AND SCATTER By Hendrik P. Lopuhaä Delft University of Technology We investigate the

More information

Letter-value plots: Boxplots for large data

Letter-value plots: Boxplots for large data Letter-value plots: Boxplots for large data Heike Hofmann Karen Kafadar Hadley Wickham Dept of Statistics Dept of Statistics Dept of Statistics Iowa State University Indiana University Rice University

More information

Inference For High Dimensional M-estimates: Fixed Design Results

Inference For High Dimensional M-estimates: Fixed Design Results Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49

More information

REGRESSION ANALYSIS AND ANALYSIS OF VARIANCE

REGRESSION ANALYSIS AND ANALYSIS OF VARIANCE REGRESSION ANALYSIS AND ANALYSIS OF VARIANCE P. L. Davies Eindhoven, February 2007 Reading List Daniel, C. (1976) Applications of Statistics to Industrial Experimentation, Wiley. Tukey, J. W. (1977) Exploratory

More information

Some Background Material

Some Background Material Chapter 1 Some Background Material In the first chapter, we present a quick review of elementary - but important - material as a way of dipping our toes in the water. This chapter also introduces important

More information

Solving Corrupted Quadratic Equations, Provably

Solving Corrupted Quadratic Equations, Provably Solving Corrupted Quadratic Equations, Provably Yuejie Chi London Workshop on Sparse Signal Processing September 206 Acknowledgement Joint work with Yuanxin Li (OSU), Huishuai Zhuang (Syracuse) and Yingbin

More information

robustness, efficiency, breakdown point, outliers, rank-based procedures, least absolute regression

robustness, efficiency, breakdown point, outliers, rank-based procedures, least absolute regression Robust Statistics robustness, efficiency, breakdown point, outliers, rank-based procedures, least absolute regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html

More information

Lecture 12 Robust Estimation

Lecture 12 Robust Estimation Lecture 12 Robust Estimation Prof. Dr. Svetlozar Rachev Institute for Statistics and Mathematical Economics University of Karlsruhe Financial Econometrics, Summer Semester 2007 Copyright These lecture-notes

More information

Optimization and Optimal Control in Banach Spaces

Optimization and Optimal Control in Banach Spaces Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,

More information

REAL RENORMINGS ON COMPLEX BANACH SPACES

REAL RENORMINGS ON COMPLEX BANACH SPACES REAL RENORMINGS ON COMPLEX BANACH SPACES F. J. GARCÍA PACHECO AND A. MIRALLES Abstract. In this paper we provide two ways of obtaining real Banach spaces that cannot come from complex spaces. In concrete

More information

Supplementary Material for Wang and Serfling paper

Supplementary Material for Wang and Serfling paper Supplementary Material for Wang and Serfling paper March 6, 2017 1 Simulation study Here we provide a simulation study to compare empirically the masking and swamping robustness of our selected outlyingness

More information

Topology Proceedings. COPYRIGHT c by Topology Proceedings. All rights reserved.

Topology Proceedings. COPYRIGHT c by Topology Proceedings. All rights reserved. Topology Proceedings Web: http://topology.auburn.edu/tp/ Mail: Topology Proceedings Department of Mathematics & Statistics Auburn University, Alabama 36849, USA E-mail: topolog@auburn.edu ISSN: 0146-4124

More information

DD-Classifier: Nonparametric Classification Procedure Based on DD-plot 1. Abstract

DD-Classifier: Nonparametric Classification Procedure Based on DD-plot 1. Abstract DD-Classifier: Nonparametric Classification Procedure Based on DD-plot 1 Jun Li 2, Juan A. Cuesta-Albertos 3, Regina Y. Liu 4 Abstract Using the DD-plot (depth-versus-depth plot), we introduce a new nonparametric

More information

Robust high-dimensional linear regression: A statistical perspective

Robust high-dimensional linear regression: A statistical perspective Robust high-dimensional linear regression: A statistical perspective Po-Ling Loh University of Wisconsin - Madison Departments of ECE & Statistics STOC workshop on robustness and nonconvexity Montreal,

More information

An introduction to Mathematical Theory of Control

An introduction to Mathematical Theory of Control An introduction to Mathematical Theory of Control Vasile Staicu University of Aveiro UNICA, May 2018 Vasile Staicu (University of Aveiro) An introduction to Mathematical Theory of Control UNICA, May 2018

More information

Robustness of Principal Components

Robustness of Principal Components PCA for Clustering An objective of principal components analysis is to identify linear combinations of the original variables that are useful in accounting for the variation in those original variables.

More information