Robust estimators based on generalization of trimmed mean
Communications in Statistics - Simulation and Computation

Robust estimators based on generalization of trimmed mean

Lukáš Adam & Přemysl Bejda

To cite this article: Lukáš Adam & Přemysl Bejda (2017): Robust estimators based on generalization of trimmed mean, Communications in Statistics - Simulation and Computation. Accepted author version posted online: 06 Jun 2017. Published online: 06 Jun 2017.
COMMUNICATIONS IN STATISTICS - SIMULATION AND COMPUTATION, 2017

Robust estimators based on generalization of trimmed mean

Lukáš Adam (a) and Přemysl Bejda (b)

(a) Institute of Information Theory and Automation, Czech Academy of Sciences, Prague, Czech Republic; (b) Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic

ABSTRACT: In this article, we propose new estimators of location. These estimators select a robust set around the geometric median, enlarge it, and compute the (iterative) weighted mean from it. By doing so, we obtain a robust estimator in the sense of the breakdown point, which uses more observations than standard estimators. We apply our approach to the concepts of boxplot and bagplot. We work in a general normed vector space and allow multi-valued estimators.

ARTICLE HISTORY: Received 15 February 2016; Accepted 5 May 2017
KEYWORDS: Breakdown point; Estimators; Geometric median; Location; Trimmed mean
MATHEMATICS SUBJECT CLASSIFICATION: 62G35; 62G05

1. Introduction

Robust statistical methods try to weaken the quite restrictive assumptions of classical methods. For instance, many statistical models assume that the residuals are independent and identically distributed. Moreover, the assumption of normality of the residuals is often added and the Euclidean norm is usually employed. For real data these assumptions are often violated and classical approaches fail. This leads to a necessity to consider robust statistical methods. As classical examples of robust methods we mention replacing the $\ell_2$ norm by the $\ell_1$ norm or replacing the mean by the median. Such development was also enabled by the progress in computer technology, as robust methods are usually more complicated and computationally demanding than classical methods.

In this article, we are interested in robust estimators for the parameter of location. One of the first attempts to deal with such estimators are M-estimators, see Huber (1964) or Maronna (1976).
They are computationally simple but suffer from a low breakdown point, see Hampel (1971) or Donoho and Huber (1983). This, loosely speaking, expresses the fraction of the data which can be arbitrarily modified without affecting the finiteness of the estimator. Its value is at most $\frac{1}{d+1}$, see Maronna et al. (2006), where d is the dimension of the observations. Later, multiple estimators were proposed; among others we mention:

- Minimum volume ellipsoid estimators (MVE, Rousseeuw, 1985), whose name stems from the fact that among all ellipsoids containing at least half of the observations, the one given by MVE has minimal volume. However, their efficiency is rather poor.
- S-estimators (Davies, 1987) have been suggested to overcome the low efficiency of MVE. They combine the approaches of MVE and M-estimators.
- τ-estimators (Lopuhaä, 1991) also employ the idea of M-estimators, but they do not require a preliminary scale estimator.

CONTACT: Přemysl Bejda, premyslbejda@gmail.com, Faculty of Mathematics and Physics, Charles University, Sokolovská 83, Prague, Czech Republic. © 2017 Taylor & Francis Group, LLC
- The Stahel-Donoho estimator (Donoho, 1982; Stahel, 1981) is based on the idea that any outlier in the multivariate case should be an outlier in some univariate projection.

The advantage of these estimators is their high breakdown point, which is the highest that a shift equivariant estimator can attain. However, their computation usually requires heavy effort. Therefore, Maronna and Zamar (2002) suggested a way of reducing the computational complexity while sustaining the high breakdown point. As a price to pay, one is no longer able to estimate the covariance structure.

In this article, we follow the goal of finding easily computable robust estimators with a high breakdown point. From a sample, we first select a set A of observations which are robust in the sense of the breakdown point. Then we utilize these observations to construct further estimators, for example, by enlarging A and computing the (iterative) weighted mean of observations from the enlarged set. The set A is based on the geometric median (also called the spatial median), which is a direct generalization of the real median, proposed by Haldane (1948). Since the geometric median has breakdown point 1/2, our estimators are able to keep this property as well. Our estimators enjoy the following nice properties:

- Instead of considering the space R^d, we work with a general normed vector space X. This opens a natural way to tackle time series by our approach.
- Our estimators have a high breakdown point and are simple to compute.
- We partially consider the covariance structure.
- We are able to work with set-valued estimators instead of single-valued estimators.

Note that estimators usually satisfy only several of the above properties; for example, either they have a high breakdown point or they do not take the covariance structure into account at all.

This article is organized as follows: in the first part of Section 2 we define the basic concepts of breakdown point and geometric median.
Even though the geometric median may in general be a set and not a single point, most authors do not handle this fact. Because of this, we have decided to work with estimators which are multifunctions (also known as set-valued maps). The second part of Section 2 contains new results. We propose new estimators, discuss their breakdown point, and provide a comparison between our algorithms and M-estimators.

Our notation is as follows: by (X, ‖·‖) we understand a normed vector space. For a set A ⊂ X, we define ‖A‖ := sup_{x∈A} ‖x‖. Often we use the bold notation x = (x_1, ..., x_n) ∈ X^n. By the lower index we understand a component of a vector, while by the upper index we mean an iteration number. A multifunction R : X ⇉ Y is a generalization of a function, where R(x) does not have to be one point but may be a (nonempty) subset of Y. We say that R is bounded on bounded sets if ∪_{x∈A} R(x) is a bounded set for all bounded sets A ⊂ X. Since we consider multi-valued estimators, some of the definitions are slightly generalized.

2. New estimators based on generalization of trimmed mean

In this section, we first recall the geometric median and derive other estimators on its basis. The basic idea is to find the geometric median first, then restrict ourselves to a set of neighboring observations and construct an estimator based only on this restricted set. If this set is chosen in a proper way, the estimator has a breakdown point of 1/2. The breakdown point is nowadays one of the standard measures of robustness and expresses the minimal proportion of the data which can be corrupted (made arbitrarily distant) before the estimator becomes unbounded.
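Before the formal definitions, a computational aside: in (R^d, ‖·‖_2) the geometric median can be approximated by a Weiszfeld-type fixed-point iteration. The following sketch is ours, not part of the paper; iterates that land on a data point are handled only crudely by clamping the distances.

```python
import numpy as np

def geometric_median(x, tol=1e-8, max_iter=1000):
    """Approximate the geometric median of the rows of x (Euclidean norm)
    by a Weiszfeld-type fixed-point iteration."""
    x = np.asarray(x, dtype=float)
    z = x.mean(axis=0)  # start from the mean
    for _ in range(max_iter):
        d = np.linalg.norm(x - z, axis=1)
        d = np.maximum(d, 1e-12)  # guard against division by zero at data points
        w = 1.0 / d
        z_new = (w[:, None] * x).sum(axis=0) / w.sum()
        if np.linalg.norm(z_new - z) < tol:
            return z_new
        z = z_new
    return z
```

On the four points of Example 2.1 this iteration approaches (0, 1), the unique ℓ2 geometric median, and for an odd one-dimensional sample it approaches the ordinary median.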
Definition 2.1. Consider a normed vector space X and an estimator T_n : X^n ⇉ X of some functional T. For x = (x_1, ..., x_n) ∈ X^n and m = 1, ..., n define

$$A_{m,n}(\mathbf{x}) := \{\tilde{\mathbf{x}} \in X^n \mid \tilde{\mathbf{x}} \text{ and } \mathbf{x} \text{ have at most } m \text{ different coordinates}\},$$
$$m_n(T_n, \mathbf{x}) := \max\Big\{ m \in \{1,\dots,n\} \;\Big|\; \sup_{\tilde{\mathbf{x}} \in A_{m,n}(\mathbf{x}),\, \tilde z \in T_n(\tilde{\mathbf{x}}),\, z \in T_n(\mathbf{x})} \|\tilde z - z\| < \infty \Big\}.$$

Then we say that T_n has the breakdown point ε_n(T_n, x) := (1/n) m_n(T_n, x). Finally, for a family of estimators {T_n : X^n ⇉ X} we define the asymptotic breakdown point as ε := lim_{n→∞} inf_x ε_n(T_n, x).

We continue with the definition of the geometric median.

Definition 2.2. We define the geometric median as a multifunction T̂_n : X^n ⇉ X satisfying

$$\hat T_n(x_1,\dots,x_n) = \operatorname*{argmin}_{a \in X} \sum_{j=1}^n \|a - x_j\|. \qquad (1)$$

We now present two examples. The first one shows that the choice of the norm can change the geometric median in a significant way and that the geometric median may indeed be multi-valued. The second one depicts a simple situation where we are able to compute the geometric median; moreover, it will be used later in some proofs.

Example 2.1. Consider X = R^2 and points x_1 = (1, 0), x_2 = (−1, 0), and x_3 = x_4 = (0, 1). Then it is not difficult to verify that for the following norms we have

(R^2, ‖·‖_1): T̂_4(x_1, ..., x_4) = conv{(0, 0), (0, 1)},
(R^2, ‖·‖_2): T̂_4(x_1, ..., x_4) = {(0, 1)},
(R^2, ‖·‖_∞): T̂_4(x_1, ..., x_4) = {(0, 1)},

where conv stands for the convex hull. We see that for ‖·‖_2 and ‖·‖_∞ the geometric median is determined in a unique way. This does not hold any more for ‖·‖_1.

Example 2.2. Consider x ∈ X and x = (x_1, ..., x_n), where x_1 = ... = x_m = x for some m ≥ n/2. Fix any y ∈ X. Then we have

$$\sum_{i=1}^n \|x - x_i\| = \sum_{i=m+1}^n \|x - x_i\| \le \sum_{i=m+1}^n \big(\|x - y\| + \|y - x_i\|\big) = (n-m)\|x - y\| + \sum_{i=m+1}^n \|y - x_i\| \le \sum_{i=1}^n \|y - x_i\|$$

due to n − m ≤ m and ‖y − x_i‖ = ‖y − x‖ for i = 1, ..., m. But this means that x ∈ T̂_n(x).

Recall that a shift equivariant estimator T_n satisfies T_n(x_1 + y, ..., x_n + y) = T_n(x_1, ..., x_n) + y for all x_1, ..., x_n ∈ X and y ∈ X. For single-valued estimators, it has been
shown in Maronna et al. (2006, formula (3.5)) that a shift equivariant estimator satisfies

$$\varepsilon_n(T_n, \mathbf{x}) \le \frac{1}{n}\left\lfloor \frac{n-1}{2} \right\rfloor. \qquad (2)$$

Since the geometric median possesses this property, it is not surprising that we obtain formula (2) as well. Moreover, we even obtain equality in this estimate.

Lemma 2.1. For any x = (x_1, ..., x_n) ∈ X^n, we have for the geometric median

$$\varepsilon_n(\hat T_n, \mathbf{x}) = \frac{1}{n}\left\lfloor \frac{n-1}{2} \right\rfloor,$$

and thus for the asymptotic breakdown point we have ε = 1/2.

Proof. The second statement is an immediate consequence of the first one. Note that the first statement is equivalent to m_n(T̂_n, x) = n_0 with n_0 := ⌊(n−1)/2⌋. From Example 2.2, we see that m_n(T̂_n, x) < n/2, which further implies m_n(T̂_n, x) ≤ n_0. To finish the proof, it is sufficient to show that m_n(T̂_n, x) ≥ n_0.

Consider thus any x̃ ∈ A_{n_0,n}(x) and denote by I the index set of the coordinates where x̃ and x differ and by J its complement. Denote by n_1 the cardinality of I and observe that n_1 ≤ n_0. Denoting further R := max_{i=1,...,n} ‖x_i‖, we have ‖x̃_j‖ = ‖x_j‖ ≤ R for all j ∈ J. Taking any j ∈ J and y ∈ X, we obtain the following estimate:

$$\begin{aligned}
\sum_{l=1}^n \|\tilde x_j - \tilde x_l\|
&\le \sum_{l \in I} \|\tilde x_j - \tilde x_l\| + 2(n - n_1)R
\le \sum_{l \in I} \big(\|\tilde x_j - y\| + \|y - \tilde x_l\|\big) + 2(n - n_1)R \\
&= \sum_{l=1}^n \|y - \tilde x_l\| - \sum_{l \in J} \|y - \tilde x_l\| + n_1 \|\tilde x_j - y\| + 2(n - n_1)R \\
&\le \sum_{l=1}^n \|y - \tilde x_l\| - \sum_{l \in J} \big(\|y - \tilde x_j\| - \|\tilde x_j - \tilde x_l\|\big) + n_1 \|\tilde x_j - y\| + 2(n - n_1)R \\
&\le \sum_{l=1}^n \|y - \tilde x_l\| + (2n_1 - n)\|y - \tilde x_j\| + 4(n - n_1)R.
\end{aligned}$$

Since 2n_1 − n ≤ 2n_0 − n < 0, we obtain that there is R_I > 0 such that for all ‖y‖ ≥ R_I we have

$$\sum_{l=1}^n \|\tilde x_j - \tilde x_l\| < \sum_{l=1}^n \|y - \tilde x_l\|.$$

But this means that the geometric median lies in the ball with radius R_I centered at the origin. Since there is only a finite number of possible subsets I, the proof is finished.

The next lemma allows us to compute the breakdown point of an estimator.

Lemma 2.2. Consider multifunctions Φ_1 : X^n ⇉ X^m and Φ_2 : X^n × X^m ⇉ X. Assume that the following assumptions are satisfied:
1. All components of Φ_1 have breakdown point at least p.
2. There exists Φ_3 : X^m ⇉ X which is bounded on bounded sets such that Φ_2(x, y) ⊂ Φ_3(y) for all x ∈ X^n and y ∈ X^m.
Then the estimator T_n defined as

$$T_n(\mathbf{x}) := \bigcup_{y \in \Phi_1(\mathbf{x})} \Phi_2(\mathbf{x}, y)$$

has breakdown point at least p.

Proof. Due to the first assumption, there exists some R > 0 such that for m := ⌊np⌋, all x̃ ∈ A_{m,n}(x) and all y ∈ Φ_1(x̃) we have ‖y‖ ≤ R. But then we have Φ_2(x̃, y) ⊂ Φ_3(y), which is uniformly bounded due to the second assumption. Thus, the statement has been proved.

We come now to the new estimators. For a set S ⊂ X and a point x = (x_1, ..., x_n) ∈ X^n, we define

$$L(S, \mathbf{x}) := \bigcup_{y \in S} \Big\{ \mathbf{x}_I \in X^{n_0} \;\Big|\; I \subset \{1,\dots,n\},\ |I| = n_0,\ \max_{i \in I} \|x_i - y\| \le \min_{i \in \{1,\dots,n\} \setminus I} \|x_i - y\| \Big\},$$

where n_0 = ⌊(n−1)/2⌋ and x_I denotes the restriction of x to the components I. The interpretation of this set goes as follows: we select some y ∈ S and choose x_I to be the n_0 observations closest to y. Then L(S, x) is the union of all such subsets with respect to all choices of y ∈ S. Since every such x_I contains fewer than n/2 components of x, this set is stable with respect to perturbations of x whenever less than one half of the observations is contaminated.

We will use L(S, x) to define further estimators. The next theorem says that if we start with the geometric median S = T̂_n(x) and a multifunction R with certain boundedness properties, we obtain an estimator with the same breakdown point as the geometric median.

Theorem 2.1. Consider any multifunction R : X^{n_0} ⇉ X which is bounded on bounded sets and for which there exists a sequence z^k with ‖z^k‖ → ∞ such that ‖R(z^k, ..., z^k)‖ → ∞. Then for the estimator T_n^1 : X^n ⇉ X defined as

$$T_n^1(\mathbf{x}) := \bigcup_{\mathbf{y} \in L(\hat T_n(\mathbf{x}), \mathbf{x})} R(\mathbf{y}) \qquad (3)$$

and for every x = (x_1, ..., x_n) ∈ X^n, we have the following relation:

$$\varepsilon_n(T_n^1, \mathbf{x}) = \varepsilon_n(\hat T_n, \mathbf{x}) = \frac{1}{n}\left\lfloor \frac{n-1}{2} \right\rfloor.$$

Proof. From Lemma 2.2 with m = n_0, Φ_1(x) = L(T̂_n(x), x), and Φ_2(x, y) = R(y), and from Lemma 2.1, we obtain

$$\varepsilon_n(T_n^1, \mathbf{x}) \ge \varepsilon_n(\hat T_n, \mathbf{x}) = \frac{1}{n}\left\lfloor \frac{n-1}{2} \right\rfloor.$$

To show the opposite inequality, realize that the statement is equivalent to m_n(T_n^1, x) ≤ n_0. For contradiction, assume that m_n(T_n^1, x) ≥ ⌈n/2⌉. We change the first ⌈n/2⌉ coordinates of x to z^k and denote the perturbed point by x̃^k. Then Example 2.2
tellsusthat z k ˆT n ( x k ).DuetothedefinitionofL,weseethat(z k,...,z k ) L ( ˆT n ( x k ), x k ) and the imposed assumption of R implies a contradiction. If both R and L ( ˆT n (x), x) are single-valued functions, expression (3)reducesto T 1 n (x) = R(L ( ˆT n (x), x)).
Moreover, in such a case L(T̂_n(x), x) consists of the one half of the observations which are closest to the geometric median. There are several natural choices for R: for example, the mean, a weighted mean, or the geometric median.

One possible drawback of estimator (3) is that it utilizes only half of the original data. We want to make use of as many observations as possible while maintaining the high breakdown point. To this aim, we first consider a general set S ⊂ X^m for some m ∈ N; for example, we may consider the mean of x as a subset of X or L(T̂_n(x), x) as a subset of X^{n_0}. Then we consider some b : X × X^n → [0, ∞) and enlarge S by defining

$$E_b(S, \mathbf{x}) := \bigcup_{\mathbf{y} \in S} \Big\{ \mathbf{x}_I \;\Big|\; I = \Big\{ i \;\Big|\; x_i \in \bigcup_{j=1}^m B\big(y_j, b(y_j, \mathbf{x})\big) \Big\} \Big\}. \qquad (4)$$

Here, B(y_j, b(y_j, x)) stands for the ball around y_j with radius b(y_j, x). The interpretation goes as follows: from S we select y, make balls around all of its components, and select all components of x which lie in the union of these balls.

Example 2.3. Consider the case of X = R, n = 5, and x = (−3, −2, 0, 2, 4). Then the geometric median equals T̂_n(x) = 0 and, since n_0 = ⌊(n−1)/2⌋ = 2, we also have

L(T̂_n(x), x) = {(−2, 0), (0, 2)} ⊂ R^2.

If we consider b ≡ 1, then

E_b(L(T̂_n(x), x), x) = {(−3, −2, 0), (0, 2)}.

Note that the two elements of E_b(L(T̂_n(x), x), x) are of different dimensions.

We obtain the following variant of Theorem 2.1, whose identical proof we omit.

Theorem 2.2. Consider b : X × X^n → [0, ∞) bounded on bounded sets in the first variable, uniformly in the second one, and any family of multifunctions R_s : X^s ⇉ X for s = 1, ..., n which are all bounded on bounded sets and for which there exists a sequence z^k with ‖z^k‖ → ∞ such that ‖R_s(z^k, ..., z^k)‖ → ∞. Then for the estimators T_n^2 : X^n ⇉ X and T_n^3 : X^n ⇉ X defined as

$$T_n^2(\mathbf{x}) := \bigcup_{\mathbf{y} \in E_b(\hat T_n(\mathbf{x}), \mathbf{x})} R_{\dim \mathbf{y}}(\mathbf{y}), \qquad (5a)$$
$$T_n^3(\mathbf{x}) := \bigcup_{\mathbf{y} \in E_b(L(\hat T_n(\mathbf{x}), \mathbf{x}), \mathbf{x})} R_{\dim \mathbf{y}}(\mathbf{y}) \qquad (5b)$$

and for every x = (x_1, ..., x_n) ∈ X^n, we have the following relation for the breakdown points:

$$\varepsilon_n(T_n^2, \mathbf{x}) = \varepsilon_n(T_n^3, \mathbf{x}) = \varepsilon_n(\hat T_n, \mathbf{x}) = \frac{1}{n}\left\lfloor \frac{n-1}{2} \right\rfloor.$$
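The sets L and E_b of this section can be sketched for a single-valued center in (R^d, ‖·‖_2) as follows. These index-based helpers and their names are ours; ties are broken by data order, so only one element of the possibly multi-valued sets is produced.

```python
import numpy as np

def half_set(x, center):
    """Indices of the n0 = floor((n-1)/2) observations of x closest to
    `center` -- a single-valued stand-in for one element of L({center}, x)."""
    x = np.asarray(x, dtype=float)
    n0 = (len(x) - 1) // 2
    d = np.linalg.norm(x - np.asarray(center, dtype=float), axis=1)
    return np.sort(np.argsort(d, kind="stable")[:n0])

def enlarge(x, y, b):
    """Indices i such that x_i lies within distance b of some component of y --
    a single-valued sketch of the enlargement E_b({y}, x) of Eq. (4)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)  # pairwise distances
    return np.where((d <= b).any(axis=1))[0]
```

On the data of Example 2.3, half_set around the median 0 picks the observations −2 and 0, and enlarging the component (−2, 0) by b = 1 adds −3, matching the element (−3, −2, 0) of E_b.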
The function b should neither take too large values (which corresponds to a big enlargement of the set in question), because outliers may be close to the non-contaminated data, nor too small values, because some information could be missed. We suggest a possible choice in the Appendix.

To improve the behavior of the estimators, we implement an iterative procedure. We start with the geometric median z^0 = T̂_n(x) and in every iteration k compute a new estimate z^k. To do so, we employ (5b) with R being the weighted mean, where the (non-normalized)
weights satisfy

$$w_i(z^{k-1}, y_i) = \begin{cases} 1 & \text{if } y_i \in L(\{z^{k-1}\}, \mathbf{x}), \\[4pt] \max\limits_{y_j \in L(\{z^{k-1}\}, \mathbf{x})} \left( 1 - \dfrac{\|y_i - y_j\|}{b_k(y_j, \mathbf{x})} \right) & \text{otherwise} \end{cases} \qquad (6)$$

for some b_k based on L({z^{k−1}}, x). This choice of weights makes use of the possible division of the components of y ∈ E_b(L({z^{k−1}}, x), x) into two parts: those which belong to L({z^{k−1}}, x) and those which were added by enlarging this set. For the first part, we choose the (non-normalized) weight equal to one; the weight of observations from the second part decreases with increasing distance to L({z^{k−1}}, x). We summarize this approach in Algorithm 2.1. Concerning the termination criterion, any standard criterion may be used, for example, stopping when the (relative) change in z^k is small.

Algorithm 2.1 An estimator based on iterative weighting
Input: observations x = (x_1, ..., x_n)
1: k ← 0, z^0 ← T̂_n(x)
2: while not terminate do
3:   k ← k + 1
4:   determine b_k based on L({z^{k−1}}, x)
5:   pick any y^k ∈ E_{b_k}(L({z^{k−1}}, x), x)
6:   compute w^k according to formula (6) and renormalize the weights so that their sum equals 1
7:   z^k ← Σ_{i=1}^{dim y^k} w_i^k y_i^k
8: end while
9: return estimate x̂ ← z^k

Finally, we would like to point out that every update step of Algorithm 2.1 preserves the stability result which we already mentioned several times in the previous text. For k = 1, we may write z^1 ∈ ∪_{y ∈ Φ_1(x)} Φ_2(x, y), where Φ_1(x) := E_b(L({z^0}, x), x) and Φ_2(x, y) := Σ_{i=1}^{dim y} w_i(x, y) y_i. Then Φ_1 has breakdown point (1/n)⌊(n−1)/2⌋ and, since ‖Φ_2(x, y)‖ ≤ n max_i ‖y_i‖ holds true, thanks to Lemma 2.2 we obtain that z^1 has the same breakdown point. By applying the same procedure to the subsequent iterations, we obtain the same result for all z^k.

In the previous text, we highlighted some benefits of our estimators. Note that there are naturally also situations where it is better to use standard estimators. Consider, for example, the one-dimensional double exponential distribution with density (1/2) exp(−|x − μ|).
Then the (geometric) median coincides with the maximum likelihood estimator of μ, see Sec. 6.3 in Lehmann and Casella (2006), and therefore the median is the most efficient estimator. Thus, by employing additional observations apart from the median, we only worsen the quality of the estimator. However, to benefit from such a situation, we would have to know the true distribution and know that there is no contamination.

Remark 2.1. There is some connection between our estimators and M-estimators. In Algorithm 2.2, we summarize the algorithm from Maronna et al. (2006, Sec. 2.7.3).
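In one dimension, such an iteratively reweighted scheme can be sketched as follows. This is a simplification of the algorithm summarized below: W_1 is the Huber weight, and the scale is fixed once to the normalized MAD instead of being updated through W_2.

```python
import numpy as np

def huber_location(x, k=1.345, n_iter=50):
    """One-dimensional Huber location M-estimator computed by iteratively
    reweighted means; the scale is fixed to the normalized MAD."""
    x = np.asarray(x, dtype=float)
    z = np.median(x)
    mad = np.median(np.abs(x - z))
    sigma = 1.4826 * mad if mad > 0 else 1.0   # guard against zero scale
    for _ in range(n_iter):
        r = (x - z) / sigma                    # standardized residuals
        w = np.where(np.abs(r) <= k, 1.0,      # Huber weight W_1
                     k / np.maximum(np.abs(r), 1e-12))
        z_new = np.sum(w * x) / np.sum(w)      # weighted mean update
        if abs(z_new - z) < 1e-10:
            break
        z = z_new
    return z
```

On a sample with a gross outlier, such as (0, 1, 2, 3, 100), the estimate stays close to the median 2 rather than the mean 21.2, because the outlier's weight is downscaled by k/|r|.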
Algorithm 2.2 M-estimator from Maronna et al. (2006)
Require: observations x = (x_1, ..., x_n), weighting functions W_1 and W_2
1: k ← 0, z^0 ← T̂_n(x), dispersion estimate σ^0
2: while not terminate do
3:   k ← k + 1
4:   r_i^k ← ‖x_i − z^{k−1}‖ / σ^{k−1}
5:   w_{1,i}^k ← W_1(r_i^k), w_{2,i}^k ← W_2(r_i^k) and norm the weights to sum to one
6:   z^k ← Σ_{i=1}^n w_{1,i}^k x_i,  (σ^k)^2 ← (1/n) Σ_{i=1}^n w_{2,i}^k ‖x_i − z^{k−1}‖^2
7: end while
8: return estimate x̂ ← z^k

Both approaches (iteratively) compute a weighted mean of the observations. While M-estimators are based on the maximum likelihood estimate, ours are based on geometric intuition and the trimmed mean. If X = R and if W_1 in Algorithm 2.2 has finite support, we can say that our estimators belong to the very wide class of M-estimators. This changes for R^d, though. Under the standard assumption that W_1 is symmetric around zero and non-increasing on rays emanating from zero, the weight of an observation depends only on its distance from z^{k−1}. Thus, two observations have the same weight if and only if their distances to z^{k−1} are identical. On the other hand, in our approach all points in L({z^{k−1}}, x) have the same weight, and this set is enlarged farther for distant observations. Thus, even though neither of the algorithms estimates the covariance structure, our algorithm makes at least an attempt to consider it. Of course, there are M-estimators which along with the location also properly estimate the covariance structure. But this raises the computational complexity and reduces the breakdown point to 1/(d+1). To summarize: we can say that our algorithms try to pick the best properties of M-estimators; on the one hand, they have a high breakdown point and are simple to compute; on the other hand, they at least partially consider the covariance structure.

Another advantage of our estimators over M-estimators is a simpler theoretical analysis. To show that our estimator has the limiting breakdown point 1/2, it is sufficient to apply Lemma 2.2, which itself directly follows from the definition of the breakdown point.
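To make the comparison concrete, the iterative weighting of Algorithm 2.1 can also be sketched in a simplified, single-valued form. The sketch below uses the Euclidean norm, a fixed enlargement radius b instead of the data-driven b_k from the Appendix, and breaks ties in the half-set by data order; all names are ours.

```python
import numpy as np

def iterative_weighted_mean(x, center, b=1.0, n_iter=20):
    """Simplified single-valued sketch of Algorithm 2.1: starting from a
    robust center, repeatedly take the n0 closest points, enlarge the set
    by a fixed radius b, weight the added points by distance as in Eq. (6),
    and average."""
    x = np.asarray(x, dtype=float)
    n0 = (len(x) - 1) // 2
    z = np.asarray(center, dtype=float)
    for _ in range(n_iter):
        d = np.linalg.norm(x - z, axis=1)
        core = np.argsort(d, kind="stable")[:n0]          # stand-in for L({z}, x)
        dist_to_core = np.linalg.norm(
            x[:, None, :] - x[core][None, :, :], axis=2).min(axis=1)
        w = np.maximum(1.0 - dist_to_core / b, 0.0)       # Eq. (6)-style weights
        w[core] = 1.0                                     # core points get weight 1
        z_new = (w[:, None] * x).sum(axis=0) / w.sum()    # renormalized weighted mean
        if np.linalg.norm(z_new - z) < 1e-10:
            break
        z = z_new
    return z
```

On the data of Example 2.3, starting from the geometric median 0 with b = 1, the sketch settles after two iterations on the weighted mean of the core observations {−2, 0}, that is, on −1.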
To the best of our knowledge, such a direct application of the definition is not possible for M-estimators; for example, one has to take care of the properties of the weighting function due to the division by the dispersion.

3. Numerical results

In this section, we first show the numerical performance of our estimators and then show how they comply with the well-known concepts of boxplot and bagplot.

3.1. Numerical performance

We consider X = R^d with d ∈ {1, 15} and compare our algorithms with known estimators; for the reader's convenience, we summarize the used algorithms in Table 1. To generate the samples, we first generate z_i from N(0, I), then contaminate them by some distribution with probability p, and finally modify them via a covariance structure. This modification is performed in the following way: we randomly generate a correlation matrix C and a diagonal matrix Δ with diagonal elements having distribution U[0.5, 10]. Then we compute the covariance matrix V = ΔCΔ, its Cholesky decomposition V = SᵀS, and finally set y_i = S z_i + μ, where μ := (0, ..., d−1). Together we consider N = 10,000 samples with
n = 100 observations. The loss function equals

$$\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{d} \big(\hat x_{i,j} - \mu_j\big)^2, \qquad (7)$$

where x̂_{i,j} denotes the estimate for sample i and coordinate j = 1, ..., d.

Table 1. Summary of algorithms. The horizontal line divides them into known and our algorithms.
Mean: mean
Med: median or geometric median
Trun: α-truncated mean with α = 0.2
Winsor: α-winsorized mean with α = 0.2
M1: Huber M-estimator, see Huber (1981)
M2: Algorithm 2.2 from Maronna et al. (2006)
SD: Stahel-Donoho estimator, see Stahel (1981) and Donoho (1982)
GM1: formula (3), where R is the mean
GM2: formula (5a), where R is the mean
GM3: Algorithm 2.1

Table 2. Value of the loss function (7) for contaminations of N(0, 1) by some other distribution with probability p, for X = R. Compared estimators: Mean, Med, Trun, Winsor, M1, M2, GM1, GM2, GM3; contaminating distributions: N(0, 100), Cauchy, U(−10, 10), and U(−20, 10). [Numerical entries not recovered.]

We present the results in Table 2 for X = R and in Table 3 for X = R^15. The values show the loss function (7) for the contaminating distribution (first column) with the given probability (second column). Bold numbers are the best values among all estimators, and numbers in italics are those within 5% of the best loss function value. For X = R, the results of the M-estimators and of our estimators are comparable, with the M-estimators in a slight lead, which is not surprising due to Remark 2.1. For X = R^15, the performance of both groups turns around, and now our estimators perform better than the examined M-estimators. This is again expected, as our estimators try to take the covariance structure into account, as explained in Remark 2.1. In the latter case, our estimators manage to beat the Stahel-Donoho estimator in almost all cases.

Table 3. Value of the loss function (7) for contaminations of N(0, I) by some other distribution with probability p, for X = R^15. Compared estimators: Mean, Med, M1, SD, GM1, GM2, GM3; contaminating distributions: N(0, 100), Cauchy, U(−10, 10), and U(−20, 10). [Numerical entries not recovered.]
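A sketch of this simulation design follows. It is simplified: the random covariance is built as A Aᵀ + 0.5 I rather than from a random correlation matrix with U[0.5, 10] scales, and the contamination is fixed to N(0, 100); all function names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def contaminated_sample(n=100, d=2, p=0.1, mu=None):
    """Sketch of the simulation design: N(0, I) observations, each replaced
    with probability p by a heavy-tailed draw (N(0, 100) here), then mapped
    through a covariance factor S and shifted by mu = (0, ..., d-1)."""
    mu = np.arange(d, dtype=float) if mu is None else np.asarray(mu, dtype=float)
    z = rng.standard_normal((n, d))
    mask = rng.random(n) < p
    z[mask] = 10.0 * rng.standard_normal((mask.sum(), d))  # N(0, 100) contamination
    A = rng.standard_normal((d, d))
    V = A @ A.T + 0.5 * np.eye(d)        # a random positive-definite covariance
    S = np.linalg.cholesky(V)
    return z @ S.T + mu

def loss(estimates, mu):
    """Loss (7): average squared deviation of the location estimates from mu."""
    estimates = np.asarray(estimates, dtype=float)
    return np.mean(np.sum((estimates - mu) ** 2, axis=1))
```

Running each estimator on many such samples and averaging via `loss` reproduces the shape of the experiment, if not the exact covariance model of the paper.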
Figure 1. Comparison of the classical boxplot (left-hand side of each figure) and our modification (right-hand side of each figure). In both cases, we contaminate N(0, 1) with probability p by the distribution described under the figures.

3.2. Relation to boxplot

The boxplot was proposed for the first time in Tukey (1977). It takes the median, then computes the interquartile range (IQR), which is later widened. Observations which are not present in this widening (known as the whiskers) are considered outliers. We compare the boxplot with our method, where instead of considering the IQR, we take y ∈ L(T̂_n(x), x). Thus, we do not take the 25% of observations with lower value than the median and the 25% of observations with higher value than the median, but rather the 50% of observations closest to the median. The whiskers are based on E_b({y}, x).

We depict this comparison in Fig. 1. We contaminate the standard normal distribution by some other distribution. In each (sub)figure, the left-hand side is the boxplot and the right-hand side is our modification. Since the two approaches differ only in the way they treat non-symmetry, both graphs in the left figure are identical. However, if we assume non-symmetric distributions (right figure), then our modification is able to detect outliers better than the standard boxplot.

3.3. Relation to bagplot

In this subsection, we consider the generalization of the boxplot into more dimensions. For two dimensions, this is known as the bagplot and was studied for the first time in Rousseeuw et al. (1999). For a generalization to functional data see, for example, Sun and Genton (2011). To construct the bagplot, a real number called depth is assigned to every observation, see Zuo and Serfling (2000). Then the observation with the highest value of depth is called the depth median, and the convex hull of approximately 50% of the observations with the highest depth is called the bag (this corresponds to the IQR for the boxplot).
Then the bag is enlarged to the so-called fence (which corresponds to the whiskers for the boxplot). The observations outside the fence are considered outliers.

We now present our modification of the bagplot, illustrated in Fig. 2. Having a sample x, we first compute its geometric median T̂_n(x). For simplicity, we assume that it is defined uniquely. Then we choose an arbitrary y ∈ X^{n_0} from L(T̂_n(x), x), construct the convex hull containing all coordinates of y, and call this set the bag. In the last step, we enlarge the bag by considering

$$\operatorname{conv}_i \big\{ \hat T_n(\mathbf{x}) + 3\big(y_i - \hat T_n(\mathbf{x})\big) \big\},$$
Figure 2. Description of the bagplot based on GM2: the observations, the mean, T̂_n(x), the convex envelopes of L and E (full and dashed lines, respectively), GM1, GM2, and the outliers (marked by X).

see a similar procedure in Rousseeuw et al. (1999). In this figure, we also denote GM1 (as the mean of all observations in the bag) and GM2 (as the mean of all observations in the fence). Here, the geometric median corresponds to the depth median, the convex hull to the bag, and its enlargement to the fence. Note that the construction of the fence is very similar to the construction of E_b(L(T̂_n(x), x), x). Moreover, since the construction of GM2 is based on objects which have the limiting breakdown point of 1/2, the estimator GM2 has the same property. On the other hand, since the bagplot is a generalization of the boxplot, which has a breakdown point of 1/4, we cannot expect the bagplot to have a higher breakdown point than this. Thus, our version of the bagplot deals better with outliers than the original method.

Appendix: Choice of b

In this short section, we derive an estimate of b for the algorithms GM2 and GM3 described in Table 1. Note that due to the construction of the algorithm, it is sufficient to define b_i := b(x_i, x) for all observations in L(T̂_n(x), x). Since we want to keep GM2 as simple as possible, we consider a constant b there and relax this assumption for GM3. Moreover, we derive different values for the one- and more-dimensional cases. We start with a technical lemma.

Lemma A.1. Let Y be a random variable with a finite second moment, distribution function G, mean μ, standard deviation σ, and let q_{μ,σ} denote its quantile function. Then for a fixed α ∈ [0, 1], the following ratio does not depend on the values of μ and σ:

$$K_{G,\alpha} = \frac{q_{\mu,\sigma}(1 - \alpha/2) - q_{\mu,\sigma}(\alpha/2)}{q_{\mu,\sigma}(0.75) - q_{\mu,\sigma}(0.25)}.$$

Proof. This follows from the fact that σ q_{0,1}(α) + μ = q_{μ,σ}(α).

For the case of one dimension, consider a random sample x from N(μ, σ²) and denote its median by T̂_n(x).
For simplicity, assume that L(T̂_n(x), x) ∈ R^{n_0} is a singleton. Then we may estimate q_{μ,σ}(0.75) − q_{μ,σ}(0.25) by max L(T̂_n(x), x) − min L(T̂_n(x), x), and we define b as

$$b := K_{N(0,1),\alpha}\, \frac{\max L(\hat T_n(\mathbf{x}), \mathbf{x}) - \min L(\hat T_n(\mathbf{x}), \mathbf{x})}{2} \approx \frac{q_{\mu,\sigma}(1 - \alpha/2) - q_{\mu,\sigma}(\alpha/2)}{2}.$$
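In code, this one-dimensional choice might look as follows. The sketch uses the standard library's normal quantile function; the function name and interface are ours.

```python
from statistics import NormalDist

def b_one_dim(core, alpha=0.05):
    """Sketch of the one-dimensional enlargement radius: estimate the IQR by
    the range of the half-set `core` around the median, then rescale it by
    the normal-theory ratio K so that the widened set captures roughly
    1 - alpha of an uncontaminated normal sample."""
    q = NormalDist().inv_cdf               # standard normal quantile function
    K = (q(1 - alpha / 2) - q(alpha / 2)) / (q(0.75) - q(0.25))
    iqr_hat = max(core) - min(core)        # estimates q(0.75) - q(0.25)
    return K * iqr_hat / 2.0
```

For alpha = 0.5 the ratio K is exactly 1 and b is simply half the range of the half-set; for smaller alpha the radius is widened accordingly.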
This value thus estimates the distance between the chosen quantiles divided by two. If we take all observations whose distance from the median is at most b, then under the assumption of uncontaminated normality we obtain approximately a fraction 1 − α of our observations.

For the multidimensional case R^d, we assume that x follows the N(μ, σ²I) distribution. We use max_{y ∈ L(T̂_n(x), x)} ‖y − T̂_n(x)‖ as an approximation of σ √(χ²_{0.5,d}), where χ²_{·,d} denotes the quantile function of the χ² distribution with d degrees of freedom. To include approximately a fraction α ∈ (0, 1) of all observations, we set

$$b := \max_{\mathbf{y} \in L(\hat T_n(\mathbf{x}), \mathbf{x})} \|\mathbf{y} - \hat T_n(\mathbf{x})\| \, \frac{\sqrt{\chi^2_{\alpha,d}}}{\sqrt{\chi^2_{0.5,d}}}.$$

For GM3, we consider directly R^d. Assume again that our sample x has distribution N(μ, σ²I); then for x_i from the boundary of L(T̂_n(x), x) we have ‖x_i − T̂_n(x)‖ ≈ σ √(χ²_{0.5,d}). Now, fixing a given probability level α ∈ (0, 1) and a weight w ∈ (0, 1), we want all x_e with ‖x_e − T̂_n(x)‖ ≤ σ √(χ²_{α,d}) to have weight (6) at least w. Plugging this into the definition of the weight results in

$$b_i \ge \frac{\|x_i - x_e\|}{1 - w}.$$

Approximating again the distribution, we have ‖x_e − T̂_n(x)‖ ≤ σ √(χ²_{α,d}) and ‖x_i − T̂_n(x)‖ ≈ σ √(χ²_{0.5,d}), and hence

$$\frac{\|x_i - x_e\|}{1-w} \le \frac{\sigma\big(\sqrt{\chi^2_{\alpha,d}} - \sqrt{\chi^2_{0.5,d}}\big)}{1-w} = \frac{\|x_i - \hat T_n(\mathbf{x})\|}{1-w}\left(\frac{\sqrt{\chi^2_{\alpha,d}}}{\sqrt{\chi^2_{0.5,d}}} - 1\right).$$

Taking the minimal such value of b_i, we set

$$b_i := \frac{\sqrt{\chi^2_{\alpha,d}} - \sqrt{\chi^2_{0.5,d}}}{(1 - w)\sqrt{\chi^2_{0.5,d}}}\, \|x_i - \hat T_n(\mathbf{x})\|.$$

In the numerical experiments, we have chosen α = .

Acknowledgment

The authors would like to thank the anonymous reviewers for substantially improving this article.

ORCID

Lukáš Adam

References

Davies, P. L. (1987). Asymptotic behavior of S-estimators of multivariate location parameters and dispersion matrices. The Annals of Statistics 15.
Donoho, D. L. (1982). Breakdown Properties of Multivariate Location Estimators. Ph.D. qualifying paper, Harvard University.
Donoho, D. L., Huber, P. J. (1983). The notion of breakdown point. In: Bickel, P. J., Doksum, K. A., Hodges, J. L., eds. A Festschrift for Erich Lehmann. Berkeley, California: Wadsworth.
Haldane, J. B. S. (1948). Note on the median of a multivariate distribution. Biometrika 35(3-4).
Hampel, F. R. (1971). A general qualitative definition of robustness. The Annals of Mathematical Statistics 42(6).
Huber, P. J. (1964). Robust estimation of a location parameter. The Annals of Mathematical Statistics 35(1).
Huber, P. J. (1981). Robust Statistics. New York: John Wiley & Sons.
Lehmann, E. L., Casella, G. (2006). Theory of Point Estimation. New York: Springer Science & Business Media.
Lopuhaä, H. P. (1991). Multivariate τ-estimators for location and scatter. Canadian Journal of Statistics 19.
Maronna, R. (1976). Robust M-estimators of multivariate location and scatter. The Annals of Statistics 4.
Maronna, R., Martin, D., Yohai, V. (2006). Robust Statistics: Theory and Methods. New Jersey: John Wiley & Sons.
Maronna, R. A., Zamar, R. H. (2002). Robust estimates of location and dispersion for high-dimensional datasets. Technometrics 44(4).
Rousseeuw, P. J. (1985). Multivariate estimation with high breakdown point. In: Mathematical Statistics and Applications. Amsterdam: Elsevier.
Rousseeuw, P. J., Ruts, I., Tukey, J. W. (1999). The bagplot: A bivariate boxplot. The American Statistician 53(4).
Stahel, W. A. (1981). Breakdown of Covariance Estimators. Research Report 31. ETH Zürich: Fachgruppe für Statistik.
Sun, Y., Genton, M. G. (2011). Functional boxplots. Journal of Computational and Graphical Statistics 20(2).
Tukey, J. W. (1977). Exploratory Data Analysis. Massachusetts: Addison-Wesley.
Zuo, Y., Serfling, R. (2000). General notions of statistical depth function. The Annals of Statistics 28(2).
REVSTAT Statistical Journal Volume 5, Number 1, March 2007, 1 17 THE BREAKDOWN POINT EXAMPLES AND COUNTEREXAMPLES Authors: P.L. Davies University of Duisburg-Essen, Germany, and Technical University Eindhoven,
More informationIntroduction to Real Analysis Alternative Chapter 1
Christopher Heil Introduction to Real Analysis Alternative Chapter 1 A Primer on Norms and Banach Spaces Last Updated: March 10, 2018 c 2018 by Christopher Heil Chapter 1 A Primer on Norms and Banach Spaces
More informationIntroduction Robust regression Examples Conclusion. Robust regression. Jiří Franc
Robust regression Robust estimation of regression coefficients in linear regression model Jiří Franc Czech Technical University Faculty of Nuclear Sciences and Physical Engineering Department of Mathematics
More informationA Modified M-estimator for the Detection of Outliers
A Modified M-estimator for the Detection of Outliers Asad Ali Department of Statistics, University of Peshawar NWFP, Pakistan Email: asad_yousafzay@yahoo.com Muhammad F. Qadir Department of Statistics,
More informationIntroduction to robust statistics*
Introduction to robust statistics* Xuming He National University of Singapore To statisticians, the model, data and methodology are essential. Their job is to propose statistical procedures and evaluate
More informationHigh Breakdown Point Estimation in Regression
WDS'08 Proceedings of Contributed Papers, Part I, 94 99, 2008. ISBN 978-80-7378-065-4 MATFYZPRESS High Breakdown Point Estimation in Regression T. Jurczyk Charles University, Faculty of Mathematics and
More informationEfficient and Robust Scale Estimation
Efficient and Robust Scale Estimation Garth Tarr, Samuel Müller and Neville Weber School of Mathematics and Statistics THE UNIVERSITY OF SYDNEY Outline Introduction and motivation The robust scale estimator
More informationON THE CALCULATION OF A ROBUST S-ESTIMATOR OF A COVARIANCE MATRIX
STATISTICS IN MEDICINE Statist. Med. 17, 2685 2695 (1998) ON THE CALCULATION OF A ROBUST S-ESTIMATOR OF A COVARIANCE MATRIX N. A. CAMPBELL *, H. P. LOPUHAA AND P. J. ROUSSEEUW CSIRO Mathematical and Information
More informationIMPROVING THE SMALL-SAMPLE EFFICIENCY OF A ROBUST CORRELATION MATRIX: A NOTE
IMPROVING THE SMALL-SAMPLE EFFICIENCY OF A ROBUST CORRELATION MATRIX: A NOTE Eric Blankmeyer Department of Finance and Economics McCoy College of Business Administration Texas State University San Marcos
More informationJensen s inequality for multivariate medians
Jensen s inequality for multivariate medians Milan Merkle University of Belgrade, Serbia emerkle@etf.rs Given a probability measure µ on Borel sigma-field of R d, and a function f : R d R, the main issue
More informationIntroduction to Robust Statistics. Elvezio Ronchetti. Department of Econometrics University of Geneva Switzerland.
Introduction to Robust Statistics Elvezio Ronchetti Department of Econometrics University of Geneva Switzerland Elvezio.Ronchetti@metri.unige.ch http://www.unige.ch/ses/metri/ronchetti/ 1 Outline Introduction
More informationAsymptotic Relative Efficiency in Estimation
Asymptotic Relative Efficiency in Estimation Robert Serfling University of Texas at Dallas October 2009 Prepared for forthcoming INTERNATIONAL ENCYCLOPEDIA OF STATISTICAL SCIENCES, to be published by Springer
More informationStahel-Donoho Estimation for High-Dimensional Data
Stahel-Donoho Estimation for High-Dimensional Data Stefan Van Aelst KULeuven, Department of Mathematics, Section of Statistics Celestijnenlaan 200B, B-3001 Leuven, Belgium Email: Stefan.VanAelst@wis.kuleuven.be
More informationYIJUN ZUO. Education. PhD, Statistics, 05/98, University of Texas at Dallas, (GPA 4.0/4.0)
YIJUN ZUO Department of Statistics and Probability Michigan State University East Lansing, MI 48824 Tel: (517) 432-5413 Fax: (517) 432-5413 Email: zuo@msu.edu URL: www.stt.msu.edu/users/zuo Education PhD,
More informationROBUST TESTS BASED ON MINIMUM DENSITY POWER DIVERGENCE ESTIMATORS AND SADDLEPOINT APPROXIMATIONS
ROBUST TESTS BASED ON MINIMUM DENSITY POWER DIVERGENCE ESTIMATORS AND SADDLEPOINT APPROXIMATIONS AIDA TOMA The nonrobustness of classical tests for parametric models is a well known problem and various
More informationLecture Notes 1: Vector spaces
Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector
More informationROBUST ESTIMATION OF A CORRELATION COEFFICIENT: AN ATTEMPT OF SURVEY
ROBUST ESTIMATION OF A CORRELATION COEFFICIENT: AN ATTEMPT OF SURVEY G.L. Shevlyakov, P.O. Smirnov St. Petersburg State Polytechnic University St.Petersburg, RUSSIA E-mail: Georgy.Shevlyakov@gmail.com
More informationA ROBUST METHOD OF ESTIMATING COVARIANCE MATRIX IN MULTIVARIATE DATA ANALYSIS G.M. OYEYEMI *, R.A. IPINYOMI **
ANALELE ŞTIINłIFICE ALE UNIVERSITĂłII ALEXANDRU IOAN CUZA DIN IAŞI Tomul LVI ŞtiinŃe Economice 9 A ROBUST METHOD OF ESTIMATING COVARIANCE MATRIX IN MULTIVARIATE DATA ANALYSIS G.M. OYEYEMI, R.A. IPINYOMI
More informationIdentification of Multivariate Outliers: A Performance Study
AUSTRIAN JOURNAL OF STATISTICS Volume 34 (2005), Number 2, 127 138 Identification of Multivariate Outliers: A Performance Study Peter Filzmoser Vienna University of Technology, Austria Abstract: Three
More informationHighly Robust Variogram Estimation 1. Marc G. Genton 2
Mathematical Geology, Vol. 30, No. 2, 1998 Highly Robust Variogram Estimation 1 Marc G. Genton 2 The classical variogram estimator proposed by Matheron is not robust against outliers in the data, nor is
More informationOutliers Robustness in Multivariate Orthogonal Regression
674 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS PART A: SYSTEMS AND HUMANS, VOL. 30, NO. 6, NOVEMBER 2000 Outliers Robustness in Multivariate Orthogonal Regression Giuseppe Carlo Calafiore Abstract
More informationRobust Estimation of Cronbach s Alpha
Robust Estimation of Cronbach s Alpha A. Christmann University of Dortmund, Fachbereich Statistik, 44421 Dortmund, Germany. S. Van Aelst Ghent University (UGENT), Department of Applied Mathematics and
More informationMetric Spaces and Topology
Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies
More informationA Brief Overview of Robust Statistics
A Brief Overview of Robust Statistics Olfa Nasraoui Department of Computer Engineering & Computer Science University of Louisville, olfa.nasraoui_at_louisville.edu Robust Statistical Estimators Robust
More informationOn Semicontinuity of Convex-valued Multifunctions and Cesari s Property (Q)
On Semicontinuity of Convex-valued Multifunctions and Cesari s Property (Q) Andreas Löhne May 2, 2005 (last update: November 22, 2005) Abstract We investigate two types of semicontinuity for set-valued
More informationComputational Connections Between Robust Multivariate Analysis and Clustering
1 Computational Connections Between Robust Multivariate Analysis and Clustering David M. Rocke 1 and David L. Woodruff 2 1 Department of Applied Science, University of California at Davis, Davis, CA 95616,
More informationBreakdown points of Cauchy regression-scale estimators
Breadown points of Cauchy regression-scale estimators Ivan Mizera University of Alberta 1 and Christine H. Müller Carl von Ossietzy University of Oldenburg Abstract. The lower bounds for the explosion
More informationThe nonsmooth Newton method on Riemannian manifolds
The nonsmooth Newton method on Riemannian manifolds C. Lageman, U. Helmke, J.H. Manton 1 Introduction Solving nonlinear equations in Euclidean space is a frequently occurring problem in optimization and
More informationRegression Analysis for Data Containing Outliers and High Leverage Points
Alabama Journal of Mathematics 39 (2015) ISSN 2373-0404 Regression Analysis for Data Containing Outliers and High Leverage Points Asim Kumer Dey Department of Mathematics Lamar University Md. Amir Hossain
More informationLecture 14 October 13
STAT 383C: Statistical Modeling I Fall 2015 Lecture 14 October 13 Lecturer: Purnamrita Sarkar Scribe: Some one Disclaimer: These scribe notes have been slightly proofread and may have typos etc. Note:
More informationIntroduction to Robust Statistics. Anthony Atkinson, London School of Economics, UK Marco Riani, Univ. of Parma, Italy
Introduction to Robust Statistics Anthony Atkinson, London School of Economics, UK Marco Riani, Univ. of Parma, Italy Multivariate analysis Multivariate location and scatter Data where the observations
More informationOn the Power of Robust Solutions in Two-Stage Stochastic and Adaptive Optimization Problems
MATHEMATICS OF OPERATIONS RESEARCH Vol. xx, No. x, Xxxxxxx 00x, pp. xxx xxx ISSN 0364-765X EISSN 156-5471 0x xx0x 0xxx informs DOI 10.187/moor.xxxx.xxxx c 00x INFORMS On the Power of Robust Solutions in
More informationOn the Power of Robust Solutions in Two-Stage Stochastic and Adaptive Optimization Problems
MATHEMATICS OF OPERATIONS RESEARCH Vol. 35, No., May 010, pp. 84 305 issn 0364-765X eissn 156-5471 10 350 084 informs doi 10.187/moor.1090.0440 010 INFORMS On the Power of Robust Solutions in Two-Stage
More informationThe Dirichlet s P rinciple. In this lecture we discuss an alternative formulation of the Dirichlet problem for the Laplace equation:
Oct. 1 The Dirichlet s P rinciple In this lecture we discuss an alternative formulation of the Dirichlet problem for the Laplace equation: 1. Dirichlet s Principle. u = in, u = g on. ( 1 ) If we multiply
More informationStatistical Data Analysis
DS-GA 0 Lecture notes 8 Fall 016 1 Descriptive statistics Statistical Data Analysis In this section we consider the problem of analyzing a set of data. We describe several techniques for visualizing the
More informationImprovement of The Hotelling s T 2 Charts Using Robust Location Winsorized One Step M-Estimator (WMOM)
Punjab University Journal of Mathematics (ISSN 1016-2526) Vol. 50(1)(2018) pp. 97-112 Improvement of The Hotelling s T 2 Charts Using Robust Location Winsorized One Step M-Estimator (WMOM) Firas Haddad
More informationRobust model selection criteria for robust S and LT S estimators
Hacettepe Journal of Mathematics and Statistics Volume 45 (1) (2016), 153 164 Robust model selection criteria for robust S and LT S estimators Meral Çetin Abstract Outliers and multi-collinearity often
More informationRobust scale estimation with extensions
Robust scale estimation with extensions Garth Tarr, Samuel Müller and Neville Weber School of Mathematics and Statistics THE UNIVERSITY OF SYDNEY Outline The robust scale estimator P n Robust covariance
More informationOutlier detection for skewed data
Outlier detection for skewed data Mia Hubert 1 and Stephan Van der Veeken December 7, 27 Abstract Most outlier detection rules for multivariate data are based on the assumption of elliptical symmetry of
More informationStatistical Depth Function
Machine Learning Journal Club, Gatsby Unit June 27, 2016 Outline L-statistics, order statistics, ranking. Instead of moments: median, dispersion, scale, skewness,... New visualization tools: depth contours,
More informationMarch 25, 2010 CHAPTER 2: LIMITS AND CONTINUITY OF FUNCTIONS IN EUCLIDEAN SPACE
March 25, 2010 CHAPTER 2: LIMIT AND CONTINUITY OF FUNCTION IN EUCLIDEAN PACE 1. calar product in R n Definition 1.1. Given x = (x 1,..., x n ), y = (y 1,..., y n ) R n,we define their scalar product as
More informationNonparametric Modal Regression
Nonparametric Modal Regression Summary In this article, we propose a new nonparametric modal regression model, which aims to estimate the mode of the conditional density of Y given predictors X. The nonparametric
More informationTwo Simple Resistant Regression Estimators
Two Simple Resistant Regression Estimators David J. Olive Southern Illinois University January 13, 2005 Abstract Two simple resistant regression estimators with O P (n 1/2 ) convergence rate are presented.
More informationSet, functions and Euclidean space. Seungjin Han
Set, functions and Euclidean space Seungjin Han September, 2018 1 Some Basics LOGIC A is necessary for B : If B holds, then A holds. B A A B is the contraposition of B A. A is sufficient for B: If A holds,
More informationWeek 2: Sequences and Series
QF0: Quantitative Finance August 29, 207 Week 2: Sequences and Series Facilitator: Christopher Ting AY 207/208 Mathematicians have tried in vain to this day to discover some order in the sequence of prime
More informationGeneral Foundations for Studying Masking and Swamping Robustness of Outlier Identifiers
General Foundations for Studying Masking and Swamping Robustness of Outlier Identifiers Robert Serfling 1 and Shanshan Wang 2 University of Texas at Dallas This paper is dedicated to the memory of Kesar
More informationMore Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction
Sankhyā : The Indian Journal of Statistics 2007, Volume 69, Part 4, pp. 700-716 c 2007, Indian Statistical Institute More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order
More informationComputer projects for Mathematical Statistics, MA 486. Some practical hints for doing computer projects with MATLAB:
Computer projects for Mathematical Statistics, MA 486. Some practical hints for doing computer projects with MATLAB: You can save your project to a text file (on a floppy disk or CD or on your web page),
More informationMAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9
MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended
More informationA Parametric Simplex Algorithm for Linear Vector Optimization Problems
A Parametric Simplex Algorithm for Linear Vector Optimization Problems Birgit Rudloff Firdevs Ulus Robert Vanderbei July 9, 2015 Abstract In this paper, a parametric simplex algorithm for solving linear
More informationTotal Least Squares Approach in Regression Methods
WDS'08 Proceedings of Contributed Papers, Part I, 88 93, 2008. ISBN 978-80-7378-065-4 MATFYZPRESS Total Least Squares Approach in Regression Methods M. Pešta Charles University, Faculty of Mathematics
More informationCONVENIENT PRETOPOLOGIES ON Z 2
TWMS J. Pure Appl. Math., V.9, N.1, 2018, pp.40-51 CONVENIENT PRETOPOLOGIES ON Z 2 J. ŠLAPAL1 Abstract. We deal with pretopologies on the digital plane Z 2 convenient for studying and processing digital
More informationDESCRIPTIVE STATISTICS FOR NONPARAMETRIC MODELS I. INTRODUCTION
The Annals of Statistics 1975, Vol. 3, No.5, 1038-1044 DESCRIPTIVE STATISTICS FOR NONPARAMETRIC MODELS I. INTRODUCTION BY P. J. BICKEL 1 AND E. L. LEHMANN 2 University of California, Berkeley An overview
More informationS-estimators in mapping applications
S-estimators in mapping applications João Sequeira Instituto Superior Técnico, Technical University of Lisbon Portugal Email: joaosilvasequeira@istutlpt Antonios Tsourdos Autonomous Systems Group, Department
More informationRESTRICTED UNIFORM BOUNDEDNESS IN BANACH SPACES
RESTRICTED UNIFORM BOUNDEDNESS IN BANACH SPACES OLAV NYGAARD AND MÄRT PÕLDVERE Abstract. Precise conditions for a subset A of a Banach space X are known in order that pointwise bounded on A sequences of
More informationAllocating Public Goods via the Midpoint Rule
Allocating Public Goods via the Midpoint Rule Tobias Lindner Klaus Nehring Clemens Puppe Preliminary draft, February 2008 Abstract We study the properties of the following midpoint rule for determining
More informationGeneralized Neyman Pearson optimality of empirical likelihood for testing parameter hypotheses
Ann Inst Stat Math (2009) 61:773 787 DOI 10.1007/s10463-008-0172-6 Generalized Neyman Pearson optimality of empirical likelihood for testing parameter hypotheses Taisuke Otsu Received: 1 June 2007 / Revised:
More informationON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS
MATHEMATICS OF OPERATIONS RESEARCH Vol. 28, No. 4, November 2003, pp. 677 692 Printed in U.S.A. ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS ALEXANDER SHAPIRO We discuss in this paper a class of nonsmooth
More informationRobust Statistics, Revisited
Robust Statistics, Revisited Ankur Moitra (MIT) joint work with Ilias Diakonikolas, Jerry Li, Gautam Kamath, Daniel Kane and Alistair Stewart CLASSIC PARAMETER ESTIMATION Given samples from an unknown
More informationSTATISTICS SYLLABUS UNIT I
STATISTICS SYLLABUS UNIT I (Probability Theory) Definition Classical and axiomatic approaches.laws of total and compound probability, conditional probability, Bayes Theorem. Random variable and its distribution
More informationRobustness of location estimators under t- distributions: a literature review
IOP Conference Series: Earth and Environmental Science PAPER OPEN ACCESS Robustness of location estimators under t- distributions: a literature review o cite this article: C Sumarni et al 07 IOP Conf.
More informationQUADRATIC MAJORIZATION 1. INTRODUCTION
QUADRATIC MAJORIZATION JAN DE LEEUW 1. INTRODUCTION Majorization methods are used extensively to solve complicated multivariate optimizaton problems. We refer to de Leeuw [1994]; Heiser [1995]; Lange et
More informationA Mathematical Programming Approach for Improving the Robustness of LAD Regression
A Mathematical Programming Approach for Improving the Robustness of LAD Regression Avi Giloni Sy Syms School of Business Room 428 BH Yeshiva University 500 W 185 St New York, NY 10033 E-mail: agiloni@ymail.yu.edu
More informationDepth Measures for Multivariate Functional Data
This article was downloaded by: [University of Cambridge], [Francesca Ieva] On: 21 February 2013, At: 07:42 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954
More informationTesting Some Covariance Structures under a Growth Curve Model in High Dimension
Department of Mathematics Testing Some Covariance Structures under a Growth Curve Model in High Dimension Muni S. Srivastava and Martin Singull LiTH-MAT-R--2015/03--SE Department of Mathematics Linköping
More informationTopological properties
CHAPTER 4 Topological properties 1. Connectedness Definitions and examples Basic properties Connected components Connected versus path connected, again 2. Compactness Definition and first examples Topological
More informationarxiv: v1 [math.oc] 22 Sep 2016
EUIVALENCE BETWEEN MINIMAL TIME AND MINIMAL NORM CONTROL PROBLEMS FOR THE HEAT EUATION SHULIN IN AND GENGSHENG WANG arxiv:1609.06860v1 [math.oc] 22 Sep 2016 Abstract. This paper presents the equivalence
More informationAP Statistics Cumulative AP Exam Study Guide
AP Statistics Cumulative AP Eam Study Guide Chapters & 3 - Graphs Statistics the science of collecting, analyzing, and drawing conclusions from data. Descriptive methods of organizing and summarizing statistics
More informationDoes Compressed Sensing have applications in Robust Statistics?
Does Compressed Sensing have applications in Robust Statistics? Salvador Flores December 1, 2014 Abstract The connections between robust linear regression and sparse reconstruction are brought to light.
More informationarxiv: v1 [math.st] 2 Aug 2007
The Annals of Statistics 2007, Vol. 35, No. 1, 13 40 DOI: 10.1214/009053606000000975 c Institute of Mathematical Statistics, 2007 arxiv:0708.0390v1 [math.st] 2 Aug 2007 ON THE MAXIMUM BIAS FUNCTIONS OF
More informationAsymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data
Sri Lankan Journal of Applied Statistics (Special Issue) Modern Statistical Methodologies in the Cutting Edge of Science Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations
More informationOutlier detection for high-dimensional data
Biometrika (2015), 102,3,pp. 589 599 doi: 10.1093/biomet/asv021 Printed in Great Britain Advance Access publication 7 June 2015 Outlier detection for high-dimensional data BY KWANGIL RO, CHANGLIANG ZOU,
More informationThe properties of L p -GMM estimators
The properties of L p -GMM estimators Robert de Jong and Chirok Han Michigan State University February 2000 Abstract This paper considers Generalized Method of Moment-type estimators for which a criterion
More informationFilters in Analysis and Topology
Filters in Analysis and Topology David MacIver July 1, 2004 Abstract The study of filters is a very natural way to talk about convergence in an arbitrary topological space, and carries over nicely into
More informationTheorems. Theorem 1.11: Greatest-Lower-Bound Property. Theorem 1.20: The Archimedean property of. Theorem 1.21: -th Root of Real Numbers
Page 1 Theorems Wednesday, May 9, 2018 12:53 AM Theorem 1.11: Greatest-Lower-Bound Property Suppose is an ordered set with the least-upper-bound property Suppose, and is bounded below be the set of lower
More information0.1 Pointwise Convergence
2 General Notation 0.1 Pointwise Convergence Let {f k } k N be a sequence of functions on a set X, either complex-valued or extended real-valued. We say that f k converges pointwise to a function f if
More informationASYMPTOTICS OF REWEIGHTED ESTIMATORS OF MULTIVARIATE LOCATION AND SCATTER. By Hendrik P. Lopuhaä Delft University of Technology
The Annals of Statistics 1999, Vol. 27, No. 5, 1638 1665 ASYMPTOTICS OF REWEIGHTED ESTIMATORS OF MULTIVARIATE LOCATION AND SCATTER By Hendrik P. Lopuhaä Delft University of Technology We investigate the
More informationLetter-value plots: Boxplots for large data
Letter-value plots: Boxplots for large data Heike Hofmann Karen Kafadar Hadley Wickham Dept of Statistics Dept of Statistics Dept of Statistics Iowa State University Indiana University Rice University
More informationInference For High Dimensional M-estimates: Fixed Design Results
Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49
More informationREGRESSION ANALYSIS AND ANALYSIS OF VARIANCE
REGRESSION ANALYSIS AND ANALYSIS OF VARIANCE P. L. Davies Eindhoven, February 2007 Reading List Daniel, C. (1976) Applications of Statistics to Industrial Experimentation, Wiley. Tukey, J. W. (1977) Exploratory
More informationSome Background Material
Chapter 1 Some Background Material In the first chapter, we present a quick review of elementary - but important - material as a way of dipping our toes in the water. This chapter also introduces important
More informationSolving Corrupted Quadratic Equations, Provably
Solving Corrupted Quadratic Equations, Provably Yuejie Chi London Workshop on Sparse Signal Processing September 206 Acknowledgement Joint work with Yuanxin Li (OSU), Huishuai Zhuang (Syracuse) and Yingbin
More informationrobustness, efficiency, breakdown point, outliers, rank-based procedures, least absolute regression
Robust Statistics robustness, efficiency, breakdown point, outliers, rank-based procedures, least absolute regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html
More informationLecture 12 Robust Estimation
Lecture 12 Robust Estimation Prof. Dr. Svetlozar Rachev Institute for Statistics and Mathematical Economics University of Karlsruhe Financial Econometrics, Summer Semester 2007 Copyright These lecture-notes
More informationOptimization and Optimal Control in Banach Spaces
Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,
More informationREAL RENORMINGS ON COMPLEX BANACH SPACES
REAL RENORMINGS ON COMPLEX BANACH SPACES F. J. GARCÍA PACHECO AND A. MIRALLES Abstract. In this paper we provide two ways of obtaining real Banach spaces that cannot come from complex spaces. In concrete
More informationSupplementary Material for Wang and Serfling paper
Supplementary Material for Wang and Serfling paper March 6, 2017 1 Simulation study Here we provide a simulation study to compare empirically the masking and swamping robustness of our selected outlyingness
More informationTopology Proceedings. COPYRIGHT c by Topology Proceedings. All rights reserved.
Topology Proceedings Web: http://topology.auburn.edu/tp/ Mail: Topology Proceedings Department of Mathematics & Statistics Auburn University, Alabama 36849, USA E-mail: topolog@auburn.edu ISSN: 0146-4124
More informationDD-Classifier: Nonparametric Classification Procedure Based on DD-plot 1. Abstract
DD-Classifier: Nonparametric Classification Procedure Based on DD-plot 1 Jun Li 2, Juan A. Cuesta-Albertos 3, Regina Y. Liu 4 Abstract Using the DD-plot (depth-versus-depth plot), we introduce a new nonparametric
More informationRobust high-dimensional linear regression: A statistical perspective
Robust high-dimensional linear regression: A statistical perspective Po-Ling Loh University of Wisconsin - Madison Departments of ECE & Statistics STOC workshop on robustness and nonconvexity Montreal,
More informationAn introduction to Mathematical Theory of Control
An introduction to Mathematical Theory of Control Vasile Staicu University of Aveiro UNICA, May 2018 Vasile Staicu (University of Aveiro) An introduction to Mathematical Theory of Control UNICA, May 2018
More informationRobustness of Principal Components
PCA for Clustering An objective of principal components analysis is to identify linear combinations of the original variables that are useful in accounting for the variation in those original variables.
More information