robustness, efficiency, breakdown point, outliers, rank-based procedures, least absolute regression


Robust Statistics
robustness, efficiency, breakdown point, outliers, rank-based procedures, least absolute regression
University of California, San Diego
Instructor: Ery Arias-Castro

Classical methods are often unstable

Methods developed for a normal model (sometimes referred to as "classical") are often unstable in the following two (overlapping) ways:

1. They are sensitive to small departures from the assumed model, so much so that it can render these methods inferior compared to other methods.
2. They are sensitive to a small number of gross errors (often called outliers).

(Regarding Point 1, although the Central Limit Theorem often comes to the rescue, we are talking about comparing methods at a finer level.)

Since the normality assumption is rarely (if ever) correct in practice, this calls into question the popularity of these methods and justifies the search for other methods.

Robust methods

A (good) robust method is stable and efficient.

Such a method is stable to small deviations from the underlying model:

1. A base model is assumed (say a normal model).
2. A class of deviations is considered.

Example. Assume that $X_1, \dots, X_n$ are iid from $F$. Then we compare how a certain method behaves under $F$ with how it behaves under a nearby distribution $G$, for example one satisfying

$$\sup_{x \in \mathbb{R}} |F(x) - G(x)| \le \varepsilon,$$

where $\varepsilon > 0$ quantifies proximity. (This model includes cases where the data is contaminated by outliers.)

Such a method is efficient, meaning it is comparable in performance to the best method for the base model.
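To make the notion of a nearby distribution concrete, here is a minimal sketch of one such deviation: a contaminated distribution G = (1 - ε)F + εH, for which |F(x) - G(x)| = ε|F(x) - H(x)| ≤ ε at every x. The choices F = N(0, 1), H = N(0, 10²), ε = 0.05, the grid, and the helper name `kolmogorov_distance` are illustrative assumptions, not part of the slides.

```python
from statistics import NormalDist

def kolmogorov_distance(cdf_f, cdf_g, lo=-50.0, hi=50.0, steps=20001):
    """Approximate sup_x |F(x) - G(x)| by a maximum over an evenly spaced grid."""
    width = (hi - lo) / (steps - 1)
    return max(abs(cdf_f(lo + k * width) - cdf_g(lo + k * width))
               for k in range(steps))

eps = 0.05                         # assumed contamination level
base = NormalDist(0.0, 1.0)        # base model F
outliers = NormalDist(0.0, 10.0)   # hypothetical outlier distribution H

def contaminated_cdf(x):
    """CDF of G = (1 - eps) * F + eps * H."""
    return (1 - eps) * base.cdf(x) + eps * outliers.cdf(x)

distance = kolmogorov_distance(base.cdf, contaminated_cdf)
print(f"sup |F - G| is about {distance:.4f}, which is at most eps = {eps}")
```

The grid maximum only approximates the supremum from below, but the bound by ε holds exactly for this kind of mixture.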

Example: estimating a mean

Suppose we observe $X_1, \dots, X_n$ iid from $N(\mu, \sigma^2)$, with both parameters unknown. We want to estimate $\mu$. Let's compare the sample mean and the sample median.

The sample mean is normal with mean $\mu$ and variance $\sigma^2/n$.
The sample median is asymptotically normal with mean $\mu$ and variance $(\pi/2)\sigma^2/n$.

In such a situation (where two estimators are asymptotically normal), a popular notion of asymptotic relative efficiency (ARE) is the ratio of the variances. Here the (asymptotic) efficiency of the sample median relative to the sample mean is

$$\frac{\sigma^2/n}{(\pi/2)\sigma^2/n} = 2/\pi \approx 2/3.$$

This notion of efficiency represents how much smaller the sample size needs to be to achieve the same precision in terms of the length of the confidence interval. In this example, the sample median requires about one half more data (a factor of $\pi/2 \approx 1.57$) to achieve the same precision as the sample mean.

The sample mean is thus more efficient than the sample median. However, the sample median is more robust than the sample mean. This is in particular true in terms of stability in the presence of outliers.

Informally, the breakdown point of a method is the smallest fraction of contamination that renders the method useless; for estimators, this typically means being arbitrarily far from their target.

The sample mean has breakdown point $1/n$ (where $n$ is the sample size), since it is enough to change one observation to change the sample mean by an arbitrary amount.
The sample median has breakdown point $1/2$.

Note also that the sample median is more efficient than the sample mean when $X_1, \dots, X_n$ are (instead) iid from a double-exponential distribution. In that case, the efficiency of the sample median relative to the sample mean is 2, meaning that the sample mean requires twice the amount of data to achieve the same precision as the sample median.

The situation becomes more and more extreme in favor of the sample median as the tails of the underlying distribution become heavier and heavier.
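The efficiency and breakdown claims above can be checked with a small simulation. The following is a sketch only: the sample size, number of replications, seed, and the function name `simulate_relative_efficiency` are arbitrary illustrative choices, and the Monte Carlo estimate will only be close to 2/π, not equal to it.

```python
import random
import statistics

def simulate_relative_efficiency(n=101, reps=2000, seed=0):
    """Estimate Var(sample mean) / Var(sample median) for N(0, 1) samples."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means) / statistics.variance(medians)

ratio = simulate_relative_efficiency()
print(f"estimated efficiency of the median relative to the mean: {ratio:.2f}")
print("asymptotic value 2/pi is about 0.64")

# Breakdown: corrupting a single observation moves the mean arbitrarily far,
# while the median stays within the range of the uncorrupted observations.
clean = [1.0, 2.0, 3.0, 4.0, 5.0]
corrupted = clean[:-1] + [1e9]
print("mean:", statistics.fmean(clean), "->", statistics.fmean(corrupted))
print("median:", statistics.median(clean), "->", statistics.median(corrupted))
```

Replacing `rng.gauss` with draws from a double-exponential distribution would be one way to see the ratio move above 1, in line with the last remark above.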

Other robust estimators for the mean under symmetry

Assume the underlying distribution is symmetric (and therefore symmetric about its median).

The trimmed mean takes the average of the center $1 - \gamma$ proportion of the sample, where $\gamma \in (0,1)$ is a parameter of the method.

The Hodges-Lehmann estimator is

$$\operatorname{median}\big[ (X_i + X_j)/2 : i \le j \big].$$

It has a breakdown point of about 30%, but for symmetric distributions, it has greater efficiency than does the sample median.

Example: estimating a standard deviation

Suppose we observe $X_1, \dots, X_n$ iid from $N(\mu, \sigma^2)$, with both parameters unknown. We want to estimate $\sigma$. Let's compare the sample standard deviation and the median absolute deviation (MAD):

$$S_n = \Big[ \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2 \Big]^{1/2}, \qquad D_n = \operatorname{median}\big( |X_1 - M|, \dots, |X_n - M| \big),$$

where $\bar{X}$ denotes the sample mean and $M$ the sample median.

Note that as $n \to \infty$:

$$S_n \to \sigma, \qquad D_n \to \Phi^{-1}(3/4)\,\sigma,$$

where $\Phi$ is the normal distribution function. Thus we change $D_n$ to $c D_n$, where $c = 1/\Phi^{-1}(3/4) \approx 1.4826$.

The efficiency of the (scaled) MAD relative to the sample standard deviation is about 0.37, meaning that the MAD requires about 2.7 times as much data to achieve the same precision.

In terms of robustness, the MAD has a much larger breakdown point than the sample standard deviation (which is very sensitive to gross errors).

Other robust estimators for the standard deviation have been proposed, for example

$$Q_n \propto \text{1st quartile of } \{ |X_i - X_j| : i < j \}.$$

The implicit constant of proportionality is chosen so that $Q_n \to \sigma$ as $n \to \infty$.
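The estimators defined above are short to write down in code. This is a minimal sketch using only the Python standard library; the function names are made up for illustration, and the trimmed mean assumes that a proportion γ/2 is removed from each end with the count rounded down, a convention the slides do not specify.

```python
from itertools import combinations_with_replacement
from statistics import NormalDist, fmean, median

def trimmed_mean(xs, gamma):
    """Average the central 1 - gamma proportion (assumed: gamma/2 cut per end)."""
    xs = sorted(xs)
    k = int(len(xs) * gamma / 2)  # observations dropped from each end
    return fmean(xs[k:len(xs) - k])

def hodges_lehmann(xs):
    """Median of the pairwise averages (X_i + X_j) / 2 over i <= j."""
    return median((a + b) / 2 for a, b in combinations_with_replacement(xs, 2))

def scaled_mad(xs):
    """Median absolute deviation, rescaled to estimate sigma under normality."""
    c = 1 / NormalDist().inv_cdf(0.75)  # about 1.4826
    m = median(xs)
    return c * median(abs(x - m) for x in xs)

data = [1.0, 2.0, 3.0, 4.0, 100.0]  # toy sample with one gross error
print(trimmed_mean(data, 0.4), hodges_lehmann(data), scaled_mad(data))
```

On this toy sample the single gross error (100.0) has no effect on any of the three estimates, whereas it would dominate the sample mean and the sample standard deviation. The Hodges-Lehmann sketch enumerates all pairs, so it costs on the order of n² operations.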

Rank-based procedures

Rank-based tests are naturally robust because the ranks themselves are.

The Wilcoxon signed-rank test can be seen as an alternative to the one-sample t-test. (The Hodges-Lehmann estimator is derived from this test.)
The Wilcoxon rank-sum test can be seen as an alternative to the two-sample t-test.
The Kruskal-Wallis test can be seen as an alternative to the one-way ANOVA F-test.
Etc.

These rank-based tests are designed for different null hypotheses. Under their null hypotheses, the rank-based tests have the additional advantage of being distribution-free.

Robust methods in regression

Consider bivariate data $(X_1, Y_1), \dots, (X_n, Y_n)$. We saw how to fit a model by least squares. For example, fitting a line by least squares requires computing

$$(\hat\beta_0, \hat\beta_1) = \operatorname*{argmin}_{(b_0, b_1)} \sum_{i=1}^n \big[ Y_i - b_0 - b_1 X_i \big]^2.$$

The fitted model is $\hat f(x) = \hat\beta_0 + \hat\beta_1 x$.

Least absolute regression

Fitting a line by least absolute regression requires computing

$$(\hat\beta_0, \hat\beta_1) = \operatorname*{argmin}_{(b_0, b_1)} \sum_{i=1}^n \big| Y_i - b_0 - b_1 X_i \big|.$$

The fitted model is $\hat f(x) = \hat\beta_0 + \hat\beta_1 x$.

If the underlying model is (standard assumptions)

$$Y = \beta_0 + \beta_1 X + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2),$$

then least squares is more efficient than least absolute regression.

If instead $\varepsilon$ is double-exponential (still with zero mean and variance $\sigma^2$), then least absolute regression is more efficient than least squares.

In any case, least absolute regression is more robust than least squares with respect to outliers in the response.

Both methods are very sensitive to outliers in the predictor (a.k.a. high-leverage points).
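The contrast between the two fits under an outlier in the response can be sketched as follows. Least squares uses the usual closed form. For least absolute regression, this sketch relies on the fact that some minimizing line passes through at least two data points, and simply tries every pair; that brute-force search is an illustrative choice (cubic cost in n), not how the fit is computed in practice, and the function names and toy data are made up.

```python
from itertools import combinations
from statistics import fmean

def least_squares_line(xs, ys):
    """Closed-form least squares fit; returns (intercept, slope)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def least_absolute_line(xs, ys):
    """Brute-force least absolute fit over lines through pairs of data points."""
    points = list(zip(xs, ys))
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # vertical line, not of the form b0 + b1 * x
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        cost = sum(abs(y - intercept - slope * x) for x, y in points)
        if best is None or cost < best[0]:
            best = (cost, intercept, slope)
    return best[1], best[2]

# Toy data on the line y = 1 + 2x, with one gross error in the response.
xs = [float(x) for x in range(10)]
ys = [1.0 + 2.0 * x for x in xs]
ys[4] = 100.0

print("least squares:  ", least_squares_line(xs, ys))
print("least absolute: ", least_absolute_line(xs, ys))
```

On this toy data the least absolute fit recovers the line through the nine uncorrupted points, while the least squares fit is pulled toward the outlier.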

Methods for regression with high breakdown points

Define the $i$th residual for the line parameterized by $(b_0, b_1)$ as

$$e_i = Y_i - b_0 - b_1 X_i.$$

Least median of squares is based on solving

$$\min_{(b_0, b_1)} \operatorname{median}\big[ e_1^2, \dots, e_n^2 \big].$$

Least trimmed sum of squares is based on solving

$$\min_{(b_0, b_1)} \sum_{i=1}^h e_{(i)}^2,$$

where $e_{(1)}^2 \le \cdots \le e_{(n)}^2$ are the ordered squared residuals. ($h$ is a parameter of the method.)
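Both criteria are easy to evaluate for a given line, but hard to minimize exactly. The sketch below evaluates each criterion only on candidate lines through pairs of data points and keeps the best one; this candidate search is a common heuristic and an assumption of this example, not an exact solver and not something specified in the slides. The function name, the toy data, and the value h = 6 are likewise illustrative.

```python
from itertools import combinations
from statistics import median

def best_line_through_pairs(xs, ys, criterion):
    """Return (b0, b1) minimizing criterion(squared residuals) over pair lines."""
    points = list(zip(xs, ys))
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # skip vertical candidates
        b1 = (y2 - y1) / (x2 - x1)
        b0 = y1 - b1 * x1
        squared = [(y - b0 - b1 * x) ** 2 for x, y in points]
        value = criterion(squared)
        if best is None or value < best[0]:
            best = (value, b0, b1)
    return best[1], best[2]

def lts_criterion(h):
    """Sum of the h smallest squared residuals."""
    return lambda squared: sum(sorted(squared)[:h])

# Seven points on y = 1 + 2x plus three high-leverage outliers.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 20.0, 21.0, 22.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 0.0, 3.0, -2.0]

lms_fit = best_line_through_pairs(xs, ys, median)            # least median of squares
lts_fit = best_line_through_pairs(xs, ys, lts_criterion(6))  # least trimmed squares
print("LMS:", lms_fit)
print("LTS:", lts_fit)
```

With 30% of the toy data corrupted, including in the predictor, both searches still return the line through the seven uncorrupted points, because that line makes the median and the six smallest squared residuals all zero.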
