MULTIVARIATE TECHNIQUES, ROBUSTNESS


Mia Hubert
Associate Professor, Department of Mathematics and L-STAT, Katholieke Universiteit Leuven, Belgium

Peter J. Rousseeuw¹
Senior Researcher, Renaissance Technologies, USA
peter@rousseeuw.net

¹ Dr Peter Rousseeuw was Professor and Head (since 1992) of the Division of Applied Mathematics, Universiteit Antwerpen, Belgium. Currently he is a Senior Researcher at Renaissance Technologies in New York. He has (co-)authored over 160 papers, two edited volumes and three books, including Robust Regression and Outlier Detection (with A.M. Leroy, Wiley-Interscience, 1987). In 2003 ISI-Thomson included him in their list of Highly Cited Mathematicians. His paper Least Median of Squares Regression (1984, Journal of the American Statistical Association, 79), which proposed new robust methods for regression and covariance, has been reprinted in Breakthroughs in Statistics III (the 3-volume collection consists of the 60 most influential publications in statistics from 1850 to 1990), Kotz and Johnson 1997, Springer-Verlag, New York. He is an Elected Member of the International Statistical Institute (1991) and an Elected Fellow of the Institute of Mathematical Statistics (elected 1993) and of the American Statistical Association (elected 1994). He was Associate Editor, Journal of the American Statistical Association ( ), and Computational Statistics and Data Analysis ( ). He has supervised 20 PhD students.

The usual multivariate analysis techniques include location and scatter estimation, principal component analysis, factor analysis, discriminant analysis, canonical correlation analysis, multiple regression and cluster analysis. These methods all try to describe and discover structure in the data, and thus rely on the correlation structure between the variables. Classical procedures typically assume normality (i.e. gaussianity) and consequently use the sample mean and sample covariance matrix to estimate the true underlying model parameters.

Below are three examples of multivariate settings used to analyze a data set with $n$ objects and $p$ variables, forming an $n \times p$ data matrix $X = (x_1, \ldots, x_n)'$ with $x_i = (x_{i1}, \ldots, x_{ip})'$ the $i$th observation.

1. Hotelling's $T^2$-statistic for inference about the center of the (normal) underlying distribution is based on the sample mean $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ and the sample covariance matrix $S_x = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})'$.

2. Classical principal component analysis (PCA) uses the eigenvectors and eigenvalues of $S_x$ to construct a smaller set of uncorrelated variables.

3. In the multiple regression setting, a response variable $y = (y_1, \ldots, y_n)'$ is also measured. The goal of linear regression is to estimate the parameter $\theta = (\beta_0, \beta')' = (\beta_0, \beta_1, \ldots, \beta_p)'$ relating the response variable and the predictor variables in the model $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i$.

The least squares slope estimator can be written as $\hat{\beta}_{LS} = S_x^{-1} s_{xy}$ with $s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})$ the cross-covariance vector. The intercept is given by $\hat{\beta}_0 = \bar{y} - \hat{\beta}_{LS}' \bar{x}$.

These classical estimators often possess optimal properties under the gaussian model assumptions, but they can be strongly affected by even a few outliers. Outliers are data points that deviate from the pattern suggested by the majority of the data. Outliers are more likely to occur in datasets with many observations and/or variables, and often they do not show up by simple visual inspection. When the data contain nasty outliers, typically two things happen:

- the multivariate estimates differ substantially from the "right" answer, defined here as the estimates we would have obtained without the outliers;
- the resulting fitted model does not allow the outliers to be detected by means of their residuals, Mahalanobis distances, or the widely used "leave-one-out" diagnostics.

The first consequence is fairly well-known (although the size of the effect is often underestimated). Unfortunately the second consequence is less well-known, and when stated many people find it hard to believe or paradoxical. Common intuition says that outliers must "stick out" from the classical fitted model, and indeed some of them do so. But the most harmful types of outliers, especially if there are several of them, may affect the estimated model so much "in their direction" that they are now well-fitted by it.

Once this effect is understood, one sees that the following two problems are essentially equivalent:

- Robust estimation: find a "robust" fit, which is similar to the fit we would have found without the outliers;
- Outlier detection: find all the outliers that matter.

Indeed, a solution to the first problem allows us, as a byproduct, to identify the outliers by their deviation from the robust fit.
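The masking effect described above can be made concrete with a small simulation. The following Python sketch is an illustration only and is not taken from the cited literature: the sample sizes, the position of the outlier cluster, and the names `clean`, `outliers` and `mahalanobis_sq` are assumptions chosen for the example. It computes the squared Mahalanobis distances of the outliers twice, once relative to the classical mean and covariance of all the data and once relative to the fit on the regular observations only, and compares them with the 0.975 quantile of the chi-square distribution with 2 degrees of freedom (a commonly used cutoff, also an assumption here).

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.normal(size=(80, 2))                          # regular observations
outliers = rng.normal(loc=10.0, scale=0.1, size=(20, 2))  # tight cluster of outliers
X = np.vstack([clean, outliers])

def mahalanobis_sq(points, data):
    """Squared Mahalanobis distances of `points` w.r.t. mean and covariance of `data`."""
    center = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)  # divides by n - 1, like S_x
    diff = points - center
    return np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)

# For 2 degrees of freedom the chi-square 0.975 quantile has a closed form.
cutoff = -2.0 * np.log(0.025)

d2_contaminated = mahalanobis_sq(outliers, X)  # classical fit on all data
d2_clean = mahalanobis_sq(outliers, clean)     # fit we would have had without outliers

print("cutoff:", round(cutoff, 2))
print("largest outlier distance, classical fit on all data:", d2_contaminated.max())
print("smallest outlier distance, fit on clean data only:", d2_clean.min())
```

In this configuration the outliers pull the classical mean and covariance so far toward themselves that their distances stay below the cutoff, whereas relative to the fit without outliers they lie far above it.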
Conversely, a solution to the second problem would allow us to remove or downweight the outliers and then carry out a classical fit, which yields a robust estimate. It turns out that the more fruitful approach is to solve the first problem and to use its result to answer the second. This is because from a combinatorial viewpoint it is more feasible to search for sufficiently many "good" data points than to find all the "bad" data points.

Many robust multivariate estimators have been constructed by replacing the empirical mean and covariance matrix with a robust alternative. Currently the most popular estimator for this purpose is the Minimum Covariance Determinant (MCD) estimator [27]. The MCD method looks for the $h$ observations (out of $n$) whose classical covariance matrix has the lowest possible determinant. The raw MCD estimate of location is then the average of these $h$ points, whereas the raw MCD estimate of scatter is a multiple of their covariance matrix. Based on these raw estimates one typically carries out a reweighting step, yielding the reweighted MCD estimates [30]. The MCD location and scatter estimates are affine equivariant, which means that they behave properly under affine transformations of the data. Computation of the MCD is non-trivial, but can be performed efficiently by means of the FAST-MCD algorithm [30] which is available in standard SAS, S-Plus, and R.

A useful measure of robustness is the finite-sample breakdown value [11, 12]. The breakdown value is the smallest amount of contamination that can have an arbitrarily large effect on the estimator. The MCD estimates of multivariate location and scatter have breakdown value $(n - h)/n$. The MCD has its highest possible breakdown value of 50% when $h = [(n + p + 1)/2]$. Note that no affine equivariant estimator can have a breakdown value above 50%.

Another measure of robustness is the influence function [12], which measures the effect on an estimator of adding a small mass of data in a specific place. The MCD has a bounded influence function, which means that a small contamination at any position can only have a small effect on the estimator [7].

In regression, a popular estimator with high breakdown value is the Least Trimmed Squares (LTS) estimator [27, 31]. The LTS is the fit that minimizes the sum of the $h$ smallest squared residuals (out of $n$). Other frequently used robust estimators include S-estimators [28] and MM-estimators [35], which can achieve a higher finite-sample efficiency than the LTS.

Robust multivariate estimators have been used to robustify the Hotelling $T^2$-statistic [34], PCA [8, 32], multiple regression with one or several response variables [29, 1], discriminant analysis [14, 20, 5], factor analysis [26], canonical correlation [6], and cluster analysis [13].
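The MCD definition above can be sketched in a few lines of Python. The code below is a simplified illustration under stated assumptions, not the FAST-MCD algorithm of [30]: it runs plain concentration steps (refit on the $h$ points closest to the current fit) from random starting subsets, and it omits both the consistency factor of the raw scatter estimate and the reweighting step. The function names `raw_mcd` and `mahalanobis_sq`, the number of random starts, the simulated data and the chi-square cutoff are all hypothetical choices made for this example.

```python
import numpy as np

def mahalanobis_sq(X, center, cov):
    """Squared Mahalanobis distances of the rows of X from `center`."""
    diff = X - center
    return np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)

def raw_mcd(X, h, n_starts=50, max_steps=100, seed=0):
    """Search for an h-subset with small covariance determinant (sketch only)."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    best_det, best_idx = np.inf, None
    for _ in range(n_starts):
        idx = rng.choice(n, size=p + 1, replace=False)  # random initial subset
        det = np.inf
        for _ in range(max_steps):
            center = X[idx].mean(axis=0)
            cov = np.cov(X[idx], rowvar=False)
            if np.linalg.det(cov) <= 1e-12:             # degenerate subset, give up
                break
            d2 = mahalanobis_sq(X, center, cov)
            new_idx = np.sort(np.argsort(d2)[:h])       # h points closest to current fit
            new_det = np.linalg.det(np.cov(X[new_idx], rowvar=False))
            if new_det >= det:                          # determinant stopped decreasing
                break
            idx, det = new_idx, new_det
        if det < best_det:                              # keep the best start so far
            best_det, best_idx = det, idx
    center = X[best_idx].mean(axis=0)
    cov = np.cov(X[best_idx], rowvar=False)             # no consistency factor applied
    return center, cov, best_idx

# Simulated data (an assumption): 80 regular points and a cluster of 20 outliers.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(80, 2)),
               rng.normal(loc=10.0, scale=0.1, size=(20, 2))])
n, p = X.shape
h = (n + p + 1) // 2                                    # choice with highest breakdown value
center, cov, subset = raw_mcd(X, h)
d2_robust = mahalanobis_sq(X, center, cov)
cutoff = -2.0 * np.log(0.025)                           # chi-square 0.975 quantile, 2 d.f.

print("h =", h, "and (n - h)/n =", (n - h) / n)
print("outliers inside the selected subset:", int(np.sum(subset >= 80)))
print("outliers flagged by robust distances:", int(np.sum(d2_robust[80:] > cutoff)), "of 20")
```

Because the scatter estimate here is not rescaled, the distances of the regular observations are somewhat inflated; the sketch is meant to show how the subset search works, not to replace a tested implementation.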
Another important group of robust multivariate methods is based on projection pursuit (PP) techniques. They are especially useful when the dimension $p$ of the data is larger than the sample size $n$, in which case the MCD is no longer well-defined. Robust PP methods project the data on many univariate directions and apply robust estimators of location and scale (such as the median and the median absolute deviation) to each projection. Examples include the Stahel-Donoho estimator of location and scatter [24] and generalizations [36], robust PCA [23, 9, 18, 2], discriminant analysis [25], canonical correlation [3], and outlier detection in skewed data [4, 19]. The hybrid ROBPCA method [17, 10] combines PP techniques with the MCD and has led to the construction of robust principal component regression [22], partial least squares [21], and classification for high-dimensional data [33].

A more extensive description of robust multivariate methods and their applications can be found in [16, 15].
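The projection pursuit idea can be illustrated with the outlyingness measure that underlies the Stahel-Donoho estimator: for each observation, the largest robustly standardized distance to the median over all univariate projections. The Python sketch below is a minimal illustration under assumptions, not the procedure of any cited paper: it approximates the maximum over all directions by a finite set of random unit directions, scales the median absolute deviation by 1.4826 (its usual consistency factor at the normal model), and uses hypothetical names (`sd_outlyingness`) and simulated data.

```python
import numpy as np

def sd_outlyingness(X, n_directions=500, seed=0):
    """Approximate projection-based outlyingness using random unit directions."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    A = rng.normal(size=(n_directions, p))
    A /= np.linalg.norm(A, axis=1, keepdims=True)          # unit-length directions
    proj = X @ A.T                                         # shape (n, n_directions)
    med = np.median(proj, axis=0)                          # robust location per direction
    mad = 1.4826 * np.median(np.abs(proj - med), axis=0)   # robust scale per direction
    return np.max(np.abs(proj - med) / mad, axis=1)        # worst direction per point

# Simulated data (an assumption): 80 regular points and a cluster of 20 outliers.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(80, 2)),
               rng.normal(loc=10.0, scale=0.1, size=(20, 2))])
out = sd_outlyingness(X)

print("largest outlyingness among regular points:", out[:80].max())
print("smallest outlyingness among outliers:", out[80:].min())
```

Only univariate medians and MADs are needed, so the same computation remains possible when $p$ exceeds $n$, although the number of random directions required for a good approximation grows with the dimension.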

References

[1] J. Agulló, C. Croux, and S. Van Aelst, The multivariate least trimmed squares estimator, Journal of Multivariate Analysis 99 (2008).
[2] G. Boente, A.M. Pires, and I. Rodrigues, General projection-pursuit estimates for the common principal components model: Influence functions and Monte Carlo study, Journal of Multivariate Analysis 97 (2006).
[3] J.A. Branco, C. Croux, P. Filzmoser, and M.R. Oliviera, Robust canonical correlations: A comparative study, Computational Statistics 20 (2005).
[4] G. Brys, M. Hubert, and P.J. Rousseeuw, A robustification of Independent Component Analysis, Journal of Chemometrics 19 (2005).
[5] C. Croux and C. Dehon, Robust linear discriminant analysis using S-estimators, The Canadian Journal of Statistics 29 (2001).
[6] C. Croux and C. Dehon, Analyse canonique basée sur des estimateurs robustes de la matrice de covariance, La Revue de Statistique Appliquée 2 (2002).
[7] C. Croux and G. Haesbroeck, Influence function and efficiency of the Minimum Covariance Determinant scatter matrix estimator, Journal of Multivariate Analysis 71 (1999).
[8] C. Croux and G. Haesbroeck, Principal components analysis based on robust estimators of the covariance or correlation matrix: influence functions and efficiencies, Biometrika 87 (2000).
[9] C. Croux and A. Ruiz-Gazen, High breakdown estimators for principal components: the projection-pursuit approach revisited, Journal of Multivariate Analysis 95 (2005).
[10] M. Debruyne and M. Hubert, The influence function of the Stahel-Donoho covariance estimator of smallest outlyingness, Statistics and Probability Letters 79 (2009).
[11] D.L. Donoho and P.J. Huber, The notion of breakdown point, A Festschrift for Erich Lehmann (Belmont) (P. Bickel, K. Doksum, and J.L. Hodges, eds.), Wadsworth, 1983.
[12] F.R. Hampel, E.M. Ronchetti, P.J. Rousseeuw, and W.A. Stahel, Robust statistics: The approach based on influence functions, Wiley-Interscience, New York.
[13] J. Hardin and D.M. Rocke, Outlier detection in the multiple cluster setting using the minimum covariance determinant estimator, Computational Statistics and Data Analysis 44 (2004).
[14] D.M. Hawkins and G.J. McLachlan, High-breakdown linear discriminant analysis, Journal of the American Statistical Association 92 (1997).
[15] M. Hubert and M. Debruyne, Minimum Covariance Determinant, Wiley Interdisciplinary Reviews: Computational Statistics 2 (2010).
[16] M. Hubert, P.J. Rousseeuw, and S. Van Aelst, High breakdown robust multivariate methods, Statistical Science 23 (2008).
[17] M. Hubert, P.J. Rousseeuw, and K. Vanden Branden, ROBPCA: a new approach to robust principal components analysis, Technometrics 47 (2005).
[18] M. Hubert, P.J. Rousseeuw, and S. Verboven, A fast robust method for principal components with applications to chemometrics, Chemometrics and Intelligent Laboratory Systems 60 (2002).
[19] M. Hubert and S. Van der Veeken, Outlier detection for skewed data, Journal of Chemometrics 22 (2008).
[20] M. Hubert and K. Van Driessen, Fast and robust discriminant analysis, Computational Statistics and Data Analysis 45 (2004).
[21] M. Hubert and K. Vanden Branden, Robust methods for Partial Least Squares Regression, Journal of Chemometrics 17 (2003).
[22] M. Hubert and S. Verboven, A robust PCR method for high-dimensional regressors, Journal of Chemometrics 17 (2003).
[23] G. Li and Z. Chen, Projection-pursuit approach to robust dispersion matrices and principal components: primary theory and Monte Carlo, Journal of the American Statistical Association 80 (1985).
[24] R.A. Maronna and V.J. Yohai, The behavior of the Stahel-Donoho robust multivariate estimator, Journal of the American Statistical Association 90 (1995).
[25] A.M. Pires, Robust discriminant analysis and the projection pursuit approach: Practical aspects, in: Developments in Robust Statistics (R. Dutter, P. Filzmoser, U. Gather, and P.J. Rousseeuw, eds.), Physika Verlag, Heidelberg, 2003.
[26] G. Pison, P.J. Rousseeuw, P. Filzmoser, and C. Croux, Robust factor analysis, Journal of Multivariate Analysis 84 (2003).
[27] P.J. Rousseeuw, Least median of squares regression, Journal of the American Statistical Association 79 (1984).
[28] P.J. Rousseeuw and A.M. Leroy, Robust regression and outlier detection, Wiley-Interscience, New York.
[29] P.J. Rousseeuw, S. Van Aelst, K. Van Driessen, and J. Agulló, Robust multivariate regression, Technometrics 46 (2004).
[30] P.J. Rousseeuw and K. Van Driessen, A fast algorithm for the Minimum Covariance Determinant estimator, Technometrics 41 (1999).
[31] P.J. Rousseeuw and K. Van Driessen, Computing LTS regression for large data sets, Data Mining and Knowledge Discovery 12 (2006).
[32] M. Salibian-Barrera, S. Van Aelst, and G. Willems, PCA based on multivariate MM-estimators with fast and robust bootstrap, Journal of the American Statistical Association 101 (2006).
[33] K. Vanden Branden and M. Hubert, Robust classification in high dimensions based on the SIMCA method, Chemometrics and Intelligent Laboratory Systems 79 (2005).
[34] G. Willems, G. Pison, P.J. Rousseeuw, and S. Van Aelst, A robust Hotelling test, Metrika 55 (2002).
[35] V.J. Yohai, High breakdown point and high efficiency robust estimates for regression, The Annals of Statistics 15 (1987).
[36] Y. Zuo, H. Cui, and X. He, On the Stahel-Donoho estimator and depth-weighted means of multivariate data, The Annals of Statistics 32 (2004).

FAST CROSS-VALIDATION IN ROBUST PCA

FAST CROSS-VALIDATION IN ROBUST PCA COMPSTAT 2004 Symposium c Physica-Verlag/Springer 2004 FAST CROSS-VALIDATION IN ROBUST PCA Sanne Engelen, Mia Hubert Key words: Cross-Validation, Robustness, fast algorithm COMPSTAT 2004 section: Partial

More information

Re-weighted Robust Control Charts for Individual Observations

Re-weighted Robust Control Charts for Individual Observations Universiti Tunku Abdul Rahman, Kuala Lumpur, Malaysia 426 Re-weighted Robust Control Charts for Individual Observations Mandana Mohammadi 1, Habshah Midi 1,2 and Jayanthi Arasan 1,2 1 Laboratory of Applied

More information

Small Sample Corrections for LTS and MCD

Small Sample Corrections for LTS and MCD myjournal manuscript No. (will be inserted by the editor) Small Sample Corrections for LTS and MCD G. Pison, S. Van Aelst, and G. Willems Department of Mathematics and Computer Science, Universitaire Instelling

More information

IMPROVING THE SMALL-SAMPLE EFFICIENCY OF A ROBUST CORRELATION MATRIX: A NOTE

IMPROVING THE SMALL-SAMPLE EFFICIENCY OF A ROBUST CORRELATION MATRIX: A NOTE IMPROVING THE SMALL-SAMPLE EFFICIENCY OF A ROBUST CORRELATION MATRIX: A NOTE Eric Blankmeyer Department of Finance and Economics McCoy College of Business Administration Texas State University San Marcos

More information

TITLE : Robust Control Charts for Monitoring Process Mean of. Phase-I Multivariate Individual Observations AUTHORS : Asokan Mulayath Variyath.

TITLE : Robust Control Charts for Monitoring Process Mean of. Phase-I Multivariate Individual Observations AUTHORS : Asokan Mulayath Variyath. TITLE : Robust Control Charts for Monitoring Process Mean of Phase-I Multivariate Individual Observations AUTHORS : Asokan Mulayath Variyath Department of Mathematics and Statistics, Memorial University

More information

Small sample corrections for LTS and MCD

Small sample corrections for LTS and MCD Metrika (2002) 55: 111 123 > Springer-Verlag 2002 Small sample corrections for LTS and MCD G. Pison, S. Van Aelst*, and G. Willems Department of Mathematics and Computer Science, Universitaire Instelling

More information

Fast and robust bootstrap for LTS

Fast and robust bootstrap for LTS Fast and robust bootstrap for LTS Gert Willems a,, Stefan Van Aelst b a Department of Mathematics and Computer Science, University of Antwerp, Middelheimlaan 1, B-2020 Antwerp, Belgium b Department of

More information

Robust Estimation of Cronbach s Alpha

Robust Estimation of Cronbach s Alpha Robust Estimation of Cronbach s Alpha A. Christmann University of Dortmund, Fachbereich Statistik, 44421 Dortmund, Germany. S. Van Aelst Ghent University (UGENT), Department of Applied Mathematics and

More information

Computational Statistics and Data Analysis

Computational Statistics and Data Analysis Computational Statistics and Data Analysis 53 (2009) 2264 2274 Contents lists available at ScienceDirect Computational Statistics and Data Analysis journal homepage: www.elsevier.com/locate/csda Robust

More information

Robust Classification for Skewed Data

Robust Classification for Skewed Data Advances in Data Analysis and Classification manuscript No. (will be inserted by the editor) Robust Classification for Skewed Data Mia Hubert Stephan Van der Veeken Received: date / Accepted: date Abstract

More information

DIMENSION REDUCTION OF THE EXPLANATORY VARIABLES IN MULTIPLE LINEAR REGRESSION. P. Filzmoser and C. Croux

DIMENSION REDUCTION OF THE EXPLANATORY VARIABLES IN MULTIPLE LINEAR REGRESSION. P. Filzmoser and C. Croux Pliska Stud. Math. Bulgar. 003), 59 70 STUDIA MATHEMATICA BULGARICA DIMENSION REDUCTION OF THE EXPLANATORY VARIABLES IN MULTIPLE LINEAR REGRESSION P. Filzmoser and C. Croux Abstract. In classical multiple

More information

Fast and Robust Discriminant Analysis

Fast and Robust Discriminant Analysis Fast and Robust Discriminant Analysis Mia Hubert a,1, Katrien Van Driessen b a Department of Mathematics, Katholieke Universiteit Leuven, W. De Croylaan 54, B-3001 Leuven. b UFSIA-RUCA Faculty of Applied

More information

Stahel-Donoho Estimation for High-Dimensional Data

Stahel-Donoho Estimation for High-Dimensional Data Stahel-Donoho Estimation for High-Dimensional Data Stefan Van Aelst KULeuven, Department of Mathematics, Section of Statistics Celestijnenlaan 200B, B-3001 Leuven, Belgium Email: Stefan.VanAelst@wis.kuleuven.be

More information

Outlier detection for skewed data

Outlier detection for skewed data Outlier detection for skewed data Mia Hubert 1 and Stephan Van der Veeken December 7, 27 Abstract Most outlier detection rules for multivariate data are based on the assumption of elliptical symmetry of

More information

ROBUST ESTIMATION OF A CORRELATION COEFFICIENT: AN ATTEMPT OF SURVEY

ROBUST ESTIMATION OF A CORRELATION COEFFICIENT: AN ATTEMPT OF SURVEY ROBUST ESTIMATION OF A CORRELATION COEFFICIENT: AN ATTEMPT OF SURVEY G.L. Shevlyakov, P.O. Smirnov St. Petersburg State Polytechnic University St.Petersburg, RUSSIA E-mail: Georgy.Shevlyakov@gmail.com

More information

Robust Wilks' Statistic based on RMCD for One-Way Multivariate Analysis of Variance (MANOVA)

Robust Wilks' Statistic based on RMCD for One-Way Multivariate Analysis of Variance (MANOVA) ISSN 2224-584 (Paper) ISSN 2225-522 (Online) Vol.7, No.2, 27 Robust Wils' Statistic based on RMCD for One-Way Multivariate Analysis of Variance (MANOVA) Abdullah A. Ameen and Osama H. Abbas Department

More information

Regression Analysis for Data Containing Outliers and High Leverage Points

Regression Analysis for Data Containing Outliers and High Leverage Points Alabama Journal of Mathematics 39 (2015) ISSN 2373-0404 Regression Analysis for Data Containing Outliers and High Leverage Points Asim Kumer Dey Department of Mathematics Lamar University Md. Amir Hossain

More information

The minimum volume ellipsoid (MVE), introduced

The minimum volume ellipsoid (MVE), introduced Minimum volume ellipsoid Stefan Van Aelst 1 and Peter Rousseeuw 2 The minimum volume ellipsoid (MVE) estimator is based on the smallest volume ellipsoid that covers h of the n observations. It is an affine

More information

Minimum Covariance Determinant and Extensions

Minimum Covariance Determinant and Extensions Minimum Covariance Determinant and Extensions Mia Hubert, Michiel Debruyne, Peter J. Rousseeuw September 22, 2017 arxiv:1709.07045v1 [stat.me] 20 Sep 2017 Abstract The Minimum Covariance Determinant (MCD)

More information

Robust Linear Discriminant Analysis and the Projection Pursuit Approach

Robust Linear Discriminant Analysis and the Projection Pursuit Approach Robust Linear Discriminant Analysis and the Projection Pursuit Approach Practical aspects A. M. Pires 1 Department of Mathematics and Applied Mathematics Centre, Technical University of Lisbon (IST), Lisboa,

More information

FAULT DETECTION AND ISOLATION WITH ROBUST PRINCIPAL COMPONENT ANALYSIS

FAULT DETECTION AND ISOLATION WITH ROBUST PRINCIPAL COMPONENT ANALYSIS Int. J. Appl. Math. Comput. Sci., 8, Vol. 8, No. 4, 49 44 DOI:.478/v6-8-38-3 FAULT DETECTION AND ISOLATION WITH ROBUST PRINCIPAL COMPONENT ANALYSIS YVON THARRAULT, GILLES MOUROT, JOSÉ RAGOT, DIDIER MAQUIN

More information

Robust Exponential Smoothing of Multivariate Time Series

Robust Exponential Smoothing of Multivariate Time Series Robust Exponential Smoothing of Multivariate Time Series Christophe Croux,a, Sarah Gelper b, Koen Mahieu a a Faculty of Business and Economics, K.U.Leuven, Naamsestraat 69, 3000 Leuven, Belgium b Erasmus

More information

Robust methods for multivariate data analysis

Robust methods for multivariate data analysis JOURNAL OF CHEMOMETRICS J. Chemometrics 2005; 19: 549 563 Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cem.962 Robust methods for multivariate data analysis S. Frosch

More information

CLASSIFICATION EFFICIENCIES FOR ROBUST LINEAR DISCRIMINANT ANALYSIS

CLASSIFICATION EFFICIENCIES FOR ROBUST LINEAR DISCRIMINANT ANALYSIS Statistica Sinica 8(008), 58-599 CLASSIFICATION EFFICIENCIES FOR ROBUST LINEAR DISCRIMINANT ANALYSIS Christophe Croux, Peter Filzmoser and Kristel Joossens K.U. Leuven and Vienna University of Technology

More information

Outlier detection for high-dimensional data

Outlier detection for high-dimensional data Biometrika (2015), 102,3,pp. 589 599 doi: 10.1093/biomet/asv021 Printed in Great Britain Advance Access publication 7 June 2015 Outlier detection for high-dimensional data BY KWANGIL RO, CHANGLIANG ZOU,

More information

Fast and Robust Classifiers Adjusted for Skewness

Fast and Robust Classifiers Adjusted for Skewness Fast and Robust Classifiers Adjusted for Skewness Mia Hubert 1 and Stephan Van der Veeken 2 1 Department of Mathematics - LStat, Katholieke Universiteit Leuven Celestijnenlaan 200B, Leuven, Belgium, Mia.Hubert@wis.kuleuven.be

More information

Damage detection in the presence of outliers based on robust PCA Fahit Gharibnezhad 1, L.E. Mujica 2, Jose Rodellar 3 1,2,3

Damage detection in the presence of outliers based on robust PCA Fahit Gharibnezhad 1, L.E. Mujica 2, Jose Rodellar 3 1,2,3 Damage detection in the presence of outliers based on robust PCA Fahit Gharibnezhad 1, L.E. Mujica 2, Jose Rodellar 3 1,2,3 Escola Universitària d'enginyeria Tècnica Industrial de Barcelona,Department

More information

THE BREAKDOWN POINT EXAMPLES AND COUNTEREXAMPLES

THE BREAKDOWN POINT EXAMPLES AND COUNTEREXAMPLES REVSTAT Statistical Journal Volume 5, Number 1, March 2007, 1 17 THE BREAKDOWN POINT EXAMPLES AND COUNTEREXAMPLES Authors: P.L. Davies University of Duisburg-Essen, Germany, and Technical University Eindhoven,

More information

Inference based on robust estimators Part 2

Inference based on robust estimators Part 2 Inference based on robust estimators Part 2 Matias Salibian-Barrera 1 Department of Statistics University of British Columbia ECARES - Dec 2007 Matias Salibian-Barrera (UBC) Robust inference (2) ECARES

More information

Identification of Multivariate Outliers: A Performance Study

Identification of Multivariate Outliers: A Performance Study AUSTRIAN JOURNAL OF STATISTICS Volume 34 (2005), Number 2, 127 138 Identification of Multivariate Outliers: A Performance Study Peter Filzmoser Vienna University of Technology, Austria Abstract: Three

More information

The Minimum Regularized Covariance Determinant estimator

The Minimum Regularized Covariance Determinant estimator The Minimum Regularized Covariance Determinant estimator Kris Boudt Solvay Business School, Vrije Universiteit Brussel School of Business and Economics, Vrije Universiteit Amsterdam arxiv:1701.07086v3

More information

Why the Rousseeuw Yohai Paradigm is One of the Largest and Longest Running Scientific Hoaxes in History

Why the Rousseeuw Yohai Paradigm is One of the Largest and Longest Running Scientific Hoaxes in History Why the Rousseeuw Yohai Paradigm is One of the Largest and Longest Running Scientific Hoaxes in History David J. Olive Southern Illinois University December 4, 2012 Abstract The Rousseeuw Yohai paradigm

More information

Two Simple Resistant Regression Estimators

Two Simple Resistant Regression Estimators Two Simple Resistant Regression Estimators David J. Olive Southern Illinois University January 13, 2005 Abstract Two simple resistant regression estimators with O P (n 1/2 ) convergence rate are presented.

More information

The S-estimator of multivariate location and scatter in Stata

The S-estimator of multivariate location and scatter in Stata The Stata Journal (yyyy) vv, Number ii, pp. 1 9 The S-estimator of multivariate location and scatter in Stata Vincenzo Verardi University of Namur (FUNDP) Center for Research in the Economics of Development

More information

Introduction Robust regression Examples Conclusion. Robust regression. Jiří Franc

Introduction Robust regression Examples Conclusion. Robust regression. Jiří Franc Robust regression Robust estimation of regression coefficients in linear regression model Jiří Franc Czech Technical University Faculty of Nuclear Sciences and Physical Engineering Department of Mathematics

More information

Review of robust multivariate statistical methods in high dimension

Review of robust multivariate statistical methods in high dimension Review of robust multivariate statistical methods in high dimension Peter Filzmoser a,, Valentin Todorov b a Department of Statistics and Probability Theory, Vienna University of Technology, Wiedner Hauptstr.

More information

ROBUST PARTIAL LEAST SQUARES REGRESSION AND OUTLIER DETECTION USING REPEATED MINIMUM COVARIANCE DETERMINANT METHOD AND A RESAMPLING METHOD

ROBUST PARTIAL LEAST SQUARES REGRESSION AND OUTLIER DETECTION USING REPEATED MINIMUM COVARIANCE DETERMINANT METHOD AND A RESAMPLING METHOD ROBUST PARTIAL LEAST SQUARES REGRESSION AND OUTLIER DETECTION USING REPEATED MINIMUM COVARIANCE DETERMINANT METHOD AND A RESAMPLING METHOD by Dilrukshika Manori Singhabahu BS, Information Technology, Slippery

More information

Research Article Robust Control Charts for Monitoring Process Mean of Phase-I Multivariate Individual Observations

Research Article Robust Control Charts for Monitoring Process Mean of Phase-I Multivariate Individual Observations Journal of Quality and Reliability Engineering Volume 3, Article ID 4, 4 pages http://dx.doi.org/./3/4 Research Article Robust Control Charts for Monitoring Process Mean of Phase-I Multivariate Individual

More information

A Modified M-estimator for the Detection of Outliers

A Modified M-estimator for the Detection of Outliers A Modified M-estimator for the Detection of Outliers Asad Ali Department of Statistics, University of Peshawar NWFP, Pakistan Email: asad_yousafzay@yahoo.com Muhammad F. Qadir Department of Statistics,

More information

Robust scale estimation with extensions

Robust scale estimation with extensions Robust scale estimation with extensions Garth Tarr, Samuel Müller and Neville Weber School of Mathematics and Statistics THE UNIVERSITY OF SYDNEY Outline The robust scale estimator P n Robust covariance

More information

Sparse PCA for high-dimensional data with outliers

Sparse PCA for high-dimensional data with outliers Sparse PCA for high-dimensional data with outliers Mia Hubert Tom Reynkens Eric Schmitt Tim Verdonck Department of Mathematics, KU Leuven Leuven, Belgium June 25, 2015 Abstract A new sparse PCA algorithm

More information

Robustness for dummies

Robustness for dummies Robustness for dummies CRED WP 2012/06 Vincenzo Verardi, Marjorie Gassner and Darwin Ugarte Center for Research in the Economics of Development University of Namur Robustness for dummies Vincenzo Verardi,

More information

Practical High Breakdown Regression

Practical High Breakdown Regression Practical High Breakdown Regression David J. Olive and Douglas M. Hawkins Southern Illinois University and University of Minnesota February 8, 2011 Abstract This paper shows that practical high breakdown

More information

Rotor Balancing Using High Breakdown-Point and Bounded-Influence Estimators

Rotor Balancing Using High Breakdown-Point and Bounded-Influence Estimators *Manuscript Click here to view linked References Rotor Balancing Using High Breakdown-Point and Bounded-Influence Estimators Paolo PENNACCHI a, Steven CHAERON a,, Roberto RICCI b a Politecnico di Milano,

More information

Introduction to Robust Statistics. Elvezio Ronchetti. Department of Econometrics University of Geneva Switzerland.

Introduction to Robust Statistics. Elvezio Ronchetti. Department of Econometrics University of Geneva Switzerland. Introduction to Robust Statistics Elvezio Ronchetti Department of Econometrics University of Geneva Switzerland Elvezio.Ronchetti@metri.unige.ch http://www.unige.ch/ses/metri/ronchetti/ 1 Outline Introduction

More information

A ROBUST METHOD OF ESTIMATING COVARIANCE MATRIX IN MULTIVARIATE DATA ANALYSIS G.M. OYEYEMI *, R.A. IPINYOMI **

A ROBUST METHOD OF ESTIMATING COVARIANCE MATRIX IN MULTIVARIATE DATA ANALYSIS G.M. OYEYEMI *, R.A. IPINYOMI ** ANALELE ŞTIINłIFICE ALE UNIVERSITĂłII ALEXANDRU IOAN CUZA DIN IAŞI Tomul LVI ŞtiinŃe Economice 9 A ROBUST METHOD OF ESTIMATING COVARIANCE MATRIX IN MULTIVARIATE DATA ANALYSIS G.M. OYEYEMI, R.A. IPINYOMI

More information

Improved Feasible Solution Algorithms for. High Breakdown Estimation. Douglas M. Hawkins. David J. Olive. Department of Applied Statistics

Improved Feasible Solution Algorithms for. High Breakdown Estimation. Douglas M. Hawkins. David J. Olive. Department of Applied Statistics Improved Feasible Solution Algorithms for High Breakdown Estimation Douglas M. Hawkins David J. Olive Department of Applied Statistics University of Minnesota St Paul, MN 55108 Abstract High breakdown

More information

Inequalities Relating Addition and Replacement Type Finite Sample Breakdown Points

Inequalities Relating Addition and Replacement Type Finite Sample Breakdown Points Inequalities Relating Addition and Replacement Type Finite Sample Breadown Points Robert Serfling Department of Mathematical Sciences University of Texas at Dallas Richardson, Texas 75083-0688, USA Email:

More information

Detecting outliers and/or leverage points: a robust two-stage procedure with bootstrap cut-off points

Detecting outliers and/or leverage points: a robust two-stage procedure with bootstrap cut-off points Detecting outliers and/or leverage points: a robust two-stage procedure with bootstrap cut-off points Ettore Marubini (1), Annalisa Orenti (1) Background: Identification and assessment of outliers, have

More information

Robust regression in Stata

Robust regression in Stata The Stata Journal (2009) 9, Number 3, pp. 439 453 Robust regression in Stata Vincenzo Verardi 1 University of Namur (CRED) and Université Libre de Bruxelles (ECARES and CKE) Rempart de la Vierge 8, B-5000

More information

Outliers Robustness in Multivariate Orthogonal Regression

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, PART A: SYSTEMS AND HUMANS, VOL. 30, NO. 6, NOVEMBER 2000, p. 674. Giuseppe Carlo Calafiore

Journal of Statistical Software

April 2013, Volume 53, Issue 3. http://www.jstatsoft.org/ Fast and Robust Bootstrap for Multivariate Inference: The R Package FRB. Stefan Van Aelst, Ghent University, Gert

Accurate and Powerful Multivariate Outlier Detection

Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS66) p.568. Cerioli, Andrea, Università di Parma, Dipartimento di

Robust Weighted LAD Regression

Avi Giloni, Jeffrey S. Simonoff, and Bhaskar Sengupta. February 25, 2005. Abstract: The least squares linear regression estimator is well-known to be highly sensitive to unusual

Package ltsbase. R topics documented: February 20, 2015

Type: Package. Title: Ridge and Liu Estimates based on LTS (Least Trimmed Squares) Method. Version: 1.0.1. Date: 2013-08-02. Author: Betul Kan Kilinc [aut, cre], Ozlem Alpu [aut,

Communications in Statistics Simulation and Computation. Impact of contamination on training and test error rates in statistical clustering

Journal: Communications in Statistics - Simulation and Computation. Manuscript ID: LSSP-00-0.R. Manuscript Type: Review

Robust model selection criteria for robust S and LTS estimators

Hacettepe Journal of Mathematics and Statistics, Volume 45 (1) (2016), 153-164. Meral Çetin. Abstract: Outliers and multi-collinearity often

Detecting outliers in weighted univariate survey data

Anna Pauliina Sandqvist. October 27, 21. Preliminary Version. Abstract: Outliers and influential observations are a frequent concern in all kind of statistics,

ON THE CALCULATION OF A ROBUST S-ESTIMATOR OF A COVARIANCE MATRIX

STATISTICS IN MEDICINE, Statist. Med. 17, 2685-2695 (1998). N. A. CAMPBELL, H. P. LOPUHAA AND P. J. ROUSSEEUW. CSIRO Mathematical and Information

Robust Maximum Association Estimators

Andreas Alfons, Christophe Croux, and Peter Filzmoser. Andreas Alfons is Assistant Professor, Erasmus School of Economics, Erasmus Universiteit Rotterdam, PO Box 1738,

Recent developments in PROGRESS

L1-Statistical Procedures and Related Topics, IMS Lecture Notes - Monograph Series (1997), Volume 31. Peter J. Rousseeuw and Mia Hubert, University of Antwerp, Belgium

Robust canonical correlation analysis: a predictive approach.

Nadia L. Kudraszow (a), Ricardo A. Maronna (b). (a) CONICET and University of La Plata, Argentina; (b) University of La Plata, Argentina

Evaluation of robust PCA for supervised audio outlier detection

Sarka Brodinova, Vienna University of Technology, sarka.brodinova@tuwien.ac.at; Thomas Ortner, Vienna University of Technology, thomas.ortner@tuwien.ac.at

MS-E2112 Multivariate Statistical Analysis (5cr) Lecture 4: Measures of Robustness, Robust Principal Component Analysis

Contents: Empirical; Robust Statistical Methods. In statistics, robust methods are methods that perform well

Computational Connections Between Robust Multivariate Analysis and Clustering

David M. Rocke (1) and David L. Woodruff (2). (1) Department of Applied Science, University of California at Davis, Davis, CA 95616,

arXiv:1806.04106v1 [math.ST] 11 Jun 2018

Robust test statistics for the two-way MANOVA based on the minimum covariance determinant estimator. Bernhard Spangl (a). (a) Institute of Applied Statistics and Computing,

Robust Factorization of a Data Matrix

Christophe Croux (1) and Peter Filzmoser (2). (1) ECARE and Institut de Statistique, Université Libre de Bruxelles, Av. F.D. Roosevelt 50, B-1050 Bruxelles, Belgium; (2) Department

Robust Multivariate Analysis

David J. Olive, Department of Mathematics, Southern Illinois University, Carbondale, IL, USA. ISBN 978-3-319-68251-8, ISBN 978-3-319-68253-2

Projection-based outlier detection in functional data

Biometrika (2017), xx, x, pp. 1-12. © 2007 Biometrika Trust. Printed in Great Britain. By HAOJIE REN, Institute of Statistics, Nankai University, No.94

Computing Robust Regression Estimators: Developments since Dutter (1977)

AUSTRIAN JOURNAL OF STATISTICS, Volume 41 (2012), Number 1, 45-58. Moritz Gschwandtner and Peter Filzmoser, Vienna University of Technology,

Robust multivariate methods in Chemometrics

Peter Filzmoser (1), Sven Serneels (2), Ricardo Maronna (3), Pierre J. Van Espen (4). (1) Institut für Statistik und Wahrscheinlichkeitstheorie, Technical University of Vienna,

Research Article Robust Multivariate Control Charts to Detect Small Shifts in Mean

Mathematical Problems in Engineering, Volume 011, Article ID 93463, 19 pages, doi:.1155/011/93463. Habshah Midi and Ashkan

A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL

Discussiones Mathematicae Probability and Statistics 36 206 43 5, doi:0.75/dmps.80. Tadeusz Bednarski, Wroclaw University, e-mail: t.bednarski@prawo.uni.wroc.pl

A PRACTICAL APPLICATION OF A ROBUST MULTIVARIATE OUTLIER DETECTION METHOD

Sarah Franklin, Marie Brodeur, Statistics Canada. Sarah Franklin, Statistics Canada, BSMD, R.H.Coats Bldg, 11th floor, Ottawa,

Detection of outliers in multivariate data: a method based on clustering and robust estimators

Carla M. Santos-Pereira (1) and Ana M. Pires (2). (1) Universidade Portucalense Infante D. Henrique, Oporto, Portugal

Alternative Biased Estimator Based on Least Trimmed Squares for Handling Collinear Leverage Data Points

International Journal of Contemporary Mathematical Sciences, Vol. 13, 018, no. 4, 177-189. HIKARI Ltd, www.m-hikari.com https://doi.org/10.1988/ijcms.018.8616

By Pavel Čížek, Wolfgang Härdle

No. 2005-31. ROBUST ESTIMATION OF DIMENSION REDUCTION SPACE. February 2005. ISSN 0924-7815. P. Čížek (a) and W. Härdle (b). (a) Department

Upper and Lower Bounds for Breakdown Points

Christine Müller. Abstract: General upper and lower bounds for the finite sample breakdown point are presented. The general upper bound is obtained by an approach

Robust Maximum Association Between Data Sets: The R Package ccapp

Andreas Alfons (Erasmus Universiteit Rotterdam), Christophe Croux (KU Leuven), Peter Filzmoser (Vienna University of Technology)

Evaluation of robust PCA for supervised audio outlier detection

Institut f. Stochastik und Wirtschaftsmathematik, 1040 Wien, Wiedner Hauptstr. 8-10/105, AUSTRIA. http://www.isw.tuwien.ac.at S. Brodinova,

Robustifying Robust Estimators

David J. Olive and Douglas M. Hawkins, Southern Illinois University and University of Minnesota. January 12, 2007. Abstract: In the literature, estimators for regression or multivariate

Introduction to Robust Statistics. Anthony Atkinson, London School of Economics, UK Marco Riani, Univ. of Parma, Italy

Multivariate analysis. Multivariate location and scatter. Data where the observations

Depth-weighted robust multivariate regression with application to sparse data

The Canadian Journal of Statistics / La revue canadienne de statistique, Vol. 45, No. 2, 2017, Pages 164-184. Subhajit DUTTA

A Comparison of Robust Estimators Based on Two Types of Trimming

Submitted to the Bernoulli. SUBHRA SANKAR DHAR (1) and PROBAL CHAUDHURI (1). (1) Theoretical Statistics and Mathematics Unit, Indian Statistical

arxiv: v3 [stat.me] 2 Feb 2018 Abstract

ICS for Multivariate Outlier Detection with Application to Quality Control. Aurore Archimbaud (a), Klaus Nordhausen (b), Anne Ruiz-Gazen (a). (a) Toulouse School of Economics, University of Toulouse 1 Capitole,

Spatial autocorrelation: robustness of measures and tests

Marie Ernst and Gentiane Haesbroeck, University of Liege. London, December 14, 2015. Spatial data: geographical positions, non spatial

Mahalanobis Distance Index

A Fast Algorithm for the Minimum Covariance Determinant Estimator. Peter J. Rousseeuw and Katrien Van Driessen. Revised Version, 15 December 1998. Abstract: The minimum covariance determinant (MCD) method

Estimators and their (econometric) modifications

Stochastic Modelling in Economics and Finance 2. Advisor: Prof. RNDr. Jan Ámos Víšek, CSc. Petr Jonáš, 27th April 2009. In classical approach we deal with stringent stochastic

Lecture 12 Robust Estimation

Prof. Dr. Svetlozar Rachev, Institute for Statistics and Mathematical Economics, University of Karlsruhe. Financial Econometrics, Summer Semester 2007. Copyright: These lecture-notes

YIJUN ZUO. Education. PhD, Statistics, 05/98, University of Texas at Dallas, (GPA 4.0/4.0)

Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824. Tel: (517) 432-5413 Fax: (517) 432-5413 Email: zuo@msu.edu URL: www.stt.msu.edu/users/zuo

Outlier Detection via Feature Selection Algorithms in Covariance Estimation

Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS032) p.4638. Menjoge, Rajiv S., M.I.T.,

Author's Accepted Manuscript

Robust fitting of mixtures using the Trimmed Likelihood Estimator. N. Neykov, P. Filzmoser, R. Dimova, P. Neytchev. PII: S0167-9473(06)00501-9. DOI: 10.1016/j.csda.2006.12.024

Introduction to robust statistics*

Xuming He, National University of Singapore. To statisticians, the model, data and methodology are essential. Their job is to propose statistical procedures and evaluate

MULTICOLLINEARITY DIAGNOSTIC MEASURES BASED ON MINIMUM COVARIANCE DETERMINATION APPROACH

MULTICOLLINEARITY DIAGNOSTIC MEASURES BASED ON MINIMUM COVARIANCE DETERMINATION APPROACH Professor Habshah MIDI, PhD Department of Mathematics, Faculty of Science / Laboratory of Computational Statistics and Operations Research, Institute for Mathematical Research University Putra, Malaysia

More information

Robust variable selection with application to quality of life research

Robust variable selection with application to quality of life research Stat Methods Appl manuscript No. (will be inserted by the editor) Robust variable selection with application to quality of life research Andreas Alfons Wolfgang E. Baaske Peter Filzmoser Wolfgang Mader

More information

Robust statistic for the one-way MANOVA

Robust statistic for the one-way MANOVA Institut f. Statistik u. Wahrscheinlichkeitstheorie 1040 Wien, Wiedner Hauptstr. 8-10/107 AUSTRIA http://www.statistik.tuwien.ac.at Robust statistic for the one-way MANOVA V. Todorov and P. Filzmoser Forschungsbericht

More information

STAT 593 Robust statistics: Modeling and Computing

Joseph Salmon, http://josephsalmon.eu Télécom Paristech, Institut Mines-Télécom & University of Washington, Department of Statistics (Visiting Assistant

The Stata Journal. Editor Nicholas J. Cox Department of Geography Durham University South Road Durham City DH1 3LE UK

Editor: H. Joseph Newton, Department of Statistics, Texas A&M University, College Station, Texas 77843; 979-845-8817; fax 979-845-6077; jnewton@stata-journal.com. Associate Editors: Christopher

Robust Component Analysis via HQ Minimization

Ran He, Wei-shi Zheng and Liang Wang. 0-08-6. Outline: Overview Half-quadratic minimization principal component analysis Robust principal component analysis Robust

Robust Tools for the Imperfect World

Peter Filzmoser (a), Valentin Todorov (b). (a) Department of Statistics and Probability Theory, Vienna University of Technology, Wiedner Hauptstr. 8-10, 1040 Vienna, Austria