Supplementary Material for Wang and Serfling paper


March 6,

1 Simulation study

Here we provide a simulation study to compare empirically the masking and swamping robustness of our selected outlyingness functions. Two parametric parent distributions are considered: the bivariate standard normal, which is symmetric, and the bivariate exponential with mean (0.5, 0.5), which is right-skewed.

1.1 Simulation plan

We generate samples separately from the above two distributions with three different sample sizes, n = 50, 500, and 2000. We consider contamination models with true positive rate ε, for ε = 0.02, 0.1, 0.25, and 0.4. For each model, the number of replacements in the sample is given by m = nε. The numbers of replications are chosen to be 5000 for n = 50, 2000 for n = 500, and 500 for n = 2000.

To choose the sample outlyingness thresholds λ, we assume false positive rates under no contamination to be given by α = 0.02 for n = 50, and α = 0.01 for n = 500 and n = 2000. That is, under no contamination, we expect about 1 observation, 5 observations, and 20 observations with outlyingness values > λ for n = 50, 500, and 2000, respectively. On this basis, we take as sample outlyingness threshold λ the 2nd largest, the 6th largest, and the 21st largest value of the observed sample outlyingness function, for n = 50, 500, and 2000, respectively.

For convenience, we index $X_i$, i = 1, 2, ..., n, in order of their Euclidean norms, from smallest to largest.

We explore masking and swamping robustness of the following four affine invariant sample outlier identifiers treated in Section 4 of the paper:

- Classical Mahalanobis distance outlyingness (CMD), i.e., $O_{MD}(x, X_n)$ with $(\hat{\mu}, \hat{\Sigma}) = (\bar{X}, S)$.
- Robust Mahalanobis distance outlyingness (RMD), i.e., $O_{MD}(x, X_n)$ with $(\hat{\mu}, \hat{\Sigma})$ given by the MCD estimators. Here these are defined taking subsample size $h = \lfloor (n+d+1)/2 \rfloor$, which for d = 2 yields 26 for n = 50, 251 for n = 500, and 1001 for n = 2000, respectively. The robustbase package in R is used to obtain approximate MCD estimators via the Fast-MCD algorithm of Rousseeuw and Van Driessen (1999).
- Robust Mahalanobis spatial outlyingness (RMS), i.e., $O_{MS}(x, X_n)$ with the MCD covariance estimator and the subsample size h chosen as for RMD.
- Projection outlyingness (P), i.e., $O_P(x, X_n)$ with univariate location and scale given by $(\mathrm{Med}, \mathrm{MAD}_{d-1})$. For computational convenience, only a finite number, 1000, of randomly chosen projection directions are used in our simulation.

We measure and compare the masking and swamping robustness of the above outlyingness functions using the following two performance indices:

- Percent of data points masked among the m outliers. If all m outliers are masked, this indicates masking breakdown.
- Percent of data points swamped among the nonoutliers. Here the number of nonoutliers is at most n - m. If all n - m nonoutliers are swamped, this indicates swamping breakdown.

Two scenarios in relation to outliers (similar to Dang and Serfling, 2010) are considered:

A. Replace $X_{n-m+1}, \ldots, X_{n-1}, X_n$ by $KX_{n-m+1}, \ldots, KX_{n-1}, KX_n$ for an inflation factor K = 5. Denote the modified data set by $X^{(A)}_{n,m}$.
B. Replace $X_{n-m+1}, \ldots, X_{n-1}, X_n$ by $KX_n, \ldots, KX_n, KX_n$ (m copies of the same point), again with inflation factor K = 5. Denote the modified data set by $X^{(B)}_{n,m}$.

These are illustrated in Figure 1 for the two parent distributions, with sample size n = 50 and m = 5 replacements.
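The two replacement scenarios and the two performance indices can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code (the study itself used R). It assumes d = 2, uses the ordinary MAD rather than $\mathrm{MAD}_{d-1}$, and takes the threshold from the uncontaminated sample. The function names `projection_outlyingness`, `contaminate`, and `percent_masked_swamped` are hypothetical.

```python
import math
import random
import statistics


def projection_outlyingness(points, n_dirs=1000, rng=None):
    """Approximate sup over unit directions u of |u'x - Med(u'X)| / MAD(u'X), d = 2.

    Assumption: plain MAD is used here, not the modified MAD of the source.
    """
    rng = rng or random.Random(0)
    out = [0.0] * len(points)
    for _ in range(n_dirs):
        theta = rng.uniform(0.0, math.pi)  # u and -u give the same value
        ux, uy = math.cos(theta), math.sin(theta)
        proj = [ux * x + uy * y for x, y in points]
        med = statistics.median(proj)
        mad = statistics.median([abs(p - med) for p in proj])
        if mad == 0.0:
            continue  # degenerate direction, skipped in this sketch
        for i, p in enumerate(proj):
            out[i] = max(out[i], abs(p - med) / mad)
    return out


def contaminate(points, m, scenario, k=5.0):
    """Sort by Euclidean norm, then replace the m largest points (Scenario A or B)."""
    pts = sorted(points, key=lambda p: math.hypot(p[0], p[1]))
    keep, tail = pts[:len(pts) - m], pts[len(pts) - m:]
    if scenario == "A":
        repl = [(k * x, k * y) for x, y in tail]  # inflate each of the m points
    elif scenario == "B":
        x, y = pts[-1]
        repl = [(k * x, k * y)] * m  # m copies of the inflated largest point
    else:
        raise ValueError("scenario must be 'A' or 'B'")
    return keep + repl


def percent_masked_swamped(outlyingness, m, threshold):
    """Percent of the last m points not flagged, and of the first n - m points flagged."""
    n = len(outlyingness)
    flagged = [o > threshold for o in outlyingness]
    masked = sum(1 for f in flagged[n - m:] if not f)
    swamped = sum(1 for f in flagged[:n - m] if f)
    return 100.0 * masked / m, 100.0 * swamped / (n - m)


# Small demonstration with n = 50, m = 5 and 200 directions (to keep it fast).
rng = random.Random(1)
sample = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
lam = sorted(projection_outlyingness(sample, n_dirs=200))[-2]  # 2nd largest value
for scen in ("A", "B"):
    data = contaminate(sample, m=5, scenario=scen)
    scores = projection_outlyingness(data, n_dirs=200)
    print(scen, percent_masked_swamped(scores, 5, lam))
```

Because `contaminate` returns the retained points first and the replacements last, the last m entries of the outlyingness list always correspond to the planted outliers.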

Figure 1. Scenarios A and B with sample size 50 and 5 replacements, for the bivariate standard normal distribution (left) and the bivariate exponential(0.5, 0.5) distribution (right). The 5 observations in the original sample with largest Euclidean norms are marked +. Their replacements in Scenario A and in Scenario B are each marked with a different symbol. Note that the 5 replacements in Scenario B overlap each other and also one of the Scenario A replacements.

1.2 Simulation results

The following sections present simulation results on masking and swamping for the two distributions, respectively.

1.2.1 Results for contaminated bivariate standard normal

Table 1 shows for each procedure the average sample outlyingness thresholds λ for the replicated samples of sizes 50, 500, and 2000 from the bivariate standard normal. As described earlier, the threshold is chosen according to the false positive rate α under no contamination, which is 0.02 for n = 50 and 0.01 for n = 500 and 2000.
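The threshold rule amounts to simple arithmetic: with false positive rate α, about nα observations are expected above the threshold, so λ is taken as the (nα + 1)-th largest outlyingness value. The following Python sketch reproduces the ranks 2, 6, and 21 and the replacement counts m = nε. The function names are hypothetical. The source does not say how the non-integer case n = 50, ε = 0.25 (nε = 12.5) is handled, so the sketch rounds down, which is an assumption.

```python
ALPHA = {50: 0.02, 500: 0.01, 2000: 0.01}  # false positive rate per sample size
EPS = (0.02, 0.10, 0.25, 0.40)             # contamination levels


def threshold_rank(n, alpha):
    """Rank from the top of the order statistic used as threshold: n*alpha + 1."""
    return round(n * alpha) + 1


def sample_threshold(outlyingness_values, alpha):
    """Return the (n*alpha + 1)-th largest outlyingness value."""
    ranked = sorted(outlyingness_values, reverse=True)
    return ranked[threshold_rank(len(ranked), alpha) - 1]


def n_replacements(n, eps):
    """m = n*eps, rounded down when n*eps is not an integer (assumption)."""
    return int(n * eps + 1e-9)  # small offset guards against float round-off


for n, alpha in ALPHA.items():
    print(n, threshold_rank(n, alpha), [n_replacements(n, e) for e in EPS])
```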

[Table 1 body: rows CMD, RMD, RMS, P; columns n = 50 (5000 trials), n = 500 (2000 trials), n = 2000 (500 trials); entries missing.]

Table 1. Average sample outlyingness thresholds λ for CMD, RMD, RMS and P, with n = 50, 500 and 2000 under no contamination, based on the bivariate normal model.

Masking performance

Table 2 displays the masking robustness of CMD, RMD, RMS and P under Scenarios A and B with the four different contamination levels, based on the standard normal model. Here masking breakdown occurs in a given sample when all m outliers (replaced data points) are masked.

[Table 2 body: average percent (%) masked among m = nε replacement outliers; columns ε = 0.02, 0.10, 0.25, 0.40, each split into Scenarios A and B, plus an MBP column; rows CMD, RMD, RMS, P within each of n = 50 (5000 trials), n = 500 (2000 trials), n = 2000 (500 trials); entries missing.]

Table 2. Masking performance of CMD, RMD, RMS and P for bivariate standard normal samples with n = 50, 500 and 2000. As a benchmark, the MBP column gives the theoretical MBP values. As ε increases, masking occurs more frequently, especially when ε > MBP. Masking breakdown in a sample occurs when the percent of outliers masked is 100.
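Each entry of such a table is an average over replications, and breakdown is a per-sample event (all m outliers masked). A minimal Python sketch of that aggregation is given below. The function name `summarize_masking` and the counts in the example are hypothetical and are not results from the study.

```python
def summarize_masking(masked_counts, m):
    """Average percent masked and fraction of replications with masking breakdown."""
    if m <= 0 or not masked_counts:
        raise ValueError("need m >= 1 and at least one replication")
    percents = [100.0 * c / m for c in masked_counts]
    average = sum(percents) / len(percents)
    # Breakdown in a replication means that all m outliers were masked.
    breakdown_rate = sum(1 for c in masked_counts if c == m) / len(masked_counts)
    return average, breakdown_rate


# Hypothetical counts of masked outliers in four replications with m = 5.
avg, rate = summarize_masking([0, 5, 2, 5], m=5)
print(avg, rate)
```

The same function applies to swamping if the counts are numbers of swamped nonoutliers and m is replaced by n - m.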

Comments based on Table 2. Let us discuss the implications of Table 2 for each of the four procedures and then summarize.

CMD is not robust with respect to masking, its masking performance degrading seriously as the contamination level increases. In particular, when ε ≥ 0.25, masking breakdown occurs under Scenario B of outlier replacements in all replications, for all three sample sizes. This result is not surprising since the theoretical MBP of CMD is $n^{-1} \approx 0$.

RMD has strong masking robustness overall. No masking whatsoever occurs in either scenario at contamination levels ε = 0.02, 0.10 and 0.25, nor in Scenario A at level ε = 0.40. On the other hand, in Scenario B at level ε = 0.40, the average percent masked is high. These results are in line with the theoretical MBP of RMD, which is $n^{-1} \lfloor (n-d+1)/2 \rfloor \approx 1/2$, independently of the sample threshold λ.

RMS has very weak masking robustness for the selected thresholds λ given in Table 1. These correspond to approximately the 0.98 quantile of the outlyingness function and, inserted into the MBP formula $\min\{ n^{-1} \lfloor (n-d+1)/2 \rfloor,\ n^{-1} \lfloor (1-\lambda) n / 2 \rfloor \}$ with d = 2, yield MBP values 0.06, 0.03, and 0.03, respectively, for n = 50, 500, and 2000. Accordingly, for very small ε = 0.02, which is just under the MBP values, there is no masking for either scenario, whereas for all cases of ε ≥ 0.10 there occur significant levels of masking for both scenarios.

P exhibits very strong robustness, in keeping with its high MBP. For both scenarios and all contamination levels, virtually no masking occurs. This agrees with the results for RMD except in the case of Scenario B with contamination level ε = 0.40, where P significantly outperforms RMD. In Scenario B with high contamination level, the constraint of elliptical contours imposed by RMD is a serious drawback.

Thus, for the standard normal contamination model, the above findings corroborate what is suggested by the theoretical MBP values and add some perspective. Namely, CMD has unacceptable masking robustness and hence should not be used in practice, and RMS has only weak masking robustness due to low MBP at high sample threshold levels, while RMD and P exhibit strong masking robustness and perform similarly, except that P significantly surpasses RMD in the case of a large cluster contamination.

Swamping performance

Table 3 displays the swamping robustness of CMD, RMD, RMS and P under Scenarios A and B with the four different contamination levels, based on the standard normal model. Here swamping breakdown occurs in a given sample when all n - m nonoutliers (nonreplaced data points) are swamped.

[Table 3 body: average percent (%) swamped among n - m = n(1 - ε) nonoutliers; columns ε = 0.02, 0.10, 0.25, 0.40, each split into Scenarios A and B, plus an SBP column; rows CMD, RMD, RMS, P within each of n = 50 (5000 trials), n = 500 (2000 trials), n = 2000 (500 trials); entries missing.]

Table 3. Swamping performance of CMD, RMD, RMS and P for bivariate standard normal samples with n = 50, 500 and 2000. As a benchmark, the SBP column gives the theoretical SBP values. Swamping breakdown in a sample occurs when the percent of nonoutliers swamped is 100.

Comments based on Table 3. Except for Scenario B at contamination level ε = 0.40, all four procedures exhibit very low swamping, with CMD also performing very well and RMS moderately well even for this extreme scenario. Thus, for the standard normal contamination model, CMD and RMS perform best with respect to swamping robustness, although for masking robustness they are outperformed by RMD and P. These findings corroborate the high SBP values of the four procedures (especially high for CMD).

1.2.2 Results for contaminated bivariate exponential

Table 4 shows for each of the four procedures the average sample outlyingness thresholds λ for the replicated samples of sizes 50, 500, and 2000 from the bivariate exponential(0.5, 0.5). Again, the threshold is chosen according to the false positive rate α under no contamination, which is 0.02 for n = 50 and 0.01 for n = 500 and 2000.

[Table 4 body: rows CMD, RMD, RMS, P; columns n = 50 (5000 trials), n = 500 (2000 trials), n = 2000 (500 trials); entries missing.]

Table 4. Average sample outlyingness thresholds λ for CMD, RMD, RMS and P, with n = 50, 500 and 2000 under no contamination, based on the bivariate exponential(0.5, 0.5) model.

Masking performance

Table 5 displays the masking robustness of CMD, RMD, RMS and P under Scenarios A and B with the four different contamination levels, based on the bivariate exponential(0.5, 0.5) model. Here masking breakdown occurs in a given sample when all m outliers (replaced data points) are masked.

[Table 5 body: average percent (%) masked among m = nε replacement outliers; columns ε = 0.02, 0.10, 0.25, 0.40, each split into Scenarios A and B, plus an MBP column; rows CMD, RMD, RMS, P within each of n = 50 (5000 trials), n = 500 (2000 trials), n = 2000 (500 trials); entries missing.]

Table 5. Masking performance of CMD, RMD, RMS and P for bivariate exponential(0.5, 0.5) samples with n = 50, 500 and 2000. The MBP column gives the theoretical MBP values. Masking breakdown in a sample occurs when the percent of outliers masked is 100.
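The theoretical benchmarks for CMD and RMD, like the MCD subsample size h from the simulation plan, are floor expressions in n and d. The Python sketch below evaluates them for d = 2. It assumes the readings h = ⌊(n + d + 1)/2⌋ (which reproduces the quoted values 26, 251, and 1001), MBP = 1/n for CMD, and MBP = ⌊(n - d + 1)/2⌋/n for RMD. The floor in the RMD expression is an assumption about the source formula, and the function names are hypothetical.

```python
def mcd_subsample_size(n, d):
    """h = floor((n + d + 1) / 2), the MCD subsample size."""
    return (n + d + 1) // 2


def rmd_mbp(n, d):
    """floor((n - d + 1) / 2) / n: assumed floor reading of the RMD breakdown point."""
    return ((n - d + 1) // 2) / n


def cmd_mbp(n):
    """1 / n for the classical Mahalanobis identifier."""
    return 1.0 / n


for n in (50, 500, 2000):
    print(n, mcd_subsample_size(n, 2), rmd_mbp(n, 2), cmd_mbp(n))
```

Under these readings the RMD value stays close to 1/2 at every sample size, while the CMD value shrinks toward 0 as n grows.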

Comments based on Table 5. Both CMD and RMS lack masking robustness, while RMD and P maintain high masking robustness at all contamination levels below the theoretical MBPs. As added perspective, we note that RMD is slightly better than P for Scenario A, while the reverse holds for Scenario B. Of course, when imposing elliptical contours is not appropriate, P is the more suitable choice.

Swamping performance

Table 6 displays the swamping robustness of CMD, RMD, RMS and P under Scenarios A and B with the four different contamination levels, based on the bivariate exponential(0.5, 0.5) model. Swamping breakdown occurs in a given sample when all n - m nonoutliers (nonreplaced data points) are swamped.

[Table 6 body: average percent (%) swamped among n - m = n(1 - ε) nonoutliers; columns ε = 0.02, 0.10, 0.25, 0.40, each split into Scenarios A and B, plus an SBP column; rows CMD, RMD, RMS, P within each of n = 50 (5000 trials), n = 500 (2000 trials), n = 2000 (500 trials); entries missing.]

Table 6. Swamping performance of CMD, RMD, RMS and P for bivariate exponential(0.5, 0.5) samples with n = 50, 500 and 2000. The SBP column gives the theoretical SBP values. Swamping breakdown in a sample occurs when the percent of nonoutliers swamped is 100.

Comments based on Table 6. CMD and RMS exhibit excellent swamping robustness. RMD follows closely, with weakness only in the case of a large cluster of outliers, and P follows RMD competitively.

1.2.3 Practical recommendations based on the simulation results

Although its swamping performance is optimal, CMD has unacceptably low masking performance and hence is not recommended for use in practice. The masking robustness of RMS is weak at high sample thresholds λ, yet it has good swamping performance and its computational burden is relatively light. Both RMD and P are robust with respect to both masking and swamping, with RMD having less computational complexity. It is worth noting, however, that in the contaminated bivariate standard normal model, both the masking performance of RMD and the swamping performance of RMD and P degrade in the presence of a large cluster of outliers.

One may select the appropriate outlier identifier according to the preferred balance of masking robustness versus swamping robustness, in conjunction with consideration of the appropriateness or not of elliptical contours, the computational burden, and whether or not protection against a large cluster is desired. These recommendations, based on results in two specific parametric models for the data, are consistent with the results based on the MBPs and SBPs, which, of course, are nonparametric and have wide general application across diverse data settings.

References

[1] Dang, X. and Serfling, R. (2010). Nonparametric depth-based multivariate outlier identifiers, and masking robustness properties. Journal of Statistical Planning and Inference.
[2] Rousseeuw, P. and Van Driessen, K. (1999). A fast algorithm for the minimum covariance determinant estimator. Technometrics.


More information

Regression Clustering

Regression Clustering Regression Clustering In regression clustering, we assume a model of the form y = f g (x, θ g ) + ɛ g for observations y and x in the g th group. Usually, of course, we assume linear models of the form

More information

Experimental Design and Data Analysis for Biologists

Experimental Design and Data Analysis for Biologists Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1

More information

DESIGNING EXPERIMENTS AND ANALYZING DATA A Model Comparison Perspective

DESIGNING EXPERIMENTS AND ANALYZING DATA A Model Comparison Perspective DESIGNING EXPERIMENTS AND ANALYZING DATA A Model Comparison Perspective Second Edition Scott E. Maxwell Uniuersity of Notre Dame Harold D. Delaney Uniuersity of New Mexico J,t{,.?; LAWRENCE ERLBAUM ASSOCIATES,

More information

Depth-weighted robust multivariate regression with application to sparse data

Depth-weighted robust multivariate regression with application to sparse data 164 The Canadian Journal of Statistics Vol. 45, No. 2, 2017, Pages 164 184 La revue canadienne de statistique Depth-weighted robust multivariate regression with application to sparse data Subhajit DUTTA

More information

Discriminant Analysis with High Dimensional. von Mises-Fisher distribution and

Discriminant Analysis with High Dimensional. von Mises-Fisher distribution and Athens Journal of Sciences December 2014 Discriminant Analysis with High Dimensional von Mises - Fisher Distributions By Mario Romanazzi This paper extends previous work in discriminant analysis with von

More information

IMPROVING THE SMALL-SAMPLE EFFICIENCY OF A ROBUST CORRELATION MATRIX: A NOTE

IMPROVING THE SMALL-SAMPLE EFFICIENCY OF A ROBUST CORRELATION MATRIX: A NOTE IMPROVING THE SMALL-SAMPLE EFFICIENCY OF A ROBUST CORRELATION MATRIX: A NOTE Eric Blankmeyer Department of Finance and Economics McCoy College of Business Administration Texas State University San Marcos

More information

Package ltsbase. R topics documented: February 20, 2015

Package ltsbase. R topics documented: February 20, 2015 Package ltsbase February 20, 2015 Type Package Title Ridge and Liu Estimates based on LTS (Least Trimmed Squares) Method Version 1.0.1 Date 2013-08-02 Author Betul Kan Kilinc [aut, cre], Ozlem Alpu [aut,

More information

MULTIVARIATE TECHNIQUES, ROBUSTNESS

MULTIVARIATE TECHNIQUES, ROBUSTNESS MULTIVARIATE TECHNIQUES, ROBUSTNESS Mia Hubert Associate Professor, Department of Mathematics and L-STAT Katholieke Universiteit Leuven, Belgium mia.hubert@wis.kuleuven.be Peter J. Rousseeuw 1 Senior Researcher,

More information

COMPARING ROBUST REGRESSION LINES ASSOCIATED WITH TWO DEPENDENT GROUPS WHEN THERE IS HETEROSCEDASTICITY

COMPARING ROBUST REGRESSION LINES ASSOCIATED WITH TWO DEPENDENT GROUPS WHEN THERE IS HETEROSCEDASTICITY COMPARING ROBUST REGRESSION LINES ASSOCIATED WITH TWO DEPENDENT GROUPS WHEN THERE IS HETEROSCEDASTICITY Rand R. Wilcox Dept of Psychology University of Southern California Florence Clark Division of Occupational

More information

Anomaly (outlier) detection. Huiping Cao, Anomaly 1

Anomaly (outlier) detection. Huiping Cao, Anomaly 1 Anomaly (outlier) detection Huiping Cao, Anomaly 1 Outline General concepts What are outliers Types of outliers Causes of anomalies Challenges of outlier detection Outlier detection approaches Huiping

More information

Robust scale estimation with extensions

Robust scale estimation with extensions Robust scale estimation with extensions Garth Tarr, Samuel Müller and Neville Weber School of Mathematics and Statistics THE UNIVERSITY OF SYDNEY Outline The robust scale estimator P n Robust covariance

More information

Directionally Sensitive Multivariate Statistical Process Control Methods

Directionally Sensitive Multivariate Statistical Process Control Methods Directionally Sensitive Multivariate Statistical Process Control Methods Ronald D. Fricker, Jr. Naval Postgraduate School October 5, 2005 Abstract In this paper we develop two directionally sensitive statistical

More information

Applied Multivariate and Longitudinal Data Analysis

Applied Multivariate and Longitudinal Data Analysis Applied Multivariate and Longitudinal Data Analysis Chapter 2: Inference about the mean vector(s) Ana-Maria Staicu SAS Hall 5220; 919-515-0644; astaicu@ncsu.edu 1 In this chapter we will discuss inference

More information

Computational Connections Between Robust Multivariate Analysis and Clustering

Computational Connections Between Robust Multivariate Analysis and Clustering 1 Computational Connections Between Robust Multivariate Analysis and Clustering David M. Rocke 1 and David L. Woodruff 2 1 Department of Applied Science, University of California at Davis, Davis, CA 95616,

More information

1. Density and properties Brief outline 2. Sampling from multivariate normal and MLE 3. Sampling distribution and large sample behavior of X and S 4.

1. Density and properties Brief outline 2. Sampling from multivariate normal and MLE 3. Sampling distribution and large sample behavior of X and S 4. Multivariate normal distribution Reading: AMSA: pages 149-200 Multivariate Analysis, Spring 2016 Institute of Statistics, National Chiao Tung University March 1, 2016 1. Density and properties Brief outline

More information

Vienna University of Technology

Vienna University of Technology Vienna University of Technology Deliverable 4. Final Report Contract with the world bank (1157976) Detecting outliers in household consumption survey data Peter Filzmoser Authors: Johannes Gussenbauer

More information

Robust Tools for the Imperfect World

Robust Tools for the Imperfect World Robust Tools for the Imperfect World Peter Filzmoser a,, Valentin Todorov b a Department of Statistics and Probability Theory, Vienna University of Technology, Wiedner Hauptstr. 8-10, 1040 Vienna, Austria

More information

Finding an unknown number of multivariate outliers

Finding an unknown number of multivariate outliers J. R. Statist. Soc. B (2009) 71, Part 2, pp. Finding an unknown number of multivariate outliers Marco Riani, Università di Parma, Italy Anthony C. Atkinson London School of Economics and Political Science,

More information

Descriptive Data Summarization

Descriptive Data Summarization Descriptive Data Summarization Descriptive data summarization gives the general characteristics of the data and identify the presence of noise or outliers, which is useful for successful data cleaning

More information

Identifying and accounting for outliers and extreme response patterns in latent variable modelling

Identifying and accounting for outliers and extreme response patterns in latent variable modelling Identifying and accounting for outliers and extreme response patterns in latent variable modelling Irini Moustaki Athens University of Economics and Business Outline 1. Define the problem of outliers and

More information

Projection-based outlier detection in functional data

Projection-based outlier detection in functional data Biometrika (2017), xx, x, pp. 1 12 C 2007 Biometrika Trust Printed in Great Britain Projection-based outlier detection in functional data BY HAOJIE REN Institute of Statistics, Nankai University, No.94

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Evaluation of robust PCA for supervised audio outlier detection

Evaluation of robust PCA for supervised audio outlier detection Institut f. Stochastik und Wirtschaftsmathematik 1040 Wien, Wiedner Hauptstr. 8-10/105 AUSTRIA http://www.isw.tuwien.ac.at Evaluation of robust PCA for supervised audio outlier detection S. Brodinova,

More information

x. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ).

x. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ). .8.6 µ =, σ = 1 µ = 1, σ = 1 / µ =, σ =.. 3 1 1 3 x Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ ). The Gaussian distribution Probably the most-important distribution in all of statistics

More information

Modulation of symmetric densities

Modulation of symmetric densities 1 Modulation of symmetric densities 1.1 Motivation This book deals with a formulation for the construction of continuous probability distributions and connected statistical aspects. Before we begin, a

More information

The Stata Journal. Editor Nicholas J. Cox Department of Geography Durham University South Road Durham City DH1 3LE UK

The Stata Journal. Editor Nicholas J. Cox Department of Geography Durham University South Road Durham City DH1 3LE UK The Stata Journal Editor H. Joseph Newton Department of Statistics Texas A&M University College Station, Texas 77843 979-845-8817; fax 979-845-6077 jnewton@stata-journal.com Associate Editors Christopher

More information

Robust repeated median regression in moving windows with data-adaptive width selection SFB 823. Discussion Paper. Matthias Borowski, Roland Fried

Robust repeated median regression in moving windows with data-adaptive width selection SFB 823. Discussion Paper. Matthias Borowski, Roland Fried SFB 823 Robust repeated median regression in moving windows with data-adaptive width selection Discussion Paper Matthias Borowski, Roland Fried Nr. 28/2011 Robust Repeated Median regression in moving

More information

Improvement of The Hotelling s T 2 Charts Using Robust Location Winsorized One Step M-Estimator (WMOM)

Improvement of The Hotelling s T 2 Charts Using Robust Location Winsorized One Step M-Estimator (WMOM) Punjab University Journal of Mathematics (ISSN 1016-2526) Vol. 50(1)(2018) pp. 97-112 Improvement of The Hotelling s T 2 Charts Using Robust Location Winsorized One Step M-Estimator (WMOM) Firas Haddad

More information

Robust multivariate methods in Chemometrics

Robust multivariate methods in Chemometrics Robust multivariate methods in Chemometrics Peter Filzmoser 1 Sven Serneels 2 Ricardo Maronna 3 Pierre J. Van Espen 4 1 Institut für Statistik und Wahrscheinlichkeitstheorie, Technical University of Vienna,

More information

Two Simple Resistant Regression Estimators

Two Simple Resistant Regression Estimators Two Simple Resistant Regression Estimators David J. Olive Southern Illinois University January 13, 2005 Abstract Two simple resistant regression estimators with O P (n 1/2 ) convergence rate are presented.

More information

Development of robust scatter estimators under independent contamination model

Development of robust scatter estimators under independent contamination model Development of robust scatter estimators under independent contamination model C. Agostinelli 1, A. Leung, V.J. Yohai 3 and R.H. Zamar 1 Universita Cà Foscàri di Venezia, University of British Columbia,

More information

TESTS FOR TRANSFORMATIONS AND ROBUST REGRESSION. Anthony Atkinson, 25th March 2014

TESTS FOR TRANSFORMATIONS AND ROBUST REGRESSION. Anthony Atkinson, 25th March 2014 TESTS FOR TRANSFORMATIONS AND ROBUST REGRESSION Anthony Atkinson, 25th March 2014 Joint work with Marco Riani, Parma Department of Statistics London School of Economics London WC2A 2AE, UK a.c.atkinson@lse.ac.uk

More information

Influence Functions for a General Class of Depth-Based Generalized Quantile Functions

Influence Functions for a General Class of Depth-Based Generalized Quantile Functions Influence Functions for a General Class of Depth-Based Generalized Quantile Functions Jin Wang 1 Northern Arizona University and Robert Serfling 2 University of Texas at Dallas June 2005 Final preprint

More information

The Instability of Correlations: Measurement and the Implications for Market Risk

The Instability of Correlations: Measurement and the Implications for Market Risk The Instability of Correlations: Measurement and the Implications for Market Risk Prof. Massimo Guidolin 20254 Advanced Quantitative Methods for Asset Pricing and Structuring Winter/Spring 2018 Threshold

More information

Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression:

Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression: Biost 518 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture utline Choice of Model Alternative Models Effect of data driven selection of

More information

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation Biost 58 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 5: Review Purpose of Statistics Statistics is about science (Science in the broadest

More information

Multivariate Survival Analysis

Multivariate Survival Analysis Multivariate Survival Analysis Previously we have assumed that either (X i, δ i ) or (X i, δ i, Z i ), i = 1,..., n, are i.i.d.. This may not always be the case. Multivariate survival data can arise in

More information

Multivariate quantiles and conditional depth

Multivariate quantiles and conditional depth M. Hallin a,b, Z. Lu c, D. Paindaveine a, and M. Šiman d a Université libre de Bruxelles, Belgium b Princenton University, USA c University of Adelaide, Australia d Institute of Information Theory and

More information

Regression diagnostics

Regression diagnostics Regression diagnostics Kerby Shedden Department of Statistics, University of Michigan November 5, 018 1 / 6 Motivation When working with a linear model with design matrix X, the conventional linear model

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

Correlation and Regression Theory 1) Multivariate Statistics

Correlation and Regression Theory 1) Multivariate Statistics Correlation and Regression Theory 1) Multivariate Statistics What is a multivariate data set? How to statistically analyze this data set? Is there any kind of relationship between different variables in

More information

Computational rank-based statistics

Computational rank-based statistics Article type: Advanced Review Computational rank-based statistics Joseph W. McKean, joseph.mckean@wmich.edu Western Michigan University Jeff T. Terpstra, jeff.terpstra@ndsu.edu North Dakota State University

More information

Parallel Computation of High Dimensional Robust Correlation and Covariance Matrices

Parallel Computation of High Dimensional Robust Correlation and Covariance Matrices Parallel Computation of High Dimensional Robust Correlation and Covariance Matrices James Chilson, Raymond Ng, Alan Wagner Department of Computer Science University of British Columbia Vancouver, BC, Canada,

More information

Detection of Multivariate Outliers in Business Survey Data with Incomplete Information

Detection of Multivariate Outliers in Business Survey Data with Incomplete Information Noname manuscript No. (will be inserted by the editor) Detection of Multivariate Outliers in Business Survey Data with Incomplete Information Valentin Todorov Matthias Templ Peter Filzmoser Received: date

More information