Supplementary Material for Wang and Serfling paper
March 6

1 Simulation study

Here we provide a simulation study to compare empirically the masking and swamping robustness of our selected outlyingness functions. Two parametric parent distributions are considered: the bivariate standard normal, which is symmetric, and the bivariate exponential with mean (0.5, 0.5), which is right-skewed.

1.1 Simulation plan

We generate samples separately from the above two distributions with three different sample sizes, n = 50, 500, and 2000. We consider contamination models with true positive rate ɛ, for ɛ = 0.02, 0.10, 0.25, and 0.40. For each model, the number of replacements in the sample is given by m = nɛ. The numbers of replications are 5000 for n = 50, 2000 for n = 500, and 500 for n = 2000. To choose the sample outlyingness thresholds λ, we take the false positive rates under no contamination to be α = 0.02 for n = 50 and α = 0.01 for n = 500 and n = 2000. That is, under no contamination, we expect about 1, 5, and 20 observations with outlyingness values > λ for n = 50, 500, and 2000, respectively. On this basis, we take as sample outlyingness threshold λ the 2nd largest, the 6th largest, and the 21st largest value of the observed sample outlyingness function for n = 50, 500, and 2000, respectively. For convenience, we index the observations $X_i$, $i = 1, 2, \ldots, n$, in order of their Euclidean norms, from smallest to largest. We explore the masking and swamping robustness of the following four affine invariant sample outlier identifiers treated in Section 4 of the paper:
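The threshold rule above amounts to taking λ as the (k+1)-th largest sample outlyingness value, where k = αn is the expected number of false positives under no contamination. A minimal sketch follows; the function names are hypothetical, and rounding m = nɛ down is an assumption (for n = 50 and ɛ = 0.25, nɛ = 12.5, and the text does not say how this case is handled).

```python
import math
from fractions import Fraction

def expected_false_positives(n: int, alpha: float) -> int:
    """k = alpha * n: observations expected above lambda with no contamination."""
    return round(n * Fraction(str(alpha)))

def sample_threshold(values: list[float], alpha: float) -> float:
    """lambda = the (k+1)-th largest outlyingness value, so about k points exceed it."""
    k = expected_false_positives(len(values), alpha)
    return sorted(values, reverse=True)[k]  # index k in descending order = rank k+1

def n_replacements(n: int, eps: float) -> int:
    """m = n * eps, rounded down (an assumption when n * eps is not an integer)."""
    return math.floor(n * Fraction(str(eps)))

# (n, alpha) settings from the simulation plan
for n, alpha in [(50, 0.02), (500, 0.01), (2000, 0.01)]:
    k = expected_false_positives(n, alpha)
    print(f"n = {n}: about {k} false positive(s); lambda has rank {k + 1} from the top")
```

Exact fractions are used so that products such as 500 × 0.01 are not perturbed by floating-point rounding before the integer conversion.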
Classical Mahalanobis distance outlyingness (CMD), i.e., $O_{MD}(x, \mathbb{X}_n)$ with $(\hat{\mu}, \hat{\Sigma}) = (\bar{X}, S)$.
Robust Mahalanobis distance outlyingness (RMD), i.e., $O_{MD}(x, \mathbb{X}_n)$ with $(\hat{\mu}, \hat{\Sigma})$ given by the MCD estimators. Here these are defined taking subsample size $h = \lfloor (n+d+1)/2 \rfloor$, which for d = 2 yields 26 for n = 50, 251 for n = 500, and 1001 for n = 2000. The robustbase package in R is used to obtain approximate MCD estimators via the Fast-MCD algorithm of Rousseeuw and Van Driessen (1999).
Robust Mahalanobis spatial outlyingness (RMS), i.e., $O_{MS}(x, \mathbb{X}_n)$ with the MCD covariance estimator and the subsample size h chosen as for RMD.
Projection outlyingness (P), i.e., $O_P(x, \mathbb{X}_n)$ with univariate location and scale given by $(\mathrm{Med}, \mathrm{MAD}_{d-1})$. For computational convenience, only a finite number (1000) of randomly chosen projection directions are used in our simulation.

We measure and compare the masking and swamping robustness of the above outlyingness functions using the following two performance indices:
Percent of data points masked among the m outliers. If all m outliers are masked, this indicates masking breakdown.
Percent of data points swamped among the nonoutliers. Here the number of nonoutliers is at most n − m. If all n − m nonoutliers are swamped, this indicates swamping breakdown.

Two scenarios in relation to outliers (similar to Dang and Serfling, 2010) are considered:
A. Replace $X_{n-m+1}, \ldots, X_{n-1}, X_n$ by $KX_{n-m+1}, \ldots, KX_{n-1}, KX_n$ for an inflation factor K = 5. Denote the modified data set by $\mathbb{X}^{(A)}_{n,m}$.
B. Replace $X_{n-m+1}, \ldots, X_{n-1}, X_n$ by $KX_n, \ldots, KX_n, KX_n$, again with inflation factor K = 5. Denote the modified data set by $\mathbb{X}^{(B)}_{n,m}$.

These are illustrated in Figure 1 for the two parent distributions, with sample size n = 50 and m = 5 replacements.
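The CMD and P outlyingness functions listed above can be sketched for d = 2 in plain Python. This is an illustration under stated simplifications, not the code used for the simulation: CMD is returned as the squared Mahalanobis distance (any strictly increasing transform flags the same points when λ is set by rank), P uses the ordinary unscaled MAD rather than $\mathrm{MAD}_{d-1}$, directions are drawn uniformly on the half circle, and all function names are hypothetical. RMD and RMS additionally require an MCD fit (done by the authors with robustbase in R), which is not reproduced here; only the subsample size h is computed.

```python
import math
import random
import statistics

Point = tuple[float, float]

def mcd_subsample_size(n: int, d: int) -> int:
    """Subsample size h = floor((n + d + 1) / 2) used for the MCD estimators."""
    return (n + d + 1) // 2

def mahalanobis_sq(x: Point, data: list[Point]) -> float:
    """Squared Mahalanobis distance of x from the sample mean, scaled by sample covariance S."""
    n = len(data)
    m1 = sum(p[0] for p in data) / n
    m2 = sum(p[1] for p in data) / n
    s11 = sum((p[0] - m1) ** 2 for p in data) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in data) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in data) / (n - 1)
    det = s11 * s22 - s12 * s12
    dx, dy = x[0] - m1, x[1] - m2
    # (x - mean)' S^{-1} (x - mean), via the closed-form inverse of a 2x2 matrix
    return (s22 * dx * dx - 2.0 * s12 * dx * dy + s11 * dy * dy) / det

def projection_outlyingness(x: Point, data: list[Point],
                            n_dirs: int = 1000, seed: int = 0) -> float:
    """Max over n_dirs random unit directions u of |u'x - Med(u'X)| / MAD(u'X)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_dirs):
        # u and -u give the same ratio, so angles in [0, pi) suffice
        theta = rng.uniform(0.0, math.pi)
        u1, u2 = math.cos(theta), math.sin(theta)
        proj = [u1 * p0 + u2 * p1 for p0, p1 in data]
        med = statistics.median(proj)
        mad = statistics.median(abs(v - med) for v in proj)  # ordinary MAD, unscaled
        if mad > 0.0:
            worst = max(worst, abs(u1 * x[0] + u2 * x[1] - med) / mad)
    return worst
```

For example, `mcd_subsample_size(50, 2)` returns 26, matching the value quoted above, and a point such as (5, 5) receives much larger CMD and P values than the origin relative to a standard normal sample. Using a finite set of directions gives a lower bound on the supremum over all directions, which is the trade-off the text accepts for computational convenience.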
Figure 1. Scenarios A and B with sample size 50 and 5 replacements, for the bivariate standard normal distribution (left) and the bivariate exponential(0.5, 0.5) distribution (right). The 5 observations in the original sample with the largest Euclidean norms are marked +. Their replacements in Scenario A and in Scenario B are shown with separate markers. Note that the 5 replacements in Scenario B overlap each other and also one of the Scenario A replacements.

1.2 Simulation results

The following sections present simulation results on masking and swamping for the two distributions.

1.2.1 Results for contaminated bivariate standard normal

Table 1 shows, for each procedure, the average sample outlyingness threshold λ over the replicated samples of sizes 50, 500, and 2000 from the bivariate standard normal. As described earlier, the threshold is chosen according to the false positive rate α under no contamination, which is 0.02 for n = 50 and 0.01 for n = 500 and 2000.
Table 1. Average sample outlyingness thresholds λ for CMD, RMD, RMS and P, with n = 50 (5000 trials), n = 500 (2000 trials) and n = 2000 (500 trials) under no contamination, based on the bivariate normal model.

Masking performance

Table 2 displays the masking robustness of CMD, RMD, RMS and P under Scenarios A and B with the four different contamination levels, based on the standard normal model. Here masking breakdown occurs in a given sample when all m outliers (replaced data points) are masked.

Table 2. Masking performance (average percent masked among the m = nɛ replacement outliers, under Scenarios A and B for ɛ = 0.02, 0.10, 0.25 and 0.40) of CMD, RMD, RMS and P for bivariate standard normal samples with n = 50 (5000 trials), 500 (2000 trials) and 2000 (500 trials). As a benchmark, the MBP column gives the theoretical MBP values. As ɛ increases, masking occurs more frequently, especially when ɛ > MBP. Masking breakdown in a sample occurs when the percent of outliers masked is 100.
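The masking index reported in Table 2 can be sketched together with the Scenario A and B replacements it is evaluated on. This is a hypothetical illustration, not the authors' code: the outlyingness values are taken as given (from any of CMD, RMD, RMS or P), a point is treated as flagged when its value exceeds λ, and after sorting by Euclidean norm the replaced points occupy the last m positions.

```python
import math

Point = tuple[float, float]

def contaminate(data: list[Point], m: int, scenario: str, K: float = 5.0) -> list[Point]:
    """Sort by Euclidean norm, then replace the m largest-norm points per Scenario A or B."""
    xs = sorted(data, key=lambda p: math.hypot(p[0], p[1]))
    n = len(xs)
    if scenario == "A":
        # X_{n-m+1}, ..., X_n  ->  K X_{n-m+1}, ..., K X_n
        tail = [(K * x, K * y) for x, y in xs[n - m:]]
    elif scenario == "B":
        # every replaced point becomes K X_n: one cluster of m identical points
        tail = [(K * xs[-1][0], K * xs[-1][1])] * m
    else:
        raise ValueError("scenario must be 'A' or 'B'")
    return xs[: n - m] + tail

def percent_masked(values: list[float], outlier_idx: list[int], lam: float) -> float:
    """Percent of the replaced points that are NOT flagged (outlyingness <= lambda)."""
    masked = sum(1 for i in outlier_idx if values[i] <= lam)
    return 100.0 * masked / len(outlier_idx)

def masking_breakdown(values: list[float], outlier_idx: list[int], lam: float) -> bool:
    """Masking breakdown: all m replaced points are masked."""
    return percent_masked(values, outlier_idx, lam) == 100.0
```

Because Scenario A also maps $X_n$ to $KX_n$, the Scenario B cluster coincides with one Scenario A replacement, consistent with the overlap noted in the caption of Figure 1.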
Comments based on Table 2. Let us discuss the implications of Table 2 for each of the four procedures and then summarize.

CMD is not robust with respect to masking; its masking performance degrades seriously as the contamination level increases. In particular, when ɛ ≥ 0.25, masking breakdown occurs under Scenario B in all replications, for all three sample sizes. This result is not surprising, since the theoretical MBP of CMD is $n^{-1} \to 0$.

RMD has strong masking robustness overall. No masking whatsoever occurs in either scenario at contamination levels ɛ = 0.02, 0.10 and 0.25, nor in Scenario A at level ɛ = 0.40. On the other hand, in Scenario B at level ɛ = 0.40, the average percent masked is high. These results are in line with the theoretical MBP of RMD, which is $n^{-1}\lfloor (n-d+1)/2 \rfloor \to 1/2$, independently of the sample threshold λ.

RMS has very weak masking robustness for the selected thresholds λ given in Table 1. These thresholds correspond to approximately the 0.98 quantile of the outlyingness function and, inserted into the MBP formula $\min\{n^{-1}\lfloor (n-d+1)/2 \rfloor,\ n^{-1}\lfloor (1-\lambda)n/2 \rfloor\}$ with d = 2, yield MBP values 0.06, 0.03, and 0.03 for n = 50, 500, and 2000, respectively. Accordingly, for very small ɛ = 0.02, which is just under the MBP values, there is no masking in either scenario, whereas for all cases with ɛ ≥ 0.10 significant masking occurs in both scenarios.

P exhibits very strong masking robustness, in keeping with its high MBP. For both scenarios and all contamination levels, virtually no masking occurs. This agrees with the results for RMD except in the case of Scenario B with contamination level ɛ = 0.40, where P significantly outperforms RMD. In Scenario B with a high contamination level, the constraint of elliptical contours is a serious drawback.

Thus, for the standard normal contamination model, the above findings corroborate what is suggested by the theoretical MBP values and add some perspective.
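The MBP formulas quoted in these comments can be evaluated directly, which makes the contrast between the procedures concrete. This sketch assumes the formulas exactly as written above; the function names are hypothetical, and the λ values in the usage note are illustrative only, not the Table 1 thresholds (here λ is the RMS threshold on the spatial outlyingness scale).

```python
import math

def mbp_cmd(n: int) -> float:
    """MBP of the classical Mahalanobis identifier: 1/n, which tends to 0."""
    return 1 / n

def mbp_rmd(n: int, d: int) -> float:
    """MBP of the MCD-based Mahalanobis identifier: floor((n - d + 1) / 2) / n, tending to 1/2."""
    return ((n - d + 1) // 2) / n

def mbp_rms(n: int, d: int, lam: float) -> float:
    """MBP of RMS: the RMD value capped by floor((1 - lam) * n / 2) / n, small when lam is near 1."""
    return min(mbp_rmd(n, d), math.floor((1 - lam) * n / 2) / n)

for n in (50, 500, 2000):
    print(n, mbp_cmd(n), mbp_rmd(n, 2))
```

For example, with n = 50 and d = 2, `mbp_rms(50, 2, 0.5)` gives 0.24 while `mbp_rms(50, 2, 0.9)` gives 0.04: raising the threshold λ toward 1 drives the RMS masking breakdown point down, whereas the RMD value (0.48 for n = 50) does not depend on λ at all.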
Namely, CMD has unacceptable masking robustness and hence should not be used in practice, and RMS has only weak masking robustness due to its low MBP at high sample threshold levels, while RMD and P exhibit strong masking robustness and perform similarly, except that P significantly surpasses RMD in the case of a large cluster of contamination.

Swamping performance

Table 3 displays the swamping robustness of CMD, RMD, RMS and P under Scenarios A and B with the four different contamination levels, based
on the standard normal model. Here swamping breakdown occurs in a given sample when all n − m nonoutliers (nonreplaced data points) are swamped.

Table 3. Swamping performance (average percent swamped among the n − m = n(1 − ɛ) nonoutliers, under Scenarios A and B for ɛ = 0.02, 0.10, 0.25 and 0.40) of CMD, RMD, RMS and P for bivariate standard normal samples with n = 50 (5000 trials), 500 (2000 trials) and 2000 (500 trials). As a benchmark, the SBP column gives the theoretical SBP values. Swamping breakdown in a sample occurs when the percent of nonoutliers swamped is 100.

Comments based on Table 3. Except for Scenario B at contamination level ɛ = 0.40, all four procedures exhibit very low swamping, with CMD also performing very well and RMS moderately well even in this extreme scenario. Thus, for the standard normal contamination model, CMD and RMS perform best with respect to swamping robustness, although for masking robustness they are outperformed by RMD and P. These findings corroborate the high SBP values of the four procedures (especially high for CMD).

1.2.2 Results for contaminated bivariate exponential

Table 4 shows, for each of the four procedures, the average sample outlyingness threshold λ over the replicated samples of sizes 50, 500, and 2000 from the bivariate exponential(0.5, 0.5). Again, the threshold is chosen according to the false positive rate α under no contamination, which is 0.02 for n = 50 and 0.01 for n = 500 and 2000.
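The swamping index and swamping breakdown defined above can be sketched in the same way. As before, this is a hypothetical illustration: outlyingness values are taken as given, a nonoutlier counts as swamped when its value exceeds λ, and the caller supplies the indices of the nonreplaced points.

```python
def percent_swamped(values: list[float], nonoutlier_idx: list[int], lam: float) -> float:
    """Percent of nonreplaced points wrongly flagged (outlyingness > lambda)."""
    swamped = sum(1 for i in nonoutlier_idx if values[i] > lam)
    return 100.0 * swamped / len(nonoutlier_idx)

def swamping_breakdown(values: list[float], nonoutlier_idx: list[int], lam: float) -> bool:
    """Swamping breakdown: every nonoutlier is swamped."""
    return all(values[i] > lam for i in nonoutlier_idx)
```

With the observations indexed by Euclidean norm and the last m replaced, the nonoutliers are simply the first n − m indices, e.g. `list(range(n - m))`.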
Table 4. Average sample outlyingness thresholds λ for CMD, RMD, RMS and P, with n = 50 (5000 trials), n = 500 (2000 trials) and n = 2000 (500 trials) under no contamination, based on the bivariate exponential(0.5, 0.5) model.

Masking performance

Table 5 displays the masking robustness of CMD, RMD, RMS and P under Scenarios A and B with the four different contamination levels, based on the bivariate exponential(0.5, 0.5) model. Here masking breakdown occurs in a given sample when all m outliers (replaced data points) are masked.

Table 5. Masking performance (average percent masked among the m = nɛ replacement outliers, under Scenarios A and B for ɛ = 0.02, 0.10, 0.25 and 0.40) of CMD, RMD, RMS and P for bivariate exponential(0.5, 0.5) samples with n = 50 (5000 trials), 500 (2000 trials) and 2000 (500 trials). The MBP column gives the theoretical MBP values. Masking breakdown in a sample occurs when the percent of outliers masked is 100.
Comments based on Table 5. Both CMD and RMS lack masking robustness, while RMD and P maintain high masking robustness at all contamination levels below the theoretical MBPs. As added perspective, we note that RMD is slightly better than P for Scenario A, while the reverse holds for Scenario B. Of course, when imposing elliptical contours is not appropriate, P is the more suitable choice.

Swamping performance

Table 6 displays the swamping robustness of CMD, RMD, RMS and P under Scenarios A and B with the four different contamination levels, based on the bivariate exponential(0.5, 0.5) model. Swamping breakdown occurs in a given sample when all n − m nonoutliers (nonreplaced data points) are swamped.

Table 6. Swamping performance (average percent swamped among the n − m = n(1 − ɛ) nonoutliers, under Scenarios A and B for ɛ = 0.02, 0.10, 0.25 and 0.40) of CMD, RMD, RMS and P for bivariate exponential(0.5, 0.5) samples with n = 50 (5000 trials), 500 (2000 trials) and 2000 (500 trials). The SBP column gives the theoretical SBP values. Swamping breakdown in a sample occurs when the percent of nonoutliers swamped is 100.

Comments based on Table 6. CMD and RMS exhibit excellent swamping robustness. RMD follows closely, with weakness only in the case of a large cluster of outliers, and P follows RMD competitively.
1.2.3 Practical recommendations based on the simulation results

Although its swamping performance is optimal, CMD has unacceptably low masking performance and hence is not recommended for use in practice. The masking robustness of RMS is weak at high sample thresholds λ, yet it has good swamping performance and its computational burden is relatively light. Both RMD and P are robust with respect to both masking and swamping, with RMD having less computational complexity. It is worth noting, however, that in the contaminated bivariate standard normal model, both the masking performance of RMD and the swamping performance of RMD and P degrade in the presence of a large cluster of outliers. One may select the appropriate outlier identifier according to the preferred balance between masking robustness and swamping robustness, in conjunction with consideration of whether elliptical contours are appropriate, the computational burden, and whether protection against a large cluster is desired. These recommendations, based on results in two specific parametric models for the data, are consistent with the results based on the MBPs and SBPs, which, of course, are nonparametric and have wide general application across diverse data settings.

References

[1] Dang, X. and Serfling, R. (2010). Nonparametric depth-based multivariate outlier identifiers, and masking robustness properties. Journal of Statistical Planning and Inference.
[2] Rousseeuw, P. and Van Driessen, K. (1999). A fast algorithm for the minimum covariance determinant estimator. Technometrics.
Multivariate normal distribution Reading: AMSA: pages 149-200 Multivariate Analysis, Spring 2016 Institute of Statistics, National Chiao Tung University March 1, 2016 1. Density and properties Brief outline
More informationVienna University of Technology
Vienna University of Technology Deliverable 4. Final Report Contract with the world bank (1157976) Detecting outliers in household consumption survey data Peter Filzmoser Authors: Johannes Gussenbauer
More informationRobust Tools for the Imperfect World
Robust Tools for the Imperfect World Peter Filzmoser a,, Valentin Todorov b a Department of Statistics and Probability Theory, Vienna University of Technology, Wiedner Hauptstr. 8-10, 1040 Vienna, Austria
More informationFinding an unknown number of multivariate outliers
J. R. Statist. Soc. B (2009) 71, Part 2, pp. Finding an unknown number of multivariate outliers Marco Riani, Università di Parma, Italy Anthony C. Atkinson London School of Economics and Political Science,
More informationDescriptive Data Summarization
Descriptive Data Summarization Descriptive data summarization gives the general characteristics of the data and identify the presence of noise or outliers, which is useful for successful data cleaning
More informationIdentifying and accounting for outliers and extreme response patterns in latent variable modelling
Identifying and accounting for outliers and extreme response patterns in latent variable modelling Irini Moustaki Athens University of Economics and Business Outline 1. Define the problem of outliers and
More informationProjection-based outlier detection in functional data
Biometrika (2017), xx, x, pp. 1 12 C 2007 Biometrika Trust Printed in Great Britain Projection-based outlier detection in functional data BY HAOJIE REN Institute of Statistics, Nankai University, No.94
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationEvaluation of robust PCA for supervised audio outlier detection
Institut f. Stochastik und Wirtschaftsmathematik 1040 Wien, Wiedner Hauptstr. 8-10/105 AUSTRIA http://www.isw.tuwien.ac.at Evaluation of robust PCA for supervised audio outlier detection S. Brodinova,
More informationx. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ).
.8.6 µ =, σ = 1 µ = 1, σ = 1 / µ =, σ =.. 3 1 1 3 x Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ ). The Gaussian distribution Probably the most-important distribution in all of statistics
More informationModulation of symmetric densities
1 Modulation of symmetric densities 1.1 Motivation This book deals with a formulation for the construction of continuous probability distributions and connected statistical aspects. Before we begin, a
More informationThe Stata Journal. Editor Nicholas J. Cox Department of Geography Durham University South Road Durham City DH1 3LE UK
The Stata Journal Editor H. Joseph Newton Department of Statistics Texas A&M University College Station, Texas 77843 979-845-8817; fax 979-845-6077 jnewton@stata-journal.com Associate Editors Christopher
More informationRobust repeated median regression in moving windows with data-adaptive width selection SFB 823. Discussion Paper. Matthias Borowski, Roland Fried
SFB 823 Robust repeated median regression in moving windows with data-adaptive width selection Discussion Paper Matthias Borowski, Roland Fried Nr. 28/2011 Robust Repeated Median regression in moving
More informationImprovement of The Hotelling s T 2 Charts Using Robust Location Winsorized One Step M-Estimator (WMOM)
Punjab University Journal of Mathematics (ISSN 1016-2526) Vol. 50(1)(2018) pp. 97-112 Improvement of The Hotelling s T 2 Charts Using Robust Location Winsorized One Step M-Estimator (WMOM) Firas Haddad
More informationRobust multivariate methods in Chemometrics
Robust multivariate methods in Chemometrics Peter Filzmoser 1 Sven Serneels 2 Ricardo Maronna 3 Pierre J. Van Espen 4 1 Institut für Statistik und Wahrscheinlichkeitstheorie, Technical University of Vienna,
More informationTwo Simple Resistant Regression Estimators
Two Simple Resistant Regression Estimators David J. Olive Southern Illinois University January 13, 2005 Abstract Two simple resistant regression estimators with O P (n 1/2 ) convergence rate are presented.
More informationDevelopment of robust scatter estimators under independent contamination model
Development of robust scatter estimators under independent contamination model C. Agostinelli 1, A. Leung, V.J. Yohai 3 and R.H. Zamar 1 Universita Cà Foscàri di Venezia, University of British Columbia,
More informationTESTS FOR TRANSFORMATIONS AND ROBUST REGRESSION. Anthony Atkinson, 25th March 2014
TESTS FOR TRANSFORMATIONS AND ROBUST REGRESSION Anthony Atkinson, 25th March 2014 Joint work with Marco Riani, Parma Department of Statistics London School of Economics London WC2A 2AE, UK a.c.atkinson@lse.ac.uk
More informationInfluence Functions for a General Class of Depth-Based Generalized Quantile Functions
Influence Functions for a General Class of Depth-Based Generalized Quantile Functions Jin Wang 1 Northern Arizona University and Robert Serfling 2 University of Texas at Dallas June 2005 Final preprint
More informationThe Instability of Correlations: Measurement and the Implications for Market Risk
The Instability of Correlations: Measurement and the Implications for Market Risk Prof. Massimo Guidolin 20254 Advanced Quantitative Methods for Asset Pricing and Structuring Winter/Spring 2018 Threshold
More informationLecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression:
Biost 518 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture utline Choice of Model Alternative Models Effect of data driven selection of
More informationBiost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation
Biost 58 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 5: Review Purpose of Statistics Statistics is about science (Science in the broadest
More informationMultivariate Survival Analysis
Multivariate Survival Analysis Previously we have assumed that either (X i, δ i ) or (X i, δ i, Z i ), i = 1,..., n, are i.i.d.. This may not always be the case. Multivariate survival data can arise in
More informationMultivariate quantiles and conditional depth
M. Hallin a,b, Z. Lu c, D. Paindaveine a, and M. Šiman d a Université libre de Bruxelles, Belgium b Princenton University, USA c University of Adelaide, Australia d Institute of Information Theory and
More informationRegression diagnostics
Regression diagnostics Kerby Shedden Department of Statistics, University of Michigan November 5, 018 1 / 6 Motivation When working with a linear model with design matrix X, the conventional linear model
More informationIntroduction to Statistical Analysis
Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive
More informationCorrelation and Regression Theory 1) Multivariate Statistics
Correlation and Regression Theory 1) Multivariate Statistics What is a multivariate data set? How to statistically analyze this data set? Is there any kind of relationship between different variables in
More informationComputational rank-based statistics
Article type: Advanced Review Computational rank-based statistics Joseph W. McKean, joseph.mckean@wmich.edu Western Michigan University Jeff T. Terpstra, jeff.terpstra@ndsu.edu North Dakota State University
More informationParallel Computation of High Dimensional Robust Correlation and Covariance Matrices
Parallel Computation of High Dimensional Robust Correlation and Covariance Matrices James Chilson, Raymond Ng, Alan Wagner Department of Computer Science University of British Columbia Vancouver, BC, Canada,
More informationDetection of Multivariate Outliers in Business Survey Data with Incomplete Information
Noname manuscript No. (will be inserted by the editor) Detection of Multivariate Outliers in Business Survey Data with Incomplete Information Valentin Todorov Matthias Templ Peter Filzmoser Received: date
More information