Supplementary Material for Wang and Serfling paper

March 6,

1 Simulation study

Here we provide a simulation study to compare empirically the masking and swamping robustness of our selected outlyingness functions. Two parametric parent distributions are considered: the bivariate standard normal, which is symmetric, and the bivariate exponential with mean (0.5, 0.5), which is right-skewed.

1.1 Simulation plan

We generate samples separately from the above two distributions with three different sample sizes, n = 50, 500, and 2000. We consider contamination models with true positive rate ε, for ε = 0.02, 0.1, 0.25, and 0.4. For each model, the number of replacements in the sample is given by m = nε. The numbers of replications are chosen to be 5000 for n = 50, 2000 for n = 500, and 500 for n = 2000.

To choose the sample outlyingness thresholds λ, we assume false positive rates under no contamination given by α = 0.02 for n = 50, and α = 0.01 for n = 500 and n = 2000. That is, under no contamination, we expect about 1 observation, 5 observations, and 20 observations with outlyingness values > λ for n = 50, 500, and 2000, respectively. On this basis, we take as sample outlyingness threshold λ the 2nd largest, the 6th largest, and the 21st largest value of the observed sample outlyingness function, for n = 50, 500, and 2000, respectively.

For convenience, we index $X_i$, i = 1, 2, ..., n, in order of their Euclidean norms, from smallest to largest.

We explore masking and swamping robustness of the following four affine invariant sample outlier identifiers treated in Section 4 of the paper:

- Classical Mahalanobis distance outlyingness (CMD), i.e., $O_{MD}(x, X_n)$ with $(\hat{\mu}, \hat{\Sigma}) = (\bar{X}, S)$.
- Robust Mahalanobis distance outlyingness (RMD), i.e., $O_{MD}(x, X_n)$ with $(\hat{\mu}, \hat{\Sigma})$ given by the MCD estimators. Here these are defined taking subsample size $h = \lfloor (n+d+1)/2 \rfloor$, which for d = 2 yields 26 for n = 50, 251 for n = 500, and 1001 for n = 2000. The robustbase package in R is used to obtain approximate MCD estimators via the Fast-MCD algorithm of Rousseeuw and Van Driessen (1999).
- Robust Mahalanobis spatial outlyingness (RMS), i.e., $O_{MS}(x, X_n)$ with the MCD covariance estimator and the subsample size h chosen as for RMD.
- Projection outlyingness (P), i.e., $O_P(x, X_n)$ with univariate location and scale given by $(\mathrm{Med}, \mathrm{MAD}_{d-1})$. For computational convenience, only a finite number, 1000, of randomly chosen projection directions are used in our simulation.

We measure and compare the masking and swamping robustness of the above outlyingness functions using the following two performance indices:

- Percent of data points masked among the m outliers. If all m outliers are masked, this indicates masking breakdown.
- Percent of data points swamped among the nonoutliers. Here the number of nonoutliers is at most n − m. If all n − m nonoutliers are swamped, this indicates swamping breakdown.

Two scenarios in relation to outliers (similar to Dang and Serfling, 2010) are considered:

A. Replace $X_{n-m+1}, \ldots, X_{n-1}, X_n$ by $KX_{n-m+1}, \ldots, KX_{n-1}, KX_n$ for an inflation factor K = 5. Denote the modified data set by $X^{(A)}_{n,m}$.
B. Replace $X_{n-m+1}, \ldots, X_{n-1}, X_n$ by $KX_n, \ldots, KX_n, KX_n$ for inflation factor K again = 5. Denote the modified data set by $X^{(B)}_{n,m}$.

These are illustrated in Figure 1 for the two parent distributions, with sample size n = 50 and m = 5 replacements.
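The two replacement scenarios and the two performance indices can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the simulation: the function names `sort_by_norm`, `contaminate`, and `masking_and_swamping` are hypothetical, points are represented as plain 2-tuples, and the swamping percentage is taken over exactly n − m nonoutliers, as in the table headings.

```python
import math
import random


def sort_by_norm(points):
    """Index points in order of Euclidean norm, smallest to largest."""
    return sorted(points, key=lambda p: math.hypot(p[0], p[1]))


def contaminate(points, m, scenario, k=5.0):
    """Replace the m points of largest norm according to Scenario A or B."""
    ordered = sort_by_norm(points)
    n = len(ordered)
    kept = ordered[: n - m]
    if scenario == "A":
        # Each of the m largest points is inflated by the factor k.
        replaced = [(k * x, k * y) for x, y in ordered[n - m:]]
    elif scenario == "B":
        # All m replacements coincide at k times the single largest point.
        x_n, y_n = ordered[-1]
        replaced = [(k * x_n, k * y_n)] * m
    else:
        raise ValueError("scenario must be 'A' or 'B'")
    return kept + replaced


def masking_and_swamping(flags, m):
    """Percent masked among the m outliers, percent swamped among the rest.

    flags[i] is True when point i is declared an outlier; the last m
    entries correspond to the replacement outliers.
    """
    n = len(flags)
    masked = sum(1 for f in flags[n - m:] if not f)
    swamped = sum(1 for f in flags[: n - m] if f)
    return 100.0 * masked / m, 100.0 * swamped / (n - m)


# Illustration with one bivariate standard normal sample, n = 50, m = 5.
random.seed(0)
sample = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]
data_a = contaminate(sample, m=5, scenario="A")
data_b = contaminate(sample, m=5, scenario="B")
print(len(data_a), len(set(data_b[-5:])))
```

In Scenario B the m replacements form a single tight cluster, which is the configuration referred to later as a large cluster of outliers.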

Figure 1. Scenarios A and B with sample size 50 and 5 replacements, for the bivariate standard normal distribution (left) and the bivariate exponential(0.5, 0.5) distribution (right). The 5 observations in the original sample with largest Euclidean norms are marked +. Their replacements in Scenario A and in Scenario B are each marked with a distinct symbol. Note that the 5 replacements in Scenario B overlap each other and also one of the Scenario A replacements.

1.2 Simulation results

The following sections present simulation results on masking and swamping for the two distributions, respectively.

1.2.1 Results for contaminated bivariate standard normal

Table 1 shows for each procedure the average sample outlyingness thresholds λ for the replicated samples of sizes 50, 500, and 2000 from the bivariate standard normal. As described earlier, the threshold is chosen according to the false positive rate α under no contamination, which is 0.02 for n = 50 and 0.01 for n = 500 and 2000.
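The threshold rule just recalled can be written out as a short sketch. The function names `threshold_rank` and `sample_threshold` are hypothetical, and rounding nα to the nearest integer is an assumption made here for illustration; the idea is simply that λ is the (nα + 1)-th largest sample outlyingness value, so that about nα observations lie strictly above it under no contamination.

```python
def threshold_rank(n, alpha):
    """Rank, counted from the largest value, of the threshold lambda.

    About n * alpha observations are expected above the threshold under
    no contamination, so the threshold is the (n * alpha + 1)-th largest.
    """
    expected_flagged = round(n * alpha)  # rounding is an assumption
    return expected_flagged + 1


def sample_threshold(outlyingness_values, alpha):
    """Return the threshold lambda for a list of outlyingness values."""
    rank = threshold_rank(len(outlyingness_values), alpha)
    return sorted(outlyingness_values, reverse=True)[rank - 1]


for n, alpha in [(50, 0.02), (500, 0.01), (2000, 0.01)]:
    print(n, alpha, threshold_rank(n, alpha))
```

For the three sample sizes this gives ranks 2, 6, and 21, matching the 2nd, 6th, and 21st largest values used in the simulation plan.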

Table 1 (rows: CMD, RMD, RMS, P; columns: n = 50 with 5000 trials, n = 500 with 2000 trials, n = 2000 with 500 trials). Average sample outlyingness thresholds λ for CMD, RMD, RMS and P, with n = 50, 500 and 2000 under no contamination, based on the bivariate normal model.

Masking performance

Table 2 displays the masking robustness of CMD, RMD, RMS and P under Scenarios A and B with the four different contamination levels, based on the standard normal model. Here masking breakdown occurs in a given sample when all m outliers (replaced data points) are masked.

Table 2 (average percent (%) masked among m = nε replacement outliers; rows: CMD, RMD, RMS, P for each of n = 50, 500, 2000; columns: Scenarios A and B at each of ε = 0.02, 0.10, 0.25, 0.40, plus MBP). Masking performance of CMD, RMD, RMS and P for bivariate standard normal samples with n = 50, 500 and 2000. As a benchmark, the MBP column gives the theoretical MBP values. As ε increases, masking occurs more frequently, especially when ε > MBP. Masking breakdown in a sample occurs when the percent of outliers masked is 100.
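As a rough check on the benchmark for RMD, the theoretical MBP of the MCD-based identifier can be evaluated for the three sample sizes. The sketch below reads the formula as $n^{-1}\lfloor (n-d+1)/2 \rfloor$ and the MCD subsample size as $h = \lfloor (n+d+1)/2 \rfloor$; the floor brackets are an assumption here, and the function names `mcd_subsample_size` and `rmd_mbp` are hypothetical.

```python
def mcd_subsample_size(n, d):
    """Subsample size h = floor((n + d + 1) / 2) used for the MCD."""
    return (n + d + 1) // 2


def rmd_mbp(n, d):
    """Theoretical MBP of RMD, read as floor((n - d + 1) / 2) / n."""
    return ((n - d + 1) // 2) / n


for n in (50, 500, 2000):
    print(n, mcd_subsample_size(n, 2), rmd_mbp(n, 2))
```

Under this reading, every contamination level in the study, including ε = 0.40, lies below the MBP of RMD, which is close to 1/2 for all three sample sizes.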

Comments based on Table 2. Let us discuss the implications of Table 2 for each of the four procedures and then summarize.

- CMD is not robust with respect to masking, its masking performance degrading seriously as the contamination level increases. In particular, when ε ≥ 0.25, masking breakdown occurs under Scenario B of outlier replacements in all replications, for all three sample sizes. This result is not surprising since the theoretical MBP of CMD is $n^{-1} \approx 0$.
- RMD has strong masking robustness overall. No masking whatsoever occurs in either scenario at contamination levels ε = 0.02, 0.10 and 0.25, nor in Scenario A at level ε = 0.40. On the other hand, in Scenario B at level ε = 0.40, the average percent masked is high. These results are in line with the theoretical MBP of RMD, which is $n^{-1}\lfloor (n-d+1)/2 \rfloor \approx 1/2$, independently of the sample threshold λ.
- RMS has very weak masking robustness for the selected thresholds λ given in Table 1, which correspond to approximately the 0.98 quantile of the outlyingness function. Inserted into the MBP formula $\min\{ n^{-1}\lfloor (n-d+1)/2 \rfloor,\; n^{-1}\lfloor (1-\lambda)n/2 \rfloor \}$ with d = 2, these thresholds yield MBP values 0.06, 0.03, and 0.03, respectively, for n = 50, 500, and 2000. Accordingly, for very small ε = 0.02, which is just under the MBP values, there is no masking for either scenario, whereas for all cases of ε ≥ 0.10 significant levels of masking occur for both scenarios.
- P exhibits very strong robustness, in keeping with its high MBP. For both scenarios and all contamination levels, virtually no masking occurs. This agrees with the results for RMD except in the case of Scenario B with contamination level ε = 0.40, where P significantly outperforms RMD. In Scenario B with high contamination level, the constraint of elliptical contours is a serious drawback.

Thus, for the standard normal contamination model, the above findings corroborate what is suggested by the theoretical MBP values and add some perspective. Namely, CMD has unacceptable masking robustness and hence should not be used in practice, and RMS has only weak masking robustness due to its low MBP at high sample threshold levels, while RMD and P exhibit strong masking robustness and perform similarly, except that P significantly surpasses RMD in the case of a large cluster contamination.

Swamping performance

Table 3 displays the swamping robustness of CMD, RMD, RMS and P under Scenarios A and B with the four different contamination levels, based on the standard normal model. Here swamping breakdown occurs in a given sample when all n − m nonoutliers (nonreplaced data points) are swamped.

Table 3 (average percent (%) swamped among n − m = n(1 − ε) nonoutliers; rows: CMD, RMD, RMS, P for each of n = 50, 500, 2000; columns: Scenarios A and B at each of ε = 0.02, 0.10, 0.25, 0.40, plus SBP). Swamping performance of CMD, RMD, RMS and P for bivariate standard normal samples with n = 50, 500 and 2000. As a benchmark, the SBP column gives the theoretical SBP values. Swamping breakdown in a sample occurs when the percent of nonoutliers swamped is 100.

Comments based on Table 3. Except for Scenario B at contamination level ε = 0.40, all four procedures exhibit very low swamping, with CMD also performing very well and RMS moderately well even for this extreme scenario. Thus, for the standard normal contamination model, CMD and RMS perform best with respect to swamping robustness, although for masking robustness they are outperformed by RMD and P. These findings corroborate the high SBP values of the four procedures (especially high for CMD).

1.2.2 Results for contaminated bivariate exponential

Table 4 shows for each of the four procedures the average sample outlyingness thresholds λ for the replicated samples of sizes 50, 500, and 2000 from the bivariate exponential(0.5, 0.5). Again, the threshold is chosen according to the false positive rate α under no contamination, which is 0.02 for n = 50 and 0.01 for n = 500 and 2000.

Table 4 (rows: CMD, RMD, RMS, P; columns: n = 50 with 5000 trials, n = 500 with 2000 trials, n = 2000 with 500 trials). Average sample outlyingness thresholds λ for CMD, RMD, RMS and P, with n = 50, 500 and 2000 under no contamination, based on the bivariate exponential(0.5, 0.5) model.

Masking performance

Table 5 displays the masking robustness of CMD, RMD, RMS and P under Scenarios A and B with the four different contamination levels, based on the bivariate exponential(0.5, 0.5) model. Here masking breakdown occurs in a given sample when all m outliers (replaced data points) are masked.

Table 5 (average percent (%) masked among m = nε replacement outliers; rows: CMD, RMD, RMS, P for each of n = 50, 500, 2000; columns: Scenarios A and B at each of ε = 0.02, 0.10, 0.25, 0.40, plus MBP). Masking performance of CMD, RMD, RMS and P for bivariate exponential(0.5, 0.5) samples with n = 50, 500 and 2000. The MBP column gives the theoretical MBP values. Masking breakdown in a sample occurs when the percent of outliers masked is 100.

Comments based on Table 5. Both CMD and RMS lack masking robustness, while RMD and P maintain high masking robustness, with all contamination levels below the theoretical MBPs. As added perspective, we note that RMD is slightly better than P for Scenario A, while the reverse holds for Scenario B. Of course, when imposing elliptical contours is not appropriate, P is the more suitable choice.

Swamping performance

Table 6 displays the swamping robustness of CMD, RMD, RMS and P under Scenarios A and B with the four different contamination levels, based on the bivariate exponential(0.5, 0.5) model. Swamping breakdown occurs in a given sample when all n − m nonoutliers (nonreplaced data points) are swamped.

Table 6 (average percent (%) swamped among n − m = n(1 − ε) nonoutliers; rows: CMD, RMD, RMS, P for each of n = 50, 500, 2000; columns: Scenarios A and B at each of ε = 0.02, 0.1, 0.25, 0.4, plus SBP). Swamping performance of CMD, RMD, RMS and P for bivariate exponential(0.5, 0.5) samples with n = 50, 500 and 2000. The SBP column gives the theoretical SBP values. Swamping breakdown in a sample occurs when the percent of nonoutliers swamped is 100.

Comments based on Table 6. CMD and RMS exhibit excellent swamping robustness. RMD follows closely, with weakness only in the case of a large cluster of outliers, and P follows RMD competitively.

1.2.3 Practical recommendations based on the simulation results

Although its swamping performance is optimal, CMD has unacceptably low masking performance and hence is not recommended for use in practice. The masking robustness of RMS is weak at high sample thresholds λ, yet it has good swamping performance and its computational burden is relatively light. Both RMD and P are robust with respect to both masking and swamping, with RMD having less computational complexity. It is worth noting, however, that in the contaminated bivariate standard normal model, both the masking performance of RMD and the swamping performance of RMD and P degrade in the presence of a large cluster of outliers.

One may select the appropriate outlier identifier according to the preferred balance between masking robustness and swamping robustness, in conjunction with consideration of whether elliptical contours are appropriate, the computational burden, and whether protection against a large cluster is desired.

These recommendations, based on results in two specific parametric models for the data, are consistent with the results based on the MBPs and SBPs, which, of course, are nonparametric and have wide general application across diverse data settings.

References

[1] Dang, X. and Serfling, R. (2010). Nonparametric depth-based multivariate outlier identifiers, and masking robustness properties. Journal of Statistical Planning and Inference
[2] Rousseeuw, P. and Van Driessen, K. (1999). A fast algorithm for the minimum covariance determinant estimator. Technometrics


More information

Why some measures of fluctuating asymmetry are so sensitive to measurement error

Why some measures of fluctuating asymmetry are so sensitive to measurement error Ann. Zool. Fennici 34: 33 37 ISSN 0003-455X Helsinki 7 May 997 Finnish Zoological and Botanical Publishing Board 997 Commentary Why some measures of fluctuating asymmetry are so sensitive to measurement

More information

On robust and efficient estimation of the center of. Symmetry.

On robust and efficient estimation of the center of. Symmetry. On robust and efficient estimation of the center of symmetry Howard D. Bondell Department of Statistics, North Carolina State University Raleigh, NC 27695-8203, U.S.A (email: bondell@stat.ncsu.edu) Abstract

More information

Comparing the performance of modified F t statistic with ANOVA and Kruskal Wallis test

Comparing the performance of modified F t statistic with ANOVA and Kruskal Wallis test Appl. Math. Inf. Sci. 7, No. 2L, 403-408 (2013) 403 Applied Mathematics & Information Sciences An International ournal http://dx.doi.org/10.12785/amis/072l04 Comparing the performance of modified F t statistic

More information

The energy Package. October 3, Title E-statistics (energy statistics) tests of fit, independence, clustering

The energy Package. October 3, Title E-statistics (energy statistics) tests of fit, independence, clustering The energy Package October 3, 2005 Title E-statistics (energy statistics) tests of fit, independence, clustering Version 1.0-3 Date 2005-10-02 Author Maria L. Rizzo and Gabor J.

More information

System Monitoring with Real-Time Contrasts

System Monitoring with Real-Time Contrasts System Monitoring with Real- Contrasts HOUTAO DENG Intuit, Mountain View, CA 94043, USA GEORGE RUNGER Arizona State University, Tempe, AZ 85287, USA EUGENE TUV Intel Corporation, Chandler, AZ 85226, USA

More information

CoDa-dendrogram: A new exploratory tool. 2 Dept. Informàtica i Matemàtica Aplicada, Universitat de Girona, Spain;

CoDa-dendrogram: A new exploratory tool. 2 Dept. Informàtica i Matemàtica Aplicada, Universitat de Girona, Spain; CoDa-dendrogram: A new exploratory tool J.J. Egozcue 1, and V. Pawlowsky-Glahn 2 1 Dept. Matemàtica Aplicada III, Universitat Politècnica de Catalunya, Barcelona, Spain; juan.jose.egozcue@upc.edu 2 Dept.

More information

A Brief Overview of Robust Statistics

A Brief Overview of Robust Statistics A Brief Overview of Robust Statistics Olfa Nasraoui Department of Computer Engineering & Computer Science University of Louisville, olfa.nasraoui_at_louisville.edu Robust Statistical Estimators Robust

More information

Robust Maximum Association Between Data Sets: The R Package ccapp

Robust Maximum Association Between Data Sets: The R Package ccapp Robust Maximum Association Between Data Sets: The R Package ccapp Andreas Alfons Erasmus Universiteit Rotterdam Christophe Croux KU Leuven Peter Filzmoser Vienna University of Technology Abstract This

More information

A Modified M-estimator for the Detection of Outliers

A Modified M-estimator for the Detection of Outliers A Modified M-estimator for the Detection of Outliers Asad Ali Department of Statistics, University of Peshawar NWFP, Pakistan Email: asad_yousafzay@yahoo.com Muhammad F. Qadir Department of Statistics,

More information

Comparing Non-informative Priors for Estimation and. Prediction in Spatial Models

Comparing Non-informative Priors for Estimation and. Prediction in Spatial Models Comparing Non-informative Priors for Estimation and Prediction in Spatial Models Vigre Semester Report by: Regina Wu Advisor: Cari Kaufman January 31, 2010 1 Introduction Gaussian random fields with specified

More information

Detection of Anomalies in Texture Images using Multi-Resolution Features

Detection of Anomalies in Texture Images using Multi-Resolution Features Detection of Anomalies in Texture Images using Multi-Resolution Features Electrical Engineering Department Supervisor: Prof. Israel Cohen Outline Introduction 1 Introduction Anomaly Detection Texture Segmentation

More information

THE BREAKDOWN POINT EXAMPLES AND COUNTEREXAMPLES

THE BREAKDOWN POINT EXAMPLES AND COUNTEREXAMPLES REVSTAT Statistical Journal Volume 5, Number 1, March 2007, 1 17 THE BREAKDOWN POINT EXAMPLES AND COUNTEREXAMPLES Authors: P.L. Davies University of Duisburg-Essen, Germany, and Technical University Eindhoven,

More information

HOTELLING S CHARTS USING WINSORIZED MODIFIED ONE STEP M-ESTIMATOR FOR INDIVIDUAL NON NORMAL DATA

HOTELLING S CHARTS USING WINSORIZED MODIFIED ONE STEP M-ESTIMATOR FOR INDIVIDUAL NON NORMAL DATA 20 th February 205. Vol.72 No.2 2005-205 JATIT & LLS. All rights reserved. ISSN: 992-8645 www.jatit.org E-ISSN: 87-395 HOTELLING S CHARTS USING WINSORIZED MODIFIED ONE STEP M-ESTIMATOR FOR INDIVIDUAL NON

More information

Surprise Detection in Science Data Streams Kirk Borne Dept of Computational & Data Sciences George Mason University

Surprise Detection in Science Data Streams Kirk Borne Dept of Computational & Data Sciences George Mason University Surprise Detection in Science Data Streams Kirk Borne Dept of Computational & Data Sciences George Mason University kborne@gmu.edu, http://classweb.gmu.edu/kborne/ Outline Astroinformatics Example Application:

More information

The energy Package. October 28, 2007

The energy Package. October 28, 2007 The energy Package October 28, 2007 Title E-statistics (energy statistics) tests of fit, independence, clustering Version 1.0-7 Date 2007-10-27 Author Maria L. Rizzo and Gabor

More information

Intraclass Correlations in One-Factor Studies

Intraclass Correlations in One-Factor Studies CHAPTER Intraclass Correlations in One-Factor Studies OBJECTIVE The objective of this chapter is to present methods and techniques for calculating the intraclass correlation coefficient and associated

More information

Dealing With Zeros and Missing Values in Compositional Data Sets Using Nonparametric Imputation 1

Dealing With Zeros and Missing Values in Compositional Data Sets Using Nonparametric Imputation 1 Mathematical Geology, Vol. 35, No. 3, April 2003 ( C 2003) Dealing With Zeros and Missing Values in Compositional Data Sets Using Nonparametric Imputation 1 J. A. Martín-Fernández, 2 C. Barceló-Vidal,

More information

On Parameter-Mixing of Dependence Parameters

On Parameter-Mixing of Dependence Parameters On Parameter-Mixing of Dependence Parameters by Murray D Smith and Xiangyuan Tommy Chen 2 Econometrics and Business Statistics The University of Sydney Incomplete Preliminary Draft May 9, 2006 (NOT FOR

More information

Variable Selection for Highly Correlated Predictors

Variable Selection for Highly Correlated Predictors Variable Selection for Highly Correlated Predictors Fei Xue and Annie Qu arxiv:1709.04840v1 [stat.me] 14 Sep 2017 Abstract Penalty-based variable selection methods are powerful in selecting relevant covariates

More information

Estimating Coefficients in Linear Models: It Don't Make No Nevermind

Estimating Coefficients in Linear Models: It Don't Make No Nevermind Psychological Bulletin 1976, Vol. 83, No. 2. 213-217 Estimating Coefficients in Linear Models: It Don't Make No Nevermind Howard Wainer Department of Behavioral Science, University of Chicago It is proved

More information

FACTOR ANALYSIS AND MULTIDIMENSIONAL SCALING

FACTOR ANALYSIS AND MULTIDIMENSIONAL SCALING FACTOR ANALYSIS AND MULTIDIMENSIONAL SCALING Vishwanath Mantha Department for Electrical and Computer Engineering Mississippi State University, Mississippi State, MS 39762 mantha@isip.msstate.edu ABSTRACT

More information

On Selecting Tests for Equality of Two Normal Mean Vectors

On Selecting Tests for Equality of Two Normal Mean Vectors MULTIVARIATE BEHAVIORAL RESEARCH, 41(4), 533 548 Copyright 006, Lawrence Erlbaum Associates, Inc. On Selecting Tests for Equality of Two Normal Mean Vectors K. Krishnamoorthy and Yanping Xia Department

More information

Application and Use of Multivariate Control Charts In a BTA Deep Hole Drilling Process

Application and Use of Multivariate Control Charts In a BTA Deep Hole Drilling Process Application and Use of Multivariate Control Charts In a BTA Deep Hole Drilling Process Amor Messaoud, Winfied Theis, Claus Weihs, and Franz Hering Fachbereich Statistik, Universität Dortmund, 44221 Dortmund,

More information

CONTROL CHARTS FOR MULTIVARIATE NONLINEAR TIME SERIES

CONTROL CHARTS FOR MULTIVARIATE NONLINEAR TIME SERIES REVSTAT Statistical Journal Volume 13, Number, June 015, 131 144 CONTROL CHARTS FOR MULTIVARIATE NONLINEAR TIME SERIES Authors: Robert Garthoff Department of Statistics, European University, Große Scharrnstr.

More information

Independent Component (IC) Models: New Extensions of the Multinormal Model

Independent Component (IC) Models: New Extensions of the Multinormal Model Independent Component (IC) Models: New Extensions of the Multinormal Model Davy Paindaveine (joint with Klaus Nordhausen, Hannu Oja, and Sara Taskinen) School of Public Health, ULB, April 2008 My research

More information

Robust Performance Hypothesis Testing with the Variance. Institute for Empirical Research in Economics University of Zurich

Robust Performance Hypothesis Testing with the Variance. Institute for Empirical Research in Economics University of Zurich Institute for Empirical Research in Economics University of Zurich Working Paper Series ISSN 1424-0459 Working Paper No. 516 Robust Performance Hypothesis Testing with the Variance Olivier Ledoit and Michael

More information

Bounding the number of affine roots

Bounding the number of affine roots with applications in reliable and secure communication Inaugural Lecture, Aalborg University, August 11110, 11111100000 with applications in reliable and secure communication Polynomials: F (X ) = 2X 2

More information

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model

More information

A nonparametric two-sample wald test of equality of variances

A nonparametric two-sample wald test of equality of variances University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 211 A nonparametric two-sample wald test of equality of variances David

More information

An Empirical Study of Probability Elicitation under Noisy-OR Assumption

An Empirical Study of Probability Elicitation under Noisy-OR Assumption An Empirical Study of Probability Elicitation under Noisy-OR Assumption Adam Zagorecki and Marek Druzdzel Decision Systems Laboratory School of Information Science University of Pittsburgh, Pittsburgh,

More information

A NONPARAMETRIC TEST FOR HOMOGENEITY: APPLICATIONS TO PARAMETER ESTIMATION

A NONPARAMETRIC TEST FOR HOMOGENEITY: APPLICATIONS TO PARAMETER ESTIMATION Change-point Problems IMS Lecture Notes - Monograph Series (Volume 23, 1994) A NONPARAMETRIC TEST FOR HOMOGENEITY: APPLICATIONS TO PARAMETER ESTIMATION BY K. GHOUDI AND D. MCDONALD Universite' Lava1 and

More information

Applications of Information Geometry to Hypothesis Testing and Signal Detection

Applications of Information Geometry to Hypothesis Testing and Signal Detection CMCAA 2016 Applications of Information Geometry to Hypothesis Testing and Signal Detection Yongqiang Cheng National University of Defense Technology July 2016 Outline 1. Principles of Information Geometry

More information

Robust and Efficient Fitting of the Generalized Pareto Distribution with Actuarial Applications in View

Robust and Efficient Fitting of the Generalized Pareto Distribution with Actuarial Applications in View Robust and Efficient Fitting of the Generalized Pareto Distribution with Actuarial Applications in View Vytaras Brazauskas 1 University of Wisconsin-Milwaukee Andreas Kleefeld 2 University of Wisconsin-Milwaukee

More information

Robust Outcome Analysis for Observational Studies Designed Using Propensity Score Matching

Robust Outcome Analysis for Observational Studies Designed Using Propensity Score Matching The work of Kosten and McKean was partially supported by NIAAA Grant 1R21AA017906-01A1 Robust Outcome Analysis for Observational Studies Designed Using Propensity Score Matching Bradley E. Huitema Western

More information

Analysis of Cross-Sectional Data

Analysis of Cross-Sectional Data Analysis of Cross-Sectional Data Kevin Sheppard http://www.kevinsheppard.com Oxford MFE This version: November 8, 2017 November 13 14, 2017 Outline Econometric models Specification that can be analyzed

More information

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite

More information

PRINCIPLES OF STATISTICAL INFERENCE

PRINCIPLES OF STATISTICAL INFERENCE Advanced Series on Statistical Science & Applied Probability PRINCIPLES OF STATISTICAL INFERENCE from a Neo-Fisherian Perspective Luigi Pace Department of Statistics University ofudine, Italy Alessandra

More information

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007) FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter

More information

A Statistical Perspective on Algorithmic Leveraging

A Statistical Perspective on Algorithmic Leveraging Ping Ma PINGMA@UGA.EDU Department of Statistics, University of Georgia, Athens, GA 30602 Michael W. Mahoney MMAHONEY@ICSI.BERKELEY.EDU International Computer Science Institute and Dept. of Statistics,

More information

Generalized Impulse Response Analysis: General or Extreme?

Generalized Impulse Response Analysis: General or Extreme? Auburn University Department of Economics Working Paper Series Generalized Impulse Response Analysis: General or Extreme? Hyeongwoo Kim Auburn University AUWP 2012-04 This paper can be downloaded without

More information

Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood

Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood Kuangyu Wen & Ximing Wu Texas A&M University Info-Metrics Institute Conference: Recent Innovations in Info-Metrics October

More information

arxiv: v1 [math.st] 2 Aug 2007

arxiv: v1 [math.st] 2 Aug 2007 The Annals of Statistics 2007, Vol. 35, No. 1, 13 40 DOI: 10.1214/009053606000000975 c Institute of Mathematical Statistics, 2007 arxiv:0708.0390v1 [math.st] 2 Aug 2007 ON THE MAXIMUM BIAS FUNCTIONS OF

More information