Supplementary Material for Wang and Serfling paper


March 6,

1 Simulation study

Here we provide a simulation study to compare empirically the masking and swamping robustness of our selected outlyingness functions. Two parametric parent distributions are considered: the bivariate standard normal, which is symmetric, and the bivariate exponential with mean (0.5, 0.5), which is right-skewed.

1.1 Simulation plan

We generate samples separately from each of the above two distributions with three different sample sizes, n = 50, 500, and 2000. We consider contamination models with true positive rate ɛ, for ɛ = 0.02, 0.10, 0.25, and 0.40. For each model, the number of replacements in the sample is m = nɛ. The numbers of replications are 5000 for n = 50, 2000 for n = 500, and 500 for n = 2000.

To choose the sample outlyingness thresholds λ, we take the false positive rate under no contamination to be α = 0.02 for n = 50 and α = 0.01 for n = 500 and n = 2000. That is, under no contamination we expect about 1, 5, and 20 observations with outlyingness values > λ for n = 50, 500, and 2000, respectively. On this basis, we take as the sample outlyingness threshold λ the 2nd largest, the 6th largest, and the 21st largest value of the observed sample outlyingness function for n = 50, 500, and 2000, respectively; in each case this is the (αn + 1)th largest value.

For convenience, we index the observations $X_i$, i = 1, 2, ..., n, in order of their Euclidean norms, from smallest to largest. We explore the masking and swamping robustness of the following four affine invariant sample outlier identifiers treated in Section 4 of the paper:
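The threshold rule amounts to taking the (αn + 1)th largest sample outlyingness value. The following is a minimal Python/NumPy sketch of that rule and of the norm-based indexing. The function names are hypothetical, and rounding αn to the nearest integer is our own assumption for settings where αn is not a whole number.

```python
import numpy as np


def threshold_rank(n, alpha):
    """Rank, counted from the largest, of the order statistic used as lambda.

    About alpha * n clean points are allowed to exceed lambda, so lambda is
    the (alpha * n + 1)-th largest outlyingness value.
    """
    return int(round(alpha * n)) + 1


def sample_threshold(outlyingness, alpha):
    """Return lambda: the (alpha * n + 1)-th largest outlyingness value."""
    values = np.sort(np.asarray(outlyingness, dtype=float))[::-1]  # descending
    return values[threshold_rank(len(values), alpha) - 1]


def order_by_norm(X):
    """Re-index the rows of X in increasing order of Euclidean norm."""
    X = np.asarray(X, dtype=float)
    return X[np.argsort(np.linalg.norm(X, axis=1), kind="stable")]


# The three settings of the simulation plan.
for n, alpha in [(50, 0.02), (500, 0.01), (2000, 0.01)]:
    print(n, threshold_rank(n, alpha))  # prints ranks 2, 6 and 21
```

Any outlyingness function can be passed to `sample_threshold`, since only the ordering of its values matters.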
Classical Mahalanobis distance outlyingness (CMD), i.e., $O_{MD}(x, \mathbb{X}_n)$ with $(\hat{\mu}, \hat{\Sigma}) = (\bar{X}, S)$.

Robust Mahalanobis distance outlyingness (RMD), i.e., $O_{MD}(x, \mathbb{X}_n)$ with $(\hat{\mu}, \hat{\Sigma})$ given by the MCD estimators. Here these are defined taking subsample size $h = \lfloor (n+d+1)/2 \rfloor$, which for d = 2 yields 26 for n = 50, 251 for n = 500, and 1001 for n = 2000. The robustbase package in R is used to obtain approximate MCD estimators via the FastMCD algorithm of Rousseeuw and Van Driessen (1999).

Robust Mahalanobis spatial outlyingness (RMS), i.e., $O_{MS}(x, \mathbb{X}_n)$ with the MCD covariance estimator and the subsample size h chosen as for RMD.

Projection outlyingness (P), i.e., $O_P(x, \mathbb{X}_n)$ with univariate location and scale given by $(\mathrm{Med}, \mathrm{MAD}_{d-1})$. For computational convenience, only a finite number (1000) of randomly chosen projection directions is used in our simulation.

We measure and compare the masking and swamping robustness of the above outlyingness functions using the following two performance indices:

Percent of data points masked among the m outliers. If all m outliers are masked, this indicates masking breakdown.

Percent of data points swamped among the nonoutliers. Here the number of nonoutliers is at most n − m. If all n − m nonoutliers are swamped, this indicates swamping breakdown.

Two scenarios for the outliers (similar to Dang and Serfling, 2010) are considered:

A. Replace $X_{n-m+1}, \ldots, X_{n-1}, X_n$ by $KX_{n-m+1}, \ldots, KX_{n-1}, KX_n$ for an inflation factor K = 5. Denote the modified data set by $\mathbb{X}^{(A)}_{n,m}$.

B. Replace $X_{n-m+1}, \ldots, X_{n-1}, X_n$ by $KX_n, \ldots, KX_n, KX_n$, again with inflation factor K = 5. Denote the modified data set by $\mathbb{X}^{(B)}_{n,m}$.

These are illustrated in Figure 1 for the two parent distributions, with sample size n = 50 and m = 5 replacements.
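As a rough illustration of the three outlyingness constructions (Mahalanobis, Mahalanobis spatial, projection), here is a NumPy sketch. It is not the authors' implementation and rests on several assumptions. CMD is computed as the squared Mahalanobis distance, which flags the same points as any increasing transform of it because λ is an order statistic. The spatial function takes the scatter matrix as an argument, so for RMS the caller would have to supply an MCD estimate (for instance from robustbase, which this sketch does not call). The projection function uses the plain MAD as a stand-in for the $\mathrm{MAD}_{d-1}$ scale. The spatial function builds all n² pairwise differences, so its memory use grows quadratically in n.

```python
import numpy as np


def cmd_outlyingness(X):
    """Squared Mahalanobis distance of each row of X from (sample mean, S)."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))  # S uses divisor n - 1
    return np.einsum("ij,jk,ik->i", centered, S_inv, centered)


def inverse_sqrt(scatter):
    """Symmetric inverse square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(scatter)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def spatial_outlyingness(X, scatter):
    """Mahalanobis spatial outlyingness: norm of the average unit vector
    from the standardized points M X_j to the standardized point M x,
    with M = scatter^(-1/2). The caller chooses the scatter matrix."""
    Z = np.asarray(X, dtype=float) @ inverse_sqrt(scatter)  # rows are (M x_i)'
    diffs = Z[:, None, :] - Z[None, :, :]                   # (n, n, d)
    norms = np.linalg.norm(diffs, axis=2, keepdims=True)
    signs = np.divide(diffs, norms, out=np.zeros_like(diffs), where=norms > 0)
    return np.linalg.norm(signs.mean(axis=1), axis=1)       # values in [0, 1]


def projection_outlyingness(X, n_dirs=1000, seed=0):
    """Max over random unit directions u of |u'x - Med(u'X)| / MAD(u'X)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n_dirs, X.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)   # unit directions
    proj = X @ U.T                                  # (n, n_dirs) projections
    med = np.median(proj, axis=0)
    mad = np.median(np.abs(proj - med), axis=0)     # plain MAD, a stand-in
    return np.max(np.abs(proj - med) / mad, axis=1)


# Usage on one clean bivariate normal sample of size 50.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 2))
O_cmd = cmd_outlyingness(X)
O_spatial = spatial_outlyingness(X, np.cov(X, rowvar=False))  # sample cov, not MCD
O_proj = projection_outlyingness(X)
```

With the sample covariance plugged in, `spatial_outlyingness` is only a non-robust analogue of RMS; the robustness studied here comes from using the MCD scatter instead.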
Figure 1. Scenarios A and B with sample size 50 and 5 replacements, for the bivariate standard normal distribution (left) and the bivariate exponential(0.5, 0.5) distribution (right). The 5 observations in the original sample with the largest Euclidean norms are marked +, and their replacements in Scenarios A and B are marked with two further symbols. Note that the 5 replacements in Scenario B coincide with each other and also with one of the Scenario A replacements.

1.2 Simulation results

The following sections present simulation results on masking and swamping for the two distributions in turn.

1.2.1 Results for contaminated bivariate standard normal

Table 1 shows, for each procedure, the average sample outlyingness threshold λ over the replicated samples of sizes 50, 500, and 2000 from the bivariate standard normal. As described earlier, the threshold is chosen according to the false positive rate α under no contamination, which is 0.02 for n = 50 and 0.01 for n = 500 and 2000.
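Scenarios A and B (illustrated in Figure 1) and the two performance indices can be combined into a single replication loop. The sketch below does this for CMD under the bivariate standard normal parent. Two details are our assumptions rather than statements from the paper. First, λ is computed in each replication from the clean sample before replacement; if it were recomputed on the contaminated sample, only about αn points could ever be flagged, forcing masking whenever m > αn. Second, m = nɛ is rounded with Python's `round` (halves go to even) when nɛ is not whole. Names such as `simulate_cmd` are hypothetical.

```python
import numpy as np


def cmd_outlyingness(X):
    """Squared Mahalanobis distance of each row of X from (sample mean, S)."""
    centered = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    return np.einsum("ij,jk,ik->i", centered, S_inv, centered)


def contaminate(X_sorted, m, scenario, K=5.0):
    """Replace the m largest-norm rows of X_sorted (rows ordered by norm)."""
    Y = np.array(X_sorted, dtype=float)  # copy; the input is left unchanged
    n = len(Y)
    if m == 0:
        return Y
    if scenario == "A":    # scale each of the m most extreme points by K
        Y[n - m:] *= K
    elif scenario == "B":  # m identical copies of K * X_n: a tight cluster
        Y[n - m:] = K * Y[n - 1]
    else:
        raise ValueError(f"unknown scenario: {scenario}")
    return Y


def masked_swamped(O, m, lam):
    """Percent of the last m points (outliers) masked, and of the rest swamped."""
    n = len(O)
    masked = 100.0 * np.mean(O[n - m:] <= lam) if m > 0 else 0.0
    swamped = 100.0 * np.mean(O[:n - m] > lam)
    return masked, swamped


def simulate_cmd(n, eps, alpha, scenario, reps, seed=0):
    """Average percent masked and swamped for CMD over reps normal samples."""
    rng = np.random.default_rng(seed)
    m = int(round(n * eps))          # rounding rule is an assumption
    k = int(round(alpha * n)) + 1    # lambda = k-th largest clean value
    totals = np.zeros(2)
    for _ in range(reps):
        X = rng.standard_normal((n, 2))
        X = X[np.argsort(np.linalg.norm(X, axis=1))]     # index by norm
        lam = np.sort(cmd_outlyingness(X))[::-1][k - 1]  # from clean sample
        O = cmd_outlyingness(contaminate(X, m, scenario))
        totals += masked_swamped(O, m, lam)
    return totals / reps


print(simulate_cmd(50, 0.25, 0.02, "B", reps=200))
```

Swapping `cmd_outlyingness` for a robust outlyingness function would give the analogous RMD, RMS or P cells; the 200 replications here are far fewer than the paper's 5000 and only keep the example fast.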
Table 1. Average sample outlyingness thresholds λ for CMD, RMD, RMS and P, with n = 50 (5000 trials), n = 500 (2000 trials) and n = 2000 (500 trials) under no contamination, based on the bivariate standard normal model.

Masking performance

Table 2 displays the masking robustness of CMD, RMD, RMS and P under Scenarios A and B at the four contamination levels, based on the standard normal model. Here masking breakdown occurs in a given sample when all m outliers (replaced data points) are masked.

Table 2. Masking performance of CMD, RMD, RMS and P for bivariate standard normal samples with n = 50 (5000 trials), n = 500 (2000 trials) and n = 2000 (500 trials): average percent (%) masked among the m = nɛ replacement outliers, under Scenarios A and B at ɛ = 0.02, 0.10, 0.25 and 0.40. As a benchmark, the MBP column gives the theoretical MBP values. As ɛ increases, masking occurs more frequently, especially when ɛ > MBP. Masking breakdown in a sample occurs when the percent of outliers masked is 100.
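The benchmark MBP values for CMD and RMD, and the MCD subsample size h, have simple closed forms. As we read them from this supplement, the MBP is $n^{-1}$ for CMD and $n^{-1}\lfloor (n-d+1)/2 \rfloor$ for RMD, and $h = \lfloor (n+d+1)/2 \rfloor$. A small sketch using Python's `fractions` module follows; the function names are illustrative only, and the RMS formula is left out because it also depends on the threshold λ.

```python
from fractions import Fraction


def mcd_subsample_size(n, d):
    """MCD subsample size h = floor((n + d + 1) / 2)."""
    return (n + d + 1) // 2


def mbp_cmd(n):
    """Theoretical MBP of CMD: 1/n, which tends to 0 as n grows."""
    return Fraction(1, n)


def mbp_rmd(n, d):
    """Theoretical MBP of RMD: floor((n - d + 1) / 2) / n, tending to 1/2."""
    return Fraction((n - d + 1) // 2, n)


for n in (50, 500, 2000):
    print(n, mcd_subsample_size(n, 2), float(mbp_cmd(n)), float(mbp_rmd(n, 2)))
```

Exact fractions avoid any floating-point doubt about the floor operations; convert with `float` only for display.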
Comments based on Table 2. Let us discuss the implications of Table 2 for each of the four procedures and then summarize.

CMD is not robust with respect to masking; its masking performance degrades seriously as the contamination level increases. In particular, when ɛ ≥ 0.25, masking breakdown occurs under Scenario B in all replications, for all three sample sizes. This result is not surprising, since the theoretical MBP of CMD is $n^{-1} \to 0$.

RMD has strong masking robustness overall. No masking whatsoever occurs in either scenario at contamination levels ɛ = 0.02, 0.10 and 0.25, nor in Scenario A at level ɛ = 0.40. On the other hand, in Scenario B at level ɛ = 0.40, the average percent masked is high. These results are in line with the theoretical MBP of RMD, which is $n^{-1}\lfloor (n-d+1)/2 \rfloor \to 1/2$, independently of the sample threshold λ.

RMS has very weak masking robustness for the selected thresholds λ given in Table 1, which correspond approximately to the 0.98 quantile of the outlyingness function. Inserted into the MBP formula $\min\{ n^{-1}\lfloor (n-d+1)/2 \rfloor,\ n^{-1}\lfloor (1-\lambda)n/2 \rfloor \}$ with d = 2, these thresholds yield MBP values 0.06, 0.03, and 0.03 for n = 50, 500, and 2000, respectively. Accordingly, for the very small level ɛ = 0.02, which is just under these MBP values, there is no masking in either scenario, whereas for all cases with ɛ ≥ 0.10 significant masking occurs in both scenarios.

P exhibits very strong masking robustness, in keeping with its high MBP. For both scenarios and all contamination levels, virtually no masking occurs. This agrees with the results for RMD except in Scenario B at contamination level ɛ = 0.40, where P significantly outperforms RMD. In Scenario B at a high contamination level, the constraint of elliptical contours is a serious drawback.

Thus, for the standard normal contamination model, the above findings corroborate what is suggested by the theoretical MBP values and add some perspective.
Namely, CMD has unacceptable masking robustness and hence should not be used in practice; RMS has only weak masking robustness, due to its low MBP at high sample threshold levels; and RMD and P exhibit strong masking robustness and perform similarly, except that P significantly surpasses RMD in the case of contamination by a large cluster.

Swamping performance

Table 3 displays the swamping robustness of CMD, RMD, RMS and P under Scenarios A and B at the four contamination levels, based on the standard normal model. Here swamping breakdown occurs in a given sample when all n − m nonoutliers (nonreplaced data points) are swamped.

Table 3. Swamping performance of CMD, RMD, RMS and P for bivariate standard normal samples with n = 50 (5000 trials), n = 500 (2000 trials) and n = 2000 (500 trials): average percent (%) swamped among the n − m = n(1 − ɛ) nonoutliers, under Scenarios A and B at ɛ = 0.02, 0.10, 0.25 and 0.40. As a benchmark, the SBP column gives the theoretical SBP values. Swamping breakdown in a sample occurs when the percent of nonoutliers swamped is 100.

Comments based on Table 3. Except for Scenario B at contamination level ɛ = 0.40, all four procedures exhibit very low swamping, and even in this extreme scenario CMD performs very well and RMS moderately well. Thus, for the standard normal contamination model, CMD and RMS perform best with respect to swamping robustness, although for masking robustness they are outperformed by RMD and P. These findings corroborate the high SBP values of the four procedures (especially high for CMD).

1.2.2 Results for contaminated bivariate exponential

Table 4 shows, for each of the four procedures, the average sample outlyingness threshold λ over the replicated samples of sizes 50, 500, and 2000 from the bivariate exponential(0.5, 0.5). Again, the threshold is chosen according to the false positive rate α under no contamination, which is 0.02 for n = 50 and 0.01 for n = 500 and 2000.
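The average thresholds reported in Tables 1 and 4 are Monte Carlo means of the (αn + 1)th largest outlyingness value over clean samples. Below is a hedged sketch for CMD only. The supplement does not say how its bivariate exponential(0.5, 0.5) samples are generated, so the code assumes independent exponential coordinates with mean 0.5 each; that choice, and all function names, are ours and not the authors'.

```python
import numpy as np


def cmd_outlyingness(X):
    """Squared Mahalanobis distance of each row of X from (sample mean, S)."""
    centered = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    return np.einsum("ij,jk,ik->i", centered, S_inv, centered)


def draw_sample(rng, n, model):
    """Draw n bivariate observations from the chosen parent model."""
    if model == "normal":
        return rng.standard_normal((n, 2))
    if model == "exponential":
        # Assumption: independent exponential coordinates, each with mean 0.5.
        return rng.exponential(scale=0.5, size=(n, 2))
    raise ValueError(f"unknown model: {model}")


def average_threshold(n, alpha, reps, model="exponential", seed=0):
    """Mean over reps clean samples of the (alpha*n + 1)-th largest CMD value."""
    rng = np.random.default_rng(seed)
    k = int(round(alpha * n)) + 1
    lams = np.empty(reps)
    for r in range(reps):
        O = np.sort(cmd_outlyingness(draw_sample(rng, n, model)))[::-1]
        lams[r] = O[k - 1]
    return lams.mean()


# Far fewer replications than the paper's 5000, just to show the loop.
print(average_threshold(50, 0.02, reps=200))
print(average_threshold(50, 0.02, reps=200, model="normal"))
```

Because the CMD ranking is affine invariant, rescaling the exponential coordinates would not change which points exceed λ; the skewness of the parent is what matters.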
Table 4. Average sample outlyingness thresholds λ for CMD, RMD, RMS and P, with n = 50 (5000 trials), n = 500 (2000 trials) and n = 2000 (500 trials) under no contamination, based on the bivariate exponential(0.5, 0.5) model.

Masking performance

Table 5 displays the masking robustness of CMD, RMD, RMS and P under Scenarios A and B at the four contamination levels, based on the bivariate exponential(0.5, 0.5) model. Here masking breakdown occurs in a given sample when all m outliers (replaced data points) are masked.

Table 5. Masking performance of CMD, RMD, RMS and P for bivariate exponential(0.5, 0.5) samples with n = 50 (5000 trials), n = 500 (2000 trials) and n = 2000 (500 trials): average percent (%) masked among the m = nɛ replacement outliers, under Scenarios A and B at ɛ = 0.02, 0.10, 0.25 and 0.40. The MBP column gives the theoretical MBP values. Masking breakdown in a sample occurs when the percent of outliers masked is 100.
Comments based on Table 5. Both CMD and RMS lack masking robustness, while RMD and P maintain high masking robustness at all contamination levels considered, which lie below their theoretical MBPs. As added perspective, we note that RMD is slightly better than P in Scenario A, while the reverse holds in Scenario B. Of course, when imposing elliptical contours is not appropriate, P is the more suitable choice.

Swamping performance

Table 6 displays the swamping robustness of CMD, RMD, RMS and P under Scenarios A and B at the four contamination levels, based on the bivariate exponential(0.5, 0.5) model. Swamping breakdown occurs in a given sample when all n − m nonoutliers (nonreplaced data points) are swamped.

Table 6. Swamping performance of CMD, RMD, RMS and P for bivariate exponential(0.5, 0.5) samples with n = 50 (5000 trials), n = 500 (2000 trials) and n = 2000 (500 trials): average percent (%) swamped among the n − m = n(1 − ɛ) nonoutliers, under Scenarios A and B at ɛ = 0.02, 0.10, 0.25 and 0.40. The SBP column gives the theoretical SBP values. Swamping breakdown in a sample occurs when the percent of nonoutliers swamped is 100.

Comments based on Table 6. CMD and RMS exhibit excellent swamping robustness. RMD follows closely, with weakness only in the case of a large cluster of outliers, and P follows RMD competitively.
1.2.3 Practical recommendations based on the simulation results

Although its swamping performance is optimal, CMD has unacceptably low masking performance and hence is not recommended for use in practice. The masking robustness of RMS is weak at high sample thresholds λ, yet it has good swamping performance and its computational burden is relatively light. Both RMD and P are robust with respect to both masking and swamping, with RMD having less computational complexity. It is worth noting, however, that in the contaminated bivariate standard normal model, both the masking performance of RMD and the swamping performance of RMD and P degrade in the presence of a large cluster of outliers. One may select the appropriate outlier identifier according to the preferred balance between masking robustness and swamping robustness, taking into account whether elliptical contours are appropriate, the computational burden, and whether protection against a large cluster is desired. These recommendations, based on results for two specific parametric models, are consistent with the results based on the MBPs and SBPs, which, of course, are nonparametric and apply widely across diverse data settings.

References

[1] Dang, X. and Serfling, R. (2010). Nonparametric depth-based multivariate outlier identifiers, and masking robustness properties. Journal of Statistical Planning and Inference.

[2] Rousseeuw, P. and Van Driessen, K. (1999). A fast algorithm for the minimum covariance determinant estimator. Technometrics.
More informationThe energy Package. October 28, 2007
The energy Package October 28, 2007 Title Estatistics (energy statistics) tests of fit, independence, clustering Version 1.07 Date 20071027 Author Maria L. Rizzo and Gabor
More informationIntraclass Correlations in OneFactor Studies
CHAPTER Intraclass Correlations in OneFactor Studies OBJECTIVE The objective of this chapter is to present methods and techniques for calculating the intraclass correlation coefficient and associated
More informationDealing With Zeros and Missing Values in Compositional Data Sets Using Nonparametric Imputation 1
Mathematical Geology, Vol. 35, No. 3, April 2003 ( C 2003) Dealing With Zeros and Missing Values in Compositional Data Sets Using Nonparametric Imputation 1 J. A. MartínFernández, 2 C. BarcelóVidal,
More informationOn ParameterMixing of Dependence Parameters
On ParameterMixing of Dependence Parameters by Murray D Smith and Xiangyuan Tommy Chen 2 Econometrics and Business Statistics The University of Sydney Incomplete Preliminary Draft May 9, 2006 (NOT FOR
More informationVariable Selection for Highly Correlated Predictors
Variable Selection for Highly Correlated Predictors Fei Xue and Annie Qu arxiv:1709.04840v1 [stat.me] 14 Sep 2017 Abstract Penaltybased variable selection methods are powerful in selecting relevant covariates
More informationEstimating Coefficients in Linear Models: It Don't Make No Nevermind
Psychological Bulletin 1976, Vol. 83, No. 2. 213217 Estimating Coefficients in Linear Models: It Don't Make No Nevermind Howard Wainer Department of Behavioral Science, University of Chicago It is proved
More informationFACTOR ANALYSIS AND MULTIDIMENSIONAL SCALING
FACTOR ANALYSIS AND MULTIDIMENSIONAL SCALING Vishwanath Mantha Department for Electrical and Computer Engineering Mississippi State University, Mississippi State, MS 39762 mantha@isip.msstate.edu ABSTRACT
More informationOn Selecting Tests for Equality of Two Normal Mean Vectors
MULTIVARIATE BEHAVIORAL RESEARCH, 41(4), 533 548 Copyright 006, Lawrence Erlbaum Associates, Inc. On Selecting Tests for Equality of Two Normal Mean Vectors K. Krishnamoorthy and Yanping Xia Department
More informationApplication and Use of Multivariate Control Charts In a BTA Deep Hole Drilling Process
Application and Use of Multivariate Control Charts In a BTA Deep Hole Drilling Process Amor Messaoud, Winfied Theis, Claus Weihs, and Franz Hering Fachbereich Statistik, Universität Dortmund, 44221 Dortmund,
More informationCONTROL CHARTS FOR MULTIVARIATE NONLINEAR TIME SERIES
REVSTAT Statistical Journal Volume 13, Number, June 015, 131 144 CONTROL CHARTS FOR MULTIVARIATE NONLINEAR TIME SERIES Authors: Robert Garthoff Department of Statistics, European University, Große Scharrnstr.
More informationIndependent Component (IC) Models: New Extensions of the Multinormal Model
Independent Component (IC) Models: New Extensions of the Multinormal Model Davy Paindaveine (joint with Klaus Nordhausen, Hannu Oja, and Sara Taskinen) School of Public Health, ULB, April 2008 My research
More informationRobust Performance Hypothesis Testing with the Variance. Institute for Empirical Research in Economics University of Zurich
Institute for Empirical Research in Economics University of Zurich Working Paper Series ISSN 14240459 Working Paper No. 516 Robust Performance Hypothesis Testing with the Variance Olivier Ledoit and Michael
More informationBounding the number of affine roots
with applications in reliable and secure communication Inaugural Lecture, Aalborg University, August 11110, 11111100000 with applications in reliable and secure communication Polynomials: F (X ) = 2X 2
More informationA Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness
A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UTAustin SRC 2014, Galveston, TX 1 Background 2 Working model
More informationA nonparametric twosample wald test of equality of variances
University of Wollongong Research Online Faculty of Informatics  Papers (Archive) Faculty of Engineering and Information Sciences 211 A nonparametric twosample wald test of equality of variances David
More informationAn Empirical Study of Probability Elicitation under NoisyOR Assumption
An Empirical Study of Probability Elicitation under NoisyOR Assumption Adam Zagorecki and Marek Druzdzel Decision Systems Laboratory School of Information Science University of Pittsburgh, Pittsburgh,
More informationA NONPARAMETRIC TEST FOR HOMOGENEITY: APPLICATIONS TO PARAMETER ESTIMATION
Changepoint Problems IMS Lecture Notes  Monograph Series (Volume 23, 1994) A NONPARAMETRIC TEST FOR HOMOGENEITY: APPLICATIONS TO PARAMETER ESTIMATION BY K. GHOUDI AND D. MCDONALD Universite' Lava1 and
More informationApplications of Information Geometry to Hypothesis Testing and Signal Detection
CMCAA 2016 Applications of Information Geometry to Hypothesis Testing and Signal Detection Yongqiang Cheng National University of Defense Technology July 2016 Outline 1. Principles of Information Geometry
More informationRobust and Efficient Fitting of the Generalized Pareto Distribution with Actuarial Applications in View
Robust and Efficient Fitting of the Generalized Pareto Distribution with Actuarial Applications in View Vytaras Brazauskas 1 University of WisconsinMilwaukee Andreas Kleefeld 2 University of WisconsinMilwaukee
More informationRobust Outcome Analysis for Observational Studies Designed Using Propensity Score Matching
The work of Kosten and McKean was partially supported by NIAAA Grant 1R21AA01790601A1 Robust Outcome Analysis for Observational Studies Designed Using Propensity Score Matching Bradley E. Huitema Western
More informationAnalysis of CrossSectional Data
Analysis of CrossSectional Data Kevin Sheppard http://www.kevinsheppard.com Oxford MFE This version: November 8, 2017 November 13 14, 2017 Outline Econometric models Specification that can be analyzed
More informationPattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions
Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite
More informationPRINCIPLES OF STATISTICAL INFERENCE
Advanced Series on Statistical Science & Applied Probability PRINCIPLES OF STATISTICAL INFERENCE from a NeoFisherian Perspective Luigi Pace Department of Statistics University ofudine, Italy Alessandra
More informationCHAPTER 17 CHISQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)
FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NONPARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter
More informationA Statistical Perspective on Algorithmic Leveraging
Ping Ma PINGMA@UGA.EDU Department of Statistics, University of Georgia, Athens, GA 30602 Michael W. Mahoney MMAHONEY@ICSI.BERKELEY.EDU International Computer Science Institute and Dept. of Statistics,
More informationGeneralized Impulse Response Analysis: General or Extreme?
Auburn University Department of Economics Working Paper Series Generalized Impulse Response Analysis: General or Extreme? Hyeongwoo Kim Auburn University AUWP 201204 This paper can be downloaded without
More informationSpatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood
Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood Kuangyu Wen & Ximing Wu Texas A&M University InfoMetrics Institute Conference: Recent Innovations in InfoMetrics October
More informationarxiv: v1 [math.st] 2 Aug 2007
The Annals of Statistics 2007, Vol. 35, No. 1, 13 40 DOI: 10.1214/009053606000000975 c Institute of Mathematical Statistics, 2007 arxiv:0708.0390v1 [math.st] 2 Aug 2007 ON THE MAXIMUM BIAS FUNCTIONS OF
More information