Development of robust scatter estimators under independent contamination model

C. Agostinelli (1), A. Leung (2), V.J. Yohai (3) and R.H. Zamar (2)
(1) Università Ca' Foscari di Venezia, (2) University of British Columbia, (3) Universidad de Buenos Aires and CONICET
Mar 16, 2013

Some declarations
To math geeks: I am sorry, but I will keep the math equations and theorems in my talk to a minimum today (come on, it is 9 am!)

Objective of the day
Objective: robust estimation of the (location and) scatter matrix for a data set with n observations and p continuous variables.

What is contamination?
Perhaps the most classical contamination model is the Huber-Tukey contamination model (HTCM) (Tukey in 1960, Huber in 1964), which was originally for 1-D data...
Contamination is row-wise, e.g.

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]   0.9 -2.8 -2.1 -0.8 -2.4  1.3  2.7  3.4  0.9  -0.1
[2,]  -2.4  2.3 -1.8 -3.0  1.9  1.0 -0.5  0.4 -2.8  -1.5
[3,]   0.7 -2.3 -0.6  2.9 -1.5 -0.8  2.9  0.0 -2.6   1.8
[4,]   1.0  1.9  1.6  1.1  0.0 -2.2  1.0 -4.1  2.2  -0.9
[5,]   0.1 -1.0  1.8  2.2 -0.1  2.1 -1.3  3.1  1.2   1.0
[6,]   1.7  3.0  0.6  0.9 -1.4  1.9 -0.3 -0.4 -0.4   1.7
[7,]  -0.8  1.0  2.5  3.9 -2.8  2.5 -0.3 -0.9  2.6   2.4

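What is contamination?
HTCM in math notation:
$$\tilde{x} = (1 - u)\,x + u\,c$$
where $x = (x_1, \ldots, x_p) \sim N(\mu, \Sigma)$ is the clean observation, $c \sim$ something, and $u \sim \mathrm{Bin}(1, \epsilon)$, $0 \le \epsilon < 1/2$.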
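The row-wise model above can be simulated in a few lines. This is only a sketch in Python with NumPy: the function name `contaminate_rowwise`, the point-mass outlier `c = (10, ..., 10)`, the identity covariance, and the seed are illustrative assumptions, since the slide leaves the outlier distribution unspecified.

```python
import numpy as np

def contaminate_rowwise(X, eps, c, rng):
    """Replace each whole row of X by the outlier c with probability eps."""
    n = X.shape[0]
    u = rng.binomial(1, eps, size=n)               # one flag u_i ~ Bin(1, eps) per row
    X_obs = (1 - u)[:, None] * X + u[:, None] * c  # (1 - u) x + u c, row by row
    return X_obs, u

rng = np.random.default_rng(0)
n, p, eps = 30, 3, 0.20                            # assumed example sizes
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)
c = np.full(p, 10.0)                               # assumed point-mass outlier
X_obs, u = contaminate_rowwise(X, eps, c, rng)
```

A flagged row is replaced entirely, so every clean row stays fully clean; this is what makes the model row-wise.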

New contamination model
HTCM may not be realistic...
- outliers are more likely to happen in certain variables, independently of the others
- what if p is large but n is of moderate to small size?
- what if every single observation has one contaminated component?
Alqallaf, Van Aelst, Yohai and Zamar (2006) proposed a new contamination model... the cell-wise contamination model

New contamination model
Contamination is cell-wise, e.g.

       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
[1,]   2.69  2.10  4.59  2.13 -1.09  2.7  -0.7   0.47 -1.4  -1.90
[2,]   2.9   2.0  -1.70 -1.83 -1.05  4.89  0.3  -1.93 -2.59 -2.48
[3,]  -0.75  0.53 -3.22  3.07  4.04 -1.39 -0.6   0.44  0.05  2.14
[4,]  -2.35  4.46 -0.99 -0.41  0.68 -2.79  1.37  1.74  1.35  1.78
[5,]  -1.09 -2.77  4.59 -2.78 -0.97  1.35  4.10 -0.56  3.79 -0.11
[6,]  -1.94 -0.33 -0.40 -3.22  1.3   0.4  -1.89  1.0   2.60  4.54

In math notation, the model is
$$\tilde{x} = (I - U)\,x + U\,c$$
where $x = (x_1, \ldots, x_p)$ and $c$ are the same as in HTCM, except that $U = \mathrm{diag}(u_i)$, where $u_i \sim \mathrm{Bin}(1, \epsilon)$, $0 \le \epsilon < 1/2$.
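A matching sketch for the cell-wise model, again in Python with NumPy. The name `contaminate_cellwise`, the outlier value 10, and the sample sizes are assumptions made for illustration; the only structural change from the row-wise case is that each cell gets its own Bernoulli flag.

```python
import numpy as np

def contaminate_cellwise(X, eps, c, rng):
    """Replace each cell of X by the matching entry of c with probability eps."""
    U = rng.binomial(1, eps, size=X.shape)   # one independent flag per cell
    X_obs = (1 - U) * X + U * c              # c (length p) broadcasts over rows
    return X_obs, U

rng = np.random.default_rng(0)
n, p, eps = 6, 10, 0.10                      # assumed example sizes
X = rng.standard_normal((n, p))
c = np.full(p, 10.0)                         # assumed point-mass outlier
X_obs, U = contaminate_cellwise(X, eps, c, rng)
rows_hit = int((U.sum(axis=1) > 0).sum())    # rows with at least one bad cell
```

Even with a small per-cell rate, `rows_hit` can be a large share of the rows, because a row is spoiled as soon as any one of its p cells is flagged.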

Existing robust scatter estimators
Under HTCM, we have...
- Minimum Volume Ellipsoid (MVE) (Rousseeuw, 1985)
- Minimum Covariance Determinant (MCD) (Rousseeuw, 1985)
- S-estimator (Davies, 1987)
- MM-estimator (Yohai, 1987; Tatsuoka and Tyler, 2000)
- modified GK estimator (Maronna and Zamar, 2002)
- ...
Let's look at how these existing robust scatter estimators (e.g. MVE, S-est, MM-est) perform under HTCM and cell-wise contamination.

HTCM
Let's first illustrate through mini examples and diagrams: p = 3, n = 30, ε = 0.20, random covariance matrix, center at the origin, normal data.
[Figure: 95% confidence ellipsoids for MLE-clean (blue), MLE (yellow), MVE (green), S-est. (red), MM-est. (gray)]

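Davies S-estimator
Definition (Davies, 1987): For $\mu \in \mathbb{R}^p$ and positive definite $\Sigma$, the S-estimator is
$$(\hat{\mu}, \tilde{\Sigma}) = \arg\min_{\mu, \Sigma} s(\mu, \Sigma), \qquad \hat{\Sigma} = \hat{s}\,\tilde{\Sigma},$$
where $s(\mu, \Sigma)$ is the solution $s$ to
$$\frac{1}{n} \sum_{i=1}^{n} \rho\!\left( \frac{(x_i - \mu)^T \Sigma^{-1} (x_i - \mu)\, |\Sigma|^{1/p}}{s} \right) = \frac{1}{2},$$
and $\rho(\cdot)$ is some bounded monotone loss function that must satisfy
$$E_{\Phi}\!\left[ \rho\!\left( \frac{X}{c} \right) \right] = \frac{1}{2}.$$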
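To make the scale equation concrete, here is a hedged sketch of how s(µ, Σ) could be computed for one fixed candidate (µ, Σ). Several pieces are assumptions not taken from the slides: the bisquare-type loss ρ(t) = min{1, 1 − (1 − t)^3}, the right-hand side `b` defaulting to 1/2 (a common choice when ρ has maximum 1), and solving by bisection. The full S-estimator would additionally minimize this scale over (µ, Σ), which is not shown.

```python
import numpy as np

def rho_bisquare(t):
    """Assumed bounded loss on squared distances: 1 - (1 - t)^3 for t <= 1, else 1."""
    t = np.minimum(np.asarray(t, dtype=float), 1.0)
    return 1.0 - (1.0 - t) ** 3

def normalized_sq_mahalanobis(X, mu, Sigma):
    """(x_i - mu)^T Sigma^{-1} (x_i - mu) * |Sigma|^(1/p) for every row x_i."""
    p = X.shape[1]
    diff = X - mu
    d = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(Sigma), diff)
    return d * np.linalg.det(Sigma) ** (1.0 / p)

def m_scale(d, b=0.5, n_iter=200):
    """Solve mean(rho(d / s)) = b for s by bisection (left side decreases in s)."""
    d = np.asarray(d, dtype=float)
    g = lambda s: np.mean(rho_bisquare(d / s)) - b
    hi = d.max()
    while g(hi) > 0:          # enlarge until the mean loss drops to b or below
        hi *= 2.0
    lo = hi
    while g(lo) <= 0:         # shrink until the mean loss exceeds b
        lo /= 2.0
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
d = normalized_sq_mahalanobis(X, np.zeros(3), np.eye(3))
s_hat = m_scale(d)
```

The factor |Σ|^(1/p) makes the distances invariant to rescaling Σ, which is why the size of the scatter matrix is carried by the scale s rather than by Σ itself.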

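MM-estimator (a two-stage estimator)
Definition: For $\mu \in \mathbb{R}^p$ and positive definite $\Sigma$, the MM-estimator is
$$(\hat{\mu}, \hat{\Sigma}) = \arg\min_{\mu, \Sigma} J(\mu, \Sigma), \qquad J(\mu, \Sigma) = \frac{1}{n} \sum_{i=1}^{n} \rho_2\!\left( \frac{(x_i - \mu)^T \Sigma^{-1} (x_i - \mu)\, |\Sigma|^{1/p}}{\hat{s}_n} \right),$$
with $\rho_2(\cdot)$ being a different loss function, i.e. $\rho_2(\cdot) \ne \rho_1(\cdot)$, where $\rho_1(\cdot)$ is the loss used for the S-estimate and $\hat{s}_n$ is the scale from the S-estimate.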

Cell-wise contamination
p = 3, n = 30, ε = 0.20, random covariance matrix, center at the origin, normal data.
[Figure: 95% confidence ellipsoids for MLE-clean (blue), MLE (yellow), MVE (green), S-est. (red), MM-est. (gray)]

Composite S-estimator
MVE, S- and MM-estimators perform very badly under cell-wise contamination...
Note that in our cell-wise contamination example, $P(\ge 1 \text{ variable is contaminated}) = 1 - (1 - \epsilon)^p = 0.488$.
In fact, all affine equivariant estimators of covariance collapse under cell-wise contamination (Alqallaf et al., 2009)!
We need to develop a new estimator... the Composite S-estimator (CSE)
...but this estimator is not affine equivariant, and affine equivariance is what saves an estimator from failing under HTCM!
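The 0.488 figure follows from the cells being contaminated independently. A small worked check in Python, assuming ε = 0.20 and p = 3 (the setting consistent with the quoted value); the function name and the extra dimensions 10 and 20 are illustrative choices.

```python
def prob_row_contaminated(eps, p):
    """P(at least one of p independent cells is contaminated) = 1 - (1 - eps)^p."""
    return 1.0 - (1.0 - eps) ** p

p3 = round(prob_row_contaminated(0.20, 3), 3)   # 1 - 0.8^3 = 1 - 0.512 = 0.488
for p in (3, 10, 20):
    print(p, round(prob_row_contaminated(0.20, p), 3))
```

The share of rows with at least one bad cell grows quickly with p, so a row-wise estimator that can tolerate fewer than half of the rows being contaminated is easily overwhelmed even when the per-cell rate is modest.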

Composite S-estimator
In short, CSE attempts to minimize the size of the covariance (e.g. the "ellipses") for each pair of variables simultaneously, instead of for all variables jointly.
It tries to downweight bivariate Mahalanobis distances, instead of full ones, when constructing the covariance matrix.
Now let's have an example; we will get back to its definition later...

Composite S-estimator
Example: p = 5, n = 100, ε = 0.10, random covariance matrix, center at the origin, normal data, cell-wise contamination.
[Figure: scatter plot matrix of V1 to V5, 95% confidence regions based on the Davies S-estimator vs. the true covariance; legend: true, S est]
[Figure: scatter plot matrix of V1 to V5, 95% confidence regions based on CSE; legend: true, CSE]
[Figure: scatter plot matrix of V1 to V5, 95% confidence regions based on CSE versus the S-estimator applied to each pair; legend: true, CSE, Pairwise S]

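Composite S-estimator
Definition (CSE): For a given robust initial estimator $\Omega_0$,
$$(\hat{\mu}, \tilde{\Sigma}) = \arg\min_{\mu, \Sigma} s(\mu, \Sigma, \Omega_0), \qquad \hat{\Sigma} = \hat{s}\,\tilde{\Sigma},$$
where $s(\mu, \Sigma, \Omega_0)$ is the solution $s$ to
$$\frac{2}{p(p-1)n} \sum_{i=1}^{n} \sum_{k=1}^{p-1} \sum_{j=k+1}^{p} \rho\!\left( \frac{d_i^{jk}(\mu, \Sigma)\, |\Sigma^{jk}|^{1/2}}{s\, c\, |\Omega_0^{jk}|^{1/2}} \right) = \frac{1}{2},$$
$$d_i^{jk}(\mu, \Sigma) = (x_i^{jk} - \mu^{jk})^T (\Sigma^{jk})^{-1} (x_i^{jk} - \mu^{jk})$$
is the bivariate Mahalanobis distance, and $c$ must satisfy the same criterion as in the Davies S-estimator, but in the bivariate case.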
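The building block of the definition is the bivariate squared Mahalanobis distance for every pair of variables. A minimal sketch in Python with NumPy follows; the helper name `pairwise_sq_mahalanobis`, the toy data, and the outlier value 50 are assumptions for illustration, and the sketch computes only the distances, not the CSE itself.

```python
import numpy as np
from itertools import combinations

def pairwise_sq_mahalanobis(X, mu, Sigma):
    """Squared Mahalanobis distance of every row, for every pair of variables (j, k)."""
    n, p = X.shape
    pairs = list(combinations(range(p), 2))          # all p(p-1)/2 pairs with j < k
    D = np.empty((n, len(pairs)))
    for col, (j, k) in enumerate(pairs):
        idx = [j, k]
        diff = X[:, idx] - mu[idx]                   # bivariate residuals
        S_inv = np.linalg.inv(Sigma[np.ix_(idx, idx)])  # inverse of the 2 x 2 submatrix
        D[:, col] = np.einsum("ij,jk,ik->i", diff, S_inv, diff)
    return D, pairs

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 5))
X[0, 0] = 50.0                                       # a single contaminated cell
D, pairs = pairwise_sq_mahalanobis(X, np.zeros(5), np.eye(5))
n_large = int((D[0] > 100).sum())                    # pairs of row 0 that look outlying
```

One contaminated cell inflates only the p − 1 pairs that involve its variable (4 of the 10 pairs here), so the remaining pairs of that row can still contribute clean information, whereas the full p-dimensional distance of the row would be spoiled as a whole.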

Composite MM-estimator
CSE in general is robust under cell-wise contamination but not efficient.
Efficiency measures the variability of the estimate relative to some gold standard, such as the MLE, under no contamination.
We use the corresponding MM-version (Tatsuoka and Tyler, 2000) of CSE, the composite MM-estimator (CMME), to achieve efficiency.

Composite S- and MM-estimator
Both have a very nice but complex estimation procedure that is closely linked to the S-estimator with missing data (Danilov et al., 2012), which we will not describe here.

Some results shown in ICORS 2012
We performed a Monte Carlo study to assess the behavior of the proposed estimators.
Simulation setting: $x \sim N(0, \Sigma_0)$, for some n and p.
$\Sigma_0$ is an exchangeable correlation matrix, i.e.
$$\Sigma_0 = \begin{pmatrix} 1 & r & \cdots & r \\ r & 1 & \cdots & r \\ \vdots & \vdots & \ddots & \vdots \\ r & r & \cdots & 1 \end{pmatrix}$$

Some results shown in ICORS 2012
Here we show some results for:
- Correlations: r = 0.5 and r = 0.9
- p = 10 and n = 100
- p = 20 and n = 200

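Some results shown in ICORS 2012
Performance criteria:
1. Likelihood ratio test distance (LRT) for robustness evaluation:
$$\bar{D}(\hat{\Sigma}, \Sigma_0) = \frac{1}{N} \sum_{i=1}^{N} D(\hat{\Sigma}_i, \Sigma_0), \quad \text{where} \quad D(\hat{\Sigma}, \Sigma_0) = \mathrm{trace}(\Sigma_0^{-1} \hat{\Sigma}) - \log\det(\Sigma_0^{-1} \hat{\Sigma}) - p,$$
with N the number of Monte Carlo replicates and $\hat{\Sigma}_i$ the estimate from replicate i.
2. Relative efficiency based on LRT values for efficiency evaluation:
$$\bar{D}(\hat{\Sigma}_{MLE}, \Sigma_0) \,/\, \bar{D}(\hat{\Sigma}, \Sigma_0)$$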
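Both the exchangeable correlation matrix of the simulation setting and the LRT distance are short to code. The sketch below uses Python with NumPy; the function names, the seed, and the use of the sample covariance as the estimate being scored are assumptions for illustration, not the study's actual code.

```python
import numpy as np

def exchangeable_corr(p, r):
    """p x p matrix with 1 on the diagonal and r everywhere else."""
    return (1.0 - r) * np.eye(p) + r * np.ones((p, p))

def lrt_distance(Sigma_hat, Sigma0):
    """trace(Sigma0^{-1} Sigma_hat) - log det(Sigma0^{-1} Sigma_hat) - p."""
    p = Sigma0.shape[0]
    M = np.linalg.solve(Sigma0, Sigma_hat)   # Sigma0^{-1} Sigma_hat without forming the inverse
    _, logdet = np.linalg.slogdet(M)
    return np.trace(M) - logdet - p

p, n, r = 10, 100, 0.5
Sigma0 = exchangeable_corr(p, r)
rng = np.random.default_rng(3)
X = rng.multivariate_normal(np.zeros(p), Sigma0, size=n)
S = np.cov(X, rowvar=False)                  # sample covariance as a stand-in estimate
dist = lrt_distance(S, Sigma0)
```

The distance is zero exactly when the estimate equals Σ_0 and positive otherwise; averaging it over replicates gives the averaged criterion, and the ratio of two such averages gives the relative efficiency.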

Monte Carlo results
Gaussian efficiency without outliers

              p = 10, n = 100        p = 20, n = 200
ESTIMATES     r = 0.5   r = 0.9      r = 0.5   r = 0.9
S-est         0.91      0.90         0.96      0.96
Pairwise-S    0.5       0.45         0.36      0.37
CSE           0.70      0.50         0.74      0.44
CMME          0.74      0.78         0.81      0.60

Monte Carlo results
n = 100, p = 10, ε = 10%
[Figure: average LRT distance versus outlier size, 10% contamination (n = 100, p = 10); panels: THCM Corr. = 0.5, THCM Corr. = 0.9, ICM Corr. = 0.5, ICM Corr. = 0.9; curves: Pairwise S, Classical S, CS (QC), CMM (QC)]

Remarks and conclusion
- In general, CSE (and CMME) are very robust under cell-wise contamination.
- We have seen that CSE (and CMME) do not perform very well under HTCM.
- Our goal is to have an estimator that is highly robust under both HTCM and cell-wise contamination (we are ambitious!)...
- ...while efficiency is our second priority.
To be continued...

Acknowledgement
Special thanks to Professor R. Zamar and Professor V. Yohai!
...AND THANK YOU FOR LISTENING!