Development of robust scatter estimators under independent contamination model
C. Agostinelli (1), A. Leung (2), V.J. Yohai (3) and R.H. Zamar (2)
(1) Università Ca' Foscari di Venezia, (2) University of British Columbia, and (3) Universidad de Buenos Aires and CONICET
Mar 16, 2013
Some declarations
To math geeks: I am sorry, but I will keep the math equations and theorems in this talk to a minimum today (come on, it is 9 am!)
Objective of the day
Objective: robust estimation of the (location and) scatter matrix for a data set of n observations on p continuous variables.
What is contamination?
Perhaps the most classical contamination model is the Huber-Tukey contamination model (HTCM) (Tukey, 1960; Huber, 1964), which was originally formulated for 1-D data...
Contamination is row-wise, e.g.
[Example: a data matrix with 10 columns, rows [1,] to [7,] shown, in which whole rows are contaminated]
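What is contamination?
HTCM in math notation:
\[ \tilde{x} = (1 - u)\,x + u\,c \]
where $\tilde{x}$ is the observed vector, $x = (x_1, \ldots, x_p) \sim N(\mu, \Sigma)$ is the clean vector, $c \sim$ something (an arbitrary outlier distribution), and $u \sim \mathrm{Bin}(1, \epsilon)$ with $0 \le \epsilon < 1/2$.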
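The row-wise model can be simulated in a few lines. The following is a minimal sketch, not the authors' code: the function name `contaminate_rowwise`, the identity covariance, the constant outlier vector `c` and the seed are all assumptions made for illustration.

```python
import random

def contaminate_rowwise(clean, c, eps, rng):
    """Apply x_obs = (1 - u) * x + u * c with a single u ~ Bin(1, eps) per row."""
    observed, flags = [], []
    for row in clean:
        u = 1 if rng.random() < eps else 0   # one draw decides the whole row
        observed.append([(1 - u) * x + u * cj for x, cj in zip(row, c)])
        flags.append(u)
    return observed, flags

rng = random.Random(2013)                    # fixed seed (arbitrary choice)
n, p, eps = 7, 10, 0.2
# Clean data: rows from N(0, I); the identity covariance is an assumption.
clean = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
c = [5.0] * p                                # hypothetical outlier vector
observed, flags = contaminate_rowwise(clean, c, eps, rng)
print("contaminated rows:", [i for i, u in enumerate(flags) if u])
```

Each row is either entirely clean or entirely replaced, which is what "row-wise" means here.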
New contamination model
HTCM may not be realistic...
Outliers are more likely to happen in certain variables, independently of the others.
What if p is large but n is of moderate to small size?
What if every single observation has one contaminated component?
Alqallaf, Van Aelst, Yohai and Zamar (2006) proposed a new contamination model: the cell-wise contamination model.
New contamination model
Contamination is cell-wise, e.g.
[Example: a data matrix with 10 columns, rows [1,] to [6,] shown, in which individual cells are contaminated]
In math notation, the model is
\[ \tilde{x} = (I - U)\,x + U\,c \]
where $\tilde{x}$ is the observed vector, $x = (x_1, \ldots, x_p)$ and $c$ are the same as before, except that $U = \mathrm{diag}(u_1, \ldots, u_p)$ with $u_i \sim \mathrm{Bin}(1, \epsilon)$, $0 \le \epsilon < 1/2$.
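The cell-wise model differs from the row-wise one only in that every cell gets its own Bernoulli draw. A minimal sketch under the same illustrative assumptions (hypothetical function name, identity covariance, constant outlier value, arbitrary seed):

```python
import random

def contaminate_cellwise(clean, c, eps, rng):
    """Apply x_obs = (I - U) x + U c with U = diag(u_1, ..., u_p), u_j ~ Bin(1, eps)."""
    observed, flags = [], []
    for row in clean:
        u = [1 if rng.random() < eps else 0 for _ in row]   # one draw per cell
        observed.append([(1 - uj) * x + uj * cj for x, uj, cj in zip(row, u, c)])
        flags.append(u)
    return observed, flags

rng = random.Random(2013)                    # fixed seed (arbitrary choice)
n, p, eps = 6, 10, 0.2
# Clean data: rows from N(0, I); the identity covariance is an assumption.
clean = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
c = [5.0] * p                                # hypothetical outlier values
observed, flags = contaminate_cellwise(clean, c, eps, rng)
rows_hit = sum(1 for u in flags if any(u))
print("rows with at least one contaminated cell:", rows_hit, "of", n)
```

Even a small per-cell rate can touch most rows once p is moderately large, which is why row-based downweighting runs out of clean rows.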
Existing robust scatter estimators
Under HTCM, we have...
Minimum Volume Ellipsoid (MVE) (Rousseeuw, 1985)
Minimum Covariance Determinant (MCD) (Rousseeuw, 1985)
S-estimator (Davies, 1987)
MM-estimator (Yohai, 1987; Tatsuoka and Tyler, 2000)
modified GK estimator (Maronna and Zamar, 2002)
...
Let's look at how these existing robust scatter estimators (e.g. MVE, S-est., MM-est.) perform under HTCM and cell-wise contamination.
HTCM
Let's first illustrate through mini examples and diagrams: p = 3, n = 30, ɛ = 0.20, random covariance matrix, center at the origin, normal data.
95% confidence ellipsoids: MLE-clean (blue), MLE (yellow), MVE (green), S-est. (red), MM-est. (gray)
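Davies S-estimator
Definition (Davies, 1987): For $\mu \in \mathbb{R}^p$ and positive definite $\Sigma$, the S-estimator is
\[ (\hat{\mu}, \tilde{\Sigma}) = \arg\min_{\mu, \Sigma} s(\mu, \Sigma), \qquad \hat{\Sigma} = \hat{s}\,\tilde{\Sigma}, \]
where $\hat{s}$ is the corresponding scale and $s(\mu, \Sigma)$ is the solution $s$ to
\[ \frac{1}{n} \sum_{i=1}^{n} \rho\!\left( \frac{(x_i - \mu)^T \Sigma^{-1} (x_i - \mu)\,|\Sigma|^{1/p}}{s} \right) = \frac{1}{2}, \]
with $\rho(\cdot)$ being some bounded monotone loss function that must satisfy
\[ E_{\Phi}\!\left( \rho\!\left( \frac{\|X\|^2}{c} \right) \right) = \frac{1}{2}. \]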
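To make the scale equation concrete, the sketch below solves the equation "average of rho(d_i / s) = 1/2" for s by bisection, given a list of squared distances d_i. It illustrates only the inner scale computation, not the full minimization over the location and scatter. The particular bisquare-type `rho`, the function names and the sample distances are assumptions, not taken from the talk.

```python
def rho(t):
    """Bounded, nondecreasing loss on t >= 0 with sup = 1 (bisquare-type, an assumption)."""
    return 1.0 if t >= 1.0 else 1.0 - (1.0 - t) ** 3

def mean_rho(d, s):
    """Left-hand side of the scale equation: average of rho(d_i / s)."""
    return sum(rho(di / s) for di in d) / len(d)

def m_scale(d, b=0.5, tol=1e-12):
    """Solve mean_rho(d, s) = b for s by bisection.

    Assumes more than a fraction b of the d_i are strictly positive,
    so that a solution exists.
    """
    hi = max(d)
    while mean_rho(d, hi) > b:      # enlarge until the average falls to b or below
        hi *= 2.0
    lo = hi
    while mean_rho(d, lo) < b:      # shrink until the average rises to b or above
        lo /= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if mean_rho(d, mid) > b:    # average too large: the scale must grow
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

d = [0.5, 1.0, 2.0, 4.0, 8.0]       # hypothetical squared distances
s = m_scale(d)
print(s, mean_rho(d, s))
```

Because rho is bounded, a single huge distance can contribute at most 1/n to the average, so it cannot drag the scale arbitrarily far; this is the source of the robustness.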
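MM-estimator (a two-stage estimator)
Definition: For $\mu \in \mathbb{R}^p$ and positive definite $\Sigma$, the MM-estimator is
\[ (\hat{\mu}, \hat{\Sigma}) = \arg\min_{\mu, \Sigma} J(\mu, \Sigma), \]
where
\[ J(\mu, \Sigma) = \frac{1}{n} \sum_{i=1}^{n} \rho_2\!\left( \frac{(x_i - \mu)^T \Sigma^{-1} (x_i - \mu)\,|\Sigma|^{1/p}}{\hat{s}_n} \right), \]
with $\rho_2(\cdot)$ being a different loss function, i.e. $\rho_2(\cdot) \le \rho_1(\cdot)$, where $\rho_1$ is the loss used in the S-estimate and $\hat{s}_n$ is the scale from the S-estimate.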
Cell-wise contamination
p = 3, n = 30, ɛ = 0.20, random covariance matrix, center at the origin, normal data.
95% confidence ellipsoids: MLE-clean (blue), MLE (yellow), MVE (green), S-est. (red), MM-est. (gray)
Composite S-estimator
The MVE, S- and MM-estimators perform very badly under cell-wise contamination...
Note that in our cell-wise contamination example, $P(\ge 1 \text{ variable is contaminated}) = 1 - (1 - \epsilon)^p = 0.488$.
In fact, all affine equivariant covariance estimators break down under cell-wise contamination (Alqallaf et al., 2009)!
We need to develop a new estimator: the Composite S-estimator (CSE)...
...but this estimator is not affine equivariant, and affine equivariance is what saves an estimator from failing under HTCM!
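The probability quoted on this slide is easy to check numerically. The short sketch below assumes ɛ = 0.20 and p = 3 for the mini example, which is the combination that reproduces 0.488; the function name is hypothetical.

```python
def prob_row_contaminated(eps, p):
    """P(at least one of p cells is contaminated) when cells are hit independently."""
    return 1.0 - (1.0 - eps) ** p

# Mini example, assuming eps = 0.20 and p = 3.
print(round(prob_row_contaminated(0.20, 3), 3))   # 0.488

# The same per-cell rate of 10% for growing dimension.
for p in (5, 10, 20):
    print(p, round(prob_row_contaminated(0.10, p), 3))
```

The fraction of rows with at least one bad cell grows quickly with p, so an estimator that can only discard whole rows soon faces more than 50% "contaminated" rows.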
Composite S-estimator
In short, CSE attempts to minimize the size of the covariance (e.g. the "ellipses") for each pair of variables simultaneously, instead of for all variables jointly.
It tries to downweight based on bivariate Mahalanobis distances, instead of the full p-variate ones, when constructing the covariance matrix.
Now let's look at an example; we will get back to the definition later...
Composite S-estimator
Example: p = 5, n = 100, ɛ = 0.10, random covariance matrix, center at the origin, normal data, cell-wise contamination.
95% confidence region based on the Davies S-estimator vs. the true covariance:
[Figure: scatter plot matrix of V1 to V5 with ellipses for: true, S-est]
Composite S-estimator
Example: p = 5, n = 100, ɛ = 0.10, random covariance matrix, center at the origin, normal data, cell-wise contamination.
95% confidence region based on CSE:
[Figure: scatter plot matrix of V1 to V5 with ellipses for: true, CSE]
Composite S-estimator
Example: p = 5, n = 100, ɛ = 0.10, random covariance matrix, center at the origin, normal data, cell-wise contamination.
95% confidence region based on CSE versus the S-estimator computed on each pair:
[Figure: scatter plot matrix of V1 to V5 with ellipses for: true, CSE, Pairwise S]
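Composite S-estimator
Definition (CSE): For a given robust initial estimator $\Omega_0$,
\[ (\hat{\mu}, \tilde{\Sigma}) = \arg\min_{\mu, \Sigma} s(\mu, \Sigma, \Omega_0), \qquad \hat{\Sigma} = \hat{s}\,\tilde{\Sigma}, \]
where $s(\mu, \Sigma, \Omega_0)$ is the solution $s$ to
\[ \frac{2}{p(p-1)n} \sum_{i=1}^{n} \sum_{k=1}^{p-1} \sum_{j=k+1}^{p} \rho\!\left( \frac{d_i^{jk}(\mu, \Sigma)\,|\Sigma^{jk}|^{1/2}}{s\,c\,|\Omega_0^{jk}|^{1/2}} \right) = \frac{1}{2}, \]
\[ d_i^{jk}(\mu, \Sigma) = (x_i^{jk} - \mu^{jk})^T (\Sigma^{jk})^{-1} (x_i^{jk} - \mu^{jk}) \]
is the bivariate Mahalanobis distance (the superscript $jk$ selects the entries for variables $j$ and $k$), and $c$ must satisfy the same criterion as in the Davies S-estimator, but in the bivariate case.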
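The role of the bivariate distances can be seen in a small sketch. It computes the squared Mahalanobis distance for every pair of variables of one observation and shows that a single contaminated cell only inflates the pairs that involve that variable. The function names, the covariance matrix, the observation and the cutoff are illustrative assumptions; this is not the CSE fitting procedure itself.

```python
from itertools import combinations

def bivariate_distance(x, mu, sigma, j, k):
    """Squared Mahalanobis distance of x using only variables j and k."""
    a, b = x[j] - mu[j], x[k] - mu[k]
    sjj, skk, sjk = sigma[j][j], sigma[k][k], sigma[j][k]
    det = sjj * skk - sjk * sjk              # determinant of the 2x2 submatrix
    return (a * a * skk - 2.0 * a * b * sjk + b * b * sjj) / det

def pairwise_distances(x, mu, sigma):
    """Distances for all p(p-1)/2 pairs (j, k) with j < k."""
    return {(j, k): bivariate_distance(x, mu, sigma, j, k)
            for j, k in combinations(range(len(x)), 2)}

p = 4
mu = [0.0] * p
sigma = [[1.0 if i == j else 0.5 for j in range(p)] for i in range(p)]
x = [0.0, 0.0, 10.0, 0.0]                    # one contaminated cell (variable 2)
d = pairwise_distances(x, mu, sigma)
large = sorted(pair for pair, v in d.items() if v > 9.0)   # 9.0 is an arbitrary cutoff
print(large)                                 # [(0, 2), (1, 2), (2, 3)]
```

Only 3 of the 6 pairs are affected here, whereas the full 4-variate distance of this observation would be large and the whole row would be downweighted.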
Composite MM-estimator
CSE in general is robust under cell-wise contamination, but not efficient.
Efficiency measures the variability of the estimate relative to some gold standard, such as the MLE, under no contamination.
We use the corresponding MM-version (Tatsuoka and Tyler, 2000) of CSE to achieve efficiency.
Composite S- and MM-estimator
Both have very nice but complex estimation procedures that are closely linked with the S-estimator for missing data (Danilov et al., 2012), but we will not describe them here.
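Some results shown in ICORS 2012
We performed a Monte Carlo study to assess the behavior of the proposed estimators.
Simulation setting: $x \sim N(0, \Sigma_0)$, for some n and p.
$\Sigma_0$ is an exchangeable correlation matrix, i.e.
\[ \Sigma_0 = \begin{pmatrix} 1 & r & \cdots & r \\ r & 1 & \cdots & r \\ \vdots & \vdots & \ddots & \vdots \\ r & r & \cdots & 1 \end{pmatrix} \]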
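A minimal sketch of this simulation setting, assuming 0 <= r < 1: the exchangeable matrix is built directly, and samples are drawn through a shared-factor construction, x_j = sqrt(r) z_0 + sqrt(1 - r) z_j, which gives unit variances and pairwise correlation r. The function names and the sampling trick are illustrative choices, not necessarily how the study was coded.

```python
import math
import random

def exchangeable_correlation(p, r):
    """p x p matrix with ones on the diagonal and r everywhere else."""
    return [[1.0 if i == j else r for j in range(p)] for i in range(p)]

def sample_exchangeable(n, p, r, rng):
    """Draw n rows from N(0, Sigma_0) for 0 <= r < 1 via a shared factor z_0."""
    rows = []
    for _ in range(n):
        z0 = rng.gauss(0.0, 1.0)             # common factor shared by all variables
        rows.append([math.sqrt(r) * z0 + math.sqrt(1.0 - r) * rng.gauss(0.0, 1.0)
                     for _ in range(p)])
    return rows

sigma0 = exchangeable_correlation(10, 0.5)
x = sample_exchangeable(100, 10, 0.5, random.Random(1))   # seed is arbitrary
print(len(x), len(x[0]))                     # 100 10
```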
Some results shown in ICORS 2012
Here we show some results for:
Correlations: r = 0.5 and r = 0.9
p = 10 and n = 100.
p = 20 and n = 200.
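Some results shown in ICORS 2012
Performance criteria:
1. Likelihood ratio test distance (LRT) for robustness evaluation
\[ \bar{D}(\hat{\Sigma}, \Sigma_0) = \frac{1}{N} \sum_{i=1}^{N} D(\hat{\Sigma}_i, \Sigma_0) \]
where
\[ D(\hat{\Sigma}, \Sigma_0) = \mathrm{trace}(\Sigma_0^{-1} \hat{\Sigma}) - \log(\det(\Sigma_0^{-1} \hat{\Sigma})) - p \]
2. Relative efficiency based on LRT values for efficiency evaluation
\[ \bar{D}(\hat{\Sigma}_{MLE}, \Sigma_0) / \bar{D}(\hat{\Sigma}, \Sigma_0) \]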
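The two criteria can be sketched in plain Python as follows. The helper `solve_and_logdet` (Gauss-Jordan elimination with partial pivoting) and the function names are assumptions made so that the example is self-contained; any linear algebra library would do the same job.

```python
import math

def solve_and_logdet(a, b):
    """Return (X, log|det A|) with A X = B, by Gauss-Jordan elimination.

    A must be square and nonsingular; partial pivoting is used.
    """
    n = len(a)
    m = [ra[:] + rb[:] for ra, rb in zip(a, b)]          # augmented matrix [A | B]
    logdet = 0.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        logdet += math.log(abs(m[col][col]))
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    x = [[m[i][n + j] / m[i][i] for j in range(len(b[0]))] for i in range(n)]
    return x, logdet

def lrt_distance(sigma_hat, sigma0):
    """D = trace(S0^{-1} S) - log det(S0^{-1} S) - p for positive definite inputs."""
    p = len(sigma0)
    identity = [[float(i == j) for j in range(p)] for i in range(p)]
    m, logdet0 = solve_and_logdet(sigma0, sigma_hat)     # m = Sigma0^{-1} Sigma_hat
    _, logdet_hat = solve_and_logdet(sigma_hat, identity)
    trace = sum(m[i][i] for i in range(p))
    return trace - (logdet_hat - logdet0) - p

def relative_efficiency(d_mle, d_est):
    """Ratio of average LRT distances: MLE over the estimator under study."""
    return (sum(d_mle) / len(d_mle)) / (sum(d_est) / len(d_est))

eye3 = [[float(i == j) for j in range(3)] for i in range(3)]
print(lrt_distance(eye3, eye3))              # a perfect estimate has distance 0
```

The distance is zero when the estimate equals the true matrix and grows as the estimate is inflated, shrunk or rotated, so smaller average values indicate better estimators.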
Monte Carlo results
Gaussian Efficiency Without Outliers

p = 10, n = 100
ESTIMATES     r = 0.5   r = 0.9
S-est         0.91      0.90
Pairwise-S    0.5       0.45
CSE           0.70      0.50
CMME          0.74      0.78

p = 20, n = 200
ESTIMATES     r = 0.5   r = 0.9
S-est         0.96      0.96
Pairwise-S    0.36      0.37
CSE           0.74      0.44
CMME          0.81      0.60
Monte Carlo results
n = 100, p = 10, ɛ = 10%
[Figure: average LRT distance versus outlier size, 10% contamination (n = 100, p = 10). Panels: THCM Corr. = 0.5, THCM Corr. = 0.9, ICM Corr. = 0.5, ICM Corr. = 0.9. Curves: Pairwise S, Classical S, CS (QC), CMM (QC)]
Remarks and conclusion
In general, CSE (and CMME) are very robust under cell-wise contamination.
We have seen that CSE (and CMME) do not perform very well under HTCM.
Our goal is to have an estimator that is highly robust under both HTCM and cell-wise contamination (we are ambitious!)... while efficiency is our second priority.
To be continued...
Acknowledgement
Special thanks to Professor R. Zamar and Professor V. Yohai!
...AND THANK YOU FOR LISTENING!