Robust estimation of principal components from depth-based multivariate rank covariance matrix
Subho Majumdar, Snigdhansu Chatterjee
University of Minnesota, School of Statistics

Table of contents

Summary
- Introduction: what is data depth?
- Multivariate ranks based on data depth
- The Depth Covariance Matrix (DCM): overview of results
- Performance: simulations and real data analysis

What is depth?
A scalar measure of how far "inside" a point lies with respect to a data cloud.
Example: 500 points from $N_2((0, 0)^T, \mathrm{diag}(2, 1))$.

Formal definition
For any multivariate distribution $F = F_X$, the depth of a point $x \in \mathbb{R}^p$, say $D(x, F_X)$, is any real-valued function that provides a center-outward ordering of $x$ with respect to $F$ (Zuo and Serfling, 2000).
Desirable properties (Liu, 1990):
- (P1) Affine invariance: $D(Ax + b, F_{AX+b}) = D(x, F_X)$
- (P2) Maximality at center: $D(\theta, F_X) = \sup_{x \in \mathbb{R}^p} D(x, F_X)$ for $F_X$ with center of symmetry $\theta$, the deepest point of $F_X$
- (P3) Monotonicity w.r.t. deepest point: $D(x, F_X) \leq D(\theta + a(x - \theta), F_X)$ for $a \in [0, 1]$
- (P4) Vanishing at infinity: $D(x, F_X) \to 0$ as $\|x\| \to \infty$

Examples
Halfspace depth (HD) (Tukey, 1975) is the minimum probability over all halfspaces containing a point:
$$HD(x, F) = \inf_{u \in \mathbb{R}^p,\, u \neq 0} P(u^T X \geq u^T x)$$
Projection depth (PD) (Zuo, 2003) is based on an outlyingness function:
$$O(x, F) = \sup_{\|u\| = 1} \frac{|u^T x - m(u^T X)|}{s(u^T X)}; \qquad PD(x, F) = \frac{1}{1 + O(x, F)}$$
where $m(\cdot)$ and $s(\cdot)$ are univariate location and scale measures (e.g. median and MAD).
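The outlyingness form of PD suggests a simple Monte Carlo approximation: replace the supremum over all unit directions by a maximum over finitely many random directions, with the median and (unscaled) MAD playing the roles of $m(\cdot)$ and $s(\cdot)$. A minimal sketch in Python (the function name, direction count and seed handling are illustrative, not from the talk):

```python
import numpy as np

def projection_depth(x, X, n_dirs=500, seed=None):
    """Approximate projection depth of a point x w.r.t. the data cloud X.

    The supremum over all unit directions is replaced by a maximum over
    finitely many random directions, so the outlyingness is under-estimated
    and the returned depth is an upper bound on the true PD.
    """
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n_dirs, X.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)   # random unit directions

    proj_X = X @ U.T            # (n, n_dirs): data projected on each direction
    proj_x = U @ x              # (n_dirs,):   query point projected likewise
    med = np.median(proj_X, axis=0)
    mad = np.median(np.abs(proj_X - med), axis=0)
    outlyingness = np.max(np.abs(proj_x - med) / mad)
    return 1.0 / (1.0 + outlyingness)

# toy usage: the center of the cloud is deep, a far-away point is not
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], np.diag([2.0, 1.0]), size=500)
print(projection_depth(np.array([0.0, 0.0]), X, seed=1))   # close to 1
print(projection_depth(np.array([6.0, 6.0]), X, seed=1))   # close to 0
```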

Utility of data depth
- Robustness
- Classification
- Depth-weighted means and covariance matrices
What we're going to do: define multivariate rank vectors based on data depth, then do PCA on them.

Spatial signs (Locantore et al., 1999)
$$S(x) = \begin{cases} x / \|x\| & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}$$
Say $X$ follows an elliptical distribution with mean $\mu$ and covariance matrix $\Sigma$. The sign covariance matrix (SCM) is
$$\Sigma_S(X) = E\left[ S(X - \mu) S(X - \mu)^T \right]$$
The SCM has the same eigenvectors as $\Sigma$. PCA using the SCM is robust, but not efficient.
[Figure: scatter plot of the original data and of the corresponding spatial signs on the unit circle.]
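A minimal sketch of the sample SCM (the coordinate-wise median is used here as an illustrative center; in practice the spatial median or another robust location estimate would be plugged in):

```python
import numpy as np

def spatial_signs(X, center=None):
    """Spatial signs S(x - center) of the rows of X."""
    if center is None:
        center = np.median(X, axis=0)   # crude robust center; the spatial median is the usual choice
    Z = X - center
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    norms[norms == 0] = 1.0             # S(0) = 0 by convention
    return Z / norms

def sign_covariance_matrix(X, center=None):
    """Sample sign covariance matrix (SCM)."""
    S = spatial_signs(X, center)
    return S.T @ S / len(S)

# under ellipticity the SCM shares its eigenvectors with the covariance matrix
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[4.0, 1.5], [1.5, 1.0]], size=2000)
print(np.linalg.eigh(sign_covariance_matrix(X))[1])   # eigenvectors of the SCM
print(np.linalg.eigh(np.cov(X, rowvar=False))[1])     # should roughly agree (up to sign)
```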

Spatial ranks
Fix a depth function $D(x, F) = D_X(x)$ and define $\tilde{D}_X(x) = \sup_{y \in \mathbb{R}^p} D_X(y) - D_X(x)$.
Transform the original observation: $\tilde{x} = \tilde{D}_X(x)\, S(x - \mu)$. This is the spatial rank of $x$.
Depth Covariance Matrix (DCM) = $\mathrm{Cov}(\tilde{X})$. The ranks carry more information than spatial signs, so the DCM is more efficient.
[Figure: scatter plot of the original data and of the corresponding spatial ranks.]
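Putting the two transformations together, a sample DCM can be sketched as below; this reuses the illustrative projection_depth function from the earlier sketch and replaces the population supremum of the depth by the sample maximum:

```python
import numpy as np

def depth_covariance_matrix(X, depth_fn, center=None):
    """Sample Depth Covariance Matrix: covariance of the depth-based spatial ranks.

    depth_fn(x, X) returns the depth of x w.r.t. the sample X, e.g. the
    approximate projection_depth sketched earlier.
    """
    if center is None:
        center = np.median(X, axis=0)            # illustrative robust center
    depths = np.array([depth_fn(x, X) for x in X])
    rank_mag = depths.max() - depths             # outward rank magnitude, bounded by the max depth
    Z = X - center
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    X_tilde = rank_mag[:, None] * (Z / norms)    # spatial ranks
    X_tilde = X_tilde - X_tilde.mean(axis=0)
    return X_tilde.T @ X_tilde / len(X_tilde)

# robust PCA: use the eigenvectors of the DCM in place of those of the sample covariance
rng = np.random.default_rng(1)
X = rng.multivariate_normal([0.0, 0.0], np.diag([2.0, 1.0]), size=300)
dcm = depth_covariance_matrix(X, projection_depth)   # projection_depth from the earlier sketch
print(np.linalg.eigh(dcm)[1][:, ::-1])               # leading eigenvector roughly (±1, 0)
```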

Form of DCM
Theorem 1. Let the random variable $X \in \mathbb{R}^p$ follow an elliptical distribution with center $\mu$ and covariance matrix $\Sigma = \Gamma \Lambda \Gamma^T$, its spectral decomposition. Then, given a depth function $D_X(\cdot)$, the covariance matrix of the transformed random variable $\tilde{X}$ is $\mathrm{Cov}(\tilde{X}) = \Gamma \Lambda_{D,S} \Gamma^T$, with
$$\Lambda_{D,S} = E\left[ (\tilde{D}_Z(z))^2 \, \frac{\Lambda^{1/2} z z^T \Lambda^{1/2}}{z^T \Lambda z} \right] \qquad (1)$$
where $z = (z_1, \ldots, z_p)^T \sim N(0, I_p)$ and $\Lambda_{D,S}$ is a diagonal matrix with diagonal entries
$$\lambda_{D,S,i} = E\left[ (\tilde{D}_Z(z))^2 \, \frac{\lambda_i z_i^2}{\sum_{j=1}^p \lambda_j z_j^2} \right]$$
TL;DR: the population eigenvectors are invariant under the spatial rank transformation.

Other results
- Asymptotic distribution of the sample DCM and the form of its asymptotic variance
- Asymptotic joint distribution of eigenvectors and eigenvalues of the sample DCM
- Form and shape of the influence function: a measure of robustness
- Asymptotic efficiency relative to the sample covariance matrix

Simulation
- 6 elliptical distributions: the p-variate normal and t-distributions with df = 5, 6, 10, 15, 25.
- All distributions centered at $0_p$, with covariance matrix $\Sigma = \mathrm{diag}(p, p-1, \ldots, 1)$.
- 3 choices of p: 2, 3 and 4.
- 10000 samples each for sample sizes n = 20, 50, 100, 300, 500.
For estimates $\hat{\gamma}_1$ of the first population eigenvector $\gamma_1$, prediction error is measured by the average smallest angle between the two lines, i.e. the Mean Squared Prediction Angle:
$$\mathrm{MSPA}(\hat{\gamma}_1) = \frac{1}{10000} \sum_{m=1}^{10000} \left( \cos^{-1} \left| \gamma_1^T \hat{\gamma}_1^{(m)} \right| \right)^2$$
The finite sample efficiency of an eigenvector estimate $\hat{\gamma}_1^E$ relative to that obtained from the sample covariance matrix, say $\hat{\gamma}_1^{Cov}$, is
$$\mathrm{FSE}(\hat{\gamma}_1^E, \hat{\gamma}_1^{Cov}) = \frac{\mathrm{MSPA}(\hat{\gamma}_1^{Cov})}{\mathrm{MSPA}(\hat{\gamma}_1^E)}$$
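Both criteria are simple to compute once the Monte Carlo estimates of the first eigenvector have been collected; a minimal sketch (function and argument names are illustrative):

```python
import numpy as np

def mspa(gamma_true, gamma_hats):
    """Mean Squared Prediction Angle between the true first eigenvector and
    a list of Monte Carlo estimates (one per simulated sample).

    The absolute inner product gives the acute angle between the two lines,
    so sign flips of an estimated eigenvector do not matter.
    """
    gamma_true = gamma_true / np.linalg.norm(gamma_true)
    angles = [np.arccos(min(1.0, abs(g @ gamma_true) / np.linalg.norm(g)))
              for g in gamma_hats]
    return float(np.mean(np.square(angles)))

def fse(gamma_hats_E, gamma_hats_cov, gamma_true):
    """Finite sample efficiency of estimator E relative to the sample covariance matrix."""
    return mspa(gamma_true, gamma_hats_cov) / mspa(gamma_true, gamma_hats_E)
```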

Table of FSE for p = 2 (HSD = halfspace depth, MhD = Mahalanobis depth, PD = projection depth)

F = Bivariate t_5
        SCM    HSD-CM  MhD-CM  PD-CM
n=20    0.80   0.95    0.95    0.89
n=50    0.86   1.25    1.10    1.21
n=100   1.02   1.58    1.20    1.54
n=300   1.24   1.81    1.36    1.82
n=500   1.25   1.80    1.33    1.84

F = Bivariate t_6
        SCM    HSD-CM  MhD-CM  PD-CM
n=20    0.77   0.92    0.92    0.86
n=50    0.76   1.11    1.00    1.08
n=100   0.78   1.27    1.06    1.33
n=300   0.88   1.29    1.09    1.35
n=500   0.93   1.37    1.13    1.40

F = Bivariate t_10
        SCM    HSD-CM  MhD-CM  PD-CM
n=20    0.70   0.83    0.84    0.77
n=50    0.58   0.90    0.84    0.86
n=100   0.57   0.92    0.87    0.97
n=300   0.62   0.93    0.85    0.99
n=500   0.62   0.93    0.86    1.00

Table of FSE for p = 2, continued (HSD = halfspace depth, MhD = Mahalanobis depth, PD = projection depth)

F = Bivariate t_15
        SCM    HSD-CM  MhD-CM  PD-CM
n=20    0.63   0.76    0.78    0.72
n=50    0.52   0.79    0.75    0.80
n=100   0.51   0.83    0.77    0.88
n=300   0.55   0.84    0.79    0.91
n=500   0.56   0.85    0.80    0.93

F = Bivariate t_25
        SCM    HSD-CM  MhD-CM  PD-CM
n=20    0.63   0.77    0.79    0.74
n=50    0.49   0.73    0.71    0.76
n=100   0.45   0.73    0.69    0.81
n=300   0.51   0.78    0.75    0.87
n=500   0.53   0.79    0.75    0.87

F = BVN
        SCM    HSD-CM  MhD-CM  PD-CM
n=20    0.56   0.69    0.71    0.67
n=50    0.42   0.66    0.66    0.70
n=100   0.42   0.69    0.66    0.77
n=300   0.47   0.71    0.69    0.82
n=500   0.48   0.73    0.71    0.83

Data analysis: Bus data
Features extracted from images of 213 buses: 18 variables.
Methods compared:
- Classical PCA (CPCA)
- SCM PCA (SPCA)
- ROBPCA (Hubert et al., 2005)
- PCA based on MCD (MPCA)
- PCA based on the projection DCM (DPCA)

Score distance (SD) and orthogonal distance (OD)
$$\mathrm{SD}_i = \sqrt{\sum_{j=1}^k \frac{s_{ij}^2}{\lambda_j}}; \qquad \mathrm{OD}_i = \left\| x_i - P s_i \right\|$$
where $S_{n \times k} = (s_1, \ldots, s_n)^T$ is the score matrix, $P_{p \times k}$ the loading matrix, $\lambda_1, \ldots, \lambda_k$ are the eigenvalues obtained from the PCA, and $x_1, \ldots, x_n$ are the observation vectors.
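A minimal sketch of the two diagnostics for any fitted PCA (the data are assumed already centered; variable names are illustrative):

```python
import numpy as np

def score_and_orthogonal_distances(X, P, eigvals):
    """Score distance (SD) and orthogonal distance (OD) for each observation.

    X       : (n, p) centered data matrix (rows are observations)
    P       : (p, k) loading matrix, top-k eigenvectors as columns
    eigvals : (k,) eigenvalues matching the columns of P
    """
    S = X @ P                                        # (n, k) score matrix
    sd = np.sqrt(np.sum(S**2 / eigvals, axis=1))     # distance within the PC subspace
    od = np.linalg.norm(X - S @ P.T, axis=1)         # distance to the fitted PC hyperplane
    return sd, od

# usage: observations with large OD (and/or SD) are flagged as potential outliers
```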

Bus data: comparison tables

Quantile   CPCA   SPCA   ROBPCA   MPCA    DPCA
10%        1.9    1.2    1.2      1.0     1.2
20%        2.3    1.6    1.6      1.3     1.6
30%        2.8    1.8    1.8      1.7     1.9
40%        3.2    2.2    2.1      2.1     2.3
50%        3.7    2.6    2.5      3.1     2.6
60%        4.4    3.1    3.0      5.9     3.2
70%        5.4    3.8    3.9      25.1    3.9
80%        6.5    5.2    4.8      86.1    4.8
90%        8.2    9.0    10.9     298.2   6.9
Max        24     1037   1055     1037    980

The table lists quantiles of the squared orthogonal distance of a sample point from the hyperplane formed by the top 3 PCs. For DPCA, more than 90% of points have a smaller orthogonal distance than under CPCA.

Data analysis: Octane data
226 variables and 39 observations. Each observation is a gasoline sample with a certain octane number, with its NIR absorbance spectrum measured in 2 nm intervals between 1100 and 1550 nm. There are 6 outliers: compounds 25, 26 and 36-39, which contain alcohol.
[Figure: outlier maps (score distance vs. orthogonal distance) for classical PCA and depth PCA; depth PCA isolates compounds 25, 26 and 36-39.]

Extensions: Robust kernel PCA
20 points from each person, with noise added to one image from each person. [Figure: reconstructed images; columns correspond to kernel CPCA, SPCA and DPCA, respectively; rows correspond to using the top 2, 4, 6, 8 or 10 PCs.]

Extensions: future work
- Explore properties of a depth-weighted M-estimator of the scale matrix:
$$\Sigma_{Dw} = E\left[ (\tilde{D}_X(x))^2 \, \frac{(x - \mu)(x - \mu)^T}{(x - \mu)^T \Sigma_{Dw}^{-1} (x - \mu)} \right]$$
- Leverage the idea of depth-based ranks for robust nonparametric testing
- Extend to high-dimensional and functional data
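The scatter matrix above is defined only implicitly, so a sample version would naturally be computed by a fixed-point iteration. The sketch below is speculative (not an algorithm from the talk) and assumes the outward depth-ranks and the center are supplied:

```python
import numpy as np

def depth_weighted_m_scatter(X, depth_tilde, mu, n_iter=100, tol=1e-8):
    """Speculative fixed-point iteration for the depth-weighted M-estimator of scatter.

    Iterates the sample analogue of
        Sigma = E[ D~(x)^2 (x - mu)(x - mu)^T / ((x - mu)^T Sigma^{-1} (x - mu)) ].
    depth_tilde : (n,) precomputed outward depth-ranks D~_X(x_i)
    """
    n, p = X.shape
    Z = X - mu
    Sigma = np.cov(Z, rowvar=False)                   # starting value
    for _ in range(n_iter):
        mahal = np.einsum('ij,jk,ik->i', Z, np.linalg.inv(Sigma), Z)
        w = depth_tilde**2 / np.maximum(mahal, 1e-12)  # per-point weights
        Sigma_new = (Z * w[:, None]).T @ Z / n
        if np.linalg.norm(Sigma_new - Sigma) < tol:
            return Sigma_new
        Sigma = Sigma_new
    return Sigma
```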

Acknowledgements
- NSF grants # IIS-1029711, # SES-0851705; NASA grant #-1502546
- The Institute on the Environment (IonE) and College of Liberal Arts (CLA), University of Minnesota
- Abhirup Mallik
- My team at Santander Consumer USA

References
M. Hubert, P. J. Rousseeuw, and K. V. Branden. ROBPCA: A new approach to robust principal component analysis. Technometrics, 47(1):64-79, 2005.
R. Y. Liu. On a notion of data depth based on random simplices. Ann. Statist., 18:405-414, 1990.
N. Locantore, J. S. Marron, D. G. Simpson, N. Tripoli, J. T. Zhang, and K. L. Cohen. Robust principal components of functional data. TEST, 8:1-73, 1999.
J. W. Tukey. Mathematics and picturing data. In R. D. James, editor, Proceedings of the International Congress of Mathematicians, volume 2, pages 523-531, 1975.
Y. Zuo. Projection-based depth functions and associated medians. Ann. Statist., 31:1460-1490, 2003.
Y. Zuo and R. Serfling. General notions of statistical depth functions. Ann. Statist., 28(2):461-482, 2000.

THANK YOU!