Cellwise robust regularized discriminant analysis

Similar documents
Cellwise robust regularized discriminant analysis

Robust and sparse Gaussian graphical modelling under cell-wise contamination

A Robust Approach to Regularized Discriminant Analysis

Robust estimation of scale and covariance with P n and its application to precision matrix estimation

Frontiers in Forecasting, Minneapolis February 21-23, Sparse VAR-Models. Christophe Croux. EDHEC Business School (France)

Introduction to Machine Learning

Introduction to Machine Learning

Robust and sparse estimation of the inverse covariance matrix using rank correlation measures

Raninen, Elias; Ollila, Esa Optimal Pooling of Covariance Matrix Estimates Across Multiple Classes

Robust Inverse Covariance Estimation under Noisy Measurements

Classification: Linear Discriminant Analysis

Robust scale estimation with extensions

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING

Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation

Lecture 6: Methods for high-dimensional problems

Machine Learning. Regression-Based Classification & Gaussian Discriminant Analysis. Manfred Huber

High Dimensional Discriminant Analysis

Lecture 9: Classification, LDA

Sparse Permutation Invariant Covariance Estimation: Final Talk

Robust Linear Discriminant Analysis

Lecture 9: Classification, LDA

The Bayes classifier

ISyE 6416: Computational Statistics Spring Lecture 5: Discriminant analysis and classification

Classification Methods II: Linear and Quadratic Discrimminant Analysis

CMSC858P Supervised Learning Methods

The Variational Gaussian Approximation Revisited

Machine Learning (CS 567) Lecture 5

Gaussian Models

Contents Lecture 4. Lecture 4 Linear Discriminant Analysis. Summary of Lecture 3 (II/II) Summary of Lecture 3 (I/II)

Proximity-Based Anomaly Detection using Sparse Structure Learning

Lecture 4 Discriminant Analysis, k-nearest Neighbors

High Dimensional Discriminant Analysis

Classification Ensemble That Maximizes the Area Under Receiver Operating Characteristic Curve (AUC)

Machine Learning - MT Classification: Generative Models

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING

Nonlinear Support Vector Machines through Iterative Majorization and I-Splines

Regularized Discriminant Analysis and Reduced-Rank LDA

Linear Regression and Discrimination

Section 7: Discriminant Analysis.

Sparse Permutation Invariant Covariance Estimation

An algorithm for the multivariate group lasso with covariance estimation

Learning Binary Classifiers for Multi-Class Problem

ROBUST LINEAR DISCRIMINANT ANALYSIS WITH A LAPLACIAN ASSUMPTION ON PROJECTION DISTRIBUTION

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones

Statistical Machine Learning Hilary Term 2018

STATISTICS 407 METHODS OF MULTIVARIATE ANALYSIS TOPICS

Introduction to Machine Learning

Regularized Discriminant Analysis. Part I. Linear and Quadratic Discriminant Analysis. Discriminant Analysis. Example. Example. Class distribution

Lasso, Ridge, and Elastic Net

High Dimensional Discriminant Analysis

High-dimensional covariance estimation based on Gaussian graphical models

Statistical Data Mining and Machine Learning Hilary Term 2016

Approximation. Inderjit S. Dhillon Dept of Computer Science UT Austin. SAMSI Massive Datasets Opening Workshop Raleigh, North Carolina.

Regression with correlation for the Sales Data

Unsupervised Regressive Learning in High-dimensional Space

A Study of Relative Efficiency and Robustness of Classification Methods

arxiv: v1 [math.st] 31 Jan 2008

Machine Learning (BSMC-GA 4439) Wenke Liu

Robust methods and model selection. Garth Tarr September 2015

Sparse quadratic classification rules via linear dimension reduction

Lecture 9: Classification, LDA

Active and Semi-supervised Kernel Classification

Introduction to Machine Learning Spring 2018 Note 18

Support Vector Machines for Classification: A Statistical Portrait

MULTIVARIATE PATTERN RECOGNITION FOR CHEMOMETRICS. Richard Brereton

INRIA Rh^one-Alpes. Abstract. Friedman (1989) has proposed a regularization technique (RDA) of discriminant analysis

STATS306B STATS306B. Discriminant Analysis. Jonathan Taylor Department of Statistics Stanford University. June 3, 2010

Independent Factor Discriminant Analysis

Assignment 3. Introduction to Machine Learning Prof. B. Ravindran

Variable Selection and Weighting by Nearest Neighbor Ensembles

Sparse Permutation Invariant Covariance Estimation: Motivation, Background and Key Results

LINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Principal Component Analysis Applied to Polytomous Quadratic Logistic

The Minimum Regularized Covariance Determinant estimator

Does Modeling Lead to More Accurate Classification?

CS 340 Lec. 18: Multivariate Gaussian Distributions and Linear Discriminant Analysis

From dummy regression to prior probabilities in PLS-DA

Linear Discriminant Analysis Based in part on slides from textbook, slides of Susan Holmes. November 9, Statistics 202: Data Mining

LEC 4: Discriminant Analysis for Classification

EPMC Estimation in Discriminant Analysis when the Dimension and Sample Sizes are Large

Regularized Parameter Estimation in High-Dimensional Gaussian Mixture Models

Classification via kernel regression based on univariate product density estimators

Transductive Inference for Estimating Values of Functions

Applications of Information Geometry to Hypothesis Testing and Signal Detection

Learning Kernel Parameters by using Class Separability Measure

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

Supervised Learning: Linear Methods (1/2) Applied Multivariate Statistics Spring 2012

LDA, QDA, Naive Bayes

Multivariate Normal Models

Introduction to Machine Learning

Semi-supervised Dictionary Learning Based on Hilbert-Schmidt Independence Criterion

Multivariate Normal Models

Machine Learning Linear Classification. Prof. Matteo Matteucci

Lecture 3. Linear Regression II Bastian Leibe RWTH Aachen

MATH 567: Mathematical Techniques in Data Science Logistic regression and Discriminant Analysis

Efficient and Robust Scale Estimation

Ch 4. Linear Models for Classification

10708 Graphical Models: Homework 2

Final Overview. Introduction to ML. Marek Petrik 4/25/2017

Transcription:

Cellwise robust regularized discriminant analysis Ines Wilms (KU Leuven) and Stéphanie Aerts (University of Liège) ICORS, July 2017 Wilms and Aerts Cellwise robust regularized discriminant analysis 1

Discriminant analysis X = {x 1,..., x N } of dimension p, splitted into K groups, each having n k observations Goal : Classify new data x π k prior probability N p (µ k, Σ k ) conditional distribution Wilms and Aerts Cellwise robust regularized discriminant analysis 2

Discriminant analysis X = {x 1,..., x N } of dimension p, splitted into K groups, each having n k observations Goal : Classify new data x π k prior probability N p (µ k, Σ k ) conditional distribution Quadratic discriminant analysis (QDA) : ( max (x µk ) T ) Θ k (x µ k ) + log(det Θ k ) + 2 log π k k where Θ k := Σ 1 k Linear discriminant analysis (LDA) : Homoscedasticity : Θ k = Θ k Wilms and Aerts Cellwise robust regularized discriminant analysis 2

Discriminant analysis In practice, the parameters µ k, Θ k or Θ are estimated by the sample means x k the inverse of the sample covariance matrices Σ k the inverse of the sample pooled covariance matrix Σ pool = 1 N K K k=1 n k Σ k Wilms and Aerts Cellwise robust regularized discriminant analysis 3

Example - Phoneme dataset Hastie et al., 2009 K = 2 phonemes: aa (like in barn) or ao (like in more) p = 256: log intensity of the sound across 256 frequencies N = 1717 records of a male voice Correct classification performance s-lda s-qda 77.7 62.4 Wilms and Aerts Cellwise robust regularized discriminant analysis 4

Example - Phoneme dataset Hastie et al., 2009 K = 2 phonemes: aa (like in barn) or ao (like in more) p = 256: log intensity of the sound across 256 frequencies N = 1717 records of a male voice Correct classification performance s-lda s-qda 77.7 62.4 Σ 1 k inaccurate when p/n k is large, not computable when p > n k Σ 1 pool inaccurate when p/n is large, not computable when p > N Wilms and Aerts Cellwise robust regularized discriminant analysis 4

Objectives Propose a family of discriminant methods that, unlike the classical approach, are 1 computable and accurate in high dimension 2 cover the path between LDA and QDA 3 robust against cellwise outliers Wilms and Aerts Cellwise robust regularized discriminant analysis 5

1. Computable in high dimension Graphical Lasso QDA (Xu et al., 2014) Step 1 : Compute the sample means x k and covariance matrices Σ k Step 2 : Graphical Lasso (Friedman et al, 2008) to estimate Θ 1,..., Θ K : ) max n k log det(θ k ) n k tr (Θ k Σk λ 1 θ k,ij Θ k Step 3 : Plug x 1,..., x K and Θ 1,..., Θ K into the quadratic rule Note: Use pooled covariance matrix for Graphical Lasso LDA i j Wilms and Aerts Cellwise robust regularized discriminant analysis 6

1. Computable in high dimension Graphical Lasso QDA (Xu et al., 2014) Step 1 : Compute the sample means x k and covariance matrices Σ k Step 2 : Graphical Lasso (Friedman et al, 2008) to estimate Θ 1,..., Θ K : ) max n k log det(θ k ) n k tr (Θ k Σk λ 1 θ k,ij Θ k Step 3 : Plug x 1,..., x K and Θ 1,..., Θ K into the quadratic rule Note: Use pooled covariance matrix for Graphical Lasso LDA i j QDA does not exploit similarities LDA ignores group specificities Wilms and Aerts Cellwise robust regularized discriminant analysis 6

2. Covering path between LDA and QDA Joint Graphical Lasso DA (Price et al., 2015) Step 1 : Compute the sample means x k and covariance matrices Σ k Step 2 : Joint Graphical Lasso (Danaher et al, 2014) to estimate Θ 1,..., Θ K : max Θ 1,...,Θ K K n k log det(θ k ) n k tr(θ k Σk ) λ 1 k=1 k=1 i j k<k i,j K θ k,ij λ 2 θ k,ij θ k,ij, Step 3 : Plug Θ 1,..., Θ K into the quadratic discriminant rule Wilms and Aerts Cellwise robust regularized discriminant analysis 7

2. Covering path between LDA and QDA Joint Graphical Lasso DA (Price et al., 2015) Step 1 : Compute the sample means x k and covariance matrices Σ k Step 2 : Joint Graphical Lasso (Danaher et al, 2014) to estimate Θ 1,..., Θ K : max Θ 1,...,Θ K K n k log det(θ k ) n k tr(θ k Σk ) λ 1 k=1 k=1 i j k<k i,j K θ k,ij λ 2 θ k,ij θ k,ij, Step 3 : Plug Θ 1,..., Θ K into the quadratic discriminant rule Lack of robustness Wilms and Aerts Cellwise robust regularized discriminant analysis 7

3. Robustness against cellwise outliers Robust Joint Graphical Lasso DA Step 1 : Compute robust m k and S k estimates Step 2 : Joint Graphical Lasso to estimate Θ 1,..., Θ K max Θ 1,...,Θ K K n k log det(θ k ) n k tr(θ k S k ) λ 1 k=1 k=1 i j k<k i,j K θ k,ij λ 2 θ k,ij θ k,ij, Step 3 : Plug m 1,..., m K and Θ 1,..., Θ K into the quadratic discriminant rule Wilms and Aerts Cellwise robust regularized discriminant analysis 8

3. Robustness against cellwise outliers Robust Joint Graphical Lasso DA Step 1 : Compute robust m k and S k estimates Step 2 : Joint Graphical Lasso to estimate Θ 1,..., Θ K max Θ 1,...,Θ K K n k log det(θ k ) n k tr(θ k S k ) λ 1 k=1 k=1 i j k<k i,j K θ k,ij λ 2 θ k,ij θ k,ij, Step 3 : Plug m 1,..., m K and Θ 1,..., Θ K into the quadratic discriminant rule m k : vector of marginal medians S k : cellwise robust covariance matrices Wilms and Aerts Cellwise robust regularized discriminant analysis 8

Cellwise robust covariance estimators s 11... s 1i... s 1p... S k = s i1... s ij... s ip... s p1... s pj... s pp s ij = scale(x i ) scale(x j ) ĉorr(x i, X j ) scale(.) : Q n -estimator (Rousseeuw and Croux, 1993) ĉorr(.,.) : Kendall s correlation ĉorr K (X i, X j ) = 2 n(n 1) l<m ( ) sign (xl i xm)(x i j l xm) j. (see Croux and Öllerer, 2015; Tarr et al., 2016) Wilms and Aerts Cellwise robust regularized discriminant analysis 9

Simulation study Setting K = 10 groups n k = 30 p = 30 M = 1000 training and test sets Classification Performance Average percentage of correct classification Estimation accuracy Average Kullback-Leibler distance : ( K ) KL( Θ 1,..., Θ K ; Θ 1,..., Θ K ) = log det( Θ k Θ 1 k ) + tr( Θ k Θ 1 k ) k=1 Kp. Wilms and Aerts Cellwise robust regularized discriminant analysis 10

Uncontaminated scheme Non-robust estimators s-lda s-qda GL-LDA GL-QDA JGL-DA p = 30 CC 77.7 NA 80.5 83.0 83.5 KL 30.29 NA 21.87 40.41 5.03 Robust estimators r-lda r-qda rgl-lda rgl-qda rjgl-da p = 30 CC 76.1 59.7 77.4 79.7 80.1 KL 22.86 139.18 22.98 44.57 11.02 Wilms and Aerts Cellwise robust regularized discriminant analysis 11

Contaminated scheme: 5% of cellwise contamination Correct classification percentages, p = 30 90 80 70 60 50 40 30 20 10 0 s LDA r LDA s QDA r QDA GL LDA rgl LDA GL QDA GL QDA JGL DA rjgl DA Wilms and Aerts Cellwise robust regularized discriminant analysis 12

Example 1 - Phoneme dataset N = 1717 K = 2 p = 256 Correct classification performance s-lda s-qda GL-LDA GL-QDA JGL-DA 77.7 62.4 81.4 74.9 78.4 r-lda r-qda rgl-lda rgl-qda rjgl-da 81.1 74.7 81.7 76.0 76.7 N train = 1030, N test = 687, averaged over 10 splits Wilms and Aerts Cellwise robust regularized discriminant analysis 13

Conclusion The proposed discriminant methods : 1 are computable in high dimension 2 cover the path between LDA and QDA 3 are robust against cellwise outliers 4 detect rowwise and cellwise outliers Code publicly available http://feb.kuleuven.be/ines.wilms/software Wilms and Aerts Cellwise robust regularized discriminant analysis 14

References S. Aerts, I. Wilms. Cellwise robust regularized discriminant analysis. FEB Research Report, KBI 1701, C. Croux and V. Öllerer. Robust high-dimensional precision matrix estimation, Modern Multivariate and Robust Methods. Springer, 2015 P. Danaher, P. Wang, and D. Witten. The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society, Series B, 76: 373 397, 2014. B. Price, C. Geyer, and A. Rothman. Ridge fusion in statistical learning. Journal of Computational and Graphical Statistics, 24(2):439 454, 2015. P. Rousseeuw and C. Croux. Alternatives to the median absolute deviation. Journal of the American Statistical Association, 88(424):1273 1283, 1993. G. Tarr, S. Müller, and N.C. Weber. Robust estimation of precision matrices under cellwise contamination. Computational Statistics and Data Analysis, 93:404 420, 2016. B. Xu, K. Huang, King I., C. Liu, J. Sun, and N. Satoshi. Graphical lasso quadratic discriminant function and its application to character recognition. Neurocomputing, 129: 33 40, 2014. Wilms and Aerts Cellwise robust regularized discriminant analysis 15