High Dimensional Discriminant Analysis

High Dimensional Discriminant Analysis
Charles Bouveyron, LMC-IMAG & INRIA Rhône-Alpes
Joint work with S. Girard and C. Schmid
(Lear seminar)

Introduction
High-dimensional data: many scientific domains need to analyze increasingly complex data; modern data are made up of many variables, e.g. imaging (MRI, vision), biology (DNA micro-arrays), ...
Classification is very difficult in high-dimensional spaces: many learning methods suffer from the curse of dimensionality [Bel61], since the number n of observations is generally not sufficient to learn high-dimensional models.
The empty space phenomenon [ST83] allows us to assume that the data actually live in subspaces of lower dimensionality.

Introduction
Classification: supervised classification (discriminant analysis) requires labeled examples of the classes; unsupervised classification (clustering) aims to organize the data into homogeneous classes.
Two families of methods: generative methods (QDA, LDA, GMM) and discriminative methods (logistic regression, SVM).
Generative models can be used both in supervised and in unsupervised classification.

Outline of the talk
1. Discriminant analysis framework
2. New modelling of high-dimensional data
3. High dimensional discriminant analysis (HDDA): construction of the decision rule, a posteriori probabilities and reformulation
4. Particular rules
5. Estimators and intrinsic dimension estimation
6. Numerical results: application to image categorization and to object recognition
7. Extension to unsupervised classification

Part 1: Discriminant analysis framework

Discriminant analysis framework
Discriminant analysis is the supervised side of classification, i.e. it requires a teacher (labeled examples)!
Goals of discriminant analysis:
descriptive aspect: find a representation of the data which allows the groups to be interpreted through the explanatory variables;
decisional aspect: the main goal is to determine the class membership of a new observation x.
Of course, HDDA favours the decisional aspect!

Discrimination problem
The basic problem: assign an observation $x = (x_1, \dots, x_p) \in \mathbb{R}^p$ with unknown class membership to one of $k$ classes $C_1, \dots, C_k$ known a priori.
We have a learning dataset $A = \{(x_1, y_1), \dots, (x_n, y_n) : x_j \in \mathbb{R}^p,\ y_j \in \{1, \dots, k\}\}$, where the vector $x_j$ contains the $p$ explanatory variables and $y_j$ is the index of the class of $x_j$.
We have to construct a decision rule $\delta : \mathbb{R}^p \to \{1, \dots, k\},\ x \mapsto y$.

Bayes decision rule
The optimal decision rule $\delta$, called the Bayes decision rule, is:
$\delta : x \in C_{i^*}$ with $i^* = \operatorname*{argmax}_{i=1,\dots,k} \, p(C_i \mid x)$,
or equivalently $i^* = \operatorname*{argmin}_{i=1,\dots,k} \, \{-2 \log(\pi_i f_i(x))\}$,
where $\pi_i$ is the a priori probability of class $C_i$ and $f_i(x)$ denotes the class-conditional density of $x$.
Generative methods usually assume that the class distributions are Gaussian $\mathcal{N}(\mu_i, \Sigma_i)$.

Classical discriminant analysis methods
Quadratic discriminant analysis (QDA):
$i^* = \operatorname*{argmin}_{i=1,\dots,k} \{ (x - \mu_i)^t \Sigma_i^{-1} (x - \mu_i) + \log(\det \Sigma_i) - 2 \log(\pi_i) \}$.
Linear discriminant analysis (LDA), with the assumption that $\forall i,\ \Sigma_i = \Sigma$:
$i^* = \operatorname*{argmin}_{i=1,\dots,k} \{ \mu_i^t \Sigma^{-1} \mu_i - 2 \mu_i^t \Sigma^{-1} x - 2 \log(\pi_i) \}$.
QDA and LDA behave disappointingly when the size n of the training dataset is small compared to the number p of variables.
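
For comparison with what follows, here is a minimal NumPy sketch of the QDA rule above (the names qda_score and qda_classify are illustrative, not from the talk); LDA is recovered by passing the same pooled covariance matrix for every class.

import numpy as np

def qda_score(x, mu, Sigma, pi):
    """QDA cost -2 log(pi_i f_i(x)) up to an additive constant."""
    diff = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return diff @ np.linalg.solve(Sigma, diff) + logdet - 2.0 * np.log(pi)

def qda_classify(x, mus, Sigmas, pis):
    """Assign x to the class with the smallest QDA cost."""
    costs = [qda_score(x, mu, S, pi) for mu, S, pi in zip(mus, Sigmas, pis)]
    return int(np.argmin(costs))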

Regularization of discriminant analysis
Dimension reduction: PCA, FDA, feature selection, ...
Fisher discriminant analysis (FDA) combines a dimension reduction step (projection on the k-1 discriminant axes) with one of the previous methods (usually LDA).
Parsimonious models:
Regularized discriminant analysis (RDA, [Fri89]) is an intermediate classifier between QDA and LDA;
Eigenvalue decomposition discriminant analysis (EDDA, [BC96]) is based on a re-parametrization of the class covariance matrices: $\Sigma_i = \lambda_i D_i A_i D_i^t$.

Dimension reduction for classification
[Figure: scatter plots of the same data on the PCA axes and on the discriminant axes]
Fig. 1 - High-dimensional data whose classes live in different subspaces of lower dimensionality.

Part 2: New modelling of high-dimensional data

New modelling
The empty space phenomenon enables us to assume that high-dimensional data live in subspaces of low dimensionality.
The main idea of the new modelling is: the space of each class is decomposed into a specific low-dimensional subspace and its complement, and the class is assumed spherical in both of these subspaces.

New modelling
We assume that the class-conditional densities are Gaussian $\mathcal{N}(\mu_i, \Sigma_i)$ with means $\mu_i$ and covariance matrices $\Sigma_i$.
Let $Q_i$ be the orthogonal matrix of eigenvectors of the covariance matrix $\Sigma_i$, and let $B_i$ be the basis of $\mathbb{R}^p$ made of these eigenvectors.
The class-conditional covariance matrix $\Delta_i$, expressed in the basis $B_i$, is: $\Delta_i = Q_i^t \Sigma_i Q_i$.

New modelling
We assume in addition that $\Delta_i$ contains only two different eigenvalues $a_i > b_i$.
Let $E_i$ be the affine space spanned by the eigenvectors associated with the eigenvalue $a_i$ and such that $\mu_i \in E_i$.
We also define $E_i^{\perp}$ such that $E_i \oplus E_i^{\perp} = \mathbb{R}^p$ and $\mu_i \in E_i^{\perp}$.
Let $P_i$ and $P_i^{\perp}$ be the projection operators onto $E_i$ and $E_i^{\perp}$ respectively.

New modelling
Thus, we assume that $\Delta_i$ has the following diagonal form:
$\Delta_i = \mathrm{diag}(\underbrace{a_i, \dots, a_i}_{d_i}, \underbrace{b_i, \dots, b_i}_{p - d_i})$,
i.e. the eigenvalue $a_i$ is repeated $d_i$ times and $b_i$ is repeated $(p - d_i)$ times.
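
To make the assumed covariance structure concrete, here is a small NumPy sketch (names and numerical values are illustrative) that builds $\Sigma_i = Q_i \Delta_i Q_i^t$ from only the two eigenvalues $a_i$ and $b_i$:

import numpy as np

def hdda_covariance(Q, a, b, d):
    """Build Sigma_i = Q_i Delta_i Q_i^t where Delta_i has eigenvalue a
    on the first d axes and eigenvalue b on the remaining p - d axes."""
    p = Q.shape[0]
    delta = np.concatenate([np.full(d, a), np.full(p - d, b)])
    return Q @ np.diag(delta) @ Q.T

# Example: a random orthogonal basis in dimension p = 5 with d = 2 and a > b.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
Sigma = hdda_covariance(Q, a=4.0, b=0.5, d=2)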

New modelling: illustration

Part 3: High Dimensional Discriminant Analysis

High Dimensional Discriminant Analysis
Under the preceding assumptions, the Bayes decision rule yields a new decision rule $\delta^+$.
Theorem 1: the new decision rule $\delta^+$ consists in assigning x to the class $C_{i^*}$ with
$i^* = \operatorname*{argmin}_{i=1,\dots,k} \Big\{ \frac{1}{a_i} \|\mu_i - P_i(x)\|^2 + \frac{1}{b_i} \|x - P_i(x)\|^2 + d_i \log(a_i) + (p - d_i) \log(b_i) - 2 \log(\pi_i) \Big\}$.

HDDA: illustration
$K_i(x) = \frac{1}{a_i} \|\mu_i - P_i(x)\|^2 + \frac{1}{b_i} \|x - P_i(x)\|^2 + d_i \log(a_i) + (p - d_i) \log(b_i) - 2 \log(\pi_i)$.
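
For illustration, a minimal NumPy sketch of the cost $K_i(x)$ and of the resulting classifier; the names hdda_cost and hdda_classify are illustrative, and it is assumed that Q stores the eigenvectors of $\Sigma_i$ as columns, sorted by decreasing eigenvalue.

import numpy as np

def hdda_cost(x, mu, Q, d, a, b, pi):
    """Cost K_i(x) of the HDDA rule: P_i projects onto the affine subspace E_i
    spanned by the first d eigenvectors of Sigma_i and passing through mu."""
    p = x.shape[0]
    W = Q[:, :d]                      # orthonormal basis of the class-specific subspace
    proj = mu + W @ (W.T @ (x - mu))  # P_i(x)
    return (np.sum((mu - proj) ** 2) / a
            + np.sum((x - proj) ** 2) / b
            + d * np.log(a) + (p - d) * np.log(b)
            - 2.0 * np.log(pi))

def hdda_classify(x, params):
    """Assign x to the class with the smallest cost; params is a list of
    (mu, Q, d, a, b, pi) tuples, one per class."""
    return int(np.argmin([hdda_cost(x, *theta) for theta in params]))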

HDDA: a posteriori probabilities
In many applications, it is useful to have the a posteriori probability $p(C_i \mid x)$ that x belongs to $C_i$. Bayes' formula yields:
$p(C_i \mid x) = \dfrac{\exp\left(-\tfrac{1}{2} K_i(x)\right)}{\sum_{j=1}^{k} \exp\left(-\tfrac{1}{2} K_j(x)\right)}$,
where $K_i$ is the cost function of $\delta^+$ conditionally on the class $C_i$:
$K_i(x) = \frac{1}{a_i} \|\mu_i - P_i(x)\|^2 + \frac{1}{b_i} \|x - P_i(x)\|^2 + d_i \log(a_i) + (p - d_i) \log(b_i) - 2 \log(\pi_i)$.
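
Building on the hdda_cost sketch above, these a posteriori probabilities can be computed as a numerically stable softmax of $-K_i(x)/2$ (again an illustrative sketch, not the authors' code):

import numpy as np

def hdda_posteriors(x, params):
    """A posteriori probabilities p(C_i | x) obtained from the costs K_i(x)."""
    K = np.array([hdda_cost(x, *theta) for theta in params])
    logits = -0.5 * K
    logits -= logits.max()            # avoid overflow in exp
    w = np.exp(logits)
    return w / w.sum()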

HDDA: reformulation
In order to interpret the decision rule $\delta^+$ more easily, we introduce $\alpha_i$ and $\sigma_i$ such that:
$a_i = \dfrac{\sigma_i^2}{\alpha_i}$ and $b_i = \dfrac{\sigma_i^2}{1 - \alpha_i}$, with $\alpha_i \in ]0, 1[$ and $\sigma_i > 0$.
Thus, the decision rule $\delta^+$ consists in assigning x to the class $C_{i^*}$ with
$i^* = \operatorname*{argmin}_{i=1,\dots,k} \Big\{ \frac{1}{\sigma_i^2} \big( \alpha_i \|\mu_i - P_i(x)\|^2 + (1 - \alpha_i) \|x - P_i(x)\|^2 \big) + 2p \log(\sigma_i) + d_i \log\Big(\frac{1 - \alpha_i}{\alpha_i}\Big) - p \log(1 - \alpha_i) - 2 \log(\pi_i) \Big\}$.
Notation: HDDA is the model $[a_i b_i Q_i d_i]$ or, equivalently, $[\alpha_i \sigma_i Q_i d_i]$.

Part 4: Particular rules

Particular rules
By allowing some but not all of the HDDA parameters to vary across classes, we obtain 24 particular rules:
they correspond to different regularizations,
some of them are easily interpretable geometrically,
and 9 of them have explicit formulations.
HDDA reduces to classical discriminant analysis in particular cases:
if $\forall i,\ \alpha_i = \tfrac{1}{2}$: $\delta^+$ is QDA with spherical classes;
if in addition $\forall i,\ \sigma_i = \sigma$: $\delta^+$ is LDA with spherical classes.

Links with classical methods
[Diagram: relationships between QDA ($\Sigma_i = \lambda_i D_i A_i D_i^t$, EDDA), HDDA ($\Sigma_i = Q_i \Delta_i Q_i^t$), LDA ($\Sigma_i = \lambda D A D^t$), spherical QDA ($A_i = \mathrm{Id}$, i.e. $\alpha_i = \tfrac{1}{2}$, $\Sigma_i = \sigma_i^2 \mathrm{Id}$), spherical LDA ($\sigma_i = \sigma$) and the geometric rule ($\pi_i = \pi$)]

Model $[\alpha \sigma Q_i d_i]$
The decision rule $\delta^+$ consists in assigning x to the class $C_{i^*}$ with
$i^* = \operatorname*{argmin}_{i=1,\dots,k} \{ \alpha \|\mu_i - P_i(x)\|^2 + (1 - \alpha) \|x - P_i(x)\|^2 \}$.

Part 5: Estimation

HDDA estimators
Estimators are computed by maximum likelihood from the learning set A.
Common estimators:
$\hat\pi_i = \dfrac{n_i}{n}$, with $n_i = \#(C_i)$,
$\hat\mu_i = \dfrac{1}{n_i} \sum_{x_j \in C_i} x_j$,
$\hat\Sigma_i = \dfrac{1}{n_i} \sum_{x_j \in C_i} (x_j - \hat\mu_i)(x_j - \hat\mu_i)^t$.

Estimators of the model $[a_i b_i Q_i d_i]$
Assuming $d_i$ is known, the ML estimators are:
$\hat Q_i$ is made of the eigenvectors of $\hat\Sigma_i$, associated with its eigenvalues sorted in decreasing order,
$\hat a_i$ is the mean of the $d_i$ largest eigenvalues of $\hat\Sigma_i$: $\hat a_i = \dfrac{1}{d_i} \sum_{l=1}^{d_i} \lambda_{il}$,
$\hat b_i$ is the mean of the $(p - d_i)$ smallest eigenvalues of $\hat\Sigma_i$: $\hat b_i = \dfrac{1}{p - d_i} \sum_{l=d_i+1}^{p} \lambda_{il}$.
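
A minimal NumPy sketch of these per-class ML estimators (the function hdda_fit_class and its interface are illustrative; the proportion $\hat\pi_i = n_i/n$ is assumed to be computed separately):

import numpy as np

def hdda_fit_class(X, d):
    """ML estimators of the model [a_i b_i Q_i d_i] for one class.
    X: (n_i, p) matrix of observations of the class; d: known intrinsic dimension."""
    n_i, p = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    Sigma = Xc.T @ Xc / n_i
    eigval, eigvec = np.linalg.eigh(Sigma)      # ascending order
    eigval, Q = eigval[::-1], eigvec[:, ::-1]   # reorder to descending
    a = eigval[:d].mean()                        # mean of the d largest eigenvalues
    b = eigval[d:].mean()                        # mean of the p - d smallest eigenvalues
    return mu, Q, a, b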

Estimation trick
The decision rule $\delta^+$ does not require computing the last $(p - d_i)$ eigenvectors of $\hat\Sigma_i$.
Thus, in order to minimize the number of quantities to estimate, we use the following relation:
$\sum_{l=d_i+1}^{p} \lambda_{il} = \mathrm{tr}(\hat\Sigma_i) - \sum_{l=1}^{d_i} \lambda_{il}$.
Number of parameters to estimate with p = 100, d_i = 10 and k = 4: QDA: 20 603; HDDA: 4 323.

Intrinsic dimension estimation
Our approach to choosing the values of $d_i$ is based on the eigenvalues of $\Sigma_i$.
We use two empirical methods:
common thresholding on the cumulative variance: $d_i$ is the smallest dimension $d \in \{1, \dots, p-1\}$ such that $\sum_{j=1}^{d} \lambda_{ij} \big/ \sum_{j=1}^{p} \lambda_{ij} \ge s$, for a given threshold s;
scree-test of Cattell: analyses the differences between successive eigenvalues in order to find a break in the scree of eigenvalues.
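
Both heuristics are easy to sketch in NumPy; note that the threshold-based scree test below is one possible illustrative implementation of Cattell's idea, not necessarily the exact rule used in the talk.

import numpy as np

def dim_by_cumulative_variance(eigvals, s=0.9):
    """Smallest d such that the first d eigenvalues carry a fraction >= s
    of the total variance (eigvals sorted in decreasing order)."""
    ratios = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.argmax(ratios >= s)) + 1

def dim_by_cattell(eigvals, t=0.2):
    """Cattell-style scree test: keep dimensions until the drop between
    successive eigenvalues becomes small relative to the largest drop.
    The threshold t is an illustrative choice."""
    diffs = -np.diff(eigvals)                 # positive drops for decreasing eigvals
    large_idx = np.where(diffs >= t * diffs.max())[0]
    return int(large_idx[-1]) + 1 if large_idx.size else 1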

Intrinsic dimension estimation
[Figure: ordered eigenvalues of $\Sigma_i$; cumulative sum of eigenvalues (common thresholding); differences between eigenvalues (scree-test of Cattell)]

Part 6: Numerical results

Results: artificial data
Data: 3 Gaussian densities in $\mathbb{R}^{15}$, with $d_1 = 3$, $d_2 = 4$ and $d_3 = 5$; in addition, the proportions are very different: $\pi_1 = \tfrac{1}{2}$, $\pi_2 = \tfrac{1}{3}$ and $\pi_3 = \tfrac{1}{6}$.

Method                        Classification rate
HDDA ([a_i b_i Q_i d_i])      0.958
HDDA ([a_i b_i Q_i d])        0.964
LDA                           0.512
FDA                           0.51
SVM                           0.478

Results: image categorization
A recent study [LBGGDH03] proposes an approach based on human perception to categorize natural images.
An image is represented by a 49-dimensional vector; each of these 49 components is the response of the image to a Gabor filter.

Results: image categorization
Data: 328 descriptors in 49 dimensions.
Results:

Method                        Classification rate
HDDA ([a_i b_i Q_i d_i])      0.857
HDDA ([a_i b Q_i d])          0.881
QDA                           0.849
LDA                           0.775
FDA (d = k-1)                 0.79
SVM                           0.839

Classification results for the image categorization experiment (leave-one-out).

Results: object recognition
Our approach uses local descriptors (Harris-Laplace + SIFT).
We consider 3 object classes (wheels, seat and handlebars) and 1 background class.
The dataset is made of 1000 descriptors in 128 dimensions: learning dataset: 500, test dataset: 500.

Results: object recognition
[Figure: ROC curves (true positives vs. false positives) comparing FDA, LDA, SVM and HDDA classifiers; right panel: HDDA with error probability < 1e-5 and < 1e-10]
Classification results for the object recognition experiment.

Results: object recognition
[Figure: example images - recognition using HDDA vs. recognition using SVM]

Part 7: Unsupervised classification

Extension to unsupervised classification
Unsupervised classification aims to organize the data into homogeneous classes.
Gaussian mixture models (GMM) are an efficient tool for unsupervised classification: the density of the mixture is
$f(x; \theta) = \sum_{i=1}^{k} \pi_i f_i(x; \mu_i, \Sigma_i)$, where $\theta = \{\pi_1, \dots, \pi_k, \mu_1, \dots, \mu_k, \Sigma_1, \dots, \Sigma_k\}$.
The parameter estimation is generally done with the EM algorithm.

Extension to unsupervised classification
Using our model for high-dimensional data, the two main steps of the EM algorithm are:
E step: compute $t_{ij}^{(q)} = t_i^{(q)}(x_j) = \dfrac{\exp\left(-K_i^{(q)}(x_j)/2\right)}{\sum_{l=1}^{k} \exp\left(-K_l^{(q)}(x_j)/2\right)}$,
where $K_i^{(q)}(x_j) = \dfrac{\|\mu_i^{(q)} - P_i^{(q)}(x_j)\|^2}{a_i^{(q)}} + \dfrac{\|x_j - P_i^{(q)}(x_j)\|^2}{b_i^{(q)}} + d_i^{(q)} \log(a_i^{(q)}) + (p - d_i^{(q)}) \log(b_i^{(q)}) - 2 \log(\pi_i^{(q)})$.
M step: classical estimation of $\pi_i$, $\mu_i$ and $\Sigma_i$ (weighted by the $t_{ij}$); the estimators of $a_i$, $b_i$ and $Q_i$ are the same as those of HDDA.
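
Reusing the hdda_cost and hdda_posteriors sketches from the supervised part, the E step and the first half of the M step could look as follows (an illustrative sketch; $a_i$, $b_i$ and $Q_i$ would then be re-estimated from the weighted covariance matrices exactly as in HDDA):

import numpy as np

def em_e_step(X, params):
    """E step of EM for the HDDA mixture model: posterior probabilities t_ij
    that observation x_j belongs to class i, from the per-class costs K_i."""
    return np.array([hdda_posteriors(x, params) for x in X])   # shape (n, k)

def em_m_step_proportions_means(X, T):
    """Part of the M step: weighted estimates of the proportions and means."""
    n_k = T.sum(axis=0)                      # effective class sizes
    pis = n_k / X.shape[0]
    mus = (T.T @ X) / n_k[:, None]
    return pis, mus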

References
[BC96] H. Bensmail and G. Celeux. Regularized Gaussian discriminant analysis through eigenvalue decomposition. Journal of the American Statistical Association, 91:1743-1748, 1996.
[Bel61] R. Bellman. Adaptive Control Processes. Princeton University Press, 1961.
[Fri89] J. H. Friedman. Regularized discriminant analysis. Journal of the American Statistical Association, 84:165-175, 1989.
[LBGGDH03] H. Le Borgne, N. Guyader, A. Guérin-Dugué, and J. Hérault. Classification of images: ICA filters vs human perception. In 7th International Symposium on Signal Processing and its Applications, number 2, pages 251-254, 2003.
[ST83] D. Scott and J. Thompson. Probability density estimation in higher dimensions. In Proceedings of the Fifteenth Symposium on the Interface, North Holland-Elsevier Science Publishers, pages 173-179, 1983.