High Dimensional Discriminant Analysis


High Dimensional Discriminant Analysis. Charles Bouveyron, LMC-IMAG & INRIA Rhône-Alpes. Joint work with S. Girard and C. Schmid. ASMDA, Brest, May 2005.

Introduction. Modern data are high-dimensional: imagery (MRI, computer vision), biology (DNA micro-arrays). Classification is very difficult in high-dimensional spaces: many learning methods suffer from the curse of dimensionality [Bel61], while the empty space phenomenon [ST83] allows us to assume that data live in low-dimensional subspaces.

Outline 1 Framework of discriminant analysis 2 New model for high-dimensional data 3 High Dimensional Discriminant Analysis 4 Estimators and intrinsic dimension estimation 5 Numerical results 6 Conclusion & work in progress

Classification. Classification: supervised classification (discriminant analysis), unsupervised classification (clustering). Two main families of classification methods: generative methods (QDA, LDA) and discriminative methods (logistic regression, SVM). Generative models can be used in both supervised and unsupervised classification.

Discrimination problem. The basic problem: assign an observation x = (x_1, ..., x_p) ∈ R^p with unknown class membership to one of k classes C_1, ..., C_k known a priori. We have a learning dataset A = {(x_1, y_1), ..., (x_n, y_n) : x_j ∈ R^p and y_j ∈ {1, ..., k}}, where the vector x_j contains p explanatory variables and y_j indicates the index of the class of x_j. We have to construct a decision rule δ : R^p → {1, ..., k}, x ↦ y.

Bayes decision rule. The optimal decision rule δ* is:
δ* : x ∈ C_i if i = argmax_{i=1,...,k} P(C_i | x),
equivalently, δ* : x ∈ C_i if i = argmin_{i=1,...,k} {−2 log(π_i f_i(x))},
where π_i is the a priori probability of class C_i and f_i(x) denotes the class conditional density of x. We consider only generative methods, which assume that the class distributions are Gaussian N(µ_i, Σ_i).

Classical methods. Quadratic discriminant analysis (QDA):
i = argmin_{i=1,...,k} { (x − µ_i)^t Σ_i^{-1} (x − µ_i) + log(det Σ_i) − 2 log(π_i) }.
Linear discriminant analysis (LDA), with the assumption that Σ_i = Σ for all i:
i = argmin_{i=1,...,k} { µ_i^t Σ^{-1} µ_i − 2 µ_i^t Σ^{-1} x − 2 log(π_i) }.
QDA and LDA behave disappointingly when n is small compared to p.
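The QDA and LDA rules above can be written down directly. Below is a minimal numpy sketch of both rules, assuming the class means, covariance matrices and priors have already been estimated; the function and variable names are illustrative, not part of the original material.

    import numpy as np

    def qda_predict(x, means, covs, priors):
        """QDA rule: minimize the quadratic cost given above for each class."""
        costs = []
        for mu, Sigma, pi in zip(means, covs, priors):
            diff = x - mu
            cost = (diff @ np.linalg.solve(Sigma, diff)   # (x - mu)^t Sigma^{-1} (x - mu)
                    + np.linalg.slogdet(Sigma)[1]         # log det(Sigma)
                    - 2.0 * np.log(pi))
            costs.append(cost)
        return int(np.argmin(costs))

    def lda_predict(x, means, common_cov, priors):
        """LDA rule: same as QDA but with a common covariance matrix Sigma."""
        Sinv = np.linalg.inv(common_cov)
        costs = [mu @ Sinv @ mu - 2.0 * (mu @ Sinv @ x) - 2.0 * np.log(pi)
                 for mu, pi in zip(means, priors)]
        return int(np.argmin(costs))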

Regularizations Dimension reduction: PCA, feature selection, Fisher Discriminant Analysis (FDA). Parsimonious models: Regularized discriminant analysis [Fri89], Eigenvalue decomposition discriminant analysis [BC96].

Classification of high-dimensional data. [Figure: four panels comparing the correct classification, FDA classification (48.8% correct), SVM classification (46.4% correct) and HDDA classification (95.3% correct).] Three Gaussian densities in R^100 with intrinsic dimensions equal to 2. For visualization, the data are projected on the 2 discriminant axes.

The idea of the new model. The main ideas: data of the same class live in a specific low-dimensional subspace, and data of different classes live in different subspaces. For each class, we split R^p into two subspaces: the subspace where the data live and its orthogonal complement. We use a parsimonious model: each class is modeled as a spherical density in each of the 2 subspaces.

The new model. We assume that the class conditional densities are Gaussian N(µ_i, Σ_i) with means µ_i and covariance matrices Σ_i. Let Q_i be the orthogonal matrix of eigenvectors of the covariance matrix Σ_i, and let B_i be the basis of R^p made of the eigenvectors of Σ_i. The class conditional covariance matrix Δ_i is defined in the basis B_i by: Δ_i = Q_i^t Σ_i Q_i.

The new model. We assume in addition that Δ_i contains only two different eigenvalues a_i > b_i. Let E_i be the affine space generated by the eigenvectors associated with the eigenvalue a_i and such that µ_i ∈ E_i. We also define E_i^⊥ such that E_i ⊕ E_i^⊥ = R^p and µ_i ∈ E_i^⊥. Let P_i and P_i^⊥ be the projection operators on E_i and E_i^⊥.

The new model. Thus, we assume that Δ_i has the following block-diagonal form:
Δ_i = diag(a_i, ..., a_i, b_i, ..., b_i),
where the eigenvalue a_i is repeated d_i times and the eigenvalue b_i is repeated (p − d_i) times.

High Dimensional Discriminant Analysis. Under the preceding assumptions, the Bayes decision rule yields a new decision rule δ+.
Theorem. The new decision rule δ+ consists in classifying x to the class C_i if:
i = argmin_{i=1,...,k} { (1/a_i) ||µ_i − P_i(x)||^2 + (1/b_i) ||x − P_i(x)||^2 + d_i log(a_i) + (p − d_i) log(b_i) − 2 log(π_i) }.
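A small numpy sketch of this rule, assuming the per-class parameters (µ_i, an orthonormal basis Q_i of the d_i leading eigenvectors, a_i, b_i, d_i and π_i) are already available; P_i(x) is the orthogonal projection of x onto the affine subspace E_i, i.e. P_i(x) = µ_i + Q_i Q_i^t (x − µ_i). Names are illustrative.

    import numpy as np

    def hdda_cost(x, mu, Q, a, b, d, pi):
        """K_i(x) from the theorem; Q has shape (p, d_i) with orthonormal
        columns spanning the class-specific subspace E_i (through mu)."""
        p = x.shape[0]
        z = x - mu
        proj = mu + Q @ (Q.T @ z)        # P_i(x): projection of x onto the affine subspace E_i
        return (np.sum((mu - proj) ** 2) / a
                + np.sum((x - proj) ** 2) / b
                + d * np.log(a) + (p - d) * np.log(b)
                - 2.0 * np.log(pi))

    def hdda_predict(x, classes):
        """classes: list of dicts with keys mu, Q, a, b, d, pi (one per class)."""
        costs = [hdda_cost(x, c["mu"], c["Q"], c["a"], c["b"], c["d"], c["pi"])
                 for c in classes]
        return int(np.argmin(costs))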

HDDA: illustration. [Figure: the subspace E_i and its supplementary E_i^⊥.]
K_i(x) = (1/a_i) ||µ_i − P_i(x)||^2 + (1/b_i) ||x − P_i(x)||^2 + d_i log(a_i) + (p − d_i) log(b_i) − 2 log(π_i).

HDDA: particular rules. By allowing some but not all of the HDDA parameters to vary, we obtain 24 particular models: they correspond to different regularizations, some of them are easily geometrically interpretable, and 9 of them have explicit formulations. Notation: a_i = σ_i^2 / α_i with α_i ∈ (0, 1), and b_i = σ_i^2 / (1 − α_i) with σ_i > 0. HDDA reduces to classical discriminant analysis in particular cases: if α_i = 1/2 for all i, δ+ is QDA with spherical classes; if in addition σ_i = σ for all i, δ+ is LDA with spherical classes.

Model [ασQ_i d_i]. Theorem. The decision rule δ+ consists in classifying x to the class C_i if:
i = argmin_{i=1,...,k} { α ||µ_i − P_i(x)||^2 + (1 − α) ||x − P_i(x)||^2 }.

HDDA estimators. Estimators are computed using maximum likelihood estimation from the learning set A. Classical estimators:
π̂_i = n_i / n, with n_i = #(C_i),
µ̂_i = (1/n_i) Σ_{x_j ∈ C_i} x_j,
Σ̂_i = (1/n_i) Σ_{x_j ∈ C_i} (x_j − µ̂_i)^t (x_j − µ̂_i).
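A numpy sketch of these classical estimators, assuming the learning set is stored as an array X of shape (n, p) with labels y in {0, ..., k−1}; the names are illustrative.

    import numpy as np

    def class_estimates(X, y, k):
        """Empirical proportions, means and ML covariance matrices per class."""
        n = X.shape[0]
        priors, means, covs = [], [], []
        for i in range(k):
            Xi = X[y == i]
            ni = Xi.shape[0]
            mu = Xi.mean(axis=0)
            centered = Xi - mu
            priors.append(ni / n)                    # pi_hat_i = n_i / n
            means.append(mu)                         # mu_hat_i
            covs.append(centered.T @ centered / ni)  # Sigma_hat_i (ML: divide by n_i)
        return priors, means, covs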

Estimators of the model [a_i b_i Q_i d_i]. Assuming d_i is known, the ML estimators are:
Q̂_i is made of the eigenvectors associated with the ordered eigenvalues of Σ̂_i,
â_i is the mean of the d_i largest eigenvalues of Σ̂_i: â_i = (1/d_i) Σ_{l=1}^{d_i} λ_{il},
b̂_i is the mean of the (p − d_i) smallest eigenvalues of Σ̂_i: b̂_i = (1/(p − d_i)) Σ_{l=d_i+1}^{p} λ_{il}.
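A sketch of these estimators for a single class, assuming d_i is known and Sigma_hat is the ML covariance estimate computed above.

    import numpy as np

    def subspace_estimates(Sigma_hat, d):
        """Q_hat_i, a_hat_i and b_hat_i from the spectrum of Sigma_hat_i."""
        eigvals, eigvecs = np.linalg.eigh(Sigma_hat)         # ascending eigenvalues
        eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # reorder: largest first
        Q = eigvecs[:, :d]            # d leading eigenvectors span E_i
        a = eigvals[:d].mean()        # mean of the d largest eigenvalues
        b = eigvals[d:].mean()        # mean of the (p - d) smallest eigenvalues
        return Q, a, b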

Estimation trick. In order to minimize the number of parameters to estimate, we use the following relation:
Σ_{l=d_i+1}^{p} λ_{il} = tr(Σ̂_i) − Σ_{l=1}^{d_i} λ_{il}.
Number of parameters to estimate with p = 100, d_i = 10 and k = 4:
Method / Number of parameters:
QDA: 20,603
HDDA (model [a_i b_i Q_i d_i]): 4,323
HDDA (model [a_i b_i Q d]): 1,367
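In practice this relation means that b̂_i can be obtained from the trace of Σ̂_i and the d_i largest eigenvalues alone, without touching the rest of the spectrum; a small sketch under that reading (names are illustrative):

    import numpy as np

    def b_from_trace(Sigma_hat, leading_eigvals):
        """Mean of the (p - d) smallest eigenvalues, using only the trace and the
        d largest eigenvalues: their sum equals tr(Sigma_hat) minus the leading sum."""
        p = Sigma_hat.shape[0]
        d = len(leading_eigvals)
        return (np.trace(Sigma_hat) - np.sum(leading_eigvals)) / (p - d)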

Intrinsic dimension estimation. We base our approach for choosing the values of d_i on the eigenvalues of Σ_i, using the scree test of Cattell [Cat66]. [Figure: the scree test of Cattell.]
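The slide only names the scree test, so the following sketch is one common heuristic reading of it: keep the dimensions that come before the eigenvalue drops become small relative to the largest drop (the threshold value is an assumption, not from the original).

    import numpy as np

    def scree_dimension(eigvals, threshold=0.2):
        """Heuristic scree test: eigvals must be sorted in decreasing order.
        Keep every dimension up to the last eigenvalue drop that is still
        at least `threshold` times the largest drop."""
        drops = -np.diff(eigvals)                           # drops lambda_l - lambda_{l+1}
        large = np.where(drops >= threshold * drops.max())[0]
        return int(large[-1]) + 1                           # dimensions are counted from 1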

Optical character recognition. We consider the USPS dataset: learning set of 2007 examples, test set of 7291 examples. [Figure: examples of the USPS dataset.] Recognition results:
Method / Recognition rate:
HDDA [a_i b Q_i d_i]: 95.86 %
HDDA [a_i b_i Q_i d_i]: 95.52 %
LDA (d = 256): 74.56 %
FDA (d = 9): 90.23 %
SVM (linear): 94.28 %
Human: 97.50 %

Object recognition. Our approach uses local descriptors: detection of interest points with the Harris-Laplace operator, and description of the interest points with the SIFT descriptor. We consider 3 object classes (wheels, seat and handlebars) and 1 background class. The dataset contains 1000 descriptors in 128 dimensions: 500 for learning and 500 for testing.

Numerical results. [Figure: precision-recall curves comparing HDDA, SVM (RBF kernel, γ = 0.6), FDA and PCA+LDA (d = 45).] Classification results for the object recognition experiment.

Recognition results. [Figure: recognition using HDDA vs. recognition using SVM.] Recognition results for the object recognition experiment.

Conclusion. The new model proposed here finds the specific subspace and estimates the intrinsic dimension of each class, uses this information in the Gaussian model of each class, and includes additional assumptions in order to reduce the number of parameters to estimate. The main advantages of our model: good performance without prior dimension reduction of the data, good performance with small learning datasets, it is as fast as classical generative methods, and it can be used in both supervised and unsupervised classification.

Work in progress. Extension to unsupervised classification using the EM algorithm. Application to object recognition in a weakly-supervised framework: unsupervised classification to learn object parts, supervised classification to recognize the object in a new image.

References
[BC96] H. Bensmail and G. Celeux. Regularized Gaussian discriminant analysis through eigenvalue decomposition. Journal of the American Statistical Association, 91:1743-1748, 1996.
[Bel61] R. Bellman. Adaptive Control Processes. Princeton University Press, 1961.
[BGS05] C. Bouveyron, S. Girard, and C. Schmid. Analyse discriminante de haute dimension. Research Report 5470, INRIA, January 2005.
[Cat66] R. B. Cattell. The scree test for the number of factors. Multivariate Behavioral Research, 1(2):140-161, 1966.
[Fri89] J. H. Friedman. Regularized discriminant analysis. Journal of the American Statistical Association, 84:165-175, 1989.
[Low04] D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110, 2004.
[ST83] D. Scott and J. Thompson. Probability density estimation in higher dimensions. In Proceedings of the Fifteenth Symposium on the Interface, pages 173-179. North Holland-Elsevier Science Publishers, 1983.