A Tutorial on Data Reduction: Linear Discriminant Analysis (LDA). Shireen Elhabian and Aly A. Farag, University of Louisville, CVIP Lab, September 2009


Outline: LDA objective; Recall PCA; Now LDA; LDA: Two Classes; Counter example; LDA: C Classes; Illustrative example; LDA vs. PCA example; Limitations of LDA.

LDA Objective
The objective of LDA is to perform dimensionality reduction. "So what, PCA does this." However, we want to preserve as much of the class discriminatory information as possible. "OK, that is new, let us delve deeper."

Recall PCA
In PCA, the main idea is to re-express the available dataset to extract the relevant information by reducing the redundancy and minimizing the noise. We did not care whether this dataset represents features from one or more classes, i.e. the discrimination power was not taken into consideration while we were talking about PCA. In PCA, we had a dataset matrix X with dimensions m x n, where the n columns represent different data samples and each sample is an m-dimensional data vector. We first started by subtracting the mean to have a zero-mean dataset, then we computed the covariance matrix $\Sigma = XX^T$. Eigenvalues and eigenvectors were then computed for $\Sigma$. Hence the new basis vectors are those eigenvectors with the highest eigenvalues, where the number of those vectors was our choice. Thus, using the new basis, we can project the dataset onto a lower-dimensional space with a more powerful data representation.
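To make the recalled recipe concrete, here is a minimal NumPy sketch of the PCA steps described above (the function name and the column-per-sample convention are assumptions, not part of the original slides):

```python
import numpy as np

def pca(X, k):
    """PCA on an m x n data matrix X whose columns are data samples.
    Returns the top-k eigenvectors (new basis) and the projected data."""
    mu = X.mean(axis=1, keepdims=True)       # m x 1 mean vector
    Xc = X - mu                              # zero-mean dataset
    S = Xc @ Xc.T                            # m x m covariance (scatter) matrix
    eigvals, eigvecs = np.linalg.eigh(S)     # eigh since S is symmetric
    order = np.argsort(eigvals)[::-1]        # sort by decreasing eigenvalue
    W = eigvecs[:, order[:k]]                # top-k eigenvectors as columns
    Y = W.T @ Xc                             # k x n projected dataset
    return W, Y
```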

Now LDA
Consider a pattern classification problem where we have C classes, e.g. sea bass, tuna, salmon. Each class i has $N_i$ m-dimensional samples, where $i = 1, 2, \dots, C$. Hence we have a set of m-dimensional samples $\{x_1, x_2, \dots, x_{N_i}\}$ belonging to class $\omega_i$. Stacking these samples from the different classes into one big fat matrix X, such that each column represents one sample, we seek to obtain a transformation of X to Y through projecting the samples in X onto a hyperplane with dimension C-1. Let us see what this means.

LDA: Two Classes
Assume we have m-dimensional samples $\{x_1, x_2, \dots, x_N\}$, $N_1$ of which belong to $\omega_1$ and $N_2$ belong to $\omega_2$. We seek to obtain a scalar y by projecting the samples x onto a line (a C-1 space, with C = 2):
$y = w^T x$, where $x = [x_1, \dots, x_m]^T$ and $w = [w_1, \dots, w_m]^T$
Here w is the projection vector used to project x to y. [Figure: one line on which the two classes are not well separated when projected; another line that succeeds in separating the two classes while reducing the dimensionality of the problem from two features $(x_1, x_2)$ to a single scalar value y.] Of all the possible lines we would like to select the one that maximizes the separability of the scalars.

LDA: Two Classes
In order to find a good projection vector, we need to define a measure of separation between the projections. The mean vector of each class in x-space and y-space is:
$\mu_i = \frac{1}{N_i} \sum_{x \in \omega_i} x$  and  $\tilde{\mu}_i = \frac{1}{N_i} \sum_{y \in \omega_i} y = \frac{1}{N_i} \sum_{x \in \omega_i} w^T x = w^T \mu_i$
i.e. projecting x to y will lead to projecting the mean of x to the mean of y. We could then choose the distance between the projected means as our objective function:
$J(w) = |\tilde{\mu}_1 - \tilde{\mu}_2| = |w^T (\mu_1 - \mu_2)|$

LDA: Two Classes
However, the distance between the projected means is not a very good measure, since it does not take into account the standard deviation within the classes. [Figure: one axis has a larger distance between the projected means, while the other axis yields better class separability.]

LDA: Two Classes
The solution proposed by Fisher is to maximize a function that represents the difference between the means, normalized by a measure of the within-class variability, the so-called scatter. For each class we define the scatter, an equivalent of the variance, as the sum of squared differences between the projected samples and their class mean:
$\tilde{s}_i^2 = \sum_{y \in \omega_i} (y - \tilde{\mu}_i)^2$
$\tilde{s}_i^2$ measures the variability within class $\omega_i$ after projecting it onto the y-space. Thus $\tilde{s}_1^2 + \tilde{s}_2^2$ measures the variability within the two classes at hand after projection; hence it is called the within-class scatter of the projected samples.

LDA: Two Classes
The Fisher linear discriminant is defined as the linear function $w^T x$ that maximizes the criterion function (the distance between the projected means normalized by the within-class scatter of the projected samples):
$J(w) = \frac{(\tilde{\mu}_1 - \tilde{\mu}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2}$
Therefore, we will be looking for a projection where examples from the same class are projected very close to each other and, at the same time, the projected means are as far apart as possible.

LDA: Two Classes
In order to find the optimum projection $w^*$, we need to express J(w) as an explicit function of w. We will define a measure of the scatter in the multivariate feature space x, denoted by the scatter matrices:
$S_i = \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^T$,  $S_W = S_1 + S_2$
where $S_i$ is the covariance matrix of class $\omega_i$, and $S_W$ is called the within-class scatter matrix.

LDA: Two Classes
Now, the scatter of the projection y can be expressed as a function of the scatter matrix in the feature space x:
$\tilde{s}_i^2 = \sum_{y \in \omega_i} (y - \tilde{\mu}_i)^2 = \sum_{x \in \omega_i} (w^T x - w^T \mu_i)^2 = \sum_{x \in \omega_i} w^T (x - \mu_i)(x - \mu_i)^T w = w^T S_i w$
$\tilde{s}_1^2 + \tilde{s}_2^2 = w^T S_1 w + w^T S_2 w = w^T S_W w = \tilde{S}_W$
where $\tilde{S}_W$ is the within-class scatter of the projected samples y.

LDA: Two Classes
Similarly, the difference between the projected means (in y-space) can be expressed in terms of the means in the original feature space (x-space):
$(\tilde{\mu}_1 - \tilde{\mu}_2)^2 = (w^T \mu_1 - w^T \mu_2)^2 = w^T (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T w = w^T S_B w$
The matrix $S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T$ is called the between-class scatter of the original samples/feature vectors, while $\tilde{S}_B = w^T S_B w$ is the between-class scatter of the projected samples. Since $S_B$ is the outer product of two vectors, its rank is at most one.

LDA: Two Classes
We can finally express the Fisher criterion in terms of $S_W$ and $S_B$ as:
$J(w) = \frac{w^T S_B w}{w^T S_W w}$
Hence J(w) is a measure of the difference between class means (encoded in the between-class scatter matrix) normalized by a measure of the within-class scatter matrix.

LDA: Two Classes
To find the maximum of J(w), we differentiate and equate to zero:
$\frac{d}{dw} J(w) = \frac{d}{dw} \left[ \frac{w^T S_B w}{w^T S_W w} \right] = 0$
$\Rightarrow (w^T S_W w)\, \frac{d(w^T S_B w)}{dw} - (w^T S_B w)\, \frac{d(w^T S_W w)}{dw} = 0$
$\Rightarrow (w^T S_W w)\, 2 S_B w - (w^T S_B w)\, 2 S_W w = 0$
Dividing by $2\, w^T S_W w$:
$\Rightarrow S_B w - J(w)\, S_W w = 0 \;\Rightarrow\; S_W^{-1} S_B w - J(w)\, w = 0$

LDA: Two Classes
Solving the generalized eigenvalue problem
$S_W^{-1} S_B w = \lambda w$, where $\lambda = J(w)$ is a scalar,
yields
$w^* = \arg\max_w J(w) = \arg\max_w \frac{w^T S_B w}{w^T S_W w} = S_W^{-1} (\mu_1 - \mu_2)$
This is known as Fisher's linear discriminant, although it is not a discriminant but rather a specific choice of direction for the projection of the data down to one dimension. Using the same notation as PCA, the solution will be the eigenvector(s) of $S_W^{-1} S_B$.
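As a minimal sketch of this result (the helper name and the column-per-sample convention are assumptions), the closed-form direction $S_W^{-1}(\mu_1 - \mu_2)$ can be computed directly:

```python
import numpy as np

def fisher_direction(X1, X2):
    """Fisher's linear discriminant for two classes.
    X1, X2: m x N1 and m x N2 matrices whose columns are samples.
    Returns the unit-norm projection vector w* proportional to S_W^{-1}(mu1 - mu2)."""
    mu1, mu2 = X1.mean(axis=1), X2.mean(axis=1)
    S1 = (X1 - mu1[:, None]) @ (X1 - mu1[:, None]).T   # scatter of class 1
    S2 = (X2 - mu2[:, None]) @ (X2 - mu2[:, None]).T   # scatter of class 2
    Sw = S1 + S2                                       # within-class scatter
    w = np.linalg.solve(Sw, mu1 - mu2)                 # S_W^{-1} (mu1 - mu2)
    return w / np.linalg.norm(w)
```

Note that rescaling the class scatters (for example dividing by $N_i - 1$ to obtain covariance matrices, as done in the worked example below) changes only the length of w, not its direction.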

LDA: Two Classes, Example
Compute the linear discriminant projection for the following two-dimensional dataset:
Samples for class $\omega_1$: $X_1 = (x_1, x_2) = \{(4,2), (2,4), (2,3), (3,6), (4,4)\}$
Samples for class $\omega_2$: $X_2 = (x_1, x_2) = \{(9,10), (6,8), (9,5), (8,7), (10,8)\}$
[Figure: scatter plot of the two classes.]

LDA: Two Classes, Example
The class means are:
$\mu_1 = \frac{1}{N_1} \sum_{x \in \omega_1} x = \frac{1}{5}\left( \begin{bmatrix}4\\2\end{bmatrix} + \begin{bmatrix}2\\4\end{bmatrix} + \begin{bmatrix}2\\3\end{bmatrix} + \begin{bmatrix}3\\6\end{bmatrix} + \begin{bmatrix}4\\4\end{bmatrix} \right) = \begin{bmatrix}3\\3.8\end{bmatrix}$
$\mu_2 = \frac{1}{N_2} \sum_{x \in \omega_2} x = \frac{1}{5}\left( \begin{bmatrix}9\\10\end{bmatrix} + \begin{bmatrix}6\\8\end{bmatrix} + \begin{bmatrix}9\\5\end{bmatrix} + \begin{bmatrix}8\\7\end{bmatrix} + \begin{bmatrix}10\\8\end{bmatrix} \right) = \begin{bmatrix}8.4\\7.6\end{bmatrix}$

LDA: Two Classes, Example
Covariance matrix of the first class (the scatter normalized by $N_1 - 1$):
$S_1 = \frac{1}{N_1 - 1}\sum_{x \in \omega_1} (x - \mu_1)(x - \mu_1)^T = \begin{bmatrix} 1 & -0.25 \\ -0.25 & 2.2 \end{bmatrix}$

LDA: Two Classes, Example
Covariance matrix of the second class:
$S_2 = \frac{1}{N_2 - 1}\sum_{x \in \omega_2} (x - \mu_2)(x - \mu_2)^T = \begin{bmatrix} 2.3 & -0.05 \\ -0.05 & 3.3 \end{bmatrix}$

LDA: Two Classes, Example
Within-class scatter matrix:
$S_W = S_1 + S_2 = \begin{bmatrix} 1 & -0.25 \\ -0.25 & 2.2 \end{bmatrix} + \begin{bmatrix} 2.3 & -0.05 \\ -0.05 & 3.3 \end{bmatrix} = \begin{bmatrix} 3.3 & -0.3 \\ -0.3 & 5.5 \end{bmatrix}$

LDA: Two Classes, Example
Between-class scatter matrix:
$S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T = \begin{bmatrix} 3 - 8.4 \\ 3.8 - 7.6 \end{bmatrix} \begin{bmatrix} 3 - 8.4 \\ 3.8 - 7.6 \end{bmatrix}^T = \begin{bmatrix} -5.4 \\ -3.8 \end{bmatrix} \begin{bmatrix} -5.4 & -3.8 \end{bmatrix} = \begin{bmatrix} 29.16 & 20.52 \\ 20.52 & 14.44 \end{bmatrix}$

LDA: Two Classes, Example
The LDA projection is then obtained as the solution of the generalized eigenvalue problem:
$S_W^{-1} S_B w = \lambda w \;\Rightarrow\; |S_W^{-1} S_B - \lambda I| = 0$
$S_W^{-1} S_B = \begin{bmatrix} 3.3 & -0.3 \\ -0.3 & 5.5 \end{bmatrix}^{-1} \begin{bmatrix} 29.16 & 20.52 \\ 20.52 & 14.44 \end{bmatrix} = \begin{bmatrix} 0.3045 & 0.0166 \\ 0.0166 & 0.1827 \end{bmatrix} \begin{bmatrix} 29.16 & 20.52 \\ 20.52 & 14.44 \end{bmatrix} = \begin{bmatrix} 9.2213 & 6.489 \\ 4.2339 & 2.9794 \end{bmatrix}$
$\begin{vmatrix} 9.2213 - \lambda & 6.489 \\ 4.2339 & 2.9794 - \lambda \end{vmatrix} = 0 \;\Rightarrow\; (9.2213 - \lambda)(2.9794 - \lambda) - 6.489 \times 4.2339 = 0$
$\Rightarrow \lambda^2 - 12.2007\,\lambda = 0 \;\Rightarrow\; \lambda(\lambda - 12.2007) = 0 \;\Rightarrow\; \lambda_1 = 0,\; \lambda_2 = 12.2007$

LDA: Two Classes, Example
Hence, solving $\begin{bmatrix} 9.2213 - \lambda_i & 6.489 \\ 4.2339 & 2.9794 - \lambda_i \end{bmatrix} w_i = 0$ for each eigenvalue gives
$w_1 = \begin{bmatrix} 0.5755 \\ -0.8178 \end{bmatrix}$ for $\lambda_1 = 0$, and $w_2 = \begin{bmatrix} 0.9088 \\ 0.4173 \end{bmatrix}$ for $\lambda_2 = 12.2007$.
The optimal projection $w^*$ is the one with the maximum $\lambda = J(w)$, i.e. $w^* = w_2$.

LDA: Two Classes, Example
Or directly:
$w^* = S_W^{-1}(\mu_1 - \mu_2) = \begin{bmatrix} 3.3 & -0.3 \\ -0.3 & 5.5 \end{bmatrix}^{-1} \begin{bmatrix} 3 - 8.4 \\ 3.8 - 7.6 \end{bmatrix} = \begin{bmatrix} 0.3045 & 0.0166 \\ 0.0166 & 0.1827 \end{bmatrix} \begin{bmatrix} -5.4 \\ -3.8 \end{bmatrix} \propto \begin{bmatrix} 0.9088 \\ 0.4173 \end{bmatrix}$
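The arithmetic above can be checked with a short NumPy sketch (the variable names are mine; covariances are normalized by N-1 as in the slides, which does not affect the resulting direction):

```python
import numpy as np

X1 = np.array([[4, 2], [2, 4], [2, 3], [3, 6], [4, 4]], dtype=float).T    # class 1, 2 x 5
X2 = np.array([[9, 10], [6, 8], [9, 5], [8, 7], [10, 8]], dtype=float).T  # class 2, 2 x 5

mu1, mu2 = X1.mean(axis=1), X2.mean(axis=1)   # [3, 3.8] and [8.4, 7.6]
S1, S2 = np.cov(X1), np.cov(X2)               # covariances (normalized by N-1)
Sw = S1 + S2                                  # [[3.3, -0.3], [-0.3, 5.5]]
Sb = np.outer(mu1 - mu2, mu1 - mu2)           # between-class scatter

eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
w_star = eigvecs[:, np.argmax(eigvals)]       # eigenvector of the largest eigenvalue
print(np.sort(eigvals))                       # ~ [0, 12.2007]
print(w_star)                                 # ~ +/- [0.9088, 0.4173]

# Or directly, without the eigen decomposition:
w_direct = np.linalg.solve(Sw, mu1 - mu2)
print(w_direct / np.linalg.norm(w_direct))    # same direction (up to sign)
```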

LDA Projection
The projection vector corresponding to the smallest eigenvalue ($\lambda_1 \approx 0$). [Figure: the projection direction overlaid on the data, and the class PDFs $p(y|\omega_i)$ of the samples projected onto this vector; the two PDFs overlap heavily.] Using this vector leads to bad separability between the two classes.

LDA Projection
The projection vector corresponding to the highest eigenvalue ($\lambda_2 = 12.2007$). [Figure: the projection direction overlaid on the data, and the class PDFs $p(y|\omega_i)$ of the samples projected onto this vector; the two PDFs are well separated.] Using this vector leads to good separability between the two classes.

LDA: C Classes
Now we have C classes instead of just two. We are now seeking (C-1) projections $[y_1, y_2, \dots, y_{C-1}]$ by means of (C-1) projection vectors $w_i$, which can be arranged by columns into a projection matrix $W = [w_1 | w_2 | \dots | w_{C-1}]$, such that:
$y_i = w_i^T x \;\Rightarrow\; y = W^T x$
where $x = [x_1, \dots, x_m]^T$ is an m-dimensional sample and $y = [y_1, \dots, y_{C-1}]^T$ is its (C-1)-dimensional projection.

LDA: C Classes
If we have n feature vectors, we can stack them into one matrix as follows:
$Y = W^T X$
where $X$ is $m \times n$ (each column $x_i$ is an m-dimensional sample) and $Y$ is $(C-1) \times n$ (each column $y_i$ is the corresponding projected sample).

LDA: C Classes
Recall that in the two-classes case the within-class scatter was computed as $S_W = S_1 + S_2$. This can be generalized to the C-classes case as:
$S_W = \sum_{i=1}^{C} S_i$, where $S_i = \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^T$ and $\mu_i = \frac{1}{N_i} \sum_{x \in \omega_i} x$
$N_i$: number of data samples in class $\omega_i$. [Figure: example of two-dimensional features (m = 2) with three classes (C = 3).]

LDA: C Classes
Recall that in the two-classes case the between-class scatter was computed as $S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T$. For the C-classes case, we measure the between-class scatter with respect to the mean of all classes as follows:
$S_B = \sum_{i=1}^{C} N_i (\mu_i - \mu)(\mu_i - \mu)^T$, where $\mu = \frac{1}{N} \sum_{\forall x} x = \frac{1}{N} \sum_{i=1}^{C} N_i \mu_i$
$N$: total number of data samples; $N_i$: number of data samples in class $\omega_i$. [Figure: example of two-dimensional features (m = 2) with three classes (C = 3).]

LDA: C Classes
Similarly, we can define the mean vectors for the projected samples y as:
$\tilde{\mu}_i = \frac{1}{N_i} \sum_{y \in \omega_i} y$  and  $\tilde{\mu} = \frac{1}{N} \sum_{\forall y} y$
while the scatter matrices for the projected samples will be:
$\tilde{S}_W = \sum_{i=1}^{C} \sum_{y \in \omega_i} (y - \tilde{\mu}_i)(y - \tilde{\mu}_i)^T$,  $\tilde{S}_B = \sum_{i=1}^{C} N_i (\tilde{\mu}_i - \tilde{\mu})(\tilde{\mu}_i - \tilde{\mu})^T$

LDA: C Classes
Recall that in the two-classes case we expressed the scatter matrices of the projected samples in terms of those of the original samples as:
$\tilde{S}_W = W^T S_W W$,  $\tilde{S}_B = W^T S_B W$
This still holds in the C-classes case. Recall that we are looking for a projection that maximizes the ratio of between-class to within-class scatter. Since the projection is no longer a scalar (it has C-1 dimensions), we use the determinant of the scatter matrices to obtain a scalar objective function:
$J(W) = \frac{|\tilde{S}_B|}{|\tilde{S}_W|} = \frac{|W^T S_B W|}{|W^T S_W W|}$
And we will seek the projection $W^*$ that maximizes this ratio.

LDA: C Classes
To find the maximum of J(W), we differentiate with respect to W and equate to zero. Recall that in the two-classes case we solved the eigenvalue problem $S_W^{-1} S_B w = \lambda w$, where $\lambda = J(w)$ is a scalar. For the C-classes case we have C-1 projection vectors, hence the eigenvalue problem is generalized to:
$S_W^{-1} S_B w_i = \lambda_i w_i$, where $\lambda_i = J(w_i)$ is a scalar and $i = 1, \dots, C-1$
Thus, it can be shown that the optimal projection matrix $W^* = [w_1^* | w_2^* | \dots | w_{C-1}^*]$ is the one whose columns are the eigenvectors corresponding to the largest eigenvalues of the generalized eigenvalue problem:
$S_W^{-1} S_B W^* = \lambda W^*$, where $\lambda = J(W^*)$ is a scalar.
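A minimal NumPy sketch of this multi-class recipe (the function name, the column-per-sample convention, and the integer label vector are assumptions):

```python
import numpy as np

def lda_projection_matrix(X, labels, n_proj=None):
    """Multi-class LDA. X: m x N data matrix (one sample per column);
    labels: length-N array of class indices. Returns W (m x k) whose columns
    are the eigenvectors of S_W^{-1} S_B with the largest eigenvalues."""
    labels = np.asarray(labels)
    m, N = X.shape
    classes = np.unique(labels)
    mu = X.mean(axis=1)                                   # overall mean
    Sw = np.zeros((m, m))
    Sb = np.zeros((m, m))
    for c in classes:
        Xc = X[:, labels == c]                            # samples of class c
        mu_c = Xc.mean(axis=1)
        Sw += (Xc - mu_c[:, None]) @ (Xc - mu_c[:, None]).T
        Sb += Xc.shape[1] * np.outer(mu_c - mu, mu_c - mu)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(eigvals.real)[::-1]                # largest eigenvalues first
    k = n_proj if n_proj is not None else len(classes) - 1
    return eigvecs.real[:, order[:k]], eigvals.real[order[:k]]

# Projected data: Y = W.T @ X has dimension (C-1) x N.
```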

Illustration: 3 Classes
Let us generate a dataset for each class to simulate the three classes shown. For each class, do the following (a code sketch of this procedure follows the Matlab slide below):
1. Use the random number generator to generate a uniform stream of 500 samples that follows U(0,1).
2. Using the Box-Muller approach, convert the generated uniform stream to N(0,1).
3. Then use the method of eigenvalues and eigenvectors to manipulate the standard normal to have the required mean vector and covariance matrix.
4. Estimate the mean and covariance matrix of the resulting dataset.

Dataset Generation
By visual inspection of the figure, the class parameters (mean vectors and covariance matrices) can be chosen as follows:
Class 1: zero covariance, to lead to data samples distributed horizontally.
Class 2: positive covariance, to lead to data samples distributed along the y = x line.
Class 3: negative covariance, to lead to data samples distributed along the y = -x line.
The overall mean is $\mu = \frac{1}{3}(\mu_1 + \mu_2 + \mu_3)$. [The specific mean vectors and covariance matrices are given on the slide.]

In Matlab
[Slide showing the MATLAB implementation of the dataset generation steps above.]
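The original implementation was in MATLAB; as a stand-in, here is a rough NumPy sketch of the same generation procedure. The means and covariances below are illustrative placeholders chosen only to give the zero/positive/negative covariance patterns described above, not the slide's exact parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_class(mu, Sigma, n=500):
    """Draw n samples from N(mu, Sigma) using U(0,1) streams, the Box-Muller
    transform, and eigenvalue/eigenvector shaping of the covariance."""
    m = len(mu)
    u1 = 1.0 - rng.uniform(size=(m, n))                        # step 1: U(0,1), in (0,1] to avoid log(0)
    u2 = rng.uniform(size=(m, n))
    z = np.sqrt(-2.0 * np.log(u1)) * np.cos(2.0 * np.pi * u2)  # step 2: Box-Muller -> N(0,1)
    lam, V = np.linalg.eigh(Sigma)                             # step 3: eigen shaping,
    A = V @ np.diag(np.sqrt(lam))                              #   cov(A z) = A A^T = Sigma
    return A @ z + np.asarray(mu, dtype=float)[:, None]        # m x n samples

# Placeholder class parameters (not the slide's exact values):
X1 = sample_class([5.0, 5.0],   [[4.0, 0.0], [0.0, 1.0]])    # zero covariance: horizontal spread
X2 = sample_class([10.0, 10.0], [[4.0, 3.0], [3.0, 4.0]])    # positive covariance: along y = x
X3 = sample_class([15.0, 5.0],  [[4.0, -3.0], [-3.0, 4.0]])  # negative covariance: along y = -x

# Step 4: estimate the mean and covariance of each generated dataset
print(X1.mean(axis=1), np.cov(X1))
```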

It is working!
[Figure: scatter plot of the generated three-class dataset, X1 (the first feature) vs. X2 (the second feature).]

Computing LDA Projection Vectors
Recall:
$S_W = \sum_{i=1}^{C} S_i$, where $S_i = \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^T$ and $\mu_i = \frac{1}{N_i} \sum_{x \in \omega_i} x$
$S_B = \sum_{i=1}^{C} N_i (\mu_i - \mu)(\mu_i - \mu)^T$, where $\mu = \frac{1}{N} \sum_{\forall x} x = \frac{1}{N} \sum_{i=1}^{C} N_i \mu_i$
The projection vectors are the eigenvectors of $S_W^{-1} S_B$.

Let us visualize the projection vectors.
[Figure: the two LDA projection vectors overlaid on the scatter plot of the three classes, X1 (the first feature) vs. X2 (the second feature).]

Projection: Along the first projection vector
[Figure: class PDFs p(y) of the three classes projected onto the first projection vector, with its eigenvalue given on the slide.]

Projection: Along the second projection vector
[Figure: class PDFs p(y) of the three classes projected onto the second projection vector, with its eigenvalue given on the slide.]

Which is Better?
Apparently, the projection vector that has the highest eigenvalue provides higher discrimination power between the classes. [Figure: the class PDFs along the first and second projection vectors, shown side by side.]

PCA vs LDA

Limitations of LDA
LDA produces at most C-1 feature projections. If the classification error estimates establish that more features are needed, some other method must be employed to provide those additional features. LDA is a parametric method since it assumes unimodal Gaussian likelihoods. If the distributions are significantly non-Gaussian, the LDA projections will not be able to preserve any complex structure of the data, which may be needed for classification.

Limitations of LDA (cont.)
LDA will fail when the discriminatory information is not in the mean but rather in the variance of the data.

Thank You