A Tutorial on Data Reduction: Linear Discriminant Analysis (LDA). Shireen Elhabian and Aly A. Farag, University of Louisville, CVIP Lab, September 2009


Outline: LDA objective; Recall PCA; Now LDA; LDA: Two Classes; Counter example; LDA: C Classes; Illustrative example; LDA vs. PCA example; Limitations of LDA.

LDA Objective
The objective of LDA is to perform dimensionality reduction. "So what, PCA does this." However, we want to preserve as much of the class discriminatory information as possible. "OK, that is new, let us delve deeper."

Recall PCA
In PCA, the main idea is to re-express the available dataset to extract the relevant information by reducing the redundancy and minimizing the noise. We did not care whether this dataset represents features from one or more classes, i.e. the discrimination power was not taken into consideration while we were talking about PCA. In PCA, we had a dataset matrix X with dimensions m x n, where the n columns represent different data samples and each sample is an m-dimensional data vector. We first started by subtracting the mean to have a zero-mean dataset, then we computed the covariance matrix $\Sigma = XX^T$. Eigenvalues and eigenvectors were then computed for $\Sigma$. Hence the new basis vectors are those eigenvectors with the highest eigenvalues, where the number of those vectors was our choice. Thus, using the new basis, we can project the dataset onto a lower-dimensional space with a more powerful data representation.
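To make the recalled recipe concrete, here is a minimal NumPy sketch of the PCA steps described above (the function name and the column-per-sample convention are assumptions, not part of the original slides):

```python
import numpy as np

def pca(X, k):
    """PCA on an m x n data matrix X whose columns are data samples.
    Returns the top-k eigenvectors (new basis) and the projected data."""
    mu = X.mean(axis=1, keepdims=True)       # m x 1 mean vector
    Xc = X - mu                              # zero-mean dataset
    S = Xc @ Xc.T                            # m x m covariance (scatter) matrix
    eigvals, eigvecs = np.linalg.eigh(S)     # eigh since S is symmetric
    order = np.argsort(eigvals)[::-1]        # sort by decreasing eigenvalue
    W = eigvecs[:, order[:k]]                # top-k eigenvectors as columns
    Y = W.T @ Xc                             # k x n projected dataset
    return W, Y
```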

Now LDA
Consider a pattern classification problem where we have C classes, e.g. sea bass, tuna, salmon. Each class i has $N_i$ m-dimensional samples, where $i = 1, 2, \dots, C$. Hence we have a set of m-dimensional samples $\{x_1, x_2, \dots, x_{N_i}\}$ belonging to class $\omega_i$. Stacking these samples from the different classes into one big fat matrix X, such that each column represents one sample, we seek to obtain a transformation of X to Y through projecting the samples in X onto a hyperplane with dimension C-1. Let us see what this means.

LDA: Two Classes
Assume we have m-dimensional samples $\{x_1, x_2, \dots, x_N\}$, $N_1$ of which belong to $\omega_1$ and $N_2$ belong to $\omega_2$. We seek to obtain a scalar y by projecting the samples x onto a line (a C-1 space, with C = 2):
$y = w^T x$, where $x = [x_1, \dots, x_m]^T$ and $w = [w_1, \dots, w_m]^T$
Here w is the projection vector used to project x to y. [Figure: one line on which the two classes are not well separated when projected; another line that succeeds in separating the two classes while reducing the dimensionality of the problem from two features $(x_1, x_2)$ to a single scalar value y.] Of all the possible lines we would like to select the one that maximizes the separability of the scalars.

LDA: Two Classes
In order to find a good projection vector, we need to define a measure of separation between the projections. The mean vector of each class in x-space and y-space is:
$\mu_i = \frac{1}{N_i} \sum_{x \in \omega_i} x$  and  $\tilde{\mu}_i = \frac{1}{N_i} \sum_{y \in \omega_i} y = \frac{1}{N_i} \sum_{x \in \omega_i} w^T x = w^T \mu_i$
i.e. projecting x to y will lead to projecting the mean of x to the mean of y. We could then choose the distance between the projected means as our objective function:
$J(w) = |\tilde{\mu}_1 - \tilde{\mu}_2| = |w^T (\mu_1 - \mu_2)|$

LDA: Two Classes
However, the distance between the projected means is not a very good measure, since it does not take into account the standard deviation within the classes. [Figure: one axis has a larger distance between the projected means, while the other axis yields better class separability.]

LDA: Two Classes
The solution proposed by Fisher is to maximize a function that represents the difference between the means, normalized by a measure of the within-class variability, the so-called scatter. For each class we define the scatter, an equivalent of the variance, as the sum of squared differences between the projected samples and their class mean:
$\tilde{s}_i^2 = \sum_{y \in \omega_i} (y - \tilde{\mu}_i)^2$
$\tilde{s}_i^2$ measures the variability within class $\omega_i$ after projecting it onto the y-space. Thus $\tilde{s}_1^2 + \tilde{s}_2^2$ measures the variability within the two classes at hand after projection; hence it is called the within-class scatter of the projected samples.

LDA: Two Classes
The Fisher linear discriminant is defined as the linear function $w^T x$ that maximizes the criterion function (the distance between the projected means normalized by the within-class scatter of the projected samples):
$J(w) = \frac{(\tilde{\mu}_1 - \tilde{\mu}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2}$
Therefore, we will be looking for a projection where examples from the same class are projected very close to each other and, at the same time, the projected means are as far apart as possible.

LDA: Two Classes
In order to find the optimum projection $w^*$, we need to express J(w) as an explicit function of w. We will define a measure of the scatter in the multivariate feature space x, denoted by the scatter matrices:
$S_i = \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^T$,  $S_W = S_1 + S_2$
where $S_i$ is the covariance matrix of class $\omega_i$, and $S_W$ is called the within-class scatter matrix.

LDA: Two Classes
Now, the scatter of the projection y can be expressed as a function of the scatter matrix in the feature space x:
$\tilde{s}_i^2 = \sum_{y \in \omega_i} (y - \tilde{\mu}_i)^2 = \sum_{x \in \omega_i} (w^T x - w^T \mu_i)^2 = \sum_{x \in \omega_i} w^T (x - \mu_i)(x - \mu_i)^T w = w^T S_i w$
$\tilde{s}_1^2 + \tilde{s}_2^2 = w^T S_1 w + w^T S_2 w = w^T S_W w = \tilde{S}_W$
where $\tilde{S}_W$ is the within-class scatter of the projected samples y.

LDA: Two Classes
Similarly, the difference between the projected means (in y-space) can be expressed in terms of the means in the original feature space (x-space):
$(\tilde{\mu}_1 - \tilde{\mu}_2)^2 = (w^T \mu_1 - w^T \mu_2)^2 = w^T (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T w = w^T S_B w$
The matrix $S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T$ is called the between-class scatter of the original samples/feature vectors, while $\tilde{S}_B = w^T S_B w$ is the between-class scatter of the projected samples. Since $S_B$ is the outer product of two vectors, its rank is at most one.

LDA: Two Classes
We can finally express the Fisher criterion in terms of $S_W$ and $S_B$ as:
$J(w) = \frac{w^T S_B w}{w^T S_W w}$
Hence J(w) is a measure of the difference between class means (encoded in the between-class scatter matrix) normalized by a measure of the within-class scatter matrix.

LDA: Two Classes
To find the maximum of J(w), we differentiate and equate to zero:
$\frac{d}{dw} J(w) = \frac{d}{dw} \left[ \frac{w^T S_B w}{w^T S_W w} \right] = 0$
$\Rightarrow (w^T S_W w)\, \frac{d(w^T S_B w)}{dw} - (w^T S_B w)\, \frac{d(w^T S_W w)}{dw} = 0$
$\Rightarrow (w^T S_W w)\, 2 S_B w - (w^T S_B w)\, 2 S_W w = 0$
Dividing by $2\, w^T S_W w$:
$\Rightarrow S_B w - J(w)\, S_W w = 0 \;\Rightarrow\; S_W^{-1} S_B w - J(w)\, w = 0$

LDA: Two Classes
Solving the generalized eigenvalue problem
$S_W^{-1} S_B w = \lambda w$, where $\lambda = J(w)$ is a scalar,
yields
$w^* = \arg\max_w J(w) = \arg\max_w \frac{w^T S_B w}{w^T S_W w} = S_W^{-1} (\mu_1 - \mu_2)$
This is known as Fisher's linear discriminant, although it is not a discriminant but rather a specific choice of direction for the projection of the data down to one dimension. Using the same notation as PCA, the solution will be the eigenvector(s) of $S_W^{-1} S_B$.
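As a minimal sketch of this result (the helper name and the column-per-sample convention are assumptions), the closed-form direction $S_W^{-1}(\mu_1 - \mu_2)$ can be computed directly:

```python
import numpy as np

def fisher_direction(X1, X2):
    """Fisher's linear discriminant for two classes.
    X1, X2: m x N1 and m x N2 matrices whose columns are samples.
    Returns the unit-norm projection vector w* proportional to S_W^{-1}(mu1 - mu2)."""
    mu1, mu2 = X1.mean(axis=1), X2.mean(axis=1)
    S1 = (X1 - mu1[:, None]) @ (X1 - mu1[:, None]).T   # scatter of class 1
    S2 = (X2 - mu2[:, None]) @ (X2 - mu2[:, None]).T   # scatter of class 2
    Sw = S1 + S2                                       # within-class scatter
    w = np.linalg.solve(Sw, mu1 - mu2)                 # S_W^{-1} (mu1 - mu2)
    return w / np.linalg.norm(w)
```

Note that rescaling the class scatters (for example dividing by $N_i - 1$ to obtain covariance matrices, as done in the worked example below) changes only the length of w, not its direction.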

LDA: Two Classes, Example
Compute the linear discriminant projection for the following two-dimensional dataset:
Samples for class $\omega_1$: $X_1 = (x_1, x_2) = \{(4,2), (2,4), (2,3), (3,6), (4,4)\}$
Samples for class $\omega_2$: $X_2 = (x_1, x_2) = \{(9,10), (6,8), (9,5), (8,7), (10,8)\}$
[Figure: scatter plot of the two classes.]

LDA: Two Classes, Example
The class means are:
$\mu_1 = \frac{1}{N_1} \sum_{x \in \omega_1} x = \frac{1}{5}\left( \begin{bmatrix}4\\2\end{bmatrix} + \begin{bmatrix}2\\4\end{bmatrix} + \begin{bmatrix}2\\3\end{bmatrix} + \begin{bmatrix}3\\6\end{bmatrix} + \begin{bmatrix}4\\4\end{bmatrix} \right) = \begin{bmatrix}3\\3.8\end{bmatrix}$
$\mu_2 = \frac{1}{N_2} \sum_{x \in \omega_2} x = \frac{1}{5}\left( \begin{bmatrix}9\\10\end{bmatrix} + \begin{bmatrix}6\\8\end{bmatrix} + \begin{bmatrix}9\\5\end{bmatrix} + \begin{bmatrix}8\\7\end{bmatrix} + \begin{bmatrix}10\\8\end{bmatrix} \right) = \begin{bmatrix}8.4\\7.6\end{bmatrix}$

LDA: Two Classes, Example
Covariance matrix of the first class (the scatter normalized by $N_1 - 1$):
$S_1 = \frac{1}{N_1 - 1}\sum_{x \in \omega_1} (x - \mu_1)(x - \mu_1)^T = \begin{bmatrix} 1 & -0.25 \\ -0.25 & 2.2 \end{bmatrix}$

LDA: Two Classes, Example
Covariance matrix of the second class:
$S_2 = \frac{1}{N_2 - 1}\sum_{x \in \omega_2} (x - \mu_2)(x - \mu_2)^T = \begin{bmatrix} 2.3 & -0.05 \\ -0.05 & 3.3 \end{bmatrix}$

LDA: Two Classes, Example
Within-class scatter matrix:
$S_W = S_1 + S_2 = \begin{bmatrix} 1 & -0.25 \\ -0.25 & 2.2 \end{bmatrix} + \begin{bmatrix} 2.3 & -0.05 \\ -0.05 & 3.3 \end{bmatrix} = \begin{bmatrix} 3.3 & -0.3 \\ -0.3 & 5.5 \end{bmatrix}$

LDA: Two Classes, Example
Between-class scatter matrix:
$S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T = \begin{bmatrix} 3 - 8.4 \\ 3.8 - 7.6 \end{bmatrix} \begin{bmatrix} 3 - 8.4 \\ 3.8 - 7.6 \end{bmatrix}^T = \begin{bmatrix} -5.4 \\ -3.8 \end{bmatrix} \begin{bmatrix} -5.4 & -3.8 \end{bmatrix} = \begin{bmatrix} 29.16 & 20.52 \\ 20.52 & 14.44 \end{bmatrix}$

LDA: Two Classes, Example
The LDA projection is then obtained as the solution of the generalized eigenvalue problem:
$S_W^{-1} S_B w = \lambda w \;\Rightarrow\; |S_W^{-1} S_B - \lambda I| = 0$
$S_W^{-1} S_B = \begin{bmatrix} 3.3 & -0.3 \\ -0.3 & 5.5 \end{bmatrix}^{-1} \begin{bmatrix} 29.16 & 20.52 \\ 20.52 & 14.44 \end{bmatrix} = \begin{bmatrix} 0.3045 & 0.0166 \\ 0.0166 & 0.1827 \end{bmatrix} \begin{bmatrix} 29.16 & 20.52 \\ 20.52 & 14.44 \end{bmatrix} = \begin{bmatrix} 9.2213 & 6.489 \\ 4.2339 & 2.9794 \end{bmatrix}$
$\begin{vmatrix} 9.2213 - \lambda & 6.489 \\ 4.2339 & 2.9794 - \lambda \end{vmatrix} = 0 \;\Rightarrow\; (9.2213 - \lambda)(2.9794 - \lambda) - 6.489 \times 4.2339 = 0$
$\Rightarrow \lambda^2 - 12.2007\,\lambda = 0 \;\Rightarrow\; \lambda(\lambda - 12.2007) = 0 \;\Rightarrow\; \lambda_1 = 0,\; \lambda_2 = 12.2007$

LDA: Two Classes, Example
Hence, solving $\begin{bmatrix} 9.2213 - \lambda_i & 6.489 \\ 4.2339 & 2.9794 - \lambda_i \end{bmatrix} w_i = 0$ for each eigenvalue gives
$w_1 = \begin{bmatrix} 0.5755 \\ -0.8178 \end{bmatrix}$ for $\lambda_1 = 0$, and $w_2 = \begin{bmatrix} 0.9088 \\ 0.4173 \end{bmatrix}$ for $\lambda_2 = 12.2007$.
The optimal projection $w^*$ is the one with the maximum $\lambda = J(w)$, i.e. $w^* = w_2$.

LDA: Two Classes, Example
Or directly:
$w^* = S_W^{-1}(\mu_1 - \mu_2) = \begin{bmatrix} 3.3 & -0.3 \\ -0.3 & 5.5 \end{bmatrix}^{-1} \begin{bmatrix} 3 - 8.4 \\ 3.8 - 7.6 \end{bmatrix} = \begin{bmatrix} 0.3045 & 0.0166 \\ 0.0166 & 0.1827 \end{bmatrix} \begin{bmatrix} -5.4 \\ -3.8 \end{bmatrix} \propto \begin{bmatrix} 0.9088 \\ 0.4173 \end{bmatrix}$
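The arithmetic above can be checked with a short NumPy sketch (the variable names are mine; covariances are normalized by N-1 as in the slides, which does not affect the resulting direction):

```python
import numpy as np

X1 = np.array([[4, 2], [2, 4], [2, 3], [3, 6], [4, 4]], dtype=float).T    # class 1, 2 x 5
X2 = np.array([[9, 10], [6, 8], [9, 5], [8, 7], [10, 8]], dtype=float).T  # class 2, 2 x 5

mu1, mu2 = X1.mean(axis=1), X2.mean(axis=1)   # [3, 3.8] and [8.4, 7.6]
S1, S2 = np.cov(X1), np.cov(X2)               # covariances (normalized by N-1)
Sw = S1 + S2                                  # [[3.3, -0.3], [-0.3, 5.5]]
Sb = np.outer(mu1 - mu2, mu1 - mu2)           # between-class scatter

eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
w_star = eigvecs[:, np.argmax(eigvals)]       # eigenvector of the largest eigenvalue
print(np.sort(eigvals))                       # ~ [0, 12.2007]
print(w_star)                                 # ~ +/- [0.9088, 0.4173]

# Or directly, without the eigen decomposition:
w_direct = np.linalg.solve(Sw, mu1 - mu2)
print(w_direct / np.linalg.norm(w_direct))    # same direction (up to sign)
```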

LDA Projection
The projection vector corresponding to the smallest eigenvalue ($\lambda_1 \approx 0$). [Figure: the projection direction overlaid on the data, and the class PDFs $p(y|\omega_i)$ of the samples projected onto this vector; the two PDFs overlap heavily.] Using this vector leads to bad separability between the two classes.

LDA Projection
The projection vector corresponding to the highest eigenvalue ($\lambda_2 = 12.2007$). [Figure: the projection direction overlaid on the data, and the class PDFs $p(y|\omega_i)$ of the samples projected onto this vector; the two PDFs are well separated.] Using this vector leads to good separability between the two classes.

LDA: C Classes
Now we have C classes instead of just two. We are now seeking (C-1) projections $[y_1, y_2, \dots, y_{C-1}]$ by means of (C-1) projection vectors $w_i$, which can be arranged by columns into a projection matrix $W = [w_1 | w_2 | \dots | w_{C-1}]$, such that:
$y_i = w_i^T x \;\Rightarrow\; y = W^T x$
where $x = [x_1, \dots, x_m]^T$ is an m-dimensional sample and $y = [y_1, \dots, y_{C-1}]^T$ is its (C-1)-dimensional projection.

LDA: C Classes
If we have n feature vectors, we can stack them into one matrix as follows:
$Y = W^T X$
where $X$ is $m \times n$ (each column $x_i$ is an m-dimensional sample) and $Y$ is $(C-1) \times n$ (each column $y_i$ is the corresponding projected sample).

LDA: C Classes
Recall that in the two-classes case the within-class scatter was computed as $S_W = S_1 + S_2$. This can be generalized to the C-classes case as:
$S_W = \sum_{i=1}^{C} S_i$, where $S_i = \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^T$ and $\mu_i = \frac{1}{N_i} \sum_{x \in \omega_i} x$
$N_i$: number of data samples in class $\omega_i$. [Figure: example of two-dimensional features (m = 2) with three classes (C = 3).]

LDA: C Classes
Recall that in the two-classes case the between-class scatter was computed as $S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T$. For the C-classes case, we measure the between-class scatter with respect to the mean of all classes as follows:
$S_B = \sum_{i=1}^{C} N_i (\mu_i - \mu)(\mu_i - \mu)^T$, where $\mu = \frac{1}{N} \sum_{\forall x} x = \frac{1}{N} \sum_{i=1}^{C} N_i \mu_i$
$N$: total number of data samples; $N_i$: number of data samples in class $\omega_i$. [Figure: example of two-dimensional features (m = 2) with three classes (C = 3).]

LDA: C Classes
Similarly, we can define the mean vectors for the projected samples y as:
$\tilde{\mu}_i = \frac{1}{N_i} \sum_{y \in \omega_i} y$  and  $\tilde{\mu} = \frac{1}{N} \sum_{\forall y} y$
while the scatter matrices for the projected samples will be:
$\tilde{S}_W = \sum_{i=1}^{C} \sum_{y \in \omega_i} (y - \tilde{\mu}_i)(y - \tilde{\mu}_i)^T$,  $\tilde{S}_B = \sum_{i=1}^{C} N_i (\tilde{\mu}_i - \tilde{\mu})(\tilde{\mu}_i - \tilde{\mu})^T$

LDA: C Classes
Recall that in the two-classes case we expressed the scatter matrices of the projected samples in terms of those of the original samples as:
$\tilde{S}_W = W^T S_W W$,  $\tilde{S}_B = W^T S_B W$
This still holds in the C-classes case. Recall that we are looking for a projection that maximizes the ratio of between-class to within-class scatter. Since the projection is no longer a scalar (it has C-1 dimensions), we use the determinant of the scatter matrices to obtain a scalar objective function:
$J(W) = \frac{|\tilde{S}_B|}{|\tilde{S}_W|} = \frac{|W^T S_B W|}{|W^T S_W W|}$
And we will seek the projection $W^*$ that maximizes this ratio.

LDA: C Classes
To find the maximum of J(W), we differentiate with respect to W and equate to zero. Recall that in the two-classes case we solved the eigenvalue problem $S_W^{-1} S_B w = \lambda w$, where $\lambda = J(w)$ is a scalar. For the C-classes case we have C-1 projection vectors, hence the eigenvalue problem is generalized to:
$S_W^{-1} S_B w_i = \lambda_i w_i$, where $\lambda_i = J(w_i)$ is a scalar and $i = 1, \dots, C-1$
Thus, it can be shown that the optimal projection matrix $W^* = [w_1^* | w_2^* | \dots | w_{C-1}^*]$ is the one whose columns are the eigenvectors corresponding to the largest eigenvalues of the generalized eigenvalue problem:
$S_W^{-1} S_B W^* = \lambda W^*$, where $\lambda = J(W^*)$ is a scalar.
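A minimal NumPy sketch of this multi-class recipe (the function name, the column-per-sample convention, and the integer label vector are assumptions):

```python
import numpy as np

def lda_projection_matrix(X, labels, n_proj=None):
    """Multi-class LDA. X: m x N data matrix (one sample per column);
    labels: length-N array of class indices. Returns W (m x k) whose columns
    are the eigenvectors of S_W^{-1} S_B with the largest eigenvalues."""
    labels = np.asarray(labels)
    m, N = X.shape
    classes = np.unique(labels)
    mu = X.mean(axis=1)                                   # overall mean
    Sw = np.zeros((m, m))
    Sb = np.zeros((m, m))
    for c in classes:
        Xc = X[:, labels == c]                            # samples of class c
        mu_c = Xc.mean(axis=1)
        Sw += (Xc - mu_c[:, None]) @ (Xc - mu_c[:, None]).T
        Sb += Xc.shape[1] * np.outer(mu_c - mu, mu_c - mu)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(eigvals.real)[::-1]                # largest eigenvalues first
    k = n_proj if n_proj is not None else len(classes) - 1
    return eigvecs.real[:, order[:k]], eigvals.real[order[:k]]

# Projected data: Y = W.T @ X has dimension (C-1) x N.
```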

Illustration: 3 Classes
Let us generate a dataset for each class to simulate the three classes shown. For each class, do the following (a code sketch of this procedure follows the Matlab slide below):
1. Use the random number generator to generate a uniform stream of 500 samples that follows U(0,1).
2. Using the Box-Muller approach, convert the generated uniform stream to N(0,1).
3. Then use the method of eigenvalues and eigenvectors to manipulate the standard normal to have the required mean vector and covariance matrix.
4. Estimate the mean and covariance matrix of the resulting dataset.

Dataset Generation
By visual inspection of the figure, the class parameters (mean vectors and covariance matrices) can be chosen as follows:
Class 1: zero covariance, to lead to data samples distributed horizontally.
Class 2: positive covariance, to lead to data samples distributed along the y = x line.
Class 3: negative covariance, to lead to data samples distributed along the y = -x line.
The overall mean is $\mu = \frac{1}{3}(\mu_1 + \mu_2 + \mu_3)$. [The specific mean vectors and covariance matrices are given on the slide.]

In Matlab
[Slide showing the MATLAB implementation of the dataset generation steps above.]
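The original implementation was in MATLAB; as a stand-in, here is a rough NumPy sketch of the same generation procedure. The means and covariances below are illustrative placeholders chosen only to give the zero/positive/negative covariance patterns described above, not the slide's exact parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_class(mu, Sigma, n=500):
    """Draw n samples from N(mu, Sigma) using U(0,1) streams, the Box-Muller
    transform, and eigenvalue/eigenvector shaping of the covariance."""
    m = len(mu)
    u1 = 1.0 - rng.uniform(size=(m, n))                        # step 1: U(0,1), in (0,1] to avoid log(0)
    u2 = rng.uniform(size=(m, n))
    z = np.sqrt(-2.0 * np.log(u1)) * np.cos(2.0 * np.pi * u2)  # step 2: Box-Muller -> N(0,1)
    lam, V = np.linalg.eigh(Sigma)                             # step 3: eigen shaping,
    A = V @ np.diag(np.sqrt(lam))                              #   cov(A z) = A A^T = Sigma
    return A @ z + np.asarray(mu, dtype=float)[:, None]        # m x n samples

# Placeholder class parameters (not the slide's exact values):
X1 = sample_class([5.0, 5.0],   [[4.0, 0.0], [0.0, 1.0]])    # zero covariance: horizontal spread
X2 = sample_class([10.0, 10.0], [[4.0, 3.0], [3.0, 4.0]])    # positive covariance: along y = x
X3 = sample_class([15.0, 5.0],  [[4.0, -3.0], [-3.0, 4.0]])  # negative covariance: along y = -x

# Step 4: estimate the mean and covariance of each generated dataset
print(X1.mean(axis=1), np.cov(X1))
```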

It is working!
[Figure: scatter plot of the generated three-class dataset, X1 (the first feature) vs. X2 (the second feature).]

Computing LDA Projection Vectors
Recall:
$S_W = \sum_{i=1}^{C} S_i$, where $S_i = \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^T$ and $\mu_i = \frac{1}{N_i} \sum_{x \in \omega_i} x$
$S_B = \sum_{i=1}^{C} N_i (\mu_i - \mu)(\mu_i - \mu)^T$, where $\mu = \frac{1}{N} \sum_{\forall x} x = \frac{1}{N} \sum_{i=1}^{C} N_i \mu_i$
The projection vectors are the eigenvectors of $S_W^{-1} S_B$.

Let us visualize the projection vectors.
[Figure: the two LDA projection vectors overlaid on the scatter plot of the three classes, X1 (the first feature) vs. X2 (the second feature).]

Projection: Along the first projection vector
[Figure: class PDFs p(y) of the three classes projected onto the first projection vector, with its eigenvalue given on the slide.]

Projection: Along the second projection vector
[Figure: class PDFs p(y) of the three classes projected onto the second projection vector, with its eigenvalue given on the slide.]

Which is Better?
Apparently, the projection vector that has the highest eigenvalue provides higher discrimination power between the classes. [Figure: the class PDFs along the first and second projection vectors, shown side by side.]

PCA vs LDA

Limitations of LDA
LDA produces at most C-1 feature projections. If the classification error estimates establish that more features are needed, some other method must be employed to provide those additional features. LDA is a parametric method since it assumes unimodal Gaussian likelihoods. If the distributions are significantly non-Gaussian, the LDA projections will not be able to preserve any complex structure of the data, which may be needed for classification.

Limitations of LDA (cont.)
LDA will fail when the discriminatory information is not in the mean but rather in the variance of the data.

Thank You