A Tutorial on Data Reduction. Linear Discriminant Analysis (LDA) Shireen Elhabian and Aly A. Farag. University of Louisville, CVIP Lab September 2009

Size: px

Start display at page:

Download "A Tutorial on Data Reduction. Linear Discriminant Analysis (LDA) Shireen Elhabian and Aly A. Farag. University of Louisville, CVIP Lab September 2009"

Sherilyn Nash
5 years ago
Views:

1 A utoral on Data Reducton Lnear Dscrmnant Analss (LDA) hreen Elhaban and Al A Farag Unverst of Lousvlle, CVIP Lab eptember 009

2 Outlne LDA objectve Recall PCA No LDA LDA o Classes Counter eample LDA C Classes Illustratve Eample LDA vs PCA Eample Lmtatons of LDA

3 LDA Objectve he objectve of LDA s to perform dmensonalt reducton o hat, PCA does ths L Hoever, e ant to preserve as much of the class dscrmnator nformaton as possble OK, that s ne, let dell deeper J

4 Recall PCA In PCA, the man dea to reepress the avalable dataset to etract the relevant nformaton b reducng the redundanc and mnmze the nose e ddn t care about hether ths dataset represent features from one or more classes, e the dscrmnaton poer as not taken nto consderaton hle e ere talkng about PCA In PCA, e had a dataset matr X th dmensons mn, here columns represent dfferent data samples m dmensonal data vector e frst started b subtractng the mean to have a zero mean dataset, then e computed the covarance matr XX n feature vectors (data samples) Egen values and egen vectors ere then computed for Hence the ne bass vectors are those egen vectors th hghest egen values, here the number of those vectors as our choce hus, usng the ne bass, e can project the dataset onto a less dmensonal space th more poerful data representaton

5 No LDA Consder a pattern classfcaton problem, here e have C classes, eg seabass, tuna, salmon Each class has N mdmensonal samples, here,,, C Hence e have a set of mdmensonal samples {,,, N } belong to class ω tackng these samples from dfferent classes nto one bg fat matr X such that each column represents one sample e seek to obtan a transformaton of X to Y through projectng the samples n X onto a hperplane th dmenson C Let s see hat does ths mean?

6 LDA o Classes he to classes are not ell separated hen projected onto ths lne Assume e have mdmensonal samples {,,, N }, N of hch belong to ω and N belong to ω e seek to obtan a scalar b projectng the samples onto a lne (C space, C ) here é ë m ù û and é ë m ù û hs lne succeeded n separatng the to classes and n the meantme reducng the dmensonalt of our problem from to features (, ) to onl a scalar value here s the projecton vectors used to project to Of all the possble lnes e ould lke to select the one that mamzes the separablt of the scalars

7 LDA o Classes In order to fnd a good projecton vector, e need to defne a measure of separaton beteen the projectons he mean vector of each class n and feature space s: e projectng to ll lead to projectng the mean of to the mean of e could then choose the dstance beteen the projected means as our objectve functon N N N and N Î Î Î Î ( ) ) ( J

8 LDA o Classes Hoever, the dstance beteen the projected means s not a ver good measure snce t does not take nto account the standard devaton thn the classes hs as elds better class separablt hs as has a larger dstance beteen means

9 LDA o Classes he soluton proposed b Fsher s to mamze a functon that represents the dfference beteen the means, normalzed b a measure of the thnclass varablt, or the socalled scatter For each class e defne the scatter, an equvalent of the varance, as; (sum of square dfferences beteen the projected samples and ther class mean) s s ( ) Î measures the varablt thn class ω after projectng t on the space s + s hus measures the varablt thn the to classes at hand after projecton, hence t s called thnclass scatter of the projected samples

10 LDA o Classes he Fsher lnear dscrmnant s defned as the lnear functon that mamzes the crteron functon: (the dstance beteen the projected means normalzed b the thnclass scatter of the projected samples J ( ) s + s herefore, e ll be lookng for a projecton here eamples from the same class are projected ver close to each other and, at the same tme, the projected means are as farther apart as possble

11 LDA o Classes In order to fnd the optmum projecton *, e need to epress J() as an eplct functon of e ll defne a measure of the scatter n multvarate feature space hch are denoted as scatter matrces; here s the covarance matr of class ω, and s called the thnclass scatter matr ( )( ) + Î ) ( s s J +

12 LDA o Classes No, the scatter of the projecton can then be epressed as a functon of the scatter matr n feature space here s the thnclass scatter matr of the projected samples ( ) ( ) ( )( ) ( )( ) ( ) s s s Î Î Î Î ) ( s s J +

13 LDA o Classes mlarl, the dfference beteen the projected means (n space) can be epressed n terms of the means n the orgnal feature space (space) he matr s called the beteenclass scatter of the orgnal samples/feature vectors, hle s the beteenclass scatter of the projected samples nce s the outer product of to vectors, ts rank s at most one ( ) ( ) ( )( )!!! "!! $! # ) ( s s J +

14 LDA o Classes e can fnall epress the Fsher crteron n terms of and as: J ( ) s + s Hence J() s a measure of the dfference beteen class means (encoded n the beteenclass scatter matr) normalzed b a measure of the thnclass scatter matr

15 LDA o Classes o fnd the mamum of J(), e dfferentate and equate to zero ( ) ( ) ( ) ( ) ( ) ( ) 0 ) ( 0 ) ( 0 : ) ( Þ Þ Þ Þ Þ J J b Dvdng d d d d d d J d d

16 LDA o Classes olvng the generalzed egen value problem elds * l here l J ( ) scalar arg ma J( ) arg ma ( ) hs s knon as Fsher s Lnear Dscrmnant, although t s not a dscrmnant but rather a specfc choce of drecton for the projecton of the data don to one dmenson Usng the same notaton as PCA, the soluton ll be the egen vector(s) of X

LDA o Classes Eample Compute the Lnear Dscrmnant projecton for the follong todmensonal dataset 0 9 8 7 6 amples for class

17 LDA o Classes Eample Compute the Lnear Dscrmnant projecton for the follong todmensonal dataset amples for class ω : X (, ){(4,),(,4),(,3),(3,6),(4,4)} ample for class ω : X (, ){(9,0),(6,8),(9,5),(8,7),(0,8)}

18 LDA o Classes Eample he classes mean are : û ù ë é û ù ë é Î Î N N

19 LDA o Classes Eample Covarance matr of the frst class: ( )( ) û ù ë é + û ù ë é + û ù ë é + û ù ë é + û ù ë é Î

20 LDA o Classes Eample Covarance matr of the second class: ( )( ) û ù ë é + û ù ë é + û ù ë é + û ù ë é + û ù ë é Î

21 LDA o Classes Eample thnclass scatter matr:

22 LDA o Classes Eample eteenclass scatter matr: ( )( ) é 3 84ùé 3 84ù ë38 76ûë38 76û ( 54 38)

23 LDA o Classes Eample he LDA projecton s then obtaned as the soluton of the generalzed egen value problem l Þ li 0 33 Þ 03 Þ Þ Þ l l 05 l l ( 93 l)( 9794 l) 6489 l 007l 0 Þ l( l 007) Þ l 0, l

LDA o Classes Eample Hence 93 4339 and 93 4339 hus; 6489 9794 6489 9794 05755 0878

24 LDA o Classes Eample Hence and hus; ! l 007 %"$"# l and * he optmal projecton s the one that gven mamum λ J()

25 LDA o Classes Eample Or drectl; * ( ) é 3 84ù ë38 76û

26 LDA Projecton he projecton vector correspondng to the smallest egen value Classes PDF : usng the LDA projecton vector th the other egen value 8888e LDA projecton vector th the other egen value 8888e p( ) Usng ths vector leads to bad separablt beteen the to classes

27 LDA Projecton he projecton vector correspondng to the hghest egen value Classes PDF : usng the LDA projecton vector th hghest egen value LDA projecton vector th the hghest egen value 007 p( ) Usng ths vector leads to good separablt beteen the to classes

28 LDA CClasses No, e have Cclasses nstead of just to e are no seekng (C) projectons [,,, C ] b means of (C) projecton vectors can be arranged b columns nto a projecton matr [ C ] such that: [ ], û ù ë é û ù ë é C C m C C m m and here Þ

29 LDA CClasses If e have nfeature vectors, e can stack them nto one matr as follos; [ ], û ù ë é û ù ë é C C m n C C C n n C n m m m n n m and Y X here X Y

30 LDA CClasses Recall the to classes case, the thnclass scatter as computed as: + Eample of todmensonal features (m ), th three classes C 3 hs can be generalzed n the C classes case as: here and C N Î Î ( )( ) N : number of data samples n class ω 3

LDA CClasses Recall the to classes case, the beteenclass

the beteenclass scatter th respect to the mean of all

N N: number of all data N : number of data samples n

31 LDA CClasses Recall the to classes case, the beteenclass scatter as computed as: For Cclasses case, e ll measure the beteenclass scatter th respect to the mean of all classes as follos: here and C N ( )( ) ( )( ) N N " Î N " N N: number of all data N : number of data samples n class ω Eample of todmensonal features (m ), th three classes C 3 3

32 LDA CClasses mlarl, e can defne the mean vectors for the projected samples as: hle the scatter matrces for the projected samples ll be: " Î N and N ( )( ) Î C C ( )( ) C N

33 LDA CClasses Recall n toclasses case, e have epressed the scatter matrces of the projected samples n terms of those of the orgnal samples as: hs stll hold n Cclasses case Recall that e are lookng for a projecton that mamzes the rato of beteenclass to thnclass scatter nce the projecton s no longer a scalar (t has C dmensons), e then use the determnant of the scatter matrces to obtan a scalar objectve functon: J ( ) And e ll seek the projecton * that mamzes ths rato

34 LDA CClasses o fnd the mamum of J(), e dfferentate th respect to and equate to zero Recall n toclasses case, e solved the egen value problem For Cclasses case, e have C projecton vectors, hence the egen value problem can be generalzed to the Cclasses case as: l here l J ( ) l here l J ( ) hus, It can be shon that the optmal projecton matr * s the one hose columns are the egenvectors correspondng to the largest egen values of the follong generalzed egen value problem: * here l * l J( * ) scalar and scalar * scalar and,, C [ ] * * * C

35 Illustraton 3 Classes Let s generate a dataset for each class to smulate the three classes shon For each class do the follong, Use the random number generator to generate a unform stream of 500 samples that follos U(0,) 3 Usng the omuller approach, convert the generated unform stream to N(0,) hen use the method of egen values and egen vectors to manpulate the standard normal to have the requred mean vector and covarance matr Estmate the mean and covarance matr of the resulted dataset

vsual nspecton of the fgure, classes parameters (means and covarance matrces can be gven as follos: Dataset Generaton 3 û ù ë é + û ù ë é + û ù ë é + û ù ë é 5 3 35 4 0 0 4 3 3 5 5 7, 35 5, 7 3 5 5

36 vsual nspecton of the fgure, classes parameters (means and covarance matrces can be gven as follos: Dataset Generaton 3 û ù ë é + û ù ë é + û ù ë é + û ù ë é , 35 5, mean Overall 3 3 Zero covarance to lead to data samples dstrbuted horzontall Postve covarance to lead to data samples dstrbuted along the lne Negatve covarance to lead to data samples dstrbuted along the lne

37 In Matlab J

38 It s orkng J X the second feature X the frst feature

39 Computng LDA Projecton Vectors ( )( ) Î " " C N and N N N here N ( )( ) Î Î C N and here Recall

40 Let s vsualze the projecton vectors 5 0 X the second feature X the frst feature

41 Projecton Along frst projecton vector Classes PDF : usng the frst projecton vector th egen value p( )

42 Projecton Along second projecton vector Classes PDF : usng the second projecton vector th egen value p( )

43 hch s etter?!!! Apparentl, the projecton vector that has the hghest egen value provdes hgher dscrmnaton poer beteen classes Classes PDF : usng the frst projecton vector th egen value Classes PDF : usng the second projecton vector th egen value p( ) 0 p( )

44 PCA vs LDA

45 Lmtatons of LDA L LDA produces at most C feature projectons If the classfcaton error estmates establsh that more features are needed, some other method must be emploed to provde those addtonal features LDA s a parametrc method snce t assumes unmodal Gaussan lkelhoods If the dstrbutons are sgnfcantl nongaussan, the LDA projectons ll not be able to preserve an comple structure of the data, hch ma be needed for classfcaton

46 Lmtatons of LDA L LDA ll fal hen the dscrmnator nformaton s not n the mean but rather n the varance of the data

47 hank You

Lecture 10: Dimensionality reduction

Lecture 10: Dimensionality reduction Lecture : Dmensonalt reducton g The curse of dmensonalt g Feature etracton s. feature selecton g Prncpal Components Analss g Lnear Dscrmnant Analss Intellgent Sensor Sstems Rcardo Guterrez-Osuna Wrght