Pattern Classification
1 Pattern Classification
2 An Example of Classification: sorting incoming fish on a conveyor according to species, using optical sensing. Species: sea bass, salmon.
3 Some properties that could possibly be used to distinguish between the two types of fish: Length, Lightness, Width, Number and shape of fins, Position of the mouth, etc. Features: this is the set of all suggested features to explore for use in our classifier! A feature is a property or characteristic of an object, quantifiable or non-quantifiable, which is used to distinguish between (or classify) two objects.
4 Feature vector: A single feature may not always be useful for classification. A set of features used for classification forms a feature vector: Fish → [Lightness, Width].
5 Feature space: The input samples, when represented by their features, are points in the feature space. If a single feature is used, then we work in a one-dimensional feature space, with points representing samples. If the number of features is 2, then we get points in 2-D space, as shown in the next slide. We can also have an n-dimensional feature space.
6 Decision boundary in the one-dimensional case with two classes. Decision boundary in the 2- or 3-dimensional case with three classes.
7 (figure) Sample points in a two-dimensional feature space (axes F1, F2), showing Class 1, Class 2 and Class 3.
8 Some Terminologies: Pattern; Feature; Feature vector; Feature space; Classification; Decision Boundary; Decision Region; Discriminant function; Hyperplanes and Hypersurfaces; Learning (Supervised and unsupervised); Error; Noise; PDF; Bayes' Rule; Parametric and Non-parametric approaches.
9 Decision Region and Decision Boundary: Our goal in pattern recognition is to reach an optimal decision rule to categorize the incoming data into their respective categories. The decision boundary separates points belonging to one class from points of the other. The decision boundary partitions the feature space into decision regions. The nature of the decision boundary is decided by the discriminant function which is used for the decision; it is a function of the feature vector.
10 Multiple classes: Now consider the extension of linear discriminants to K > 2 classes. We might be tempted to build a K-class discriminant by combining a number of two-class discriminant functions. However, this leads to some serious difficulties (Duda and Hart, 1973). Consider the use of K classifiers, each of which solves a two-class problem of separating points in a particular class C_k from points not in that class. This is known as a one-versus-the-rest classifier. An illustration only follows; solutions follow later.
11 (illustration: the one-versus-the-rest construction)
12 Hyperplanes and Hypersurfaces: For the two-category case, a positive value of the discriminant function decides one class and a negative value decides the other. If the number of dimensions is three, then the decision boundary will be a plane or a 3-D surface; the decision regions become semi-infinite volumes. If the number of dimensions increases to more than three, then the decision boundary becomes a hyper-plane or a hyper-surface, and the decision regions become semi-infinite hyperspaces.
13 Learning: The classifier to be designed is built using input samples which are a mixture of all the classes. The classifier learns how to discriminate between samples of different classes. If the learning is offline (i.e., a supervised method), then the classifier is first given a set of training samples, the optimal decision boundary is found, and then classification is done. If the learning is online, then there is no teacher and no training samples (unsupervised); the input samples are the test samples themselves, and the classifier learns and classifies at the same time.
14 Error: The accuracy of classification depends on two things. The optimality of the decision rule used: the central task is to find an optimal decision rule which can generalize to unseen samples as well as categorize the training samples as correctly as possible. This decision theory leads to minimum error-rate classification. The accuracy in measurements of feature vectors: this inaccuracy is due to the presence of noise. Hence our classifier should also deal with noisy and missing features.
15 Classifier Types: Statistical, Syntactic, Neural; Supervised or Unsupervised. Categories of Statistical Classifiers: Linear, Quadratic, Piecewise, Non-parametric.
16 Parametric Decision Making (Statistical - Supervised): The goal of most classification procedures is to estimate the probabilities that a pattern to be classified belongs to the various possible classes, based on the values of some feature or set of features. In most cases, we decide which is the most likely class. We need a mathematical decision-making algorithm to obtain the classification: Bayesian decision making, or Bayes' Theorem. This method refers to choosing the most likely class, given the value of the feature(s). Bayes' theorem calculates the probability of class membership. Define: P(w_i) - prior probability for class w_i; P(x) - unconditional probability for feature vector x; P(w_i|x) - measurement-conditioned or posterior probability; P(x|w_i) - class-conditional probability of feature vector x in class w_i.
17 Bayes' Theorem: P(w_i|x) = P(x|w_i) P(w_i) / P(x). P(x) is the probability distribution for feature x in the entire population, also called the unconditional density function or evidence. P(w_i) is the prior probability that a random sample is a member of class C_i. P(x|w_i) is the class-conditional probability, or likelihood, of obtaining feature value x given that the sample is from class w_i; it is proportional to the number of occurrences of x among samples belonging to class w_i. The goal is to obtain P(w_i|x), the measurement-conditioned or posterior probability, from the above three values; this is the probability of any vector x being assigned to class w_i. BAYES RULE: P(w_i|x) = P(x|w_i) P(w_i) / P(x) = P(w_i, x) / P(x).
18 Take an example. A two-class problem: cold (C1) and not-cold (C2). The feature is fever (f). Prior probability of a person having a cold: P(C1) = 0.01. Probability of having a fever, given that a person has a cold: P(f|C1) = 0.4. Overall probability of fever: P(f) = 0.02. Then, using Bayes' Theorem, the probability that a person has a cold, given that she or he has a fever, is: P(C1|f) = P(f|C1) P(C1) / P(f) = 0.4 * 0.01 / 0.02 = 0.2. Not convinced that it works? Let us take an example with values to verify. Total population = 1000. Thus, people having a cold = 10. People having both fever and cold = 0.4 * 10 = 4. Thus, people having only cold = 10 - 4 = 6. People having fever (with and without cold) = 0.02 * 1000 = 20. People having fever without cold = 20 - 4 = 16 (may use this later). So, the probability (percentage) of people having a cold along with fever, out of all those having fever, is: 4/20 = 0.2 = 20%. IT WORKS, GREAT!
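The arithmetic above can be checked directly. A minimal sketch, using the values reconstructed from the worked example (P(C1) = 0.01, P(f|C1) = 0.4, P(f) = 0.02), both via Bayes' rule and via the frequency count on a population of 1000:

```python
# Values assumed from the worked cold/fever example on this slide.
p_cold = 0.01              # prior P(C1)
p_fever_given_cold = 0.4   # likelihood P(f|C1)
p_fever = 0.02             # evidence P(f)

# Bayes' rule: posterior P(C1|f)
p_cold_given_fever = p_fever_given_cold * p_cold / p_fever
print(p_cold_given_fever)            # 0.2

# Frequency check on a population of 1000
pop = 1000
cold = p_cold * pop                        # 10 people with a cold
fever_and_cold = p_fever_given_cold * cold # 4 people with both
fever = p_fever * pop                      # 20 people with fever
print(fever_and_cold / fever)        # 0.2, i.e. 20%
```

Both routes give the same 20%, which is exactly the point of the slide's population check.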
19 A Venn diagram, illustrating the two-class, one-feature problem: P(C1 and f) = P(C1) P(f|C1) = 0.01 * 0.4 = 0.004. Probability of a joint event - a sample comes from class C and has the feature value x: P(C and x) = P(C) P(x|C) = P(x) P(C|x).
20 Also verify, for a K-class problem: P(x) = P(w1) P(x|w1) + P(w2) P(x|w2) + ... + P(wK) P(x|wK). Thus: P(w_i|x) = P(x|w_i) P(w_i) / Σ_j P(x|w_j) P(w_j). With our last example: P(f) = P(C1) P(f|C1) + P(C2) P(f|C2) = 0.01 * 0.4 + 0.99 * (16/990) = 0.004 + 0.016 = 0.02. Decision (classification) algorithm according to Bayes' Theorem: choose w1 if p(x|w1) P(w1) > p(x|w2) P(w2); choose w2 if p(x|w2) P(w2) > p(x|w1) P(w1).
21 Errors in decision making: Let d = 1 and take two classes C1, C2 with P(C1) = P(C2) = 1/2, and p(x|C_i) = (1/(√(2π) σ_i)) exp[-(x - μ_i)²/(2σ_i²)]. Bayes decision rule: choose C1 if P(C1|x) > P(C2|x). This gives a threshold α, and hence the two decision regions. Classification error (the shaded region, the minimum of the two curves): P(E) = P(chosen C1, when x belongs to C2) + P(chosen C2, when x belongs to C1) = ∫_{-∞}^{α} P(C2) p(γ|C2) dγ + ∫_{α}^{∞} P(C1) p(γ|C1) dγ.
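The error integral above can be evaluated in closed form for Gaussians via the normal CDF. A small numerical sketch, with illustrative values (μ1 = 0, μ2 = 2, σ = 1; these are not from the slide) and equal priors, where the Bayes threshold is the midpoint of the means:

```python
import math

mu1, mu2, sigma = 0.0, 2.0, 1.0      # assumed illustrative parameters
alpha = (mu1 + mu2) / 2              # Bayes threshold for equal priors/variances

def gauss_cdf(x, mu, s):
    """CDF of N(mu, s^2) via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (s * math.sqrt(2))))

# P(E) = P(C1) P(x > alpha | C1) + P(C2) P(x < alpha | C2)
p_error = 0.5 * (1.0 - gauss_cdf(alpha, mu1, sigma)) + 0.5 * gauss_cdf(alpha, mu2, sigma)
print(round(p_error, 4))   # about 0.1587, i.e. Phi(-1)
```

With a one-standard-deviation gap on each side of the threshold, the minimum error rate is Φ(-1) ≈ 15.9%.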
22 A minimum-distance (NN) supervised classifier. Rule: assign x to R_i, where x is closest to μ_i.
23 An example of 2-D decision regions R1 and R2. An example of 2-D decision regions R1 and R2 with a non-linear decision boundary.
24 Commonly used discriminant functions based on Bayes' decision rule. Decision based on arbitrary posteriors, for an example: Apples vs. Oranges.
25 Some examples of dense distributions of instances, with non-linear decision boundaries.
26 K-means Clustering (unsupervised): Given a fixed number of clusters, assign observations to those clusters so that the means across clusters (for all variables) are as different from each other as possible. Input: the number of clusters c, and a collection of n d-dimensional vectors x_j, j = 1, 2, ..., n. Goal: find the c mean vectors μ_1, μ_2, ..., μ_c. Output: an n×c binary membership matrix U, where u_ij = 1 if x_j ∈ G_i, else 0; G_i, i = 1, 2, ..., c, represent the clusters.
27 If n is the number of known patterns and c the desired number of clusters, the k-means algorithm is: Begin: initialize n, c, and μ_1, μ_2, ..., μ_c (randomly selected); do: classify the n samples according to the nearest μ_i; recompute the μ_i; until no change in the μ_i; return μ_1, μ_2, ..., μ_c; End.
28 Classification Stage: The samples have to be assigned to clusters in order to minimize the cost function, which is: J = Σ_{i=1}^{c} J_i = Σ_{i=1}^{c} Σ_{x_k ∈ G_i} ||x_k - μ_i||². This is the (squared) Euclidean distance of the samples from their cluster center; summed over all clusters, this should be minimum. The classification of a point is done by: u_ij = 1 if ||x_j - μ_i||² ≤ ||x_j - μ_k||² for every k ≠ i; 0 otherwise.
29 Re-computing the Means: The means are recomputed according to: μ_i = (1/|G_i|) Σ_{x_k ∈ G_i} x_k. Disadvantages: what happens when there is overlap between classes, that is, a point is equally close to two cluster centers? The algorithm may not terminate. The terminating condition is therefore modified to: change in the cost function (computed at the end of the classification) is below some threshold, rather than zero.
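The loop of the last three slides, including the threshold-based terminating condition, can be sketched as follows. The sample points are made-up illustrative values, and the means are initialized from the first c samples for determinism (rather than randomly, as on the slide):

```python
def kmeans(xs, c, iters=100, tol=1e-6):
    """Plain k-means on a list of d-dimensional points (lists of floats)."""
    mus = [list(x) for x in xs[:c]]       # initialize from the first c samples
    prev_cost = float("inf")
    cost = prev_cost
    for _ in range(iters):
        # classification stage: assign each sample to its nearest mean
        groups = [[] for _ in range(c)]
        cost = 0.0
        for x in xs:
            d2 = [sum((a - b) ** 2 for a, b in zip(x, m)) for m in mus]
            i = d2.index(min(d2))
            groups[i].append(x)
            cost += d2[i]
        # re-compute the mean of each non-empty cluster
        for i, g in enumerate(groups):
            if g:
                mus[i] = [sum(col) / len(g) for col in zip(*g)]
        # modified terminating condition: change in cost below a threshold
        if prev_cost - cost < tol:
            break
        prev_cost = cost
    return mus, cost

pts = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]]
mus, J = kmeans(pts, 2)
print(sorted(mus))   # two means, one near each point pair
```

Note that the cost J never increases between iterations, which is why the threshold test is a safe stopping rule even when a point is equidistant from two centers.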
30 An Example: the number of clusters is two in this case. But still there is some overlap.
31 Normal Density: p(x) = (1/(√(2π) σ)) exp[-(x - μ)²/(2σ²)]. Bivariate Normal Density: p(x, y) = (1/(2π σ_x σ_y √(1-ρ²))) exp{ -(1/(2(1-ρ²))) [ ((x-μ_x)/σ_x)² - 2ρ((x-μ_x)/σ_x)((y-μ_y)/σ_y) + ((y-μ_y)/σ_y)² ] }, where ρ is the correlation coefficient, σ the SD, and μ the mean. Visualize ρ as equivalent to the orientation of the 2-D Gabor filter. For X a discrete random variable, the expected value of X: E[X] = Σ_i x_i P(x_i); E[X] is also called the first moment of the distribution. The k-th moment is defined as: E[X^k] = Σ_i x_i^k P(x_i), where P(x_i) is the probability of x_i.
32 Multivariate Case: x = [x_1 x_2 ... x_d]^T. Mean vector: μ = E[x] = [μ_1 μ_2 ... μ_d]^T. Covariance matrix (symmetric): Σ = E[(x - μ)(x - μ)^T], with elements σ_ij, i, j = 1, ..., d. The d-dimensional normal density is: p(x) = (1/((2π)^{d/2} det(Σ)^{1/2})) exp[ -(1/2)(x - μ)^T Σ^{-1} (x - μ) ].
33 p(x) = (1/((2π)^{d/2} det(Σ)^{1/2})) exp[ -(1/2)(x - μ)^T Σ^{-1} (x - μ) ], where s_ij is the (i,j)-th component of Σ^{-1}, the inverse of the covariance matrix Σ. Special case, d = 2, with x = (x, y)^T: Σ = [[σ_x², ρσ_xσ_y], [ρσ_xσ_y, σ_y²]]. Can you now obtain from this the bivariate density p(x, y) as given earlier?
34 μ = E[x]; Σ = E[(x - μ)(x - μ)^T]. Contours have constant density where the distance term r² = (x - μ)^T Σ^{-1} (x - μ) is constant: the contours are lines of constant Mahalanobis distance, determined by the matrix Σ, and are quadratic functions. The contours of constant density may also be hyper-ellipsoids (non-diagonal Σ) of constant Mahalanobis distance to μ.
35 Cases: diagonal covariance with σ_x = σ_y, ρ = 0; diagonal covariance with σ_x > σ_y, ρ = 0; non-diagonal covariance with σ_x > σ_y, ρ ≠ 0; and σ_x < σ_y, ρ ≠ 0. Remember asymmetric and oriented Gaussians.
37 Decision Regions and Boundaries: A classifier partitions a feature space into class-labeled decision regions (DRs). If decision regions are used for a possible and unique class assignment, the regions must cover R^d and be disjoint (non-overlapping). In fuzzy theory, decision regions may be overlapping. The border of each decision region is a decision boundary (DB). The typical classification approach is as follows: determine the decision region (in R^d) into which x falls, and assign x to this class. This strategy is simple, but determining the DRs is a challenge. It may not be possible to visualize DRs and DBs in a general classification task with a large number of classes and a higher feature-space dimension.
38 Classifiers are based on discriminant functions. In a C-class case, discriminant functions are denoted by: g_i(x), i = 1, 2, ..., C. This partitions R^d into C distinct (disjoint) regions, and the process of classification is implemented using the decision rule: assign x to class C_m (or region m), where g_m(x) > g_i(x) for all i ≠ m. The decision boundary is defined by the locus of points where g_k(x) = g_l(x), k ≠ l. Minimum-distance (also NN) classifier: the discriminant function is based on the distance to the class mean: g_i(x) = -||x - μ_i||². This does not take into account class PDFs and priors.
39 Remember Bayes': P(w_i|x) = P(x|w_i) P(w_i) / P(x). Consider the discriminant function as g_i(x) = P(x|w_i) P(w_i), and the class-conditional probability as: p(x|w_i) = (1/((2π)^{d/2} det(Σ_i)^{1/2})) exp[ -(1/2)(x - μ_i)^T Σ_i^{-1} (x - μ_i) ]. Many cases arise, due to the varying nature of Σ: diagonal (equal or unequal elements); off-diagonal (+ve or -ve).
40 Let the discrimination function for the i-th class be g_i(x) = P(C_i|x), i = 1, ..., C, and assume P(C_i) = P(C_j) for all i, j. Remember the multivariate Gaussian density? Taking logs: G_i(x) = log[g_i(x)] = -(1/2) d_i² - (1/2) log[det Σ_i] - (d/2) log(2π) + log P(C_i). Define: d_i² = (x - μ_i)^T Σ_i^{-1} (x - μ_i). Thus the classification is now influenced by the squared (hyper-dimensional) distance of x from μ_i, weighted by Σ_i^{-1}. Let us examine this: the quadratic (scalar) term is known as the Mahalanobis distance (the distance from x to μ_i in feature space).
41 d_i² = (x - μ_i)^T Σ_i^{-1} (x - μ_i). For a given x, some G_m is largest where d_m² is the smallest; for that class m, assign x to class m (based on the NN rule). Simplest case: Σ_i = I; the criterion becomes the Euclidean distance (norm), and hence the NN classifier. This is equivalent to obtaining the mean μ_m to which x is the nearest, over all m. The distance function is then: d_i² = (x - μ_i)^T (x - μ_i) = x^T x - 2 μ_i^T x + μ_i^T μ_i. Neglecting the class-invariant term x^T x: G_i(x) = ω_i^T x + ω_{i0}, where ω_i = μ_i and ω_{i0} = -(1/2) μ_i^T μ_i (all in vector notation). This gives the simplest linear discriminant function, or correlation detector.
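The equivalence claimed above (with Σ = I, the nearest-mean rule and the linear discriminant G_i(x) = μ_i·x - ½μ_i·μ_i make identical decisions) is easy to check numerically. A minimal sketch; the two class means are made-up illustrative values:

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

mus = [[0.0, 0.0], [4.0, 4.0]]   # class means mu_1, mu_2 (assumed)

def nearest_mean(x):
    """NN rule: index of the closest class mean (squared Euclidean distance)."""
    d2 = [dot([a - b for a, b in zip(x, m)],
              [a - b for a, b in zip(x, m)]) for m in mus]
    return d2.index(min(d2))

def linear_discriminant(x):
    """Correlation detector: G_i(x) = mu_i . x - 0.5 * mu_i . mu_i."""
    g = [dot(m, x) - 0.5 * dot(m, m) for m in mus]
    return g.index(max(g))

for x in [[1.0, 1.0], [3.0, 3.5], [2.0, 1.9], [2.0, 2.1]]:
    assert nearest_mean(x) == linear_discriminant(x)
print("decisions agree")
```

The x^T x term dropped in the derivation is the same for every class, which is exactly why the two rules always agree.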
42 The perceptron (ANN) built to form the linear discriminant function: O = f(w_1 x_1 + ... + w_d x_d + w_0). View this in 2-D space as the line G: Y = MX + C.
43 Generalized results (Gaussian case) for a discriminant function: G_i(x) = log[g_i(x)] = -(1/2)(x - μ_i)^T Σ_i^{-1} (x - μ_i) - (1/2) log[det Σ_i] - (d/2) log(2π) + log P(C_i). The Mahalanobis-distance quadratic term spawns a number of different surfaces, depending on Σ_i^{-1}. It is basically a vector distance using a Σ^{-1} norm, denoted ||x - μ_i||²_{Σ^{-1}}. The decision-region boundaries are determined by solving G_i(x) = G_j(x), which (in the linear case) gives: (ω_i - ω_j)^T x + (ω_{i0} - ω_{j0}) = 0. This is an expression of a hyperplane separating the decision regions in R^d. The hyperplane will pass through the origin if ω_{i0} = ω_{j0}.
44 Make the case of Bayes' rule more general for class assignment. Earlier we had assumed: g_i(x) = P(C_i|x), i = 1, ..., C, with P(C_i) = P(C_j) for all i, j. Now: G_i(x) = log[g_i(x)] = log[p(x|C_i)] + log[P(C_i)], so G_i(x) = -(1/2)(x - μ_i)^T Σ_i^{-1} (x - μ_i) - (1/2) log[det Σ_i] + log[P(C_i)], neglecting the constant term. Simpler case: Σ_i = σ²I; eliminating the class-independent bias, we have: G_i(x) = -(1/(2σ²)) ||x - μ_i||² + log[P(C_i)]. These are loci of constant hyper-spheres, centered at the class mean. More on this later on.
45 If Σ is a diagonal matrix, with equal/unequal σ_k²: Σ = diag(σ_1², ..., σ_d²) and Σ^{-1} = diag(1/σ_1², ..., 1/σ_d²). Considering the discriminant function: G_i(x) = -(1/2) Σ_k ((x_k - μ_{ik})/σ_k)² - (1/2) log[det Σ] + log[P(C_i)]. This now will yield a weighted-distance classifier. Depending on the covariance term (more spread/scatter or not), we tend to put more emphasis on some feature-vector components than on others. Check out the following: this will give hyper-elliptical surfaces in R^d for each class. It is also possible to linearize it.
46 More general decision boundaries: G_i(x) = -(1/2)(x - μ_i)^T Σ^{-1} (x - μ_i) + log P(C_i). Take P(C_i) = 1/K for all i; eliminating the class-independent terms yields: G_i(x) = -(1/2)(x - μ_i)^T Σ^{-1} (x - μ_i). Expanding (Σ^{-1} is symmetric), and dropping the class-invariant x^T Σ^{-1} x term: G_i(x) = ω_i^T x + ω_{i0}, where ω_i = Σ^{-1} μ_i and ω_{i0} = -(1/2) μ_i^T Σ^{-1} μ_i. Thus the decision surfaces are hyperplanes, and the decision boundaries will also be linear (use G_i(x) = G_j(x), as done earlier). Beyond this, if a diagonal Σ is class-dependent, or off-diagonal terms are non-zero, we get non-linear DFs, DRs or DBs.
47 The discriminant function (DF) for linearly separable classes is: g_i(x) = ω_i^T x + ω_{i0}, where ω_i is a d×1 vector of weights used for class i. This function leads to DBs that are hyperplanes: a point in 1-D, a line in 2-D, a planar surface in 3-D. In the 3-D case: ω_1 x_1 + ω_2 x_2 + ω_3 x_3 = 0 is a plane passing through the origin. In general, the equation ω^T x = d represents a plane H passing through any point (position vector) x_0 with ω^T x_0 = d. This plane partitions the space into two mutually exclusive regions, say R_p and R_n. The assignment of the vector x to either the +ve side (R_p), the -ve side (R_n), or along H, can be implemented by: ω^T x - d > 0 if x ∈ R_p; < 0 if x ∈ R_n; = 0 if x ∈ H.
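The three-way sign test at the end of the slide is a one-liner in code. A small sketch; the weight vector w and offset d below are illustrative values, not taken from the slides:

```python
def side_of_plane(w, d, x):
    """Which side of the hyperplane H: w.x = d the point x falls on."""
    s = sum(wi * xi for wi, xi in zip(w, x)) - d
    if s > 0:
        return "R_p"     # positive side
    if s < 0:
        return "R_n"     # negative side
    return "H"           # on the hyperplane itself

w, d = [1.0, 2.0, 3.0], 6.0   # assumed plane: x1 + 2 x2 + 3 x3 = 6
print(side_of_plane(w, d, [1.0, 1.0, 1.0]))  # 1+2+3-6 = 0  -> "H"
print(side_of_plane(w, d, [2.0, 2.0, 2.0]))  # 12-6 > 0     -> "R_p"
print(side_of_plane(w, d, [0.0, 0.0, 0.0]))  # -6 < 0       -> "R_n"
```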
51 A relook at the Linear Discriminant Function g(x): g(x) = ω^T x - d. The orientation of H is determined by ω; the location of H is determined by d. H is a hyperplane for d > 3 dimensions; the figure shows a 2-D representation, with the +ve side R_p and the -ve side R_n in pattern/feature space. The complementary role of a sample in parametric space: each sample x defines a hyperplane H in weight space.
52 Example: samples x1 = [1, 2] and x2 = [3, 4]. In weight space (w1, w2), each sample x_i defines a hyperplane H_i: ω^T x_i = 0; the figure shows H1 and H2 for the two samples.
53 x1 = [1, 2], x2 = [3, 4]: the intersection of the half-spaces in which both samples are correctly classified (g > 0 for one, g < 0 for the other) forms the SOLUTION SPACE for the weight vector.
54 LMS learning law (in BPNN or FFNN models): O = f(Σ_i w_i x_i + w_0). Read about the perceptron vs. the multi-layer feedforward network. Perceptron update: w(k+1) = w(k) + η x(k), if x(k) ∈ C1 and w(k)^T x(k) ≤ 0; w(k+1) = w(k) - η x(k), if x(k) ∈ C2 and w(k)^T x(k) ≥ 0; η is the learning-rate parameter.
55 x1 = [1, 2], x2 = [3, 4]: w(k+1) = w(k) + η(k) x(k) if x(k) ∈ C1 and w(k)^T x(k) ≤ 0; w(k+1) = w(k) - η(k) x(k) if x(k) ∈ C2 and w(k)^T x(k) ≥ 0; η(k) decreases with each iteration k.
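The two-branch update rule above collapses into a single line if C1 samples are labeled y = +1 and C2 samples y = -1. A minimal training sketch on a made-up linearly separable set (not the slide's x1, x2), with a constant η and an extra input component fixed at 1.0 to absorb the bias w0:

```python
# Rule: w <- w + eta * y * x  whenever  y * (w . x) <= 0  (misclassified).
data = [([1.0, 2.0, 1.0], +1),     # class C1; last component absorbs the bias
        ([2.0, 3.0, 1.0], +1),
        ([-1.0, -1.5, 1.0], -1),   # class C2
        ([-2.0, -1.0, 1.0], -1)]

w = [0.0, 0.0, 0.0]
eta = 1.0
for _ in range(100):                           # epochs
    errors = 0
    for x, y in data:
        if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
            errors += 1
    if errors == 0:                            # converged: all samples correct
        break

print(all(y * sum(wi * xi for wi, xi in zip(w, x)) > 0 for x, y in data))  # True
```

For separable data the perceptron convergence theorem guarantees this loop stops after finitely many updates; a decreasing η(k), as on the slide, is one common refinement.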
56 In the case of an FFNN, the objective is to minimize the error term: e = Σ_s (d_s - ŷ_s)², over the samples s, where d_s is the desired output and ŷ_s the network output. Learning algorithm (LMS): Δw(k) = η e x + α Δw(k-1), with α weighting the momentum term.
58 Let's look at Bishop, Chap. 5; start at Sec. 4.7, pp. 9.
59 MSE error surface in the case of a multi-layer perceptron: ξ = E[e²] = E[d²] - 2 P^T w + w^T R w, where P = E[d_n x_n] and R = E[x_n x_n^T]. The gradient is ∇ξ = (∂ξ/∂w_1, ..., ∂ξ/∂w_n); setting ∇ξ = 0 gives R ŵ = P, thus ŵ = R^{-1} P.
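Since ξ(w) is quadratic in w, its minimum solves the normal equations R ŵ = P directly. A tiny sketch with made-up R and P (a 2×2 system solved by Cramer's rule), verifying the normal equations at the solution:

```python
R = [[2.0, 0.5],
     [0.5, 1.0]]          # input correlation matrix E[x x^T] (assumed)
P = [1.0, 0.5]            # cross-correlation E[d x] (assumed)

# Solve R w* = P by Cramer's rule (2x2 case)
det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
w_star = [(P[0] * R[1][1] - P[1] * R[0][1]) / det,
          (R[0][0] * P[1] - R[1][0] * P[0]) / det]

# Verify the normal equations: R w* should reproduce P
check = [R[0][0] * w_star[0] + R[0][1] * w_star[1],
         R[1][0] * w_star[0] + R[1][1] * w_star[1]]
print(w_star, check)
```

This closed-form ŵ is the target that iterative LMS/backprop updates approach on the quadratic error surface.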
60 Effect of class priors - revisiting DBs in a more general case: P(w_i|x) = P(x|w_i) P(w_i) / P(x), with p(x|w_i) = (1/((2π)^{d/2} det(Σ)^{1/2})) exp[ -(1/2)(x - μ_i)^T Σ^{-1} (x - μ_i) ]. CASE A: the same diagonal Σ = σ²I for all classes, with identical diagonal elements: g_i(x) = -||x - μ_i||²/(2σ²) + ln[P(w_i)]. Canceling the class-invariant terms: g_i(x) = (1/σ²)[μ_i^T x - (1/2) μ_i^T μ_i] + ln[P(w_i)].
61 Thus g_i(x) = ω_i^T x + ω_{i0}, where ω_i = μ_i/σ² and ω_{i0} = -μ_i^T μ_i/(2σ²) + ln[P(w_i)]. The linear DB between classes k and l is g_k(x) = g_l(x), which is: (ω_k - ω_l)^T x + (ω_{k0} - ω_{l0}) = 0. Prove that, with the 2nd (constant) term absorbed, this can be written as: w^T (x - x_0) = 0, where w = μ_k - μ_l and x_0 = (1/2)(μ_k + μ_l) - [σ² ln(P(w_k)/P(w_l)) / ||μ_k - μ_l||²] (μ_k - μ_l). Nothing new, seen earlier.
62 CASE A, same diagonal Σ with identical diagonal elements (contd.): the linear DB is w^T (x - x_0) = 0, where w = μ_k - μ_l and x_0 = (1/2)(μ_k + μ_l) - [σ² ln(P(w_k)/P(w_l)) / ||μ_k - μ_l||²] (μ_k - μ_l).
65 CASE B: arbitrary Σ, but identical for all classes: g_i(x) = -(1/2)(x - μ_i)^T Σ^{-1} (x - μ_i) + ln[P(w_i)]. Expanding, and removing the class-invariant quadratic term x^T Σ^{-1} x: g_i(x) = ω_i^T x + ω_{i0}, where ω_i = Σ^{-1} μ_i and ω_{i0} = -(1/2) μ_i^T Σ^{-1} μ_i + ln[P(w_i)]. The linear DB, g_k(x) = g_l(x), is thus: w^T (x - x_0) = 0, where w = Σ^{-1}(μ_k - μ_l) and x_0 = (1/2)(μ_k + μ_l) - [ln(P(w_k)/P(w_l)) / ((μ_k - μ_l)^T Σ^{-1} (μ_k - μ_l))] (μ_k - μ_l). Prove it.
66 Thus the linear DB is w^T (x - x_0) = 0, where w = Σ^{-1}(μ_k - μ_l). The normal to the DB, w, is thus the transformed line joining the two means; the transformation matrix is the symmetric Σ^{-1}. The DB is thus tilted: w is a rotated version of the vector joining the two means. Let Σ = D be diagonal, with non-identical diagonal elements σ_1² ≠ σ_2² (2-d case); then D^{-1} = diag(1/σ_1², 1/σ_2²), and the direction of w becomes [(μ_{k1} - μ_{l1})/σ_1², (μ_{k2} - μ_{l2})/σ_2²], no longer parallel to μ_k - μ_l.
67 Thus the linear DB is w^T (x - x_0) = 0, with ω = Σ^{-1}(μ_k - μ_l). Special case: let Σ = D be arbitrary diagonal, with elements σ_1², σ_2²; then D^{-1} = diag(1/σ_1², 1/σ_2²). Solve for w in this case, and compare with the diagonal-Σ case with identical elements.
68 (figures) Diagonal Σ in all cases, with increasing and decreasing variances.
69 (figures) The diagonal elements in Σ are both equal, in all cases.
71 Point P is actually closer (in the Euclidean sense) to the mean for the Orange class; yet the discriminant function evaluated at P is smaller for class 'apple' than it is for class 'orange'.
72 CASE C: arbitrary Σ_i; all parameters are class-dependent: g_i(x) = -(1/2)(x - μ_i)^T Σ_i^{-1} (x - μ_i) - (1/2) ln[det Σ_i] + ln[P(w_i)]. Thus: g_i(x) = x^T W_i x + ω_i^T x + ω_{i0}, where W_i = -(1/2) Σ_i^{-1}; ω_i = Σ_i^{-1} μ_i; and ω_{i0} = -(1/2) μ_i^T Σ_i^{-1} μ_i - (1/2) ln[det Σ_i] + ln[P(w_i)]. The DBs, g_k(x) = g_l(x), k ≠ l, and the DFs are hyper-quadrics. We shall first look into a few cases of such surfaces next.
73 Example [Duda, Hart]: μ_1 = [3, 6]^T, Σ_1 = diag(1/2, 2); μ_2 = [3, -2]^T, Σ_2 = diag(2, 2). Assume P(w_1) = P(w_2) = 0.5. Draw and visualize qualitatively the iso-contours; get the expression of the DB.
74 Quadratic Decision Boundaries: In R^d with x = (x_1, x_2, ..., x_d), consider the equation: Σ_i w_ii x_i² + Σ_i Σ_{j≠i} w_ij x_i x_j + Σ_i w_i x_i + w_0 = 0. The above equation is defined by a quadric discriminant function, which yields a quadric surface. If d = 2, the equation becomes: w_11 x_1² + w_12 x_1 x_2 + w_22 x_2² + w_1 x_1 + w_2 x_2 + w_0 = 0.
75 Special cases of the equation w_11 x_1² + w_12 x_1 x_2 + w_22 x_2² + w_1 x_1 + w_2 x_2 + w_0 = 0. Case 1: w_11 = w_12 = w_22 = 0; the equation defines a line. Case 2: w_11 = w_22 = K, w_12 = 0; defines a circle. Case 3: in addition, w_1 = w_2 = 0; defines a circle whose center is at the origin. Case 4: w_11 = w_22 = 0, w_12 ≠ 0; defines a bilinear constraint. Case 5: w_12 = w_22 = w_2 = 0; defines a parabola with a specific orientation. Case 6: w_11 ≠ w_22 (same sign), w_12 = 0; defines a simple ellipse. Selecting suitable values of the w's gives other conic sections; hyperbolic?? For d ≥ 3, we define a family of hyper-surfaces in R^d.
76 In the equation Σ_i w_ii x_i² + Σ_i Σ_{j≠i} w_ij x_i x_j + Σ_i w_i x_i + ω_0 = 0, the total number of parameters is: d + d(d-1)/2 + d + 1 = (d+1)(d+2)/2. Organize these parameters, and manipulate the equation to obtain: x^T W x + w^T x + ω_0 = 0 (Eq. 3), where w has d terms, ω_0 has one term, and W is a d×d matrix with W_ii = w_ii and W_ij = w_ij/2 for i ≠ j; the d² - d non-diagonal terms of the matrix are obtained by duplicating (splitting into two parts) the d(d-1)/2 w_ij's. In equation 3, the symmetric matrix W contributes the quadratic terms. Equation 3 generally defines a hyper-hyperboloidal surface; if W ∝ I, we get hyper-spheres/planes.
77 Example of linearization: g(x) = w_1 + w_2 x + w_3 x². Linearize by letting y = [1, x, x²]^T; then g = a^T y, where a = [w_1, w_2, w_3]^T, and the quadratic DF in x becomes a linear DF in y.
78 CASE C (contd.): arbitrary Σ_i, all parameters class-dependent: g_i(x) = -(1/2)(x - μ_i)^T Σ_i^{-1} (x - μ_i) - (1/2) ln[det Σ_i] + ln[P(w_i)] = x^T W_i x + ω_i^T x + ω_{i0}, with W_i = -(1/2) Σ_i^{-1}, ω_i = Σ_i^{-1} μ_i, and ω_{i0} = -(1/2) μ_i^T Σ_i^{-1} μ_i - (1/2) ln[det Σ_i] + ln[P(w_i)]; the DB is g_k(x) = g_l(x), k ≠ l.
80 (figures) Iso-density contours and decision boundaries for ρ = 0, with various combinations of σ_x and σ_y.
81 (figures) Further such cases; here the boundaries are of the form y = ±C.
82 Read about GMMs, and estimation using MLE or EM methods.
83 Kullback-Leibler divergence: The directed Kullback-Leibler divergence between P (the 'true' distribution) and Q (the 'approximating' distribution) is given by: D_KL(P||Q) = Σ_x p(x) log[p(x)/q(x)] = -Σ_x p(x) log q(x) + Σ_x p(x) log p(x) = cross_entropy(P, Q) - entropy(P), where entropy(P) = H(p) = -Σ_x p(x) log p(x).
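The identity D_KL = cross-entropy minus entropy, and the asymmetry of D_KL, can both be checked on a small discrete example (the two distributions below are made-up values):

```python
import math

def kl_divergence(p, q):
    """Directed KL divergence D(p||q) = sum p log(p/q), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]    # 'true' distribution (assumed)
q = [0.9, 0.1]    # 'approximating' distribution (assumed)

print(round(kl_divergence(p, q), 4))                                  # about 0.5108
print(abs(kl_divergence(p, q) - (cross_entropy(p, q) - entropy(p))) < 1e-12)
print(kl_divergence(p, q) != kl_divergence(q, p))   # directed: not symmetric
```

The `if pi > 0` guard implements the usual convention 0·log 0 = 0.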
85 Bregman divergence: D_BG(p, q) = F(p) - F(q) - ⟨∇F(q), p - q⟩. The Bregman distance associated with F for points P, Q is the difference between the value of F at point P and the value of the first-order Taylor expansion of F around point Q, evaluated at point P. F is a continuously-differentiable, real-valued and strictly convex function defined on a closed convex set. Jensen-Shannon divergence: JS(P, Q) = (1/2) D_KL(P||M) + (1/2) D_KL(Q||M), where M = (P + Q)/2. Related topics: Deviance information criterion; Bayesian information criterion; Quantum relative entropy; Information gain in decision trees; Solomon Kullback and Richard Leibler; Information theory and measure theory; Entropy power inequality; Information gain ratio; F-divergence.
86 Principal Component Analysis (Eigen analysis, Karhunen-Loeve transform): Eigenvectors are derived from the eigen-decomposition of the scatter matrix. A projection set that best explains the distribution of the representative features of an object of interest. PCA techniques choose a dimensionality-reducing linear projection that maximizes the scatter of all projected samples.
87 Principal Component Analysis (Contd.): Let us consider a set of N sample images {x_1, x_2, ..., x_N} taking values in an n-dimensional image space. Each image belongs to one of c classes {X_1, X_2, ..., X_c}. Let us consider a linear transformation mapping the original n-dimensional image space to an m-dimensional feature space, where m < n. The new feature vectors y_k ∈ R^m are defined by the linear transformation: y_k = W^T x_k, k = 1, 2, ..., N, where W ∈ R^{n×m} is a matrix with orthogonal columns representing the basis of the feature space.
88 Principal Component Analysis (Contd.): The total scatter matrix S_T is defined as: S_T = Σ_{k=1}^{N} (x_k - μ)(x_k - μ)^T, where N is the number of samples and μ ∈ R^n is the mean image of all samples. The scatter of the transformed feature vectors {y_1, y_2, ..., y_N} is W^T S_T W. In PCA, W_opt is chosen to maximize the determinant of the total scatter matrix of the projected samples, i.e., W_opt = argmax_W |W^T S_T W| = [w_1 w_2 ... w_m], where {w_i | i = 1, ..., m} is the set of n-dimensional eigenvectors of S_T corresponding to the m largest eigenvalues (check the proof).
89 Principal Component Analysis (Contd.): Eigenvectors are called eigen images/pictures and also basis images (a facial basis, for faces). Any data (say, a face) can be reconstructed approximately as a weighted sum of a small collection of images that define the basis (eigen images) plus a mean image of the face. Data form a scatter in the feature space through the projection set (eigenvector set). Features (eigenvectors) are extracted from the training set without prior class information: unsupervised learning.
90 Demonstration of the KL Transform: first eigenvector; second eigenvector.
91 Another one.
92 Another Example (Source: SQUID Homepage).
93 Principal components analysis (PCA) is a technique used to reduce multi-dimensional data sets to lower dimensions for analysis. The applications include exploratory data analysis and generating predictive models. PCA involves the computation of the eigenvalue decomposition (or singular value decomposition) of a data set, usually after mean-centering the data for each attribute. PCA is mathematically defined as an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by any projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on. PCA can be used for dimensionality reduction in a data set by retaining those characteristics of the data set that contribute most to its variance - keeping lower-order principal components and ignoring higher-order ones. Such low-order components often contain the "most important" aspects of the data. But this is not necessarily the case, depending on the application.
94 For a data matrix X with zero empirical mean (the empirical mean of the distribution has been subtracted from the data set), where each column is made up of results for a different subject and each row the results from a different probe, the PCA of our data matrix is given by: Y = W^T X = ΣV^T, where X = WΣV^T is the singular value decomposition (SVD) of X. Goal of PCA: find some orthonormal matrix W, where Y = W^T X, such that COV(Y) = (1/n) Y Y^T is diagonalized. The columns of W are the principal components of X, which are also the eigenvectors of COV(X). Unlike other linear transforms (DCT, DFT, DWT, etc.), PCA does not have a fixed set of basis vectors; its basis vectors depend on the data set.
95 SVD (the theorem): Suppose M is an m-by-n matrix whose entries come from the field K, which is either the field of real numbers or the field of complex numbers. Then there exists a factorization of the form M = UΣV*, where U is an m-by-m unitary matrix over K, the matrix Σ is m-by-n with non-negative numbers on the diagonal and zeros off the diagonal, and V* denotes the conjugate transpose of V, an n-by-n unitary matrix over K. Such a factorization is called a singular-value decomposition of M. The matrix V thus contains a set of orthonormal "input" or "analysing" basis vector directions for M. The matrix U contains a set of orthonormal "output" basis vector directions for M. The matrix Σ contains the singular values, which can be thought of as scalar "gain controls" by which each corresponding input is multiplied to give a corresponding output. A common convention is to order the values Σ_ii in non-increasing fashion; in this case, the diagonal matrix Σ is uniquely determined by M (though the matrices U and V are not). For p = min(m, n), the reduced form has U m-by-p, Σ p-by-p, and V n-by-p.
96 The Karhunen-Loève transform is therefore equivalent to finding the singular value decomposition of the data matrix X, X = WΣV^T, and then obtaining the reduced-space data matrix Y by projecting X down into the reduced space defined by only the first L singular vectors W_L: Y = W_L^T X = Σ_L V_L^T. The matrix W of singular vectors of X is equivalently the matrix of eigenvectors of the matrix of observed covariances (find out?): COV(X) ∝ X X^T = W ΣΣ^T W^T. The eigenvectors with the largest eigenvalues correspond to the dimensions that have the strongest correlation in the data set. PCA is equivalent to empirical orthogonal functions (EOF). PCA is a popular technique in pattern recognition, but it is not optimized for class separability; an alternative is linear discriminant analysis, which does take this into account. PCA optimally minimizes reconstruction error under the L2 norm.
97 PCA by the COVARIANCE Method: We need to find a d×d orthonormal transformation matrix P, such that Y = P X, with the constraints that COV(Y) is a diagonal matrix D, and P^{-1} = P^T. Then: COV(Y) = E[Y Y^T] = E[(P X)(P X)^T] = P COV(X) P^T = D, and hence COV(X) P^T = P^T D. Can you derive from the above that: [λ_1 p_1, λ_2 p_2, ..., λ_d p_d] = [COV(X) p_1, COV(X) p_2, ..., COV(X) p_d], i.e., that the rows p_i of P are the eigenvectors of COV(X)?
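The covariance method is short enough to work end-to-end by hand in 2-D, where the eigen-decomposition of the symmetric 2×2 covariance matrix has a closed form. The sample values below are made-up illustrative data:

```python
import math

xs = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
      (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]

n = len(xs)
mx = sum(x for x, _ in xs) / n
my = sum(y for _, y in xs) / n

# Sample covariance matrix entries (divisor n - 1)
cxx = sum((x - mx) ** 2 for x, _ in xs) / (n - 1)
cyy = sum((y - my) ** 2 for _, y in xs) / (n - 1)
cxy = sum((x - mx) * (y - my) for x, y in xs) / (n - 1)

# Eigenvalues of [[cxx, cxy], [cxy, cyy]] via trace/determinant
tr, det = cxx + cyy, cxx * cyy - cxy * cxy
disc = math.sqrt(tr * tr / 4 - det)
lam1, lam2 = tr / 2 + disc, tr / 2 - disc     # lam1 >= lam2

# First principal component: eigenvector for lam1 (assumes cxy != 0)
v1 = (cxy, lam1 - cxx)
norm = math.hypot(*v1)
v1 = (v1[0] / norm, v1[1] / norm)
print(round(lam1, 4), round(lam2, 4), v1)
```

The unit vector v1 is the first row of the transformation P; projecting the mean-subtracted data onto it gives the first principal component scores.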
100 Example of PCA. Samples: x_1, x_2, x_3 - a 2-D problem, with N = 3. Each column is an observation (sample) and each row a variable (dimension). Method 1 (easiest): mean of the samples: μ = (1/N) Σ_k x_k; mean-subtracted samples: x̃_k = x_k - μ; COVAR = (1/(N-1)) Σ_k x̃_k x̃_k^T.
101 Method 2 (PCA definition): S = (1/N) Σ_k x̃_k x̃_k^T = (1/N) X̃ X̃^T, so S = ((N-1)/N) COVAR. Next do SVD, to get the vectors.
102 For a face image set with N samples and dimension d = w*h (very large), we have: the array X̃ (x - avg) of size d×N (N vertical samples stacked horizontally). Thus X̃ X̃^T will be d×d, which will be very large. Performing eigen-analysis on such a large dimension is time consuming and may be erroneous. Thus often X̃^T X̃, of dimension N×N, is considered for eigen-analysis. Will it result in the same, after SVD? Let's check: S = X̃ X̃^T / N; S_m = X̃^T X̃ / N. Let's do SVD of both:
103 S = X̃ X̃^T = U Λ U^T; S_m = X̃^T X̃ = V Λ_m V^T. The non-zero eigenvalues of S and S_m coincide, and the eigenvectors are related through X̃: if S_m v = λ v, then S (X̃ v) = λ (X̃ v), so U is obtained (up to normalization) from X̃ V.
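The "small matrix" trick stated above can be verified numerically: if v is an eigenvector of A^T A (N×N), then A v is an eigenvector of A A^T (d×d) with the same eigenvalue. A sketch on a tiny made-up 3×2 matrix A (d = 3 "pixels", N = 2 samples), using power iteration to extract the top eigen-pair of each side:

```python
def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

A = [[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]]   # d x N mean-subtracted data (assumed)
small = matmul(transpose(A), A)            # N x N = A^T A
big = matmul(A, transpose(A))              # d x d = A A^T

def top_eigen(M, iters=200):
    """Dominant eigen-pair by power iteration (Rayleigh quotient for lambda)."""
    v = [1.0] * len(M)
    for _ in range(iters):
        w = matvec(M, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    lam = sum(vi * wi for vi, wi in zip(v, matvec(M, v)))
    return lam, v

lam_small, v_small = top_eigen(small)
lam_big, _ = top_eigen(big)
print(abs(lam_small - lam_big) < 1e-9)     # same top eigenvalue

# A v_small is (up to scale) the corresponding eigenvector of A A^T
u = matvec(A, v_small)
print(all(abs(a - lam_small * b) < 1e-6 for a, b in zip(matvec(big, u), u)))
```

For eigenfaces this means an N×N eigen-problem replaces a (w·h)×(w·h) one, at the cost of one extra multiplication by X̃.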
104 Example, where d ≠ N. Samples: x_1, ..., x_6 - a 2-D problem (d = 2), with N = 6. Each column is an observation (sample) and each row a variable (dimension). Mean of the samples: μ; COVAR = M M^T / (N-1), where M is the mean-subtracted data matrix; compare with M^T M.
105 COVAR ∝ M M^T = U S V^T; M^T M = U' S' V'^T; U = ?? (relate U and U' through M, as before).
106 Scatter Matrices and Separability criteria: Scatter matrices are used to formulate criteria of class separability. Within-class scatter matrix: it shows the scatter of samples around their respective class expected vectors: S_W = Σ_{i=1}^{c} Σ_{x_k ∈ X_i} (x_k - μ_i)(x_k - μ_i)^T. Between-class scatter matrix: it is the scatter of the expected vectors around the mixture mean: S_B = Σ_{i=1}^{c} N_i (μ_i - μ)(μ_i - μ)^T, where μ is the mixture mean.
107 Scatter Matrices and Separability criteria: Mixture scatter matrix: it is the covariance matrix of all samples regardless of their class assignments: S_M = Σ_{k=1}^{N} (x_k - μ)(x_k - μ)^T = S_W + S_B. The criteria formulation for class separability needs to convert these matrices into a number; this number should be larger when the between-class scatter is larger or the within-class scatter is smaller. Several criteria (with S_1, S_2 a suitable pair chosen from {S_B, S_W, S_M}) are: J_1 = tr(S_2^{-1} S_1); J_2 = tr S_1 / tr S_2; J_3 = ln|S_2^{-1} S_1| = ln|S_1| - ln|S_2|; J_4 = tr S_B / tr S_M.
108 Linear Discriminant Analysis: The learning set is labeled (supervised learning). It is a class-specific method in the sense that it tries to shape the scatter in order to make it more reliable for classification. Select W to maximize the ratio of the between-class scatter and the within-class scatter. The between-class scatter matrix is defined by: S_B = Σ_{i=1}^{c} N_i (μ_i - μ)(μ_i - μ)^T, where μ_i is the mean of class X_i and N_i is the number of samples in class X_i. The within-class scatter matrix is: S_W = Σ_{i=1}^{c} Σ_{x_k ∈ X_i} (x_k - μ_i)(x_k - μ_i)^T.
109 Linear Discriminant Analysis: If S_W is nonsingular, W_opt is chosen to satisfy: W_opt = argmax_W |W^T S_B W| / |W^T S_W W| = [w_1, w_2, ..., w_m], where {w_i | i = 1, 2, ..., m} is the set of generalized eigenvectors of S_B and S_W corresponding to the m largest generalized eigenvalues: S_B w_i = λ_i S_W w_i. There are at most c-1 non-zero generalized eigenvalues, so the upper bound on m is c-1.
110 Linear Discriminant Analysis: S_W is singular most of the time (its rank is at most N-c). Solution: use an alternative criterion - project the samples to a lower-dimensional space. Use PCA to reduce the dimension of the feature space to N-c, then apply standard FLD to reduce the dimension to c-1. W_opt is given by W_opt^T = W_fld^T W_pca^T, where W_pca = argmax_W |W^T S_T W| and W_fld = argmax_W |W^T W_pca^T S_B W_pca W| / |W^T W_pca^T S_W W_pca W|.
111 Demonstration for LDA
115 Hand-worked EXAMPLE: Data points and their class labels. Let's try PCA first: the overall data mean; the COVAR of the mean-subtracted data; the eigenvalues after SVD of the above; and finally, the eigenvectors (worked numerically on the slide).
116 Same EXAMPLE for LDA: Data points and their class labels. Compute S_W and S_B, then INV(S_W) S_B, and perform eigen-decomposition on the above: the eigenvalues of S_W^{-1} S_B and the corresponding eigenvectors (worked numerically on the slide).
117 (contd.) S_W, S_B, the eigenvalues of S_W^{-1} S_B, and the eigenvectors, for two further configurations of the data (worked numerically on the slide).
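The hand-worked steps above can be sketched in code on a small two-class 2-D data set; the points below are made-up illustrative values, not the slide's. For C = 2 classes, S_W^{-1} S_B has rank 1, so the single LDA direction is simply w ∝ S_W^{-1} (μ_1 - μ_2):

```python
c1 = [(4.0, 2.0), (2.0, 4.0), (2.0, 3.0), (3.0, 6.0), (4.0, 4.0)]
c2 = [(9.0, 10.0), (6.0, 8.0), (9.0, 5.0), (8.0, 7.0), (10.0, 8.0)]

def mean(pts):
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)

def scatter(pts, m):
    """Per-class scatter matrix sum (x - m)(x - m)^T as a 2x2 list."""
    sxx = sum((x - m[0]) ** 2 for x, _ in pts)
    syy = sum((y - m[1]) ** 2 for _, y in pts)
    sxy = sum((x - m[0]) * (y - m[1]) for x, y in pts)
    return [[sxx, sxy], [sxy, syy]]

m1, m2 = mean(c1), mean(c2)
S1, S2 = scatter(c1, m1), scatter(c2, m2)
Sw = [[S1[i][j] + S2[i][j] for j in range(2)] for i in range(2)]

# Two-class shortcut: w ~ inv(S_w) (m1 - m2)
det = Sw[0][0] * Sw[1][1] - Sw[0][1] * Sw[1][0]
inv_Sw = [[Sw[1][1] / det, -Sw[0][1] / det], [-Sw[1][0] / det, Sw[0][0] / det]]
dm = (m1[0] - m2[0], m1[1] - m2[1])
w = (inv_Sw[0][0] * dm[0] + inv_Sw[0][1] * dm[1],
     inv_Sw[1][0] * dm[0] + inv_Sw[1][1] * dm[1])

# Projections of the two classes onto w should separate cleanly
proj = lambda p: w[0] * p[0] + w[1] * p[1]
print(min(proj(p) for p in c1) > max(proj(p) for p in c2))  # True
```

The same direction is the leading eigenvector of S_W^{-1} S_B, so this shortcut matches the eigen-decomposition route used on the slide.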
118 After linear projection, using LDA:
119 Same EXAMPLE for LDA, with C = 3: Data points and their class labels. Compute S_W, INV(S_W) S_B, and S_B; perform eigen-decomposition on the above: the eigenvalues of S_W^{-1} S_B and the eigenvectors (worked numerically on the slide).
120 Data projected along the 1st eigenvector; data projected along the 2nd eigenvector. Hence, one may need ICA.
121 Some of the latest advancements in pattern recognition technology deal with: Neuro-fuzzy (soft computing) concepts; Multi-classifier combination (decision and feature fusion); Reinforcement learning; Learning from small data sets; Generalization capabilities; Evolutionary computation (genetic algorithms); Pervasive computing; Neural dynamics; Support vector machines (kernel methods); Modern ML methods (semi-supervised, transfer learning, domain adaptation); Manifold-based learning, deep learning, MKL, ...
122 REFERENCES: Statistical Pattern Recognition, Fukunaga, Academic Press; Bishop (Pattern Recognition); Satish Kumar (ANN).
More informationChat eld, C. and A.J.Collins, Introduction to multivariate analysis. Chapman & Hall, 1980
MT07: Multvarate Statstcal Methods Mke Tso: emal mke.tso@manchester.ac.uk Webpage for notes: http://www.maths.manchester.ac.uk/~mkt/new_teachng.htm. Introducton to multvarate data. Books Chat eld, C. and
More informationUnified Subspace Analysis for Face Recognition
Unfed Subspace Analyss for Face Recognton Xaogang Wang and Xaoou Tang Department of Informaton Engneerng The Chnese Unversty of Hong Kong Shatn, Hong Kong {xgwang, xtang}@e.cuhk.edu.hk Abstract PCA, LDA
More informationMaximum Likelihood Estimation (MLE)
Maxmum Lkelhood Estmaton (MLE) Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 175A Wnter 01 UCSD Statstcal Learnng Goal: Gven a relatonshp between a feature vector x and a vector y, and d data samples (x,y
More information3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X
Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number
More informationRegularized Discriminant Analysis for Face Recognition
1 Regularzed Dscrmnant Analyss for Face Recognton Itz Pma, Mayer Aladem Department of Electrcal and Computer Engneerng, Ben-Guron Unversty of the Negev P.O.Box 653, Beer-Sheva, 845, Israel. Abstract Ths
More information2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification
E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton
More informationCIS526: Machine Learning Lecture 3 (Sept 16, 2003) Linear Regression. Preparation help: Xiaoying Huang. x 1 θ 1 output... θ M x M
CIS56: achne Learnng Lecture 3 (Sept 6, 003) Preparaton help: Xaoyng Huang Lnear Regresson Lnear regresson can be represented by a functonal form: f(; θ) = θ 0 0 +θ + + θ = θ = 0 ote: 0 s a dummy attrbute
More information10-701/ Machine Learning, Fall 2005 Homework 3
10-701/15-781 Machne Learnng, Fall 2005 Homework 3 Out: 10/20/05 Due: begnnng of the class 11/01/05 Instructons Contact questons-10701@autonlaborg for queston Problem 1 Regresson and Cross-valdaton [40
More informationMixture o f of Gaussian Gaussian clustering Nov
Mture of Gaussan clusterng Nov 11 2009 Soft vs hard lusterng Kmeans performs Hard clusterng: Data pont s determnstcally assgned to one and only one cluster But n realty clusters may overlap Soft-clusterng:
More informationImage classification. Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing i them?
Image classfcaton Gven te bag-of-features representatons of mages from dfferent classes ow do we learn a model for dstngusng tem? Classfers Learn a decson rule assgnng bag-offeatures representatons of
More informationFeb 14: Spatial analysis of data fields
Feb 4: Spatal analyss of data felds Mappng rregularly sampled data onto a regular grd Many analyss technques for geophyscal data requre the data be located at regular ntervals n space and/or tme. hs s
More informationINF 5860 Machine learning for image classification. Lecture 3 : Image classification and regression part II Anne Solberg January 31, 2018
INF 5860 Machne learnng for mage classfcaton Lecture 3 : Image classfcaton and regresson part II Anne Solberg January 3, 08 Today s topcs Multclass logstc regresson and softma Regularzaton Image classfcaton
More informationC4B Machine Learning Answers II. = σ(z) (1 σ(z)) 1 1 e z. e z = σ(1 σ) (1 + e z )
C4B Machne Learnng Answers II.(a) Show that for the logstc sgmod functon dσ(z) dz = σ(z) ( σ(z)) A. Zsserman, Hlary Term 20 Start from the defnton of σ(z) Note that Then σ(z) = σ = dσ(z) dz = + e z e z
More informationMachine Learning for Signal Processing Linear Gaussian Models
Machne Learnng for Sgnal rocessng Lnear Gaussan Models lass 2. 2 Nov 203 Instructor: Bhsha Raj 2 Nov 203 755/8797 HW3 s up. Admnstrva rojects please send us an update 2 Nov 203 755/8797 2 Recap: MA stmators
More informationWhich Separator? Spring 1
Whch Separator? 6.034 - Sprng 1 Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng 3 Margn of a pont " # y (w $ + b) proportonal
More informationMACHINE APPLIED MACHINE LEARNING LEARNING. Gaussian Mixture Regression
11 MACHINE APPLIED MACHINE LEARNING LEARNING MACHINE LEARNING Gaussan Mture Regresson 22 MACHINE APPLIED MACHINE LEARNING LEARNING Bref summary of last week s lecture 33 MACHINE APPLIED MACHINE LEARNING
More informationCHALMERS, GÖTEBORGS UNIVERSITET. SOLUTIONS to RE-EXAM for ARTIFICIAL NEURAL NETWORKS. COURSE CODES: FFR 135, FIM 720 GU, PhD
CHALMERS, GÖTEBORGS UNIVERSITET SOLUTIONS to RE-EXAM for ARTIFICIAL NEURAL NETWORKS COURSE CODES: FFR 35, FIM 72 GU, PhD Tme: Place: Teachers: Allowed materal: Not allowed: January 2, 28, at 8 3 2 3 SB
More informationThe Gaussian classifier. Nuno Vasconcelos ECE Department, UCSD
he Gaussan classfer Nuno Vasconcelos ECE Department, UCSD Bayesan decson theory recall that we have state of the world X observatons g decson functon L[g,y] loss of predctng y wth g Bayes decson rule s
More informationLecture Nov
Lecture 18 Nov 07 2008 Revew Clusterng Groupng smlar obects nto clusters Herarchcal clusterng Agglomeratve approach (HAC: teratvely merge smlar clusters Dfferent lnkage algorthms for computng dstances
More informationInner Product. Euclidean Space. Orthonormal Basis. Orthogonal
Inner Product Defnton 1 () A Eucldean space s a fnte-dmensonal vector space over the reals R, wth an nner product,. Defnton 2 (Inner Product) An nner product, on a real vector space X s a symmetrc, blnear,
More informationLecture 10: Dimensionality reduction
Lecture : Dmensonalt reducton g The curse of dmensonalt g Feature etracton s. feature selecton g Prncpal Components Analss g Lnear Dscrmnant Analss Intellgent Sensor Sstems Rcardo Guterrez-Osuna Wrght
More informationNeural networks. Nuno Vasconcelos ECE Department, UCSD
Neural networs Nuno Vasconcelos ECE Department, UCSD Classfcaton a classfcaton problem has two types of varables e.g. X - vector of observatons (features) n the world Y - state (class) of the world x X
More informationExpectation Maximization Mixture Models HMMs
-755 Machne Learnng for Sgnal Processng Mture Models HMMs Class 9. 2 Sep 200 Learnng Dstrbutons for Data Problem: Gven a collecton of eamples from some data, estmate ts dstrbuton Basc deas of Mamum Lelhood
More informationSupport Vector Machines
CS 2750: Machne Learnng Support Vector Machnes Prof. Adrana Kovashka Unversty of Pttsburgh February 17, 2016 Announcement Homework 2 deadlne s now 2/29 We ll have covered everythng you need today or at
More informationModule 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur
Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:
More informationCSE 252C: Computer Vision III
CSE 252C: Computer Vson III Lecturer: Serge Belonge Scrbe: Catherne Wah LECTURE 15 Kernel Machnes 15.1. Kernels We wll study two methods based on a specal knd of functon k(x, y) called a kernel: Kernel
More informationVQ widely used in coding speech, image, and video
at Scalar quantzers are specal cases of vector quantzers (VQ): they are constraned to look at one sample at a tme (memoryless) VQ does not have such constrant better RD perfomance expected Source codng
More informationProblem Set 9 Solutions
Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem
More informationMultilayer Perceptron (MLP)
Multlayer Perceptron (MLP) Seungjn Cho Department of Computer Scence and Engneerng Pohang Unversty of Scence and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjn@postech.ac.kr 1 / 20 Outlne
More informationKernels in Support Vector Machines. Based on lectures of Martin Law, University of Michigan
Kernels n Support Vector Machnes Based on lectures of Martn Law, Unversty of Mchgan Non Lnear separable problems AND OR NOT() The XOR problem cannot be solved wth a perceptron. XOR Per Lug Martell - Systems
More informationSalmon: Lectures on partial differential equations. Consider the general linear, second-order PDE in the form. ,x 2
Salmon: Lectures on partal dfferental equatons 5. Classfcaton of second-order equatons There are general methods for classfyng hgher-order partal dfferental equatons. One s very general (applyng even to
More informationIntro to Visual Recognition
CS 2770: Computer Vson Intro to Vsual Recognton Prof. Adrana Kovashka Unversty of Pttsburgh February 13, 2018 Plan for today What s recognton? a.k.a. classfcaton, categorzaton Support vector machnes Separable
More informationMULTISPECTRAL IMAGE CLASSIFICATION USING BACK-PROPAGATION NEURAL NETWORK IN PCA DOMAIN
MULTISPECTRAL IMAGE CLASSIFICATION USING BACK-PROPAGATION NEURAL NETWORK IN PCA DOMAIN S. Chtwong, S. Wtthayapradt, S. Intajag, and F. Cheevasuvt Faculty of Engneerng, Kng Mongkut s Insttute of Technology
More informationDepartment of Computer Science Artificial Intelligence Research Laboratory. Iowa State University MACHINE LEARNING
MACHINE LEANING Vasant Honavar Bonformatcs and Computatonal Bology rogram Center for Computatonal Intellgence, Learnng, & Dscovery Iowa State Unversty honavar@cs.astate.edu www.cs.astate.edu/~honavar/
More informationLectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix
Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could
More information1 GSW Iterative Techniques for y = Ax
1 for y = A I m gong to cheat here. here are a lot of teratve technques that can be used to solve the general case of a set of smultaneous equatons (wrtten n the matr form as y = A), but ths chapter sn
More informationFisher Linear Discriminant Analysis
Fsher Lnear Dscrmnant Analyss Max Wellng Department of Computer Scence Unversty of Toronto 10 Kng s College Road Toronto, M5S 3G5 Canada wellng@cs.toronto.edu Abstract Ths s a note to explan Fsher lnear
More informationENG 8801/ Special Topics in Computer Engineering: Pattern Recognition. Memorial University of Newfoundland Pattern Recognition
EG 880/988 - Specal opcs n Computer Engneerng: Pattern Recognton Memoral Unversty of ewfoundland Pattern Recognton Lecture 7 May 3, 006 http://wwwengrmunca/~charlesr Offce Hours: uesdays hursdays 8:30-9:30
More informationComparison of Regression Lines
STATGRAPHICS Rev. 9/13/2013 Comparson of Regresson Lnes Summary... 1 Data Input... 3 Analyss Summary... 4 Plot of Ftted Model... 6 Condtonal Sums of Squares... 6 Analyss Optons... 7 Forecasts... 8 Confdence
More informationU.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017
U.C. Berkeley CS94: Beyond Worst-Case Analyss Handout 4s Luca Trevsan September 5, 07 Summary of Lecture 4 In whch we ntroduce semdefnte programmng and apply t to Max Cut. Semdefnte Programmng Recall that
More informationA Bayes Algorithm for the Multitask Pattern Recognition Problem Direct Approach
A Bayes Algorthm for the Multtask Pattern Recognton Problem Drect Approach Edward Puchala Wroclaw Unversty of Technology, Char of Systems and Computer etworks, Wybrzeze Wyspanskego 7, 50-370 Wroclaw, Poland
More informationClassification learning II
Lecture 8 Classfcaton learnng II Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square Logstc regresson model Defnes a lnear decson boundar Dscrmnant functons: g g g g here g z / e z f, g g - s a logstc functon
More informationLecture Notes on Linear Regression
Lecture Notes on Lnear Regresson Feng L fl@sdueducn Shandong Unversty, Chna Lnear Regresson Problem In regresson problem, we am at predct a contnuous target value gven an nput feature vector We assume
More informationBoostrapaggregating (Bagging)
Boostrapaggregatng (Baggng) An ensemble meta-algorthm desgned to mprove the stablty and accuracy of machne learnng algorthms Can be used n both regresson and classfcaton Reduces varance and helps to avod
More informationPattern Classification
attern Classfcaton All materals n these sldes were taken from attern Classfcaton nd ed by R. O. Duda,. E. Hart and D. G. Stork, John Wley & Sons, 000 wth the ermsson of the authors and the ublsher Chater
More informationLecture 3: Probability Distributions
Lecture 3: Probablty Dstrbutons Random Varables Let us begn by defnng a sample space as a set of outcomes from an experment. We denote ths by S. A random varable s a functon whch maps outcomes nto the
More informationEnsemble Methods: Boosting
Ensemble Methods: Boostng Ncholas Ruozz Unversty of Texas at Dallas Based on the sldes of Vbhav Gogate and Rob Schapre Last Tme Varance reducton va baggng Generate new tranng data sets by samplng wth replacement
More informationRELIABILITY ASSESSMENT
CHAPTER Rsk Analyss n Engneerng and Economcs RELIABILITY ASSESSMENT A. J. Clark School of Engneerng Department of Cvl and Envronmental Engneerng 4a CHAPMAN HALL/CRC Rsk Analyss for Engneerng Department
More informationLogistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI
Logstc Regresson CAP 561: achne Learnng Instructor: Guo-Jun QI Bayes Classfer: A Generatve model odel the posteror dstrbuton P(Y X) Estmate class-condtonal dstrbuton P(X Y) for each Y Estmate pror dstrbuton
More information1 Convex Optimization
Convex Optmzaton We wll consder convex optmzaton problems. Namely, mnmzaton problems where the objectve s convex (we assume no constrants for now). Such problems often arse n machne learnng. For example,
More informationHowever, since P is a symmetric idempotent matrix, of P are either 0 or 1 [Eigen-values
Fall 007 Soluton to Mdterm Examnaton STAT 7 Dr. Goel. [0 ponts] For the general lnear model = X + ε, wth uncorrelated errors havng mean zero and varance σ, suppose that the desgn matrx X s not necessarly
More informationSTAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 16
STAT 39: MATHEMATICAL COMPUTATIONS I FALL 218 LECTURE 16 1 why teratve methods f we have a lnear system Ax = b where A s very, very large but s ether sparse or structured (eg, banded, Toepltz, banded plus
More informationProbability Density Function Estimation by different Methods
EEE 739Q SPRIG 00 COURSE ASSIGMET REPORT Probablty Densty Functon Estmaton by dfferent Methods Vas Chandraant Rayar Abstract The am of the assgnment was to estmate the probablty densty functon (PDF of
More informationLecture 12: Discrete Laplacian
Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly
More information14 Lagrange Multipliers
Lagrange Multplers 14 Lagrange Multplers The Method of Lagrange Multplers s a powerful technque for constraned optmzaton. Whle t has applcatons far beyond machne learnng t was orgnally developed to solve
More informationU.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016
U.C. Berkeley CS94: Spectral Methods and Expanders Handout 8 Luca Trevsan February 7, 06 Lecture 8: Spectral Algorthms Wrap-up In whch we talk about even more generalzatons of Cheeger s nequaltes, and
More informationNatural Language Processing and Information Retrieval
Natural Language Processng and Informaton Retreval Support Vector Machnes Alessandro Moschtt Department of nformaton and communcaton technology Unversty of Trento Emal: moschtt@ds.untn.t Summary Support
More informationChapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems
Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons
More informationHomework Assignment 3 Due in class, Thursday October 15
Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.
More informationMaximal Margin Classifier
CS81B/Stat41B: Advanced Topcs n Learnng & Decson Makng Mamal Margn Classfer Lecturer: Mchael Jordan Scrbes: Jana van Greunen Corrected verson - /1/004 1 References/Recommended Readng 1.1 Webstes www.kernel-machnes.org
More informationReport on Image warping
Report on Image warpng Xuan Ne, Dec. 20, 2004 Ths document summarzed the algorthms of our mage warpng soluton for further study, and there s a detaled descrpton about the mplementaton of these algorthms.
More information3.1 ML and Empirical Distribution
67577 Intro. to Machne Learnng Fall semester, 2008/9 Lecture 3: Maxmum Lkelhood/ Maxmum Entropy Dualty Lecturer: Amnon Shashua Scrbe: Amnon Shashua 1 In the prevous lecture we defned the prncple of Maxmum
More informationNonlinear Classifiers II
Nonlnear Classfers II Nonlnear Classfers: Introducton Classfers Supervsed Classfers Lnear Classfers Perceptron Least Squares Methods Lnear Support Vector Machne Nonlnear Classfers Part I: Mult Layer Neural
More information8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS
SECTION 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS 493 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS All the vector spaces you have studed thus far n the text are real vector spaces because the scalars
More informationLinear Classification, SVMs and Nearest Neighbors
1 CSE 473 Lecture 25 (Chapter 18) Lnear Classfcaton, SVMs and Nearest Neghbors CSE AI faculty + Chrs Bshop, Dan Klen, Stuart Russell, Andrew Moore Motvaton: Face Detecton How do we buld a classfer to dstngush
More informationStatistical Foundations of Pattern Recognition
Statstcal Foundatons of Pattern Recognton Learnng Objectves Bayes Theorem Decson-mang Confdence factors Dscrmnants The connecton to neural nets Statstcal Foundatons of Pattern Recognton NDE measurement
More informationFor now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.
Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson
More informationAPPROXIMATE PRICES OF BASKET AND ASIAN OPTIONS DUPONT OLIVIER. Premia 14
APPROXIMAE PRICES OF BASKE AND ASIAN OPIONS DUPON OLIVIER Prema 14 Contents Introducton 1 1. Framewor 1 1.1. Baset optons 1.. Asan optons. Computng the prce 3. Lower bound 3.1. Closed formula for the prce
More informationStructure and Drive Paul A. Jensen Copyright July 20, 2003
Structure and Drve Paul A. Jensen Copyrght July 20, 2003 A system s made up of several operatons wth flow passng between them. The structure of the system descrbes the flow paths from nputs to outputs.
More informationISSN: ISO 9001:2008 Certified International Journal of Engineering and Innovative Technology (IJEIT) Volume 3, Issue 1, July 2013
ISSN: 2277-375 Constructon of Trend Free Run Orders for Orthogonal rrays Usng Codes bstract: Sometmes when the expermental runs are carred out n a tme order sequence, the response can depend on the run
More informationUsing T.O.M to Estimate Parameter of distributions that have not Single Exponential Family
IOSR Journal of Mathematcs IOSR-JM) ISSN: 2278-5728. Volume 3, Issue 3 Sep-Oct. 202), PP 44-48 www.osrjournals.org Usng T.O.M to Estmate Parameter of dstrbutons that have not Sngle Exponental Famly Jubran
More informationMore metrics on cartesian products
More metrcs on cartesan products If (X, d ) are metrc spaces for 1 n, then n Secton II4 of the lecture notes we defned three metrcs on X whose underlyng topologes are the product topology The purpose of
More informationCSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography
CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve
More informationINF 4300 Digital Image Analysis REPETITION
INF 4300 Dgtal Image Analyss REPEIION Classfcaton PCA and Fsher s lnear dscrmnant Morphology Segmentaton Anne Solberg 406 INF 4300 Back to classfcaton error for thresholdng - Background - Foreground P
More informationCSCE 790S Background Results
CSCE 790S Background Results Stephen A. Fenner September 8, 011 Abstract These results are background to the course CSCE 790S/CSCE 790B, Quantum Computaton and Informaton (Sprng 007 and Fall 011). Each
More informationχ x B E (c) Figure 2.1.1: (a) a material particle in a body, (b) a place in space, (c) a configuration of the body
Secton.. Moton.. The Materal Body and Moton hyscal materals n the real world are modeled usng an abstract mathematcal entty called a body. Ths body conssts of an nfnte number of materal partcles. Shown
More information5 The Rational Canonical Form
5 The Ratonal Canoncal Form Here p s a monc rreducble factor of the mnmum polynomal m T and s not necessarly of degree one Let F p denote the feld constructed earler n the course, consstng of all matrces
More informationCorrelation and Regression. Correlation 9.1. Correlation. Chapter 9
Chapter 9 Correlaton and Regresson 9. Correlaton Correlaton A correlaton s a relatonshp between two varables. The data can be represented b the ordered pars (, ) where s the ndependent (or eplanator) varable,
More information9.913 Pattern Recognition for Vision. Class IV Part I Bayesian Decision Theory Yuri Ivanov
9.93 Class IV Part I Bayesan Decson Theory Yur Ivanov TOC Roadmap to Machne Learnng Bayesan Decson Makng Mnmum Error Rate Decsons Mnmum Rsk Decsons Mnmax Crteron Operatng Characterstcs Notaton x - scalar
More informationAPPENDIX A Some Linear Algebra
APPENDIX A Some Lnear Algebra The collecton of m, n matrces A.1 Matrces a 1,1,..., a 1,n A = a m,1,..., a m,n wth real elements a,j s denoted by R m,n. If n = 1 then A s called a column vector. Smlarly,
More informationCHAPTER 3: BAYESIAN DECISION THEORY
HATER 3: BAYESIAN DEISION THEORY Decson mang under uncertanty 3 Data comes from a process that s completely not nown The lac of nowledge can be compensated by modelng t as a random process May be the underlyng
More informationInstance-Based Learning (a.k.a. memory-based learning) Part I: Nearest Neighbor Classification
Instance-Based earnng (a.k.a. memory-based learnng) Part I: Nearest Neghbor Classfcaton Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n
More information