Statistical pattern recognition
Ricardo Gutierrez-Osuna, TAMU CSE


Bayes theorem. Problem: deciding if a patient has a particular condition based on a particular test. However, the test is imperfect: someone with the condition may go undetected (false negative), and someone without the condition may come out positive (false positive). Test properties: SPECIFICITY, or true-negative rate, P(NEG|¬COND); SENSITIVITY, or true-positive rate, P(POS|COND).

Problem definition. Assume a population of 10,000, where 1 out of every 100 people has the medical condition. Assume that we design a test with 98% specificity (P(NEG|¬COND) = 0.98) and 90% sensitivity (P(POS|COND) = 0.90). You take the test, and it comes out POSITIVE. What conditional probability are we after? How likely is it that you have the condition?

Solution: joint frequency table. The answer is the ratio of individuals with the condition to the total number of individuals, considering only individuals that tested positive: 90/288 ≈ 0.31.

                     HAS CONDITION             FREE OF CONDITION           ROW TOTAL
TEST IS POSITIVE     true positive             false positive              288
                     0.90 x 100 = 90           (1 - 0.98) x 9,900 = 198
TEST IS NEGATIVE     false negative            true negative               9,712
                     (1 - 0.90) x 100 = 10     0.98 x 9,900 = 9,702
COLUMN TOTAL         100                       9,900                       10,000

Conditional probability:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0 \qquad \text{(the probability of A given that B has occurred)}$$
Total probability, for a partition $B_1, \ldots, B_N$ of the sample space $S$:
$$P(A) = P(A \cap S) = P(A \cap B_1) + \cdots + P(A \cap B_N) = P(A \mid B_1)P(B_1) + \cdots + P(A \mid B_N)P(B_N) = \sum_{k=1}^{N} P(A \mid B_k)\,P(B_k)$$

Alternative solution: Bayes theorem
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
$$P(\text{cond} \mid +) = \frac{P(+ \mid \text{cond})\,P(\text{cond})}{P(+)} = \frac{P(+ \mid \text{cond})\,P(\text{cond})}{P(+ \mid \text{cond})\,P(\text{cond}) + P(+ \mid \neg\text{cond})\,P(\neg\text{cond})} = \frac{0.90 \times 0.01}{0.90 \times 0.01 + (1 - 0.98) \times 0.99} \approx 0.31$$
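As an illustration (not part of the original slides), here is a minimal Python sketch that computes this posterior two ways, assuming the figures reconstructed above (1% prevalence, 90% sensitivity, 98% specificity, population of 10,000):

```python
# Posterior probability of having the condition given a positive test,
# computed from population counts and from Bayes theorem.
prevalence = 0.01      # P(COND)
sensitivity = 0.90     # P(POS | COND)
specificity = 0.98     # P(NEG | not COND)

# 1) Joint-frequency table over a population of 10,000
population = 10_000
has_cond = population * prevalence                  # 100
free_of_cond = population - has_cond                # 9,900
true_pos = sensitivity * has_cond                   # 90
false_pos = (1 - specificity) * free_of_cond        # 198
posterior_counts = true_pos / (true_pos + false_pos)

# 2) Bayes theorem directly on the probabilities
p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
posterior_bayes = sensitivity * prevalence / p_pos

print(posterior_counts, posterior_bayes)            # both ~0.3125
```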

In SPR (statistical pattern recognition), Bayes theorem is expressed as
$$P(\omega_j \mid x) = \frac{P(x \mid \omega_j)\,P(\omega_j)}{\sum_{k=1}^{N} P(x \mid \omega_k)\,P(\omega_k)}$$
where $P(\omega_j \mid x)$ is the posterior, $P(x \mid \omega_j)$ the likelihood, $P(\omega_j)$ the prior, and $P(x)$ (the denominator) the normalization constant. We assign sample $x$ to the class $\omega_k$ with the highest posterior; it can be shown that this rule minimizes the probability of error.
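A small numpy sketch of this decision rule (the likelihood and prior values below are made-up placeholders, only to show how the posterior and arg-max are computed):

```python
import numpy as np

# Hypothetical class-conditional likelihoods P(x | w_j) evaluated at one sample x,
# and class priors P(w_j); the numbers are placeholders for illustration only.
likelihoods = np.array([0.05, 0.20, 0.10])   # P(x | w_1), P(x | w_2), P(x | w_3)
priors = np.array([0.5, 0.3, 0.2])           # P(w_1), P(w_2), P(w_3)

evidence = np.sum(likelihoods * priors)       # P(x), the normalization constant
posteriors = likelihoods * priors / evidence  # P(w_j | x)

predicted_class = np.argmax(posteriors)       # assign x to the class with highest posterior
print(posteriors, predicted_class)
```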

Discriminant functions. [Block diagram: the features $x_1, \ldots, x_d$ feed C discriminant functions $g_1(x), \ldots, g_C(x)$; class assignment selects the maximum.] Assign $x$ to $\omega_i$ where $g_i(x) > g_j(x)$ for all $j \neq i$, with $g_i(x) = P(\omega_i \mid x)$.

Quadratic classifiers. For normally distributed classes, the posterior can be reduced to a very simple expression. Recall that an n-dimensional Gaussian density is
$$p(x) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$
Using Bayes rule, the discriminant function can be written as
$$g_i(x) = P(\omega_i \mid x) = \frac{P(x \mid \omega_i)\,P(\omega_i)}{P(x)} = \frac{1}{(2\pi)^{n/2}\,|\Sigma_i|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)\right) \frac{P(\omega_i)}{P(x)}$$

Eliminating constant terms:
$$g_i(x) = |\Sigma_i|^{-1/2} \exp\left(-\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)\right) P(\omega_i)$$
And taking logs:
$$g_i(x) = -\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - \frac{1}{2}\log|\Sigma_i| + \log P(\omega_i)$$
This is known as a quadratic discriminant function (because it is a quadratic function of x).
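A minimal numpy sketch of this quadratic discriminant function, assuming the class means, covariances and priors have already been estimated (the parameter values below are illustrative assumptions, not from the slides):

```python
import numpy as np

def quadratic_discriminant(x, mu, sigma, prior):
    """g_i(x) = -1/2 (x-mu)^T Sigma^{-1} (x-mu) - 1/2 log|Sigma| + log P(w_i)."""
    diff = x - mu
    return (-0.5 * diff @ np.linalg.inv(sigma) @ diff
            - 0.5 * np.log(np.linalg.det(sigma))
            + np.log(prior))

# Illustrative two-class example in 2D
mu = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
sigma = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]
prior = [0.5, 0.5]

x = np.array([2.0, 2.5])
scores = [quadratic_discriminant(x, mu[i], sigma[i], prior[i]) for i in range(2)]
print(np.argmax(scores))   # the class with the largest discriminant wins
```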

Case 1: $\Sigma_i = \sigma^2 I$. Features are statistically independent and have the same variance for all classes. In this case, the quadratic discriminant function becomes
$$g_i(x) = -\frac{1}{2}(x - \mu_i)^T (\sigma^2 I)^{-1} (x - \mu_i) - \frac{1}{2}\log|\sigma^2 I| + \log P(\omega_i) = -\frac{1}{2\sigma^2}(x - \mu_i)^T (x - \mu_i) + \log P(\omega_i) + \text{const}$$
Assuming equal priors and dropping constant terms:
$$g_i(x) = -(x - \mu_i)^T (x - \mu_i) = -\sum_{d=1}^{\text{DIM}} (x_d - \mu_{i,d})^2$$
This is called a Euclidean-distance or nearest-mean classifier.
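A short sketch of the resulting nearest-mean rule (the class means are placeholder assumptions):

```python
import numpy as np

def nearest_mean_classify(x, means):
    """Assign x to the class whose mean is closest in Euclidean distance."""
    dists = [np.sum((x - m) ** 2) for m in means]   # squared Euclidean distances
    return int(np.argmin(dists))

means = [np.array([0.0, 0.0]), np.array([4.0, 1.0]), np.array([1.0, 5.0])]  # assumed
print(nearest_mean_classify(np.array([3.5, 0.5]), means))   # -> 1
```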

[Figure: numerical example for Case 1, showing three class means μ_i and their covariance matrices Σ_i, with the resulting decision regions.]

Case 2: $\Sigma_i = \Sigma$. All classes have the same covariance matrix, but the matrix is not diagonal. In this case, the quadratic discriminant becomes
$$g_i(x) = -\frac{1}{2}(x - \mu_i)^T \Sigma^{-1} (x - \mu_i) - \frac{1}{2}\log|\Sigma| + \log P(\omega_i)$$
Assuming equal priors and eliminating constants:
$$g_i(x) = -(x - \mu_i)^T \Sigma^{-1} (x - \mu_i)$$
This is known as a Mahalanobis-distance classifier.
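Correspondingly, a minimal sketch of a Mahalanobis-distance classifier under a shared covariance (means and covariance below are illustrative assumptions):

```python
import numpy as np

def mahalanobis_classify(x, means, shared_cov):
    """Assign x to the class mean with the smallest Mahalanobis distance."""
    cov_inv = np.linalg.inv(shared_cov)
    dists = [(x - m) @ cov_inv @ (x - m) for m in means]
    return int(np.argmin(dists))

means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]   # assumed class means
shared_cov = np.array([[2.0, 1.2], [1.2, 2.0]])        # assumed common covariance
print(mahalanobis_classify(np.array([1.0, 2.0]), means, shared_cov))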

[Figure: numerical example for Case 2, showing three class means μ_i and their identical covariance matrices Σ_i.]

General case: arbitrary Σ_i. [Figure: numerical example with three class means μ_i and distinct covariance matrices Σ_i, shown close up and zoomed out.]

k nearest neighbors: a non-parametric approximation. Let V be the volume of a region around x that contains the k nearest training samples, k_i of which belong to class ω_i, and let N_i be the number of training samples from class ω_i (N in total). The likelihood of each class is
$$P(x \mid \omega_i) = \frac{k_i}{N_i V}$$
and the priors are
$$P(\omega_i) = \frac{N_i}{N}$$
Then the posterior becomes
$$P(\omega_i \mid x) = \frac{P(x \mid \omega_i)\,P(\omega_i)}{P(x)} = \frac{\frac{k_i}{N_i V}\,\frac{N_i}{N}}{\frac{k}{N V}} = \frac{k_i}{k}$$

Example. Given three classes, assign a class label to the unknown example x_u. Assume the Euclidean distance and k neighbors. Of the k closest neighbors, the majority belong to ω_1 and one belongs to ω_3, so x_u is assigned to ω_1, the predominant class.
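A minimal k-NN classifier along these lines (Euclidean distance, majority vote); the toy training data are placeholders:

```python
import numpy as np
from collections import Counter

def knn_classify(x_u, X_train, y_train, k=5):
    """Classify x_u by majority vote among its k nearest training samples."""
    dists = np.linalg.norm(X_train - x_u, axis=1)   # Euclidean distances to all samples
    nearest = np.argsort(dists)[:k]                 # indices of the k closest samples
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]               # the predominant class

# Toy training set (assumed for illustration): two Gaussian blobs
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])
y_train = np.array([0] * 20 + [1] * 20)

print(knn_classify(np.array([3.5, 3.8]), X_train, y_train, k=5))   # -> 1
```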


[Figure: k-NN decision boundaries for three different values of k.]

Advantages:
- Simple implementation.
- Nearly optimal in the large-sample limit (N → ∞): P[error]_Bayes ≤ P[error]_1NN ≤ 2 P[error]_Bayes.
- Uses local information, which can yield highly adaptive behavior.
- Lends itself very easily to parallel implementations.
Disadvantages:
- Large storage requirements.
- Computationally intensive recall.
- Highly susceptible to the curse of dimensionality.

Dimensionality reduction

Why do dimensionality reduction? The so-called curse of dimensionality: exponential growth in the number of examples required to accurately estimate a function. Exploratory data analysis: visualizing the structure of the data in a low-dimensional subspace.

Two approaches to perform dimensionality reduction.
Feature selection: choose a subset of all the features:
$$[x_1\; x_2 \ldots x_N] \rightarrow [x_{i_1}\; x_{i_2} \ldots x_{i_M}]$$
Feature extraction: create new features by combining the existing ones:
$$[x_1\; x_2 \ldots x_N] \rightarrow [y_1\; y_2 \ldots y_M] = f([x_1\; x_2 \ldots x_N])$$
Feature extraction is typically a linear transform:
$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1N} \\ w_{21} & w_{22} & \cdots & w_{2N} \\ \vdots & & \ddots & \vdots \\ w_{M1} & w_{M2} & \cdots & w_{MN} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}$$
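In code, linear feature extraction is a single matrix multiply; the projection matrix W below is a random placeholder (PCA and LDA, discussed next, are two principled ways of choosing it):

```python
import numpy as np

N, M = 5, 2                                        # original and reduced dimensionality
x = np.arange(N, dtype=float)                      # an example feature vector (placeholder)

W = np.random.default_rng(0).normal(size=(M, N))   # assumed M x N projection matrix
y = W @ x                                          # extracted features: y = W x
print(y.shape)                                     # (2,)
```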

Representation vs. classification. [Figure: two-class data plotted over Feature 1 and Feature 2, contrasting the projection that best represents the data with the one that best separates the classes.]

PCA.
Solution: project the data onto the eigenvectors corresponding to the largest eigenvalues of the covariance matrix. PCA finds orthogonal directions of largest variance.
Properties:
- If the data is Gaussian, PCA finds independent axes.
- Otherwise, it simply de-correlates the axes.
Limitation: directions of high variance do not necessarily contain discriminatory information.
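A minimal PCA sketch following the slide: center the data, eigendecompose the covariance matrix, and project onto the eigenvectors with the largest eigenvalues (the random data are illustrative only):

```python
import numpy as np

def pca_project(X, n_components):
    """Project rows of X onto the top principal components of their covariance."""
    Xc = X - X.mean(axis=0)                           # center the data
    cov = np.cov(Xc, rowvar=False)                    # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigh: covariance is symmetric
    order = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    return Xc @ eigvecs[:, order]

X = np.random.default_rng(0).normal(size=(100, 5))    # assumed toy data
print(pca_project(X, 2).shape)                        # (100, 2)
```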

LDA. Define the scatter matrices.
Within-class:
$$S_W = \sum_{i=1}^{C} S_i, \qquad S_i = \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^T$$
Between-class:
$$S_B = \sum_{i=1}^{C} N_i\,(\mu_i - \mu)(\mu_i - \mu)^T$$
Then maximize the ratio
$$J(W) = \frac{|W^T S_B W|}{|W^T S_W W|}$$

Solution. The optimal projections are the eigenvectors corresponding to the largest eigenvalues of the generalized eigenvalue problem
$$S_B w = \lambda\, S_W w$$
NOTE: $S_B$ is the sum of C matrices of rank one or less, and the mean vectors are constrained by $\frac{1}{C}\sum_{i=1}^{C}\mu_i = \mu$. Therefore, $S_B$ will be at most of rank (C − 1), and LDA produces at most C − 1 feature projections.
Limitations:
- Overfitting.
- Information not in the mean of the data.
- Classes significantly non-Gaussian.
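A sketch of the LDA computation described above: build S_W and S_B from labeled data and solve the generalized eigenvalue problem via S_W^{-1} S_B (toy data assumed; a real implementation would guard against a singular S_W):

```python
import numpy as np

def lda_projections(X, y, n_components):
    """Return the top LDA projection vectors from within/between-class scatter."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    d = X.shape[1]
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        S_W += (Xc - mu_c).T @ (Xc - mu_c)                      # within-class scatter
        diff = (mu_c - mu).reshape(-1, 1)
        S_B += Xc.shape[0] * (diff @ diff.T)                    # between-class scatter
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)  # generalized eigenproblem
    order = np.argsort(eigvals.real)[::-1][:n_components]       # at most C-1 useful directions
    return eigvecs[:, order].real

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(2, 1, (30, 4)), rng.normal(4, 1, (30, 4))])
y = np.repeat([0, 1, 2], 30)
print(lda_projections(X, y, 2).shape)    # (4, 2)
```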

[Figure: the data projected onto the first PCA axes versus the first LDA axes, with samples plotted by class label.]

LDA and overfitting. Generate an artificial dataset: three classes, each with the same number of examples per class, and with the exact same likelihood: a multivariate Gaussian with zero mean and identity covariance. [Figure: LDA projections of this dataset for four increasing numbers of dimensions; the apparent class separation grows with dimensionality even though the classes are identical, illustrating overfitting.]