Lecture 12: Classification



1 Lecture 12: Classification
- Discriminant functions
- The optimal Bayes classifier
- Quadratic classifiers
- Euclidean and Mahalanobis metrics
- K Nearest Neighbor classifiers
Intelligent Sensor Systems, Ricardo Gutierrez-Osuna, Wright State University

2 Discriminant functions
- A convenient way to represent a pattern classifier is in terms of a family of discriminant functions $g_i(x)$ with a simple MAX gate as the classification rule:
    Assign $x$ to class $\omega_i$ if $g_i(x) > g_j(x)$ for all $j \neq i$
  [Figure: block diagram. The features $x_1, x_2, x_3, \ldots, x_d$ feed the discriminant functions $g_1(x), g_2(x), \ldots, g_C(x)$; costs are applied and a "select max" stage produces the class assignment.]
- How do we choose the discriminant functions $g_i(x)$?
  - It depends on the objective function to minimize
    - Probability of error
    - Bayes risk

3 Minimizing probability of error
- The probability of error $P[\text{error} \mid x]$ is the probability of assigning $x$ to the wrong class
  - For a two-class problem, $P[\text{error} \mid x]$ is simply
    $$P(\text{error} \mid x) = \begin{cases} P(\omega_1 \mid x) & \text{if we decide } \omega_2 \\ P(\omega_2 \mid x) & \text{if we decide } \omega_1 \end{cases}$$
- It makes sense that the classification rule be designed to minimize the average probability of error $P[\text{error}]$ across all possible values of $x$
    $$P(\text{error}) = \int_{-\infty}^{+\infty} P(\text{error}, x)\,dx = \int_{-\infty}^{+\infty} P(\text{error} \mid x)\,p(x)\,dx$$
- To ensure $P(\text{error})$ is minimum we minimize $P(\text{error} \mid x)$ by choosing the class with maximum posterior $P(\omega_i \mid x)$ at each $x$
  - This is called the MAXIMUM A POSTERIORI (MAP) RULE
- And the associated discriminant functions become
    $$g_i^{MAP}(x) = P(\omega_i \mid x)$$

4 Minimizing probability of error
- We prove the optimality of the MAP rule graphically
  - The right plot shows the posterior for each of the two classes
  - The bottom plots show the $P(\text{error})$ for the MAP rule and another rule
  - Which one has lower $P(\text{error})$ (color-filled area)?
  [Figure: posteriors $P(\omega_i \mid x)$ versus $x$ for the two classes. Bottom panels: THE MAP RULE (choose RED, choose BLUE, choose RED) and THE OTHER RULE (choose RED, choose BLUE, choose RED), with the error area color-filled.]
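The graphical argument can also be checked numerically. The sketch below is only an illustration under assumed values that are not in the slides: two 1-D Gaussian classes with means 0 and 2, unit standard deviation and equal priors, and an arbitrary "other rule" that thresholds at x = 0.5. It integrates the joint density of the class that was not chosen, which is the color-filled area in the plots.

```python
import math

# Hypothetical two-class problem (values chosen for illustration only).
PRIORS = (0.5, 0.5)
MEANS = (0.0, 2.0)
STD = 1.0

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def joint(x, i):
    """P(x | w_i) P(w_i), proportional to the posterior P(w_i | x)."""
    return gaussian_pdf(x, MEANS[i], STD) * PRIORS[i]

def map_rule(x):
    """Choose the class with the maximum posterior."""
    return 0 if joint(x, 0) >= joint(x, 1) else 1

def other_rule(x):
    """An arbitrary competing rule: threshold at x = 0.5."""
    return 0 if x < 0.5 else 1

def probability_of_error(rule, lo=-8.0, hi=10.0, steps=18000):
    """Midpoint-rule estimate of the integral of P(error, x) dx."""
    dx = (hi - lo) / steps
    total = 0.0
    for n in range(steps):
        x = lo + (n + 0.5) * dx
        wrong = 1 - rule(x)          # the class that was NOT chosen
        total += joint(x, wrong) * dx
    return total

err_map = probability_of_error(map_rule)
err_other = probability_of_error(other_rule)
print(f"MAP rule: {err_map:.4f}, other rule: {err_other:.4f}")
```

Under these assumptions the MAP rule comes out with the smaller error; any other threshold can be substituted in `other_rule` to repeat the comparison.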

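5 Quadratic classifiers
- Let us assume that the likelihood densities are Gaussian
    $$P(x \mid \omega_i) = \frac{1}{(2\pi)^{n/2} |\Sigma_i|^{1/2}} \exp\left[-\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i)\right]$$
- Using Bayes rule, the MAP discriminant functions become
    $$g_i(x) = P(\omega_i \mid x) = \frac{P(x \mid \omega_i) P(\omega_i)}{P(x)} = \frac{1}{(2\pi)^{n/2} |\Sigma_i|^{1/2}} \exp\left[-\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i)\right] \frac{P(\omega_i)}{P(x)}$$
  - Eliminating constant terms
    $$g_i(x) = |\Sigma_i|^{-1/2} \exp\left[-\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i)\right] P(\omega_i)$$
  - We take natural logs (the logarithm is monotonically increasing)
    $$g_i(x) = -\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i) - \frac{1}{2}\log|\Sigma_i| + \log P(\omega_i)$$
- This is known as a Quadratic Discriminant Function
- The quadratic term is known as the Mahalanobis distance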
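A minimal sketch of the quadratic discriminant in one dimension, where $\Sigma_i$ reduces to a variance, may help make the formula concrete. The two classes, their names and their parameters are hypothetical; they are chosen so that the variances differ and the decision region of the narrow class is bounded on both sides.

```python
import math

def quadratic_discriminant(x, mean, var, prior):
    """1-D case of g_i(x) = -1/2 (x-mu)^2 / var - 1/2 log(var) + log(prior)."""
    return -0.5 * (x - mean) ** 2 / var - 0.5 * math.log(var) + math.log(prior)

# Hypothetical classes: (mean, variance, prior). Not taken from the slides.
CLASSES = {
    "narrow": (0.0, 1.0, 0.5),
    "wide": (3.0, 9.0, 0.5),
}

def classify(x):
    """Assign x to the class with the largest discriminant value."""
    return max(CLASSES, key=lambda name: quadratic_discriminant(x, *CLASSES[name]))

labels = [classify(x) for x in (-4.0, 0.0, 10.0)]
print(labels)
```

With these numbers, x = -4 is assigned to the wide class even though it is closer to the mean of the narrow class in Euclidean terms, because the discriminant accounts for each class's variance.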

6 Mahalanobis distance
- The Mahalanobis distance can be thought of as a vector distance that uses a $\Sigma_i^{-1}$ norm
    $$\|x-y\|^2_{\Sigma_i^{-1}} = (x-y)^T \Sigma_i^{-1} (x-y)$$
  [Figure: in the $(x_1, x_2)$ plane, contours around $\mu$ of constant Mahalanobis distance $\|x-\mu\|^2_{\Sigma_i^{-1}} = K$ and constant Euclidean distance $\|x-\mu\|^2 = K$.]
  - $\Sigma_i^{-1}$ can be thought of as a stretching factor on the space
  - Note that for an identity covariance matrix ($\Sigma_i = I$), the Mahalanobis distance becomes the familiar Euclidean distance
- In the following slides we look at special cases of the quadratic classifier
  - For convenience we will assume equiprobable priors so we can drop the term $\log P(\omega_i)$
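The two properties above (stretching, and reduction to the Euclidean distance when the covariance is the identity) can be sketched for the 2-D case. The helper names and the example covariance are assumptions made for this illustration; the 2x2 inverse is written out by hand to keep the block free of dependencies.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return ((d / det, -b / det), (-c / det, a / det))

def mahalanobis_sq(x, y, cov):
    """Squared Mahalanobis distance (x - y)^T cov^{-1} (x - y) in 2-D."""
    inv = inverse_2x2(cov)
    d0, d1 = x[0] - y[0], x[1] - y[1]
    return (d0 * (inv[0][0] * d0 + inv[0][1] * d1)
            + d1 * (inv[1][0] * d0 + inv[1][1] * d1))

IDENTITY = ((1.0, 0.0), (0.0, 1.0))
STRETCHED = ((4.0, 0.0), (0.0, 1.0))   # hypothetical: variance 4 along x1, 1 along x2
ORIGIN = (0.0, 0.0)

# With the identity covariance this is the squared Euclidean distance.
d_identity = mahalanobis_sq((3.0, 4.0), ORIGIN, IDENTITY)

# With the stretched covariance, a point 2 units away along x1 is as
# "far" as a point 1 unit away along x2.
d_along_x1 = mahalanobis_sq((2.0, 0.0), ORIGIN, STRETCHED)
d_along_x2 = mahalanobis_sq((0.0, 1.0), ORIGIN, STRETCHED)
print(d_identity, d_along_x1, d_along_x2)
```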

7 Special case I: $\Sigma_i = \sigma^2 I$
- In this case, after dropping the class-independent term $-\frac{1}{2}\log|\Sigma_i|$ and the positive scale factor $1/(2\sigma^2)$, the discriminant becomes
    $$g_i(x) = -(x-\mu_i)^T (x-\mu_i)$$
  - This is known as a MINIMUM DISTANCE CLASSIFIER
  - Notice the linear decision boundaries
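As a rough sketch, the minimum distance classifier amounts to assigning each example to the class whose mean is nearest in the Euclidean sense. The class labels and means below are hypothetical.

```python
def squared_euclidean(x, mu):
    """(x - mu)^T (x - mu) for vectors given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))

def minimum_distance_classify(x, class_means):
    """Pick the class maximizing g_i(x) = -(x - mu_i)^T (x - mu_i)."""
    return max(class_means,
               key=lambda label: -squared_euclidean(x, class_means[label]))

# Hypothetical class means in 2-D.
MEANS = {"A": (0.0, 0.0), "B": (4.0, 0.0)}

label_1 = minimum_distance_classify((1.0, 5.0), MEANS)
label_2 = minimum_distance_classify((3.0, -2.0), MEANS)
print(label_1, label_2)
```

For these two means the decision boundary is the straight line $x_1 = 2$, the perpendicular bisector of the segment joining them, which is one instance of the linear boundaries noted above.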

8 Special case 2: $\Sigma_i = \Sigma$ ($\Sigma$ diagonal)
- In this case, the discriminant becomes
    $$g_i(x) = -(x-\mu_i)^T \Sigma^{-1} (x-\mu_i)$$
  - This is known as a MAHALANOBIS DISTANCE CLASSIFIER
  - Still linear decision boundaries

9 Special case 3: $\Sigma_i = \Sigma$ ($\Sigma$ non-diagonal)
- In this case, the discriminant becomes
    $$g_i(x) = -(x-\mu_i)^T \Sigma^{-1} (x-\mu_i)$$
  - This is also known as a MAHALANOBIS DISTANCE CLASSIFIER
  - Still linear decision boundaries

10 Case 4: $\Sigma_i = \sigma_i^2 I$, example
- In this case the quadratic expression cannot be simplified any further
- Notice that the decision boundaries are no longer linear but quadratic
  [Figure: decision regions for the example, with a zoomed-out view.]

11 Case 5: $\Sigma_i \neq \Sigma_j$ general case, example
- In this case there are no constraints, so the quadratic expression cannot be simplified any further
- Notice that the decision boundaries are also quadratic
  [Figure: decision regions for the example, with a zoomed-out view.]

12 Limitations of quadratic classifiers
- The fundamental limitation is the unimodal Gaussian assumption
  - For non-Gaussian or multimodal Gaussian densities, the results may be significantly sub-optimal
- A practical limitation is associated with the minimum required size for the dataset
  - If the number of examples per class is less than the number of dimensions, the covariance matrix becomes singular and, therefore, its inverse cannot be computed
- In this case it is common to assume the same covariance structure for all classes and compute the covariance matrix using all the examples, regardless of class
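A small sketch of the practical limitation, using made-up data: two classes in 2-D with only two examples each. A sample covariance estimated from N examples has rank at most N - 1, so each per-class matrix here is singular, while the matrix computed from all examples regardless of class is not. The helper names and data points are assumptions of this example.

```python
def covariance_2d(points):
    """Unbiased sample covariance of 2-D points, as ((sxx, sxy), (sxy, syy))."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return ((sxx, sxy), (sxy, syy))

def determinant_2x2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Hypothetical training data: two examples per class in two dimensions.
class_a = [(0.0, 0.0), (2.0, 4.0)]
class_b = [(5.0, 1.0), (7.0, 0.0)]

det_a = determinant_2x2(covariance_2d(class_a))              # singular
det_b = determinant_2x2(covariance_2d(class_b))              # singular
det_all = determinant_2x2(covariance_2d(class_a + class_b))  # invertible
print(det_a, det_b, det_all)
```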

13 Conclusions
- We can extract the following conclusions
  - The Bayes classifier for normally distributed classes is quadratic
  - The Bayes classifier for normally distributed classes with equal covariance matrices is a linear classifier
  - The minimum Mahalanobis distance classifier is optimum for
    - normally distributed classes and equal covariance matrices and equal priors
  - The minimum Euclidean distance classifier is optimum for
    - normally distributed classes and equal covariance matrices proportional to the identity matrix and equal priors
  - Both Euclidean and Mahalanobis distance classifiers are linear
- The goal of this discussion was to show that some of the most popular classifiers can be derived from decision-theoretic principles and some simplifying assumptions
  - It is important to realize that using a specific (Euclidean or Mahalanobis) minimum distance classifier implicitly corresponds to certain statistical assumptions
  - The question of whether these assumptions hold can rarely be answered in practice; in most cases we are limited to posing and answering the question "does this classifier solve our problem or not?"

14 K Nearest Neighbor classifier
- The kNN classifier is based on non-parametric density estimation techniques
  - Let us assume we seek to estimate the density function $P(x)$ from a dataset of examples
  - $P(x)$ can be approximated by the expression
    $$P(x) \cong \frac{k}{NV}$$
    where $V$ is the volume surrounding $x$, $N$ is the total number of examples, and $k$ is the number of examples inside $V$
  - The volume $V$ is determined by the D-dimensional distance $R_k^D(x)$ between $x$ and its k-th nearest neighbor
    $$P(x) \cong \frac{k}{NV} = \frac{k}{N c_D R_k^D(x)}$$
- Where $c_D$ is the volume of the unit sphere in D dimensions
  [Figure: 2-D example with a circle of radius $R$ around $x$, so that $V = \pi R^2$ and $P(x) = \frac{k}{N \pi R^2}$.]
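The estimate can be sketched in one dimension, where the "unit sphere" is the interval [-1, 1] and so $c_1 = 2$. The function name and the data are hypothetical, and the guard against a zero radius is a choice of this sketch rather than part of the method as stated.

```python
def knn_density_1d(x, data, k):
    """kNN density estimate P(x) ~ k / (N * V) with V = c_1 * R_k(x), c_1 = 2."""
    distances = sorted(abs(x - xi) for xi in data)
    radius = distances[k - 1]        # distance to the k-th nearest neighbor
    if radius == 0:
        raise ValueError("k-th neighbor coincides with x; the volume is zero")
    volume = 2.0 * radius            # length of the interval [x - R, x + R]
    return k / (len(data) * volume)

# Hypothetical dataset: five evenly spaced examples.
DATA = [0.0, 1.0, 2.0, 3.0, 4.0]

p_k2 = knn_density_1d(2.5, DATA, k=2)   # R = 0.5, V = 1.0
p_k4 = knn_density_1d(2.5, DATA, k=4)   # R = 1.5, V = 3.0
print(p_k2, p_k4)
```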

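15 K Nearest Neighbor classifier
- We use the previous result to estimate the posterior probability
  - The unconditional density is, again, estimated with
    $$P(x) = \frac{k}{NV}$$
  - The class-conditional density is estimated in the same way, using only the $N_i$ examples of class $\omega_i$, of which $k_i$ fall inside $V$
    $$P(x \mid \omega_i) = \frac{k_i}{N_i V}$$
  - And the priors can be estimated by
    $$P(\omega_i) = \frac{N_i}{N}$$
  - The posterior probability then becomes
    $$P(\omega_i \mid x) = \frac{P(x \mid \omega_i) P(\omega_i)}{P(x)} = \frac{\frac{k_i}{N_i V} \cdot \frac{N_i}{N}}{\frac{k}{NV}} = \frac{k_i}{k}$$
  - Yielding discriminant functions
    $$g_i(x) = \frac{k_i}{k}$$
- This is known as the k Nearest Neighbor classifier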
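The resulting posterior estimate is simply the fraction of the k nearest neighbors that belong to each class. The following is a sketch under assumed data; the training set, labels and function name are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def knn_posteriors(x, examples, k):
    """Estimate P(w_i | x) as k_i / k from the k nearest labeled examples."""
    neighbors = sorted(examples, key=lambda ex: math.dist(x, ex[0]))[:k]
    counts = Counter(label for _, label in neighbors)
    return {label: count / k for label, count in counts.items()}

# Hypothetical labeled examples: (feature vector, class label).
TRAINING = [
    ((0.0, 0.0), "red"), ((1.0, 0.0), "red"), ((0.0, 1.0), "red"),
    ((3.0, 3.0), "blue"), ((4.0, 3.0), "blue"), ((3.0, 4.0), "blue"),
]

post = knn_posteriors((1.0, 1.0), TRAINING, k=5)
print(post)
```

Here three of the five nearest neighbors are red and two are blue, so the estimated posteriors are 3/5 and 2/5.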

16 K Nearest Neighbor classifier
- The kNN classifier is a very intuitive method
  - Examples are classified based on their similarity with training data
- For a given unlabeled example $x_u \in \mathbb{R}^D$, find the k closest labeled examples in the training data set and assign $x_u$ to the class that appears most frequently within the k-subset
- The kNN only requires
  - An integer k
  - A set of labeled examples
  - A measure of "closeness"
  [Figure: scatter plot of labeled examples on two axes, with an unlabeled example marked "?".]
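The procedure and its three requirements (an integer k, a set of labeled examples, a measure of closeness) map directly onto a short function. This is a minimal sketch, not a reference implementation: the names, the data and the tie-breaking behavior (whatever `Counter.most_common` returns first) are assumptions of the example.

```python
from collections import Counter

def euclidean(a, b):
    """Default measure of closeness."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

def knn_classify(x_u, examples, k, distance=euclidean):
    """Assign x_u to the most frequent class among its k closest examples."""
    neighbors = sorted(examples, key=lambda ex: distance(x_u, ex[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training set with one blue example sitting inside the red cluster.
TRAINING = [
    ((0.0, 0.0), "red"), ((1.0, 0.0), "red"), ((0.0, 1.0), "red"),
    ((3.0, 3.0), "blue"), ((4.0, 3.0), "blue"), ((3.0, 4.0), "blue"),
    ((0.6, 0.6), "blue"),
]

label_k1 = knn_classify((0.5, 0.5), TRAINING, k=1)   # follows the stray example
label_k3 = knn_classify((0.5, 0.5), TRAINING, k=3)   # outvoted by two reds
label_far = knn_classify((3.5, 3.5), TRAINING, k=3)
print(label_k1, label_k3, label_far)
```

Any other function of two vectors can be passed as `distance`, for example a Manhattan distance, without changing the rest of the code.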

17 kNN in action: example
- We generate data for a 2-dimensional 3-class problem, where the class-conditional densities are multi-modal and non-linearly separable
- We used kNN with
  - k = five
  - Metric = Euclidean distance

18 kNN in action: example
- We generate data for a 2-dim 3-class problem, where the likelihoods are unimodal and are distributed in rings around a common mean
  - These classes are also non-linearly separable
- We used kNN with
  - k = five
  - Metric = Euclidean distance

19 kNN versus 1NN
  [Figure: decision regions for 1-NN, 5-NN and a third panel with a larger k.]

20 Characteristics of the kNN classifier
- Advantages
  - Analytically tractable, simple implementation
  - Nearly optimal in the large sample limit ($N \to \infty$)
    - $P[\text{error}]_{Bayes} \le P[\text{error}]_{1\text{-}NNR} \le 2\,P[\text{error}]_{Bayes}$
  - Uses local information, which can yield highly adaptive behavior
  - Lends itself very easily to parallel implementations
- Disadvantages
  - Large storage requirements
  - Computationally intensive recall
  - Highly susceptible to the curse of dimensionality
- 1NN versus kNN
  - The use of large values of k has two main advantages
    - Yields smoother decision regions
    - Provides probabilistic information: the ratio of examples for each class gives information about the ambiguity of the decision
  - However, values of k that are too large are detrimental
    - They destroy the locality of the estimation
    - In addition, they increase the computational burden
