The Gaussian classifier. Nuno Vasconcelos ECE Department, UCSD


Bayesian decision theory. Recall that we have: a state of the world $y$, observations $x$, a decision function $g(x)$, and a loss $L[g(x), y]$ of predicting $y$ with $g(x)$. The Bayes decision rule is the rule that minimizes the risk $\mathrm{Risk} = E_{X,Y}\left[ L(g(X), Y) \right]$. Given $x$, it consists of picking the prediction of minimum conditional risk:
$$g^*(x) = \arg\min_{g} E_{Y|X}\left[ L[g, Y] \mid X = x \right]$$
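A minimal numeric sketch of this rule (not from the slides), with an assumed two-class posterior and an assumed asymmetric loss matrix: the Bayes decision is the prediction with the smallest conditional risk.

```python
import numpy as np

# assumed posterior P_{Y|X}(y|x) for a single observation x, classes y = 0, 1
posterior = np.array([0.2, 0.8])

# assumed loss matrix L[g, y]: predicting 0 when y = 1 is made very costly
L = np.array([[0.0, 10.0],
              [1.0,  0.0]])

cond_risk = L @ posterior              # conditional risk of each prediction g
g_star = int(np.argmin(cond_risk))     # Bayes decision: minimum conditional risk
print(cond_risk, g_star)               # [8.  0.2] 1
```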

MAP rule. For the 0-1 loss
$$L[g, y] = \begin{cases} 0, & g = y \\ 1, & g \neq y \end{cases}$$
the optimal decision rule is the maximum a-posteriori probability rule
$$i^*(x) = \arg\max_i P_{Y|X}(i|x).$$
The associated risk is the probability of error of this rule (the Bayes error): there is no other decision function with lower error.

MAP rule. By application of simple mathematical laws (Bayes rule, monotonicity of the log) we have shown that the following three decision rules are optimal and equivalent:
$$1)\quad i^*(x) = \arg\max_i P_{Y|X}(i|x)$$
$$2)\quad i^*(x) = \arg\max_i \left[ P_{X|Y}(x|i)\, P_Y(i) \right]$$
$$3)\quad i^*(x) = \arg\max_i \left[ \log P_{X|Y}(x|i) + \log P_Y(i) \right]$$
1) is usually hard to use, 3) is frequently easier than 2).
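A small sketch checking the equivalence numerically on a one-dimensional Gaussian problem; the means, standard deviation, and priors below are assumed for illustration only.

```python
import numpy as np
from scipy.stats import norm

mu = np.array([0.0, 1.0])      # assumed class means mu_0, mu_1
sigma = 0.5                    # assumed common standard deviation
prior = np.array([0.3, 0.7])   # assumed class priors P_Y(0), P_Y(1)

def rule_1(x):
    # 1) argmax_i P_{Y|X}(i|x)
    joint = norm.pdf(x, mu, sigma) * prior
    return int(np.argmax(joint / joint.sum()))

def rule_2(x):
    # 2) argmax_i P_{X|Y}(x|i) P_Y(i)
    return int(np.argmax(norm.pdf(x, mu, sigma) * prior))

def rule_3(x):
    # 3) argmax_i [log P_{X|Y}(x|i) + log P_Y(i)]
    return int(np.argmax(norm.logpdf(x, mu, sigma) + np.log(prior)))

for x in np.linspace(-1.0, 2.0, 13):
    assert rule_1(x) == rule_2(x) == rule_3(x)
```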

Example. The Bayes decision rule is usually highly intuitive. We have used an example from communications: a bit is transmitted by a source, corrupted by noise, and received by a decoder. [figure: channel diagram, source bit $Y$ passing through a noisy channel to the observation $X$] Q: what should the optimal decoder do to recover $Y$?

Example. This was modeled as a classification problem with Gaussian classes:
$$P_{X|Y}(x|0) = G(x, \mu_0, \sigma), \qquad P_{X|Y}(x|1) = G(x, \mu_1, \sigma),$$
where
$$G(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}},$$
or, graphically, [figure: the two class-conditional densities].

BDR. For which (with the uniform priors assumed here) the optimal decision boundary is a threshold at the midpoint of the two means: pick 0 if $x < \frac{\mu_0 + \mu_1}{2}$, pick 1 otherwise. [figure: the two densities and the threshold between the means]
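A quick numeric check (assumed parameters, not from the slides) that with equal priors and equal variances the midpoint threshold reproduces the MAP rule.

```python
import numpy as np
from scipy.stats import norm

mu0, mu1, sigma = 0.0, 1.0, 0.5      # assumed example parameters
for x in np.linspace(-1.0, 2.0, 101):
    map_decision = int(norm.pdf(x, mu1, sigma) > norm.pdf(x, mu0, sigma))
    threshold_decision = int(x > (mu0 + mu1) / 2)
    assert map_decision == threshold_decision
```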

BDR. What is the point of going through all the math? Now we know that the intuitive threshold is actually optimal, and in which sense it is optimal (minimum probability of error). The Bayesian solution keeps us honest: it forces us to make all our assumptions explicit. Assumptions we have made:
- uniform class probabilities, $P_Y(0) = P_Y(1) = 1/2$
- Gaussianity, $P_{X|Y}(x|i) = G(x, \mu_i, \sigma)$
- the variance is the same under the two states
- the noise is additive, $x = y + \varepsilon$
Even for a trivial problem, we have made lots of assumptions.

BDR. What if the class probabilities are not the same? E.g. under a coding scheme in which one bit value is transmitted much more often than the other, so that, say, $P_Y(1) \gg P_Y(0)$. How does this change the optimal decision rule?
$$i^*(x) = \arg\max_i P_{Y|X}(i|x)$$
$$= \arg\max_i \left[ P_{X|Y}(x|i)\, P_Y(i) \right]$$
$$= \arg\max_i \left[ \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu_i)^2}{2\sigma^2}}\, P_Y(i) \right]$$
$$= \arg\max_i \left[ -\frac{(x-\mu_i)^2}{2\sigma^2} + \log P_Y(i) \right]$$
$$= \arg\min_i \left[ \frac{(x-\mu_i)^2}{2\sigma^2} - \log P_Y(i) \right]$$

BDR. Or
$$i^*(x) = \arg\min_i \left[ (x-\mu_i)^2 - 2\sigma^2 \log P_Y(i) \right]$$
$$= \arg\min_i \left[ x^2 - 2x\mu_i + \mu_i^2 - 2\sigma^2 \log P_Y(i) \right]$$
$$= \arg\min_i \left[ -2x\mu_i + \mu_i^2 - 2\sigma^2 \log P_Y(i) \right].$$
The optimal decision is, therefore (assuming $\mu_1 > \mu_0$): pick 0 if
$$-2x\mu_0 + \mu_0^2 - 2\sigma^2 \log P_Y(0) < -2x\mu_1 + \mu_1^2 - 2\sigma^2 \log P_Y(1),$$
or, pick 0 if
$$x < \frac{\mu_0 + \mu_1}{2} + \frac{\sigma^2}{\mu_1 - \mu_0} \log \frac{P_Y(0)}{P_Y(1)}.$$
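A sketch (with assumed means, variance, and priors) checking that this closed-form threshold agrees with the argmin form of the BDR derived above.

```python
import numpy as np

mu0, mu1 = 0.0, 1.0            # assumed means, with mu1 > mu0
sigma = 0.5                    # assumed common standard deviation
p0, p1 = 0.8, 0.2              # assumed priors P_Y(0), P_Y(1)

# closed-form threshold: pick 0 if x < T
T = (mu0 + mu1) / 2 + sigma**2 / (mu1 - mu0) * np.log(p0 / p1)

def bdr(x):
    # argmin_i [(x - mu_i)^2 - 2 sigma^2 log P_Y(i)]
    costs = [(x - m)**2 - 2 * sigma**2 * np.log(p)
             for m, p in ((mu0, p0), (mu1, p1))]
    return int(np.argmin(costs))

for x in np.linspace(-1.0, 2.0, 201):
    assert bdr(x) == (0 if x < T else 1)
```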

BDR. What is the role of the prior for class probabilities?
$$x < \frac{\mu_0 + \mu_1}{2} + \frac{\sigma^2}{\mu_1 - \mu_0} \log \frac{P_Y(0)}{P_Y(1)}$$
The prior moves the threshold up or down, in an intuitive way:
- $P_Y(0) > P_Y(1)$: the threshold increases. Since 0 has higher probability, we care more about errors on the 0 side; by using a higher threshold we are making it more likely to pick 0.
- if $P_Y(1) = 0$, all we care about is 0: the threshold becomes infinite and we never say 1.
How relevant is the prior? It is weighed by $\frac{\sigma^2}{\mu_1 - \mu_0}$.

BDR. How relevant is the prior? It is weighed by the inverse of the normalized distance between the means, i.e. the distance between the means in units of variance:
- if the classes are very far apart, the prior makes no difference. This is the easy situation: the observations are very clear, and Bayes says "forget the prior knowledge".
- if the classes are exactly equal (same mean), the prior gets infinite weight. In this case the observations do not say anything about the class, and Bayes says "forget about the data, just use the knowledge that you started with", even if that means "always say 0" or "always say 1".

The Gaussian classifier. This is one example of a Gaussian classifier. In practice we rarely have only one variable: typically $X = (X_1, \ldots, X_d)$ is a vector of observations. The BDR for this case is equivalent, but more interesting. The central difference is that the class-conditional distributions are multivariate Gaussian:
$$P_{X|Y}(x|i) = \frac{1}{\sqrt{(2\pi)^d |\Sigma_i|}} \exp\left( -\frac{1}{2} (x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i) \right)$$

The Gaussian classifier. In this case, with
$$P_{X|Y}(x|i) = \frac{1}{\sqrt{(2\pi)^d |\Sigma_i|}} \exp\left( -\frac{1}{2} (x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i) \right),$$
the BDR
$$i^*(x) = \arg\max_i \left[ \log P_{X|Y}(x|i) + \log P_Y(i) \right]$$
becomes
$$i^*(x) = \arg\max_i \left[ -\frac{1}{2} (x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i) - \frac{1}{2} \log (2\pi)^d |\Sigma_i| + \log P_Y(i) \right].$$

The Gaussian classifier. This can be written as
$$i^*(x) = \arg\min_i \left[ d_i(x, \mu_i) + \alpha_i \right]$$
with
$$d_i(x, y) = (x-y)^T \Sigma_i^{-1} (x-y), \qquad \alpha_i = \log (2\pi)^d |\Sigma_i| - 2 \log P_Y(i).$$
[figure: discriminant surface $P_{Y|X}(1|x) = 0.5$]
The optimal rule is to assign $x$ to the closest class, where "closest" is measured with the Mahalanobis distance $d_i(x, y)$, to which the constant $\alpha_i$ is added to account for the class prior.
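A minimal multivariate sketch of this "closest class under the Mahalanobis distance plus offset" rule; the two-dimensional means, covariances, and priors are assumed for illustration.

```python
import numpy as np

mus = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]          # assumed class means
Sigmas = [np.eye(2), np.array([[1.5, 0.4], [0.4, 0.8]])]    # assumed covariances
priors = [0.5, 0.5]                                         # assumed priors
d = 2

def bdr(x):
    costs = []
    for mu, Sigma, p in zip(mus, Sigmas, priors):
        diff = x - mu
        mahal = diff @ np.linalg.inv(Sigma) @ diff           # d_i(x, mu_i)
        alpha = np.log((2 * np.pi)**d * np.linalg.det(Sigma)) - 2 * np.log(p)
        costs.append(mahal + alpha)
    return int(np.argmin(costs))                             # closest class wins

print(bdr(np.array([1.5, 0.5])))
```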

The Gaussian classifier. First special case of interest: all classes have the same covariance, $\Sigma_i = \Sigma$. The BDR becomes
$$i^*(x) = \arg\min_i \left[ d(x, \mu_i) + \alpha_i \right]$$
with
$$d(x, y) = (x-y)^T \Sigma^{-1} (x-y) \quad \text{(the same metric for all classes)}, \qquad \alpha_i = -2 \log P_Y(i),$$
since the term $\log (2\pi)^d |\Sigma|$ is constant (not a function of $i$) and can be dropped.

The Gaussian classifier. In detail:
$$i^*(x) = \arg\min_i \left[ (x-\mu_i)^T \Sigma^{-1} (x-\mu_i) - 2 \log P_Y(i) \right]$$
$$= \arg\min_i \left[ x^T \Sigma^{-1} x - 2\mu_i^T \Sigma^{-1} x + \mu_i^T \Sigma^{-1} \mu_i - 2 \log P_Y(i) \right]$$
$$= \arg\min_i \left[ -2\mu_i^T \Sigma^{-1} x + \mu_i^T \Sigma^{-1} \mu_i - 2 \log P_Y(i) \right] \quad (x^T \Sigma^{-1} x \text{ does not depend on } i)$$
$$= \arg\max_i \Big[ \underbrace{\mu_i^T \Sigma^{-1}}_{w_i^T} x \underbrace{- \tfrac{1}{2}\mu_i^T \Sigma^{-1} \mu_i + \log P_Y(i)}_{w_{i0}} \Big]$$

The Gaussian classifier. In summary:
$$i^*(x) = \arg\max_i g_i(x)$$
with
$$g_i(x) = w_i^T x + w_{i0}, \qquad w_i = \Sigma^{-1} \mu_i, \qquad w_{i0} = -\tfrac{1}{2} \mu_i^T \Sigma^{-1} \mu_i + \log P_Y(i).$$
The BDR is a linear function, or a linear discriminant. [figure: linear discriminant boundary $P_{Y|X}(1|x) = 0.5$]
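A short sketch of this shared-covariance case with assumed parameters: it builds the linear discriminants $g_i(x) = w_i^T x + w_{i0}$ and checks, at one assumed test point, that they give the same decision as the equivalent "Mahalanobis distance minus prior" form.

```python
import numpy as np

mus = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]   # assumed class means
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])           # assumed shared covariance
priors = [0.6, 0.4]                                  # assumed priors
Sinv = np.linalg.inv(Sigma)

# linear discriminant parameters from the slide
ws  = [Sinv @ mu for mu in mus]                                          # w_i
w0s = [-0.5 * mu @ Sinv @ mu + np.log(p) for mu, p in zip(mus, priors)]  # w_i0

def classify(x):
    return int(np.argmax([w @ x + w0 for w, w0 in zip(ws, w0s)]))

def classify_mahalanobis(x):
    costs = [(x - mu) @ Sinv @ (x - mu) - 2 * np.log(p)
             for mu, p in zip(mus, priors)]
    return int(np.argmin(costs))

x = np.array([1.0, 0.4])                 # assumed test point
assert classify(x) == classify_mahalanobis(x)
print(classify(x))
```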
