The Gaussian classifier
Nuno Vasconcelos
ECE Department, UCSD
Bayesian decision theory
recall that we have
- state of the world $Y$
- observations $X$
- decision function $g(x)$
- loss $L[g(x), y]$ of predicting $y$ with $g(x)$
the Bayes decision rule is the rule that minimizes the risk
$$\text{Risk} = E_{X,Y}\left[ L(g(X), Y) \right]$$
given $x$, it consists of picking the prediction of minimum conditional risk
$$g^*(x) = \arg\min_{g} E_{Y|X}\left[ L(g, Y) \mid X = x \right]$$
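As a concrete illustration of picking the prediction of minimum conditional risk, here is a minimal sketch in Python (the loss matrix and posterior are assumed toy values, not from the lecture):

```python
import numpy as np

# minimal sketch (hypothetical toy numbers): the Bayes decision picks the
# prediction g that minimizes the conditional risk sum_y L[g,y] P(y|x)
L = np.array([[0.0, 1.0],   # L[g=0, y=0], L[g=0, y=1]
              [1.0, 0.0]])  # L[g=1, y=0], L[g=1, y=1]
posterior = np.array([0.3, 0.7])      # assumed posterior P_{Y|X}(y|x)
cond_risk = L @ posterior             # conditional risk of each prediction
g_star = int(np.argmin(cond_risk))    # prediction of minimum conditional risk
print(cond_risk, g_star)              # [0.7 0.3] 1
```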
MAP rule
for the 0-1 loss
$$L[g, y] = \begin{cases} 0, & g = y \\ 1, & g \neq y \end{cases}$$
the optimal decision rule is the maximum a-posteriori probability rule
$$i^*(x) = \arg\max_i P_{Y|X}(i|x)$$
the associated risk is the probability of error of this rule (the Bayes error): there is no other decision function with lower error
MAP rule
by application of simple mathematical laws (Bayes rule, monotonicity of the log) we have shown that the following three decision rules are optimal and equivalent
1) $i^*(x) = \arg\max_i P_{Y|X}(i|x)$
2) $i^*(x) = \arg\max_i \left[ P_{X|Y}(x|i) P_Y(i) \right]$
3) $i^*(x) = \arg\max_i \left[ \log P_{X|Y}(x|i) + \log P_Y(i) \right]$
1) is usually hard to use, 3) is frequently easier than 2)
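A quick numerical sanity check of the equivalence of rules 2) and 3): since the log is monotonic, both must pick the same class. The means, variance, and priors below are assumed toy values:

```python
import numpy as np
from scipy.stats import norm

# sketch with assumed toy parameters: rules 2) and 3) agree because the
# log is monotonic and does not change the argmax
x = 0.4
mu = np.array([0.0, 1.0])               # class means (assumed)
sigma, prior = 0.5, np.array([0.8, 0.2])
lik = norm.pdf(x, loc=mu, scale=sigma)  # P_{X|Y}(x|i) for i = 0, 1
rule2 = np.argmax(lik * prior)                  # argmax P(x|i) P(i)
rule3 = np.argmax(np.log(lik) + np.log(prior))  # argmax of the logs
assert rule2 == rule3                   # same decision, here class 0
```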
Example
the Bayes decision rule is usually highly intuitive
we have used an example from communications: a bit is transmitted by a source, corrupted by noise, and received by a decoder
[figure: source $Y$ → channel (noise) → decoder, which observes $X$]
Q: what should the optimal decoder do to recover $Y$?
Example
this was modeled as a classification problem with Gaussian classes
$$P_{X|Y}(x|0) = G(x, \mu_0, \sigma), \qquad P_{X|Y}(x|1) = G(x, \mu_1, \sigma)$$
where
$$G(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
or, graphically,
[figure: the two Gaussian class-conditional densities]
BDR
for which the optimal decision boundary is a threshold: pick 0 if $x < \frac{\mu_0 + \mu_1}{2}$, pick 1 otherwise
[figure: the two class densities with the threshold at the midpoint of the means]
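A minimal sketch of this decoder, under the slide's assumptions (equal priors, equal variances); the default means are hypothetical:

```python
# sketch of the optimal decoder under the slide's assumptions (equal priors,
# equal variances); mu0 and mu1 are hypothetical default means
def decode(x, mu0=0.0, mu1=1.0):
    """pick 0 if x is below the midpoint of the means, else pick 1"""
    return 0 if x < (mu0 + mu1) / 2 else 1

print(decode(0.3), decode(0.8))   # 0 1
```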
BDR
what is the point of going through all the math?
now we know that the intuitive threshold is actually optimal, and in which sense it is optimal (minimum probability of error)
the Bayesian solution keeps us honest: it forces us to make all our assumptions explicit
assumptions we have made:
- uniform class probabilities, $P_Y(0) = P_Y(1) = 1/2$
- Gaussianity, $P_{X|Y}(x|i) = G(x, \mu_i, \sigma)$
- the variance is the same under the two states
- noise is additive, $X = Y + \varepsilon$
even for a trivial problem, we have made lots of assumptions
BDR
what if the class probabilities are not the same? e.g. the coding scheme may make one bit value much more frequent, so that $P_Y(1) \gg P_Y(0)$
how does this change the optimal decision rule?
$$i^*(x) = \arg\max_i P_{X|Y}(x|i)\, P_Y(i) = \arg\max_i \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu_i)^2}{2\sigma^2}}\, P_Y(i)$$
BDR
or
$$i^*(x) = \arg\max_i \left[ \log P_Y(i) - \frac{(x-\mu_i)^2}{2\sigma^2} \right] = \arg\min_i \left[ (x-\mu_i)^2 - 2\sigma^2 \log P_Y(i) \right]$$
the optimal decision is, therefore: pick 0 if
$$(x-\mu_0)^2 - 2\sigma^2 \log P_Y(0) < (x-\mu_1)^2 - 2\sigma^2 \log P_Y(1)$$
or, pick 0 if
$$x < \frac{\mu_0 + \mu_1}{2} + \frac{\sigma^2}{\mu_1 - \mu_0} \log \frac{P_Y(0)}{P_Y(1)}$$
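A short numeric illustration of the resulting threshold $T = \frac{\mu_0+\mu_1}{2} + \frac{\sigma^2}{\mu_1-\mu_0}\log\frac{P_Y(0)}{P_Y(1)}$, with assumed toy parameters, showing how a larger $P_Y(0)$ pushes the threshold toward class 1:

```python
import numpy as np

# sketch with assumed toy parameters: the prior term shifts the threshold
# T away from the midpoint of the means
mu0, mu1, sigma = 0.0, 1.0, 0.5
for p0 in (0.5, 0.9):
    T = (mu0 + mu1) / 2 + sigma**2 / (mu1 - mu0) * np.log(p0 / (1 - p0))
    print(p0, round(T, 3))   # p0=0.5 -> T=0.5 (midpoint); p0=0.9 -> T=1.049
```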
BDR
what is the role of the prior for class probabilities?
the prior moves the threshold up or down, in an intuitive way
- $P_Y(0) > P_Y(1)$: the threshold increases; since 0 has higher probability, we care more about errors on the 0 side, and by using a higher threshold we are making it more likely to pick 0
- if $P_Y(1) = 0$, all we care about is 0, and the threshold becomes infinite: we never say 1
BDR
how relevant is the prior? it is weighed by the inverse of the normalized distance between the means
$$\frac{\sigma^2}{\mu_1 - \mu_0}$$
(the distance between the means in units of variance)
- if the classes are very far apart, the prior makes no difference: this is the easy situation, the observations are very clear, and Bayes says "forget the prior knowledge"
- if the classes are exactly equal (same mean), the prior gets infinite weight: in this case the observations do not say anything about the class, and Bayes says "forget about the data, just use the knowledge that you started with", even if that means always saying 0 or always saying 1
The Gaussian classifier
this is one example of a Gaussian classifier
in practice we rarely have only one variable: typically $x = (x_1, \ldots, x_d)^T$ is a vector of observations
the BDR for this case is equivalent, but more interesting
the central difference is that the class-conditional distributions are multivariate Gaussian
$$P_{X|Y}(x|i) = \frac{1}{\sqrt{(2\pi)^d |\Sigma_i|}} \exp\left( -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) \right)$$
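A small check that the closed-form density above matches a library implementation (scipy's multivariate_normal); the parameters are assumed toy values:

```python
import numpy as np
from scipy.stats import multivariate_normal

# sketch: the closed-form multivariate Gaussian density matches scipy
# (mu, Sigma, x are assumed toy values)
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.3, 0.7])

d = len(mu)
diff = x - mu
closed_form = np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) \
              / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
assert np.isclose(closed_form, multivariate_normal.pdf(x, mean=mu, cov=Sigma))
```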
The Gaussian classifier
in this case the BDR
$$i^*(x) = \arg\max_i \left[ \log P_{X|Y}(x|i) + \log P_Y(i) \right]$$
becomes
$$i^*(x) = \arg\max_i \left[ -\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i) - \frac{1}{2} \log (2\pi)^d |\Sigma_i| + \log P_Y(i) \right]$$
The Gaussian classifier
this can be written as
$$i^*(x) = \arg\min_i \left[ d_i(x, \mu_i) + \alpha_i \right]$$
with
$$d_i(x, y) = (x - y)^T \Sigma_i^{-1} (x - y), \qquad \alpha_i = \log (2\pi)^d |\Sigma_i| - 2 \log P_Y(i)$$
the optimal rule is to assign $x$ to the closest class
closest is measured with the Mahalanobis distance $d_i(x, y)$, to which the constant $\alpha_i$ is added to account for the class prior
[figure: Gaussian contours and the decision boundary, the discriminant $P_{Y|X}(1|x) = 0.5$]
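A minimal sketch of this rule, assigning $x$ to the class that minimizes $d_i(x, \mu_i) + \alpha_i$; all parameters are assumed toy values:

```python
import numpy as np

# sketch of the general Gaussian BDR: assign x to the class minimizing the
# Mahalanobis distance plus alpha_i; all parameters are assumed toy values
def bdr(x, mus, Sigmas, priors):
    d = len(x)
    costs = []
    for mu, S, p in zip(mus, Sigmas, priors):
        diff = x - mu
        maha = diff @ np.linalg.inv(S) @ diff                  # d_i(x, mu_i)
        alpha = np.log((2 * np.pi) ** d * np.linalg.det(S)) - 2 * np.log(p)
        costs.append(maha + alpha)
    return int(np.argmin(costs))

x = np.array([0.2, -0.1])
mus = [np.zeros(2), np.ones(2)]
Sigmas = [np.eye(2), 2 * np.eye(2)]
print(bdr(x, mus, Sigmas, [0.5, 0.5]))   # 0: x is closest to class 0
```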
The Gaussian classifier
first special case of interest: all classes have the same covariance, $\Sigma_i = \Sigma$
the BDR becomes
$$i^*(x) = \arg\min_i \left[ d(x, \mu_i) + \alpha_i \right]$$
with
$$d(x, y) = (x - y)^T \Sigma^{-1} (x - y) \quad \text{(same metric for all classes)}, \qquad \alpha_i = -2 \log P_Y(i)$$
the term $\log (2\pi)^d |\Sigma|$ is constant, not a function of $i$, and can be dropped
The Gaussian classifier
in detail
$$i^*(x) = \arg\min_i \left[ (x - \mu_i)^T \Sigma^{-1} (x - \mu_i) - 2 \log P_Y(i) \right]$$
$$= \arg\min_i \left[ x^T \Sigma^{-1} x - 2 \mu_i^T \Sigma^{-1} x + \mu_i^T \Sigma^{-1} \mu_i - 2 \log P_Y(i) \right]$$
dropping the term $x^T \Sigma^{-1} x$, which does not depend on $i$,
$$i^*(x) = \arg\max_i \left[ \mu_i^T \Sigma^{-1} x - \frac{1}{2} \mu_i^T \Sigma^{-1} \mu_i + \log P_Y(i) \right] = \arg\max_i \left[ w_i^T x + w_{i0} \right]$$
The Gaussian classifier
in summary,
$$i^*(x) = \arg\max_i g_i(x)$$
with
$$g_i(x) = w_i^T x + w_{i0}, \qquad w_i = \Sigma^{-1} \mu_i, \qquad w_{i0} = -\frac{1}{2} \mu_i^T \Sigma^{-1} \mu_i + \log P_Y(i)$$
the BDR is a linear function, or a linear discriminant
[figure: linear decision boundary, the discriminant $P_{Y|X}(1|x) = 0.5$]
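A minimal sketch of the equal-covariance case: the weights $w_i$ and $w_{i0}$ are precomputed once, and classification is just the evaluation of linear discriminants; the parameters below are assumed toy values:

```python
import numpy as np

# sketch of the equal-covariance case: precompute w_i and w_i0 once, then
# classify with the linear discriminants; all parameters are assumed toy values
mus = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
priors = [0.5, 0.5]

Sinv = np.linalg.inv(Sigma)
w = [Sinv @ mu for mu in mus]                                    # w_i
w0 = [-0.5 * mu @ Sinv @ mu + np.log(p) for mu, p in zip(mus, priors)]

def classify(x):
    return int(np.argmax([wi @ x + wi0 for wi, wi0 in zip(w, w0)]))

print(classify(np.array([0.2, 0.1])))   # 0
```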