Bayesian decision theory
Nuno Vasconcelos, ECE Department, UCSD
Bayesian decision theory

recall that we have:
- state of the world $y$
- observations $x$
- decision function $g(x)$
- $L[g(x), y]$ = loss of predicting $y$ with $g(x)$
the expected value of the loss is called the risk
  $Risk = E_{X,Y}\big[L(g(x), y)\big] = \sum_{i=0}^{M-1} \int L[g(x), i]\, P_{X,Y}(x, i)\, dx$
Bayesian decision theory

from this, by the chain rule $P_{X,Y}(x, i) = P_{Y|X}(i|x)\, P_X(x)$,
  $Risk = \int \sum_{i=0}^{M-1} L[g(x), i]\, P_{X,Y}(x, i)\, dx = \int \Big[\sum_{i=0}^{M-1} L[g(x), i]\, P_{Y|X}(i|x)\Big] P_X(x)\, dx = \int R(x)\, P_X(x)\, dx = E_X[R(x)]$
where
  $R(x) = \sum_{i=0}^{M-1} L[g(x), i]\, P_{Y|X}(i|x)$
is the conditional risk, given the observation $x$
Bayesian decision theory

since, by definition, $L[g(x), y] \geq 0\ \forall x, y$, it follows that
  $R(x) = \sum_{i=0}^{M-1} L[g(x), i]\, P_{Y|X}(i|x) \geq 0$
hence $Risk = E_X[R(x)]$ is minimum if we minimize $R(x)$ at all $x$, i.e. if we pick the decision function
  $g^*(x) = \arg\min_g \sum_{i=0}^{M-1} L[g, i]\, P_{Y|X}(i|x)$
Bayesian decision theory

this is the Bayes decision rule
  $g^*(x) = \arg\min_g \sum_{i=0}^{M-1} L[g, i]\, P_{Y|X}(i|x)$
the associated risk
  $R^* = \int \sum_{i=0}^{M-1} L[g^*(x), i]\, P_{Y|X}(i|x)\, P_X(x)\, dx$
is the Bayes risk, and cannot be beaten: no decision rule has smaller conditional risk at any $x$
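The rule above (pick the decision with the smallest conditional risk) can be sketched numerically; the loss matrix and posterior values below are made-up examples, not taken from the slides:

```python
import numpy as np

# Illustrative loss matrix L[g, i]: rows are decisions, columns are
# the true class. Off-diagonal entries are the two error costs.
L = np.array([[0.0, 2.0],   # L[0, 0], L[0, 1]
              [1.0, 0.0]])  # L[1, 0], L[1, 1]

def conditional_risk(g, posterior):
    """R(x) for decision g: sum_i L[g, i] * P(i|x)."""
    return np.dot(L[g], posterior)

def bayes_rule(posterior):
    """Bayes decision rule: minimize the conditional risk over g."""
    risks = [conditional_risk(g, posterior) for g in range(L.shape[0])]
    return int(np.argmin(risks))

# example posterior P(0|x) = 0.7, P(1|x) = 0.3: deciding 0 risks
# 2 * 0.3 = 0.6, deciding 1 risks 1 * 0.7 = 0.7, so pick 0
print(bayes_rule(np.array([0.7, 0.3])))
```

Note that with an asymmetric loss the rule is not simply "pick the larger posterior": a costly error type pushes the decision away from that mistake.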
Example

let's consider a binary classification problem, $y \in \{0, 1\}$, for which the conditional risk is
  $R(x) = \sum_{i=0}^{1} L[g(x), i]\, P_{Y|X}(i|x)$
we have two options:
- $g(x) = 0$:  $R_0(x) = L[0,0]\, P_{Y|X}(0|x) + L[0,1]\, P_{Y|X}(1|x)$
- $g(x) = 1$:  $R_1(x) = L[1,0]\, P_{Y|X}(0|x) + L[1,1]\, P_{Y|X}(1|x)$
and should pick the one of smaller conditional risk
Eample.e. pck 0 f R 0 < R and otherwse ths can be wrtten as, pck 0 f 0 L[0,0] + < 0 L[,0] + L[0,] < L[,] or { L[0,0] [,0] } { L[,] L[0,] } 0 L < < 0 usually there s no loss assocated wth the correct decson L[,] L[0,0] 0 and ths s the same as 0 L[,0] > L[0,] 7
Example

or, pick 0 if
  $\frac{P_{Y|X}(0|x)}{P_{Y|X}(1|x)} > \frac{L[0,1]}{L[1,0]}$
and, applying Bayes rule, this is equivalent to: pick 0 if
  $\frac{P_{X|Y}(x|0)}{P_{X|Y}(x|1)} > T^* = \frac{L[0,1]\, P_Y(1)}{L[1,0]\, P_Y(0)}$
i.e. we pick 0 when the probability of $x$ given that $y = 0$, divided by that given $y = 1$, is greater than a threshold
the optimal threshold $T^*$ depends on the costs of the two types of error and the probabilities of the two classes
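A quick numerical sketch of the likelihood-ratio threshold; the costs and priors below are illustrative assumptions:

```python
import numpy as np

# Hypothetical costs: L[0,1] is the cost of saying 0 when y = 1,
# L[1,0] the cost of saying 1 when y = 0; priors are equal here.
L01, L10 = 2.0, 1.0
P0, P1 = 0.5, 0.5

# optimal likelihood-ratio threshold T* = L[0,1] P(1) / (L[1,0] P(0))
T_star = (L01 * P1) / (L10 * P0)

def decide(px_given_0, px_given_1):
    """Pick 0 when the likelihood ratio P(x|0)/P(x|1) exceeds T*."""
    return 0 if px_given_0 / px_given_1 > T_star else 1

print(T_star, decide(0.5, 0.2), decide(0.3, 0.2))
```

With $L[0,1]$ twice $L[1,0]$, the decoder demands a likelihood ratio above 2 before it will say 0, exactly because mistaking a 1 for a 0 is the more expensive error.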
Example

let's consider the 0-1 loss
  $L[g, y] = \begin{cases} 0, & g = y \\ 1, & g \neq y \end{cases}$
in this case the optimal decision function is
  $g^*(x) = \arg\min_g \sum_i L[g, i]\, P_{Y|X}(i|x) = \arg\min_g \sum_{i \neq g} P_{Y|X}(i|x) = \arg\min_g \big[1 - P_{Y|X}(g|x)\big] = \arg\max_g P_{Y|X}(g|x)$
Example

for the 0-1 loss, the optimal decision rule is the maximum a-posteriori probability rule
  $g^*(x) = \arg\max_i P_{Y|X}(i|x)$
what is the associated risk?
  $R^* = \int \sum_{i=0}^{M-1} L[g^*(x), i]\, P_{X,Y}(x, i)\, dx = \int \sum_{y \neq g^*(x)} P_{X,Y}(x, y)\, dx$
Example

but
  $R^* = \int \sum_{y \neq g^*(x)} P_{X,Y}(x, y)\, dx$
is really just the probability of error of the decision rule; note that the same result would hold for any $g(x)$, i.e. $R$ would be the probability of error of $g$
this implies the following, for the 0-1 loss:
- the Bayes decision rule is the MAP rule $g^*(x) = \arg\max_i P_{Y|X}(i|x)$
- the risk is the probability of error of this rule (the Bayes error)
- there is no other decision function with lower error
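The claim that the 0-1 risk of a rule equals its probability of error can be checked on a small made-up joint distribution (the table below is an illustrative assumption, not from the slides):

```python
import numpy as np

# Toy joint distribution P_{X,Y}(x, y) over 3 observation values and
# 2 classes; rows index x, columns index y. Entries sum to 1.
P_xy = np.array([[0.30, 0.05],
                 [0.10, 0.15],
                 [0.05, 0.35]])

# MAP rule: for each x, pick the class with the larger joint
# probability (equivalent to the larger posterior, since P(x) > 0)
g_star = P_xy.argmax(axis=1)

# 0-1 risk = probability mass on the classes NOT picked, which is
# exactly the probability that the rule's output differs from y
risk = sum(P_xy[x, y] for x in range(3) for y in range(2)
           if y != g_star[x])
print(g_star, risk)
```

Here the MAP rule decides 0, 1, 1 for the three observation values, and its risk (0.05 + 0.10 + 0.05 = 0.20) is the total mass of the misclassified cells.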
MAP rule

usually $g^*$ can be written in a simple form, given a probabilistic model for $X$ and $Y$
consider the two-class problem, i.e. $y = 0$ or $y = 1$; the BDR is
  $g^*(x) = \begin{cases} 0, & \text{if } P_{Y|X}(0|x) \geq P_{Y|X}(1|x) \\ 1, & \text{otherwise} \end{cases}$
i.e. pick 0 when $P_{Y|X}(0|x) \geq P_{Y|X}(1|x)$, and 1 otherwise
using Bayes rule
  $P_{Y|X}(i|x) = \frac{P_{X|Y}(x|i)\, P_Y(i)}{P_X(x)}$
MAP rule

noting that $P_X(x)$ is a non-negative quantity, this is the same as: pick 0 when
  $P_{X|Y}(x|0)\, P_Y(0) \geq P_{X|Y}(x|1)\, P_Y(1)$
by using the same reasoning, this can be easily generalized to
  $g^*(x) = \arg\max_i P_{X|Y}(x|i)\, P_Y(i)$
note that:
- many class-conditional distributions are exponential (e.g. the Gaussian)
- this product can be tricky to compute (e.g. the tail probabilities are quite small)
- we can take advantage of the fact that we only care about the order of the terms on the right-hand side
The log trick

this is the "log trick": take logs
note that the log is a monotonically increasing function
  $a > b \Leftrightarrow \log a > \log b$
from which
  $g^*(x) = \arg\max_i P_{X|Y}(x|i)\, P_Y(i) = \arg\max_i \big[\log P_{X|Y}(x|i) + \log P_Y(i)\big]$
(the order is preserved)
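The numerical point behind the log trick can be seen directly: products of many small likelihood terms underflow in floating point, while sums of logs stay finite and preserve the order. A small sketch with made-up likelihood values:

```python
import numpy as np

# Two hypotheses scored on 500 i.i.d. observations (made-up values);
# class 0 explains every observation slightly better than class 1.
lik0 = np.full(500, 2e-3)   # P(x_t | y = 0) for each observation
lik1 = np.full(500, 1e-3)   # P(x_t | y = 1)

# the raw products both underflow to 0.0: comparing them is useless
print(np.prod(lik0), np.prod(lik1))

# the log-domain scores are finite, and the order is preserved
print(np.log(lik0).sum() > np.log(lik1).sum())
```

Both products are smaller than the smallest representable double ($\approx 10^{-308}$), so float arithmetic reports a tie at 0.0; the log-domain comparison still ranks the hypotheses correctly.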
MAP rule

in summary, for the zero/one loss, the following three decision rules are optimal and equivalent:
1) $g^*(x) = \arg\max_i P_{Y|X}(i|x)$
2) $g^*(x) = \arg\max_i \big[P_{X|Y}(x|i)\, P_Y(i)\big]$
3) $g^*(x) = \arg\max_i \big[\log P_{X|Y}(x|i) + \log P_Y(i)\big]$
1) is usually hard to use; 3) is frequently easier than 2)
Example

the Bayes decision rule is usually highly intuitive
example: communications
a bit $y$ is transmitted by a source, corrupted by channel noise, and received as $x$ by a decoder
Q: what should the optimal decoder do to recover $y$?
Example

intuitively, it appears that the decoder should just threshold $x$: pick a threshold $T$ and use the decision rule
  $g(x) = \begin{cases} 0, & \text{if } x < T \\ 1, & \text{if } x > T \end{cases}$
what is the threshold value? let's solve the problem with the BDR
Example

we need:
- class probabilities: in the absence of any other info, let's say $P_Y(0) = P_Y(1) = 1/2$
- class-conditional densities: noise results from thermal processes (electrons moving around and bumping into each other), a lot of independent events that add up; by the central limit theorem, it appears reasonable to assume that the noise is Gaussian
we denote a Gaussian random variable of mean $\mu$ and variance $\sigma^2$ by $x \sim N(\mu, \sigma^2)$
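The central-limit argument can be illustrated numerically: a sum of many small independent contributions looks Gaussian, whatever the individual contributions look like. The uniform "bump" distribution and the sample counts below are illustrative choices:

```python
import numpy as np

# Model each noise sample as the sum of many small independent
# "collisions" (uniform bumps stand in for thermal events).
rng = np.random.default_rng(0)
n_samples, n_bumps = 20_000, 200
bumps = rng.uniform(-0.5, 0.5, size=(n_samples, n_bumps))

# scale the sum to unit variance: a uniform on [-0.5, 0.5] has
# variance 1/12, so the sum of n_bumps has variance n_bumps / 12
noise = bumps.sum(axis=1) / np.sqrt(n_bumps / 12)

# Gaussian signature: mean ~0, std ~1, and about 68.3% of the
# samples within one standard deviation of the mean
print(noise.mean(), noise.std(), np.mean(np.abs(noise) < 1))
```

Even though each bump is uniform (flat, nothing like a bell curve), the scaled sum already matches the Gaussian one-sigma mass closely.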
Example

the Gaussian probability density function is
  $G(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
since the noise is Gaussian, and assuming it is just added to the signal, we have
  $x = y + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2)$
in both cases ($y = 0$ or $y = 1$), $x$ corresponds to a constant plus zero-mean Gaussian noise; the constant simply adds to the mean of the Gaussian
Example

in summary
  $P_{X|Y}(x|0) = G(x, 0, \sigma)$
  $P_{X|Y}(x|1) = G(x, 1, \sigma)$
[figure: the two class-conditional densities, Gaussians of equal width centered at 0 and 1]
Example

to compute the BDR, we recall that
  $g^*(x) = \arg\max_i \big[\log P_{X|Y}(x|i) + \log P_Y(i)\big]$
and note that terms which are constant as a function of $i$ can be dropped, since we are just looking for the $i$ that maximizes the function
since this is the case for the class probabilities ($P_Y(0) = P_Y(1) = 1/2$), we have
  $g^*(x) = \arg\max_i \log P_{X|Y}(x|i)$
BDR

this is intuitive: we pick the class that best explains (gives higher probability to) the observation
[figure: pick 0 where $G(x, 0, \sigma)$ is the larger density, pick 1 where $G(x, 1, \sigma)$ is]
in this case, we can solve visually, but the mathematical solution is equally simple
BDR

let's consider the more general case
  $P_{X|Y}(x|0) = G(x, \mu_0, \sigma), \quad P_{X|Y}(x|1) = G(x, \mu_1, \sigma)$
for which
  $g^*(x) = \arg\max_i \log G(x, \mu_i, \sigma) = \arg\max_i \Big[-\frac{(x-\mu_i)^2}{2\sigma^2} - \log \sqrt{2\pi}\,\sigma\Big] = \arg\min_i (x - \mu_i)^2$
BDR

or
  $g^*(x) = \arg\min_i \big(x^2 - 2x\mu_i + \mu_i^2\big) = \arg\min_i \big(-2x\mu_i + \mu_i^2\big)$
the optimal decision is, therefore: pick 0 if
  $-2x\mu_0 + \mu_0^2 < -2x\mu_1 + \mu_1^2 \quad \Leftrightarrow \quad 2x(\mu_1 - \mu_0) < \mu_1^2 - \mu_0^2$
or (assuming $\mu_1 > \mu_0$), pick 0 if
  $x < \frac{\mu_0 + \mu_1}{2}$
BDR

for a problem with Gaussian classes, equal variances, and equal class probabilities, the optimal decision boundary is the threshold at the mid-point between the two means:
  pick 0 if $x < \frac{\mu_0 + \mu_1}{2}$, pick 1 otherwise
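A sanity check that, for equal variances and equal priors, the log-likelihood comparison and the midpoint threshold are the same rule; the means and variance below are illustrative values:

```python
import numpy as np

# Two equal-variance Gaussian classes (illustrative parameters)
mu0, mu1, sigma = 0.0, 1.0, 0.3

def log_gauss(x, mu, sigma):
    """Log of the Gaussian density G(x, mu, sigma)."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(np.sqrt(2 * np.pi) * sigma)

def bdr(x):
    """BDR with equal priors: pick the larger log-likelihood."""
    return 0 if log_gauss(x, mu0, sigma) > log_gauss(x, mu1, sigma) else 1

def midpoint_rule(x):
    """The claimed closed form: threshold at (mu0 + mu1) / 2."""
    return 0 if x < (mu0 + mu1) / 2 else 1

# the two rules agree on a dense grid of observations
print(all(bdr(x) == midpoint_rule(x) for x in np.linspace(-1.0, 2.0, 301)))
```

The normalization term $-\log\sqrt{2\pi}\sigma$ cancels between the two classes, which is why the comparison collapses to "which mean is closer".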
BDR

back to our signal decoding problem: in this case $\mu_0 = 0$, $\mu_1 = 1$, so $T = 0.5$ and the decision rule is
  $g(x) = \begin{cases} 0, & \text{if } x < 0.5 \\ 1, & \text{if } x > 0.5 \end{cases}$
this is, once again, intuitive: we place the threshold midway between the two noise sources
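A Monte Carlo sketch of the decoder: with the threshold at 0.5, the measured error rate should match the Bayes error, which for this problem is the Gaussian tail mass $Q(0.5/\sigma)$. The noise level below is an illustrative assumption:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)
sigma, n = 0.5, 200_000            # illustrative noise level

y = rng.integers(0, 2, size=n)      # equiprobable source bits
x = y + rng.normal(0, sigma, size=n)  # additive Gaussian channel
y_hat = (x > 0.5).astype(int)       # threshold decoder at T = 0.5

err = np.mean(y_hat != y)           # empirical error rate
# Bayes error: Q(0.5 / sigma), written via erf
bayes_err = 0.5 * (1 - erf(0.5 / (sigma * sqrt(2))))
print(err, bayes_err)
```

With $\sigma = 0.5$ the threshold sits one standard deviation from each mean, so both the simulation and the formula give an error rate near $Q(1) \approx 0.159$.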
BDR

what is the point of going through all the math? now we know that the intuitive threshold is actually optimal, and in which sense it is optimal (minimum probability of error)
the Bayesian solution keeps us honest: it forces us to make all our assumptions explicit
assumptions we have made:
- uniform class probabilities: $P_Y(0) = P_Y(1) = 1/2$
- Gaussianity: $P_{X|Y}(x|i) = G(x, \mu_i, \sigma)$
- the variance is the same under the two states
- noise is additive: $x = y + \varepsilon$
even for a trivial problem, we have made lots of assumptions
BDR

what if the class probabilities are not the same? e.g. a coding scheme in which 1s are much more common than 0s, so that $P_Y(1) \gg P_Y(0)$
how does this change the optimal decision rule?
  $g^*(x) = \arg\max_i \big[\log P_{X|Y}(x|i) + \log P_Y(i)\big] = \arg\max_i \Big[-\frac{(x-\mu_i)^2}{2\sigma^2} - \log\sqrt{2\pi}\,\sigma + \log P_Y(i)\Big] = \arg\min_i \Big[\frac{(x-\mu_i)^2}{2\sigma^2} - \log P_Y(i)\Big]$
BDR

or
  $g^*(x) = \arg\min_i \big[(x-\mu_i)^2 - 2\sigma^2 \log P_Y(i)\big] = \arg\min_i \big[-2x\mu_i + \mu_i^2 - 2\sigma^2 \log P_Y(i)\big]$
the optimal decision is, therefore: pick 0 if
  $-2x\mu_0 + \mu_0^2 - 2\sigma^2 \log P_Y(0) < -2x\mu_1 + \mu_1^2 - 2\sigma^2 \log P_Y(1)$
or (assuming $\mu_1 > \mu_0$), pick 0 if
  $x < \frac{\mu_0 + \mu_1}{2} + \frac{\sigma^2}{\mu_1 - \mu_0} \log \frac{P_Y(0)}{P_Y(1)}$
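The shifted threshold can be checked numerically against the full BDR; the means, variance, and priors below are illustrative values:

```python
import numpy as np

# Illustrative Gaussian classes and a prior strongly favoring class 1
mu0, mu1, sigma = 0.0, 1.0, 0.5
P0, P1 = 1 / 8, 7 / 8

def threshold(P0, P1):
    """Closed-form threshold with priors folded in."""
    return (mu0 + mu1) / 2 + sigma**2 / (mu1 - mu0) * np.log(P0 / P1)

def bdr(x, P0, P1):
    """arg max_i [log P(x|i) + log P(i)] for the two Gaussian classes
    (the shared normalization constant is dropped)."""
    s0 = -0.5 * ((x - mu0) / sigma) ** 2 + np.log(P0)
    s1 = -0.5 * ((x - mu1) / sigma) ** 2 + np.log(P1)
    return 0 if s0 > s1 else 1

t = threshold(P0, P1)
# the prior on class 1 pulls the threshold well below the midpoint,
# and the BDR flips exactly at the closed-form threshold
print(t, bdr(t - 0.01, P0, P1), bdr(t + 0.01, P0, P1))
```

With equal priors the log term vanishes and the threshold falls back to the midpoint, matching the earlier equal-prior result.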
BDR

what is the role of the prior for class probabilities?
  pick 0 if $x < \frac{\mu_0 + \mu_1}{2} + \frac{\sigma^2}{\mu_1 - \mu_0} \log \frac{P_Y(0)}{P_Y(1)}$
the prior moves the threshold up or down, in an intuitive way:
- $P_Y(0) > P_Y(1)$: the threshold increases; since 0 has higher probability, we care more about errors on the 0 side, and by using a higher threshold we are making it more likely to pick 0
- if $P_Y(1) = 0$, all we care about is 0: the threshold becomes infinite and we never say 1
BDR

how relevant is the prior? its log-odds are weighed by
  $\frac{\sigma^2}{\mu_1 - \mu_0}$
the inverse of the normalized distance between the means (distance in units of variance):
- if the classes are very far apart, the prior makes no difference; this is the easy situation, the observations are very clear, and Bayes says "forget the prior knowledge"
- if the classes have exactly equal means, the prior gets infinite weight; in this case the observations do not say anything about the class, and Bayes says "forget about the data, just use the knowledge that you started with", even if that means always saying 0 or always saying 1
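A tiny numerical illustration of this weighting (all values are made up): holding the prior fixed and increasing the separation between the means shrinks the prior-induced shift of the threshold proportionally:

```python
import numpy as np

# Shift of the decision threshold away from the midpoint:
# sigma^2 / (mu1 - mu0) * log(P0 / P1)
def shift(mu0, mu1, sigma, P0, P1):
    return sigma**2 / (mu1 - mu0) * np.log(P0 / P1)

# same strong prior on class 0, two different class separations
close = shift(0.0, 1.0, 1.0, 0.9, 0.1)   # heavily overlapping classes
far   = shift(0.0, 10.0, 1.0, 0.9, 0.1)  # well-separated classes

# ten times the separation -> one tenth of the shift: the data
# dominates the prior when the classes are far apart
print(close, far)
```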