Bayesian decision theory. Nuno Vasconcelos ECE Department, UCSD

1 Bayesian decision theory. Nuno Vasconcelos, ECE Department, UCSD

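2 Notation
The notation in DHS is quite sloppy, e.g. "show that $P(\text{error}) = \int P(\text{error} \mid z)\, P(z)\, dz$". It is really not clear what this means.
We will use the following notation: $P_{X|Y}(x_0 \mid y_0)$.
Subscripts are random variables (uppercase); arguments are the values of the random variables (lowercase).
This is equivalent to $P(X = x_0 \mid Y = y_0)$.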

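3 Bayesian decision theory
A framework for computing optimal decisions on problems involving uncertainty (probabilities).
Basic concepts:
world: has states or classes, drawn from a state or class random variable $Y$
  fish classification: $Y \in \{\text{bass}, \text{salmon}\}$
  student grading: $Y \in \{A, B, C, D, F\}$
  medical diagnosis: $Y \in \{\text{disease } A, \text{disease } B, \ldots, \text{disease } M\}$
observer: measures observations (features), drawn from a random process $X$
  fish classification: $X = (\text{scale length}, \text{scale width}) \in \mathbb{R}^2$
  student grading: $X = (HW_1, \ldots, HW_n) \in \mathbb{R}^n$
  medical diagnosis: $X = (\text{symptom}_1, \ldots, \text{symptom}_n) \in \mathbb{R}^n$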

4 Bayesian decision theory
decision function: the observer uses the observations to make decisions about the state of the world $y$.
If $x \in \Omega$ and $y \in \Psi$, the decision function is the mapping $g: \Omega \rightarrow \Psi$ such that $g(x) = y_o$, where $y_o$ is a prediction of the state $y$.
loss function: $L(y_o, y)$ is the cost of deciding for $y_o$ when the true state is $y$. Usually this is zero if there is no error and positive otherwise.
goal: to determine the optimal decision function for the loss $L(\cdot, \cdot)$.

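5 Classification
We will focus on classification problems, where the observer tries to infer the state of the world: $g(x) = i$, $i \in \{1, \ldots, M\}$.
We will also mostly consider the 0-1 loss function
$$L[g(x), y] = \begin{cases} 1, & g(x) \neq y \\ 0, & g(x) = y \end{cases}$$
The regression case, where the observer tries to predict a continuous $y \in \mathbb{R}$, is basically the same for a suitable loss function, e.g. the squared error
$$L[g(x), y] = (y - g(x))^2$$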

6 Tools for solving BDT problems
In order to find the optimal decision function we need a probabilistic description of the problem.
In the most general form this is the joint distribution $P_{X,Y}(x, i)$, but we frequently decompose it into a combination of two terms:
$$P_{X,Y}(x, i) = P_{X|Y}(x \mid i)\, P_Y(i)$$
These are the class-conditional distribution and the class probability.
class probability: the prior probability of state $i$, before the observer actually measures anything. It reflects a prior belief that, if all else is equal, the world will be in state $i$ with probability $P_Y(i)$.

7 Tools for solving BDT problems
class-conditional distribution: the model for the observations given the class or state of the world.
Consider the grading example. I know from experience that a% of the students will get A's, b% B's, c% C's, and so forth. Hence, for any student, $P_Y(A) = a/100$, $P_Y(B) = b/100$, etc.
These are the state probabilities, before I get to see any of the student's work.
The class-conditional densities are the models for the grades themselves. Let's assume that the grades are always Gaussian, i.e. they are completely characterized by a mean and a variance.

8 Tools for solving BDT problems
Knowledge of the class changes the mean grade, e.g. I expect A students to have an average HW grade of 90%, B students 75%, C students 60%, etc.
This means that
$$P_{X|Y}(x \mid i) = G(x, \mu_i, \sigma)$$
i.e. the distribution of class $i$ is a Gaussian of mean $\mu_i$ and variance $\sigma$.
Note that the decomposition $P_{X,Y}(x, i) = P_{X|Y}(x \mid i)\, P_Y(i)$ is a special case of a very powerful tool in Bayesian inference.
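The grading model can be sketched in a few lines of Python. This is only an illustration: the class means (90, 75, 60) come from the text, while the prior values, the shared spread, and the names `MEANS`, `PRIORS`, `STD`, `gaussian_pdf` and `joint` are assumptions made for the example. The code parameterizes the Gaussian by its standard deviation rather than its variance.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

# Class means follow the text; priors and spread are illustrative assumptions.
MEANS = {"A": 90.0, "B": 75.0, "C": 60.0}
PRIORS = {"A": 0.3, "B": 0.5, "C": 0.2}   # assumed P_Y(i), sums to 1
STD = 8.0                                  # assumed standard deviation, shared by all classes

def joint(x, grade):
    """P_{X,Y}(x, i) = P_{X|Y}(x | i) * P_Y(i)."""
    return gaussian_pdf(x, MEANS[grade], STD) * PRIORS[grade]

# Joint value of each class at an observed homework grade of 88%.
for grade in MEANS:
    print(grade, joint(88.0, grade))
```

With these assumed numbers, a homework grade of 88% gives the largest joint value to class A, even though B has the larger prior.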

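9 The chain rule of probability
The chain rule is an important consequence of the definition of conditional probability
$$P_{X|Y}(x \mid y) = \frac{P_{X,Y}(x, y)}{P_Y(y)}$$
Note that, by recursive application of $P_{X,Y}(x, y) = P_{X|Y}(x \mid y)\, P_Y(y)$, we can write
$$P_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = P_{X_1|X_2, \ldots, X_n}(x_1 \mid x_2, \ldots, x_n) \times P_{X_2|X_3, \ldots, X_n}(x_2 \mid x_3, \ldots, x_n) \times \cdots \times P_{X_{n-1}|X_n}(x_{n-1} \mid x_n)\, P_{X_n}(x_n)$$
This is called the chain rule of probability. It allows us to modularize inference problems.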
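As a sanity check of the chain rule, the sketch below builds an arbitrary joint table over three binary variables and confirms that the product of conditionals reproduces it. The table is random and purely illustrative; the helper names `marginal` and `chain_rule_product` are assumptions introduced for this example.

```python
import itertools
import random

random.seed(0)

# Hypothetical joint distribution over three binary variables (X1, X2, X3),
# stored as {(x1, x2, x3): probability}. The numbers are random.
outcomes = list(itertools.product([0, 1], repeat=3))
weights = [random.random() + 0.1 for _ in outcomes]
total = sum(weights)
joint = {o: w / total for o, w in zip(outcomes, weights)}

def marginal(keep):
    """Marginal over the variable positions in `keep`, e.g. (1, 2) for (X2, X3)."""
    table = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        table[key] = table.get(key, 0.0) + p
    return table

p_x3 = marginal((2,))
p_x2x3 = marginal((1, 2))

def chain_rule_product(x1, x2, x3):
    """P(x1 | x2, x3) * P(x2 | x3) * P(x3), each factor built from its definition."""
    p1_given_23 = joint[(x1, x2, x3)] / p_x2x3[(x2, x3)]
    p2_given_3 = p_x2x3[(x2, x3)] / p_x3[(x3,)]
    return p1_given_23 * p2_given_3 * p_x3[(x3,)]

# Largest disagreement between the chain-rule product and the joint table.
max_gap = max(abs(chain_rule_product(*o) - joint[o]) for o in outcomes)
print(max_gap)
```

The gap is zero up to floating-point rounding, since each conditional is defined as a ratio of marginals and the intermediate terms cancel.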

10 The chain rule of probability
E.g. in the medical diagnosis scenario, what is the probability that you will be sick and have 104° of fever?
$$P_{Y,X_1}(\text{sick}, 104) = P_{Y|X_1}(\text{sick} \mid 104)\, P_{X_1}(104)$$
This breaks down a hard question (probability of sick and 104) into two easier questions.
$P_{Y|X_1}(\text{sick} \mid 104)$: everyone knows that this is close to one. ("You have a cold!")

11 The chain rule of probability
E.g. what is the probability that you will be sick and have 104° of fever?
$$P_{Y,X_1}(\text{sick}, 104) = P_{Y|X_1}(\text{sick} \mid 104)\, P_{X_1}(104)$$
$P_{X_1}(104)$: still hard, but easier than $P_{Y,X_1}(\text{sick}, 104)$, since we now have only one random variable (temperature).
It does not depend on sickness; it is just the question "what is the probability that someone will have 104°?"
Gather a number of people, measure their temperatures, and make a histogram that everyone can use after that.

12 Tools for solving BDT problems
Frequently we have problems with multiple random variables. E.g. when in the doctor, you are mostly a collection of random variables:
  $X_1$: temperature
  $X_2$: blood pressure
  $X_3$: weight
  $X_4$: cough
We can summarize this as a vector $X = (X_1, \ldots, X_n)$ of $n$ random variables; $P_X(x_1, \ldots, x_n)$ is the joint probability distribution.
But frequently we only care about a subset of $X$.

13 Marginalization
What if I only want to know if the patient has a cold or not?
E.g. having a cold does not depend on blood pressure and weight; all that matters are fever and cough. That is, we need to know $P_{X_1,X_4}(a, b)$.
We marginalize with respect to a subset of variables (in this case $X_1$ and $X_4$). This is done by summing (or integrating) the others out:
$$P_{X_1,X_4}(x_1, x_4) = \int\!\!\int P_{X_1,X_2,X_3,X_4}(x_1, x_2, x_3, x_4)\, dx_2\, dx_3$$

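14 Marginalization
An extremely important equation:
$$P_{X_1,X_4}(x_1, x_4) = \int\!\!\int P_{X_1,X_2,X_3,X_4}(x_1, x_2, x_3, x_4)\, dx_2\, dx_3$$
It seems trivial, but for large models it is a major computational asset for probabilistic inference.
For any question, there are lots of variables which are irrelevant, and direct evaluation is frequently intractable.
Typically we combine marginalization with the chain rule to explore independence relationships that will allow us to reduce computation.
independence: $X$ and $Y$ are independent random variables if $P_{X|Y}(x \mid y) = P_X(x)$.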
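For discrete variables the integral becomes a sum, and marginalization is just adding up table entries. The sketch below assumes a hypothetical discretization of the four patient variables (temperature, blood pressure, weight, cough) with arbitrary probabilities; the names `LEVELS`, `build_joint` and `marginalize` are invented for the example.

```python
import itertools

# Hypothetical discretised patient variables, in the order
# X1 temperature, X2 blood pressure, X3 weight, X4 cough.
LEVELS = {
    "temperature": ["normal", "fever"],
    "pressure": ["low", "high"],
    "weight": ["light", "heavy"],
    "cough": ["no", "yes"],
}

def build_joint():
    """Build an arbitrary normalised joint table P(x1, x2, x3, x4)."""
    combos = list(itertools.product(*LEVELS.values()))
    weights = [1.0 + i for i, _ in enumerate(combos)]  # arbitrary positive weights
    total = sum(weights)
    return {c: w / total for c, w in zip(combos, weights)}

def marginalize(joint, keep):
    """Sum out every variable whose position is not listed in `keep`."""
    out = {}
    for combo, p in joint.items():
        key = tuple(combo[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

joint = build_joint()
p_temp_cough = marginalize(joint, keep=(0, 3))  # P_{X1,X4}(x1, x4)
print(p_temp_cough)
```

The full table has 16 entries and the marginal has 4. With more variables or finer discretization the full table grows exponentially, which is why direct evaluation quickly becomes intractable.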

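15 Independence
Independence is extremely useful in the design of intelligent systems. Frequently, knowing $X$ makes $Y$ independent of $Z$.
E.g. consider the shivering symptom: if you have a temperature you sometimes shiver, so shivering is a symptom of having a cold. But once you measure the temperature, the two become independent:
$$P_{Y,X_1,S}(\text{sick}, 98, \text{shiver}) = P_{Y|X_1,S}(\text{sick} \mid 98, \text{shiver})\, P_{S|X_1}(\text{shiver} \mid 98)\, P_{X_1}(98)$$
$$= P_{Y|X_1}(\text{sick} \mid 98)\, P_{S|X_1}(\text{shiver} \mid 98)\, P_{X_1}(98)$$
This simplifies considerably the estimation of the probabilities.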

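16 Independence
Combined with marginalization, independence enables efficient computation. E.g. to compute $P_Y(\text{sick})$:
1) marginalization
$$P_Y(\text{sick}) = \int\!\!\int P_{Y,X_1,S}(\text{sick}, x, s)\, dx\, ds$$
2) chain rule
$$P_Y(\text{sick}) = \int\!\!\int P_{Y|X_1,S}(\text{sick} \mid x, s)\, P_{S|X_1}(s \mid x)\, P_{X_1}(x)\, dx\, ds$$
3) independence
$$P_Y(\text{sick}) = \int\!\!\int P_{Y|X_1}(\text{sick} \mid x)\, P_{S|X_1}(s \mid x)\, P_{X_1}(x)\, dx\, ds$$
Dividing and grouping terms (divide and conquer) makes the integral simpler:
$$P_Y(\text{sick}) = \int P_{Y|X_1}(\text{sick} \mid x)\, P_{X_1}(x) \left[ \int P_{S|X_1}(s \mid x)\, ds \right] dx$$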
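The saving from grouping terms can be seen on a small discrete version of the sick/temperature/shiver example. All probabilities below are made-up assumptions, as are the names `P_TEMP`, `P_SHIVER_GIVEN_TEMP`, `P_SICK_GIVEN_TEMP` and the two functions. The sketch computes the probability of being sick twice: by summing the full joint, and by the grouped form in which the sum over the shiver variable equals one.

```python
# Hypothetical discrete model; every number is an assumption for illustration.
P_TEMP = {98: 0.7, 101: 0.2, 104: 0.1}                 # P_{X1}(x)
P_SHIVER_GIVEN_TEMP = {98: 0.05, 101: 0.4, 104: 0.8}   # P_{S|X1}(shiver | x)
P_SICK_GIVEN_TEMP = {98: 0.1, 101: 0.7, 104: 0.95}     # P_{Y|X1}(sick | x)

def p_shiver(s, x):
    """P_{S|X1}(s | x) for s in {True, False}."""
    p = P_SHIVER_GIVEN_TEMP[x]
    return p if s else 1.0 - p

def p_sick_brute_force():
    """Sum the full joint P(sick, x, s) over both x and s."""
    return sum(
        P_SICK_GIVEN_TEMP[x] * p_shiver(s, x) * P_TEMP[x]
        for x in P_TEMP
        for s in (True, False)
    )

def p_sick_factored():
    """Grouped form: the inner sum over s is 1, so only the sum over x remains."""
    return sum(P_SICK_GIVEN_TEMP[x] * P_TEMP[x] for x in P_TEMP)

print(p_sick_brute_force(), p_sick_factored())
```

Both functions return the same value, but the factored version never touches the shiver table, so that conditional does not even need to be estimated to answer this question.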

17 Tools for solving BDT problems
Bayes rule
$$P_{Y|X}(y \mid x) = \frac{P_{X|Y}(x \mid y)\, P_Y(y)}{P_X(x)}$$
is the central equation of Bayesian inference. It allows us to switch the relation between the variables, which is extremely useful.
E.g. for medical diagnosis, the doctor needs to know $P_{Y|X}(\text{disease } y \mid \text{symptom } x)$.
This is very complicated because it is not causal: we are asking for the probability of the cause given the consequence.

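18 Tools for solving BDT problems
Bayes rule transforms it into the probability of consequence given cause, $P_{X|Y}(\text{symptom } x \mid \text{disease } y)$, and some other stuff.
Note that $P_{X|Y}(\text{symptom } x \mid \text{disease } y)$ is easy: you can get it out of any medical textbook.
What about the other stuff?
$P_Y(\text{disease } y)$ does not depend on the patient; you can get it by collecting statistics over the entire population.
$P_X(\text{symptom } x)$ is a combination of the two (marginalization):
$$P_X(\text{symptom } x) = \sum_y P_{X,Y}(\text{symptom } x, \text{disease } y) = \sum_y P_{X|Y}(\text{symptom } x \mid \text{disease } y)\, P_Y(\text{disease } y)$$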

19 Bayes rule
Bayes rule allows us to combine textbook knowledge with prior knowledge to compute the probability of cause given consequence.
E.g. if you heard on the radio that there is an outbreak of measles, you increase the prior probability of the measles disease (cause), $P_Y(\text{measles})$.
Since the relation between cause and consequence, $P_{X|Y}(\text{patient symptoms} \mid \text{measles})$, does not change, Bayes rule will give you the updated $P_{Y|X}(\text{measles} \mid \text{patient symptoms})$ that accounts for the new information.
This is hard if you work directly with the posterior probability.
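A minimal sketch of this update, assuming a made-up three-way diagnosis problem: the likelihoods stay fixed, only the prior changes after the outbreak, and Bayes rule produces the new posterior. The disease list, all probabilities, and the function name `posterior` are hypothetical.

```python
def posterior(likelihood, prior):
    """Bayes rule over a finite set of diseases.

    likelihood[d] = P(symptoms | d) and prior[d] = P(d).
    Returns P(d | symptoms) for every d.
    """
    # P(symptoms), obtained by marginalising the joint over diseases.
    evidence = sum(likelihood[d] * prior[d] for d in prior)
    return {d: likelihood[d] * prior[d] / evidence for d in prior}

# All numbers below are made up for illustration.
LIKELIHOOD = {"measles": 0.9, "flu": 0.3, "healthy": 0.01}     # "textbook" knowledge
PRIOR_USUAL = {"measles": 0.001, "flu": 0.099, "healthy": 0.9}
PRIOR_OUTBREAK = {"measles": 0.05, "flu": 0.09, "healthy": 0.86}

before = posterior(LIKELIHOOD, PRIOR_USUAL)
after = posterior(LIKELIHOOD, PRIOR_OUTBREAK)
print(before["measles"], after["measles"])
```

Under these assumed numbers the posterior probability of measles rises sharply after the outbreak, although nothing about the symptom model was touched.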

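20 Bayesian decision theory
Recall that we have
  $Y$: state of the world
  $X$: observations
  $g(x)$: decision function
  $L[g(x), y]$: loss of predicting $y$ with $g(x)$
The expected value of the loss is called the risk
$$\text{Risk} = E_{X,Y}\big[ L[g(x), y] \big]$$
which can be written as
$$\text{Risk} = \sum_{i=1}^{M} \int P_{X,Y}(x, i)\, L[g(x), i]\, dx$$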

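21 Bayesian decision theory
From this, by the chain rule,
$$\text{Risk} = \sum_{i=1}^{M} \int P_{X,Y}(x, i)\, L[g(x), i]\, dx = \int P_X(x) \sum_{i=1}^{M} P_{Y|X}(i \mid x)\, L[g(x), i]\, dx = \int P_X(x)\, R(x)\, dx = E_X[R(x)]$$
where
$$R(x) = \sum_{i=1}^{M} P_{Y|X}(i \mid x)\, L[g(x), i]$$
is the conditional risk, given the observation $x$.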

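22 Bayesian decision theory
Since, by definition, $L[g(x), y] \geq 0$ for all $x, y$, it follows that
$$R(x) = \sum_{i=1}^{M} P_{Y|X}(i \mid x)\, L[g(x), i] \geq 0 \quad \text{for all } x$$
Hence $\text{Risk} = E_X[R(x)]$ is minimum if we minimize $R(x)$ at all $x$, i.e. if we pick the decision function
$$g^*(x) = \arg\min_{g(x)} \sum_{i=1}^{M} P_{Y|X}(i \mid x)\, L[g(x), i]$$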

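23 Bayesian decision theory
This is the Bayes decision rule
$$g^*(x) = \arg\min_{g(x)} \sum_{i=1}^{M} P_{Y|X}(i \mid x)\, L[g(x), i]$$
The associated risk
$$R^* = \int \sum_{i=1}^{M} P_{X,Y}(x, i)\, L[g^*(x), i]\, dx$$
or
$$R^* = \int P_X(x) \sum_{i=1}^{M} P_{Y|X}(i \mid x)\, L[g^*(x), i]\, dx$$
is the Bayes risk, and cannot be beaten.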
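For a finite set of decisions, the Bayes decision rule is a direct minimization of the conditional risk at each observation. The sketch below assumes the posterior at some observation is already available; the posterior values, the loss table (indexed as `LOSS[decision][true_state]`), and the function names are hypothetical.

```python
def conditional_risk(decision, posterior, loss):
    """R(x) for a given decision: sum over i of P(i | x) * L[decision, i]."""
    return sum(posterior[i] * loss[decision][i] for i in posterior)

def bayes_decision(posterior, loss):
    """Pick the decision with the smallest conditional risk."""
    return min(loss, key=lambda d: conditional_risk(d, posterior, loss))

# Hypothetical posterior at one observation x, and a hypothetical loss table
# LOSS[decision][true_state]. Missing a sick patient is assumed to cost
# ten times more than a false alarm.
POSTERIOR = {"healthy": 0.8, "sick": 0.2}
LOSS = {
    "healthy": {"healthy": 0.0, "sick": 10.0},
    "sick": {"healthy": 1.0, "sick": 0.0},
}

for d in LOSS:
    print(d, conditional_risk(d, POSTERIOR, LOSS))
print("decision:", bayes_decision(POSTERIOR, LOSS))
```

With these assumed costs the rule decides "sick" even though "healthy" is four times more probable, because the conditional risk of deciding "healthy" (0.2 × 10) exceeds that of deciding "sick" (0.8 × 1).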

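24 Example
Let's consider a binary classification problem, $g^*(x) \in \{0, 1\}$, for which the conditional risk is
$$R(x) = \sum_{i=0}^{1} P_{Y|X}(i \mid x)\, L[g(x), i] = P_{Y|X}(0 \mid x)\, L[g(x), 0] + P_{Y|X}(1 \mid x)\, L[g(x), 1]$$
We have two options:
$$g(x) = 0: \quad R_0(x) = P_{Y|X}(0 \mid x)\, L[0, 0] + P_{Y|X}(1 \mid x)\, L[0, 1]$$
$$g(x) = 1: \quad R_1(x) = P_{Y|X}(0 \mid x)\, L[1, 0] + P_{Y|X}(1 \mid x)\, L[1, 1]$$
and should pick the one of smaller conditional risk.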

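25 Example
I.e. pick $g(x) = 0$ if $R_0(x) < R_1(x)$, and $g(x) = 1$ otherwise. This can be written as: pick 0 if
$$P_{Y|X}(0 \mid x)\, L[0, 0] + P_{Y|X}(1 \mid x)\, L[0, 1] < P_{Y|X}(0 \mid x)\, L[1, 0] + P_{Y|X}(1 \mid x)\, L[1, 1]$$
or
$$P_{Y|X}(0 \mid x)\, \{ L[0, 0] - L[1, 0] \} < P_{Y|X}(1 \mid x)\, \{ L[1, 1] - L[0, 1] \}$$
Usually there is no loss associated with the correct decision, $L[0, 0] = L[1, 1] = 0$, and this is the same as: pick 0 if
$$P_{Y|X}(0 \mid x)\, L[1, 0] > P_{Y|X}(1 \mid x)\, L[0, 1]$$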

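26 Example
Or, pick 0 if
$$\frac{P_{Y|X}(0 \mid x)}{P_{Y|X}(1 \mid x)} > \frac{L[0, 1]}{L[1, 0]}$$
and, applying Bayes rule,
$$\frac{P_{X|Y}(x \mid 0)\, P_Y(0)}{P_{X|Y}(x \mid 1)\, P_Y(1)} > \frac{L[0, 1]}{L[1, 0]}$$
which is equivalent to: pick 0 if
$$\frac{P_{X|Y}(x \mid 0)}{P_{X|Y}(x \mid 1)} > T^* = \frac{L[0, 1]\, P_Y(1)}{L[1, 0]\, P_Y(0)}$$
I.e. we pick 0 when the probability of $x$ given that $Y = 0$, divided by that given $Y = 1$, is greater than a threshold.
The optimal threshold $T^*$ depends on the costs of the two types of error and on the probabilities of the two classes.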
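The threshold test can be checked against the direct comparison of conditional risks. The sketch below assumes Gaussian class-conditional densities with made-up means, a shared standard deviation, made-up priors and made-up costs; `LOSS_01` stands for the cost of deciding 0 when the true class is 1 and `LOSS_10` for the opposite error. All names are introduced for this example only.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

# Every number here is an illustrative assumption.
MEAN = {0: 0.0, 1: 2.0}
STD = 1.0
PRIOR = {0: 0.7, 1: 0.3}
LOSS_01 = 5.0   # cost of deciding 0 when the true class is 1
LOSS_10 = 1.0   # cost of deciding 1 when the true class is 0

def threshold():
    """T* = L[0,1] P_Y(1) / (L[1,0] P_Y(0))."""
    return (LOSS_01 * PRIOR[1]) / (LOSS_10 * PRIOR[0])

def decide_by_ratio(x):
    """Pick 0 when the likelihood ratio exceeds the threshold T*."""
    ratio = gaussian_pdf(x, MEAN[0], STD) / gaussian_pdf(x, MEAN[1], STD)
    return 0 if ratio > threshold() else 1

def decide_by_risk(x):
    """Pick the class of smaller conditional risk (common factor 1/P_X(x) dropped)."""
    joint0 = gaussian_pdf(x, MEAN[0], STD) * PRIOR[0]
    joint1 = gaussian_pdf(x, MEAN[1], STD) * PRIOR[1]
    risk0 = joint1 * LOSS_01   # deciding 0 is wrong when the class is 1
    risk1 = joint0 * LOSS_10   # deciding 1 is wrong when the class is 0
    return 0 if risk0 < risk1 else 1

grid = [k / 10.0 for k in range(-30, 51)]
print(all(decide_by_ratio(x) == decide_by_risk(x) for x in grid))
```

The two functions agree on the whole grid. Raising `LOSS_01` raises the threshold, so the rule needs stronger evidence for class 0 before it is willing to risk the more expensive mistake.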

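27 Example
Let's consider the 0-1 loss
$$L[g(x), y] = \begin{cases} 1, & g(x) \neq y \\ 0, & g(x) = y \end{cases}$$
In this case the optimal decision function is
$$g^*(x) = \arg\min_{g(x)} \sum_{i=1}^{M} P_{Y|X}(i \mid x)\, L[g(x), i] = \arg\min_{g(x)} \sum_{i \neq g(x)} P_{Y|X}(i \mid x)$$
$$= \arg\min_{g(x)} \big[ 1 - P_{Y|X}(g(x) \mid x) \big] = \arg\max_{g(x)} P_{Y|X}(g(x) \mid x) = \arg\max_{i} P_{Y|X}(i \mid x)$$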

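28 Example
For the 0-1 loss, the optimal decision rule is the maximum a-posteriori probability rule
$$g^*(x) = \arg\max_{i} P_{Y|X}(i \mid x)$$
What is the associated risk?
$$R^* = \int P_X(x) \sum_{i=1}^{M} P_{Y|X}(i \mid x)\, L[g^*(x), i]\, dx = \int P_X(x) \sum_{i \neq g^*(x)} P_{Y|X}(i \mid x)\, dx$$
$$= \int P_X(x)\, P_{Y|X}(y \neq g^*(x) \mid x)\, dx = \int P_{X,Y}(y \neq g^*(x), x)\, dx$$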

29 Example
But
$$R^* = \int P_{X,Y}(y \neq g^*(x), x)\, dx$$
is really just the probability of error of the decision rule $g^*(x)$.
Note that the same result would hold for any $g(x)$, i.e. $R$ would be the probability of error of $g(x)$.
This implies the following. For the 0-1 loss:
  the Bayes decision rule is the MAP rule $g^*(x) = \arg\max_{i} P_{Y|X}(i \mid x)$
  the risk is the probability of error of this rule (the Bayes error)
  there is no other decision function with lower error
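The Bayes error can be estimated numerically for a simple case. The sketch below assumes two equally likely classes with unit-variance Gaussian densities centred at 0 and 2 (all made-up values), applies the MAP rule on a fine grid, and integrates the joint mass of the class that was not chosen with a midpoint rule. For this particular equal-prior, equal-variance setup the error also has a known closed form, the Gaussian tail probability beyond half the distance between the means, which the code uses as a cross-check.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

# Illustrative assumptions: two classes, equal priors, shared unit spread.
MEAN = {0: 0.0, 1: 2.0}
STD = 1.0
PRIOR = {0: 0.5, 1: 0.5}

def map_decision(x):
    """MAP rule: the class with the largest joint P(x | i) P(i)."""
    return max(PRIOR, key=lambda i: gaussian_pdf(x, MEAN[i], STD) * PRIOR[i])

def bayes_error(lo=-10.0, hi=12.0, steps=20000):
    """Midpoint-rule estimate of the integral of the joint mass of the wrong classes."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        chosen = map_decision(x)
        wrong = sum(
            gaussian_pdf(x, MEAN[i], STD) * PRIOR[i] for i in PRIOR if i != chosen
        )
        total += wrong * dx
    return total

# Closed form for this setup: standard normal tail beyond |mu1 - mu0| / (2 * STD) = 1.
closed_form = 0.5 * math.erfc(1.0 / math.sqrt(2.0))
print(bayes_error(), closed_form)
```

The two numbers agree closely. Any other decision rule plugged into the same integral in place of `map_decision` would give an error at least as large.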

