Bayesian Decision Theory Tutorial Visual Recognition Tutorial 1

Erin Dickerson
5 years ago

1 Bayesian Decision Theory Tutorial

2 Tutorial outline: Bayesian decision making with discrete probabilities (an example); looking at continuous densities; Bayesian decision making with continuous probabilities (an example); the Bayesian doctor example.

3 Example: checking on a course. A student needs to decide which courses to take, based only on his first lecture. From his previous experience, he knows:

Quality of the course   good   fair   bad
Probability (prior)     0.2    0.4    0.4

These are prior probabilities.

4 Example (continued). The student also knows the class-conditionals $\Pr(x\mid\omega_j)$:

$\Pr(x\mid\omega_j)$    good   fair   bad
Interesting lecture     0.8    0.5    0.1
Boring lecture          0.2    0.5    0.9

The loss function is given by the matrix $\lambda(\alpha_i\mid\omega_j)$:

                        good course   fair course   bad course
Taking the course       0             5             10
Not taking the course   20            5             0

5 Example (continued). The student wants to make an optimal decision, i.e. one with the minimal possible conditional risk $R(\alpha\mid x)$. By total probability, the probability of getting an interesting lecture is

$$\Pr(\text{interesting}) = \Pr(\text{interesting}\mid\text{good})\Pr(\text{good}) + \Pr(\text{interesting}\mid\text{fair})\Pr(\text{fair}) + \Pr(\text{interesting}\mid\text{bad})\Pr(\text{bad}) = 0.8\cdot 0.2 + 0.5\cdot 0.4 + 0.1\cdot 0.4 = 0.4.$$

Consequently, $\Pr(\text{boring}) = 1 - 0.4 = 0.6$. Suppose the lecture was interesting. Then we want to compute the posterior probabilities of each of the 3 possible states of nature.

6 Example (continued).

$$\Pr(\text{good}\mid\text{interesting}) = \frac{\Pr(\text{interesting}\mid\text{good})\Pr(\text{good})}{\Pr(\text{interesting})} = \frac{0.8\cdot 0.2}{0.4} = 0.4$$

$$\Pr(\text{fair}\mid\text{interesting}) = \frac{\Pr(\text{interesting}\mid\text{fair})\Pr(\text{fair})}{\Pr(\text{interesting})} = \frac{0.5\cdot 0.4}{0.4} = 0.5$$

We can get $\Pr(\text{bad}\mid\text{interesting}) = 0.1$ either by the same method or by noting that it complements the above two to 1. Now we have all we need for making an intelligent decision about an optimal action.
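The total-probability and Bayes-rule steps above can be checked with a few lines of Python. This is a minimal sketch: the dictionaries `priors` and `likelihood` are hypothetical names holding the numbers of the course example, and the boring-lecture row is taken as the complement of the interesting one.

```python
# Course example: prior Pr(w_j) and class-conditionals Pr(x | w_j).
priors = {"good": 0.2, "fair": 0.4, "bad": 0.4}
likelihood = {
    "interesting": {"good": 0.8, "fair": 0.5, "bad": 0.1},
    "boring": {"good": 0.2, "fair": 0.5, "bad": 0.9},
}


def evidence(x):
    """Pr(x) = sum_j Pr(x | w_j) Pr(w_j)  (total probability)."""
    return sum(likelihood[x][w] * priors[w] for w in priors)


def posteriors(x):
    """Pr(w_j | x) = Pr(x | w_j) Pr(w_j) / Pr(x)  (Bayes' rule)."""
    px = evidence(x)
    return {w: likelihood[x][w] * priors[w] / px for w in priors}


print(round(evidence("interesting"), 4))  # 0.4
print({w: round(p, 4) for w, p in posteriors("interesting").items()})
# {'good': 0.4, 'fair': 0.5, 'bad': 0.1}
```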

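7 Example: conclusion. The student needs to minimize the conditional risk

$$R(\alpha_i\mid x) = \sum_{j=1}^{c} \lambda(\alpha_i\mid\omega_j)\,P(\omega_j\mid x).$$

He can either take the course:

$$R(\text{taking}\mid\text{interesting}) = \Pr(\text{good}\mid\text{interesting})\,\lambda(\text{taking}\mid\text{good course}) + \Pr(\text{fair}\mid\text{interesting})\,\lambda(\text{taking}\mid\text{fair course}) + \Pr(\text{bad}\mid\text{interesting})\,\lambda(\text{taking}\mid\text{bad course}) = 0.4\cdot 0 + 0.5\cdot 5 + 0.1\cdot 10 = 3.5$$

or drop it:

$$R(\text{not taking}\mid\text{interesting}) = \Pr(\text{good}\mid\text{interesting})\,\lambda(\text{not taking}\mid\text{good course}) + \Pr(\text{fair}\mid\text{interesting})\,\lambda(\text{not taking}\mid\text{fair course}) + \Pr(\text{bad}\mid\text{interesting})\,\lambda(\text{not taking}\mid\text{bad course}) = 0.4\cdot 20 + 0.5\cdot 5 + 0.1\cdot 0 = 10.5$$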
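A minimal sketch of the conditional-risk sum $R(\alpha_i\mid x)$, with the posteriors for an interesting lecture plugged in. The `loss` table is our reading of the course loss matrix (in particular the entry 20 for not taking a good course) and should be treated as an assumption; `conditional_risk` is a hypothetical helper name.

```python
# Assumed loss matrix lambda(action | state) for the course example.
loss = {
    "take": {"good": 0, "fair": 5, "bad": 10},
    "drop": {"good": 20, "fair": 5, "bad": 0},  # 20 is an assumed reading
}
# Posteriors Pr(w_j | interesting) computed on the previous slide.
post_interesting = {"good": 0.4, "fair": 0.5, "bad": 0.1}


def conditional_risk(action, post):
    """R(a | x) = sum_j lambda(a | w_j) * Pr(w_j | x)."""
    return sum(loss[action][w] * p for w, p in post.items())


for action in loss:
    print(action, round(conditional_risk(action, post_interesting), 4))
# take 3.5
# drop 10.5
```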

8 Constructing an optimal decision function. So, if the first lecture was interesting, the student minimizes the conditional risk by taking the course. To construct the full decision function, we also need to define the risk-minimizing action for the case of a boring lecture. Do it!
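One way to check the exercise, sketched under the same assumed loss matrix (including the 20 for dropping a good course): compute the posteriors for each observation, evaluate both conditional risks, and keep the cheaper action. The helper names `posteriors`, `conditional_risk` and `decide` are hypothetical.

```python
priors = {"good": 0.2, "fair": 0.4, "bad": 0.4}
likelihood = {
    "interesting": {"good": 0.8, "fair": 0.5, "bad": 0.1},
    "boring": {"good": 0.2, "fair": 0.5, "bad": 0.9},
}
# Assumed loss matrix; the 20 for dropping a good course is our reading.
loss = {
    "take": {"good": 0, "fair": 5, "bad": 10},
    "drop": {"good": 20, "fair": 5, "bad": 0},
}


def posteriors(x):
    """Bayes' rule: Pr(w_j | x) is proportional to Pr(x | w_j) Pr(w_j)."""
    joint = {w: likelihood[x][w] * priors[w] for w in priors}
    px = sum(joint.values())
    return {w: v / px for w, v in joint.items()}


def conditional_risk(action, x):
    """R(action | x) = sum_j lambda(action | w_j) Pr(w_j | x)."""
    return sum(loss[action][w] * p for w, p in posteriors(x).items())


def decide(x):
    """Pick the action with the smallest conditional risk."""
    return min(loss, key=lambda a: conditional_risk(a, x))


for x in likelihood:
    risks = {a: round(conditional_risk(a, x), 4) for a in loss}
    print(x, risks, "->", decide(x))
# interesting {'take': 3.5, 'drop': 10.5} -> take
# boring {'take': 7.6667, 'drop': 3.0} -> drop
```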

9 Example: continuous density. Let X be a real-valued r.v. representing a number picked at random from the interval [0, 1]; its distribution is known to be uniform. Then let Y be a real r.v. whose value is chosen at random from [0, X], also with uniform distribution. We are presented with the value of Y and need to guess the most likely value of X. More formally: given the value of Y, find the probability density function (p.d.f.) of X and determine its maxima.

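10 Example (continued). Let $\omega_x$ denote the state of nature when $X = x$. What we look for is $P(\omega_x\mid Y = y)$, that is, the p.d.f. of X given Y. The class-conditional density given the value of X is

$$p(Y = y\mid\omega_x) = \begin{cases} 1/x, & 0 \le y \le x \\ 0, & y > x \end{cases}$$

For the given evidence, using total probability:

$$p(Y = y) = \int_y^1 \frac{1}{x}\,dx = \ln\frac{1}{y}$$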

11 Example: conclusion. Applying Bayes' rule:

$$p(\omega_x\mid y) = \frac{p(y\mid\omega_x)\,p(\omega_x)}{p(y)} = \frac{1/x}{\ln(1/y)}, \qquad y \le x \le 1$$

This is a monotonically decreasing function over $[y, 1]$. So, informally, the most likely value of X (the one with the highest probability density value) is $X = y$.
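A quick numerical sanity check of this result, as a sketch: it assumes an observed value $y = 0.25$ (chosen only for illustration), verifies with a midpoint rule that $p(x\mid y) = (1/x)/\ln(1/y)$ integrates to 1 over $[y, 1]$, and confirms by grid search that the density peaks at $x = y$. The function names are hypothetical.

```python
import math


def posterior_density(x, y):
    """p(x | y) = (1/x) / ln(1/y) for y <= x <= 1, and 0 elsewhere."""
    if y <= x <= 1:
        return (1.0 / x) / math.log(1.0 / y)
    return 0.0


def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


y = 0.25  # illustrative observation
total = integrate(lambda x: posterior_density(x, y), y, 1.0)

# Grid search over [y, 1] for the most likely value of X.
xs = [y + k * (1 - y) / 1000 for k in range(1001)]
x_map = max(xs, key=lambda x: posterior_density(x, y))

print(round(total, 4), x_map)  # 1.0 0.25
```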

12 Illustration: conditional p.d.f.
[Figure: the conditional density p(x|y).]

13 Example 3: hiring a secretary. A manager needs to hire a new secretary, and a good one. Unfortunately, good secretaries are hard to find: $\Pr(\omega_g) = 0.2$, $\Pr(\omega_b) = 0.8$. The manager decides to use a new test. The grade is a real number in the range from 0 to 100. The manager's estimate of the possible losses:

$\lambda(\text{decision}, \omega_i)$   $\omega_g$   $\omega_b$
Hire                                   0            20
Reject                                 5            0

14 Example 3: continued. The class-conditional densities are known to be approximated by normal p.d.f.s:

$$p(\text{grade}\mid\text{good secretary}) \sim N(85, 5), \qquad p(\text{grade}\mid\text{bad secretary}) \sim N(40, 13)$$

[Figure: $p(\text{grade}\mid\text{bad})\Pr(\text{bad})$ and $p(\text{grade}\mid\text{good})\Pr(\text{good})$.]

15 Example 3: continued. The resulting probability density for the grade looks as follows:

$$p(x) = p(x\mid\omega_b)\,p(\omega_b) + p(x\mid\omega_g)\,p(\omega_g)$$

[Figure: the mixture density p(x).]

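16 Example 3: continued. We need to know for which grade values hiring the secretary would minimize the risk:

$$R(\text{hire}\mid x) < R(\text{reject}\mid x)$$

$$p(\omega_b\mid x)\,\lambda(\text{hire}, \omega_b) + p(\omega_g\mid x)\,\lambda(\text{hire}, \omega_g) < p(\omega_b\mid x)\,\lambda(\text{reject}, \omega_b) + p(\omega_g\mid x)\,\lambda(\text{reject}, \omega_g)$$

$$[\lambda(\text{hire}, \omega_b) - \lambda(\text{reject}, \omega_b)]\,p(\omega_b\mid x) < [\lambda(\text{reject}, \omega_g) - \lambda(\text{hire}, \omega_g)]\,p(\omega_g\mid x)$$

The posteriors are given by

$$p(\omega_i\mid x) = \frac{p(x\mid\omega_i)\,p(\omega_i)}{p(x)}.$$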

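17 Example 3: continued. The posteriors scaled by the loss differences, $[\lambda(\text{hire}, \omega_b) - \lambda(\text{reject}, \omega_b)]\,p(\omega_b\mid x)$ and $[\lambda(\text{reject}, \omega_g) - \lambda(\text{hire}, \omega_g)]\,p(\omega_g\mid x)$, look like:
[Figure: the two scaled posteriors, curves labeled "bad" and "good".]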

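18 Example 3: continued. Numerically, we have:

$$p(x) = \frac{0.8}{13\sqrt{2\pi}}\,e^{-\frac{(x-40)^2}{2\cdot 13^2}} + \frac{0.2}{5\sqrt{2\pi}}\,e^{-\frac{(x-85)^2}{2\cdot 5^2}}$$

$$p(\omega_b\mid x) = \frac{0.8}{13\sqrt{2\pi}\,p(x)}\,e^{-\frac{(x-40)^2}{2\cdot 13^2}}, \qquad p(\omega_g\mid x) = \frac{0.2}{5\sqrt{2\pi}\,p(x)}\,e^{-\frac{(x-85)^2}{2\cdot 5^2}}$$

We need to solve $20\,p(\omega_b\mid x) = 5\,p(\omega_g\mid x)$. Solving numerically yields one solution in [0, 100].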
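A sketch of the numerical solution using plain bisection. The parameters are assumptions based on our reading of the slides: standard deviations 5 and 13 for the good and bad grade distributions, and loss differences 20 and 5. Because $p(x)$ appears in both posteriors it cancels, so the sketch compares the loss-weighted joint densities instead; `g` and `bisection` are hypothetical names.

```python
import math

# Assumed parameters (our reading of the secretary example).
P_GOOD, P_BAD = 0.2, 0.8
MU_GOOD, SIGMA_GOOD = 85.0, 5.0
MU_BAD, SIGMA_BAD = 40.0, 13.0
COST_BAD = 20.0   # lambda(hire, w_b) - lambda(reject, w_b)
COST_GOOD = 5.0   # lambda(reject, w_g) - lambda(hire, w_g)


def normal_pdf(x, mu, sigma):
    """Gaussian density N(mu, sigma) evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


def g(x):
    """Positive where hiring has the lower conditional risk (p(x) cancels)."""
    good = COST_GOOD * P_GOOD * normal_pdf(x, MU_GOOD, SIGMA_GOOD)
    bad = COST_BAD * P_BAD * normal_pdf(x, MU_BAD, SIGMA_BAD)
    return good - bad


def bisection(f, lo, hi, tol=1e-9):
    """Find a root of f in [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    flo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return (lo + hi) / 2


threshold = bisection(g, 60.0, 100.0)
print(round(threshold, 2))  # about 75.3 under these assumptions
```

Under these assumptions, grades above the printed threshold (up to 100) are the region where hiring minimizes the risk.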

19 The Bayesian Doctor Example. A person doesn't feel well and goes to the doctor. Assume two states of nature: $\omega_1$: the person has a common flu; $\omega_2$: the person is really sick (a vicious bacterial infection). The doctor's prior is $p(\omega_1) = 0.9$, $p(\omega_2) = 0.1$. This doctor has two possible actions: prescribe hot tea or antibiotics. The doctor can use the prior alone and predict optimally: always flu. Therefore the doctor will always prescribe hot tea.

20 The Bayesian Doctor - Cntd. But there is a very high risk: although this doctor can diagnose with a very high rate of success using the prior, she can lose a patient once in a while. Denote the two possible actions: $a_1$ = prescribe hot tea, $a_2$ = prescribe antibiotics. Now assume the following cost (loss) matrix:

$\lambda_{i,j}$   $\omega_1$   $\omega_2$
$a_1$             0            10
$a_2$             1            0

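21 The Bayesian Doctor - Cntd. Choosing $a_1$ results in the expected risk

$$R(a_1) = p(\omega_1)\,\lambda_{1,1} + p(\omega_2)\,\lambda_{1,2} = 0.9\cdot 0 + 0.1\cdot 10 = 1.$$

Choosing $a_2$ results in the expected risk

$$R(a_2) = p(\omega_1)\,\lambda_{2,1} + p(\omega_2)\,\lambda_{2,2} = 0.9\cdot 1 + 0.1\cdot 0 = 0.9.$$

So, considering the costs, it's much better (and optimal!) to always give antibiotics.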
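The two prior-only risks can be reproduced directly, as a sketch. The loss entries (10 for tea when the patient has the infection, 1 for antibiotics when it is only flu) are our reading of the cost matrix and are assumptions; `expected_risk` is a hypothetical helper.

```python
# Priors p(w1) = 0.9 (flu) and p(w2) = 0.1 (bacterial infection).
priors = [0.9, 0.1]
# Assumed loss matrix: loss[i][j] = lambda(a_{i+1}, w_{j+1}).
loss = [
    [0, 10],  # a1: prescribe hot tea
    [1, 0],   # a2: prescribe antibiotics
]


def expected_risk(i):
    """R(a_i) = sum_j lambda(a_i, w_j) p(w_j), using the prior only."""
    return sum(loss[i][j] * priors[j] for j in range(len(priors)))


risks = [expected_risk(i) for i in range(len(loss))]
best = min(range(len(loss)), key=lambda i: risks[i])
print([round(r, 4) for r in risks], f"a{best + 1}")  # [1.0, 0.9] a2
```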

22 The Bayesian Doctor - Cntd. But doctors can do more. For example, they can take some observations. A reasonable observation is to perform a blood test. Suppose the possible results of the blood test are: $-$ (negative: no bacterial infection) and $+$ (positive: infection). But blood tests can often fail. Suppose $p(+\mid\omega_1) = 0.3$ and $p(-\mid\omega_1) = 0.7$, together with the corresponding class-conditional probabilities $p(+\mid\omega_2)$ and $p(-\mid\omega_2)$ for the infection.

23 The Bayesian Doctor - Cntd. Define the conditional risk given the observation:

$$R(a_i\mid x) = \sum_j p(\omega_j\mid x)\,\lambda_{i,j}$$

We would like to compute the conditional risk for each action and observation, so that the doctor can choose an optimal action that minimizes risk. How can we compute $p(\omega_j\mid x)$? We use the class-conditional probabilities and Bayes' inversion rule.

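24 The Bayesian Doctor - Cntd. Let's first calculate $p(-)$ and $p(+)$:

$$p(-) = p(-\mid\omega_1)\,p(\omega_1) + p(-\mid\omega_2)\,p(\omega_2) = 0.7\cdot 0.9 + p(-\mid\omega_2)\cdot 0.1$$

$p(+)$ is complementary to $p(-)$, so $p(+) = 1 - p(-)$.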

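25 The Bayesian Doctor - Cntd.

$$R(a_1\mid +) = p(\omega_1\mid +)\,\lambda_{1,1} + p(\omega_2\mid +)\,\lambda_{1,2} = 0 + 10\,p(\omega_2\mid +) = 10\,\frac{p(+\mid\omega_2)\,p(\omega_2)}{p(+)}$$

$$R(a_2\mid +) = p(\omega_1\mid +)\,\lambda_{2,1} + p(\omega_2\mid +)\,\lambda_{2,2} = p(\omega_1\mid +) + 0 = \frac{p(+\mid\omega_1)\,p(\omega_1)}{p(+)}$$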

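26 The Bayesian Doctor - Cntd.

$$R(a_1\mid -) = p(\omega_1\mid -)\,\lambda_{1,1} + p(\omega_2\mid -)\,\lambda_{1,2} = 10\,\frac{p(-\mid\omega_2)\,p(\omega_2)}{p(-)}$$

$$R(a_2\mid -) = p(\omega_1\mid -)\,\lambda_{2,1} + p(\omega_2\mid -)\,\lambda_{2,2} = \frac{p(-\mid\omega_1)\,p(\omega_1)}{p(-)}$$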

27 The Bayesian Doctor - Cntd. To summarize: $R(a_2\mid +) < R(a_1\mid +)$ and $R(a_1\mid -) < R(a_2\mid -)$. Whenever we encounter an observation, we can minimize the expected loss by minimizing the conditional risk. This makes sense: the doctor chooses hot tea if the blood test is negative, and antibiotics otherwise.

28 Optimal Bayes Decision Strategies. A strategy or decision function $\alpha(x)$ is a mapping from observations to actions. The total risk of a decision function is given by

$$E_{p(x)}[R(\alpha(x)\mid x)] = \sum_x p(x)\,R(\alpha(x)\mid x)$$

A decision function is optimal if it minimizes the total risk. This optimal total risk is called the Bayes risk. In the Bayesian doctor example: total risk if the doctor always gives antibiotics: 0.9; Bayes risk: 0.48. How have we got it?
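The same recipe is sketched below on the course example: the total risk of a decision function is $\sum_x p(x)\,R(\alpha(x)\mid x)$, and the Bayes risk is obtained by choosing, for every observation, the action with the smallest conditional risk. The course loss matrix (including the entry 20 for dropping a good course) is again an assumption, and all function names are hypothetical.

```python
# Course example data; the 20 in the loss table is an assumed reading.
priors = {"good": 0.2, "fair": 0.4, "bad": 0.4}
likelihood = {
    "interesting": {"good": 0.8, "fair": 0.5, "bad": 0.1},
    "boring": {"good": 0.2, "fair": 0.5, "bad": 0.9},
}
loss = {
    "take": {"good": 0, "fair": 5, "bad": 10},
    "drop": {"good": 20, "fair": 5, "bad": 0},
}


def evidence(x):
    """p(x) by total probability."""
    return sum(likelihood[x][w] * priors[w] for w in priors)


def conditional_risk(action, x):
    """R(action | x) = sum_j lambda(action | w_j) p(x | w_j) p(w_j) / p(x)."""
    return sum(loss[action][w] * likelihood[x][w] * priors[w] for w in priors) / evidence(x)


def total_risk(alpha):
    """Total risk of a decision function alpha: sum_x p(x) R(alpha(x) | x)."""
    return sum(evidence(x) * conditional_risk(alpha[x], x) for x in likelihood)


always_take = {x: "take" for x in likelihood}
# Bayes decision function: minimize the conditional risk for each observation.
bayes_rule = {x: min(loss, key=lambda a: conditional_risk(a, x)) for x in likelihood}

print(round(total_risk(always_take), 4))  # 6.0
print(bayes_rule, round(total_risk(bayes_rule), 4))
# {'interesting': 'take', 'boring': 'drop'} 3.2
```

No decision function can beat the Bayes rule here, because it already picks the cheapest action separately for each observation.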
