Pattern Classification


Millicent Stokes
5 years ago

1 Pattern Classification. All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.

2 Chapter 2 (Part 1): Bayesian Decision Theory (Sections ). Topics: Introduction; Bayesian Decision Theory: Continuous Features.

3 Introduction
The sea bass/salmon example.
State of nature, prior: the state of nature is a random variable.
The catch of salmon and sea bass is equiprobable: $P(\omega_1) = P(\omega_2)$ (uniform priors).
$P(\omega_1) + P(\omega_2) = 1$ (exclusivity and exhaustivity).

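4 Decision rule with only the prior information
Decide $\omega_1$ if $P(\omega_1) > P(\omega_2)$; otherwise decide $\omega_2$.
Problem: the rule makes the same decision every time. If $P(\omega_1) \gg P(\omega_2)$, it is correct most of the time; if $P(\omega_1) = P(\omega_2)$, it has only a 50% chance of being correct.
Probability of error?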
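As a minimal sketch of the prior-only rule, the function below always picks the most probable class and reports the resulting probability of error, which is one minus the largest prior. The function name and the example priors are illustrative assumptions, not part of the slides.

```python
def decide_from_priors(priors):
    """Pick the class with the largest prior.

    Returns (class index, probability of error). The index is 0-based,
    so index 0 corresponds to w1 and index 1 to w2.
    """
    best = max(range(len(priors)), key=lambda j: priors[j])
    # Every observation gets the same label, so we are wrong whenever
    # the true state of nature is any other class.
    return best, 1.0 - priors[best]


# Hypothetical priors: one skewed case and the equiprobable case.
skewed_choice, skewed_error = decide_from_priors([0.9, 0.1])
uniform_choice, uniform_error = decide_from_priors([0.5, 0.5])
print(skewed_choice, skewed_error)
print(uniform_choice, uniform_error)
```

With two classes the error equals the smaller of the two priors, which is why the equiprobable case is no better than a coin flip.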

5 Use of the class-conditional information
Suppose $x$ is the observed lightness. $P(x \mid \omega_1)$ and $P(x \mid \omega_2)$ describe the difference in lightness between the populations of sea bass and salmon.

6 Likelihood

7 Posterior, likelihood, evidence
Bayes formula:
$$P(\omega_j \mid x) = \frac{P(x \mid \omega_j)\, P(\omega_j)}{P(x)}$$
where, in the case of two categories,
$$P(x) = \sum_{j=1}^{2} P(x \mid \omega_j)\, P(\omega_j)$$
Posterior = (Likelihood × Prior) / Evidence
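The Bayes formula can be sketched in a few lines of Python. The likelihood values below are hypothetical numbers standing in for the class-conditional densities evaluated at one observed lightness; they do not come from the slides.

```python
def posteriors(likelihoods, priors):
    """Apply Bayes' formula: posterior_j = likelihood_j * prior_j / evidence."""
    joint = [lik * prior for lik, prior in zip(likelihoods, priors)]
    evidence = sum(joint)  # P(x) = sum over j of P(x | w_j) P(w_j)
    return [value / evidence for value in joint]


# Hypothetical likelihoods at one observed x, with uniform priors.
post = posteriors(likelihoods=[0.6, 0.2], priors=[0.5, 0.5])
print(post)  # roughly [0.75, 0.25]
```

The evidence only rescales the products so that the posteriors sum to one; it does not change which class has the larger posterior.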

9 Decision given the posterior probabilities
$x$ is an observation for which:
if $P(\omega_1 \mid x) > P(\omega_2 \mid x)$, true state of nature = $\omega_1$;
if $P(\omega_1 \mid x) < P(\omega_2 \mid x)$, true state of nature = $\omega_2$.
Therefore, whenever we observe a particular $x$, the probability of error is:
$P(\text{error} \mid x) = P(\omega_1 \mid x)$ if we decide $\omega_2$;
$P(\text{error} \mid x) = P(\omega_2 \mid x)$ if we decide $\omega_1$.

10 Minimizing the probability of error
Bayes decision (minimize the probability of error): decide $\omega_1$ if $P(\omega_1 \mid x) > P(\omega_2 \mid x)$; otherwise decide $\omega_2$.
Therefore: $P(\text{error} \mid x) = \min[P(\omega_1 \mid x), P(\omega_2 \mid x)]$
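The slide states the rule without the step that links the per-observation error to the overall error. A short sketch of that standard argument, for a scalar feature $x$ and writing $P(x)$ for the evidence as in the Bayes formula:

```latex
P(\text{error})
  = \int_{-\infty}^{\infty} P(\text{error} \mid x)\, P(x)\, dx ,
\qquad
P(\text{error} \mid x) \ge \min\bigl[ P(\omega_1 \mid x),\, P(\omega_2 \mid x) \bigr]
\ \text{for any rule.}
```

Since $P(x) \ge 0$, choosing the decision that makes $P(\text{error} \mid x)$ as small as possible for every $x$ makes the integral as small as possible.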

11 Bayesian Decision Theory: Continuous Features
Generalization of the preceding ideas:
Use of more than one feature.
Use of more than two states of nature.
Allowing actions, and not only deciding on the state of nature.
Introducing a loss function, which is more general than the probability of error.

12 Allowing actions other than classification primarily allows the possibility of rejection: refusing to make a decision in close or bad cases.
The loss function states how costly each action taken is.

13 Let $\{\omega_1, \omega_2, \ldots, \omega_c\}$ be the set of $c$ states of nature (or "categories").
Let $\{\alpha_1, \alpha_2, \ldots, \alpha_a\}$ be the set of possible actions.
Let $\lambda(\alpha_i \mid \omega_j)$ be the loss incurred for taking action $\alpha_i$ when the state of nature is $\omega_j$.

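14 Overall risk
$R$ = sum of all $R(\alpha_i \mid x)$ for $i = 1, \ldots, a$.
Minimizing $R$ $\Leftrightarrow$ minimizing the conditional risk $R(\alpha_i \mid x)$ for $i = 1, \ldots, a$:
$$R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid x) \quad \text{for } i = 1, \ldots, a$$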
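The conditional risk is a weighted sum of losses, one per action. The sketch below computes it for every action; the zero-one loss matrix and the posterior values are hypothetical inputs chosen for illustration.

```python
def conditional_risks(loss, posterior):
    """Return R(a_i | x) for each action a_i.

    loss[i][j] is the loss for taking action a_i when the state of nature
    is w_j; posterior[j] is P(w_j | x).
    """
    return [
        sum(l_ij * p_j for l_ij, p_j in zip(row, posterior))
        for row in loss
    ]


# Hypothetical zero-one loss: no cost when correct, unit cost when wrong.
zero_one_loss = [[0.0, 1.0],
                 [1.0, 0.0]]
risks = conditional_risks(zero_one_loss, posterior=[0.75, 0.25])
print(risks)  # [0.25, 0.75]
```

With a zero-one loss, the risk of deciding one class is just the posterior probability of the other, which recovers the probability-of-error rule as a special case.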

15 Select the action $\alpha_i$ for which $R(\alpha_i \mid x)$ is minimum.
Then $R$ is minimum, and $R$ in this case is called the Bayes risk: the best performance that can be achieved.
Bayes decision rule: minimize the overall risk.
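A minimal sketch of selecting the minimum-risk action, including a rejection action of the kind mentioned earlier. The loss values, in particular the flat cost of 0.3 for rejecting, are hypothetical and chosen only to show both outcomes.

```python
def min_risk_action(loss, posterior):
    """Return (index of the minimum-risk action, its conditional risk)."""
    risks = [sum(l * p for l, p in zip(row, posterior)) for row in loss]
    best = min(range(len(risks)), key=lambda i: risks[i])
    return best, risks[best]


ACTIONS = ["decide w1", "decide w2", "reject"]
# Hypothetical losses: rows are actions, columns are true states w1, w2.
LOSS = [[0.0, 1.0],   # decide w1
        [1.0, 0.0],   # decide w2
        [0.3, 0.3]]   # reject: fixed cost whatever the true state is

close_case = min_risk_action(LOSS, [0.6, 0.4])  # posteriors nearly tied
clear_case = min_risk_action(LOSS, [0.9, 0.1])  # one class dominates
print(ACTIONS[close_case[0]], ACTIONS[clear_case[0]])
```

Under these assumed costs, rejection wins only when neither posterior is high enough to push the risk of deciding below 0.3.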

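16 Two-category classification
$\alpha_1$: deciding $\omega_1$
$\alpha_2$: deciding $\omega_2$
$\lambda_{ij} = \lambda(\alpha_i \mid \omega_j)$: loss incurred for deciding $\omega_i$ when the true state of nature is $\omega_j$.
Conditional risk:
$$R(\alpha_1 \mid x) = \lambda_{11} P(\omega_1 \mid x) + \lambda_{12} P(\omega_2 \mid x)$$
$$R(\alpha_2 \mid x) = \lambda_{21} P(\omega_1 \mid x) + \lambda_{22} P(\omega_2 \mid x)$$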

17 Our rule is the following: if $R(\alpha_1 \mid x) < R(\alpha_2 \mid x)$, action $\alpha_1$ (decide $\omega_1$) is taken.
This results in the equivalent rule: decide $\omega_1$ if
$$(\lambda_{21} - \lambda_{11})\, P(x \mid \omega_1)\, P(\omega_1) > (\lambda_{12} - \lambda_{22})\, P(x \mid \omega_2)\, P(\omega_2)$$
and decide $\omega_2$ otherwise.
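The step from the risk comparison to the equivalent rule is not spelled out on the slide. A sketch of the intermediate algebra, using the two conditional risks for the two-category case and the Bayes formula:

```latex
\begin{aligned}
R(\alpha_1 \mid x) &< R(\alpha_2 \mid x) \\
\lambda_{11} P(\omega_1 \mid x) + \lambda_{12} P(\omega_2 \mid x)
  &< \lambda_{21} P(\omega_1 \mid x) + \lambda_{22} P(\omega_2 \mid x) \\
(\lambda_{21} - \lambda_{11})\, P(\omega_1 \mid x)
  &> (\lambda_{12} - \lambda_{22})\, P(\omega_2 \mid x) \\
(\lambda_{21} - \lambda_{11})\, P(x \mid \omega_1)\, P(\omega_1)
  &> (\lambda_{12} - \lambda_{22})\, P(x \mid \omega_2)\, P(\omega_2)
\end{aligned}
```

The last line substitutes $P(\omega_j \mid x) = P(x \mid \omega_j) P(\omega_j) / P(x)$ and multiplies both sides by $P(x) > 0$, which leaves the direction of the inequality unchanged.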

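18 Likelihood ratio
The preceding rule is equivalent to the following rule: if
$$\frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} > \frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)}$$
then take action $\alpha_1$ (decide $\omega_1$); otherwise take action $\alpha_2$ (decide $\omega_2$).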
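The likelihood-ratio rule can be sketched as a single comparison against a threshold that does not depend on $x$. The sketch assumes $\lambda_{21} > \lambda_{11}$ (a wrong decision costs more than a correct one), which is needed to divide without flipping the inequality; the function name and the numeric inputs are illustrative.

```python
def likelihood_ratio_decide(lik1, lik2, prior1, prior2, loss):
    """Return 1 to decide w1 or 2 to decide w2 using the likelihood-ratio rule.

    lik1, lik2: class-conditional likelihoods P(x | w1), P(x | w2), lik2 > 0.
    loss: 2x2 matrix [[l11, l12], [l21, l22]].
    """
    (l11, l12), (l21, l22) = loss
    if l21 <= l11:
        raise ValueError("this sketch assumes l21 > l11")
    # The threshold depends only on the losses and priors, not on x.
    threshold = (l12 - l22) / (l21 - l11) * (prior2 / prior1)
    return 1 if lik1 / lik2 > threshold else 2


# Hypothetical likelihoods with ratio 3, uniform priors.
symmetric = likelihood_ratio_decide(0.6, 0.2, 0.5, 0.5, [[0, 1], [1, 0]])
costly_w1 = likelihood_ratio_decide(0.6, 0.2, 0.5, 0.5, [[0, 5], [1, 0]])
print(symmetric, costly_w1)  # 1 2
```

Raising $\lambda_{12}$, the cost of deciding $\omega_1$ when the truth is $\omega_2$, raises the threshold, so the same evidence that favored $\omega_1$ under symmetric losses no longer suffices.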

19 Optimal decision property
If the likelihood ratio exceeds a threshold value that is independent of the input pattern $x$, we can take optimal actions.

20 Minimax Criterion

21 Exercise
Select the optimal decision where:
$\Omega = \{\omega_1, \omega_2\}$
$P(x \mid \omega_1)$, $P(x \mid \omega_2)$: N(1.5, 0.2), N(2, 0.5) (Normal distribution)
$P(\omega_1) = 2/3$, $P(\omega_2) = 1/3$
$\lambda$ =

22 Soln.
$P(x \mid \omega_1)$, $P(x \mid \omega_2)$, $P(x \mid \omega_1) / P(x \mid \omega_2)$
N(2, 0.5), N(1.5, 0.2)
$P(\omega_1) = 2/3$, $P(\omega_2) = 1/3$
$\lambda$ =
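The extracted slide text loses the loss matrix and does not make clear which normal density belongs to which class, so the exercise cannot be reproduced exactly. The sketch below shows only the mechanics of a minimum-risk decision under three loudly stated assumptions: $P(x \mid \omega_1)$ is N(2, 0.5) and $P(x \mid \omega_2)$ is N(1.5, 0.2); the second parameter is a variance; and $\lambda$ is a hypothetical zero-one loss. Only the priors 2/3 and 1/3 are taken from the slide.

```python
import math


def normal_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


# ASSUMPTIONS (not recoverable from the slide text):
#   class pairing, variance interpretation, and the loss matrix.
PARAMS = [(2.0, 0.5), (1.5, 0.2)]   # (mean, variance) for w1, w2
PRIORS = [2.0 / 3.0, 1.0 / 3.0]     # from the slide
LOSS = [[0.0, 1.0],                 # hypothetical zero-one loss
        [1.0, 0.0]]


def decide(x):
    """Return 1 or 2: the class whose action has minimum conditional risk."""
    joint = [normal_pdf(x, m, v) * p for (m, v), p in zip(PARAMS, PRIORS)]
    evidence = sum(joint)
    posterior = [value / evidence for value in joint]
    risks = [sum(l * p for l, p in zip(row, posterior)) for row in LOSS]
    return 1 + min(range(len(risks)), key=lambda i: risks[i])


for x in (1.4, 3.0):
    print(x, "-> decide w%d" % decide(x))
```

Swapping in the actual loss matrix or the other class pairing only requires changing `LOSS` or `PARAMS`; the decision logic stays the same.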
