Pattern Classification

Pattern Classification. All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart, and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.

Chapter 2 (Part 1): Bayesian Decision Theory (Sections 2.1-2.2). Introduction. Bayesian Decision Theory: Continuous Features.

2 Introduction. The sea bass/salmon example. State of nature, prior: the state of nature is a random variable. The catch of salmon and sea bass is equiprobable: P(ω1) = P(ω2) (uniform priors), and P(ω1) + P(ω2) = 1 (exclusivity and exhaustivity).

3 Decision rule with only the prior information: decide ω1 if P(ω1) > P(ω2); otherwise decide ω2. Problem: if P(ω1) >> P(ω2) we are correct most of the time, but if P(ω1) = P(ω2) we have only a 50% chance of being correct. What is the probability of error?
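As a minimal sketch of this prior-only rule (function and variable names are ours, not from the slides):

```python
# Prior-only decision rule: always choose the class with the larger prior.
# Minimal sketch; names are illustrative, not from the slides.
def decide_from_priors(p_w1, p_w2):
    return "w1" if p_w1 > p_w2 else "w2"

# The rule errs whenever the other class actually occurs, so its
# probability of error is min(P(w1), P(w2)).
print(decide_from_priors(0.7, 0.3))  # -> w1
print(min(0.7, 0.3))                 # -> 0.3, the probability of error
```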

4 Use of the class-conditional information. Suppose x is the observed lightness. P(x | ω1) and P(x | ω2) describe the difference in lightness between the populations of sea bass and salmon.

5 Likelihood

6 Posterior, likelihood, evidence. Bayes' formula: P(ωj | x) = P(x | ωj) P(ωj) / P(x), where in the case of two categories the evidence is P(x) = P(x | ω1) P(ω1) + P(x | ω2) P(ω2). In words: Posterior = (Likelihood × Prior) / Evidence.
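A small hedged sketch of Bayes' formula for two categories (names are ours):

```python
# Bayes' formula, two categories:
#   P(w_j | x) = P(x | w_j) P(w_j) / P(x),
#   P(x) = P(x | w1) P(w1) + P(x | w2) P(w2)   (the evidence)
def posteriors(lik1, lik2, prior1, prior2):
    evidence = lik1 * prior1 + lik2 * prior2
    return lik1 * prior1 / evidence, lik2 * prior2 / evidence

# Example: equal priors, likelihoods 0.6 and 0.2 at the observed x.
p1, p2 = posteriors(0.6, 0.2, 0.5, 0.5)
print(p1, p2)  # -> 0.75 0.25 (posteriors sum to 1 by construction)
```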

8 Decision given the posterior probabilities. x is an observation for which: if P(ω1 | x) > P(ω2 | x), the true state of nature = ω1; if P(ω1 | x) < P(ω2 | x), the true state of nature = ω2. Therefore, whenever we observe a particular x, the probability of error is: P(error | x) = P(ω1 | x) if we decide ω2, and P(error | x) = P(ω2 | x) if we decide ω1.

9 Minimizing the probability of error. Bayes decision rule (minimize the probability of error): decide ω1 if P(ω1 | x) > P(ω2 | x); otherwise decide ω2. Therefore: P(error | x) = min[P(ω1 | x), P(ω2 | x)].
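Continuing the sketch above, the minimum-error-rate decision and its conditional error:

```python
# Minimum-error-rate decision: pick the larger posterior;
# P(error | x) is then the smaller posterior.
def bayes_decide(post1, post2):
    decision = "w1" if post1 > post2 else "w2"
    return decision, min(post1, post2)  # (decision, P(error | x))

print(bayes_decide(0.75, 0.25))  # -> ('w1', 0.25)
```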

10 Bayesian Decision Theory: Continuous Features. Generalization of the preceding ideas: use of more than one feature; use of more than two states of nature; allowing actions, not only deciding on the state of nature; introducing a loss function that is more general than the probability of error.

11 Allowing actions other than classification primarily allows the possibility of rejection: refusing to make a decision in close or doubtful cases. The loss function states how costly each action is.

12 Let {ω1, ω2, …, ωc} be the set of c states of nature (or "categories"). Let {α1, α2, …, αa} be the set of possible actions. Let λ(αi | ωj) be the loss incurred for taking action αi when the state of nature is ωj.

13 Conditional risk. The conditional risk of taking action αi given observation x is R(αi | x) = Σj λ(αi | ωj) P(ωj | x), summing over j = 1, …, c, for i = 1, …, a. The overall risk R is the expected loss incurred by the decision rule over all x; minimizing the conditional risk R(αi | x) at every x minimizes R.
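A sketch of the conditional-risk computation for c states and a actions; the loss-matrix layout (actions in rows, true states in columns) is our convention:

```python
# Conditional risk: R(a_i | x) = sum_j lambda(a_i | w_j) * P(w_j | x).
# loss[i][j] = loss for taking action a_i when the true state is w_j.
def conditional_risks(loss, posts):
    return [sum(l * p for l, p in zip(row, posts)) for row in loss]

# With 0-1 loss, minimizing risk reduces to minimum error rate.
loss = [[0.0, 1.0],   # action a1 (decide w1)
        [1.0, 0.0]]   # action a2 (decide w2)
risks = conditional_risks(loss, [0.75, 0.25])
best = risks.index(min(risks))
print(risks, "-> take action a%d" % (best + 1))  # [0.25, 0.75] -> a1
```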

14 Select the action αi for which R(αi | x) is minimum. Then the overall risk R is minimized, and R in this case is called the Bayes risk: the best performance that can be achieved. Bayes decision rule: minimize the overall risk.

15 Two-category classification. α1: deciding ω1; α2: deciding ω2. λij = λ(αi | ωj) is the loss incurred for deciding ωi when the true state of nature is ωj. Conditional risk: R(α1 | x) = λ11 P(ω1 | x) + λ12 P(ω2 | x); R(α2 | x) = λ21 P(ω1 | x) + λ22 P(ω2 | x).

16 Our rule is the following: if R(α1 | x) < R(α2 | x), take action α1 (decide ω1). Substituting the conditional risks and rewriting the posteriors with Bayes' formula (the positive factor 1/P(x) cancels), this yields the equivalent rule: decide ω1 if (λ21 − λ11) P(x | ω1) P(ω1) > (λ12 − λ22) P(x | ω2) P(ω2), and decide ω2 otherwise.

17 Likelihood ratio. The preceding rule is equivalent to the following: if P(x | ω1) / P(x | ω2) > [(λ12 − λ22) / (λ21 − λ11)] · [P(ω2) / P(ω1)], then take action α1 (decide ω1); otherwise take action α2 (decide ω2).

18 Optimal decision property. If the likelihood ratio exceeds a threshold value that is independent of the input pattern x, we can take the optimal action.
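The likelihood-ratio rule as a sketch (λ entries passed in the order λ11, λ12, λ21, λ22; this assumes λ21 > λ11 so the division is safe):

```python
# Decide w1 iff P(x|w1)/P(x|w2) > [(l12 - l22)/(l21 - l11)] * [P(w2)/P(w1)].
# The threshold is independent of x, which is the optimality property above.
def lr_decide(lik1, lik2, prior1, prior2, l11, l12, l21, l22):
    threshold = (l12 - l22) / (l21 - l11) * (prior2 / prior1)
    return "w1" if lik1 / lik2 > threshold else "w2"

print(lr_decide(0.6, 0.2, 0.5, 0.5, 0.0, 1.0, 1.0, 0.0))  # -> w1
```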

19 Minimax Criterion

20 Exercise. Select the optimal decision where: Ω = {ω1, ω2}; P(x | ω1) ~ N(2, 0.5) and P(x | ω2) ~ N(1.5, 0.2) (normal distributions); P(ω1) = 2/3, P(ω2) = 1/3; and the loss matrix is λ = [1 2; 3 4] (λ11 = 1, λ12 = 2, λ21 = 3, λ22 = 4).

21 Solution. With P(x | ω1) ~ N(2, 0.5), P(x | ω2) ~ N(1.5, 0.2), P(ω1) = 2/3, P(ω2) = 1/3, and λ = [1 2; 3 4]: form the likelihood ratio P(x | ω1) / P(x | ω2) and compare it with the threshold [(λ12 − λ22) / (λ21 − λ11)] · [P(ω2) / P(ω1)] from slide 17.
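A numeric sketch of the exercise, assuming the second parameter of N(·, ·) is the variance and reading the loss matrix as λ11 = 1, λ12 = 2, λ21 = 3, λ22 = 4 (both are our assumptions, since the transcription is ambiguous):

```python
import math

# Gaussian density; N(mu, var) read with the variance as second parameter.
def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

prior1, prior2 = 2 / 3, 1 / 3
l11, l12, l21, l22 = 1.0, 2.0, 3.0, 4.0  # assumed ordering of lambda

# Threshold from slide 17:
threshold = (l12 - l22) / (l21 - l11) * (prior2 / prior1)
print(threshold)  # -> -0.5; a likelihood ratio is always positive,
                  #    so under these assumptions we always decide w1

x = 1.8  # an arbitrary observation
ratio = normal_pdf(x, 2.0, 0.5) / normal_pdf(x, 1.5, 0.2)
print(ratio, "-> decide w1" if ratio > threshold else "-> decide w2")
```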