FIGURE 2.1. Hypothetical class-conditional probability density functions show the probability density of measuring a particular feature value x given the pattern is in category ωi. If x represents the lightness of a fish, the two curves might describe the difference in lightness of populations of two types of fish. Density functions are normalized, and thus the area under each curve is 1.0. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.

FIGURE 2.2. Posterior probabilities for the particular priors P(ω1) = 2/3 and P(ω2) = 1/3 for the class-conditional probability densities shown in Fig. 2.1. Thus in this case, given that a pattern is measured to have feature value x = 14, the probability it is in category ω2 is roughly 0.08, and that it is in ω1 is 0.92. At every x, the posteriors sum to 1.0. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
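
A minimal sketch (not from the book) of how the posteriors in Fig. 2.2 follow from Bayes' rule. The two class-conditional densities below are stand-in Gaussians, so the numerical values differ from the figure, but the priors P(ω1) = 2/3 and P(ω2) = 1/3 are the ones used there.

```python
import numpy as np
from scipy.stats import norm

# Stand-in class-conditional densities p(x|w1), p(x|w2); the curves in
# Fig. 2.1 are hypothetical, so any two normalized densities illustrate the point.
p1 = norm(loc=12.0, scale=2.0).pdf   # assumed p(x|w1)
p2 = norm(loc=11.0, scale=1.0).pdf   # assumed p(x|w2)
P1, P2 = 2/3, 1/3                    # priors from Fig. 2.2

def posteriors(x):
    """Bayes' rule: P(wi|x) = p(x|wi) P(wi) / p(x)."""
    joint = np.array([p1(x) * P1, p2(x) * P2])
    return joint / joint.sum()       # dividing by the evidence p(x)

post = posteriors(14.0)
print(post, post.sum())              # the two posteriors always sum to 1.0
```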

FIGURE 2.3. The likelihood ratio p(x|ω1)/p(x|ω2) for the distributions shown in Fig. 2.1. If we employ a zero-one or classification loss, our decision boundaries are determined by the threshold θa. If our loss function penalizes miscategorizing ω2 as ω1 patterns more than the converse, we get the larger threshold θb, and hence R1 becomes smaller. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
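
A hedged sketch of the decision rule behind Fig. 2.3: decide ω1 when the likelihood ratio exceeds a threshold set by the priors and the loss matrix. The zero-one loss plays the role of θa; making the cost of calling an ω2 pattern ω1 larger raises the threshold, as with θb. The densities are stand-ins, not the ones in the figure.

```python
import numpy as np
from scipy.stats import norm

p1 = norm(12.0, 2.0).pdf   # stand-in p(x|w1)
p2 = norm(11.0, 1.0).pdf   # stand-in p(x|w2)
P1, P2 = 0.5, 0.5          # assumed priors

def threshold(l11, l12, l21, l22):
    """Two-category likelihood-ratio rule: decide w1 iff p(x|w1)/p(x|w2) > theta,
    where l_ij is the loss for choosing w_i when the true state is w_j."""
    return (l12 - l22) / (l21 - l11) * P2 / P1

theta_a = threshold(0, 1, 1, 0)   # zero-one loss
theta_b = threshold(0, 5, 1, 0)   # penalize calling an w2 pattern w1 more heavily

def decide(x, theta):
    return 1 if p1(x) / p2(x) > theta else 2

print(theta_a, theta_b, decide(13.0, theta_a), decide(13.0, theta_b))
```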

FIGURE 2.4. The curve at the bottom shows the minimum (Bayes) error as a function of prior probability P(ω1) in a two-category classification problem of fixed distributions. For each value of the priors (e.g., P(ω1) = 0.25) there is a corresponding optimal decision boundary and associated Bayes error rate. For any (fixed) such boundary, if the priors are then changed, the probability of error will change as a linear function of P(ω1) (shown by the dashed line). The maximum such error will occur at an extreme value of the prior, here at P(ω1) = 1. To minimize the maximum of such error, we should design our decision boundary for the maximum Bayes error (here P(ω1) = 0.6), and thus the error will not change as a function of prior, as shown by the solid red horizontal line. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
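
A small numerical sketch of the two kinds of curve in Fig. 2.4, under assumed unit-variance Gaussian class-conditionals (the book does not specify the distributions): the Bayes error as a function of P(ω1), and the linear error of one fixed boundary. The minimax boundary is the one whose two conditional error rates are equal, so its error line is flat in the prior.

```python
import numpy as np
from scipy.stats import norm

p1, p2 = norm(0.0, 1.0), norm(2.0, 1.0)   # assumed class-conditionals

def bayes_error(P1, xs=np.linspace(-8.0, 10.0, 20001)):
    # P(error) = integral of min(P1*p(x|w1), P2*p(x|w2)) dx, done numerically
    integrand = np.minimum(P1 * p1.pdf(xs), (1 - P1) * p2.pdf(xs))
    return np.sum(integrand) * (xs[1] - xs[0])

def fixed_boundary_error(P1, xstar):
    # decide w1 for x < xstar; for fixed xstar the error is linear in P1
    return P1 * p1.sf(xstar) + (1 - P1) * p2.cdf(xstar)

priors = np.linspace(0.01, 0.99, 99)
bayes_curve = [bayes_error(P) for P in priors]              # lower (Bayes) curve
dashed_line = [fixed_boundary_error(P, 1.5) for P in priors]  # one fixed boundary

# Minimax: with equal variances the boundary at the midpoint of the means has
# equal conditional errors, so its error does not change with the prior.
print(fixed_boundary_error(0.1, 1.0), fixed_boundary_error(0.9, 1.0))
```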

FIGURE 2.5. The functional structure of a general statistical pattern classifier, which includes d inputs and c discriminant functions gi(x). A subsequent step determines which of the discriminant values is the maximum, and categorizes the input pattern accordingly. The arrows show the direction of the flow of information, though frequently the arrows are omitted when the direction of flow is self-evident. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
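
A minimal sketch of the structure in Fig. 2.5 for Gaussian class-conditionals (an assumption made here for concreteness): each discriminant gi(x) = ln p(x|ωi) + ln P(ωi) is evaluated on the input, and the pattern is assigned to the category whose discriminant is largest.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Assumed Gaussian class-conditional densities and priors for c = 3 categories.
means = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
covs = [np.eye(2), np.eye(2), 2.0 * np.eye(2)]
priors = [0.5, 0.3, 0.2]

def discriminants(x):
    """g_i(x) = ln p(x|w_i) + ln P(w_i), one value per category."""
    return np.array([multivariate_normal(m, S).logpdf(x) + np.log(P)
                     for m, S, P in zip(means, covs, priors)])

def classify(x):
    return int(np.argmax(discriminants(x)) + 1)   # 1-based category index

print(classify(np.array([2.0, 1.0])))
```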

FIGURE 2.6. In this two-dimensional two-category classifier, the probability densities are Gaussian, the decision boundary consists of two hyperbolas, and thus the decision region R1 is not simply connected. The ellipses mark where the density is 1/e times that at the peak of the distribution. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.

FIGURE 2.7. A univariate normal distribution has roughly 95% of its area in the range |x − µ| ≤ 2σ, as shown. The peak of the distribution has value p(µ) = 1/√(2πσ²). From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.

FIGURE 2.8. The action of a linear transformation on the feature space will convert an arbitrary normal distribution into another normal distribution. One transformation, A, takes the source distribution N(µ, Σ) into distribution N(A^t µ, A^t Σ A). Another linear transformation, a projection P onto a line defined by vector a, leads to N(µ, σ²) measured along that line. While the transforms yield distributions in a different space, we show them superimposed on the original x1x2-space. A whitening transform, Aw, leads to a circularly symmetric Gaussian, here shown displaced. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
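
A small sketch of the whitening transform Aw mentioned in Fig. 2.8: if Σ = Φ Λ Φ^t is the eigendecomposition of the covariance, then Aw = Φ Λ^(-1/2) maps N(µ, Σ) to a Gaussian with identity (circularly symmetric) covariance. The particular Σ below is just an example.

```python
import numpy as np

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])            # example covariance matrix
eigvals, Phi = np.linalg.eigh(Sigma)      # Sigma = Phi diag(eigvals) Phi^T
A_w = Phi @ np.diag(eigvals ** -0.5)      # whitening transform A_w = Phi Lambda^(-1/2)

print(A_w.T @ Sigma @ A_w)                # ~ identity: the transformed covariance

# Whitening samples: if x ~ N(mu, Sigma), then A_w^T x has identity covariance
# (and mean A_w^T mu), i.e. a circularly symmetric Gaussian.
rng = np.random.default_rng(0)
x = rng.multivariate_normal(mean=[1.0, 2.0], cov=Sigma, size=10000)
print(np.cov((x @ A_w).T))                # ~ identity, up to sampling noise
```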

FIGURE 2.9. Samples drawn from a two-dimensional Gaussian lie in a cloud centered on the mean µ. The ellipses show lines of equal probability density of the Gaussian. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.

FIGURE 2.10. If the covariance matrices for two distributions are equal and proportional to the identity matrix, then the distributions are spherical in d dimensions, and the boundary is a generalized hyperplane of d − 1 dimensions, perpendicular to the line separating the means. In these one-, two-, and three-dimensional examples, we indicate p(x|ωi) and the boundaries for the case P(ω1) = P(ω2). In the three-dimensional case, the grid plane separates R1 from R2. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
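
A hedged sketch of the boundary in Fig. 2.10 for the Σi = σ²I case: the discriminants reduce to a linear rule w^t(x − x0) with w along the line between the means, so the boundary is a hyperplane perpendicular to that line. The means, σ², and priors below are illustrative, not taken from the figure.

```python
import numpy as np

mu1, mu2 = np.array([0.0, 0.0]), np.array([3.0, 0.0])   # assumed means
sigma2 = 1.0                                             # shared variance, Sigma_i = sigma^2 I
P1, P2 = 0.5, 0.5                                        # assumed priors

w = mu1 - mu2                                            # normal to the hyperplane
x0 = (0.5 * (mu1 + mu2)
      - sigma2 / np.dot(mu1 - mu2, mu1 - mu2) * np.log(P1 / P2) * (mu1 - mu2))

def classify(x):
    """Decide w1 if w^t (x - x0) > 0, else w2; with equal priors the
    hyperplane passes through the midpoint of the means."""
    return 1 if w @ (x - x0) > 0 else 2

print(classify(np.array([1.0, 0.5])), classify(np.array([2.5, -1.0])))
```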

FIGURE 2.11. As the priors are changed, the decision boundary shifts; for sufficiently disparate priors the boundary will not lie between the means of these one-, two- and three-dimensional spherical Gaussian distributions. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.

FIGURE 2.12. Probability densities (indicated by the surfaces in two dimensions and ellipsoidal surfaces in three dimensions) and decision regions for equal but asymmetric Gaussian distributions. The decision hyperplanes need not be perpendicular to the line connecting the means. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.

FIGURE 2.13. Non-simply connected decision regions can arise in one dimension for Gaussians having unequal variance. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.

FIGURE 2.14. Arbitrary Gaussian distributions lead to Bayes decision boundaries that are general hyperquadrics. Conversely, given any hyperquadric, one can find two Gaussian distributions whose Bayes decision boundary is that hyperquadric. These variances are indicated by the contours of constant probability density. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
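
A minimal sketch of the general-covariance case behind Figs. 2.14-2.16: with arbitrary Σi, the discriminant gi(x) = -1/2 (x - µi)^t Σi^(-1) (x - µi) - 1/2 ln|Σi| + ln P(ωi) is quadratic in x, so the decision boundary gi(x) = gj(x) is a hyperquadric. The parameters below are illustrative.

```python
import numpy as np

def make_g(mu, Sigma, prior):
    """Quadratic discriminant for a Gaussian class-conditional density.
    The common -(d/2) ln(2*pi) term is dropped since it cancels across classes."""
    Sigma_inv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    def g(x):
        d = x - mu
        return -0.5 * d @ Sigma_inv @ d - 0.5 * logdet + np.log(prior)
    return g

# Two illustrative Gaussians with different covariances -> hyperquadric boundary.
g1 = make_g(np.array([0.0, 0.0]), np.array([[1.0, 0.0], [0.0, 4.0]]), 0.5)
g2 = make_g(np.array([2.0, 1.0]), np.array([[2.0, 0.5], [0.5, 1.0]]), 0.5)

def classify(x):
    return 1 if g1(x) > g2(x) else 2

print(classify(np.array([0.5, 0.5])), classify(np.array([2.0, 1.0])))
```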

FIGURE 2.15. Arbitrary three-dimensional Gaussian distributions yield Bayes decision boundaries that are two-dimensional hyperquadrics. There are even degenerate cases in which the decision boundary is a line. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.

FIGURE 2.16. The decision regions for four normal distributions. Even with such a low number of categories, the shapes of the boundary regions can be rather complex. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.

FIGURE 2.17. Components of the probability of error for equal priors and the (nonoptimal) decision point x*. The pink area corresponds to the probability of errors for deciding ω1 when the state of nature is in fact ω2; the gray area represents the converse, as given in Eq. 70. If the decision boundary is instead at the point of equal posterior probabilities, xB, then this reducible error is eliminated and the total shaded area is the minimum possible; this is the Bayes decision and gives the Bayes error rate. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
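
A numerical sketch of Fig. 2.17 under assumed unit-variance Gaussians and equal priors: the probability of error for a decision point x* is the sum of the two shaded integrals, and moving x* to the equal-posterior point xB removes the reducible error.

```python
import numpy as np
from scipy.stats import norm

p1, p2 = norm(0.0, 1.0), norm(2.0, 1.0)   # assumed p(x|w1), p(x|w2)
P1 = P2 = 0.5                             # equal priors

def error(xstar):
    """Decide w1 for x < xstar, w2 otherwise.
    P(error) = integral over R2 of p(x|w1)P1 dx + integral over R1 of p(x|w2)P2 dx."""
    return P1 * p1.sf(xstar) + P2 * p2.cdf(xstar)

x_B = 1.0        # equal posteriors: here the midpoint of the means
x_star = 0.4     # some nonoptimal decision point

print(error(x_star))                  # total shaded area for x*
print(error(x_B))                     # Bayes error rate (the minimum)
print(error(x_star) - error(x_B))     # the reducible error
```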

FIGURE 2.18. The Chernoff error bound is never looser than the Bhattacharyya bound. For this example, the Chernoff bound happens to be at β* = 0.66, and is slightly tighter than the Bhattacharyya bound (β = 0.5). From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
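
A sketch of the quantity plotted in Fig. 2.18 for two assumed Gaussians: k(β) is the Chernoff exponent, the bound P1^β P2^(1-β) e^(-k(β)) is minimized over β in (0, 1) to get the Chernoff bound, and β = 1/2 gives the Bhattacharyya bound.

```python
import numpy as np

mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])   # assumed means
S1 = np.array([[1.0, 0.3], [0.3, 1.0]])                 # assumed covariances
S2 = np.array([[2.0, -0.4], [-0.4, 0.5]])
P1, P2 = 0.5, 0.5

def k(beta):
    """Chernoff exponent for two Gaussians; P(error) <= P1^beta * P2^(1-beta) * exp(-k(beta))."""
    S = beta * S1 + (1 - beta) * S2
    dmu = mu2 - mu1
    quad = 0.5 * beta * (1 - beta) * dmu @ np.linalg.inv(S) @ dmu
    logdet = (np.linalg.slogdet(S)[1]
              - beta * np.linalg.slogdet(S1)[1]
              - (1 - beta) * np.linalg.slogdet(S2)[1])
    return quad + 0.5 * logdet

betas = np.linspace(0.01, 0.99, 99)
bounds = [P1**b * P2**(1 - b) * np.exp(-k(b)) for b in betas]
b_star = betas[int(np.argmin(bounds))]

print(b_star, min(bounds))                    # Chernoff bound at the tightest beta
print(np.sqrt(P1 * P2) * np.exp(-k(0.5)))     # Bhattacharyya bound (beta = 1/2)
```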

FIGURE 2.19. During any instant when no external pulse is present, the probability density for an internal signal is normal, that is, p(x|ω1) ~ N(µ1, σ²); when the external signal is present, the density is p(x|ω2) ~ N(µ2, σ²). Any decision threshold x* will determine the probability of a hit (the pink area under the ω2 curve, above x*) and of a false alarm (the black area under the ω1 curve, above x*). From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.

FIGURE 2.20. In a receiver operating characteristic (ROC) curve, the abscissa is the probability of false alarm, P(x > x* | x ∈ ω1), and the ordinate is the probability of hit, P(x > x* | x ∈ ω2). From the measured hit and false alarm rates (here corresponding to x* in Fig. 2.19 and shown as the red dot), we can deduce that d' = 3. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
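
A sketch of the quantities behind Figs. 2.19-2.20, assuming equal-variance Gaussian internal-signal densities: sweeping the threshold x* traces the ROC curve, and from a single (hit, false-alarm) pair the discriminability d' = (µ2 - µ1)/σ can be recovered through inverse normal CDFs.

```python
import numpy as np
from scipy.stats import norm

mu1, mu2, sigma = 0.0, 3.0, 1.0        # assumed; true d' = (mu2 - mu1)/sigma = 3

def rates(xstar):
    """Hit rate P(x > x*|w2) and false-alarm rate P(x > x*|w1)."""
    hit = norm(mu2, sigma).sf(xstar)
    fa = norm(mu1, sigma).sf(xstar)
    return hit, fa

# Recover d' from one measured operating point.
hit, fa = rates(1.7)
d_prime = norm.ppf(hit) - norm.ppf(fa)
print(hit, fa, d_prime)                # d_prime ~ 3 for any threshold

# Sweeping x* traces the ROC curve of Fig. 2.20.
roc = [rates(x) for x in np.linspace(-4.0, 8.0, 200)]
```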

FIGURE 2.21. In a general operating characteristic curve, the abscissa is the probability of false alarm, P(x > x* | x ∈ ω1), and the ordinate is the probability of hit, P(x > x* | x ∈ ω2). As illustrated here, operating characteristic curves are generally not symmetric, as shown at the right. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.

FIGURE 2.22. Four categories have equal priors and the class-conditional distributions shown. If a test point is presented in which one feature is missing (here, x1) and the other is measured to have value x̂2 (red dashed line), we want our classifier to classify the pattern as category ω2, because p(x̂2 | ω2) is the largest of the four likelihoods. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
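
A sketch of the missing-feature reasoning in Fig. 2.22: with x1 unobserved, the classifier marginalizes it out, so the decision uses the marginal likelihoods p(x̂2|ωi) weighted by the (equal) priors. The four class-conditional Gaussians below are assumptions standing in for the distributions shown in the figure.

```python
import numpy as np
from scipy.stats import norm

# Assumed 2-D Gaussian class-conditionals (diagonal covariances) for four categories.
means = [np.array([1.0, 1.0]), np.array([3.0, 2.0]),
         np.array([5.0, 1.5]), np.array([7.0, 3.0])]
covs = [np.diag([0.5, 0.3]), np.diag([0.4, 0.2]),
        np.diag([0.6, 0.5]), np.diag([0.3, 0.4])]
priors = [0.25] * 4                       # equal priors

def posterior_given_x2(x2_hat):
    """P(wi | x2 = x2_hat) with x1 missing: marginalize x1 out of p(x1, x2 | wi).
    For a Gaussian, the marginal over x2 is the 1-D Gaussian of that component."""
    likes = np.array([norm(m[1], np.sqrt(S[1, 1])).pdf(x2_hat)
                      for m, S in zip(means, covs)])
    post = likes * np.array(priors)
    return post / post.sum()

print(posterior_given_x2(2.0))            # decide the category with the largest value
```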

FIGURE 2.23. A three-dimensional distribution which obeys p(x1, x3) = p(x1)p(x3); thus here x1 and x3 are statistically independent, but the other feature pairs are not. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.

FIGURE 2.24. A belief network consists of nodes (labeled with uppercase bold letters) and their associated discrete states (in lowercase). Thus node A has states a1, a2, ..., denoted simply a; node B has states b1, b2, ..., denoted b, and so forth. The links between nodes represent conditional probabilities. For example, P(c|a) can be described by a matrix whose entries are P(ci|aj). From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
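
A tiny sketch of what the links in Fig. 2.24 mean computationally: each link carries a conditional probability table, e.g. P(c|a) as a matrix with entries P(ci|aj), and a marginal at a child node is obtained by summing over the parent's states. The numbers below are made up for illustration.

```python
import numpy as np

P_a = np.array([0.6, 0.4])            # P(a1), P(a2) at node A
P_c_given_a = np.array([[0.9, 0.2],   # rows: states c1, c2; columns: states a1, a2
                        [0.1, 0.8]])  # each column sums to 1

# Marginal at node C: P(ci) = sum_j P(ci | aj) P(aj)
P_c = P_c_given_a @ P_a
print(P_c, P_c.sum())                 # a proper distribution over c1, c2
```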

FIGURE 2.25. A portion of a belief network, consisting of a node X, having variable values (x1, x2, ...), its parents (A and B), and its children (C and D). From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.