EEL 851: Biometrics. An Overview of Statistical Pattern Recognition

Outline: Introduction (pattern, feature, noise); Example; Problem Analysis (segmentation, feature extraction, classification); Design Cycle; Bayesian Decision Theory; Discriminant Functions; Decision Regions; Minimum Distance Classification; Nearest Neighbor Classification; Parameter Estimation; Decision Boundaries.

Introduction. What is a pattern? An object, process, or event consisting of both deterministic and stochastic components; a record of dynamic occurrences influenced by both deterministic and stochastic factors. Examples: voice, images, characters. Kinds of patterns: visual, temporal, logical. What is a pattern class? A set of patterns sharing a set of common attributes (or features), usually originating from the same source.

Introduction. Feature: a relevant characteristic that sets patterns apart from one another; data extractable through measurement or processing. Examples: patterns include speech waveforms, crystals, textures, and weather patterns; features include age, color, height, and width. Classification: assigning patterns to classes based on their features. Noise: distortions associated with pattern processing and/or training samples that affect the classification performance of the system.

Example: automation in a fish-packing plant. Sort incoming fish on a conveyor according to species using optical sensing. Species to sort: sea bass and salmon.

Problem Analysis. Set up a camera and take some sample images. Features? Suggested features: length, lightness, width, number and shape of fins, position of the mouth, etc. Explore these suggested features!

Preprocessing. Segmentation: isolate the fish from the background and from one another. Feature extraction: reduce the data by measuring certain features. Classifier: evaluates the evidence presented and makes the final decision.

Preprocessing [figure]

Classification. Select the length of the fish as a possible feature for classification.

Classification. Length is a poor feature! Select lightness as a possible feature instead.

Threshold Decision Boundary and Cost. What are the consequences of our decision, and are the costs of the two errors equal? This is the task of decision theory. If not, move the decision boundary toward smaller values of lightness to reduce the cost, i.e., to reduce the number of sea bass classified as salmon.

Classification. Adopt the lightness and the width of the fish: x = [x1, x2]^t, where x1 is the lightness and x2 the width.

Classification [figure]

Classification. Our satisfaction is premature: the central aim is to correctly classify a novel input. This is the issue of generalization!

Classification. Syntactic pattern recognition: recursive description using the methods of formal languages. Structural pattern recognition: derivation of descriptions using formal rules.

Pattern Recognition Systems. Invariant features: translation, rotation, scale.

Design Cycle. Data collection, feature choice, model choice, training, evaluation, computational complexity.

Design Cycle [figure]

Types of Learning. Supervised learning: a teacher provides a category label or cost for each pattern in the training set (desired input with desired output). Goal: produce the correct output for a new input. Goals of supervised learning: classification (desired output is a discrete class label) and regression (desired output is continuous-valued). Unsupervised learning: the system forms clusters or natural groupings of the input patterns. Goal: build a model or find useful representations of the data. Uses: reasoning, finding clusters, dimensionality reduction, decision making, prediction, data compression, outlier detection, and as an aid to classification.

Bayesian Decision Theory. Assumptions: the problem is posed in probabilistic terms, and all relevant probabilities are known. Sea bass/salmon example: the state of nature is a random variable ω, with ω = ω1 (sea bass) or ω = ω2 (salmon). Prior probabilities: P(ω1) for sea bass and P(ω2) for salmon, with P(ω1) + P(ω2) = 1 (exclusivity and exhaustivity).

Bayesian Decision Theory. Decision rule with only prior information (we cannot see the fish): decide ω1 if P(ω1) > P(ω2), otherwise decide ω2. In most cases we have more information: the class-conditional pdf p(x|ω), i.e., p(x|ω1) and p(x|ω2), which capture the difference in lightness between the populations of sea bass and salmon.

Bayesian Decision Theory. Class-conditional pdfs, with x the lightness of the fish (densities normalized). [figure]

Bayes Formula. Given the lightness x of a fish, which class? P(ωj|x) = p(x|ωj) P(ωj) / p(x), where in the case of two categories p(x) = Σ_{j=1}^{2} p(x|ωj) P(ωj). In words: Posterior = (Likelihood × Prior) / Evidence, where p(x|ωj) is the likelihood and p(x) is the evidence (a scale factor).
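
As a minimal sketch of this formula (not part of the original slides), the snippet below computes the two posteriors from the priors quoted on the next slide, P(ω1) = 2/3 and P(ω2) = 1/3, together with made-up likelihood values; the likelihood numbers are assumptions chosen only for illustration.

```python
import numpy as np

# Priors from the slides; the likelihoods p(x|w_j) at one observed
# lightness value x are assumed numbers for illustration only.
priors = np.array([2/3, 1/3])          # P(w1), P(w2)
likelihoods = np.array([0.5, 1.2])     # p(x|w1), p(x|w2) (assumed)

evidence = np.sum(likelihoods * priors)        # p(x) = sum_j p(x|wj) P(wj)
posteriors = likelihoods * priors / evidence   # P(wj|x) by Bayes formula

print(posteriors, posteriors.sum())            # the posteriors sum to 1
```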

Posterior Probabilities. Prior probabilities: P(ω1) = 2/3, P(ω2) = 1/3. [figure]

Bayes Decision Rule. Decision given the posterior probabilities of an observation x: if P(ω1|x) > P(ω2|x), the true state of nature is taken to be ω1; if P(ω1|x) < P(ω2|x), it is taken to be ω2. Probability of error: P(error|x) = P(ω1|x) if we decide ω2, and P(error|x) = P(ω2|x) if we decide ω1.

Bayes Decision Rule. To minimize the probability of error: decide ω1 if P(ω1|x) > P(ω2|x); otherwise decide ω2. Under the Bayes decision, P(error|x) = min[P(ω1|x), P(ω2|x)].
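
A one-function sketch of this rule, assuming the posteriors have already been computed as above (the example numbers are made up):

```python
import numpy as np

def bayes_decide(posteriors):
    # Pick the class with the largest posterior; the conditional
    # probability of error is what remains, i.e., the minimum of the
    # posteriors in the two-class case.
    k = int(np.argmax(posteriors))
    return k, 1.0 - posteriors[k]

# Assumed posteriors P(w1|x) = 0.7, P(w2|x) = 0.3:
print(bayes_decide(np.array([0.7, 0.3])))      # -> (0, 0.3)
```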

Bayesian Classifier. Generalization of the preceding ideas: more than one feature, more than two states of nature, actions beyond merely deciding the state of nature, and a loss function more general than the probability of error. Bayes decision rule: an input pattern is assigned to the class ωk that, for the given feature vector x, maximizes the posterior probability: P(ωk|x) ≥ P(ωj|x) for all j ≠ k. The likelihoods p(x|ωk) come from training-data measurements; the prior probabilities P(ωk) are assumed known for the given population of sample patterns; p(x) is the same for all class alternatives and can be ignored as a normalization factor.

Loss Function. Certain classification errors may be costlier than others; a false negative may be much more costly than a false positive (examples: fire alarms, medical diagnosis). Let Ω = {ω1, ω2, …, ωn} be the possible states of nature (classes) and A = {α1, α2, …, αn} the possible decisions (actions). Loss function λ(αi|ωj): the loss incurred by taking action αi when the true state of nature is ωj. Conditional risk R(αi|x): the loss expected from taking action αi given the observed evidence x, R(αi|x) = Σj λ(αi|ωj) P(ωj|x).
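
The sketch below evaluates the conditional risk for a hypothetical two-class problem; the loss matrix and posteriors are assumed numbers, with the false negative made ten times costlier to mirror the fire-alarm/medical-diagnosis point:

```python
import numpy as np

# lam[i, j] = loss lambda(a_i | w_j) for taking action a_i when the
# true state of nature is w_j (assumed values; deciding w1 when the
# truth is w2 is made 10x costlier than the opposite error).
lam = np.array([[0.0, 10.0],
                [1.0,  0.0]])
posteriors = np.array([0.7, 0.3])      # P(w1|x), P(w2|x), assumed

risks = lam @ posteriors               # R(a_i|x) = sum_j lam(a_i|w_j) P(w_j|x)
print(risks)                           # [3.0, 0.7]
print("choose action", int(np.argmin(risks)))
```

With this asymmetric loss, the risk-minimizing action is α2 (decide ω2) even though ω1 has the larger posterior; with a zero-one loss the two rules coincide, which is the subject of the next slide.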

Minimum Error Rate Classification. Let action αi mean deciding class ωi, and suppose all errors are equally costly: the decision αi is correct only if the true state of nature is ωj with i = j. Zero-one loss (a symmetric function): λ(αi|ωj) = 0 for i = j and 1 for i ≠ j. The conditional risk then becomes R(αi|x) = Σj λ(αi|ωj) P(ωj|x) = 1 − P(ωi|x). Minimizing the risk therefore means selecting the class that maximizes the posterior probability, exactly as suggested by the Bayes decision rule: minimum error rate classification.

Neyman-Pearson Criterion. Minimize the overall risk subject to a constraint, e.g. ∫ R(αi|x) dx < constant, so that one kind of misclassification is limited to a fixed frequency. Fish example: do not misclassify more than 1% of salmon as sea bass, and subject to that, minimize the chance of classifying sea bass as salmon.

Discriminant Functions. A classifier can be represented by a set of functions gi(x), i = 1, …, k, employed to discriminate among classes: assign x to class ωi if gi(x) ≥ gj(x) for all j ≠ i. Possible choices: gi(x) = P(ωi|x), gi(x) = p(x|ωi) P(ωi), or gi(x) = ln p(x|ωi) + ln P(ωi).
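
The three choices are monotone transformations of one another and therefore pick the same class, which the short sketch below checks with assumed likelihoods and priors at a single x:

```python
import numpy as np

likelihoods = np.array([0.5, 1.2, 0.1])   # p(x|w_i), assumed values
priors = np.array([0.5, 0.3, 0.2])        # P(w_i), assumed values

g_posterior = likelihoods * priors / np.sum(likelihoods * priors)
g_product = likelihoods * priors
g_log = np.log(likelihoods) + np.log(priors)

# All three discriminants select the same class:
assert np.argmax(g_posterior) == np.argmax(g_product) == np.argmax(g_log)
print(int(np.argmax(g_log)))              # -> 1
```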

Decision Regions. The effect of a decision rule is to divide the feature space into decision regions. Decision boundaries are the surfaces in feature space where ties occur among the largest discriminant functions.

Normal Density. Univariate density: analytically tractable, maximum entropy for a given mean and variance, continuous-valued. Many processes are asymptotically Gaussian (central limit theorem): the aggregate effect of many independent random disturbances is Gaussian, and many patterns can be viewed as a prototype corrupted by a large number of random processes. p(x) = (1 / (√(2π) σ)) exp[−(1/2) ((x − µ)/σ)²], where µ is the mean (or expected value) of x and σ² is the expected squared deviation, or variance.
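
A direct transcription of the univariate density into code (the evaluation at the mean is just a sanity check):

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    # p(x) = (1 / (sqrt(2*pi) * sigma)) * exp(-0.5 * ((x - mu) / sigma)**2)
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

print(normal_pdf(0.0))                    # peak value 1/sqrt(2*pi) ~ 0.3989
```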

Normal Density. The multivariate normal density in d dimensions is p(x) = (1 / ((2π)^(d/2) |Σ|^(1/2))) exp[−(1/2) (x − µ)ᵗ Σ⁻¹ (x − µ)], where x = (x1, x2, …, xd)ᵗ is the feature vector, µ = (µ1, µ2, …, µd)ᵗ is the mean vector, and Σ is the d×d covariance matrix, with |Σ| and Σ⁻¹ its determinant and inverse. Σ determines the shape of the Gaussian curve; (x − µ)ᵗ Σ⁻¹ (x − µ) is the squared Mahalanobis distance.
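
The multivariate density transcribed into code, with the squared Mahalanobis distance computed explicitly; the test point and covariance are arbitrary choices:

```python
import numpy as np

def mvn_pdf(x, mu, cov):
    d = len(mu)
    diff = x - mu
    maha_sq = diff @ np.linalg.inv(cov) @ diff     # (x-mu)^t Sigma^-1 (x-mu)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(cov))
    return np.exp(-0.5 * maha_sq) / norm

mu = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.0],
                [0.0, 2.0]])
print(mvn_pdf(mu, mu, cov))     # peak value 1/(2*pi*sqrt(2)) ~ 0.1125
```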

Normal Density. p(x) = (1 / ((2π)^(d/2) |Σ|^(1/2))) exp[−(1/2) (x − µ)ᵗ Σ⁻¹ (x − µ)] [figure]

Discriminant Functions. For minimum error rate classification, gi(x) = ln p(x|ωi) + ln P(ωi); with Gaussian densities, gi(x) = −(1/2)(x − µi)ᵗ Σi⁻¹ (x − µi) − (d/2) ln 2π − (1/2) ln |Σi| + ln P(ωi). Case Σi = σ²I: the features are statistically independent with the same variance σ² (diagonal covariance matrix). Then gi(x) = wiᵗ x + wi0 with wi = µi / σ² and wi0 = −µiᵗµi / (2σ²) + ln P(ωi). These are linear machines: the decision surfaces gi(x) = gj(x) are pieces of hyperplanes.
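
A small sketch of the linear-machine case, computing wi and wi0 for an assumed two-class setup with equal priors and checking that a point equidistant from the two means lies on the decision hyperplane:

```python
import numpy as np

def linear_weights(mu, sigma_sq, prior):
    # Case Sigma_i = sigma^2 I: g_i(x) = w_i^t x + w_i0
    w = mu / sigma_sq
    w0 = -mu @ mu / (2 * sigma_sq) + np.log(prior)
    return w, w0

w1, w10 = linear_weights(np.array([2.0, 0.0]), 1.0, 0.5)   # assumed class 1
w2, w20 = linear_weights(np.array([0.0, 2.0]), 1.0, 0.5)   # assumed class 2

x = np.array([1.0, 1.0])            # equidistant from both means
print(w1 @ x + w10, w2 @ x + w20)   # equal discriminants: x is on the boundary
```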

Discriminant Functions [figures]

Discriminant Functions. Case Σi = Σ: the covariance matrices of all classes are identical but otherwise arbitrary. Then gi(x) = −(1/2)(x − µi)ᵗ Σ⁻¹ (x − µi) + ln P(ωi). The hyperplane separating Ri and Rj passes through the point x0 = (1/2)(µi + µj) − [ln(P(ωi)/P(ωj)) / ((µi − µj)ᵗ Σ⁻¹ (µi − µj))] (µi − µj), and in general it is not orthogonal to the line between the means.
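
A sketch of the shared-covariance case: with equal priors the boundary point x0 is the midpoint of the means, and an unequal prior shifts it away from the more probable class (all numbers are assumptions for illustration):

```python
import numpy as np

def boundary_point(mu_i, mu_j, cov, p_i, p_j):
    # x0 on the hyperplane separating R_i and R_j (case Sigma_i = Sigma)
    diff = mu_i - mu_j
    maha_sq = diff @ np.linalg.inv(cov) @ diff
    return 0.5 * (mu_i + mu_j) - (np.log(p_i / p_j) / maha_sq) * diff

mu1, mu2 = np.array([2.0, 0.0]), np.array([0.0, 0.0])
cov = np.array([[1.0, 0.5],
                [0.5, 1.0]])
print(boundary_point(mu1, mu2, cov, 0.5, 0.5))   # midpoint [1. 0.]
print(boundary_point(mu1, mu2, cov, 0.8, 0.2))   # shifted toward mu2
```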

Discriminant Functions [figures]

Discriminant Functions. Case Σi = arbitrary: the covariance matrices are different for each category. gi(x) = xᵗ Wi x + wiᵗ x + wi0, where Wi = −(1/2) Σi⁻¹, wi = Σi⁻¹ µi, and wi0 = −(1/2) µiᵗ Σi⁻¹ µi − (1/2) ln |Σi| + ln P(ωi). The decision surfaces are hyperquadrics: hyperplanes, pairs of hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, or hyperhyperboloids.
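
The quadratic-discriminant coefficients transcribed into code (a sketch; the helper g just applies the formula):

```python
import numpy as np

def quadratic_discriminant(mu, cov, prior):
    # Coefficients of g_i(x) = x^t W_i x + w_i^t x + w_i0
    cov_inv = np.linalg.inv(cov)
    W = -0.5 * cov_inv
    w = cov_inv @ mu
    w0 = (-0.5 * mu @ cov_inv @ mu
          - 0.5 * np.log(np.linalg.det(cov))
          + np.log(prior))
    return W, w, w0

def g(x, W, w, w0):
    return x @ W @ x + w @ x + w0
```

Evaluating g for each class and taking the argmax gives the general Gaussian (hyperquadric) classifier.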

Discriminant Functions [figure]

Decision Boundary - Example. Compute the Bayes decision boundary for two Gaussian distributions with means and covariances µ1 = (3, 6)ᵗ, Σ1 = [[1/2, 0], [0, 2]] and µ2 = (3, −2)ᵗ, Σ2 = [[2, 0], [0, 2]] (discrete versions shown in the figure). The resulting boundary is x2 = 3.514 − 1.125 x1 + 0.1875 x1².
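
As a numerical check of this boundary (equal priors are assumed here; the slide does not state them), the sketch below evaluates the full quadratic discriminants at a point on the quoted curve and finds them nearly equal:

```python
import numpy as np

mu1, cov1 = np.array([3.0, 6.0]), np.array([[0.5, 0.0], [0.0, 2.0]])
mu2, cov2 = np.array([3.0, -2.0]), np.array([[2.0, 0.0], [0.0, 2.0]])

def g(x, mu, cov, prior=0.5):
    diff = x - mu
    return (-0.5 * diff @ np.linalg.inv(cov) @ diff
            - 0.5 * np.log(np.linalg.det(cov)) + np.log(prior))

x1 = 1.0
x2 = 3.514 - 1.125 * x1 + 0.1875 * x1 ** 2   # point on the quoted boundary
x = np.array([x1, x2])
print(g(x, mu1, cov1), g(x, mu2, cov2))      # ~ -7.62 for both classes
```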

Supervised Learning. Parametric approaches: Bayesian parameter estimation and maximum likelihood estimation. Estimation problem: estimate P(ωj) and p(x|ωj); this is hard in high-dimensional feature spaces with small numbers of training samples. Simplifying assumptions: feature independence, independently drawn samples (the i.i.d. model), and assuming that p(x|ωj) is Gaussian, which reduces the estimation problem to the parameters of a normal density.

Estimation Methods. Bayesian estimation (MAP): the distribution parameters are random values that follow a known (e.g., Gaussian) prior distribution; the behavior of the training data helps revise the parameter values, and large training samples give better chances of refining the posterior probabilities (parameter peaking). Maximum-likelihood (ML) estimation: the parameters of the probabilistic distributions are fixed but unknown values, i.e., unknown constants to be identified from the training data; the best estimates are the parameter values that maximize the class-conditional probability of the available samples.
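
A short sketch of the ML estimates for a Gaussian class-conditional density, using synthetic i.i.d. samples (the generating parameters are borrowed from the earlier decision-boundary example):

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.multivariate_normal([3.0, 6.0], [[0.5, 0.0], [0.0, 2.0]], size=1000)

mu_hat = samples.mean(axis=0)            # ML estimate of the mean
diff = samples - mu_hat
cov_hat = diff.T @ diff / len(samples)   # ML covariance (divides by n, not n-1)
print(mu_hat)
print(cov_hat)
```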

Parametric Approaches. They mitigate the curse of dimensionality: estimating the parameters of a known distribution form requires a smaller number of samples, at the price of some prior information.

Non-parametric Approaches. Parzen-window pdf estimation (kernel density estimation, KDE): estimate p(x|ωj) directly from the sample patterns. kn-nearest-neighbor: directly construct the decision boundary from the training data.
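
A 1-D Parzen-window sketch with a Gaussian kernel; the sample size, window width h, and test point are assumptions for illustration:

```python
import numpy as np

def parzen_estimate(x, samples, h):
    # p_hat(x) = (1 / (n*h)) * sum_i K((x - x_i) / h), Gaussian kernel K
    u = (x - samples) / h
    kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return kernels.mean() / h

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=500)      # assumed training samples
print(parzen_estimate(0.0, samples, h=0.3))   # true density at 0 is ~0.3989
```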

Statistical Approaches [figure]

Nearest-Neighbor Methods. These are statistical, nearest-neighbor, or memory-based methods. k-nearest-neighbor: a new pattern is assigned the category of the plurality of its k closest neighbors. A large k decreases the chance of undue influence by a noisy training pattern; a small k reduces the acuity of the method. The distance metric is usually Euclidean; in practice the features are scaled (by weights aj). Advantage: does not require training.
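
A compact k-nearest-neighbor sketch with Euclidean distance on a tiny made-up training set (feature scaling is assumed to have been done already):

```python
import numpy as np

def knn_classify(x, X_train, y_train, k=3):
    # Assign x the plurality class among its k closest training patterns.
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(dists)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_classify(np.array([0.2, 0.1]), X_train, y_train, k=3))   # -> 0
```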

Nearest-Neighbor Decision Example. [figure: classes of training patterns]

References
Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification, 2nd edition, John Wiley & Sons, 2001 (source of most of the content)
Nils J. Nilsson, Introduction to Machine Learning, http://ai.stanford.edu/people/nilsson/mlbook.html
http://www.rii.ricoh.com/%7estork/dhs.html
A. K. Jain, R. P. W. Duin, and J. Mao, Statistical Pattern Recognition: A Review, IEEE Trans. PAMI, vol. 22, no. 1, pp. 4-37, Jan. 2000
Other useful references:
Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, New York, NY, 1972
Therrien, Decision Estimation and Classification, John Wiley & Sons, New York, NY, 1989
Hertz, Krogh, and Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, Reading, MA, 1991
Bose and Liang, Neural Network Fundamentals with Graphs, Algorithms, and Applications, McGraw-Hill, New York, NY, 1996