
Introduction to Statistical Pattern Recognition
R.P.W. Duin
Pattern Recognition Group, Delft University of Technology, The Netherlands
prcourse@prtools.org

Pattern Recognition Problem
What is this? What occasion? Where are the faces? Who is who?

Pattern Recognition Problems
Which group? To which class belongs an image? To which class (segment) belongs every pixel? Where is an object of interest (detection)? What is it (classification)?

Pattern Recognition Books
Fukunaga, K., Introduction to Statistical Pattern Recognition, 2nd edition, Academic Press, 1990.
Kohonen, T., Self-Organizing Maps, Springer Series in Information Sciences, Volume 30, Berlin, 1995.
Ripley, B.D., Pattern Recognition and Neural Networks, Cambridge University Press, 1996.
Devroye, L., Gyorfi, L., and Lugosi, G., A Probabilistic Theory of Pattern Recognition, Springer, 1996.
Schurmann, J., Pattern Classification: A Unified View of Statistical and Neural Approaches, Wiley, 1996.
Gose, E., Johnsonbaugh, R., and Jost, S., Pattern Recognition and Image Analysis, Prentice-Hall, 1996.
Vapnik, V.N., Statistical Learning Theory, Wiley, New York, 1998.
Duda, R.O., Hart, P.E., and Stork, D.G., Pattern Classification, 2nd edition, Wiley, New York, 2001.
Hastie, T., Tibshirani, R., and Friedman, J., The Elements of Statistical Learning, Springer, Berlin, 2001.
Webb, A., Statistical Pattern Recognition, Wiley, 2002.
van der Heijden, F., Duin, R.P.W., de Ridder, D., and Tax, D.M.J., Classification, Parameter Estimation and State Estimation: An Engineering Approach Using MATLAB, Wiley, New York, 2004.
Bishop, C.M., Pattern Recognition and Machine Learning, Springer, 2006.
Theodoridis, S. and Koutroumbas, K., Pattern Recognition, 4th edition, Academic Press, New York, 2008.

Pattern Recognition: OCR
Kurzweil Reading Edge: automatic text reading machine with speech synthesizer.
Statsoft, Electronic Textbook, http://www.statsoft.com/textbook/stathome.html

Pattern Recognition: Traffic
Road sign detection, road sign recognition, license plates.

Pattern Recognition: Images
100 objects.

Pattern Recognition: Faces
Is he in the database? Yes!

Pattern Recognition: Speech

Pattern Recognition: Pathology
MRI brain image.

Pattern Recognition: Pathology
Flow cytometry.

Pattern Recognition: Seismics
Earthquakes.

Pattern Recognition: Shape Recognition
Pattern recognition is very often shape recognition:
Images: B/W, grey value, color; 2D, 3D, 4D
Time signals
Spectra

Pattern Recognition: Shapes
Examples of objects for different classes; an object of unknown class is to be classified.

Pattern Recognition System
Sensor -> Representation -> Generalization.
Shown with a feature representation (e.g. area) and with a pixel representation (pixel_1, pixel_2).

Pattern Recognition System
Sensor -> Representation -> Generalization.
Variants: a dissimilarity representation D(x,x_1), D(x,x_2), and combining classifiers (Classifier_1, Classifier_2).

Applications
Biomedical: EEG, ECG, Röntgen, nuclear, tomography, tissues, cells, chromosomes, bio-informatics
Speech recognition, speaker identification
Character recognition
Signature verification
Remote sensing, meteorology
Industrial inspection
Robot vision
Digital microscopy
Notes: not always, but very often image based; no strong models available.

Requirements and Goals
A set of examples and physical knowledge (a model) enter the pattern recognition system, which outputs a classification. Trade-offs: size, complexity, cost (speed), error.
Minimize the desired set of examples
Minimize the amount of explicit knowledge
Minimize the complexity of the representation
Minimize the cost of recognition
Minimize the probability of classification error

Possible Object Representations
Measurement samples f(t)
Feature vector (e.g. area)
Sets of segments or primitives
Outline samples (shape)
Symbolic structures, (attributed) graphs
Dissimilarities d(y,x)

Pattern Recognition System
Sensor -> Representation -> Generalization: from feature space to classification. A test object is classified as A. Statistics are needed to handle class overlap.

Bayes Decision Rule, Formal
Decisions based on densities: decide A if p(A|x) > p(B|x), else B.
Bayes: decide A if p(x|A) p(A) / p(x) > p(x|B) p(B) / p(x), else B;
equivalently, decide A if p(x|A) p(A) > p(x|B) p(B), else B.
2-class problems: S(x) = p(x|A) p(A) − p(x|B) p(B); decide A if S(x) > 0, else B.
n-class problems: Class(x) = argmax_ω (p(x|ω) p(ω)).

Example: what is the gender of somebody with this length? The densities p(length|female) and p(length|male) are compared.
Bayes:
p(female|length) = p(length|female) p(female) / p(length)
p(male|length) = p(length|male) p(male) / p(length)

Decisions Based on Densities
With priors p(female) = 0.3 and p(male) = 0.7, compare p(length|female) p(female) against p(length|male) p(male).

Bayes Decision Rule
Decide female if p(female|length) > p(male|length), else male.
Bayes: p(length|female) p(female) / p(length) > p(length|male) p(male) / p(length),
so decide female if p(length|female) p(female) > p(length|male) p(male), else male.
The class-conditional pdfs are estimated from the training set; the class prior probabilities are known, guessed or estimated.

What Are Good Features?
Good features are discriminative. They often have to be defined by experts; sometimes they can be found by statistics.
Compactness: representations of similar real-world objects are close. There is no ground for any generalization (induction) on representations that do not obey this demand. (A.G. Arkadev and E.M. Braverman, Computers and Pattern Recognition, 1966.)
e.g. the Fisher criterion: J_F(A,B) = (μ_A − μ_B)² / (σ_A² + σ_B²)
The compactness hypothesis is not sufficient for perfect classification, as dissimilar objects may be close: class overlap leads to classification errors.
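The two-class rule above can be sketched numerically. The priors 0.3/0.7 come from the slides; the Gaussian class-conditional parameters below (mean lengths and spreads) are made-up illustration values, not from the lecture:

```python
import math

def gauss_pdf(x, mu, sigma):
    """1-D Gaussian density p(x | class)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Priors from the slides; class-conditional (mu, sigma) in cm are illustrative guesses.
priors = {"female": 0.3, "male": 0.7}
params = {"female": (168.0, 7.0), "male": (181.0, 7.0)}

def classify(length):
    # Bayes rule: decide the class with the largest p(x | class) * p(class).
    scores = {c: gauss_pdf(length, *params[c]) * priors[c] for c in priors}
    return max(scores, key=scores.get)

print(classify(160.0))  # a short person
print(classify(190.0))  # a tall person
```

Note that the unconditional density p(length) is omitted: it is the same on both sides of the inequality and cancels.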

Distances and Densities
Should X be classified as A because it is closest to an object of A, or as B because the local density of B is larger?

Distances: Scaling Problem
Before scaling: D(X,A) < D(X,B). After scaling: D(X,A) > D(X,B).

How to Scale in Case of No Natural Features
Make the variances equal: color' = color / sqrt(var(color)), area' = area / sqrt(var(area)).

Density Estimation
What is the probability of finding an object of class A (or of class B) at this place in the 2-D space?

The Gaussian Distribution
1-dimensional density:
p(x) = 1 / (sqrt(2π) σ) · exp(−(x − μ)² / (2σ²))
Normal distribution = Gaussian distribution. Standard normal distribution: μ = 0, σ = 1. 95% of the data lies between [μ − 2σ, μ + 2σ] (in 1-D!).

Multivariate Gaussians
k-dimensional density, with mean vector μ and covariance matrix G:
p(x) = 1 / ((2π)^(k/2) sqrt(det(G))) · exp(−½ (x − μ)ᵀ G⁻¹ (x − μ))
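The 1-D and k-dimensional densities translate directly into code. A minimal numpy sketch; the 2-D mean and covariance used below are arbitrary example values:

```python
import numpy as np

def gauss_1d(x, mu, sigma):
    """1-D Gaussian: p(x) = 1/(sqrt(2*pi)*sigma) * exp(-(x-mu)^2 / (2*sigma^2))."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def gauss_kd(x, mu, G):
    """k-dimensional Gaussian density with mean vector mu and covariance matrix G."""
    k = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (k / 2) * np.sqrt(np.linalg.det(G))
    return np.exp(-0.5 * diff @ np.linalg.inv(G) @ diff) / norm

# Standard normal: the value at the mean is 1/sqrt(2*pi), about 0.3989.
print(gauss_1d(0.0, 0.0, 1.0))

# 2-D example with an arbitrary covariance matrix.
mu = np.array([0.0, 0.0])
G = np.array([[3.0, 0.5], [0.5, 1.0]])
print(gauss_kd(np.array([0.0, 0.0]), mu, G))
```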

Multivariate Gaussians (2)
Examples of 2-D Gaussian densities for different covariance matrices G: the identity, diagonal (axis-aligned) matrices, and matrices with nonzero off-diagonal (correlated) elements.

Density Estimation (1)
The density is defined on the whole feature space. Around an object x, the density is defined as
p(x) = dP(x)/dx = (fraction of objects) / volume.
Given n measured objects, e.g. a person's height (m), how can we estimate p(x)?

Density Estimation (2)
Parametric estimation:
Assume a parameterized model, e.g. a Gaussian
Estimate the parameters from the data
The resulting density is of the assumed form
Non-parametric estimation:
Assume no formal structure/model; choose an approach
Estimate the density with the chosen approach
The resulting density has no formal form

Parametric Estimation
Assume a Gaussian model and estimate the mean and covariance from the data:
μ̂ = (1/n) Σᵢ xᵢ
Ĝ = (1/n) Σᵢ (xᵢ − μ̂)(xᵢ − μ̂)ᵀ

Nonparametric Estimation
Histogram method:
1. Divide the feature space into N bins
2. Count the number of objects in each bin
3. Normalize: p̂(x) = n_ij / (n · dx · dy) for x in bin (i,j), with Σ_{i,j} n_ij = n

Parzen Density Estimation (1)
Fix the volume of the bin, vary the positions of the bins, and add the contribution of each bin. Define a bin shape (kernel) with K(r) > 0 and ∫ K(r) dr = 1. For a test object z, sum over all bins:
p(z) = 1/(hn) Σᵢ K((z − xᵢ)/h)
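The parametric estimates above are one-liners in numpy. A minimal sketch on made-up 2-D data; note the slides use the maximum-likelihood 1/n normalization, which corresponds to numpy's `bias=True`, not the default 1/(n−1):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))  # n made-up 2-D objects

n = len(X)
mu_hat = X.mean(axis=0)          # mu_hat = (1/n) sum_i x_i
diff = X - mu_hat
G_hat = (diff.T @ diff) / n      # G_hat = (1/n) sum_i (x_i - mu_hat)(x_i - mu_hat)^T

# Same result as numpy's covariance with the maximum-likelihood normalization.
assert np.allclose(G_hat, np.cov(X.T, bias=True))
print(mu_hat)
print(G_hat)
```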

Parzen Density Estimation (2)
With a Gaussian kernel K_h(x) = 1/(sqrt(2π) h) · exp(−x²/(2h²)), the Parzen estimate becomes p̂(x) = (1/n) Σᵢ K_h(x − xᵢ): a sum of Gaussian bumps, one centered on each training object.

Parametric vs. Nonparametric
Parametric estimation, based on some model:
Model parameters to estimate
More samples required than parameters
The model assumption may be incorrect, resulting in erroneous conclusions
Non-parametric estimation, hanging on the data directly:
Assumes no formal structure/model
Almost no parameters to estimate
Erroneous estimates are less likely
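The Parzen estimate with a Gaussian kernel takes a few lines of numpy. A minimal 1-D sketch; the data and the smoothing width h are made up:

```python
import numpy as np

def parzen(z, data, h):
    """Parzen estimate p(z) = (1/n) * sum_i K_h(z - x_i) with a Gaussian kernel of width h."""
    diffs = z - data
    kernel = np.exp(-diffs ** 2 / (2 * h ** 2)) / (np.sqrt(2 * np.pi) * h)
    return kernel.mean()

rng = np.random.default_rng(1)
data = rng.normal(loc=1.7, scale=0.1, size=200)  # made-up heights in m

# The estimate integrates to 1 (checked with a Riemann sum on a fine grid)
# and peaks near the data mean.
grid = np.linspace(1.2, 2.2, 1001)
p = np.array([parzen(z, data, h=0.05) for z in grid])
integral = p.sum() * (grid[1] - grid[0])
print(integral)  # close to 1
```

The width h controls the trade-off the slides describe: a small h gives a spiky estimate that follows the individual samples, a large h oversmooths the density.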