Introduction to Statistical Pattern Recognition
R.P.W. Duin, Pattern Recognition Group, Delft University of Technology, The Netherlands, prcourse@prtools.org

The Pattern Recognition Problem
What is this? What occasion? Where are the faces? Who is who?

Pattern Recognition Problems
Which group? To which class does an image belong? To which class (segment) does every pixel belong? Where is an object of interest (detection)? What is it (classification)?

Pattern Recognition Books
Fukunaga, K., Introduction to Statistical Pattern Recognition, 2nd edition, Academic Press, 1990.
Kohonen, T., Self-Organizing Maps, Springer Series in Information Sciences, Volume 30, Berlin, 1995.
Ripley, B.D., Pattern Recognition and Neural Networks, Cambridge University Press, 1996.
Devroye, L., Gyorfi, L., and Lugosi, G., A Probabilistic Theory of Pattern Recognition, Springer, 1996.
Schurmann, J., Pattern Classification: A Unified View of Statistical and Neural Approaches, Wiley, 1996.
Gose, E., Johnsonbaugh, R., and Jost, S., Pattern Recognition and Image Analysis, Prentice-Hall, 1996.
Vapnik, V.N., Statistical Learning Theory, Wiley, New York, 1998.
Duda, R.O., Hart, P.E., and Stork, D.G., Pattern Classification, 2nd edition, Wiley, New York, 2001.
Hastie, T., Tibshirani, R., and Friedman, J., The Elements of Statistical Learning, Springer, Berlin, 2001.
Webb, A., Statistical Pattern Recognition, Wiley, 2002.
van der Heijden, F., Duin, R.P.W., de Ridder, D., and Tax, D.M.J., Classification, Parameter Estimation, State Estimation: An Engineering Approach Using MatLab, Wiley, New York, 2004.
Bishop, C.M., Pattern Recognition and Machine Learning, Springer, 2006.
Theodoridis, S. and Koutroumbas, K., Pattern Recognition, 4th edition, Academic Press, New York, 2008.

Pattern Recognition: OCR
Kurzweil Reading Edge: an automatic text reading machine with speech synthesizer.
StatSoft, Electronic Textbook, http://www.statsoft.com/textbook/stathome.html
Pattern Recognition: Traffic
Road sign detection, road sign recognition, license plates.

Pattern Recognition: Images
A database of objects.

Pattern Recognition: Faces
Is he in the database? Yes!

Pattern Recognition: Speech

Pattern Recognition: Pathology
MRI brain image.
Pattern Recognition: Pathology
Flow cytometry.

Pattern Recognition: Seismics
Earthquakes.

Pattern Recognition: Shapes
Pattern recognition is very often shape recognition:
- Images: B/W, grey value, colour; 2D, 3D, 4D
- Time signals
- Spectra
Examples of objects are given for different classes; an object of unknown class is to be classified.

Pattern Recognition System
Sensor -> Representation -> Generalization
The representation may be a feature representation (e.g. area) or a pixel representation (pixel_1, pixel_2, ...).
Pattern Recognition System
Sensor -> Dissimilarities D(x,x1), D(x,x2) -> Generalization

Pattern Recognition System
Sensor -> Classifier_1, Classifier_2 -> Combining Classifiers -> Generalization

Applications
- Biomedical: EEG, ECG, Röntgen, nuclear, tomography, tissues, cells, chromosomes
- Bio-informatics
- Speech recognition, speaker identification
- Character recognition
- Signature verification
- Remote sensing, meteorology
- Industrial inspection
- Robot vision
- Digital microscopy
Notes: not always, but very often image based; no strong models available.

Requirements and Goals
A pattern recognition system performs classification based on a set of examples (size, complexity, cost (speed), error) and physical knowledge (a model). Goals:
- Minimize the desired set of examples
- Minimize the amount of explicit knowledge
- Minimize the complexity of the representation
- Minimize the cost of recognition
- Minimize the probability of classification error

Possible Object Representations
- Measurement samples f(t)
- Feature vector (e.g. area)
- Sets of segments or primitives
- Outline samples (shape)
- Symbolic structures, (attributed) graphs
- Dissimilarities d(y,x1), d(y,x2)

The test object is classified; statistics are needed to solve the class overlap.
Bayes Decision Rule, Formal
Decisions based on densities: assign x to class A if p(A|x) > p(B|x), else to B.
Bayes: p(x|A) p(A) / p(x) > p(x|B) p(B) / p(x), else B,
which reduces to: p(x|A) p(A) > p(x|B) p(B) -> A, else B.

2-class problems: S(x) = p(x|A) p(A) - p(x|B) p(B); decide A if S(x) > 0, else B.
n-class problems: Class(x) = argmax_ω ( p(x|ω) p(ω) ).

What is the gender of somebody with this length?
Bayes:
p(female|length) = p(length|female) p(female) / p(length)
p(male|length) = p(length|male) p(male) / p(length)

Decisions Based on Densities
p(female) = 0.3, p(male) = 0.7.
Bayes decision rule: decide female if p(female|length) > p(male|length), else male.
Equivalently: p(length|female) p(female) / p(length) > p(length|male) p(male) / p(length),
i.e. p(length|female) p(female) > p(length|male) p(male) -> female, else male.
The class-conditional pdfs are estimated from the training set; the class prior probabilities are known, guessed or estimated.

What are Good Features?
Good features are discriminative. They often have to be defined by experts; sometimes they can be found by statistics.

Compactness hypothesis: representations of similar real-world objects are close. There is no ground for any generalization (induction) on representations that do not obey this demand. (A.G. Arkadev and E.M. Braverman, Computers and Pattern Recognition, 1966.)
The compactness hypothesis is not sufficient for perfect classification, as dissimilar objects may also be close: class overlap, hence probabilities.

e.g. the Fisher criterion:
J_F(A,B) = (μ_A - μ_B)^2 / (σ_A^2 + σ_B^2)
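The two-class Bayes rule above can be sketched in a few lines of Python. The priors 0.3 and 0.7 are taken from the slides; the Gaussian class-conditional densities for length, with their means and standard deviations (in cm), are hypothetical values chosen purely for illustration.

```python
import math

def gauss_pdf(x, mu, sigma):
    """1-D Gaussian density p(x) = exp(-(x-mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Priors from the slide; the class-conditional parameters are assumptions.
priors = {"female": 0.3, "male": 0.7}
params = {"female": (168.0, 7.0), "male": (181.0, 7.0)}  # (mean, std) in cm, hypothetical

def classify(length):
    """Bayes rule: pick the class maximizing p(length|class) * p(class).

    The evidence p(length) is the same for both classes, so it cancels
    and can be left out of the comparison.
    """
    return max(priors, key=lambda c: gauss_pdf(length, *params[c]) * priors[c])
```

Despite the larger male prior, a sufficiently small length still yields "female": the likelihood ratio outweighs the prior ratio.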
Distances and Densities
Should the test object be classified as A because it is closest to an object of A, or as B because the local density of B is larger?

Distances: Scaling Problem
Before scaling: D(X,A) < D(X,B). After scaling: D(X,A) > D(X,B). Distance-based decisions depend on the scaling of the features.

How to Scale in Case of No Natural Features
Make the variances equal, e.g. divide each feature by its standard deviation:
color' = color / std(color), and similarly for the other features.

Density Estimation
What is the probability of finding an object of class A, or of class B, at this place in the 2D feature space?

The Gaussian Distribution
1-dimensional density:
p(x) = 1 / (sqrt(2π) σ) · exp( -(x - μ)^2 / (2σ^2) )
Normal distribution = Gaussian distribution. Standard normal distribution: μ = 0, σ = 1. About 95% of the data lies in [μ - 2σ, μ + 2σ] (in 1D!).

Multivariate Gaussians
k-dimensional density, with mean vector μ and covariance matrix G:
p(x) = 1 / sqrt( (2π)^k det(G) ) · exp( -½ (x - μ)^T G^(-1) (x - μ) )
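As a sanity check on the k-dimensional formula, here is a minimal sketch that evaluates the 2-D case directly, inverting the 2x2 covariance matrix by hand (the function name is mine, not from the course):

```python
import math

def gauss2d_pdf(x, mu, G):
    """Density of a 2-D Gaussian with mean mu and 2x2 covariance matrix G."""
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    # Closed-form inverse of a 2x2 matrix.
    inv = [[G[1][1] / det, -G[0][1] / det],
           [-G[1][0] / det, G[0][0] / det]]
    d = [x[0] - mu[0], x[1] - mu[1]]
    # Mahalanobis term (x - mu)^T G^{-1} (x - mu).
    m = (d[0] * (inv[0][0] * d[0] + inv[0][1] * d[1])
         + d[1] * (inv[1][0] * d[0] + inv[1][1] * d[1]))
    # For k = 2, sqrt((2 pi)^k det(G)) = 2 pi sqrt(det(G)).
    return math.exp(-0.5 * m) / (2 * math.pi * math.sqrt(det))

# At the mean of a standard 2-D normal the density is 1 / (2 pi).
p = gauss2d_pdf([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```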
Multivariate Gaussians (2)
Examples with different covariance matrices G: the identity matrix gives circular density contours, a diagonal G with unequal variances gives axis-aligned elliptic contours, and off-diagonal (covariance) terms tilt the ellipses.

Density Estimation (1)
The density is defined on the whole feature space. Around object x the density is defined as
p(x) = dP(x)/dx = fraction of objects / volume.
Given n measured objects, e.g. persons' heights (m), how can we estimate p(x)?

Density Estimation (2)
Parametric estimation:
- Assume a parameterized model, e.g. Gaussian
- Estimate the parameters from the data
- The resulting density is of the assumed form
Non-parametric estimation:
- Assume no formal structure/model; choose an approach
- Estimate the density with the chosen approach
- The resulting density has no formal form

Parametric Estimation
Assume a Gaussian model and estimate the mean and covariance from the data:
μ = (1/n) Σ_i x_i
G = (1/n) Σ_i (x_i - μ)(x_i - μ)^T

Nonparametric Estimation: Histogram Method
1. Divide the feature space into N bins.
2. Count the number of objects n_ij in each bin.
3. Normalize: p(x) = n_ij / (n dx dy) for x in bin (i,j), where dx and dy are the bin widths and Σ_{i,j} n_ij = n.

Parzen Density Estimation (1)
Fix the volume of the bin, vary the positions of the bins, and add the contribution of each bin. Define a bin shape (kernel) with K(r) > 0 and ∫ K(r) dr = 1. For a test object z, sum over all bins:
p(z) = 1/(h n) Σ_i K( (z - x_i) / h )
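The parametric (mean and covariance) estimates above can be sketched in plain Python. The helper below is hypothetical, not part of the course material; it takes a list of k-dimensional points and returns the maximum-likelihood estimates from the slide.

```python
def estimate_gaussian(X):
    """Maximum-likelihood Gaussian parameters, as on the slide:
    mu = (1/n) sum_i x_i
    G  = (1/n) sum_i (x_i - mu)(x_i - mu)^T
    X is a list of k-dimensional points (lists of floats).
    """
    n = len(X)
    k = len(X[0])
    mu = [sum(x[j] for x in X) / n for j in range(k)]
    G = [[sum((x[a] - mu[a]) * (x[b] - mu[b]) for x in X) / n
          for b in range(k)] for a in range(k)]
    return mu, G
```

Note the 1/n factor: this is the (biased) maximum-likelihood covariance, matching the slide, not the 1/(n-1) unbiased variant.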
Parzen Density Estimation (2)
With a Gaussian kernel K(u) = 1/sqrt(2π) · exp(-u^2/2), the Parzen estimate becomes
p(x) = 1/(h n) Σ_i 1/sqrt(2π) · exp( -(x - x_i)^2 / (2h^2) ),
a sum of Gaussian bumps of width h centred on the training samples x_i.

Parametric vs. Nonparametric
Parametric estimation, based on some model:
- Model parameters have to be estimated
- More samples are required than parameters
- The model assumption may be incorrect, resulting in erroneous conclusions
Non-parametric estimation, which hangs on the data directly:
- Assumes no formal structure/model
- Almost no parameters to estimate
- Erroneous estimates are less likely
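The 1-D Parzen estimator with the Gaussian kernel can be sketched as follows (the function name and the toy sample sets are mine):

```python
import math

def parzen(z, samples, h):
    """Parzen density estimate, as on the slides:
    p(z) = 1/(h n) * sum_i K((z - x_i)/h), with K(u) = exp(-u^2/2)/sqrt(2 pi).
    h is the kernel width (smoothing parameter).
    """
    n = len(samples)
    s = sum(math.exp(-((z - x) / h) ** 2 / 2) / math.sqrt(2 * math.pi)
            for x in samples)
    return s / (h * n)
```

A small h gives a spiky estimate that follows the individual samples; a large h gives a smooth estimate that may wash out structure. Choosing h is the central difficulty of the method.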