Digital Image Processing: Object Recognition

Object Recognition
Christophoros Nikou (cnikou@cs.uoi.gr)
University of Ioannina - Department of Computer Science
Images taken from: R. Gonzalez and R. Woods, Digital Image Processing, Prentice Hall, 2008.

"One of the most interesting aspects of the world is that it can be considered to be made up of patterns. A pattern is essentially an arrangement. It is characterized by the order of the elements of which it is made, rather than by the intrinsic nature of these elements." (Norbert Wiener)

Introduction
Recognition of individual image regions (objects or patterns); an introduction to basic techniques.
Decision-theoretic techniques: quantitative descriptors (e.g. area, length); patterns arranged in vectors.
Structural techniques: qualitative descriptors (relational descriptors for repetitive structures, e.g. a staircase); patterns arranged in strings or trees.
Central idea: learning from sample patterns.

Patterns and pattern classes
Pattern: an arrangement of descriptors (or features).
Pattern class: a family of patterns sharing some common properties. Classes are denoted by ω_1, ω_2, ..., ω_W, W being the number of classes.
Goal of pattern recognition: assign patterns to their classes with as little human interaction as possible.

Pattern vectors
Historical example: recognition of three types of iris flowers by the lengths and widths of their petals (Fisher, 1936). Each flower is described by a vector x = [x_1, x_2]^T.
There are variations between and within classes; class separability depends strongly on the choice of descriptors.

Pattern vectors (cont.)
A shape signature may be represented by its sampled amplitude values, giving a cloud of n-dimensional points x = [x_1, x_2, ..., x_n]^T.
Other shape characteristics could have been employed (e.g. moments). The choice of descriptors has a profound role in the recognition performance.
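As an illustration of the pattern-vector idea (a minimal sketch of my own, not part of the original slides), the following Python fragment builds 2-D feature vectors from hypothetical petal measurements and computes one prototype per class; the three classes stand for the iris species in Fisher's example.

import numpy as np

# Hypothetical petal measurements; each pattern vector is x = [petal_length, petal_width]^T.
# The numbers are illustrative placeholders, not Fisher's actual data.
samples = {
    "omega_1": np.array([[1.4, 0.2], [1.5, 0.3], [1.3, 0.2]]),
    "omega_2": np.array([[4.5, 1.5], [4.1, 1.3], [4.7, 1.6]]),
    "omega_3": np.array([[6.0, 2.5], [5.8, 2.2], [6.1, 2.3]]),
}

# The prototype of a class is simply the mean of its pattern vectors.
for name, vectors in samples.items():
    print(name, "prototype m =", vectors.mean(axis=0))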

String descriptors
Description of structural relationships. Example: fingerprint recognition.
Primitive components describe fingerprint ridge properties.
Interrelationships of print features (minutiae): abrupt endings, branching, merging, disconnected segments, etc.
Relative sizes and locations of print features.

String descriptors (cont.)
A staircase pattern is described by a head-to-tail structural relationship.
The rule allows only the alternating pattern and excludes other types of structures. Other rules may be defined.

Tree descriptors
Hierarchical ordering. In the satellite image example, the structural relationship is defined as "composed of".

Decision-theoretic methods
They are based on decision (discriminant) functions.
Let x = [x_1, x_2, ..., x_n]^T represent a pattern vector.
For W pattern classes ω_1, ω_2, ..., ω_W, the basic problem is to find W decision functions d_1(x), d_2(x), ..., d_W(x) with the property that, if x belongs to class ω_i:
d_i(x) > d_j(x)  for j = 1, 2, ..., W; j ≠ i.

Decision-theoretic methods (cont.)
The decision boundary separating class ω_i from class ω_j is given by the values of x for which d_i(x) = d_j(x), or
d_ij(x) = d_i(x) - d_j(x) = 0.
If x belongs to class ω_i, then d_ij(x) > 0 for j = 1, 2, ..., W; j ≠ i.

Decision-theoretic methods (cont.)
Matching: an unknown pattern is assigned to the class to which it is closest with respect to a metric.
Minimum distance classifier: computes the Euclidean distance between the unknown pattern and each of the prototype vectors.
Correlation: can be directly formulated in terms of images.
Optimum statistical classifiers.
Neural networks.
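As a minimal sketch of this argmax decision rule (my own illustration; the two linear decision functions below are made up for demonstration):

import numpy as np

def classify(x, decision_functions):
    # Assign x to the class omega_i whose decision function d_i(x) is largest.
    scores = [d(x) for d in decision_functions]
    return int(np.argmax(scores))

# Two toy linear decision functions (illustrative only).
d1 = lambda x: x @ np.array([1.0, 0.0])
d2 = lambda x: x @ np.array([0.0, 1.0])
print(classify(np.array([0.2, 0.9]), [d1, d2]))  # prints 1, i.e. the second class wins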

Minimum distance classifier
The prototype of each pattern class is the mean vector of its training patterns:
m_j = (1/N_j) Σ_{x ∈ ω_j} x,  j = 1, 2, ..., W.
Using the Euclidean distance as a measure of closeness:
D_j(x) = ||x - m_j||,  j = 1, 2, ..., W.
We assign x to class ω_j if D_j(x) is the smallest distance. That is, the smallest distance implies the best match in this formulation.

Minimum distance classifier (cont.)
It is easy to show that selecting the smallest distance is equivalent to evaluating the functions
d_j(x) = x^T m_j - (1/2) m_j^T m_j,  j = 1, 2, ..., W,
and assigning x to class ω_j if d_j(x) yields the largest numerical value. This formulation agrees with the concept of a decision function.

Minimum distance classifier (cont.)
The decision boundary between classes ω_i and ω_j is given by
d_ij(x) = d_i(x) - d_j(x) = x^T (m_i - m_j) - (1/2) (m_i - m_j)^T (m_i + m_j) = 0.
The surface is the perpendicular bisector of the line segment joining m_i and m_j.
For n = 2 the perpendicular bisector is a line, for n = 3 it is a plane, and for n > 3 it is called a hyperplane.

Minimum distance classifier (cont.)
In practice, the classifier works well when the distance between means is large compared to the spread of each class. This seldom occurs unless the system designer controls the nature of the input.
An example is the recognition of characters on bank checks: the American Bankers Association E-13B font character set.
The characters are designed on a 9x7 grid and are scanned horizontally by a head that is narrower but taller than the character, which produces a 1-D signal proportional to the rate of change of the quantity of ink. The waveforms (signatures) are different for each character.
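A minimal NumPy sketch of this minimum distance rule (my own illustration; the prototype vectors and the test pattern are made up):

import numpy as np

def min_distance_classify(x, prototypes):
    # Pick the class with the largest d_j(x) = x^T m_j - 0.5 * m_j^T m_j,
    # which is equivalent to the smallest Euclidean distance ||x - m_j||.
    scores = [x @ m - 0.5 * (m @ m) for m in prototypes]
    return int(np.argmax(scores))

m1 = np.array([4.3, 1.3])   # prototype of class omega_1 (illustrative values)
m2 = np.array([1.5, 0.3])   # prototype of class omega_2
print(min_distance_classify(np.array([1.4, 0.2]), [m1, m2]))  # prints 1: x is closer to m2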

Matching by correlation
We have seen the definition of correlation and its properties in the Fourier domain:
g(x, y) = Σ_s Σ_t w(s, t) f(x + s, y + t),   G(u, v) = F*(u, v) W(u, v).
This definition is sensitive to scale changes in both images. Instead, we use the normalized correlation coefficient.

Matching by correlation (cont.)
Normalized correlation coefficient:
γ(x, y) = Σ_s Σ_t [w(s, t) - w̄] [f(x + s, y + t) - f̄_xy] / { Σ_s Σ_t [w(s, t) - w̄]^2  Σ_s Σ_t [f(x + s, y + t) - f̄_xy]^2 }^{1/2},
where w̄ is the average value of the template w and f̄_xy is the average value of f in the region coincident with w.
γ(x, y) takes values in [-1, 1]. The maximum occurs when the two regions are identical.

Matching by correlation (cont.)
It is robust to changes in the amplitudes. Normalization with respect to scale and rotation, however, is a challenging task.
The compared windows may be seen as random variables: the correlation coefficient measures the linear dependence between X and Y,
ρ(X, Y) = E[(X - m_X)(Y - m_Y)] / (σ_X σ_Y).

Matching by correlation (cont.)
Example (figures): detection of the eye of a hurricane; the image, the template, the map of correlation coefficients and the best match.

Optimum statistical classifiers
A probabilistic approach to recognition. It is possible to derive an optimal approach in the sense that, on average, it yields the lowest probability of committing classification errors.
The probability that a pattern x comes from class ω_j is denoted by p(ω_j/x).
If the classifier decides that x came from ω_j when it actually came from ω_i, it incurs a loss denoted by L_ij.
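A short sketch of the normalized correlation coefficient at a single window position (my own illustration, assuming the template w fits entirely inside the image f at offset (x, y)):

import numpy as np

def corr_coeff(f, w, x, y):
    # gamma(x, y) between template w and the region of f whose top-left corner is (x, y).
    region = f[x:x + w.shape[0], y:y + w.shape[1]]
    wd = w - w.mean()
    rd = region - region.mean()
    return float((wd * rd).sum() / np.sqrt((wd ** 2).sum() * (rd ** 2).sum()))

# Toy example: the template is an exact copy of an image patch, so gamma = 1.
f = np.arange(36, dtype=float).reshape(6, 6)
w = f[2:5, 1:4].copy()
print(corr_coeff(f, w, 2, 1))  # prints 1.0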

Optimum statistical classifiers (cont.)
As pattern x may belong to any of the W classes, the average loss in assigning x to ω_j is:
r_j(x) = Σ_{k=1}^{W} L_kj p(ω_k/x) = (1/p(x)) Σ_{k=1}^{W} L_kj p(x/ω_k) P(ω_k).
Because 1/p(x) is positive and common to all the r_j(x), the expression reduces to:
r_j(x) = Σ_{k=1}^{W} L_kj p(x/ω_k) P(ω_k).
Here p(x/ω_k) is the pdf of the patterns of class ω_k (class-conditional density) and P(ω_k) is the probability of occurrence of class ω_k (a priori or prior probability).
The classifier evaluates r_1(x), r_2(x), ..., r_W(x) and assigns pattern x to the class with the smallest average loss.

Optimum statistical classifiers (cont.)
The classifier that minimizes the total average loss is called the Bayes classifier. It assigns an unknown pattern x to class ω_i if:
r_i(x) < r_j(x), for j = 1, 2, ..., W,
or
Σ_{k=1}^{W} L_ki p(x/ω_k) P(ω_k) < Σ_{q=1}^{W} L_qj p(x/ω_q) P(ω_q), for all j ≠ i.

Optimum statistical classifiers (cont.)
The loss for a wrong decision is generally assigned a non-zero value (e.g. 1) and the loss for a correct decision is 0. Therefore, L_ij = 1 - δ_ij and
r_j(x) = Σ_{k=1}^{W} (1 - δ_kj) p(x/ω_k) P(ω_k) = p(x) - p(x/ω_j) P(ω_j).

Optimum statistical classifiers (cont.)
With this loss, the Bayes classifier assigns pattern x to class ω_i if:
p(x) - p(x/ω_i) P(ω_i) < p(x) - p(x/ω_j) P(ω_j),
or
p(x/ω_i) P(ω_i) > p(x/ω_j) P(ω_j), for j = 1, 2, ..., W; j ≠ i,
which amounts to computing the decision functions
d_j(x) = p(x/ω_j) P(ω_j),  j = 1, 2, ..., W,
and assigning pattern x to the class whose decision function yields the largest numerical value.

Optimum statistical classifiers (cont.)
The probability of occurrence of each class, P(ω_j), must be known. Generally, we consider the classes equally likely, P(ω_j) = 1/W.
The probability densities of the patterns in each class, p(x/ω_j), must also be known. This is a more difficult problem (especially for multidimensional variables) which requires methods from pdf estimation.
Generally, we assume analytic expressions for the pdf, whose parameters may be estimated from sample patterns. The Gaussian is the most common pdf.
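A minimal sketch of this Bayes rule with the 0-1 loss (my own illustration; the class-conditional densities and priors below are assumed known and are purely illustrative):

import numpy as np
from scipy.stats import norm

def bayes_classify(x, class_pdfs, priors):
    # Assign x to the class maximizing d_j(x) = p(x/omega_j) * P(omega_j).
    scores = [pdf(x) * prior for pdf, prior in zip(class_pdfs, priors)]
    return int(np.argmax(scores))

# Two 1-D Gaussian classes with known means, variances and equal priors.
pdfs = [lambda x: norm.pdf(x, loc=0.0, scale=1.0),
        lambda x: norm.pdf(x, loc=3.0, scale=1.0)]
print(bayes_classify(2.0, pdfs, [0.5, 0.5]))  # prints 1: x = 2 is more probable under class 2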

Bayes classifier for Gaussian pattern classes
We first consider the 1-D case for W = 2 classes:
d_j(x) = p(x/ω_j) P(ω_j) = (1 / (sqrt(2π) σ_j)) e^{-(x - m_j)^2 / (2σ_j^2)} P(ω_j),  j = 1, 2.
For P(ω_j) = 1/2, the decision boundary is the point where the two class-conditional densities are equal.

Bayes classifier for Gaussian pattern classes (cont.)
In the n-D case:
p(x/ω_j) = 1 / ((2π)^{n/2} |C_j|^{1/2}) e^{-(1/2)(x - m_j)^T C_j^{-1} (x - m_j)}.
Each density is specified by its mean vector and its covariance matrix:
m_j = E_j[x],   C_j = E_j[(x - m_j)(x - m_j)^T].

Bayes classifier for Gaussian pattern classes (cont.)
The mean vector and covariance matrix are approximated from samples of the classes:
m_j = (1/N_j) Σ_{x ∈ ω_j} x,   C_j = (1/N_j) Σ_{x ∈ ω_j} (x - m_j)(x - m_j)^T.

Bayes classifier for Gaussian pattern classes (cont.)
It is more convenient to work with the natural logarithm of the decision function, as the logarithm is monotonically increasing and does not change the order of the decision functions:
d_j(x) = ln[p(x/ω_j) P(ω_j)] = ln p(x/ω_j) + ln P(ω_j)
       = ln P(ω_j) - (1/2) ln|C_j| - (1/2)(x - m_j)^T C_j^{-1} (x - m_j)
(dropping the constant term (n/2) ln 2π, which is common to all classes). The decision functions are hyperquadrics.

Bayes classifier for Gaussian pattern classes (cont.)
If all the classes have the same covariance matrix, C_j = C for j = 1, 2, ..., W, the decision functions are linear (hyperplanes):
d_j(x) = ln P(ω_j) + x^T C^{-1} m_j - (1/2) m_j^T C^{-1} m_j.
Moreover, if P(ω_j) = 1/W and C = I:
d_j(x) = x^T m_j - (1/2) m_j^T m_j,
which is the minimum distance classifier decision function.

Bayes classifier for Gaussian pattern classes (cont.)
The minimum distance classifier is therefore optimum in the Bayes sense if:
the pattern classes are Gaussian,
all classes are equally likely to occur, and
all covariance matrices are equal to (the same multiple of) the identity matrix.
Gaussian pattern classes satisfying these conditions are spherical clouds (hyperspheres). The classifier establishes a hyperplane between every pair of classes; it is the perpendicular bisector of the line segment joining the centers of the classes.
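A compact sketch of the log-domain Gaussian decision function above (my own illustration; in practice m_j and C_j would be estimated from labelled samples as described):

import numpy as np

def gaussian_decision(x, m, C, prior):
    # d_j(x) = ln P(omega_j) - 0.5*ln|C_j| - 0.5*(x - m_j)^T C_j^{-1} (x - m_j)
    diff = x - m
    return (np.log(prior)
            - 0.5 * np.log(np.linalg.det(C))
            - 0.5 * diff @ np.linalg.solve(C, diff))

# Two illustrative 2-D Gaussian classes with equal priors and C = I.
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), np.eye(2)]
x = np.array([2.5, 2.0])
scores = [gaussian_decision(x, m, C, 0.5) for m, C in zip(means, covs)]
print(int(np.argmax(scores)))  # prints 1; with equal C = I this reduces to the minimum distance rule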

Application to remotely sensed images
The patterns are 4-D vectors. Three classes are considered: water, urban development and vegetation.
The mean vectors and covariance matrices are learnt from samples whose class is known. Here, we use samples taken from the image itself to learn the pdf parameters.

Application to remotely sensed images (cont.)
(Figures: the spectral bands, the training regions for the three classes and the resulting classification.)

Structural methods
Matching shape numbers. String matching.

Matching shape numbers
The degree of similarity, k, between two shapes is defined as the largest order for which their shape numbers still coincide.

Reminder: shape numbers
The shape number of a boundary is the first difference of smallest magnitude of its chain code (which gives invariance to rotation).
The order n of a shape number is defined as the number of digits in its representation.
Examples: all closed shapes of order n = 4, 6 and 8.
First differences are computed by treating the chain as a circular sequence.
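A small sketch of how a shape number can be computed from a 4-directional chain code, following the definitions above (my own illustration; the rotation of smallest magnitude is found by comparing circular rotations of the first difference):

def shape_number(chain):
    # Circular first difference of a 4-directional chain code, then the
    # rotation of smallest magnitude when the digits are read as a number.
    n = len(chain)
    diff = [(chain[i] - chain[i - 1]) % 4 for i in range(n)]
    rotations = [diff[i:] + diff[:i] for i in range(n)]
    return min(rotations)  # lexicographic minimum = smallest magnitude for equal-length digit strings

# Chain code of a unit square (order n = 4).
print(shape_number([0, 3, 2, 1]))  # prints [3, 3, 3, 3]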

Matching shape numbers (cont.)
Let a and b denote two closed shapes which are represented by 4-directional chain codes, and let s(a) and s(b) be their shape numbers. The shapes have a degree of similarity, k, if:
s_j(a) = s_j(b)  for j = 4, 6, 8, ..., k, and
s_j(a) ≠ s_j(b)  for j = k + 2, k + 4, ...
This means that the first k digits should be equal. The subscript indicates the order. For 4-directional chain codes, the minimum order for a closed boundary is 4.

Matching shape numbers (cont.)
Alternatively, the distance between two shapes a and b is defined as the inverse of their degree of similarity:
D(a, b) = 1/k.
It satisfies the properties:
D(a, b) ≥ 0,
D(a, b) = 0 if and only if a = b,
D(a, c) ≤ max[D(a, b), D(b, c)].

String matching
Region boundaries a and b are coded into strings denoted a_1 a_2 a_3 ... a_n and b_1 b_2 b_3 ... b_m.
Let p represent the number of matches between the two strings; a match at the k-th position occurs if a_k = b_k. The number of symbols that do not match is:
q = max(|a|, |b|) - p.
A simple measure of similarity is:
R = number of matches / number of mismatches = p / q = p / (max(|a|, |b|) - p).
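A minimal sketch of this similarity measure (my own illustration; positions beyond the end of the shorter string count as mismatches):

def string_similarity(a, b):
    # R = p / q, with p the number of matching positions and q = max(|a|, |b|) - p.
    p = sum(1 for x, y in zip(a, b) if x == y)
    q = max(len(a), len(b)) - p
    return float('inf') if q == 0 else p / q  # identical strings give R = infinity

# Toy boundary strings (illustrative symbols only).
print(string_similarity("abaab", "abbab"))  # p = 4, q = 1, so R = 4.0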