INF 4300: Introduction to classification. Anne Solberg. Based on Chapter 2 (2.1-2.6) in Duda and Hart: Pattern Classification.

Introduction to classification
One of the most challenging topics in image analysis is recognizing a specific object in an image. To do that, several steps are normally used: some kind of segmentation delimits the spatial extent of the foreground objects, then a set of features describing the object characteristics is computed, before the recognition step decides which object type this is. The focus of the next three lectures is the recognition step.

The starting point is that we have a set of K features computed for an object. These features can describe either the shape or the gray-level/color properties of the object. Based on these features we need to decide what kind of object this is. Statistical classification is the process of estimating the probability that an object belongs to one of S object classes, based on the observed values of the K features.

Classification can be done either unsupervised or supervised. In unsupervised classification the categories or classes are not known, and the classification process is based on grouping similar objects together. In supervised classification the categories or classes are known, and classification consists of estimating the probability that the object belongs to each of the S object classes; the object is assigned to the class with the highest probability. For supervised classification, training data is needed. Training data consists of a set of objects with known class type, and they are used to estimate the parameters of the classifier. The performance of a classifier is normally reported as the accuracy it achieves when classifying a different set of objects with known class labels, called the test set.

Assume that each object i in the scene is represented by a feature vector $x_i$. If we have computed K features, $x_i$ is a vector with K components: $x_i = [x_{i1}, x_{i2}, \ldots, x_{iK}]^T$. A classifier that is based on K features is called a multivariate classifier. The K features and the objects in the training data set define a K-dimensional feature space. The training data set is a set of objects with known class labels. As we saw last week, we can visualize class separation on data with known class labels if we have one or two features, but visualizing a multidimensional space is difficult.

In a scatter plot, each object in the training data set is plotted in the feature space at the position given by the values of its features. Each object is represented by a dot, and the color of the dot represents the class. In this example we have 4 classes visualized using 2 features. A scatter plot is a tool for visualizing 2 features; we will later look at other class separability measures.
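A scatter plot like the one described can be generated with a few lines of Python; the feature values, class count, and cluster centers below are synthetic stand-ins, not data from the lecture.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic training set: 4 classes, 2 features, 50 objects per class.
means = [(1, 1), (4, 1), (1, 4), (4, 4)]
X = np.vstack([rng.normal(m, 0.5, size=(50, 2)) for m in means])
labels = np.repeat(np.arange(4), 50)

# Each object is one dot in the 2-D feature space; the color is its class.
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap="tab10", s=15)
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.title("Training data with known class labels")
plt.show()
```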

Concepts in classification
In the following three lectures we will cover these topics related to classification:
- Training set
- Test set
- Classifier accuracy / confusion matrices
- Computing the probability that an object belongs to a class
- Letting each class be represented by a probability density function (in general many probability densities can be used, but we will use the multivariate normal distribution, which is commonly used)
- Bayes rule
- Discriminant functions / decision boundaries
- The normal distribution, mean vectors and covariance matrices
- kNN classification
- Unsupervised classification / clustering

Introduction to classification
Supervised classification is related to thresholding, which divides the image into two classes: foreground and background. Thresholding is a two-class classification problem based on a 1D feature vector: the feature vector consists only of the gray level f(x,y). How can we classify a feature vector of K shape features into the correct character type? We will now study multivariate classification theory, where we use K features to decide which of S object classes an object belongs to. Recommended additional reading: Pattern Classification, R. Duda, P. Hart and D. Stork; Chapter 1 (Introduction) and Chapter 2 (Bayesian Decision Theory, 2.1-2.6). See ~inf3300/www_docs/bilder/dudahart_chap.pdf and dudahart-appendix.pdf.

Plan for this lecture:
- Explain the relation between thresholding and classification with 2 classes
- Background in probability theory
- Bayes rule
- Classification with a Gaussian density and a single feature
- Briefly: training and testing a classifier
We go deeper into this in two weeks.

From INF 2310: Thresholding
Basic thresholding assigns all pixels in the image to one of 2 classes, foreground or background:
$$g(x,y) = \begin{cases} 1 & \text{if } f(x,y) \le T \\ 0 & \text{if } f(x,y) > T \end{cases}$$
This can be seen as a 2-class classification problem based on a single feature, the gray level. The classes are background and foreground, and the threshold T defines the border between them.
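In code, this rule is a single comparison per pixel. A minimal numpy sketch; the toy image, the threshold value, and the convention that dark pixels are foreground are assumptions for illustration:

```python
import numpy as np

def threshold(f, T):
    """Two-class classification with a single feature, the gray level.

    Dark pixels (f <= T) are labeled 1 (foreground, e.g. text),
    bright pixels (f > T) are labeled 0 (background).
    """
    return (f <= T).astype(np.uint8)

f = np.array([[10, 200], [90, 250]])   # toy 2x2 gray-level image
print(threshold(f, T=128))             # [[1 0] [1 0]]
```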

Classification error for thresholding
[Figure: the two gray-level distributions, background b(z) and foreground f(z), overlapping around a threshold t. In the region above t, foreground pixels are misclassified as background; in the region below t, background pixels are misclassified as foreground.]

We assume that b(z) is the normalized histogram for the background and f(z) is the normalized histogram for the foreground. The histograms are estimates of the probability distributions of the gray levels in the image. Let $P_F$ and $P_B$ be the prior probabilities for foreground and background ($P_B + P_F = 1$). The normalized histogram for the whole image is then given by
$$p(z) = P_B\, b(z) + P_F\, f(z).$$
The probability of misclassification, given a threshold t, has two parts:
$$E_B(t) = \int_t^{\infty} f(z)\, dz \quad \text{(foreground misclassified as background)},$$
$$E_F(t) = \int_{-\infty}^{t} b(z)\, dz \quad \text{(background misclassified as foreground)}.$$

Find the T that minimizes the error:
$$E(t) = P_F \int_t^{\infty} f(z)\, dz + P_B \int_{-\infty}^{t} b(z)\, dz,$$
$$\frac{dE(t)}{dt} = 0 \;\Rightarrow\; P_F\, f(T) = P_B\, b(T).$$
Minimum error is achieved by setting T equal to the point where the (weighted) probabilities for foreground and background are equal.

Distributions, standard deviation and variance
A Gaussian (normal) distribution is specified by its mean value $\mu$ and its variance $\sigma^2$:
$$p(z) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(z-\mu)^2}{2\sigma^2}\right),$$
where $\sigma^2$ is the variance and $\sigma$ the standard deviation.
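As a numerical sanity check, E(t) can be evaluated for every candidate threshold and minimized by brute force. A sketch with assumed class histograms (discretized Gaussians, not the lecture's data):

```python
import numpy as np

# Toy normalized histograms on gray levels 0..255 (assumed parameters):
# dark foreground around 80, bright background around 170.
z = np.arange(256)
f = np.exp(-(z - 80.0) ** 2 / (2 * 15.0 ** 2))
f /= f.sum()
b = np.exp(-(z - 170.0) ** 2 / (2 * 20.0 ** 2))
b /= b.sum()
P_F, P_B = 0.3, 0.7

def error(t):
    # E(t): foreground mass above t plus background mass at or below t,
    # weighted by the priors (pixels <= t are called foreground).
    return P_F * f[t + 1:].sum() + P_B * b[:t + 1].sum()

best_t = min(range(256), key=error)
print(best_t, error(best_t))
```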

Two Gaussian distributions for a single feature
Assume that b(z) and f(z) are Gaussian distributions; then
$$p(z) = \frac{P_B}{\sqrt{2\pi}\,\sigma_B} \exp\!\left(-\frac{(z-\mu_B)^2}{2\sigma_B^2}\right) + \frac{P_F}{\sqrt{2\pi}\,\sigma_F} \exp\!\left(-\frac{(z-\mu_F)^2}{2\sigma_F^2}\right),$$
where $\mu_B$ and $\mu_F$ are the mean values for background and foreground, and $\sigma_B^2$ and $\sigma_F^2$ are their variances.

The 2-class classification problem summarized
Given two Gaussian distributions b(z) and f(z), where the classes have prior probabilities $P_F$ and $P_B$: every pixel should be assigned to the class that minimizes the classification error. The classification error is minimized at the point where $P_F f(z) = P_B b(z)$. What we will do now is generalize this to K classes and D features.
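With Gaussian class densities, the decision point can also be located directly as the crossing of the two weighted curves. A standalone sketch with the same assumed toy parameters as above:

```python
import numpy as np

# Minimum-error threshold as the point where P_F * f(z) = P_B * b(z).
mu_F, sig_F, P_F = 80.0, 15.0, 0.3
mu_B, sig_B, P_B = 170.0, 20.0, 0.7

def gauss(zz, mu, sig):
    return np.exp(-(zz - mu) ** 2 / (2 * sig ** 2)) / (np.sqrt(2 * np.pi) * sig)

# Search for the crossing between the two class means.
zz = np.linspace(mu_F, mu_B, 10001)
T = zz[np.argmin(np.abs(P_F * gauss(zz, mu_F, sig_F) - P_B * gauss(zz, mu_B, sig_B)))]
print(T)   # should agree with the brute-force sweep above
```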

How do we find the best border between K classes with D features? We will find the theoretical answer and a geometrical interpretation of class means, variances, and the equivalent of a threshold.

The goal of classification
We estimate the decision boundaries based on training data. Classification performance is always estimated on a separate test data set. We try to measure the generalization performance: the classifier should perform well when classifying new samples, i.e. have the lowest possible classification error. We often face a tradeoff between the classification error on the training set and the generalization ability when determining the complexity of the decision boundary.

Probability theory (Appendix A.4)
Let x be a discrete random variable that can assume any of a finite number of M different values. The probability that x takes its i-th value is $p_i = \Pr(x = v_i)$, $i = 1, \ldots, M$. A probability distribution must sum to 1, and all probabilities must be non-negative:
$$p_i \ge 0 \quad \text{and} \quad \sum_{i=1}^{M} p_i = 1.$$

Expected values: definition
The expected value (or mean) of a random variable x is:
$$\mu = E[x] = \sum_{i=1}^{M} v_i\, p_i.$$
The variance (or second-order moment) is:
$$\sigma^2 = \mathrm{Var}[x] = E[(x-\mu)^2] = \sum_{i=1}^{M} (v_i - \mu)^2\, p_i.$$
These will be estimated from training data, where we know the true class labels (see foil 39 for the univariate case).
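The two definitions translate directly into code; the values and probabilities below are made up for illustration:

```python
import numpy as np

# A discrete random variable with M = 4 values (toy numbers).
v = np.array([1.0, 2.0, 3.0, 4.0])    # possible values v_i
p = np.array([0.1, 0.2, 0.3, 0.4])    # probabilities p_i, summing to 1

mu = np.sum(v * p)                    # E[x] = sum_i v_i p_i
var = np.sum((v - mu) ** 2 * p)       # E[(x - mu)^2]
print(mu, var)                        # 3.0 1.0
```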

Pairs of random variables
Let x and y be random variables. The joint probability of observing a pair of values $(x = v_i, y = w_j)$ is $p_{ij}$. Alternatively we can define a joint probability distribution function $P(x,y)$, for which
$$P(x,y) \ge 0 \quad \text{and} \quad \sum_x \sum_y P(x,y) = 1.$$
The marginal distributions for x and y (if we want to eliminate one of them) are:
$$P_x(x) = \sum_y P(x,y), \qquad P_y(y) = \sum_x P(x,y).$$

Statistical independence; expected values of two variables
Variables x and y are statistically independent if and only if $P(x,y) = P_x(x)\, P_y(y)$. Two variables are uncorrelated if their covariance $\sigma_{xy} = 0$. Expected values of two variables:
$$E[f(x,y)] = \sum_x \sum_y f(x,y)\, P(x,y),$$
$$\mu_x = E[x], \quad \mu_y = E[y], \quad \sigma_x^2 = E[(x-\mu_x)^2], \quad \sigma_y^2 = E[(y-\mu_y)^2], \quad \sigma_{xy} = E[(x-\mu_x)(y-\mu_y)].$$
Where in this course have you seen similar formulas?
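These definitions are easy to check numerically on a small joint table (the numbers are arbitrary); the last line anticipates the conditional probability defined on the next slide:

```python
import numpy as np

# A toy joint distribution P(x, y) as a table: rows index x, columns index y.
P = np.array([[0.10, 0.20],
              [0.30, 0.40]])
assert np.isclose(P.sum(), 1.0)

Px = P.sum(axis=1)                     # marginal P_x(x): sum out y
Py = P.sum(axis=0)                     # marginal P_y(y): sum out x

# Independence would require P(x, y) = P_x(x) * P_y(y) in every cell.
print(np.allclose(P, np.outer(Px, Py)))   # False for this table

# Conditional distribution P(x | y): each column normalized by P_y(y).
print(P / Py)                          # columns sum to 1
```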

[Figure: a scatter plot over Feature 1 and Feature 2. Exercise: can you sketch the joint distribution P(x,y) and the marginals $P_x(x)$ and $P_y(y)$?]

Conditional probability
If two variables are statistically dependent, knowing the value of one of them lets us get a better estimate of the value of the other one. The conditional probability of x given y is:
$$\Pr(x = v_i \mid y = w_j) = \frac{p_{ij}}{\Pr(y = w_j)},$$
and for distributions:
$$P(x \mid y) = \frac{P(x,y)}{P_y(y)}.$$
Example: thresholding a page with dark text on a white background. x is the gray level of a pixel, and y is its class (F or B). If we consider which gray levels x can have, we expect small values if y is text (y = F), and large values if y is background (y = B).

Expected values of M variables
Using vector notation:
$$\mu = E[x], \qquad \Sigma = E[(x - \mu)(x - \mu)^T].$$

Bayesian decision theory
A fundamental statistical approach to pattern classification, named after Thomas Bayes (1702-1761), an English priest and mathematician. It combines prior knowledge about the problem with a probability distribution. The most central concept is the Bayes decision rule.

Bayes rule in general
The equation:
$$P(y \mid x) = \frac{p(x \mid y)\, P(y)}{p(x)}, \qquad p(x) = \int p(x \mid y)\, P(y)\, dy.$$
In words: posterior = (likelihood × prior) / evidence. Here x are the observations and y is the unknown class label. We want to find the most probable class given the observations. (To be explained for the classification problem later.)

Mean vectors and covariance matrices in n dimensions
If f is an n-dimensional feature vector, we can formulate its mean vector and covariance matrix as:
$$\mu = E[f] = \begin{bmatrix} E[f_1] \\ \vdots \\ E[f_n] \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix}, \qquad \Sigma = E[(f-\mu)(f-\mu)^T], \quad \sigma_{kl} = E[(f_k - \mu_k)(f_l - \mu_l)].$$
With n features, the mean vector is of size n × 1 and $\Sigma$ is of size n × n. The matrix $\Sigma$ is symmetric, since $\sigma_{kl} = \sigma_{lk}$.
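A sketch of how these quantities are estimated from samples (synthetic data; for n features the estimates have the stated sizes, and the symmetry is easy to verify):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))          # 500 samples of a 3-D feature vector f

mu = X.mean(axis=0)                    # estimate of the mean vector E[f]
Xc = X - mu
Sigma = Xc.T @ Xc / X.shape[0]         # estimate of E[(f - mu)(f - mu)^T]

print(mu.shape, Sigma.shape)           # (3,) (3, 3)
print(np.allclose(Sigma, Sigma.T))     # True: sigma_kl = sigma_lk
```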

Bayes rule for a classification problem
Suppose we have J classes $\omega_j$, $j = 1, \ldots, J$, where $\omega_j$ is the class label for a pixel and x is the observed gray level (or feature vector). We can use Bayes rule to find an expression for the class with the highest probability:
$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)} \qquad \text{(posterior = likelihood × prior / normalizing factor)}.$$
$P(\omega_j)$ is the prior probability; for thresholding, it is the prior probability of background or foreground. If we don't have special knowledge that one of the classes occurs more frequently than the others, we set the priors equal for all classes: $P(\omega_j) = 1/J$, $j = 1, \ldots, J$. A small p denotes a probability density function; a capital P denotes a probability (a scalar value between 0 and 1).

Bayes rule explained
$p(x \mid \omega_j)$ is the probability density function that models the likelihood of observing gray level x if the pixel belongs to class $\omega_j$. Typically we assume a type of distribution, e.g. Gaussian, and the mean and covariance of that distribution are fitted to some data that we know belong to that class. This fitting is called classifier training. $P(\omega_j \mid x)$ is the posterior probability that the pixel actually belongs to class $\omega_j$. We will soon see that the classifier that achieves the minimum error is the one that assigns each pixel to the class with the highest posterior probability. $p(x)$ is just a scaling factor that ensures that the posterior probabilities sum to 1.
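Numerically, Bayes rule is one multiplication and one normalization per class. A sketch with assumed Gaussian likelihoods and priors (toy parameters, not the lecture's data):

```python
import numpy as np

def gauss(x, mu, sig):
    """Univariate normal density, used here as the likelihood p(x | w_j)."""
    return np.exp(-(x - mu) ** 2 / (2 * sig ** 2)) / (np.sqrt(2 * np.pi) * sig)

# Assumed toy parameters for J = 2 classes (dark foreground, bright background).
mus = np.array([80.0, 170.0])
sigs = np.array([15.0, 20.0])
priors = np.array([0.3, 0.7])          # P(w_j)

x = 120.0                              # observed gray level
joint = gauss(x, mus, sigs) * priors   # p(x | w_j) * P(w_j)
posteriors = joint / joint.sum()       # divide by p(x), the scaling factor
print(posteriors, posteriors.sum())    # posteriors sum to 1
```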

Probability of error
If we have 2 classes, we make an error either if we decide $\omega_1$ when the true class is $\omega_2$, or if we decide $\omega_2$ when the true class is $\omega_1$. If $P(\omega_1 \mid x) > P(\omega_2 \mid x)$, we have more belief that x belongs to $\omega_1$, and we decide $\omega_1$. The probability of error is then:
$$P(\mathrm{error} \mid x) = \begin{cases} P(\omega_1 \mid x) & \text{if we decide } \omega_2, \\ P(\omega_2 \mid x) & \text{if we decide } \omega_1. \end{cases}$$

Back to the classification error for thresholding
[Figure: the background and foreground distributions again; in one region foreground pixels are misclassified as background, in the other background pixels are misclassified as foreground.]
$$P(\mathrm{error}) = \int P(\mathrm{error}, x)\, dx = \int P(\mathrm{error} \mid x)\, p(x)\, dx.$$

Minimizing the error
$$P(\mathrm{error}) = \int P(\mathrm{error}, x)\, dx = \int P(\mathrm{error} \mid x)\, p(x)\, dx.$$
When we derived the optimal threshold, we showed that the minimum error was achieved by placing the threshold (or decision border, as we will call it now) at the point where $P(\omega_1 \mid x) = P(\omega_2 \mid x)$. This is still valid.

Bayes decision rule
In the 2-class case, our goal of minimizing the error implies the decision rule: decide $\omega_1$ if $P(\omega_1 \mid x) > P(\omega_2 \mid x)$; otherwise decide $\omega_2$. For J classes, the rule extends analogously: choose the class with the maximum a posteriori probability. The decision boundary is the border between classes i and j, simply where $P(\omega_i \mid x) = P(\omega_j \mid x)$, exactly where the threshold was set in minimum-error thresholding!

Bayes classification with J classes and D features
How do we generalize:
- to more than one feature at a time,
- to J classes,
- to loss functions where some errors are more costly than others?

Bayes rule with c classes and d features
If we measure d features, x will be a d-dimensional feature vector. Let $\{\omega_1, \ldots, \omega_c\}$ be the set of c classes. The posterior probability for class $\omega_j$ is now computed as:
$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)}, \qquad p(x) = \sum_{j=1}^{c} p(x \mid \omega_j)\, P(\omega_j).$$
Still, we assign a pixel with feature vector x to the class that has the highest posterior probability: decide $\omega_i$ if $P(\omega_i \mid x) \ge P(\omega_j \mid x)$ for all $j \ne i$.

Discriminant functions
The decision rule "decide $\omega_i$ if $P(\omega_i \mid x) \ge P(\omega_j \mid x)$ for all $j \ne i$" can be written as: assign x to $\omega_i$ if $g_i(x) > g_j(x)$ for all $j \ne i$. The classifier computes J discriminant functions $g_i(x)$ and selects the class corresponding to the largest value of the discriminant function. Since classification consists of choosing the class that has the largest value, rescaling the discriminant functions, $g_i(x) \to f(g_i(x))$, will not affect the decision if f is a monotonically increasing function. This can lead to simplifications, as we will soon see.

Equivalent discriminant functions
The following choices of discriminant functions give equivalent decisions:
$$g_i(x) = P(\omega_i \mid x) = \frac{p(x \mid \omega_i)\, P(\omega_i)}{p(x)},$$
$$g_i(x) = p(x \mid \omega_i)\, P(\omega_i),$$
$$g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i).$$
The effect of the decision rules is to divide the feature space into c decision regions $R_1, \ldots, R_c$. If $g_i(x) > g_j(x)$ for all $j \ne i$, then x is in region $R_i$. The regions are separated by decision boundaries: surfaces in feature space where the discriminant functions of two classes are equal.
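The logarithmic form is the one typically used in code, since it removes the exponential and avoids underflow. A sketch with the same assumed two-class Gaussian parameters as earlier:

```python
import numpy as np

# ln is monotonically increasing, so g_j(x) = ln p(x | w_j) + ln P(w_j)
# gives the same argmax as the posterior itself; the term -0.5*ln(2*pi)
# is common to all classes and is dropped.
mus = np.array([80.0, 170.0])
sigs = np.array([15.0, 20.0])
priors = np.array([0.3, 0.7])

def g(x):
    return -0.5 * ((x - mus) / sigs) ** 2 - np.log(sigs) + np.log(priors)

for x in (90.0, 150.0):
    print(x, np.argmax(g(x)))   # index of the selected class
```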

Decision functions, two classes
If we have only two classes, assigning x to $\omega_1$ if $g_1 > g_2$ is equivalent to using a single discriminant function $g(x) = g_1(x) - g_2(x)$ and deciding $\omega_1$ if $g(x) > 0$. The following functions are equivalent:
$$g(x) = P(\omega_1 \mid x) - P(\omega_2 \mid x),$$
$$g(x) = \ln \frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} + \ln \frac{P(\omega_1)}{P(\omega_2)}.$$

The Gaussian density, univariate case (a single feature)
To use a classifier we need to select a probability density function. The most commonly used probability density is the normal (Gaussian) distribution:
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),$$
with expected value (mean) $\mu = E[x] = \int x\, p(x)\, dx$ and variance $\sigma^2 = E[(x-\mu)^2] = \int (x-\mu)^2\, p(x)\, dx$.

Training a univariate Gaussian classifier
To be able to compute the value of the discriminant function, we need an estimate of $\mu_k$ and $\sigma_k$ for each class k. Assume that we know the true class labels for some pixels, and that this is given in a mask image. Training the classifier then consists of computing $\mu_k$ and $\sigma_k$ for all classes from the pixels with class label k in the mask file. They are computed from the training data as:
$$\hat{\mu}_k = \frac{1}{N_k} \sum_{i:\,\mathrm{label}(i)=k} x_i, \qquad \hat{\sigma}_k^2 = \frac{1}{N_k} \sum_{i:\,\mathrm{label}(i)=k} (x_i - \hat{\mu}_k)^2,$$
where $N_k$ is the number of training pixels with label k.

Classification with a univariate Gaussian
1. Decide on values for the prior probabilities $P(\omega_j)$. If we have no prior information, assume that all classes are equally probable: $P(\omega_j) = 1/c$.
2. Estimate $\mu_j$ and $\sigma_j$ based on training data, using the formulas above.
3. For each class $j = 1, \ldots, J$, compute the discriminant function
$$P(\omega_j \mid x) \propto p(x \mid \omega_j)\, P(\omega_j) = \frac{P(\omega_j)}{\sqrt{2\pi}\,\sigma_j} \exp\!\left(-\frac{(x-\mu_j)^2}{2\sigma_j^2}\right).$$
4. Assign the pixel to the class with the highest value of $P(\omega_j \mid x)$.
The result after classification is an image with class labels corresponding to the most probable class for each pixel. We compute the classification error rate from an independent test mask. A sketch of the whole procedure in code is given below.
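A compact sketch of the train-and-classify loop under stated assumptions: the image, the mask layout, and the equal priors are synthetic illustrations, not the lecture's data.

```python
import numpy as np

def train(image, mask, n_classes):
    """Estimate (mu_k, sigma_k) from the pixels with label k in the mask.

    Mask value 0 means 'not training data'; class k has mask value k.
    """
    return [(image[mask == k].mean(), image[mask == k].std())
            for k in range(1, n_classes + 1)]

def classify(image, params, priors):
    """Assign every pixel to the class with the highest p(x | w_j) P(w_j)."""
    scores = [P * np.exp(-(image - mu) ** 2 / (2 * sig ** 2))
              / (np.sqrt(2 * np.pi) * sig)
              for (mu, sig), P in zip(params, priors)]
    return np.argmax(scores, axis=0) + 1        # class labels 1..n_classes

# Toy run: dark class on the left half, bright class on the right half.
rng = np.random.default_rng(2)
image = rng.normal(100.0, 10.0, size=(8, 8))
image[:, 4:] = rng.normal(180.0, 12.0, size=(8, 4))
mask = np.zeros((8, 8), dtype=int)
mask[:2, :2] = 1                                # a few labeled pixels, class 1
mask[:2, 6:] = 2                                # a few labeled pixels, class 2
print(classify(image, train(image, mask, 2), priors=[0.5, 0.5]))
```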

Example: image and training masks
The masks contain labels for the training data. If a pixel is not part of the training data, it will have label 0. A pixel belonging to class k will have value k in the mask image. We should have a similar mask for the test data.

Estimating classification error
A simple measure of classification accuracy is to count the percentage of correctly classified pixels, either overall (averaged over all classes) or per class. If a pixel has true class label k, it is correctly classified if its assigned label equals k. Normally we use different pixels to train and test a classifier, so we have a disjoint training mask and test mask. Estimate the classification error by classifying all pixels in the test set and counting the percentage of wrongly classified pixels.
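Counting per-class outcomes over the test mask gives a confusion matrix; the overall accuracy is its normalized trace. A sketch with made-up labels:

```python
import numpy as np

def confusion_matrix(true_mask, predicted, n_classes):
    """Rows: true class, columns: predicted class (labels 1..n_classes).

    Only pixels inside the test mask (true label > 0) are counted.
    """
    cm = np.zeros((n_classes, n_classes), dtype=int)
    inside = true_mask > 0
    for t, p in zip(true_mask[inside], predicted[inside]):
        cm[t - 1, p - 1] += 1
    return cm

# Toy test mask and classified labels (made-up values):
true_mask = np.array([[1, 1, 0], [2, 2, 0]])
predicted = np.array([[1, 2, 1], [2, 2, 1]])
cm = confusion_matrix(true_mask, predicted, n_classes=2)
print(cm)                               # [[1 1] [0 2]]
print(np.trace(cm) / cm.sum())          # overall accuracy: 0.75
```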
