INF 4300 Introduction to classification. Anne Solberg. Based on Chapter 2 (2.1-2.6) in Duda and Hart: Pattern Classification.


INF 4300: Introduction to classification
Anne Solberg (anne@ifi.uio.no)
Based on Chapter 2 (2.1-2.6) in Duda and Hart: Pattern Classification

Mandatory project
Main task: classification. You must implement a classification algorithm. Tentative schedule: exercise available in November; tentative deadline: November 3.

Introduction to classification
Supervised classification is related to thresholding, which divides the image into two classes: foreground and background. Thresholding is a two-class classification problem based on a 1D feature vector, where the feature vector consists of only the grey level f(x,y). How can we classify a feature vector of N shape features into the correct character type? We will now study multivariate classification theory, where we use N features to determine if an object belongs to one of a set of K object classes. Recommended additional reading: Pattern Classification, R. Duda, P. Hart and D. Stork: Chapter 1 (Introduction) and Chapter 2 (Bayesian Decision Theory), 2.1-2.6.

From INF 2310: Thresholding
Basic thresholding assigns all pixels in the image to one of 2 classes, foreground or background:

$g(x,y) = \begin{cases} 0 & \text{if } f(x,y) \le T \\ 1 & \text{if } f(x,y) > T \end{cases}$

This can be seen as a 2-class classification problem based on a single feature, the gray level.
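As a minimal sketch, this thresholding rule in Python (the image values and the threshold T below are made-up for illustration):

```python
import numpy as np

def threshold(image, T):
    """Two-class classification by thresholding: 0 = background, 1 = foreground."""
    return (image > T).astype(np.uint8)

# Hypothetical 8-bit image: dark background, bright foreground.
f = np.array([[10, 12, 200],
              [11, 180, 210],
              [9, 13, 190]], dtype=np.uint8)
print(threshold(f, T=100))  # label image with values in {0, 1}
```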

Classification error for thresholding
(Figure: the background and foreground gray-level histograms overlap around the threshold t; on one side of t, foreground pixels are misclassified as background, and on the other side, background pixels are misclassified as foreground.)

We assume that b(z) is the normalized histogram for background and f(z) is the normalized histogram for foreground. The histograms are estimates of the probability distributions of the gray levels in the image. Let P(F) and P(B) be the prior probabilities for foreground and background (P(B) + P(F) = 1). The normalized histogram for the image is then given by

$p(z) = P(B)\,b(z) + P(F)\,f(z)$

The probability of misclassification, given a threshold t, is:

$E_B(t) = \int_{-\infty}^{t} f(z)\,dz, \qquad E_F(t) = \int_{t}^{\infty} b(z)\,dz$

Find the T that minimizes the error
Compute the derivative of the total error E(t) with respect to t:

$E(t) = P(F)\int_{-\infty}^{t} f(z)\,dz + P(B)\int_{t}^{\infty} b(z)\,dz$

Set the derivative equal to 0:

$\frac{dE(t)}{dt} = 0 \;\Rightarrow\; P(F)\,f(T) = P(B)\,b(T)$

Minimum error is achieved by setting T equal to the point where the prior-weighted probabilities for foreground and background are equal.

Distributions, standard deviation and variance
A Gaussian (normal) distribution is specified by the mean value $\mu$ and the variance $\sigma^2$:

$p(z) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(z-\mu)^2}{2\sigma^2}}$

Variance $\sigma^2$, standard deviation $\sigma$.
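As a sketch, the crossing point P(F) f(T) = P(B) b(T) can be found numerically when both class densities are Gaussian; the parameter values below are made-up:

```python
from scipy.optimize import brentq
from scipy.stats import norm

# Made-up class parameters: dark background, bright foreground.
mu_b, sigma_b, P_b = 50.0, 10.0, 0.6   # background
mu_f, sigma_f, P_f = 150.0, 20.0, 0.4  # foreground

def diff(t):
    """P(F) f(t) - P(B) b(t); the optimal threshold T is its root."""
    return P_f * norm.pdf(t, mu_f, sigma_f) - P_b * norm.pdf(t, mu_b, sigma_b)

# The weighted densities cross somewhere between the two means.
T = brentq(diff, mu_b, mu_f)

# E(T): foreground mass below T plus background mass above T.
error = P_f * norm.cdf(T, mu_f, sigma_f) + P_b * norm.sf(T, mu_b, sigma_b)
print(f"optimal T = {T:.2f}, error = {error:.4f}")
```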

Two Gaussian distributions for a single feature
Assume that b(z) and f(z) are Gaussian distributions; then

$p(z) = \frac{P(B)}{\sqrt{2\pi}\,\sigma_B} e^{-\frac{(z-\mu_B)^2}{2\sigma_B^2}} + \frac{P(F)}{\sqrt{2\pi}\,\sigma_F} e^{-\frac{(z-\mu_F)^2}{2\sigma_F^2}}$

$\mu_B$ and $\mu_F$ are the mean values for background and foreground; $\sigma_B^2$ and $\sigma_F^2$ are the variances for background and foreground.

The 2-class classification problem summarized
Given two Gaussian distributions b(z) and f(z), the classes have prior probabilities P(F) and P(B). Every pixel should be assigned to the class that minimizes the classification error. The classification error is minimized at the point where P(F) f(z) = P(B) b(z). What we will do now is to generalize to N-dimensional features and K classes.

How do we find the best border between K classes with N features? We will find the theoretical answer and a geometrical interpretation of class means, variance, and the equivalent of a threshold.

The goal of classification
We estimate the decision boundaries based on training data. Classification performance is always estimated on a separate test data set, because we try to measure the generalization performance: the classifier should perform well when classifying new samples and have the lowest possible classification error. We often face a tradeoff between classification error on the training set and generalization ability when determining the complexity of the decision boundary.

Probability theory (Appendix A.4)
Let x be a discrete random variable that can assume any of a finite number of M different values. The M different values will in our case be one of M classes. The probability that x belongs to class i is $p_i = \Pr(x = i)$, $i = 1, \ldots, M$. A probability distribution must sum to 1, and probabilities must be positive, so $p_i \ge 0$ and $\sum_{i=1}^{M} p_i = 1$.

Expected values
The expected value or mean of a random variable x is:

$E[x] = \mu = \sum_{i=1}^{M} i\,p_i$

The variance or second-order moment is:

$\mathrm{Var}(x) = \sigma^2 = E[(x-\mu)^2] = \sum_{i=1}^{M} (i-\mu)^2\,p_i$
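A minimal numeric check of these two formulas in Python (the discrete distribution below is made-up):

```python
import numpy as np

# Made-up distribution over M = 4 values (e.g. class labels 1..4).
values = np.array([1, 2, 3, 4])
p = np.array([0.1, 0.2, 0.3, 0.4])      # non-negative, sums to 1
assert np.all(p >= 0) and np.isclose(p.sum(), 1.0)

mean = np.sum(values * p)                # E[x] = sum_i i * p_i
var = np.sum((values - mean) ** 2 * p)   # Var(x) = E[(x - mu)^2]
print(mean, var)                         # 3.0 1.0
```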

Pairs of random variables
Let x and y be random variables. The joint probability of observing a pair of values (x = i, y = j) is $p_{ij}$. Alternatively, we can define a joint probability distribution function p(x, y), for which $p(x,y) \ge 0$ and $\sum_x \sum_y p(x,y) = 1$. The marginal distributions for x and y (if we want to eliminate one of them) are:

$p_x(x) = \sum_y p(x,y), \qquad p_y(y) = \sum_x p(x,y)$

Statistical independence and expected values of two variables
Variables x and y are statistically independent if and only if $p(x,y) = p_x(x)\,p_y(y)$. Two variables are uncorrelated if $\sigma_{xy} = 0$. Expected values of two variables:

$E[f(x,y)] = \sum_x \sum_y f(x,y)\,p(x,y)$

$\mu_x = E[x], \quad \mu_y = E[y], \quad \sigma_{xy} = E[(x-\mu_x)(y-\mu_y)]$

Expected values of M variables
Using vector notation:

$\boldsymbol{\mu} = E[\mathbf{x}], \qquad \boldsymbol{\Sigma} = E[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T]$

Conditional probability
If two variables are statistically dependent, knowing the value of one of them lets us get a better estimate of the value of the other one. The conditional probability of x given y is:

$\Pr(x = i \mid y = j) = \frac{\Pr(x = i,\, y = j)}{\Pr(y = j)}$

and for distributions:

$p(x \mid y) = \frac{p(x,y)}{p(y)}$

Example: threshold a page with dark text on a white background. x is the grey level of a pixel and y is its class (F or B). If we consider which grey levels x can have, we expect small values if y is text (y = F) and large values if y is background (y = B).

Bayes rule in general
The equation:

$p(y \mid x) = \frac{p(x \mid y)\,p(y)}{p(x)}$

In words: posterior = likelihood × prior / evidence. To be explained for the classification problem later :-)

Mean vectors and covariance matrices in N dimensions
If f is an N-dimensional feature vector, we can formulate its mean vector and covariance matrix as:

$\boldsymbol{\mu} = E[\mathbf{f}], \qquad \boldsymbol{\Sigma} = E[(\mathbf{f}-\boldsymbol{\mu})(\mathbf{f}-\boldsymbol{\mu})^T]$

With N features, the mean vector will be of size N×1 and $\boldsymbol{\Sigma}$ of size N×N.
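In practice, the expectations are replaced by sample averages over training data. A small NumPy sketch with synthetic data (np.cov with rowvar=False treats each row as one feature vector):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic training set: n = 500 samples, N = 3 correlated features each.
X = rng.normal(size=(500, 3)) @ np.array([[2.0, 0.0, 0.0],
                                          [0.5, 1.0, 0.0],
                                          [0.0, 0.0, 0.5]])

mu = X.mean(axis=0)              # mean vector, shape (3,)
Sigma = np.cov(X, rowvar=False)  # covariance matrix, shape (3, 3)
print(mu.shape, Sigma.shape)
```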

Bayes rule for a classification problem
Suppose we have J classes $\omega_j$, $j = 1, \ldots, J$, where $\omega_j$ is the class label for a pixel and x is the observed gray level (or feature vector). We can use Bayes rule to find an expression for the class with the highest probability:

$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\,P(\omega_j)}{p(x)}$   (posterior probability = likelihood × prior probability / normalizing factor)

For thresholding, $P(\omega_j)$ is the prior probability for background or foreground. If we don't have special knowledge that one of the classes occurs more frequently than the other classes, we set them equal for all classes: $P(\omega_j) = 1/J$, $j = 1, \ldots, J$. $p(x \mid \omega_j)$ is the probability density function that models the likelihood of x.

Bayes rule explained

$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\,P(\omega_j)}{p(x)}$

$p(x \mid \omega_j)$ is the probability density function that models the likelihood of observing gray level x if the pixel belongs to class $\omega_j$. Typically, we assume a type of distribution, e.g. Gaussian, and the mean and covariance of that distribution are fitted to some data that we know belong to that class. This fitting is called classifier training. $P(\omega_j \mid x)$ is the posterior probability that the pixel actually belongs to class $\omega_j$. We will soon see that the classifier that achieves the minimum error is a classifier that assigns each pixel to the class that has the highest posterior probability. $p(x)$ is just a scaling factor that assures that the probabilities sum to 1.
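A small sketch of computing the posteriors for one gray level under two fitted Gaussian likelihoods (all numbers below are made-up):

```python
import numpy as np
from scipy.stats import norm

x = 120.0                                    # observed gray level
priors = np.array([0.5, 0.5])                # P(omega_j), equal if no prior knowledge
means, sigmas = [50.0, 150.0], [10.0, 20.0]  # fitted class-conditional Gaussians

likelihoods = np.array([norm.pdf(x, m, s) for m, s in zip(means, sigmas)])
evidence = np.sum(likelihoods * priors)      # p(x), the normalizing factor
posteriors = likelihoods * priors / evidence
print(posteriors, posteriors.sum())          # the posteriors sum to 1
```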

Probability of error
If we have 2 classes, we make an error either if we decide $\omega_1$ when the true class is $\omega_2$, or if we decide $\omega_2$ when the true class is $\omega_1$. If $P(\omega_1 \mid x) > P(\omega_2 \mid x)$, we have more belief that x belongs to $\omega_1$, and we decide $\omega_1$. The probability of error is then:

$P(\text{error} \mid x) = \begin{cases} P(\omega_1 \mid x) & \text{if we decide } \omega_2 \\ P(\omega_2 \mid x) & \text{if we decide } \omega_1 \end{cases}$

Back to classification error for thresholding
(Figure: the background and foreground distributions again; in one region, foreground pixels are misclassified as background, and in the other region, background pixels are misclassified as foreground.)

$P(\text{error}) = \int P(\text{error}, x)\,dx = \int P(\text{error} \mid x)\,p(x)\,dx$

Minimizing the error

$P(\text{error}) = \int P(\text{error}, x)\,dx = \int P(\text{error} \mid x)\,p(x)\,dx$

When we derived the optimal threshold, we showed that the minimum error was achieved by placing the threshold (the decision border) at the point where $P(\omega_1 \mid x) = P(\omega_2 \mid x)$. This is still valid.

Bayes decision rule
In the 2-class case, our goal of minimizing the error implies a decision rule: decide $\omega_1$ if $P(\omega_1 \mid x) > P(\omega_2 \mid x)$; otherwise $\omega_2$. For J classes, the rule analogously extends to choosing the class with the maximum a posteriori probability. The decision boundary is the border between classes i and j, simply where $P(\omega_i \mid x) = P(\omega_j \mid x)$, exactly where the threshold was set in minimum error thresholding!

Bayes classification with J classes and D features
How do we generalize: to more than one feature at a time; to J classes; to considering loss functions, so that some errors are more costly than others?

Feature space
If we measure d features, x will be a d-dimensional feature vector. Let $\{\omega_1, \ldots, \omega_c\}$ be a set of c classes. The posterior probability for class $\omega_j$ is now computed as:

$P(\omega_j \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_j)\,P(\omega_j)}{p(\mathbf{x})}, \qquad p(\mathbf{x}) = \sum_{j=1}^{c} p(\mathbf{x} \mid \omega_j)\,P(\omega_j)$

Still, we assign a pixel with feature vector x to the class that has the highest posterior probability: decide $\omega_j$ if $P(\omega_j \mid \mathbf{x}) > P(\omega_i \mid \mathbf{x})$ for all $i \ne j$.

Discriminant functions
The decision rule "decide $\omega_j$ if $P(\omega_j \mid \mathbf{x}) > P(\omega_i \mid \mathbf{x})$ for all $i \ne j$" can be written as: assign x to $\omega_j$ if $g_j(\mathbf{x}) > g_i(\mathbf{x})$ for all $i \ne j$. The classifier computes c discriminant functions and selects the class corresponding to the largest value of the discriminant function. Since classification consists of choosing the class that has the largest value, a scaling of the discriminant function $g_i(\mathbf{x})$ by $f(g_i(\mathbf{x}))$ will not affect the decision if f is a monotonically increasing function. This can lead to simplifications, as we will soon see.

Equivalent discriminant functions
The following choices of discriminant functions give equivalent decisions:

$g_i(\mathbf{x}) = P(\omega_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_i)\,P(\omega_i)}{p(\mathbf{x})}$
$g_i(\mathbf{x}) = p(\mathbf{x} \mid \omega_i)\,P(\omega_i)$
$g_i(\mathbf{x}) = \ln p(\mathbf{x} \mid \omega_i) + \ln P(\omega_i)$

The effect of the decision rules is to divide the feature space into c decision regions $R_1, \ldots, R_c$. If $g_i(\mathbf{x}) > g_j(\mathbf{x})$ for all $j \ne i$, then x is in region $R_i$. The regions are separated by decision boundaries, surfaces in feature space where the discriminant functions for two classes are equal.
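A quick numeric illustration that the log form picks the same class as $p(\mathbf{x} \mid \omega_i)\,P(\omega_i)$, since ln is monotonically increasing (all parameters made-up):

```python
import numpy as np
from scipy.stats import norm

x = 120.0
priors = [0.6, 0.4]
means, sigmas = [50.0, 150.0], [10.0, 20.0]

g_lin = [norm.pdf(x, m, s) * P for m, s, P in zip(means, sigmas, priors)]
g_log = [norm.logpdf(x, m, s) + np.log(P) for m, s, P in zip(means, sigmas, priors)]

# The monotone transform ln(.) does not change which class wins.
assert np.argmax(g_lin) == np.argmax(g_log)
print("decide class", np.argmax(g_lin) + 1)
```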

Decision functions, two classes
If we have only two classes, assigning x to $\omega_1$ if $g_1(\mathbf{x}) > g_2(\mathbf{x})$ is equivalent to using a single discriminant function $g(\mathbf{x}) = g_1(\mathbf{x}) - g_2(\mathbf{x})$ and deciding $\omega_1$ if $g(\mathbf{x}) > 0$. The following functions are equivalent:

$g(\mathbf{x}) = P(\omega_1 \mid \mathbf{x}) - P(\omega_2 \mid \mathbf{x})$
$g(\mathbf{x}) = \ln \frac{p(\mathbf{x} \mid \omega_1)}{p(\mathbf{x} \mid \omega_2)} + \ln \frac{P(\omega_1)}{P(\omega_2)}$

The Gaussian density, univariate case (a single feature)
To use a classifier, we need to select a probability density function $p(x \mid \omega_i)$. The most commonly used probability density is the normal (Gaussian) distribution:

$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$

with expected value (mean) $\mu = E[x] = \int x\,p(x)\,dx$ and variance $\sigma^2 = E[(x-\mu)^2] = \int (x-\mu)^2\,p(x)\,dx$.

Training a univariate Gaussian classifier
To be able to compute the value of the discriminant function, we need an estimate of $\mu_i$ and $\sigma_i$ for each class. Assume that we know the true class labels for some pixels, and that this is given in a mask image. Training the classifier then consists of computing $\mu_i$ and $\sigma_i$ from all pixels with class label i in the mask file.

Classification with a univariate Gaussian
Decide on values for the prior probabilities $P(\omega_i)$; if we have no prior information, assume that all classes are equally probable, $P(\omega_i) = 1/c$. Estimate $\mu_i$ and $\sigma_i$ based on training data. Compute the discriminant function

$g_i(x) = p(x \mid \omega_i)\,P(\omega_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\!\left(-\frac{(x-\mu_i)^2}{2\sigma_i^2}\right) P(\omega_i)$

for all classes, and assign each pattern to the class with the highest value. A simple measure of classification accuracy can be to count the percentage of correctly classified pixels, overall, averaged over all classes, or per class. If a pixel has true class label k, it is correctly classified if the assigned class label equals k.
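A minimal sketch of this training and classification procedure on a synthetic image (the slides do not prescribe an implementation; all names, sizes, and parameter values below are made-up):

```python
import numpy as np
from scipy.stats import norm

def train(image, mask, classes):
    """Estimate (mu_i, sigma_i) per class from the pixels labeled in the mask."""
    return {c: (image[mask == c].mean(), image[mask == c].std()) for c in classes}

def classify(image, params, priors):
    """Assign each pixel to the class with largest g_i(x) = ln p(x|w_i) + ln P(w_i)."""
    classes = sorted(params)
    g = np.stack([norm.logpdf(image, *params[c]) + np.log(priors[c]) for c in classes])
    return np.asarray(classes)[np.argmax(g, axis=0)]

# Synthetic image: dark class 1 in the top half, bright class 2 in the bottom half.
rng = np.random.default_rng(1)
image = np.vstack([rng.normal(50, 10, (32, 64)), rng.normal(150, 20, (32, 64))])
mask = np.zeros((64, 64), dtype=int)
mask[:8, :], mask[-8:, :] = 1, 2   # a few labeled training rows per class

params = train(image, mask, classes=[1, 2])
labels = classify(image, params, priors={1: 0.5, 2: 0.5})

truth = np.repeat([1, 2], 32)[:, None] * np.ones((1, 64), dtype=int)
print("overall accuracy:", (labels == truth).mean())
```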

Example: image and training masks
(Figure: an example image shown together with its training masks.)