Pattern Classification, Ch4 (Part 1)


Pattern Classification. All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.

Chapter 4 (Part 1): Non-Parametric Classification (Sections 4.1-4.3): Introduction, Density Estimation, Parzen Windows

Introduction. The underlying density functions are rarely known, and common parametric forms rarely fit the densities actually encountered. For example, all parametric densities are unimodal (have a single local maximum), whereas many practical problems involve multi-modal densities. Goal: non-parametric procedures that can be used with arbitrary distributions and without the assumption that the forms of the underlying densities are known. There are two types of non-parametric methods: estimating the density function p(x | ω_j), or bypassing probability and going directly to a-posteriori probability estimation P(ω_j | x).

Density Estimation. Basic idea: the probability that a vector x will fall in a region R is P = ∫_R p(x′) dx′. P is a smoothed (or averaged) version of the density function p(x). Given a sample of size n, the probability that exactly k points fall in R is binomial: P_k = (n choose k) P^k (1 − P)^(n−k), and the expected value of k is E[k] = nP.

Assume p(x) is continuous and that the region R is so small that p does not vary significantly within it. Then ∫_R p(x′) dx′ ≈ p(x) ∫_R dx′ = p(x) μ(R), where μ(R) is a surface area in the Euclidean space R², a volume in R³, and a hypervolume in R^n.

Since p(x) ≈ p(x′) ≈ constant over R, in the Euclidean space R³ we have ∫_R p(x′) dx′ ≈ p(x) V, and hence p(x) ≈ (k/n)/V, where x is a point within R and V is the volume enclosed by R.

The binomial P_k peaks sharply about its mean, so the most probable value of k is k ≈ nP. Therefore the ratio k/n is a good estimate for the probability P and hence for the density function: p(x) ≈ (k/n)/V.
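The estimate p(x) ≈ (k/n)/V can be checked numerically. Below is a minimal sketch (not from the slides) that draws samples from a known N(0, 1) density, counts how many fall in a small interval R centered at x, and compares (k/n)/V with the true density value:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw n samples from a standard normal and estimate p(x) at x = 0
# by counting the fraction that falls in a small region R of
# volume V centered at x (here, an interval of width h).
n = 100_000
samples = rng.standard_normal(n)

x, h = 0.0, 0.2                           # point of interest, width of R
k = np.sum(np.abs(samples - x) <= h / 2)  # number of samples inside R
p_hat = (k / n) / h                       # p(x) ≈ (k/n) / V, with V = h

true_p = 1.0 / np.sqrt(2 * np.pi)         # N(0,1) density at 0 ≈ 0.3989
print(p_hat, true_p)
```

With n large and R small, the estimate agrees with the true density to within a few thousandths, illustrating the smoothing trade-off: a larger h averages p over R, a smaller h raises the variance of k/n.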


Convergence. The fraction k/(nV) is a space-averaged value of p(x); the exact p(x) is obtained only as V approaches zero. If V → 0 with k = 0, then p(x) = 0 (for n fixed): this is the case where no samples are included in R — an uninteresting case! If V → 0 with k ≠ 0, then p(x) → ∞: in this case the estimate diverges — also an uninteresting case!

The volume V needs to approach 0 anyway if we want to use this estimation. Practically, V cannot be allowed to become arbitrarily small since the number of samples is always limited, so one has to accept a certain amount of variance in the ratio k/n. Theoretically, if an unlimited number of samples is available, we can circumvent this difficulty. To estimate the density at x, we form a sequence of regions R_1, R_2, … containing x: the first region contains one sample, the second two samples, and so on. Let V_n be the volume of R_n, k_n the number of samples falling in R_n, and p_n(x) the nth estimate of p(x): p_n(x) = (k_n/n)/V_n (7)

Three necessary conditions must hold if we want p_n(x) to converge to p(x): 1) lim_{n→∞} V_n = 0, 2) lim_{n→∞} k_n = ∞, 3) lim_{n→∞} k_n/n = 0. There are two different ways of obtaining sequences of regions that satisfy these conditions: (a) shrink an initial region, with V_n = 1/√n, and show that p_n(x) → p(x) — this is called the Parzen-window estimation method; (b) specify k_n as some function of n, such as k_n = √n, and grow the volume V_n until it encloses k_n neighbors of x — this is called the k_n-nearest-neighbor estimation method.


Parzen Windows. The Parzen-window approach to estimating densities assumes that the region R_n is a d-dimensional hypercube of volume V_n = h_n^d, where h_n is the length of an edge of R_n. Let φ(u) be the following window function: φ(u) = 1 if |u_j| ≤ 1/2 for j = 1, …, d, and 0 otherwise. Then φ((x − x_i)/h_n) is equal to unity if x_i falls within the hypercube of volume V_n centered at x, and equal to zero otherwise.

The number of samples in this hypercube is: k_n = Σ_{i=1}^{n} φ((x − x_i)/h_n). By substituting k_n in equation (7), we obtain the following estimate: p_n(x) = (1/n) Σ_{i=1}^{n} (1/V_n) φ((x − x_i)/h_n). Thus p_n(x) estimates p(x) as an average of functions of x and the samples x_i (i = 1, …, n). These window functions φ can be general!
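The hypercube estimate above translates almost line for line into code. This is a minimal sketch (the function name `parzen_estimate` is ours, not from the slides), using the hard-cube window φ from the previous slide:

```python
import numpy as np

def parzen_estimate(x, samples, h):
    """Parzen-window estimate p_n(x) with a d-dimensional hypercube
    window of edge h: phi(u) = 1 iff |u_j| <= 1/2 for every j."""
    x = np.atleast_1d(x)
    samples = np.atleast_2d(samples)           # shape (n, d)
    n, d = samples.shape
    u = (x - samples) / h                      # u_i = (x - x_i) / h_n
    inside = np.all(np.abs(u) <= 0.5, axis=1)  # phi((x - x_i)/h_n)
    k = inside.sum()                           # k_n: samples in the cube
    V = h ** d                                 # V_n = h^d
    return (k / n) / V                         # equation (7)

# Example: samples uniform on [0, 1), where the true density is 1.
rng = np.random.default_rng(1)
samples = rng.random((10_000, 1))
est = parzen_estimate(0.5, samples, h=0.1)
print(est)
```

The same function works unchanged in any dimension d, since the cube test `np.all(..., axis=1)` is exactly the product of the per-coordinate conditions |u_j| ≤ 1/2.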

Parzen-Window Density Estimates

Illustration: the behavior of the Parzen-window method in the case where p(x) ~ N(0,1). Let φ(u) = (1/√(2π)) exp(−u²/2) and h_n = h_1/√n (n > 1), where h_1 is a known parameter. Thus p_n(x) = (1/n) Σ_{i=1}^{n} (1/h_n) φ((x − x_i)/h_n) is an average of normal densities centered at the samples x_i.

Numerical results: For n = 1 and h_1 = 1, p_1(x) = φ(x − x_1) = (1/√(2π)) e^{−(x−x_1)²/2} ~ N(x_1, 1). For n = 10 and h_1 = 0.1, the contributions of the individual samples are clearly observable!


Analogous results are also obtained in two dimensions, as illustrated:


Case where p(x) = λ₁ U(a,b) + λ₂ T(c,d) (unknown density: a mixture of a uniform and a triangle density)


Classification example. In classifiers based on Parzen-window estimation, we estimate the densities for each category and classify a test point by the label corresponding to the maximum posterior. The decision region for a Parzen-window classifier depends upon the choice of window function, as illustrated in the following figure.
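The classification rule just described — one Parzen estimate per category, pick the maximum posterior — can be sketched as follows (a minimal one-dimensional illustration with a Gaussian window; the function name `parzen_classify` and the two synthetic classes are ours, not from the slides):

```python
import numpy as np

def parzen_classify(x, class_samples, h):
    """Classify x by the class whose Parzen density estimate,
    weighted by the class prior n_j / n, is largest."""
    total = sum(len(s) for s in class_samples)
    scores = []
    for samples in class_samples:
        u = (x - samples) / h
        phi = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
        density = (phi / h).mean()                     # p_n(x | omega_j)
        scores.append(density * len(samples) / total)  # ~ posterior
    return int(np.argmax(scores))

rng = np.random.default_rng(3)
class0 = rng.normal(-2.0, 1.0, 500)    # class 0 centered at -2
class1 = rng.normal(+2.0, 1.0, 500)    # class 1 centered at +2
pred_a = parzen_classify(-1.5, [class0, class1], h=0.5)
pred_b = parzen_classify(+1.5, [class0, class1], h=0.5)
print(pred_a, pred_b)   # -> 0 1
```

Swapping the Gaussian `phi` for the hypercube window changes the shape of the resulting decision regions — which is exactly the dependence on the window function that the slide points out.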


Parzen Windows: Probabilistic Neural Networks (Pattern Classification, Chapter 4, Part 2)

A probabilistic neural network (PNN) computes a Parzen estimate based on n patterns. Input: patterns with d features, sampled from c classes. Each input unit is connected to every pattern unit. [Network diagram: input units x_1, x_2, …, x_d; modifiable (trained) weights w_jk; pattern units p_1, p_2, …, p_n.]

[Network diagram, continued: pattern units p_1, p_2, …, p_n connect to category units ω_1, ω_2, …, ω_c; the pattern units emit nonlinear activation functions.]

Training the network

Training algorithm: 1. Normalize each pattern x of the training set to unit length. 2. Place the first training pattern on the input units. 3. Set the weights linking the input units and the first pattern unit such that w_1 = x_1. 4. Make a single connection from the first pattern unit to the category unit corresponding to the known class of that pattern. 5. Repeat the process for all remaining training patterns, setting the weights such that w_k = x_k (k = 1, 2, …, n).

Testing: 1. Normalize the test pattern x and place it at the input units. 2. Each pattern unit computes the inner product to yield the net activation net_k = w_k^T x and emits the nonlinear function φ(net_k) = exp[(net_k − 1)/σ²]; since x and w_k are both normalized, this equals exp[−(x − w_k)^T(x − w_k)/(2σ²)]. 3. Each output unit sums the contributions from all pattern units connected to it: P_n(x | ω_j) ∝ Σ_i φ_i. 4. Classify by selecting the maximum value of P_n(x | ω_j) (j = 1, …, c).
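The PNN training and testing procedures above amount to very little code: training just stores the normalized patterns as weight vectors. A minimal sketch (function names, the toy two-class data, and σ = 0.3 are our choices, not from the slides):

```python
import numpy as np

def pnn_train(patterns, labels):
    """'Training' a PNN simply stores each normalized training
    pattern as the weight vector of one pattern unit: w_k = x_k."""
    W = patterns / np.linalg.norm(patterns, axis=1, keepdims=True)
    return W, np.asarray(labels)

def pnn_classify(x, W, labels, sigma, n_classes):
    x = x / np.linalg.norm(x)                   # normalize test pattern
    net = W @ x                                 # net_k = w_k^T x
    act = np.exp((net - 1.0) / sigma**2)        # phi(net_k)
    scores = [act[labels == j].sum() for j in range(n_classes)]
    return int(np.argmax(scores))               # max posterior

rng = np.random.default_rng(4)
# Two classes of 2-D patterns near different directions on the unit circle.
c0 = rng.normal([1.0, 0.0], 0.1, size=(20, 2))
c1 = rng.normal([0.0, 1.0], 0.1, size=(20, 2))
W, y = pnn_train(np.vstack([c0, c1]), [0] * 20 + [1] * 20)
pred_a = pnn_classify(np.array([0.9, 0.1]), W, y, sigma=0.3, n_classes=2)
pred_b = pnn_classify(np.array([0.1, 0.9]), W, y, sigma=0.3, n_classes=2)
print(pred_a, pred_b)   # -> 0 1
```

Note that the activation exp[(net_k − 1)/σ²] is only a valid Gaussian window because both x and w_k are normalized, which is why the normalization steps in training and testing are essential.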