Pattern Classification

All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.
Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1)

- Introduction
- Maximum-Likelihood Estimation
- Example of a Specific Case
- The Gaussian Case: unknown µ and σ
- Bias
- Appendix: ML Problem Statement
Introduction

Data availability in a Bayesian framework:
- We could design an optimal classifier if we knew:
  - P(ω_i) (priors)
  - p(x | ω_i) (class-conditional densities)
- Unfortunately, we rarely have this complete information!

Designing a classifier from a training sample:
- No problem with prior estimation
- Samples are often too small for class-conditional estimation (large dimension of feature space!)

Pattern Classification, Chapter 3
A priori information about the problem:
- Normality of p(x | ω_i): p(x | ω_i) ~ N(µ_i, Σ_i), characterized by the two parameters µ_i and Σ_i
- Estimation techniques: Maximum-Likelihood (ML) and Bayesian estimation
- Results are nearly identical, but the approaches are different
In ML estimation, the parameters are fixed but unknown!
- The best parameters are obtained by maximizing the probability of obtaining the samples observed.

Bayesian methods view the parameters as random variables having some known distribution.
- Observation of the samples converts this into a posterior density.

In either approach, we use the posterior P(ω_i | x) for our classification rule!
Maximum-Likelihood Estimation
- Has good convergence properties as the sample size increases
- Simpler than alternative techniques

General principle:
- Assume we have c classes and a collection of samples partitioned into D_1 to D_c, drawn according to p(x | ω_j)
- The samples are i.i.d.
- p(x | ω_j) has a known parametric form, specified uniquely by the parameter vector θ_j
- Problem: use the information provided by the samples to obtain good estimates of the parameters θ_j
- Assumption: samples in D_i give no information about θ_j for j ≠ i
Maximum-Likelihood Estimation (cont.)

For example, assume Normal pdfs: p(x | ω_j) ~ N(µ_j, Σ_j), i.e. p(x | ω_j) ≡ p(x | ω_j, θ_j), where θ_j = (µ_j, Σ_j) collects the components (µ_j1, µ_j2, ..., σ_j1², σ_j2², cov(x_j^m, x_j^n), ...).
Use the information provided by the training samples to estimate θ = (θ_1, θ_2, ..., θ_c); each θ_i (i = 1, 2, ..., c) is associated with one category.

Suppose that D contains n samples x_1, x_2, ..., x_n. Since the samples are i.i.d.,

  p(D | θ) = ∏_{k=1}^n p(x_k | θ) = F(θ)

p(D | θ) is called the likelihood of θ w.r.t. the set of samples. The ML estimate of θ is, by definition, the value θ̂ that maximizes p(D | θ): it is the value of θ that best agrees with the actually observed training samples.
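The product form of the likelihood can be made concrete with a short sketch. This is a minimal illustration, not from the book: the sample values and candidate parameters are hypothetical, and the density is a univariate Gaussian for simplicity.

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Univariate normal density N(mu, sigma2) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def likelihood(D, mu, sigma2):
    """p(D | theta): product over the i.i.d. samples of p(x_k | theta)."""
    p = 1.0
    for x in D:
        p *= gaussian_pdf(x, mu, sigma2)
    return p

D = [1.0, 1.2, 0.8, 1.1]  # hypothetical training samples

# The likelihood is larger near plausible parameters than far from them:
assert likelihood(D, 1.0, 0.1) > likelihood(D, 5.0, 0.1)
```

Note that the raw product underflows for large n, which is one practical motivation for the log-likelihood used on the following slides.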
Optimal estimation:
- Let θ = (θ_1, θ_2, ..., θ_p)^T and let ∇_θ be the gradient operator:

  ∇_θ = [∂/∂θ_1, ∂/∂θ_2, ..., ∂/∂θ_p]^T

- We define l(θ) as the log-likelihood function: l(θ) = ln p(D | θ)
- New problem statement: determine the θ that maximizes the log-likelihood:

  θ̂ = arg max_θ l(θ)
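As a sketch of the arg max formulation, the snippet below maximizes a univariate Gaussian log-likelihood over a grid of candidate means. The data are hypothetical and the variance is fixed at 1 for simplicity; the grid search merely stands in for a proper optimizer.

```python
import math

def log_likelihood(D, mu, sigma2=1.0):
    """l(mu) = ln p(D | mu) = sum over k of ln p(x_k | mu), univariate Gaussian."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma2) - (x - mu) ** 2 / (2 * sigma2)
        for x in D
    )

D = [1.0, 1.2, 0.8, 1.1]  # hypothetical training samples

# Crude grid search standing in for "theta_hat = argmax l(theta)":
candidates = [i / 100 for i in range(0, 301)]
mu_hat = max(candidates, key=lambda m: log_likelihood(D, m))

# Up to the grid resolution, the maximizer is the sample mean:
assert abs(mu_hat - sum(D) / len(D)) <= 0.01
```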
Since l(θ) = ∑_{k=1}^n ln p(x_k | θ), the set of necessary conditions for an optimum is:

  ∇_θ l = ∑_{k=1}^n ∇_θ ln p(x_k | θ) = 0
Maximum A Posteriori (MAP) estimators: find the θ that maximizes l(θ) + ln p(θ), where p(θ) is a prior over the parameters. The Maximum-Likelihood estimator is the special case in which p(θ) is flat.
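A minimal numerical sketch of the ML-vs-MAP relationship, with all values hypothetical: a Gaussian likelihood (known variance 1) plus a Gaussian prior on µ. Maximizing l(µ) + ln p(µ) pulls the estimate toward the prior mean; with a flat prior the maximizer would revert to the ML answer (the sample mean). The closed-form check is the standard conjugate-Gaussian posterior mode.

```python
D = [1.0, 1.2, 0.8, 1.1]   # hypothetical samples, known likelihood variance = 1
mu0, s0sq = 0.0, 1.0       # hypothetical Gaussian prior N(mu0, s0sq) on mu

def map_objective(mu):
    """l(mu) + ln p(mu): Gaussian log-likelihood plus Gaussian log-prior
    (additive constants dropped, since they do not affect the argmax)."""
    ll = sum(-0.5 * (x - mu) ** 2 for x in D)
    lp = -0.5 * (mu - mu0) ** 2 / s0sq
    return ll + lp

# Grid search standing in for the argmax:
mu_map = max((i / 1000 for i in range(0, 2001)), key=map_objective)

# Conjugate closed form: (n*s0sq*xbar + sigma2*mu0) / (n*s0sq + sigma2), sigma2 = 1
xbar = sum(D) / len(D)
closed = (len(D) * s0sq * xbar + 1.0 * mu0) / (len(D) * s0sq + 1.0)
assert abs(mu_map - closed) < 0.002
# Note mu_map (≈ 0.82) sits between the prior mean (0.0) and the ML estimate xbar (1.025).
```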
Example of a specific case: unknown µ

p(x_k | µ) ~ N(µ, Σ) (the samples are drawn from a multivariate normal population):

  ln p(x_k | µ) = −(1/2) ln[(2π)^d |Σ|] − (1/2) (x_k − µ)^T Σ^{−1} (x_k − µ)

and

  ∇_µ ln p(x_k | µ) = Σ^{−1} (x_k − µ)

Therefore, the ML estimate for µ must satisfy:

  ∑_{k=1}^n Σ^{−1} (x_k − µ̂) = 0
Multiplying by Σ and rearranging, we obtain:

  µ̂ = (1/n) ∑_{k=1}^n x_k

Just the arithmetic average of the training samples!

Conclusion: if p(x | ω_j) (j = 1, 2, ..., c) is supposed to be Gaussian in a d-dimensional feature space, then we can estimate the vector θ = (θ_1, θ_2, ..., θ_c)^T and perform an optimal classification!
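For instance (hypothetical two-dimensional samples), the ML estimate of the Gaussian mean is just the componentwise arithmetic average:

```python
# ML estimate of the mean of a multivariate Gaussian: the arithmetic
# average of the training samples. The data points below are hypothetical.
D = [(1.0, 2.0), (3.0, 0.0), (2.0, 4.0)]
n, d = len(D), len(D[0])

mu_hat = tuple(sum(x[j] for x in D) / n for j in range(d))
assert mu_hat == (2.0, 2.0)
```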
ML Estimation, Gaussian Case: unknown µ and σ

In the univariate case, write θ = (θ_1, θ_2) = (µ, σ²). The log-likelihood of a single sample x_k is

  l = ln p(x_k | θ) = −(1/2) ln(2π θ_2) − (1/(2θ_2)) (x_k − θ_1)²

with gradient

  ∇_θ l = [ ∂(ln p)/∂θ_1 , ∂(ln p)/∂θ_2 ]^T = [ (1/θ_2)(x_k − θ_1) , −1/(2θ_2) + (x_k − θ_1)²/(2θ_2²) ]^T

The ML conditions follow from summing these derivatives over all n samples and setting the result to zero.
Summation over the n samples gives the necessary conditions:

  (1) ∑_{k=1}^n (1/θ̂_2)(x_k − θ̂_1) = 0

  (2) −∑_{k=1}^n 1/θ̂_2 + ∑_{k=1}^n (x_k − θ̂_1)²/θ̂_2² = 0

Combining (1) and (2), one obtains:

  µ̂ = (1/n) ∑_{k=1}^n x_k ;  σ̂² = (1/n) ∑_{k=1}^n (x_k − µ̂)²

and, in the multivariate case,

  µ̂ = (1/n) ∑_{k=1}^n x_k ;  Σ̂ = (1/n) ∑_{k=1}^n (x_k − µ̂)(x_k − µ̂)^T
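A quick sketch of the two univariate formulas on hypothetical data:

```python
D = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical univariate samples
n = len(D)

mu_hat = sum(D) / n                                   # (1/n) sum of x_k
sigma2_hat = sum((x - mu_hat) ** 2 for x in D) / n    # (1/n) sum of (x_k - mu_hat)^2

assert mu_hat == 5.0
assert sigma2_hat == 4.0
```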
Bias

The ML estimate for σ² is biased:

  E[(1/n) ∑_{i=1}^n (x_i − x̄)²] = ((n−1)/n) σ² ≠ σ²

An elementary unbiased estimator for Σ is:

  C = (1/(n−1)) ∑_{k=1}^n (x_k − µ̂)(x_k − µ̂)^T   (the sample covariance matrix)
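The bias can be checked empirically. This Monte Carlo sketch (hypothetical setup: n = 5 draws from N(0, 1), averaged over many trials) shows the ML variance estimate concentrating near (n−1)/n · σ² = 0.8 while the (n−1)-denominator estimator stays near σ² = 1:

```python
import random

random.seed(0)
sigma2_true, n, trials = 1.0, 5, 20000

ml_avg = 0.0
unbiased_avg = 0.0
for _ in range(trials):
    D = [random.gauss(0.0, 1.0) for _ in range(n)]
    m = sum(D) / n
    ss = sum((x - m) ** 2 for x in D)   # sum of squared deviations
    ml_avg += ss / n                    # biased ML estimate, 1/n denominator
    unbiased_avg += ss / (n - 1)        # unbiased estimator, 1/(n-1) denominator
ml_avg /= trials
unbiased_avg /= trials

# E[ML estimate] is approximately (n-1)/n * sigma^2 = 0.8; the unbiased one is near 1.0:
assert abs(ml_avg - 0.8) < 0.05
assert abs(unbiased_avg - 1.0) < 0.05
```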
Appendix: ML Problem Statement

Let D = {x_1, x_2, ..., x_n} with the samples drawn i.i.d., so that

  P(x_1, x_2, ..., x_n | θ) = ∏_{k=1}^n P(x_k | θ);  |D| = n

Our goal is to determine θ̂, the value of θ that makes this sample the most representative!
[Figure: the training data D split into per-class subsets D_1, ..., D_k, ..., D_c; the samples in each subset are drawn from a class-conditional normal N(µ_j, Σ_j), with densities P(x_j | θ_j, ω_j).]
θ = (θ_1, θ_2, ..., θ_c)

Problem: find θ̂ such that:

  max_θ P(D | θ) = max_θ P(x_1, ..., x_n | θ) = max_θ ∏_{k=1}^n P(x_k | θ)
Example problem: suppose the density has the exponential form

  p(x | θ) = θ e^{−θx} for x ≥ 0;  p(x | θ) = 0 else

Then the likelihood of the n i.i.d. samples is p(D | θ) = θ^n e^{−θ ∑_k x_k}, and the ML estimate θ̂ is found by maximizing this over θ.
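Reading the final example's (garbled) density as the exponential p(x | θ) = θ e^{−θx} for x ≥ 0 — an assumption about this copy — the ML estimate has the closed form θ̂ = n / ∑_k x_k, i.e. the reciprocal of the sample mean. A short check on hypothetical samples confirms θ̂ maximizes the log-likelihood:

```python
import math

D = [0.5, 1.0, 1.5, 2.0]  # hypothetical samples, all x >= 0

def log_likelihood(D, theta):
    """l(theta) = sum_k ln(theta * exp(-theta * x_k)) = n*ln(theta) - theta * sum x_k."""
    return len(D) * math.log(theta) - theta * sum(D)

theta_hat = len(D) / sum(D)   # closed-form ML estimate: 1 / sample mean

# theta_hat beats nearby candidate values of theta:
for t in (theta_hat * 0.5, theta_hat * 2.0):
    assert log_likelihood(D, theta_hat) > log_likelihood(D, t)
```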