Pattern Classification

1 Pattern Classification
All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.

2 Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1)
Introduction
Maximum-Likelihood Estimation
Example of a Specific Case
The Gaussian Case: unknown μ and σ
Bias
Appendix: ML Problem Statement

3 Introduction
Data availability in a Bayesian framework
We could design an optimal classifier if we knew:
P(ω_i) (priors)
P(x | ω_i) (class-conditional densities)
Unfortunately, we rarely have this complete information!
Design a classifier from a training sample
No problem with prior estimation
Samples are often too small for class-conditional estimation (large dimension of feature space!)
Pattern Classification, Chapter 3

4 A priori information about the problem
Do we know something about the distribution? If so, find parameters to characterize the distribution
Example: Normality of P(x | ω_i)
P(x | ω_i) ~ N(μ_i, Σ_i)
Characterized by 2 parameters (μ_i and Σ_i)
Estimation techniques
Maximum-Likelihood (ML) and Bayesian estimation
Results are nearly identical, but the approaches are different

5 Parameters in ML estimation are fixed but unknown!
Best parameters are obtained by maximizing the probability of obtaining the samples observed
Bayesian methods view the parameters as random variables having some known distribution
In either approach, we use P(ω_i | x) for our classification rule!

6 Maximum-Likelihood Estimation
Has good convergence properties as the sample size increases
Simpler than any other alternative techniques
General principle
Assume we have c classes and P(x | ω_j) ~ N(μ_j, Σ_j)
P(x | ω_j) ≡ P(x | ω_j, θ_j), where:
$$\theta_j = (\mu_j, \Sigma_j) = \left(\mu_j^1, \mu_j^2, \ldots, \sigma_j^{11}, \sigma_j^{22}, \operatorname{cov}(x_j^m, x_j^n), \ldots\right)$$

7 Use the information provided by the training samples to estimate θ = (θ_1, θ_2, ..., θ_c); each θ_i (i = 1, 2, ..., c) is associated with category ω_i
Suppose that D contains n samples, x_1, x_2, ..., x_n
$$P(D \mid \theta) = \prod_{k=1}^{n} P(x_k \mid \theta) = F(\theta)$$
P(D | θ) is called the likelihood of θ w.r.t. the set of samples
The ML estimate of θ is, by definition, the value $\hat{\theta}$ that maximizes P(D | θ)
"It is the value of θ that best agrees with the actually observed training sample"
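As a minimal sketch of this definition, the Python code below evaluates the likelihood P(D | θ) as a product of per-sample densities for a univariate Gaussian with known σ = 1 and unknown mean θ, then picks the best θ from a grid. The sample values, the grid, and the helper names `gaussian_pdf` and `likelihood` are hypothetical illustrations, not part of the slides; a grid search is used only to make the "value that maximizes P(D | θ)" idea concrete.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def likelihood(samples, theta, sigma=1.0):
    """P(D | theta): product of per-sample densities (samples assumed i.i.d.)."""
    result = 1.0
    for x in samples:
        result *= gaussian_pdf(x, theta, sigma)
    return result

# Hypothetical training sample D
D = [1.2, 0.8, 1.9, 1.4, 0.7]

# Candidate values of theta from -1.00 to 3.00 in steps of 0.01
grid = [i / 100 for i in range(-100, 301)]
theta_hat = max(grid, key=lambda t: likelihood(D, t))
print(theta_hat)  # 1.2, which is also the sample mean of D
```

For this case the grid maximizer lands on the sample mean, which anticipates the closed-form result derived later for the Gaussian with unknown μ.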

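9 Optimal estimation
Let θ = (θ_1, θ_2, ..., θ_p)^t and let ∇_θ be the gradient operator
$$\nabla_\theta = \left[\frac{\partial}{\partial\theta_1}, \frac{\partial}{\partial\theta_2}, \ldots, \frac{\partial}{\partial\theta_p}\right]^t$$
We define l(θ) as the log-likelihood function
$$l(\theta) = \ln P(D \mid \theta)$$
(recall D is the training data)
New problem statement: determine θ that maximizes the log-likelihood
$$\hat{\theta} = \arg\max_{\theta} l(\theta)$$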
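Because ln is strictly increasing, maximizing l(θ) yields the same $\hat{\theta}$ as maximizing P(D | θ). A practical bonus is numerical: the short Python sketch below, using 2000 hypothetical samples from N(0, 1), shows the raw product of densities underflowing to zero in floating point while the sum of log-densities stays finite. The helper `log_gaussian_pdf` and the sample size are assumptions made for illustration.

```python
import math
import random

random.seed(0)
# 2000 hypothetical samples from N(0, 1)
D = [random.gauss(0.0, 1.0) for _ in range(2000)]

def log_gaussian_pdf(x, mu, sigma):
    """ln of the N(mu, sigma^2) density at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

# Direct product of densities: every factor is below 1, so it underflows to 0.0
product = 1.0
for x in D:
    product *= math.exp(log_gaussian_pdf(x, 0.0, 1.0))

# Log-likelihood l(theta) = sum of log-densities: stays a finite negative number
log_lik = sum(log_gaussian_pdf(x, 0.0, 1.0) for x in D)

print(product)  # 0.0
print(log_lik)
```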

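10 The definition of l(θ) is:
$$l(\theta) = \sum_{k=1}^{n} \ln P(x_k \mid \theta)$$
and
$$\nabla_\theta l = \sum_{k=1}^{n} \nabla_\theta \ln P(x_k \mid \theta) \qquad \text{(eq. 6)}$$
The set of necessary conditions for an optimum is:
$$\nabla_\theta l = 0 \qquad \text{(eq. 7)}$$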

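11 Example, the Gaussian case: unknown μ
We assume we know the covariance: p(x_i | μ) ~ N(μ, Σ)
(Samples are drawn from a multivariate normal population)
$$\ln p(x_k \mid \mu) = -\frac{1}{2}\ln\left[(2\pi)^d\,|\Sigma|\right] - \frac{1}{2}(x_k - \mu)^t\,\Sigma^{-1}(x_k - \mu)$$
and
$$\nabla_{\mu} \ln p(x_k \mid \mu) = \Sigma^{-1}(x_k - \mu) \qquad \text{(eq. 9)}$$
θ = μ, therefore the ML estimate for μ must satisfy (from eqs. 6, 7 & 9):
$$\sum_{k=1}^{n} \Sigma^{-1}(x_k - \hat{\mu}) = 0$$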

12 Multiplying by Σ and rearranging, we obtain:
$$\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n} x_k$$
Just the arithmetic average of the training samples!
Conclusion: If P(x_k | ω_j) (j = 1, 2, ..., c) is supposed to be Gaussian in a d-dimensional feature space, then we can estimate the vector θ = (θ_1, θ_2, ..., θ_c)^t and perform an optimal classification!
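A minimal Python sketch, assuming a hypothetical known 2 × 2 covariance Σ and four hypothetical 2-D samples, computes $\hat{\mu}$ as the arithmetic average and checks that it satisfies the ML condition $\sum_k \Sigma^{-1}(x_k - \hat{\mu}) = 0$ from the previous slide. Plain lists and the hypothetical helpers `inverse_2x2` and `mat_vec` are used instead of a linear-algebra library to keep the block self-contained.

```python
# Hypothetical 2-D training samples x_k from N(mu, Sigma), with Sigma known
samples = [(1.0, 2.0), (2.0, 1.5), (0.5, 2.5), (1.5, 3.0)]
Sigma = [[2.0, 0.5], [0.5, 1.0]]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_vec(m, v):
    """Matrix-vector product for nested lists."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

n = len(samples)
d = len(samples[0])

# ML estimate: arithmetic average of the samples
mu_hat = [sum(x[i] for x in samples) / n for i in range(d)]

# ML condition: sum_k Sigma^{-1} (x_k - mu_hat) should be the zero vector
Sigma_inv = inverse_2x2(Sigma)
grad = [0.0] * d
for x in samples:
    diff = [x[i] - mu_hat[i] for i in range(d)]
    step = mat_vec(Sigma_inv, diff)
    grad = [g + s for g, s in zip(grad, step)]

print(mu_hat)  # [1.25, 2.25]
print(grad)    # both entries zero up to rounding
```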

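13 Example, Gaussian case: unknown μ and Σ
First consider the univariate case: unknown μ and σ²
θ = (θ_1, θ_2) = (μ, σ²)
$$l = \ln p(x_k \mid \theta) = -\frac{1}{2}\ln(2\pi\theta_2) - \frac{1}{2\theta_2}(x_k - \theta_1)^2$$
$$\nabla_\theta l = \begin{pmatrix} \dfrac{\partial}{\partial\theta_1}\ln p(x_k \mid \theta) \\[4pt] \dfrac{\partial}{\partial\theta_2}\ln p(x_k \mid \theta) \end{pmatrix} = \begin{pmatrix} \dfrac{1}{\theta_2}(x_k - \theta_1) \\[4pt] -\dfrac{1}{2\theta_2} + \dfrac{(x_k - \theta_1)^2}{2\theta_2^2} \end{pmatrix}$$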
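One way to sanity-check the two gradient components above is to compare them with central finite differences of ln p(x_k | θ). The Python sketch below does this at a hypothetical point (x_k = 1.7, θ_1 = 0.5, θ_2 = 2.0); the helper names and the step size `h` are illustrative assumptions.

```python
import math

def log_p(x, theta1, theta2):
    """ln p(x | theta) for N(theta1, theta2), where theta2 is the variance."""
    return -0.5 * math.log(2 * math.pi * theta2) - (x - theta1) ** 2 / (2 * theta2)

def analytic_grad(x, theta1, theta2):
    """The two gradient components derived on the slide."""
    d1 = (x - theta1) / theta2
    d2 = -1 / (2 * theta2) + (x - theta1) ** 2 / (2 * theta2 ** 2)
    return d1, d2

def numeric_grad(x, theta1, theta2, h=1e-6):
    """Central finite differences of log_p in each parameter."""
    d1 = (log_p(x, theta1 + h, theta2) - log_p(x, theta1 - h, theta2)) / (2 * h)
    d2 = (log_p(x, theta1, theta2 + h) - log_p(x, theta1, theta2 - h)) / (2 * h)
    return d1, d2

x_k, theta1, theta2 = 1.7, 0.5, 2.0  # hypothetical sample and parameter values
analytic = analytic_grad(x_k, theta1, theta2)
numeric = numeric_grad(x_k, theta1, theta2)
print(analytic)  # approximately (0.6, -0.07)
print(numeric)   # agrees with the analytic values to many decimal places
```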

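14 Summation over the training set, with θ̂_1 = μ̂ and θ̂_2 = σ̂², and setting each component to zero (the second one multiplied through by 2):
$$\sum_{k=1}^{n} \frac{1}{\hat{\sigma}^2}(x_k - \hat{\mu}) = 0 \qquad (1)$$
$$-\sum_{k=1}^{n} \frac{1}{\hat{\sigma}^2} + \sum_{k=1}^{n} \frac{(x_k - \hat{\mu})^2}{\hat{\sigma}^4} = 0 \qquad (2)$$
Combining (1) and (2), one obtains:
$$\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n} x_k; \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{k=1}^{n}(x_k - \hat{\mu})^2$$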
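A minimal Python sketch of these two closed-form estimates on a hypothetical sample. It also compares them with the standard library: `statistics.pvariance` divides by n and so matches $\hat{\sigma}^2$, while `statistics.variance` divides by n − 1 and does not.

```python
import statistics

D = [2.1, 3.4, 1.9, 2.8, 3.0, 2.6]  # hypothetical training sample

n = len(D)
mu_hat = sum(D) / n                                  # (1/n) sum_k x_k
sigma2_hat = sum((x - mu_hat) ** 2 for x in D) / n   # (1/n) sum_k (x_k - mu_hat)^2

print(mu_hat, sigma2_hat)
print(statistics.pvariance(D))  # same value as sigma2_hat (divides by n)
print(statistics.variance(D))   # larger: divides by n - 1
```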

15 The ML estimates for the multivariate case are similar
The scalars x and μ are replaced with vectors
The variance σ² is replaced by the covariance matrix
$$\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n} x_k; \qquad \hat{\Sigma} = \frac{1}{n}\sum_{k=1}^{n}(x_k - \hat{\mu})(x_k - \hat{\mu})^t$$
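The multivariate estimates can be sketched in plain Python as below; the function name `ml_mean_and_covariance` and the four 2-D samples are hypothetical. Note the 1/n normalization, which is what makes this the ML estimator rather than the sample covariance.

```python
def ml_mean_and_covariance(samples):
    """ML estimates for a multivariate Gaussian: mean and (1/n)-normalized covariance."""
    n = len(samples)
    d = len(samples[0])
    mu = [sum(x[i] for x in samples) / n for i in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for x in samples:
        diff = [x[i] - mu[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                # accumulate (x_k - mu)(x_k - mu)^t / n entry by entry
                cov[i][j] += diff[i] * diff[j] / n
    return mu, cov

samples = [(1.0, 2.0), (3.0, 3.0), (2.0, 1.0), (2.0, 2.0)]  # hypothetical
mu_hat, Sigma_hat = ml_mean_and_covariance(samples)
print(mu_hat)     # [2.0, 2.0]
print(Sigma_hat)  # [[0.5, 0.25], [0.25, 0.5]]
```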

16 Bias
The ML estimate for σ² is biased:
$$E\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \hat{\mu})^2\right] = \frac{n-1}{n}\,\sigma^2 \neq \sigma^2$$
Extreme case: n = 1, E[·] = 0 ≠ σ²
As n increases the bias is reduced; this type of estimator is called asymptotically unbiased
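A small Monte Carlo sketch makes the bias visible: it draws many hypothetical samples of size n = 2 from N(0, 1) and averages the ML variance estimate, which should come out near (n − 1)/n · σ² = 0.5 rather than σ² = 1. The seed, sample size, and trial count are arbitrary assumptions chosen so the effect is easy to see.

```python
import random

random.seed(1)
true_var = 1.0  # sigma^2 of the hypothetical N(0, 1) population
n = 2           # small sample size makes the bias easy to see
trials = 20000

def ml_variance(xs):
    """ML variance estimate: divides by n."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

avg_ml = sum(ml_variance([random.gauss(0.0, 1.0) for _ in range(n)])
             for _ in range(trials)) / trials

print(avg_ml)                  # close to 0.5, not 1.0
print((n - 1) / n * true_var)  # 0.5
```

Rerunning with a larger `n` should bring the average closer to 1.0, which is the asymptotic unbiasedness described above.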

17 An elementary unbiased estimator for Σ is:
$$C = \frac{1}{n-1}\sum_{k=1}^{n}(x_k - \hat{\mu})(x_k - \hat{\mu})^t$$
(Sample covariance matrix)
This estimator is unbiased for all distributions; such estimators are called absolutely unbiased

18 Our earlier estimator for Σ is biased:
$$\hat{\Sigma} = \frac{1}{n}\sum_{k=1}^{n}(x_k - \hat{\mu})(x_k - \hat{\mu})^t$$
In fact it is asymptotically unbiased. Observe that
$$\hat{\Sigma} = \frac{n-1}{n}\,C$$
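The relation between the two estimators can be checked numerically. This Python sketch builds the scatter matrix $\sum_k (x_k - \hat{\mu})(x_k - \hat{\mu})^t$ once for hypothetical 2-D samples, divides it by n and by n − 1, and confirms $\hat{\Sigma} = \frac{n-1}{n}C$ entry by entry. The helper name `scatter_matrix` is an assumption for illustration.

```python
def scatter_matrix(samples):
    """Return (mean, S) where S = sum_k (x_k - mean)(x_k - mean)^t."""
    n, d = len(samples), len(samples[0])
    mu = [sum(x[i] for x in samples) / n for i in range(d)]
    S = [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in samples) for j in range(d)]
         for i in range(d)]
    return mu, S

samples = [(1.0, 2.0), (3.0, 3.0), (2.0, 1.0), (2.0, 2.0)]  # hypothetical
n = len(samples)
mu_hat, S = scatter_matrix(samples)

Sigma_hat = [[s / n for s in row] for row in S]  # ML (biased) estimator
C = [[s / (n - 1) for s in row] for row in S]    # sample covariance (unbiased)

# Sigma_hat == (n - 1) / n * C, entry by entry
for i in range(len(S)):
    for j in range(len(S)):
        assert abs(Sigma_hat[i][j] - (n - 1) / n * C[i][j]) < 1e-12

print(C)  # diagonal entries 2/3, off-diagonal entries 1/3
```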

19 Appendix: ML Problem Statement
Let D = {x_1, x_2, ..., x_n}, with |D| = n
$$P(x_1, \ldots, x_n \mid \theta) = \prod_{k=1}^{n} P(x_k \mid \theta)$$
Our goal is to determine $\hat{\theta}$ (the value of θ that maximizes the likelihood of this sample set!)

20 [Figure: the sample set D divided into class subsets D_1, ..., D_c; the samples in D_j follow N(μ_j, Σ_j), i.e., P(x | ω_j)]

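21 θ = (θ_1, θ_2, ..., θ_c)
Problem: find $\hat{\theta}$ such that:
$$\max_{\theta} P(D \mid \theta) = \max_{\theta} P(x_1, \ldots, x_n \mid \theta) = \max_{\theta} \prod_{k=1}^{n} P(x_k \mid \theta)$$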
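Under the usual assumption that samples in D_i carry no information about θ_j for i ≠ j, the maximization over θ = (θ_1, ..., θ_c) splits into c independent ML problems, one per class subset D_j. The Python sketch below illustrates this for hypothetical labeled 1-D data, estimating θ_j = (μ_j, σ_j²) separately for each class; the data and class labels are made up for illustration.

```python
from collections import defaultdict

# Hypothetical labeled 1-D training data: (feature value, class label)
data = [(1.0, "w1"), (1.4, "w1"), (0.6, "w1"),
        (3.1, "w2"), (2.7, "w2"), (3.5, "w2"), (2.9, "w2")]

# Split D into class subsets D_1, ..., D_c
subsets = defaultdict(list)
for x, label in data:
    subsets[label].append(x)

# Independent ML estimate theta_j = (mu_j, sigma_j^2) for each class
theta = {}
for label, xs in subsets.items():
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    theta[label] = (mu, var)

for label in sorted(theta):
    print(label, theta[label])
```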
