ENGI 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
Memorial University of Newfoundland

Pattern Recognition
Lecture 7
May 3, 2006
http://www.engr.mun.ca/~charlesr
Office Hours: Tuesdays and Thursdays, 8:30-9:30 PM, E-000 (until further notice)

Review: Bayesian Classification

For a given pattern $x$, classify it to the most probable class:

$$P(C_i \mid x) \underset{C_j}{\overset{C_i}{\gtrless}} P(C_j \mid x)$$

Use Bayes' Theorem to find the probabilities:

$$P(C_i \mid x) = \frac{p(x \mid C_i)\, P(C_i)}{p(x)}$$

- $P(C_i \mid x)$: a posteriori probability of $C_i$ given $x$ (after measurement)
- $p(x \mid C_i)$: class-conditional pdf, derived from samples
- $P(C_i)$: a priori probability of class $C_i$
- $p(x)$: total pdf of $x$, independent of class
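
Review: Maximum A Posteriori (Bayes) Classifier

$$p(x \mid C_i)\, P(C_i) \underset{C_j}{\overset{C_i}{\gtrless}} p(x \mid C_j)\, P(C_j)$$

Equivalently, as a likelihood ratio test against a threshold $\theta$:

$$\frac{p(x \mid C_i)}{p(x \mid C_j)} \underset{C_j}{\overset{C_i}{\gtrless}} \frac{P(C_j)}{P(C_i)} = \theta$$

Review

$$P_{MAP}(\text{error}) \le P_{ML}(\text{error}) \le P_{MICD}(\text{error}) \le P_{MED}(\text{error})$$

$$P(\text{error}) = P(\text{classifying as } C_i \mid C_j)\, P(C_j) + P(\text{classifying as } C_j \mid C_i)\, P(C_i)$$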
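
The likelihood-ratio form of the MAP rule can be sketched in a few lines. This is only an illustration under stated assumptions: the two classes are labelled 1 and 2, their class-conditional pdfs are taken to be univariate Gaussians with made-up parameters, and the names `gaussian_pdf` and `map_classify` are hypothetical helpers, not part of the lecture.

```python
import math

def gaussian_pdf(x, mean, std):
    """Univariate normal density, used here as an assumed class-conditional pdf."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def map_classify(x, params_1, params_2, prior_1):
    """Return 1 or 2 using the likelihood-ratio form of the MAP rule."""
    prior_2 = 1.0 - prior_1
    ratio = gaussian_pdf(x, *params_1) / gaussian_pdf(x, *params_2)
    theta = prior_2 / prior_1  # threshold P(C_2) / P(C_1)
    return 1 if ratio > theta else 2

# Hypothetical classes: C_1 ~ N(0, 1) and C_2 ~ N(2, 1).
label_equal = map_classify(0.9, (0.0, 1.0), (2.0, 1.0), prior_1=0.5)
label_skewed = map_classify(0.9, (0.0, 1.0), (2.0, 1.0), prior_1=0.2)
print(label_equal, label_skewed)
```

With equal priors the threshold is 1 and the boundary sits midway between the means, so $x = 0.9$ goes to class 1. Lowering $P(C_1)$ to 0.2 raises the threshold to 4 and the same pattern is assigned to class 2.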

Loss and Conditional Risk

Loss: $\lambda_{ij} = \lambda(\alpha_i \mid C_j)$, the cost of action $i$ given class $j$

Conditional Risk:

$$R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid C_j)\, P(C_j \mid x)$$

Classifier (take the action with the smaller conditional risk):

$$R(\alpha_i \mid x) \underset{\alpha_i}{\overset{\alpha_j}{\gtrless}} R(\alpha_j \mid x)$$

Probabilistic Classification

- With prior probabilities and class-conditional densities, we can design optimal classifiers based on the MAP classifier.
- Unfortunately, in the real world, we rarely have this information available.
- We typically have a number of training samples from which we must determine information about the class.
- If we can assume the form of the probabilistic distributions, such as Gaussian, then we simply need to find (estimate) the necessary parameters (mean and covariance).
- Otherwise we need to use nonparametric techniques to estimate the distribution.
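
Returning to the conditional risk rule from the Loss and Conditional Risk slide, a small sketch may make the sum concrete. The loss matrix and posteriors below are invented for illustration, and `conditional_risks` and `min_risk_action` are hypothetical names.

```python
def conditional_risks(loss, posteriors):
    """R(alpha_i | x) = sum_j loss[i][j] * P(C_j | x), for every action i."""
    return [sum(l_ij * p_j for l_ij, p_j in zip(row, posteriors)) for row in loss]

def min_risk_action(loss, posteriors):
    """Index of the action with the smallest conditional risk."""
    risks = conditional_risks(loss, posteriors)
    return min(range(len(risks)), key=risks.__getitem__)

# Hypothetical loss matrix: rows are actions, columns are true classes.
# Choosing action 0 when the true class is 1 is ten times as costly
# as the opposite mistake.
loss = [[0.0, 10.0],
        [1.0, 0.0]]
posteriors = [0.8, 0.2]  # assumed P(C_0 | x), P(C_1 | x)

risks = conditional_risks(loss, posteriors)
action = min_risk_action(loss, posteriors)
print(risks, action)
```

Here class 0 is the more probable class, yet the asymmetric loss makes action 1 the lower-risk choice, which is the difference between minimum-risk and plain MAP decisions.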

Probabilistic Classification

Given labelled samples $\{x_i\}$, each with known class membership $x_i \in C_k$:

1. Estimate $p(x \mid C_i)$, $P(C_i)$
   a. Parameter estimation - assume a known form for the pdf and estimate the necessary parameters (mean and variance for normal distributions)
   b. Density estimation - estimate non-parametric pdfs from the given samples
2. Evaluate the discriminant, using the estimated values:

$$\hat{p}(x \mid C_1)\, \hat{P}(C_1) - \hat{p}(x \mid C_2)\, \hat{P}(C_2) \underset{C_2}{\overset{C_1}{\gtrless}} 0$$

3. Assign $x$ accordingly

Parameter Estimation

- Assume a form for the distribution, which will dictate the required parameters
- Gaussian pdf: $\hat{\mu}, \hat{\sigma}$ for 1-D and $\hat{\mu}, \hat{\Sigma}$ for multi-d
- Assuming two classes, then there are also $N_1 + N_2 = N$ samples, with $P(C_1) = P$ and $P(C_2) = 1 - P$. Need to find $P$!

There are 2 approaches to estimating parameters:

- Maximum likelihood
- Bayes estimation (Bayesian learning)
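
The three-step procedure above (estimate, evaluate the discriminant, assign) can be sketched for two classes in 1-D. This is a minimal illustration, not the lecture's implementation: it assumes Gaussian class-conditional pdfs, uses the relative class frequency as the prior estimate and the 1/N sample moments as parameter estimates, and the training values and function names are made up.

```python
import math

def fit_class(samples):
    """Sample mean and variance (1/N normalisation) for one class."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, var

def gaussian_pdf(x, mean, var):
    """Univariate normal density with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def train(samples_1, samples_2):
    """Step 1: estimate the class-conditional parameters and the prior of class 1."""
    n1, n2 = len(samples_1), len(samples_2)
    prior_1 = n1 / (n1 + n2)
    return fit_class(samples_1), fit_class(samples_2), prior_1

def classify(x, model):
    """Steps 2 and 3: evaluate the discriminant and assign x."""
    (m1, v1), (m2, v2), prior_1 = model
    g = gaussian_pdf(x, m1, v1) * prior_1 - gaussian_pdf(x, m2, v2) * (1.0 - prior_1)
    return 1 if g > 0 else 2

# Hypothetical labelled training samples.
samples_1 = [0.8, 1.1, 1.3, 0.9, 1.4]
samples_2 = [3.9, 4.2, 3.6, 4.4, 4.1, 3.8]
model = train(samples_1, samples_2)
print(classify(1.0, model), classify(4.0, model))
```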

Maximum Likelihood Parameter Estimation

Choose as estimates the values of the parameters that maximize the likelihood (probability) of the observed set of (labelled) training samples.

A Priori Probability Estimates

Assuming that we knew $P(C_1) = P$ and $P(C_2) = 1 - P$, then we could write $P(N_1, N_2 \mid P)$, the probability of $N_1$ occurrences of $C_1$ in $N$ samples, with $P(C_1)$ for any single sample given by $P$:

$$P(N_1, N_2 \mid P) = \binom{N}{N_1} P^{N_1} (1 - P)^{N_2} = \frac{N!}{N_1!\,(N - N_1)!}\, P^{N_1} (1 - P)^{N - N_1}$$

We need to find the value of $P$ that maximizes the probability of $(N_1, N_2)$ occurring. Label this value of $P$ as $\hat{P}_{ML}$.

So we need to take the derivative with respect to $P$ and set it to zero:

$$\frac{\partial}{\partial P} \left\{ P(N_1, N_2 \mid P) \right\} = \binom{N}{N_1} \left\{ N_1 P^{N_1 - 1} (1 - P)^{N_2} - N_2 P^{N_1} (1 - P)^{N_2 - 1} \right\} = 0$$

$$N_1 (1 - P) - N_2 P = 0$$

$$N_1 - N P = 0$$

$$\hat{P}_{ML} = \frac{N_1}{N}$$
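
As a numerical check of the derivative argument, the binomial likelihood can be evaluated on a grid of candidate values of $P$. The counts below are hypothetical and the grid search is only an illustration; the closed form $N_1/N$ is what the derivation gives.

```python
import math

def likelihood(p, n1, n):
    """Binomial probability of n1 occurrences of C_1 in n samples."""
    return math.comb(n, n1) * p ** n1 * (1.0 - p) ** (n - n1)

n, n1 = 20, 7  # hypothetical sample counts
grid = [k / 1000 for k in range(1, 1000)]  # candidate values of P in (0, 1)
p_best = max(grid, key=lambda p: likelihood(p, n1, n))
p_ml = n1 / n  # closed-form maximum likelihood estimate
print(p_best, p_ml)
```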

So the maximum likelihood estimate of the a priori probability of $C_1$ is the relative frequency of occurrence of $C_1$ in the $N$ samples.

Gaussian Parameter Estimation

Follow a similar process to find the parameters for the assumed Gaussian density. If we knew $\mu, \Sigma$, then we could write the probability of the samples $x_1, \dots, x_N$ (of dimension $d$) as

$$p(\{x_i\} \mid \mu, \Sigma) = \prod_{i=1}^{N} \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x_i - \mu)^T \Sigma^{-1} (x_i - \mu) \right)$$

We are looking for the $\mu, \Sigma$ that maximize this.

That is, we require

$$\frac{\partial}{\partial \mu}\, p(\{x_i\} \mid \mu, \Sigma) = 0, \qquad \frac{\partial}{\partial \Sigma}\, p(\{x_i\} \mid \mu, \Sigma) = 0$$

First, take the log to simplify:

$$\frac{\partial}{\partial \mu} \sum_{i=1}^{N} \left[ -\log\left( (2\pi)^{d/2} |\Sigma|^{1/2} \right) - \frac{1}{2} (x_i - \mu)^T \Sigma^{-1} (x_i - \mu) \right] = 0$$

$$\sum_{i=1}^{N} \Sigma^{-1} (x_i - \mu) = 0 \quad \text{to maximize}$$

$$\hat{\mu}_{ML} = \frac{1}{N} \sum_{i=1}^{N} x_i = m \quad \text{(the sample mean)}$$

Similarly for the covariance matrix $\Sigma$. Consider $\theta = \Sigma^{-1}$ for convenience:

$$\frac{\partial}{\partial \theta} \sum_{i=1}^{N} \left[ -\log (2\pi)^{d/2} + \frac{1}{2} \log |\theta| - \frac{1}{2} (x_i - \mu)^T \theta\, (x_i - \mu) \right] = 0$$

Note that the transpose changes position after differentiation. We use the results that, for symmetric $\theta$ and any vector $a$,

$$\frac{\partial}{\partial \theta} \log |\theta| = \theta^{-1}, \qquad \frac{\partial}{\partial \theta} \left( a^T \theta\, a \right) = a\, a^T$$

so that

$$\sum_{i=1}^{N} \left[ \frac{1}{2} \theta^{-1} - \frac{1}{2} (x_i - \mu)(x_i - \mu)^T \right] = 0$$

For the maximum value (set the equation to zero):

$$\left[ \hat{\theta}_{ML} \right]^{-1} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{\mu}_{ML})(x_i - \hat{\mu}_{ML})^T$$

But $\theta = \Sigma^{-1}$, and so

$$\hat{\Sigma}_{ML} = \frac{1}{N} \sum_{i=1}^{N} (x_i - m)(x_i - m)^T = S$$

which is the sample covariance matrix! Note that this form is biased.

Bias and Biased Estimates

We now have the maximum likelihood values for the pdf parameters. These values maximize the probability of the samples observed in the training set.

- The estimated class a priori probabilities:

$$\hat{P}_{ML}(C_i) = \frac{N_i}{N}$$

- The estimated mean and covariance matrices for a normal distribution:

$$\hat{\mu}_{ML} = \frac{1}{N} \sum_{i=1}^{N} x_i = m$$

$$\hat{\Sigma}_{ML} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{\mu}_{ML})(x_i - \hat{\mu}_{ML})^T$$

Are these formulations biased?
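
The maximum likelihood mean and covariance summarized above can be computed directly from a list of sample vectors. The sketch below uses plain Python lists and a made-up 2-D data set; `ml_mean` and `ml_covariance` are hypothetical helper names, and the covariance uses the $1/N$ normalisation of the ML estimate.

```python
def ml_mean(samples):
    """Sample mean m = (1/N) * sum of x_i, computed per component."""
    n = len(samples)
    d = len(samples[0])
    return [sum(x[k] for x in samples) / n for k in range(d)]

def ml_covariance(samples):
    """ML covariance S = (1/N) * sum of (x_i - m)(x_i - m)^T."""
    n = len(samples)
    d = len(samples[0])
    m = ml_mean(samples)
    cov = [[0.0] * d for _ in range(d)]
    for x in samples:
        diff = [x[k] - m[k] for k in range(d)]
        for r in range(d):
            for c in range(d):
                cov[r][c] += diff[r] * diff[c] / n  # accumulate outer product
    return cov

# Hypothetical 2-D training samples for one class.
samples = [(1.0, 2.0), (3.0, 6.0), (5.0, 4.0), (7.0, 8.0)]
mean = ml_mean(samples)
cov = ml_covariance(samples)
print(mean)  # [4.0, 5.0]
print(cov)   # [[5.0, 4.0], [4.0, 5.0]]
```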
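
Definition: An estimate is unbiased if its expected value is its true value.

Is the maximum likelihood estimate of the mean biased?

$$E[\hat{\mu}_{ML}] = E\left[ \frac{1}{N} \sum_{i=1}^{N} x_i \right] = \frac{1}{N} \sum_{i=1}^{N} E[x_i] = \frac{1}{N} (N \mu) = \mu$$

So the ML mean is unbiased.

What is the expected value of the ML estimate of the covariance matrix?

$$E[\hat{\Sigma}_{ML}] = E\left[ \frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{\mu})(x_i - \hat{\mu})^T \right] = \frac{1}{N} \sum_{i=1}^{N} E\left[ (x_i - \hat{\mu})(x_i - \hat{\mu})^T \right]$$

$$= \frac{1}{N} \sum_{i=1}^{N} E\left[ \big( (x_i - \mu) - (\hat{\mu} - \mu) \big) \big( (x_i - \mu) - (\hat{\mu} - \mu) \big)^T \right]$$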
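
Expanding the product:

$$E[\hat{\Sigma}_{ML}] = \frac{1}{N} \sum_{i=1}^{N} E\left[ (x_i - \mu)(x_i - \mu)^T - (x_i - \mu)(\hat{\mu} - \mu)^T - (\hat{\mu} - \mu)(x_i - \mu)^T + (\hat{\mu} - \mu)(\hat{\mu} - \mu)^T \right]$$

However, since $\frac{1}{N} \sum_{i} x_i = \hat{\mu}$, we have $\frac{1}{N} \sum_{i} (x_i - \mu) = \hat{\mu} - \mu$, and so

$$E[\hat{\Sigma}_{ML}] = \frac{1}{N} \sum_{i=1}^{N} E\left[ (x_i - \mu)(x_i - \mu)^T \right] - E\left[ (\hat{\mu} - \mu)(\hat{\mu} - \mu)^T \right] - E\left[ (\hat{\mu} - \mu)(\hat{\mu} - \mu)^T \right] + E\left[ (\hat{\mu} - \mu)(\hat{\mu} - \mu)^T \right]$$

$$= \Sigma - E\left[ (\hat{\mu} - \mu)(\hat{\mu} - \mu)^T \right]$$

So the expected value of the ML covariance matrix is the actual covariance matrix minus a small amount. It is biased.

The remaining term is the covariance of the sample mean:

$$E\left[ (\hat{\mu} - \mu)(\hat{\mu} - \mu)^T \right] = E\left[ \left( \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu) \right) \left( \frac{1}{N} \sum_{j=1}^{N} (x_j - \mu) \right)^T \right] = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} E\left[ (x_i - \mu)(x_j - \mu)^T \right]$$

$$= \frac{1}{N^2} \sum_{i=1}^{N} E\left[ (x_i - \mu)(x_i - \mu)^T \right] \quad \text{(samples are independent)}$$

$$= \frac{1}{N} \Sigma$$

In the univariate case, this is equivalent to saying:

$$\mathrm{var}(x) = \sigma^2, \qquad \mathrm{var}(\bar{x}) = \frac{\sigma^2}{N}$$

The variance of the sample mean is the variance divided by the number of samples.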

The ML covariance matrix is slightly less than the actual covariance matrix, but becomes a better estimate for larger values of $N$:

$$E[\hat{\Sigma}_{ML}] = \Sigma - \frac{1}{N} \Sigma = \frac{N - 1}{N} \Sigma$$

An unbiased estimate of the covariance matrix is then:

$$\hat{\Sigma}_{u} = \frac{N}{N - 1} \hat{\Sigma}_{ML} = \frac{1}{N - 1} \sum_{i=1}^{N} (x_i - m)(x_i - m)^T$$
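
The $(N-1)/N$ factor can be checked exactly in a small univariate case. The sketch below is an illustration under assumptions not in the lecture: the samples are independent fair coin flips (values 0 and 1, true variance 1/4) rather than Gaussian, which is enough because the bias argument only uses independence and the first two moments. All $2^N$ equally likely sample sets are enumerated with exact fractions.

```python
from fractions import Fraction
from itertools import product

def ml_variance(xs):
    """ML (1/N) variance estimate of one sample set, as an exact fraction."""
    n = len(xs)
    m = Fraction(sum(xs), n)
    return sum((x - m) ** 2 for x in xs) / n

def expected_ml_variance(values, n):
    """Exact E[ML variance] over all equally likely n-tuples drawn from `values`."""
    tuples = list(product(values, repeat=n))
    return sum(ml_variance(t) for t in tuples) / len(tuples)

n = 3
values = (0, 1)              # fair coin, an assumed toy distribution
true_var = Fraction(1, 4)    # variance of a single fair coin flip

e_ml = expected_ml_variance(values, n)
e_unbiased = Fraction(n, n - 1) * e_ml  # apply the N / (N - 1) correction
print(e_ml, e_unbiased)  # 1/6 1/4
```

The ML estimate averages to $\frac{2}{3} \cdot \frac{1}{4} = \frac{1}{6}$, and rescaling by $N/(N-1)$ recovers the true variance exactly.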