E395 - Pattern Recognition
Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

This work comes under the terms of the Creative Commons BY-SA 2.0 license, http://creativecommons.org/licenses/by-sa/2.0/

Preface

This document is a solution manual for selected exercises from Introduction to Pattern Recognition by Arne Leijon. The notation followed in the textbook is fully respected here. A short review of the topics discussed in the corresponding chapter of the textbook is given as a reference. For complete proofs of these results and for the problem texts, refer to the textbook.

Problem definition and optimization

This chapter of the textbook generalizes the classification problem introduced in the previous chapter. The extensions are:
- more than two categories are considered;
- more than one signal feature can be employed;
- the performance criterion is generalized.

The new scheme of the classifier is depicted in Fig. 1, taken from the textbook.

[Figure 1. General signal classification: a signal source with state S in {1, ..., M_s}, a transduction stage, feature extraction producing the observation vector X, and a classifier that evaluates the discriminant functions g_1(x), ..., g_{M_d}(x) and selects the maximum.]

All the elements of this classification system are described statistically: the signal state can take any value from the set {j = 1, ..., M_s} with a priori probability P_S(j). The observed feature vector x is the outcome of a random vector X whose distribution depends on the state S and can be written as f_{X|S}(x|j) for each of the M_s possible states. The decision rule D makes use of the information given by the a posteriori probability P_{S|X}(j|x), obtained with the Bayes rule as

    P_{S|X}(j|x) = \frac{f_{X|S}(x|j) P_S(j)}{\sum_{j'=1}^{M_s} f_{X|S}(x|j') P_S(j')}

to perform some action for any incoming observation vector x. This decision mechanism is the result of an optimization process aimed at fulfilling a performance criterion. The criterion is defined by a cost function L(D=i, S=j) that describes the loss the system incurs when it takes the decision D=i while the source is in the state S=j. Since all decisions are taken with regard to the observation vector x, which is only statistically related to the true state S of the source, we can predict (statistically) a Conditional Expected Loss, or Conditional Risk:

    R(D=i | x) = \sum_{j=1}^{M_s} L(D=i, S=j) P_{S|X}(j|x)

The optimal decision is hence the one that leads to the minimum risk; this is the Bayes Minimum-Risk decision rule:

    D = \arg\min_i R(D=i | x)

This rule is proved to minimize the total expected loss Q = E[R(D(X) | X)] over all possible outcomes of the random vector X.

Special cases

If the decision is to guess the state of the source, and the loss function is

    L(D=i, S=j) = 0 if i = j, 1 otherwise,

then the optimal decision rule introduced before simplifies to the Maximum A Posteriori (MAP) decision rule:

    D = \arg\max_i f_{X|S}(x|i) P_S(i)

If the previous conditions hold and the a priori probabilities are all equal (P_S(j) = 1/M_s for all j), the resulting decision rule is called the Maximum Likelihood (ML) decision rule:

    D = \arg\max_i f_{X|S}(x|i)

In general, any decision rule can be expressed in the form

    D = \arg\max_{i=1,...,M_d} g_i(x)

and the g_i are called discriminant functions.
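
As a concrete illustration of the rules above, here is a minimal Python sketch (not part of the textbook; the priors, Gaussian class-conditional densities and loss matrix are made-up example values) that evaluates the conditional risk R(D=i|x) for every decision and picks the minimizer; with a 0/1 loss it reduces to the MAP rule:

    import numpy as np
    from scipy.stats import norm

    priors = np.array([0.7, 0.3])                     # P_S(j), assumed values
    loss = np.array([[0.0, 1.0],                      # L(D=i, S=j), 0/1 loss
                     [1.0, 0.0]])

    def likelihoods(x):
        # f_{X|S}(x|j): two assumed Gaussian class-conditional densities
        return np.array([norm.pdf(x, loc=0.0, scale=1.0),
                         norm.pdf(x, loc=2.0, scale=1.0)])

    def bayes_min_risk_decision(x):
        post = priors * likelihoods(x)
        post /= post.sum()                            # Bayes rule: P_{S|X}(j|x)
        risks = loss @ post                           # R(D=i|x) = sum_j L(i,j) P_{S|X}(j|x)
        return int(np.argmin(risks))                  # 0-based index of the minimum-risk decision

    for x in (-1.0, 0.5, 1.5, 3.0):
        print(x, bayes_min_risk_decision(x))          # with 0/1 loss this is the MAP rule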

Exercse. We observe two sequences and we know that one s generated by a human beng and the other by a random-number generator of a computer. There are two possble states of the source: x = { 3 4 5; 4 5 3} S = {, } = {[h, c], [c, h]} Where c stands for computer and h for human beng. The a pror probablty of the states are equally dstrbuted: P S () = P S () = To contnue the soluton of ths problem we have to formulate some assumptons: n the absence of other nformaton t s reasonable to assume that the machne generates unformly dstrbuted numbers, and that any sequence of the knd consdered n ths example has the same probablty of beng generated: P ({ 3 4 5} c) = P ({ 4 5 3} c) = q common sense experence (and perhaps psychologcal arguments) would suggest that the probablty that a human beng generates the sequence { 3 4 5} s hgher than that of generatng the sequence { 4 5 3}. In symbols: P ({ 3 4 5} h) = p ; P ({ 4 5 3} h) = p ; p > p Combnng the events, and assumng that they are ndependent we can wrte: Applyng Bayes rule: P S X (j x) = = P X S (x ) = P ({ 3 4 5; 4 5 3} [h, c]) = p q P X S (x ) = P ({ 3 4 5; 4 5 3} [c, h]) = qp P S (j)p X S (x j) P S ()P X S (x ) + P S ()P X S (x ) qp j q(p + p ) = p j p + p that can be read as the probablty of the state j gven the observaton x. The optmal MAP guess about the source s equvalent to the maxmum lkelhood optmal guess: S opt = arg max P S X (j x) = arg max j j p j = arg max p + p j Accordng to our assumptons on the values of p and p the optmal guess s S = : the human beng has most probably generated the sequence { 3 4 5} whle the machne the sequence { 4 5 3}. p j 3 (9)

Exercse. a) The mnmum error probablty crteron s acheved by consderng the loss functon: {, j L(D =, S = j) = 0, = j Snce the a pror probabltes of the state are unformly dstrbuted, we are n the Maxmum lkelhood case: the decson rule s D = arg max f X S (x ) where f X S (x ) = e (x µ ) σ πσ To smplfy the decson rule I chose to maxmze a monotone ncreasng functon of the argument nstead of the argument tself, for example takng the logarthm: g ln f X S (x ) (x µ ) σ (x µ ) where we smplfy all the constant terms that don t affect the maxmzaton process. Snce the decson mechansm checks whether g s greater or smaller than g, whch are monotone functons of the argument x, ths can be mplemented by a smple threshold x t wth g (x t ) = g (x t ). Substtutng: (x t µ ) = (x t µ ) x t = µ + µ Ths result was predctable when consderng that two Gaussan dstrbutons wth the same 0. f X S (x s )P(S=s ) 0.8 0.6 0.4 0. 0. 0.08 0.06 0.04 x t P E 0.0 µ µ 0 x Fgure. 4 (9)

b) As previously explained (see Chapter 1 and Fig. 2), if we assume µ_1 > µ_2, as in this case, the total probability of error is given by:

    P_E = P_S(1) \int_{-\infty}^{x_t} f_{X|S}(x|1)\,dx + P_S(2) \int_{x_t}^{+\infty} f_{X|S}(x|2)\,dx

Substituting the given values, and exploiting the symmetry:

    P_E = \frac{1}{2}\int_{-\infty}^{0} N(2,1)\,dx + \frac{1}{2}\int_{0}^{+\infty} N(-2,1)\,dx = \int_{-\infty}^{0} N(2,1)\,dx = 1 - \Phi(2) \approx 0.023

For the numerical value refer to BETA (Beta, Mathematics Handbook, Studentlitteratur), p. 405.
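
A one-line numeric check of this value (scipy's normal CDF plays the role of the Φ table in BETA; the means ±2 and unit variance are the values assumed above):

    from scipy.stats import norm

    mu, x_t = 2.0, 0.0
    p_e = 0.5 * norm.cdf(x_t, loc=mu, scale=1.0) + 0.5 * norm.sf(x_t, loc=-mu, scale=1.0)
    print(p_e, 1.0 - norm.cdf(2.0))               # both ~ 0.0228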

Exercise 2.3

We have a signal source with N possible outcomes j = 1, ..., N, governed by the known probabilities P_S(j). There are N + 1 possible decisions: D = j, j = 1, ..., N, meaning "the state was S = j", and D = N + 1, meaning "no decision". The cost function is given in the exercise text as

    L(D=i, S=j) = 0 if i = j (i, j = 1, ..., N);  r if i = N+1 (j = 1, ..., N);  c otherwise,

which sets the cost to 0 if the decision was correct, to c if it was taken but incorrect, and to r if the decision was rejected.

a) The expected cost is by definition R(D=i|x) = \sum_j L(D=i, S=j) P_{S|X}(j|x). To compute it we consider two different cases:

1) the decision is taken (i ≠ N+1):

    R(D=i|x) = c \sum_{j \neq i} P_{S|X}(j|x) = c\,(1 - P_{S|X}(i|x))

The last equality holds because \sum_j P_{S|X}(j|x) = 1. In this case we know that the minimum expected cost is achieved with the decision function

    D = \arg\max_i P_{S|X}(i|x)

2) the decision is not taken (i = N+1):

    R(D=N+1|x) = r \sum_{j=1}^{N} P_{S|X}(j|x) = r

and the decision is D = "no decision".

The last thing to check is which is the best choice between the first and the second case for each x. The decision will not be rejected if, for some i ≠ N+1,

    R(D=i|x) ≤ R(D=N+1|x)  ⟺  c[1 - P_{S|X}(i|x)] ≤ r  ⟺  P_{S|X}(i|x) ≥ 1 - r/c

This way we have proved that the decision function D proposed in the example is optimal.

b) If r = 0, rejecting a decision is free of cost; if c → ∞, a wrong decision would be enormously costly. In both cases it is never worth risking an error: D will always reject the decision. From a mathematical point of view, the condition to accept a decision (no rejection) becomes P_{S|X}(i|x) ≥ 1, which is never satisfied unless the observation x can only be generated by the source state S = i (so that the equality holds) and there is no doubt about the decision to take.

c) If r > c, rejecting a decision will always be more expensive than attempting one: no decision will ever be rejected. From the mathematical point of view, the acceptance condition requires the probability of the state given the observation to be greater than a negative number, which is always satisfied by probabilities:

    P_{S|X}(i|x) ≥ 1 - r/c ≡ -ɛ,    with ɛ > 0

d) For i = 1, ..., N the discriminant functions correspond to the ones in point a), which we know to be optimal. We have to prove that the choice of the decision N+1 leads to the same condition as in point a). Decision N+1 is chosen (\arg\max_i g_i = N+1) if and only if, for all i = 1, ..., N,

    g_{N+1} = \left(1 - \frac{r}{c}\right) \sum_{j=1}^{N} f_{X|S}(x|j) P_S(j) > g_i = f_{X|S}(x|i) P_S(i)

    ⟺  \frac{f_{X|S}(x|i) P_S(i)}{\sum_{j=1}^{N} f_{X|S}(x|j) P_S(j)} < 1 - \frac{r}{c},    i = 1, ..., N,

which, applying Bayes rule, is exactly P_{S|X}(i|x) < 1 - r/c, as we wanted to prove.

e) The three functions g_i are:

    g_1 = P_S(1) f_{X|S}(x|1) = \frac{1}{2}\frac{1}{\sqrt{2\pi}} e^{-(x-1)^2/2}
    g_2 = P_S(2) f_{X|S}(x|2) = \frac{1}{2}\frac{1}{\sqrt{2\pi}} e^{-(x+1)^2/2}
    g_3 = \left(1 - \frac{r}{c}\right)[g_1 + g_2] = \frac{3}{4}[g_1 + g_2]

These functions are plotted in Fig. 3.
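
The discriminant functions of point d), including the rejection discriminant g_{N+1}, are straightforward to implement. The sketch below uses the two-Gaussian example of point e) with the cost ratio r/c = 1/4 implied there:

    import numpy as np
    from scipy.stats import norm

    priors = np.array([0.5, 0.5])                 # P_S(1), P_S(2)
    means = np.array([1.0, -1.0])                 # means of the class-conditional Gaussians
    r_over_c = 0.25                               # cost ratio, so 1 - r/c = 3/4

    def discriminants(x):
        g = priors * norm.pdf(x, loc=means, scale=1.0)        # g_i = P_S(i) f_{X|S}(x|i)
        g_reject = (1.0 - r_over_c) * g.sum()                 # g_{N+1} = (1 - r/c) sum_j g_j
        return np.append(g, g_reject)

    def decide(x):
        # Returns 1 or 2 for the two states, 3 for "no decision".
        return int(np.argmax(discriminants(x))) + 1

    for x in (-2.0, -0.3, 0.0, 0.3, 2.0):
        print(x, decide(x))    # |x| < (1/2) ln 3 ~ 0.55 falls in the rejection region (part f)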

[Figure 3. The discriminant functions g_1, g_2 and g_3 over the interval -4 ≤ x ≤ 4.]

f) The decision D = 3 is taken if and only if g_3 > max[g_1, g_2], which happens in the region indicated in the figure. The total probability of rejection is

    P_D(3) = P_{X|S}(-x_0 < x < x_0 | 1) P_S(1) + P_{X|S}(-x_0 < x < x_0 | 2) P_S(2)
           = \frac{1}{2}\int_{-x_0}^{x_0} N(1,1)\,dx + \frac{1}{2}\int_{-x_0}^{x_0} N(-1,1)\,dx

Since the problem is fully symmetric, the two terms are equal and

    P_D(3) = \int_{-x_0}^{x_0} N(-1,1)\,dx = \Phi(x_0 + 1) - \Phi(-x_0 + 1)

The last thing to do is to find the value of x_0, i.e. the value at which g_3 = g_1:

    g_3 = g_1
    \left(1 - \frac{r}{c}\right)\left[N(1,1) + N(-1,1)\right] = N(1,1)
    \left(1 - \frac{r}{c}\right)\frac{1}{\sqrt{2\pi}}\left[e^{-(x-1)^2/2} + e^{-(x+1)^2/2}\right] = \frac{1}{\sqrt{2\pi}}\,e^{-(x-1)^2/2}
    \left(1 - \frac{r}{c}\right)\left(1 + e^{-2x}\right) = 1
    e^{2x} = \frac{c - r}{r}
    x = \frac{1}{2}\ln\frac{c - r}{r}

With the values specified by the problem, x_0 = \frac{1}{2}\ln 3. The total probability of rejection is then

    P_R = P_D(3) = \Phi\left(\tfrac{1}{2}\ln 3 + 1\right) - \Phi\left(-\tfrac{1}{2}\ln 3 + 1\right) = \Phi(1.549) - \Phi(0.451) \approx 0.27,

where the function Φ is tabulated in BETA (Beta, Mathematics Handbook, Studentlitteratur), p. 405.
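
A short numeric cross-check of x_0 and P_R in Python (scipy's norm.cdf again stands in for the tabulated Φ):

    import numpy as np
    from scipy.stats import norm

    r_over_c = 0.25
    x0 = 0.5 * np.log((1.0 - r_over_c) / r_over_c)     # (1/2) ln((c - r)/r) = (1/2) ln 3

    # g_3 and g_1 really cross at x0:
    g1 = 0.5 * norm.pdf(x0, loc=1.0)
    g2 = 0.5 * norm.pdf(x0, loc=-1.0)
    g3 = (1.0 - r_over_c) * (g1 + g2)
    assert np.isclose(g1, g3)

    p_r = norm.cdf(x0 + 1.0) - norm.cdf(-x0 + 1.0)     # total probability of rejection
    print(x0, p_r)                                     # ~ 0.549 and ~ 0.27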

[Figure 4. Rejection is never considered: the discriminant functions g_1, g_2 and g_3 in a case where g_3 stays below max(g_1, g_2) for every x.]

The decision g_3 (rejection) is never chosen if g_3(x) < max[g_1(x), g_2(x)] for all x ∈ R. This is guaranteed if g_3(0) < g_1(0), as is clear from Fig. 4. Since, by symmetry,

    g_3(0) = \left(1 - \frac{r}{c}\right)[g_1(0) + g_2(0)] = 2\left(1 - \frac{r}{c}\right) g_1(0),

rejection is never considered if r > c/2.

Intersection of Two Gaussian Distributions

The problem of finding for which x the joint probability P_{X,S}(x, 0) is greater or smaller than P_{X,S}(x, 1) is common in the exercises seen so far. This corresponds to finding where P_S(0) f_{X|S}(x|0) ≷ P_S(1) f_{X|S}(x|1). In the case of Gaussian distributions N(\mu_i, \sigma_i^2), if we set p_i = P_S(i) with i = 0, 1, the intersection points are:

    x_{1,2} = \frac{\sigma_1^2 \mu_0 - \sigma_0^2 \mu_1 \pm \sigma_0 \sigma_1 \sqrt{(\mu_0 - \mu_1)^2 + 2(\sigma_1^2 - \sigma_0^2)\ln\frac{p_0 \sigma_1}{p_1 \sigma_0}}}{\sigma_1^2 - \sigma_0^2}
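
The closed-form intersection points are easy to verify numerically. The sketch below implements the formula for arbitrary (made-up) parameters and checks that the two weighted densities really agree at the returned points:

    import numpy as np
    from scipy.stats import norm

    def gaussian_intersections(mu0, s0, mu1, s1, p0, p1):
        # Roots of p0 N(x; mu0, s0^2) = p1 N(x; mu1, s1^2), assuming s0 != s1.
        disc = (mu0 - mu1) ** 2 + 2.0 * (s1 ** 2 - s0 ** 2) * np.log(p0 * s1 / (p1 * s0))
        root = s0 * s1 * np.sqrt(disc)
        return (s1 ** 2 * mu0 - s0 ** 2 * mu1 + np.array([root, -root])) / (s1 ** 2 - s0 ** 2)

    # Made-up example parameters:
    mu0, s0, mu1, s1, p0, p1 = 0.0, 1.0, 3.0, 2.0, 0.6, 0.4
    for x in gaussian_intersections(mu0, s0, mu1, s1, p0, p1):
        assert np.isclose(p0 * norm.pdf(x, mu0, s0), p1 * norm.pdf(x, mu1, s1))
        print(x)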

In the special case in which \sigma_0 = \sigma_1 = \sigma, which is the most interesting one in our case (the same noise affects the observations from both states), there is at most one finite solution, and a single threshold on the value of x solves the problem described above:

    x = \frac{\mu_1 + \mu_0}{2} + \frac{\sigma^2}{\mu_0 - \mu_1}\ln\frac{p_1}{p_0}

If the prior probabilities of the source are equal (p_0 = p_1 = 1/2), this reduces to

    x = \frac{\mu_1 + \mu_0}{2}

or, if the means are opposite to each other (\mu_0 = -\mu and \mu_1 = \mu),

    x = -\frac{\sigma^2}{2\mu}\ln\frac{p_1}{p_0}
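
Finally, a quick sketch checking the equal-variance threshold and its two special cases (all parameter values are arbitrary examples):

    import numpy as np
    from scipy.stats import norm

    def equal_variance_threshold(mu0, mu1, sigma, p0, p1):
        # x = (mu1 + mu0)/2 + sigma^2/(mu0 - mu1) * ln(p1/p0)
        return (mu1 + mu0) / 2 + sigma ** 2 / (mu0 - mu1) * np.log(p1 / p0)

    mu0, mu1, sigma, p0, p1 = 2.0, -1.0, 1.5, 0.3, 0.7      # arbitrary example values
    x = equal_variance_threshold(mu0, mu1, sigma, p0, p1)
    assert np.isclose(p0 * norm.pdf(x, mu0, sigma), p1 * norm.pdf(x, mu1, sigma))

    # Equal priors give the midpoint of the means:
    assert np.isclose(equal_variance_threshold(mu0, mu1, sigma, 0.5, 0.5), (mu0 + mu1) / 2)

    # Opposite means mu0 = -mu, mu1 = mu give -(sigma^2 / (2 mu)) ln(p1/p0):
    mu = 1.2
    assert np.isclose(equal_variance_threshold(-mu, mu, sigma, p0, p1),
                      -sigma ** 2 / (2 * mu) * np.log(p1 / p0))
    print("threshold:", x)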