MIMA Group. Chapter 2 Bayesian Decision Theory. School of Computer Science and Technology, Shandong University. Xin-Shun SDU

Chapter 2: Bayesian Decision Theory
Xin-Shun Xu @ SDU, School of Computer Science and Technology, Shandong University

Bayesian Decision Theory
- Bayesian decision theory is a statistical approach to data mining and pattern recognition.
- It is the mathematical foundation for decision making.
- It uses a probabilistic approach to make decisions that minimize the risk (cost).

Bayesian Decision Theory: Basic Assumptions
- The decision problem is posed (formalized) in probabilistic terms.
- All the relevant probability values are known.
- Key principle: Bayes' theorem.

Preliminaries and Notation
- \(\omega \in \{\omega_1, \dots, \omega_c\}\): a state of nature
- \(P(\omega_i)\): prior probability
- \(\mathbf{x}\): feature vector
- \(p(\mathbf{x})\): evidence probability
- \(p(\mathbf{x} \mid \omega_i)\): class-conditional density (likelihood)
- \(P(\omega_i \mid \mathbf{x})\): posterior probability

Decision Before Observation
- The problem: make a decision when only the prior probability is known. No observation is allowed.
- Naive decision rule: decide \(\omega_1\) if \(P(\omega_1) > P(\omega_2)\); otherwise decide \(\omega_2\).
- This is the best we can do without an observation.
- Fixed prior probabilities lead to the same decision every time.

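Bayes' Theorem
\[ P(\omega_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_i)\, P(\omega_i)}{p(\mathbf{x})}, \qquad p(\mathbf{x}) = \sum_{j=1}^{c} p(\mathbf{x} \mid \omega_j)\, P(\omega_j) \]
Thomas Bayes (1702-1761)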

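Decision After Observation
\[ D(\mathbf{x}) = \arg\max_i P(\omega_i \mid \mathbf{x}) = \arg\max_i \frac{p(\mathbf{x} \mid \omega_i)\, P(\omega_i)}{p(\mathbf{x})} = \arg\max_i p(\mathbf{x} \mid \omega_i)\, P(\omega_i) \]
The evidence \(p(\mathbf{x})\) is the same for every class, so it is unimportant in making the decision.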

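Decision After Observation: Bayes Formula
\[ P(\omega_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_i)\, P(\omega_i)}{p(\mathbf{x})} \]
- Known: the prior probability \(P(\omega_i)\), the class-conditional pdf \(p(\mathbf{x} \mid \omega_i)\), and the observation \(\mathbf{x}\).
- Unknown: the posterior probability \(P(\omega_i \mid \mathbf{x})\).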

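Special Cases
\[ P(\omega_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_i)\, P(\omega_i)}{p(\mathbf{x})}, \qquad \text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}} \]
- Case I: equal prior probabilities, \(P(\omega_1) = P(\omega_2) = \dots = P(\omega_c) = 1/c\). The decision depends only on the likelihood \(p(\mathbf{x} \mid \omega_i)\).
- Case II: equal likelihoods, \(p(\mathbf{x} \mid \omega_1) = p(\mathbf{x} \mid \omega_2) = \dots = p(\mathbf{x} \mid \omega_c)\). The rule degenerates to the naive decision rule.
- Normally, the prior probability and the likelihood function act together in the Bayesian decision process.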

An Example
- \(\omega_1\): sea bass, \(\omega_2\): salmon, with \(P(\omega_1) = 2/3\) and \(P(\omega_2) = 1/3\).
- [Figure: class-conditional pdf for lightness]
- What will the posterior probability for either type of fish look like?
- Decide \(\omega_1\) if \(p(x \mid \omega_1) P(\omega_1) > p(x \mid \omega_2) P(\omega_2)\); otherwise decide \(\omega_2\).

An Example (cont.)
- [Figure: posterior probability for either type of fish, with the decision regions marked along the axis]
- Horizontal axis: lightness of fish scales. Vertical axis: posterior probability for each type of fish.
- Black curve: sea bass. Red curve: salmon.
- For each value of \(x\), the higher curve yields the output of the Bayesian decision.
- For each value of \(x\), the posteriors of the two curves sum to 1.0.

Another Example
- Problem statement: a new medical test is used to detect whether a patient has a certain cancer or not. The test result is either + (positive) or - (negative).
- For a patient with this cancer, the probability of returning a positive test result is 0.98.
- For a patient without this cancer, the probability of returning a negative test result is 0.97.
- The probability for any person to have this cancer is 0.008.
- Question: if a positive test result is returned, does she/he have cancer?

Another Example (cont.)
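The worked solution on this slide did not survive extraction. A minimal sketch of the calculation, using only the three probabilities given in the problem statement, is shown here. The function name `posterior_positive` and the 0.5 decision cutoff (equivalent to picking the larger posterior in a two-class problem) are illustrative choices, not taken from the slides.

```python
def posterior_positive(prior, sensitivity, specificity):
    """Return P(cancer | +) via Bayes' theorem.

    prior       = P(cancer)
    sensitivity = P(+ | cancer)
    specificity = P(- | no cancer)
    """
    joint_cancer = sensitivity * prior                    # p(+ | cancer) P(cancer)
    joint_healthy = (1.0 - specificity) * (1.0 - prior)   # p(+ | no cancer) P(no cancer)
    evidence = joint_cancer + joint_healthy               # P(+)
    return joint_cancer / evidence


p_cancer = posterior_positive(prior=0.008, sensitivity=0.98, specificity=0.97)
decision = "cancer" if p_cancer > 0.5 else "no cancer"
print(round(p_cancer, 3), decision)  # 0.209 no cancer
```

Even after a positive result the posterior is only about 0.21, because the small prior (0.008) outweighs the high likelihood; the Bayes decision is therefore "no cancer".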

Feasibility of Bayes Formula
\[ P(\omega_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_i)\, P(\omega_i)}{p(\mathbf{x})}, \qquad \text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}} \]
- To compute the posterior probability, we need to know the prior probability and the likelihood.
- How do we know these probabilities?
- A simple solution: counting relative frequencies.
- An advanced solution: density estimation.

A Further Example
- Problem: based on the height of a car on some campus, decide whether it costs more than $50,000 or not.
- \(\omega_1\): price > $50,000; \(\omega_2\): price <= $50,000; \(x\): height of a car.
- Decide \(\omega_1\) if \(P(\omega_1 \mid x) > P(\omega_2 \mid x)\); otherwise decide \(\omega_2\).
- Quantities to know: \(P(\omega_1)\), \(P(\omega_2)\), \(p(x \mid \omega_1)\), \(p(x \mid \omega_2)\). How do we get them?

A Further Example (cont.)
- Collecting samples: suppose we have randomly picked 1209 cars on the campus, got prices from their owners, and measured their heights.
- Compute \(P(\omega_1)\) and \(P(\omega_2)\): the number of cars in \(\omega_1\) is 221 and the number of cars in \(\omega_2\) is 988, so
\[ P(\omega_1) = \frac{221}{1209} = 0.183, \qquad P(\omega_2) = \frac{988}{1209} = 0.817 \]

A Further Example (cont.)
- Compute \(p(x \mid \omega_1)\) and \(p(x \mid \omega_2)\): discretize the height spectrum (say [0.5 m, 2.5 m]) into 20 intervals, each of length 0.1 m, and count the number of cars falling into each interval for either class.
- Suppose \(x = 1.05\), which means that \(x\) falls into the interval \(I_x\) = [1.0 m, 1.1 m].
- For \(\omega_1\), the number of cars in \(I_x\) is 46; for \(\omega_2\), the number of cars in \(I_x\) is 59.
\[ p(x = 1.05 \mid \omega_1) = \frac{46}{221} = 0.2081, \qquad p(x = 1.05 \mid \omega_2) = \frac{59}{988} = 0.0597 \]

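A Further Example (cont.)
- Question: for a car with height 1.05 m, is its price greater than $50,000?
- Estimated quantities: \(P(\omega_1) = 221/1209 = 0.183\), \(P(\omega_2) = 988/1209 = 0.817\), \(p(x = 1.05 \mid \omega_1) = 46/221 = 0.2081\), \(p(x = 1.05 \mid \omega_2) = 59/988 = 0.0597\).
\[ \frac{P(\omega_2 \mid x = 1.05)}{P(\omega_1 \mid x = 1.05)} = \frac{p(x = 1.05 \mid \omega_2)\, P(\omega_2)}{p(x = 1.05 \mid \omega_1)\, P(\omega_1)} = \frac{0.817 \times 0.0597}{0.183 \times 0.2081} \approx 1.28 \]
- Hence \(P(\omega_1 \mid x = 1.05) < P(\omega_2 \mid x = 1.05)\), and we decide price <= $50,000.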
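The counting solution for the car example can be sketched in a few lines. The counts below are the ones quoted in the example (221 and 988 cars per class, 46 and 59 of them in the height interval containing 1.05 m); the dictionary keys and variable names are illustrative.

```python
# Counts from the car example (1209 sampled cars in total).
n_expensive, n_cheap = 221, 988   # class sizes for omega_1 (> $50,000) and omega_2
k_expensive, k_cheap = 46, 59     # cars per class whose height falls in [1.0 m, 1.1 m]
n_total = n_expensive + n_cheap

# Relative frequencies as estimates of priors and likelihoods.
prior = {"omega_1": n_expensive / n_total, "omega_2": n_cheap / n_total}
likelihood = {"omega_1": k_expensive / n_expensive, "omega_2": k_cheap / n_cheap}

# Bayes formula: posterior = likelihood * prior / evidence.
joint = {c: likelihood[c] * prior[c] for c in prior}
evidence = sum(joint.values())
posterior = {c: joint[c] / evidence for c in joint}

decision = max(posterior, key=posterior.get)
print({c: round(p, 3) for c, p in posterior.items()}, decision)
# {'omega_1': 0.438, 'omega_2': 0.562} omega_2
```

With pure counting the class sizes cancel, so the posterior for each class is simply its share of the 46 + 59 = 105 cars in the interval.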

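Is Bayes Decision Rule Optimal?
- Consider two categories. Decide \(\omega_1\) if \(P(\omega_1 \mid x) > P(\omega_2 \mid x)\); otherwise decide \(\omega_2\).
- When we observe \(x\), the probability of error is
\[ P(\text{error} \mid x) = \begin{cases} P(\omega_1 \mid x) & \text{if we decide } \omega_2 \\ P(\omega_2 \mid x) & \text{if we decide } \omega_1 \end{cases} \]
- Thus, under the Bayes decision rule, \(P(\text{error} \mid x) = \min[P(\omega_1 \mid x), P(\omega_2 \mid x)]\).
- For every \(x\), we ensure that \(P(\text{error} \mid x)\) is as small as possible.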

Generalized Bayes Decision Rule
- Allowing more than one feature: \(\mathbf{x} \in \mathbb{R}^d\), where \(\mathbb{R}^d\) is the d-dimensional Euclidean space.
- Allowing more than two states of nature: \(\Omega = \{\omega_1, \omega_2, \dots, \omega_c\}\), a set of \(c\) states of nature.
- Allowing actions other than merely deciding the state of nature: \(A = \{\alpha_1, \alpha_2, \dots, \alpha_a\}\), a set of \(a\) possible actions.
- Note that \(c\) and \(a\) need not be equal.

Generalized Bayes Decision Rule (cont.)
- Introducing a loss function more general than the probability of error: \(\lambda : \Omega \times A \to \mathbb{R}\).
- \(\lambda(\omega_j, \alpha_i)\) is the loss incurred for taking action \(\alpha_i\) when the state of nature is \(\omega_j\).
- For ease of reference, it is usually written as \(\lambda(\alpha_i \mid \omega_j)\).
- We want to minimize the expected loss (the risk) in making a decision.

Generalized Bayes Decision Rule (cont.)
- Problem: given a particular \(\mathbf{x}\), we have to decide which action to take.
- To do this, we need to know the loss of taking each action \(\alpha_i\) (\(1 \le i \le a\)): \(\lambda(\alpha_i \mid \omega_j)\), where \(\alpha_i\) is the action being taken and \(\omega_j\) is the true state of nature.
- However, the true state of nature is uncertain, so we work with the expected (average) loss.
- We want to minimize the expected loss (the risk) in making a decision.

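Generalized Bayes Decision Rule (cont.)
- Expected loss:
\[ R(\alpha_i \mid \mathbf{x}) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid \mathbf{x}) \]
- Given \(\mathbf{x}\), this is the expected loss (risk) associated with taking action \(\alpha_i\).
- \(\lambda(\alpha_i \mid \omega_j)\) is the loss incurred by taking action \(\alpha_i\) when the true state of nature is \(\omega_j\); \(P(\omega_j \mid \mathbf{x})\) is the probability of \(\omega_j\) being the true state of nature.
- The expected loss is also called the conditional risk.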

Generalized Bayes Decision Rule (cont.)
- Suppose we have a loss table \(\lambda(\alpha_i \mid \omega_j)\) [table not recovered].
- For a particular \(\mathbf{x}\): \(P(\omega_1 \mid \mathbf{x}) = 0.01\), \(P(\omega_2 \mid \mathbf{x}) = 0.99\).

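Generalized Bayes Decision Rule (cont.)
- 0/1 loss function, where \(\alpha_i\) is the correct decision associated with \(\omega_i\):
\[ \lambda(\alpha_i \mid \omega_j) = \begin{cases} 0 & i = j \\ 1 & \text{otherwise} \end{cases} \]
- The conditional risk then equals the probability of error:
\[ R(\alpha_i \mid \mathbf{x}) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid \mathbf{x}) = \sum_{j \ne i} P(\omega_j \mid \mathbf{x}) = 1 - P(\omega_i \mid \mathbf{x}) = P(\text{error} \mid \mathbf{x}) \]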

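Generalized Bayes Decision Rule (cont.)
- Bayes decision rule (general case):
\[ \alpha(\mathbf{x}) = \arg\min_{\alpha_i \in A} R(\alpha_i \mid \mathbf{x}) = \arg\min_{\alpha_i \in A} \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid \mathbf{x}) \]
- Overall risk of a decision function \(\alpha(\mathbf{x})\):
\[ R = \int R(\alpha(\mathbf{x}) \mid \mathbf{x})\, p(\mathbf{x})\, d\mathbf{x} \]
- For every \(\mathbf{x}\), we ensure that the conditional risk \(R(\alpha(\mathbf{x}) \mid \mathbf{x})\) is as small as possible; thus, the overall risk over all possible \(\mathbf{x}\) must be as small as possible.
- The Bayes decision rule is the optimal one for minimizing the overall risk. Its resulting overall risk is called the Bayes risk.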
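The conditional risk and the minimum-risk action can be sketched as follows. The posteriors are taken as 0.01 and 0.99; the loss table of the original example was not recovered, so both loss matrices below (a 0/1 loss and an asymmetric loss with the value 200) are hypothetical and serve only to illustrate the rule. Rows index actions and columns index states of nature.

```python
def conditional_risks(loss, posteriors):
    """R(alpha_i | x) = sum_j loss[i][j] * P(omega_j | x), for every action i."""
    return [sum(l * p for l, p in zip(row, posteriors)) for row in loss]


def bayes_action(loss, posteriors):
    """Return (index of the minimum-risk action, list of conditional risks)."""
    risks = conditional_risks(loss, posteriors)
    best = min(range(len(risks)), key=risks.__getitem__)
    return best, risks


posteriors = [0.01, 0.99]                        # P(omega_1 | x), P(omega_2 | x)

zero_one_loss = [[0.0, 1.0],                     # loss[i][j] = lambda(alpha_i | omega_j)
                 [1.0, 0.0]]
asymmetric_loss = [[0.0, 1.0],                   # hypothetical: taking alpha_2 when the
                   [200.0, 0.0]]                 # truth is omega_1 is very costly

action_01, risks_01 = bayes_action(zero_one_loss, posteriors)
action_asym, risks_asym = bayes_action(asymmetric_loss, posteriors)
print(action_01 + 1, action_asym + 1)  # 2 1
```

Under the 0/1 loss the rule picks the action matching the most probable state; under the hypothetical asymmetric loss it picks \(\alpha_1\) even though \(\omega_2\) has posterior 0.99, because the conditional risk of \(\alpha_2\) (about 2.0) exceeds that of \(\alpha_1\) (0.99).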

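General Case: Two-category
- \(\Omega = \{\omega_1, \omega_2\}\), \(A = \{\alpha_1, \alpha_2\}\).
- Loss function (rows: actions, columns: states of nature): \(\lambda_{ij} = \lambda(\alpha_i \mid \omega_j)\).
\[ R(\alpha_1 \mid \mathbf{x}) = \lambda_{11} P(\omega_1 \mid \mathbf{x}) + \lambda_{12} P(\omega_2 \mid \mathbf{x}) \]
\[ R(\alpha_2 \mid \mathbf{x}) = \lambda_{21} P(\omega_1 \mid \mathbf{x}) + \lambda_{22} P(\omega_2 \mid \mathbf{x}) \]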

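General Case: Two-category
- Perform \(\alpha_1\) if \(R(\alpha_2 \mid \mathbf{x}) > R(\alpha_1 \mid \mathbf{x})\); otherwise perform \(\alpha_2\). That is, perform \(\alpha_1\) if
\[ \lambda_{21} P(\omega_1 \mid \mathbf{x}) + \lambda_{22} P(\omega_2 \mid \mathbf{x}) > \lambda_{11} P(\omega_1 \mid \mathbf{x}) + \lambda_{12} P(\omega_2 \mid \mathbf{x}) \]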

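General Case: Two-category
- Perform \(\alpha_1\) if \(R(\alpha_2 \mid \mathbf{x}) > R(\alpha_1 \mid \mathbf{x})\); otherwise perform \(\alpha_2\). Rearranged:
\[ (\lambda_{21} - \lambda_{11})\, P(\omega_1 \mid \mathbf{x}) > (\lambda_{12} - \lambda_{22})\, P(\omega_2 \mid \mathbf{x}) \]
- Both factors \(\lambda_{21} - \lambda_{11}\) and \(\lambda_{12} - \lambda_{22}\) are positive, since an error costs more than a correct decision.
- The posterior probabilities are scaled before comparison.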

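General Case: Two-category
- Perform \(\alpha_1\) if \(R(\alpha_2 \mid \mathbf{x}) > R(\alpha_1 \mid \mathbf{x})\); otherwise perform \(\alpha_2\). Substituting Bayes formula for the posteriors:
\[ (\lambda_{21} - \lambda_{11})\, \frac{p(\mathbf{x} \mid \omega_1)\, P(\omega_1)}{p(\mathbf{x})} > (\lambda_{12} - \lambda_{22})\, \frac{p(\mathbf{x} \mid \omega_2)\, P(\omega_2)}{p(\mathbf{x})} \]
- The evidence \(p(\mathbf{x})\) is irrelevant and cancels on both sides.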

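General Case: Two-category
- Perform \(\alpha_1\) if the likelihood ratio exceeds a threshold that does not depend on \(\mathbf{x}\):
\[ \underbrace{\frac{p(\mathbf{x} \mid \omega_1)}{p(\mathbf{x} \mid \omega_2)}}_{\text{likelihood ratio}} > \underbrace{\frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)}}_{\text{threshold}} \]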
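The likelihood-ratio form of the two-category rule can be sketched as below. The likelihoods and priors reuse the counts from the car example; the loss values are hypothetical (a 0/1 loss and a variant in which \(\lambda_{21} = 2\)), and the function names are illustrative.

```python
def likelihood_ratio_threshold(loss, prior1, prior2):
    """Threshold (l12 - l22) / (l21 - l11) * P(omega_2) / P(omega_1)."""
    (l11, l12), (l21, l22) = loss
    if l21 - l11 <= 0 or l12 - l22 <= 0:
        raise ValueError("an error must cost more than a correct decision")
    return (l12 - l22) / (l21 - l11) * prior2 / prior1


def decide(lik1, lik2, loss, prior1, prior2):
    """Return 1 (perform alpha_1) if the likelihood ratio exceeds the threshold, else 2."""
    ratio = lik1 / lik2
    return 1 if ratio > likelihood_ratio_threshold(loss, prior1, prior2) else 2


# Estimates from the car example: x = 1.05 m.
lik1, lik2 = 46 / 221, 59 / 988
prior1, prior2 = 221 / 1209, 988 / 1209

zero_one = ((0.0, 1.0), (1.0, 0.0))
costly_miss = ((0.0, 1.0), (2.0, 0.0))   # hypothetical: lambda_21 = 2

print(decide(lik1, lik2, zero_one, prior1, prior2))     # 2
print(decide(lik1, lik2, costly_miss, prior1, prior2))  # 1
```

Here the likelihood ratio is about 3.49. Under the 0/1 loss the threshold is about 4.47, so \(\alpha_2\) is performed, which agrees with the posterior comparison in the car example; doubling \(\lambda_{21}\) halves the threshold to about 2.24 and flips the decision.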

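Discriminant Function
- Discriminant functions for the multicategory case: \(g_i : \mathbb{R}^d \to \mathbb{R}\) (\(1 \le i \le c\)), one function per category.
- [Diagram: the input \(\mathbf{x}\) feeds \(g_1(\mathbf{x}), g_2(\mathbf{x}), \dots, g_c(\mathbf{x})\), whose outputs determine the action (e.g., classification)]
- Assign \(\mathbf{x}\) to \(\omega_i\) if \(g_i(\mathbf{x}) > g_j(\mathbf{x})\) for all \(j \ne i\).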

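Discriminant Function
- Minimum-risk case: \(g_i(\mathbf{x}) = -R(\alpha_i \mid \mathbf{x})\).
- Minimum-error-rate case, three equivalent choices:
\[ g_i(\mathbf{x}) = P(\omega_i \mid \mathbf{x}), \qquad g_i(\mathbf{x}) = p(\mathbf{x} \mid \omega_i)\, P(\omega_i), \qquad g_i(\mathbf{x}) = \ln p(\mathbf{x} \mid \omega_i) + \ln P(\omega_i) \]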

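Discriminant Function
- Relationship between minimum risk and minimum error rate: under the 0/1 loss function, \(R(\alpha_i \mid \mathbf{x}) = 1 - P(\omega_i \mid \mathbf{x})\), so minimizing the conditional risk is the same as maximizing the posterior, and the minimum-risk rule reduces to the minimum-error-rate rule.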

Discriminant Function
- Various discriminant functions can give identical classification results.
- If \(f(\cdot)\) is a monotonically increasing function, then the \(f(g_i(\cdot))\) are also discriminant functions.
- Examples, for \(i = 1, \dots, c\):
\[ f(x) = kx \ (k > 0): \quad f(g_i(\mathbf{x})) = k\, g_i(\mathbf{x}) \]
\[ f(x) = \ln x: \quad f(g_i(\mathbf{x})) = \ln g_i(\mathbf{x}) \]

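Decision Regions
- \(c\) discriminant functions result in \(c\) decision regions:
\[ \mathcal{R}_i = \{\mathbf{x} \mid g_i(\mathbf{x}) > g_j(\mathbf{x}) \ \ \forall j \ne i\}, \quad \text{where } \mathcal{R}_i \cap \mathcal{R}_j = \emptyset \ (i \ne j) \ \text{and} \ \bigcup_{i=1}^{c} \mathcal{R}_i = \mathbb{R}^d \]
- Decision boundary: decision regions are separated by decision boundaries.
- [Figure: two-category example]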

The Normal Distribution
- Discrete random variable \(X\) (assume integer values):
  - Probability mass function (pmf): \(p(x) = P(X = x)\)
  - Cumulative distribution function (cdf): \(F(x) = P(X \le x) = \sum_{t = -\infty}^{x} p(t)\)
- Continuous random variable \(X\):
  - Probability density function (pdf): \(p(x)\) or \(f(x)\) (not a probability)
  - Cumulative distribution function (cdf): \(F(x) = P(X \le x) = \int_{-\infty}^{x} p(t)\, dt\)

Expectations
- The expectation (a.k.a. expected value, mean, or average) of a random variable \(X\):
\[ E[X] = \begin{cases} \sum_{x} x\, p(x) & X \text{ is discrete} \\ \int_{-\infty}^{\infty} x\, p(x)\, dx & X \text{ is continuous} \end{cases} \]
- The k-th moment: \(E[X^k]\)
- The 1st moment: \(\mu_X = E[X]\)
- The k-th central moment: \(E[(X - \mu_X)^k]\)

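Important Expectations
- Mean:
\[ \mu_X = E[X] = \begin{cases} \sum_{x} x\, p(x) & X \text{ is discrete} \\ \int_{-\infty}^{\infty} x\, p(x)\, dx & X \text{ is continuous} \end{cases} \]
- Variance:
\[ \mathrm{Var}[X] = E[(X - \mu_X)^2] = \begin{cases} \sum_{x} (x - \mu_X)^2\, p(x) & X \text{ is discrete} \\ \int_{-\infty}^{\infty} (x - \mu_X)^2\, p(x)\, dx & X \text{ is continuous} \end{cases} \]
- Notation: \(\mathrm{Var}[X] = \sigma_X^2\), where \(\sigma_X\) is the standard deviation.
- Fact: \(\mathrm{Var}[X] = E[X^2] - (E[X])^2\)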

Entropy
- The entropy measures the fundamental uncertainty in the value of points selected randomly from a distribution.
\[ H[X] = \begin{cases} -\sum_{x} p(x) \log p(x) & X \text{ is discrete} \\ -\int_{-\infty}^{\infty} p(x) \log p(x)\, dx & X \text{ is continuous} \end{cases} \]
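A minimal sketch of the discrete case is given here. The slide does not fix the base of the logarithm, so base 2 (entropy in bits) is an assumption, as is the convention of skipping zero-probability outcomes; the function name `entropy` is illustrative.

```python
import math


def entropy(pmf, base=2.0):
    """Entropy of a discrete distribution given as a list of probabilities.

    Outcomes with p = 0 contribute nothing (the limit of p log p as p -> 0 is 0).
    """
    return -sum(p * math.log(p, base) for p in pmf if p > 0)


h_fair = entropy([0.5, 0.5])          # maximal uncertainty for two outcomes
h_biased = entropy([0.9, 0.1])        # less uncertain
h_certain = entropy([1.0, 0.0])       # no uncertainty
print(round(h_fair, 3), round(h_biased, 3))  # 1.0 0.469
```

The more concentrated the distribution, the lower the entropy; a uniform distribution over \(n\) outcomes attains the maximum, \(\log n\).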

Univariate Gaussian Distribution
- Gaussian distribution, a.k.a. Gaussian density or normal density: \(X \sim N(\mu, \sigma^2)\)
\[ p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) \]
- \(E[X] = \mu\), \(\mathrm{Var}[X] = \sigma^2\)


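Random Vectors
- A d-dimensional random vector is \(\mathbf{X} = (X_1, X_2, \dots, X_d)^T\), with \(\mathbf{X} : \Omega \to \mathbb{R}^d\).
- Joint pdf: \(\mathbf{X} \sim p(\mathbf{x}) = p(x_1, x_2, \dots, x_d)\)
- Expected vector:
\[ \boldsymbol{\mu} = E[\mathbf{X}] = (E[X_1], E[X_2], \dots, E[X_d])^T = (\mu_1, \mu_2, \dots, \mu_d)^T \]
\[ \mu_i = E[X_i] = \int_{-\infty}^{\infty} x_i\, p(x_i)\, dx_i \]
  where \(p(x_i)\) is the marginal pdf on the i-th component.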

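Random Vectors
- Covariance matrix:
\[ \boldsymbol{\Sigma} = E[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T] = [\sigma_{ij}]_{d \times d} \]
\[ \sigma_{ij} = E[(X_i - \mu_i)(X_j - \mu_j)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x_i - \mu_i)(x_j - \mu_j)\, p(x_i, x_j)\, dx_i\, dx_j \]
  where \(p(x_i, x_j)\) is the marginal pdf on a pair of random variables \(X_i, X_j\).
- Properties: symmetric, positive semidefinite.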

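Multivariate Gaussian Distribution
- \(\mathbf{X}\) is a d-dimensional random vector, \(\mathbf{X} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})\):
\[ p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\, |\boldsymbol{\Sigma}|^{1/2}} \exp\!\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right) \]
- \(E[\mathbf{X}] = \boldsymbol{\mu}\), \(E[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T] = \boldsymbol{\Sigma}\)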

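Properties of \(N(\boldsymbol{\mu}, \boldsymbol{\Sigma})\)
- \(\mathbf{X}\) is a d-dimensional random vector, and \(\mathbf{X} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})\).
- If \(\mathbf{Y} = \mathbf{A}^T \mathbf{X}\), where \(\mathbf{A}\) is a \(d \times k\) matrix, then \(\mathbf{Y} \sim N(\mathbf{A}^T \boldsymbol{\mu}, \mathbf{A}^T \boldsymbol{\Sigma} \mathbf{A})\).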

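On the Covariance Matrix
- As mentioned before, \(\boldsymbol{\Sigma}\) is symmetric and positive semidefinite, so it can be decomposed as \(\boldsymbol{\Sigma} = \boldsymbol{\Phi} \boldsymbol{\Lambda} \boldsymbol{\Phi}^T\).
- \(\boldsymbol{\Phi}\): orthonormal matrix whose columns are the eigenvectors of \(\boldsymbol{\Sigma}\). \(\boldsymbol{\Lambda}\): diagonal matrix of the eigenvalues.
- Thus,
\[ \boldsymbol{\Sigma} = \boldsymbol{\Phi} \boldsymbol{\Lambda}^{1/2} \boldsymbol{\Lambda}^{1/2} \boldsymbol{\Phi}^T = (\boldsymbol{\Phi} \boldsymbol{\Lambda}^{1/2})(\boldsymbol{\Phi} \boldsymbol{\Lambda}^{1/2})^T \]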

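Mahalanobis Distance
- For \(\mathbf{X} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})\), the squared Mahalanobis distance from \(\mathbf{x}\) to \(\boldsymbol{\mu}\) is
\[ r^2 = (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \]
- The density can then be written as
\[ p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\, |\boldsymbol{\Sigma}|^{1/2}} \exp\!\left( -\frac{1}{2} r^2 \right) \]
  where the leading factor is constant, so \(p(\mathbf{x})\) depends only on the value of \(r^2\).
- P.C. Mahalanobis (1893-1972)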
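A small sketch of the squared Mahalanobis distance in two dimensions, written with the standard library only. The closed-form 2x2 inverse and the example covariance matrices are illustrative assumptions; for general \(d\) a linear-algebra library would normally be used instead.

```python
def mahalanobis_sq_2d(x, mu, cov):
    """Squared Mahalanobis distance (x - mu)^T cov^{-1} (x - mu) for 2-D inputs.

    cov is a 2x2 symmetric positive definite matrix given as nested tuples.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mu[0], x[1] - mu[1])
    return (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))


mu = (0.0, 0.0)
stretched = ((4.0, 0.0), (0.0, 1.0))      # hypothetical: variance 4 along the first axis
r2_far = mahalanobis_sq_2d((2.0, 0.0), mu, stretched)
r2_near = mahalanobis_sq_2d((0.0, 1.0), mu, stretched)
print(r2_far, r2_near)  # 1.0 1.0
```

The two points are at Euclidean distances 2 and 1 from the mean, yet both have \(r^2 = 1\), so they lie on the same density contour: the distance is measured in units of the spread along each direction.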

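Discriminant Functions for Gaussian Density
- Minimum-error-rate classification, for \(i = 1, \dots, c\):
\[ g_i(\mathbf{x}) = \ln p(\mathbf{x} \mid \omega_i) + \ln P(\omega_i), \qquad p(\mathbf{x} \mid \omega_i) = \frac{1}{(2\pi)^{d/2}\, |\boldsymbol{\Sigma}_i|^{1/2}} \exp\!\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \right) \]
\[ g_i(\mathbf{x}) = -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) - \frac{d}{2} \ln 2\pi - \frac{1}{2} \ln |\boldsymbol{\Sigma}_i| + \ln P(\omega_i) \]
- The term \(-\frac{d}{2} \ln 2\pi\) is a constant shared by all classes and can be ignored.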

Discriminant Functions for Gaussian Density
- Three cases:
- Case 1: \(\boldsymbol{\Sigma}_i = \sigma^2 \mathbf{I}\). Classes are centered at different means, and their feature components are pairwise independent and have the same variance.
- Case 2: \(\boldsymbol{\Sigma}_i = \boldsymbol{\Sigma}\). Classes are centered at different means, but have the same covariance.
- Case 3: \(\boldsymbol{\Sigma}_i \ne \boldsymbol{\Sigma}_j\). Arbitrary covariance matrices.

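Case 1: \(\boldsymbol{\Sigma}_i = \sigma^2 \mathbf{I}\)
- Start from \(g_i(\mathbf{x}) = -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) - \frac{d}{2} \ln 2\pi - \frac{1}{2} \ln |\boldsymbol{\Sigma}_i| + \ln P(\omega_i)\). The terms \(-\frac{d}{2} \ln 2\pi\) and \(-\frac{1}{2} \ln |\boldsymbol{\Sigma}_i|\) are the same for all classes and therefore irrelevant:
\[ g_i(\mathbf{x}) = -\frac{\|\mathbf{x} - \boldsymbol{\mu}_i\|^2}{2\sigma^2} + \ln P(\omega_i) \]
- Expanding the squared norm:
\[ g_i(\mathbf{x}) = -\frac{1}{2\sigma^2} \left( \mathbf{x}^T \mathbf{x} - 2 \boldsymbol{\mu}_i^T \mathbf{x} + \boldsymbol{\mu}_i^T \boldsymbol{\mu}_i \right) + \ln P(\omega_i) \]
  where \(\mathbf{x}^T \mathbf{x}\) is the same for all classes and therefore irrelevant.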

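Case 1: \(\boldsymbol{\Sigma}_i = \sigma^2 \mathbf{I}\)
\[ g_i(\mathbf{x}) = \frac{1}{\sigma^2} \boldsymbol{\mu}_i^T \mathbf{x} - \frac{1}{2\sigma^2} \boldsymbol{\mu}_i^T \boldsymbol{\mu}_i + \ln P(\omega_i) \]
- It is a linear discriminant function: \(g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}\), where
  - weight vector: \(\mathbf{w}_i = \frac{1}{\sigma^2} \boldsymbol{\mu}_i\)
  - threshold/bias: \(w_{i0} = -\frac{1}{2\sigma^2} \boldsymbol{\mu}_i^T \boldsymbol{\mu}_i + \ln P(\omega_i)\)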

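Case 1: \(\boldsymbol{\Sigma}_i = \sigma^2 \mathbf{I}\)
- With \(g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}\), \(\mathbf{w}_i = \frac{1}{\sigma^2} \boldsymbol{\mu}_i\), and \(w_{i0} = -\frac{1}{2\sigma^2} \boldsymbol{\mu}_i^T \boldsymbol{\mu}_i + \ln P(\omega_i)\), the boundary between \(\omega_i\) and \(\omega_j\) is \(g_i(\mathbf{x}) = g_j(\mathbf{x})\):
\[ (\mathbf{w}_i - \mathbf{w}_j)^T \mathbf{x} + (w_{i0} - w_{j0}) = 0 \]
\[ (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)^T \mathbf{x} - \frac{1}{2} \left( \boldsymbol{\mu}_i^T \boldsymbol{\mu}_i - \boldsymbol{\mu}_j^T \boldsymbol{\mu}_j \right) + \sigma^2 \ln \frac{P(\omega_i)}{P(\omega_j)} = 0 \]
- Equivalently, \(\mathbf{w}^T (\mathbf{x} - \mathbf{x}_0) = 0\) with
\[ \mathbf{w} = \boldsymbol{\mu}_i - \boldsymbol{\mu}_j, \qquad \mathbf{x}_0 = \frac{1}{2} (\boldsymbol{\mu}_i + \boldsymbol{\mu}_j) - \frac{\sigma^2}{\|\boldsymbol{\mu}_i - \boldsymbol{\mu}_j\|^2} \ln \frac{P(\omega_i)}{P(\omega_j)}\, (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j) \]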

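Case 1: \(\boldsymbol{\Sigma}_i = \sigma^2 \mathbf{I}\)
- Boundary between \(\omega_i\) and \(\omega_j\): \(g_i(\mathbf{x}) = g_j(\mathbf{x})\), i.e. \(\mathbf{w}^T (\mathbf{x} - \mathbf{x}_0) = 0\) with
\[ \mathbf{w} = \boldsymbol{\mu}_i - \boldsymbol{\mu}_j, \qquad \mathbf{x}_0 = \frac{1}{2} (\boldsymbol{\mu}_i + \boldsymbol{\mu}_j) - \frac{\sigma^2}{\|\boldsymbol{\mu}_i - \boldsymbol{\mu}_j\|^2} \ln \frac{P(\omega_i)}{P(\omega_j)}\, (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j) \]
- The decision boundary is a hyperplane perpendicular to the line between the means, passing through \(\mathbf{x}_0\) somewhere on that line.
- If \(P(\omega_i) = P(\omega_j)\), the logarithm vanishes and \(\mathbf{x}_0\) is the midpoint of the two means.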
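The Case 1 discriminant and its boundary point can be sketched with plain Python lists. The means, variance, and priors below are hypothetical values chosen for illustration, and the function names are not from the slides.

```python
import math


def linear_discriminant(x, mu, sigma2, prior):
    """g_i(x) = w_i^T x + w_i0 with w_i = mu / sigma^2 and
    w_i0 = -mu^T mu / (2 sigma^2) + ln P(omega_i)."""
    w = [m / sigma2 for m in mu]
    w0 = -sum(m * m for m in mu) / (2.0 * sigma2) + math.log(prior)
    return sum(wi * xi for wi, xi in zip(w, x)) + w0


def boundary_point(mu_i, mu_j, sigma2, prior_i, prior_j):
    """x_0 = (mu_i + mu_j)/2 - sigma^2 / ||mu_i - mu_j||^2 * ln(P_i / P_j) * (mu_i - mu_j)."""
    diff = [a - b for a, b in zip(mu_i, mu_j)]
    norm2 = sum(d * d for d in diff)
    shift = sigma2 / norm2 * math.log(prior_i / prior_j)
    return [0.5 * (a + b) - shift * d for a, b, d in zip(mu_i, mu_j, diff)]


# Hypothetical two-class problem in 2-D.
mu_i, mu_j, sigma2 = [0.0, 0.0], [4.0, 0.0], 1.0

x0_equal = boundary_point(mu_i, mu_j, sigma2, 0.5, 0.5)      # midpoint of the means
x0_skewed = boundary_point(mu_i, mu_j, sigma2, 0.8, 0.2)     # shifted toward mu_j

gap = (linear_discriminant(x0_skewed, mu_i, sigma2, 0.8)
       - linear_discriminant(x0_skewed, mu_j, sigma2, 0.2))  # ~0 on the boundary
print(x0_equal, [round(v, 3) for v in x0_skewed])
```

With unequal priors the boundary moves away from the more probable class mean, which enlarges that class's decision region; the two discriminants still agree at \(\mathbf{x}_0\), as the near-zero `gap` confirms.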

Case 1: \(\boldsymbol{\Sigma}_i = \sigma^2 \mathbf{I}\)
- With equal priors, the rule assigns \(\mathbf{x}\) to the class with the nearest mean: a minimum-distance classifier (template matching).
- [Figures: example decision boundaries for Case 1, not recovered]

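Case 2: \(\boldsymbol{\Sigma}_i = \boldsymbol{\Sigma}\)
- Start from \(g_i(\mathbf{x}) = -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) - \frac{d}{2} \ln 2\pi - \frac{1}{2} \ln |\boldsymbol{\Sigma}_i| + \ln P(\omega_i)\). The terms \(-\frac{d}{2} \ln 2\pi\) and \(-\frac{1}{2} \ln |\boldsymbol{\Sigma}|\) are irrelevant:
\[ g_i(\mathbf{x}) = -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) + \ln P(\omega_i) \]
- The quadratic form is the squared Mahalanobis distance. The term \(\ln P(\omega_i)\) is irrelevant if \(P(\omega_i) = P(\omega_j)\) for all \(i, j\).
- Expanding and dropping \(\mathbf{x}^T \boldsymbol{\Sigma}^{-1} \mathbf{x}\), which is the same for all classes, gives a linear discriminant \(g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}\) with
\[ \mathbf{w}_i = \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_i, \qquad w_{i0} = -\frac{1}{2} \boldsymbol{\mu}_i^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_i + \ln P(\omega_i) \]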

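Case 2: \(\boldsymbol{\Sigma}_i = \boldsymbol{\Sigma}\)
- With \(g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}\), \(\mathbf{w}_i = \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_i\), and \(w_{i0} = -\frac{1}{2} \boldsymbol{\mu}_i^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_i + \ln P(\omega_i)\), the boundary \(g_i(\mathbf{x}) = g_j(\mathbf{x})\) is \(\mathbf{w}^T (\mathbf{x} - \mathbf{x}_0) = 0\) with
\[ \mathbf{w} = \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j), \qquad \mathbf{x}_0 = \frac{1}{2} (\boldsymbol{\mu}_i + \boldsymbol{\mu}_j) - \frac{\ln [P(\omega_i) / P(\omega_j)]}{(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)^T \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)}\, (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j) \]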

Case 2: \(\boldsymbol{\Sigma}_i = \boldsymbol{\Sigma}\)
- [Figures: example decision boundaries for Case 2, not recovered]

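Case 3: \(\boldsymbol{\Sigma}_i \ne \boldsymbol{\Sigma}_j\)
- Start from \(g_i(\mathbf{x}) = -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) - \frac{d}{2} \ln 2\pi - \frac{1}{2} \ln |\boldsymbol{\Sigma}_i| + \ln P(\omega_i)\). Only \(-\frac{d}{2} \ln 2\pi\) is irrelevant:
\[ g_i(\mathbf{x}) = -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) - \frac{1}{2} \ln |\boldsymbol{\Sigma}_i| + \ln P(\omega_i) \]
- This is a quadratic discriminant: \(g_i(\mathbf{x}) = \mathbf{x}^T \mathbf{W}_i \mathbf{x} + \mathbf{w}_i^T \mathbf{x} + w_{i0}\), with
\[ \mathbf{W}_i = -\frac{1}{2} \boldsymbol{\Sigma}_i^{-1}, \qquad \mathbf{w}_i = \boldsymbol{\Sigma}_i^{-1} \boldsymbol{\mu}_i, \qquad w_{i0} = -\frac{1}{2} \boldsymbol{\mu}_i^T \boldsymbol{\Sigma}_i^{-1} \boldsymbol{\mu}_i - \frac{1}{2} \ln |\boldsymbol{\Sigma}_i| + \ln P(\omega_i) \]
- Cases 1 and 2 are the same form without the term \(\mathbf{x}^T \mathbf{W}_i \mathbf{x}\).
- Decision surfaces are hyperquadrics, e.g., hyperplanes, hyperspheres, hyperellipsoids, and hyperhyperboloids.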

Case 3: \(\boldsymbol{\Sigma}_i \ne \boldsymbol{\Sigma}_j\)
- Non-simply connected decision regions can arise in one dimension for Gaussians having unequal variance.
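A one-dimensional sketch illustrates how unequal variances split a decision region. The two class parameters below (equal means, standard deviations 1 and 4, equal priors) are hypothetical; the discriminant is the Case 3 form specialized to \(d = 1\), with the shared constant \(-\frac{1}{2} \ln 2\pi\) dropped.

```python
import math


def g_1d(x, mu, sigma, prior):
    """1-D quadratic discriminant: -(x - mu)^2 / (2 sigma^2) - ln(sigma) + ln(prior)."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) + math.log(prior)


def classify(x, params):
    """Return the 1-based index of the class with the largest discriminant."""
    scores = [g_1d(x, mu, sigma, prior) for mu, sigma, prior in params]
    return scores.index(max(scores)) + 1


# Hypothetical classes: (mean, standard deviation, prior).
params = [(0.0, 1.0, 0.5),   # omega_1: narrow Gaussian
          (0.0, 4.0, 0.5)]   # omega_2: wide Gaussian with the same mean

labels = [classify(x, params) for x in (-3.0, 0.0, 3.0)]
print(labels)  # [2, 1, 2]
```

The narrow class wins only near the shared mean (roughly \(|x| < 1.72\) for these values), so the region assigned to \(\omega_2\) consists of two disjoint intervals on either side.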

Case 3: \(\boldsymbol{\Sigma}_i \ne \boldsymbol{\Sigma}_j\)
- [Figures: example decision regions for Case 3, not recovered]

Summary
- Bayesian decision theory
  - Basic concepts
  - Bayes' theorem
  - Bayes decision rule
- Feasibility of the Bayes decision rule
  - Prior probability + likelihood
  - Solution I: counting relative frequencies
  - Solution II: density estimation

Summary
- Bayes decision rule: the general scenario
  - Allowing more than one feature
  - Allowing more than two states of nature
  - Allowing actions other than merely deciding the state of nature
  - Loss function
  - Expected loss (conditional risk)
  - General Bayes decision rule
  - Minimum-error-rate classification
  - Discriminant functions
- Gaussian density
  - Discriminant functions for the Gaussian pdf

k-means

Any questions?