MIMA Group. Chapter 2 Bayesian Decision Theory. School of Computer Science and Technology, Shandong University. Xin-Shun SDU


Dominick Singleton
5 years ago

1 MIMA Group. Chapter 2: Bayesian Decision Theory. Xin-Shun Xu @ SDU, School of Computer Science and Technology, Shandong University

2 Bayesian Decision Theory. Bayesian decision theory is a statistical approach to data mining and pattern recognition. It provides a mathematical foundation for decision making: it uses a probabilistic approach to make decisions that minimize the risk (cost).

3 Bayesian Decision Theory: Basic Assumptions. The decision problem is posed (formalized) in probabilistic terms. All the relevant probability values are known. Key principle: Bayes' theorem.

4 Preliminaries and Notations. \(\omega \in \{\omega_1, \omega_2, \ldots, \omega_c\}\): a state of nature. \(P(\omega_i)\): prior probability. \(\mathbf{x}\): feature vector. \(p(\mathbf{x})\): evidence probability. \(p(\mathbf{x} \mid \omega_i)\): class-conditional density (likelihood). \(P(\omega_i \mid \mathbf{x})\): posterior probability.

5 Decision Before Observation. The problem: make a decision when only the prior probability is known and no observation is allowed. Naive decision rule: decide \(\omega_1\) if \(P(\omega_1) > P(\omega_2)\); otherwise decide \(\omega_2\). This is the best we can do without observation. Fixed prior probabilities lead to the same decision every time.


6 Bayes' Theorem. \(P(\omega_j \mid x) = \dfrac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)}\), where \(p(x) = \sum_{j=1}^{c} p(x \mid \omega_j)\, P(\omega_j)\). Thomas Bayes (1702-1761).

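7 Decision After Observation. \(P(\omega_j \mid x) = \dfrac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)}\). Decision: \(D(x) = \arg\max_{j} P(\omega_j \mid x) = \arg\max_{j} p(x \mid \omega_j)\, P(\omega_j)\). The evidence \(p(x)\) is the same for every class, so it is unimportant in making the decision.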


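8 Decision After Observation. Bayes formula: \(P(\omega_j \mid x) = \dfrac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)}\). Known: the prior probability \(P(\omega_j)\), the class-conditional pdf \(p(x \mid \omega_j)\), and the observation \(x\). Unknown: the posterior probability \(P(\omega_j \mid x)\).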

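9 Special Cases. \(P(\omega_j \mid x) = \dfrac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)}\), that is, posterior = likelihood × prior / evidence. Case I: equal prior probability, \(P(\omega_1) = P(\omega_2) = \cdots = P(\omega_c) = 1/c\); the decision depends only on the likelihood \(p(x \mid \omega_j)\). Case II: equal likelihood, \(p(x \mid \omega_1) = p(x \mid \omega_2) = \cdots = p(x \mid \omega_c)\); the rule degenerates to the naive decision rule. Normally, the prior probability and the likelihood function act together in the Bayesian decision process.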

10 An Example. \(\omega_1\): sea bass; \(\omega_2\): salmon; \(P(\omega_1) = 2/3\), \(P(\omega_2) = 1/3\). Given the class-conditional pdf for lightness, what will the posterior probability for either type of fish look like? Decide \(\omega_1\) if \(p(x \mid \omega_1)\, P(\omega_1) > p(x \mid \omega_2)\, P(\omega_2)\); otherwise decide \(\omega_2\).


11 An Example (cont.). [Figure: posterior probability for either type of fish, with the decision regions marked along the horizontal axis.] Horizontal axis: lightness of fish scales. Vertical axis: posterior probability for each type of fish. Black curve: sea bass. Red curve: salmon. For each value of \(x\), the higher curve yields the output of the Bayesian decision. For each value of \(x\), the posteriors of the two curves sum to 1.0.
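The posterior curves described above can be sketched numerically. The priors 2/3 and 1/3 follow the fish example; the Gaussian shape of the class-conditional densities and their means and standard deviations are assumptions made only for illustration, and the names `PRIORS`, `LIKELIHOOD_PARAMS`, `posteriors` and `decide` are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Univariate normal density evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

# Priors as in the fish example.
PRIORS = {"sea bass": 2.0 / 3.0, "salmon": 1.0 / 3.0}
# (mean, std) of lightness per class: assumed values, not taken from the slides.
LIKELIHOOD_PARAMS = {"sea bass": (11.0, 1.5), "salmon": (13.0, 1.0)}

def posteriors(x):
    """Return P(class | x) for every class using Bayes' formula."""
    joint = {c: gaussian_pdf(x, *LIKELIHOOD_PARAMS[c]) * PRIORS[c] for c in PRIORS}
    evidence = sum(joint.values())  # p(x), identical for all classes
    return {c: v / evidence for c, v in joint.items()}

def decide(x):
    """Bayes decision: the class whose posterior curve is higher at x."""
    post = posteriors(x)
    return max(post, key=post.get)

for lightness in (10.0, 12.0, 14.0):
    print(lightness, posteriors(lightness), decide(lightness))
```

Under these assumed densities, the two posteriors sum to one at every lightness value, and the decision switches where the curves cross.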

12 Another Example. Problem statement: a new medical test is used to detect whether a patient has a certain cancer or not. The test result is either + (positive) or - (negative). For a patient with this cancer, the probability of returning a positive test result is 0.98. For a patient without this cancer, the probability of returning a negative test result is 0.97. The probability for any person to have this cancer is [...]. Question: if a positive test result is returned, does she/he have cancer?

13 Another Example (cont.)
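A minimal sketch of how the question in the medical-test example can be answered with Bayes' formula. The rates 0.98 and 0.97 come from the problem statement; the prior is not recoverable from the slides, so `ASSUMED_PRIOR = 0.008` is purely an illustrative assumption, and the function name `posterior_positive` is hypothetical.

```python
def posterior_positive(prior, p_pos_given_cancer, p_neg_given_healthy):
    """Return P(cancer | +) via Bayes' formula."""
    p_pos_given_healthy = 1.0 - p_neg_given_healthy
    # Evidence: total probability of observing a positive result.
    evidence = p_pos_given_cancer * prior + p_pos_given_healthy * (1.0 - prior)
    return p_pos_given_cancer * prior / evidence

ASSUMED_PRIOR = 0.008  # assumption for illustration only, not from the slides

p_cancer = posterior_positive(ASSUMED_PRIOR, 0.98, 0.97)
print("P(cancer | +) =", round(p_cancer, 4))
print("decision:", "cancer" if p_cancer > 0.5 else "no cancer")
```

With a prior this small, the posterior stays at roughly 0.21, so the Bayes decision is "no cancer" even after a positive result; a different prior could change that conclusion.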

14 Feasibility of Bayes Formula. \(P(\omega_j \mid x) = \dfrac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)}\) (posterior = likelihood × prior / evidence). To compute the posterior probability, we need to know the prior probability and the likelihood. How do we know these probabilities? A simple solution: counting relative frequencies. An advanced solution: conduct density estimation.

15 A Further Example. Problem: based on the height of a car on some campus, decide whether it costs more than $50,000 or not. \(\omega_1\): price > $50,000. \(\omega_2\): price ≤ $50,000. \(x\): height of a car. Decide \(\omega_1\) if \(P(\omega_1 \mid x) > P(\omega_2 \mid x)\); otherwise decide \(\omega_2\). Quantities to know: \(P(\omega_1)\), \(P(\omega_2)\), \(p(x \mid \omega_1)\), \(p(x \mid \omega_2)\). How to get them?

16 A Further Example (cont.). Collecting samples: suppose we have randomly picked 09 cars on the campus, got the prices from their owners, and measured their heights. Compute \(P(\omega_1)\) and \(P(\omega_2)\) as relative frequencies, from the number of cars in \(\omega_1\) and the number of cars in \(\omega_2\).


17 A Further Example (cont.). Compute \(p(x \mid \omega_1)\) and \(p(x \mid \omega_2)\): discretize the height spectrum (say [0.5m,.5m]) into 0 intervals each with length 0.m, and then count the number of cars falling into each interval for either class. Suppose \(x\) =.05, which means that \(x\) falls into the interval \(I_x\) = [.0m,.m]. For \(\omega_1\), the number of cars in \(I_x\) is 46. For \(\omega_2\), the number of cars in \(I_x\) is 59.

18 A Further Example (cont.). Question: for a car with height.05m, is its price greater than $50,000? The ratio \(P(\omega_1 \mid x) / P(\omega_2 \mid x)\) is below 1, so decide \(\omega_2\): price ≤ $50,000.
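The counting procedure of this example can be sketched as follows. The in-interval counts 46 and 59 are the ones quoted in the example; the per-class totals in `CLASS_TOTALS` are hypothetical placeholders (the original figures are not recoverable), and the function name `posterior_from_counts` is an assumption.

```python
def posterior_from_counts(in_interval_counts, class_totals):
    """Estimate P(class | x in interval) from raw sample counts."""
    n = sum(class_totals.values())
    joint = {}
    for label, total in class_totals.items():
        prior = total / n                               # relative frequency
        likelihood = in_interval_counts[label] / total  # histogram estimate
        joint[label] = prior * likelihood
    evidence = sum(joint.values())
    return {label: v / evidence for label, v in joint.items()}

# Counts of cars whose height falls in the interval containing x (from the example).
IN_INTERVAL = {"price > 50k": 46, "price ≤ 50k": 59}
# Class totals: hypothetical values, used only to make the sketch runnable.
CLASS_TOTALS = {"price > 50k": 200, "price ≤ 50k": 1000}

post = posterior_from_counts(IN_INTERVAL, CLASS_TOTALS)
decision = max(post, key=post.get)
print(post, decision)
```

Because the prior and the likelihood are estimated from the same sample, the class totals cancel and the posterior reduces to the share of each class among the cars in the interval (46 versus 59 here), so the decision does not depend on the placeholder totals.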

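19 Is Bayes Decision Rule Optimal? Consider two categories. Decide \(\omega_1\) if \(P(\omega_1 \mid x) > P(\omega_2 \mid x)\); otherwise decide \(\omega_2\). When we observe \(x\), the probability of error is \(P(\text{error} \mid x) = P(\omega_1 \mid x)\) if we decide \(\omega_2\), and \(P(\text{error} \mid x) = P(\omega_2 \mid x)\) if we decide \(\omega_1\). Thus, under the Bayes decision rule, \(P(\text{error} \mid x) = \min[P(\omega_1 \mid x), P(\omega_2 \mid x)]\). For every \(x\), we ensure that \(P(\text{error} \mid x)\) is as small as possible.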


21 Generalized Bayes Decision Rule. Allowing more than one feature: \(\mathbf{x} \in \mathbb{R}^d\), where \(\mathbb{R}^d\) is the d-dimensional Euclidean space. Allowing more than two states of nature: \(\Omega = \{\omega_1, \omega_2, \ldots, \omega_c\}\), a set of \(c\) states of nature. Allowing actions other than merely deciding the state of nature: \(\mathcal{A} = \{\alpha_1, \alpha_2, \ldots, \alpha_a\}\), a set of \(a\) possible actions. Note that \(c\) and \(a\) need not be equal.

22 Generalized Bayes Decision Rule (cont.). Introducing a loss function more general than the probability of error: \(\lambda: \Omega \times \mathcal{A} \to \mathbb{R}\) (loss function), where \(\lambda(\omega_j, \alpha_i)\) is the loss incurred for taking action \(\alpha_i\) when the state of nature is \(\omega_j\). For ease of reference, it is usually written as \(\lambda(\alpha_i \mid \omega_j)\). We want to minimize the expected loss (the risk) in making a decision.


24 Generalized Bayes Decision Rule (cont.). Problem: given a particular \(\mathbf{x}\), we have to decide which action to take. To do this, we need to know the loss of taking each action \(\alpha_i\) (\(1 \le i \le a\)): \(\lambda(\alpha_i \mid \omega_j)\), where \(\alpha_i\) is the action being taken and \(\omega_j\) is the true state of nature. However, the true state of nature is uncertain, so we work with the expected (average) loss. We want to minimize the expected loss (the risk) in making a decision.

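25 Generalized Bayes Decision Rule (cont.). Expected loss: \(R(\alpha_i \mid \mathbf{x}) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid \mathbf{x})\). Given \(\mathbf{x}\), this is the expected loss (risk) associated with taking action \(\alpha_i\). \(\lambda(\alpha_i \mid \omega_j)\) is the loss incurred by taking action \(\alpha_i\) when the true state of nature is \(\omega_j\). \(P(\omega_j \mid \mathbf{x})\) is the probability of \(\omega_j\) being the true state of nature. The expected loss is also called the conditional risk.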

26 Generalized Bayes Decision Rule (cont.). Suppose we have, for a particular \(\mathbf{x}\): \(P(\omega_1 \mid \mathbf{x}) = 0.01\) and \(P(\omega_2 \mid \mathbf{x}) = 0.99\).
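A minimal sketch of how the conditional risk selects an action, assuming the posteriors are 0.01 and 0.99. The loss table on the slide is not recoverable, so the matrix `LOSS` below is a hypothetical example (rows are actions, columns are states of nature), as are the function names.

```python
def conditional_risks(loss, posterior):
    """R(alpha_i | x) = sum over j of loss[i][j] * P(omega_j | x)."""
    return [sum(l_ij * p_j for l_ij, p_j in zip(row, posterior)) for row in loss]

def bayes_action(loss, posterior):
    """Return (index of the minimum-risk action, list of all conditional risks)."""
    risks = conditional_risks(loss, posterior)
    return min(range(len(risks)), key=risks.__getitem__), risks

POSTERIOR = [0.01, 0.99]  # P(omega_1 | x), P(omega_2 | x)
# Hypothetical losses: taking action 2 when omega_1 is true is very costly.
LOSS = [[0.0, 1.0],
        [1000.0, 0.0]]

action, risks = bayes_action(LOSS, POSTERIOR)
print("risks:", risks, "-> perform alpha_%d" % (action + 1))
```

With this assumed loss matrix the risks are 0.99 and 10, so the rule performs \(\alpha_1\) even though \(\omega_2\) is far more probable, which is the kind of behaviour a loss function is meant to allow.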

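27 Generalized Bayes Decision Rule (cont.). 0/1 loss function: \(\lambda(\alpha_i \mid \omega_j) = 0\) if \(i = j\) (\(\alpha_i\) is the correct decision associated with \(\omega_j\)), and \(1\) otherwise. Then \(R(\alpha_i \mid \mathbf{x}) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid \mathbf{x}) = \sum_{j \neq i} P(\omega_j \mid \mathbf{x}) = 1 - P(\omega_i \mid \mathbf{x}) = P(\text{error} \mid \mathbf{x})\).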

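28 Generalized Bayes Decision Rule (cont.). Bayes decision rule (general case): \(\alpha(\mathbf{x}) = \arg\min_{\alpha_i \in \mathcal{A}} R(\alpha_i \mid \mathbf{x}) = \arg\min_{\alpha_i \in \mathcal{A}} \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid \mathbf{x})\). Overall risk: \(R = \int R(\alpha(\mathbf{x}) \mid \mathbf{x})\, p(\mathbf{x})\, d\mathbf{x}\), where \(\alpha(\mathbf{x})\) is the decision function. For every \(\mathbf{x}\), we ensure that the conditional risk \(R(\alpha(\mathbf{x}) \mid \mathbf{x})\) is as small as possible; thus, the overall risk over all possible \(\mathbf{x}\) must be as small as possible. The Bayes decision rule is the optimal one for minimizing the overall risk. Its resulting overall risk is called the Bayes risk.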

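29 General Case: Two-category. \(\Omega = \{\omega_1, \omega_2\}\), \(\mathcal{A} = \{\alpha_1, \alpha_2\}\). Loss function (action versus state of nature): \(\lambda_{ij} = \lambda(\alpha_i \mid \omega_j)\), giving \(\lambda_{11}, \lambda_{12}, \lambda_{21}, \lambda_{22}\). Conditional risks: \(R(\alpha_1 \mid \mathbf{x}) = \lambda_{11} P(\omega_1 \mid \mathbf{x}) + \lambda_{12} P(\omega_2 \mid \mathbf{x})\) and \(R(\alpha_2 \mid \mathbf{x}) = \lambda_{21} P(\omega_1 \mid \mathbf{x}) + \lambda_{22} P(\omega_2 \mid \mathbf{x})\).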

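30 General Case: Two-category. Perform \(\alpha_1\) if \(R(\alpha_2 \mid \mathbf{x}) > R(\alpha_1 \mid \mathbf{x})\); otherwise perform \(\alpha_2\). That is, perform \(\alpha_1\) if \(\lambda_{21} P(\omega_1 \mid \mathbf{x}) + \lambda_{22} P(\omega_2 \mid \mathbf{x}) > \lambda_{11} P(\omega_1 \mid \mathbf{x}) + \lambda_{12} P(\omega_2 \mid \mathbf{x})\).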

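31 General Case: Two-category. Perform \(\alpha_1\) if \(R(\alpha_2 \mid \mathbf{x}) > R(\alpha_1 \mid \mathbf{x})\); otherwise perform \(\alpha_2\). Equivalently, perform \(\alpha_1\) if \((\lambda_{21} - \lambda_{11})\, P(\omega_1 \mid \mathbf{x}) > (\lambda_{12} - \lambda_{22})\, P(\omega_2 \mid \mathbf{x})\), where both factors \(\lambda_{21} - \lambda_{11}\) and \(\lambda_{12} - \lambda_{22}\) are positive. The posterior probabilities are scaled before comparison.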

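32 General Case: Two-category. Perform \(\alpha_1\) if \(R(\alpha_2 \mid \mathbf{x}) > R(\alpha_1 \mid \mathbf{x})\); otherwise perform \(\alpha_2\). Using Bayes' formula and dropping the common evidence \(p(\mathbf{x})\): perform \(\alpha_1\) if \((\lambda_{21} - \lambda_{11})\, p(\mathbf{x} \mid \omega_1)\, P(\omega_1) > (\lambda_{12} - \lambda_{22})\, p(\mathbf{x} \mid \omega_2)\, P(\omega_2)\).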

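33 General Case: Two-category. Likelihood ratio versus threshold: perform \(\alpha_1\) if \(\dfrac{p(\mathbf{x} \mid \omega_1)}{p(\mathbf{x} \mid \omega_2)} > \dfrac{(\lambda_{12} - \lambda_{22})\, P(\omega_2)}{(\lambda_{21} - \lambda_{11})\, P(\omega_1)}\), where \(\lambda_{ij} = \lambda(\alpha_i \mid \omega_j)\).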
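The likelihood-ratio test can be sketched as below. The univariate Gaussian class-conditional densities, their parameters, and both loss matrices are assumptions chosen for illustration; `loss[i][j]` stands for the loss of action \(\alpha_{i+1}\) when the state of nature is \(\omega_{j+1}\).

```python
import math

def gaussian_pdf(x, mean, std):
    """Univariate normal density evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def likelihood_ratio_decision(x, pdf1, pdf2, prior1, prior2, loss):
    """Return 1 to perform alpha_1, or 2 to perform alpha_2."""
    threshold = ((loss[0][1] - loss[1][1]) * prior2) / ((loss[1][0] - loss[0][0]) * prior1)
    return 1 if pdf1(x) / pdf2(x) > threshold else 2

def pdf1(x):
    return gaussian_pdf(x, 0.0, 1.0)  # assumed p(x | omega_1)

def pdf2(x):
    return gaussian_pdf(x, 2.0, 1.0)  # assumed p(x | omega_2)

ZERO_ONE = [[0.0, 1.0], [1.0, 0.0]]        # threshold becomes P(omega_2) / P(omega_1)
COSTLY_ALPHA1 = [[0.0, 10.0], [1.0, 0.0]]  # wrongly performing alpha_1 costs 10

for x in (-1.0, 0.5, 1.5):
    print(x,
          likelihood_ratio_decision(x, pdf1, pdf2, 0.5, 0.5, ZERO_ONE),
          likelihood_ratio_decision(x, pdf1, pdf2, 0.5, 0.5, COSTLY_ALPHA1))
```

Only the threshold depends on the priors and losses; the likelihood ratio itself depends only on the observation, which is why changing the loss matrix shifts the decision at x = 0.5 without touching the densities.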

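34 Discriminant Function. Discriminant functions for the multicategory case: \(g_i: \mathbb{R}^d \to \mathbb{R}\) (\(1 \le i \le c\)), one function per category. [Diagram: \(\mathbf{x}\) is fed to \(g_1(\mathbf{x}), g_2(\mathbf{x}), \ldots, g_c(\mathbf{x})\), which determine the action, e.g., classification.] Assign \(\mathbf{x}\) to \(\omega_i\) if \(g_i(\mathbf{x}) > g_j(\mathbf{x})\) for all \(j \neq i\).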

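35 Discriminant Function. Minimum risk case: \(g_i(\mathbf{x}) = -R(\alpha_i \mid \mathbf{x})\). Minimum error-rate case: \(g_i(\mathbf{x}) = P(\omega_i \mid \mathbf{x})\), or \(g_i(\mathbf{x}) = p(\mathbf{x} \mid \omega_i)\, P(\omega_i)\), or \(g_i(\mathbf{x}) = \ln p(\mathbf{x} \mid \omega_i) + \ln P(\omega_i)\).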

36 Discriminant Function. Relationship between minimum risk and minimum error rate.

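37 Discriminant Function. Various discriminant functions can give identical classification results. If \(f(\cdot)\) is a monotonically increasing function, then the \(f(g_i(\cdot))\) are also discriminant functions. Examples: \(f(x) = kx\) with \(k > 0\), so \(f(g_i(\mathbf{x})) = k\, g_i(\mathbf{x})\) for \(1 \le i \le c\); \(f(x) = \ln x\), so \(f(g_i(\mathbf{x})) = \ln g_i(\mathbf{x})\) for \(1 \le i \le c\).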
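A small check of the invariance claim, as a sketch: the discriminant values in `scores` are made-up positive numbers, and `classify` is a hypothetical helper that returns the index of the largest discriminant.

```python
import math

def classify(scores):
    """Index of the category with the largest discriminant value."""
    return max(range(len(scores)), key=scores.__getitem__)

scores = [0.02, 0.15, 0.07]             # assumed values of g_i(x), all positive
scaled = [3.0 * g for g in scores]      # f(g) = k * g with k > 0
logged = [math.log(g) for g in scores]  # f(g) = ln g

print(classify(scores), classify(scaled), classify(logged))
```

All three calls select the same category, because a monotonically increasing transform preserves the ordering of the discriminant values.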

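38 Decision Regions. \(c\) discriminant functions result in \(c\) decision regions: \(\mathcal{R}_i = \{\mathbf{x} \mid g_i(\mathbf{x}) > g_j(\mathbf{x})\ \forall j \neq i\}\), where \(\mathcal{R}_i \cap \mathcal{R}_j = \emptyset\) for \(i \neq j\) and \(\bigcup_{i=1}^{c} \mathcal{R}_i = \mathbb{R}^d\). Decision boundary: decision regions are separated by decision boundaries. [Figure: two-category example.]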

39 The Normal Distribution. Discrete random variable \(X\) (assume integer values): probability mass function (pmf) \(p(x) = P(X = x)\); cumulative distribution function (cdf) \(F(x) = P(X \le x) = \sum_{t = -\infty}^{x} p(t)\). Continuous random variable \(X\): probability density function (pdf) \(p(x)\) or \(f(x)\), which is not a probability; cumulative distribution function (cdf) \(F(x) = P(X \le x) = \int_{-\infty}^{x} p(t)\, dt\).

40 Expectations (a.k.a. expected value, mean or average of a random variable). If \(X\) is a random variable, the expectation of \(X\) is \(E[X] = \sum_{x} x\, p(x)\) if \(X\) is discrete, and \(E[X] = \int_{-\infty}^{\infty} x\, p(x)\, dx\) if \(X\) is continuous. The k-th moment: \(E[X^k]\). The 1st moment: \(\mu_X = E[X]\). The k-th central moment: \(E[(X - \mu_X)^k]\).

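41 Important Expectations. Mean: \(\mu_X = E[X] = \sum_{x} x\, p(x)\) if \(X\) is discrete, and \(\int_{-\infty}^{\infty} x\, p(x)\, dx\) if \(X\) is continuous. Variance: \(\sigma_X^2 = \mathrm{Var}[X] = E[(X - \mu_X)^2] = \sum_{x} (x - \mu_X)^2\, p(x)\) if \(X\) is discrete, and \(\int_{-\infty}^{\infty} (x - \mu_X)^2\, p(x)\, dx\) if \(X\) is continuous. Notation: \(\mathrm{Var}[x] = \sigma^2\) (\(\sigma\): standard deviation). Fact: \(\mathrm{Var}[x] = E[x^2] - (E[x])^2\).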
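The stated fact about the variance follows in a few lines from the linearity of expectation. Writing \(\mu = E[x]\) (a symbol introduced here only for brevity), a short derivation is:

```latex
\begin{aligned}
\mathrm{Var}[x] &= E\big[(x - \mu)^2\big] \\
                &= E\big[x^2 - 2\mu x + \mu^2\big] \\
                &= E[x^2] - 2\mu\,E[x] + \mu^2 \\
                &= E[x^2] - 2\mu^2 + \mu^2 \\
                &= E[x^2] - (E[x])^2 .
\end{aligned}
```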

42 Entropy. The entropy measures the fundamental uncertainty in the value of points selected randomly from a distribution. \(H[X] = -\sum_{x} p(x) \log p(x)\) if \(X\) is discrete, and \(H[X] = -\int_{-\infty}^{\infty} p(x) \log p(x)\, dx\) if \(X\) is continuous.

43 Univariate Gaussian Distribution. Gaussian distribution, a.k.a. Gaussian density or normal density. \(X \sim N(\mu, \sigma^2)\): \(p(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[-\dfrac{(x - \mu)^2}{2\sigma^2}\right]\), with \(E[X] = \mu\) and \(\mathrm{Var}[X] = \sigma^2\).
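As a sanity check on the univariate Gaussian density, the sketch below integrates it numerically and recovers the total mass, the mean and the variance. The parameter values `MU` and `SIGMA`, the midpoint-rule helper `integrate`, and the integration range of ten standard deviations are all assumptions made for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h

MU, SIGMA = 1.0, 2.0                       # assumed parameters
LO, HI = MU - 10 * SIGMA, MU + 10 * SIGMA  # tails beyond this range are negligible

total = integrate(lambda x: normal_pdf(x, MU, SIGMA), LO, HI)
mean = integrate(lambda x: x * normal_pdf(x, MU, SIGMA), LO, HI)
var = integrate(lambda x: (x - MU) ** 2 * normal_pdf(x, MU, SIGMA), LO, HI)
print(total, mean, var)
```

The three numbers should come out close to 1, \(\mu\) and \(\sigma^2\) respectively, up to the discretization error of the midpoint rule.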


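45 Random Vectors. A d-dimensional random vector is \(X = (x_1, x_2, \ldots, x_d)^T\), \(X: \Omega \to \mathbb{R}^d\), with \(X \sim p(\mathbf{x}) = p(x_1, x_2, \ldots, x_d)\) (joint pdf). Expected vector: \(E[X] = (E[x_1], E[x_2], \ldots, E[x_d])^T = (\mu_1, \mu_2, \ldots, \mu_d)^T = \boldsymbol{\mu}\), where \(\mu_i = E[x_i] = \int_{-\infty}^{\infty} x_i\, p(x_i)\, dx_i\) and \(p(x_i)\) is the marginal pdf on the i-th component.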

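46 Random Vectors. Covariance matrix: \(\Sigma = E[(X - \boldsymbol{\mu})(X - \boldsymbol{\mu})^T] = [\sigma_{ij}]_{d \times d}\), with \(\sigma_{ij} = E[(x_i - \mu_i)(x_j - \mu_j)] = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} (x_i - \mu_i)(x_j - \mu_j)\, p(x_i, x_j)\, dx_i\, dx_j\), where \(p(x_i, x_j)\) is the marginal pdf on a pair of random variables \((x_i, x_j)\). Properties: symmetric, positive semidefinite.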

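47 Multivariate Gaussian Distribution. \(X\) is a d-dimensional random vector, \(X \sim N(\boldsymbol{\mu}, \Sigma)\): \(p(\mathbf{x}) = \dfrac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\!\left[-\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right]\), with \(E[X] = \boldsymbol{\mu}\) and \(E[(X - \boldsymbol{\mu})(X - \boldsymbol{\mu})^T] = \Sigma\).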

48 Properties of \(N(\boldsymbol{\mu}, \Sigma)\). \(X\) is a d-dimensional random vector, and \(X \sim N(\boldsymbol{\mu}, \Sigma)\). If \(Y = A^T X\), where \(A\) is a \(d \times k\) matrix, then \(Y \sim N(A^T \boldsymbol{\mu}, A^T \Sigma A)\).

49 On Covariance Matrix. As mentioned before, \(\Sigma\) is symmetric and positive semidefinite. Thus, \(\Sigma = \Phi \Lambda \Phi^T = \Phi \Lambda^{1/2} \Lambda^{1/2} \Phi^T = (\Phi \Lambda^{1/2})(\Phi \Lambda^{1/2})^T\). \(\Phi\): orthonormal matrix, whose columns are eigenvectors of \(\Sigma\). \(\Lambda\): diagonal matrix of eigenvalues.


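50 Mahalanobis Distance. For \(X \sim N(\boldsymbol{\mu}, \Sigma)\), \(p(\mathbf{x}) = \dfrac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\!\left[-\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right]\). The factor in front of the exponential is constant, so \(p(\mathbf{x})\) depends on \(\mathbf{x}\) only through the value of \(r^2 = (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\), where \(r\) is the Mahalanobis distance from \(\mathbf{x}\) to \(\boldsymbol{\mu}\). [Portrait: P. C. Mahalanobis.]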
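A minimal two-dimensional sketch of the squared Mahalanobis distance \(r^2 = (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\), written without external libraries. The covariance matrix and the test points are assumed values, and the function name `mahalanobis_sq_2d` is hypothetical.

```python
def mahalanobis_sq_2d(x, mu, cov):
    """Squared Mahalanobis distance between 2-D points x and mu under covariance cov."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mu[0], x[1] - mu[1])
    tmp = (inv[0][0] * dx[0] + inv[0][1] * dx[1],
           inv[1][0] * dx[0] + inv[1][1] * dx[1])
    return dx[0] * tmp[0] + dx[1] * tmp[1]

MU = (0.0, 0.0)
COV = ((4.0, 0.0), (0.0, 1.0))  # assumed: variance 4 along the first axis, 1 along the second

print(mahalanobis_sq_2d((2.0, 0.0), MU, COV))  # Euclidean distance 2
print(mahalanobis_sq_2d((0.0, 1.0), MU, COV))  # Euclidean distance 1
```

Both points have the same squared Mahalanobis distance even though their Euclidean distances differ, so under this assumed Gaussian they lie on the same equal-density contour.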

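51 Discriminant Functions for Gaussian Density. Minimum-error-rate classification: \(g_i(\mathbf{x}) = \ln p(\mathbf{x} \mid \omega_i) + \ln P(\omega_i)\), \(1 \le i \le c\), with \(p(\mathbf{x} \mid \omega_i) = \dfrac{1}{(2\pi)^{d/2}\, |\Sigma_i|^{1/2}} \exp\!\left[-\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i)\right]\). Hence \(g_i(\mathbf{x}) = -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) - \tfrac{d}{2} \ln 2\pi - \tfrac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i)\). The term \(\tfrac{d}{2} \ln 2\pi\) is constant and could be ignored.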

52 Discriminant Functions for Gaussian Density. Three cases. Case 1, \(\Sigma_i = \sigma^2 I\): classes are centered at different means, and their feature components are pairwise independent and have the same variance. Case 2, \(\Sigma_i = \Sigma\): classes are centered at different means, but have the same variation. Case 3, \(\Sigma_i \neq \Sigma_j\): arbitrary.

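53 Case 1: \(\Sigma_i = \sigma^2 I\). Starting from \(g_i(\mathbf{x}) = -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) - \tfrac{d}{2} \ln 2\pi - \tfrac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i)\), the second and third terms are irrelevant because they are the same for every class. This gives \(g_i(\mathbf{x}) = -\dfrac{\|\mathbf{x} - \boldsymbol{\mu}_i\|^2}{2\sigma^2} + \ln P(\omega_i) = -\dfrac{1}{2\sigma^2} (\mathbf{x}^T \mathbf{x} - 2 \boldsymbol{\mu}_i^T \mathbf{x} + \boldsymbol{\mu}_i^T \boldsymbol{\mu}_i) + \ln P(\omega_i)\), where \(\mathbf{x}^T \mathbf{x}\) is again irrelevant.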

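54 Case 1: \(\Sigma_i = \sigma^2 I\). \(g_i(\mathbf{x}) = \dfrac{1}{\sigma^2} \boldsymbol{\mu}_i^T \mathbf{x} - \dfrac{1}{2\sigma^2} \boldsymbol{\mu}_i^T \boldsymbol{\mu}_i + \ln P(\omega_i)\). It is a linear discriminant function \(g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}\), where the weight vector is \(\mathbf{w}_i = \dfrac{1}{\sigma^2} \boldsymbol{\mu}_i\) and the threshold/bias is \(w_{i0} = -\dfrac{1}{2\sigma^2} \boldsymbol{\mu}_i^T \boldsymbol{\mu}_i + \ln P(\omega_i)\).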

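55 Case 1: \(\Sigma_i = \sigma^2 I\). With \(g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}\), the boundary between \(\omega_i\) and \(\omega_j\) is \(g_i(\mathbf{x}) = g_j(\mathbf{x})\), i.e. \((\mathbf{w}_i - \mathbf{w}_j)^T \mathbf{x} + (w_{i0} - w_{j0}) = 0\). This can be written as \(\mathbf{w}^T (\mathbf{x} - \mathbf{x}_0) = 0\) with \(\mathbf{w} = \boldsymbol{\mu}_i - \boldsymbol{\mu}_j\) and \(\mathbf{x}_0 = \tfrac{1}{2} (\boldsymbol{\mu}_i + \boldsymbol{\mu}_j) - \dfrac{\sigma^2}{\|\boldsymbol{\mu}_i - \boldsymbol{\mu}_j\|^2} \ln \dfrac{P(\omega_i)}{P(\omega_j)}\, (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)\).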

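56 Case 1: \(\Sigma_i = \sigma^2 I\). The boundary between \(\omega_i\) and \(\omega_j\), \(g_i(\mathbf{x}) = g_j(\mathbf{x})\), is \(\mathbf{w}^T (\mathbf{x} - \mathbf{x}_0) = 0\) with \(\mathbf{w} = \boldsymbol{\mu}_i - \boldsymbol{\mu}_j\). The decision boundary is therefore a hyperplane perpendicular to the line between the means, crossing it at the point \(\mathbf{x}_0\). If \(P(\omega_i) = P(\omega_j)\), \(\mathbf{x}_0\) is the midpoint between the means.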

57 Case 1: \(\Sigma_i = \sigma^2 I\). Minimum distance classifier (template matching).
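A sketch of the Case 1 classifier, using the standard linear form \(g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}\) with \(\mathbf{w}_i = \boldsymbol{\mu}_i / \sigma^2\) and \(w_{i0} = -\boldsymbol{\mu}_i^T \boldsymbol{\mu}_i / (2\sigma^2) + \ln P(\omega_i)\). The class means, priors and variance are assumed values; `linear_discriminant` and `classify` are hypothetical names.

```python
import math

def linear_discriminant(mu, sigma2, prior):
    """Return (w, w0) of g(x) = w.x + w0 for a class with covariance sigma2 * I."""
    w = [m / sigma2 for m in mu]
    w0 = -sum(m * m for m in mu) / (2.0 * sigma2) + math.log(prior)
    return w, w0

def classify(x, classes, sigma2):
    """Pick the label whose linear discriminant is largest at x."""
    def g(params):
        mu, prior = params
        w, w0 = linear_discriminant(mu, sigma2, prior)
        return sum(wi * xi for wi, xi in zip(w, x)) + w0
    return max(classes, key=lambda label: g(classes[label]))

# label -> (mean vector, prior); all values are assumptions for illustration.
EQUAL = {"A": ((0.0, 0.0), 0.5), "B": ((4.0, 0.0), 0.5)}
SKEWED = {"A": ((0.0, 0.0), 0.9), "B": ((4.0, 0.0), 0.1)}

point = (2.3, 0.0)
print(classify(point, EQUAL, 1.0), classify(point, SKEWED, 1.0))
```

With equal priors the rule reduces to picking the nearest mean, so the point just past the midpoint goes to B; with the skewed priors the boundary moves toward the less probable class and the same point is assigned to A.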

58-60 Case 1: \(\Sigma_i = \sigma^2 I\) (figures).


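61 Case 2: \(\Sigma_i = \Sigma\). In \(g_i(\mathbf{x}) = -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) - \tfrac{d}{2} \ln 2\pi - \tfrac{1}{2} \ln |\Sigma| + \ln P(\omega_i)\), the second and third terms are irrelevant, so \(g_i(\mathbf{x}) = -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) + \ln P(\omega_i)\). The quadratic form is the squared Mahalanobis distance, and \(\ln P(\omega_i)\) is irrelevant if \(P(\omega_i) = P(\omega_j)\) for all \(i, j\). Expanding and dropping the class-independent term \(\mathbf{x}^T \Sigma^{-1} \mathbf{x}\) gives a linear discriminant \(g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}\) with \(\mathbf{w}_i = \Sigma^{-1} \boldsymbol{\mu}_i\) and \(w_{i0} = -\tfrac{1}{2} \boldsymbol{\mu}_i^T \Sigma^{-1} \boldsymbol{\mu}_i + \ln P(\omega_i)\).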

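62 Case 2: \(\Sigma_i = \Sigma\). With \(g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}\), the boundary \(g_i(\mathbf{x}) = g_j(\mathbf{x})\) is \(\mathbf{w}^T (\mathbf{x} - \mathbf{x}_0) = 0\), where \(\mathbf{w} = \Sigma^{-1} (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)\) and \(\mathbf{x}_0 = \tfrac{1}{2} (\boldsymbol{\mu}_i + \boldsymbol{\mu}_j) - \dfrac{\ln [P(\omega_i) / P(\omega_j)]}{(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)^T \Sigma^{-1} (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)}\, (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)\).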


63-64 Case 2: \(\Sigma_i = \Sigma\) (figures).

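65 Case 3: \(\Sigma_i \neq \Sigma_j\). In \(g_i(\mathbf{x}) = -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) - \tfrac{d}{2} \ln 2\pi - \tfrac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i)\), only the term \(\tfrac{d}{2} \ln 2\pi\) is irrelevant. The discriminant is quadratic: \(g_i(\mathbf{x}) = \mathbf{x}^T W_i \mathbf{x} + \mathbf{w}_i^T \mathbf{x} + w_{i0}\), with \(W_i = -\tfrac{1}{2} \Sigma_i^{-1}\), \(\mathbf{w}_i = \Sigma_i^{-1} \boldsymbol{\mu}_i\) and \(w_{i0} = -\tfrac{1}{2} \boldsymbol{\mu}_i^T \Sigma_i^{-1} \boldsymbol{\mu}_i - \tfrac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i)\). Without the term \(\mathbf{x}^T W_i \mathbf{x}\), this is the linear form of Cases 1 and 2. Decision surfaces are hyperquadrics, e.g., hyperplanes, hyperspheres, hyperellipsoids, hyperhyperboloids.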

66 Case 3: \(\Sigma_i \neq \Sigma_j\). Non-simply connected decision regions can arise in one dimension for Gaussians having unequal variance.
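The one-dimensional claim can be illustrated with a short sketch. Two univariate Gaussians share a mean but have different standard deviations; all parameter values and the names `g` and `decide` are assumptions for illustration. In one dimension the discriminant \(\ln p(x \mid \omega_i) + \ln P(\omega_i)\), with the constant dropped, becomes \(-\tfrac{1}{2} ((x - \mu_i)/\sigma_i)^2 - \ln \sigma_i + \ln P(\omega_i)\).

```python
import math

def g(x, mu, sigma, prior):
    """1-D quadratic discriminant: ln p(x | class) + ln P(class), constant dropped."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) + math.log(prior)

def decide(x):
    """Return 1 or 2, whichever class has the larger discriminant at x."""
    g1 = g(x, 0.0, 1.0, 0.5)  # narrow class (assumed parameters)
    g2 = g(x, 0.0, 3.0, 0.5)  # wide class (assumed parameters)
    return 1 if g1 > g2 else 2

labels = [decide(x) for x in (-4.0, 0.0, 4.0)]
print(labels)
```

The narrow class wins near the shared mean and the wide class wins on both sides of it, so the decision region of the wide class consists of two disjoint intervals.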

67-70 Case 3: \(\Sigma_i \neq \Sigma_j\) (figures).

71 Summary. Bayesian decision theory: basic concepts, Bayes' theorem, Bayes decision rule. Feasibility of the Bayes decision rule: it requires the prior probability and the likelihood. Solution I: counting relative frequencies. Solution II: conduct density estimation.

72 Summary (cont.). Bayes decision rule, the general scenario: allowing more than one feature; allowing more than two states of nature; allowing actions other than merely deciding the state of nature; loss function; expected loss (conditional risk); general Bayes decision rule; minimum-error-rate classification; discriminant functions. Gaussian density: discriminant functions for the Gaussian pdf.

73 k-means

74 MIMA Group. Any questions?
