CHAPTER 3: BAYESIAN DECISION THEORY


1 CHAPTER 3: BAYESIAN DECISION THEORY

2 Decision making under uncertainty
Data comes from a process that is not completely known. The lack of knowledge can be compensated for by modeling the process as random. The underlying process may be deterministic, but because we do not have access to complete knowledge about it, we model it as random and use probability theory to analyze it.

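3 Probability and Inference
Result of tossing a coin is in {Heads, Tails}.
Random variable $X \in \{1, 0\}$, with 1 for Heads and 0 for Tails.
Bernoulli: $P\{X = 1\} = p_o$, so $P(X) = p_o^{X} (1 - p_o)^{1 - X}$
Sample: $\mathcal{X} = \{x^t\}_{t=1}^{N}$
Estimation: $\hat{p}_o = \#\{\text{Heads}\} / \#\{\text{Tosses}\} = \sum_t x^t / N$
Prediction of next toss: Heads if $\hat{p}_o > 1/2$, Tails otherwise.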
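The estimation and prediction steps on this slide can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `estimate_p` and `predict_next` and the sample of tosses are assumptions made for the example, not part of the slides.

```python
from typing import Sequence


def estimate_p(tosses: Sequence[int]) -> float:
    """Relative-frequency estimate of p_o: number of heads (1s) over number of tosses."""
    if not tosses:
        raise ValueError("need at least one toss")
    return sum(tosses) / len(tosses)


def predict_next(p_hat: float) -> str:
    """Predict Heads if the estimated p_o exceeds 1/2, Tails otherwise."""
    return "Heads" if p_hat > 0.5 else "Tails"


# Hypothetical sample: 1 = Heads, 0 = Tails.
sample = [1, 0, 1, 1, 0, 1, 1, 0]
p_hat = estimate_p(sample)        # 5 heads out of 8 tosses
prediction = predict_next(p_hat)
```

With 5 heads in 8 tosses the estimate is 0.625, so the sketch predicts Heads for the next toss.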

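4 Classification
Credit scoring: inputs are income and savings. Output is low-risk vs. high-risk.
Input: $\mathbf{x} = [x_1, x_2]^T$, Output: $C \in \{0, 1\}$
Prediction:
$$\text{choose } \begin{cases} C = 1 & \text{if } P(C = 1 \mid x_1, x_2) > 0.5 \\ C = 0 & \text{otherwise} \end{cases}$$
or equivalently
$$\text{choose } \begin{cases} C = 1 & \text{if } P(C = 1 \mid x_1, x_2) > P(C = 0 \mid x_1, x_2) \\ C = 0 & \text{otherwise} \end{cases}$$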

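5 Bayes' Rule
$$P(C \mid \mathbf{x}) = \frac{P(C)\, p(\mathbf{x} \mid C)}{p(\mathbf{x})} \qquad \text{posterior} = \frac{\text{prior} \times \text{likelihood}}{\text{evidence}}$$
For two classes:
$$P(C = 0) + P(C = 1) = 1$$
$$p(\mathbf{x}) = p(\mathbf{x} \mid C = 1) P(C = 1) + p(\mathbf{x} \mid C = 0) P(C = 0)$$
$$P(C = 0 \mid \mathbf{x}) + P(C = 1 \mid \mathbf{x}) = 1$$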

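6 Bayes' Rule: K>2 Classes
$$P(C_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid C_i) P(C_i)}{p(\mathbf{x})} = \frac{p(\mathbf{x} \mid C_i) P(C_i)}{\sum_{k=1}^{K} p(\mathbf{x} \mid C_k) P(C_k)}$$
$$P(C_i) \ge 0 \text{ and } \sum_{i=1}^{K} P(C_i) = 1$$
$$\text{choose } C_i \text{ if } P(C_i \mid \mathbf{x}) = \max_k P(C_k \mid \mathbf{x})$$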
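As a rough illustration of the K-class rule, the following Python sketch turns priors and likelihoods into posteriors and picks the most probable class. The function names and the numbers are hypothetical, chosen only for the example; class indices are 0-based in the code.

```python
from typing import List, Sequence


def posteriors(priors: Sequence[float], likelihoods: Sequence[float]) -> List[float]:
    """Return P(C_i | x) for every class, given P(C_i) and p(x | C_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)  # p(x) = sum_k p(x | C_k) P(C_k)
    if evidence == 0:
        raise ValueError("evidence is zero; x is impossible under every class")
    return [j / evidence for j in joint]


def choose_class(post: Sequence[float]) -> int:
    """Index of the class with the largest posterior (first one wins a tie)."""
    return max(range(len(post)), key=post.__getitem__)


# Hypothetical three-class example.
priors = [0.5, 0.3, 0.2]
likelihoods = [0.1, 0.4, 0.3]
post = posteriors(priors, likelihoods)
best = choose_class(post)
```

Here the second class wins even though the first has the largest prior, because its likelihood for this input is much higher.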

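7 Bayes' Rule: Simple setting
Consider a simple setting:
Y (class label) is boolean valued.
X is a vector containing n boolean attributes (each feature/attribute is binary).
Applying Bayes' theorem:
$$P(Y = y_i \mid X = x_k) = \frac{P(X = x_k \mid Y = y_i) P(Y = y_i)}{\sum_j P(X = x_k \mid Y = y_j) P(Y = y_j)}$$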

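8 Bayes' Rule: How many parameters?
Let $\theta_{ij} = P(X = x_i \mid Y = y_j)$, where $x_i$ ranges over the $2^n$ possible values of X and $y_j$ over the 2 values of Y.
How many parameters do we need to estimate? $2(2^n - 1)$: for each value of Y the $2^n$ probabilities sum to 1, so $2^n - 1$ of them are free.
Why is this bad? This corresponds to about 2 distinct parameters for each of the distinct instances in the instance space of X. To make reliable estimates we need to see each of those distinct instances multiple times.
How bad can this be? If X has 30 boolean features we need to estimate $2(2^{30} - 1)$, about 2.1 billion, parameters. Totally impractical!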

9 Can we do anything about it?
By using a simple modeling trick (assumption), we can reduce the number of parameters to be estimated from $2(2^n - 1)$ to just $2n$.
The trick is called conditional independence.
The resulting method (algorithm) is called the Naïve Bayes classifier.
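The two parameter counts can be compared directly. The short Python sketch below simply evaluates both formulas for boolean Y and n boolean attributes; the function names are illustrative.

```python
def full_joint_params(n: int) -> int:
    """Parameters needed to model P(X | Y) without assumptions: 2 * (2^n - 1)."""
    return 2 * (2 ** n - 1)


def naive_bayes_params(n: int) -> int:
    """Parameters needed under conditional independence: 2 * n."""
    return 2 * n


n = 30
full = full_joint_params(n)      # 2147483646
naive = naive_bayes_params(n)    # 60
```

For n = 30 the count drops from 2,147,483,646 to 60.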

10 Conditional independence
Why?

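11 Naïve Bayes
This is a classification algorithm based on Bayes' rule that assumes that the attributes $X_1, \ldots, X_n$ are conditionally independent of one another given Y.
This dramatically simplifies the representation of $P(X \mid Y)$.
Consider first the case when X has only two attributes, i.e., $X = (X_1, X_2)$:
$$P(X \mid Y) = P(X_1, X_2 \mid Y) = P(X_1 \mid X_2, Y) P(X_2 \mid Y) = P(X_1 \mid Y) P(X_2 \mid Y)$$
In general, when $X = (X_1, \ldots, X_n)$, we can write
$$P(X_1, \ldots, X_n \mid Y) = \prod_{i=1}^{n} P(X_i \mid Y)$$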

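12 Naïve Bayes contd.
Application of Bayes' rule yields
$$P(Y = y_k \mid X_1, \ldots, X_n) = \frac{P(Y = y_k) \prod_i P(X_i \mid Y = y_k)}{\sum_j P(Y = y_j) \prod_i P(X_i \mid Y = y_j)}$$
The Naïve Bayes classification rule is: predict $Y = y_k$ if it maximizes the R.H.S., which simplifies (the denominator does not depend on $y_k$) to
$$Y \leftarrow \arg\max_{y_k} P(Y = y_k) \prod_i P(X_i \mid Y = y_k)$$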

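13 Naïve Bayes algorithm for discrete input
The setting:
n input attributes/features $X_i$, each taking J possible discrete values.
Y is a discrete output variable (class label) taking K possible values.
Parameters:
$\theta_{ijk} = P(X_i = x_{ij} \mid Y = y_k)$ for each attribute $X_i$, each of its values $x_{ij}$, and each class $y_k$. There are $n(J-1)K$ independent parameters.
$\pi_k = P(Y = y_k)$. There are $(K-1)$ independent parameters (prior probabilities).
Estimates (relative frequencies in the training set D):
$$\hat{\theta}_{ijk} = \frac{\#D\{X_i = x_{ij} \wedge Y = y_k\}}{\#D\{Y = y_k\}}, \qquad \hat{\pi}_k = \frac{\#D\{Y = y_k\}}{|D|}$$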
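A minimal Python sketch of a discrete Naïve Bayes classifier is given below. It assumes plain relative-frequency estimates with no smoothing, so an attribute value never seen with a class gets probability zero; the function names and the toy weather-style data are hypothetical and not taken from the slides.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Priors = Dict[Hashable, float]
Conditionals = Dict[Tuple[int, Hashable], Dict[Hashable, float]]


def train_naive_bayes(X: Sequence[Sequence[Hashable]],
                      y: Sequence[Hashable]) -> Tuple[Priors, Conditionals]:
    """Estimate P(Y = c) and P(X_i = v | Y = c) by counting (no smoothing)."""
    class_counts = Counter(y)
    priors = {c: cnt / len(y) for c, cnt in class_counts.items()}
    # counts[(i, c)][v] = number of class-c examples whose attribute i equals v
    counts: Dict[Tuple[int, Hashable], Counter] = defaultdict(Counter)
    for row, c in zip(X, y):
        for i, v in enumerate(row):
            counts[(i, c)][v] += 1
    cond = {key: {v: cnt / class_counts[key[1]] for v, cnt in ctr.items()}
            for key, ctr in counts.items()}
    return priors, cond


def predict(priors: Priors, cond: Conditionals, x: Sequence[Hashable]) -> Hashable:
    """Return the class maximizing P(Y = c) * prod_i P(X_i = x_i | Y = c)."""
    def score(c: Hashable) -> float:
        s = priors[c]
        for i, v in enumerate(x):
            s *= cond.get((i, c), {}).get(v, 0.0)  # unseen value -> probability 0
        return s
    return max(priors, key=score)


# Hypothetical toy data: attributes are (outlook, temperature).
X_train: List[Tuple[str, str]] = [
    ("sunny", "hot"), ("sunny", "hot"), ("rainy", "hot"),
    ("rainy", "mild"), ("sunny", "mild"), ("rainy", "mild"),
]
y_train = ["no", "no", "no", "yes", "yes", "yes"]

priors, cond = train_naive_bayes(X_train, y_train)
label_a = predict(priors, cond, ("sunny", "hot"))
label_b = predict(priors, cond, ("rainy", "mild"))
```

In practice the product is usually replaced by a sum of logarithms and the counts are smoothed; both are left out here to keep the sketch close to the formulas.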

14 Naïve Bayes algorithm for email SPAM filtering

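15 Losses and Risks
Actions: $\alpha_i$ is the decision to assign the input to class $C_i$.
Loss of $\alpha_i$ when the actual class of the input is $C_k$: $\lambda_{ik}$
Expected risk (Duda and Hart, 1973):
$$R(\alpha_i \mid \mathbf{x}) = \sum_{k=1}^{K} \lambda_{ik} P(C_k \mid \mathbf{x})$$
$$\text{choose } \alpha_i \text{ if } R(\alpha_i \mid \mathbf{x}) = \min_k R(\alpha_k \mid \mathbf{x})$$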
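The expected-risk rule can be sketched in Python as follows. The loss matrix and posteriors are made-up numbers for illustration, and indices are 0-based in the code while the slide counts from 1.

```python
from typing import List, Sequence


def expected_risks(loss: Sequence[Sequence[float]],
                   post: Sequence[float]) -> List[float]:
    """R(alpha_i | x) = sum_k loss[i][k] * P(C_k | x) for every action i."""
    return [sum(l_ik * p_k for l_ik, p_k in zip(row, post)) for row in loss]


def min_risk_action(loss: Sequence[Sequence[float]],
                    post: Sequence[float]) -> int:
    """Index of the action with the smallest expected risk."""
    risks = expected_risks(loss, post)
    return min(range(len(risks)), key=risks.__getitem__)


# Hypothetical two-class example: choosing class 0 when the truth is class 1
# costs ten times more than the opposite mistake.
loss = [[0.0, 10.0],
        [1.0, 0.0]]
post = [0.8, 0.2]
risks = expected_risks(loss, post)     # approximately [2.0, 0.8]
action = min_risk_action(loss, post)   # 1
```

Although class 0 is the more probable one, the asymmetric loss makes action 1 the lower-risk choice.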

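16 Losses and Risks: 0/1 Loss
$$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ 1 & \text{if } i \ne k \end{cases}$$
$$R(\alpha_i \mid \mathbf{x}) = \sum_{k=1}^{K} \lambda_{ik} P(C_k \mid \mathbf{x}) = \sum_{k \ne i} P(C_k \mid \mathbf{x}) = 1 - P(C_i \mid \mathbf{x})$$
For minimum risk, choose the most probable class.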

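17 Losses and Risks: Reject
$$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ \lambda & \text{if } i = K + 1, \; 0 < \lambda < 1 \\ 1 & \text{otherwise} \end{cases}$$
$$R(\alpha_{K+1} \mid \mathbf{x}) = \sum_{k=1}^{K} \lambda P(C_k \mid \mathbf{x}) = \lambda$$
$$R(\alpha_i \mid \mathbf{x}) = \sum_{k \ne i} P(C_k \mid \mathbf{x}) = 1 - P(C_i \mid \mathbf{x})$$
Choose $C_i$ if $P(C_i \mid \mathbf{x}) > P(C_k \mid \mathbf{x})$ for all $k \ne i$ and $P(C_i \mid \mathbf{x}) > 1 - \lambda$; reject otherwise.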
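The reject rule can be written as a small Python function. This is a sketch under the slide's assumptions (0/1 loss for errors, loss λ for rejecting); the function name is illustrative, `None` stands for the reject action, and ties between classes are resolved in favor of the lowest index as a simplification.

```python
from typing import Optional, Sequence


def classify_with_reject(post: Sequence[float], lam: float) -> Optional[int]:
    """Return the most probable class if its posterior exceeds 1 - lam, else None (reject)."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must satisfy 0 < lam < 1")
    best = max(range(len(post)), key=post.__getitem__)
    return best if post[best] > 1.0 - lam else None


# Hypothetical posteriors for a two-class problem.
uncertain = classify_with_reject([0.55, 0.45], lam=0.3)  # 0.55 <= 0.7, so reject
confident = classify_with_reject([0.90, 0.10], lam=0.3)  # 0.90 > 0.7, so class 0
```

A smaller λ makes rejection cheaper, so the classifier rejects more often.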

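18 Discriminant Functions
$$\text{choose } C_i \text{ if } g_i(\mathbf{x}) = \max_k g_k(\mathbf{x})$$
$$g_i(\mathbf{x}) = \begin{cases} -R(\alpha_i \mid \mathbf{x}) \\ P(C_i \mid \mathbf{x}) \\ p(\mathbf{x} \mid C_i) P(C_i) \end{cases}$$
K decision regions $\mathcal{R}_1, \ldots, \mathcal{R}_K$:
$$\mathcal{R}_i = \{\mathbf{x} \mid g_i(\mathbf{x}) = \max_k g_k(\mathbf{x})\}$$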

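19 K=2 Classes
Dichotomizer (K=2) vs. Polychotomizer (K>2)
$$g(\mathbf{x}) = g_1(\mathbf{x}) - g_2(\mathbf{x})$$
$$\text{choose } \begin{cases} C_1 & \text{if } g(\mathbf{x}) > 0 \\ C_2 & \text{otherwise} \end{cases}$$
Log odds:
$$\log \frac{P(C_1 \mid \mathbf{x})}{P(C_2 \mid \mathbf{x})}$$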
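For two classes, the log odds can serve as the single discriminant g(x). The Python sketch below assumes the posterior P(C_1 | x) is already available; the function names are illustrative.

```python
import math


def log_odds(p1: float) -> float:
    """log P(C_1 | x) / P(C_2 | x), using P(C_2 | x) = 1 - P(C_1 | x)."""
    if not 0.0 < p1 < 1.0:
        raise ValueError("p1 must be strictly between 0 and 1")
    return math.log(p1 / (1.0 - p1))


def dichotomize(p1: float) -> int:
    """Choose class 1 if the log odds are positive, class 2 otherwise."""
    return 1 if log_odds(p1) > 0 else 2


choice_high = dichotomize(0.7)  # positive log odds, class 1
choice_low = dichotomize(0.3)   # negative log odds, class 2
```

The log odds are positive exactly when P(C_1 | x) > 0.5, so this rule agrees with choosing the more probable class.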

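20 Utility Theory
Probability of state k given evidence $\mathbf{x}$: $P(S_k \mid \mathbf{x})$
Utility of $\alpha_i$ when the state is k: $U_{ik}$
Expected utility:
$$EU(\alpha_i \mid \mathbf{x}) = \sum_k U_{ik} P(S_k \mid \mathbf{x})$$
$$\text{Choose } \alpha_i \text{ if } EU(\alpha_i \mid \mathbf{x}) = \max_j EU(\alpha_j \mid \mathbf{x})$$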
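Maximizing expected utility mirrors minimizing expected risk. The Python sketch below uses a made-up utility table with two actions and two states; all names and numbers are assumptions for illustration.

```python
from typing import List, Sequence


def expected_utilities(utility: Sequence[Sequence[float]],
                       state_probs: Sequence[float]) -> List[float]:
    """EU(alpha_i | x) = sum_k utility[i][k] * P(S_k | x) for every action i."""
    return [sum(u_ik * p_k for u_ik, p_k in zip(row, state_probs)) for row in utility]


def best_action(utility: Sequence[Sequence[float]],
                state_probs: Sequence[float]) -> int:
    """Index of the action with the largest expected utility."""
    eu = expected_utilities(utility, state_probs)
    return max(range(len(eu)), key=eu.__getitem__)


# Hypothetical example: action 0 is risky, action 1 is safe.
utility = [[100.0, -50.0],
           [10.0, 10.0]]
state_probs = [0.3, 0.7]
eu = expected_utilities(utility, state_probs)  # approximately [-5.0, 10.0]
chosen = best_action(utility, state_probs)     # 1
```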

21 Association Rules
Association rule: $X \rightarrow Y$
People who buy/click/visit/enjoy X are also likely to buy/click/visit/enjoy Y.
A rule implies association, not necessarily causation.

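22 Association measures
Support ($X \rightarrow Y$):
$$P(X, Y) = \frac{\#\{\text{customers who bought } X \text{ and } Y\}}{\#\{\text{customers}\}}$$
Confidence ($X \rightarrow Y$):
$$P(Y \mid X) = \frac{P(X, Y)}{P(X)} = \frac{\#\{\text{customers who bought } X \text{ and } Y\}}{\#\{\text{customers who bought } X\}}$$
Lift ($X \rightarrow Y$):
$$\frac{P(X, Y)}{P(X) P(Y)} = \frac{P(Y \mid X)}{P(Y)}$$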
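The three measures reduce to counting baskets. The following Python sketch computes them for single items X and Y; the function name and the toy baskets are hypothetical.

```python
from typing import Hashable, Sequence, Set, Tuple


def association_measures(transactions: Sequence[Set[Hashable]],
                         x: Hashable, y: Hashable) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) of the rule x -> y."""
    n = len(transactions)
    n_x = sum(1 for t in transactions if x in t)
    n_y = sum(1 for t in transactions if y in t)
    n_xy = sum(1 for t in transactions if x in t and y in t)
    if n == 0 or n_x == 0 or n_y == 0:
        raise ValueError("x and y must each occur in at least one transaction")
    support = n_xy / n                # P(X, Y)
    confidence = n_xy / n_x           # P(Y | X)
    lift = confidence / (n_y / n)     # P(Y | X) / P(Y)
    return support, confidence, lift


# Hypothetical baskets.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk"},
    {"bread"},
    {"eggs"},
]
support, confidence, lift = association_measures(baskets, "milk", "bread")
```

In this toy data milk and bread appear together in 2 of 5 baskets, giving support 0.4, confidence 2/3, and a lift slightly above 1.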

23 Association measures
Support shows the statistical significance of the rule. We are interested in maximizing the support of a rule because even if there is a dependency with a strong confidence value, if the number of such customers is small, the rule is worthless.
Confidence shows the strength of the rule. To be able to say a rule holds with enough confidence, this value must be close to 1 and significantly larger than $P(Y)$.
If X and Y are independent, we expect lift to be close to 1.

24 Example

25 Apriori algorithm (Agrawal et al.)
For (X, Y, Z), a 3-item set, to be frequent (have enough support), (X, Y), (X, Z), and (Y, Z) should all be frequent.
If (X, Y) is not frequent, none of its supersets can be frequent.
Once we find the frequent k-item sets, we convert them to rules: $X, Y \rightarrow Z, \ldots$ and $X \rightarrow Y, Z, \ldots$
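The pruning idea can be sketched in Python as a level-wise search. This is a simplified illustration rather than the published algorithm in full: it covers only the search for frequent item sets, not the conversion to rules, and the function name, support threshold, and baskets are assumptions made for the example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Sequence, Set


def apriori(transactions: Sequence[Set[str]],
            min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every item set whose support is at least min_support."""
    n = len(transactions)

    def support(itemset: FrozenSet[str]) -> float:
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {s for s in (frozenset([i]) for i in items)
               if support(s) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join: merge frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: a candidate survives only if all its (k-1)-subsets are frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent


# Hypothetical baskets over items X, Y, Z.
baskets = [{"X", "Y", "Z"}, {"X", "Y"}, {"X", "Z"}, {"Y", "Z"}, {"X", "Y", "Z"}]
frequent = apriori(baskets, min_support=0.5)
```

With a threshold of 0.5 all single items and all pairs are frequent, while (X, Y, Z) is generated as a candidate but dropped because it appears in only 2 of 5 baskets.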

Lecture Slides for INTRODUCTION TO MACHINE LEARNING 3RD EDITION
ETHEM ALPAYDIN, The MIT Press, 2014
alpaydin@boun.edu.tr http://www.cmpe.boun.edu.tr/~ethem/i2ml3e