MACHINE LEARNING. Department of Computer Science, Artificial Intelligence Research Laboratory, Iowa State University.


MACHINE LEARNING
Vasant Honavar
Bioinformatics and Computational Biology Program
Center for Computational Intelligence, Learning, & Discovery
Iowa State University
honavar@cs.iastate.edu www.cs.iastate.edu/~honavar/ www.cild.iastate.edu/
Copyright Vasant Honavar, 2006.

Notation
Let $\mathbf{Y}, \mathbf{Z}$ denote sets of random variables, $\mathbf{Y} = \{Y_1, Y_2, Y_3\}$; $\mathbf{Z} = \{Z_1, Z_2\}$.
$P(\mathbf{Y}) = P(Y_1, Y_2, Y_3)$; $P(\mathbf{Z}) = P(Z_1, Z_2)$
$\mathbf{Y} \cup \mathbf{Z} = \{Y_1, Y_2, Y_3, Z_1, Z_2\}$; $P(\mathbf{Y} \cup \mathbf{Z}) = P(\mathbf{Y}, \mathbf{Z}) = P(Y_1, Y_2, Y_3, Z_1, Z_2)$
Note the overloading of $P(\mathbf{Y}, \mathbf{Z})$, an unfortunate consequence of the set notation.

Marginalization
Let $\mathbf{Y}, \mathbf{Z}$ denote sets of random variables.
$P(\mathbf{Y}) = \sum_{\mathbf{z}} P(\mathbf{Y}, \mathbf{z})$, where the summation is over all assignments of values to the random variables in $\mathbf{Z}$. Similarly, $P(\mathbf{Z}) = \sum_{\mathbf{y}} P(\mathbf{y}, \mathbf{Z})$.
Example: $\mathbf{Y} = \{Y_1, Y_2, Y_3\}$, $\mathbf{Z} = \{Z_1, Z_2\}$. Suppose all random variables are binary. The joint distribution over the variables in $\mathbf{Z} \cup \mathbf{Y}$ has $2^5 = 32$ entries. Marginalization over the 3 random variables in $\mathbf{Y}$ yields a joint distribution over the variables in $\mathbf{Z}$, a table of $2^2 = 4$ entries.
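As a concrete illustration, here is a minimal Python sketch of this marginalization; the joint probabilities are randomly generated placeholders, not values from the slides:

import itertools
import random

random.seed(0)

# Hypothetical joint distribution P(Y1, Y2, Y3, Z1, Z2) over five binary
# variables: 2**5 = 32 entries, normalized so the table sums to 1.
assignments = list(itertools.product([0, 1], repeat=5))
weights = [random.random() for _ in assignments]
total = sum(weights)
joint = {a: w / total for a, w in zip(assignments, weights)}

# Marginalize over Y = (Y1, Y2, Y3): P(Z1, Z2) = sum over y of P(y, Z1, Z2).
# The result is a table over Z with 2**2 = 4 entries.
marginal = {}
for (y1, y2, y3, z1, z2), p in joint.items():
    marginal[(z1, z2)] = marginal.get((z1, z2), 0.0) + p

print(len(joint), "joint entries ->", len(marginal), "marginal entries")
print(marginal)  # still sums to 1, as a distribution over (Z1, Z2) should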

Independence and Conditional Independence
Let $(E, P)$ be a probability space. Let $A_1, A_2 \subseteq E$. We say that the events $A_1$ and $A_2$ are independent if $P(A_1 \cap A_2) = P(A_1)P(A_2)$.
If $P(A_2) \neq 0$ and $A_1, A_2$ are independent, then $P(A_1 \mid A_2) = P(A_1)$.
If $P(A_1) = 0$ or $P(A_2) = 0$ or both, then $P(A_1 \cap A_2) = 0 = P(A_1)P(A_2)$, so $A_1$ and $A_2$ are independent.

Independence and Conditional Independence
If for every subset $B = \{B_1, \ldots, B_k\} \subseteq \{A_1, \ldots, A_n\}$ obtained by selecting $k \leq n$ elements of $\{A_1, \ldots, A_n\}$ we have
$P(B_1 \cap \ldots \cap B_k \mid C) = \prod_j P(B_j \mid C)$,
we say that $A_1, \ldots, A_n$ are mutually independent given $C$.

Conditional Independence
$X$ is conditionally independent of $Y$ given $Z$ if the probability distribution governing $X$ is independent of the value of $Y$ given the value of $Z$: $P(X \mid Y, Z) = P(X \mid Z)$;
that is, $\forall (x_i, y_j, z_k)$: $P(X = x_i \mid Y = y_j, Z = z_k) = P(X = x_i \mid Z = z_k)$.

Conditional Independence
Thunder is conditionally independent of Rain given Lightning: $P(\text{Thunder} \mid \text{Rain}, \text{Lightning}) = P(\text{Thunder} \mid \text{Lightning})$, i.e., for each combination of values:
$P(T=1 \mid R=1, L=1) = P(T=1 \mid L=1)$
$P(T=1 \mid R=0, L=1) = P(T=1 \mid L=1)$
$P(T=1 \mid R=1, L=0) = P(T=1 \mid L=0)$
$P(T=1 \mid R=0, L=0) = P(T=1 \mid L=0)$
$P(T=0 \mid R=1, L=1) = P(T=0 \mid L=1)$
$P(T=0 \mid R=0, L=1) = P(T=0 \mid L=1)$
$P(T=0 \mid R=1, L=0) = P(T=0 \mid L=0)$
$P(T=0 \mid R=0, L=0) = P(T=0 \mid L=0)$
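A quick numeric check of this property (a sketch; the joint distribution below is made up so that the conditional independence holds by construction):

# Hypothetical distributions: Rain and Lightning may be dependent, but
# Thunder depends on Rain only through Lightning.
p_rl = {(1, 1): 0.20, (1, 0): 0.10, (0, 1): 0.05, (0, 0): 0.65}  # P(Rain, Lightning)
p_t_given_l = {1: 0.9, 0: 0.05}                                   # P(Thunder=1 | Lightning)

# Joint: P(T, R, L) = P(T | L) * P(R, L)
joint = {}
for (r, l), p in p_rl.items():
    joint[(1, r, l)] = p_t_given_l[l] * p
    joint[(0, r, l)] = (1 - p_t_given_l[l]) * p

def cond_prob(t, r, l):
    # P(Thunder = t | Rain = r, Lightning = l) computed from the joint table.
    return joint[(t, r, l)] / (joint[(0, r, l)] + joint[(1, r, l)])

# All eight equations P(T | R, L) = P(T | L) hold:
for t in (0, 1):
    for r in (0, 1):
        for l in (0, 1):
            rhs = p_t_given_l[l] if t == 1 else 1 - p_t_given_l[l]
            assert abs(cond_prob(t, r, l) - rhs) < 1e-12
print("P(Thunder | Rain, Lightning) == P(Thunder | Lightning) for all values")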

Independence and Conditional Independence
Let $\mathbf{Z}_1, \ldots, \mathbf{Z}_n, \mathbf{W}$ be pairwise disjoint sets of random variables on a given event space. $\mathbf{Z}_1, \ldots, \mathbf{Z}_n$ are mutually independent given $\mathbf{W}$ if
$P(\mathbf{Z}_1, \ldots, \mathbf{Z}_n \mid \mathbf{W}) = \prod_{i=1}^{n} P(\mathbf{Z}_i \mid \mathbf{W})$.
$\mathbf{Z}_i$ and $\mathbf{Z}_j$ are independent given $\mathbf{W}$ if $P(\mathbf{Z}_i \mid \mathbf{Z}_j \cup \mathbf{W}) = P(\mathbf{Z}_i \mid \mathbf{W})$.
Note that these represent sets of equations, one for each assignment of possible values to the random variables.

Independence Properties of Random Variables
Let $\mathbf{W}, \mathbf{X}, \mathbf{Y}, \mathbf{Z}$ be pairwise disjoint sets of random variables on a given event space. Let $I(\mathbf{X}, \mathbf{Y}, \mathbf{Z})$ denote that $\mathbf{X}$ and $\mathbf{Z}$ are independent given $\mathbf{Y}$, that is, $P(\mathbf{X} \mid \mathbf{Y} \cup \mathbf{Z}) = P(\mathbf{X} \mid \mathbf{Y})$. Then:
a. $I(\mathbf{X}, \mathbf{Y}, \mathbf{Z}) \Rightarrow I(\mathbf{Z}, \mathbf{Y}, \mathbf{X})$
b. $I(\mathbf{X}, \mathbf{Z}, \mathbf{Y} \cup \mathbf{W}) \Rightarrow I(\mathbf{X}, \mathbf{Z}, \mathbf{Y})$
c. $I(\mathbf{X}, \mathbf{Z}, \mathbf{Y} \cup \mathbf{W}) \Rightarrow I(\mathbf{X}, \mathbf{Z} \cup \mathbf{W}, \mathbf{Y})$
d. $I(\mathbf{X}, \mathbf{Z}, \mathbf{Y})$ and $I(\mathbf{X}, \mathbf{Z} \cup \mathbf{Y}, \mathbf{W}) \Rightarrow I(\mathbf{X}, \mathbf{Z}, \mathbf{Y} \cup \mathbf{W})$
Proof: follows from the definition of independence.

Expectation and Variance
Let $X : E \rightarrow \mathbb{R}$ be a random variable on a finite probability space $(E, P)$, and $B \subseteq E$. The conditional expectation or expected value of $X$ given $B$ is
$E(X \mid B) = \sum_{e} X(e)\, P(e \mid B)$.
The variance of $X$ given $B$ is given by
$\mathrm{Var}(X \mid B) = E\big((X - E(X \mid B))^2 \mid B\big) = E(X^2 \mid B) - (E(X \mid B))^2$.
The unconditional expectation and variance correspond to the case $B = E$, in which case we simply drop the "$\mid B$".

Conditional Expectation of Random Variables
Expectation of a random variable $X$ conditioned on a random variable $Y$:
$E(X \mid Y) = \sum_{e} e\, P(X = e \mid Y)$.
Note that this denotes a set of equations, one for each of the possible values of $Y$. The definitions can be extended to the case where $X$ and $Y$ are replaced by sets of random variables.
Example: $P(X=0 \mid Y=0) = 0.6$; $P(X=0 \mid Y=1) = 0.3$; $P(X=1 \mid Y=0) = 0.4$; $P(X=1 \mid Y=1) = 0.7$.
$E(X \mid Y=1) = 0 \cdot P(X=0 \mid Y=1) + 1 \cdot P(X=1 \mid Y=1) = 0.7$
$E(X \mid Y=0) = 0 \cdot 0.6 + 1 \cdot 0.4 = 0.4$
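The example can be reproduced directly (a minimal sketch using the conditional probability table above):

# Conditional table from the example: keys are (x, y), values are P(X=x | Y=y).
p_x_given_y = {
    (0, 0): 0.6, (1, 0): 0.4,   # P(X=0|Y=0), P(X=1|Y=0)
    (0, 1): 0.3, (1, 1): 0.7,   # P(X=0|Y=1), P(X=1|Y=1)
}

def cond_expectation(y):
    # E(X | Y = y) = sum over x of x * P(X = x | Y = y)
    return sum(x * p for (x, yy), p in p_x_given_y.items() if yy == y)

print(cond_expectation(1))  # 0*0.3 + 1*0.7 = 0.7
print(cond_expectation(0))  # 0*0.6 + 1*0.4 = 0.4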

Properties of Expectation and Variance
Let $X, X_1, \ldots, X_n$ be random variables and $a, b, c, c_1, \ldots, c_n$ be real numbers.
If $X$ has mean $\mu$ and variance $\sigma^2$, then $aX + b$ has mean $a\mu + b$ and variance $a^2\sigma^2$.
For any constant $c$, $E(cX \mid B) = c\,E(X \mid B)$, and $E\big(\sum_j c_j X_j \mid B\big) = \sum_j c_j\, E(X_j \mid B)$.
If $X_1, \ldots, X_n$ are independent given $B$, then $\mathrm{Var}\big(\sum_i c_i X_i \mid B\big) = \sum_i c_i^2\, \mathrm{Var}(X_i \mid B)$.
Proof of these results is left as an exercise.
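A numeric sanity check of the first identity (a sketch; the distribution below is an arbitrary example, not from the slides):

# An arbitrary discrete random variable: values with probabilities (sum to 1).
values = [0, 1, 2, 3]
probs = [0.1, 0.4, 0.3, 0.2]

def mean(vals, ps):
    return sum(v * p for v, p in zip(vals, ps))

def var(vals, ps):
    m = mean(vals, ps)
    return sum((v - m) ** 2 * p for v, p in zip(vals, ps))

a, b = 3.0, -2.0
mu, sigma2 = mean(values, probs), var(values, probs)

# aX + b takes value a*v + b with the same probabilities as X.
scaled = [a * v + b for v in values]
assert abs(mean(scaled, probs) - (a * mu + b)) < 1e-12
assert abs(var(scaled, probs) - a * a * sigma2) < 1e-12
print("E(aX+b) = a E(X) + b and Var(aX+b) = a^2 Var(X): verified")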

Learning as Bayesian Inference
"Probability is the logic of science." (Jaynes)
Bayesian subjective probability provides a basis for updating beliefs based on evidence. By updating beliefs about hypotheses based on data, we can learn about the world.
The Bayesian framework provides a sound probabilistic basis for understanding many learning algorithms and for designing new algorithms.
The Bayesian framework provides several practical reasoning and learning algorithms.

Classification Using Bayesian Decision Theory
Consider the problem of classifying an instance $X$ into one of two mutually exclusive classes $\omega_1$ or $\omega_2$.
$P(\omega_1 \mid X)$ = probability of class $\omega_1$ given the evidence $X$
$P(\omega_2 \mid X)$ = probability of class $\omega_2$ given the evidence $X$
What is the probability of error?
$P(\text{error} \mid X) = P(\omega_1 \mid X)$ if we choose $\omega_2$; $P(\text{error} \mid X) = P(\omega_2 \mid X)$ if we choose $\omega_1$.

Minimum Error Classification
To minimize classification error:
Choose $\omega_1$ if $P(\omega_1 \mid X) > P(\omega_2 \mid X)$; choose $\omega_2$ if $P(\omega_2 \mid X) > P(\omega_1 \mid X)$,
which yields $P(\text{error} \mid X) = \min\,[\,P(\omega_1 \mid X),\, P(\omega_2 \mid X)\,]$.
We have: $P(\omega_i \mid X) = \dfrac{p(X \mid \omega_i)\, P(\omega_i)}{p(X)}$.

Classification Using Bayesian Decision Theory
Choose $\omega_1$ if $P(\omega_1 \mid X) > P(\omega_2 \mid X)$, i.e., $p(X \mid \omega_1)\,P(\omega_1) > p(X \mid \omega_2)\,P(\omega_2)$.
Choose $\omega_2$ if $P(\omega_2 \mid X) > P(\omega_1 \mid X)$, i.e., $p(X \mid \omega_2)\,P(\omega_2) > p(X \mid \omega_1)\,P(\omega_1)$.
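As an illustration, a sketch of this two-class rule with hypothetical one-dimensional Gaussian class-conditional densities (the means, variances, and priors are invented for the example):

import math

def gaussian_pdf(x, mu, sigma):
    # Density of N(mu, sigma^2) at x.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

prior = {1: 0.6, 2: 0.4}                                # P(w1), P(w2)
likelihood = {1: lambda x: gaussian_pdf(x, 0.0, 1.0),   # p(x | w1)
              2: lambda x: gaussian_pdf(x, 2.0, 1.0)}   # p(x | w2)

def bayes_decide(x):
    # Choose w1 if p(x|w1) P(w1) > p(x|w2) P(w2), else w2.
    s1 = likelihood[1](x) * prior[1]
    s2 = likelihood[2](x) * prior[2]
    return 1 if s1 > s2 else 2

for x in (-1.0, 1.0, 1.2, 3.0):
    print(x, "->", bayes_decide(x))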

Optimality of the Bayesian Decision Rule
We can show that the Bayesian classifier is optimal in that it is guaranteed to minimize the probability of misclassification.
Proof: consider the probability of error given $X$ when $X$ is assigned to each class, and integrate over the input space (next slides).

Optimality of the Bayes Decision Rule
[Figure: probability of error for the decision rule]

Optimality of the Bayes Decision Rule
$P(\text{error}) = \int P(\text{error}, X)\, dX = \int P(\text{error} \mid X)\, p(X)\, dX$
With decision regions $R_1$ (choose $\omega_1$) and $R_2$ (choose $\omega_2$):
$P(\text{error}) = \int_{R_1} P(\omega_2 \mid X)\, p(X)\, dX + \int_{R_2} P(\omega_1 \mid X)\, p(X)\, dX$
Applying Bayes rule:
$P(\text{error}) = \int_{R_1} p(X \mid \omega_2)\, P(\omega_2)\, dX + \int_{R_2} p(X \mid \omega_1)\, P(\omega_1)\, dX$

Optimality of the Bayes Decision Rule
$P(\text{error}) = \int_{R_1} p(X \mid \omega_2)\, P(\omega_2)\, dX + \int_{R_2} p(X \mid \omega_1)\, P(\omega_1)\, dX$
Because $R_1 \cup R_2$ covers the entire input space, $P(\text{error})$ is minimized by choosing $R_1$ to be the region such that $p(X \mid \omega_1)P(\omega_1) > p(X \mid \omega_2)P(\omega_2)$, and $R_2$ such that $p(X \mid \omega_2)P(\omega_2) > p(X \mid \omega_1)P(\omega_1)$: each point then contributes the smaller of the two integrands to the error.
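The argument can be checked numerically by approximating the integrals on a grid (a sketch reusing the hypothetical Gaussian example above; any other choice of regions integrates the larger term somewhere and can only increase the error):

import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

p1, p2 = 0.6, 0.4                            # priors P(w1), P(w2) (hypothetical)
f1 = lambda x: gaussian_pdf(x, 0.0, 1.0)     # p(x | w1)
f2 = lambda x: gaussian_pdf(x, 2.0, 1.0)     # p(x | w2)

# Riemann sum of P(error): under the Bayes rule, each x contributes the
# smaller integrand min(p(x|w1)P(w1), p(x|w2)P(w2)).
dx = 0.001
xs = [i * dx for i in range(-10000, 10000)]
bayes_error = sum(min(f1(x) * p1, f2(x) * p2) for x in xs) * dx

# Compare against a suboptimal partition, e.g. R1 = whole space ("always w1"):
error_always_w1 = sum(f2(x) * p2 for x in xs) * dx

print("Bayes error ~", round(bayes_error, 4))
print("Error of 'always w1' ~", round(error_always_w1, 4))  # strictly larger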

Optimality of the Bayes Decision Rule
The proof generalizes to multivariate input spaces. A similar result can be proved in the case of discrete (as opposed to continuous) input spaces: replace the integral over the input space by a sum.

Bayes Decision Rule Yields Minimum Error Classification
To minimize classification error:
Choose $\omega_1$ if $P(\omega_1 \mid X) > P(\omega_2 \mid X)$; choose $\omega_2$ if $P(\omega_2 \mid X) > P(\omega_1 \mid X)$,
which yields $P(\text{error} \mid X) = \min\,[\,P(\omega_1 \mid X),\, P(\omega_2 \mid X)\,]$.

Bayes Decision Rule
[Figure: behavior of the Bayes decision rule as a function of the prior probability of the classes]

Bayes Optimal Classifier
Classification rule that guarantees minimum error:
Choose $\omega_1$ if $p(X \mid \omega_1)\,P(\omega_1) > p(X \mid \omega_2)\,P(\omega_2)$; choose $\omega_2$ if $p(X \mid \omega_2)\,P(\omega_2) > p(X \mid \omega_1)\,P(\omega_1)$.
If $P(\omega_1) = P(\omega_2)$, classification depends entirely on the likelihoods $p(X \mid \omega_1)$ and $p(X \mid \omega_2)$. If $p(X \mid \omega_1) = p(X \mid \omega_2)$, classification depends entirely on the priors $P(\omega_1)$ and $P(\omega_2)$. The Bayes classification rule combines the effect of the two terms optimally, so as to yield minimum error classification.
Generalization to multiple classes: assign $X$ to class $\omega_c$ where $c = \arg\max_j\, p(X \mid \omega_j)\, P(\omega_j)$.
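The multi-class rule is a one-line argmax (a minimal sketch with invented discrete likelihood tables):

# Hypothetical priors P(w_j) and discrete likelihoods P(x | w_j) for 3 classes.
priors = {"w1": 0.5, "w2": 0.3, "w3": 0.2}
likelihoods = {
    "w1": {"a": 0.7, "b": 0.2, "c": 0.1},
    "w2": {"a": 0.1, "b": 0.8, "c": 0.1},
    "w3": {"a": 0.2, "b": 0.2, "c": 0.6},
}

def bayes_classify(x):
    # Return arg max over classes w of p(x | w) P(w).
    return max(priors, key=lambda w: likelihoods[w][x] * priors[w])

for x in ("a", "b", "c"):
    print(x, "->", bayes_classify(x))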

Minimum Risk Classification
Let $\lambda_{ij}$ = risk or cost associated with assigning an instance to class $\omega_i$ when the correct classification is class $\omega_j$.
$R(\omega_i \mid X)$ = expected loss incurred in assigning $X$ to class $\omega_i$:
$R(\omega_1 \mid X) = \lambda_{11}\, P(\omega_1 \mid X) + \lambda_{12}\, P(\omega_2 \mid X)$
$R(\omega_2 \mid X) = \lambda_{21}\, P(\omega_1 \mid X) + \lambda_{22}\, P(\omega_2 \mid X)$
Classification rule that guarantees minimum risk: choose $\omega_1$ if $R(\omega_1 \mid X) < R(\omega_2 \mid X)$; choose $\omega_2$ if $R(\omega_2 \mid X) < R(\omega_1 \mid X)$; flip a fair coin otherwise.

Minimum Risk Classification
$\lambda_{ij}$ = risk or cost associated with assigning an instance to class $\omega_i$ when the correct classification is $\omega_j$.
Ordinarily $\lambda_{21} - \lambda_{11}$ and $\lambda_{12} - \lambda_{22}$ are positive: the cost of being correct is less than the cost of error.
So we choose $\omega_1$ if $(\lambda_{21} - \lambda_{11})\, P(\omega_1 \mid X) > (\lambda_{12} - \lambda_{22})\, P(\omega_2 \mid X)$; otherwise choose $\omega_2$.
The minimum error classification rule is a special case: $\lambda_{ij} = 0$ if $i = j$ and $\lambda_{ij} = 1$ if $i \neq j$.
This classification rule can be shown to be optimal in that it is guaranteed to minimize the risk of misclassification.
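The risk computation and decision follow directly from the definitions (a sketch; the loss matrix and posterior values are illustrative):

import random

# Illustrative loss matrix: lam[i][j] = cost of deciding w_i when truth is w_j.
lam = {1: {1: 0.0, 2: 5.0},   # deciding w1 is free if correct, costly if truth is w2
       2: {1: 1.0, 2: 0.0}}   # deciding w2 has a mild cost if truth is w1

def conditional_risk(i, posterior):
    # R(w_i | x) = sum over j of lam[i][j] * P(w_j | x)
    return sum(lam[i][j] * posterior[j] for j in posterior)

def min_risk_decide(posterior):
    r1 = conditional_risk(1, posterior)
    r2 = conditional_risk(2, posterior)
    if r1 < r2:
        return 1
    if r2 < r1:
        return 2
    return random.choice([1, 2])   # flip a coin otherwise

# With asymmetric losses the rule can pick w2 even when P(w1 | x) > 0.5:
print(min_risk_decide({1: 0.7, 2: 0.3}))  # R(w1)=1.5, R(w2)=0.7 -> chooses 2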

Summary of the Bayesian Recipe for Classification
$\lambda_{ij}$ = risk or cost associated with assigning an instance to class $\omega_i$ when the correct classification is $\omega_j$.
Choose $\omega_1$ if $R(\omega_1 \mid X) < R(\omega_2 \mid X)$, i.e., if $(\lambda_{21} - \lambda_{11})\, p(X \mid \omega_1)\, P(\omega_1) > (\lambda_{12} - \lambda_{22})\, p(X \mid \omega_2)\, P(\omega_2)$.
Choose $\omega_2$ if $R(\omega_2 \mid X) < R(\omega_1 \mid X)$. Otherwise choose at random.
The minimum error classification rule is the special case $\lambda_{ij} = 0$ if $i = j$, $\lambda_{ij} = 1$ if $i \neq j$.

Summary of the Bayesian Recipe for Classification
The Bayesian recipe is simple, optimal, and, in principle, straightforward to apply.
To use this recipe in practice, we need to know $p(X \mid \omega_j)$ and $P(\omega_j)$. Because these probabilities are unknown, we need to estimate them from data, or learn them!
$X$ is typically high-dimensional, so we need to estimate $p(X \mid \omega_j)$ from limited data.