Bayesian Decision Theory

- Assume we know the probability distribution of the categories.
- This is almost never the case in real life!
- Nevertheless it is useful, since other cases can be reduced to this one after some work.
- We do not even need training data.
- We can design the optimal classifier.

Bayesian Decision Theory: Fish Example
- Each fish is in one of 2 states: sea bass or salmon.
- Let $\omega$ denote the state of nature: $\omega = \omega_1$ for sea bass, $\omega = \omega_2$ for salmon.
- The state of nature is unpredictable, so $\omega$ is a variable that must be described probabilistically.
- If the catch produced as much salmon as sea bass, the next fish is equally likely to be sea bass or salmon.
- Define:
  $P(\omega_1)$: a priori probability that the next fish is sea bass
  $P(\omega_2)$: a priori probability that the next fish is salmon

Bayesian Decision Theory (continued)
- If other types of fish are irrelevant: $P(\omega_1) + P(\omega_2) = 1$.
- Prior probabilities reflect our prior knowledge (e.g. time of year, fishing area, ...).
- Simple decision rule: make a decision without seeing the fish.
  Decide $\omega_1$ if $P(\omega_1) > P(\omega_2)$; decide $\omega_2$ otherwise.
- This is OK if deciding for one fish.
- If there are several fish, all are assigned to the same class.
- In general, we have some features and more information.

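Cats and Dogs
- Suppose we have these conditional probability mass functions for cats and dogs:
  $P(\text{small ears} \mid \text{dog}) = 0.1$, $P(\text{large ears} \mid \text{dog}) = 0.9$
  $P(\text{small ears} \mid \text{cat}) = 0.8$, $P(\text{large ears} \mid \text{cat}) = 0.2$
- Observe an animal with large ears. Dog or cat?
- It makes sense to say dog, because the probability of observing large ears in a dog is much larger than the probability of observing large ears in a cat:
  $\Pr[\text{large ears} \mid \text{dog}] = 0.9 > 0.2 = \Pr[\text{large ears} \mid \text{cat}]$
- We choose the event of larger probability, i.e. the maximum likelihood event.
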
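The cats and dogs decision can be sketched in a few lines. This is only an illustration: the dictionary layout and the name `ml_decision` are assumptions made here, and the probabilities are the ones on the slide.

```python
# Conditional probability mass functions P(ear size | animal) from the slide.
likelihood = {
    "dog": {"small ears": 0.1, "large ears": 0.9},
    "cat": {"small ears": 0.8, "large ears": 0.2},
}

def ml_decision(observation):
    """Return the animal under which the observation is most probable."""
    return max(likelihood, key=lambda animal: likelihood[animal][observation])

print(ml_decision("large ears"))  # dog
print(ml_decision("small ears"))  # cat
```

Note that no prior over cats and dogs is used here; only the likelihood of the observation under each class is compared.
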
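Example: Fish Sorting
- A respected fish expert says that
  salmon's length has distribution $N(5, 1)$
  sea bass's length has distribution $N(10, 4)$
- Recall: if a r.v. is $N(\mu, \sigma^2)$, then its density is
  $$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
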
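Class Conditional Densities
Here the class is fixed and the length $l$ varies:
$$p(l \mid \text{salmon}) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(l-5)^2}{2}\right)$$
$$p(l \mid \text{bass}) = \frac{1}{2\sqrt{2\pi}} \exp\left(-\frac{(l-10)^2}{2 \cdot 4}\right)$$
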
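Likelihood function
- Fix the length $l$, let the fish class vary. Then we get the likelihood function (it is not a density and not a probability mass):
  $$p(l \mid \text{class}) = \begin{cases} \dfrac{1}{\sqrt{2\pi}} \exp\left(-\dfrac{(l-5)^2}{2}\right) & \text{if class = salmon} \\[2ex] \dfrac{1}{2\sqrt{2\pi}} \exp\left(-\dfrac{(l-10)^2}{8}\right) & \text{if class = bass} \end{cases}$$
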
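To make the distinction concrete, here is a minimal sketch that evaluates the likelihood function at a fixed length. The helper names `normal_pdf`, `PARAMS` and `likelihood` are assumptions for illustration; the parameters are the expert's N(5, 1) and N(10, 4).

```python
import math

def normal_pdf(x, mean, variance):
    """Density of N(mean, variance) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

# (mean, variance) of length for each class, as given by the fish expert.
PARAMS = {"salmon": (5.0, 1.0), "bass": (10.0, 4.0)}

def likelihood(length):
    """Likelihood function: the length is fixed, the class varies."""
    return {c: normal_pdf(length, m, v) for c, (m, v) in PARAMS.items()}

values = likelihood(7.0)
print(values)                # roughly {'salmon': 0.054, 'bass': 0.065}
print(sum(values.values()))  # roughly 0.119, so the values do not sum to 1
```

The two values do not sum to 1 over the classes, which is why the likelihood function is neither a density in the class nor a probability mass.
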
Likelihood vs. Class Conditional Density
[Figure: $p(l \mid \text{class})$ plotted against length, with $l = 7$ marked.]
Suppose a fish has length 7. How do we classify it?

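ML (maximum likelihood) Classifier
- We would like to choose salmon if
  $\Pr[\text{length} = 7 \mid \text{salmon}] > \Pr[\text{length} = 7 \mid \text{bass}]$
- However, since length is a continuous r.v.,
  $\Pr[\text{length} = 7 \mid \text{salmon}] = \Pr[\text{length} = 7 \mid \text{bass}] = 0$
- Instead, we choose the class which maximizes the likelihood:
  $$p(l \mid \text{salmon}) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(l-5)^2}{2}\right), \qquad p(l \mid \text{bass}) = \frac{1}{2\sqrt{2\pi}} \exp\left(-\frac{(l-10)^2}{2 \cdot 4}\right)$$
- ML classifier: for an observed $l$,
  $$p(l \mid \text{salmon}) \underset{\text{bass}}{\overset{\text{salmon}}{\gtrless}} p(l \mid \text{bass})$$
  In words: if $p(l \mid \text{salmon}) > p(l \mid \text{bass})$, classify as salmon, else classify as bass.
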
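ML (maximum likelihood) Classifier
[Figure: the two likelihood curves over length, with $p(7 \mid \text{bass})$ and $p(7 \mid \text{salmon})$ marked.]
At $l = 7$ we have $p(7 \mid \text{bass}) \approx 0.065 > p(7 \mid \text{salmon}) \approx 0.054$. Thus we choose the class (bass) which is more likely to have given the observation 7.
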
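Decision Boundary
[Figure: length axis split at the decision boundary, about 6.93 (the length at which the two likelihoods are equal): classify as salmon to the left, classify as sea bass to the right.]
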
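The ML decision boundary is the length at which the two likelihoods are equal. A minimal sketch, assuming the N(5, 1) and N(10, 4) length models from the fish expert, locates it by bisection between the two means; the function names are illustrative assumptions.

```python
import math

def log_normal_pdf(x, mean, variance):
    """Natural log of the N(mean, variance) density at x."""
    return -0.5 * math.log(2 * math.pi * variance) - (x - mean) ** 2 / (2 * variance)

def log_likelihood_gap(length):
    """log p(l | salmon) - log p(l | bass); positive means ML picks salmon."""
    return log_normal_pdf(length, 5.0, 1.0) - log_normal_pdf(length, 10.0, 4.0)

def ml_boundary(lo=5.0, hi=10.0, tol=1e-9):
    """Bisection for the length between the two means where the gap is zero."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if log_likelihood_gap(mid) > 0:
            lo = mid   # still on the salmon side
        else:
            hi = mid   # already on the bass side
    return (lo + hi) / 2

boundary = ml_boundary()
print(round(boundary, 2))  # 6.93
```

Working with log densities avoids underflow far from the means; the sign of the gap is the same as the sign of the difference of the densities.
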
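How Does the Prior Change the Decision Boundary?
- Without priors:
  [Figure: length axis, salmon to the left of the boundary at about 6.93, sea bass to the right.]
- How should this change with the priors $P(\text{salmon}) = 2/3$, $P(\text{bass}) = 1/3$?
  [Figure: the same axis, with the direction in which the boundary moves marked "??".]
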
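Bayes Decision Rule
1. Have likelihood functions $p(\text{length} \mid \text{salmon})$ and $p(\text{length} \mid \text{bass})$.
2. Have priors $P(\text{salmon})$ and $P(\text{bass})$.
- Question: having observed a fish of a certain length, do we classify it as salmon or bass?
- Natural idea:
  salmon if $P(\text{salmon} \mid \text{length}) > P(\text{bass} \mid \text{length})$
  bass if $P(\text{bass} \mid \text{length}) > P(\text{salmon} \mid \text{length})$
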
Posterior
- $P(\text{salmon} \mid \text{length})$ and $P(\text{bass} \mid \text{length})$ are called posterior distributions, because the data (length) was revealed (post data).
- How to compute posteriors? Not obvious.
- From Bayes rule:
  $$P(\text{salmon} \mid \text{length}) = \frac{p(\text{salmon}, \text{length})}{p(\text{length})} = \frac{p(\text{length} \mid \text{salmon})\, P(\text{salmon})}{p(\text{length})}$$
- Similarly:
  $$P(\text{bass} \mid \text{length}) = \frac{p(\text{length} \mid \text{bass})\, P(\text{bass})}{p(\text{length})}$$

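MAP (maximum a posteriori) classifier
$$P(\text{salmon} \mid \text{length}) \underset{\text{bass}}{\overset{\text{salmon}}{\gtrless}} P(\text{bass} \mid \text{length})$$
$$\frac{p(\text{length} \mid \text{salmon})\, P(\text{salmon})}{p(\text{length})} \underset{\text{bass}}{\overset{\text{salmon}}{\gtrless}} \frac{p(\text{length} \mid \text{bass})\, P(\text{bass})}{p(\text{length})}$$
$$p(\text{length} \mid \text{salmon})\, P(\text{salmon}) \underset{\text{bass}}{\overset{\text{salmon}}{\gtrless}} p(\text{length} \mid \text{bass})\, P(\text{bass})$$
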
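Back to Fish Sorting Example
- Likelihoods:
  $$p(l \mid \text{salmon}) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(l-5)^2}{2}\right), \qquad p(l \mid \text{bass}) = \frac{1}{2\sqrt{2\pi}} \exp\left(-\frac{(l-10)^2}{8}\right)$$
- Priors: $P(\text{salmon}) = 2/3$, $P(\text{bass}) = 1/3$
- Solve the inequality (classify as salmon when it holds):
  $$\frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(l-5)^2}{2}\right) \cdot \frac{2}{3} > \frac{1}{2\sqrt{2\pi}} \exp\left(-\frac{(l-10)^2}{8}\right) \cdot \frac{1}{3}$$
  [Figure: length axis with the boundary without priors at about 6.93 and the new decision boundary at 7.18; salmon to the left, sea bass to the right.]
- The new decision boundary makes sense, since we expect to see more salmon.
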
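Prior $P(s) = 2/3$ and $P(b) = 1/3$ vs. Prior $P(s) = 0.999$ and $P(b) = 0.001$
[Figure: length axis with the two decision boundaries, about 7.18 for the first prior and about 8.9 for the second; salmon to the left, bass to the right.]
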
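The dependence of the boundary on the prior can be written in closed form for this example. The sketch below is an illustration under the N(5, 1) and N(10, 4) models; `map_boundary` is a name chosen here, and the formula only applies while the square root argument stays non-negative (for very small salmon priors there is no length at which salmon wins).

```python
import math

def map_boundary(prior_salmon):
    """Length above which the MAP decision switches from salmon to bass.

    Taking logs of p(l|salmon) P(salmon) > p(l|bass) P(bass) for
    N(5, 1) and N(10, 4) and multiplying by 8 gives
        3 l^2 - 20 l + k < 0,   k = 8 ln(P(bass) / (2 P(salmon))),
    so the boundary is the larger root of the quadratic.
    """
    prior_bass = 1.0 - prior_salmon
    k = 8 * math.log(prior_bass / (2 * prior_salmon))
    return (20 + math.sqrt(400 - 12 * k)) / 6

for prior in (0.5, 2 / 3, 0.999):
    print(prior, round(map_boundary(prior), 2))
```

The three priors give boundaries of about 6.93, 7.18 and 8.93: the stronger the prior belief in salmon, the longer a fish must be before it is called a bass.
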
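Likelihood vs Posteriors
[Figure: two panels over length, the likelihoods $p(l \mid \text{salmon})$, $p(l \mid \text{bass})$ and the posteriors $P(\text{salmon} \mid l)$, $P(\text{bass} \mid l)$.]
- Likelihood $p(l \mid \text{fish class})$: a density with respect to length, so the area under the curve is 1.
- Posterior $P(\text{fish class} \mid l)$: a mass function with respect to fish class, so for each $l$, $P(\text{salmon} \mid l) + P(\text{bass} \mid l) = 1$.
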
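More on Posterior
$$P(c \mid l) = \frac{p(l \mid c)\, P(c)}{p(l)}$$
- $P(c \mid l)$: posterior (our goal)
- $p(l \mid c)$: likelihood (given)
- $P(c)$: prior (given)
- $p(l)$: normalizing factor. We often do not even need it for classification, since $p(l)$ does not depend on the class $c$. If we do need it, from the law of total probability:
  $$p(l) = p(l \mid \text{salmon})\, P(\text{salmon}) + p(l \mid \text{bass})\, P(\text{bass})$$
- Notice this formula consists of likelihoods and priors, which are given.
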
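A minimal sketch of the posterior computation for the fish example, assuming the N(5, 1) and N(10, 4) likelihoods; the function names are illustrative, not part of the lecture.

```python
import math

def normal_pdf(x, mean, variance):
    """Density of N(mean, variance) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def posterior(length, prior_salmon):
    """Return P(class | length) for both classes via Bayes rule."""
    joint = {
        "salmon": normal_pdf(length, 5.0, 1.0) * prior_salmon,
        "bass": normal_pdf(length, 10.0, 4.0) * (1.0 - prior_salmon),
    }
    evidence = sum(joint.values())  # p(l), by the law of total probability
    return {c: value / evidence for c, value in joint.items()}

post = posterior(7.0, prior_salmon=2 / 3)
print(post)  # roughly {'salmon': 0.625, 'bass': 0.375}
```

With the prior 2/3 for salmon, a fish of length 7 is classified as salmon, even though the likelihood alone favors bass at that length.
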
More on Priors
- The prior comes from prior knowledge; no data has been seen yet.
- If there is a reliable source of prior knowledge, it should be used.
- Some problems cannot even be solved reliably without a good prior.

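More on MAP Classifier
$$P(c \mid l) = \frac{p(l \mid c)\, P(c)}{p(l)} \qquad \text{(posterior = likelihood} \times \text{prior / evidence)}$$
- We do not care about $p(l)$ when maximizing $P(c \mid l)$:
  $$P(c \mid l) \propto p(l \mid c)\, P(c)$$
- If $P(\text{salmon}) = P(\text{bass})$ (uniform prior), the MAP classifier becomes the ML classifier:
  $$P(c \mid l) \propto p(l \mid c)$$
- If for some observation $l$, $p(l \mid \text{salmon}) = p(l \mid \text{bass})$, then this observation is uninformative and the decision is based solely on the prior:
  $$P(c \mid l) \propto P(c)$$
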
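Justification for MAP Classifier
- Let's compute the probability of error for the MAP estimate:
  $$P(\text{salmon} \mid l) \underset{\text{bass}}{\overset{\text{salmon}}{\gtrless}} P(\text{bass} \mid l)$$
- For any particular $l$, the probability of error is
  $$\Pr[\text{error} \mid l] = \begin{cases} P(\text{bass} \mid l) & \text{if we decide salmon} \\ P(\text{salmon} \mid l) & \text{if we decide bass} \end{cases}$$
- MAP decides for the class with the larger posterior, so the error is the smaller of the two. Thus the MAP classifier is optimal for each individual $l$!
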
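Justification for MAP Classifier
- We want to minimize the error not just for one $l$; we really want to minimize the average error over all $l$:
  $$\Pr[\text{error}] = \int p(\text{error}, l)\, dl = \int \Pr[\text{error} \mid l]\, p(l)\, dl$$
- If $\Pr[\text{error} \mid l]$ is as small as possible for every $l$, the integral is as small as possible.
- But the Bayes (MAP) decision rule makes $\Pr[\text{error} \mid l]$ as small as possible.
- Thus the MAP classifier minimizes the probability of error!
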
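The claim can be checked numerically on the fish example. The sketch below is an illustration under stated assumptions: it only considers rules of the form "salmon if length < threshold", uses the N(5, 1) and N(10, 4) models with priors 2/3 and 1/3, and takes 7.18 as the MAP boundary for those priors. The function names are chosen here.

```python
import math

def normal_cdf(x, mean, std):
    """P(X <= x) for X ~ N(mean, std^2), via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (std * math.sqrt(2))))

def error_probability(threshold, prior_salmon=2 / 3):
    """Pr[error] for the rule 'salmon if length < threshold, else bass'."""
    prior_bass = 1.0 - prior_salmon
    bass_called_salmon = prior_bass * normal_cdf(threshold, 10.0, 2.0)
    salmon_called_bass = prior_salmon * (1 - normal_cdf(threshold, 5.0, 1.0))
    return bass_called_salmon + salmon_called_bass

for t in (6.0, 6.93, 7.18, 8.0):
    print(t, round(error_probability(t), 4))
```

Among the printed thresholds, the error is smallest at 7.18, and moving the threshold in either direction increases it.
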
More General Case
- Let's generalize a little bit.
- Have more than one feature: $x = [x_1, x_2, \ldots, x_d]$
- Have more than 2 classes: $\{c_1, c_2, \ldots, c_m\}$

More General Case
- As before, for each $j$ we have
  $p(x \mid c_j)$: the likelihood of observation $x$ given that the true class is $c_j$
  $P(c_j)$: the prior probability of class $c_j$
  $P(c_j \mid x)$: the posterior probability of class $c_j$ given that we observed data $x$
- Evidence, or probability density for the data:
  $$p(x) = \sum_{j=1}^{m} p(x \mid c_j)\, P(c_j)$$

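Minimum Error Rate Classification
- Want to minimize the average probability of error:
  $$\Pr[\text{error}] = \int p(\text{error}, x)\, dx = \int \Pr[\text{error} \mid x]\, p(x)\, dx$$
  so we need to make $\Pr[\text{error} \mid x]$ as small as possible.
- $\Pr[\text{error} \mid x] = 1 - P(c_i \mid x)$ if we decide class $c_i$.
- $\Pr[\text{error} \mid x]$ is minimized with the MAP classifier: decide on class $c_i$ if
  $$P(c_i \mid x) > P(c_j \mid x) \quad \forall j \neq i$$
- The MAP classifier is optimal if we want to minimize the probability of error.
[Figure: for each of the classes $c_1$, $c_2$, $c_3$, the posterior $P(c_j \mid x)$ shown next to its complement $1 - P(c_j \mid x)$.]
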
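For m classes the MAP decision and its conditional error can be sketched as follows. The function name `map_decision` and the numbers in the example are hypothetical; only the formulas come from the slides.

```python
def map_decision(likelihoods, priors):
    """Return (index of the MAP class, posteriors, Pr[error | x]).

    likelihoods[j] is p(x | c_j) at the observed x, priors[j] is P(c_j).
    """
    joint = [lik * prior for lik, prior in zip(likelihoods, priors)]
    evidence = sum(joint)                       # p(x)
    posteriors = [value / evidence for value in joint]
    best = max(range(len(posteriors)), key=posteriors.__getitem__)
    return best, posteriors, 1.0 - posteriors[best]

# Hypothetical numbers for m = 3 classes at one observation x.
best, posteriors, error = map_decision([0.2, 0.5, 0.1], [0.5, 0.3, 0.2])
print(best)  # 1, i.e. the second class
```

Any other choice of class at this x would have a conditional error of one minus a smaller posterior, hence a larger error.
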
General Bayesian Decision Theory
- In close cases we may want to refuse to make a decision (let a human expert handle the tough case):
  allow actions $\{\alpha_1, \alpha_2, \ldots, \alpha_k\}$
- Suppose some mistakes are more costly than others (classifying a benign tumor as cancer is not as bad as classifying cancer as a benign tumor):
  allow loss functions $\lambda(\alpha_i \mid c_j)$ describing the loss incurred when taking action $\alpha_i$ when the true class is $c_j$

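Conditional Risk
- Suppose we observe $x$ and wish to take action $\alpha_i$.
- If the true class is $c_j$, by definition we incur loss $\lambda(\alpha_i \mid c_j)$.
- The probability that the true class is $c_j$ after observing $x$ is $P(c_j \mid x)$.
- The expected loss associated with taking action $\alpha_i$ is called the conditional risk, and it is:
  $$R(\alpha_i \mid x) = \sum_{j=1}^{m} \lambda(\alpha_i \mid c_j)\, P(c_j \mid x)$$
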
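Conditional Risk
$$R(\alpha_i \mid x) = \sum_{j=1}^{m} \lambda(\alpha_i \mid c_j)\, P(c_j \mid x)$$
- $R(\alpha_i \mid x)$: penalty for taking action $\alpha_i$ if we observe $x$
- $\sum_{j=1}^{m}$: sum over disjoint events (different classes)
- $P(c_j \mid x)$: probability of class $c_j$ given observation $x$
- $\lambda(\alpha_i \mid c_j)\, P(c_j \mid x)$: the part of the overall penalty which comes from the event that the true class is $c_j$
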
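The conditional risk is a weighted sum, so it is short to compute. The sketch below uses the tumor setting with a third action that defers to a human expert; all loss values and names are hypothetical and chosen only to illustrate the formula.

```python
# Hypothetical losses lambda(action | true class); the numbers are illustrative only.
LOSS = {
    "say benign": {"benign": 0.0, "cancer": 10.0},
    "say cancer": {"benign": 1.0, "cancer": 0.0},
    "refer to expert": {"benign": 0.5, "cancer": 0.5},
}

def conditional_risk(action, posterior):
    """R(action | x): sum over classes of loss times posterior."""
    return sum(LOSS[action][c] * p for c, p in posterior.items())

def bayes_action(posterior):
    """Action with the smallest conditional risk."""
    return min(LOSS, key=lambda action: conditional_risk(action, posterior))

print(bayes_action({"benign": 0.99, "cancer": 0.01}))  # say benign
print(bayes_action({"benign": 0.70, "cancer": 0.30}))  # refer to expert
print(bayes_action({"benign": 0.20, "cancer": 0.80}))  # say cancer
```

With these made-up losses, deferring to the expert is chosen exactly in the close cases, where either outright decision would carry a higher expected loss.
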
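Example: Zero-One loss function
- Action $\alpha_i$ is the decision that the true class is $c_i$.
  $$\lambda(\alpha_i \mid c_j) = \begin{cases} 0 & \text{if } i = j \quad \text{(no mistake)} \\ 1 & \text{otherwise} \quad \text{(mistake)} \end{cases}$$
  $$R(\alpha_i \mid x) = \sum_{j=1}^{m} \lambda(\alpha_i \mid c_j)\, P(c_j \mid x) = \sum_{j \neq i} P(c_j \mid x) = 1 - P(c_i \mid x) = \Pr[\text{error if decide } c_i]$$
- Thus the MAP classifier optimizes $R(\alpha_i \mid x)$: decide $c_i$ if
  $$P(c_i \mid x) > P(c_j \mid x) \quad \forall j \neq i$$
- The MAP classifier is the Bayes decision rule under the zero-one loss function.
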
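A quick numerical check of the identity above, as a sketch with hypothetical posterior values; `zero_one_risk` is a name chosen here.

```python
def zero_one_risk(posteriors):
    """Conditional risk R(alpha_i | x) of every decision under zero-one loss."""
    m = len(posteriors)
    return [
        sum((0 if i == j else 1) * posteriors[j] for j in range(m))
        for i in range(m)
    ]

posteriors = [0.2, 0.5, 0.3]   # hypothetical P(c_j | x)
risks = zero_one_risk(posteriors)
print(risks)                   # each entry equals 1 - P(c_i | x), up to rounding
```

The decision with the smallest risk is the one with the largest posterior, which is the MAP decision.
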
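Overall Risk
- A decision rule is a function $\alpha(x)$ which for every $x$ specifies an action out of $\{\alpha_1, \alpha_2, \ldots, \alpha_k\}$.
  [Figure: points $x_1, x_2, x_3$ in the feature space $X$ mapped to actions $\alpha(x_1), \alpha(x_2), \alpha(x_3)$ in $\{\alpha_1, \alpha_2, \ldots, \alpha_k\}$.]
- The average risk for $\alpha(x)$ is
  $$R(\alpha) = \int R(\alpha(x) \mid x)\, p(x)\, dx$$
  and we need to make this as small as possible.
- Bayes decision rule: $\alpha(x)$ for every $x$ is the action which minimizes the conditional risk
  $$R(\alpha_i \mid x) = \sum_{j=1}^{m} \lambda(\alpha_i \mid c_j)\, P(c_j \mid x)$$
- The Bayes decision rule $\alpha(x)$ is optimal, i.e. it gives the minimum possible overall risk $R^*$.
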
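Bayes Risk: Example
- Salmon is more tasty and expensive than sea bass:
  $\lambda_{sb} = \lambda(\text{salmon} \mid \text{bass}) = 2$ (classify bass as salmon)
  $\lambda_{bs} = \lambda(\text{bass} \mid \text{salmon}) = 1$ (classify salmon as bass)
  $\lambda_{ss} = \lambda_{bb} = 0$ (no mistake, no loss)
- Likelihoods:
  $$p(l \mid \text{salmon}) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(l-5)^2}{2}\right), \qquad p(l \mid \text{bass}) = \frac{1}{2\sqrt{2\pi}} \exp\left(-\frac{(l-10)^2}{2 \cdot 4}\right)$$
- Priors: $P(\text{salmon}) = P(\text{bass})$
- Risk:
  $$R(\alpha \mid l) = \sum_{j=1}^{m} \lambda(\alpha \mid c_j)\, P(c_j \mid l) = \lambda_{\alpha s} P(s \mid l) + \lambda_{\alpha b} P(b \mid l)$$
  $$R(\text{salmon} \mid l) = \lambda_{ss} P(s \mid l) + \lambda_{sb} P(b \mid l) = \lambda_{sb} P(b \mid l)$$
  $$R(\text{bass} \mid l) = \lambda_{bs} P(s \mid l) + \lambda_{bb} P(b \mid l) = \lambda_{bs} P(s \mid l)$$
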
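Bayes Risk: Example
$$R(\text{salmon} \mid l) = \lambda_{sb} P(b \mid l), \qquad R(\text{bass} \mid l) = \lambda_{bs} P(s \mid l)$$
- Bayes decision rule (optimal for our loss function):
  $$\lambda_{sb} P(b \mid l) \underset{\text{salmon}}{\overset{\text{bass}}{\gtrless}} \lambda_{bs} P(s \mid l)$$
- To classify as salmon, we need to solve
  $$\frac{P(b \mid l)}{P(s \mid l)} < \frac{\lambda_{bs}}{\lambda_{sb}}$$
- Or, equivalently, since the priors are equal:
  $$\frac{P(b \mid l)}{P(s \mid l)} = \frac{p(l \mid b)\, P(b) / p(l)}{p(l \mid s)\, P(s) / p(l)} = \frac{p(l \mid b)}{p(l \mid s)} < \frac{\lambda_{bs}}{\lambda_{sb}}$$
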
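Bayes Risk: Example
- Need to solve
  $$\frac{p(l \mid b)}{p(l \mid s)} < \frac{\lambda_{bs}}{\lambda_{sb}}$$
- Substituting the likelihoods and the losses $\lambda_{bs} = 1$, $\lambda_{sb} = 2$:
  $$\frac{\frac{1}{2\sqrt{2\pi}} \exp\left(-\frac{(l-10)^2}{8}\right)}{\frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(l-5)^2}{2}\right)} < \frac{1}{2} \;\Leftrightarrow\; \frac{\exp\left(-\frac{(l-10)^2}{8}\right)}{\exp\left(-\frac{(l-5)^2}{2}\right)} < 1$$
  $$\Leftrightarrow\; \ln \frac{\exp\left(-\frac{(l-10)^2}{8}\right)}{\exp\left(-\frac{(l-5)^2}{2}\right)} < \ln 1 \;\Leftrightarrow\; -\frac{(l-10)^2}{8} + \frac{(l-5)^2}{2} < 0$$
  $$\Leftrightarrow\; 3l^2 - 20l < 0 \;\Leftrightarrow\; 0 < l < 6.6667$$
[Figure: length axis with the new decision boundary at 6.67, slightly to the left of the ML boundary at about 6.93; salmon to the left, sea bass to the right.]
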
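The minimum-risk rule for this example can be sketched directly from the two risks. This is an illustration assuming equal priors, the N(5, 1) and N(10, 4) likelihoods and the losses 2 and 1; the constant and function names are chosen here.

```python
import math

def normal_pdf(x, mean, variance):
    """Density of N(mean, variance) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

LAMBDA_SB = 2.0  # loss for classifying a bass as salmon
LAMBDA_BS = 1.0  # loss for classifying a salmon as bass

def bayes_classify(length):
    """Minimum-risk decision with equal priors.

    With equal priors the posteriors are proportional to the likelihoods,
    so the common factor P(c) / p(l) is dropped from both risks.
    """
    risk_salmon = LAMBDA_SB * normal_pdf(length, 10.0, 4.0)  # proportional to R(salmon | l)
    risk_bass = LAMBDA_BS * normal_pdf(length, 5.0, 1.0)     # proportional to R(bass | l)
    return "salmon" if risk_salmon < risk_bass else "bass"

print(bayes_classify(6.6))  # salmon
print(bayes_classify(6.7))  # bass
```

The decision flips between 6.6 and 6.7, in line with the boundary 20/3 ≈ 6.67 obtained from $3l^2 - 20l < 0$.
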
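Likelihood Ratio Rule
- In the 2 category case, use the likelihood ratio rule. Writing $\lambda_{ij} = \lambda(\alpha_i \mid c_j)$:
  $$\frac{p(x \mid c_1)}{p(x \mid c_2)} > \frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(c_2)}{P(c_1)}$$
- The left-hand side is the likelihood ratio; the right-hand side is a fixed number, independent of $x$.
- If the above inequality holds, decide $c_1$.
- Otherwise decide $c_2$.
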
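A minimal sketch of the likelihood ratio rule, assuming a 2 by 2 loss table indexed as `loss[i][j]` for action i+1 and true class j+1, with a wrong decision costing more than a correct one so that the denominator is positive. The function names and the likelihood values in the usage lines are illustrative.

```python
def likelihood_ratio_threshold(loss, prior_1, prior_2):
    """Right-hand side of the rule: the fixed number that does not depend on x."""
    (l11, l12), (l21, l22) = loss
    return (l12 - l22) / (l21 - l11) * prior_2 / prior_1

def decide(p_x_given_c1, p_x_given_c2, threshold):
    """Return 1 to decide c_1, 2 to decide c_2."""
    return 1 if p_x_given_c1 / p_x_given_c2 > threshold else 2

# Fish example with c_1 = salmon, c_2 = bass: lambda_12 = 2, lambda_21 = 1, equal priors.
threshold = likelihood_ratio_threshold([[0.0, 2.0], [1.0, 0.0]], 0.5, 0.5)
print(threshold)                       # 2.0
print(decide(0.11, 0.047, threshold))  # 1 (approximate likelihoods at length 6.6)
print(decide(0.094, 0.051, threshold)) # 2 (approximate likelihoods at length 6.7)
```

The threshold is computed once from losses and priors; each new observation then only requires one ratio and one comparison.
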
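Discriminant Functions
- All decision rules have the same structure: at observation $x$ choose the class $c_i$ s.t.
  $$g_i(x) > g_j(x) \quad \forall j \neq i$$
  where $g_i$ is the discriminant function.
- ML decision rule: $g_i(x) = p(x \mid c_i)$
- MAP decision rule: $g_i(x) = P(c_i \mid x)$
- Bayes decision rule: $g_i(x) = -R(c_i \mid x)$ (the sign is negative because the risk is minimized while $g_i$ is maximized)
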
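Decision Regions
- Discriminant functions split the feature vector space $X$ into decision regions.
[Figure: the feature space divided into regions labeled $c_1$, $c_2$, $c_3$; inside the region assigned to class $c_j$, $g_j(x) = \max_i \{ g_i(x) \}$.]
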
Important Points
- If we know the probability distributions for the classes, we can design the optimal classifier.
- The definition of optimal depends on the chosen loss function.
- Under the minimum error rate (zero-one loss function):
  No prior: the ML classifier is optimal
  Have prior: the MAP classifier is optimal
- More general loss function:
  The general Bayes classifier is optimal