CHAPTER 10: LINEAR DISCRIMINATION


Slide 3: Discriminant-based Classification

In classification with K classes (C_1, C_2, ..., C_K), we defined discriminant functions g_j(x), j = 1, ..., K. Given a test example x, we choose (predict) its class label as C_i if g_i(x) is the maximum among g_1(x), g_2(x), ..., g_K(x). In previous chapters we used g_i(x) = log P(C_i | x); this is called likelihood-based classification, where we used the maximum likelihood estimation technique to estimate the class likelihoods p(x | C_i).

Slide 4: Likelihood- vs. Discriminant-based Classification

Likelihood-based: assume a model for p(x | C_i) and use Bayes' rule to calculate P(C_i | x); g_i(x) = log P(C_i | x). This requires estimating the class-conditional densities p(x | C_i). For high-dimensional data (many attributes/features), estimating the class-conditional densities is itself a difficult task.

Discriminant-based: assume a model for g_i(x | Phi_i); no density estimation. The parameters Phi_i describe the class boundary. Estimating the class boundary is enough for performing classification; there is no need to accurately estimate the densities inside the boundaries.

Slide 5: Linear Discriminant

Linear discriminant:

g_i(x | w_i, w_{i0}) = w_i^T x + w_{i0} = sum_{j=1}^{d} w_{ij} x_j + w_{i0}

Advantages:
- Simple: O(d) space/computation (d is the number of features).
- Knowledge extraction: a weighted sum of attributes; positive/negative weights and their magnitudes are interpretable (e.g., credit scoring).
- Optimal when the p(x | C_i) are Gaussian with a shared covariance matrix; useful when classes are (almost) linearly separable.
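
The weighted sum, and the choose-the-largest rule used later for multiple classes, fit in a few lines of NumPy. A minimal sketch, with our own (hypothetical) names:

```python
import numpy as np

def linear_discriminant(x, w, w0):
    """Evaluate g(x) = w^T x + w0 for one feature vector x."""
    return np.dot(w, x) + w0

def predict(x, W, w0):
    """Multiclass rule: W is a (K, d) weight matrix, w0 a (K,) bias vector;
    return the index i of the largest discriminant g_i(x)."""
    return int(np.argmax(W @ x + w0))
```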

Slide 6: Generalized Linear Model

Quadratic discriminant, with higher-order (product) terms:

g_i(x | W_i, w_i, w_{i0}) = x^T W_i x + w_i^T x + w_{i0}

For example, with x = (x_1, x_2), define z_1 = x_1, z_2 = x_2, z_3 = x_1^2, z_4 = x_2^2, z_5 = x_1 x_2. More generally, map from x to z using nonlinear basis functions and use a linear discriminant in z-space:

g_i(x) = sum_{j=1}^{k} w_{ij} phi_{ij}(x)

Slide 7: Generalized Linear Model

Examples of nonlinear basis functions:
- sin(x_1)
- exp(-(x_1 - m)^2 / c)
- exp(-||x - m||^2 / c)
- log(x_2)
- 1(x_1 > c)
- 1(a x_1 + b x_2 > c)
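
A minimal sketch of the z-space idea, assuming NumPy and the two-dimensional quadratic example from the previous slide (quadratic_basis is our own name):

```python
import numpy as np

def quadratic_basis(x):
    """Map a 2-D input x = (x1, x2) to z-space with product terms:
    z = (x1, x2, x1^2, x2^2, x1*x2)."""
    x1, x2 = x
    return np.array([x1, x2, x1**2, x2**2, x1 * x2])

def g(x, w, w0):
    """Linear discriminant applied in z-space: g(x) = w^T phi(x) + w0."""
    return np.dot(w, quadratic_basis(x)) + w0
```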

Slide 8: Two Classes

g(x) = g_1(x) - g_2(x)
     = (w_1^T x + w_{10}) - (w_2^T x + w_{20})
     = (w_1 - w_2)^T x + (w_{10} - w_{20})
     = w^T x + w_0

Choose C_1 if g(x) > 0, and C_2 otherwise.

Slide 9: Geometry [figure]

Slide 10: Understanding the Geometry

Let the discriminant function be g(x) = w_1 x_1 + w_2 x_2 + w_0 = w^T x + w_0, where w = (w_1, w_2). Take any two points x_a, x_b lying on the decision surface (boundary) g(x) = 0:

g(x_a) = g(x_b) = 0  =>  w^T x_a + w_0 = w^T x_b + w_0  =>  w^T (x_a - x_b) = 0

Note that (x_a - x_b) is a vector lying on the decision surface (hyperplane), which means w is normal to any vector lying on the decision surface.

Slide 11: Understanding the Geometry

Any data point x can be written as a sum of two vectors:

x = x_p + r (w / ||w||)

where x_p is the normal projection of x onto the decision hyperplane (x_p lies on the hyperplane) and r is the distance of x to the hyperplane. Then

g(x) = w^T x + w_0 = w^T (x_p + r w/||w||) + w_0 = (w^T x_p + w_0) + r (w^T w)/||w|| = 0 + r ||w||

=> r = g(x) / ||w||

Similarly, taking x = 0, r_0 denotes the distance of the hyperplane from the origin: g(0) = w_0 = r_0 ||w||, so r_0 = w_0 / ||w||.
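
The two distance formulas invite a quick numeric check. A minimal sketch, assuming NumPy (signed_distance is our name; the example weights are made up):

```python
import numpy as np

def signed_distance(x, w, w0):
    """Signed distance r = g(x) / ||w|| of x to the hyperplane w^T x + w0 = 0.
    Positive on the side w points to, negative on the other side."""
    return (np.dot(w, x) + w0) / np.linalg.norm(w)

# Distance of the hyperplane from the origin is r0 = w0 / ||w||:
w, w0 = np.array([3.0, 4.0]), -5.0
print(signed_distance(np.array([0.0, 0.0]), w, w0))  # -> -1.0 (= w0/||w||)
```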

Slide 12: Multiple Classes

The discriminant function for the i-th class is:

g_i(x | w_i, w_{i0}) = w_i^T x + w_{i0}

Choose C_i if g_i(x) = max_j g_j(x). Here the classes are linearly separable.

Slide 13: Multiple Classes

During testing, given x, ideally we should have only one g_j(x), j = 1, ..., K, greater than zero, with all others less than zero. However, this is not always the case: the positive half-spaces of the hyperplanes may overlap, or we may have all g_j(x) < 0. These may be taken as reject cases. Remembering that |g_i(x)| / ||w_i|| is the distance from the input point to the i-th decision hyperplane, and assuming all w_i have similar length, this rule assigns the point to the class (among all with g_j(x) > 0) whose decision hyperplane the point is most distant from.

Slide 14: Pairwise Separation

It is possible that the classes are not linearly separable but are pairwise linearly separable. We can then use K(K-1)/2 linear discriminants g_ij(x) to classify:

g_ij(x | w_ij, w_{ij0}) = w_ij^T x + w_{ij0}

The parameters w_ij, w_{ij0} are computed during training so as to have:

g_ij(x) > 0   if x in C_i
g_ij(x) <= 0  if x in C_j
don't care    otherwise

Classification is performed as follows: choose C_i if for all j != i, g_ij(x) > 0. For an input x to be assigned to class C_1, it should be on the positive side of H_12 and H_13; we don't care about the value of H_23. A minimal sketch of this decision rule follows.
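
The sketch below assumes NumPy and a hypothetical dict G mapping each pair (i, j) with i < j to its trained parameters (w, w0); none of these names come from the slides:

```python
import numpy as np

def pairwise_predict(x, G):
    """Choose class i if g_ij(x) > 0 for all j != i, using g_ji = -g_ij.
    Returns None (reject) if no class wins all of its pairwise tests."""
    K = max(j for _, j in G) + 1
    for i in range(K):
        ok = True
        for j in range(K):
            if i == j:
                continue
            a, b = min(i, j), max(i, j)
            w, w0 = G[(a, b)]
            g = np.dot(w, x) + w0
            # g_ab > 0 votes for class a; the test flips when i is class b
            if (g > 0) != (i == a):
                ok = False
                break
        if ok:
            return i
    return None  # reject case
```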

Slide 15: From Discriminants to Posteriors

If the class densities are Gaussian and share a common covariance matrix, i.e., when p(x | C_i) ~ N(mu_i, Sigma), the discriminant function is linear:

g_i(x) = w_i^T x + w_{i0},  where  w_i = Sigma^{-1} mu_i  and  w_{i0} = -(1/2) mu_i^T Sigma^{-1} mu_i + log P(C_i)

For the special case of two classes, we define y = P(C_1 | x); choose C_1 if y > 0.5 and C_2 otherwise. The quantity logit(y) = log(y / (1 - y)) is known as the logit transformation, or the log odds of y.

Slide 16: From Discriminants to Posteriors

In the case of two normal classes sharing a common covariance matrix, the log odds is linear:

logit(P(C_1 | x)) = log [ P(C_1 | x) / (1 - P(C_1 | x)) ]
                  = log [ P(C_1 | x) / P(C_2 | x) ]
                  = log [ p(x | C_1) / p(x | C_2) ] + log [ P(C_1) / P(C_2) ]

With p(x | C_i) = (2 pi)^{-d/2} |Sigma|^{-1/2} exp[ -(1/2)(x - mu_i)^T Sigma^{-1} (x - mu_i) ], this becomes

logit(P(C_1 | x)) = w^T x + w_0

where w = Sigma^{-1} (mu_1 - mu_2) and w_0 = -(1/2) (mu_1 + mu_2)^T Sigma^{-1} (mu_1 - mu_2) + log [ P(C_1) / P(C_2) ].

The inverse of the logit is the logistic, or sigmoid, function:

log [ P(C_1 | x) / (1 - P(C_1 | x)) ] = w^T x + w_0
=> P(C_1 | x) = sigmoid(w^T x + w_0) = 1 / (1 + exp(-(w^T x + w_0)))
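
A minimal sketch of the plug-in computation of w and w_0 from estimated class statistics, assuming NumPy (the function name and arguments are ours):

```python
import numpy as np

def gaussian_logodds_params(mu1, mu2, Sigma, p1):
    """Plug-in w, w0 for log P(C1|x)/P(C2|x) = w^T x + w0 under two
    Gaussian classes with shared covariance Sigma and prior P(C1) = p1."""
    Sinv = np.linalg.inv(Sigma)
    w = Sinv @ (mu1 - mu2)
    w0 = -0.5 * (mu1 + mu2) @ Sinv @ (mu1 - mu2) + np.log(p1 / (1 - p1))
    return w, w0
```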

Slide 17: Sigmoid (Logistic) Function

Two equivalent decision rules: calculate g(x) = w^T x + w_0 and choose C_1 if g(x) > 0; or calculate y = sigmoid(w^T x + w_0) and choose C_1 if y > 0.5.
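
A minimal sketch showing the two equivalent rules, assuming NumPy:

```python
import numpy as np

def sigmoid(a):
    """Logistic function 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def choose(x, w, w0):
    """Equivalent tests: g(x) > 0  <=>  sigmoid(g(x)) > 0.5."""
    y = sigmoid(np.dot(w, x) + w0)
    return "C1" if y > 0.5 else "C2"
```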

Slide 18: Logistic Regression

Logistic regression is a classification method where, in the case of binary classification, the log ratio of P(C_1 | x) and P(C_2 | x) is modeled as a linear function:

log [ P(C_1 | x) / P(C_2 | x) ] = w^T x + w_0

Since we are modeling the ratio of the posterior probabilities directly, there is no need for density estimation, i.e., for p(x | C_1) and p(x | C_2). Note that this is a slightly different version than the one given in the book, but it is the most widely used version in practice. Rearranging, we can write

P(C_1 | x) = exp(w^T x + w_0) / (1 + exp(w^T x + w_0))  and  P(C_2 | x) = 1 / (1 + exp(w^T x + w_0))

Given x, the predicted label is C_1 when P(C_1 | x) > P(C_2 | x), or, equivalently, when w^T x + w_0 > 0. So to classify using this model, all we need to know is w and w_0. How do we find w and w_0?

Slide 19: Logistic Regression for Binary Classification

Given training data X = { (x^t, r^t) }_{t=1}^{N}, r^t is modeled as a Bernoulli distribution:

r^t ~ Bernoulli(y^t),  where  y^t = P(C_1 | x^t) = 1 / (1 + exp(-(w^T x^t + w_0)))

To estimate w and w_0, we can maximize the likelihood

l(w, w_0 | X) = prod_t (y^t)^{r^t} (1 - y^t)^{1 - r^t}

or, equivalently, maximize the log-likelihood, or, equivalently, minimize the negative log-likelihood

E(w, w_0 | X) = -L(w, w_0 | X) = -sum_t [ r^t log y^t + (1 - r^t) log(1 - y^t) ]
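
A minimal sketch of this negative log-likelihood (cross-entropy) as a function one could hand to an optimizer, assuming NumPy; the eps clipping is our own numerical guard, not part of the slides:

```python
import numpy as np

def neg_log_likelihood(w, w0, X, r, eps=1e-12):
    """E(w, w0 | X) = -sum_t [ r_t log y_t + (1 - r_t) log(1 - y_t) ],
    with y_t = sigmoid(w^T x_t + w0). eps guards against log(0)."""
    y = 1.0 / (1.0 + np.exp(-(X @ w + w0)))
    y = np.clip(y, eps, 1.0 - eps)
    return -np.sum(r * np.log(y) + (1 - r) * np.log(1 - y))
```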

Slide 20: Gradient Descent

E(w | X) is the error with parameters w on sample X; we seek w* = arg min_w E(w | X). The gradient is

grad_w E = [ ∂E/∂w_1, ∂E/∂w_2, ..., ∂E/∂w_d ]^T

Gradient descent starts from a random w and updates w iteratively in the negative direction of the gradient.

Slide 21: Gradient Descent

The update rule, with learning rate η:

Δw_i = -η ∂E/∂w_i
w_i <- w_i + Δw_i

[Figure: E(w) decreases from E(w^t) to E(w^{t+1}) as w takes a step of size η against the gradient.]
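
A minimal generic sketch of the update loop, with a fixed learning rate and step count (both hypothetical choices):

```python
def gradient_descent(grad, w, eta=0.1, n_steps=100):
    """Generic update w <- w - eta * grad(w), repeated n_steps times.
    grad(w) returns the gradient of E at w."""
    for _ in range(n_steps):
        w = w - eta * grad(w)
    return w

# Example on E(w) = w^2, whose gradient is 2w; the minimum is at w = 0.
print(gradient_descent(lambda w: 2 * w, w=3.0))  # -> close to 0.0
```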

Slides 22-23: Gradient Descent [figures]

Slide 24: Training: Gradient Descent

If y^t = sigmoid(a^t) with a^t = w^T x^t + w_0, and

E(w, w_0 | X) = -sum_t [ r^t log y^t + (1 - r^t) log(1 - y^t) ]

then, using dy/da = y(1 - y), the gradient-descent updates are:

Δw_j = -η ∂E/∂w_j = η sum_t (r^t - y^t) x_j^t,  j = 1, ..., d
Δw_0 = η sum_t (r^t - y^t)
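
Putting these updates together gives a complete batch training loop. A minimal sketch, assuming NumPy and labels r^t in {0, 1} (the hyperparameter defaults are arbitrary):

```python
import numpy as np

def train_logistic_regression(X, r, eta=0.1, n_epochs=1000):
    """Batch gradient descent for binary logistic regression.
    X: (N, d) inputs, r: (N,) labels in {0, 1}. Implements the updates
    dw_j = eta * sum_t (r_t - y_t) x_tj and dw0 = eta * sum_t (r_t - y_t)."""
    N, d = X.shape
    w, w0 = np.zeros(d), 0.0
    for _ in range(n_epochs):
        y = 1.0 / (1.0 + np.exp(-(X @ w + w0)))  # y_t = sigmoid(w^T x_t + w0)
        err = r - y
        w += eta * X.T @ err
        w0 += eta * err.sum()
    return w, w0
```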


Slide 27: Logistic Regression for K Classes (K > 2)

Given training data X = { (x^t, r^t) }_{t=1}^{N}, r^t is modeled as a multinomial distribution:

r^t ~ Mult_K(1, y^t),  where  y_i^t = P(C_i | x^t) = exp(w_i^T x^t + w_{i0}) / sum_{j=1}^{K} exp(w_j^T x^t + w_{j0})

This is known as the softmax function. To estimate w_1, ..., w_K and w_{10}, ..., w_{K0}, we can maximize the likelihood

l = prod_t prod_i (y_i^t)^{r_i^t}

or, equivalently, minimize the negative log-likelihood

E({w_i, w_{i0}} | X) = -sum_t sum_i r_i^t log y_i^t

The gradient can be computed using a simple formula:

Δw_j = η sum_t (r_j^t - y_j^t) x^t
Δw_{j0} = η sum_t (r_j^t - y_j^t)

Using gradient descent, we obtain a simple algorithm for the K-class logistic regression problem; a sketch follows.
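
A minimal sketch of that algorithm, assuming NumPy and one-hot labels R; the max-subtraction is our own numerical stabilization, not part of the slides:

```python
import numpy as np

def train_softmax(X, R, eta=0.1, n_epochs=1000):
    """Batch gradient descent for K-class logistic regression.
    X: (N, d) inputs, R: (N, K) one-hot labels. Implements the updates
    dw_j = eta * sum_t (r_tj - y_tj) x_t and dw_j0 likewise."""
    N, d = X.shape
    K = R.shape[1]
    W, w0 = np.zeros((K, d)), np.zeros(K)
    for _ in range(n_epochs):
        A = X @ W.T + w0                    # (N, K) linear scores
        A -= A.max(axis=1, keepdims=True)   # stabilize the exponentials
        Y = np.exp(A)
        Y /= Y.sum(axis=1, keepdims=True)   # softmax: y_ti = P(C_i | x_t)
        err = R - Y                         # (N, K)
        W += eta * err.T @ X
        w0 += eta * err.sum(axis=0)
    return W, w0
```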
