INTRODUCTION TO MACHINE LEARNING
Lecture Slides for Introduction to Machine Learning, 2nd Edition, ETHEM ALPAYDIN, modified by Leonardo Bobadilla, with some parts from http://www.cs.tau.ac.il/~apartzin/MachineLearning/
The MIT Press, 2010
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e
Outline
This class: Ch 5: Multivariate Methods
- Multivariate Data
- Parameter Estimation
- Estimation of Missing Values
- Multivariate Classification
Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e, The MIT Press (V1.0)
CHAPTER 5: Multivariate Methods
Multivariate Distribution
- Assume all members of a class came from a joint distribution
- Can learn the distribution from data: P(x|C)
- Assign a new instance to the most probable class using Bayes' rule: P(C|x)
- An instance is described by a vector of correlated parameters
- Realm of multivariate distributions
- Multivariate normal
Based on E Alpaydın 2004 Introduction to Machine Learning, The MIT Press (V1.1)
Multivariate Data
- Multiple measurements (sensors)
- d inputs/features/attributes: d-variate
- N instances/observations/examples

X = \begin{bmatrix} X_1^1 & X_2^1 & \cdots & X_d^1 \\ X_1^2 & X_2^2 & \cdots & X_d^2 \\ \vdots & & & \vdots \\ X_1^N & X_2^N & \cdots & X_d^N \end{bmatrix}
Multivariate Parameters
Mean: E[x] = \mu = [\mu_1, \ldots, \mu_d]^T
Covariance: \sigma_{ij} \equiv \mathrm{Cov}(X_i, X_j)
\Sigma \equiv \mathrm{Cov}(X) = E\left[(X - \mu)(X - \mu)^T\right] = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1d} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2d} \\ \vdots & & & \vdots \\ \sigma_{d1} & \sigma_{d2} & \cdots & \sigma_d^2 \end{bmatrix}
Correlation: \mathrm{Corr}(X_i, X_j) \equiv \rho_{ij} = \frac{\sigma_{ij}}{\sigma_i \sigma_j}
Parameter Estimation
Sample mean m: m_i = \frac{1}{N} \sum_{t=1}^{N} x_i^t, \quad i = 1, \ldots, d
Covariance matrix S: s_{ij} = \frac{1}{N} \sum_{t=1}^{N} (x_i^t - m_i)(x_j^t - m_j)
Correlation matrix R: r_{ij} = \frac{s_{ij}}{s_i s_j}
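The estimators above can be sketched in a few lines of numpy. This is a minimal illustration on made-up data (the random matrix X is not from the slides); it uses the 1/N (biased) normalization shown in the formula for S.

```python
import numpy as np

# Illustrative data: N = 200 instances, d = 3 features (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

N = X.shape[0]
m = X.mean(axis=0)          # sample mean m_i
Xc = X - m                  # centered data
S = (Xc.T @ Xc) / N         # covariance: s_ij = sum_t (x_i^t - m_i)(x_j^t - m_j) / N
s = np.sqrt(np.diag(S))     # per-feature standard deviations s_i
R = S / np.outer(s, s)      # correlation: r_ij = s_ij / (s_i * s_j)
```

Note the diagonal of R is 1 by construction, since r_ii = s_ii / s_i^2.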
Estimation of Missing Values
- What to do if certain instances have missing attributes?
- Ignore those instances: not a good idea if the sample is small
- Use "missing" as an attribute: may give information
- Imputation: fill in the missing value
  - Mean imputation: use the most likely value (e.g., the mean)
  - Imputation by regression: predict from the other attributes
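Mean imputation can be sketched directly: each missing entry (here encoded as NaN) is replaced by the mean of the observed values in its column. The small matrix below is a made-up example, not data from the slides.

```python
import numpy as np

# Hypothetical data with missing entries marked as NaN.
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, np.nan],
              [5.0, 6.0]])

col_means = np.nanmean(X, axis=0)                 # per-feature mean over observed values
X_imputed = np.where(np.isnan(X), col_means, X)   # fill only the missing cells
```

Here col_means is [3.0, 4.0], so the two NaN cells become 3.0 and 4.0 respectively.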
Multivariate Normal
- Have d attributes
- Often can assume each one is distributed normally
- Attributes might be dependent/correlated
- Joint distribution of several correlated variables: P(X_1 = x_1, \ldots, X_d = x_d) = ?
- X_i is normally distributed with mean \mu_i and variance \sigma_i^2
Multivariate Normal
x \sim N_d(\mu, \Sigma)
p(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right]
Mahalanobis distance: (x - \mu)^T \Sigma^{-1} (x - \mu) accounts for how the variables are correlated. Dimensions with large variance are divided by it (through \Sigma^{-1}), so they contribute less to the Mahalanobis distance and therefore more to the probability.
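The density above can be evaluated directly from the Mahalanobis distance. The helper below is a minimal sketch (its name and inputs are illustrative); as a sanity check, at the mean the squared Mahalanobis distance is 0, so for d = 2 with identity covariance the density equals 1/(2π).

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Multivariate normal density via the Mahalanobis distance."""
    d = len(mu)
    diff = x - mu
    maha2 = diff @ np.linalg.inv(Sigma) @ diff   # squared Mahalanobis distance
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * maha2) / norm

mu = np.array([0.0, 0.0])
Sigma = np.eye(2)
p0 = mvn_pdf(mu, mu, Sigma)   # at the mean: maha2 = 0, so p0 = 1 / (2*pi)
```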
Bivariate Normal
Multivariate Normal Distribution
Mahalanobis distance: (x - \mu)^T \Sigma^{-1} (x - \mu) measures the distance from x to \mu in units of \Sigma (it normalizes for differences in variances and correlations).
Bivariate case, d = 2:
\Sigma = \begin{bmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{bmatrix}
p(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\left[ -\frac{1}{2(1 - \rho^2)} \left( z_1^2 - 2\rho z_1 z_2 + z_2^2 \right) \right], \quad z_i = \frac{x_i - \mu_i}{\sigma_i}
Independent Inputs: Naive Bayes
If the x_i are independent, the off-diagonals of \Sigma are 0, and the Mahalanobis distance reduces to a weighted (by 1/\sigma_i^2) Euclidean distance:
p(x) = \prod_{i=1}^{d} p_i(x_i) = \frac{1}{(2\pi)^{d/2} \prod_{i=1}^{d} \sigma_i} \exp\left[ -\frac{1}{2} \sum_{i=1}^{d} \left( \frac{x_i - \mu_i}{\sigma_i} \right)^2 \right]
If the variances are also equal, it reduces to the Euclidean distance.
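The factorization can be verified numerically: with a diagonal covariance, the product of univariate normal densities equals the full multivariate formula. The means, variances, and query point below are arbitrary values chosen for the check.

```python
import numpy as np

def normal_1d(x, mu, var):
    """Univariate normal density."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

# Hypothetical parameters and query point.
mu = np.array([1.0, -2.0, 0.5])
var = np.array([0.5, 2.0, 1.0])    # the diagonal of Sigma
x = np.array([0.3, -1.0, 1.2])

# Product of 1-D densities (independence assumption).
p_product = np.prod(normal_1d(x, mu, var))

# Full multivariate density with diagonal Sigma.
Sigma = np.diag(var)
diff = x - mu
p_joint = np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / (
    (2 * np.pi) ** (len(mu) / 2) * np.sqrt(np.linalg.det(Sigma)))
```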
Projection Distribution
- Example: a vector of 3 features with a multivariate normal distribution
- Projection to a 2-dimensional space (e.g., the XY plane): the vectors of 2 features also have a multivariate normal distribution
- In general, the projection of a d-dimensional normal to a k-dimensional space is a k-dimensional normal
2D Projection
Multivariate Classification
- Assume the members of a class come from a single multivariate distribution
- The multivariate normal is a good choice:
  - Easy to analyze
  - Models many natural phenomena
  - Models a class as having a single prototype source (the mean) slightly randomly perturbed
Example
- Matching cars to customers
- Each car defines a class of matching customers
- Customers described by (age, income); there is a correlation between age and income
- Assume each class is multivariate normal
- Need to learn P(x|C) from data
- Use Bayes' rule to compute P(C|x)
Parametric Classification
If p(x|C_i) \sim N(\mu_i, \Sigma_i):
p(x|C_i) = \frac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\left[ -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) \right]
Discriminant functions:
g_i(x) = \log P(C_i|x) = \log \frac{p(x|C_i) P(C_i)}{P(x)} = \log p(x|C_i) + \log P(C_i) - \log P(x)
       = -\frac{d}{2} \log 2\pi - \frac{1}{2} \log |\Sigma_i| - \frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) + \log P(C_i) - \log P(x)
Need to know the covariance matrix and the mean to compute the discriminant functions. Can ignore P(x), as it is the same for all classes.
Estimation of Parameters
With r_i^t = 1 if x^t \in C_i and 0 otherwise:
\hat{P}(C_i) = \frac{\sum_t r_i^t}{N}, \quad m_i = \frac{\sum_t r_i^t x^t}{\sum_t r_i^t}, \quad S_i = \frac{\sum_t r_i^t (x^t - m_i)(x^t - m_i)^T}{\sum_t r_i^t}
g_i(x) = -\frac{1}{2} \log |S_i| - \frac{1}{2} (x - m_i)^T S_i^{-1} (x - m_i) + \log \hat{P}(C_i)
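The estimators and discriminant above can be sketched for a hypothetical two-class problem (the sampled data, class means, and query point are made up for illustration):

```python
import numpy as np

# Synthetic two-class sample: 100 points near (0,0), 50 near (4,4).
rng = np.random.default_rng(1)
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
X1 = rng.normal(loc=[4.0, 4.0], scale=1.0, size=(50, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 50)

priors, means, covs = [], [], []
for c in (0, 1):
    Xc = X[y == c]
    priors.append(len(Xc) / len(X))        # P_hat(C_i)
    means.append(Xc.mean(axis=0))          # m_i
    D = Xc - Xc.mean(axis=0)
    covs.append(D.T @ D / len(Xc))         # S_i, 1/N_i normalization

def g(x, i):
    """Quadratic discriminant g_i(x) from the slide."""
    diff = x - means[i]
    Si = covs[i]
    return (-0.5 * np.log(np.linalg.det(Si))
            - 0.5 * diff @ np.linalg.inv(Si) @ diff
            + np.log(priors[i]))

x_new = np.array([3.5, 4.2])
label = int(g(x_new, 1) > g(x_new, 0))    # pick the class with the larger g_i
```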
Covariance Matrix per Class
Expanding the quadratic discriminant:
g_i(x) = x^T W_i x + w_i^T x + w_{i0}
where W_i = -\frac{1}{2} S_i^{-1}, \quad w_i = S_i^{-1} m_i, \quad w_{i0} = -\frac{1}{2} m_i^T S_i^{-1} m_i - \frac{1}{2} \log |S_i| + \log \hat{P}(C_i)
Requires estimating K \cdot d(d+1)/2 parameters for the covariance matrices.
Figure: likelihoods p(x|C_i), the posterior for C_1, and the discriminant P(C_1|x) = 0.5.
Common Covariance Matrix S
If there is not enough data, we can assume all classes share the same common sample covariance matrix S:
S = \sum_i \hat{P}(C_i) S_i
The discriminant reduces to a linear discriminant (x^T S^{-1} x is common to all discriminants and can be dropped):
g_i(x) = -\frac{1}{2} (x - m_i)^T S^{-1} (x - m_i) + \log \hat{P}(C_i)
g_i(x) = w_i^T x + w_{i0}, \quad \text{where } w_i = S^{-1} m_i, \; w_{i0} = -\frac{1}{2} m_i^T S^{-1} m_i + \log \hat{P}(C_i)
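The claim that dropping x^T S^{-1} x leaves the class ranking unchanged can be checked numerically. The shared covariance, means, and priors below are arbitrary values for the check; g_linear and g_dist are the two forms from the slide, and they differ exactly by the common term.

```python
import numpy as np

# Hypothetical shared covariance, class means, and priors.
S = np.array([[2.0, 0.5],
              [0.5, 1.0]])
Sinv = np.linalg.inv(S)
means = [np.array([0.0, 0.0]), np.array([3.0, 1.0])]
priors = [0.6, 0.4]

def g_linear(x, i):
    """Linear form: w_i^T x + w_i0."""
    w = Sinv @ means[i]
    w0 = -0.5 * means[i] @ Sinv @ means[i] + np.log(priors[i])
    return w @ x + w0

def g_dist(x, i):
    """Distance form: -(x - m_i)^T S^-1 (x - m_i) / 2 + log prior."""
    diff = x - means[i]
    return -0.5 * diff @ Sinv @ diff + np.log(priors[i])

x = np.array([1.0, 2.0])
common = -0.5 * x @ Sinv @ x   # the term shared by all classes
```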
Diagonal S
When the x_j, j = 1, \ldots, d, are independent, \Sigma is diagonal:
p(x|C_i) = \prod_j p_j(x_j|C_i) \quad \text{(Naive Bayes assumption)}
g_i(x) = -\frac{1}{2} \sum_{j=1}^{d} \left( \frac{x_j - m_{ij}}{s_j} \right)^2 + \log \hat{P}(C_i)
Classify based on weighted Euclidean distance (in s_j units) to the nearest mean.
Figure: diagonal S; the variances may be different.
Diagonal S, Equal Variances
Nearest mean classifier: classify based on Euclidean distance to the nearest mean:
g_i(x) = -\frac{\|x - m_i\|^2}{2s^2} + \log \hat{P}(C_i) = -\frac{1}{2s^2} \sum_{j=1}^{d} (x_j - m_{ij})^2 + \log \hat{P}(C_i)
Each mean can be considered a prototype or template, and this is template matching.
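With equal priors, the discriminant above reduces to picking the closest prototype. A minimal sketch, with three made-up class means:

```python
import numpy as np

# One prototype (mean) per class -- hypothetical values.
means = np.array([[0.0, 0.0],
                  [5.0, 0.0],
                  [0.0, 5.0]])

def nearest_mean(x):
    """Return the index of the class whose mean is closest to x."""
    d2 = ((means - x) ** 2).sum(axis=1)   # squared Euclidean distances
    return int(np.argmin(d2))

label = nearest_mean(np.array([4.0, 1.0]))
```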
Model Selection

Assumption                     Covariance matrix          No. of parameters
Shared, hyperspheric           S_i = S = s^2 I            1
Shared, axis-aligned           S_i = S, with s_ij = 0     d
Shared, hyperellipsoidal       S_i = S                    d(d+1)/2
Different, hyperellipsoidal    S_i                        K d(d+1)/2

As we increase complexity (less restricted S), bias decreases and variance increases. Assume simple models (allow some bias) to control variance (regularization).
Model Selection
- A different covariance matrix for each class: have to estimate many parameters; small bias, large variance
- Common covariance matrices, diagonal covariance, etc. reduce the number of parameters: increased bias but controlled variance
- In-between states?
Regularized Discriminant Analysis (RDA)
S_i' = a \sigma^2 I + b S + (1 - a - b) S_i
- a = b = 0: quadratic classifier
- a = 0, b = 1: shared covariance, linear classifier
- a = 1, b = 0: diagonal covariance
Choose the best a, b by cross-validation.
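The RDA interpolation can be sketched as a small helper; the covariance values below are made up, and the test values of (a, b) recover the three special cases listed above.

```python
import numpy as np

def rda_cov(a, b, s2, S_shared, S_class):
    """S_i' = a*s^2*I + b*S + (1 - a - b)*S_i (RDA interpolation)."""
    d = S_shared.shape[0]
    return a * s2 * np.eye(d) + b * S_shared + (1 - a - b) * S_class

# Hypothetical shared and per-class covariances.
S_shared = np.array([[2.0, 0.3],
                     [0.3, 1.0]])
S_class = np.array([[1.5, -0.2],
                    [-0.2, 0.8]])
s2 = 1.2
```

Intermediate (a, b) values shrink the noisy per-class estimate S_i toward the better-conditioned shared and spherical estimates.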
Model Selection: Example
Discrete Features
Binary features: p_{ij} \equiv p(x_j = 1 | C_i)
If the x_j are independent (Naive Bayes):
p(x|C_i) = \prod_{j=1}^{d} p_{ij}^{x_j} (1 - p_{ij})^{1 - x_j}
The discriminant is linear:
g_i(x) = \log p(x|C_i) + \log P(C_i) = \sum_j \left[ x_j \log p_{ij} + (1 - x_j) \log (1 - p_{ij}) \right] + \log P(C_i)
Estimated parameters: \hat{p}_{ij} = \frac{\sum_t x_j^t r_i^t}{\sum_t r_i^t}
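The Bernoulli naive Bayes discriminant above can be sketched as follows. The tiny binary dataset and labels are invented for illustration; the clipping of p_hat is a practical guard against log(0), not something the slide specifies.

```python
import numpy as np

# Hypothetical binary data (rows = instances, columns = features) and labels.
X = np.array([[1, 0, 1],
              [1, 1, 1],
              [0, 0, 1],
              [0, 1, 0],
              [0, 0, 0],
              [1, 0, 0]])
y = np.array([1, 1, 1, 0, 0, 0])

eps = 1e-9                                  # guard against log(0)
p_hat, priors = [], []
for c in (0, 1):
    Xc = X[y == c]
    p_hat.append(np.clip(Xc.mean(axis=0), eps, 1 - eps))   # p_hat_ij
    priors.append(len(Xc) / len(X))
p_hat = np.array(p_hat)

def g(x, i):
    """Linear discriminant: sum_j [x_j log p_ij + (1-x_j) log(1-p_ij)] + log prior."""
    return (x @ np.log(p_hat[i]) + (1 - x) @ np.log(1 - p_hat[i])
            + np.log(priors[i]))

x_new = np.array([1, 0, 1])
label = int(g(x_new, 1) > g(x_new, 0))
```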
Multivariate Regression
r^t = g(x^t | w_0, w_1, \ldots, w_d) + \varepsilon
Multivariate linear model:
g(x^t) = w_0 + w_1 x_1^t + w_2 x_2^t + \cdots + w_d x_d^t
Minimize the squared error:
E(w_0, w_1, \ldots, w_d | X) = \frac{1}{2} \sum_t \left[ r^t - (w_0 + w_1 x_1^t + \cdots + w_d x_d^t) \right]^2
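Minimizing the squared error above is ordinary least squares; a minimal sketch on synthetic data (the true weights and noise level are made up), solving via a design matrix with a bias column:

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 100, 2
X = rng.normal(size=(N, d))

# Hypothetical true weights [w0, w1, w2] and targets with small noise.
w_true = np.array([1.0, 2.0, -3.0])
r = w_true[0] + X @ w_true[1:] + 0.01 * rng.normal(size=N)

A = np.hstack([np.ones((N, 1)), X])      # prepend a column of 1s for w0
w_hat, *_ = np.linalg.lstsq(A, r, rcond=None)
```

With low noise and N >> d, w_hat recovers w_true closely.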
CHAPTER 6: Dimensionality Reduction
Dimensionality of Input
- Number of observables (e.g., age and income)
- If the number of observables is increased:
  - More time to compute
  - More memory to store inputs and intermediate results
  - More complicated explanations (knowledge from learning): regression with 100 vs. 2 parameters
  - No simple visualization: 2D vs. 10D graphs
  - Need much more data (curse of dimensionality): M samples of d inputs are not equal to an input of dimension M
Dimensionality Reduction
- Some features (dimensions) bear little or no useful information (e.g., hair color for car selection)
  - Can drop some features
  - Have to estimate from data which features can be dropped
- Several features can be combined together without loss, or even with gain, of information (e.g., income of all family members for a loan application)
  - Some features can be combined together
  - Have to estimate from data which features to combine
Feature Selection vs. Extraction
- Feature selection: choosing k < d important features, ignoring the remaining d - k (subset selection algorithms)
- Feature extraction: project the original x_i, i = 1, \ldots, d dimensions to new k < d dimensions z_j, j = 1, \ldots, k
  - Principal Components Analysis (PCA)
  - Linear Discriminant Analysis (LDA)
  - Factor Analysis (FA)
Usage
- Have data of dimension d
- Reduce dimensionality to k < d: discard unimportant features, or combine several features into one
- Use the resulting k-dimensional data set for:
  - Learning a classification problem (e.g., the parameters of the probabilities P(x|C))
  - Learning a regression problem (e.g., the parameters of a model y = g(x|\theta))
Subset Selection
- Have an initial set of features of size d
- There are 2^d possible subsets
- Need a criterion to decide which subset is the best
- Need a way to search over the possible subsets
- Cannot go over all 2^d possibilities; need some heuristics
Goodness of a Feature Set
- Supervised: train using the selected subset; estimate error on a validation data set
- Unsupervised: look at the input only (e.g., age, income, and savings); select the subset that bears most of the information about the person
Mutual Information
- Suppose we have 3 random variables (features) X, Y, Z and have to select the ones that give the most information
- If X and Y are correlated, then much of the information about Y is already in X
- It makes sense to select features that are uncorrelated
- Mutual information (based on the Kullback-Leibler divergence) is a more general measure of dependence
- Can be extended to n variables (the information that variables x_1, \ldots, x_n have about variable x_{n+1})
Subset Selection
- Forward search:
  - Start from an empty set of features
  - Try each of the remaining features: estimate the classification/regression error of adding each specific feature
  - Select the feature that gives the maximum improvement in validation error
  - Stop when there is no significant improvement
- Backward search:
  - Start with the original set of size d
  - Drop the feature with the smallest impact on error
Subset Selection
- There are 2^d subsets of d features
- Forward search: add the best feature at each step
  - The set of features F is initially \emptyset
  - At each iteration, find the best new feature: j = \arg\min_i E(F \cup \{x_i\})
  - Add x_j to F if E(F \cup \{x_j\}) < E(F)
- Hill-climbing, O(d^2) algorithm
- Backward search: start with all features and remove one at a time, if possible
- Floating search: add k, remove l
Floating Search
- Forward and backward search are greedy algorithms: they select the best option at a single step and do not always achieve the optimal value
- Floating search uses two types of steps: add k, remove l
- More computation
Feature Extraction
- Face recognition problem
- Training data input: pairs of Image + Label (name)
- Classifier input: Image; classifier output: Label (name)
- Image: a matrix of 256x256 = 65536 values in the range 0..255
- Each pixel bears little information, so we cannot simply select the 100 best ones
- The average of the pixels around specific positions may give an indication about, e.g., eye color
Projection
Find a projection matrix W from d-dimensional to k-dimensional vectors that keeps the error low.
PCA: Motivation
- Assume that the d observables are linear combinations of k < d vectors: z_i = w_{i1} x_1 + \cdots + w_{id} x_d
- We would like to work with this basis, as it has lower dimension and carries all (or almost all) of the required information
- What we expect from such a basis:
  - Uncorrelated, or else it can be reduced further
  - Large variance (i.e., the z_i have large variation), or else it bears no information
PCA: Motivation
- Choose directions such that the total variance of the data will be maximal: maximize total variance
- Choose directions that are orthogonal: minimize correlation
- Choose k < d orthogonal directions which maximize total variance
PCA
Choosing only one direction:
- Maximize the variance \mathrm{Var}(z_1) = w_1^T \Sigma w_1 subject to the constraint \|w_1\| = 1, using a Lagrange multiplier
- Taking derivatives gives \Sigma w_1 = \alpha w_1, so w_1 is an eigenvector of \Sigma
- Since we want to maximize the variance, we should choose the eigenvector with the largest eigenvalue
PCA
- d-dimensional feature space
- d x d symmetric covariance matrix estimated from samples
- Select the k largest eigenvalues of the covariance matrix and the associated k eigenvectors
- The first eigenvector is the direction with the largest variance
What PCA Does
z = W^T (x - m)
where the columns of W are the eigenvectors of S, and m is the sample mean.
It centers the data at the origin and rotates the axes.
How to Choose k?
Proportion of variance (PoV) explained:
\mathrm{PoV}(k) = \frac{\lambda_1 + \lambda_2 + \cdots + \lambda_k}{\lambda_1 + \lambda_2 + \cdots + \lambda_k + \cdots + \lambda_d}
where the \lambda_i are sorted in descending order.
- Typically, stop at PoV > 0.9
- Scree graph: plot PoV vs. k, stop at the elbow
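PCA via eigendecomposition of the sample covariance, with k chosen by PoV, can be sketched as follows. The synthetic data (most variance along two directions embedded in 5-D) and the 0.9 threshold illustrate the procedure described above.

```python
import numpy as np

# Synthetic data: essentially rank-2 structure in a 5-D space, plus small noise.
rng = np.random.default_rng(4)
Z = rng.normal(size=(500, 2)) * np.array([5.0, 2.0])
A = rng.normal(size=(2, 5))
X = Z @ A + 0.1 * rng.normal(size=(500, 5))

m = X.mean(axis=0)
S = np.cov(X, rowvar=False)                       # sample covariance
eigval, eigvec = np.linalg.eigh(S)                # eigh returns ascending order
eigval, eigvec = eigval[::-1], eigvec[:, ::-1]    # sort descending

pov = np.cumsum(eigval) / eigval.sum()            # PoV(k) for k = 1, 2, ...
k = int(np.searchsorted(pov, 0.9) + 1)            # smallest k with PoV >= 0.9

W = eigvec[:, :k]                                 # top-k eigenvectors
z = (X - m) @ W                                   # z = W^T (x - m), per row
```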
PCA
- PCA is unsupervised (it does not take class information into account)
- Can take classes into account: Karhunen-Loeve expansion
  - Estimate the covariance per class, take the average weighted by the priors
- Common principal components: assume all classes have the same eigenvectors (directions) but different variances
PCA
- Does not try to explain noise: large noise can become a new dimension/the largest PC
- Interested in the resulting uncorrelated variables which explain a large portion of the total sample variance
- Sometimes interested instead in the explained shared variance (common factors) that affects the data