Dimensionality reduction. Feature selection.


CS 1675 Introduction to Machine Learning
Lecture: Dimensionality reduction. Feature selection.
Milos Hauskrecht, milos@cs.pitt.edu, 5329 Sennott Square

Dimensionality reduction. Motivation.
ML methods are sensitive to the dimensionality of the data.
Question: Is there a lower dimensional representation of the data that captures well its characteristics?
Objective of dimensionality reduction: find a lower dimensional representation of the data.
Two learning problems:
- Supervised: D = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}, where x_n = (x_{n,1}, ..., x_{n,d})
- Unsupervised: D = {x_1, x_2, ..., x_N}, where x_n = (x_{n,1}, ..., x_{n,d})
Goal: replace x = (x_1, x_2, ..., x_d) with x' of dimensionality d' < d.

Dimensionality reduction. Solutions:
- Selection of a smaller subset of inputs (features) from a large set of inputs; train the classifier on the reduced input set.
- Combination of high dimensional inputs into a smaller set of features; train the classifier on the new features.
[Figure: d inputs mapped to k features, by selection vs. by combination.]

Task-dependent feature selection
Assume a classification problem: x input vector, y output.
Objective: find a subset of inputs/features that gives/preserves most of the output prediction capabilities.
Selection approaches:
- Filtering approaches (last lecture): filter out features with small predictive potential. Done before classification; typically uses univariate analysis.
- Wrapper approaches: select features that directly optimize the accuracy of the multivariate classifier.
- Embedded methods: feature selection and learning closely tied in the method (regularization methods, decision tree methods).

Feature selection through filtering
Assume a classification problem: x input vector, y output.
How to select the features/inputs? For each input x_i:
- calculate a score reflecting how well x_i alone predicts the output y;
- pick the inputs with the best scores (or, equivalently, eliminate/filter the inputs with the worst scores).

Feature scoring for classification
Scores for measuring the differential expression:
- T-test score (Baldi & Long): based on the test of whether the two groups come from the same population. Null hypothesis: mean of class 0 = mean of class 1.
[Figure: class-conditional distributions of a feature for Class 0 and Class 1.]
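The t-score formula itself did not survive the transcription; the sketch below assumes the usual unpooled two-sample statistic and numpy, so treat it as an illustration rather than the slides' exact score.

import numpy as np

def t_scores(X, y):
    # Two-sample t-score of each feature (column of X) for classes y in {0, 1}.
    # A large |t| suggests the feature separates the two classes well.
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    v0, v1 = X0.var(axis=0, ddof=1), X1.var(axis=0, ddof=1)
    return (m0 - m1) / np.sqrt(v0 / len(X0) + v1 / len(X1))

def filter_top_k(X, y, k):
    # Keep the k inputs with the best scores; equivalently, filter out the rest.
    idx = np.argsort(-np.abs(t_scores(X, y)))[:k]
    return X[:, idx], idx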

Feature scoring for classification
Scores for measuring the differential expression:
- Fisher score.
- AUROC score: Area under the Receiver Operating Characteristic curve.
[Figure: Fisher score illustrated on the class-conditional distributions of Class 0 and Class 1.]

Feature scoring
- Correlation coefficients: measure linear dependences,
  rho(x_k, y) = Cov(x_k, y) / sqrt( Var(x_k) Var(y) ).
- Mutual information: measures dependences; needs discretized input values,
  I(x_k, y) = Σ_i Σ_j P̃(x_k = i, y = j) log [ P̃(x_k = i, y = j) / ( P̃(x_k = i) P̃(y = j) ) ].
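A minimal sketch of the two scores whose formulas appear above, assuming numpy arrays and, for the mutual information, inputs that are already discretized:

import numpy as np

def correlation_score(x, y):
    # Absolute Pearson correlation |Cov(x, y)| / sqrt(Var(x) Var(y));
    # captures only linear dependences between feature x and target y.
    return abs(np.cov(x, y, ddof=1)[0, 1]) / (x.std(ddof=1) * y.std(ddof=1))

def mutual_information(x, y):
    # Empirical mutual information I(x, y) from relative frequencies;
    # x must already be discretized into a small set of values.
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            p_xy = np.mean((x == xv) & (y == yv))
            if p_xy > 0:
                mi += p_xy * np.log(p_xy / (np.mean(x == xv) * np.mean(y == yv)))
    return mi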

Feature/input dependences
Univariate score assumptions:
- Only one input and its effect on y is incorporated in the score.
- Effects of two features on y are considered to be independent.
Correlation-based feature selection: a partial solution to the above problem.
Idea: good feature subsets contain features that are highly correlated with the class but independent of each other.
Assume a set of features S of size k. Then
  Merit(S) = k r̄_{xy} / sqrt( k + k(k-1) r̄_{xx} ),
where r̄_{xy} is the average correlation between the x's and the class y, and r̄_{xx} is the average correlation between pairs of x's. (A code sketch of the merit follows after the next slide.)

Feature selection: low sample size
Problem: many inputs and a low sample size. If there are many random features and not many instances we can learn from, then features with a good differential-expression score may arise simply by chance. The probability of this happening can be quite large.
Techniques to address the problem: reduce FDR (False Discovery Rate) and FWER (Family-Wise Error Rate).
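A minimal sketch of the merit computation, assuming numpy; the helper name cfs_merit and the subset-of-column-indices interface are illustrative:

import numpy as np

def cfs_merit(X, y, subset):
    # Merit of a feature subset: high average feature-class correlation,
    # low average feature-feature correlation. subset indexes columns of X.
    k = len(subset)
    r_xy = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_xy
    r_xx = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * r_xy / np.sqrt(k + k * (k - 1) * r_xx)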

Feature selection: wrappers
Wrapper approach: the input/feature selection is driven by the prediction accuracy of the classifier (regressor) we actually want to build.
How to find the appropriate feature subset S? For d inputs/features there are 2^d different feature subsets.
Idea: greedy search in the space of classifiers.
- Gradually add features that improve the quality of the model, or
- gradually remove features that affect the accuracy the least.
The score should reflect the accuracy of the classifier (error) and also prevent overfit.
Standard way to measure the quality of the model: internal cross-validation (k-fold cross-validation).

Internal cross-validation
Split the train set into internal train and test sets:
- Internal train set: train different models (defined, e.g., on different subsets of features).
- Internal test set(s): estimate the generalization error and select the best model among the possible models.
Internal cross-validation (k-fold):
- Divide the train data into k equal partitions of size N/k.
- Hold out one partition for validation; train the classifiers on the rest of the data.
- Repeat such that every partition is held out once.
- The estimate of the generalization error of the learner is the mean of the errors on all partitions.
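A minimal sketch of the internal k-fold estimate for one candidate feature subset, assuming scikit-learn's LogisticRegression and KFold (any classifier with fit/predict would do; cv_error is a hypothetical helper name):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def cv_error(X, y, subset, k=5):
    # Mean held-out error of a classifier trained on the given feature subset.
    errors = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        model = LogisticRegression(max_iter=1000)
        model.fit(X[np.ix_(train_idx, subset)], y[train_idx])
        errors.append(np.mean(model.predict(X[np.ix_(test_idx, subset)]) != y[test_idx]))
    return np.mean(errors)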

Feature selection: wrappers
Example: greedy forward search. Assume a logistic regression model.
- Start with a simple model: p(y = 1 | x, w) = g(w_0).
- Choose the feature x_j with the best error in the internal step: p(y = 1 | x, w) = g(w_0 + w_j x_j).
- Choose the next feature x_i with the best error in the internal step: p(y = 1 | x, w) = g(w_0 + w_i x_i + w_j x_j).
- Etc. (A sketch of this loop follows after the next slide.)
When to stop? Goal: stop adding features when the internal error on the data stops improving.

Embedded methods
Feature selection + classification model learning done jointly.
Examples of embedded methods:
- Regularized models: models of higher complexity are explicitly penalized, leading to virtual removal of inputs from the model. Covers: regularized logistic/linear regression; support vector machines (optimization of margins penalizes nonzero weights). Function to optimize:
  J(w, D) = L(w, D) + λ R(w),
  where L(w, D) is the loss function (fit of the data) and R(w) is the regularization penalty.
- CART/decision trees.
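Continuing that sketch, a greedy forward pass that stops once the internal error no longer improves; it reuses the hypothetical cv_error helper from the previous block:

import numpy as np

def greedy_forward(X, y, max_features=None):
    # Add one feature at a time, keeping the addition that lowers the
    # internal CV error the most; stop when no addition improves it.
    selected, best_err = [], np.inf
    remaining = list(range(X.shape[1]))
    while remaining and (max_features is None or len(selected) < max_features):
        errs = {j: cv_error(X, y, selected + [j]) for j in remaining}
        j_best = min(errs, key=errs.get)
        if errs[j_best] >= best_err:
            break  # internal error stopped improving
        best_err = errs[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected, best_err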

Unsupervised dimensionality reduction
Is there a lower dimensional representation of the data that captures well its characteristics?
Assume: we have data D = {x_1, x_2, ..., x_N} such that x_n = (x_{n,1}, ..., x_{n,d}). Assume the dimension d of a data point is very large. We want to analyze x; there is no class label y.
Our goal: find a lower dimensional representation of the data of dimension d' < d.

Principal component analysis (PCA)
Objective: we want to replace a high dimensional input x with a small set of inputs obtained by combining the inputs. (Different from feature subset selection!)
PCA: a linear transformation of the d-dimensional input x to an m-dimensional feature vector z such that z = A x.
Many different transformations exist; which one to pick? PCA selects the linear transformation for which the retained variance is maximal. Or, equivalently, it is the linear transformation for which the sum-of-squares reconstruction cost is minimized.

PCA: example
[Figure: a data scatter and its projections onto different axes, compared with the PCA projection.]

PCA
[Figure: PCA projection to the 2-dimensional space: Xprm = 0.04x + 0.06y - 0.99z, Yprm = 0.70x + 0.70y + 0.07z; 97% of the variance is retained.]

Principal component analysis (PCA)
PCA: a linear transformation of the d-dimensional input x to an m-dimensional vector z such that z = A x, under which the retained variance is maximal. (Remember: no y is needed.)
Fact: a vector x can be represented using a set of orthonormal vectors u_i:
  x = Σ_{i=1}^{d} z_i u_i.
This leads to a transformation of coordinates from x to z using the u_i's:
  z_i = u_i^T x.
New basis: u_1, u_2, u_3. Standard basis: (1,0,0); (0,1,0); (0,0,1).
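A quick numerical check of this fact, assuming numpy and a hypothetical orthonormal basis of R^3:

import numpy as np

# an orthonormal basis of R^3 (a rotation of the standard basis); columns u_1, u_2, u_3
U = np.array([[1, 1, 0],
              [1, -1, 0],
              [0, 0, np.sqrt(2)]]) / np.sqrt(2)

x = np.array([3.0, -1.0, 2.0])
z = U.T @ x                    # coordinates z_i = u_i^T x
x_back = U @ z                 # x = sum_i z_i u_i
print(np.allclose(x, x_back))  # True: the representation is exact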

PCA
Idea: replace the d coordinates with m < d coordinates to represent x. We want to find the best subset of m basis vectors.
How to choose the best set of basis vectors? We want the subset that gives the best approximation of the data x in the dataset on average (we use a least squares fit):
  x̃_n = Σ_{i=1}^{m} z_{n,i} u_i + Σ_{i=m+1}^{d} b_i u_i,  with the b_i constant and fixed.
Error for data entry x_n:
  x_n - x̃_n = Σ_{i=m+1}^{d} (z_{n,i} - b_i) u_i.
Reconstruction error over the dataset:
  E_m = (1/2) Σ_{n=1}^{N} ||x_n - x̃_n||².

PCA (cont.)
Differentiating the error function with regard to all b_i and setting the result equal to 0, we get b_i = u_i^T x̄, where x̄ = (1/N) Σ_{n=1}^{N} x_n.
Then we can rewrite the error as
  E_m = (N/2) Σ_{i=m+1}^{d} u_i^T Σ u_i,  where Σ = (1/N) Σ_{n=1}^{N} (x_n - x̄)(x_n - x̄)^T is the covariance matrix.
The error function is optimized when the basis vectors satisfy Σ u_i = λ_i u_i, so E_m = (N/2) Σ_{i=m+1}^{d} λ_i.
The best basis vectors: discard the vectors with the (d - m) smallest eigenvalues or, equivalently, keep the vectors with the m largest eigenvalues. Eigenvector u_i is called a principal component.
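A minimal numpy sketch of the resulting recipe (eigendecomposition of the sample covariance, keeping the m eigenvectors with the largest eigenvalues):

import numpy as np

def pca(X, m):
    # Project the rows of X onto the m eigenvectors of the sample
    # covariance with the largest eigenvalues (the principal components).
    x_bar = X.mean(axis=0)
    Xc = X - x_bar
    cov = Xc.T @ Xc / len(X)                # covariance matrix Sigma
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    U = eigvecs[:, ::-1][:, :m]             # m principal components
    Z = Xc @ U                              # z_{n,i} = u_i^T (x_n - x_bar)
    return Z, U, eigvals[::-1]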

PCA
Once the m eigenvectors u_i with the largest eigenvalues are identified, they are used to transform the original d-dimensional data to m dimensions.
To find the true dimensionality of the data we can just look at the eigenvalues that contribute the most (small eigenvalues are disregarded).
Problem: PCA is a linear method. The true dimensionality can be overestimated; there can be non-linear correlations. Modifications for nonlinearities: kernel PCA.

Dimensionality reduction with neural nets
PCA is limited to linear dimensionality reduction. To do non-linear reductions we can use neural nets.
Auto-associative (auto-encoder) network: a neural network with the same inputs and outputs. The middle layer z = (z_1, z_2) corresponds to the reduced dimensions.

Dimensionality reduction with neural nets
Error criterion:
  E = (1/2) Σ_{n=1}^{N} Σ_{i=1}^{d} ( y_i(x_n) - x_{n,i} )².
The error measure tries to recover the original data through a limited number of dimensions in the middle layer. Non-linearities are modeled through intermediate layers between the middle layer and the input/output. If no intermediate layers are used, the model replicates the PCA optimization through learning. (A code sketch follows after the next slide.)

Dimensionality reduction through clustering
Clustering algorithms group together similar instances in the data sample.
Dimensionality reduction based on clustering: replace a high dimensional data entry with a cluster label.
Problem: deterministic clustering gives only one label per input; this may not be enough to represent the data for prediction.
Solutions: clustering over subsets of the input data; soft clustering (the probability of a cluster is used directly).
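A minimal auto-encoder sketch, assuming PyTorch; the layer widths, the tanh intermediate layers, and the placeholder data are illustrative choices, not taken from the slides:

import torch
import torch.nn as nn

d, m = 20, 2                       # input dimension, middle-layer dimension
autoencoder = nn.Sequential(
    nn.Linear(d, 10), nn.Tanh(),   # intermediate layer (models non-linearities)
    nn.Linear(10, m),              # middle layer: the reduced dimensions z
    nn.Linear(m, 10), nn.Tanh(),
    nn.Linear(10, d),              # outputs try to reproduce the inputs
)
opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
X = torch.randn(500, d)            # placeholder data
for _ in range(1000):
    opt.zero_grad()
    # reconstruction error E (the constant 1/2 does not change the optimum)
    loss = ((autoencoder(X) - X) ** 2).sum()
    loss.backward()
    opt.step()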

Dimensionality reduction through clustering
Soft clustering (e.g. a mixture of Gaussians) attempts to cover all instances in the data sample with a small number of groups. Each group is more or less responsible for a data entry; the responsibility is the posterior of a group given the data entry:
  h_l = p(u = l | x).
Dimensionality reduction based on soft clustering: replace the high dimensional data x with the set of group posteriors (h_1, ..., h_k); feed all posteriors to the learner (e.g. a linear regressor or classifier).

Dimensionality reduction through clustering (cont.)
We can use the idea of soft clustering before applying regression/classification learning. Two-stage algorithms:
1. Learn the clustering.
2. Learn the classification.
Input of the clustering: the high dimensional x. Output of the clustering: p(u = l | x).
Input of the classifier: p(u = l | x). Output of the classifier: y.
Example: networks with Radial Basis Functions (RBFs).
Problem: a clustering learned based on p(x) disregards the target; the prediction is based on p(y | x).
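A minimal sketch of the two-stage recipe, with scikit-learn's GaussianMixture and LogisticRegression standing in for the clustering and classification stages:

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression

def soft_cluster_features(X_train, y_train, X_test, k=5):
    # Stage 1: learn a mixture of Gaussians on the inputs alone.
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X_train)
    H_train = gmm.predict_proba(X_train)   # responsibilities h_l = p(u = l | x)
    H_test = gmm.predict_proba(X_test)
    # Stage 2: train a classifier on the group posteriors, not the raw inputs.
    clf = LogisticRegression(max_iter=1000).fit(H_train, y_train)
    return clf.predict(H_test)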