BIOINF 585: Machine Learning for Systems Biology & Clinical Informatics

Lecture 14: Dimension Reduction
Jie Wang, Department of Computational Medicine & Bioinformatics, University of Michigan

Outline
What is feature reduction?
Why feature reduction?
Feature reduction algorithms
Principal Component Analysis (PCA)

What is feature reduction?
Feature reduction refers to the mapping of the original high-dimensional data onto a lower-dimensional space.
The criterion for feature reduction can be different based on different problem settings.
Unsupervised setting: minimize the information loss.
Supervised setting: maximize the class discrimination.
Given a set of data points $x_1, x_2, \ldots, x_n$ with $p$ variables, we compute a linear transformation $G \in \mathbb{R}^{p \times d}$: $x \in \mathbb{R}^p \mapsto y = G^T x \in \mathbb{R}^d$, $d \ll p$.

What is feature reduction?
The original data $x \in \mathbb{R}^p$ is mapped by the linear transformation $G^T \in \mathbb{R}^{d \times p}$ to the reduced data $y \in \mathbb{R}^d$:
$G \in \mathbb{R}^{p \times d}$: $x \in \mathbb{R}^p \mapsto y = G^T x \in \mathbb{R}^d$, $d \ll p$.
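
As a quick concrete illustration of this mapping, here is a minimal numpy sketch; the matrix G below is just a random placeholder (in practice a method such as PCA would learn it from the data):

```python
import numpy as np

# Toy illustration of y = G^T x with x in R^p, G in R^{p x d}, d << p.
rng = np.random.default_rng(0)
p, d, n = 50, 3, 100            # original dimension, reduced dimension, number of samples

X = rng.normal(size=(n, p))     # rows are the data points x_1, ..., x_n
G = rng.normal(size=(p, d))     # placeholder transformation; PCA would choose G from the data

Y = X @ G                       # row i is y_i = G^T x_i in R^d
print(X.shape, "->", Y.shape)   # (100, 50) -> (100, 3)
```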

High-dimensional data
Examples shown in the slide: gene expression, face images, handwritten digits.

Feature reduction vs. feature selection
Feature reduction: we create a small number of new features based on the full set of original features; the transformed features can be linear combinations of the original ones.
Feature selection: we only make use of a subset of the original features (sparse models).
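
To make the contrast concrete, here is a small numpy sketch (the weight matrix and the selected indices are hypothetical placeholders): feature reduction builds new features as linear combinations of all original ones, while feature selection keeps a subset of the original columns unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 50))        # 100 samples, 50 original features

# Feature reduction: every new feature mixes all original features.
G = rng.normal(size=(50, 3))          # placeholder weights; PCA/LDA would learn these
X_reduced = X @ G                     # shape (100, 3), linear combinations of all 50 features

# Feature selection: keep a subset of the original features unchanged.
selected = [0, 7, 23]                 # hypothetical indices chosen by some selection method
X_selected = X[:, selected]           # shape (100, 3), original features only
```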

Why feature reduction?
Most machine learning and data mining techniques may not be effective for high-dimensional data (curse of dimensionality).
The intrinsic dimension may be small; for example, the number of genes responsible for a certain type of disease may be small.

Why feature reduction?
Visualization: projection of high-dimensional data onto 2D or 3D.
Data compression: efficient storage and retrieval.
Feature extraction: synthesis of a smaller set of new features that are hopefully more useful.

Feature reduction algorithms
Unsupervised: independent component analysis (ICA), principal component analysis (PCA), canonical correlation analysis (CCA).
Supervised: linear discriminant analysis (LDA).
Semi-supervised: an active research topic.

Feature reduction algorithms
Linear: principal component analysis (PCA), linear discriminant analysis (LDA), canonical correlation analysis (CCA).
Nonlinear: nonlinear feature reduction using kernels, manifold learning.

Principal Component Analysis
Principal component analysis (PCA) reduces the dimensionality of a data set by finding a new set of variables, smaller than the original set of variables, that retains most of the sample's information.
It is useful for the compression and classification of data.
By information, we mean the variance present in the sample.
The new variables, called principal components (PCs), are uncorrelated, and are ordered by the fraction of the total information each retains.

How to derive PCs
Maximize the variance of the data samples that are projected into a subspace with a fixed dimension.
Minimize the reconstruction error.
(These two views lead to the same principal components.)

Sample variance
Given samples $x_1, x_2, \ldots, x_n$, the sample average is $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
The sample variance is $\frac{1}{n}\sum_{i=1}^{n} \|x_i - \bar{x}\|_2^2$.
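
A short numpy check of these formulas, assuming the data points are stored as the rows of a matrix X:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))                            # rows are x_1, ..., x_n

x_bar = X.mean(axis=0)                                   # sample average (1/n) sum_i x_i
sample_var = np.mean(np.sum((X - x_bar) ** 2, axis=1))   # (1/n) sum_i ||x_i - x_bar||_2^2
print(sample_var)
```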

The first principal component
The first principal component is the vector $p_1$, with $\|p_1\| = 1$, such that the variance of the data projected onto $p_1$ is maximized.
The projection of $x_i$ onto $p_1$ is $z_i = \langle p_1, x_i \rangle \, p_1$ (for projections, see G. Strang, Lectures 15 & 16, http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/index.htm).
The variance of the projected data onto $p_1$ is
$\frac{1}{n}\sum_{i=1}^{n} \|z_i - \bar{z}\|_2^2 = \frac{1}{n}\sum_{i=1}^{n} \|\langle p_1, x_i \rangle p_1 - \langle p_1, \bar{x} \rangle p_1\|_2^2 = \frac{1}{n}\sum_{i=1}^{n} \langle p_1, x_i - \bar{x} \rangle^2 \, \|p_1\|_2^2$,
and since $\|p_1\|_2^2 = 1$, this equals $\frac{1}{n}\sum_{i=1}^{n} \langle p_1, x_i - \bar{x} \rangle^2$.

The first principal component
The variance of the projected data onto $p_1$ is
$\frac{1}{n}\sum_{i=1}^{n} \langle p_1, x_i - \bar{x} \rangle^2 = \frac{1}{n}\sum_{i=1}^{n} p_1^T (x_i - \bar{x})(x_i - \bar{x})^T p_1 = p_1^T \left( \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T \right) p_1 = p_1^T S p_1$,
where $S = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T$ is the (symmetric) covariance matrix.

The first principal component
Finding the first principal component boils down to the following optimization problem:
$\max_{p_1 : \|p_1\| = 1} \; p_1^T S p_1$.
We solve this problem with a Lagrange multiplier (see https://www.cs.berkeley.edu/~klein/papers/lagrange-multipliers.pdf).
Lagrange function: $L(p_1, \lambda_1) = p_1^T S p_1 + \lambda_1 (1 - p_1^T p_1)$.
Optimality condition: $\nabla_{p_1} L(p_1, \lambda_1) = 2 S p_1 - 2 \lambda_1 p_1 = 0$, so $S p_1 = \lambda_1 p_1$, i.e., $p_1$ is an eigenvector of $S$.
Then $p_1^T S p_1 = \lambda_1 p_1^T p_1 = \lambda_1$, so to maximize the variance, $\lambda_1$ must be the largest eigenvalue of $S$.
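
A minimal numpy sketch of this result on synthetic data: the first principal component is the eigenvector of the sample covariance matrix S with the largest eigenvalue, and the variance of the projected data equals that eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 4)) @ np.diag([3.0, 2.0, 1.0, 0.5])   # anisotropic toy data

Xc = X - X.mean(axis=0)                       # center the data
S = (Xc.T @ Xc) / X.shape[0]                  # S = (1/n) sum_i (x_i - x_bar)(x_i - x_bar)^T

eigvals, eigvecs = np.linalg.eigh(S)          # S is symmetric; eigenvalues in ascending order
p1, lambda1 = eigvecs[:, -1], eigvals[-1]     # eigenvector with the largest eigenvalue

# The variance of the data projected onto p1 equals lambda1 (up to numerical error).
print(lambda1, np.mean((Xc @ p1) ** 2))
```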

The first principal component
Figure: projection of two-dimensional data using one-dimensional PCA. The figure is from D. Barber, Bayesian Reasoning and Machine Learning.

The second principal component
The second principal component is the vector $p_2$ such that $p_2$ is perpendicular to $p_1$ and the variance of the data projected onto $p_2$ is maximized:
$\max_{p_1^T p_2 = 0, \, \|p_2\| = 1} \; p_2^T S p_2$.
Lagrange function: $L(p_2, \lambda_2, \mu) = p_2^T S p_2 + \lambda_2 (1 - p_2^T p_2) - \mu \, p_1^T p_2$.
Optimality condition: $\nabla_{p_2} L(p_2, \lambda_2, \mu) = 2 S p_2 - 2 \lambda_2 p_2 - \mu p_1 = 0$.
Left-multiplying by $p_1^T$ gives $2 p_1^T S p_2 - 2 \lambda_2 p_1^T p_2 - \mu p_1^T p_1 = 0$, i.e., $2 \lambda_1 p_1^T p_2 - 2 \lambda_2 p_1^T p_2 - \mu p_1^T p_1 = 0$ (using $S p_1 = \lambda_1 p_1$ and the symmetry of $S$), so $\mu = 0$ since $p_1^T p_2 = 0$ and $p_1^T p_1 = 1$.
Therefore $S p_2 = \lambda_2 p_2$ and $p_2^T S p_2 = \lambda_2 p_2^T p_2 = \lambda_2$: $p_2$ is an eigenvector of $S$, and $\lambda_2$ is the second largest eigenvalue of $S$.
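
Continuing on the same kind of synthetic data, a short sketch verifying that $p_2$ is orthogonal to $p_1$ and captures the second largest eigenvalue; more generally, the top $d$ eigenvectors of $S$ give the transformation $G$.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 4)) @ np.diag([3.0, 2.0, 1.0, 0.5])   # same toy data as above
Xc = X - X.mean(axis=0)
S = (Xc.T @ Xc) / X.shape[0]                  # sample covariance matrix

eigvals, eigvecs = np.linalg.eigh(S)          # ascending eigenvalues; S is symmetric
p1, p2 = eigvecs[:, -1], eigvecs[:, -2]       # top two eigenvectors

print(np.dot(p1, p2))                         # ~0: p2 is perpendicular to p1
print(eigvals[-2], np.mean((Xc @ p2) ** 2))   # second largest eigenvalue ~ variance along p2

# More generally, the top d eigenvectors form the transformation G in R^{p x d};
# the columns of the reduced data Y are uncorrelated.
d = 2
G = eigvecs[:, ::-1][:, :d]
Y = Xc @ G
```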

PCA for image compression
Figure: reconstructions of an image using d = 1, 2, 4, 8, 16, 32, 64, and 100 principal components, compared with the original image.
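
The following sketch illustrates the idea behind this slide on a synthetic grayscale "image" (a random low-rank matrix standing in for a real photo): treat the rows as data points, keep the top d principal components, and reconstruct.

```python
import numpy as np

rng = np.random.default_rng(4)
# Synthetic low-rank "image" standing in for a real grayscale photo.
img = rng.normal(size=(64, 8)) @ rng.normal(size=(8, 64))

row_mean = img.mean(axis=0)
C = img - row_mean                           # treat the image rows as data points
S = (C.T @ C) / C.shape[0]                   # covariance across pixel columns
eigvals, eigvecs = np.linalg.eigh(S)

for d in (1, 2, 4, 8, 16, 32):
    G = eigvecs[:, ::-1][:, :d]              # top-d principal components
    recon = (C @ G) @ G.T + row_mean         # project onto d PCs, then map back
    err = np.linalg.norm(img - recon) / np.linalg.norm(img)
    print(f"d={d:2d}  relative reconstruction error: {err:.3f}")
```

Because the synthetic image has rank 8, the error drops to numerical noise once d reaches 8, mirroring how the reconstructions in the slide approach the original image as d grows.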

References
C. Burges. Dimension Reduction: A Guided Tour. Foundations and Trends in Machine Learning, 2009.
J. Cunningham and Z. Ghahramani. Linear Dimensionality Reduction: Survey, Insights, and Generalizations. Journal of Machine Learning Research, 2015.