Graph Wavelets to Analyze Genomic Data with Biological Networks


1 Graph Wavelets to Analyze Genomic Data with Biological Networks. Yunlong Jiao and Jean-Philippe Vert. "Emerging Topics in Biological Networks and Systems Biology" symposium, Swedish Collegium for Advanced Study, Uppsala, October 11, 2017.

2

3 Motivation

4 Typical problem. $X \in \mathbb{R}^{n \times p}$: gene expression profile of each patient. $Y \in \mathcal{Y}^n$: survival information of each patient. Goal: learn to predict $Y$ from $X$. Difficult because $n < p$.

5 Regularized linear models. Fit a linear model $\beta \in \mathbb{R}^p$ by solving
$$\min_{\beta \in \mathbb{R}^p} R(Y, X\beta) + \lambda J(\beta)$$
where $R(Y, X\beta)$ is an empirical risk that measures the fit to the training data, $J(\beta)$ is a penalty that controls the complexity of the model, and $\lambda > 0$ is a regularization parameter.

6 Standard regularizations
$$\min_{\beta \in \mathbb{R}^p} R(Y, X\beta) + \lambda J(\beta)$$
where:
Lasso: $J(\beta) = \|\beta\|_1$ for gene selection.
Ridge: $J(\beta) = \|\beta\|_2^2$ to address $n \ll p$.
Elastic net: $J(\beta) = \alpha \|\beta\|_1 + (1 - \alpha) \|\beta\|_2^2$.
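
A minimal sketch of these three penalties in practice, assuming scikit-learn (the talk does not name any tooling); the data, dimensions, and regularization strengths below are illustrative, not from the study:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet

rng = np.random.default_rng(0)
n, p = 100, 1000                          # n < p, as in the genomic setting
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:10] = 1.0                      # only a handful of "genes" matter
y = X @ beta_true + 0.1 * rng.standard_normal(n)

models = {
    "lasso": Lasso(alpha=0.1),                     # J = ||beta||_1
    "ridge": Ridge(alpha=1.0),                     # J = ||beta||_2^2
    "e-net": ElasticNet(alpha=0.1, l1_ratio=0.5),  # convex mix of both
}
for name, model in models.items():
    model.fit(X, y)
    print(name, "selects", int(np.sum(model.coef_ != 0)), "features")
```

As the sketch illustrates, only the $\ell_1$-containing penalties zero out coefficients, which is why the lasso and elastic net perform gene selection while ridge does not.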

7 Network-based regularizations. Let $G = (V, E)$ be a graph of genes. What should $J_G(\beta)$ be? $\beta$ should be "smooth" on the graph? Selected genes should be connected?

8 Examples
$J_G(\beta) = \sum_{i \sim j} (\beta_i - \beta_j)^2$ (Rapaport et al., 2007)
$J_G(\beta) = a \|\beta\|_1 + (1 - a) \sum_{i \sim j} (\beta_i - \beta_j)^2$ (Li and Li, 2008)
$J_G(\beta) = \sup_{\alpha \in \mathbb{R}^p : \forall i \sim j, \; \alpha_i^2 + \alpha_j^2 \le 1} \alpha^\top \beta$ (Jacob et al., 2009)
$J_G(\beta) = a \|\beta\|_1 + (1 - a) \sum_{i \sim j} |\beta_i - \beta_j|$ (Hoefling, 2010)


10 From smoothness penalty to Laplacian
$$\sum_{i \sim j} (\beta_i - \beta_j)^2 = \beta^\top L \beta$$
where $L = D - A$ is the graph Laplacian, with $A$ the adjacency matrix and $D$ the diagonal degree matrix.
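
The identity is easy to check numerically. A small sketch on a hypothetical 4-gene graph:

```python
import numpy as np

# A hypothetical 4-gene graph: edges connect interacting genes.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
p = 4
A = np.zeros((p, p))
for i, j in edges:
    A[i, j] = A[j, i] = 1                 # adjacency matrix
D = np.diag(A.sum(axis=1))                # diagonal degree matrix
L = D - A                                 # graph Laplacian

beta = np.array([1.0, 0.5, -0.2, 0.3])
quad = beta @ L @ beta
edge_sum = sum((beta[i] - beta[j]) ** 2 for i, j in edges)
assert np.isclose(quad, edge_sum)         # the smoothness identity holds
print(quad)
```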

11 From Laplacian to Fourier analysis: $L = U \Lambda U^\top$

The eigenvectors $U$ of $L$ form the Fourier basis: $\hat\beta = U^\top \beta$. The eigenvalues $\Lambda = \mathrm{diag}(0 = \lambda_1 \le \ldots \le \lambda_p)$ represent the "frequencies" of the Fourier basis.

(Slides 11 to 19 repeat this statement over a sequence of plots showing the eigenvector associated with each frequency, for $\lambda = 0$, 0.12, 0.47, 1, 1.7, 2.3, 3, 3.5, and 3.9: the eigenvectors oscillate more and more rapidly over the graph as $\lambda$ grows.)

20 Smoothness in the Fourier domain

The smoothness penalty therefore penalizes Fourier coefficients corresponding to high frequencies:
$$J(\beta) = \sum_{i \sim j} (\beta_i - \beta_j)^2 = \beta^\top L \beta = \beta^\top U \Lambda U^\top \beta = \hat\beta^\top \Lambda \hat\beta = \sum_{i=1}^p \lambda_i \hat\beta_i^2$$
"The linear model mapped on the graph should have little energy at high frequency."

Rapaport et al. (2007) extend this to more general penalties:
$$J_\phi(\beta) = \sum_{i=1}^p \phi(\lambda_i) \hat\beta_i^2 \quad \text{s.t.} \quad \beta = U \hat\beta$$
for $\phi : \mathbb{R}_+ \to \mathbb{R}_+$ non-decreasing.
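
A sketch of the same computation in the Fourier domain, reusing the toy graph above; the kernel $\phi(\lambda) = e^\lambda$ at the end is an illustrative choice of non-decreasing $\phi$, not one from the talk:

```python
import numpy as np

# Toy 4-gene graph (same as the previous sketch).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
A = np.zeros((4, 4))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

lam, U = np.linalg.eigh(L)        # L = U diag(lam) U^T, eigenvalues ascending
beta = np.array([1.0, 0.5, -0.2, 0.3])
beta_hat = U.T @ beta             # graph Fourier coefficients
assert np.isclose(beta @ L @ beta, np.sum(lam * beta_hat ** 2))

# Generalized penalty of Rapaport et al. with an illustrative phi(x) = exp(x).
print(np.sum(np.exp(lam) * beta_hat ** 2))
```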

21 Fourier vs wavelets

Fourier: localized in frequency. Wavelets: localized in frequency AND space.

22 Wavelets on graphs

A family of vectors $\{\Psi_{v,s}\} \subset \mathbb{R}^p$ where $v \in \{1, \ldots, p\}$ is a vertex (space) and $s \in \mathbb{R}_+$ is a scale (frequency). In practice we choose a small number of scales $s_1 < \ldots < s_S$; this results in $p \times S$ vectors (an overcomplete basis).

23 Example on graphs

Figure (Hammond et al., 2011, Fig. 4): spectral graph wavelets on the Minnesota road graph, with K = 100, J = 4 scales. (a) Vertex at which wavelets are centered, (b) scaling function, (c)-(f) wavelets at scales 1-4.

24 Example on graphs

Figure (Hammond et al., 2011, Fig. 5): spectral graph wavelets on the cerebral cortex, with K = 50, J = 4 scales. (a) ROI at which wavelets are centered, (b) scaling function, (c)-(f) wavelets at scales 1-4.

25 How to make the wavelet basis $\{\Psi_{v,s}\}$?

Hammond et al. (2011) propose spectral graph wavelets. Formally, at scale $s > 0$,
$$\Psi_s = (\Psi_{1,s} \ldots \Psi_{p,s}) = U g(s\Lambda) U^\top$$
where $g : \mathbb{R}_+ \to \mathbb{R}_+$ is a function that satisfies:
$g$ is a band-pass filter (localization in frequency);
$g$ is smooth near 0 (this ensures localization in space).
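
A sketch of this construction on the toy graph; the kernel $g(x) = x e^{-x}$ is a simple illustrative band-pass choice ($g(0) = 0$, smooth near 0, decaying at high frequency), not the kernel Hammond et al. actually design:

```python
import numpy as np

# Toy 4-gene graph and its Laplacian eigendecomposition.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
A = np.zeros((4, 4))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)

def g(x):
    # Band-pass kernel: vanishes at 0, smooth near 0, decays at infinity.
    return x * np.exp(-x)

def wavelet_basis(s):
    """Psi_s = U g(s Lambda) U^T; column v is the wavelet centered at vertex v."""
    return U @ np.diag(g(s * lam)) @ U.T

scales = [0.5, 1.0, 2.0]          # s_1 < ... < s_S, chosen by hand here
B = np.hstack([wavelet_basis(s) for s in scales])
print(B.shape)                    # (4, 12): p * S atoms, an overcomplete set
```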

26 Graph wavelet-based regularization

Given a graph, compute a redundant set of $S \times p$ wavelet basis vectors at different scales: $B = (\Psi_{s_1} \ldots \Psi_{s_S})$. Take for penalty the atomic norm:
$$J(\beta) = \min \left\{ \sum_{i=1}^{Sp} |c_i| \; : \; \beta = \sum_{i=1}^{Sp} c_i B_i \right\}$$
$J(\beta)$ is small when $\beta$ is a sum of a few atoms. The atom $\Psi_{v,s}$ has weights in a neighborhood of $v$ of "size" $s$.
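
Because $J$ is an $\ell_1$ norm on the synthesis coefficients, fitting the penalized model reduces to a lasso in $c$ with design matrix $XB$, mapping back via $\beta = Bc$. A sketch under that formulation, assuming scikit-learn and the toy dictionary from the previous sketch; the data are synthetic:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Rebuild the toy wavelet dictionary B from the previous sketch.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
A = np.zeros((4, 4))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)
g = lambda x: x * np.exp(-x)
B = np.hstack([U @ np.diag(g(s * lam)) @ U.T for s in (0.5, 1.0, 2.0)])

# Synthetic data whose true model is a single wavelet atom.
rng = np.random.default_rng(0)
n = 50
X = rng.standard_normal((n, 4))
c_true = np.zeros(B.shape[1])
c_true[3] = 1.5                           # one active atom
y = X @ (B @ c_true) + 0.05 * rng.standard_normal(n)

# Lasso in the coefficients c; the fitted model in gene space is beta = B c.
lasso = Lasso(alpha=0.05).fit(X @ B, y)
beta = B @ lasso.coef_
print(int(np.sum(lasso.coef_ != 0)), "atoms selected")
```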

27 Summary: Fourier vs. wavelet penalty
$$\min_{\beta \in \mathbb{R}^p} R(Y, X\beta) + \lambda J(\beta)$$
Fourier: $J_\phi(\beta) = \sum_{i=1}^p \phi(\lambda_i) \hat\beta_i^2$ s.t. $\beta = U \hat\beta$; $\beta$ will be smooth on the graph.

Wavelet: $J(\beta) = \min \{ \|c\|_1 : \beta = Bc \}$; $\beta$ will decompose as a sum of localized functions on the graph (pathways?).

28 Experiment

Protein-protein interaction (PPI) network obtained from the Human Protein Reference Database (HPRD). METABRIC breast cancer dataset: n = 1,981 breast cancer samples paired with survival information of the patients; expression data for a total of 24,771 genes, among which p = 9,117 genes are found with known interactions in HPRD.

Benchmark study comparing 6 penalty functions:

Label | Penalty function $J(\beta)$ | Network-based | Gene selection
ridge | $\|\beta\|_2^2$ | no | no
lasso | $\|\beta\|_1$ | no | yes
e-net | $a\|\beta\|_1 + (1-a)\|\beta\|_2^2$ | no | yes
lap | $\sum_{i \sim j} (\beta_i - \beta_j)^2$ | yes | no
laplasso | $a\|\beta\|_1 + (1-a)\sum_{i \sim j} (\beta_i - \beta_j)^2$ | yes | yes
wavelet | $\min_\theta \|\theta\|_1$ s.t. $\beta = \Psi\theta$ | yes | yes

29 Prediction performance

Figure: boxplots of survival risk prediction performance, evaluated by concordance index over 5-fold cross-validation repeated 10 times on the METABRIC data, for ridge, lasso, e-net, lap, laplasso, and wavelet.
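
For reference, the concordance index counts the fraction of comparable patient pairs whose predicted risks are ordered consistently with their survival times. A minimal sketch that ignores censoring (the study's CI on censored METABRIC data would be computed with a survival package):

```python
import numpy as np

def concordance_index(times, risks):
    """CI for uncensored survival times: higher predicted risk should mean
    shorter survival. Ties in risk count as half-concordant."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            if times[i] == times[j]:
                continue                      # pair not comparable
            comparable += 1
            shorter = i if times[i] < times[j] else j
            longer = j if shorter == i else i
            if risks[shorter] > risks[longer]:
                concordant += 1.0
            elif risks[shorter] == risks[longer]:
                concordant += 0.5
    return concordant / comparable

times = np.array([5.0, 2.0, 8.0, 3.0])
risks = np.array([0.3, 0.9, 0.1, 0.6])
print(concordance_index(times, risks))        # 1.0: risks perfectly ordered
```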

30 Prediction performance

Mean concordance index (CI) scores (± standard deviation) of survival risk prediction over 5-fold cross-validation repeated 10 times on the METABRIC data. Methods are ordered by decreasing mean CI score:

Label | Mean CI score (± SD) | Network-based | Gene selection
ridge | (±0.018) | no | no
lap | (±0.0193) | yes | no
laplasso | (±0.0185) | yes | yes
e-net | (±0.0183) | no | yes
wavelet | (±0.0198) | yes | yes
lasso | (±0.0177) | no | yes

31 Gene selection performance: Stability

Figure: number of commonly selected genes vs. number of selected genes, for lasso, e-net, laplasso, and wavelet. Stability of gene selection related to breast cancer survival, estimated over 100 random experiments. The black dotted curve denotes random selection.

32 Gene selection performance: Connectivity

Figure: number of connecting edges vs. number of selected genes, for lasso, e-net, laplasso, and wavelet. Connectivity of genes selected for breast cancer survival; the special marks correspond to the number tuned by cross-validation. The black dotted curve denotes random selection.

33 Gene selection performance: Interpretability

Figure: gene subnetworks related to breast cancer survival identified by the elastic net (10 genes connected out of 112 selected) or the Laplacian lasso (10 genes connected out of 100 selected).

34 Gene selection performance: Interpretability

Figure: gene subnetworks related to breast cancer survival identified by network-based wavelet smoothing (82 genes connected out of 109 selected).

35 Conclusion

Can biological networks help define a structure on high-dimensional omics data? Fourier-based penalties (smoothness, diffusion...) already exist. Wavelet-based penalties decompose a signal over a basis that is localized in space and localized in frequency. Preliminary results on gene expression classification.

36 Thanks!

37 References

D. K. Hammond, P. Vandergheynst, and R. Gribonval. Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis, 30(2):129-150, 2011. doi: 10.1016/j.acha.2010.04.005.

H. Hoefling. A path algorithm for the Fused Lasso Signal Approximator. Journal of Computational and Graphical Statistics, 19(4):984-1006, 2010.

L. Jacob, G. Obozinski, and J.-P. Vert. Group lasso with overlap and graph lasso. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), pages 433-440, New York, NY, USA, 2009. ACM.

C. Li and H. Li. Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics, 24(9):1175-1182, 2008. doi: 10.1093/bioinformatics/btn081.

F. Rapaport, A. Zinovyev, M. Dutreix, E. Barillot, and J.-P. Vert. Classification of microarray data using gene networks. BMC Bioinformatics, 8:35, 2007.

R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1):267-288, 1996.
