Graph Wavelets to Analyze Genomic Data with Biological Networks


1 Graph Wavelets to Analyze Genomic Data with Biological Networks. Yunlong Jiao and Jean-Philippe Vert. "Emerging Topics in Biological Networks and Systems Biology" symposium, Swedish Collegium for Advanced Study, Uppsala, October 11, 2017.

2

3 Motivation

4 Typical problem. $X \in \mathbb{R}^{n \times p}$: gene expression profile of each patient. $Y \in \mathcal{Y}^n$: survival information of each patient. Goal: learn to predict $Y$ from $X$. Difficult because $n < p$.

5 Regularized linear models. Fit a linear model $\beta \in \mathbb{R}^p$ by solving
$$\min_{\beta \in \mathbb{R}^p} R(Y, X\beta) + \lambda J(\beta)$$
where $R(Y, X\beta)$ is an empirical risk that measures the fit to the training data, $J(\beta)$ is a penalty that controls the complexity of the model, and $\lambda > 0$ is a regularization parameter.

6 Standard regularizations
$$\min_{\beta \in \mathbb{R}^p} R(Y, X\beta) + \lambda J(\beta)$$
where:
Lasso: $J(\beta) = \|\beta\|_1$ for gene selection.
Ridge: $J(\beta) = \|\beta\|_2^2$ to address $n \ll p$.
Elastic net: $J(\beta) = \alpha \|\beta\|_1 + (1 - \alpha) \|\beta\|_2^2$.
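
A minimal sketch of these three penalties in practice, assuming scikit-learn (the talk does not name any tooling); the data, dimensions, and regularization strengths below are illustrative, not from the study:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet

rng = np.random.default_rng(0)
n, p = 100, 1000                          # n < p, as in the genomic setting
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:10] = 1.0                      # only a handful of "genes" matter
y = X @ beta_true + 0.1 * rng.standard_normal(n)

models = {
    "lasso": Lasso(alpha=0.1),                     # J = ||beta||_1
    "ridge": Ridge(alpha=1.0),                     # J = ||beta||_2^2
    "e-net": ElasticNet(alpha=0.1, l1_ratio=0.5),  # convex mix of both
}
for name, model in models.items():
    model.fit(X, y)
    print(name, "selects", int(np.sum(model.coef_ != 0)), "features")
```

As the sketch illustrates, only the $\ell_1$-containing penalties zero out coefficients, which is why the lasso and elastic net perform gene selection while ridge does not.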

7 Network-based regularizations. Let $G = (V, E)$ be a graph of genes. What should $J_G(\beta)$ be? $\beta$ should be "smooth" on the graph? Selected genes should be connected?

8 Examples
$J_G(\beta) = \sum_{i \sim j} (\beta_i - \beta_j)^2$ (Rapaport et al., 2007)
$J_G(\beta) = a \|\beta\|_1 + (1 - a) \sum_{i \sim j} (\beta_i - \beta_j)^2$ (Li and Li, 2008)
$J_G(\beta) = \sup_{\alpha \in \mathbb{R}^p : \forall i \sim j, \; \alpha_i^2 + \alpha_j^2 \le 1} \alpha^\top \beta$ (Jacob et al., 2009)
$J_G(\beta) = a \|\beta\|_1 + (1 - a) \sum_{i \sim j} |\beta_i - \beta_j|$ (Hoefling, 2010)


10 From smoothness penalty to Laplacian
$$\sum_{i \sim j} (\beta_i - \beta_j)^2 = \beta^\top L \beta$$
where $L = D - A$ is the graph Laplacian, with $A$ the adjacency matrix and $D$ the diagonal degree matrix.
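
The identity is easy to check numerically. A small sketch on a hypothetical 4-gene graph:

```python
import numpy as np

# A hypothetical 4-gene graph: edges connect interacting genes.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
p = 4
A = np.zeros((p, p))
for i, j in edges:
    A[i, j] = A[j, i] = 1                 # adjacency matrix
D = np.diag(A.sum(axis=1))                # diagonal degree matrix
L = D - A                                 # graph Laplacian

beta = np.array([1.0, 0.5, -0.2, 0.3])
quad = beta @ L @ beta
edge_sum = sum((beta[i] - beta[j]) ** 2 for i, j in edges)
assert np.isclose(quad, edge_sum)         # the smoothness identity holds
print(quad)
```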

11 From Laplacian to Fourier analysis: $L = U \Lambda U^\top$

The eigenvectors $U$ of $L$ form the Fourier basis: $\hat\beta = U^\top \beta$. The eigenvalues $\Lambda = \mathrm{diag}(0 = \lambda_1 \le \ldots \le \lambda_p)$ represent the "frequencies" of the Fourier basis.

(Slides 11 to 19 repeat this statement over a sequence of plots showing the eigenvector associated with each frequency, for $\lambda = 0$, 0.12, 0.47, 1, 1.7, 2.3, 3, 3.5, and 3.9: the eigenvectors oscillate more and more rapidly over the graph as $\lambda$ grows.)

20 Smoothness in the Fourier domain

The smoothness penalty therefore penalizes Fourier coefficients corresponding to high frequencies:
$$J(\beta) = \sum_{i \sim j} (\beta_i - \beta_j)^2 = \beta^\top L \beta = \beta^\top U \Lambda U^\top \beta = \hat\beta^\top \Lambda \hat\beta = \sum_{i=1}^p \lambda_i \hat\beta_i^2$$
"The linear model mapped on the graph should have little energy at high frequency."

Rapaport et al. (2007) extend this to more general penalties:
$$J_\phi(\beta) = \sum_{i=1}^p \phi(\lambda_i) \hat\beta_i^2 \quad \text{s.t.} \quad \beta = U \hat\beta$$
for $\phi : \mathbb{R}_+ \to \mathbb{R}_+$ non-decreasing.
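
A sketch of the same computation in the Fourier domain, reusing the toy graph above; the kernel $\phi(\lambda) = e^\lambda$ at the end is an illustrative choice of non-decreasing $\phi$, not one from the talk:

```python
import numpy as np

# Toy 4-gene graph (same as the previous sketch).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
A = np.zeros((4, 4))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

lam, U = np.linalg.eigh(L)        # L = U diag(lam) U^T, eigenvalues ascending
beta = np.array([1.0, 0.5, -0.2, 0.3])
beta_hat = U.T @ beta             # graph Fourier coefficients
assert np.isclose(beta @ L @ beta, np.sum(lam * beta_hat ** 2))

# Generalized penalty of Rapaport et al. with an illustrative phi(x) = exp(x).
print(np.sum(np.exp(lam) * beta_hat ** 2))
```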

21 Fourier vs wavelets

Fourier: localized in frequency. Wavelets: localized in frequency AND space.

22 Wavelets on graphs

A family of vectors $\{\Psi_{v,s}\} \subset \mathbb{R}^p$ where $v \in \{1, \ldots, p\}$ is a vertex (space) and $s \in \mathbb{R}_+$ is a scale (frequency). In practice we choose a small number of scales $s_1 < \ldots < s_S$; this results in $p \times S$ vectors (an overcomplete basis).

23 Example on graphs

Figure (Hammond et al., 2011, Fig. 4): spectral graph wavelets on the Minnesota road graph, with K = 100, J = 4 scales. (a) Vertex at which wavelets are centered, (b) scaling function, (c)-(f) wavelets at scales 1-4.

24 Example on graphs

Figure (Hammond et al., 2011, Fig. 5): spectral graph wavelets on the cerebral cortex, with K = 50, J = 4 scales. (a) ROI at which wavelets are centered, (b) scaling function, (c)-(f) wavelets at scales 1-4.

25 How to make the wavelet basis $\{\Psi_{v,s}\}$?

Hammond et al. (2011) propose spectral graph wavelets. Formally, at scale $s > 0$,
$$\Psi_s = (\Psi_{1,s} \ldots \Psi_{p,s}) = U g(s\Lambda) U^\top$$
where $g : \mathbb{R}_+ \to \mathbb{R}_+$ is a function that satisfies:
$g$ is a band-pass filter (localization in frequency);
$g$ is smooth near 0 (this ensures localization in space).
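
A sketch of this construction on the toy graph; the kernel $g(x) = x e^{-x}$ is a simple illustrative band-pass choice ($g(0) = 0$, smooth near 0, decaying at high frequency), not the kernel Hammond et al. actually design:

```python
import numpy as np

# Toy 4-gene graph and its Laplacian eigendecomposition.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
A = np.zeros((4, 4))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)

def g(x):
    # Band-pass kernel: vanishes at 0, smooth near 0, decays at infinity.
    return x * np.exp(-x)

def wavelet_basis(s):
    """Psi_s = U g(s Lambda) U^T; column v is the wavelet centered at vertex v."""
    return U @ np.diag(g(s * lam)) @ U.T

scales = [0.5, 1.0, 2.0]          # s_1 < ... < s_S, chosen by hand here
B = np.hstack([wavelet_basis(s) for s in scales])
print(B.shape)                    # (4, 12): p * S atoms, an overcomplete set
```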

26 Graph wavelet-based regularization

Given a graph, compute a redundant set of $S \times p$ wavelet basis vectors at different scales: $B = (\Psi_{s_1} \ldots \Psi_{s_S})$. Take for penalty the atomic norm:
$$J(\beta) = \min \left\{ \sum_{i=1}^{Sp} |c_i| \; : \; \beta = \sum_{i=1}^{Sp} c_i B_i \right\}$$
$J(\beta)$ is small when $\beta$ is a sum of a few atoms. The atom $\Psi_{v,s}$ has weights in a neighborhood of $v$ of "size" $s$.
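
Because $J$ is an $\ell_1$ norm on the synthesis coefficients, fitting the penalized model reduces to a lasso in $c$ with design matrix $XB$, mapping back via $\beta = Bc$. A sketch under that formulation, assuming scikit-learn and the toy dictionary from the previous sketch; the data are synthetic:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Rebuild the toy wavelet dictionary B from the previous sketch.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
A = np.zeros((4, 4))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)
g = lambda x: x * np.exp(-x)
B = np.hstack([U @ np.diag(g(s * lam)) @ U.T for s in (0.5, 1.0, 2.0)])

# Synthetic data whose true model is a single wavelet atom.
rng = np.random.default_rng(0)
n = 50
X = rng.standard_normal((n, 4))
c_true = np.zeros(B.shape[1])
c_true[3] = 1.5                           # one active atom
y = X @ (B @ c_true) + 0.05 * rng.standard_normal(n)

# Lasso in the coefficients c; the fitted model in gene space is beta = B c.
lasso = Lasso(alpha=0.05).fit(X @ B, y)
beta = B @ lasso.coef_
print(int(np.sum(lasso.coef_ != 0)), "atoms selected")
```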

27 Summary: Fourier vs. wavelet penalty
$$\min_{\beta \in \mathbb{R}^p} R(Y, X\beta) + \lambda J(\beta)$$
Fourier: $J_\phi(\beta) = \sum_{i=1}^p \phi(\lambda_i) \hat\beta_i^2$ s.t. $\beta = U \hat\beta$; $\beta$ will be smooth on the graph.

Wavelet: $J(\beta) = \min \{ \|c\|_1 : \beta = Bc \}$; $\beta$ will decompose as a sum of localized functions on the graph (pathways?).

28 Experiment

Protein-protein interaction (PPI) network obtained from the Human Protein Reference Database (HPRD). METABRIC breast cancer dataset: n = 1,981 breast cancer samples paired with survival information of the patients; expression data for a total of 24,771 genes, among which p = 9,117 genes are found with known interactions in HPRD.

Benchmark study comparing 6 penalty functions:

Label | Penalty function $J(\beta)$ | Network-based | Gene selection
ridge | $\|\beta\|_2^2$ | no | no
lasso | $\|\beta\|_1$ | no | yes
e-net | $a\|\beta\|_1 + (1-a)\|\beta\|_2^2$ | no | yes
lap | $\sum_{i \sim j} (\beta_i - \beta_j)^2$ | yes | no
laplasso | $a\|\beta\|_1 + (1-a)\sum_{i \sim j} (\beta_i - \beta_j)^2$ | yes | yes
wavelet | $\min_\theta \|\theta\|_1$ s.t. $\beta = \Psi\theta$ | yes | yes

29 Prediction performance

Figure: boxplots of survival risk prediction performance, evaluated by concordance index over 5-fold cross-validation repeated 10 times on the METABRIC data, for ridge, lasso, e-net, lap, laplasso, and wavelet.
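
For reference, the concordance index counts the fraction of comparable patient pairs whose predicted risks are ordered consistently with their survival times. A minimal sketch that ignores censoring (the study's CI on censored METABRIC data would be computed with a survival package):

```python
import numpy as np

def concordance_index(times, risks):
    """CI for uncensored survival times: higher predicted risk should mean
    shorter survival. Ties in risk count as half-concordant."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            if times[i] == times[j]:
                continue                      # pair not comparable
            comparable += 1
            shorter = i if times[i] < times[j] else j
            longer = j if shorter == i else i
            if risks[shorter] > risks[longer]:
                concordant += 1.0
            elif risks[shorter] == risks[longer]:
                concordant += 0.5
    return concordant / comparable

times = np.array([5.0, 2.0, 8.0, 3.0])
risks = np.array([0.3, 0.9, 0.1, 0.6])
print(concordance_index(times, risks))        # 1.0: risks perfectly ordered
```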

30 Prediction performance

Mean concordance index (CI) scores (± standard deviation) of survival risk prediction over 5-fold cross-validation repeated 10 times on the METABRIC data. Methods are ordered by decreasing mean CI score:

Label | Mean CI score (± SD) | Network-based | Gene selection
ridge | (±0.018) | no | no
lap | (±0.0193) | yes | no
laplasso | (±0.0185) | yes | yes
e-net | (±0.0183) | no | yes
wavelet | (±0.0198) | yes | yes
lasso | (±0.0177) | no | yes

31 Gene selection performance: Stability

Figure: number of commonly selected genes vs. number of selected genes, for lasso, e-net, laplasso, and wavelet. Stability of gene selection related to breast cancer survival, estimated over 100 random experiments. The black dotted curve denotes random selection.

32 Gene selection performance: Connectivity

Figure: number of connecting edges vs. number of selected genes, for lasso, e-net, laplasso, and wavelet. Connectivity of genes selected for breast cancer survival; the special marks correspond to the number tuned by cross-validation. The black dotted curve denotes random selection.

33 Gene selection performance: Interpretability

Figure: gene subnetworks related to breast cancer survival identified by the elastic net (10 genes connected out of 112 selected) or the Laplacian lasso (10 genes connected out of 100 selected).

34 Gene selection performance: Interpretability

Figure: gene subnetworks related to breast cancer survival identified by network-based wavelet smoothing (82 genes connected out of 109 selected).

35 Conclusion

Can biological networks help define a structure on high-dimensional omics data? Fourier-based penalties (smoothness, diffusion...) already exist. Wavelet-based penalties decompose a signal over a basis that is localized in space and localized in frequency. Preliminary results on gene expression classification.

36 Thanks!

37 References

D. K. Hammond, P. Vandergheynst, and R. Gribonval. Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis, 30(2):129-150, 2011. doi: 10.1016/j.acha.2010.04.005.

H. Hoefling. A path algorithm for the Fused Lasso Signal Approximator. Journal of Computational and Graphical Statistics, 19(4):984-1006, 2010.

L. Jacob, G. Obozinski, and J.-P. Vert. Group lasso with overlap and graph lasso. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), pages 433-440, New York, NY, USA, 2009. ACM.

C. Li and H. Li. Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics, 24(9):1175-1182, 2008. doi: 10.1093/bioinformatics/btn081.

F. Rapaport, A. Zinovyev, M. Dutreix, E. Barillot, and J.-P. Vert. Classification of microarray data using gene networks. BMC Bioinformatics, 8:35, 2007.

R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1):267-288, 1996.
