Regulatory Inference from Gene Expression. CMSC858P Spring 2012 Hector Corrada Bravo


Graphical Model
Let $y$ be a vector-valued random variable. Suppose some conditional independence properties hold for some variables in $y$. Example: variables $y_2$ and $y_3$ are independent given the remaining variables in $y$. We can encode these conditional independence properties in a graph.

Graphical Model
Hammersley-Clifford theorem: all probability distributions that satisfy the conditional independence properties in a graph can be written as
$$P(y) = \frac{1}{Z} \exp\left( \sum_{c \in C} f_c(y_c) \right),$$
where $C$ is the set of cliques in the graph, $f_c$ is a potential function, and $y_c$ are the variables in clique $c$.

Graphical Models
The probability distribution is determined by the choice of potential functions. Example:
1. $f_c(y_i) = -\frac{\tau_{ii}^2 y_i^2}{2}$
2. $f_c(\{y_i, y_j\}) = -\frac{\tau_{ij} y_i y_j}{2}$
3. $f_c(y_c) = 0$ for cliques with three or more variables.
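
A quick check (not on the slides) that these potentials recover a Gaussian quadratic form: summing over all cliques, counting each pair potential once per ordered pair,
$$\sum_{c \in C} f_c(y_c) = -\sum_i \frac{\tau_{ii}^2 y_i^2}{2} - \sum_{i \neq j} \frac{\tau_{ij} y_i y_j}{2} = -\frac{1}{2} y^T \Sigma^{-1} y,$$
where $\Sigma^{-1}$ is the matrix defined on the next slide. This is exactly the exponent of a multivariate normal density.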

Gaussian Graphical Models
Define the matrix $\Sigma^{-1}$ as:
1. $\Sigma^{-1}_{ij} = \tau_{ij}$ if there is an edge between $y_i$ and $y_j$
2. $\Sigma^{-1}_{ij} = 0$ otherwise
with diagonal entries $\Sigma^{-1}_{ii} = \tau_{ii}^2$. For example:
$$\Sigma^{-1} = \begin{pmatrix} \tau_{11}^2 & \tau_{12} & 0 & \tau_{14} \\ \tau_{12} & \tau_{22}^2 & \tau_{23} & 0 \\ 0 & \tau_{23} & \tau_{33}^2 & \tau_{34} \\ \tau_{14} & 0 & \tau_{34} & \tau_{44}^2 \end{pmatrix}$$
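
As a numerical illustration (a minimal sketch; the $\tau$ values are made up, not from the slides), the zeros of $\Sigma^{-1}$ mark the missing edges even though the covariance $\Sigma$ itself is generally dense:

```python
import numpy as np

# Hypothetical tau values for the 4-node graph on the slide
# (edges {1,2}, {2,3}, {3,4}, {1,4}; no edges {1,3}, {2,4}).
tau = {"11": 2.0, "22": 2.0, "33": 2.0, "44": 2.0,
       "12": 0.5, "23": 0.4, "34": 0.6, "14": 0.3}

P = np.array([[tau["11"]**2, tau["12"],    0.0,          tau["14"]],
              [tau["12"],    tau["22"]**2, tau["23"],    0.0      ],
              [0.0,          tau["23"],    tau["33"]**2, tau["34"]],
              [tau["14"],    0.0,          tau["34"],    tau["44"]**2]])

Sigma = np.linalg.inv(P)                   # covariance: dense, no zeros
print(np.round(Sigma, 3))
print(np.round(np.linalg.inv(Sigma), 3))   # inverting back recovers the sparse pattern
```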

Network Discovery
The gene expression vector $y$ is assumed to be distributed as $y \sim N(0, \Sigma)$. The likelihood given the data ($n$ arrays) is then
$$\ell(\Sigma) = \left(2\pi \det(\Sigma)\right)^{-n/2} \exp\left( -\frac{1}{2} \sum_{i=1}^n y_i^T \Sigma^{-1} y_i \right)$$
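
Taking logs (a step the slides skip) connects this likelihood to the optimization problem on the ML estimation slide below:
$$\log \ell(\Sigma) = \frac{n}{2}\left( \log\det \Sigma^{-1} - \operatorname{tr}(S\,\Sigma^{-1}) \right) + \text{const},$$
using $\sum_{i=1}^n y_i^T \Sigma^{-1} y_i = \operatorname{tr}\left(\Sigma^{-1} \sum_i y_i y_i^T\right) = n \operatorname{tr}(S\,\Sigma^{-1})$ with $S = \frac{1}{n}\sum_i y_i y_i^T$. Maximizing over $X = \Sigma^{-1}$ gives exactly the objective below.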

[Slide reproduces the first page of: E. Segal, R. Yelensky and D. Koller, "Genome-wide discovery of transcriptional modules from DNA sequence and gene expression," Bioinformatics, Vol. 19 Suppl. 1, 2003, pages i273–i282. DOI: 10.1093/bioinformatics/btg1038]

Module Networks: Segal 2003
[Figure: module network — conditional probability distributions $P(g.R \mid g.S)$ over regulator expression and $P(g.E \mid g.M)$ over gene expression, with variables $g.S_1, g.S_2, g.S_3$, regulators $g.R_1, g.R_2$, module assignment $g.M$, expression values $g.E_1, g.E_2, g.E_3$, and CPDs 1–3]

Segal and Widom, 2009

Network Discovery
Recent efforts discover networks directly from expression data. Main idea: treat the entire vector of gene expression for a given sample as multivariate normal. A sparse (inverse) covariance matrix then induces a network. This is related to Gaussian graphical models. Banerjee et al. [ICML 2006, JMLR 2008], Friedman [Biostatistics 2007]

Gaussian Graphical Model
We can write $P$ as
$$P(y) = \frac{1}{Z} \exp\left( -\frac{y^T \Sigma^{-1} y}{2} \right)$$
By a nice property of exponential families, this makes $y$ multivariate normal and implies $Z = \sqrt{2\pi \det(\Sigma)}$.

ML Estimation
The maximum likelihood estimate of $\Sigma$ is given by the inverse of the solution to
$$\max_{X \succ 0} \; \log\det X - \operatorname{tr}(SX)$$
where $S$ is the sample covariance, $S = \frac{1}{n} \sum_{i=1}^n y_i y_i^T$.
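
Why the inverse (a step not shown on the slide): using $\nabla_X \log\det X = X^{-1}$ and $\nabla_X \operatorname{tr}(SX) = S$, setting the gradient to zero gives
$$X^{-1} - S = 0 \quad\Longrightarrow\quad \hat{X} = S^{-1}, \qquad \hat{\Sigma} = \hat{X}^{-1} = S.$$
With more genes than arrays, $S$ is singular and this unpenalized estimate is unavailable, which motivates the sparsity-inducing penalty below.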

Conditional Independence
Conditional independence is given by the sparse inverse covariance:
$$\Sigma^{-1} = \begin{pmatrix} \tau_{11}^2 & \tau_{12} & 0 & \tau_{14} \\ \tau_{12} & \tau_{22}^2 & \tau_{23} & 0 \\ 0 & \tau_{23} & \tau_{33}^2 & \tau_{34} \\ \tau_{14} & 0 & \tau_{34} & \tau_{44}^2 \end{pmatrix}$$

Sparsity-inducing penalty
Use a penalized likelihood method to induce sparsity:
$$\max_{X \succ 0} \; \log\det X - \operatorname{tr}(SX) - \lambda \|X\|_1$$
where $\|X\|_1 = \sum_{ij} |X_{ij}|$.
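
One off-the-shelf way to solve this penalized problem is scikit-learn's GraphicalLasso estimator; a minimal sketch on synthetic data (the data, dimensions, and alpha value here are arbitrary assumptions for illustration):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
Y = rng.standard_normal((100, 10))   # hypothetical data: 100 arrays x 10 genes

# alpha plays the role of lambda above: larger alpha -> sparser inverse covariance
model = GraphicalLasso(alpha=0.2).fit(Y)
P = model.precision_                 # estimate of Sigma^{-1}

# edges of the inferred network = nonzero off-diagonal entries of P
edges = np.argwhere((np.abs(P) > 1e-8) & ~np.eye(P.shape[0], dtype=bool))
print(edges)
```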

Block-coordinate ascent
Solve by maximizing over one column of the matrix at a time, holding the current solution for the rest fixed. Partition the current estimate $W$ of the covariance and the sample covariance $S$ as
$$W = \begin{pmatrix} W_{11} & w_{12} \\ w_{12}^T & w_{22} \end{pmatrix}, \qquad S = \begin{pmatrix} S_{11} & s_{12} \\ s_{12}^T & s_{22} \end{pmatrix}$$
It turns out that updating the last column is equivalent to solving
$$\min_\beta \; \frac{1}{2} \left\| W_{11}^{1/2} \beta - z \right\|^2 + \lambda \|\beta\|_1, \qquad z = W_{11}^{-1/2} s_{12}$$
The solution is then $w_{12} = W_{11} \beta$.
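
A sketch of a single column update under these definitions (a hypothetical helper, not the original implementation; it leans on scipy for the matrix square root and scikit-learn's Lasso as the inner solver):

```python
import numpy as np
from scipy.linalg import sqrtm
from sklearn.linear_model import Lasso

def block_update(W, S, j, lam):
    """One block-coordinate step: update column/row j of W in place.

    Solves min_beta 0.5 * ||W11^{1/2} beta - z||^2 + lam * ||beta||_1
    with z = W11^{-1/2} s12, then sets w12 = W11 @ beta.
    """
    idx = np.arange(W.shape[0]) != j
    W11 = W[np.ix_(idx, idx)]
    s12 = S[idx, j]

    W11_half = np.real(sqrtm(W11))        # W11 is positive definite, so sqrtm is real
    z = np.linalg.solve(W11_half, s12)    # z = W11^{-1/2} s12

    # sklearn's Lasso minimizes (1/(2m))||y - X beta||^2 + alpha ||beta||_1,
    # so rescale lam by m = p - 1 "samples" to match the objective above.
    m = W11.shape[0]
    beta = Lasso(alpha=lam / m, fit_intercept=False).fit(W11_half, z).coef_

    w12 = W11 @ beta                      # updated off-diagonal block
    W[idx, j] = w12
    W[j, idx] = w12
    return W
```

The full algorithm would cycle j over all columns until W stops changing; that outer loop is omitted here.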

Block-coordinate ascent
So we have an $\ell_1$-regularized regression (lasso) problem at each step:
$$\min_\beta \; \frac{1}{2} \left\| W_{11}^{1/2} \beta - z \right\|^2 + \lambda \|\beta\|_1$$
Each of these can be solved via coordinate descent as before. Recall: soft-thresholding at each step.
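
The soft-thresholding update is short enough to write out; a bare-bones coordinate descent sketch for the lasso subproblem above (fixed iteration count, no convergence check):

```python
import numpy as np

def soft_threshold(x, t):
    """S(x, t) = sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_cd(X, z, lam, n_iter=100):
    """Coordinate descent for min_beta 0.5*||X beta - z||^2 + lam*||beta||_1."""
    p = X.shape[1]
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)              # ||X_j||^2 for each column
    for _ in range(n_iter):
        for j in range(p):
            # partial residual: leave coordinate j out
            r_j = z - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ r_j, lam) / col_sq[j]
    return beta
```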

[Figure: inferred gene network — one cluster of genes associated with iron homeostasis, another of genes associated with cellular membrane fusion]

Summary
Using the Gaussian graphical model representation (a multivariate normal probability over a sparse graph), we can take the resulting graph as a gene network. Sparsity is induced with $\ell_1$-norm regularization. The block-coordinate ascent method leads to an $\ell_1$-regularized regression at each step, and each regression can be solved with efficient coordinate descent (soft-thresholding).