
Gene Regulatory Networks II 02-710 Computational Genomics Seyoung Kim

Goal: Discover the Structure and Function of Complex Systems in the Cell
Identify the different regulators and their target genes that are involved in the system.
Represent the relationship between regulators and their target genes as a network. Nodes: entities (regulators, target genes). Edges: regulatory relationships.
High-level goal: use high-throughput data to discover patterns of combinatorial regulation and to understand how the activity of genes involved in related biological processes is coordinated and interconnected.

Overview
Bayesian networks (networks with directed edges): module networks and their extensions
- Module network (Segal et al., Nature Genetics 2003): a gene module's activity is determined by the expression levels of its regulator genes
- Geronemo (Lee et al., PNAS 2006): a gene module's activity is determined by the expression levels of its regulator genes and by SNPs
- Lirnet (Lee et al., PLoS Genetics 2009): incorporates prior knowledge
- CONEXIC (Akavia et al., Cell 2010): cancer data analysis for copy number variation and gene expression data
Gaussian graphical models (networks with undirected edges) and their extensions

Probabilistic Graphical Models
[Figure: a directed graph over X_1, ..., X_6 with a conditional probability distribution at each node: p(X_1), p(X_2 | X_1), p(X_3 | X_2), p(X_4 | X_1), p(X_5 | X_4), p(X_6 | X_2, X_5)]
The joint distribution on (X_1, X_2, ..., X_N) factors according to the parent-of relations defined by the edges E:
p(X_1, X_2, X_3, X_4, X_5, X_6) = p(X_1) p(X_2 | X_1) p(X_3 | X_2) p(X_4 | X_1) p(X_5 | X_4) p(X_6 | X_2, X_5)
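
To make the factorization concrete, here is a minimal Python sketch (not from the slides; the binary CPD tables are invented purely for illustration) that evaluates the joint probability of the six-node example as a product of local conditional distributions.

```python
# A minimal sketch: evaluating the factorized joint
# p(X1,...,X6) = p(X1) p(X2|X1) p(X3|X2) p(X4|X1) p(X5|X4) p(X6|X2,X5)
# for binary variables. The CPD numbers are made up for illustration.

parents = {
    "X1": [], "X2": ["X1"], "X3": ["X2"],
    "X4": ["X1"], "X5": ["X4"], "X6": ["X2", "X5"],
}

# cpd[node][parent_values] = probability that node == 1 given those parent values
cpd = {
    "X1": {(): 0.6},
    "X2": {(0,): 0.2, (1,): 0.7},
    "X3": {(0,): 0.5, (1,): 0.9},
    "X4": {(0,): 0.3, (1,): 0.4},
    "X5": {(0,): 0.1, (1,): 0.8},
    "X6": {(0, 0): 0.05, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.95},
}

def joint_probability(assignment):
    """Multiply the local conditionals p(x_i | pa_i) over all nodes."""
    prob = 1.0
    for node, pa in parents.items():
        pa_vals = tuple(assignment[p] for p in pa)
        p_one = cpd[node][pa_vals]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob

print(joint_probability({"X1": 1, "X2": 1, "X3": 0, "X4": 0, "X5": 1, "X6": 1}))
```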

Learning Bayesian Networks
Density estimation: model the data distribution in the population; learn both the graph structure and the associated probability densities.
Probabilistic inference: prediction, classification.
[Figure: a data matrix used to learn the six-node example network, with the same factorization p(X_1, X_2, X_3, X_4, X_5, X_6) = p(X_1) p(X_2 | X_1) p(X_3 | X_2) p(X_4 | X_1) p(X_5 | X_4) p(X_6 | X_2, X_5)]

The Module Network Idea
Unlike methods that model individual genes, module-based methods explicitly model modular structure. This helps by:
- reducing dependence on possibly noisy measurements for individual genes, by combining information among genes in the same module
- increasing statistical significance

The Module Network Idea
[Figure: a Bayesian network with one CPD per variable (MSFT, MOT, DELL, INTL, AMAT, HPQ) versus a module network with one CPD per module (Module I: MSFT; Module II: MOT, DELL, INTL; Module III: AMAT, HPQ)]
A module network shares parameters and dependencies between variables with similar behavior.
Slides from the presentation by Segal et al., UAI 2003

Learning Module Networks
- Module network model definition
- Learning the model
- Experimental results

Module Network Components
Module assignment function A(.): A(MSFT) = M_I; A(MOT) = A(DELL) = A(INTL) = M_II; A(AMAT) = A(HPQ) = M_III
Slides from the presentation by Segal et al., UAI 2003

Module Network Components
Module assignment function
Set of parents for each module: Pa(M_I) = {} (empty), Pa(M_II) = {MSFT}, Pa(M_III) = {DELL, INTL}
Slides from the presentation by Segal et al., UAI 2003

Module Network Components
Module assignment function
Set of parents for each module
Conditional probability density (CPD) template for each module
Slides from the presentation by Segal et al., UAI 2003
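
Putting the three components together, here is a minimal sketch (not the authors' implementation) of a module network for the stock example; the linear-Gaussian CPD parameterization and all numbers are illustrative assumptions.

```python
# A minimal sketch of the three module-network components from the slides.
module_network = {
    # 1. Module assignment function A(.)
    "assignment": {
        "MSFT": "M_I",
        "MOT": "M_II", "DELL": "M_II", "INTL": "M_II",
        "AMAT": "M_III", "HPQ": "M_III",
    },
    # 2. Set of parents for each module
    "parents": {
        "M_I": [],
        "M_II": ["MSFT"],
        "M_III": ["DELL", "INTL"],
    },
    # 3. One CPD template per module, shared by all variables in the module
    #    (here: an assumed linear-Gaussian CPD with one weight per parent)
    "cpd_template": {
        "M_I": {"weights": [], "bias": 0.0, "variance": 1.0},
        "M_II": {"weights": [0.8], "bias": 0.1, "variance": 0.5},
        "M_III": {"weights": [0.4, 0.3], "bias": 0.0, "variance": 0.7},
    },
}
```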

Ground Bayesian Network
A module network induces a ground Bayesian network over X.
A module network defines a coherent probability distribution over X if the ground Bayesian network is acyclic.
[Figure: the module network (Module I: MSFT; Module II: DELL, MOT, INTL; Module III: AMAT, HPQ) and the ground Bayesian network it induces over the individual variables]
Slides from the presentation by Segal et al., UAI 2003

Module Graph
Nodes correspond to modules. There is an edge M_i -> M_j if at least one variable in M_i is a parent of M_j.
Theorem: the ground Bayesian network is acyclic if the module graph is acyclic.
Acyclicity can therefore be checked efficiently on the module graph (see the sketch below).
[Figure: module graph M_I -> M_II -> M_III alongside the module network]
Slides from the presentation by Segal et al., UAI 2003
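
The sketch below (an assumption, not the authors' code) checks acyclicity of the module graph with a depth-first search; because the module graph has one node per module, this is much cheaper than checking the ground network.

```python
# Build the module graph and check it for cycles.
def module_graph_edges(parents, assignment):
    """Edge M_i -> M_j whenever some variable assigned to M_i is a parent of M_j."""
    edges = set()
    for mod_j, pa_vars in parents.items():
        for var in pa_vars:
            edges.add((assignment[var], mod_j))
    return edges

def is_acyclic(modules, edges):
    adj = {m: [b for a, b in edges if a == m] for m in modules}
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {m: WHITE for m in modules}

    def dfs(u):
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == GRAY:          # back edge -> cycle
                return False
            if color[v] == WHITE and not dfs(v):
                return False
        color[u] = BLACK
        return True

    return all(color[m] != WHITE or dfs(m) for m in modules)

# Example with the stock modules from the slides:
assignment = {"MSFT": "M_I", "MOT": "M_II", "DELL": "M_II",
              "INTL": "M_II", "AMAT": "M_III", "HPQ": "M_III"}
parents = {"M_I": [], "M_II": ["MSFT"], "M_III": ["DELL", "INTL"]}
edges = module_graph_edges(parents, assignment)   # {('M_I', 'M_II'), ('M_II', 'M_III')}
print(is_acyclic(["M_I", "M_II", "M_III"], edges))  # True
```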

Learning Module Networks
- Module network model definition
- Learning the model
- Experimental results

Learning Overview
Given data D, find the assignment function A and structure S that maximize the Bayesian score:
score(S, A : D) = log P(D | S, A) + log P(S, A)
where P(S, A) is the assignment/structure prior and the marginal data likelihood P(D | S, A) is obtained by integrating the data likelihood against the parameter prior:
P(D | S, A) = ∫ P(D | S, A, θ) P(θ | S, A) dθ
Slides from the presentation by Segal et al., UAI 2003

Bayesian Score Decomposition
The Bayesian score decomposes by modules:
score(S, A : D) = Σ_j score_j(Pa(M_j), X^j : D)
where Pa(M_j) is the set of parents of module j and X^j is the set of variables assigned to module j.

Bayesian Score Decomposition
The Bayesian score decomposes by modules.
[Figure: the score of each module in the stock example (Module I: MSFT; Module II: MOT, DELL, INTL; Module III: AMAT, HPQ) depends only on that module's variables and parents]
Slides from the presentation by Segal et al., UAI 2003

Likelihood Function
Each module's score further decomposes by the variables in the module. For Module II, with parent MSFT and variables X^2 = {DELL, MOT, INTL}:
Score_{M_II}(MSFT, X^2 : D) = Score(DELL, MSFT : D) + Score(MOT, MSFT : D) + Score(INTL, MSFT : D)
[Figure: shared CPD parameters θ_{M_I}, θ_{M_II} (parent MSFT), and θ_{M_III} (parents DELL, INTL) applied across data instances 1, 2, 3]
Slides from the presentation by Segal et al., UAI 2003
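
This decomposability is easy to express in code. The sketch below is an assumption, not the authors' implementation; family_score is a hypothetical per-variable score (e.g., a marginal likelihood term for one variable given the module's parent set).

```python
# Total score = sum over modules; module score = sum over the variables in it.
def module_score(module, assignment, parents, data, family_score):
    members = [v for v, m in assignment.items() if m == module]
    return sum(family_score(v, parents[module], data) for v in members)

def total_score(assignment, parents, data, family_score):
    return sum(module_score(m, assignment, parents, data, family_score)
               for m in parents)

# e.g., Score_{M_II}(MSFT, X^2 : D) = Score(DELL, MSFT : D)
#                                   + Score(MOT, MSFT : D)
#                                   + Score(INTL, MSFT : D)
```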

Algorithm Overview
Find the assignment function A and structure S that maximize the Bayesian score. Starting from an initial assignment A, alternate between improving the assignments (given the current dependency structure S) and improving the dependency structure S (given the current assignment A); a minimal sketch of this loop follows below.
Slides from the presentation by Segal et al., UAI 2003
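
The following sketch is an assumption about the overall search loop, not Segal et al.'s implementation; score, improve_assignment, and improve_structure are hypothetical helpers.

```python
# Alternate between improving the module assignment A and the dependency
# structure S until the Bayesian score stops improving.
def learn_module_network(data, initial_assignment, score,
                         improve_assignment, improve_structure, tol=1e-6):
    A = initial_assignment          # e.g., obtained by clustering expression profiles
    S = {module: [] for module in set(A.values())}   # start with no parents
    best = score(A, S, data)
    while True:
        A = improve_assignment(A, S, data)   # reassign variables, keeping S fixed
        S = improve_structure(A, S, data)    # add/delete module parents, keeping A fixed
        new = score(A, S, data)
        if new - best < tol:                 # no significant improvement: stop
            return A, S
        best = new
```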

Learning Dependency Structure
Heuristic search with operators: add/delete a parent for a module.
[Figure: the stock-example module network]
Slides from the presentation by Segal et al., UAI 2003

Learning Dependency Structure
Heuristic search with operators: add/delete a parent for a module.
Handle acyclicity: it can be checked efficiently on the module graph.
[Figure: candidate parent additions are checked against the module graph (M_I, M_II, M_III); additions that would create a cycle are rejected]
Slides from the presentation by Segal et al., UAI 2003

Learning Dependency Structure
Heuristic search with operators: add/delete a parent for a module.
Handle acyclicity: it can be checked efficiently on the module graph.
[Figure: a further candidate parent (involving INTL) checked against the module graph M_I, M_II, M_III]
Slides from the presentation by Segal et al., UAI 2003

Learning Dependency Structure
Heuristic search with operators: add/delete a parent for a module.
Handle acyclicity: it can be checked efficiently on the module graph.
Efficient computation: after applying an operator to module M_j, only the scores of operators for module M_j need to be updated.
Slides from the presentation by Segal et al., UAI 2003

Algorithm Overview
Find the assignment function A and structure S that maximize the Bayesian score: find an initial assignment A, then alternate between improving the assignments and improving the dependency structure S.
Slides from the presentation by Segal et al., UAI 2003

Learning Assignment Function
[Figure: the stock-example module network, with DELL about to be reassigned to a module]
Slides from the presentation by Segal et al., UAI 2003

Learning Assignment Function
Candidate reassignment A(DELL) = M_I, score: 0.7
Slides from the presentation by Segal et al., UAI 2003

Learning Assignment Function
Candidate reassignment A(DELL) = M_I, score: 0.7
Candidate reassignment A(DELL) = M_II, score: 0.9
Slides from the presentation by Segal et al., UAI 2003

Learning Assignment Function
Candidate reassignment A(DELL) = M_I, score: 0.7
Candidate reassignment A(DELL) = M_II, score: 0.9
Candidate reassignment A(DELL) = M_III: cyclic, so not allowed
The best legal reassignment is kept (here A(DELL) = M_II); a sketch of this step follows below.
Slides from the presentation by Segal et al., UAI 2003
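
A minimal sketch of one reassignment step, under the assumption that score_assignment and creates_cycle are hypothetical helpers (creates_cycle could reuse the module-graph acyclicity check sketched earlier); this is not the authors' code.

```python
# Try moving a variable into each module, skip moves that make the module graph
# cyclic, and keep the highest-scoring legal move.
def reassign_variable(var, assignment, parents, data,
                      score_assignment, creates_cycle):
    best_module = assignment[var]
    best_score = score_assignment(assignment, parents, data)
    for module in parents:                        # candidate modules
        candidate = dict(assignment, **{var: module})
        if creates_cycle(candidate, parents):     # e.g., A(DELL) = M_III above
            continue
        s = score_assignment(candidate, parents, data)
        if s > best_score:                        # e.g., A(DELL) = M_II, score 0.9
            best_module, best_score = module, s
    return dict(assignment, **{var: best_module})
```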

Learning Module Networks
- Module network model definition
- Learning the model
- Experimental results

Learning a Module Network from Gene Expression Data

Determine Combinatorial Control from Gene Expression Data under Multiple Conditions

Resulting Module
Rows: genes. Columns: arrays (conditions).
Segal et al., Nature Genetics, 2003

Experimental Design
Hypothesis: regulator X activates process Y.
Experiment: knock out X and repeat the experiment.
[Figure: a regression-tree regulation program with true/false splits on regulators such as HAP4 and Ypl230W]

Biological Experiments Validation
Were the differentially expressed genes predicted as targets? Rank modules by enrichment for differentially expressed genes.

Ypl230w knockout:
  #   Module                                Significance
  39  Protein folding                       7/23, 1e-4
  29  Cell differentiation                  6/41, 2e-2
  5   Glycolysis and folding                5/37, 4e-2
  34  Mitochondrial and protein fate        5/37, 4e-2

Ppt1 knockout:
  #   Module                                Significance
  14  Ribosomal and phosphate metabolism    8/32, 9e-3
  11  Amino acid and purine metabolism      11/53, 1e-2
  15  mRNA, rRNA and tRNA processing        9/43, 2e-2
  39  Protein folding                       6/23, 2e-2
  30  Cell cycle                            7/30, 2e-2

Kin82 knockout:
  #   Module                                Significance
  3   Energy and osmotic stress I           8/31, 1e-4
  2   Energy, osmolarity & cAMP signaling   9/64, 6e-3
  15  mRNA, rRNA and tRNA processing        6/43, 2e-2

All regulators regulate their predicted modules.
Segal et al., Nature Genetics, 2003

Extensions of Module Networks: Geronemo (Lee et al., PNAS 2006)
Both gene expression levels and SNPs can be regulators for the expression levels of other genes.
[Figure: purple regulators are expression regulators; blue regulators are SNP regulators]

Extensions of Module Networks: Lirnet (Lee et al., PLoS Genetics 2009)
Incorporates prior knowledge (regulatory features) on gene-expression regulators and SNP regulators.
[Figure: regulatory features (on the left) and learned regulatory priors (purple bars)]

Extensions of Module Networks: CONEXIC (Akavia et al., Cell 2010)
Extends module networks to handle cancer copy number variation and gene expression data in order to find cancer-causing (driver) mutations.
A driver mutation should occur in multiple tumors more often than would be expected by chance.

Extensions of Module Networks: CONEXIC (Akavia et al., Cell 2010)
A driver mutation may be associated (correlated) with the expression of a group of genes that form a module.
A driver may be over-expressed due to amplification of the DNA encoding it or due to the action of other factors; the target genes therefore correlate with driver gene expression rather than with driver copy number.

Overview
Bayesian networks (networks with directed edges): module networks and their extensions
- Module network (Segal et al., Nature Genetics 2003): a gene module's activity is determined by the expression levels of its regulator genes
- Geronemo (Lee et al., PNAS 2006): a gene module's activity is determined by the expression levels of its regulator genes and by SNPs
- Lirnet (Lee et al., PLoS Genetics 2009): incorporates prior knowledge
- CONEXIC (Akavia et al., Cell 2010): cancer data analysis for copy number variation and gene expression data
Gaussian graphical models (networks with undirected edges) and their extensions

Gaussian Graphical Models
The expression levels of K genes, Y = {y_1, ..., y_K}, are modeled as jointly Gaussian: Y ~ N(0_K, Θ^{-1})
0_K: vector of K zeros
Θ: K x K inverse covariance (precision) matrix
The inverse covariance matrix Θ encodes a Gaussian graphical model: non-zero off-diagonal elements of Θ correspond to edges.

Gaussian Graphical Models
Non-zero elements in Θ correspond to edges. Example Gaussian graphical model encoded by Θ (rows and columns ordered y_1, ..., y_5):

      y_1   y_2   y_3   y_4   y_5
y_1   1.3   0.3   0.8   0     0
y_2   0.3   1.0   0     0     0
y_3   0.8   0     1.2   0     0
y_4   0     0     0     1.5   1.1
y_5   0     0     0     1.1   0.9

[Figure: the corresponding graph, with edges y_1 - y_2, y_1 - y_3, and y_4 - y_5]

Gaussian Graphical Models
Non-zero elements in Θ correspond to edges. In the example above, the nonzero/zero pattern of y_5's column of Θ matches the neighbors of node y_5: only the off-diagonal entry for y_4 is nonzero, and y_4 is y_5's only neighbor in the graph. (A sketch of reading edges off Θ follows below.)
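
The sketch below (not from the slides) reads the graph off the example precision matrix: an off-diagonal nonzero entry Θ[i, j] corresponds to an edge between the i-th and j-th genes.

```python
import numpy as np

theta = np.array([
    [1.3, 0.3, 0.8, 0.0, 0.0],
    [0.3, 1.0, 0.0, 0.0, 0.0],
    [0.8, 0.0, 1.2, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.5, 1.1],
    [0.0, 0.0, 0.0, 1.1, 0.9],
])

genes = ["y1", "y2", "y3", "y4", "y5"]
edges = [(genes[i], genes[j])
         for i in range(len(genes)) for j in range(i + 1, len(genes))
         if theta[i, j] != 0]
print(edges)   # [('y1', 'y2'), ('y1', 'y3'), ('y4', 'y5')]
```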

Probabilistic Graphical Models
Statistics and Mechanics are independent of each other conditional on Algebra.

Learning a Sparse Gaussian Graphical Model
Minimize the negative log likelihood of the data with an L_1 penalty:
argmin_Θ  - log det Θ + tr(SΘ) + λ ||Θ||_1
where
- tr(A) is the trace of matrix A
- S is the K x K sample covariance matrix
- ||Θ||_1 is the L_1 regularization term (sum of absolute values of the entries of Θ)
The optimization problem is convex. Many software packages exist (e.g., BIG & QUIC, Hsieh et al., NIPS 2013; FastGGM, Wang et al., PLoS Computational Biology 2016). A sketch using an off-the-shelf solver follows below.
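
As one illustration (an assumption: scikit-learn's GraphicalLasso is used here as a convenient solver; the slides mention BIG & QUIC and FastGGM instead), the sketch below estimates a sparse precision matrix from an expression matrix and reads the network edges from its nonzero pattern.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))          # placeholder for real expression data (samples x genes)

model = GraphicalLasso(alpha=0.1)          # alpha plays the role of lambda in the L1 penalty
model.fit(X)
theta = model.precision_                   # estimated inverse covariance matrix

edges = [(i, j)
         for i in range(theta.shape[0]) for j in range(i + 1, theta.shape[1])
         if abs(theta[i, j]) > 1e-8]
print(edges)
```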

Extensions of Gaussian Graphical Models
Networks with hubs (Tan et al., JMLR 2014)
Networks with block structures (Tan et al., UAI 2009)
[Figure: a network learned from glioblastoma expression data, with hubs shown as pink nodes]

Summary
Modeling gene networks with Bayesian networks:
- a probabilistic model for learning modules of variables and their structural dependencies
- module networks have improved performance over Bayesian networks: statistical robustness and interpretability
- reconstruction of many known regulatory modules and prediction of targets for unknown regulators
Modeling gene networks with undirected networks:
- Gaussian graphical models are extremely popular; fast learning methods are available (more efficient than Bayesian network structure learning)