Inferring Transcriptional Regulatory Networks from Gene Expression Data II

Similar documents
Inferring Transcriptional Regulatory Networks from High-throughput Data

Bayesian networks for reconstructing transcriptional regulatory networks

Inferring Protein-Signaling Networks II

Inferring Protein-Signaling Networks

Introduction to Bioinformatics

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models

Learning gene regulatory networks Statistical methods for haplotype inference Part I

Rule learning for gene expression data

Computational Genomics. Reconstructing dynamic regulatory networks in multiple species

Predicting Protein Functions and Domain Interactions from Protein Interactions

Gene Regulatory Networks II Computa.onal Genomics Seyoung Kim

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

Computational methods for predicting protein-protein interactions

Sparse Linear Models (10/7/13)

Chapter 15 Active Reading Guide Regulation of Gene Expression

REVIEW SESSION. Wednesday, September 15 5:30 PM SHANTZ 242 E

Discovering molecular pathways from protein interaction and ge

Written Exam 15 December Course name: Introduction to Systems Biology Course no

Self Similar (Scale Free, Power Law) Networks (I)

Latent Variable models for GWAs

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

GLOBEX Bioinformatics (Summer 2015) Genetic networks and gene expression data

Computational Network Biology Biostatistics & Medical Informatics 826 Fall 2018

Introduction to clustering methods for gene expression data analysis

GENE REGULATION AND PROBLEMS OF DEVELOPMENT

56:198:582 Biological Networks Lecture 10

Lecture 8: Temporal programs and the global structure of transcription networks. Chap 5 of Alon. 5.1 Introduction

Computational Biology: Basics & Interesting Problems

BME 5742 Biosystems Modeling and Control

Understanding Science Through the Lens of Computation. Richard M. Karp Nov. 3, 2007

3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization.

Mixture models for analysing transcriptome and ChIP-chip data

Gene Ontology and Functional Enrichment. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein

CSEP 590A Summer Lecture 4 MLE, EM, RE, Expression

CSEP 590A Summer Tonight MLE. FYI, re HW #2: Hemoglobin History. Lecture 4 MLE, EM, RE, Expression. Maximum Likelihood Estimators

Computational Systems Biology

Introduction to clustering methods for gene expression data analysis

Learning in Bayesian Networks

Evidence for dynamically organized modularity in the yeast protein-protein interaction network

Gene regulation II Biochemistry 302. February 27, 2006

Computational Biology Course Descriptions 12-14

Unit 3: Control and regulation Higher Biology

Lecture 4: Transcription networks basic concepts

Bioinformatics. Transcriptome

Lecture 18 June 2 nd, Gene Expression Regulation Mutations

Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday

Network Biology-part II

A New Method to Build Gene Regulation Network Based on Fuzzy Hierarchical Clustering Methods

STRUCTURAL BIOINFORMATICS I. Fall 2015

Prokaryotic Gene Expression (Learning Objectives)

Co-ordination occurs in multiple layers Intracellular regulation: self-regulation Intercellular regulation: coordinated cell signalling e.g.

Regulation of Gene Expression

Updated: 10/11/2018 Page 1 of 5

Discovering modules in expression profiles using a network

Comparative Network Analysis

Gene regulation I Biochemistry 302. Bob Kelm February 25, 2005

Geert Geeven. April 14, 2010

Probabilistic Latent Semantic Analysis

Network motifs in the transcriptional regulation network (of Escherichia coli):

Transcription Regulation and Gene Expression in Eukaryotes FS08 Pharmacenter/Biocenter Auditorium 1 Wednesdays 16h15-18h00.

L3.1: Circuits: Introduction to Transcription Networks. Cellular Design Principles Prof. Jenna Rickus

A general co-expression network-based approach to gene expression analysis: comparison and applications

CS-E5880 Modeling biological networks Gene regulatory networks

BMD645. Integration of Omics

Network Biology: Understanding the cell s functional organization. Albert-László Barabási Zoltán N. Oltvai

scrna-seq Differential expression analysis methods Olga Dethlefsen NBIS, National Bioinformatics Infrastructure Sweden October 2017

Chapter 16 Lecture. Concepts Of Genetics. Tenth Edition. Regulation of Gene Expression in Prokaryotes

MODULE -4 BAYEIAN LEARNING

56:198:582 Biological Networks Lecture 8

Sample Size Estimation for Studies of High-Dimensional Data

Prokaryotic Gene Expression (Learning Objectives)

UNIT 6 PART 3 *REGULATION USING OPERONS* Hillis Textbook, CH 11

What is the expectation maximization algorithm?

networks in molecular biology Wolfgang Huber

Gibbs Sampling Methods for Multiple Sequence Alignment

Biology. Biology. Slide 1 of 26. End Show. Copyright Pearson Prentice Hall

Fuzzy Clustering of Gene Expression Data

Group lasso for genomic data

A Case Study -- Chu et al. The Transcriptional Program of Sporulation in Budding Yeast. What is Sporulation? CSE 527, W.L. Ruzzo 1

Variable Selection in Structured High-dimensional Covariate Spaces

Boolean models of gene regulatory networks. Matthew Macauley Math 4500: Mathematical Modeling Clemson University Spring 2016

Bayesian Networks Inference with Probabilistic Graphical Models

Evolutionary analysis of the well characterized endo16 promoter reveals substantial variation within functional sites

Comparison of Shannon, Renyi and Tsallis Entropy used in Decision Trees

Basic Synthetic Biology circuits

GWAS IV: Bayesian linear (variance component) models

CHAPTER 13 PROKARYOTE GENES: E. COLI LAC OPERON

GENES AND CHROMOSOMES III. Lecture 5. Biology Department Concordia University. Dr. S. Azam BIOL 266/

Control of Gene Expression in Prokaryotes

Name: SBI 4U. Gene Expression Quiz. Overall Expectation:

Regulatory Inferece from Gene Expression. CMSC858P Spring 2012 Hector Corrada Bravo

Topic 4 - #14 The Lactose Operon

Control of Gene Expression

STAAR Biology Assessment

SCOTCAT Credits: 20 SCQF Level 7 Semester 1 Academic year: 2018/ am, Practical classes one per week pm Mon, Tue, or Wed

Data Mining Techniques

Bi 1x Spring 2014: LacI Titration

Identifying Signaling Pathways

STA 414/2104: Lecture 8

Bioinformatics: Network Analysis

Transcription:

Inferring Transcriptional Regulatory Networks from Gene Expression Data II Lectures 9 Oct 26, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20 Johnson Hall (JHN) 022 1 Review: Gene Regulation Expression level of Regulator controls the expression levels of Targets it binds to. Regulator s expression is predictive of Targets expression E Regulator Regulator E Targets Targets AGTCTTAACGTTTGACCGCTAATT Segal et al., Nature Genetics 2003; Lee et al., PNAS 2006 1

Review: Inferring the regulatory networks Input: Gene expression data Output: Bayesian network representing how genes regulate each other at the transcriptional level, and how these interactions govern gene expression levels. Algorithm: Score-based structure learning of Bayesian networks Challenge: Too many possible structures Solutions: Control the complexity of the structure, Module networks Expression data measurement of mrna levels of all genes Arrays: conditions, individuals, etc A e 1 C B e 6 e Q Q 2x10 4 (for human) Transcriptional regulatory network Module 2-3 x + 0.5 x Review: The Module Networks Concept X3 X5 X2 Repressor X4 Activator X3 = X 1 Module 1 Module 4 X4 Module 3 X6 activator expression repressor expression target gene expression induced Activator X3 false Repressor X4 false Tree CPDs regulation program Module 3 genes Module3 Linear CPDs repressed 4 2

Outline Motivation Why are we interested in inferring the regulatory network? Algorithms for learning regulatory networks Tree-CPDs with Bayesian score Linear Gaussian CPDs with regularization Evaluation of the method Statistical evaluation Biological interpretation Advanced topics Many models can get similar scores. Which one would you choose? A gene can be involved in multiple modules. Possible to incorporate prior knowledge? 5 Motivations Gene A regulates gene B s expression Important basic biology questions Gene A regulates gene B in condition C Condition C: disease states (cancer/normal, subtypes), phenotypes, species (evolutionary processes), developmental stages, cell types, experimental conditions Example application (C: disease states) Understanding histologic transformation in lymphoma Lymphoma: the most common type of blood cancer in the US. Transformation of Follicular Lymphoma (FL) to Diffuse Large B-Cell Lymphoma (DLBCL) Occurs in 40-60% of patients; Dramatically worse prognosis Goal: Infer the mechanisms that drive transformation. 6 3

Predicting Cancer Transformation Network-based approach Different network features transformation mechanism? Early diagnosis of transformation; therapeutic implications? Share as many networks features as possible more robust than inferring two networks separately. Follicular Lymphoma e 1 e 5 e Q N1 patients with FL N2 patients with DLBCL Diffuse Large B-Cell Lymphoma regulatory networks gene expression data Outline Motivation Algorithms for learning regulatory networks Tree-CPDs with Bayesian score Linear Gaussian CPDs with regularization Evaluation of the method Statistical evaluation Biological interpretation Advanced topics Many models can get similar scores. Which one would you choose? A gene can be involved in multiple modules. Possible to incorporate prior knowledge? 8 4

false Goal false Heat CMK1 Shock? HAP4 Identify modules (member genes) Discover module regulation program Regulation program genes (X s) experiments Candidate regulators X 1 Module 1 (μ A,σ A ) false X3 X4 X3 X2 Module 2 X4 P(Level) 0 Level Context A P(Level) fals e 3 Level Context B P(Level)... -3 Level Context C X5 Module 3 X6 9 Learning Structure learning Find the structure that maximizes Bayesian score log P(S D) (or via regularization) Expectation Maximization (EM) algorithm M-step: Given a partition of the genes into modules, learn the best regulation program (tree CPD) for each module. E-step: Given the inferred regulatory programs, we reassign genes into modules such that the associated regulation program best predicts each gene s behavior. 10 5

Learning Regulatory Network Iterative procedure (EM) Cluster genes into modules (E-step) Learn a regulatory program for each module (tree model) (M-step) Maximum increase in the structural score (Bayesian) M1 Candidate regulators M22 KEM1 MKT1 MSK1 DHH1 ECM18 UTH1 GPA1 HAP4 TEC1 ASG7 MFA1 MEC3 HAP1 SGS1 SEC59 RIM15 SAS5 PHO5 PHO2 PHO3 PHM6 SPL2 PHO84 PHO4 GIT1 VTC3 11 From Expression to Regulation (M-step) Combinatorial search over the space of trees Arrays sorted in original order HAP4 PHO5 PHO2 PHO3 PHM6 PHO84 PHO4 HAP4 GIT1 VTC3 Arrays sorted according to expression of HAP4 Segal et al. Nat Genet 2003 12 6

From Expression to Regulation (M-step) HAP4 SIP4 PHO5 PHO2 PHO3 PHM6 PHO84 PHO4 GIT1 VTC3 SIP4 HAP4 Segal et al. Nat Genet 2003 13 Learning Control Programs u Score: log P(M D) log P(D M,, )P(, ) d d 0 u Score of HAP4 split: log P(M D) HAP4 log P(D HAP4 M,, )P(, ) d d + log P(D HAP4 M,, )P(, ) d d Segal et al. Nat Genet 2003 0 0 7

Learning Control Programs u Split as long as the score improves HAP4 HAP4 YGR043C 0 0 0 u 0 0 u Score of HAP4 split: log P(M D) log P(D HAP4 M,, )P(, ) d d + log P(D HAP4 M,, )P(, ) d d Score of HAP4/YGR043C split: log P(M D) log P(D HAP4 M,, )P(, ) d d + log P(D HAP4 D YGR043C M,, )P(, ) d d 15 + log P(D HAP4 D YGR043C M,, )P(, ) d d Review Learning Regulatory Network Iterative procedure Cluster genes into modules (E-step) Learn a regulatory program for each module (M-step) KEM1 HAP4 MKT1 false MSK1 DHH1 Maximum increase in Bayesian score UTH1 ECM18 TEC1 GPA1 HAP4 M1 M22 ASG7 MFA1 Candidate regulators MEC3 HAP1 MSK1 false SGS1 SEC59 RIM15 SAS5 PHO5 PHO2 PHO3 PHM6 SPL2 PHO84 PHO4 GIT1 VTC3 16 8

Module Networks* Activator Gene1 Gene2 Gene3 Genes in module activator expression Tree regression repressor expression target gene expression induced repressed activator false repressor false Repressor regulation program module genes Learning quickly runs out of statistical power Poor regulator selection lower in the tree Many correct regulators not selected Arbitrary choice among correlated regulators Combinatorial search Multiple local optima * Segal et al., Nature Genetics 2003 Regulation as Linear Regression minimize w (w 1 x 1 + w N x N -E Module ) 2 x 1 x 2 w w 1 2 w N parameters E Module E Targets = w 1 x 1 ++w N x N +ε x N S1011 S321 HAP1 S120 S321 SGS1 SEC59 SAS5 S22 S1 ECM18 ASG7 MEC3 UTH1 GPA1 MFA1 TEC1 RIM15 PHO3 PHO5 PHO84 PHM6 PHO2 PHO4 SPL2 GIT1 VTC3 But we often have very large N and linear regression gives them all nonzero weight! Problem: This objective learns too many regulators 18 9

Lasso* (L 1 ) Regression L 1 term minimize w (w 1 x 1 + w N x N -E Module ) 2 + C w i x 1 x 2 x N L 2 L1 parameters w w 1 2 w N E Module Induces sparsity in the solution w (many w i s set to zero) Provably selects right features when many features are irrelevant Convex optimization problem No combinatorial search Unique global optimum Efficient optimization * Tibshirani, 1996 Learning Regulatory Network Cluster genes into modules Learn a regulatory program for each module S1 S22 S1011 S321 S120 S321 ECM18 UTH1 GPA1 TEC1 ASG7 MFA1 MEC3 HAP1 SGS1 SEC59 RIM15 SAS5 PHO3 PHO5 PHO84 L PHM6 1 regression minimize PHO2 PHO4 w (Σw i x SPL2 i -E Targets )2+ C w i GIT1 VTC3 Lee et al., PLoS Genet 2009 10

Learning the regulatory network Multiple regression tasks S22 S1 S120 S1011 ECM18 ASG7 MEC3 S321 UTH1 GPA1 S321 MFA1 TEC1 HAP1 SGS1 RIM15 SEC59 SAS5 Module M PHO3 PHO5 PHO84 PHM6 PHO2 PHO4 SPL2 GIT1 VTC3 Module 1 minimize w1 (Σ w 1n x n E module1 ) 2 + C w 1n : minimize wn (Σ w Mn x n E modulem ) 2 + C w Mn Module 1 x 1 x 2 E module 1 x N w w 11 12 =0 w 1N Module M x 1 x 2 : E module M x N w w M1 =0 M2 w MN =0 Outline Motivation Algorithms for learning regulatory networks Evaluation of the method Statistical evaluation Biological interpretation Advanced topics Many models can get similar scores. Which one would you choose? A gene can be involved in multiple modules. Possible to incorporate prior knowledge? 22 11

Statistical Evaluation Cross-validation test Divide the data (experiments) into training and test data Compute the likelihood function for the Test data experiments Test likelihood How well the network fits to the test data? Test data genes? Regulatory network 23 Module Evaluation Criteria Are the module genes functionally coherent? Do the regulators have regulatory roles in the predicted conditions C (see slide 6)? Are the genes in the module known targets of the predicted regulators? Are the regulators consistent with the cisregulatory motifs (TF binding sites) found in promoters of the module genes? 24 12

Functional Coherence genes m Modules k N n Module 1 Cholesterol synthesis How significant is the overlap? Known functional categories Gene ontology (GO) http://www.geneontology.org/ Predicted targets of regulators Sharing TF binding sites : Calculate P(# overlap k K, n, N; two groups are independent) based on the hypergeometric distribution 25 Module Functional Coherence Coherence (%) 100 90 80 70 60 50 40 30 20 10 26 Modules >60% Coherent 41 Modules >40% Coherent 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 u Metabolic: AA, respiration, glycolysis, galactose u Stress: Oxidative stress, osmotic stress u Cellular localization: Nucleas, ER u Cellular processes: Cell cycle, sporulation, mating u Molecular functions: Protein folding, RNA & DNA processing, trafficking 26 13

Respiration Module u HAP4 known to up regulate Oxid. Phos. 27 Respiration Module u HAP4 known to up regulate Oxid. Phos. u HAP4, MSN4, XBP1 known to be regulators under predicted conditions 28 14

Respiration Module u HAP4 known to up regulate Oxid. Phos. u HAP4, MSN4, XBP1 known to be regulators under predicted conditions u HAP4 Binding site found in 39/55 genes 29 Respiration Module u HAP4 known to up regulate Oxid. Phos. u HAP4, MSN4, XBP1 known to be regulators under predicted conditions u HAP4 Binding site found in 39/55 genes u MSN4 Binding site found in 28/55 genes 30 15

Outline Motivation Algorithms for learning regulatory networks Evaluation of the method Advanced topics Many models can get similar scores. Which one would you choose? A gene can be involved in multiple modules. Possible to incorporate prior knowledge? 31 Structural learning via boostrapping Many networks that achieve similar scores 33.15 33.10 32.99 Which one would you choose? Estimate the robustness of each network or each edge. How?? Learn the networks from multiple datasets. Inferring sub-networks from perturbed expression profiles, Pe er et al. Bioinformatics 2001 32 16

Bootstrapping Sampling with replacement genes experiments Original data Bootstrap data 1 data 2 data N Inferring sub-networks from perturbed expression profiles, Pe er et al. Bioinformatics 2001 33 Bootstrapping Sampling with replacement Estimated confidence of each edge i # networks that contain the edge total # networks (N) 0.8 0.31 0.75 0.26 0.43 0.73 0.56 Bootstrap data 1 data 2 data N Inferring sub-networks from perturbed expression profiles, Pe er et al. Bioinformatics 2001 34 17

Overlapping Processes The living cell is a complex system Example, the cell cycle Cell cycle: the series of events that take place in a cell leading to its division and duplication. Genes functionally relevant to cell cycle regulation in the specific cell cycle phase Genes participate in multiple processes Partial figure from Maaser and Borlak Oct. 2008 Mutually exclusive clustering as a common approach to analyzing gene expression (+) genes likely to share a common function (-) group genes into mutually exclusive clusters (-) no info about genes relation to one another 35 Decomposition of Processes Model an expression level of a gene as a mixture of regulatory modules. Hard EM vs soft EM HAP4 false MSK1 false DHH1 w 1 false PUF3 X w 2 : w N HAP1 false HAP5 false false Probabilistic discovery of overlapping cellular processes and their regulation Battle et al. Journal of Computational Biology 2005 36 18