Identifying Signaling Pathways

Similar documents
Comparative Network Analysis

Multiple Whole Genome Alignment

Inferring Models of cis-regulatory Modules using Information Theory

Bioinformatics 2. Yeast two hybrid. Proteomics. Proteomics

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it?

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models

Written Exam 15 December Course name: Introduction to Systems Biology Course no

Computational methods for predicting protein-protein interactions

Inferring Models of cis-regulatory Modules using Information Theory

GLOBEX Bioinformatics (Summer 2015) Genetic networks and gene expression data

Types of biological networks. I. Intra-cellurar networks

Lecture Notes for Fall Network Modeling. Ernest Fraenkel

Inferring Protein-Signaling Networks

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

Introduction to Bioinformatics

Unravelling the biochemical reaction kinetics from time-series data

The geneticist s questions. Deleting yeast genes. Functional genomics. From Wikipedia, the free encyclopedia

Latent Variable models for GWAs

Discovering molecular pathways from protein interaction and ge

Understanding Science Through the Lens of Computation. Richard M. Karp Nov. 3, 2007

Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday

Learning in Bayesian Networks

Introduction to Bioinformatics

Causal Discovery by Computer

Predicting Protein Functions and Domain Interactions from Protein Interactions

Information Extraction from Biomedical Text. BMI/CS 776 Mark Craven

BioControl - Week 6, Lecture 1

Models of transcriptional regulation

Systems biology and biological networks

Proteomics. 2 nd semester, Department of Biotechnology and Bioinformatics Laboratory of Nano-Biotechnology and Artificial Bioengineering

Network Biology: Understanding the cell s functional organization. Albert-László Barabási Zoltán N. Oltvai

Grundlagen der Systembiologie und der Modellierung epigenetischer Prozesse

Proteomics. Areas of Interest

Computational approaches for functional genomics

networks in molecular biology Wolfgang Huber

Prokaryotic Gene Expression (Learning Objectives)

Proteomics Systems Biology

Information Extraction from Biomedical Text

Bioinformatics 2 - Lecture 4

Clustering and Network

V14 Graph connectivity Metabolic networks

L3.1: Circuits: Introduction to Transcription Networks. Cellular Design Principles Prof. Jenna Rickus

SPA for quantitative analysis: Lecture 6 Modelling Biological Processes

Computational Network Biology Biostatistics & Medical Informatics 826 Fall 2018

Gene Network Science Diagrammatic Cell Language and Visual Cell

Lecture 4: Yeast as a model organism for functional and evolutionary genomics. Part II

Chapter 15 Active Reading Guide Regulation of Gene Expression

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

The geneticist s questions

Networks & pathways. Hedi Peterson MTAT Bioinformatics

CONJOINT 541. Translating a Transcriptome at Specific Times and Places. David Morris. Department of Biochemistry

Biological networks CS449 BIOINFORMATICS

Biological Concepts and Information Technology (Systems Biology)

Biological Pathways Representation by Petri Nets and extension

CHAPTER 13 PROKARYOTE GENES: E. COLI LAC OPERON

hsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference

Introduction. Gene expression is the combined process of :

Boolean models of gene regulatory networks. Matthew Macauley Math 4500: Mathematical Modeling Clemson University Spring 2016

Protein-protein interaction networks Prof. Peter Csermely

56:198:582 Biological Networks Lecture 8

Discovering modules in expression profiles using a network

Bioinformatics Chapter 1. Introduction

How much non-coding DNA do eukaryotes require?

Computational Systems Biology

Chapter 7: Metabolic Networks

Simulation of Gene Regulatory Networks

UNIT 6 PART 3 *REGULATION USING OPERONS* Hillis Textbook, CH 11

Inferring Transcriptional Regulatory Networks from Gene Expression Data II

Computational Biology: Basics & Interesting Problems

Tandem Mass Spectrometry: Generating function, alignment and assembly

Learning Sequence Motif Models Using Gibbs Sampling

METABOLIC PATHWAY PREDICTION/ALIGNMENT

Physical network models and multi-source data integration

ATLAS of Biochemistry

Course plan Academic Year Qualification MSc on Bioinformatics for Health Sciences. Subject name: Computational Systems Biology Code: 30180

What is Systems Biology

Biological Networks. Gavin Conant 163B ASRC

32 Gene regulation, continued Lecture Outline 11/21/05

Principles of Genetics

V14 extreme pathways

86 Part 4 SUMMARY INTRODUCTION

Introduction Biology before Systems Biology: Reductionism Reduce the study from the whole organism to inner most details like protein or the DNA.

six lectures on systems biology

Unit 3: Control and regulation Higher Biology

Inferring Protein-Signaling Networks II

Multiple Choice Review- Eukaryotic Gene Expression

Networks in systems biology

Approximate inference for stochastic dynamics in large biological networks

Regulation and signaling. Overview. Control of gene expression. Cells need to regulate the amounts of different proteins they express, depending on

Geert Geeven. April 14, 2010

COMPARATIVE PATHWAY ANNOTATION WITH PROTEIN-DNA INTERACTION AND OPERON INFORMATION VIA GRAPH TREE DECOMPOSITION

CSCE555 Bioinformatics. Protein Function Annotation

Mathematical Biology - Lecture 1 - general formulation

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven)

Evidence for dynamically organized modularity in the yeast protein-protein interaction network

robustness: revisting the significance of mirna-mediated regulation

Optimality, Robustness, and Noise in Metabolic Network Control

A New Method to Build Gene Regulation Network Based on Fuzzy Hierarchical Clustering Methods

V19 Metabolic Networks - Overview

Mole_Oce Lecture # 24: Introduction to genomics

Transcription:

These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by Anthony Gitter, Mark Craven, Colin Dewey Identifying Signaling Pathways BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2018 Anthony Gitter gitter@biostat.wisc.edu

Goals for lecture Challenges of integrating high-throughput assays Connecting relevant genes/proteins with interaction networks ResponseNet algorithm Evaluating pathway predictions Classes of signaling pathway prediction methods 2

High-throughput screening Which genes are involved in which cellular processes? Hit: gene that affects the phenotype Phenotypes include: Growth rate Cell death Cell size Intensity of some reporter Many others 3

Types of screens Genetic screening Test genes individually or in parallel Knockout, knockdown (RNA interference), overexpression, CRISPR/Cas genome editing Chemical screening Which genes are affected by a stimulus? 4

Differentially expressed genes Compare mrna transcript levels between control and treatment conditions Genes whose expression changes significantly are also involved in the cellular process Alternatively, differential protein abundance or phosphorylation 5

Interpreting screens Screen hits Differentially expressed genes Very few genes detected in both 6

Assays reveal different parts of a cellular process Database representation of a pathway KEGG 7

Assays reveal different parts of a cellular process Differentially expressed genes Genetic screen hits 8

Pathways connect the disjoint gene lists Can t rely on pathway databases High-quality, low coverage Instead learn condition-specific pathways computationally Combine data with generic physical interaction networks 9

Physical interactions Protein-protein interactions (PPI) Appling Graz Prot A Prot B Metabolic Protein-DNA (transcription factor-gene) Yeger-Lotem2009 TF Gene Genes and proteins are different node types 10

Hairball networks Networks are highly connected Can t use naïve strategy to connect screen hits and differentially expressed genes Yeger-Lotem2009 11

Identify connections within an interaction network Yeger-Lotem2009 12

How to define a computational pathway Given: Partially directed network of known physical interactions (e.g. PPI, kinase-substrate, TF-gene) Scores on source nodes Scores on target nodes Do: Return directed paths in the network connecting sources to targets 13

ResponseNet optimization goals Connect screen hits and differentially expressed genes Recover sparse connections Identify intermediate proteins missed by the screens Prefer high-confidence interactions Minimum cost flow formulation can meet these objectives 14

Construct the interaction network Protein Gene 15

Transform to a flow problem S T 16

Max flow on graphs Each edge can tolerate different level of flow or have different preference of sending flow along that edge S Pump flow from source Incoming and outgoing flow conserved at each node Flow conserved to target T 17

Weighting interactions Probability-like confidence of the interaction Example evidence: edge score of 1.0 16 distinct publications supporting the edge irefweb 18

Weights and capacities on edges S w ij from interaction network confidence (w ij, c ij ) c ij = 1 Flow capacity T 19

Find the minimum cost flow Return the edges with non-zero flow S Prefer no flow on the low-weight edges if alternative paths exist T 20

Formal minimum cost flow Positive flow on an edge incurs a cost Cost is greater for low-weight edges Flow on an edge Parameter controlling the amount of flow from the source 21

Formal minimum cost flow Flow coming in to a node equals flow leaving the node 22

Formal minimum cost flow Flow leaving the source equals flow entering the target 23

Formal minimum cost flow Flow is non-negative and does not exceed edge capacity 24

Formal minimum cost flow 25

Linear programming Optimization problem is a linear program Canonical form Wikipedia Polynomial time complexity Many off-the-shelf solvers Practical Optimization: A Gentle Introduction Introduction to linear programming Simplex method Network flow 26

ResponseNet pathways Identifies pathway members that are neither hits nor differentially expressed Ste5 recovered when STE5 deletion is the perturbation 27

ResponseNet summary Advantages Computationally efficient Integrates multiple types of data Incorporates interaction confidence Identifies biologically plausible networks Disadvantages Direction of flow is not biologically meaningful Path length not considered Requires sources and targets Dependent on completeness and quality of input network 28

Evaluating pathway predictions Unlike PIQ, we don t have a complete gold standard available for evaluation Can simulate gold standard pathways from a network Compare relative performance of multiple methods on independent data 29

Evaluating pathway predictions Ritz2016 30

Evaluating pathway predictions Ritz2016 31

Evaluating pathway predictions MacGilvray2018 PR curves can evaluate node or edge recovery but not the global pathway structure 32

Evaluation beyond pathway databases Natural language processing can also help semi-automated evaluation Literome Chilibot ihop 33

Classes of pathway prediction algorithms No Are edges important? Yes Network diffusion Sources and targets? No Yes Spanning tree Steiner tree Next slide 34

Classes of pathway prediction algorithms Have sources and targets What path properties are important? Total path length or score Total sourcetarget connectivity Connectivity in minimum cost network Complex properties Shortest paths Network flow Steiner tree Integer program Symbolic solver Graphical model 35

Alternative pathway identification algorithms k-shortest paths Ruths2007 Shih2012 Random walks / network diffusion / circuits Tu2006 eqtl electrical diagrams (eqed) HotNet Integer programs Signaling-regulatory Pathway INferencE (SPINE) Chasman2014 36

Alternative pathway identification algorithms Path-based objectives Physical Network Models (PNM) Maximum Edge Orientation (MEO) Signaling and Dynamic Regulatory Events Miner (SDREM) Steiner tree Prize-collecting Steiner forest (PCSF) Belief propagation approximation (msgsteiner) Omics Integrator implementation Hybrid approaches PathLinker: random walk + shortest paths ANAT: shortest paths + Steiner tree 37

Recent developments in pathway discovery Multi-task learning: jointly model several related biological conditions ResponseNet extension: SAMNet Steiner forest extension: Multi-PCSF SDREM extension: MT-SDREM Temporal data ResponseNet extension: TimeXNet Steiner forest extension and ST-Steiner Temporal Pathway Synthesizer 38

Condition-specific genes/proteins used as input Genetic screen hits (as causes or effects) Differentially expressed genes Transcription factors inferred from gene expression Proteomic changes (protein abundance or posttranslational modifications) Kinases inferred from phosphorylation Genetic variants or DNA mutations Enzymes regulating metabolites Receptors or sensory proteins Protein interaction partners Pathway databases or other prior knowledge 39