Computational Network Biology Biostatistics & Medical Informatics 826 Fall 2018


Computational Network Biology
Biostatistics & Medical Informatics 826, Fall 2018
Sushmita Roy
sroy@biostat.wisc.edu
https://compnetbiocourse.discovery.wisc.edu
Sep 6th 2018

Goals for today Administrivia Course topics Short survey of interests/background

BMI826 Computational Network Biology
Course home page: https://compnetbiocourse.discovery.wisc.edu
Instructor: Prof. Sushmita Roy, sroy@biostat.wisc.edu
Office: room 3168, Wisconsin Institute for Discovery
  Your WISC cards will be enabled for upper floor access.
Office hours: Tuesday/Thursday 2:30pm-3:30pm, or by appointment via email
Class announcements via Piazza
  Enroll at https://piazza.com/wisc/fall2018/bmi826023

Finding my office: WID 3168
[Map labels: Discovery building, Engineering hall]

Course organization
Tentative schedule: https://compnetbiocourse.discovery.wisc.edu/schedule-2/
The material in this course is organized into five major topics
  At the beginning of each topic I will provide an introduction for the topic
  Most of the material is from published papers and review articles
  We will read and discuss papers from each of these topics
Please see the syllabus at https://compnetbiocourse.discovery.wisc.edu/syllabus/
Readings will be made available online on the schedule page
The last week or so will be project presentations

Recommended background
Computer science
  An introductory course in data structures is good, but not required
Statistics
  Good if you've had at least one course, but not required
Molecular biology
  Good if you have had some introductory course
  An interest in learning some basic molecular biology
Programming background
  Familiarity with a Linux environment
  Be able to run programs on data on the command line
  Be able to write code to do some data analysis and computations

Course grading
Written critiques: 20%
  Five written critiques: one for each major topic
Written and implementation assignments: 30%
  Three or so
Project: 45%
  Proposal: 10%
  Report: 20%
  In class presentation: 15%
In class participation: 5%

Writing a critique
Critiques are due at the end of each major topic
Critiques will be a 1-2 page analysis of the papers read in each topic
Specific papers to be included in the critique will be mentioned
The critique should have the following components
  Overview of the problem area
  Approaches discussed
  Strengths and weaknesses
  Extensions to any of the approaches

Project information
There are three main components to the project
  Proposal, in class presentation, project report
Project proposal draft (Oct 4th)
Project proposal final draft (Oct 18th)
Project presentations in last week of class
Project report due (Dec 12th), last day of lecture

Computational resources for this class
Linux server mi1.biostat.wisc.edu available through the BMI department
Please connect to mi1.biostat.wisc.edu
Accounts for all registered students in the class have been requested


What is Network biology?
A collection of algorithms and tools to build, interpret, and use graph representations of interacting molecular entities in biological and bio-medical problems (Adapted from Winterbach et al., 2013, BMC Systems Biology)
~15 years old
  The term Network biology was likely coined by Albert-László Barabási & Zoltán N. Oltvai, 2004
Intersects with computer science, statistics, physics, molecular biology
Related/overlapping areas
  Bioinformatics, Systems biology, Complex systems, Biological network analysis, Network science

Why Network biology?
Cells are complex systems
  A complex system: many components that interact to determine overall function
Networks are natural representations of complex systems
  They provide a framework and important tools for integration, interpretation and discovery
Many biological applications, e.g.
  Understanding complex biological processes at the molecular level
  Disease prognosis
  Interpretation of genetic variation
  Predictive models of cellular function
  Gene function prediction and prioritization

Overview of lecture topics
Course material is organized by the biological problem and computational approaches to address the problem
Biological problem
  Mapping regulatory network structure
  Dynamics and context specificity of networks
  Understanding design principles of biological networks
  Interpretation of sequence variants
  Identification of important genes
  Predicting the function of a gene
Computational approaches
  Probabilistic graphical models
  Graph structure learning
  Multiple network learning
  Graph clustering
  Graph alignment
  Diffusion on graphs

Network inference: How do molecular entities interact within a cell? Amit et al., Nat. Rev. Immunology, 2011

Network inference
[Figure: gene expression levels across samples (expression heatmap of S. cerevisiae isolates from "Phenotypic Variation in Yeast", PLoS Genetics, October 2008, Volume 4, Issue 10, e1000223, doi:10.1371/journal.pgen.1000223.g003) and biological knowledge bases are given to an algorithm, which outputs a network over nodes X1, X2, X5, Y1, Y2]
Computational concepts
1. Different types of graphical models for network representation
2. Learning graphical models from data
3. Integrating prior information into models

Network dynamics: How do networks change between different biological contexts?
Contexts can be different time points, cell types, disease states, organisms
[Figure: networks for Context C1 and Context C2; gene network rewiring during cell cycle across time points t=1 to t=24. From Curtis et al., BMC Bioinformatics 2012]
Computational concepts
1. Multi-task learning
2. Dynamic models for networks

Network topology: How is a network organized?
[Figure: three network types, each with a plot of P(k) against k: A. Random network, B. Scale-free network, C. Hierarchical network. Barabasi & Oltvai, Nature Reviews Genetics 2004]
Computational concepts
1. Degree distribution
2. Network motifs
3. Centrality measures
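As a small illustration of the first concept on this slide, the degree distribution P(k) is the fraction of nodes that have exactly k neighbors. The sketch below is not course code: the function name and the toy edge list are made up for illustration, and it assumes an undirected graph given as an edge list with no self-loops or duplicate edges.

```python
from collections import Counter


def degree_distribution(edges):
    """Return {k: P(k)}, the fraction of nodes with degree k.

    Assumes an undirected graph without self-loops or duplicate edges.
    Nodes that appear in no edge are not counted.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n_nodes = len(degree)
    counts = Counter(degree.values())
    return {k: c / n_nodes for k, c in sorted(counts.items())}


# Hypothetical toy network: a hub "A" linked to four nodes, plus one extra edge.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "C")]
p_k = degree_distribution(toy_edges)
for k, prob in p_k.items():
    print(f"P({k}) = {prob:.2f}")
```

On a real interaction network, plotting P(k) against k on log-log axes is the usual way to look for the heavy-tailed shape associated with scale-free networks.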

Graph clustering: functional and disease module identification
[Figure: network propagation over interaction networks (e.g., physical, genetic or metabolic) to identify mutational hotspots. Mitra et al., Nat Rev Genetics 2013]
[Figure 2: Disease modules. Schematic diagram of three modularity concepts: topological modules, functional modules, and disease modules. Barabasi et al., Nat Rev Genetics 2011]
Computational concepts
1. Graph clustering
2. Modularity measures
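One widely used modularity measure is the Newman-Girvan modularity Q, which compares the fraction of edges that fall inside each cluster with the fraction expected if edges were placed at random while keeping node degrees fixed: Q = sum over clusters c of (e_c / m) - (d_c / 2m)^2, where m is the number of edges, e_c the number of edges inside c, and d_c the total degree of the nodes in c. The sketch below assumes this particular measure (the slide does not say which ones the course covers); the function name and toy graph are hypothetical.

```python
from collections import Counter


def modularity(edges, community_of):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs without duplicates or self-loops.
    community_of: dict mapping every node to a cluster label.
    """
    m = len(edges)
    intra_edges = Counter()   # e_c: edges with both endpoints in cluster c
    total_degree = Counter()  # d_c: sum of degrees of the nodes in cluster c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        total_degree[cu] += 1
        total_degree[cv] += 1
        if cu == cv:
            intra_edges[cu] += 1
    return sum(
        intra_edges[c] / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )


# Hypothetical toy graph: two triangles joined by a single bridge edge (3, 4).
toy_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_modules = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
one_module = {node: "all" for node in two_modules}

q_split = modularity(toy_edges, two_modules)
q_single = modularity(toy_edges, one_module)
print(q_split, q_single)
```

Splitting the toy graph into its two triangles scores higher than putting every node in one cluster, which is the behavior a graph clustering method that optimizes modularity relies on.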

Graph alignment: What parts of networks from two species are similar?
[Figure 2: The NetworkBLAST local network alignment algorithm. Given two input networks, a network alignment graph is constructed. Nodes in this graph correspond to pairs of sequence-similar proteins, one from each species, and edges correspond to conserved interactions. A search algorithm identifies highly similar subnetworks that follow a prespecified interaction pattern. Adapted from Sharan and Ideker.]
Computational concepts
1. Clustering on graphs
2. Scoring subnetworks and subnetwork search
3. Matrix factorization
Kelley et al., PNAS 2003

Graph diffusion: Which genes are most important? Computational concepts 1. Random walks on graphs 2. Graph diffusion kernels 3. Random walks on graphs Koehler et al., AJHG 2008
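To make "random walks on graphs" concrete, here is a minimal sketch of a random walk with restart, one common diffusion scheme for ranking genes by proximity to a set of known seed genes. At each step the walker either moves to a uniformly chosen neighbor or, with probability `restart`, jumps back to a seed; the stationary probabilities serve as importance scores. The function name, the restart value of 0.3, and the toy path graph are assumptions for illustration, not parameters taken from the cited paper. The code assumes every seed appears in the edge list.

```python
def random_walk_with_restart(edges, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    W is the column-normalized adjacency matrix of an undirected graph,
    p0 is uniform over the seed nodes. Returns {node: score}.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in neighbors}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: probability mass returned to the seeds.
        nxt = {n: restart * p0[n] for n in neighbors}
        # Walk term: each node spreads its remaining mass evenly to its neighbors.
        for u, nbrs in neighbors.items():
            share = (1.0 - restart) * p[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        change = sum(abs(nxt[n] - p[n]) for n in neighbors)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy network: a path A - B - C - D with a single seed gene "A".
toy_edges = [("A", "B"), ("B", "C"), ("C", "D")]
scores = random_walk_with_restart(toy_edges, seeds={"A"})
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 4))
```

In this toy example the scores decrease with distance from the seed, so candidate genes closer to known disease genes in the network rank higher.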

Graph diffusion: Characterizing genetic variation and impact on complex phenotypes
Computational concepts
1. Graph diffusion methods
2. Subnetwork identification
Ye et al., Science 2014; Leiserson et al., Nature Genetics 2015

Graph-based data integration
[Figure panels: a. Original data (mRNA expression, DNA methylation); b. Patient similarity matrices; c. Patient similarity networks; d. Fusion iterations; e. Fused patient similarity network. Edge legend: patient similarity that is mRNA-based, DNA methylation based, or supported by all data. From Wang et al., Nature Methods 2013]
Computational concepts
1. Graph clustering
2. Clustering multiple graphs

Plan for next few lectures
Sep 11th, 13th
Background on graph theory, probability theory and molecular networks
Readings: L. Hunter. Life and Its Molecules: A Brief Introduction. AI Magazine 25(1):9-22, 2004.
Sep 15th
Winterbach et al., Topology of molecular interaction networks. BMC Systems Biology, 2013
Section on Network Biology
Probabilistic graphical models for molecular networks

Learning goals of this class
Gain a broad overview of the application areas and computational solutions in Network biology
Gain a deeper understanding of one or two of the areas introduced
Apply the computational concepts to similar problems in biology and complex systems
Understand and critique scientific articles
Enable self-learning and deeper study of related topics
