Computational Network Biology
Biostatistics & Medical Informatics 826, Fall 2018
Sushmita Roy, sroy@biostat.wisc.edu
https://compnetbiocourse.discovery.wisc.edu
Sep 6th, 2018

Goals for today
- Administrivia
- Course topics
- Short survey of interests/background

BMI826 Computational Network Biology
- Course home page: https://compnetbiocourse.discovery.wisc.edu
- Instructor: Prof. Sushmita Roy, sroy@biostat.wisc.edu
- Office: room 3168, Wisconsin Institute for Discovery
  - Your WISC cards will be enabled for upper floor access.
- Office hours:
  - Tuesday/Thursday, 2:30pm-3:30pm
  - By appointment via email
- Class announcements via Piazza
  - Enroll at https://piazza.com/wisc/fall2018/bmi826023

Finding my office
- WID 3168
[Map labels: Discovery building, Engineering hall]

Course organization
- Tentative schedule: https://compnetbiocourse.discovery.wisc.edu/schedule-2/
- The material in this course is organized into five major topics
- At the beginning of each topic I will provide an introduction to the topic
- Most of the material is from published papers and review articles
- We will read and discuss papers from each of these topics
- Please see the syllabus at https://compnetbiocourse.discovery.wisc.edu/syllabus/
- Readings will be made available online on the schedule page
- The last week or so will be project presentations

Recommended background
- Computer science
  - An introductory course in data structures is good, but not required
- Statistics
  - Good if you've had at least one course, but not required
- Molecular biology
  - Good if you have had some introductory course
  - An interest in learning some basic molecular biology
- Programming background
  - Familiarity with a Linux environment
  - Be able to run programs on data on the command line
  - Be able to write code to do some data analysis and computations

Course grading
- Written critiques: 20%
  - Five written critiques: one for each major topic
- Written and implementation assignments: 30%
  - Three or so
- Project: 45%
  - Proposal: 10%
  - Report: 20%
  - In-class presentation: 15%
- In-class participation: 5%

Writing a critique
- Critiques are due at the end of each major topic
- Each critique will be a 1-2 page analysis of the papers read in that topic
- Specific papers to be included in the critique will be mentioned
- The critique should have the following components
  - Overview of the problem area
  - Approaches discussed
  - Strengths and weaknesses
  - Extensions to any of the approaches

Project information
- There are three main components to the project
  - Proposal, in-class presentation, project report
- Project proposal draft (Oct 4th)
- Project proposal final draft (Oct 18th)
- Project presentations in the last week of class
- Project report due (Dec 12th)
  - Last day of lecture

Computational resources for this class
- Linux server mi1.biostat.wisc.edu, available through the BMI department
- Please connect to mi1.biostat.wisc.edu
- Accounts for all registered students in the class have been requested

What is Network biology?
- A collection of algorithms and tools to build, interpret, and use graph representations of interacting molecular entities in biological and bio-medical problems (adapted from Winterbach et al., 2013, BMC Systems Biology)
- ~15 years old
  - The term Network biology was likely coined by Albert-László Barabási & Zoltán N. Oltvai (2004)
- Intersects with computer science, statistics, physics, molecular biology
- Related/overlapping areas
  - Bioinformatics, Systems biology, Complex systems, Biological network analysis, Network science

Why Network biology?
- Cells are complex systems
  - A complex system: many components that interact to determine overall function
- Networks are natural representations of complex systems
- Provides a framework and important tools for integration, interpretation and discovery
- Many biological applications, e.g.
  - Understanding complex biological processes at the molecular level
  - Disease prognosis
  - Interpretation of genetic variation
  - Predictive models of cellular function
  - Gene function prediction and prioritization

Overview of lecture topics
Course material is organized by the biological problem and the computational approaches to address the problem
- Biological problem
  - Mapping regulatory network structure
  - Dynamics and context specificity of networks
  - Understanding design principles of biological networks
  - Interpretation of sequence variants
  - Identification of important genes
  - Predicting the function of a gene
- Computational approaches
  - Probabilistic graphical models
  - Graph structure learning
  - Multiple network learning
  - Graph clustering
  - Graph alignment
  - Diffusion on graphs

Network inference: How do molecular entities interact within a cell? Amit et al., Nat. Rev. Immunology, 2011
Network inference
[Figure labels: Gene expression levels; Samples; Biological knowledge bases; Algorithm; network nodes X1, X2, X5, Y1, Y2. The expression heat map shown is Figure 3, "Variation in gene expression in S. cerevisiae isolates", doi:10.1371/journal.pgen.1000223.g003, PLoS Genetics, October 2008, Volume 4, Issue 10, e1000223]
Computational concepts
1. Different types of graphical models for network representation
2. Learning graphical models from data
3. Integrating prior information into models

Network dynamics: How do networks change between different biological contexts?
- Contexts can be different time points, cell types, disease states, organisms
[Figure labels: Context C1; Context C2; gene network rewiring during cell cycle, time points t=1 to t=24. From Curtis et al., BMC Bioinformatics 2012]
Computational concepts
1. Multi-task learning
2. Dynamic models for networks

Network topology: How is a network organized?
[Figure panels: A Random network; B Scale-free network; C Hierarchical network; plots of P(k) against k for each. Barabasi & Oltvai, Nature Reviews Genetics 2004]
Computational concepts
1. Degree distribution
2. Network motifs
3. Centrality measures

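The degree distribution P(k) named on this slide can be computed directly from an edge list. The sketch below is only an illustration: the function name `degree_distribution` and the toy edge list are hypothetical, and nodes with no edges are assumed to be absent from the input (so P(0) is never reported).

```python
from collections import Counter

def degree_distribution(edges):
    """Return {k: P(k)} for an undirected graph given as (u, v) pairs.

    Assumption: every node appears in at least one edge.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)                      # number of nodes seen
    counts = Counter(degree.values())    # how many nodes have degree k
    return {k: c / n for k, c in sorted(counts.items())}

# Hypothetical toy network: hub "A" linked to four genes, plus one extra edge.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "C")]
pk = degree_distribution(toy_edges)
```

On real networks, plotting P(k) against k on log-log axes is a common first look at whether a few hubs dominate, as in the scale-free panel.
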
Graph clustering: functional and disease module identification
[Figure labels: Interaction networks (e.g., physical, genetic or metabolic); Experimental data; Simulated data; Converting molecular activity to scores; Method validation; Hotspot 1; Hotspot 2. Mitra et al., Nat Rev Genetics 2013]
[Figure 2, Disease modules: schematic diagram of three modularity concepts (topological, functional and disease modules). Barabasi et al., Nat Rev Genetics 2011]
- Topological modules: locally dense neighbourhoods of the interactome, whose nodes show a higher tendency to interact with each other than with nodes outside the module; a pure network property
- Functional modules: network neighbourhoods in which there is a statistically significant segregation of nodes of related function
- Disease modules: a group of nodes whose perturbation is linked to a disease
Computational concepts
1. Graph clustering
2. Modularity measures

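One common modularity measure is Newman's Q, which compares the fraction of edges inside each module with the fraction expected from node degrees alone. The slide does not say which measure the course uses, so this choice is an assumption, and the function name and toy graph below are hypothetical.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity Q of a node -> community assignment.

    Q = sum over communities c of  e_c / m - (d_c / (2 m))^2
    where m is the number of edges, e_c the number of edges inside c,
    and d_c the summed degree of the nodes in c (undirected, unweighted).
    """
    m = len(edges)
    internal = defaultdict(int)
    total_degree = defaultdict(int)
    for u, v in edges:
        cu, cv = community[u], community[v]
        total_degree[cu] += 1
        total_degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)

# Hypothetical toy graph: two triangles joined by a single edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
two_modules = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_module = {n: 0 for n in two_modules}

q_split = modularity(toy_edges, two_modules)  # the two triangles as modules
q_all = modularity(toy_edges, one_module)     # everything in one module
```

The split into two triangles scores higher than lumping all nodes together, which matches the intuition that modules are locally dense neighbourhoods.
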
Graph alignment: What parts of networks from two species are similar?
[Figure 2. The NetworkBLAST local network alignment algorithm. Given two input networks, a network alignment graph is constructed. Nodes in this graph correspond to pairs of sequence-similar proteins, one from each species, and edges correspond to conserved interactions. A search algorithm identifies highly similar subnetworks that follow a prespecified interaction pattern. Adapted from Sharan and Ideker.]
Computational concepts
1. Clustering on graphs
2. Scoring subnetworks and subnetwork search
3. Matrix factorization
Kelley et al., PNAS 2003

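The alignment graph described in the figure caption can be sketched in a few lines. This is a simplified illustration and not the NetworkBLAST implementation: it assumes an edge is "conserved" only when the two proteins interact directly in both species, and the function name, protein names, and similarity pairs are hypothetical.

```python
from itertools import combinations

def alignment_graph(edges_a, edges_b, similar_pairs):
    """Build a network alignment graph for two species A and B.

    Nodes are (protein_a, protein_b) pairs judged sequence-similar.
    Two nodes are linked when the A-proteins interact in network A
    and the B-proteins interact in network B (simplifying assumption).
    """
    inter_a = {frozenset(e) for e in edges_a}
    inter_b = {frozenset(e) for e in edges_b}
    nodes = sorted(set(similar_pairs))
    edges = []
    for (a1, b1), (a2, b2) in combinations(nodes, 2):
        if frozenset((a1, a2)) in inter_a and frozenset((b1, b2)) in inter_b:
            edges.append(((a1, b1), (a2, b2)))
    return nodes, edges

# Hypothetical toy input: y* are proteins of species A, h* of species B.
net_a = [("y1", "y2"), ("y2", "y3")]
net_b = [("h1", "h2"), ("h2", "h4")]
similar = [("y1", "h1"), ("y2", "h2"), ("y3", "h3")]

nodes, conserved = alignment_graph(net_a, net_b, similar)
```

Here only the y1-y2 / h1-h2 interaction is present in both species, so it is the single edge of the alignment graph; a subnetwork search would then look for dense regions of this graph.
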
Graph diffusion: Which genes are most important?
Computational concepts
1. Random walks on graphs
2. Graph diffusion kernels
Koehler et al., AJHG 2008

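A random walk with restart is one way to turn "random walks on graphs" into a gene ranking: a walker starts at known seed genes, moves to neighbours, and occasionally jumps back to the seeds, so genes close to the seeds accumulate high scores. The sketch below is a minimal pure-Python illustration under stated assumptions: the restart probability of 0.3, the function name, and the toy network are hypothetical, and every node is assumed to have at least one neighbour.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until it stabilizes.

    adj   : dict mapping each node to a list of its neighbours (undirected)
    seeds : set of seed nodes; p0 is uniform over them
    Assumption: every node has at least one neighbour.
    """
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            # Each node passes its walking mass equally to its neighbours.
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p

# Hypothetical toy network: a chain g1 - g2 - g3 - g4 with g1 as the seed.
toy_adj = {"g1": ["g2"], "g2": ["g1", "g3"], "g3": ["g2", "g4"], "g4": ["g3"]}
scores = random_walk_with_restart(toy_adj, seeds={"g1"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

In a prioritization setting the seeds would be genes already linked to a phenotype, and candidate genes would be ranked by their score.
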
Graph diffusion: Characterizing genetic variation and impact on complex phenotypes
Computational concepts
1. Graph diffusion methods
2. Subnetwork identification
Ye et al., Science 2014; Leiserson et al., Nature Genetics 2015

Graph-based data integration
[Figure panels: a Original data (mRNA expression, DNA methylation; patients); b Patient similarity matrices; c Patient similarity networks; d Fusion iterations; e Fused patient similarity network. Edge legend, patient similarity: mRNA-based; DNA methylation-based; supported by all data. From Wang et al., Nature Methods 2013]
Computational concepts
1. Graph clustering
2. Clustering multiple graphs

Plan for next few lectures
- Sep 11th, 13th: Background in graph theory, probability theory and molecular networks
  - Readings:
    - L. Hunter. Life and Its Molecules: A Brief Introduction. AI Magazine 25(1):9-22, 2004.
    - Winterbach et al., Topology of molecular interaction networks. BMC Systems Biology, 2013 (section on Network Biology)
- Sep 15th: Probabilistic graphical models for molecular networks

Learning goals of this class
- Gain a broad overview of the application areas and computational solutions in Network biology
- Gain a deeper understanding of one or two of the areas introduced
- Apply the computational concepts to similar problems in biology and complex systems
- Understand and critique scientific articles
- Enable self-learning and deeper study of related topics
