Comparative Genomics: Sequence, Structure, and Networks. Bonnie Berger MIT


1 Comparative Genomics: Sequence, Structure, and Networks Bonnie Berger MIT

2 Comparative Genomics Look at the same kind of data across species with the hope that areas of high correlation correspond to functional parts or modules of the genome.

3 Biology in One Slide Protein Function

4 Comparative Genomics of DNA Protein Function

5 Multiple Species Comparison Look at multiple species simultaneously

6 Application to Regulatory Motif Discovery. Species compared: S. cer, S. par, S. mik, S. bay.
Evaluate conservation within | Gal4 | Controls
(1) All intergenic regions | 22% | 5%
(2) Intergenic : coding | 4:1 | 1:3
(3) Upstream : downstream | 12:1 | 1:1
A signature for regulatory motifs.
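A minimal sketch of the conservation test in row (1). It assumes a gapped multiple alignment given as equal-length strings with S. cer first, and it counts a motif instance as conserved only if the motif pattern also matches at the same alignment columns in every other species. The Gal4 pattern CGG-N11-CCG and the helper name `motif_conservation` are illustrative choices, not taken from the original work.

```python
import re

# Illustrative motif: the Gal4 site is commonly written CGG-N11-CCG.
# Only unambiguous bases are allowed in the spacer, so gap columns ("-") never match.
GAL4 = r"CGG[ACGT]{11}CCG"
_SCAN = re.compile(f"(?=({GAL4}))")  # lookahead so overlapping hits are all reported
_EXACT = re.compile(GAL4)


def motif_conservation(alignment):
    """Count motif instances in the reference (alignment[0]) and how many of them
    also match the motif at the same alignment columns in every other species.

    alignment: list of equal-length aligned strings (gaps as '-'), one per species.
    Returns (conserved, total).
    """
    ref = alignment[0]
    conserved = total = 0
    for hit in _SCAN.finditer(ref):
        start = hit.start()
        end = start + len(hit.group(1))
        total += 1
        if all(_EXACT.fullmatch(seq[start:end]) for seq in alignment[1:]):
            conserved += 1
    return conserved, total
```

Dividing `conserved` by `total` gives a conservation rate like the 22% in row (1); running the same count over control motifs gives the baseline it is compared against.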

7 Result Highlights [Kellis, Patterson, Birren, Berger*, Lander* (2004), RECOMB; J. Comput. Biol. special issue, 11:2-3; Kellis et al., Nature (2003)]. Identify gene correspondence across species for > 90% of genes in yeast, with 99.9% sensitivity and 99% specificity on 4000 known genes. Refine the boundaries of hundreds of genes (5700 genes total). Identify most previously known regulatory motifs and 41 novel ones. Genome-wide, unbiased search; no previous knowledge necessary.

8 Comparative Genomics of RNA Protein Function

9 RNA Secondary Structure Detection. Problem: identify biologically significant RNA secondary structure. Challenge: any given single sequence will have a plausible secondary structure, so a predicted fold alone does not show that the structure is functional. [Figure: base pairs A-U, G-C, and G-U; structural elements: stems, hairpin loops, bulge loop, interior loops, multi-branched loop.]
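To make the notion of a "plausible" structure concrete, here is a sketch that checks whether a structure written in dot-bracket notation is well-formed and uses only the A-U, G-C, and G-U pairs listed above. Dot-bracket input and the minimum hairpin-loop length of 3 are assumptions of this sketch. That almost any sequence admits some structure passing such a check is exactly why comparative evidence is needed.

```python
# Watson-Crick pairs plus the G-U wobble pair, in both orientations.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_dot_bracket(structure):
    """Return the sorted list of (i, j) base pairs in a dot-bracket string such as '((...))'."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def is_valid_structure(sequence, structure, min_hairpin=3):
    """True if every pair is an allowed base pair and encloses >= min_hairpin unpaired-or-inner bases."""
    if len(sequence) != len(structure):
        return False
    try:
        pairs = parse_dot_bracket(structure)
    except ValueError:
        return False
    return all((sequence[i], sequence[j]) in ALLOWED_PAIRS and j - i - 1 >= min_hairpin
               for i, j in pairs)
```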

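10 Compensatory Mutations. Given K orthologous aligned RNA sequences: if the i-th and j-th positions are base-paired in many organisms, then their nucleotides must covary, because a mutation on one side of the pair is compensated by a mutation on the other (e.g., G-C becoming A-U).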
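Covariation between two alignment columns is often scored with mutual information. The sketch below is that standard measure, not the MSARi statistic; it assumes each column is given as a string with one nucleotide per species.

```python
from collections import Counter
from math import log2


def column_mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns.

    col_i, col_j: strings with one nucleotide per species, in the same species order.
    High values mean the two positions covary across species.
    """
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_i)
    f_i, f_j = Counter(col_i), Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return mi
```

Perfectly covarying columns such as GAGA / CUCU (G-C in two species, A-U in the other two) score 1 bit, while fully conserved columns score 0, so conservation alone gives no covariation signal.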

12 Approaches to Secondary Structure Detection. Statistical: stochastic context-free grammars for two-species comparison (QRNA); machine learning (RNAGenie); our approach, statistical significance across multiple species (MSARi). Homology: train on a particular RNA secondary structure and try to predict that structure.

13 Result Highlights [Coventry, Kleitman, Berger (2004), PNAS]. MSARi identifies RNA secondary structure with 90% sensitivity at 98% specificity, with no previous knowledge necessary. It was used to identify functionally significant RNA secondary structure in mRNA, and it can be used to scan multiple genomes for RNA secondary structure. Benchmarks: QRNA, 19.8% sensitivity at 98% specificity; ddbRNA, 68% sensitivity at 97.7% specificity.
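Figures such as "90% sensitivity at 98% specificity" come from fixing a score threshold on negative examples and then measuring recall on positives. A minimal sketch, assuming higher scores mean "structured" and that scores for known positives and negatives are available as lists; the function name is hypothetical.

```python
import math


def sensitivity_at_specificity(pos_scores, neg_scores, target=0.98):
    """Find the lowest threshold t whose specificity on negatives is >= target,
    where a sequence is predicted 'structured' when its score > t.

    Returns (sensitivity, specificity, threshold).
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    thresholds = [-math.inf] + sorted(set(pos_scores) | set(neg_scores))
    for t in thresholds:  # ascending: sensitivity can only fall as t rises
        spec = sum(s <= t for s in neg_scores) / len(neg_scores)
        if spec >= target:
            sens = sum(s > t for s in pos_scores) / len(pos_scores)
            return sens, spec, t
    raise ValueError("target specificity must be <= 1")
```

Taking the lowest qualifying threshold maximizes sensitivity, since raising the threshold can only turn predicted positives into predicted negatives.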

14 Comparative Genomics of Proteins Protein Function

15 Protein Structure: The Protein Folding Problem. Given an amino acid sequence, e.g., MDPNCSCAAAGDSCTCANSCTCLACKCTSCK, how will it fold in 3D? Proteins must fold to function, and some diseases, e.g., mad cow disease, are caused by misfolding.

16 Protein Folding by Comparative Modeling. Similar protein sequences → similar structures, so known structures can be used to predict a new one. About 40,000 protein structures have been solved using experimental techniques and stored in the Protein Data Bank (PDB); ~1000 are unique structural folds. [Figure: examples of same structural folds vs. different structural folds.]

17 Protein Threading. Query sequence: DRVYIHPFADRVYIHPFA → the best match. Threading = matching a string to a 3D object.

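18 Result Highlights. RAPTOR: threading as linear programming (Jinbo Xu). Minimize

$$E = \sum_{i,l} a_{i,l}\, x_{i,l} + \sum_{(i,l),(j,k)} b_{(i,l)(j,k)}\, y_{(i,l)(j,k)}$$

subject to

$$\sum_{l \in D[i]} x_{i,l} = 1 \quad \text{for each core } i$$
$$\sum_{k \in R[i,j,l]} y_{(i,l)(j,k)} = x_{i,l} \quad \text{for each interacting pair } (i,j),\ l \in D[i]$$
$$\sum_{l \in R[j,i,k]} y_{(i,l)(j,k)} = x_{j,k} \quad \text{for each interacting pair } (i,j),\ k \in D[j]$$
$$x_{i,l},\ y_{(i,l)(j,k)} \in \{0,1\}$$

where x_{i,l} = 1 if template core i is aligned to query position l, y_{(i,l)(j,k)} = 1 if both x_{i,l} and x_{j,k} are 1, D[i] is the set of allowed positions for core i, and R[i,j,l] is the set of positions for core j compatible with placing core i at l. [Figure: structural template aligned to the input sequence T N L A K Y E T L.] RAPTOR was the best-performing algorithm at CAFASP, a worldwide competition.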
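For intuition about what the integer program encodes, the sketch below does not solve an LP. It exhaustively enumerates placements of template cores on a query and minimizes the same kind of energy: singleton terms (the a coefficients, here the callback `single`) plus pairwise contact terms (the b coefficients, here `pair`). The callbacks and the order-and-no-overlap rule standing in for the D[·] and R[·] feasibility sets are assumptions; enumeration is only practical for toy sizes, which is the motivation for an LP formulation.

```python
import math


def thread_brute_force(core_lengths, seq_len, single, pair, contacts):
    """Exhaustively place template cores on a query sequence (toy sizes only).

    core_lengths: length of each template core, in template order.
    single(i, l): energy for placing core i starting at sequence position l.
    pair(i, l, j, k): energy for contacting cores i and j placed at l and k.
    contacts: list of (i, j) core pairs that interact in the template.
    Cores keep their order and may not overlap. Lower energy is better.
    Returns (best_energy, best_start_positions).
    """
    m = len(core_lengths)
    best_e, best_starts = math.inf, None

    def place(i, first_free, starts):
        nonlocal best_e, best_starts
        if i == m:
            e = sum(single(c, s) for c, s in enumerate(starts))
            e += sum(pair(a, starts[a], b, starts[b]) for a, b in contacts)
            if e < best_e:
                best_e, best_starts = e, tuple(starts)
            return
        room_for_rest = sum(core_lengths[i + 1:])
        last_start = seq_len - core_lengths[i] - room_for_rest
        for l in range(first_free, last_start + 1):
            place(i + 1, l + core_lengths[i], starts + [l])

    place(0, 0, [])
    return best_e, best_starts
```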

19 Threading Protein Complexes. [Figure: query sequences A (RGPPQLIK) and B (EGAATQY) threaded jointly with DBLRAP.] DBLRAP is our extension of RAPTOR for joint homology modeling of two structures (PSB 06). It extends the LP formulation to also score the interface between the two structures. DBLRAP was able to predict interactions for 8% of proteins in the yeast genome (cf. 5% previously).

20 Structure Alignment

21 Protein Structure Alignment Problem: find the optimal alignment between two protein structures

22 Contact Map Alignment. Goal: find the maximum common subgraph of the two contact maps, where two residues are in contact when their distance is below a cutoff D_u (5 Å to 7.5 Å).
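A sketch of the two ingredients of contact map alignment, assuming one 3D coordinate per residue (e.g., Cα atoms): building a contact map with a distance cutoff D_u, and scoring an alignment by the number of contacts it preserves, i.e., the size of the common subgraph it induces. The 7.5 Å cutoff and the minimum sequence separation of 2 are illustrative parameters.

```python
import math


def contact_map(coords, cutoff=7.5, min_seq_sep=2):
    """Set of residue pairs (i, j), i < j, closer than `cutoff` angstroms.

    min_seq_sep skips near-diagonal pairs that are trivially close (assumption).
    """
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + min_seq_sep, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts.add((i, j))
    return contacts


def contact_overlap(contacts_a, contacts_b, alignment):
    """Number of contacts of A whose aligned residues are also in contact in B.

    alignment: dict residue index in A -> residue index in B (one-to-one).
    """
    count = 0
    for i, j in contacts_a:
        if i in alignment and j in alignment:
            u, v = alignment[i], alignment[j]
            if (min(u, v), max(u, v)) in contacts_b:
                count += 1
    return count
```

Maximizing `contact_overlap` over order-preserving alignments is the NP-hard problem discussed next.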

23 State of the Art: Contact Map Alignment. History: more than 20 years of work, with many programs based on heuristic algorithms. The problem is NP-hard, and hard to approximate, when measured by maximum common subgraph (Goldman, Papadimitriou & Istrail, FOCS 99). Other approaches: Lagrangian relaxation (Caprara & Lancia, RECOMB 2002); integer programming (Caprara et al., JCB 2004).

24 Tree-Decomposition for Protein Structure Alignment. Method: a tree decomposition of one protein structure into small pieces, exploiting the geometric characteristics of the structure. Result: a polynomial-time approximation scheme (PTAS) that finds an alignment scoring at least (1 - 1/k) of the best. For fixed k, ε, and distance parameters D, D_c, and D_l, the running time is polynomial in the protein length N. The parameters D, D_c, and D_l are small constants, and so is D/D_l; therefore the problem admits a PTAS, the best that we can achieve since the problem is NP-hard.
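The method relies on decomposing a contact graph into a tree of small pieces. As a generic illustration (not the geometric decomposition used by the authors), the common min-degree elimination heuristic below produces a tree decomposition and reports its width, an upper bound on the tree width; the graph is assumed to be a symmetric adjacency dict.

```python
def min_degree_tree_width(adj):
    """Greedy min-degree elimination on an undirected graph.

    adj: dict node -> iterable of neighbours (must be symmetric).
    Returns (width, order): the width of the tree decomposition induced by the
    elimination order (largest bag size minus one), an upper bound on tree width.
    """
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    width, order = 0, []
    while g:
        v = min(g, key=lambda u: len(g[u]))   # ties: first node in dict order
        nbrs = g.pop(v)
        width = max(width, len(nbrs))         # bag = {v} plus its current neighbours
        for a in nbrs:
            g[a].discard(v)
            g[a] |= nbrs - {a}                # turn v's neighbourhood into a clique
        order.append(v)
    return width, order
```

Each eliminated node together with its neighbours at elimination time forms one bag of the decomposition; small bags are what make dynamic programming over the tree affordable.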

25 Biological Applications of Tree Decomposition. Sidechain packing (Xu & Berger, RECOMB 05, JACM 06); protein threading (Xu, Jiao & Berger, CSB 05); network motif search (Dost et al., RECOMB 07); RNA secondary structure alignment (Song et al., CSB 05); de novo sequencing (Liu et al., PSB 06); protein structure alignment (Xu, Jiao & Berger, RECOMB 06, JCB 06; Xu, CDC 07).

26 Comparative Genomics of Networks Protein Function

27 Why Understanding Function-Level Differences Is Important. Increased complexity (function) is not explained simply by variation in gene (or protein) count. [Chart: estimated number of genes and estimated number of proteins; numbers from …]

28 Protein-Protein Interactions (PPIs). Often, proteins interact with other proteins to perform their functions; many cellular activities are the result of protein interactions. [Figure: MAPK signaling cascade. Image from: /mapkmap2.html]

29 Modeling PPIs. Traditional perspective: low-throughput, structural. New perspective: high-throughput, network-based. [Figure: a G-protein complex (Gα, Gβ, Gγ, GDP) in the traditional perspective vs. the new systems-level perspective. Image from …]

30 Protein-Protein Interaction (PPI) Network. X + Y = ? The yeast two-hybrid method tests whether two proteins interact (Cusick et al., Hum Mol Genet 05). [Figure: yeast PPI network.]

31 Motivation behind Network Comparison. Compare PPI networks at the species level. Transfer annotation from one species to another (experiments in model organisms are more feasible, cheaper, and easier than in humans). Error detection. Compute functional orthologs, i.e., proteins that perform the same function across species.

32 The Problem. Given two protein-protein interaction networks, find, for a piece of one network, a counterpart with comparable structure in the other network. Our approach: match neighborhood topologies.

33 Algorithm: IsoRank. [Figure: two networks, on nodes a1-a8 and b1-b9.]
Input, sequence similarity: a5-b7: 2.1; a5-b9: 1.5; a3-b2: 3.4.
Output, functional similarity for each possible node pairing: a5-b7: 1e-2; a5-b1: 2e-8; a5-b3: 1e-7; a5-b9: 1e-4; a3-b1: 5e-4; a3-b6: 3e-9.

34 Functional Similarity Score: Intuition. Compute pairwise scores R_ij, e.g., R_{a5,b1} = ? [Figure: networks on a1-a5 and b1-b5.] Goal: high R_ij ⇔ i and j are a good match. Intuition: i and j are a good match if their sequences align and their neighbors are a good match.

35 Computing R_ij. Combine both sequence and network data:

$$R_{ij} = (1-\alpha) E_{ij} + \alpha N_{ij}$$

where R_ij is the functional similarity, E_ij the sequence similarity, and N_ij the network similarity.

36 Simple Case: α = 1 (no E_ij), so R_ij = N_ij. R_ij depends on the neighborhoods of i and j:

$$R_{ij} = N_{ij} = \sum_{u \in N(i)} \sum_{v \in N(j)} \frac{1}{|N(u)|\,|N(v)|} R_{uv}$$

where N(a) is the set of neighbors of a. [Figure: worked example on networks a1-a5 and b1-b5.]
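A direct transcription of the neighborhood formula for one pair (i, j), plus one application to every pair. It assumes adjacency sets for both networks and current scores R stored in a dict keyed by node pairs; the function names are hypothetical.

```python
def neighborhood_score(R, adj1, adj2, i, j):
    """N_ij = sum over u in N(i), v in N(j) of R[(u, v)] / (|N(u)| * |N(v)|).

    R: dict (u, v) -> current score; missing pairs count as 0.
    adj1, adj2: dict node -> set of neighbours, for graph 1 and graph 2.
    """
    return sum(R.get((u, v), 0.0) / (len(adj1[u]) * len(adj2[v]))
               for u in adj1[i] for v in adj2[j])


def apply_neighborhood(R, adj1, adj2):
    """Compute N_ij for every pair (i, j), i.e. one multiplication of R by A."""
    return {(i, j): neighborhood_score(R, adj1, adj2, i, j) for i in adj1 for j in adj2}
```

On two 3-node paths a1-a2-a3 and b1-b2-b3 with all scores equal to 1, the centre pair (a2, b2) collects 4 (one unit from each corner pair), while the corner pair (a1, b1) receives only 1/(2·2) = 0.25 from (a2, b2).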

37 Simple Case: α = 1 (continued). [Figure: worked example on networks a1-a5 and b1-b5, expanding R_{a2,b2} as a weighted sum of R_{a1,b1}, R_{a3,b1}, R_{a1,b3}, and R_{a3,b3}, i.e., over the neighbor pairs of a2 and b2.]

38 Example: Computed R_ij Values. [Table: R_ij for rows a1-a5 and columns b1-b5.] An empty cell indicates R_ij = 0.

41 Capturing Non-Local Effects? R_pr = 8.12e-3 vs. R_pq = 8.64e-3: the algorithm can resolve between matching p with r and matching p with q.

42 Computing R: An Eigenvalue Problem. The equations for R describe an eigenvalue problem:

$$R = AR, \qquad A[ij][uv] = \begin{cases} \dfrac{1}{|N(u)|\,|N(v)|} & \text{if } u \in N(i),\ v \in N(j) \\ 0 & \text{otherwise} \end{cases}$$

R is the principal eigenvector of A, and size(A) = N1N2 × N1N2, where N1 = # nodes in Graph 1 and N2 = # nodes in Graph 2. A is about 10^8 × 10^8 when aligning the yeast and fly networks. However, both A and R are very sparse, so we use the power method to compute R efficiently. Extension to weighted edges is straightforward.
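A generic sketch of the power method on a sparse, non-negative matrix stored as a dict of non-zero entries per row; only non-zero entries are touched, which is what makes the method feasible for a matrix as large as A. The storage format, L1 normalization, and stopping rule are assumptions of this sketch.

```python
def power_method(A, n, tol=1e-10, max_iter=1000, start=None):
    """Principal eigenvector of a non-negative sparse matrix by power iteration.

    A: dict row -> list of (col, value) non-zero entries (absent rows are all zero).
    n: matrix dimension.
    Returns (vector, iterations); the vector is L1-normalised.
    """
    r = list(start) if start is not None else [1.0 / n] * n
    for it in range(1, max_iter + 1):
        nxt = [sum(val * r[col] for col, val in A.get(row, ())) for row in range(n)]
        norm = sum(abs(x) for x in nxt)
        if norm == 0:
            raise ValueError("A @ r vanished; cannot normalise")
        nxt = [x / norm for x in nxt]
        if sum(abs(a - b) for a, b in zip(nxt, r)) < tol:
            return nxt, it
        r = nxt
    return r, max_iter
```

For example, `power_method({0: [(0, 2.0), (1, 1.0)], 1: [(0, 1.0), (1, 2.0)]}, 2, start=[1.0, 0.0])` converges to approximately [0.5, 0.5], the eigenvector for the dominant eigenvalue 3.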

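43 A Random Walk Interpretation.

$$R_{ij} = \sum_{u \in N(i)} \sum_{v \in N(j)} \frac{1}{|N(u)|\,|N(v)|} R_{uv}$$

Tensor product: G1 × G2, in which node (i, j) is adjacent to (u, v) iff u ∈ N(i) in G1 and v ∈ N(j) in G2. Node (u, v) then has |N(u)||N(v)| neighbors, so 1/(|N(u)||N(v)|) is the probability that a random walk at (u, v) steps to any one of them. [Figure: G1 and G2 and their tensor product, with product nodes (r,s), (r,j), (r,v), (u,s), (u,j), (u,v), (i,s), (i,j), (i,v).]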
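The random-walk view can be checked directly. The sketch builds the tensor product graph from two adjacency dicts and performs one walk step, in which each node (u, v) spreads its mass evenly over its |N(u)||N(v)| neighbors. Graphs without isolated nodes are assumed, and the function names are hypothetical.

```python
from itertools import product


def tensor_product(adj1, adj2):
    """Tensor product G1 x G2: (i, j) ~ (u, v) iff i ~ u in G1 and j ~ v in G2."""
    return {(i, j): {(u, v) for u in adj1[i] for v in adj2[j]}
            for i, j in product(adj1, adj2)}


def random_walk_step(p, prod):
    """One step of a simple random walk on the product graph: each node spreads
    its probability mass evenly over its neighbours."""
    nxt = {node: 0.0 for node in prod}
    for node, mass in p.items():
        nbrs = prod[node]
        for nb in nbrs:
            nxt[nb] += mass / len(nbrs)
    return nxt
```

Starting from mass 1 on every node of the product of two 3-node paths, node (a2, b2) receives 4 and corner (a1, b1) receives 0.25, matching the values the formula for R_ij gives for the same input.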

44 General Case: 0 ≤ α ≤ 1. Let B_ij = sequence similarity score between i (from graph #1) and j (from graph #2), and normalize E_ij = B_ij / ||B||_1. Then

$$R = (1-\alpha) E + \alpha A R$$
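A minimal sketch of the general-case iteration R ← αAR + (1 − α)E, defaulting to α = 0.6 (the value reported for the yeast-fly alignment). It assumes symmetric adjacency dicts with no isolated nodes and a sparse dict B of non-negative sequence scores; it illustrates the update rule and is not the IsoRank implementation.

```python
def isorank(adj1, adj2, B, alpha=0.6, tol=1e-9, max_iter=100):
    """Iterate R <- alpha * A R + (1 - alpha) * E, with E = B / ||B||_1.

    adj1, adj2: dict node -> set of neighbours (symmetric, no isolated nodes).
    B: dict (i, j) -> non-negative sequence-similarity score (missing = 0).
    Returns a dict (i, j) -> R_ij.
    """
    total = sum(B.values())
    if total <= 0:
        raise ValueError("B must contain at least one positive score")
    pairs = [(i, j) for i in adj1 for j in adj2]
    E = {p: B.get(p, 0.0) / total for p in pairs}
    R = dict(E)  # start from the sequence prior
    for _ in range(max_iter):
        new = {}
        for i, j in pairs:
            n_ij = sum(R[(u, v)] / (len(adj1[u]) * len(adj2[v]))
                       for u in adj1[i] for v in adj2[j])
            new[(i, j)] = alpha * n_ij + (1 - alpha) * E[(i, j)]
        delta = sum(abs(new[p] - R[p]) for p in pairs)
        R = new
        if delta < tol:
            break
    return R
```

Because A is column-stochastic when no node is isolated and E sums to 1, every iterate also sums to 1, and successive iterates contract by a factor of α, so the loop converges.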

45 Complex Case: Multiple Networks. [Figure: networks #1, #2, and #3, linked by pairwise score matrices R(1,2), R(1,3), and R(2,3).]

46 Results: Yeast-Fly Global Alignment. Number of edges in the common subgraph: 1420, implying only about 5% overlap. Why so low? PPI data is currently noisy and low-coverage. Number of edges in the largest component: 35. Value of α used: 0.6, which provided the best overall agreement with previous gene-correspondence predictions.
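Getting from scores R to a count of edges in the common subgraph requires a one-to-one node mapping. The sketch uses a simple greedy extraction (highest-scoring unmatched pair first), which is an assumption here, and then counts graph-1 edges whose images are graph-2 edges. Both function names are hypothetical.

```python
def greedy_one_to_one(R):
    """Repeatedly take the highest-scoring pair (i, j) whose nodes are both unmatched.

    R: dict (i, j) -> score. Returns dict mapping graph-1 node -> graph-2 node.
    """
    used1, used2, mapping = set(), set(), {}
    for (i, j), score in sorted(R.items(), key=lambda kv: kv[1], reverse=True):
        if score > 0 and i not in used1 and j not in used2:
            mapping[i] = j
            used1.add(i)
            used2.add(j)
    return mapping


def conserved_edges(edges1, edges2, mapping):
    """Edges (i, u) of graph 1 whose images (mapping[i], mapping[u]) are edges of graph 2."""
    e2 = {frozenset(e) for e in edges2}
    return [(i, u) for i, u in edges1
            if i in mapping and u in mapping
            and frozenset((mapping[i], mapping[u])) in e2]
```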

47 Various Topologies Are Found Existing local alignment methods (PathBlast; Kelley et al.) often find only specific topologies

48 Role of α: why the dip?

49 Robustness to Error in PPI Data. [Figure: a network on nodes a1-a11 drawn twice, with different edge sets.]
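One way to probe robustness is to corrupt a network deliberately and re-run the alignment. This hypothetical helper deletes a fraction of edges at random and adds the same number of new random edges, so the edge count is preserved; the noise model is an assumption, not necessarily the one used in these experiments.

```python
import random


def perturb_edges(nodes, edges, fraction, seed=None):
    """Delete round(fraction * |E|) random edges and add as many new random edges.

    nodes: iterable of node ids; edges: list of (u, v) pairs of a simple graph.
    New edges never duplicate an original edge. Returns the noisy edge list.
    """
    rng = random.Random(seed)
    nodes = list(nodes)
    kept = [tuple(e) for e in edges]
    k = round(fraction * len(kept))
    for idx in sorted(rng.sample(range(len(kept)), k), reverse=True):
        kept.pop(idx)  # delete k random edges
    present = {frozenset(e) for e in edges}  # originals are never re-added
    max_edges = len(nodes) * (len(nodes) - 1) // 2
    if len(present) + k > max_edges:
        raise ValueError("graph too dense to add that many new edges")
    added = 0
    while added < k:
        u, v = rng.sample(nodes, 2)
        if frozenset((u, v)) not in present:
            present.add(frozenset((u, v)))
            kept.append((u, v))
            added += 1
    return kept
```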

50 Robustness to Error in PPI Data. [Plot annotation: "True curve somewhere around here."]

51 Functional Orthologs. Genes that perform similar functions. Functional orthologs vs. plain old orthologs: distinguish between orthologs and paralogs. Bandyopadhyay et al. [Genome Res. 06] use local network alignment results, then use a Markov random field (MRF) to partially resolve ambiguities. We compared our results with theirs.

52 Functional Orthologs: IsoRank Pairwise Alignment Predictions
Protein | Functional ortholog (IsoRank) | Bandyopadhyay et al.
Gid8 | CG6617 | CG… %, CG…
Gpa1 | Goα47a | Goα47a 41%, Giα65a ---
Kap104 | Trn | Trn 41%, CG… %
CG18617 | Vph1 | Vph1 43%, Stv1 48%
Egd1 | Bic | Bcd 47%

53 Results: Multiple Network Alignment. Size of networks: human (36387 PPIs), yeast (31899 PPIs), fly (25831 PPIs), worm (4573 PPIs), and mouse (255 PPIs). Number of edges in the common subgraph with IsoRank: 1663 PPIs aligned in at least 2 species; 157 PPIs aligned in at least 3 species. Comparison with NCBI's HomoloGene: 509 PPIs in at least 2 species; 40 PPIs in at least 3 species. INPARANOID: … PPIs in at least 2 species.
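A sketch of counting "PPIs aligned in at least k species". It assumes the alignment yields a protein-to-cluster assignment, and treats an interaction as supported by a species when some protein pair from the same two clusters interacts in that species; interactions within a single cluster are ignored. The input format and function name are hypothetical.

```python
from collections import defaultdict


def conserved_interactions(networks, cluster_of, min_species=2):
    """Number of cluster-level interactions supported by >= min_species species.

    networks: dict species -> list of (protein, protein) edges.
    cluster_of: dict protein -> ortholog-cluster id; unclustered proteins are skipped.
    """
    support = defaultdict(set)
    for species, edges in networks.items():
        for p, q in edges:
            if p in cluster_of and q in cluster_of and cluster_of[p] != cluster_of[q]:
                support[frozenset((cluster_of[p], cluster_of[q]))].add(species)
    return sum(len(s) >= min_species for s in support.values())
```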

54 Multiple Network Alignment: Functional Orthologs. Coverage of known genes, out of 86,932 proteins in five species: IsoRank, 59,539 have at least one mapping; INPARANOID, 55,000 (66% overlap with ours); HomoloGene, 33,434. [Chart: functional coherence of IsoRank, INPARANOID, and HomoloGene.]

55 Theoretical Considerations Limitation: K-regular graphs, i.e., what if all the nodes have the same degree? Convergence guarantee: number of iterations of the power method scales as log(1/α)

56 Biological Applications of IsoRank. Pairwise network alignment (Singh, Xu & Berger, RECOMB 07, SODA 08); multiple network alignment (Singh, Xu & Berger, PSB 08); multiple RNA secondary and tertiary structure alignment; multiple protein structure alignment.

57 Related Work on (Pairwise) Network Alignment. PathBlast (Kelley et al.): uses sequence similarity to shortlist possible pairs of matching nodes, then searches for conserved topologies such as pathways and hub-and-spokes. Koyutürk et al.: like PathBlast, but with a more sophisticated objective function that models gene deletion, etc. Graemlin (Batzoglou et al.): first uses sequence data to generate matching pairs of seed subgraphs, then heuristically grows the seed matches in search of a specific topology.

58 Open Issues How can traditional graph-theoretic algorithms be extended to handle noise and incomplete data in biology?

59 Acknowledgments Sequence Genomics Manolis Kellis, Eric Lander, Nick Patterson RNA Secondary Structure Alex Coventry, Dan Kleitman Protein Structure Jinbo Xu, Rohit Singh PPI Network Alignment Rohit Singh, Jinbo Xu Thanks also to: Michael Baym Gopal Ramachandran Leonid Chindelevitch Michael Schnall-Levin Chris Bakal, HMS & CSAIL Lenore Cowen, Tufts


More information

Basic Local Alignment Search Tool

Basic Local Alignment Search Tool Basic Local Alignment Search Tool Alignments used to uncover homologies between sequences combined with phylogenetic studies o can determine orthologous and paralogous relationships Local Alignment uses

More information

Protein Complex Identification by Supervised Graph Clustering

Protein Complex Identification by Supervised Graph Clustering Protein Complex Identification by Supervised Graph Clustering Yanjun Qi 1, Fernanda Balem 2, Christos Faloutsos 1, Judith Klein- Seetharaman 1,2, Ziv Bar-Joseph 1 1 School of Computer Science, Carnegie

More information

Bioinformatics. Dept. of Computational Biology & Bioinformatics

Bioinformatics. Dept. of Computational Biology & Bioinformatics Bioinformatics Dept. of Computational Biology & Bioinformatics 3 Bioinformatics - play with sequences & structures Dept. of Computational Biology & Bioinformatics 4 ORGANIZATION OF LIFE ROLE OF BIOINFORMATICS

More information

Lecture 5: November 19, Minimizing the maximum intracluster distance

Lecture 5: November 19, Minimizing the maximum intracluster distance Analysis of DNA Chips and Gene Networks Spring Semester, 2009 Lecture 5: November 19, 2009 Lecturer: Ron Shamir Scribe: Renana Meller 5.1 Minimizing the maximum intracluster distance 5.1.1 Introduction

More information

Bioinformatics Chapter 1. Introduction

Bioinformatics Chapter 1. Introduction Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!

More information

Bioinformatics 2. Yeast two hybrid. Proteomics. Proteomics

Bioinformatics 2. Yeast two hybrid. Proteomics. Proteomics GENOME Bioinformatics 2 Proteomics protein-gene PROTEOME protein-protein METABOLISM Slide from http://www.nd.edu/~networks/ Citrate Cycle Bio-chemical reactions What is it? Proteomics Reveal protein Protein

More information

A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS

A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS CRYSTAL L. KAHN and BENJAMIN J. RAPHAEL Box 1910, Brown University Department of Computer Science & Center for Computational Molecular Biology

More information

How much non-coding DNA do eukaryotes require?

How much non-coding DNA do eukaryotes require? How much non-coding DNA do eukaryotes require? Andrei Zinovyev UMR U900 Computational Systems Biology of Cancer Institute Curie/INSERM/Ecole de Mine Paritech Dr. Sebastian Ahnert Dr. Thomas Fink Bioinformatics

More information

Protein folding. α-helix. Lecture 21. An α-helix is a simple helix having on average 10 residues (3 turns of the helix)

Protein folding. α-helix. Lecture 21. An α-helix is a simple helix having on average 10 residues (3 turns of the helix) Computat onal Biology Lecture 21 Protein folding The goal is to determine the three-dimensional structure of a protein based on its amino acid sequence Assumption: amino acid sequence completely and uniquely

More information

An Introduction to Bioinformatics Algorithms Hidden Markov Models

An Introduction to Bioinformatics Algorithms   Hidden Markov Models Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

More information

98 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 6, 2006

98 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 6, 2006 98 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 6, 2006 8.3.1 Simple energy minimization Maximizing the number of base pairs as described above does not lead to good structure predictions.

More information

Iteration Method for Predicting Essential Proteins Based on Orthology and Protein-protein Interaction Networks

Iteration Method for Predicting Essential Proteins Based on Orthology and Protein-protein Interaction Networks Georgia State University ScholarWorks @ Georgia State University Computer Science Faculty Publications Department of Computer Science 2012 Iteration Method for Predicting Essential Proteins Based on Orthology

More information

Inferring Transcriptional Regulatory Networks from High-throughput Data

Inferring Transcriptional Regulatory Networks from High-throughput Data Inferring Transcriptional Regulatory Networks from High-throughput Data Lectures 9 Oct 26, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20

More information

Written Exam 15 December Course name: Introduction to Systems Biology Course no

Written Exam 15 December Course name: Introduction to Systems Biology Course no Technical University of Denmark Written Exam 15 December 2008 Course name: Introduction to Systems Biology Course no. 27041 Aids allowed: Open book exam Provide your answers and calculations on separate

More information

EBI web resources II: Ensembl and InterPro

EBI web resources II: Ensembl and InterPro EBI web resources II: Ensembl and InterPro Yanbin Yin http://www.ebi.ac.uk/training/online/course/ 1 Homework 3 Go to http://www.ebi.ac.uk/interpro/training.htmland finish the second online training course

More information

Chapter 3. Global Alignment of Protein Protein Interaction Networks. Misael Mongiovì and Roded Sharan. Abstract. 1. Introduction

Chapter 3. Global Alignment of Protein Protein Interaction Networks. Misael Mongiovì and Roded Sharan. Abstract. 1. Introduction Chapter 3 Global Alignment of Protein Protein Interaction Networks Misael Mongiovì and Roded Sharan Abstract Sequence-based comparisons have been the workhorse of bioinformatics for the past four decades,

More information

HMMs and biological sequence analysis

HMMs and biological sequence analysis HMMs and biological sequence analysis Hidden Markov Model A Markov chain is a sequence of random variables X 1, X 2, X 3,... That has the property that the value of the current state depends only on the

More information

Genome 559 Wi RNA Function, Search, Discovery

Genome 559 Wi RNA Function, Search, Discovery Genome 559 Wi 2009 RN Function, Search, Discovery The Message Cells make lots of RN noncoding RN Functionally important, functionally diverse Structurally complex New tools required alignment, discovery,

More information

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it?

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it? Proteomics What is it? Reveal protein interactions Protein profiling in a sample Yeast two hybrid screening High throughput 2D PAGE Automatic analysis of 2D Page Yeast two hybrid Use two mating strains

More information

Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University

Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University Department of Chemical Engineering Program of Applied and

More information

ECS 253 / MAE 253 April 26, Intro to Biological Networks, Motifs, and Model selection/validation

ECS 253 / MAE 253 April 26, Intro to Biological Networks, Motifs, and Model selection/validation ECS 253 / MAE 253 April 26, 2016 Intro to Biological Networks, Motifs, and Model selection/validation Announcement HW2, due May 3 (one week) HW2b, due May 5 HW2a, due May 5. Will be posted on Smartsite.

More information

A Geometric Interpretation of Gene Co-Expression Network Analysis. Steve Horvath, Jun Dong

A Geometric Interpretation of Gene Co-Expression Network Analysis. Steve Horvath, Jun Dong A Geometric Interpretation of Gene Co-Expression Network Analysis Steve Horvath, Jun Dong Outline Network and network concepts Approximately factorizable networks Gene Co-expression Network Eigengene Factorizability,

More information

Repeat resolution. This exposition is based on the following sources, which are all recommended reading:

Repeat resolution. This exposition is based on the following sources, which are all recommended reading: Repeat resolution This exposition is based on the following sources, which are all recommended reading: 1. Separation of nearly identical repeats in shotgun assemblies using defined nucleotide positions,

More information

V 5 Robustness and Modularity

V 5 Robustness and Modularity Bioinformatics 3 V 5 Robustness and Modularity Mon, Oct 29, 2012 Network Robustness Network = set of connections Failure events: loss of edges loss of nodes (together with their edges) loss of connectivity

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

More information

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison CMPS 6630: Introduction to Computational Biology and Bioinformatics Structure Comparison Protein Structure Comparison Motivation Understand sequence and structure variability Understand Domain architecture

More information

Computational Biology From The Perspective Of A Physical Scientist

Computational Biology From The Perspective Of A Physical Scientist Computational Biology From The Perspective Of A Physical Scientist Dr. Arthur Dong PP1@TUM 26 November 2013 Bioinformatics Education Curriculum Math, Physics, Computer Science (Statistics and Programming)

More information

Computational Network Biology Biostatistics & Medical Informatics 826 Fall 2018

Computational Network Biology Biostatistics & Medical Informatics 826 Fall 2018 Computational Network Biology Biostatistics & Medical Informatics 826 Fall 2018 Sushmita Roy sroy@biostat.wisc.edu https://compnetbiocourse.discovery.wisc.edu Sep 6 th 2018 Goals for today Administrivia

More information

RNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology"

RNA Search and! Motif Discovery Genome 541! Intro to Computational! Molecular Biology RNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology" Day 1" Many biologically interesting roles for RNA" RNA secondary structure prediction" 3 4 Approaches to Structure

More information

Structure-Based Comparison of Biomolecules

Structure-Based Comparison of Biomolecules Structure-Based Comparison of Biomolecules Benedikt Christoph Wolters Seminar Bioinformatics Algorithms RWTH AACHEN 07/17/2015 Outline 1 Introduction and Motivation Protein Structure Hierarchy Protein

More information

Using Ensembles of Hidden Markov Models for Grand Challenges in Bioinformatics

Using Ensembles of Hidden Markov Models for Grand Challenges in Bioinformatics Using Ensembles of Hidden Markov Models for Grand Challenges in Bioinformatics Tandy Warnow Founder Professor of Engineering The University of Illinois at Urbana-Champaign http://tandy.cs.illinois.edu

More information

A graph kernel approach to the identification and characterisation of structured non-coding RNAs using multiple sequence alignment information

A graph kernel approach to the identification and characterisation of structured non-coding RNAs using multiple sequence alignment information graph kernel approach to the identification and characterisation of structured noncoding RNs using multiple sequence alignment information Mariam lshaikh lbert Ludwigs niversity Freiburg, Department of

More information

Motif Prediction in Amino Acid Interaction Networks

Motif Prediction in Amino Acid Interaction Networks Motif Prediction in Amino Acid Interaction Networks Omar GACI and Stefan BALEV Abstract In this paper we represent a protein as a graph where the vertices are amino acids and the edges are interactions

More information

Computational Genomics and Molecular Biology, Fall

Computational Genomics and Molecular Biology, Fall Computational Genomics and Molecular Biology, Fall 2014 1 HMM Lecture Notes Dannie Durand and Rose Hoberman November 6th Introduction In the last few lectures, we have focused on three problems related

More information

Sequence analysis and comparison

Sequence analysis and comparison The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species

More information

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven)

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven) BMI/CS 776 Lecture #20 Alignment of whole genomes Colin Dewey (with slides adapted from those by Mark Craven) 2007.03.29 1 Multiple whole genome alignment Input set of whole genome sequences genomes diverged

More information

Overview Multiple Sequence Alignment

Overview Multiple Sequence Alignment Overview Multiple Sequence Alignment Inge Jonassen Bioinformatics group Dept. of Informatics, UoB Inge.Jonassen@ii.uib.no Definition/examples Use of alignments The alignment problem scoring alignments

More information

Department of Computing, Imperial College London. Introduction to Bioinformatics: Biological Networks. Spring 2010

Department of Computing, Imperial College London. Introduction to Bioinformatics: Biological Networks. Spring 2010 Department of Computing, Imperial College London Introduction to Bioinformatics: Biological Networks Spring 2010 Lecturer: Nataša Pržulj Office: 407A Huxley E-mail: natasha@imperial.ac.uk Lectures: Time

More information

Tools and Algorithms in Bioinformatics

Tools and Algorithms in Bioinformatics Tools and Algorithms in Bioinformatics GCBA815, Fall 2015 Week-4 BLAST Algorithm Continued Multiple Sequence Alignment Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and

More information

Inferring Transcriptional Regulatory Networks from Gene Expression Data II

Inferring Transcriptional Regulatory Networks from Gene Expression Data II Inferring Transcriptional Regulatory Networks from Gene Expression Data II Lectures 9 Oct 26, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday

More information