Comparative Genomics: Sequence, Structure, and Networks. Bonnie Berger MIT
Transcription
1 Comparative Genomics: Sequence, Structure, and Networks Bonnie Berger MIT
2 Comparative Genomics Look at the same kind of data across species with the hope that areas of high correlation correspond to functional parts or modules of the genome.
3 Biology in One Slide Protein Function
4 Comparative Genomics of DNA Protein Function
5 Multiple Species Comparison Look at multiple species simultaneously
6 Application to Regulatory Motif Discovery. Species compared: S. cer, S. par, S. mik, S. bay. Evaluate conservation of Gal4 vs. controls within: (1) all intergenic regions: 22% vs. 5%; (2) intergenic : coding: 4:1 vs. 1:3; (3) upstream : downstream: 12:1 vs. 1:1. A signature for regulatory motifs.
7 Result Highlights [Kellis, Patterson, Birren, Berger*, Lander* (2004). RECOMB; J. Comp. Biol. special issue, 11:2-3; Kellis et al. Nature (2003)]. Identify gene correspondence across species for > 90% of genes in yeast. 99.9% sensitivity and 99% specificity on 4000 known genes. Refine boundaries of hundreds of genes (5700 genes total). Identify most previously known and 41 novel regulatory motifs. Genome-wide, unbiased search; no previous knowledge necessary.
8 Comparative Genomics of RNA Protein Function
9 RNA Secondary Structure Detection. Problem: identify biologically significant RNA secondary structure. Challenge: any given single sequence will have a plausible secondary structure. [Figure: base pairs A-U, G-C, G-U; structural elements: hairpin loops, stems, bulge loop, interior loops, multi-branched loop.]
10 Compensatory Mutations. Given K orthologous aligned RNA sequences: if the i-th and j-th positions are base-paired in many organisms, then their nucleotides must covary.
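The covariation idea on this slide can be illustrated with a small sketch. It uses mutual information between two alignment columns, which is one common covariation measure; this is an assumption for illustration and not necessarily the statistic used by MSARi. The function name and the toy alignment are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    `alignment` is a list of equal-length aligned sequences (one per species).
    High values mean the nucleotides at i and j vary together.
    """
    k = len(alignment)
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))

    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        p_ab = n_ab / k
        p_a = count_i[a] / k
        p_b = count_j[b] / k
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: columns 0 and 4 change together
# (G-C, C-G, A-U, U-A), while column 1 is fully conserved.
toy = ["GAAAC", "CAAAG", "AAAAU", "UAAAA"]
paired = mutual_information(toy, 0, 4)
conserved = mutual_information(toy, 0, 1)
```

A conserved column carries no covariation signal even if it is base-paired, which is one reason covariation-based detection benefits from comparing many species.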
12 Approaches to Secondary Structure Detection. Statistical: stochastic context-free grammars for 2-species comparison (QRNA); machine learning (RNAGenie); our approach: statistical significance across multiple species (MSARi). Homology: train on a particular RNA secondary structure and try to predict that structure.
13 Result Highlights [Coventry, Kleitman, Berger (2004), PNAS]. Identifies RNA secondary structure with 90% sensitivity at 98% specificity. No previous knowledge necessary. Used to identify functionally significant RNA secondary structure in mRNA. Can be used to scan multiple genomes for RNA secondary structure. Benchmarks: QRNA, 19.8% sensitivity at 98% specificity; ddbRNA, 68% sensitivity at 97.7% specificity.
14 Comparative Genomics of Proteins Protein Function
15 Protein Structure: The Protein Folding Problem. Given an amino acid sequence, e.g., MDPNCSCAAAGDSCTCANSCTCLACKCTSCK, how will it fold in 3D? Proteins must fold to function. Some diseases are caused by misfolding, e.g., mad cow disease.
16 Protein Folding by Comparative Modeling. Similar protein sequences → similar structures. Use known structures to predict a new one. About 40,000 protein structures have been solved using experimental techniques and stored in the Protein Data Bank (PDB); ~1000 are unique structural folds. [Figure panels: same structural folds; different structural folds.]
17 Protein Threading. Query sequence: DRVYIHPFADRVYIHPFA. [Figure: the best match.] Threading = match between a string and a 3D object.
18 Result Highlights. RAPTOR: threading as linear programming (Jinbo Xu). [Slide shows the optimization problem: minimize an energy E over variables x, y ∈ {0,1} subject to linear constraints; the formula itself is garbled beyond recovery in the transcript. Figure labels: structural template; input sequence.] RAPTOR was the best performing algorithm at CAFASP, a worldwide competition.
19 Threading Protein Complexes. [Figure: chains A and B with sequences RGPPQLIK and EGAATQY, threaded by DBLRAP.] DBLRAP is our extension of RAPTOR for joint homology modeling of two structures (PSB 06). It extends the LP formulation to score interfaces between two structures as well. DBLRAP was able to predict interactions for 8% of proteins in the yeast genome (cf. 5% previously).
20 Structure Alignment
21 Protein Structure Alignment Problem: find the optimal alignment between two protein structures
22 Contact Map Alignment. Goal: find the maximum common subgraph. Contact distance D_u (5 Å to 7.5 Å).
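A minimal sketch of the objects involved, assuming one 3D coordinate per residue and a single distance cutoff (7.5 Å here, taken from the range on the slide). The function names and the toy coordinates are hypothetical. The sketch only scores a given residue mapping by counting shared contacts; it does not search for the best mapping, which is the hard part, and it does not check that the mapping preserves sequence order.

```python
from math import dist


def contact_map(coords, threshold=7.5):
    """Return the set of residue pairs (i, j), i < j, closer than `threshold` Å."""
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if dist(coords[i], coords[j]) <= threshold:
                contacts.add((i, j))
    return contacts


def shared_contacts(contacts_a, contacts_b, mapping):
    """Count contacts of structure A that map onto contacts of structure B.

    `mapping` sends residue indices of A to residue indices of B and is
    assumed to be one-to-one.
    """
    count = 0
    for i, j in contacts_a:
        if i in mapping and j in mapping:
            u, v = sorted((mapping[i], mapping[j]))
            if (u, v) in contacts_b:
                count += 1
    return count


# Hypothetical toy structures: residues spaced 3.8 Å apart along a line.
structure_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 0.0), (0.0, 3.8, 0.0), (0.0, 7.6, 0.0)]
map_a = contact_map(structure_a)
map_b = contact_map(structure_b)
score = shared_contacts(map_a, map_b, {0: 0, 1: 1, 2: 2})
```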
23 State of the Art: Contact Map Alignment. History: more than 20 years, with many programs based on heuristic algorithms. NP-hard and hard to approximate when measured by maximum common subgraph (Goldman, Papadimitriou & Istrail, FOCS 99). Lagrangian relaxation (Caprara & Lancia, Recomb 2002). Integer programming (Caprara et al., JCB 2004).
24 Tree-Decomposition for Protein Structure Alignment. Method: tree-decomposition of one protein structure into small pieces to exploit the geometric characteristics of a structure. Results: there is a polynomial-time approximation scheme (PTAS) that finds an alignment scoring at least (1-1/k) of the best. [Time-complexity expression garbled in the transcript; the surviving terms are poly(N), Δ, k, the tree width tw, ε, and the distance parameters D, D_c, and D_l.] The parameters D, D_c, and D_l are small constants, and so is D/D_l. Therefore, this problem admits a PTAS, the best that we can achieve since this problem is NP-hard.
25 Biological Applications of Tree Decomposition. Sidechain packing (Xu & Berger, Recomb 05, JACM 06); protein threading (Xu, Jiao, & Berger, CSB 05); network motif search (Dost et al., Recomb 07); RNA secondary structure alignment (Song et al., CSB 05); de novo sequencing (Liu et al., PSB 06); protein structure alignment (Xu, Jiao, & Berger, Recomb 06, JCB 06; Xu, CDC 07).
26 Comparative Genomics of Networks Protein Function
27 Why understanding function-level differences is important Increased complexity (function) is not explained simply by variations in gene (or protein) count Estimated Number of Genes Estimated Number of Proteins Numbers from
28 Protein-Protein Interactions (PPIs). Often, proteins interact with other proteins to perform their functions. Many cellular activities are a result of protein interactions. [Figure: MAPK signaling cascade. Image from: /mapkmap2.html]
29 Modeling PPIs. Traditional perspective: low-throughput, structural. New perspective: high-throughput, network-based. [Figure: G-protein complex (Gα, Gβ, Gγ, GDP) as the traditional perspective, next to the new systems-level perspective. Image from]
30 Protein-Protein Interaction (PPI) Network. X + Y = ? Yeast 2-hybrid method. [Figure: yeast PPI network; Cusick et al. Hum Med Gen, 05.]
31 Motivation behind Network Comparison. Compare PPI networks at the species level. Transfer annotation from one species to another: more feasible, cheaper, and easier than in humans. Error detection. Compute functional orthologs. Functional orthologs: proteins which perform the same function across species.
32 The Problem. Given two protein-protein interaction networks, find, for a piece of one network, something with a comparable structure in the other network. Our approach: match neighborhood topologies.
33 Algorithm: IsoRank. Inputs: two networks (nodes a1 to a8 and b1 to b9) and sequence similarity scores (a5-b7: 2.1; a5-b9: 1.5; a3-b2: 3.4). Output: functional similarity for each possible node pairing (a5-b7: 1e-2; a5-b1: 2e-8; a5-b3: 1e-7; a5-b9: 1e-4; a3-b1: 5e-4; a3-b6: 3e-9).
34 Functional Similarity Score: Intuition. Compute pairwise scores R_ij, e.g., R_{a5,b1} = ? [Figure: network a1 to a5 and network b1 to b5.] Goal: high R_ij ⇔ i and j are a good match. Intuition: i and j are a good match if their sequences align and their neighbors are a good match.
35 Computing R_ij. Combine both sequence and network data: $R_{ij} = (1-\alpha) E_{ij} + \alpha N_{ij}$, where R_ij is the functional similarity, E_ij the sequence similarity, and N_ij the network similarity.
36 Simple Case: α=1 (no E_ij), so R_ij = N_ij. R_ij depends on the neighborhoods of i and j: $R_{ij} = N_{ij} = \sum_{u \in N(i)} \sum_{v \in N(j)} \frac{1}{|N(u)|\,|N(v)|} R_{uv}$, where N(a) is the set of neighbors of a. [Worked example on the figure (networks a1 to a5 and b1 to b5) garbled in the transcript.]
37 Simple Case: α=1 (no E_ij), continued. [Second worked example on the same figure, expanding R_{a2,b2} as a weighted sum of R_{a1,b1}, R_{a3,b1}, R_{a1,b3}, and R_{a3,b3}; the coefficients are garbled in the transcript.]
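The neighborhood sum for the α=1 case can be sketched as follows. The adjacency-list representation, the function name, and the toy path graphs are assumptions for illustration; they are not the networks drawn on the slide.

```python
def neighborhood_score(i, j, adj1, adj2, R):
    """N_ij = sum over u in N(i), v in N(j) of R[u, v] / (|N(u)| * |N(v)|).

    adj1, adj2: dict mapping each node to a list of its neighbors.
    R: dict mapping (node in graph 1, node in graph 2) to the current score;
       missing pairs count as 0.
    """
    total = 0.0
    for u in adj1[i]:
        for v in adj2[j]:
            total += R.get((u, v), 0.0) / (len(adj1[u]) * len(adj2[v]))
    return total


# Hypothetical toy graphs: two 3-node paths a1-a2-a3 and b1-b2-b3.
adj1 = {"a1": ["a2"], "a2": ["a1", "a3"], "a3": ["a2"]}
adj2 = {"b1": ["b2"], "b2": ["b1", "b3"], "b3": ["b2"]}
R = {(i, j): 1.0 for i in adj1 for j in adj2}

# The middle pair collects full support from four end-node pairs (degree 1 each).
middle = neighborhood_score("a2", "b2", adj1, adj2, R)
# An end pair gets a quarter of the middle pair's score (degrees 2 and 2).
end = neighborhood_score("a1", "b1", adj1, adj2, R)
```

Each pair (u, v) divides its score evenly among the |N(u)|·|N(v)| neighboring pairs it supports, so a pair of well-matched hubs does not dominate simply because it has many neighbors.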
38 Example: Computed R_ij values. [Table of R_ij for rows a1 to a5 and columns b1 to b5; the numeric entries did not survive in the transcript.] Empty cell indicates R_ij = 0.
41 Capturing non-local effects? R_pr = 8.12e-3, R_pq = 8.64e-3. The algorithm can resolve between p-r vs. p-q.
42 Computing R: an eigenvalue problem. The equations for R describe an eigenvalue problem: $R = AR$, where $A[i,j][u,v] = \frac{1}{|N(u)|\,|N(v)|}$ if u is a neighbor of i and v is a neighbor of j, and 0 otherwise. R is the principal eigenvector of A. size(A) = N1·N2 × N1·N2, where N1 = # nodes in Graph 1 and N2 = # nodes in Graph 2. A is about 10^8 × 10^8 when aligning yeast and fly networks. However, both A and R are very sparse. We use the power method to efficiently compute R. Extension to weighted edges is straightforward.
43 A Random Walk Interpretation. $R_{ij} = \sum_{u \in N(i)} \sum_{v \in N(j)} \frac{1}{|N(u)|\,|N(v)|} R_{uv}$. Tensor product G1 × G2. [Figure: nodes i, u, r in G1 and j, v, s in G2; product-graph nodes (r,s), (r,j), (r,v), (u,s), (u,j), (u,v), (i,s), (i,j), (i,v); edge weights 1/(|N(u)||N(v)|) and 1/(|N(i)||N(j)|).]
44 General Case: 0 ≤ α ≤ 1. Let B_ij = sequence similarity score between i (from graph #1) and j (from graph #2), and E_ij = B_ij / |B|_1. Then $R = \alpha A R + (1-\alpha) E$.
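The general-case recurrence can be sketched with a plain power iteration. This is a dense toy version under stated assumptions: every pair of nodes is scored (the slides stress that A and R are sparse in practice), R is renormalized to sum to 1 after each step, and the function name, defaults, and toy graphs are hypothetical. The default α = 0.6 is the value the talk reports using for the yeast-fly alignment.

```python
def isorank_scores(adj1, adj2, B, alpha=0.6, max_iters=200, tol=1e-10):
    """Iterate R <- alpha * A R + (1 - alpha) * E until the scores stabilize.

    adj1, adj2: dict mapping each node to a list of its neighbors.
    B: dict mapping (node1, node2) to a non-negative sequence similarity score.
    """
    pairs = [(i, j) for i in adj1 for j in adj2]
    total_b = sum(B.values())
    if total_b > 0:
        E = {p: B.get(p, 0.0) / total_b for p in pairs}  # E = B / |B|_1
    else:
        E = {p: 1.0 / len(pairs) for p in pairs}  # no sequence data: uniform

    R = {p: 1.0 / len(pairs) for p in pairs}
    for _ in range(max_iters):
        new_R = {}
        for i, j in pairs:
            n_ij = sum(
                R[(u, v)] / (len(adj1[u]) * len(adj2[v]))
                for u in adj1[i]
                for v in adj2[j]
            )
            new_R[(i, j)] = alpha * n_ij + (1 - alpha) * E[(i, j)]
        norm = sum(new_R.values())
        new_R = {p: x / norm for p, x in new_R.items()}
        change = max(abs(new_R[p] - R[p]) for p in pairs)
        R = new_R
        if change < tol:
            break
    return R


# Hypothetical toy input: two 3-node paths, with sequence evidence only for a2-b2.
adj1 = {"a1": ["a2"], "a2": ["a1", "a3"], "a3": ["a2"]}
adj2 = {"b1": ["b2"], "b2": ["b1", "b3"], "b3": ["b2"]}
scores = isorank_scores(adj1, adj2, {("a2", "b2"): 1.0})
best_pair = max(scores, key=scores.get)
```

In this toy case the sequence evidence for a2-b2 propagates to the end-node pairs through the network term, while pairs that match a middle node to an end node decay toward zero.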
45 Complex Case: Multiple Networks. [Figure: networks #1, #2, #3 with a pairwise score matrix R between each pair (#1-#2, #1-#3, #2-#3).]
46 Results: Yeast-Fly Global Alignment. # of edges in the common subgraph: 1420. Implies about 5% overlap! Why so low? PPI data currently is noisy and low-coverage. # of edges in the largest component: 35. The value of α used: 0.6, which provided the best overall agreement with previous gene correspondence predictions.
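The "edges in the common subgraph" figure can be made concrete with a short sketch: given a node mapping between two networks, count the edges of the first network whose endpoints map onto an edge of the second. The function name and toy data are hypothetical, and how the mapping itself is extracted from the R scores is not covered here.

```python
def conserved_edges(edges1, edges2, mapping):
    """Return the edges (u, v) of network 1 whose images form an edge in network 2.

    edges1, edges2: iterables of undirected edges given as 2-tuples of node names.
    mapping: dict sending nodes of network 1 to nodes of network 2.
    """
    edge_set2 = {frozenset(edge) for edge in edges2}  # ignore edge orientation
    conserved = []
    for u, v in edges1:
        if u in mapping and v in mapping:
            if frozenset((mapping[u], mapping[v])) in edge_set2:
                conserved.append((u, v))
    return conserved


# Hypothetical toy networks and alignment.
edges_a = [("a1", "a2"), ("a2", "a3"), ("a3", "a4")]
edges_b = [("b1", "b2"), ("b3", "b2")]
alignment = {"a1": "b1", "a2": "b2", "a3": "b3", "a4": "b4"}
common = conserved_edges(edges_a, edges_b, alignment)
```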
47 Various Topologies Are Found Existing local alignment methods (PathBlast; Kelley et al.) often find only specific topologies
48 Role of α: why the dip?
49 Robustness to Error in PPI data. [Figure: two copies of a network on nodes a1 to a11.]
50 Robustness to Error in PPI data True curve somewhere around here
51 Functional Orthologs. Genes that perform similar functions. Functional orthologs vs. plain old orthologs: distinguish between orthologs and paralogs. Bandyopadhyay et al. [Genome Res. 06]: use local network alignment results, then use an MRF to partially resolve ambiguities. We compared our results with theirs.
52 Functional Orthologs: IsoRank Pairwise Alignment Predictions Protein Functional Ortholog IsoRank Bandyopadhyay et al. Gid8 CG6617 CG % CG Gpa1 Goα47a Goα47a 41% Giα65a --- Kap104 Trn Trn 41% CG % CG18617 Vph1 Vph1 43% Stv1 48% Egd1 Bic Bcd 47%
53 Results: Multiple Network Alignment. Size of networks: human (36387 PPIs), yeast (31899 PPIs), fly (25831 PPIs), worm (4573 PPIs), and mouse (255 PPIs). # of edges in the common subgraph with IsoRank: 1663 PPIs aligned in at least 2 species; 157 PPIs aligned in at least 3 species. Comparison with NCBI's HomoloGene: 509 PPIs in at least 2 species; 40 PPIs in at least 3 species. INPARANOID ( PPIs in at least 2 species
54 Multiple Network Alignment: Functional Orthologs. Coverage of known genes, out of 86,932 proteins in five species: IsoRank: 59,539 have at least one mapping; INPARANOID: 55,000 (66% overlap with ours); HomoloGene: 33,434. Functional coherence: [Figure comparing IsoRank, INPARANOID, and HomoloGene.]
55 Theoretical Considerations Limitation: K-regular graphs, i.e., what if all the nodes have the same degree? Convergence guarantee: number of iterations of the power method scales as log(1/α)
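The regular-graph limitation can be seen in a small sketch, assuming the α=1 neighborhood update and two cycle graphs (every node has degree 2). The helper names are hypothetical. Starting from uniform scores, one update returns the same uniform scores, so topology alone gives no reason to prefer any pairing.

```python
def cycle(prefix, n):
    """Adjacency lists of an n-node cycle (n >= 3): every node has degree 2."""
    return {
        f"{prefix}{k}": [f"{prefix}{(k - 1) % n}", f"{prefix}{(k + 1) % n}"]
        for k in range(n)
    }


def neighborhood_update(adj1, adj2, R):
    """One step of R <- A R (the alpha = 1 case, no sequence term)."""
    return {
        (i, j): sum(
            R[(u, v)] / (len(adj1[u]) * len(adj2[v]))
            for u in adj1[i]
            for v in adj2[j]
        )
        for i in adj1
        for j in adj2
    }


g1, g2 = cycle("a", 5), cycle("b", 7)
uniform = {(i, j): 1.0 / (len(g1) * len(g2)) for i in g1 for j in g2}
updated = neighborhood_update(g1, g2, uniform)
max_change = max(abs(updated[p] - uniform[p]) for p in uniform)
```

With α < 1 the sequence term E breaks this symmetry, which is one reason to combine both data sources.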
56 Biological Applications of IsoRank. Pairwise network alignment (Singh, Xu & Berger, Recomb 07, SODA 08); multiple network alignment (Singh, Xu & Berger, PSB 08); multiple RNA secondary and tertiary structure alignment; multiple protein structure alignment.
57 Related Work on (Pairwise) Network Alignment. PathBlast (Kelley et al.): use sequence similarity to shortlist possible pairs of matching nodes; search for conserved topologies like pathways and hub-and-spokes. Koyuturk et al.: like PathBlast, but with a more sophisticated objective function that models gene deletion etc. Graemlin (Batzoglou et al.): first uses sequence data to generate matching pairs of seed subgraphs and then heuristically grows the seed matches in search of a specific topology.
58 Open Issues How can traditional graph-theoretic algorithms be extended to handle noise and incomplete data in biology?
59 Acknowledgments. Sequence genomics: Manolis Kellis, Eric Lander, Nick Patterson. RNA secondary structure: Alex Coventry, Dan Kleitman. Protein structure: Jinbo Xu, Rohit Singh. PPI network alignment: Rohit Singh, Jinbo Xu. Thanks also to: Michael Baym; Gopal Ramachandran; Leonid Chindelevitch; Michael Schnall-Levin; Chris Bakal, HMS & CSAIL; Lenore Cowen, Tufts.
Network Alignment 858L
Network Alignment 858L Terms & Questions A homologous h Interolog = B h Species 1 Species 2 Are there conserved pathways? What is the minimum set of pathways required for life? Can we compare networks
More informationNetwork alignment and querying
Network biology minicourse (part 4) Algorithmic challenges in genomics Network alignment and querying Roded Sharan School of Computer Science, Tel Aviv University Multiple Species PPI Data Rapid growth
More information10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison
10-810: Advanced Algorithms and Models for Computational Biology microrna and Whole Genome Comparison Central Dogma: 90s Transcription factors DNA transcription mrna translation Proteins Central Dogma:
More informationComparative Network Analysis
Comparative Network Analysis BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2016 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by
More informationPhylogenetic Analysis of Molecular Interaction Networks 1
Phylogenetic Analysis of Molecular Interaction Networks 1 Mehmet Koyutürk Case Western Reserve University Electrical Engineering & Computer Science 1 Joint work with Sinan Erten, Xin Li, Gurkan Bebek,
More informationProtein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche
Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche The molecular structure of a protein can be broken down hierarchically. The primary structure of a protein is simply its
More informationComputational methods for predicting protein-protein interactions
Computational methods for predicting protein-protein interactions Tomi Peltola T-61.6070 Special course in bioinformatics I 3.4.2008 Outline Biological background Protein-protein interactions Computational
More informationPredicting Protein Functions and Domain Interactions from Protein Interactions
Predicting Protein Functions and Domain Interactions from Protein Interactions Fengzhu Sun, PhD Center for Computational and Experimental Genomics University of Southern California Outline High-throughput
More informationBIOINFORMATICS. Improved Network-based Identification of Protein Orthologs. Nir Yosef a,, Roded Sharan a and William Stafford Noble b
BIOINFORMATICS Vol. no. 28 Pages 7 Improved Network-based Identification of Protein Orthologs Nir Yosef a,, Roded Sharan a and William Stafford Noble b a School of Computer Science, Tel-Aviv University,
More informationBioinformatics: Network Analysis
Bioinformatics: Network Analysis Comparative Network Analysis COMP 572 (BIOS 572 / BIOE 564) - Fall 2013 Luay Nakhleh, Rice University 1 Biomolecular Network Components 2 Accumulation of Network Components
More informationComputational Genomics. Systems biology. Putting it together: Data integration using graphical models
02-710 Computational Genomics Systems biology Putting it together: Data integration using graphical models High throughput data So far in this class we discussed several different types of high throughput
More informationGraph Alignment and Biological Networks
Graph Alignment and Biological Networks Johannes Berg http://www.uni-koeln.de/ berg Institute for Theoretical Physics University of Cologne Germany p.1/12 Networks in molecular biology New large-scale
More informationAlgorithms in Bioinformatics
Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri RNA Structure Prediction Secondary
More informationImproved network-based identification of protein orthologs
BIOINFORMATICS Vol. 24 ECCB 28, pages i2 i26 doi:.93/bioinformatics/btn277 Improved network-based identification of protein orthologs Nir Yosef,, Roded Sharan and William Stafford Noble 2,3 School of Computer
More informationIntroduction to Bioinformatics
CSCI8980: Applied Machine Learning in Computational Biology Introduction to Bioinformatics Rui Kuang Department of Computer Science and Engineering University of Minnesota kuang@cs.umn.edu History of Bioinformatics
More informationCISC 636 Computational Biology & Bioinformatics (Fall 2016)
CISC 636 Computational Biology & Bioinformatics (Fall 2016) Predicting Protein-Protein Interactions CISC636, F16, Lec22, Liao 1 Background Proteins do not function as isolated entities. Protein-Protein
More informationGrundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson
Grundlagen der Bioinformatik, SS 10, D. Huson, April 12, 2010 1 1 Introduction Grundlagen der Bioinformatik Summer semester 2010 Lecturer: Prof. Daniel Huson Office hours: Thursdays 17-18h (Sand 14, C310a)
More informationHomology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB
Homology Modeling (Comparative Structure Modeling) Aims of Structural Genomics High-throughput 3D structure determination and analysis To determine or predict the 3D structures of all the proteins encoded
More informationProtein-protein Interaction: Network Alignment
Protein-protein Interaction: Network Alignment Lecturer: Roded Sharan Scribers: Amiram Wingarten and Stas Levin Lecture 7, May 6, 2009 1 Introduction In the last few years the amount of available data
More informationMarkov Models & DNA Sequence Evolution
7.91 / 7.36 / BE.490 Lecture #5 Mar. 9, 2004 Markov Models & DNA Sequence Evolution Chris Burge Review of Markov & HMM Models for DNA Markov Models for splice sites Hidden Markov Models - looking under
More informationComputational Systems Biology
Computational Systems Biology Vasant Honavar Artificial Intelligence Research Laboratory Bioinformatics and Computational Biology Graduate Program Center for Computational Intelligence, Learning, & Discovery
More informationSpectral Alignment of Networks Soheil Feizi, Gerald Quon, Muriel Medard, Manolis Kellis, and Ali Jadbabaie
Computer Science and Artificial Intelligence Laboratory Technical Report MIT-CSAIL-TR-205-005 February 8, 205 Spectral Alignment of Networks Soheil Feizi, Gerald Quon, Muriel Medard, Manolis Kellis, and
More informationUnderstanding Science Through the Lens of Computation. Richard M. Karp Nov. 3, 2007
Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007 The Computational Lens Exposes the computational nature of natural processes and provides a language for their description.
More informationBiological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor
Biological Networks:,, and via Relative Description Length By: Tamir Tuller & Benny Chor Presented by: Noga Grebla Content of the presentation Presenting the goals of the research Reviewing basic terms
More information6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008
MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
More informationCOMP 598 Advanced Computational Biology Methods & Research. Introduction. Jérôme Waldispühl School of Computer Science McGill University
COMP 598 Advanced Computational Biology Methods & Research Introduction Jérôme Waldispühl School of Computer Science McGill University General informations (1) Office hours: by appointment Office: TR3018
More informationCSCE555 Bioinformatics. Protein Function Annotation
CSCE555 Bioinformatics Protein Function Annotation Why we need to do function annotation? Fig from: Network-based prediction of protein function. Molecular Systems Biology 3:88. 2007 What s function? The
More informationBLAST. Varieties of BLAST
BLAST Basic Local Alignment Search Tool (1990) Altschul, Gish, Miller, Myers, & Lipman Uses short-cuts or heuristics to improve search speed Like speed-reading, does not examine every nucleotide of database
More informationUniversità della Calabria
Università della Calabria Facoltà di Ingegneria BIOINFORMATICS TECHNIQUES AND METHODOLOGIES Research group coordinated by Prof. Luigi Palopoli Lecturer: Simona Rombo OUTLINE 1. Introduction to Bioinformatics
More informationHomology Modeling. Roberto Lins EPFL - summer semester 2005
Homology Modeling Roberto Lins EPFL - summer semester 2005 Disclaimer: course material is mainly taken from: P.E. Bourne & H Weissig, Structural Bioinformatics; C.A. Orengo, D.T. Jones & J.M. Thornton,
More informationJeremy Chang Identifying protein protein interactions with statistical coupling analysis
Jeremy Chang Identifying protein protein interactions with statistical coupling analysis Abstract: We used an algorithm known as statistical coupling analysis (SCA) 1 to create a set of features for building
More informationExample of Function Prediction
Find similar genes Example of Function Prediction Suggesting functions of newly identified genes It was known that mutations of NF1 are associated with inherited disease neurofibromatosis 1; but little
More informationClustering of Pathogenic Genes in Human Co-regulatory Network. Michael Colavita Mentor: Soheil Feizi Fifth Annual MIT PRIMES Conference May 17, 2015
Clustering of Pathogenic Genes in Human Co-regulatory Network Michael Colavita Mentor: Soheil Feizi Fifth Annual MIT PRIMES Conference May 17, 2015 Topics Background Genetic Background Regulatory Networks
More informationComputational Biology: Basics & Interesting Problems
Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information
More informationIntroduction to Comparative Protein Modeling. Chapter 4 Part I
Introduction to Comparative Protein Modeling Chapter 4 Part I 1 Information on Proteins Each modeling study depends on the quality of the known experimental data. Basis of the model Search in the literature
More informationAlgorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment
Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot
More informationCAP 5510 Lecture 3 Protein Structures
CAP 5510 Lecture 3 Protein Structures Su-Shing Chen Bioinformatics CISE 8/19/2005 Su-Shing Chen, CISE 1 Protein Conformation 8/19/2005 Su-Shing Chen, CISE 2 Protein Conformational Structures Hydrophobicity
More informationProtein function prediction via analysis of interactomes
Protein function prediction via analysis of interactomes Elena Nabieva Mona Singh Department of Computer Science & Lewis-Sigler Institute for Integrative Genomics January 22, 2008 1 Introduction Genome
More informationCMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction
CMPS 6630: Introduction to Computational Biology and Bioinformatics Tertiary Structure Prediction Tertiary Structure Prediction Why Should Tertiary Structure Prediction Be Possible? Molecules obey the
More informationComparing whole genomes
BioNumerics Tutorial: Comparing whole genomes 1 Aim The Chromosome Comparison window in BioNumerics has been designed for large-scale comparison of sequences of unlimited length. In this tutorial you will
More informationTowards Detecting Protein Complexes from Protein Interaction Data
Towards Detecting Protein Complexes from Protein Interaction Data Pengjun Pei 1 and Aidong Zhang 1 Department of Computer Science and Engineering State University of New York at Buffalo Buffalo NY 14260,
More informationEvolution by duplication
6.095/6.895 - Computational Biology: Genomes, Networks, Evolution Lecture 18 Nov 10, 2005 Evolution by duplication Somewhere, something went wrong Challenges in Computational Biology 4 Genome Assembly
More informationIdentifying Signaling Pathways
These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by Anthony Gitter, Mark Craven, Colin Dewey Identifying Signaling Pathways BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2018
More informationRNA-Strukturvorhersage Strukturelle Bioinformatik WS16/17
RNA-Strukturvorhersage Strukturelle Bioinformatik WS16/17 Dr. Stefan Simm, 01.11.2016 simm@bio.uni-frankfurt.de RNA secondary structures a. hairpin loop b. stem c. bulge loop d. interior loop e. multi
More informationAn Efficient Algorithm for Protein-Protein Interaction Network Analysis to Discover Overlapping Functional Modules
An Efficient Algorithm for Protein-Protein Interaction Network Analysis to Discover Overlapping Functional Modules Ying Liu 1 Department of Computer Science, Mathematics and Science, College of Professional
More informationCMPS 3110: Bioinformatics. Tertiary Structure Prediction
CMPS 3110: Bioinformatics Tertiary Structure Prediction Tertiary Structure Prediction Why Should Tertiary Structure Prediction Be Possible? Molecules obey the laws of physics! Conformation space is finite
More informationCS47300: Web Information Search and Management
CS47300: Web Information Search and Management Prof. Chris Clifton 6 September 2017 Material adapted from course created by Dr. Luo Si, now leading Alibaba research group 1 Vector Space Model Disadvantages:
More informationEVOLUTIONARY DISTANCES
EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:
More informationPredicting RNA Secondary Structure
7.91 / 7.36 / BE.490 Lecture #6 Mar. 11, 2004 Predicting RNA Secondary Structure Chris Burge Review of Markov Models & DNA Evolution CpG Island HMM The Viterbi Algorithm Real World HMMs Markov Models for
More informationNetwork Motifs of Pathogenic Genes in Human Regulatory Network
Network Motifs of Pathogenic Genes in Human Regulatory Network Michael Colavita Mentor: Soheil Feizi Fourth Annual MIT PRIMES Conference May 18, 2014 Topics Background Genetics Regulatory Networks The
More information17 Non-collinear alignment Motivation A B C A B C A B C A B C D A C. This exposition is based on:
17 Non-collinear alignment This exposition is based on: 1. Darling, A.E., Mau, B., Perna, N.T. (2010) progressivemauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One 5(6):e11147.
More informationGLOBEX Bioinformatics (Summer 2015) Genetic networks and gene expression data
GLOBEX Bioinformatics (Summer 2015) Genetic networks and gene expression data 1 Gene Networks Definition: A gene network is a set of molecular components, such as genes and proteins, and interactions between
More informationComputational Genomics. Reconstructing dynamic regulatory networks in multiple species
02-710 Computational Genomics Reconstructing dynamic regulatory networks in multiple species Methods for reconstructing networks in cells CRH1 SLT2 SLR3 YPS3 YPS1 Amit et al Science 2009 Pe er et al Recomb
More informationSara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)
Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline
More informationInDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9
Lecture 5 Alignment I. Introduction. For sequence data, the process of generating an alignment establishes positional homologies; that is, alignment provides the identification of homologous phylogenetic
More informationBasics of protein structure
Today: 1. Projects a. Requirements: i. Critical review of one paper ii. At least one computational result b. Noon, Dec. 3 rd written report and oral presentation are due; submit via email to bphys101@fas.harvard.edu
More informationBLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010
BLAST Database Searching BME 110: CompBio Tools Todd Lowe April 8, 2010 Admin Reading: Read chapter 7, and the NCBI Blast Guide and tutorial http://www.ncbi.nlm.nih.gov/blast/why.shtml Read Chapter 8 for
More informationNetwork diffusion-based analysis of high-throughput data for the detection of differentially enriched modules
Network diffusion-based analysis of high-throughput data for the detection of differentially enriched modules Matteo Bersanelli 1+, Ettore Mosca 2+, Daniel Remondini 1, Gastone Castellani 1 and Luciano
More informationSequence analysis and Genomics
Sequence analysis and Genomics October 12 th November 23 rd 2 PM 5 PM Prof. Peter Stadler Dr. Katja Nowick Katja: group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute
More informationLearning in Bayesian Networks
Learning in Bayesian Networks Florian Markowetz Max-Planck-Institute for Molecular Genetics Computational Molecular Biology Berlin Berlin: 20.06.2002 1 Overview 1. Bayesian Networks Stochastic Networks
More informationBasic Local Alignment Search Tool