RNAscClust clustering RNAs using structure conservation and graph-based motifs
|
|
- Maximillian Norton
- 6 years ago
- Views:
Transcription
1 RNsclust clustering RNs using structure conservation and graph-based motifs Milad Miladi 1 lexander Junge 2 miladim@informatikuni-freiburgde ajunge@rthdk 1 Bioinformatics roup, niversity of Freiburg 2 enter for non-coding RN in Technology and Health (RTH), niversity of openhagen 31 st TBI Winterseminar Bled, Slovenia February / 15
2 lustering ncrn sequences using structure conservation human raphlust [Heyne et al, Bioinformatics, 2012]: lusters ncrn sequences an find paralogs belonging to same ncrn class Features based on local sequence and structure 2 / 15
3 lustering ncrn sequences using structure conservation human human chimp pig mouse - raphlust [Heyne et al, Bioinformatics, 2012]: lusters ncrn sequences an find paralogs belonging to same ncrn class Features based on local sequence and structure RNsclust: lusters paralogous RN sequences structurally aligned to their orthologs Extends raphlust approach: 2 / 15
4 lustering ncrn sequences using structure conservation covariation human human chimp pig mouse - consensus structure ( ( ) ) raphlust [Heyne et al, Bioinformatics, 2012]: lusters ncrn sequences an find paralogs belonging to same ncrn class Features based on local sequence and structure RNsclust: lusters paralogous RN sequences structurally aligned to their orthologs Extends raphlust approach: Derives evolutionary conserved sequence and secondary structure 2 / 15
5 Single sequence vs alignment clustering covariation human structure prediction from single sequence human chimp pig mouse - consensus structure ( ( ) ) structure prediction based on multiple alignment consensus structure 5' 3' wrong structure 5' 5' 5' 3' 5' 3' 3' 3' set of correct structures 3 / 15
6 Single sequence vs alignment clustering covariation human structure prediction from single sequence human chimp pig mouse - consensus structure ( ( ) ) structure prediction based on multiple alignment consensus structure 5' 3' wrong structure single sequence clustering 5' 5' 5' 3' 5' 3' 3' 3' multiple sequence alignment clustering set of correct structures cluster 1 cluster 2 cluster 1 cluster 2 cluster 3 wrong clustering correct clustering 3 / 15
7 Identifying similarities of secondary structures Neighborhood Subgraph Pairwise Distance (NSPD) raph Kernel [osta and De rave, Proceedings IML 10, 2010] Extract substructures: intuitively: structure k-mers NSPD Kernel ncrns highly similar if many shared substructures 4 / 15
8 Measuring structure similarity of multiple alignments human chimp pig mouse threshold t - - ( ( ( ) ) ) base pair reliabilities constraints PETfold [Seemann et al, NR, 2008] 5 / 15
9 Measuring structure similarity of multiple alignments human chimp pig mouse threshold t - - ( ( ( ) ) ) base pair reliabilities constraints PETfold [Seemann et al, NR, 2008] constrained folding human chimp pig mouse RNfold for each sequence [Lorenz et al, lg for Mol Biol, 2011] use NSPD raph Kernel to compare alignments 5 / 15
10 RNsclust pipeline: From input alignments to clustering Set of multiple alignments Reliable basepairs onstrained folding NSPD raph Kernel Pairwise similarities raphlust postprocessing steps lustering of multiple alignments 6 / 15
11 onstructing a benchmark data set Split Rfam 12 family seed alignments into subalignments: Human Rfam family seed alignment Family subalignments himp Mouse split Pig 7 / 15
12 onstructing a benchmark data set Split Rfam 12 family seed alignments into subalignments: Human Rfam family seed alignment Family subalignments himp Mouse split Pig 1 Each subalignment contains one human sequence 2 Similar sequences from different species form a subalignment subalignments have max sequence identity 7 / 15
13 onstructing a benchmark data set Split Rfam 12 family seed alignments into subalignments: Human Rfam family seed alignment Family subalignments himp Mouse split Pig 1 Each subalignment contains one human sequence 2 Similar sequences from different species form a subalignment subalignments have max sequence identity Ideal clustering groups only subalignments from same Rfam family 7 / 15
14 omparing sequence to alignment clustering Subalignments in benchmark set Human Extract human sequences Human sequences in benchmark set luster with raphlust luster with RNsclust ompare clustering performance 8 / 15
15 Low covariation in the benchmark data set Benchmark set has high verage Pairwise Sequence Identity (PSI) in alignments High PSI ount mean PSI 48 families 234 alignments PSI per alignment (based on Rfam alignments) 9 / 15
16 Low covariation in the benchmark data set Benchmark set has high verage Pairwise Sequence Identity (PSI) in alignments High PSI ount mean PSI 48 families 234 alignments PSI per alignment (based on Rfam alignments) limit PSI to study effect of covariation on clustering performance 9 / 15
17 Benchmark sets with different degrees of covariation reate 2 additional benchmark data set with controlled PSI in alignments Medium PSI Low PSI ount 30 ount mean PSI 26 families 166 alignments PSI per alignment (based on Rfam alignments) mean PSI 10 families 92 alignments PSI per alignment (based on Rfam alignments) 10 / 15
18 lustering performance metrics - V-measure Homogeneity H: each cluster contains only members of a single family ompleteness : all members of a given family are in same cluster V-measure [Rosenberg and Hirschberg, 2007] is harmonic mean of H and : V = 2 H H + 11 / 15
19 lustering performance metrics - djusted Rand Index a = #object pairs from same family assigned to same cluster b = #object pairs from different families assigned to different clusters n = number of alignments Rand Index = a ( + b n 2) djusted Rand Index [Hubert and rabie, 1985] is Rand Index [Rand, 1971] adjusted for chance 12 / 15
20 More covariation improves RNsclust performance 10 V-measure 10 RNsclust raphlust djusted Rand Index V-measure djusted Rand Index Low PSI (049) Medium PSI (064) High PSI (081) Low PSI (049) Medium PSI (064) High PSI (081) 13 / 15
21 RNsclust onclusion Leverage conserved sec structure derived from multiple alignments NSPD raph Kernel as similarity measure Improved clustering compared to raphlust 14 / 15
22 RNsclust onclusion Leverage conserved sec structure derived from multiple alignments NSPD raph Kernel as similarity measure Improved clustering compared to raphlust Next step: enome-scale clustering of potential ncrns 14 / 15
23 cknowledgements Bioinformatics roup, niversity of Freiburg: Milad Miladi Fabrizio osta Rolf Backofen RTH, niversity of openhagen: Stefan Seemann Jakob Hull Havgaard Jan orodkin Funding: DF, Danish enter for Scientific omputing, Innovation Fund Denmark, Danish ancer Society 15 / 15
24 cknowledgements Bioinformatics roup, niversity of Freiburg: Milad Miladi Fabrizio osta Rolf Backofen RTH, niversity of openhagen: Stefan Seemann Jakob Hull Havgaard Jan orodkin Funding: DF, Danish enter for Scientific omputing, Innovation Fund Denmark, Danish ancer Society Thank you for your attention! 15 / 15
25 16 / 15
26 Identifying similarities of secondary structures Neighborhood Subgraph Pairwise Distance (NSPD) Kernel used in raphlust [Heyne et al, Bioinformatics, 2012] Extract substructures: ' 6 3' structure k-mers graph kernel H H H H H sparse feature vector ncrns highly similar if many shared substructures / 15
27 RNsclust full pipeline RNsclust specific input and methodology Input multiple alignments Predicting sets of sec structures Sparse feature vectorization Local sensitivity hashing andidate clusters of multiple alignments raphlust methodology andidate clusters of representatives luster refinement and extension Report clusters Iterate until no candiate left Steps executed in parallel are shown as stacks 18 / 15
28 V-measure lusters K = {K 1,, K m }; true classes = { 1,, n } h is defined as: Homogeneity h = { 1 if H( K) = 0 1 H( K) H() H( K) is the conditional entropy of the classes given the clustering and H() is the entropy of the classes, ie, else K H( K) = H() = c=1 k=1 c=1 a ck N log K k=1 a ck log n a ck c=1 a ck K k=1 a ck n 19 / 15
29 V-measure lusters K = {K 1,, K m }; true classes = { 1,, n } On the other hand, completeness c is defined as: c = { 1 if H(K ) = 0 1 H(K ) H(K) else where H(K ) is the conditional entropy of the clustering given the classes and H(K) is the entropy of the clustering, ie, K H(K ) = K H(K) = k=1 c=1 k=1 a ck N log c=1 a ck log n a ck K k=1 a ck c=1 a ck n V-measure is harmonic mean of homogeneity and completeness and is not normalized wrt random labeling 00 is as bad as it can be, 10 is perfect 19 / 15
30 onstructing a benchmark data set Split each Rfam 12 family seed alignment into subalignments Similar sequences from different species form a subalignment Human Rfam family seed alignment Family subalignments himp Mouse split Pig 20 / 15
31 onstructing a benchmark data set 1) Each sequence in the alignment is represented as a node in a graph Human himp Mouse Pig 20 / 15
32 onstructing a benchmark data set 2) Remove sequences with pairwise sequence identify (PSI) > 095 Human himp Mouse Pig 20 / 15
33 onstructing a benchmark data set 3) dd edge between sequences from diff species with PSI (09, 095] Human himp Mouse Pig 20 / 15
34 onstructing a benchmark data set 4) Search for cliques in graph Human himp Mouse Pig 20 / 15
35 onstructing a benchmark data set 5) dd clique with max PSI as subalignment to benchmark data set Human himp Mouse Pig Family subalignments (liques) 1 20 / 15
36 onstructing a benchmark data set 6) dd edge between sequences from diff species with PSI (08, 09] Human himp Mouse Pig Family subalignments (liques) 1 20 / 15
37 onstructing a benchmark data set 7) dd clique as subalignment to benchmark data set Human himp Mouse Pig Family subalignments (liques) / 15
38 onstructing a benchmark data set 8) dd edge between sequences from diff species with PSI (07, 08] Human himp Mouse Pig Family subalignments (liques) / 15
A graph kernel approach to the identification and characterisation of structured non-coding RNAs using multiple sequence alignment information
graph kernel approach to the identification and characterisation of structured noncoding RNs using multiple sequence alignment information Mariam lshaikh lbert Ludwigs niversity Freiburg, Department of
More informationStructure Local Multiple Alignment of RNA
Structure Local Multiple lignment of RN Wolfgang Otto 1 and Sebastian Will 2 and Rolf ackofen 2 1 ioinformatics, niversity Leipzig, D-04107 Leipzig wolfgang@bioinf.uni-leipzig.de 2 ioinformatics, lbert-ludwigs-niversity
More informationA phylogenetic view on RNA structure evolution
3 2 9 4 7 3 24 23 22 8 phylogenetic view on RN structure evolution 9 26 6 52 7 5 6 37 57 45 5 84 63 86 77 65 3 74 7 79 8 33 9 97 96 89 47 87 62 32 34 42 73 43 44 4 76 58 75 78 93 39 54 82 99 28 95 52 46
More informationConserved RNA Structures. Ivo L. Hofacker. Institut for Theoretical Chemistry, University Vienna.
onserved RN Structures Ivo L. Hofacker Institut for Theoretical hemistry, University Vienna http://www.tbi.univie.ac.at/~ivo/ Bled, January 2002 Energy Directed Folding Predict structures from sequence
More informationChapter DM:II (continued)
Chapter DM:II (continued) II. Cluster Analysis Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained Cluster Analysis
More informationRNA Abstract Shape Analysis
ourse: iegerich RN bstract nalysis omplete shape iegerich enter of Biotechnology Bielefeld niversity robert@techfak.ni-bielefeld.de ourse on omputational RN Biology, Tübingen, March 2006 iegerich ourse:
More informationRNA-Strukturvorhersage Strukturelle Bioinformatik WS16/17
RNA-Strukturvorhersage Strukturelle Bioinformatik WS16/17 Dr. Stefan Simm, 01.11.2016 simm@bio.uni-frankfurt.de RNA secondary structures a. hairpin loop b. stem c. bulge loop d. interior loop e. multi
More informationMariam Alshaikh. April 15, 2015
Master Thesis graph kernel approach to the identification and characterization of structured non-coding RNs using multiple sequence alignment information Mariam lshaikh pril 15, 2015 Faculty of Engineering
More informationFlexible RNA design under structure and sequence constraints using a language theoretic approach
Flexible RN design under structure and sequence constraints using a language theoretic approach Yu Zhou,, Yann Ponty Stéphane Vialette Jérôme Waldispühl Yi Zhang lain Denise LRI, niv. Paris-Sud, Orsay
More informationDUPLICATED RNA GENES IN TELEOST FISH GENOMES
DPLITED RN ENES IN TELEOST FISH ENOMES Dominic Rose, Julian Jöris, Jörg Hackermüller, Kristin Reiche, Qiang LI, Peter F. Stadler Bioinformatics roup, Department of omputer Science, and Interdisciplinary
More informationGenome 559 Wi RNA Function, Search, Discovery
Genome 559 Wi 2009 RN Function, Search, Discovery The Message Cells make lots of RN noncoding RN Functionally important, functionally diverse Structurally complex New tools required alignment, discovery,
More informationSearching for Noncoding RNA
Searching for Noncoding RN Larry Ruzzo omputer Science & Engineering enome Sciences niversity of Washington http://www.cs.washington.edu/homes/ruzzo Bio 2006, Seattle, 8/4/2006 1 Outline Noncoding RN Why
More informationProtein Folding by Robotics
Protein Folding by Robotics 1 TBI Winterseminar 2006 February 21, 2006 Protein Folding by Robotics 1 TBI Winterseminar 2006 February 21, 2006 Protein Folding by Robotics Probabilistic Roadmap Planning
More informationLocal Alignment of RNA Sequences with Arbitrary Scoring Schemes
Local Alignment of RNA Sequences with Arbitrary Scoring Schemes Rolf Backofen 1, Danny Hermelin 2, ad M. Landau 2,3, and Oren Weimann 4 1 Institute of omputer Science, Albert-Ludwigs niversität Freiburg,
More informationSequence Bioinformatics. Multiple Sequence Alignment Waqas Nasir
Sequence Bioinformatics Multiple Sequence Alignment Waqas Nasir 2010-11-12 Multiple Sequence Alignment One amino acid plays coy; a pair of homologous sequences whisper; many aligned sequences shout out
More information12/31/2010. Overview. 05-Boolean Algebra Part 3 Text: Unit 3, 7. DeMorgan s Law. Example. Example. DeMorgan s Law
Overview 05-oolean lgebra Part 3 Text: Unit 3, 7 EEGR/ISS 201 Digital Operations and omputations Winter 2011 DeMorgan s Laws lgebraic Simplifications Exclusive-OR and Equivalence Functionally omplete NND-NOR
More informationBioinformatics: Network Analysis
Bioinformatics: Network Analysis Comparative Network Analysis COMP 572 (BIOS 572 / BIOE 564) - Fall 2013 Luay Nakhleh, Rice University 1 Biomolecular Network Components 2 Accumulation of Network Components
More informationOutline. Sequence-comparison methods. Buzzzzzzzz. Why compare sequences? Gerard Kleywegt Uppsala University
MB330 - January, 2006 Sequence-comparison methods erard Kleywegt Uppsala University Outline! Why compare sequences?! Dotplots! airwise sequence alignments &! Multiple sequence alignments! rofile methods!
More informationPhylogeny Tree Algorithms
Phylogeny Tree lgorithms Jianlin heng, PhD School of Electrical Engineering and omputer Science University of entral Florida 2006 Free for academic use. opyright @ Jianlin heng & original sources for some
More informationAlgorithms in Bioinformatics
Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri RNA Structure Prediction Secondary
More informationCMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison
CMPS 6630: Introduction to Computational Biology and Bioinformatics Structure Comparison Protein Structure Comparison Motivation Understand sequence and structure variability Understand Domain architecture
More informationImpact Of The Energy Model On The Complexity Of RNA Folding With Pseudoknots
Impact Of The Energy Model On The omplexity Of RN Folding With Pseudoknots Saad Sheikh, Rolf Backofen Yann Ponty, niversity of Florida, ainesville, S lbert Ludwigs niversity, Freiburg, ermany LIX, NRS/Ecole
More informationA Method for Aligning RNA Secondary Structures
Method for ligning RN Secondary Structures Jason T. L. Wang New Jersey Institute of Technology J Liu, JTL Wang, J Hu and B Tian, BM Bioinformatics, 2005 1 Outline Introduction Structural alignment of RN
More informationA New Space for Comparing Graphs
A New Space for Comparing Graphs Anshumali Shrivastava and Ping Li Cornell University and Rutgers University August 18th 2014 Anshumali Shrivastava and Ping Li ASONAM 2014 August 18th 2014 1 / 38 Main
More informationValue-Ordering and Discrepancies. Ciaran McCreesh and Patrick Prosser
Value-Ordering and Discrepancies Maintaining Arc Consistency (MAC) Achieve (generalised) arc consistency (AC3, etc). If we have a domain wipeout, backtrack. If all domains have one value, we re done. Pick
More informationGraph Alignment and Biological Networks
Graph Alignment and Biological Networks Johannes Berg http://www.uni-koeln.de/ berg Institute for Theoretical Physics University of Cologne Germany p.1/12 Networks in molecular biology New large-scale
More informationClustering using Mixture Models
Clustering using Mixture Models The full posterior of the Gaussian Mixture Model is p(x, Z, µ,, ) =p(x Z, µ, )p(z )p( )p(µ, ) data likelihood (Gaussian) correspondence prob. (Multinomial) mixture prior
More informationSeqAn and OpenMS Integration Workshop. Temesgen Dadi, Julianus Pfeuffer, Alexander Fillbrunn The Center for Integrative Bioinformatics (CIBI)
SeqAn and OpenMS Integration Workshop Temesgen Dadi, Julianus Pfeuffer, Alexander Fillbrunn The Center for Integrative Bioinformatics (CIBI) Mass-spectrometry data analysis in KNIME Julianus Pfeuffer,
More informationComparative Genomics: Sequence, Structure, and Networks. Bonnie Berger MIT
Comparative Genomics: Sequence, Structure, and Networks Bonnie Berger MIT Comparative Genomics Look at the same kind of data across species with the hope that areas of high correlation correspond to functional
More informationJeremy Chang Identifying protein protein interactions with statistical coupling analysis
Jeremy Chang Identifying protein protein interactions with statistical coupling analysis Abstract: We used an algorithm known as statistical coupling analysis (SCA) 1 to create a set of features for building
More informationMultiple Sequence Alignment
Multiple equence lignment Four ami Khuri Dept of omputer cience an José tate University Multiple equence lignment v Progressive lignment v Guide Tree v lustalw v Toffee v Muscle v MFFT * 20 * 0 * 60 *
More informationClassification Semi-supervised learning based on network. Speakers: Hanwen Wang, Xinxin Huang, and Zeyu Li CS Winter
Classification Semi-supervised learning based on network Speakers: Hanwen Wang, Xinxin Huang, and Zeyu Li CS 249-2 2017 Winter Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions Xiaojin
More informationA New Similarity Measure among Protein Sequences
A New Similarity Measure among Protein Sequences Kuen-Pin Wu, Hsin-Nan Lin, Ting-Yi Sung and Wen-Lian Hsu * Institute of Information Science Academia Sinica, Taipei 115, Taiwan Abstract Protein sequence
More information5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT
5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT.03.239 03.10.2012 ALIGNMENT Alignment is the task of locating equivalent regions of two or more sequences to maximize their similarity. Homology:
More informationSequence Alignment Techniques and Their Uses
Sequence Alignment Techniques and Their Uses Sarah Fiorentino Since rapid sequencing technology and whole genomes sequencing, the amount of sequence information has grown exponentially. With all of this
More informationA General Method for Combining Predictors Tested on Protein Secondary Structure Prediction
A General Method for Combining Predictors Tested on Protein Secondary Structure Prediction Jakob V. Hansen Department of Computer Science, University of Aarhus Ny Munkegade, Bldg. 540, DK-8000 Aarhus C,
More informationSUPPLEMENTARY INFORMATION
Supplementary information S3 (box) Methods Methods Genome weighting The currently available collection of archaeal and bacterial genomes has a highly biased distribution of isolates across taxa. For example,
More informationTowards a Comprehensive Annotation of Structured RNAs in Drosophila
Towards a Comprehensive Annotation of Structured RNAs in Drosophila Rebecca Kirsch 31st TBI Winterseminar, Bled 20/02/2016 Studying Non-Coding RNAs in Drosophila Why Drosophila? especially for novel molecules
More informationScale & Affine Invariant Interest Point Detectors
Scale & Affine Invariant Interest Point Detectors KRYSTIAN MIKOLAJCZYK AND CORDELIA SCHMID [2004] Shreyas Saxena Gurkirit Singh 23/11/2012 Introduction We are interested in finding interest points. What
More informationLOCAL SEQUENCE-STRUCTURE MOTIFS IN RNA
Journal of Bioinformatics and omputational Biology c Imperial ollege Press LOL SEQENE-STRTRE MOTIFS IN RN ROLF BKOFEN and SEBSTIN WILL hair for Bioinformatics at the Institute of omputer Science, Friedrich-Schiller-niversitaet
More informationAn Introduction to Exponential-Family Random Graph Models
An Introduction to Exponential-Family Random Graph Models Luo Lu Feb.8, 2011 1 / 11 Types of complications in social network Single relationship data A single relationship observed on a set of nodes at
More informationInteraction Network Analysis
CSI/BIF 5330 Interaction etwork Analsis Young-Rae Cho Associate Professor Department of Computer Science Balor Universit Biological etworks Definition Maps of biochemical reactions, interactions, regulations
More informationBasic Local Alignment Search Tool
Basic Local Alignment Search Tool Alignments used to uncover homologies between sequences combined with phylogenetic studies o can determine orthologous and paralogous relationships Local Alignment uses
More informationI. Pairwise Display and the PairViz R package
raph theoretic methods for ata Visualization. I. Pairwise isplay and the PairViz R package Wayne Oldford based on joint work with atherine Hurley Tutorial 1 The problem an we automatically, yet meaningfully,
More informationTools and Algorithms in Bioinformatics
Tools and Algorithms in Bioinformatics GCBA815, Fall 2015 Week-4 BLAST Algorithm Continued Multiple Sequence Alignment Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and
More informationOutline Sequence-comparison methods. Buzzzzzzzz. MB330 - The class of 2008
Outline Sequence-comparison methods erard Kleywegt Uppsala University Why compare sequences otplots airwise sequence alignments Multiple sequence alignments rofile methods Buzzzzzzzz Why compare sequences
More informationSUPPLEMENTARY INFORMATION
Supplementary information S1 (box). Supplementary Methods description. Prokaryotic Genome Database Archaeal and bacterial genome sequences were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/)
More informationAlgorithmische Bioinformatik WS 11/12:, by R. Krause/ K. Reinert, 14. November 2011, 12: Motif finding
Algorithmische Bioinformatik WS 11/12:, by R. Krause/ K. Reinert, 14. November 2011, 12:00 4001 Motif finding This exposition was developed by Knut Reinert and Clemens Gröpl. It is based on the following
More informationGray-box Inference for Structured Gaussian Process Models: Supplementary Material
Gray-box Inference for Structured Gaussian Process Models: Supplementary Material Pietro Galliani, Amir Dezfouli, Edwin V. Bonilla and Novi Quadrianto February 8, 017 1 Proof of Theorem Here we proof the
More informationRNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology"
RNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology" Day 1" Many biologically interesting roles for RNA" RNA secondary structure prediction" 3 4 Approaches to Structure
More informationClustering Lecture 1: Basics. Jing Gao SUNY Buffalo
Clustering Lecture 1: Basics Jing Gao SUNY Buffalo 1 Outline Basics Motivation, definition, evaluation Methods Partitional Hierarchical Density-based Mixture model Spectral methods Advanced topics Clustering
More informationChapter 5. Proteomics and the analysis of protein sequence Ⅱ
Proteomics Chapter 5. Proteomics and the analysis of protein sequence Ⅱ 1 Pairwise similarity searching (1) Figure 5.5: manual alignment One of the amino acids in the top sequence has no equivalent and
More informationTowards Detecting Protein Complexes from Protein Interaction Data
Towards Detecting Protein Complexes from Protein Interaction Data Pengjun Pei 1 and Aidong Zhang 1 Department of Computer Science and Engineering State University of New York at Buffalo Buffalo NY 14260,
More informationAlgorithms and Data Structures
Algorithms and Data Structures, Divide and Conquer Albert-Ludwigs-Universität Freiburg Prof. Dr. Rolf Backofen Bioinformatics Group / Department of Computer Science Algorithms and Data Structures, December
More informationIntroduction to protein alignments
Introduction to protein alignments Comparative Analysis of Proteins Experimental evidence from one or more proteins can be used to infer function of related protein(s). Gene A Gene X Protein A compare
More informationThe University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013.
The University of Texas at Austin Department of Electrical and Computer Engineering EE381V: Large Scale Learning Spring 2013 Assignment 1 Caramanis/Sanghavi Due: Thursday, Feb. 7, 2013. (Problems 1 and
More informationNcRNA Homology Search Using Hamming Distance Seeds
NcRN Homology Search Using Hamming Distance Seeds Osama ljawad Dept. of omputer Science and Engineering Michigan State University East Lansing, MI 48824 aljawado@msu.edu Yanni Sun Dept. of omputer Science
More informationclustq: Efficient Protein Decoy Clustering Using Superposition-free Weighted Internal Distance Comparisons
clustq: Efficient Protein Decoy Clustering Using Superposition-free Weighted Internal Distance Comparisons Debswapna Auburn University ACM-BCB August 31, 2018 What is protein decoy clustering? Clustering
More informationKernel Methods and Support Vector Machines
Kernel Methods and Support Vector Machines Oliver Schulte - CMPT 726 Bishop PRML Ch. 6 Support Vector Machines Defining Characteristics Like logistic regression, good for continuous input features, discrete
More informationThe Lovász Local Lemma : A constructive proof
The Lovász Local Lemma : A constructive proof Andrew Li 19 May 2016 Abstract The Lovász Local Lemma is a tool used to non-constructively prove existence of combinatorial objects meeting a certain conditions.
More informationLecture 24: Randomized Algorithms
Lecture 24: Randomized Algorithms Chapter 12 11/26/2013 Comp 465 Fall 2013 1 Randomized Algorithms Randomized algorithms incorporate random, rather than deterministic, decisions Commonly used in situations
More informationCMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction
CMPS 6630: Introduction to Computational Biology and Bioinformatics Tertiary Structure Prediction Tertiary Structure Prediction Why Should Tertiary Structure Prediction Be Possible? Molecules obey the
More informationSolving the MWT. Recall the ILP for the MWT. We can obtain a solution to the MWT problem by solving the following ILP:
Solving the MWT Recall the ILP for the MWT. We can obtain a solution to the MWT problem by solving the following ILP: max subject to e i E ω i x i e i C E x i {0, 1} x i C E 1 for all critical mixed cycles
More informationCMPS 3110: Bioinformatics. Tertiary Structure Prediction
CMPS 3110: Bioinformatics Tertiary Structure Prediction Tertiary Structure Prediction Why Should Tertiary Structure Prediction Be Possible? Molecules obey the laws of physics! Conformation space is finite
More informationA New Distributed Algorithm for Side-Chain Positioning in the Process of Protein Docking
5nd IEEE Conference on Decision and Control December 10-13, 013. Florence, Italy A New Distributed Algorithm for Side-Chain Positioning in the Process of Protein Docking Mohammad Moghadasi, Dima Kozakov,
More informationNetwork Alignment 858L
Network Alignment 858L Terms & Questions A homologous h Interolog = B h Species 1 Species 2 Are there conserved pathways? What is the minimum set of pathways required for life? Can we compare networks
More informationBetter restore the recto side of a document with an estimation of the verso side: Markov model and inference with graph cuts
June 23 rd 2008 Better restore the recto side of a document with an estimation of the verso side: Markov model and inference with graph cuts Christian Wolf Laboratoire d InfoRmatique en Image et Systèmes
More informationLecture 23: Randomized Algorithms
Lecture 23: Randomized Algorithms Chapter 12 11/20/2014 Comp 555 Bioalgorithms (Fall 2014) 1 Randomized Algorithms Randomized algorithms incorporate random, rather than deterministic, decisions Commonly
More informationIntroduction to Bioinformatics
Introduction to Bioinformatics Lecture : p he biological problem p lobal alignment p Local alignment p Multiple alignment 6 Background: comparative genomics p Basic question in biology: what properties
More informationC3020 Molecular Evolution. Exercises #3: Phylogenetics
C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from
More informationBackground: comparative genomics. Sequence similarity. Homologs. Similarity vs homology (2) Similarity vs homology. Sequence Alignment (chapter 6)
Sequence lignment (chapter ) he biological problem lobal alignment Local alignment Multiple alignment Background: comparative genomics Basic question in biology: what properties are shared among organisms?
More informationPhylogenetic trees 07/10/13
Phylogenetic trees 07/10/13 A tree is the only figure to occur in On the Origin of Species by Charles Darwin. It is a graphical representation of the evolutionary relationships among entities that share
More informationTowards firm foundations for the rational design of RNA molecules
39 28 4 8 282 28 98 32 3 36 68 3 82 : (405)-43-> 43 42 4 40 8 25 37 46 26 36 24 27 E23_9 34 22 3 33 32 E23_2 2 48 o 9 Towards firm foundations for the rational design of RN molecules 47 2 E23_6 E23_ 45
More informationMarkov Random Fields
Markov Random Fields Umamahesh Srinivas ipal Group Meeting February 25, 2011 Outline 1 Basic graph-theoretic concepts 2 Markov chain 3 Markov random field (MRF) 4 Gauss-Markov random field (GMRF), and
More informationDecember 2, :4 WSPC/INSTRUCTION FILE jbcb-profile-kernel. Profile-based string kernels for remote homology detection and motif extraction
Journal of Bioinformatics and Computational Biology c Imperial College Press Profile-based string kernels for remote homology detection and motif extraction Rui Kuang 1, Eugene Ie 1,3, Ke Wang 1, Kai Wang
More informationProtein Complex Identification by Supervised Graph Clustering
Protein Complex Identification by Supervised Graph Clustering Yanjun Qi 1, Fernanda Balem 2, Christos Faloutsos 1, Judith Klein- Seetharaman 1,2, Ziv Bar-Joseph 1 1 School of Computer Science, Carnegie
More information2MHR. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity.
Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity. A global picture of the protein universe will help us to understand
More informationSequence comparison: Score matrices. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas
Sequence comparison: Score matrices Genome 559: Introduction to Statistical and omputational Genomics Prof James H Thomas Informal inductive proof of best alignment path onsider the last step in the best
More informationComputational Design of New and Recombinant Selenoproteins
Computational Design of ew and Recombinant Selenoproteins Rolf Backofen and Friedrich-Schiller-University Jena Institute of Computer Science Chair for Bioinformatics 1 Computational Design of ew and Recombinant
More informationModel Accuracy Measures
Model Accuracy Measures Master in Bioinformatics UPF 2017-2018 Eduardo Eyras Computational Genomics Pompeu Fabra University - ICREA Barcelona, Spain Variables What we can measure (attributes) Hypotheses
More informationStatistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences
Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD Department of Computer Science University of Missouri 2008 Free for Academic
More informationWeek 10: Homology Modelling (II) - HHpred
Week 10: Homology Modelling (II) - HHpred Course: Tools for Structural Biology Fabian Glaser BKU - Technion 1 2 Identify and align related structures by sequence methods is not an easy task All comparative
More informationMultiple Whole Genome Alignment
Multiple Whole Genome Alignment BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 206 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by
More informationEVALUATION OF RNA SECONDARY STRUCTURE MOTIFS USING REGRESSION ANALYSIS
EVLTION OF RN SEONDRY STRTRE MOTIFS SIN RERESSION NLYSIS Mohammad nwar School of Information Technology and Engineering, niversity of Ottawa e-mail: manwar@site.uottawa.ca bstract Recent experimental evidences
More informationExample: multivariate Gaussian Distribution
School of omputer Science Probabilistic Graphical Models Representation of undirected GM (continued) Eric Xing Lecture 3, September 16, 2009 Reading: KF-chap4 Eric Xing @ MU, 2005-2009 1 Example: multivariate
More informationComparative Network Analysis
Comparative Network Analysis BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2016 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by
More informationRNA Protein Interaction
Introduction RN binding in detail structural analysis Examples RN Protein Interaction xel Wintsche February 16, 2009 xel Wintsche RN Protein Interaction Introduction RN binding in detail structural analysis
More informationGEOMETRY OF PROTEIN STRUCTURES. I. WHY HYPERBOLIC SURFACES ARE A GOOD APPROXIMATION FOR BETA-SHEETS
GEOMETRY OF PROTEIN STRUCTURES. I. WHY HYPERBOLIC SURFACES ARE A GOOD APPROXIMATION FOR BETA-SHEETS Boguslaw Stec 1;2 and Vladik Kreinovich 1;3 1 Bioinformatics Program 2 Chemistry Department 3 Department
More informationBearing fault diagnosis based on EMD-KPCA and ELM
Bearing fault diagnosis based on EMD-KPCA and ELM Zihan Chen, Hang Yuan 2 School of Reliability and Systems Engineering, Beihang University, Beijing 9, China Science and Technology on Reliability & Environmental
More informationConditional Graphical Models
PhD Thesis Proposal Conditional Graphical Models for Protein Structure Prediction Yan Liu Language Technologies Institute University Thesis Committee Jaime Carbonell (Chair) John Lafferty Eric P. Xing
More informationMultiple Alignment. Slides revised and adapted to Bioinformática IST Ana Teresa Freitas
n Introduction to Bioinformatics lgorithms Multiple lignment Slides revised and adapted to Bioinformática IS 2005 na eresa Freitas n Introduction to Bioinformatics lgorithms Outline Dynamic Programming
More informationComputer Vision Group Prof. Daniel Cremers. 14. Clustering
Group Prof. Daniel Cremers 14. Clustering Motivation Supervised learning is good for interaction with humans, but labels from a supervisor are hard to obtain Clustering is unsupervised learning, i.e. it
More informationBarMap & SundialsWrapper
BarMap & SundialsWrapper advanced RN folding kinetics Stefan Badelt Institute for Theoretical hemistry Theoretical Biochemistry roup stef@tbi.univie.ac.at February 17, 215 1 Outline BarMap.pm & SundialsWrapper.pm
More informationMachine Learning Summer School
Machine Learning Summer School Lecture 1: Introduction to Graphical Models Zoubin Ghahramani zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/ epartment of ngineering University of ambridge, UK
More informationNetworks as vectors of their motif frequencies and 2-norm distance as a measure of similarity
Networks as vectors of their motif frequencies and 2-norm distance as a measure of similarity CS322 Project Writeup Semih Salihoglu Stanford University 353 Serra Street Stanford, CA semih@stanford.edu
More informationSequence comparison: Score matrices
Sequence comparison: Score matrices http://facultywashingtonedu/jht/gs559_2013/ Genome 559: Introduction to Statistical and omputational Genomics Prof James H Thomas FYI - informal inductive proof of best
More informationTiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1
Tiffany Samaroo MB&B 452a December 8, 2003 Take Home Final Topic 1 Prior to 1970, protein and DNA sequence alignment was limited to visual comparison. This was a very tedious process; even proteins with
More information9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)
I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by
More information17 Non-collinear alignment Motivation A B C A B C A B C A B C D A C. This exposition is based on:
17 Non-collinear alignment This exposition is based on: 1. Darling, A.E., Mau, B., Perna, N.T. (2010) progressivemauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One 5(6):e11147.
More information4th IMPRS Astronomy Summer School Drawing Astrophysical Inferences from Data Sets
4th IMPRS Astronomy Summer School Drawing Astrophysical Inferences from Data Sets William H. Press The University of Texas at Austin Lecture 6 IMPRS Summer School 2009, Prof. William H. Press 1 Mixture
More information