Mitochondrial Genome Annotation

Similar documents
BLAST. Varieties of BLAST

Detecting non-coding RNA in Genomic Sequences

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Comparative Genomics II

RGP finder: prediction of Genomic Islands

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010

something about srna in archaea

Aligning Circularly Ordered Lists. Why would anyone want to do that? Peter F. STADLER

Using Ensembles of Hidden Markov Models for Grand Challenges in Bioinformatics

Supplementary Information

Hidden Markov Models (HMMs) and Profiles

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

PGA: A Program for Genome Annotation by Comparative Analysis of. Maximum Likelihood Phylogenies of Genes and Species

Bioinformatics Chapter 1. Introduction

DATA ACQUISITION FROM BIO-DATABASES AND BLAST. Natapol Pornputtapong 18 January 2018

BINF6201/8201. Molecular phylogenetic methods

Towards a Comprehensive Annotation of Structured RNAs in Drosophila

Comparative Gene Finding. BMI/CS 776 Spring 2015 Colin Dewey

(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid.

Mathangi Thiagarajan Rice Genome Annotation Workshop May 23rd, 2007

Lecture 14: Multiple Sequence Alignment (Gene Finding, Conserved Elements) Scribe: John Ekins

Genome Annotation. Qi Sun Bioinformatics Facility Cornell University

Cluster Analysis of Gene Expression Microarray Data. BIOL 495S/ CS 490B/ MATH 490B/ STAT 490B Introduction to Bioinformatics April 8, 2002

-max_target_seqs: maximum number of targets to report

A Method for Aligning RNA Secondary Structures

Sequence Alignment Techniques and Their Uses

SUPPLEMENTARY INFORMATION

Alignment principles and homology searching using (PSI-)BLAST. Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU)

SUPPLEMENTARY INFORMATION

Protein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki.

Model Accuracy Measures

Structure-Based Comparison of Biomolecules

Motivating the need for optimal sequence alignments...

Patterns and profiles applications of multiple alignments. Tore Samuelsson March 2013

Markov Chains and Hidden Markov Models. = stochastic, generative models

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

Tree Building Activity

Bioinformatics. Proteins II. - Pattern, Profile, & Structure Database Searching. Robert Latek, Ph.D. Bioinformatics, Biocomputing

Taxonomy. Content. How to determine & classify a species. Phylogeny and evolution

Early History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven)

Comparative Bioinformatics Midterm II Fall 2004

Basic Local Alignment Search Tool

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018

Predicting Protein Functions and Domain Interactions from Protein Interactions

GEP Annotation Report

Introduction to Evolutionary Concepts

Assigning Taxonomy to Marker Genes. Susan Huse Brown University August 7, 2014

Conserved RNA Structures. Ivo L. Hofacker. Institut for Theoretical Chemistry, University Vienna.

EECS730: Introduction to Bioinformatics

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

RNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology"

5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

17 Non-collinear alignment Motivation A B C A B C A B C A B C D A C. This exposition is based on:

From Gene to Protein

Quantitative Measurement of Genome-wide Protein Domain Co-occurrence of Transcription Factors

PHYLOGENY AND SYSTEMATICS

CISC 636 Computational Biology & Bioinformatics (Fall 2016)

Phylogenetic Tree Reconstruction

BIOINFORMATICS LAB AP BIOLOGY

Introduction to Bioinformatics

Organelle genome evolution

1. In most cases, genes code for and it is that

98 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 6, 2006

Class 4: Classification. Quaid Morris February 11 th, 2011 ML4Bio

Multiple Whole Genome Alignment

a-dB. Code assigned:

A. Incorrect! In the binomial naming convention the Kingdom is not part of the name.

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program)

Alignment Strategies for Large Scale Genome Alignments

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

Predicting RNA Secondary Structure Using Profile Stochastic Context-Free Grammars and Phylogenic Analysis

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Journal Club Kairi Raime

Evaluation & Credibility Issues

The Mitochondrion. Renáta Schipp

Mitochondria Mitochondria were first seen by kollicker in 1850 in muscles and called them sarcosomes. Flemming (1882) described these organelles as

Videos. Bozeman, transcription and translation: Crashcourse: Transcription and Translation -

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

Lesson Overview. Ribosomes and Protein Synthesis 13.2

Microbes usually have few distinguishing properties that relate them, so a hierarchical taxonomy mainly has not been possible.

Microbial Taxonomy. Slowly evolving molecules (e.g., rrna) used for large-scale structure; "fast- clock" molecules for fine-structure.

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Protein function prediction based on sequence analysis

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

EECS730: Introduction to Bioinformatics

CS612 - Algorithms in Bioinformatics

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

Computational methods for predicting protein-protein interactions

Part III - Bioinformatics Study of Aminoacyl trna Synthetases. VMD Multiseq Tutorial Web tools. Perth, Australia 2004 Computational Biology Workshop

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson

Sequence analysis and comparison

Accelerating Biomolecular Nuclear Magnetic Resonance Assignment with A*

PREDICTION OF HETERODIMERIC PROTEIN COMPLEXES FROM PROTEIN-PROTEIN INTERACTION NETWORKS USING DEEP LEARNING

Grundlagen der Systembiologie und der Modellierung epigenetischer Prozesse

Transcription:

Protein Genes 1,2 1 Institute of Bioinformatics University of Leipzig 2 Department of Bioinformatics Lebanese University TBI Bled 2015

Outline Introduction Mitochondrial DNA Problem Tools Training Annotation

Mitochondrial DNA Problem Mitochondrial DNA Circular molecule located in mitochondria within eurkaryotic cells. transform energy to a form used by the cells Length about 16500 nucleotide. 13 protein coding genes 22 trna genes 2 rrna [wikipedia]

Mitochondrial DNA Problem Problem Refseq is the most used repository for mitochondrial genome annotation Refseq suffers form several inconsistencies and errors in annotation Missing or incorrect strand Confusing trnl1/trnl2 trns1/trns2 Inconsistencies in gene names(bernt et. al 2012) Problem in developing automated analysis for mitochondrial data

Mitochondrial DNA Problem Objective Develop an automated pipeline for mtdna annotation by refining taxon specific hmm and covariance models.

Tools Training Annotation Tools HMMER an implementation of profile HMMs (Sean Eddy and his group ). hmmbuild build a model from multiple sequences hmmalign algin a model to sequences hmmsearch search a model in sequences database hmmscan search a genome in models database

Tools Training Annotation Training Data Build taxon specific models along the phylogenetic tree nodes Step 1: Build protein models for leaf sequences ABCD AB A B C Step 2: Recursively lift up the model with best score CD D Step 3: Build the models database

Tools Training Annotation Training Data Build taxon specific models along the phylogenetic tree nodes Step 1: Build protein models for leaf sequences ABCD AB M A A M B B M C C Step 2: Recursively lift up the model with best score CD M D D Step 3: Build the models database

Tools Training Annotation Training Data Build taxon specific models along the phylogenetic tree nodes Step 1: Build protein models for leaf sequences Step 2: Recursively lift up the model with best score ABCD M AB M CD AB CD M A A M B B M C C M D D Step 3: Build the models database

Tools Training Annotation Training Data Build taxon specific models along the phylogenetic tree nodes Step 1: Build protein models for leaf sequences Step 2: Recursively lift up the model with best score Step 3: Build the models database M AB M ABCD ABCD M CD AB CD M A A M B B M C C M D D

Tools Training Annotation Example: Part of nad1 Alignment

Tools Training Annotation Annotation Input Start Fasta Translate in 6 Frames Taxonomic level database Search DB Overlap no Output Bed yes Best hit

Train 3843 mt genome sequence from refseq63 Test on 925 genome which are newly annotated in refseq69 Scan against level models database

Phylum and Class Models Phylum models

Phylum and Class Models Phylum models Class models

Phylum and Class Models Sn = TP TP+FN Sp = TP TP+FP Phylum models Class models

Phylum and Class Models Sn = TP TP+FN Sp = TP TP+FP Phylum models Class models Sn = 0.986 Sp = 0.985 Sn = 0.989 Sp = 0.988

An automated pipeline to annotate protein coding genes in mtdna by refining taxon specific hmm Outlook Find the best level database and best parameters to minimize FN and maximize TP Analyse the results in deep to improve the refseq annotation Apply on trna

Thanks to Matthias Bernt Christian Honer Frank Juhling Abdullah Sahyoun Peter Stadler