RGP finder: prediction of Genomic Islands

Training courses on MicroScope platform

Dynamics of bacterial genomes
Gene gain: horizontal gene transfer
Gene loss: deletion of one or several genes
Duplication
Accumulation of pseudogenes
Recombination and rearrangements
Gene modification: Single Nucleotide Polymorphism (SNP)

Prediction of RGPs (Regions of Genomic Plasticity) using synteny breaks

Method: use BBH (Bidirectional Best Hits) and synteny (conservation of gene context) to determine conserved blocks between a query organism and a set of compared sequences. Predicted RGPs are regions of 5 kb or more lying between these blocks. RGPs can be sites of insertion of MGEs (Mobile Genetic Elements), or they can result from the deletion of particular DNA segments in one or more of the compared strains (Che et al., 2014, Pathogens).

[Figure: example of a synteny break. Labels: integrase, tRNA, compared organisms, synteny break = RGP.]
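The idea behind the method above can be illustrated with a minimal Python sketch. It is not the MicroScope implementation: it assumes the BBH and synteny computation has already been done and reduced to a single flag per gene (`in_synteny`), it treats the replicon as linear, and the names `Gene`, `find_rgps` and `MIN_RGP_LENGTH` are hypothetical. It only shows how runs of genes outside conserved blocks can be kept when they span 5 kb or more.

```python
from dataclasses import dataclass

MIN_RGP_LENGTH = 5_000  # bp; size threshold taken from the slide (assumed inclusive)


@dataclass
class Gene:
    name: str
    start: int        # 1-based start position on the replicon
    end: int          # inclusive end position
    in_synteny: bool  # True if the gene belongs to a conserved block (assumed precomputed)


def _keep_if_long_enough(run, rgps, min_length):
    """Append the run of genes to rgps if it spans at least min_length bp."""
    if not run:
        return
    start = run[0].start
    end = max(g.end for g in run)
    if end - start + 1 >= min_length:
        rgps.append((start, end, [g.name for g in run]))


def find_rgps(genes, min_length=MIN_RGP_LENGTH):
    """Return (start, end, gene_names) for each run of non-syntenic genes
    that spans at least min_length bp. The replicon is treated as linear."""
    rgps = []
    run = []
    for gene in sorted(genes, key=lambda g: g.start):
        if gene.in_synteny:
            # A conserved gene closes the current candidate region.
            _keep_if_long_enough(run, rgps, min_length)
            run = []
        else:
            run.append(gene)
    _keep_if_long_enough(run, rgps, min_length)
    return rgps


# Toy data, invented for illustration only.
genes = [
    Gene("g1", 1, 1000, True),
    Gene("g2", 1100, 3000, False),
    Gene("g3", 3100, 7000, False),
    Gene("g4", 7100, 8000, True),
    Gene("g5", 8100, 9000, False),
    Gene("g6", 9100, 10000, True),
]
rgps = find_rgps(genes)
print(rgps)
```

In this toy example the run g2-g3 spans 5901 bp and is reported, while the single non-syntenic gene g5 (901 bp) is discarded because it falls below the size threshold.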

Genomic island characteristics
Insertion hotspots: tRNA, integrases
Composition bias: GC % deviation
Presence of mobility genes: phages, IS, transposases
Direct repeats (DR)
(Hacker et al., 2001; Dobrindt et al., 2004)

Compared organisms selection
Choose one or several organism(s) from the PkGDB and/or RefSeq databases. The selection table gives, for each organism, the percentage of genes conserved in synteny with the query genome and the availability of compositional results. Try to choose related organisms, to avoid the many rearrangements found with distant species.
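The GC % deviation listed among the genomic island characteristics can be sketched as follows, using the definition given for the Circular Genome View in this document (GC of the window minus the mean GC, in a 1000 bp window). This is a minimal illustration, not the platform's code: the function names are hypothetical, and the use of non-overlapping windows is an assumption, since the slides do not say whether the windows slide.

```python
def gc_percent(seq):
    """Percentage of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_deviation(seq, window=1000):
    """GC % of each window minus the mean GC % of the whole sequence.

    Windows are non-overlapping (an assumption); the last one may be shorter.
    """
    mean_gc = gc_percent(seq)
    return [
        gc_percent(seq[i:i + window]) - mean_gc
        for i in range(0, len(seq), window)
    ]


# Toy sequence, invented for illustration: a GC-rich half and an AT-rich half.
toy = "GC" * 10 + "AT" * 10
deviations = gc_deviation(toy, window=20)
print(deviations)
```

A region whose windows deviate strongly from zero has a base composition unlike the rest of the genome, which is the kind of signal the compositional methods look for.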

Results: circular view
[Figure: circular view. Labels: tRNA, predicted RGP, compositional bias method results, query-specific region.]
Additional compositional methods used:
SIGI-HMM: codon usage bias
IVOM: variable-length k-mer bias

Results: RGP description
[Figure: RGP description page. Labels: predicted RGP details, other compositional bias methods results.]
Specificity percentage: % of CDSs in the RGP that are not in synteny
Table: features associated with RGPs and feature score (arbitrary score for sorting the

Explore and viewer pages
[Figure: RGP explore and viewer pages. Labels: CDSs inside RGP coordinates, compositional methods correspondence, similarity summary.]
Red = no match or constraints not verified; green = match and constraints verified.

Exercises
Using Acinetobacter baumannii AYE:
Exercise 1: find the Regions of Genomic Plasticity in AYE compared to Acinetobacter baylyi ADP1. How many regions are predicted? Take a look at the longest predicted region: how many antibiotic resistance genes are present? Try to find this genomic region on the Circular Genome Viewer.

Other tools

Identical Gene Name (access in MaGe menu)
Provides a list of genes that share identical gene names, i.e. two or more genes with the same name.

Overlapping CDS (access in MaGe menu)
List of CDSs that overlap, at their 5' end, with the following CDS. This list is useful for removing artefactual CDSs (false positives) and/or correcting the position of the translational start codon. Columns: length of the overlap, gene 1 of the overlap, gene 2 of the overlap.

Expert Annotation Summary (access in MaGe menu)
General overview of the EXPERT annotations performed on protein-coding genes for the selected genome: number of validated CDSs and their distribution by Product Type, Cellular Localization category and Evidence.

EC Number Update (access in MaGe menu)
This interface lists the EC numbers entered by the user during expert annotation that are no longer valid in the ENZYME resource. Columns: genes modified, old EC number definition, new EC number for this enzymatic function.

Old > New Labels (access in MaGe menu)
Provides CDS label (i.e. locus_tag) correspondences between a new version of the genome being annotated/analysed (as the sequencing progresses) and the old one(s).

Genome Overview -1- (access in Genomic Tools menu)
Provides general statistical data about a replicon, such as sequence features, genomic object number and type, rRNA/tRNA classification, annotation class and contig coordinates (in the case of multi-contig sequences).

Genome Overview -2- (access in Genomic Tools menu)
General overview of ALL annotations performed on protein-coding genes for the selected genome: number of validated CDSs and their distribution by Product Type, Cellular Localization category and Evidence.

Circular Genome View (access in Genomic Tools menu)
Tracks shown on the map:
GC percent deviation (GC window - mean GC) in a 1000 bp window
Predicted CDSs transcribed in the clockwise direction
Predicted CDSs transcribed in the counterclockwise direction
GC skew ((G-C)/(G+C)) in a 1000 bp window
rRNA (blue), tRNA (green), misc_RNA (orange), transposable elements (pink) and pseudogenes (grey)

CGView produces high-quality, zoomable maps of circular genomes. Starting with information on one genome and the features to visualize, CGView converts the input into a graphical map (PNG, JPG, or SVG format) and completes it with labels, a title, legends, and footnotes. In addition to the default full-view map, the program can generate a series of hyperlinked maps showing expanded views.
Reference: Stothard P, Wishart DS. Circular genome visualization and exploration using CGView. Bioinformatics. 2005 Feb 15;21(4):537-9.

Tandem Duplications (access in Genomic Tools menu)
List of genomic regions containing tandem duplications of protein-coding genes. Tandem duplicated genes have an identity of at least 35% with a minLrap of at least 0.8, and are separated by a maximum of 5 consecutive genes.
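The tandem duplication criteria above can be expressed as a simple filter over pairwise protein comparisons. The sketch below is illustrative only and not the MicroScope code: it assumes the pairwise results (percent identity and minLrap) are already available in a dictionary, it reads both thresholds as inclusive lower bounds (the comparison signs are lost in the slide text), and the names `find_tandem_duplications`, `gene_order` and `hits` are hypothetical.

```python
MIN_IDENTITY = 35.0   # percent identity threshold (assumed to be a lower bound)
MIN_LRAP = 0.8        # minLrap threshold (assumed to be a lower bound)
MAX_INTERVENING = 5   # maximum number of genes lying between the two copies


def find_tandem_duplications(gene_order, hits):
    """Return gene pairs that satisfy the tandem duplication criteria.

    gene_order: list of gene labels in chromosomal order.
    hits: dict mapping (gene_a, gene_b) -> (identity_percent, min_lrap),
          assumed to come from a precomputed pairwise comparison.
    """
    position = {gene: index for index, gene in enumerate(gene_order)}
    pairs = []
    for pair, (identity, min_lrap) in hits.items():
        first, second = sorted(pair, key=position.get)
        intervening = position[second] - position[first] - 1
        if (identity >= MIN_IDENTITY
                and min_lrap >= MIN_LRAP
                and intervening <= MAX_INTERVENING):
            pairs.append((first, second))
    # Report pairs in chromosomal order of their first member.
    return sorted(pairs, key=lambda p: position[p[0]])


# Toy data, invented for illustration only.
gene_order = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
hits = {
    ("a", "c"): (60.0, 0.90),  # similar and close: kept
    ("b", "i"): (80.0, 0.95),  # similar but 6 genes apart: rejected
    ("d", "e"): (30.0, 0.90),  # identity too low: rejected
    ("f", "g"): (50.0, 0.50),  # minLrap too low: rejected
}
tandems = find_tandem_duplications(gene_order, hits)
print(tandems)
```

Grouping the retained pairs into regions (for example, by merging pairs that share a gene) would be a further step that this sketch leaves out.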

Non-Ribosomal Peptide / Polyketide Synthetase
NRPS/PKS protein prediction (2metdb; Bachmann and Ravel, 2009, Methods in Enzymology).

COG Automatic Classification (access in Genomic Tools menu)
Statistical distribution of the protein-coding genes of the selected genome across the COG functional categories. These values are computed from the automatic results obtained with the COGNiTOR software: http://www.ncbi.nlm.nih.gov/cog/

Minimal Gene Set (access in Genomic Tools menu)
The Minimal Gene Set includes well-conserved housekeeping genes for basic metabolism and macromolecular synthesis, many of which are essential genes. The list of these genes is taken from Gil R, Silva FJ, Peretó J, Moya A. Determination of the core of a minimal bacterial gene set. Microbiol Mol Biol Rev. 2004 Sep;68(3):518-37.

Fusion/Fission prediction tool (access in Comparative Genomics menu)
Columns: number of genomes wherein genes are merged, genes merged in other genomes, number of genomes wherein the gene is split, genes split in other genomes.

LinePlot -1- (access in Comparative Genomics menu)
This tool draws a global comparison between two bacterial genomes, based on synteny results (whose size can be selected by the user). The picture gives an overview of the conservation of synteny groups between the query genome and another genome chosen from those available in our PkGDB database. Options: first organism of comparison, genomic object to display, second organism of comparison.

LinePlot -2- (access in Comparative Genomics menu)

PkGDB/RefSeq Synteny Statistics (access in Comparative Genomics menu)
Columns: CDSs of the reference sequence in synteny with this comparison replicon, synton statistics, CDSs of the comparison replicon in synteny with this reference sequence, replicon of comparison, number of CDSs in the replicon of comparison.

BLAST Searches (access in Searches menu)
Options: nucleic or protein search, pattern type (Prosite), identity and alignment length constraints, query pattern/sequence, sequences to compare.

Export Data (access in Export menu)
Available exports:
Different file formats for a Genome, Region or CDS
COG classification for all CDSs
Role and BioProcess lists
Metabolic database of this genome in the BioCyc format
All non-coding sequences
Sequence for given positions
Sequence for a given CDS