Training courses on the MicroScope platform
RGP finder: prediction of genomic islands

Dynamics of bacterial genomes
Gene gain: horizontal gene transfer
Gene loss: deletion of one or several genes
Duplication
Accumulation of pseudogenes
Recombination and rearrangements
Gene modification: Single Nucleotide Polymorphism (SNP)
Prediction of RGPs (Regions of Genomic Plasticity) using synteny breaks
Method: BBH (Bidirectional Best Hits) and synteny (conservation of gene context) are used to determine conserved blocks between a query organism and a set of compared sequences. Predicted RGPs are the regions of at least 5 kb located between these blocks. RGPs can be insertion sites of MGEs (Mobile Genetic Elements), or they can result from the deletion of particular DNA segments in one or more of the compared strains (Che et al., 2014, Pathogens).
[Figure: example of a synteny break; labels: integrase, tRNA, compared organisms. Synteny break = RGP]
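The two steps above (BBH, then gaps between conserved blocks) can be sketched as follows. This is a minimal illustration, assuming best-hit tables have already been computed (e.g. from BLAST) and that conserved blocks are given as query coordinates; the function names and the 5000 bp default are illustrative, and RGP finder's actual block-building rules are not reproduced here.

```python
from typing import Dict, List, Tuple


def bidirectional_best_hits(q_to_s: Dict[str, str], s_to_q: Dict[str, str]) -> List[Tuple[str, str]]:
    """Keep (query gene, subject gene) pairs that are each other's best hit."""
    return [(q, s) for q, s in q_to_s.items() if s_to_q.get(s) == q]


def rgp_candidates(blocks: List[Tuple[int, int]], min_len: int = 5000) -> List[Tuple[int, int]]:
    """Return gaps of at least min_len bp between conserved blocks.

    blocks: (start, end) query coordinates, 1-based and inclusive.
    Overlapping or nested blocks are merged implicitly by tracking the
    furthest block end seen so far.
    """
    regions: List[Tuple[int, int]] = []
    covered_end = None
    for start, end in sorted(blocks):
        if covered_end is not None and start - covered_end - 1 >= min_len:
            regions.append((covered_end + 1, start - 1))
        covered_end = end if covered_end is None else max(covered_end, end)
    return regions


# Toy data: q1/s1 is a BBH, q2/s2 is not (s2's best hit is q7).
print(bidirectional_best_hits({"q1": "s1", "q2": "s2"}, {"s1": "q1", "s2": "q7"}))  # [('q1', 's1')]
# Only the 7 kb gap between the first two blocks passes the 5 kb threshold.
print(rgp_candidates([(1, 1000), (8001, 9000), (9500, 12000)]))  # [(1001, 8000)]
```

Gaps are measured on the query only, so a gap may reflect an insertion in the query or a deletion in the compared strains; regions before the first block and after the last block are not reported by this sketch.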
Genomic island characteristics
Insertion hotspots: tRNAs, integrases
Composition bias: GC % deviation
Presence of mobility genes: phages, IS (insertion sequences), transposases
Direct repeats (DR)
(Hacker et al., 2001; Dobrindt et al., 2004)

Selection of compared organisms
Criteria: percentage of genes conserved in synteny with the query genome; availability of compositional results.
Choose one or several organisms from the PkGDB and/or RefSeq databases.
Try to choose related organisms, to avoid the many rearrangements found between distant species.
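The GC % deviation criterion listed above can be illustrated with a short sketch. It assumes non-overlapping windows of 1000 bp (the window size quoted for the Circular Genome View) and a deviation defined as window GC % minus the genome mean; the threshold in atypical_windows is a hypothetical parameter, not a MicroScope setting.

```python
from typing import List, Tuple


def gc_percent(seq: str) -> float:
    """G+C content of seq in percent (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def gc_deviation(seq: str, window: int = 1000) -> List[Tuple[int, float]]:
    """(1-based window start, window GC% - mean GC%) for non-overlapping windows."""
    mean_gc = gc_percent(seq)
    return [
        (start + 1, gc_percent(seq[start:start + window]) - mean_gc)
        for start in range(0, len(seq), window)
    ]


def atypical_windows(deviations: List[Tuple[int, float]], threshold: float) -> List[int]:
    """Start positions of windows whose absolute deviation reaches the (hypothetical) threshold."""
    return [start for start, dev in deviations if abs(dev) >= threshold]


toy = "GC" * 500 + "AT" * 500  # 1 kb at 100% GC followed by 1 kb at 0% GC
print(gc_deviation(toy))  # [(1, 50.0), (1001, -50.0)]
```

Ambiguous bases (e.g. N) count in the window length here, which slightly lowers GC % in gapped regions.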
Results: circular view
[Figure: circular view with tracks for tRNAs, predicted RGPs, compositional bias method results and query-specific regions]
Additional compositional methods used:
SIGI-HMM: codon usage bias
IVOM: variable-length k-mer bias
Results: RGP description
[Figure: details of a predicted RGP, with the results of the other compositional bias methods]
Specificity percentage: % of CDSs in the RGP that are not in synteny (table Features associated with RGPs and feature score (arbitrary score for sorting the
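The specificity percentage described above can be computed as below. This sketch assumes one reading of "not in synteny": a CDS counts as specific when it is in synteny in none of the compared organisms. The input structure (a set of syntenic CDS labels per organism) is hypothetical.

```python
from typing import Dict, List, Set


def specificity_percentage(rgp_cds: List[str], synteny_by_org: Dict[str, Set[str]]) -> float:
    """Percentage of the RGP's CDSs found in synteny in none of the compared organisms."""
    if not rgp_cds:
        return 0.0
    # Union of CDSs in synteny with at least one compared organism.
    in_synteny: Set[str] = set().union(*synteny_by_org.values())
    specific = [cds for cds in rgp_cds if cds not in in_synteny]
    return 100.0 * len(specific) / len(rgp_cds)


print(specificity_percentage(["cds1", "cds2", "cds3", "cds4"], {"orgA": {"cds2"}, "orgB": {"cds2", "cds9"}}))  # 75.0
```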
Explore and viewer pages
[Figure: RGP detail page, with callouts for the CDSs inside the RGP, the RGP coordinates, the correspondence with compositional methods and the similarity summary]
Similarity summary: red = no match or constraints not verified; green = match and constraints verified.

Exercises
Using Acinetobacter baumannii AYE:
Exercise 1: find the Regions of Genomic Plasticity in AYE compared to Acinetobacter baylyi ADP1.
How many regions are predicted?
Take a look at the longest predicted region: how many antibiotic resistance genes does it contain?
Try to find this genomic region on the Circular Genome Viewer.
Other tools

Identical Gene Name
Access: MaGe menu
Provides a list of genes that share identical gene names, i.e. groups of two or more genes with the same name.
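The grouping behind an identical-gene-name list can be sketched in a few lines. The sketch assumes a mapping from CDS label to gene name (empty when the CDS has no name); labels and names in the example are made up.

```python
from collections import defaultdict
from typing import Dict, List


def genes_sharing_names(gene_names: Dict[str, str]) -> Dict[str, List[str]]:
    """Group CDS labels by gene name; keep names carried by two or more CDSs."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for label, name in gene_names.items():
        if name:  # skip CDSs without a gene name
            groups[name].append(label)
    return {name: sorted(labels) for name, labels in groups.items() if len(labels) >= 2}


print(genes_sharing_names({"cds1": "dnaA", "cds2": "tnpA", "cds3": "tnpA", "cds4": ""}))
# {'tnpA': ['cds2', 'cds3']}
```

Names are compared exactly here; normalizing case or stripping suffixes would be a separate design choice.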
Overlapping CDS
Access: MaGe menu
List of CDSs that overlap, at their 5' extremity, with the following CDS. This list is useful for removing artefactual CDSs (false positives) and/or correcting the position of the translation start codon.
Columns: length of the overlap; gene 1 of the overlap; gene 2 of the overlap.

Expert Annotation Summary
Access: MaGe menu
General overview of the EXPERT annotations performed on protein-coding genes for the selected genome: number of validated CDSs and their distribution across Product Type, Cellular Localization and Evidence categories.
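The overlap report described above can be sketched as follows, assuming CDS coordinates are 1-based and inclusive with start <= end, and that only consecutive CDSs (sorted by start) are compared. The extra field listing which gene's 5' end falls inside the overlap is an illustrative addition, not a documented MicroScope column.

```python
from typing import List, NamedTuple, Tuple


class CDS(NamedTuple):
    label: str
    start: int   # 1-based, inclusive, start <= end
    end: int
    strand: str  # "+" or "-"


def five_prime_end(cds: CDS) -> int:
    """Coordinate of the CDS 5' end (start codon side), which depends on the strand."""
    return cds.start if cds.strand == "+" else cds.end


def overlapping_cds(cds_list: List[CDS], min_overlap: int = 1) -> List[Tuple[int, str, str, Tuple[str, ...]]]:
    """(overlap length, gene 1, gene 2, genes whose 5' end lies in the overlap) for consecutive CDSs."""
    ordered = sorted(cds_list, key=lambda c: (c.start, c.end))
    report = []
    for gene1, gene2 in zip(ordered, ordered[1:]):
        ov_start, ov_end = gene2.start, min(gene1.end, gene2.end)
        length = ov_end - ov_start + 1
        if length >= min_overlap:
            at_5prime = tuple(c.label for c in (gene1, gene2) if ov_start <= five_prime_end(c) <= ov_end)
            report.append((length, gene1.label, gene2.label, at_5prime))
    return report


genes = [CDS("g1", 1, 900, "+"), CDS("g2", 850, 1500, "+"), CDS("g3", 1600, 2000, "-")]
print(overlapping_cds(genes))  # [(51, 'g1', 'g2', ('g2',))]
```

A long overlap at a gene's 5' end is the case where a shifted start codon or a spurious CDS is most worth checking.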
EC Number Update
Access: MaGe menu
This interface lists the EC numbers entered by the user during expert annotation that are no longer valid in the ENZYME resource.
Columns: genes modified; old EC number definition; new EC number for this enzymatic function.

Old > New Labels
Access: MaGe menu
Provides CDS label (i.e. locus_tag) correspondences between a new version of the genome being annotated/analysed (as the sequencing progresses) and the old one(s).
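One way to find outdated EC numbers is to read transferred and deleted entries from the ENZYME flat file. The sketch below assumes the enzyme.dat layout (two-letter line codes such as ID and DE followed by three spaces, records ending with //) and DE text starting with "Transferred entry:" or "Deleted entry"; it is not MicroScope's implementation, and the 9.9.9.x numbers in the example are made up.

```python
import re
from typing import Dict, List, Tuple

EC_PATTERN = re.compile(r"\d+\.\d+\.\d+\.n?\d+")


def parse_obsolete_ec(enzyme_dat: str) -> Dict[str, List[str]]:
    """Map each transferred or deleted EC number to its replacement(s); deleted entries map to []."""
    obsolete: Dict[str, List[str]] = {}
    current_id = None
    description: List[str] = []
    for line in enzyme_dat.splitlines():
        if line.startswith("ID   "):
            current_id, description = line[5:].strip(), []
        elif line.startswith("DE   "):
            description.append(line[5:].strip())  # DE can continue over several lines
        elif line.startswith("//") and current_id:
            text = " ".join(description)
            if text.startswith("Transferred entry:"):
                obsolete[current_id] = EC_PATTERN.findall(text)
            elif text.startswith("Deleted entry"):
                obsolete[current_id] = []
            current_id = None
    return obsolete


def annotations_to_update(gene_ec: Dict[str, str], obsolete: Dict[str, List[str]]) -> List[Tuple[str, str, List[str]]]:
    """(gene, old EC, new EC numbers) for every annotation using an obsolete EC number."""
    return [(gene, ec, obsolete[ec]) for gene, ec in gene_ec.items() if ec in obsolete]


toy_dat = "ID   9.9.9.1\nDE   Transferred entry: 9.9.9.2 and 9.9.9.3.\n//\n"
print(annotations_to_update({"geneA": "9.9.9.1"}, parse_obsolete_ec(toy_dat)))
# [('geneA', '9.9.9.1', ['9.9.9.2', '9.9.9.3'])]
```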
Genome Overview -1-
Access: Genomic Tools menu
Provides general statistics about a replicon, such as sequence features, number and type of genomic objects, rRNA/tRNA classification, annotation class and contig coordinates (for multi-contig sequences).

Genome Overview -2-
Access: Genomic Tools menu
General overview of ALL annotations performed on protein-coding genes for the selected genome: number of validated CDSs and their distribution across Product Type, Cellular Localization and Evidence categories.
Circular Genome View
Access: Genomic Tools menu
Legend:
GC percent deviation (GC in window - mean GC) in a 1000 bp window
Predicted CDSs transcribed in the clockwise direction
Predicted CDSs transcribed in the counterclockwise direction
GC skew ((G - C)/(G + C)) in a 1000 bp window
rRNA (blue), tRNA (green), misc_RNA (orange)
Transposable elements (pink) and pseudogenes (grey)

CGView produces high-quality, zoomable maps of circular genomes. Starting from information on one genome and the features to visualize, CGView converts the input into a graphical map (PNG, JPG or SVG format) and completes it with labels, a title, legends and footnotes. In addition to the default full-view map, the program can generate a series of hyperlinked maps showing expanded views.
Reference: Stothard P, Wishart DS. "Circular genome visualization and exploration using CGView." Bioinformatics. 2005 Feb 15;21(4):537-9.

Tandem Duplications
Access: Genomic Tools menu
List of genomic regions containing tandem duplications of protein-coding genes. Tandem duplicated genes have at least 35% identity with a minlrap of at least 0.8, and are separated by at most 5 consecutive genes.
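The tandem-duplication criteria stated above (identity, minlrap, at most 5 genes in between) can be sketched as follows. The sketch assumes pairwise protein similarity results are already available as (identity %, minlrap) per gene pair and that genes are listed in replicon order; union-find merges transitively linked copies into one region. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def tandem_duplications(
    order: List[str],
    hits: Dict[Tuple[str, str], Tuple[float, float]],
    min_identity: float = 35.0,
    min_lrap: float = 0.8,
    max_between: int = 5,
) -> List[List[str]]:
    """Group genes linked by similar, nearby copies into tandem-duplication regions."""
    position = {gene: i for i, gene in enumerate(order)}
    parent = {gene: gene for gene in order}

    def find(gene: str) -> str:
        # Union-find root lookup with path halving.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for (a, b), (identity, lrap) in hits.items():
        if a == b or a not in position or b not in position:
            continue
        genes_between = abs(position[a] - position[b]) - 1
        if identity >= min_identity and lrap >= min_lrap and genes_between <= max_between:
            parent[find(a)] = find(b)

    regions: Dict[str, List[str]] = defaultdict(list)
    for gene in order:  # iterate in replicon order so members stay sorted
        regions[find(gene)].append(gene)
    return [members for members in regions.values() if len(members) > 1]


genes = [f"g{i}" for i in range(1, 11)]
pairs = {("g1", "g3"): (60.0, 0.95), ("g3", "g4"): (40.0, 0.85), ("g1", "g10"): (90.0, 0.99)}
print(tandem_duplications(genes, pairs))  # [['g1', 'g3', 'g4']]
```

In the example, g1 and g10 are similar but 8 genes apart, so they are not reported as a tandem pair.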
Non-Ribosomal Peptide Synthetase / Polyketide Synthase (NRPS/PKS)
NRPS/PKS protein prediction (2metdb; Bachmann and Ravel, 2009, Methods in Enzymology)

COG Automatic Classification
Access: Genomic Tools menu
Statistical distribution of the protein-coding genes of the selected genome across the COG functional categories. These values are computed from the automatic results obtained with the COGNiTOR software (http://www.ncbi.nlm.nih.gov/cog/).
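A distribution over COG functional categories can be tallied as below. The sketch assumes each gene's automatic COG result has been reduced to its category letters ("" when there is no COG hit) and that percentages are taken over all protein-coding genes; how MicroScope treats multi-letter assignments is not stated here, so counting such a gene once per letter is an assumption.

```python
from collections import Counter
from typing import Dict


def cog_distribution(gene_categories: Dict[str, str]) -> Dict[str, float]:
    """Percentage of genes assigned to each COG functional category letter.

    A gene with several letters (e.g. "KL") is counted once in each category,
    so percentages can add up to more than 100.
    """
    total = len(gene_categories)
    if total == 0:
        return {}
    counts = Counter(letter for letters in gene_categories.values() for letter in set(letters))
    return {letter: 100.0 * n / total for letter, n in sorted(counts.items())}


print(cog_distribution({"g1": "J", "g2": "KL", "g3": "J", "g4": ""}))
# {'J': 50.0, 'K': 25.0, 'L': 25.0}
```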
Minimal Gene Set
Access: Genomic Tools menu
The Minimal Gene Set includes well-conserved housekeeping genes for basic metabolism and macromolecular synthesis, many of which are essential genes. The list of these genes is taken from Gil R, Silva FJ, Peretó J, Moya A. Determination of the core of a minimal bacterial gene set. Microbiol Mol Biol Rev. 2004 Sep;68(3):518-37.

Fusion/Fission prediction tool
Access: Comparative Genomics menu
Columns: number of genomes in which the genes are merged; genes merged in other genomes; number of genomes in which the gene is split; genes split in other genomes.
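The slides do not describe how fusions and fissions are detected, so the sketch below uses a common heuristic as an assumption: a query gene is a candidate fusion (merged) when two different genes of a compared genome each align to a separate, largely non-overlapping part of it, and those two genes are its split counterparts. The Hit structure and the min_cover and max_overlap thresholds are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    subject: str  # gene of the compared genome
    q_start: int  # aligned segment on the query protein (1-based, inclusive)
    q_end: int


def split_counterparts(
    query_len: int, hits: List[Hit], min_cover: float = 0.2, max_overlap: int = 10
) -> List[Tuple[str, str]]:
    """Pairs of distinct subject genes that align to separate parts of the query.

    A non-empty result suggests the query gene is merged relative to the
    compared genome, where the same sequence is split over two genes.
    """
    # Ignore short hits that cover too little of the query.
    long_hits = sorted(
        (h for h in hits if (h.q_end - h.q_start + 1) / query_len >= min_cover),
        key=lambda h: h.q_start,
    )
    pairs = []
    for i, first in enumerate(long_hits):
        for second in long_hits[i + 1:]:
            overlap = min(first.q_end, second.q_end) - second.q_start + 1
            if first.subject != second.subject and overlap <= max_overlap:
                pairs.append((first.subject, second.subject))
    return pairs


print(split_counterparts(600, [Hit("s1", 1, 290), Hit("s2", 300, 600), Hit("s3", 10, 40)]))
# [('s1', 's2')]
```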
LinePlot -1-
Access: Comparative Genomics menu
This tool draws a global comparison between 2 bacterial genomes, based on synteny results (the synteny group size can be selected by the user). The picture gives an overview of the conservation of synteny groups between the query genome and another genome chosen from those available in our PkGDB database.
Fields: first organism of comparison; genomic object to display; second organism of comparison.

LinePlot -2-
Access: Comparative Genomics menu
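The synteny groups that LinePlot summarizes can be illustrated with a greedy chaining sketch. It assumes each BBH is given as (query gene index, subject gene index) along the two replicons; max_gap and min_size are hypothetical parameters, min_size standing in for the user-selectable synteny size. This is an illustration, not MicroScope's synteny algorithm: it makes one pass and does not handle several hits per gene.

```python
from typing import List, Tuple

Pair = Tuple[int, int]  # (query gene index, subject gene index) of one BBH


def synteny_groups(pairs: List[Pair], max_gap: int = 1, min_size: int = 3) -> List[List[Pair]]:
    """Greedily chain BBH pairs that stay collinear (same or inverted order).

    Consecutive pairs may skip up to max_gap genes on either genome; chains
    shorter than min_size pairs are discarded.
    """
    groups: List[List[Pair]] = []
    current: List[Pair] = []
    direction = 0  # +1 same orientation, -1 inverted, 0 not yet known
    for q, s in sorted(pairs):
        if current:
            prev_q, prev_s = current[-1]
            dq, ds = q - prev_q, s - prev_s
            step = 1 if ds > 0 else -1
            if 0 < dq <= max_gap + 1 and 0 < abs(ds) <= max_gap + 1 and direction in (0, step):
                current.append((q, s))
                direction = step
                continue
            if len(current) >= min_size:
                groups.append(current)
        current, direction = [(q, s)], 0
    if len(current) >= min_size:
        groups.append(current)
    return groups


bbh = [(1, 10), (2, 11), (3, 12), (4, 50), (5, 49), (6, 48), (7, 200)]
print(synteny_groups(bbh))
# [[(1, 10), (2, 11), (3, 12)], [(4, 50), (5, 49), (6, 48)]]
```

The second group is collinear but inverted, which a line plot would typically draw as a segment with the opposite slope.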
PkGDB/RefSeq Synteny Statistics
Access: Comparative Genomics menu
Table labels: CDSs of the reference sequence in synteny with this comparison replicon; synton statistics; CDSs of the comparison replicon in synteny with this reference sequence; replicon of comparison; number of CDSs in the replicon of comparison.

BLAST Searches
Access: Searches menu
Search options: nucleic or protein search; pattern type (Prosite); identity and alignment length constraints; query pattern/sequence; sequences to compare.
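Pattern searches of the Prosite type can be reproduced outside the interface by translating a pattern into a regular expression. The sketch below covers the basic PROSITE syntax (residues, x, [allowed], {forbidden}, repeats (n) or (n,m), leading < and trailing >); it is an assumption-based helper, not the tool used by MicroScope, and it rejects forms such as [G>].

```python
import re

# One PROSITE element: a residue, x, [allowed] or {forbidden}, optionally followed by (n) or (n,m).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")
    anchor_start, anchor_end = pattern.startswith("<"), pattern.endswith(">")
    pattern = pattern.lstrip("<").rstrip(">")
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            regex = "."
        elif residue.startswith("{"):
            regex = "[^" + residue[1:-1] + "]"
        else:
            regex = residue  # a single letter or a [..] class reads the same in both syntaxes
        if low is not None:
            regex += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(regex)
    return ("^" if anchor_start else "") + "".join(parts) + ("$" if anchor_end else "")


regex = prosite_to_regex("[AC]-x-V-x(4)-{ED}.")
print(regex)                                        # [AC].V.{4}[^ED]
print(re.search(regex, "MKAGVQQQQKL") is not None)  # True
```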
Export Data
Access: Export menu
Available exports:
Files in different formats for a genome, a region or a CDS
COG classification for all CDSs
Role and BioProcess lists
Metabolic database of this genome in the BioCyc format
All non-coding sequences
Sequence for given positions
Sequence for a given CDS
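Exporting the sequence for given positions, or for a CDS on either strand, comes down to slicing and reverse-complementing. This sketch assumes 1-based inclusive coordinates and a linear sequence (regions spanning the origin of a circular replicon are not handled); the helper names and the 60-column FASTA width are illustrative choices.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def extract_region(genome: str, start: int, end: int, strand: str = "+") -> str:
    """Sequence between 1-based inclusive positions, reverse-complemented for strand '-'."""
    if not 1 <= start <= end <= len(genome):
        raise ValueError("positions must satisfy 1 <= start <= end <= genome length")
    region = genome[start - 1:end]
    return region if strand == "+" else reverse_complement(region)


def to_fasta(header: str, seq: str, width: int = 60) -> str:
    """Format one record as FASTA, wrapping the sequence every `width` characters."""
    lines = [">" + header] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


genome = "ATGAAACCCGGGTTTTAA"
print(extract_region(genome, 1, 6))       # ATGAAA
print(extract_region(genome, 1, 6, "-"))  # TTTCAT
```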