MiGA: The Microbial Genome Atlas

December 12th, 2017. MiGA: The Microbial Genome Atlas. Jim Cole, Center for Microbial Ecology, Dept. of Plant, Soil & Microbial Sciences, Michigan State University, East Lansing, Michigan, U.S.A.

Where I'm From: Michigan State University, 50,000 students (11,000 graduate students). College of Agriculture & Natural Resources, Department of Plant, Soil & Microbial Sciences. Center for Microbial Ecology: studies the interactions of microbes with each other and with their environment. http://rdp.cme.msu.edu/

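Relationship to Health Sciences. Microbiome: the microorganisms in a particular environment (including the body or a part of the body). The often-quoted claim that only 10% of the cells in your body are human has since been revised; a 2016 estimate puts human and bacterial cell numbers at roughly 1:1. About 23,000 human genes vs. more than 1,000,000 genes in the human microbiome.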

Outline of My Presentation: Background Material; MiGA, the Microbial Genome Atlas.

One Representation of the Tree of Life

Can you name these bacteria? From: Ch. 2, "Terrestrial Bacteria from Agricultural Soils," by Masoomeh Shams-Ghahfarokhi, Sanaz Kalantari and Mehdi Razzaghi-Abyaneh. DOI: 10.5772/45918

Elucidation of the three domains of life: Carl Woese (1929-2012) used ribosomal RNA sequence as a phylogenetic marker and discovered a third kingdom. Archaea and Bacteria are separate domains, in contrast with the former prokaryote hypothesis.

Phylogenetic Tree of Life: three domains of life, based on the work of Carl Woese and colleagues.

Ribosomes: Universal Marker. Subunits: 30S (16S rRNA) and 50S (23S and 5S rRNA). The ribosome is the protein synthesis factory, a core function present in all cellular organisms, with very little evidence of horizontal gene transfer. rRNA was historically easy to work with: purify ribosomes by centrifugation and extract the rRNA. Now we use PCR to amplify rRNA genes from genomic DNA. rRNA genes have conserved regions interspersed with highly variable regions; the conserved regions are used for both PCR primers and sequencing primers.
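The conserved-region point can be made concrete with a degenerate primer search. This is a minimal sketch, not RDP code. It expands IUPAC ambiguity codes into a regular expression and reports exact matches. The 515F primer used below (GTGCCAGCMGCCGCGGTAA, a commonly used forward primer for the 16S V4 region) is an illustrative choice that does not appear on the slide. Real primer matching usually also tolerates a few mismatches.

```python
import re

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_to_regex(primer: str) -> str:
    """Translate a degenerate primer into a regex, e.g. 'M' -> '[AC]'."""
    parts = []
    for code in primer.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


def find_primer_sites(seq: str, primer: str) -> list:
    """Return 0-based start positions of exact (degeneracy-aware) matches."""
    # A zero-width lookahead lets overlapping matches be reported too.
    pattern = re.compile(f"(?=({primer_to_regex(primer)}))")
    return [m.start() for m in pattern.finditer(seq.upper())]


# Illustrative primer (assumption, not from the slide): 515F for the 16S V4 region.
PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"
toy_16s = "TTTT" + "GTGCCAGCAGCCGCGGTAA" + "GGGG"
print(find_primer_sites(toy_16s, PRIMER_515F))
```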

Diversity of uncultured organisms explored by rRNA sequencing: David A. Stahl, David J. Lane, Gary J. Olsen and Norman R. Pace, Science, New Series, Vol. 224, No. 4647 (Apr. 27, 1984), pp. 409-411. Published by the American Association for the Advancement of Science.

Hydrothermal Vent: Black Smoker

Explosion in rRNA Sequencing: By 2008, the majority of all bacterial sequences submitted to GenBank were 16S rRNA sequences, and less than 2% of these had a Latin name attached, valid or not (R. Christen, 2008).

Figure: Growth of rRNA data, Release 11.4: 3,333,501 sequences. Number of environmental and isolate sequences (in millions, 0 to 3.5) by year, 1992-2015.

Limits of rRNA Phylogeny: Slowly evolving, so it can't resolve species. Short sequence (~1,550 bases), so high random error. LSU rRNA can be added, but its database is limited.
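A back-of-envelope calculation shows why a ~1,550-base gene gives high random error. The sketch below assumes each aligned site is an independent Bernoulli trial, which is a simplification. It computes the binomial standard error of an observed difference. At 1% divergence over 1,550 sites, the approximate 95% interval is about ±0.5 percentage points. That is wide relative to the small differences that separate closely related species.

```python
import math


def identity_standard_error(p_diff: float, n_sites: int) -> float:
    """Binomial standard error of an observed fraction of differing sites.

    Assumes sites are independent and equally likely to differ, which real
    sequences are not; this is only an order-of-magnitude estimate.
    """
    return math.sqrt(p_diff * (1.0 - p_diff) / n_sites)


# Two sequences that differ at 1% of their aligned sites.
for n_sites in (1550, 3000, 1_500_000):
    half_width = 1.96 * identity_standard_error(0.01, n_sites)
    print(f"{n_sites:>9} sites: 1.00% difference, approx. 95% CI +/- {100 * half_width:.3f} points")
```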

Genes Beyond rRNA: rRNA genes are slowly evolving and present in multiple copies. Other single-copy conserved genes are faster evolving. Many important ecological functions are encoded by genes that are horizontally transferred, so their evolutionary history does not match that of rRNA.

rplB vs. 16S Pairwise Distances in One Order (RefSeq Genomes), Jiarong Guo
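Pairwise distances like those compared here can be computed from aligned sequences. The slide does not say which distance measure was used. As an assumption, this sketch computes two quantities:

- the uncorrected p-distance, skipping gap columns;
- its Jukes-Cantor correction, which applies to nucleotide alignments such as 16S rRNA.

Comparing rplB at the protein level would need an amino acid model instead.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned, equal-length
    sequences, ignoring columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance; undefined once p >= 0.75."""
    if p >= 0.75:
        raise ValueError("p-distance too large for Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = p_distance("ACGTACGTAC", "ACGTTCGTAC")  # one difference in ten sites
print(p, round(jukes_cantor(p), 4))
```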

1995 Haemophilus influenzae genome published

Bacterial and Archaeal Genomes from Cultured Organisms (INSDC, 9/4/2017; compare to >3 million rRNA genes):
8,227 complete genomes
1,469 genomes with gaps
46,773 scaffolds
49,569 contigs only
106,038 isolate genomes in total
It is now cheaper to obtain a draft genome than it was to obtain a single 16S rRNA sequence 15 years ago!

Microbial Genomes from Uncultured Organisms. Single-cell genomes: single microbial cells are separated before sequencing. Issues: incomplete genomes; enzymatic DNA amplification causes artifacts. Metagenomic binning: genomes are grouped from metagenomic assemblies. Issues: incomplete genomes; may mix allelic variants; contamination is an issue.

Objectives of the MiGA project: How would you taxonomically classify a novel genome? How would you build a novel classification for a collection of genomes? In other words, to build the genome equivalent of the Ribosomal Database Project (RDP), based on the ANI/AAI approach.

Multi-Gene Phylogenetic Analysis: Use additional universal marker genes (universal functions: transcription, translation, replication), chosen for lack of horizontal gene transfer. Unfortunately, few genes meet these criteria; 100-130 genes are commonly used. Alternatively, compare all genes common between each pair of organisms (Average Identity): this uses a larger part of the available genome and is robust to missing data (partial genomes).

Introduction to the Pangenome. From "Of Terms in Biology: The Pan-Genome" by Christoph Weigel, in Small Things Considered, June 12, 2014 (schaechter.asmblog.org).
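The pangenome vocabulary maps directly onto set operations:

- the core genome is the intersection of all genomes' gene families;
- the pangenome is their union;
- the remainder splits into accessory families (shared by some genomes) and unique families (found in only one).

This sketch uses toy gene-family labels. It assumes genes have already been grouped into families, for example by reciprocal best matches.

```python
def partition_pangenome(genomes: dict) -> dict:
    """Split gene families into pan, core, accessory and unique sets.

    `genomes` maps a genome name to the set of gene-family IDs it carries.
    """
    if len(genomes) < 2:
        raise ValueError("need at least two genomes")
    family_sets = list(genomes.values())
    pan = set().union(*family_sets)
    core = set.intersection(*family_sets)
    # A family is unique if exactly one genome carries it.
    unique = {fam for fam in pan if sum(fam in s for s in family_sets) == 1}
    # Everything else that is not in every genome is accessory.
    accessory = pan - core - unique
    return {"pan": pan, "core": core, "accessory": accessory, "unique": unique}


# Toy gene-family labels, for illustration only.
strains = {
    "strain_A": {"gyrB", "rpoB", "toxA"},
    "strain_B": {"gyrB", "rpoB", "toxA", "capB"},
    "strain_C": {"gyrB", "rpoB", "lacZ"},
}
for part, families in partition_pangenome(strains).items():
    print(part, sorted(families))
```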

Horizontal gene transfer occurs more readily between closely related organisms

Need to find comparable genes. Homologous (for the ANI method): shared ancestry exists between a pair of genes. Orthologous: inherited by two organisms from the same ancestral sequence (usually same function). Paralogous: originally created by a duplication event within a single genome (may have different functions).

Reciprocal Best Matches: Likely Orthologs (figure: Strain A genes vs. Strain B genes)

Best Matches Not Reciprocal: Potential Paralogs? (figure: Strain A genes vs. Strain B genes)
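The two reciprocal-best-match cases above can be sketched in a few lines. This assumes an all-vs-all search (e.g. BLAST) has already produced a score for each gene pair in both directions. The gene IDs and scores here are made up. Pairs whose best matches agree in both directions are reported as likely orthologs. One-way best matches, such as two strain A genes that both point at b2, are the potential paralogs.

```python
def best_hits(scores: dict) -> dict:
    """Map each query gene to its highest-scoring subject gene.

    `scores` maps (query_gene, subject_gene) -> score (e.g. a bit score).
    Ties keep the first pair encountered.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_matches(a_vs_b: dict, b_vs_a: dict):
    """Return (likely_orthologs, one_way_matches) as lists of (gene_a, gene_b)."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    likely_orthologs, one_way = [], []
    for gene_a, gene_b in sorted(best_ab.items()):
        if best_ba.get(gene_b) == gene_a:
            likely_orthologs.append((gene_a, gene_b))
        else:
            one_way.append((gene_a, gene_b))
    return likely_orthologs, one_way


# Made-up scores: a2 and a3 (say, a paralogous pair in strain A) both hit b2.
a_vs_b = {("a1", "b1"): 500, ("a1", "b2"): 120, ("a2", "b2"): 300, ("a3", "b2"): 280}
b_vs_a = {("b1", "a1"): 500, ("b2", "a2"): 300, ("b2", "a3"): 280}
print(reciprocal_best_matches(a_vs_b, b_vs_a))
```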

ANI: Average Nucleotide Identity. AAI: Average Amino Acid Identity. hAAI: Heuristic AAI implementation (Rodriguez-R & Konstantinidis 2016, PeerJ Preprint).
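Given reciprocal best matches between two proteomes, AAI is the mean identity of those matches after weak alignments are filtered out. The cutoffs below (at least 30% identity over at least 70% of the shorter protein) are an assumption: they are commonly used in AAI calculations, but may not be MiGA's settings. The heuristic hAAI shortcut is not modeled.

```python
from dataclasses import dataclass


@dataclass
class ProteinMatch:
    identity: float      # percent identity of the alignment
    aligned_length: int  # alignment length in amino acids
    shorter_length: int  # length of the shorter of the two proteins


def average_amino_acid_identity(matches, min_identity=30.0, min_coverage=0.7):
    """Mean identity of reciprocal best matches that pass both filters.

    Returns (aai_percent, number_of_matches_used).
    """
    kept = [
        m.identity
        for m in matches
        if m.identity >= min_identity
        and m.aligned_length >= min_coverage * m.shorter_length
    ]
    if not kept:
        raise ValueError("no matches passed the filters")
    return sum(kept) / len(kept), len(kept)


rbm = [
    ProteinMatch(80.0, 300, 310),
    ProteinMatch(70.0, 200, 210),
    ProteinMatch(25.0, 150, 160),  # dropped: identity below 30%
    ProteinMatch(90.0, 50, 400),   # dropped: covers only 12.5% of the protein
]
print(average_amino_acid_identity(rbm))
```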

Detect not-previously-described (novel) taxa. Figure: % of genome pairs in a taxonomic rank. Novel taxa are determined at the species, genus and phylum levels: novel species <95% AAI; novel genus <65% AAI; novel phylum <45% AAI.
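The thresholds on this slide can be turned into a simple lookup. This is a hard-cutoff simplification. The slide's figure suggests the cutoffs come from the distribution of AAI values among genome pairs at each rank. The slide also gives no thresholds for intermediate ranks (family, order, class), so a query between 45% and 65% AAI is simply reported here as a novel genus.

```python
from typing import Optional

# Cutoffs from the slide: best AAI (%) to any reference genome.
NOVELTY_CUTOFFS = [("phylum", 45.0), ("genus", 65.0), ("species", 95.0)]


def highest_novel_rank(best_aai: float) -> Optional[str]:
    """Return the highest rank at which the query appears novel, or None."""
    for rank, cutoff in NOVELTY_CUTOFFS:  # checked from the broadest rank down
        if best_aai < cutoff:
            return rank
    return None


for aai in (40.0, 58.0, 80.0, 97.5):
    print(aai, highest_novel_rank(aai))
```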

Average Nucleotide Identity: a replacement for DDH. "Among available genome relatedness indices, average nucleotide identity (ANI) is one of the most robust measurements of genomic relatedness between strains, and has great potential in the taxonomy of bacteria and archaea as a substitute for the labour-intensive DNA-DNA hybridization (DDH) technique." Kim et al., IJSEM, February 2014, vol. 64, Pt 2, pp. 346-351.
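The ANI calculation can be sketched in the style of the fragment-based method of Goris et al. (2007). This is an assumption, since the slide does not specify the procedure. The steps are:

1. Cut the query genome into consecutive 1,020-bp fragments.
2. Search each fragment against the reference. The search itself, e.g. BLASTN, is not shown.
3. Average the identities of hits that reach at least 30% identity over at least 70% of the fragment length.

```python
def fragment_genome(genome: str, size: int = 1020) -> list:
    """Cut a genome into consecutive, non-overlapping fragments of `size` bp.

    A trailing piece shorter than `size` is discarded.
    """
    return [genome[i:i + size] for i in range(0, len(genome) - size + 1, size)]


def one_way_ani(fragment_hits, size=1020, min_identity=30.0, min_coverage=0.7):
    """Mean identity over fragments whose best hit passes the filters.

    `fragment_hits` holds (percent_identity, aligned_length) for the best hit
    of each query fragment against the reference genome.
    """
    kept = [
        identity
        for identity, aligned in fragment_hits
        if identity >= min_identity and aligned >= min_coverage * size
    ]
    if not kept:
        raise ValueError("no fragment hits passed the filters")
    return sum(kept) / len(kept)


print(len(fragment_genome("ACGT" * 625)))  # a 2,500-bp toy genome
print(one_way_ani([(98.0, 1000), (96.0, 900), (50.0, 300)]))
```

The query genome's fragments drive this calculation, so ANI is often reported in both directions.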

MiGA uses average identity.

Hierarchical approach to genome classification (figure panels): (1) bacterium vs. archaeon, 1 CPU; (2) two E. coli genomes, 1 CPU.

Hierarchical approach to genome classification (figure panels): (1) in the NCBI RefSeq database; (2) phylum, genus, or distant species.
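The hierarchical idea can be sketched as a cascade: resolve distant pairs with a cheap comparison, and spend CPU time only on close pairs. The ordering (hAAI, then AAI, then ANI) follows the metrics named earlier. The 90% cutoffs and the stub functions are hypothetical placeholders, not MiGA's actual rules.

```python
from typing import Callable, Dict

Metric = Callable[[str, str], float]


def hierarchical_identity(query: str, reference: str, haai: Metric, aai: Metric,
                          ani: Metric, haai_cutoff: float = 90.0,
                          aai_cutoff: float = 90.0) -> Dict[str, float]:
    """Compute progressively finer (and costlier) identities, stopping early
    for distant pairs. Cutoff values are hypothetical placeholders."""
    results = {"hAAI": haai(query, reference)}
    if results["hAAI"] < haai_cutoff:
        return results  # distant pair, e.g. a bacterium vs. an archaeon
    results["AAI"] = aai(query, reference)
    if results["AAI"] < aai_cutoff:
        return results
    results["ANI"] = ani(query, reference)  # only close pairs reach ANI
    return results


def stub(value: float) -> Metric:
    """Hypothetical stand-in for a real hAAI/AAI/ANI calculation."""
    return lambda q, r: value


print(hierarchical_identity("g1", "g2", stub(40.0), stub(41.0), stub(70.0)))
print(hierarchical_identity("g1", "g3", stub(98.0), stub(97.5), stub(99.1)))
```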

Pre-clustering references: AAI or ANI distances, medoid clustering.
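Pre-clustering with medoids can be illustrated with the basic alternating k-medoids algorithm on a distance matrix (for example, 1 - ANI/100). This is a generic sketch with a caller-supplied starting set of medoids. The slide does not state which medoid algorithm or initialization MiGA uses, and the distances below are toy values.

```python
def assign(dist, medoids):
    """Assign every item to its closest medoid; returns {medoid: [members]}."""
    clusters = {m: [] for m in medoids}
    for i in range(len(dist)):
        closest = min(medoids, key=lambda m: dist[i][m])
        clusters[closest].append(i)
    return clusters


def k_medoids(dist, initial_medoids, max_iter=100):
    """Alternating k-medoids on a square distance matrix (list of lists)."""
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        clusters = assign(dist, medoids)
        # Move each medoid to the member with the smallest total distance
        # to the rest of its cluster (keep the old one if the cluster is empty).
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members)) if members else m
            for m, members in clusters.items()
        ]
        if set(new_medoids) == set(medoids):
            return medoids, clusters
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Toy distances (1 - ANI/100) for five genomes forming two groups.
toy_dist = [
    [0.00, 0.01, 0.02, 0.20, 0.21],
    [0.01, 0.00, 0.01, 0.19, 0.20],
    [0.02, 0.01, 0.00, 0.18, 0.19],
    [0.20, 0.19, 0.18, 0.00, 0.01],
    [0.21, 0.20, 0.19, 0.01, 0.00],
]
print(k_medoids(toy_dist, [0, 3]))
```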

Querying the clustering: medoid clustering.
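Querying a medoid clustering avoids comparing a new genome against every reference. The query is compared to the medoids first, then only to the members of the closest cluster. The sketch below counts how many distance calculations that takes; the cluster layout and distances are toy values. It is a heuristic: if the true nearest reference sits in a different cluster, this two-step search can miss it.

```python
def query_clustering(query_dist, clusters):
    """Two-step nearest-reference search.

    query_dist(i) gives the distance from the query to reference i;
    clusters maps each medoid index to the indices of its members.
    Returns (closest_medoid, closest_reference, distance_calls).
    """
    cache = {}

    def d(i):
        if i not in cache:  # count each reference comparison once
            cache[i] = query_dist(i)
        return cache[i]

    closest_medoid = min(clusters, key=d)
    closest_ref = min(clusters[closest_medoid], key=d)
    return closest_medoid, closest_ref, len(cache)


toy_clusters = {1: [0, 1, 2], 3: [3, 4]}
query_to_ref = [0.03, 0.02, 0.005, 0.20, 0.21]  # toy distances
print(query_clustering(lambda i: query_to_ref[i], toy_clusters))
```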

Input data types and project types: genome classification against a reference; clade project.

MiGA's genome classification output (in part)

16S rRNA taxonomy and quality metrics
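The slide does not define its quality metrics. A common approach, assumed here for illustration, works from expected single-copy marker genes:

- completeness is the fraction of those markers that are found;
- contamination is the number of extra copies relative to the number of markers.

The marker names are hypothetical, and MiGA's exact formula may differ.

```python
def completeness_contamination(copies: dict) -> tuple:
    """Estimate (completeness %, contamination %) from marker-gene copy numbers.

    `copies` maps each expected single-copy marker gene to the number of
    copies found in the genome (0 if absent).
    """
    n_markers = len(copies)
    if n_markers == 0:
        raise ValueError("no marker genes given")
    found = sum(1 for c in copies.values() if c > 0)
    extra_copies = sum(c - 1 for c in copies.values() if c > 1)
    return 100.0 * found / n_markers, 100.0 * extra_copies / n_markers


# Ten hypothetical markers: eight single-copy, one missing, one in three copies.
markers = {f"marker_{i}": 1 for i in range(8)}
markers["marker_8"] = 0
markers["marker_9"] = 3
print(completeness_contamination(markers))
```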

Genome contamination analysis with MyTaxa: MyTaxa* scan of an assembled Salmonella genome from a stool metagenome. Useful for detecting chimeras, finding areas to focus on for manual checking, and spotting HGT. Soon available through: http://enve-omics.gatech.edu/

So many bad-quality genomes. What to do? MyTaxa_Scan of a submitted genome: B. cereus in 95% of the sequence, Streptococcus pneumoniae in the rest and in the 16S rRNA gene.
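A scan like these can be approximated by smoothing per-gene taxonomic labels along the assembly. This sketch assumes each gene already has a taxon assigned; MyTaxa's own classification step is not reproduced. It flags sliding windows whose majority taxon differs from the genome-wide majority. Those windows are where a chimera or HGT event would be worth checking manually. The taxa below are toy labels.

```python
from collections import Counter


def window_majorities(gene_taxa: list, window: int = 5) -> list:
    """(start_index, majority_taxon) for each sliding window of genes.

    Ties within a window are broken by first occurrence in that window.
    """
    result = []
    for start in range(len(gene_taxa) - window + 1):
        taxon, _ = Counter(gene_taxa[start:start + window]).most_common(1)[0]
        result.append((start, taxon))
    return result


def flag_discordant_windows(gene_taxa: list, window: int = 5) -> list:
    """Start indices of windows whose majority differs from the genome's."""
    genome_taxon = Counter(gene_taxa).most_common(1)[0][0]
    return [start for start, taxon in window_majorities(gene_taxa, window)
            if taxon != genome_taxon]


# Toy assembly: ten Bacillus-like genes followed by four Streptococcus-like genes.
taxa_along_contig = ["Bacillus"] * 10 + ["Streptococcus"] * 4
print(flag_discordant_windows(taxa_along_contig))
```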

Clade project: see also the ogs.* utilities in the Enveomics Collection.

Pangenome calculation in a clade project (Enveomics Collection: Rodriguez-R & Konstantinidis, PeerJ 2016).
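A pangenome calculation usually also tracks how the pan and core genomes change as genomes are added. This sketch adds genomes in the given order and reports both sizes after each step. It is not the Enveomics implementation, and real analyses typically average over many random orderings. The gene families are toy labels.

```python
def accumulation_curve(genomes: list) -> list:
    """Add genomes one at a time (in the given order) and report
    (number_of_genomes, pangenome_size, core_genome_size) after each."""
    curve = []
    pan = set()
    core = None
    for k, families in enumerate(genomes, start=1):
        pan |= families
        core = set(families) if core is None else core & families
        curve.append((k, len(pan), len(core)))
    return curve


toy_genomes = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "e"}]
print(accumulation_curve(toy_genomes))
```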

Medoid clustering to call clades: very robust separation, even among closely related genomes of B. anthracis (>99.5% ANI).
