Bacterial Communities in Women with Bacterial Vaginosis: High Resolution Phylogenetic Analyses Reveal Relationships of Microbiota to Clinical Criteria

Similar documents
Microbiome: 16S rrna Sequencing 3/30/2018

MiGA: The Microbial Genome Atlas

Assigning Taxonomy to Marker Genes. Susan Huse Brown University August 7, 2014

Microbes and you ON THE LATEST HUMAN MICROBIOME DISCOVERIES, COMPUTATIONAL QUESTIONS AND SOME SOLUTIONS. Elizabeth Tseng

Microbes usually have few distinguishing properties that relate them, so a hierarchical taxonomy mainly has not been possible.

Microbial Taxonomy. Slowly evolving molecules (e.g., rrna) used for large-scale structure; "fast- clock" molecules for fine-structure.

Taxonomy. Content. How to determine & classify a species. Phylogeny and evolution

A. Incorrect! In the binomial naming convention the Kingdom is not part of the name.

Taxonomy and Clustering of SSU rrna Tags. Susan Huse Josephine Bay Paul Center August 5, 2013

Microbiota: Its Evolution and Essence. Hsin-Jung Joyce Wu "Microbiota and man: the story about us

Exploring Microbes in the Sea. Alma Parada Postdoctoral Scholar Stanford University

Supervised Learning to Predict Geographic Origin of Human Metagenomic Samples

Outline Classes of diversity measures. Species Divergence and the Measurement of Microbial Diversity. How do we describe and compare diversity?

Amplicon Sequencing. Dr. Orla O Sullivan SIRG Research Fellow Teagasc

Ch 10. Classification of Microorganisms

PHYLOGENY AND SYSTEMATICS

SUPPLEMENTARY INFORMATION

Comparison of Three Fugal ITS Reference Sets. Qiong Wang and Jim R. Cole

Bacillus anthracis. Last Lecture: 1. Introduction 2. History 3. Koch s Postulates. 1. Prokaryote vs. Eukaryote 2. Classifying prokaryotes

Phylogenetics: Building Phylogenetic Trees

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Comparative Genomics II

Microbial Taxonomy. Microbes usually have few distinguishing properties that relate them, so a hierarchical taxonomy mainly has not been possible.

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Probing diversity in a hidden world: applications of NGS in microbial ecology

Biology 211 (2) Week 1 KEY!

Microbial Taxonomy and the Evolution of Diversity

Phylogenetic Tree Reconstruction

Taxonomical Classification using:

Genomics and bioinformatics summary. Finding genes -- computer searches

Chapter 26 Phylogeny and the Tree of Life

Macroevolution Part I: Phylogenies

Title ghost-tree: creating hybrid-gene phylogenetic trees for diversity analyses

The Effect of Primer Choice and Short Read Sequences on the Outcome of 16S rrna Gene Based Diversity Studies

Sec$on 9. Evolu$onary Rela$onships

Introduction to microbiota data analysis

Microbial analysis with STAMP

8/23/2014. Phylogeny and the Tree of Life

Istituto di Microbiologia. Università Cattolica del Sacro Cuore, Roma. Gut Microbiota assessment and the Meta-HIT program.

Microbial Diversity. Yuzhen Ye I609 Bioinformatics Seminar I (Spring 2010) School of Informatics and Computing Indiana University

Biology 160 Cell Lab. Name Lab Section: 1:00pm 3:00 pm. Student Learning Outcomes:

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

AP Biology. Cladistics

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

Microbial Diversity and Assessment (II) Spring, 2007 Guangyi Wang, Ph.D. POST103B

A Bayesian taxonomic classification method for 16S rrna gene sequences with improved species-level accuracy

CLASSIFICATION UNIT GUIDE DUE WEDNESDAY 3/1

Niche Modeling. STAMPS - MBL Course Woods Hole, MA - August 9, 2016

SPECIATION. REPRODUCTIVE BARRIERS PREZYGOTIC: Barriers that prevent fertilization. Habitat isolation Populations can t get together

Using Ensembles of Hidden Markov Models for Grand Challenges in Bioinformatics

SUPPLEMENTARY INFORMATION

An Adaptive Association Test for Microbiome Data

a,bD (modules 1 and 10 are required)

Bioinformatics Chapter 1. Introduction

The Evolution of DNA Uptake Sequences in Neisseria Genus from Chromobacteriumviolaceum. Cory Garnett. Introduction

A (short) introduction to phylogenetics

Fig. 26.7a. Biodiversity. 1. Course Outline Outcomes Instructors Text Grading. 2. Course Syllabus. Fig. 26.7b Table

Supplemental Online Results:

Other resources. Greengenes (bacterial) Silva (bacteria, archaeal and eukarya)

Stepping stones towards a new electronic prokaryotic taxonomy. The ultimate goal in taxonomy. Pragmatic towards diagnostics

A Novel Ribosomal-based Method for Studying the Microbial Ecology of Environmental Engineering Systems

Nature Biotechnology: doi: /nbt Supplementary Figure 1. Detailed overview of the primer-free full-length SSU rrna library preparation.

Supplementary Information

Outline. Classification of Living Things

Microbiology / Active Lecture Questions Chapter 10 Classification of Microorganisms 1 Chapter 10 Classification of Microorganisms

Chad Burrus April 6, 2010

Introduction to polyphasic taxonomy

Dr. Amira A. AL-Hosary

a-dB. Code assigned:

Censusing the Sea in the 21 st Century

An Automated Phylogenetic Tree-Based Small Subunit rrna Taxonomy and Alignment Pipeline (STAP)

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

Bioinformatics. Transcriptome

Algorithms in Bioinformatics

From Gene to Protein

Evolutionary trees. Describe the relationship between objects, e.g. species or genes

C3020 Molecular Evolution. Exercises #3: Phylogenetics

Bergey s Manual Classification Scheme. Vertical inheritance and evolutionary mechanisms

Classification, Phylogeny yand Evolutionary History

A. Correct! Taxonomy is the science of classification. B. Incorrect! Taxonomy is the science of classification.

SEPP and TIPP for metagenomic analysis. Tandy Warnow Department of Computer Science University of Texas

Microbiology - Problem Drill 04: Prokayotic & Eukaryotic Cells - Structures and Functions

Mixture Models in Phylogenetic Inference. Mark Pagel and Andrew Meade Reading University.

Introduction to Bioinformatics Online Course: IBT

9.3 Classification. Lesson Objectives. Vocabulary. Introduction. Linnaean Classification

Phylogenomics, Multiple Sequence Alignment, and Metagenomics. Tandy Warnow University of Illinois at Urbana-Champaign

Zhongyi Xiao. Correlation. In probability theory and statistics, correlation indicates the

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Phylogeny and Molecular Evolution. Introduction

7. Tests for selection

Pipelining RDP Data to the Taxomatic Background Accomplishments vs objectives

Chapter 17. Table of Contents. Objectives. Taxonomy. Classifying Organisms. Section 1 Biodiversity. Section 2 Systematics

Dynamic optimisation identifies optimal programs for pathway regulation in prokaryotes. - Supplementary Information -

(b) In 1977, Carl Woese suggested that there are three domains of living organisms: the Archaea, the Bacteria and the Eukaryota.

SEPP and TIPP for metagenomic analysis. Tandy Warnow Department of Computer Science University of Texas

PHYLOGENY & THE TREE OF LIFE

MULTIPLE SEQUENCE ALIGNMENT FOR CONSTRUCTION OF PHYLOGENETIC TREE

Mitochondrial Genome Annotation

Statistical methods for the analysis of microbiome compositional data in HIV studies

Transcription:

Bacterial Communities in Women with Bacterial Vaginosis: High Resolution Phylogenetic Analyses Reveal Relationships of Microbiota to Clinical Criteria Seminar presentation Pierre Barbera Supervised by: Alexandros Stamatakis, Lucas Czech CHAIR OF HIGH PERFORMANCE COMPUTING IN THE LIFE SCIENCES KIT University of the State of Baden-Wuerttemberg and National Laboratory of the Helmholtz Association www.kit.edu

Outline 1 Introduction Basics Goals 2 Wet Lab Work PCR of 16S 3 Taxonomic Classification Building the Reference Tree Place Sequence on Tree Taxonomic Assignment 4 Correlation Analysis Kantorovich-Rubinstein Squash Clustering 5 Results and Summary

Outline 1 Introduction Basics Goals 2 Wet Lab Work PCR of 16S 3 Taxonomic Classification Building the Reference Tree Place Sequence on Tree Taxonomic Assignment 4 Correlation Analysis Kantorovich-Rubinstein Squash Clustering 5 Results and Summary

How are bacteria studied?

How are bacteria studied? Can be cultured Bacteria Can t currently be cultured

Metagenomics Microbiome/Microbiota: collection of microorganisms in environmental niche Metagenomics: study of collective genetic material from a microbiome Interactions and composition of Microbiome centrally important

Bacterial Communities rule your life! Human body has roughly 10 trillion cells But it houses 10 times that many bacteria Large part of it in the gut

Bacterial Vaginosis (BV) One of the most common infections of the vagina Around 30% of women in the US are BV-positive Cause still unknown (according to CDC) Linked to imbalances in the microbiome of the vagina

Central questions Is there a core BV biome? Can novel species be identified? Can any synergistic relationships (between bacteria) be identified? What is the effect of race on BV prevalence? Can we identify correlations between microbiome composition and clinical features (of BV)?

Rough Workflow Take sample from Patient Introduction Wet Lab Work Taxonomic Classification Pierre Barbera Bacterial Communities in Women with Bacterial Vaginosis Correlation Analysis Results and Summary July 16, 2015

Rough Workflow Take sample from Patient Isolate and amplify important DNA parts Introduction Wet Lab Work PCR of 16S Taxonomic Classification Pierre Barbera Bacterial Communities in Women with Bacterial Vaginosis Correlation Analysis Results and Summary July 16, 2015

Rough Workflow Take sample from Patient Isolate and amplify important DNA parts PCR of 16S Sequence the DNA Introduction Wet Lab Work Taxonomic Classification Pierre Barbera Bacterial Communities in Women with Bacterial Vaginosis Correlation Analysis Results and Summary July 16, 2015

Rough Workflow Take sample from Patient Isolate and amplify important DNA parts PCR of 16S Sequence the DNA Identify what Species are present Introduction Wet Lab Work? Taxonomic Classification Pierre Barbera Bacterial Communities in Women with Bacterial Vaginosis Correlation Analysis Results and Summary July 16, 2015

Rough Workflow Take sample from Patient Isolate and amplify important DNA parts PCR of 16S Sequence the DNA Identify what Species are present? Establish statistical correlations Introduction Wet Lab Work Taxonomic Classification Pierre Barbera Bacterial Communities in Women with Bacterial Vaginosis Correlation Analysis Results and Summary July 16, 2015

Outline 1 Introduction Basics Goals 2 Wet Lab Work PCR of 16S 3 Taxonomic Classification Building the Reference Tree Place Sequence on Tree Taxonomic Assignment 4 Correlation Analysis Kantorovich-Rubinstein Squash Clustering 5 Results and Summary

Sample Collection Samples from 242 STD clinic patients (Seattle, USA) Vaginal swabs, immediately frozen at 20 C 220 samples had sufficient bacterial volume, basis for further methods

Isolate and Amplify Only a specific part of the sampled DNA, the 16S rrna gene, is needed To isolate and amplify this portion the polymerase chain reaction lab technique is used

16S ribosomal RNA Part of the ribosome in prokaryotes Slow rate of evolution highly conserved between species Very good sequence to establish phylogenies

Polymerase Chain Reaction (PCR) Technique to massively multiply a certain portion of DNA Requires Primer DNA-strands that will delimit the portion to multiply

Sequencing Sequence resulting amplified samples 454 pyrosequencing Result: many different 16S reads per sample This is where the wet lab work ends and the bioinformatics work begins

Outline 1 Introduction Basics Goals 2 Wet Lab Work PCR of 16S 3 Taxonomic Classification Building the Reference Tree Place Sequence on Tree Taxonomic Assignment 4 Correlation Analysis Kantorovich-Rubinstein Squash Clustering 5 Results and Summary

Preprocessing of Reads Classify by barcodes Only keep high quality reads that... start with known barcode contain exact match to used primer are at least 200 base pairs long (excluding primer/barcode) have sufficient quality score Trim primer sequences and barcodes All done in R, using R/Bioconductor package microbiome

Reference Tree Preparations Take known sequences of bacteria known to reside in vaginal environment Trim to 16S region, same as in samples Perform mislabel detection, as public data is often mislabeled/wrong

Mislabel Detection S 3 2 1 0 1 2 3 Compute pairwise distance between all sequences of a taxon Select a primary reference sequences with smallest median distance to all others Discard sequences that are too far from this reference sequence (by some threshold)

Building the Reference Tree Build Multiple Sequence Alignment (MSA) using cmalign Build tree using MSA and RAxML 7.2.7, using GTR model

Place Sequence on Tree? 16S Sequences AACGTA TTATCC GATACA TGATAT Take sequence, find optimal placement on existing tree Optimality meaning Bayesian posterior probability criterion Remember where sequence was placed Done using the pplacer tool

Place Sequence on Tree? 16S Sequences AACGTA TTATCC GATACA TGATAT Take sequence, find optimal placement on existing tree Optimality meaning Bayesian posterior probability criterion Remember where sequence was placed Done using the pplacer tool

Place Sequence on Tree? 16S Sequences AACGTA TTATCC GATACA TGATAT Take sequence, find optimal placement on existing tree Optimality meaning Bayesian posterior probability criterion Remember where sequence was placed Done using the pplacer tool

Place Sequence on Tree? 16S Sequences AACGTA TTATCC GATACA TGATAT Take sequence, find optimal placement on existing tree Optimality meaning Bayesian posterior probability criterion Remember where sequence was placed Done using the pplacer tool

Place Sequence on Tree? 16S Sequences AACGTA TTATCC GATACA TGATAT Take sequence, find optimal placement on existing tree Optimality meaning Bayesian posterior probability criterion Remember where sequence was placed Done using the pplacer tool

Place Sequence on Tree 16S Sequences AACGTA (< middle >) TTATCC GATACA TGATAT Take sequence, find optimal placement on existing tree Optimality meaning Bayesian posterior probability criterion Remember where sequence was placed Done using the pplacer tool

Place Sequence on Tree? 16S Sequences AACGTA (< middle >) TTATCC GATACA TGATAT Take sequence, find optimal placement on existing tree Optimality meaning Bayesian posterior probability criterion Remember where sequence was placed Done using the pplacer tool

Place Sequence on Tree? 16S Sequences AACGTA (< middle >) TTATCC GATACA TGATAT Take sequence, find optimal placement on existing tree Optimality meaning Bayesian posterior probability criterion Remember where sequence was placed Done using the pplacer tool

Taxonomic Assignment AACGTA Interpret as Taxonomic Identifier Lactobacillus Taxonomic DB Assign taxonomic labels to edges of the tree Such that labels are as specific as possible (species, genus, family etc.)

Taxonomic Assignment Lactobacillus AACGTA Interpret as Taxonomic Identifier AACGTA L. vaginalis Taxonomic DB Assign taxonomic labels to edges of the tree Such that labels are as specific as possible (species, genus, family etc.)

Taxonomic Assignment Lactobacillus AACGTA Assign most specific rank AACGTA L. vaginalis Taxonomic DB Assign taxonomic labels to edges of the tree Such that labels are as specific as possible (species, genus, family etc.)

Taxonomic Assignment Lactobacillus L. vaginalis AACGTA Assign most specific rank AACGTA L. vaginalis Taxonomic DB Assign taxonomic labels to edges of the tree Such that labels are as specific as possible (species, genus, family etc.)

Classification Result... 220 virtual trees, one per sample Virtual as in sequences are not contained on one common tree, but are associated with the reference tree by coordinates Each representing the bacterial composition of a patients vaginal environment let s call them sample-trees Basis for further statistical evaluation

Outline 1 Introduction Basics Goals 2 Wet Lab Work PCR of 16S 3 Taxonomic Classification Building the Reference Tree Place Sequence on Tree Taxonomic Assignment 4 Correlation Analysis Kantorovich-Rubinstein Squash Clustering 5 Results and Summary

Correlation Analysis

What next? Now that we have sample-trees we want compare them In the paper this is done by assembling them into another tree, a tree of trees To do that we need a distance metric and a way to cluster the sample-trees

Earth-Mover Distance Distribution 1 Distribution 2 1 1 0 0 Distance between two distributions Blue = mass Distance is the work required to shift the mass such that distributions are equal

Earth-Mover Distance Distribution 1 Distribution 2 1 +4 1 0 0 Distance between two distributions Blue = mass Distance is the work required to shift the mass such that distributions are equal

Earth-Mover Distance Distribution 1 Distribution 2 1 +4 +3 1 0 0 Distance between two distributions Blue = mass Distance is the work required to shift the mass such that distributions are equal

Earth-Mover Distance Distribution 1 Distribution 2 1 +4 +3 +3 1 0 0 Distance between two distributions Blue = mass Distance is the work required to shift the mass such that distributions are equal

Earth-Mover Distance Distribution 1 Distribution 2 1 +4 +3 +3 +3 = 13 1 0 0 Distance between two distributions Blue = mass Distance is the work required to shift the mass such that distributions are equal

Samples as Distribution on the Tree 0.3 0.2 0.4 0.0 Edge labels: fraction of total reads that were placed at an edge

Phylogenetic Kantorovich-Rubinstein Combination of earth-mover distance and trees with read distribution Apply earth-mover distance between the edges of two trees Distance is minimal amount of work required to move mass to match other distribution

Kantorovich-Rubinstein Visualisation Sample 1 Sample 2 0.3 0.2 0.2 0.4 0.0 0.5

Kantorovich-Rubinstein Visualisation Sample 1 Sample 2 +0.2 0.2 0.2 0.4 0.0 0.5

Kantorovich-Rubinstein Visualisation Sample 1 Sample 2 +0.2 + 0.2 0.2 0.5 0.0 0.5

Kantorovich-Rubinstein Visualisation Sample 1 Sample 2 +0.2 + 0.5 + 0.2 0.0 0.5 0.2

Kantorovich-Rubinstein Visualisation Sample 1 Sample 2 +0.2 + 0.5 + 0.2 + 0.5 0.2

Kantorovich-Rubinstein Visualisation Sample 1 Sample 2 +0.2 + 0.5 + 0.2 + 0.5 0.2 Distance = 0.5

Reminder: Hierarchical Clustering Pairwise Distance Matrix (PWD) A B C D A 17 21 27 B 12 18 C 14 D

Reminder: Hierarchical Clustering Pairwise Distance Matrix (PWD) A B C D A 17 21 27 B 12 18 C 14 D B C

Reminder: Hierarchical Clustering Pairwise Distance Matrix (PWD) A X D A 13 27 X 18 D X 6 6 B C merging and branch length assignment

Squash Clustering Building a tree of trees Sample-trees at the tips Pairwise distance matrix based on K-R Distance Merge by building the average of the distributions (squashing) Branch lengths = K-R Distance between trees at two incident nodes

Squashing Visualised 0.3 0.2 Sample 1 0.4 0.0 Sample 2 0.5 0.2

Squashing Visualised 0.3 0.2 0.2 SQUASH! 0.4 0.5 0.0

Squashing Visualised 0.3 0.2 0.2 0.4 0.0 0.5 (0.4 + 0.5)/2 = 0.45 SQUASH!

Squashing Visualised 0.2 0.2 0.45 0.05

Squash Clustering Visualised 0.3 0.2 0.2 0.4 0.0 0.5

Squash Clustering Visualised 0.2 0.2 0.45 0.05 merge step 0.3 0.2 0.2 0.4 0.0 0.5

Squash Clustering Visualised 0.2 0.2 0.45 0.05 0.25 assign branch 0.25 lengths (K-R Distance) 0.3 0.2 0.2 0.4 0.0 0.5

Outline 1 Introduction Basics Goals 2 Wet Lab Work PCR of 16S 3 Taxonomic Classification Building the Reference Tree Place Sequence on Tree Taxonomic Assignment 4 Correlation Analysis Kantorovich-Rubinstein Squash Clustering 5 Results and Summary

Results of Clustering

Results Healthy vaginal microbiome dominated by Lactobacillus spp. Women with BV have highly diverse vaginal microbiome Clinical tests of BV correlate differently well to bacteria Race appears to have influence on whether some bacteria contribute to BV

References I Sujatha Srinivasan et al. Bacterial communities in women with bacterial vaginosis: High resolution phylogenetic analyses reveal relationships of microbiota to clinical criteria. In: PLoS ONE 7.6 (2012). ISSN: 19326203. DOI: 1371/journal.pone.0037818. References

Image sources Slide 4: Slide 5: Slide 10: Slide 15: Slide 15: Slide 26: https://en.wikipedia.org/wiki/petri_dish#/media/file:agar_plate_with_colonies.jpg, Wikipedia user Phyzome https://www.flickr.com/photos/pere/523019984, flickr user pere, with modification https://en.wikipedia.org/wiki/cotton_swab#/media/file:white_menbo.jpg, Wikipedia user Aney https://commons.wikimedia.org/wiki/file:dna_sequence.svg, Wikimedia user Sjef Scatterplots taken from Srinivasan et al. 2012 https://en.wikipedia.org/wiki/ribosome#/media/file:peptide_syn.png, Wikipedia user Boumphreyfr https://en.wikipedia.org/wiki/polymerase_chain_reaction#/media/file: Polymerase_chain_reaction.svg, Wikipedia user Enzoklop https://www.flickr.com/photos/darkuncle/4421756078/, flickr user darkuncle, modification by me References

Identifying Novel Bacteria Take all sequenced reads from interesting bacterial order (here: Clostridiales) Place on tree, cluster into islands with distance cutoff 0.02 Throw away islands that have reads only from one individual Choose representative from reads arbitrarily, BLAST it to find appropriate island label Display islands as leaves on Ref. tree References

Results Novel Bacteria 11 novel bacteria identified Less than 97% identity to known bacteria Range: 91% to 96% References

Some Numbers 425775 sequence reads from 220 samples Median read length: 225bp Mean number of reads per subject: 1620 99.1% of reads were classified at species level References

Kantorovic-Rubinstein Formula Z(P, Q) = T P(τ(y)) Q(τ(y)) λ(dy) Z is the resulting distance P, Q are the distributions T is the tree τ signifies the sub tree below vertex y λ is the length measure References

Classifier Validation Assemble validation set of 16S sequences belonging to Lactobacillus genus (source: RDP) Consisting of subspecies found in human vagina Also met previous distance metric from mislabel detection Trim sequences to match V4 16S region References

Pearson Correlation Coefficient Measure of the linear correlation (dependence) between two variables X and Y Gives a value between +1 and 1 inclusive 1 is total positive correlation 0 is no correlation 1 is total negative correlation References

p-value Qualifies the significance of a result Tells you whether your data rejects your null hypothesis (NH) 0 p 1 p 0.05: strong evidence against NH reject NH p > 0.05: weak evidence against NH failed to reject NH References