Species Divergence and the Measurement of Microbial Diversity
Cathy Lozupone
University of Colorado, Boulder. Washington University, St Louis.

Outline
Classes of diversity measures:
α vs β diversity
Quantitative vs qualitative
Divergence/phylogeny-based vs taxon/species-based
Phylogenetic diversity measures that:
Compare the total amount of diversity between samples, e.g. is a polluted lake less diverse than a pristine one?
Test whether samples have significantly different membership, e.g. do gut samples from HIV-positive people contain different microbes than those from healthy people?
Identify environmental variables associated with differences among many samples, e.g. do pH, organic carbon, soil type, etc. correlate with variability across many soils?
These measures are not just for microbes!
Lozupone, C.A. and R. Knight (2008) Species divergence and the measurement of microbial diversity. FEMS Microbiol Rev. 1-22.

How do we describe and compare diversity?
α diversity: How many species are in a sample? (e.g. 6 colors in A and 6 in B). Example question: are polluted environments less diverse than pristine ones?
β diversity: How many species are shared between samples? (e.g. 2 colors shared between A and B). Example question: does the microbiota differ with disease state?

Quantitative versus Qualitative measures
Qualitative: considers presence/absence only.
α: How many species are in a sample? e.g. 6 colors in both A and B.
β: How many species are shared between samples? e.g. A and B are identical because the same colors are present in both.
Quantitative: also considers relative abundance.
α: accounts for evenness, e.g. B, where the population is evenly distributed across the 6 species, is more diverse than A, where all species are present but red dominates.
β: samples are considered more similar if the same species are numerically dominant or rare in both, e.g. A and B no longer look identical because of differences in abundance.
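A minimal Python sketch of the four cases above, using invented colour counts that mimic the A/B example (same six colours, red dominating A, B perfectly even). The slides do not name specific taxon-based formulas; richness, the Shannon index, Jaccard distance and Bray-Curtis dissimilarity are assumed here as common examples of qualitative and quantitative α and β measures. All of them treat the six colours as equally distinct, i.e. they ignore relationships among taxa.

```python
import math

def richness(counts):
    """Qualitative alpha diversity: number of species with a nonzero count."""
    return sum(1 for c in counts.values() if c > 0)

def shannon(counts):
    """Quantitative alpha diversity: Shannon index H = -sum(p * ln p).
    For a fixed number of species it is highest when abundances are even."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)

def jaccard_distance(a, b):
    """Qualitative beta diversity: 1 - |shared species| / |all species|."""
    sa = {k for k, v in a.items() if v > 0}
    sb = {k for k, v in b.items() if v > 0}
    return 1 - len(sa & sb) / len(sa | sb)

def bray_curtis(a, b):
    """Quantitative beta diversity: sum |a_k - b_k| / sum (a_k + b_k)."""
    keys = set(a) | set(b)
    num = sum(abs(a.get(k, 0) - b.get(k, 0)) for k in keys)
    den = sum(a.get(k, 0) + b.get(k, 0) for k in keys)
    return num / den

# Hypothetical counts: same six colours, red dominates A, B is perfectly even.
A = {"red": 25, "orange": 1, "yellow": 1, "green": 1, "blue": 1, "purple": 1}
B = {"red": 5, "orange": 5, "yellow": 5, "green": 5, "blue": 5, "purple": 5}

print(richness(A), richness(B))                     # 6 6
print(round(shannon(A), 3), round(shannon(B), 3))   # 0.719 1.792
print(jaccard_distance(A, B), round(bray_curtis(A, B), 3))  # 0.0 0.667
```

Richness and Jaccard cannot tell A from B; the Shannon index and Bray-Curtis both register the difference in evenness.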
What is a phylogenetic diversity measure?
α diversity:
Taxon-based: How many species are in a sample?
Phylogenetic: How much phylogenetic divergence is in a sample? (e.g. B is more diverse than A because its colors are more divergent)
β diversity:
Taxon-based: How many species are shared between samples?
Phylogenetic: How much phylogenetic distance is shared between samples? (e.g. only colors related to those in B are found in A)

Advantages of phylogenetic techniques
Phylogenetically related organisms are more likely to have similar roles in a community. Taxon-based methods assume a star phylogeny, in which all relationships between taxa are ignored.
Easily applied to microbial community sequence data. Most (>99%) microbes cannot be cultured.
1. Extract DNA from environmental samples.
2. PCR-amplify the SSU rRNA gene.
3. Generate sequences (Sanger or pyrosequencing).
4. Evaluate diversity.
Taxon (species)-based: group sequences into OTUs based on % identity (97% identity for species).
Phylogeny-based: compare sequences using their relationships in a phylogenetic tree. The majority of phylogenetic diversity is microbial (tree adapted from Pace 1997, Science 276:734-740).
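The taxon-based route in step 4 can be sketched as greedy clustering at a 97% identity threshold. This is a simplified, hypothetical scheme (the function names `percent_identity` and `greedy_otus` are illustrative), assuming the reads are already aligned to equal length; it is not the algorithm of any particular OTU-picking tool.

```python
def percent_identity(s1, s2):
    """Percent identity of two pre-aligned sequences of equal length."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(s1, s2) if x == y)
    return 100.0 * matches / len(s1)

def greedy_otus(seqs, threshold=97.0):
    """Greedy clustering: each sequence joins the first OTU whose seed it matches
    at >= threshold percent identity; otherwise it seeds a new OTU."""
    seeds = []        # one representative (seed) sequence per OTU
    assignment = {}   # sequence name -> OTU index
    for name, seq in seqs.items():
        for otu, seed in enumerate(seeds):
            if percent_identity(seq, seed) >= threshold:
                assignment[name] = otu
                break
        else:  # no existing seed was close enough
            seeds.append(seq)
            assignment[name] = len(seeds) - 1
    return assignment

# Hypothetical 100-position aligned reads.
reads = {
    "s1": "A" * 100,
    "s2": "C" * 2 + "A" * 98,    # 98% identical to s1
    "s3": "G" * 10 + "A" * 90,   # 90% identical to s1
}
print(greedy_otus(reads))  # {'s1': 0, 's2': 0, 's3': 1}
```

Greedy assignment depends on input order, and once sequences are collapsed into OTUs, how related two OTUs are is no longer recorded: this is the star-phylogeny assumption that phylogenetic methods avoid.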
Phylogenetic Diversity Measures
α diversity
Compare the total amount of diversity between samples: Phylogenetic Diversity (PD)
β diversity
Test whether samples have significantly different membership: UniFrac significance, P test, LibShuff
Identify environmental variables associated with differences among many samples: unweighted and weighted UniFrac, DPCoA
Compare local and regional diversity: Gain in PD (G), NRI/NTI

Phylogenetic Diversity (PD)
Sum of the branch lengths leading to the sequences in a sample; a qualitative α diversity measure. The sample whose taxa span the most branch length in the tree represents the most phylogenetically, and perhaps functionally, divergent community.
Faith, D.P. (1992) Conservation evaluation and phylogenetic diversity. Biological Conservation 61, 1-10.

PD Rarefaction
Plot the amount of branch length against the number of observations. The shape of the curve indicates how far we are from sampling all of the phylogenetic diversity, and allows phylogenetic diversity to be compared between samples at equal sampling effort.
Eckburg, P.B., et al. (2005) Diversity of the human intestinal microbial flora. Science 308, 1635-1638.

Phylogenetic β diversity: How is diversity partitioned among samples?
Do two samples contain significantly different microbial populations?
Can we see broad trends that relate many samples and explain them in terms of environmental factors?
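Returning to PD and PD rarefaction above, the following sketch computes both on a small hypothetical rooted tree stored as a `node -> (parent, branch length)` map. Whether the path to the root is counted varies between PD implementations; this sketch counts it. The rarefaction function averages PD over random subsamples drawn without replacement; the tree, the sample, and the function names are illustrative assumptions.

```python
import random

# Hypothetical rooted tree: node -> (parent, length of the branch above the node).
TREE = {
    "root": (None, 0.0),
    "X": ("root", 1.0),
    "Y": ("root", 2.0),
    "a": ("X", 0.5),
    "b": ("X", 0.5),
    "c": ("Y", 1.0),
    "d": ("Y", 3.0),
}

def faith_pd(tree, tips):
    """Faith's PD: total length of the branches connecting `tips` to the root,
    counting each branch once."""
    counted = set()
    total = 0.0
    for tip in tips:
        node = tip
        # Walk toward the root, stopping at the first branch already counted.
        while node is not None and node not in counted:
            counted.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total

def pd_rarefaction(tree, observations, iterations=100, seed=0):
    """Mean PD of random subsamples (without replacement) of 1..N observations."""
    rng = random.Random(seed)
    curve = []
    for n in range(1, len(observations) + 1):
        draws = [faith_pd(tree, set(rng.sample(observations, n)))
                 for _ in range(iterations)]
        curve.append((n, sum(draws) / len(draws)))
    return curve

print(faith_pd(TREE, {"a", "b"}))  # 2.0: closely related tips
print(faith_pd(TREE, {"a", "d"}))  # 6.5: same richness, more divergent
sample = ["a", "a", "b", "d", "d", "c"]  # one entry per sequence observed
for n, pd in pd_rarefaction(TREE, sample):
    print(n, round(pd, 2))
```

{a, b} and {a, d} have the same richness, but the second spans far more branch length, which is the difference PD captures and species counts miss.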
Unique Fraction (UniFrac) metric
Qualitative phylogenetic β diversity. Distance = fraction of the total branch length that is unique to any particular environment.
Lozupone and Knight (2005) Appl Environ Microbiol 71:8228.

Phylogenetic (P) test
The number of changes between states (samples) required to explain the distribution of sequences on the tree (Fitch parsimony). Sensitive to tree topology but not to branch lengths.
Martin, A.P. (2002) Phylogenetic approaches for describing and comparing the diversity of microbial communities. Appl Environ Microbiol 68, 3673-3682.

Is the phylogenetic diversity significantly different between samples?
Monte Carlo simulations: randomly permute the data (environment assignments) and determine how often the random data give a more extreme value than the real data.
P-values:
P test: fraction of random trees that require fewer parsimony changes than the real tree.
UniFrac: fraction of random trees that have more unique branch length than the real tree.
UniFrac website: http://bmf.colorado.edu/unifrac/

LibShuff
C_X: fraction of sequences in X that are not singletons after grouping across a range of sequence distances.
C_XY: fraction of sequences in X that are also in Y.
The Cramér-von Mises statistic measures the distance between the two curves; significance is assessed with Monte Carlo simulations.
Example: comparison of bacteria in two beetle species.
Singleton DR, Furlong MA, Rathbun SL & Whitman WB (2001) Appl Environ Microbiol 67: 4374-4376.
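A sketch of the unweighted UniFrac distance and its Monte Carlo significance test, on the same kind of hypothetical `node -> (parent, branch length)` tree. Environment labels are shuffled across sequences with sample sizes held fixed, and the p-value is the fraction of shuffles with at least as much unique branch length as the real data; counting ties as extreme is an assumption of this sketch. The P test and LibShuff are not implemented here.

```python
import random

# Hypothetical rooted tree: node -> (parent, length of the branch above the node).
TREE = {
    "root": (None, 0.0),
    "X": ("root", 1.0), "Y": ("root", 2.0),
    "a": ("X", 0.5), "b": ("X", 0.5),
    "c": ("Y", 1.0), "d": ("Y", 3.0),
}

def branch_owners(tree, samples):
    """Map each branch (named by its lower node) to the environments with tips below it."""
    owners = {}
    for env, tips in samples.items():
        for tip in tips:
            node = tip
            while node is not None:
                owners.setdefault(node, set()).add(env)
                node = tree[node][0]
    return owners

def unweighted_unifrac(tree, samples):
    """Unique branch length / total branch length of the tree spanned by all samples."""
    owners = branch_owners(tree, samples)
    total = sum(tree[node][1] for node in owners)
    unique = sum(tree[node][1] for node, envs in owners.items() if len(envs) == 1)
    return unique / total

def group(pairs):
    """Turn (tip, environment) pairs into {environment: set of tips}."""
    samples = {}
    for tip, env in pairs:
        samples.setdefault(env, set()).add(tip)
    return samples

def unifrac_test(tree, observations, permutations=999, seed=0):
    """Shuffle environment labels; return (observed UniFrac, fraction of shuffles
    whose UniFrac is at least as large as the observed value)."""
    rng = random.Random(seed)
    observed = unweighted_unifrac(tree, group(observations))
    tips = [tip for tip, _ in observations]
    envs = [env for _, env in observations]
    hits = 0
    for _ in range(permutations):
        rng.shuffle(envs)
        if unweighted_unifrac(tree, group(zip(tips, envs))) >= observed:
            hits += 1
    return observed, hits / permutations

print(unweighted_unifrac(TREE, {"A": {"a", "b"}, "B": {"c", "d"}}))  # 1.0: nothing shared
print(unweighted_unifrac(TREE, {"A": {"a", "c"}, "B": {"a", "d"}}))  # 4.0 of 7.5 units unique
obs = [("a", "A"), ("b", "A"), ("c", "B"), ("d", "B")]
print(unifrac_test(TREE, obs))
```

With only four sequences, two of the six possible labelings give UniFrac = 1, so the p-value cannot drop below about 1/3; real comparisons need many more sequences.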
Clustering with the UniFrac Algorithm
Can we see broad trends that relate many samples and explain them in terms of environmental factors? What types of environments have similar phylogenetic diversity?
[Figure: environmental gradients considered: pH (1-12), temperature (0-100 °C), pressure (1-200 atm), nutrient availability (oligotrophic to eutrophic).]
Finding: salinity is the most important factor.
[Figures: PCoA of the UniFrac distance matrix; hierarchical clustering (UPGMA) of the same UniFrac distance matrix.]
Lozupone CA & Knight R (2007) Global patterns in bacterial diversity. Proc Natl Acad Sci U S A 104: 11436-11440.
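The clustering step can be sketched as UPGMA (average linkage) in pure Python, taking a pairwise distance matrix such as a UniFrac matrix as input. The sample names and distances below are invented. PCoA, the other ordination on the slide, would instead need an eigendecomposition of the double-centred matrix of squared distances (for example with `numpy.linalg.eigh`) and is not shown.

```python
from itertools import combinations

def upgma(dist):
    """UPGMA (average-linkage) clustering.
    dist maps frozenset({x, y}) -> distance for every pair of sample names.
    Returns the tree as nested tuples and a list of (left, right, height) merges."""
    names = sorted({n for pair in dist for n in pair})
    clusters = {name: 1 for name in names}  # cluster (name or nested tuple) -> size
    d = dict(dist)
    merges = []
    while len(clusters) > 1:
        left, right = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((left, right))] / 2
        n_left, n_right = clusters.pop(left), clusters.pop(right)
        merged = (left, right)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the two old distances.
        for other in clusters:
            d[frozenset((merged, other))] = (
                n_left * d[frozenset((left, other))]
                + n_right * d[frozenset((right, other))]
            ) / (n_left + n_right)
        clusters[merged] = n_left + n_right
        merges.append((left, right, height))
    return next(iter(clusters)), merges

# Hypothetical distance matrix for four samples.
D = {
    frozenset(("soil1", "soil2")): 0.2,
    frozenset(("sea1", "sea2")): 0.3,
    frozenset(("soil1", "sea1")): 0.8,
    frozenset(("soil1", "sea2")): 0.8,
    frozenset(("soil2", "sea1")): 0.8,
    frozenset(("soil2", "sea2")): 0.8,
}
tree, merges = upgma(D)
print(tree)  # (('soil1', 'soil2'), ('sea1', 'sea2'))
```

Merge heights are half the average distance between the merged clusters, so here the two soil samples join first, then the two sea samples, and the two groups join last.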
Qualitative vs Quantitative measures of Phylogenetic β Diversity
Qualitative (unweighted UniFrac): detects factors restrictive for microbial growth, such as high temperature, low pH, and founder effects.
Quantitative (weighted UniFrac, DPCoA): detects transient changes, such as seasonal changes, nutrient availability, and response to pollution.
The two classes yield different, complementary results; applying both to the same data can provide insight into the nature of community changes.
Lozupone et al. (2007) Appl Environ Microbiol 73:1576.

Obesity and Gut Microbiota
Mice heterozygous for a mutation in the leptin gene were interbred, and 16S rRNA genes were sequenced from the gut bacteria of mothers and offspring.
Ley et al. (2005) Obesity alters gut microbial ecology. PNAS 102: 11070-11075.
Clustering of mouse data:
Unweighted UniFrac: mice cluster perfectly by mother; no obvious effect of obesity; robust to sampling effort.
Weighted UniFrac: obese mice mostly cluster together; not robust to sampling effort.
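Weighted UniFrac replaces presence/absence with relative abundance. Following the definition in Lozupone et al. (2007), each branch of length b_i contributes b_i * |A_i/A_T - B_i/B_T|, where A_i is the number of sequences from sample A descending from that branch and A_T is the total number of sequences in A. An optional normalization divides by the sum of the two samples' mean root-to-tip distances, so the result lies between 0 and 1. The tree, counts, and function names below are hypothetical.

```python
from collections import Counter

# Hypothetical rooted tree: node -> (parent, length of the branch above the node).
TREE = {
    "root": (None, 0.0),
    "X": ("root", 1.0), "Y": ("root", 2.0),
    "a": ("X", 0.5), "b": ("X", 0.5),
    "c": ("Y", 1.0), "d": ("Y", 3.0),
}

def descendant_counts(tree, counts):
    """Number of sequences below each branch (branch named by its lower node)."""
    below = Counter()
    for tip, n in counts.items():
        node = tip
        while node is not None:
            below[node] += n
            node = tree[node][0]
    return below

def root_distance(tree, tip):
    """Sum of branch lengths from a tip up to the root."""
    dist, node = 0.0, tip
    while node is not None:
        parent, length = tree[node]
        dist += length
        node = parent
    return dist

def weighted_unifrac(tree, counts_a, counts_b, normalized=False):
    """u = sum over branches of b_i * |A_i/A_T - B_i/B_T|."""
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    below_a = descendant_counts(tree, counts_a)
    below_b = descendant_counts(tree, counts_b)
    u = sum(tree[node][1] * abs(below_a[node] / total_a - below_b[node] / total_b)
            for node in set(below_a) | set(below_b))
    if not normalized:
        return u
    # Divide by (mean root-to-tip distance in A) + (mean root-to-tip distance in B).
    tips = set(counts_a) | set(counts_b)
    norm = sum(root_distance(tree, t)
               * (counts_a.get(t, 0) / total_a + counts_b.get(t, 0) / total_b)
               for t in tips)
    return u / norm

A = {"a": 9, "c": 1}   # tip a dominates sample A
B = {"a": 1, "c": 9}   # tip c dominates sample B
print(weighted_unifrac(TREE, A, B))                   # about 3.6
print(weighted_unifrac(TREE, A, B, normalized=True))  # about 0.8
```

Both samples contain tips a and c, so unweighted UniFrac between them would be 0; only the weighted version registers that a dominates A while c dominates B, mirroring how weighted UniFrac picks up shifts in dominance.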
Comparison of human stool and mucosal microbes
Unweighted UniFrac: all samples cluster by individual.
Weighted UniFrac: stool samples look different from mucosal samples.
Eckburg, P.B., et al. (2005) Diversity of the human intestinal microbial flora. Science 308, 1635-1638.

Measures in the same class cluster the data similarly
Double principal coordinates analysis (DPCoA): another quantitative β diversity measure. A matrix of species distances is first used to ordinate the species with PCoA. Each community's position in this coordinate space is the average position of the species it contains, weighted by their relative abundances. DPCoA produces results similar to weighted UniFrac.

Short reads (pyrosequencing) can recapture the result
Unweighted UniFrac clustering of 100 bp reads extending from primer R357, placed in the tree by ARB parsimony insertion. Assigning short reads to an existing phylogeny (e.g. the Greengenes core set) allows very large datasets to be analyzed.
Liu Z, Lozupone C, Hamady M, Bushman FD & Knight R (2007) Short pyrosequencing reads suffice for accurate microbial community analysis. Nucleic Acids Res 35: e120.

Comparison of Local Diversity to Regional Diversity
β diversity measures can also relate the diversity in a single community to the total diversity in a habitat type or globally.
Net Relatedness Index (NRI) and Nearest Taxon Index (NTI): overdispersion of sequences in the tree suggests that competition is important; underdispersion suggests that habitat filtering is important.
Webb CO (2000) Exploring the phylogenetic structure of ecological communities. Am Nat 156: 145-155.
Gain in PD (G) also compares local to regional diversity.
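NRI and NTI compare a sample's observed phylogenetic structure with random samples of the same size drawn from a regional pool. Following Webb (2000), NRI is -(MPD_obs - mean MPD_null) / SD(MPD_null), where MPD is the mean pairwise distance, and NTI is the same with MNTD (mean nearest-taxon distance); positive values indicate clustering (underdispersion) and negative values overdispersion. Drawing null samples uniformly from the pool is only one of several possible null models, and the tree, pool, and function names here are hypothetical.

```python
import random
import statistics
from itertools import combinations

# Hypothetical rooted tree: node -> (parent, length of the branch above the node).
TREE = {
    "root": (None, 0.0),
    "X": ("root", 1.0), "Y": ("root", 2.0),
    "a": ("X", 0.5), "b": ("X", 0.5),
    "c": ("Y", 1.0), "d": ("Y", 3.0),
}

def path_to_root(tree, tip):
    """Distance from `tip` up to each of its ancestors (the tip itself at 0)."""
    dists, dist, node = {}, 0.0, tip
    while node is not None:
        dists[node] = dist
        parent, length = tree[node]
        dist += length
        node = parent
    return dists

def patristic(tree, x, y):
    """Branch-length distance between two tips through their most recent common ancestor."""
    px, py = path_to_root(tree, x), path_to_root(tree, y)
    return min(px[n] + py[n] for n in px if n in py)

def mpd(tree, tips):
    """Mean pairwise distance among the tips of a sample."""
    pairs = list(combinations(tips, 2))
    return sum(patristic(tree, x, y) for x, y in pairs) / len(pairs)

def mntd(tree, tips):
    """Mean distance from each tip to its nearest relative in the same sample."""
    return sum(min(patristic(tree, x, y) for y in tips if y != x) for x in tips) / len(tips)

def standardized_index(metric, tree, sample, pool, runs=999, seed=0):
    """-(observed - null mean) / null SD; metric=mpd gives NRI, metric=mntd gives NTI.
    Raises ZeroDivisionError if every null sample has the same value."""
    rng = random.Random(seed)
    observed = metric(tree, sample)
    null = [metric(tree, rng.sample(pool, len(sample))) for _ in range(runs)]
    return -(observed - statistics.mean(null)) / statistics.stdev(null)

pool = ["a", "b", "c", "d"]  # hypothetical regional pool
print(standardized_index(mpd, TREE, ["a", "b"], pool))   # positive: clustered
print(standardized_index(mpd, TREE, ["a", "d"], pool))   # negative: overdispersed
print(standardized_index(mntd, TREE, ["a", "b"], pool))  # NTI for the clustered pair
```

The pair {a, b} has the smallest possible MPD in this pool and so gets a positive NRI, the habitat-filtering signature; {a, d} has the largest and gets a negative NRI, the overdispersion signature.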
Gain in PD (G)
Which communities contain the most unseen diversity? G is the total length of the branches leading only to sequences in a given sample.
Faith, D.P. (1992) Conservation evaluation and phylogenetic diversity. Biological Conservation 61, 1-10.

Correcting G for sampling effort
Regress G values against the number of OTUs detected in each sample. Culture-based studies (shown in red) discovered little new diversity; unique saline environments (e.g. hypersaline mats) discovered more.
Lozupone CA & Knight R (2007) Global patterns in bacterial diversity. Proc Natl Acad Sci U S A 104: 11436-11440.

Summary
Phylogenetic diversity measures can be more powerful than taxon-based measures because they use information on how closely related taxa are to each other.
Phylogenetic measures are available for both α and β diversity.
Quantitative and qualitative β diversity measures produce complementary insights into how communities are related.
Although several different methods may exist for a particular class of diversity measure, they are likely to give similar results (e.g. DPCoA and weighted UniFrac).

Acknowledgments
Rob Knight, Micah Hamady, Knight and Gordon Labs
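The Gain in PD (G) from the slide above, together with its sampling-effort correction, can be sketched on a hypothetical tree. G for a sample is computed as the PD of all samples minus the PD of all other samples, which equals the branch length leading only to that sample's sequences. The slide regresses G on the number of OTUs; a plain linear least-squares fit with each tip counted as one OTU is assumed here, and a positive residual marks more new diversity than expected for the sampling effort. The names `gain_in_pd` and `residuals` are illustrative.

```python
# Hypothetical rooted tree: node -> (parent, length of the branch above the node).
TREE = {
    "root": (None, 0.0),
    "X": ("root", 1.0), "Y": ("root", 2.0),
    "a": ("X", 0.5), "b": ("X", 0.5),
    "c": ("Y", 1.0), "d": ("Y", 3.0),
}

def faith_pd(tree, tips):
    """Total length of the branches connecting `tips` to the root, each counted once."""
    counted, total = set(), 0.0
    for tip in tips:
        node = tip
        while node is not None and node not in counted:
            counted.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total

def gain_in_pd(tree, samples, target):
    """G: PD lost if `target` were removed, i.e. branch length leading only to it."""
    others = set().union(*(tips for name, tips in samples.items() if name != target))
    return faith_pd(tree, others | samples[target]) - faith_pd(tree, others)

def residuals(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns y - fitted value for each point."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

samples = {"S1": {"a", "b"}, "S2": {"c"}, "S3": {"d"}}
g = {name: gain_in_pd(TREE, samples, name) for name in samples}
print(g)  # {'S1': 2.0, 'S2': 1.0, 'S3': 3.0}

otus = [len(samples[name]) for name in samples]  # sampling effort proxy
print(residuals(otus, [g[name] for name in samples]))  # [0.0, -1.0, 1.0]
```

Here S3 contributes the most unseen branch length relative to its single OTU, while S2 contributes less than expected.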