Other resources: Greengenes (bacterial), Silva (bacterial, archaeal and eukaryotic)

1 General QIIME resources
Blog (news, updates):
Support/forum:
Citing QIIME: Caporaso, J.G. et al., QIIME allows analysis of high-throughput community sequencing data. Nature Methods 7(5), (2010)
Citing tools used in QIIME:
Other useful QIIME papers: Navas-Molina et al., Chapter Nineteen: Advancing Our Understanding of the Human Microbiome Using QIIME. Microbial Metagenomics, Metatranscriptomics, and Metaproteomics. Methods in Enzymology, Volume 531, Pages (2013)
Data files and other resources:

2 Other resources
UniFrac:
USEARCH:
phyloseq R package:
Databases: Greengenes (bacterial), Silva (bacterial, archaeal and eukaryotic), RDP (bacterial, archaeal, fungal)

5 OTU picking (clustering)
QIIME has three methods for OTU picking: de novo, closed-reference and open-reference.
Sequences are grouped together based on sequence identity (de novo) or alignment to a reference sequence (closed-reference: reads are discarded if they don't match a reference; open-reference: reads that don't hit a reference are clustered de novo).
QIIME recommends open-reference OTU picking for large datasets.
For de novo OTU picking, % identity is usually set at 97% or above, and each OTU is then assumed to represent a species.
Full-length 16S rRNA gene sequences: the 97% cutoff for the 16S rRNA gene was defined by Stackebrandt & Goebel and has been reviewed and refined by others, the latest being Kim et al. (IJSEM 64), who give a % 16S rRNA gene sequence similarity that can be used as the threshold for differentiating two species. This is based on full-length 16S rRNA sequences and cannot be directly extrapolated to microbial community NGS studies.

6 (Legend: = recommended) New method: UPARSE. Edgar, R.C. (2013) UPARSE: Highly accurate OTU sequences from microbial amplicon reads, Nature Methods [PubMed: , dx.doi.org/ /nmeth.2604].

7 Pros and cons of OTU picking approaches Also see:

9 Important ecological concepts
How is biodiversity defined and measured?
Components of biodiversity: richness, relative abundance, evenness
Species richness: number of different species in a habitat/sample
Species relative abundance: number of individuals of each species relative to the total number of individuals of all species in a sample (for NGS: number of reads per OTU in a sample relative to the total number of reads in that sample)
Species evenness: how close in numbers the species in an environment are (distribution)
Simple example:
- Even: richness = 2 species; relative abundance = 5/10 = 0.5 (50%) and 5/10 = 0.5 (50%)
- Uneven: richness = 2 species; relative abundance = 2/7 = 0.29 (29%) and 5/7 = 0.71 (71%)
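The three components can be computed directly from a vector of counts. The sketch below reuses the two-species example from the slide (5 and 5 individuals, then 2 and 5). The function names are illustrative, and the evenness measure is an assumption: it uses the Shannon equitability H / ln(S), which is only one of several possible evenness measures.

```python
import math

def richness(counts):
    """Number of species (or OTUs) with at least one individual/read."""
    return sum(1 for c in counts if c > 0)

def relative_abundance(counts):
    """Share of each species in the total count of the sample."""
    total = sum(counts)
    return [c / total for c in counts]

def shannon_evenness(counts):
    """Assumed evenness measure: Shannon H divided by ln(richness).
    1.0 means perfectly even; returns 0.0 when fewer than two species."""
    props = [p for p in relative_abundance(counts) if p > 0]
    if len(props) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in props)
    return h / math.log(len(props))

even_sample = [5, 5]    # 5 + 5 individuals, as in the "even" example
uneven_sample = [2, 5]  # 2 + 5 individuals, as in the "uneven" example

for name, counts in (("even", even_sample), ("uneven", uneven_sample)):
    shares = [round(p, 2) for p in relative_abundance(counts)]
    print(name, richness(counts), shares, round(shannon_evenness(counts), 2))
```

Both samples have the same richness, so only the relative abundances and the evenness value tell them apart.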

10 Alpha diversity: diversity within a habitat unit/sample (alone)
Beta diversity: diversity between units/samples
Gamma diversity: total diversity in a landscape
SIMPLE EXAMPLE
[Figure: species richness of three samples X, Y and Z. Alpha values shown: 9 species, 4 species, 5 species. Beta X vs Y: 3 shared species; beta Y vs Z: 1 shared species; beta X vs Z: 2 shared species. Unique-species counts shown: 6, 7 and 9. Gamma: total of 12 unique species. (modified from )]
PROBLEM: doesn't take the abundance of each species OR the relatedness of each species into account
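When only presence/absence is considered, alpha, beta and gamma diversity reduce to set operations. The following sketch uses three hypothetical samples whose species lists are invented for illustration and do not reproduce the slide's figure. "Unique" is interpreted here as species found in only one of the two samples being compared, which is an assumption.

```python
# Hypothetical species lists for three samples (not the slide's data).
samples = {
    "X": {"a", "b", "c", "d", "e"},
    "Y": {"c", "d", "e", "f"},
    "Z": {"e", "g", "h"},
}

# Alpha diversity: species richness within each sample.
alpha = {name: len(species) for name, species in samples.items()}

# Gamma diversity: total number of distinct species across all samples.
gamma = len(set().union(*samples.values()))

def shared_and_unique(a, b):
    """Return (species in both samples, species in exactly one of them)."""
    return len(a & b), len(a ^ b)

beta_xy = shared_and_unique(samples["X"], samples["Y"])
beta_yz = shared_and_unique(samples["Y"], samples["Z"])
beta_xz = shared_and_unique(samples["X"], samples["Z"])

print(alpha, gamma, beta_xy, beta_yz, beta_xz)
```

As the slide points out, counts like these ignore both abundance and relatedness, which is what the abundance-based and phylogenetic metrics address.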

11 [Figure: samples X, Y and Z shown three ways: species richness; size adjusted according to abundance; phylogenetic relationship] Metrics used to describe diversity measure different aspects of the community.

12 Note: this list is outdated (QIIME 1.7)

13 Alpha metrics: estimate richness
Observed species: count of unique OTUs in a sample.
Chao1: estimates how likely it is that there are more undiscovered species. Sobs = number of species observed in the sample, F1 = number of singletons (species that appear once in the sample), F2 = number of doubletons (species that appear twice in the sample).
The central concept is that if rare species (singletons) are still being discovered when sampling a community, then there are probably more rare species yet to be found. If all species have been found at least twice (doubletons), then it is less likely that new species remain to be discovered.
Both metrics measure richness (number of species). Richness does not take the abundances of the types into account; it is not the same thing as diversity.
May be useful for judging completeness of sampling, i.e. is the sample size/sequencing depth enough to capture all species? See rarefaction.
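A minimal sketch of the Chao1 idea, assuming the classic form of the estimator, Sobs + F1² / (2 F2), with the bias-corrected variant used as a fallback when there are no doubletons. The slide does not show which form it uses, so treat the choice as an assumption; the count vectors in the usage lines are made up.

```python
def chao1(counts):
    """Chao1 richness estimate from per-species counts.

    Assumption: classic form Sobs + F1^2 / (2 * F2); when F2 == 0 the
    bias-corrected form Sobs + F1 * (F1 - 1) / 2 is used to avoid
    dividing by zero.
    """
    present = [c for c in counts if c > 0]
    s_obs = len(present)                      # observed species
    f1 = sum(1 for c in present if c == 1)    # singletons
    f2 = sum(1 for c in present if c == 2)    # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2

# Hypothetical samples: many singletons raise the estimate above Sobs,
# while a sample with no rare species keeps the estimate at Sobs.
many_rare = [10, 5, 2, 2, 1, 1, 1, 1]
no_rare = [3, 3, 3]
print(chao1(many_rare), chao1(no_rare))
```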

14 Alpha metrics: estimate diversity
Shannon diversity index (also called the Shannon-Weaver, Shannon-Wiener or Shannon index)
- Complicated computation, based on information theory (other metrics that use this: Brillouin index)
- The Shannon diversity index (H) characterizes species diversity and accounts for the abundance and evenness of the species.
- The Shannon equitability index (EH) is a measure of evenness. If S is the number of observed species, then EH = H / ln(S).
Simpson diversity index (1 - D)
- Value between 0 and 1: 0 = no diversity, 1 = infinite diversity
- Simpson index: $D = \frac{\sum n(n-1)}{N(N-1)}$, where N = total number of individuals of all species and n = total number of individuals of each species
- [Table: species A to D with columns Total (n), n-1 and n(n-1); totals N = 27 and $\sum n(n-1) = 254$]
- Example: $D = \frac{254}{27(27-1)} = \frac{254}{702} = 0.36$, so 1 - D = 0.64
- Simpson's reciprocal index = 1/D
- Probability of 2 individuals being conspecifics if drawn randomly from an infinitely large community
- Simple computation: measures species dominance (weighted towards the abundance of the most common species) (other metrics: McIntosh and Berger-Parker)
- Total species richness is downweighted relative to evenness
Both indices estimate diversity (richness, abundance and evenness). Simpson diversity is less sensitive to richness and more sensitive to evenness than Shannon diversity.
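The indices on this slide can be sketched in a few lines. The per-species counts in the slide's table did not survive transcription, so the counts below are hypothetical: they were chosen only because they reproduce the slide's totals (N = 27 and Σ n(n-1) = 254) and are not claimed to be the original values.

```python
import math

def shannon(counts):
    """Shannon diversity H, using natural logarithms."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def shannon_equitability(counts):
    """EH = H / ln(S), where S is the number of observed species."""
    s = sum(1 for c in counts if c > 0)
    return shannon(counts) / math.log(s) if s > 1 else 0.0

def simpson_d(counts):
    """Simpson index D = sum n(n-1) / (N(N-1))."""
    n_total = sum(counts)
    return sum(c * (c - 1) for c in counts) / (n_total * (n_total - 1))

# Hypothetical counts for species A-D that match the slide's totals.
counts = [15, 6, 4, 2]

d = simpson_d(counts)
print("D =", round(d, 2))
print("1 - D =", round(1 - d, 2))
print("1 / D =", round(1 / d, 2))
print("H =", round(shannon(counts), 2), "EH =", round(shannon_equitability(counts), 2))
```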

15 Alpha metrics: estimate phylogenetic diversity
PD/PD_whole_tree: Faith's Phylogenetic Diversity (PD): minimum total branch length of the phylogenetic tree that incorporates all OTUs in a sample.
Not weighted for abundance.
PD weighted for abundance: see
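A small sketch of how Faith's PD can be computed on a toy tree. The tree, its branch lengths and the child-to-parent dictionary layout are all hypothetical, and the sketch assumes that the path from each OTU up to the root is included in the total (a whole-tree style calculation); other conventions stop at the most recent common ancestor of the sample.

```python
# Hypothetical tree: child -> (parent, branch length). The root has no entry.
tree = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "n1": ("root", 0.5),
    "C": ("n2", 1.5),
    "D": ("n2", 1.0),
    "n2": ("root", 3.0),
}

def faith_pd(tree, otus):
    """Sum the lengths of all branches lying between the sample's OTUs and
    the root, counting each branch once. Abundance is ignored."""
    visited = set()
    total = 0.0
    for node in otus:
        # Walk up towards the root until reaching a branch already counted.
        while node in tree and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total

print(faith_pd(tree, {"A", "B"}))   # two closely related OTUs
print(faith_pd(tree, {"A", "C"}))   # two distantly related OTUs
```

Both samples contain two OTUs, but the sample spanning both clades covers more branch length and therefore has the higher PD.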

16 Rarefaction
Collector's curves
[Figure: 50 individuals, 2 species; 250 individuals, 4 species; 500 individuals, 8 species. Photo: Felix Borner / AP]
NGS: individuals = reads
Evaluate sample size: is sequencing depth enough?
Compare the richness and diversity observed in different samples.
Note: rarefaction is not the same as rarefying.
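A rarefaction (collector's) curve can be sketched by repeatedly subsampling reads without replacement at increasing depths and averaging the number of OTUs observed. The function name, the default of 10 iterations, the fixed seed and the example counts are assumptions made for this illustration and do not mirror any particular QIIME script.

```python
import random

def rarefaction_curve(counts, depths, iterations=10, seed=0):
    """Mean number of observed OTUs at each subsampling depth.

    counts: reads per OTU in one sample. Depths larger than the total
    number of reads in the sample are skipped.
    """
    rng = random.Random(seed)
    # Expand the count vector into one entry per read, labelled by OTU index.
    reads = [otu for otu, c in enumerate(counts) for _ in range(c)]
    curve = {}
    for depth in depths:
        if depth > len(reads):
            continue
        observed = [len(set(rng.sample(reads, depth))) for _ in range(iterations)]
        curve[depth] = sum(observed) / iterations
    return curve

# Hypothetical sample: 100 reads spread over 7 OTUs, two of them singletons.
sample_counts = [50, 30, 10, 5, 3, 1, 1]
curve = rarefaction_curve(sample_counts, depths=[10, 25, 50, 100, 200])
for depth, mean_otus in curve.items():
    print(depth, mean_otus)
```

If the curve is still climbing at the deepest subsampling level, the sequencing depth was probably not enough to capture the rarer OTUs.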

17 Alpha diversity rarefaction: QIIME tutorial and some examples. MC / Cardenas PA, Cooper PJ, Cox MJ, Chico M, Arias C, et al. (2012) Upper Airways Microbiota in Antibiotic-Naïve Wheezing and Healthy Infants from the Tropics of Rural Ecuador. PLoS ONE 7(10): e doi: /journal.p

18 = recommended

19 Non-phylogenetic beta diversity
Bray-Curtis dissimilarity: based on species abundance or count data. 0 ≤ BC ≤ 1, where 0 = identical (the two sites have all the same species) and 1 = the two sites do not share any species. NB: not a true distance (it does not always obey the triangle inequality).
Jaccard index: dissimilarity measure for presence/absence data (species present or absent).
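Both measures are short enough to write out. The sketch below assumes the usual definitions: Bray-Curtis as the sum of absolute count differences divided by the sum of all counts, and Jaccard dissimilarity as one minus shared species over total species. The site vectors are hypothetical, and the last lines show a small case where Bray-Curtis breaks the triangle inequality.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors (same OTU order)."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator

def jaccard_dissimilarity(u, v):
    """1 - (shared species / species present in either site)."""
    present_u = {i for i, a in enumerate(u) if a > 0}
    present_v = {i for i, b in enumerate(v) if b > 0}
    return 1 - len(present_u & present_v) / len(present_u | present_v)

# Hypothetical sites, counts given in the same OTU order.
site_1 = [6, 7, 4]
site_2 = [10, 0, 6]
print(round(bray_curtis(site_1, site_2), 3))
print(round(jaccard_dissimilarity(site_1, site_2), 3))

# Triangle inequality check: going from a to b via c is "shorter" than
# going directly, which a true distance would not allow.
a, b, c = [1, 0], [0, 1], [1, 1]
print(bray_curtis(a, b), bray_curtis(a, c) + bray_curtis(c, b))
```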

20 Phylogenetic beta diversity: UniFrac distance UNIFRAC help:

21 UNIFRAC help:
Raw unweighted UniFrac: sum of the branch lengths that are unique to one environment or the other, $u = \sum_i l_i |A_i - B_i|$, where $l_i$ is the branch length between node i and its parent, and $A_i$ and $B_i$ are indicators equal to 0 or 1 as descendants of node i are absent or present in communities A and B respectively.
[Figure: A = red, B = blue; branches in common are purple, branches unique to A are red and branches unique to B are blue.]
Presence/absence metric.
Raw weighted UniFrac: branch lengths are weighted by the relative abundance of sequences.
Normalised weighted UniFrac: takes abundance into account and normalises by branch length.
Rapidly evolving lineages (with long branch lengths) can skew UniFrac.
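The unweighted calculation can be sketched on a toy tree. Everything concrete here is hypothetical: the tree, its branch lengths, the child-to-parent dictionary layout and the function names. The sketch returns both the raw value (branch length unique to one community) and that value divided by the total branch length covered by either community, which is one common way to scale it to the range 0 to 1.

```python
# Hypothetical tree: child -> (parent, branch length). The root has no entry.
tree = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "n1": ("root", 0.5),
    "C": ("n2", 1.5),
    "D": ("n2", 1.0),
    "n2": ("root", 3.0),
}

def covered_branches(tree, otus):
    """Branches (named by their child node) with a descendant in the community."""
    covered = set()
    for node in otus:
        while node in tree and node not in covered:
            covered.add(node)
            node = tree[node][0]
    return covered

def unweighted_unifrac(tree, community_a, community_b):
    """Return (raw, fraction): branch length unique to one community, and
    that length as a fraction of all branch length covered by either."""
    in_a = covered_branches(tree, community_a)
    in_b = covered_branches(tree, community_b)
    raw = sum(tree[n][1] for n in in_a ^ in_b)    # branches where A_i != B_i
    total = sum(tree[n][1] for n in in_a | in_b)
    return raw, (raw / total if total else 0.0)

print(unweighted_unifrac(tree, {"A", "B"}, {"B", "C"}))
print(unweighted_unifrac(tree, {"A", "B"}, {"C", "D"}))
```

Communities drawn from the same clade share most of their branch length and score near 0, while communities from different clades share none and score 1.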

22 Unifrac distance matrix and clustering UNIFRAC help:

23 UPGMA clustering of UniFrac distance matrix
UPGMA (Unweighted Pair Group Method with Arithmetic Mean) constructs a rooted tree or dendrogram that reflects the structure present in a pairwise distance matrix (or similarity matrix); in this case the UniFrac distance matrix.
Simple bottom-up agglomerative hierarchical clustering method: the nearest two samples are merged into a new higher-level cluster, the distance between the new cluster and the remaining samples is calculated, and this is repeated until all samples are clustered.
Great step-by-step explanation at: (Dr Edwards, University Southampton)
Assumes a constant rate of evolution (equal rates of mutation).
UNIFRAC help:
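The merge-and-recalculate loop described above can be sketched in plain Python. This is an illustrative implementation, not QIIME's: the input format (a dictionary keyed by frozenset pairs), the nested-tuple cluster labels and the example distances are all assumptions. Node height is taken as half the distance between the two merged clusters.

```python
def upgma(dist):
    """dist maps frozenset({a, b}) -> distance for every pair of samples.
    Returns the merges in order as (cluster_a, cluster_b, node_height)."""
    d = dict(dist)
    sizes = {name: 1 for pair in d for name in pair}   # cluster -> n samples
    merges = []
    while len(sizes) > 1:
        pair = min(d, key=d.get)                 # nearest two clusters
        a, b = sorted(pair, key=repr)            # fixed order for labelling
        merges.append((a, b, d[pair] / 2))
        new = (a, b)
        for other in sizes:
            if other in pair:
                continue
            # Size-weighted mean = arithmetic mean over all sample pairs.
            d[frozenset((new, other))] = (
                sizes[a] * d[frozenset((a, other))]
                + sizes[b] * d[frozenset((b, other))]
            ) / (sizes[a] + sizes[b])
        # Drop every distance that involves the two merged clusters.
        d = {p: v for p, v in d.items() if a not in p and b not in p}
        sizes[new] = sizes.pop(a) + sizes.pop(b)
    return merges

# Hypothetical distance matrix for four samples.
pairs = {("A", "B"): 2, ("A", "C"): 5, ("B", "C"): 7,
         ("A", "D"): 9, ("B", "D"): 11, ("C", "D"): 10}
dist = {frozenset(k): v for k, v in pairs.items()}
merges = upgma(dist)
for a, b, height in merges:
    print(a, b, height)
```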

24 Image from

25 Principal Coordinate Analysis (PCoA or PCO)
Also sometimes called classical MDS (multidimensional scaling).
Can use any distance matrix (must obey the triangle inequality), in this case the UniFrac distance matrix.
Assumes a linear relation.
Represents the distances between samples graphically in multidimensional space (n-1 dimensions, n = number of samples).
A new set of reduced variables is derived from the original distances and used to scale the samples.
Samples are then represented on a 2D or 3D plot with these new variables as axes, and the relationships between the samples on the plot should reflect their underlying distances.
Ordinates the data on the plot so that axis 1 (PC1) explains the greatest amount of variance, axis 2 (PC2) explains the next greatest amount of variance, etc.
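A dependency-free sketch of how the first PCoA axis can be derived from a distance matrix: square the distances, double-centre them, then take the leading eigenvector scaled by the square root of its eigenvalue. Power iteration stands in for a full eigendecomposition, which is a simplification chosen for this example; real analyses would use a linear algebra library. The starting vector, iteration count and example positions are assumptions.

```python
import math

def pcoa_first_axis(dist, iterations=200):
    """Return (coordinates on axis 1, fraction of variance explained)."""
    n = len(dist)
    # Double-centre -0.5 * d^2 so that rows and columns sum to zero.
    sq = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand = sum(row_mean) / n
    b = [[sq[i][j] - row_mean[i] - row_mean[j] + grand for j in range(n)]
         for i in range(n)]
    # Power iteration; the start vector must not be constant.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n))
                     for i in range(n))
    trace = sum(b[i][i] for i in range(n))      # total variance
    coords = [x * math.sqrt(max(eigenvalue, 0.0)) for x in v]
    return coords, (eigenvalue / trace if trace else 0.0)

# Hypothetical samples lying on a line, so one axis should explain everything.
positions = [0.0, 1.0, 3.0, 7.0]
dist = [[abs(p - q) for q in positions] for p in positions]
coords, explained = pcoa_first_axis(dist)
print([round(c, 3) for c in coords], round(explained, 3))
```

The recovered coordinates are centred on zero, but the gaps between them match the original distances, which is the property the slide describes.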

26 Nescent_qiime_tutorial_june2012.pdf

27 Five control samples are all red and the four Fast samples are all blue. This lets you easily visualize clustering by metadata category. The 3d visualization software allows you to rotate the axes to see the data from different perspectives. Metadata categories as they appeared in the columns in your mapping file

28 Jackknife: assess confidence in the nodes of the UPGMA tree and in the PCoA
Choose a smaller number of sequences randomly from each sample.
Make a UPGMA tree from this subset of sequences.
Compare it with the UPGMA tree made from all the sequences.
This process is repeated (default: 10x) with many random subsets of sequences, and the tree nodes that appear more consistently across subsets have higher support.
Node colours: red for % support, yellow for 50-75%, green for 25-50%, and blue for < 25% support.

29 The jackknifed replicate PCoA plots can be compared to assess the degree of variation from one replicate to the next. QIIME displays this variation by displaying confidence ellipsoids around the samples represented in a PCoA plot.

30 Communities clustered using PCoA of the unweighted UniFrac distance matrix Science 18 December 2009: Vol. 326 no pp DOI: /science

31 OTU heatmap [Figure: heatmap of samples against OTUs, annotated with the classification of OTUs]

32 Rows (OTUs): ordered by OTU phylogenetic tree. Columns (samples): ordered by UPGMA tree. Not currently implemented directly in QIIME; use other software such as R.

33 R package: phyloseq
phyloseq: McMurdie PJ, Holmes S (2013) phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data. PLoS ONE 8(4): e doi: /journal.pone
Improved OTU heatmap visualizations can be generated using the plot_heatmap() command in the phyloseq package for R.
Could use ordination rather than hierarchical clustering to order samples.

34 Examples: papers that use other hierarchical clustering to order samples in heatmap

37 Note on rarefying
beta_diversity_through_plots.py -i otu_table.biom -o bdiv_even100/ -t rep_set.tre -m Fasting_Map.txt -e 100
Note: don't confuse rarefying with rarefaction.
Rarefaction: sample without replacement at many different sequencing depths; used for alpha diversity; statistically valid.
Rarefying: library size normalization by random subsampling without replacement; an attempt to normalize by selecting the same number of sequences from each sample; not statistically valid. E.g. if there are 100 reads for sample A and more reads for sample B, then take only 100 reads from sample B to normalize.
A method has been described for taking the different sequencing depths of samples into account without removing data (integrated into the R package phyloseq).
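To make the distinction concrete, rarefying to an even depth of 100 reads can be sketched as a single random subsample per sample. This is an illustration only, not the code behind the QIIME script: the function name, the fixed seed, the example table and the choice to drop samples with fewer reads than the requested depth are all assumptions.

```python
import random

def rarefy(counts, depth, seed=0):
    """Randomly keep `depth` reads from one sample, without replacement.
    Returns the new per-OTU counts, or None if the sample is too shallow."""
    if sum(counts) < depth:
        return None
    rng = random.Random(seed)
    # One entry per read, labelled with the index of its OTU.
    reads = [otu for otu, c in enumerate(counts) for _ in range(c)]
    rarefied = [0] * len(counts)
    for otu in rng.sample(reads, depth):
        rarefied[otu] += 1
    return rarefied

# Hypothetical OTU table: sample -> reads per OTU.
otu_table = {
    "A": [60, 30, 10, 0],      # 100 reads: kept unchanged at depth 100
    "B": [700, 200, 90, 10],   # 1000 reads: 900 reads are thrown away
    "C": [20, 5, 0, 0],        # 25 reads: too shallow at depth 100
}
even_table = {name: rarefy(counts, depth=100) for name, counts in otu_table.items()}
for name, counts in even_table.items():
    print(name, counts)
```

The output shows the cost the slide warns about: most of the reads in the deeper sample are discarded, and rare OTUs may vanish from it by chance.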

38 Other QIIME visuals: OTU network
Visual representation of shared OTUs and unique OTUs.
Red circles = samples, white squares = OTUs; green = fasting, blue = control.
Core set of OTUs that differentiate fasting from control?

39 Other QIIME analyses: Procrustes analysis. Compare UniFrac PCoA plots generated by two different processing pipelines, different 16S variable regions, different sequencing technologies, or repeated samples.

40 Which 16S database?
Databases may differ in:
- Greengenes (archaeal, bacterial), Silva (archaeal, bacterial, eukaryotic), RDP (archaeal, bacterial)
- Coverage or number of sequences, quality
- Taxonomic classification of sequences
- Frequency of updating
- Compatibility of data with choice of analysis platform
Options?
1. Choose the same database throughout your study that is compatible with the tools you are using AND/OR the same database used in other studies if you want to compare
2. Compare analyses using different databases
3. Database of well curated/classified sequences specific to your environment(?)

Lecture 2: Diversity, Distances, adonis. Lecture 2: Diversity, Distances, adonis. Alpha- Diversity. Alpha diversity definition(s)

Lecture 2: Diversity, Distances, adonis. Lecture 2: Diversity, Distances, adonis. Alpha- Diversity. Alpha diversity definition(s) Lecture 2: Diversity, Distances, adonis Lecture 2: Diversity, Distances, adonis Diversity - alpha, beta (, gamma) Beta- Diversity in practice: Ecological Distances Unsupervised Learning: Clustering, etc

More information

Taxonomy and Clustering of SSU rrna Tags. Susan Huse Josephine Bay Paul Center August 5, 2013

Taxonomy and Clustering of SSU rrna Tags. Susan Huse Josephine Bay Paul Center August 5, 2013 Taxonomy and Clustering of SSU rrna Tags Susan Huse Josephine Bay Paul Center August 5, 2013 Primary Methods of Taxonomic Assignment Bayesian Kmer Matching RDP http://rdp.cme.msu.edu Wang, et al (2007)

More information

Title ghost-tree: creating hybrid-gene phylogenetic trees for diversity analyses

Title ghost-tree: creating hybrid-gene phylogenetic trees for diversity analyses 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 Title ghost-tree: creating hybrid-gene phylogenetic trees for diversity analyses

More information

Microbiome: 16S rrna Sequencing 3/30/2018

Microbiome: 16S rrna Sequencing 3/30/2018 Microbiome: 16S rrna Sequencing 3/30/2018 Skills from Previous Lectures Central Dogma of Biology Lecture 3: Genetics and Genomics Lecture 4: Microarrays Lecture 12: ChIP-Seq Phylogenetics Lecture 13: Phylogenetics

More information

Amplicon Sequencing. Dr. Orla O Sullivan SIRG Research Fellow Teagasc

Amplicon Sequencing. Dr. Orla O Sullivan SIRG Research Fellow Teagasc Amplicon Sequencing Dr. Orla O Sullivan SIRG Research Fellow Teagasc What is Amplicon Sequencing? Sequencing of target genes (are regions of ) obtained by PCR using gene specific primers. Why do we do

More information

Outline Classes of diversity measures. Species Divergence and the Measurement of Microbial Diversity. How do we describe and compare diversity?

Outline Classes of diversity measures. Species Divergence and the Measurement of Microbial Diversity. How do we describe and compare diversity? Species Divergence and the Measurement of Microbial Diversity Cathy Lozupone University of Colorado, Boulder. Washington University, St Louis. Outline Classes of diversity measures α vs β diversity Quantitative

More information

FIG S1: Rarefaction analysis of observed richness within Drosophila. All calculations were

FIG S1: Rarefaction analysis of observed richness within Drosophila. All calculations were Page 1 of 14 FIG S1: Rarefaction analysis of observed richness within Drosophila. All calculations were performed using mothur (2). OTUs were defined at the 3% divergence threshold using the average neighbor

More information

Lecture: Mixture Models for Microbiome data

Lecture: Mixture Models for Microbiome data Lecture: Mixture Models for Microbiome data Lecture 3: Mixture Models for Microbiome data Outline: - - Sequencing thought experiment Mixture Models (tangent) - (esp. Negative Binomial) - Differential abundance

More information

Chad Burrus April 6, 2010

Chad Burrus April 6, 2010 Chad Burrus April 6, 2010 1 Background What is UniFrac? Materials and Methods Results Discussion Questions 2 The vast majority of microbes cannot be cultured with current methods Only half (26) out of

More information

Nature Biotechnology: doi: /nbt Supplementary Figure 1. Detailed overview of the primer-free full-length SSU rrna library preparation.

Nature Biotechnology: doi: /nbt Supplementary Figure 1. Detailed overview of the primer-free full-length SSU rrna library preparation. Supplementary Figure 1 Detailed overview of the primer-free full-length SSU rrna library preparation. Detailed overview of the primer-free full-length SSU rrna library preparation. Supplementary Figure

More information

Lecture 3: Mixture Models for Microbiome data. Lecture 3: Mixture Models for Microbiome data

Lecture 3: Mixture Models for Microbiome data. Lecture 3: Mixture Models for Microbiome data Lecture 3: Mixture Models for Microbiome data 1 Lecture 3: Mixture Models for Microbiome data Outline: - Mixture Models (Negative Binomial) - DESeq2 / Don t Rarefy. Ever. 2 Hypothesis Tests - reminder

More information

Flowchart. (b) (c) (d)

Flowchart. (b) (c) (d) Flowchart (c) (b) (d) This workflow consists of the following steps: alpha diversity (microbial community evenness and richness) d1) Generate rarefied OTU tables (mulbple_rarefacbons.py) d2) Compute measures

More information

Multivariate Statistics 101. Ordination (PCA, NMDS, CA) Cluster Analysis (UPGMA, Ward s) Canonical Correspondence Analysis

Multivariate Statistics 101. Ordination (PCA, NMDS, CA) Cluster Analysis (UPGMA, Ward s) Canonical Correspondence Analysis Multivariate Statistics 101 Ordination (PCA, NMDS, CA) Cluster Analysis (UPGMA, Ward s) Canonical Correspondence Analysis Multivariate Statistics 101 Copy of slides and exercises PAST software download

More information

Supplementary Materials for

Supplementary Materials for advances.sciencemag.org/cgi/content/full/2/1/e1500997/dc1 Supplementary Materials for Social behavior shapes the chimpanzee pan-microbiome Andrew H. Moeller, Steffen Foerster, Michael L. Wilson, Anne E.

More information

Supplementary Information

Supplementary Information Supplementary Information Table S1. Per-sample sequences, observed OTUs, richness estimates, diversity indices and coverage. Samples codes as follows: YED (Young leaves Endophytes), MED (Mature leaves

More information

Probing diversity in a hidden world: applications of NGS in microbial ecology

Probing diversity in a hidden world: applications of NGS in microbial ecology Probing diversity in a hidden world: applications of NGS in microbial ecology Guus Roeselers TNO, Microbiology & Systems Biology Group Symposium on Next Generation Sequencing October 21, 2013 Royal Museum

More information

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

More information

An introduction to the picante package

An introduction to the picante package An introduction to the picante package Steven Kembel (steve.kembel@gmail.com) April 2010 Contents 1 Installing picante 1 2 Data formats in picante 2 2.1 Phylogenies................................ 2 2.2

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S1 (box). Supplementary Methods description. Prokaryotic Genome Database Archaeal and bacterial genome sequences were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/)

More information

Taxonomical Classification using:

Taxonomical Classification using: Taxonomical Classification using: Extracting ecological signal from noise: introduction to tools for the analysis of NGS data from microbial communities Bergen, April 19-20 2012 INTRODUCTION Taxonomical

More information

Characterizing and predicting cyanobacterial blooms in an 8-year

Characterizing and predicting cyanobacterial blooms in an 8-year 1 2 3 4 5 Characterizing and predicting cyanobacterial blooms in an 8-year amplicon sequencing time-course Authors Nicolas Tromas 1*, Nathalie Fortin 2, Larbi Bedrani 1, Yves Terrat 1, Pedro Cardoso 4,

More information

What is the range of a taxon? A scaling problem at three levels: Spa9al scale Phylogene9c depth Time

What is the range of a taxon? A scaling problem at three levels: Spa9al scale Phylogene9c depth Time What is the range of a taxon? A scaling problem at three levels: Spa9al scale Phylogene9c depth Time 1 5 0.25 0.15 5 0.05 0.05 0.10 2 0.10 0.10 0.20 4 Reminder of what a range-weighted tree is Actual Tree

More information

How to quantify biological diversity: taxonomical, functional and evolutionary aspects. Hanna Tuomisto, University of Turku

How to quantify biological diversity: taxonomical, functional and evolutionary aspects. Hanna Tuomisto, University of Turku How to quantify biological diversity: taxonomical, functional and evolutionary aspects Hanna Tuomisto, University of Turku Why quantify biological diversity? understanding the structure and function of

More information

Lecture 2: Descriptive statistics, normalizations & testing

Lecture 2: Descriptive statistics, normalizations & testing Lecture 2: Descriptive statistics, normalizations & testing From sequences to OTU table Sequencing Sample 1 Sample 2... Sample N Abundances of each microbial taxon in each of the N samples 2 1 Normalizing

More information

The Effect of Primer Choice and Short Read Sequences on the Outcome of 16S rrna Gene Based Diversity Studies

The Effect of Primer Choice and Short Read Sequences on the Outcome of 16S rrna Gene Based Diversity Studies The Effect of Primer Choice and Short Read Sequences on the Outcome of 16S rrna Gene Based Diversity Studies Jonas Ghyselinck 1 *., Stefan Pfeiffer 2 *., Kim Heylen 1, Angela Sessitsch 2, Paul De Vos 1

More information

Censusing the Sea in the 21 st Century

Censusing the Sea in the 21 st Century Censusing the Sea in the 21 st Century Nancy Knowlton & Matthieu Leray Photo: Ove Hoegh-Guldberg Smithsonian s National Museum of Natural History Estimates of Marine/Reef Species Numbers (Millions) Marine

More information

Lecture 5: Ecological distance metrics; Principal Coordinates Analysis. Univariate testing vs. community analysis

Lecture 5: Ecological distance metrics; Principal Coordinates Analysis. Univariate testing vs. community analysis Lecture 5: Ecological distance metrics; Principal Coordinates Analysis Univariate testing vs. community analysis Univariate testing deals with hypotheses concerning individual taxa Is this taxon differentially

More information

Supplemental Online Results:

Supplemental Online Results: Supplemental Online Results: Functional, phylogenetic, and computational determinants of prediction accuracy using reference genomes A series of tests determined the relationship between PICRUSt s prediction

More information

Supplementary Information

Supplementary Information Supplementary Information Altitudinal patterns of diversity and functional traits of metabolically active microorganisms in stream biofilms Linda Wilhelm 1, Katharina Besemer 2, Lena Fragner 3, Hannes

More information

An Automated Phylogenetic Tree-Based Small Subunit rrna Taxonomy and Alignment Pipeline (STAP)

An Automated Phylogenetic Tree-Based Small Subunit rrna Taxonomy and Alignment Pipeline (STAP) An Automated Phylogenetic Tree-Based Small Subunit rrna Taxonomy and Alignment Pipeline (STAP) Dongying Wu 1 *, Amber Hartman 1,6, Naomi Ward 4,5, Jonathan A. Eisen 1,2,3 1 UC Davis Genome Center, University

More information

Microbial analysis with STAMP

Microbial analysis with STAMP Microbial analysis with STAMP Conor Meehan cmeehan@itg.be A quick aside on who I am Tangents already! Who I am A postdoc at the Institute of Tropical Medicine in Antwerp, Belgium Mycobacteria evolution

More information

Studying the effect of species dominance on diversity patterns using Hill numbers-based indices

Studying the effect of species dominance on diversity patterns using Hill numbers-based indices Studying the effect of species dominance on diversity patterns using Hill numbers-based indices Loïc Chalmandrier Loïc Chalmandrier Diversity pattern analysis November 8th 2017 1 / 14 Introduction Diversity

More information

Lecture 5: Ecological distance metrics; Principal Coordinates Analysis. Univariate testing vs. community analysis

Lecture 5: Ecological distance metrics; Principal Coordinates Analysis. Univariate testing vs. community analysis Lecture 5: Ecological distance metrics; Principal Coordinates Analysis Univariate testing vs. community analysis Univariate testing deals with hypotheses concerning individual taxa Is this taxon differentially

More information

Microbiota: Its Evolution and Essence. Hsin-Jung Joyce Wu "Microbiota and man: the story about us

Microbiota: Its Evolution and Essence. Hsin-Jung Joyce Wu Microbiota and man: the story about us Microbiota: Its Evolution and Essence Overview q Define microbiota q Learn the tool q Ecological and evolutionary forces in shaping gut microbiota q Gut microbiota versus free-living microbe communities

More information

Assigning Taxonomy to Marker Genes. Susan Huse Brown University August 7, 2014

Assigning Taxonomy to Marker Genes. Susan Huse Brown University August 7, 2014 Assigning Taxonomy to Marker Genes Susan Huse Brown University August 7, 2014 In a nutshell Taxonomy is assigned by comparing your DNA sequences against a database of DNA sequences from known taxa Marker

More information

BIO 682 Multivariate Statistics Spring 2008

BIO 682 Multivariate Statistics Spring 2008 BIO 682 Multivariate Statistics Spring 2008 Steve Shuster http://www4.nau.edu/shustercourses/bio682/index.htm Lecture 11 Properties of Community Data Gauch 1982, Causton 1988, Jongman 1995 a. Qualitative:

More information

MiGA: The Microbial Genome Atlas

MiGA: The Microbial Genome Atlas December 12 th 2017 MiGA: The Microbial Genome Atlas Jim Cole Center for Microbial Ecology Dept. of Plant, Soil & Microbial Sciences Michigan State University East Lansing, Michigan U.S.A. Where I m From

More information

BINF6201/8201. Molecular phylogenetic methods

BINF6201/8201. Molecular phylogenetic methods BINF60/80 Molecular phylogenetic methods 0-7-06 Phylogenetics Ø According to the evolutionary theory, all life forms on this planet are related to one another by descent. Ø Traditionally, phylogenetics

More information

diversity(datamatrix, index= shannon, base=exp(1))

diversity(datamatrix, index= shannon, base=exp(1)) Tutorial 11: Diversity, Indicator Species Analysis, Cluster Analysis Calculating Diversity Indices The vegan package contains the command diversity() for calculating Shannon and Simpson diversity indices.

More information

Phylogenetic diversity and conservation

Phylogenetic diversity and conservation Phylogenetic diversity and conservation Dan Faith The Australian Museum Applied ecology and human dimensions in biological conservation Biota Program/ FAPESP Nov. 9-10, 2009 BioGENESIS Providing an evolutionary

More information

Exploring Microbes in the Sea. Alma Parada Postdoctoral Scholar Stanford University

Exploring Microbes in the Sea. Alma Parada Postdoctoral Scholar Stanford University Exploring Microbes in the Sea Alma Parada Postdoctoral Scholar Stanford University Cruising the ocean to get us some microbes It s all about the Microbe! Microbes = microorganisms an organism that requires

More information

-Principal components analysis is by far the oldest multivariate technique, dating back to the early 1900's; ecologists have used PCA since the

-Principal components analysis is by far the oldest multivariate technique, dating back to the early 1900's; ecologists have used PCA since the 1 2 3 -Principal components analysis is by far the oldest multivariate technique, dating back to the early 1900's; ecologists have used PCA since the 1950's. -PCA is based on covariance or correlation

More information

Deciphering the Enigma of Undetected Species, Phylogenetic, and Functional Diversity. Based on Good-Turing Theory

Deciphering the Enigma of Undetected Species, Phylogenetic, and Functional Diversity. Based on Good-Turing Theory Metadata S1 Deciphering the Enigma of Undetected Species, Phylogenetic, and Functional Diversity Based on Good-Turing Theory Anne Chao, Chun-Huo Chiu, Robert K. Colwell, Luiz Fernando S. Magnago, Robin

More information

Algorithms in Bioinformatics

Algorithms in Bioinformatics Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Distance Methods Character Methods

More information

Using Topological Data Analysis to find discrimination between microbial states in human microbiome data

Using Topological Data Analysis to find discrimination between microbial states in human microbiome data Using Topological Data Analysis to find discrimination between microbial states in human microbiome data Mehrdad Yazdani 1,2, Larry Smarr 1,3 and Rob Knight 4 1 California Institute for Telecommunications

More information

Bacterial Communities in Women with Bacterial Vaginosis: High Resolution Phylogenetic Analyses Reveal Relationships of Microbiota to Clinical Criteria

Bacterial Communities in Women with Bacterial Vaginosis: High Resolution Phylogenetic Analyses Reveal Relationships of Microbiota to Clinical Criteria Bacterial Communities in Women with Bacterial Vaginosis: High Resolution Phylogenetic Analyses Reveal Relationships of Microbiota to Clinical Criteria Seminar presentation Pierre Barbera Supervised by:

More information

Distance Measures. Objectives: Discuss Distance Measures Illustrate Distance Measures
