Flowchart. (b) (c) (d)

[Figure: workflow flowchart, panels (b), (c), and (d)]

This workflow consists of the following steps:

Alpha diversity (microbial community evenness and richness)
  d1) Generate rarefied OTU tables (multiple_rarefactions.py)
  d2) Compute measures of alpha diversity for each rarefied OTU table (alpha_diversity.py)
  d3) Collate alpha diversity results (collate_alpha.py)
  d4) Generate alpha rarefaction plots (make_rarefaction_plots.py)

Beta diversity (similarity between individual microbial communities)
  d5) Rarefy OTU table to remove sampling depth heterogeneity (single_rarefaction.py)
  d6) Compute beta diversity (beta_diversity.py)
  d7) Run Principal Coordinates Analysis (principal_coordinates.py)
  d8) Generate PCoA plots (make_3d_plots.py or make_2d_plots.py)
  d9) Statistical analyses

Alpha diversity (microbial community evenness and richness, i.e. within-sample diversity)

Alpha diversity measures in QIIME: http://scikit-bio.org/docs/latest/generated/skbio.diversity.alpha.html

QIIME currently supports a number of alpha diversity metrics. To list them:

    alpha_diversity.py -s

Non-phylogenetic example: Shannon-Wiener diversity index.
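As a rough illustration of what a richness metric and the Shannon-Wiener index measure, here is a minimal pure-Python sketch. The function names and the sample counts are hypothetical, and the logarithm base (2 here) is an assumption; check which base your tool uses before comparing values.

```python
import math


def observed_otus(counts):
    """Richness: number of OTUs with a nonzero count in one sample."""
    return sum(1 for c in counts if c > 0)


def shannon(counts, base=2):
    """Shannon-Wiener index H = -sum(p_i * log(p_i)) over OTUs with p_i > 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h


# Hypothetical OTU counts for two samples with the same richness.
even_sample = [10, 10, 10, 10]   # perfectly even community
skewed_sample = [37, 1, 1, 1]    # dominated by one OTU
print(observed_otus(even_sample), shannon(even_sample))
print(observed_otus(skewed_sample), shannon(skewed_sample))
```

Both samples contain four OTUs, but the even sample has the higher Shannon value, which is why the index is described as capturing evenness as well as richness.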

d1) Generate rarefied OTU tables

Perform multiple subsamplings on an OTU table:

    multiple_rarefactions.py -i otu_table.biom -m 100 -x 140 -s 5 -n 2 -o rarefied_otu_tables/

Options:
  -m, --min       Minimum number of seqs/sample for rarefaction.
  -x, --max       Maximum number of seqs/sample (inclusive) for rarefaction.
  -s, --step      Size of each step between the min and max seqs/sample (e.g. min, min+step, ... for level <= max).
  -n, --num_reps  The number of iterations at each step. [default: 10]

Notes:
  - Any sample containing fewer sequences in the input file than the requested number of sequences per sample is removed from the output rarefied OTU table.
  - --max should not exceed the number of sequences in the sample with the most coverage/depth.
  - Output files are named rarefaction_##_#.txt: the first set of numbers is the number of sequences sampled, and the last number is the iteration number.
  - In each rarefied table, the counts for every sample sum to the number of sequences sampled.
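To make the interaction of --min, --max, --step, and --num_reps concrete, the sketch below enumerates the (depth, iteration) pairs such a run would produce. It does not call QIIME; the function names are hypothetical, and numbering iterations from 0 is an assumption.

```python
def rarefaction_depths(min_depth, max_depth, step):
    """Depths visited: min, min+step, ... while depth <= max (max is inclusive)."""
    return list(range(min_depth, max_depth + 1, step))


def planned_tables(min_depth, max_depth, step, num_reps):
    """One (depth, iteration) pair per rarefied table that would be written."""
    return [(depth, rep)
            for depth in rarefaction_depths(min_depth, max_depth, step)
            for rep in range(num_reps)]  # assumption: iterations numbered from 0


# Mirrors the options used above: -m 100 -x 140 -s 5 -n 2
tables = planned_tables(100, 140, 5, 2)
print(len(tables), "tables; first", tables[0], "last", tables[-1])
```

With these settings there are nine depths (100, 105, ..., 140) and two iterations at each depth, so 18 rarefied tables in total.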

d2) Compute measures of alpha diversity for each rarefied OTU table

Process a single OTU table:

    alpha_diversity.py -i otu_table.biom -m observed_otus,shannon,pd_whole_tree -o alpha_div.txt -t rep_phylo.tre

Process all OTU tables in a given folder:

    alpha_diversity.py -i rarefied_otu_tables/ -m observed_otus,shannon,pd_whole_tree -o rarefied_otu_tables/ -t rep_phylo.tre

d3) Collate alpha diversity results

    collate_alpha.py -i rarefied_otu_tables/ -o rarefied_otu_tables/

Output: one file for every alpha diversity metric used.

d4) Generate alpha rarefaction plots

    make_rarefaction_plots.py -i rarefied_otu_tables/alpha_div_collated/ -m Fasting_Map.txt --generate_average_tables --generate_per_sample_plots -o rarefied_otu_tables/alpha_plot/

Beta diversity (similarity between individual microbial communities)

Beta diversity metrics assess the differences between microbial communities. The fundamental output of these comparisons is a square matrix in which a distance or dissimilarity is calculated between every pair of community samples. The data in this distance matrix can be visualized with analyses such as Principal Coordinates Analysis (PCoA) and hierarchical clustering.

Like alpha diversity, many beta diversity metrics can be calculated with QIIME. To list them:

    beta_diversity.py -s

Beta diversity measures are either phylogenetic or non-phylogenetic. The phylogenetic measures, weighted and unweighted UniFrac, are used extensively in recent projects.
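The square distance matrix described above can be illustrated with a non-phylogenetic metric. The following is a minimal sketch using Bray-Curtis dissimilarity on hypothetical count vectors; the function names are illustrative, and UniFrac is not covered because it also requires a phylogenetic tree.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


def distance_matrix(samples, metric):
    """Square, symmetric matrix with one dissimilarity per pair of samples."""
    n = len(samples)
    dm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dm[i][j] = dm[j][i] = metric(samples[i], samples[j])
    return dm


# Hypothetical OTU counts: one list per sample, same OTU order in each.
samples = [[6, 4, 0], [4, 6, 0], [0, 0, 10]]
for row in distance_matrix(samples, bray_curtis):
    print(row)
```

The diagonal is zero (each sample compared with itself), and a value of 1.0 means two samples share no OTUs.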

d5) Rarefy OTU table to remove sampling depth heterogeneity (optional)

To compare samples at equal sequencing depth, this step creates a subsampled OTU table by random sampling of the input OTU table. Samples that have fewer sequences than the requested rarefaction depth are omitted.

    single_rarefaction.py -i otu_table.biom -o otu_table_even100.biom -d 100

Options:
  -d, --depth  Number of sequences to subsample per sample.

This is a one-time subsampling of the OTU table, unlike the repeated subsampling used to build rarefaction curves:

    multiple_rarefactions.py -i otu_table.biom -m 100 -x 140 -s 5 -n 2 -o rarefied_otu_tables/
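The subsampling itself can be sketched in a few lines. The example below assumes sampling without replacement and uses hypothetical names (`rarefy_sample`, `rarefy_table`) and a toy table; it illustrates the idea and is not QIIME's implementation.

```python
import random


def rarefy_sample(counts, depth, rng):
    """Draw `depth` sequences without replacement from one sample's OTU counts."""
    pool = [otu for otu, c in enumerate(counts) for _ in range(c)]
    rarefied = [0] * len(counts)
    for otu in rng.sample(pool, depth):
        rarefied[otu] += 1
    return rarefied


def rarefy_table(table, depth, seed=0):
    """Rarefy every sample to `depth`; samples with fewer sequences are omitted."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return {name: rarefy_sample(counts, depth, rng)
            for name, counts in table.items()
            if sum(counts) >= depth}


# Hypothetical table: sample name -> counts per OTU.
table = {"A": [60, 30, 30], "B": [50, 20, 10], "C": [100, 100, 0]}
even = rarefy_table(table, depth=100)
print(even)  # sample "B" has only 80 sequences, so it is dropped
```

Every remaining sample sums to exactly the requested depth, which is what removes the sampling depth heterogeneity.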

d6) Compute beta diversity

Single-file beta diversity (non-phylogenetic):

    beta_diversity.py -i otu_table.biom -m bray_curtis -o beta_div

Single-file beta diversity (phylogenetic):

    beta_diversity.py -i otu_table.biom -m weighted_unifrac,unweighted_unifrac -o beta_div -t rep_phylo.tre

Multiple-file (batch) beta diversity (phylogenetic):

    beta_diversity.py -i otu_tables/ -m weighted_unifrac,unweighted_unifrac -o beta_div/ -t rep_phylo.tre

d7) Run Principal Coordinates Analysis (visualizations)

PCoA is a technique that helps to extract and visualize a few highly informative components of variation from complex, multidimensional data. The transformation maps the samples in the distance matrix onto a new set of orthogonal axes such that the first principal coordinate explains the maximum amount of variation, the second explains the next largest amount, and so on. The principal coordinates can be plotted in two or three dimensions to provide an intuitive visualization of differences between samples.

    principal_coordinates.py -i beta_div/ -o pcoa/
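The transformation described above can be sketched without any libraries for the first axis only. The code below double-centers the squared distances (Gower centering) and then uses power iteration to find the leading eigenvector; this is a simplification chosen to stay dependency-free, not how principal_coordinates.py is implemented, and the function names are hypothetical. A full analysis would use an eigendecomposition to obtain all axes.

```python
import math


def gower_center(dm):
    """Return B = -1/2 * J * D^2 * J, the double-centered squared distances."""
    n = len(dm)
    a = [[-0.5 * dm[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]  # equals column means: a is symmetric
    grand_mean = sum(row_mean) / n
    return [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
            for i in range(n)]


def first_principal_coordinate(dm, iterations=200):
    """Leading PCoA axis via power iteration; returns (coordinates, eigenvalue)."""
    b = gower_center(dm)
    n = len(b)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        eigenvalue = norm
    # Scale the unit eigenvector by sqrt(eigenvalue) to get sample coordinates.
    return [x * math.sqrt(eigenvalue) for x in v], eigenvalue


# Hypothetical samples lying on a line at positions 0, 1, and 3.
dm = [[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
coords, eigenvalue = first_principal_coordinate(dm)
print(coords, eigenvalue)
```

Because the toy distances come from points on a line, a single axis reproduces them exactly (up to sign and shift). Real community distances need several axes, and non-Euclidean metrics can yield negative eigenvalues, which this sketch does not handle.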

d8) Generate PCoA plots (visualizations)

Make 2D PCoA plots:

    make_2d_plots.py -i pcoa/pcoa_weighted_unifrac_otu_table.txt -m Fasting_Map.txt -o 2d_plots/

Color by a specific category:

    make_2d_plots.py -i pcoa/pcoa_weighted_unifrac_otu_table.txt -m Fasting_Map.txt -o 2d_plots/ -b Treatment

Color by any combination of categories (quoted so the shell does not interpret &&):

    make_2d_plots.py -i pcoa/pcoa_weighted_unifrac_otu_table.txt -m Fasting_Map.txt -o 2d_plots/ -b 'Treatment&&DOB'

d8) Generate PCoA plots (visualizations, continued)

Make 3D PCoA plots:

    make_emperor.py -i pcoa/pcoa_weighted_unifrac_otu_table.txt -m Fasting_Map.txt -o 3d_plots/

d9.1) Statistical analyses: creating distance comparisons and plots

    make_distance_boxplots.py -d weighted_unifrac_otu_table.txt -m Fasting_Map.txt -o ./ -f 'Treatment' --save_raw_data

This plots within-group and between-group distance comparisons. Significance is assessed with two-sided Student's two-sample t-tests.
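To show what "within" and "between" distances mean, the sketch below splits the pairwise distances of a small hypothetical matrix by group membership and computes a pooled-variance (Student's) t statistic. The names and data are illustrative. The p-value is omitted because it needs the t distribution, which is not in the Python standard library, and pairwise distances that share samples are not independent, so the statistic should be read with care.

```python
import math
from statistics import mean, variance


def split_distances(dm, groups):
    """Separate pairwise distances into within-group and between-group lists."""
    within, between = [], []
    n = len(dm)
    for i in range(n):
        for j in range(i + 1, n):
            target = within if groups[i] == groups[j] else between
            target.append(dm[i][j])
    return within, between


def student_t(a, b):
    """Pooled-variance two-sample t statistic (each list needs >= 2 values)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


# Hypothetical distance matrix for four samples in two treatment groups.
dm = [[0.0, 0.1, 0.8, 0.9],
      [0.1, 0.0, 0.7, 0.8],
      [0.8, 0.7, 0.0, 0.2],
      [0.9, 0.8, 0.2, 0.0]]
groups = ["Control", "Control", "Fast", "Fast"]
within, between = split_distances(dm, groups)
print(within, between, student_t(within, between))
```

A strongly negative statistic here means within-group distances are smaller than between-group distances, which is the pattern the boxplots are meant to reveal.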

d) Microbiome diversity analyses

d9.2) Statistical analyses: comparing distance matrices

Distance matrices are compared with the Mantel test, a non-parametric statistical method that computes the correlation between two distance matrices. One common application is to determine whether a correlation exists between a community distance matrix (e.g. a UniFrac distance matrix) and a second matrix derived from an environmental parameter (e.g. difference in pH). The question is whether communities at dissimilar pH levels are more different from one another than communities at very similar pH levels. If so, this indicates a positive correlation between the two distance matrices. Non-parametric means that permutations are used to determine the p-value, or statistical significance.

    compare_distance_matrices.py --method=mantel -i weighted_unifrac_dm.txt,ph_dm.txt -o ./ -n 999
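The permutation logic behind the Mantel test can be sketched as follows: correlate the upper triangles of the two matrices, then repeatedly shuffle the sample order of one matrix and count how often the permuted correlation is at least as extreme. The function names and toy data are hypothetical, Pearson correlation and a two-sided test are assumptions, and this is not QIIME's code.

```python
import math
import random


def upper_triangle(dm):
    """Flatten the entries above the diagonal (each pair of samples once)."""
    n = len(dm)
    return [dm[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(dm1, dm2, permutations=999, seed=0):
    """Return (observed r, two-sided permutation p-value)."""
    rng = random.Random(seed)
    n = len(dm1)
    x = upper_triangle(dm1)
    observed = pearson(x, upper_triangle(dm2))
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # relabel samples: rows and columns move together
        permuted = [[dm2[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if abs(pearson(x, upper_triangle(permuted))) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical data: community distances that track pH differences exactly.
ph = [4.0, 5.0, 7.0, 10.0, 14.0]
ph_dm = [[abs(a - b) for b in ph] for a in ph]
community_dm = [[0.05 * d for d in row] for row in ph_dm]
r, p = mantel(community_dm, ph_dm)
print(r, p)
```

Rows and columns are permuted together because the unit being shuffled is the sample, not the individual distance; shuffling entries independently would break the matrix structure and give a misleading null distribution.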

d9.3) Statistical analyses: comparing categories

These methods analyze the statistical significance of sample groupings using distance matrices. Most of the comparison methods belong to the ANOVA family and determine whether the grouping of samples by a given category is statistically significant.

ANOSIM is non-parametric: statistical significance is determined through permutations. It only works with a categorical variable.

    compare_categories.py --method anosim -i weighted_unifrac_dm.txt -m map.txt -c Treatment -o ./ -n 999

A p-value below 0.05 indicates that, at an alpha of 0.05, the grouping of samples by the tested category is statistically significant. An R value fairly close to +1 indicates dissimilarity between the groups.
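For intuition about the R statistic and its permutation p-value, here is a minimal sketch. It ranks all pairwise distances, compares the mean rank of between-group pairs with that of within-group pairs, and scales by n(n-1)/4 so that R approaches +1 when groups are well separated. Function names and data are hypothetical, and ties receive average ranks by assumption.

```python
import random


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def anosim_r(dm, groups):
    """R = (mean between-group rank - mean within-group rank) / (n(n-1)/4)."""
    n = len(dm)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    ranks = average_ranks([dm[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    diff = sum(between) / len(between) - sum(within) / len(within)
    return diff / (n * (n - 1) / 4)


def anosim(dm, groups, permutations=999, seed=0):
    """Return (R, p): p is the share of label shuffles with R >= observed."""
    rng = random.Random(seed)
    observed = anosim_r(dm, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if anosim_r(dm, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical distances: two tight groups that are far from each other.
dm = [[0.0, 0.1, 0.8, 0.9],
      [0.1, 0.0, 0.7, 0.8],
      [0.8, 0.7, 0.0, 0.2],
      [0.9, 0.8, 0.2, 0.0]]
print(anosim(dm, ["Control", "Control", "Fast", "Fast"]))
```

With only four samples there are very few distinct label arrangements, so the p-value cannot become small even though R is at its maximum; real analyses need more samples per group.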

d9.4) Statistical analyses: comparing categories (continued)

Adonis first identifies the relevant centroids of the data and then calculates the squared deviations from these points. It can accept either categorical or continuous variables in the metadata mapping file. Significance tests are performed using F-tests based on sequential sums of squares from permutations of the raw data.

    compare_categories.py --method adonis -i weighted_unifrac_dm.txt -m map.txt -c Treatment -o ./ -n 999
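For a single categorical variable, the sums of squared deviations from centroids can be computed directly from pairwise distances, which gives a pseudo-F statistic. The sketch below covers only that categorical case, under the assumption that it matches the one-factor form of the method; continuous variables need a regression formulation that is not shown. The function name and data are hypothetical.

```python
def pseudo_f(dm, groups):
    """Pseudo-F for one categorical factor, computed from a distance matrix.

    SS_total  = (1/N)  * sum of squared distances over all pairs
    SS_within = sum over groups of (1/n_g) * sum of squared within-group distances
    F = (SS_among / (a - 1)) / (SS_within / (N - a)), with a = number of groups
    """
    n = len(dm)
    labels = sorted(set(groups))
    ss_total = sum(dm[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for label in labels:
        members = [i for i in range(n) if groups[i] == label]
        squared = sum(dm[i][j] ** 2
                      for pos, i in enumerate(members)
                      for j in members[pos + 1:])
        ss_within += squared / len(members)
    ss_among = ss_total - ss_within
    a = len(labels)
    return (ss_among / (a - 1)) / (ss_within / (n - a))


# Hypothetical one-dimensional samples: group A at 0 and 2, group B at 10 and 12.
positions = [0.0, 2.0, 10.0, 12.0]
dm = [[abs(p - q) for q in positions] for p in positions]
print(pseudo_f(dm, ["A", "A", "B", "B"]))
```

Significance would then come from recomputing the statistic on shuffled group labels, in the same way as the other permutation-based tests in this section.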

d9.5) Statistical analyses: supervised classification

Supervised classification assigns unlabeled communities to classes based on a set of labeled training communities, using Random Forests (requires the R randomForest package).

Single OTU table:

    supervised_learning.py -i otu_table.biom -m Fasting_Map.txt -c Treatment -o ./

Folder of OTU tables rarefied at an even depth, with the results collated into a single file:

    supervised_learning.py -i rarefied_tables/ -m Fasting_Map.txt -c Treatment -o ./ -w sl_cv10.txt

Options:
  -e, --errortype  Type of error estimation. Valid choices are: oob, loo, cv5, cv10. [default: oob]
    oob:  out-of-bag; fastest, builds only one classifier; use for quick estimates.
    loo:  leave-one-out cross-validation; use for small data sets (< ~30-50 samples).
    cv10: 10-fold cross-validation; provides mean and standard deviation of error; use for best estimates.

d9.5) Statistical analyses: supervised classification outputs

  1) summary.txt: the predicted class labels, the expected generalization error of the classifier, and the ratio of the baseline error to the estimated generalization error. A reasonable criterion for good classification is that this ratio is > 2, i.e. the classifier does at least twice as well as random guessing.
  2) cv_probabilities.txt: cross-validation estimates of class probabilities for samples, used to avoid overfitting.
  3) mislabeling.txt: the estimated probability of the known class, and the probability of the most likely other class.
  4) feature_importance_scores.txt: a list of discriminative OTUs with their associated importance scores. For Random Forests, the importance is the expected mean decrease in accuracy when the feature is ignored.
  5) confusion_matrix.txt: the number of samples whose true class was i that were classified in class j.
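Two of the outputs listed above, the confusion matrix and the baseline-to-estimated error ratio, are simple to compute by hand. The sketch below assumes the baseline error is that of always guessing the most common class; the function names and the labels are hypothetical, and no classifier is trained here.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """matrix[i][j] = number of samples of true class i classified as class j."""
    classes = sorted(set(true_labels) | set(predicted_labels))
    index = {c: k for k, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return classes, matrix


def error_ratio(true_labels, predicted_labels):
    """Baseline error divided by the classifier's estimated error."""
    n = len(true_labels)
    estimated = sum(t != p for t, p in zip(true_labels, predicted_labels)) / n
    # Assumption: baseline = error when always predicting the most common class.
    baseline = 1 - Counter(true_labels).most_common(1)[0][1] / n
    return baseline / estimated if estimated else float("inf")


# Hypothetical labels: ten samples, one "Fast" sample misclassified.
true_labels = ["Control"] * 5 + ["Fast"] * 5
predicted = ["Control"] * 5 + ["Control"] + ["Fast"] * 4
classes, matrix = confusion_matrix(true_labels, predicted)
print(classes, matrix, error_ratio(true_labels, predicted))
```

In this toy case the ratio is well above 2, which would meet the criterion for good classification stated above.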

d9.6) Statistical analyses: tracking the source of microbes (requires SourceTracker)

  1) Remove OTUs present in fewer than 1% of the samples from the OTU table:

    filter_otus_from_otu_table.py -i otu_table.biom -o filtered_otu_table.biom -s xx

  2) Convert the table from BIOM to tab-separated text format:

    biom convert -i filtered_otu_table.biom -o filtered_otu_table.txt -b

  3) Run SourceTracker:

    R --slave --vanilla --args -i filtered_otu_table.txt -m map.txt -o ./ < sourcetracker_for_qiime.r
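The filtering rule in step 1 can be sketched as follows, assuming the threshold passed to the filter is a minimum number of samples derived from the 1% cutoff by rounding up. The function names, the integer-percent parameter, and the toy table are hypothetical.

```python
import math


def min_samples_for_percent(num_samples, percent):
    """Smallest sample count that is at least `percent`% of all samples."""
    return math.ceil(num_samples * percent / 100)


def filter_rare_otus(table, min_samples):
    """Keep OTUs observed (count > 0) in at least `min_samples` samples.

    table: dict mapping OTU id -> list of counts, one per sample.
    """
    return {otu: counts for otu, counts in table.items()
            if sum(1 for c in counts if c > 0) >= min_samples}


# Hypothetical study with 250 samples: 1% rounds up to 3 samples.
print(min_samples_for_percent(250, 1))

# Hypothetical four-sample table, filtered at a minimum of 2 samples.
table = {"otu1": [1, 0, 0, 0], "otu2": [3, 2, 0, 1], "otu3": [0, 0, 0, 0]}
print(filter_rare_otus(table, min_samples=2))
```

Only otu2 survives in the toy table because it is the only OTU observed in at least two samples.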

Taxonomic classification of single- and paired-end sequences

The RTAX procedure takes advantage of mate-pair information when performing taxonomic classification. The additional information from a second read may allow a more precise taxonomy assignment to be made. The procedure is to perform OTU picking on one read only, and then to obtain additional information from the second read at the taxonomic classification step.

    assign_taxonomy.py -i otus_rep_set/forward_read.fasta -m rtax --read_1_seqs_fp forward_read.fna --read_2_seqs_fp reverse_read.fna -r gg_97_otus.fasta -t gg_otus_tax.txt --single_ok

Baochen Shi, CNSI 4338, UCLA

Other resources

Greengenes (bacterial)
SILVA (bacteria, archaea and eukarya)

General QIIME resources: http://qiime.org/
Blog (news, updates): http://qiime.wordpress.com/
Support/forum: https://groups.google.com/forum/#!forum/qiimeforum
Citing QIIME: Caporaso, J.G. et al., QIIME