Microbial analysis with STAMP
Conor Meehan
cmeehan@itg.be
A quick aside on who I am
Tangents already!
Who I am
  A postdoc at the Institute of Tropical Medicine in Antwerp, Belgium
    Mycobacteria evolution and epidemiology
  Previously a postdoc at Dalhousie University, Halifax, Canada
    Human microbiome (esp. gut and airways)
  Previously a PhD student at NUI Galway, Ireland
    HIV evolution and drug resistance
  Biologist trapped in the body of a computer scientist
    Scripting for speed, not for development
What I do/can maybe help with
  Microbial genetics
  Pathogen evolution (esp. HIV and Mycobacteria)
  Phylogenetic reconstruction
  Lateral gene transfer
  Microbiome analysis (esp. gut and airway)
  Protein structure prediction
  Species concepts in light of the microbiome
  Knowing which beers to drink at the Kidd
  Comparing European and North American lifestyles (as I have done both)
What data do you have?
Microbiome datasets
  OTU list
    QIIME
    mothur
    MG-RAST
    pplacer
    Etc.
  Function list
    MG-RAST
    HUMAnN
    KEGG
    SEED
    COG/eggNOG
    PICRUSt
    Etc.
PICRUSt?
PICRUSt
  Phylogenetic Investigation of Communities by Reconstruction of Unobserved States
  http://picrust.github.com
  http://huttenhower.sph.harvard.edu/galaxy/
  Langille MG, Zaneveld J et al. (2013) Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nature Biotechnology 31, 814-821
16S rRNA gene -> QIIME/MOTHUR

        Sample 1  Sample 2  Sample 3
OTU 1   4         0         2
OTU 2   1         0         0
OTU 3   2         4         2

Shotgun metagenomics -> MG-RAST/HUMAnN

        Sample 1  Sample 2  Sample 3
K00001  20        15        18
K00002  1         2         0
K00003  4         5         4

PICRUSt (16S OTU table -> predicted function table)
Curtis will talk about this in detail on Friday
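Tables like these are usually compared as proportions, not raw counts, because samples differ in sequencing depth. A minimal Python sketch, using the OTU counts from this slide; the function name `relative_abundance` is made up for illustration:

```python
# OTU counts copied from the example table on this slide
# (rows = OTUs, columns = Sample 1, Sample 2, Sample 3).
otu_counts = {
    "OTU 1": [4, 0, 2],
    "OTU 2": [1, 0, 0],
    "OTU 3": [2, 4, 2],
}


def relative_abundance(counts):
    """Convert raw counts to per-sample proportions (each column sums to 1)."""
    n_samples = len(next(iter(counts.values())))
    totals = [sum(row[i] for row in counts.values()) for i in range(n_samples)]
    return {
        feature: [row[i] / totals[i] if totals[i] else 0.0 for i in range(n_samples)]
        for feature, row in counts.items()
    }


proportions = relative_abundance(otu_counts)
for feature, row in proportions.items():
    print(feature, [round(p, 3) for p in row])
```

The same function works unchanged on the KO table, since both are feature-by-sample count tables.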
I get it, I have data. Now what? Well, what do you want to know?
Potential Questions
  Differences in abundances between conditions (STAMP)
    Environmental conditions: pH, salinity, etc.
    Host measurements: BMI, age, etc.
  Geographical influences (GenGIS)
    Gradients across environmental conditions
    Composition differences between sites
    Alpha/Beta diversities
STAMP (no S = software)
STAMP
  Software that allows for statistical comparison of samples to distinguish ecological influences
    Parks DH and Beiko RG (2010) Identifying biologically relevant differences between metagenomic communities. Bioinformatics 26, 715-721
  Utilises various statistical tests and corrects for multiple testing
  Allows for comparisons of individual samples or groups of samples
  Outputs graphical and tabular lists of OTUs/functions that differ between groups
  Primarily used for comparisons between metagenomes; can also compare between genomes (e.g. COG category counts)
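To make "statistical tests plus multiple-test correction" concrete, here is a minimal Python sketch of a two-sample comparison: Fisher's exact test per feature, followed by Benjamini-Hochberg correction. Both are used here as common choices and are assumptions for illustration, not a description of STAMP's internals or of the settings used in this talk. The counts are the K00001-K00003 values for Sample 1 and Sample 2 from the earlier example table.

```python
from math import comb

# KO counts for (Sample 1, Sample 2), copied from the example table earlier in the talk.
ko_counts = {"K00001": (20, 15), "K00002": (1, 2), "K00003": (4, 5)}


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables (with the same margins) that are
    as likely as, or less likely than, the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


total_1 = sum(c[0] for c in ko_counts.values())
total_2 = sum(c[1] for c in ko_counts.values())

# One 2x2 table per KO: reads in this KO vs. all other reads, for each sample.
p_values = {
    ko: fisher_exact_two_sided(c1, total_1 - c1, c2, total_2 - c2)
    for ko, (c1, c2) in ko_counts.items()
}
q_values = dict(zip(p_values, benjamini_hochberg(list(p_values.values()))))

for ko in ko_counts:
    print(ko, round(p_values[ko], 3), round(q_values[ko], 3))
```

With only three features the correction changes little; it matters when thousands of OTUs or KOs are tested at once.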
A quick tutorial (i.e. I do it, you watch, we'll call it interactive learning)
Tutorial dataset: Lachnospiraceae sporulation
  Genome analysis suggested that gut-residing Lachnospiraceae undergo sporulation while those in other environments do not
  Question was: are there more Lachnospiraceae-related sporulation genes in gut microbiomes than in others?
  Mapped reads from 3 environments (multiple samples) to sporulation-related genes in Lachnospiraceae genomes
  Compared environments to see if there is an overabundance in the gut microbiome
  Part of Meehan CJ & Beiko RG (2014) A phylogenomic view of ecological specialization in the Lachnospiraceae, a family of digestive tract-associated bacteria. Genome Biol Evol 6(3):703-713
STAMP it
A quick research example (i.e. I show you what I did with STAMP)
An example application
  Meehan CJ and Beiko RG (2012) Lateral gene transfer of an ABC transporter complex between major constituents of the human gut microbiome. BMC Microbiology 12:248
  Dataset: MetaHIT, 124 patients
  Metadata included the BMI of the patient
  Classed these into low (18-22; 34 samples) and obese (33+; 33 samples)
  Are there functional differences between the gut microbiomes of these two groups?
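The grouping step is simple to script. A minimal Python sketch, assuming the BMI bounds are inclusive (the slide does not say) and using hypothetical sample IDs and BMI values:

```python
def bmi_group(bmi):
    """Return 'low' for BMI 18-22, 'obese' for BMI 33 and above, else None.

    Bounds are treated as inclusive, which is an assumption; samples that fall
    in neither class are left out of the comparison.
    """
    if 18 <= bmi <= 22:
        return "low"
    if bmi >= 33:
        return "obese"
    return None


# Hypothetical sample IDs and BMI values, for illustration only.
metadata = {"sample_A": 19.5, "sample_B": 27.0, "sample_C": 35.2, "sample_D": 22.0}

groups = {}
for sample_id, bmi in metadata.items():
    label = bmi_group(bmi)
    if label is not None:
        groups.setdefault(label, []).append(sample_id)

print(groups)
```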
Functional assignment and abundance comparisons
  Assembled contigs from metagenomic reads input to Orphelia
    Predicts ORFs
    ORFs shorter than 150 nt discarded
  Homology search against IMG genomes using USEARCH
    Assigns KOs (good example of why you need to learn to script)
  Dataset input to STAMP to look for differences between low and high BMI groups
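The length filter is the kind of step that is quickest to script yourself. A minimal Python sketch that reads FASTA-formatted ORFs and drops those under 150 nt; the record names and sequences are toy values, and the helper names are made up for illustration:

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_short_orfs(records, min_length=150):
    """Keep only ORFs with at least min_length nucleotides."""
    return [(header, seq) for header, seq in records if len(seq) >= min_length]


# Toy input: one ORF of 149 nt (dropped) and one of 150 nt (kept).
toy_fasta = [
    ">orf_short",
    "A" * 149,
    ">orf_long",
    "ATG" * 25,
    "ACG" * 25,
]

kept = filter_short_orfs(read_fasta(toy_fasta))
print([(header, len(seq)) for header, seq in kept])
```

For a real file, pass an open file handle to `read_fasta` in place of the list of lines.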
Nickel/peptides transporter
  Showed the greatest difference between the low and high BMI groups
  Contains 5 proteins, 4 of which differed significantly between groups
  What species are contributing these functions to the microbiome?
Species assignments
  A phylogenetic tree was built for each of the 5 KOs
    Full-length genes extracted from all IMG genomes
    Aligned with Clustal Omega, trimmed with BMGE, built with FastTree
  Metagenomic reads assigned to each of the 5 KOs in previous steps were placed on the relevant reference tree
    Pplacer classifies reads in a rank-flexible manner
    Allows for a probability cut-off for selecting the assignment
    Integrates the NCBI taxonomy using Taxtastic
  Faecalibacterium prausnitzii found to be highly associated with all 5 KOs
  Examine trees for sister taxa
    Reveals LGT from other residents of the gut microbiome
    Strain differences in operon presence and gene orders
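The idea behind a rank-flexible assignment with a probability cut-off can be sketched in a few lines of Python: add up the placement weight supporting each taxon along the lineage, then report the most specific rank that still clears the cut-off. This is a conceptual illustration only, not pplacer's actual code or output format; the function name, the data layout and the placement weights are all hypothetical.

```python
from collections import defaultdict


def classify_read(placements, cutoff=0.9):
    """Pick the most specific taxon whose summed placement weight reaches cutoff.

    placements: list of (lineage, weight) pairs, where lineage is a tuple of
    taxon names from the broadest rank to the tip, and the weights of one
    read sum to at most 1. Assumes cutoff > 0.5, so that at most one taxon
    per rank can qualify. Returns the winning lineage prefix, or () if none.
    """
    support = defaultdict(float)
    for lineage, weight in placements:
        for depth in range(1, len(lineage) + 1):
            support[lineage[:depth]] += weight
    supported = [prefix for prefix, weight in support.items() if weight >= cutoff]
    return max(supported, key=len, default=())


# Hypothetical placement weights for a single read (illustrative values only).
toy_placements = [
    (("Ruminococcaceae", "Faecalibacterium", "Faecalibacterium prausnitzii"), 0.60),
    (("Ruminococcaceae", "Faecalibacterium", "Faecalibacterium sp."), 0.35),
    (("Ruminococcaceae", "Ruminococcus", "Ruminococcus sp."), 0.05),
]

assignment = classify_read(toy_placements, cutoff=0.9)
print(assignment[-1])
```

Here no single species reaches 0.9, so the read is reported at genus level; a stricter cut-off pushes assignments to broader ranks.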
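[Table: presence and gene order of Operons 1-6 in each strain; cell contents not recoverable. Columns: Species, Operon 1, Operon 2, Operon 3, Operon 4, Operon 5, Operon 6. Rows: F. prausnitzii M21/2; F. prausnitzii A2-165; F. cf. prausnitzii KLE1255; F. prausnitzii SL3/3; 650558314; F. prausnitzii L2-6]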
Microbial analysis
  Can take OTU lists and get estimated KO/SEED categories with PICRUSt
  Once you have OTU and/or functional tables there is a whole host of analyses that can be done
    Phylogenetic placements and counts (pplacer)
    Comparisons between samples (STAMP etc.)
    Comparisons between environmental factors (STAMP etc.)
    Geographical influence on compositions (GenGIS)
  Let's talk GenGIS after this short break
    Download and install GenGIS from here: http://kiwi.cs.dal.ca/gengis/
    Download the GOS dataset from here: http://kiwi.cs.dal.ca/gengis/images/4/48/gos_atlantic.zip
    Go to the tutorial page here: https://stamps.mbl.edu/index.php/gengis_tutorial