Flowchart. (b) (c) (d)
- Dinah Perkins
- 5 years ago
Transcription
1 Flowchart (c) (b) (d)
2 This workflow consists of the following steps:
alpha diversity (microbial community evenness and richness)
d1) Generate rarefied OTU tables (multiple_rarefactions.py)
d2) Compute measures of alpha diversity for each rarefied OTU table (alpha_diversity.py)
d3) Collate alpha diversity results (collate_alpha.py)
d4) Generate alpha rarefaction plots (make_rarefaction_plots.py)
beta diversity (similarity between individual microbial communities)
d5) Rarefy OTU table to remove sampling depth heterogeneity (single_rarefaction.py)
d6) Compute beta diversity (beta_diversity.py)
d7) Run Principal Coordinates Analysis (principal_coordinates.py)
d8) Generate PCoA plots (make_3d_plots.py or make_2d_plots.py)
d9) Statistical analyses
3 alpha diversity (microbial community evenness and richness, or within-sample diversity)
Alpha diversity measures in QIIME: http://scikit-bio.org/docs/latest/generated/skbio.diversity.alpha.html
A number of alpha diversity metrics are currently supported in QIIME. To list them:
    alpha_diversity.py -s
non-phylogenetic: Shannon-Wiener diversity index
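To make the two non-phylogenetic ideas on this slide concrete, here is a minimal Python sketch of richness (observed OTUs) and the Shannon-Wiener index. It is an illustration, not QIIME's implementation. The logarithm base of 2 and the two example count vectors are assumptions made for the example.

```python
import math

def observed_otus(counts):
    """Richness: number of OTUs with a non-zero count in one sample."""
    return sum(1 for c in counts if c > 0)

def shannon(counts):
    """Shannon-Wiener index H = -sum(p_i * log2(p_i)) over non-zero OTUs.

    Base 2 is an assumption for this sketch; other bases only rescale H.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h

# Hypothetical samples: same richness, very different evenness.
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]

print(observed_otus(even_sample), shannon(even_sample))
print(observed_otus(skewed_sample), shannon(skewed_sample))
```

Both samples contain four OTUs, but the evenly distributed sample has the higher Shannon value, which is why the index is described as reflecting evenness as well as richness.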
4 alpha diversity (microbial community evenness and richness, or within-sample diversity)
d1) Generate rarefied OTU tables: perform multiple subsamplings on an OTU table
    multiple_rarefactions.py -i otu_table.biom -m 100 -x 140 -s 5 -n 2 -o rarefied_otu_tables/
-m, --min Minimum number of seqs/sample for rarefaction.
-x, --max Maximum number of seqs/sample (inclusive) for rarefaction.
-s, --step Size of each step between the min/max of seqs/sample (e.g. min, min+step, ... for level <= max).
-n, --num_reps The number of iterations at each step. [default: 10]
Any sample containing fewer sequences in the input file than the requested number of sequences per sample is removed from the output rarefied OTU table.
--max should not exceed the number of sequences in the sample with the most coverage/depth.
Output files are named rarefaction_##_#.txt: the first set of numbers is the number of sequences sampled, and the last number is the iteration number. In each sample, the sum of the counts equals the number of sequences sampled.
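The way -m, -x, -s and -n combine can be sketched in a few lines of Python. This only enumerates the (depth, iteration) pairs implied by the option descriptions above and does not call QIIME. Numbering iterations from 0 is an assumption of the sketch.

```python
def rarefaction_schedule(min_depth, max_depth, step, num_reps):
    """List (depth, iteration) pairs for depth = min, min+step, ... <= max.

    Iterations are numbered from 0 here, which is an assumption.
    """
    return [
        (depth, rep)
        for depth in range(min_depth, max_depth + 1, step)  # max is inclusive
        for rep in range(num_reps)
    ]

# Values taken from the command on this slide: -m 100 -x 140 -s 5 -n 2
schedule = rarefaction_schedule(100, 140, 5, 2)
depths = sorted({depth for depth, _ in schedule})

print(depths)          # the distinct rarefaction depths
print(len(schedule))   # total number of rarefied tables
```

With the slide's settings this gives nine depths (100 through 140 in steps of 5) and two tables per depth, so 18 rarefied tables in total.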
5 alpha diversity (microbial community evenness and richness, or within-sample diversity)
d1) Generate rarefied OTU tables
d2) Compute measures of alpha diversity for each rarefied OTU table
To process a single OTU table:
    alpha_diversity.py -i otu_table.biom -m observed_otus,shannon,pd_whole_tree -o alpha_div.txt -t rep_phylo.tre
To process multiple OTU tables in a given folder:
    alpha_diversity.py -i rarefied_otu_tables/ -m observed_otus,shannon,pd_whole_tree -o rarefied_otu_tables/ -t rep_phylo.tre
6 alpha diversity (microbial community evenness and richness, or within-sample diversity)
d1) Generate rarefied OTU tables
d2) Compute measures of alpha diversity for each rarefied OTU table
d3) Collate alpha diversity results
    collate_alpha.py -i rarefied_otu_tables/ -o rarefied_otu_tables/
Output: one file for every alpha diversity metric used.
7 alpha diversity (microbial community evenness and richness, or within-sample diversity)
d1) Generate rarefied OTU tables
d2) Compute measures of alpha diversity for each rarefied OTU table
d3) Collate alpha diversity results
d4) Generate alpha rarefaction plots
    make_rarefaction_plots.py -i rarefied_otu_tables/alpha_div_collated/ -m Fasting_Map.txt --generate_average_tables --generate_per_sample_plots -o rarefied_otu_tables/alpha_plot/
9 beta diversity (similarity between individual microbial communities)
Beta diversity metrics assess the differences between microbial communities. The fundamental output of these comparisons is a square matrix in which a distance or dissimilarity is calculated between every pair of community samples. The data in this distance matrix can be visualized with analyses such as Principal Coordinates Analysis (PCoA) and hierarchical clustering. Like alpha diversity, there are many possible beta diversity metrics that can be calculated with QIIME. To list them:
    beta_diversity.py -s
Beta diversity measures: phylogenetic & non-phylogenetic
phylogenetic measures: weighted & unweighted UniFrac, which are used extensively in recent projects.
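The "square matrix" described above can be illustrated with a non-phylogenetic metric. The sketch below computes Bray-Curtis dissimilarity between every pair of samples in plain Python. The three-sample table is hypothetical, and the code illustrates the metric rather than QIIME's own implementation.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0

def distance_matrix(table):
    """Return sample names and the square matrix of pairwise dissimilarities."""
    names = sorted(table)
    matrix = [[bray_curtis(table[a], table[b]) for b in names] for a in names]
    return names, matrix

# Hypothetical OTU counts per sample (same OTU order in every sample).
otu_table = {
    "S1": [10, 0, 5],
    "S2": [10, 0, 5],
    "S3": [0, 15, 0],
}

names, dm = distance_matrix(otu_table)
for name, row in zip(names, dm):
    print(name, row)
```

Identical samples (S1 and S2) get a dissimilarity of 0, and samples that share no OTUs (S1 and S3) get 1. The matrix is symmetric with zeros on the diagonal, which is the shape that PCoA and the statistical tests later in the deck take as input.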
10 beta diversity (similarity between individual microbial communities)
d5) Rarefy OTU table to remove sampling depth heterogeneity (optional)
To compare samples at equal sequencing depth, this step creates a subsampled OTU table by random sampling of the input OTU table. Samples that have fewer sequences than the requested rarefaction depth are omitted.
    single_rarefaction.py -i otu_table.biom -o otu_table_even100.biom -d 100
-d, --depth Number of sequences to subsample per sample.
This is a one-time subsampling of the OTU table, unlike the repeated subsampling used to build a rarefaction curve:
    multiple_rarefactions.py -i otu_table.biom -m 100 -x 140 -s 5 -n 2 -o rarefied_otu_tables/
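A rough Python sketch of what a single rarefaction does: each sample is subsampled without replacement to the requested depth, and samples with too few sequences are dropped. The table, the fixed seed and the function names are all hypothetical, and QIIME's own random number handling will differ.

```python
import random

def rarefy_sample(counts, depth, rng):
    """Draw `depth` sequences without replacement; return per-OTU counts."""
    # Expand counts into one entry per sequence, labelled by OTU index.
    pool = [otu for otu, count in enumerate(counts) for _ in range(count)]
    rarefied = [0] * len(counts)
    for otu in rng.sample(pool, depth):
        rarefied[otu] += 1
    return rarefied

def single_rarefaction(table, depth, seed=0):
    """Rarefy every sample to `depth`; omit samples with fewer sequences."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    return {
        name: rarefy_sample(counts, depth, rng)
        for name, counts in table.items()
        if sum(counts) >= depth
    }

# Hypothetical table: sample C has only 70 sequences.
otu_table = {"A": [60, 30, 30], "B": [100, 0, 50], "C": [40, 20, 10]}
even_table = single_rarefaction(otu_table, depth=100)

for name, counts in even_table.items():
    print(name, counts, sum(counts))
```

Sample C is missing from the result because it falls below the depth of 100, and every remaining sample sums to exactly 100.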
11 beta diversity (similarity between individual microbial communities)
d5) Rarefy OTU table to remove sampling depth heterogeneity
d6) Compute beta diversity
Single File Beta Diversity (non-phylogenetic):
    beta_diversity.py -i otu_table.biom -m bray_curtis -o beta_div
Single File Beta Diversity (phylogenetic):
    beta_diversity.py -i otu_table.biom -m weighted_unifrac,unweighted_unifrac -o beta_div -t rep_phylo.tre
Multiple File (batch) Beta Diversity (phylogenetic):
    beta_diversity.py -i otu_tables/ -m weighted_unifrac,unweighted_unifrac -o beta_div/ -t rep_phylo.tre
12 visualizations
beta diversity (similarity between individual microbial communities)
d5) Rarefy OTU table to remove sampling depth heterogeneity
d6) Compute beta diversity
d7) Run Principal Coordinates Analysis
PCoA is a technique that helps to extract and visualize a few highly informative components of variation from complex, multidimensional data. It is a transformation that maps the samples in the distance matrix onto a new set of orthogonal axes such that the maximum amount of variation is explained by the first principal coordinate, the next largest amount by the second, and so on. The principal coordinates can be plotted in two or three dimensions to provide an intuitive visualization of differences between samples.
    principal_coordinates.py -i beta_div/ -o pcoa/
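The transformation described above can be sketched as classical PCoA: square the distances, double-center them (Gower centering), and take the leading eigenvector. The pure-Python sketch below recovers only the first principal coordinate, using power iteration in place of a full eigendecomposition. It assumes the dominant eigenvalue is positive, which holds for Euclidean-like distances. The three-sample distance matrix is hypothetical (points at positions 0, 1 and 3 on a line).

```python
import math

def gower_center(dm):
    """Double-center the matrix of -0.5 * squared distances."""
    n = len(dm)
    a = [[-0.5 * dm[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]   # equals column means (symmetric)
    grand_mean = sum(row_mean) / n
    return [
        [a[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]

def first_principal_coordinate(dm, iterations=200):
    """Leading PCoA axis via power iteration (assumes a positive top eigenvalue)."""
    b = gower_center(dm)
    n = len(b)
    vec = [float(i + 1) for i in range(n)]   # arbitrary non-constant start
    eigval = 0.0
    for _ in range(iterations):
        nxt = [sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            return [0.0] * n, 0.0
        vec = [x / norm for x in nxt]
        eigval = norm                         # |B v| for unit v approximates the eigenvalue
    return [x * math.sqrt(eigval) for x in vec], eigval

# Hypothetical distances between three samples lying on a line at 0, 1 and 3.
dm = [[0.0, 1.0, 3.0],
      [1.0, 0.0, 2.0],
      [3.0, 2.0, 0.0]]

coords, eigval = first_principal_coordinate(dm)
print(coords, eigval)
```

Because these three samples are exactly one-dimensional, the first coordinate reproduces all pairwise distances. With real community data several axes are needed, and a library eigendecomposition would normally be used instead of power iteration.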
13 visualizations
beta diversity (similarity between individual microbial communities)
d5) Rarefy OTU table to remove sampling depth heterogeneity
d6) Compute beta diversity
d7) Run Principal Coordinates Analysis
d8) Generate PCoA plots
Make 2D PCoA plots:
    make_2d_plots.py -i pcoa/pcoa_weighted_unifrac_otu_table.txt -m Fasting_Map.txt -o 2d_plots/
Color by a specific category:
    make_2d_plots.py -i pcoa/pcoa_weighted_unifrac_otu_table.txt -m Fasting_Map.txt -o 2d_plots/ -b Treatment
Color by any combination of categories (quoted so that the shell does not interpret &&):
    make_2d_plots.py -i pcoa/pcoa_weighted_unifrac_otu_table.txt -m Fasting_Map.txt -o 2d_plots/ -b 'Treatment&&DOB'
14 visualizations
beta diversity (similarity between individual microbial communities)
d5) Rarefy OTU table to remove sampling depth heterogeneity
d6) Compute beta diversity
d7) Run Principal Coordinates Analysis
d8) Generate PCoA plots
Make 3D PCoA plots:
    make_emperor.py -i pcoa/pcoa_weighted_unifrac_otu_table.txt -m Fasting_Map.txt -o 3d_plots/
16 d9.1) Statistical analyses
Creating Distance Comparisons & Plots
    make_distance_boxplots.py -d weighted_unifrac_otu_table.txt -m Fasting_Map.txt -o ./ -f 'Treatment' --save_raw_data
Plotting within and between distances. Comparisons are based on a two-sided Student's two-sample t-test.
17 d9.2) Statistical analyses
d) microbiome diversity analyses
Comparing distance matrices is based on the Mantel test, a non-parametric statistical method that computes the correlation between two distance matrices. One common application is to determine whether a correlation exists between a community distance matrix (e.g. a UniFrac distance matrix) and a second matrix derived from an environmental parameter (e.g. difference in pH). The question is whether communities at dissimilar pH levels are more different from one another than communities at very similar pH levels. If so, this indicates a positive correlation between the two distance matrices. Non-parametric means that permutations are used to determine the p-value, i.e. the statistical significance.
    compare_distance_matrices.py --method=mantel -i weighted_unifrac_dm.txt,ph_dm.txt -o ./ -n 999
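The permutation logic of a Mantel test can be sketched in plain Python: correlate the upper triangles of the two matrices, then repeatedly shuffle the sample order of one matrix and count how often the shuffled correlation is at least as extreme. This is a simplified two-sided version written for illustration. The pH values, the community matrix and the seed are hypothetical, and the sketch is not QIIME's implementation.

```python
import math
import random

def upper_triangle(dm, order=None):
    """Flatten the upper triangle, optionally after reordering samples."""
    n = len(dm)
    idx = order if order is not None else list(range(n))
    return [dm[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mantel(dm1, dm2, permutations=999, seed=0):
    """Return (r, p) where p comes from permuting the sample order of dm2."""
    rng = random.Random(seed)
    x = upper_triangle(dm1)
    observed = pearson(x, upper_triangle(dm2))
    hits = 0
    for _ in range(permutations):
        order = list(range(len(dm2)))
        rng.shuffle(order)
        if abs(pearson(x, upper_triangle(dm2, order))) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# Hypothetical data: community distance grows linearly with pH difference.
ph = [4.0, 5.0, 6.5, 7.0, 8.5]
ph_dm = [[abs(a - b) for b in ph] for a in ph]
community_dm = [[0.1 * d for d in row] for row in ph_dm]

r, p = mantel(community_dm, ph_dm, permutations=999)
print(r, p)
```

Here the community distances were constructed to track pH differences exactly, so r comes out at 1. The p-value is the fraction of permutations (plus the observed arrangement) that match or beat that correlation.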
18 d9.3) Statistical analyses
d) microbiome diversity analyses
Comparing categories with statistical methods: analyzes the statistical significance of sample groupings using distance matrices. A majority of the comparison methods are based on the ANOVA family and determine whether the grouping of samples by a given category is statistically significant. ANOSIM is non-parametric: statistical significance is determined through permutations. It only works with a categorical variable.
    compare_categories.py --method anosim -i weighted_unifrac_dm.txt -m map.txt -c Treatment -o ./ -n 999
The reported p-value indicates that, at an alpha of 0.05, the grouping of samples by individual is statistically significant. The reported R value is fairly close to +1, indicating dissimilarity between the groups.
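As a rough illustration of where the ANOSIM R value comes from, the sketch below ranks all pairwise distances, compares the mean rank of between-group pairs with that of within-group pairs, and scales the difference by n(n-1)/4 so that R lies between -1 and +1. Group labels are then shuffled to obtain a p-value. The four-sample matrix, labels and seed are hypothetical, and this is not QIIME's implementation.

```python
import random

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def anosim_r(dm, groups):
    """R = (mean between-group rank - mean within-group rank) / (n(n-1)/4)."""
    n = len(dm)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    ranks = average_ranks([dm[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (n * (n - 1) / 4)

def anosim(dm, groups, permutations=999, seed=0):
    """Return (R, p) with p estimated by shuffling the group labels."""
    rng = random.Random(seed)
    observed = anosim_r(dm, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if anosim_r(dm, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# Hypothetical distances: small within groups, large between groups.
groups = ["Control", "Control", "Fast", "Fast"]
dm = [[0.0, 0.1, 0.8, 0.9],
      [0.1, 0.0, 0.7, 0.85],
      [0.8, 0.7, 0.0, 0.2],
      [0.9, 0.85, 0.2, 0.0]]

r_stat, p_value = anosim(dm, groups)
print(r_stat, p_value)
```

Every between-group distance here exceeds every within-group distance, so R reaches its maximum of +1. With only four samples there are very few distinct labelings, so the p-value cannot become small, which shows why sample size matters for permutation tests.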
19 d9.4) Statistical analyses
d) microbiome diversity analyses
Comparing categories with statistical methods
Adonis creates a set by first identifying the relevant centroids of data and then calculating the squared deviations from these points. It can accept either categorical or continuous variables in the metadata mapping file. Significance tests are performed using F-tests based on sequential sums of squares from permutations of the raw data.
    compare_categories.py --method adonis -i weighted_unifrac_dm.txt -m map.txt -c Treatment -o ./ -n 999
20 d9.5) Statistical analyses: supervised classification
d) microbiome diversity analyses
Supervised classification classifies unlabeled communities based on a set of labeled training communities using Random Forests (the R randomForest package is needed).
    supervised_learning.py -i otu_table.biom -m Fasting_Map.txt -c Treatment -o ./
Run on OTU tables rarefied at an even depth and collate the results into a single file:
    supervised_learning.py -i rarefied_tables/ -m Fasting_Map.txt -c Treatment -o ./ -w sl_cv10.txt
-e, --errortype Type of error estimation. Valid choices are: oob, loo, cv5, cv10. [default: oob]
oob: out-of-bag; fastest, only builds one classifier; use for quick estimates
loo: leave-one-out cross validation; use for small data sets (< ~30-50 samples)
cv10: 10-fold cross validation; provides mean and standard deviation of error; use for best estimates
21 d9.5) Statistical analyses: supervised classification
d) microbiome diversity analyses
Outputs:
1) summary.txt: the predicted class labels, the expected generalization error of the classifier, and the ratio of the baseline error to the estimated generalization error. A reasonable criterion for good classification is that this ratio is > 2, i.e. the classifier does at least twice as well as random guessing.
2) cv_probabilities.txt: cross-validation estimates of class probabilities for samples, to avoid overfitting.
3) mislabeling.txt: estimated probability of the known class, and probability of the most likely other class.
4) feature_importance_scores.txt: a list of discriminative OTUs with their associated importance scores. For Random Forests, the importance is the expected mean decrease in accuracy when the feature is ignored.
5) confusion_matrix.txt: the number of samples whose true class was i that were classified in class j.
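Two of the outputs above, the confusion matrix and the baseline-to-estimated error ratio, are easy to reproduce by hand. The sketch below does so in Python for a hypothetical set of true and predicted labels. It assumes that the baseline error is the error of always guessing the most common class, which is an interpretation of "random guessing" rather than something stated on the slide.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels):
    """matrix[i][j] = number of samples of true class i classified as class j."""
    classes = sorted(set(true_labels) | set(predicted_labels))
    index = {label: k for k, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[guess]] += 1
    return classes, matrix

def error_ratio(true_labels, predicted_labels):
    """Return (baseline error, estimated error, baseline / estimated)."""
    n = len(true_labels)
    estimated = sum(t != p for t, p in zip(true_labels, predicted_labels)) / n
    # Assumption: baseline = error when always predicting the most common class.
    baseline = 1 - max(Counter(true_labels).values()) / n
    ratio = baseline / estimated if estimated else float("inf")
    return baseline, estimated, ratio

# Hypothetical labels: ten samples, one misclassified.
true_labels = ["Control"] * 5 + ["Fast"] * 5
predicted = ["Control"] * 5 + ["Fast"] * 4 + ["Control"]

classes, matrix = confusion_matrix(true_labels, predicted)
baseline, estimated, ratio = error_ratio(true_labels, predicted)
print(classes, matrix)
print(baseline, estimated, ratio)
```

In this made-up case the ratio is about 5, comfortably above the threshold of 2 that the slide suggests as a criterion for good classification.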
22 d9.6) Statistical analyses: tracking the source of microbes (SourceTracker needed)
1) Filter OTUs present in less than 1% of the samples from the OTU table (replace xx with the corresponding minimum number of samples):
    filter_otus_from_otu_table.py -i otu_table.biom -o filtered_otu_table.biom -s xx
2) Convert the table from BIOM to tab-separated text format:
    biom convert -i filtered_otu_table.biom -o filtered_otu_table.txt -b
3) Run SourceTracker:
    R --slave --vanilla --args -i filtered_otu_table.txt -m map.txt -o ./ < sourcetracker_for_qiime.r
23 Taxonomic classifications of single- and paired-end sequences
The RTAX procedure takes advantage of mate-pair information when performing taxonomic classification. The additional information from a second read may allow a more precise taxonomy assignment to be made. The procedure is to perform OTU picking on one read only, and then to obtain additional information from the second read at the taxonomic classification step.
    assign_taxonomy.py -i otus_rep_set/forward_read.fasta -m rtax --read_1_seqs_fp forward_read.fna --read_2_seqs_fp reverse_read.fna -r gg_97_otus.fasta -t gg_otus_tax.txt --single_ok
24 Baochen Shi CNSI 4338, UCLA
Other resources: Greengenes (bacteria), Silva (bacteria, archaea and eukarya)
General QIIME resources http://qiime.org/ Blog (news, updates): http://qiime.wordpress.com/ Support/forum: https://groups.google.com/forum/#!forum/qiimeforum Citing QIIME: Caporaso, J.G. et al., QIIME
More informationNaïve Bayes. Vibhav Gogate The University of Texas at Dallas
Naïve Bayes Vibhav Gogate The University of Texas at Dallas Supervised Learning of Classifiers Find f Given: Training set {(x i, y i ) i = 1 n} Find: A good approximation to f : X Y Examples: what are
More informationANOVA approach. Investigates interaction terms. Disadvantages: Requires careful sampling design with replication
ANOVA approach Advantages: Ideal for evaluating hypotheses Ideal to quantify effect size (e.g., differences between groups) Address multiple factors at once Investigates interaction terms Disadvantages:
More informationNon-parametric Methods
Non-parametric Methods Machine Learning Alireza Ghane Non-Parametric Methods Alireza Ghane / Torsten Möller 1 Outline Machine Learning: What, Why, and How? Curve Fitting: (e.g.) Regression and Model Selection
More informationDETECTING BIOLOGICAL AND ENVIRONMENTAL CHANGES: DESIGN AND ANALYSIS OF MONITORING AND EXPERIMENTS (University of Bologna, 3-14 March 2008)
Dipartimento di Biologia Evoluzionistica Sperimentale Centro Interdipartimentale di Ricerca per le Scienze Ambientali in Ravenna INTERNATIONAL WINTER SCHOOL UNIVERSITY OF BOLOGNA DETECTING BIOLOGICAL AND
More informationExploiting Sparse Non-Linear Structure in Astronomical Data
Exploiting Sparse Non-Linear Structure in Astronomical Data Ann B. Lee Department of Statistics and Department of Machine Learning, Carnegie Mellon University Joint work with P. Freeman, C. Schafer, and
More informationPhysics 140. Sound. Chapter 12
Physics 140 Sound Chapter 12 Sound waves Sound is composed of longitudinal pressure waves. wave propagabon Compression Compression Compression è when parbcles come together RarefacBon RarefacBon RarefacBon
More informationL11: Pattern recognition principles
L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction
More informationA Program for Data Transformations and Kernel Density Estimation
A Program for Data Transformations and Kernel Density Estimation John G. Manchuk and Clayton V. Deutsch Modeling applications in geostatistics often involve multiple variables that are not multivariate
More informationEasySDM: A Spatial Data Mining Platform
EasySDM: A Spatial Data Mining Platform (User Manual) Authors: Amine Abdaoui and Mohamed Ala Al Chikha, Students at the National Computing Engineering School. Algiers. June 2013. 1. Overview EasySDM is
More informationCHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)
FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationMaximum Likelihood Estimation. only training data is available to design a classifier
Introduction to Pattern Recognition [ Part 5 ] Mahdi Vasighi Introduction Bayesian Decision Theory shows that we could design an optimal classifier if we knew: P( i ) : priors p(x i ) : class-conditional
More informationBacterial Communities in Women with Bacterial Vaginosis: High Resolution Phylogenetic Analyses Reveal Relationships of Microbiota to Clinical Criteria
Bacterial Communities in Women with Bacterial Vaginosis: High Resolution Phylogenetic Analyses Reveal Relationships of Microbiota to Clinical Criteria Seminar presentation Pierre Barbera Supervised by:
More informationLoss Functions and Optimization. Lecture 3-1
Lecture 3: Loss Functions and Optimization Lecture 3-1 Administrative Assignment 1 is released: http://cs231n.github.io/assignments2017/assignment1/ Due Thursday April 20, 11:59pm on Canvas (Extending
More informationMultivariate analysis of genetic data: an introduction
Multivariate analysis of genetic data: an introduction Thibaut Jombart MRC Centre for Outbreak Analysis and Modelling Imperial College London XXIV Simposio Internacional De Estadística Bogotá, 25th July
More informationHandling Human Interpreted Analytical Data. Workflows for Pharmaceutical R&D. Presented by Peter Russell
Handling Human Interpreted Analytical Data Workflows for Pharmaceutical R&D Presented by Peter Russell 2011 Survey 88% of R&D organizations lack adequate systems to automatically collect data for reporting,
More informationH. Pieter J. van Veelen *, Joana Falcao Salles and B. Irene Tieleman
van Veelen et al. Microbiome (2017) 5:156 DOI 10.1186/s40168-017-0371-6 RESEARCH Open Access Multi-level comparisons of cloacal, skin, feather and nest-associated microbiota suggest considerable influence
More informationA Randomized Approach for Crowdsourcing in the Presence of Multiple Views
A Randomized Approach for Crowdsourcing in the Presence of Multiple Views Presenter: Yao Zhou joint work with: Jingrui He - 1 - Roadmap Motivation Proposed framework: M2VW Experimental results Conclusion
More informationInfluence measures for CART
Jean-Michel Poggi Orsay, Paris Sud & Paris Descartes Joint work with Avner Bar-Hen Servane Gey (MAP5, Paris Descartes ) CART CART Classification And Regression Trees, Breiman et al. (1984) Learning set
More informationModeling Mutagenicity Status of a Diverse Set of Chemical Compounds by Envelope Methods
Modeling Mutagenicity Status of a Diverse Set of Chemical Compounds by Envelope Methods Subho Majumdar School of Statistics, University of Minnesota Envelopes in Chemometrics August 4, 2014 1 / 23 Motivation
More informationLecture 13. Principal Component Analysis. Brett Bernstein. April 25, CDS at NYU. Brett Bernstein (CDS at NYU) Lecture 13 April 25, / 26
Principal Component Analysis Brett Bernstein CDS at NYU April 25, 2017 Brett Bernstein (CDS at NYU) Lecture 13 April 25, 2017 1 / 26 Initial Question Intro Question Question Let S R n n be symmetric. 1
More informationRevision: Chapter 1-6. Applied Multivariate Statistics Spring 2012
Revision: Chapter 1-6 Applied Multivariate Statistics Spring 2012 Overview Cov, Cor, Mahalanobis, MV normal distribution Visualization: Stars plot, mosaic plot with shading Outlier: chisq.plot Missing
More informationQuan%ta%on with XPRESS. and. ASAPRa%o
Quan%ta%on with XPRESS and ASAPRa%o 1 Pep%de and Protein Quan%ta%on Raw Mass Spec Data Pep%de Iden%fica%on Pep%de Valida%on Quan%ta%on Protein Assignment Protein List msconvert X!Tandem SpectraST SEQUEST*
More informationBIO 682 Multivariate Statistics Spring 2008
BIO 682 Multivariate Statistics Spring 2008 Steve Shuster http://www4.nau.edu/shustercourses/bio682/index.htm Lecture 11 Properties of Community Data Gauch 1982, Causton 1988, Jongman 1995 a. Qualitative:
More informationClassification, Linear Models, Naïve Bayes
Classification, Linear Models, Naïve Bayes CMSC 470 Marine Carpuat Slides credit: Dan Jurafsky & James Martin, Jacob Eisenstein Today Text classification problems and their evaluation Linear classifiers
More informationText Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University
Text Mining Dr. Yanjun Li Associate Professor Department of Computer and Information Sciences Fordham University Outline Introduction: Data Mining Part One: Text Mining Part Two: Preprocessing Text Data
More informationROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015
ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 http://intelligentoptimization.org/lionbook Roberto Battiti
More informationBayesian Methods: Naïve Bayes
Bayesian Methods: aïve Bayes icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Last Time Parameter learning Learning the parameter of a simple coin flipping model Prior
More informationSTATISTICS 407 METHODS OF MULTIVARIATE ANALYSIS TOPICS
STATISTICS 407 METHODS OF MULTIVARIATE ANALYSIS TOPICS Principal Component Analysis (PCA): Reduce the, summarize the sources of variation in the data, transform the data into a new data set where the variables
More informationReview of Panel Data Model Types Next Steps. Panel GLMs. Department of Political Science and Government Aarhus University.
Panel GLMs Department of Political Science and Government Aarhus University May 12, 2015 1 Review of Panel Data 2 Model Types 3 Review and Looking Forward 1 Review of Panel Data 2 Model Types 3 Review
More information