Flowchart. (b) (c) (d)

1 Flowchart (c) (b) (d)

2 This workflow consists of the following steps:

Alpha diversity (microbial community evenness and richness)
d1) Generate rarefied OTU tables (multiple_rarefactions.py)
d2) Compute measures of alpha diversity for each rarefied OTU table (alpha_diversity.py)
d3) Collate alpha diversity results (collate_alpha.py)
d4) Generate alpha rarefaction plots (make_rarefaction_plots.py)

Beta diversity (similarity between individual microbial communities)
d5) Rarefy OTU table to remove sampling depth heterogeneity (single_rarefaction.py)
d6) Compute beta diversity (beta_diversity.py)
d7) Run Principal Coordinates Analysis (principal_coordinates.py)
d8) Generate PCoA plots (make_3d_plots.py or make_2d_plots.py)

d9) Statistical analyses

3 Alpha diversity (microbial community evenness and richness, i.e. within-sample diversity)

Alpha diversity measures in QIIME: http://scikit-bio.org/docs/latest/generated/skbio.diversity.alpha.html

A number of alpha diversity metrics are currently supported in QIIME. To list them:

alpha_diversity.py -s

Non-phylogenetic: Shannon-Wiener diversity index
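To make the two non-phylogenetic measures concrete, here is a minimal Python sketch of the Shannon-Wiener index and the observed OTU count for a single sample. It is an illustration rather than QIIME's own code: the function names and the example counts are made up, and a base-2 logarithm is assumed.

```python
import math


def shannon(counts, base=2.0):
    """Shannon-Wiener index H = -sum(p_i * log(p_i)) over OTUs with nonzero counts.

    The base-2 logarithm is an assumption of this sketch.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h


def observed_otus(counts):
    """Richness: the number of OTUs seen at least once in the sample."""
    return sum(1 for c in counts if c > 0)


# Hypothetical samples: same richness, very different evenness.
even = [10, 10, 10, 10]
skewed = [37, 1, 1, 1]
print(observed_otus(even), shannon(even))
print(observed_otus(skewed), shannon(skewed))
```

Both samples contain four OTUs, but the even sample has the higher Shannon value, which is why richness and evenness are reported side by side.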

4 Alpha diversity (microbial community evenness and richness, i.e. within-sample diversity)

d1) Generate rarefied OTU tables: perform multiple subsamplings on an OTU table

multiple_rarefactions.py -i otu_table.biom -m 100 -x 140 -s 5 -n 2 -o rarefied_otu_tables/

-m, --min       Minimum number of seqs/sample for rarefaction.
-x, --max       Maximum number of seqs/sample (inclusive) for rarefaction.
-s, --step      Size of each step between the min/max of seqs/sample (e.g. min, min+step, ... for level <= max).
-n, --num_reps  The number of iterations at each step. [default: 10]

Any sample containing fewer sequences in the input file than the requested number of sequences per sample is removed from the output rarefied OTU table.

--max should not exceed the number of sequences in the sample with the most coverage/depth.

Output files are named rarefaction_##_#.txt: the first set of numbers represents the number of sequences sampled, and the last number represents the iteration number. In each sample, the sum of the counts equals the number of sequences sampled.
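The subsampling described above can be sketched in a few lines of Python. This is only an illustration of the idea (sampling sequences without replacement at each depth from min to max), not the QIIME implementation; `rarefy`, `depth_schedule` and the example counts are hypothetical names and data.

```python
import random


def rarefy(counts, depth, rng):
    """Subsample `depth` sequences without replacement from one sample's OTU counts.

    Returns None when the sample has fewer than `depth` sequences, mirroring
    the rule that such samples are dropped from the rarefied table.
    """
    if sum(counts) < depth:
        return None
    # One entry per sequence, labelled with the index of its OTU.
    pool = [otu for otu, c in enumerate(counts) for _ in range(c)]
    picked = rng.sample(pool, depth)
    out = [0] * len(counts)
    for otu in picked:
        out[otu] += 1
    return out


def depth_schedule(min_depth, max_depth, step):
    """Depths min, min+step, ... up to and including max."""
    return list(range(min_depth, max_depth + 1, step))


rng = random.Random(0)           # fixed seed so the sketch is repeatable
sample = [60, 30, 25, 20, 10]    # hypothetical sample with 145 sequences
depths = depth_schedule(100, 140, 5)
tables = {(d, rep): rarefy(sample, d, rng) for d in depths for rep in range(2)}
print(len(depths), "depths x 2 repetitions =", len(tables), "rarefied samples")
```

With -m 100 -x 140 -s 5 -n 2 the schedule has nine depths (100, 105, ..., 140), so two repetitions per depth give eighteen rarefied tables.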

5 Alpha diversity (microbial community evenness and richness, i.e. within-sample diversity)

d2) Compute measures of alpha diversity for each rarefied OTU table

To process a single OTU table:

alpha_diversity.py -i otu_table.biom -m observed_otus,shannon,pd_whole_tree -o alpha_div.txt -t rep_phylo.tre

To process multiple OTU tables in a given folder:

alpha_diversity.py -i rarefied_otu_tables/ -m observed_otus,shannon,pd_whole_tree -o rarefied_otu_tables/ -t rep_phylo.tre

6 Alpha diversity (microbial community evenness and richness, i.e. within-sample diversity)

d3) Collate alpha diversity results

collate_alpha.py -i rarefied_otu_tables/ -o rarefied_otu_tables/

This produces one file for every alpha diversity metric used.

7 Alpha diversity (microbial community evenness and richness, i.e. within-sample diversity)

d4) Generate alpha rarefaction plots

make_rarefaction_plots.py -i rarefied_otu_tables/alpha_div_collated/ -m Fasting_Map.txt --generate_average_tables --generate_per_sample_plots -o rarefied_otu_tables/alpha_plot/

9 Beta diversity (similarity between individual microbial communities)

Beta diversity metrics assess the differences between microbial communities. The fundamental output of these comparisons is a square matrix in which a distance or dissimilarity is calculated between every pair of community samples. The data in this distance matrix can be visualized with analyses such as Principal Coordinates Analysis (PCoA) and hierarchical clustering.

Like alpha diversity, there are many possible beta diversity metrics that can be calculated with QIIME. To list them:

beta_diversity.py -s

Beta diversity measures are either phylogenetic or non-phylogenetic. The phylogenetic measures, weighted and unweighted UniFrac, have been used extensively in recent projects.
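The square matrix described above can be illustrated with a non-phylogenetic metric. The sketch below computes Bray-Curtis dissimilarity between every pair of samples in plain Python; the function names and the three example samples are hypothetical, and UniFrac is not covered because it needs a phylogenetic tree.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / (sum(u) + sum(v))."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(u) + sum(v)
    return num / den if den else 0.0


def distance_matrix(samples):
    """Symmetric matrix with one dissimilarity per pair of samples."""
    n = len(samples)
    dm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dm[i][j] = dm[j][i] = bray_curtis(samples[i], samples[j])
    return dm


# Hypothetical OTU counts: samples 0 and 1 are identical, sample 2 shares no OTUs.
samples = [[10, 0, 5], [10, 0, 5], [0, 15, 0]]
dm = distance_matrix(samples)
for row in dm:
    print(row)
```

Identical samples are at distance 0 and samples with no shared OTUs are at distance 1, the two extremes of this metric.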

10 Beta diversity (similarity between individual microbial communities)

d5) Rarefy OTU table to remove sampling depth heterogeneity (optional)

To compare samples at equal sequencing depth, this step creates a subsampled OTU table by random sampling of the input OTU table. Samples that have fewer sequences than the requested rarefaction depth are omitted.

single_rarefaction.py -i otu_table.biom -o otu_table_even100.biom -d 100

-d, --depth  Number of sequences to subsample per sample.

This is a one-time subsampling of the OTU table, unlike the repeated subsampling used to build rarefaction curves:

multiple_rarefactions.py -i otu_table.biom -m 100 -x 140 -s 5 -n 2 -o rarefied_otu_tables/

11 Beta diversity (similarity between individual microbial communities)

d6) Compute beta diversity

Single file beta diversity (non-phylogenetic):

beta_diversity.py -i otu_table.biom -m bray_curtis -o beta_div

Single file beta diversity (phylogenetic):

beta_diversity.py -i otu_table.biom -m weighted_unifrac,unweighted_unifrac -o beta_div -t rep_phylo.tre

Multiple file (batch) beta diversity (phylogenetic):

beta_diversity.py -i otu_tables/ -m weighted_unifrac,unweighted_unifrac -o beta_div/ -t rep_phylo.tre

12 Visualizations: beta diversity (similarity between individual microbial communities)

d7) Run Principal Coordinates Analysis

PCoA is a technique that helps to extract and visualize a few highly informative components of variation from complex, multidimensional data. It is a transformation that maps the samples present in the distance matrix to a new set of orthogonal axes such that the maximum amount of variation is explained by the first principal coordinate, the next largest amount by the second, and so on. The principal coordinates can be plotted in two or three dimensions to provide an intuitive visualization of differences between samples.

principal_coordinates.py -i beta_div/ -o pcoa/
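The transformation can be sketched without any external library. The example below double-centres the squared distances (Gower centring) and uses power iteration to recover only the first principal coordinate. It is a teaching sketch under stated assumptions, not what principal_coordinates.py does internally: a full PCoA would compute all eigenvectors, and power iteration can be misled if the start vector is orthogonal to the leading axis or if a non-Euclidean distance produces a dominant negative eigenvalue. All names and the toy matrix are hypothetical.

```python
import math


def gower_center(dm):
    """Return B = -1/2 * J * D^2 * J, the double-centred squared distance matrix."""
    n = len(dm)
    a = [[-0.5 * dm[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in a]      # equals the column means (a is symmetric)
    grand_mean = sum(row_mean) / n
    return [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
            for i in range(n)]


def first_principal_coordinate(dm, iters=500):
    """Power iteration on B; returns (sample coordinates on axis 1, eigenvalue)."""
    b = gower_center(dm)
    n = len(b)
    v = [float(i + 1) for i in range(n)]    # arbitrary non-constant start vector
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n, 0.0
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    scale = math.sqrt(max(eigval, 0.0))
    return [x * scale for x in v], eigval


# Toy distance matrix for three samples lying on a line at positions 0, 1 and 3.
dm = [[0.0, 1.0, 3.0],
      [1.0, 0.0, 2.0],
      [3.0, 2.0, 0.0]]
coords, eigval = first_principal_coordinate(dm)
print(coords, eigval)
```

Because the toy samples lie on a single line, one axis reproduces the original distances exactly (up to the sign of the axis); real community data need two or three axes and still lose some variation.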

13 Visualizations: beta diversity (similarity between individual microbial communities)

d8) Generate PCoA plots

Make 2D PCoA plots:

make_2d_plots.py -i pcoa/pcoa_weighted_unifrac_otu_table.txt -m Fasting_Map.txt -o 2d_plots/

Color by a specific category:

make_2d_plots.py -i pcoa/pcoa_weighted_unifrac_otu_table.txt -m Fasting_Map.txt -o 2d_plots/ -b Treatment

Color by any combination of categories (quote the value so the shell does not interpret &&):

make_2d_plots.py -i pcoa/pcoa_weighted_unifrac_otu_table.txt -m Fasting_Map.txt -o 2d_plots/ -b 'Treatment&&DOB'

14 Visualizations: beta diversity (similarity between individual microbial communities)

d8) Generate PCoA plots

Make 3D PCoA plots:

make_emperor.py -i pcoa/pcoa_weighted_unifrac_otu_table.txt -m Fasting_Map.txt -o 3d_plots/

16 d9.1) Statistical analyses: creating distance comparisons and plots

make_distance_boxplots.py -d weighted_unifrac_otu_table.txt -m Fasting_Map.txt -o ./ -f 'Treatment' --save_raw_data

This plots within-group and between-group distances. Comparisons are based on two-sided Student's two-sample t-tests.

17 d) Microbiome diversity analyses

d9.2) Statistical analyses: comparing distance matrices

The Mantel test is a non-parametric statistical method that computes the correlation between two distance matrices. One common application of distance matrix comparison is to determine whether a correlation exists between a community distance matrix (e.g. a UniFrac distance matrix) and a second matrix derived from an environmental parameter (e.g. difference in pH). For example, are communities at dissimilar pH levels more different from one another than communities at very similar pH levels? If so, this would indicate a positive correlation between the two distance matrices. Non-parametric here means that the test uses permutations to determine the p-value, or statistical significance.

compare_distance_matrices.py --method=mantel -i weighted_unifrac_dm.txt,ph_dm.txt -o ./ -n 999
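The permutation logic behind the Mantel test can be sketched as follows. This is a simplified illustration, not the compare_distance_matrices.py implementation: it assumes a Pearson correlation over the upper triangles, a two-sided test, and a fixed random seed, and the pH values and community distances are invented for the example.

```python
import math
import random


def upper_triangle(dm):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(dm)
    return [dm[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permute(dm, order):
    """Reorder rows and columns together, i.e. shuffle the sample labels."""
    return [[dm[i][j] for j in order] for i in order]


def mantel(dm1, dm2, permutations=999, seed=0):
    """Return (r, p): the observed correlation and a two-sided permutation p-value."""
    rng = random.Random(seed)
    x = upper_triangle(dm1)
    r_obs = pearson(x, upper_triangle(dm2))
    n = len(dm2)
    hits = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        r_perm = pearson(x, upper_triangle(permute(dm2, order)))
        if abs(r_perm) >= abs(r_obs) - 1e-12:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical data: pH per sample, and community distances that track pH exactly.
ph = [4.0, 5.0, 6.5, 8.0, 8.5]
ph_dm = [[abs(a - b) for b in ph] for a in ph]
community_dm = [[0.1 * d for d in row] for row in ph_dm]
r, p = mantel(community_dm, ph_dm)
print(r, p)
```

Shuffling rows and columns together, rather than individual cells, is what keeps each permuted matrix a valid distance matrix; the -n 999 option in the command above corresponds to `permutations` here.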

18 d) Microbiome diversity analyses

d9.3) Statistical analyses: comparing categories with statistical methods

This step analyzes the statistical significance of sample groupings using distance matrices. Most of the comparison methods belong to the ANOVA family and determine whether the grouping of samples by a given category is statistically significant.

ANOSIM is non-parametric: statistical significance is determined through permutations. It only works with a categorical variable.

compare_categories.py --method anosim -i weighted_unifrac_dm.txt -m map.txt -c Treatment -o ./ -n 999

In the example output, the p-value indicates that, at an alpha of 0.05, the grouping of samples by individual is statistically significant. The R value is fairly close to +1, indicating dissimilarity between the groups.
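To show where an R value close to +1 comes from, the sketch below computes the ANOSIM R statistic, R = (mean between-group rank - mean within-group rank) / (n(n-1)/4), from ranked distances. It covers only the statistic; the p-value would come from recomputing R after shuffling the group labels many times. The function names, the toy distance matrix and the group labels are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def anosim_r(dm, groups):
    """ANOSIM R from a square distance matrix and one group label per sample."""
    n = len(dm)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    ranks = average_ranks([dm[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    r_within = sum(within) / len(within)
    r_between = sum(between) / len(between)
    return (r_between - r_within) / (n * (n - 1) / 4.0)


# Toy matrix: samples 0-1 and samples 2-3 are close; the two pairs are far apart.
dm = [[0.00, 0.10, 0.70, 0.80],
      [0.10, 0.00, 0.85, 0.90],
      [0.70, 0.85, 0.00, 0.20],
      [0.80, 0.90, 0.20, 0.00]]
r_separated = anosim_r(dm, ["Control", "Control", "Fast", "Fast"])
r_mixed = anosim_r(dm, ["Control", "Fast", "Control", "Fast"])
print(r_separated, r_mixed)
```

When every between-group distance is larger than every within-group distance, R reaches +1; mixing the labels pulls it towards zero or below.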

19 d) Microbiome diversity analyses

d9.4) Statistical analyses: comparing categories with statistical methods

Adonis first identifies the relevant centroids of the data and then calculates the squared deviations from these points. It can accept either categorical or continuous variables in the metadata mapping file. Significance tests are performed using F-tests based on sequential sums of squares from permutations of the raw data.

compare_categories.py --method adonis -i weighted_unifrac_dm.txt -m map.txt -c Treatment -o ./ -n 999

20 d) Microbiome diversity analyses

d9.5) Statistical analyses: supervised classification

Supervised classification assigns unlabeled communities to classes based on a set of labeled training communities, here using Random Forests (the R randomForest package is required).

supervised_learning.py -i otu_table.biom -m Fasting_Map.txt -c Treatment -o ./

To run on multiple OTU tables rarefied at an even depth and collate the results into a single file:

supervised_learning.py -i rarefied_tables/ -m Fasting_Map.txt -c Treatment -o ./ -w sl_cv10.txt

-e, --errortype  Type of error estimation. Valid choices are: oob, loo, cv5, cv10. [default: oob]
  oob: out-of-bag; fastest, only builds one classifier; use for quick estimates
  loo: leave-one-out cross-validation; use for small data sets (< ~30-50 samples)
  cv10: 10-fold cross-validation; provides mean and standard deviation of error; use for best estimates

21 d) Microbiome diversity analyses

d9.5) Statistical analyses: supervised classification

Outputs:
1) summary.txt: the predicted class labels, the expected generalization error of the classifier, and the ratio of the baseline error to the estimated generalization error. A reasonable criterion for good classification is that this ratio is > 2, i.e. the classifier does at least twice as well as random guessing.
2) cv_probabilities.txt: cross-validation estimates of class probabilities for samples, used to avoid overfitting.
3) mislabeling.txt: the estimated probability of the known class, and the probability of the most likely other class.
4) feature_importance_scores.txt: a list of discriminative OTUs with their associated importance scores. For Random Forests, the importance is the expected mean decrease in accuracy when the feature is ignored.
5) confusion_matrix.txt: the number of samples whose true class was i that were classified in class j.
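The error ratio and the confusion matrix can be illustrated with a small Python sketch. It makes one assumption that the slides do not state: the baseline error is taken to be the error of always guessing the most common class. The estimated error here is simply the fraction of wrong predictions in a made-up example, whereas supervised_learning.py estimates it by out-of-bag or cross-validation; all names and labels are hypothetical.

```python
from collections import Counter


def baseline_error(labels):
    """Assumed baseline: error rate of always predicting the most common class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return 1.0 - most_common_count / len(labels)


def confusion_matrix(true, pred, classes):
    """m[i][j] = number of samples whose true class is i and predicted class is j."""
    index = {c: k for k, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(true, pred):
        m[index[t]][index[p]] += 1
    return m


def error_ratio(true, pred):
    """Baseline error divided by the observed error of the predictions."""
    estimated = sum(t != p for t, p in zip(true, pred)) / len(true)
    return baseline_error(true) / estimated if estimated else float("inf")


# Hypothetical labels: ten samples, one Fast sample misclassified as Control.
true = ["Control"] * 5 + ["Fast"] * 5
pred = ["Control"] * 5 + ["Control"] + ["Fast"] * 4
matrix = confusion_matrix(true, pred, ["Control", "Fast"])
ratio = error_ratio(true, pred)
print(matrix, ratio)
```

In this toy case the baseline error is 0.5 and the observed error is 0.1, so the ratio comfortably exceeds the > 2 criterion mentioned above.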

22 d9.6) Statistical analyses: tracking the source of microbes (SourceTracker required)

1) Filter OTUs present in less than 1% of the samples from the OTU table:

filter_otus_from_otu_table.py -i otu_table.biom -o filtered_otu_table.biom -s xx

2) Convert the table from BIOM to tab-separated text format:

biom convert -i filtered_otu_table.biom -o filtered_otu_table.txt -b

3) Run SourceTracker:

R --slave --vanilla --args -i filtered_otu_table.txt -m map.txt -o ./ < sourcetracker_for_qiime.r

23 Taxonomic classification of single- and paired-end sequences

The RTAX procedure takes advantage of mate-pair information when performing taxonomic classification. The additional information from a second read may allow a more precise taxonomy assignment to be made. The procedure is to perform OTU picking on one read only, and then to obtain additional information from the second read at the taxonomic classification step.

assign_taxonomy.py -i otus_rep_set/forward_read.fasta -m rtax --read_1_seqs_fp forward_read.fna --read_2_seqs_fp reverse_read.fna -r gg_97_otus.fasta -t gg_otus_tax.txt --single_ok

24 Baochen Shi CNSI 4338, UCLA

Other resources
Greengenes (bacterial)
Silva (bacterial, archaeal and eukaryotic)

General QIIME resources: http://qiime.org/
Blog (news, updates): http://qiime.wordpress.com/
Support/forum: https://groups.google.com/forum/#!forum/qiimeforum
Citing QIIME: Caporaso, J.G. et al., QIIME


More information

ncounter PlexSet Data Analysis Guidelines

ncounter PlexSet Data Analysis Guidelines ncounter PlexSet Data Analysis Guidelines NanoString Technologies, Inc. 530 airview Ave North Seattle, Washington 98109 USA Telephone: 206.378.6266 888.358.6266 E-mail: info@nanostring.com Molecules That

More information
