Lecture 2: Diversity, Distances, adonis. Lecture 2: Diversity, Distances, adonis. Alpha- Diversity. Alpha diversity definition(s)
|
|
- Maryann Kelley
- 5 years ago
- Views:
Transcription
1 Lecture 2: Diversity, Distances, adonis Lecture 2: Diversity, Distances, adonis Diversity - alpha, beta (, gamma) Beta- Diversity in practice: Ecological Distances Unsupervised Learning: Clustering, etc Ordination: e.g. PCA, UniFrac/PCoA, DPCoA Testing: Permutational Multivariate ANOVA Some slides from prof A. Alekseyenko, NYU; and prof S. Holmes, Stanford 1 2 Alpha- Diversity Alpha diversity definition(s) Alpha diversity describes the diversity of a single community (specimen). In statistical terms, it is a scalar statistic computed for a single observation (column) that represents the diversity of that observation. There are many statistics that can describe diversity: e.g. taxonomical richness, evenness, dominance, etc. 3 4
2 Rank abundance plots Species richness Suppose we observe a community that can contain up to k species. The relative proportions of the species P = {p 1,, p k } Richness is computed as R = 1(p 1 ) + 1(p 2 ) + + 1(p k ) where 1(.) is an indicator function, i.e. 1(x) = 1 if p i 0, and 0 otherwise. Higher R means greater diversity Very dependent upon depth of sampling and sensitive to presence of rare species 5 6 Sanders 1968 non-parametric richness estimate coverage Sanders, H. L. (1968). Marine benthic diversity: a comparative study. American Naturalist Rarefaction Curves Number of species # Observations / Library Size / # Reads / Sample Size 7 Shannon index Suppose we observe a community that can contain up to k species. The relative proportions of the species are P = {p 1,, p k }. Shannon index is related to the notion of information content from information theory. It roughly represents the amount of information that is available for the distribution of P. When p i = p j, for all i and j, then we have no information about which species a random draw will result in. As the inequality becomes more pronounced, we gain more information about the possible outcome of the draw. The Shannon index captures this property of the distribution. Shannon index is computed as S k = p 1 log 2 p 1 p 2 log 2 p 2 p k log 2 p k Note as p i 0, log 2 p i, we therefore define p i log 2 p i = 0. Higher S k means higher diversity Shannon entropy 8
3 From Shannon to Evenness Shannon index for a community of k species has a maximum at log 2 k We can make different communities more comparable if we normalize by the maximum Evenness index is computed as E k =S k /log 2 k E k =1 means total evenness Simpson index Suppose we observe a community that can contain up to k species. The relative proportions of the species are P = {p 1,, p k }. Simpson index is the probability of resampling the same species on two consecutive draws with replacement. Suppose on the first draw we picked species i, this event has probability p i, hence the probability of drawing that species twice is p i *p i. Simpson index is usually computed as: D=1 (p p p k 2 ) In this case, the index represents the probability that two individuals randomly selected from a sample will belong to different species. D = 0 means no diversity (1 species is completely dominant) D = 1 means complete diversity 9 10 Numbers equivalent diversity Often it is convenient to talk about alpha diversity in terms of equivalent units: How many equally abundant taxa will it take to get the same diversity as we see in a given community? For richness there is no difference in statistic For Shannon, remember that log 2 k is the maximum which is attained when all species equal abundance. Hence the diversity in equivalent units is 2 Sk For Simpson the equivalent units measure of diversity is 1/(1- D) Sometimes called Inverse Simpson Index Beta- Diversity 11 12
4 Beta- Diversity Microbial ecologists typically use beta diversity as a broad umbrella term that can refer to any of several indices related to compositional differences (Differences in species content between samples) For some reason this is contentious, and there appears to be ongoing (and pointless?) argument over the possible definitions For our purposes, and microbiome research, when you hear beta- diversity, you can probably think: Diversity of species composition 13 Summary of diversity types α diversity within a community, # of species only β diversity between communities (differentiation), species identity is taken into account γ (global) diversity of the site Theoretically, one would wishes to use such measures that result in γ = α β This is only possible if α and β are independent of each other. 14 Beta- Diversity in practice Dimensional Reduction 1.UniFrac or Bray- Curtis distance between samples 2.MDS ( PCoA ) 3.Plot first two axes 4.Admire clusters 5.Write Paper 6.Choose new microbiomes 7.Return to Step 1, Repeat Why? Let s back up. This is one option in an arsenal of dimensional reduction methods, that come from unsupervised learning in exploratory data analysis Regress disc on weight Regress weight on disc 15 16
5 Dimensional Reduction Minimize the distance to the line in both directions the purple line is the principal component line Dimensional Reduction Principal Components are Linear Combinations of the old variables The projection that maximizes the area of the shadow and an equivalent measurement is the sums of squares of the distances between points in the projection, we want to see as much of the variation as possible, that s what PCA does The PCA workflow Ordination Using the Tree 1. UniFrac- PCoA 2. Double Principal Coordinates 19 20
6 Ordination Best Practice Ordination Best Practice 1. Always look at scree plot 2. Variables, Samples 3. Biplot 4. Altogether (if readable) pca.turtles=dudi.pca(turtles[,-1],scannf=f,nf=2)! scatter(pca.turtles) How many axes are probably useful? Are their clusters? How many? Are their gradients? Are the patterns consistent with covariates (e.g. sample observations) How might we test this? Are their clusters? How many?! Gap Statistic 23 24
7 Are their gradients?! PCA regression Are the patterns consistent with covariates How might we test this? (Permutational) Multivariate ANOVA vegan::adonis( ) 25 26
Lecture: Mixture Models for Microbiome data
Lecture: Mixture Models for Microbiome data Lecture 3: Mixture Models for Microbiome data Outline: - - Sequencing thought experiment Mixture Models (tangent) - (esp. Negative Binomial) - Differential abundance
More informationLecture 2: Descriptive statistics, normalizations & testing
Lecture 2: Descriptive statistics, normalizations & testing From sequences to OTU table Sequencing Sample 1 Sample 2... Sample N Abundances of each microbial taxon in each of the N samples 2 1 Normalizing
More informationOther resources. Greengenes (bacterial) Silva (bacteria, archaeal and eukarya)
General QIIME resources http://qiime.org/ Blog (news, updates): http://qiime.wordpress.com/ Support/forum: https://groups.google.com/forum/#!forum/qiimeforum Citing QIIME: Caporaso, J.G. et al., QIIME
More informationLecture 3: Mixture Models for Microbiome data. Lecture 3: Mixture Models for Microbiome data
Lecture 3: Mixture Models for Microbiome data 1 Lecture 3: Mixture Models for Microbiome data Outline: - Mixture Models (Negative Binomial) - DESeq2 / Don t Rarefy. Ever. 2 Hypothesis Tests - reminder
More informationLecture 5: Ecological distance metrics; Principal Coordinates Analysis. Univariate testing vs. community analysis
Lecture 5: Ecological distance metrics; Principal Coordinates Analysis Univariate testing vs. community analysis Univariate testing deals with hypotheses concerning individual taxa Is this taxon differentially
More informationLecture 5: Ecological distance metrics; Principal Coordinates Analysis. Univariate testing vs. community analysis
Lecture 5: Ecological distance metrics; Principal Coordinates Analysis Univariate testing vs. community analysis Univariate testing deals with hypotheses concerning individual taxa Is this taxon differentially
More informationMultivariate Statistics 101. Ordination (PCA, NMDS, CA) Cluster Analysis (UPGMA, Ward s) Canonical Correspondence Analysis
Multivariate Statistics 101 Ordination (PCA, NMDS, CA) Cluster Analysis (UPGMA, Ward s) Canonical Correspondence Analysis Multivariate Statistics 101 Copy of slides and exercises PAST software download
More informationOutline Classes of diversity measures. Species Divergence and the Measurement of Microbial Diversity. How do we describe and compare diversity?
Species Divergence and the Measurement of Microbial Diversity Cathy Lozupone University of Colorado, Boulder. Washington University, St Louis. Outline Classes of diversity measures α vs β diversity Quantitative
More informationMachine Learning for Data Science (CS4786) Lecture 12
Machine Learning for Data Science (CS4786) Lecture 12 Gaussian Mixture Models Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2016fa/ Back to K-means Single link is sensitive to outliners We
More informationParametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a
Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Some slides are due to Christopher Bishop Limitations of K-means Hard assignments of data points to clusters small shift of a
More informationExperimental Design and Data Analysis for Biologists
Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1
More informationSupplementary Information
Supplementary Information Table S1. Per-sample sequences, observed OTUs, richness estimates, diversity indices and coverage. Samples codes as follows: YED (Young leaves Endophytes), MED (Mature leaves
More information-Principal components analysis is by far the oldest multivariate technique, dating back to the early 1900's; ecologists have used PCA since the
1 2 3 -Principal components analysis is by far the oldest multivariate technique, dating back to the early 1900's; ecologists have used PCA since the 1950's. -PCA is based on covariance or correlation
More informationLecture 13. Principal Component Analysis. Brett Bernstein. April 25, CDS at NYU. Brett Bernstein (CDS at NYU) Lecture 13 April 25, / 26
Principal Component Analysis Brett Bernstein CDS at NYU April 25, 2017 Brett Bernstein (CDS at NYU) Lecture 13 April 25, 2017 1 / 26 Initial Question Intro Question Question Let S R n n be symmetric. 1
More informationFIG S1: Rarefaction analysis of observed richness within Drosophila. All calculations were
Page 1 of 14 FIG S1: Rarefaction analysis of observed richness within Drosophila. All calculations were performed using mothur (2). OTUs were defined at the 3% divergence threshold using the average neighbor
More informationCharacterizing and predicting cyanobacterial blooms in an 8-year
1 2 3 4 5 Characterizing and predicting cyanobacterial blooms in an 8-year amplicon sequencing time-course Authors Nicolas Tromas 1*, Nathalie Fortin 2, Larbi Bedrani 1, Yves Terrat 1, Pedro Cardoso 4,
More informationAlgebra of Principal Component Analysis
Algebra of Principal Component Analysis 3 Data: Y = 5 Centre each column on its mean: Y c = 7 6 9 y y = 3..6....6.8 3. 3.8.6 Covariance matrix ( variables): S = -----------Y n c ' Y 8..6 c =.6 5.8 Equation
More informationBIO 682 Multivariate Statistics Spring 2008
BIO 682 Multivariate Statistics Spring 2008 Steve Shuster http://www4.nau.edu/shustercourses/bio682/index.htm Lecture 11 Properties of Community Data Gauch 1982, Causton 1988, Jongman 1995 a. Qualitative:
More informationSUPPLEMENTARY INFORMATION
doi:10.1038/nature25973 Power Simulations We performed extensive power simulations to demonstrate that the analyses carried out in our study are well powered. Our simulations indicate very high power for
More informationEigenvalues, Eigenvectors, and an Intro to PCA
Eigenvalues, Eigenvectors, and an Intro to PCA Eigenvalues, Eigenvectors, and an Intro to PCA Changing Basis We ve talked so far about re-writing our data using a new set of variables, or a new basis.
More informationEigenvalues, Eigenvectors, and an Intro to PCA
Eigenvalues, Eigenvectors, and an Intro to PCA Eigenvalues, Eigenvectors, and an Intro to PCA Changing Basis We ve talked so far about re-writing our data using a new set of variables, or a new basis.
More informationNumerical Methods and Computation Prof. S.R.K. Iyengar Department of Mathematics Indian Institute of Technology Delhi
Numerical Methods and Computation Prof. S.R.K. Iyengar Department of Mathematics Indian Institute of Technology Delhi Lecture No - 27 Interpolation and Approximation (Continued.) In our last lecture we
More informationLecture 2: Probability Distributions
EAS31136/B9036: Statistics in Earth & Atmospheric Sciences Lecture 2: Probability Distributions Instructor: Prof. Johnny Luo www.sci.ccny.cuny.edu/~luo Dates Topic Reading (Based on the 2 nd Edition of
More informationPRINCIPAL COMPONENTS ANALYSIS
121 CHAPTER 11 PRINCIPAL COMPONENTS ANALYSIS We now have the tools necessary to discuss one of the most important concepts in mathematical statistics: Principal Components Analysis (PCA). PCA involves
More informationDETECTING BIOLOGICAL AND ENVIRONMENTAL CHANGES: DESIGN AND ANALYSIS OF MONITORING AND EXPERIMENTS (University of Bologna, 3-14 March 2008)
Dipartimento di Biologia Evoluzionistica Sperimentale Centro Interdipartimentale di Ricerca per le Scienze Ambientali in Ravenna INTERNATIONAL WINTER SCHOOL UNIVERSITY OF BOLOGNA DETECTING BIOLOGICAL AND
More informationHow to quantify biological diversity: taxonomical, functional and evolutionary aspects. Hanna Tuomisto, University of Turku
How to quantify biological diversity: taxonomical, functional and evolutionary aspects Hanna Tuomisto, University of Turku Why quantify biological diversity? understanding the structure and function of
More informationUsing Topological Data Analysis to find discrimination between microbial states in human microbiome data
Using Topological Data Analysis to find discrimination between microbial states in human microbiome data Mehrdad Yazdani 1,2, Larry Smarr 1,3 and Rob Knight 4 1 California Institute for Telecommunications
More informationFlowchart. (b) (c) (d)
Flowchart (c) (b) (d) This workflow consists of the following steps: alpha diversity (microbial community evenness and richness) d1) Generate rarefied OTU tables (mulbple_rarefacbons.py) d2) Compute measures
More informationChad Burrus April 6, 2010
Chad Burrus April 6, 2010 1 Background What is UniFrac? Materials and Methods Results Discussion Questions 2 The vast majority of microbes cannot be cultured with current methods Only half (26) out of
More information4. Ordination in reduced space
Université Laval Analyse multivariable - mars-avril 2008 1 4.1. Generalities 4. Ordination in reduced space Contrary to most clustering techniques, which aim at revealing discontinuities in the data, ordination
More informationMultivariate Analysis of Ecological Data
Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology
More informationEdwin A. Hernández-Delgado*
Long-term Coral Reef Ecological Change Monitoring Program of the Luis Peña Channel Marine Fishery Reserve, Culebra Island, Puerto Rico: I. Status of the coral reef epibenthic communities (1997-2002). Edwin
More informationMultivariate Analysis of Ecological Data
Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology
More information4/4/2018. Stepwise model fitting. CCA with first three variables only Call: cca(formula = community ~ env1 + env2 + env3, data = envdata)
0 Correlation matrix for ironmental matrix 1 2 3 4 5 6 7 8 9 10 11 12 0.087451 0.113264 0.225049-0.13835 0.338366-0.01485 0.166309-0.11046 0.088327-0.41099-0.19944 1 1 2 0.087451 1 0.13723-0.27979 0.062584
More informationMultivariate analysis
Multivariate analysis Prof dr Ann Vanreusel -Multidimensional scaling -Simper analysis -BEST -ANOSIM 1 2 Gradient in species composition 3 4 Gradient in environment site1 site2 site 3 site 4 site species
More informationStructure in Data. A major objective in data analysis is to identify interesting features or structure in the data.
Structure in Data A major objective in data analysis is to identify interesting features or structure in the data. The graphical methods are very useful in discovering structure. There are basically two
More informationUnconstrained Ordination
Unconstrained Ordination Sites Species A Species B Species C Species D Species E 1 0 (1) 5 (1) 1 (1) 10 (4) 10 (4) 2 2 (3) 8 (3) 4 (3) 12 (6) 20 (6) 3 8 (6) 20 (6) 10 (6) 1 (2) 3 (2) 4 4 (5) 11 (5) 8 (5)
More informationMultivariate Data Analysis a survey of data reduction and data association techniques: Principal Components Analysis
Multivariate Data Analysis a survey of data reduction and data association techniques: Principal Components Analysis For example Data reduction approaches Cluster analysis Principal components analysis
More informationJeffrey D. Ullman Stanford University
Jeffrey D. Ullman Stanford University 2 Often, our data can be represented by an m-by-n matrix. And this matrix can be closely approximated by the product of two matrices that share a small common dimension
More informationLinear Programming and its Extensions Prof. Prabha Shrama Department of Mathematics and Statistics Indian Institute of Technology, Kanpur
Linear Programming and its Extensions Prof. Prabha Shrama Department of Mathematics and Statistics Indian Institute of Technology, Kanpur Lecture No. # 03 Moving from one basic feasible solution to another,
More informationG E INTERACTION USING JMP: AN OVERVIEW
G E INTERACTION USING JMP: AN OVERVIEW Sukanta Dash I.A.S.R.I., Library Avenue, New Delhi-110012 sukanta@iasri.res.in 1. Introduction Genotype Environment interaction (G E) is a common phenomenon in agricultural
More information20 Unsupervised Learning and Principal Components Analysis (PCA)
116 Jonathan Richard Shewchuk 20 Unsupervised Learning and Principal Components Analysis (PCA) UNSUPERVISED LEARNING We have sample points, but no labels! No classes, no y-values, nothing to predict. Goal:
More informationClusters. Unsupervised Learning. Luc Anselin. Copyright 2017 by Luc Anselin, All Rights Reserved
Clusters Unsupervised Learning Luc Anselin http://spatial.uchicago.edu 1 curse of dimensionality principal components multidimensional scaling classical clustering methods 2 Curse of Dimensionality 3 Curse
More informationStudying the effect of species dominance on diversity patterns using Hill numbers-based indices
Studying the effect of species dominance on diversity patterns using Hill numbers-based indices Loïc Chalmandrier Loïc Chalmandrier Diversity pattern analysis November 8th 2017 1 / 14 Introduction Diversity
More informationPrincipal Components Analysis. Sargur Srihari University at Buffalo
Principal Components Analysis Sargur Srihari University at Buffalo 1 Topics Projection Pursuit Methods Principal Components Examples of using PCA Graphical use of PCA Multidimensional Scaling Srihari 2
More informationTaxonomy and Clustering of SSU rrna Tags. Susan Huse Josephine Bay Paul Center August 5, 2013
Taxonomy and Clustering of SSU rrna Tags Susan Huse Josephine Bay Paul Center August 5, 2013 Primary Methods of Taxonomic Assignment Bayesian Kmer Matching RDP http://rdp.cme.msu.edu Wang, et al (2007)
More informationMachine Learning (Spring 2012) Principal Component Analysis
1-71 Machine Learning (Spring 1) Principal Component Analysis Yang Xu This note is partly based on Chapter 1.1 in Chris Bishop s book on PRML and the lecture slides on PCA written by Carlos Guestrin in
More informationResampling Methods. Lukas Meier
Resampling Methods Lukas Meier 20.01.2014 Introduction: Example Hail prevention (early 80s) Is a vaccination of clouds really reducing total energy? Data: Hail energy for n clouds (via radar image) Y i
More informationMultivariate Statistics Summary and Comparison of Techniques. Multivariate Techniques
Multivariate Statistics Summary and Comparison of Techniques P The key to multivariate statistics is understanding conceptually the relationship among techniques with regards to: < The kinds of problems
More informationCSC321 Lecture 18: Learning Probabilistic Models
CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling
More informationIntroduction to multivariate analysis Outline
Introduction to multivariate analysis Outline Why do a multivariate analysis Ordination, classification, model fitting Principal component analysis Discriminant analysis, quickly Species presence/absence
More informationPrincipal Component Analysis. Applied Multivariate Statistics Spring 2012
Principal Component Analysis Applied Multivariate Statistics Spring 2012 Overview Intuition Four definitions Practical examples Mathematical example Case study 2 PCA: Goals Goal 1: Dimension reduction
More informationCPSC 340: Machine Learning and Data Mining. More PCA Fall 2017
CPSC 340: Machine Learning and Data Mining More PCA Fall 2017 Admin Assignment 4: Due Friday of next week. No class Monday due to holiday. There will be tutorials next week on MAP/PCA (except Monday).
More informationCS 6375 Machine Learning
CS 6375 Machine Learning Nicholas Ruozzi University of Texas at Dallas Slides adapted from David Sontag and Vibhav Gogate Course Info. Instructor: Nicholas Ruozzi Office: ECSS 3.409 Office hours: Tues.
More informationVariations in pelagic bacterial communities in the North Atlantic Ocean coincide with water bodies
The following supplement accompanies the article Variations in pelagic bacterial communities in the North Atlantic Ocean coincide with water bodies Richard L. Hahnke 1, Christina Probian 1, Bernhard M.
More informationLecture 7 Spectral methods
CSE 291: Unsupervised learning Spring 2008 Lecture 7 Spectral methods 7.1 Linear algebra review 7.1.1 Eigenvalues and eigenvectors Definition 1. A d d matrix M has eigenvalue λ if there is a d-dimensional
More informationINTRODUCTION TO MULTIVARIATE ANALYSIS OF ECOLOGICAL DATA
INTRODUCTION TO MULTIVARIATE ANALYSIS OF ECOLOGICAL DATA David Zelený & Ching-Feng Li INTRODUCTION TO MULTIVARIATE ANALYSIS Ecologial similarity similarity and distance indices Gradient analysis regression,
More informationUnsupervised learning: beyond simple clustering and PCA
Unsupervised learning: beyond simple clustering and PCA Liza Rebrova Self organizing maps (SOM) Goal: approximate data points in R p by a low-dimensional manifold Unlike PCA, the manifold does not have
More informationPalaeontological community and diversity analysis brief notes. Oyvind Hammer Paläontologisches Institut und Museum, Zürich
Palaeontological community and diversity analysis brief notes Oyvind Hammer Paläontologisches Institut und Museum, Zürich ohammer@nhm.uio.no Zürich, June 3, 2002 Contents 1 Introduction 2 2 The basics
More informationClassification for High Dimensional Problems Using Bayesian Neural Networks and Dirichlet Diffusion Trees
Classification for High Dimensional Problems Using Bayesian Neural Networks and Dirichlet Diffusion Trees Rafdord M. Neal and Jianguo Zhang Presented by Jiwen Li Feb 2, 2006 Outline Bayesian view of feature
More informationLecture: Face Recognition and Feature Reduction
Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab Lecture 11-1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed
More informationPRINCIPAL COMPONENTS ANALYSIS (PCA)
PRINCIPAL COMPONENTS ANALYSIS (PCA) Introduction PCA is considered an exploratory technique that can be used to gain a better understanding of the interrelationships between variables. PCA is performed
More informationCSE 554 Lecture 7: Alignment
CSE 554 Lecture 7: Alignment Fall 2012 CSE554 Alignment Slide 1 Review Fairing (smoothing) Relocating vertices to achieve a smoother appearance Method: centroid averaging Simplification Reducing vertex
More informationSTA414/2104. Lecture 11: Gaussian Processes. Department of Statistics
STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations
More informationClassification and Regression Trees
Classification and Regression Trees Ryan P Adams So far, we have primarily examined linear classifiers and regressors, and considered several different ways to train them When we ve found the linearity
More informationIntroduction to Machine Learning Prof. Sudeshna Sarkar Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur
Introduction to Machine Learning Prof. Sudeshna Sarkar Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Module - 5 Lecture - 22 SVM: The Dual Formulation Good morning.
More informationReal Analysis Prof. S.H. Kulkarni Department of Mathematics Indian Institute of Technology, Madras. Lecture - 13 Conditional Convergence
Real Analysis Prof. S.H. Kulkarni Department of Mathematics Indian Institute of Technology, Madras Lecture - 13 Conditional Convergence Now, there are a few things that are remaining in the discussion
More informationSTAT 730 Chapter 14: Multidimensional scaling
STAT 730 Chapter 14: Multidimensional scaling Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Data Analysis 1 / 16 Basic idea We have n objects and a matrix
More informationOperation and Supply Chain Management Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras
Operation and Supply Chain Management Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras Lecture - 3 Forecasting Linear Models, Regression, Holt s, Seasonality
More informationGenerative Clustering, Topic Modeling, & Bayesian Inference
Generative Clustering, Topic Modeling, & Bayesian Inference INFO-4604, Applied Machine Learning University of Colorado Boulder December 12-14, 2017 Prof. Michael Paul Unsupervised Naïve Bayes Last week
More informationTitle ghost-tree: creating hybrid-gene phylogenetic trees for diversity analyses
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 Title ghost-tree: creating hybrid-gene phylogenetic trees for diversity analyses
More informationDiversity partitioning without statistical independence of alpha and beta
1964 Ecology, Vol. 91, No. 7 Ecology, 91(7), 2010, pp. 1964 1969 Ó 2010 by the Ecological Society of America Diversity partitioning without statistical independence of alpha and beta JOSEPH A. VEECH 1,3
More informationGentle Introduction to Infinite Gaussian Mixture Modeling
Gentle Introduction to Infinite Gaussian Mixture Modeling with an application in neuroscience By Frank Wood Rasmussen, NIPS 1999 Neuroscience Application: Spike Sorting Important in neuroscience and for
More informationSupplementary Materials for
advances.sciencemag.org/cgi/content/full/2/1/e1500997/dc1 Supplementary Materials for Social behavior shapes the chimpanzee pan-microbiome Andrew H. Moeller, Steffen Foerster, Michael L. Wilson, Anne E.
More informationLinear Regression (9/11/13)
STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationInformation Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay
Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 13 Competitive Optimality of the Shannon Code So, far we have studied
More informationMachine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall
Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume
More informationIntroduction to Machine Learning
10-701 Introduction to Machine Learning PCA Slides based on 18-661 Fall 2018 PCA Raw data can be Complex, High-dimensional To understand a phenomenon we measure various related quantities If we knew what
More informationAn Adaptive Association Test for Microbiome Data
An Adaptive Association Test for Microbiome Data Chong Wu 1, Jun Chen 2, Junghi 1 Kim and Wei Pan 1 1 Division of Biostatistics, School of Public Health, University of Minnesota; 2 Division of Biomedical
More informationRandomized Decision Trees
Randomized Decision Trees compiled by Alvin Wan from Professor Jitendra Malik s lecture Discrete Variables First, let us consider some terminology. We have primarily been dealing with real-valued data,
More informationMATH 3C: MIDTERM 1 REVIEW. 1. Counting
MATH 3C: MIDTERM REVIEW JOE HUGHES. Counting. Imagine that a sports betting pool is run in the following way: there are 20 teams, 2 weeks, and each week you pick a team to win. However, you can t pick
More informationLecture: Face Recognition and Feature Reduction
Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab 1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed in the
More informationDiscriminant analysis and supervised classification
Discriminant analysis and supervised classification Angela Montanari 1 Linear discriminant analysis Linear discriminant analysis (LDA) also known as Fisher s linear discriminant analysis or as Canonical
More informationUnsupervised Learning: Dimensionality Reduction
Unsupervised Learning: Dimensionality Reduction CMPSCI 689 Fall 2015 Sridhar Mahadevan Lecture 3 Outline In this lecture, we set about to solve the problem posed in the previous lecture Given a dataset,
More informationTABLE OF CONTENTS CHAPTER 1 COMBINATORIAL PROBABILITY 1
TABLE OF CONTENTS CHAPTER 1 COMBINATORIAL PROBABILITY 1 1.1 The Probability Model...1 1.2 Finite Discrete Models with Equally Likely Outcomes...5 1.2.1 Tree Diagrams...6 1.2.2 The Multiplication Principle...8
More informationSYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions
SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu
More informationMotivating the Covariance Matrix
Motivating the Covariance Matrix Raúl Rojas Computer Science Department Freie Universität Berlin January 2009 Abstract This note reviews some interesting properties of the covariance matrix and its role
More informationJeffrey D. Ullman Stanford University
Jeffrey D. Ullman Stanford University 3 We are given a set of training examples, consisting of input-output pairs (x,y), where: 1. x is an item of the type we want to evaluate. 2. y is the value of some
More informationChapter 15. Probability Rules! Copyright 2012, 2008, 2005 Pearson Education, Inc.
Chapter 15 Probability Rules! Copyright 2012, 2008, 2005 Pearson Education, Inc. The General Addition Rule When two events A and B are disjoint, we can use the addition rule for disjoint events from Chapter
More informationPrincipal component analysis (PCA) for clustering gene expression data
Principal component analysis (PCA) for clustering gene expression data Ka Yee Yeung Walter L. Ruzzo Bioinformatics, v17 #9 (2001) pp 763-774 1 Outline of talk Background and motivation Design of our empirical
More informationClassification 2: Linear discriminant analysis (continued); logistic regression
Classification 2: Linear discriminant analysis (continued); logistic regression Ryan Tibshirani Data Mining: 36-462/36-662 April 4 2013 Optional reading: ISL 4.4, ESL 4.3; ISL 4.3, ESL 4.4 1 Reminder:
More informationStatistics Toolbox 6. Apply statistical algorithms and probability models
Statistics Toolbox 6 Apply statistical algorithms and probability models Statistics Toolbox provides engineers, scientists, researchers, financial analysts, and statisticians with a comprehensive set of
More informationLecture 6 Proof for JL Lemma and Linear Dimensionality Reduction
COMS 4995: Unsupervised Learning (Summer 18) June 7, 018 Lecture 6 Proof for JL Lemma and Linear imensionality Reduction Instructor: Nakul Verma Scribes: Ziyuan Zhong, Kirsten Blancato This lecture gives
More informationStructural Equation Modeling and Confirmatory Factor Analysis. Types of Variables
/4/04 Structural Equation Modeling and Confirmatory Factor Analysis Advanced Statistics for Researchers Session 3 Dr. Chris Rakes Website: http://csrakes.yolasite.com Email: Rakes@umbc.edu Twitter: @RakesChris
More informationLECTURE 4 PRINCIPAL COMPONENTS ANALYSIS / EXPLORATORY FACTOR ANALYSIS
LECTURE 4 PRINCIPAL COMPONENTS ANALYSIS / EXPLORATORY FACTOR ANALYSIS NOTES FROM PRE- LECTURE RECORDING ON PCA PCA and EFA have similar goals. They are substantially different in important ways. The goal
More informationChapter 11 Canonical analysis
Chapter 11 Canonical analysis 11.0 Principles of canonical analysis Canonical analysis is the simultaneous analysis of two, or possibly several data tables. Canonical analyses allow ecologists to perform
More informationPackage MiRKATS. May 30, 2016
Type Package Package MiRKATS May 30, 2016 Title Microbiome Regression-based Kernal Association Test for Survival (MiRKAT-S) Version 1.0 Date 2016-04-22 Author Anna Plantinga , Michael
More informationIntroduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin
More information