Multivariate Statistics 101: Ordination (PCA, NMDS, CA), Cluster Analysis (UPGMA, Ward's), Canonical Correspondence Analysis

Multivariate Statistics 101
  Copy of slides and exercises
  PAST software download: http://folk.uio.no/ohammer/past/
  Quinn & Keough textbook: http://www.zoology.unimelb.edu.au/qkstats/

1. Data Prep
  Habitat & biological data transformations
  Input data into PAST software (PAleontological STatistics)

2. Ordination (Habitat)
  Principal Component Analysis (PCA) of habitat data

3. Ordination (Biology)
  Multidimensional Scaling (MDS) and Correspondence Analysis (CA) of biological data

4. Clustering and CCA
  Clustering as an alternative to ordination
  Canonical Correspondence Analysis of habitat and biological data
  Putting it all together!

Multivariate Statistics 101
  What types of data do you collect?

1. Data Prep
  Habitat & biological data transformations
  Input data into PAST software (PAleontological STatistics)

1. Data Prep
  Typically standardize habitat data, because parameters are measured on different scales
  Typically log(x+1) transform taxa abundance data to down-weight the influence of common taxa

Data Standardizations
Important, often essential, in multivariate analyses to reduce the influence of large values:
  Transformations (e.g., log; also for linearity)
  Centering (value - mean): results in a mean of zero for all variables (essential if variables are measured on different scales)
  Standardizing (centered data / SD): results in a mean of zero and an SD of one for all variables (but you may be interested in changes in variability)
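The three operations above can be sketched in a few lines of Python. This is only an illustration, not what PAST or Excel does internally: the function names and the example values are hypothetical, the log base (10) is an assumption because the slide does not state one, and the SD used is the sample SD.

```python
import math
import statistics

def log_transform(counts):
    """log10(x + 1) transform for abundance counts (base 10 is an assumption)."""
    return [math.log10(x + 1) for x in counts]

def center(values):
    """Subtract the mean so the variable has a mean of zero."""
    mean = statistics.fmean(values)
    return [v - mean for v in values]

def standardize(values):
    """Center, then divide by the sample SD (result: mean 0, SD 1)."""
    sd = statistics.stdev(values)
    return [v / sd for v in center(values)]

# Hypothetical data: one habitat variable and one taxon across five sites.
temperature_c = [12.0, 14.5, 9.8, 16.2, 11.5]
mayfly_counts = [0, 3, 120, 15, 1]

temperature_z = standardize(temperature_c)  # comparable with other habitat variables
mayfly_log = log_transform(mayfly_counts)   # the count of 120 no longer dominates
```

Note that a count of zero stays at zero after the log(x+1) transform, which is the reason for adding one before taking the log.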

Excel and PAST Demos and Exercises
  Data and transformations
  Data input into PAST

General Multivariate Analyses

Ordination and Clustering
[Figure panels: raw data matrix (taxa by sites); between-site distance matrix (sites by sites); ordination or cluster solution of relative similarity among sites (Group A, Group B)]

Ordination
[Figure panels: raw data matrix (taxa by sites); between-site distance matrix; ordination solution showing similar groups of sites (Group A, Group B)]

Clustering
[Figure panels: raw data matrix (taxa by sites); between-site distance matrix; cluster solution showing biologically similar groups of sites (Group A, Group B)]

Multivariate Lingo

Multivariate Distances
Many options, but often:
  Euclidean distance is used for chemical or physical data
  Chi-squared or Bray-Curtis distance is used for biological data
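As a rough illustration of how these three distances differ, here is a minimal Python sketch. The function names and the small site table are hypothetical. The chi-squared version shown is one common formulation (differences between row profiles, weighted by column totals and scaled by the grand total); other software may scale it differently.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two sites' measurements."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def bray_curtis(a, b):
    """Sum of absolute abundance differences divided by total abundance."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))

def chi_squared(table, i, h):
    """Chi-squared distance between rows i and h of a sites x taxa table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    acc = 0.0
    for j, col_total in enumerate(col_totals):
        if col_total == 0:
            continue  # a taxon absent everywhere carries no information
        diff = table[i][j] / row_totals[i] - table[h][j] / row_totals[h]
        acc += diff * diff / col_total
    return math.sqrt(grand_total * acc)

# Hypothetical abundance table: rows are sites, columns are taxa.
abundance = [
    [10, 0, 3],
    [8, 1, 4],
    [0, 12, 1],
]
d_bc = bray_curtis(abundance[0], abundance[2])
d_chi = chi_squared(abundance, 0, 2)
```

Bray-Curtis runs from 0 (identical sites) to 1 (no shared taxa), while the chi-squared distance compares relative composition, so two sites with proportional counts are at distance zero.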

Missing Data
  Delete or consolidate the entire row (many missing)
  Substitute the column mean (few missing)
  Estimate values (many missing and you don't want to delete data)
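The "substitute the column mean" option can be sketched as follows. This is a minimal illustration: the function name is hypothetical, missing values are assumed to be marked with None, and every column is assumed to have at least one observed value.

```python
import statistics

def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column."""
    columns = list(zip(*rows))
    means = [
        statistics.fmean([v for v in col if v is not None])
        for col in columns
    ]
    return [
        [means[j] if v is None else v for j, v in enumerate(row)]
        for row in rows
    ]

# Hypothetical habitat data: columns are pH and temperature, rows are sites.
habitat = [
    [7.1, 12.0],
    [None, 14.0],
    [6.9, None],
]
filled = impute_column_means(habitat)
```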

Ordination
  Typically use Principal Components Analysis (PCA; based on Euclidean distance) for chemical or physical data
  Typically use Correspondence Analysis (CA; based on chi-squared distance) or Multidimensional Scaling (MDS) with Bray-Curtis distance for biological data

2. Ordination (Habitat)
  Principal Component Analysis (PCA) of habitat data

Ordination
  Typically use Principal Components Analysis (PCA; based on Euclidean distance) for chemical or physical data

PCA
  PCA Axis 1 explains 66% of variability; positively correlated with temperature, negatively correlated with flow
  PCA Axis 2: 26% TVE (total variation explained); positively correlated with pH

Multivariate Analyses
  Eigenvalues: give the % of total variation explained (TVE) by each axis
  Eigenvectors: relationship between each new variable (axis) and the original variables (the higher the absolute value, the stronger the relationship)
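To make the link between eigenvalues, eigenvectors, and axis scores concrete, here is a minimal NumPy sketch of a PCA on the correlation matrix (that is, on standardized variables). It is an illustration under stated assumptions, not PAST's implementation: the function name and the habitat values are hypothetical.

```python
import numpy as np

def pca_correlation(data):
    """PCA of a sites x variables array via its correlation matrix."""
    # Standardize each variable (mean 0, sample SD 1).
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(data, rowvar=False)
    # eigh returns eigenvalues in ascending order, so sort them descending.
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    percent_tve = 100 * eigenvalues / eigenvalues.sum()
    scores = z @ eigenvectors  # site positions on the new axes
    return percent_tve, eigenvectors, scores

# Hypothetical habitat data: columns are temperature, flow, pH.
habitat = np.array([
    [12.0, 0.80, 7.1],
    [14.5, 0.55, 7.4],
    [9.8, 1.10, 6.9],
    [16.2, 0.40, 7.0],
    [11.5, 0.90, 7.6],
])
percent_tve, loadings, scores = pca_correlation(habitat)
```

Each column of `loadings` is one eigenvector: its entries show how strongly, and in which direction, each original variable contributes to that axis. The sign of a whole eigenvector is arbitrary, so only relative signs within an axis are meaningful.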

Axis Scores
[Figure: scores plotted on Axis 1 vs. Axis 2]

Compare to original data!
[Figure: scores plotted on Axis 2 vs. Axis 3]

PAST Demo & Exercise
  PCA of habitat data

3. Ordination (Biology)
  Multidimensional Scaling (MDS) and Correspondence Analysis (CA) of biological data

Ordination
  Typically use Correspondence Analysis (CA; based on chi-squared distance) or Multidimensional Scaling (MDS) with Bray-Curtis distance for biological data

MDS (or NMDS*)
  MDS Axis 1 explains 52% of variability; negatively correlated with mayflies
  MDS Axis 2: 33% TVE; positively correlated with mites
  * Stress in NMDS should be < 0.10
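The stress value measures how well the rank order of distances in the ordination plot matches the rank order of the original dissimilarities. The sketch below computes Kruskal's stress-1 for a given configuration. This is an assumption about which stress formula is meant, since the slide does not name one, and the function names and example data are hypothetical. It only evaluates a configuration; it does not perform the iterative NMDS fitting itself.

```python
import math
from itertools import combinations

def monotone_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to a sequence."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while a block's mean exceeds the mean of the next one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def kruskal_stress(dissimilarity, coords):
    """Stress-1 of ordination coordinates against a dissimilarity matrix."""
    pairs = list(combinations(range(len(coords)), 2))
    dist = [math.dist(coords[i], coords[j]) for i, j in pairs]
    # Visit site pairs from least to most dissimilar in the original data.
    order = sorted(range(len(pairs)),
                   key=lambda k: dissimilarity[pairs[k][0]][pairs[k][1]])
    fitted = monotone_fit([dist[k] for k in order])
    disparity = [0.0] * len(pairs)
    for k, f in zip(order, fitted):
        disparity[k] = f
    numerator = sum((d - h) ** 2 for d, h in zip(dist, disparity))
    denominator = sum(d ** 2 for d in dist)
    return math.sqrt(numerator / denominator)

# Hypothetical dissimilarities among three sites and two candidate ordinations.
dissim = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
good = [(0.0, 0.0), (1.0, 0.0), (-1.5, 0.0)]  # distance ranks match the data
poor = [(0.0,), (1.0,), (3.0,)]               # ranks of two pairs are swapped
stress_good = kruskal_stress(dissim, good)
stress_poor = kruskal_stress(dissim, poor)
```

A configuration whose distances are in exactly the same rank order as the dissimilarities has zero stress; any rank reversal raises it.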

CA*
  CA Axis 1 explains 57% TVE
  CA Axis 2 explains 24% TVE
  [Figure labels: EPT, Mites]
  * Important to remove rare species/taxa
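For readers who want to see where the CA axes and their % TVE come from, here is a minimal NumPy sketch of correspondence analysis via a singular value decomposition of the standardized residuals from the row-by-column independence model. The function name and the counts are hypothetical, and it assumes every site and every taxon has a non-zero total.

```python
import numpy as np

def correspondence_analysis(counts):
    """CA of a sites x taxa count table (all row and column totals > 0)."""
    p = counts / counts.sum()
    r = p.sum(axis=1)  # row (site) masses
    c = p.sum(axis=0)  # column (taxon) masses
    expected = np.outer(r, c)
    residuals = (p - expected) / np.sqrt(expected)
    u, singular, vt = np.linalg.svd(residuals, full_matrices=False)
    eigenvalues = singular ** 2
    percent_tve = 100 * eigenvalues / eigenvalues.sum()
    # Principal coordinates for sites and taxa.
    site_scores = (u / np.sqrt(r)[:, None]) * singular
    taxa_scores = (vt.T / np.sqrt(c)[:, None]) * singular
    return eigenvalues, percent_tve, site_scores, taxa_scores

# Hypothetical counts: rows are sites, columns are taxa.
counts = np.array([
    [20.0, 5.0, 1.0],
    [15.0, 8.0, 2.0],
    [3.0, 12.0, 9.0],
    [1.0, 6.0, 14.0],
])
eigenvalues, percent_tve, site_scores, taxa_scores = correspondence_analysis(counts)
```

The eigenvalues sum to the table's chi-squared statistic divided by the grand total, which is why CA is described as being based on chi-squared distance. The last axis returned has an eigenvalue of essentially zero and can be ignored.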

Re-do if you get a horseshoe
[Figure: Axis 1 vs. Axis 2]

PAST Demos & Exercises
  NMDS ordination
  CA ordination

4. Clustering and CCA
  Clustering as an alternative to ordination
  Canonical Correspondence Analysis of habitat and biological data
  Putting it all together!

Dendrogram

Clustering Methods
  Clustering observations
  Many methods (UPGMA, Ward's)
  e.g., grouping like streams

Clustering Methods
  Clustering variables
  Used to reduce the number of variables without changing the scale (more intuitive than new axis scores)
  Many methods (UPGMA, Ward's)
  e.g., stream water chemistry variables
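As an illustration of how a dendrogram is built, here is a minimal pure-Python sketch of UPGMA (average linkage) working from a between-site distance matrix. The function name, the site labels, and the distances are hypothetical. Ward's method uses a different merging criterion and is not shown.

```python
def upgma(dist, labels):
    """Average-linkage clustering; returns merges as (group_a, group_b, height)."""
    n = len(labels)
    clusters = {i: (labels[i],) for i in range(n)}
    # Pairwise distances keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        (a, b), height = min(d.items(), key=lambda item: item[1])
        merges.append((clusters[a], clusters[b], height))
        size_a, size_b = len(clusters[a]), len(clusters[b])
        for k in clusters:
            if k in (a, b):
                continue
            d_ak = d.pop((min(a, k), max(a, k)))
            d_bk = d.pop((min(b, k), max(b, k)))
            # Size-weighted mean, i.e. the average over all original site pairs.
            d[(k, next_id)] = (size_a * d_ak + size_b * d_bk) / (size_a + size_b)
        del d[(a, b)]
        clusters[next_id] = clusters[a] + clusters[b]
        del clusters[a], clusters[b]
        next_id += 1
    return merges

# Hypothetical between-site distance matrix for four streams.
streams = ["A", "B", "C", "D"]
distances = [
    [0, 2, 8, 10],
    [2, 0, 8, 10],
    [8, 8, 0, 4],
    [10, 10, 4, 0],
]
merges = upgma(distances, streams)
```

Each entry in `merges` corresponds to one join in the dendrogram, and the height is the average distance between the two groups at the moment they are joined.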

Relating Biological & Environmental Data
  Relationships between species distributions and their habitat (chemical and physical surroundings)

Canonical Correspondence Analysis

Canonical Correspondence Analysis
  Multiple Y variables & multiple X variables
  Extension of CA
  CCA algorithm produces axes that represent the maximum correlation with linear combinations of environmental variables

CCA*
  CA Axis 1 explains 57% TVE
  CA Axis 2 explains 24% TVE
  [Figure labels: EPT, O2, TP, Chir.]
  * Important to remove rare species/taxa

PAST Demos & Exercises
  UPGMA & Ward's clustering
  CCA

Multivariate Statistics 101
  What types of data do you collect?
  How could you use multivariate statistics to analyze it?
