Multivariate Statistics 101: Ordination (PCA, NMDS, CA); Cluster Analysis (UPGMA, Ward's); Canonical Correspondence Analysis
Multivariate Statistics 101: Copy of slides and exercises. PAST software download: http://folk.uio.no/ohammer/past/ Quinn & Keough textbook: http://www.zoology.unimelb.edu.au/qkstats/
1. Data Prep: Habitat & biological data transformations; input data into PAST software (PAleontological STatistics)
2. Ordination (Habitat): Principal Component Analysis (PCA) of habitat data
3. Ordination (Biology): Multidimensional Scaling (MDS) and Correspondence Analysis (CA) of biological data
4. Clustering and CCA: Clustering as an alternative to ordination; Canonical Correspondence Analysis of habitat and biological data; putting it all together!
Multivariate Statistics 101 What types of data do you collect?
1. Data Prep: Habitat & biological data transformations; input data into PAST software (PAleontological STatistics)
1. Data Prep: Typically standardize habitat data because parameters are measured on different scales. Typically log(x+1) transform taxa abundance data to down-weight the influence of common taxa.
Data Standardizations: Important, often essential, in multivariate analyses to reduce the influence of large values. Transformations (e.g., log; also used to improve linearity). Centering (value - mean) results in a mean of zero for all variables (essential if variables are measured on different scales). Standardizing (centered data / SD) results in a mean of zero and an SD of one for all variables (but you may be interested in changes in variability, which standardizing removes).
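The three operations above can be sketched in a few lines of Python. This is only an illustration, not the Excel or PAST workflow used in the exercises: the function names and the sample values are hypothetical, log base 10 is an assumption (the slides do not state a base), and the SD is the sample SD.

```python
import math
import statistics

def log_transform(counts):
    """Return log10(x + 1) for each abundance count (base 10 is an assumption)."""
    return [math.log10(x + 1) for x in counts]

def center(values):
    """Subtract the mean so the variable has a mean of zero."""
    mean = statistics.fmean(values)
    return [v - mean for v in values]

def standardize(values):
    """Center, then divide by the sample SD: mean of zero and SD of one."""
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return [c / sd for c in center(values)]

# Hypothetical example data, not taken from the workshop data set
temperature = [12.0, 15.5, 9.8, 18.2, 14.1]  # one habitat variable across 5 sites
abundance = [0, 3, 250, 12, 1]               # one taxon across the same 5 sites

z_temperature = standardize(temperature)
log_abundance = log_transform(abundance)
```

Note how the count of 250 becomes roughly 2.4 after the log(x+1) transform, so a single abundant taxon no longer dominates the analysis.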
Excel and PAST Demos and Exercises: Data and transformations; data input into PAST
General Multivariate Analyses
Ordination and Clustering (diagram): raw data matrix (sites by taxa) -> between-site distance matrix (sites by sites) -> ordination or cluster solution showing relative similarity among sites (e.g., Group A, Group B)
Ordination (diagram): raw data matrix (sites by taxa) -> between-site distance matrix -> ordination solution showing similar groups of sites (Group A, Group B)
Clustering (diagram): raw data matrix (sites by taxa) -> between-site distance matrix -> cluster solution showing biologically similar groups of sites (Group A, Group B)
Multivariate Lingo
Multivariate Distances: Many options, but often Euclidean distance is used for chemical or physical data, and chi-squared or Bray-Curtis distance is used for biological data.
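As a rough illustration of how two of these distances differ, the sketch below computes Euclidean and Bray-Curtis distance between pairs of sites. The function names and the site data are hypothetical; PAST computes these internally.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two sites in variable space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: 0 = identical abundances, 1 = no shared taxa."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator if denominator else 0.0

# Hypothetical abundances of three taxa at three sites
site_a = [10, 0, 5]
site_b = [6, 4, 5]
site_c = [0, 20, 0]

d_ab = bray_curtis(site_a, site_b)  # sites share taxa, so well below 1
d_ac = bray_curtis(site_a, site_c)  # sites share no taxa, so exactly 1
e_ab = euclidean(site_a, site_b)
```

Bray-Curtis is bounded between 0 and 1 and ignores taxa that are absent from both sites, which is one reason it is popular for abundance data.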
Missing Data: Delete or consolidate the entire row (many missing values); substitute the column mean (few missing values); estimate values (many missing values and you don't want to delete data).
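The column-mean option is the easiest of the three to show in code. A minimal sketch, assuming missing values are stored as None and every column has at least one observed value; the function name and data are hypothetical.

```python
import statistics

def fill_with_column_mean(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    column_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Raises StatisticsError if a column has no observed values at all
        column_means.append(statistics.fmean(observed))
    return [
        [column_means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]

# Hypothetical habitat data: columns are pH and temperature, rows are sites
habitat = [
    [7.1, 12.0],
    [None, 14.0],
    [6.9, 13.0],
]
filled = fill_with_column_mean(habitat)
```

Mean substitution shrinks the variance of the affected column, which is why the slide reserves it for cases with only a few missing values.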
Ordination: Typically use Principal Components Analysis (PCA; based on Euclidean distance) for chemical or physical data. Typically use Correspondence Analysis (CA; based on chi-squared distance) or Multidimensional Scaling (MDS) with Bray-Curtis distance for biological data.
2. Ordination (Habitat): Principal Component Analysis (PCA) of habitat data
Ordination: Typically use Principal Components Analysis (PCA; based on Euclidean distance) for chemical or physical data.
PCA (plot): PCA Axis 1 explains 66% of variability and is positively correlated with temperature and negatively correlated with flow. PCA Axis 2 explains 26% of total variation (TVE) and is positively correlated with pH.
Multivariate Analyses: Eigenvalues give the % of total variation explained by each axis. Eigenvectors give the relationship between each new variable (axis) and each original variable (the higher the absolute value, the stronger the relationship).
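To make the eigenvalue and eigenvector vocabulary concrete, here is a minimal PCA sketch using NumPy. It is an illustration under stated assumptions, not PAST's implementation: the habitat matrix is hypothetical, and the variables are standardized first so the PCA is based on the correlation matrix.

```python
import numpy as np

# Hypothetical habitat matrix: rows = sites, columns = temperature, flow, pH
X = np.array([
    [12.0, 3.1, 7.2],
    [15.5, 2.4, 7.0],
    [ 9.8, 4.0, 7.5],
    [18.2, 1.9, 6.8],
    [14.1, 2.8, 7.1],
])

# Standardize each variable (mean 0, SD 1) because the scales differ
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Covariance of standardized data is the correlation matrix
R = np.cov(Z, rowvar=False)

# eigh is for symmetric matrices; it returns eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]      # largest eigenvalue first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]      # column k = loadings for axis k

pct_explained = 100 * eigenvalues / eigenvalues.sum()  # % of total variation per axis
scores = Z @ eigenvectors                               # axis scores for each site
```

Each column of `eigenvectors` shows how strongly each original variable loads on that axis; the sign of a whole column is arbitrary, so only relative signs within an axis are meaningful.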
Axis Scores (plot axes: Axis 1 and Axis 2)
Compare to original data! (plot axes: Axis 2 and Axis 3)
PAST Demo & Exercise: PCA of habitat data
3. Ordination (Biology): Multidimensional Scaling (MDS) and Correspondence Analysis (CA) of biological data
Ordination: Typically use Correspondence Analysis (CA; based on chi-squared distance) or Multidimensional Scaling (MDS) with Bray-Curtis distance for biological data.
MDS (or NMDS*) (plot): MDS Axis 1 explains 52% of variability and is negatively correlated with mayflies. MDS Axis 2 explains 33% of total variation (TVE) and is positively correlated with mites. * Stress in NMDS should be < 0.10.
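The stress value mentioned above measures how well the rank order of distances in the ordination matches the rank order of the original dissimilarities. The sketch below computes Kruskal's stress-1 for a given set of site pairs, as an assumption about which stress formula is meant (the slides do not say which one PAST reports). Function names and numbers are hypothetical, and ties in the original dissimilarities are broken by input order.

```python
import math

def isotonic_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to values."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge blocks while the previous block mean exceeds the current one
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def kruskal_stress(dissimilarities, ordination_distances):
    """Stress-1 for paired lists: one entry per pair of sites."""
    order = sorted(range(len(dissimilarities)), key=lambda i: dissimilarities[i])
    d = [ordination_distances[i] for i in order]
    d_hat = isotonic_fit(d)  # disparities: monotone in the original dissimilarities
    numerator = sum((a - b) ** 2 for a, b in zip(d, d_hat))
    denominator = sum(a ** 2 for a in d)
    return math.sqrt(numerator / denominator)

# Hypothetical Bray-Curtis dissimilarities for three site pairs
original = [0.2, 0.5, 0.9]
perfect = kruskal_stress(original, [1.0, 2.0, 3.5])    # rank order preserved
imperfect = kruskal_stress(original, [1.0, 3.0, 2.0])  # one rank reversal
```

A stress of zero means the ordination preserves the rank order of dissimilarities exactly; the single reversal in the second example already pushes stress to about 0.19, above the 0.10 guideline.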
CA* (plot; labeled taxa: EPT, mites): CA Axis 1 explains 57% TVE. CA Axis 2 explains 24% TVE. * Important to remove rare species/taxa.
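Removing rare taxa before CA can be done with a simple occurrence filter. The sketch below keeps only taxa found at a minimum number of sites; the threshold of two sites, the function name, and the data are assumptions for illustration, since the slides do not define "rare".

```python
def drop_rare_taxa(abundance, taxa, min_sites=2):
    """Keep taxa (columns) that occur at min_sites or more sites (rows)."""
    keep = [
        j for j in range(len(taxa))
        if sum(1 for row in abundance if row[j] > 0) >= min_sites
    ]
    kept_taxa = [taxa[j] for j in keep]
    kept_rows = [[row[j] for j in keep] for row in abundance]
    return kept_rows, kept_taxa

# Hypothetical site-by-taxa abundance matrix
taxa = ["mayflies", "mites", "rare_beetle"]
abundance = [
    [12, 3, 0],
    [8, 0, 1],
    [0, 5, 0],
]
filtered, kept = drop_rare_taxa(abundance, taxa, min_sites=2)
```

Here `rare_beetle` occurs at only one site and is dropped, while the other two taxa are retained.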
Re-do if you get a horseshoe (plot axes: Axis 1 and Axis 2)
PAST Demos & Exercises: NMDS ordination; CA ordination
4. Clustering and CCA: Clustering as an alternative to ordination; Canonical Correspondence Analysis of habitat and biological data; putting it all together!
Dendrogram
Clustering Methods: Clustering observations. Many methods (UPGMA, Ward's); e.g., grouping similar streams.
Clustering Methods: Clustering variables. Used to reduce the number of variables without changing the scale (more intuitive than new axis scores). Many methods (UPGMA, Ward's); e.g., stream water chemistry variables.
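For clustering observations, UPGMA repeatedly merges the two clusters with the smallest average pairwise distance between their members. The sketch below is a small, unoptimized illustration using Euclidean distance; the function names, stream labels, and chemistry values are hypothetical, and PAST's own routine may differ in details such as tie handling.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def upgma(points, labels):
    """Return the merge history as (cluster_a, cluster_b, height) tuples."""
    index = {label: i for i, label in enumerate(labels)}
    clusters = [(label,) for label in labels]  # start with one cluster per site

    def average_distance(c1, c2):
        # Mean of all pairwise distances between members of the two clusters
        total = sum(euclidean(points[index[a]], points[index[b]]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average_distance(*pair))
        merges.append((c1, c2, average_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

# Hypothetical standardized water chemistry for four streams
streams = ["S1", "S2", "S3", "S4"]
chemistry = [[1.0, 1.0], [1.2, 0.9], [5.0, 5.1], [5.3, 4.8]]
merges = upgma(chemistry, streams)
```

The merge heights in `merges` are the levels at which branches join in the dendrogram: S1 and S2 join first, then S3 and S4, and the two pairs join last at a much greater height.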
Relating Biological & Environmental Data: Relationships between species distributions and their habitat (chemical and physical surroundings)
Canonical Correspondence Analysis
Canonical Correspondence Analysis: Multiple Y variables and multiple X variables. An extension of CA. The CCA algorithm produces axes that represent the maximum correlation with linear combinations of environmental variables.
CCA* (plot; labeled taxa and environmental variables: EPT, Chir., O2, TP): CA Axis 1 explains 57% TVE. CA Axis 2 explains 24% TVE. * Important to remove rare species/taxa.
PAST Demos & Exercises: UPGMA & Ward's clustering; CCA
Multivariate Statistics 101 What types of data do you collect? How could you use Multivariate Statistics to analyze it?