Multivariate Statistics 101. Ordination (PCA, NMDS, CA) Cluster Analysis (UPGMA, Ward s) Canonical Correspondence Analysis

Size: px
Start display at page:

Download "Multivariate Statistics 101. Ordination (PCA, NMDS, CA) Cluster Analysis (UPGMA, Ward s) Canonical Correspondence Analysis"

Transcription

1 Multivariate Statistics 101 Ordination (PCA, NMDS, CA) Cluster Analysis (UPGMA, Ward s) Canonical Correspondence Analysis

2 Multivariate Statistics 101 Copy of slides and exercises PAST software download Quinn & Keough Textbook

3 1. Data Prep Habitat & biological data transformations Input data into PAST software (PAleontological STatistics)

4 2. Ordination (Habitat) Principal Component Analysis (PCA) of Habitat Data

5 3. Ordination (Biology) Multidimensional Scaling (MDS) and Correspondence Analysis (CA) of Biological data

6 4. Clustering and CCA Clustering alternative to ordination Canonical Correspondence Analysis of Habitat and Biological Data Putting it all together!

7 Multivariate Statistics 101 What types of data do you collect?

8 1. Data Prep Habitat & biological data transformations Input data into PAST software (PAleontological STatistics)

9 1. Data Prep Typically standardize habitat data because parameters are measure on different scales Typically Log (x+1) transform taxa abundance data to down weight influence of common taxa

10 Data Standardizations Important/ essential in multivariate analyses to reduce the influence of large values: Transformations (e.g., log; also for linearity) Centering (value mean) results in mean of zero for all variables (essential if variables are measured on different scales) Standardizing (centered data / SD) results in a mean of zero & SD of one for all variables (but you may be interested in changes in variability)

11 Excel and PAST Demos and Exercises Data and transformations Data input into PAST

12 General Multivariate Analyses

13 Ordination and Clustering Raw data matrix Sites Between site distance matrix Sites Ordination or Cluster solution of relative similarity among sites Taxa Sites Group A Group B

14 Taxa Ordination Raw data matrix Sites Group A Group B Between site distance matrix Group A Group B Group A Group B Ordination solution showing similar groups of sites

15 Taxa Clustering Raw data matrix Sites Group A Group B Between site distance matrix Group A Group B Cluster solution showing biologically similar groups of sites Group A Group B Group A Group B

16 Multivariate Lingo

17 Multivariate Distances Many options but often: Euclidean Distance is used for chemical or physical data Chi-squared or Bray-Curtis Distance is used for biological data

18 Missing Data Delete or consolidate entire row (many missing) Substitute column mean (few missing) Estimate values (many missing & don t want to delete data)

19 Ordination Typically use Principle Components Analysis (PCA; based on Euclidean Distance) for chemical or physical data Typically use Correspondence Analysis (CA; based on chi-squared distance) or Multi-dimensional Scaling (MDS) with Bray-Curtis Distance for biological data

20 2. Ordination (Habitat) Principal Component Analysis (PCA) of Habitat Data

21 Ordination Typically use Principle Components Analysis (PCA; based on Euclidean Distance) for chemical or physical data

22 Taxa Ordination Raw data matrix Sites Group A Group B Between site distance matrix Group A Group B Group A Group B Ordination solution showing similar groups of sites

23

24 PCA PCA Axis 2 26% TVE, positively correlated with ph PCA Axis 1 explains 66% of variability, positively correlated with temperature, negatively correlated with flow

25

26 Multivariate Analyses Eigenvalues % of total variation explained Eigenvectors relationship between new variable and original variable (the higher the value, the stronger the relationship)

27 PCA PCA Axis 2 26% TVE, positively correlated with ph PCA Axis 1 explains 66% of variability, positively correlated with temperature, negatively correlated with flow

28 Axis Scores Axis 2 Axis 1

29 Compare to original data! Axis 3 Axis 2

30 PAST Demo & Exercise PCA of habitat data

31 3. Ordination (Biology) Multidimensional Scaling (MDS) and Correspondence Analysis (CA) of Biological data

32 Ordination Typically use Correspondence Analysis (CA; based on chi-squared distance) or Multi-dimensional Scaling (MDS) with Bray-Curtis Distance for biological data

33 Taxa Ordination Raw data matrix Sites Group A Group B Between site distance matrix Group A Group B Group A Group B Ordination solution showing similar groups of sites

34 MDS (or NMDS*) MDS Axis 2 33% TVE, + correlation with mites MDS Axis 1 explains 52% of variability, negatively correlated with mayflies * Stress in NMDS should be < 0.10

35 Ordination Typically use Correspondence Analysis (CA; based on chi-squared distance) or Multi-dimensional Scaling (MDS) with Bray-Curtis Distance for biological data

36 CA* CA Axis 2 explains 24% TVE EPT Mites CA Axis 1 explains 57% TVE * Important to remove rare species/ taxa

37 Re-do if you get a horseshoe Axis 2 Axis 1

38 Multivariate Analyses Eigenvalues % of total variation explained Eigenvectors relationship between new variable and original variable (the higher the value, the stronger the relationship)

39 Axis Scores Axis 2 Axis 1

40 Compare to original data! Axis 3 Axis 2

41 PAST Demos & Exercises NMDS ordination CA ordination

42 4. Clustering and CCA Clustering alternative to ordination Canonical Correspondence Analysis of Habitat and Biological Data Putting it all together!

43 Taxa Clustering Raw data matrix Sites Group A Group B Between site distance matrix Group A Group B Cluster solution showing biologically similar groups of sites Group A Group B Group A Group B

44 Dendrogram

45 Clustering Methods Clustering observations many methods (UPGMA, Wards) e.g., groups like streams

46 Clustering Methods Clustering variables used to reduce number of variables without changing the scale (more intuitive than new axes scores) many methods (UPGMA, Wards) e.g., stream water chemistry variables

47 4. Clustering and CCA Clustering alternative to ordination Canonical Correspondence Analysis of Habitat and Biological Data Putting it all together!

48 Relating Biological & Environmental Data Relationships between species distributions and their habitat (chemical and physical surroundings)

49 Canonical Correspondence Analysis

50 Canonical Correspondence Analysis Multiple Y variables & multiple X variables Extension of CA CCA algorithm produces axes that represent the maximum correlation with linear combinations of environmental variables

51 CA* CA Axis 2 explains 24% TVE EPT Mites CA Axis 1 explains 57% TVE * Important to remove rare species/ taxa

52 CCA* CA Axis 2 explains 24% TVE EPT O 2 TP Chir. CA Axis 1 explains 57% TVE * Important to remove rare species/ taxa

53 PAST Demos & Exercises UPGMA & Ward s Clustering CCA

54 Multivariate Statistics 101 What types of data do you collect? How could you use Multivariate Statistics to analyze it?

55 Multivariate Statistics 101 Ordination (PCA, NMDS, CA) Cluster Analysis (UPGMA, Ward s) Canonical Correspondence Analysis

4. Ordination in reduced space

4. Ordination in reduced space Université Laval Analyse multivariable - mars-avril 2008 1 4.1. Generalities 4. Ordination in reduced space Contrary to most clustering techniques, which aim at revealing discontinuities in the data, ordination

More information

INTRODUCTION TO MULTIVARIATE ANALYSIS OF ECOLOGICAL DATA

INTRODUCTION TO MULTIVARIATE ANALYSIS OF ECOLOGICAL DATA INTRODUCTION TO MULTIVARIATE ANALYSIS OF ECOLOGICAL DATA David Zelený & Ching-Feng Li INTRODUCTION TO MULTIVARIATE ANALYSIS Ecologial similarity similarity and distance indices Gradient analysis regression,

More information

Ordination & PCA. Ordination. Ordination

Ordination & PCA. Ordination. Ordination Ordination & PCA Introduction to Ordination Purpose & types Shepard diagrams Principal Components Analysis (PCA) Properties Computing eigenvalues Computing principal components Biplots Covariance vs. Correlation

More information

Lecture 2: Diversity, Distances, adonis. Lecture 2: Diversity, Distances, adonis. Alpha- Diversity. Alpha diversity definition(s)

Lecture 2: Diversity, Distances, adonis. Lecture 2: Diversity, Distances, adonis. Alpha- Diversity. Alpha diversity definition(s) Lecture 2: Diversity, Distances, adonis Lecture 2: Diversity, Distances, adonis Diversity - alpha, beta (, gamma) Beta- Diversity in practice: Ecological Distances Unsupervised Learning: Clustering, etc

More information

Principal Component Analysis (PCA) Theory, Practice, and Examples

Principal Component Analysis (PCA) Theory, Practice, and Examples Principal Component Analysis (PCA) Theory, Practice, and Examples Data Reduction summarization of data with many (p) variables by a smaller set of (k) derived (synthetic, composite) variables. p k n A

More information

Algebra of Principal Component Analysis

Algebra of Principal Component Analysis Algebra of Principal Component Analysis 3 Data: Y = 5 Centre each column on its mean: Y c = 7 6 9 y y = 3..6....6.8 3. 3.8.6 Covariance matrix ( variables): S = -----------Y n c ' Y 8..6 c =.6 5.8 Equation

More information

BIO 682 Multivariate Statistics Spring 2008

BIO 682 Multivariate Statistics Spring 2008 BIO 682 Multivariate Statistics Spring 2008 Steve Shuster http://www4.nau.edu/shustercourses/bio682/index.htm Lecture 11 Properties of Community Data Gauch 1982, Causton 1988, Jongman 1995 a. Qualitative:

More information

Experimental Design and Data Analysis for Biologists

Experimental Design and Data Analysis for Biologists Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1

More information

Short Answer Questions: Answer on your separate blank paper. Points are given in parentheses.

Short Answer Questions: Answer on your separate blank paper. Points are given in parentheses. ISQS 6348 Final exam solutions. Name: Open book and notes, but no electronic devices. Answer short answer questions on separate blank paper. Answer multiple choice on this exam sheet. Put your name on

More information

Multivariate analysis of genetic data an introduction

Multivariate analysis of genetic data an introduction Multivariate analysis of genetic data an introduction Thibaut Jombart MRC Centre for Outbreak Analysis and Modelling Imperial College London Population genomics in Lausanne 23 Aug 2016 1/25 Outline Multivariate

More information

Multivariate analysis

Multivariate analysis Multivariate analysis Prof dr Ann Vanreusel -Multidimensional scaling -Simper analysis -BEST -ANOSIM 1 2 Gradient in species composition 3 4 Gradient in environment site1 site2 site 3 site 4 site species

More information

Unconstrained Ordination

Unconstrained Ordination Unconstrained Ordination Sites Species A Species B Species C Species D Species E 1 0 (1) 5 (1) 1 (1) 10 (4) 10 (4) 2 2 (3) 8 (3) 4 (3) 12 (6) 20 (6) 3 8 (6) 20 (6) 10 (6) 1 (2) 3 (2) 4 4 (5) 11 (5) 8 (5)

More information

Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

More information

4/2/2018. Canonical Analyses Analysis aimed at identifying the relationship between two multivariate datasets. Cannonical Correlation.

4/2/2018. Canonical Analyses Analysis aimed at identifying the relationship between two multivariate datasets. Cannonical Correlation. GAL50.44 0 7 becki 2 0 chatamensis 0 darwini 0 ephyppium 0 guntheri 3 0 hoodensis 0 microphyles 0 porteri 2 0 vandenburghi 0 vicina 4 0 Multiple Response Variables? Univariate Statistics Questions Individual

More information

4/4/2018. Stepwise model fitting. CCA with first three variables only Call: cca(formula = community ~ env1 + env2 + env3, data = envdata)

4/4/2018. Stepwise model fitting. CCA with first three variables only Call: cca(formula = community ~ env1 + env2 + env3, data = envdata) 0 Correlation matrix for ironmental matrix 1 2 3 4 5 6 7 8 9 10 11 12 0.087451 0.113264 0.225049-0.13835 0.338366-0.01485 0.166309-0.11046 0.088327-0.41099-0.19944 1 1 2 0.087451 1 0.13723-0.27979 0.062584

More information

DIMENSION REDUCTION AND CLUSTER ANALYSIS

DIMENSION REDUCTION AND CLUSTER ANALYSIS DIMENSION REDUCTION AND CLUSTER ANALYSIS EECS 833, 6 March 2006 Geoff Bohling Assistant Scientist Kansas Geological Survey geoff@kgs.ku.edu 864-2093 Overheads and resources available at http://people.ku.edu/~gbohling/eecs833

More information

Multivariate analysis of genetic data: an introduction

Multivariate analysis of genetic data: an introduction Multivariate analysis of genetic data: an introduction Thibaut Jombart MRC Centre for Outbreak Analysis and Modelling Imperial College London XXIV Simposio Internacional De Estadística Bogotá, 25th July

More information

Introduction to multivariate analysis Outline

Introduction to multivariate analysis Outline Introduction to multivariate analysis Outline Why do a multivariate analysis Ordination, classification, model fitting Principal component analysis Discriminant analysis, quickly Species presence/absence

More information

diversity(datamatrix, index= shannon, base=exp(1))

diversity(datamatrix, index= shannon, base=exp(1)) Tutorial 11: Diversity, Indicator Species Analysis, Cluster Analysis Calculating Diversity Indices The vegan package contains the command diversity() for calculating Shannon and Simpson diversity indices.

More information

1.3. Principal coordinate analysis. Pierre Legendre Département de sciences biologiques Université de Montréal

1.3. Principal coordinate analysis. Pierre Legendre Département de sciences biologiques Université de Montréal 1.3. Pierre Legendre Département de sciences biologiques Université de Montréal http://www.numericalecology.com/ Pierre Legendre 2018 Definition of principal coordinate analysis (PCoA) An ordination method

More information

EXAM PRACTICE. 12 questions * 4 categories: Statistics Background Multivariate Statistics Interpret True / False

EXAM PRACTICE. 12 questions * 4 categories: Statistics Background Multivariate Statistics Interpret True / False EXAM PRACTICE 12 questions * 4 categories: Statistics Background Multivariate Statistics Interpret True / False Stats 1: What is a Hypothesis? A testable assertion about how the world works Hypothesis

More information

Multivariate Data Analysis a survey of data reduction and data association techniques: Principal Components Analysis

Multivariate Data Analysis a survey of data reduction and data association techniques: Principal Components Analysis Multivariate Data Analysis a survey of data reduction and data association techniques: Principal Components Analysis For example Data reduction approaches Cluster analysis Principal components analysis

More information

Chapter 11 Canonical analysis

Chapter 11 Canonical analysis Chapter 11 Canonical analysis 11.0 Principles of canonical analysis Canonical analysis is the simultaneous analysis of two, or possibly several data tables. Canonical analyses allow ecologists to perform

More information

8. FROM CLASSICAL TO CANONICAL ORDINATION

8. FROM CLASSICAL TO CANONICAL ORDINATION Manuscript of Legendre, P. and H. J. B. Birks. 2012. From classical to canonical ordination. Chapter 8, pp. 201-248 in: Tracking Environmental Change using Lake Sediments, Volume 5: Data handling and numerical

More information

FINM 331: MULTIVARIATE DATA ANALYSIS FALL 2017 PROBLEM SET 3

FINM 331: MULTIVARIATE DATA ANALYSIS FALL 2017 PROBLEM SET 3 FINM 331: MULTIVARIATE DATA ANALYSIS FALL 2017 PROBLEM SET 3 The required files for all problems can be found in: http://www.stat.uchicago.edu/~lekheng/courses/331/hw3/ The file name indicates which problem

More information

Introduction to ordination. Gary Bradfield Botany Dept.

Introduction to ordination. Gary Bradfield Botany Dept. Introduction to ordination Gary Bradfield Botany Dept. Ordination there appears to be no word in English which one can use as an antonym to classification ; I would like to propose the term ordination.

More information

VarCan (version 1): Variation Estimation and Partitioning in Canonical Analysis

VarCan (version 1): Variation Estimation and Partitioning in Canonical Analysis VarCan (version 1): Variation Estimation and Partitioning in Canonical Analysis Pedro R. Peres-Neto March 2005 Department of Biology University of Regina Regina, SK S4S 0A2, Canada E-mail: Pedro.Peres-Neto@uregina.ca

More information

An Introduction to Applied Multivariate Analysis with R

An Introduction to Applied Multivariate Analysis with R ~ Snrinuer Brian Everitt Torsten Hathorn An Introduction to Applied Multivariate Analysis with R > Preface........................................................ vii 1 Multivariate Data and Multivariate

More information

Linking species-compositional dissimilarities and environmental data for biodiversity assessment

Linking species-compositional dissimilarities and environmental data for biodiversity assessment Linking species-compositional dissimilarities and environmental data for biodiversity assessment D. P. Faith, S. Ferrier Australian Museum, 6 College St., Sydney, N.S.W. 2010, Australia; N.S.W. National

More information

Factors affecting the Power and Validity of Randomization-based Multivariate Tests for Difference among Ecological Assemblages

Factors affecting the Power and Validity of Randomization-based Multivariate Tests for Difference among Ecological Assemblages Factors affecting the Power and Validity of Randomization-based Multivariate Tests for Difference among Ecological Assemblages Cameron Hurst B.Sc. (Hons) This thesis was submitted in fulfillment of the

More information

Multivariate Ordination Analyses: Principal Component Analysis. Dilys Vela

Multivariate Ordination Analyses: Principal Component Analysis. Dilys Vela Multivariate Ordination Analyses: Principal Component Analysis Dilys Vela Tatiana Boza Multivariate Analyses A multivariate data set includes more than one variable ibl recorded dd from a number of replicate

More information

An Introduction to R for the Geosciences: Ordination I

An Introduction to R for the Geosciences: Ordination I An Introduction to R for the Geosciences: Ordination I Gavin Simpson April 29, 2013 Summary This practical will use the PONDS dataset to demonstrate methods of indirect gradient analysis (PCA, CA, and

More information

Lecture 5: Ecological distance metrics; Principal Coordinates Analysis. Univariate testing vs. community analysis

Lecture 5: Ecological distance metrics; Principal Coordinates Analysis. Univariate testing vs. community analysis Lecture 5: Ecological distance metrics; Principal Coordinates Analysis Univariate testing vs. community analysis Univariate testing deals with hypotheses concerning individual taxa Is this taxon differentially

More information

CAP. Canonical Analysis of Principal coordinates. A computer program by Marti J. Anderson. Department of Statistics University of Auckland (2002)

CAP. Canonical Analysis of Principal coordinates. A computer program by Marti J. Anderson. Department of Statistics University of Auckland (2002) CAP Canonical Analysis of Principal coordinates A computer program by Marti J. Anderson Department of Statistics University of Auckland (2002) 2 DISCLAIMER This FORTRAN program is provided without any

More information

DETECTING BIOLOGICAL AND ENVIRONMENTAL CHANGES: DESIGN AND ANALYSIS OF MONITORING AND EXPERIMENTS (University of Bologna, 3-14 March 2008)

DETECTING BIOLOGICAL AND ENVIRONMENTAL CHANGES: DESIGN AND ANALYSIS OF MONITORING AND EXPERIMENTS (University of Bologna, 3-14 March 2008) Dipartimento di Biologia Evoluzionistica Sperimentale Centro Interdipartimentale di Ricerca per le Scienze Ambientali in Ravenna INTERNATIONAL WINTER SCHOOL UNIVERSITY OF BOLOGNA DETECTING BIOLOGICAL AND

More information

Indirect Gradient Analysis

Indirect Gradient Analysis Indirect Gradient Analysis Gavin Simpson May 12, 2006 Summary This practical will use the PONDS dataset to demonstrate methods of indirect gradient analysis (PCA, CA, and DCA) of species and environmental

More information

Multivariate Analysis of Ecological Data using CANOCO

Multivariate Analysis of Ecological Data using CANOCO Multivariate Analysis of Ecological Data using CANOCO JAN LEPS University of South Bohemia, and Czech Academy of Sciences, Czech Republic Universitats- uric! Lanttesbibiiothek Darmstadt Bibliothek Biologie

More information

Data Analysis in Paleontology Using R. Data Manipulation

Data Analysis in Paleontology Using R. Data Manipulation Data Analysis in Paleontology Using R Session 3 24 Jan 2006 Gene Hunt Dept. of Paleobiology NMNH, SI Data Manipulation Sorting x

More information

A User's Guide To Principal Components

A User's Guide To Principal Components A User's Guide To Principal Components J. EDWARD JACKSON A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Brisbane Toronto Singapore Contents Preface Introduction 1. Getting

More information

Multivariate Statistics Fundamentals Part 1: Rotation-based Techniques

Multivariate Statistics Fundamentals Part 1: Rotation-based Techniques Multivariate Statistics Fundamentals Part 1: Rotation-based Techniques A reminded from a univariate statistics courses Population Class of things (What you want to learn about) Sample group representing

More information

Revision: Chapter 1-6. Applied Multivariate Statistics Spring 2012

Revision: Chapter 1-6. Applied Multivariate Statistics Spring 2012 Revision: Chapter 1-6 Applied Multivariate Statistics Spring 2012 Overview Cov, Cor, Mahalanobis, MV normal distribution Visualization: Stars plot, mosaic plot with shading Outlier: chisq.plot Missing

More information

-Principal components analysis is by far the oldest multivariate technique, dating back to the early 1900's; ecologists have used PCA since the

-Principal components analysis is by far the oldest multivariate technique, dating back to the early 1900's; ecologists have used PCA since the 1 2 3 -Principal components analysis is by far the oldest multivariate technique, dating back to the early 1900's; ecologists have used PCA since the 1950's. -PCA is based on covariance or correlation

More information

PCA Advanced Examples & Applications

PCA Advanced Examples & Applications PCA Advanced Examples & Applications Objectives: Showcase advanced PCA analysis: - Addressing the assumptions - Improving the signal / decreasing the noise Principal Components (PCA) Paper II Example:

More information

Statistics II 1. Modelling Biology. Basic Applications of Mathematics and Statistics in the Biological Sciences

Statistics II 1. Modelling Biology. Basic Applications of Mathematics and Statistics in the Biological Sciences Statistics II Modelling Biology Basic Applications of Mathematics and Statistics in the Biological Sciences Part II: Data Analysis and Statistics Script C Introductory Course for Students of Biology, Biotechnology

More information

Dimensionality Reduction Techniques (DRT)

Dimensionality Reduction Techniques (DRT) Dimensionality Reduction Techniques (DRT) Introduction: Sometimes we have lot of variables in the data for analysis which create multidimensional matrix. To simplify calculation and to get appropriate,

More information

Distance Measures. Objectives: Discuss Distance Measures Illustrate Distance Measures

Distance Measures. Objectives: Discuss Distance Measures Illustrate Distance Measures Distance Measures Objectives: Discuss Distance Measures Illustrate Distance Measures Quantifying Data Similarity Multivariate Analyses Re-map the data from Real World Space to Multi-variate Space Distance

More information

Latent Variable Methods Course

Latent Variable Methods Course Latent Variable Methods Course Learning from data Instructor: Kevin Dunn kevin.dunn@connectmv.com http://connectmv.com Kevin Dunn, ConnectMV, Inc. 2011 Revision: 269:35e2 compiled on 15-12-2011 ConnectMV,

More information

Correspondence Analysis & Related Methods

Correspondence Analysis & Related Methods Corresponence Analysis & Relate Methos Michael Greenacre SESSION 3: MUIDIMENSIONA SCAING (MDS DIMENSION REDUCION CASSICA MDS NONMERIC MDS Distances an issimilarities... n objects = istance between object

More information

Maximum variance formulation

Maximum variance formulation 12.1. Principal Component Analysis 561 Figure 12.2 Principal component analysis seeks a space of lower dimensionality, known as the principal subspace and denoted by the magenta line, such that the orthogonal

More information

Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

More information

STAT 730 Chapter 14: Multidimensional scaling

STAT 730 Chapter 14: Multidimensional scaling STAT 730 Chapter 14: Multidimensional scaling Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Data Analysis 1 / 16 Basic idea We have n objects and a matrix

More information

Multivariate Statistics Summary and Comparison of Techniques. Multivariate Techniques

Multivariate Statistics Summary and Comparison of Techniques. Multivariate Techniques Multivariate Statistics Summary and Comparison of Techniques P The key to multivariate statistics is understanding conceptually the relationship among techniques with regards to: < The kinds of problems

More information

Variations in pelagic bacterial communities in the North Atlantic Ocean coincide with water bodies

Variations in pelagic bacterial communities in the North Atlantic Ocean coincide with water bodies The following supplement accompanies the article Variations in pelagic bacterial communities in the North Atlantic Ocean coincide with water bodies Richard L. Hahnke 1, Christina Probian 1, Bernhard M.

More information

Lecture 5: Ecological distance metrics; Principal Coordinates Analysis. Univariate testing vs. community analysis

Lecture 5: Ecological distance metrics; Principal Coordinates Analysis. Univariate testing vs. community analysis Lecture 5: Ecological distance metrics; Principal Coordinates Analysis Univariate testing vs. community analysis Univariate testing deals with hypotheses concerning individual taxa Is this taxon differentially

More information

Focus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations.

Focus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations. Previously Focus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations y = Ax Or A simply represents data Notion of eigenvectors,

More information

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin 1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)

More information

Chapter 1 Ordination Methods and the Evaluation of Ediacaran Communities

Chapter 1 Ordination Methods and the Evaluation of Ediacaran Communities Chapter 1 Ordination Methods and the Evaluation of Ediacaran Communities 1 2 3 Matthew E. Clapham 4 Contents 1.1 Introduction... 000 1.2 ataset Summary... 000 1.3 ata Standardization... 000 1.4 Ordination

More information

Lab 7. Direct & Indirect Gradient Analysis

Lab 7. Direct & Indirect Gradient Analysis Lab 7 Direct & Indirect Gradient Analysis Direct and indirect gradient analysis refers to a case where you have two datasets with variables that have cause-and-effect or mutual influences on each other.

More information

JUST THE MATHS UNIT NUMBER 9.9. MATRICES 9 (Modal & spectral matrices) A.J.Hobson

JUST THE MATHS UNIT NUMBER 9.9. MATRICES 9 (Modal & spectral matrices) A.J.Hobson JUST THE MATHS UNIT NUMBER 9.9 MATRICES 9 (Modal & spectral matrices) by A.J.Hobson 9.9. Assumptions and definitions 9.9.2 Diagonalisation of a matrix 9.9.3 Exercises 9.9.4 Answers to exercises UNIT 9.9

More information

Species Associations: The Kendall Coefficient of Concordance Revisited

Species Associations: The Kendall Coefficient of Concordance Revisited Species Associations: The Kendall Coefficient of Concordance Revisited Pierre LEGENDRE The search for species associations is one of the classical problems of community ecology. This article proposes to

More information

Chad Burrus April 6, 2010

Chad Burrus April 6, 2010 Chad Burrus April 6, 2010 1 Background What is UniFrac? Materials and Methods Results Discussion Questions 2 The vast majority of microbes cannot be cultured with current methods Only half (26) out of

More information

Distances and similarities Based in part on slides from textbook, slides of Susan Holmes. October 3, Statistics 202: Data Mining

Distances and similarities Based in part on slides from textbook, slides of Susan Holmes. October 3, Statistics 202: Data Mining Distances and similarities Based in part on slides from textbook, slides of Susan Holmes October 3, 2012 1 / 1 Similarities Start with X which we assume is centered and standardized. The PCA loadings were

More information

Dimension Reduction Techniques. Presented by Jie (Jerry) Yu

Dimension Reduction Techniques. Presented by Jie (Jerry) Yu Dimension Reduction Techniques Presented by Jie (Jerry) Yu Outline Problem Modeling Review of PCA and MDS Isomap Local Linear Embedding (LLE) Charting Background Advances in data collection and storage

More information

Dimension Reduction (PCA, ICA, CCA, FLD,

Dimension Reduction (PCA, ICA, CCA, FLD, Dimension Reduction (PCA, ICA, CCA, FLD, Topic Models) Yi Zhang 10-701, Machine Learning, Spring 2011 April 6 th, 2011 Parts of the PCA slides are from previous 10-701 lectures 1 Outline Dimension reduction

More information

Techniques and Applications of Multivariate Analysis

Techniques and Applications of Multivariate Analysis Techniques and Applications of Multivariate Analysis Department of Statistics Professor Yong-Seok Choi E-mail: yschoi@pusan.ac.kr Home : yschoi.pusan.ac.kr Contents Multivariate Statistics (I) in Spring

More information

Statistics 202: Data Mining. c Jonathan Taylor. Week 2 Based in part on slides from textbook, slides of Susan Holmes. October 3, / 1

Statistics 202: Data Mining. c Jonathan Taylor. Week 2 Based in part on slides from textbook, slides of Susan Holmes. October 3, / 1 Week 2 Based in part on slides from textbook, slides of Susan Holmes October 3, 2012 1 / 1 Part I Other datatypes, preprocessing 2 / 1 Other datatypes Document data You might start with a collection of

More information

2/19/2018. Dataset: 85,122 islands 19,392 > 1km 2 17,883 with data

2/19/2018. Dataset: 85,122 islands 19,392 > 1km 2 17,883 with data The group numbers are arbitrary. Remember that you can rotate dendrograms around any node and not change the meaning. So, the order of the clusters is not meaningful. Taking a subset of the data changes

More information

Part I. Other datatypes, preprocessing. Other datatypes. Other datatypes. Week 2 Based in part on slides from textbook, slides of Susan Holmes

Part I. Other datatypes, preprocessing. Other datatypes. Other datatypes. Week 2 Based in part on slides from textbook, slides of Susan Holmes Week 2 Based in part on slides from textbook, slides of Susan Holmes Part I Other datatypes, preprocessing October 3, 2012 1 / 1 2 / 1 Other datatypes Other datatypes Document data You might start with

More information

How to analyze multiple distance matrices

How to analyze multiple distance matrices DISTATIS How to analyze multiple distance matrices Hervé Abdi & Dominique Valentin Overview. Origin and goal of the method DISTATIS is a generalization of classical multidimensional scaling (MDS see the

More information

Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA

Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis: Uses one group of variables (we will call this X) In

More information

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Introduction Edps/Psych/Stat/ 584 Applied Multivariate Statistics Carolyn J Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN c Board of Trustees,

More information

DIAGONALIZATION OF THE STRESS TENSOR

DIAGONALIZATION OF THE STRESS TENSOR DIAGONALIZATION OF THE STRESS TENSOR INTRODUCTION By the use of Cauchy s theorem we are able to reduce the number of stress components in the stress tensor to only nine values. An additional simplification

More information

Econ Slides from Lecture 8

Econ Slides from Lecture 8 Econ 205 Sobel Econ 205 - Slides from Lecture 8 Joel Sobel September 1, 2010 Computational Facts 1. det AB = det BA = det A det B 2. If D is a diagonal matrix, then det D is equal to the product of its

More information

What is Principal Component Analysis?

What is Principal Component Analysis? What is Principal Component Analysis? Principal component analysis (PCA) Reduce the dimensionality of a data set by finding a new set of variables, smaller than the original set of variables Retains most

More information

Other resources. Greengenes (bacterial) Silva (bacteria, archaeal and eukarya)

Other resources. Greengenes (bacterial)  Silva (bacteria, archaeal and eukarya) General QIIME resources http://qiime.org/ Blog (news, updates): http://qiime.wordpress.com/ Support/forum: https://groups.google.com/forum/#!forum/qiimeforum Citing QIIME: Caporaso, J.G. et al., QIIME

More information

Vector Space Models. wine_spectral.r

Vector Space Models. wine_spectral.r Vector Space Models 137 wine_spectral.r Latent Semantic Analysis Problem with words Even a small vocabulary as in wine example is challenging LSA Reduce number of columns of DTM by principal components

More information

Lecture 6 Proof for JL Lemma and Linear Dimensionality Reduction

Lecture 6 Proof for JL Lemma and Linear Dimensionality Reduction COMS 4995: Unsupervised Learning (Summer 18) June 7, 018 Lecture 6 Proof for JL Lemma and Linear imensionality Reduction Instructor: Nakul Verma Scribes: Ziyuan Zhong, Kirsten Blancato This lecture gives

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning Christoph Lampert Spring Semester 2015/2016 // Lecture 12 1 / 36 Unsupervised Learning Dimensionality Reduction 2 / 36 Dimensionality Reduction Given: data X = {x 1,..., x

More information

Exercises * on Principal Component Analysis

Exercises * on Principal Component Analysis Exercises * on Principal Component Analysis Laurenz Wiskott Institut für Neuroinformatik Ruhr-Universität Bochum, Germany, EU 4 February 207 Contents Intuition 3. Problem statement..........................................

More information

6. Let C and D be matrices conformable to multiplication. Then (CD) =

6. Let C and D be matrices conformable to multiplication. Then (CD) = Quiz 1. Name: 10 points per correct answer. (20 points for attendance). 1. Let A = 3 and B = [3 yy]. When is A equal to B? xx A. When x = 3 B. When y = 3 C. When x = y D. Never 2. See 1. What is the dimension

More information

Dissimilarity and transformations. Pierre Legendre Département de sciences biologiques Université de Montréal

Dissimilarity and transformations. Pierre Legendre Département de sciences biologiques Université de Montréal and transformations Pierre Legendre Département de sciences biologiques Université de Montréal http://www.numericalecology.com/ Pierre Legendre 2017 Definitions An association coefficient is a function

More information

FACTOR ANALYSIS AND MULTIDIMENSIONAL SCALING

FACTOR ANALYSIS AND MULTIDIMENSIONAL SCALING FACTOR ANALYSIS AND MULTIDIMENSIONAL SCALING Vishwanath Mantha Department for Electrical and Computer Engineering Mississippi State University, Mississippi State, MS 39762 mantha@isip.msstate.edu ABSTRACT

More information

Principal Components Analysis. Sargur Srihari University at Buffalo

Principal Components Analysis. Sargur Srihari University at Buffalo Principal Components Analysis Sargur Srihari University at Buffalo 1 Topics Projection Pursuit Methods Principal Components Examples of using PCA Graphical use of PCA Multidimensional Scaling Srihari 2

More information

Clusters. Unsupervised Learning. Luc Anselin. Copyright 2017 by Luc Anselin, All Rights Reserved

Clusters. Unsupervised Learning. Luc Anselin.   Copyright 2017 by Luc Anselin, All Rights Reserved Clusters Unsupervised Learning Luc Anselin http://spatial.uchicago.edu 1 curse of dimensionality principal components multidimensional scaling classical clustering methods 2 Curse of Dimensionality 3 Curse

More information

Table of Contents. Multivariate methods. Introduction II. Introduction I

Table of Contents. Multivariate methods. Introduction II. Introduction I Table of Contents Introduction Antti Penttilä Department of Physics University of Helsinki Exactum summer school, 04 Construction of multinormal distribution Test of multinormality with 3 Interpretation

More information

Textbook Examples of. SPSS Procedure

Textbook Examples of. SPSS Procedure Textbook s of IBM SPSS Procedures Each SPSS procedure listed below has its own section in the textbook. These sections include a purpose statement that describes the statistical test, identification of

More information

Introduction to Machine Learning

Introduction to Machine Learning 10-701 Introduction to Machine Learning PCA Slides based on 18-661 Fall 2018 PCA Raw data can be Complex, High-dimensional To understand a phenomenon we measure various related quantities If we knew what

More information

Unsupervised dimensionality reduction

Unsupervised dimensionality reduction Unsupervised dimensionality reduction Guillaume Obozinski Ecole des Ponts - ParisTech SOCN course 2014 Guillaume Obozinski Unsupervised dimensionality reduction 1/30 Outline 1 PCA 2 Kernel PCA 3 Multidimensional

More information

Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

More information

Dimension reduction, PCA & eigenanalysis Based in part on slides from textbook, slides of Susan Holmes. October 3, Statistics 202: Data Mining

Dimension reduction, PCA & eigenanalysis Based in part on slides from textbook, slides of Susan Holmes. October 3, Statistics 202: Data Mining Dimension reduction, PCA & eigenanalysis Based in part on slides from textbook, slides of Susan Holmes October 3, 2012 1 / 1 Combinations of features Given a data matrix X n p with p fairly large, it can

More information

Chapter 4: Factor Analysis

Chapter 4: Factor Analysis Chapter 4: Factor Analysis In many studies, we may not be able to measure directly the variables of interest. We can merely collect data on other variables which may be related to the variables of interest.

More information

14 Singular Value Decomposition

14 Singular Value Decomposition 14 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Feature Extraction Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi, Payam Siyari Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Dimensionality Reduction

More information

Linear & Non-Linear Discriminant Analysis! Hugh R. Wilson

Linear & Non-Linear Discriminant Analysis! Hugh R. Wilson Linear & Non-Linear Discriminant Analysis! Hugh R. Wilson PCA Review! Supervised learning! Fisher linear discriminant analysis! Nonlinear discriminant analysis! Research example! Multiple Classes! Unsupervised

More information

COMPOSITIONAL DATA IN COMMUNITY ECOLOGY: THE PARADIGM OR PERIL OF PROPORTIONS?

COMPOSITIONAL DATA IN COMMUNITY ECOLOGY: THE PARADIGM OR PERIL OF PROPORTIONS? Ecology, 78(3), 1997, pp. 929 940 1997 by the Ecological Society of America COMPOSITIONAL DATA IN COMMUNITY ECOLOGY: THE PARADIGM OR PERIL OF PROPORTIONS? DONALD A. JACKSON Aquatic Ecology Group, Department

More information

Linear Algebra: Characteristic Value Problem

Linear Algebra: Characteristic Value Problem Linear Algebra: Characteristic Value Problem . The Characteristic Value Problem Let < be the set of real numbers and { be the set of complex numbers. Given an n n real matrix A; does there exist a number

More information

LECTURE 4 PRINCIPAL COMPONENTS ANALYSIS / EXPLORATORY FACTOR ANALYSIS

LECTURE 4 PRINCIPAL COMPONENTS ANALYSIS / EXPLORATORY FACTOR ANALYSIS LECTURE 4 PRINCIPAL COMPONENTS ANALYSIS / EXPLORATORY FACTOR ANALYSIS NOTES FROM PRE- LECTURE RECORDING ON PCA PCA and EFA have similar goals. They are substantially different in important ways. The goal

More information

Lecture: Mixture Models for Microbiome data

Lecture: Mixture Models for Microbiome data Lecture: Mixture Models for Microbiome data Lecture 3: Mixture Models for Microbiome data Outline: - - Sequencing thought experiment Mixture Models (tangent) - (esp. Negative Binomial) - Differential abundance

More information

Preprocessing & dimensionality reduction

Preprocessing & dimensionality reduction Introduction to Data Mining Preprocessing & dimensionality reduction CPSC/AMTH 445a/545a Guy Wolf guy.wolf@yale.edu Yale University Fall 2016 CPSC 445 (Guy Wolf) Dimensionality reduction Yale - Fall 2016

More information

Palaeontological community and diversity analysis brief notes. Oyvind Hammer Paläontologisches Institut und Museum, Zürich

Palaeontological community and diversity analysis brief notes. Oyvind Hammer Paläontologisches Institut und Museum, Zürich Palaeontological community and diversity analysis brief notes Oyvind Hammer Paläontologisches Institut und Museum, Zürich ohammer@nhm.uio.no Zürich, June 3, 2002 Contents 1 Introduction 2 2 The basics

More information