Multivariate Statistics Summary and Comparison of Techniques. Multivariate Techniques

Similar documents
4/2/2018. Canonical Analyses Analysis aimed at identifying the relationship between two multivariate datasets. Cannonical Correlation.

Multivariate Analysis of Ecological Data using CANOCO

DETECTING BIOLOGICAL AND ENVIRONMENTAL CHANGES: DESIGN AND ANALYSIS OF MONITORING AND EXPERIMENTS (University of Bologna, 3-14 March 2008)

Experimental Design and Data Analysis for Biologists

Discrimination Among Groups. Discrimination Among Groups

4/4/2018. Stepwise model fitting. CCA with first three variables only Call: cca(formula = community ~ env1 + env2 + env3, data = envdata)

Chapter 11 Canonical analysis

INTRODUCTION TO MULTIVARIATE ANALYSIS OF ECOLOGICAL DATA

STATISTICS 407 METHODS OF MULTIVARIATE ANALYSIS TOPICS

Factors affecting the Power and Validity of Randomization-based Multivariate Tests for Difference among Ecological Assemblages

Unconstrained Ordination

Discrimination Among Groups. Classification (and Regression) Trees

Course in Data Science

-Principal components analysis is by far the oldest multivariate technique, dating back to the early 1900's; ecologists have used PCA since the

Multivariate Data Analysis a survey of data reduction and data association techniques: Principal Components Analysis

Mean Ellenberg indicator values as explanatory variables in constrained ordination. David Zelený

Textbook Examples of. SPSS Procedure

Discriminant Analysis and Statistical Pattern Recognition

Introduction to ordination. Gary Bradfield Botany Dept.

Basics of Multivariate Modelling and Data Analysis

Multivariate analysis

Principal component analysis

Introduction to Spatial Analysis. Spatial Analysis. Session organization. Learning objectives. Module organization. GIS and spatial analysis

ANOVA approach. Investigates interaction terms. Disadvantages: Requires careful sampling design with replication

Multivariate Statistics Fundamentals Part 1: Rotation-based Techniques

Data Mining. 3.6 Regression Analysis. Fall Instructor: Dr. Masoud Yaghini. Numeric Prediction

Classification for High Dimensional Problems Using Bayesian Neural Networks and Dirichlet Diffusion Trees

Applied Multivariate Statistical Analysis Richard Johnson Dean Wichern Sixth Edition

Analysis of Multivariate Ecological Data

Small n, σ known or unknown, underlying nongaussian

Multivariate Analysis of Ecological Data

CHAPTER 2. Types of Effect size indices: An Overview of the Literature

9/26/17. Ridge regression. What our model needs to do. Ridge Regression: L2 penalty. Ridge coefficients. Ridge coefficients

INFORMATION THEORY AND STATISTICS

10-810: Advanced Algorithms and Models for Computational Biology. Optimal leaf ordering and classification

Statistics Toolbox 6. Apply statistical algorithms and probability models

Generalized Linear Models (GLZ)

EXAM PRACTICE. 12 questions * 4 categories: Statistics Background Multivariate Statistics Interpret True / False

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3

Too good to be true: pitfalls of using mean Ellenberg indicator values in vegetation analyses

Multivariate Statistics 101. Ordination (PCA, NMDS, CA) Cluster Analysis (UPGMA, Ward s) Canonical Correspondence Analysis

Techniques and Applications of Multivariate Analysis

5. Discriminant analysis

Bivariate Relationships Between Variables

Data Analysis as a Decision Making Process

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Statistical Machine Learning

Classification: Linear Discriminant Analysis

A User's Guide To Principal Components

sphericity, 5-29, 5-32 residuals, 7-1 spread and level, 2-17 t test, 1-13 transformations, 2-15 violations, 1-19

An Introduction to Ordination Connie Clark

Machine Learning Practice Page 2 of 2 10/28/13

Review of some concepts in predictive modeling

Lecture 2: Diversity, Distances, adonis. Lecture 2: Diversity, Distances, adonis. Alpha- Diversity. Alpha diversity definition(s)

CAP. Canonical Analysis of Principal coordinates. A computer program by Marti J. Anderson. Department of Statistics University of Auckland (2002)

Logistic Regression: Regression with a Binary Dependent Variable

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

4. Ordination in reduced space

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

Outline Week 1 PCA Challenge. Introduction. Multivariate Statistical Analysis. Hung Chen

Wolfgang Karl Härdle Leopold Simar. Applied Multivariate. Statistical Analysis. Fourth Edition. ö Springer

Machine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang.

Discriminant Analysis with High Dimensional. von Mises-Fisher distribution and

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Types of Statistical Tests DR. MIKE MARRAPODI

Pattern Recognition and Machine Learning

Lab 7. Direct & Indirect Gradient Analysis

Multivariate statistical methods and data mining in particle physics

PATTERN CLASSIFICATION

Use R! Series Editors:

Overview of clustering analysis. Yuehua Cui

1. Introduction to Multivariate Analysis

Multivariate analysis of genetic data: an introduction

Classification techniques focus on Discriminant Analysis

2/19/2018. Dataset: 85,122 islands 19,392 > 1km 2 17,883 with data

day month year documentname/initials 1

Data Mining Techniques

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

Final Overview. Introduction to ML. Marek Petrik 4/25/2017

A Least Squares Formulation for Canonical Correlation Analysis

Ordination & PCA. Ordination. Ordination

Machine Learning Support Vector Machines. Prof. Matteo Matteucci

An Alternative Algorithm for Classification Based on Robust Mahalanobis Distance

Multivariate Statistical Analysis

Lecture 4 Discriminant Analysis, k-nearest Neighbors

Contents Lecture 4. Lecture 4 Linear Discriminant Analysis. Summary of Lecture 3 (II/II) Summary of Lecture 3 (I/II)

Principles of Pattern Recognition. C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata

Introduction to multivariate analysis Outline

Econometrics I. Lecture 10: Nonparametric Estimation with Kernels. Paul T. Scott NYU Stern. Fall 2018

Short Note: Naive Bayes Classifiers and Permanence of Ratios

Discriminant Analysis and Statistical Pattern Recognition

8. FROM CLASSICAL TO CANONICAL ORDINATION

Analysis of community ecology data in R

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING

6. Let C and D be matrices conformable to multiplication. Then (CD) =

CS534 Machine Learning - Spring Final Exam

Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data.

2 D wavelet analysis , 487

Probabilistic Methods in Bioinformatics. Pabitra Mitra

Transcription:

Multivariate Statistics Summary and Comparison of Techniques P The key to multivariate statistics is understanding conceptually the relationship among techniques with regards to: < The kinds of problems each technique is suited for < The objective(s) of each technique < The data structure required for each technique < Sampling considerations for each technique < Underlying mathematical model, or lack thereof, of each technique < Potential for complementary use of techniques 1 Model y1 + y2 +... + yi y 1 + y 2 +... + y i = x y 1 + y 2 +... + y i = x 1 + x 2 +... + x j Techniques Unconstrained Ordination Multivariate ANOVA Multi-Response Permutation Analysis of Similarities Mantel Test Discriminant Analysis Logistic Regression Classification Trees Indicator Species Analysis Constrained Ordination Canonical Correlation Multivariate Regression Trees 2

Technique Objective Unconstrained Ordination (PCA, MDS, CA, DCA, NMDS) (Family of techinques) Discrimination (MANOVA, MRPP, ANOSIM, Mantel, DA, LR, CART, ISA) Constrained Ordination (RDA, CCA, CAP) 3 Extract gradients of maximum variation Establish groups of similar entities Test for & describe differences among groups of entities or predict group membership Extract gradients of variation in dependent variables explainable by independent variables Technique Unconstrained Ordination (PCA, MDS, CA, DCA, NMDS) (Family of techinques) Discrimination (MANOVA, MRPP, ANOSIM, Mantel, DA, LR, CART, ISA) Constrained Ordination (RDA, CCA, CAP) Variance Emphasis Emphasizes variation among individual sampling entities by defining gradients of maximum total sample variance; describes the interentity variance structure. 4

Technique Unconstrained Ordination (PCA, MDS, CA, DCA, NMDS) (Family of techinques) Discrimination (MANOVA, MRPP, ANOSIM, Mantel, DA, LR, CART, ISA) Constrained Ordination (RDA, CCA, CAP) Variance Emphasis Emphasizes both differences and similarities among individual sampling entities by clustering entities based on inter-entity resemblance. 5 Technique Unconstrained Ordination (PCA, MDS, CA, DCA, NMDS) (Family of techinques) Discrimination (MANOVA, MRPP, ANOSIM, Mantel, DA, LR, CART, ISA) Constrained Ordination (RDA, CCA, CAP) Variance Emphasis Emphasizes variation among groups of sampling entities; describes the inter-group variance structure. 6

Technique Unconstrained Ordination (PCA, MDS, CA, DCA, NMDS) (Family of techinques) Discrimination (MANOVA, MRPP, ANOSIM, Mantel, DA, LR, CART, ISA) Constrained Ordination (RDA, CCA, CAP) Variance Emphasis Emphasizes variation among individual sampling entities by defining gradients of maximum total sample variance explainable by environmental variables 7 Technique Unconstrained Ordination (PCA, MDS, CA, DCA, NMDS) (Family of techinques) Discrimination (MANOVA, MRPP, ANOSIM, Mantel, DA, LR, CART, ISA) Constrained Ordination (RDA, CCA, CAP) Dependence Type Interdependence Interdependence Dependence Dependence (& interdependence?) 8

Technique Unconstrained Ordination (PCA, MDS, CA, DCA, NMDS) (Family of techinques) Discrimination (MANOVA, MRPP, ANOSIM, Mantel, DA, LR, CART, ISA) Constrained Ordination (RDA, CCA, CAP) Data Structure One set; >>2 variables (Y) One set; >>2 varibles (Y) Two sets; 1 grouping variable (X), >>2 discriminating variables (Y) Two sets; >>2 response (Y) variables, $1 explanatory (X) variables 9 Obs Group Y-set X-set 1 A a 11 a 12 a 13... a 1p b 11 b 12 b 13... b 1m 2 A a 21 a 22 a 23... a 2p b 21 b 22 b 23... b 2m 3 A a 31 a 32 a 33... a 3p b 31 b 32 b 33... b 3m............................ n A a n1 a n2 a n3... a np b n1 b n2 b n3... b nm n+1 C c 11 c 12 c 13... c 1p n+2 C c 21 c 22 c 23... c 2p n+3 C c 31 c 32 c 33... c 3p................ N C c n1 c n2 c n3... c np Unconstrained Ordination (PCA, PCO, CA, DCA, NMDS) (Family of techinques) 10

Obs Group Y-set X-set 1 A a 11 a 12 a 13... a 1p b 11 b 12 b 13... b 1m 2 A a 21 a 22 a 23... a 2p b 21 b 22 b 23... b 2m 3 A a 31 a 32 a 33... a 3p b 31 b 32 b 33... b 3m............................ n A a n1 a n2 a n3... a np b n1 b n2 b n3... b nm n+1 C c 11 c 12 c 13... c 1p n+2 C c 21 c 22 c 23... c 2p n+3 C c 31 c 32 c 33... c 3p................ N C c n1 c n2 c n3... c np Discrimination Techniques (MANOVA, MRPP, ANOSIM, Mantel; DA, LR, CART, ISA) 11 Obs Group Y-set X-set 1 A a 11 a 12 a 13... a 1p b 11 b 12 b 13... b 1m 2 A a 21 a 22 a 23... a 2p b 21 b 22 b 23... b 2m 3 A a 31 a 32 a 33... a 3p b 31 b 32 b 33... b 3m............................ n A a n1 a n2 a n3... a np b n1 b n2 b n3... b nm n+1 C c 11 c 12 c 13... c 1p n+2 C c 21 c 22 c 23... c 2p n+3 C c 31 c 32 c 33... c 3p................ N C c n1 c n2 c n3... c np Constrained Ordination (RDA, CCA, CAP, COR) MRT 12

Technique Unconstrained Ordination (PCA, MDS, CA, DCA, NMDS) (Family of techinques) Discrimination (MANOVA, MRPP, ANOSIM, Mantel, DA, LR, CART, ISA) Constrained Ordination (RDA, CCA, CAP) Sample Characteristics N (from one or unknown # pops) N (from unknown # pop's) N (from known # pop's) or N1, N2,... (from separate pop's) N (from one pop) 13 P Describe the major ecological gradients of variation among individual sampling entities, and/or to portray sampling entities along "continuous" gradients of maximum sample variation, then use... Unconstrained Ordination 14

P Describe the major ecological gradients of variation among individual sampling entities, and/or to portray sampling entities along "continuous" gradients of maximum sample variation, then use... P Assume linear relationship to ecological gradients... Unconstrained Ordination PCA, PCO(MDS) 15 P Describe the major ecological gradients of variation among individual sampling entities, and/or to portray sampling entities along "continuous" gradients of maximum sample variation, then use... P Assume linear relationship to ecological gradients... P Assume unimodal relationship to ecological gradients... Unconstrained Ordination PCA, PCO(MDS) CA(RA), DCA 16

P Describe the major ecological gradients of variation among individual sampling entities, and/or to portray sampling entities along "continuous" gradients of maximum sample variation, then use... P Assume linear relationship to ecological gradients... P Assume unimodal relationship to ecological gradients... P Assume no particular relationship; only monotonic relationship between input and output dissimilarities... Unconstrained Ordination PCA, PCO(MDS) CA(RA), DCA Alt: PCA* NMDS 17 P Establish artificial classes or groups of similar entities where pre-specified, welldefined groups do not already exist, and/or to portray sampling entities in "discrete" groups, then use... 18

P Establish artificial classes or groups of similar entities where pre-specified, welldefined groups do not already exist, and/or to portray sampling entities in "discrete" groups, then use... P Assign entities to a specified number of groups to maximize within-group similarity or form composite clusters... Non-hierarchical 19 P Establish artificial classes or groups of similar entities where pre-specified, welldefined groups do not already exist, and/or to portray sampling entities in "discrete" groups, then use... P Assign entities to a specified number of groups to maximize within-group similarity or form composite clusters... P Assign entities to groups and display relationships among groups as they form... 20 Non-hierarchical Hierarchical

P Establish artificial classes or groups of entities with similar species composition and abundance where pre-specified, well-defined groups do not already exist, based on measured environmental variables, and/or to portray sampling entities in "discrete" groups representing species assemblages with distinct environmental affinities, then use... Constrained (MRT) 21 P Differentiate among pre-specified, well-defined classes or groups of sampling entities, and to: P Test for significant differences among groups... < Parametric test... MANOVA / DA 22

P Differentiate among pre-specified, well-defined classes or groups of sampling entities, and to: P Test for significant differences among groups... < Parametric test... < Nonparametric tests... MANOVA / DA MRPP, ANOSIM, Mantel 23 P Differentiate among pre-specified, well-defined classes or groups of sampling entities, and to: P Describe the major ecological differences among groups... < Assume a linear discrimination function... DA (CAD) 24

P Differentiate among pre-specified, well-defined classes or groups of sampling entities, and to: P Describe the major ecological differences among groups... < Assume a linear discrimination function... < Assume a logistic discrimination function... DA (CAD) LR (MLR) 25 P Differentiate among pre-specified, well-defined classes or groups of sampling entities, and to: P Describe the major ecological differences among groups... < Assume a linear discrimination function... < Assume a logistic discrimination function... < Do not assume any particular function... DA (CAD) LR (MLR) CART (UCT) 26

P Differentiate among pre-specified, well-defined classes or groups of sampling entities, and to: P Describe the major ecological differences among groups... < Assume a linear discrimination function... < Assume a logistic discrimination function... < Do not assume any particular function... < Identify indicators for each group... DA (CAD) LR (MLR) CART (UCT) ISA 27 P Differentiate among pre-specified, well-defined classes or groups of sampling entities, and to: P Predict group membership of future observations... < Linear classification function... DA (LDF) 28

P Differentiate among pre-specified, well-defined classes or groups of sampling entities, and to: P Predict group membership of future observations... < Linear classification function... < Logistic classification function... DA (LDF) LR (MLR) 29 P Differentiate among pre-specified, well-defined classes or groups of sampling entities, and to: P Predict group membership of future observations... < Linear classification function... < Logistic classification function... < Decision tree classifier... DA (LDF) LR (MLR) CART (UCT) 30

P Differentiate among pre-specified, well-defined classes or groups of sampling entities, and to: P Predict group membership of future observations... < Linear classification function... < Logistic classification function... < Decision tree classifier... < Other nonparametric classifiers... DA (LDF) LR (MLR) CART (UCT) Kernel K nearest-neighbor 31 P Explain the variation in a single continuous dependent variable using two or more continuous independent variables, and/or to develop a model for predicting the value of the dependent variable from the values of the independent variables, then use... Alternatives: Multiple Linear Regression CART (URT) 32

P Explain the variation in a single dichotomous dependent (grouping) variable using two or more continuous and/or categorical independent variables, and/or to develop a model for predicting the group membership of a sampling entity from the values of the independent variables, then use... Alternatives: Multiple Logistic Regression DA CART (UCT) 33 P Describe the major ecological patterns in one set of (response) variables explainable by another set of (explanatory) variables, then use... Constrained Ordination or MRT 34

P Describe the major ecological patterns in one set of (response) variables explainable by another set of (explanatory) variables, then use... P Assume linear response function of response variables (species) along linear gradients defined by the explanatory variables (environment)... Constrained Ordination or MRT RDA, CAP 35 P Describe the major ecological patterns in one set of (response) variables explainable by another set of (explanatory) variables, then use... P Assume linear response function of response variables (species) along linear gradients defined by the explanatory variables (environment)... P Assume unimodal response function of response variables (species) along linear gradients defined by the explanatory variables (environment)... Constrained Ordination or MRT RDA, CAP CCA, DCCA Alt: RDA* 36

P Describe the major ecological patterns in one set of (response) variables explainable by another set of (explanatory) variables, then use... P Assume linear response function of response variables (species) along linear gradients defined by the explanatory variables (environment)... P Assume unimodal response function of response variables (species) along linear gradients defined by the explanatory variables (environment)... P Do not assume any response function... Constrained Ordination or MRT RDA, CAP CCA, DCCA MRT 37 P Determine the magnitude of the ecological relationships between two sets of variables expressed as distance matrices; i.e., dissimilarities between samples, then use... P Determine the magnitude of the ecological relationships between two sets of variables expressed as distance matrices after accounting for a third set of variables (i.e., Y~X Z), then use... Mantel Test Partial Mantel Test 38

Dependence Techniques Independent Variables 39 Dependence Techniques CT = Contingency tables SLR = Simple logistic regression MLR = Multiple logistic regression SRA = Simple linear regression MRA = Multiple linear regression T-test = T-test ANOVA = Analysis of variance UCT = Univar. classification trees URT = Univar. regression trees T 2 -test MANOVA = Multivariate analysis of variance DA = Discriminant analysis ISA RDA = Redundancy analysis CCA = Can. correspond. analysis COR = Canonical corr. analysis MRT = Hotelling s T 2 = Indicator species analysis CAP = Can. prin. coord. analysis = Multivar. regression trees 40

Dependence Techniques Independent Variables CT CT CT SLR SLR MLR UCT CT CT CT UCT DA DA SLR MLR UCT UCT DA DA CT RDA CT RDA CT RDA RDA DA CAP DA CAP MRT CAP CAP MRT CCA CCA COR CCA CCA COR SRA SRA MRA SRA MRA T-test ANOVA ANOVA URT URT T 2 -test RDA Manova RDA Manova RDA RDA DA CAP DA CAP MRT CAP CAP MRT ISA CCA ISA CCA COR CCA CCA COR RDA CAP CCA RDA CAP CCA 41 Advantages of Multivariate Statistics P Reflect more accurately the true multidimensional, multivariate nature of natural systems. P Provide a way to handle large data sets with large numbers of variables. P Provide a way of summarizing redundancy in large data sets. P Provide rules for combining variables in an "optimal" way. 42

Advantages of Multivariate Statistics P Provide a solution to the multiple comparison problem by controlling experimentwise error rate. P Provide a means of detecting and quantifying truly multivariate patterns that arise out of the correlational structure of the variable set. P Provide a means of exploring complex data sets for patterns and relationships from which hypotheses can be generated and subsequently tested experimentally. 43