Analysis of Multivariate Ecological Data
1 Analysis of Multivariate Ecological Data. School on Recent Advances in Analysis of Multivariate Ecological Data, October 2016. Prof. Pierre Legendre, Dr. Daniel Borcard, Département de sciences biologiques, Université de Montréal, C.P. 6128, succursale Centre Ville, Montréal QC H3C 3J7, Canada
2 Day 3 2
3 Day 3 Statistical testing by permutations 3
4 Statistical testing by permutations: see course material by Pierre Legendre
5 Statistical tests for multivariate data 1. Parametric tests When the conditions of a given test are fulfilled, an auxiliary variable constructed on the basis of one or several parameters estimated from the data (for instance an F- or t-statistic) has a known behaviour under the null hypothesis (H0). It is thus possible to ascertain whether the observed value of that statistic is likely or not to occur if H0 is true. If the observed value is as extreme as or more extreme than the value of the reference statistic for a pre-established probability level (usually α = 0.05), then H0 is rejected. If not, H0 is not rejected. [Figure: reference distribution of the statistic S for a two-tailed test (rejection regions of area α/2 in each tail, beyond the critical values Sα/2), a left one-tailed test and a right one-tailed test (a single rejection region of area α beyond Sα); the central region has area 1 − α.]
6 Statistical tests for multivariate data 2. Permutation tests Principle: If no theoretical reference distribution is available, then generate a reference distribution under H0 from the data themselves. This is achieved by permuting the data randomly in a scheme that ensures that H0 is true, and recomputing the test statistic. Repeat the procedure a large number of times (e.g. 1000).
7 Statistical tests for multivariate data 2. Permutation tests Principle: The observed test statistic is then compared to the set of test statistics obtained by permutations. If the observed value is as extreme as or more extreme than, say, the 5% most extreme values obtained under permutations, then it is considered too extreme for H0 to be likely, and H0 is rejected.
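The principle above can be sketched in a few lines of base R. This is only an illustration on simulated data: the function name perm_test_cor, the seed, the sample size and the choice of Pearson's r as the statistic are assumptions made for the example, not part of the course material.

```r
# Two-tailed permutation test of Pearson's r, base R only.
set.seed(42)                      # arbitrary seed, for reproducibility
n <- 30
x <- rnorm(n)
y <- 0.6 * x + rnorm(n)           # simulated data (illustrative only)

perm_test_cor <- function(x, y, nperm = 999) {
  r_obs <- cor(x, y)
  # Permuting y breaks any link between x and y, so H0 is true by construction
  r_perm <- replicate(nperm, cor(x, sample(y)))
  # The observed value is counted as one of the values of the reference distribution
  p <- (sum(abs(r_perm) >= abs(r_obs)) + 1) / (nperm + 1)
  list(r = r_obs, p.value = p)
}

res <- perm_test_cor(x, y)
res$p.value
```

With 999 permutations the smallest attainable p-value is 1/1000, which is one reason to use a large number of permutations.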
8 Statistical tests for multivariate data 2. Permutation tests [Figure: panels A and B.]
9 Statistical tests for multivariate data 2. Permutation tests Words of caution (permutation tests) The method of permutations does not solve all the problems related to statistical testing. 1. Some problems may require different and more complicated permutation schemes than the simple random scheme applied here. Example: tests of the main factors of an ANOVA, where the permutations for factor A must be limited within the levels of factor B, and vice versa. 9
10 Statistical tests for multivariate data 2. Permutation tests Words of caution (permutation tests) 2. Permutation tests do solve several distributional problems, but not all. In particular, they do not solve distributional problems linked to the hypothesis being tested. For instance, permutational ANOVA does not require normality, but it still does require homogeneity of variances: actually two hypotheses are tested simultaneously, i.e. equality of the means and equality of the variances. 10
11 Statistical tests for multivariate data 2. Permutation tests Words of caution (permutation tests) 3. Contrary to popular belief, permutation tests do not solve the problem of non-independence of observations. This problem still has to be addressed by special solutions, differing from case to case, and often related to the correction of degrees of freedom.
12 Statistical tests for multivariate data 2. Permutation tests Words of caution (permutation tests) 4. Although many statistics can be tested directly by permutations (e.g. Pearson's r), it is advised to use a pivotal statistic whenever possible. A pivotal statistic has a distribution under the null hypothesis which remains the same for any value of the measured effect. 5. It is not the statistic itself which determines if a test is parametric or not: it is the reference to a theoretical distribution (which requires assumptions about the parameters of the statistical population from which the data have been extracted) or to permutations. 12
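Regarding the first word of caution (permutations for factor A limited within the levels of factor B), a restricted permutation can be sketched as follows. The helper permute_within and the toy design are hypothetical names and data introduced for this illustration only.

```r
# Permute values only within the levels of a blocking factor (base R).
permute_within <- function(values, block) {
  idx <- seq_along(values)
  for (lev in unique(block)) {
    i <- idx[block == lev]
    # sample.int() on the length avoids the sample() pitfall with a single index
    idx[i] <- i[sample.int(length(i))]
  }
  values[idx]
}

set.seed(1)                                # arbitrary seed
B <- gl(2, 6, labels = c("b1", "b2"))      # factor B: 2 levels, 12 objects
y <- 1:12                                  # toy response values
y_perm <- permute_within(y, B)             # values never leave their level of B
```

Each permuted data set keeps the effect of factor B intact, so the reference distribution reflects only the null hypothesis about factor A.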
13 Statistical tests for multivariate data 3. Tests of an RDA or CCA To test one single axis at a time: verify whether an equal or larger eigenvalue can be obtained under the null hypothesis of no relationship between the response matrix and the explanatory matrix. To test the significance of the analysis globally, the basis is the sum of all canonical eigenvalues. The hypotheses are thus: H0: there is no linear relationship between the response matrix and the explanatory matrix; H1: there is a linear relationship between the response matrix and the explanatory matrix.
14 Statistical tests for multivariate data 3. Tests of an RDA or CCA Originally, the test statistic was the eigenvalue or the sum of canonical eigenvalues itself. Now, one uses a pivotal statistic instead, a "pseudo-F" statistic defined as: $F = \frac{(\text{sum of all canonical eigenvalues})/m}{\mathrm{RSS}/(n - m - 1)}$ where n is the number of objects, m is the number of explanatory variables and RSS is the residual sum of squares, i.e. the sum of non-canonical eigenvalues (after fitting the explanatory variables).
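As a rough illustration of the pseudo-F statistic and its global permutation test, here is a base R sketch on simulated, centred tables. The function name pseudo_F and the data are assumptions of this example. Sums of squares are used in place of eigenvalues: both sums of eigenvalues are these sums of squares divided by n − 1, and that factor cancels in the ratio.

```r
set.seed(7)                                          # arbitrary seed
n <- 40; m <- 3; p <- 5
X <- scale(matrix(rnorm(n * m), n, m), scale = FALSE)   # centred explanatory table
Y <- X %*% matrix(rnorm(m * p), m, p) + matrix(rnorm(n * p), n, p)
Y <- scale(Y, scale = FALSE)                            # centred response table

pseudo_F <- function(Y, X) {
  n <- nrow(Y); m <- ncol(X)
  Yhat   <- X %*% solve(crossprod(X), crossprod(X, Y))  # fitted values
  ss_fit <- sum(Yhat^2)         # proportional to the sum of canonical eigenvalues
  rss    <- sum((Y - Yhat)^2)   # proportional to the sum of non-canonical eigenvalues
  (ss_fit / m) / (rss / (n - m - 1))
}

F_obs  <- pseudo_F(Y, X)
# Permutation of raw data (rows of Y), no covariables
F_perm <- replicate(999, pseudo_F(Y[sample(n), , drop = FALSE], X))
p_val  <- (sum(F_perm >= F_obs) + 1) / (999 + 1)
```

For a single response variable this statistic reduces to the ordinary F of a multiple regression, which is a convenient sanity check.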
15 Statistical tests for multivariate data 3. Tests of an RDA or CCA: permutation procedures The main permutation types are the following. Without covariables in the analysis: permutation of raw data; permutation of residuals. With covariables in the analysis: permutation of residuals under a reduced (or null) model; permutation of residuals under a full model.
16 Day 3 Canonical ordination 16
17 1. Introduction Canonical ordination explicitly relates two matrices: one dependent (response) matrix and one explanatory matrix. Both are involved at the stage of the ordination. This approach combines the techniques of ordination and multiple regression.
18 1. Introduction
Response variables   Explanatory variables   Analysis
1 variable           1 variable              Simple regression
1 variable           m variables             Multiple regression
p variables          -                       Simple ordination
p variables          m variables             Canonical ordination
19 1. Introduction The results of RDA and CCA are presented in the form of biplots or triplots. The explanatory variables can be quantitative or qualitative; multistate qualitative variables are declared as "factor" (vegan) or coded as a series of binary variables (e.g. Canoco).
20 1. Introduction A qualitative explanatory variable is represented on the bi- or triplot as the centroid of the sites that have the description "1" for that variable ("Centroids for factor constraints" in vegan, "Centroids of environmental variables" in Canoco). A quantitative explanatory variable is represented as a vector (the vector apices are given under the name "Biplot scores for constraining variables" in vegan and "Biplot scores of environmental variables" in Canoco). 20
21 Day 3 Redundancy analysis (RDA) 21
22 2. Canonical ordination: redundancy analysis (RDA) [Diagram] Response variables: data table Y (centred variables). Explanatory variables: data table X (centred variables). Regress each variable y on table X and compute the fitted ($\hat{y}$) and residual ($y_{res}$) values. Fitted values from the multiple regressions: $\hat{Y} = X [X'X]^{-1} X'Y$. PCA of the fitted values: U = matrix of (canonical) eigenvectors. YU = ordination in the space of variables Y. $\hat{Y}U$ = ordination in the space of variables X.
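The algebra of this slide can be followed step by step in base R. The sketch below uses simulated tables and unscaled eigenvectors; all object names are assumptions of the example, and it is not meant to reproduce the scalings or output layout of rda() in vegan.

```r
set.seed(3)                                  # arbitrary seed
n <- 25
X <- scale(matrix(rnorm(n * 2), n, 2), scale = FALSE)        # centred X (m = 2)
Y <- scale(cbind(X[, 1] + rnorm(n, sd = 0.5),
                 X[, 2] + rnorm(n, sd = 0.5),
                 rnorm(n)), scale = FALSE)                    # centred Y (p = 3)

# Step 1: multivariate regression, Yhat = X [X'X]^-1 X'Y
Yhat <- X %*% solve(crossprod(X)) %*% crossprod(X, Y)

# Step 2: PCA of the fitted values (eigen-decomposition of their covariance matrix)
S   <- crossprod(Yhat) / (n - 1)
eig <- eigen(S, symmetric = TRUE)
k   <- min(ncol(Y), qr(X)$rank)              # number of canonical axes
lambda <- eig$values[seq_len(k)]             # canonical eigenvalues
U      <- eig$vectors[, seq_len(k), drop = FALSE]

# Step 3: the two sets of site scores
site_in_Y_space <- Y %*% U                   # YU: ordination in the space of Y
site_in_X_space <- Yhat %*% U                # Yhat U: ordination in the space of X
```

Because the fitted values lie in the space spanned by the two explanatory variables, only two eigenvalues are non-zero here even though Y has three columns.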
23 2. Redundancy analysis (RDA) RDA Scaling 1 = Distance biplot: the eigenvectors are scaled to unit length. 1) Distances among objects in the biplot are approximations of their Euclidean distances in multidimensional space. 2) The angles among response vectors are meaningless. 3) Projecting an object at right angle on a response variable or a quantitative explanatory variable approximates the position of the object along that variable. 23
24 2. Redundancy analysis (RDA) RDA Scaling 1 = Distance biplot: the eigenvectors are scaled to unit length. 4) The angles between response and explanatory variables in the biplot reflect their correlations. 5) The relationship between the centroid of a qualitative explanatory variable and a response variable (species) is found by projecting the centroid at right angle on the variable (as for individual objects). 6) Distances among centroids, and between centroids and individual objects, approximate Euclidean distances. 24
25 [Figure: Triplot RDA spe.hel ~ env2, scaling 1, wa scores. RDA on a covariance matrix, Hellinger-transformed species abundances, scaling 1. Numbers = sites; red = species; blue = explanatory variables. One axis is labelled RDA1.]
26 2. Redundancy analysis (RDA) RDA Scaling 2 = correlation biplot: the eigenvectors are scaled to the square root of their eigenvalue. 1) Distances among objects in the biplot are not approximations of their Euclidean distances in multidimensional space. 2) The angles in the biplot between response and explanatory variables, and between response variables themselves or explanatory variables themselves, reflect their correlations. 3) Projecting an object at right angle on a response or an explanatory variable approximates the value of the object along that variable. 26
27 2. Redundancy analysis (RDA) RDA Scaling 2 = correlation biplot: the eigenvectors are scaled to the square root of their eigenvalue. 4) The angles between descriptors reflect their correlations. 5) The relationship between the centroid of a qualitative explanatory variable and a response variable (species) is found by projecting the centroid at right angle on the variable (as for individual objects). 6) Distances among centroids, and between centroids and individual objects, do not approximate Euclidean distances. 27
28 [Figure: RDA on a covariance matrix, Hellinger-transformed species abundances, scaling 2. Blue: species. Green + brown: quantitative explanatory variables. Yellow + red + black: categorical explanatory variables.]
29 Day 3 Canonical correspondence analysis (CCA) 29
30 3. Canonical correspondence analysis (CCA) CCA is actually a constrained CA, i.e. a constrained PCA on a species data table that has been transformed into a table of Pearson χ² statistics. Objects, response variables and centroids of categories are plotted as points on the biplot or the triplot. Quantitative explanatory variables are plotted as vectors (arrows). For the species and objects, the interpretation is the same as in CA.
31 3. Canonical correspondence analysis (CCA) Interpretation of the explanatory variables: CCA Scaling type 1 (focus on sites): (1) The position of objects on a quantitative explanatory variable can be obtained by projecting the objects at right angle on the variable. (2) An object found near the point representing the centroid of a qualitative explanatory variable is more likely to possess the state "1" for that variable. 31
32 3. Canonical correspondence analysis (CCA) Interpretation of the explanatory variables: CCA Scaling type 2 (focus on species): (1) The optimum of a species along a quantitative environmental variable can be obtained by projecting the species at right angle on the variable. (2) A species found near the centroid of a qualitative environmental variable is likely to be found frequently (or in larger abundances) in the sites possessing the state "1" for that variable. 32
33 3. CCA: example with scaling 2 33
34 Day 3 Multivariate ANOVA by RDA 34
35 4. Multivariate ANOVA by RDA In its classical, parametric form, multivariate analysis of variance (MANOVA) has stringent conditions of application and restrictions (e.g. multivariate normality of each group of data, homogeneity of the variance-covariance matrices, number of response variables smaller than the number of objects minus the number of groups). RDA offers an elegant alternative, and adds the versatility of the permutation tests and the triplot representation of results.
36 4. Orthogonal factors: coding an ANOVA for RDA To run an equivalent of MANOVA using RDA and test the factors and their interaction in a way that provides the correct F values, one must code the factors in such a way that: 1. The variables represent exactly the experimental design. 2. The variables are orthogonal to one another. 3. The interaction (when present) can be properly coded as orthogonal to the main factors. 4. The number of variables needed to code each factor (and the interaction) is equal to their respective number of degrees of freedom. These requirements are met by Helmert contrasts.
37 4. Orthogonal factors: coding an ANOVA for RDA Two orthogonal factors, several observations (objects) per cell; n = 12.
                     Factor A, level 1   Factor A, level 2   Factor A, level 3
Factor B, level 1    Objects 1, 2        Objects 5, 6        Objects 9, 10
Factor B, level 2    Objects 3, 4        Objects 7, 8        Objects 11, 12
Factor A: 3 levels, therefore 2 orthogonal variables. Factor B: 2 levels, therefore 1 variable.
38 4. Orthogonal factors: coding an ANOVA for RDA 1. All columns must have zero sum. 2. The number of variables needed to code a factor corresponds to the number of degrees of freedom of this factor; this includes the interaction. 3. The correlation among variables is 0 everywhere. 4. Interaction variables are produced by columnwise multiplication of factor variables. [Table: codes of Obj.1 to Obj.12 (rows) for Factor A, Factor B and the Interaction (A × B) (columns).]
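The four properties listed above can be checked numerically. The sketch below rebuilds a coding of this kind for a balanced design with 12 objects, assuming that objects 1 to 4, 5 to 8 and 9 to 12 belong to the three levels of factor A and that the levels of factor B alternate in pairs; the factor names A and B and this assignment are assumptions of the example.

```r
A <- gl(3, 4, labels = c("a1", "a2", "a3"))           # factor A: 3 levels
B <- gl(2, 2, length = 12, labels = c("b1", "b2"))    # factor B: 2 levels

# Helmert contrasts for both factors and their interaction (intercept dropped)
helm <- model.matrix(~ A * B,
                     contrasts.arg = list(A = "contr.helmert",
                                          B = "contr.helmert"))[, -1]

col_sums <- colSums(helm)          # property 1: every column sums to zero
n_vars   <- ncol(helm)             # property 2: 2 (A) + 1 (B) + 2 (A x B) = 5
R        <- cor(helm)              # property 3: off-diagonal correlations are 0
# Property 4: interaction columns are products of the factor columns
same_as_product <- all(helm[, "A1:B1"] == helm[, "A1"] * helm[, "B1"])
```

The zero correlations hold because the design is balanced; with unequal cell sizes the Helmert variables are no longer orthogonal.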
39 Warning 4. Multivariate ANOVA by RDA Testing by permutations does not alleviate the requirement of homogeneity of within-group dispersions in multivariate ANOVA by RDA. This condition can be tested in R by function betadisper() {vegan}. 39
40 Example 4. Multivariate ANOVA by RDA 27 sites of the (Hellinger-transformed) Doubs fish data and fictitious balanced two-way ANOVA design: Factor "altitude" (alt.fac): 3 levels Factor "ph" (ph.fac): 3 levels 40
41 Example 4. Multivariate ANOVA by RDA
    # Creation of a factor 'altitude' (3 levels, 9 sites each)
    alt.fac <- gl(3, 9, labels = c("high", "mid", "low"))
    # Creation of a factor mimicking 'ph'
    ph.fac <- as.factor(c(1, 2, 3, 2, 3, 1, 3, 2, 1,
                          2, 1, 3, 3, 2, 1, 1, 2, 3,
                          2, 1, 2, 3, 2, 1, 1, 3, 3))
    # Are the factors balanced?
    table(alt.fac, ph.fac)
[Output: cross-tabulation of alt.fac (rows high, mid, low) by ph.fac.]
42 Example 4. Multivariate ANOVA by RDA
    # Creation of Helmert contrasts for the factors and their interaction
    alt.ph.helm <- model.matrix(~ alt.fac * ph.fac,
        contrasts = list(alt.fac = "contr.helmert", ph.fac = "contr.helmert"))[, -1]
    head(alt.ph.helm)
[Output columns: alt1 alt2 ph1 ph2 alt1:ph1 alt2:ph1 alt1:ph2 alt2:ph]
43 4. Multivariate ANOVA by RDA Example Within-group dispersions: see script of today's practicals: ICTP-Day3.R Within-group dispersions are homogeneous. 43
44 Example 4. Multivariate ANOVA by RDA 1. Test of the interaction; unconstrained permutations.
    Permutation test for rda under reduced model
    Permutation: free
    Number of permutations: 999
    Model: rda(x = spe.hel[1:27, ], Y = alt.ph.helm[, 5:8], Z = alt.ph.helm[, 1:4])
             Df Variance F Pr(>F)
    Model
    Residual
Nonsignificant interaction => we can proceed.
45 4. Multivariate ANOVA by RDA Example 2. Test of the main factor "altitude"; permutations constrained within the levels of factor "ph".
    Permutation test for rda under reduced model
    Blocks: strata
    Permutation: free
    Number of permutations: 999
    Model: rda(x = spe.hel[1:27, ], Y = alt.ph.helm[, 1:2], Z = alt.ph.helm[, 3:8])
             Df Variance F Pr(>F)
    Model    ***
    Residual
46 4. Multivariate ANOVA by RDA Example 3. Test of the main factor "ph"; permutations constrained within the levels of factor "altitude".
    Permutation test for rda under reduced model
    Blocks: strata
    Permutation: free
    Number of permutations: 999
    Model: rda(x = spe.hel[1:27, ], Y = alt.ph.helm[, 3:4], Z = alt.ph.helm[, c(1:2, 5:8)])
             Df Variance F Pr(>F)
    Model
    Residual
47 Example 4. Multivariate ANOVA by RDA Only factor "altitude" is significant. One could compute an RDA with the Helmert contrasts coding for altitude, and draw a triplot with the sites' weighted sum scores related to the factor levels, and the arrows of the species scores. 47
48 4. Multivariate ANOVA by RDA [Figure: triplot, multivariate ANOVA, factor altitude, scaling 1, wa scores. Axes: RDA1 and RDA2. Sites shown by numbers, species by abbreviated names, and the altitude levels low, mid and high by their centroids.]
49 Day 3 Selection of explanatory variables 49
50 5. Selection of environmental variables There are situations where one wants to reduce the number of explanatory variables in a regression or canonical ordination model for various reasons, e.g.: not enough "sound ecological thinking", leading to too many candidate explanatory variables; special procedures (e.g. dbmem, see Day 5) producing a large number of explanatory variables. This can be done with a procedure of selection of explanatory variables.
51 5. Selection of environmental variables No single, perfect method exists to reduce the number of variables, besides the examination of all possible subsets of explanatory variables. In multiple regression, the three usual methods are forward, backward and stepwise selection of explanatory variables, the latter one being a combination of the former two. In RDA, forward selection is the method most often applied. 51
52 5. Forward selection of environmental variables The principle of forward selection is as follows: 1. Compute, in turn, the independent contribution of each of the m explanatory variables to the explanation of the variation of the response data table. This is done by running m separate canonical analyses. 2. Test the significance of the contribution of the best variable. 3. If it is significant, include it into the model as a first explanatory variable. 52
53 5. Forward selection of environmental variables 4. Compute (one at a time) the partial contributions (conditional effects) of the m − 1 remaining explanatory variables, controlling for the effect of the one already in the model. 5. Test the significance of the best partial contribution among the m − 1 variables. 6. If it is significant, include this variable into the model. 7. Compute the partial contributions of the m − 2 remaining explanatory variables, controlling for the effect of the two already in the model. 8. The procedure goes on until no more significant partial contribution is found.
54 5. Forward selection of environmental variables a) First of all, forward selection is too liberal (i.e., it allows too many explanatory variables to enter a model). Before running a forward selection, always perform a global test (including all explanatory variables). If, and only if the global test is significant, run the forward selection. 54
55 5. Forward selection of environmental variables b) Even if the global test is significant, forward selection is too liberal. Simulations have shown that, in addition to the usual alpha level, one must add a second stopping criterion to forward selection: the model under construction must not have an adjusted R² (R²adj) higher than that of the global model (i.e., the model containing all explanatory variables). Blanchet F. G., P. Legendre, and D. Borcard Forward selection of explanatory variables. Ecology 89:
56 5. Forward selection of environmental variables c) The tests are run by random permutations. d) Like all procedures of selection (forward, backward or stepwise), this one does not guarantee that the best model is found. From the second step on, the inclusion of variables is conditioned by the nature of the variables that are already in the model. 56
57 5. Forward selection of environmental variables e) As in all regression models, the presence of strongly intercorrelated explanatory variables renders the regression/ canonical coefficients unstable. Forward selection does not necessarily eliminate this problem since even strongly correlated variables may be admitted into a model. 57
58 5. Forward selection of environmental variables f) Forward selection can help when several candidate explanatory variables are strongly correlated, but the choice has no a priori ecological validity. In this case it is often advisable to eliminate one of the intercorrelated variables on ecological basis rather than on statistical basis. g) In cases where several correlated explanatory variables are present, without clear a priori reasons to eliminate one or the other, one can examine the variance inflation factors (VIF). 58
59 5. Forward selection of environmental variables h) The variance inflation factors (VIF) measure how much the variance of the regression or canonical coefficients is inflated by the presence of correlations among explanatory variables. This measures in fact the instability of the regression model. i) As a rule of thumb, ter Braak recommends that variables that have a VIF larger than 20 be removed from the analysis. j) Beware: always remove the variables one at a time and recompute the analysis, since the VIF of every variable depends on all the others! 59
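To make the VIF concrete: for variable j, VIF = 1 / (1 − R²), where R² comes from regressing variable j on all the other explanatory variables; these values are also the diagonal of the inverse of the correlation matrix. The base R sketch below uses simulated variables, and the helper drop_high_vif is a hypothetical name. It applies the threshold of 20 mechanically, one variable at a time, whereas in practice ecological reasoning should guide which variable to drop.

```r
set.seed(5)                        # arbitrary seed
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)      # nearly collinear with x1
x3 <- rnorm(n)
X  <- cbind(x1, x2, x3)

# VIFs as the diagonal of the inverse correlation matrix
vif <- diag(solve(cor(X)))

# Remove one variable at a time, recomputing all VIFs after each removal
drop_high_vif <- function(X, threshold = 20) {
  repeat {
    if (ncol(X) < 2) break
    v <- diag(solve(cor(X)))
    if (max(v) <= threshold) break
    X <- X[, -which.max(v), drop = FALSE]
  }
  X
}
X_reduced <- drop_high_vif(X)
colnames(X_reduced)
```

After one of the two collinear variables is removed, the VIF of the other falls back close to 1, which is why the recomputation after each removal matters.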
60 5. Forward selection of environmental variables In R, variable selection for ecological data can be run with the following functions: forward.sel() {adespatial}, ordistep() {vegan}, ordiR2step() {vegan}. [Table comparing the three functions on: forward selection, backward selection, R²a stopping criterion, handling of factors.]
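For readers who want to see the mechanics, the forward selection steps and the two stopping criteria (alpha level and R²adj of the global model) can be sketched in base R. This is a simplified illustration on simulated data, not the implementation used by forward.sel() or ordiR2step(); the function names, the permutation of residuals under the reduced model and the number of permutations are choices made for this example.

```r
set.seed(11)                                     # arbitrary seed
n <- 50
X <- matrix(rnorm(n * 6), n, 6, dimnames = list(NULL, paste0("x", 1:6)))
Y <- cbind(2 * X[, 1] + rnorm(n), X[, 1] - 1.5 * X[, 2] + rnorm(n), rnorm(n))

fit <- function(Y, X) {                          # fitted values (X may have 0 columns)
  if (ncol(X) == 0) return(matrix(0, nrow(Y), ncol(Y)))
  X %*% solve(crossprod(X), crossprod(X, Y))
}
r2adj <- function(Y, X) {                        # adjusted R2, centred tables
  r2 <- sum(fit(Y, X)^2) / sum(Y^2)
  1 - (1 - r2) * (nrow(Y) - 1) / (nrow(Y) - ncol(X) - 1)
}
partial_F <- function(Y, Xsel, Xnew) {           # pseudo-F of Xnew given Xsel
  Xfull <- cbind(Xsel, Xnew)
  rss_red  <- sum((Y - fit(Y, Xsel))^2)
  rss_full <- sum((Y - fit(Y, Xfull))^2)
  ((rss_red - rss_full) / ncol(Xnew)) / (rss_full / (nrow(Y) - ncol(Xfull) - 1))
}
perm_p <- function(Y, Xsel, Xnew, nperm) {       # residuals permuted under reduced model
  F_obs <- partial_F(Y, Xsel, Xnew)
  Yfit <- fit(Y, Xsel); Yres <- Y - Yfit
  F_perm <- replicate(nperm,
    partial_F(Yfit + Yres[sample(nrow(Y)), , drop = FALSE], Xsel, Xnew))
  (sum(F_perm >= F_obs) + 1) / (nperm + 1)
}

forward_select <- function(Y, X, alpha = 0.05, nperm = 499) {
  Y <- scale(Y, scale = FALSE); X <- scale(X, scale = FALSE)
  none <- X[, integer(0), drop = FALSE]
  if (perm_p(Y, none, X, nperm) > alpha) return(character(0))  # global test first
  r2_global <- r2adj(Y, X)
  selected <- integer(0)
  repeat {
    candidates <- setdiff(seq_len(ncol(X)), selected)
    if (length(candidates) == 0) break
    Xsel <- X[, selected, drop = FALSE]
    Fs <- sapply(candidates, function(j) partial_F(Y, Xsel, X[, j, drop = FALSE]))
    best <- candidates[which.max(Fs)]            # best partial contribution
    if (perm_p(Y, Xsel, X[, best, drop = FALSE], nperm) > alpha) break  # criterion 1
    if (r2adj(Y, X[, c(selected, best), drop = FALSE]) > r2_global) break  # criterion 2
    selected <- c(selected, best)
  }
  colnames(X)[selected]
}

sel <- forward_select(Y, X)
sel
```

In this sketch a candidate that would push R²adj above the global value is not added, which makes the procedure deliberately conservative.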
VarCan (version 1): Variation Estimation and Partitioning in Canonical Analysis Pedro R. Peres-Neto March 2005 Department of Biology University of Regina Regina, SK S4S 0A2, Canada E-mail: Pedro.Peres-Neto@uregina.ca
More informationCorrelation. Tests of Relationships: Correlation. Correlation. Correlation. Bivariate linear correlation. Correlation 9/8/2018
Tests of Relationships: Parametric and non parametric approaches Whether samples from two different variables vary together in a linear fashion Parametric: Pearson product moment correlation Non parametric:
More informationParametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami
Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami Parametric Assumptions The observations must be independent. Dependent variable should be continuous
More informationCHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)
FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter
More informationBiol 206/306 Advanced Biostatistics Lab 5 Multiple Regression and Analysis of Covariance Fall 2016
Biol 206/306 Advanced Biostatistics Lab 5 Multiple Regression and Analysis of Covariance Fall 2016 By Philip J. Bergmann 0. Laboratory Objectives 1. Extend your knowledge of bivariate OLS regression to
More informationCHI SQUARE ANALYSIS 8/18/2011 HYPOTHESIS TESTS SO FAR PARAMETRIC VS. NON-PARAMETRIC
CHI SQUARE ANALYSIS I N T R O D U C T I O N T O N O N - P A R A M E T R I C A N A L Y S E S HYPOTHESIS TESTS SO FAR We ve discussed One-sample t-test Dependent Sample t-tests Independent Samples t-tests
More informationStat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010
1 Linear models Y = Xβ + ɛ with ɛ N (0, σ 2 e) or Y N (Xβ, σ 2 e) where the model matrix X contains the information on predictors and β includes all coefficients (intercept, slope(s) etc.). 1. Number of
More informationAnalysing data: regression and correlation S6 and S7
Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association
More informationDissimilarity and transformations. Pierre Legendre Département de sciences biologiques Université de Montréal
and transformations Pierre Legendre Département de sciences biologiques Université de Montréal http://www.numericalecology.com/ Pierre Legendre 2017 Definitions An association coefficient is a function
More information2 Regression Analysis
FORK 1002 Preparatory Course in Statistics: 2 Regression Analysis Genaro Sucarrat (BI) http://www.sucarrat.net/ Contents: 1 Bivariate Correlation Analysis 2 Simple Regression 3 Estimation and Fit 4 T -Test:
More informationEstimating and controlling for spatial structure in the study of ecological communitiesgeb_
Global Ecology and Biogeography, (Global Ecol. Biogeogr.) (2010) 19, 174 184 MACROECOLOGICAL METHODS Estimating and controlling for spatial structure in the study of ecological communitiesgeb_506 174..184
More informationLecture 15. Hypothesis testing in the linear model
14. Lecture 15. Hypothesis testing in the linear model Lecture 15. Hypothesis testing in the linear model 1 (1 1) Preliminary lemma 15. Hypothesis testing in the linear model 15.1. Preliminary lemma Lemma
More informationChapter 1 Statistical Inference
Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations
More informationTopic 12. The Split-plot Design and its Relatives (continued) Repeated Measures
12.1 Topic 12. The Split-plot Design and its Relatives (continued) Repeated Measures 12.9 Repeated measures analysis Sometimes researchers make multiple measurements on the same experimental unit. We have
More informationCorrelation. A statistics method to measure the relationship between two variables. Three characteristics
Correlation Correlation A statistics method to measure the relationship between two variables Three characteristics Direction of the relationship Form of the relationship Strength/Consistency Direction
More informationThe discussion Analyzing beta diversity contains the following papers:
The discussion Analyzing beta diversity contains the following papers: Legendre, P., D. Borcard, and P. Peres-Neto. 2005. Analyzing beta diversity: partitioning the spatial variation of community composition
More informationPrincipal Component Analysis (PCA) Theory, Practice, and Examples
Principal Component Analysis (PCA) Theory, Practice, and Examples Data Reduction summarization of data with many (p) variables by a smaller set of (k) derived (synthetic, composite) variables. p k n A
More informationStatistics in medicine
Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu
More informationSpatial eigenfunction modelling: recent developments
Spatial eigenfunction modelling: recent developments Pierre Legendre Département de sciences biologiques Université de Montréal http://www.numericalecology.com/ Pierre Legendre 2018 Outline of the presentation
More informationMath 423/533: The Main Theoretical Topics
Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)
More informationFinding Relationships Among Variables
Finding Relationships Among Variables BUS 230: Business and Economic Research and Communication 1 Goals Specific goals: Re-familiarize ourselves with basic statistics ideas: sampling distributions, hypothesis
More informationMODEL II REGRESSION USER S GUIDE, R EDITION
MODEL II REGRESSION USER S GUIDE, R EDITION PIERRE LEGENDRE Contents 1. Recommendations on the use of model II regression methods 2 2. Ranged major axis regression 4 3. Input file 5 4. Output file 5 5.
More informationSleep data, two drugs Ch13.xls
Model Based Statistics in Biology. Part IV. The General Linear Mixed Model.. Chapter 13.3 Fixed*Random Effects (Paired t-test) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch
More informationUnconstrained Ordination
Unconstrained Ordination Sites Species A Species B Species C Species D Species E 1 0 (1) 5 (1) 1 (1) 10 (4) 10 (4) 2 2 (3) 8 (3) 4 (3) 12 (6) 20 (6) 3 8 (6) 20 (6) 10 (6) 1 (2) 3 (2) 4 4 (5) 11 (5) 8 (5)
More informationREVIEW 8/2/2017 陈芳华东师大英语系
REVIEW Hypothesis testing starts with a null hypothesis and a null distribution. We compare what we have to the null distribution, if the result is too extreme to belong to the null distribution (p
More informationData Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression
Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction
More informationOrdination & PCA. Ordination. Ordination
Ordination & PCA Introduction to Ordination Purpose & types Shepard diagrams Principal Components Analysis (PCA) Properties Computing eigenvalues Computing principal components Biplots Covariance vs. Correlation
More informationVariable Selection and Model Building
LINEAR REGRESSION ANALYSIS MODULE XIII Lecture - 39 Variable Selection and Model Building Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur 5. Akaike s information
More information6. Spatial analysis of multivariate ecological data
Université Laval Analyse multivariable - mars-avril 2008 1 6. Spatial analysis of multivariate ecological data 6.1 Introduction 6.1.1 Conceptual importance Ecological models have long assumed, for simplicity,
More informationFuzzy coding in constrained ordinations
Fuzzy coding in constrained ordinations Michael Greenacre Department of Economics and Business Faculty of Biological Sciences, Fisheries & Economics Universitat Pompeu Fabra University of Tromsø 08005