Inconsistencies between theory and methodology: a recurrent problem in ordination studies.

This is the pre-peer-reviewed version of the following article: Inconsistencies between theory and methodology: a recurrent problem in ordination studies, Austin, M., Journal of Vegetation Science, vol. 24, issue 2, Copyright 2012, International Association for Vegetation Science, Wiley-Blackwell, which has been published in final form at

Inconsistencies between theory and methodology: a recurrent problem in ordination studies.

AUSTIN, M. P.
CSIRO Ecosystem Sciences, GPO Box 284, Canberra, Australian Capital Territory 2601, Australia
Corresponding author; mike.austin@csiro.au

This review is dedicated to the memory of Prof. I. Noy-Meir

Abstract

A historical review of ordination studies is presented with particular reference to the pioneering contributions of the late Prof. I. Noy-Meir and their continuing relevance. Inconsistencies between different ordination methods and ecological models are examined. Attention is drawn to three needs: (1) to use artificial data to evaluate different methods; (2) for artificial data to be simulated based on an explicit theory of vegetation composition; (3) to examine the neglected topic of data standardisation, which has relevance for both theory and methodological performance. Three conceptual models which have been used to generate artificial data are discussed. Comparative studies of the ability of ordination methods to recover truth using artificial data are reviewed. Differences among comparative studies in the conceptual models used to generate data, the nature of the data matrices and the methods of evaluation make it difficult to reach definitive conclusions. The balance of evidence demonstrates that Multi-dimensional Scaling using the Bray-Curtis family of dissimilarity measures recovers ecological truth better than Correspondence Analysis using Chi-squared distance. Data standardisation prior to ordination is shown to be important for the best recovery of truth. Standardisation alters the properties of the vegetation data

matrix, yet little is known of the influence or relevance of the collective vegetation properties (stand abundance, dominance or species richness) on ordinations. Are they random variables or indicators of environmental conditions, and how do they influence ordination performance? Many current ordination studies are suboptimal, and Noy-Meir's observations from the 1970s and 80s remain relevant.

Keywords: Correspondence Analysis; Non-metric multidimensional scaling; Bray-Curtis coefficient; Extended dissimilarity; Horseshoe effect; species response models; ecological distance; simulated data.

Running head: Inconsistencies between theory and methodology

Introduction

Here I present a historical review of vegetation analysis using ordination methods with an emphasis on the relevance of Imanuel Noy-Meir's early work. I focus on the use of artificial data to explore the performance of analytical methods and the value of such data for stating the phenomenological theory often assumed in vegetation studies. Attention is drawn to the often neglected issue of data standardisation and its relevance to theoretical assumptions. Many issues raised by Imanuel Noy-Meir remain relevant today.

Context

The importance of theory and methodology was first brought home to me by Imanuel Noy-Meir in 1970, when he showed me a preprint of a seminal paper by Swan (1970). The paper showed that the then current geometric ordination methods could produce grossly distorted ordination graphs if the species data were assumed to consist of

species with unimodal symmetric response curves to an environmental gradient, a reasonable theoretical assumption at the time. The other novel feature of the study was the use of artificial data based on a theoretical postulate to test the performance of an analytical method. Imanuel and I immediately followed up on Swan's pioneering work by examining the impact of such theoretical ideas on the use of the statistical method of principal components analysis (PCA) for ordination studies, confirming the distortion problem (Noy-Meir & Austin 1970). Imanuel's Ph.D. research was a pioneering study of the use of ordination methods as applied to a large-scale survey of the semi-arid vegetation of southern Australia (Noy-Meir 1970). His early work on ordination has been largely overshadowed by his later work on grazing systems in the arid zone, though his work on survey design and multivariate analysis deserves greater recognition than it has received (Noy-Meir 1971, 1973a, 1974a,b).

Issues

There is a constant tension between ecological theory, the data collected and the statistical methods used in vegetation studies. Often, there are significant incompatibilities between the ecological theory and the statistical methods used (Austin 2002). Ordination is a powerful technique for summarising complex multivariate ecological data, yet confusion about appropriate methods is still apparent in the literature (Legendre & Gallagher 2001; McCune & Grace 2002; Lepš & Šmilauer 2003; Clarke et al. 2006; Hirst & Jackson 2007; Zuur et al. 2007; Roberts 2008; von Wehrden et al.). Kenkel & Orloci (1986) summarised early issues reviewed by Noy-Meir & Whittaker (1977), stating that Comparisons of ordination techniques have tended to confound three factors: the methodological algorithm, the resemblance measure employed, and

the standardisation used. These and other issues continue to exercise plant and other ecologists. For example, what part should ecological theory play in determining the ordination method used? Alternatively, if the purpose is to summarise the variation in a region or habitat, is it necessary to use the best possible method rather than the most convenient?

Historical use of simulated data

Early use for evaluating ordination methods

The first major use of artificial data in plant community ecology appears to have been for assessing ordination methods (Swan 1970). The great advantage of using artificial data is that the true relationship between species and environment is known. The major disadvantage is that the results are highly sensitive to the ecological model used to generate the data (Austin et al.). Swan (1970) based his simulations of artificial data sets on the continuum concept (Gleason 1926; Curtis & McIntosh 1951; Austin 1985), assuming continuous variation in species composition along an ecological gradient. In Swan's version, species were assumed to have identical symmetric bell-shaped curves of abundance, equally spaced along an environmental gradient, with varying degrees of overlap between species in different data sets. The results clearly defined the now notorious horseshoe or arch effect for a one-dimensional gradient, where the ordination method should have produced a linear array of stands but instead produced a curve or horseshoe in two dimensions, with the curvature a function of the species overlap. This distortion made stands of vegetation with no species in common appear more similar than they actually were.

After Swan's initial one-dimensional continuum model, Austin and Noy-Meir (1971) introduced a two-dimensional coenoplane model using similar assumptions regarding

species responses. The results of these simulations confirmed the importance of the horseshoe distortions when applied to a two-dimensional environmental plane. In addition, varying species richness per stand, with more species in the centre of the coenoplane (hence simultaneously varying total vegetation abundance per stand), resulted in greater distortion (Fig. 1, model 1D of Austin and Noy-Meir 1971). Principal component ordination gave a gross flask-shaped distortion of the artificial coenoplane (Fig. 1a) in three dimensions (Fig. 1b, c). Surprisingly, successive double standardisation (Bray & Curtis 1957), where each species score is divided by the maximum for that species and the new scores are then divided by the total for the stand, removed the distortion in the PCA ordinations almost totally (Fig. 1d). The perimeter stands on the edge of the grid were both species-poor and had low vegetation abundance per stand, and the double standardisation corrected for this. A conclusion relevant to current studies is that an ordination using PCA based on a Euclidean dissimilarity measure can recover a close approximation to truth when an appropriate standardisation is used. The choice of standardisation will depend on the question and the relationships assumed between species richness, stand abundance and ecological gradients.

A second model was also tested, with complete lack of success in recovering the simulated ecological space (Austin & Noy-Meir 1971). Species with variable maximum abundance and tolerance ranges were randomly located within a square which was sampled by stands placed at random. The model constructed islands of vegetation where some species co-occurred in stands with irregular patterns of stand abundance and species richness, surrounded by areas with no vegetation. The conceptual space was unsaturated with either species or vegetation, creating an ecological space for which there is no observational support. The PCA ordination

failed to recover any pattern resembling that of the original. This early model (model 2, Austin & Noy-Meir 1971) provides a clear example of a conceptual model of vegetation which was totally inconsistent with common knowledge of vegetation. Under normal stable conditions any fertile site will be covered with vegetation. For other conceptual models which have been and are being used for vegetation studies, it may be less obvious that the model is inconsistent with our knowledge. The implications of these inconsistencies for our interpretation of results may be profound.

Noy-Meir went on to examine data transformations and standardisations for their impact on ordination outcomes using a combination of artificial data and selected real data sets (Noy-Meir 1973b; Noy-Meir et al.; see also Noy-Meir & Whittaker 1977). He concluded that standardisations could have a profound effect on recovering truth as represented by the artificial data, and gave rise to different outcomes when applied to real data. He stated that species standardisation equalising all species implied that:

1. All species, common or rare, are of equal a priori interest, and therefore
2. each presence of a rare species (or absence of a common one) is proportionally more important than that of an abundant one, and
3. a priori interest in sites is in proportion to their richness in rare species (Noy-Meir et al. 1975).

Similar implications applied to stand standardisations, when equal weight is given to each site (stand) regardless of differences in total abundance and a priori weight is given to dominant species. Noy-Meir et al. (1975) also drew attention to the relationship between standardisations and phytosociological concepts such as faithful and constant species, with particular standardisations emphasising one rather than the other. The choice of standardisation was to be based on the question posed and what

properties of vegetation composition (e.g. rarity, dominance or species richness) were thought relevant to the question. No particular theoretical framework regarding these properties of the data matrix was presented.

Development of alternative models for evaluating ordinations

Swan's (1970) paper stimulated a great deal of research on new methods of ordination and their evaluation using artificial data in the period (e.g. Gauch & Whittaker 1972a, b; Hill 1973; Ihm & Groenewoud 1975; Fasham 1977; Kenkel & Orloci 1986). The literature was extensively reviewed at the time in books (e.g. Gauch 1982; Greig-Smith 1983; Jongman et al. 1995) and papers (e.g. Austin 1985; Kent & Ballard 1988; ter Braak & Prentice 1988). However, with the benefit of hindsight it is possible to identify trends and issues which are still relevant today. The construction of artificial data is a statement of a phenomenological theory of vegetation composition rather than simply the creation of hypothetical data for evaluating ordination methods. I identify three phenomenological models here, each named after two scientists: one who developed the ecological concepts and another who expressed the ideas in terms of a set of programmable rules or statistical procedures (see also Austin et al. 2006).

Whittaker/Gauch Model

Gauch and Whittaker (1972a; Gauch 1982) put forward a Gaussian model for the pattern of species along an environmental gradient (coenocline) based on visual interpretation of the species response curves observed in Whittaker's direct gradient studies of American forests (e.g. Whittaker 1960). They based their model on nine propositions. The two principal propositions were that (1) species response curves

approximated Gaussian curves, i.e. symmetric bell-shaped curves, and (2) the modes of major species were uniformly (equally) spaced along environmental gradients while minor species modes were randomly distributed. The propositions were reviewed (Austin 1985) and then tested with presence/absence tree data from Australian Eucalypt forests (Austin 1987). After stratification of data to control for topographic and precipitation effects, it was shown that symmetric response curves were rare (7 out of 42 species), with skewed curves in response to mean annual temperature characteristic of major canopy species of sclerophyll forest in south-eastern New South Wales, Australia. The hypothesis that species modes were regularly distributed was rejected for this vegetation type (Austin 1987). The additional propositions could not be examined because the temperature gradient showed a marked gradient in tree species richness, confounding any tests. Whittaker and Gauch (1978) recognised that there were variations in species richness along environmental gradients but made no specific proposals about the nature of those patterns. Peet (1978) supported much previous work by showing there were local patterns of species richness in relation to elevation and topographic moisture gradients, but did not reach any broad generalisations. Eucalypt species richness has been shown to be a curvilinear interactive function of mean annual temperature and mean annual precipitation using generalised linear modelling (Margules et al. 1987). This provided statistical support for Peet's conclusions and demonstrated that species richness (α-diversity) was highly correlated with several environmental variables. Minchin (1989) showed that while there was a pattern in total species richness in relation to altitude and drainage class, much clearer patterns emerged when species were grouped into structural classes: trees, shrubs, herbs and graminoids.
A distinguishing feature of Whittaker's model is that the equal spacing of the modes of species applies to the dominant stratum of his forests, namely the trees; understorey species are regarded as having a random pattern (Whittaker 1956; Gauch & Whittaker 1972). There is little evidence to support the Whittaker/Gauch Gaussian model of vegetation, and by not addressing patterns of species richness the model is shown to be incomplete.

Swan/ter Braak model

Swan (1970) put forward a species packing model, a simpler version of the Whittaker/Gauch model. Species had identical bell-shaped response curves with the same maximum value and potential range across a gradient, with peaks located at equidistant points along the gradient (Swan 1970). Artificial datasets were constructed with variations in the ratio of species range to the length of the gradient. As the species ranges became shorter relative to the length of the gradient, stands eventually had no species in common and the original gradient could not be recovered by the ordination technique used. Hill (1973) introduced into ecology an ordination method, reciprocal averaging, subsequently recognised as correspondence analysis (CA; Hill 1974), which uses this species packing model. Ter Braak (1985; ter Braak & Prentice 1988) demonstrated that CA approximates a Maximum Likelihood Gaussian ordination subject to certain conditions. These were, quoting from ter Braak & Prentice (1988):

1. The site scores (x_i) are closely spaced in comparison with the species tolerance (t).
2. The species optima are equally spaced along the environmental variable over an interval that extends for a sufficient distance in both directions from the true value x_i.
3. The species optima must be closely spaced in comparison with their tolerances.
4. The species have equal tolerances.
5. The species have equal maximum values.

These conditions are for the successful mathematical performance of CA; they say nothing about their ecological realism. The unrealistic conditions of this species packing model can be relaxed without necessarily compromising efficiency (ter Braak & Prentice 1988). Note however that while species maximum values may differ, they must not show a trend along the environmental variable, for instance leading to species-rich samples at one end of the gradient and species-poor at the other end (ter Braak & Prentice 1988). Species-richness gradients in relation to environment are well known (e.g. Peet 1978; Margules et al. 1986; Pausas 1994). Problems will also occur if species tolerances differ substantially among species. Canonical Correspondence Analysis (CCA, ter Braak 1986) is based on CA and fits a regression for each CA axis to a set of environmental predictors. It is therefore subject to the same assumptions and conditions as CA.

Hill and Gauch (1980) tested both CA and the modified version, detrended correspondence analysis (DCA), showing that DCA removed some of the horseshoe distortion using both artificial data and actual data sets. While the artificial data sets tested several different types of data, all were generated based on a model in which species responses were symmetric bell-shaped curves (see also Gauch et al. 1981); gradients in species richness were not tested. Jackson & Somers (1991) show that DCA is very sensitive to the number of segments used along the gradient, with results for 24 and 25 segments giving markedly different interpretations.
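The species packing model described in this section can be sketched in a few lines. The following is a minimal illustration only, assuming Gaussian curves for the "identical bell-shaped" responses; the function names, parameter values and the use of an SVD-based PCA are choices made here for illustration and are not taken from Swan (1970) or ter Braak & Prentice (1988).

```python
import numpy as np


def species_packing_data(n_stands=50, n_species=20, tolerance=1.0,
                         gradient_length=10.0, max_abundance=100.0):
    """Simulate a one-dimensional coenocline under a species packing model.

    Assumptions (illustrative): every species has the same Gaussian response
    with equal tolerance and equal maximum, and optima are equally spaced.
    Returns stand positions and a stands-by-species abundance matrix.
    """
    stands = np.linspace(0.0, gradient_length, n_stands)
    optima = np.linspace(0.0, gradient_length, n_species)
    # Gaussian response: max * exp(-(x - u)^2 / (2 t^2)) for each stand/species pair
    z = (stands[:, None] - optima[None, :]) / tolerance
    return stands, max_abundance * np.exp(-0.5 * z ** 2)


def pca_scores(X, n_axes=2):
    """Stand scores from a PCA on the column-centred data matrix (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_axes] * s[:n_axes]


if __name__ == "__main__":
    stands, X = species_packing_data()
    scores = pca_scores(X)
    # Print gradient position next to the first two PCA axes for inspection.
    for pos, (a1, a2) in zip(stands[::7], scores[::7]):
        print(f"{pos:6.2f} {a1:10.2f} {a2:10.2f}")
```

Plotting the second column of `scores` against the first would be one way to inspect the curvature discussed above; shortening `tolerance` relative to `gradient_length` reduces species overlap between distant stands, which is the condition under which the text reports that the gradient can no longer be recovered.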

Kenkel & Orloci (1986) investigated the interaction of the methods, Principal Coordinates Analysis (P-Co-A) and Multidimensional Scaling (MDS), with the data (unstandardised, double standardised and stand normalised) while holding the dissimilarity measure constant (Euclidean distance). Two-dimensional artificial data approximating the Whittaker/Gauch model were used. These methods were then compared with CA and DCA with their implicit double standardisation. The authors concluded that MDS with stand normalisation best recovered truth. It is also apparent from their Table 1 and Figure 1 that the advantage of MDS with stand normalisation over CA methods increased with increasing species turn-over along the two gradients, i.e. with increasing numbers of double-zero matches between stands. These results, and the qualifications expressed by ter Braak and Prentice (1988) above, suggest that CA methods are sensitive to departures from the Swan/ter Braak species packing model, while the model itself lacks theoretical or empirical support. Chi-squared distance (CSQ) is the implicit dissimilarity measure used in CA (Faith et al. 1987; Legendre & Gallagher 2001; Lepš & Šmilauer 2003). The implications of this and the numerous comparisons of CA, DCA and non-metric multidimensional scaling (NMDS) as alternative methods of ordination are discussed later.

Ellenberg/Minchin model

Mueller-Dombois and Ellenberg (1974; see also Austin 1976, 1980, 1985) put forward a graphical model of species responses to environmental gradients based on Ellenberg's earlier phytosociological studies and his multispecies competition experiments along a watertable gradient (Ellenberg 1953, 1954). The physiological response curves of species (approximating the fundamental niche) were non-Gaussian unimodal bell-shaped curves. The ecological response (approximating the realised niche) could take any shape from symmetric bell-shaped to asymmetric bimodal

curves. The differences between ecological response curves were assumed to be due to the presence or absence of superior competitors in different regions along the gradient. No suggestions were made regarding the spacing of species modes, or patterns of the collective properties of vegetation abundance, species richness or dominance along the gradient. Simulated data based on a version of this model, introducing assumptions regarding collective properties, were used to evaluate the performance of ordination methods (Austin 1976). A review of the evidence for the responses of these collective properties to environmental gradients showed that models of vegetation composition needed to incorporate recognition of their patterns (Austin 1980). Graphic ordination using only the three collective properties (vegetation abundance, species richness and normalised dominance per stand) was then shown to recover much of the ecological information revealed by a floristic analysis of rainforest stands (Austin 1981). The three-dimensional ordination of collective properties showed that the vegetation differed depending on topographic position and geographic location.

Minchin (1987b) synthesised these and other ideas into a comprehensive computer program (COMPAS) to generate simulated vegetation datasets. To allow for skewed species response curves, a beta-function was used to simulate species data rather than a Gaussian curve. No specific model was assumed. The numerous options allowed very different phenomenological vegetation models to be simulated, varying species richness, carrying capacity (i.e. stand abundance) and interspecific interaction (competition). Minchin and colleagues were then able to test the robustness of ordination methods to a variety of vegetation models with potentially more realistic assumptions than the Swan/ter Braak or Whittaker/Gauch models (Minchin 1987a, 1989; Faith et al. 1987; Belbin 1991).
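To make concrete how a beta-function allows skewed response curves where a Gaussian does not, the sketch below uses a generic beta-shaped response scaled to a chosen peak. The parameterisation, the function name `beta_response` and the argument names are assumptions made for illustration; they are not claimed to match the options or code of COMPAS.

```python
import numpy as np


def beta_response(x, lower, upper, alpha, gamma, peak):
    """Beta-shaped species response along a gradient (illustrative form).

    The species occurs only between `lower` and `upper`. With alpha == gamma
    the curve is symmetric; unequal exponents skew it. The curve is rescaled
    so that its maximum equals `peak`. Requires alpha > 0 and gamma > 0.
    """
    x = np.asarray(x, dtype=float)
    u = (x - lower) / (upper - lower)          # position rescaled to (0, 1)
    inside = (u > 0.0) & (u < 1.0)
    u_safe = np.where(inside, u, 0.5)          # placeholder outside the range
    shape = u_safe ** alpha * (1.0 - u_safe) ** gamma
    mode = alpha / (alpha + gamma)             # where u^alpha (1-u)^gamma peaks
    max_shape = mode ** alpha * (1.0 - mode) ** gamma
    return np.where(inside, peak * shape / max_shape, 0.0)


if __name__ == "__main__":
    gradient = np.linspace(0.0, 10.0, 11)
    symmetric = beta_response(gradient, 0.0, 10.0, alpha=2.0, gamma=2.0, peak=50.0)
    skewed = beta_response(gradient, 0.0, 10.0, alpha=1.0, gamma=3.0, peak=50.0)
    print(np.round(symmetric, 1))
    print(np.round(skewed, 1))
```

In this form the mode sits at `lower + (upper - lower) * alpha / (alpha + gamma)`, so choosing `alpha` smaller than `gamma` moves the peak towards the lower end of the species range and produces a long upper tail.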

Comparisons using a large and diverse range of simulated models were made of the ability of various ordination methods, in particular detrended correspondence analysis (DCA) and local non-metric multidimensional scaling (LNMDS), to recover the pattern of stands in ecological space (Minchin 1987a). Performance was examined in relation to (1) the beta diversity of ecological gradients in one or two dimensions, (2) the shape of species ecological response curves (e.g. symmetric/skewed, unimodal/bimodal) and (3) the arrangement of stands in ecological space (e.g. regular/random). Under conditions approximating the Swan/ter Braak model, DCA and LNMDS performed equally well. As the models increasingly diverged from this model and approached the conditions represented by the various options of the Ellenberg/Minchin model, LNMDS performed better than DCA at recovering the stand positions.

Faith et al. (1987) took a unique approach. They examined the ability of dissimilarity measures to represent the ecological distance and how this was influenced by data standardisations. The program COMPAS (Minchin 1987b) was used to generate 561 artificial datasets varying in beta diversity, species ecological response shapes, competition between species, and trends in carrying capacity across the ecological space. Four data standardisations were applied: (1) species adjusted to equal maximum abundance (SPM); (2) stands (sites) standardised to equal total abundance (SAT); (3) Bray-Curtis successive double standardisation, i.e. 1 followed by 2 (DBL); (4) species standardised to equal standard deviations (SPS), only used with Euclidean distance. Some combinations of standardisation and measure are indistinguishable from other measures; see Faith et al. (1987) for full details of dissimilarity measures and standardisations.
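The four standardisations listed above, together with the Bray-Curtis dissimilarity, can be written compactly for a stands-by-species matrix. This is a minimal sketch under stated assumptions: the function names are hypothetical, the SPS version divides by the population standard deviation (whether Faith et al. used the sample or population form is not stated here), and all-zero rows or columns are left unchanged to avoid division by zero.

```python
import numpy as np


def _safe(divisor):
    """Replace zero divisors by 1 so empty rows/columns stay unchanged."""
    return np.where(divisor > 0, divisor, 1.0)


def standardise_species_max(X):
    """SPM: scale each species (column) to a maximum abundance of 1."""
    X = np.asarray(X, dtype=float)
    return X / _safe(X.max(axis=0))


def standardise_stand_total(X):
    """SAT: scale each stand (row) to a total abundance of 1."""
    X = np.asarray(X, dtype=float)
    return X / _safe(X.sum(axis=1, keepdims=True))


def double_standardise(X):
    """DBL: species-maximum standardisation followed by stand-total."""
    return standardise_stand_total(standardise_species_max(X))


def standardise_species_sd(X):
    """SPS: scale each species to unit standard deviation (ddof=0 assumed)."""
    X = np.asarray(X, dtype=float)
    return X / _safe(X.std(axis=0))


def bray_curtis(X):
    """Pairwise Bray-Curtis dissimilarity: sum|x_i - x_j| / sum(x_i + x_j)."""
    X = np.asarray(X, dtype=float)
    diff = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    total = (X[:, None, :] + X[None, :, :]).sum(axis=2)
    return np.divide(diff, total, out=np.zeros_like(diff), where=total > 0)


if __name__ == "__main__":
    abundance = [[2, 0, 4], [1, 3, 0], [0, 6, 2]]   # 3 stands x 3 species
    print(np.round(bray_curtis(double_standardise(abundance)), 3))
```

Note that two stands with no species in common always receive a Bray-Curtis value of 1 whatever their abundances, which is why long gradients produce many tied maximal dissimilarities.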

Selected performances are presented in Table 1. Euclidean distance and χ² (chi-squared distance, used in CA) performed poorly, occupying the lowest rankings regardless of standardisation. The Gower metric was also poor at recovering the ecological distance. The Manhattan metric is very sensitive to standardisation; with double standardisation it was indistinguishable in performance from the best measures, Bray-Curtis and quantitative Kulczynski (Table 1). The Canberra, Bray-Curtis and Kulczynski coefficients with standardisations were clearly superior at recovering the ecological distance compared with CSQ (that is, CA), with or without standardisation. Standardisation plays a key role in recovering ecological distance and increases the success of ordination. Though recognised early (Austin & Noy-Meir 1971; Noy-Meir et al. 1975), the importance of standardisation does not appear to have been widely considered. The differential ability of the 29 unique combinations of ten dissimilarity measures and four standardisations to measure the ecological distance is at variance with many of the current approaches recommended by ecologists (e.g. Legendre & Legendre 1998; McCune & Grace 2002; Lepš & Šmilauer 2003; Zuur et al. 2007). The Ellenberg/Minchin approach, as expressed in the software package COMPAS, allows numerous vegetation models to be generated rather than a single one (Minchin 1987b). The results obtained using these various models indicated that the conclusions on choice of dissimilarity measure, standardisation and ordination method are likely to be robust to changes in vegetation model (Minchin 1987a; Faith et al. 1987).

Recognition of theoretical context

Many of the ideas and approaches discussed above found expression in a symposium, "Theory and models in vegetation science", held in Uppsala in 1985, edited by Prentice

and van der Maarel and published in Vegetatio volume 69. Imanuel Noy-Meir and Eddy van der Maarel (1987) provided a historical perspective on the relations between community theory and community analysis in vegetation science. They reviewed the ideas of earlier ecologists on theory in terms of the relative importance given to different processes in determining vegetation properties (Table 2). There is no discussion of ecological or environmental processes which might determine collective properties such as species richness. The authors also drew attention to the potential for vegetation analysis methods and their implicit vegetation models to either generate hypotheses or to test them. However, they made the following pertinent comments:

1. One might have expected that the development of an objective methodology for collecting and analysing data on plant communities would also lead to a more rigorous evaluation and sharper formulation of general hypotheses about plant communities. This potential use of the new methodology was not explored much further, however.
2. The loss of interest by plant ecologists in generalisations about plant communities is perhaps not accidental. It may reflect the realization that the classical theories were inadequate, and the feeling that the processes in plant communities are probably too intricate and complex to expect any general pattern to be observable at the community level. (Noy-Meir & van der Maarel 1987).

Though the comments may appear pessimistic, they are still relevant today. Kent & Ballard (1988) in a review make a similar comment: In the absence of a fully developed model of vegetation response to environment, perhaps it is not surprising that the various trends and problems in the evolution and application of classification

and ordination methods have occurred and there seems little reason to believe that they will not continue into the near future.

Subsequently a conceptual framework for a continuum theory of vegetation was put forward, together with a review of how consistent these ideas were with those of recent ecologists (Austin & Smith 1989). Nine propositions were put forward concerning the nature of environmental gradients, the fundamental niche responses of plants, the realised niche responses of plants and the environmental responses of the collective properties of vegetation. Comparison of these ideas with those of other ecologists revealed a wide range of inconsistencies (Table 3). There are numerous difficulties of definition and meaning. The nature of the environmental gradient is one example. Neither Ellenberg nor Whittaker appears to have made general statements about gradients, but they recognised different gradients. Whittaker (1956) analysed vegetation in terms of two factor-complexes of elevation and moisture. Ellenberg recognised seven habitat factors to which European species were considered to respond (Mueller-Dombois & Ellenberg 1974). The three ecologists Grime, Tilman and Bazzaz collapse various environmental gradients into a single productivity gradient for the purpose of generalisation, and all give importance to disturbance. Austin and Smith (1989) distinguish three types of gradients, indirect (e.g. altitude), direct (e.g. temperature) and resource (e.g. phosphorus), suggesting that vegetation properties will respond differently to them, but fail to mention disturbance. These ideas on the types of gradients are inconsistent but need not be incompatible.

The length of gradients is another source of confusion. The Swan/ter Braak and Whittaker/Gauch models do not specify any limits. The Ellenberg/Minchin model allows for gradients in species richness and stand abundance which imply limits to the environmental gradients, i.e. where no species can exist, but these limits will be beyond

the simulated datasets actually used (Minchin 1987b; Faith et al. 1987). β-diversity measures are often used to describe the length of the gradient and to advise on the ordination methods to be used (e.g. Lepš & Šmilauer 2003). However, there is little comment on the impact species richness or stand abundance may have on the performance of methods if such properties of vegetation are also determined by the same environmental gradients. A short section of a temperature gradient may show no significant or consistent changes in either species richness or stand abundance. However, if the gradient is viewed over its entire length, from low temperatures where no vegetation is able to exist to high temperatures with no vegetation, there will be response curves for both species richness and plant biomass whose shape will be unknown (Table 3, cf. Austin & Smith 1989). The importance of these collective properties for vegetation analysis will depend on both the position of the samples along the gradient and the length of the section of the gradient sampled. Reasoning such as this has had little influence on the recent practice of vegetation analysis (Jongman et al. 1995; Lepš & Šmilauer 2003; ter Braak & Prentice 1988; Zuur et al. 2007; von Wehrden et al. 2009).

The period from saw recognition that the creation of artificial datasets for evaluating ordination methods constituted a phenomenological model of vegetation. Initially the conceptual models used generated datasets which were inconsistent with available ecological knowledge. Later models (e.g. Austin & Smith 1989) have had limited testing.

Recent developments

There has been a very substantial expansion in computing and statistical tools for analysis of vegetation in the last twenty years (e.g. Legendre & Legendre 1998;

McCune & Grace 2002; Lepš & Šmilauer 2003; Zuur et al. 2007). Ordination methods have been widely applied to biotic data sets other than vegetation. There has not been a similar expansion in developing the interface between theory and statistical analysis (Austin 2002). Here discussion is restricted to studies where artificial data were used and where Imanuel Noy-Meir made early contributions: evaluation of ordination techniques, use and creation of simulated data, and the role of data standardisations.

Ordination Studies

Comparative Studies

Palmer (1993) used simulated data with two-dimensional orthogonal gradients generated by default settings of Minchin's (1987b) program COMPAS to evaluate the performance of DCA and CCA. DCA produced a distorted grid, but it approximated the simulated grid reasonably well (see Palmer 1993, Fig. 5). CCA supplied with the true co-ordinates as environmental predictors recovered the grid. McCune (1997), however, has shown that the results of CCA are sensitive to noisy environmental data. No comparisons were made with other ordination techniques. The results using the Ellenberg/Minchin model suggest that DCA is robust in some circumstances to the assumptions on which the CA methods and Swan/ter Braak model are based (ter Braak and Prentice 1988). Van Groenewoud (1992) compared the performance of CA and DCA in recovering the known structure using similar simulated datasets. He varied the skewness of species responses, gradient length and the sampling pattern of stands. He concluded that CA and DCA were unsatisfactory, failing to recover the second gradient. Legendre and Gallagher (2001) examined the use of various dissimilarity measures and associated data transformations on the performance of selected ordination methods and recommended against using CA. They used a single artificial

dataset based on the Swan/ter Braak model. Given the contrasting conclusions of these papers it is instructive to compare the types of simulated data, the conceptual model used, the ordination methods and other properties of the papers. Table 4 summarises some among many attributes of the studies. The studies differ in all properties, so it is uncertain whether the conclusions are due to differences in the analytical techniques. Differences in performance may be confounded with properties of the simulated datasets. Palmer (1993), when comparing his conclusions on the reduced distortion shown by DCA with the results of Minchin (1987a), comments that this may be due to the greater number of species generated in his dataset (Table 4). The mean species range is equal to the length of the gradients (Table 1 in Palmer 1993) for 300 species, so the alpha diversity (number of species) per stand will be very high and the frequency of zero values in the data matrix relatively low. In contrast, the test data sets of van Groenewoud (1992), with total species numbers of either, 18 or 30 species, variable species ranges and variable gradient lengths, would have had low alpha diversity and a higher frequency of zero values. His data sets are not dissimilar in response patterns, however; compare Palmer (Fig ) and van Groenewoud (Fig ). The inconsistent datasets do not allow any general conclusions about the performance of CA methods to be drawn (Table 4), and the differential performance issues of CA versus MDS raised by Minchin (1987a) and Faith et al. (1987) are not addressed. Legendre and Gallagher (2001) used a rather specific form of the Swan/ter Braak model, with abundant regularly spaced species and rare low-abundance species occurring only between the modes of the abundant species. This produced small quasi-periodic variations in both species richness and stand abundance along the environmental gradient (Fig. 3a, Legendre and Gallagher 2001). Among their

conclusions they make reference to (1) the biased sensitivity of χ² distance, and hence of CA methods, to rare species with low abundance, and (2) the sensitivity of CCA to stands with high abundance, and they suggest that CCA should only be used when stand abundances are approximately equal. There is, however, no discussion of the ecological assumptions built into their artificial dataset.

Novel Approaches

Other ordination methods and dissimilarity measures evaluated on simulated data have been published (Belbin 1991; De'ath 1999a,b; Roberts 2008). De'ath (1999a,b) used both Gaussian and Ellenberg/Minchin models of species response in evaluating two methods: principal curves and extended dissimilarity. Principal curves was shown to perform better than either CCA or multidimensional scaling, particularly when the carrying capacity was fixed. The method is restricted to one-dimensional gradients and requires further work to extend it to two-dimensional data sets (De'ath 1999a). In a second paper De'ath (1999b) presented a different method, extended dissimilarity (XD), which adjusts the estimate of dissimilarity between stands that have few or no species in common and is applicable to two-dimensional data sets. The method was developed from the earlier work of Williamson (1978) on a step-across method, Bradfield and Kenkel's (1987) flexible shortest path adjustment, Faith et al.'s (1987) hybrid multidimensional scaling and Belbin's (1991) semi-strong hybrid scaling. The study used site-standardised Bray-Curtis as the dissimilarity measure, one- and two-dimensional artificial data sets, and both PCoA and NMDS as ordination techniques. Briefly, the method depends on calculating an extended dissimilarity value for those dissimilarity values greater than a set threshold (e.g. 0.8), using minimum estimates from the sum of dissimilarities with intermediate

stands. This is essential for stands which have no species in common; see De'ath (1999b) for details of the extended dissimilarity algorithm. Results show that extended dissimilarity was always equal to or better than the unadjusted dissimilarity measure. This was particularly marked for two-dimensional data with very high beta diversity gradients. As of 9th June 2011, the paper has been cited only 27 times. Yet the results suggest the approach estimates the ecological distance from dissimilarity measures better than any other approach and deserves greater attention. De'ath (1999b) bases his approach on two propositions: (1) Dissimilarity should take a constant value (typically 1) if and only if sites have no species in common, and the value 0 when sites have equal abundances of all species. (2) Dissimilarity should increase with increasing ecological distance (the Euclidean distance between sites on the gradient(s)). Recognition and acceptance of these propositions requires realistic simulated data to define true ecological distance and simplifies discussion of the choice of dissimilarity measures. However, the influence of collective properties such as stand abundance, species richness and dominance, and their relationship with data standardisations, remains to be considered. Roberts and colleagues have developed fuzzy set ordination techniques and used artificial datasets to evaluate the choice of dissimilarity measure and the ability to recover the original gradients (Roberts 1986, 2008, 2009; Boyce & Ellison 2001). Boyce & Ellison (2001) used COMPAS (Minchin 1987b) to analyse the performance of nine presence/absence dissimilarity measures in recovering a single gradient subject to a number of variations. They concluded that (1) measures ignoring joint absences were better at recovering the true gradient, (2) at high β-diversity a step-across algorithm (e.g. De'ath 1999b) improves gradient recovery, and (3) regular sampling

along the gradient is better at recovering truth than random sampling. Roberts (2008) presents a method for constrained regression analysis, similar to CCA, based on fuzzy set theory. He creates one-, two- and three-dimensional artificial data sets using similar but not identical options to those provided by Minchin (1987b). The artificial gradients are successfully recovered even in the face of considerable noise. A comparison of Multidimensional Fuzzy Set Ordination (MFSO) with CCA and Distance-Based Redundancy Analysis (DB-RDA) using four real data sets indicated that MFSO was the better technique (Roberts 2009). While fuzzy set ordination may have a role to play in vegetation analysis, and published results support general conclusions about joint-absence measures and the relative value of CA methods, it is difficult to reach firm conclusions because every comparative study is different. One evaluation criterion used by Roberts (2009) was the Pearson product-moment correlation of the pair-wise distances among the samples in the ordination with a Bray-Curtis dissimilarity matrix of the samples. Yet Faith et al. (1987) clearly showed that, for recovering the ecological distance between samples, it is better to standardise the data prior to calculating the Bray-Curtis dissimilarity (Table 1).

Dissimilarity Measures

Clarke et al. (2006) provide an examination of dissimilarity measures with particular regard to the Bray-Curtis coefficient and its application to marine pollution studies. The paper presents results which have implications far beyond marine pollution studies. Their initial point is that sites with no organisms frequently occur in pollution, disturbance and successional studies. They then demonstrate that incorporating such empty sites into ordination studies can be achieved by adding a dummy species to the original abundance matrix, with a value of 1 for all samples

provided the empty sites are entirely empty from the same cause (Clarke et al. 2006). These authors, more importantly, outline a series of suitable properties (guidelines) for the performance of resemblance measures (dissimilarity measures); see Table 5. They suggest six guidelines, which they evaluate for several measures, concluding that Bray-Curtis type coefficients should be used for ordination studies. The first two guidelines, coincidence and complementarity, are equivalent to proposition (1) of De'ath (1999b) above. Localisation refers to the criterion that addition of a new sample should not affect the dissimilarity between existing samples. The Gower metric is an example of a dissimilarity measure which does not satisfy this guideline: the contribution of each species is standardised by the species range in the existing data set, and addition of a new sample may change that range. The guideline independence of joint absence is defined as follows: exclusion or inclusion of taxa which are not present in either sample does not affect the resemblance between two samples. Strictly speaking, Euclidean distance does satisfy the independence of joint absence guideline as defined, but compare Clarke et al. (2006, page 64). A definition such as "joint absences of species not present in either sample should not limit the dissimilarity relative to other samples in the data set" is more specific. When Euclidean distance is applied to two species-poor stands, each with low total stand abundance but no species in common, the distance is constrained to be small. The apparently high similarity relative to the dissimilarities between other stands will depend on the number of double-zero matches. Simulated data have demonstrated this to be the principal cause of the horseshoe distortion many times (Fig. 1). Euclidean distance, Canberra, Manhattan and Gower metrics all

have this property. This is an essential guideline when absence may be due to different causes. Clarke et al. (2006) present ordinations for four marine data sets using a variety of dissimilarity measures to demonstrate the extreme differences in results which can arise. To examine further the differences between dissimilarity measures, a second-stage ordination of the measures was done (Somerfield & Clarke 1995). For this method, a number of different between-site dissimilarity matrices are calculated from the same set of abundances. A measure of (second-stage) dissimilarity is then estimated between the different (first-stage) dissimilarity matrices. The measure used was the Spearman rank correlation calculated pairwise between corresponding elements of each of the dissimilarity matrices (Clarke et al. 2006). The two-stage ordination summarises and displays the very different patterns captured by the different dissimilarity measures used (Fig. 2). The simplified graph can be interpreted as showing four groups of dissimilarity measures based on the coefficients considered in Table 5. The first group, consisting of the Canberra metric, Gower metric, Manhattan metric and Euclidean distance, was almost always judged to have performed badly. The second group has a single member, normalised Euclidean distance (using data standardised by standard deviates), which behaves as an extreme member of group 1. The third group, consisting solely of the χ² distance used in CA, behaves as an outlier to the fourth group (Fig. 2). The fourth group is composed of Bray-Curtis type coefficients, namely the Bray-Curtis coefficient, Kulczynski dissimilarity and Canberra dissimilarity (equivalent to the Adkins form of the Canberra metric in Faith et al.), which the authors judge to be the most successful at recovering ecological information. Clarke et al. (2006) also provide information on the performance of other coefficients and possible reasons for the varied behaviour of different coefficients. However the

conclusions are based on subjective assessments of the success of the measures on four real datasets. There is no known truth against which to judge them, as there is for artificial data. Their results are consistent with and support the conclusions of Faith et al. (1987) on the performance of different dissimilarity measures. Clarke et al. (2006) discuss the need to consider species dominance and rarity, and use various standardisations and transformations to overcome perceived problems. Clarification is needed, however, on the inconsistent use of transformed data between case studies. For example, the coral communities study used square-root transformed data while the Clyde sludge disposal study used 4th-root transformed counts. Do such differences confound the conclusions? The authors recognise the problem: "What is needed here is an effective means of assessing whether any newly defined measure contributes essentially different information than any of the existing canon, and which if any of the known coefficients it most closely resembles" (Clarke et al., page 74). If ecological information is to be consistently recognised, an ecological theory is required to define what constitutes ecological information. This raises the very general issue of what constitutes appropriate ecological theory. An attempt to test the Whittaker/Gauch model (Austin 1987), recast the theory (Austin & Smith 1989) and make further tests (Austin & Gaywood 1994) relied on data consisting of species of a single growth form, trees, from a single genus, Eucalyptus. Whether similar theory regarding competition, abundance, species packing and niche shape can be applied to a marine benthic dataset (e.g. Clarke et al. 2006) consisting of species from ten phyla (Olsgard et al. 1997) with different body plans, feeding strategies and positions in the food chain is an open question.
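The behaviour of the measures discussed in this section can be illustrated with a small numerical sketch. It is not taken from any of the studies cited: the stand-by-species values are hypothetical, the function and parameter names (`bray_curtis`, `dummy`, `pairwise` and so on) are ours, and only the Python standard library is assumed. The sketch shows (a) Euclidean distance treating two sparse stands with no species in common as close while Bray-Curtis takes its maximum value of 1, (b) the effect of a dummy species with abundance 1 on empty samples, and (c) a Spearman rank correlation between the corresponding elements of two dissimilarity matrices, in the spirit of a second-stage comparison.

```python
from itertools import combinations
from math import sqrt


def bray_curtis(a, b, dummy=0.0):
    """Bray-Curtis dissimilarity between two abundance vectors.

    A positive `dummy` acts like one extra species with that abundance in
    both samples (an assumption-labelled stand-in for the dummy-species idea).
    """
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b)) + 2.0 * dummy
    return num / den if den > 0 else float("nan")


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_ranks(values):
    """Ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation as the Pearson correlation of ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def pairwise(stands, measure):
    """Lower-triangle dissimilarities in a fixed pair order."""
    pairs = combinations(range(len(stands)), 2)
    return [measure(stands[i], stands[j]) for i, j in pairs]


# Hypothetical stands (rows) by species (columns).
stands = [
    [2, 0, 0, 0, 0, 0],     # A: species-poor, low abundance
    [0, 0, 0, 0, 0, 3],     # B: species-poor, shares no species with A
    [0, 40, 30, 20, 0, 0],  # C: abundant stand
    [0, 20, 30, 40, 0, 0],  # D: same species as C, different proportions
]
empty = [0, 0, 0, 0, 0, 0]

bc = pairwise(stands, bray_curtis)   # pair order: AB, AC, AD, BC, BD, CD
eu = pairwise(stands, euclidean)
second_stage_rho = spearman(bc, eu)

print("Bray-Curtis A-B:", bc[0], " C-D:", bc[5])
print("Euclidean   A-B:", eu[0], " C-D:", eu[5])
print("Spearman rho between the two matrices:", second_stage_rho)
print("Empty vs empty with dummy species:", bray_curtis(empty, empty, dummy=1.0))
```

In this toy example Euclidean distance ranks stands A and B as the most similar pair even though they share no species, which is the double-zero effect described above, while Bray-Curtis ranks C and D as most similar. Without the dummy species the dissimilarity between two empty samples is undefined (0/0); with it, two empty samples have dissimilarity 0.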
The recent ordination studies reviewed in this section provide new methods, knowledge of the mathematical relationships between dissimilarity measures and possible guidelines

for choosing measures. But even when using artificial data they do not examine the ecological theory on which ordination needs to be based.

Discussion

The earlier comments of Noy-Meir and others on the lack of progress in vegetation theory in relation to ordination have proved prescient (Noy-Meir et al. 1975; Noy-Meir & van der Maarel 1987). This lack of theory has not been confined to vegetation studies, as recent texts and papers on the use of ordination in ecology demonstrate (e.g. Legendre & Legendre 1998; Legendre & Gallagher 2001; McCune & Grace 2002; Lepš & Šmilauer 2003; Clarke et al. 2006; Zuur et al. 2007; Roberts 2008). The historical phases recognised in this review demonstrate the continued recognition of analytical problems with ordination methods which require the use of artificial data to resolve, and a progressive increase in understanding of the mathematical procedures being used, but also a continued failure to examine the ecological assumptions of the approach.

Historical overview

During the early period of ordination studies using artificial data ( ), Noy-Meir and colleagues identified the importance of data standardisations in overcoming the distortion problems and recognised that recovery of truth was sensitive to the collective properties of species richness, stand abundance and dominance found in data sets (Noy-Meir & Austin 1970; Austin & Noy-Meir 1971; Noy-Meir 1973b; Noy-Meir et al. 1975; see also Noy-Meir & Whittaker 1977). While progress on the issues of distortion, data standardisation, the role of collective properties and ecological theory has been made, it cannot be said that a current consensus exists.
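The sensitivity of dissimilarity to stand abundance and dominance, and the way standardisation changes it, can be made concrete with a minimal sketch. The two standardisations below (by site total and by species maximum) are generic examples chosen for illustration, not necessarily the exact forms used in the studies cited; the data matrix and all function names are hypothetical.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den > 0 else float("nan")


def standardise_by_site_total(matrix):
    """Divide each stand (row) by its total abundance; empty rows stay zero."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([x / total if total > 0 else 0.0 for x in row])
    return out


def standardise_by_species_maximum(matrix):
    """Divide each species (column) by its maximum over all stands."""
    maxima = [max(col) for col in zip(*matrix)]
    return [[x / m if m > 0 else 0.0 for x, m in zip(row, maxima)]
            for row in matrix]


# Hypothetical stands (rows) by species (columns). Stands 0 and 1 have
# identical proportions but a tenfold difference in total abundance;
# the third species dominates every stand.
matrix = [
    [10, 20, 300],
    [1, 2, 30],
    [20, 10, 300],
]

by_site = standardise_by_site_total(matrix)
by_species = standardise_by_species_maximum(matrix)

raw_01 = bray_curtis(matrix[0], matrix[1])
site_01 = bray_curtis(by_site[0], by_site[1])
raw_02 = bray_curtis(matrix[0], matrix[2])
species_02 = bray_curtis(by_species[0], by_species[2])

print("Stands 0-1, raw vs site-standardised:", raw_01, site_01)
print("Stands 0-2, raw vs species-standardised:", raw_02, species_02)
```

On the raw data, stands 0 and 1 appear very different purely because of total abundance, and stands 0 and 2 appear nearly identical because the dominant species swamps the two minor ones. Standardising by site total removes the first effect and standardising by species maximum reduces the second, which is why the choice of standardisation cannot be separated from assumptions about what stand abundance and dominance mean ecologically.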

An Introduction to Ordination Connie Clark

An Introduction to Ordination Connie Clark An Introduction to Ordination Connie Clark Ordination is a collective term for multivariate techniques that adapt a multidimensional swarm of data points in such a way that when it is projected onto a

More information

Introduction to ordination. Gary Bradfield Botany Dept.

Introduction to ordination. Gary Bradfield Botany Dept. Introduction to ordination Gary Bradfield Botany Dept. Ordination there appears to be no word in English which one can use as an antonym to classification ; I would like to propose the term ordination.

More information

-Principal components analysis is by far the oldest multivariate technique, dating back to the early 1900's; ecologists have used PCA since the

-Principal components analysis is by far the oldest multivariate technique, dating back to the early 1900's; ecologists have used PCA since the 1 2 3 -Principal components analysis is by far the oldest multivariate technique, dating back to the early 1900's; ecologists have used PCA since the 1950's. -PCA is based on covariance or correlation

More information

Linking species-compositional dissimilarities and environmental data for biodiversity assessment

Linking species-compositional dissimilarities and environmental data for biodiversity assessment Linking species-compositional dissimilarities and environmental data for biodiversity assessment D. P. Faith, S. Ferrier Australian Museum, 6 College St., Sydney, N.S.W. 2010, Australia; N.S.W. National

More information

INTRODUCTION TO MULTIVARIATE ANALYSIS OF ECOLOGICAL DATA

INTRODUCTION TO MULTIVARIATE ANALYSIS OF ECOLOGICAL DATA INTRODUCTION TO MULTIVARIATE ANALYSIS OF ECOLOGICAL DATA David Zelený & Ching-Feng Li INTRODUCTION TO MULTIVARIATE ANALYSIS Ecologial similarity similarity and distance indices Gradient analysis regression,

More information

Compositional dissimilarity as a robust measure of ecological distance

Compositional dissimilarity as a robust measure of ecological distance Vegetatio 69: 57-68, 1987 Dr W. Junk Publishers, Dordrecht - Printed in the Netherlands 57 Compositional dissimilarity as a robust measure of ecological distance Daniel P. Faith, Peter R. Minchin & Lee

More information

BIOL 580 Analysis of Ecological Communities

BIOL 580 Analysis of Ecological Communities BIOL 580 Analysis of Ecological Communities Monday 9:00 Lewis 407, Tuesday-Thursday 9:00-11:00, AJMJ 221 Dave Roberts Ecology Department 310 Lewis Hall droberts@montana.edu Course Description This course

More information

BIOL 580 Analysis of Ecological Communities

BIOL 580 Analysis of Ecological Communities BIOL 580 Analysis of Ecological Communities Monday 9:00 Lewis 407, Tuesday-Thursday 9:00-11:00, Lewis 407 Dave Roberts Ecology Department 117 AJM Johnson Hall droberts@montana.edu Course Description This

More information

8. FROM CLASSICAL TO CANONICAL ORDINATION

8. FROM CLASSICAL TO CANONICAL ORDINATION Manuscript of Legendre, P. and H. J. B. Birks. 2012. From classical to canonical ordination. Chapter 8, pp. 201-248 in: Tracking Environmental Change using Lake Sediments, Volume 5: Data handling and numerical

More information

Rigid rotation of nonmetric multidimensional scaling axes to environmental congruence

Rigid rotation of nonmetric multidimensional scaling axes to environmental congruence Ab~tracta Batanica 14:100-110, 1000 Department of Plant Taonomy and Ecology, ELTE. Budapeat Rigid rotation of nonmetric multidimensional scaling aes to environmental congruence N.C. Kenkel and C.E. Burchill

More information

Factors affecting the Power and Validity of Randomization-based Multivariate Tests for Difference among Ecological Assemblages

Factors affecting the Power and Validity of Randomization-based Multivariate Tests for Difference among Ecological Assemblages Factors affecting the Power and Validity of Randomization-based Multivariate Tests for Difference among Ecological Assemblages Cameron Hurst B.Sc. (Hons) This thesis was submitted in fulfillment of the

More information

Multivariate Analysis of Ecological Data using CANOCO

Multivariate Analysis of Ecological Data using CANOCO Multivariate Analysis of Ecological Data using CANOCO JAN LEPS University of South Bohemia, and Czech Academy of Sciences, Czech Republic Universitats- uric! Lanttesbibiiothek Darmstadt Bibliothek Biologie

More information

ANOVA approach. Investigates interaction terms. Disadvantages: Requires careful sampling design with replication

ANOVA approach. Investigates interaction terms. Disadvantages: Requires careful sampling design with replication ANOVA approach Advantages: Ideal for evaluating hypotheses Ideal to quantify effect size (e.g., differences between groups) Address multiple factors at once Investigates interaction terms Disadvantages:

More information

4. Ordination in reduced space

4. Ordination in reduced space Université Laval Analyse multivariable - mars-avril 2008 1 4.1. Generalities 4. Ordination in reduced space Contrary to most clustering techniques, which aim at revealing discontinuities in the data, ordination

More information

Distance Measures. Objectives: Discuss Distance Measures Illustrate Distance Measures

Distance Measures. Objectives: Discuss Distance Measures Illustrate Distance Measures Distance Measures Objectives: Discuss Distance Measures Illustrate Distance Measures Quantifying Data Similarity Multivariate Analyses Re-map the data from Real World Space to Multi-variate Space Distance

More information

Ordination & PCA. Ordination. Ordination

Ordination & PCA. Ordination. Ordination Ordination & PCA Introduction to Ordination Purpose & types Shepard diagrams Principal Components Analysis (PCA) Properties Computing eigenvalues Computing principal components Biplots Covariance vs. Correlation

More information

BIO 682 Multivariate Statistics Spring 2008

BIO 682 Multivariate Statistics Spring 2008 BIO 682 Multivariate Statistics Spring 2008 Steve Shuster http://www4.nau.edu/shustercourses/bio682/index.htm Lecture 11 Properties of Community Data Gauch 1982, Causton 1988, Jongman 1995 a. Qualitative:

More information

DETECTING BIOLOGICAL AND ENVIRONMENTAL CHANGES: DESIGN AND ANALYSIS OF MONITORING AND EXPERIMENTS (University of Bologna, 3-14 March 2008)

DETECTING BIOLOGICAL AND ENVIRONMENTAL CHANGES: DESIGN AND ANALYSIS OF MONITORING AND EXPERIMENTS (University of Bologna, 3-14 March 2008) Dipartimento di Biologia Evoluzionistica Sperimentale Centro Interdipartimentale di Ricerca per le Scienze Ambientali in Ravenna INTERNATIONAL WINTER SCHOOL UNIVERSITY OF BOLOGNA DETECTING BIOLOGICAL AND

More information

Correspondence Analysis & Related Methods

Correspondence Analysis & Related Methods Correspondence Analysis & Related Methods Michael Greenacre SESSION 9: CA applied to rankings, preferences & paired comparisons Correspondence analysis (CA) can also be applied to other types of data:

More information

Bootstrapped ordination: a method for estimating sampling effects in indirect gradient analysis

Bootstrapped ordination: a method for estimating sampling effects in indirect gradient analysis Vegetatio 8: 153-165, 1989. 1989 Kluwer Academic Publishers. Printed in Belgium. 153 Bootstrapped ordination: a method for estimating sampling effects in indirect gradient analysis Robert G. Knox ~ & Robert

More information

Diversity partitioning without statistical independence of alpha and beta

Diversity partitioning without statistical independence of alpha and beta 1964 Ecology, Vol. 91, No. 7 Ecology, 91(7), 2010, pp. 1964 1969 Ó 2010 by the Ecological Society of America Diversity partitioning without statistical independence of alpha and beta JOSEPH A. VEECH 1,3

More information

Multivariate Statistics 101. Ordination (PCA, NMDS, CA) Cluster Analysis (UPGMA, Ward s) Canonical Correspondence Analysis

Multivariate Statistics 101. Ordination (PCA, NMDS, CA) Cluster Analysis (UPGMA, Ward s) Canonical Correspondence Analysis Multivariate Statistics 101 Ordination (PCA, NMDS, CA) Cluster Analysis (UPGMA, Ward s) Canonical Correspondence Analysis Multivariate Statistics 101 Copy of slides and exercises PAST software download

More information

Lecture 2: Diversity, Distances, adonis. Lecture 2: Diversity, Distances, adonis. Alpha- Diversity. Alpha diversity definition(s)

Lecture 2: Diversity, Distances, adonis. Lecture 2: Diversity, Distances, adonis. Alpha- Diversity. Alpha diversity definition(s) Lecture 2: Diversity, Distances, adonis Lecture 2: Diversity, Distances, adonis Diversity - alpha, beta (, gamma) Beta- Diversity in practice: Ecological Distances Unsupervised Learning: Clustering, etc

More information

Experimental Design and Data Analysis for Biologists

Experimental Design and Data Analysis for Biologists Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1

More information

Gradient types. Gradient Analysis. Gradient Gradient. Community Community. Gradients and landscape. Species responses

Gradient types. Gradient Analysis. Gradient Gradient. Community Community. Gradients and landscape. Species responses Vegetation Analysis Gradient Analysis Slide 18 Vegetation Analysis Gradient Analysis Slide 19 Gradient Analysis Relation of species and environmental variables or gradients. Gradient Gradient Individualistic

More information

Chapter 11 Canonical analysis

Chapter 11 Canonical analysis Chapter 11 Canonical analysis 11.0 Principles of canonical analysis Canonical analysis is the simultaneous analysis of two, or possibly several data tables. Canonical analyses allow ecologists to perform

More information

Distance-based multivariate analyses confound location and dispersion effects

Distance-based multivariate analyses confound location and dispersion effects Methods in Ecology and Evolution 2012, 3, 89 101 doi: 10.1111/j.2041-210X.2011.00127.x Distance-based multivariate analyses confound location and dispersion effects David I. Warton 1 *, Stephen T. Wright

More information

A Theory of Gradient Analysis

A Theory of Gradient Analysis Originally Published in Volume 18 (this series), pp 271 317, 1988 A Theory of Gradient Analysis CAJO J.F. TER BRAAK AND I. COLIN PRENTICE I. Introduction... 236 II. Linear Methods... 241 A. Regression...

More information

Rank-abundance. Geometric series: found in very communities such as the

Rank-abundance. Geometric series: found in very communities such as the Rank-abundance Geometric series: found in very communities such as the Log series: group of species that occur _ time are the most frequent. Useful for calculating a diversity metric (Fisher s alpha) Most

More information

Vegetation Structure Assessment (VSA):

Vegetation Structure Assessment (VSA): Vegetation Structure Assessment (VSA): LFA Procedures for Measuring Vegetation Structure and its Functional Role Vegetation plays an important functional role in providing goods and services for both itself

More information

Unconstrained Ordination

Unconstrained Ordination Unconstrained Ordination Sites Species A Species B Species C Species D Species E 1 0 (1) 5 (1) 1 (1) 10 (4) 10 (4) 2 2 (3) 8 (3) 4 (3) 12 (6) 20 (6) 3 8 (6) 20 (6) 10 (6) 1 (2) 3 (2) 4 4 (5) 11 (5) 8 (5)

More information

THE ROLE OF COMPUTER BASED TECHNOLOGY IN DEVELOPING UNDERSTANDING OF THE CONCEPT OF SAMPLING DISTRIBUTION

THE ROLE OF COMPUTER BASED TECHNOLOGY IN DEVELOPING UNDERSTANDING OF THE CONCEPT OF SAMPLING DISTRIBUTION THE ROLE OF COMPUTER BASED TECHNOLOGY IN DEVELOPING UNDERSTANDING OF THE CONCEPT OF SAMPLING DISTRIBUTION Kay Lipson Swinburne University of Technology Australia Traditionally, the concept of sampling

More information

Dissimilarity and transformations. Pierre Legendre Département de sciences biologiques Université de Montréal

Dissimilarity and transformations. Pierre Legendre Département de sciences biologiques Université de Montréal and transformations Pierre Legendre Département de sciences biologiques Université de Montréal http://www.numericalecology.com/ Pierre Legendre 2017 Definitions An association coefficient is a function

More information

VarCan (version 1): Variation Estimation and Partitioning in Canonical Analysis

VarCan (version 1): Variation Estimation and Partitioning in Canonical Analysis VarCan (version 1): Variation Estimation and Partitioning in Canonical Analysis Pedro R. Peres-Neto March 2005 Department of Biology University of Regina Regina, SK S4S 0A2, Canada E-mail: Pedro.Peres-Neto@uregina.ca

More information

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission.

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission. Effects of Sample Distribution along Gradients on Eigenvector Ordination Author(s): C. L. Mohler Source: Vegetatio, Vol. 45, No. 3 (Jul. 31, 1981), pp. 141-145 Published by: Springer Stable URL: http://www.jstor.org/stable/20037040.

More information

CAP. Canonical Analysis of Principal coordinates. A computer program by Marti J. Anderson. Department of Statistics University of Auckland (2002)

CAP. Canonical Analysis of Principal coordinates. A computer program by Marti J. Anderson. Department of Statistics University of Auckland (2002) CAP Canonical Analysis of Principal coordinates A computer program by Marti J. Anderson Department of Statistics University of Auckland (2002) 2 DISCLAIMER This FORTRAN program is provided without any

More information

EXAM PRACTICE. 12 questions * 4 categories: Statistics Background Multivariate Statistics Interpret True / False

EXAM PRACTICE. 12 questions * 4 categories: Statistics Background Multivariate Statistics Interpret True / False EXAM PRACTICE 12 questions * 4 categories: Statistics Background Multivariate Statistics Interpret True / False Stats 1: What is a Hypothesis? A testable assertion about how the world works Hypothesis

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Figure 43 - The three components of spatial variation

Figure 43 - The three components of spatial variation Université Laval Analyse multivariable - mars-avril 2008 1 6.3 Modeling spatial structures 6.3.1 Introduction: the 3 components of spatial structure For a good understanding of the nature of spatial variation,

More information
