Effects of Sample Distribution along Gradients on Eigenvector Ordination
Author(s): C. L. Mohler
Source: Vegetatio, Vol. 45, No. 3 (Jul. 31, 1981), pp. 141-145
Published by: Springer
Stable URL: http://www.jstor.org/stable/20037040
Effects of sample distribution along gradients on eigenvector ordination

C. L. Mohler*
Section of Ecology and Systematics, Cornell University, Ithaca, NY 14850, USA

* I thank Mark V. Wilson, Peter L. Marks, Hugh G. Gauch, the late R. H. Whittaker, E. van der Maarel, and several anonymous reviewers for helpful comments on the manuscript, and Monica Howland for preparing the figures. This work was supported by McIntire-Stennis Grant No. 183-7551 and a grant from the National Park Service, both to Peter L. Marks of the Section of Ecology and Systematics at Cornell University.

Vegetatio 45, 141-145 (1981). 0042-3106/81/0453-0141 $1.00. © Dr. W. Junk Publishers, The Hague. Printed in The Netherlands.

Keywords: Correspondence analysis, Detrended correspondence analysis, Eigenvector ordination, Gradient analysis, Principal components analysis, Sample distribution, Stratified sampling

Abstract

In general, disproportionately heavy sampling of the ends of a gradient increases the interpretability of eigenvector ordinations. More specifically, correspondence analysis (CA) and detrended correspondence analysis (DCA) best reproduce the original positions of samples in simulated coenoclines when samples are clustered toward the ends of the axis. Principal components analysis (PCA) reproduces the original sample positions less well than either CA or DCA and shows no improvement as samples are increasingly clustered toward the ends of the axis. PCA and CA show less curvature of one dimensional data into the second axis when sampling favors the ends of the axis.

Introduction

Ordination is often used to discover and elucidate major axes of compositional variation in vegetation data. In general, however, most phytosociologists have at least a rough idea of the variety of vegetation in an area even before formal sampling. When this is the case it is possible to stratify sampling with respect to the dominant axis (axes) of variation. This paper explores some effects of the pattern of sample stratification on the performance of ordination techniques. The somewhat complementary problem of determining compositional distance between samples on an axis will be dealt with elsewhere. This study was prompted by the discovery that disproportionately heavy sampling of the ends of a gradient allows greater accuracy in plotting the response of species abundance against environmental variables and ordination axes.

There is a great variety of ordination techniques, and the effect of sample arrangement on ordination will depend on the technique used. First, as Gauch et al. (1977) indicate, sample dispersion can have no effect on Bray-Curtis polar ordination (Bray & Curtis, 1957) because the technique arranges samples only with respect to selected end points. Second, in weighted average ordination (Ellenberg, 1948; Whittaker, 1956) sample positions are assigned only on the basis of species weights (and vice versa), so that clustering the samples can have no effect on their relative positions in the ordination. Third, Gaussian ordination (Gauch et al., 1974; Ihm & van Groenewoud, 1975) should be improved by heavy sampling toward the ends of the gradient, since such a dispersion of samples gives maximum information about the more poorly defined curves.

The most common and generally useful techniques, however, are principal components analysis (PCA) and correspondence analysis (CA, also called reciprocal averaging). Gauch et al. (1977) find some changes in both PCA and CA
when clusters of samples are added to a coenocline sampled at regular intervals. Although their discussion is brief, apparently CA is only slightly affected whereas PCA is strongly and unpredictably affected.

PCA and CA ordinations of coenoclines generally show 'arch distortion' (Gauch et al., 1977), in which samples (or species) lying originally on a single axis are displaced into higher axes. Although this is considered an inconvenience in arranging data, it is a logical consequence of the nature of the technique: samples (species) with low first axis loadings necessarily differ from those with high loadings, and the two sorts therefore fall at opposite ends of some higher axis. Accordingly, it seems reasonable to expect that bunching samples at the ends should raise the percent of variance accounted for by the first axis and reduce the variance accounted for by the second and higher axes, which is to say, reduce the curvature into higher axes.

Recently Hill & Gauch (1980) have introduced detrended correspondence analysis (DCA), in which curvature of the primary axes into higher axes is systematically eliminated and positions are adjusted to remove CA's tendency to bunch species and samples near the ends of axes.

Two hypotheses based on the above considerations were tested: (1) that deviation from regular sampling of a coenocline causes eigenvector ordinations to distort first axis sample arrangements, and (2) that concentration of samples near the ends of the gradient decreases curvature into the second axis (except in the case of DCA, where such curvature is secondarily removed).

Methods

I first created three replicates of a 19 species, 21 stand coenocline using the simulation algorithm of Gauch & Whittaker (1972), modified so that the expected standard deviation of a species' abundance at each point on the gradient is proportional to its mean abundance at that point. This modification produces data with a structure similar to that from a variety of gradient analysis studies (Mohler, 1979 and unpublished data). The coenocline measured 2.9 half changes using the technique of Wilson & Mohler (in press). I concatenated the three data sets in various ways to produce 11 coenoclines sampled according to the eight patterns diagrammed in Figure 1. (Some coenoclines replicated certain sampling patterns.) Each data set was then ordinated using CA, DCA, species centered PCA, and species centered PCA with standardization.

[Figure 1: diagram of sample positions along the coenocline axis (0 to 100) for each sampling pattern, with the number of coenoclines (N) and the dispersion index for each pattern.]

Fig. 1. Sample dispersion patterns for ordination of simulated data. Coenocline index values are computed by the formula $I_d = \frac{2}{n} \sum_{j=1}^{n} |X_j - \bar{X}|$, where $X_j$ is the axial position of the $j$th sample, $\bar{X}$ is the mean of $X_j$, and $n$ is the number of samples. In all cases considered here $n = 21$ and $\bar{X} = 50$, except the last where $\bar{X} = 48.33$. The index runs from 0 (all samples at $\bar{X}$) to 100 (half of the samples at 0 and half at 100).

To compare the dispersion of samples on the original axis with the dispersion on the ordination axes I computed the index

$$I_d = \frac{2}{n} \sum_{j=1}^{n} \left| X_j - \bar{X} \right| \qquad (1)$$

where $X_j$ is the axial position of sample $j$, $\bar{X}$ is the mean of $X_j$, and $n$ is the number of samples. The index runs from 0 (all samples at $\bar{X}$) to 100 (half of the samples at 0 and half at 100). Thus, for example, if $I_d$ is smaller for an ordination than for the coenocline, the ordination is tending to shift samples toward the center of moment of the axis.

For each sampling pattern I also computed the mean displacement of samples on ordination axes from their position on the original coenocline (Kessell & Whittaker, 1976). Mean sample displacement is a measure of the overall lack of
congruence between the two sample arrangements, whereas comparison of $I_d$ values indicates whether a lack of congruence is due to systematic shift of samples with respect to the center of moment of the axis.

Table 1. Dispersion pattern of coenoclines and ordinations, and mean displacement of samples on each ordination from their original positions on the corresponding coenocline. All coenocline and ordination axes were scaled from 0 to 100. Dispersion pattern is quantified by the index $I_d$ (see text). Note that some dispersion patterns are replicated. The PCA ordinations were species centered with standardization.

                    Dispersion index I_d                        Mean sample displacement
Sampling pattern    Coenocline   DCA      CA       PCA          DCA     CA      PCA
                    axis         axis 1   axis 1   axis 1
Center favored      23.8         19.9     14.7     14.0         5.0     9.7     14.9
                    30.5         28.2     25.0     18.3         3.6     12.8    7.9
                    40.0         35.5     34.7     22.3         4.1     8.9     9.2
Regular spacing                  47.5     51.4     30.7         3.4     6.3     11.0
                                 45.1     48.0     35.5         5.6     7.7     9.3
                                 47.7     51.2     31.6         2.8     5.7     10.6
Ends favored        68.6         62.0     68.2     46.4         4.8     4.8     11.7
                    68.6         63.6     69.1     47.1         3.6     4.7     11.6
                    77.1         73.2     79.1     58.6         2.6     3.8     10.2
                    80.0         76.0     81.6     61.9         2.7     3.6     10.0
                    87.0         87.0     89.7     67.0         1.8     6.3     12.9

Results

For CA and DCA the mean displacement of samples from their true position generally decreases as the samples become more clustered toward the ends (Table 1). The only notable exception to this trend is the slightly larger mean displacement for the most heavily end-weighted coenocline when using CA. Correlations between coenoclines and ordinations show almost exactly the same pattern. Placement of stands is consistently more accurate with DCA than with CA regardless of sampling scheme.

In contrast to the two types of correspondence analysis, the mean displacement for standardized PCA does not improve when samples are clustered toward the ends (Table 1). On the other hand, correlations between coenocline and PCA axis positions do show a general improvement when samples are clustered toward the ends.
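The two measures reported in Table 1 can be illustrated with a short Python sketch. This is not the original program: the function names are hypothetical, the dispersion index is assumed to be twice the mean absolute deviation of sample positions from their mean (a form consistent with the stated range of 0 to 100), and mean sample displacement is assumed to be the mean absolute difference between original and ordination positions after both axes are rescaled to 0-100. Because the direction of an eigenvector axis is arbitrary, the sketch also assumes that the better of the two axis orientations is the one of interest.

```python
import numpy as np

def dispersion_index(positions):
    """Assumed form of the index: twice the mean absolute deviation from the mean."""
    x = np.asarray(positions, dtype=float)
    return 2.0 * np.mean(np.abs(x - x.mean()))

def rescale_0_100(scores):
    """Linearly rescale an axis so that it runs from 0 to 100."""
    s = np.asarray(scores, dtype=float)
    span = s.max() - s.min()
    if span == 0:
        raise ValueError("all samples share one position; axis cannot be rescaled")
    return 100.0 * (s - s.min()) / span

def mean_displacement(original, ordination):
    """Mean absolute displacement between two 0-100 axes (assumed definition).

    Both orientations of the ordination axis are tried, since the sign of an
    eigenvector axis is arbitrary; the smaller displacement is returned.
    """
    a = rescale_0_100(original)
    b = rescale_0_100(ordination)
    return min(np.mean(np.abs(a - b)), np.mean(np.abs(a - (100.0 - b))))

# 21 samples at regular intervals along a 0-100 gradient
regular = np.linspace(0.0, 100.0, 21)
print(round(dispersion_index(regular), 1))            # 52.4
# Half of the samples at each end gives the maximum value of the index
print(dispersion_index([0.0] * 10 + [100.0] * 10))    # 100.0
# A reversed but otherwise perfect ordination axis has zero displacement
print(mean_displacement(regular, -regular))           # 0.0
```

Comparing `dispersion_index` for the coenocline positions with the same index for rescaled ordination scores indicates whether an ordination pulls samples toward or away from the center of the axis, while `mean_displacement` summarizes overall misplacement.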
In general, misplacement of samples is worse for species centered PCA with standardization than for either CA or DCA. PCA without standardization shows involution, in which the true end samples are displaced toward the center and samples at intermediate positions are shifted outward. Although involution with unstandardized PCA occurs with all sampling schemes tested, clustering samples toward the ends decreases this sort of distortion. Thus, as samples go from highly bunched toward the center to highly concentrated near the ends, separation of coenocline end samples on the ordination axis increases from 30 units to 85 units.

DCA reproduces the original dispersion pattern fairly well for all data sets tested (Table 1). Bunching samples toward the center of the gradient tends to produce CA ordinations in which the samples are even more centrally clustered than on the original axis, as indicated by $I_d$ values which are lower for the ordinations than for the original gradient (Table 1). On the other hand, $I_d$ values for CA are close to those for the original gradient for all sampling patterns which favor the gradient extremes (Table 1). PCA does not perform as well. Relative to the original dispersion, species centered PCA with standardization always tends to bunch samples toward the mean position (Table 1). Since nonstandardized PCA does not preserve sample sequence, there is little point in computing $I_d$ values for this technique.

With CA and the two PCA techniques, degree of curvature into the second axis is less when sampling
favors the ends. Percent of variance accounted for by the first CA axis rises from 44% for the most centrally clustered pattern to 75% for the pattern which most favors the extremes (Fig. 2). In a complementary fashion, percentage of variance accounted for by the second axis falls from 39% to 9% (Fig. 2). Both PCA procedures perform in a similar manner.

[Figure 2: plot of variance accounted for (% EV) against the coenocline sample dispersion index.]

Fig. 2. Variance accounted for by an ordination versus sample dispersion. Ordinate is variance accounted for (% EV) by the first and second correspondence analysis axes; abscissa is the sample dispersion index for the coenocline (see text). The upper curve plots % EV for the first axis; the lower plots % EV for the second axis. Trend lines are fitted by eye. % EV values for the noiseless coenocline are represented by X.

Discussion

Eigenvector ordination seems to work best when sampling pattern favors the gradient extremes. CA and DCA reproduce the original sample positions more exactly when the ends of the gradient are sampled more heavily than the center. Although DCA adjusts sample positions to correct for systematic displacement, this does not necessarily correct for misplacement due to random variation in the data. Concentration of samples near the ends of the gradient through stratified sampling does seem to correct for some additional component of error, but the actual mechanism involved is unclear.

As in previous evaluations (Chardy et al., 1976; Gauch et al., 1977), PCA proved less able to recover the original sample positions than CA. Recovery of original positions by PCA does not improve when samples are clustered toward the extremes; nevertheless, heavy sampling of the ends results in less involution and less curvature into the second axis and thus makes PCA ordinations more interpretable than with regular or centrally clustered sampling.
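The relation between sampling pattern and the share of variance carried by the first two CA axes can be explored with a small simulation. The following Python sketch is only an illustration under stated assumptions: it uses noiseless Gaussian species curves rather than the modified Gauch & Whittaker (1972) algorithm used in this study, the species optima, tolerance, and sample positions are hypothetical values chosen for the example, and CA is computed by singular value decomposition of the standardized residuals of the data table, which is one standard way of obtaining the CA eigenvalues.

```python
import numpy as np

def gaussian_coenocline(positions, n_species=19, tolerance=20.0, peak=100.0):
    """Noiseless Gaussian species curves sampled at the given axis positions.

    Optima are spread evenly over and slightly beyond a 0-100 gradient.
    All parameter values here are illustrative assumptions.
    """
    positions = np.asarray(positions, dtype=float)
    optima = np.linspace(-10.0, 110.0, n_species)
    z = (positions[:, None] - optima[None, :]) / tolerance
    return peak * np.exp(-0.5 * z ** 2)

def ca_percent_inertia(table):
    """Percent of total inertia accounted for by each CA axis."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()                      # correspondence matrix
    row_mass = p.sum(axis=1)
    col_mass = p.sum(axis=0)
    expected = np.outer(row_mass, col_mass)
    residuals = (p - expected) / np.sqrt(expected)
    # Singular values are returned in descending order
    singular_values = np.linalg.svd(residuals, compute_uv=False)
    eigenvalues = singular_values ** 2
    return 100.0 * eigenvalues / eigenvalues.sum()

# Three hypothetical 21-sample patterns on a 0-100 gradient
patterns = {
    "center favored": np.concatenate(([0.0], np.linspace(35.0, 65.0, 19), [100.0])),
    "regular spacing": np.linspace(0.0, 100.0, 21),
    "ends favored": np.concatenate(
        (np.linspace(0.0, 20.0, 10), [50.0], np.linspace(80.0, 100.0, 10))
    ),
}

for name, positions in patterns.items():
    pct = ca_percent_inertia(gaussian_coenocline(positions))
    print(f"{name:16s} axis 1: {pct[0]:5.1f}%   axis 2: {pct[1]:5.1f}%")
```

The percentages printed by such a sketch depend on the assumed curve widths and sample positions and should not be expected to match the values in Fig. 2; the point is only to show how the comparison across sampling patterns can be set up.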
Although I made no attempt to evaluate the effect of sampling pattern on ordinations of data with two or more major axes of variation, results of this study can be generalized to more complex data sets. First, from the previous discussion it seems reasonable to expect that whenever a factor is known to affect community composition, stratification which favors the extremes of that factor will improve stand placement on CA or DCA axes which correlate highly with that factor. Increased accuracy in sample placement due to multiple axis stratification is likely to be greatest with DCA, since the various axes will then lack the complicating distortions found in other eigenvector techniques.

Second, with both CA and PCA, curvature of the first axis into higher axes complicates interpretation even when there are several major dimensions of variability in the data. Sample stratification which favors extremes of the first dimension should reduce this curvature. Furthermore, stratification of samples with respect to the second dimension of variation should reduce curvature of the second ordination axis into the third, and so forth.

Since stratification which favors the extremes loads additional variability into particular dimensions, it can influence which dimension emerges as the dominant axis. A change in the ranking of axes will be of little importance in most applications, however, since relative placement of samples within the ordination field will be largely independent of which axis appears with the largest eigenvalue.

The consequences of this study for field workers are clear: in order to visualize variability in vegetation using eigenvector ordination one must capture variability in the sample. Accordingly, extreme and unusual environments should be strongly favored during sampling. In contrast, since intermediate environments tend to be more common than extreme ones within most landscapes, a haphazard or simple random sample is likely to result in
ordinations of low interpretability. I demonstrate elsewhere that stratified sampling which favors extreme communities also greatly improves estimates of species distribution parameters.

References

Bray, J. R. & Curtis, J. T., 1957. An ordination of the upland forest communities of southern Wisconsin. Ecol. Monogr. 27: 325-349.
Chardy, P., Glemarec, M. & Laurec, A., 1976. Application of inertia methods to benthic marine ecology: practical implications of the basic options. Estuarine Coastal Mar. Sci. 4: 179-205.
Ellenberg, H., 1948. Unkrautgesellschaften als Mass für den Säuregrad, die Verdichtung und andere Eigenschaften des Ackerbodens. Ber. Landtech. 4: 130-146.
Gauch, H. G. Jr. & Whittaker, R. H., 1972. Coenocline simulation. Ecology 53: 446-451.
Gauch, H. G. Jr., Chase, G. B. & Whittaker, R. H., 1974. Ordination of vegetation samples by Gaussian species distributions. Ecology 55: 1382-1390.
Gauch, H. G. Jr., Whittaker, R. H. & Wentworth, T. R., 1977. A comparative study of reciprocal averaging and other ordination techniques. J. Ecol. 65: 157-174.
Hill, M. O. & Gauch, H. G. Jr., 1980. Detrended correspondence analysis, an improved ordination technique. Vegetatio 42: 47-59.
Ihm, P. & Groenewoud, H. van, 1975. A multivariate ordering of vegetation data based on Gaussian type gradient response curves. J. Ecol. 63: 767-777.
Kessell, S. R. & Whittaker, R. H., 1976. Comparisons of three ordination techniques. Vegetatio 32: 21-29.
Mohler, C. L., 1979. An analysis of floodplain vegetation of the lower Neches drainage, southeast Texas, with some considerations on the use of regression and correlation in plant synecology. Ph.D. Thesis, Cornell Univ., Ithaca, N.Y.
Whittaker, R. H., 1956. Vegetation of the Great Smoky Mountains. Ecol. Monogr. 26: 1-80.
Wilson, M. V. & Mohler, C. L., In press. GRADBETA - a FORTRAN program for measuring compositional change along gradients. Ecology & Systematics, Cornell Univ., Ithaca, N.Y. 51 p.

Accepted 5.1.1981.