Some Processes of Numerical Taxonomy in Terms of Distance


JEAN R. PROCTOR

Systematic Zoology, pp. 131-140

Abstract

A connection is established between matching coefficients and distance in n-dimensional space for a variety of character scoring procedures. Following this, geometric interpretations are given of some clustering methods used for matching coefficients, and the merits of the methods are considered. As a consequence, it is suggested that the average coefficient method should be replaced by the centroid method.

Introduction

A classification by numerical taxonomic processes usually involves three main stages: the numerical coding of taxonomic characters for each organism; the calculation of a measure of resemblance between each pair of organisms, using the coded characters; and the grouping of organisms on the basis of the measures of resemblance. A number of methods have been proposed for each of these three stages, but there has been a tendency to treat each stage as a separate problem, whereas a unified approach, in which the clustering methods are related to the definition and method of formation of the measure of resemblance, would seem more satisfactory. Such an approach is followed in this paper using the simple matching coefficient (Sokal and Michener, 1958) as the measure of resemblance.

The existence of a relationship between the matching coefficient and the distance coefficient of Sokal (1961) suggested the possibility of examining in terms of distance some of the clustering methods that have been used for matching coefficients. In order to do this, the coded characters are regarded as defining a space of n dimensions within which each organism is represented by a point, the distance between points being a measure of resemblance inversely related to the matching coefficient.

An advantage of working in terms of distance rather than with matching coefficients is that it is possible to visualize, at least partially, what the various methods imply geometrically and thus obtain a better understanding of the processes of cluster analysis. Also, the distance approach appears to have greater potential for forming a unified theoretical approach to all aspects of numerical taxonomy.

The connection between the matching coefficient and Sokal's distance coefficient is discussed briefly by Sokal and Sneath (1963:151) but only for coefficients based entirely on two-state characters. Since other types of characters are frequently used for matching coefficients, part of this paper consists of a description of the various scoring methods used for these characters and a distance interpretation of these methods. After establishing a relationship between matching coefficients and distance for various types of character scoring, some of the clustering methods used with matching coefficients are discussed in terms of distance, and the centroid method of cluster analysis is introduced.

To a large extent the terminology of Sokal and Sneath (1963) is followed in this paper. Following their usage, an organism, or basic unit of the material on which a numerical taxonomic analysis is carried out, is referred to as an OTU (operational taxonomic unit) in the remainder of this paper. One small change is that the symbol $S_{SM}$ for the matching coefficient is here simplified to $S$, since no other association coefficients are discussed. Where different terms are used, the Sokal and Sneath equivalents are usually stated. In this paper the term "centroid" has its usual physical meaning of centre of mass or centre of gravity.

The Matching Coefficient, Distance, and the Distance Coefficient

If two OTUs I and J agree on $m_{ij}$ out of n simple two-state characters, the matching coefficient, $S_{ij}$, is defined by

$$S_{ij} = m_{ij}/n. \tag{1}$$

For Sokal's distance coefficient, $d_{ij}^2$, each of the n two-state characters is allotted one dimension in which the two character states may be represented by the values 0 and 1. Then, if $x_i(p)$ and $x_j(p)$ are the values scored by I and J respectively in the pth dimension, the distance between I and J in this dimension is $|x_i(p) - x_j(p)|$. Since the dimensions are orthogonal, the distance $\Delta_{ij}$ between I and J in the n-dimensional space is given by

$$\Delta_{ij}^2 = [x_i(1) - x_j(1)]^2 + [x_i(2) - x_j(2)]^2 + \dots + [x_i(n) - x_j(n)]^2.$$

In any dimension p, if I and J have the same score, then $[x_i(p) - x_j(p)]^2 = 0$, whereas, if they score differently, then $[x_i(p) - x_j(p)]^2 = 1$. Thus $\Delta_{ij}^2$ is equal to the number of characters in which I and J differ, i.e.

$$\Delta_{ij}^2 = n - m_{ij}. \tag{2}$$

Then, from equations (1) and (2),

$$S_{ij} = 1 - \Delta_{ij}^2/n. \tag{3}$$

Thus, when two-state characters alone are used, there is a simple relationship between the matching coefficient and distance in n-dimensional space.

Sokal's distance coefficient for two-state characters coded 0 and 1 is simply the squared distance scaled so that the coefficient cannot exceed unity. Since the greatest value attainable by $\Delta_{ij}^2$ is n, $d_{ij}^2$ is defined by

$$d_{ij}^2 = \Delta_{ij}^2/n.$$

Then, using equation (3),

$$S_{ij} + d_{ij}^2 = 1, \tag{4}$$

that is, the sum of the matching coefficient and the distance coefficient is unity. For simplicity, the discussion in this paper will be in terms of distance rather than the distance coefficient, except where the latter is specifically mentioned. Since the distance coefficient is proportional to the squared distance, there is a simple transition from one measure to the other.
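The relationships in equations (1) to (4) can be checked on a small example. The following is a minimal sketch in Python, not part of the original paper; the function names and the two OTU score vectors are hypothetical and chosen only for illustration.

```python
def matching_coefficient(x, y):
    """Equation (1): proportion of characters on which two OTUs agree."""
    if len(x) != len(y):
        raise ValueError("OTUs must be scored on the same characters")
    agreements = sum(1 for a, b in zip(x, y) if a == b)
    return agreements / len(x)


def squared_distance(x, y):
    """Squared Euclidean distance, one dimension per character."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def distance_coefficient(x, y):
    """Squared distance scaled by n so that it cannot exceed unity."""
    return squared_distance(x, y) / len(x)


# Hypothetical scores for two OTUs on eight two-state characters (0 or 1).
otu_i = [1, 0, 1, 1, 0, 0, 1, 0]
otu_j = [1, 1, 1, 0, 0, 0, 1, 1]

s = matching_coefficient(otu_i, otu_j)   # 5 agreements out of 8
d2 = distance_coefficient(otu_i, otu_j)  # 3 differences out of 8
print(s, d2, s + d2)
```

For these two vectors the OTUs agree on five characters and differ on three, so the sketch prints 0.625, 0.375 and 1.0, in agreement with equation (4).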
Character Scoring and Distance

For the relationship between matching coefficients and distance (equation 3) to apply to matching coefficients in general, a distance interpretation, consistent with this relationship, is needed for all types of character scoring commonly used with these coefficients. The problems to be considered include missing information and character weighting, as well as the scoring of multistate, dependent, and quantitative characters.

1. Missing Information. Where it is impossible to score a character for any particular OTU, it is customary to record NC (no comparison) and ignore the character in comparing this OTU with any other OTU, whether or not the latter is scored on this character. Thus, if, on comparing two OTUs on n characters, they agree on c, cannot be compared on y, and differ on (n-c-y), the matching coefficient will be $c/(n-y)$. This can also be written

$$\frac{c + y\,c/(n-y)}{n},$$

in which form it appears that the degree of similarity on the y missing characters is estimated from the coefficient based on the remaining characters. In distance terms, the squared distance between the OTUs on the missing characters is estimated by the average squared distance between the OTUs on the other characters.

2. Weighting. Giving a two-state character weight w is equivalent to representing the two states in one dimension by the values 0 and $\sqrt{w}$ instead of 0 and 1. Weighting can also be considered to be equivalent to representing a character in each of w dimensions by the values 0 and 1. Since in these w dimensions only two points, distance $\sqrt{w}$ apart, are used in scoring a two-state character, the two distance representations are equivalent, the second merely introducing w-1 redundant dimensions.

3. Multistate Characters. Several scoring methods have been proposed for multistate characters, i.e., characters consisting of more than two mutually exclusive states. With the non-additive method (Beers and Lockhart, 1962), two OTUs score one similarity if they agree on a multistate character and one difference if they disagree. A concise way of coding multistate characters to achieve this purpose is described by Lockhart and Hartman (1963).

A three-state character scored by the non-additive method can be fitted into the distance scheme by allotting it two dimensions instead of the one dimension of a two-state character, the three states being represented by the vertices of an equilateral triangle with sides of unit length. Similarly, the states of a four-state character can be represented in three dimensions by the vertices of a regular tetrahedron with unit sides. In general, a k-state character can be represented in (k-1) dimensions by an equilateral figure with sides of unit length. The distance between any pair of OTUs compared on a single multistate character represented in this way is either unity or zero, as with a two-state character.
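The equilateral representation of a multistate character can be illustrated numerically. The sketch below is not from the paper. For convenience it assumes k coordinates per character (a scaled "one state per axis" coding) rather than the k-1 dimensions mentioned in the text; the k points then lie in a (k-1)-dimensional hyperplane, so the geometry is the same. The function names are hypothetical.

```python
import math


def simplex_vertex(state, k, side=1.0):
    """Coordinates of one state of a k-state character.

    The k states are placed at the vertices of an equilateral figure
    with the given side length (assumed coding: one axis per state).
    """
    scale = side / math.sqrt(2.0)
    return [scale if s == state else 0.0 for s in range(k)]


def squared_distance(x, y):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def all_pairs_squared(k, side=1.0):
    """Squared distances between every pair of distinct states."""
    points = [simplex_vertex(s, k, side) for s in range(k)]
    return [squared_distance(points[a], points[b])
            for a in range(k) for b in range(a + 1, k)]


# Four-state character, non-additive method: sides of unit length.
unit = all_pairs_squared(4)
# Same character with sides of length sqrt(2): every difference then
# contributes 2 to the squared distance instead of 1.
root2 = all_pairs_squared(4, math.sqrt(2.0))
print(unit)
print(root2)
```

Every pair of distinct states is the same distance apart, so a disagreement on the character contributes the same amount to the squared distance whichever two states are involved, and an agreement contributes zero.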
It should be noted that although the inclusion of a k-state character increases the number of dimensions by (k-1), the number of characters, and hence the denominator of the matching and distance coefficients, is increased by one only when the non-additive method is used.

Another scoring method is to treat each state as if it were a separate two-state "present or absent" character. If two OTUs agree on a k-state qualitative character, k similarities are recorded, whereas if they differ, there are k-2 similarities and 2 differences. Thus a k-state character introduces k-2 redundant similarities into the matching coefficient, and each multistate character has a weight of 2. If multistate characters are to have the same weight as two-state characters, this scoring method must be applied to the latter also. If this is done, a matching coefficient obtained using this method is equal to the proportion of character states on which two OTUs agree rather than the proportion of characters, as with the non-additive method. The geometric interpretation of this second method is similar to that of the non-additive method, except that the equilateral figure representing a multistate character has sides of length $\sqrt{2}$ instead of 1. Redundant similarities do not affect the distance, but merely the denominators of the distance and matching coefficients.

If the states of a multistate character can be ordered, the additive scoring method is sometimes used. Each state is again treated as a separate character, but each OTU is recorded as having not only the character state in which it falls, but all lower states also. The differences between two OTUs on a k-state character scored in this way can vary from 0 to k-1, so that the character is given weight k-1. In terms of distance, a k-state character scored by the additive method can be represented in k-1 dimensions by the k points (0, 0, 0, ...), (1, 0, 0, ...), (1, 1, 0, ...), (1, 1, 1, ...), etc.

4. Dependent Characters. Another type of character which presents scoring problems is the secondary or dependent character, i.e., a character which cannot be considered unless some other, primary feature is present. An OTU which does not possess a primary feature cannot be scored on any of its secondary characters. Although only one level of dependence is considered in this paper, various levels are possible. The subject of dependence in numerical taxonomy has been discussed by Proctor and Kendrick (1963), Kendrick and Proctor (1964), Lockhart (1964) and Kendrick (1965).

Two scoring methods have been suggested for secondary characters. The first (Proctor and Kendrick, 1963; Kendrick and Proctor, 1964) weights a primary character according to the number of its secondary characters and codes a secondary character NC if the primary character is absent. The distance interpretation of this method is therefore covered by sections 1 and 2 above. In the second method, proposed by Lockhart (1964), primary characters are not weighted, but an additional character state is added to each secondary character. If an OTU does not possess a primary feature, the additional state of each associated secondary character is recorded as present. Since the non-additive scoring method for multistate characters is used, the distance interpretation of this method is covered by part of section 3 above.

5. Quantitative Characters. In forming matching coefficients, quantitative characters are usually treated as multistate characters, the range of measurement being split into arbitrary states. They are then scored by either the non-additive or the additive method, although it is recognized that neither of these methods is entirely satisfactory: if the non-additive method is used, the degree of difference between states is not taken into account, whereas, if the additive method is used, too much weight is given to each quantitative character. Kendrick and Proctor (1964) suggested that fractional values be given to differences in quantitative characters, and this has been discussed further by Kendrick (1964, 1965). But, from the matching coefficient standpoint, it is not evident how fractional similarities and differences should be assessed.
For a distance coefficient, quantitative features appear to be the most suitable characters. Each character is allotted one dimension, and, after suitable scaling, measurements can be used directly instead of being placed into arbitrary character states. Then, if OTUs I and J score $x_i(p)$ and $x_j(p)$ on quantitative character p, the contribution to the squared distance, $\Delta_{ij}^2$, is $[x_i(p) - x_j(p)]^2$, as with two-state characters, but this quantity is no longer restricted to the values 0 and 1. The matching coefficient equivalent of this method, consistent with the relationship of equation 4, is to add $1 - [x_i(p) - x_j(p)]^2$ to the numerator and 1 to the denominator of the coefficient. Thus, whereas a distance treatment of multistate and dependent characters is obtained by a direct distance interpretation of the matching coefficient scoring methods, a matching coefficient treatment of quantitative characters is suggested by their distance treatment.

The main problem in treating quantitative characters in this way is the choice of a suitable scale. The scale chosen should have the property that differences of equal magnitude are of equal taxonomic importance whether they occur at the upper or at the lower end of the scale. This is often not true with biological measurements, since natural variation frequently increases with the size of a feature. This difficulty can usually be overcome by rescaling measurements by means of either a square root or a logarithmic transformation before coding them for a numerical taxonomic analysis.

It is also usually desirable to standardize the characters so that they all have approximately equal weight. One approach to this is to decide on the range of measurements to be expected in any one character, to equate the lower and upper limits of this range to 0 and 1 respectively, and then to code intermediate values. (An occasional value below 0 or above 1 should have little effect on the resultant coefficients.) Other methods of scaling characters are discussed by Sokal and Sneath (1963).

Cluster Analysis and Distance

The methods of cluster analysis discussed in this paper are all hierarchical, linking first the most similar OTUs, and continuing the linkage process until all OTUs are combined in one cluster. The results of such a process can be displayed by means of a tree diagram, usually termed a dendrogram or phenogram.

Once a cluster has been formed by linking two or more OTUs, it is necessary in these hierarchical methods to define a measure of resemblance between this cluster and other clusters or individual OTUs, and it is in the choice of this measure that the methods to be discussed differ. For matching coefficients, the highest of the coefficients between the existing clusters (including unlinked OTUs) at any stage in the clustering process is then used as the criterion in forming the next cluster. Some related methods (the variable group methods of Sokal and Michener, 1958), in which the clustering criterion is an arbitrary coefficient level lowered by a fixed amount at each stage in the clustering process, have similar distance interpretations, although they are not discussed here.

1. Single Coefficient Clustering Methods. In two clustering methods, the resemblance between two clusters is measured by a matching coefficient between a single pair of OTUs, this being either the maximum or the minimum of the coefficients between pairs of OTUs, one from each cluster. The more commonly used is the maximum coefficient (the single linkage method of Sokal and Sneath, 1963:180), which corresponds to the smallest distance coefficient or shortest distance between two clusters.
The minimum coefficient method (the complete linkage method of Sokal and Sneath, 1963:181; proposed by Sørensen, 1948, for ecological studies) corresponds to measuring the distance between two clusters by the longest of the distances between pairs of OTUs, one from each cluster. Thus both the maximum and minimum coefficient methods have simple distance interpretations, but, with both, the peripheral members of clusters determine the order of subsequent clustering. When clusters are represented by a compact set of points, both methods work well, but if clusters are diffuse or poorly represented, the omission or addition of one OTU may drastically alter the order of clustering. Any cluster obtained by both methods is likely to be a well defined group.

2. Average Coefficient Methods. Some form of average coefficient is frequently used to measure the resemblance between clusters. In the unweighted pair-group method of Sokal and Michener (1958), called here the average coefficient method, the relationship between two clusters is assessed by calculating the arithmetic average of coefficients between all pairs consisting of one OTU from each cluster. In distance terms this is equivalent to the average of the corresponding squared distances, which can be shown to be equal to the squared distance between the cluster centroids plus, for both clusters, the average squared distance of individual OTUs from the cluster centroid. Hence the average distance coefficient between clusters $\{\alpha\}$ and $\{\beta\}$, containing $t_\alpha$ and $t_\beta$ OTUs respectively, can be shown to be equal to

$$d_{\alpha\beta}^2 + \frac{1}{t_\alpha}\sum_{I \in \{\alpha\}} d_{I\alpha}^2 + \frac{1}{t_\beta}\sum_{J \in \{\beta\}} d_{J\beta}^2, \tag{5}$$

where $d_{\alpha\beta}^2$ is the distance coefficient between the cluster centroids; $d_{I\alpha}^2$ is the distance coefficient between an OTU I in cluster $\{\alpha\}$ and the centroid of $\{\alpha\}$; $d_{J\beta}^2$ is the distance coefficient between an OTU J in cluster $\{\beta\}$ and the centroid of $\{\beta\}$.

Another measure of average cluster resemblance is the median coefficient proposed by Kendrick and Proctor (1964).
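The single-pair and average measures of cluster resemblance described above, and the decomposition in formula (5), can be checked numerically. The following Python sketch is illustrative only; the two clusters of two-state scores and all function names are hypothetical.

```python
from itertools import product


def distance_coefficient(x, y):
    """Squared Euclidean distance divided by the number of characters."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def centroid(cluster):
    """Average score on each character over all OTUs in the cluster."""
    return [sum(column) / len(cluster) for column in zip(*cluster)]


def pair_coefficients(a, b):
    """Distance coefficients for all pairs, one OTU from each cluster."""
    return [distance_coefficient(x, y) for x, y in product(a, b)]


def spread(cluster):
    """Average distance coefficient between OTUs and their centroid."""
    c = centroid(cluster)
    return sum(distance_coefficient(x, c) for x in cluster) / len(cluster)


# Hypothetical clusters scored on six two-state characters.
cluster_a = [[1, 0, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0], [0, 0, 1, 1, 0, 1]]
cluster_b = [[0, 1, 0, 0, 1, 1], [0, 1, 0, 1, 1, 1]]

pairs = pair_coefficients(cluster_a, cluster_b)
shortest = min(pairs)              # maximum coefficient (single linkage)
longest = max(pairs)               # minimum coefficient (complete linkage)
average = sum(pairs) / len(pairs)  # average coefficient method

# Right-hand side of formula (5): centroid term plus the two spreads.
centroid_d2 = distance_coefficient(centroid(cluster_a), centroid(cluster_b))
formula_5 = centroid_d2 + spread(cluster_a) + spread(cluster_b)
print(shortest, longest, average, formula_5)
```

Up to rounding error, `average` and `formula_5` agree, and the centroid term alone is never larger than the average.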
The merits of the median coefficient are that it is not influenced by outlying coefficient values, and, in the absence of a computer program for cluster analysis, it is easier to obtain than the arithmetic average but gives very similar results. In terms of distance, it has no simple geometric interpretation.

A variant of the average coefficient method is the weighted pair group method of Sokal and Michener (1958), called here the weighted average coefficient method, in which equal weight is given to each cluster regardless of its size. The aim of the weighted method is to prevent a cluster analysis from being unduly influenced by taxa represented by many OTUs at the expense of taxa with few representatives, but it appears to be too crude a method of weighting to be effective. (See also the discussion of the Weighted Centroid Method.)

FIG. 1. (a) (b) (c) Methods of combining three OTUs and determining the cluster centre. For explanation, see text.

3. The Centroid Method. If the scores of all OTUs within a cluster are averaged for each character, the resulting values may be regarded as defining an average OTU, representative of the cluster. To obtain matching coefficients between average OTUs, fractional similarities and differences must be assessed and incorporated into the coefficients. This problem has already been discussed in the section on quantitative characters, and again the difficulty is easily overcome by considering the distance interpretation. The average OTU is equivalent to the cluster centroid, and the distance between cluster centroids may be regarded as a measure of cluster resemblance. This distance is easy to calculate, fractional scores presenting no problem.

The centroid method can be easily demonstrated by considering three OTUs represented by the points A, B, and C in n-dimensional space, Fig. 1(a). (Since there are only three points, they necessarily lie in a plane and so can be drawn in two dimensions.) $S_{AB}$, $S_{BC}$, $S_{AC}$ are the matching coefficients and $d_{AB}^2$, $d_{BC}^2$, $d_{AC}^2$ are the distance coefficients between the pairs of OTUs. Then, by equation 4, $S_{AB} = 1 - d_{AB}^2$, etc. Also $d_{AB} = AB/\sqrt{n}$, etc.
Suppose that AB is shorter than BC or AC so that the first step in clustering these three OTUs is to combine A and B. The point obtained by averaging the scores of A and B for each character is X, the midpoint of the line AB and the centroid of the cluster {A, B}. Now by elementary geometry it can be shown that

$$CX^2 = (AC^2 + BC^2)/2 - AB^2/4,$$

and so, if $d_{CX}^2$ is the distance coefficient between C and X, i.e., the centroid distance coefficient, then

$$d_{CX}^2 = (d_{AC}^2 + d_{BC}^2)/2 - d_{AB}^2/4.$$

If the centroid matching coefficient, $S_{CX}$, is defined by

$$S_{CX} = 1 - d_{CX}^2,$$

so that centroid matching and distance coefficients are related in the same way as the corresponding coefficients between individual OTUs, then

$$S_{CX} = (S_{AC} + S_{BC})/2 + (1 - S_{AB})/4.$$

Extending the discussion to two clusters $\{\alpha\}$ and $\{\beta\}$ containing $t_\alpha$ and $t_\beta$ members, it can be shown that the centroid distance coefficient between $\{\alpha\}$ and $\{\beta\}$, $d_{\alpha\beta}^2$, can be obtained from the formula

$$d_{\alpha\beta}^2 = \frac{D_{\alpha\beta}}{t_\alpha t_\beta} - \frac{D_{\alpha\alpha}}{t_\alpha^2} - \frac{D_{\beta\beta}}{t_\beta^2}, \tag{6}$$

where $D_{\alpha\beta}$ is the sum of the distance coefficients between all pairs of OTUs consisting of one from each cluster; $D_{\alpha\alpha}$ and $D_{\beta\beta}$ are the sums of the distance coefficients between all pairs of OTUs within $\{\alpha\}$ and $\{\beta\}$ respectively. The corresponding centroid matching coefficient, $S_{\alpha\beta}$, is given by

$$S_{\alpha\beta} = \frac{M_{\alpha\beta}}{t_\alpha t_\beta} + \frac{t_\alpha - 1}{2t_\alpha} - \frac{M_{\alpha\alpha}}{t_\alpha^2} + \frac{t_\beta - 1}{2t_\beta} - \frac{M_{\beta\beta}}{t_\beta^2}, \tag{7}$$

where $M_{\alpha\beta}$, $M_{\alpha\alpha}$ and $M_{\beta\beta}$ are the sums of matching coefficients corresponding to $D_{\alpha\beta}$, $D_{\alpha\alpha}$ and $D_{\beta\beta}$, respectively.

Centroid coefficients at any stage in the clustering process can also be calculated directly from the centroid coefficients obtained for the previous stage rather than from the coefficients between individual OTUs. For instance, if clusters $\{\alpha\}$ and $\{\beta\}$ are combined, the centroid matching coefficient between the cluster $\{\alpha, \beta\}$ and another cluster $\{\delta\}$, $S_{(\alpha\beta)\delta}$, is given by

$$S_{(\alpha\beta)\delta} = \frac{t_\alpha S_{\alpha\delta} + t_\beta S_{\beta\delta}}{t_\alpha + t_\beta} + \frac{t_\alpha t_\beta (1 - S_{\alpha\beta})}{(t_\alpha + t_\beta)^2}.$$

FIG. 2. Comparison of dendrograms produced for 16 OTUs of the genus Verticicladiella (Hyphomycetes) by application of the centroid (a) and average coefficient (b) methods.

4. Comparison of Centroid and Average Coefficient Methods. The results of the application of both the centroid and the average coefficient methods to the matching coefficients between 16 OTUs, consisting of Hyphomycetes all belonging to the genus Verticicladiella, are shown by the dendrograms in Fig. 2. (The 16 OTUs are part of a larger group discussed by Kendrick and Proctor, 1964.) The OTUs were independently identified as belonging to five species, represented by no. 1, nos. 8-13, nos , no. 21, and no. 22, respectively.

Although, using the average coefficient method, the lowest value at which OTUs within the same species are linked, .827, is higher than the highest value at which species are linked, .807, the difference between these values is small compared to the range of values at which species are linked, .645 to .807. With the centroid method, the difference between the lowest within-species link, .883, and the highest between-species link, .807, is increased, and the range of between-species links, .778 to .807, is greatly decreased. Thus, the centroid method gives results more in accordance with the taxonomists' assessment than does the average coefficient method, although this does not necessarily prove the superiority of the centroid method.
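Formula (6) and the formula for updating centroid matching coefficients after two clusters are combined can both be checked against centroids computed directly from the scores. The Python sketch below is illustrative only; the three small clusters and all function names are hypothetical.

```python
from itertools import combinations, product


def distance_coefficient(x, y):
    """Squared Euclidean distance divided by the number of characters."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def centroid(cluster):
    """Average score on each character over all OTUs in the cluster."""
    return [sum(column) / len(cluster) for column in zip(*cluster)]


def centroid_d2_from_pairs(a, b):
    """Formula (6): centroid distance coefficient from pairwise sums."""
    d_ab = sum(distance_coefficient(x, y) for x, y in product(a, b))
    d_aa = sum(distance_coefficient(x, y) for x, y in combinations(a, 2))
    d_bb = sum(distance_coefficient(x, y) for x, y in combinations(b, 2))
    ta, tb = len(a), len(b)
    return d_ab / (ta * tb) - d_aa / ta ** 2 - d_bb / tb ** 2


def centroid_s(a, b):
    """Centroid matching coefficient, computed directly from centroids."""
    return 1.0 - distance_coefficient(centroid(a), centroid(b))


def updated_s(s_ad, s_bd, s_ab, ta, tb):
    """Centroid matching coefficient between {a, b} combined and d."""
    t = ta + tb
    return (ta * s_ad + tb * s_bd) / t + ta * tb * (1.0 - s_ab) / t ** 2


# Hypothetical clusters scored on six two-state characters.
alpha = [[1, 0, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0], [0, 0, 1, 1, 0, 1]]
beta = [[0, 1, 0, 0, 1, 1], [0, 1, 0, 1, 1, 1]]
delta = [[1, 1, 0, 0, 1, 0]]

from_pairs = centroid_d2_from_pairs(alpha, beta)
direct = distance_coefficient(centroid(alpha), centroid(beta))

stepwise = updated_s(centroid_s(alpha, delta), centroid_s(beta, delta),
                     centroid_s(alpha, beta), len(alpha), len(beta))
merged = centroid_s(alpha + beta, delta)
print(from_pairs, direct)
print(stepwise, merged)
```

Each printed pair of values agrees up to rounding error, which illustrates that the centroid coefficients can be carried forward from stage to stage without returning to the individual OTU scores.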
It will be noted from this example that the average matching coefficients are never greater than the corresponding centroid matching coefficients and that, as the clusters become larger and more diffuse, the differences between the two types of coefficients tend to increase. It follows that the average distance coefficients are never less than the corresponding centroid distance coefficients. It is evident from formula (5) that this is true in general, since, unless all OTUs within each cluster are identical so that all OTUs are distance zero from their cluster centroid, the average distance coefficient must be greater than the centroid distance coefficient. Also, as the clusters become more diffuse, the average squared distance of individual OTUs from their cluster centroid will increase, and hence the differences between the two types of coefficient will increase.

Assuming that cluster centroids are equally distant, the average coefficient method will combine dense clusters before diffuse clusters. However, since the minimum distance between the diffuse clusters will be less than between the dense clusters, it seems more logical, if either is to be given priority, to combine the diffuse before the dense clusters. Also, as clusters become more diffuse, the average coefficient method appears to overestimate the dissimilarity between clusters to an increasingly greater extent. For these reasons the centroid method, which is independent of cluster density, is preferable to the average coefficient method.

In Fig. 2(a), it may be noted that, in three cases, three clusters or single OTUs are linked at the same level. Nos. 8, 9, and 10 are linked simultaneously because the coefficients between 8 and 9 and between 8 and 10 are equal, but the other triple links are due to reversals in coefficient values. Thus, after combining nos. 1 and 22 at a coefficient of .807, it was found that the coefficient between the cluster {8-13} and the cluster {1, 22} was .823. Thus cluster {8-13} appears to be more similar to the cluster {1, 22} than the component OTUs of the latter cluster are to each other. Similar reversals occur when correlation coefficients are combined using Spearman's method. These are discussed by Sokal and Sneath (1963).

Returning to the example of Fig. 1, a reversal after combining A and B would simply mean that CX was shorter than AB, in which case AC and BC could be only slightly longer than AB and so the cluster {A, B} would not be clearly separated from C. In general, a reversal indicates that when two clusters are combined, the distance between their centroids is greater than the distance between the centroid of the combined cluster and that of at least one other cluster. Thus a cluster formed immediately before a reversal is not a clearly separated group. Consequently, it seems reasonable to combine at the same level all clusters having higher matching coefficients with a newly formed cluster than the coefficient level at which the latter was formed. The problem is then to define the level of resemblance at which the clustering takes place. So far, I have simply used the coefficient immediately preceding a reversal as this measure.

It should be noted that although the occurrence of a reversal indicates that a cluster is not clearly separated from other OTUs, the opposite is not necessarily true. Many of the clusters formed by the methods discussed in this paper will not necessarily be satisfactory taxonomic groups, i.e., there will be no clear discontinuity between the clusters.

5. Weighted Centroid Methods. It is possible to define a weighted centroid method analogous to the weighted average coefficient method in which each cluster is given the same weight regardless of the number of OTUs each cluster contains. In considering the geometrical interpretation of this method, Fig. 1 will again be used.
The first step in a weighted analysis, as with the unweighted method, is to combine A and B into a cluster with centre at X, the midpoint of AB. On combining C with A and B, the centre of the cluster {A, B, C} would be at Y, where CY = 2/3 CX, if the unweighted centroid method were used. However, with the weighted centroid method, the cluster centre is at Z, the midpoint of CX. Whether Y or Z seems the more reasonable cluster centre depends on the relative positions of A, B and C. If the distance between A and B is short relative to CX, as in Fig. 1(b), so that A and B could be representatives of one taxon while C belonged to a second, then Z may be a better cluster centre than Y. If, on the other hand, AC and BC are only slightly longer than AB, as in Fig. 1(c), then Y seems an appropriate cluster centre.

An obvious disadvantage of this weighted method is that the position of a cluster centre depends so much on the order in which the OTUs forming the cluster are joined together. The omission of one intermediate OTU can alter the order of clustering and so change the apparent centre of even a large cluster very considerably. These disadvantages apply to the weighted average coefficient method as well as to the weighted centroid method.

Although the type of weighting discussed so far is unsatisfactory, some technique is desirable to overcome distortions in clustering caused by unequal representations of taxa. When combining A and B with C in the above example, a satisfactory method would give equal weight to each OTU when they are all equally distant from each other, but would give the same weight to C as to the cluster {A, B} when A and B are identical. This result might be achieved by using some measure of the relationship between the variation within each cluster and the distance between clusters as the cluster weight for determining the centre of a combined cluster. Further work is continuing on this problem.

6. Discussion of Clustering Methods. In taxonomy it is expected that there will be variations between individuals within a species, between species within a genus, etc., but that these different types of variation will have different orders of magnitude. Thus, in terms of the variation within a species, there should be a discontinuity between one species and any other species; in terms of the variations between species within a genus, there should be a discontinuity between one genus and any other genus; etc. An ideal method of cluster analysis will be one which can point up these discontinuities, as well as measure the resemblance between clusters.

In the clustering methods discussed in this paper, no attempt is made to test for discontinuities between clusters. Of the measures of cluster resemblance used in these methods, it has been shown that the maximum coefficient, minimum coefficient, average coefficient and centroid coefficient all have consistent distance interpretations, although the average coefficient is a poor measure of overall cluster resemblance. The centroid method is advocated not as the ideal clustering method but simply as an improvement on the average coefficient method.

At present there is no entirely satisfactory method of cluster analysis in numerical taxonomy. If taxonomic groups are well separated, most methods will produce a satisfactory clustering. Unfortunately, separation between groups is often poor, and under these circumstances the different clustering methods produce different results. To achieve a satisfactory method it is necessary to take into account the variation within clusters as well as the distance between clusters. In the previous section, a measure of the relationship between these two quantities was suggested as a cluster weighting factor. The same or a related measure might also be used to test for discontinuities between clusters.

Acknowledgments

The taxonomic data used in this paper were supplied by Dr. W. B. Kendrick, and the analyses were carried out on an I.B.M. computer using programs written by Mr. A. Bickle and Mr. B. Meredith. I am indebted also to Dr. P. Robinson and Mrs. P. M. Morse for many valuable suggestions made during the preparation of this manuscript. (Copies of the computer programs used may be obtained from Mr. A. Bickle, Statistical Research Service, Canada Department of Agriculture, Ottawa, Ontario, Canada.)

REFERENCES

BEERS, R. J. and W. R. LOCKHART. 1962. Experimental methods in computer taxonomy. J. Gen. Microbiol. 28.
KENDRICK, W. B. 1964. Quantitative characters in computer taxonomy. Systematics Assoc. Publ. 6, Phenetic and phylogenetic classification.
KENDRICK, W. B. 1965. Complexity and dependence in computer taxonomy. Taxon 14.
KENDRICK, W. B. and J. R. PROCTOR. 1964. Computer taxonomy in the fungi imperfecti. Can. J. Bot. 42.
LOCKHART, W. R. 1964. Scoring of data and group-formation in quantitative taxonomy. Developments in industrial microbiol. 5.
LOCKHART, W. R. and P. A. HARTMAN. 1963. Formation of monothetic groups in quantitative bacterial taxonomy. J. Bacteriol. 85.
PROCTOR, J. R. and W. B. KENDRICK. 1963. Unequal weighting in numerical taxonomy. Nature 197.
SOKAL, R. R. 1961. Distance as a measure of taxonomic similarity. Systematic Zool. 10.
SOKAL, R. R. and C. D. MICHENER. 1958. A statistical method for evaluating systematic relationships. Univ. Kansas Sci. Bull. 38.
SOKAL, R. R. and P. H. A. SNEATH. 1963. Principles of numerical taxonomy. W. H. Freeman and Co., San Francisco and London, 359 p.
SØRENSEN, T. 1948. A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analyses of the vegetation on Danish commons. Biol. Skr. 5(4).

JEAN R. PROCTOR is employed in the Statistical Research Service of the Canada Department of Agriculture, Central Experimental Farm, Ottawa, Ontario, Canada.
