Can taxon-sampling effects be minimized by using branch supports? P. Hovenkamp

Cladistics 22 (2006) 264-275
www.blackwell-synergy.com

Nationaal Herbarium Nederland, Leiden, PO Box 9514, NL-2300 RA Leiden, The Netherlands

Accepted 27 February 2006

Abstract

Rydin and Källersjö (2002) found that taxon-sampling effects were a strongly disturbing factor in a high-level phylogenetic analysis. I have reanalyzed some of their data to assess whether bootstrap frequencies can be used to predict the stability of clades to taxon sampling, and to compare their performance with that of a stability measure based on taxon resampling ("taxon jackknifing"). High bootstrap frequencies correctly identify a small number of stable clades, but miss many other equally stable clades. When the total error rate is considered, no cut-off level based on bootstrap frequencies performs better than using all clades in the strict consensus, whereas a slight improvement was observed when cut-off levels based on taxon jackknifing frequencies are used.

© The Willi Hennig Society 2006.

Taxon-sampling effects are among the most worrying sources of error in phylogeny reconstruction. It is well documented, in studies based on both real and simulated data sets, that they may severely affect the results of an analysis (Maddison et al., 1984, their fig. 14; Donoghue et al., 1989; Lecointre et al., 1993; Hovenkamp, 1996; Poe, 1998; Poe and Swofford, 1999; Graham and Olmstead, 2000; Grandcolas and D'Haese, 2001; Rosenberg and Kumar, 2001; Rydin and Källersjö, 2002; Simmons et al., 2004; Soltis et al., 2004; Leebens-Mack et al., 2005). Their effect may also be counterintuitive: Duff and Nickrent (1999) reported that deletion of a single taxon influenced distant clades more often than it disturbed the clades directly involved. More than anything else, the fear of taxon-sampling effects can undermine confidence in the value of an analysis.
Corresponding author: P. Hovenkamp. E-mail address: hovenkamp@nhn.leidenuniv.nl

Taxon-sampling effects are not only apparently widespread, severe and unexpected, but also unavoidable. In many cases, it is impossible to apply a denser taxon sampling, thus "sampling one's way out of the Felsenstein zone" (Leebens-Mack et al., 2005), e.g., when dealing with groups where many of the species are only known from a few poorly preserved specimens, or in groups where high recent extinction rates are suspected. And in the cases where denser sampling is possible, it is not clear at what levels of intensity sampling effects become negligible. Poe (1998), for instance, showed that at least for small data sets, the size of the data set is positively correlated with the degree of distortion by taxon-sampling effects, which suggests that increasing the number of taxa may not always be effective.

In practice, most systematists probably deal with taxon-sampling effects by ignoring clades with low support and then hoping for the best. This presupposes that commonly used clade support measures, such as bootstrap frequencies, have some relationship to taxon-sampling effects, and that they can be used to predict the persistence of clades under differential taxon-sampling schemes. The belief that the bootstrap provides such a measure appears to be widely held among practicing systematists (as any casual conversation will show), but is rarely expressed explicitly, and seems to be unsupported by factual evidence. Studies that aim at assessing the effects of taxon sampling (Graybeal, 1998; Poe, 1998; Pollock et al., 2002; Zwickl and Hillis, 2002) have primarily used tree-based, not node-based, measures of accuracy, and therefore are silent on this problem. The performance of node-based supports as predictors of clade stability to taxon sampling has been assessed explicitly only by Simmons et al. (2004), who found that

jackknife supports (they did not assess bootstrapping) are very poorly correlated with clade stability under such conditions (their fig. 3C). Savolainen et al. (2000) suggest that using bootstrap supports with a cut-off level of 80% is effective in identifying clades that are stable to the addition or deletion of taxa, but their claim is based on anecdotal evidence, not on a systematic assessment of the performance of bootstrap support using different cut-off levels (Chase, pers. comm.). Thus, little is known about the usefulness of bootstrap support, whether or not in combination with particular cut-off levels, for evaluating clade stability. Moreover, bootstrap supports are based on manipulating character sampling, which raises the question whether supports that are based on manipulating taxon sampling may not be more appropriate indicators for the assessment of taxon-sampling effects. Such a support measure is available (Lanyon, 1985; Siddall, 1995), but has never been implemented in a computer program, and accordingly, is little used. Here, I investigate the performance of bootstrap support as well as taxon jackknife figures in predicting the stability of clades to taxon-sampling effects.

Data and methods

I use part of the data set of Rydin and Källersjö (2002), which provides an example of a data set in which serious taxon-sampling effects have been observed. I selected their 38-taxon data sets 1, 2, 8 and 9, in which most of the more disturbing taxon-sampling effects were displayed. In accordance with their numbering, these data sets are here numbered 38-1, 38-2, 38-8 and 38-9. In addition, I added four data sets produced by randomly resampling 38 taxa from the total set of taxa of these four data sets combined. These data sets are indicated as 38R1-38R4. All 38-taxon data sets will be referred to as partial data sets. A full data set, containing 67 taxa, was produced by combining the four selected partial data sets.
For this data set, a consensus tree and bootstrap supports were computed, which were used as the reference tree against which the results of the partial analyses are evaluated, using either the consensus tree or the bootstrap consensus tree at different cut-off levels. For the partial data sets, minimal length trees, bootstrap majority trees and taxon jackknifing supports at taxon deletion levels of 1, 2, 5 and 10 taxa were computed for all clades in the minimal length trees.

Minimal length trees and bootstrap supports were all computed with TNT (Goloboff et al., 2004). Heuristic searches were conducted using default settings (10 random addition sequences followed by TBR branch swapping, holding 10 trees per replication). For bootstrap, the standard bootstrap option was selected, and computed using 0 replications (Hedges, 1992).

To compute taxon jackknifing values, batch files were prepared and results were tallied using WinSupport (written in Visual Basic and available on request) in conjunction with Nona (Goloboff, 1998). Heuristic searches were conducted with 15 random addition sequences followed by TBR branch swapping, holding 20 trees for each addition sequence (Nona command sequence "hold 20; mult*15;max;"). For all levels of taxon deletion, 0 replicates were used, except for the 1-taxon level, which was computed exactly by deleting each taxon in turn.

To evaluate the predictivity of each type of branch support, I compared each accepted clade from the partial analyses with the accepted clades in the full analysis. If it could be identified with one of these, it was scored as a true positive; if not, as a false positive. Conversely, if an accepted clade from the full analysis was not accepted in a partial analysis, it was there scored as a false negative. These assignments were repeated while applying different cut-off support levels for the partial analyses.
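The scoring and deletion schemes described above can be sketched in a few lines of Python. This is a minimal illustration only, not the WinSupport implementation: the function names, the `(support, match)` representation of a partial clade, and the toy values are all hypothetical, and the tally per partial clade (including the "rejected in both" category) is one reading of how the percentages in Fig. 2 are built up.

```python
import random
from collections import Counter


def deletion_sets(taxa, k, n_replicates, seed=0):
    """Taxon sets to delete for a taxon jackknife at deletion level k (hypothetical helper)."""
    taxa = list(taxa)
    if k == 1:
        # The 1-taxon level is computed exactly: each taxon is deleted in turn.
        return [frozenset([t]) for t in taxa]
    rng = random.Random(seed)
    # Higher levels are approximated by random replicates of k deleted taxa.
    return [frozenset(rng.sample(taxa, k)) for _ in range(n_replicates)]


def classify(partial_clades, full_accepted, cutoff=None):
    """Tally true/false positives/negatives for the clades of a partial analysis.

    partial_clades: iterable of (support, match) pairs, where `match` is the id of
        the full-analysis clade the partial clade was identified with, or None.
    full_accepted: set of clade ids accepted in the full (reference) analysis.
    cutoff: minimum support for acceptance; None accepts every clade.
    """
    counts = Counter(TP=0, FP=0, FN=0, TN=0)
    for support, match in partial_clades:
        accepted = cutoff is None or support >= cutoff
        in_full = match is not None and match in full_accepted
        if accepted and in_full:
            counts["TP"] += 1
        elif accepted:
            counts["FP"] += 1
        elif in_full:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def total_error_rate(counts):
    """Proportion of clades scored as false positive or false negative."""
    total = sum(counts.values())
    return (counts["FP"] + counts["FN"]) / total if total else 0.0


if __name__ == "__main__":
    # Toy values, not taken from Table 1.
    toy = [(99, 80), (45, 73), (55, None), (30, None)]
    for cutoff in (None, 50, 70):
        c = classify(toy, full_accepted={80, 73}, cutoff=cutoff)
        print(cutoff, dict(c), total_error_rate(c))
```

With no cut-off every error is a false positive; raising the cut-off trades false positives for false negatives, which is the trade-off the error-rate comparison is meant to expose.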
The use of cut-off levels introduces a fourth category, true negatives, for the clades that for a particular choice of cut-off levels are rejected or absent in both the partial and the full analysis. The proportion of false negatives thus represents the rate at which hypotheses of monophyly would be incorrectly rejected on the basis of a partial analysis ("type II error"); the rate at which hypotheses of monophyly would be incorrectly accepted ("type I error") is given by the proportion of false positives. Table 2 provides a tabular overview of all the categories distinguished.

When a tree is evaluated that contains only a subset of the taxa in the reference tree, the identification of clades in the subset tree with clades from the reference tree presents a fundamental problem (Goloboff and Pol, 2002; Goloboff, 2005). Often, it is impossible to identify a clade from a partial analysis unequivocally with a single containing clade from the reference tree. In many cases, it is sufficient to identify it by locating the corresponding most recent common ancestor (MRCA), but this cannot be made into a general strategy, as it can (and does) easily lead to conflicting identifications and ignores real conflict between cladograms. For example, in partial analysis 38-02, the MRCAs of clades 72 and 73 both correspond to clade 75 in the full analysis, and those of both clade 65 (Gnetales + Angiosperms) and clade 71 (Seedplants) correspond to containing clade 74 (Seedplants). When systematists interpret phylogenetic trees they therefore cannot rely on simple algorithmic identifications, but must continuously apply their professional judgment to assess the relevance of a tree to a particular research question involving a larger group of taxa.

There are unproblematic cases where no one would hesitate to apply the results of a partial analysis to a more inclusive clade. For instance, analysis of partial data set 38R1 returns a clade containing Isoetes and Phylloglossum. At

face value this could be taken as support for monophyly of Isoetes and Phylloglossum, whereas applying the MRCA criterion would identify this clade with all vascular plants in the full analysis. However, because Huperzia (generally associated with Phylloglossum where the two occur in the same data set) is not represented in this taxon selection, this clade may reasonably be interpreted as supporting Lycopodiales, and not as evidence for exclusive monophyly of Isoetes and Phylloglossum. To the extent, however, that the position of the excluded taxa is more ambiguous, such assessments become more difficult to make. In interpreting the results of the partial analyses I have tried to make reasonable identifications. They are listed in Table 1, which provides, for each clade of the partial analyses (numbered as in the cladograms of Appendix 1, Fig. 7), the corresponding containing clade (numbered as in Appendix 1).

Results

Analysis of the full data set produced 66 cladograms of minimal length, with a strict consensus that is for the most part well-resolved (Appendix 1, Figs 5, 6). The consensus trees obtained for the partial data sets contained 248 clades in total (Appendix 2). A full list of all partial clades, their presence in the partial analysis and their identification is given in Table 1. The distribution of bootstrap values in the partial analyses is shown in Fig. 1 (upper panel). The proportion of high bootstrap values in this study is about average compared with several data sets reported in the literature (data not shown).

Taxon-sampling effects

Inspection of Table 1 shows that the constancy of clades over the partial analyses appears to be linked to their bootstrap supports, but also that there may be exceptions to this linkage.
Thus, clades that are contradicted in a partial analysis may appear with high supports in others (e.g., the seedplant clade, which is contradicted in analysis 38R2), but in most such cases the support for the contradictory clade is low (< %) or absent. More often, clades that have high bootstrap values ( %) in one partial analysis have bootstrap values in others that would be considered inadequate by most researchers.

Bootstrap support

For a more quantitative appreciation of the predictive accuracy of bootstrap supports, the bootstrap support can be related to the percentage of correct clades. When the consensus tree of the full analysis is used as a reference, this relationship (Fig. 2) shows a more conservative behavior than has been found in other studies (Hillis and Bull, 1993; Wilcox et al., 2002; Simmons et al., 2004), with bootstrap percentages over % already indicating correct clades with a frequency of 80 %. The conservative behavior (Hillis and Bull, 1993; Rodrigo, 1993) of the bootstrap indicates that hypotheses of monophyly are more often rejected than they are in fact incorrect.

Fig. 1. Relation between percentage correctly predicted clades (criterion: presence in consensus of full analysis) and bootstrap support in partial analyses. Upper panel: total numbers of clades (bin size); lower panel: percentage of correct clades.

To explore the effect of this conservative behavior, Fig. 2 shows the different error rates at different cut-off levels for the partial analyses, evaluated on the basis of a reference tree based on the consensus of the full analysis. Of all clades from the consensus trees of the partial analyses, 33 (13%) are not represented in the consensus of the full analysis (12 of which are in fact contradicted).
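As a quick check of the figures just quoted, and assuming that the denominator is the 248 partial-analysis clades reported in the Results (the text does not state the denominator explicitly), the arithmetic works out as:

```latex
\[
\frac{33}{248} \approx 0.133 \approx 13\%,
\qquad
33 = \underbrace{12}_{\text{contradicted}}
   + \underbrace{21}_{\text{not represented, not contradicted}} .
\]
```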
Thus, acceptance of the results of a partial analysis with no consideration of bootstrap supports would, as the result of taxon-sampling effects, have an error rate of 13%, composed entirely of false positives. If clades are accepted on the basis of a particular bootstrap support level, increasing the cut-off level from % (accepting all clades in the bootstrap majority tree) results in increasingly smaller numbers of accepted clades and in an associated drop in the rate of false positive results. However, this drop is more than compensated by the increase in false negatives, resulting from the rejection of clades by the increasingly restrictive cut-off levels. The total error rate, composed of both false positives and false negatives, increases from 13% to reach 70% at bootstrap cut-off levels of %. Increasing cut-off levels result mainly in an increase of false negatives, while the number of correctly rejected clades remains more or less constant.

Table 1
All clades occurring in at least one of the strict consensus trees for the partial analyses, and the corresponding clade number and associated bootstrap support in the analysis of the full dataset. Numbers refer to clade numbers in Appendix 1, bootstrap supports between brackets; a: clade absent in the full analysis; n: clade cannot be matched to a clade in the full dataset due to differential taxon composition; c: clade contradicted in the partial analysis. The first column gives the full dataset clade; the remaining columns give the partial datasets.

Full dataset clade | 38R1 | 38R2 | 38R3 | 38R4 | 38-01 | 38-02 | 38-09 | 38-10
68 (< ) | n | 38 (< ) | 38 (< ) | n | n | 47 (< ) | n | n
69 (< ) | n | 39 (< ) | n | 38 (99) | 45 () | 48 (< ) | 59 (99) | n
70 (65) | 45 () | 66 (< ) | 39 (53) | 39 (99) | 46 (91) | 68 (63) | (99) | 59 (< )
71 (< ) | a | 67 (< ) | 53 (57) | c | 56 (56) | 69 (51) | c | 63 (< )
72 (64) | 68 (98) | 68 (< ) | 54 (59) | 53 (96) | 57 (78) | 70 (76) | 64 (75) | 64 (63)
73 (< ) | c | 69 (< ) | 69 (< ) | c | a | c | a | 68 (< )
74 (70) | 69 (96) | c | 70 (63) | 65 () | 71 (89) | 71 (84) | 69 (< ) | 69 (< )
75 (67) | 71 (73) | n | 72 (8 | 71 (69) | n | n | 71 (78) | 71 (70)
76 (83) | 72 (82) | n | n | 72 (79) | n | n | 72 (85) | 72 (87)
78 (66) | 39 (69) | n | 41 (85) | 41 (80) | n | n | 48 (67) | 48 (< )
79 (< ) | n | n | c | n | n | n | 49 (< ) | 49 ()
80 () | 40 (99) | 41 () | 43 (95) | 42 (99) | 38 (99) | 38 () | (99) | n
81 (62) | n | a | 44 (86) | n | n | n | a | n
82 (< ) | 43 (93) | 44 (< ) | 46 (< ) | 44 (79) | 40 () | 40 () | 54 (< ) | 52 (< )
84 (< ) | 66 (70) | 62 (54) | 52 (< ) | 52 (80) | 55 (53) | 52 (< ) | a | a
85 (55) | 38 (95) | n | 40 (96) | n | n | n | 45 (55) | 45 (52)
86 (73) | n | n | n | n | n | n | 46 (72) | 46 (70)
87 (88) | n | n | n | 40 (72) | n | n | 47 (86) | 47 (63)
88 () | 41 () | 43 (99) | n | 43 () | n | n | 44 () | 44 (97)
89 (< ) | 42 (< ) | c | n | n | n | n | 52 (< ) | a
(97) | 44 (98) | n | n | 45 (98) | 41 (96) | 41 (96) | 55 (95) | 53 (95)
91 (99) | n | 47 () | n | 49 () | n | () | n | n
92 () | 64 (99) | n | 49 (99) | n | 51 () | n | n | n
93 (98) | 65 (97) | 48 (93) | (99) | (98) | 53 (99) | n | n | n
94 (< ) | a | n | 51 (< ) | c | n | 51 (< ) | n | n
95 (< ) | n | n | n | n | 64 (< ) | (< ) | n | n
96 (< ) | 56 (57) | 56 (76) | 66 () | 62 (< ) | 67 (66) | 62 (64) | n | n
97 (69) | 57 (98) | 57 (58) | 67 (93) | 63 (71) | 68 (< ) | 63 (< ) | 67 (61) | 66 (57)
98 () | 58 (93) | 64 (98) | 68 (92) | 64 (98) | 69 (99) | 64 (99) | 68 (91) | 67 ()
99 (< ) | 49 (79) | 46 (84) | 47 (62) | 48 (94) | () | n | 62 (52) | 61 (51)
(< ) | 63 (99) | n | 48 () | n | n | n | 63 () | 62 ()
101 (< ) | 47 (85) | n | n | n | a | 53 (51) | n | n
102 (76) | 48 (85) | 45 (75) | n | n | n | n | n | n
104 (99) | () | 49 () | 55 (99) | n | n | 43 () | n | 56 ()
105 (96) | 51 (99) | (61) | 58 (98) | 56 (99) | 44 (99) | 46 (99) | 58 () | 57 (< )
106 () | n | n | n | n | 63 () | 59 () | n | n
107 (98) | n | n | 56 (99) | 54 (98) | n | 44 (99) | a | n
108 (89) | n | n | 57 (94) | 55 (91) | 43 (83) | 45 (86) | 57 (92) | 55 (93)
109 () | n | n | n | 58 () | () | 55 () | 66 () | 65 ()
110 (71) | n | 63 (70) | n | n | 61 (77) | 56 (80) | n | n
111 (99) | n | 52 () | 59 () | n | n | n | 38 () | 38 (99)
112 (97) | n | n | n | 66 (87) | n | n | 39 (97) | 39 (97)
113 (< ) | n | 53 (< ) | n | 59 (< ) | 62 (0) | 58 (< ) | n | n
114 (< ) | n | 54 (< ) | 63 (< ) | n | n | n | n | n
115 () | 53 () | n | 61 () | n | n | n | 40 (9 | 40 ()
116 (65) | n | 58 (< ) | c | 68 (< ) | n | n | 42 (54) | 42 (53)
117 (88) | n | 59 (88) | n | n | n | 49 (87) | n | n
118 (< ) | 55 (< ) | n | n | n | n | 57 (< ) | n | n
119 (99) | n | n | n | n | 47 (99) | n | n | n
120 (99) | (99) | n | n | n | 48 (99) | n | n | n
121 () | n | n | n | n | n | n | 51 () | 51 ()
122 (99) | n | n | n | 67 (99) | n | n | 41 () | 41 ()
123 (98) | 62 (98) | n | n | n | n | n | n | n
a | 46 (< ) | n | n | a | a | n | n | n
a | 52 (< ) | 55 (< ) | a | a | 65 (< ) | 61 (< ) | n | n
a | 54 (77) | a | 62 (< ) | a | n | n | 43 (0) | 43 (< )
a | 59 (< ) | c | c | a | a | 65 (< ) | a | c
a | 70 (< ) | a | 71 (< ) | 70 (< ) | n | c | c | c
a | n | 40 (< ) | c | n | n | n | c | n
a | c | 51 () | c | n | n | n | c | c
a | n | (< ) | n | n | n | a | n | n
a | n | 61 (< ) | n | c | a | n | n | n
a | n | c | 42 (54) | n | n | n | c | n
a | c | c | 45 (< ) | n | n | n | c | a
a | n | c | (< ) | c | n | n | c | c
a | a | c | c | 46 (< ) | c | c | c | c
a | n | c | n | 51 (< ) | c | n | n | n
a | a | a | n | n | 39 (62) | 39 () | c | a
a | a | a | a | a | 72 (82) | 72 (83) | 70 (78) | 70 (< )
a | n | n | n | n | a | 42 (< ) | 56 (< ) | 54 (< )
a | n | n | n | n | n | 54 (< ) | n | n

Table 2
Status assigned to clades in partial analysis. Acceptance can be on basis of various cut-off levels. The category False positive corresponds to Type I errors, the category False negative to Type II errors. Together, these two represent the error rate.

In partial analyses | In full analysis: Accepted | In full analysis: Rejected
Accepted | True positive | False positive
Rejected | False negative | True negative

Taxon jackknife support

A comparison between bootstrap and taxon jackknife support is useful only if the two are not strongly linearly correlated. Figure 3 shows the correlations between the bootstrap support values and the taxon jackknife support values at different levels of taxon deletion for all 248 clades included in the analysis. Correlations are clearly not linear, but show saturation plots, in which the absence of a clear correlation below bootstrap levels of 80% is notable at all levels of taxon deletion. Based on these plots, we may expect that the performance of taxon jackknife support differs from that of bootstrap support. An analysis of error rates for the four levels of taxon jackknife deletion (Fig. 4) confirms this. For all levels of taxon deletion, there is only a very moderate increase, or even a slight decrease, in the total error rate for cut-off levels below 80%. A steep increase is only present when both the level of taxon deletion and the cut-off level are relatively high, and as with bootstrap supports, this increase is mainly due to an increase in the number of false negatives. For each level of taxon deletion, there exists an optimal cut-off level at which the total error rate is slightly but distinctly lower than the null-rate for no clade selection. As severity of taxon deletion increases, this optimal cut-off level shifts towards lower values.

Discussion

Bootstrap proportions have been found to be conservative estimates of accuracy (Hillis and Bull, 1993) irrespective of whether accuracy is judged by reference to a real or a simulated true tree. Thus, the common practice of selection of clades on the basis of bootstrap proportion can be expected to lead to a rejection of clades more often than is actually warranted by the data (Rodrigo, 1993) and thus to a high rate of false negatives. The procedure used here to quantify this effect is comparable with other studies of the performance of bootstrap supports based on total error rates (Rodrigo, 1993; Berry and Gascuel, 1996), and in particular to the iterated bootstrap procedure, used by Rodrigo (1993) to assess error rates for bootstrap values. In the iterated bootstrap procedure, a single large data set is resampled for characters in a two-step procedure. The procedure used here differs in that taxa instead of characters are resampled in the first step. Despite this difference, there is a close agreement with the results of Rodrigo (1993) and Berry and Gascuel (1996). They found that total error rates were lowest at bootstrap cut-off levels of approximately %, and rose with higher cut-off levels.
My results show that this behavior of the bootstrap also holds when accuracy is judged with reference to a tree based on a more inclusive taxon sample. I find a similar rise in total error rate from bootstrap cut-off levels of % going to %, although absolute levels are much lower than those reported by Rodrigo (1993). However, the total error rate for any cut-off level is at least as high as the error rate without any selection. This is in contrast to the results of Berry and Gascuel (1996), who found that error rate could be improved substantially (under some circumstances) by using a % bootstrap cut-off level. I found that even at this low cut-off level, the number of false negatives outweighed the number of true negatives, leading to a rising total error rate.

Fig. 2. Bootstrap supports in partial analysis as indicators of clade presence in complete analysis. Horizontal: cut-off value used to accept or reject clades in the partial analyses (cons: consensus tree). Vertical: percentage of clades from partial analyses (N = 248). Categories shown: true positives, true negatives, false negatives, false positives.

Fig. 3. Correlations between bootstrap support and taxon jackknife support at different levels of taxon deletion (panels: 1, 2, 5 and 10 taxon deletion). Horizontal: bootstrap support; vertical: taxon jackknife support.

Fig. 4. Taxon jackknife supports in partial analysis as indicators of clade presence in full analysis (panels: 1, 2, 5 and 10 taxon deletion). T1: Jackknifing 1 taxon; T2: Jackknifing 2 taxa; T5: Jackknifing 5 taxa. Axes as in Fig. 2.

The results presented here indicate that when taxon-sampling effects are a serious concern, bootstrap support can be used as a predictor of clade stability only to a limited extent. On the one hand, it is clear that clades with a high bootstrap support are robust not only to differential character sampling (a property that is assessed directly by bootstrapping), but also to increased taxon sampling. Thus, the use of bootstrap supports in combination with a high acceptance level is likely to reduce the number of falsely accepted clades. On the other hand, it is not the case that clades with low bootstrap supports disappear under increased taxon sampling.
Especially with cut-off levels in the range of 80 %, selecting clades on the basis of bootstrap proportion may lead to an increase in the number of falsely rejected clades that more than outweighs the reduction in the number of falsely accepted ones. Using such cut-off levels may be justified if the main interest is in the avoidance of falsely accepted clades, and if the associated increase in falsely rejected clades is not considered a problem. But the associated increase in total error rate represents an under-utilization of the information present in the data, and thus may necessitate the use of more data than is actually needed to resolve a particular problem. Whenever it is desirable also to minimize false negatives and thus to maximize the amount of information obtained from the data, the total error rate should be considered. On this basis, no cut-off level for accepting clades on the basis of bootstrap support was found to be an improvement on the performance of a strict consensus as acceptance criterion for clades.

When clades are evaluated based on their stability under a taxon deletion protocol, the number of taxa deleted should be taken into account. Figure 3 shows that deleting a larger number of taxa increases the variance in the support values thus obtained, and also that there is only a weak correlation between these support values and bootstrap

values. Using a taxon jackknifing protocol, a slight improvement in predictivity compared with using no support values can be observed at all four levels of taxon deletion (Fig. 4). It is difficult to say whether this slight decrease in error rate is truly significant, but the consistency with which the drop in total error rates is present at at least one cut-off level at all investigated levels of taxon deletion suggests that the effect is not accidental. The observed pattern thus suggests that the total error rate of the strict consensus can be improved on slightly by using a combination of a severe taxon deletion scheme and a relaxed clade selection cut-off level or vice versa.

This difference in performance between different resampling protocols suggests that clade supports based on data perturbation can be made maximally informative by adapting the type of perturbation to the specific source of error that it is expected to counteract. Thus, when support figures are intended to minimize taxon-sampling effects, a permutation procedure using a taxon-resampling protocol is shown to give optimal results. A next step should be to assess whether, when clade supports are used to minimize the effects of small errors (Jenner, 2001) or data perturbations (Hovenkamp, 1996, 1999; Holmes, 2003) of the data matrix, a protocol should be used that is based on such random permutations, such as the Mojo procedure introduced by Wenzel and Siddall (1999) or the Carp support used by In den Bosch and Zandee (2001).

Taxon-sampling effects and the size of data sets

Poe (1998) noted that the severity of taxon-sampling effects increased with the size of the data set as the only significant correlation, but this correlation was based on relatively small data sets (five to 20 taxa).
In substantially larger data sets the impact of taxon-sampling effects appears to be strongly reduced (Zwickl and Hillis, 2002), and total error rates associated with bootstrap supports go down considerably (Berry and Gascuel, 1996). It may well be that when data sets reach sizes such as used to assess higher level phylogenies (Soltis et al., 1998; Savolainen et al., 2000; Tehler et al., 2000) the effect of taxon sampling may be negligible. To what extent that is actually the case, and at which combinations of data set size and sampling density taxon-sampling effects are no longer an issue should be assessed separately. However, carrying out the required analyses for large data sets may be prohibitively time-consuming. The data set I used here was carefully selected to combine a realistic size with the presence of marked taxon-sampling effects. The results presented here show that under those circumstances, there may be a loss in informativeness of the data if bootstrap supports are used as a proxy for clade stability.

Acknowledgments

I thank Catarina Rydin for kindly providing me with the aligned data sets, and an anonymous reviewer for some helpful suggestions for improving the manuscript.

References

Berry, V., Gascuel, O., 1996. On the interpretation of bootstrap trees: appropriate threshold of clade selection and induced gain. Mol. Biol. Evol. 13, 999-1011.
Donoghue, M.J., Doyle, J.A., Gauthier, J., Kluge, A.G., Rowe, T., 1989. The importance of fossils in phylogeny reconstruction. Ann. Rev. Ecol. Syst. 20, 431 4.
Duff, R.J., Nickrent, D.L., 1999. Phylogenetic relationships of land plants using mitochondrial small-subunit rDNA sequences. Am. J. Bot. 86, 372-386.
Goloboff, P.A., 1998. Nona. Program and documentation. ver. 2.9. Distributed by the author, Tucumán, Argentina.
Goloboff, P.A., 2005. Minority rule supertrees? MPR, Compatibility and Minimum Flip may display the least frequent groups. Cladistics 21, 282-294.
Goloboff, P.A., Pol, D., 2002. Semi-strict supertrees. Cladistics 18, 514-525.
Goloboff, P.A., Farris, J.S., Nixon, K.C., 2004. TNT. Cladistics 20, 84.
Graham, S.W., Olmstead, R.G., 2000. Utility of 17 chloroplast genes for inferring the phylogeny of the basal angiosperms. Am. J. Bot. 87, 1712-1730.
Grandcolas, P., D'Haese, C., 2001. The phylogeny of cockroach families: Is the current molecular hypothesis robust? Cladistics 17, 48-55.
Graybeal, A., 1998. Is it better to add taxa or characters to a difficult phylogenetic problem? Syst. Biol. 47, 9-17.
Hedges, S.B., 1992. The number of replications needed for accurate estimation of the bootstrap P value in phylogenetic studies. Mol. Biol. Evol. 9, 366-369.
Hillis, D.M., Bull, J.J., 1993. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst. Biol. 42, 182-192.
Holmes, S., 2003. Bootstrapping phylogenetic trees: theory and methods. Stat. Sci. 18, 241-255.
Hovenkamp, P.H., 1996. The inevitable instability of generic circumscriptions in Old World Polypodiaceae. In Camus, J.M., Gibby, M., Johns, R.J. (Eds.), Pteridology in Perspective. Royal Botanic Gardens, Kew, pp. 249 2.
Hovenkamp, P., 1999. Unambiguous data or unambiguous results? Cladistics 15, 99-102.
In den Bosch, H.A.J., Zandee, M., 2001. Courtship behaviour in lacertid lizards: phylogenetic interpretations of the Lacerta kulzeri complex (Reptilia: Lacertidae). Neth. J. Zool. 51, 263-284.
Jenner, R.A., 2001. Bilaterian phylogeny and uncritical recycling of morphological data sets. Syst. Biol., 730-742.
Lanyon, S.M., 1985. Detecting internal inconsistencies in distance data. Syst. Zool. 34, 397-403.
Lecointre, G., Philippe, H., Lê, H.L.V., Le Guyader, H., 1993. Species sampling has a major impact on phylogenetic inference. Mol. Phyl. Evol. 2, 205-224.
Leebens-Mack, J., Raubeson, L.A., Cui, L.Y., Kuehl, J.V., Fourcade, M.H., Chumley, T.W., Boore, J.L., Jansen, R.K., dePamphilis, C.W., 2005. Identifying the basal angiosperm node in chloroplast genome phylogenies: sampling one's way out of the Felsenstein zone. Mol. Biol. Evol. 22, 1948-1963.
Maddison, W.P., Donoghue, M.J., Maddison, D.R., 1984. Outgroup analysis and parsimony. Syst. Zool. 33, 83-103.

Poe, S., 1998. Sensitivity of phylogeny estimation to taxonomic sampling. Syst. Biol. 47, 18-31.
Poe, S., Swofford, D., 1999. Taxon sampling revisited. Nature 398, 299-300.
Pollock, D.P., Zwickl, D.J., McGuire, J.A., Hillis, D.M., 2002. Increased taxon sampling is advantageous for phylogenetic inference. Syst. Biol. 51, 664-671.
Rodrigo, A.G., 1993. Calibrating the bootstrap test of monophyly. Int. J. Parasitol. 23, 7 514.
Rosenberg, M.S., Kumar, S., 2001. Incomplete taxon sampling is not a problem for phylogenetic inference. Proc. Natl Acad. Sci. USA 98, 10751-10756.
Rydin, C., Källersjö, M., 2002. Taxon sampling and seed plant phylogeny. Cladistics 18, 485-513.
Savolainen, V., Chase, M.W., Hoot, S.B., Morton, C.M., Soltis, D.E., Bayer, C., Fay, M.F., De Bruijn, A.Y., Sullivan, S., Qiu, Y.L., 2000. Phylogenetics of flowering plants based on combined analysis of plastid atpB and rbcL gene sequences. Syst. Biol. 49, 306-362.
Siddall, M.E., 1995. Another monophyly index: revisiting the jackknife. Cladistics 11, 33-56.
Simmons, M.P., Pickett, K.M., Miya, M., 2004. How meaningful are Bayesian support values? Mol. Biol. Evol. 21, 188-199.
Soltis, D.E., Soltis, P.S., Mort, M.E., Chase, M.W., Savolainen, V., Hoot, S.B., Morton, C.M., 1998. Inferring complex phylogenies using parsimony: an empirical approach using three large DNA data sets for angiosperms. Syst. Biol. 47, 32-42.
Soltis, D.E., Albert, V.A., Savolainen, V., Hilu, K., Qiu, Y.L., Chase, M.W., Farris, J.S., Stefanovic, S., Rice, D.W., Palmer, J.D., Soltis, P.S., 2004. Genome-scale data, angiosperm relationships, and "ending incongruence": a cautionary tale in phylogenetics. Trends Plant Sci. 9, 477-483.
Tehler, A., Farris, J.S., Lipscomb, D.L., Källersjö, M., 2000. Phylogenetic analyses of fungi based on large rDNA data sets. Mycologia 92, 459-474.
Wenzel, J.W., Siddall, M.E., 1999. Noise. Cladistics 15, 51-64.
Wilcox, T.P., Zwickl, D.J., Heath, T.A., Hillis, D.M., 2002. Phylogenetic relationships of the dwarf boas and a comparison of Bayesian and bootstrap measures of phylogenetic support. Mol. Phyl. Evol. 25, 361-371.
Zwickl, D.J., Hillis, D.M., 2002. Increased taxon sampling greatly reduces phylogenetic error. Syst. Biol. 51, 588-598.

Fig. 5. Strict consensus.

Fig. 6. Bootstrap majority tree ( bootstrap replicates).

Fig. 7. Partial datasets.