Can taxon-sampling effects be minimized by using branch supports? P. Hovenkamp

Cladistics 22 (2006)

Nationaal Herbarium Nederland, Leiden, PO Box 9514, NL-2300 RA Leiden, The Netherlands

Accepted 27 February 2006

Abstract

Rydin and Källersjö (2002) found that taxon-sampling effects were a strongly disturbing factor in a high-level phylogenetic analysis. I have reanalyzed some of their data to assess whether bootstrap frequencies can be used to predict the stability of clades to taxon sampling, and to compare their performance with that of a stability measure based on taxon resampling ("taxon jackknifing"). High bootstrap frequencies correctly identify a small number of stable clades, but miss many other equally stable clades. When the total error rate is considered, no cut-off level based on bootstrap frequencies performs better than using all clades in the strict consensus, whereas a slight improvement was observed when cut-off levels based on taxon jackknifing frequencies are used.

© The Willi Hennig Society 2006.

Taxon-sampling effects are among the most worrying sources of error in phylogeny reconstruction. It has been well documented that they may have a severe effect on the results of an analysis in studies based on both real and simulated data sets (Maddison et al., 1984, their fig. 14; Donoghue et al., 1989; Lecointre et al., 1993; Hovenkamp, 1996; Poe, 1998; Poe and Swofford, 1999; Graham and Olmstead, 2000; Grandcolas and D'Haese, 2001; Rosenberg and Kumar, 2001; Rydin and Källersjö, 2002; Simmons et al., 2004; Soltis et al., 2004; Leebens-Mack et al., 2005). Their effect may also be counterintuitive: Duff and Nickrent (1999) reported that deletion of a single taxon influenced distant clades more often than it disturbed the clades directly involved. More than anything else, the fear of taxon-sampling effects can undermine confidence in the value of an analysis.
Corresponding author: P. Hovenkamp. E-mail address: hovenkamp@nhn.leidenuniv.nl

Taxon-sampling effects are not only apparently widespread, severe and unexpected, but also unavoidable. In many cases, it is impossible to apply a denser taxon sampling, thus sampling one's way out of the Felsenstein zone (Leebens-Mack et al., 2005), e.g., when dealing with groups where many of the species are known only from a few poorly preserved specimens, or with groups where high recent extinction rates are suspected. And in the cases where denser sampling is possible, it is not clear at what levels of intensity sampling effects become negligible. Poe (1998), for instance, showed that at least for small data sets, the size of the data set is positively correlated with the degree of distortion by taxon-sampling effects, which suggests that increasing the number of taxa may not always be effective.

In practice, most systematists probably deal with taxon-sampling effects by ignoring clades with low support and then hoping for the best. This presupposes that commonly used clade support measures, such as bootstrap frequencies, have some relationship to taxon-sampling effects, and that they can be used to predict the persistence of clades under differential taxon-sampling schemes. The belief that the bootstrap provides such a measure appears to be widely held among practicing systematists (as any casual conversation will show), but is rarely expressed explicitly, and seems to be unsupported by factual evidence. Studies that aim at assessing the effects of taxon sampling (Graybeal, 1998; Poe, 1998; Pollock et al., 2002; Zwickl and Hillis, 2002) have primarily used tree-based, not node-based, measures of accuracy, and therefore are silent on this problem. The performance of node-based supports as predictors of clade stability to taxon sampling has been assessed explicitly only by Simmons et al. (2004), who found that jackknife supports (they did not assess bootstrapping) are very poorly correlated with clade stability under such conditions (their fig. 3C). Savolainen et al. (2000) suggest that using bootstrap supports with a cut-off level of 80% is effective in identifying clades that are stable to the addition or deletion of taxa, but their claim is based on anecdotal evidence, not on a systematic assessment of the performance of bootstrap support using different cut-off levels (Chase, pers. comm.). Thus, little is known about the usefulness of bootstrap support, whether or not in combination with particular cut-off levels, for evaluating clade stability.

Moreover, bootstrap supports are based on manipulating character sampling, which raises the question whether supports that are based on manipulating taxon sampling may not be more appropriate indicators for the assessment of taxon-sampling effects. Such a support measure is available (Lanyon, 1985; Siddall, 1995), but has never been implemented in a computer program, and accordingly, is little used. Here, I investigate the performance of bootstrap support as well as taxon jackknife figures in predicting the stability of clades to taxon-sampling effects.

Data and methods

I use part of the data set of Rydin and Källersjö (2002), which provides an example of a data set in which serious taxon-sampling effects have been observed. I selected their 38-taxon data sets 1, 2, 8 and 9, in which most of the more disturbing taxon-sampling effects were displayed. In accordance with their numbering, these data sets are here numbered 38-1, 38-2, 38-8 and 38-9. In addition, I added four data sets produced by randomly resampling 38 taxa from the total set of taxa of these four data sets combined. These data sets are indicated as 38R1 to 38R4. All 38-taxon data sets will be referred to as partial data sets. A full data set, containing 67 taxa, was produced by combining the four selected partial data sets.
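The two kinds of taxon resampling used in this study, random 38-taxon draws from the pooled taxa and taxon-deletion (taxon jackknife) replicates, can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function names, the placeholder taxon labels and the rule for deciding when a clade counts as recovered after deletion are all hypothetical, and the tree search that would produce the clades found in each replicate is not shown.

```python
import random

def random_partial_dataset(pool, size, rng):
    """Draw `size` taxa without replacement from the pooled taxon list."""
    return sorted(rng.sample(sorted(pool), size))

def deletion_replicates(taxa, k, n_reps, rng):
    """Return the retained-taxon sets after deleting k taxa.

    k == 1 is enumerated exactly (each taxon deleted in turn);
    larger k is sampled at random, n_reps times.
    """
    taxa = sorted(taxa)
    if k == 1:
        return [frozenset(taxa) - {t} for t in taxa]
    return [frozenset(taxa) - set(rng.sample(taxa, k)) for _ in range(n_reps)]

def jackknife_support(clade, replicates):
    """Fraction of informative replicates in which `clade` is recovered.

    `replicates` is a list of (retained_taxa, clades_found) pairs, where
    clades_found is a set of frozensets from some tree search (not shown).
    Assumption of this sketch: a clade counts as recovered when its pruned
    version (clade & retained_taxa) is among clades_found, and replicates
    in which the pruned clade has fewer than two taxa are skipped.
    """
    hits = informative = 0
    for retained, found in replicates:
        pruned = frozenset(clade) & retained
        if len(pruned) < 2:
            continue
        informative += 1
        hits += pruned in found
    return hits / informative if informative else float("nan")

rng = random.Random(0)
pool = [f"taxon{i:02d}" for i in range(67)]    # placeholder labels
partial = random_partial_dataset(pool, 38, rng)
single_deletions = deletion_replicates(partial, 1, 0, rng)
five_deletions = deletion_replicates(partial, 5, 20, rng)
print(len(partial), len(single_deletions), len(five_deletions))
```

Passing an explicit `random.Random` instance keeps the draws reproducible, which matters when the same replicates have to be revisited while tallying results.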
For this data set, a consensus tree and bootstrap supports were computed. These served as the reference against which the results of the partial analyses are evaluated, using either the consensus tree or the bootstrap consensus tree at different cut-off levels. For the partial data sets, minimal length trees, bootstrap majority trees and taxon jackknifing supports at taxon deletion levels of 1, 2, 5 and 10 taxa were computed for all clades in the minimal length trees.

Minimal length trees and bootstrap supports were all computed with TNT (Goloboff et al., 2004). Heuristic searches were conducted using default settings (10 random addition sequences followed by TBR branch swapping, holding 10 trees per replication). For bootstrap, the standard bootstrap option was selected, and computed using 0 replications (Hedges, 1992). To compute taxon jackknifing values, batch files were prepared and results were tallied using WinSupport (written in Visual Basic and available on request) in conjunction with Nona (Goloboff, 1998). Heuristic searches were conducted with 15 random addition sequences followed by TBR branch swapping, holding 20 trees for each addition sequence (Nona command sequence hold 20; mult*15;max;). For all levels of taxon deletion, 0 replicates were used, except for the 1-taxon level, which was computed exactly by deleting each taxon in turn.

To evaluate the predictivity of each type of branch support, I compared each accepted clade from the partial analyses with the accepted clades in the full analysis. If it could be identified with one of these, it was scored as a true positive; if not, as a false positive. Conversely, if an accepted clade from the full analysis was not accepted in a partial analysis, it was scored there as a false negative. These assignments were repeated while applying different cut-off support levels for the partial analyses.
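The scoring scheme just described can be sketched in a few lines of Python. This is a minimal illustration, not the WinSupport procedure used in the study: the function names, the input layout (one pair per partial-analysis clade) and the toy support values are assumptions made for the example. Clades rejected in both analyses are counted as true negatives, and the sketch does not handle clades that cannot be matched between analyses.

```python
from collections import Counter

def score_clades(clades, cutoff):
    """Tally the four outcome categories for one cut-off level.

    `clades` is an iterable of (support, accepted_in_full) pairs: the
    support (in percent) of a clade in a partial analysis, and whether the
    clade it was identified with is accepted in the full analysis.
    This layout is an assumption of the sketch.
    """
    counts = Counter(TP=0, FP=0, FN=0, TN=0)
    for support, accepted_in_full in clades:
        accepted_in_partial = support >= cutoff
        if accepted_in_partial:
            counts["TP" if accepted_in_full else "FP"] += 1
        else:
            counts["FN" if accepted_in_full else "TN"] += 1
    return counts

def total_error_rate(counts):
    """Proportion of clades scored as false positive or false negative."""
    n = sum(counts.values())
    return (counts["FP"] + counts["FN"]) / n if n else float("nan")

# Hypothetical clades: (support in partial analysis, accepted in full analysis).
toy_clades = [(100, True), (95, True), (72, True), (55, True),
              (60, False), (51, False)]

for cutoff in (50, 70, 90):
    counts = score_clades(toy_clades, cutoff)
    print(cutoff, dict(counts), round(total_error_rate(counts), 3))
```

With these invented values the total error rate first falls and then rises as the cut-off increases, because stricter cut-offs trade false positives for false negatives; the numbers say nothing about the real data.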
The use of cut-off levels introduces a fourth category, true negatives, for the clades that, for a particular choice of cut-off levels, are rejected or absent in both the partial and the full analysis. The proportion of false negatives thus represents the rate at which hypotheses of monophyly would be incorrectly rejected on the basis of a partial analysis ("type II error"); the rate at which hypotheses of monophyly would be incorrectly accepted ("type I error") is given by the proportion of false positives. Table 2 provides a tabular overview of all the categories distinguished.

When a tree is evaluated that contains only a subset of the taxa in the reference tree, the identification of clades in the subset tree with clades from the reference tree presents a fundamental problem (Goloboff and Pol, 2002; Goloboff, 2005). Often, it is impossible to identify a clade from a partial analysis unequivocally with a single containing clade from the reference tree. In many cases, it is sufficient to identify it by locating the corresponding most recent common ancestor (MRCA), but this cannot be made into a general strategy, as it can (and does) easily lead to conflicting identifications and ignores real conflict between cladograms. For example, in partial analysis 38-2, the MRCAs of clades 72 and 73 both correspond to clade 75 in the full analysis, and those of both clade 65 (Gnetales + Angiosperms) and clade 71 (Seedplants) correspond to containing clade 74 (Seedplants). When systematists interpret phylogenetic trees they therefore cannot rely on simple algorithmic identifications, but must continuously apply their professional judgment to assess the relevance of a tree to a particular research question involving a larger group of taxa.

There are unproblematic cases where no one would hesitate to apply the results of a partial analysis to a more inclusive clade. For instance, analysis of partial data set 38R1 returns a clade containing Isoetes and Phylloglossum. At face value this could be taken as support for monophyly of Isoetes and Phylloglossum, whereas applying the MRCA criterion would identify this clade with all vascular plants in the full analysis. However, as in this taxon selection Huperzia (generally associated with Phylloglossum where the two occur in the same data set) is not represented, this clade may reasonably be interpreted as supporting Lycopodiales, and not as evidence for exclusive monophyly of Isoetes and Phylloglossum. To the extent, however, that the position of the excluded taxa is more ambiguous, such assessments become more difficult to make. In interpreting the results of the partial analyses I have tried to make reasonable identifications. They are listed in Table 1, which provides, for each clade of the partial analyses (numbered as in the cladograms of Appendix 1, Fig. 7), the corresponding containing clade (numbered as in Appendix 1).

Results

Analysis of the full data set produced 66 cladograms of minimal length, with a strict consensus that is for the most part well resolved (Appendix 1, Figs 5, 6). The consensus trees obtained for the partial data sets contained 248 clades in total (Appendix 2). A full list of all partial clades, their presence in the partial analysis and their identification is given in Table 1. The distribution of bootstrap values in the partial analyses is shown in Fig. 1 (upper panel). The proportion of high bootstrap values in this study is about average compared with several data sets reported in the literature (data not shown).

Taxon-sampling effects

Inspection of Table 1 shows that the constancy of clades over the partial analyses appears to be linked to their bootstrap supports, but also that there may be exceptions to this linkage.
Thus, clades that are contradicted in a partial analysis may appear with high supports in others (e.g., the seedplant clade, which is contradicted in analysis 38R2), but in most such cases the support for the contradictory clade is low (< %) or absent. More often, clades that have high bootstrap values ( %) in one partial analysis have bootstrap values in others that most researchers would consider inadequate.

Bootstrap support

For a more quantitative appreciation of the predictive accuracy of bootstrap supports, the bootstrap support can be related to the percentage of correct clades. When the consensus tree of the full analysis is used as a reference, this relationship (Fig. 2) shows a more conservative behavior than has been found in other studies (Hillis and Bull, 1993; Wilcox et al., 2002; Simmons et al., 2004), with bootstrap percentages over % already indicating correct clades with a frequency of 80 %. The conservative behavior (Hillis and Bull, 1993; Rodrigo, 1993) of the bootstrap indicates that hypotheses of monophyly are rejected more often than they are in fact incorrect.

Fig. 1. Relation between percentage correctly predicted clades (criterion: presence in consensus of full analysis) and bootstrap support in partial analyses. Upper panel: total numbers of clades (binsize); lower panel: percentage of correct clades.

To explore the effect of this conservative behavior, Fig. 2 shows the different error rates at different cut-off levels for the partial analyses, evaluated on the basis of a reference tree based on the consensus of the full analysis. Of all clades from the consensus trees of the partial analyses, 33 (13%) are not represented in the consensus of the full analysis (12 of which are in fact contradicted). Thus, acceptance of the results of a partial analysis with no consideration of bootstrap supports would, as the result of taxon-sampling effects, have an error rate of 13%, composed entirely of false positives.
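As a worked check of the figure just quoted, the total error rate can be written as the share of false positives and false negatives among all clades scored. The symbols E, FP, FN and N are notation introduced here for illustration only; the counts 33 and 248 are those given in the text, and FN = 0 follows from the statement that the error is composed entirely of false positives.

```latex
E = \frac{FP + FN}{N}, \qquad
E_{\mathrm{cons}} = \frac{33 + 0}{248} \approx 0.133
```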
If clades are accepted on the basis of a particular bootstrap support level, increasing the cut-off level from % (accepting all clades in the bootstrap majority tree) results in increasingly smaller numbers of accepted clades and in an associated drop in the rate of false positive results. However, this drop is more than compensated by the increase in false negatives, resulting from the rejection of clades by the increasingly restrictive cut-off levels. The

4 P. Hovenkamp / Cladistics 22 (2006) Table 1 All clades occurring in at least one of the strict consensus trees for the partial analyses, and the corresponding clade number and associated bootstrap support in the analysis of the full dataset. Numbers refer to clade numbers in Appendix 1, bootstrap supports between brackets; a: clade absent in the full analysis; n: clade cannot be matched to a clade in the full dataset due to differential taxon composition; c: clade contradicted in the partial analysis Partial datasets Full dataset clade 38R1 38R2 38R3 38R (< ) n 38 (< ) 38 (< ) n n 47 (< ) n n 69 (< ) n 39 (< ) n 38 (99) 45 () 48 (< ) 59 (99) n 70 (65) 45 () 66 (< ) 39 (53) 39 (99) 46 (91) 68 (63) (99) 59 (< ) 71 (< ) a 67 (< ) 53 (57) c 56 (56) 69 (51) c 63 (< ) 72 (64) 68 (98) 68 (< ) 54 (59) 53 (96) 57 (78) 70 (76) 64 (75) 64 (63) 73 (< ) c 69 (< ) 69 (< ) c a c a 68 (< ) 74 (70) 69 (96) c 70 (63) 65 () 71 (89) 71 (84) 69 (< ) 69 (< ) 75 (67) 71 (73) n 72 (8 71 (69) n n 71 (78) 71 (70) 76 (83) 72 (82) n n 72 (79) n n 72 (85) 72 (87) 78 (66) 39 (69) n 41 (85) 41 (80) n n 48 (67) 48 (< ) 79 (< ) n n c n n n 49 (< ) 49 () 80 () 40 (99) 41 () 43 (95) 42 (99) 38 (99) 38 () (99) n 81 (62) n a 44 (86) n n n a n 82 (< ) 43 (93) 44 (< ) 46 (< ) 44 (79) 40 () 40 () 54 (< ) 52 (< ) 84 (< ) 66 (70) 62 (54) 52 (< ) 52 (80) 55 (53) 52 (< ) a a 85 (55) 38 (95) n 40 (96) n n n 45 (55) 45 (52) 86 (73) n n n n n n 46 (72) 46 (70) 87 (88) n n n 40 (72) n n 47 (86) 47 (63) 88 () 41 () 43 (99) n 43 () n n 44 () 44 (97) 89 (< ) 42 (< ) c n n n n 52 (< ) a (97) 44 (98) n n 45 (98) 41 (96) 41 (96) 55 (95) 53 (95) 91 (99) n 47 () n 49 () n () n n 92 () 64 (99) n 49 (99) n 51 () n n n 93 (98) 65 (97) 48 (93) (99) (98) 53 (99) n n n 94 (< ) a n 51 (< ) c n 51 (< ) n n 95 (< ) n n n n 64 (< ) (< ) n n 96 (< ) 56 (57) 56 (76) 66 () 62 (< ) 67 (66) 62 (64) n n 97 (69) 57 (98) 57 (58) 67 (93) 63 (71) 68 (< ) 63 (< ) 67 (61) 66 (57) 98 () 58 (93) 64 (98) 68 (92) 64 (98) 69 (99) 64 (99) 68 
(91) 67 () 99( < ) 49 (79) 46 (84) 47 (62) 48 (94) () n 62 (52) 61 (51) (< ) 63 (99) n 48 () n n n 63 () 62 () 101 (< ) 47 (85) n n n a 53 (51) n n 102 (76) 48 (85) 45 (75) n n n n n n 104 (99) () 49 () 55 (99) n n 43 () n 56 () 105 (96) 51 (99) (61) 58 (98) 56 (99) 44 (99) 46 (99) 58 () 57 (< ) 106 () n n n n 63 () 59 () n n 107 (98) n n 56 (99) 54 (98) n 44 (99) a n 108 (89) n n 57 (94) 55 (91) 43 (83) 45 (86) 57 (92) 55 (93) 109 () n n n 58 () () 55 () 66 () 65 () 110 (71) n 63 (70) n n 61 (77) 56 (80) n n 111 (99) n 52 () 59 () n n n 38 () 38 (99) 112 (97) n n n 66 (87) n n 39 (97) 39 (97) 113 (< ) n 53 (< ) n 59 (< ) 62 (0) 58 (< ) n n 114 (< ) n 54 (< ) 63 (< ) n n n n n 115 () 53 () n 61 () n n n 40 (9 40 () 116 (65) n 58 (< ) c 68 (< ) n n 42 (54) 42 (53) 117 (88) n 59 (88) n n n 49 (87) n n 118 (< ) 55 (< ) n n n n 57 (< ) n n 119 (99) n n n n 47 (99) n n n 120 (99) (99) n n n 48 (99) n n n 121 () n n n n n n 51 () 51 () 122 (99) n n n 67 (99) n n 41 () 41 () 123 (98) 62 (98) n n n n n n n a 46 (< ) n n a a n n n a 52 (< ) 55 (< ) a a 65 (< ) 61 (< ) n n a 54 (77) a 62 (< ) a n n 43 (0) 43 (< ) a 59 (< ) c c a a 65 (< ) a c

Table 1. Continued

Partial datasets Full dataset clade 38R1 38R2 38R3 38R a 70 (< ) a 71 (< ) 70 (< ) n c c c a n 40 (< ) c n n n c n a c 51 () c n n n c c a n (< ) n n n a n n a n 61 (< ) n c a n n n a n c 42 (54) n n n c n a c c 45 (< ) n n n c a a n c (< ) c n n c c a a c c 46 (< ) c c c c a n c n 51 (< ) c n n n a a a n n 39 (62) 39 () c a a a a a a 72 (82) 72 (83) 70 (78) 70 (< ) a n n n n a 42 (< ) 56 (< ) 54 (< ) a n n n n n 54 (< ) n n

Table 2. Status assigned to clades in partial analysis. Acceptance can be on the basis of various cut-off levels. The category "False positive" corresponds to Type I error, the category "False negative" to Type II error. Together, these two represent the error rate.

                        In full analysis
In partial analyses     Accepted          Rejected
Accepted                True positive     False positive
Rejected                False negative    True negative

total error rate, composed of both false positives and false negatives, increases from 13% to reach 70% at bootstrap cut-off levels of %. Increasing cut-off levels result mainly in an increase of false negatives, while the number of correctly rejected clades remains more or less constant.

Taxon jackknife support

A comparison between bootstrap and taxon jackknife support is useful only if the two are not strongly linearly correlated. Figure 3 shows the correlations between the bootstrap support values and the taxon jackknife support values at different levels of taxon deletion for all 248 clades included in the analysis. Correlations are clearly not linear, but show saturation plots, in which the absence of a clear correlation below bootstrap levels of 80% is notable at all levels of taxon deletion. Based on these plots, we may expect that the performance of taxon jackknife support differs from that of bootstrap support. An analysis of error rates for the four levels of taxon jackknife deletion (Fig. 4) confirms this. For all levels of taxon deletion, there is only a very moderate increase, or even a slight decrease, in the total error rate for cut-off levels below 80%. A steep increase is only present when both the level of taxon deletion and the cut-off level are relatively high, and as with bootstrap supports, this increase is mainly due to an increase in the number of false negatives. For each level of taxon deletion, there exists an optimal cut-off level at which the total error rate is slightly but distinctly lower than the null-rate for no clade selection. As severity of taxon deletion increases, this optimal cut-off level shifts towards lower values.

Fig. 2. Bootstrap supports in partial analysis as indicators of clade presence in complete analysis. Horizontal: cut-off value used to accept or reject clades in the partial analyses; cons: consensus tree. Vertical: percentage of clades from partial analyses (N = 248). Legend: true positives, true negatives, false negatives, false positives.

Fig. 3. Correlations between bootstrap support and taxon jackknife support at different levels of taxon deletion. Horizontal: bootstrap support; vertical: taxon jackknife support.

Fig. 4. Taxon jackknife supports in partial analysis as indicators of clade presence in full analysis. T1: Jackknifing 1 taxon; T2 Jackknifing 2 taxa; T5 Jackknifing 5 taxa. Axes as in Fig. 2.

Discussion

Bootstrap proportions have been found to be conservative estimates of accuracy (Hillis and Bull, 1993) irrespective of whether accuracy is judged by reference to a real or a simulated true tree. Thus, the common practice of selecting clades on the basis of bootstrap proportion can be expected to lead to a rejection of clades more often than is actually warranted by the data (Rodrigo, 1993) and thus to a high rate of false negatives. The procedure used here to quantify this effect is comparable with other studies of the performance of bootstrap supports based on total error rates (Rodrigo, 1993; Berry and Gascuel, 1996), and in particular with the iterated bootstrap procedure used by Rodrigo (1993) to assess error rates for bootstrap values. In the iterated bootstrap procedure, a single large data set is resampled for characters in a two-step procedure. The procedure used here differs in that taxa instead of characters are resampled in the first step. Despite this difference, there is a close agreement with the results of Rodrigo (1993) and Berry and Gascuel (1996). They found that total error rates were lowest at bootstrap cut-off levels of approximately %, and rose with higher cut-off levels. My results show that this behavior of the bootstrap also holds when accuracy is judged with reference to a tree based on a more inclusive taxon sample. I find a similar rise in total error rate from bootstrap cut-off levels of % going to %, although absolute levels are much lower than those reported by Rodrigo (1993). However, the total error rate for any cut-off level is at least as high as the error rate without any selection. This is in contrast to the results of Berry and Gascuel (1996), who found that error rate could be improved substantially (under some circumstances) by using a % bootstrap cut-off level. I found that even at this low cut-off level, the number of false negatives outweighed the number of true negatives, leading to a rising total error rate.

The results presented here indicate that when taxon-sampling effects are a serious concern, bootstrap support can be used as a predictor of clade stability only to a limited extent. On the one hand, it is clear that clades with a high bootstrap support are robust not only to differential character sampling (a property that is assessed directly by bootstrapping), but also to increased taxon sampling. Thus, the use of bootstrap supports in combination with a high acceptance level is likely to reduce the number of falsely accepted clades. On the other hand, it is not the case that clades with low bootstrap supports disappear under increased taxon sampling.
Especially with cut-off levels in the range of 80 %, selecting clades on the basis of bootstrap proportion may lead to an increase in the number of falsely rejected clades that more than outweighs the reduction in the number of falsely accepted ones. Using such cut-off levels may be justified if the main interest is in the avoidance of falsely accepted clades, and if the associated increase in falsely rejected clades is not considered a problem. But the associated increase in total error rate represents an under-utilization of the information present in the data, and thus may necessitate the use of more data than is actually needed to resolve a particular problem. Whenever it is desirable also to minimize false negatives and thus to maximize the amount of information obtained from the data, the total error rate should be considered. On this basis, no cut-off level for accepting clades on the basis of bootstrap support was found to be an improvement on the performance of a strict consensus as acceptance criterion for clades.

When clades are evaluated based on their stability under a taxon deletion protocol, the effects of the numbers of taxa deleted should be taken into account. Figure 3 shows that deleting a larger number of taxa increases the variance in the support values thus obtained, and also that there is only a weak correlation between these support values and bootstrap values. Using a taxon jackknifing protocol, a slight improvement in predictivity compared with using no support values can be observed at all four levels of taxon deletion (Fig. 4). It is difficult to say whether this slight decrease in total error rate is truly significant, but the consistency with which the drop is present at one or more cut-off levels at all investigated levels of taxon deletion suggests that the effect is not accidental. The observed pattern thus suggests that the total error rate of the strict consensus can be improved on slightly by using a combination of a severe taxon deletion scheme and a relaxed clade selection cut-off level, or vice versa.

This difference in performance between different resampling protocols suggests that clade supports based on data perturbation can be made maximally informative by adapting the type of perturbation to the specific source of error that it is expected to counteract. Thus, when support figures are intended to minimize taxon-sampling effects, a permutation procedure using a taxon-resampling protocol is shown to give optimal results. A next step should be to assess whether, when clade supports are used to minimize the effects of small errors (Jenner, 2001) or data perturbations (Hovenkamp, 1996, 1999; Holmes, 2003) of the data matrix, a protocol should be used that is based on such random permutations, such as the Mojo procedure introduced by Wenzel and Siddall (1999) or the Carp support used by In den Bosch and Zandee (2001).

Taxon-sampling effects and the size of data sets

Poe (1998) noted, as the only significant correlation, that the severity of taxon-sampling effects increased with the size of the data set, but this correlation was based on relatively small data sets (five to 20 taxa).
In substantially larger data sets the impact of taxon-sampling effects appears to be strongly reduced (Zwickl and Hillis, 2002), and total error rates associated with bootstrap supports go down considerably (Berry and Gascuel, 1996). It may well be that when data sets reach sizes such as those used to assess higher level phylogenies (Soltis et al., 1998; Savolainen et al., 2000; Tehler et al., 2000) the effect of taxon sampling is negligible. To what extent that is actually the case, and at which combinations of data set size and sampling density taxon-sampling effects are no longer an issue, should be assessed separately. However, carrying out the required analyses for large data sets may be prohibitively time-consuming. The data set I used here was carefully selected to combine a realistic size with the presence of marked taxon-sampling effects. The results presented here show that under those circumstances, there may be a loss in informativeness of the data if bootstrap supports are used as a proxy for clade stability.

Acknowledgments

I thank Catarina Rydin for kindly providing me with the aligned data sets, and an anonymous reviewer for some helpful suggestions for improving the manuscript.

References

Berry, V., Gascuel, O., 1996. On the interpretation of bootstrap trees: appropriate threshold of clade selection and induced gain. Mol. Biol. Evol. 13.
Donoghue, M.J., Doyle, J.A., Gauthier, J., Kluge, A.G., Rowe, T., 1989. The importance of fossils in phylogeny reconstruction. Ann. Rev. Ecol. Syst. 20.
Duff, R.J., Nickrent, D.L., 1999. Phylogenetic relationships of land plants using mitochondrial small-subunit rDNA sequences. Am. J. Bot. 86.
Goloboff, P.A., 1998. Nona. Program and documentation. ver Distributed by the author, Tucumán, Argentina.
Goloboff, P.A., 2005. Minority rule supertrees? MPR, Compatibility and Minimum Flip may display the least frequent groups. Cladistics 21.
Goloboff, P.A., Pol, D., 2002. Semi-strict supertrees. Cladistics 18.
Goloboff, P.A., Farris, J.S., Nixon, K.C., 2004. TNT. Cladistics 20, 84.
Graham, S.W., Olmstead, R.G., 2000. Utility of 17 chloroplast genes for inferring the phylogeny of the basal angiosperms. Am. J. Bot. 87.
Grandcolas, P., D'Haese, C., 2001. The phylogeny of cockroach families: Is the current molecular hypothesis robust? Cladistics 17.
Graybeal, A., 1998. Is it better to add taxa or characters to a difficult phylogenetic problem? Syst. Biol. 47.
Hedges, S.B., 1992. The number of replications needed for accurate estimation of the bootstrap P value in phylogenetic studies. Mol. Biol. Evol. 9.
Hillis, D.M., Bull, J.J., 1993. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst. Biol. 42.
Holmes, S., 2003. Bootstrapping phylogenetic trees: theory and methods. Stat. Sci. 18.
Hovenkamp, P.H., 1996. The inevitable instability of generic circumscriptions in Old World Polypodiaceae. In: Camus, J.M., Gibby, M., Johns, R.J. (Eds.), Pteridology in Perspective. Royal Botanic Gardens, Kew.
Hovenkamp, P., 1999. Unambiguous data or unambiguous results? Cladistics 15.
In den Bosch, H.A.J., Zandee, M., 2001. Courtship behaviour in lacertid lizards: phylogenetic interpretations of the Lacerta kulzeri complex (Reptilia: Lacertidae). Neth. J. Zool. 51.
Jenner, R.A., 2001. Bilaterian phylogeny and uncritical recycling of morphological data sets. Syst. Biol.
Lanyon, S.M., 1985. Detecting internal inconsistencies in distance data. Syst. Zool. 34.
Lecointre, G., Philippe, H., Lê, H.L.V., Le Guyader, H., 1993. Species sampling has a major impact on phylogenetic inference. Mol. Phyl. Evol. 2.
Leebens-Mack, J., Raubeson, L.A., Cui, L.Y., Kuehl, J.V., Fourcade, M.H., Chumley, T.W., Boore, J.L., Jansen, R.K., dePamphilis, C.W., 2005. Identifying the basal angiosperm node in chloroplast genome phylogenies: sampling one's way out of the Felsenstein zone. Mol. Biol. Evol. 22.
Maddison, W.P., Donoghue, M.J., Maddison, D.R., 1984. Outgroup analysis and parsimony. Syst. Zool. 33.
Poe, S., 1998. Sensitivity of phylogeny estimation to taxonomic sampling. Syst. Biol. 47.
Poe, S., Swofford, D., 1999. Taxon sampling revisited. Nature 398.
Pollock, D.P., Zwickl, D.J., McGuire, J.A., Hillis, D.M., 2002. Increased taxon sampling is advantageous for phylogenetic inference. Syst. Biol. 51.
Rodrigo, A.G., 1993. Calibrating the bootstrap test of monophyly. Int. J. Parasitol. 23.
Rosenberg, M.S., Kumar, S., 2001. Incomplete taxon sampling is not a problem for phylogenetic inference. Proc. Natl Acad. Sci. USA 98.
Rydin, C., Källersjö, M., 2002. Taxon sampling and seed plant phylogeny. Cladistics 18.
Savolainen, V., Chase, M.W., Hoot, S.B., Morton, C.M., Soltis, D.E., Bayer, C., Fay, M.F., De Bruijn, A.Y., Sullivan, S., Qiu, Y.L., 2000. Phylogenetics of flowering plants based on combined analysis of plastid atpB and rbcL gene sequences. Syst. Biol. 49.
Siddall, M.E., 1995. Another monophyly index: revisiting the jackknife. Cladistics 11.
Simmons, M.P., Pickett, K.M., Miya, M., 2004. How meaningful are Bayesian support values? Mol. Biol. Evol. 21.
Soltis, D.E., Soltis, P.S., Mort, M.E., Chase, M.W., Savolainen, V., Hoot, S.B., Morton, C.M., 1998. Inferring complex phylogenies using parsimony: an empirical approach using three large DNA data sets for angiosperms. Syst. Biol. 47.
Soltis, D.E., Albert, V.A., Savolainen, V., Hilu, K., Qiu, Y.L., Chase, M.W., Farris, J.S., Stefanovic, S., Rice, D.W., Palmer, J.D., Soltis, P.S., 2004. Genome-scale data, angiosperm relationships, and "ending incongruence": a cautionary tale in phylogenetics. Trends Plant Sci. 9.
Tehler, A., Farris, J.S., Lipscomb, D.L., Källersjö, M., 2000. Phylogenetic analyses of fungi based on large rDNA data sets. Mycologia 92.
Wenzel, J.W., Siddall, M.E., 1999. Noise. Cladistics 15.
Wilcox, T.P., Zwickl, D.J., Heath, T.A., Hillis, D.M., 2002. Phylogenetic relationships of the dwarf boas and a comparison of Bayesian and bootstrap measures of phylogenetic support. Mol. Phyl. Evol. 25.
Zwickl, D.J., Hillis, D.M., 2002. Increased taxon sampling greatly reduces phylogenetic error. Syst. Biol. 51.

Fig. 5. Strict consensus.

Fig. 6. Bootstrap majority tree ( bootstrap replicates).

Fig. 7. Partial datasets.



More information

The practice of naming and classifying organisms is called taxonomy.

The practice of naming and classifying organisms is called taxonomy. Chapter 18 Key Idea: Biologists use taxonomic systems to organize their knowledge of organisms. These systems attempt to provide consistent ways to name and categorize organisms. The practice of naming

More information

ANALYSIS OF CHARACTER DIVERGENCE ALONG ENVIRONMENTAL GRADIENTS AND OTHER COVARIATES

ANALYSIS OF CHARACTER DIVERGENCE ALONG ENVIRONMENTAL GRADIENTS AND OTHER COVARIATES ORIGINAL ARTICLE doi:10.1111/j.1558-5646.2007.00063.x ANALYSIS OF CHARACTER DIVERGENCE ALONG ENVIRONMENTAL GRADIENTS AND OTHER COVARIATES Dean C. Adams 1,2,3 and Michael L. Collyer 1,4 1 Department of

More information

5 Measures of Support

5 Measures of Support 5 Measures of Support Mark E. Siddall Contents 1 Introduction, 80 2 The Bootstrap 81 3 The Jackknife 85 4 Noise 88 5 Direct Measures of Support 92 6 Remarks and Conclusions 96 References 99 1 Introduction

More information

Surprise! A New Hominin Fossil Changes Almost Nothing!

Surprise! A New Hominin Fossil Changes Almost Nothing! Surprise! A New Hominin Fossil Changes Almost Nothing! Author: Andrew J Petto Table 1: Brief Comparison of Australopithecus with early Homo fossils Species Apes (outgroup) Thanks to Louise S Mead for comments

More information

Efficiencies of maximum likelihood methods of phylogenetic inferences when different substitution models are used

Efficiencies of maximum likelihood methods of phylogenetic inferences when different substitution models are used Molecular Phylogenetics and Evolution 31 (2004) 865 873 MOLECULAR PHYLOGENETICS AND EVOLUTION www.elsevier.com/locate/ympev Efficiencies of maximum likelihood methods of phylogenetic inferences when different

More information

Points of View. Congruence Versus Phylogenetic Accuracy: Revisiting the Incongruence Length Difference Test

Points of View. Congruence Versus Phylogenetic Accuracy: Revisiting the Incongruence Length Difference Test Points of View Syst. Biol. 53(1):81 89, 2004 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150490264752 Congruence Versus Phylogenetic Accuracy:

More information

Non-independence in Statistical Tests for Discrete Cross-species Data

Non-independence in Statistical Tests for Discrete Cross-species Data J. theor. Biol. (1997) 188, 507514 Non-independence in Statistical Tests for Discrete Cross-species Data ALAN GRAFEN* AND MARK RIDLEY * St. John s College, Oxford OX1 3JP, and the Department of Zoology,

More information

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9 Lecture 5 Alignment I. Introduction. For sequence data, the process of generating an alignment establishes positional homologies; that is, alignment provides the identification of homologous phylogenetic

More information