Molecular Phylogenetics and Evolution


Molecular Phylogenetics and Evolution 61 (2011)

Spurious 99% bootstrap and jackknife support for unsupported clades

Mark P. Simmons a, John V. Freudenstein b

a Department of Biology, Colorado State University, Fort Collins, CO, USA
b The Ohio State University Herbarium, 1315 Kinnear Road, Columbus, OH 43212, USA

Article history: Received 20 February 2011; Revised 25 May 2011; Accepted 8 June 2011; Available online 16 June 2011

Keywords: Frequency-within-replicates bootstrap; Jackknife; Majority rule consensus; Missing data; Undersampling-within-replicates artifact; Unsupported clade

Abstract

Quantifying branch support using the bootstrap and/or jackknife is generally considered to be an essential component of rigorous parsimony and maximum likelihood phylogenetic analyses. Previous authors have described how application of the frequency-within-replicates approach to treating multiple equally optimal trees found in a given bootstrap pseudoreplicate can provide apparent support for otherwise unsupported clades. We demonstrate how a similar problem may occur when a non-representative subset of equally optimal trees are held per pseudoreplicate, which we term the undersampling-within-replicates artifact. We illustrate the frequency-within-replicates and undersampling-within-replicates bootstrap and jackknife artifacts using both contrived and empirical examples, demonstrate that the artifacts can occur in both parsimony and likelihood analyses, and show that the artifacts occur in outputs from multiple different phylogenetic-inference programs. Based on our results, we make the following five recommendations, which are particularly relevant to supermatrix analyses, but apply to all phylogenetic analyses.
First, when two or more optimal trees are found in a given pseudoreplicate they should be summarized using the strict-consensus rather than frequency-within-replicates approach. Second, jackknife resampling should be used rather than bootstrap resampling. Third, multiple tree searches while holding multiple trees per search should be conducted in each pseudoreplicate rather than conducting only a single search and holding only a single tree. Fourth, branches with a minimum possible optimized length of zero should be collapsed within each tree search rather than collapsing branches only if their maximum possible optimized length is zero. Fifth, resampling values should be mapped onto the strict consensus of all optimal trees found rather than simply presenting the ≥50% bootstrap or jackknife tree or mapping the resampling values onto a single optimal tree.

© 2011 Elsevier Inc. All rights reserved.

Corresponding author. E-mail address: psimmons@lamar.colostate.edu (M.P. Simmons).

1. Introduction

The effects of missing data on phylogenetic analyses have been studied with respect to tree construction, polymorphic taxon coding, and assessment of homoplasy (e.g., Nixon and Davis, 1991; Platnick et al., 1991). Wiens (2003) demonstrated that it is not just the amount of missing data in a matrix, but rather their arrangement, that determines their effect on phylogenetic resolution and accuracy. Much less investigation has been focused on the effects of missing data on resampling support analyses. Wilkinson (2003) observed that wildcard terminals (Nixon and Wheeler, 1991), containing many missing values, can lower resampling support values for many clades on the inferred tree. But wildcard terminals can also raise resampling support values, sometimes dramatically, including those for clades that are unsupported by the data.

The vast majority of parsimony- (Farris, 1970; Fitch, 1971) and likelihood-based (Felsenstein, 1973) phylogenetic analyses
quantify branch support using the bootstrap (BS; Felsenstein, 1985) or jackknife (JK; Farris et al., 1996). Although interpretations of BS/JK values vary (e.g., Felsenstein, 1985; Hillis and Bull, 1993; Sanderson, 1995), use of one of these resampling procedures is generally considered to be an essential component of rigorous parsimony and likelihood phylogenetic analyses (e.g., Page and Holmes, 1998; Graur and Li, 2000; Felsenstein, 2004).

BS/JK support is typically presented on a majority-rule consensus (MRC), which was introduced by Margush and McMorris (1981). In addition to being a way of presenting resampling support values, the MRC is sometimes used in the same manner in which it was originally proposed. Margush and McMorris (1981) introduced the MRC not as a method to summarize BS/JK support, but rather to summarize equally optimal trees. The sole justification given by Margush and McMorris (1981, p. 242) for using MRCs to summarize equally optimal trees was that, "We feel that many requirements of a consensus are met by what has been called majority rule in the social sciences." The use of MRCs to summarize equally optimal phylogenetic trees has since been criticized by Barrett et al. (1991), Nixon and Carpenter (1996), Sharkey and Leathers (2001), and Sumrall et al. (2001). Like Barrett et al.

(1991, p. 487), "...we know of no general justification for this approach." Note that this criticism of MRCs does not apply to their use in summarizing BS/JK support from multiple pseudoreplicates.

Both Sharkey and Leathers (2001) and Sumrall et al. (2001) demonstrated that ambiguity is determinate to MRCs, such that more ambiguity in one part of a tree leads to greater apparent support for clades in other parts of the tree. As Sharkey and Leathers (2001, p. 282) stated, "This preference [of MRCs] is based on the implicit assumption that all fundamental cladograms are independent and equally likely to be the correct tree. This assumption is unfounded." Sumrall et al. (2001, p. 256) further clarified and showed that: "A combination of two factors – a labile terminal taxon and unequal numbers of optimal trees consistent with each of its regional placements – is hypothesized to cause bias in majority-rule consensus trees. This bias effectively favors the least resolved alternate regional island of parsimony." Sumrall et al. (2001) concluded by noting that the negative implications of using MRC extend to resampling measures, as in the BS and the JK, when MRC-like assumptions are used to summarize multiple equally optimal trees found within a given pseudoreplicate.

Felsenstein (1985) did not specifically address how to handle multiple equally optimal trees when he introduced the BS to phylogenetics (see also De Laet et al., 2004). But Felsenstein (2004, p. 339) clarified his position: "Some methods may give us more than one estimate of the phylogeny...in such cases we can consider that if 10 tied estimates are found for one bootstrap replicate, we consider each to be one-tenth of a tree, so that the results from that bootstrap replicate are not overemphasized when the trees are combined."

Table 1 Data matrix of binary characters for part A of the third contrived example.
[Table 1 rows: terminals 1A–10A, 1B–6B, and Wildcard; the wildcard row begins 0????????????]

Although this approach does not overemphasize such a BS pseudoreplicate, assigning a score of 0.1 to each of the 10 equally optimal trees does overemphasize each of the clades that is not present in all 10 of those trees. That is, this approach assigns support to clades that are not present in the strict consensus. As noted by Goloboff et al. (2003), any clade present in all optimal trees is by definition supported by the data (irrespective of how well supported the clade may be as measured by a branch-support criterion); any clade present in one or more, but not all, optimal trees is unsupported; and any clade that is not present in any of the optimal trees is contradicted. We follow Goloboff et al.'s (2003) definitions throughout this paper.

Felsenstein's (2004) approach to summarizing equally optimal trees found in a given pseudoreplicate (termed the frequency-within-replicates bootstrap [FWR BS] by Soreng and Davis, 1998) has been explicitly incorporated into both PHYLIP (Felsenstein, 2009) and PAUP* (Swofford, 2001), whereas NONA (Goloboff, 1999a,b) and TNT (Goloboff et al., 2008) explicitly implement the strict-consensus bootstrap (SC BS), in which only clades resolved in the strict consensus (Schuh and Polhemus, 1980) of all optimal trees found within each pseudoreplicate are considered (i.e., 1 if present, 0 if absent). The SC BS generally provides lower support than the FWR BS when multiple trees are held for each pseudoreplicate (Davis et al., 1998; Soreng and Davis, 1998) and is preferred because it does not rely on the unjustified MRC-like approach that inflates inferred support (Davis et al., 2004; Freudenstein and Davis, 2010).

Inflated support for clades that are present in the SC of all equally optimal trees, as described by Davis and colleagues, is not the only problem that can be caused by use of the FWR BS. De Laet et al. (2004, p. 590) stated in a meeting abstract that,...
this method is defective in that it can yield high resampling frequencies for groups that are unsupported by the data, and this can occur in both parsimony and likelihood analyses. Goloboff and Pol (2005, p. 152) provided a contrived example of how this can occur by using a wildcard terminal consisting entirely of missing data in an otherwise well supported (four uncontradicted synapomorphies per clade) pectinate tree. FWR BS support in the example is dependent upon clade size (ranging from 50% for an 11-terminal clade to 96% for a two-terminal clade), showing the same artifact as Bayesian MCMC (Yang and Rannala, 1997) posterior probabilities (see also Pickett and Randle, 2005).

The FWR BS artifact that was first suggested by Sumrall et al. (2001), and later clarified and demonstrated by De Laet et al. (2004) and Goloboff and Pol (2005), is expected to occur in data matrices with one or more wildcard terminals. Terminals may behave as wildcards when they are unambiguously scored for only a few parsimony-informative characters (as in incompletely preserved fossils; Nixon and Wheeler, 1991), when they are scored as autapomorphies for most parsimony-informative characters (because the autapomorphies behave identically to missing data and inapplicable entries (i.e., as with gaps in nucleotide characters; Simmons and Ochoterena, 2000) in a parsimony context), and when there is extreme character conflict caused by convergence or reversal between divergent terminals (e.g., characters in Table 1).

Two types of empirical data matrices in which wildcard terminals may be expected to occur are those for which a large number of terminals are sampled relative to the number of parsimony-informative characters (e.g., Källersjö et al. (1998) sampled rbcL for 2538 plants; Tehler et al.
(2003) sampled 18S rDNA for 1551 fungi) and supermatrix analyses with high amounts of missing data and low overlap in loci sampled among terminals (e.g., McMahon and Sanderson (2006) sampled 2228 Papilionoideae legumes for 33,168 nucleotide characters with only 4.3% of the cells scored as nucleotides; see the supplemental online data matrices for examples of the low overlap in loci sampled among closely related terminals).

In addition to the FWR BS/JK artifact providing apparent support for otherwise unsupported clades, the same type of problem may occur when a non-representative subset of equally optimal trees is held per BS/JK pseudoreplicate. By non-representative subset we mean that the SC of the optimal trees sampled is more resolved than the SC of all equally optimal trees and is resolved such that at least one unsupported clade is more likely to be resolved than by random chance alone. By definition, a single hill-climbing heuristic search (as in nearest-neighbor interchange [NNI], subtree pruning regrafting [SPR], and tree bisection reconnection [TBR]) is only capable of sampling a single island of equally optimal trees (Maddison, 1991). If only a subset of the islands of equally optimal trees is sampled, the SC may resolve

unsupported clades (Maddison, 1991). Yet it is not necessary to sample every tree on every island to obtain a properly resolved SC (Goloboff, 1999a,b).

Weakly supported clades are generally not expected to survive BS/JK resampling (Farris et al., 1996). But even unsupported clades can survive resampling if the trees sampled from most pseudoreplicates are a non-representative subset, which we refer to as the undersampling-within-replicates BS/JK artifact and demonstrate below. One difference between the undersampling-within-replicates and FWR artifacts is that the former does not apply when all equally optimal trees are held in each pseudoreplicate, whereas the latter applies whenever multiple equally optimal trees are held. A second difference is that the undersampling-within-replicates artifact is applicable to pseudoreplicates in which only a single optimal tree is held irrespective of how many optimal trees there are (if more than one), whereas the FWR artifact does not apply unless two or more equally optimal trees are held.

In this manuscript we illustrate the FWR and undersampling-within-replicates BS/JK artifacts using both contrived and empirical examples, demonstrate that the artifacts can occur in both parsimony and likelihood analyses, show that the artifacts occur in outputs from multiple different phylogenetic-inference programs, and infer in which types of data matrices the artifacts are most likely to occur. We conclude by making recommendations on how to minimize the number and severity of occurrences of the artifacts.

2. Methods

2.1. Contrived examples

Four contrived examples were created to demonstrate the artifacts. All examples consist entirely of binary parsimony-informative characters. The first example consists of 102 terminals and 112 characters.
With the exception of the wildcard terminal, which is scored as missing data for the following 12 characters, six uncontradicted synapomorphies support each of the following two clades: (1, 2) and (1, 2, 3). The remaining 100 characters alternately unite terminals with the wildcard terminal. There is very strong support for both clade (1, 2) and (1, 2, 3) relative to terminals 4–101, yet there is no support for those two clades excluding the wildcard terminal. Because of the wildcard terminal, the SC is entirely unresolved. This first example was created to demonstrate how high the inferred support can be for unsupported clades.

The second example is identical to the first example, except that it consists of only 11 terminals and 21 characters. Because of the wildcard terminal, the SC is entirely unresolved. This example was created to be more computationally tractable than the first example and also to demonstrate the importance of collapsing branches with a minimum possible length of zero (see below under Section 2.3).

The third example consists of two parts, A and B. The first character of part A unites terminals 1B–6B and the wildcard terminal separately from terminals 1A–10A (Table 1) in the SC. With the exception of the wildcard terminal, which is scored as missing data for characters 2–13, six uncontradicted synapomorphies support each of the following two clades: (1B, 2B) and (1B, 2B, 3B). The remaining 15 characters alternately unite terminals 1A–10A and 2B–6B with the wildcard terminal. Part B consists of only the last seven terminals of part A. This third example was created to demonstrate that a localized wildcard terminal in the SC can behave as a global wildcard in some BS and JK pseudoreplicates, thereby making the artifacts more severe than may otherwise be expected.

The fourth example consists of 14 terminals and 60 characters (Table 2). Terminals 7 and 8 are identical to each other and are the only terminals without any missing data.
Six uncontradicted synapomorphies are provided for each of 10 clades (or 12 synapomorphies for each of five clades, depending on how the tree is resolved), but because of the missing data, the SC is entirely unresolved. This example was created to demonstrate how low character overlap between sampled terminals in supermatrix analyses can create the artifacts.

2.2. Empirical data

Four empirical data sets were included in this study, two of which sampled a single locus for all terminals (Bailey et al., 2006; Richardson et al., 2006), and two of which sampled many loci but have substantial missing data (McMahon and Sanderson, 2006; Thomson and Shaffer, 2010; Table 3). All four data sets produced trees that have one or more large polytomies and many weakly supported clades. Each of the four data sets was split into two (Thomson and Shaffer, 2010) or six (all others) sub-matrices so that relatively thorough most parsimonious, bootstrap, and jackknife tree searches would be computationally tractable in PAUP*.

The Bailey et al. (2006) data set used consists of 618 unique rDNA internal transcribed spacer (ITS) sequences (after deletion of 128 duplicates) sampled from the Brassicaceae (mustards). The data matrix comprises 721 nucleotide characters, of which 451 are parsimony informative. The 618 terminals were split into six ± natural groups (to the degree possible, based on the tree presented in Fig. S1 of Bailey et al. (2006)) of similar size (Table 3).

The Richardson et al. (2006) data set used consists of 511 unique kinesin-superfamily-motor-domain sequences (after deletion of 18 duplicates) sampled from 19 model species across eukaryotes. The data matrix comprises 2584 amino acid characters, of which 834 are parsimony informative. The 511 terminals were split into six ± natural groups (to the degree possible, based on the tree presented in Figs of Richardson et al. (2006)) of similar size (Table 3).
Table 2 Data matrix of binary characters for the fourth contrived example.
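The layout of the first two contrived examples in Section 2.1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' matrix-building procedure. The function names are hypothetical, and the assumption that each of terminals 2 onward shares exactly one character with the wildcard is inferred from the pattern shown in Table 4.

```python
def build_wildcard_matrix(n_regular, n_syn=6):
    """Build a binary matrix patterned on contrived examples 1 and 2.

    Assumed layout (inferred from the text and Table 4, not confirmed):
      - n_syn characters uniting terminals 1 and 2,
      - n_syn characters uniting terminals 1, 2, and 3,
      - one character per terminal 2..n_regular uniting it with the wildcard.
    The wildcard is scored '?' for the first 2 * n_syn characters.
    """
    n_pairing = n_regular - 1
    rows = {}
    for t in range(1, n_regular + 1):
        clade_12 = ("1" if t <= 2 else "0") * n_syn
        clade_123 = ("1" if t <= 3 else "0") * n_syn
        pairing = "".join("1" if t == k else "0"
                          for k in range(2, n_regular + 1))
        rows[str(t)] = clade_12 + clade_123 + pairing
    rows["Wildcard"] = "?" * (2 * n_syn) + "1" * n_pairing
    return rows


def is_parsimony_informative(column):
    """True if at least two states each occur in at least two terminals."""
    scored = [state for state in column if state != "?"]
    counts = {state: scored.count(state) for state in set(scored)}
    return sum(1 for c in counts.values() if c >= 2) >= 2


second = build_wildcard_matrix(10)    # 11 terminals, 21 characters
first = build_wildcard_matrix(101)    # 102 terminals, 112 characters
```

With `n_regular=10` the sketch has the dimensions stated for the second example, and with `n_regular=101` those stated for the first. Every character is parsimony informative under the usual two-states-in-two-terminals criterion.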

Table 3 Properties of data matrices sampled. Columns: Matrix; # Terminals; # Pars. inf. chars; % Missing/inapplicable for pars. inf. chars; Maximum % missing/inapplicable for single terminal; # Clades on strict consensus; # Additional clades on majority-rule consensus; # Problem clades (a). Rows: the contrived matrices and the Bailey et al., Richardson, McMahon, and Thomson sub-matrices.
(a) Defined as clades in the MRC but not the SC showing ≥50% BS and/or JK support in any of the parsimony-based PAUP* or TNT results.

The McMahon and Sanderson (2006) data set used consists of the dense supermatrix of 2228 terminals sampled from the Papilionoideae legumes. Of the 33,168 nucleotide characters, 7199 are parsimony informative. The data matrix consists of 89% missing data, 6.7% gapped positions (no gap characters were coded), and only 4.3% nucleotides. Six convex groups (monophyletic or paraphyletic) as resolved on the SC of 5000 equally parsimonious trees downloaded from pubdata.htm, without any overlapping terminals, were sub-sampled from the original matrix. Unlike the other convex groups, the third matrix was largely homogeneous with respect to taxon sampling (all terminals from Astragalus) and character sampling (all but three terminals were sampled for both ITS1 and ITS2 of rDNA; one terminal was only sampled for ITS1 and one terminal was only sampled for ITS2; the entire data matrix consisted of only 20.7% missing or inapplicable entries; Table 3). One duplicate sequence was deleted from the fifth matrix.

Two alternative data sets were used from Thomson and Shaffer's (2010) supermatrix of 53,406 nucleotide characters (of which 4440 are parsimony informative in the 213-terminal matrix and 4511 are parsimony informative in the 223-terminal matrix) sampled from turtles. The inferred phylogeny from the 213-terminal matrix was presented in Thomson and Shaffer's (2010) Fig. 5, after removal of 10 wildcard (or "rogue") terminals.
Both the 213- and 223-terminal matrices were split into two sub-matrices based on which taxa are present on pages 53 and 54 of Thomson and Shaffer (2010). The 10 wildcard terminals were assigned to these two sub-matrices based on current turtle taxonomy. All data sets analyzed for this study, both contrived and empirical, are posted as supplemental online data.

2.3. Parsimony tree searches

With the exception of contrived example 2B, all PAUP* ver. 4b10 and TNT ver. 1.1 searches were performed such that branches with a minimum possible optimized length of zero would be collapsed following Davis et al. (2005) and Freudenstein and Davis (2010). Failure to collapse these branches can lead to a proliferation of equally optimal trees with branches that are only supported under some character-state optimizations, particularly in the context of missing data (Kitching et al., 1998; Kearney and Clark, 2003). Contrived example 2B was run using the default setting in PAUP* (collapse branches only if their maximum length is zero) and by deactivating the collapse function in TNT.

For contrived examples 2–4, searches for the most parsimonious trees were performed in PAUP* using branch and bound with all most parsimonious trees being held, after which the SC and MRC were calculated. BS and JK analyses for contrived examples 2, 3B, and 4 were performed using 10,000 replicates. Following Farris et al. (1996), the JK deletion probability was set to and jac resampling was emulated. The following three approaches were used for calculating BS and JK support: (1) branch-and-bound searches while holding a single optimal tree for each pseudoreplicate, (2) 100 random-addition-sequence (RAS) searches employing TBR swapping with up to 10 optimal trees held per RAS search (hence up to 1000 trees held per pseudoreplicate), and (3) branch-and-bound searches while holding all optimal trees for

each pseudoreplicate. In all BS and JK analyses performed for this study, only clades with ≥50% support were considered.

Because of time limitations when attempting branch-and-bound searches, BS and JK support for contrived example 3A were calculated using the following three approaches: (1) one RAS search with TBR swapping and only a single tree held for each of the 2500 pseudoreplicates, (2) 100 RAS searches employing TBR swapping with up to 10 trees held per TBR search for each of the 10,000 pseudoreplicates, and (3) 10 RAS searches employing TBR swapping with up to 10,000 trees held per RAS search (hence up to 100,000 trees held per pseudoreplicate).

For contrived example 1 and all empirical matrices, searches for the most parsimonious trees were performed in PAUP* using 1000 RAS searches employing TBR swapping and up to 10,000 trees held per RAS search (hence up to 10 million trees held), after which the SC and MRC were calculated. BS and JK analyses were generally performed using 2500 pseudoreplicates. The following four approaches were used for calculating BS and JK support: (1) one RAS search with TBR swapping and only a single tree held, (2) 10 RAS searches with TBR swapping and only a single tree held per RAS search (hence up to 10 trees held per pseudoreplicate), (3) 10 RAS searches with TBR swapping and up to 10 trees held per RAS search (hence up to 100 trees held per pseudoreplicate), and (4) 10 RAS searches with TBR swapping and up to 10,000 trees held per RAS search (hence up to 100,000 trees held per pseudoreplicate).

Because of memory and/or speed limitations, fewer than 2500 BS and/or JK pseudoreplicates were used for 14 of the empirical matrices (from Bailey et al. (2006), McMahon and Sanderson (2006), and Thomson and Shaffer (2010)) when performing 10 RAS searches with TBR swapping and up to 10,000 trees held per RAS search.
The number of pseudoreplicates actually used in these cases ranged from 23 to 1812 with a median of only 55. Because of these low numbers of pseudoreplicates, a high degree of error is expected relative to the other BS and JK analyses performed. These 14 abbreviated PAUP* analyses constitute just 8% of all PAUP* BS and JK analyses performed. Greater than ±1% accuracy is expected for all BS analyses using 10,000 pseudoreplicates, and greater than ±2% accuracy is expected for all BS analyses using 2500 replicates, for BS support ≥50% (Hedges, 1992).

Identical BS and JK searches to those performed in PAUP* were performed in TNT, with the exception that the JK deletion probability was set to 0.37 because TNT does not allow it to be set beyond the nearest hundredth. Because of the far greater speed relative to PAUP*, all TNT searches were run to completion.

To the degree possible, the four sets of contrived data sets were also run in DAMBE (Xia and Xie, 2001), MEGA (Tamura et al., 2007), PHYLIP (Felsenstein, 2009), POY4 (Varón et al., 2010), and SeaView (Gouy et al., 2010). The 0 and 1 character states were converted to adenines and thymines for the DAMBE, MEGA, and SeaView analyses.

DAMBE ver was used to perform both BS and 50%-deletion JK analyses (the probability of deletion cannot be varied for JK analyses). There are no options to control the quality of tree search or the number of trees held per random-addition-order, BS, or JK pseudoreplicate. Because of memory limitations, only 1000 pseudoreplicates, each consisting of 10 RAS searches, were performed. Because DAMBE eliminates any characters with missing data, "?" was changed to autapomorphic cytosines for contrived examples 1–3. Because DAMBE is limited to nucleotide character states and would treat N's as missing data, it was impossible to run contrived matrix 4.
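The resampling settings and expected accuracy described above can be sketched in Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the jackknife deletes each character independently with the given probability (0.37 is the TNT setting reported in the text), and the accuracy figure uses the ordinary binomial normal approximation, which is an assumption here and may differ from the calculation in Hedges (1992).

```python
import math
import random


def bootstrap_weights(n_chars, rng):
    """Sample n_chars characters with replacement; return per-character counts."""
    weights = [0] * n_chars
    for _ in range(n_chars):
        weights[rng.randrange(n_chars)] += 1
    return weights


def jackknife_weights(n_chars, deletion_prob, rng):
    """Delete each character independently with probability deletion_prob."""
    return [0 if rng.random() < deletion_prob else 1 for _ in range(n_chars)]


def support_half_width(p, n_reps, z=1.96):
    """Approximate 95% half-width for a support proportion p from n_reps
    pseudoreplicates (binomial normal approximation; an assumption)."""
    return z * math.sqrt(p * (1.0 - p) / n_reps)


rng = random.Random(0)                      # fixed seed for a repeatable sketch
bs = bootstrap_weights(21, rng)             # 21 characters, as in example 2
jk = jackknife_weights(21, 0.37, rng)       # 0.37 as set in TNT

worst_case_10000 = support_half_width(0.5, 10_000)   # about 0.0098
worst_case_2500 = support_half_width(0.5, 2_500)     # about 0.0196
```

The half-width is widest at 50% support, so the values at p = 0.5 bound the error for all clades with ≥50% support. They are consistent with the ±1% and ±2% figures quoted for 10,000 and 2500 pseudoreplicates.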
Presumably because of limitations on the number of terminals, DAMBE was unable to complete even a single search for the most parsimonious trees on contrived example 1. As such, the DAMBE results are limited to contrived examples 2 and 3.

MEGA ver. 4.1b3 was used to perform BS analyses. Tree searches were performed using all characters rather than the default setting of excluding those characters with gaps or missing data. There are no options to control the number of trees held per RAS replicate or BS pseudoreplicate. Because of memory and speed limitations, 10,000 BS pseudoreplicates, each consisting of 100 RAS searches, were performed using close-neighbor-interchange with the intermediate search level (two) for contrived examples 2–4. Because of time limitations, contrived example 1 was analyzed using 200 BS pseudoreplicates, each consisting of five RAS searches using close-neighbor-interchange with the lowest search level (one).

The PHYLIP ver program package was used to perform BS and JK analyses. Matrix weight files were constructed with SEQBOOT and these were used to perform the parsimony analyses with PARS. In all cases, 10,000 pseudoreplicates were run. Within PARS, five RAS searches with the more thorough branch swapping option were applied in each pseudoreplicate. One, 1000, or 10,000 trees were held per pseudoreplicate. The JK deletion fraction was set at 37%. MRCs with resampling values were constructed with CONSENSE.

POY4 ver was used to perform BS and JK analyses. Data were run as morphological characters with a static matrix. Pseudoreplicates were performed using 10 RAS searches with TBR branch swapping, saving one, 1000, or 10,000 trees per RAS search.

SeaView ver and were used to perform BS analyses through dnapars from PHYLIP ver All gapped characters were included in the analyses and gaps were treated as unknown states rather than as a separate character state. There are no options to control the quality of tree searches.
Because of memory and speed limitations, only 1000 BS pseudoreplicates were performed for most analyses (200 and 150 pseudoreplicates were performed for the first contrived example when holding 10 and 10,000 trees per search, respectively). It is unclear whether the number of optimal trees retained is set per RAS search (as in PAUP*) or per BS pseudoreplicate, but this was alternatively set to one RAS search with one tree held, 100 RAS searches with 10 trees held, or 100 RAS searches with 10,000 trees held. Together with M. Gouy (pers. comm., 2010), we found a bug in how SeaView ver handles multiple equally parsimonious trees within a given pseudoreplicate, which M. Gouy corrected in ver. Only the results from ver are reported here.

2.4. Likelihood tree searches

To demonstrate that the FWR and undersampling-within-replicates artifacts are not limited to parsimony-based analyses, the second contrived example was also analyzed using likelihood in PAUP* and RAxML ver (Stamatakis, 2006). The second contrived example was modified from 0/1 character states into nucleotide character states, with the same character-state distribution among terminals for each character being maintained (Table 4). All 12 types of nucleotide substitutions are represented among the 21 characters. In addition to analyzing the data matrix of 11 terminals coded for 21 characters, a second data matrix was created by adding 1000 invariant characters with 250 characters for each of the four nucleotides.

So as to perfectly fit the data, the Felsenstein (1981) model (F81) was applied to the 21-character matrix in PAUP* by selecting Nst = 1 and the default setting of BaseFreq = Empirical. The F81 model with invariant sites (Reeves, 1992) set to pinvar = (i.e., 1000/1021) was applied to the 1021-character matrix. The optimal likelihood trees were found for each matrix using branch-and-bound while holding all trees.
Both BS and JK analyses were performed on each matrix using 2500 pseudoreplicates, and RAS followed by TBR searches (one RAS search with one tree held,

10 RAS searches with one tree held per RAS search, and 10 RAS searches with up to 10 trees held per RAS search).

Within RAxML, an optimal likelihood tree was searched for using 1000 independent searches starting from randomized parsimony trees with the GTRGAMMA model (the simplest model available for nucleotide characters in RAxML) and four discrete rate categories. BS analyses were conducted using 2500 pseudoreplicates with 100 searches per pseudoreplicate and using the -f i option, which "refine[s] the final BS tree under GAMMA and a more exhaustive algorithm" (Stamatakis, 2008, p. 9).

Table 4 Data matrix of binary characters that have been modified into nucleotide character states for likelihood analyses of the second contrived example.

1         ATGCAT  TGCAGC  ATGCATGCG
2         ATGCAT  TGCAGC  TTGCATGCG
3         TGCAGC  TGCAGC  AGGCATGCG
4         TGCAGC  ATGCAT  ATCCATGCG
5         TGCAGC  ATGCAT  ATGAATGCG
6         TGCAGC  ATGCAT  ATGCGTGCG
7         TGCAGC  ATGCAT  ATGCACGCG
8         TGCAGC  ATGCAT  ATGCATTCG
9         TGCAGC  ATGCAT  ATGCATGGG
10        TGCAGC  ATGCAT  ATGCATGCA
Wildcard  ??????  ??????  TGCAGCTGA

2.5. Data analysis

The BS and JK results for each data matrix were plotted onto the MRC using TreeGraph 2 ver (Stöver and Müller, 2010). Any additional clades resolved in a given BS or JK tree, which we found can occasionally occur, were not recorded. Only clades with ≥50% BS or JK support are reported unless otherwise stated. The raw data are posted as supplemental online data. A minor confounding effect when comparing PAUP* and TNT results is that the JK deletion probability for PAUP* was set to , whereas it was set to 0.37 in TNT.

For the four empirical data sets, the results from the two single-locus studies (Bailey et al., 2006; Richardson et al., 2006) were analyzed together and the results from the two supermatrix studies (McMahon and Sanderson, 2006; Thomson and Shaffer, 2010) were analyzed together.
To eliminate any redundancy, only the results from the 223-terminal Thomson and Shaffer (2010) matrix were analyzed (though results from the 213-terminal matrix are included as supplemental online data). The rationale for making these pairwise groups was twofold: preliminary data inspection indicated consistent results within each of the combined data sets, and the single-locus and supermatrix studies are fundamentally different from each other with respect to numbers of parsimony-informative characters, percent missing and inapplicable entries for parsimony-informative characters, and the maximum percent of missing and inapplicable entries from single terminals (Table 3).

To sum and quantify the BS and JK support for otherwise unsupported clades, the averaged number of artifactual clades resolved (Simmons et al., 2010), a measure similar to the averaged overall success of resolution (Simmons and Webb, 2006), was applied. This measure scales BS and JK support to that conferred by one to four uncontradicted synapomorphies. Clades with 50–62% BS/JK support (less than that provided by one uncontradicted synapomorphy) were set to 0.2, 63–85% support to 0.4, 86–94% support to 0.6, 95–97% support to 0.8, and 98–100% support (equivalent to at least four uncontradicted synapomorphies) to 1.0.

Least-squares regression equations and determination coefficients were calculated in Microsoft Excel. For cases where the raw data were appropriate for regression (i.e., each value was an estimate of some parameter), those values were used directly. In cases where the collected data points resulted in a sum, the sums were used for regression, resulting in fewer data points being used.
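
As a rough illustration of the scaling and regression steps described above, the following Python sketch maps a BS/JK percentage onto the 0.2 to 1.0 scale and fits an ordinary least-squares line. The function names and example values are hypothetical. Integer percentages are assumed, the top bin is taken to be 98–100%, and scoring support below 50% as 0 is an assumption rather than something the text states. The regressions in this study were calculated in Microsoft Excel, not with this code.

```python
def scaled_support(percent):
    """Scale an integer BS/JK percentage to 0.2-1.0 (assumption: <50% scores 0)."""
    if percent < 50:
        return 0.0
    if percent <= 62:
        return 0.2
    if percent <= 85:
        return 0.4
    if percent <= 94:
        return 0.6
    if percent <= 97:
        return 0.8
    return 1.0  # assumed top bin: 98-100%


def mean_scaled_support(percents):
    """Average the scaled support of one clade across tree-search strategies."""
    return sum(scaled_support(p) for p in percents) / len(percents)


def least_squares(xs, ys):
    """Return (slope, intercept, r_squared) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy ** 2 / (sxx * syy)


# Hypothetical support for one unsupported clade under four search strategies.
print(mean_scaled_support([99, 96, 90, 45]))
# Hypothetical (x, y) points lying exactly on the line y = 2x.
print(least_squares([1, 2, 3, 4], [2, 4, 6, 8]))
```

Returning the determination coefficient alongside the slope mirrors how the two quantities are reported together in the Results.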
Because it is likely that the assumption of homoscedasticity is violated in studies of branch-support values, given that variance is typically greater with lower values than with higher ones (Hedges, 1992), significance values were not determined for the regressions, but the slopes and determination coefficients (r²) were used as a general guide to the patterns and strength of correlations.

In order to test directly whether the SC or another approach is implemented in a particular program, a simple binary matrix was constructed and run using a small number n of resampling replicates and saving multiple trees per replicate (Jerrold I. Davis, pers. comm., 2010). Programs using a SC approach can only yield support values that are multiples of 1/n, while other approaches, such as FWR, can yield intermediate values. The specific matrix that we used may be accessed in the supplemental online data.

3. Results

3.1. Contrived examples

In the first contrived example, the tree is entirely unresolved in the SC, but clades (1, 2) and (1, 2, 3) are present in 99% and 98%, respectively, of the 100 most parsimonious trees found (Fig. 1). All programs reported ≥97% BS for clade (1, 2) and ≥95% BS for clade (1, 2, 3). Similarly high JK support was reported by PAUP and PHYLIP for the same two clades, yet both clades received <50% JK support from TNT when more than one tree could be held per pseudoreplicate. Due to computational time constraints, not all PHYLIP or POY4 analyses were completed. This first example demonstrates how high the inferred support can be for unsupported clades.

In the second contrived example, the tree is entirely unresolved in the SC, but clades (1, 2) and (1, 2, 3) are present in 89% and 78%, respectively, of the nine most parsimonious trees when collapsing branches with a minimum possible optimized length of zero (Fig. 2). The same two clades are present in 98% and 96% of the 51 most parsimonious trees when such branches were not collapsed.
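
To make concrete why clades found in most but not all optimal trees, such as those in the contrived examples above, fare so differently under the FWR and SC treatments, here is a minimal Python sketch. It is neither the test matrix used in this study nor any program's implementation: trees are reduced to sets of clades, the four pseudoreplicates are invented, and the last function assumes that the held trees are a uniform random sample of the optimal trees, which heuristic searches need not satisfy.

```python
from fractions import Fraction
from math import comb

CLADE_12 = frozenset({1, 2})          # the clade (1, 2)
WITH = frozenset({CLADE_12})          # toy tree that contains clade (1, 2)
WITHOUT = frozenset()                 # toy tree that lacks it


def fwr_support(replicates, target):
    """FWR: each optimal tree in a replicate is weighted 1/(trees in replicate)."""
    total = Fraction(0)
    for trees in replicates:
        hits = sum(1 for tree in trees if target in tree)
        total += Fraction(hits, len(trees))
    return total / len(replicates)


def sc_support(replicates, target):
    """SC: a replicate counts only if every optimal tree contains the clade."""
    hits = sum(1 for trees in replicates if all(target in t for t in trees))
    return Fraction(hits, len(replicates))


def p_all_held_contain(n_optimal, n_with_clade, n_held):
    """Chance that n_held trees drawn without replacement all contain the clade."""
    return Fraction(comb(n_with_clade, n_held), comb(n_optimal, n_held))


# Four invented pseudoreplicates, each a list of equally optimal trees.
replicates = [[WITH], [WITH, WITHOUT], [WITH, WITH], [WITHOUT]]
print(sc_support(replicates, CLADE_12), fwr_support(replicates, CLADE_12))

# Idealized undersampling: clade present in 99 of 100 optimal trees.
for held in (1, 10, 100):
    print(held, p_all_held_contain(100, 99, held))
```

With n = 4 replicates the SC value here is 1/2, a multiple of 1/n, whereas the FWR value is 5/8, which is not. Under the stated idealization, a clade present in 99 of 100 optimal trees survives the strict consensus of 10 held trees with probability 9/10 but never survives when all 100 trees are held.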
Collapsing such branches always decreased both PAUP and TNT BS and JK values for the unsupported clades when all trees were held in each pseudoreplicate, but did not do so when only a single tree was held. Consistent with the first contrived example, only TNT and POY4 JK values were <50% for both unsupported clades when multiple trees were held per pseudoreplicate and branches with a minimum possible optimized length of zero were collapsed. This second example demonstrates the importance of collapsing branches with a minimum possible length of zero.

In part A of the third contrived example, only clade (1B–6B + wildcard) is resolved in the SC, but clades (1B, 2B) and (1B, 2B, 3B) are present in 80% and 60%, respectively, of the most parsimonious trees found (Fig. 3). Despite being present in the SC and most JK trees, clade (1B–6B + wildcard) is unresolved in all ≥50% BS trees. Only SeaView BS and TNT and POY4 JK values are <50% for both unsupported clades when multiple trees were held per pseudoreplicate. Part B of the third contrived example, after elimination of terminals 1A–10A, is otherwise identical to part A with respect to resolution on the SC and MRC trees. Although the two unsupported clades might be considered independent of clade (1B–6B + wildcard) from part A, BS and JK values for these two unsupported clades generally decreased in part B, often substantially, and particularly when multiple trees were held per pseudoreplicate. This third example demonstrates that a localized wildcard terminal in the SC (as in part A) can behave as a global wildcard in some BS and JK pseudoreplicates, thereby making the artifacts more severe than may otherwise be expected (as in part B).

In the fourth contrived example, the tree is entirely unresolved in the SC, but clade (7, 8) is present in 99.9% of the most parsimonious trees (Fig. 4). The BS and JK values assigned to this unsupported clade differed dramatically among phylogenetic-inference programs and, in PAUP, with the number of trees held. SeaView BS, TNT BS and JK, and PAUP BS and JK, when only a single tree could be held per pseudoreplicate, are <50%. In contrast, MEGA BS, PHYLIP BS and JK, and PAUP BS and JK, when multiple trees could be held per pseudoreplicate, are ≥95% and frequently 100% (presumably due to rounding error, at least in the PAUP branch-and-bound searches). The BS and JK values also differ dramatically for the other clades shown in Fig. 4 depending on the program and the number of trees held. PAUP BS and JK values are 100% for all of these clades [except clade (1–6) with 99% JK] and TNT BS and JK values are 54% when only a single most parsimonious tree was held per pseudoreplicate. Yet these clades received <50% BS and JK values from PAUP and TNT when all optimal trees were held per pseudoreplicate. Alternatively, PHYLIP BS and JK values are 64% for clades (1, 2) and (13, 14), yet <50% for the other clades, when only a single tree could be held per pseudoreplicate. SeaView provided different values still, with ≥60% BS for four of the clades when only a single tree could be held per pseudoreplicate. This fourth example demonstrates how minimal character overlap between sampled terminals in supermatrix analyses can create the artifacts and how the BS and JK values can vary dramatically amongst programs.

Fig. 1. Results from the first contrived example plotted on the majority-rule consensus. Percentages in the majority-rule consensus are at the left edge of each branch. BS support is above each branch while JK support is below each branch. The maximum number of trees held in a given pseudoreplicate increases from left to right (1, 10, 100, 100,000). The BS/JK supports for MEGA, PHYLIP, and SeaView analyses are at their closest inferred positions. If a clade was not resolved in a given ≥50% BS or JK tree then it is reported as.

Fig. 2. Results from the second contrived example plotted on the majority-rule consensus for both parsimony and likelihood. Percentages in the parsimony majority-rule consensus are at the left edge of each branch. BS support is above each branch while JK support is below each branch. The maximum number of trees held in a given pseudoreplicate increases from left to right (parsimony: 1, 1000, all; PAUP likelihood: 1, 10, 100). The BS/JK supports for DAMBE, MEGA, PHYLIP, RAxML, and SeaView analyses are at their closest inferred positions. If a clade was not resolved in a given ≥50% BS or JK tree then it is reported as. "col 0" refers to collapsing branches if the minimum optimized length is zero; said branches were not collapsed in the other PAUP and TNT analyses reported here.

Fig. 3. Results from the third contrived example plotted on the majority-rule consensus. Percentages in the majority-rule consensus are at the left edge of each branch. BS support is above each branch while JK support is below each branch. The "a" analyses are those that incorporated terminals 1A–10A, whereas the "b" analyses did not. The maximum number of trees held in a given pseudoreplicate increases from left to right (1, 1000, all). The BS/JK supports for DAMBE, MEGA, PHYLIP, and SeaView analyses are at their closest inferred positions. If a clade was not resolved in a given ≥50% BS or JK tree then it is reported as. indicates that the wildcard terminal was resolved on the other side of the branch in the TNT BS/JK trees.

Fig. 4. Results from the fourth contrived example plotted on the BS or MRC for three selected clades. Percentages in the majority-rule consensus (including those <50%) are at the left edge of each branch. BS support is above each branch while JK support is below each branch. The maximum number of trees held in a given pseudoreplicate increases from left to right (1, 1000, all). The BS/JK supports for MEGA, PHYLIP, and SeaView analyses are at their closest inferred positions. If a clade was not resolved in a given ≥50% BS or JK tree then it is reported as.

3.2. Empirical data

All of the following results are restricted to clades that were resolved in the MRC but not in the SC. By definition (Goloboff et al., 2003), all such clades are unsupported (or even contradicted if such clades conflict with those resolved in the SC). The number of terminals in each clade (in an unrooted sense, with the smallest possible clade size reported) vs. the averaged percentage in the MRC for those clades is presented in Fig. 5. A negative relationship (least-squares regression slope = -0.96x; r² = 0.21) was inferred in the combined supermatrix results, whereas a very weak positive relationship was inferred in the combined single-locus results (least-squares regression slope = 0.19x; r² = 0.02).

Fig. 5. Number of terminals per clade vs. percentage in majority-rule consensus for that clade for all single-locus analyses (filled diamonds) and supermatrix analyses (open triangles). Error bars are ±95% confidence intervals.

The number of terminals in each clade vs. the frequency of the FWR or undersampling-within-replicates artifact for those clades (defined by ≥50% support from two or more of the four tree-search strategies) is presented in Fig. 6a and b for the single-locus and supermatrix results, respectively. A slightly negative relationship was inferred in all single-locus (PAUP BS = -0.009x, r² = 0.18; PAUP JK = -0.009x, r² = 0.19; TNT BS = -0.006x, r² = 0.16; TNT JK = -0.008x, r² = 0.14) and the PAUP supermatrix results (BS = -0.008x, r² = 0.06; JK = -0.012x, r² = 0.14), while a stronger negative relationship was inferred in the TNT supermatrix results (BS = -0.028x, r² = 0.62; JK = -0.031x, r² = 0.74). The percentage in the MRC (binned at 5% intervals) vs. the frequency of the FWR or undersampling-within-replicates artifact for those clades is presented in Fig. 6c and d for the single-locus and supermatrix results, respectively.
A clear positive relationship was inferred in all supermatrix results (PAUP BS = 0.009x, r² = 0.83; PAUP JK = 0.009x, r² = 0.78; TNT BS = 0.012x, r² = 0.85; TNT JK = 0.012x, r² = 0.82), while a weaker positive relationship was inferred in the single-locus results (PAUP BS = 0.001x, r² = 0.18; PAUP JK = 0.003x, r² = 0.22; TNT BS = 0.001x, r² = 0.07; TNT JK = 0.003x, r² = 0.22).

The percentage in the MRC vs. the scaled support for those clades (as defined in Section 2.5, averaged across all four tree-search strategies) is presented in Fig. 6e and f for the single-locus and supermatrix results, respectively. These comparisons were limited to those clades resolved by both BS and JK within PAUP for the PAUP results and those resolved by both BS and JK within TNT for the TNT results. The comparisons were limited in this manner so as to better compare BS and JK results within each program without biasing the averages based on inclusion of clades with weak support in JK trees that are entirely absent in ≥50% BS trees. A clear positive relationship was inferred in all supermatrix results (PAUP BS = 0.006x, r² = 0.94; PAUP JK = 0.006x, r² = 0.91; TNT BS = 0.005x, r² = 0.93; TNT JK = 0.005x, r² = 0.93), while a weak positive relationship was inferred in the single-locus results (PAUP BS = 0.003x, r² = 0.72; PAUP JK = 0.005x, r² = 0.22; TNT BS = x, r² = 0.002; TNT JK = 0.002x, r² = 0.04).

The tree-search strategy within each pseudoreplicate, with increasing thoroughness and number of trees held, vs. the number of occurrences of the FWR or undersampling-within-replicates artifact is presented in Fig. 7a and b for the single-locus and supermatrix results, respectively. Increasing the number of RAS/TBR searches performed from 1 to 10, and consequently the number of trees held from 1 to ≤10, increased the number of occurrences of the artifacts in all PAUP results, and to a lesser degree in the TNT supermatrix results.
Further increasing the number of trees held generally decreased the number of occurrences of the artifacts in both PAUP and TNT results. Unlike for PAUP, the number of occurrences of the undersampling-within-replicates artifact in the supermatrix results decreased precipitously for TNT when the maximum number of trees held was increased from 10 to 10,000 per RAS/TBR search. Essentially the same results were obtained when the tree-search strategy within each pseudoreplicate was compared to the number of occurrences of the FWR or undersampling-within-replicates artifact scaled to support for those clades (Fig. 7c and d).

Fig. 6. Pairwise comparisons between three potentially correlated factors in the single-locus (a, c, and e) and supermatrix (b, d, and f) analyses. (a and b) number of terminals in clade vs. frequency of artifacts; (c and d) percentage in majority-rule consensus vs. frequency of artifacts; (e and f) percentage in majority-rule consensus vs. scaled support for artifacts. Symbols used are explained in (c). Error bars are ±95% confidence intervals.

4. Discussion

The contrived examples were deliberately created to show how extreme the FWR and undersampling-within-replicates artifacts could be. Yet the supermatrix studies demonstrated that the same artifacts could lead to 90+% BS/JK values for unsupported clades


More information
