Molecular dating of phylogenetic trees: A brief review of current methods that estimate divergence times


Diversity and Distributions (Diversity Distrib.) (2006) 12, Blackwell Publishing, Ltd. CAPE SPECIAL FEATURE

Frank Rutschmann, Institute of Systematic Botany, University of Zürich, Zollikerstrasse 107, CH-8008 Zürich, Switzerland

Correspondence: Frank Rutschmann, Institute of Systematic Botany, University of Zürich, Zollikerstrasse 107, CH-8008 Zürich, Switzerland. frank@plant.ch

ABSTRACT

This article reviews the most common methods used today for estimating divergence times and rates of molecular evolution. The methods are grouped into three main classes: (1) methods that use a molecular clock and one global rate of substitution, (2) methods that correct for rate heterogeneity, and (3) methods that try to incorporate rate heterogeneity. Additionally, links to the most important literature on molecular dating are given, including articles comparing the performance of different methods, papers that investigate problems related to taxon, gene and partition sampling, and literature discussing highly debated issues like calibration strategies and uncertainties, dating precision and the calculation of error estimates.

Keywords: Divergence time estimation, molecular dating methods, rate heterogeneity, review.

INTRODUCTION

The use of DNA sequences to estimate the timing of evolutionary events is increasingly popular. The idea of dating evolutionary divergences using calibrated sequence differences was first proposed in 1965 by Zuckerkandl and Pauling (1965). The authors postulated that the amount of difference between the DNA molecules of two species is a function of the time since their evolutionary separation. This was shown by comparing protein sequences (hemoglobins) from different species and further comparing amino acid substitution rates with ages estimated from fossils.
Based on this central idea, molecular dating has been used in countless studies as a method to investigate mechanisms and processes of evolution. For example, the timing of eucaryotic evolution (Douzery et al., 2004), the Early Cambrian origin of the main phyla of animals (Cambrian explosion; Wray et al., 1996; Smith & Peterson, 2002; Aris-Brosou & Yang, 2003), the replacement of dinosaurs by modern birds and mammals in the late Tertiary (Madsen et al., 2001), and the age of the last common ancestor of the main pandemic strain of human immunodeficiency virus (HIV; Korber et al., 2000) have all been investigated using molecular dating. In plants, too, numerous studies have used molecular dating methods to investigate the timeframe of evolutionary events, e.g. to test biogeographical hypotheses or to investigate the causes of recent radiations (for a more complete review see Sanderson et al., 2004). Dating techniques have been applied to taxa at very different taxonomic levels, e.g. to infer the age of plastid-containing eucaryotes (Yoon et al., 2004), land plants (Sanderson, 2003a), tracheophytes (Soltis et al., 2002), angiosperms (Magallón & Sanderson, 2001; Sanderson & Doyle, 2001; Wikström et al., 2001, 2003; Bell et al., 2005), the monocot-dicot divergence (Chaw et al., 2004), Asterids (Bremer et al., 2004), Myrtales (Sytsma et al., 2004), Crypteroniaceae (Conti et al., 2002) and Fuchsia (Berry et al., 2004).

The goal of this article is to give a short overview of the most commonly used molecular dating methods. To allow for easier comparisons, the different methods for estimating divergence times are also summarized in Tables 1-3.

A. The ideal case scenario: a molecular clock and one global rate of substitution

In the special case of a molecular clock, all branches of a phylogenetic tree evolve at the same, global substitution rate.
The clock-like tree is ultrametric, which means that the total distance between the root and every tip is constant.

Method 1: Linear regression (Nei, 1987; Li & Graur, 1991; Hillis et al., 1996; Sanderson, 1998)

In an ultrametric tree, nodal depths can be converted easily into divergence times, because the molecular distance between each member of a sister pair and their most recent common ancestor is one half of the distance between the two sequences. If the divergence time for at least one node is known (calibration point), the global rate of substitution can be estimated and, based on that, divergence times for all nodes can be calculated by linear regression of the molecular distances (Li & Graur, 1991; Sanderson, 1998).

The Author. DOI: /j x. Journal compilation 2006 Blackwell Publishing Ltd

Table 1 Comparison table of different molecular dating methods. Part 1: Methods that use a molecular clock and one global rate of substitution. The specifications about ease of use and popularity represent only the author's personal view and are therefore highly subjective.

| No. | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Method | Linear regression | Mean path length | Langley-Fitch | ML with clock |
| Author(s) | Nei (1987); Li & Graur (1991) | Bremer & Gustafsson (1997) | Langley & Fitch (1974) | Felsenstein (1981) |
| Software where it is implemented | | PATH (Britton et al., 2002) | R8S (Sanderson, 2003b) | Many phylogenetic packages |
| Current version | | | 1.7 | |
| Runs on operating system(s) | | Unix/Linux with Gpc 1999 | Unix/Linux | Depends on program |
| Optimization strategy | | | Maximum likelihood | Maximum likelihood |
| Input data* | Distance matrix | Phylogram (with bl) | Phylogram (with bl) | Sequence data (+ tree topology) |
| Allows multiple calibration points | Yes | No | Yes | Depends on program |
| Accounts for multiple data sets/partitions | No | No | No | No |
| Provides error estimates (SE/CI/CrI) | Yes, CI | Yes, mean path length CI | Yes, internal and bootstrap CIs | Depends on program |
| Ease of use | Easy | Easy | Medium | Depends on program |
| Popularity according to literature | Very popular until about 10 years ago | Popular | Less popular | Very popular |

*Except model parameters, priors or calibration constraints.
SE, standard error; CI, 95% confidence interval; CrI, 95% Bayesian credibility interval.
Unix software also runs under Mac OS X, as this operating system is based on Darwin, an open source Unix environment.
Many phylogenetic packages: PAUP* (Swofford, 2001), DNAMLK (part of the PHYLIP package; Felsenstein, 1993), BASEML (part of the PAML package; Yang, 1997), MRBAYES (Huelsenbeck and Ronquist, 2001), BEAST (Drummond & Rambaut, 2003), etc.
If the user provides a tree topology in addition to the sequences, the optimization runs much faster.

Table 2 Comparison table of different molecular dating methods. Part 2: Methods that correct for rate heterogeneity. The specifications about ease of use and popularity represent only the author's personal view and are therefore highly subjective.

No. 5, Linearized trees. Author(s): Li & Tanimura (1987).
No. 6, Local molecular clock. Author(s): Hasegawa et al. (1989); Uyenoyama (1995); Rambaut & Bromham (1998); Yoder & Yang (2000).

| | No. 5: Linearized trees | No. 6: BASEML (PAML; Yang, 1997) | No. 6: QDATE (Rambaut & Bromham, 1998) | No. 6: RHINO (Rambaut, 2001) | No. 6: R8S (Sanderson, 2003b) | No. 6: BEAST (Drummond & Rambaut, 2003) |
|---|---|---|---|---|---|---|
| Current version | | | | | | |
| Runs on operating system(s) | | Unix/Linux/Windows | Mac OS 9.x/Unix/Linux/Windows | Mac OS 9.x/Unix/Linux/Windows | Unix/Linux | Unix/Linux/Windows, requires Java 1.4 |
| Optimization strategy | Depends on clock method | Maximum likelihood | Maximum likelihood | Maximum likelihood | Smoothing/minimizing optimality function | Bayesian MCMC |
| Input data* | Phylogram (with bl) | Phylogram (with bl) | Sequence data + quartet definition | Sequence data + tree topology | Phylogram (with bl) | Sequence data |
| Allows multiple calibration points | Depends on clock method | Yes | Yes | Yes | Yes | Yes |
| Accounts for multiple data sets/partitions | No | Yes | No | Only codon position partitions | No | Yes |
| Provides error estimates (SE/CI/CrI) | Depends on clock method | Yes, SE | Yes, CI | Yes, CI | Yes, CI, but separate bootstrapping is required | Yes, CrI |
| Ease of use | Easy | Medium | Easy | Medium | Medium | Medium |
| Popularity according to literature | Less popular | Popular | Less popular | Popular, but more for rate comparisons | Popular, but more for NPRS and PL methods | Becoming increasingly popular |

*Except model parameters, priors, or calibration constraints.
SE, standard error; CI, 95% confidence interval; CrI, 95% Bayesian credibility interval.
Unix software also runs under Mac OS X, as this operating system is based on Darwin, an open source Unix environment.

Table 3 Comparison table of different molecular dating methods. Part 3: Methods that incorporate rate heterogeneity. The specifications about ease of use and popularity represent only the author's personal view and are therefore highly subjective.

| No. | 7 | 7 | 8 |
|---|---|---|---|
| Method | NPRS | NPRS | PL |
| Author(s) | Sanderson (1997) | Sanderson (1997) | Sanderson (2002) |
| Software where it is implemented | R8S (Sanderson, 2003b) | TREEEDIT (Rambaut & Charleston, 2002) | R8S (Sanderson, 2003b) |
| Current version | | | |
| Runs on operating system(s) | Unix/Linux | Mac OS 8.6 or later, including Mac OS X | Unix/Linux |
| Optimization strategy | Smoothing/minimizing optimality function | Smoothing/minimizing optimality function | Smoothing/minimizing optimality function |
| Input data* | Phylogram (with bl) | Phylogram (with bl) | Phylogram (with bl) |
| Model of rate evolution | Rate autocorrelation | Rate autocorrelation | Rate autocorrelation |
| Allows multiple calibration points | Yes | No | Yes |
| Accounts for multiple data sets/partitions | No | No | No |
| Provides error estimates (SE/CI/CrI) | Yes, internal and bootstrap CIs | No | Yes, CI, but separate bootstrapping is required |
| Accounts for phylogenetic uncertainty | No | No | No |
| Ease of use | Medium | Easy (graphical user interface) | Medium |
| Popularity according to literature | Popular | Popular | Becoming increasingly popular |

| No. | 9 | 10 | 11 |
|---|---|---|---|
| Method | AHRS | MULTIDIVTIME | PHYBAYES |
| Author(s) | Yang (2004) | Thorne et al. (1998), Kishino et al. (2001) | Aris-Brosou & Yang (2002) |
| Software where it is implemented | BASEML (PAML; Yang, 1997) | MULTIDIVTIME (Thorne & Kishino, 2002) | PHYBAYES (Aris-Brosou & Yang, 2001) |
| Current version | 3.14 (since 3.14 beta 5) | 9/25/03 | 0.2e |
| Runs on operating system(s) | Unix/Linux/Windows | Unix/Linux/Windows | Unix/Linux/Windows |
| Optimization strategy | Maximum likelihood | Bayesian MCMC | Bayesian MCMC |
| Input data* | Sequence data + tree topology | Sequence data + tree topology | Sequence data + tree topology |
| Model of rate evolution | Rate autocorrelation | Rate autocorrelation | Rate autocorrelation |
| Rates are drawn from | | Lognormal distribution | Six different distributions |
| Prior distribution of divergence time | | Described as Dirichlet distribution | Described as generalized birth-death process |
| Allows multiple calibration points | Yes | Yes, as user-specified intervals | No |
| Accounts for multiple data sets/partitions | Yes | Yes | No |
| Provides error estimates (SE/CI/CrI) | Yes, CI | Yes, CrI | Yes, CrI, but must be calculated by the user |
| Accounts for phylogenetic uncertainty | No | No, but accepts polytomies in input tree | No |
| Ease of use | Medium | Medium (use step-by-step manual; Rutschmann, 2005) | Medium |
| Popularity according to literature | Not yet very popular | Becoming increasingly popular | Not yet very popular |

Table 3 Continued

| No. | 12 | 13 | 14 |
|---|---|---|---|
| Method | Variable rate models in BEAST | Overdispersed clock | Compound Poisson |
| Author(s) | Drummond et al. (submitted) | Cutler (2000) | Huelsenbeck et al. (2000) |
| Software where it is implemented | BEAST (Drummond et al. submitted) | C program (Cutler, 2000) | C program (Huelsenbeck et al., 2000) |
| Current version | 1.3 | Dating5.c | |
| Runs on operating system(s) | Unix/Linux/Windows, requires Java 1.4 | Unix/Linux/Windows | |
| Optimization strategy | Bayesian MCMC | Maximum likelihood | Bayesian MCMC |
| Input data* | Sequence data | Phylogram (with bl) | |
| Model of rate evolution | Various models implemented** | Doubly stochastic Poisson process | Compound Poisson process |
| Rates are drawn from | Different distributions such as exp or lognormal | | |
| Prior distribution of divergence time | No specific description, priors are uniform | | |
| Allows multiple calibration points | Yes | Yes | |
| Accounts for multiple data sets/partitions | Yes | No | |
| Provides error estimates (SE/CI/CrI) | Yes, CrI | Yes, CI | |
| Accounts for phylogenetic uncertainty | Yes | No | |
| Ease of use | Medium, a range of tutorials is available | Medium | |
| Popularity according to literature | Becoming increasingly popular | Not yet very popular | |

*Except model parameters, priors, or calibration constraints.
SE, standard error; CI, 95% confidence interval; CrI, 95% Bayesian credibility interval.
Unix software also runs under Mac OS X, as this operating system is based on Darwin, an open source Unix environment.
Plus one hyperparameter (autocorrelation value ν).
Plus several hyperparameters (describing speciation and extinction rate).
**Implements a range of relaxed clock models by Drummond et al. (submitted), but also the models by Thorne and Kishino (2002) and Aris-Brosou and Yang (2002). Additionally, two graphical user interfaces called BEAUTI and TRACER facilitate data setup and output analysis.
The method is implemented in a C program, but it is not meant for public access (as there is no documentation).
For the age of calibration points, different prior distributions are available, such as normal, lognormal, exponential or gamma.

In other words: the observed number of differences D between two given sequences is a function of the constant rate of substitution r (substitutions per site per Myr) and the time t (Myr) elapsed since the two lineages separated. If we have one calibration point (e.g. if we can assign a fossil or geological event to one specific node in the tree), we can calculate the global substitution rate as follows: r = D/(2t). In a second step, we can use the global rate r to calculate the divergence time between any other two sequences: t = D/(2r). In those cases where we have more than one calibration point, we can plot all calibration nodes in an age-genetic distance diagram, build a (weighted) regression line, whose slope is a function of the global substitution rate, and then interpolate (or extrapolate) the divergence times for the unknown nodes. The scatter of data points around the regression line then provides a confidence interval around estimated ages. Although it is possible to estimate the global substitution rate r over an entire phylogeny (see methods 3 and 4), pairwise sequence comparisons were the most widely used approach to molecular dating until approximately 5 years ago, probably because the calculations can be done easily with any statistical or spreadsheet software.

Method 2: Tree-based mean path length method (Bremer & Gustafsson, 1997; Britton et al., 2002)

The mean path length method estimates rates and divergence times based on the mean path length (MPL) between a node and each of its terminals. By calculating the MPL between a calibration node and all its terminals, and dividing it by the known age of the node, the global substitution rate is obtained. To calculate the divergence time of a node, its MPL is divided by the global rate. Although the calculation is simple enough to be done by hand, Britton et al. (2002) provide a Pascal program, named PATH, for the analysis of larger trees.
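The two clock calculations above (the pairwise formulas r = D/(2t) and t = D/(2r), and the mean path length method) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PATH program: the tree encoding (a dictionary mapping each internal node to its children and branch lengths), the function names and all numbers are hypothetical.

```python
def global_rate(distance, age):
    """Pairwise clock rate r = D / (2t); the 2 counts both diverging lineages."""
    return distance / (2.0 * age)


def divergence_time(distance, rate):
    """Pairwise divergence time t = D / (2r)."""
    return distance / (2.0 * rate)


def tip_path_lengths(tree, node):
    """Path lengths from `node` down to every terminal below it."""
    children = tree.get(node, [])
    if not children:  # a terminal has distance zero to itself
        return [0.0]
    return [length + rest
            for child, length in children
            for rest in tip_path_lengths(tree, child)]


def mean_path_length(tree, node):
    """Mean path length (MPL) between a node and each of its terminals."""
    paths = tip_path_lengths(tree, node)
    return sum(paths) / len(paths)


# Hypothetical pairwise example: a calibrated pair (D = 0.20, 50 Myr)
# is used to date a second pair with D = 0.08.
pair_rate = global_rate(distance=0.20, age=50.0)
pair_age = divergence_time(distance=0.08, rate=pair_rate)

# Hypothetical tree: internal node -> [(child, branch length), ...].
tree = {
    "root": [("X", 0.10), ("C", 0.30)],
    "X": [("A", 0.12), ("B", 0.08)],
}
# Calibrate on the root (assumed age 70 Myr), then date node X.
mpl_rate = mean_path_length(tree, "root") / 70.0
age_x = mean_path_length(tree, "X") / mpl_rate

print(pair_rate, pair_age, mpl_rate, age_x)
```

With these made-up numbers the second pair comes out at about 20 Myr and node X at about 30 Myr. Note that the MPL rate has no factor of 2, because each path runs along a single lineage from node to terminal.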
Method 3: Tree-based maximum likelihood clock optimization (Langley & Fitch, 1974; Sanderson, 2003b)

The Langley-Fitch method uses maximum likelihood (ML) to optimize the global rate of substitution, starting with a phylogeny for which branch lengths are known. Using the optimized, constant rate, branch lengths and divergence times are then estimated. Finally, the results plus the outcome of a chi-square test of rate constancy are reported. The method is implemented in R8S (Sanderson, 2003b).

Method 4: Character-based maximum likelihood clock optimization (Felsenstein, 1981; Swofford et al., 1996)

Under the assumption of rate constancy (and therefore under the constraints of ultrametricity), the global rate of substitution can also be optimized by ML directly from sequence data during the phylogenetic reconstruction. The likelihood is then a much more complex function of the data matrix, and the computing time is much higher than for the phylogenetic reconstruction without ultrametric constraints. Once the global rate of substitution is known, branch lengths and divergence times can be calculated. The ML clock optimization method is implemented, at least partially, in PAUP* (Swofford, 2001), DNAMLK (part of the PHYLIP package; Felsenstein, 1993), BASEML (part of the PAML package; Yang, 1997), and other phylogenetic packages. It is perhaps the most widespread strategy, commonly known as "enforcing the molecular clock". Depending on the software, the additional constraint of a fixed tree topology can be provided by the user, which reduces the complexity of the likelihood function and the computing time significantly.

B. The reality (in most cases): rate heterogeneity or the relaxed clock

As sequences from multiple species began to accumulate during the 1970s, it became apparent that a clock is not always a good model for the process of molecular evolution (Langley & Fitch, 1974). Variation in rates of nucleotide substitution, both along a lineage and between different lineages, is known to be pervasive (Britten, 1986; Gillespie, 1986; Li, 1997). Several reasons are given for these deviations from the clock-like model of sequence evolution (sometimes called a "relaxed" or "sloppy" clock): (1) generation time: a lineage with shorter generation time accelerates the clock because it shortens the time to accumulate and fix new mutations during genetic recombination (Ohta, 2002; but disputed by Whittle & Johnston, 2003); (2) metabolic rate: organisms with higher metabolic rates have increased rates of DNA synthesis and higher rates of nucleotide mutations than species with lower metabolic rates (Martin & Palumbi, 1993; Gillooly et al., 2005); (3) mutation rate: species-characteristic differences in the fidelity of the DNA replication or DNA repair machinery (Ota & Penny, 2003); and (4) effect of effective population size on the rate of fixation of mutations: the fixation of nearly neutral alleles is expected to be the greatest in small populations (according to the nearly neutral theory of DNA evolution; Ohta, 2002). Because it is much easier to calculate divergence times under the clock model (with one global substitution rate), it is worth testing the data for clock-like behavior. This can be done by comparing how closely the ultrametric and additive trees fit the data.
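One way to make this comparison concrete is a likelihood ratio test of the clock: twice the difference in log-likelihood between the unconstrained (additive) tree and the clock-constrained (ultrametric) tree is compared with a chi-square distribution with n - 2 degrees of freedom, where n is the number of terminals. The following is a minimal Python sketch, assuming the two log-likelihoods have already been obtained from a phylogenetics package; the function names and the example values are illustrative assumptions, not taken from the article or from any particular program.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    with y = x / 2, starting from Q(1, y) = exp(-y) for even df
    or Q(1/2, y) = erfc(sqrt(y)) for odd df.
    """
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def clock_lrt(lnl_clock, lnl_no_clock, n_terminals):
    """Likelihood ratio test of the molecular clock.

    Returns (test statistic, degrees of freedom, p-value).
    """
    # Clamp at zero in case the clock tree scores marginally better
    # because of incomplete optimization.
    statistic = max(0.0, 2.0 * (lnl_no_clock - lnl_clock))
    df = n_terminals - 2
    return statistic, df, chi2_sf(statistic, df)


# Hypothetical log-likelihoods for a 10-taxon data set.
statistic, df, p_value = clock_lrt(lnl_clock=-5012.3,
                                   lnl_no_clock=-5000.0,
                                   n_terminals=10)
print(statistic, df, p_value)
if p_value < 0.05:
    print("Rate constancy rejected at the 5% level")
```

The chi-square tail probability is computed with the standard library only, so the sketch has no external dependencies; in practice a statistics library would normally be used instead.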
For example, the likelihood score of the best ultrametric tree can be compared with the (usually higher) likelihood score of the best additive tree, and the difference between the two values (multiplied by two) can be checked for significance on a chi-square table with n - 2 degrees of freedom (where n is the number of terminals in the tree; likelihood ratio test; Felsenstein, 1988, 1993, 2004; Muse & Weir, 1992). Other tests that can be applied to identify parts of the tree that show significant rate deviations are the relative rates test (Wu & Li, 1985) and the Tajima test (Tajima, 1993). However, all these clock tests lack power for shorter sequences and will detect only a relatively low proportion of cases of rate variation for the types of sequence that are typically used in molecular clock studies (Bromham et al., 2000; Bromham & Penny, 2003). Failure to detect clock variation can cause systematic error in age estimates, because undetected rate variation can lead to significantly over- or underestimated divergence times (Bromham et al., 2000).

If the null hypothesis of a constant rate is rejected, or if we have evidence suggesting that test results should be treated with caution, we might conclude that rates vary across the tree; in such cases the use of methods that try to model rate changes over the tree is necessary. This procedure is also supported by the fact that an increasing number of divergence time analyses show significant deviations from a molecular clock, especially if sequences of distantly related species are analysed (e.g. different orders or families; Yoder & Yang, 2000; Hasegawa et al., 2003; Springer et al., 2003). At least two groups of methods try to handle a relaxed clock: (1) methods that correct for rate heterogeneity before the dating and (2) methods that incorporate rate heterogeneity in the dating process, on the basis of specific rate change models.

(1) Methods that correct for rate heterogeneity

The first set of methods described below corrects for the observed rate heterogeneity by pruning branches or dividing the global rate into several rate classes (local rates). After this first step, which makes the trees (at least partially) ultrametric, the methods estimate rates and divergence times using the molecular clock as described above.

Method 5: Linearized trees (Li & Tanimura, 1987; Takezaki et al., 1995; Hedges et al., 1996)

The linearized trees method involves three steps: first, identify all branches in a phylogeny that depart significantly from rate constancy by using a statistical test (e.g. relative rate tests, Li & Tanimura, 1987; or two-cluster and branch-length tests, Takezaki et al., 1995). Then, selectively eliminate (prune) those branches. Finally, construct a tree with the remaining branches (the linearized tree) under the assumption of rate constancy.
This procedure relies on eliminating data that do not fit the expected global rate behavior, and in many cases, this approach would lead to a massive elimination of data. Cutler (2000), describing the procedure as "taxon shopping", stated: "If one believes that rate overdispersion is intrinsic to the process of evolution (Gillespie, 1991) (...), then restricting one's analysis to taxa which happen to pass relative-rate tests is inappropriate."

Method 6: Local rates methods (Hasegawa et al., 1989; Uyenoyama, 1995; Rambaut & Bromham, 1998; Yoder & Yang, 2000)

Local rates methods apply two or more local molecular clocks to the tree, using a common model that characterizes the rate constancy on each part of the tree. One substantial difficulty is to identify correctly the branches or regions of a tree in which substitution rates significantly differ from the others; this difficulty explains why several methods of the local rates type exist. Usually, biological (e.g. similar life form, generation time, metabolic rate) or functional (e.g. gene function) information is used for their recognition. Conspicuous patterns in the transition/transversion rate of some branches (Hasegawa et al., 1989), the known differential function of alleles (Uyenoyama, 1995), the branch lengths obtained by ML under the absence of a molecular clock (Yoder & Yang, 2000; Yang & Yoder, 2003), or the rate constancy within a quartet of two pairs of sister groups (Rambaut & Bromham, 1998) can also be used for the definition of local clock regions. Probably the best known example of a local rates method is the ML-based local molecular clock approach (Hasegawa et al., 1989; Yoder & Yang, 2000; Yang & Yoder, 2003). This method pre-assigns evolutionary rates to some lineages while all the other branches evolve at the same rate.
The local molecular clock model therefore lies between the two extremes of a global clock (assuming the same rate for all lineages) and the models that assume one independent rate for each branch (described below). The method allows for the definition of rate categories before the dating, which is a crucial and sensitive step for this method. Two different strategies can be used to pre-assign independent rates: (1) definition of rate categories: the user pre-assigns rate categories to specific branches based on the branch length estimates obtained without the clock assumption. For example, three different rate categories are defined, one for the outgroup lineage with long branch lengths, another for a crown group with short branch lengths, and a third for all other branches; (2) definition of rank categories: divide the taxa into several rate groups according to taxonomic ranks, e.g. order, suborder, family and genus, based on the assumption that closely related evolutionary lineages tend to evolve at similar rates (Kishino et al., 2001; Thorne & Kishino, 2002). After the definition of rate categories, the divergence times and rates for the different branch groups are estimated by ML optimization. The local molecular clock model is implemented in BASEML (part of the PAML package; Yang, 1997) and the program provides standard errors for estimated divergence times. BASEML does not (yet) allow for the specification of fossil calibrations as lower or upper limits on node ages, as in R8s or MULTIDIVTIME (see below). So far, nodal constraints based on fossils have to be specified as fixed ages. The local molecular clock method implemented in BASEML is now able to analyse multiple genes or data partitions with different evolutionary characteristics simultaneously and allows the branch group rates to vary freely among data partitions (since version 3.14). 
For example, the models allow some branches to evolve faster at codon position 1 while they evolve slower at codon position 2 (Yang & Yoder, 2003). The quartet method (Rambaut & Bromham, 1998) implemented in QDATE is one of the simplest local clock methods. The method works with species quartets built by combining two pairs of species, each of which has a known date of divergence. For each pair, a rate can be estimated, and this allows the date of the divergence between the pairs (age of the quartet) to be estimated. Because groups with undisputed relationships can be chosen, the method avoids problems of topological uncertainties. On the other hand, it is difficult to combine estimates from multiple quartets in a meaningful way (Bromham et al., 1998). Another program that allows the user to assign different rates and substitution models to different parts of a tree is RHINO by Rambaut (2001). This ML local clock implementation has been used so far mainly for comparing substitution rates of different lineages by using likelihood ratio tests (e.g. Bromham & Woolfit, 2004). A fourth implementation of the local molecular clock approach has been realized in the software R8S (Sanderson, 2003b). It follows the Langley-Fitch method described above (method 3; Langley & Fitch, 1974), but instead of only using one constant rate of substitution, the method permits the user to specify multiple rate parameters and assign them to the appropriate branches or branch groups. After such a definition of rate categories, the divergence times and rates for the different branch groups are estimated by ML optimization. A fifth program, BEAST (Drummond & Rambaut, 2003), uses Bayesian inference and the Markov chain Monte Carlo (MCMC) procedure to derive the posterior distribution of local rates and times. As the software does not require a starting tree topology, it is able to account for phylogenetic uncertainty. Additionally, it permits the definition of calibration distributions (such as normal, lognormal, exponential or gamma) instead of simple point estimates or age intervals.

(2) Methods that estimate divergence times by incorporating rate heterogeneity

Methods that relax rate constancy must necessarily be guided by specifications about how rates are expected to change among lineages. Because rates and divergence times are confounded, it is not possible to estimate one without making assumptions regarding the other (Aris-Brosou & Yang, 2002). Recently, it has been questioned whether divergence times without a molecular clock can be estimated consistently just by increasing the sequence lengths (Britton, 2005).
However, available methods try to introduce rate heterogeneity on the basis of three different approaches: one is the concept of temporal autocorrelation in rates (Methods 7-11; Gillespie, 1991), another is the stationary process of rate change (Method 13; Cutler, 2000), and a third is the compound Poisson process of rate change (Method 14; Huelsenbeck et al., 2000). All methods estimate branch lengths without assuming rate constancy, and then model the distribution of divergence times and rates by minimizing the discrepancies between branch lengths and the rate changes over the branches. The methods differ not only in their models, but also in their strategy to incorporate age constraints (calibration points) into the analysis.

(2a) Methods that model rate change according to the standard Poisson process and the concept of rate autocorrelation

Autocorrelation limits the speed with which a rate can change from an ancestral lineage to a descendant lineage (Sanderson, 1997). As the rate of substitution evolves along lineages, daughter lineages might inherit their initial rate from their parental lineage and evolve new rates independently (Gillespie, 1991). Temporal autocorrelation is an explicit a priori criterion to guide inference of among-lineage rate change and is implemented in several methods (Magallón, 2004). Readers who want to learn more about the theory of temporal autocorrelation are referred to publications by Takahata (1987), Gillespie (1991), Sanderson (1997) and Thorne et al. (1998).

Method 7: Nonparametric rate smoothing (Sanderson, 1997, 2003b)

By analogy to smoothing methods in regression analysis, the nonparametric rate smoothing (NPRS) method attempts to simultaneously estimate unknown divergence times and smooth the rapidity of rate change along lineages (Sanderson, 1997, 2003b).
To smooth rate changes, the method contains a nonparametric function that penalizes rates that change too quickly from branch to neighboring branch, which reflects the idea of autocorrelation of rates. In other words: the local transformations in rate are smoothed as the rate itself changes over the tree by minimizing the ancestral-descendant changes of rate. Since the penalty function includes unknown times, an optimality criterion based on this penalty (the sum of squared differences in local rate estimates compared from branch to neighboring branch; least-squares method) permits an estimation of the divergence times. NPRS is implemented in R8S (Sanderson, 2003b) and TREEEDIT (Rambaut & Charleston, 2002). With R8S, but not with TREEEDIT, the user is able to add one or more calibration constraints to permit scaling of rates and times to absolute temporal units. A serious limitation of NPRS is its tendency to overfit the data, leading to rapid fluctuations in rate in regions of a tree that have short branches (Sanderson, 2003b). In R8S, but not in TREEEDIT, two strategies to provide confidence intervals on the estimated parameters are available: (1) a built-in procedure that uses the curvature of the likelihood surface around the parameter estimate (after Cutler, 2000); and (2) the calculation of an age distribution based on chronograms generated from a large number of bootstrapped data sets. The central 95% of the age distribution provides the confidence interval (Efron & Tibshirani, 1993; Baldwin & Sanderson, 1998; Sanderson, 2003b). This robust, but time-consuming procedure can be facilitated by using a collection of Perl scripts written by Eriksson (2002) called the R8S-bootstrap-kit.

Method 8: Penalized likelihood (Sanderson, 2002, 2003b)

Penalized likelihood (PL) combines likelihood and the nonparametric penalty function used in NPRS.
PL, a semi-parametric technique, attempts to combine the statistical power of parametric methods with the robustness of nonparametric methods. In effect, it permits the specification of the relative contribution of the rate-smoothing and the data-fitting parts of the estimation procedure: a roughness penalty can be assigned as a smoothing parameter in the input file. The smoothing parameter can be estimated by running a cross-validation procedure, which is a data-driven method for finding the optimal level of smoothing. If the smoothing parameter is large, the function is dominated by the roughness penalty, and this leads to a clock-like model. If it is small, the smoothing will be effectively unconstrained (similar to NPRS). So far, PL is only implemented in R8S (Sanderson, 2003b). As with NPRS, the user is able to add one or more calibration constraints to permit scaling of rates and times to real units. The same two strategies for providing confidence intervals on the estimated parameters as for the NPRS method are also available for PL.

Method 9: Heuristic rate smoothing (AHRS) for ML estimation of divergence times (Yang, 2004)

The heuristic rate-smoothing (AHRS) algorithm for ML estimation of divergence times (Yang, 2004) has a number of similarities with PL and the two Bayesian dating methods described below (Methods 10 and 11). It involves three steps: (1) estimation of branch lengths in the absence of a molecular clock; (2) heuristic rate smoothing to estimate substitution rates for branches together with divergence times, and classification of branches into several rate classes; and (3) ML estimation of divergence times and rates of the different branch groups. The AHRS algorithm differs slightly from PL: where Sanderson (2002) uses a Poisson approximation to fit the branch lengths, the AHRS algorithm uses a normal approximation of the ML estimates of branch lengths. Furthermore, the rate-smoothing algorithm in AHRS is used only to partition branches on each gene tree into different rate groups, and therefore plays a less significant role in this method than in PL. In contrast to the Bayesian approaches, AHRS optimizes rates, together with divergence times, rather than averaging over them in an MCMC procedure. Another difference from the Bayesian dating methods is that the AHRS algorithm does not need any prior for divergence times, which can be an advantage. On the other hand, it is not possible to specify fossil calibrations as lower or upper bounds on node ages, as in R8S or MULTIDIVTIME; so far, nodal constraints based on fossils have to be specified as fixed ages.
The AHRS algorithm is implemented in the BASEML and CODEML programs, which are parts of the PAML package (since version 3.14 final; Yang, 1997). Those programs provide standard errors for estimated divergence times. Like MULTIDIVTIME, the AHRS algorithm implemented in BASEML is able to analyse multiple genes/loci with different evolutionary characteristics simultaneously.

Method 10: Bayesian implementation of rate autocorrelation in MULTIDIVTIME (Thorne et al., 1998; Kishino et al., 2001; Thorne & Kishino, 2002)

The Bayesian dating method implemented in MULTIDIVTIME (Thorne et al., 1998; Kishino et al., 2001; Thorne & Kishino, 2002) uses a fully probabilistic and highly parametric model to describe the change in evolutionary rate over time and uses the MCMC procedure to derive the posterior distribution of rates and times. In effect, the variation of rates is addressed by letting the MCMC algorithm assign rates to different parts of the tree, and then sampling from the patterns that are possible. In this way, the MCMC technique averages over various patterns of rates along the tree. The result is a posterior distribution of rates and times derived from a prior distribution. For the assignment of rates to different branches in the tree, rates are drawn from a lognormal distribution, and a hyperparameter ν (also called the Brownian motion constant) describes the amount of autocorrelation. The internal node age proportions are described by a Dirichlet distribution, which reflects the idea of modelling the branching of evolutionary lineages due to speciation, but is not intended as a detailed model of speciation and extinction processes. In practice, the most commonly used procedure is divided into three different steps and programs, and is described in more detail in a step-by-step manual (Rutschmann, 2005).
(1) In the first step, model parameters for the F84 + G model (Kishino & Hasegawa, 1989; Felsenstein, 1993) are estimated by using the program BASEML, which is part of the PAML package (Yang, 1997).
(2) By using these parameters, the ML estimates of the branch lengths are obtained, together with a variance-covariance matrix of the branch length estimates, by using the program ESTBRANCHES (Thorne et al., 1998).
(3) The third program, MULTIDIVTIME (Kishino et al., 2001; Thorne & Kishino, 2002), is then able to approximate the posterior distributions of substitution rates and divergence times by using a multivariate normal distribution of estimated branch lengths and running an MCMC procedure.

MULTIDIVTIME asks the user to specify several priors, such as the mean and the variance of the distributions for the initial substitution rate at the root node or the prospective age of the root node. Additionally, constraints on nodal ages can be specified as age intervals. The program provides Bayesian credibility intervals for estimated divergence times and substitution rates. In contrast to NPRS and PL (implemented in R8S), MULTIDIVTIME is able to account for multiple genes/loci with different evolutionary characteristics. Such a simultaneous analysis of multiple genes may improve the estimates of divergence times which are shared across genes.

Method 11: PHYBAYES (Aris-Brosou & Yang, 2001, 2002)

The PHYBAYES program (Aris-Brosou & Yang, 2001, 2002) is similar to the MULTIDIVTIME Bayesian approach described above. It also uses a fully probabilistic and highly parametric model to describe the change in evolutionary rate over time and uses the MCMC procedure to derive the posterior distribution of rates and times. However, the method is more versatile in how the prior distributions can be defined, as it allows the use of models that explicitly describe the processes of speciation and extinction.
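The idea shared by the autocorrelated Bayesian models, namely that the rate of a descendant branch is drawn from a distribution centred on the rate of its ancestor, can be sketched as follows. This minimal Python example assumes a lognormal model in which the variance of the log rate grows with the elapsed time, scaled by a parameter here called `nu`. The function names and the toy lineage are hypothetical, and the bias-correction terms, priors and MCMC machinery of the actual programs are left out.

```python
import math
import random


def descendant_rate(ancestor_rate, elapsed_time, nu, rng):
    """Draw a descendant rate whose log is normal around the ancestor's log rate.

    Assumed model: Var[log rate] = nu * elapsed_time, so nu = 0 gives a strict clock.
    """
    sd = math.sqrt(nu * elapsed_time)
    return math.exp(rng.gauss(math.log(ancestor_rate), sd))


def simulate_lineage(root_rate, branch_durations, nu, seed=0):
    """Rates along one root-to-tip path, each drawn from its predecessor."""
    rng = random.Random(seed)
    rates = [root_rate]
    for duration in branch_durations:
        rates.append(descendant_rate(rates[-1], duration, nu, rng))
    return rates


if __name__ == "__main__":
    print(simulate_lineage(0.01, [5.0, 3.0, 2.0], nu=0.0))   # clock-like
    print(simulate_lineage(0.01, [5.0, 3.0, 2.0], nu=0.05))  # rates drift gradually
```

With `nu` set to zero every branch keeps the root rate, whereas larger values allow successive branches to wander further from their ancestors.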
For the rates of evolution, PHYBAYES offers a choice of six different rate distributions to model the autocorrelated rate change from an ancestral to a descendant branch (lognormal, stationarized lognormal, truncated normal, Ornstein-Uhlenbeck process, gamma and exponential, plus the definition of two model-related hyperparameters), whereas in MULTIDIVTIME, rates are always drawn from a lognormal distribution. The prior distribution of divergence times is generated by a process of cladogenesis, the generalized birth and death process with species sampling (Yang & Rannala, 1997), a model that assumes a constant speciation and extinction rate per lineage (MULTIDIVTIME uses a Dirichlet distribution of all internal node age proportions to generalize the rooted tree structure; Kishino et al., 2001). In contrast to MULTIDIVTIME, it is not possible to analyse multiple genes simultaneously with PHYBAYES, and the program does not allow for an a priori integration of nodal constraints (calibration points).

(2b) Methods that model rate change with other concepts than rate autocorrelation

Method 12: Bayesian implementation of rate variation in BEAST (Drummond et al., submitted)

Similar to PHYBAYES, the variable rate methods implemented in BEAST use Bayesian inference and the MCMC procedure to derive the posterior distribution of rates and times. In addition to autocorrelated models like the one implemented in MULTIDIVTIME, however, BEAST implements a range of novel models in which the rates are drawn from a distribution (with various distributions on offer; Drummond et al., submitted). These models have a couple of interesting features: (1) the parameters of the distributions can be estimated (instead of being specified), and (2) the correlation of rates between adjacent branches can be tested (if > 0, this would indicate some inheritance of rates). Another feature unique to the software is that it does not require a starting tree topology, which allows it to account for phylogenetic uncertainty. Additionally, BEAST permits the definition of calibration distributions (such as normal, lognormal, exponential or gamma) to model calibration uncertainty instead of simple point estimates or age intervals. For the other, non-calibrated nodes, there is no specific process that describes the prior distribution of divergence times (they are uniform over a range from 0 to very large). BEAST allows the user to simultaneously analyse multiple data sets/partitions with different substitution models, and provides Bayesian credibility intervals.
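Two ideas from the description of Method 12, branch rates drawn independently from a common distribution and a check on the correlation of rates between adjacent branches, can be illustrated with a small simulation. The sketch below is not BEAST code: the balanced tree indexed like a binary heap, the lognormal parameters and the function names are all assumptions chosen for illustration.

```python
import math
import random


def draw_uncorrelated_rates(n_nodes, mu, sigma, seed=1):
    """Independent lognormal rate for the branch above every non-root node.

    Nodes are numbered like a binary heap: node i has parent (i - 1) // 2,
    and node 0 is the root, which has no branch above it.
    """
    rng = random.Random(seed)
    return {i: rng.lognormvariate(mu, sigma) for i in range(1, n_nodes)}


def parent_child_correlation(rates):
    """Pearson correlation between each branch rate and its parent branch rate."""
    pairs = [(rates[(i - 1) // 2], r) for i, r in rates.items()
             if (i - 1) // 2 in rates]
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    rates = draw_uncorrelated_rates(n_nodes=1023, mu=math.log(0.01), sigma=0.5)
    print(parent_child_correlation(rates))
```

A value near zero is expected here because the rates were drawn independently; a clearly positive value computed on estimated rates would point to some inheritance of rates between adjacent branches.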
Method 13: Overdispersed clock method (Cutler, 2000)

While all methods described so far fundamentally assume that the number of substitutions in a lineage is Poisson-distributed, the overdispersed clock model (Cutler, 2000) assumes that the number of substitutions in a lineage follows a stationary process. Unlike in a Poisson process, the variance in the number of substitutions will not necessarily be equal to the mean. Under this model, which treats rate changes according to a stationary process, ML estimates of divergence times can be calculated. The method is implemented in a C program, which is available directly from the author (Cutler, 2000). It is possible to incorporate multiple calibration points.

Method 14: Compound Poisson process method (Huelsenbeck et al., 2000)

Like all methods described above (with the exception of the overdispersed clock model; Cutler, 2000), the compound Poisson process method uses a model that assumes that nucleotide substitutions occur along branches of the tree according to a Poisson process. Unlike the other models, however, it additionally assumes that a second, independent Poisson process generates events of substitution rate change. This second Poisson process is superimposed on the primary Poisson process of molecular substitution (hence the name compound Poisson process), and introduces changes (in the form of discrete jumps) in the rate of substitution in different branches of the phylogeny. Rates on the tree are then determined by the number of rate-change events, the point in the tree where they occur, and the magnitude of change at each event (Huelsenbeck et al., 2000; Magallón, 2004). These parameters are estimated by using Bayesian inference (MCMC integration).
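The two superimposed processes of the compound Poisson model can be sketched by simulation along a single branch. In the Python example below, rate-change events arrive as a Poisson process, each event multiplies the current rate by a random factor, and substitutions are then drawn with a Poisson mean equal to the rate integrated over the branch. Drawing the multiplier from a gamma distribution with mean 1 is an assumption of this sketch, as are all function and parameter names; the Bayesian estimation step is not shown.

```python
import random


def simulate_rate_segments(duration, start_rate, change_intensity, shape, rng):
    """Split one branch into [(segment_length, rate), ...] at rate-change events."""
    segments, elapsed, rate = [], 0.0, start_rate
    while True:
        # Waiting time to the next rate-change event (never, if intensity is 0).
        if change_intensity > 0:
            wait = rng.expovariate(change_intensity)
        else:
            wait = float("inf")
        if elapsed + wait >= duration:
            segments.append((duration - elapsed, rate))
            return segments
        segments.append((wait, rate))
        elapsed += wait
        # Discrete jump: multiply the rate by a gamma factor with mean 1 (assumed).
        rate *= rng.gammavariate(shape, 1.0 / shape)


def expected_substitutions(segments):
    """Rate integrated over the branch: sum of length * rate per segment."""
    return sum(length * rate for length, rate in segments)


def poisson_draw(mean, rng):
    """Poisson sample obtained by counting unit-rate exponential arrivals."""
    count, total = 0, rng.expovariate(1.0)
    while total < mean:
        count += 1
        total += rng.expovariate(1.0)
    return count


if __name__ == "__main__":
    rng = random.Random(42)
    segs = simulate_rate_segments(5.0, 0.1, 1.0, 2.0, rng)
    mean = expected_substitutions(segs)
    print(len(segs) - 1, "rate changes; expected substitutions per site:", mean)
    print("substitutions drawn over 1000 sites:", poisson_draw(1000 * mean, rng))
```

Because the jumps are placed by their own Poisson process, they can fall anywhere along the branch rather than only at its ends.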
One of the main advantages of treating rate variation as a compound Poisson process is that the model can introduce rate variation at any point of the phylogenetic tree; all other methods assume that substitution rates change only at speciation events (nodes; Huelsenbeck et al., 2000). The method is implemented in a C program, but it is not meant to be available to the community so far, as there is no documentation yet.

CONCLUSIONS

Molecular dating is a rapidly developing field, and the methods generated so far are still far from perfect (Sanderson et al., 2004). Molecular dating estimates derived from different inference methods can be in conflict, and so can the results obtained with different taxon sampling, gene sampling and calibration strategies (see below). It should be clear that there is no single best molecular dating method; rather, all approaches have advantages and disadvantages. For example, the methods reviewed here differ in the type of input data they use and process: the first group of methods (NPRS and PL) base their analyses on input phylograms with branch lengths. Therefore, they are not able to incorporate branch length errors or parameters of the substitution model into the dating analysis (this has to be done prior to the dating). On the other hand, these methods are fast and versatile, because they can process phylogenies generated from parsimony, likelihood or Bayesian analyses. The second group of methods (MULTIDIVTIME, PHYBAYES, AHRS) use one true tree topology to assess rates and divergence times and estimate the branch lengths themselves. Therefore, they are able to account for the branch length errors described above, but still base their analyses on a fixed, user-supplied tree topology. The third group of methods (ML with clock and BEAST) directly calculate ultrametric phylogenies based only on sequence data and model parameters, a procedure that also allows them to incorporate topological uncertainties.
Computationally, this can be very expensive, especially in the case of a variable rate model and with a high number of taxa. As I have not tested all software packages and methods described here myself, this review is not a comparison based on practical experiments. However, the papers that first described these approaches always report on their performance on simulated or real data. Three papers that compare some of the described methods on original data sets or simulated data have been published recently: Yang and Yoder (2003) compared methods 3, 5 and 8 (see Tables 1-3) by analysing a mouse lemur data set using multiple gene loci and calibration points; Pérez-Losada et al. (2004) compared methods 5, 7, 8 and 9 in their analysis of a nuclear ribosomal 18S data set to test the evolutionary radiation of the thoracican barnacles by comparing different calibration points independently; and Ho et al. (2005b) compared the performance of methods 8, 10 and 11 by using simulated data. Currently, software developers are starting to integrate different methods in the same software (e.g. BASEML, Yang, 1997; BEAST, Drummond et al., 2005; future versions of MRBAYES, later than version 3.1, Huelsenbeck & Ronquist, 2001). This recent trend allows users to try out different methods on their own data sets, and then compare the results.

Although this review focuses on the dating methods themselves, at least a few links to key articles about more general or very specific issues related to molecular dating are given here:

(1) General reviews: Magallón (2004) wrote a comprehensive review of the theory of molecular dating, which also discusses paleontological dating methods and the uncertainties of the paleontological record. The classification of the molecular methods described here is based on her publication. Another recent review has been written by Sanderson et al. (2004); it discusses the advantages and disadvantages of Bayesian vs. smoothing molecular dating methods, summarizes the inferred ages of the major clades of plants and lists many published dating applications that investigated recent plant radiations and/or tested biogeographical hypotheses. Finally, Welch and Bromham (2005), Bromham and Penny (2003), Arbogast et al. (2002) and Wray (2001) wrote more general reviews on the issue of estimating divergence times.
(2) For specific discussions about the crucial and controversial role of calibration, refer to the following papers: Where on the tree and how should we assign ages from fossils or geologic events? Near & Sanderson (2004), Conti et al. (2004) and Rutschmann et al. (2004). How can we deal with the incompleteness of the fossil record? Tavaré et al. (2002), Foote et al. (1999) and Foote and Sepkoski (1999). How should we constrain the age of the root and deal with the methodological handicap of asymmetric random variables in molecular dating? Rodríguez-Trelles et al. (2002) and Sanderson and Doyle (2001).
(3) For the recent debate about the precision of divergence time estimates, refer to the following papers: Should we extrapolate substitution rates across different evolutionary timescales? Ho et al. (2005a). How can we account for the various uncertainties related to the calibration and the dating procedure? How should we report and interpret error estimates, and should we use secondary calibration points? Hedges and Kumar (2003), Graur and Martin (2004), Reisz and Müller (2004) and Hedges and Kumar (2004).
(4) For questions related to the influence of taxon sampling on estimating divergence times under various dating methods, see Linder et al. (2005) and Sanderson and Doyle (2001).
(5) For the influence of gene sampling, read Heckman et al. (2001).
(6) Theoretical problems and strategies connected to the molecular dating of supertrees are discussed in Vos and Mooers (2004) and Bryant et al. (2004).
(7) For special dating problems like estimating the substitution rate when the ages of different terminals are known (e.g. from virus sequences that were isolated at different dates; implemented in the software TIPDATE and also in BEAST), refer to Rambaut (2000).
Finally, I would like to add a suggestion to those among us who write software: although the development of graphical user interfaces (GUIs) is certainly not a first priority, GUIs would significantly simplify molecular dating analyses for the average biologist. Modern tools (like the open source Qt 4 C++ class library by TROLLTECH) make the development of fast, native and multiplatform GUI applications easier than ever before. I do not share the widespread concerns about 'stupid analyse-by-click users'. On the contrary: comprehensive user interfaces allow the user to explore and detect all the important features a method offers.

ACKNOWLEDGEMENTS

I thank Peter Linder, Chloé Galley, Cyril Guibert, Chris Hardy, Timo van der Niet and Philip Moline for organizing the meeting, Recent Floristic Radiations in the Cape Flora, and for inviting me to participate in it as a discussion facilitator. All authors and co-authors of the methods described in this article who contributed by sending software or manuals, or answering questions, are gratefully acknowledged, especially Jeff Thorne, Ziheng Yang, Andrew Rambaut, Michael Sanderson, David Cutler and Bret Larget. Finally, I am very grateful to Elena Conti, Susana Magallón, Susanne Renner and Tim Barraclough for critically reading the manuscript.

REFERENCES

Arbogast, B.S., Edwards, S.V., Wakeley, J., Beerli, P. & Slowinski, J.B. (2002) Estimating divergence times from molecular data on phylogenetic and population genetic timescales. Annual Review of Ecology, Evolution and Systematics, 33,
Aris-Brosou, S. & Yang, Z. (2001) PHYBAYES: a program for phylogenetic analyses in a Bayesian framework. Department of Biology (Galton Laboratory), University College London, London, UK.
Aris-Brosou, S. & Yang, Z. (2002) Effects of models of rate evolution on estimation of divergence dates with special reference to the metazoan 18S ribosomal RNA phylogeny. Systematic Biology, 51,
Aris-Brosou, S. & Yang, Z. (2003) Bayesian models of episodic evolution support a Late Precambrian explosive diversification of Metazoa. Molecular Biology and Evolution, 20,
Baldwin, B.G. & Sanderson, M.J. (1998) Age and rate of diversification of the Hawaiian silversword alliance (Compositae). Proceedings of the National Academy of Sciences of the USA, 95,
Bell, C.D., Soltis, D.E. & Soltis, P.S. (2005) The age of the angiosperms: a molecular timescale without a clock. Evolution, 59,
Berry, P.E., Hahn, W.J., Sytsma, K.J., Hall, J.C. & Mast, A. (2004) Phylogenetic relationships and biogeography of Fuchsia (Onagraceae) based on noncoding nuclear and chloroplast DNA data. American Journal of Botany, 94,


Classification and Phylogeny Classification and Phylogeny The diversity of life is great. To communicate about it, there must be a scheme for organization. There are many species that would be difficult to organize without a scheme

More information

C.DARWIN ( )

C.DARWIN ( ) C.DARWIN (1809-1882) LAMARCK Each evolutionary lineage has evolved, transforming itself, from a ancestor appeared by spontaneous generation DARWIN All organisms are historically interconnected. Their relationships

More information

Classification and Phylogeny

Classification and Phylogeny Classification and Phylogeny The diversity it of life is great. To communicate about it, there must be a scheme for organization. There are many species that would be difficult to organize without a scheme

More information
