Evolutionary rate variation among vertebrate β globin genes: Implications for dating gene family duplication events

Gene 380 (2006)

Gabriela Aguileta (a), Joseph P. Bielawski (a, b, c, *), Ziheng Yang (a)

(a) University College London, Department of Biology, Darwin Building, Gower Street, London WC1E 6BT, England
(b) Dalhousie University, Department of Biology, Halifax, Nova Scotia, Canada B3H 4J1
(c) Dalhousie University, Department of Mathematics and Statistics, Halifax, Nova Scotia, Canada B3H 4J1

Received 28 June 2005; received in revised form 10 April 2006; accepted 24 April 2006. Available online 4 May 2006. Received by A. Roger.

Abstract

A comprehensive dataset of 62 β globin gene sequences from various vertebrates was compiled to test the molecular clock and to estimate the dates of gene duplications. We found that evolution of the β globin family of genes is not clock-like, a result at odds with the common use of this family as an example of a constant rate of evolution over time. Divergence dates were estimated both with and without assuming the molecular clock; the two analyses produced similar date estimates, which are also in general agreement with previously reported estimates. In addition, we report date estimates for seven previously unexamined duplication events within the β globin family. Despite multiple sources of rate variation, the average rate across the β globin phylogeny yielded reasonable estimates of divergence dates in most cases. The exceptions were cases of gene conversion, which appears to have led to underestimated divergence dates. Our results suggest that (i) the major duplications giving rise to the paralogous β globin genes are associated with significant evolutionary rate variation among gene lineages; and (ii) genes arising from more recent gene duplications (e.g., tandem duplications within lineages) do not appear to differ greatly in rate.
We believe this pattern reflects a complex interplay of evolutionary forces, in which natural selection for diversifying paralogous functions and lineage-specific effects contribute to rate variation over the long term, while gene conversion tends to increase sequence similarity. Gene conversion effects appear to be stronger on recent gene duplicates, as their sequences are highly similar. Lastly, phylogenetic analyses do not support a previous report that avian globins are members of a relic lineage of ω globins.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Divergence times; Rate variation; Global and local clock models; Bayesian method

1. Introduction

In the β globin family, there have been extensive gene duplications and functional divergences (Hardison, 1998). The divergence of the α and β globins was dated to millions of years ago (Myra) (Goodman et al., 1987). The first known duplication within the β globin gene family gave rise to a proto-β and a proto-ε gene, and is estimated to have occurred between 150 and 200 Myra (Czelusniak et al., 1982). Between 100 and 140 Myra, a duplication of proto-ε led to the evolution of the ε and γ globins. In simian primates, γ globins underwent a further duplication around 35 Myra; the resulting paralogs are referred to as Aγ and Gγ (Hayasaka et al., 1992). Also, duplication of proto-β between 80 and 90 Myra led to the evolution of the mammalian β and δ globins (Hardison and Margot, 1984). This complicated history of gene duplication has produced confusion about orthologous relationships among some members of the β globin family.

Abbreviations: CI, credibility interval; HKY85G, Hasegawa-Kishino-Yano 1985 model, gamma corrected; JC, Jukes-Cantor; LRT, likelihood ratio test; MCMC, Markov chain Monte Carlo; ML, maximum likelihood; Myra, million years ago; Myr, million years.

* Corresponding author. Dalhousie University, Department of Biology, Halifax, Nova Scotia, Canada B3H 4J1. E-mail address: j.bielawski@dal.ca (J.P. Bielawski).
Globins have been cited as a classic example of a constant rate of evolution over time (e.g., Bromham and Penny, 2003). Early attempts to estimate dates of species divergences and gene duplication events from globin sequences were conducted under this assumption, i.e., a strict molecular clock (e.g., Zuckerkandl and Pauling, 1962; Czelusniak et al., 1982). However, the clock is often violated in comparisons of vertebrate genes; interestingly, such violations were anticipated by Zuckerkandl and Pauling (1962), and they are now widely acknowledged (for a review see Bromham and Penny, 2003). Recent work has established that a severe violation of the clock assumption can lead to seriously biased time estimates (Aris-Brosou and Yang, 2003). Newly developed likelihood and Bayesian methods permit estimation of divergence dates when evolutionary rates differ among lineages, and may be expected to produce more reliable estimates (e.g., Thorne et al., 1998; Yang and Yoder, 2003). These methods can also analyze multiple gene partitions and use multiple fossil calibration points.

A goal of our study was to investigate evolutionary rate variation among lineages in the β globin family of genes, and to assess any impact this might have on estimates of divergence dates under different analytical methods. In this paper, we inferred a phylogeny for the vertebrate β globin family of genes. We used the gene tree to date gene duplication and conversion events and to test for rate variation. We estimated these dates with recently developed methods that do not require the assumption of a molecular clock, and compared the estimates with previous ones obtained under a strict molecular clock. We also investigated the sources of rate variation among β globin genes. We tested the hypothesis that rate variation might be associated with recent tandem duplication events, and that two recently duplicated genes might begin to diverge in rate soon after they split. Finally, we tested for lineage-specific effects on evolutionary rates.

Fig. 1. Maximum likelihood phylogeny of the rooted ingroup tree for the β globin gene family. Species names and GenBank accession numbers are listed next to each sequence. Circles represent calibration nodes. Squares represent gene duplication events. Nodes of interest are numbered according to the node assignment of Jeff Thorne's multidivtime program. The grey bar indicates the proposed time for the eutherian mammal radiation (Bowen et al., 2002).
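The strict-clock reasoning discussed above can be made concrete with a deliberately simplified pairwise calculation. Under a clock, two lineages that split t million years ago accumulate a distance d = 2μt, so a calibrated split fixes the rate μ, and other splits are then dated as t = d/(2μ). The sketch below assumes this pairwise relationship and uses invented distances; it is not the likelihood method of Yang and Yoder (2003) or the Bayesian method of Thorne et al. (1998), both of which date all nodes of a tree jointly. The helper names and the rate classes (`beta`, `epsilon`) are hypothetical.

```python
"""Toy comparison of global- and local-clock dating (invented numbers).

Assumes a pairwise clock: two lineages that split t Myr ago accumulate
d = 2 * rate * t substitutions per site. This is a simplification, not the
likelihood method of Yang and Yoder (2003), which dates all nodes jointly.
"""
from typing import Dict, Tuple


def rate_from_calibration(distance: float, age_myr: float) -> float:
    """Rate (substitutions/site/Myr) implied by a split of known age."""
    return distance / (2.0 * age_myr)


def clock_age(distance: float, rate: float) -> float:
    """Age (Myr) of a split, given its pairwise distance and a clock rate."""
    return distance / (2.0 * rate)


def local_clock_ages(pairs: Dict[str, Tuple[float, str]],
                     class_rates: Dict[str, float]) -> Dict[str, float]:
    """Date each pair with the rate of its own (hypothetical) rate class."""
    return {name: clock_age(d, class_rates[cls]) for name, (d, cls) in pairs.items()}


# Hypothetical calibration: a 100 Myr split with distance 0.2 substitutions/site.
global_rate = rate_from_calibration(0.2, 100.0)

# Hypothetical gene pairs: (pairwise distance, rate class).
pairs = {"pair_in_beta": (0.05, "beta"), "pair_in_epsilon": (0.05, "epsilon")}

# Global clock: every pair is dated with the single calibrated rate.
global_ages = {name: clock_age(d, global_rate) for name, (d, _) in pairs.items()}

# Local clock: suppose the epsilon class truly evolves at half the global rate.
local_ages = local_clock_ages(pairs, {"beta": global_rate, "epsilon": global_rate / 2.0})
```

In this toy setting, dating the slower hypothetical class with the single global rate places its split at 25 Myr instead of 50 Myr, the kind of bias that motivates methods allowing rates to differ among lineages.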

2. Materials and methods

2.1. Sequence data

Sixty-two vertebrate β globin gene sequences, including amphibian, bird and mammal sequences, were retrieved from GenBank, with fish sequences used as outgroups. The ingroup species names and GenBank accession numbers are shown in the tree in Fig. 1. The outgroup sequences were zebrafish βa1 (U50382), βa2 (U50379) and βe1 (AF082662); carp β globin (AB004740); salmon β globin (Y08923); and Oryzias ε globin (AB080118). Sequences were aligned with Clustal X. They were easy to align, and only minor manual adjustments were necessary. Sites with gaps were removed before analysis.

2.2. Phylogenetic analysis

Maximum likelihood (ML) and Bayesian methods were used to estimate a phylogeny for the vertebrate β globin sequences. The ML analysis was conducted with PAUP (Swofford, 2000) under the HKY85G model of nucleotide substitution. Support for the branches was assessed by bootstrap analysis with 100 pseudoreplicates. The Bayesian analysis was conducted with MrBayes (Huelsenbeck and Ronquist, 2001), also assuming the HKY85G model. The Markov chain Monte Carlo (MCMC) algorithm was run with three chains for ten million generations, sampling every 100 generations. This process was repeated three times to check for convergence. Stationarity was reached after sampling 2000 trees; trees sampled before stationarity were excluded from the analysis.

2.3. Divergence date estimation

Fossil calibrations are available for 15 ancestral nodes distributed across the phylogeny (Fig. 1 and Table 1).

2.3.1. Likelihood models of global and local clocks

We analyzed the three codon positions combined, while accounting for differences in rate among them. The ML tree was used for date estimation. We estimated divergence dates under both the global clock and the local clock models.
For the local clock models, we used a recently developed method that permits multiple calibration points, allowing us to use different fossil dates distributed across the phylogeny (Yang and Yoder, 2003). Model parameters include the substitution rate (μ) and the ages of nodes that are not calibration points. The nucleotide substitution model was HKY85G. The local clock models allowed us to test whether the paralogous genes in the β globin family were evolving at significantly different rates. Four rates were specified: one for each of the β, γ and ε gene clades, and one for all other branches. A likelihood ratio test (LRT) was used to compare the local clock model (each gene clade allowed a different rate) with the global clock model (all gene clades share the same rate) (Yang and Yoder, 2003). The ML global and local clock models were fitted with the baseml program in the PAML package (Yang, 1997).

Table 1
Calibration dates for ancestral nodes in Fig. 1 (in millions of years)
Node | Divergence | Range | Mid-value | Reference
C1 | birds/mammals | | | Benton, 1999
C2 | primate basal radiation | | | Tavare et al., 2002
C3 | monkeys/apes | | | Shoshani et al., 1996
C4 | gorilla/human | | | Shoshani et al., 1996
C5 | artiodactyl radiation | | | Bowen et al., 2002
C6 | rat/mouse | | | Jacobs and Downs, 1994
C7 | primate basal radiation | | | Tavare et al., 2002
C8 | artiodactyl radiation | | | Bowen et al., 2002
C9 | eutherian radiation | | | Goodman et al., 1987
C10 | primate basal radiation | | | Tavare et al., 2002
C11 | monkey/ape | | | Shoshani et al., 1996
C12 | duck/chicken | | | Cracraft, 2001
C13 | stem marsupial | | | Luo et al., 2001
Outgroup | bony fishes/amphibians | | | Bromham et al., 1998

2.3.2. Bayesian method for divergence date estimation

We used a Bayesian MCMC approach (Thorne et al., 1998) as implemented in a program package written by Jeff Thorne.
The program estbranches was used to obtain ML estimates of the branch lengths of the rooted ingroup tree, together with their variance-covariance matrix. Fish sequences were used as outgroups to locate the root of the ingroup tree. The HKY85G model of nucleotide substitution was assumed; the transition/transversion rate ratio and the shape parameter of the gamma distribution were estimated with PAML (Yang, 1997). The output of estbranches was then used in the program multidivtime to estimate divergence dates. In this analysis we used all codon positions and accounted for differences in rate among them. Fossil calibrations were specified as lower and upper bounds on the ages of ancestral nodes (Table 1). The MCMC chain was run at least twice for 100,000 generations after a burn-in of 10,000 generations, sampling every 10 generations. The priors for multidivtime were as follows: (i) a gamma prior for the age of the root with a mean of 400 Myr and a standard deviation of 200 Myr; (ii) a gamma prior for the rate at the root with a mean of 0.23 and a standard deviation of 0.12; and (iii) a gamma prior for the parameter ν, which controls the variability of rates over time, with a mean of 0.4 and a standard deviation of 0.4. Note that the method of Yang and Yoder (2003) allows only fixed node ages, whereas the Bayesian approach allows a range to be used. To make the calibrations comparable, in our ML analyses we fixed each calibration node at the mid-value of the range used in the Bayesian analysis (Table 1).

3. Results

3.1. Phylogenetic analysis

Maximum likelihood and Bayesian analyses resulted in similar topologies (Fig. 1). Most branches were supported by high bootstrap proportions (>90%) and posterior probabilities (>90%, data not shown). Phylogenetic relationships within clades of genes (e.g., β, γ, δ and ε) are in general agreement with the expected species tree; exceptions occur where gene conversion has affected the topology (Aguileta et al., 2004) and where there have been additional gene duplication events (e.g., the tandem duplications of β globin within the rodent and artiodactyl lineages).

Besides dating gene duplications, we were interested in dating the gene conversion events identified by Aguileta et al. (2004). Sequences affected by gene conversion include: tarsier β and δ globins; bushbaby β and δ globins; goat βa and βc globins; artiodactyl β and γ globins; mouse β1 and β2 globins; human and chimpanzee Aγ and Gγ globins; and mouse βh0 and βh1 globins. The sequences within each of these sets are sister taxa because of historical recombination events (e.g., tarsier δ is sister to tarsier β in Fig. 1 instead of to another δ globin). In some cases, functionally different paralogs are sister to each other because of lineage-specific duplication events (e.g., cow, sheep and goat γ globins, and mouse βh0 and βh1). For example, a duplication of β globin within the artiodactyl lineage led to the evolution of a globin that is functionally analogous to the vertebrate γ globins, and it was therefore labeled a γ globin (Czelusniak et al., 1982).

In Fig. 1, the most basal taxa are frog globins, followed by marsupial ω globins and bird globins. Marsupial ω globin is thought to be a relic gene that was lost in the eutherian lineage (Wheeler et al., 2001). According to a model proposed by Wheeler et al. (2001), present-day avian globins are orthologous to the marsupial ω globins (Fig. 2A), so the two should have appeared as sister taxa in our tree. In contrast, we find that avian globins are sister to the eutherian β globins (Fig. 1), indicating a paralogous relationship between avian globins and marsupial ω globins. As expected, the eutherian mammal sequences are monophyletic, with the first duplication in that group being that of proto-β and proto-ε. Also consistent with previous studies, our tree indicates that a subsequent duplication event in the proto-β clade generated the β and δ lineages (node 101 in Fig. 1). Another duplication within the proto-ε lineage yielded the ε and γ lineages (node 84 in Fig. 1).

Fig. 2. Two evolutionary models for the β globin gene family. The gene trees show relationships of the genes within eutherian, marsupial and avian organismal lineages. The dotted line represents the ω lineage and the interrupted line represents the β-like lineage. Diamonds represent speciation events and circles represent gene duplications. Putative gene losses are marked at the tips with a question mark (?). (A) In the model of Wheeler et al. (2001), there are two major gene lineages shared between mammals and birds, and ω globins are more closely related to bird β globins. (B) In our model, based on the gene tree from phylogenetic analysis of 62 vertebrate β-like globin gene sequences, bird β globins are orthologous to both eutherian and marsupial β globins.

3.2. Estimation of dates for species divergences and gene duplications

We first used ML methods to estimate dates for nodes other than those fixed as calibration points (Table 1) under both the local and global clock models. The estimates listed in Table 2 were obtained by modeling the heterogeneity among the three codon positions. We present only the results obtained under the HKY85G model, as results differ little between HKY85G and the JC model (the largest difference is about 17 Myr, at nodes 106 and 107; data not shown).
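The HKY85 model generalizes JC: it adds unequal equilibrium base frequencies (π) and a transition/transversion rate ratio (κ), and it reduces to JC when κ = 1 and every base frequency is 0.25. The pure-Python sketch below builds the HKY85 rate matrix Q and computes transition probabilities P(t) = exp(Qt) by scaling and squaring, which makes that nesting easy to check numerically. It omits the gamma (G) component for among-site rate variation, and its function names are illustrative rather than those used by PAUP or PAML.

```python
"""Sketch of the HKY85 substitution model (Hasegawa, Kishino and Yano, 1985).

Assumptions: bases ordered T, C, A, G; kappa is the transition/transversion
rate ratio; pi holds equilibrium base frequencies summing to 1. The gamma (G)
component of HKY85G is not modelled. Names are illustrative.
"""
import math

ORDER = "TCAG"
PURINES = {"A", "G"}


def is_transition(a: str, b: str) -> bool:
    """Transitions are A<->G (purines) or C<->T (pyrimidines)."""
    return (a in PURINES) == (b in PURINES)


def hky85_q(kappa: float, pi: dict) -> list:
    """Rate matrix Q, scaled to one expected substitution per unit time."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(ORDER):
        for j, b in enumerate(ORDER):
            if i != j:
                q[i][j] = pi[b] * (kappa if is_transition(a, b) else 1.0)
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 here, so this sums the off-diagonals
    scale = -sum(pi[a] * q[i][i] for i, a in enumerate(ORDER))
    return [[x / scale for x in row] for row in q]


def _mat_mul(x: list, y: list) -> list:
    return [[sum(x[i][k] * y[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def _expm(m: list, squarings: int = 10, terms: int = 20) -> list:
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    a = [[x / 2 ** squarings for x in row] for row in m]
    result = [[float(i == j) for j in range(4)] for i in range(4)]
    term = [row[:] for row in result]
    for n in range(1, terms):
        term = [[x / n for x in row] for row in _mat_mul(term, a)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = _mat_mul(result, result)
    return result


def hky85_p(t: float, kappa: float, pi: dict) -> list:
    """Transition probabilities P(t) = exp(Q t) for branch length t (subs/site)."""
    q = hky85_q(kappa, pi)
    return _expm([[x * t for x in row] for row in q])


def jc69_p_same(t: float) -> float:
    """Closed-form JC69 probability of no net change over branch length t."""
    return 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)
```

For example, `hky85_p(0.3, 1.0, {b: 0.25 for b in ORDER})[0][0]` should match `jc69_p_same(0.3)`, while κ > 1 with equal frequencies makes the T→C entry (a transition) larger than the T→A entry (a transversion).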
All our estimated dates of gene duplication are within the range reported in previous studies, although some are younger. For instance, our estimate for the Gγ/Aγ divergence was between 0.7 and 3.4 Myr, compared with a previous estimate of around 35 Myra (Hardison, 1984). We believe that the estimated dates for the divergences of primate Aγ and Gγ, mouse βh0 and βh1, and mouse β1 and β2 are younger than the actual dates because gene conversion has increased their sequence similarity. Caution is therefore warranted with dates estimated from gene families, particularly when the estimates are more recent than expected.

We compared the global clock model with an alternative model in which each gene lineage (e.g., β, ε, γ) is allowed a different rate (i.e., a local clock model) by means of a likelihood ratio test (LRT). Note that the β globin lineage includes the δ globin lineage, because too few δ globin sequences are available to analyze the latter independently. The LRT rejected the global clock model in favor of the alternative model in which paralogous genes are allowed different rates (2δ = 63.4, df = 3, P < 0.0001). Hence, we also estimated divergence dates under a local clock with separate rate classes for the β, γ and ε lineages and the rest of the phylogeny (r1 for β, r2 for γ, r3 for ε, and r4 for the remaining branches). ML date estimates under this four-rate local clock model are listed in column (b) of Table 2. Date estimates differed between the global and local clock models. For instance, the estimate for the ε/ρ divergence was 49.4 Myra under the global clock, whereas the local clock model gave 55.1 Myra (compare columns (a) and (b) in Table 2). Estimates for the split between ε and γ likewise differed between the two models (compare columns (a) and (b) in Table 2).
Nevertheless, the differences between the global and local clock estimates are not large, and all estimates fall within the range suggested by previous reports and by the fossil record of the relevant species.

We also used the Bayesian method (Thorne et al., 1998) to estimate divergence dates. An attractive feature of the Bayesian method is its ability to compute a 95% credibility interval (CI) for an estimated divergence date. Posterior means of divergence times for ancestral nodes and their 95% CIs are listed in Table 2. The model accounts for differences among codon positions. The prior mean rate at the root was set to the ML estimate under the global clock for all codon positions. The dates obtained with and without the clock assumption do not differ greatly (Table 2). Differences are larger between the ML and Bayesian methods than between different models. The largest differences occur at four nodes: (i) node 66 is younger under ML by approximately 3 Myr; (ii) node 95 is younger under ML by nearly 9 Myr; (iii) node 90 is older under ML by around 11 Myr; and (iv) node 92 is older under ML by approximately 10 Myr (Table 2). In all of these cases the 95% CI includes the ML estimate. In general, the two approaches produced similar dates for globin gene duplications.

The average rates obtained under the global clock were 0.186 and 0.055 substitutions per site per year for the first and second codon positions, respectively (Table 3). The absolute rates estimated under the local clock model show that the β and γ genes evolve at similar rates (0.140 substitutions per site per year for β under the local clock with the HKY85G model; Table 3). The ε genes evolve at the slowest rate under the local clock with the HKY85G model (Table 3). The observed differences in rate among the three gene lineages are consistent with their differences in selective pressure, with ε having the most constrained rate of nonsynonymous substitution of the three (Aguileta et al., 2004).

Table 2
ML and Bayesian estimates of duplication dates with and without assuming the clock
Node | Divergence | ML global clock (a) | ML local clock (b) | Bayesian clock (c) | Bayesian no clock (d)
58 | chicken ε/ρ | | | (33.5, 86.6) | 59.3 (32.1, 87.8)
62 | mouse βh0/βh1 | | | (11.5, 49.8) | 29.0 (12.0, 50.3)
66 | Cebus Gγ/Aγ | | | (0.0, 9.4) | 3.4 (0.1, 9.8)
74 | γ clade | | | (85.3, 119.3) | (84.7, 119.2)
83 | ε clade | | | (78.6, 116.0) | 96.9 (78.4, 116.7)
84 | ε/γ | | | (104.1, 149.0) | (103.9, 148.8)
85 | mouse β1/β2 | | | (0.5, 11.6) | 6.0 (0.6, 11.6)
90 | goat βa/βc | | | (7.8, 35.9) | 20.6 (7.8, 35.2)
 | artiodactyl βa βc/γ | | | (26.4, 57.3) | 41.4 (26.6, 57.0)
95 | bushbaby β/δ | | | (2.6, 29.0) | 14.8 (2.7, 29.1)
96 | tarsier β/δ | | | (12.1, 42.6) | 26.7 (12.5, 42.5)
98 | δ clade | | | (32.1, 37.8) | 34.8 (32.1, 37.8)
106 | β clade | | | (81.4, 130.1) | (81.0, 129.6)
107 | proto-β/ε | | | (124.2, 182.9) | (123.9, 182.3)
Note: The first column gives the node numbers in the phylogeny (Fig. 1) for which we estimated dates. These are either basal nodes for each member of the β family of genes (i.e., γ globins [node 74], ε globins [node 83], δ globins [node 98], β globins [node 106], proto-β/ε globins [node 107]), gene conversion events (nodes 95 and 96) or tandem duplication events (all other nodes). The estimates in columns (a) and (b) are ML estimates obtained from the three codon positions combined, accounting for differences in rate among them. Bayesian estimates, with and without the clock assumption, are in columns (c) and (d); 95% credibility intervals (CIs) are given in parentheses.

3.3. Variable rates of evolution

The β globin gene family offers the opportunity to investigate the rate variability associated with both older (deeper) and more recent gene duplication events. Such an analysis may help us understand how the rates of gene duplicates vary in both the long and the short term after their divergence. First, we conducted additional tests to determine whether evolutionary rates also diverged following the more recent tandem duplication events. We refer to sister genes that arose via a recent gene duplication as a recent gene pair.
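Each rate test in this study, from the global versus local clock comparison reported above (2δ = 63.4, df = 3) to the recent-gene-pair tests described next, is a likelihood ratio test: twice the log-likelihood difference between nested models is compared with a chi-square distribution whose degrees of freedom equal the number of extra rate parameters, and P-values from several tests are then Bonferroni-corrected. The sketch below shows that procedure in plain Python under stated assumptions: the chi-square tail uses closed forms valid for integer df (in practice `scipy.stats.chi2.sf` does the same job), and the function names and example P-values are hypothetical, not baseml output.

```python
"""Sketch: likelihood ratio tests with a Bonferroni correction (stdlib only).

chi2_sf uses closed forms valid for integer degrees of freedom; in practice
scipy.stats.chi2.sf does the same job. Names and example inputs are
hypothetical and are not PAML output.
"""
import math
from typing import List, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)  # the i = 1 term
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (i + 0.5)  # advance to the i + 1 term
    return total


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int) -> Tuple[float, float]:
    """Return 2 * (lnL_alt - lnL_null) and its chi-square P-value."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


def bonferroni(p_values: List[float]) -> List[float]:
    """Bonferroni-adjusted P-values: min(1, m * p) for m tests."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


# Global vs. four-rate local clock: three extra rate parameters, so df = 3.
p_clock = chi2_sf(63.4, 3)

# Hypothetical raw P-values from three tests, corrected for multiple testing.
adjusted = bonferroni([0.0002, 0.03, 0.001])
significant = [p <= 0.05 for p in adjusted]
```

Here `p_clock` comes out far below 0.0001, consistent with the reported P < 0.0001, while the second hypothetical test (raw P = 0.03) no longer passes the 0.05 threshold once three tests are accounted for.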
We tested the following recent gene pairs: (1) β and δ globins from tarsier; (2) β and δ globins from bushbaby; (3) βh0 and βh1 from mouse; (4) β1 and β2 from mouse; (5) Gγ and Aγ from Cebus; (6) Gγ and Aγ from chimpanzee; (7) Gγ and Aγ from human; and (8) βa1 and βa2 from zebrafish (used as outgroup). We compared the rate of evolution in each recent gene pair with the rate across the rest of the genes in the tree, assigning one rate class to the tested pair of duplicate genes and one rate class to the remaining genes (two-rates test). Each case was compared with the global clock model using an LRT, and P-values were corrected for multiple tests by the Bonferroni method. Only two of the eight LRTs were significant: mouse βh0 and βh1 (2δ = 21.44, df = 1) and mouse β1 and β2 (2δ = 42.25, df = 1). These findings suggest that only a minority of recent gene duplication events in this gene family were associated with a change in evolutionary rate.

Table 3
ML estimates of substitution rates for the four branch classes (×10^-8 substitutions per site per year), under an averaged HKY85G model and a mixed HKY85G model, for the global clock (by codon position) and the local clock (by branch class: BG, β, ε, γ).
(a) Letters in the far right column indicate the rate for each corresponding codon position (1st, 2nd, 3rd) or gene (BG, β, ε, γ). BG (background) designates the rate in the rest of the phylogeny.

Functional divergence among paralogous members of a gene family can lead to differences in selection pressure and, consequently, in the average rate of amino acid evolution (Aguileta et al., 2004). Using the same eight gene pairs, we tested whether the two sequences in a recent gene pair evolved at different rates. This analysis may indicate whether rates of substitution begin to diverge between gene duplicates soon after they split.
To test for such divergence, we assigned one rate class to each duplicate in a selected gene pair and a third rate class to the rest of the genes in the tree (three-rates model). We used LRTs to compare the null (two-rates) model with the alternative (three-rates) model, with P-values corrected for multiple tests by the Bonferroni method. Only one of the eight LRTs was significant: mouse β1 and β2 (2δ = 17.66, df = 1, P = 0.008). The majority of recent gene duplicates therefore appear to evolve at relatively homogeneous rates.

We also examined possible rate differences among species lineages such as primates and rodents, which have very different long-term effective population sizes, generation times and metabolic rates (O'hUigin and Li, 1992). From the original 62-sequence dataset we sampled two sets of sequences: the mammalian β globins (23 sequences) and the mammalian γ globins (14 sequences). Within these sets we allowed primates and rodents to have different rates in three tests: (i) one rate for primate β globin genes and one for non-primate β globin genes; (ii) one rate for primate γ globin genes and one for non-primate γ globin genes; and (iii) one rate for rodent β globin genes and one for non-rodent β globin genes. An LRT was used to compare each of these three local clock models with the global clock, thus testing for an organismal lineage effect on evolutionary rate. P-values were corrected for multiple tests by the Bonferroni method. We detected an organismal lineage effect on substitution rate in all three tests: primate vs. non-primate β globins (2δ = 14.02, df = 1); primate vs. non-primate γ globins (2δ = 21.44, df = 1); and rodent vs. non-rodent β globins (2δ = 27.06, df = 1).
These findings indicate that primate β globins evolve approximately 1.5 times faster than non-primate β globins (primate β globins: 0.166 substitutions per site per year). In turn, primate γ globins evolve roughly 17 times faster than rodent γ globins (primate γ globins: 0.117 substitutions per site per year), an acceleration that may be related to the Aγ/Gγ divergence in primates. Further tests are necessary to evaluate this possibility. Also, rodent β globins are evolving approximately 2.1 times as fast as non-rodent β globins (rodent β globins: 0.263 substitutions per site per year). Collectively, these results suggest an organismal lineage effect on evolutionary rates, with rodent and primate globins typically evolving more quickly, on average, than globins of other organismal lineages.

4. Discussion

4.1. Avian β globins are not orthologous to marsupial ω globins

Only recently, an orphaned globin (i.e., one that is unlinked to previously characterized globin gene clusters) was discovered in present-day marsupials (Wheeler et al., 2001). This orphaned globin, called ω, was highly divergent from all the sequences sampled for this study, suggesting a very early origin within the β globin family (Wheeler et al., 2001). Previous phylogenetic analyses suggested that the marsupial ω globins were most closely related to the avian globins (Wheeler et al., 2001). To explain this relationship, Wheeler et al. (2001) proposed a model whereby an ancient gene duplication led to two major evolutionary lineages (Fig. 2A): (i) a lineage leading to a cluster of mammalian β-like globin genes; and (ii) a lineage leading to the avian β-like globin genes and the ω globin genes. To reconcile this model with the organismal history, the entire β-like globin cluster must have been lost early in the history of the avian lineage, and the ω globins must have been lost early in the history of the eutherian lineage (Fig. 2A). Their model represented a significant change in the status of the β-like globin cluster, as it suggested that present-day avian and eutherian β clusters are not orthologous. Our estimated divergence date does confirm an ancient origin for ω globin genes that predates the origins of birds and mammals (around 353 Myra; Fig. 1); however, our phylogenetic results do not support the model of Wheeler et al. (2001).
Based on a much larger sample of globin genes, we found support (60% bootstrap under ML and 96% posterior probability) for a sister relationship between avian and mammalian β-like globin genes (Fig. 1). We propose an alternative model for the evolution of the β-like globin cluster in which the avian and mammalian β clusters are orthologs (Fig. 2B). This model also requires that a globin locus be lost twice; here both events involve the loss of the ω globins, once early in the avian lineage and once early in the eutherian lineage. We note that no orthologs of ω globin genes have been reported in monotremes. If monotremes do not possess an ω globin gene, both models of globin evolution require the assumption of a third loss early in the history of the monotreme lineage.

4.2. Dates of gene divergence and gene conversion in the vertebrate β globin gene family

Most dates estimated in this study for β globin genes are in fairly good agreement with previous studies. Interestingly, in a recent survey of 22 different proteins, Glazko et al. (2005) found that a strict clock provided a useful approximate time-scale for mammalian divergence dates, even though some genes displayed significant rate variation among lineages. In contrast to comparisons across much deeper phylogenetic divergences (e.g., Aris-Brosou and Yang, 2003), local deviations from a strict clock model in vertebrate genes do not necessarily yield biased time estimates. Further work is needed to understand the inherent limitations of the different methods of divergence date estimation, and to determine when methods that correct for rate variation among lineages should be used.

Our study is the first to place confidence intervals, as measures of the accuracy of the point estimates, on globin-derived dates. Our analysis is based on more realistic models, accounting for differences in substitution dynamics among codon positions and among lineages.
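The Bayesian intervals in Table 2 summarize MCMC samples of node ages. One common way to form a 95% credibility interval is to discard the burn-in, thin the chain, and take the 2.5% and 97.5% quantiles of the remaining samples (an equal-tailed interval). The sketch below assumes that rule, which may differ in detail from how multidivtime summarizes its output, and uses a synthetic chain instead of real samples; the burn-in and thinning values are arbitrary and the function names are hypothetical.

```python
"""Sketch: posterior mean and equal-tailed credibility interval from MCMC samples.

Assumptions: samples are node ages (Myr) from a single chain; burn-in and
thinning values are arbitrary; quantiles use linear interpolation.
"""
import random
from typing import List, Sequence, Tuple


def quantile(sorted_xs: Sequence[float], q: float) -> float:
    """Linearly interpolated quantile of already-sorted data, 0 <= q <= 1."""
    pos = q * (len(sorted_xs) - 1)
    i = int(pos)
    if i + 1 >= len(sorted_xs):
        return sorted_xs[-1]
    frac = pos - i
    return sorted_xs[i] + frac * (sorted_xs[i + 1] - sorted_xs[i])


def equal_tail_interval(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Interval leaving (1 - mass) / 2 of the samples in each tail."""
    xs = sorted(samples)
    tail = (1.0 - mass) / 2.0
    return quantile(xs, tail), quantile(xs, 1.0 - tail)


def summarise_chain(chain: List[float], burn_in: int, thin: int, mass: float = 0.95):
    """Drop the burn-in, keep every `thin`-th sample, return (mean, (lower, upper))."""
    kept = chain[burn_in::thin]
    mean = sum(kept) / len(kept)
    return mean, equal_tail_interval(kept, mass)


# Synthetic chain standing in for the sampled ages of one node.
rng = random.Random(1)
chain = [rng.gauss(100.0, 10.0) for _ in range(11_000)]
mean, (lower, upper) = summarise_chain(chain, burn_in=1_000, thin=10)
```

Checking whether an ML point estimate lies inside such an interval, as done for nodes 66, 90, 92 and 95, is then a simple `lower <= ml_estimate <= upper` comparison.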
Although the confidence intervals do not explicitly accommodate errors in the assignment of fossils to a taxonomic lineage, the use of multiple calibration points is expected to reduce the sensitivity of the results to such errors. Furthermore, by employing a range of dates for each calibration point, the Bayesian method accommodates uncertainty in the assignment of fossils to a particular geological date. Although our estimates are remarkably similar to previous estimates, we show that the errors associated with the earlier point estimates may be large.

Gene conversion is an effective mechanism for increasing sequence similarity among members of a gene family. In the case of the β globin gene family, we believe that the estimated dates for the divergences of primate Aγ and Gγ, mouse βh0 and βh1, and mouse β1 and β2 are younger than the actual dates because gene conversion has increased their sequence similarity. Gene conversion is a common process in many gene families. As methods are developed that allow efficient estimation of dates from multiple datasets (Thorne et al., 1998; Yang and Yoder, 2003), gene families will become an attractive source of data from which to estimate divergence dates. This is particularly true because a single calibration, derived from a speciation event, can be used multiple times in different paralogs. However, such datasets are more likely than single-copy genes to have evolved under a history of frequent recombination, in which case gene conversion effects must be accounted for.

4.3. Rate variation among recently duplicated genes

We found significant heterogeneity in evolutionary rates within the β globin family of genes. Although we have shown that incorrectly assuming clock-like evolution has not adversely affected previous efforts to date the major duplication events within this gene family, rate heterogeneity among globin genes is nevertheless important to understand.
The pattern of rate variation is relevant to understanding the processes that influence the evolution and functional divergence of genes. Some rate variation among globins appears to reflect functional divergence of the major lineages of paralogs, i.e., β, γ, and ε (this study; Bielawski and Yang, 2004; Aguileta et al., 2004). For example, Bielawski and Yang (2004) found that the weakened interaction between γ-chain hemoglobin and 2,3-DPG is associated with divergence in selection pressure between γ and ε globins, resulting in divergent rates of nonsynonymous substitution at major structural and functional features of the hemoglobin tetramer. In this study we further evaluated the relationship between evolutionary rate and gene duplication by examining recent gene duplication events. Interestingly, we found very little evidence that recently duplicated genes differ significantly in rate from the rest of the phylogeny. Only two such cases were observed, mouse βh0/βh1 and mouse β1/β2, both of which we believe are affected by gene conversion. Likewise, with one exception, the mouse β1/β2 pair, we found no evidence for a significant difference in evolutionary rate between recently duplicated sister genes. Thus, most of the recent duplication events in the β globin family were not associated with a significant change in evolutionary rate.
In this study we found evidence for lineage-specific effects within globins: rodents have higher evolutionary rates than primates, and both have higher rates than the other lineages sampled. Because we measured the rate of amino acid substitution, our rate estimates reflect both the fixation probability of mutations and the effect of functional divergence on the distribution of selective pressures. These two aspects of evolutionary rate variation have different origins. Rate variation among orthologous genes is influenced by organism-specific factors such as, but not limited to, effective population size and mutation rate. Rate variation among paralogous globin genes is substantially affected by differences in selective pressure among functionally divergent members of the family (e.g., Bielawski and Yang, 2004; Aguileta et al., 2004). We suggest that the combined effect of these two forces produces the observed pattern of rate variability within the β globin gene family.
Tempo of evolution in the vertebrate β globin gene family
The β globin family of genes exhibits a complex pattern of evolutionary rates.
Differential rates of evolution reflect a tradeoff between the frequency of gene conversion, a force for increased sequence similarity, and natural selection for divergent functions in globins, a force for increased sequence heterogeneity. In addition to these forces, lineage-specific effects also influence the evolutionary rate of globins. The complex interplay of such forces is likely to differ among gene families and provides at least a partial explanation for the conflicting results of surveys of evolutionary rates across other gene families. For instance, accelerated rates of evolution are often associated with gene duplication events (e.g., Aguileta et al., 2004). Such acceleration is usually thought to reflect either adaptive evolution or an increase in neutral evolutionary space due to redundancy. However, when the sample of genes is restricted to relatively recent divergences (e.g., this study), the evidence for heterogeneous evolutionary rates is substantially reduced. Such discrepancies reflect both differences in the power of the different approaches and evolutionary differences among the sampled gene families. Furthermore, in some cases, including globins, divergence in regulatory sequence elements contributes substantially to functional divergence among paralogs (e.g., Hardison, 1998). As a case study, the β globin gene family illustrates that a wide variety of processes influence the rate of evolution and functional divergence of genes within a single gene family (this study; Bielawski and Yang, 2004; Aguileta et al., 2004). A case-by-case approach using a variety of methods might reveal that the primary evolutionary processes shaping gene family evolution vary substantially among gene families.
Acknowledgements
We are grateful to Katherine A. Dunn for useful comments and for help in using Jeff Thorne's Bayesian program. G.A. was supported by a grant from the Mexican Council of Science and Technology (CONACYT); J.P.B. and Z.Y. were supported by a grant from the Biotechnology and Biological Sciences Research Council (BBSRC, UK). J.P.B. was partially supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (DG298394).
References
Aguileta, G., Bielawski, J.P., Yang, Z., Gene conversion and functional divergence in the beta-globin gene family. J. Mol. Evol. 59.
Aris-Brosou, S., Yang, Z., Bayesian models of episodic evolution support a Late Precambrian explosive diversification of the metazoa. Mol. Biol. Evol. 20.
Benton, M.J., Early origins of modern birds and mammals: molecules vs. morphology. BioEssays 21.
Bielawski, J.P., Yang, Z., A maximum likelihood method for detecting functional divergence at individual codon sites, with application to gene family evolution. J. Mol. Evol. 59.
Bowen, G.J., et al., Mammalian dispersal at the Paleocene/Eocene boundary. Science 295.
Bromham, L., Penny, D., The modern molecular clock. Nat. Rev. Genet. 4.
Bromham, L., Rambaut, A., Fortey, R., Cooper, A., Penny, D., Testing the Cambrian explosion hypothesis by using a molecular dating technique. Proc. Natl. Acad. Sci. U.S.A. 95.
Cracraft, J., Avian evolution, Gondwana biogeography and the Cretaceous–Tertiary mass extinction event. Proc. R. Soc. Lond., B Biol. Sci. 268.
Czelusniak, J., Goodman, M., Hewett-Emmett, D., Weiss, M.L., Venta, P.J., Tashian, R.E., Phylogenetic origins and adaptive evolution of avian and mammalian haemoglobin genes. Nature 298.
Glazko, G., Koonin, E.V., Rogozin, I., Molecular dating: ape bones agree with chicken entrails. Trends Genet. 21.
Goodman, M., Miyamoto, M.M., Czelusniak, J., Pattern and process in vertebrate phylogeny revealed by coevolution of molecules and phylogenies. In: Patterson, C. (Ed.), Molecules and Morphology in Evolution: Conflict or Compromise? Cambridge University Press, Cambridge.
Hardison, R.C., Comparison of the beta-like globin gene families of rabbits and humans indicates that the gene cluster 5′-ε-γ-δ-β-3′ predates the mammalian radiation. Mol. Biol. Evol. 1.
Hardison, R.C., Hemoglobins from bacteria to man: evolution of different patterns of gene expression. J. Exp. Biol. 201 (Pt 8).
Hardison, R.C., Margot, J.B., Rabbit globin pseudogene psi beta 2 is a hybrid of delta- and beta-globin gene sequences. Mol. Biol. Evol. 1.
Hayasaka, K., Fitch, D.H., Slightom, J.L., Goodman, M., Fetal recruitment of anthropoid gamma-globin genes. Findings from phylogenetic analyses involving the 5′-flanking sequences of the psi gamma 1 globin gene of spider monkey Ateles geoffroyi. J. Mol. Biol. 224.
Huelsenbeck, J.P., Ronquist, F., MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17.
Jacobs, L.L., Downs, W.R., In: Tomida, Y., Li, C.K., Setoguchi, T. (Eds.), Rodent and Lagomorph Families of Asian Origins and Diversification. National Science Museum, Tokyo.
Luo, Z.X., Cifelli, R.L., Kielan-Jaworowska, Z., Dual origin of tribosphenic mammals. Nature 409.
O'hUigin, C., Li, W.H., The molecular clock ticks regularly in muroid rodents and hamsters. J. Mol. Evol. 35.
Shoshani, J., Groves, C.P., Simons, E.L., Gunnell, G.F., Primate phylogeny: morphological vs. molecular results. Mol. Phylogenet. Evol. 5.
Swofford, D., PAUP* 4.0: Phylogenetic Analysis Using Parsimony (*and Other Methods). Sinauer Associates, Sunderland, MA.
Tavaré, S., Marshall, C.R., Will, O., Soligo, C., Martin, R.D., Using the fossil record to estimate the age of the last common ancestor of extant primates. Nature 416.
Thorne, J.L., Kishino, H., Painter, I.S., Estimating the rate of evolution of the rate of molecular evolution. Mol. Biol. Evol. 15.
Wheeler, D., et al., An orphaned mammalian beta-globin gene of ancient evolutionary origin. Proc. Natl. Acad. Sci. U.S.A. 98.
Yang, Z., PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13.
Yang, Z., Yoder, A.D., Comparison of likelihood and Bayesian methods for estimating divergence times using multiple gene loci and calibration points, with application to a radiation of cute-looking mouse lemur species. Syst. Biol. 52.
Zuckerkandl, E., Pauling, L., In: Kasha, M., Pullman, B. (Eds.), Molecular Disease, Evolution, and Genetic Heterogeneity. New York.

More information

Chapter 27: Evolutionary Genetics

Chapter 27: Evolutionary Genetics Chapter 27: Evolutionary Genetics Student Learning Objectives Upon completion of this chapter you should be able to: 1. Understand what the term species means to biology. 2. Recognize the various patterns

More information

Inferring Complex DNA Substitution Processes on Phylogenies Using Uniformization and Data Augmentation

Inferring Complex DNA Substitution Processes on Phylogenies Using Uniformization and Data Augmentation Syst Biol 55(2):259 269, 2006 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 101080/10635150500541599 Inferring Complex DNA Substitution Processes on Phylogenies

More information

Molecular evolution. Joe Felsenstein. GENOME 453, Autumn Molecular evolution p.1/49

Molecular evolution. Joe Felsenstein. GENOME 453, Autumn Molecular evolution p.1/49 Molecular evolution Joe Felsenstein GENOME 453, utumn 2009 Molecular evolution p.1/49 data example for phylogeny inference Five DN sequences, for some gene in an imaginary group of species whose names

More information

Phylogenetics in the Age of Genomics: Prospects and Challenges

Phylogenetics in the Age of Genomics: Prospects and Challenges Phylogenetics in the Age of Genomics: Prospects and Challenges Antonis Rokas Department of Biological Sciences, Vanderbilt University http://as.vanderbilt.edu/rokaslab http://pubmed2wordle.appspot.com/

More information

Maximum Likelihood Estimation on Large Phylogenies and Analysis of Adaptive Evolution in Human Influenza Virus A

Maximum Likelihood Estimation on Large Phylogenies and Analysis of Adaptive Evolution in Human Influenza Virus A J Mol Evol (2000) 51:423 432 DOI: 10.1007/s002390010105 Springer-Verlag New York Inc. 2000 Maximum Likelihood Estimation on Large Phylogenies and Analysis of Adaptive Evolution in Human Influenza Virus

More information

Letter to the Editor. Temperature Hypotheses. David P. Mindell, Alec Knight,? Christine Baer,$ and Christopher J. Huddlestons

Letter to the Editor. Temperature Hypotheses. David P. Mindell, Alec Knight,? Christine Baer,$ and Christopher J. Huddlestons Letter to the Editor Slow Rates of Molecular Evolution Temperature Hypotheses in Birds and the Metabolic Rate and Body David P. Mindell, Alec Knight,? Christine Baer,$ and Christopher J. Huddlestons *Department

More information

Unit 7: Evolution Guided Reading Questions (80 pts total)

Unit 7: Evolution Guided Reading Questions (80 pts total) AP Biology Biology, Campbell and Reece, 10th Edition Adapted from chapter reading guides originally created by Lynn Miriello Name: Unit 7: Evolution Guided Reading Questions (80 pts total) Chapter 22 Descent

More information

Unit 9: Evolution Guided Reading Questions (80 pts total)

Unit 9: Evolution Guided Reading Questions (80 pts total) Name: AP Biology Biology, Campbell and Reece, 7th Edition Adapted from chapter reading guides originally created by Lynn Miriello Unit 9: Evolution Guided Reading Questions (80 pts total) Chapter 22 Descent

More information

Evolution by duplication

Evolution by duplication 6.095/6.895 - Computational Biology: Genomes, Networks, Evolution Lecture 18 Nov 10, 2005 Evolution by duplication Somewhere, something went wrong Challenges in Computational Biology 4 Genome Assembly

More information

A Statistical Test of Phylogenies Estimated from Sequence Data

A Statistical Test of Phylogenies Estimated from Sequence Data A Statistical Test of Phylogenies Estimated from Sequence Data Wen-Hsiung Li Center for Demographic and Population Genetics, University of Texas A simple approach to testing the significance of the branching

More information

Bayesian Models of Episodic Evolution Support a Late Precambrian Explosive Diversification of the Metazoa

Bayesian Models of Episodic Evolution Support a Late Precambrian Explosive Diversification of the Metazoa Bayesian Models of Episodic Evolution Support a Late Precambrian Explosive Diversification of the Metazoa Stéphane Aris-Brosou 1 and Ziheng Yang Department of Biology, University College London, England

More information

Homework Assignment, Evolutionary Systems Biology, Spring Homework Part I: Phylogenetics:

Homework Assignment, Evolutionary Systems Biology, Spring Homework Part I: Phylogenetics: Homework Assignment, Evolutionary Systems Biology, Spring 2009. Homework Part I: Phylogenetics: Introduction. The objective of this assignment is to understand the basics of phylogenetic relationships

More information

Natural selection on the molecular level

Natural selection on the molecular level Natural selection on the molecular level Fundamentals of molecular evolution How DNA and protein sequences evolve? Genetic variability in evolution } Mutations } forming novel alleles } Inversions } change

More information

Reconstructing the history of lineages

Reconstructing the history of lineages Reconstructing the history of lineages Class outline Systematics Phylogenetic systematics Phylogenetic trees and maps Class outline Definitions Systematics Phylogenetic systematics/cladistics Systematics

More information

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University Phylogenetics: Bayesian Phylogenetic Analysis COMP 571 - Spring 2015 Luay Nakhleh, Rice University Bayes Rule P(X = x Y = y) = P(X = x, Y = y) P(Y = y) = P(X = x)p(y = y X = x) P x P(X = x 0 )P(Y = y X

More information

Phylogeny and the Tree of Life

Phylogeny and the Tree of Life Chapter 26 Phylogeny and the Tree of Life PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley with contributions from

More information

Molecular dating of phylogenetic trees: A brief review of current methods that estimate divergence times

Molecular dating of phylogenetic trees: A brief review of current methods that estimate divergence times Diversity and Distributions, (Diversity Distrib.) (2006) 12, 35 48 Blackwell Publishing, Ltd. CAPE SPECIAL FEATURE Molecular dating of phylogenetic trees: A brief review of current methods that estimate

More information

Markov chain Monte-Carlo to estimate speciation and extinction rates: making use of the forest hidden behind the (phylogenetic) tree

Markov chain Monte-Carlo to estimate speciation and extinction rates: making use of the forest hidden behind the (phylogenetic) tree Markov chain Monte-Carlo to estimate speciation and extinction rates: making use of the forest hidden behind the (phylogenetic) tree Nicolas Salamin Department of Ecology and Evolution University of Lausanne

More information

molecular evolution and phylogenetics

molecular evolution and phylogenetics molecular evolution and phylogenetics Charlotte Darby Computational Genomics: Applied Comparative Genomics 2.13.18 https://www.thinglink.com/scene/762084640000311296 Internal node Root TIME Branch Leaves

More information

Chapter 22: Descent with Modification 1. BRIEFLY summarize the main points that Darwin made in The Origin of Species.

Chapter 22: Descent with Modification 1. BRIEFLY summarize the main points that Darwin made in The Origin of Species. AP Biology Chapter Packet 7- Evolution Name Chapter 22: Descent with Modification 1. BRIEFLY summarize the main points that Darwin made in The Origin of Species. 2. Define the following terms: a. Natural

More information

BINF6201/8201. Molecular phylogenetic methods

BINF6201/8201. Molecular phylogenetic methods BINF60/80 Molecular phylogenetic methods 0-7-06 Phylogenetics Ø According to the evolutionary theory, all life forms on this planet are related to one another by descent. Ø Traditionally, phylogenetics

More information

Bootstrapping and Tree reliability. Biol4230 Tues, March 13, 2018 Bill Pearson Pinn 6-057

Bootstrapping and Tree reliability. Biol4230 Tues, March 13, 2018 Bill Pearson Pinn 6-057 Bootstrapping and Tree reliability Biol4230 Tues, March 13, 2018 Bill Pearson wrp@virginia.edu 4-2818 Pinn 6-057 Rooting trees (outgroups) Bootstrapping given a set of sequences sample positions randomly,

More information

Phylogeny and the Tree of Life

Phylogeny and the Tree of Life LECTURE PRESENTATIONS For CAMPBELL BIOLOGY, NINTH EDITION Jane B. Reece, Lisa A. Urry, Michael L. Cain, Steven A. Wasserman, Peter V. Minorsky, Robert B. Jackson Chapter 26 Phylogeny and the Tree of Life

More information

AP Biology. Cladistics

AP Biology. Cladistics Cladistics Kingdom Summary Review slide Review slide Classification Old 5 Kingdom system Eukaryote Monera, Protists, Plants, Fungi, Animals New 3 Domain system reflects a greater understanding of evolution

More information

FUNDAMENTALS OF MOLECULAR EVOLUTION

FUNDAMENTALS OF MOLECULAR EVOLUTION FUNDAMENTALS OF MOLECULAR EVOLUTION Second Edition Dan Graur TELAVIV UNIVERSITY Wen-Hsiung Li UNIVERSITY OF CHICAGO SINAUER ASSOCIATES, INC., Publishers Sunderland, Massachusetts Contents Preface xiii

More information

Phylogenetic analysis. Characters

Phylogenetic analysis. Characters Typical steps: Phylogenetic analysis Selection of taxa. Selection of characters. Construction of data matrix: character coding. Estimating the best-fitting tree (model) from the data matrix: phylogenetic

More information

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5. Five Sami Khuri Department of Computer Science San José State University San José, California, USA sami.khuri@sjsu.edu v Distance Methods v Character Methods v Molecular Clock v UPGMA v Maximum Parsimony

More information

SCIENTIFIC EVIDENCE TO SUPPORT THE THEORY OF EVOLUTION. Using Anatomy, Embryology, Biochemistry, and Paleontology

SCIENTIFIC EVIDENCE TO SUPPORT THE THEORY OF EVOLUTION. Using Anatomy, Embryology, Biochemistry, and Paleontology SCIENTIFIC EVIDENCE TO SUPPORT THE THEORY OF EVOLUTION Using Anatomy, Embryology, Biochemistry, and Paleontology Scientific Fields Different fields of science have contributed evidence for the theory of

More information

Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles

Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles John Novembre and Montgomery Slatkin Supplementary Methods To

More information

Intraspecific gene genealogies: trees grafting into networks

Intraspecific gene genealogies: trees grafting into networks Intraspecific gene genealogies: trees grafting into networks by David Posada & Keith A. Crandall Kessy Abarenkov Tartu, 2004 Article describes: Population genetics principles Intraspecific genetic variation

More information