
Rooting the Ribosomal Tree of Life

Gregory P. Fournier (1) and J. Peter Gogarten* (1)
(1) Department of Molecular and Cell Biology, University of Connecticut
*Corresponding author: gogarten@uconn.edu.
Associate editor: Martin Embley

Research article

Abstract

The origin of the genetic code and the rooting of the tree of life (ToL) are two of the most challenging problems in the study of life's early evolution. Although both have been the focus of numerous investigations utilizing a variety of methods, until now, each problem has been addressed independently. Typically, attempts to root the ToL have relied on phylogenies of genes with ancient duplications, which are subject to artifacts of tree reconstruction and horizontal gene transfer, or specific physiological characters believed to be primitive, which are often based on subjective criteria. Here, we demonstrate a unique method for rooting based on the identification of amino acid usage biases comprising the residual signature of a more primitive genetic code. Using a phylogenetic tree of concatenated ribosomal proteins, our analysis of amino acid compositional bias detects a strong and unique signal associated with the early expansion of the genetic code, placing the root of the translation machinery along the bacterial branch.

Key words: tree of life, ribosome, genetic code, ancestral reconstruction.

Introduction

In the last few decades, extensive sequencing of genetic material from a broad range of organisms has permitted the construction of large detailed phylogenetic trees representing their evolutionary relationships. Of particular interest are the deeper relationships between the three domains of life: Bacteria, Archaea, and Eukarya. As no organismal outgroup exists to polarize a universal phylogeny representing all three domains, identifying alternative methods to root universal trees is critical to understand the deep evolutionary history of life on Earth.
This problem is also compounded by an increasing realization that universal trees do not necessarily reflect the organismal tree of life (ToL) due to extensive horizontal gene transfer. Many genes have distinct incongruent histories, and therefore, a single tree is insufficient to explain the full evolutionary history of life (Bapteste et al. 2009).

Many protein families contain ancestral gene duplications, where divergent paralogous copies of a specific gene existed at the time of the most recent common ancestor (MRCA). In effect, this allows each paralog to act as an outgroup for the other within a phylogenetic reconstruction, producing a reciprocal rooting of the tree (Schwartz and Dayhoff 1978). These ancestral gene duplications have been frequently used to root the ToL, an approach pioneered using protein sequences of the catalytic and noncatalytic subunits of the ATP synthase complex (Gogarten et al. 1989) and elongation factors (Iwabe et al. 1989). This approach has also been used with other gene families containing ancestral duplications, including aminoacyl transfer RNA (tRNA) synthetases (aaRS) (Brown and Doolittle 1995), an additional analysis of elongation factors (Baldauf et al. 1996), and protein-targeting machinery components (Gribaldo and Cammarano 1998). Additionally, this approach has been applied in at least one case to ancient internal gene duplications using the large subunit of carbamoyl phosphate synthetase (Schofield 1993; Lawson et al. 1996; Olendzenski and Gogarten 1998; Islas et al. 2007). These analyses generally support placing the root either on the bacterial branch or within the bacterial domain itself. In most of these analyses, the eukaryotic nucleocytoplasm groups together with archaeal homologs, and this grouping is further supported by higher order shared derived molecular characteristics (see discussion in Zhaxybayeva and Gogarten 2007).
Molecular data have provided overwhelming evidence for the endosymbiotic origin of mitochondria and plastids (Gray 1993; Margulis 1995). Following common practice, we use the term eukaryotes to denote the nucleocytoplasmic component of the eukaryotic cell. Although support for the eukaryotic nucleocytoplasm grouping with the archaea is strong, the question of monophyletic or paraphyletic archaea remains contested. Sometimes, the eukaryotic nucleocytoplasm is found to have originated from one particular group of archaea (e.g., Cox et al. 2008); however, the root of the group of Eukaryotes plus Archaea is difficult to determine, and many argue that the eukaryotic nucleocytoplasm constitutes a deeper branch and a true sister group to extant archaea (see, e.g., the discussion in Dagan and Martin 2007; Poole and Penny 2007a, 2007b), although recent analyses have also suggested an eocyte rooting and topology (Cox et al. 2008). Phylogenetic analysis including all three domains of life is further complicated by the possibility of horizontal gene transfer events (Zhaxybayeva and Gogarten 2004), as well as artifacts of tree reconstruction, especially those caused by long branches frequently associated with interdomain relationships (for review, see Zhaxybayeva et al. 2005).

The Author Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org. Mol. Biol. Evol. 27(8): doi: /molbev/msq057. Advance Access publication March 1, 2010.

Other approaches for rooting the ToL focus on specific physiological characters that are proposed to be primitive, such as aspects of membrane and cell wall structure (Cavalier-Smith 2002), split tRNA structures (Di Giulio 2007), or extensive RNA-world molecular relics (Penny and Poole 1999). Branches associated with groups showing these primitive characters are postulated to contain the root. These approaches root the tree within the bacterial or archaeal domains or on the eukaryal branch. Although logical support can be given to define any of these characters as primitive, ultimately these approaches depend on subjective and often controversial hypotheses about the physiological nature of early life. Whereas primitive characters are not useful in phylogenetic reconstruction (Hennig 1966), derived characters (duplicated genes, large insertions, or gaps) can be used to exclude the root from that part of the ToL where organisms possess the derived character state, although the root may be located on the branch where the character transitioned from primitive to derived (Gogarten et al. 1989; Zhaxybayeva et al. 2005; Skophammer et al. 2006, 2007; Lake et al. 2007; Zhaxybayeva and Gogarten 2007).

Many of the aforementioned analyses depend either directly or indirectly on specific genes that show weak conservation and/or a history of frequent interdomain horizontal gene transfer, making them poor proxies for an organismal ToL. It seems likely that the best genetic proxies for an organismal ToL are phylogenetic trees generated using the ribosomal machinery, specifically the three ribosomal RNAs (rRNAs) and about 29 universal ribosomal proteins that comprise the core ribosome. Although ribosomal protein- and RNA-encoding genes have been transferred in the past (see examples and discussion in Gogarten et al. 2002), these genes are resistant to transfer (Sorek et al. 2007), with most transfers occurring between close relatives followed by homologous recombination (e.g., Morandi et al. 2005).
These transfers across short phylogenetic distances have little effect on large-scale tree topology, especially among deep branches where the root of the ToL may reside. Additionally, the high level of structural, sequence, and functional conservation in ribosomal proteins allows for a higher confidence in tree topologies being free of artifacts, especially those producing long branches via radical functional divergence following duplication, as proposed for some deep paralogs (Philippe and Forterre 1999; Cavalier-Smith 2006). For these reasons, a ribosomal ToL has been proposed to be an ideal backbone upon which to map horizontal gene transfers, clearly depicting their distinct contribution to genomic evolution (Gogarten 1995; Dagan et al. 2008; Swithers et al. 2009). Unfortunately, because there are no confirmed paralogs for ribosomal genes, rooting the ribosomal ToL using established methods so far has not been possible. In comparison with many organismal characters used in other rooting methods, the genetic code is shared by all cellular life and likely present in its complete form at the time of the MRCA, apart from slight modifications within specific lineages (Knight et al. 2001; Miranda et al. 2006; Fournier and Gogarten 2007). Although this strongly suggests that the genetic code and its requisite translation machinery must have evolved in a preprotein (and likely RNA-based) world, such a complex system could only have evolved incrementally, likely coevolving with the emergence of encoded proteins of increasing complexity and functionality. As such, it is also likely that specific genetically encoded amino acids were incorporated into protein synthesis gradually, up until the code reached its current retinue of 20 amino acids. 
Amino acids could not be fixed at specific positions within proteins until their establishment in the code; therefore, at the time of the MRCA, the most recent amino acids would have had less time to become selected for and should have been underrepresented if there was insufficient time for equilibrium to be attained. Conversely, more ancient amino acids should have been overrepresented at this time (Fournier and Gogarten 2007). Based on this model, at least two distinct approaches have attempted to use amino acid usage at ancient positions to infer the evolutionary history of the genetic code, albeit with somewhat differing results (Brooks and Fresco 2002; Brooks et al. 2002, 2004; Fournier and Gogarten 2007). The work of Brooks et al. (2002, 2004) identifies biases in overall amino acid usage in ancestral reconstructions of a wide set of universally conserved genes using an expectation maximization methodology. In contrast, Fournier and Gogarten (2007) only count completely conserved positions within universal ribosomal proteins, a simpler yet more stringent model that seeks to avoid overinterpretation of reconstructions at variable positions at the expense of a large data set. These analyses both agree on the ancient underrepresentation of Cys, Trp, Phe, and Tyr residues. However, whereas Brooks et al. (2004) also identify Ser, Thr, Leu, Gln, and Asp as recent and Val, Ile, His, and Glu as more ancient, Fournier and Gogarten (2007) identify Ile, Val, Glu, and Lys as recent and Asn and Gly as ancient. Interestingly, a subsequent revised analysis (Fournier 2009) incorporating a 90% conservation probability cutoff for the inclusion of reconstructed positions produces results more similar to those of Brooks et al., providing statistical support for Ser and Gln as more recent additions while removing support for Ile and Val.
As these hydrophobic amino acids are extremely similar to one another, it is likely that the initial analysis underestimated their ancestral frequency due to occasional substitutions in some genomes.

In both these compositional analyses, the location of the root of the ToL is assumed to be known (as either a midpoint of the individual molecular phylogenies [Brooks et al. 2004] or the bacterial branch rooting [Fournier and Gogarten 2007]), with the ancestral genetic code impacting usages on this particular deep branch. Both these assumptions are increasingly difficult to justify, given the emerging understanding of the rate heterogeneity of genomic evolution, the frequency of gene transfer, and the occasionally conflicting rootings identified using different sets of genes (Zhaxybayeva et al. 2005; Bapteste et al. 2009). However, assuming that a strong and unique compositional bias indicates a proximity to a more primitive genetic code unique to the deepest branch on the ToL, it becomes apparent that detecting such a bias is, in fact, a method for empirically locating the root of the ribosomal ToL. In this generalized approach, positions conserved within every branch of a phylogenetic tree can be identified using ancestral sequence reconstructions (fig. 1).

FIG. 1. Model for identifying amino acid positions conserved on branches. For ancestral node reconstructions, shaded positions indicate >90% probability of amino acid identity, reflecting a high probability of fixation. The set of conserved positions shared between adjacent nodes corresponds to those conserved at a postulated root located along this particular branch. The analysis is repeated for every branch within the tree. Sequence fragments are from reconstructions of ribosomal protein L11.

Therefore, if enough sequence conservation exists within a universal protein phylogeny, the branch containing the root can be directly inferred, if positions conserved within it have retained a strong and unique signature of bias in amino acid composition independent of other physiological effects. Provided that such a unique compositional signal is detected, its compatibility with existing hypotheses regarding genetic code evolution provides additional empirical evidence for both the location of the root and the supported model(s) of code evolution. Like previous composition-based investigations into genetic code evolution, this hypothesis assumes that newer amino acids will approach an equilibrium of usage due to the effects of positive and purifying selection on protein sequences, thereby sequestering this signature to only the deepest branches of the tree.
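The branch-level conservation model of fig. 1 can be illustrated with a minimal Python sketch. It assumes each ancestral node is available as a list of per-position posterior distributions (amino acid mapped to probability); the function names, the data layout, and the toy values are hypothetical and are not taken from ANCESCON output or from the authors' code.

```python
from typing import Dict, List, Optional

# Hypothetical layout: one posterior distribution per alignment position,
# mapping each candidate amino acid to its reconstructed probability.
NodePosteriors = List[Dict[str, float]]


def conserved_states(node: NodePosteriors, cutoff: float = 0.90) -> List[Optional[str]]:
    """Return the best-supported residue per position, or None below the cutoff."""
    states: List[Optional[str]] = []
    for posterior in node:
        residue, prob = max(posterior.items(), key=lambda kv: kv[1])
        states.append(residue if prob >= cutoff else None)
    return states


def branch_conserved(node_a: NodePosteriors, node_b: NodePosteriors,
                     cutoff: float = 0.90) -> Dict[int, str]:
    """Positions conserved as the same residue in both nodes flanking a branch."""
    a = conserved_states(node_a, cutoff)
    b = conserved_states(node_b, cutoff)
    return {i: x for i, (x, y) in enumerate(zip(a, b)) if x is not None and x == y}


# Toy data (invented for illustration only).
node_a = [{"G": 0.97, "A": 0.03}, {"K": 0.60, "R": 0.40}, {"W": 0.95, "F": 0.05}]
node_b = [{"G": 0.99, "A": 0.01}, {"K": 0.95, "R": 0.05}, {"F": 0.92, "W": 0.08}]
shared = branch_conserved(node_a, node_b)  # {0: 'G'}
```

In the toy data, position 1 is dropped because one node falls below the cutoff, and position 2 is dropped because the two nodes are confidently reconstructed as different residues. Repeating the call for every pair of adjacent nodes would give one conserved set per branch.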
It is also important to note that, while relying on the primitive character of an ancestral genetic code signature, this approach is distinct from other attempts to root the tree using primitive characters, in that the root is being directly inferred along an internal branch, as opposed to indirectly inferred via the character states within various taxa at terminal positions on the tree.

Materials and Methods

Sequence Collection and Alignment

Sequences of 29 universally conserved ribosomal proteins were collected from GenBank (Benson et al. 2008) from 121 genomes, including 45 completed bacterial genomes with a wide phylogenetic distribution, 43 completed archaeal genomes, and 33 eukaryal genomes. The MUSCLE program (Edgar 2004) was used to perform a multiple sequence alignment for individual proteins within each domain. Domain alignments were then combined in ClustalW v (Thompson et al. 1994) using a profile alignment. Alignments for each individual protein were then concatenated.

Phylogeny and Ancestral Sequence Reconstruction

Maximum likelihood (ML) trees were constructed using PHYML (Guindon and Gascuel 2003) under a WAG + Γ (four rate categories, estimated α) + PINVAR model, with 100 bootstrap replicates. Ancestral sequences were reconstructed using ANCESCON (O-option and no optimization of P-vector, WAG model) (Cai et al. 2004) using an ML tree constructed from positions with at least 50% conservation (i.e., at least 50% of the extant sequences at the tip of the branches shared the same amino acid). This marginal reconstruction method does not rely on tree polarity and determines the likelihood of each amino acid at each position within ancestral nodes, given the alignment and unrooted tree topology.

Model for Identification of Conserved Positions

Positions within each node of the ancestral reconstruction were defined as conserved if their identity was reported with at least 90% probability.
This filter permits the inclusion of sites that may have infrequent substitutions between similar amino acids at terminal branches, avoiding any underrepresentation of similar frequently interchanged residues due to excessive stringency. Assuming in each case that the branch in question contains the root, positions were defined as conserved within each branch if identical positions were conserved as the same amino acid within both adjacent nodes.

Data Set Simulations

Simulated sequences (ten replicates) were generated using the EVOLVER program in PAML (Yang 2007), using the above-described observed tree topology, 4,500 positions (roughly equal to the average ribosomal protein concatenate length), α (as estimated for the actual data), under a WAG model with four rate categories. Initial amino acid compositions were set to the average observed overall ribosomal protein amino acid usages from the concatenated data sets. For each replicate, ancestral sequences were reconstructed using ANCESCON (O-option and no optimization of P-vector).

Measuring/Combining Compositional Bias

Usage bias along each branch was calculated for each amino acid as the difference between the observed and average simulated usage rate at conserved positions, measured as a relative percentage. For tests of statistical significance, this difference was measured in SD units (standard deviation of simulated data sets), followed by a two-sided Z-test. Significance was defined as a corresponding P value of less than 5% (P < 0.05). To measure combined compositional bias for each branch, usage biases (SD) for all 20 analyzed amino acids were combined by calculating a vector in 20D space. The length of this vector was then used to calculate the geometric mean distance per amino acid (SD).

Long-Branch Compositional Bias Simulation

Simulated sequences were generated using the EVOLVER program in PAML, using a simple three-branch tree topology, 4,500 positions, flat amino acid usage rates (all equal to 0.05), α, under a WAG model with four rate categories. Ten replicates were generated for each branch length at log-linear intervals ( substitutions/site). For each simulation, the usage rates of amino acids at conserved positions along one branch were calculated and then averaged across replicates.

Results and Discussion

Phylogeny of Concatenated Ribosomal Proteins

The ribosome is one of the most ancient and well-conserved structures in the biological world, with 29 core proteins universally present across all cellular life (Harris et al. 2003). The high level of sequence conservation across all three domains allows for reliable alignment of individual ribosomal protein sequences, and as there are high selective barriers to horizontal transfers of these genes (Sorek et al. 2007), they are more likely to retain a vertical signal of organismal evolution. These features make ribosomal protein sequences excellent for constructing reliable phylogenetic trees, especially for distantly related groups; however, because no paralogs of universal ribosomal proteins are currently known, phylogenies of ribosomal proteins (and ribosomal RNAs) have remained unrooted. Individual core ribosomal proteins are short in length, each containing few phylogenetically informative positions. As such, although universal phylogenies generated from alignments of individual ribosomal proteins generally do not show significant conflict, they are largely unresolved (data not shown).
Concatenation of the 29 individually aligned protein sequences produces an alignment with 9,258 positions, resulting in consistent highly resolved phylogenies using a variety of tree reconstruction methods. Consistency and resolution are further improved by only including slowly evolving positions, generating trees using an alignment consisting of 4,571 positions showing at least 50% identity within each site. The resulting ML phylogenetic tree was largely congruent with phylogenies generated using 16S rRNA (Woese and Fox 1977) in that it showed strong support for the monophyly of Archaea, Bacteria, and Eukarya, as well as the monophyly of other major groups, such as Crenarchaeota, Euryarchaeota, Proteobacteria, and Firmicutes (supplementary text file S2, Supplementary Material online). Some artifacts due to composition and long branches are also present, such as the deep placement of some protist and fungal lineages within the Eukarya and possibly the locations of the roots of the bacterial and archaeal domains. Tree reconstructions using the more sophisticated nonhomogeneous site model in PhyloBayes (Lartillot and Philippe 2004) did not remove the apparent long-branch attraction artifacts present in our phylogeny (data not shown). However, the observed phylogenetic artifacts and uncertainties should have little if any impact on our analysis, given the very short deep branches within the bacterial and archaeal domains and the absence of any hypothesis placing the root within the eukaryal domain. In addition, the impact of model misspecification on ancestral state reconstruction is greatly reduced by the limitation of our analyses to sites for which the ancestral state is determined with more than 90% posterior probability. Future refinement of this methodology may explore branch heterogeneity with respect to composition and covariation (e.g., Blanquart and Lartillot 2008), although it is unclear if such an approach would significantly impact our results.
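The 50% identity filter used to select slowly evolving positions can be sketched in Python as follows. This is only an illustration under stated assumptions: the alignment is a list of equal-length strings, gaps are written as "-", and gapped sequences still count toward the denominator. The article does not say how gaps were treated, and the function name and toy alignment are hypothetical.

```python
from collections import Counter
from typing import List


def conserved_columns(alignment: List[str], min_fraction: float = 0.5,
                      gap: str = "-") -> List[int]:
    """Indices of columns whose most common residue is shared by at least
    min_fraction of all sequences (assumption: gaps count in the denominator)."""
    n_seqs = len(alignment)
    keep: List[int] = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment if seq[col] != gap)
        if counts and counts.most_common(1)[0][1] / n_seqs >= min_fraction:
            keep.append(col)
    return keep


# Toy alignment (invented): columns 0 and 2 pass, columns 1 and 3 do not.
toy = ["MKGA", "MRGS", "MQG-", "LHGT"]
kept = conserved_columns(toy)  # [0, 2]
filtered = ["".join(seq[i] for i in kept) for seq in toy]
```

Excluding gaps from the denominator instead would retain more columns in sparsely sampled regions, so the choice affects how many positions survive the filter.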
Compositional Bias and Substitution Models

To attribute compositional bias at conserved positions to a branch-specific biological effect, the model of amino acid substitutions must be first taken into account. In any biologically relevant substitution model, similar amino acids more frequently substitute for one another, whereas some more unique amino acids undergo substitutions much less often. Based on these characteristics, conserved positions on any branch should contain an overabundance of amino acids that are relatively invariant (e.g., Cys) and an underabundance of amino acids that frequently substitute for others (e.g., Ser), relative to their overall frequencies, which remain constant across the reconstruction. Furthermore, the more evolution separates sequences (i.e., the longer the branch), the more pronounced these effects should become. Testing this assumption by simulating sequence evolution across increasingly long branches shows that this is indeed the case (fig. 2). Even before a branch length of 1 substitution/site is reached, conserved positions show a clear enrichment in Gly, Cys, Trp, and Pro. At longer branch lengths, an underabundance of Asn, Gln, Lys, and Ser also becomes apparent. Therefore, observed usages at conserved positions along branches must be compared with those detected within simulated sequence reconstructions in order to avoid attributing branch-specific biological relevance to any observed bias in these particular amino acids. Interestingly, these biases seem to reach a maximum at branch lengths of and then decrease until converging with overall amino acid usage rates at branch lengths around 50. As this is approximately the branch length at which, for this model, the number of conserved positions remaining approaches the number expected by chance between two nonrelated sequences (N/20), this likely represents the eventual removal of any detectable sequence homology due to extreme saturation with substitutions.
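The N/20 chance expectation quoted above follows from the flat amino acid usage of the long-branch simulation: two unrelated sequences match at a site with probability equal to the sum of the squared usage rates. A short Python check, assuming independent sites and two sequences drawn from the same composition (the function name is illustrative):

```python
from typing import Sequence


def expected_chance_matches(n_positions: int, frequencies: Sequence[float]) -> float:
    """Expected number of identical sites between two unrelated sequences whose
    residues are drawn independently from the same usage frequencies."""
    return n_positions * sum(p * p for p in frequencies)


# Flat usage (all 20 rates equal to 0.05) over 4,500 positions, as in the
# long-branch simulation: 4,500 / 20 = 225 matching sites expected by chance.
flat_usage = [0.05] * 20
chance_matches = expected_chance_matches(4500, flat_usage)
```

For any non-flat composition the sum of squares exceeds 1/20, so the chance expectation for real ribosomal usage rates would be somewhat higher than N/20.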
At this point, remaining matching positions are therefore merely a product of overall sequence composition, as would be the case for two random sequences.

FIG. 2. Branch length induced compositional bias at conserved positions. Bias in the usage of particular AAs increases with evolutionary distance (branch length) due to the differential substitution rates of AAs as described in biologically relevant substitution models. As labeled, AAs that tend to occupy conserved positions tend to increase in usage (Gly, Cys, Trp, and Pro), whereas AAs that frequently substitute tend to decrease (Asn, Gln, Ser, and Lys). At extreme branch lengths, this bias subsides as no informative positions remain due to saturation with substitutions. Branch lengths are represented on a log scale. AA, amino acid.

Amino Acid Usage Biases

In comparing observed conserved positions with those within reconstructions using simulated sequences, ten amino acids showed strong statistically significant biases within deep branches, with nine showing the strongest bias within the bacterial branch, that is, the branch leading to the bacterial root (table 1, figs. 3 and 4, supplementary fig. S1, Supplementary Material online). Gly, Ala, and Asn all showed a strong significant overrepresentation in the bacterial branch (35.1%, 45.6%, and 54.7%, respectively). The branch leading to the archaeal root also showed weaker yet significant overrepresentation of Ala and Gly. The only other amino acid showing significant overrepresentation on deep branches was Arg along the branches leading to the archaeal and eukaryal roots, albeit with a substantially weaker bias (13.9% and 22.2%, respectively).

Table 1. Amino Acid Usage at Conserved Positions on Deep Phylogenetic Branches.
Columns: Amino Acid; Usage within Deep Branches (Bacterial, Archaeal, Eukaryal); Global Ribosomal Usage.
Ala 0.113* 0.093*
Asp
Cys 0.005*
Glu
Phe 0.020* 0.025* 0.026*
Gly 0.185* 0.135*
His
Ile
Lys
Leu
Met
Asn 0.037*
Pro
Gln 0.011* 0.016*
Arg * 0.102*
Ser 0.024* 0.031* 0.028*
Thr
Val
Trp 0.002* 0.008* 0.009*
Tyr 0.015* *
* Amino acid usage rates significantly different from expected usage in sequence simulations (2-sided Z-test, P < 0.05, see Materials and Methods).
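The per-amino-acid bias measures used in these comparisons (relative percentage difference, difference in SD units, and a two-sided Z-test) can be sketched in Python as below. This is an illustration, not the authors' implementation: it assumes the sample standard deviation of the replicate simulations is the SD in question, and the function name and all numbers are invented for the example.

```python
import math
from statistics import mean, stdev
from typing import Dict, Sequence


def usage_bias(observed: float, simulated: Sequence[float]) -> Dict[str, float]:
    """Compare an observed usage rate with usage rates from replicate simulations."""
    expected = mean(simulated)
    sd = stdev(simulated)              # assumption: sample SD across replicates
    z = (observed - expected) / sd     # bias expressed in SD units
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {
        "relative_percent": 100.0 * (observed - expected) / expected,
        "z": z,
        "p": p_value,
    }


# Hypothetical values: ten simulated replicates and one observed usage rate.
simulated_usage = [0.09, 0.11] * 5
result = usage_bias(0.12, simulated_usage)
significant = result["p"] < 0.05
```

With only ten replicates the SD estimate is itself noisy, which is one reason rare amino acids with large relative biases can still fail the significance cutoff.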
Conversely, strong significant underrepresentation along the bacterial branch was observed for Gln (−47.6%), Phe (−44.0%), Ser (−38.2%), Trp (−85.1%), Tyr (−64.1%), and Cys (−65.4%). Weaker yet significant biases were observed along the branch leading to the archaeal root for Gln, Phe, Ser, and Trp and along the branch to the eukaryal root for Phe, Ser, Trp, and Tyr. The strong statistically significant biases observed for Asn, Cys, and Trp along the bacterial branch are especially compelling as their direction is the inverse of that expected for conserved positions within long branches (fig. 2). Therefore, these usage biases cannot be explained by an underestimation of branch lengths at deep positions within the tree. Similarly, usages of Ala, Tyr, and Phe should be immune to any such conservation bias as these are not observed to substantially fluctuate over increasing branch lengths. Because the usage biases of Gly, Gln, and Ser are of the same direction as the expected conservation biases, their significance lies in the magnitude of the effect, which potentially could be an artifact of underestimated branch lengths. However, the magnitude of the observed biases is beyond what the model can accommodate. For example, at the maximal bias observed within simulations (branch length ~20), usage of Gly at conserved positions is less than double the initial condition (8.7% vs. 5.0%) (fig. 2), a substantially smaller increase than what is observed on the much shorter branch leading to the bacterial root compared with current average ribosomal usages (18.5% vs. 7.9%) (table 1). The case is similar for the underrepresentation of Gln and Ser. Therefore, it is unlikely that the strong compositional bias observed at conserved positions on the bacterial branch is a product of especially large conservation bias along long branches. Rare amino acids such as Cys, Trp, and Met show particularly strong biases across most of the tree; however, because their usage rates are low with respect to their variance across simulated analyses, most of these biases are nonsignificant. For example, even though Met shows an overrepresentation of 40.1% along the bacterial branch, this value is not statistically significant at the cutoff level used for this analysis (P < 0.05).

Composite Analysis

Combining individual amino acid biases into an overall composite bias across the tree (fig. 3), the bacterial branch clearly shows the strongest composite bias outside of the haloarchaea, corresponding to 3.44 SD per amino acid. In comparison, the composite bias in the archaeal branch is equivalent to 1.95 SD per amino acid and in the eukaryal branch is 1.71 SD per amino acid. This verifies that the strong individual biases in amino acid usage along this branch are indicative of a unique signal and that there is no strong alternative signal consisting of a combined effect among amino acids with weaker biases, which, on their own, would not be identified as significant. In general, composite bias is greater within the Archaea than in the Bacteria and Eukarya due to the combined effects of widespread physiological factors that influence amino acid composition in different archaeal groups.

FIG. 3. Composite amino acid usage bias across universal ribosomal tree. Bias values reflect the geometric mean distances (normalized as SD) between observed and expected amino acid usages. Increased bias across the archaeal domain is largely due to widespread thermophily, halophily, and nucleotide composition bias. Aside from branches associated with haloarchaea, the branch leading to the bacteria (the bacterial root) contains the greatest bias.
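One way to read the composite measure is as the length of the vector of per-amino-acid biases (in SD units), scaled to a per-amino-acid distance. The Python sketch below assumes the scaling is a root-mean-square, that is, the vector length divided by the square root of the number of amino acids. The article does not spell out the normalization, so this is an assumption of the sketch rather than the published formula, and the input values are invented.

```python
import math
from typing import Sequence


def composite_bias(z_scores: Sequence[float]) -> float:
    """Per-amino-acid distance from the expected composition, in SD units.

    Assumption: Euclidean length of the bias vector divided by sqrt(n),
    i.e. the root-mean-square of the individual biases.
    """
    length = math.sqrt(sum(z * z for z in z_scores))
    return length / math.sqrt(len(z_scores))


# Hypothetical branch: five strongly biased amino acids, fifteen unbiased ones.
branch_biases = [4.0] * 5 + [0.0] * 15
branch_composite = composite_bias(branch_biases)  # 2.0 SD per amino acid
```

Under this reading, a few strong biases and many weak ones can give similar composite values, which is why the composite is reported alongside the individual significance tests rather than in place of them.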
Alternative Compositional Signatures

Several physiological characters can impose amino acid biases within proteins, including thermophily, halophily, and bias in genomic composition, specifically G+C content. The impact of each of these factors is well known and is reflected in the observed amino acid composition across several branches of the concatenated ribosomal tree (table 2). The overrepresentation of Asp and Glu in halophiles is an adaptation to the hypersaline ionic environment of their intracellular space (Dennis and Shimmin 1997). Thermophiles prefer charged residues over polar ones because of the thermodynamics of protein folding and also tend to favor proline, which promotes rigidity of protein structures (Watanabe et al. 1991; Fukuchi and Nishikawa 2001; Zhou et al. 2008). Finally, genomes with strong G+C biases tend to favor and avoid specific sets of amino acids based on the nucleotide content of their codons (Paila et al. 2008). Each of these physiologically imposed compositional biases is incongruent with the set of biases observed on the bacterial branch; this reinforces our interpretation that this signature is not simply a reflection of an ancestral physiological state but is an echo of a more primitive genetic code at the deepest branch of the phylogeny of the ribosome.

An alternative hypothesis to explain the previously reported (Fournier and Gogarten 2007) compositional bias within the bacterial branch is that it represents a mesophilic signature for early life, as indicated by an underrepresentation of amino acids typically favored in thermophiles (Boussau et al. 2008). Although the signature we reconstruct for the bacterial branch is inconsistent with ancient thermophily, it does not necessarily follow that this signature is caused solely by the converse physiological state, that is, nonthermophily. Because the majority of sequences used to provide the initial amino acid frequencies in the expectation simulation are from mesophiles, a branch defined solely by its mesophilic character should produce no significant signature of compositional bias at all; underrepresentation of thermophilic amino acids would therefore not be expected. Our data are therefore compatible with an underrepresentation of amino acids added late during the expansion of the genetic code together with a mesophilic MRCA located on the bacterial branch. A mesophilic MRCA alone, however, cannot explain the observed compositional bias.

FIG. 4. Amino acid usages at conserved positions along the branch leading to the bacterial root. Error bars represent SD calculated from ten replicate simulations in the case of expected values (red bars, SD corrected for low sample size) and from 29 universal ribosomal proteins in the case of observed usages in extant ribosomal protein sequence (green bars).

Comparison with Other Models for the Early Expansion of the Genetic Code

The strong unique biases in amino acid composition at conserved positions along the bacterial branch cannot be explained by the effects of substitution models within long branches, nucleotide composition, or the compositional impacts of any known physiological conditions. Additionally, the specific amino acid biases detected are similar to those predicted in several models of genetic code evolution, suggesting that the bacterial branch is closest to a more primitive state of the genetic code and therefore contains the root of the translational machinery.

Table 2. Amino Acid Bias Signatures within the Universal Ribosomal Tree.

Signature        Overrepresentation         Underrepresentation
Thermophiles     Glu, Pro, Trp              Ser, Gln
Halophiles       Asp, Glu                   Ile, Lys, Tyr, Cys
Low G+C          Ile, Lys, Ser, Tyr, Asn    Pro, Glu
High G+C         Arg                        Ile, Asn
Deep branches    Ala, Gly, Asn              Phe, Gln, Ser, Trp, Tyr, Cys

The set of amino acids showing significant bias along the bacterial branch is similar to those previously detected by a related but less sophisticated analysis (Fournier and Gogarten 2007). However, the incorporation of phylogeny, ancestral sequence reconstruction, substitution biases, and a probabilistic definition of conserved positions greatly increases the sample size and utility of the method presented, correcting artifacts in the previous analysis, such as the previously discussed apparent underrepresentation of Ile, Val, and Lys.

The absence of a significant bias on this deep branch for some amino acids (Val, Ile, Leu, Asp, Pro, Thr, Lys, and Arg) can be explained in two ways. First, an amino acid may belong to the intermediate set of additions to the code: added early enough for its usage to reach equilibrium before the MRCA, but not so early that an initial overwhelming excess existed. Second, some amino acids within this set may be as ancient as those showing a significant overrepresentation; however, their initial excess may have rapidly disappeared with the addition of other, similar amino acids that provided a slight selective advantage in many positions, effectively partitioning away their excess. In some cases, this would also diminish the bias signal of the underrepresented new amino acid, as sites would have been preselected for its invasion. This preselection effect could explain the flat usage levels of Val, Leu, and Ile, making it impossible to determine from this analysis in what order they were added to the code.
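As a rough illustration of the comparison summarized in table 2, the following Python sketch counts, for each physiological signature, how many amino acids are biased in the same direction as on the deep branches and how many are biased in the opposite direction. The amino acid sets are transcribed from table 2; the helper name `compare_signatures` and the agreement/conflict tally are assumptions made for this example and are not the statistical procedure used in the analysis.

```python
# Signatures transcribed from table 2: (overrepresented, underrepresented).
SIGNATURES = {
    "Thermophiles": ({"Glu", "Pro", "Trp"}, {"Ser", "Gln"}),
    "Halophiles": ({"Asp", "Glu"}, {"Ile", "Lys", "Tyr", "Cys"}),
    "Low G+C": ({"Ile", "Lys", "Ser", "Tyr", "Asn"}, {"Pro", "Glu"}),
    "High G+C": ({"Arg"}, {"Ile", "Asn"}),
}
DEEP_BRANCHES = ({"Ala", "Gly", "Asn"}, {"Phe", "Gln", "Ser", "Trp", "Tyr", "Cys"})


def compare_signatures(candidate, reference):
    """Hypothetical helper: tally same-direction and opposite-direction biases."""
    cand_over, cand_under = candidate
    ref_over, ref_under = reference
    agree = (cand_over & ref_over) | (cand_under & ref_under)
    conflict = (cand_over & ref_under) | (cand_under & ref_over)
    # Reference biases that the candidate does not reproduce in the same direction.
    unexplained = (ref_over | ref_under) - agree
    return sorted(agree), sorted(conflict), sorted(unexplained)


if __name__ == "__main__":
    for name, signature in SIGNATURES.items():
        agree, conflict, unexplained = compare_signatures(signature, DEEP_BRANCHES)
        print(f"{name}: agree={agree} conflict={conflict} unexplained={unexplained}")
```

Under this simple tally, no physiological signature reproduces more than two of the nine deep-branch biases, and three of the four conflict with the deep-branch signature for at least one amino acid.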
The same preselection effect may also apply to Arg and Lys. However, because Thr and Asp both have similar alternatives that are underrepresented on the bacterial branch (Ser and Glu, respectively), there is stronger evidence that these two are indeed more ancient amino acids. Because Pro is a functionally and structurally unique amino acid that should not be susceptible to competition, it seems most likely that it was actually added at an intermediate stage.

Additional inferences about code evolution can be made by comparing the signatures of sets of metabolically related amino acids with similar physicochemical properties and assuming that these were added as a set. For example, Gln is physicochemically analogous to Asn and shows strong underrepresentation on the bacterial branch, indicating that it was not a specifically encoded amino acid in an earlier version of the genetic code. Although its underrepresentation on the bacterial branch is not statistically significant at the cutoff used in this analysis, Glu does show a trend congruent with the metabolically related Glu-Gln amino acid pair being a more recent addition to the code. Likewise, the similar Asp-Asn pair is more likely to be ancient because of the detected overrepresentation of Asn along the same deep branch. Based on this inference, the code in a more primitive state would still have covered a similar diversity of physicochemical properties, permitting the synthesis of proteins containing hydroxyl, amino, hydrophobic, and both positively and negatively charged side chains. This suggests that after its earliest stages, genetic code evolution may have promoted continued optimization of existing protein folds rather than being responsible for a radical expansion of protein diversity and functionality. The only exceptions seem to be unique specialized amino acids such as those with aromatic structure (Tyr, Trp, and Phe) and Cys. Aromatic residues may have been advantageous for the hydrophobic core packing needed as proteins became larger, whereas Cys would have been advantageous as more opportunities developed for complex linkages between proteins, lipids, and other substrates.
Alternatively, aromatic amino acids could have been selected for as RNA replacements in a dying RNA world, mimicking nucleotides in important structural roles (e.g., stacking interactions). The unique metal-binding properties of Cys may also have played a role in its selection, as this faculty may have been deficient in an RNA-based physiology.

The specific amino acid biases identified on the bacterial branch are consistent with several generally accepted principles regarding the evolution of the genetic code, specifically a preference for earlier amino acids to be simple and later ones to be more complex (Trifonov 2000). This is especially apparent in the identification of Trp and Tyr as more recent amino acids, as both are products of complex metabolic pathways (Klipcan and Safro 2004). These results are also in agreement with the hypothesis that class II aaRSs are generally associated with earlier amino acids, with class I aaRSs evolving later (Hartman 1995). In fact, every amino acid identified as early in this analysis has a cognate class II aaRS, whereas most of the amino acids identified as late have a cognate class I aaRS. Second-order inferences regarding amino acids without significant bias also favor this model, as Asp and Thr (inferred to be earlier) are both associated with class II aaRSs, whereas Glu (inferred to be more recent) is associated with class I. The exceptions among the late amino acids are Phe and Ser; interestingly, both are associated with unusual class II aaRSs. PheRS has a unique heterotetrameric structure (Mosyak et al. 1995) and undergoes a class I-like aminoacylation reaction (Sprinzl and Cramer 1975). SerRS does not recognize the anticodon of its cognate tRNAs (as its uniquely disparate set of six codons makes this impossible), instead relying on a long variable arm for interaction specificity (Asahara et al. 1994). This may suggest that in each case, these aaRSs are more derived and among the more recent class II variants to evolve.
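The class assignments discussed above can be tallied in a few lines of Python. The early and late sets follow the significant biases reported for the bacterial branch; the class table uses the standard assignment of the 20 canonical aminoacyl-tRNA synthetases, with LysRS listed as class II although a class I form occurs in some lineages. The names `AARS_CLASS` and `tally_classes` are illustrative assumptions, not part of the original analysis.

```python
# Standard class assignment of the 20 canonical aaRSs (assumption: LysRS is
# listed as class II, its most common form).
AARS_CLASS = {
    **dict.fromkeys(
        ["Arg", "Cys", "Gln", "Glu", "Ile", "Leu", "Met", "Trp", "Tyr", "Val"], "I"
    ),
    **dict.fromkeys(
        ["Ala", "Asn", "Asp", "Gly", "His", "Lys", "Phe", "Pro", "Ser", "Thr"], "II"
    ),
}

# Amino acids with significant bias on the bacterial branch.
EARLY = ["Ala", "Gly", "Asn"]                      # overrepresented
LATE = ["Phe", "Gln", "Ser", "Trp", "Tyr", "Cys"]  # underrepresented


def tally_classes(amino_acids):
    """Hypothetical helper: group amino acids by the class of their cognate aaRS."""
    groups = {"I": [], "II": []}
    for aa in amino_acids:
        groups[AARS_CLASS[aa]].append(aa)
    return groups


if __name__ == "__main__":
    print("early:", tally_classes(EARLY))
    print("late: ", tally_classes(LATE))
```

In this tally all three early amino acids fall in class II, and the only late amino acids in class II are Phe and Ser, the two exceptions noted in the text.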
The identification of Ser as a late amino acid is especially interesting because this amino acid is generally considered to be among the most ancient based on biochemical, metabolic, and codon table criteria (Trifonov 2000). The biochemical and metabolic support for Ser being a primordial amino acid may simply indicate its likely participation in primordial intermediate metabolism (predating its incorporation into the genetic code), but this type of speculation requires evidence beyond the scope of this investigation. There is at least one model of genetic code evolution that predicts Thr preceding Ser (Higgs 2009). This model reconstructs the organization and evolution of the genetic code using a selection-based approach, evaluating the fitness effect of code expansion through the addition of new amino acids by stepwise codon space partitioning. The amino acids predicted to be the earliest in this model (Gly, Ala, and Asp) are also in agreement with the presented analysis, along with Val, on which our analysis remains agnostic as previously described. In further agreement, Pro is predicted to be added at an intermediate stage and Cys, Trp, Tyr, and Phe at a later stage. The only amino acids showing conflict between these two analyses are Glu and Asn. Higgs (2009) places Glu among the earlier amino acids to be added (fifth, in the second stage of code evolution), as opposed to among the most recent. Conversely, Higgs (2009) places Asn as an intermediate-late addition, as opposed to an early one. In both cases, these differences are likely related to the selection-based model favoring the physicochemical (ionic) similarity of the Asp-Glu pair over the metabolic/chemical similarity of the Glu-Gln and Asp-Asn pairs.
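The agreement with the Higgs (2009) model described above can be laid out as a simple lookup. In this sketch, the stage labels ("early", "intermediate", "late") are a coarse simplification introduced for the example: only amino acids whose placement in both analyses is stated in the text are included, Glu is labeled late here on the basis of its nonsignificant trend, and Val is omitted because the present analysis is agnostic about it. The dictionary and function names are hypothetical.

```python
# Stage labels inferred in this analysis (Asp by second-order inference,
# Glu from a trend that is not significant at the cutoff used).
THIS_ANALYSIS = {
    "Gly": "early", "Ala": "early", "Asp": "early", "Asn": "early",
    "Pro": "intermediate",
    "Cys": "late", "Trp": "late", "Tyr": "late", "Phe": "late", "Glu": "late",
}

# Placements attributed to Higgs (2009) in the text.
HIGGS_2009 = {
    "Gly": "early", "Ala": "early", "Asp": "early", "Glu": "early",
    "Pro": "intermediate",
    "Asn": "intermediate-late",
    "Cys": "late", "Trp": "late", "Tyr": "late", "Phe": "late",
}


def conflicting_placements(model_a, model_b):
    """Hypothetical helper: amino acids placed at different stages by two models."""
    shared = model_a.keys() & model_b.keys()
    return sorted(aa for aa in shared if model_a[aa] != model_b[aa])


if __name__ == "__main__":
    print(conflicting_placements(THIS_ANALYSIS, HIGGS_2009))
```

With these labels the only conflicting placements are Asn and Glu, matching the two disagreements identified in the text.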
Conclusions

Rooting the tree of the translation machinery using the polarizing ancestral state of its own functional semantics (i.e., the genetic code) is a novel approach that avoids many of the shortcomings of other rooting methods while making use of a large set of proteins that provides reliable phylogenetic reconstruction. The results of this method locate the ribosomal MRCA on the branch leading to the bacterial domain, in agreement with previous analyses utilizing sets of genes that underwent ancestral duplications, including those also related to protein synthesis. As ribosomal proteins are more likely to show a consistent phylogenetic signal indicative of vertical inheritance, the rooting of the ribosomal ToL provides a polarized phylogenetic scaffold upon which the complex genetic reticulations of evolutionary history can be mapped and better understood. Additionally, this method identifies a pattern
of amino acid usage biases in general agreement with current models of genetic code evolution.

Supplementary Material

Supplementary figure S1 and supplementary text file S2 are available at Molecular Biology and Evolution online.

Acknowledgments

We thank Tim Harlow for writing the program for parsing ANCESCON results; Pascal Lapierre for writing the program to identify conserved positions within multiple sequence alignments; and Kristen Swithers, Nicolas Galtier, and two anonymous reviewers for comments and discussions. This work was supported through grants from the NASA Exobiology (NNX07AK15G) and National Science Foundation Assembling the Tree of Life (DEB ) programs.

References

Asahara H, Himeno H, Tamura K, Nameki N, Hasegawa T, Shimizu M. Escherichia coli seryl-tRNA synthetase recognizes tRNA(Ser) by its characteristic tertiary structure. J Mol Biol. 236:
Baldauf SL, Palmer JD, Doolittle WF. The root of the universal tree and the origin of eukaryotes based on elongation factor phylogeny. Proc Natl Acad Sci U S A. 93:
Bapteste E, O'Malley MA, Beiko RG, et al. (11 co-authors). Prokaryotic evolution and the tree of life are two different things. Biol Direct. 4:34.
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 37:
Blanquart S, Lartillot N. A site- and time-heterogeneous model of amino-acid replacement. Mol Biol Evol. 25:
Boussau B, Blanquart S, Necsulea A, Lartillot N, Gouy M. Parallel adaptations to high temperatures in the Archaean eon. Nature 456:
Brooks DJ, Fresco JR. Increased frequency of cysteine, tyrosine, and phenylalanine residues since the last universal ancestor. Mol Cell Proteomics. 1:
Brooks DJ, Fresco JR, Lesk AM, Singh M. Evolution of amino acid frequencies in proteins over deep time: inferred order of introduction of amino acids into the genetic code. Mol Biol Evol. 19:
Brooks DJ, Fresco JR, Singh M. A novel method for estimating ancestral amino acid composition and its application to proteins of the Last Universal Ancestor. Bioinformatics 20:
Brown J, Doolittle W. Root of the universal tree of life based on ancient aminoacyl-tRNA synthetase gene duplications. Proc Natl Acad Sci U S A. 92:
Cai W, Pei J, Grishin NV. Reconstruction of ancestral protein sequences and its applications. BMC Evol Biol. 4:33.
Cavalier-Smith T. The neomuran origin of archaebacteria, the negibacterial root of the universal tree and bacterial megaclassification. Int J Syst Evol Microbiol. 52:7-76.
Cavalier-Smith T. Rooting the tree of life by transition analyses. Biol Direct. 1:19.
Cox CJ, Foster PG, Hirt RP, Harris SR, Embley TM. The archaebacterial origin of eukaryotes. Proc Natl Acad Sci U S A. 105:
Dagan T, Artzy-Randrup Y, Martin W. Modular networks and cumulative impact of lateral transfer in prokaryote genome evolution. Proc Natl Acad Sci U S A. 105:
Dagan T, Martin W. Testing hypotheses without considering predictions. Bioessays 29:
Dennis PP, Shimmin LC. Evolutionary divergence and salinity-mediated selection in halophilic archaea. Microbiol Mol Biol Rev. 61:
Di Giulio M. The tree of life might be rooted in the branch leading to Nanoarchaeota. Gene 401:
Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32:
Fournier GP. Genetic code evolution and amino acid composition analysis [PhD thesis]. [Storrs (CT)]: Department of Molecular and Cell Biology, University of Connecticut.
Fournier GP, Gogarten JP. Signature of a primitive genetic code in ancient protein lineages. J Mol Evol. 65:
Fukuchi S, Nishikawa K. Protein surface amino acid compositions distinctively differ between thermophilic and mesophilic bacteria. J Mol Biol. 309:
Gogarten JP. The early evolution of cellular life. Trends Ecol Evol. 10:
Gogarten JP, Doolittle WF, Lawrence JG. Prokaryotic evolution in light of gene transfer. Mol Biol Evol. 19:
Gogarten JP, Kibak H, Dittrich P, et al. (13 co-authors). Evolution of the vacuolar H+-ATPase: implications for the origin of eukaryotes. Proc Natl Acad Sci U S A. 86:
Gray MW. Origin and evolution of organelle genomes. Curr Opin Genet Dev. 3:
Gribaldo S, Cammarano P. The root of the universal tree of life inferred from anciently duplicated genes encoding components of the protein-targeting machinery. J Mol Evol. 47:
Guindon S, Gascuel O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 52:
Harris JK, Kelley ST, Spiegelman GB, Pace NR. The genetic core of the universal ancestor. Genome Res. 13:
Hartman H. Speculations on the evolution of the genetic code IV: the evolution of the aminoacyl-tRNA synthetases. Orig Life Evol Biosph. 25:
Hennig W. Phylogenetic systematics. Urbana (IL): University of Illinois Press.
Higgs PG. A four-column theory for the origin of the genetic code: tracing the evolutionary pathways that gave rise to an optimized code. Biol Direct. 4:16.
Islas S, Hernández-Morales R, Lazcano A. Question 7: comparative genomics and early cell evolution: a cautionary methodological note. Orig Life Evol Biosph. 37:
Iwabe N, Kuma K, Hasegawa M, Osawa S, Miyata T. Evolutionary relationship of archaebacteria, eubacteria, and eukaryotes inferred from phylogenetic trees of duplicated genes. Proc Natl Acad Sci U S A. 86:
Klipcan L, Safro M. Amino acid biogenesis, evolution of the genetic code and aminoacyl-tRNA synthetases. J Theor Biol. 228:
Knight RD, Freeland SJ, Landweber LF. Rewiring the keyboard: evolvability of the genetic code. Nat Rev Genet. 2:
Lake JA, Herbold CW, Rivera MC, Servin JA, Skophammer RG. Rooting the tree of life using nonubiquitous genes. Mol Biol Evol. 24:
Lartillot N, Philippe H. Bayesian phylogenetic software based on mixture models. Mol Biol Evol. 21:
Lawson F, Charlebois R, Dillon J. Phylogenetic analysis of carbamoylphosphate synthetase genes: complex evolutionary history includes an internal duplication within a gene which can root the tree of life. Mol Biol Evol. 13:
Margulis L. Symbiosis in cell evolution: microbial communities in the archean and proterozoic eons. New York: W H Freeman & Co.
Miranda I, Silva R, Santos MA. Evolution of the genetic code in yeasts. Yeast 23:
Morandi A, Zhaxybayeva O, Gogarten JP, Graf J. Evolutionary and diagnostic implications of intragenomic heterogeneity in the 16S rRNA gene in Aeromonas strains. J Bacteriol. 187:
Mosyak L, Reshetnikova L, Goldgur Y, Delarue M, Safro MG. Structure of phenylalanyl-tRNA synthetase from Thermus thermophilus. Nat Struct Biol. 2:
Olendzenski L, Gogarten JP. Deciphering the molecular record for the early evolution of life: gene duplication and horizontal gene transfer. In: Wiegel J, Adams MWW, editors. Thermophiles: the keys to molecular evolution and the origin of life? Philadelphia (PA): Taylor & Francis Inc. p
Paila U, Kondam R, Ranjan A. Genome bias influences amino acid choices: analysis of amino acid substitution and recompilation of substitution matrices exclusive to an AT-biased genome. Nucleic Acids Res. 36:
Penny D, Poole A. The nature of the last universal common ancestor. Curr Opin Genet Dev. 9:
Philippe H, Forterre P. The rooting of the universal tree of life is not reliable. J Mol Evol. 49:
Poole AM, Penny D. 2007a. Evaluating hypotheses for the origin of eukaryotes. Bioessays 29:
Poole AM, Penny D. 2007b. Response to Dagan and Martin. Bioessays 29:
Schofield JP. Molecular studies on an ancient gene encoding for carbamoyl-phosphate synthetase. Clin Sci (Lond). 84:
Schwartz RM, Dayhoff MO. Origins of prokaryotes, eukaryotes, mitochondria, and chloroplasts. Science 199:
Skophammer RG, Herbold CW, Rivera MC, Servin JA, Lake JA. Evidence that the root of the tree of life is not within the Archaea. Mol Biol Evol. 23:
Skophammer RG, Servin JA, Herbold CW, Lake JA. Evidence for a gram-positive, eubacterial root of the tree of life. Mol Biol Evol. 24:
Sorek R, Zhu Y, Creevey CJ, Francino MP, Bork P, Rubin EM. Genome-wide experimental determination of barriers to horizontal gene transfer. Science 318:
Sprinzl M, Cramer F. Site of aminoacylation of tRNAs from Escherichia coli with respect to the 2'- or 3'-hydroxyl group of the terminal adenosine. Proc Natl Acad Sci U S A. 72:
Swithers KS, Gogarten JP, Fournier GP. Trees in the web of life. J Biol. 8:54.
Thompson J, Higgins D, Gibson T. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:
Trifonov EN. Consensus temporal order of amino acids and evolution of the triplet code. Gene 261:
Watanabe K, Chishiro K, Kitamura K, Suzuki Y. Proline residues responsible for thermostability occur with high frequency in the loop regions of an extremely thermostable oligo-1,6-glucosidase from Bacillus thermoglucosidasius KP1006. J Biol Chem. 266:
Woese CR, Fox GE. Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proc Natl Acad Sci U S A. 74:
Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 24:
Zhaxybayeva O, Gogarten JP. Cladogenesis, coalescence and the evolution of the three domains of life. Trends Genet. 20:
Zhaxybayeva O, Gogarten JP. Horizontal gene transfer, gene histories and the root of the tree of life. In: Pudritz RE, Higgs PG, Stone J, editors. Astrobiology and the origins of life. Cambridge: Cambridge University Press p
Zhaxybayeva O, Lapierre P, Gogarten JP. Ancient gene duplications and the root(s) of the tree of life. Protoplasma 227:
Zhou XX, Wang YB, Pan YJ, Li WF. Differences in amino acids composition and coupling patterns between mesophilic and thermophilic proteins. Amino Acids. 34:

Unsupervised Learning in Spectral Genome Analysis

Unsupervised Learning in Spectral Genome Analysis Unsupervised Learning in Spectral Genome Analysis Lutz Hamel 1, Neha Nahar 1, Maria S. Poptsova 2, Olga Zhaxybayeva 3, J. Peter Gogarten 2 1 Department of Computer Sciences and Statistics, University of

More information

Interpreting the Molecular Tree of Life: What Happened in Early Evolution? Norm Pace MCD Biology University of Colorado-Boulder

Interpreting the Molecular Tree of Life: What Happened in Early Evolution? Norm Pace MCD Biology University of Colorado-Boulder Interpreting the Molecular Tree of Life: What Happened in Early Evolution? Norm Pace MCD Biology University of Colorado-Boulder nrpace@colorado.edu Outline What is the Tree of Life? -- Historical Conceptually

More information

8/23/2014. Phylogeny and the Tree of Life

8/23/2014. Phylogeny and the Tree of Life Phylogeny and the Tree of Life Chapter 26 Objectives Explain the following characteristics of the Linnaean system of classification: a. binomial nomenclature b. hierarchical classification List the major

More information

Microbial Taxonomy and the Evolution of Diversity

Microbial Taxonomy and the Evolution of Diversity 19 Microbial Taxonomy and the Evolution of Diversity Copyright McGraw-Hill Global Education Holdings, LLC. Permission required for reproduction or display. 1 Taxonomy Introduction to Microbial Taxonomy

More information

The universal ancestor was a thermophile or a hyperthermophile

The universal ancestor was a thermophile or a hyperthermophile Gene 281 (2001) 11 17 www.elsevier.com/locate/gene The universal ancestor was a thermophile or a hyperthermophile Massimo Di Giulio* International Institute of Genetics and Biophysics, CNR, Via G. Marconi

More information

C3020 Molecular Evolution. Exercises #3: Phylogenetics

C3020 Molecular Evolution. Exercises #3: Phylogenetics C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from

More information

Dynamic optimisation identifies optimal programs for pathway regulation in prokaryotes. - Supplementary Information -

Dynamic optimisation identifies optimal programs for pathway regulation in prokaryotes. - Supplementary Information - Dynamic optimisation identifies optimal programs for pathway regulation in prokaryotes - Supplementary Information - Martin Bartl a, Martin Kötzing a,b, Stefan Schuster c, Pu Li a, Christoph Kaleta b a

More information

Introduction to the Ribosome Overview of protein synthesis on the ribosome Prof. Anders Liljas

Introduction to the Ribosome Overview of protein synthesis on the ribosome Prof. Anders Liljas Introduction to the Ribosome Molecular Biophysics Lund University 1 A B C D E F G H I J Genome Protein aa1 aa2 aa3 aa4 aa5 aa6 aa7 aa10 aa9 aa8 aa11 aa12 aa13 a a 14 How is a polypeptide synthesized? 2

More information

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Bio 1B Lecture Outline (please print and bring along) Fall, 2007 Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution

More information

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships Chapter 26: Phylogeny and the Tree of Life You Must Know The taxonomic categories and how they indicate relatedness. How systematics is used to develop phylogenetic trees. How to construct a phylogenetic

More information

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9 Lecture 5 Alignment I. Introduction. For sequence data, the process of generating an alignment establishes positional homologies; that is, alignment provides the identification of homologous phylogenetic

More information

The indefinable term prokaryote and the polyphyletic origin of genes MASSIMO DI GIULIO

The indefinable term prokaryote and the polyphyletic origin of genes MASSIMO DI GIULIO HYPOTHESIS The indefinable term prokaryote and the polyphyletic origin of genes MASSIMO DI GIULIO Early Evolution of Life Laboratory, Institute of Biosciences and Bioresources, CNR, Via P. Castellino,

More information

Ancestral Reconstruction of a Pre-LUCA Aminoacyl-tRNA Synthetase Ancestor Supports the Late Addition of Trp to the Genetic Code

Ancestral Reconstruction of a Pre-LUCA Aminoacyl-tRNA Synthetase Ancestor Supports the Late Addition of Trp to the Genetic Code Ancestral Reconstruction of a Pre-LUCA Aminoacyl-tRNA Synthetase Ancestor Supports the Late Addition of Trp to the Genetic Code The MIT Faculty has made this article openly available. Please share how

More information

Sequence Based Bioinformatics

Sequence Based Bioinformatics Structural and Functional Analysis of Inosine Monophosphate Dehydrogenase using Sequence-Based Bioinformatics Barry Sexton 1,2 and Troy Wymore 3 1 Bioengineering and Bioinformatics Summer Institute, Department

More information

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition David D. Pollock* and William J. Bruno* *Theoretical Biology and Biophysics, Los Alamos National

More information

2 Genome evolution: gene fusion versus gene fission

2 Genome evolution: gene fusion versus gene fission 2 Genome evolution: gene fusion versus gene fission Berend Snel, Peer Bork and Martijn A. Huynen Trends in Genetics 16 (2000) 9-11 13 Chapter 2 Introduction With the advent of complete genome sequencing,

More information

The indefinable term prokaryote and the polyphyletic origin of genes

The indefinable term prokaryote and the polyphyletic origin of genes Indian Academy of Sciences HYPOTHESIS The indefinable term prokaryote and the polyphyletic origin of genes MASSIMO DI GIULIO Early Evolution of Life Laboratory, Institute of Biosciences and Bioresources,

More information

Chapter 19. Microbial Taxonomy

Chapter 19. Microbial Taxonomy Chapter 19 Microbial Taxonomy 12-17-2008 Taxonomy science of biological classification consists of three separate but interrelated parts classification arrangement of organisms into groups (taxa; s.,taxon)

More information

Rooting Major Cellular Radiations using Statistical Phylogenetics

Rooting Major Cellular Radiations using Statistical Phylogenetics Rooting Major Cellular Radiations using Statistical Phylogenetics Svetlana Cherlin Thesis submitted for the degree of Doctor of Philosophy School of Mathematics & Statistics Institute for Cell & Molecular

More information

Supplementary Materials for

Supplementary Materials for advances.sciencemag.org/cgi/content/full/1/8/e1500527/dc1 Supplementary Materials for A phylogenomic data-driven exploration of viral origins and evolution The PDF file includes: Arshan Nasir and Gustavo

More information

Introduction to Evolutionary Concepts

Introduction to Evolutionary Concepts Introduction to Evolutionary Concepts and VMD/MultiSeq - Part I Zaida (Zan) Luthey-Schulten Dept. Chemistry, Beckman Institute, Biophysics, Institute of Genomics Biology, & Physics NIH Workshop 2009 VMD/MultiSeq

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

Chapter Chemical Uniqueness 1/23/2009. The Uses of Principles. Zoology: the Study of Animal Life. Fig. 1.1

Chapter Chemical Uniqueness 1/23/2009. The Uses of Principles. Zoology: the Study of Animal Life. Fig. 1.1 Fig. 1.1 Chapter 1 Life: Biological Principles and the Science of Zoology BIO 2402 General Zoology Copyright The McGraw Hill Companies, Inc. Permission required for reproduction or display. The Uses of

More information

Microbial Diversity and Assessment (II) Spring, 2007 Guangyi Wang, Ph.D. POST103B

Microbial Diversity and Assessment (II) Spring, 2007 Guangyi Wang, Ph.D. POST103B Microbial Diversity and Assessment (II) Spring, 007 Guangyi Wang, Ph.D. POST03B guangyi@hawaii.edu http://www.soest.hawaii.edu/marinefungi/ocn403webpage.htm General introduction and overview Taxonomy [Greek

More information

Comparing Prokaryotic and Eukaryotic Cells

Comparing Prokaryotic and Eukaryotic Cells A prokaryotic cell Basic unit of living organisms is the cell; the smallest unit capable of life. Features found in all cells: Ribosomes Cell Membrane Genetic Material Cytoplasm ATP Energy External Stimuli

More information

Concepts and Methods in Molecular Divergence Time Estimation

Concepts and Methods in Molecular Divergence Time Estimation Concepts and Methods in Molecular Divergence Time Estimation 26 November 2012 Prashant P. Sharma American Museum of Natural History Overview 1. Why do we date trees? 2. The molecular clock 3. Local clocks

More information

Translation. A ribosome, mrna, and trna.

Translation. A ribosome, mrna, and trna. Translation The basic processes of translation are conserved among prokaryotes and eukaryotes. Prokaryotic Translation A ribosome, mrna, and trna. In the initiation of translation in prokaryotes, the Shine-Dalgarno

More information

Microbial Taxonomy. Microbes usually have few distinguishing properties that relate them, so a hierarchical taxonomy mainly has not been possible.

Microbial Taxonomy. Microbes usually have few distinguishing properties that relate them, so a hierarchical taxonomy mainly has not been possible. Microbial Taxonomy Traditional taxonomy or the classification through identification and nomenclature of microbes, both "prokaryote" and eukaryote, has been in a mess we were stuck with it for traditional

More information

Niche specific amino acid features within the core genes of the genus Shewanella

Niche specific amino acid features within the core genes of the genus Shewanella www.bioinformation.net Hypothesis Volume 8(19) Niche specific amino acid features within the core genes of the genus Shewanella Rachana Banerjee* & Subhasis Mukhopadhyay Department of Biophysics, Molecular

More information

This is a repository copy of Microbiology: Mind the gaps in cellular evolution.

This is a repository copy of Microbiology: Mind the gaps in cellular evolution. This is a repository copy of Microbiology: Mind the gaps in cellular evolution. White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/114978/ Version: Accepted Version Article:

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut
