Gene Family Content-Based Phylogeny of Prokaryotes: The Effect of Criteria for Inferring Homology

Syst. Biol. 54(2), 2005. Copyright © Society of Systematic Biologists.

AUSTIN L. HUGHES,¹ VIKRAM EKOLLU,² ROBERT FRIEDMAN,¹ AND JOHN R. ROSE²
¹Department of Biological Sciences and ²Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina 29205, USA; austin@biol.sc.edu (A.L.H.)

Abstract. A number of recent papers have suggested that gene family content can be used to resolve phylogenies, particularly in the case of prokaryotes, in which extensive horizontal gene transfer means that individual gene phylogenies may not mirror the organismal phylogeny. However, no study has yet examined how sensitive such analyses are to the criterion of homology assessment used to assemble multigene families. Using data from 99 completely sequenced prokaryotic genomes, we examined the effect of homology criteria in phylogenetic analyses wherein presence or absence of each family in the genome was used as a cladistic character. Different criteria resulted in evidence for contradictory tree topologies, sometimes with high bootstrap support. A moderately strict criterion seemed best for assembling multigene families in a biologically meaningful way, but it was not necessarily preferable for phylogenetic analysis. Instead, a very strict criterion, which broke up gene families into smaller subfamilies, seemed to have advantages for phylogenetic purposes. The poor performance of gene family content-based phylogenetic analysis in the case of prokaryotes appears to reflect high levels of homoplasy resulting not only from horizontal gene transfer but also, more importantly, from extensive parallel loss of gene families in certain bacterial genomes. [Gene content; gene families; gene loss; horizontal gene transfer; phylogenetic methods.]
The availability of a large number of complete sequences of prokaryotic genomes holds promise for further resolving the phylogenetic relationships among major prokaryotic groups. However, there is evidence that horizontal gene transfer (HGT) may have been a frequent occurrence in prokaryotic evolution, which would imply that the phylogeny of individual genes may not reflect the organismal phylogeny (Daubin et al., 2003; Kunin and Ouzounis, 2003; Lerat et al., 2003; Mirkin et al., 2003; Wolf et al., 2002). For this reason, a number of investigators have advocated approaches to prokaryotic phylogeny based on so-called gene content (Snel et al., 1999), that is, the presence or absence of gene families in genomes, which might be more properly called gene family content. Often, gene family content analyses have made use of various distances based on the proportion of shared gene families (Snel et al., 1999). Because of the ad hoc character of such distances, some authors (e.g., Gu, 2000; Huson and Steel, 2004) have proposed a maximum likelihood approach to this question. However, because developing a biologically accurate model of gene family gain and loss is problematic, a number of authors have applied parsimony to gene family content analyses (Wolf et al., 2001, 2004; Hughes and Friedman, 2004). Whatever method of phylogenetic reconstruction is used, any analysis based on gene family content faces a problem in defining gene families. When families are defined in automated fashion, some search criterion based on the extent of sequence similarity must be used; but the effect of the choice of search criterion on the results of phylogenetic analyses has so far not been studied. In the present paper, we apply a range of different homology criteria to establish gene families in order to examine the sensitivity of such analyses to the criteria used in assigning family membership.
METHODS

We analyzed 99 complete genomes of prokaryotes, 16 from Archaea and 83 from Bacteria (see Appendix 1 for accession numbers). References to currently accepted taxonomy of these species followed the Bergey's Manual Trust Web site ( bergeys/outline.prn.pdf). We assembled gene families by inferring homology from searches of predicted protein translations using the BLASTCLUST software available in the BLAST software package (Altschul et al., 1997). This program identifies families by a single-linkage method, which assembles larger families by linking shared genes among families, thus ensuring that a given gene will be assigned to only one family. Sequence homology was established by identifying matches using a conservative E-value of 10⁻⁶. We used six different criteria for scoring a match between two sequences: (1) a minimum of 10% sequence identity across at least 30% of the two sequences; (2) a minimum of 20% sequence identity across at least 40% of the two sequences; (3) a minimum of 30% sequence identity across at least 50% of the two sequences; (4) a minimum of 40% sequence identity across at least 60% of the two sequences; (5) a minimum of 50% sequence identity across at least 70% of the two sequences; and (6) a minimum of 60% sequence identity across at least 80% of the two sequences. We refer to these criteria, respectively, as 10/30, 20/40, 30/50, 40/60, 50/70, and 60/80. Using these specified homology criteria, all predicted proteins in the 99 genomes were assigned to families. Families having only a single member were excluded from the analyses. For each remaining family, each genome was scored for presence (1) or absence (0). Maximum parsimony (MP) analysis, using heuristic search by simple stepwise addition (Swofford, 2002), was applied

to the resulting matrix, in which protein families corresponded to characters. MP trees were rooted on the assumption that Archaea constitute an outgroup to Bacteria. Bootstrapping (1000 replicates) (Felsenstein, 1985) was used to assess the extent to which clustering patterns in the MP tree received support from the data set as a whole. In order to assess the nature of the phylogenetic signal in the data sets assembled under different homology criteria, we computed the amount of possible synapomorphy (APS) (Farris, 1989; Simmons et al., 2004). For each parsimony-informative character, APS is defined as the difference between the maximum and minimum number of possible steps for that character. Characters with high APS can potentially be used to resolve deep internal branches of the phylogenetic tree, whereas those with low APS can only resolve outer branches. Thus, the average APS across all informative characters provides information regarding the potential for resolution of deep branches. In order to examine the tree-like nature of the signal in each data set, we calculated NeighborNet splits graphs using SplitsTree 4.0 (Bryant and Moulton, 2004; Huson, 1998) from a matrix of p-distances (proportion of difference) among genomes, derived from the matrix of 1s and 0s. This approach allowed a heuristic visualization of the extent of conflicting signals in the data, as homology criteria were changed.

TABLE 1. Properties of gene families identified by different homology criteria (columns: 10/30, 20/40, 30/50, 40/60, 50/70, 60/80). Only the first mean and the last standard error of each row are legible.
Number of genes in families per genome (a): 2460 ± ... ± 129
Number of families per genome (a): 837 ± ... ± 122
Genes/family (a): 2.86 ± ... ± 0.01
Correlation between genes/family and genome size (bp): (P < 0.001) (P < 0.001) ...
(a) Mean ± SE for 99 prokaryotic genomes.
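The Methods steps described above (single-linkage family assembly under an identity/coverage criterion, 0/1 scoring of each family in each genome, APS per character, and p-distances between genomes) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the tuple layout of `matches`, and the toy data are all assumptions made for illustration, and real input would come from BLASTCLUST output rather than hand-written tuples. The APS function relies on the standard result that a binary character needs at least one step and at most as many steps as the rarer state has occurrences.

```python
from collections import defaultdict


def single_linkage_families(genes, matches, min_identity, min_coverage):
    """Group genes into families by single linkage (union-find).

    matches: iterable of (gene_a, gene_b, percent_identity, percent_coverage).
    Two genes are linked when a match meets both thresholds, e.g. 30 and 50.
    """
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving keeps lookups short
            g = parent[g]
        return g

    for a, b, identity, coverage in matches:
        if identity >= min_identity and coverage >= min_coverage:
            parent[find(a)] = find(b)

    members = defaultdict(list)
    for g in genes:
        members[find(g)].append(g)
    # Families with a single member are excluded, as in the Methods.
    return sorted(sorted(m) for m in members.values() if len(m) > 1)


def presence_absence(families, genome_of, genomes):
    """One row per family; 1 if the genome holds any member, else 0."""
    return [[int(any(genome_of[g] == name for g in fam)) for name in genomes]
            for fam in families]


def aps(character):
    """Amount of possible synapomorphy for one binary character.

    Returns None when the character is not parsimony-informative, i.e. when
    either state occurs in fewer than two genomes.
    """
    ones = sum(character)
    zeros = len(character) - ones
    if min(ones, zeros) < 2:
        return None
    max_steps = min(ones, zeros)  # every rarer-state genome changes alone
    min_steps = 1                 # one change explains the whole pattern
    return max_steps - min_steps


def p_distance(profile_a, profile_b):
    """Proportion of families at which two genome profiles differ."""
    return sum(x != y for x, y in zip(profile_a, profile_b)) / len(profile_a)


# Toy data (hypothetical): four genes from three genomes A, B, and C.
GENOME_OF = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
MATCHES = [("a1", "b1", 65, 85), ("b1", "c1", 35, 55), ("a2", "c1", 12, 35)]
```

With this toy input, the 30/50 criterion links a1, b1, and c1 into one family and leaves a2 as an excluded singleton, whereas 60/80 keeps only the a1 and b1 link, so genome C drops out of the family. This mirrors, on a very small scale, how stricter criteria break families apart.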
RESULTS

The different search criteria led to differences in definition and membership of families (Table 1). As the strictness of the criterion increased, the mean number of genes per genome assigned to families decreased (Table 1). This evidently occurred because increasingly strict homology criteria led to an increase in the number of singletons, i.e., single genes not assigned to membership in any family. The mean number of families per genome was lowest with the least strict criterion (10/30), then increased as the criterion became stricter, reaching a maximum at 40/60, and then declined as the criterion became still stricter (Table 1). The mean number of genes per family decreased as a function of increasing strictness of the homology criterion (Table 1).

TABLE 1 (continued). Correlation between genes/family and genome size, remaining significance levels: (P < 0.001) (P < 0.001) (P = 0.002) (N.S.)

FIGURE 1. Scatter plots showing the relationship between number of genes per family and genome size in 99 prokaryotes: (A) when families were assembled under the 30/50 homology criterion; and (B) when families were assembled under the 60/80 homology criterion.

Under most criteria, the mean number of genes per family in a genome was correlated with genome size (in bp). This correlation was strongest with the 30/50 criterion (Table 1), in which case a close linear relationship was observed (Fig. 1A). However, under the

strictest criterion (60/80), there was not a significant relationship between the mean number of genes per family and genome size (Table 1 and Fig. 1B). This evidently occurred because, under the strictest criterion, families were broken up to the point that relatively few families had more than a single member in any given genome. Table 2 summarizes results of phylogenetic analyses conducted using the data sets assembled under the different homology criteria. The number of informative characters (i.e., families) available for analyses increased as the strictness of the criterion increased (Table 2). The consistency index (CI) decreased, reaching a minimum at 30/50, then increased sharply as the criteria increased in strictness (Table 2). This pattern evidently occurred because the proportion of hypothesized changes involving loss of a family (character changes from 1 to 0) was highest with the 30/50 criterion. Under the 30/50 criterion, large families were broken up but not excessively so.

TABLE 2. Summary of MP analyses based on gene families identified by different search criteria. Values are listed in the column order 10/30; 20/40; 30/50; 40/60; 50/70; 60/80.
No. informative characters: 12,919; 13,965; 19,131; 31,908; 40,366; 43,890
No. MP trees found: values not legible
Changes, 1 to 0: count not legible (15.7%); 5726 (16.4%); 8659 (17.7%); 9372 (14.0%); 8110 (11.9%); 4210 (6.9%)
Changes, 0 to 1: 26,814 (84.3%); 29,113 (83.6%); 40,233 (82.4%); 57,346 (86.0%); 59,892 (88.1%); 56,676 (93.1%)
Changes, total: 31,789; 34,839; 48,892; 66,718; 68,002; 60,886
Consistency index (a): values not legible
Mean d_T (b): values not legible
Significant branches (c), terminal pair: 28 (47.4%); 28 (49.1%); 29 (47.5%); 32 (43.8%); 31 (44.9%); 27 (42.2%)
Significant branches (c), internal: 31 (52.6%); 29 (50.9%); 32 (52.5%); 41 (56.2%); 38 (55.1%); 37 (57.8%)
Significant branches (c), total: values not legible
Mean APS (d) (± SE): 4.23 ± ... ± 0.01
(a) Excluding noninformative sites.
(b) Mean topological distance (d_T) to MP trees found under all other criteria.
(c) Defined as a branch receiving at least 95% support in 1000 bootstrap samples.
(d) APS = amount of possible synapomorphy (per character).
Thus there were fewer gains of families (character change from 0 to 1) relative to losses under this criterion, and families including both gains and losses contributed to the reduction in CI. With more liberal criteria, fewer distinct families were identified; thus, both gains and losses were reduced. In contrast, with stricter criteria, an increasingly large number of families were identified, leading to very few losses of families and a large number of gains (Table 2). Regarding bootstrap support for branches within the trees, the number of significant branches (95% support or better) did not change in a consistent way as a function of the strictness of the homology criterion (Table 2). Both the number of significant branches and the number of significant internal branches (i.e., those deeper than the branch leading to a terminal pair) were highest with the 40/60 criterion. The mean APS (per informative character) differed significantly among the six criteria (one-way analysis of variance [ANOVA]; F 5,161,903 = ; P < 0.001) (Table 2). Mean APS increased slightly with increasing strictness of the homology criterion from 10/30 to 30/50, then decreased as the criterion became increasingly strict (Table 2). As a result, the mean APS for 60/80 was less than half that for 30/50 (Table 2). These results imply that, using a criterion of intermediate strictness, there was maximal potential information for resolving deep internal branches, whereas with an extremely strict criterion a greater proportion of information was available for resolving terminal branches. Figure 2 illustrates the single MP tree based on the moderate 30/50 criterion. As in all MP trees found under all search criteria, Archaea clustered apart from Bacteria (Fig. 2). In addition, as in all MP trees found under all criteria, closely related species (such as congeners) clustered together, usually with strong bootstrap support (Fig. 2).
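The statement that the share of inferred losses peaks at 30/50 can be checked directly from the gain and total counts in Table 2. The short Python sketch below is only a worked arithmetic check, not part of the original analysis; the variable and function names are hypothetical, and it assumes that each total is simply gains plus losses.

```python
CRITERIA = ["10/30", "20/40", "30/50", "40/60", "50/70", "60/80"]
GAINS = [26814, 29113, 40233, 57346, 59892, 56676]   # 0 to 1 changes, Table 2
TOTALS = [31789, 34839, 48892, 66718, 68002, 60886]  # all changes, Table 2


def loss_summary(gains, totals):
    """Return (losses, percent of all changes) per criterion.

    Assumes total = gains + losses, so losses are obtained by subtraction.
    """
    summary = []
    for gained, total in zip(gains, totals):
        lost = total - gained
        summary.append((lost, round(100 * lost / total, 1)))
    return summary


SUMMARY = loss_summary(GAINS, TOTALS)
# Criterion with the largest share of losses among all inferred changes.
PEAK = max(zip(CRITERIA, SUMMARY), key=lambda item: item[1][1])[0]
```

Under that assumption, the subtraction reproduces the five loss counts that are legible in Table 2 (5726, 8659, 9372, 8110, 4210) together with their percentages. It gives 4975 losses (15.7%) for 10/30; that count is derived here rather than read from the table.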
In the Bacteria, certain members of recognized higher level taxonomic groups clustered together, although monophyly of previously recognized higher level groupings was generally not supported. For example, the order Bacillales (including Bacillus and related genera) formed a well-supported monophyletic group (Fig. 2). However, the phylum Firmicutes, in which Bacillales is included, did not form a monophyletic cluster. Mycoplasma and Ureaplasma, traditionally included in Firmicutes, clustered apart from the cluster including most Firmicutes. In addition, the cluster including most Firmicutes also included Fusobacterium (Fig. 2), which is assigned to a separate phylum (Fusobacteria). Similarly, there was a well-supported cluster that included many genera assigned to the phylum Proteobacteria, such as Escherichia, Agrobacterium, and Ralstonia (Fig. 2). However, the groupings within this cluster did not correspond to the currently accepted classes Alphaproteobacteria, Betaproteobacteria, and Gammaproteobacteria (Fig. 2). In addition, Rickettsia and Buchnera, traditionally assigned to Proteobacteria, fell outside this

FIGURE 2. Single MP tree constructed under the 30/50 homology criterion (for details see Table 2). Symbols on the branches indicate the strength of bootstrap support: open circles, 95% to 98%; closed circles, ≥99%.

cluster (Fig. 2). In the six MP trees based on the strict 60/80 criterion, Rickettsia and Buchnera clustered strongly with Proteobacteria (Fig. 3). On the other hand, Firmicutes were not recovered as a monophyletic group, because Mycoplasma and Ureaplasma fell outside the cluster with other genera traditionally assigned to Firmicutes (Fig. 3). Figure 4 shows the strict consensus of all MP trees found with the different criteria used. In this consensus tree, most deep-branching patterns were unresolved (Fig. 4). Only 46 branches received significant bootstrap support, a much lower figure than in any of the individual trees constructed on the basis of individual homology criteria (Table 2). Of these, only 19 (41%) represented deep branches (i.e., not branches subtending terminal pairs), again a much lower figure than in any individual tree (Table 2). The fact that certain deep branches were not resolved in the consensus tree but received significant bootstrap support in individual trees implies that the trees constructed on the basis of the different homology criteria frequently resolved the higher-level relationships of prokaryotes in mutually contradictory ways. Equally illustrative of the conflicts among trees were the high mean topological distances (d_T) among the MP trees found under each criterion (Table 2). The 20/40 and 30/50 criteria were closest on average to the other criteria, while 60/80 was farthest from the other criteria (Table 2). The large average d_T values to 60/80 reflected in part the placement of both Rickettsia and Buchnera with other Proteobacteria under the latter criterion, which was not observed under any other criterion (Figs. 2, 3 and data not shown). NeighborNet analyses produced splits graphs that corroborated the findings from the APS analyses. These show that, as the homology assessment became stricter, support decreased for the internal branches that separate major clusters.
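The text does not state how the topological distance d_T between trees was computed. As an illustration only, the sketch below assumes the common symmetric-difference (Robinson-Foulds) definition: the number of nontrivial bipartitions found in one unrooted tree but not in the other. Trees are written as nested Python tuples of leaf names, which is a convenience chosen for this example, and all function names are hypothetical.

```python
def leaf_set(tree):
    """All leaf names in a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Nontrivial bipartitions implied by the internal branches of a tree."""
    all_leaves = leaf_set(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - below
        if len(below) > 1 and len(other) > 1:
            # Keep one canonical side so that A|B and B|A compare equal.
            found.add(min(below, other, key=lambda s: (len(s), sorted(s))))
        return below

    visit(tree)
    return found


def symmetric_difference_distance(tree_a, tree_b):
    """Count bipartitions present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


# Two hypothetical five-taxon trees that disagree on the placement of B and C.
TREE_1 = ((("A", "B"), "C"), ("D", "E"))
TREE_2 = ((("A", "C"), "B"), ("D", "E"))
```

For these two toy trees the shared split is D,E versus the rest, while A,B and A,C each occur in only one tree, so the distance is 2. A mean d_T for one criterion would then be an average of such pairwise values over the trees found under the other criteria, assuming this definition applies.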
Most noticeable was the loss of phylogenetic signal separating Archaea and Eubacteria; for example, compare the graph for 30/50 with that for 60/80. (For splits graphs for all criteria, see Fig. A1, available online at the Society of Systematic Biologists web site.) Interestingly, at stricter levels used to infer homology, there appeared to be a higher level of bifurcation amongst terminal taxa (Fig. 4). These findings support other observations that we report and indicate that no one criterion well represents all relevant phylogenetic information.

DISCUSSION

The results presented here demonstrate that, at least in the case of prokaryotic genomes, phylogenetic analyses based on gene family content are highly sensitive to the homology criteria used to define families. The true phylogeny of these organisms is so far poorly resolved. Thus, it is not in general possible to say which of the homology criteria used produced a tree closer to the true tree. However, the fact that the trees obtained with different homology criteria were mutually contradictory did not increase confidence in the applicability of gene content analyses to the resolution of prokaryotic phylogenies. Although parsimony was used for phylogenetic reconstruction in the present analyses, there is no reason to believe that the problems revealed here are unique to parsimony. Because all methods of analysis that have been applied to gene family content take family assignment of genes as a given, at least some of the same problems are likely to arise with distance or likelihood methods as well. The absence of Rickettsia and Buchnera from the cluster with other Proteobacteria in the phylogeny based on the moderate 30/50 criterion (Fig. 2) suggested that parallel loss of gene families is the likely explanation for some of the observed problems.
Both Rickettsia and Buchnera have reduced genome sizes due to massive loss of gene families as an adaptation to life as obligate intracellular parasites (Andersson et al., 1998; van Ham et al., 2003). The extensive loss of gene families apparently caused these taxa to cluster nearer to other genera that have lost numerous gene families in adaptation to intracellular life, such as Mycoplasma (Himmelreich et al., 1996). Parallel gene family loss in adaptation to similar lifestyles appears to have created a sufficient degree of homoplasy that the true relationships of these organisms cannot be recovered by the method used. Previous studies have noted the problems that large-scale loss of gene families can pose for analyses based on gene family content (House and Fitz-Gibbon, 2002; Dutilh et al., 2004; Lake and Rivera, 2004). Dutilh et al. (2004) have developed a method of reducing phylogenetically discordant signals in gene family content data that appears to ameliorate the problem. On the other hand, in our analysis based on the 60/80 homology criterion, Rickettsia and Buchnera clustered among the Proteobacteria, although Buchnera did not cluster with Gammaproteobacteria, as expected from traditional classification (Fig. 3). The strict 60/80 criterion evidently had the effect of breaking up gene families so that only proteins showing a close phylogenetic relationship were grouped in a common family (Table 2). Because of the problems of extensive parallel gene loss, these extremely subdivided families may better reconstruct relatively close evolutionary relationships than do less subdivided families, at least in the case of prokaryotes. The greatly reduced amount of possible synapomorphy (APS) per character in the case of the 60/80 criterion in comparison to more liberal criteria (Table 2) suggests that a stricter criterion provides more information suitable for resolving close relationships than do more liberal criteria.
Conversely, moderate criteria (such as 30/50) showed the highest mean APS per character (Table 2) and thus the most potential information for resolving deep branches. However, the higher APS for moderate criteria did not in practice lead to a strikingly better resolution of deep branches (compare Figs. 2 and 3). Even at this level of homology criteria NeighborNet analysis showed many contradictory internal splits. This may at least in part reflect ancient horizontal gene transfers (HGT) among major lineages. Using a stricter criterion eliminates some contradictory splits; however, accompanying this is the loss of information as the stricter criterion breaks up ancient gene families whose phylogenetic relationships may document HGT events.

FIGURE 3. Strict consensus tree of 6 MP trees constructed under the 60/80 criterion. Symbols on the branches indicate the strength of bootstrap support: open circles, 95% to 98%; closed circles, ≥99%.

FIGURE 4. Strict consensus of all MP trees (N = 14) constructed under all six search criteria. Symbols on the branches indicate the strength of bootstrap support for a given clustering pattern in all MP trees: open circles, 95% to 98% in all trees; closed circles, ≥99% in all MP trees.

Families assembled with a moderate criterion may provide a better representation of what is usually meant by a multigene family than do the highly subdivided families assembled by a very strict criterion. In completely sequenced eukaryotic genomes, there is a correlation between genome size and the number of genes per family (Friedman and Hughes, 2001). We found that in prokaryotic genomes also, except when the strictest criterion was used, the number of genes per family was positively correlated with genome size (Table 1 and Fig. 1A). This suggests that less strict criteria better capture the concept of a gene family as a product of within-genome gene duplications (and, in the case of prokaryotes, occasional between-genome horizontal transfers). This correlation was strongest with the 30/50 criterion, suggesting that a criterion of intermediate strictness may be optimal when the goal is to assemble gene families for purposes of reconstructing the pattern of gene duplication within a genome. On the other hand, a very liberal criterion (such as 10/30) may approximate the results of an analysis based on families of domains or protein folds (Lin and Gerstein, 2000), since a very liberal criterion is likely to group proteins that share even one domain. With all homology criteria used, the hypothesized gains of families substantially exceeded the hypothesized losses (Table 2). Hypothesized gains of families include both the first appearance of the gene in the phylogeny and its appearance in a new part of the phylogeny as a result of an HGT event. Furthermore, as stricter homology criteria are used, an increasing number of hypothesized gains of gene families are artifacts of the break-up of large families. When subfamilies of a large family are characterized as separate families, each such family is hypothesized to make a separate first appearance in the phylogeny.
Thus, although a very strict homology criterion might be preferable for reconstructing some relationships of prokaryotic phylogeny, it would be very misleading if it were used to reconstruct the true pattern of HGT within a phylogeny.

ACKNOWLEDGMENTS

This research was supported by grant GM to A.L.H. from the National Institutes of Health.

REFERENCES

Altschul, S. F., T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25.
Andersson, S. G., A. Zomorodipour, J. O. Andersson, T. Sicheritz-Ponten, U. C. Alsmark, R. M. Podowski, A. K. Naslund, A. S. Erikson, H. H. Winkler, and C. G. Kurland. 1998. The genome sequence of Rickettsia prowazekii and the origin of mitochondria. Nature 396.
Bryant, D., and V. Moulton. 2004. NeighborNet: An agglomerative method for the construction of planar phylogenetic networks. Mol. Biol. Evol. 21.
Daubin, V., N. A. Moran, and H. Ochman. 2003. Phylogenetics and the cohesion of bacterial genomes. Science 301.
Dutilh, B. E., M. A. Huynen, W. J. Bruno, and B. Snel. 2004. The consistent phylogenetic signal in genome trees revealed by reducing the impact of noise. J. Mol. Evol. 58.
Farris, J. S. 1989. The retention index and the rescaled consistency index. Cladistics 5.
Felsenstein, J. 1985. Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39.
Friedman, R., and A. L. Hughes. 2001. Pattern and timing of gene duplication in animal genomes. Genome Res. 11.
Gu, X. 2000. A simple evolutionary model for genome phylogeny based on gene content. Pages in Comparative genomics (D. Sankoff and J. H. Nadeau, eds.). Kluwer Academic, Dordrecht.
Himmelreich, R., H. Hilbert, H. Plagens, E. Pirkl, B. C. Li, and R. Herrmann. 1996. Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae. Nucleic Acids Res. 24.
House, C. H., and S. T. Fitz-Gibbon. 2002. Using homolog groups to create a whole-genomic tree of free-living organisms: An update. J. Mol. Evol. 54.
Hughes, A. L., and R. Friedman. 2004. Differential loss of ancestral gene families as a source of genomic divergence in animals. Proc. R. Soc. Lond. B Suppl. 271:S107-S109.
Huson, D. 1998. SplitsTree: Analyzing and visualizing evolutionary data. Bioinformatics 14.
Huson, D. H., and M. Steel. 2004. Phylogenetic trees based on gene content. Bioinformatics 20.
Kunin, V., and C. A. Ouzounis. 2003. The balance of driving forces during genome evolution in prokaryotes. Genome Res. 13.
Lake, J. A., and M. C. Rivera. 2004. Deriving the genomic tree of life in the presence of horizontal gene transfer: Conditioned reconstruction. Mol. Biol. Evol. 21.
Lerat, E., V. Daubin, and N. A. Moran. 2003. From gene trees to organismal phylogeny in prokaryotes: The case of the γ-Proteobacteria. PLoS Biol. 1:E19.
Lin, J., and M. Gerstein. 2000. Whole-genome trees based on the occurrence of folds and orthologs: Implications for comparing genomes on different levels. Genome Res. 10.
Mirkin, B. G., T. I. Fenner, M. Y. Galperin, and E. V. Koonin. 2003. Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes. BMC Evol. Biol. 3:2.
Simmons, M. P., T. G. Carr, and K. O'Neill. 2004. Relative character-state space, amount of potential phylogenetic information, and heterogeneity of nucleotide and amino acid characters. Mol. Phyl. Evol. 32.
Snel, B., P. Bork, and M. A. Huynen. 1999. Genome phylogeny based on gene content. Nat. Genet. 21.
Swofford, D. L. 2002. PAUP*: Phylogenetic analysis using parsimony (*and other methods). Sinauer, Sunderland, Massachusetts.
Van Ham, R. C. J., J. Kamerbeek, C. Palacios, C. Rausell, F. Abascal, U. Bastolla, J. M. Fernández, L. Jiménez, M. Postigo, F. J. Silva, J. Tamames, E. Viguera, A. Latorre, A. Valencia, F. Morán, and A. Moya. 2003. Reductive genome evolution in Buchnera aphidicola. Proc. Natl. Acad. Sci. USA 100.
Wolf, Y. I., I. B. Rogozin, N. V. Grishin, and E. V. Koonin. 2002. Genome trees and the tree of life. Trends Genet. 18.
Wolf, Y. I., I. B. Rogozin, N. V. Grishin, R. L. Tatusov, and E. V. Koonin. 2001. Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evol. Biol. 1:8.
Wolf, Y. I., I. B. Rogozin, and E. V. Koonin. 2004. Coelomata and not Ecdysozoa: Evidence from genome-wide phylogenetic analysis. Genome Res. 14.

First submitted 22 December 2003; reviews returned 6 August 2004; final acceptance 31 October 2004
Associate Editor: Peter Lockhart
Editor: Chris Simon

APPENDIX 1

Genome sequences and accession numbers used in analyses:

1. Halobacterium sp. NC
2. Thermoplasma acidophilum NC
3. Thermoplasma volcanium NC
4. Aeropyrum pernix NC
5. Pyrobaculum aerophilum NC
6. Sulfolobus solfataricus NC
7. Sulfolobus tokodaii NC
8. Pyrococcus furiosus NC
9. Pyrococcus abyssi NC
10. Pyrococcus horikoshii NC
11. Archaeoglobus fulgidus NC
12. Methanosarcina acetivorans NC
13. Methanosarcina mazei NC
14. Methanococcus jannaschii NC
15. Methanobacterium thermoautotrophicum NC
16. Methanopyrus kandleri NC
17. Tropheryma whipplei NC
18. Buchnera aphidicola Bp NC
19. Buchnera aphidicola Sg NC
20. Buchnera sp. APS NC
21. Chlamydia trachomatis NC
22. Chlamydia pneumoniae NC
23. Chlamydophila pneumoniae CWL029 NC
24. Chlamydophila pneumoniae J138 NC
25. Borrelia burgdorferi NC
26. Treponema pallidum NC
27. Mycoplasma pulmonis NC
28. Mycoplasma genitalium NC
29. Mycoplasma pneumoniae NC
30. Mycoplasma penetrans NC
31. Ureaplasma urealyticum NC
32. Rickettsia conorii NC
33. Rickettsia prowazekii NC
34. Campylobacter jejuni NC
35. Helicobacter pylori NC
36. Helicobacter pylori J99 NC
37. Aquifex aeolicus NC
38. Chlorobium tepidum NC
39. Thermosynechococcus elongatus NC
40. Nostoc sp. NC
41. Synechocystis sp. BA
42. Nitrosomonas europaea NC
43. Xylella fastidiosa NC
44. Xanthomonas campestris NC
45. Xanthomonas axonopodis NC
46. Pseudomonas aeruginosa NC
47. Ralstonia solanacearum NC
48. Caulobacter crescentus NC
49. Brucella melitensis NC
50. Brucella suis NC
51. Bradyrhizobium japonicum NC
52. Mesorhizobium loti NC
53. Sinorhizobium meliloti NC
54. Agrobacterium tumefaciens C58 NC
55. Agrobacterium tumefaciens C58 UW NC
56. Neisseria meningitidis MC58 NC
57. Neisseria meningitidis Z2491 NC
58. Haemophilus influenzae NC
59. Pasteurella multocida NC
60. Shewanella oneidensis NC
61. Vibrio cholerae NC
62. Vibrio parahaemolyticus NC
63. Yersinia pestis CO92 NC
64. Yersinia pestis KIM NC
65. Salmonella enterica NC
66. Salmonella typhimurium NC
67. Escherichia coli K12 NC
68. Escherichia coli O157H7 NC
69. Escherichia coli O157H7 EDL933 NC
70. Deinococcus radiodurans NC
71. Streptomyces avermitilis NC
72. Streptomyces coelicolor NC
73. Corynebacterium efficiens NC
74. Mycobacterium leprae NC
75. Mycobacterium tuberculosis CDC1551 NC
76. Mycobacterium tuberculosis H37Rv NC
77. Thermotoga maritima NC
78. Thermoanaerobacter tengcongensis NC
79. Clostridium acetobutylicum NC
80. Clostridium perfringens NC
81. Fusobacterium nucleatum NC
82. Staphylococcus aureus MW2 NC
83. Staphylococcus aureus Mu50 NC
84. Staphylococcus aureus N315 NC
85. Listeria innocua NC
86. Listeria monocytogenes NC
87. Oceanobacillus iheyensis NC
88. Bacillus halodurans NC
89. Bacillus subtilis NC
90. Lactobacillus plantarum NC
91. Lactococcus lactis NC
92. Streptococcus pneumoniae R6 NC
93. Streptococcus pneumoniae NC
94. Streptococcus agalactiae 2603VR NC
95. Streptococcus agalactiae NEM316 NC
96. Streptococcus pyogenes NC
97. Streptococcus pyogenes MGAS8232 NC
98. Streptococcus pyogenes MGAS315 NC
99. Streptococcus pyogenes SSI1 NC


Biased biological functions of horizontally transferred genes in prokaryotic genomes Biased biological functions of horizontally transferred genes in prokaryotic genomes Yoji Nakamura 1,5, Takeshi Itoh 2,3, Hideo Matsuda 4 & Takashi Gojobori 1,2 Horizontal gene transfer is one of the main

More information

Introduction to Bioinformatics Integrated Science, 11/9/05

Introduction to Bioinformatics Integrated Science, 11/9/05 1 Introduction to Bioinformatics Integrated Science, 11/9/05 Morris Levy Biological Sciences Research: Evolutionary Ecology, Plant- Fungal Pathogen Interactions Coordinator: BIOL 495S/CS490B/STAT490B Introduction

More information

The minimal prokaryotic genome. The minimal prokaryotic genome. The minimal prokaryotic genome. The minimal prokaryotic genome

The minimal prokaryotic genome. The minimal prokaryotic genome. The minimal prokaryotic genome. The minimal prokaryotic genome Dr. Dirk Gevers 1,2 1 Laboratorium voor Microbiologie 2 Bioinformatics & Evolutionary Genomics The bacterial species in the genomic era CTACCATGAAAGACTTGTGAATCCAGGAAGAGAGACTGACTGGGCAACATGTTATTCAG GTACAAAAAGATTTGGACTGTAACTTAAAAATGATCAAATTATGTTTCCCATGCATCAGG

More information

PBL: INVENT A SPECIES

PBL: INVENT A SPECIES PBL: INVENT A SPECIES Group directions Group Name Group Members Project Prompt Invent a species. Task As a group you have the opportunity to invent a new species. Where did your species come from and how

More information

Prokaryotic Utilization of the Twin-Arginine Translocation Pathway: a Genomic Survey

Prokaryotic Utilization of the Twin-Arginine Translocation Pathway: a Genomic Survey JOURNAL OF BACTERIOLOGY, Feb. 2003, p. 1478 1483 Vol. 185, No. 4 0021-9193/03/$08.00 0 DOI: 10.1128/JB.185.4.1478 1483.2003 Copyright 2003, American Society for Microbiology. All Rights Reserved. Prokaryotic

More information

Visualization of multiple alignments, phylogenies and gene family evolution

Visualization of multiple alignments, phylogenies and gene family evolution nature methods Visualization of multiple alignments, phylogenies and gene family evolution James B Procter, Julie Thompson, Ivica Letunic, Chris Creevey, Fabrice Jossinet & Geoffrey J Barton Supplementary

More information

Assessing evolutionary relationships among microbes from whole-genome analysis Jonathan A Eisen

Assessing evolutionary relationships among microbes from whole-genome analysis Jonathan A Eisen 475 Assessing evolutionary relationships among microbes from whole-genome analysis Jonathan A Eisen The determination and analysis of complete genome sequences have recently enabled many major advances

More information

Organisation of the S10, spc and alpha ribosomal protein gene clusters in prokaryotic genomes

Organisation of the S10, spc and alpha ribosomal protein gene clusters in prokaryotic genomes FEMS Microbiology Letters 242 (2005) 117 126 www.fems-microbiology.org Organisation of the S10, spc and alpha ribosomal protein gene clusters in prokaryotic genomes Tom Coenye *, Peter Vandamme Laboratorium

More information

Correlations between Shine-Dalgarno Sequences and Gene Features Such as Predicted Expression Levels and Operon Structures

Correlations between Shine-Dalgarno Sequences and Gene Features Such as Predicted Expression Levels and Operon Structures JOURNAL OF BACTERIOLOGY, Oct. 2002, p. 5733 5745 Vol. 184, No. 20 0021-9193/02/$04.00 0 DOI: 10.1128/JB.184.20.5733 5745.2002 Copyright 2002, American Society for Microbiology. All Rights Reserved. Correlations

More information

Genome-Wide Molecular Clock and Horizontal Gene Transfer in Bacterial Evolution

Genome-Wide Molecular Clock and Horizontal Gene Transfer in Bacterial Evolution JOURNAL OF BACTERIOLOGY, Oct. 004, p. 6575 6585 Vol. 186, No. 19 001-9193/04/$08.00 0 DOI: 10.118/JB.186.19.6575 6585.004 Copyright 004, American Society for Microbiology. All Rights Reserved. Genome-Wide

More information

Genes order and phylogenetic reconstruction: application to γ-proteobacteria

Genes order and phylogenetic reconstruction: application to γ-proteobacteria Genes order and phylogenetic reconstruction: application to γ-proteobacteria Guillaume Blin 1, Cedric Chauve 2 and Guillaume Fertin 1 1 LINA FRE CNRS 2729, Université de Nantes 2 rue de la Houssinière,

More information

Midterm Exam #1 : In-class questions! MB 451 Microbial Diversity : Spring 2015!

Midterm Exam #1 : In-class questions! MB 451 Microbial Diversity : Spring 2015! Midterm Exam #1 : In-class questions MB 451 Microbial Diversity : Spring 2015 Honor pledge: I have neither given nor received unauthorized aid on this test. Signed : Name : Date : TOTAL = 45 points 1.

More information

CcpA-Dependent Carbon Catabolite Repression in Bacteria

CcpA-Dependent Carbon Catabolite Repression in Bacteria MICROBIOLOGY AND MOLECULAR BIOLOGY REVIEWS, Dec. 2003, p. 475 490 Vol. 67, No. 4 1092-2172/03/$08.00 0 DOI: 10.1128/MMBR.67.4.475 490.2003 Copyright 2003, American Society for Microbiology. All Rights

More information

8/23/2014. Phylogeny and the Tree of Life

8/23/2014. Phylogeny and the Tree of Life Phylogeny and the Tree of Life Chapter 26 Objectives Explain the following characteristics of the Linnaean system of classification: a. binomial nomenclature b. hierarchical classification List the major

More information

Microbial Taxonomy. Classification of living organisms into groups. A group or level of classification

Microbial Taxonomy. Classification of living organisms into groups. A group or level of classification Lec 2 Oral Microbiology Dr. Chatin Purpose Microbial Taxonomy Classification Systems provide an easy way grouping of diverse and huge numbers of microbes To provide an overview of how physicians think

More information

Deposited research article Short segmental duplication: parsimony in growth of microbial genomes Li-Ching Hsieh*, Liaofu Luo, and Hoong-Chien Lee

Deposited research article Short segmental duplication: parsimony in growth of microbial genomes Li-Ching Hsieh*, Liaofu Luo, and Hoong-Chien Lee This information has not been peer-reviewed. Responsibility for the findings rests solely with the author(s). Deposited research article Short segmental duplication: parsimony in growth of microbial genomes

More information

Shedding Genomic Ballast: Extensive Parallel Loss of Ancestral Gene Families in Animals

Shedding Genomic Ballast: Extensive Parallel Loss of Ancestral Gene Families in Animals J Mol Evol (2004) 59:827 833 DOI: 10.1007/s00239-004-0115-7 Shedding Genomic Ballast: Extensive Parallel Loss of Ancestral Gene Families in Animals Austin L. Hughes, Robert Friedman Department of Biological

More information

Ch 27: The Prokaryotes Bacteria & Archaea Older: (Eu)bacteria & Archae(bacteria)

Ch 27: The Prokaryotes Bacteria & Archaea Older: (Eu)bacteria & Archae(bacteria) Ch 27: The Prokaryotes Bacteria & Archaea Older: (Eu)bacteria & Archae(bacteria) (don t study Concept 27.2) Some phyla Remember: Bacterial cell structure and shapes 1 Usually very small but some are unusually

More information

Microbial Taxonomy and the Evolution of Diversity

Microbial Taxonomy and the Evolution of Diversity 19 Microbial Taxonomy and the Evolution of Diversity Copyright McGraw-Hill Global Education Holdings, LLC. Permission required for reproduction or display. 1 Taxonomy Introduction to Microbial Taxonomy

More information

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other? Phylogeny and systematics Why are these disciplines important in evolutionary biology and how are they related to each other? Phylogeny and systematics Phylogeny: the evolutionary history of a species

More information

A thermophilic last universal ancestor inferred from its estimated amino acid composition

A thermophilic last universal ancestor inferred from its estimated amino acid composition CHAPTER 17 A thermophilic last universal ancestor inferred from its estimated amino acid composition Dawn J. Brooks and Eric A. Gaucher 17.1 Introduction The last universal ancestor (LUA) represents a

More information

Pseudogenes are considered to be dysfunctional genes

Pseudogenes are considered to be dysfunctional genes Pseudogenes and bacterial genome decay Jean O Micks Contrary to the evolutionary idea of junk DNA, many pseudogenes still have function in the genomes of archaea, bacteria, and also eukaryotes, such as

More information

Multifractal characterisation of complete genomes

Multifractal characterisation of complete genomes arxiv:physics/1854v1 [physics.bio-ph] 28 Aug 21 Multifractal characterisation of complete genomes Vo Anh 1, Ka-Sing Lau 2 and Zu-Guo Yu 1,3 1 Centre in Statistical Science and Industrial Mathematics, Queensland

More information

Computational approaches for functional genomics

Computational approaches for functional genomics Computational approaches for functional genomics Kalin Vetsigian October 31, 2001 The rapidly increasing number of completely sequenced genomes have stimulated the development of new methods for finding

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 2009 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

Structural Proteomics of Eukaryotic Domain Families ER82 WR66

Structural Proteomics of Eukaryotic Domain Families ER82 WR66 Structural Proteomics of Eukaryotic Domain Families ER82 WR66 NIH Protein Structure Initiative Mission Statement To make the three-dimensional atomic level structures of most proteins readily available

More information

The use of gene clusters to infer functional coupling

The use of gene clusters to infer functional coupling Proc. Natl. Acad. Sci. USA Vol. 96, pp. 2896 2901, March 1999 Genetics The use of gene clusters to infer functional coupling ROSS OVERBEEK*, MICHAEL FONSTEIN, MARK D SOUZA*, GORDON D. PUSCH*, AND NATALIA

More information

Evolutionary Use of Domain Recombination: A Distinction. Between Membrane and Soluble Proteins

Evolutionary Use of Domain Recombination: A Distinction. Between Membrane and Soluble Proteins 1 Evolutionary Use of Domain Recombination: A Distinction Between Membrane and Soluble Proteins Yang Liu, Mark Gerstein, Donald M. Engelman Department of Molecular Biophysics and Biochemistry, Yale University,

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 200 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

Figure Page 117 Microbiology: An Introduction, 10e (Tortora/ Funke/ Case)

Figure Page 117 Microbiology: An Introduction, 10e (Tortora/ Funke/ Case) Chapter 11 The Prokaryotes: Domains Bacteria and Archaea Objective Questions 1) Which of the following are found primarily in the intestines of humans? A) Gram-negative aerobic rods and cocci B) Aerobic,

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

From Phylogenetics to Phylogenomics: The Evolutionary Relationships of Insect Endosymbiotic γ-proteobacteria as a Test Case

From Phylogenetics to Phylogenomics: The Evolutionary Relationships of Insect Endosymbiotic γ-proteobacteria as a Test Case Syst. Biol. 56(1):1 16, 2007 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150601109759 From Phylogenetics to Phylogenomics: The Evolutionary Relationships

More information

Increasing biological complexity is positively correlated with the relative genome-wide expansion of non-protein-coding DNA sequences

Increasing biological complexity is positively correlated with the relative genome-wide expansion of non-protein-coding DNA sequences Increasing biological complexity is positively correlated with the relative genome-wide expansion of non-protein-coding DNA sequences Ryan J. Taft 1, 2 * and John S. Mattick 3 1 Rowe Program in Genetics,

More information

C3020 Molecular Evolution. Exercises #3: Phylogenetics

C3020 Molecular Evolution. Exercises #3: Phylogenetics C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from

More information

OGtree: a tool for creating genome trees of prokaryotes based on overlapping genes

OGtree: a tool for creating genome trees of prokaryotes based on overlapping genes Published online 2 May 2008 Nucleic Acids Research, 2008, Vol. 36, Web Server issue W475 W480 doi:10.1093/nar/gkn240 OGtree: a tool for creating genome trees of prokaryotes based on overlapping genes Li-Wei

More information

Tree of Life: An Introduction to Microbial Phylogeny Beverly Brown, Sam Fan, LeLeng To Isaacs, and Min-Ken Liao

Tree of Life: An Introduction to Microbial Phylogeny Beverly Brown, Sam Fan, LeLeng To Isaacs, and Min-Ken Liao Microbes Count! 191 Tree of Life: An Introduction to Microbial Phylogeny Beverly Brown, Sam Fan, LeLeng To Isaacs, and Min-Ken Liao Video VI: Microbial Evolution Introduction Bioinformatics tools allow

More information

Effects of Gap Open and Gap Extension Penalties

Effects of Gap Open and Gap Extension Penalties Brigham Young University BYU ScholarsArchive All Faculty Publications 200-10-01 Effects of Gap Open and Gap Extension Penalties Hyrum Carroll hyrumcarroll@gmail.com Mark J. Clement clement@cs.byu.edu See

More information

Application of tetranucleotide frequencies for the assignment of genomic fragments

Application of tetranucleotide frequencies for the assignment of genomic fragments Blackwell Science, LtdOxford, UKEMIEnvironmental Microbiology 1462-2912Society for Applied Microbiology and Blackwell Publishing Ltd, 20046Original ArticleTetras for metagenomicsh. Teeling et al. Environmental

More information

Inferring positional homologs with common intervals of sequences

Inferring positional homologs with common intervals of sequences Outline Introduction Our approach Results Conclusion Inferring positional homologs with common intervals of sequences Guillaume Blin, Annie Chateau, Cedric Chauve, Yannick Gingras CGL - Université du Québec

More information

Microbiology Helmut Pospiech

Microbiology Helmut Pospiech Microbiology http://researchmagazine.uga.edu/summer2002/bacteria.htm 05.04.2018 Helmut Pospiech The Species Concept in Microbiology No universally accepted concept of species for prokaryotes Current definition

More information

MiGA: The Microbial Genome Atlas

MiGA: The Microbial Genome Atlas December 12 th 2017 MiGA: The Microbial Genome Atlas Jim Cole Center for Microbial Ecology Dept. of Plant, Soil & Microbial Sciences Michigan State University East Lansing, Michigan U.S.A. Where I m From

More information

Two Families of Mechanosensitive Channel Proteins

Two Families of Mechanosensitive Channel Proteins MICROBIOLOGY AND MOLECULAR BIOLOGY REVIEWS, Mar. 2003, p. 66 85 Vol. 67, No. 1 1092-2172/03/$08.00 0 DOI: 10.1128/MMBR.67.1.66 85.2003 Copyright 2003, American Society for Microbiology. All Rights Reserved.

More information

Reversing Gene Erosion Reconstructing Ancestral Bacterial Genomes from Gene-Content and Order Data

Reversing Gene Erosion Reconstructing Ancestral Bacterial Genomes from Gene-Content and Order Data Reversing Gene Erosion Reconstructing Ancestral Bacterial Genomes from Gene-Content and Order Data Joel V. Earnest-DeYoung 1, Emmanuelle Lerat 2, and Bernard M.E. Moret 1,3 Abstract In the last few years,

More information

(Stevens 1991) 1. morphological characters should be assumed to be quantitative unless demonstrated otherwise

(Stevens 1991) 1. morphological characters should be assumed to be quantitative unless demonstrated otherwise Bot 421/521 PHYLOGENETIC ANALYSIS I. Origins A. Hennig 1950 (German edition) Phylogenetic Systematics 1966 B. Zimmerman (Germany, 1930 s) C. Wagner (Michigan, 1920-2000) II. Characters and character states

More information

Phylogenetic Networks, Trees, and Clusters

Phylogenetic Networks, Trees, and Clusters Phylogenetic Networks, Trees, and Clusters Luay Nakhleh 1 and Li-San Wang 2 1 Department of Computer Science Rice University Houston, TX 77005, USA nakhleh@cs.rice.edu 2 Department of Biology University

More information

Quantitative Exploration of the Occurrence of Lateral Gene Transfer Using Nitrogen Fixation Genes as a Case Study

Quantitative Exploration of the Occurrence of Lateral Gene Transfer Using Nitrogen Fixation Genes as a Case Study Lin 1 Quantitative Exploration of the Occurrence of Lateral Gene Transfer Using Nitrogen Fixation Genes as a Case Study by Jason Lin Advisor: Professor Peter Bickel Introduction Under the concept of evolution,

More information

doi: / _25

doi: / _25 Boc, A., P. Legendre and V. Makarenkov. 2013. An efficient algorithm for the detection and classification of horizontal gene transfer events and identification of mosaic genes. Pp. 253-260 in: B. Lausen,

More information

Chapter 19. Microbial Taxonomy

Chapter 19. Microbial Taxonomy Chapter 19 Microbial Taxonomy 12-17-2008 Taxonomy science of biological classification consists of three separate but interrelated parts classification arrangement of organisms into groups (taxa; s.,taxon)

More information

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Bioinformatics tools for phylogeny and visualization. Yanbin Yin Bioinformatics tools for phylogeny and visualization Yanbin Yin 1 Homework assignment 5 1. Take the MAFFT alignment http://cys.bios.niu.edu/yyin/teach/pbb/purdue.cellwall.list.lignin.f a.aln as input and

More information

Biology 211 (2) Week 1 KEY!

Biology 211 (2) Week 1 KEY! Biology 211 (2) Week 1 KEY Chapter 1 KEY FIGURES: 1.2, 1.3, 1.4, 1.5, 1.6, 1.7 VOCABULARY: Adaptation: a trait that increases the fitness Cells: a developed, system bound with a thin outer layer made of

More information

Letter to the Editor. Department of Biology, Arizona State University

Letter to the Editor. Department of Biology, Arizona State University Letter to the Editor Traditional Phylogenetic Reconstruction Methods Reconstruct Shallow and Deep Evolutionary Relationships Equally Well Michael S. Rosenberg and Sudhir Kumar Department of Biology, Arizona

More information

Introduction to polyphasic taxonomy

Introduction to polyphasic taxonomy Introduction to polyphasic taxonomy Peter Vandamme EUROBILOFILMS - Third European Congress on Microbial Biofilms Ghent, Belgium, 9-12 September 2013 http://www.lm.ugent.be/ Content The observation of diversity:

More information

The Complement of Enzymatic Sets in Different Species

The Complement of Enzymatic Sets in Different Species doi:10.1016/j.jmb.2005.04.027 J. Mol. Biol. (2005) 349, 745 763 The Complement of Enzymatic Sets in Different Species Shiri Freilich 1 *, Ruth V. Spriggs 1,2, Richard A. George 1,2 Bissan Al-Lazikani 2,

More information

Unsupervised Learning in Spectral Genome Analysis

Unsupervised Learning in Spectral Genome Analysis Unsupervised Learning in Spectral Genome Analysis Lutz Hamel 1, Neha Nahar 1, Maria S. Poptsova 2, Olga Zhaxybayeva 3, J. Peter Gogarten 2 1 Department of Computer Sciences and Statistics, University of

More information

Microbial Diversity and Assessment (II) Spring, 2007 Guangyi Wang, Ph.D. POST103B

Microbial Diversity and Assessment (II) Spring, 2007 Guangyi Wang, Ph.D. POST103B Microbial Diversity and Assessment (II) Spring, 007 Guangyi Wang, Ph.D. POST03B guangyi@hawaii.edu http://www.soest.hawaii.edu/marinefungi/ocn403webpage.htm General introduction and overview Taxonomy [Greek

More information

Fitness constraints on horizontal gene transfer

Fitness constraints on horizontal gene transfer Fitness constraints on horizontal gene transfer Dan I Andersson University of Uppsala, Department of Medical Biochemistry and Microbiology, Uppsala, Sweden GMM 3, 30 Aug--2 Sep, Oslo, Norway Acknowledgements:

More information

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis 10 December 2012 - Corrections - Exercise 1 Non-vertebrate chordates generally possess 2 homologs, vertebrates 3 or more gene copies; a Drosophila

More information

2/25/2013. Chapter 11 The Prokaryotes: Domains Bacteria and Archaea The Prokaryotes

2/25/2013. Chapter 11 The Prokaryotes: Domains Bacteria and Archaea The Prokaryotes 1 2 3 4 5 6 7 8 9 10 11 12 Chapter 11 The Prokaryotes: Domains Bacteria and Archaea The Prokaryotes Domain Bacteria Proteobacteria From the mythical Greek god Proteus, who could assume many shapes Gram-negative

More information

Bacterial Molecular Phylogeny Using Supertree Approach

Bacterial Molecular Phylogeny Using Supertree Approach Genome Informatics 12: 155-164 (2001) Bacterial Molecular Phylogeny Using Supertree Approach Vincent Daubin daubin@biomserv.univ-lyonl.fr Guy Perriere perriere@biomserv.univ-lyonl.fr Manolo Gouy gouy@biomserv.univ-lyonl.fr

More information

Domain Bacteria. BIO 220 Microbiology Jackson Community College

Domain Bacteria. BIO 220 Microbiology Jackson Community College Domain Bacteria BIO 220 Microbiology Jackson Community College John Ireland, Ph.D. 2006 Scientific Nomenclature Domain - Bacteria Phylum Important for gross characteristics Class Intermediate characteristics

More information

The impact of the neisserial DNA uptake sequence on genome evolution and stability

The impact of the neisserial DNA uptake sequence on genome evolution and stability The impact of the neisserial DNA uptake sequence on genome evolution and stability Ole Herman Ambur The Tønjum Group Transformation Neisserial transformation requires: DNA uptake sequence (DUS) Transformation

More information

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition David D. Pollock* and William J. Bruno* *Theoretical Biology and Biophysics, Los Alamos National

More information

Measure representation and multifractal analysis of complete genomes

Measure representation and multifractal analysis of complete genomes PHYSICAL REVIEW E, VOLUME 64, 031903 Measure representation and multifractal analysis of complete genomes Zu-Guo Yu, 1,2, * Vo Anh, 1 and Ka-Sing Lau 3 1 Centre in Statistical Science and Industrial Mathematics,

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S1 (box). Supplementary Methods description. Prokaryotic Genome Database Archaeal and bacterial genome sequences were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/)

More information

Genome reduction in prokaryotic obligatory intracellular parasites of humans: a comparative analysis

Genome reduction in prokaryotic obligatory intracellular parasites of humans: a comparative analysis International Journal of Systematic and Evolutionary Microbiology (2004), 54, 1937 1941 DOI 10.1099/ijs.0.63090-0 Genome reduction in prokaryotic obligatory intracellular parasites of humans: a comparative

More information

1. Prokaryotic Nutritional & Metabolic Adaptations

1. Prokaryotic Nutritional & Metabolic Adaptations Chapter 27B: Bacteria and Archaea 1. Prokaryotic Nutritional & Metabolic Adaptations 2. Survey of Prokaryotic Groups A. Domain Bacteria Gram-negative groups B. Domain Bacteria Gram-positive groups C. Domain

More information

Consensus Methods. * You are only responsible for the first two

Consensus Methods. * You are only responsible for the first two Consensus Trees * consensus trees reconcile clades from different trees * consensus is a conservative estimate of phylogeny that emphasizes points of agreement * philosophy: agreement among data sets is

More information

BATMAS30: Amino Acid Substitution Matrix for Alignment of Bacterial Transporters

BATMAS30: Amino Acid Substitution Matrix for Alignment of Bacterial Transporters PROTEINS: Structure, Function, and Genetics 51:85 95 (2003) BATMAS30: Amino Acid Substitution Matrix for Alignment of Bacterial Transporters Roman A. Sutormin, 1 * Aleksandra B. Rakhmaninova, 2 and Mikhail

More information

Name: Class: Date: ID: A

Name: Class: Date: ID: A Class: _ Date: _ Ch 17 Practice test 1. A segment of DNA that stores genetic information is called a(n) a. amino acid. b. gene. c. protein. d. intron. 2. In which of the following processes does change

More information

A Structural Equation Model Study of Shannon Entropy Effect on CG content of Thermophilic 16S rrna and Bacterial Radiation Repair Rec-A Gene Sequences

A Structural Equation Model Study of Shannon Entropy Effect on CG content of Thermophilic 16S rrna and Bacterial Radiation Repair Rec-A Gene Sequences A Structural Equation Model Study of Shannon Entropy Effect on CG content of Thermophilic 16S rrna and Bacterial Radiation Repair Rec-A Gene Sequences T. Holden, P. Schneider, E. Cheung, J. Prayor, R.

More information

Classification, Phylogeny yand Evolutionary History

Classification, Phylogeny yand Evolutionary History Classification, Phylogeny yand Evolutionary History The diversity of life is great. To communicate about it, there must be a scheme for organization. There are many species that would be difficult to organize

More information

Classification and Phylogeny

Classification and Phylogeny Classification and Phylogeny The diversity of life is great. To communicate about it, there must be a scheme for organization. There are many species that would be difficult to organize without a scheme

More information

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Supplementary Note S2 Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Phylogenetic trees reconstructed by a variety of methods from either single-copy orthologous loci (Class

More information

Phylogeny and the Tree of Life

Phylogeny and the Tree of Life Chapter 26 Phylogeny and the Tree of Life PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley with contributions from

More information

Chapter 26 Phylogeny and the Tree of Life
