The Chloroplast Genome of Nymphaea alba: Whole-Genome Analyses and the Problem of Identifying the Most Basal Angiosperm

Size: px
Start display at page:

Download "The Chloroplast Genome of Nymphaea alba: Whole-Genome Analyses and the Problem of Identifying the Most Basal Angiosperm"

Transcription

1 The Chloroplast Genome of Nymphaea alba: Whole-Genome Analyses and the Problem of Identifying the Most Basal Angiosperm Vadim V. Goremykin,* Karen I. Hirsch-Ernst, Stefan Wölfl,à and Frank H. Hellwig* *Institut für Spezielle Botanik, Universität Jena, Jena, Germany; Zentrum Pharmakologie und Toxikologie, Universität Göttingen, Göttingen, Germany; and àklinik für Innere Medizin, Universität Jena, Jena, Germany Angiosperms (flowering plants) dominate contemporary terrestrial flora with roughly 250,000 species, but their origin and early evolution are still poorly understood. In recent years, molecular evidence has accumulated suggesting a dicotyledonous origin of monocots. Phylogenetic reconstructions have suggested that several dicotyledonous groups that include taxa such as Amborella, Austrobaileya, and Nymphaea branch off as the most basal among angiosperms. This has led to the concept of monocots, eudicots, basal dicots, and ANITA groupings. Here, we present the sequence and phylogenetic analyses of the chloroplast DNA of Nymphaea alba. Phylogenetic analyses of our 14-species data set, consisting of 29,991 aligned nucleotide positions per chloroplast genome, revealed consistent support for Nymphaea being a divergent member of a monophyletic dicot assemblage. Three distinct angiosperm lineages were supported in the majority of our phylogenetic analyses eudicots, Magnoliopsida, and monocots. However, the monocot lineage leading to the grasses was the deepest branching. Although analyses of only one individual gene alignment (out of 61) is consistent with some recently proposed hypotheses for the paraphyly of dicots, we also report observations that nine genes do not support paraphyly of dicots. Instead, they support the basal monocot-dicot split. Consistent with this finding, we also report observations suggesting that the monocot lineage leading to the grasses has the strongest phylogenetic affinity to gymnosperms. Our findings have general implications for studies of substitution model specification and analyses of concatenated genome data. Introduction Key words: Nymphaea, chloroplast genomes, angiosperms, gymnosperms, molecular evolution, substitution rates. Vadim.Goremykin@uni-jena.de. Mol. Biol. Evol. 21(7): doi: /molbev/msh147 Advance Access publication April 14, 2004 A new consensus view of higher-level angiosperm systematics is currently emerging, based mostly on the analysis of three genes (reviewed in Savolainen and Chase [2003]). In this view, the dicot lineages, including Nymphaea and Amborella, are regarded as the deepest branching among angiosperms. Recent prominent studies (Adams et al. 2002; Bergthorsson et al. 2003) have taken for granted that this consensus is correct. In our previous studies involving chloroplast genome analyses (Goremykin et al. 2003a, 2003b), we noticed that although the optimal symmetrical model (GTR 1 I 1 ÿ) identified by Modeltest (Posada and Crandall 1998) on concatenated chloroplast data was consistent with a basal position of the magnoliids Calycanthus and Amborella, tree building with many other symmetric substitution models did not support this hypothesis. In fact, an alternative hypothesis placing monocots as basal was in most analyses favored with 100% nonparametric bootstrap support. In our paper, we raised the concern of model specification and whether or not the best-fitting symmetric model for our concatenated data gave the most biologically realistic result. In an effort to shed more light on these earlier findings, we have sequenced the chloroplast genome of Nymphaea alba. It was hoped that addition of this putatively basal species would help break up long branches and, thus, help stabilize the angiosperm tree topology. Nymphaea alba belongs to Nymphaeaceae (Nymphaeales). Like Amborella, Nymphaeales (excluding Nelumbonaceae) have no vessels (Cronquist 1981; Takhtajan 1966). Their position has been viewed by classical taxonomists as somehow intermediate between monocotyledonous and dicotyledonous flowering plants. Some botanists (Rohweder and Endress 1983) noted certain similarities such as floral organs in trimerous whorls uniting members of this order (Cabomba) and primitive monocotyledons (Alisma). Others considered the morphological similarities between Nymphaeales and the monocots (rhizodermis differentiated into short and long cells, dispersed vascular bundles, compound midrib, and operculate seeds) important enough to actually include the Nymphaeales into the monocotyledons (Schaffner 1904, 1934; Guttenberg and Müller-Schröder 1958; Haines and Lye 1975). A detailed investigation of embryonal development of Nymphaeales (Lodkina 1988) revealed asynchronous and asymmetric development of cotyledons, which was interpreted by the author as evidence of an ancestral position of Nympheales to monocots. Indeed, a number of scientists for different reasons believed that Nymphaeales belong to the stock of early lineages from which monocots arose (Arber 1920; Takhtajan 1973; Cronquist 1981; Dahlgren and Clifford 1982). Many character states of Nymphaeales considered to be primitive (e.g., uniaperturate pollen, apocarpous gyneceum, numerous stamens, and laminar placentation) point to the antiquity of the order. The complex polymerous flowers of water lilies with no clear differentiation between sepals and carpels fit very well into the euanthian school of flower origin theories. Fossil record supports the great age of Nymphaeales. Friis, Pedersen, and Crane (2001) found fossilized nymphaealean flowers dating back to the early Cretaceous period (125 to 115 Myr) and belonging to the oldest fossil assemblages that contain unequivocal angiosperm stamens and carpels. The representatives of this order tend to be among the first taxa to diverge from the other angiosperm lineages in many molecular studies. They are either included in the ANITA (Amborella, Nymphaeales, and Illiciales-Trimeniales-Aristolochiales) grade (Qiu et al. 1999; Soltis, Soltis, and Chase 1999; Barkman et al. 2000; Graham and Olmstead 2000; Zanis et al. 2002) or form a Molecular Biology and Evolution vol. 21 no. 7 Ó Society for Molecular Biology and Evolution 2004; all rights reserved.

2 1446 Goremykin et al. clade with Amborella,which is a sister to all other angiosperms (Barkman et al. 2000; Graham and Olmstead 2000; Zanis et al. 2002). Recent publications have not been able to discriminate between those hypotheses. It seems that resolution of this issue will require larger amounts of sequence data. The data set of the 61 chloroplast-coding genes we present and investigate here is the largest available to date for addressing the issue of early-branching angiosperms. Materials and Methods Genomic Sequencing Nymphaea alba leaves were harvested from a plant growing in the botanical gardens of the University of Jena, Germany. Total DNA was extracted from the leaves employing the CTAB method (Murray and Thompson 1980) and further purified with Quiagen columns (Quiagen, Valencia, Calif.) according to the manufacturer s protocol. The Nymphaea alba plastome sequence was amplified by implementing a long-range PCR strategy with primers developed from the alignment of known chloroplast genome sequences, as described previously (Goremykin et al. 2003a, 2003b). We covered the entire Nymphaea alba chloroplast genome with PCR products, which exhibited lengths between approximately 4 to 20 kb. The inverted repeat regions of the cpdna were amplified separately, each with two PCR products extending to the flanking sequences of the single-copy regions and overlapping in the middle of the respective repeat. PCR products were purified by electrophoresis through lowmelting agarose followed by digestion with agarase and were subsequently sheared by nebulization, yielding fragments of 0.5 to 1.5 kb in length. The fragments were cloned into the 4Blunt-TOPO vector employing the TOPO Shotgun Subcloning kit (Invitrogen, Groningen, The Netherlands), according to the manufacturer s protocol. Recombinant plasmids containing individual fragments were isolated from transformed E. coli clones with the Montage Plasmid Miniprep kit (Millipore, Eschborn, Germany). Sequencing reactions with plasmid DNA were prepared using the Big Dye Terminator sequencing kit (ABI, Foster City, Calif.). Automated sequencing was performed on ABI 3100, ABI 377 (ABI), and MegaBACE 1000 (Amersham/Pharmacia Biotech, Uppsala, Sweden) sequencers. Sequence Assembly and Annotation All automated sequencer traces were base-called with the PHRED program (Ewing et al. 1998), and sequence masking and assembly were performed with the STADEN package (Staden, Beal, and Bonfield 2000). Sequencing data were accumulated to 103 coverage for all PCR fragments; remaining gaps were closed by PCR. The Nymphaea alba chloroplast genome sequence has been deposited in the EMBL database under the accession number AJ The primer sequences used for the amplification of plastome sequences by PCR and the alignments employed for phylogenetic analyses are available upon request. The genome was annotated as described previously (Goremykin et al. 2003b). Results General Genome Properties The chloroplast DNA of Nymphaea alba is a 159,930- bases-long circular molecule, which has a structure typical for many land plants large and small single-copy regions separated by inverted repeat regions. Using PHRED confidence values, we determined the total plastome assembly to contain 0.03 incorrectly read bases, which suggests no mistakes in the genomic sequence. The G1C content of the plastome is 39.15%, which is close to that of the other angiosperms (Amborella has, for example, 38.3% G1C, Nicotiana has 37.8%, and Zea has 38.4%). The G1C content of the protein-coding genes of known function found on the Nymphaea alba cpdna is close to the overall one (40.1%). However, the above bases are not uniformly distributed across the different codon positions. The synonymous third codon positions have a G1C content of 31.5%, whereas the first and the second codon positions have a G1C content closer to uniform (i.e., 44.5%). Both the gene order and the gene content of the genome under study are identical to those of Amborella trichopoda cpdna (Goremykin et al. 2003a) and are very similar to the gene order and the gene content of Calycanthus fertilis cpdna (Goremykin et al. 2003b), with the exception of the hypothetical ACR-toxin sensitivity gene (ACRS) open reading frame (ORF) found in the latter genome and shorter inverted repeat region of the Calycanthus plastome, not including the rpl2 gene. The gene map of the Nymphaea alba cpdna is presented in figure 1. Alignment and Data Properties The chloroplast genome of Nymphaea alba contains all 61 genes common to completely sequenced chloroplast genomes of the land plants (Ohyama et al. 1986; Shinozaki et al. 1986; Hiratsuka et al. 1989; Wakasugi et al. 1994; Maier et al. 1995; Sato et al. 1999; Hupfer et al. 2000; Kato et al. 2000; Schmitz-Linneweber et al. 2001; Ogihara et al. 2002; Goremykin et al. 2003a, 2003b). Individual alignments of the first and the second codon positions of 61 genes and alignments of their translated sequences were produced with our ClustalW-embedded Perl script. They were manually concatenated and edited to produce a 29,991-position-long nucleotide alignment and a 14,811 amino acid alignment used in phylogenetic analyses. In the nucleotide alignment, we excluded the third codon positions because it could pose problems in phylogeny reconstruction for the application of fitting and tree building with symmetric substitution models. These sites tend to exhibit high and irregular AT-contents and were found to be very divergent in comparison Pinus vs. angiosperms (Goremykin et al. 2003a, 2003b). Even at the first 1 second codon positions, several species in the data set do not pass a 5% chi-square test of compositional homogeneity. This also raises some concern for phylogenetic analysis of angiosperm/outgroup 112 sequence data sets, which we address. Analyses of Concatenated Nucleotide Alignment The tree depicting the inferred phylogenetic relationship of the species under analysis is presented in figure 2.

3 The Chloroplast Genome of Nymphaea alba 1447 FIG. 1. Nymphaea alba cpdna. The topmost part of the map corresponds to the start and the end of the EMBL sequence entry AJ Genes shown inside the circle are transcribed clockwise, and genes outside the circle are transcribed counterclockwise. The genes of the genetic apparatus are shown in red, photosynthesis genes are indicated as green, and genes of NADH dehydrogenase are shown in violet. The ORFs, ycfs, and genes of unknown function are designated as gray. Intron-containing genes (names of which are indicated in blue) are represented by their exons. In the cases when two genes overlap, one of them is shifted off the map to show its position. This topology was found in distance analysis of the 29,991 position alignment of the first and the second coding positions of 61 chloroplast genes employing Tajima-Nei substitution model as implemented in the Treecon package (Van de Peer et al. 1994) and further confirmed in distance analyses with Jukes-Cantor, Kimura two-parameter Felsenstein F81, Felsenstein F84, Kimura three-parameter, Hasegawa, Kishino and Yano, Tajima-Nei, Tamura-Nei, and General time-reversible (GTR) models as implemented in the PAUP* package (Swofford 2002), both with and without gamma correction (employing alpha shape parameter 0.27). The branches uniting dicotyledons,

4 1448 Goremykin et al. ((outgroups (Calycanthus (Nymphaea, Amborella)))(monocots, eudicots)) with exactly the same settings and models that we used with Tree-Puzzle. The branch separating outgroups with magnoliids from the rest of the species received low (,60%) QPS support in these analyses. As with our earlier findings (Goremykin et al. 2003a), the best fit model found by the Modeltest program (Posada and Crandall 1998) suggested a General Time-Reversible model with positional rate heterogeneity. Tree building under heuristic ML (PAUP*) with the optimal symmetric model yielded a topology with a clade bearing Nymphaea and Amborella branching first among the angiosperms, followed by Calycanthus, and then by the dichotomy monocots-eudicots. However, as with the findings reported previously, deviations in the choice of model and parameter values gave trees wherein branches united dicotyledons, magnoliids, and Nymphaea with Amborella. In these analyses monocots were basal. FIG. 2. Neighbor-joining tree built from Tajima-Nei distances derived from analysis of the alignment of the first and the second codon positions from 61 protein-coding genes common to the plastomes of land plants. magnoliids and Nymphaea with Amborella were recovered with 100/100 bootstrap proportion (BP) support in all above analyses. Several species in the data set do not pass the 5% chisquare test of compositional homogeneity. The compositional biases in these sequences, however, would not be expected to affect the phylogeny reconstruction because the topology presented on figure 2 was also recovered in the LogDet analysis with 100/100 BP support for all aforementioned branches. This topology was further confirmed in maximumparsimony analyses performed with heuristic and branch and bound searches. The bootstrap values supporting the monophyly of dicotyledonous plants, monophyly of Magnoliopsida, and sister group relationship between Amborella and Nymphaea remained on the maximum level in these analyses. The same result was recovered in maximum-likelihood (ML) analyses performed with the Tree-Puzzle program (Strimmer and von Haeseler 1996). The ML tree built with the Hasegawa, Kishino, and Yano model of substitution was congruent to the topology shown on figure 2. The monophyletic status of dicots, Magnoliids, and Nymphaea with Amborella received, respectively, 95, 98, and 99 quartet puzzling support (QPS). Applying the Tamura-Nei model of substitution resulted in no changes in topology and in a slight change of support for the above three branches: 95, 97, 99 QPS (respectively). The hypothetical branch uniting all angiosperms under analysis with the exception of Nymphaea and Amborella, a branch that would be in compliance with the ANITA grade hypothesis, received no support in these QP ML analyses. The heuristic ML searches performed with PAUP* yielded different results. The quartet-puzzling algorithm implemented in this program found an alternative topology Analyses of Concatenated Amino Acid Alignment These results were further tested with analyses of the 14,811-position-long alignment of the translated sequences. Heuristic search employing maximum-parsimony algorithm (PAUP*) yielded the topology congruent with monocots basal (fig. 2). The branch bearing Nymphaea and Amborella and the one uniting all dicotyledonous plants were found in 100/100 bootstrap trees, whereas the branch bearing the three magnoliids was recovered in 99/100 bootstrap replicas. Distance analyses of the protein alignment were performed with the Treecon package employing Kimura and Tajima-Nei models and with the PHYLIP package (Felsenstein 1989) with Dayhoff model. These analyses resulted in topologies identical to the one presented on figure 2 with 100/100 BP support values for the three aforementioned branches. The neighbor-net (Bryant and Moulton 2004) tree built with the Protein LogDet method (Tholesson 2004) with all amino acid sites included had the topology congruent to the one shown in figure 2. Maximum-likelihood analyses were performed with the Tree-Puzzle program with default settings and root assigned to Marchantia. The branching order of the eudicot clades (Nicotiana/Spinacia), (Arabidopsis/Lotus), and the one leading to Oenothera could not be resolved in all ML analyses. The eudicot monophyly though, as well as monophyly of magnoliids, dicots, and the sister group relationship between Amborella and Nymphaea, received strong support. With Müller-Vingron, BLOSUM, Adachi- Hasegawa, Dayhoff, and Jones and Jones substitution models, the lowest QPS value supporting the eudicot branch in the above ML analyses was 96 QPS. The single lowest quartet-puzzling support value obtained for the other three branches was 97. Analyses of the Individual Alignments Rearrangements in 11 plastomes of the spermatophytes affecting gene order involve large chunks of DNA. Therefore, on the level of at least spermatophytes, orthology of every gene under analysis as well as common evolutionary history of all 61 genes can easily be proved

5 The Chloroplast Genome of Nymphaea alba 1449 on the basis of gene order identity and general sequence similarity along the large stretches of cpdna from different species. Yet the land plant tree topologies derived from the chloroplast genes are often different. To investigate possible phylogenetic biases of individual genes, we counted the bootstrap values supporting the branches bearing outgroups with Nymphaea, Amborella or with Nymphaea and Amborella taken together, Calycanthus, Magnoliopsida, Rosopsida, and grasses in NJ/ GTR trees built from the alignments of the first and the second codon positions from 61 coding genes common to the genomes of the land plants. The results of these analyses are shown in table 1. Here, we label a certain branch to be supported by a protein alignment when corresponding BP support value is no lower than the arbitrary value of 60%. One can see that one of six branches is supported by a much larger number of genes than the other five. This is the branch uniting outgroups with grasses that we recovered from the majority of analyses of the concatenated data set. This branch found some support in the analysis of atpf (67% BP), matk (60% BP), rpob (99% BP), rpoc1 (100% BP), rpoc2 (99% BP), rps12 (61% BP), rps18 (76% BP), rps3 (77% BP), and rps8 alignments (90% BP). These alignments exhibit, respectively, 117, 479, 483, 357, 939, 24, 50, 146, and 81 informative positions. We found no clear cases of support on the level of individual proteins for the branches uniting outgroups with (1) Nymphaea, (2) Amborella, and (3) Magnoliopsida. The branches bearing outgroups with (1) Nymphaea 1 Amborella, (2) Calycanthus, and (3) Rosopsida were supported each by a single protein, by, respectively, 74% BP (psbf), 73% BP (psaj), and 70% BP ( ycf 3). The number of informative positions in alignments of these proteins are, respectively, 7, 17, and 35. Additional Tests We wished to evaluate support for the clade grouping the monocotyledoneous species with outgroups in our analyses. One simple way of testing outgroup affinity of different angiosperm branches would be to count the number of positions in which a group to be tested share the same character state with the most closely related outgroup that is not observed in all other angiosperm sequences. Given low level of homoplasy (for example by excluding the highly homoplastic third positions), one would expect such positions in a group to be tested to contain the ancestral character states existing before the splitting of angiosperms and the outgroup that subsequently mutated in other angiosperm lineage. Another way to test the basal monocot-dicot split would be to count the number of synapomorphies supporting alternative branches within the angiosperm ingroup. One can expect more synapomorphies between more closely related ingroup taxa. We found the concatenated alignment of the first and the second codon positions sampled from 61 chloroplast genes to contain 68 positions with bases shared between three grasses and Pinus to the exclusion of other angiosperms. Three species of magnoliopsida (Calycanthus, Nymphaea, and Amborella) share only 13 such positions with Pinus and five species of Rosopsida share only five. The part of the ANITA grade (Nymphaea 1 Amborella 1 Pinus) is supported by 10 positions. We deleted nonspermatophyte outgroups and repeated the analysis. The number of positions supporting outgoup affinity of the above four groups changed to 151 (grasses 1 Pinus), 56 (Nymphaea 1 Amborella 1 Pinus), 30 (Calycanthus 1 Amborella 1 Nymphaea 1 Pinus), and 19 (Rosopsida 1 Pinus). For comparison, monophyly of angiosperms and spermatophytes is supported by 532 and 631 positions, respectively. The positions supporting outgroup affinity of grasses and of the species from the ANITA grade are shown in figure 3. In the total alignment of the first and the second codon positions, the mean distance across the range Pinus to monocots is 0.17 substitutions/position (ML estimation with Tamura-Nei model of substitution). Given that value, the probability that a position mutated twice since the separation of the gymnosperm and monocot lines is , which corresponds to one twice-mutated position out of Therefore, if the rate of substitutions in 151 alignment positions supporting grouping of monocots with Pinus (fig. 3) does not exceed the mean one characteristic of the whole alignment, this subset could be expected to contain approximately four homoplastic positions. The mean distance among eight dicotyledonous plants in the 151-position subset is 0.11, which is higher than the corresponding distance observed in the total alignment of 61 genes (0.065 substitutions/position, same model). Taking into account this somewhat elevated substitution rate, the 151-position subset can be expected to contain approximately seven homoplastic positions. One can also note that the GC content of the 151-position subset (49.3%) is close to equilibrium and is similar to the total GC content of the alignment (44.4%). However, because the first codon positions can undergo synonymous substitutions, it is not immediately clear from the above observations whether support for gymnosperm affinity of the line leading to Zea, Oryza, and Triticum would be reflected in the protein sequences. So we repeated the analysis, this time using a 14,811 amino acid alignment. It was found to contain 45 positions that clearly support the separation of grasses and Pinus from the rest of the species under analysis. By contrast, the affinity of ANITA members to Pinus was supported by 19 positions, and that of Magnoliopsida and Rosopsida was supported by nine and four positions, respectively. As in the analyses of nucleotide sequences, removal of fern and liverwort sequences resulted in a stronger signal. In the alignment containing only spermatophyte species, there are 86 positions in which Pinus and three grasses share a character to the exclusion of other angiosperms, as opposed to 42, 13, and 11 positions supporting affinity of Pinus to, respectively, ANITA members, Magnoliopsida, and Rosopsida. The nucleotide alignment of 14 OTUs with 29,991 positions per species contains 93 synapomorphies shared among the dicots, whereas the number of synapomorphies supporting monophyly of eudicots and monocots is 17. In the 14,811 amino acid alignment, there are 93 synapomorphies that unite all dicots. The number of

6 1450 Goremykin et al. Table 1 Results of the Individual Analysis of the Alignments of the First and the Second Codon Positions Bootstrap Proportion Values a Alignment Length Number of Informative Positions Grasses Amborella Nymphaea Nymphaea 1 Amborella Calycanthus Magnoliids Rosopsida atpa atpb atpe atpf atph atpi ccsa cema clpp lhba matk peta petb petd petg petl petn psaa psab psac psai psaj psba psbb psbc psbd psbe psbf psbh psbi psbj psbk psbl psbm psbn psbt rbcl rpl rpl rpl rpl rpl rpl rpl rpoa rpob rpoc rpoc rps rps rps rps rps rps rps rps rps rps rps ycf ycf a Bootstrap proportion values supporting the branch bearing outgroups with grasses, Amborella, Nymphaea, Nymphaea plus Amborella, Magnoliopsida, and Rosopsida.

7 The Chloroplast Genome of Nymphaea alba 1451 FIG. 3. The alignment positions in which grasses (above) and members of ANITA group (below) share characters with Pinus to the exclusion of all other angiosperm lines in the alignment of the first and the second codon positions of 61 protein-coding genes common to 14 genomes of land plants. synapomorphies that unite monocots and eudicots in this alignment is 11. Discussion Our findings of strong contradictory signals between different phylogenetic analyses of angiosperm chloroplast genome data highlight the concern recently raised in Molecular Biology and Evolution over appropriate analysis of complete genome data and the problems of sequence concatenation (Holland et al. 2004). Our findings are also cautionary and provide insight into the importance of further complete genome sequences before strong conclusions can be drawn in respect of identification of the most basal angiosperms. In the majority of our analyses, we detected neither support for (1) monocot affinity of a line leading to modern Nymphaeales nor support for (2) an early appearance of Nymphaeales in the evolutionary history of flowering plants. These analyses suggested that Nymphaea alba is a derived representative of a dicot lineage, which is, of all species under analysis, most closely related to Amborella trichopoda. The clade bearing Amborella with Nymphaea was detected with lower support previously (Barkman et al. 2000, Graham and Olmstead 2000), although not its derived position, because it appeared as the first branch to

8 1452 Goremykin et al. split off among angiosperms in these trees. Because our taxon sampling remains limited, it is, however, premature to state close taxonomical relationship of Amborella and Nymphaeales. In analyses of the individual genes we registered multiple cases of support for the sister group relationship of those two plants, yet little support for their close gymnosperm affinity. It is the line leading to three grasses that was found in a most-basal position within the angiosperms in the highest number of phylogenetic analyses of the individual gene alignments. This branching was supported by the alignments of genes (matk, rpob, rpoc1, and rpoc2) containing the greatest numbers of informative positions among the 61 individual gene alignments we built. All alternative branches tested either received no support in these analyses or were supported each by a single alignment of a short gene with a low number of informative positions. The interpretation of these results is straightforward: There are no bona fide cases of support for alternative rootings of angiosperms on the level of individual gene alignments. The appearance of alternative topologies in analyses of individual genes is probably the result of stochastic variations in the substitution process becoming visible when character sampling becomes small, as was recently observed in the analysis of 108 genes from yeasts (Rokas et al. 2003). Because bootstrap proportion values are affected by the ratio of characters supporting different topologies but not by their absolute numbers, such small biased samples could still exhibit high BP support values for incorrectly inferred branches. Another conclusion that can be drawn from our analyses of individual genes is that the character-wise small data sets can be generally unreliable for elucidating phylogeny. Analyzed individually, no alignment recovered the very stable topology (fig. 2) we inferred from the analysis of the concatenated nucleotide alignment of 29,991 positions employing the same method of tree construction. This is consistent with the newer findings of Rokas et al. (2003) and is furthermore consistent with earlier analyses of chloroplast genome phylogeny comparing individual and concatenated alignments (Goremykin, Hansmann, and Martin 1997; Martin et al. 1998; Lockhart et al. 1999). An attempt to additionally test the outgroup affinity of the grasses revealed that they share the largest number of characters with Pinus to the exclusion of all other angiosperms among branches checked. The positions supporting that affinity are not hypermutated and do not exhibit any strong compositional bias. The number of such positions exceeds the number of positions supporting outgroup affinity of ANITA members approximately three times in the nucleotide and two times in the amino acid alignment; other conflicting signals are substantially less pronounced. It is possible for nonadjacent and nonrelated taxa on a tree to have more sequence identities than adjacent and related taxa if the molecular clock is violated in a way that the substitution rate among the nonadjacent taxa gets comparatively small. However, the above considerations cannot be applied to explain the result in the figure 3, because both the grasses and Pinus are born on the branches that are the longest among the spermatophytes. One can note that the above results are in good accord with the numbers of synapomorphies that can be observed within the angiosperm ingroup. In concatenated protein and nucleotide alignments, the clade uniting all dicots is supported by, respectively, eightfold and fivefold larger numbers of synapomorphies than an alternative clade bearing eudicots and monocots. These observations and results of individual gene analyses stand in sharp contrast to the topology favored by the optimal symmetric model applied to concatenated sequence data. This apparent contradiction may be easily explained. When the internal branches of the true underlying tree are short compared with the length of the external branches, tree building is expected to be problematic (Hendy and Penny 1987). This problem is exacerbated when sequence evolution is not well described by the assumed substitution model (Jermiin et al. 2004). In such cases, small deviations in model parameters can potentially lead to very different tree topologies supported by high BP support values. Once concatenated, the optimal symmetric model will merely be the best average fit to all these genes. The majority of analyses presented here converge on the basal monocot-dicot split. However, the extreme difference between internal ingroup and external outgroup branch lengths in the concatenated gene angiosperm tree shown in figure 2 suggests that we may be some ways from being confident of identifying the most basal angiosperm. Clearly, the sequencing of genomes for more closely related outgroups and putatively basal angiosperms will be important for overcoming potential problems of model misspecification and long-branch attraction. Supplementary Material The sequence reported in this paper has been deposited in the EMBL database (accession number AJ627251). Acknowledgment This publication was supported by a grant of the Deutsche Forschungsgemeinschaft. Literature Cited Adams, K. L., Y. L. Qiu, M. Stoutemyer, and J. D. Palmer Punctuated evolution of mitochondrial gene content: High and variable rates of mitochondrial gene loss and transfer to the nucleus during angiosperm evolution. Proc. Natl. Acad. Sci. USA 99: Arber, A Water plants: A study of the aquatic angiosperm. Cambridge University Press. London. Barkman, T. J., G. Chenery, J. R. McNeal, J. Lyons-Weiler, W. J. Ellisens, G. Moore, A. D. Wolfe, and C. W. depamphilis Independent and combined analyses of sequences from all three genomic compartments converge on the root of flowering plant phylogeny. Proc. Natl. Acad. Sci. USA 97: Bergthorsson, U., K. L. Adams, B. Thomason, and J. D. Palmer Widespread horizontal transfer of mitochondrial genes in flowering plants. Nature 424:

9 The Chloroplast Genome of Nymphaea alba 1453 Bryant, D., and Moulton, V Neighbor-net: an agglomerative method for the construction of phylogenetic networks. Mol. Biol. Evol. 21: Cronquist, A An integrated system of classification of flowering plants. Columbia University Press, New York. Dahlgren, R. M. T., and H. T. Clifford The Monocotyledons: a comparative study. Academic Press, New York. Ewing, B., L. Hillier, M. C. Wendl, and P. Green Basecalling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8: Felsenstein, J PHYLIP (phylogeny inference package). Version 3.2. Cladistics 5: Friis, E. M., K. R. Pedersen, and P. R. Crane Fossil evidence of water lilies (Nymphaeales) in the early Cretaceous. Nature 410: Goremykin, V., S. Hansmann, and W. Martin Evolutionary analysis of 58 proteins encoded in six completely sequenced chloroplast genomes: revised molecular estimates of two seed plant divergence times. Plant Syst. Evol. 206: Goremykin, V. V., K. I. Hirsch-Ernst, S. Wölfl, and F. H. Hellwig. 2003a. Analysis of the Amborella trichopoda chloroplast genome sequence suggests that Amborella is not a basal angiosperm. Mol. Biol. Evol. 20: b. The chloroplast genome of the basal angiosperm Calycanthus fertilis structural and phylogenetic analyses. Plant Syst. Evol. 242: Graham, S. W., and R. G. Olmstead Utility of 17 chloroplast genes for inferring the phylogeny of the basal angiosperms. Am. J. Bot. 87: Guttenberg, H. V., and R. Müller-Schröder Untersuchungen über die Entwicklung des Embryos und der Keimpflanze von Nuphar luteum. Planta 51: Haines, R. W., and K. A. Lye Seedlings of Nymphaeaceae. J. Linn. Soc. Bot. 70: Hendy, M. D., and D. Penny Edge lengths of trees from sequence data. Math. Biosci. 83: Hiratsuka, J., H. Shimada, R. Whittier et al. (16 co-authors) The complete sequence of the rice (Oryza sativa) chloroplast genome: intermolecular recombination between distinct trna genes accounts for a major plastid DNA inversion during the evolution of the cereals. Mol. Gen. Genet. 217: Holland, B. R., K. T. Huber, V. Moulton, and P. J. Lockhart. (in press). Using consensus networks to visualize contradictory evidence for species phylogeny. Mol. Biol. Evol. Hupfer, H., M. Swiatek, S. Hornung, R. G. Hermann, R. M. Maier, W.-L. Chiu, and B. Sears Complete nucleotide sequence of the Oenothera elata plastid chromosome, representing plastome I of the five distinguishable Euoenothera plastomes. Mol. Gen. Genet. 263: Jermiin, L. S., S. Y. W. Ho, F. Ababneh, J. Robinson, and A. W. D. Larkum. (in press). The biasing effect of compositional heterogeneity on phylogenetic estimates may be underestimated. Syst. Biol. Kato, T., T. Kaneko, S. Sato, Y. Nakamura, and S. Tabata Complete structure of the chloroplast genome of a legume, Lotus japonicus. DNA Res. 7: Lockhart, P. J., C. J. Howe, A. C. Barbrook, A. W. D. Larkum, and D. Penny Spectral analysis, systematic bias, and the evolution of chloroplasts. Mol. Biol. Evol. 16: Lodkina, M. M Evolutionary relationships of monocots and dicots derived from studies of embryo and seedlings data. Botanichesky Zhurnal 73: (in Russian). Maier, R. M., K. Neckermann, G. L. Igloi, and H. Kossel Complete sequence of the maize chloroplast genome: gene content, hotspots of divergence and fine tuning of genetic information by transcript editing. J. Mol. Biol. 251: Martin, W., B. Stoebe, V. Goremykin, S. Hansmann, M. Hasegawa, and K. V. Kowallik Gene transfer to the nucleus and the evolution of chloroplasts. Nature 393: Murray, M. G., and W. F. Thompson Rapid isolation of high molecular weight DNA. Nucleic Acids Res. 8: Ogihara, Y., K. Isono, T. Kojim et al. (19 co-authors) Structural features of a wheat plastome as revealed by complete sequencing of chloroplast DNA. Mol. Genet. Genomics 266: Ohyama, K., H. Fukuzawa, T. Kohchi et al. (13 co-authors) Chloroplast gene organization deduced from complete sequence of liverwort Marchantia polymorpha chloroplast DNA. Nature 322: Posada, D., and K. A. Crandall Modeltest: testing the model of DNA substitution. Bioinformatics. 14: Qiu, Y.-L., J. Lee, F. Bernasconi-Quadroni, D. E. Soltis, P. S. Soltis, M. Zanis, E. A. Zimmer, Z. Chen, V. Savolainen, and M. W. Chase The earliest angiosperms: evidence from mitochondrial, plastid and nuclear genomes. Nature 402: Rohweder, O., and P. K. Endress Samenpflanzen. Georg Thieme Verlag, Stuttgart. Rokas, A., B. L. Williams, N. King, and S. B. Carroll Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425: Sato, S., Y. Nakamura, T. Kaneko, E. Asamizu, and S. Tabata Complete structure of the chloroplast genome of Arabidopsis thaliana. DNA Res. 6: Savolainen, V., and M. W. Chase A decade of progress in plant molecular phylogenetics. Trends Genet. 19: Schaffner, J. H Some morphological peculiarities of the Nymphaeaceae and Helobiae. Ohio Nat. 4: Phylogenetic taxonomy of plants. Quart. Rev. Biol. 9: Schmitz-Linneweber, C., R. M. Maier, J. P. Alcaraz, A. Cottet, R. G. Herrmann, and R. Mache The plastid chromosome of spinach (Spinacia oleracea): complete nucleotide sequence and gene organization. Plant Mol. Biol. 45: Shinozaki, K., M. Ohme, M. Tanaka et al. (23 co-authors) The complete nucleotide sequence of the tobacco chloroplast genome: its gene organization and expression. EMBO J. 5: Soltis, P. S., D. E. Soltis, and M. W. Chase Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology. Nature 402: Staden, R., K. F. Beal, and J. K. Bonfield The Staden package, Methods Mol. Biol. 132: Strimmer, K., and A. von Haeseler Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13: Swofford, D. L PAUP*: phylogenetic analysis using parsimony (*and other methods). Version 4. Sinauer Associates, Sunderland, Mass. Takhtajan, A Systema et phylogenia Magnoliophytorum. Nauka, Moscow, Leningrad Evolution und Ausbreitung der Blütenpflanzen. Gustav Fischer Verlag, Jena. Tholesson, M LDDist: a Perl module for calculating LogDet pair-wise distances for protein and nucleotide sequences. Bioinformatics 20: Van de Peer, Y., and R. De Wachter TREECON for Windows: a software package for the construction and draw-

10 1454 Goremykin et al. ing of evolutionary trees for the Microsoft Windows environment. Comput. Applic. Biosci. 10: Wakasugi, T., J. Tsudzuki, S. Ito, K. Nakashima, T. Tsudzuki, and M. Sugiura Loss of all ndh genes as determined by sequencing the entire chloroplast genome of the black pine Pinus thunbergii. Proc. Natl. Acad. Sci. USA 91: Zanis, M. J., D. E. Soltis, P. S. Soltis, S. Mathews, and M. J. Donoghue The root of the angiosperms revisited. Proc. Natl. Acad. Sci. USA 99: Peter Lockhart, Associate Editor Accepted March 29, 2004

Removal of Noisy Characters from Chloroplast Genome-Scale Data Suggests Revision of Phylogenetic Placements of Amborella and Ceratophyllum

Removal of Noisy Characters from Chloroplast Genome-Scale Data Suggests Revision of Phylogenetic Placements of Amborella and Ceratophyllum J Mol Evol (29) 68:197 24 DOI 1.17/s239-9-926-9 Removal of Noisy Characters from Chloroplast Genome-Scale Data Suggests Revision of Phylogenetic Placements of Amborella and Ceratophyllum Vadim V. Goremykin

More information

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Supplementary Note S2 Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Phylogenetic trees reconstructed by a variety of methods from either single-copy orthologous loci (Class

More information

Supplemental Information. Full transcription of the chloroplast genome in photosynthetic

Supplemental Information. Full transcription of the chloroplast genome in photosynthetic Supplemental Information Full transcription of the chloroplast genome in photosynthetic eukaryotes Chao Shi 1,3*, Shuo Wang 2*, En-Hua Xia 1,3*, Jian-Jun Jiang 2, Fan-Chun Zeng 2, and 1, 2 ** Li-Zhi Gao

More information

The Phylogenetic Reconstruction of the Grass Family (Poaceae) Using matk Gene Sequences

The Phylogenetic Reconstruction of the Grass Family (Poaceae) Using matk Gene Sequences The Phylogenetic Reconstruction of the Grass Family (Poaceae) Using matk Gene Sequences by Hongping Liang Dissertation submitted to the Faculty of the Virginia Polytechnic Institute and State University

More information

Plastid Genome Phylogeny and a Model of Amino Acid Substitution for Proteins Encoded by Chloroplast DNA

Plastid Genome Phylogeny and a Model of Amino Acid Substitution for Proteins Encoded by Chloroplast DNA J Mol Evol (2000) 50:348 358 DOI: 10.1007/s002399910038 Springer-Verlag New York Inc. 2000 Plastid Genome Phylogeny and a Model of Amino Acid Substitution for Proteins Encoded by Chloroplast DNA Jun Adachi,

More information

C3020 Molecular Evolution. Exercises #3: Phylogenetics

C3020 Molecular Evolution. Exercises #3: Phylogenetics C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from

More information

Third-codon transversion rate-based Nymphaea basal angiosperm phylogeny -- concordance with developmental evidence

Third-codon transversion rate-based Nymphaea basal angiosperm phylogeny -- concordance with developmental evidence Third-codon transversion rate-based Nymphaea basal angiosperm phylogeny -- concordance with developmental evidence Xiaohan Yang*, Gerald A. Tuskan*, Timothy J. Tschaplinski, (Max) Zong-Ming Cheng* *Department

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

8/23/2014. Phylogeny and the Tree of Life

8/23/2014. Phylogeny and the Tree of Life Phylogeny and the Tree of Life Chapter 26 Objectives Explain the following characteristics of the Linnaean system of classification: a. binomial nomenclature b. hierarchical classification List the major

More information

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

CHAPTERS 24-25: Evidence for Evolution and Phylogeny CHAPTERS 24-25: Evidence for Evolution and Phylogeny 1. For each of the following, indicate how it is used as evidence of evolution by natural selection or shown as an evolutionary trend: a. Paleontology

More information

Letter to the Editor. Department of Biology, Arizona State University

Letter to the Editor. Department of Biology, Arizona State University Letter to the Editor Traditional Phylogenetic Reconstruction Methods Reconstruct Shallow and Deep Evolutionary Relationships Equally Well Michael S. Rosenberg and Sudhir Kumar Department of Biology, Arizona

More information

The origin of flowering plants and characteristics of angiosperm

The origin of flowering plants and characteristics of angiosperm Independent and combined analyses of sequences from all three genomic compartments converge on the root of flowering plant phylogeny Todd J. Barkman, Gordon Chenery, Joel R. McNeal, James Lyons-Weiler,

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the

More information

Effects of Gap Open and Gap Extension Penalties

Effects of Gap Open and Gap Extension Penalties Brigham Young University BYU ScholarsArchive All Faculty Publications 200-10-01 Effects of Gap Open and Gap Extension Penalties Hyrum Carroll hyrumcarroll@gmail.com Mark J. Clement clement@cs.byu.edu See

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition David D. Pollock* and William J. Bruno* *Theoretical Biology and Biophysics, Los Alamos National

More information

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other? Phylogeny and systematics Why are these disciplines important in evolutionary biology and how are they related to each other? Phylogeny and systematics Phylogeny: the evolutionary history of a species

More information

The Phylogenetic Handbook

The Phylogenetic Handbook The Phylogenetic Handbook A Practical Approach to DNA and Protein Phylogeny Edited by Marco Salemi University of California, Irvine and Katholieke Universiteit Leuven, Belgium and Anne-Mieke Vandamme Rega

More information

Efficiencies of maximum likelihood methods of phylogenetic inferences when different substitution models are used

Efficiencies of maximum likelihood methods of phylogenetic inferences when different substitution models are used Molecular Phylogenetics and Evolution 31 (2004) 865 873 MOLECULAR PHYLOGENETICS AND EVOLUTION www.elsevier.com/locate/ympev Efficiencies of maximum likelihood methods of phylogenetic inferences when different

More information

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

Integrative Biology 200 PRINCIPLES OF PHYLOGENETICS Spring 2018 University of California, Berkeley Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley B.D. Mishler Feb. 14, 2018. Phylogenetic trees VI: Dating in the 21st century: clocks, & calibrations;

More information

Phylogenetic Tree Reconstruction

Phylogenetic Tree Reconstruction I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

More information

Phylogenetic analyses. Kirsi Kostamo

Phylogenetic analyses. Kirsi Kostamo Phylogenetic analyses Kirsi Kostamo The aim: To construct a visual representation (a tree) to describe the assumed evolution occurring between and among different groups (individuals, populations, species,

More information

7. Tests for selection

7. Tests for selection Sequence analysis and genomics 7. Tests for selection Dr. Katja Nowick Group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute for Brain Research www. nowicklab.info

More information

LAB 4: PHYLOGENIES & MAPPING TRAITS

LAB 4: PHYLOGENIES & MAPPING TRAITS LAB 4: PHYLOGENIES & MAPPING TRAITS *This is a good day to check your Physcomitrella (protonema, buds, gametophores?) and Ceratopteris cultures (embryos, young sporophytes?)* Phylogeny Introduction The

More information

Consensus Methods. * You are only responsible for the first two

Consensus Methods. * You are only responsible for the first two Consensus Trees * consensus trees reconcile clades from different trees * consensus is a conservative estimate of phylogeny that emphasizes points of agreement * philosophy: agreement among data sets is

More information

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships Chapter 26: Phylogeny and the Tree of Life You Must Know The taxonomic categories and how they indicate relatedness. How systematics is used to develop phylogenetic trees. How to construct a phylogenetic

More information

Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26

Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26 Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 27. Phylogeny methods, part 4 (Models of DNA and

More information

Maria V. Yamburenko, Yan O. Zubo, Radomíra Vanková, Victor V. Kusnetsov, Olga N. Kulaeva, Thomas Börner

Maria V. Yamburenko, Yan O. Zubo, Radomíra Vanková, Victor V. Kusnetsov, Olga N. Kulaeva, Thomas Börner ABA represses the transcription of chloroplast genes Maria V. Yamburenko, Yan O. Zubo, Radomíra Vanková, Victor V. Kusnetsov, Olga N. Kulaeva, Thomas Börner Supplementary data Supplementary tables Table

More information

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Ziheng Yang Department of Biology, University College, London An excess of nonsynonymous substitutions

More information

Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22

Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22 Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 24. Phylogeny methods, part 4 (Models of DNA and

More information

Flowering plants (Magnoliophyta)

Flowering plants (Magnoliophyta) Flowering plants (Magnoliophyta) Susana Magallón Departamento de Botánica, Instituto de Biología, Universidad Nacional Autónoma de México, 3er Circuito de Ciudad Universitaria, Del. Coyoacán, México D.F.

More information

Lecture 6 Phylogenetic Inference

Lecture 6 Phylogenetic Inference Lecture 6 Phylogenetic Inference From Darwin s notebook in 1837 Charles Darwin Willi Hennig From The Origin in 1859 Cladistics Phylogenetic inference Willi Hennig, Cladistics 1. Clade, Monophyletic group,

More information

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016 Molecular phylogeny - Using molecular sequences to infer evolutionary relationships Tore Samuelsson Feb 2016 Molecular phylogeny is being used in the identification and characterization of new pathogens,

More information

How to read and make phylogenetic trees Zuzana Starostová

How to read and make phylogenetic trees Zuzana Starostová How to read and make phylogenetic trees Zuzana Starostová How to make phylogenetic trees? Workflow: obtain DNA sequence quality check sequence alignment calculating genetic distances phylogeny estimation

More information

FUNDAMENTALS OF MOLECULAR EVOLUTION

FUNDAMENTALS OF MOLECULAR EVOLUTION FUNDAMENTALS OF MOLECULAR EVOLUTION Second Edition Dan Graur TELAVIV UNIVERSITY Wen-Hsiung Li UNIVERSITY OF CHICAGO SINAUER ASSOCIATES, INC., Publishers Sunderland, Massachusetts Contents Preface xiii

More information

Phylogenetic inference

Phylogenetic inference Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types

More information

Cladistics and Bioinformatics Questions 2013

Cladistics and Bioinformatics Questions 2013 AP Biology Name Cladistics and Bioinformatics Questions 2013 1. The following table shows the percentage similarity in sequences of nucleotides from a homologous gene derived from five different species

More information

Beyond Reasonable Doubt: Evolution from DNA Sequences

Beyond Reasonable Doubt: Evolution from DNA Sequences : Evolution from DNA Sequences W. Timothy J. White 1., Bojian Zhong 1., David Penny 1 * Institute of Fundamental Sciences, Massey University, Palmerston North, New Zealand Abstract We demonstrate quantitatively

More information

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics Bioinformatics 1 Biology, Sequences, Phylogenetics Part 4 Sepp Hochreiter Klausur Mo. 30.01.2011 Zeit: 15:30 17:00 Raum: HS14 Anmeldung Kusss Contents Methods and Bootstrapping of Maximum Methods Methods

More information

The origin of angiosperms has long been considered a fundamental

The origin of angiosperms has long been considered a fundamental Phylogeny of seed plants based on all three genomic compartments: Extant gymnosperms are monophyletic and Gnetales closest relatives are conifers L. Michelle Bowe*, Gwénaële Coat, and Claude W. depamphilis

More information

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 2009 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 200 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

SPECIATION. REPRODUCTIVE BARRIERS PREZYGOTIC: Barriers that prevent fertilization. Habitat isolation Populations can t get together

SPECIATION. REPRODUCTIVE BARRIERS PREZYGOTIC: Barriers that prevent fertilization. Habitat isolation Populations can t get together SPECIATION Origin of new species=speciation -Process by which one species splits into two or more species, accounts for both the unity and diversity of life SPECIES BIOLOGICAL CONCEPT Population or groups

More information

7.1 Introduction. Summary

7.1 Introduction. Summary Regulation of Plastid Gene Expression by High Temperature during Light Induced Chloroplast Development in Dark Grown Wheat Seedlings : Study using the Cloned DNA Fragments of Chloroplast Genome Summary

More information

C.DARWIN ( )

C.DARWIN ( ) C.DARWIN (1809-1882) LAMARCK Each evolutionary lineage has evolved, transforming itself, from a ancestor appeared by spontaneous generation DARWIN All organisms are historically interconnected. Their relationships

More information

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Bio 1B Lecture Outline (please print and bring along) Fall, 2007 Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution

More information

Reconstructing the history of lineages

Reconstructing the history of lineages Reconstructing the history of lineages Class outline Systematics Phylogenetic systematics Phylogenetic trees and maps Class outline Definitions Systematics Phylogenetic systematics/cladistics Systematics

More information

Another Look at the Root of the Angiosperms Reveals a Familiar Tale

Another Look at the Root of the Angiosperms Reveals a Familiar Tale Syst. Biol. 63(3):368 382, 2014 The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com

More information

ASSESSING AMONG-LOCUS VARIATION IN THE INFERENCE OF SEED PLANT PHYLOGENY

ASSESSING AMONG-LOCUS VARIATION IN THE INFERENCE OF SEED PLANT PHYLOGENY Int. J. Plant Sci. 168(2):111 124. 2007. Ó 2007 by The University of Chicago. All rights reserved. 1058-5893/2007/16802-0001$15.00 ASSESSING AMONG-LOCUS VARIATION IN THE INFERENCE OF SEED PLANT PHYLOGENY

More information

Phylogenetic Analysis

Phylogenetic Analysis Phylogenetic Analysis Aristotle Through classification, one might discover the essence and purpose of species. Nelson & Platnick (1981) Systematics and Biogeography Carl Linnaeus Swedish botanist (1700s)

More information

Organelle genome evolution

Organelle genome evolution Organelle genome evolution Plant of the day! Rafflesia arnoldii -- largest individual flower (~ 1m) -- no true leafs, shoots or roots -- holoparasitic -- non-photosynthetic Big questions What is the origin

More information

Need for systematics. Applications of systematics. Linnaeus plus Darwin. Approaches in systematics. Principles of cladistics

Need for systematics. Applications of systematics. Linnaeus plus Darwin. Approaches in systematics. Principles of cladistics Topics Need for systematics Applications of systematics Linnaeus plus Darwin Approaches in systematics Principles of cladistics Systematics pp. 474-475. Systematics - Study of diversity and evolutionary

More information

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Bioinformatics tools for phylogeny and visualization. Yanbin Yin Bioinformatics tools for phylogeny and visualization Yanbin Yin 1 Homework assignment 5 1. Take the MAFFT alignment http://cys.bios.niu.edu/yyin/teach/pbb/purdue.cellwall.list.lignin.f a.aln as input and

More information

ANGIOSPERM DIVERGENCE TIMES: THE EFFECT OF GENES, CODON POSITIONS, AND TIME CONSTRAINTS

ANGIOSPERM DIVERGENCE TIMES: THE EFFECT OF GENES, CODON POSITIONS, AND TIME CONSTRAINTS Evolution, 59(8), 2005, pp. 1653 1670 ANGIOSPERM DIVERGENCE TIMES: THE EFFECT OF GENES, CODON POSITIONS, AND TIME CONSTRAINTS SUSANA A. MAGALLÓN 1,2 AND MICHAEL J. SANDERSON 3,4 1 Departamento de Botánica,

More information

PhyQuart-A new algorithm to avoid systematic bias & phylogenetic incongruence

PhyQuart-A new algorithm to avoid systematic bias & phylogenetic incongruence PhyQuart-A new algorithm to avoid systematic bias & phylogenetic incongruence Are directed quartets the key for more reliable supertrees? Patrick Kück Department of Life Science, Vertebrates Division,

More information

Algorithms in Bioinformatics

Algorithms in Bioinformatics Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Distance Methods Character Methods

More information

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution Today s topics Inferring phylogeny Introduction! Distance methods! Parsimony method!"#$%&'(!)* +,-.'/01!23454(6!7!2845*0&4'9#6!:&454(6 ;?@AB=C?DEF Overview of phylogenetic inferences Methodology Methods

More information

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline Phylogenetics Todd Vision iology 522 March 26, 2007 pplications of phylogenetics Studying organismal or biogeographic history Systematics ating events in the fossil record onservation biology Studying

More information

Phylogenetic Analysis

Phylogenetic Analysis Phylogenetic Analysis Aristotle Through classification, one might discover the essence and purpose of species. Nelson & Platnick (1981) Systematics and Biogeography Carl Linnaeus Swedish botanist (1700s)

More information

Phylogenetic Analysis

Phylogenetic Analysis Phylogenetic Analysis Aristotle Through classification, one might discover the essence and purpose of species. Nelson & Platnick (1981) Systematics and Biogeography Carl Linnaeus Swedish botanist (1700s)

More information

ESS 345 Ichthyology. Systematic Ichthyology Part II Not in Book

ESS 345 Ichthyology. Systematic Ichthyology Part II Not in Book ESS 345 Ichthyology Systematic Ichthyology Part II Not in Book Thought for today: Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else,

More information

Lecture 11 Friday, October 21, 2011

Lecture 11 Friday, October 21, 2011 Lecture 11 Friday, October 21, 2011 Phylogenetic tree (phylogeny) Darwin and classification: In the Origin, Darwin said that descent from a common ancestral species could explain why the Linnaean system

More information

Letter to the Editor. The Effect of Taxonomic Sampling on Accuracy of Phylogeny Estimation: Test Case of a Known Phylogeny Steven Poe 1

Letter to the Editor. The Effect of Taxonomic Sampling on Accuracy of Phylogeny Estimation: Test Case of a Known Phylogeny Steven Poe 1 Letter to the Editor The Effect of Taxonomic Sampling on Accuracy of Phylogeny Estimation: Test Case of a Known Phylogeny Steven Poe 1 Department of Zoology and Texas Memorial Museum, University of Texas

More information

Sequenced Mitochondrial Genomes of Bryophytes

Sequenced Mitochondrial Genomes of Bryophytes Mitochondrial Genomes of Bryophytes 1 Sequenced Mitochondrial Genomes of Bryophytes Asheesh Shanker Department of Bioscience and Biotechnology, Banasthali University, Rajasthan, India Abstract: The determination

More information

Chapter 26 Phylogeny and the Tree of Life

Chapter 26 Phylogeny and the Tree of Life Chapter 26 Phylogeny and the Tree of Life Biologists estimate that there are about 5 to 100 million species of organisms living on Earth today. Evidence from morphological, biochemical, and gene sequence

More information

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches Int. J. Bioinformatics Research and Applications, Vol. x, No. x, xxxx Phylogenies Scores for Exhaustive Maximum Likelihood and s Searches Hyrum D. Carroll, Perry G. Ridge, Mark J. Clement, Quinn O. Snell

More information

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from

More information

Phylogenetics. BIOL 7711 Computational Bioscience

Phylogenetics. BIOL 7711 Computational Bioscience Consortium for Comparative Genomics! University of Colorado School of Medicine Phylogenetics BIOL 7711 Computational Bioscience Biochemistry and Molecular Genetics Computational Bioscience Program Consortium

More information

Estimating Divergence Dates from Molecular Sequences

Estimating Divergence Dates from Molecular Sequences Estimating Divergence Dates from Molecular Sequences Andrew Rambaut and Lindell Bromham Department of Zoology, University of Oxford The ability to date the time of divergence between lineages using molecular

More information

UoN, CAS, DBSC BIOL102 lecture notes by: Dr. Mustafa A. Mansi. The Phylogenetic Systematics (Phylogeny and Systematics)

UoN, CAS, DBSC BIOL102 lecture notes by: Dr. Mustafa A. Mansi. The Phylogenetic Systematics (Phylogeny and Systematics) - Phylogeny? - Systematics? The Phylogenetic Systematics (Phylogeny and Systematics) - Phylogenetic systematics? Connection between phylogeny and classification. - Phylogenetic systematics informs the

More information

PHYLOGENY AND SYSTEMATICS

PHYLOGENY AND SYSTEMATICS AP BIOLOGY EVOLUTION/HEREDITY UNIT Unit 1 Part 11 Chapter 26 Activity #15 NAME DATE PERIOD PHYLOGENY AND SYSTEMATICS PHYLOGENY Evolutionary history of species or group of related species SYSTEMATICS Study

More information

Chapter 16: Reconstructing and Using Phylogenies

Chapter 16: Reconstructing and Using Phylogenies Chapter Review 1. Use the phylogenetic tree shown at the right to complete the following. a. Explain how many clades are indicated: Three: (1) chimpanzee/human, (2) chimpanzee/ human/gorilla, and (3)chimpanzee/human/

More information

Lecture 4. Models of DNA and protein change. Likelihood methods

Lecture 4. Models of DNA and protein change. Likelihood methods Lecture 4. Models of DNA and protein change. Likelihood methods Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 4. Models of DNA and protein change. Likelihood methods p.1/36

More information

The MADS-Box Floral Homeotic Gene Lineages Predate the Origin of Seed Plants: Phylogenetic and Molecular Clock Estimates

The MADS-Box Floral Homeotic Gene Lineages Predate the Origin of Seed Plants: Phylogenetic and Molecular Clock Estimates J Mol Evol (1997) 45:392 396 Springer-Verlag New York Inc. 1997 The MADS-Box Floral Homeotic Gene Lineages Predate the Origin of Seed Plants: Phylogenetic and Molecular Clock Estimates Michael D. Purugganan

More information

(Stevens 1991) 1. morphological characters should be assumed to be quantitative unless demonstrated otherwise

(Stevens 1991) 1. morphological characters should be assumed to be quantitative unless demonstrated otherwise Bot 421/521 PHYLOGENETIC ANALYSIS I. Origins A. Hennig 1950 (German edition) Phylogenetic Systematics 1966 B. Zimmerman (Germany, 1930 s) C. Wagner (Michigan, 1920-2000) II. Characters and character states

More information

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5. Five Sami Khuri Department of Computer Science San José State University San José, California, USA sami.khuri@sjsu.edu v Distance Methods v Character Methods v Molecular Clock v UPGMA v Maximum Parsimony

More information

What is Phylogenetics

What is Phylogenetics What is Phylogenetics Phylogenetics is the area of research concerned with finding the genetic connections and relationships between species. The basic idea is to compare specific characters (features)

More information

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega BLAST Multiple Sequence Alignments: Clustal Omega What does basic BLAST do (e.g. what is input sequence and how does BLAST look for matches?) Susan Parrish McDaniel College Multiple Sequence Alignments

More information

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

More information

Name. Ecology & Evolutionary Biology 2245/2245W Exam 2 1 March 2014

Name. Ecology & Evolutionary Biology 2245/2245W Exam 2 1 March 2014 Name 1 Ecology & Evolutionary Biology 2245/2245W Exam 2 1 March 2014 1. Use the following matrix of nucleotide sequence data and the corresponding tree to answer questions a. through h. below. (16 points)

More information

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT Inferring phylogeny Constructing phylogenetic trees Tõnu Margus Contents What is phylogeny? How/why it is possible to infer it? Representing evolutionary relationships on trees What type questions questions

More information

BMC Evolutionary Biology

BMC Evolutionary Biology BMC Evolutionary Biology BioMed Central Research article The complete plastid genome sequence of Welwitschia mirabilis: an unusually compact plastome with accelerated divergence rates Skip R McCoy 1, Jennifer

More information

Investigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST

Investigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST Investigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST Introduction Bioinformatics is a powerful tool which can be used to determine evolutionary relationships and

More information

Name: Class: Date: ID: A

Name: Class: Date: ID: A Class: _ Date: _ Ch 17 Practice test 1. A segment of DNA that stores genetic information is called a(n) a. amino acid. b. gene. c. protein. d. intron. 2. In which of the following processes does change

More information

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree) I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

More information

Chapter 19: Taxonomy, Systematics, and Phylogeny

Chapter 19: Taxonomy, Systematics, and Phylogeny Chapter 19: Taxonomy, Systematics, and Phylogeny AP Curriculum Alignment Chapter 19 expands on the topics of phylogenies and cladograms, which are important to Big Idea 1. In order for students to understand

More information

PGA: A Program for Genome Annotation by Comparative Analysis of. Maximum Likelihood Phylogenies of Genes and Species

PGA: A Program for Genome Annotation by Comparative Analysis of. Maximum Likelihood Phylogenies of Genes and Species PGA: A Program for Genome Annotation by Comparative Analysis of Maximum Likelihood Phylogenies of Genes and Species Paulo Bandiera-Paiva 1 and Marcelo R.S. Briones 2 1 Departmento de Informática em Saúde

More information

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley

PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION Integrative Biology 200B Spring 2009 University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley B.D. Mishler Jan. 22, 2009. Trees I. Summary of previous lecture: Hennigian

More information

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057 Estimating Phylogenies (Evolutionary Trees) II Biol4230 Thurs, March 2, 2017 Bill Pearson wrp@virginia.edu 4-2818 Jordan 6-057 Tree estimation strategies: Parsimony?no model, simply count minimum number

More information

Distances that Perfectly Mislead

Distances that Perfectly Mislead Syst. Biol. 53(2):327 332, 2004 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150490423809 Distances that Perfectly Mislead DANIEL H. HUSON 1 AND

More information

Algorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004,

Algorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004, Tracing the Evolution of Numerical Phylogenetics: History, Philosophy, and Significance Adam W. Ferguson Phylogenetic Systematics 26 January 2009 Inferring Phylogenies Historical endeavor Darwin- 1837

More information

MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS. Masatoshi Nei"

MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS. Masatoshi Nei MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS Masatoshi Nei" Abstract: Phylogenetic trees: Recent advances in statistical methods for phylogenetic reconstruction and genetic diversity analysis were

More information

Molecular Clocks. The Holy Grail. Rate Constancy? Protein Variability. Evidence for Rate Constancy in Hemoglobin. Given

Molecular Clocks. The Holy Grail. Rate Constancy? Protein Variability. Evidence for Rate Constancy in Hemoglobin. Given Molecular Clocks Rose Hoberman The Holy Grail Fossil evidence is sparse and imprecise (or nonexistent) Predict divergence times by comparing molecular data Given a phylogenetic tree branch lengths (rt)

More information

Phylogenetics in the Age of Genomics: Prospects and Challenges

Phylogenetics in the Age of Genomics: Prospects and Challenges Phylogenetics in the Age of Genomics: Prospects and Challenges Antonis Rokas Department of Biological Sciences, Vanderbilt University http://as.vanderbilt.edu/rokaslab http://pubmed2wordle.appspot.com/

More information

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz Phylogenetic Trees What They Are Why We Do It & How To Do It Presented by Amy Harris Dr Brad Morantz Overview What is a phylogenetic tree Why do we do it How do we do it Methods and programs Parallels

More information

Chapter 27: Evolutionary Genetics

Chapter 27: Evolutionary Genetics Chapter 27: Evolutionary Genetics Student Learning Objectives Upon completion of this chapter you should be able to: 1. Understand what the term species means to biology. 2. Recognize the various patterns

More information

Lecture Notes: Markov chains

Lecture Notes: Markov chains Computational Genomics and Molecular Biology, Fall 5 Lecture Notes: Markov chains Dannie Durand At the beginning of the semester, we introduced two simple scoring functions for pairwise alignments: a similarity

More information