Removal of Noisy Characters from Chloroplast Genome-Scale Data Suggests Revision of Phylogenetic Placements of Amborella and Ceratophyllum

J Mol Evol (2009) 68
DOI 10.1007/s…

Vadim V. Goremykin, Roberto Viola, Frank H. Hellwig

Received: 29 June 2008 / Accepted: 29 January 2009 / Published online: 27 February 2009
© Springer Science+Business Media, LLC 2009

Abstract It is widely appreciated that noisy, highly variable data can impede phylogeny reconstruction. Researchers have long omitted problematic data, such as third-codon positions and variable regions, from phylogenetic analyses. In analyses of the phylogenetic relationships of the angiosperms, however, including complete gene sequences in genome-scale alignments has become common practice. Here we demonstrate that this practice can be misleading. We show that support for the basal-most position of Amborella trichopoda among the angiosperms in the chloroplast genomic data rests on only a tiny subset (<1% of the total alignment length) of the most variable positions in the alignment, which exhibit a mean maximum likelihood (ML) distance among the angiosperm operational taxonomic units (OTUs) of approximately 36 substitutions/site. Exclusion of these positions leads to the disappearance of the basal Amborella branch. Likewise, the recently reported sister-group relationship of Ceratophyllum to the eudicots depends on the presence of the 2% most variable positions in the genomic alignment, which exhibit, on average, 2 substitutions/site in comparisons among the angiosperm OTUs. These observations highlight the need to exclude a certain proportion of saturated alignment positions from phylogenomic analyses.

V. V. Goremykin (corresponding author), R. Viola
IASMA Research Center, Via E. Mach 1, 38010 San Michele all'Adige, TN, Italy
vadim.goremykin@iasma.it

F. H. Hellwig
Institut für Spezielle Botanik, Universität Jena, Philosophenweg 16, 07743 Jena, Germany

Keywords Chloroplast genomes · Molecular evolution · Angiosperm diversification

Introduction

The first years of plant phylogenomics demonstrated that amassing a large number of characters is not sufficient to ensure the accuracy of phylogenetic inference (Soltis et al. 2004; Goremykin et al. 2005). We also observed that systematic errors inherent in current methods of phylogeny reconstruction may produce spurious results that are nonetheless strongly supported by the nonparametric bootstrap (Goremykin and Hellwig 2006). Recently, Jeffroy et al. (2006) made the interesting observation that a more reliable topology might be obtained with a worse-performing method applied to less saturated data than with a more exact method applied to highly saturated data. In particular, they observed that removing the fast-evolving positions makes the results of phylogeny reconstruction much less affected by nonphylogenetic (in particular compositional) signals and, consequently, recommended voluntarily discarding part of the data from a phylogenetic analysis.

In this study, we wished to check whether such a decrease in variability could provide new insights into the early diversification of angiosperms, an area of considerable recent debate (Stefanovic et al. 2004; Soltis et al. 2004; Goremykin et al. 2005; Leebens-Mack et al. 2005; Goremykin and Hellwig 2006; Jansen et al. 2007; Moore et al. 2007). We hypothesized that the proportion of data still bearing witness to these ancient diversification events might be low, given the relatively short geologic time span for the origin of the major angiosperm groups compared with the long time that has passed since then (Crane et al. 1995). Attempts to resolve this ancient radiation are complicated by massive extinction in certain angiosperm lineages, which left a number of highly isolated taxa subtended by long branches at the base of the angiosperm tree (Amborellales, Ceratophyllales, and Nymphaeales). Because of superimposed mutations, various attraction artefacts are quite probable under such conditions. Because nothing can be done to improve taxon sampling in the vicinity of such isolated angiosperm lineages, removing the distorting nonphylogenetic signal (i.e., noise) is currently the only available way to improve phylogenetic reconstruction for these taxa.

Previously we routinely excluded the highly divergent third-codon positions from our genomic analyses because of these concerns. Recently, Stefanovic et al. (2004) and Leebens-Mack et al. (2005) recommended that third-codon positions be included in phylogeny reconstruction studies of basal angiosperms. Following their suggestion, we present here results obtained with and without the third-codon positions, as well as results obtained after discarding a small proportion of the saturated alignment positions. Our data set comprises 95 chloroplast gene sequences as well as 2 introns and 7 intergenic transcribed spacers from the inverted repeat of the cpDNA. Phylogenetic reconstructions based on this data set suggest that the placement of Ceratophyllum as sister to the eudicots (Moore et al. 2007), as well as the well-publicised basal-most placement of Amborella among the extant angiosperms (most recently asserted by Jansen et al. 2007 in their phylogenomic study of cpDNA), are most likely artefacts caused by the presence of noisy data in the alignment.

Materials and Methods

Genome Sequencing

Fresh shoots of Ceratophyllum demersum were harvested from a plant grown at the Botanical Garden of the University of Jena, Germany.
Total DNA was extracted using the cetyltrimethylammonium bromide (CTAB)-based method (Murray and Thompson 1980) and purified with Qiagen columns according to the manufacturer's protocol (Qiagen, Valencia, CA). We employed a long-range polymerase chain reaction (PCR) strategy to cover the chloroplast genome with PCR products, as previously described (Goremykin et al. 2003). To fill the gaps in the genome assembly, we also developed a set of Ceratophyllum-specific primers. The resulting products were purified by electrophoresis through low-melting agarose gels. After digestion of the agarose with agarase, the DNA in the resulting solution was directly fragmented and subcloned using the TOPO Shotgun Subcloning Kit (Invitrogen, Groningen, The Netherlands) according to the manufacturer's protocol. Recombinant plasmids were isolated from the clones using the Montage Plasmid Miniprep Kit (Millipore, Eschborn, Germany). The plasmid DNA was prepared for sequence analysis with the Big Dye Terminator Sequencing Kit (Applied Biosystems, Foster City, CA) according to the manufacturer's protocol. Automated sequencing was performed on ABI 31 (Applied Biosystems) sequencers. ABI reads were base-called with the PHRED program (Ewing and Green 1998). Sequence masking and assembly were performed with the STADEN package (Staden et al. 2000). At the first stage of plastome amplification, reads were accumulated until 8× coverage was achieved for all PCR fragments. At the second stage (closure of the remaining gaps by PCR), we accepted at least 3× coverage for smaller PCR products.

Results

Intraspecific Divergence Among Chloroplast Genomes

The cpDNA of C. demersum that we sequenced is a 156,177-bp circular molecule, 76 bases shorter than the previously published cpDNA from a North American specimen of this plant (Moore et al. 2007). The difference in size is caused by numerous indels concentrated in the noncoding regions of both plants.
In addition to the indels, the two genome sequences differ at 257 single nucleotide polymorphisms (SNPs), which corresponds to 1 mutation per 609 alignment positions and to an uncorrected p distance of 0.0016 substitutions/site (s/s). The two Ceratophyllum cpDNA sequences show no inversions with respect to each other and have the same gene content. The lengths of the Ceratophyllum cpDNAs and their gene content are typical of the plastomes of dicotyledonous angiosperms; the gene content is, for instance, identical to that of Nymphaea alba (Goremykin et al. 2004). Besides Ceratophyllum, chloroplast genomes from different specimens of the same species are currently available for two cultivars of Oryza sativa: indica (Tang et al. 2004) and japonica (Hiratsuka et al. 1989). To estimate intraspecific sequence divergence in rice, we manually aligned these two sequences. The resulting alignment contained 152 SNPs, which corresponds to 1 mutation per 886 alignment positions and an uncorrected p distance of 0.0011 s/s. The high numbers of substitutions observed between cpDNAs from the same species suggest that chloroplast genomes can be a useful tool for population genetic studies.
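The uncorrected p distance quoted above is simply the number of differing sites divided by the number of compared sites. The minimal Python sketch below reproduces the Ceratophyllum figure approximately. It assumes, for illustration only, that the 156,177-bp genome length can stand in for the alignment length, which is in reality somewhat longer because of indel columns; the function name is hypothetical.

```python
def p_distance(n_differences: int, n_sites: int) -> float:
    """Uncorrected p distance: fraction of compared sites that differ (s/s)."""
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    if not 0 <= n_differences <= n_sites:
        raise ValueError("n_differences must lie between 0 and n_sites")
    return n_differences / n_sites


# Assumption: the genome length is used as a stand-in for the aligned length.
snps = 257
approx_sites = 156_177

p = p_distance(snps, approx_sites)
print(f"p distance ~ {p:.4f} s/s")                    # p distance ~ 0.0016 s/s
print(f"~1 SNP per {approx_sites / snps:.0f} sites")  # ~1 SNP per 608 sites
```

Because the true alignment includes indel columns, the exact aligned length gives a slightly larger spacing between SNPs than this approximation.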

Phylogenetic Analyses

Sequences of the 61 protein-coding genes, 30 tRNA genes, and 4 rRNA genes, as well as those of the 7 spacers and 2 introns located in the most conserved part of the inverted repeat region of the cpDNA, were sampled from the annotated sequences of publicly available chloroplast genomes and from our de novo sequenced cpDNA of C. demersum (EBI accession number AM71298). They were sorted into separate files for each individual gene and region. Files containing the protein-coding sequences were processed to produce alignments of all codon positions and of the first and second codon positions only. Non-protein-coding sequences were aligned using CLUSTALW. These individual alignments were manually concatenated and edited to produce a 53,848-position alignment (referred to hereafter as alignment A) and its 39,22-position subset with no third-codon positions (alignment B). Phylogenetic trees were constructed with PAUP* v.4.0b10 (Swofford 2002) and PHYML (Guindon and Gascuel 2003). We performed tests of model fit (the hierarchical likelihood ratio test (hLRT) and the Akaike Information Criterion (AIC)-based test) as implemented in Modeltest (Posada and Crandall 1998) on alignments A and B and identified the base substitution model best describing our data (GTR + I + Γ in both cases). Using this model, we built maximum likelihood (ML) trees with PAUP*. To obtain bootstrap support values for these trees, we used the bootstrapping algorithm implemented in PHYML under the same model, with the trees previously recovered by PAUP* as input trees, because bootstrapping with PAUP* would have taken prohibitively long. The ML tree built from alignment A is shown in Fig. 1. The topology obtained after the third-codon positions were removed (alignment B) is highly similar to the tree presented in Fig. 1.
However, it supports a sister-group relationship of Amborella and Nymphaea (78/1 bootstrap proportion support) at the base of the tree instead of the basal-most placement of Amborella among the extant angiosperms. Piper is not sister to Drimys, as it was for alignment A, but forms a sister group to the cluster (Drimys [Calycanthus, Liriodendron]). The branching order of the other OTUs is the same in both ML trees. Having obtained slightly different trees, we wished to determine which placement of Amborella was more trustworthy. Previously, we had globally deleted the third-codon positions from our genomic data sets because, on average, they exhibit much higher substitution rates than the first and second positions. Removal of third-codon positions is a widespread practice because it is easy to accomplish with available programs such as PAUP*. However, some third-codon positions are constant or nearly so, and there is no reason to discard them. At the same time, the first and second codon positions also contain a certain (smaller) proportion of highly variable sites that arguably should be removed. A more objective, if somewhat more complex, way to deal with such saturation is to measure variability directly at each alignment position and to discard only the positions affected by saturation. To do so, we employed a character-sorting approach similar to the one we published previously (Goremykin et al. 1997). With our Perl script (sorter.pl, available on request), we calculated p distances at each position of alignment A and then sorted the alignment positions in ascending order of the resulting values. The sorted alignment, which contains the invariable positions on the left and the most divergent positions on the right, was then iteratively shortened by 500 positions from the right-hand side, producing a series of alignments with decreasing variability.
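The character-sorting procedure described above can be sketched in Python as follows. This is not the authors' sorter.pl: the per-column score used here (mean pairwise p distance over all taxon pairs, skipping gaps and unknown bases) is an assumption about how "p distance at each position" might be defined, and all function names are hypothetical. For contrast, the sketch also includes the blunter global removal of every third column, which assumes the coding alignment starts in frame at its first column.

```python
from itertools import combinations
from typing import Dict, List

Alignment = Dict[str, str]  # taxon name -> aligned sequence


def site_p_distance(col: str, skip: str = "-?N") -> float:
    """Mean pairwise p distance at one column; gaps/unknowns are ignored."""
    states = [c for c in col.upper() if c not in skip]
    pairs = list(combinations(states, 2))
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


def check_aligned(aln: Alignment) -> int:
    """Return the common sequence length, or fail if sequences differ in length."""
    lengths = {len(seq) for seq in aln.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    return lengths.pop()


def get_column(aln: Alignment, i: int) -> str:
    return "".join(seq[i] for seq in aln.values())


def keep_columns(aln: Alignment, cols: List[int]) -> Alignment:
    return {name: "".join(seq[i] for i in cols) for name, seq in aln.items()}


def sort_columns_by_variability(aln: Alignment) -> List[int]:
    """Column indices from least to most variable (ties keep original order)."""
    n = check_aligned(aln)
    scores = [site_p_distance(get_column(aln, i)) for i in range(n)]
    return sorted(range(n), key=lambda i: scores[i])


def trimmed_subsets(aln: Alignment, step: int, n_subsets: int) -> List[Alignment]:
    """Sorted alignment plus n_subsets copies, each `step` columns shorter
    at the most variable (right-hand) end."""
    order = sort_columns_by_variability(aln)
    return [keep_columns(aln, order[: max(len(order) - k * step, 0)])
            for k in range(n_subsets + 1)]


def drop_third_codon_positions(aln: Alignment) -> Alignment:
    """Global alternative: drop every third column (assumes in-frame coding data)."""
    n = check_aligned(aln)
    return keep_columns(aln, [i for i in range(n) if i % 3 != 2])


toy = {"taxon1": "AAAAAA", "taxon2": "AAACCA", "taxon3": "AACCGT"}
print(sort_columns_by_variability(toy))           # [0, 1, 2, 3, 5, 4]
for sub in trimmed_subsets(toy, step=2, n_subsets=2):
    print(sub["taxon3"])                          # AACCTG, then AACC, then AA
print(drop_third_codon_positions(toy)["taxon3"])  # AACG
```

On a real data set, a call such as `trimmed_subsets(alignment_a, step=500, n_subsets=19)` (with `alignment_a` a hypothetical dictionary holding alignment A) would yield the sorted alignment plus 19 progressively shortened subsets, as in the analyses summarised in Fig. 2. Note that a raw p-distance score only counts observed differences; it ranks columns by variability but cannot by itself tell a fast, still-informative site from a saturated one.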
We identified the best symmetric ML models for sorted alignment A and its first 19 shortened subsets using Modeltest. GTR + I + Γ was chosen by both tests implemented in this program (hLRT and AIC) in all 20 cases. Using these model settings, we built 20 ML trees with PAUP*, imported the resulting trees into PHYML as starting trees, and performed bootstrapping with PHYML under the same model. The results of these experiments are presented in Fig. 2. Removal of the 500 most variable positions (<1% of the total data length) from alignment A leads to the loss of support for the basal-most position of Amborella within the angiosperms. Removal of the 1,000 most divergent positions results in Ceratophyllum assuming the sister-group position to the branch bearing the eudicots and monocots, and removal of 2,500 positions shifts the branch subtending Ceratophyllum further down the tree, to the base of the cluster uniting the four magnoliid species. An example of this topology is presented in Fig. 3. No further changes in tree topology occur until a total of 5,500 positions have been removed. The non-eudicot parts of the trees built from the subsets of alignment A with 5,500 to 8,000 of the most divergent positions removed have the same topology as the tree in Fig. 3. The eudicot clusters in these trees contain unresolved branches with zero lengths. Further decrease of variability results in disintegration of the monocot and dicot clusters: Ceratophyllum becomes a sister group to Phalaenopsis, and Acorus to Spinacia. There are numerous zero-length branches in these trees. To estimate the sequence divergence level within the subset of alignment A comprising the 500 most variable
positions, we used the following procedure in PAUP*: (1) used Modeltest to find the optimal model for this subset, (2) imported the settings of the best model into PAUP* using the lset command (Swofford 2002), (3) set the distance to ML (PAUP* command: dset distance=ml;), (4) constructed a neighbor-joining (NJ) tree (PAUP* command: nj;), and (5) exported the distance matrix used to build the tree into a file (500.matrix, supplementary materials) using the savedist command (Swofford 2002). The mean distance in this matrix is … s/s. The mean distance among the angiosperms in this matrix is 36.6 s/s.

Fig. 1 Tree obtained in ML analyses of alignment A. The numbers next to the branches indicate bootstrap support.

Fig. 2 [Plot: bootstrap values against alignment lengths for the placements Amborella+outgroup, Amborella+Nymphaea, Ceratophyllum+eudicots, Ceratophyllum+monocots+eudicots, Ceratophyllum+magnoliids, and Ceratophyllum+Phalaenopsis] Bootstrap support for the various placements of Amborella and Ceratophyllum in the ML trees built from sorted alignment A and its 19 subsets with decreased variability. The numbers below the graph indicate alignment lengths. The bootstrap values supporting the branch subtending all angiosperms except Nymphaea and Amborella were approximately 100% throughout the variability removal process and are not shown.

We wished to see whether there is any correlation between the distances from Amborella to the other species calculated from the 500-position subset of the most divergent positions and those calculated from the rest of the alignment. We recalculated the distance matrices using Tree-Puzzle v. 5.2 (Strimmer and von Haeseler 1996), each time setting the rate parameters of the GTR + Γ model to those calculated with PAUP*. This was done to make visible the part of the graphs depicting distances lower than 9 s/s. The PAUP* matrices contained distances higher than 1 s/s, so the part of the graph
below 1 s/s would appear flattened. Tree-Puzzle sets very high (and therefore very unreliable) distances to approximately 9 s/s. The resulting dot plot is presented in Fig. 4a. There is no visible correlation between the distances estimated from the 500-position subset and those estimated from the rest of alignment A.

Fig. 3 Tree topology obtained in ML analyses of the subsets of alignment A with the 2,500 to 5,000 most variable positions removed. The tree presented was obtained from a 48,363-position subset.

With the subset comprising the 1,000 most variable positions, PAUP* could not build an NJ tree using the model settings suggested by Modeltest; we therefore followed an example in the PAUP* command reference manual (Swofford 2002, p. 57) to fit GTR + I + Γ to these data. We exported the distance matrix into a file in NEXUS format (1000.matrix, supplementary materials). The mean distance in this matrix is … s/s. The mean distance among the angiosperm OTUs in this matrix is 2.26 s/s. The dot plot depicting the correlation between the distances from Ceratophyllum to the other species calculated from the 1,000-position subset and from the rest of alignment A is presented in Fig. 4b. The distribution of distance pairs in Fig. 4b is also nonlinear. Distances for the subset of the 2,500 most variable positions were calculated as for the 500-position subset. The mean distance in this matrix (2500.matrix, supplementary materials) is 1.6 s/s, and the mean distance among the angiosperm OTUs is 1.3 s/s. The dot plot depicting the correlation between the distances from Ceratophyllum to the other species calculated from the 2,500-position subset and from the rest of alignment A is shown in Fig. 4c. The distribution of distance pairs in Fig. 4c is narrower.

Discussion

Previously we reported that the choice of substitution model strongly affects the results of ML inference of the phylogenetic relationships among the major angiosperm lineages (Goremykin et al. 2005; Goremykin and Hellwig 2006).
The results presented here demonstrate that careful choice of the ML model alone is not enough. We observed that the presence of a small proportion of highly variable positions in the alignment alters the structure of the angiosperm subtree. We therefore suggest, in the absence of better base substitution models, discarding a certain proportion of the most variable sites from the alignment to avoid potential errors in phylogeny reconstruction (e.g., Felsenstein 1978; Bergsten 2005; Jeffroy et al. 2006).

Fig. 4 [Dot plots. (a) Distances from Amborella to other species; axes: ML distances in the 500-position subset vs. ML distances in alignment A shortened by 500 positions; 12 out of 33 distances have the maximum value. (b) Distances from Ceratophyllum to other species; axes: ML distances in the 1,000-position subset vs. ML distances in alignment A shortened by 1,000 positions. (c) Distances from Ceratophyllum to other species; axes: ML distances in the 2,500-position subset vs. ML distances in alignment A shortened by 2,500 positions] Correlation of distances in the subsets of alignment A included in and excluded from the analyses. (a) Dot plot depicting the correlation between the distances from Amborella to other species calculated from the 500-position subset of the most divergent positions and from the rest of alignment A. (b) Dot plot depicting the correlation between the distances from Ceratophyllum to other species calculated from the 1,000-position subset of the most divergent positions and from the rest of alignment A. (c) Dot plot depicting the correlation between the distances from Ceratophyllum to other species calculated from the 2,500-position subset of the most divergent positions and from the rest of alignment A.

In our previous studies, we simply removed the divergent third-codon positions from the analysis to accomplish this. However, this did not become a widespread practice, perhaps because of the recommendations of Stefanovic et al. (2004) and Leebens-Mack et al. (2005). These investigators advocated including the third-codon positions in phylogenomic analyses of angiosperm evolution based on cpDNA, citing the insignificant changes that these positions introduced to their trees.
Here, in contrast, including the variable third-codon positions did change the tree: it recovered the canonical basal-most placement of Amborella reported in a large number of papers (Mathews and Donoghue 1999, 2000; Parkinson et al. 1999; Qiu et al. 1999, 2000, 2005; Soltis et al. 1999, 2000a, b; Barkman et al. 2000; Borsch et al. 2003; Hilu et al. 2003; Stefanovic et al. 2004; Leebens-Mack et al. 2005; Jansen et al. 2007; Moore et al. 2007). At the same time, the alignment with the third-codon positions removed rather strongly supports a sister-group relationship of Amborella and Nymphaea. To determine which placement of Amborella is more trustworthy, we sorted the characters in the 53,363-position, genome-scale alignment according to their variability and repeatedly shortened the sorted alignment from its most divergent end, producing a series of subsets. We then built trees from each of these subsets. This procedure allowed us to observe directly the influence of the most variable characters on the results of phylogeny reconstruction. We observed that removing just the 500 most variable positions from the alignment led to the disappearance of the Amborella-basal topology. This 500-position subset exhibits a mean distance of >3 s/s in comparisons among angiosperms and cannot be expected to bear witness to the evolutionary events that occurred during the primary radiation of this plant group. The 500-position subset shows no traces of similarity between the Pinus OTU and the angiosperm OTUs. Its removal is therefore justified, and so is a revision of the assertion that plastid genomic data unequivocally support Amborella as the sole sister group of the remaining angiosperms (Jansen et al. 2007). Similarly, the sister-group relationship between Ceratophyllum and the eudicots, recently reported by Moore et al. (2007), depends on the presence of the 1,000 most divergent positions in alignment A.
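The two quantities this argument relies on, the mean pairwise distance among a chosen set of OTUs and the agreement between distances from one focal taxon estimated on two column subsets, are easy to compute from an exported distance matrix. The minimal Python sketch below is illustrative only. It uses a made-up four-taxon matrix (not data from this study), Pearson's r as one possible numerical summary of a dot plot (the study itself inspected the plots visually), and the Jukes-Cantor correction as the simplest model showing why saturated columns yield very large or infinite distances; the study itself used GTR-family models. All names are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean
from typing import Dict, FrozenSet, Iterable, List

DistMatrix = Dict[FrozenSet[str], float]  # symmetric: key = {taxon_a, taxon_b}


def dist_between(dist: DistMatrix, a: str, b: str) -> float:
    return dist[frozenset((a, b))]


def mean_distance(dist: DistMatrix, taxa: Iterable[str]) -> float:
    """Mean over all unordered pairs of the given taxa (e.g. angiosperms only)."""
    return mean(dist_between(dist, a, b) for a, b in combinations(list(taxa), 2))


def distances_from(dist: DistMatrix, focal: str, others: List[str]) -> List[float]:
    return [dist_between(dist, focal, t) for t in others]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson's r on finite pairs only; infinite (saturated) distances are dropped."""
    pairs = [(a, b) for a, b in zip(x, y) if math.isfinite(a) and math.isfinite(b)]
    if len(pairs) < 2:
        return math.nan
    mx = mean(a for a, _ in pairs)
    my = mean(b for _, b in pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def jukes_cantor(p: float) -> float:
    """JC69 distance; it grows without bound as p approaches 0.75 (saturation)."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Made-up toy matrix: "Out" plays the outgroup, T1 to T3 the ingroup.
toy = {
    frozenset(("Out", "T1")): 0.5, frozenset(("Out", "T2")): 0.6,
    frozenset(("Out", "T3")): 0.7, frozenset(("T1", "T2")): 0.2,
    frozenset(("T1", "T3")): 0.3, frozenset(("T2", "T3")): 0.4,
}
print(round(mean_distance(toy, ["Out", "T1", "T2", "T3"]), 3))  # 0.45
print(round(mean_distance(toy, ["T1", "T2", "T3"]), 3))         # 0.3
print(distances_from(toy, "T1", ["Out", "T2", "T3"]))          # [0.5, 0.2, 0.3]
print(round(pearson([1.0, 2.0, 3.0], [2.1, 3.9, 6.0]), 3))
for p in (0.1, 0.5, 0.7, 0.74, 0.75):
    print(p, jukes_cantor(p))
```

Parsing a matrix exported from PAUP* into this dictionary form is not shown. Dropping non-finite values before computing r is one way to handle the fact that many distances from highly saturated subsets reach the program's maximum value, although it also discards exactly the pairs that are most affected by saturation.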
This 1,000-position alignment subset also exhibits a very high level of saturation and is best omitted from analyses at this taxonomic level unless a good case can be made in the future as to why it should be used for deep angiosperm phylogeny. The choice between two further alternative placements of Ceratophyllum, as a sister group to the clade subtending eudicots plus monocots (supported by subsets of alignment A with 1,000 to 2,000 positions removed) or as a sister to the clade ([Calycanthus, Liriodendron] [Drimys, Piper]) (supported by alignments with 2,500 to 8,000 positions removed), is more difficult to make. The mean distance among the angiosperms in the subset of the 2,500 most divergent positions is 1.3 s/s. Substitution paths at this divergence level may or may not be well described by the general time-reversible family of substitution models. To decide firmly between these placements, we would need an objective stopping criterion for the removal of moderately saturated sites, derived from the comparative performance of different substitution models applied to different (slightly modified) data. Such a methodology is not yet available, although we are currently working in this area. The strong and consistent bootstrap support for the placement of Ceratophyllum as sister to the magnoliids, throughout a long phase of variability reduction up to the stage characterised by an apparent loss of phylogenetic signal, can be interpreted in favour of a magnoliid affinity of this species.

References

Barkman TJ, Chenery G, McNeal JR, Lyons-Weiler J, Ellisens WJ, Moore G, Wolfe AD, dePamphilis CW (2000) Independent and combined analyses of sequences from all three genomic compartments converge on the root of flowering plant phylogeny. Proc Natl Acad Sci USA 97
Bergsten J (2005) A review of long-branch attraction. Cladistics 21
Borsch T, Hilu KW, Quandt D, Wilde V, Neinhuis C, Barthlott W (2003) Non-coding plastid trnT-trnF sequences reveal a well resolved phylogeny of basal angiosperms. J Evol Biol 16
Crane PR, Friis EM, Pedersen KR (1995) The origin and early diversification of angiosperms. Nature 374:27–33
Ewing B, Green P (1998) Base-calling of automated sequencer traces using Phred. II. Error probabilities. Genome Res 8
Felsenstein J (1978) Cases in which parsimony or compatibility methods will be positively misleading. Syst Zool 27:401–410
Goremykin V, Hansmann S, Martin W (1997) Evolutionary analysis of 58 proteins encoded in six completely sequenced chloroplast genomes: revised molecular estimates of two seed plant divergence times. Plant Syst Evol 206
Goremykin VV, Holland B, Hirsch-Ernst KI, Hellwig FH (2003) The chloroplast genome of the basal angiosperm Calycanthus fertilis: structural and phylogenetic analyses. Plant Syst Evol 242
Goremykin VV, Hirsch-Ernst KI, Wolfl S, Hellwig FH (2004) The chloroplast genome of Nymphaea alba: whole-genome analyses and the problem of identifying the most basal angiosperm. Mol Biol Evol 21
Goremykin VV, Holland B, Hirsch-Ernst KI, Hellwig FH (2005) Analysis of Acorus calamus chloroplast genome and its phylogenetic implications. Mol Biol Evol 22
Goremykin VV, Hellwig FH (2006) A new test of phylogenetic model fitness addresses the issue of the basal angiosperm phylogeny. Gene 381:81–91
Guindon S, Gascuel O (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52
Hilu KW, Borsch T, Muller K, Soltis DE, Soltis PS, Savolainen V, Chase MW, Powell M, Alice L, Evans R et al (2003) Angiosperm phylogeny based on matK sequence information. Am J Bot 90
Hiratsuka J, Shimada H, Whittier R, Ishibashi T, Sakamoto M, Mori M, Kondo C, Honji Y, Sun CR, Meng BY et al (1989) The complete sequence of the rice (Oryza sativa) chloroplast genome: intermolecular recombination between distinct tRNA genes accounts for a major plastid DNA inversion during the evolution of the cereals. Mol Gen Genet 217
Jansen RK, Cai Z, Raubeson LA, Daniell H, dePamphilis CW, Leebens-Mack J, Muller KF, Guisinger-Bellian M, Haberle RC, Hansen AK et al (2007) Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns. Proc Natl Acad Sci USA 104
Jeffroy O, Brinkmann H, Delsuc F, Philippe H (2006) Phylogenomics: the beginning of incongruence? Trends Genet 22
Leebens-Mack J, Raubeson LA, Cui LY, Kuehl JV, Fourcade MH, Chumley TW, Boore JL, Jansen RK, dePamphilis CW (2005) Identifying the basal angiosperm node in chloroplast genome phylogenies: sampling one's way out of the Felsenstein zone. Mol Biol Evol 22
Mathews S, Donoghue MJ (1999) The root of angiosperm phylogeny inferred from duplicate phytochrome genes. Science 286
Mathews S, Donoghue MJ (2000) Basal angiosperm phylogeny inferred from duplicate phytochromes A and C. Int J Plant Sci 161(Suppl):S41–S55
Moore MJ, Bell CD, Soltis PS, Soltis DE (2007) Using plastid genome-scale data to resolve enigmatic relationships among basal angiosperms. Proc Natl Acad Sci USA 104
Murray MG, Thompson WF (1980) Rapid isolation of high molecular weight DNA. Nucleic Acids Res 8
Posada D, Crandall KA (1998) Modeltest: testing the model of DNA substitution. Bioinformatics 14
Parkinson CL, Adams KL, Palmer JD (1999) Multigene analyses identify the three earliest lineages of extant flowering plants. Curr Biol 9
Qiu Y-L, Lee J, Bernasconi-Quadroni F, Soltis DE, Soltis PS, Zanis M, Zimmer EA, Chen Z, Savolainen V, Chase MW (1999) The earliest angiosperms: evidence from mitochondrial, plastid and nuclear genomes. Nature 402:404–407
Qiu Y-L, Lee J, Bernasconi-Quadroni F, Soltis DE, Soltis PS, Zanis M, Zimmer EA, Chen Z, Savolainen V, Chase MW (2000) Phylogeny of basal angiosperms: analyses of five genes from three genomes. Int J Plant Sci 161(Suppl):S3–S27
Qiu Y-L, Dombrovska O, Lee J, Li L, Whitlock BA, Bernasconi-Quadroni F, Rest JS, Davis CC, Borsch T, Hilu KW et al (2005) Phylogenetic analyses of basal angiosperms based on nine plastid, mitochondrial, and nuclear genes. Int J Plant Sci 166
Soltis PS, Soltis DE, Chase MW (1999) Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology. Nature 402:402–404
Soltis PS, Soltis DE, Zanis MJ, Kim S (2000a) Basal lineages of angiosperms: relationships and implications for floral evolution. Int J Plant Sci 161(Suppl):S97–S107
Soltis DE, Soltis PS, Chase MW, Mort ME, Albach DC, Zanis M, Savolainen V, Hahn WH, Hoot SB, Fay MF et al (2000b) Angiosperm phylogeny inferred from 18S rDNA, rbcL, and atpB sequences. Bot J Linn Soc 133
Soltis DE, Albert VA, Savolainen V, Hilu K, Qiu Y-L, Chase MW, Farris JS, Stefanovic S, Rice DW, Palmer JD, Soltis PS (2004) Genome-scale data, angiosperm relationships, and "ending incongruence": a cautionary tale in phylogenetics. Trends Plant Sci 9
Staden R, Beal KF, Bonfield JK (2000) The Staden package. Methods Mol Biol 132
Stefanovic S, Rice DW, Palmer JD (2004) Long branch attraction, taxon sampling, and the earliest angiosperms: Amborella or monocots? BMC Evol Biol 4:35
Strimmer K, von Haeseler A (1996) Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Mol Biol Evol 13
Swofford DL (2002) PAUP*: phylogenetic analysis using parsimony (*and other methods). Version 4. Sinauer, Sunderland
Tang J, Xia H, Cao M, Zhang X, Zeng W, Hu S, Tong W, Wang J, Wang J, Yu J, Yang H, Zhu Z (2004) A comparison of rice chloroplast genomes. Plant Physiol 135:412–420


More information

Effects of Gap Open and Gap Extension Penalties

Effects of Gap Open and Gap Extension Penalties Brigham Young University BYU ScholarsArchive All Faculty Publications 200-10-01 Effects of Gap Open and Gap Extension Penalties Hyrum Carroll hyrumcarroll@gmail.com Mark J. Clement clement@cs.byu.edu See

More information

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Bioinformatics tools for phylogeny and visualization. Yanbin Yin Bioinformatics tools for phylogeny and visualization Yanbin Yin 1 Homework assignment 5 1. Take the MAFFT alignment http://cys.bios.niu.edu/yyin/teach/pbb/purdue.cellwall.list.lignin.f a.aln as input and

More information

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression) Using phylogenetics to estimate species divergence times... More accurately... Basics and basic issues for Bayesian inference of divergence times (plus some digression) "A comparison of the structures

More information

Flowering plants (Magnoliophyta)

Flowering plants (Magnoliophyta) Flowering plants (Magnoliophyta) Susana Magallón Departamento de Botánica, Instituto de Biología, Universidad Nacional Autónoma de México, 3er Circuito de Ciudad Universitaria, Del. Coyoacán, México D.F.

More information

Consensus Methods. * You are only responsible for the first two

Consensus Methods. * You are only responsible for the first two Consensus Trees * consensus trees reconcile clades from different trees * consensus is a conservative estimate of phylogeny that emphasizes points of agreement * philosophy: agreement among data sets is

More information

The Chloroplast Genome of Nymphaea alba: Whole-Genome Analyses and the Problem of Identifying the Most Basal Angiosperm

The Chloroplast Genome of Nymphaea alba: Whole-Genome Analyses and the Problem of Identifying the Most Basal Angiosperm The Chloroplast Genome of Nymphaea alba: Whole-Genome Analyses and the Problem of Identifying the Most Basal Angiosperm Vadim V. Goremykin,* Karen I. Hirsch-Ernst, Stefan Wölfl,à and Frank H. Hellwig*

More information

Letter to the Editor. Department of Biology, Arizona State University

Letter to the Editor. Department of Biology, Arizona State University Letter to the Editor Traditional Phylogenetic Reconstruction Methods Reconstruct Shallow and Deep Evolutionary Relationships Equally Well Michael S. Rosenberg and Sudhir Kumar Department of Biology, Arizona

More information

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition David D. Pollock* and William J. Bruno* *Theoretical Biology and Biophysics, Los Alamos National

More information

Phylogenetic analyses. Kirsi Kostamo

Phylogenetic analyses. Kirsi Kostamo Phylogenetic analyses Kirsi Kostamo The aim: To construct a visual representation (a tree) to describe the assumed evolution occurring between and among different groups (individuals, populations, species,

More information

Phylogenetics: Building Phylogenetic Trees

Phylogenetics: Building Phylogenetic Trees 1 Phylogenetics: Building Phylogenetic Trees COMP 571 Luay Nakhleh, Rice University 2 Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary model should

More information

Phylogenetic Tree Reconstruction

Phylogenetic Tree Reconstruction I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

More information

Phylogenomics: the beginning of incongruence?

Phylogenomics: the beginning of incongruence? Phylogenomics: the beginning of incongruence? Olivier Jeffroy, Henner Brinkmann, Frédéric Delsuc, Hervé Philippe To cite this version: Olivier Jeffroy, Henner Brinkmann, Frédéric Delsuc, Hervé Philippe.

More information

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University Phylogenetics: Building Phylogenetic Trees COMP 571 - Fall 2010 Luay Nakhleh, Rice University Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary

More information

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9 Lecture 5 Alignment I. Introduction. For sequence data, the process of generating an alignment establishes positional homologies; that is, alignment provides the identification of homologous phylogenetic

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

Efficiencies of maximum likelihood methods of phylogenetic inferences when different substitution models are used

Efficiencies of maximum likelihood methods of phylogenetic inferences when different substitution models are used Molecular Phylogenetics and Evolution 31 (2004) 865 873 MOLECULAR PHYLOGENETICS AND EVOLUTION www.elsevier.com/locate/ympev Efficiencies of maximum likelihood methods of phylogenetic inferences when different

More information

Can taxon-sampling effects be minimized by using branch supports? P. Hovenkamp

Can taxon-sampling effects be minimized by using branch supports? P. Hovenkamp Cladistics Cladistics 22 (2006) 264 275 www.blackwell-synergy.com Can taxon-sampling effects be minimized by using branch supports? P. Hovenkamp Nationaal Herbarium Nederland, Leiden, PO Box 9514, NL-2300

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the

More information

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships Chapter 26: Phylogeny and the Tree of Life You Must Know The taxonomic categories and how they indicate relatedness. How systematics is used to develop phylogenetic trees. How to construct a phylogenetic

More information

STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization)

STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization) STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization) Laura Salter Kubatko Departments of Statistics and Evolution, Ecology, and Organismal Biology The Ohio State University kubatko.2@osu.edu

More information

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches Int. J. Bioinformatics Research and Applications, Vol. x, No. x, xxxx Phylogenies Scores for Exhaustive Maximum Likelihood and s Searches Hyrum D. Carroll, Perry G. Ridge, Mark J. Clement, Quinn O. Snell

More information

Algorithms in Bioinformatics

Algorithms in Bioinformatics Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Distance Methods Character Methods

More information

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5. Five Sami Khuri Department of Computer Science San José State University San José, California, USA sami.khuri@sjsu.edu v Distance Methods v Character Methods v Molecular Clock v UPGMA v Maximum Parsimony

More information

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Ziheng Yang Department of Biology, University College, London An excess of nonsynonymous substitutions

More information

Missing data and influential sites: choice of sites for phylogenetic analysis can be as important as taxon-sampling and model choice

Missing data and influential sites: choice of sites for phylogenetic analysis can be as important as taxon-sampling and model choice Genome Biology and Evolution Advance Access published March 6, 2013 doi:10.1093/gbe/evt032 Submission date: February 27, 2013 Letter Running Head: Missing data and influential sites Missing data and influential

More information

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016 Molecular phylogeny - Using molecular sequences to infer evolutionary relationships Tore Samuelsson Feb 2016 Molecular phylogeny is being used in the identification and characterization of new pathogens,

More information

Phylogenetic inference

Phylogenetic inference Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types

More information

Letter to the Editor. The Effect of Taxonomic Sampling on Accuracy of Phylogeny Estimation: Test Case of a Known Phylogeny Steven Poe 1

Letter to the Editor. The Effect of Taxonomic Sampling on Accuracy of Phylogeny Estimation: Test Case of a Known Phylogeny Steven Poe 1 Letter to the Editor The Effect of Taxonomic Sampling on Accuracy of Phylogeny Estimation: Test Case of a Known Phylogeny Steven Poe 1 Department of Zoology and Texas Memorial Museum, University of Texas

More information

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees Erin Molloy and Tandy Warnow {emolloy2, warnow}@illinois.edu University of Illinois at Urbana

More information

SHARED MOLECULAR SIGNATURES SUPPORT THE INCLUSION OF CATAMIXIS IN SUBFAMILY PERTYOIDEAE (ASTERACEAE).

SHARED MOLECULAR SIGNATURES SUPPORT THE INCLUSION OF CATAMIXIS IN SUBFAMILY PERTYOIDEAE (ASTERACEAE). 418 SHARED MOLECULAR SIGNATURES SUPPORT THE INCLUSION OF CATAMIXIS IN SUBFAMILY PERTYOIDEAE (ASTERACEAE). Jose L. Panero Section of Integrative Biology, 1 University Station, C0930, The University of Texas,

More information

8/23/2014. Phylogeny and the Tree of Life

8/23/2014. Phylogeny and the Tree of Life Phylogeny and the Tree of Life Chapter 26 Objectives Explain the following characteristics of the Linnaean system of classification: a. binomial nomenclature b. hierarchical classification List the major

More information

MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS. Masatoshi Nei"

MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS. Masatoshi Nei MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS Masatoshi Nei" Abstract: Phylogenetic trees: Recent advances in statistical methods for phylogenetic reconstruction and genetic diversity analysis were

More information

PhyQuart-A new algorithm to avoid systematic bias & phylogenetic incongruence

PhyQuart-A new algorithm to avoid systematic bias & phylogenetic incongruence PhyQuart-A new algorithm to avoid systematic bias & phylogenetic incongruence Are directed quartets the key for more reliable supertrees? Patrick Kück Department of Life Science, Vertebrates Division,

More information

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

CHAPTERS 24-25: Evidence for Evolution and Phylogeny CHAPTERS 24-25: Evidence for Evolution and Phylogeny 1. For each of the following, indicate how it is used as evidence of evolution by natural selection or shown as an evolutionary trend: a. Paleontology

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 2009 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

The Evolutionary Root of Flowering Plants

The Evolutionary Root of Flowering Plants Syst. Biol. 62(1):50 61, 2013 The Author(s) 2012. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com

More information

PGA: A Program for Genome Annotation by Comparative Analysis of. Maximum Likelihood Phylogenies of Genes and Species

PGA: A Program for Genome Annotation by Comparative Analysis of. Maximum Likelihood Phylogenies of Genes and Species PGA: A Program for Genome Annotation by Comparative Analysis of Maximum Likelihood Phylogenies of Genes and Species Paulo Bandiera-Paiva 1 and Marcelo R.S. Briones 2 1 Departmento de Informática em Saúde

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 200 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

Intraspecific gene genealogies: trees grafting into networks

Intraspecific gene genealogies: trees grafting into networks Intraspecific gene genealogies: trees grafting into networks by David Posada & Keith A. Crandall Kessy Abarenkov Tartu, 2004 Article describes: Population genetics principles Intraspecific genetic variation

More information

The origin of angiosperms has long been considered a fundamental

The origin of angiosperms has long been considered a fundamental Phylogeny of seed plants based on all three genomic compartments: Extant gymnosperms are monophyletic and Gnetales closest relatives are conifers L. Michelle Bowe*, Gwénaële Coat, and Claude W. depamphilis

More information

What is Phylogenetics

What is Phylogenetics What is Phylogenetics Phylogenetics is the area of research concerned with finding the genetic connections and relationships between species. The basic idea is to compare specific characters (features)

More information

Fast Phylogenetic Methods for the Analysis of Genome Rearrangement Data: An Empirical Study

Fast Phylogenetic Methods for the Analysis of Genome Rearrangement Data: An Empirical Study Fast Phylogenetic Methods for the Analysis of Genome Rearrangement Data: An Empirical Study Li-San Wang Robert K. Jansen Dept. of Computer Sciences Section of Integrative Biology University of Texas, Austin,

More information

Supplemental Data. Vanneste et al. (2015). Plant Cell /tpc

Supplemental Data. Vanneste et al. (2015). Plant Cell /tpc Supplemental Figure 1: Representation of the structure of an orthogroup. Each orthogroup consists out of a paralogous pair representing the WGD (denoted by Equisetum 1 and Equisetum 2) supplemented with

More information

ASSESSING AMONG-LOCUS VARIATION IN THE INFERENCE OF SEED PLANT PHYLOGENY

ASSESSING AMONG-LOCUS VARIATION IN THE INFERENCE OF SEED PLANT PHYLOGENY Int. J. Plant Sci. 168(2):111 124. 2007. Ó 2007 by The University of Chicago. All rights reserved. 1058-5893/2007/16802-0001$15.00 ASSESSING AMONG-LOCUS VARIATION IN THE INFERENCE OF SEED PLANT PHYLOGENY

More information

Dynamic optimisation identifies optimal programs for pathway regulation in prokaryotes. - Supplementary Information -

Dynamic optimisation identifies optimal programs for pathway regulation in prokaryotes. - Supplementary Information - Dynamic optimisation identifies optimal programs for pathway regulation in prokaryotes - Supplementary Information - Martin Bartl a, Martin Kötzing a,b, Stefan Schuster c, Pu Li a, Christoph Kaleta b a

More information

first (i.e., weaker) sense of the term, using a variety of algorithmic approaches. For example, some methods (e.g., *BEAST 20) co-estimate gene trees

first (i.e., weaker) sense of the term, using a variety of algorithmic approaches. For example, some methods (e.g., *BEAST 20) co-estimate gene trees Concatenation Analyses in the Presence of Incomplete Lineage Sorting May 22, 2015 Tree of Life Tandy Warnow Warnow T. Concatenation Analyses in the Presence of Incomplete Lineage Sorting.. 2015 May 22.

More information

Integrative Biology 200A "PRINCIPLES OF PHYLOGENETICS" Spring 2008

Integrative Biology 200A PRINCIPLES OF PHYLOGENETICS Spring 2008 Integrative Biology 200A "PRINCIPLES OF PHYLOGENETICS" Spring 2008 University of California, Berkeley B.D. Mishler March 18, 2008. Phylogenetic Trees I: Reconstruction; Models, Algorithms & Assumptions

More information

Concepts and Methods in Molecular Divergence Time Estimation

Concepts and Methods in Molecular Divergence Time Estimation Concepts and Methods in Molecular Divergence Time Estimation 26 November 2012 Prashant P. Sharma American Museum of Natural History Overview 1. Why do we date trees? 2. The molecular clock 3. Local clocks

More information

C.DARWIN ( )

C.DARWIN ( ) C.DARWIN (1809-1882) LAMARCK Each evolutionary lineage has evolved, transforming itself, from a ancestor appeared by spontaneous generation DARWIN All organisms are historically interconnected. Their relationships

More information

A Phylogenetic Network Construction due to Constrained Recombination

A Phylogenetic Network Construction due to Constrained Recombination A Phylogenetic Network Construction due to Constrained Recombination Mohd. Abdul Hai Zahid Research Scholar Research Supervisors: Dr. R.C. Joshi Dr. Ankush Mittal Department of Electronics and Computer

More information

EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

More information

Using Phylogenomics to Predict Novel Fungal Pathogenicity Genes

Using Phylogenomics to Predict Novel Fungal Pathogenicity Genes Using Phylogenomics to Predict Novel Fungal Pathogenicity Genes David DeCaprio, Ying Li, Hung Nguyen (sequenced Ascomycetes genomes courtesy of the Broad Institute) Phylogenomics Combining whole genome

More information

7. Tests for selection

7. Tests for selection Sequence analysis and genomics 7. Tests for selection Dr. Katja Nowick Group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute for Brain Research www. nowicklab.info

More information

Phylogenetic Networks, Trees, and Clusters

Phylogenetic Networks, Trees, and Clusters Phylogenetic Networks, Trees, and Clusters Luay Nakhleh 1 and Li-San Wang 2 1 Department of Computer Science Rice University Houston, TX 77005, USA nakhleh@cs.rice.edu 2 Department of Biology University

More information

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

Integrative Biology 200 PRINCIPLES OF PHYLOGENETICS Spring 2018 University of California, Berkeley Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley B.D. Mishler Feb. 14, 2018. Phylogenetic trees VI: Dating in the 21st century: clocks, & calibrations;

More information

BINF6201/8201. Molecular phylogenetic methods

BINF6201/8201. Molecular phylogenetic methods BINF60/80 Molecular phylogenetic methods 0-7-06 Phylogenetics Ø According to the evolutionary theory, all life forms on this planet are related to one another by descent. Ø Traditionally, phylogenetics

More information

Systematics - Bio 615

Systematics - Bio 615 Bayesian Phylogenetic Inference 1. Introduction, history 2. Advantages over ML 3. Bayes Rule 4. The Priors 5. Marginal vs Joint estimation 6. MCMC Derek S. Sikes University of Alaska 7. Posteriors vs Bootstrap

More information

Organelle genome evolution

Organelle genome evolution Organelle genome evolution Plant of the day! Rafflesia arnoldii -- largest individual flower (~ 1m) -- no true leafs, shoots or roots -- holoparasitic -- non-photosynthetic Big questions What is the origin

More information

PHYLOGENY AND SYSTEMATICS

PHYLOGENY AND SYSTEMATICS AP BIOLOGY EVOLUTION/HEREDITY UNIT Unit 1 Part 11 Chapter 26 Activity #15 NAME DATE PERIOD PHYLOGENY AND SYSTEMATICS PHYLOGENY Evolutionary history of species or group of related species SYSTEMATICS Study

More information

Estimating Evolutionary Trees. Phylogenetic Methods

Estimating Evolutionary Trees. Phylogenetic Methods Estimating Evolutionary Trees v if the data are consistent with infinite sites then all methods should yield the same tree v it gets more complicated when there is homoplasy, i.e., parallel or convergent

More information

Anatomy of a species tree

Anatomy of a species tree Anatomy of a species tree T 1 Size of current and ancestral Populations (N) N Confidence in branches of species tree t/2n = 1 coalescent unit T 2 Branch lengths and divergence times of species & populations

More information

Cladistics and Bioinformatics Questions 2013

Cladistics and Bioinformatics Questions 2013 AP Biology Name Cladistics and Bioinformatics Questions 2013 1. The following table shows the percentage similarity in sequences of nucleotides from a homologous gene derived from five different species

More information

Smith et al. American Journal of Botany 98(3): Data Supplement S2 page 1

Smith et al. American Journal of Botany 98(3): Data Supplement S2 page 1 Smith et al. American Journal of Botany 98(3):404-414. 2011. Data Supplement S1 page 1 Smith, Stephen A., Jeremy M. Beaulieu, Alexandros Stamatakis, and Michael J. Donoghue. 2011. Understanding angiosperm

More information

Non-independence in Statistical Tests for Discrete Cross-species Data

Non-independence in Statistical Tests for Discrete Cross-species Data J. theor. Biol. (1997) 188, 507514 Non-independence in Statistical Tests for Discrete Cross-species Data ALAN GRAFEN* AND MARK RIDLEY * St. John s College, Oxford OX1 3JP, and the Department of Zoology,

More information

Small RNA in rice genome

Small RNA in rice genome Vol. 45 No. 5 SCIENCE IN CHINA (Series C) October 2002 Small RNA in rice genome WANG Kai ( 1, ZHU Xiaopeng ( 2, ZHONG Lan ( 1,3 & CHEN Runsheng ( 1,2 1. Beijing Genomics Institute/Center of Genomics and

More information

Phylogenomics. Jeffrey P. Townsend Department of Ecology and Evolutionary Biology Yale University. Tuesday, January 29, 13

Phylogenomics. Jeffrey P. Townsend Department of Ecology and Evolutionary Biology Yale University. Tuesday, January 29, 13 Phylogenomics Jeffrey P. Townsend Department of Ecology and Evolutionary Biology Yale University How may we improve our inferences? How may we improve our inferences? Inferences Data How may we improve

More information

Quartet Inference from SNP Data Under the Coalescent Model

Quartet Inference from SNP Data Under the Coalescent Model Bioinformatics Advance Access published August 7, 2014 Quartet Inference from SNP Data Under the Coalescent Model Julia Chifman 1 and Laura Kubatko 2,3 1 Department of Cancer Biology, Wake Forest School

More information

Supplementary material to Whitney, K. D., B. Boussau, E. J. Baack, and T. Garland Jr. in press. Drift and genome complexity revisited. PLoS Genetics.

Supplementary material to Whitney, K. D., B. Boussau, E. J. Baack, and T. Garland Jr. in press. Drift and genome complexity revisited. PLoS Genetics. Supplementary material to Whitney, K. D., B. Boussau, E. J. Baack, and T. Garland Jr. in press. Drift and genome complexity revisited. PLoS Genetics. Tree topologies Two topologies were examined, one favoring

More information

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2016 University of California, Berkeley. Parsimony & Likelihood [draft]

Integrative Biology 200 PRINCIPLES OF PHYLOGENETICS Spring 2016 University of California, Berkeley. Parsimony & Likelihood [draft] Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2016 University of California, Berkeley K.W. Will Parsimony & Likelihood [draft] 1. Hennig and Parsimony: Hennig was not concerned with parsimony

More information

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley

PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION Integrative Biology 200B Spring 2009 University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley B.D. Mishler Jan. 22, 2009. Trees I. Summary of previous lecture: Hennigian

More information

Chapter 7: Models of discrete character evolution

Chapter 7: Models of discrete character evolution Chapter 7: Models of discrete character evolution pdf version R markdown to recreate analyses Biological motivation: Limblessness as a discrete trait Squamates, the clade that includes all living species

More information

Phylogeny: traditional and Bayesian approaches

Phylogeny: traditional and Bayesian approaches Phylogeny: traditional and Bayesian approaches 5-Feb-2014 DEKM book Notes from Dr. B. John Holder and Lewis, Nature Reviews Genetics 4, 275-284, 2003 1 Phylogeny A graph depicting the ancestor-descendent

More information

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree) I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

More information

INCREASED RATES OF MOLECULAR EVOLUTION IN AN EQUATORIAL PLANT CLADE: AN EFFECT OF ENVIRONMENT OR PHYLOGENETIC NONINDEPENDENCE?

INCREASED RATES OF MOLECULAR EVOLUTION IN AN EQUATORIAL PLANT CLADE: AN EFFECT OF ENVIRONMENT OR PHYLOGENETIC NONINDEPENDENCE? Evolution, 59(1), 2005, pp. 238 242 INCREASED RATES OF MOLECULAR EVOLUTION IN AN EQUATORIAL PLANT CLADE: AN EFFECT OF ENVIRONMENT OR PHYLOGENETIC NONINDEPENDENCE? JEREMY M. BROWN 1,2 AND GREGORY B. PAULY

More information

Title ghost-tree: creating hybrid-gene phylogenetic trees for diversity analyses
