

American Journal of Botany 101(11), 2014; Botanical Society of America

CHLOROPLAST DNA SEQUENCE UTILITY FOR THE LOWEST PHYLOGENETIC AND PHYLOGEOGRAPHIC INFERENCES IN ANGIOSPERMS: THE TORTOISE AND THE HARE IV 1

JOEY SHAW 2,3,5, HAYDEN L. SHAFER 2, O. RAYNE LEONARD 4, MARGARET J. KOVACH 2, MARK SCHORR 2, AND ASHLEY B. MORRIS 4

2 Department of Biological and Environmental Sciences, University of Tennessee at Chattanooga, Chattanooga, Tennessee USA; 3 Botanical Research Institute of Texas, Fort Worth, Texas USA; and 4 Department of Biology, Middle Tennessee State University, Murfreesboro, Tennessee USA

Premise of the study: Noncoding chloroplast DNA (NC-cpDNA) sequences are the staple data source of low-level phylogeographic and phylogenetic studies of angiosperms. We followed up on previous papers (Tortoise and Hare II and III) that sought to identify the most consistently variable regions of NC-cpDNA. We used an exhaustive literature review and newly available whole-plastome data to assess whether previous conclusions apply at low taxonomic levels.

Methods: We aligned complete plastomes of 25 species pairs from across angiosperms, comparing the number of genetic differences found in 107 NC-cpDNA regions and matK. We surveyed Web of Science for the plant phylogeographic literature published between 2007 and 2013 to assess how NC-cpDNA has been used at the intraspecific level.

Key results: Several regions are consistently the most variable across angiosperm lineages: ndhF-rpl32, rpl32-trnL(UAG), ndhC-trnV(UAC), 5′rps16-trnQ(UUG), psbE-petL, trnT(GGU)-psbD, petA-psbJ, and the rpl16 intron. However, there is no universally best region. The average number of regions applied to low-level studies is ~2.5, which may be too few to access the full discriminating power of this genome.

Conclusions: Plastome sequences have been used successfully at ever lower taxonomic levels. Our findings corroborate earlier work suggesting that certain regions are consistently the most likely to be the most variable.
However, while NC-cpDNA sequences are commonly used in plant phylogeographic studies, few of the most variable regions are applied in that context. Furthermore, it appears that most studies use too few NC-cpDNA regions to access the discriminating power of the cpDNA genome.

Key words: chloroplast; DNA barcode; intergenic spacer; intraspecific; intron; noncoding cpDNA; phylogeography; plastid region; plastome; tortoise and hare.

The chloroplast genome has long been recognized as the workhorse for testing relationships between biological and geographical phenomena in angiosperms (Palmer, 1987; Palmer et al., 1988; Olmstead and Palmer, 1994; Soltis et al., 1997; Graham and Olmstead, 2000; Kelchner, 2000). Chloroplast sequences have been used successfully to infer relationships at all taxonomic levels: from the deepest relationships of land plants and angiosperms (Chase et al., 1993; Borsch et al., 2003; Hilu et al., 2003; Moore et al., 2010), through intermediate taxonomic levels of orders and families (Downie et al., 2000; Potter et al., 2007; Chin et al., 2014), to relationships among closely related species or populations (Soltis et al., 1997, 2006; Schaal et al., 1998; Shaw and Small, 2004; Morris et al., 2008).

1 Manuscript received 9 September 2014; revision accepted 17 September. The authors thank Brad Ruhfel, Stephen Downie, an anonymous reviewer, and the Associate Editor for their careful consideration of this article. The authors thank Ed Schilling (University of Tennessee) and James Beck (Wichita State University) for reviewing early drafts of this manuscript. They also extend special thanks to R. Small, E. Schilling, E. Lickey, J. Beck, K. Grubbs, S. Farmer, W. Liu, J. Miller, and C. Winder for their work and important contributions to the earlier chapters of this research, and to the Hesler Foundation at the University of Tennessee for financially supporting the earlier chapters.
5 Author for correspondence (joey-shaw@utc.edu); doi: /ajb

It is at this shallowest taxonomic level (inter- and intraspecific studies) that researchers are often challenged to find sufficient genetic variability to address their questions (Schaal et al., 1998; Schaal and Olsen, 2000; Holderegger and Abbott, 2003; Petit and Vendramin, 2007). Shaw et al. (2005, 2007) provided guidance in the search for the most consistently variable noncoding chloroplast regions (NC-cpDNA). In The Tortoise and the Hare II (TH2) (Shaw et al., 2005), the focus was to compare the relative number of genetic differences typically found in NC-cpDNA regions that were prevalent in the literature at that time. In that study, 21 NC-cpDNA regions were assessed for utility within each of 10 seed plant lineages, each comprising three congeneric species. The results were surprising: the most commonly used regions at that time (e.g., the trnL intron and trnL-trnF) were among the least informative, while some uncommonly used regions (trnD-trnT, trnS-trnG, and rpoB-trnC) appeared to be much more informative.

In The Tortoise and the Hare III (TH3) (Shaw et al., 2007), the focus was to expand genomic sampling to all single-copy NC-cpDNA regions. Initially, paired plastomes from three angiosperm lineages (Atropa vs. Nicotiana [Solanaceae]: asterid; Lotus vs. Medicago [Fabaceae]: rosid; and Saccharum vs. Oryza [Poaceae]: monocot) were used to screen all single-copy NC-cpDNA regions for any that might be more variable than the best regions identified in TH2. This screen highlighted 13 regions, which were then sequenced across the same three-species groups used in TH2. In the end, nine rarely or never-before-used NC-cpDNA regions were identified as being, on average across angiosperms, the most informative for low-level studies (rpl32-trnL(UAG), trnQ(UUG)-5′rps16, 3′trnV(UAC)-ndhC, ndhF-rpl32, psbD-trnT(GGU), psbJ-petA, 3′rps16-5′trnK(UUU), atpI-atpH, and petL-psbE).

Before the publication of TH3, very few congeneric species pairs of whole plastomes were available in GenBank, and we were fortunate to find species pairs within the same family (asterids: Solanaceae; rosids: Fabaceae; and monocots: Poaceae). There are now ~340 angiosperm chloroplast genomes in GenBank (January 2014), many of which represent congeneric accessions. This wealth of recent data gives us a new opportunity to screen the utility of noncoding regions of the plastome more thoroughly, across a wider spectrum of the angiosperm phylogeny, and to evaluate inferences made in TH3. The rapidly growing number of whole plastomes in public databases also lets us draw lessons that benefit the larger community of researchers who are not yet sequencing chloroplast genomes.

With the rapid advancement of next-generation sequencing (NGS) approaches, we have entered the Golden Age of Molecular Ecology (Paliy, 2013), and sequencing whole plastomes, or even hundreds of nuclear loci, for low-level inquiry is certainly the future (Soltis et al., 2013). It is becoming increasingly feasible to sequence whole or nearly whole plastomes using high-throughput methods (Cronn et al., 2012; De Wit et al., 2012; Shi et al., 2012; Straub et al., 2012; Stull et al., 2013; Uribe-Convers et al., 2014), and the cost savings over Sanger sequencing can be significant for large-scale studies involving many taxa (Godden et al., 2013) or for laboratories that already have the needed laboratory and computational infrastructure.
Core sequencing facilities now offer opportunities for multiple laboratories to share the cost of an NGS run, further lowering the price per sample. While there are a few notable cases in which researchers have published phylogenies based on whole-plastome data, most target deep relationships (Ruhfel et al., 2014), model organisms (Eserman et al., 2014), or economically important groups (Nikiforova et al., 2013; Njuguna et al., 2013), although this is rapidly changing. At present, most whole-plastome publications are limited to one or a few accessions per study (e.g., Martin et al., 2013; Walker et al., 2014), and these typically come from larger research laboratories, reflecting the financial and computational costs of this kind of data generation (Rocha et al., 2013; Soltis et al., 2013). The start-up costs for hardware and software alone may prevent smaller, less well-funded laboratories around the world from moving from sequencing a few NC-cpDNA regions to generating massive quantities of genomic data. For smaller-scale studies, or studies from less well-funded or less well-equipped laboratories, the cost of NGS and the complexity of the bioinformatic analyses required for such large data sets may still be prohibitive (Doyle, 2013; Rocha et al., 2013; Bowen et al., 2014; but see Straub et al., 2012 for an alternative viewpoint). Bowen et al. (2014), summarizing results from a meeting at the National Evolutionary Synthesis Center (NESCent), suggested that (1) single mtDNA locus studies (for plants, we would modify this phrase to "one to several cpDNA loci studies") provide powerful first assessments of patterns; (2) genome-wide analyses are warranted if results from standard markers are discordant with other aspects of the organism's biology; and (3) at this stage of software development and technology, a judicious rather than wholesale application of genomics is prudent.
Given all of this, we anticipate that Sanger or NGS sequencing of a smaller number of individual NC-cpDNA regions will remain an important approach for many exploratory inquiries, for low-level systematic and phylogeographic studies, and for barcoding studies, at least for the near future. Therefore, there is still a need to be able to select a few of the most potentially informative NC-cpDNA regions.

In the present study, we asked: Which, and how many, of the most informative regions of the TH series have been useful at the lowest phylogenetic and phylogeographic levels in angiosperms? While we are particularly interested in intraspecific data, there are currently no publicly available whole-plastome phylogeographic data sets, nor are there many publications for such an assessment (but see Whittall et al., 2010 and Doorduin et al., 2011). Therefore, we addressed our question by analyzing a large set of whole plastomes, reflecting as many low-level taxonomic comparisons as are currently available through the NCBI Organelle Genome Resources (GenBank). We used 25 closely related species pairs (congeners when possible; 20/25) across 10 major lineages of angiosperms (Nymphaeales, magnoliids, monocots, Proteales, eurosid I, eurosid II, Caryophyllales, Ericales, euasterid I, euasterid II) (Fig. 1) to compare all single-copy NC-cpDNA regions and matK, which we included because it is among the most variable coding regions and has been a popular region since 1997 (Hilu and Liang, 1997). We were also able to compare the results within the major clades with the overall results.

In addition to this new data set, we searched the literature covering the period since the last TH paper (2007–2013). To determine whether NC-cpDNA sequences are an important tool for low-level studies, we asked: What percentage of intraspecific studies of plants used cpDNA sequence data? Beyond this initial question, we also asked: Which NC-cpDNA regions were chosen and sequenced?
How many NC-cpDNAs were needed to generate sufficient data? How many papers explicitly reported screening NC-cpDNAs, and how many regions were typically screened? All results are discussed in the context of trends in the current literature and the findings of our previous TH studies. In the end, we propose a strategy for investigators conducting low-level taxonomic or phylogeographic studies with NC-cpDNA sequence data.

MATERIALS AND METHODS

Taxon selection for genomic comparisons

Initially, we set out to use only congeneric species pairs in angiosperms; however, doing so left us with phylogenetic gaps in our sampling. To fill these gaps, we used the most closely related species pairs available in a few lineages, resulting in an initial list of 34 angiosperm species pairs for comparison. After we had compiled all of the data, four species pairs were removed (Acorus americanus and A. calamus, Nicotiana sylvestris and N. tabacum, Oryza nivara and O. sativa, and Phyllostachys edulis and P. nigra var. henonis) because they contained too few genetic differences, as we explain in detail later. We also had to remove five other species pairs (Wolffia australiana and Wolffiella lingulata, Colocasia esculenta and Lemna minor, Ranunculus macranthus and Megaleranthis saniculifolia, Cuscuta exaltata and C. obtusiflora, and Erodium carvifolium and E. texanum) because they were too variable to be confidently aligned across the entire genome (the large indels and genomic rearrangements that made these pairs difficult to analyze could certainly provide workers on these groups with important information in other contexts). The remaining 25 species pairs (including 20 congeneric pairs) used in our analyses are shown in Fig. 1. We made a strong effort to sample across the phylogenetic breadth of angiosperms, using nomenclature and topology from the Angiosperm Phylogeny Group (APG III).
We also aimed to sample about the same number of species pairs per major lineage. In the end, we were able to sample one species pair from Nymphaeales, two from magnoliids, five from monocots, one from Proteales, four from eurosid I, four from eurosid II, one from Caryophyllales, one from Ericales, three from euasterid I, and three from euasterid II (Fig. 1).

Fig. 1. Phylogenetic breadth of sampling for plastome comparisons. Species pairs (mostly congeners) are listed beside family names and next to major lineages. Tree topology follows Angiosperm Phylogeny Group III (APG III).

NC-cpDNA selection

Rather than try (1) to determine exactly where the large single-copy (LSC), small single-copy (SSC), and inverted repeat (IR) regions occur in each of the 25 lineages and (2) to expand the data set to include uncommon intergenic spacers (i.e., spacers unique to a given taxon because of genomic rearrangements unique to that taxon), we used the Gossypium hirsutum genome (Lee et al., 2006) as a model. Because it is a typical chloroplast genome, we used it to determine which regions to include or exclude, as well as the start and end points of the LSC, SSC, and IRs.

It is well accepted that the most variable portions of the cpDNA genome are not in the IR regions (Clegg et al., 1984; Wolfe et al., 1987). The IR regions have been shown to contain low levels of variability (Maier et al., 1995), and Presting (2006) confirmed and quantified earlier reports that the IR is significantly more resistant to genetic change than the single-copy regions. We therefore concentrated on the single-copy portions of the genome. The two single-copy regions are the LSC, which is typically about 80 kb long, and the SSC, which is typically about 20 kb long. These regions contain a combination of rRNA and tRNA genes, as well as protein-coding genes.
Coding regions are well known to accumulate genetic differences more slowly than noncoding regions, the exceptions being matK (Steele and Vilgalys, 1994; Johnson and Soltis, 1995; Fazekas et al., 2008; Lahaye et al., 2008) and perhaps ycf1 (Dong et al., 2012), which seem to accumulate mutations unusually fast for coding regions. Because matK has historically been an important region (Hilu and Liang, 1997) and is commonly recognized as being as variable as many noncoding regions of the plastome, we focused on matK and all single-copy noncoding regions of the plastome. In total, 107 noncoding regions and matK were compared across 25 species pairs spanning the breadth of angiosperms.

Sequence alignment

The GenBank nucleotide BLAST function was used for initial alignment. BLAST aligns local regions of two sequences based on nucleotide similarity. In initial BLAST searches, we found that some of the more genetically dissimilar species pairs (e.g., Aethionema) often contained too many gaps to ensure reliable alignments. We therefore changed the default BLAST parameters to increase the standard linear gap cost, as a measure to reduce misaligned sequences. BLAST parameters were as follows: Max target sequences = 100, Expect threshold = 10, Max matches in a query range = 0, Match/Mismatch scores = 1, −2, and Gap Costs = Existence: 5, Extension: 2. After the sequences were aligned with BLAST, the alignments were visualized as Portable Document Format (PDF) files, and the coding regions were unequivocally delimited using the GenBank annotations. Alignments were scored manually for genetic differences. In a few cases where small sections of the alignments were too variable to be confidently aligned (e.g., poly-A/T regions), gaps were opened, and these were conservatively scored as a single genetic difference (an indel).
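To make the effect of the gap settings concrete, the sketch below scores a gapped pairwise alignment under the parameters listed above (match = 1, mismatch = −2, gap existence = 5, gap extension = 2). It assumes the affine convention used by NCBI BLAST, in which one contiguous gap of length k costs existence + k × extension. The function names are illustrative only; the alignments in this study were produced by BLAST itself, not by this code.

```python
# Minimal sketch: score a gapped pairwise alignment with BLAST-style
# parameters. Assumption: one gap of length k costs existence + k * extension.

MATCH = 1
MISMATCH = -2
GAP_EXISTENCE = 5
GAP_EXTENSION = 2


def gap_cost(length: int) -> int:
    """Affine cost of one contiguous gap: existence + length * extension."""
    return GAP_EXISTENCE + length * GAP_EXTENSION


def score_alignment(a: str, b: str) -> int:
    """Score two equal-length aligned rows in which '-' marks a gap."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    score, i = 0, 0
    while i < len(a):
        if a[i] == "-" or b[i] == "-":
            # Treat the whole run of gaps in this row as a single gap event.
            row = a if a[i] == "-" else b
            j = i
            while j < len(row) and row[j] == "-":
                j += 1
            score -= gap_cost(j - i)
            i = j
        else:
            score += MATCH if a[i].upper() == b[i].upper() else MISMATCH
            i += 1
    return score


print(score_alignment("ACGTAC", "AC--AC"))  # 4 matches minus one 2-bp gap
```

With these settings, one 2-bp gap costs 9 while two separate 1-bp gaps cost 14, so raising the gap costs pushes the aligner toward fewer, longer gaps rather than many scattered ones.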
Data analysis

All noncoding portions of the genome, including intergenic spacers (IGS) and introns, were analyzed for genetic differences between the species pairs, and these genetic differences were counted as potentially informative characters (PICs) following Shaw et al. (2005, 2007). (PICs have been shown to be a good predictor of parsimony-informative characters [Fior et al., 2013].) The PICs included indels, nucleotide substitutions, and inversions. Indels and inversions were scored as binary (presence/absence) characters following Simmons and Ochoterena (2000). The process of alignment, annotation, and analysis was repeated for each of the 25 species pairs. The length of each noncoding region was recorded, and the number of PICs was tallied for each noncoding cpDNA region and matK in each of the 25 lineages. All genetic differences were scored as independent, single characters.

Three types of calculations were performed to evaluate NC-cpDNA variability. First, we estimated the proportion of observed genetic differences for each NC-cpDNA using a modified version of the formula used in O'Donnell (1992), Gielly and Taberlet (1994), and Shaw et al. (2005, 2007):

$$\%\ \text{variability} = \frac{NS + ID + IV}{L} \times 100,$$

where NS = the number of nucleotide substitutions, ID = the number of indels, IV = the number of inversions, and L = the aligned sequence length. Second, we averaged the number of PICs found within each noncoding chloroplast region across the lineages that contained that region (if the region was missing in a lineage, the total was divided by 24 rather than 25). Third, to ensure that lineages with a greater number of genetic differences between the species pair (greater evolutionary distance) were not overrepresented (weighted) in global comparisons, we determined the percentage contribution of each noncoding region to the overall PIC total within a lineage (what was called "normalized PICs" in earlier TH papers).
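The PIC scoring rules and the % variability formula above can be sketched as follows. This is a minimal illustration, assuming a gapped pairwise alignment in which '-' marks a gap: each mismatched column counts as one nucleotide substitution, and each contiguous run of gaps in one sequence counts as one indel, mirroring binary indel coding. Inversions cannot be detected column by column, so the hypothetical `inversions` argument stands in for a count obtained by manual inspection. The published scores were made by hand, not with this code.

```python
def count_pics(a: str, b: str, inversions: int = 0) -> dict:
    """Count NS, ID, and IV for one aligned region; L is the aligned length."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    substitutions = 0
    indels = 0
    prev_gap_row = None  # row ("a" or "b") holding a gap in the previous column
    for x, y in zip(a, b):
        if x == "-":
            gap_row = "a"
        elif y == "-":
            gap_row = "b"
        else:
            gap_row = None
            if x.upper() != y.upper():
                substitutions += 1
        if gap_row is not None and gap_row != prev_gap_row:
            indels += 1  # a new gap run begins: one binary indel character
        prev_gap_row = gap_row
    return {"NS": substitutions, "ID": indels, "IV": inversions, "L": len(a)}


def percent_variability(pics: dict) -> float:
    """% variability = [(NS + ID + IV) / L] x 100."""
    return (pics["NS"] + pics["ID"] + pics["IV"]) * 100 / pics["L"]


pics = count_pics("ACGTTACG", "ACCTT--G")  # one substitution, one 2-bp indel
print(pics, percent_variability(pics))
```

In the toy alignment, the 2-bp gap is scored once, so two PICs over eight aligned positions give 25% variability.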
In effect, we calculated the percentage contribution of each NC-cpDNA to the number of genetic differences observed in a species pair. These values were then averaged for each noncoding region so that the regions could be compared directly. We argue that percentage contribution values are the most accurate basis for comparing NC-cpDNA variability across lineages because they reduce the overrepresentation of species pairs that evolved at faster rates or are evolutionarily further apart. For example, the Silene species accumulated 1483 PICs across all regions, whereas the Olea species had only 321. Without this normalization, the species pair with the greatest overall number of genetic differences (e.g., Silene) would influence the analysis more strongly than lineages with many fewer genetic differences between species. A downside of the percentage contribution analysis is that species pairs with very few genetic differences tend to receive inflated scores. For example, the Nicotiana species pair was omitted because there were only two nucleotide substitutions across the entire alignment; each NC-cpDNA containing one of them would therefore have the very large percentage contribution score of 50%. For this reason, we omitted species pairs with fewer than 100 PICs in the alignment (discussed earlier in Taxon selection for genomic comparisons).

Adjusted mean PICs

A mixed-model analysis of covariance (ANCOVA) was used to test for statistical separation among the 10 NC-cpDNAs that ranked highest in the overall percentage contribution analysis. We analyzed only these 10 regions because statistical inference would be weakened with all 108 regions. Furthermore, we argue that if statistical separation is present among the top 10, it is sure to exist between these and the rest.
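A minimal sketch of the percentage contribution normalization, including the rule that species pairs with fewer than 100 PICs are dropped. It assumes PIC counts are stored per species pair as a mapping from region name to PICs, with a region simply absent when it is missing from that lineage. Averages are taken over the lineages that contain the region; that rule is stated for mean PICs and is assumed here to apply to percentage contributions as well. Pair names and counts are hypothetical.

```python
# Minimum total PICs for a species pair to be retained (rule from the text).
MIN_TOTAL_PICS = 100


def retained_pairs(lineages: dict) -> dict:
    """Drop species pairs whose alignments contain fewer than 100 PICs."""
    return {pair: regions for pair, regions in lineages.items()
            if sum(regions.values()) >= MIN_TOTAL_PICS}


def mean_pics(lineages: dict) -> dict:
    """Raw mean PICs per region, averaged over lineages containing it."""
    sums, counts = {}, {}
    for regions in retained_pairs(lineages).values():
        for region, pics in regions.items():
            sums[region] = sums.get(region, 0) + pics
            counts[region] = counts.get(region, 0) + 1
    return {r: sums[r] / counts[r] for r in sums}


def mean_percentage_contribution(lineages: dict) -> dict:
    """Average % of each pair's total PICs contributed by each region."""
    sums, counts = {}, {}
    for regions in retained_pairs(lineages).values():
        total = sum(regions.values())
        for region, pics in regions.items():
            share = pics * 100 / total  # this region's % of the pair's PICs
            sums[region] = sums.get(region, 0.0) + share
            counts[region] = counts.get(region, 0) + 1
    return {r: sums[r] / counts[r] for r in sums}


# Hypothetical data: a fast-evolving pair, a slow pair, a near-identical pair.
example = {
    "fast_pair": {"r1": 600, "r2": 300, "r3": 100},
    "slow_pair": {"r1": 60, "r2": 20, "r3": 20},
    "tiny_pair": {"r1": 1, "r2": 1},
}
print(mean_pics(example))
print(mean_percentage_contribution(example))
```

In the toy data, region r1 averages 330 raw PICs, a figure driven mostly by the fast-evolving pair, yet it contributes 60% of the PICs in both retained pairs; the two-PIC pair is dropped before it can inflate any region to 50%.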
The mean PICs/region was treated as the response variable and NC-cpDNA region as the fixed-effects factor; lineage was treated as a block (random-effects factor) and region length (number of base pairs per region) as a covariate. Data were log-transformed [log10(PIC + 1) and log10(length + 1)] to satisfy the parametric assumptions of ANCOVA (normality, homogeneity of variance, and linearity). Region-specific PIC estimates are reported as geometric means ± SE. A preliminary ANCOVA indicated a direct linear relationship between the number of PICs/region and region length (P < …) and similar slopes of the PIC-length relationships across regions (P > 0.10), which allowed us to compare slope-adjusted mean PICs/region. Region-specific estimates of adjusted mean PICs (y), which control for the effect of region length (x, covariate), were determined using linear regression models with a common slope (b = …), evaluated at the mean length (x ≈ 1073 bp):

$$\log_{10}(y + 1) = a + b\,\log_{10}(x + 1).$$

Scheffé's method was used to make pairwise comparisons (contrasts) between region-specific mean PICs and/or groups of mean PICs, based on patterns exhibited by the data. Scheffé's procedure, a highly conservative post hoc test that keeps the familywise type I error rate at or below the alpha level, is recommended for multiple contrasts between groups of sample means (Zar, 2010). Data were analyzed using the UNIVARIATE, REG, GLM, MIXED, and LSMEANS procedures of the Statistical Analysis System (SAS Institute, 2011). Statistical significance was declared at an alpha level of ….

Literature review

To answer the question "What proportion of intraspecific plant studies used cpDNA sequence data?", we performed a Web of Science search in May 2014, confining the dates to 2007–2013 and searching on the terms "phylogeography" or "phylogeographic", refined to articles and then further refined by "*aceae".
Subsequent refinements were performed using either "nuclear" or "nDNA" or "mitochondria" or "mitochondrial" or "mtDNA" or "chloroplast" or "plastid" or "cpDNA". To address the other questions posed in the Introduction, we performed a search on Web of Science in January. We first used the topic search terms "phylogeography" or "phylogeographic", limited to the years 2007–2013. We refined this set using the topic search terms "chloroplast" or "cpDNA" to focus specifically on plant phylogeographic studies. While this may exclude papers that exclusively use nuclear markers, we determined this to be the most appropriate search strategy for the present survey. The top five source titles for these plant phylogeographic studies during the selected 7 years were Molecular Ecology (99), Journal of Biogeography (79), Molecular Phylogenetics and Evolution (45), American Journal of Botany (32), and Plant Systematics and Evolution (28), representing approximately 41% of total publications under our search criteria. Because our emphasis here is on low taxonomic levels and population studies, we excluded papers in Molecular Phylogenetics and Evolution and Plant Systematics and Evolution because of the more traditional systematic nature of the publications therein. Of the total of 210 plant phylogeographic studies in the remaining three journals (Molecular Ecology, Journal of Biogeography, and American Journal of Botany), 33 were excluded following our review for being a review or for focusing on an animal, more than three species, or a nonvascular plant. This exclusion resulted in 178 publications being included in our review. We built a database containing the following information for each of these 178 studies: author, year of publication, source title, taxon, family, markers tested (when available), markers used, and length and PICs for markers used (when available). We used these data to answer the questions posed in the Introduction.
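The database fields listed above are enough to answer the counting questions from the Introduction: how many regions were used per study, how many studies reported screening regions, and how many regions were screened. The sketch below shows one way to compute these, assuming a simple record per study. The `Study` class, its field names, and the two example records are hypothetical and do not describe any real publication.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Study:
    """One reviewed publication (hypothetical record layout)."""
    author: str
    year: int
    source_title: str
    taxon: str
    family: str
    markers_used: list
    markers_tested: list = field(default_factory=list)  # empty when not reported
    pics_per_marker: dict = field(default_factory=dict)  # {marker: PICs}, when reported


def mean_regions_used(studies: list) -> float:
    """Average number of markers sequenced per study."""
    return mean(len(s.markers_used) for s in studies)


def screening_summary(studies: list) -> tuple:
    """(number of studies reporting a screen, mean markers screened among them)."""
    screened = [s for s in studies if s.markers_tested]
    if not screened:
        return 0, 0.0
    return len(screened), mean(len(s.markers_tested) for s in screened)


# Two invented records for illustration only.
studies = [
    Study("Author A", 2010, "Molecular Ecology", "Taxon a", "Fagaceae",
          markers_used=["region 1", "region 2"],
          markers_tested=["region 1", "region 2", "region 3", "region 4"]),
    Study("Author B", 2012, "Journal of Biogeography", "Taxon b", "Asteraceae",
          markers_used=["region 1", "region 2", "region 3"]),
]
print(mean_regions_used(studies), screening_summary(studies))
```

Keeping "markers tested" separate from "markers used" is what allows screening effort to be reported only for the studies that disclosed it.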
We were also interested in comparing the relative utility of regions within studies to determine whether the patterns predicted by TH2 and TH3 were supported by the literature. For this comparison, we could include only studies that explicitly stated the number of PICs/region and used at least two regions. Of the 178 studies included in our review, 45 met these criteria; the remaining studies either reported PICs across all regions combined or did not report PICs at all. For these 45 studies, NC-cpDNAs were ranked by decreasing number of PICs observed, and trends were qualitatively assessed across studies.

RESULTS

Overview of molecular data set

In all, we manually examined over 2.5 million base pairs of aligned data from 25 species pairs, resulting in a data set of 107 single-copy NC-cpDNA regions (15 introns and 92 IGS) plus matK (Appendix S1, see Supplemental Data accompanying the online version of this article).

Taxon selection, omission, and results on omitted taxa

At the outset, we identified 34 congeneric or fairly closely related species pairs; however, five of the pairs were too variable to be confidently aligned, and four were too invariable to yield reliable information. There were only two genetic differences between Nicotiana sylvestris (NC_007500) and Nicotiana tabacum (NC_001879); one was in the psbM-trnD(GUC) spacer, and the other was in the petD intron. There were only 22 genetic differences between Acorus americanus (NC_010093) and Acorus calamus (NC_007407). These differences were found in ndhF-rpl32 (1), ccsA-ndhD (1), ndhG-ndhI (1), 5′trnK(UUU)-3′rps16 (2), rps16 intron (1), trnQ(UUG)-psbK (1), trnC(GCA)-petN (2), trnD(GUC)-trnY(GUA) (1), trnT(GGU)-psbD (3), trnT(UGU)-trnL(UAA) (1), trnM(CAU)-atpE (1), atpB-rbcL (1), ycf4-cemA (1), petA-psbJ (1), psbE-petL (1), psaJ-rpl33 (1), rps18-rpl20 (1), and the rpl16 intron (1).
We observed only 26 genetic differences between Oryza nivara (NC_005973) and Oryza sativa (NC_…), and these were found in ndhF-rpl32 (1), ccsA-ndhD (1), ndhA intron (1), psbA-3′trnK(UUU) (1), rps16 intron (2), trnD(GUC)-trnY(GUA) (2), trnE(UUC)-trnT(GGU) (2), psbZ-trnG(GCC) (2), ycf3-trnS(GGA) (2), ndhC-trnV(UAC) (3), trnM(CAU)-atpE (1), accD-psaI (1), petL-petG (1), petD-rpoA (1), rps11-rpl36 (1), rpl14-rpl16 (2), rpl16 intron (1), and rpl22-rps19 (1). Twenty-four genetic differences were observed between Phyllostachys edulis (NC_015817) and Phyllostachys nigra (NC_015826). In Phyllostachys, differences were found in rpl32-trnL(UAG) (5), ccsA-ndhD (2), psaC-ndhE (2), matK-5′trnK(UUU) (1), 5′rps16-trnQ(UUG) (2), rpoB-trnC(GCA) (1), petM-psbM (1), psbM-trnD(GUC) (1), ycf3 intron 1 (1), ycf3-trnS(GGA) (1), trnT(UGU)-trnL(UAA) (2), ndhC-trnV(UAC) (1), rpl33-rps18 (1), petB intron (1), and rpl14-rpl16 (2).

We also had to remove species pairs that were too variable to be confidently aligned; that is, between these species pairs, GenBank BLAST opened up large gaps with scattered unaligned nucleotides, and we could not improve these alignments manually. These pairs were Erodium carvifolium (NC_015083) and Erodium texanum (NC_014569); Cuscuta exaltata (NC_009963) and Cuscuta gronovii (NC_009765); Wolffia australiana (NC_015899) and Wolffiella lingulata (NC_015894); Colocasia esculenta (NC_016753) and Lemna minor (NC_010109); and Ranunculus macranthus (NC_008796) and Megaleranthis saniculifolia (NC_012615). In a few cases, one or two NC-cpDNAs had to be omitted from an otherwise neatly aligned genomic comparison because they were too variable to be confidently aligned. This was the case in Nuphar/Nymphaea for petN-psbM; Phoenix/Pseudophoenix for trnC-petN and petN-psbM; Gossypium for psbE-petL; Silene for trnH(GUG)-psbA and clpP-psbB; and Anthriscus/Daucus for ndhF-rpl32, rpl32-trnL(UAG), 5′trnK-3′rps16, 5′rps16-trnQ(UUG), and atpH-atpI (Appendix S1).
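The per-region counts listed above for the four low-divergence pairs can be checked against the stated totals and against the 100-PIC cutoff described under Data analysis. The sketch transcribes those counts from the text (5′ and 3′ written as 5' and 3'); the dictionary and helper names are illustrative only.

```python
MIN_TOTAL_PICS = 100

# PIC counts per region, transcribed from the text for the omitted pairs.
NICOTIANA = {"psbM-trnD(GUC)": 1, "petD intron": 1}
ACORUS = {
    "ndhF-rpl32": 1, "ccsA-ndhD": 1, "ndhG-ndhI": 1, "5'trnK(UUU)-3'rps16": 2,
    "rps16 intron": 1, "trnQ(UUG)-psbK": 1, "trnC(GCA)-petN": 2,
    "trnD(GUC)-trnY(GUA)": 1, "trnT(GGU)-psbD": 3, "trnT(UGU)-trnL(UAA)": 1,
    "trnM(CAU)-atpE": 1, "atpB-rbcL": 1, "ycf4-cemA": 1, "petA-psbJ": 1,
    "psbE-petL": 1, "psaJ-rpl33": 1, "rps18-rpl20": 1, "rpl16 intron": 1,
}
ORYZA = {
    "ndhF-rpl32": 1, "ccsA-ndhD": 1, "ndhA intron": 1, "psbA-3'trnK(UUU)": 1,
    "rps16 intron": 2, "trnD(GUC)-trnY(GUA)": 2, "trnE(UUC)-trnT(GGU)": 2,
    "psbZ-trnG(GCC)": 2, "ycf3-trnS(GGA)": 2, "ndhC-trnV(UAC)": 3,
    "trnM(CAU)-atpE": 1, "accD-psaI": 1, "petL-petG": 1, "petD-rpoA": 1,
    "rps11-rpl36": 1, "rpl14-rpl16": 2, "rpl16 intron": 1, "rpl22-rps19": 1,
}
PHYLLOSTACHYS = {
    "rpl32-trnL(UAG)": 5, "ccsA-ndhD": 2, "psaC-ndhE": 2, "matK-5'trnK(UUU)": 1,
    "5'rps16-trnQ(UUG)": 2, "rpoB-trnC(GCA)": 1, "petM-psbM": 1,
    "psbM-trnD(GUC)": 1, "ycf3 intron 1": 1, "ycf3-trnS(GGA)": 1,
    "trnT(UGU)-trnL(UAA)": 2, "ndhC-trnV(UAC)": 1, "rpl33-rps18": 1,
    "petB intron": 1, "rpl14-rpl16": 2,
}


def summarize(pair: dict) -> tuple:
    """Return (total PICs, retained?, largest % contribution of any region)."""
    total = sum(pair.values())
    top_share = max(pair.values()) * 100 / total
    return total, total >= MIN_TOTAL_PICS, top_share


for name, pair in [("Nicotiana", NICOTIANA), ("Acorus", ACORUS),
                   ("Oryza", ORYZA), ("Phyllostachys", PHYLLOSTACHYS)]:
    print(name, summarize(pair))
```

Every one of these pairs falls far below the cutoff, and in Nicotiana each of the two variable regions would claim 50% of all PICs, which is the kind of inflation the cutoff is meant to prevent.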
The most informative noncoding cpDNA regions

Figure 2A shows the number of genetic differences observed within every single-copy NC-cpDNA and matK, averaged across the 25 species pairs. In this analysis, the top 10 regions were: 5′rps16-trnQ(UUG), ndhC-trnV(UAC), ndhF-rpl32, trnT(GGU)-psbD, psbE-petL, petA-psbJ, rpl32-trnL(UAG), rpl16 intron, ndhA intron, and rpoB-trnC(GCA) (underlines highlight the regions in common between this analysis and the analysis of average percentage contribution to total PICs, below). The matK gene was ranked 12th. Figure 2B summarizes the PIC data, where the value for each NC-cpDNA is its percentage contribution to the total genetic differences observed in pairwise species comparisons, averaged across the 25 lineages. In this analysis, the top 10 regions were: ndhF-rpl32, ndhC-trnV(UAC), rpl16 intron, rpl32-trnL(UAG), 5′rps16-trnQ(UUG), 5′trnK(UUU)-3′rps16, psbE-petL, trnT(GGU)-psbD, petA-psbJ, and psbM-trnD(GUC) (underlines highlight the regions in common between this analysis and the analysis of overall average PICs, above). The matK gene was again ranked 12th. While the relative positions of the most variable regions may change between the averaged PIC and averaged percentage contribution analyses (or even from lineage to lineage; see below), eight NC-cpDNAs appear in the top 10 of both analyses, and these are underlined above and throughout the rest of this paper. These eight regions account for roughly 21% of the PICs in a given lineage (based on the percentage contribution of each region averaged across the 25 species pairs). The top 12 regions account for >33% of the total PICs.

Percentage variability within each region is shown in Fig. 2C. Eight of the top 11 most variable regions in this analysis had an average total length of about 250 bp, and 9 of 11 were under 350 bp, so the total PICs offered by these regions were relatively low even though their percentage variability was high (compare Fig. 2C with Fig. 2A and 2B). The rpl32-trnL(UAG) and rps15-ycf1 regions were the only two of the top regions that were over 500 bp, and only rpl32-trnL(UAG) was over 750 bp. Because choosing the most informative NC-cpDNAs requires combining the effects of NC-cpDNA length and percentage variability, Fig. 2D is positioned next to Fig. 2C to highlight that several NC-cpDNAs that are highly variable by percentage are also very short.

Because we had more than one species pair within most major lineages, we compared the trends within the major clades of angiosperms with the overall trends described above (Fig. 3A-F). While the ranking of the top regions is not perfectly consistent across the major lineages, some NC-cpDNAs consistently rank among the top 13 (the top 13 are shown in order to include matK, because it has been a popular and informative marker since the beginning of cpDNA sequencing). Either ndhF-rpl32 or rpl32-trnL(UAG) was the highest-ranked region in four of six major lineages; in one lineage, rpl32-trnL(UAG) was ranked second behind 5′rps16-trnQ(UUG); and both regions were too variable to be included in one species comparison (Anthriscus/Daucus). The 5′rps16-trnQ(UUG) region ranked in the top two in two of six major lineages and in the top 13 in four of six lineages, but was excluded from one lineage for being too variable (Daucus/Anthriscus). It should be noted that eurosid I was the only major lineage in which neither ndhF-rpl32 nor rpl32-trnL(UAG) ranked in the top two; however, these NC-cpDNA regions are missing from Populus and unusually small in Vigna, which may skew the data, because these two pairs account for two of the four species pairs in the eurosid I group. The rpl32-trnL(UAG) region did rank in the top five in another eurosid I pair, Cucumis. Another high-ranking region, ndhC-trnV(UAC), ranked in the top five in four of six major lineages.
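Why short but highly variable regions rank poorly (the Fig. 2C versus 2D contrast above) follows from simple arithmetic: by the % variability formula, the PICs a region is expected to contribute are roughly its % variability times its aligned length divided by 100. The sketch below ranks two hypothetical regions both ways; the region names and values are invented for illustration and are not taken from Fig. 2.

```python
def expected_pics(percent_variability: float, length_bp: float) -> float:
    """Approximate PICs implied by % variability over a given aligned length."""
    return percent_variability * length_bp / 100


# Hypothetical regions: (percent variability, aligned length in bp).
regions = {
    "short_spacer": (5.0, 200),  # highly variable but short
    "long_spacer": (2.5, 800),   # half as variable but four times longer
}

by_percentage = sorted(regions, key=lambda r: regions[r][0], reverse=True)
by_expected_pics = sorted(regions, key=lambda r: expected_pics(*regions[r]),
                          reverse=True)
print(by_percentage)     # short_spacer first
print(by_expected_pics)  # long_spacer first
```

The short spacer tops the percentage ranking but yields about 10 PICs, while the longer, less variable spacer yields about 20, so the two rankings reverse.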
Analysis of the top eight most variable regions in the 25 individual species pair comparisons (Fig. 4) shows that ndhC-trnV(UAC), rpl32-trnL(UAG), and ndhF-rpl32 each ranked in the top 10 in 16 of 25 lineages, psbE-petL in 15 of 25 lineages (it was too variable to be included in Gossypium), trnT(GGU)-psbD in 14 of 25 lineages, 5′rps16-trnQ(UUG) and petA-psbJ in 11 of 25 lineages, and the rpl16 intron in 10 of 25 lineages.

Adjusted mean PICs

Results of the mixed-model ANCOVA (which accounted for variation across lineages and controlled for the effect of region length) indicate that the mean log-transformed number of PICs/region differed across the 10 NC-cpDNAs that ranked highest in the overall percentage contribution analysis (F(9,194) = 3.09, P = …). Following the ANCOVA, we performed pairwise contrasts of adjusted means and/or groups of means using Scheffé's procedure; selected comparisons involving combined DNA regions (combined vs. combined; single vs. combined) revealed significant differences in mean PICs/region between certain groups of the 10 regions. Scheffé's contrasts revealed significant differences in adjusted mean PICs/region among three groups of combined DNA regions (P < 0.05; Fig. 5). Mean PICs/region (compared as groups based on visual breaks in the data) was highest for Group 1 (geometric mean = 19.2 PICs; regions ndhF-rpl32 and rpl32-trnL(UAG)), intermediate for Group 2 (12.7 PICs; regions ndhC-trnV(UAC), rpl16 intron, and petA-psbJ), and lowest for Group 3 (10.2 PICs; regions 5′rps16-trnQ(UUG), psbE-petL, trnT(GGU)-psbD, ndhA intron, and rpoB-trnC(GCA)). Pairwise contrasts within each of the three groups revealed no differences in mean PICs/region (P > 0.05).
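The adjusted means above come from an ANCOVA with region length as the covariate. The sketch below shows only the core idea, using an ordinary fixed-effects, common-slope ANCOVA rather than the mixed model used here: it ignores the lineage effect, applies the log transforms given in the Fig. 5 caption (y = log10(PICs + 1), x = log10(length + 1)), evaluates every region at the grand mean of x, and back-transforms the adjusted log-scale means. The function name and toy numbers are hypothetical.

```python
import math
from statistics import mean


def adjusted_geometric_means(data):
    """Common-slope ANCOVA on log10 scales, back-transformed to PIC counts.

    data: {region: [(pics, length_bp), ...]}, one tuple per lineage.
    Fits log10(PICs + 1) = a_region + b * log10(length + 1) with a single
    slope b pooled across regions, evaluates each region at the grand mean
    of x, and undoes the +1 log transform (a geometric-mean-style estimate).
    Returns (b, {region: adjusted PICs}).
    """
    xy = {
        region: [(math.log10(length + 1), math.log10(pics + 1)) for pics, length in obs]
        for region, obs in data.items()
    }
    sxy = sxx = 0.0
    for obs in xy.values():
        x_bar = mean(x for x, _ in obs)
        y_bar = mean(y for _, y in obs)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in obs)
        sxx += sum((x - x_bar) ** 2 for x, _ in obs)
    if sxx == 0:
        raise ValueError("lengths do not vary within regions; slope is undefined")
    slope = sxy / sxx  # pooled within-region regression slope
    grand_x = mean(x for obs in xy.values() for x, _ in obs)
    adjusted = {}
    for region, obs in xy.items():
        x_bar = mean(x for x, _ in obs)
        y_bar = mean(y for _, y in obs)
        y_adj = y_bar - slope * (x_bar - grand_x)  # adjusted mean on the log scale
        adjusted[region] = 10 ** y_adj - 1  # undo log10(PICs + 1)
    return slope, adjusted


# Toy data (invented): "A" has fewer PICs than "C" for a given length.
slope, adj = adjusted_geometric_means({
    "A": [(9, 99), (99, 999)],
    "C": [(9, 9), (99, 99)],
})
print(slope, adj)
```

In the toy data both regions have the same unadjusted mean on the log scale (about 30.6 PICs after back-transformation), but A's sequences are longer; once both are evaluated at the common length, A drops to about 9 PICs and C rises to about 99. This is the kind of length correction that the regression-based estimates in Fig. 5 are meant to provide.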

Fig. 2. Expected number of genetic differences within the noncoding single-copy portions of the plastome (NC-cpDNA). Gene order is preserved, beginning at the top with the intersection of inverted repeat b (IRb) and the large single-copy (LSC) region; the small single-copy (SSC) region is at the bottom and begins with rps15-ycf1. The vertical black lines and the four-pointed stars highlight the 10 regions that on average contain the greatest number of potentially informative characters (PICs) in those analyses (A, B). (A) Number of PICs averaged across the 25 species pairs. (B) Average percentage that each noncoding region contributes to the total PICs observed between species pairs. (C) Percentage variability averaged across 25 species pairs. The top 11 most variable regions by percentage are cut off from the rest by the vertical black line and the four-pointed stars (there was a tie for 10th place). Arrows from NC-cpDNAs in (C) point to the same regions in (D) to highlight that 8/11 of the most variable regions in terms of percentage variability are too short to be of great value. (D) Length of noncoding regions averaged across 25 species pairs. The black line in (D) marks 250 bp to highlight regions that may be highly variable in terms of percentage but are less than 250 bp long.

Fig. 2. Continued.

Fig. 3. Clade-specific ranks of single-copy NC-cpDNAs (top 13). Percentage contribution to total potentially informative characters (PICs) (y-axis) for each NC-cpDNA (x-axis), averaged across the species pairs in the clade. The eight highest-ranking regions in the analysis of all 25 lineages combined are marked with four-pointed stars (ndhF-rpl32, rpl32-trnL, rps16-trnQ, ndhC-trnV, trnT-psbD, psbE-petL, petA-psbJ, rpl16 intron).

In another contrast (excluding Group 1), the mean PICs/region for the highest-ranking Group 2 region (geometric mean = 14.7 PICs; region ndhC-trnV(UAC)) was higher than that for the other regions in Groups 2 and 3 (10.6 PICs; regions rpl16 intron, petA-psbJ, 5′rps16-trnQ(UUG), psbE-petL, trnT(GGU)-psbD, ndhA intron, and rpoB-trnC(GCA); P < 0.05). In summary, the ANCOVA and multiple contrasts of adjusted mean PICs/region within the top 10 regions (averaged across lineages) indicated that ndhF-rpl32, rpl32-trnL(UAG), and ndhC-trnV(UAC), in that order, were the most variable regions.
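Scheffé's procedure can test any linear contrast of means, including the group-versus-group comparisons reported above, while holding the familywise error rate across all possible contrasts. The sketch below covers only the simple one-way, fixed-effects case and does not reproduce the contrasts on adjusted means from the mixed model. A contrast's F statistic is compared with (k - 1) times the critical F for (k - 1, N - k) degrees of freedom; the critical value is passed in so that the example needs only the standard library. The function name and data are hypothetical.

```python
from statistics import mean


def scheffe_contrast(groups, coefficients, f_crit):
    """Scheffé test of one linear contrast among group means (one-way layout).

    groups: list of observation lists, one per group.
    coefficients: contrast weights, one per group, summing to zero.
    f_crit: critical F for (k - 1, N - k) degrees of freedom at the chosen
        alpha, supplied by the caller (from a table or a stats package).
    Returns (contrast F, Scheffé threshold, significant?).
    """
    if len(groups) != len(coefficients):
        raise ValueError("need exactly one coefficient per group")
    if abs(sum(coefficients)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [mean(g) for g in groups]
    # Pooled within-group variance (mean squared error) with N - k df
    sse = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    mse = sse / (n_total - k)
    estimate = sum(c * m for c, m in zip(coefficients, means))
    se_factor = sum(c ** 2 / len(g) for c, g in zip(coefficients, groups))
    f_contrast = estimate ** 2 / (mse * se_factor)
    threshold = (k - 1) * f_crit  # Scheffé's correction for all possible contrasts
    return f_contrast, threshold, f_contrast > threshold


# Invented values for three groups; contrast: group 1 vs. the mean of groups 2 and 3.
groups = [[10, 12, 11], [5, 6, 7], [6, 5, 7]]
f_value, threshold, significant = scheffe_contrast(groups, [1, -0.5, -0.5], f_crit=5.14)
print(f_value, threshold, significant)
```

Here 5.14 is the tabulated critical F for (2, 6) degrees of freedom at alpha = 0.05, giving a threshold of 10.28; an F-distribution quantile function such as `scipy.stats.f.ppf(0.95, 2, 6)` could supply it instead of a table.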

Literature review

The total number of phylogeography or phylogeographic papers published between 1987 and 2013 was … . We refined this number by limiting the selections to articles only (9286) and then by the search term *aceae, to capture any mention of a plant family in the text, resulting in a total of 1089 papers. Further refining by the search terms chloroplast or plastid or cpDNA resulted in 699 papers; refining by mitochondria or mitochondrial or mtDNA resulted in 351 papers; and refining by nuclear or nDNA resulted in 253 papers (Fig. 6).

For the literature review, we examined 178 papers published between 2007 and 2013. Of those 178, 123 (69%) used cpDNA sequence data (Fig. 7). The percentage of papers using cpDNA sequences varied by year, ranging between 50% (2007) and 81% (2012). The average number of cpDNA regions used was two to three, with a minimum of one and a maximum of six. Only 28 of the 178 papers explicitly reported screening additional loci, and they did not always state which loci were tested. When this information was provided, the mean number of loci screened per year ranged from four to 15, with a few papers indicating that several regions were screened. Comparing the noncoding regions within studies showed that in 11 comparisons, trnH(GUG)-psbA provided more PICs than other regions in the same study. However, in six other studies, trnH(GUG)-psbA was not the best performer. The region with the second-highest number of observed PICs was trnS(GCU)-5′trnG(GCC), based on nine comparisons; there were only two studies, among those that included it, in which it was not the top performer. Of the top regions reported in the present study, two (ndhC-trnV(UAC) and psbM-trnD(GUC)) were not used in any of the studies reviewed for this comparison.
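The within-study comparisons above amount to a simple tally: for each reviewed study, find the region with the most PICs, then count, per region, how many studies used it and in how many it was the top performer. A sketch with placeholder records (the numbers are invented, not taken from the reviewed papers); treating ties as wins for every tied region is an assumption, and the function name is hypothetical.

```python
def tally_best_regions(studies):
    """Count, per region, the studies that used it and those where it had the most PICs.

    studies: list of {region: PICs} dicts, one per study.
    Ties for the maximum count as a win for every tied region (an assumption).
    Returns {region: (wins, studies_using_region)}.
    """
    wins, used = {}, {}
    for study in studies:
        if not study:
            continue  # a study with no usable region data contributes nothing
        best = max(study.values())
        for region, pics in study.items():
            used[region] = used.get(region, 0) + 1
            if pics == best:
                wins[region] = wins.get(region, 0) + 1
    return {region: (wins.get(region, 0), n) for region, n in used.items()}


# Placeholder records (invented numbers, not the studies reviewed here)
studies = [
    {"trnH(GUG)-psbA": 12, "rpl32-trnL(UAG)": 20},
    {"trnH(GUG)-psbA": 9, "trnS(GCU)-5'trnG(GCC)": 4},
    {"rpl32-trnL(UAG)": 15, "trnS(GCU)-5'trnG(GCC)": 15},
]
print(tally_best_regions(studies))
```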
The remaining top regions identified here were used infrequently in the surveyed papers: rpl32-trnL(UAG) (best in three of four studies); ndhF-rpl32 (best in the one study in which it was used); psbE-petL (best in the one study in which it was used); trnT(UGU)-trnL(UAA) (best in one of two studies); and trnT(GGU)-psbD, rpoB-trnC(GCA), and petA-psbJ, which were not best in any of the studies in which they were used (2, 1, and 2 studies, respectively; Appendix S2, see online Supplemental Data).

DISCUSSION

NC-cpDNAs are important for low-level studies

Today, NC-cpDNA data are the most commonly used tool for phylogeographic and low-level phylogenetic studies of plants (Fig. 6). Our literature review highlighted the fact that researchers working at inter- and intraspecific levels still rely heavily on NC-cpDNA sequence data from one to a few loci (Figs. 6, 7). Of the more recently published plant phylogeography papers (those published between 2007 and 2013), nearly 70% used cpDNA sequence data (Fig. 7). There is a body of literature dating back to the 1990s in which botanists have searched for the most appropriate NC-cpDNAs for low-level phylogenetic and phylogeographic studies (Taberlet et al., 1991; Demesure et al., 1995; Dumolin-Lapegue et al., 1997b; Small et al., 1998; Hamilton, 1999; Saltonstall, 2001; Shaw et al., 2005, 2007). In the mid-2000s, a few studies compared numerous NC-cpDNA regions across a broad enough taxonomic scope that general trends in the relative variability of NC-cpDNA regions could be reported (Small et al., 1998; Bastia et al., 2001; Grivet et al., 2001; Aoki et al., 2003; Dhingra and Folta, 2005; Shaw et al., 2005, 2007; Ebert and Peakall, 2009). Reports of these general trends may have allowed workers to begin using the most informative NC-cpDNA regions and to push the use of the plastome toward the shallowest taxonomic levels.
The most informative noncoding cpDNA regions

A few NC-cpDNA regions stand out as being the most likely to contain high levels of variability at the inter- and intraspecific levels, even though this study (Figs. 2-4) and a body of literature (Mort et al., 2007; Shaw et al., 2007; Särkinen and George, 2013) reinforce the observation that there is no universally best NC-cpDNA. That said, screening what are known to be, on average, the fastest-evolving regions is a good starting point for molecular investigations of new groups (Cires et al., 2012; Fior et al., 2013). In the best-case scenario, researchers would have, or could generate, at least two complete plastomes within their study genus (or study section of large genera) to screen for the most informative regions before beginning a phylogeographic or low-level phylogenetic study (Doorduin et al., 2011; Fehrmann et al., 2012; Li et al., 2013; Särkinen and George, 2013). However, there may be as many as … genera of angiosperms (…org), whereas GenBank (as of August 2014) holds only 374 angiosperm plastomes representing 205 genera. A fair number of these genera are species-poor, endemic, or parasitic and are therefore unlikely to be useful predictors of variability across large numbers of species, so most researchers will probably not have this advantage for several more years. While the cost of NGS approaches is rapidly declining, and novel, simpler methods for undertaking such work are being published quickly, many researchers (particularly at primarily undergraduate institutions) are still limited by funding and computational capacity. For these researchers, as well as others, there remains a trade-off between collecting large data sets for a small number of taxa or individuals and, at the same cost, collecting small data sets for a larger number of taxa or individuals.
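When two complete plastomes from the study group are available, a first-pass screen like the one recommended above could be a sliding window along their alignment that reports the proportion of differing sites per window, so that variable stretches stand out before individual spacers are targeted. A minimal sketch, assuming the plastomes are already aligned end to end; the window and step sizes are arbitrary illustrative defaults, counting single-sequence gaps as differences is an assumption, and the function names are hypothetical. In practice the top windows would still need to be mapped onto annotated noncoding regions.

```python
def sliding_window_divergence(seq_a, seq_b, window=600, step=200):
    """Proportion of differing aligned columns in each window along two plastomes.

    Returns a list of (start, end, p_distance). Columns that are gaps in both
    sequences are skipped; a gap in only one sequence counts as a difference
    (an assumption). If (length - window) is not a multiple of step, the last
    few columns are not covered by a full window.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("plastomes must be aligned to the same length")
    a, b = seq_a.upper(), seq_b.upper()
    results = []
    for start in range(0, max(len(a) - window, 0) + 1, step):
        end = min(start + window, len(a))
        compared = diffs = 0
        for x, y in zip(a[start:end], b[start:end]):
            if x == "-" and y == "-":
                continue
            compared += 1
            if x != y:
                diffs += 1
        results.append((start, end, diffs / compared if compared else 0.0))
    return results


def hotspots(results, top=3):
    """The `top` most divergent windows, highest first."""
    return sorted(results, key=lambda r: r[2], reverse=True)[:top]


# Toy alignment (invented): the second half carries all the differences.
windows = sliding_window_divergence("AAAAAAAAAA", "AAAAATTTTT", window=5, step=5)
print(hotspots(windows, top=1))
```

Windows are a coarser unit than the annotated regions compared in this study, so a short, highly variable spacer can be diluted by conserved flanking sequence; smaller windows reduce that dilution at the cost of noisier estimates.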
Software (e.g., Geneious) may be inexpensive for some researchers, but for others its cost alone may be prohibitive. Additionally, for phylogeographic studies, software has not yet been developed that can fully handle the computational loads associated with NGS big data. Therefore, the choice between NGS and Sanger approaches, while not necessarily mutually exclusive (Fior et al., 2013), will have to be made case by case, based on funding (both short-term and long-term potential), laboratory infrastructure, and the computational support available in a given laboratory. As such, there is still a valid need for information such as that presented here, which provides insight into which NC-cpDNA regions or portions of the plastome are likely to be most informative (Fig. 8). That is, our results can guide researchers who want to use the most variable portions of the plastome, whether these are amplified independently, in larger clusters, multiplexed, or extracted from whole-plastome data sets.

That said, three regions of the plastome stand out to us as hotspots of variability (Fig. 8). The first is the area in the SSC that runs from ccsA to ndhF. This portion of the plastome contains two of the most variable single regions (ndhF-rpl32 and rpl32-trnL(UAG)). The second hotspot is a larger area of the genome, from matK to 3′trnG(GCC), which contains several fairly variable regions, including matK, 5′trnK(UUU)-3′rps16, 5′rps16-trnQ(UUG), and trnS(GCU)-5′trnG(GCC), among others. The third hotspot runs from rpoB to psbD; this larger portion of the genome contains several high-ranking regions clustered together. Interestingly, these regions of the plastome also contain a fair number of rearrangements (Appendix S1). Based on the extensive plastome survey included in the present study, we suggest that it is possible to make good predictions based either on the overall trends or on the trends observed within a clade of interest (Figs. 2-4).

Fig. 4. Potentially informative characters (y-axis) for each NC-cpDNA (x-axis) within each species pair. The eight highest-ranking regions in the analysis of all 25 lineages combined are marked with four-pointed stars (ndhF-rpl32, rpl32-trnL, rps16-trnQ, ndhC-trnV, trnT-psbD, psbE-petL, petA-psbJ, rpl16 intron). Mag. = Magnolia, Cymb. = Cymbidium, Phalae. = Phalaenopsis, Aeth. = Aethionema, Sol. = Solanum, Chrys. = Chrysanthemum.

Fig. 4. Continued.

We used 25 closely related species pairs (most were congeners) from 10 major lineages of angiosperms, with multiple species pairs from six of these clades, to test all NC-cpDNA regions plus matK. Our data predict that the regions most likely to contain the greatest number of variable sites in angiosperms are 5′rps16-trnQ(UUG), ndhC-trnV(UAC), ndhF-rpl32, trnT(GGU)-psbD, psbE-petL, petA-psbJ, rpl32-trnL(UAG), rpl16 intron, ndhA intron, rpoB-trnC(GCA), trnS(GCU)-5′trnG(GCC), 5′trnK(UUU)-3′rps16, psbM-trnD(GUC), and matK (Fig. 2A, B). Primers have been developed and are universal for most of these regions (see Shaw et al., 2007). However, the coding regions flanking ndhF-rpl32 and rpl32-trnL(UAG) (specifically, ndhF and rpl32) are highly variable, making universal primer design difficult for these two NC-cpDNAs. Appendix S3 contains a brief discussion of the ndhF-rpl32 and rpl32-trnL(UAG) primers and a table that will aid in lineage-specific primer design for future projects.

Fig. 5. Geometric mean potentially informative characters (PICs) ± SE per region for the top 10 NC-cpDNA regions (n = … lineages per region). Regression-based estimates [y = log10(total PICs per region + 1), x = log10(length of region + 1)] were calculated for a common x value. Regions are coded 1 to 10 (highest to lowest) and arranged in decreasing order of mean values. Mean PICs/region are reported for three statistical groups; group means followed by different letters are significantly different (P < 0.05; ANCOVA, Scheffé contrasts).

Fig. 6. Intraspecific studies of plants from 1987 to 2013, showing the contributions of the separate plant genomes. We performed a search in Web of Science (May 2014) to determine which plant genome(s) were most frequently used in the phylogeographic literature. Search terms included phylogeography or phylogeographic, refined by articles only and by the term *aceae to identify studies including a plant family, with subsequent independent refinements using the terms chloroplast or plastid or cpDNA, mitochondria or mitochondrial or mtDNA, and nuclear or nDNA.

Fig. 7. Summary of cpDNA sequence use in 178 plant phylogeographic studies published between 2007 and 2013. Of 178 studies surveyed over 7 years, 69% used cpDNA sequence data. This percentage varied by year, ranging between 50% (2007) and 81% (2012), and it has not dropped below 68% since.

The most variable NC-cpDNA regions identified in this research agree with the previous findings of Shaw et al. (2007) and others (Byrne and Hankinson, 2012; Dong et al., 2012). It is interesting that these studies are highly corroborative regarding the best NC-cpDNAs for low-level studies, even though they sampled completely different species from many different plant families. That is, none of the 25 species pairs sampled here were the same as any species pair from TH2 or TH3. Both the present study and TH3 converge on ndhF-rpl32, rpl32-trnL(UAG), 5′rps16-trnQ(UUG), ndhC-trnV(UAC), psbJ-petA, trnT(GGU)-psbD, and psbE-petL as being among the most informative regions of the plastome for most groups. Although they analyzed the data differently and used different species, Dong et al. (2012) and Byrne and Hankinson (2012) also found rpl32-trnL(UAG), 5′rps16-trnQ(UUG), ndhC-trnV(UAC), trnT(GGU)-psbD, and the rpl16 intron to be among the most variable. Additionally, new comparative plastomic data continue to be consistent with our findings at multiple taxonomic levels, including Illicium (O. R. Leonard and A. B. Morris, unpublished data), Solanaceae (Särkinen and George, 2013), and Apiales (Downie and Jansen, in press). Finally, numerous other studies with narrower taxonomic scopes have also shown these same regions to be the most variable in the angiosperm plastome (see discussion, below).

Fig. 8. Map of the potentially informative characters (PICs) expected to be found around the plastome. Gene content, order, and mapping were based on the map of Lee et al. (2006) for Gossypium hirsutum. Bars radiating in from NC-cpDNA regions show the average percentage contribution to total PICs for each NC-cpDNA and matK. The blue, orange, red, and pink circles indicate increasing predicted PIC values for each NC-cpDNA, as shown by the scale bar. The inverted repeats are indicated by bold black lines on the genome and on the blue, orange, red, and pink circles. Note that three areas highlighted in light green stand out as hotspots, or clusters of highly variable regions, of the plastome: ccsA-ndhF, matK-3′trnG, and rpoB-psbD.

Like TH3, we show that in finer-level comparisons, the top NC-cpDNAs are less predictable. For example, trnS(GCU)-5′trnG(GCC) ranked just outside the top 10 regions in the overall comparison and in the top 12 in 3/6 of the comparisons of major lineages, but it ranked first in the Phalaenopsis (monocot) and Populus (eurosid I) comparisons and third in Olea (euasterid I) (Fig. 4). Interestingly, both Phalaenopsis and Populus have plastome rearrangements that result in the partial loss of the ndhF-rpl32-trnL(UAG) region (which actually further supports our finding that this is a highly variable region). In another example, matK, which ranked 12th overall (Fig. 2A, B) and in the top 12 positions in 3/6 comparisons of major lineages (Fig. 3B-D), was ranked first in Aethionema


DNA Barcoding Analyses of White Spruce (Picea glauca var. glauca) and Black Hills Spruce (Picea glauca var. densata) Southern Adventist Univeristy KnowledgeExchange@Southern Senior Research Projects Southern Scholars 4-4-2010 DNA Barcoding Analyses of White Spruce (Picea glauca var. glauca) and Black Hills Spruce (Picea

More information

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson Grundlagen der Bioinformatik, SS 10, D. Huson, April 12, 2010 1 1 Introduction Grundlagen der Bioinformatik Summer semester 2010 Lecturer: Prof. Daniel Huson Office hours: Thursdays 17-18h (Sand 14, C310a)

More information

Aoife McLysaght Dept. of Genetics Trinity College Dublin

Aoife McLysaght Dept. of Genetics Trinity College Dublin Aoife McLysaght Dept. of Genetics Trinity College Dublin Evolution of genome arrangement Evolution of genome content. Evolution of genome arrangement Gene order changes Inversions, translocations Evolution

More information

Microbes usually have few distinguishing properties that relate them, so a hierarchical taxonomy mainly has not been possible.

Microbes usually have few distinguishing properties that relate them, so a hierarchical taxonomy mainly has not been possible. Microbial Taxonomy Traditional taxonomy or the classification through identification and nomenclature of microbes, both "prokaryote" and eukaryote, has been in a mess we were stuck with it for traditional

More information

Microbial Taxonomy. Slowly evolving molecules (e.g., rrna) used for large-scale structure; "fast- clock" molecules for fine-structure.

Microbial Taxonomy. Slowly evolving molecules (e.g., rrna) used for large-scale structure; fast- clock molecules for fine-structure. Microbial Taxonomy Traditional taxonomy or the classification through identification and nomenclature of microbes, both "prokaryote" and eukaryote, has been in a mess we were stuck with it for traditional

More information

Investigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST

Investigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST Investigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST Introduction Bioinformatics is a powerful tool which can be used to determine evolutionary relationships and

More information

MiGA: The Microbial Genome Atlas

MiGA: The Microbial Genome Atlas December 12 th 2017 MiGA: The Microbial Genome Atlas Jim Cole Center for Microbial Ecology Dept. of Plant, Soil & Microbial Sciences Michigan State University East Lansing, Michigan U.S.A. Where I m From

More information

a,bD (modules 1 and 10 are required)

a,bD (modules 1 and 10 are required) This form should be used for all taxonomic proposals. Please complete all those modules that are applicable (and then delete the unwanted sections). For guidance, see the notes written in blue and the

More information

Microbial Taxonomy. Microbes usually have few distinguishing properties that relate them, so a hierarchical taxonomy mainly has not been possible.

Microbial Taxonomy. Microbes usually have few distinguishing properties that relate them, so a hierarchical taxonomy mainly has not been possible. Microbial Taxonomy Traditional taxonomy or the classification through identification and nomenclature of microbes, both "prokaryote" and eukaryote, has been in a mess we were stuck with it for traditional

More information

Objective: You will be able to justify the claim that organisms share many conserved core processes and features.

Objective: You will be able to justify the claim that organisms share many conserved core processes and features. Objective: You will be able to justify the claim that organisms share many conserved core processes and features. Do Now: Read Enduring Understanding B Essential knowledge: Organisms share many conserved

More information

Case Study. Who s the daddy? TEACHER S GUIDE. James Clarkson. Dean Madden [Ed.] Polyploidy in plant evolution. Version 1.1. Royal Botanic Gardens, Kew

Case Study. Who s the daddy? TEACHER S GUIDE. James Clarkson. Dean Madden [Ed.] Polyploidy in plant evolution. Version 1.1. Royal Botanic Gardens, Kew TEACHER S GUIDE Case Study Who s the daddy? Polyploidy in plant evolution James Clarkson Royal Botanic Gardens, Kew Dean Madden [Ed.] NCBE, University of Reading Version 1.1 Polypoidy in plant evolution

More information

SHARED MOLECULAR SIGNATURES SUPPORT THE INCLUSION OF CATAMIXIS IN SUBFAMILY PERTYOIDEAE (ASTERACEAE).

SHARED MOLECULAR SIGNATURES SUPPORT THE INCLUSION OF CATAMIXIS IN SUBFAMILY PERTYOIDEAE (ASTERACEAE). 418 SHARED MOLECULAR SIGNATURES SUPPORT THE INCLUSION OF CATAMIXIS IN SUBFAMILY PERTYOIDEAE (ASTERACEAE). Jose L. Panero Section of Integrative Biology, 1 University Station, C0930, The University of Texas,

More information

NGSS Example Bundles. Page 1 of 13

NGSS Example Bundles. Page 1 of 13 High School Modified Domains Model Course III Life Sciences Bundle 4: Life Diversifies Over Time This is the fourth bundle of the High School Domains Model Course III Life Sciences. Each bundle has connections

More information

The identification of animal biological diversity by using

The identification of animal biological diversity by using Use of DNA barcodes to identify flowering plants W. John Kress*, Kenneth J. Wurdack*, Elizabeth A. Zimmer*, Lee A. Weigt, and Daniel H. Janzen *Department of Botany and Laboratories of Analytical Biology,

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S1 (box). Supplementary Methods description. Prokaryotic Genome Database Archaeal and bacterial genome sequences were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/)

More information

The Chlamydomonas reinhardtii Plastid Chromosome: Islands of Genes in a Sea of Repeats

The Chlamydomonas reinhardtii Plastid Chromosome: Islands of Genes in a Sea of Repeats The Plant Cell, Vol. 14, 2659 2679, November 2002, www.plantcell.org 2002 American Society of Plant Biologists GENOMICS ARTICLE The Chlamydomonas reinhardtii Plastid Chromosome: Islands of Genes in a Sea

More information

Chapter 27: Evolutionary Genetics

Chapter 27: Evolutionary Genetics Chapter 27: Evolutionary Genetics Student Learning Objectives Upon completion of this chapter you should be able to: 1. Understand what the term species means to biology. 2. Recognize the various patterns

More information

Molecular Markers, Natural History, and Evolution

Molecular Markers, Natural History, and Evolution Molecular Markers, Natural History, and Evolution Second Edition JOHN C. AVISE University of Georgia Sinauer Associates, Inc. Publishers Sunderland, Massachusetts Contents PART I Background CHAPTER 1:

More information

MULTIPLE SEQUENCE ALIGNMENT FOR CONSTRUCTION OF PHYLOGENETIC TREE

MULTIPLE SEQUENCE ALIGNMENT FOR CONSTRUCTION OF PHYLOGENETIC TREE MULTIPLE SEQUENCE ALIGNMENT FOR CONSTRUCTION OF PHYLOGENETIC TREE Manmeet Kaur 1, Navneet Kaur Bawa 2 1 M-tech research scholar (CSE Dept) ACET, Manawala,Asr 2 Associate Professor (CSE Dept) ACET, Manawala,Asr

More information

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other? Phylogeny and systematics Why are these disciplines important in evolutionary biology and how are they related to each other? Phylogeny and systematics Phylogeny: the evolutionary history of a species

More information

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Bio 1B Lecture Outline (please print and bring along) Fall, 2007 Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution

More information

STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization)

STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization) STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization) Laura Salter Kubatko Departments of Statistics and Evolution, Ecology, and Organismal Biology The Ohio State University kubatko.2@osu.edu

More information

BLAST. Varieties of BLAST

BLAST. Varieties of BLAST BLAST Basic Local Alignment Search Tool (1990) Altschul, Gish, Miller, Myers, & Lipman Uses short-cuts or heuristics to improve search speed Like speed-reading, does not examine every nucleotide of database

More information

Genomes and Their Evolution

Genomes and Their Evolution Chapter 21 Genomes and Their Evolution PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley with contributions from

More information

Quantifying sequence similarity

Quantifying sequence similarity Quantifying sequence similarity Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, February 16 th 2016 After this lecture, you can define homology, similarity, and identity

More information

A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS

A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS CRYSTAL L. KAHN and BENJAMIN J. RAPHAEL Box 1910, Brown University Department of Computer Science & Center for Computational Molecular Biology

More information

Research Article Analysis of the Complete Mitochondrial Genome Sequence of the Diploid Cotton Gossypium raimondii by Comparative Genomics Approaches

Research Article Analysis of the Complete Mitochondrial Genome Sequence of the Diploid Cotton Gossypium raimondii by Comparative Genomics Approaches BioMed Research International Volume 2016, Article ID 5040598, 18 pages http://dx.doi.org/10.1155/2016/5040598 Research Article Analysis of the Complete Mitochondrial Genome Sequence of the Diploid Cotton

More information

Integrative Biology 200A "PRINCIPLES OF PHYLOGENETICS" Spring 2012 University of California, Berkeley

Integrative Biology 200A PRINCIPLES OF PHYLOGENETICS Spring 2012 University of California, Berkeley Integrative Biology 200A "PRINCIPLES OF PHYLOGENETICS" Spring 2012 University of California, Berkeley B.D. Mishler Feb. 7, 2012. Morphological data IV -- ontogeny & structure of plants The last frontier

More information

Major questions of evolutionary genetics. Experimental tools of evolutionary genetics. Theoretical population genetics.

Major questions of evolutionary genetics. Experimental tools of evolutionary genetics. Theoretical population genetics. Evolutionary Genetics (for Encyclopedia of Biodiversity) Sergey Gavrilets Departments of Ecology and Evolutionary Biology and Mathematics, University of Tennessee, Knoxville, TN 37996-6 USA Evolutionary

More information

Phylogenetic Analysis

Phylogenetic Analysis Phylogenetic Analysis Aristotle Through classification, one might discover the essence and purpose of species. Nelson & Platnick (1981) Systematics and Biogeography Carl Linnaeus Swedish botanist (1700s)

More information

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of

More information

Bioinformatics. Dept. of Computational Biology & Bioinformatics

Bioinformatics. Dept. of Computational Biology & Bioinformatics Bioinformatics Dept. of Computational Biology & Bioinformatics 3 Bioinformatics - play with sequences & structures Dept. of Computational Biology & Bioinformatics 4 ORGANIZATION OF LIFE ROLE OF BIOINFORMATICS

More information

Microbiome: 16S rrna Sequencing 3/30/2018

Microbiome: 16S rrna Sequencing 3/30/2018 Microbiome: 16S rrna Sequencing 3/30/2018 Skills from Previous Lectures Central Dogma of Biology Lecture 3: Genetics and Genomics Lecture 4: Microarrays Lecture 12: ChIP-Seq Phylogenetics Lecture 13: Phylogenetics

More information

DNA BARCODING OF PLANTS AT SHAW NATURE RESERVE USING matk AND rbcl GENES

DNA BARCODING OF PLANTS AT SHAW NATURE RESERVE USING matk AND rbcl GENES DNA BARCODING OF PLANTS AT SHAW NATURE RESERVE USING matk AND rbcl GENES LIVINGSTONE NGANGA. Missouri Botanical Garden. Barcoding is the use of short DNA sequences to identify and differentiate species.

More information

A novel chloroplast gene reported for flagellate plants

A novel chloroplast gene reported for flagellate plants RESEARCH ARTICLE BRIEF COMMUNICATION A novel chloroplast gene reported for flagellate plants Michael Song 1,5, *, Li-Yaung Kuo 2,3, *, Layne Huiet 4, Kathleen M. Pryer 4, Carl J. Rothfels 1, and Fay-Wei

More information

Leber s Hereditary Optic Neuropathy

Leber s Hereditary Optic Neuropathy Leber s Hereditary Optic Neuropathy Dear Editor: It is well known that the majority of Leber s hereditary optic neuropathy (LHON) cases was caused by 3 mtdna primary mutations (m.3460g A, m.11778g A, and

More information

Computational Biology

Computational Biology Computational Biology Lecture 6 31 October 2004 1 Overview Scoring matrices (Thanks to Shannon McWeeney) BLAST algorithm Start sequence alignment 2 1 What is a homologous sequence? A homologous sequence,

More information

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega BLAST Multiple Sequence Alignments: Clustal Omega What does basic BLAST do (e.g. what is input sequence and how does BLAST look for matches?) Susan Parrish McDaniel College Multiple Sequence Alignments

More information

Non-independence in Statistical Tests for Discrete Cross-species Data

Non-independence in Statistical Tests for Discrete Cross-species Data J. theor. Biol. (1997) 188, 507514 Non-independence in Statistical Tests for Discrete Cross-species Data ALAN GRAFEN* AND MARK RIDLEY * St. John s College, Oxford OX1 3JP, and the Department of Zoology,

More information

Small RNA in rice genome

Small RNA in rice genome Vol. 45 No. 5 SCIENCE IN CHINA (Series C) October 2002 Small RNA in rice genome WANG Kai ( 1, ZHU Xiaopeng ( 2, ZHONG Lan ( 1,3 & CHEN Runsheng ( 1,2 1. Beijing Genomics Institute/Center of Genomics and

More information

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010 BLAST Database Searching BME 110: CompBio Tools Todd Lowe April 8, 2010 Admin Reading: Read chapter 7, and the NCBI Blast Guide and tutorial http://www.ncbi.nlm.nih.gov/blast/why.shtml Read Chapter 8 for

More information

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the

More information

Microbial Taxonomy and the Evolution of Diversity

Microbial Taxonomy and the Evolution of Diversity 19 Microbial Taxonomy and the Evolution of Diversity Copyright McGraw-Hill Global Education Holdings, LLC. Permission required for reproduction or display. 1 Taxonomy Introduction to Microbial Taxonomy

More information

Phylogenomics. Jeffrey P. Townsend Department of Ecology and Evolutionary Biology Yale University. Tuesday, January 29, 13

Phylogenomics. Jeffrey P. Townsend Department of Ecology and Evolutionary Biology Yale University. Tuesday, January 29, 13 Phylogenomics Jeffrey P. Townsend Department of Ecology and Evolutionary Biology Yale University How may we improve our inferences? How may we improve our inferences? Inferences Data How may we improve

More information

The Scope and Growth of Spatial Analysis in the Social Sciences

The Scope and Growth of Spatial Analysis in the Social Sciences context. 2 We applied these search terms to six online bibliographic indexes of social science Completed as part of the CSISS literature search initiative on November 18, 2003 The Scope and Growth of Spatial

More information

M OLECULAR SYSTEMATICS OF THE NEOTROPICAL GENUS PSIGURIA (CUCURBITACEAE): IMPLICATIONS FOR PHYLOGENY

M OLECULAR SYSTEMATICS OF THE NEOTROPICAL GENUS PSIGURIA (CUCURBITACEAE): IMPLICATIONS FOR PHYLOGENY American Journal of Botany 97(1): 156 173. 2010. M OLECULAR SYSTEMATICS OF THE NEOTROPICAL GENUS PSIGURIA (CUCURBITACEAE): IMPLICATIONS FOR PHYLOGENY AND SPECIES IDENTIFICATION 1 P. Roxanne Steele2,3,4,

More information

Phylogenetic Analysis

Phylogenetic Analysis Phylogenetic Analysis Aristotle Through classification, one might discover the essence and purpose of species. Nelson & Platnick (1981) Systematics and Biogeography Carl Linnaeus Swedish botanist (1700s)

More information

Phylogenetic Analysis

Phylogenetic Analysis Phylogenetic Analysis Aristotle Through classification, one might discover the essence and purpose of species. Nelson & Platnick (1981) Systematics and Biogeography Carl Linnaeus Swedish botanist (1700s)

More information

ABSTRACT INTRODUCTION. Xin Yao 1,2, Ying-Ying Liu 3, Yun-Hong Tan 1, Yu Song 1,2 and Richard T. Corlett 1

ABSTRACT INTRODUCTION. Xin Yao 1,2, Ying-Ying Liu 3, Yun-Hong Tan 1, Yu Song 1,2 and Richard T. Corlett 1 The complete chloroplast genome sequence of Helwingia himalaica (Helwingiaceae, Aquifoliales) and a chloroplast phylogenomic analysis of the Campanulidae Xin Yao,2, Ying-Ying Liu 3, Yun-Hong Tan, Yu Song,2

More information

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline

More information

Analysis of putative DNA barcodes for identification and distinction of native and invasive plant species

Analysis of putative DNA barcodes for identification and distinction of native and invasive plant species Babson College Digital Knowledge at Babson Babson Faculty Research Fund Working Papers Babson Faculty Research Fund 2010 Analysis of putative DNA barcodes for identification and distinction of native and

More information

PHYLOGENY & THE TREE OF LIFE

PHYLOGENY & THE TREE OF LIFE PHYLOGENY & THE TREE OF LIFE PREFACE In this powerpoint we learn how biologists distinguish and categorize the millions of species on earth. Early we looked at the process of evolution here we look at

More information

Taxonomy. Content. How to determine & classify a species. Phylogeny and evolution

Taxonomy. Content. How to determine & classify a species. Phylogeny and evolution Taxonomy Content Why Taxonomy? How to determine & classify a species Domains versus Kingdoms Phylogeny and evolution Why Taxonomy? Classification Arrangement in groups or taxa (taxon = group) Nomenclature

More information

and just what is science? how about this biology stuff?

and just what is science? how about this biology stuff? Welcome to Life on Earth! Rob Lewis 512.775.6940 rlewis3@austincc.edu 1 The Science of Biology Themes and just what is science? how about this biology stuff? 2 1 The Process Of Science No absolute truths

More information

MtDNA profiles and associated haplogroups

MtDNA profiles and associated haplogroups A systematic approach to an old problem Anita Brandstätter 1, Alexander Röck 2, Arne Dür 2, Walther Parson 1 1 Institute of Legal Medicine Innsbruck Medical University 2 Institute of Mathematics University

More information

Another Look at the Root of the Angiosperms Reveals a Familiar Tale

Another Look at the Root of the Angiosperms Reveals a Familiar Tale Syst. Biol. 63(3):368 382, 2014 The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com

More information

A (short) introduction to phylogenetics

A (short) introduction to phylogenetics A (short) introduction to phylogenetics Thibaut Jombart, Marie-Pauline Beugin MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis with PR Statistics, Millport Field

More information

Received: 17 August 2017; Accepted: 10 September 2017; Published: 11 September 2017

Received: 17 August 2017; Accepted: 10 September 2017; Published: 11 September 2017 Article Complete Chloroplast Genome of Pinus massoniana (Pinaceae): Gene Rearrangements, Loss of ndh Genes, and Short Inverted Repeats Contraction, Expansion ZhouXian Ni 1, YouJu Ye 1, Tiandao Bai 1,2,

More information

Thirteen Camellia chloroplast genome sequences determined by high-throughput sequencing: genome structure and phylogenetic relationships

Thirteen Camellia chloroplast genome sequences determined by high-throughput sequencing: genome structure and phylogenetic relationships Huang et al. BMC Evolutionary Biology 2014, 14:151 RESEARCH ARTICLE Open Access Thirteen Camellia chloroplast genome sequences determined by high-throughput sequencing: genome structure and phylogenetic

More information

BIOINFORMATICS LAB AP BIOLOGY

BIOINFORMATICS LAB AP BIOLOGY BIOINFORMATICS LAB AP BIOLOGY Bioinformatics is the science of collecting and analyzing complex biological data. Bioinformatics combines computer science, statistics and biology to allow scientists to

More information