Comparison of Three Fungal ITS Reference Sets. Qiong Wang and Jim R. Cole


RDP TECHNICAL REPORT
Created 04/12/2014, Updated 08/08/2014

Summary

In this report, we evaluate the performance of three fungal ITS datasets using the RDP Classifier (1). The genera covered differ significantly among the three sets. UNITE is the largest dataset, covering 85% of the Warcup genera and 73% of the DOE_SFA genera. DOE_SFA is the smallest, containing only 48% of the Warcup genera and 39% of the UNITE genera. Warcup showed the highest and tightest within-species similarity, with a median of 96%. UNITE_sh (sequences grouped by UNITE species hypothesis accession code) has a median within-species similarity of 90%. Warcup and UNITE_sh performed similarly in leave-one-sequence-out testing: 85% and 88% accuracy at the species level, and 93% and 90% at the genus level, respectively. Warcup showed the best accuracy in leave-one-taxon-out testing, with UNITE_sh second best. Classifying 1,000 near-full-length ITS sequences with the UNITE_sh training set took 80 seconds on a single CPU of a Mac with a 3.2 GHz Intel Core i5 processor. With the Warcup training set, classification was twice as fast, roughly in proportion to the relative number of species. When trained on the UNITE_name set, in which sequences were grouped by UNITE taxon names, the Classifier performed much worse than when trained on UNITE_sh. Both the Warcup and UNITE_sh ITS training sets are available from the RDP Classifier web site, the RDP SourceForge repository (classifier/) and the GitHub repository.

ITS Reference Sets

DOE_SFA ref set: a published, hand-curated set (U.S. Department of Energy Science Focus Area). The sequences and taxonomy construction of this set are described in detail in Porras-Alfaro et al. (2). Briefly, the majority of sequences were selected from published phylogenies or from NCBI searches. It contains lineage information only down to the genus level.
Warcup ref set: a version from an active curation effort, kindly provided by Paul Greenfield and Vinita Deshpande of the Australian Commonwealth Scientific and Industrial Research Organization (manuscript in preparation). It also incorporates some training sequences from the DOE_SFA and UNITE ref sets. It contains lineages down to the species level.

UNITE ref set: a set of UNITE core sequences (excluding chimeric and low-quality sequences) for each dynamic species hypothesis, provided by Kessy Abarenkov of UNITE on July 4. This file uses the UNITE dynamic species hypotheses, which were created by a two-tier clustering process that first clusters sequences to the subgenus/genus level and then to the finer species level (3). In addition to the UNITE species hypothesis accession code, each sequence is labeled with a lineage that includes a more traditional UNITE taxon name as the species designation. We tested the UNITE set twice: once grouping sequences by UNITE taxon name as terminal taxa (UNITE_name), and a second time using a concatenation of the UNITE species hypothesis and UNITE taxon name to group sequences into terminal taxa (UNITE_sh). For example, instead of having one terminal taxon Cortinarius_caesiocortinatus, UNITE_sh has two terminal taxa, Cortinarius_caesiocortinatus SH FU and Cortinarius_caesiocortinatus SH FU. Except for the grouping of sequences into terminal taxa, the sequences in the two UNITE ref sets are identical.

For each of the ref sets described above, we constructed a unique set by removing any sequence that is identical to, or a substring of, another sequence in the same training set. Removing duplicates is important when evaluating a dataset: otherwise a held-out query may still have an identical copy in the training set, which inflates the results. The taxonomic composition and number of sequences are listed in Table 1a. In addition to the common domain Fungi, the DOE_SFA set contains one sequence from each of three domains, Protozoa, Viridiplantae and Stramenopiles; UNITE contains 56 sequences from the domain Protozoa. The vast majority of sequences in all three datasets cover the full ITS region, including ITS1, 5.8S and ITS2 (Table 1b).
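The de-duplication step can be sketched as follows. This is a minimal illustration, not the code used for the report: the function name `unique_set`, the `(seq_id, sequence)` record layout and the case-insensitive comparison are assumptions. Sequences are visited longest first, so each one is checked against every retained sequence that could contain it; the pairwise substring check is quadratic and is only meant for small sets.

```python
def unique_set(records):
    """Hypothetical helper: keep only records whose sequence is not identical to,
    or a substring of, another sequence in the same set.

    records: list of (seq_id, sequence) tuples. Comparison ignores case
    (an assumption). Among identical sequences, the first one listed is kept.
    """
    # Longest first; the stable sort keeps input order among equal lengths.
    order = sorted(range(len(records)), key=lambda i: -len(records[i][1]))
    kept_seqs = []   # upper-cased sequences retained so far
    kept_idx = set()
    for i in order:
        seq = records[i][1].upper()
        # Identical sequences and substrings both satisfy `seq in other`.
        # Checking only retained sequences suffices: anything contained in a
        # removed sequence is also contained in the sequence that removed it.
        if any(seq in other for other in kept_seqs):
            continue
        kept_seqs.append(seq)
        kept_idx.add(i)
    # Return the survivors in their original input order.
    return [rec for i, rec in enumerate(records) if i in kept_idx]


records = [
    ("a", "ACGTACGT"),
    ("b", "CGTA"),       # substring of "a": removed
    ("c", "acgtacgt"),   # identical to "a" ignoring case: removed
    ("d", "TTTT"),
]
print(unique_set(records))  # [('a', 'ACGTACGT'), ('d', 'TTTT')]
```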
Table 1a: Taxonomic composition at major ranks
Rank               Warcup    DOE_SFA   UNITE
domain (kingdom)
phylum
class
order
family
genus              1,620     1,134     2,135
species            8,967     NA        20,221*
Unique sequences   17,923    6,…       …,019
* UNITE_sh has 20,221 species-level taxa; UNITE_name has 10,346.

Table 1b: Completeness of the unique sequences (%)
Completeness       Warcup    DOE_SFA   UNITE
(Near) complete
Incomplete ITS1
Incomplete ITS2
Incomplete both

Results

Commonality

We compared the three ref sets to measure the extent to which genera and sequences are shared between them (Tables 2a and 2b). UNITE is the largest set, containing 85% of the genera from Warcup and 73% of the genera from DOE_SFA. It also contains more than half of the sequences (GenBank accession numbers) from Warcup and DOE_SFA. Warcup is the second largest set, containing 69% of the genera from DOE_SFA and 64% of the genera from UNITE. Neither Warcup nor DOE_SFA contains more than 15% of the sequences from either of the other sets (Table 2b). The numbers of genera and sequences shared among the ref sets are shown as Venn diagrams in Fig. 1.

Table 2a: Shared genera (percentage of the row set's genera that are also present in the column set)
           Warcup    DOE_SFA   UNITE
Warcup     -         48%       85%
DOE_SFA    69%       -         73%
UNITE      64%       39%       -

Table 2b: Shared sequences (percentage of the row set's sequences that are also present in the column set)
           Warcup    DOE_SFA   UNITE
Warcup     -         6%        66%
DOE_SFA    15%       -         56%
UNITE      8%        3%        -

Figure 1: Venn diagrams of shared genera and shared sequences among the DOE_SFA, Warcup and UNITE ref sets.

Taxa Similarity

We examined how similar sequences are within and between taxa. Because no good multiple alignment method is available for ITS, we used Sab scores as a measure of similarity between sequences.
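The percentages in Tables 2a and 2b, and the region counts behind Fig. 1, are simple set operations. The sketch below assumes each ref set has already been reduced to a Python set of genus names (or GenBank accession numbers), and that each table cell is the fraction of the row set's items also found in the column set, which is how the text above reads the tables. The function names are hypothetical and the toy data are not from the report.

```python
from collections import Counter
from itertools import permutations


def shared_fractions(sets):
    """For every ordered pair (row, col) of named sets, return the fraction of
    the row set's items that also occur in the column set."""
    table = {}
    for row, col in permutations(sets, 2):
        items = sets[row]
        table[(row, col)] = len(items & sets[col]) / len(items) if items else 0.0
    return table


def venn_regions(sets):
    """Count items by the exact combination of sets they belong to
    (one count per region of a Venn diagram)."""
    regions = Counter()
    for item in set().union(*sets.values()):
        members = tuple(sorted(name for name, s in sets.items() if item in s))
        regions[members] += 1
    return regions


# Toy genus lists for illustration only.
genera = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g1", "g2"},
    "C": {"g1", "g5"},
}
print(shared_fractions(genera)[("A", "B")])  # 0.5: half of A's genera are in B
print(venn_regions(genera)[("A",)])          # 2: g3 and g4 occur only in A
```

Note that the table is not symmetric: here all of B's genera are in A, but only half of A's genera are in B, just as 85% of Warcup genera are in UNITE while 64% of UNITE genera are in Warcup.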

Because DOE_SFA does not group sequences at the species rank, its finest pool is the genus: the median Sab score within genera is 56%, dropping to 31% at the family rank (Fig. 2). Warcup showed the highest and tightest within-species similarity, with a median of 96%. UNITE_sh has a median within-species similarity of 90%, with a wide range from 72% (2nd percentile) to 99% (98th percentile). For both DOE_SFA and UNITE_sh, the higher ranks were slightly less similar than in Warcup. UNITE_name has the lowest median within-species similarity, 37%. Similarity between species (or at higher ranks) was low for all sets.

Figure 2: Box-and-whisker plots of intra-taxon similarity (Sab score) at each major rank. The bottom, middle and top of each box show the 1st quartile, median and 3rd quartile; the whiskers mark the 2nd and 98th percentiles. Panels, clockwise: Warcup, DOE_SFA, UNITE_name and UNITE_sh. Note that DOE_SFA does not group at the species rank.
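The similarity analysis behind Fig. 2 (defined under Methods) can be sketched in Python. Two details go beyond what the report states and are assumptions: the Sab denominator is taken to be the smaller number of distinct 8-mers in either sequence (our reading of the RDP SeqMatch definition), and k-mers containing characters other than A, C, G and T are skipped. Box and whisker positions use linear-interpolation percentiles. All function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]
ACGT = frozenset("ACGT")


def kmers(seq, k=8):
    """Distinct k-mers of seq, skipping any that contain non-ACGT characters."""
    seq = seq.upper()
    words = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return {w for w in words if set(w) <= ACGT}


def sab(a, b, k=8):
    """Shared distinct k-mers divided by the smaller distinct k-mer count
    (assumed denominator); 0.0 if either sequence is shorter than k."""
    ka, kb = kmers(a, k), kmers(b, k)
    smaller = min(len(ka), len(kb))
    return len(ka & kb) / smaller if smaller else 0.0


def lowest_common_rank(lin_a, lin_b):
    """Deepest rank at which two root-first lineages still agree, or None."""
    common = None
    for rank, x, y in zip(RANKS, lin_a, lin_b):
        if x != y:
            break
        common = rank
    return common


def pool_scores(entries, k=8):
    """entries: list of (lineage_tuple, sequence). Each pairwise Sab score is
    added to the pool of the pair's lowest common rank."""
    pools = defaultdict(list)
    for (lin_a, seq_a), (lin_b, seq_b) in combinations(entries, 2):
        rank = lowest_common_rank(lin_a, lin_b)
        if rank is not None:
            pools[rank].append(sab(seq_a, seq_b, k))
    return pools


def percentile(values, p):
    """p-th percentile of a non-empty list, interpolating linearly."""
    vals = sorted(values)
    pos = (len(vals) - 1) * p / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(vals) - 1)
    return vals[lo] + (vals[hi] - vals[lo]) * (pos - lo)


def box_summary(values):
    """2nd percentile, Q1, median, Q3 and 98th percentile, as drawn in Fig. 2."""
    return {p: percentile(values, p) for p in (2, 25, 50, 75, 98)}


entries = [
    (("Fungi", "P1", "C1", "O1", "F1", "G1", "S1"), "ACGTACGTACGTAAGG"),
    (("Fungi", "P1", "C1", "O1", "F1", "G1", "S1"), "ACGTACGTACGTAAGC"),
    (("Fungi", "P1", "C1", "O1", "F1", "G1", "S2"), "ACGTACGTTTGGCCAA"),
]
pools = pool_scores(entries)
print({rank: len(scores) for rank, scores in pools.items()})  # {'species': 1, 'genus': 2}
```

Because lineages are compared from the root down, a pair from the same species lands only in the species pool, while a pair from the same genus but different species lands in the genus pool, matching the pooling rule described under Methods.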

Leave-One-Out Testing

We performed both leave-one-sequence-out and leave-one-taxon-out testing on the three fungal ITS datasets. All Warcup and DOE_SFA sequences were used for testing. For the UNITE_sh and UNITE_name sets, one randomly chosen sequence from each species was used as the query. No bootstrap cutoff was applied when measuring accuracy. Warcup achieved 85% accuracy at the species level and 93% at the genus level; UNITE_sh achieved 88% at the species level and 90% at the genus level (Fig. 3). DOE_SFA achieved only 79% at the genus level. Note that our results differ from those in the publication describing the DOE_SFA dataset (2): duplicate sequences were not removed from the training set in those tests, whereas they were removed for this report.

When the taxon was removed from the training set, accuracy at the lower ranks (order, family, genus) decreased for all data sets. For example, if the species was absent from the training set, the Classifier trained on Warcup assigned the sequence to the correct genus in 73% of cases, but in only 58% of cases when trained on UNITE_sh. If the genus was absent, the Classifier trained on Warcup made the correct family assignment 90% of the time, but only 77% of the time when trained on UNITE_sh and 60% when trained on DOE_SFA.

When tested on the UNITE_name ref set, which uses the species name as the terminal taxon, the Classifier achieved only 74% accuracy at the species level and 80% at the genus level in leave-one-sequence-out testing. Leave-one-taxon-out accuracy with UNITE_name was also worse than with UNITE_sh. Examining the sequences misclassified during leave-one-out testing, we found that in the majority of cases their closest match (highest Sab score) was a sequence from a different species (Table 3).
Table 3: Percentage of misclassified sequences whose closest match is in a different taxon
Ref set      # misclassified seqs   % misclassified seqs with closest match in different taxon
Warcup
DOE_SFA
UNITE_sh
UNITE_name
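The leave-one-sequence-out procedure and its singleton rule (see Methods) can be sketched as below. The classifier is a stripped-down version of the naive Bayesian 8-mer model described by Wang et al. (1): a word prior P_i = (n(w_i) + 0.5)/(N + 1), where n(w_i) is the number of training sequences containing word w_i and N the number of training sequences, and a taxon-conditional probability (m(w_i) + P_i)/(M + 1), where the taxon has M sequences, m(w_i) of which contain w_i. Taking the full lineage as the terminal taxon, omitting the bootstrap step, and all function names are assumptions for illustration; this is not the RDP Classifier's code. Retraining for every query is quadratic and only suitable for toy data.

```python
import math
from collections import Counter, defaultdict

RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def words(seq, k=8):
    """Distinct k-mers (words) of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(records, k=8):
    """records: list of (lineage_tuple, sequence); full lineage = terminal taxon."""
    seqs_with_word = Counter()          # n(w): training sequences containing w
    taxon_word = defaultdict(Counter)   # m(w): sequences in a taxon containing w
    taxon_size = Counter()              # M: sequences in a taxon
    for lineage, seq in records:
        ws = words(seq, k)
        seqs_with_word.update(ws)
        taxon_word[lineage].update(ws)
        taxon_size[lineage] += 1
    return len(records), seqs_with_word, taxon_word, taxon_size


def classify(model, seq, k=8):
    """Return the terminal taxon with the highest summed log word probability."""
    n_seqs, seqs_with_word, taxon_word, taxon_size = model
    query = words(seq, k)
    best, best_score = None, -math.inf
    for taxon, counts in taxon_word.items():
        score = 0.0
        for w in query:
            prior = (seqs_with_word[w] + 0.5) / (n_seqs + 1)
            score += math.log((counts[w] + prior) / (taxon_size[taxon] + 1))
        if score > best_score:
            best, best_score = taxon, score
    return best


def leave_one_sequence_out(records, k=8):
    """Return {rank: (correct, tested)}. A query is scored at a rank only if
    its true taxon at that rank still has another sequence in training."""
    correct, tested = Counter(), Counter()
    for i, (lineage, seq) in enumerate(records):
        rest = records[:i] + records[i + 1:]
        predicted = classify(train(rest, k), seq, k)
        for depth, rank in enumerate(RANKS[:len(lineage)]):
            prefix = lineage[:depth + 1]
            if not any(lin[:depth + 1] == prefix for lin, _ in rest):
                continue  # singleton at this rank: excluded from accuracy
            tested[rank] += 1
            if predicted[:depth + 1] == prefix:
                correct[rank] += 1
    return {r: (correct[r], tested[r]) for r in RANKS if tested[r]}


def lin(*names):
    return ("Fungi", "P1", "C1", "O1", "F1") + names


records = [
    (lin("G1", "S1"), "AAAAAAAAAAAC"),
    (lin("G1", "S1"), "AAAAAAAAAAAG"),
    (lin("G2", "S2"), "CCCCCCCCCCCA"),
    (lin("G2", "S2"), "CCCCCCCCCCCT"),
    (("Fungi", "P2", "C2", "O2", "F2", "G3", "S3"), "GGGGGGGGGGGT"),
]
result = leave_one_sequence_out(records)
print(result["domain"], result["species"])  # (5, 5) (4, 4)
```

In this toy set the last sequence is the only member of its phylum, so it is scored at the domain rank only; that is why domain has five tested queries and every lower rank has four.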

Figure 3: Classification accuracy at each major taxonomic rank in leave-one-out testing. The RDP Classifier was trained on each of the four fungal ITS sets (DOE_SFA, Warcup, UNITE_sh and UNITE_name). No bootstrap cutoff was applied in the accuracy calculation. Top: leave-one-sequence-out testing (domain to species). Bottom: leave-one-taxon-out testing (domain to genus).

Methods

Leave-one-sequence-out testing: in each iteration, one sequence from the training set was chosen as the test sequence and removed from the training set. The Classifier's assignment for that sequence was compared with its original taxonomy label to measure accuracy. A singleton sequence could still be included in the accuracy calculation at a higher rank if that higher-rank taxon contained multiple sequences. For example, if a sequence is the only sequence for its species, it is not included in the species-rank accuracy calculation; but if the sequence belongs to a genus containing multiple species, it is included in the genus-rank calculation.

Leave-one-taxon-out testing: very similar to leave-one-sequence-out testing, except that for each test sequence, the lowest taxon containing that sequence (either the species or the genus node) was removed from the training set. This tests how likely the Classifier is to assign a sequence to the correct genus or higher taxon when its species or genus is not present in the training set.

Sab score: the percentage of shared 8-mers between two sequences. This is the same score as the one calculated by RDP SeqMatch, except that SeqMatch uses 7-mers. We used 8-mers here because the Classifier performs best with 8-mers when trained on 16S rRNA datasets.

Taxa similarity: for each pair of sequences in a set, we calculated the Sab score and added it to the pool of the lowest common ancestor taxon of the two sequences. For example, if the two sequences were from the same species, the Sab score was added to the species pool, which measures how close sequences are within species. If they were from the same genus but different species, the Sab score was added to the genus pool, which measures how close sequences are between species. The Sab scores for each rank were used to generate the box-and-whisker plots.

Completeness measurement: sequence records were retrieved from GenBank using the GenBank accession numbers from all three ref sets. Only sequences with the features "internal transcribed spacer 1" and "internal transcribed spacer 2" were considered complete, and the corresponding sequence regions were kept. This resulted in 13,912 complete reference sequences (the COMBO set).
For each query sequence in each ref set, the pairwise alignment between the query and the COMBO sequence with the best alignment score was used to determine completeness. A query is marked Incomplete ITS1 if the query alignment contains at least 50 inserts at the beginning, Incomplete ITS2 if it contains at least 50 inserts at the end, or Incomplete both if both conditions hold.

References

1. Wang Q, Garrity GM, Tiedje JM, Cole JR. Naïve Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol. 73(16).
2. Porras-Alfaro A, Liu KL, Kuske CR, Xie G. From genus to phylum: large-subunit and internal transcribed spacer rRNA operon regions show similar classification accuracies influenced by database composition. Appl Environ Microbiol. 80(3).

3. Kõljalg U, Nilsson RH, Abarenkov K, Tedersoo L, Taylor A, Bahram M, Bates ST, Bruns TD, Bengtsson-Palme J, Callaghan TM, et al. Towards a unified paradigm for sequence-based identification of fungi. Mol Ecol. 22.


More information

Biology 2.1 Taxonomy: Domain, Kingdom, Phylum. ICan2Ed.com

Biology 2.1 Taxonomy: Domain, Kingdom, Phylum. ICan2Ed.com Biology 2.1 Taxonomy: Domain, Kingdom, Phylum ICan2Ed.com Taxonomy is the scientific field that catalogs, describes, and names living organisms. The way to divide living organisms into groups based on

More information

8/23/2014. Phylogeny and the Tree of Life

8/23/2014. Phylogeny and the Tree of Life Phylogeny and the Tree of Life Chapter 26 Objectives Explain the following characteristics of the Linnaean system of classification: a. binomial nomenclature b. hierarchical classification List the major

More information

CS612 - Algorithms in Bioinformatics

CS612 - Algorithms in Bioinformatics Fall 2017 Databases and Protein Structure Representation October 2, 2017 Molecular Biology as Information Science > 12, 000 genomes sequenced, mostly bacterial (2013) > 5x10 6 unique sequences available

More information

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships Chapter 26: Phylogeny and the Tree of Life You Must Know The taxonomic categories and how they indicate relatedness. How systematics is used to develop phylogenetic trees. How to construct a phylogenetic

More information

Classification of Organisms

Classification of Organisms Classification of Organisms Main Idea *****Chapter 14***** Students should be able to: * Understand why a classification system is important * Understand that there are a variety of ways to classify organisms

More information

A (short) introduction to phylogenetics

A (short) introduction to phylogenetics A (short) introduction to phylogenetics Thibaut Jombart, Marie-Pauline Beugin MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis with PR Statistics, Millport Field

More information

Biology Classification Unit 11. CLASSIFICATION: process of dividing organisms into groups with similar characteristics

Biology Classification Unit 11. CLASSIFICATION: process of dividing organisms into groups with similar characteristics Biology Classification Unit 11 11:1 Classification and Taxonomy CLASSIFICATION: process of dividing organisms into groups with similar characteristics TAXONOMY: the science of classifying living things

More information

Organizing Life s Diversity Section 17.1 Classification

Organizing Life s Diversity Section 17.1 Classification Organizing Life s Diversity Section 17.1 Classification Scan Section 1 of your book. Write three questions that come to mind from reading the headings and the illustration captions. 1. 2. 3. Review species

More information

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Chapter 12 (Strikberger) Molecular Phylogenies and Evolution METHODS FOR DETERMINING PHYLOGENY In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Modern

More information

University of Groningen

University of Groningen University of Groningen Can clade age alone explain the relationship between body size and diversity? Etienne, Rampal S.; de Visser, Sara N.; Janzen, Thijs; Olsen, Jeanine L.; Olff, Han; Rosindell, James

More information

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree) I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

More information

Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring /

Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring / Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring 2015 http://ce.sharif.edu/courses/93-94/2/ce717-1 / Agenda Combining Classifiers Empirical view Theoretical

More information

Unit 9: Taxonomy (Classification) Notes

Unit 9: Taxonomy (Classification) Notes Name Exam Date Class Unit 9: Taxonomy (Classification) Notes What is Classification? is when we place organisms into based on their. Classification is also known as. Taxonomists are scientists that & organisms

More information

Microbes and you ON THE LATEST HUMAN MICROBIOME DISCOVERIES, COMPUTATIONAL QUESTIONS AND SOME SOLUTIONS. Elizabeth Tseng

Microbes and you ON THE LATEST HUMAN MICROBIOME DISCOVERIES, COMPUTATIONAL QUESTIONS AND SOME SOLUTIONS. Elizabeth Tseng Microbes and you ON THE LATEST HUMAN MICROBIOME DISCOVERIES, COMPUTATIONAL QUESTIONS AND SOME SOLUTIONS Elizabeth Tseng Dept. of CSE, University of Washington Johanna Lampe Lab, Fred Hutchinson Cancer

More information

18-1 Finding Order in Diversity Slide 2 of 26

18-1 Finding Order in Diversity Slide 2 of 26 18-1 Finding Order in Diversity 2 of 26 Natural selection and other processes have led to a staggering diversity of organisms. Biologists have identified and named about 1.5 million species so far. They

More information

CSE 417T: Introduction to Machine Learning. Lecture 11: Review. Henry Chai 10/02/18

CSE 417T: Introduction to Machine Learning. Lecture 11: Review. Henry Chai 10/02/18 CSE 417T: Introduction to Machine Learning Lecture 11: Review Henry Chai 10/02/18 Unknown Target Function!: # % Training data Formal Setup & = ( ), + ),, ( -, + - Learning Algorithm 2 Hypothesis Set H

More information

Ensemble Methods. NLP ML Web! Fall 2013! Andrew Rosenberg! TA/Grader: David Guy Brizan

Ensemble Methods. NLP ML Web! Fall 2013! Andrew Rosenberg! TA/Grader: David Guy Brizan Ensemble Methods NLP ML Web! Fall 2013! Andrew Rosenberg! TA/Grader: David Guy Brizan How do you make a decision? What do you want for lunch today?! What did you have last night?! What are your favorite

More information

Analysis of N-terminal Acetylation data with Kernel-Based Clustering

Analysis of N-terminal Acetylation data with Kernel-Based Clustering Analysis of N-terminal Acetylation data with Kernel-Based Clustering Ying Liu Department of Computational Biology, School of Medicine University of Pittsburgh yil43@pitt.edu 1 Introduction N-terminal acetylation

More information

Supplementary Information

Supplementary Information Supplementary Information Supplementary Figure 1. Schematic pipeline for single-cell genome assembly, cleaning and annotation. a. The assembly process was optimized to account for multiple cells putatively

More information

Name: Class: Date: ID: A

Name: Class: Date: ID: A Class: _ Date: _ Ch 17 Practice test 1. A segment of DNA that stores genetic information is called a(n) a. amino acid. b. gene. c. protein. d. intron. 2. In which of the following processes does change

More information

Microbes usually have few distinguishing properties that relate them, so a hierarchical taxonomy mainly has not been possible.

Microbes usually have few distinguishing properties that relate them, so a hierarchical taxonomy mainly has not been possible. Microbial Taxonomy Traditional taxonomy or the classification through identification and nomenclature of microbes, both "prokaryote" and eukaryote, has been in a mess we were stuck with it for traditional

More information

Microbial Taxonomy. Slowly evolving molecules (e.g., rrna) used for large-scale structure; "fast- clock" molecules for fine-structure.

Microbial Taxonomy. Slowly evolving molecules (e.g., rrna) used for large-scale structure; fast- clock molecules for fine-structure. Microbial Taxonomy Traditional taxonomy or the classification through identification and nomenclature of microbes, both "prokaryote" and eukaryote, has been in a mess we were stuck with it for traditional

More information

Chapter 26. Phylogeny and the Tree of Life. Lecture Presentations by Nicole Tunbridge and Kathleen Fitzpatrick Pearson Education, Inc.

Chapter 26. Phylogeny and the Tree of Life. Lecture Presentations by Nicole Tunbridge and Kathleen Fitzpatrick Pearson Education, Inc. Chapter 26 Phylogeny and the Tree of Life Lecture Presentations by Nicole Tunbridge and Kathleen Fitzpatrick Investigating the Tree of Life Phylogeny is the evolutionary history of a species or group of

More information

Department of Computer and Information Science and Engineering. CAP4770/CAP5771 Fall Midterm Exam. Instructor: Prof.

Department of Computer and Information Science and Engineering. CAP4770/CAP5771 Fall Midterm Exam. Instructor: Prof. Department of Computer and Information Science and Engineering UNIVERSITY OF FLORIDA CAP4770/CAP5771 Fall 2016 Midterm Exam Instructor: Prof. Daisy Zhe Wang This is a in-class, closed-book exam. This exam

More information

Intraspecific gene genealogies: trees grafting into networks

Intraspecific gene genealogies: trees grafting into networks Intraspecific gene genealogies: trees grafting into networks by David Posada & Keith A. Crandall Kessy Abarenkov Tartu, 2004 Article describes: Population genetics principles Intraspecific genetic variation

More information

Test Bank for Microbiology A Systems Approach 3rd edition by Cowan

Test Bank for Microbiology A Systems Approach 3rd edition by Cowan Test Bank for Microbiology A Systems Approach 3rd edition by Cowan Link download full: http://testbankair.com/download/test-bankfor-microbiology-a-systems-approach-3rd-by-cowan/ Chapter 1: The Main Themes

More information

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5. Five Sami Khuri Department of Computer Science San José State University San José, California, USA sami.khuri@sjsu.edu v Distance Methods v Character Methods v Molecular Clock v UPGMA v Maximum Parsimony

More information

a,bD (modules 1 and 10 are required)

a,bD (modules 1 and 10 are required) This form should be used for all taxonomic proposals. Please complete all those modules that are applicable (and then delete the unwanted sections). For guidance, see the notes written in blue and the

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

Measures of Location. Measures of position are used to describe the relative location of an observation

Measures of Location. Measures of position are used to describe the relative location of an observation Measures of Location Measures of position are used to describe the relative location of an observation 1 Measures of Position Quartiles and percentiles are two of the most popular measures of position

More information

An Automated Phylogenetic Tree-Based Small Subunit rrna Taxonomy and Alignment Pipeline (STAP)

An Automated Phylogenetic Tree-Based Small Subunit rrna Taxonomy and Alignment Pipeline (STAP) An Automated Phylogenetic Tree-Based Small Subunit rrna Taxonomy and Alignment Pipeline (STAP) Dongying Wu 1 *, Amber Hartman 1,6, Naomi Ward 4,5, Jonathan A. Eisen 1,2,3 1 UC Davis Genome Center, University

More information

K-means-based Feature Learning for Protein Sequence Classification

K-means-based Feature Learning for Protein Sequence Classification K-means-based Feature Learning for Protein Sequence Classification Paul Melman and Usman W. Roshan Department of Computer Science, NJIT Newark, NJ, 07102, USA pm462@njit.edu, usman.w.roshan@njit.edu Abstract

More information

The Tree of Life. Chapter 17

The Tree of Life. Chapter 17 The Tree of Life Chapter 17 1 17.1 Taxonomy The science of naming and classifying organisms 2000 years ago Aristotle Grouped plants and animals Based on structural similarities Greeks and Romans included

More information

SECTION 17-1 REVIEW BIODIVERSITY. VOCABULARY REVIEW Distinguish between the terms in each of the following pairs of terms.

SECTION 17-1 REVIEW BIODIVERSITY. VOCABULARY REVIEW Distinguish between the terms in each of the following pairs of terms. SECTION 17-1 REVIEW BIODIVERSITY VOCABULARY REVIEW Distinguish between the terms in each of the following pairs of terms. 1. taxonomy, taxon 2. kingdom, species 3. phylum, division 4. species name, species

More information

Class XI Chapter 1 The Living World Biology

Class XI Chapter 1 The Living World Biology Question 1: Why are living organisms classified? A large variety of plants, animals, and microbes are found on earth. All these living organisms differ in size, shape, colour, habitat, and many other characteristics.

More information

PhyloNet. Yun Yu. Department of Computer Science Bioinformatics Group Rice University

PhyloNet. Yun Yu. Department of Computer Science Bioinformatics Group Rice University PhyloNet Yun Yu Department of Computer Science Bioinformatics Group Rice University yy9@rice.edu Symposium And Software School 2016 The University Of Texas At Austin Installation System requirement: Java

More information

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Devin Cornell & Sushruth Sastry May 2015 1 Abstract In this article, we explore

More information

Concept Modern Taxonomy reflects evolutionary history.

Concept Modern Taxonomy reflects evolutionary history. Concept 15.4 Modern Taxonomy reflects evolutionary history. What is Taxonomy: identification, naming, and classification of species. Common Names: can cause confusion - May refer to several species (ex.

More information

Stepping stones towards a new electronic prokaryotic taxonomy. The ultimate goal in taxonomy. Pragmatic towards diagnostics

Stepping stones towards a new electronic prokaryotic taxonomy. The ultimate goal in taxonomy. Pragmatic towards diagnostics Stepping stones towards a new electronic prokaryotic taxonomy - MLSA - Dirk Gevers Different needs for taxonomy Describe bio-diversity Understand evolution of life Epidemiology Diagnostics Biosafety...

More information

ECE 661: Homework 10 Fall 2014

ECE 661: Homework 10 Fall 2014 ECE 661: Homework 10 Fall 2014 This homework consists of the following two parts: (1) Face recognition with PCA and LDA for dimensionality reduction and the nearest-neighborhood rule for classification;

More information

Test: Classification of Living Things

Test: Classification of Living Things : Classification of Living Things Date: Name: Class: Word Bank: Biodiversity Classification Taxonomy Binomial Nomenclature Phylogeny Cladistics Cladogram Specific Epithet Use the word bank above to match

More information