Comparative Genomics Final Results

1 Comparative Genomics Final Results
April 20, 2016
Juan Castro, Aroon Chande, Cheng Chen, Evan Clayton, Hector Espitia, Alli Gombolay, Walker Gussler, Ken Lee, Tyrone Lee, Hari Prasanna, Carlos Ruiz, Niveda Sundararaman, and Peijue Zhang

2 Outline
Background
Classic Classification
Exploratory Methods
  Results: ANI, MASH
Phenotypic Method
  Results: Typing Scheme, SVM-RFE
Virulence Factors - Other classification targets

3 Background
Our data: Haemophilus influenzae
Two major categories: unencapsulated (nontypeable) strains and encapsulated strains (serotypes a, b, c, d, e, f).
Genetic diversity among unencapsulated strains is greater than within the encapsulated group.

4 Produce an inflammatory response in humans.
H. influenzae type b (Hib) causes bacteremia, pneumonia, epiglottitis, and acute bacterial meningitis in infants and young children.
The pathogenesis of H. influenzae infections is not completely understood.

5 Getting to Know your Data (64 isolates)
Sources: blood, sinus drainage, ankle fluid
Years
Average assembly (bp)
Average GC content

6 What is the goal?
Are the NTHi evolutionarily different from the typeable Hi? Yes.
Develop a typing scheme for various strains of nontypeable Haemophilus influenzae (NTHi).

7 Classic Classification does not work

10 Grouping
H. haemolyticus (ANI Group 1)
Other Haemophilus (ANI Group 1)
Hi serotype F-like (ANI Group 2)
NTHi group 1 (ANI Group 3)
NTHi group 2 (ANI Group 4)
Hi serotype b
Strains without clear grouping
M25364 (Unknown species)

11 Grouping (same groups as slide 10, with bracket annotations)
Bracket labels, in transcript order: Serotype C, Serotype A, Serotype D, Serotype B, Serotype F, Serotype E, Serotype A

12 All of the methods we used were insufficient for the differentiation of NTHi from typeable Hi.

13 Serotyping

16 Serotyping
Typeable!
Nontypeable!

17 Our Typing Scheme (Davis, G., et al., Journal of Clinical Microbiology, 2011)

19 Typing Scheme
Region I (shown for all three loci): bexA bexB bexC bexD
Region II (locus-specific genes):
  ccsA ccsB ccsC ccsD
  dcsA dcsB dcsC dcsD dcsE
  bcsB bcsD
Region III (shown for all three loci): hcsA hcsB

20 Typing Scheme (same diagram as slide 19, annotated with the counts 1, 1, 1, 1, 8)

21 Typing Scheme

22 Typing Scheme

23 SVM-RFE Approach
SVM (Support Vector Machine) + RFE (Recursive Feature Elimination)
A machine learning technique intended to extract informative features from a data set.
Conceived to identify the genes most informative about cancer, using gene expression data.

25 SVMs (Support Vector Machines)
A data set in which we have features (genes) and samples (genomes)
2 features = 2 dimensions (x, y)
2 classes (red, blue)

26 SVMs (Support Vector Machines)
The SVM can separate the data into two classes by:
  using training data
  finding the best separation of the data (optimal hyperplane)

28 SVMs (Support Vector Machines)
How to classify new data (with no prior knowledge)?
Using what the SVM learned from training: the optimal hyperplane
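
The two-feature, two-class picture on the SVM slides can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the presented results: it assumes a linear SVM trained by plain subgradient descent on the regularised hinge loss, and the toy points, the function names (`train_linear_svm`, `predict`) and the hyperparameters are all hypothetical.

```python
def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def hinge_objective(w, b, X, y, lam):
    """Regularised mean hinge loss of the hyperplane w.x + b = 0."""
    loss = sum(max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y))
    return 0.5 * lam * dot(w, w) + loss / len(X)

def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=5000):
    """Subgradient descent; returns the best (w, b) seen during training."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    best = (hinge_objective(w, b, X, y, lam), list(w), b)
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            # Only points inside the margin contribute to the hinge subgradient.
            if yi * (dot(w, xi) + b) < 1.0:
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
        obj = hinge_objective(w, b, X, y, lam)
        if obj < best[0]:
            best = (obj, list(w), b)
    return best[1], best[2]

def predict(w, b, x):
    """Classify a new point by the side of the hyperplane it falls on."""
    return 1 if dot(w, x) + b >= 0 else -1

# Hypothetical training data: 2 features (x, y), 2 classes (-1 = red, +1 = blue).
X = [(1, 1), (2, 1), (1, 2), (5, 5), (6, 5), (5, 6)]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
new_red, new_blue = predict(w, b, (1.3, 1.3)), predict(w, b, (5.3, 5.3))
```

New points are classified using only the learned `w` and `b`, which is the sense in which the SVM needs no prior knowledge about them.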

29 SVM-RFE...
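
The recursive elimination named on the SVM-RFE approach slide can be sketched as follows. This is a minimal sketch under stated assumptions: it uses the ranking criterion of Guyon et al. (2002), the squared weight of each feature in a linear SVM, removes one feature per round, and centres each column first. The trainer, the gene names and the toy BSR values are hypothetical and are not the group's pipeline.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=500):
    """Linear SVM by batch subgradient descent on the regularised hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1.0:
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def svm_rfe(X, y, names):
    """Return feature names ranked from most to least informative."""
    # Centre each column so a gene with the same BSR in every genome carries no signal.
    means = [sum(col) / len(col) for col in zip(*X)]
    Xc = [[v - m for v, m in zip(row, means)] for row in X]
    remaining = list(range(len(names)))
    eliminated = []
    while remaining:
        sub = [[row[j] for j in remaining] for row in Xc]
        w, _ = train_linear_svm(sub, y)
        # Drop the feature with the smallest squared weight, then retrain.
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return [names[j] for j in reversed(eliminated)]

# Hypothetical BSR rows: only "capsule_gene" differs between the classes.
names = ["capsule_gene", "core_gene", "variable_gene"]
X = [[1.0, 1.0, 0.25], [1.0, 1.0, 0.75],   # typeable (+1)
     [0.0, 1.0, 0.25], [0.0, 1.0, 0.75]]   # NTHi (-1)
y = [1, 1, -1, -1]
ranking = svm_rfe(X, y, names)
```

With a real pangenome the matrix has thousands of columns, so implementations typically remove a fraction of the features per round rather than one at a time.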

37 SVM-RFE
Input matrix (BSR values + class)
BLAST Score Ratio (BSR): relatedness of pangenome genes across all genomes

          Class  Gene1   Gene2   Gene3   ...  GeneN
Genome1   T      bsr1,1  bsr1,2  bsr1,3  ...  bsr1,n
Genome2   NTHi   bsr2,1  bsr2,2  bsr2,3  ...  bsr2,n
Genome3   NTHi   bsr3,1  bsr3,2  bsr3,3  ...  bsr3,n
...       ...
GenomeM   T      bsrm,1  bsrm,2  bsrm,3  ...  bsrm,n
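
As a rough illustration of how such an input matrix could be assembled: a BLAST Score Ratio is the bit score of a gene's best hit in a genome divided by the bit score of that gene aligned against itself, so it runs from 0 (absent) to 1 (identical). The sketch below assumes the bit scores have already been parsed into dictionaries; the function names, gene names and scores are hypothetical, and this is not the LS-BSR pipeline itself.

```python
def blast_score_ratio(hit_bitscore, self_bitscore):
    """BSR = bit score of the hit in a genome / bit score of the gene against itself."""
    if self_bitscore <= 0:
        raise ValueError("self bit score must be positive")
    return hit_bitscore / self_bitscore

def build_input_matrix(self_scores, hits, classes):
    """Return the gene order and one (genome, class, BSR row) tuple per genome."""
    genes = sorted(self_scores)
    rows = []
    for genome in sorted(hits):
        # A gene with no hit in this genome gets a bit score of 0, hence BSR 0.
        row = [blast_score_ratio(hits[genome].get(g, 0.0), self_scores[g])
               for g in genes]
        rows.append((genome, classes[genome], row))
    return genes, rows

# Hypothetical bit scores for two pangenome genes and two genomes.
self_scores = {"Gene1": 500.0, "Gene2": 400.0}
hits = {"Genome1": {"Gene1": 500.0, "Gene2": 100.0},
        "Genome2": {"Gene1": 250.0}}
classes = {"Genome1": "T", "Genome2": "NTHi"}
genes, rows = build_input_matrix(self_scores, hits, classes)
```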

38 SVM-RFE
Testing the SVM-RFE method: 4 subsets of 24 genomes each (12 typeable, 12 NTHi)

40 SVM-RFE
Top-20 genes intersection

#  Gene Id        Description (NCBI nr microbial protein database)
1  M04828_0_262   Bcs3 [Hi]
2  M03959_0_297   HcsB' [Hi]
3  M03959_0_305   Capsule polysaccharide transporter [H. sputorum]; BexC [Hi]
4  M09394_0_294   HcsB [Hi]
5  M09394_0_299   Capsule biosynthesis protein CapC [H. sputorum]; Ccs1 [Hi]
6  M04744_0_747   Capsule biosynthesis protein [Hi]; BexC [Hi]
7  M04744_0_748   Sugar ABC transporter permease [Hi]; BexB [Hi]
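
The intersection step itself is simple set arithmetic: take the top k genes from each subset's ranking and keep those shared by all of them. The sketch below assumes each SVM-RFE run yields a list ordered from most to least informative; the function name and the placeholder gene names are hypothetical, and k is set to 3 only to keep the toy example short.

```python
def top_k_intersection(rankings, k=20):
    """Genes found in the top k of every ranking, in the order of the first ranking."""
    shared = set.intersection(*(set(r[:k]) for r in rankings))
    return [gene for gene in rankings[0][:k] if gene in shared]

# Hypothetical rankings from 4 subsets.
rankings = [
    ["geneA", "geneB", "geneC", "geneD"],
    ["geneB", "geneA", "geneE", "geneC"],
    ["geneA", "geneF", "geneB", "geneC"],
    ["geneB", "geneC", "geneA", "geneG"],
]
shared_genes = top_k_intersection(rankings, k=3)
```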

41 SVM-RFE
All shared genes from the top-20 intersection are capsule-related genes (present in the typing scheme).
SVM-RFE has potential as a technique for finding genes with discriminative power.

42 The time of reckoning

46 Now what?
Class I
Class II
Typeable strains
Class III
(Spahich, N., et al., Journal of Bacteriology, 2012)

48 Experimental setup
Many genes
Few genes
PCR primers

49 Experimental setup
Columns: Gene, Amplicon size (bp), Tm (FP/RP) (°C), Forward primer / Reverse primer
KfiC       /60.0  5'-AGTCAGAGGGCGAAACCGACCA-3' / 5'-AACTCGCCCTTGGCAACGGATG-3'
WbaP/RfbP  /60.5  5'-TCCTGAAGCAAGAGCTGAATGGGA-3' / 5'-ACGTCCGCTGACTTGCCAAAGC-3'
HscA       /59.9  5'-AGCGGAAAAGGCTCGGTCACAA-3' / 5'-CACACCCCAGCCAGCAAACCAT-3'
HxuA       /59.4  5'-TCAACGCAGACGCCGTTGGAAA-3' / 5'-ATCAATTGAGCCAGGCGCACCA-3'
BexC       /59.9  5'-AGCAGCGACACAAACTGCGGAT-3' / 5'-GCCCAGTCTGGCTTGCTTGGTT-3'

50 Multiplexed PCR* (gel schematic)
Lanes: Marker; Typeable strain A, B, or F; Typeable strain C, D, or E; NTHi Class I; NTHi Class II; NTHi Class III; Strain M25364; Negative control
Ladder (bp): 1200, 1000, 900, 800, 700, 600, 500, 400, 300, 200, 100
Band labels: KfiC, HxuA, HscA, WbaP-RfbP, BexC
*Dramatization

51 What could be improved
Train SVM-RFE with the classes found with HMM-SOM and virulence patterns.
Reduce the number of features separating typeable from NTHi.

52 Why not

53 THANKS! QUESTIONS

54 References
Davis, G. S., Sandstedt, S. A., Patel, M., Marrs, C. F., & Gilsdorf, J. R. (2011). Use of bexB to detect the capsule locus in Haemophilus influenzae. Journal of Clinical Microbiology, 49(7), doi: /JCM
Spahich, N. A., Hood, D. W., Moxon, E. R., & St Geme, J. W. (2012). Inactivation of Haemophilus influenzae lipopolysaccharide biosynthesis genes interferes with outer membrane localization of the Hap autotransporter. Journal of Bacteriology, 194(7), doi: /jb
Sahl, J. W., Caporaso, J. G., Rasko, D. A., & Keim, P. (2014). The large-scale blast score ratio (LS-BSR) pipeline: a method to rapidly compare genetic content between bacterial genomes. PeerJ, 2, e332. doi: /peerj.332
Guyon, I., Weston, J., Barnhill, S., & Vapnik, V. (2002). Gene Selection for Cancer Classification using Support Vector Machines. Machine Learning, 46(1-3), doi: /a:

Outline. I. Methods. II. Preliminary Results. A. Phylogeny Methods B. Whole Genome Methods C. Horizontal Gene Transfer

Outline. I. Methods. II. Preliminary Results. A. Phylogeny Methods B. Whole Genome Methods C. Horizontal Gene Transfer Comparative Genomics Preliminary Results April 4, 2016 Juan Castro, Aroon Chande, Cheng Chen, Evan Clayton, Hector Espitia, Alli Gombolay, Walker Gussler, Ken Lee, Tyrone Lee, Hari Prasanna, Carlos Ruiz,

More information

HAEMOPHILUS MODULE 29.1 INTRODUCTION OBJECTIVES 29.2 MORPHOLOGY. Notes

HAEMOPHILUS MODULE 29.1 INTRODUCTION OBJECTIVES 29.2 MORPHOLOGY. Notes 29 HAEMOPHILUS 29.1 INTRODUCTION The genus Haemophilus contains small, nonmotile, nonsporing, oxidase positive, pleomorphic, gram negative bacilli that are parasitic on human beings or animals. Haemophilus

More information

Genome Assembly Results, Protocol & Demo

Genome Assembly Results, Protocol & Demo Genome Assembly Results, Protocol & Demo Monday, March 7, 2016 BIOL 7210: Genome Assembly Group Aroon Chande, Cheng Chen, Alicia Francis, Alli Gombolay, Namrata Kalsi, Ellie Kim, Tyrone Lee, Wilson Martin,

More information

Gatech Computational Genomics 2011: Comparative Genomics working documentation

Gatech Computational Genomics 2011: Comparative Genomics working documentation Gatech Computational Genomics 2011: Comparative Genomics working documentation Table of Contents WHAT WILL WE ACCOMPLISH WITH COMPARATIVE GENOMICS? 1 WHAT DO WE ALREADY KNOW ABOUT HAEMOPHILUS SPECIES?

More information

Comparison of Virulence Determinants of Different Strains of Haemophilus influenzae.

Comparison of Virulence Determinants of Different Strains of Haemophilus influenzae. Aziz Niazi 1 Comparison of Virulence Determinants of Different Strains of Haemophilus influenzae. Niazi A. Rahman School of Biomedical Science, Curtin University of Technology, Perth WA Introduction...

More information

MiGA: The Microbial Genome Atlas

MiGA: The Microbial Genome Atlas December 12 th 2017 MiGA: The Microbial Genome Atlas Jim Cole Center for Microbial Ecology Dept. of Plant, Soil & Microbial Sciences Michigan State University East Lansing, Michigan U.S.A. Where I m From

More information

High Genetic Diversity of Nontypeable Haemophilus influenzae Isolates from Two Children Attending a Day Care Center

High Genetic Diversity of Nontypeable Haemophilus influenzae Isolates from Two Children Attending a Day Care Center JOURNAL OF CLINICAL MICROBIOLOGY, Nov. 2008, p. 3817 3821 Vol. 46, No. 11 0095-1137/08/$08.00 0 doi:10.1128/jcm.00940-08 Copyright 2008, American Society for Microbiology. All Rights Reserved. High Genetic

More information

Genetic Diversity, Population Structure, and Virulence Gene Polymorphisms in Nontypeable Haemophilus influenzae. Nathan C. LaCross

Genetic Diversity, Population Structure, and Virulence Gene Polymorphisms in Nontypeable Haemophilus influenzae. Nathan C. LaCross Genetic Diversity, Population Structure, and Virulence Gene Polymorphisms in Nontypeable Haemophilus influenzae by Nathan C. LaCross A dissertation submitted in partial fulfillment of the requirements

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S1 (box). Supplementary Methods description. Prokaryotic Genome Database Archaeal and bacterial genome sequences were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/)

More information

Xingli Fan 1,2, Xiaoxiang Liu 2, Lei Ji 2, Damin Cai 2, Jinqin Jiang 2, Jingjing Zhu 2, Aihua Sun 2* and Jie Yan 1*

Xingli Fan 1,2, Xiaoxiang Liu 2, Lei Ji 2, Damin Cai 2, Jinqin Jiang 2, Jingjing Zhu 2, Aihua Sun 2* and Jie Yan 1* Fan et al. BMC Infectious Diseases (2018) 18:414 https://doi.org/10.1186/s12879-018-3295-2 RESEARCH ARTICLE Epidemiological analysis and rapid detection by one-step multiplex PCR assay of Haemophilus influenzae

More information

A short introduction to supervised learning, with applications to cancer pathway analysis Dr. Christina Leslie

A short introduction to supervised learning, with applications to cancer pathway analysis Dr. Christina Leslie A short introduction to supervised learning, with applications to cancer pathway analysis Dr. Christina Leslie Computational Biology Program Memorial Sloan-Kettering Cancer Center http://cbio.mskcc.org/leslielab

More information

Quantification of H. influenzae Type b in cerebrospinal fluid from children with meningitis

Quantification of H. influenzae Type b in cerebrospinal fluid from children with meningitis ISSN: 231-7706 Volume 3 Number 3 (2014) pp. 283-20 http://www.ijcmas.com Original Research Article Quantification of H. influenzae Type b in cerebrospinal fluid from children with meningitis Majeed Arsheed

More information

In silico identification of novel candidate drug targets in Haemophilus influenzae Rd KW20

In silico identification of novel candidate drug targets in Haemophilus influenzae Rd KW20 International Journal of Genetics and Genomics 2014; 2(4): 62-67 Published online August 20, 2014 (http://www.sciencepublishinggroup.com/j/ijgg) doi: 10.11648/j.ijgg.20140204.13 In silico identification

More information

Protein E of Haemophilus influenzae Is a Ubiquitous Highly Conserved Adhesin

Protein E of Haemophilus influenzae Is a Ubiquitous Highly Conserved Adhesin BRIEF REPORT Protein E of Haemophilus influenzae Is a Ubiquitous Highly Conserved Adhesin Birendra Singh, 1 Marta Brant, 1 Mogens Kilian, 3 Björn Hallström, 2 and Kristian Riesbeck 1 1 Medical Microbiology,

More information

CLASSIFYING LABORATORY TEST RESULTS USING MACHINE LEARNING

CLASSIFYING LABORATORY TEST RESULTS USING MACHINE LEARNING CLASSIFYING LABORATORY TEST RESULTS USING MACHINE LEARNING Joy (Sizhe) Chen, Kenny Chiu, William Lu, Nilgoon Zarei AUGUST 31, 2018 TEAM Joy (Sizhe) Chen Kenny Chiu William Lu Nelly (Nilgoon) Zarei 2 AGENDA

More information

Invasive Disease Due to Nontypeable Haemophilus influenzae among Children in Arkansas

Invasive Disease Due to Nontypeable Haemophilus influenzae among Children in Arkansas JOURNAL OF CLINICAL MICROBIOLOGY, July 2003, p. 3064 3069 Vol. 41, No. 7 0095-1137/03/$08.00 0 DOI: 10.1128/JCM.41.7.3064 3069.2003 Copyright 2003, American Society for Microbiology. All Rights Reserved.

More information

FEATURE SELECTION COMBINED WITH RANDOM SUBSPACE ENSEMBLE FOR GENE EXPRESSION BASED DIAGNOSIS OF MALIGNANCIES

FEATURE SELECTION COMBINED WITH RANDOM SUBSPACE ENSEMBLE FOR GENE EXPRESSION BASED DIAGNOSIS OF MALIGNANCIES FEATURE SELECTION COMBINED WITH RANDOM SUBSPACE ENSEMBLE FOR GENE EXPRESSION BASED DIAGNOSIS OF MALIGNANCIES Alberto Bertoni, 1 Raffaella Folgieri, 1 Giorgio Valentini, 1 1 DSI, Dipartimento di Scienze

More information

CRISPR-SeroSeq: A Developing Technique for Salmonella Subtyping

CRISPR-SeroSeq: A Developing Technique for Salmonella Subtyping Department of Biological Sciences Seminar Blog Seminar Date: 3/23/18 Speaker: Dr. Nikki Shariat, Gettysburg College Title: Probing Salmonella population diversity using CRISPRs CRISPR-SeroSeq: A Developing

More information

Support Vector Machines (SVM) in bioinformatics. Day 1: Introduction to SVM

Support Vector Machines (SVM) in bioinformatics. Day 1: Introduction to SVM 1 Support Vector Machines (SVM) in bioinformatics Day 1: Introduction to SVM Jean-Philippe Vert Bioinformatics Center, Kyoto University, Japan Jean-Philippe.Vert@mines.org Human Genome Center, University

More information

The minimal prokaryotic genome. The minimal prokaryotic genome. The minimal prokaryotic genome. The minimal prokaryotic genome

The minimal prokaryotic genome. The minimal prokaryotic genome. The minimal prokaryotic genome. The minimal prokaryotic genome Dr. Dirk Gevers 1,2 1 Laboratorium voor Microbiologie 2 Bioinformatics & Evolutionary Genomics The bacterial species in the genomic era CTACCATGAAAGACTTGTGAATCCAGGAAGAGAGACTGACTGGGCAACATGTTATTCAG GTACAAAAAGATTTGGACTGTAACTTAAAAATGATCAAATTATGTTTCCCATGCATCAGG

More information

Support Vector Machine. Industrial AI Lab. Prof. Seungchul Lee

Support Vector Machine. Industrial AI Lab. Prof. Seungchul Lee Support Vector Machine Industrial AI Lab. Prof. Seungchul Lee Classification (Linear) Autonomously figure out which category (or class) an unknown item should be categorized into Number of categories /

More information

This is the author s version of a work that was submitted for publication prior to peer-review. This is known as the pre-print.

This is the author s version of a work that was submitted for publication prior to peer-review. This is known as the pre-print. This is the author s version of a work that was submitted for publication prior to peer-review. This is known as the pre-print. Citation for author s submitted version Hare, Kim M., Marsh, Robyn Leanne,

More information

NAME: Microbiology BI234 MUST be written and will not be accepted as a typed document. 1.

NAME: Microbiology BI234 MUST be written and will not be accepted as a typed document. 1. Chapter 3 Study Guide Explain the 3 main characteristics that help differentiate prokaryotes from eukaryotes. What are the 7 structures/substances found in all bacterial cells? What are 8 specific structures

More information

Modifying A Linear Support Vector Machine for Microarray Data Classification

Modifying A Linear Support Vector Machine for Microarray Data Classification Modifying A Linear Support Vector Machine for Microarray Data Classification Prof. Rosemary A. Renaut Dr. Hongbin Guo & Wang Juh Chen Department of Mathematics and Statistics, Arizona State University

More information

Applied Machine Learning Annalisa Marsico

Applied Machine Learning Annalisa Marsico Applied Machine Learning Annalisa Marsico OWL RNA Bionformatics group Max Planck Institute for Molecular Genetics Free University of Berlin 29 April, SoSe 2015 Support Vector Machines (SVMs) 1. One of

More information

Introduction to polyphasic taxonomy

Introduction to polyphasic taxonomy Introduction to polyphasic taxonomy Peter Vandamme EUROBILOFILMS - Third European Congress on Microbial Biofilms Ghent, Belgium, 9-12 September 2013 http://www.lm.ugent.be/ Content The observation of diversity:

More information

Haemophilus influenzae septic abortion

Haemophilus influenzae septic abortion Infect Dis Obstet Gynecol 2002;10:161 164 Haemophilus influenzae septic abortion Thomas L. Cherpes 1,3, Shimon Kusne 1 and Sharon L. Hillier 2,3 1 Department of Medicine 2 Department of Obstetrics, Gynecology,

More information

THE BIOCHEMISTRY AND GENETICS OF CAPSULAR POLYSACCHARIDE PRODUCTION IN BACTERIA

THE BIOCHEMISTRY AND GENETICS OF CAPSULAR POLYSACCHARIDE PRODUCTION IN BACTERIA Annu. Rev. Microbiol. 1996. 50:285 315 Copyright c 1996 by Annual Reviews Inc. All rights reserved THE BIOCHEMISTRY AND GENETICS OF CAPSULAR POLYSACCHARIDE PRODUCTION IN BACTERIA Ian S. Roberts School

More information

Fluids, Using Specific Antibody-Coated Staphylococci

Fluids, Using Specific Antibody-Coated Staphylococci JOURNAL OF CLINICAL MICROBIOLOGY, Jan. 1977, p. 81-85 Copyright 1977 American Society for Microbiology Vol. 5, No. 1 Printed in U.S.A. Detection ofhaemophilus influenzae Type b Antigens in Body Fluids,

More information

Support Vector Machine. Industrial AI Lab.

Support Vector Machine. Industrial AI Lab. Support Vector Machine Industrial AI Lab. Classification (Linear) Autonomously figure out which category (or class) an unknown item should be categorized into Number of categories / classes Binary: 2 different

More information

#33 - Genomics 11/09/07

#33 - Genomics 11/09/07 BCB 444/544 Required Reading (before lecture) Lecture 33 Mon Nov 5 - Lecture 31 Phylogenetics Parsimony and ML Chp 11 - pp 142 169 Genomics Wed Nov 7 - Lecture 32 Machine Learning Fri Nov 9 - Lecture 33

More information

Outline. Basic concepts: SVM and kernels SVM primal/dual problems. Chih-Jen Lin (National Taiwan Univ.) 1 / 22

Outline. Basic concepts: SVM and kernels SVM primal/dual problems. Chih-Jen Lin (National Taiwan Univ.) 1 / 22 Outline Basic concepts: SVM and kernels SVM primal/dual problems Chih-Jen Lin (National Taiwan Univ.) 1 / 22 Outline Basic concepts: SVM and kernels Basic concepts: SVM and kernels SVM primal/dual problems

More information

IR Biotyper. Innovation with Integrity. Microbial typing for real-time epidemiology FT-IR

IR Biotyper. Innovation with Integrity. Microbial typing for real-time epidemiology FT-IR IR Biotyper Microbial typing for real-time epidemiology Innovation with Integrity FT-IR IR Biotyper - Proactive hospital hygiene and infection control Fast, easy-to-apply and economical typing methods

More information

Binding of human hemoglobin by Haemophilus influenzae

Binding of human hemoglobin by Haemophilus influenzae FEMS Microbiology Letters 118 (1994) 243-248 1994 Federation of European Microbiological Societies 0378-1097/94/$07.00 Published by Elsevier 243 FEMSLE 05945 Binding of human hemoglobin by Haemophilus

More information

Comparative genomics: Overview & Tools + MUMmer algorithm

Comparative genomics: Overview & Tools + MUMmer algorithm Comparative genomics: Overview & Tools + MUMmer algorithm Urmila Kulkarni-Kale Bioinformatics Centre University of Pune, Pune 411 007. urmila@bioinfo.ernet.in Genome sequence: Fact file 1995: The first

More information

Classification Ensemble That Maximizes the Area Under Receiver Operating Characteristic Curve (AUC)

Classification Ensemble That Maximizes the Area Under Receiver Operating Characteristic Curve (AUC) Classification Ensemble That Maximizes the Area Under Receiver Operating Characteristic Curve (AUC) Eunsik Park 1 and Y-c Ivan Chang 2 1 Chonnam National University, Gwangju, Korea 2 Academia Sinica, Taipei,

More information

Title ghost-tree: creating hybrid-gene phylogenetic trees for diversity analyses

Title ghost-tree: creating hybrid-gene phylogenetic trees for diversity analyses 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 Title ghost-tree: creating hybrid-gene phylogenetic trees for diversity analyses

More information

Prediction of the Risk Types of Human Papillomaviruses by Support Vector Machines

Prediction of the Risk Types of Human Papillomaviruses by Support Vector Machines Prediction of the Risk Types of Human Papillomaviruses by Support Vector Machines Je-Gun Joung 1,2, Sok June O 2,4, and Byoung-Tak Zhang 1,2,3 1 Biointelligence Laboratory, Graduate Program in Bioinformatics

More information

Microbial Taxonomy. Microbes usually have few distinguishing properties that relate them, so a hierarchical taxonomy mainly has not been possible.

Microbial Taxonomy. Microbes usually have few distinguishing properties that relate them, so a hierarchical taxonomy mainly has not been possible. Microbial Taxonomy Traditional taxonomy or the classification through identification and nomenclature of microbes, both "prokaryote" and eukaryote, has been in a mess we were stuck with it for traditional

More information

A genomic insight into evolution and virulence of Corynebacterium diphtheriae

A genomic insight into evolution and virulence of Corynebacterium diphtheriae A genomic insight into evolution and virulence of Corynebacterium diphtheriae Vartul Sangal, Ph.D. Northumbria University, Newcastle vartul.sangal@northumbria.ac.uk @VartulSangal Newcastle University 8

More information

Computational Learning Theory (VC Dimension)

Computational Learning Theory (VC Dimension) Computational Learning Theory (VC Dimension) 1 Difficulty of machine learning problems 2 Capabilities of machine learning algorithms 1 Version Space with associated errors error is the true error, r is

More information

Genetic Variation: The genetic substrate for natural selection. Horizontal Gene Transfer. General Principles 10/2/17.

Genetic Variation: The genetic substrate for natural selection. Horizontal Gene Transfer. General Principles 10/2/17. Genetic Variation: The genetic substrate for natural selection What about organisms that do not have sexual reproduction? Horizontal Gene Transfer Dr. Carol E. Lee, University of Wisconsin In prokaryotes:

More information

Subtype distribution of Haemophilus influenzae isolates from North India

Subtype distribution of Haemophilus influenzae isolates from North India J. Med. Microbiol. Vol. 51 (2002), 399 404 # 2002 Society for General Microbiology ISSN 0022-2615 MOLECULAR EPIDEMIOLOGY Subtype distribution of Haemophilus influenzae isolates from North India A. SHARMA,

More information

The use of genome wide association methods to investigate pathogenicity, population structure and serovar in Haemophilus parasuis

The use of genome wide association methods to investigate pathogenicity, population structure and serovar in Haemophilus parasuis https://helda.helsinki.fi The use of genome wide association methods to investigate pathogenicity, population structure and serovar in Haemophilus parasuis Howell, Kate J BioMed Central 2014-12-24 BMC

More information

A Bahadur Representation of the Linear Support Vector Machine

A Bahadur Representation of the Linear Support Vector Machine A Bahadur Representation of the Linear Support Vector Machine Yoonkyung Lee Department of Statistics The Ohio State University October 7, 2008 Data Mining and Statistical Learning Study Group Outline Support

More information

Ch 3. Bacteria and Archaea

Ch 3. Bacteria and Archaea Ch 3 Bacteria and Archaea SLOs for Culturing of Microorganisms Compare and contrast the overall cell structure of prokaryotes and eukaryotes. List structures all bacteria possess. Describe three basic

More information

Bioinformatics Exercises

Bioinformatics Exercises Bioinformatics Exercises AP Biology Teachers Workshop Susan Cates, Ph.D. Evolution of Species Phylogenetic Trees show the relatedness of organisms Common Ancestor (Root of the tree) 1 Rooted vs. Unrooted

More information

Feature Selection for SVMs

Feature Selection for SVMs Feature Selection for SVMs J. Weston, S. Mukherjee, O. Chapelle, M. Pontil T. Poggio, V. Vapnik, Barnhill BioInformatics.com, Savannah, Georgia, USA. CBCL MIT, Cambridge, Massachusetts, USA. AT&T Research

More information

Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines

Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2018 CS 551, Fall

More information

APPLICATION FOR RESEARCH GRANT Page 8

APPLICATION FOR RESEARCH GRANT Page 8 APPLICATION FOR RESEARCH GRANT Page 8 F. PROJECT SUMMARY Applicants must provide, in the space below, a 200-word summary in lay language, of their research proposal. It is critical that the summary be

More information

A genome sequence based discriminator for vancomycin intermediate Staphyolococcus aureus Supplementary Methods

A genome sequence based discriminator for vancomycin intermediate Staphyolococcus aureus Supplementary Methods 1 Journal of Bacteriology Computational Biology Section 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 Supplementary Information for: A genome sequence based discriminator

More information

Introduction to Microbiology BIOL 220 Summer Session I, 1996 Exam # 1

Introduction to Microbiology BIOL 220 Summer Session I, 1996 Exam # 1 Name I. Multiple Choice (1 point each) Introduction to Microbiology BIOL 220 Summer Session I, 1996 Exam # 1 B 1. Which is possessed by eukaryotes but not by prokaryotes? A. Cell wall B. Distinct nucleus

More information

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder HMM applications Applications of HMMs Gene finding Pairwise alignment (pair HMMs) Characterizing protein families (profile HMMs) Predicting membrane proteins, and membrane protein topology Gene finding

More information

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson Grundlagen der Bioinformatik, SS 10, D. Huson, April 12, 2010 1 1 Introduction Grundlagen der Bioinformatik Summer semester 2010 Lecturer: Prof. Daniel Huson Office hours: Thursdays 17-18h (Sand 14, C310a)

More information

Bacterial Morphology and Structure م.م رنا مشعل

Bacterial Morphology and Structure م.م رنا مشعل Bacterial Morphology and Structure م.م رنا مشعل SIZE OF BACTERIA Unit for measurement : Micron or micrometer, μm: 1μm=10-3 mm Size: Varies with kinds of bacteria, and also related to their age and external

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S3 (box) Methods Methods Genome weighting The currently available collection of archaeal and bacterial genomes has a highly biased distribution of isolates across taxa. For example,

More information

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting. Genome Annotation Bioinformatics and Computational Biology Genome Annotation Frank Oliver Glöckner 1 Genome Analysis Roadmap Genome sequencing Assembly Gene prediction Protein targeting trna prediction

More information

Some models of genomic selection

Some models of genomic selection Munich, December 2013 What is the talk about? Barley! Steptoe x Morex barley mapping population Steptoe x Morex barley mapping population genotyping from Close at al., 2009 and phenotyping from cite http://wheat.pw.usda.gov/ggpages/sxm/

More information

EBI web resources II: Ensembl and InterPro. Yanbin Yin Spring 2013

EBI web resources II: Ensembl and InterPro. Yanbin Yin Spring 2013 EBI web resources II: Ensembl and InterPro Yanbin Yin Spring 2013 1 Outline Intro to genome annotation Protein family/domain databases InterPro, Pfam, Superfamily etc. Genome browser Ensembl Hands on Practice

More information

EBI web resources II: Ensembl and InterPro

EBI web resources II: Ensembl and InterPro EBI web resources II: Ensembl and InterPro Yanbin Yin http://www.ebi.ac.uk/training/online/course/ 1 Homework 3 Go to http://www.ebi.ac.uk/interpro/training.htmland finish the second online training course

More information

Fitness constraints on horizontal gene transfer

Fitness constraints on horizontal gene transfer Fitness constraints on horizontal gene transfer Dan I Andersson University of Uppsala, Department of Medical Biochemistry and Microbiology, Uppsala, Sweden GMM 3, 30 Aug--2 Sep, Oslo, Norway Acknowledgements:

More information

Microbes usually have few distinguishing properties that relate them, so a hierarchical taxonomy mainly has not been possible.

Microbes usually have few distinguishing properties that relate them, so a hierarchical taxonomy mainly has not been possible. Microbial Taxonomy Traditional taxonomy or the classification through identification and nomenclature of microbes, both "prokaryote" and eukaryote, has been in a mess we were stuck with it for traditional

More information

Microbial Taxonomy. Slowly evolving molecules (e.g., rrna) used for large-scale structure; "fast- clock" molecules for fine-structure.

Microbial Taxonomy. Slowly evolving molecules (e.g., rrna) used for large-scale structure; fast- clock molecules for fine-structure. Microbial Taxonomy Traditional taxonomy or the classification through identification and nomenclature of microbes, both "prokaryote" and eukaryote, has been in a mess we were stuck with it for traditional

More information
