Introduction to Bioinformatics Integrated Science, 11/9/05

Morris Levy, Biological Sciences
Research: Evolutionary Ecology, Plant-Fungal Pathogen Interactions
Coordinator: BIOL 495S/CS490B/STAT490B Introduction to Bioinformatics, Fall semester
[Selected slides from Mark Levinthal and Daisuke Kihara]

Bioinformatics/Computational Biology
- Development and application of computational tools (algorithms*) for genome sequencing and massive data analysis
- Interdisciplinary (Biol/CS/Stat)
- Emphasis on determining the sequence, structure and functional relationships of DNAs, genes, and proteins necessary for cell metabolism and organism development
- Databases and information management
* Algorithm: a systematic procedure for solving a problem in a finite number of steps; it can be written in a computer language and run as a program.

[Diagram: DNA -> RNA -> protein -> phenotype, annotated with the matching data resources: genomic DNA databases; cDNA, ESTs and expression profiles; protein sequence and structure databases]

Research Areas in Bioinformatics
- Genomics: sequence and organization of the genome (structural), gene finding and functional annotation; comparative genomics
- Proteomics: structure and function of the entire inventory of proteins produced
- Transcriptomics: gene expression profiles in cells, tissues, organs and organisms during development; comparative expression in disease pathology or genetic disorder
- Metabolomics: organization and flux of all cellular pathways (chemistry and physiology)
- Phylogenetics: evolutionary history of the above

Plan of my three lectures
1. Intro to bioinformatics: comparative genomics
2. Tutorial on NCBI database use, incl. BLAST (Basic Local Alignment Search Tool)
3. Phylogenetic informatics
Always ask questions for clarification during lecture; even other questions (@ $0.50)

3 Domains of Life
- Archaea: prokaryote, lacks nuclear membrane, single-cell; initially found in extreme conditions (high temperature, pressure, low pH)
- Bacteria: prokaryote, e.g. E. coli
- Eucarya/Eukarya: e.g. yeast (unicellular), human
[Figure: a tree of life based on small-subunit rRNA sequences (Pace, 2001)]

Three Domains of Life + Endosymbiosis
- Monophyly but with horizontal transfer (eukaryotic organelles, i.e., mitochondria and chloroplasts, are bacterial in evolutionary origin)
- Closer relationship of the Archaea and Eukaryota relative to Bacteria (they share information-processing genes)

Genome Sequences
- Human genome sequence: working draft announced in 2000 (finished sequence declared in 2003)

1995: genome of the bacterium Haemophilus influenzae is sequenced

Overview of bacterial complete genomes

MBGD database: http://mbgd.genome.ad.jp

Genome sizes in nucleotide base pairs
[Figure: range of genome sizes for plasmids, viruses, bacteria, fungi, plants, algae, insects, mollusks, bony fish, amphibians, reptiles, birds and mammals, on a log scale from 10^4 to 10^11 base pairs. Source: http://www3.kumc.edu/jcalvet/powerpoint/bioc801b.ppt]
The size of the human genome is ~3 x 10^9 base pairs and is thought to contain ~25,000-35,000 genes; protein-coding genes = <2% of total genome (Why?)

Genes in the genome

Organism                                Domain                    Genome size (kb)   ORFs    Genome/ORF (kb per ORF)
Escherichia coli                        Bacteria                  4639               4289    1.08
Bacillus subtilis                       Bacteria                  4214               4099    1.02
Methanobacterium thermoautotrophicum    Archaea                   1751               1918    0.91
Saccharomyces cerevisiae                Eukaryote (single cell)   12069              6294    1.9
Caenorhabditis elegans                  Eukaryote (nematode)      97000              19099   5.07
Oryza sativa                            Eukaryote (plant)         420000             50000   8.4
Drosophila melanogaster                 Eukaryote (insect)        137000             14100   9.71
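The last column of the table is simply genome size divided by ORF count. A minimal Python sketch of that arithmetic follows; the dictionary `genomes` and the function name `kb_per_orf` are illustrative choices, and the numbers are copied from the table as given on the slide.

```python
# Genome size (kb) and ORF count, copied from the slide's table.
genomes = {
    "Escherichia coli": (4639, 4289),
    "Bacillus subtilis": (4214, 4099),
    "Methanobacterium thermoautotrophicum": (1751, 1918),
    "Saccharomyces cerevisiae": (12069, 6294),
    "Caenorhabditis elegans": (97000, 19099),
    "Oryza sativa": (420000, 50000),
    "Drosophila melanogaster": (137000, 14100),
}


def kb_per_orf(size_kb, orfs):
    """Average genome length (kb) per open reading frame."""
    return size_kb / orfs


if __name__ == "__main__":
    for name, (size_kb, orfs) in genomes.items():
        print(f"{name:40s} {kb_per_orf(size_kb, orfs):6.2f} kb per ORF")
```

Values near 1 kb per ORF (the prokaryotes) indicate genomes packed almost entirely with genes, while the larger ratios in multicellular eukaryotes reflect increasing amounts of non-coding DNA. Recomputed values can differ from the slide in the last digit, since the slide's figures appear to be truncated rather than rounded.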

GC content varies across genomes
[Figure: number of species in each GC-content class for bacteria, plants, invertebrates and vertebrates; x-axis: GC content (%), 20 to 80]

Function Assignment
- BLAST/FASTA sequence comparison with genes of known function
- Motif (protein folding structures) search via structure prediction
[Figure: amount of unknown function in genomes (201 genomes) (Hawkins & Kihara)]
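GC content, as used on the slide above, is the percentage of bases in a sequence that are G or C. A minimal sketch is shown here; the function name `gc_content` and the choice to ignore ambiguous bases such as N are assumptions made for illustration.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of seq."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


if __name__ == "__main__":
    # Toy sequence, not taken from any real genome.
    print(gc_content("ATGCGCGTTA"))
```

Applied to whole genomes, this one number per species is what the histogram on the slide bins into GC classes.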

Comparison Strategies for Deciphering Gene Function
- Genomes of closely related organisms
- Genomes of distantly related organisms
- Genomes vs. metabolic pathways, compounds
* Inference: conservation of sequence, physical order and/or phylogenetic clustering of genes implies their functional association

Dynamic rearrangement of genomes: Mycoplasma pneumoniae and Mycoplasma genitalium (Himmelreich et al., 1997)
- M. pneumoniae (1996): 732 genes
- M. genitalium (1995): 522 genes = smallest genome in self-replicating organisms

Genome map of two Mycoplasma sp.
Method: FASTA/BLAST bidirectional hits

Dot-plots of closely related genomes (Suyama & Bork 2001)
(a) Chlamydia pneumoniae, AR39 (CPa) & CWL029 (CP)
(b) Neisseria meningitidis, Z2491 (NMa) & MC58 (NMb)
(c) Helicobacter pylori, J99 (HP99) & 26695 (HP)
(d) Chlamydia trachomatis, serovar D (CT) & MoPn (CTm)
(e) Mycobacterium leprae (ML) & M. tuberculosis (MT)
(f) Pyrococcus horikoshii (PH) & P. abyssi (PA)
(g) E. coli (EC) & Vibrio cholerae chromosome 1 (VC1)
(h) Mycoplasma pneumoniae (MP) & M. genitalium (MG)
(i) CP & CT
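The "bidirectional hits" method named above pairs gene a in one genome with gene b in the other only when b is a's best hit and a is b's best hit. The sketch below assumes the similarity search has already been run and its results reduced to (query, subject, score) tuples; the function names, gene names and scores are all hypothetical toy values, not BLAST or FASTA output.

```python
def best_hits(hits):
    """Map each query gene to its single highest-scoring subject gene."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy search results: genome A genes queried against genome B, and the reverse.
a_vs_b = [("a1", "b1", 250.0), ("a1", "b2", 90.0),
          ("a2", "b2", 180.0), ("a3", "b2", 120.0)]
b_vs_a = [("b1", "a1", 245.0), ("b2", "a2", 175.0), ("b2", "a3", 110.0)]

if __name__ == "__main__":
    print(bidirectional_best_hits(a_vs_b, b_vs_a))
```

In this toy data a3 also finds b2, but b2 prefers a2, so a3 is left without a partner. Plotting the genome positions of the resulting pairs against each other gives a dot-plot of the kind shown in the panels above.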

Conservation of gene order (gene clusters)
Dandekar, Snel, Huynen & Bork (1998)
- Analysis of selective constraints that preserve gene order
- Genomes to be compared should be neither too distant nor too close in evolutionary distance
- Reason for conservation: physical interactions between the encoded proteins
- Operon: a unit of transcription consisting of several genes with related functions, (a) promoter region(s) and other regulatory sites

Conserved gene arrangements
- Ribosomal proteins
- ATP synthases
- Transporters: ABC (ATP-binding cassette) transporters
- Enzyme pairs
- GroEL & GroES etc.
- Cell-division proteins
- Gene pairs of unknown function
- Tryptophan operon
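One simple way to look for conserved gene order is to ask which neighboring genes in one genome have orthologs that are also neighbors in the other. The sketch below is a simplification under stated assumptions: gene order is given as a linear list (a circular chromosome would also need the last and first genes paired), strand is ignored, and the ortholog map is supplied by hand. All names and data are hypothetical.

```python
def adjacent_pairs(order):
    """Unordered pairs of genes that sit next to each other in a gene list."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def conserved_neighbors(order1, order2, orthologs):
    """Neighboring gene pairs in genome 1 whose orthologs are neighbors in genome 2."""
    pairs2 = adjacent_pairs(order2)
    conserved = []
    for g1, g2 in zip(order1, order1[1:]):
        if g1 in orthologs and g2 in orthologs:
            if frozenset((orthologs[g1], orthologs[g2])) in pairs2:
                conserved.append((g1, g2))
    return conserved


# Toy gene orders; a_i is assumed orthologous to b_i.
order1 = ["a1", "a2", "a3", "a4", "a5"]
order2 = ["b2", "b1", "b5", "b3", "b4"]
orthologs = {f"a{i}": f"b{i}" for i in range(1, 6)}

if __name__ == "__main__":
    print(conserved_neighbors(order1, order2, orthologs))
```

Here only the pairs (a1, a2) and (a3, a4) keep their neighbors after rearrangement. Following the slide's inference, such pairs are candidates for physically interacting or co-transcribed gene products.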

Examples of proteins with the same phylogenetic profile (co-occurrence)
A. Ribosomal proteins
B. Flagellar structural proteins
C. Histidine biosynthetic proteins

Domain/gene fusion
[Diagram: genes A and B in Genome 1 and Genome 2, separate in one genome and fused in the other]
- Two separate genes in one genome are fused into a single gene in another genome
- Most probably they are involved in the same function
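Both ideas on these slides can be illustrated with toy data. In the sketch below, a phylogenetic profile is the presence/absence pattern of a gene across a set of genomes, and genes sharing an identical pattern are grouped; a fusion link is reported when two families that are separate genes in one genome occur inside a single gene in another. The function names, gene names, family labels and data are hypothetical, and exact profile matching is a deliberate simplification.

```python
from collections import defaultdict


def group_by_profile(presence, genomes):
    """Group genes whose presence/absence pattern across genomes is identical."""
    groups = defaultdict(list)
    for gene, present_in in presence.items():
        profile = tuple(int(g in present_in) for g in genomes)
        groups[profile].append(gene)
    return {p: sorted(genes) for p, genes in groups.items() if len(genes) > 1}


def fusion_links(genome_a, genome_b):
    """Family pairs that are separate genes in genome_a but share one gene in genome_b."""
    separate = {fams[0] for fams in genome_a.values() if len(fams) == 1}
    links = set()
    for fams in genome_b.values():
        for i, f1 in enumerate(fams):
            for f2 in fams[i + 1:]:
                if f1 != f2 and f1 in separate and f2 in separate:
                    links.add(tuple(sorted((f1, f2))))
    return sorted(links)


# Toy presence data: gene -> set of genomes in which it is found.
genomes = ["G1", "G2", "G3", "G4"]
presence = {
    "rpX": {"G1", "G2", "G3", "G4"},
    "rpY": {"G1", "G2", "G3", "G4"},
    "flaA": {"G1", "G3"},
    "flaB": {"G1", "G3"},
    "hisZ": {"G2"},
}

# Toy gene content: gene -> tuple of the families (domains) it contains.
genome_a = {"a1": ("A",), "a2": ("B",), "a3": ("C",)}
genome_b = {"b1": ("A", "B"), "b2": ("C",)}

if __name__ == "__main__":
    print(group_by_profile(presence, genomes))
    print(fusion_links(genome_a, genome_b))
```

Real analyses usually tolerate near-identical profiles and require the fused gene to align to both partners over distinct regions; neither refinement is attempted here.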

Fusion proteins in human genome
Domain Fusion Database: http://calcium.uhnres.utoronto.ca/pi/

Pathway database
KEGG database: http://www.genome.ad.jp
[Figure: pathway map; green = E. coli genes]

Clusters of chemical compounds on pathways (Hattori, Okuno, Goto, Kanehisa 2003)
- Compounds in pathways are compared and clustered
- Sub-pathways with similar compounds sometimes correspond to operons of enzymes

Functional network of Escherichia coli
89 complete genomes
Functional interactions can be predicted from:
- Conserved gene order
- Gene fusion events
- Common phylogenetic pattern
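The three evidence types listed above can be merged into one network by treating each predicted gene pair as an edge and recording which kinds of evidence support it. The following is only a sketch of that bookkeeping, with hypothetical gene names and evidence lists; it is not the procedure used to build the E. coli network on the slide.

```python
from collections import defaultdict


def build_network(evidence):
    """Map each unordered gene pair to the set of evidence types supporting it."""
    network = defaultdict(set)
    for kind, pairs in evidence.items():
        for a, b in pairs:
            network[frozenset((a, b))].add(kind)
    return network


def supported_by(network, min_sources):
    """Edges backed by at least min_sources independent evidence types."""
    return sorted(tuple(sorted(edge)) for edge, kinds in network.items()
                  if len(kinds) >= min_sources)


# Toy predictions from each comparative-genomics signal.
evidence = {
    "gene_order": [("g1", "g2"), ("g2", "g3")],
    "gene_fusion": [("g2", "g1")],
    "phylogenetic_profile": [("g1", "g2"), ("g4", "g5")],
}

network = build_network(evidence)

if __name__ == "__main__":
    print(supported_by(network, 2))
```

Requiring agreement between two or more signals trades coverage for confidence: in the toy data only the g1-g2 link survives the stricter threshold.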

Summary: Comparative Genomics
- Conservation of gene order, shared phylogenetic patterns, and gene fusion events detected by comparative genomic analyses imply functional association.
- Detection of orthologous genes (bidirectional best hits) is the basis of all the above analyses.
- Combination with pathway/compound associations broadens inferences for metabolic pathway analysis.
- Gene/gene family comparisons across phylogeny indicate functional diversification (not discussed; later see Dr. Mason re globin evolution).

References
- Himmelreich R et al. Comparative analysis of the genomes of the bacteria Mycoplasma pneumoniae and Mycoplasma genitalium. Nucleic Acids Research 25: 701-712 (1997)
- Koonin EV. Comparative genomics, minimal gene-sets, and the last universal common ancestor. Nature Reviews Microbiology 1: 128-136 (2003)
- Suyama M & Bork P. Evolution of prokaryotic gene order: genome rearrangements in closely related species. Trends in Genetics 17: 10-13 (2001)
- Pellegrini M, ..., Eisenberg D, Yeates TO. Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles. Proc. Natl. Acad. Sci. USA 96: 4285-4288 (1999)
- Hattori M et al. Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. J. Am. Chem. Soc. 125: 11853-11865 (2003)
- Snel B, Bork P, Huynen MA. The identification of functional modules from the genomic association of genes. Proc. Natl. Acad. Sci. USA 99: 5890-5895 (2002)
- von Mering AC et al. Genome evolution reveals biochemical networks and functional modules. Proc. Natl. Acad. Sci. USA 100: 15428-15433 (2003)