Comparative genomics: Overview & Tools + MUMmer algorithm


Urmila Kulkarni-Kale
Bioinformatics Centre, University of Pune, Pune
urmila@bioinfo.ernet.in

Genome sequence: fact file
» 1995: the first complete genome sequence, of Haemophilus influenzae Rd, was published.
» Biological systems are dynamic and evolving; the fourth dimension is time.
» A genome sequence is a snapshot of evolution.
» The correlation between phenotypic properties and genomic regions is not straightforward, because phenotypic properties result from many-to-many interactions.

Genomes: the current status (source: GOLD database)
Published complete genomes: 403
Ongoing:
» Archaeal: 81
» Bacterial: 1226
» Eukaryal: 169
» Archaeal: 107
» Prokaryotic: 3478
» Eukaryotic: 1209
Metagenomics: 203
Viral: >4500

Genome databases
Genomes are available at NCBI, EBI and TIGR.

H. influenzae complete genome

Function information clock of E. coli, generated in March 2004

Comparison of the coding regions
Comparison begins with gene identification algorithms, which infer what portions of the genomic sequence actively code for genes. There are four basic approaches.

Knowledge of the full genome sequence: solutions or new questions?
» What is the correct number of genes? Gene counting is still a struggle.

Genome analyses
» Variation in genome size: E. coli 4.6 Mbp; M. pneumoniae 0.81 Mbp; B. subtilis 4.20 Mbp
» GC content: B. burgdorferi 29%; M. tuberculosis 68%
» Codon usage
» Amino acid composition: G, A, P, R are GC-rich; I, F, Y, M, D are AT-rich
» Genome organisation: single circular chromosomes; linear chromosome plus extrachromosomal elements
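GC content and codon usage are simple counts over the sequence. A minimal sketch with hypothetical function names; it assumes a plain DNA string (upper or lower case), ignores ambiguous bases such as N when computing GC content, and reads codons in frame 1 of a single coding sequence.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def codon_usage(cds: str) -> Counter:
    """Count codons of a coding sequence read in frame 1; a trailing partial codon is dropped."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))
```

Genome-wide codon usage would sum these counters over all annotated coding sequences.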

Comparative genomics (CG): comparisons between genomes
» Strains of the same species
» Closely related species
» Distantly related species
» List of orthologs
» Evolution of individual genes
» Evolution of organisms


CG helps us ask some interesting questions
Identifying similarities and differences between genomes may help us understand:
» How did two organisms evolve?
» Why do certain bacteria cause disease while others do not?
» Which drug targets should be identified and prioritized?

CG: unit of comparison
Unit of comparison: gene/genome
» Number
» Content (sequence)
» Location (map position)
» Gene order
» Gene cluster (genes that are part of a known metabolic pathway are found to exist as a group)
Colinearity of gene order is referred to as synteny. A conserved group of genes in the same order in two genomes is called a syntenic group or syntenic cluster.
Translocation: movement of a genomic segment from one position to another.

Structure of the tryptophan (trp) operon (Dandekar et al., 1998)
» Numbers: gene/operon number
» Arrows: direction of transcription
» //: dispersion of the operon by 50 genes
» Domain fusions: trpD and trpG; trpF and trpC; trpB and trpA
» Legend: genetically linked genes vs separate genes

Important observations regarding gene order
» Order is highly conserved in closely related species but is changed by rearrangements.
» At greater evolutionary distances, there is no correspondence between the gene order of orthologous genes.
» Groups of genes with similar biochemical functions tend to remain localized, e.g. the genes required for tryptophan synthesis (trp genes) in E. coli and other prokaryotes.

Synteny
Synteny refers to regions of two genomes that show considerable similarity in sequence and conservation of gene order, and that are likely to be related by common descent.
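Collinearity of gene order can be checked once orthologs have been identified. A minimal sketch, assuming each genome is given as an ordered list of gene labels in which orthologs share the same label and each label occurs at most once; it reports runs of genes that are adjacent and in the same order in both genomes. The function name and example genes are hypothetical, and inverted blocks are not detected.

```python
from typing import Dict, List


def syntenic_blocks(order_a: List[str], order_b: List[str], min_size: int = 2) -> List[List[str]]:
    """Runs of genes that are consecutive, in the same order, in both genomes.

    order_a / order_b: gene order along each genome; orthologs share a label.
    Only same-strand collinearity is detected (inverted blocks are ignored).
    """
    pos_b: Dict[str, int] = {gene: i for i, gene in enumerate(order_b)}
    blocks: List[List[str]] = []
    current: List[str] = []
    for gene in order_a:
        if current and gene in pos_b and pos_b[gene] == pos_b[current[-1]] + 1:
            current.append(gene)  # still collinear: extend the block
            continue
        if len(current) >= min_size:
            blocks.append(current)  # block broken: keep it if long enough
        current = [gene] if gene in pos_b else []  # genes absent from B cannot start a block
    if len(current) >= min_size:
        blocks.append(current)
    return blocks
```

With order A = g1 g2 g3 g4 g5 x1 g6 g7 and order B = g7 g1 g2 g3 y1 g4 g5 g6, the function returns [['g1', 'g2', 'g3'], ['g4', 'g5']]: the insertion of y1 splits the block, and the translocated g7 is left out.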

COGs: phylogenetic classification of proteins encoded in complete genomes

Pairwise genome comparison of protein homologs (symmetrical best hits)
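Symmetrical best hits (often called reciprocal best hits) pair a protein a from genome A with a protein b from genome B when each is the other's highest-scoring hit. A minimal sketch, assuming similarity scores have already been computed (e.g. by an all-against-all BLAST search) and stored as nested dictionaries; the function names, gene names and scores are hypothetical. Ties are broken in favour of the subject listed first.

```python
from typing import Dict, List, Tuple

Hits = Dict[str, Dict[str, float]]  # query -> {subject: score}


def best_hits(hits: Hits) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (first one wins on ties)."""
    return {q: max(subjects, key=subjects.get) for q, subjects in hits.items() if subjects}


def symmetrical_best_hits(a_to_b: Hits, b_to_a: Hits) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    best_ab = best_hits(a_to_b)
    best_ba = best_hits(b_to_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)
```

A one-directional best hit (a2 to b1 in the test data) is not enough; b1 must also prefer a2 for the pair to count.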

Integr8: comparative genomics site at EBI

Comparative genomics tools
» BLAST2
» MUMmer
» PipMaker
» AVID/VISTA
Comparisons and analyses at both the nucleic acid and protein level

BLAST2
» Available at NCBI
» Input: GI number or FASTA sequence (a range can be specified)
» Output: graphical alignment of the two genomes

Genome alignment algorithm: MUMmer
» Developed by Dr. Steven Salzberg's group at TIGR
» Published in Nucleic Acids Research (NAR) 1999, vol. 27, and NAR 2002, vol. 30
» Availability: free, from the TIGR site

Features of MUMmer
» The algorithm assumes that the sequences are closely related.
» It can quickly compare millions of bases.
» Outputs: a base-to-base alignment that highlights the exact matches and the differences between the genomes
» Locates SNPs, large inserts, significant repeats, tandem repeats and reversals

Definitions are drawn from biology
» SNP: a single mutation surrounded by two matching regions.
» Mutation regions: regions of DNA where the two sequences have diverged by more than one SNP.
» Large inserts: regions inserted into one of the genomes, e.g. by sequence reversals or lateral gene transfer.
» Repeats: a form of duplication that has occurred in either genome.
» Tandem repeats: regions of repeated DNA in immediate succession, but with different copy numbers in the two genomes. A repeat can occur a fractional number of times, e.g. 2.5 times.
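The copy number of a tandem repeat, including fractional values such as 2.5, is the length of the periodic run divided by the length of the repeat unit. A minimal sketch, assuming the start of the repeat and its unit length are already known; the function name is hypothetical.

```python
def tandem_copy_number(seq: str, start: int, period: int) -> float:
    """Copy number of the tandem repeat whose unit seq[start:start+period] begins at start."""
    if period <= 0 or start < 0 or start + period > len(seq):
        raise ValueError("repeat unit must lie inside the sequence")
    end = start + period
    # Extend while each base equals the base one unit-length earlier.
    while end < len(seq) and seq[end] == seq[end - period]:
        end += 1
    return (end - start) / period
```

ACGTACGTAC contains the 4-base unit ACGT 2.5 times; a genome carrying ACGTACGTACGTAC at the same locus has 3.5 copies, which is the kind of copy-number difference the definition describes.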

Techniques used in the MUMmer algorithm
» Computing suffix trees for every genome
» Longest increasing subsequence (LIS)
» Alignment using the Smith-Waterman algorithm
» Integration of these techniques for genome alignment
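MUMmer uses Smith-Waterman alignment for the regions between exact matches. A minimal score-only sketch of the Smith-Waterman recurrence with a linear gap penalty; the function name and scores (match +2, mismatch -1, gap -2) are illustrative assumptions, not MUMmer's settings, and no traceback is performed.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (linear gap penalty, score only)."""
    prev = [0] * (len(b) + 1)  # row i-1 of the DP matrix H
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Keeping only two rows gives O(len(b)) memory; recovering the alignment itself would need the full matrix or a traceback scheme.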

MUMmer: steps in the alignment process
1. Read the two genomes.
2. Perform the maximal unique match (MUM) decomposition of the genomes.
3. Sort and order the MUMs using LIS.
4. Close the gaps in the alignment, identifying SNPs, mutation regions, repeats and tandem repeats.
5. Output the alignment: the MUMs and the regions that do not match exactly.
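Once the MUMs are ordered, each gap between two consecutive MUMs can be labelled by how many unmatched bases it leaves in each genome. A minimal sketch with MUMs as (start in A, start in B, length) tuples; the rules are a simplified reading of the gap types listed on these slides, not MUMmer's exact logic, and the function name is hypothetical.

```python
from typing import Tuple

Mum = Tuple[int, int, int]  # (start in genome A, start in genome B, length), 0-based


def classify_gap(left: Mum, right: Mum) -> str:
    """Label the region between two consecutive MUMs of an ordered chain (simplified rules)."""
    gap_a = right[0] - (left[0] + left[2])  # unmatched bases in genome A
    gap_b = right[1] - (left[1] + left[2])  # unmatched bases in genome B
    if gap_a < 0 or gap_b < 0:
        return "repeat"           # the MUMs overlap in at least one genome
    if gap_a == 1 and gap_b == 1:
        return "SNP"              # one mismatch flanked by exact matches
    if gap_a > 0 and gap_b > 0:
        return "mutation region"  # the genomes diverge by more than one SNP
    if gap_a == 0 and gap_b == 0:
        return "contiguous"       # cannot occur between true MUMs (they would merge)
    return "insert"               # extra sequence in only one genome
```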

MUMmer steps: locating MUMs, sorting MUMs, closing the gaps
Example genomes:
G1: ACTGATTACGTGAACTGGATCCA
G2: ACTCTAGGTGAAGTGATCCA

Resulting alignment:
G1: ACTGATTACGTGAACTGGATCCA
G2: ACTC--TAGGTGAAGT-GATCCA

What is a MUM?
» A MUM (maximal unique match) is a subsequence that occurs exactly once in both genomes and is NOT part of any longer such sequence.
» The two characters that bound a MUM are therefore always mismatches (unless the MUM reaches the end of a sequence).
GenA: tcgatcgacgatcgccgccgtagatcgaataacgagagagcataacgactta
GenB: gcattagacgatcgccgccgtagatcgaataacgagagagcataatccagag
Principle: if a long matching sequence occurs exactly once in each genome, it is almost certain to be part of the global alignment. This is similar to BLAST and FASTA, which also start from exact matches.
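The definition can be checked directly on small inputs: enumerate maximal exact matches (bounded by mismatches or sequence ends) and keep those whose sequence occurs exactly once in each genome. A minimal brute-force sketch; it is far slower than linear and only suitable for toy examples, whereas MUMmer finds MUMs with a suffix tree in linear time. The function names and the min_len threshold are illustrative assumptions.

```python
from typing import List, Tuple


def count_occurrences(text: str, pattern: str) -> int:
    """Number of (possibly overlapping) occurrences of pattern in text."""
    count, pos = 0, text.find(pattern)
    while pos != -1:
        count += 1
        pos = text.find(pattern, pos + 1)
    return count


def find_mums(a: str, b: str, min_len: int = 20) -> List[Tuple[int, int, int]]:
    """MUMs as (start in a, start in b, length), 0-based, sorted by start in a."""
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            # Left-maximal: the match cannot be extended to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as possible (right-maximal).
            length = 0
            while i + length < len(a) and j + length < len(b) and a[i + length] == b[j + length]:
                length += 1
            if length < min_len:
                continue
            match = a[i:i + length]
            # Unique: the matching sequence occurs exactly once in each genome.
            if count_occurrences(a, match) == 1 and count_occurrences(b, match) == 1:
                mums.append((i, j, length))
    return sorted(mums)
```

On the GenA/GenB pair above, find_mums(gen_a, gen_b, min_len=20) returns [(6, 6, 39)]: a single 39-base MUM starting at 0-based position 6 in both sequences, bounded by mismatches on each side.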

Sorting and ordering MUMs
» MUMs are sorted according to their position in genome A.
» The order of the matching MUMs in genome B is then considered.
» In the figure, MUM3 is a random match or inexact repeat, and MUM5 is a transposition.
» The LIS algorithm locates the longest set of MUMs that occur in ascending order in both genomes.
» This leads to the global MUM-alignment.
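The ordering step can be sketched as a dynamic program over MUMs sorted by their position in genome A, keeping the chain whose positions also increase in genome B. A minimal O(n^2) sketch with a hypothetical function name; weighting each MUM by its length is an assumption (counting MUMs instead of summing lengths gives the plain longest increasing subsequence), and overlapping MUMs are not trimmed as a real implementation would do.

```python
from typing import List, Tuple

Mum = Tuple[int, int, int]  # (start in genome A, start in genome B, length)


def chain_mums(mums: List[Mum]) -> List[Mum]:
    """Heaviest set of MUMs appearing in ascending order in both genomes."""
    if not mums:
        return []
    mums = sorted(mums)                # sort by position in genome A
    score = [m[2] for m in mums]       # best chain weight ending at each MUM
    prev = [-1] * len(mums)            # back-pointers for the traceback
    for k in range(len(mums)):
        for p in range(k):
            in_order = mums[p][0] < mums[k][0] and mums[p][1] < mums[k][1]
            if in_order and score[p] + mums[k][2] > score[k]:
                score[k] = score[p] + mums[k][2]
                prev[k] = p
    k = max(range(len(mums)), key=score.__getitem__)  # end of the best chain
    chain: List[Mum] = []
    while k != -1:
        chain.append(mums[k])
        k = prev[k]
    return chain[::-1]
```

In the test data, the MUM at (10, 30) is out of order in genome B, like the transposed MUM5 in the figure, and is excluded from the chain.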

MUMmer results
» Two strains of M. tuberculosis: H37Rv and CDC1551
» Genome size: 4 Mb
» Total time: 55 s (generating the suffix tree: 5 s; sorting MUMs: 45 s; Smith-Waterman alignment: 5 s)

Alignment of the M. tuberculosis strains CDC1551 (top) and H37Rv (bottom): single green lines indicate SNPs; blue lines indicate insertions.

Comparison of two Mycoplasma genomes: distantly related cousins
» M. genitalium and M. pneumoniae
» Protein analysis shows that all M. genitalium proteins are present in M. pneumoniae.
» The alignment was carried out with three approaches: FASTA (dividing each genome into 1000 bp segments and running all-against-all searches); fixed-length patterns (25-mers); and MUMmer (length = 25).

Comparison of two Mycoplasma genomes (panels: FASTA; fixed-length patterns, 25-mers; MUMmer)
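The fixed-length pattern comparison can be sketched as finding every 25-mer shared by two genomes. A minimal sketch using Python sets with hypothetical function names; it compares the forward strands only (an assumption; a real comparison would also check reverse complements), and k = 25 follows the slide.

```python
from typing import Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(a: str, b: str, k: int = 25) -> Set[str]:
    """k-mers present in both sequences (case-insensitive)."""
    return kmers(a.upper(), k) & kmers(b.upper(), k)
```

Unlike MUMs, shared k-mers need not be unique or maximal, so repeats and overlapping windows of one long match are all reported.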

Post-sequencing challenges
» Genome sequencing is just the beginning of appreciating biocomplexity.
» Sequence-based function assignment approaches fail as sequence similarity drops.
» Structure-based function prediction approaches are limited by the availability of structures and by the association of structural motifs with functional descriptors.
» As a result, in any genome: genes with known function ~40%; genes with unknown function ~60%.


More information

The Gene The gene; Genes Genes Allele;

The Gene The gene; Genes Genes Allele; Gene, genetic code and regulation of the gene expression, Regulating the Metabolism, The Lac- Operon system,catabolic repression, The Trp Operon system: regulating the biosynthesis of the tryptophan. Mitesh

More information

Supplemental Materials

Supplemental Materials JOURNAL OF MICROBIOLOGY & BIOLOGY EDUCATION, May 2013, p. 107-109 DOI: http://dx.doi.org/10.1128/jmbe.v14i1.496 Supplemental Materials for Engaging Students in a Bioinformatics Activity to Introduce Gene

More information

Example of Function Prediction

Example of Function Prediction Find similar genes Example of Function Prediction Suggesting functions of newly identified genes It was known that mutations of NF1 are associated with inherited disease neurofibromatosis 1; but little

More information

Sequence Alignment (chapter 6)

Sequence Alignment (chapter 6) Sequence lignment (chapter 6) he biological problem lobal alignment Local alignment Multiple alignment Introduction to bioinformatics, utumn 6 Background: comparative genomics Basic question in biology:

More information

2012 Univ Aguilera Lecture. Introduction to Molecular and Cell Biology

2012 Univ Aguilera Lecture. Introduction to Molecular and Cell Biology 2012 Univ. 1301 Aguilera Lecture Introduction to Molecular and Cell Biology Molecular biology seeks to understand the physical and chemical basis of life. and helps us answer the following? What is the

More information

Sequencing alignment Ameer Effat M. Elfarash

Sequencing alignment Ameer Effat M. Elfarash Sequencing alignment Ameer Effat M. Elfarash Dept. of Genetics Fac. of Agriculture, Assiut Univ. aelfarash@aun.edu.eg Why perform a multiple sequence alignment? MSAs are at the heart of comparative genomics

More information

Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are:

Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are: Comparative genomics and proteomics Species available Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are: Vertebrates: human, chimpanzee, mouse, rat,

More information

Chapter 18: Control of Gene Expression

Chapter 18: Control of Gene Expression Chapter 18: Control of Gene Expression 海洋生物研究所 曾令銘 海事大樓 426 室分機 : 5326 Differential Expression of Genes Prokaryotes and eukaryotes precisely regulate gene expression in response to environmental conditions

More information

Microbes usually have few distinguishing properties that relate them, so a hierarchical taxonomy mainly has not been possible.

Microbes usually have few distinguishing properties that relate them, so a hierarchical taxonomy mainly has not been possible. Microbial Taxonomy Traditional taxonomy or the classification through identification and nomenclature of microbes, both "prokaryote" and eukaryote, has been in a mess we were stuck with it for traditional

More information

Microbial Taxonomy. Slowly evolving molecules (e.g., rrna) used for large-scale structure; "fast- clock" molecules for fine-structure.

Microbial Taxonomy. Slowly evolving molecules (e.g., rrna) used for large-scale structure; fast- clock molecules for fine-structure. Microbial Taxonomy Traditional taxonomy or the classification through identification and nomenclature of microbes, both "prokaryote" and eukaryote, has been in a mess we were stuck with it for traditional

More information

Biology Tutorial. Aarti Balasubramani Anusha Bharadwaj Massa Shoura Stefan Giovan

Biology Tutorial. Aarti Balasubramani Anusha Bharadwaj Massa Shoura Stefan Giovan Biology Tutorial Aarti Balasubramani Anusha Bharadwaj Massa Shoura Stefan Giovan Viruses A T4 bacteriophage injecting DNA into a cell. Influenza A virus Electron micrograph of HIV. Cone-shaped cores are

More information

Chromosomal rearrangements in mammalian genomes : characterising the breakpoints. Claire Lemaitre

Chromosomal rearrangements in mammalian genomes : characterising the breakpoints. Claire Lemaitre PhD defense Chromosomal rearrangements in mammalian genomes : characterising the breakpoints Claire Lemaitre Laboratoire de Biométrie et Biologie Évolutive Université Claude Bernard Lyon 1 6 novembre 2008

More information

Bacterial Genetics & Operons

Bacterial Genetics & Operons Bacterial Genetics & Operons The Bacterial Genome Because bacteria have simple genomes, they are used most often in molecular genetics studies Most of what we know about bacterial genetics comes from the

More information

Introduction to Bioinformatics. Shifra Ben-Dor Irit Orr

Introduction to Bioinformatics. Shifra Ben-Dor Irit Orr Introduction to Bioinformatics Shifra Ben-Dor Irit Orr Lecture Outline: Technical Course Items Introduction to Bioinformatics Introduction to Databases This week and next week What is bioinformatics? A

More information

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT Inferring phylogeny Constructing phylogenetic trees Tõnu Margus Contents What is phylogeny? How/why it is possible to infer it? Representing evolutionary relationships on trees What type questions questions

More information

Molecular evolution - Part 1. Pawan Dhar BII

Molecular evolution - Part 1. Pawan Dhar BII Molecular evolution - Part 1 Pawan Dhar BII Theodosius Dobzhansky Nothing in biology makes sense except in the light of evolution Age of life on earth: 3.85 billion years Formation of planet: 4.5 billion

More information

BIOINFORMATICS LAB AP BIOLOGY

BIOINFORMATICS LAB AP BIOLOGY BIOINFORMATICS LAB AP BIOLOGY Bioinformatics is the science of collecting and analyzing complex biological data. Bioinformatics combines computer science, statistics and biology to allow scientists to

More information

Homology and Information Gathering and Domain Annotation for Proteins

Homology and Information Gathering and Domain Annotation for Proteins Homology and Information Gathering and Domain Annotation for Proteins Outline Homology Information Gathering for Proteins Domain Annotation for Proteins Examples and exercises The concept of homology The

More information

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018 CONCEPT OF SEQUENCE COMPARISON Natapol Pornputtapong 18 January 2018 SEQUENCE ANALYSIS - A ROSETTA STONE OF LIFE Sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of

More information

This document describes the process by which operons are predicted for genes within the BioHealthBase database.

This document describes the process by which operons are predicted for genes within the BioHealthBase database. 1. Purpose This document describes the process by which operons are predicted for genes within the BioHealthBase database. 2. Methods Description An operon is a coexpressed set of genes, transcribed onto

More information

Bioinformatics for Biologists

Bioinformatics for Biologists Bioinformatics for Biologists Sequence Analysis: Part I. Pairwise alignment and database searching Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute Bioinformatics Definitions The use of computational

More information

Biochemistry 324 Bioinformatics. Pairwise sequence alignment

Biochemistry 324 Bioinformatics. Pairwise sequence alignment Biochemistry 324 Bioinformatics Pairwise sequence alignment How do we compare genes/proteins? When we have sequenced a genome, we try and identify the function of unknown genes by finding a similar gene

More information

CSCE555 Bioinformatics. Protein Function Annotation

CSCE555 Bioinformatics. Protein Function Annotation CSCE555 Bioinformatics Protein Function Annotation Why we need to do function annotation? Fig from: Network-based prediction of protein function. Molecular Systems Biology 3:88. 2007 What s function? The

More information

Processes of Evolution

Processes of Evolution 15 Processes of Evolution Forces of Evolution Concept 15.4 Selection Can Be Stabilizing, Directional, or Disruptive Natural selection can act on quantitative traits in three ways: Stabilizing selection

More information

PROTEIN SYNTHESIS INTRO

PROTEIN SYNTHESIS INTRO MR. POMERANTZ Page 1 of 6 Protein synthesis Intro. Use the text book to help properly answer the following questions 1. RNA differs from DNA in that RNA a. is single-stranded. c. contains the nitrogen

More information

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison 10-810: Advanced Algorithms and Models for Computational Biology microrna and Whole Genome Comparison Central Dogma: 90s Transcription factors DNA transcription mrna translation Proteins Central Dogma:

More information

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) Contents Alignment algorithms Needleman-Wunsch (global alignment) Smith-Waterman (local alignment) Heuristic algorithms FASTA BLAST

More information