Comparative genomics (Genômica comparativa). João Carlos Setubal, IQ-USP. October 2012 (5 November 2012).


Comparative genomics. João Carlos Setubal, IQ-USP, October 2012.

Comparative genomics
- As of October 2012, 2,230 completely sequenced microbial genomes are publicly available
- Many are of closely related species
- Why compare? How to do it?

Why comparative genomics?
- To understand the genomic basis of the present
  - Differences in lifestyle: pathogen vs. nonpathogen, obligate vs. free-living
  - Host specificity: animals vs. plants, plant X vs. plant Y, etc.
  - In the case of pathogens, this understanding should help us fight disease
- To understand the past
  - How organisms evolved to be what they are

Citrus canker: Xanthomonas axonopodis pathovar citri

Black rot: Xanthomonas campestris pathovar campestris

What is comparative genomics
- Assuming the input is the sequence and its annotation
- There are many ways that genomes can be compared, at different resolutions
- Whole genome
  - Genome alignments
  - Synteny (gene order conservation)
  - Anomalous regions
- Gene-centric
  - Gene families and unique genes
  - Gene clustering by function
- Gene sequence variations
  - Codon usage, SNPs, indels, pseudogenes

Resolution
- Low resolution
  - Scope: entire genomes
  - Example event: rearrangement
- High resolution
  - Scope: nucleotide sequences
  - Example event: single mutation

Genome-wide evolutionary events
- Replicon rearrangements
- Gene/region duplication
- Gene/region loss
- Chromosome-plasmid DNA exchange
- Lateral transfer

Whole replicon alignments: the pairwise case
- If the sequences were identical we would see: [dot plot; axes: A, B]

An inversion: A B C D → A D C B

[Figure: D B C A vs. A B C D]
Such inversions seem to happen around the origin or terminus of replication.


Promer alignment: E. coli K12 vs. Xanthomonas axonopodis pv. citri
- Red: direct; green: reverse
- Both are γ-proteobacteria!

Eisen JA, Heidelberg JF, White O, Salzberg SL. Evidence for symmetric chromosomal inversions around the replication origin in bacteria. Genome Biol. 2000;1(6):RESEARCH0011.

Replicon sequence comparisons
- Basic tool: MUMmer
- Delcher AL, Phillippy A, Carlton J, Salzberg SL. Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res. 2002 Jun 1;30(11):2478-83.
- Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL. Versatile and open software for comparing large genomes. Genome Biol. 2004;5(2):R12.
- http://mummer.sourceforge.net

Basics of MUMmer
- It finds Maximal Unique Matches (MUMs): exact matches above a user-specified length threshold that are unique
- Exact matches found are clustered and extended (using dynamic programming); the result is approximate matches
- Data structure for exact match finding: suffix tree (difficult to build but very fast)
- Nucmer and promer: both very fast, O(n + #MUMs), n = genome lengths
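To make the MUM definition concrete, here is a minimal sketch in Python. It is not how MUMmer works internally: it uses a brute-force quadratic scan instead of a suffix tree, looks only at the forward strand, and the function names (`count_occurrences`, `find_mums`) are hypothetical, chosen for illustration.

```python
def count_occurrences(text, pattern):
    """Count (possibly overlapping) occurrences of pattern in text."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def find_mums(a, b, min_len):
    """Return (pos_in_a, pos_in_b, length) for every maximal unique match.

    A match is reported when it cannot be extended to the left or right,
    is at least min_len long, and occurs exactly once in each sequence.
    Brute force; only suitable for toy inputs.
    """
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip starts that could be extended to the left (not left-maximal).
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the characters agree.
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            if length < min_len:
                continue
            match = a[i:i + length]
            if count_occurrences(a, match) == 1 and count_occurrences(b, match) == 1:
                mums.append((i, j, length))
    return mums


print(find_mums("GATTACAGG", "CCATTACATT", 4))  # [(1, 2, 6)]
```

In the toy input, the shared substring ATTACA is the only match of length 4 or more, and it occurs once in each sequence, so it is reported as a single MUM. A repeated match such as ACGT in ACGTACGT would be rejected by the uniqueness check.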

Whole replicon multiple alignment
- The program MAUVE
- Darling AC, Mau B, Blattner FR, Perna NT. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 2004 Jul;14(7):1394-403.

Main chromosome alignment (MAUVE)

Chromosome 2 alignment (MAUVE)

Chromosome alignment (MAUVE): Dugway, RSA 493, RSA 331

Genome alignments (MAUVE)

How MAUVE works
- Seed-and-extend hashing
- Seeds/anchors: Maximal Multiple Unique Matches (multi-MUMs) of minimum length k
- Result: locally collinear blocks (LCBs)
- O(G^2 n + Gn log Gn), G = number of genomes, n = average genome length

Alignment algorithm
1. Find multi-MUMs
2. Use the multi-MUMs to calculate a phylogenetic guide tree
3. Find LCBs (subset of multi-MUMs; filter out spurious matches; requires minimum weight)
4. Recursive anchoring to identify additional anchors (extension of LCBs)
5. Progressive alignment (CLUSTALW) using the guide tree

Gene-centric comparisons
- Homologs: genes that have the same ancestor; in general they retain the same function
- Orthologs: homologs from different species (arise from speciation)
- Paralogs: homologs from the same species (arise from duplication)
  - Duplication before speciation (ancient duplication): out-paralogs; may not have the same function
  - Duplication after speciation (recent duplication): in-paralogs; likely to have the same function

Gene set computations
- Given a set of genomes, represented by their proteomes (sets of protein sequences)
- Given homologous relationships (as provided, for example, by OrthoMCL)
- Which genes are shared by genomes X and Y?
- Which genes are unique to genome Z?
- Venn or extended Venn diagrams
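The questions above reduce to plain set operations once each genome is described by the set of gene families it contains. The sketch below assumes such a mapping is already available (for example, derived from a clustering tool's output); the genome names, family identifiers, and function names are all hypothetical.

```python
from collections import Counter


def shared(presence, x, y):
    """Families present in both genome x and genome y."""
    return presence[x] & presence[y]


def unique(presence, z):
    """Families present in genome z and in no other genome."""
    others = set()
    for genome, families in presence.items():
        if genome != z:
            others |= families
    return presence[z] - others


def venn_counts(presence):
    """Count families per Venn region, keyed by the exact set of genomes holding them."""
    region_of = {}
    for genome, families in presence.items():
        for family in families:
            region_of.setdefault(family, set()).add(genome)
    return Counter(frozenset(genomes) for genomes in region_of.values())


# Hypothetical toy data: genome -> set of gene family identifiers.
presence = {
    "A": {"f1", "f2", "f3"},
    "B": {"f1", "f2", "f4"},
    "C": {"f1", "f5"},
}

print(sorted(shared(presence, "A", "B")))   # ['f1', 'f2']
print(sorted(unique(presence, "C")))        # ['f5']
print(venn_counts(presence)[frozenset("ABC")])  # 1
```

Keying each Venn region by a frozenset of genome names means the same function works for any number of genomes, which is what an extended Venn diagram needs.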

3-way genome comparison: A, B, C

Fig. 4. Net gene loss or gain throughout the evolution of the α-proteobacterial species. Boussau, Bastien et al. (2004) Proc. Natl. Acad. Sci. USA 101, 9722-9727. Copyright 2004 by the National Academy of Sciences.

Proteome alignment done with LCS (top: Xcc; bottom: Xac)
- Blue: BBHs that are in the LCS
- Dark blue: BBHs not in the LCS
- Red: Xac specifics
- Yellow: Xcc specifics
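As a rough illustration of the idea, and assuming LCS here stands for longest common subsequence, the sketch below aligns two gene orders in which matched genes (for example, BBH pairs) carry the same label. The labels and the function name `lcs` are hypothetical; this is the textbook dynamic-programming algorithm, not necessarily the procedure used to produce the figure.

```python
def lcs(x, y):
    """Return one longest common subsequence of sequences x and y."""
    n, m = len(x), len(y)
    # dp[i][j] = LCS length of x[:i] and y[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Trace back from the bottom-right corner to recover the subsequence.
    result = []
    i, j = n, m
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            result.append(x[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return result[::-1]


# Hypothetical gene orders; equal labels stand for matched gene pairs.
order_top = ["g1", "g2", "g3", "g4", "g5", "g6"]
order_bottom = ["g1", "g2", "g5", "g3", "g4", "g6"]

in_lcs = lcs(order_top, order_bottom)
print(in_lcs)  # ['g1', 'g2', 'g3', 'g4', 'g6']
```

Matched genes that fall outside the returned subsequence (here g5) correspond to pairs that are conserved but out of order, which is the distinction the two shades of blue draw in the figure.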


What do the tables show
- Conserved blocks (aka microsyntenic regions), and how these blocks appear in different replicons across the genomes compared
- Some of these blocks are not operons (would need to show strand)
- Possible block losses

Polymorphism detection
- Indels, SNPs
- Pseudogenes

Figure 4. I II

Gluconate isomerase: a Brucella gene in the process of decay