Evolution by duplication

6.095/6.895 - Computational Biology: Genomes, Networks, Evolution
Lecture 18, Nov 10, 2005: Evolution by duplication
(Title-slide caption: "Somewhere, something went wrong")

Challenges in Computational Biology
(Overview figure of course topics: genome assembly, regulatory motif discovery, gene finding, DNA sequence alignment, database lookup, evolutionary theory, comparative genomics, RNA folding, gene expression analysis, cluster discovery, Gibbs sampling, protein network analysis, regulatory network inference, emerging network properties.)

Open questions (?)
(Images removed due to copyright restrictions.)
- Panda: bear or raccoon?
- Out of Africa: the mitochondrial evolution story?
- Human evolution: did we ever meet Neanderthals?
- Primate evolution: are we chimp-like or gorilla-like?
- Vertebrate evolution: how did complex body plans arise?
- Recent evolution: what genes are under selection?

What we have learned
- Phylogenetic trees
  - Distance-based methods: UPGMA, Neighbor-Joining
  - Alignment-based methods: parsimony (set-based, dynamic programming)
- Evolution by nucleotide mutation
  - Probability of back-mutation (Markov chain)
  - Models of evolution: Jukes-Cantor, Kimura 2-parameter model
- Evolution by rearrangements
  - Sorting by reversals
  - Signed / unsigned versions and approximation algorithms

Today's goals: Evolution by Duplication
- Detecting gene duplication: orthologs and paralogs; gene trees and species trees; reconciliation
- Detecting genome duplication: evidence across species; evidence in a single species
- Duplicate gene evolution: detecting accelerated divergence; measuring positive selection; gene conversion

Determining orthologs and paralogs

Orthologs and paralogs
(Figure: gene tree across human, mouse, rat, dog, and rabbit, with orthologs and paralogs marked.)
- Orthologs arise by speciation and typically keep the same function
- Paralogs arise by duplication and typically take on new functions
- Ortholog identification is a prerequisite to genomic studies

Why are orthologs & paralogs important?
- Comparative genomics relies on correct orthology
  - Signal discovery by orthologous conservation
- Evolutionary genomics relies on complete mapping
  - Duplicated regions are also the most interesting ones
- Whole-genome duplication in yeast, fish, and vertebrates
(Image removed due to copyright restrictions. Please see: Kellis, Manolis, Bruce W. Birren, and Eric S. Lander. "Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae." Nature 428 (April 2004): 617-624.)

Challenges in genome-wide orthology
- Tens of thousands of genes
  - Many paralogous families precede species divergence
  - Single phylogeny is impossible: not enough traits
- Abundant duplication and loss
  - Protein family expansions
  - Gene conversion, loss, inactivation
- Spurious matches
  - Common domains in unrelated proteins
  - Similarity not always due to common ancestry
- Noisy data
  - Varying rates of mutation (gene & species)
  - Pseudogenes, incorrect/incomplete gene models
Goal: systematic ortholog identification across multiple, complete, mammalian genomes

Current methods for ortholog finding
- Pair-wise sequence comparison
  - Best bi-directional BLAST hits
  - Focuses on one-to-one orthologs (no duplications)
- Hit clustering methods
  - Detect clusters in graph of pair-wise hits
  - Difficult to separate large connected components
- Synteny methods
  - Detect conserved regions, stretches of nearby hits
  - Genome alignment methods focus on best hits
- Phylogenetic methods
  - Phylogeny of family clusters orthologs near each other
  - Traditionally applied to specific families (not genome-wide)
Current methods are successful in limited datasets; complete mammalian genomes present new challenges
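To make the first method concrete, here is a minimal sketch of best bi-directional hit (BBH) pairing. It assumes similarity scores are already available as dictionaries keyed by (query, subject); the gene names, scores, and function names are hypothetical and not taken from the lecture.

```python
def best_hits(scores):
    """Map each query gene to its single highest-scoring subject gene.

    scores: dict mapping (query, subject) -> similarity score (higher is better).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores: h1 and h2 are human paralogs, m1 and m2 are mouse genes.
human_vs_mouse = {("h1", "m1"): 900, ("h1", "m2"): 300,
                  ("h2", "m1"): 700, ("h2", "m2"): 200}
mouse_vs_human = {("m1", "h1"): 900, ("m1", "h2"): 700,
                  ("m2", "h1"): 300, ("m2", "h2"): 200}

print(bidirectional_best_hits(human_vs_mouse, mouse_vs_human))
```

In this toy input only (h1, m1) is reciprocal, so the paralog h2 is left without a partner, which illustrates why BBH is limited to one-to-one orthologs.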

Algorithm: SynPhyl
(Images removed due to copyright restrictions.)
Combine synteny and phylogeny to find orthologs:
- Initial gene family construction
- Build phylogenetic trees within families
- Reconcile gene trees to determine orthology

Building Meaningful Gene Families

Step 1. Initial gene family construction
- Challenge: how to keep cluster sizes balanced
- Limitations of traditional clustering methods
  - UPGMA, k-means, graph-partitioning lead to imbalance
  - Bi-partitioning methods lead to arbitrary midway splitting
- SynPhyl approach (yields balanced clusters)
  a. Seed clusters with unambiguous hits
  b. Extend clusters in gene pulling step
  c. Refine clusters in phylogeny step

Step 1. Initial gene family construction
(1) Initial cluster seeds from unambiguous matches
- Syntenic orthologs
- Multi-species significant BBH (best bi-directional hits)
- BBH component
(Figure: human, dog, mouse, and rat genes linked by hits, forming the initial gene clusters.)

Step 2. Cluster extension
(1) Initial cluster seeds from unambiguous matches
(2) Cluster extension
- Pull unassigned genes to existing clusters
- Ensure the distance of the new gene is within the cluster's distance distribution
(Figure: unassigned genes being added to the initial gene clusters.)

Step 3. Phylogenetic reconstruction
(1) Initial cluster seeds from unambiguous matches
(2) Cluster extension
(3) Phylogenetic reconstruction
- Phylogeny for each cluster
  - Align each cluster (MUSCLE protein alignment)
  - Neighbor-Joining: fast, distance-based (JTT model)
  - Bootstrapping used as the confidence measure, which propagates to the orthology assignments
- Use phylogeny to further separate clusters (reconciliation)
- Four mammals: 78,744 genes, 17,586 trees, largest: 103 genes
- Ten fungi: 54,890 genes, 5,537 trees, largest: 164 genes
(Figure: extended gene clusters, with example bootstrap values 80%, 60%, 90%, 90%.)

Bootstrap confidence scores
- Bootstrapping (repeat 100 times)
  - Sample columns from the alignment randomly, with replacement
  - Build trees based on these columns (NJ, ML, MP)
- For every internal branch
  - Count how many topologies agree with the inferred split
  - The percentage is the bootstrap confidence score
- Building a final tree
  - Full tree, using all the data
  - Consensus tree
(Figure: gene cluster, alignment, resampled alignments, tree.)
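The bootstrap loop can be sketched in a few lines. This is an illustration only: to stay self-contained it replaces NJ/ML/MP with a toy four-taxon tree builder based on Hamming distances and the four-point condition, and all function names and sequences are hypothetical.

```python
import random


def resample_columns(alignment, rng):
    """Sample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def hamming(a, b):
    """Number of mismatching positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def quartet_split(alignment):
    """Toy tree builder for exactly four taxa (a stand-in for NJ/ML/MP)."""
    a, b, c, d = sorted(alignment)

    def dist(x, y):
        return hamming(alignment[x], alignment[y])

    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    # Four-point condition: pick the pairing with the smallest
    # sum of within-pair distances (ties go to the first pairing).
    return min(pairings, key=lambda p: dist(*p[0]) + dist(*p[1]))


def bootstrap_support(alignment, replicates=100, seed=0):
    """Return the split inferred from all data and its bootstrap percentage."""
    rng = random.Random(seed)
    reference = quartet_split(alignment)
    agree = sum(
        quartet_split(resample_columns(alignment, rng)) == reference
        for _ in range(replicates)
    )
    return reference, 100.0 * agree / replicates


# Hypothetical four-sequence alignment.
toy = {"dog": "ACGTACGTAC", "human": "ACGTACGAAC",
       "mouse": "TCGAACTTAG", "rat": "TCGAACTTTG"}
split, support = bootstrap_support(toy)
print(split, support)
```

A real pipeline would apply the same resampling to every internal branch of a larger tree; the single-split case here only shows where the percentage comes from.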

Phylogenetic Tree Reconciliation: Gene Tree ↔ Species Tree

Gene Tree / Species Tree reconciliation
- Known species tree
- G1: each species contains each subfamily
  - Easy to infer duplication events
- G2: loss events in each family hide complex ancestry
  - Reconciliation with the species tree recovers the events

Reconciliation to determine orthology
- Reconcile each gene tree to the species tree
  - Each node in the gene tree maps to a node in the species tree
- Read off orthology and paralogy
- Infer gene duplication and loss events
(Figure: gene tree with leaves d1, h1, m1, r1, m2, r2 reconciled to the species tree of dog, human, chimp, mouse, rat, showing a gene loss in chimp and a gene duplication in the rodent ancestor.)

Reconciliation algorithm
- For every gene-tree node g with children a and b, decide duplication or speciation
  - Map the left child to the species tree → M(a); map the right child → M(b)
  - M(g) is the least common ancestor of M(a) and M(b)
- After mapping:
  - g is a duplication node if M(g) = M(a) or M(g) = M(b)
  - g is a speciation node if M(g) is distinct from the mappings of both children
- Post-processing: count loss edges
- Limitation: reconciliation assumes correct species tree, which is generally NOT the case
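The mapping rule can be sketched as follows. The representation is an assumption made for illustration: trees are nested tuples, a species-tree node is identified by the set of species below it, and the example trees are modeled on the d1/h1/m1/r1/m2/r2 figure. Loss counting (the post-processing step) is not implemented here.

```python
def clades(tree):
    """List every clade of a nested-tuple tree as a frozenset of leaf names.

    The clade of `tree` itself is always the last element of the list.
    """
    if isinstance(tree, str):
        return [frozenset([tree])]
    result, leaves = [], frozenset()
    for child in tree:
        sub = clades(child)
        result.extend(sub)
        leaves |= sub[-1]
    result.append(leaves)
    return result


def lca(species_clades, a, b):
    """Least common ancestor: the smallest species clade containing a and b."""
    return min((c for c in species_clades if a | b <= c), key=len)


def reconcile(gene_tree, species_clades, gene_to_species, events):
    """Post-order mapping M(g) = LCA(M(a), M(b)); records one event per node."""
    if isinstance(gene_tree, str):
        return frozenset([gene_to_species[gene_tree]])
    left, right = gene_tree
    m_a = reconcile(left, species_clades, gene_to_species, events)
    m_b = reconcile(right, species_clades, gene_to_species, events)
    m_g = lca(species_clades, m_a, m_b)
    kind = "duplication" if m_g in (m_a, m_b) else "speciation"
    events.append((kind, sorted(m_g)))
    return m_g


# Assumed topologies, chosen to resemble the lecture figure.
species_tree = ("dog", (("human", "chimp"), ("mouse", "rat")))
gene_tree = ("d1", ("h1", (("m1", "r1"), ("m2", "r2"))))
gene_to_species = {"d1": "dog", "h1": "human", "m1": "mouse",
                   "r1": "rat", "m2": "mouse", "r2": "rat"}

events = []
reconcile(gene_tree, clades(species_tree), gene_to_species, events)
for kind, clade in events:
    print(kind, clade)
```

With these inputs the only duplication node maps to the mouse/rat ancestor, matching the "gene duplication in rodent ancestor" label in the figure.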

Mammalian tree: abundance of alternate tree topologies
- Most trees are incorrect
  - Count most frequent subtrees of size four
  - Correct species tree a minority (<20%)
- Reason: long branch attraction
  - Due to rapidly evolving rodent lineage
  - Common phylogenetic reconstruction problem
- What happens to reconciliation?

Reconciliation with erroneous trees
(Figure: gene trees over D, H, M, R reconciled against the species tree, with an inferred duplication.)
- With erroneous trees: direct reconciliation leads to spurious duplications & losses
- Solution: use the species tree to constrain the gene tree

Towards better reconciliation methods
(Figure: gene tree with leaves d1, h1, m1, r1, d2, h2, m2, r2, m3, r3, a new root, and alternative Topology 1 and Topology 2, next to the species tree of dog, human, mouse, rat.)
- Full solution: maximize joint likelihood
  - Incorporate cost of reconciliation in tree building
  - Tradeoff: nucleotide mutations & gene duplication/loss
- One solution: partitioning by reconciliation
  - Key insight: most errors are on older branches, irrelevant to orthology
  - Use species tree to partition gene tree
  - Allow re-rooting of each partition based on species tree
  ⇒ Apply reconciliation algorithm to each partition

Step 4: Partitioning by reconciliation
(1) Initial cluster seeds
(2) Cluster extension
(3) Phylogenetic reconstruction
(4) Partitioning by reconciliation
(Pipeline figure: gene clusters, phylogeny, unrooted trees, partition, partitioned trees, select root, rooted trees, reconciliation; the tree-building steps sit in a bootstrapping loop repeated 100 times and yield ortholog assignments with confidence scores.)

Putting it all together: SynPhyl
(Pipeline figure: gene annotations and genome synteny feed initial clustering, giving gene family clusters; then phylogeny, unrooted trees, partition, partitioned trees, select root, rooted trees, reconciliation, inside a bootstrapping loop repeated 100 times; the output is an ortholog and paralog database with orthology assigned with confidence scores.)

Benchmarks and Results

Results: Mammalian comparisons
- Compare human, mouse, rat, dog complete genomes
- Coverage: 75,753 genes
- Number of groups: 18,446 (of which 13,741 have all four species)
- One-to-one orthologs in four species: 12,359
Count of ortholog groups by species (figure by MIT OCW):
  Species present            # Groups
  Dog  Human  Mouse  Rat     13,741
  -    Human  Mouse  Rat        752
  Dog  -      Mouse  Rat        457
  Dog  Human  -      Rat        270
  Dog  Human  Mouse  -        1,073
  -    -      Mouse  Rat        502
  Dog  Human  -      -          361
  Dog  -      Mouse  -          101
  Dog  -      -      Rat         97
  -    Human  Mouse  -           75
  -    Human  -      Rat         41
Contribution of phylogenetic reconstruction
- More one-to-one orthologs: 11,619 → 12,359
- Large families split into small groups: 17,586 → 18,446

Higher resolution: resolving fine-grain correspondence

Higher sensitivity: recognize subtle duplication events
Species compositions (gene copies per species; figure by MIT OCW):
  DOG  HUMAN  MOUSE  RAT   COUNT
  1    1      1      1     10444
  1    2      1      1       225
  1    1      2      1       214
  1    1      1      2       205
  2    1      1      1       106
  2    2      2      2        59
  1    1      2      2        39
  1    2      2      2        37
  OTHER                      340
- Additional duplicates found for ENSEMBL 1-to-1 orthologs
- Hundreds of additional duplicates detected
- Confirmed by branch lengths and topology

SynPhyl comparison to direct reconciliation
                               Direct reconciliation   SynPhyl reconciliation
  Total count of losses:       18,352                  11,750
  Total count of duplications: 10,114                  8,942
- Fewer gene losses, fewer gene duplications
- More gene trees reconcile to the species tree
- Gene duplications and losses dramatically decreased

Result: Genome-wide correspondence of multiple species Image removed due to copyright restrictions.

Summary / Contributions
- SynPhyl: new tool for genome-wide orthology
  - Uses synteny, phylogeny, and known species tree
  - Automatically determines orthologs and paralogs
  - Returns ortholog assignments, trees for each family
- Algorithmic highlights
  - Initial clustering constrained by synteny
  - Fine-grain correspondence uses phylogeny
  - Partition by reconciliation constrained by species trees
- Advantages of the algorithm
  - Practical, fast (< ½ day on a PC)
  - Uses information available: phylogeny, synteny
  - Confidence metric: bootstrap values propagate to orthology
  - Phylogeny ensures consistent orthologs (no over-collapsing)
- Performance
  - Successfully applied to mammals, fungi
  - Fine-grain resolution: phylogeny disambiguates large families
  - High sensitivity: captures all duplication events

Outline
- Detecting gene duplication: orthologs and paralogs; gene trees and species trees; reconciliation
- Detecting genome duplication: evidence across species; evidence in a single species
- Duplicate gene evolution: detecting accelerated divergence; measuring positive selection; gene conversion

Genome Duplication

A range of evolutionary distances
(Figure: species tree of S. cerevisiae, S. paradoxus, S. mikatae, S. bayanus, with divergence labels of 5 Myr and 20 Myr, and K. waltii at 100 Myr.)
- Ability to ask a different set of questions

Gene correspondence Image removed due to copyright restrictions. Please see: Kellis, Manolis, Bruce W. Birren, and Eric S. Lander. "Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae." Nature 428 (April 8, 2004): 617-624.

Signatures of evolutionary events
(Image removed due to copyright restrictions. Please see: Kellis, Manolis, Bruce W. Birren, and Eric S. Lander. "Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae." Nature 428 (April 8, 2004): 617-624.)
- Few genes remain in 2 copies
- Gene interleaving is evidence of complete duplication

Duplicate mapping tiles K. waltii Image removed due to copyright restrictions. Please see: Figure 3 in Kellis, Manolis, Bruce W. Birren, and Eric S. Lander. "Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae." Nature 428 (April 8, 2004): 617-624.

Duplicate mapping of centromeres Image removed due to copyright restrictions. Please see: Figure 2 in Kellis, Manolis, Bruce W. Birren, and Eric S. Lander. "Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae." Nature 428 (April 8, 2004): 617-624. Recognize sister regions solely based on gene order

Conclusion: Whole Genome Duplication has happened Image removed due to copyright restrictions. Please see: Figure 1 in Kellis, Manolis, Bruce W. Birren, and Eric S. Lander. "Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae." Nature 428 (April 8, 2004): 617-624.

Whole Genome Duplications are everywhere!
(Images removed due to copyright restrictions.)
- Yeast duplication
  - Most genes 1-to-1 mapping
  - Gene interleaving is evidence of duplication
  - Complete tiling of the genome
- Vertebrate duplication in fish
  - Fish: gene order not conserved, only chromosomes
  - Mammals: gene order conserved, not chromosomes
- Two rounds of WGD at the base of the vertebrate lineage
  - Build clusters of related genes (use Ciona as outgroup)
  - Count duplications by reconciliation
  - Find regions of duplicate overlap → 4-way synteny

Genome duplication evidence in a single species

Evidence of duplication using a single genome?
(Image removed due to copyright restrictions. Please see: Figure 1 in Kellis, Manolis, Bruce W. Birren, and Eric S. Lander. "Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae." Nature 428 (April 8, 2004): 617-624.)
- Genomic evidence
  - Conserved order of paralogous genes
  - Same transcriptional orientation
- However
  - Interspersed with single-copy genes
- Interpretation: genome duplication followed by gene loss

Whole genome duplication is controversial
- Insufficient evidence
  - Only 50% of genome in duplicate regions
  - Only 8% of genes present in two copies
  - Extensive redundancy outside duplicate regions
- Evidence against WGD
  - Divergence-based dating shows multiple times
  - Other species have similar level of redundancy
- Alternative evolutionary scenario proposed
  - Independent segmental duplications
  - Also consistent with the evidence
(Image removed due to copyright restrictions. Please see: Figure 1 in Kellis, Manolis, Bruce W. Birren, and Eric S. Lander. "Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae." Nature 428 (April 8, 2004): 617-624.)
Published positions:
- There was a whole-genome duplication. Wolfe, Nature 97
- There was no whole-genome duplication. Dujon, FEBS 2000
- At least some chrom. dup. occurred independently. Langkjaer, JMB, 2000
- Dynamic equilibrium of duplications and loss. Llorente, FEBS, 2000
- Recent evidence supports single event. Wong, PNAS 02
- Continuous block duplications and deletions. Dujon, Yeast 2003
- Dup. precedes divergence from Kluyveromyces. Piskur, Nature, 2003
- Telomere-mediated duplication events. Coissac, Mol Bio Evo 1997
- Multiple closely spaced events. Friedman, Genome Res, 2003
- Spontaneous duplication of large chromosomal segments. Koszul, EMBO 04
Evidence remains inconclusive

Conclusion: Whole Genome Duplication has happened Image removed due to copyright restrictions. Please see: Figure 1 in Kellis, Manolis, Bruce W. Birren, and Eric S. Lander. "Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae." Nature 428 (April 8, 2004): 617-624.

Outline
- Detecting gene duplication: orthologs and paralogs; gene trees and species trees; reconciliation
- Detecting genome duplication: evidence across species; evidence in a single species
- Duplicate gene evolution: detecting accelerated divergence; measuring positive selection; gene conversion

Post-duplication evolution

Whole-genome duplication results in 500 new genes
(Figure: number of genes over time. About 5,000 genes before the WGD, 10,000 immediately after it, then gene loss down to 5,500 today, about 100 Myr later: ~500 genes gained.)
- Evidence of accelerated gene evolution

Fate of duplicated genes
- 457 genes kept in two copies, result of selection
- Involved in sugar metabolism and fermentation
(Figure: tree in which the WGD gives S. cerevisiae copy 1 and S. cerevisiae copy 2, with K. waltii as the unduplicated relative.)
- Evidence of accelerated protein divergence?

Measuring accelerated divergence
- Protein divergence
  - Count amino-acid changes
  - Use BLOSUM substitution matrix
- Nucleotide divergence
  - Count nucleotide substitutions
  - Correct for back-mutations
  - Use transition/transversion evolutionary model
- dN / dS
  - Two types of nucleotide substitutions
    - S = synonymous: preserve amino-acid translation
    - N = non-synonymous: change amino acid
  - Count synonymous / non-synonymous sites
  - Depends on path taken between two codons
(Figure: two shortest paths are possible from GTA (V: Val) to TTT (F: Phe): path 1 through GTT (V: Val), path 2 through TTA (L: Leu).)
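The path dependence in the GTA to TTT example can be checked directly. The sketch below enumerates the shortest mutational paths between two codons under the standard genetic code and counts synonymous and non-synonymous steps on each; the function names are hypothetical, and it does not exclude paths through stop codons or weight paths as full dN/dS estimators do.

```python
from itertools import permutations, product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def shortest_paths(start, end):
    """All shortest paths: change each differing position once, in any order."""
    differing = [i for i in range(3) if start[i] != end[i]]
    paths = []
    for order in permutations(differing):
        codon, path = start, [start]
        for i in order:
            codon = codon[:i] + end[i] + codon[i + 1:]
            path.append(codon)
        paths.append(path)
    return paths


def count_changes(path):
    """Return (synonymous, non-synonymous) step counts along one path."""
    syn = nonsyn = 0
    for a, b in zip(path, path[1:]):
        if CODON_TABLE[a] == CODON_TABLE[b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


for path in shortest_paths("GTA", "TTT"):
    print(" -> ".join(f"{c}({CODON_TABLE[c]})" for c in path), count_changes(path))
```

The path through GTT has one synonymous and one non-synonymous step, while the path through TTA has two non-synonymous steps, which is why the counts depend on the path taken.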

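Scenarios for rapid gene evolution
- One copy faster (Ohno, 1970)
  (Figure: tree of Scer copy 1, Scer copy 2, and Kwal with a single accelerated Scer branch.)
- Both copies faster (Lynch, 2000)
  (Figure: tree of Scer copy 1, Scer copy 2, and Kwal with both Scer branches accelerated.)
- 20% of duplicated genes show acceleration
- 95% of cases: only one copy faster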

Emerging gene functions after duplication
- Origin of replication → silencing (4-fold acceleration)
  - Scer Orc1 (origin of replication)
  - Kwal Orc1
  - Scer Sir3 (silencing)
- Translation initiation → anti-viral defense (3-fold acceleration)
  - Scer Hbs1 (translation initiation)
  - Kwal Hbs1
  - Scer Ski7 (anti-viral defense)
- Asymmetric divergence → recognize ancestral / derived

Distinct functional properties
                  Ancestral function   Derived function
  Gene deletion   Lethal (20%)         Never lethal
  Expression      Abundant             Specific (stress, starvation)
  Localization    General              Specific (mitochondrion, spores)
Gain new function and lose ancestral function

Gene conversion

Decelerated evolution
(Figure: tree of Scer copy 1, Scer copy 2, Kwal.)
- 60 gene pairs (13% of 457 pairs)
  - 98% protein identity (all pairs: 55%)
  - 90% identity in 4-fold degenerate sites (all pairs: 41%)
- Not recent duplication
  - Gene order argues for ancestral WGD pairs
- Gene conversion?

Evidence of gene conversion
(Figure: tree after the WGD with YBL072C and YER102W from S. cerevisiae, YBL072C and YER102W from S. bayanus, and K. waltii and A. gossypii as unduplicated relatives.)
- Tree root reveals time of duplication
  - No acceleration in the K. waltii branch
  - The two genes have recently replaced each other
- Branching order reveals gene conversion
  - Paralogs are closer to each other than to their ortholog
  - Both S. cerevisiae and S. bayanus show gene conversion
  - Periodic gene conversion
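The branching-order criterion (paralogs closer to each other than to their orthologs) can be written as a simple distance test. This is a rough sketch: the function name is hypothetical and the distances below are made-up illustrative numbers, not measurements from the lecture or the cited paper.

```python
def paralogs_closer_than_orthologs(dist, species, other, copy_a, copy_b):
    """True if the two paralogs in `species` are closer to each other
    than either one is to its ortholog in `other`.

    dist: dict mapping frozenset({(species, gene), (species, gene)}) -> distance.
    """
    def d(x, y):
        return dist[frozenset([x, y])]

    paralog = d((species, copy_a), (species, copy_b))
    ortholog_a = d((species, copy_a), (other, copy_a))
    ortholog_b = d((species, copy_b), (other, copy_b))
    return paralog < min(ortholog_a, ortholog_b)


# Made-up distances (substitutions per site), for illustration only.
toy_dist = {
    frozenset([("Scer", "YBL072C"), ("Scer", "YER102W")]): 0.02,
    frozenset([("Sbay", "YBL072C"), ("Sbay", "YER102W")]): 0.03,
    frozenset([("Scer", "YBL072C"), ("Sbay", "YBL072C")]): 0.10,
    frozenset([("Scer", "YER102W"), ("Sbay", "YER102W")]): 0.11,
}

for species, other in (("Scer", "Sbay"), ("Sbay", "Scer")):
    flag = paralogs_closer_than_orthologs(toy_dist, species, other, "YBL072C", "YER102W")
    print(species, "paralogs cluster together:", flag)
```

Without conversion, a pair duplicated before the two species split would be expected to show the opposite pattern, with each copy closest to its ortholog.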

Summary
- Detecting gene duplication: orthologs and paralogs; gene trees and species trees; reconciliation
- Detecting genome duplication: evidence across species; evidence in a single species
- Duplicate gene evolution: detecting accelerated divergence; measuring positive selection; gene conversion