Impact of recurrent gene duplication on adaptation of plant genomes

Iris Fischer, Jacques Dainat, Vincent Ranwez, Sylvain Glémin, Jacques David, Jean-François Dufayard, Nathalie Chantret

Plant Genomes
High frequency of duplications/retentions in angiosperms
- Whole genome duplication (WGD): found in every sequenced angiosperm genome (e.g. Jaillon et al. 2007, Nature; D'Hont et al. 2012, Nature; The Tomato Genome Consortium 2012, Nature)
- Small-scale duplications (SSD)
  - Tandem duplication: enriched in genes involved in environmental stress response (Hanada et al. 2008, Plant Physiol.)
  - Duplication by transposable elements: most abundant source for new genes in plants (Freeling et al. 2008, Genome Res.)
Surprisingly high retention rate in angiosperms
Complex organization of multigene families
Angiosperm genomes are very dynamic

Duplications
- Pseudogenization
- Gene conservation: selection on duplication (dosage)
- Subfunctionalization: neutral evolution, Duplication-Degeneration-Complementation (DDC)
- Neofunctionalization: positive selection on new mutation (adaptation)
Non-synonymous vs. synonymous substitutions
ω = dN/dS = nonsynonymous substitution rate / synonymous substitution rate
- ω = 1: neutral evolution
- ω < 1: purifying selection
- ω > 1: positive selection
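The ω thresholds above can be illustrated with a minimal Python sketch. The function name `classify_omega` and the tolerance `tol` around ω = 1 are assumptions made for illustration only; in practice, whether ω differs from 1 is decided by a statistical test rather than a fixed tolerance.

```python
def classify_omega(dn: float, ds: float, tol: float = 0.05) -> str:
    """Classify omega = dN/dS into the three regimes listed on the slide.

    `tol` is a hypothetical tolerance around omega = 1 (illustration only).
    """
    if ds <= 0:
        raise ValueError("dS must be positive to compute omega")
    omega = dn / ds
    if abs(omega - 1.0) <= tol:
        return "neutral evolution"
    return "purifying selection" if omega < 1.0 else "positive selection"


print(classify_omega(0.02, 0.10))  # omega = 0.2
print(classify_omega(0.30, 0.10))  # omega = 3.0
```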

Lineage specific expansion (LSE)
Heterogeneity in duplication and retention rates has been discovered in several plant species (e.g. Tuominen et al. 2011, BMC Genomics; Yonekura-Sakakibara & Hanada 2011, Plant J.)
Gene family tree:
- Ultraparalogs (UP): ONLY related by duplication -> LSE gene clusters
- Superorthologs (SO): ONLY related by speciation -> reference gene set
Positive selection footprints have been detected frequently in gene families undergoing LSE (e.g. Smith et al. 2013, MBE; Yang et al. 2013, BMC Plant Biol.)
Objective: Can we observe positive selection more frequently in LSE genes than in single-copy genes in several plant genomes?
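The two definitions above can be made concrete with a small sketch. It assumes a gene family tree whose internal nodes are already labelled as duplication ("D") or speciation ("S"); the `Node` class, the function names and the `min_size` parameter are hypothetical and are not taken from the authors' pipeline.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    event: Optional[str] = None  # "D" or "S" on internal nodes, None on leaves
    children: List["Node"] = field(default_factory=list)


def leaf_names(node: Node) -> List[str]:
    """Return the leaf names below `node`, left to right."""
    if not node.children:
        return [node.name]
    return [n for child in node.children for n in leaf_names(child)]


def is_pure(node: Node, event: str) -> bool:
    """True if every internal node in the subtree carries `event`."""
    if not node.children:
        return True
    return node.event == event and all(is_pure(c, event) for c in node.children)


def maximal_clusters(root: Node, event: str, min_size: int = 2) -> List[List[str]]:
    """Leaf sets of maximal subtrees related only by `event`.

    event="D" gives ultraparalog clusters, event="S" gives superortholog sets.
    """
    found: List[List[str]] = []

    def walk(node: Node) -> None:
        if not node.children:
            return
        if is_pure(node, event):
            names = leaf_names(node)
            if len(names) >= min_size:
                found.append(names)
            return  # do not descend: the subtree is already maximal
        for child in node.children:
            walk(child)

    walk(root)
    return found


# Toy tree: three paralogs in species A, one gene each in species B and C.
up_clade = Node(event="D", children=[
    Node(event="D", children=[Node("A1"), Node("A2")]),
    Node("A3"),
])
so_clade = Node(event="S", children=[Node("B1"), Node("C1")])
tree = Node(event="S", children=[up_clade, so_clade])

print(maximal_clusters(tree, "D"))
print(maximal_clusters(tree, "S"))
```

Stopping at the first pure subtree keeps each cluster maximal, so nested duplication clades are reported once rather than as overlapping sets.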

Data
Gene family clustering & cluster annotation (Rouard et al. 2010, Nucleic Acids Res.)
- Full proteomes of 21(35) Viridiplantae and 1 red alga
- >3300 families
- Family size from 2 to >3000 sequences
- From 10 well-annotated genomes, we extracted CDS data

Workflow
Figure: gene family tree
a) Identify superorthologs and ultraparalogs in gene family trees (6+ sequences)
b) Extract and align sequences (PRANK: Löytynoja & Goldman 2005, PNAS; GUIDANCE: Penn et al. 2010, MBE)
c) Infer ML trees (PhyML: Guindon et al. 2010, Syst. Biol.)
d) Search for selection footprints (Yang 2007, MBE; Dutheil et al. 2012, MBE)
Fischer et al. 2014, BMC Plant Biology

Dataset description
- 1672 UP clusters
- 1370 SO clusters

Codons under selection
5.38% of UP clusters under positive selection vs. none of the SO clusters
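Codon-level tests of this kind are commonly run as likelihood ratio tests between nested site models (for example in PAML's codeml). The sketch below assumes the alternative model has two extra free parameters, so the statistic is compared to a χ² distribution with 2 degrees of freedom. The log-likelihood values are hypothetical, and the slides do not state which model pair or significance threshold was used.

```python
import math


def lrt_df2(lnl_null: float, lnl_alt: float):
    """Likelihood ratio test for nested models differing by 2 parameters.

    Returns (statistic, p_value). For a chi-square distribution with
    2 degrees of freedom the survival function is exp(-x / 2).
    """
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)  # clamp small negative noise
    return stat, math.exp(-stat / 2.0)


# Hypothetical log-likelihoods for one gene cluster.
stat, p = lrt_df2(lnl_null=-1234.5, lnl_alt=-1228.0)
print(f"2*dlnL = {stat:.2f}, p = {p:.4g}")
```

The closed-form survival function avoids an external dependency; for other degrees of freedom a statistics library would be needed.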

ω on the branch level
15583 UP branches; 15181 SO branches
Branch categories (row labels) are not recoverable from the slide.
Mean ω    Branches with ω > 1.2
0.28      0.22%
0.29      0.17%
0.62      8.78%
0.51      5.81%
0.84      15.79%
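The two summary statistics reported per branch category (mean ω and the share of branches with ω > 1.2) can be computed as in the following sketch. The function name and the example ω values are hypothetical; only the 1.2 threshold comes from the slide.

```python
from typing import Sequence, Tuple


def branch_summary(omegas: Sequence[float], threshold: float = 1.2) -> Tuple[float, float]:
    """Return (mean omega, fraction of branches with omega > threshold)."""
    if not omegas:
        raise ValueError("need at least one branch")
    mean_omega = sum(omegas) / len(omegas)
    fraction_above = sum(1 for w in omegas if w > threshold) / len(omegas)
    return mean_omega, fraction_above


# Hypothetical per-branch omega estimates for one category of branches.
mean_w, frac = branch_summary([0.1, 0.5, 1.5, 2.0])
print(f"mean omega = {mean_w:.3f}, branches above 1.2 = {frac:.2%}")
```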

Effect of cluster size
UP clusters still show signatures of positive selection more frequently than SO clusters after controlling for the cluster size effect
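The slide does not say how cluster size was controlled for. One possible way, shown here purely as an assumption, is to compare UP and SO clusters within the same size bin. The record layout, bin edges and example data below are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def selected_fraction_by_size(
    clusters: Iterable[Tuple[str, int, bool]],
    bin_edges: List[int],
) -> Dict[Tuple[int, str], float]:
    """Fraction of clusters under positive selection per (size bin, kind).

    `clusters` holds (kind, size, under_selection) records; a cluster falls
    into the first bin whose upper edge is >= its size, else the last bin.
    """
    counts = defaultdict(lambda: [0, 0])  # [selected, total]
    for kind, size, selected in clusters:
        bin_index = next(
            (i for i, edge in enumerate(bin_edges) if size <= edge),
            len(bin_edges),
        )
        counts[(bin_index, kind)][0] += int(selected)
        counts[(bin_index, kind)][1] += 1
    return {key: sel / tot for key, (sel, tot) in counts.items()}


# Hypothetical clusters: (kind, number of sequences, under positive selection).
example = [
    ("UP", 6, True), ("UP", 7, False), ("SO", 6, False),
    ("SO", 8, False), ("UP", 20, True), ("SO", 25, False),
]
fractions = selected_fraction_by_size(example, bin_edges=[10])
for key in sorted(fractions):
    print(key, fractions[key])
```

Comparing the two kinds inside each bin keeps a size difference between UP and SO clusters from driving the overall contrast.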

Publication

Summary
- We found codons under positive selection in 5.38% of LSE gene clusters, compared to none of the single-copy gene clusters
- This pattern is consistent at the branch level: ω is elevated in LSE clusters, and we find more branches with ω > 1.2 than in single-copy genes
- We used a conservative approach and might have missed some true positives
=> LSE genes fuel adaptation in angiosperms
Perspective
- The approach can be used on other well-annotated genomes or on a subset of gene families
- Sequencing of plant populations will help to infer positive selection at the population level and to detect differences in selection between paralogs, giving a more detailed view of the evolution of duplicated genes

Acknowledgements
Mathieu Rouard
http://www.greenphyl.org/cgi-bin/index.cgi
Thank you for your attention!