Impact of recurrent gene duplication on adaptation of plant genomes
Iris Fischer, Jacques Dainat, Vincent Ranwez, Sylvain Glémin, Jacques David, Jean-François Dufayard, Nathalie Chantret
Plant Genomes
High frequency of duplications/retentions in angiosperms
- Whole genome duplication (WGD): found in every sequenced angiosperm genome (e.g. Jaillon et al. 2007, Nature; D'Hont et al. 2012, Nature; The Tomato Genome Consortium 2012, Nature)
- Small-scale duplications (SSD)
  - Tandem duplication: enriched in genes involved in environmental stress response (Hanada et al. 2008, Plant Physiol.)
  - Duplication by transposable elements: most abundant source for new genes in plants (Freeling et al. 2008, Genome Res.)
- Surprisingly high retention rate in angiosperms
- Complex organization of multigene families
- Angiosperm genomes are very dynamic
Duplications
- Pseudogenization
- Gene conservation: selection on duplication (dosage)
- Subfunctionalization: neutral evolution, Duplication-Degeneration-Complementation (DDC)
- Neofunctionalization: positive selection on new mutation (adaptation)
Non-synonymous vs. synonymous substitutions
- ω = dN/dS = nonsynonymous substitution rate / synonymous substitution rate
- ω = 1: neutral evolution
- ω < 1: purifying selection
- ω > 1: positive selection
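The ω thresholds above can be written as a small helper. This is only an illustrative sketch: the function name `classify_omega` and the tolerance `tol` are hypothetical, and in practice ω is an estimate that has to be tested statistically rather than compared to 1 directly.

```python
def classify_omega(d_n: float, d_s: float, tol: float = 1e-9) -> str:
    """Return the selection regime implied by omega = dN/dS.

    d_n: nonsynonymous substitution rate
    d_s: synonymous substitution rate (must be positive)
    tol: hypothetical tolerance for treating omega as equal to 1
    """
    if d_s <= 0:
        raise ValueError("dS must be positive to compute omega")
    omega = d_n / d_s
    if abs(omega - 1.0) <= tol:
        return "neutral evolution"
    return "positive selection" if omega > 1.0 else "purifying selection"
```

For example, `classify_omega(0.2, 1.0)` falls in the purifying-selection regime because ω = 0.2 < 1.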
Lineage-specific expansion (LSE)
- Heterogeneity in duplication and retention rates has been discovered in several plant species (e.g. Tuominen et al. 2011, BMC Genomics; Yonekura-Sakakibara & Hanada 2011, Plant J.)
- Gene family tree
  - Ultraparalogs (UP): ONLY related by duplication -> LSE gene clusters
  - Superorthologs (SO): ONLY related by speciation -> reference gene set
- Positive selection footprints have been detected frequently in gene families undergoing LSE (e.g. Smith et al. 2013, MBE; Yang et al. 2013, BMC Plant Biol.)
Objective
- Can we observe positive selection more frequently in LSE genes than in single-copy genes across several plant genomes?
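The UP and SO definitions can be illustrated on a toy gene tree whose internal nodes are already labelled as duplication or speciation events. The sketch below is an assumption-laden illustration, not the pipeline used in the study: the `Node` class, the `"dup"`/`"spec"` labels and the gene names are hypothetical, and the tree reconciliation that produces the labels is taken as given.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""    # gene name (leaves only)
    event: str = ""   # "dup" or "spec" (internal nodes only)
    children: list = field(default_factory=list)


def leaves(node):
    """Return the gene names below a node."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaves(child)]


def is_pure(node, event):
    """True if every internal node of the subtree carries `event`."""
    if not node.children:
        return True
    return node.event == event and all(is_pure(c, event) for c in node.children)


def maximal_clusters(node, event):
    """Leaf sets of the maximal subtrees related only by `event`."""
    if not node.children:
        return []
    if is_pure(node, event):
        return [sorted(leaves(node))]
    return [cl for child in node.children for cl in maximal_clusters(child, event)]


# Hypothetical family: three paralogs in one species, single copies in two others.
example = Node(event="spec", children=[
    Node(event="dup", children=[
        Node("At1"),
        Node(event="dup", children=[Node("At2"), Node("At3")]),
    ]),
    Node(event="spec", children=[Node("Os1"), Node("Zm1")]),
])

up_clusters = maximal_clusters(example, "dup")    # ultraparalog clusters
so_clusters = maximal_clusters(example, "spec")   # superortholog clusters
```

Here the duplication-only subtree yields one UP cluster and the speciation-only subtree yields one SO cluster. The root is excluded from both because it mixes the two event types.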
Data
Gene family clustering & cluster annotation (Rouard et al. 2010, Nucleic Acids Res.)
- Full proteomes of 21 (35) Viridiplantae and 1 red alga
- >3300 families
- Family size from 2 to >3000 sequences
- 10 well-annotated genomes, from which we extracted CDS data
Workflow
Gene family tree
a) Identify superorthologs and ultraparalogs in gene family trees (6+ sequences)
b) Extract and align sequences (PRANK: Löytynoja & Goldman 2005, PNAS; GUIDANCE: Penn et al. 2010, MBE)
c) Infer ML trees (PhyML: Guindon et al. 2010, Syst. Biol.)
d) Search for selection footprints (Yang 2007, MBE; Dutheil et al. 2012, MBE)
Fischer et al. 2014, BMC Plant Biology
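Searches for selection footprints at the codon level are commonly based on a likelihood ratio test (LRT) between a null codon model and an alternative that allows ω > 1. The slides do not state which models were compared, so the following is only a generic sketch: the function name, the choice of 2 degrees of freedom and the log-likelihood values are assumptions for illustration.

```python
import math


def lrt_df2(lnl_null: float, lnl_alt: float) -> tuple[float, float]:
    """Likelihood ratio test between two nested models, assuming df = 2.

    Returns the test statistic 2 * (lnL_alt - lnL_null) and its p-value.
    For a chi-square distribution with 2 degrees of freedom the survival
    function has the closed form exp(-x / 2), so no extra library is needed.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Hypothetical log-likelihoods for one cluster (not taken from the study).
stat, p_value = lrt_df2(lnl_null=-1234.5, lnl_alt=-1228.0)
```

A cluster would be reported as showing a selection footprint only if the p-value stays below the chosen significance level, ideally after correcting for the number of clusters tested.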
Dataset description
- 1672 UP clusters
- 1370 SO clusters
Fischer et al. 2014, BMC Plant Biology
Codons under selection
- 5.38% of UP clusters under positive selection vs. none of the SO clusters
Fischer et al. 2014, BMC Plant Biology
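As a rough check of how strong this contrast is, the percentage can be converted back into a cluster count and compared with the SO set. The sketch below assumes the 1672 UP and 1370 SO clusters given in the dataset description as denominators. The helper names are hypothetical, and the test shown (a one-sided Fisher exact test, which reduces to a single hypergeometric term when one group has zero positives) is not necessarily the one used in the study.

```python
from math import comb


def implied_count(percent: float, total: int) -> int:
    """Number of clusters implied by a reported percentage of `total`."""
    return round(percent / 100 * total)


def prob_all_in_first_group(k: int, n_first: int, n_second: int) -> float:
    """Hypergeometric probability that all k positives fall in the first group.

    With zero positives in the second group this equals the one-sided
    Fisher exact p-value for the 2x2 table [[k, n_first - k], [0, n_second]].
    """
    return comb(n_first, k) / comb(n_first + n_second, k)


up_positive = implied_count(5.38, 1672)   # UP clusters under positive selection
p_one_sided = prob_all_in_first_group(up_positive, 1672, 1370)
```

The resulting probability is far below conventional significance thresholds, which is what one would expect when every positive cluster falls in one of two similarly sized groups.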
ω on the branch level
15583 UP branches; 15181 SO branches
Mean ω    Branches w/ ω > 1.2
0.28      0.22%
0.29      0.17%
0.62      8.78%
0.51      5.81%
0.84      15.79%
Fischer et al. 2014, BMC Plant Biology
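The two branch-level statistics reported here (mean ω and the share of branches with ω > 1.2) could be computed from per-branch estimates along these lines. This is a minimal sketch: the function name and the input values are hypothetical, and only the 1.2 cutoff comes from the slide.

```python
def summarize_branches(omegas, threshold=1.2):
    """Return (mean omega, percentage of branches with omega > threshold)."""
    if not omegas:
        raise ValueError("need at least one branch estimate")
    mean_omega = sum(omegas) / len(omegas)
    n_above = sum(1 for w in omegas if w > threshold)
    return mean_omega, 100.0 * n_above / len(omegas)


# Hypothetical per-branch omega estimates for one set of clusters.
mean_omega, pct_above = summarize_branches([0.25, 0.5, 0.75, 1.5])
```

Using a cutoff above 1 rather than exactly 1 is a conservative choice: branches whose ω is only marginally above neutrality are not counted.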
Effect of cluster size
- UP clusters still show signatures of positive selection more frequently after controlling for the cluster size effect
Fischer et al. 2014, BMC Plant Biology
Publication
Summary
- 5.38% of LSE clusters contain codons under positive selection, whereas none of the single-copy gene clusters do
- This pattern is consistent at the branch level: ω is elevated in LSE clusters, and more branches have ω > 1.2 than in single-copy genes
- We used a conservative approach and might have missed some true positives
=> LSE genes fuel adaptation in angiosperms
Perspective
- The approach can be used on other well-annotated genomes or on a subset of gene families
- Sequencing of plant populations will help infer positive selection at the population level and detect differences in selection between paralogs, giving a more detailed view of the evolution of duplicated genes
Fischer et al. 2014, BMC Plant Biology
Acknowledgements Mathieu Rouard http://www.greenphyl.org/cgi-bin/index.cgi Thank you for your attention!