Aoife McLysaght Dept. of Genetics Trinity College Dublin
Evolution of genome arrangement: gene order changes (inversions, translocations). Evolution of genome content: gene gain (sequence divergence, duplication, recombination, horizontal transfer) and gene loss (deletion); one or more genes per event.
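The two kinds of gene order change named above can be illustrated on a signed gene list. This is a minimal sketch, not the method used in the talk: genes are represented as integers whose sign marks the strand (an assumption), and a translocation is modelled simply as moving a block within one gene list. The function names `invert` and `translocate` are hypothetical.

```python
def invert(order, start, end):
    """Reverse the block order[start:end] and flip the strand (sign) of each gene."""
    block = [-gene for gene in reversed(order[start:end])]
    return order[:start] + block + order[end:]


def translocate(order, start, end, dest):
    """Cut out the block order[start:end] and reinsert it at index dest of the remainder."""
    block = order[start:end]
    rest = order[:start] + order[end:]
    return rest[:dest] + block + rest[dest:]


genome = [1, 2, 3, 4, 5]
print(invert(genome, 1, 4))          # [1, -4, -3, -2, 5]
print(translocate(genome, 0, 2, 3))  # [3, 4, 5, 1, 2]
```

Both functions return a new list, so the ancestral order stays available for comparison.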
Translate knowledge from sequenced or model genomes to an organism of interest. Positional cloning of genes. Use probes designed in one genome to detect a target in another genome. Improve model parameters for phylogenetic inference from genome arrangement.
Not just a bag of genes: genome organisation contains information. Order of Hox genes corresponds to the spatial pattern of gene expression. Clustering of housekeeping genes. By observing which changes are allowed, gain understanding of genomic constraints and plasticity.
Greater power to detect change. Precision: can infer the lineage in which a change occurred. Detect direction and rate of change. More genomes also increase the computational burden.
20 completely sequenced genomes. 150-300 kb, containing ~200 genes.
Double-stranded DNA viruses, no RNA stage. Replicate in the host cytoplasm. Entomopox: insect-infecting. Chordopox: vertebrate-infecting. Orthopox: subset of chordopox which includes smallpox (variola) and vaccinia.
How are these genomes arranged? How has genome content changed? Is the rate of change constant? Can we detect adaptive genome evolution?
Significant sequence similarity (how significant?) over a long stretch of the protein (how long?).
Minimum aligned proportion (rows) against e-value threshold (columns):

Min. aligned proportion   1e-20   1e-10   1e-5     1
1.0                           0       0      0     0
0.9                          10      14     16    15
0.8                          14      22     25    26
0.7                          18      26     30    28
0.6                          19      30     33    29
0.5                          20      30     32    17
0.4                          20      31     34    10
0.3                          20      31     33     7
0.2                          19      29     32     4
0.1                          19      29     31     0
0.0                          19      29     31     0
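The two thresholds explored above (an e-value cut-off and a minimum aligned proportion) can be applied to a similarity hit as in the following sketch. Several details are assumptions rather than anything stated on the slide: the `Hit` record and its fields are hypothetical, the aligned proportion is taken relative to the longer of the two proteins, and the default thresholds are merely one cell of the grid, not a confirmed choice.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One pairwise similarity hit (hypothetical record layout)."""
    query: str
    subject: str
    evalue: float
    align_len: int    # number of aligned residues
    query_len: int    # full length of the query protein
    subject_len: int  # full length of the subject protein


def passes(hit, max_evalue=1e-5, min_aligned=0.4):
    """True if the hit is significant AND covers enough of the longer protein.

    Assumption: aligned proportion = aligned residues / length of longer protein.
    """
    aligned = hit.align_len / max(hit.query_len, hit.subject_len)
    return hit.evalue <= max_evalue and aligned >= min_aligned


hits = [
    Hit("p1", "p2", 1e-30, 180, 200, 210),  # strong and long
    Hit("p1", "p3", 1e-30, 40, 200, 400),   # strong but a short stretch only
    Hit("p1", "p4", 0.5, 190, 200, 200),    # long but not significant
]
print([h.subject for h in hits if passes(h)])  # ['p2']
```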
Clustering methods compared: complete linkage, single-link clustering, our method. [Figure: example clusters of nodes labelled A to J.]
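The difference between the two standard clustering criteria named above can be sketched as follows. Single-link clustering joins a protein to a group if it is linked to any one member; complete linkage requires every pair in the group to be linked. The slides do not describe "our method", so it is not implemented here; the function names and the toy similarity graph are assumptions for illustration only.

```python
from itertools import combinations


def single_link(nodes, edges):
    """Connected components: a single link to any member joins a group."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(sorted(g) for g in groups.values())


def is_complete(group, edges):
    """True if every pair of members in the group is directly linked."""
    linked = {frozenset(e) for e in edges}
    return all(frozenset(pair) in linked for pair in combinations(group, 2))


nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C")]   # A and C are linked only through B
clusters = single_link(nodes, edges)
print(clusters)                                   # [['A', 'B', 'C'], ['D']]
print([is_complete(c, edges) for c in clusters])  # [False, True]
```

In the toy graph, single-link clustering accepts A, B and C as one group although A and C share no direct link, which is exactly the case complete linkage rejects.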
4042 total proteins. 3384 proteins classified into 875 groups (813 with complete linkage): 521 groups of 1 member, 150 groups of 2 members, 204 groups of 3 or more members.
34 orthologues present in all genomes
92 orthologues present in all orthopox genomes
Examine phylogenetic spread of a group of orthologues. Assign gene gain and loss events to branches in the phylogeny.
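One simple way to assign events to branches is sketched below. The slides do not say which rule was used; this sketch assumes a Dollo-style rule (a single gain on the branch leading to the most recent common ancestor of all genomes carrying the gene, and a loss on each maximal clade inside it that lacks the gene). The tree encoding as nested tuples and all function names are hypothetical.

```python
def leaves(tree):
    """Set of leaf names under a node; a leaf is a string, a clade is a tuple."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def gain_and_losses(tree, present):
    """Return (gain_clade, loss_clades), each clade given as a frozenset of leaf names.

    Assumption: one gain at the MRCA of all carriers, losses explain every absence below it.
    """
    present = set(present)
    if not present & leaves(tree):
        raise ValueError("gene is absent from every genome in the tree")

    # Walk down while only one child clade contains carriers: this finds the MRCA.
    node = tree
    while not isinstance(node, str):
        holders = [child for child in node if leaves(child) & present]
        if len(holders) != 1:
            break
        node = holders[0]

    losses = []

    def walk(n):
        names = leaves(n)
        if not names & present:
            losses.append(frozenset(names))  # whole clade lacks the gene: one loss
        elif not isinstance(n, str):
            for child in n:
                walk(child)

    walk(node)
    return frozenset(leaves(node)), losses


tree = ((("A", "B"), "C"), ("D", "E"))
gain, losses = gain_and_losses(tree, {"A", "C"})
print(sorted(gain))                 # ['A', 'B', 'C']
print([sorted(l) for l in losses])  # [['B']]
```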
Tested for uniform rate of gene acquisition. Assume a molecular clock. Are gene acquisition events distributed randomly throughout the tree? Simulations.
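A simulation test of this kind might look like the sketch below. It assumes that, under a uniform rate and a molecular clock, each gain event falls on a branch with probability proportional to that branch's length; the observed number of gains on each branch is then compared with the simulated distribution. The function name, the input dictionaries and the toy numbers are hypothetical, and the actual simulation design used in the talk is not given on the slides.

```python
import random


def simulate_gain_test(branch_lengths, observed, n_sims=10000, seed=0):
    """For each branch return (p_deficit, p_excess).

    p_deficit: fraction of simulations with a count <= the observed count.
    p_excess:  fraction of simulations with a count >= the observed count.
    """
    rng = random.Random(seed)
    names = list(branch_lengths)
    weights = [branch_lengths[n] for n in names]
    total = sum(observed.get(n, 0) for n in names)

    at_most = dict.fromkeys(names, 0)
    at_least = dict.fromkeys(names, 0)
    for _ in range(n_sims):
        counts = dict.fromkeys(names, 0)
        # Drop every gain event on a branch chosen in proportion to branch length.
        for branch in rng.choices(names, weights=weights, k=total):
            counts[branch] += 1
        for n in names:
            obs = observed.get(n, 0)
            if counts[n] <= obs:
                at_most[n] += 1
            if counts[n] >= obs:
                at_least[n] += 1
    return {n: (at_most[n] / n_sims, at_least[n] / n_sims) for n in names}


# Toy input: two branches of equal length, but all 20 gains observed on one of them.
result = simulate_gain_test({"x": 1.0, "y": 1.0}, {"x": 20, "y": 0}, n_sims=2000)
for branch, (p_deficit, p_excess) in result.items():
    print(branch, p_deficit, p_excess)
```

A small `p_excess` flags a branch with more gains than the uniform model predicts, and a small `p_deficit` flags one with fewer.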
Significant deficit Significant excess
Slower rate of amino acid substitution within this clade (leading to aberrantly short branch lengths): Takezaki relative rate test; branch lengths from synonymous distances. Increased rate of gene gain. Increased selection for the retention of gained genes.
Extensive sequence divergence. Recombination. Horizontal transfer.
AMV-EPB_034: inhibitor of apoptosis from Amsacta moorei entomopoxvirus (AMV-EPB). GenBank sequence: inhibitor of apoptosis from Bombyx mori (silkworm); BLAST e-value 9e-81. Amsacta moorei entomopoxvirus infects Amsacta moorei (red hairy caterpillar). Bombyx and Amsacta are both in the order Lepidoptera. 62% of best non-viral GenBank hits are from the same taxonomic class as the viral host.
Events are not independent: they depend on previous (in time) gain and loss events of the gene family. Requires a probabilistic model?
Selection for diversification. Positive selection. Characteristic of host-parasite co-evolution.
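The genetic code (ter = termination codon):

GGG Gly   GAG Glu   GCG Ala   GUG Val
GGA Gly   GAA Glu   GCA Ala   GUA Val
GGC Gly   GAC Asp   GCC Ala   GUC Val
GGU Gly   GAU Asp   GCU Ala   GUU Val

AGG Arg   AAG Lys   ACG Thr   AUG Met
AGA Arg   AAA Lys   ACA Thr   AUA Ile
AGC Ser   AAC Asn   ACC Thr   AUC Ile
AGU Ser   AAU Asn   ACU Thr   AUU Ile

CGG Arg   CAG Gln   CCG Pro   CUG Leu
CGA Arg   CAA Gln   CCA Pro   CUA Leu
CGC Arg   CAC His   CCC Pro   CUC Leu
CGU Arg   CAU His   CCU Pro   CUU Leu

UGG Trp   UAG ter   UCG Ser   UUG Leu
UGA ter   UAA ter   UCA Ser   UUA Leu
UGC Cys   UAC Tyr   UCC Ser   UUC Phe
UGU Cys   UAU Tyr   UCU Ser   UUU Phe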
Two classes of DNA substitutions: synonymous (DNA change without amino acid change) and nonsynonymous (DNA change causing amino acid change). Neutral: equal frequencies. Conservative selection: fewer nonsynonymous substitutions. Positive selection: more nonsynonymous substitutions.
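The distinction between the two classes can be made concrete with the standard genetic code. The sketch below classifies a change between two codons; the names `CODE` and `classify` are hypothetical, and the code handles only the simple case of comparing two whole codons, not the estimation of substitution rates.

```python
from itertools import product

# Standard genetic code, codons enumerated in UCAG order; '*' marks a termination codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def classify(codon_a, codon_b):
    """Label the change between two codons by its effect on the encoded amino acid."""
    if codon_a == codon_b:
        return "identical"
    if CODE[codon_a] == CODE[codon_b]:
        return "synonymous"
    return "nonsynonymous"


print(classify("GGU", "GGC"))  # synonymous (Gly -> Gly)
print(classify("GAU", "GAA"))  # nonsynonymous (Asp -> Glu)
```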
204 groups of orthologues. Maximum likelihood test for positive selection (PAML). Significantly higher frequency of nonsynonymous substitutions.
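Maximum likelihood tests for positive selection are commonly run as likelihood ratio tests between a null model and a model that allows positive selection. The slide does not say which models were compared, so the sketch below assumes a pair of nested models differing by two free parameters, for which the test statistic is compared with a chi-square distribution with 2 degrees of freedom. The function name and the log-likelihood values are illustrative only, not results from the talk.

```python
import math


def lrt_pvalue_df2(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models that differ by two parameters.

    Returns (statistic, p_value). For 2 degrees of freedom the chi-square
    survival function has the closed form exp(-x / 2).
    """
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)  # guard against tiny negative values
    return stat, math.exp(-stat / 2.0)


# Illustrative log-likelihoods only.
stat, p = lrt_pvalue_df2(lnl_null=-1000.0, lnl_alt=-997.0)
print(stat, round(p, 4))  # 6.0 0.0498
```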
Detected positive selection on 26 genes. Examples: membrane glycoprotein, haemagglutinin, immunoglobulin domain protein.
13 genes are unique to the orthopox clade: significantly more than expected (P < 0.05). Disproportionate frequency of positive selection on genes gained within the orthopox lineage.
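An over-representation of this kind is often assessed with a hypergeometric (one-sided Fisher) test. The slide does not name the test used, so the following is only a sketch under that assumption. To apply it here one would need the number of orthopox-unique genes among all genes tested, which the slides do not give, so that value is left as a parameter; the function name is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(n_total, n_marked, n_drawn, k_observed):
    """P(X >= k_observed) when drawing n_drawn items without replacement
    from n_total items of which n_marked are marked.
    """
    upper = min(n_marked, n_drawn)
    favourable = sum(
        comb(n_marked, i) * comb(n_total - n_marked, n_drawn - i)
        for i in range(k_observed, upper + 1)
    )
    return favourable / comb(n_total, n_drawn)


# Small worked case: 5 genes, 2 marked, draw 2, both marked -> 1/10.
print(hypergeom_upper_tail(5, 2, 2, 2))  # 0.1
```

With the talk's figures, `n_drawn` would be the 26 positively selected genes and `k_observed` the 13 unique to the orthopox clade, with `n_total` and `n_marked` taken from the full set of genes tested.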
Association of positive selection on protein sequences and increased rate of gene acquisition. Adaptive significance of gene acquisition? Mimic host defences; avoid host recognition; block cell death.
The rate of genome evolution is not constant. The rate of gene acquisition has increased in the orthopox lineage. The orthopox lineage also has an increased frequency of positive selection. Possible adaptive significance of genome evolution.
University of California, Irvine Brandon Gaut Pierre Baldi