Aoife McLysaght Dept. of Genetics Trinity College Dublin


Evolution of genome arrangement: gene order changes (inversions, translocations). Evolution of genome content: gene gain (sequence divergence, duplication, recombination, horizontal transfer) and gene loss (deletion). One or more genes per event.

Translate knowledge from sequenced or model genomes to the organism of interest. Positional cloning of genes. Use probes designed in one genome to detect a target in another genome. Improve model parameters for phylogenetic inference from genome arrangement.

Not just a bag of genes: genome organisation contains information. Order of Hox genes corresponds to spatial pattern of gene expression. Clustering of housekeeping genes. By observing which changes are allowed, we gain understanding of genomic constraints and plasticity.

Greater power to detect change. Precision: can infer the lineage in which a change occurred; detect direction and rate of change. More genomes also increase the computational burden.

20 completely sequenced genomes, 150-300 kb, containing ~200 genes.

Double-stranded DNA viruses, no RNA stage. Replicate in the host cytoplasm. Entomopox: insect-infecting. Chordopox: vertebrate-infecting. Orthopox: subset of chordopox which includes smallpox (variola) and vaccinia.

How are these genomes arranged? How has genome content changed? Is the rate of change constant? Can we detect adaptive genome evolution?

Significant sequence similarity (how significant?) over a long stretch of the protein (how long?).

Minimum aligned proportion (rows) by e-value threshold (columns):

Proportion   1e-20   1e-10   1e-5      1
1.0              0       0      0      0
0.9             10      14     16     15
0.8             14      22     25     26
0.7             18      26     30     28
0.6             19      30     33     29
0.5             20      30     32     17
0.4             20      31     34     10
0.3             20      31     33      7
0.2             19      29     32      4
0.1             19      29     31      0
0.0             19      29     31      0
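
The two criteria scanned here, an e-value threshold and a minimum aligned proportion, can be applied to pairwise similarity hits as a simple filter. The following is a minimal sketch, not the method used in the talk: the `Hit` record, the choice to measure the aligned proportion against the longer of the two proteins, the placeholder default thresholds, and the sample data are all illustrative assumptions.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One pairwise similarity hit (hypothetical record layout)."""
    query: str
    subject: str
    evalue: float
    aligned_len: int  # number of aligned residues


def passes(hit, lengths, max_evalue, min_aligned_prop):
    """True if the hit is significant enough AND covers enough of the protein."""
    # Assumption: proportion is taken relative to the longer protein.
    longer = max(lengths[hit.query], lengths[hit.subject])
    return hit.evalue <= max_evalue and hit.aligned_len / longer >= min_aligned_prop


def filter_hits(hits, lengths, max_evalue=1e-5, min_aligned_prop=0.4):
    """Keep only hits that satisfy both thresholds (defaults are placeholders)."""
    return [h for h in hits if passes(h, lengths, max_evalue, min_aligned_prop)]


# Made-up example data.
lengths = {"A": 300, "B": 310, "C": 900}
hits = [
    Hit("A", "B", 1e-40, 290),  # strong hit over most of both proteins
    Hit("A", "C", 1e-12, 120),  # significant, but covers a small part of C
    Hit("B", "C", 0.5, 280),    # long alignment, but not significant
]
kept = filter_hits(hits, lengths)
print([(h.query, h.subject) for h in kept])
```

Only the A-B hit survives both criteria in this toy input; tightening or loosening either threshold changes which pairs are linked, which is what a scan over both axes explores.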

[Figure: example clusterings of proteins (nodes A to J) comparing complete linkage, single-link clustering, and our method.]

4042 total proteins. 3384 proteins classified into 875 groups, 813 of them with complete linkage: 521 groups of 1 member, 150 groups of 2 members, 204 groups of ≥3 members.
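
To make the distinction between single-link groups and complete linkage concrete, here is a minimal sketch. It is not the authors' clustering method, which the slides do not describe; it only assumes that proteins are joined by pairwise "links" (for example, hits passing a similarity filter). Single-link groups are the connected components of that graph, and a group has complete linkage when every pair of its members is directly linked. All names and data are hypothetical.

```python
from itertools import combinations


def single_link_groups(proteins, links):
    """Connected components of the link graph, found with union-find."""
    parent = {p: p for p in proteins}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for a, b in links:
        parent[find(a)] = find(b)

    groups = {}
    for p in proteins:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())


def is_completely_linked(group, links):
    """True if every pair of members in the group is directly linked."""
    linked = {frozenset(pair) for pair in links}
    return all(frozenset(pair) in linked for pair in combinations(group, 2))


# Made-up example: A-B-C form a triangle, D-E-F form a chain, G is alone.
proteins = ["A", "B", "C", "D", "E", "F", "G"]
links = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E"), ("E", "F")]

groups = single_link_groups(proteins, links)
for g in groups:
    print(g, "complete" if is_completely_linked(g, links) else "not complete")
```

In this toy input the chain D-E-F is a single-link group but not completely linked, because D and F share no direct link. Groups of one member are trivially complete.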

34 orthologues present in all genomes

92 orthologues present in all orthopox genomes

Examine the phylogenetic spread of a group of orthologues. Assign gene gain and loss events to branches in the phylogeny.
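
One simple way to turn a presence/absence pattern into events on branches is a Dollo-like parsimony rule: a gene family is gained once, on the branch leading to the most recent common ancestor of the genomes that carry it, and every maximal subtree inside that clade that lacks the gene counts as one loss. The slides do not say which reconstruction rule was used, so the sketch below is an assumption for illustration; the tree, the genome names `g1` to `g5`, and the function names are hypothetical. A branch is identified by the set of leaves beneath it.

```python
def leaves(tree):
    """Set of leaf names under a node; a tree is a leaf name or a tuple of subtrees."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def assign_events(tree, present):
    """Return (gain_branch, loss_branches) under a single-gain, Dollo-like rule."""
    present = frozenset(present)
    if not present:
        raise ValueError("gene family must be present in at least one genome")

    # Descend to the smallest clade that still contains every carrier genome.
    node = tree
    while not isinstance(node, str):
        containing = [c for c in node if present <= leaves(c)]
        if not containing:
            break
        node = containing[0]
    gain = leaves(node)

    # Inside that clade, each maximal carrier-free subtree is one loss event.
    losses = []

    def walk(n):
        below = leaves(n)
        if not (below & present):
            losses.append(below)
        elif not isinstance(n, str):
            for child in n:
                walk(child)

    walk(node)
    return gain, losses


# Hypothetical five-genome tree and one gene family found in g1 and g3.
tree = ((("g1", "g2"), "g3"), ("g4", "g5"))
gain, losses = assign_events(tree, {"g1", "g3"})
print(sorted(gain), [sorted(l) for l in losses])
```

Here the gain is placed on the branch leading to the (g1, g2, g3) clade and a single loss on the branch leading to g2. Horizontal transfer or repeated gain would violate the single-gain assumption.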

Tested for a uniform rate of gene acquisition. Assume a molecular clock. Are gene acquisition events distributed randomly throughout the tree? Simulations.
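
A simulation of this kind can be sketched as follows. Under a molecular clock and a uniform acquisition rate, the chance that a gain falls on a given branch is proportional to that branch's length, so the observed gains can be compared with many random placements. This is an illustrative sketch, not the authors' simulation: the branch names, lengths, observed counts, and the function name are all made up.

```python
import random


def simulate_gain_test(branch_lengths, observed, n_sims=2000, seed=0):
    """Monte Carlo test of gains per branch against a length-proportional null.

    Returns, per branch, the fraction of simulations with at least as many
    gains as observed (p_excess) and at most as many (p_deficit).
    """
    rng = random.Random(seed)
    names = list(branch_lengths)
    weights = [branch_lengths[b] for b in names]
    total = sum(observed.get(b, 0) for b in names)

    at_least = dict.fromkeys(names, 0)
    at_most = dict.fromkeys(names, 0)
    for _ in range(n_sims):
        counts = dict.fromkeys(names, 0)
        # Drop every gain event on a branch with probability proportional to length.
        for b in rng.choices(names, weights=weights, k=total):
            counts[b] += 1
        for b in names:
            at_least[b] += counts[b] >= observed.get(b, 0)
            at_most[b] += counts[b] <= observed.get(b, 0)

    return {
        b: {"p_excess": at_least[b] / n_sims, "p_deficit": at_most[b] / n_sims}
        for b in names
    }


# Hypothetical input: branch b3 is short but carries most of the gains.
branch_lengths = {"b1": 0.5, "b2": 0.3, "b3": 0.2}
observed = {"b1": 5, "b2": 5, "b3": 40}
result = simulate_gain_test(branch_lengths, observed)
for branch, p in result.items():
    print(branch, p)
```

A small `p_excess` flags a branch with more gains than its length predicts, and a small `p_deficit` flags one with fewer. The test is only as good as the branch lengths, which is why a slower substitution rate in one clade would have to be ruled out before reading an excess as a faster acquisition rate.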

[Figure legend: significant deficit; significant excess.]

Slower rate of amino acid substitution within this clade (leading to aberrantly short branch lengths): Takezaki relative rate test; branch lengths from synonymous distances. Increased rate of gene gain. Increased selection for the retention of gained genes.

Extensive sequence divergence. Recombination. Horizontal transfer.

AMV-EPB_034: inhibitor of apoptosis from Amsacta moorei entomopoxvirus (AMV-EPB). GenBank sequence: inhibitor of apoptosis from Bombyx mori (silkworm); BLAST e-value 9e-81. Amsacta moorei entomopoxvirus infects Amsacta moorei (Red Hairy Caterpillar). Bombyx and Amsacta are both in Order Lepidoptera. 62% of best non-viral GenBank hits are from the same taxonomic Class as the viral host.

Events are not independent: they depend on previous (in time) gain and loss events of the gene family. Requires a probabilistic model?

Selection for diversification. Positive selection. Characteristic of host-parasite co-evolution.

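The genetic code (ter = stop codon):

GGG Gly   GAG Glu   GCG Ala   GUG Val
GGA Gly   GAA Glu   GCA Ala   GUA Val
GGC Gly   GAC Asp   GCC Ala   GUC Val
GGU Gly   GAU Asp   GCU Ala   GUU Val

AGG Arg   AAG Lys   ACG Thr   AUG Met
AGA Arg   AAA Lys   ACA Thr   AUA Ile
AGC Ser   AAC Asn   ACC Thr   AUC Ile
AGU Ser   AAU Asn   ACU Thr   AUU Ile

CGG Arg   CAG Gln   CCG Pro   CUG Leu
CGA Arg   CAA Gln   CCA Pro   CUA Leu
CGC Arg   CAC His   CCC Pro   CUC Leu
CGU Arg   CAU His   CCU Pro   CUU Leu

UGG Trp   UAG ter   UCG Ser   UUG Leu
UGA ter   UAA ter   UCA Ser   UUA Leu
UGC Cys   UAC Tyr   UCC Ser   UUC Phe
UGU Cys   UAU Tyr   UCU Ser   UUU Phe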

Two classes of DNA substitutions: synonymous (DNA change without amino acid change) and nonsynonymous (DNA change causing amino acid change). Neutral: equal frequencies. Conservative selection: fewer nonsynonymous substitutions. Positive selection: more nonsynonymous substitutions.

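
The two classes can be told apart mechanically by translating both codons with the standard genetic code. The sketch below is illustrative only; the function and variable names are hypothetical, and it classifies a single codon pair rather than estimating substitution rates.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order U, C, A, G
# (first base varies slowest); '*' marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def classify_substitution(codon_before, codon_after):
    """Label a codon change as 'synonymous' or 'nonsynonymous'."""
    before = codon_before.upper().replace("T", "U")
    after = codon_after.upper().replace("T", "U")
    if before == after:
        raise ValueError("codons are identical; no substitution to classify")
    same_amino_acid = CODON_TABLE[before] == CODON_TABLE[after]
    return "synonymous" if same_amino_acid else "nonsynonymous"


print(classify_substitution("GGU", "GGC"))  # Gly -> Gly
print(classify_substitution("GAU", "GAA"))  # Asp -> Glu
```

Because many third-position changes leave the amino acid unchanged, synonymous changes are largely invisible to selection on the protein, which is what makes the comparison of the two classes informative.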

204 groups of orthologues. Maximum likelihood test for positive selection (PAML). Significantly higher frequency of nonsynonymous substitutions.
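
Maximum likelihood tests of this kind are usually likelihood ratio tests between a null model without positive selection and an alternative that allows it. The slides name PAML but not the models compared, so the sketch below assumes a pair of nested models differing by two free parameters, which allows the chi-square tail probability to be written in closed form. The log-likelihood values are invented, and the function name is hypothetical.

```python
import math


def lrt_pvalue_df2(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by 2 parameters.

    Returns (statistic, p_value). For a chi-square distribution with 2 degrees
    of freedom the upper tail probability is exactly exp(-x / 2).
    """
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return statistic, math.exp(-statistic / 2.0)


# Invented log-likelihoods for one group of orthologues.
statistic, p_value = lrt_pvalue_df2(lnl_null=-1234.5, lnl_alt=-1228.0)
print(f"2*deltaLnL = {statistic:.1f}, p = {p_value:.4f}")
```

With 204 groups tested, some correction for multiple testing would normally be applied before calling individual genes significant; the slides do not say which, if any, was used.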

Detected positive selection on 26 genes. Examples: membrane glycoprotein, haemagglutinin, immunoglobulin domain protein.

13 genes are unique to the orthopox clade: significantly more than expected (P < 0.05). Disproportionate frequency of positive selection on genes gained within the orthopox lineage.
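
One way to ask whether 13 orthopox-unique genes among the positively selected set is more than expected is a one-sided hypergeometric test. The slides do not state which test produced P < 0.05, so this is only a sketch of the idea. It assumes the 204 tested groups are the background, and the number of orthopox-unique groups in that background is not given in the slides: the value used below is a purely hypothetical placeholder, so the printed probability is not a result from the study.

```python
from math import comb


def hypergeom_tail(population, special, draws, hits):
    """P(X >= hits) when `draws` items are sampled without replacement from a
    population containing `special` marked items."""
    upper = min(draws, special)
    favourable = sum(
        comb(special, i) * comb(population - special, draws - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(population, draws)


# From the slides: 204 groups tested, 26 under positive selection, 13 of those
# unique to the orthopox clade.
# HYPOTHETICAL placeholder: how many of the 204 groups are orthopox-unique.
HYPOTHETICAL_ORTHOPOX_UNIQUE = 40

p = hypergeom_tail(
    population=204, special=HYPOTHETICAL_ORTHOPOX_UNIQUE, draws=26, hits=13
)
print(p)
```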

Association of positive selection on protein sequences and increased rate of gene acquisition. Adaptive significance of gene acquisition? Mimic host defences; avoid host recognition; block cell death.

The rate of genome evolution is not constant. The rate of gene acquisition has increased in the orthopox lineage. The orthopox lineage also has an increased frequency of positive selection. Possible adaptive significance of genome evolution.

University of California, Irvine: Brandon Gaut, Pierre Baldi