molecular evolution and phylogenetics

Similar documents
Algorithms in Bioinformatics

Phylogenetic Tree Reconstruction

BINF6201/8201. Molecular phylogenetic methods

Phylogenetic inference

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

EVOLUTIONARY DISTANCES

Phylogeny: building the tree of life

Evolutionary Tree Analysis. Overview

What is Phylogenetics

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

8/23/2014. Phylogeny and the Tree of Life

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Constructing Evolutionary/Phylogenetic Trees

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center


UoN, CAS, DBSC BIOL102 lecture notes by: Dr. Mustafa A. Mansi. The Phylogenetic Systematics (Phylogeny and Systematics)

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Dr. Amira A. AL-Hosary

Phylogenetic trees 07/10/13

Phylogenetics: Building Phylogenetic Trees

Multiple Sequence Alignment. Sequences

C3020 Molecular Evolution. Exercises #3: Phylogenetics

How to read and make phylogenetic trees Zuzana Starostová

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Reading for Lecture 13 Release v10

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

A (short) introduction to phylogenetics

Phylogenetics. BIOL 7711 Computational Bioscience

Phylogenetics - Orthology, phylogenetic experimental design and phylogeny reconstruction. Lesser Tenrec (Echinops telfairi)

Many of the slides that I ll use have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Phylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Theory of Evolution Charles Darwin

Phylogeny and the Tree of Life

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

Chapter 27: Evolutionary Genetics

Phylogeny and the Tree of Life

C.DARWIN ( )

Chapter 26: Phylogeny and the Tree of Life

CS5263 Bioinformatics. Guest Lecture Part II Phylogenetics

Lecture 6 Phylogenetic Inference

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

Phylogenetic Analysis

Phylogenetic Analysis

Phylogenetic Analysis

1 ATGGGTCTC 2 ATGAGTCTC

Molecular Evolution & Phylogenetics

Anatomy of a tree. clade is group of organisms with a shared ancestor. a monophyletic group shares a single common ancestor = tapirs-rhinos-horses

Bayesian Models for Phylogenetic Trees

Comparative Genomics II

Constructing Evolutionary/Phylogenetic Trees

Comparative Bioinformatics Midterm II Fall 2004

Phylogenetic analysis. Characters

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

Phylogeny. November 7, 2017

Phylogenetic analyses. Kirsi Kostamo

Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies

Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week (Jan 27 & 29):

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.

Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D

Module 12: Molecular Phylogenetics

Chapter 26 Phylogeny and the Tree of Life

Phylogeny: traditional and Bayesian approaches

Session 5: Phylogenomics

Chapter 16: Reconstructing and Using Phylogenies

Organizing Life s Diversity

Seuqence Analysis '17--lecture 10. Trees types of trees Newick notation UPGMA Fitch Margoliash Distance vs Parsimony

Page 1. Evolutionary Trees. Why build evolutionary tree? Outline

Gene Families part 2. Review: Gene Families /727 Lecture 8. Protein family. (Multi)gene family

Phylogeny Tree Algorithms

Module 13: Molecular Phylogenetics

Molecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço

Consensus Methods. * You are only responsible for the first two

Effects of Gap Open and Gap Extension Penalties

Phylogeny and Systematics

A Phylogenetic Network Construction due to Constrained Recombination

Cladistics and Bioinformatics Questions 2013

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

MOLECULAR SYSTEMATICS: A SYNTHESIS OF THE COMMON METHODS AND THE STATE OF KNOWLEDGE

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT

Chapter 19: Taxonomy, Systematics, and Phylogeny

Phylogeny and the Tree of Life

Plan: Evolutionary trees, characters. Perfect phylogeny Methods: NJ, parsimony, max likelihood, Quartet method

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches

Lecture 11 Friday, October 21, 2011

Phylogeny and the Tree of Life

Estimating Evolutionary Trees. Phylogenetic Methods

Molecular Evolution and Phylogenetic Tree Reconstruction

Chapter 26 Phylogeny and the Tree of Life

7. Tests for selection

Transcription:

molecular evolution and phylogenetics Charlotte Darby Computational Genomics: Applied Comparative Genomics 2.13.18 https://www.thinglink.com/scene/762084640000311296

Internal node Root TIME Branch Leaves A B C D

computer scientist biologist??? Each set of three figures shows the same topological relationship among the leaves C E D B A B C D E A A B C D E

Internal node Root Leaves Branch Leaves A B C D Strains or isolates of the same species OR different species OR DNA or protein sequences From the same genome or different genomes Leaves are implied to be extant (present at the current time)

Internal node Root Root Branch Leaves A B C D Implies what happened first in chronological time Tree-building algorithms may not infer the root of a tree Represented as a node with 3 outgoing branches where the root ought to be Could root halfway along the longest path midpoint rooting Could root by prior knowledge of an outgroup e.g. bacteria in the tree above

http://cabbagesofdoom.blogspot.com/2012/06/how-to-root-phylogenetic-tree.html

http://cabbagesofdoom.blogspot.com/2012/06/how-to-root-phylogenetic-tree.html

Internal node Root Internal node Branch Leaves A B C D Represents some ancestral state Most recent common ancestor (MRCA) of the leaves following it Internal nodes are implied to NOT be extant

Internal node Root Branch length Branch Leaves A B C D Chronological time OR number of changes to the DNA / protein sequence Depending on assumptions you make about rate of evolution when building tree, may not be ultrametric (every root-leaf path has same length) OR no distance implied just defines a branching order B C A D A B C D

Why visualize biological information as trees? (besides having a colorful Figure 1 for your Nature paper!) Trees show fundamental structure in your data. A picture s worth a thousand words!

The last amniote common ancestor of synapsids and sauropsids is placed 315 MYA. The sauropsid lineage evolved into archosaurs (such as birds, crocodilians, and dinosaurs) and lepidosaurs (such as snakes and lizards), diverging in the Triassic period. In the synapsid lineage, these primitive mammals acquired homothermy and lactation before the divergence of protetherian and therian mammals 166 MYA in the Jurassic period. While descendant species of the prototherian mammals - extant monotremes such as the platypus Ornithorhynchus anatinus - have characteristics such as venom, electroreception, and meroblastic cleavage, therian mammals evolved holoblastic cleavage, placentation, viviparity, and testicular descent before their divergence 148 MYA into marsupials and eutherians. Further diversification leads us to observe a pouch and prolonged lactation in extant marsupials, and inner cell mass and prolonged gestation in eutherians. Text composed by me for illustration only

species phylogeny When did species diverge? What characteristics define extant (surviving) species, and when in evolution did they arise? Genome analysis of the platypus reveals unique signatures of evolution. Warren et al., Nature 2008

strain phylogeny When were pathogens transmitted between individuals, species, or places? Establishment and cryptic transmission of Zika virus in Brazil and the Americas. Faria et al., Nature 2017

Establishment and cryptic transmission of Zika virus in Brazil and the Americas. Faria et al., Nature 2017

gene phylogeny D D D D D D How did a gene family diversify through duplication? How are the current functions of genes with shared evolutionary history (AKA gene family or homologous genes) related to their evolution? D D Gene duplication event http://evolution-textbook.org/content/free/figures/27_evow_art/29_evow_ch27.jpg

(((A,B),C),(D,E)) The Newick Standard was adopted 26 June 1986 by an informal committee meeting convened by me [Joe Felsenstein] during the Society for the Study of Evolution meetings in Durham, New Hampshire and consisting of James Archie, William H.E. Day, Wayne Maddison, Christopher Meacham, F. James Rohlf, David Swofford, and myself. (The committee was not an activity of the SSE nor endorsed by it). The reason for the name is that the second and final session of the committee met at Newick's restaurant in Dover, New Hampshire, and we enjoyed the meal of lobsters. ((raccoon:19.19959,bear:6.80041):0.84600[50],((sea_lion:11.99700, seal:12.00300):7.52973[100],((monkey:100.85930,cat:47.14069):20.59 201[80], weasel:18.87953):2.09460[75]):3.87382[50],dog:25.46154); (((A:5,B:5)f:3,C:9)g:1,(D:2,E:6)h:2); http://evolution.genetics.washington.edu/phylip/newicktree.html

Demo: http://tree.bio.ed.ac.uk/software/figtree/

Building Trees

You need a principled way of choosing the tree that best represents the relationship among leaves There are (2n-3)!! rooted trees for n taxa 3 taxa: 3 trees 5 taxa: 105 trees 10 taxa: 34,459,425 trees 20 taxa: 8.2 10 21 trees 50 taxa: 2.8 10 76 trees 100 taxa: 3.3 10 184 trees The double factorial, symbolized by two exclamation marks (!!), is a quantity defined for all integers greater than or equal to -1. For an even integer n, the double factorial is the product of all even integers less than or equal to n but greater than or equal to 2.

Parsimony: scenario that requires the fewest changes is the best https://evolution.berkeley.edu/evolibrary/article/0_0_0/evotrees_build_07

Clustering a distance matrix Calculate all pairwise distances Use a mathematical model of protein / nucleotide evolution to calculate the distance between two sequences Here: Hamming distance Example algorithms: Least squares Neighbor joining Minimum evolution UPGMA (Unweighted Pair Group Method with Arithmetic Mean) Software: PHYLIP AATG AAAG 3A>3T 2A>2C 4G>4T AAAT AAAG ACTG I II III I - 1 3 II 1-2 III 3 2 -

Estimating distance between sequences: nucleotide evolution models The thickness of the arrows indicates the substitution rates of the four nucleotides (T, C, A and G), and the sizes of the circles represent the nucleotide frequencies when the substitution process is in equilibrium. Note that both JC69 and K80 predict equal proportions of the four nucleotides. Molecular phylogenetics: principles and practice. Yang and Rannala, Nature Reviews Genetics 2012

Clustering: Neighbor joining 0. Start with a star graph 1. Calculate distance matrix 2. Find the minimum entry (closest leaves) and join those leaves with a new internal node, with branch lengths based on the values in the matrix 3. Replace the rows/columns of the leaves with a single row/column of the new internal node 4. Recalculate distance matrix Repeat 2,3,4 until you have an unrooted binary tree *Resulting tree is unrooted and not guaranteed to be ultrametric (Step 2) https://en.wikipedia.org/wiki/neighbor_joining

Clustering: UPGMA 0. Start with a star graph 1. Calculate distance matrix 2. Find the minimum entry (closest leaves) and join those leaves at a distance of the mean distance between their elements 3. Replace the rows/columns of the leaves with a single row/column of their union (cluster) 4. Recalculate distance matrix Repeat 2,3,4 until you have an rooted binary tree *Resulting tree is rooted and ultrametric: assumes rate of evolution is constant on all branches (molecular clock) https://en.wikipedia.org/wiki/upgma

1 2 4 5 3 1 2 3 4 5

Multiple sequence alignment http://www.sequence-alignment.com/sequence-alignment.jpg

Maximum likelihood estimation from an MSA 0. Start with a multiple sequence alignment 1. Build a starting tree 2. Calculate the likelihood of the tree based on an evolutionary model 3. Change the tree slightly to increase the likelihood Repeat 2,3 until some convergence criterion met ML is a heuristic search of tree space Bootstrapping: resampling from tree space to gauge the quality of your solution Proportion of resamples that support a certain branching pattern Software: PhyML/RAxML Genetic algorithm - GARLI

Bayesian estimation from an MSA 0. Start with a multiple sequence alignment 1. Build a starting tree and assume some prior probability on the distribution of trees 2. Calculate the posterior probability of the tree based on an evolutionary model 3. Sample a nearby tree, usually using MCMC (Markov chain Monte Carlo) algorithms 4. Accept the new tree if it is better, otherwise keep with the old tree Repeat 2,3,4 until some convergence criterion met Bayesian methods are also a heuristic search of tree space The posterior probability (probability correct given model) gauges the quality of your solution Software: BEAST, MrBayes

Molecular phylogenetics: principles and practice. Yang and Rannala, Nature Reviews Genetics 2012

3-hydroxy-3-methylglutaryl coenzyme A synthase (HMGCS) enzyme family Homologs: Genes with shared evolutionary history Orthologs: Genes that diverged through speciation (i.e. not duplication) Paralogs: Genes that diverged through duplication Orthologs Gene duplication event Paralogs Same species Paralogs Different species When type `1` genes cluster separately from type `2` genes in the gene tree, this suggests there was a duplication and that type `1` genes and type `2` genes are paralogs. How old is my gene? Capra et al., Trends in Genetics 2013

3-hydroxy-3-methylglutaryl coenzyme A synthase (HMGCS) enzyme family 1 (duplication) amniotes 1,2 fish frog shark 1 More parsimonious Supported by gene tree evidence X X X Gene duplication at root 1,2 (3 losses) Gene tree inferred using PhyML from sequences aligned with MAFFT and rooted using three invertebrate outgroup sequences. Branch support was assessed using alrt scores. How old is my gene? Capra et al., Trends in Genetics 2013 amniotes 1,2 fish frog shark 1

Evolution of chromosomes in hexaploid sweet potato a) Evolutionary history of cultivated I. batatas revealed by phylogenetic analysis of homologous chromosome regions. a, The dominant topology structure of all phylogenetic trees. Numbers indicate the average branch length of trees in this structure. b) The time points of B2 subgenome specialization and two whole-genome duplication events were estimated as 1.3, 0.8 and 0.5 million years ago (Ma). The estimation is based on 0.7% mutation rate per million years. The dashed curve indicates the crossing between diploid and tetraploid progenitors. Haplotype-resolved sweet potato genome traces back its hexaploidization history. Yang et. al, Nature Plants 2017

Building a species phylogeny Macroscopic characteristics Molecular characteristics Conserved sequence, e.g. ribosome Concatenation of conserved genes Consensus of many individual-gene trees (supertree) http://www.bio.utexas.edu/faculty/sjasper/bio213/phylogeny.html

Species phylogeny of Cyanobacteria from consensus of many protein families Consensus cyanobacterial phylogeny based on maximum likelihood trees for each of 438 orthologous protein families. The numbers at each bifurcation indicate the total number of trees where that exact bifurcation/branching order is observed Note that as this is a consensus tree, topology is meaningful but distances are not. Integrating Markov Clustering and Molecular Phylogenetics to Reconstruct the Cyanobacterial Species Tree from Conserved Protein Families. Swingley et al., Mol Bio Evol 2008

Species phylogeny of Cyanobacteria from concatenation of many multiple sequence alignments [MrBayes] tree for the full-concatenated data set, based on 230,415 aligned positions in 26 genomes. The scale bar indicates the number of substitutions per site. Shown at each bifurcation are the predicted coregenome (upper number) and pan-genome (lower number) sizes of an ancestor at that point. Integrating Markov Clustering and Molecular Phylogenetics to Reconstruct the Cyanobacterial Species Tree from Conserved Protein Families. Swingley et al., Mol Bio Evol 2008

+ extant species w/ nitrogen fixation X loss of nitrogen fixation if we assume present at root and parallel loss (gray square) ancestor had nitrogen fixation if we assume multiple gains + nitrogen fixation gained on branch if we assume multiple gains More parsimonious: 3 Gains or 5 Losses? Integrating Markov Clustering and Molecular Phylogenetics to Reconstruct the Cyanobacterial Species Tree from Conserved Protein Families. Swingley et al., Mol Bio Evol 2008

Resources Review article Molecular phylogenetics: principles and practice. Yang and Rannala, Nature Reviews Genetics 2012 Opinion piece Homology: a personal view on some of the problems. Fitch, Trends In Genetics 2000 Textbooks Molecular Evolution: A Phylogenetic Approach. Page and Holmes Inferring Phylogenies. Felsenstein My work in evolutionary biology (horizontal gene transfer) Xenolog classification. Darby, Stolzer et. al., Bioinformatics 2017