Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Similar documents
Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Lecture 11 Friday, October 21, 2011

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Phylogenetic Tree Reconstruction

Algorithms in Bioinformatics

8/23/2014. Phylogeny and the Tree of Life

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

Phylogenetic inference

C3020 Molecular Evolution. Exercises #3: Phylogenetics

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

BINF6201/8201. Molecular phylogenetic methods

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

Dr. Amira A. AL-Hosary

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Chapter 26: Phylogeny and the Tree of Life

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

Phylogeny 9/8/2014. Evolutionary Relationships. Data Supporting Phylogeny. Chapter 26

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Chapter 19 Organizing Information About Species: Taxonomy and Cladistics

Phylogenetic Analysis

What is Phylogenetics

EVOLUTIONARY DISTANCES

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Multiple Sequence Alignment. Sequences

Cladistics and Bioinformatics Questions 2013

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Phylogenetic analyses. Kirsi Kostamo

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Phylogenetic Analysis

Phylogenetic Analysis


Phylogeny and Systematics

Evolutionary Tree Analysis. Overview

Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Phylogeny and the Tree of Life

UoN, CAS, DBSC BIOL102 lecture notes by: Dr. Mustafa A. Mansi. The Phylogenetic Systematics (Phylogeny and Systematics)

Phylogeny and the Tree of Life

31/10/2012. Human Evolution. Cytochrome c DNA tree

Evolutionary trees. Describe the relationship between objects, e.g. species or genes

Classification and Phylogeny

1 ATGGGTCTC 2 ATGAGTCTC

Evolution Problem Drill 10: Human Evolution

A (short) introduction to phylogenetics

FUNDAMENTALS OF MOLECULAR EVOLUTION

Classification and Phylogeny

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.

molecular evolution and phylogenetics

Phylogeny & Systematics: The Tree of Life

Chapter 26 Phylogeny and the Tree of Life

Phylogeny and the Tree of Life

C.DARWIN ( )

Phylogeny & Systematics

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

Consensus Methods. * You are only responsible for the first two

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

BIOLOGY. Phylogeny and the Tree of Life CAMPBELL. Reece Urry Cain Wasserman Minorsky Jackson

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

Examples of Phylogenetic Reconstruction

Phylogenetics - Orthology, phylogenetic experimental design and phylogeny reconstruction. Lesser Tenrec (Echinops telfairi)

Chapter 26 Phylogeny and the Tree of Life

Estimating Evolutionary Trees. Phylogenetic Methods

7. Tests for selection

Gene Families part 2. Review: Gene Families /727 Lecture 8. Protein family. (Multi)gene family

Phylogenetics: Building Phylogenetic Trees

Concept Modern Taxonomy reflects evolutionary history.

Inferring Molecular Phylogeny

Phylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science

Bioinformatics course

Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D

Phylogeny and Molecular Evolution. Introduction

Unit 9: Evolution Guided Reading Questions (80 pts total)

Phylogeny: traditional and Bayesian approaches

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution

Primate Diversity & Human Evolution (Outline)

Phylogeny and the Tree of Life

Organizing Life s Diversity

CHAPTER 26 PHYLOGENY AND THE TREE OF LIFE Connecting Classification to Phylogeny

CS5263 Bioinformatics. Guest Lecture Part II Phylogenetics

Evolution by duplication

Unit 7: Evolution Guided Reading Questions (80 pts total)

Phylogene)cs. IMBB 2016 BecA- ILRI Hub, Nairobi May 9 20, Joyce Nzioki

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

How should we organize the diversity of animal life?

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

Bayesian Models for Phylogenetic Trees

Bootstrapping and Tree reliability. Biol4230 Tues, March 13, 2018 Bill Pearson Pinn 6-057

Microbial Diversity and Assessment (II) Spring, 2007 Guangyi Wang, Ph.D. POST103B

A Phylogenetic Network Construction due to Constrained Recombination

AP Biology Notes Outline Enduring Understanding 1.B. Big Idea 1: The process of evolution drives the diversity and unity of life.

Phylogeny and Molecular Evolution. Introduction

Chapter 16: Reconstructing and Using Phylogenies

Transcription:

Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 200 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues Estimating divergence times Determining the identity of new pathogens Detection of orthology and paralogy Reconstructing ancient proteins Detecting recombination break points Identification of horizontal gene transfer

Candidatus Desulforudis audaxviator Life is Lonely at the Center of the Earth Environmental genomics reveals a single-species ecosystem deep within earth. Chivian et al. Science 2008. 2

Molecular phylogeny to examine extinct species - I Is the south american opossum evolutionary related to the australian marsupial wolf? bos echymipera thylacinus sarcophilu dasyurus trichosuru phalanger philander 3

Molecular phylogeny to examine extinct species - II "Sequencing the nuclear genome of the extinct woolly mammoth". Miller et al. Nature Nov. 2008 Molecular phylogeny to examine extinct species - III Phylogeny of Neanderthal individuals Svante Pääbo 4

5

Out of Africa hypothesis Modern humans evolved from archaic forms only in Africa. Archaic humans living in Asia and Europe (like the Neanderthal) were replaced by modern humans migrating out of Africa. Home assignment - Phylogeny of Neanderthal - modern humans - monkeys Starting point is multiple alignment of complete mitochondrial genomes from * 8 modern humans of different origin, including 2 African sequences * One Neanderthal genome (2008) * Gorilla, Bonobo, Chimpanzee * Relationship of Neanderthal to modern humans? * Modern humans and "Out of Africa" hypothesis? * What primate is most closely related to humans? 6

Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues Estimating divergence times Determining the identity of new pathogens Detection of orthology and paralogy Reconstructing ancient proteins Detecting recombination break points Identification of horizontal gene transfer A molecular clock may be used in the estimation of time of divergence between two species A r = K / 2 or = K/2r where Ancestral sequence B r = rate of nucleotide substitution (estimated from fossil records) K = number of substitutions K between the two homologous sequences = ime of divergence between the two species 7

Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues Estimating divergence times Determining the identity of new pathogens Detection of orthology and paralogy Reconstructing ancient proteins Detecting recombination break points Identification of horizontal gene transfer Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues Estimating divergence times Determining the identity of new pathogens Detection of orthology and paralogy Reconstructing ancient proteins Detecting recombination break points Identification of horizontal gene transfer 8

Analysis of orthology and paralogy Human A Chimp A Mouse A Dog A Gene duplication events Human B Chimp B Mouse B Dog B Dog C Compare Zvelebil & Baum p. 242 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues Estimating divergence times Determining the identity of new pathogens Detection of orthology and paralogy Reconstructing ancient proteins Detecting recombination break points Identification of horizontal gene transfer 9

ADH produces ethanol ADH2 consumes ethanol Conclusion from properties of ancestral protein: ancestral Adh was mainly responsible for making ethanol 0

Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues Estimating divergence times Determining the identity of new pathogens Detection of orthology and paralogy Reconstructing ancient proteins Detecting recombination break points Identification of horizontal gene transfer Detecting recombination break points - common in viral genomes recombination breakpoint A BC x A B Cx

Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues Estimating divergence times Determining the identity of new pathogens Detection of orthology and paralogy Reconstructing ancient proteins Detecting recombination break points Identification of horizontal gene transfer Horizontal gene transfer - transfer of genes between species Mitochondria and chloroplasts resulted from bacteria that lived in symbiosis with a primitive eukaryote. Eventually many genes were lost or transferred to the nuclear genome 2

Many eubacterial genes have been transferred to archae and eukarya 3

Phylogenetic analysis may be used to identify horisontal gene transfer. Some Chlamydia (Eubacteria kingdom) proteins group with plant homologs Phylogeny of chlamydial enoyl-acyl carrier protein reductase as an example of horizontal transfer. From: Stephens RS, et al Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis. Science. 998 Oct 23;282(5389):754-9. Phylogenetic analysis - Selection of sequences for analysis - Multiple sequence alignment - Construction of tree -Evaluation of tree 4

Construction of the phylogenetic tree Distance methods Character methods Maximum parsimony Maximum likelihood Distance methods Simplest distance measure: Consider every pair of sequences in the multiple alignment and count the number of differences. Degree of divergence = Hamming distance (D) D = n/n where N = alignment length n = number of sites with differences Example: AGGCCA AGCCCCA D = 2/0 = 0.2 5

Character-based methods * Maximum parsimony * Maximum likelihood Maximum parsimony parsimony - principle in science where the simplest answer is the preferred. In phylogeny: he preferred phylogenetic tree is the one that requires the fewest evolutionary steps. 6

Maximum parsimony. Identify all informative sites in the multiple alignment 2. For each possible tree, calculate the number of changes at each informative site. 3. Sum the number of changes for each possible tree. 4. ree with the smallest number of changes is selected as the most likely tree. Maximum parsimony Identify informative sites Site 2 3 4 5 6 7 8 9 Sequence ------------------------- A A G A G G C A 2 A G C C G G C G 3 A G A A C C A 4 A G A G A C C G * * * 7

Site 3 - non - informative Site 5 - informative 8

Summing changes: site 5 site 7 site 9 Sum ree I 2 4 ree II 2 2 5 ree III 2 2 2 6 ree I most likely. (In this case we are not considering branch lengths, only topology of tree is predicted) Character-based methods * Maximum parsimony * Maximum likelihood What is the probability that a particular tree generated the observed data under a specific model? 9

Maximum likelihood Consider the following multiple alignment A C 2 A C 3 A A 4 A G C First, consider position 3 above (AG) here are three possible unrooted trees for the OUs -4: A 3 2 2 2 4 3 4 4 G A G G ree A ree B ree C 3 A A rooted version of ree A: Maximum likelihood L L3 L0 0 L2 G 2 L4 L5 2 A 3 L6 G 4 L(ree) = L0 * L * L2 * L3 * L4 * L5 * L6 20

Example of probability matrix for nucleotide substitutions Maximum likelihood A C G A ~ k k 2k C k ~ 2k k k 2k ~ k G 2k k k ~ where we here set k = E-6. ransitions are more likely than transversions A rooted version of ree A: Maximum likelihood L L3 L0 0 L2 G 2 L4 L5 2 A 3 L6 G 4 L(ree) = L0 * L * L2 * L3 * L4 * L5 * L6 = 0.25 * * E-6 * * * 2E-6 * = 5E-3 2

A rooted version of ree A: Maximum likelihood L A L3 L0 0 L2 C 2 L4 L5 2 A 3 L6 G 4 L(ree2) = L0 * L * L2 * L3 * L4 * L5 * L6 = 0.25 * E-6 * 2E-6 * E-6 * E-6 * E-6 * E-6 = 5E-37 Maximum likelihood L(ree) = L(ree) + L(ree2) + L(ree3)... L(ree64) hen we examine all positions of the alignment in the same way. Probability of tree is the product of probabilities for the different positions. L = L(ree pos) * L(ree pos2) * L(ree pos3) * L(ree pos4) lnl = ln L(ree pos) + ln L(ree pos2) + ln L(ree pos3) +ln L(ree pos4) Finally, the rees B and C are handled the same way. ree with highest probability is preferred. 22

Maximum likelihood Consider the following multiple alignment A C 2 A C 3 A A 4 A G C First, consider position 3 above (AG) here are three possible unrooted trees for the OUs -4: A 3 2 2 2 4 3 4 4 G A G G ree A ree B ree C 3 A Phylogenetic analysis - Selection of sequences for analysis - Multiple sequence alignment - Construction of tree - Evaluation of tree Bootstrapping 23

Evaluation of tree - Bootstrapping (from www.icp.ucl.ac.be/~opperd/private/bootstrap.html) Bootstrapping is a way of testing the reliability of the dataset and the tree, allows you to assess whether the distribution of characters has been influenced by stochastic effects. Bootstrapping in practice ake a dataset consisting of in total n sequences with m sites each. A number of resampled datasets of the same size (n x m) as the original dataset is produced. However, each site is sampled at random and no more sites are sampled than there were original sites. 24

Consensus tree. he number of times each branch point or node occurred (bootstrap proportion) is indicated at each node. Bootstrapping typically involves 00-000 datasets. Bootstrap values > 70% are generally considered to provide support for the clade designation. Software for phylogenetic analysis PHYLIP (Phylogenetic Inference Package) Joe Felsenstein http://evolution.genetics.washington.edu/phylip.html Examples in home assignment DNADIS = create a distance matrix NEIGHBOR = neighbor joining / UPGMA DNAPARS = maximum parsimony DNAML = maximum likelihood PAUP (Phylogenetic Analysis Using Parsimony) MrBayes 25