Phylogenetics (Todd Vision, Spring 2008)

Transcription:

Phylogenetics
Todd Vision, Spring 2008

  Tree basics
  Sequence alignment
  Inferring a phylogeny
    Neighbor joining
    Maximum parsimony
    Maximum likelihood
  Rooting trees and measuring confidence
  Software and file formats
  Testing hypotheses on a tree

Some applications
  Studying organismal & biogeographic history
    Systematics (inferring the tree of life)
    Dating events in the fossil record
    Testing hypotheses about phenotypic evolution
    Conservation biology
  Studying gene and protein families (molecular evolution)
    Studying functional specificity and divergence
    Identifying selection at the molecular level
    Understanding host-parasite/pathogen coevolution
    Identifying horizontal transfer events

Uncultured microbial diversity
  Hugenholtz P (2000) Genome Biology 3, 1
  Ciccarelli et al. (2006) Science 311, 1283

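Two views of the tree of life
  [Figure credit: W. Ford Doolittle]

Unrooted networks vs. rooted trees
  [Figure: an unrooted network and a rooted tree; the rooted tree has a time axis]

  Number of possible trees
  Taxa   Unrooted     Rooted
  3      1            3
  5      15           105
  10     2,027,025    34,459,425

Unscaled vs. scaled branches
  [Figure: panels labeled Unscaled and Scaled]

More tree thinking
  Which species is oldest?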
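The counts in the table of unrooted and rooted trees follow a standard closed form: for n labeled taxa there are (2n - 5)!! fully resolved unrooted trees and (2n - 3)!! rooted ones. The sketch below reproduces the three rows of the table; the function names are illustrative and do not come from the slides.

```python
def double_factorial(m: int) -> int:
    """Return m * (m - 2) * (m - 4) * ... down to 1 (1 when m <= 1)."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def num_unrooted_trees(n_taxa: int) -> int:
    """Number of fully resolved unrooted trees on n_taxa >= 3 labeled leaves."""
    return double_factorial(2 * n_taxa - 5)


def num_rooted_trees(n_taxa: int) -> int:
    """Number of fully resolved rooted trees on n_taxa >= 2 labeled leaves."""
    return double_factorial(2 * n_taxa - 3)


if __name__ == "__main__":
    for n in (3, 5, 10):
        print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

The rapid growth of these numbers is why exhaustive search over topologies is impractical for more than a handful of taxa.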

Clades and monophyly
  Clade: a monophyletic group
    Includes the most recent common ancestor (MRCA) of a set of leaves and all of the descendants of that MRCA
    A subtree on a rooted phylogeny

Tree thinking
  Is the frog more closely related to the fish or the human?
  Baum et al (2005) Science 310, 979-980.

Tree thinking
  Now what do you think?
  Which of the four trees depicts a different pattern of relationships than the others?

Polytomies
  2 descendants per node (fully resolved)
  More than 2 descendants per node (a polytomy)

Incongruence between gene and species trees
  Error
  Lineage sorting
  Gene duplication & gene loss (paralogy)
  Horizontal transfer

Lineage sorting
  A problem between closely related species
  Chimp, human, gorilla

Gene duplication
  Two classes of homologous genes
    Orthologs diverged through speciation
    Paralogs diverged through duplication, whether or not they are in the same genome
  [Figure: gene tree for Species 1 and Species 2 along a time axis]

Horizontal transfer
  Pereira et al (2000) J Biol Chem. 275(2):1495-501

Alternative explanations for incongruence
  Delwiche CF and Palmer JD (1996) Mol Biol Evol: 873-882.

Sequence alignment

Alignments classified by
  Span
    Global, encompassing full-length sequences
    Local, restricted to conserved segments
  Number of sequences
    Pairwise, involving only two sequences, like BLAST
    Multiple, involving more than two
      Hard!

Trivial vs. difficult alignments
  [Figure: example alignments ranging from Trivial to Difficult, placed on a scale of percent amino acid identity running from 100 down to 0, with a region labeled Twilight Zone]

Dotplots: phage λ cI vs. P22 c2 repressor
  [Figure: dotplots at window sizes 1, 11 and 25 with stringencies 1, 7 and 15]

Amino acids versus DNA
  DNA sequences give much worse alignments than amino acid sequences
    Fewer letters
    Less realistic scoring matrices
  [Figure: two DNA sequences, each labeled ArgLeuCys, with mismatched positions marked x]
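The window size and stringency settings of a dotplot can be made concrete with a small sketch: a dot is placed at (i, j) when at least `stringency` of the `window` positions starting at i in one sequence and j in the other are identical. The function name and the test sequences below are hypothetical; the slides do not specify an implementation.

```python
def dotplot(seq_a: str, seq_b: str, window: int = 1, stringency: int = 1) -> set:
    """Return the set of (i, j) window start positions that receive a dot.

    A dot is drawn when at least `stringency` of the `window` aligned
    positions starting at seq_a[i] and seq_b[j] are identical.
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            if matches >= stringency:
                dots.add((i, j))
    return dots


def render(seq_a: str, seq_b: str, dots: set, window: int) -> str:
    """Draw the dotplot as text, one row per window start in seq_a."""
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = "".join(
            "x" if (i, j) in dots else "."
            for j in range(len(seq_b) - window + 1)
        )
        rows.append(row)
    return "\n".join(rows)


if __name__ == "__main__":
    a, b = "ACGTACGT", "ACGTTCGT"  # made-up sequences for illustration
    print(render(a, b, dotplot(a, b, window=3, stringency=3), window=3))
```

Raising the window size and stringency together suppresses isolated chance matches and leaves the diagonal runs that mark real similarity.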

Scoring matrix
  [Figure: amino acid scoring matrix with residues grouped as Hydrophobic and Hydrophilic]

Affine gap score
  Score depends on length of contiguous gap
    Gap opening penalty d
    Gap extension penalty e
  $\gamma(g) = -d - (g - 1)e$, where g is the length of the gap
  [Figure: alignment of defensins]

Inferring a phylogeny
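As a small worked illustration of the affine gap score, the sketch below evaluates the penalty for a gap of length g with opening penalty d and extension penalty e. The default penalty values are arbitrary assumptions chosen for the example, not values from the slides.

```python
def affine_gap_score(gap_length: int, gap_open: float = 10.0,
                     gap_extend: float = 0.5) -> float:
    """Score of one contiguous gap: -d - (g - 1) * e.

    gap_open (d) and gap_extend (e) are illustrative defaults only.
    """
    if gap_length < 1:
        raise ValueError("gap_length must be at least 1")
    return -gap_open - (gap_length - 1) * gap_extend


if __name__ == "__main__":
    one_long_gap = affine_gap_score(4)
    four_short_gaps = 4 * affine_gap_score(1)
    print(one_long_gap, four_short_gaps)
```

With d larger than e, one long gap costs much less than several short gaps of the same total length, which matches the idea that a single insertion or deletion event can span many residues.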

Distance matrix approaches

  Raw distances
       A     B     C     D
  A    -     5     4     6
  B          -     5     5
  C                -     2
  D                      -

  Tree distances
       A     B     C     D
  A    -     4.0   4.5   5.5
  B          -     4.5   5.5
  C                -     3.0
  D                      -

  [Figure: tree with branch lengths 2, 2, 1.5, 1 and 2]

Neighbor joining
  Algorithm
    Takes a distance matrix as input
    Starts with a star phylogeny
    Progressively adds nodes until tree is fully resolved
  Desirable features
    Very fast
    Works well if rate variation is not too great

Neighbor joining
  [Figure: a star phylogeny and the resulting neighbor-joining tree, with branch lengths 0.53, 0.99, 1.02, 0.80, 0.93 and 0.65]

Neighbor joining is based on good estimates of distance
  [Figure: observed number of substitutions vs. time, and estimated number of substitutions vs. time]
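The first step of neighbor joining can be sketched as follows. Starting from the star phylogeny, the standard criterion computes Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of row i of the distance matrix, and joins the pair with the smallest Q. The matrix below uses the raw distances from the slide, with the taxon order A, B, C, D assumed; the function name is illustrative, and the later steps (branch lengths, matrix update, iteration) are omitted.

```python
from itertools import combinations


def nj_pair_to_join(dist):
    """Return ((i, j), q) for the pair minimizing the neighbor-joining Q criterion.

    dist is a symmetric matrix (list of lists) with zeros on the diagonal.
    Ties are broken in favor of the first pair encountered.
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, None
    for i, j in combinations(range(n), 2):
        q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
        if best_q is None or q < best_q:
            best_pair, best_q = (i, j), q
    return best_pair, best_q


if __name__ == "__main__":
    # Raw distances; the order A, B, C, D is an assumption.
    raw = [
        [0, 5, 4, 6],
        [5, 0, 5, 5],
        [4, 5, 0, 2],
        [6, 5, 2, 0],
    ]
    print(nj_pair_to_join(raw))
```

With four taxa, joining one pair implies joining the complementary pair as well, so the two pairs on either side of the internal branch always tie on Q.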

When are distances misleading?

Maximum parsimony
  acgttgccga
  acgttactgg
  cgtaagatcg
  cgtaaaaccg
  111112131-

Maximum parsimony
  Advantages
    Provides explicit mapping of character changes along branches
    Can be used for non-molecular characters (morphology)
  Disadvantages
    Nondeterministic - it is a criterion to evaluate a tree, but does not help us locate that tree
    Non-probabilistic - makes statistical inference difficult
    Inconsistent - more data can be positively misleading

Homoplasy
  Convergent or parallel character state changes in multiple independent lineages
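Because parsimony is a criterion for scoring a given tree, a short sketch of the scoring step may help. The code below uses Fitch's algorithm, which counts the minimum number of state changes a character needs on a fixed rooted binary tree. The four sequences are the ones shown on the slide, but the sequence names (s1 to s4) and the tree ((s1,s2),(s3,s4)) are assumptions made for illustration, so the per-site counts it prints apply to that tree only.

```python
def fitch_score(tree, states):
    """Minimum number of changes for one character on a rooted binary tree.

    tree: nested 2-tuples whose leaves are names, e.g. (("s1", "s2"), ("s3", "s4")).
    states: dict mapping each leaf name to its character state.
    """
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left, right = node
        left_set, left_steps = visit(left)
        right_set, right_steps = visit(right)
        common = left_set & right_set
        if common:
            # The children can agree on a state: no extra change needed here.
            return common, left_steps + right_steps
        # The children disagree: one change is required on this node's branches.
        return left_set | right_set, left_steps + right_steps + 1

    return visit(tree)[1]


def per_site_scores(tree, alignment):
    """Fitch score of every alignment column on the given tree."""
    length = len(next(iter(alignment.values())))
    return [
        fitch_score(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(length)
    ]


if __name__ == "__main__":
    alignment = {  # sequence names are hypothetical
        "s1": "acgttgccga",
        "s2": "acgttactgg",
        "s3": "cgtaagatcg",
        "s4": "cgtaaaaccg",
    }
    tree = (("s1", "s2"), ("s3", "s4"))  # assumed topology
    scores = per_site_scores(tree, alignment)
    print(scores, sum(scores))
```

Finding the most parsimonious tree still requires searching over topologies and scoring each one this way, which is the sense in which the criterion does not help locate the tree.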

Maximum likelihood
  We get heads with probability p
  Prob of k heads out of n tosses is given by the Binomial Probability
  $L = P(x = k \mid n, p) = \binom{n}{k} p^{k} (1 - p)^{n - k}$
  [Figure: likelihood L plotted against p]

To calculate the likelihood of a phylogeny
  The input data is the alignment
    Each column is independent
  The model includes the topology, branch lengths and substitution matrix
  Consider all possible ancestral states for a given topology
  Choose the tree with branch lengths that maximizes the probability of producing the alignment
  [Figure: a tree with g and c states at its nodes]

Maximum Likelihood (ML)
  Advantages
    Estimates are consistent
      Given enough data and a correct model, the estimate converges on the correct phylogeny
    Probabilistic framework
      One can test relative fit of different models
      Sometimes the topology itself is a nuisance parameter
  Disadvantages
    Slow (because we must examine lots of trees, like maximum parsimony)
    But recent advances make it practical for trees with 100s of leaves

Rooting trees and measuring confidence
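The coin-toss example can be worked through numerically. The sketch below evaluates the binomial likelihood over a grid of p values and picks the maximum, which lands on k/n. The grid search is only a stand-in for the numerical optimization that phylogenetic software performs over branch lengths and topologies; the function names are illustrative.

```python
from math import comb


def binomial_likelihood(p: float, k: int, n: int) -> float:
    """L = C(n, k) * p^k * (1 - p)^(n - k): probability of k heads in n tosses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def ml_estimate(k: int, n: int, grid_size: int = 1001) -> float:
    """Grid-search estimate of the p that maximizes the likelihood."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda p: binomial_likelihood(p, k, n))


if __name__ == "__main__":
    k, n = 7, 10  # made-up data: 7 heads in 10 tosses
    p_hat = ml_estimate(k, n)
    print(p_hat, binomial_likelihood(p_hat, k, n))
```

The likelihood is a function of the parameter with the data held fixed, which is the same role the tree and branch lengths play when the data are alignment columns.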

Locating the root
  [Figure: an unrooted tree, the same tree rooted with an outgroup (O), and the same tree rooted at its midpoint]

Bootstrap confidence values
  How much to trust a given branch or clade?
    With NJ, parsimony and ML, clades do not come with marginal probabilities
  Bootstrap by resampling the original alignment
    Build a new alignment with the same number of columns by sampling columns with replacement
    Compute a new tree
    Repeat this many times
    Count the proportion of resampled trees in which each original branch appears
    Label the branches on the original tree with these proportions

A phylogeny with bootstrap values
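The bootstrap steps above can be sketched as follows. The resampling function draws alignment columns with replacement, and the support function counts how often each clade of the original tree reappears. The tree-building step is passed in as a callback; `closest_pair_clade` is a deliberately crude, hypothetical stand-in for a real method such as NJ, parsimony or ML, and the toy alignment is made up.

```python
import random
from itertools import combinations


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the column count."""
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def bootstrap_support(alignment, build_clades, n_replicates=100, seed=0):
    """Proportion of replicates in which each clade of the original tree appears.

    build_clades(alignment) must return a set of frozensets of taxon names.
    """
    rng = random.Random(seed)
    original = build_clades(alignment)
    counts = {clade: 0 for clade in original}
    for _ in range(n_replicates):
        replicate = build_clades(bootstrap_alignment(alignment, rng))
        for clade in original:
            if clade in replicate:
                counts[clade] += 1
    return {clade: counts[clade] / n_replicates for clade in original}


def closest_pair_clade(alignment):
    """Toy stand-in for tree inference: group the two most similar sequences."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    pair = min(
        combinations(sorted(alignment), 2),
        key=lambda p: hamming(alignment[p[0]], alignment[p[1]]),
    )
    return {frozenset(pair)}


if __name__ == "__main__":
    toy = {"t1": "acgtacgtac", "t2": "acgtacgtaa", "t3": "ttgtccgtgc"}
    print(bootstrap_support(toy, closest_pair_clade, n_replicates=200))
```

Passing the tree builder as a callback keeps the resampling logic independent of the inference method, which mirrors how the bootstrap is applied to NJ, parsimony and ML alike.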

Strict consensus
  Consensus may be among bootstrap replicates, or among equally parsimonious trees

Software and file formats
  http://evolution.genetics.washington.edu/phylip/software.html

Recommended software
  PHYLIP or JalView (NJ)
  PAUP* (Parsimony, interactive, MacOSX)
  RAxML (fast ML)
  MrBayes (Bayesian)
  MEGA (NJ, MP, ML, simple molecular evolution hypothesis testing, visualization)

FASTA format
  >gi 18033454 gb L57167.1 F334385_1 Down syndrome cell adhesion molecule DSCAM [Rattus norvegicus]
  MWILLSLFQSFNVFSEEPHSSLYFVNSLQEVVFSTSGTLVPPGIPPVTLRWYLTGEEIYVP
  GIRHVHPNGTLQIFPFPPSSFSTLIHNTYYTENPSGKIRSQVHIKVLREPYTVRVEQKTMRGNV
  VFKIIPSSVEYVTVVSWEKTVSLVSGSRFLITSTGLYIKVQNEGLYNYRITRHRYTGETRQS
  NSRLFVSPNSPSILGFHRKMGQRVELPKLGHPEPYRWLKNMPLELSGRFQKTVTGLLI
  ENSRPSSGSYVEVSNRYGTKVIGRLYVKQPLKTISPRKVKSSVGSQVSLSSVTGNEQELSWYRN
  GEILNPGKNVRITGLNHNLIMHMVKSGGYQFVRKKLSQYVQVVLEGTPKIISFSEKVVSP
  >gi 45827726 ref NP_996770.1 Down syndrome cell adhesion molecule isoform CHD2-52 precursor [Homo sapiens]
  MWILLSLFQSFNVFSELHSSLYFVNSLQEVVFSTTGTLVPPGIPPVTLRWYLTGEEIYVP
  GIRHVHPNGTLQIFPFPPSSFSTLIHNTYYTENPSGKIRSQVHIKVLREPYTVRVEQKTMRGNV
  VFKIIPSSVEYITVVSWEKTVSLVSGSRFLITSTGLYIKVQNEGLYNYRITRHRYTGETRQS
  NSRLFVSPNSPSILGFHRKMGQRVELPKLGHPEPYRWLKNMPLELSGRFQKTVTGLLI
  ENIRPSSGSYVEVSNRYGTKVIGRLYVKQPLKTISPRKVKSSVGSQVSLSSVTGTEQELSWYRN
  GEILNPGKNVRITGINHENLIMHMVKSGGYQFVRKKLSQYVQVVLEGTPKIISFSEKVVSP
  >gi 20127422 ref NP_001380.2 Down syndrome cell adhesion molecule isoform CHD2-42 precursor [Homo sapiens]
  MWILLSLFQSFNVFSELHSSLYFVNSLQEVVFSTTGTLVPPGIPPVTLRWYLTGEEIYVP
  GIRHVHPNGTLQIFPFPPSSFSTLIHNTYYTENPSGKIRSQVHIKVLREPYTVRVEQKTMRGNV
  VFKIIPSSVEYITVVSWEKTVSLVSGSRFLITSTGLYIKVQNEGLYNYRITRHRYTGETRQS
  NSRLFVSPNSPSILGFHRKMGQRVELPKLGHPEPYRWLKNMPLELSGRFQKTVTGLLI
  ENIRPSSGSYVEVSNRYGTKVIGRLYVKQPLKTISPRKVKSSVGSQVSLSSVTGTEQELSWYRN
  GEILNPGKNVRITGINHENLIMHMVKSGGYQFVRKKLSQYVQVVLEGTPKIISFSEKVVSP

Aligned FASTA format
  >11_1RYP.ent/1-94
  -----GYRHITIFSPEGRLYQVEYFKTNQTNINSL
  VRGKTVVISQKKVPKLLPT-TVSYIFISRTIGMVV
  NGPIPRNLRKEE
  >14_1RYP.ent/1-93
  ------GYRLSIFSPGHIFQVEYLEVKR-GTVG
  VKGKNVVLGERRSTLKLQTRITPSKVSKISHVVLSF
  SGLNSRILIEKRVEQS
  >16_1RYP.ent/1-100
  FRNNYGTVTFSPTGRLFQVEYLEIKQGSVTVGLRSN
  THVLVLKRNELSSYQKKIIKEHMGLSLGLP
  RVLSNYLRQQNYSSLVFNR

Newick format
  ((A,B),(C,D))
  [Figure: the corresponding tree with leaves A, B, C and D]

Newick format (with branch lengths)
  ((A:2.3,B:1.8):3.3,(C:1.4,D:2.2))
  [Figure: the same tree with branch lengths 2.3, 1.8, 3.3, 1.4 and 2.2]
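To show how the nested parentheses of Newick notation map onto a tree, here is a minimal recursive parser. It is a sketch that handles only plain names and optional branch lengths (no quoted labels, comments or whitespace), and the leaf labels A to D in the example string are assumed. The dictionary layout and function names are illustrative choices.

```python
def parse_newick(text: str) -> dict:
    """Parse a simple Newick string into nested dicts.

    Each node is {"name": str, "length": float or None, "children": list}.
    """
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node() -> dict:
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            children.append(parse_node())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1  # consume ")"
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos]
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1  # consume ":"
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    root = parse_node()
    if pos != len(text):
        raise ValueError("unexpected trailing characters")
    return root


def leaf_names(node: dict) -> list:
    """Leaf names in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in leaf_names(child)]


if __name__ == "__main__":
    tree = parse_newick("((A:2.3,B:1.8):3.3,(C:1.4,D:2.2));")
    print(leaf_names(tree), tree["children"][0]["length"])
```

Each pair of parentheses becomes one internal node, so the same recursion reads both the topology-only form and the form with branch lengths.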

NEXUS
  #NEXUS
  BEGIN DATA;
  DIMENSIONS NTAX=89 NCHAR=88;
  [!Data from: Laskowski, M., Jr., and W.M. Fitch. 1989. Evolution of
  MATRIX
  [ 10 20 30 40 50 60 70 80 ]
  [........ ]
  Struthio_camelus VKYPNTNEEGKEVVLPKILSPIGSGVYSNELNIEYTNVSK??????FT--VYKPVPLYMLSKTSNKNNVVESSGTLRHFGK [86]
  Rhea_americana ...L..E..N.V.T...?.?????...--...H...S.E...N...S... [86]
  Pterocnemia_pennata ...L..E..N.V...H?EV...--...H...S.E...N...S... [86]
  .]
  Chauna_chavaria .r...l.t.t...t...rkev..--...t.e...nq...s...n...s... [86]
  Anseranas_semipalmata .r...s...l.t...hkev..--..e...t.e...nq...n...s... [86]
  BEGIN GENETIode; StandardNULER; END;
  BEGIN CODONS; CODESET * UNTITLED = Universal: all ; END;
  BEGIN ASSUMPTIONS; OPTIONS DEFTYPE=unord PolyTcount=MINSTEPS ; END;
  BEGIN TREES;
  TRANSLATE 1 Struthio_camelus, 2 Rhea_americana, 88 Carpococcyx_renauldi, 89 Podargus_strigoides ;
  TREE * PAUP_1 = [&R] (1,(((2,3),(4,5)),((((((((((6,7),((30,31),(((32,33),34),(((((((35,57),((((53,67),70) (62,(63,64),(68,69))),(((54,(55,56)),84),74))),(((46,(48,49)),47),71)),((36,59),60),(61,(75,76))),(72,73),77),((44,45),((((50,5 1),58),52),(65,66)))),(((37,((38,39),40)),41),(42,43)))))),14),15),((((16,20),(18,19)),17),((((21,26),(27,(28,29))), 22),((23,24),25)))),(78,79,80)),87),((81,85),(82,83))),(8,9,(10,((11,(86,88)),13),12))),89)));
  END;
  BEGIN NOTES; TEXT TAXON=26 TEXT= G_removed_from_end_of_sequence; END;

Testing hypotheses on a tree

Testing a hypothesis on a tree
  Kishino-Hasegawa or SOWH test
    The difference in likelihood between two trees is the test statistic, which is compared to its null distribution to assess significance
    The alternative trees may differ in the presence of an incongruency, for example
    This test is easy to misuse!

Summary
  A good alignment is a prerequisite
  Many methods are available for inferring a tree
    Maximum likelihood is the most accurate
  To interpret the tree, it helps to
    Root it (don't be fooled by the graphic!)
    Measure confidence in clades (e.g. bootstrap)
      Branches with low support should be collapsed to polytomies
    Approach the tree with a hypothesis or question in mind