Consensus methods. Strict consensus methods


Consensus methods

A consensus tree is a summary of the agreement among a set of fundamental trees.
There are many consensus methods, which differ in:
1. the kind of agreement
2. the level of agreement
Consensus methods can be used with multiple trees from a single analysis or from multiple analyses.

Strict consensus methods

Strict consensus methods require agreement across all the fundamental trees.
They show only those relationships that are unambiguously supported by the parsimonious interpretation of the data.
The commonest method (strict component consensus) focuses on clades/components/full splits.
This method produces a consensus tree that includes all and only those full splits found in all the fundamental trees.
Other relationships (those in which the fundamental trees disagree) are shown as unresolved polytomies.
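The strict component consensus described above can be sketched in a few lines. This is a minimal illustration, not the slides' own procedure: trees are written as nested tuples, the two example trees are hypothetical (the trees in the original figure are not recoverable from the transcript), and rooted clades stand in for full splits.

```python
def clades(tree):
    """Return the non-trivial clades of a nested-tuple tree as frozensets of leaf names."""
    found = set()

    def leaves(node):
        if isinstance(node, str):              # a leaf
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)                     # record the clade below this node
        return members

    all_taxa = leaves(tree)
    found.discard(all_taxa)                    # the root clade is shared trivially
    return found


def strict_consensus(trees):
    """Keep all and only those clades found in every fundamental tree."""
    return set.intersection(*(clades(t) for t in trees))


# Hypothetical fundamental trees that disagree on the placement of D and E.
tree1 = ((("A", "B"), "C"), (("D", "E"), ("F", "G")))
tree2 = ((("A", "B"), "C"), ("D", ("E", ("F", "G"))))

for clade in sorted(strict_consensus([tree1, tree2]), key=lambda c: (len(c), sorted(c))):
    print(sorted(clade))
```

Clades that appear in only one of the trees (here D+E and E+F+G) drop out, which is what produces the polytomy in the consensus.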

Strict consensus methods

[Figure: two fundamental trees (tip orders A B C D E F G and A B C E D F G) and their strict component consensus tree (A B C D E F G)]

Majority-rule consensus methods

Majority-rule consensus methods require agreement across a majority of the fundamental trees.
The consensus may include relationships that are not supported by the most parsimonious (MP) tree.
This method produces a consensus tree that includes all and only those full splits found in a majority (>50%) of the fundamental trees.
Other relationships are shown as unresolved polytomies.
Of particular use in bootstrapping.
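The majority-rule criterion (>50% of the fundamental trees) can be sketched the same way. The nested-tuple tree encoding and the three example trees are assumptions made for illustration, and rooted clades again stand in for full splits.

```python
from collections import Counter


def clades(tree):
    """Return the non-trivial clades of a nested-tuple tree as frozensets of leaf names."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    found.discard(leaves(tree))                # drop the all-taxa root clade
    return found


def majority_rule(trees):
    """Map each clade found in more than half of the trees to its frequency in percent."""
    counts = Counter(c for t in trees for c in clades(t))
    n = len(trees)
    return {c: 100 * k // n for c, k in counts.items() if 2 * k > n}


# Three hypothetical fundamental trees.
trees = [
    ((("A", "B"), "C"), (("D", "E"), ("F", "G"))),
    ((("A", "B"), "C"), ("D", ("E", ("F", "G")))),
    (("A", ("B", "C")), (("D", "E"), ("F", "G"))),
]

for clade, pct in sorted(majority_rule(trees).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(clade), pct)
```

A clade found in two of three trees is reported as 66, the same kind of frequency label the slides attach to majority-rule consensus trees.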

Majority rule consensus

[Figure: three fundamental trees (tip orders A B C D E F G, A B C E F D G and A B C E D F G) and their majority-rule component consensus tree (A B C E D F G). Numbers indicate the frequency of clades in the fundamental trees: 100, 66, 66, 66, 66]

Reduced consensus methods

[Figure: two fundamental trees (tip orders A B C D E F G and A G B C D E F) and their strict reduced consensus tree (A B C D E F), from which taxon G is excluded]
The strict component consensus of these trees is completely unresolved.

Parsimonious Character Optimization

[Figure: a character with states 0 0 1 1 0 in taxa A B C D E, optimized in two ways: a single origin (0 => 1, marked *) followed by a reversal (1 => 0) (ACCTRAN), OR parallelism, with 2 separate origins 0 => 1 (DELTRAN)]
Homoplastic characters often have alternative equally parsimonious optimizations.
Commonly used varieties are:
ACCTRAN - accelerated transformation
DELTRAN - delayed transformation
Consequently, branch lengths are not always fully determined.
PAUP reports minimum and maximum branch lengths.
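To see why both optimizations cost the same, the number of steps can be counted with Fitch's algorithm. Note that this is a swapped-in technique that the slides do not name, and the ladder-shaped tree below is an assumption, since the figure's topology is not recoverable. Only the states 0 0 1 1 0 for A to E come from the slide.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum steps) for a rooted binary tree."""
    if isinstance(tree, str):                  # a leaf: its observed state, no steps
        return {states[tree]}, 0
    left, right = tree
    lset, lsteps = fitch(left, states)
    rset, rsteps = fitch(right, states)
    common = lset & rset
    if common:                                 # children agree: no change needed here
        return common, lsteps + rsteps
    return lset | rset, lsteps + rsteps + 1    # children disagree: one extra step


# Hypothetical topology; character states taken from the slide.
tree = ("A", ("B", ("C", ("D", "E"))))
states = {"A": 0, "B": 0, "C": 1, "D": 1, "E": 0}

root_states, steps = fitch(tree, states)
print(root_states, steps)
```

On this tree the character needs two steps. Those two steps can be placed either as one origin plus a reversal on E, or as separate origins on C and D. ACCTRAN and DELTRAN are rules for choosing between such placements.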

Questions: History? India Sri Lanka

Missing data

Missing data is ignored in tree building but can lead to alternative equally parsimonious optimizations in the absence of homoplasy.
[Figure: a character with states 1 ? ? 0 0 in taxa A B C D E; the single origin 0 => 1 can be placed on any one of 3 branches, each marked *]
Abundant missing data can lead to multiple equally parsimonious trees. This can be a serious problem with morphological data but is less likely to arise with molecular data.

Maximum Likelihood

The aim is to estimate the probability that we would observe a particular dataset, given a phylogenetic tree and some notion of how the evolutionary process worked over time.

Probability of [data] given [tree] and

\[
P = \begin{pmatrix} a & b & c & d \\ b & a & e & f \\ c & e & a & g \\ d & c & f & a \end{pmatrix},
\qquad \pi = [a, c, g, t]
\]

What is the probability of observing a datum?

If we flip a coin and get a head, and we think the coin is unbiased, then the probability of observing this head is 0.5.
If we think the coin is biased so that we expect to get a head 80% of the time, then the likelihood of observing this datum (a head) is 0.8.
Therefore: the likelihood of making some observation is entirely dependent on the model that underlies our assumption.
Lesson: The datum has not changed, our model has. Therefore under the new model the likelihood of observing the datum has changed.

What is the probability of observing a 'G' nucleotide?

Model 1: frequency of G = 0.4 => likelihood(G) = 0.4
Model 2: frequency of G = 0.25 => likelihood(G) = 0.25

One rule: the rule of 1

The sum of the probabilities of all the possibilities will always equal 1.
E.g. for DNA: p(a) + p(c) + p(g) + p(t) = 1

What about longer sequences?

If we consider a gene of length 2:
Gene 1: ga
The probability of observing this gene is the product of the probabilities of observing each character (each position is treated as independent).
E.g. p(g) = 0.4; p(a) = 0.15 (for instance)
likelihood(ga) = 0.4 x 0.15 = 0.06

...or even longer sequences?

Gene 1: gactagctagacagatacgaattac
Model (simple base frequency model): p(a) = 0.15; p(c) = 0.2; p(g) = 0.4; p(t) = 0.25 (the sum of all probabilities must equal 1)
Like(Gene 1) = 0.000000000000000018452813
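The product rule above is easy to check numerically. The sketch below uses the gene and base frequencies from the slide; the function name `sequence_likelihood` is just an illustrative choice.

```python
import math


def sequence_likelihood(seq, freqs):
    """Likelihood of a sequence under a base-frequency model: the product over sites."""
    return math.prod(freqs[base] for base in seq)


gene1 = "gactagctagacagatacgaattac"
model = {"a": 0.15, "c": 0.2, "g": 0.4, "t": 0.25}

print(sequence_likelihood("ga", model))    # close to 0.06
print(sequence_likelihood(gene1, model))   # close to 1.8452813e-17
```

For long sequences the product underflows quickly, which is why likelihoods are usually handled as sums of logarithms in practice.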

Note about models

You might notice that our model of base frequency is not the optimal model for our observed data. Gene 1 contains 10 a and 5 each of c, g and t. If we had used the following model:
p(a) = 0.4; p(c) = 0.2; p(g) = 0.2; p(t) = 0.2
the likelihood of observing the gene would be:
Like(gene 1) = 0.4^10 x 0.2^15 = 0.000000000000003436 (a value about 186 times higher)
Lesson: The datum has not changed, our model has. Therefore under the new model the likelihood of observing the datum has changed.

How does this relate to phylogenetic trees?

Consider an alignment of two sequences:
Gene 1: gaac
Gene 2: gacc
We assume these genes are related by a (simple) phylogenetic tree with branch lengths.
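Returning to the note about models: for a pure base-frequency model, the frequencies that maximize the likelihood are simply the observed proportions of each base. The sketch below checks this for gene 1 (the 25-base sequence given earlier) and works with log-likelihoods to avoid underflow. Variable names are illustrative.

```python
import math
from collections import Counter


def log_likelihood(seq, freqs):
    """Log-likelihood of a sequence under a base-frequency model."""
    return sum(math.log(freqs[base]) for base in seq)


gene1 = "gactagctagacagatacgaattac"
model1 = {"a": 0.15, "c": 0.2, "g": 0.4, "t": 0.25}

# Maximum likelihood estimate: observed count of each base divided by the length.
ml_freqs = {base: n / len(gene1) for base, n in Counter(gene1).items()}

print(ml_freqs)
print(math.exp(log_likelihood(gene1, ml_freqs)))
print(math.exp(log_likelihood(gene1, ml_freqs) - log_likelihood(gene1, model1)))
```

The estimated frequencies come out as 0.4 for a and 0.2 for each of c, g and t. The last line prints the ratio of the two likelihoods.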

Increase in model sophistication

It is no longer possible to simply invoke a model that encompasses base composition; we must also include the mechanism of sequence change and stasis.
There are two parts to this model - the tree and the process (the latter is confusingly referred to as the model, although both parts really compose the model).
Note: We will stay with the confusing notation - to avoid further confusion.

The model

The two parts of the model are the tree and the process (the model).
The model is composed of the composition and the substitution process - the rate of change from one character state to another character state.

Model = [tree] +

\[
P = \begin{pmatrix} a & b & c & d \\ b & a & e & f \\ c & e & a & g \\ d & c & f & a \end{pmatrix},
\qquad \pi = [a, c, g, t]
\]

Simple time-reversible model

A simple model is that the rate of change from a to c or vice versa is 0.4, the composition of a is 0.25 and the composition of c is 0.25 (a simplified version of the Jukes and Cantor 1969 model).

\[
P = \begin{pmatrix} \cdot & 0.4 & \cdot & \cdot \\ 0.4 & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \end{pmatrix},
\qquad \pi = [0.25, 0.25, \cdot, \cdot]
\]

Probability of the third nucleotide position in our current alignment (gaac / gacc)

p(a) = 0.25; p(c) = 0.25; P(a => c) = 0.4
Starting with a, the likelihood of the nucleotide is 0.25 and the likelihood of the substitution (branch) is 0.4. So the likelihood of observing these data is:
*Likelihood(D|M) = 0.25 x 0.4 = 0.1
Note: you will get the same result if you start with c (0.25 x 0.4 = 0.1), since this model is reversible.
*The likelihood of the data, given the model.

Substitution matrix

For nucleotide sequences, there are 16 possible ways to describe substitutions - a 4x4 matrix.

\[
P = \begin{pmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & o & p \end{pmatrix}
\]

Convention dictates that the order of the nucleotides is a, c, g, t.
Note: for amino acids, the matrix is a 20 x 20 matrix and for codon-based models, the matrix is 61 x 61.

Substitution matrix - an example

\[
P = \begin{pmatrix}
0.976 & 0.01 & 0.007 & 0.007 \\
0.002 & 0.983 & 0.005 & 0.01 \\
0.003 & 0.01 & 0.979 & 0.007 \\
0.002 & 0.013 & 0.005 & 0.979
\end{pmatrix}
\]

In this matrix, rows are the starting nucleotide and columns the resulting nucleotide: the probability of an a changing to a c is 0.01 and the probability of a c remaining the same is 0.983, etc.
Note: The rows of this matrix sum to 1 (to within rounding of the entries shown) - meaning that for every nucleotide, we have covered all the possibilities of what might happen to it. The columns do not sum to anything in particular.

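To calculate the likelihood of the entire dataset, given a substitution matrix, base composition and a branch length of one "certain evolutionary distance" or "ced":

Gene 1: ccat
Gene 2: ccgt

Likelihood of this alignment given

\[
P = \begin{pmatrix}
0.976 & 0.01 & 0.007 & 0.007 \\
0.002 & 0.983 & 0.005 & 0.01 \\
0.003 & 0.01 & 0.979 & 0.007 \\
0.002 & 0.013 & 0.005 & 0.979
\end{pmatrix},
\qquad \pi = [0.1, 0.4, 0.2, 0.3]
\]

Likelihood of a two-sequence alignment (ccat / ccgt):

\[
L = \pi_c P_{c \to c} \cdot \pi_c P_{c \to c} \cdot \pi_a P_{a \to g} \cdot \pi_t P_{t \to t}
  = 0.4 \times 0.983 \times 0.4 \times 0.983 \times 0.1 \times 0.007 \times 0.3 \times 0.979
\]

The likelihood of going from the first to the second sequence is reported as 0.0000300. (Evaluating the product with the rounded matrix entries shown gives approximately 0.0000318; the reported value presumably reflects unrounded entries.)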
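The site-by-site product above can be sketched as follows. The matrix and composition are the values printed on the slide; the dictionary layout and the function name are illustrative choices. Because the printed entries are rounded to three decimals, the result is close to, but not exactly, the 0.0000300 quoted in the text.

```python
# Substitution matrix for one CED, indexed as P[from][to]; values as printed on the slide.
P = {
    "a": {"a": 0.976, "c": 0.010, "g": 0.007, "t": 0.007},
    "c": {"a": 0.002, "c": 0.983, "g": 0.005, "t": 0.010},
    "g": {"a": 0.003, "c": 0.010, "g": 0.979, "t": 0.007},
    "t": {"a": 0.002, "c": 0.013, "g": 0.005, "t": 0.979},
}
pi = {"a": 0.1, "c": 0.4, "g": 0.2, "t": 0.3}


def alignment_likelihood(seq1, seq2, P, pi):
    """Product over sites of (composition of the first base) x (substitution probability)."""
    like = 1.0
    for x, y in zip(seq1, seq2):
        like *= pi[x] * P[x][y]
    return like


print(alignment_likelihood("ccat", "ccgt", P, pi))

# Each row should sum to roughly 1; rounding leaves small deviations.
for base, row in P.items():
    print(base, round(sum(row.values()), 3))
```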

Different Branch Lengths

For very short branch lengths, the probability of a character staying the same is high and the probability of it changing is low (for our particular matrix).
For longer branch lengths, the probability of character change becomes higher and the probability of staying the same is lower.
The previous calculations are based on the assumption that the branch length describes one Certain Evolutionary Distance or CED.
If we want to consider a branch length that is twice as long (2 CED), then we can multiply the substitution matrix by itself (matrix^2).

2 CED model

\[
P^2 =
\begin{pmatrix}
0.976 & 0.01 & 0.007 & 0.007 \\
0.002 & 0.983 & 0.005 & 0.01 \\
0.003 & 0.01 & 0.979 & 0.007 \\
0.002 & 0.013 & 0.005 & 0.979
\end{pmatrix}
\times
\begin{pmatrix}
0.976 & 0.01 & 0.007 & 0.007 \\
0.002 & 0.983 & 0.005 & 0.01 \\
0.003 & 0.01 & 0.979 & 0.007 \\
0.002 & 0.013 & 0.005 & 0.979
\end{pmatrix}
=
\begin{pmatrix}
0.953 & 0.02 & 0.013 & 0.015 \\
0.005 & 0.966 & 0.015 & 0.029 \\
0.01 & 0.029 & 0.939 & 0.022 \\
0.007 & 0.038 & 0.015 & 0.94
\end{pmatrix}
\]

Which gives a likelihood of 0.0000559.
Note the higher likelihood.

For 3 CED

\[
P^3 = \begin{pmatrix}
0.93 & 0.029 & 0.019 & 0.022 \\
0.007 & 0.949 & 0.015 & 0.029 \\
0.01 & 0.029 & 0.939 & 0.022 \\
0.007 & 0.038 & 0.015 & 0.94
\end{pmatrix}
\]

This gives a likelihood of 0.0000782.
Note that as the branch lengths increase, the values on the diagonal decrease and the values on the off-diagonals increase.

For higher values of CED units:

CED units   Likelihood
1           0.0000300
2           0.0000559
3           0.0000782
10          0.0001620
15          0.0001770
20          0.0001750
30          0.0001520

[Figure: likelihood plotted against branch length, 0 to 40 CED units]
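The branch-length scan in the table can be sketched by raising the one-CED matrix to successive powers and re-evaluating the alignment likelihood for ccat / ccgt. This is a minimal pure-Python sketch with illustrative helper names. It uses the rounded one-CED matrix and composition given earlier, so its numbers will differ somewhat from the tabulated values, and because two rows of the rounded matrix sum to 0.999, a little probability mass is lost at high powers.

```python
ORDER = "acgt"                       # conventional nucleotide order for rows and columns
P = [
    [0.976, 0.010, 0.007, 0.007],
    [0.002, 0.983, 0.005, 0.010],
    [0.003, 0.010, 0.979, 0.007],
    [0.002, 0.013, 0.005, 0.979],
]
pi = [0.1, 0.4, 0.2, 0.3]


def matmul(A, B):
    """Plain matrix product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def matpow(A, power):
    """A multiplied by itself `power` times (power >= 1)."""
    result = A
    for _ in range(power - 1):
        result = matmul(result, A)
    return result


def alignment_likelihood(seq1, seq2, Pn):
    """Product over sites of composition x substitution probability under matrix Pn."""
    like = 1.0
    for x, y in zip(seq1, seq2):
        i, j = ORDER.index(x), ORDER.index(y)
        like *= pi[i] * Pn[i][j]
    return like


for ced in (1, 2, 3, 10, 15, 20, 30):
    print(ced, alignment_likelihood("ccat", "ccgt", matpow(P, ced)))
```

Maximum likelihood estimation of the branch length amounts to finding the distance at which this curve peaks.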

[Figure: Likelihood of the alignment (ccat / ccgt) at various branch lengths; x-axis: branch length, 0 to 0.6; y-axis: likelihood, 0 to 0.0002]

The maximum likelihood value is 0.0001777 at a branch length of 0.330614.

The evolutionary revolution

Organisms share a common ancestry and our classification should reflect these histories (Darwin).
Philosophy and methodology for reconstructing evolutionary history - cladistics (Hennig).
Philosophical nature of natural groups (Ghiselin, Hull).
A nomenclatural system adapted to phylogenetic systematics (Ghiselin, Griffiths, de Queiroz & Gauthier, etc.).

[Figure labels: Content, Ancestry]

Bryant & Cantino (2002) claim that:
Traditional taxonomists tend to conceptualize taxa in terms of content.
Proponents of the PhyloCode tend to conceptualize taxa in terms of ancestry.

Definitions, or fixing the reference

[Figure: three trees on taxa A, B, C showing where the Name attaches under a node-based, a stem-based and an apomorphy-based (apomorphy x) definition]
Node based: Name refers to the least inclusive clade comprising B and C.
Stem based: Name refers to the most inclusive clade comprising B and C, but not A.
Apomorphy based: Name refers to all taxa descending from the first ancestor possessing apomorphy x.