Thanks to Paul Lewis and Joe Felsenstein for the use of slides

Similar documents
Consistency Index (CI)

Module 13: Molecular Phylogenetics

Module 12: Molecular Phylogenetics

Phylogenetic methods in molecular systematics

Module 13: Molecular Phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

Module 12: Molecular Phylogenetics

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Phylogenetics: Building Phylogenetic Trees

(Stevens 1991) 1. morphological characters should be assumed to be quantitative unless demonstrated otherwise

C3020 Molecular Evolution. Exercises #3: Phylogenetics

Phylogenetic Tree Reconstruction

Dr. Amira A. AL-Hosary

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Evolutionary Tree Analysis. Overview

Is the equal branch length model a parsimony model?

Introduction to characters and parsimony analysis

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2016 University of California, Berkeley. Parsimony & Likelihood [draft]

X X (2) X Pr(X = x θ) (3)

BIOL 428: Introduction to Systematics Midterm Exam

Anatomy of a tree. clade is group of organisms with a shared ancestor. a monophyletic group shares a single common ancestor = tapirs-rhinos-horses

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition

Reconstructing the history of lineages

Classification and Phylogeny

Thanks to Paul Lewis, Jeff Thorne, and Joe Felsenstein for the use of slides

Classification and Phylogeny

Phylogenetic inference

Algorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004,

A (short) introduction to phylogenetics

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Systematics Lecture 3 Characters: Homology, Morphology

Bioinformatics 1 -- lecture 9. Phylogenetic trees Distance-based tree building Parsimony

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057

Algorithms in Bioinformatics

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Phylogenetic Analysis

Phylogenetic Analysis

Phylogenetic Analysis

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30

Letter to the Editor. The Effect of Taxonomic Sampling on Accuracy of Phylogeny Estimation: Test Case of a Known Phylogeny Steven Poe 1

Homework Assignment, Evolutionary Systems Biology, Spring Homework Part I: Phylogenetics:

The practice of naming and classifying organisms is called taxonomy.

BINF6201/8201. Molecular phylogenetic methods

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution

Phylogene)cs. IMBB 2016 BecA- ILRI Hub, Nairobi May 9 20, Joyce Nzioki

1. Can we use the CFN model for morphological traits?

Non-independence in Statistical Tests for Discrete Cross-species Data

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Constructing Evolutionary/Phylogenetic Trees

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

--Therefore, congruence among all postulated homologies provides a test of any single character in question [the central epistemological advance].

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Biologists have used many approaches to estimating the evolutionary history of organisms and using that history to construct classifications.

Phylogenetics: Parsimony

Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

Evolutionary trees. Describe the relationship between objects, e.g. species or genes

Constructing Evolutionary/Phylogenetic Trees

Phylogeny: building the tree of life

Chapter 26 Phylogeny and the Tree of Life

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Reconstructing Evolutionary Trees. Chapter 14

A Phylogenetic Network Construction due to Constrained Recombination

The statistical and informatics challenges posed by ascertainment biases in phylogenetic data collection

Distances that Perfectly Mislead

How should we organize the diversity of animal life?

Many of the slides that I ll use have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Systematics - Bio 615

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

1 ATGGGTCTC 2 ATGAGTCTC

CSCI1950 Z Computa4onal Methods for Biology Lecture 4. Ben Raphael February 2, hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary

Questions we can ask. Recall. Accuracy and Precision. Systematics - Bio 615. Outline

Phylogenetics: Parsimony and Likelihood. COMP Spring 2016 Luay Nakhleh, Rice University


PhyQuart-A new algorithm to avoid systematic bias & phylogenetic incongruence

Integrative Biology 200A "PRINCIPLES OF PHYLOGENETICS" Spring 2008

Integrative Biology 200A "PRINCIPLES OF PHYLOGENETICS" Spring 2012 University of California, Berkeley

Phylogenies & Classifying species (AKA Cladistics & Taxonomy) What are phylogenies & cladograms? How do we read them? How do we estimate them?

Pinvar approach. Remarks: invariable sites (evolve at relative rate 0) variable sites (evolves at relative rate r)

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

Lecture 11 Friday, October 21, 2011

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution

Estimating Evolutionary Trees. Phylogenetic Methods

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

ESTIMATION OF CONSERVATISM OF CHARACTERS BY CONSTANCY WITHIN BIOLOGICAL POPULATIONS

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Zhongyi Xiao. Correlation. In probability theory and statistics, correlation indicates the

Phylogenetic analyses. Kirsi Kostamo

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

Early History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species

Maximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington.

What is Phylogenetics

Chapter 19 Organizing Information About Species: Taxonomy and Cladistics

Chapter 1 Biology 103

PHYLOGENY & THE TREE OF LIFE

Transcription:

Thanks to Paul Lewis and Joe Felsenstein for the use of slides

Review Hennigian logic reconstructs the tree if we know polarity of characters and there is no homoplasy UPGMA infers a tree from a distance matrix: groups based on similarity fails to give the correct tree if rates of character evolution vary much modern distance-based approaches: extend the ideas behind Buneman s (1971) approach try to find trees and branch lengths, such that the path lengths implied by the tree match the pairwise distances that are estimated from character data. are not sensitive to variation in rate. do not use all of the information in the data.

Imperfections of distance base approaches Summarizing the character data into a distance matrix loses information. We cannot tell which characters evolve quickly and which evolve slowly from pairwise comparisons.

1 2 3 4 Species 1 C G A C Species 2 C G A T Species 3 C G A C Species 4 C G A T Species 5 C G A C Species 6 C G A T Species 7 C G A C Species 8 C G A T Species 9 C G G C Species 10 C G G T Species 11 C G G C Species 12 C G G T Species 13 C G G C Species 14 C G G T Species 15 C G G C Species 16 C G G T

Imperfections of distance base approaches Summarizing the character data into a distance matrix loses information. We cannot tell which characters evolve quickly and which evolve slowly from pairwise comparisons. Reconstructing character histories on a tree can reveal homoplasy this means we can t condense the character data before we start looking for trees.

1 2 3 4 5 6 7 8 9... Species 1 C G A C C A G G T... Species 2 C G A C C A G G T... Species 3 C G G T C C G G T... Species 4 C G G C C T G G T... One of the 3 possible trees: Same tree with states at character 6 instead of species names Species 1 Species 3 A C Species 2 Species 4 A T

Standard Parsimony

Things to note about the last slide 2 steps was the minimum score attainable. Multiple ancestral character state reconstructions gave a score of 2. Enumeration of all possible ancestral character states is not the most efficient algorithm.

Each character (site) is assumed to be independene To calculate the parsimony score for a tree we simply sum the scores for every site. 1 2 3 4 5 6 7 8 9 Species 1 C G A C C A G G T Species 2 C G A C C A G G T Species 3 C G G T C C G G T Species 4 C G G C C T G G T Score 0 0 1 1 0 2 0 0 0 Species 1 Species 3 Tree 1 has a score of 4 Species 2 Species 4

Considering a different tree We can repeat the scoring for each tree. 1 2 3 4 5 6 7 8 9 Species 1 C G A C C A G G T Species 2 C G A C C A G G T Species 3 C G G T C C G G T Species 4 C G G C C T G G T Score 0 0 2 1 0 2 0 0 0 Species 1 Species 2 Tree 2 has a score of 5 Species 3 Species 4

One more tree Tree 3 has the same score as tree 2 1 2 3 4 5 6 7 8 9 Species 1 C G A C C A G G T Species 2 C G A C C A G G T Species 3 C G G T C C G G T Species 4 C G G C C T G G T Score 0 0 2 1 0 2 0 0 0 Species 1 Species 2 Tree 3 has a score of 5 Species 4 Species 3

Parsimony criterion prefers tree 1 Tree 1 required the fewest number of state changes (DNA substitutions) to explain the data. Some parsimony advocates equate the preference for the fewest number of changes to the general scientific principle of preferring the simplest explanation (Ockham s Razor), but this connection has not been made in a rigorous manner. The parsimony criterion is equivalent to minimizing homoplasy.

In the example matrix at the beginning of these slides, only character 3 is parsimony informative. 1 2 3 4 5 6 7 8 9 Species 1 C G A C C A G G T Species 2 C G A C C A G G T Species 3 C G G T C C G G T Species 4 C G G C C T G G T Max score 0 0 2 1 0 2 0 0 0 Min score 0 0 1 1 0 2 0 0 0

Assumptions about the evolutionary process can be incorporated using different step costs 3 2 0 1 1 2 3 0 Fitch Parsimony unordered

Stepmatrices Fitch Parsimony Stepmatrix To A C G T A 0 1 1 1 From C 1 0 1 1 G 1 1 0 1 T 1 1 1 0

Stepmatrices Transversion-Transition 5:1 Stepmatrix To A C G T A 0 5 1 5 From C 5 0 5 1 G 1 5 0 5 T 5 1 5 0

5:1 Transversion:Transition parsimony

Stepmatrix considerations Parsimony scores from different stepmatrices cannot be meaningfully compared (31 under Fitch is not better than 45 under a transversion:transition stepmatrix) Parsimony cannot be used to infer the stepmatrix weights

Other Parsimony variants Dollo derived state can only arise once, but reversals can be frequent (e.g. restriction enzyme sites). weighted - usually means that different characters are weighted differently (slower, more reliable characters usually given higher weights). implied weights?

Scoring trees under parsimony is fast A C C A A G

Scoring trees under parsimony is fast Fitch algorithm A C C A A G {A,C} +1 {A,G} +1 3 steps {A} {A, C} +1 {A}

Scoring a trees under parsimony Fitch and Sankoff algorithms are fast (# of calculations increases linearly with the # of leaves, despite the fact that the # of possible ancestral state reconstructions increases exponentially). Fitch and Sankoff algorithms are guaranteed to succeed (return the minimum # of changes). Elaborations of these algorithms allow us to reconstruct ancestral states.

Qualitative description of parsimony It can perform well even when changes are not rare. Does not prefer to put changes on one branch over another Hard to characterize statistically the set of conditions in which parsimony is guaranteed to work well is very restrictive (low probability of change and not too much branch length heterogeneity); Parsimony often performs well in simulation studies (even when outside the zones in which it is guaranteed to work); Estimates of the tree can be extremely biased.

Long branch attraction Felsenstein, J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zoology 27: 401-410. 1.0 1.0 0.01 0.01 0.01

Long branch attraction A G Felsenstein, J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zoology 27: 401-410. 1.0 1.0 The probability of a parsimony informative site due to inheritance is very low, (roughly 0.0003). 0.01 A G 0.01 0.01

Long branch attraction A A Felsenstein, J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zoology 27: 401-410. 1.0 1.0 The probability of a parsimony informative site due to inheritance is very low, (roughly 0.0003). 0.01 G G 0.01 0.01 The probability of a misleading parsimony informative site due to parallelism is much higher (roughly 0.008).

Long branch attraction Parsimony is almost guaranteed to get this tree wrong. 1 3 1 3 2 4 2 4 True Inferred

Inconsistency Statistical Consistency (roughly speaking) is converging to the true answer as the amount of data goes to. Parsimony based tree inference is not consistent for some tree shapes. In fact it can be positively misleading : Felsenstein zone tree Many clocklike trees with short internal branch lengths and long terminal branches (Penny et al., 1989, Huelsenbeck and Lander, 2003). Methods for assessing confidence (e.g. bootstrapping) will indicate that you should be very confident in the wrong answer.