Constructing Evolutionary/Phylogenetic Trees

Similar documents
Constructing Evolutionary/Phylogenetic Trees

Theory of Evolution Charles Darwin

Theory of Evolution. Charles Darwin

Gel Electrophoresis. 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 1

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

Phylogenetic inference

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Dr. Amira A. AL-Hosary

Phylogenetic Tree Reconstruction

BINF6201/8201. Molecular phylogenetic methods

Bioinformatics 1 -- lecture 9. Phylogenetic trees Distance-based tree building Parsimony

A (short) introduction to phylogenetics


Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D

Page 1. Evolutionary Trees. Why build evolutionary tree? Outline

Phylogenetic analyses. Kirsi Kostamo

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics. BIOL 7711 Computational Bioscience

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

EVOLUTIONARY DISTANCES

Algorithms in Bioinformatics

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Evolutionary Tree Analysis. Overview

Phylogeny Tree Algorithms

C3020 Molecular Evolution. Exercises #3: Phylogenetics

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

7. Tests for selection

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley

Phylogenetics: Building Phylogenetic Trees

Phylogeny. November 7, 2017

What is Phylogenetics

Phylogenetic inference: from sequences to trees

How to read and make phylogenetic trees Zuzana Starostová

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

Molecular Evolution and Phylogenetic Tree Reconstruction

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution

Concepts and Methods in Molecular Divergence Time Estimation

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

Phylogeny: building the tree of life

Phylogene)cs. IMBB 2016 BecA- ILRI Hub, Nairobi May 9 20, Joyce Nzioki

Intraspecific gene genealogies: trees grafting into networks

CS5238 Combinatorial methods in bioinformatics 2003/2004 Semester 1. Lecture 8: Phylogenetic Tree Reconstruction: Distance Based - October 10, 2003

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week (Jan 27 & 29):

The Phylogenetic Handbook

Phylogeny: traditional and Bayesian approaches

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

Anatomy of a tree. clade is group of organisms with a shared ancestor. a monophyletic group shares a single common ancestor = tapirs-rhinos-horses

C.DARWIN ( )

Phylogenetic Analysis

Phylogenetic Analysis

Phylogenetic Analysis

Lecture 6 Phylogenetic Inference

Inferring Phylogenetic Trees. Distance Approaches. Representing distances. in rooted and unrooted trees. The distance approach to phylogenies

Evolutionary trees. Describe the relationship between objects, e.g. species or genes

Consistency Index (CI)

Plan: Evolutionary trees, characters. Perfect phylogeny Methods: NJ, parsimony, max likelihood, Quartet method

Phylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Molecular Evolution & Phylogenetics

Letter to the Editor. Department of Biology, Arizona State University

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

Multiple Sequence Alignment. Sequences

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Probabilistic modeling and molecular phylogeny

Phylogenetic trees 07/10/13

Bootstrap confidence levels for phylogenetic trees B. Efron, E. Halloran, and S. Holmes, 1996

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT

Phylogenetics: Parsimony

CS5263 Bioinformatics. Guest Lecture Part II Phylogenetics

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

8/23/2014. Phylogeny and the Tree of Life

Consensus Methods. * You are only responsible for the first two

The Generalized Neighbor Joining method

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

A Phylogenetic Network Construction due to Constrained Recombination

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

Introduction to MEGA

Estimating Evolutionary Trees. Phylogenetic Methods

Copyright notice. Molecular Phylogeny and Evolution. Goals of the lecture. Introduction. Introduction. December 15, 2008

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

Molecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço

GENETICS - CLUTCH CH.22 EVOLUTIONARY GENETICS.

Consensus methods. Strict consensus methods

DNA Phylogeny. Signals and Systems in Biology Kushal EE, IIT Delhi

Phylogenetics: Parsimony and Likelihood. COMP Spring 2016 Luay Nakhleh, Rice University

Seuqence Analysis '17--lecture 10. Trees types of trees Newick notation UPGMA Fitch Margoliash Distance vs Parsimony

Molecular Evolution, course # Final Exam, May 3, 2006

CSCI1950 Z Computa4onal Methods for Biology Lecture 5

Reconstruire le passé biologique modèles, méthodes, performances, limites

molecular evolution and phylogenetics

Thanks to Paul Lewis, Jeff Thorne, and Joe Felsenstein for the use of slides

Transcription:

Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood Bayesian Methods 3/31/05 1

Ultrametric An ultrametric tree: decreasing internal node labels distance between two nodes is label of least common ancestor. An ultrametric distance matrix: Symmetric matrix such that for every i, j, k, there is tie for maximum of (i,j), (j,k), (i,k) ij, ik jk i j k 3/31/05 2

Ultrametric: Assumptions Molecular Clock Hypothesis, Zuckerkandl & Pauling, 1962: Accepted point mutations in amino acid sequence of a protein occurs at a constant rate. Varies from protein to protein Varies from one part of a protein to another 3/31/05 3

Ultrametric: Example A B C E F G H A 0 4 3 4 5 4 3 4 B C E F G H A 3 5 4 E B,,F,H C,G 3/31/05 4

Ultrametric: Example A B C E F G H 5 A 0 4 3 4 5 4 3 4 B 0 4 2 5 1 4 4 C E F G H A 3 4 2 1 C,G B F E H 3/31/05 5

Ultrametric: istances Computed A B C E F G H 5 A 0 4 3 4 5 4 3 4 B 0 4 2 5 1 4 4 C 2 E F G H A 3 4 2 1 C,G B F E H 3/31/05 6

Unweighted pair-group method with arithmetic means (UPGMA) A B C AB C B d AB C d (AB)C C d AC d BC d (AB) d C d A d B d C d (AB)C = (d AC + d BC ) /2 A d AB /2 B The result is an ultrametric tree. 3/31/05 7

Transformed istance Method UPGMA makes errors when rate constancy among lineages does not hold. Remedy: introduce an outgroup & make corrections n ko ' = Now apply UPGMA ij ij 2 io jo + k = 1 n 3/31/05 8

Example: Start with an alignment 3/31/05 9

Example: Compute distances between sequences 3/31/05 10

Example: Tree using UPGMA 3/31/05 11

Additive-istance Trees Additive distance trees are edge-weighted trees, with distance between leaf nodes are exactly equal to length of path between nodes. A B C A 0 3 7 9 B 0 6 8 A 2 3 4 C 0 6 0 B 1 2 C 3/31/05 12

Unrooted Trees on 4 Taxa A C B A C A B B C 3/31/05 13

Four-Point Condition If the true tree is as shown below, then 1. d AB + d C < d AC + d B, and 2. d AB + d C < d A + d BC A C B 3/31/05 14

Saitou & Nei: Neighbor-Joining Method Start with a star topology. Find the pair to separate such that the total length of the tree is minimized. The pair is then replaced by its arithmetic mean, and the process is repeated. S 12 = 2 12 + 1 2( n 2) n ( + + 1 1k 2k k = 3 ( n ) 2) 3 i j n ij 3/31/05 15

Neighbor-Joining 1 2 n n 3 3 2 1 = + + + = n j i ij n k k k n n S 3 3 2 1 12 12 2) ( 1 ) ( 2) 2( 1 2 3/31/05 16

Example (Unrooted Tree) 3/31/05 17

Example (Rooted Tree) 3/31/05 18

Bootstrapping: Select columns 3/31/05 19

Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood Bayesian Methods 3/31/05 20

Character-based Methods Input: characters, morphological features, sequences, etc. Output: phylogenetic tree that provides the history of what features changed. [Perfect Phylogeny Problem] one leaf/object, 1 edge per character, path changed traits 1 2 3 4 5 A 1 1 0 0 0 3 2 B 0 0 1 0 0 4 1 C 1 1 0 0 1 B E 5 0 0 1 1 0 E 0 1 0 0 0 A C 3/31/05 21

Example Perfect phylogeny does not always exist. 1 2 3 4 5 A 1 1 0 0 0 B 0 0 1 0 0 C 1 1 0 0 1 0 0 1 1 0 E 0 1 0 0 0 3 2 1 2 3 4 5 A 1 1 0 0 0 B 0 0 1 0 1 C 1 1 0 0 1 0 0 1 1 0 E 0 1 0 0 1 4 1 B E 5 A C 3/31/05 22

Maximum Parsimony Minimize the total number of mutations implied by the evolutionary history 3/31/05 23

Examples of Character ata 1 2 3 4 5 Characters/Sites A 1 1 0 0 0 Sequences 1 2 3 4 5 6 7 8 9 B 0 0 1 0 1 1 A A G A G T T C A C 1 1 0 0 1 2 A G C C G T T C T 0 0 1 1 0 3 A G A T A T C C A E 0 1 0 0 1 4 A G A G A T C C T 3/31/05 24

Maximum Parsimony Method: Example Sequence s Characters/Sites 1 2 3 4 5 6 7 8 9 1 A A G A G T T C A 2 A G C C G T T C T 3 A G A T A T C C A 4 A G A G A T C C T 3/31/05 25

Unrooted Trees on 4 Taxa A C B A C A B B C 3/31/05 26

1 2 3 4 5 6 7 8 9 1 A A G A G T T C A 2 A G C C G T T C T 3 A G A T A T C C A 4 A G A G A T C C T 3/31/05 27

Inferring nucleotides on internal nodes 3/31/05 28

Searching for the Maximum Parsimony Tree: Exhaustive Search 3/31/05 29

Searching for the Maximum Parsimony Tree: Branch-&-Bound 3/31/05 30

Probabilistic Models of Evolution Assuming a model of substitution, Pr{S i (t+ ) = Y S i (t) = X}, Using this formula it is possible to compute the likelihood that data is generated by a given phylogenetic tree T under a model of substitution. Now find the tree with the maximum likelihood. Y X Time elapsed? Prob of change along edge? Pr{S i (t+ ) = Y S i (t) = X} Prob of data? Product of prob for all edges 3/31/05 31

Computing Maximum Likelihood Tree 3/31/05 32

3/31/05 33

Models of Nucleotide Substitution Jukes-Cantor 1-3α α α α α 1-3α α α α α 1-3α α α α α 1-3α Kimura 3ST 1-α-β-γ α β γ α 1- α- β-γ γ β β γ 1- α- β-γ α γ β α 1- α- β-γ PAM Matrix for Amino Acids 3/31/05 34

PHYLIP s Characterbased Methods 3/31/05 35

PHYLIP s istancebased Methods 3/31/05 36

Bootstrap: Estimating confidence level of a phylogenetic hypothesis 3/31/05 37

Tree Evaluation Methods Bootstrap Skewness Test (Randomized Trees) Permutation Test (Randomized Characters) Parametric bootstrap Likelihood Ratio Tests 3/31/05 38

Computing Evolutionary Relationships: Basic Assumptions Evolutionary divergences are strictly bifurcating, i.e., observed data can be represented by a tree. [Exceptions: transfer of genetic material between organisms] Sampling of individuals from a group is enough to determine the relationships Each individual evolves independently. 3/31/05 39

UPGMA(fast) Comparisons assumes rate constancy; rarely used. Produces ultrametric trees Additive Methods (fast) If four-point condition is satisfied, works well Quality depends on quality of distance data oes not consider multiple substitutions at site Long sequences & small distances, small errors. Produces additive distance trees 3/31/05 40

Additive Metrics: Four-Point Condition If the true tree is as shown below, then 1. d AB + d C < d AC + d B, and 2. d AB + d C < d A + d BC A C B 3/31/05 41

Ultrametric vs Additive Metrics Check whether A is an additive distance matrix. Check whether U is an ultrametric matrix COMMENT: This is just a reduction. It does not mean that one can build a tree for the same data! 3/31/05 42

Comparisons Maximum Parsimony Character-based methods trusted more. MP assumes simplest explanation is always the best! No explicit assumptions, except that a tree that requires fewer substitutions is better Faster evolution on long branches may give rise to homoplasies, and MP may go wrong. Performance depends on the number of informative sites, and is usually not so good with finding clades. If more than one tree, builds consensus trees. 3/31/05 43

Comparisons Maximum Likelihood Uses all sites, unlike MP method. Makes assumptions on the rate and pattern of substitution. Relatively insensitive to violations of assumptions Not very robust (if some sequences very divergent). SLOW! Computationally intensive. Optimum usually cannot be found, since the search space is too large. Fast heuristics exist. Simulations show that it s better than MP and ME. 3/31/05 44

Phylogenetic Software PHYLIP, WEBPHYLIP, PhyloBLAST (large set of programs, command-line) PAUP (point-&-click) (MP-based) PUZZLE, TREE-PUZZLE, PAML, MOLPHY (ML-based) MrBAYES (Bayesian Methods) MACCLAE (MP?) LAMARC 3/31/05 45

Recipes to minimize errors in phylogenetic analysis Use large amounts of data. Randomize order, if needed. Exclude unreliable data (for e.g., when alignment is not known for sure) Exclude fast-evolving sequences or sites (3 rd codon positions), or only use it for close relationships. Most methods will incorrectly group sequences with similar base composition (Additive methods are robust in the presence of such sequences) Check validity of independence assumptions (e.g. changes on either side of a hairpin structure) Use all methods to look at data. 3/31/05 46

Alignments Inputs for phylogenetic analysis usually is a multiple sequence alignment. Programs such as CLUSTALW, produce good alignments, but not good trees. Aligning according to secondary or tertiary trees are better for phylogenetic analysis. Which alignment method is better for which phylogenetic analysis method? OPEN! 3/31/05 47

Other Heuristics Branch Swapping to modify existing trees Quartet Puzzling: rapid tree searching 3/31/05 48