Constructing Evolutionary/Phylogenetic Trees

Similar documents
Constructing Evolutionary/Phylogenetic Trees

Theory of Evolution Charles Darwin

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Dr. Amira A. AL-Hosary

Theory of Evolution. Charles Darwin

Phylogenetic inference

Phylogenetic analyses. Kirsi Kostamo

Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

A (short) introduction to phylogenetics

Phylogenetic Tree Reconstruction

Bioinformatics 1 -- lecture 9. Phylogenetic trees Distance-based tree building Parsimony

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Phylogenetics. BIOL 7711 Computational Bioscience

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley

7. Tests for selection


Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

Gel Electrophoresis. 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 1

The Phylogenetic Handbook

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Phylogeny. November 7, 2017

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

Phylogeny Tree Algorithms

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Phylogenetics: Building Phylogenetic Trees

Molecular Evolution & Phylogenetics

C3020 Molecular Evolution. Exercises #3: Phylogenetics

Algorithms in Bioinformatics

Evolutionary Tree Analysis. Overview

Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week (Jan 27 & 29):

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

EVOLUTIONARY DISTANCES

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetic inference: from sequences to trees

Lecture 6 Phylogenetic Inference

Phylogenetic Analysis

Phylogenetic Analysis

Anatomy of a tree. clade is group of organisms with a shared ancestor. a monophyletic group shares a single common ancestor = tapirs-rhinos-horses

Intraspecific gene genealogies: trees grafting into networks

How to read and make phylogenetic trees Zuzana Starostová

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

BINF6201/8201. Molecular phylogenetic methods

Consensus methods. Strict consensus methods

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution

Phylogenetic Analysis

Systematics - Bio 615

Phylogeny: building the tree of life

GENETICS - CLUTCH CH.22 EVOLUTIONARY GENETICS.

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

Phylogene)cs. IMBB 2016 BecA- ILRI Hub, Nairobi May 9 20, Joyce Nzioki

Questions we can ask. Recall. Accuracy and Precision. Systematics - Bio 615. Outline

Page 1. Evolutionary Trees. Why build evolutionary tree? Outline

Estimating Evolutionary Trees. Phylogenetic Methods

Concepts and Methods in Molecular Divergence Time Estimation

What is Phylogenetics

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Phylogeny: traditional and Bayesian approaches

Consensus Methods. * You are only responsible for the first two

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30

Probabilistic modeling and molecular phylogeny

Introduction to MEGA

Molecular Evolution, course # Final Exam, May 3, 2006

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

C.DARWIN ( )

A Minimum Spanning Tree Framework for Inferring Phylogenies

Inferring Molecular Phylogeny

Letter to the Editor. Department of Biology, Arizona State University

CS5263 Bioinformatics. Guest Lecture Part II Phylogenetics

Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018

TheDisk-Covering MethodforTree Reconstruction

Introduction to characters and parsimony analysis

Integrative Biology 200A "PRINCIPLES OF PHYLOGENETICS" Spring 2008

How should we organize the diversity of animal life?

1 ATGGGTCTC 2 ATGAGTCTC

Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies

Molecular Evolution and Phylogenetic Tree Reconstruction

Consistency Index (CI)

(Stevens 1991) 1. morphological characters should be assumed to be quantitative unless demonstrated otherwise

arxiv: v1 [q-bio.pe] 27 Oct 2011

Generating phylogenetic trees with Phylomatic and dendrograms of functional traits in R

Bootstrap confidence levels for phylogenetic trees B. Efron, E. Halloran, and S. Holmes, 1996

Phylogenetics in the Age of Genomics: Prospects and Challenges

8/23/2014. Phylogeny and the Tree of Life

An Investigation of Phylogenetic Likelihood Methods

Phylogenetics: Parsimony and Likelihood. COMP Spring 2016 Luay Nakhleh, Rice University

X X (2) X Pr(X = x θ) (3)

Transcription:

Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood Bayesian Methods 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 1

Character-based Methods Input: characters, morphological features, sequences, etc. Output: phylogenetic tree that provides the history of what features changed. [Perfect Phylogeny Problem] one leaf/object, 1 edge per character, path changed traits 1 2 3 4 5 A 1 1 0 0 0 3 2 B 0 0 1 0 0 4 1 C 1 1 0 0 1 D B E 5 D 0 0 1 1 0 E 0 1 0 0 0 A C 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 2

Example Perfect phylogeny does not always exist. 1 2 3 4 5 A 1 1 0 0 0 B 0 0 1 0 1 C 1 1 0 0 1 D 0 0 1 1 0 E 0 1 0 0 1 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 3

Maximum Parsimony Minimize the total number of mutations implied by the evolutionary history 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 4

Examples of Character Data 1 2 3 4 5 Characters/Sites A 1 1 0 0 0 Sequences 1 2 3 4 5 6 7 8 9 B 0 0 1 0 1 1 A A G A G T T C A C 1 1 0 0 1 2 A G C C G T T C T D 0 0 1 1 0 3 A G A T A T C C A E 0 1 0 0 1 4 A G A G A T C C T 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 5

Maximum Parsimony Method: Example Characters/Sites Sequences 1 2 3 4 5 6 7 8 9 1 A A G A G T T C A 2 A G C C G T T C T 3 A G A T A T C C A 4 A G A G A T C C T 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 6

Unrooted Trees on 4 Taxa A C B D A C A B D B C D 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 7

1 2 3 4 5 6 7 8 9 1 A A G A G T T C A 2 A G C C G T T C T 3 A G A T A T C C A 4 A G A G A T C C T 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 8

Inferring nucleotides on internal nodes 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 9

Searching for the Maximum Parsimony Tree: Exhaustive Search 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 10

Searching for the Maximum Parsimony Tree: Branch-&-Bound 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 11

Probabilistic Models of Evolution Assuming a model of substitution, Pr{S i (t+ ) = Y S i (t) = X}, Using this formula it is possible to compute the likelihood that data D is generated by a given phylogenetic tree T under a model of substitution. Now find the tree with the maximum likelihood. 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 12 Y X Time elapsed? Prob of change along edge? Pr{S i (t+ ) = Y S i (t) = X} Prob of data? Product of prob for all edges

Computing Maximum Likelihood Tree 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 13

Models of Nucleotide Substitution Jukes-Cantor 1-3α α α α α 1-3α α α α α 1-3α α α α α 1-3α Kimura 3ST 1-α-β-γ α β γ α 1- α- β-γ γ β β γ 1- α- β-γ α γ β α 1- α- β-γ PAM Matrix for Amino Acids 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 14

PHYLIP s Characterbased Methods 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 15

PHYLIP s Distancebased Methods 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 16

Bootstrap: Estimating confidence level of a phylogenetic hypothesis 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 17

Tree Evaluation Methods Bootstrap Skewness Test (Randomized Trees) Permutation Test (Randomized Characters) Parametric bootstrap Likelihood Ratio Tests 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 18

Computing Evolutionary Relationships: Basic Assumptions Evolutionary divergences are strictly bifurcating, i.e., observed data can be represented by a tree. [Exceptions: transfer of genetic material between organisms] Sampling of individuals from a group is enough to determine the relationships Each individual evolves independently. 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 19

Comparisons UPGMA(fast) assumes rate constancy; rarely used. Additive Methods (fast) If four-point condition is satisfied, works well Quality depends on quality of distance data Does not consider multiple substitutions at site Long sequences & small distances, small errors. 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 20

Additive Metrics: Four-Point Condition If the true tree is as shown below, then 1. d AB + d CD < d AC + d BD, and 2. d AB + d CD < d AD + d BC A C B D 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 21

Ultrametric vs Additive Metrics Check whether A is an additive distance matrix. Check whether U is an ultrametric matrix COMMENT: This is just a reduction. It does not mean that one can build a tree for the same data! 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 22

Comparisons Maximum Parsimony Character-based methods trusted more. MP assumes simplest explanation is always the best! No explicit assumptions, except that a tree that requires fewer substitutions is better Faster evolution on long branches may give rise to homoplasies, and MP may go wrong. Performance depends on the number of informative sites, and is usually not so good with finding clades. If more than one tree, builds consensus trees. 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 23

Comparisons Maximum Likelihood Uses all sites, unlike MP method. Makes assumptions on the rate and pattern of substitution. Relatively insensitive to violations of assumptions Not very robust (if some sequences very divergent). SLOW! Computationally intensive. Optimum usually cannot be found, since the search space is too large. Fast heuristics exist. Simulations show that it s better than MP and ME. 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 24

Phylogenetic Software PHYLIP, WEBPHYLIP, PhyloBLAST (large set of programs, command-line) PAUP (point-&-click) (MP-based) PUZZLE, TREE-PUZZLE, PAML, MOLPHY (ML-based) MrBAYES (Bayesian Methods) MACCLADE (MP?) LAMARC 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 25

Recipes to minimize errors in phylogenetic analysis Use large amounts of data. Randomize order, if needed. Exclude unreliable data (for e.g., when alignment is not known for sure) Exclude fast-evolving sequences or sites (3 rd codon positions), or only use it for close relationships. Most methods will incorrectly group sequences with similar base composition (Additive methods are robust in the presence of such sequences) Check validity of independence assumptions (e.g. changes on either side of a hairpin structure) Use all methods to look at data. 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 26

Alignments Inputs for phylogenetic analysis usually is a multiple sequence alignment. Programs such as CLUSTALW, produce good alignments, but not good trees. Aligning according to secondary or tertiary trees are better for phylogenetic analysis. Which alignment method is better for which phylogenetic analysis method? OPEN! 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 27

Other Heuristics Branch Swapping to modify existing trees Quartet Puzzling: rapid tree searching 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 28