Building Phylogenetic Trees UPGMA & NJ


UPGMA
Unweighted Pair-Group Method with Arithmetic mean
Unweighted = all pairwise distances contribute equally.
Pair-Group = groups are combined in pairs.
Arithmetic mean = pairwise distances to each group (clade) are mean distances to all members of that group.
Sokal R & Michener C (1958). A statistical method for evaluating systematic relationships. University of Kansas Science Bulletin 38:1409-1438.

UPGMA: Principle
Start with unjoined nodes (A, B, C, D, E) and a pair-wise distance matrix:

     A         B         C         D         E
A    -
B    d_{A,B}   -
C    d_{A,C}   d_{B,C}   -
D    d_{A,D}   d_{B,D}   d_{C,D}   -
E    d_{A,E}   d_{B,E}   d_{C,E}   d_{D,E}   -

Find the 2 nodes with the shortest distance (here: C+D).
Join the 2 nodes.
Compute the branch lengths and the distances from the new group CD to the remaining nodes (d_{CD,A}, d_{CD,B}, d_{CD,E}).

UPGMA: Principle
Repeat this process iteratively until the whole tree is obtained.
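The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the slides: the function name `upgma`, the `frozenset`-keyed distance dictionary and the three-leaf toy matrix are all assumptions made for the example. Group-to-group distances are computed as the mean over all leaf pairs, which matches the "arithmetic mean" definition given earlier.

```python
from itertools import combinations

def upgma(labels, dist):
    """Return the list of merges as (left, right, height) tuples.

    labels: leaf names; dist: dict mapping frozenset({a, b}) to a distance.
    (Both the name and the data layout are illustrative assumptions.)
    """
    # Each cluster is a tuple of its leaf names; start with singletons.
    clusters = [(name,) for name in labels]

    def mean_distance(c1, c2):
        # Unweighted average over every leaf pair drawn from the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        # Find the two clusters with the shortest mean distance.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: mean_distance(*p))
        # The new node sits at half that distance above the leaves.
        height = mean_distance(c1, c2) / 2
        merges.append(("".join(c1), "".join(c2), height))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

# Hypothetical toy matrix, not taken from the slides.
toy = {frozenset("AB"): 2, frozenset("AC"): 6, frozenset("BC"): 6}
merges = upgma("ABC", toy)
print(merges)
```

Recomputing the means from the original leaf distances at every step is slower than updating a reduced matrix, but it keeps the sketch short and makes the averaging rule explicit.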

UPGMA: Example
Distance matrix (can be obtained from pair-wise sequence alignments):

     A    B    C    D    E    F    G
A    -
B    19   -
C    27   31   -
D    8    18   26   -
E    33   36   41   31   -
F    18   1    32   17   35   -
G    13   13   29   14   28   12   -

The following example is from Dr Richard J. Edwards: http://www.southampton.ac.uk/~re1u06/teaching/upgma/

UPGMA: Example
Find the shortest distance. Here the shortest distance is 1 (between B and F).
Join the "nodes" (sequences) with the shortest distance: here we join B and F to create node BF.
Depth of the new branch = 1/2 of the shortest distance (so that the node-to-node path length is equal to the shortest distance). Here: d_{B,F} / 2 = 0.5.
Tree so far: B and F joined, each with branch length 0.5.

UPGMA: Example
Calculate mean pairwise distances between BF and the other nodes (sequences):

     A    BF   C    D    E    F    G
A    -
BF   ?    -
C    27   ?    -
D    8    ?    26   -
E    33   ?    41   31   -
F    18   -    32   17   35   -
G    13   ?    29   14   28   12   -

UPGMA: Example
Calculate mean pairwise distances between BF and the other nodes (sequences):

     A    BF    C    D    E    F    G
A    -
BF   18.5 -
C    27   31.5  -
D    8    17.5  26   -
E    33   35.5  41   31   -
F    18   -     32   17   35   -
G    13   12.5  29   14   28   12   -

Example: d_{BF,A} = (d_{B,A} + d_{F,A}) / 2 = (19 + 18) / 2 = 18.5
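The BF column of this matrix can be reproduced with a short calculation. The sketch below copies the B and F distances from the original matrix; the variable names `d_B`, `d_F` and `d_BF` are chosen here for illustration only.

```python
# Distances from B and from F to every other sequence (original matrix).
d_B = {"A": 19, "C": 31, "D": 18, "E": 36, "G": 13}
d_F = {"A": 18, "C": 32, "D": 17, "E": 35, "G": 12}

# Both groups being joined are single sequences, so the mean is a plain average.
d_BF = {x: (d_B[x] + d_F[x]) / 2 for x in d_B}
print(d_BF)
```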

UPGMA: Example

     A    BF    C    D    E    G
A    -
BF   18.5 -
C    27   31.5  -
D    8    17.5  26   -
E    33   35.5  41   31   -
G    13   12.5  29   14   28   -

Repeat the cycle with the new shortest distance. Here, the next shortest distance is 8 (between A and D). We thus join A and D with branch length = 8 / 2 = 4.
Tree so far: A and D joined (branch length 4 each); B and F joined (branch length 0.5 each).

UPGMA: Example

     AD    BF    C    D    E    G
AD   -
BF   18    -
C    26.5  31.5  -
D    8     17.5  26   -
E    32    35.5  41   31   -
G    13.5  12.5  29   14   28   -

We join the closest nodes/groups and we recalculate the distances between nodes/groups.
Example: d_{BF,AD} = (d_{B,A} + d_{F,A} + d_{B,D} + d_{F,D}) / 4 = (19 + 18 + 18 + 17) / 4 = 18

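UPGMA: Example

     AD    BF    C    E    G
AD   -
BF   18    -
C    26.5  31.5  -
E    32    35.5  41   -
G    13.5  12.5  29   28   -

Repeat the cycle with the new shortest distance. Here, the next shortest distance is 12.5 (between BF and G). We thus join BF and G with branch length = 12.5 / 2 = 6.25.
Tree so far: A and D joined (4 each); B and F joined (0.5 each); BF joined to G, with branch length 6.25 for G and 6.25 - 0.5 = 5.75 for the BF node.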

UPGMA: Example

     AD    BFG    C    E    G
AD   -
BFG  16.5  -
C    26.5  30.67  -
E    32    33.0   41   -
G    13.5  12.5   29   28   -

The distances between nodes/groups are recalculated.
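Each recalculated entry is the mean of all leaf-to-leaf distances between the two groups, taken from the original seven-sequence matrix. A small sketch of that rule follows; the helper name `group_distance` and the data layout are assumptions for illustration.

```python
NAMES = "ABCDEFG"
# Lower triangle of the original distance matrix, one row per sequence.
LOWER = [[], [19], [27, 31], [8, 18, 26], [33, 36, 41, 31],
         [18, 1, 32, 17, 35], [13, 13, 29, 14, 28, 12]]
DIST = {frozenset((NAMES[i], NAMES[j])): LOWER[i][j]
        for i in range(len(NAMES)) for j in range(i)}

def group_distance(g1, g2):
    """Mean of all leaf-to-leaf distances between two groups of leaves."""
    pairs = [(a, b) for a in g1 for b in g2]
    return sum(DIST[frozenset(p)] for p in pairs) / len(pairs)

# Distances from the new group BFG to the remaining groups.
for other in ("AD", "C", "E"):
    print(other, "to BFG:", round(group_distance(other, "BFG"), 2))
```

For AD and BFG this averages six values (A-B, A-F, A-G, D-B, D-F, D-G), which is why larger groups do not dominate smaller ones.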

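UPGMA: Example

     AD    BFG    C    E
AD   -
BFG  16.5  -
C    26.5  30.67  -
E    32    33.0   41   -

The shortest distance is identified (16.5, between AD and BFG), the nodes/groups are joined and the branch lengths are calculated.
Tree so far: the new node lies at depth 16.5 / 2 = 8.25, giving a branch of 8.25 - 4 = 4.25 to the AD node and 8.25 - 6.25 = 2 to the BFG node. The earlier branch lengths (A 4, D 4, B 0.5, F 0.5, BF 5.75, G 6.25) are unchanged.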

UPGMA: Example

       ADBFG  BFG    C    E
ADBFG  -
BFG    16.5   -
C      29     30.67  -
E      32.6   33.0   41   -

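UPGMA: Example

       ADBFG  C    E
ADBFG  -
C      29     -
E      32.6   41   -

Tree so far: ADBFG is joined to C at depth 29 / 2 = 14.5, giving a branch of 14.5 for C and 14.5 - 8.25 = 6.25 for the ADBFG node. The earlier branch lengths (B 0.5, F 0.5, A 4, D 4, BF 5.75, G 6.25, AD 4.25, BFG 2) are unchanged.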

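UPGMA: Example

        ADBFGC  E
ADBFGC  -
E       34      -

Final tree: ADBFGC is joined to E at depth 34 / 2 = 17, giving a branch of 17 for E and 17 - 14.5 = 2.5 for the ADBFGC node. The earlier branch lengths (B 0.5, F 0.5, A 4, D 4, BF 5.75, G 6.25, AD 4.25, BFG 2, ADBFG 6.25, C 14.5) are unchanged.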

UPGMA: Example
Remark: The source data for this example is a selection of Cytochrome C distances from Table 3 of Fitch & Margoliash (1967) Construction of phylogenetic trees, Science 155:279-84.

Sequences: A = Turtle, B = Human (Man), C = Tuna, D = Chicken, E = Moth, F = Monkey, G = Dog

     A    B    C    D    E    F    G
A    -
B    19   -
C    27   31   -
D    8    18   26   -
E    33   36   41   31   -
F    18   1    32   17   35   -
G    13   13   29   14   28   12   -

Final tree (branch lengths): Turtle 4, Chicken 4, Turtle-Chicken node 4.25; Man 0.5, Monkey 0.5, Man-Monkey node 5.75, Dog 6.25, Man-Monkey-Dog node 2; node joining both groups 6.25, Tuna 14.5; node joining Tuna 2.5, Moth 17.

Newick representation: [not preserved in this transcript]

Source: Dr Richard J. Edwards
Slides: http://www.southampton.ac.uk/~re1u06/teaching/upgma/
Software: http://bioware.soton.ac.uk/upgma.html
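As a cross-check of the whole example, the sketch below runs UPGMA on the seven-sequence matrix and writes the result as a Newick string with branch lengths. It is an illustrative reimplementation, not the software linked above: the function names and data layout are assumptions, the leaves keep their letter labels (A = Turtle, and so on), and the order of the two children inside each pair of parentheses is arbitrary.

```python
from itertools import combinations

NAMES = "ABCDEFG"
LOWER = [[], [19], [27, 31], [8, 18, 26], [33, 36, 41, 31],
         [18, 1, 32, 17, 35], [13, 13, 29, 14, 28, 12]]
DIST = {frozenset((NAMES[i], NAMES[j])): LOWER[i][j]
        for i in range(len(NAMES)) for j in range(i)}

def mean_distance(c1, c2):
    # Average over every leaf pair between the two clusters.
    total = sum(DIST[frozenset((a, b))] for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

def upgma_newick():
    # Map each cluster (tuple of leaves) to (Newick text, depth of its root).
    nodes = {(n,): (n, 0.0) for n in NAMES}
    while len(nodes) > 1:
        c1, c2 = min(combinations(nodes, 2), key=lambda p: mean_distance(*p))
        height = mean_distance(c1, c2) / 2
        (t1, h1), (t2, h2) = nodes.pop(c1), nodes.pop(c2)
        # Branch length = depth of the new node minus depth of the child.
        text = f"({t1}:{height - h1:g},{t2}:{height - h2:g})"
        nodes[c1 + c2] = (text, height)
    text, _ = next(iter(nodes.values()))
    return text + ";"

newick = upgma_newick()
print(newick)
```

The branch lengths in the printed string can be compared with those listed for the final tree, for example 0.5 for B and F and 5.75 for the BF node.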

NJ
Neighbour Joining (NJ)
Neighbours = pair of nodes (sequences, OTUs) that have one node connecting them.
Example (tree with leaves A, B, C, D): nodes A and B are neighbours (connected by only one internal node), and nodes C and D are neighbours, whereas nodes A and C (for example) are not neighbours.
Saitou N, Nei M. (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 4:406-25.

NJ: Principle
How to find neighbours? How to construct the tree?
Principle:
Start with a "star" tree (leaves A, B, C, D, E) and a distance matrix.
Find the 2 nodes with the shortest distance (here: C+D).
Create an internal node (CD).
Compute the branch lengths (d_{C,CD}, d_{D,CD}, d_{A,CD}, ...).
Additive principle: d_{C,D} = d_{C,CD} + d_{D,CD}

NJ: Principle
Repeat this process iteratively until the whole tree is obtained.

NJ: Principle
The distance between two nodes = the distance given in the initial distance matrix:

     A         B         C         D         E
A    -
B    d_{A,B}   -
C    d_{A,C}   d_{B,C}   -
D    d_{A,D}   d_{B,D}   d_{C,D}   -
E    d_{A,E}   d_{B,E}   d_{C,E}   d_{D,E}   -

NJ: Principle
Theory:
Saitou N, Nei M. (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 4:406-25.
Zvelebil & Baum (2008)
Terry Speed, lecture notes
The Saitou-Nei algorithm is a good approximation of the exact method and runs faster. It is illustrated with an example on the following slides.

NJ: Example
Distance matrix:

     A    B    C    D    E
A    -
B    11   -
C    12   9    -
D    17   16   16   -
E    24   24   24   24   -

The following example is from Prof. Tore Samuelsson (2012) Genomics and Bioinformatics - An Introduction to Programming Tools for Life Scientists (Chap. 9)

NJ: Example
We start by calculating the S_X value, defined as the sum of all the distances to node X:
S_X = Σ_{i=1}^{N} d_{X,i}
Here, we have:
S_A = d_{A,B} + d_{A,C} + d_{A,D} + d_{A,E} = 11 + 12 + 17 + 24 = 64
S_B = 11 + 9 + 16 + 24 = 60
S_C = 12 + 9 + 16 + 24 = 61
S_D = 17 + 16 + 16 + 24 = 73
S_E = 24 + 24 + 24 + 24 = 96
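The same sums can be obtained programmatically. The sketch below stores the five-sequence matrix as a nested dictionary; the names `D` and `S` mirror the symbols on the slide, while the data layout is an assumption made for the example.

```python
NAMES = "ABCDE"
# Lower triangle of the NJ example distance matrix.
LOWER = [[], [11], [12, 9], [17, 16, 16], [24, 24, 24, 24]]

# Build a symmetric lookup D[x][y] = D[y][x].
D = {x: {} for x in NAMES}
for i, x in enumerate(NAMES):
    for j in range(i):
        y = NAMES[j]
        D[x][y] = D[y][x] = LOWER[i][j]

# S_X is the sum of the distances from X to every other node.
S = {x: sum(D[x].values()) for x in NAMES}
print(S)
```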

NJ: Example
We then calculate a δ matrix where δ_{ij} = d_{ij} - (S_i + S_j) / (N-2)
Here, we have:
δ_{A,B} = d_{A,B} - (S_A + S_B) / (N-2) = 11 - (64 + 60) / 3 = -30.3
δ_{A,C} = 12 - (64 + 61) / 3 = -29.7
δ_{A,D} = 17 - (64 + 73) / 3 = -28.7
...

NJ: Example
δ matrix:

     A      B      C      D      E
A    -
B    -30.3  -
C    -29.7  -31.3  -
D    -28.7  -28.3  -28.7  -
E    -29.3  -28    -28.3  -32.3  -

The numbers in this matrix reflect the relative total branch length of trees where the nodes i and j have been joined as neighbours. (The B,C entry follows from the formula: 9 - (60 + 61) / 3 = -31.3.)

NJ: Example
The numbers in the δ matrix reflect the relative total branch length of trees where the nodes i and j have been joined as neighbours. As we prefer the tree with the smallest total branch length, we identify the minimum value, which in this case is δ_{D,E} = -32.3. Thus D and E are the first nodes to be joined, to form a new node DE.
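The δ values and the selected pair can be checked with a few lines of Python. This is a sketch under the definitions given on the slides (S_X and δ_{ij}); the variable names and the dictionary layout are assumptions made for illustration.

```python
from itertools import combinations

NAMES = "ABCDE"
LOWER = [[], [11], [12, 9], [17, 16, 16], [24, 24, 24, 24]]
D = {x: {} for x in NAMES}
for i, x in enumerate(NAMES):
    for j in range(i):
        D[x][NAMES[j]] = D[NAMES[j]][x] = LOWER[i][j]

N = len(NAMES)
S = {x: sum(D[x].values()) for x in NAMES}

# delta_ij = d_ij - (S_i + S_j) / (N - 2) for every unordered pair.
delta = {(x, y): D[x][y] - (S[x] + S[y]) / (N - 2)
         for x, y in combinations(NAMES, 2)}

for pair, value in delta.items():
    print(pair, round(value, 1))

# The pair with the smallest delta is joined first.
best = min(delta, key=delta.get)
print("join:", best)
```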

NJ: Example
The distances d_{D,DE} and d_{E,DE} are calculated as
d_{D,DE} = (d_{D,E} + (S_D - S_E) / (N-2)) / 2 = (24 + (73 - 96) / 3) / 2 = 8.2
d_{E,DE} = d_{D,E} - d_{D,DE} = 15.8
These distances are used to build the tree: D is attached to the new node DE by a branch of length 8.2 and E by a branch of length 15.8; A, B and C are still unresolved.
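The two branch lengths follow directly from the formula above. A minimal sketch, with a helper name (`nj_branch_lengths`) invented for this example:

```python
def nj_branch_lengths(d_ij, s_i, s_j, n):
    """Branch lengths from nodes i and j to their new parent node.

    d_ij: distance between i and j; s_i, s_j: their S values; n: node count.
    """
    length_i = (d_ij + (s_i - s_j) / (n - 2)) / 2
    length_j = d_ij - length_i
    return length_i, length_j

# Joining D and E in the five-node matrix: d = 24, S_D = 73, S_E = 96.
d_D, d_E = nj_branch_lengths(24, 73, 96, 5)
print(round(d_D, 1), round(d_E, 1))
```

Unlike UPGMA, the two branches are not forced to be equal: the node with the larger S value (here E) receives the longer branch.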

NJ: Example
New distance matrix:

     A     B    C    DE
A    -
B    11    -
C    12    9    -
DE   8.5   8    8    -

The distances to the new node DE are calculated as
d_{A,DE} = (d_{D,A} + d_{E,A} - d_{D,E}) / 2 = (17 + 24 - 24) / 2 = 8.5
d_{B,DE} = (d_{D,B} + d_{E,B} - d_{D,E}) / 2 = (16 + 24 - 24) / 2 = 8
d_{C,DE} = (d_{D,C} + d_{E,C} - d_{D,E}) / 2 = (16 + 24 - 24) / 2 = 8
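The update rule for the new row can be written as a one-line function. The sketch below applies it to the three remaining nodes; the helper name `distance_to_new_node` is an assumption for illustration.

```python
def distance_to_new_node(d_ik, d_jk, d_ij):
    """Distance from node k to the node created by joining i and j."""
    return (d_ik + d_jk - d_ij) / 2

# (distance to D, distance to E) for each remaining node; d_{D,E} = 24.
to_D_and_E = {"A": (17, 24), "B": (16, 24), "C": (16, 24)}
d_DE = {k: distance_to_new_node(dk, ek, 24) for k, (dk, ek) in to_D_and_E.items()}
print(d_DE)
```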

NJ: Example
New δ matrix:

     A       B       C       DE
A    -
B    -18.75  -
C    -18.25  -19.5   -
DE   -19.5   -18.25  -18.75  -

We repeat the operation. Note that here there are two minimum values. We have selected nodes B and C (to form node BC), but the same final tree is obtained if we choose A and DE.

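NJ: Example
The branch lengths are given by:
d_{B,BC} = (d_{B,C} + (S_B - S_C) / (N-2)) / 2 = (9 + (28 - 29) / 2) / 2 = 4.25
d_{C,BC} = d_{B,C} - d_{B,BC} = 9 - 4.25 = 4.75
where S_B = 11 + 9 + 8 = 28 and S_C = 12 + 9 + 8 = 29 are taken from the new distance matrix (N = 4).
The tree becomes: B (4.25) and C (4.75) attached to node BC; D (8.2) and E (15.8) attached to node DE; A still unresolved.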

NJ: Example
New distance matrix:

     A     BC    DE
A    -
BC   7     -
DE   8.5   3.5   -

The distances to the new node BC are calculated as
d_{A,BC} = (d_{A,B} + d_{A,C} - d_{B,C}) / 2 = (11 + 12 - 9) / 2 = 7
d_{DE,BC} = (d_{DE,B} + d_{DE,C} - d_{B,C}) / 2 = (8 + 8 - 9) / 2 = 3.5

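NJ: Example
New δ matrix:

     A     BC    DE
A    -
BC   -19   -
DE   -19   -19   -

The branch lengths are given by:
d_{BC,ABC} = 1
d_{A,ABC} = 6
The tree becomes: B (4.25) and C (4.75) attached to node BC; BC (1) and A (6) attached to node ABC; D (8.2) and E (15.8) attached to node DE.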

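NJ: Example
New distance matrix:

      ABC   DE
ABC   -
DE    2.5   -

Final tree: B (4.25) and C (4.75) attached to node BC; BC (1) and A (6) attached to node ABC; ABC joined to DE by a branch of length 2.5; D (8.2) and E (15.8) attached to node DE.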

NJ: Example
Check, using the original distance matrix and the final tree:
d_{C,D} (distance matrix) = 16
d_{C,D} (tree) = 4.75 + 1 + 2.5 + 8.2 = 16.45
The path length in the tree is close to, but not exactly equal to, the distance in the matrix.
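The whole example, including this check, can be reproduced with a compact neighbour-joining sketch. It follows the formulas used on the slides (S values, δ criterion, branch lengths, distance update) but is an illustrative implementation with assumed names, not code from any of the cited sources. Ties in δ are broken by taking the first pair found, so at the second step it joins A and DE rather than B and C; as noted earlier, the final tree is the same, although the internal node is then labelled ADE instead of ABC.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """Return {(node, parent): branch_length} for the unrooted NJ tree."""
    d = dict(dist)  # frozenset({x, y}) -> distance; copied, then extended
    nodes, edges = list(names), {}
    while len(nodes) > 2:
        n = len(nodes)
        s = {x: sum(d[frozenset((x, y))] for y in nodes if y != x) for x in nodes}
        # Join the pair with the smallest delta_ij = d_ij - (S_i + S_j)/(N-2).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: d[frozenset(p)] - (s[p[0]] + s[p[1]]) / (n - 2))
        d_ij, new = d[frozenset((i, j))], i + j
        edges[(i, new)] = (d_ij + (s[i] - s[j]) / (n - 2)) / 2
        edges[(j, new)] = d_ij - edges[(i, new)]
        # Distances from every remaining node to the new internal node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((k, new))] = (d[frozenset((i, k))]
                                          + d[frozenset((j, k))] - d_ij) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    a, b = nodes
    edges[(a, b)] = d[frozenset((a, b))]  # last branch joins the final two nodes
    return edges

def path_length(edges, start, goal):
    """Sum of branch lengths along the unique tree path from start to goal."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, total = stack.pop()
        if node == goal:
            return total
        stack.extend((nxt, node, total + w) for nxt, w in adj[node] if nxt != parent)
    raise ValueError("no path between the two nodes")

NAMES = "ABCDE"
LOWER = [[], [11], [12, 9], [17, 16, 16], [24, 24, 24, 24]]
DIST = {frozenset((NAMES[i], NAMES[j])): LOWER[i][j]
        for i in range(len(NAMES)) for j in range(i)}

tree = neighbor_joining(NAMES, DIST)
cd = path_length(tree, "C", "D")
print(round(cd, 2))
```

With unrounded branch lengths the C-to-D path comes out slightly shorter than the 16.45 obtained above from the rounded value 8.2, but it still differs from the matrix distance of 16.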
