Building Phylogenetic Trees: UPGMA & NJ
UPGMA
Unweighted Pair-Group Method with Arithmetic mean
Unweighted = all pairwise distances contribute equally.
Pair-Group = groups are combined in pairs.
Arithmetic mean = pairwise distances to each group (clade) are mean distances to all members of that group.
Sokal R & Michener C (1958). A statistical method for evaluating systematic relationships. University of Kansas Science Bulletin 38:1409-1438.
UPGMA: Principle
Start with unjoined nodes and a pairwise distance matrix:
     A          B          C          D          E
A    -
B    $d_{A,B}$  -
C    $d_{A,C}$  $d_{B,C}$  -
D    $d_{A,D}$  $d_{B,D}$  $d_{C,D}$  -
E    $d_{A,E}$  $d_{B,E}$  $d_{C,E}$  $d_{D,E}$  -
Find the 2 nodes with the shortest distance (here: C+D).
Join the 2 nodes.
Compute the branch lengths ($d_{CD,A}$, $d_{CD,B}$, $d_{CD,E}$).
UPGMA: Principle
Repeat this process iteratively until the whole tree is obtained.
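The find / join / recompute loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `upgma`, the dictionary-of-pairs input format and the toy distances for X, Y, Z are all assumptions made for the example, and cluster labels are built by concatenating leaf names (so single-letter leaf names are assumed).

```python
from itertools import combinations

def upgma(names, d):
    """Cluster `names` with UPGMA.

    `d` maps frozenset({x, y}) -> distance for every pair of leaves.
    Returns the merges in order as (cluster_1, cluster_2, node_height).
    """
    members = {n: [n] for n in names}   # cluster label -> its leaves
    dist = dict(d)                      # working copy, grows as clusters merge
    merges = []
    while len(members) > 1:
        # 1. find the two clusters with the shortest distance
        a, b = min(combinations(members, 2), key=lambda p: dist[frozenset(p)])
        # 2. the new node sits at half of that distance
        merges.append((a, b, dist[frozenset((a, b))] / 2))
        new = "".join(sorted(members[a] + members[b]))
        na, nb = len(members[a]), len(members[b])
        # 3. mean over all leaf pairs = size-weighted mean of the two old rows
        for c in members:
            if c not in (a, b):
                dist[frozenset((new, c))] = (
                    na * dist[frozenset((a, c))] + nb * dist[frozenset((b, c))]
                ) / (na + nb)
        members[new] = members.pop(a) + members.pop(b)
    return merges

# Hypothetical toy input: three leaves X, Y, Z.
toy = {frozenset("XY"): 2, frozenset("XZ"): 6, frozenset("YZ"): 8}
for first, second, height in upgma(["X", "Y", "Z"], toy):
    print(f"join {first} + {second} at height {height}")
```

Weighting each old row by its cluster size is what keeps the method "unweighted" with respect to the original sequences: every leaf-to-leaf distance contributes equally to the mean.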
UPGMA: Example
Distance matrix (can be obtained from pairwise sequence alignments):
     A    B    C    D    E    F    G
A    -
B    19   -
C    27   31   -
D    8    18   26   -
E    33   36   41   31   -
F    18   1    32   17   35   -
G    13   13   29   14   28   12   -
The following example is from Dr Richard J. Edwards: http://www.southampton.ac.uk/~re1u06/teaching/upgma/
UPGMA: Example
Find the shortest distance. Here the shortest distance is 1 (between B and F).
Join the "nodes" (sequences) with the shortest distance: here we join B and F to create node BF.
Depth of the new branch = 1/2 of the shortest distance (so that the node-to-node path length is equal to the shortest distance). Here: $d_{B,F} / 2 = 0.5$.
[Figure: B and F joined at node BF; both branches have length 0.5]
UPGMA: Example
Calculate the mean pairwise distances between the new node BF and the other nodes (sequences).
Example: $d_{BF,A} = (d_{A,B} + d_{A,F}) / 2 = (19 + 18) / 2 = 18.5$
     A     BF    C     D     E     G
A    -
BF   18.5  -
C    27    31.5  -
D    8     17.5  26    -
E    33    35.5  41    31    -
G    13    12.5  29    14    28    -
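The averaging step can be checked numerically. The sketch below stores the example's lower-triangular matrix and recomputes the BF row; the helper name `mean_distance` and the dictionary layout are illustrative choices, not part of the original example.

```python
# Lower triangle of the example distance matrix, one row per sequence.
rows = {
    "B": [19],
    "C": [27, 31],
    "D": [8, 18, 26],
    "E": [33, 36, 41, 31],
    "F": [18, 1, 32, 17, 35],
    "G": [13, 13, 29, 14, 28, 12],
}
names = "ABCDEFG"
# Key each distance by the unordered pair of leaves it belongs to.
d = {frozenset((r, names[k])): v for r, vals in rows.items() for k, v in enumerate(vals)}

def mean_distance(group_1, group_2):
    """Arithmetic mean of all leaf-to-leaf distances between two groups."""
    pairs = [(x, y) for x in group_1 for y in group_2]
    return sum(d[frozenset(p)] for p in pairs) / len(pairs)

# Distances from the new node BF to every remaining node.
for other in "ACDEG":
    print(f"d(BF,{other}) = {mean_distance('BF', other)}")
```

Groups are passed as strings of single-letter leaf names, so the same helper also covers later steps, for instance `mean_distance("AD", "BF")` once A and D have been joined.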
UPGMA: Example
Repeat the cycle with the new shortest distances. Here, the next shortest distance is 8 (between A and D). We thus join A and D with branch length = 8 / 2 = 4.
[Figure: A and D joined at node AD (branches of length 4); B and F joined at node BF (branches of length 0.5)]
UPGMA: Example
We join the closest nodes/groups and we recalculate the distances between nodes/groups.
Example: $d_{BF,AD} = (d_{A,B} + d_{A,F} + d_{B,D} + d_{D,F}) / 4 = (19 + 18 + 18 + 17) / 4 = 18$
     AD    BF    C     E     G
AD   -
BF   18    -
C    26.5  31.5  -
E    32    35.5  41    -
G    13.5  12.5  29    28    -
UPGMA: Example
Repeat the cycle with the new shortest distances. Here, the next shortest distance is 12.5 (between BF and G). We thus join BF and G at depth 12.5 / 2 = 6.25: the branch to G has length 6.25 and the branch to node BF has length 6.25 - 0.5 = 5.75.
[Figure: node AD (branches 4, 4); node BFG joining BF (branch 5.75) and G (branch 6.25); B and F on branches of length 0.5]
UPGMA: Example
The distances between nodes/groups are recalculated:
     AD    BFG    C     E
AD   -
BFG  16.5  -
C    26.5  30.67  -
E    32    33.0   41    -
The shortest distance is found again (16.5, between AD and BFG), the nodes/groups are joined and the branch lengths are calculated: the new node ADBFG lies at depth 16.5 / 2 = 8.25, so the branch to AD has length 8.25 - 4 = 4.25 and the branch to BFG has length 8.25 - 6.25 = 2.
[Figure: node ADBFG joining AD (branch 4.25) and BFG (branch 2)]
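UPGMA: Example
The distances to the new node ADBFG are recalculated:
       ADBFG  C     E
ADBFG  -
C      29     -
E      32.6   41    -
The shortest distance is now 29 (between ADBFG and C). The new node ADBFGC lies at depth 29 / 2 = 14.5, so the branch to C has length 14.5 and the branch to ADBFG has length 14.5 - 8.25 = 6.25.
[Figure: node ADBFGC joining ADBFG (branch 6.25) and C (branch 14.5)]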
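UPGMA: Example
        ADBFGC  E
ADBFGC  -
E       34      -
The last two groups are joined at depth 34 / 2 = 17: the branch to E has length 17 and the branch to ADBFGC has length 17 - 14.5 = 2.5.
[Figure: complete tree with leaves A, D, B, F, G, C, E and branch lengths 4, 4, 0.5, 0.5, 5.75, 6.25, 4.25, 2, 6.25, 14.5, 2.5, 17]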
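Every branch length in the example follows one rule: the height of the parent node minus the height of the node itself, where a node's height is half the distance at which it was formed and leaves have height 0. A small sketch, with the `height` and `parent` dictionaries transcribed by hand from the example (the label `root` is an assumed name for the last node):

```python
# Node heights: half of the joining distance at each step of the example.
height = {
    "BF": 0.5, "AD": 4.0, "BFG": 6.25,
    "ADBFG": 8.25, "ADBFGC": 14.5, "root": 17.0,
}
# Parent of every node in the final tree.
parent = {
    "B": "BF", "F": "BF", "A": "AD", "D": "AD",
    "BF": "BFG", "G": "BFG", "AD": "ADBFG", "BFG": "ADBFG",
    "ADBFG": "ADBFGC", "C": "ADBFGC", "ADBFGC": "root", "E": "root",
}

def branch_length(node):
    """Length of the branch above `node`: parent height minus own height."""
    return height[parent[node]] - height.get(node, 0.0)  # leaves have height 0

for node in parent:
    print(f"{node:>6} -> {parent[node]:<6} {branch_length(node)}")
```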
UPGMA: Example
Remark: The source data for this example is a selection of Cytochrome C distances from Table 3 of Fitch & Margoliash (1967) Construction of phylogenetic trees, Science 155:279-84.
Sequence labels: A = Turtle, B = Human (Man), C = Tuna, D = Chicken, E = Moth, F = Monkey, G = Dog.
[Figure: final UPGMA tree with species labels (Turtle, Chicken, Man, Monkey, Dog, Tuna, Moth), its branch lengths and its Newick representation]
Source: Dr Richard J. Edwards
Slides: http://www.southampton.ac.uk/~re1u06/teaching/upgma/
Software: http://bioware.soton.ac.uk/upgma.html
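As an illustration of the Newick representation, the sketch below encodes the final tree with the branch lengths from the example, serialises it, and checks a property every UPGMA tree has: all leaves lie at the same distance from the root. The nested-tuple encoding and the function names `newick` and `leaf_depths` are assumptions made for this sketch, and the resulting string is one possible Newick form derived from those branch lengths.

```python
# A leaf is (name, branch_length); an internal node is ([children], branch_length).
# The root has no branch above it, so its length is None.
tree = ([
    ([
        ([
            ([("A", 4.0), ("D", 4.0)], 4.25),
            ([([("B", 0.5), ("F", 0.5)], 5.75), ("G", 6.25)], 2.0),
        ], 6.25),
        ("C", 14.5),
    ], 2.5),
    ("E", 17.0),
], None)

def newick(node):
    """Serialise a node (and everything below it) in Newick notation."""
    content, length = node
    if isinstance(content, str):
        text = content
    else:
        text = "(" + ",".join(newick(child) for child in content) + ")"
    return text if length is None else f"{text}:{length:g}"

def leaf_depths(node, depth=0.0):
    """Map each leaf name to the summed branch lengths from the root."""
    content, length = node
    depth += length or 0.0
    if isinstance(content, str):
        return {content: depth}
    depths = {}
    for child in content:
        depths.update(leaf_depths(child, depth))
    return depths

print(newick(tree) + ";")
print(leaf_depths(tree))
```

Every leaf ends up at depth 17, half of the last joining distance (34), which is the molecular-clock assumption built into UPGMA.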
NJ
Neighbour Joining (NJ)
Neighbours = pair of nodes (sequences, OTUs) that have one node connecting them.
Example: nodes A and B are neighbours (connected by only one internal node), and nodes C and D are neighbours, whereas nodes A and C (for example) are not neighbours.
Saitou N, Nei M. (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 4:406-25.
NJ: Principle
Neighbour Joining (NJ)
How to find neighbours? How to construct the tree?
Principle:
Start with a "star" tree and a distance matrix.
Find the 2 nodes with the shortest distance (here: C+D).
Create an internal node (CD).
Compute the branch lengths ($d_{C,CD}$, $d_{D,CD}$, ...).
Additive principle: $d_{C,D} = d_{C,CD} + d_{D,CD}$
NJ: Principle
Repeat this process iteratively until the whole tree is obtained.
     A          B          C          D          E
A    -
B    $d_{A,B}$  -
C    $d_{A,C}$  $d_{B,C}$  -
D    $d_{A,D}$  $d_{B,D}$  $d_{C,D}$  -
E    $d_{A,E}$  $d_{B,E}$  $d_{C,E}$  $d_{D,E}$  -
The distance between two nodes = distance given in the initial distance matrix.
NJ: Principle
Theory:
Saitou N, Nei M. (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 4:406-25.
Zvelebil & Baum (2008)
Terry Speed, lecture notes
The Saitou-Nei algorithm is a good approximation of the exact method and runs faster. It is illustrated with an example hereafter.
NJ: Example
Distance matrix:
     A    B    C    D    E
A    -
B    11   -
C    12   9    -
D    17   16   16   -
E    24   24   24   24   -
The following example is from Prof. Tore Samuelsson (2012) Genomics and Bioinformatics - An Introduction to Programming Tools for Life Scientists (Chap. 9)
NJ: Example
We start by calculating the value $S_X$, defined as the sum of all the distances to node X:
$S_X = \sum_{i=1}^{N} d_{X,i}$
Here, we have:
$S_A = d_{A,B} + d_{A,C} + d_{A,D} + d_{A,E} = 11 + 12 + 17 + 24 = 64$
$S_B = 11 + 9 + 16 + 24 = 60$
$S_C = 12 + 9 + 16 + 24 = 61$
$S_D = 17 + 16 + 16 + 24 = 73$
$S_E = 24 + 24 + 24 + 24 = 96$
NJ: Example
We then calculate a δ matrix where $\delta_{ij} = d_{ij} - (S_i + S_j) / (N-2)$, with N = 5 nodes.
Here, we have:
$\delta_{A,B} = d_{A,B} - (S_A + S_B) / (N-2) = 11 - (64 + 60) / 3 = -30.3$
$\delta_{A,C} = 12 - (64 + 61) / 3 = -29.7$
$\delta_{A,D} = 17 - (64 + 73) / 3 = -28.7$
...
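The two formulas above translate directly into code. The sketch below recomputes every $S_X$ and every $\delta_{ij}$ for the example matrix and reports the pair with the smallest δ; the function names `row_sums` and `delta_matrix` and the dictionary layout are illustrative assumptions.

```python
from itertools import combinations

names = ["A", "B", "C", "D", "E"]
# Lower triangle of the example distance matrix.
lower = {"B": [11], "C": [12, 9], "D": [17, 16, 16], "E": [24, 24, 24, 24]}
d = {frozenset((r, names[k])): v for r, vals in lower.items() for k, v in enumerate(vals)}

def row_sums(nodes, d):
    """S_X: sum of the distances from X to every other node."""
    return {x: sum(d[frozenset((x, y))] for y in nodes if y != x) for x in nodes}

def delta_matrix(nodes, d):
    """delta_ij = d_ij - (S_i + S_j) / (N - 2) for every pair of nodes."""
    s = row_sums(nodes, d)
    n = len(nodes)
    return {
        (i, j): d[frozenset((i, j))] - (s[i] + s[j]) / (n - 2)
        for i, j in combinations(nodes, 2)
    }

S = row_sums(names, d)
delta = delta_matrix(names, d)
best = min(delta, key=delta.get)   # pair with the smallest delta

print(S)
for pair, value in delta.items():
    print(pair, round(value, 1))
print("join first:", best)
```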
NJ: Example
δ matrix:
     A      B      C      D      E
A    -
B    -30.3  -
C    -29.7  -31.3  -
D    -28.7  -28.3  -28.7  -
E    -29.3  -28.0  -28.3  -32.3  -
The numbers in this matrix reflect the relative total branch length of trees where the nodes i and j have been joined as neighbours. For example, $\delta_{B,C} = 9 - (60 + 61) / 3 = -31.3$.
As we prefer the tree with the smallest total branch length, we identify the minimum value, which in this case is $\delta_{D,E} = -32.3$. Thus D and E are the first nodes to be joined, to form a new node DE.
NJ: Example
The distances $d_{D,DE}$ and $d_{E,DE}$ are calculated as
$d_{D,DE} = (d_{D,E} + (S_D - S_E) / (N-2)) / 2 = (24 + (73 - 96) / 3) / 2 = 8.2$
$d_{E,DE} = d_{D,E} - d_{D,DE} = 24 - 8.2 = 15.8$
These distances are used to build the tree:
[Figure: star tree with A, B, C and the new internal node DE; branch D-DE = 8.2, branch E-DE = 15.8]
NJ: Example
New distance matrix:
     A     B     C     DE
A    -
B    11    -
C    12    9     -
DE   8.5   8     8     -
The distances to the new node DE are calculated as
$d_{A,DE} = (d_{A,D} + d_{A,E} - d_{D,E}) / 2 = (17 + 24 - 24) / 2 = 8.5$
$d_{B,DE} = (d_{B,D} + d_{B,E} - d_{D,E}) / 2 = (16 + 24 - 24) / 2 = 8$
$d_{C,DE} = (d_{C,D} + d_{C,E} - d_{D,E}) / 2 = (16 + 24 - 24) / 2 = 8$
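One complete joining step (two branch lengths plus the distances to the new node) can be sketched as a single function. The name `join` and its argument order are assumptions made for this illustration; the formulas are the ones used in the example for joining D and E.

```python
def join(d, s, n, i, j, others):
    """Join nodes i and j in a matrix of n nodes.

    Returns (branch length i -> new node, branch length j -> new node,
    distances from the new node to each node in `others`).
    """
    d_ij = d[frozenset((i, j))]
    len_i = (d_ij + (s[i] - s[j]) / (n - 2)) / 2
    len_j = d_ij - len_i
    to_new = {k: (d[frozenset((i, k))] + d[frozenset((j, k))] - d_ij) / 2 for k in others}
    return len_i, len_j, to_new

# Example distance matrix and row sums S_X.
d = {frozenset(pair): value for pair, value in {
    ("A", "B"): 11, ("A", "C"): 12, ("B", "C"): 9,
    ("A", "D"): 17, ("B", "D"): 16, ("C", "D"): 16,
    ("A", "E"): 24, ("B", "E"): 24, ("C", "E"): 24, ("D", "E"): 24,
}.items()}
S = {"A": 64, "B": 60, "C": 61, "D": 73, "E": 96}

len_d, len_e, to_de = join(d, S, 5, "D", "E", ["A", "B", "C"])
print(round(len_d, 1), round(len_e, 1), to_de)
```

Note that the two branch lengths always sum to $d_{i,j}$, which is the additive principle applied to the joined pair.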
NJ: Example
New δ matrix:
     A       B       C       DE
A    -
B    -18.75  -
C    -18.25  -19.5   -
DE   -19.5   -18.25  -18.75  -
We repeat the operation. Note that here there are two minimum values. We have selected nodes B and C (to form node BC) but the same final tree is obtained if we choose A and DE.
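NJ: Example
The branch lengths are given by the same formula, with the sums recomputed from the new distance matrix ($S_B = 11 + 9 + 8 = 28$, $S_C = 12 + 9 + 8 = 29$, N = 4):
$d_{B,BC} = (d_{B,C} + (S_B - S_C) / (N-2)) / 2 = (9 + (28 - 29) / 2) / 2 = 4.25$
$d_{C,BC} = d_{B,C} - d_{B,BC} = 9 - 4.25 = 4.75$
and the tree becomes:
[Figure: A attached to the centre; node BC with branches B = 4.25 and C = 4.75; node DE with branches D = 8.2 and E = 15.8]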
NJ: Example
New distance matrix:
     A     BC    DE
A    -
BC   7     -
DE   8.5   3.5   -
The distances to the new node BC are calculated as
$d_{A,BC} = (d_{A,B} + d_{A,C} - d_{B,C}) / 2 = (11 + 12 - 9) / 2 = 7$
$d_{DE,BC} = (d_{B,DE} + d_{C,DE} - d_{B,C}) / 2 = (8 + 8 - 9) / 2 = 3.5$
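NJ: Example
New δ matrix:
     A     BC    DE
A    -
BC   -19   -
DE   -19   -19   -
The branch lengths are given by:
$d_{BC,ABC} = 1$
$d_{A,ABC} = 6$
and the tree becomes:
[Figure: node ABC joining A (branch 6) and BC (branch 1); node BC with branches B = 4.25 and C = 4.75; node DE with branches D = 8.2 and E = 15.8]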
NJ: Example
New distance matrix:
     ABC   DE
ABC  -
DE   2.5   -
Final tree:
[Figure: final NJ tree. Branch lengths: A-ABC = 6, B-BC = 4.25, C-BC = 4.75, BC-ABC = 1, ABC-DE = 2.5, D-DE = 8.2, E-DE = 15.8]
NJ: Example
Check:
$d_{C,D}$ (distance matrix) = 16
$d_{C,D}$ (tree) = 4.75 + 1 + 2.5 + 8.2 = 16.45
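The same check can be run for every pair of sequences by summing branch lengths along the path between two leaves. The sketch below hard-codes the final tree as an edge list using the rounded branch lengths from the example; the names `edges`, `adj` and `path_length` are assumptions made for this illustration.

```python
from itertools import combinations

# Final NJ tree as (node, node, branch length), with the rounded values 8.2 and 15.8.
edges = [
    ("A", "ABC", 6.0), ("BC", "ABC", 1.0), ("B", "BC", 4.25), ("C", "BC", 4.75),
    ("ABC", "DE", 2.5), ("D", "DE", 8.2), ("E", "DE", 15.8),
]
adj = {}
for u, v, w in edges:
    adj.setdefault(u, {})[v] = w
    adj.setdefault(v, {})[u] = w

def path_length(src, dst, seen=None):
    """Sum of branch lengths on the unique path src -> dst (None if unreachable)."""
    if src == dst:
        return 0.0
    seen = (seen or set()) | {src}
    for nxt, w in adj[src].items():
        if nxt not in seen:
            rest = path_length(nxt, dst, seen)
            if rest is not None:
                return w + rest
    return None

# Original distance matrix of the example, for comparison.
matrix = {
    ("A", "B"): 11, ("A", "C"): 12, ("B", "C"): 9,
    ("A", "D"): 17, ("B", "D"): 16, ("C", "D"): 16,
    ("A", "E"): 24, ("B", "E"): 24, ("C", "E"): 24, ("D", "E"): 24,
}
for x, y in combinations("ABCDE", 2):
    print(f"{x}-{y}: matrix {matrix[(x, y)]}, tree {path_length(x, y):.2f}")
```

The tree distances are close to, but not identical with, the matrix distances, because the input matrix is not perfectly additive.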
References