Week 5: Distance methods, DNA and protein models

Size: px
Start display at page:

Download "Week 5: Distance methods, DNA and protein models"

Transcription

1 Week 5: Distance methods, DNA and protein models Genome 570 February, 2016 Week 5: Distance methods, DNA and protein models p.1/69

2 A tree and the expected distances it predicts E A C B 0.10 D A B C D E A B C D E The predicted distances are the sums of branch lengths between those two species. Week 5: Distance methods, DNA and protein models p.2/69

3 A tree and a set of two-species trees E A C B 0.10 D E A C D B The two-species trees correspond to the pairwise distances observed between the pairs of species. The tree also predicts two-species trees, because clipping off all other species you are left simply with the path between that pair of species. Week 5: Distance methods, DNA and protein models p.3/69

4 A tree with branch lengths A v 1 v 2 B E v 5 v 6 v 7 v 3 v 4 C D Distance matrix methods always infer trees that have branch lengths, and they assume models of change of the characters (sequences). Week 5: Distance methods, DNA and protein models p.4/69

5 Distance matrix methods observed distances calculated from the sequence data A B C D E F A B C D E F Find the tree which comes closest to predicting the observed pairwise distances compare A C Each possible tree (with branch lengths) predicts pairwise distances F D B E A B C D E F A B C D E F Week 5: Distance methods, DNA and protein models p.5/69

6 The math of least squares trees Q = n w ij (D ij d ij ) 2 i=1 j i d ij = k x ij,k v k Q = dq dv k = 2 ( n w ij D ij j i k i=1 ( n w ij x ij,k D ij j i k i=1 x ij,k v k ) 2. x ij,k v k ) = 0. Equations to solve to infer branch lengths by least squares on a given tree topology, which specifies the x ij,k. Week 5: Distance methods, DNA and protein models p.6/69

7 Solving for least squares branch lengths D AB + D AC + D AD + D AE = 4v 1 + v 2 + v 3 + v 4 + v 5 + 2v 6 + 2v 7 D AB + D BC + D BD + D BE = v 1 + 4v 2 + v 3 + v 4 + v 5 + 2v 6 + 3v 7 D AC + D BC + D CD + D CE = v 1 + v 2 + 4v 3 + v 4 + v 5 + 3v 6 + 2v 7 D AD + D BD + D CD + D DE = v 1 + v 2 + v 3 + 5v 4 + v 5 + 2v 6 + 3v 7 D AE + D BE + D CE + D DE = v 1 + v 2 + v 3 + v 4 + 4v 5 + 3v 6 + 2v 7 D AC + D AE + D BC +D BE + D CD + D DE = 2v 1 + 2v 2 + 3v 3 + 2v 4 + v 5 + 6v 6 + 4v 7 D AB + D AD + D BC +D BE + D CD + D DE = 2v 1 + 3v 2 + 3v 4 + 2v 5 + 4v 6 + 6v 7 Week 5: Distance methods, DNA and protein models p.7/69

8 A vector of all distances, stacked d = D AB D AC D AD D AE D BC D BD D BE D CD D CE D DE They are in lexicographic (dictionary) order. Week 5: Distance methods, DNA and protein models p.8/69

9 The design matrix and least squares equations X = So solution of equations is X T D = ( X T X ) v. v = (X T X) 1 X T D Week 5: Distance methods, DNA and protein models p.9/69

10 A diagonal matrix of weights W = w AB w AC w AD w AE w BC w BD w BE w CD w CE w DE, Week 5: Distance methods, DNA and protein models p.10/69

11 Weighted least squares equations X T WD = ( X T WX ) v, v = ( X T WX ) 1 X T WD. These matrix equations solve for weighted least squares estimates of the branch lengths on a given tree. The tree topology is specified by the design matrix X and the weights are the elements of the diagonal matrix W. Week 5: Distance methods, DNA and protein models p.11/69

12 A statistical justification for least squares SSQ = n i=1 j i (D ij E (D ij )) 2. Var(D ij ) This least squares method... is what we would get by standard statistical least squares approachesif the distances were normally distributed, independently varying, and had expectation and variance as shown... but they actually aren t independent in almost all cases (such as molecular sequences), but it can be shown that the estimate of the tree will be a consistent estimate in the case of non-independence, just not as efficient Week 5: Distance methods, DNA and protein models p.12/69

13 The Jukes-Cantor model A u/3 G u/3 u/3 u/3 u/3 C u/3 T The simplest and most symmetrical of models of DNA evolution. In a small interval of time dt the probability of change at a site is u dt, and it is equally likely to go to each of the other three bases. Week 5: Distance methods, DNA and protein models p.13/69

14 A simple derivation of the probabilities of net change Imagine a (fictional) kind of event that could change the base to one of the four possible bases (including the same base) with equal probability (instead of changing to one of the other three). Week 5: Distance methods, DNA and protein models p.14/69

15 A simple derivation of the probabilities of net change Imagine a (fictional) kind of event that could change the base to one of the four possible bases (including the same base) with equal probability (instead of changing to one of the other three). If you set the probability of change in that model to 4 3u dt, it would be indistinguishable from the actual Jukes-Cantor model. Week 5: Distance methods, DNA and protein models p.14/69

16 A simple derivation of the probabilities of net change Imagine a (fictional) kind of event that could change the base to one of the four possible bases (including the same base) with equal probability (instead of changing to one of the other three). If you set the probability of change in that model to 4 3u dt, it would be indistinguishable from the actual Jukes-Cantor model. If any nonzero number of these fictional events occur on a branch, the probability of ending up with (say) base C is 1 4. Week 5: Distance methods, DNA and protein models p.14/69

17 A simple derivation of the probabilities of net change Imagine a (fictional) kind of event that could change the base to one of the four possible bases (including the same base) with equal probability (instead of changing to one of the other three). If you set the probability of change in that model to 4 3u dt, it would be indistinguishable from the actual Jukes-Cantor model. If any nonzero number of these fictional events occur on a branch, the probability of ending up with (say) base C is 1 4. The number of these events that occur in a branch has Poisson distribution with expected number 4 3u t, so the probability of no event is e 4 3 ut Week 5: Distance methods, DNA and protein models p.14/69

18 The Jukes-Cantor model Probability of no event: Probability of some event: e 4 3 ut 1 e 4 3 ut Probability of changing to C given start at A, have rate u, time t: Prob (C A, u, t) = 1 (1 e 4ut) 3 4 fraction of sites different: f D = 3 4 ) (1 e 4 3 ut. Solving, the distance as function of the fraction of sites different is D = ût = 3 4 ln (1 4 3 f D ) Week 5: Distance methods, DNA and protein models p.15/69

19 differences per site Fraction of sites different versus branch length (1 e 3 r t ) branch length... as predicted by the Jukes-Cantor model. Week 5: Distance methods, DNA and protein models p.16/69

20 Branch length versus fraction of sites different ( (diff)) ln 1 3 branch length differences per site 1 Week 5: Distance methods, DNA and protein models p.17/69

21 If you don t correct for multiple changes B B A The true (unrooted) tree C A What we estimate C This happening because the long path between A and C is shortened more, proportionally, than the shorter paths between A and B and between C and B. The only way to do this is to put in a spurious branch leading to B. In effect, there is a war between the long paths and the short paths. With properly corrected distances, each path approaches its true length when we look at a very large number of sites. Week 5: Distance methods, DNA and protein models p.18/69

22 Least squares methods These are just differently weighted least squares methods, as mentioned previously. Fitch and Margoliash (1967) used a weight of 1/D 2 ij, so the quantity to be minimized is (D ij d ij ) 2 Q = ij D 2 ij Cavalli-Sforza and Edwards (1967) used the unweighted least squares method: Q = (D ij d ij ) 2 ij These amount to different assumptions about the how the size of a distance will affect its variance. Week 5: Distance methods, DNA and protein models p.19/69

23 The Minimum Evolution method Kidd and Sgaramella-Zonta (1971) and (independently) Rzhetsky and Nei (1992ff.) came up with this method: Search through tree space as usual For each tree estimate branch lengths by least squares, not allowing negative branch lengths Then actually evaluate the treenot by the sum of squares, but by the total sum of branch lengths. This does fairly well, in spite of the mixture of two optimization criteria. Note that it isnot directly related to parsimony, in spite of its name. Week 5: Distance methods, DNA and protein models p.20/69

24 The UPGMA algorithm (ij) The UPGMA algorithm 1. Assign each species weight w i 2. Pick the values of i and j that are for the smallest of the D ij 3. Make a new group (ij) = 1 5. Its weight is w + w i j 6. The distance between (ij) and another (say k) is computed as w D + w D i ik j jk D = (ij),k j i D ik D (ij),k 4. Assign time depth D ij / 2 to the connection between i and j. D jk k w + w i j 7. Delete i and j from the table, put in a row and column for (ij). 8. If there is only one group left, stop. Otherwise go to step 2. Note that the weighting in effect weights each of the original tips equally, so that a group s distance to an outside species or another group is the average of all the distances between species in one and species in the other. This is a natural unweighted choice. Week 5: Distance methods, DNA and protein models p.21/69

25 Sarich 1969, immunological distances dog bear raccoon weasel seal sea lion cat monkey dog bear raccoon weasel seal sea lion cat monkey Week 5: Distance methods, DNA and protein models p.22/69

26 Find smallest element, its rows, columns dog bear raccoon weasel seal sea lion cat monkey dog bear raccoon weasel seal sea lion cat monkey Week 5: Distance methods, DNA and protein models p.23/69

27 We do the following averaging dog bear raccoon weasel seal sea lion cat monkey dog ( )/ bear ( )/ raccoon ( )/ weasel ( )/ seal sea lion 24 0 cat ( )/ monkey ( )/ Week 5: Distance methods, DNA and protein models p.24/69

28 Clustering seal and sea lion dog bear raccoon seal sea lion weasel cat monkey Week 5: Distance methods, DNA and protein models p.25/69

29 After clustering seal and sea lion dog bear raccoon weasel SS cat monkey dog bear raccoon weasel SS cat monkey Week 5: Distance methods, DNA and protein models p.26/69

30 Clustering bear and racoon dog bear raccoon seal sea lion weasel cat monkey Week 5: Distance methods, DNA and protein models p.27/69

31 After clustering bear and raccoon dog BR weasel SS cat monkey dog BR weasel SS cat monkey Week 5: Distance methods, DNA and protein models p.28/69

32 Clustering bear-raccoon with seal-sealion dog bear raccoon seal sea lion weasel cat monkey Week 5: Distance methods, DNA and protein models p.29/69

33 After clustering those two clusters dog BRSS weasel cat monkey dog BRSS weasel cat monkey Week 5: Distance methods, DNA and protein models p.30/69

34 Clustering weasel with BRSS dog bear raccoon seal sea lion weasel cat monkey Week 5: Distance methods, DNA and protein models p.31/69

35 After adding weasel to that cluster dog BRSSW cat monkey dog BRSSW cat monkey Week 5: Distance methods, DNA and protein models p.32/69

36 Clustering the dog with BRSSW dog bear raccoon seal sea lion weasel cat monkey Week 5: Distance methods, DNA and protein models p.33/69

37 After adding dog to it DBRWSS cat monkey DBRWSS cat monkey Week 5: Distance methods, DNA and protein models p.34/69

38 Clustering all the carnivores and pinnipeds dog bear raccoon seal sea lion weasel cat monkey Week 5: Distance methods, DNA and protein models p.35/69

39 Finally, just monkey remaining DBRWSSC monkey DBRWSSC monkey Week 5: Distance methods, DNA and protein models p.36/69

40 The UPGMA tree dog bear raccoon seal sea lion weasel cat monkey Week 5: Distance methods, DNA and protein models p.37/69

41 UPGMA can mislead 13 A True tree B C D 10 True distance matrix A B C D A B C D UPGMA tree B C D A Week 5: Distance methods, DNA and protein models p.38/69

42 Neighbor-Joining For each tip, compute u i = n j:j i D ij/(n 2). Note that the denominator is (deliberately) not the number of items summed. Choose the i and j for which D ij u i u j is smallest. Join items i and j. Compute the branch length from i to the new node (v i ) and from j to the new node (v j ) as v i = 1 2 D ij (u i u j ) v j = 1 2 D ij (u j u i ) (continued on next slide) Week 5: Distance methods, DNA and protein models p.39/69

43 continued... Compute the distance between the new node (ij) and each of the remaining tips as D (ij),k = (D ik + D jk D ij ) Delete tips i and j from the tables and replace them by the new node, (ij), which is now treated as a tip. If more than two nodes remain, go back to step 1. Otherwise, connect the two remaining nodes (say, l and m ) by a branch of length D lm. / 2 Week 5: Distance methods, DNA and protein models p.40/69

44 The NJ star decomposition i v i v j j (ij) k Week 5: Distance methods, DNA and protein models p.41/69

45 The NJ tree on the Sarich data dog bear raccoon seal sea lion weasel cat monkey Same as UPGMA? No. Not clocklike Week 5: Distance methods, DNA and protein models p.42/69

46 Unweighted least squares on the Sarich data monkey cat weasel dog raccoon bear sea lion seal Same as Neighbor-Joining tree? Close but not the same. Week 5: Distance methods, DNA and protein models p.43/69

47 Fitch-Margoliash tree on the Sarich (1969) data monkey cat weasel dog raccoon bear sea lion seal Same as the unweighted least squares tree? No. Week 5: Distance methods, DNA and protein models p.44/69

48 Minimum evolution tree on the Sarich (1969) data monkey cat weasel raccoon dog bear sea lion seal Same as either least squares tree? As NJ tree? No. Week 5: Distance methods, DNA and protein models p.45/69

49 Kimura 2-parameter model A α G β β β β C α T The simplest, most symmetrical model that has different rates for transitions than for transversions. Week 5: Distance methods, DNA and protein models p.46/69

50 Parameters of K2P in terms of Ts/Tn ratio If we let R be the expected ratio of transition changes to transversions, it turns out that α = R R+1 β = ( ) R+1 Week 5: Distance methods, DNA and protein models p.47/69

51 Transition [sic] probabilities for K2P Computing the net probabilities of change ( transition in the stochastic process sense rather than as molecular biologists use it), it turns out that ) Prob (transition t) = ( exp 2R+1 R+1 t Prob (transversion t) = exp ( 2 R+1 t ) exp ( 2 R+1 t ) Week 5: Distance methods, DNA and protein models p.48/69

52 Transition and transversion when R = 10 Differences Total differences Transitions 0.20 Transversions Time (branch length) R = 10 Week 5: Distance methods, DNA and protein models p.49/69

53 Transition and transversion when R = Total differences 0.60 Differences Transversions Transitions R = Time (branch length) Week 5: Distance methods, DNA and protein models p.50/69

54 ML estimates for the K2P model The sufficient statistics for comparison of DNA sequences under the K2P model are simply the observed fractions P and Q of transitions and transversions. Then the maximum likelihood estimates of the branch length t and of R are obtained by finding the values that predict exactly those fractions. That done by solving the equations given three slides ago. t = 1 4 ln [ (1 2Q)(1 2P Q) 2] R = ln(1 2P Q) ln(1 2Q) 1 2 Week 5: Distance methods, DNA and protein models p.51/69

55 Likelihood for two species under the K2P model L = Prob (data t, R) = ( ) 1 n 4 (1 P Q) n n 1 n 2 P n 1 ( 1 2 Q) n 2 where n 1 is the number of sites differing by transitions, and n 2 is the number of sites differing by transversions. P and Q are the expected fractions of transition and transversion differences, as given by the expressions four screens above. This is a function of the two parameters t and R. The values that maximize it were given above, if we estimate both of them. But if R is given rather than estimated, there is no closed-form equation solving for t, it has to be inferred numerically by finding the value of t that maximizes L. Week 5: Distance methods, DNA and protein models p.52/69

56 The Tamura/Nei model, F84, and HKY To : A G C T From : A α R π G /π R + βπ G βπ C βπ T G α R π A /π R + βπ A βπ C βπ T C βπ A βπ G α Y π T /π Y + βπ T T βπ A βπ G α Y π C /π Y + βπ C For the F84 model, α R = α Y For the HKY model, α R /α Y = π R /π Y Week 5: Distance methods, DNA and protein models p.53/69

57 Transition/transversion ratio for the Tamura-Nei model T s = 2 α R π A π G / π R + 2 α Y π C π T / π Y + β ( π A π G + 2π C π T ) T v = 2 β π R π Y To get T s /T v = R and T s + T v = 1, β = 1 2π R π Y (1 + R) α Y = π R π Y R π A π G π C π T (1 + R) (π Y π A π G ρ + π R π C π T ) α R = ρ α Y Week 5: Distance methods, DNA and protein models p.54/69

58 Using fictional events to mimic the Tamura-Nei model We imagine two types of events: Type I: If the existing base is a purine, draw a replacement from a purine pool with bases in relative proportions π A : π G. This event has rate α R. If the existing base is a pyrimidine, draw a replacement from a pyrimidine pool with bases in relative proportions π C : π T. This event has rate α Y. Type II: No matter what the existing base is, replace it by a base drawn from a pool at the overall equilibrium frequencies: π A : π C : π G : π T. This event has rate β. Week 5: Distance methods, DNA and protein models p.55/69

59 Transition [sic] probabilities with the Tamura-Nei model If the branch starts with a purine: No events exp( (α R + β)t) Some type I, no type II exp( β t) (1 exp( α R t)) Some type II 1 exp( β t) If the branch starts with a pyrimidine: No events exp( (α Y + β) t) Some type I, no type II exp( β t) (1 exp( α Y t)) Some type II 1 exp( β t) Week 5: Distance methods, DNA and protein models p.56/69

60 A transition probability So if we want to compute the probability of getting a G given that a branch starts with an A, we add up The probability of no events, times 0 (as you can t get a G from an A with no events) The probability of some type I, no type II times π G /π Y (as the last type I event puts in a G with probability equal to the fraction of G s out of all purines). The probability of some type II times π G (as if there is any type II event, we thereafter have a probability of G equal to its overall expected frequency, and further type I events don t change that). Week 5: Distance methods, DNA and protein models p.57/69

61 A transition probability So that, for example Prob (G A, t) = exp( β t) (1 exp( α R t)) π G π R + (1 exp( β t)) π G Week 5: Distance methods, DNA and protein models p.58/69

62 A more compact expression More generally, we can use the Kronecker delta notation δ ij and the Watson-Kronecker notation ε ij to write Prob (j i, t) = exp( (α i + β)t) δ ij +exp( βt) (1 exp( α i t)) ( ) πj ε P ij k ε jkπ k +(1 exp( βt))π j where δ ij is 1 if the two bases i and j are different (0 otherwise), and ε ij is 1 if, of the two bases i and j, one is a purine and one is a pyrimidine (0 otherwise). Week 5: Distance methods, DNA and protein models p.59/69

63 Reversibility and the GTR model The condition ofreversibility in a stochastic process is written in terms of the equilibrium frequences of the states ( π i and the transition probabilities as π i Prob (j i, t) = π j Prob (i j, t) The general time-reversible model: To : A G C T From : A π G α π C β π T γ G π A α π C δ π T ε C π A β π G δ π T η T π A γ π G ε π C η Week 5: Distance methods, DNA and protein models p.60/69

64 The GTR model βπ A A βπ C γπ γπ A T απ απ G δπ A G T G δπ επ C επ G C ηπ T ηπ C T It is the most general model possible that still has the changes satisfy the condition of reversibility so that one cannot tell which direction evolution has gone by examining the sequences before and after. Week 5: Distance methods, DNA and protein models p.61/69

65 Standardizing the rates 2π A π G α + 2 π A π C β + 2 π A π T γ + 2 π G π C δ + 2 π G π T ε + 2 π C π T η = 1 This is done so that the expected rate of change is 1 per unit time. Week 5: Distance methods, DNA and protein models p.62/69

66 General Time Reversible models inference A data example (simulated under a K2P model, true distance 0.2 transition/transversion ratio = 2 A G C T total A G C T total The K2P model is a special case of the GTR model (as are all the other models that have been mentioned here). So if all goes well we should infer parameters that come close to specifying a K2P model. Week 5: Distance methods, DNA and protein models p.63/69

67 Averaging across the diagonal... A G C T total A G C T total Week 5: Distance methods, DNA and protein models p.64/69

68 Dividing each column by its sum (column, because P ij is to be the probability of change from j to i ) ˆP = Week 5: Distance methods, DNA and protein models p.65/69

69 Rate matrix from the matrix logarithm If the rate matrix is A, so that P = e A t ) Ât = log (ˆP = Week 5: Distance methods, DNA and protein models p.66/69

70 Standardizing the rates If we denote by ˆD the diagonal matrix of observed base frequencies, and we require that the rate of (potentially-observable) substitution is 1: trace(âˆd) = 1 We get: ˆt = trace(ât ˆD) = trace ( ) log(ˆp) ˆD and that also gives us an estimate of the rate matrix: ) / ( )  = log (ˆP trace log(ˆp) ˆD Week 5: Distance methods, DNA and protein models p.67/69

71 The rate estimates  = moderately close to the actual K2P rate matrix used in the simulation which was: A = 1 2/3 1/6 1/6 2/3 1 1/6 1/6 1/6 1/6 1 2/3 1/6 1/6 2/3 1 (but if any of the eigenvalues of log(ˆp) are negative, this doesn t work and the divergence time is estimated to be infinite). Week 5: Distance methods, DNA and protein models p.68/69

72 The lattice of these models General 12 parameter model (12) General time reversible model (9) Tamura Nei (6) HKY (5) F84 (5) Kimura K2P (2) Jukes Cantor (1) There are a great many other models, but these are the most commonly used. They allow for unequal expected frequencies of bases, and for inequalities of transitions and transversions. Week 5: Distance methods, DNA and protein models p.69/69

Lecture 4. Models of DNA and protein change. Likelihood methods

Lecture 4. Models of DNA and protein change. Likelihood methods Lecture 4. Models of DNA and protein change. Likelihood methods Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 4. Models of DNA and protein change. Likelihood methods p.1/36

More information

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University Phylogenetics: Distance Methods COMP 571 - Spring 2015 Luay Nakhleh, Rice University Outline Evolutionary models and distance corrections Distance-based methods Evolutionary Models and Distance Correction

More information

Phylogenetics: Building Phylogenetic Trees

Phylogenetics: Building Phylogenetic Trees 1 Phylogenetics: Building Phylogenetic Trees COMP 571 Luay Nakhleh, Rice University 2 Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary model should

More information

Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22

Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22 Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 24. Phylogeny methods, part 4 (Models of DNA and

More information

Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26

Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26 Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 27. Phylogeny methods, part 4 (Models of DNA and

More information

Phylogeny: traditional and Bayesian approaches

Phylogeny: traditional and Bayesian approaches Phylogeny: traditional and Bayesian approaches 5-Feb-2014 DEKM book Notes from Dr. B. John Holder and Lewis, Nature Reviews Genetics 4, 275-284, 2003 1 Phylogeny A graph depicting the ancestor-descendent

More information

Phylogenetic trees 07/10/13

Phylogenetic trees 07/10/13 Phylogenetic trees 07/10/13 A tree is the only figure to occur in On the Origin of Species by Charles Darwin. It is a graphical representation of the evolutionary relationships among entities that share

More information

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University Phylogenetics: Building Phylogenetic Trees COMP 571 - Fall 2010 Luay Nakhleh, Rice University Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary

More information

EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

More information

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive. Additive distances Let T be a tree on leaf set S and let w : E R + be an edge-weighting of T, and assume T has no nodes of degree two. Let D ij = e P ij w(e), where P ij is the path in T from i to j. Then

More information

Consistency Index (CI)

Consistency Index (CI) Consistency Index (CI) minimum number of changes divided by the number required on the tree. CI=1 if there is no homoplasy negatively correlated with the number of species sampled Retention Index (RI)

More information

Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018

Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018 Maximum Likelihood Tree Estimation Carrie Tribble IB 200 9 Feb 2018 Outline 1. Tree building process under maximum likelihood 2. Key differences between maximum likelihood and parsimony 3. Some fancy extras

More information

Week 8: Testing trees, Bootstraps, jackknifes, gene frequencies

Week 8: Testing trees, Bootstraps, jackknifes, gene frequencies Week 8: Testing trees, ootstraps, jackknifes, gene frequencies Genome 570 ebruary, 2016 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.1/69 density e log (density) Normal distribution:

More information

Phylogenetic Tree Reconstruction

Phylogenetic Tree Reconstruction I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

More information

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the

More information

AN ALTERNATING LEAST SQUARES APPROACH TO INFERRING PHYLOGENIES FROM PAIRWISE DISTANCES

AN ALTERNATING LEAST SQUARES APPROACH TO INFERRING PHYLOGENIES FROM PAIRWISE DISTANCES Syst. Biol. 46(l):11-lll / 1997 AN ALTERNATING LEAST SQUARES APPROACH TO INFERRING PHYLOGENIES FROM PAIRWISE DISTANCES JOSEPH FELSENSTEIN Department of Genetics, University of Washington, Box 35736, Seattle,

More information

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics Bioinformatics 1 Biology, Sequences, Phylogenetics Part 4 Sepp Hochreiter Klausur Mo. 30.01.2011 Zeit: 15:30 17:00 Raum: HS14 Anmeldung Kusss Contents Methods and Bootstrapping of Maximum Methods Methods

More information

Phylogenetic inference

Phylogenetic inference Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

Tree of Life iological Sequence nalysis Chapter http://tolweb.org/tree/ Phylogenetic Prediction ll organisms on Earth have a common ancestor. ll species are related. The relationship is called a phylogeny

More information

Letter to the Editor. Department of Biology, Arizona State University

Letter to the Editor. Department of Biology, Arizona State University Letter to the Editor Traditional Phylogenetic Reconstruction Methods Reconstruct Shallow and Deep Evolutionary Relationships Equally Well Michael S. Rosenberg and Sudhir Kumar Department of Biology, Arizona

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

Evolutionary Tree Analysis. Overview

Evolutionary Tree Analysis. Overview CSI/BINF 5330 Evolutionary Tree Analysis Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Backgrounds Distance-Based Evolutionary Tree Reconstruction Character-Based

More information

Lecture 4. Models of DNA and protein change. Likelihood methods

Lecture 4. Models of DNA and protein change. Likelihood methods Lecture 4. Models of DNA and protein change. Likelihood methods Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 4. Models of DNA and protein change. Likelihood methods p.1/39

More information

Mutation models I: basic nucleotide sequence mutation models

Mutation models I: basic nucleotide sequence mutation models Mutation models I: basic nucleotide sequence mutation models Peter Beerli September 3, 009 Mutations are irreversible changes in the DNA. This changes may be introduced by chance, by chemical agents, or

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

Week 7: Bayesian inference, Testing trees, Bootstraps

Week 7: Bayesian inference, Testing trees, Bootstraps Week 7: ayesian inference, Testing trees, ootstraps Genome 570 May, 2008 Week 7: ayesian inference, Testing trees, ootstraps p.1/54 ayes Theorem onditional probability of hypothesis given data is: Prob

More information

Molecular Evolution and Phylogenetic Tree Reconstruction

Molecular Evolution and Phylogenetic Tree Reconstruction 1 4 Molecular Evolution and Phylogenetic Tree Reconstruction 3 2 5 1 4 2 3 5 Orthology, Paralogy, Inparalogs, Outparalogs Phylogenetic Trees Nodes: species Edges: time of independent evolution Edge length

More information

Lab 9: Maximum Likelihood and Modeltest

Lab 9: Maximum Likelihood and Modeltest Integrative Biology 200A University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS" Spring 2010 Updated by Nick Matzke Lab 9: Maximum Likelihood and Modeltest In this lab we re going to use PAUP*

More information

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Phylogenetic Analysis Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Outline Basic Concepts Tree Construction Methods Distance-based methods

More information

Math 239: Discrete Mathematics for the Life Sciences Spring Lecture 14 March 11. Scribe/ Editor: Maria Angelica Cueto/ C.E.

Math 239: Discrete Mathematics for the Life Sciences Spring Lecture 14 March 11. Scribe/ Editor: Maria Angelica Cueto/ C.E. Math 239: Discrete Mathematics for the Life Sciences Spring 2008 Lecture 14 March 11 Lecturer: Lior Pachter Scribe/ Editor: Maria Angelica Cueto/ C.E. Csar 14.1 Introduction The goal of today s lecture

More information

Massachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution

Massachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution Massachusetts Institute of Technology 6.877 Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution 1. Rates of amino acid replacement The initial motivation for the neutral

More information

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree) I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

More information

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution Today s topics Inferring phylogeny Introduction! Distance methods! Parsimony method!"#$%&'(!)* +,-.'/01!23454(6!7!2845*0&4'9#6!:&454(6 ;?@AB=C?DEF Overview of phylogenetic inferences Methodology Methods

More information

Phylogenetics. Andreas Bernauer, March 28, Expected number of substitutions using matrix algebra 2

Phylogenetics. Andreas Bernauer, March 28, Expected number of substitutions using matrix algebra 2 Phylogenetics Andreas Bernauer, andreas@carrot.mcb.uconn.edu March 28, 2004 Contents 1 ts:tr rate ratio vs. ts:tr ratio 1 2 Expected number of substitutions using matrix algebra 2 3 Why the GTR model can

More information

Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences

Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences Mathematical Statistics Stockholm University Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences Bodil Svennblad Tom Britton Research Report 2007:2 ISSN 650-0377 Postal

More information

Substitution = Mutation followed. by Fixation. Common Ancestor ACGATC 1:A G 2:C A GAGATC 3:G A 6:C T 5:T C 4:A C GAAATT 1:G A

Substitution = Mutation followed. by Fixation. Common Ancestor ACGATC 1:A G 2:C A GAGATC 3:G A 6:C T 5:T C 4:A C GAAATT 1:G A GAGATC 3:G A 6:C T Common Ancestor ACGATC 1:A G 2:C A Substitution = Mutation followed 5:T C by Fixation GAAATT 4:A C 1:G A AAAATT GAAATT GAGCTC ACGACC Chimp Human Gorilla Gibbon AAAATT GAAATT GAGCTC ACGACC

More information

Theory of Evolution Charles Darwin

Theory of Evolution Charles Darwin Theory of Evolution Charles arwin 858-59: Origin of Species 5 year voyage of H.M.S. eagle (83-36) Populations have variations. Natural Selection & Survival of the fittest: nature selects best adapted varieties

More information

DNA Phylogeny. Signals and Systems in Biology Kushal EE, IIT Delhi

DNA Phylogeny. Signals and Systems in Biology Kushal EE, IIT Delhi DNA Phylogeny Signals and Systems in Biology Kushal Shah @ EE, IIT Delhi Phylogenetics Grouping and Division of organisms Keeps changing with time Splitting, hybridization and termination Cladistics :

More information

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057 Estimating Phylogenies (Evolutionary Trees) II Biol4230 Thurs, March 2, 2017 Bill Pearson wrp@virginia.edu 4-2818 Jordan 6-057 Tree estimation strategies: Parsimony?no model, simply count minimum number

More information

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis

More information

CSCI1950 Z Computa4onal Methods for Biology Lecture 5

CSCI1950 Z Computa4onal Methods for Biology Lecture 5 CSCI1950 Z Computa4onal Methods for Biology Lecture 5 Ben Raphael February 6, 2009 hip://cs.brown.edu/courses/csci1950 z/ Alignment vs. Distance Matrix Mouse: ACAGTGACGCCACACACGT Gorilla: CCTGCGACGTAACAAACGC

More information

Building Phylogenetic Trees UPGMA & NJ

Building Phylogenetic Trees UPGMA & NJ uilding Phylogenetic Trees UPGM & NJ UPGM UPGM Unweighted Pair-Group Method with rithmetic mean Unweighted = all pairwise distances contribute equally. Pair-Group = groups are combined in pairs. rithmetic

More information

Reading for Lecture 13 Release v10

Reading for Lecture 13 Release v10 Reading for Lecture 13 Release v10 Christopher Lee November 15, 2011 Contents 1 Evolutionary Trees i 1.1 Evolution as a Markov Process...................................... ii 1.2 Rooted vs. Unrooted Trees........................................

More information

BMI/CS 776 Lecture 4. Colin Dewey

BMI/CS 776 Lecture 4. Colin Dewey BMI/CS 776 Lecture 4 Colin Dewey 2007.02.01 Outline Common nucleotide substitution models Directed graphical models Ancestral sequence inference Poisson process continuous Markov process X t0 X t1 X t2

More information

Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment

Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment Sequence Analysis 17: lecture 5 Substitution matrices Multiple sequence alignment Substitution matrices Used to score aligned positions, usually of amino acids. Expressed as the log-likelihood ratio of

More information

BINF6201/8201. Molecular phylogenetic methods

BINF6201/8201. Molecular phylogenetic methods BINF60/80 Molecular phylogenetic methods 0-7-06 Phylogenetics Ø According to the evolutionary theory, all life forms on this planet are related to one another by descent. Ø Traditionally, phylogenetics

More information

Phylogene)cs. IMBB 2016 BecA- ILRI Hub, Nairobi May 9 20, Joyce Nzioki

Phylogene)cs. IMBB 2016 BecA- ILRI Hub, Nairobi May 9 20, Joyce Nzioki Phylogene)cs IMBB 2016 BecA- ILRI Hub, Nairobi May 9 20, 2016 Joyce Nzioki Phylogenetics The study of evolutionary relatedness of organisms. Derived from two Greek words:» Phle/Phylon: Tribe/Race» Genetikos:

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally

More information

Markov Chains. Sarah Filippi Department of Statistics TA: Luke Kelly

Markov Chains. Sarah Filippi Department of Statistics  TA: Luke Kelly Markov Chains Sarah Filippi Department of Statistics http://www.stats.ox.ac.uk/~filippi TA: Luke Kelly With grateful acknowledgements to Prof. Yee Whye Teh's slides from 2013 14. Schedule 09:30-10:30 Lecture:

More information

Algorithms in Bioinformatics

Algorithms in Bioinformatics Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Distance Methods Character Methods

More information

Phylogeny Estimation and Hypothesis Testing using Maximum Likelihood

Phylogeny Estimation and Hypothesis Testing using Maximum Likelihood Phylogeny Estimation and Hypothesis Testing using Maximum Likelihood For: Prof. Partensky Group: Jimin zhu Rama Sharma Sravanthi Polsani Xin Gong Shlomit klopman April. 7. 2003 Table of Contents Introduction...3

More information

Minimum evolution using ordinary least-squares is less robust than neighbor-joining

Minimum evolution using ordinary least-squares is less robust than neighbor-joining Minimum evolution using ordinary least-squares is less robust than neighbor-joining Stephen J. Willson Department of Mathematics Iowa State University Ames, IA 50011 USA email: swillson@iastate.edu November

More information

Maximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington.

Maximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington. Maximum Likelihood This presentation is based almost entirely on Peter G. Fosters - "The Idiot s Guide to the Zen of Likelihood in a Nutshell in Seven Days for Dummies, Unleashed. http://www.bioinf.org/molsys/data/idiots.pdf

More information

X X (2) X Pr(X = x θ) (3)

X X (2) X Pr(X = x θ) (3) Notes for 848 lecture 6: A ML basis for compatibility and parsimony Notation θ Θ (1) Θ is the space of all possible trees (and model parameters) θ is a point in the parameter space = a particular tree

More information

Phylogenetics. BIOL 7711 Computational Bioscience

Phylogenetics. BIOL 7711 Computational Bioscience Consortium for Comparative Genomics! University of Colorado School of Medicine Phylogenetics BIOL 7711 Computational Bioscience Biochemistry and Molecular Genetics Computational Bioscience Program Consortium

More information

CSCI1950 Z Computa4onal Methods for Biology Lecture 4. Ben Raphael February 2, hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary

CSCI1950 Z Computa4onal Methods for Biology Lecture 4. Ben Raphael February 2, hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary CSCI1950 Z Computa4onal Methods for Biology Lecture 4 Ben Raphael February 2, 2009 hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary Parsimony Probabilis4c Method Input Output Sankoff s & Fitch

More information

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

More information

Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30

Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30 Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30 A non-phylogeny

More information

Reconstruire le passé biologique modèles, méthodes, performances, limites

Reconstruire le passé biologique modèles, méthodes, performances, limites Reconstruire le passé biologique modèles, méthodes, performances, limites Olivier Gascuel Centre de Bioinformatique, Biostatistique et Biologie Intégrative C3BI USR 3756 Institut Pasteur & CNRS Reconstruire

More information

Algorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004,

Algorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004, Tracing the Evolution of Numerical Phylogenetics: History, Philosophy, and Significance Adam W. Ferguson Phylogenetic Systematics 26 January 2009 Inferring Phylogenies Historical endeavor Darwin- 1837

More information

Evolutionary Trees. Evolutionary tree. To describe the evolutionary relationship among species A 3 A 2 A 4. R.C.T. Lee and Chin Lung Lu

Evolutionary Trees. Evolutionary tree. To describe the evolutionary relationship among species A 3 A 2 A 4. R.C.T. Lee and Chin Lung Lu Evolutionary Trees R.C.T. Lee and Chin Lung Lu CS 5313 Algorithms for Molecular Biology Evolutionary Trees p.1 Evolutionary tree To describe the evolutionary relationship among species Root A 3 Bifurcating

More information

CS5263 Bioinformatics. Guest Lecture Part II Phylogenetics

CS5263 Bioinformatics. Guest Lecture Part II Phylogenetics CS5263 Bioinformatics Guest Lecture Part II Phylogenetics Up to now we have focused on finding similarities, now we start focusing on differences (dissimilarities leading to distance measures). Identifying

More information

Evolutionary trees. Describe the relationship between objects, e.g. species or genes

Evolutionary trees. Describe the relationship between objects, e.g. species or genes Evolutionary trees Bonobo Chimpanzee Human Neanderthal Gorilla Orangutan Describe the relationship between objects, e.g. species or genes Early evolutionary studies The evolutionary relationships between

More information

Bioinformatics 1 -- lecture 9. Phylogenetic trees Distance-based tree building Parsimony

Bioinformatics 1 -- lecture 9. Phylogenetic trees Distance-based tree building Parsimony ioinformatics -- lecture 9 Phylogenetic trees istance-based tree building Parsimony (,(,(,))) rees can be represented in "parenthesis notation". Each set of parentheses represents a branch-point (bifurcation),

More information

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz Phylogenetic Trees What They Are Why We Do It & How To Do It Presented by Amy Harris Dr Brad Morantz Overview What is a phylogenetic tree Why do we do it How do we do it Methods and programs Parallels

More information

The least-squares approach to phylogenetics was first suggested

The least-squares approach to phylogenetics was first suggested Combinatorics of least-squares trees Radu Mihaescu and Lior Pachter Departments of Mathematics and Computer Science, University of California, Berkeley, CA 94704; Edited by Peter J. Bickel, University

More information

Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM).

Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM). 1 Bioinformatics: In-depth PROBABILITY & STATISTICS Spring Semester 2011 University of Zürich and ETH Zürich Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM). Dr. Stefanie Muff

More information

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9 Lecture 5 Alignment I. Introduction. For sequence data, the process of generating an alignment establishes positional homologies; that is, alignment provides the identification of homologous phylogenetic

More information

Phylogeny: building the tree of life

Phylogeny: building the tree of life Phylogeny: building the tree of life Dr. Fayyaz ul Amir Afsar Minhas Department of Computer and Information Sciences Pakistan Institute of Engineering & Applied Sciences PO Nilore, Islamabad, Pakistan

More information

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5. Five Sami Khuri Department of Computer Science San José State University San José, California, USA sami.khuri@sjsu.edu v Distance Methods v Character Methods v Molecular Clock v UPGMA v Maximum Parsimony

More information

Lecture Notes: Markov chains

Lecture Notes: Markov chains Computational Genomics and Molecular Biology, Fall 5 Lecture Notes: Markov chains Dannie Durand At the beginning of the semester, we introduced two simple scoring functions for pairwise alignments: a similarity

More information

Molecular Evolution, course # Final Exam, May 3, 2006

Molecular Evolution, course # Final Exam, May 3, 2006 Molecular Evolution, course #27615 Final Exam, May 3, 2006 This exam includes a total of 12 problems on 7 pages (including this cover page). The maximum number of points obtainable is 150, and at least

More information

Inferring Phylogenetic Trees. Distance Approaches. Representing distances. in rooted and unrooted trees. The distance approach to phylogenies

Inferring Phylogenetic Trees. Distance Approaches. Representing distances. in rooted and unrooted trees. The distance approach to phylogenies Inferring Phylogenetic Trees Distance Approaches Representing distances in rooted and unrooted trees The distance approach to phylogenies given: an n n matrix M where M ij is the distance between taxa

More information

CS5238 Combinatorial methods in bioinformatics 2003/2004 Semester 1. Lecture 8: Phylogenetic Tree Reconstruction: Distance Based - October 10, 2003

CS5238 Combinatorial methods in bioinformatics 2003/2004 Semester 1. Lecture 8: Phylogenetic Tree Reconstruction: Distance Based - October 10, 2003 CS5238 Combinatorial methods in bioinformatics 2003/2004 Semester 1 Lecture 8: Phylogenetic Tree Reconstruction: Distance Based - October 10, 2003 Lecturer: Wing-Kin Sung Scribe: Ning K., Shan T., Xiang

More information

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis

More information

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016 Molecular phylogeny - Using molecular sequences to infer evolutionary relationships Tore Samuelsson Feb 2016 Molecular phylogeny is being used in the identification and characterization of new pathogens,

More information

A (short) introduction to phylogenetics

A (short) introduction to phylogenetics A (short) introduction to phylogenetics Thibaut Jombart, Marie-Pauline Beugin MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis with PR Statistics, Millport Field

More information

Weighted Neighbor Joining: A Likelihood-Based Approach to Distance-Based Phylogeny Reconstruction

Weighted Neighbor Joining: A Likelihood-Based Approach to Distance-Based Phylogeny Reconstruction Weighted Neighbor Joining: A Likelihood-Based Approach to Distance-Based Phylogeny Reconstruction William J. Bruno,* Nicholas D. Socci, and Aaron L. Halpern *Theoretical Biology and Biophysics, Los Alamos

More information

C3020 Molecular Evolution. Exercises #3: Phylogenetics

C3020 Molecular Evolution. Exercises #3: Phylogenetics C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from

More information

Michael Yaffe Lecture #4 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D

Michael Yaffe Lecture #4 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D 7.91 Lecture #4 Database Searching & Molecular Phylogenetics Michael Yaffe A B C D A B C D (((A,B)C)D) Outline FASTA, Blast searching, Smith-Waterman Psi-Blast Review of enomic DNA structure Substitution

More information

Phylogenetic inference: from sequences to trees

Phylogenetic inference: from sequences to trees W ESTFÄLISCHE W ESTFÄLISCHE W ILHELMS -U NIVERSITÄT NIVERSITÄT WILHELMS-U ÜNSTER MM ÜNSTER VOLUTIONARY FUNCTIONAL UNCTIONAL GENOMICS ENOMICS EVOLUTIONARY Bioinformatics 1 Phylogenetic inference: from sequences

More information

Understanding relationship between homologous sequences

Understanding relationship between homologous sequences Molecular Evolution Molecular Evolution How and when were genes and proteins created? How old is a gene? How can we calculate the age of a gene? How did the gene evolve to the present form? What selective

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 2009 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

How should we go about modeling this? Model parameters? Time Substitution rate Can we observe time or subst. rate? What can we observe?

How should we go about modeling this? Model parameters? Time Substitution rate Can we observe time or subst. rate? What can we observe? How should we go about modeling this? gorilla GAAGTCCTTGAGAAATAAACTGCACACACTGG orangutan GGACTCCTTGAGAAATAAACTGCACACACTGG Model parameters? Time Substitution rate Can we observe time or subst. rate? What

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 200 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

Thanks to Paul Lewis and Joe Felsenstein for the use of slides

Thanks to Paul Lewis and Joe Felsenstein for the use of slides Thanks to Paul Lewis and Joe Felsenstein for the use of slides Review Hennigian logic reconstructs the tree if we know polarity of characters and there is no homoplasy UPGMA infers a tree from a distance

More information

Theory of Evolution. Charles Darwin

Theory of Evolution. Charles Darwin Theory of Evolution harles arwin 858-59: Origin of Species 5 year voyage of H.M.S. eagle (8-6) Populations have variations. Natural Selection & Survival of the fittest: nature selects best adapted varieties

More information

Evolutionary trees. Describe the relationship between objects, e.g. species or genes

Evolutionary trees. Describe the relationship between objects, e.g. species or genes Evolutionary trees Bonobo Chimpanzee Human Neanderthal Gorilla Orangutan Describe the relationship between objects, e.g. species or genes Early evolutionary studies Anatomical features were the dominant

More information

Workshop III: Evolutionary Genomics

Workshop III: Evolutionary Genomics Identifying Species Trees from Gene Trees Elizabeth S. Allman University of Alaska IPAM Los Angeles, CA November 17, 2011 Workshop III: Evolutionary Genomics Collaborators The work in today s talk is joint

More information

The Generalized Neighbor Joining method

The Generalized Neighbor Joining method The Generalized Neighbor Joining method Ruriko Yoshida Dept. of Mathematics Duke University Joint work with Dan Levy and Lior Pachter www.math.duke.edu/ ruriko data mining 1 Challenge We would like to

More information

Ensembles and incomplete information

Ensembles and incomplete information p. 1/32 Ensembles and incomplete information So far in this course, we have described quantum systems by states that are normalized vectors in a complex Hilbert space. This works so long as (a) the system

More information

Multiple Sequence Alignment. Sequences

Multiple Sequence Alignment. Sequences Multiple Sequence Alignment Sequences > YOR020c mstllksaksivplmdrvlvqrikaqaktasglylpe knveklnqaevvavgpgftdangnkvvpqvkvgdqvl ipqfggstiklgnddevilfrdaeilakiakd > crassa mattvrsvksliplldrvlvqrvkaeaktasgiflpe

More information

Module 13: Molecular Phylogenetics

Module 13: Molecular Phylogenetics Module 13: Molecular Phylogenetics http://evolution.gs.washington.edu/sisg/2013/ MTH Thanks to Paul Lewis, Joe Felsenstein, Peter Beerli, Derrick Zwickl, and Joe Bielawski for slides Wednesday July 17:

More information

Theoretical Basis of Likelihood Methods in Molecular Phylogenetic Inference

Theoretical Basis of Likelihood Methods in Molecular Phylogenetic Inference Theoretical Basis of Likelihood Methods in Molecular Phylogenetic Inference Rhiju Das, Centre of Mathematics and Physical Sciences applied to Life science and EXperimental biology (CoMPLEX), University

More information

7. Tests for selection

7. Tests for selection Sequence analysis and genomics 7. Tests for selection Dr. Katja Nowick Group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute for Brain Research www. nowicklab.info

More information

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from

More information

2t t dt.. So the distance is (t2 +6) 3/2

2t t dt.. So the distance is (t2 +6) 3/2 Math 8, Solutions to Review for the Final Exam Question : The distance is 5 t t + dt To work that out, integrate by parts with u t +, so that t dt du The integral is t t + dt u du u 3/ (t +) 3/ So the

More information

Phylogeny. November 7, 2017

Phylogeny. November 7, 2017 Phylogeny November 7, 2017 Phylogenetics Phylon = tribe/race, genetikos = relative to birth Phylogenetics: study of evolutionary relationships among organisms, sequences, or anything in between Related

More information