A Statistical Test of Phylogenies Estimated from Sequence Data


Wen-Hsiung Li
Center for Demographic and Population Genetics, University of Texas

A simple approach to testing the significance of the branching order of three taxa, estimated from protein or DNA sequence data, is proposed. The branching order is inferred by the transformed-distance method, under the assumption that one or two outgroups are available, and the branch lengths are estimated by the least-squares method. The inferred branching order is considered significant if the estimated internodal distance is significantly greater than zero. To test this, a formula for the variance of the internodal distance has been developed. The statistical test proposed has been checked by computer simulation. The same test also applies to the case of four taxa with no outgroup, if one considers an unrooted tree. Formulas for the variances of internodal distances have also been developed for the case of five taxa. Conditions are given under which it is more efficient to add the sequence of a fifth taxon than to do 25% more nucleotide sequencing in each of the original four. A method is presented for combining analyses of disparate data to get a single P value. Finally, the test, applied to the human-chimpanzee-gorilla problem, shows that the issue is not yet resolved.

Key words: phylogenetic reconstruction, transformed-distance method, variances of branch lengths, significance of branching order, phylogeny of apes and man.

Address for correspondence and reprints: Wen-Hsiung Li, Center for Demographic and Population Genetics, University of Texas, P.O. Box 20334, Houston, TX

Mol. Biol. Evol. 6(4). © 1989 by The University of Chicago. All rights reserved.

Introduction

Although phylogenetic reconstruction has long been recognized as a problem in statistical inference (Edwards and Cavalli-Sforza 1964), few authors have considered how to evaluate the confidence level for estimated phylogenies (Cavender 1978; Felsenstein 1981, 1985a, 1985b; Mueller and Ayala 1982; Templeton 1983; Nei et al. 1985; Lake 1987; P. Pamilo, personal communication). This problem has become important because the rapid accumulation of molecular data has generated much interest in phylogenetic studies.

How to test the significance of an inferred phylogeny is a difficult problem. A simpler problem is to test the significance of estimated internodal distances. As will be explained later, in the case of four taxa the significance of the internodal distance can be taken as the significance of the inferred phylogeny. When the number of taxa under study is more than four, the two problems are no longer equivalent, and requiring all internodal distances to be significantly greater than zero seems to be too stringent a test for the significance of the inferred branching order.

A simple way to test the significance of internodal distances is to study their variances. Mueller and Ayala (1982) proposed to compute these variances by the jackknife method, while Nei et al. (1985) derived analytic formulas for the case of a UPGMA tree, i.e., a tree estimated by the unweighted pair-group method of analysis (Sneath and Sokal 1973). The UPGMA method assumes a constant rate of evolution, but there is now strong evidence that this assumption is often violated (Wu and Li 1985; Britten 1986; Li et al. 1987). It is therefore desirable to consider an approach that does not make this assumption.

In this paper I propose a two-step approach. The first step is to infer the branching order. One can use the transformed-distance method (Farris 1977; Klotz et al. 1979; Li 1981), the neighbor-joining method (Saitou and Nei 1987), the maximum parsimony method (Eck and Dayhoff 1966; Fitch 1977), or any other method that does not assume rate constancy and that has been shown to be effective for obtaining the correct tree. The second step is to estimate the branch lengths by the least-squares method (Cavalli-Sforza and Edwards 1967; Chakraborty 1977). The variances of internodal distances are then obtained from the equations derived from the least-squares method. In this study analytic formulas for these variances have been developed for the cases of four and five taxa. Computer simulation of the case of four taxa confirmed that the statistical test proposed can indeed be used to test the significance of an inferred phylogeny. The present theory was applied to the human-chimpanzee-gorilla trichotomy problem.

Variances of Internodal Branch Lengths

In the following I shall explain how to derive the variance of a branch length, assuming that the tree topology has already been inferred by one of the methods mentioned above. Since the focus of this paper is on the internodal branches, the variances of the other branches are presented in an appendix; these variances are useful for evaluating the reliability of estimates of branch lengths.

Four Taxa

Denote the four taxa under study by 1, 2, 3, and 4. Suppose that the inferred tree topology is as shown in figure 1a; the root of the tree can be determined if one of the four taxa is an outgroup.
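The branch lengths should satisfy the following equations:

$$d_{12} = a + b \tag{1}$$
$$d_{13} = a + c + d \tag{2}$$
$$d_{23} = b + c + d \tag{3}$$
$$d_{14} = a + c + e \tag{4}$$
$$d_{24} = b + c + e \tag{5}$$
$$d_{34} = d + e \tag{6}$$

where $d_{ij}$ is the distance between taxa i and j. From these equations I obtain the following least-squares solution:

$$a = \tfrac{1}{2}d_{12} + \tfrac{1}{4}(d_{13} - d_{23} + d_{14} - d_{24}) \tag{7}$$
$$b = d_{12} - a \tag{8}$$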
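Because formulas (7) and (8) are stated to be the least-squares solution of equations (1)-(6), one way to check them is to solve the normal equations numerically for a distance matrix that is not exactly additive and compare. The sketch below assumes unweighted (ordinary) least squares and uses a small hand-written Gaussian elimination so that it needs only the standard library; the names `PATHS`, `ols_branch_lengths`, and `closed_form_a_b` are illustrative, not from the paper, and the distances are made-up values.

```python
# Branches of the tree in fig. 1a, in the order a, b, c, d, e, and the
# branches traversed by each pairwise path (equations 1-6).
BRANCHES = "abcde"
PATHS = {
    (1, 2): "ab",
    (1, 3): "acd",
    (1, 4): "ace",
    (2, 3): "bcd",
    (2, 4): "bce",
    (3, 4): "de",
}


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def ols_branch_lengths(dist):
    """Unweighted least-squares branch lengths from the normal equations X'X beta = X'd."""
    rows = [[1.0 if br in PATHS[pair] else 0.0 for br in BRANCHES] for pair in PATHS]
    y = [dist[pair] for pair in PATHS]
    k = len(BRANCHES)
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return dict(zip(BRANCHES, solve(xtx, xty)))


def closed_form_a_b(dist):
    """Formulas (7) and (8)."""
    a = 0.5 * dist[(1, 2)] + 0.25 * (dist[(1, 3)] - dist[(2, 3)] + dist[(1, 4)] - dist[(2, 4)])
    b = dist[(1, 2)] - a
    return a, b


# A slightly non-additive distance matrix (illustrative values only).
observed = {(1, 2): 0.30, (1, 3): 0.52, (1, 4): 0.61, (2, 3): 0.47, (2, 4): 0.58, (3, 4): 0.40}
fit = ols_branch_lengths(observed)
print(round(fit["a"], 6), round(fit["b"], 6))   # 0.17 0.13
print(closed_form_a_b(observed))
```

The numerical solution also returns c, d, and e, which can be compared in the same way with the closed forms for those branches.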

[Figure 1: panel a, four taxa; panel b, five taxa]
FIG. 1.-Model trees used in the derivation of the mean and variance of the branch lengths.

$$c = \tfrac{1}{4}(d_{13} + d_{23} + d_{14} + d_{24}) - \tfrac{1}{2}(d_{12} + d_{34}) \tag{9}$$
$$d = \tfrac{1}{2}d_{34} + \tfrac{1}{4}(d_{13} + d_{23} - d_{14} - d_{24}) \tag{10}$$
$$e = d_{34} - d \tag{11}$$

The variance (V) of c can be obtained using formula (9) and following the method of Nei et al. (1985) and Wu and Li (1985):

$$V(c) = \tfrac{1}{16}[V(d_{13}) + V(d_{23}) + V(d_{14}) + V(d_{24}) + 2V(d_{16}) + 2V(d_{26}) + 4V(d_{56}) + 2V(d_{53}) + 2V(d_{54})] - \tfrac{1}{2}[V(d_{15}) + V(d_{25}) + V(d_{36}) + V(d_{46})] + \tfrac{1}{4}V(d_{12}) + \tfrac{1}{4}V(d_{34}) \tag{12}$$

where $V(d_{ij})$ denotes the variance of the estimate of the distance between sequences i and j.

First, consider protein sequence data. The mean and variance of the number ($d_{ij}$) of amino acid replacements per site between sequences i and j can be estimated by

$$d_{ij} = -f\ln(1 - p/f) \tag{13}$$
$$V(d_{ij}) = \frac{p(1 - p)}{L(1 - p/f)^2} \tag{14}$$

where f = 19/20, p is the proportion of different amino acids between the two sequences, and L is the number of residue sites compared. For a pair of extant sequences, these formulas are readily applicable. However, the sequences at nodes 5 and 6 do not exist, and thus variances such as $V(d_{16})$ and $V(d_{56})$ cannot be estimated directly from actual data. They can, however, be estimated as follows (Nei et al. 1985); I use $V(d_{16})$ as an example. From formula (13) I obtain

$$p = f(1 - e^{-d_{ij}/f}) \tag{15}$$

Since $d_{16} = a + c$, $p = f[1 - e^{-(a+c)/f}]$, where a + c can be obtained from formulas (7) and (9). Putting p into formula (14), one readily obtains $V(d_{16})$.
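Formulas (7)-(15) can be chained into a small routine: fit the branch lengths, express the distances that involve the unobserved nodes 5 and 6 as sums of fitted branches, and plug everything into formula (12). The sketch below is a hypothetical helper, not the author's program. It assumes protein data (f = 19/20) or DNA under the one-parameter model (f = 3/4), assumes all fitted branch lengths are positive, and evaluates formula (15) for the extant pairs as well, which simply reproduces the observed p when the input distances came from formula (13). The `normal_test` helper computes the ratio c/SE and a two-sided standard-normal probability, the quantity compared with normal critical values in the significance test described later.

```python
import math


def poisson_variance(f, L):
    """Return V(d) built from formulas (14) and (15).

    f = 19/20 for amino acid sequences, 3/4 for nucleotides under the
    one-parameter model; L is the number of sites compared.
    """
    def V(d):
        p = f * (1.0 - math.exp(-d / f))                 # formula (15)
        return p * (1.0 - p) / (L * (1.0 - p / f) ** 2)  # formula (14)
    return V


def four_taxon_c(dist, V):
    """c from formula (9) and V(c) from formula (12) for the tree of fig. 1a."""
    d12, d13, d14 = dist[(1, 2)], dist[(1, 3)], dist[(1, 4)]
    d23, d24, d34 = dist[(2, 3)], dist[(2, 4)], dist[(3, 4)]
    a = 0.5 * d12 + 0.25 * (d13 - d23 + d14 - d24)         # (7)
    b = d12 - a                                             # (8)
    c = 0.25 * (d13 + d23 + d14 + d24) - 0.5 * (d12 + d34)  # (9)
    d = 0.5 * d34 + 0.25 * (d13 + d23 - d14 - d24)         # (10)
    e = d34 - d                                             # (11)
    # Distances to the unobserved nodes 5 and 6 come from the fitted branches:
    # d15 = a, d25 = b, d36 = d, d46 = e, d16 = a + c, d26 = b + c,
    # d56 = c, d53 = c + d, d54 = c + e.
    v_c = (
        (V(d13) + V(d23) + V(d14) + V(d24)
         + 2 * V(a + c) + 2 * V(b + c) + 4 * V(c)
         + 2 * V(c + d) + 2 * V(c + e)) / 16
        - (V(a) + V(b) + V(d) + V(e)) / 2
        + (V(d12) + V(d34)) / 4
    )                                                       # (12)
    return c, v_c


def normal_test(c, v_c):
    """Ratio r = c/SE and the two-sided standard-normal tail P(|x| >= r)."""
    r = c / math.sqrt(v_c)
    return r, math.erfc(r / math.sqrt(2.0))


def additive4(a, b, c, d, e):
    """Expected distances for fig. 1a (equations 1-6)."""
    return {(1, 2): a + b, (1, 3): a + c + d, (1, 4): a + c + e,
            (2, 3): b + c + d, (2, 4): b + c + e, (3, 4): d + e}


dist = additive4(0.05, 0.05, 0.01, 0.06, 0.06)   # illustrative branch lengths
c, v_c = four_taxon_c(dist, poisson_variance(f=3 / 4, L=1000))
r, p = normal_test(c, v_c)
print(f"c = {c:.4f}, SE = {math.sqrt(v_c):.5f}, c/SE = {r:.2f}, P = {p:.3f}")
```

A useful sanity check is to pass a linear variance function, V(d) = kd: for additive distances every branch other than c cancels in formula (12), so the routine should return V(c) = kc.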

Next, consider nucleotide sequence data. Under the assumption of random substitution among the four types of nucleotide, i.e., the one-parameter model, the mean and variance of the number of substitutions per nucleotide site between sequences i and j are also given by formulas (13) and (14), except that now f = 3/4, p is the proportion of different nucleotides between the two sequences, and L is the number of nucleotide sites compared (Jukes and Cantor 1969; Kimura and Ohta 1972). Under the two-parameter model (Kimura 1980), the formulas corresponding to formulas (13) and (14) are

$$d_{ij} = A + B \tag{16}$$
$$V(d_{ij}) = [x^2P + z^2Q - (xP + zQ)^2]/L \tag{17}$$

where P and Q are, respectively, the proportions of transitional and transversional differences between sequences i and j, x = 1/(1 - 2P - Q), y = 1/(1 - 2Q), z = (x + y)/2, $A = \tfrac{1}{2}\ln(x) - \tfrac{1}{4}\ln(y)$ is the number of transitional substitutions per site, and $B = \tfrac{1}{2}\ln(y)$ is the number of transversional substitutions per site. Note that, unlike formula (14), formula (17) involves two parameters, P and Q. The formulas corresponding to formula (15) are (Wu and Li 1985)

$$Q = \tfrac{1}{2}(1 - e^{-2B}) \tag{18}$$
$$P = \tfrac{1}{2}[1 - Q - e^{-(2A+B)}] \tag{19}$$

Five Taxa

Suppose that the inferred branching order is as in figure 1b. The branch lengths are then given by

$$a = \tfrac{1}{2}d_{12} + \tfrac{1}{6}(d_{13} + d_{14} + d_{15} - d_{23} - d_{24} - d_{25}) \tag{20}$$
$$b = d_{12} - a \tag{21}$$
$$c = \tfrac{1}{4}(d_{13} + d_{23}) + \tfrac{1}{8}(d_{14} + d_{24} + d_{15} + d_{25}) - \tfrac{1}{4}(d_{34} + d_{35}) - \tfrac{1}{2}d_{12} \tag{22}$$
$$d = \tfrac{1}{2}(d_{13} + d_{23}) - \tfrac{1}{2}d_{12} - c \tag{23}$$
$$e = \tfrac{1}{4}(d_{34} + d_{35} - d_{13} - d_{23}) + \tfrac{1}{8}(d_{14} + d_{15} + d_{24} + d_{25}) - \tfrac{1}{2}d_{45} \tag{24}$$
$$f = d_{34} - d - e \tag{25}$$
$$g = d_{45} - f \tag{26}$$

If two of the five taxa, say taxa 4 and 5, are known to be outgroups, then one needs to obtain only the variance of c:

$$V(c) = \tfrac{1}{64}[V(d_{14}) + V(d_{15}) + V(d_{24}) + V(d_{25})] + \tfrac{1}{32}[V(d_{18}) + V(d_{28}) + V(d_{46}) + V(d_{56})] + \tfrac{1}{16}[V(d_{13}) + V(d_{23}) + V(d_{34}) + V(d_{35}) + V(d_{68})] + \tfrac{1}{8}[V(d_{17}) + V(d_{27}) + V(d_{36}) + V(d_{38}) - V(d_{47}) - V(d_{57})] + \tfrac{1}{4}[V(d_{12}) + V(d_{67}) - V(d_{78})] - \tfrac{1}{2}[V(d_{16}) + V(d_{26}) + V(d_{37})] \tag{27}$$

If only one or no outgroup exists, then one also needs to obtain the variance of e:

$$V(e) = \tfrac{1}{64}[V(d_{14}) + V(d_{15}) + V(d_{24}) + V(d_{25})] + \tfrac{1}{32}[V(d_{18}) + V(d_{28}) + V(d_{46}) + V(d_{56})] + \tfrac{1}{16}[V(d_{13}) + V(d_{23}) + V(d_{34}) + V(d_{35}) + V(d_{68})] + \tfrac{1}{8}[-V(d_{17}) - V(d_{27}) + V(d_{36}) + V(d_{38}) + V(d_{47}) + V(d_{57})] + \tfrac{1}{4}[V(d_{45}) - V(d_{67}) + V(d_{78})] - \tfrac{1}{2}[V(d_{58}) + V(d_{48}) + V(d_{37})] \tag{28}$$

Computer programs for the above formulas are available on request by sending a floppy disk to the author.

Test of Significance of an Inferred Phylogeny

In the case of three taxa with one or two outgroups, the above results can be used to test the significance of an inferred phylogeny. Since in this case there is only one internal branch, i.e., branch c, testing the significance of the internal branch is equivalent to testing the significance of the inferred phylogeny. More explicitly, the null hypothesis is that the true phylogeny is a trichotomy, i.e., that the three taxa diverged at the same time. This hypothesis is the same as the hypothesis c = 0. Therefore, if the estimated c is significantly >0, the null hypothesis of trichotomy is rejected and the inferred branching order can be taken as statistically significant. The same argument applies to the case of four taxa with no outgroup if one considers unrooted trees. This is easy to see from figure 1a: since branch c is the only internal branch, the inferred topology can be taken as significant if c is significantly >0.

When the number of taxa under study is more than four, the situation becomes complicated. For example, in the case of five taxa there are two internal branches (fig. 1b), and the probability that (only) one of them becomes significantly greater than zero by chance at the level α = 5% is about 2α = 10%. Thus, in this case one cannot reject the null hypothesis that all the internal branches have zero length, i.e., that all the taxa diverged at the same time point and form a star phylogeny; of course, this null hypothesis can be rejected if α ≤ 2.5%. On the other hand, the probability that both internal branches are significant by chance at the level α = 5% is approximately only α² = 0.25% (it is not strictly α² because the two internal branch lengths are not estimated independently). Hence, the requirement of all internal branches being significant seems to be too stringent a test for the significance of the inferred topology. Another difficulty is that one cannot draw a conclusion about the significance of an inferred tree topology as long as one or more of the internodal distances are nonsignificant; of course, the uncertainty can be restricted to a subset of taxa. In short, a more careful study is required for understanding the problem of testing the significance of an inferred phylogeny when more than four taxa are involved.
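The five-taxon formulas (20)-(27) can be scripted the same way as the four-taxon case. The sketch below is a hypothetical helper, not the author's program: it takes any function `V` that returns the variance of a distance (for example, one built from formulas (14)-(15) or (17)-(19)), writes each distance to an internal node as a sum of fitted branches, and returns the fitted branches together with V(c). The node labels (6 for the ancestor of taxa 1 and 2, 7 where taxon 3 joins, 8 for the ancestor of taxa 4 and 5) are inferred from the subscripts in the formulas and are an assumption of the sketch.

```python
# Path of each taxon pair through the tree of fig. 1b. Node labels are inferred
# from formulas (20)-(28): node 6 joins taxa 1 and 2, node 7 joins node 6 and
# taxon 3, node 8 joins taxa 4 and 5; branches a (1-6), b (2-6), c (6-7),
# d (7-3), e (7-8), f (8-4), g (8-5).
PATHS5 = {
    (1, 2): "ab", (1, 3): "acd", (1, 4): "acef", (1, 5): "aceg",
    (2, 3): "bcd", (2, 4): "bcef", (2, 5): "bceg",
    (3, 4): "def", (3, 5): "deg", (4, 5): "fg",
}


def additive5(lengths):
    """Expected pairwise distances for a dict of branch lengths {'a': ..., 'g': ...}."""
    return {pair: sum(lengths[br] for br in path) for pair, path in PATHS5.items()}


def five_taxon_c(dist, V):
    """Branch lengths from formulas (20)-(26) and V(c) from formula (27)."""
    def D(i, j):
        return dist[(min(i, j), max(i, j))]

    a = 0.5 * D(1, 2) + (D(1, 3) + D(1, 4) + D(1, 5) - D(2, 3) - D(2, 4) - D(2, 5)) / 6
    b = D(1, 2) - a
    c = (0.25 * (D(1, 3) + D(2, 3)) + 0.125 * (D(1, 4) + D(2, 4) + D(1, 5) + D(2, 5))
         - 0.25 * (D(3, 4) + D(3, 5)) - 0.5 * D(1, 2))
    d = 0.5 * (D(1, 3) + D(2, 3)) - 0.5 * D(1, 2) - c
    e = (0.25 * (D(3, 4) + D(3, 5) - D(1, 3) - D(2, 3))
         + 0.125 * (D(1, 4) + D(1, 5) + D(2, 4) + D(2, 5)) - 0.5 * D(4, 5))
    f = D(3, 4) - d - e
    g = D(4, 5) - f
    # Distances to internal nodes are sums of fitted branches, e.g. d18 = a + c + e.
    v_c = (
        (V(D(1, 4)) + V(D(1, 5)) + V(D(2, 4)) + V(D(2, 5))) / 64
        # d18, d28, d46, d56
        + (V(a + c + e) + V(b + c + e) + V(c + e + f) + V(c + e + g)) / 32
        # d13, d23, d34, d35, d68
        + (V(D(1, 3)) + V(D(2, 3)) + V(D(3, 4)) + V(D(3, 5)) + V(c + e)) / 16
        # d17, d27, d36, d38, minus d47 and d57
        + (V(a + c) + V(b + c) + V(c + d) + V(d + e) - V(e + f) - V(e + g)) / 8
        # d12, d67, minus d78
        + (V(D(1, 2)) + V(c) - V(e)) / 4
        # d16, d26, d37
        - (V(a) + V(b) + V(d)) / 2
    )
    return dict(zip("abcdefg", (a, b, c, d, e, f, g))), v_c


true = {"a": 0.05, "b": 0.07, "c": 0.02, "d": 0.09, "e": 0.03, "f": 0.15, "g": 0.12}
fitted, v_c = five_taxon_c(additive5(true), lambda x: 3.0 * x)  # linear V(d) = 3d
print(round(fitted["c"], 10), round(v_c, 10))  # 0.02 0.06
```

The usage lines use illustrative branch lengths and a linear variance function; with additive distances, formula (27) should then collapse to 3c, which is a quick way to catch a mistyped coefficient.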

I now come back to the case of three taxa, where the task is to test the null hypothesis of trichotomy, or c = 0. The above formulas for the mean and variance of c were derived under the assumption that the inferred branching order of the three taxa was ((1, 2)3); the notation ((i, j)k) means that lineage k branched off earlier than did lineages i and j. If, instead, the inferred branching pattern is ((1, 3)2), then the subscripts 2 and 3 in the above formulas should be exchanged, and if the inferred branching pattern is ((2, 3)1), subscripts 1 and 3 should be exchanged. Under the null hypothesis of trichotomy, the three branching patterns ((1, 2)3), ((1, 3)2), and ((2, 3)1) occur with equal probability. However, for each set of data only one pattern can occur, and only one c can be positive and is tested for significant deviation from 0, so there is no multiple-test problem. Moreover, regardless of which pattern occurs, the probability that c will assume a particular (nonnegative) value is the same. If the distribution of c is the same as the distribution of |x|, where x is a standard normal random variate, then the standard statistical test based on the standard normal distribution can be applied. In particular, the estimated c is significant at the 5% level if the ratio of the mean to the SE is ≥1.96, and it is significant at the 1% level if the ratio is ≥2.60. Obviously, the case of four taxa with no outgroup can be treated in the same manner, if one considers unrooted trees.

To test the accuracy of the level of significance defined by the above criteria, I conducted a computer simulation for the case of three taxa with one outgroup. I assumed that the three taxa diverged at the same time, and I used the two-parameter model of nucleotide substitution. The simulation results are shown in table 1.
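The simulation uses Kimura's two-parameter model, and formulas (16)-(19) translate into a few helper functions. This is a sketch, not the author's simulation code; the function names are hypothetical, and α is taken to be the fraction of substitutions that are transitions, so that A = αK and B = (1 - α)K for a branch of length K. With α = 1/3 the model reduces to the one-parameter model, which gives a convenient check on the formulas.

```python
import math


def k2p_components(P, Q):
    """A (transitions/site) and B (transversions/site) from observed P and Q; d = A + B (formula 16)."""
    x = 1.0 / (1.0 - 2.0 * P - Q)
    y = 1.0 / (1.0 - 2.0 * Q)
    A = 0.5 * math.log(x) - 0.25 * math.log(y)
    B = 0.5 * math.log(y)
    return A, B


def k2p_variance(P, Q, L):
    """Formula (17): sampling variance of d = A + B from L sites."""
    x = 1.0 / (1.0 - 2.0 * P - Q)
    y = 1.0 / (1.0 - 2.0 * Q)
    z = (x + y) / 2.0
    return (x * x * P + z * z * Q - (x * P + z * Q) ** 2) / L


def k2p_expected_pq(A, B):
    """Formulas (18) and (19): P and Q implied by A and B, e.g. for a path through an internal node."""
    Q = 0.5 * (1.0 - math.exp(-2.0 * B))
    P = 0.5 * (1.0 - Q - math.exp(-(2.0 * A + B)))
    return P, Q


# A branch of length K = 0.2 with a fraction alpha of transitional substitutions.
K, alpha = 0.2, 2 / 3
P, Q = k2p_expected_pq(alpha * K, (1 - alpha) * K)
A, B = k2p_components(P, Q)
print(A + B)                      # recovers K up to rounding
print(k2p_variance(P, Q, 1000))
```

Because formulas (18)-(19) invert formula (16), the round trip from (A, B) to (P, Q) and back should return the starting branch length; setting α = 1/3 should reproduce formulas (13)-(14) with f = 3/4.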
In table 1, a, b, and d denote the expected lengths of the three lineages (i.e., expected numbers of substitutions per nucleotide site), while e denotes the expected length from the common ancestor of the three taxa to the outgroup. Let r be the ratio of the estimated c value to its SE. The percentage of replicates with r ≥ 1.96 is <5% when a, b, and d are <0.20 (table 1) but tends to be somewhat >5% when a, b, and d are ≥0.20, suggesting that in the latter situation a slightly higher r value, say 2.2, is required for the 5% significance level. On the other hand, the percentage of replicates with r ≥ 2.60 is usually <1%. Therefore, although the simulation results do not support the assumption of normality for the distribution of c, the standard normal test appears to be generally applicable. In the two cases where d is larger than a and b, so that the rate-constancy assumption is violated, the percentages of replicates with r ≥ 1.96 or r ≥ 2.60 are similar to those for the cases where rate constancy holds. In the above simulation I have not considered branch lengths >0.45, because at this stage of divergence the distance between two sequences is close to 1, so that estimates of the number of substitutions per site become unreliable (e.g., see Li et al. 1985).

Table 1
Percentage of Replicates with r Exceeding a Specified Value
[Columns: branch length (no. of substitutions/site) a = b, d, and e; L; percentage of replicates with r ≥ 1.96 and with r ≥ 2.60, for α = 1/3 and for α = 2/3.]
NOTE.-In all cases, the true value of c is 0. L = number of nucleotide sites studied; α = proportion of transitional substitutions; r = 1.96 is significant at the 5% level and r = 2.60 is significant at the 1% level under the assumption of the standard normal distribution. In each case the number of replicates is 1,000 for L = 1,000, 250 for L = 4,000, and 125 for L = 8,000.

Numerical Examples

To better understand the theory developed above, consider some numerical examples. I assume that the rate of nucleotide substitution is constant over time and that the observed number of substitutions between each pair of sequences is equal to the expected value. First, consider the case of three species with an outgroup (taxon 4) (fig. 1a). In table 2, α denotes the proportion of transitional changes; α = 1/3 if substitutions occur randomly. The SE, which is the square root of V(c), is larger for α = 2/3 than for α = 1/3. Since transitional changes generally occur more often than transversional changes (Brown et al. 1982; Li et al. 1984), the two-parameter model is more realistic than the one-parameter model; the former is applicable to all α values, whereas the latter is applicable only to α = 1/3.

Table 2
SE of the Estimate of the Length of Branch c in Figure 1a
[Columns: c; α; SE(a); c/SE(a); L(b).]
NOTE.-The branch lengths in fig. 1a are a = b = 0.05, d = c, and e = c. Symbols are as defined in the text and table 1.
(a) Computed under the assumption that L = 1,000.
(b) Number of nucleotide sites required for the ratio c/SE to be ≥2 (i.e., to be significant at about the 5% level).

The ratio c/SE can be used to test whether c is significantly >0. A ratio of 2 can be taken as significant at the 5% level. All the values in table 2 were obtained for L = 1,000. When c = 0.01, the ratio is 2 or larger if α ≤ 2/3. Thus, this case requires only a small amount of sequence data to resolve the branching order of the three species. When c = 0.005, the ratio is considerably smaller than 2; for example, the ratio is 1.28 for α = 1/3. Formulas (14) and (17) imply that V(c) is inversely proportional to L. Therefore, for the ratio to increase from 1.28 to 2, L should increase from 1,000 to 1,000 × (2/1.28)² ≈ 2,500. The other L values in table 2 were obtained in the same manner. For a still smaller c (table 2), the number of nucleotide sites that must be studied is rather large, >50,000. Saitou and Nei (1986) have earlier considered this problem from a different angle: they studied the probability of obtaining the correct topology as a function of the number of nucleotides studied under various tree-making methods.

Next, consider the amount of reduction in V(c) when a second outgroup (taxon 5) is added (fig. 1b). Let us denote the V(c) value for the case of one outgroup by $V_1(c)$ and that for the case of two outgroups by $V_2(c)$. A comparison of these two values is shown in table 3. The reduction increases as $V_1(c)$ becomes larger. Since V(c) is inversely proportional to L, a reduction in V(c) can also be achieved by increasing L. Is it more advantageous to increase L or to add a second outgroup? The total number of nucleotides sequenced is 4L for the case of one outgroup and 5L for the case of two outgroups, the latter being 1.25 times the former.
Therefore, if the same total number of nucleotides is to be sequenced, it is less advantageous to add a second outgroup than to increase L if $V_1(c)/V_2(c) < 1.25$, whereas the reverse is true if the ratio is >1.25. In table 3 the ratio is <1.25 for the first six cases and >1.25 for the last six cases. Since the ratio tends to increase with $V_1(c)$, in general it is more advantageous to increase L if $V_1(c)$ is relatively small but more advantageous to add a second outgroup if $V_1(c)$ is relatively large. In all the cases in table 3, the distances from sequences 4 and 5 to the other three are the same, i.e., g = f in figure 1b, so that the fifth sequence is as good a reference as the fourth one. If the fifth is more distantly related to the other three than the fourth sequence is, then the reduction in V(c) is expected to be smaller than those shown in table 3. Further, the effect will also be reduced if sequences 4 and 5 are closely related to each other.

Table 3
Variance of the Estimate of the Length of Branch c in Figure 1b
[Columns: a = b; d; f = g; e; c; V1(c)(a) (×10⁻⁴); V2(c) (×10⁻⁴); V1(c)/V2(c).]
NOTE.-Symbols are as defined in the text and table 1.
(a) Obtained under the assumption that the second outgroup (taxon 5 in fig. 1b) is not available.

Discussion

Heterogeneous Data

Phylogenetic studies often use sequence data from different DNA regions. If all the regions studied have similar rates of nucleotide substitution, then all the data can be combined into one single set. However, if substantial variation in rates exists, regions with different rates should be treated separately. The question then arises as to how to test the significance when the results from different data sets are combined. A simple test procedure is the inverse χ² method (Fisher 1932). Suppose that there are k different data sets. Let $P_i$ be the significance level (probability) estimated from the ith data set. If the null hypothesis is true (i.e., the three taxa represent a trichotomy), then $-2\ln(P_i)$ has a χ² distribution with 2 degrees of freedom, and

$$P = -2\sum_{i=1}^{k}\ln(P_i)$$

has a χ² distribution with 2k degrees of freedom. The probability corresponding to the computed P value can be easily obtained from a χ² table.

Branching Order of Human, Chimpanzee, and Gorilla

Holmquist et al. (1988) have recently applied Lake's (1987) method of phylogenetic reconstruction to study the human-chimpanzee-gorilla trichotomy problem by using two sets of data: (1) nuclear DNA for a 10-kb region around the ψη-globin pseudogene locus (Miyamoto et al. 1987; Maeda et al. 1988, and references therein) and (2) mitochondrial (mt) DNA for the 896-bp fragment characterized by Brown et al. (1982). I applied the present theory to the same two sets of data. From the nucleotide differences tabulated in table 2 of Holmquist et al. (1988), I obtained the proportions of transitional and transversional differences between each pair of the four species: human, chimpanzee, gorilla, and orangutan (table 4). Here the problem is to determine the neighbor pairs. In both sets of data the transformed-distance method pairs human and chimpanzee in one clade. Applying the present theory to the nuclear DNA data, I obtain a = , b = , and c = with SE = if the one-parameter model of nucleotide substitution is used and a = , b = , and c = with SE = if the two-parameter model is used. Under both models the ratio c/SE is and the probability for this to occur is ≈16%. For the mtDNA data, I obtain a = , b = , c/SE = / = 1.87, and a probability of ≈6% for the one-parameter model and a = , b = , c/SE = / = 1.56, and a probability of ≈12% for the two-parameter model (a more rigorous treatment of the mtDNA data should consider different types of regions separately).
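The inverse χ² method described under Heterogeneous Data needs only a χ² upper-tail probability, and for an even number of degrees of freedom (2k) that tail has a closed form, so a standard-library sketch suffices. The function name `fisher_combined` is illustrative. Applied to the nuclear and two-parameter mtDNA probabilities (0.16 and 0.12) used in the combined test that follows, it gives a statistic of about 7.91 (quoted as 7.90 in the text) and an upper-tail probability of about 0.095.

```python
import math


def fisher_combined(pvalues):
    """Inverse chi-square (Fisher) combination of independent P values.

    Returns the statistic -2 * sum(ln P_i) and its upper-tail probability
    under a chi-square distribution with 2k degrees of freedom.
    """
    k = len(pvalues)
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    # For 2k degrees of freedom the chi-square survival function is
    # exp(-x/2) * sum_{j=0}^{k-1} (x/2)**j / j!.
    half = stat / 2.0
    term = total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return stat, math.exp(-half) * total


stat, p = fisher_combined([0.16, 0.12])
print(round(stat, 2), round(p, 3))   # 7.91 0.095
```

With a single data set the procedure returns that data set's own P value, since a χ² variable with 2 degrees of freedom has tail exp(-x/2).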
In this case the two-parameter model is much more realistic, because there is a strong bias toward transitional changes (Brown et al. 1982). To test the significance of the combined results, note that $P_1 = 0.16$, $P_2 = 0.12$, and $-2\sum\ln(P_i) = 7.90$. Since the probability of χ² = 7.90 with 4 degrees of freedom is 0.10, one cannot reject the hypothesis of trichotomy.

For both sets of data Lake's test gave a probability of ≈25% (Holmquist et al. 1988), which is considerably larger than the probabilities obtained above. Holmquist et al. combined the two sets of data into one set and obtained a probability of 3%. One reason for this low probability is as follows. In Lake's test one calculates a parsimony-like term P and a background term B for each of the three alternative trees. Under the null hypothesis that the tree under consideration is wrong, P and B are statistically equal. Holmquist et al. used the binomial distribution to test the equality P = B. It happened that for both sets of data P = 3 and B = 0 for the tree with human and chimpanzee in one clade, so, when the two sets of data were combined, P = 6 and B = 0, from which Holmquist et al. obtained a probability of 3%. Since the mtDNA segment used has evolved six to seven times faster than the nuclear DNA segment (see the above a and b values), the P and B values should have different probability distributions for the two sets of data. It is therefore not clear that they should be combined as was done by Holmquist et al. This problem deserves a more careful study. A more serious problem is that in Lake's method three independent tests (one for each of the three alternative trees) are conducted. Thus, the probability of rejecting the null hypothesis should be 1 - (1 - P)³ ≈ 3P = 9%, instead of 3%.

Table 4
P and Q between Species
[Columns: P and Q for human, chimpanzee, gorilla, and orangutan; rows: human, chimpanzee, gorilla, orangutan.]
NOTE.-The values below the diagonal were computed from the 10,046-bp region around the ψη-globin pseudogene locus (Miyamoto et al. 1987; Maeda et al. 1988); the values above the diagonal were computed from the mtDNA for the 896-bp fragment of Brown et al. (1982).

APPENDIX

Variances of Branch Lengths

The variances of internal branch lengths have already been given in the text. Here I present the variances of peripheral branch lengths. First, let us consider the case of four taxa.
From equation (7),

$$V(a) = \tfrac{1}{4}V(d_{12}) + \tfrac{1}{2}[V(d_{15}) - V(d_{25})] + \tfrac{1}{16}[V(d_{13}) + V(d_{14}) + V(d_{23}) + V(d_{24})] + \tfrac{1}{8}[V(d_{16}) + V(d_{26}) - V(d_{35}) - V(d_{45}) - 2V(d_{56})]$$

The four peripheral branches a, b, d, and e of figure 1a are topologically equivalent, so the variance of any of the other peripheral branch lengths can be readily obtained from the above formula by exchanging subscripts; for example, a comparison of equations (7) and (10) shows that to obtain V(d) one needs to exchange 1 and 3, 2 and 4, and 5 and 6 (see fig. 1a).

Next, consider the case of five taxa. From equation (20),

$$V(a) = \tfrac{1}{4}V(d_{12}) + \tfrac{1}{2}[V(d_{16}) - V(d_{26})] + \tfrac{1}{36}[V(d_{13}) + V(d_{23}) + V(d_{14}) + V(d_{24}) + V(d_{15}) + V(d_{25})] + \tfrac{1}{18}[2V(d_{17}) + V(d_{18}) + 2V(d_{27}) + V(d_{28}) - V(d_{36}) - V(d_{46}) - V(d_{56}) - 4V(d_{67}) - 2V(d_{68})]$$

The four peripheral branches a, b, f, and g of figure 1b are topologically equivalent, so the variances of b, f, and g can be obtained from the above formula by exchanging subscripts. The variance of d is given by

$$V(d) = \tfrac{1}{16}[V(d_{13}) + V(d_{23}) + V(d_{34}) + V(d_{35})] + \tfrac{1}{2}V(d_{37}) + \tfrac{1}{8}[V(d_{36}) + V(d_{38}) - V(d_{17}) - V(d_{27}) - 2V(d_{67}) - V(d_{47}) - 2V(d_{78}) - V(d_{57})] + \tfrac{1}{64}[V(d_{14}) + V(d_{24}) + V(d_{15}) + V(d_{25})] + \tfrac{1}{32}[V(d_{46}) + V(d_{18}) + 2V(d_{68}) + V(d_{28}) + V(d_{56})]$$

Acknowledgments

I thank R. Chakraborty, M. Gouy, P. Pamilo, P. M. Sharp, and K. H. Wolfe for suggestions. This study was supported by NIH grant GM

LITERATURE CITED

BRITTEN, R. J. 1986. Rates of DNA sequence evolution differ between taxonomic groups. Science 231.
BROWN, W. M., E. M. PRAGER, A. WANG, and A. C. WILSON. 1982. Mitochondrial DNA sequences of primates: tempo and mode of evolution. J. Mol. Evol. 18.
CAVALLI-SFORZA, L. L., and A. W. F. EDWARDS. 1967. Phylogenetic analysis: models and estimation procedures. Am. J. Hum. Genet. 19.
CAVENDER, J. A. 1978. Taxonomy with confidence. Math. Biosci. 40.
CHAKRABORTY, R. 1977. Estimation of time of divergence from phylogenetic studies. Can. J. Genet. Cytol. 19.
ECK, R. V., and M. O. DAYHOFF. 1966. Atlas of protein sequence and structure. National Biomedical Research Foundation, Silver Spring, Md.
EDWARDS, A. W. F., and L. L. CAVALLI-SFORZA. 1964. Reconstruction of evolutionary trees. In V. H. HEYWOOD and J. MCNEILL, eds. Phenetic and phylogenetic classification. Systematics Association Publication 6. Systematics Association, London.
FARRIS, J. S. 1977. On the phenetic approach to vertebrate classification. In M. K. HECHT, P. C. GOODY, and B. M. HECHT, eds. Major patterns in vertebrate evolution. Plenum, New York.
FELSENSTEIN, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17.
FELSENSTEIN, J. 1985a. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39.
FELSENSTEIN, J. 1985b. Confidence limits on phylogenies with a molecular clock. Syst. Zool. 34.
FISHER, R. A. 1932. Statistical methods for research workers. 4th ed. Oliver & Boyd, London.
FITCH, W. M. 1977. On the problem of discovering the most parsimonious tree. Am. Nat. 111.
HOLMQUIST, R., M. M. MIYAMOTO, and M. GOODMAN. 1988. Analysis of higher-primate phylogeny from transversion differences in nuclear and mitochondrial DNA by Lake's methods of evolutionary parsimony and operator metrics. Mol. Biol. Evol. 5.

JUKES, T. H., and C. R. CANTOR. 1969. Evolution of protein molecules. Pp. 21-132 in H. N. MUNRO, ed. Mammalian protein metabolism. Academic Press, New York.
KIMURA, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111.
KIMURA, M., and T. OHTA. 1972. On the stochastic model for estimation of mutational distance between homologous proteins. J. Mol. Evol. 2.
KLOTZ, L. C., N. KOMAR, R. L. BLANKEN, and R. M. MITCHELL. 1979. Calculation of evolutionary trees from sequence data. Proc. Natl. Acad. Sci. USA 76.
LAKE, J. A. 1987. A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony. Mol. Biol. Evol. 4.
LI, W.-H. 1981. Simple method for constructing phylogenetic trees from distance matrices. Proc. Natl. Acad. Sci. USA 78.
LI, W.-H., C.-C. LUO, and C.-I. WU. 1985. Evolution of DNA sequences. Pp. 1-94 in R. J. MACINTYRE, ed. Molecular evolutionary genetics. Plenum, New York.
LI, W.-H., M. TANIMURA, and P. M. SHARP. 1987. An evaluation of the molecular clock hypothesis using mammalian DNA sequences. J. Mol. Evol. 25.
LI, W.-H., C.-I. WU, and C.-C. LUO. 1984. Nonrandomness of point mutation as reflected in nucleotide substitutions in pseudogenes and its evolutionary implications. J. Mol. Evol. 21.
MAEDA, N., C.-I. WU, J. BLISKA, and J. RENEKE. 1988. Molecular evolution of intergenic DNA in higher primates: pattern of DNA changes, molecular clock, and evolution of repetitive sequences. Mol. Biol. Evol. 5:1-20.
MIYAMOTO, M. M., J. L. SLIGHTOM, and M. GOODMAN. 1987. Phylogenetic relationships of human and African apes as ascertained from DNA sequences (7.1 kilobase pairs) of the ψη-globin region. Science 238.
MUELLER, L. D., and F. J. AYALA. 1982. Estimation and interpretation of genetic distance in empirical studies. Genet. Res. 40.
NEI, M., J. C. STEPHENS, and N. SAITOU. 1985. Methods for computing the standard errors of branching points in an evolutionary tree and their application to molecular data from humans and apes. Mol. Biol. Evol. 2.
SAITOU, N., and M. NEI. 1986. The number of nucleotides required to determine the branching order of three species with special reference to the human-chimpanzee-gorilla divergence. J. Mol. Evol. 24.
SAITOU, N., and M. NEI. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4.
SNEATH, P. H. A., and R. R. SOKAL. 1973. Numerical taxonomy. W. H. Freeman, San Francisco.
TEMPLETON, A. R. 1983. Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the evolution of humans and the apes. Evolution 37:221.
WU, C.-I., and W.-H. LI. 1985. Evidence for higher rates of nucleotide substitution in rodents than in man. Proc. Natl. Acad. Sci. USA 82:1741.

WALTER M. FITCH, reviewing editor

Received September 28, 1988; revision received January 10, 1989

Variances of the Average Numbers of Nucleotide Substitutions Within and Between Populations

Variances of the Average Numbers of Nucleotide Substitutions Within and Between Populations Variances of the Average Numbers of Nucleotide Substitutions Within and Between Populations Masatoshi Nei and Li Jin Center for Demographic and Population Genetics, Graduate School of Biomedical Sciences,

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

Letter to the Editor. Department of Biology, Arizona State University

Letter to the Editor. Department of Biology, Arizona State University Letter to the Editor Traditional Phylogenetic Reconstruction Methods Reconstruct Shallow and Deep Evolutionary Relationships Equally Well Michael S. Rosenberg and Sudhir Kumar Department of Biology, Arizona

More information

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

Variance and Covariances of the Numbers of Synonymous and Nonsynonymous Substitutions per Site

Variance and Covariances of the Numbers of Synonymous and Nonsynonymous Substitutions per Site Variance and Covariances of the Numbers of Synonymous and Nonsynonymous Substitutions per Site Tatsuya Ota and Masatoshi Nei Institute of Molecular Evolutionary Genetics and Department of Biology, The

More information

Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30

Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30 Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30 A non-phylogeny

More information

C3020 Molecular Evolution. Exercises #3: Phylogenetics

C3020 Molecular Evolution. Exercises #3: Phylogenetics C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from

More information

Algorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004,

Algorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004, Tracing the Evolution of Numerical Phylogenetics: History, Philosophy, and Significance Adam W. Ferguson Phylogenetic Systematics 26 January 2009 Inferring Phylogenies Historical endeavor Darwin- 1837

More information

Phylogenetic inference

Phylogenetic inference Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types

More information

Lecture 4. Models of DNA and protein change. Likelihood methods

Lecture 4. Models of DNA and protein change. Likelihood methods Lecture 4. Models of DNA and protein change. Likelihood methods Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 4. Models of DNA and protein change. Likelihood methods p.1/36

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

Theory of Evolution Charles Darwin

Theory of Evolution Charles Darwin Theory of Evolution Charles arwin 858-59: Origin of Species 5 year voyage of H.M.S. eagle (83-36) Populations have variations. Natural Selection & Survival of the fittest: nature selects best adapted varieties

More information

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Ziheng Yang Department of Biology, University College, London An excess of nonsynonymous substitutions

More information

Phylogeny Estimation and Hypothesis Testing using Maximum Likelihood

Phylogeny Estimation and Hypothesis Testing using Maximum Likelihood Phylogeny Estimation and Hypothesis Testing using Maximum Likelihood For: Prof. Partensky Group: Jimin zhu Rama Sharma Sravanthi Polsani Xin Gong Shlomit klopman April. 7. 2003 Table of Contents Introduction...3

More information

Phylogenetic Tree Reconstruction

Phylogenetic Tree Reconstruction I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

More information

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5. Five Sami Khuri Department of Computer Science San José State University San José, California, USA sami.khuri@sjsu.edu v Distance Methods v Character Methods v Molecular Clock v UPGMA v Maximum Parsimony

More information

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally

More information

Simple Methods for Testing the Molecular Evolutionary Clock Hypothesis

Simple Methods for Testing the Molecular Evolutionary Clock Hypothesis Copyright 0 1998 by the Genetics Society of America Simple s for Testing the Molecular Evolutionary Clock Hypothesis Fumio Tajima Department of Population Genetics, National Institute of Genetics, Mishima,

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 2009 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition David D. Pollock* and William J. Bruno* *Theoretical Biology and Biophysics, Los Alamos National

More information

Concepts and Methods in Molecular Divergence Time Estimation

Concepts and Methods in Molecular Divergence Time Estimation Concepts and Methods in Molecular Divergence Time Estimation 26 November 2012 Prashant P. Sharma American Museum of Natural History Overview 1. Why do we date trees? 2. The molecular clock 3. Local clocks

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 200 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

More information

Cladistics and Bioinformatics Questions 2013

Cladistics and Bioinformatics Questions 2013 AP Biology Name Cladistics and Bioinformatics Questions 2013 1. The following table shows the percentage similarity in sequences of nucleotides from a homologous gene derived from five different species

More information

Efficiencies of maximum likelihood methods of phylogenetic inferences when different substitution models are used

Efficiencies of maximum likelihood methods of phylogenetic inferences when different substitution models are used Molecular Phylogenetics and Evolution 31 (2004) 865 873 MOLECULAR PHYLOGENETICS AND EVOLUTION www.elsevier.com/locate/ympev Efficiencies of maximum likelihood methods of phylogenetic inferences when different

More information

8/23/2014. Phylogeny and the Tree of Life

8/23/2014. Phylogeny and the Tree of Life Phylogeny and the Tree of Life Chapter 26 Objectives Explain the following characteristics of the Linnaean system of classification: a. binomial nomenclature b. hierarchical classification List the major

More information

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics Bioinformatics 1 Biology, Sequences, Phylogenetics Part 4 Sepp Hochreiter Klausur Mo. 30.01.2011 Zeit: 15:30 17:00 Raum: HS14 Anmeldung Kusss Contents Methods and Bootstrapping of Maximum Methods Methods

More information

Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26

Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26 Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 27. Phylogeny methods, part 4 (Models of DNA and

More information

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Phylogenetic Analysis Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Outline Basic Concepts Tree Construction Methods Distance-based methods

More information

Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22

Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22 Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 24. Phylogeny methods, part 4 (Models of DNA and

More information

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University Phylogenetics: Distance Methods COMP 571 - Spring 2015 Luay Nakhleh, Rice University Outline Evolutionary models and distance corrections Distance-based methods Evolutionary Models and Distance Correction

More information

Algorithms in Bioinformatics

Algorithms in Bioinformatics Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Distance Methods Character Methods

More information

Agricultural University

Agricultural University , April 2011 p : 8-16 ISSN : 0853-811 Vol16 No.1 PERFORMANCE COMPARISON BETWEEN KIMURA 2-PARAMETERS AND JUKES-CANTOR MODEL IN CONSTRUCTING PHYLOGENETIC TREE OF NEIGHBOUR JOINING Hendra Prasetya 1, Asep

More information

Minimum evolution using ordinary least-squares is less robust than neighbor-joining

Minimum evolution using ordinary least-squares is less robust than neighbor-joining Minimum evolution using ordinary least-squares is less robust than neighbor-joining Stephen J. Willson Department of Mathematics Iowa State University Ames, IA 50011 USA email: swillson@iastate.edu November

More information

Estimating Divergence Dates from Molecular Sequences

Estimating Divergence Dates from Molecular Sequences Estimating Divergence Dates from Molecular Sequences Andrew Rambaut and Lindell Bromham Department of Zoology, University of Oxford The ability to date the time of divergence between lineages using molecular

More information

EVOLUTIONARY DISTANCE MODEL BASED ON DIFFERENTIAL EQUATION AND MARKOV PROCESS

EVOLUTIONARY DISTANCE MODEL BASED ON DIFFERENTIAL EQUATION AND MARKOV PROCESS August 0 Vol 4 No 005-0 JATIT & LLS All rights reserved ISSN: 99-8645 wwwjatitorg E-ISSN: 87-95 EVOLUTIONAY DISTANCE MODEL BASED ON DIFFEENTIAL EUATION AND MAKOV OCESS XIAOFENG WANG College of Mathematical

More information

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution Today s topics Inferring phylogeny Introduction! Distance methods! Parsimony method!"#$%&'(!)* +,-.'/01!23454(6!7!2845*0&4'9#6!:&454(6 ;?@AB=C?DEF Overview of phylogenetic inferences Methodology Methods

More information

What Is Conservation?

What Is Conservation? What Is Conservation? Lee A. Newberg February 22, 2005 A Central Dogma Junk DNA mutates at a background rate, but functional DNA exhibits conservation. Today s Question What is this conservation? Lee A.

More information

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis

More information

FUNDAMENTALS OF MOLECULAR EVOLUTION

FUNDAMENTALS OF MOLECULAR EVOLUTION FUNDAMENTALS OF MOLECULAR EVOLUTION Second Edition Dan Graur TELAVIV UNIVERSITY Wen-Hsiung Li UNIVERSITY OF CHICAGO SINAUER ASSOCIATES, INC., Publishers Sunderland, Massachusetts Contents Preface xiii

More information

BINF6201/8201. Molecular phylogenetic methods

BINF6201/8201. Molecular phylogenetic methods BINF60/80 Molecular phylogenetic methods 0-7-06 Phylogenetics Ø According to the evolutionary theory, all life forms on this planet are related to one another by descent. Ø Traditionally, phylogenetics

More information

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057 Estimating Phylogenies (Evolutionary Trees) II Biol4230 Thurs, March 2, 2017 Bill Pearson wrp@virginia.edu 4-2818 Jordan 6-057 Tree estimation strategies: Parsimony?no model, simply count minimum number

More information

MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS. Masatoshi Nei"

MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS. Masatoshi Nei MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS Masatoshi Nei" Abstract: Phylogenetic trees: Recent advances in statistical methods for phylogenetic reconstruction and genetic diversity analysis were

More information

Lecture 11 Friday, October 21, 2011

Lecture 11 Friday, October 21, 2011 Lecture 11 Friday, October 21, 2011 Phylogenetic tree (phylogeny) Darwin and classification: In the Origin, Darwin said that descent from a common ancestral species could explain why the Linnaean system

More information

How Molecules Evolve. Advantages of Molecular Data for Tree Building. Advantages of Molecular Data for Tree Building

How Molecules Evolve. Advantages of Molecular Data for Tree Building. Advantages of Molecular Data for Tree Building How Molecules Evolve Guest Lecture: Principles and Methods of Systematic Biology 11 November 2013 Chris Simon Approaching phylogenetics from the point of view of the data Understanding how sequences evolve

More information

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree) I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

More information

Phylogenetics: Building Phylogenetic Trees

Phylogenetics: Building Phylogenetic Trees 1 Phylogenetics: Building Phylogenetic Trees COMP 571 Luay Nakhleh, Rice University 2 Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary model should

More information

Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences

Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences Mathematical Statistics Stockholm University Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences Bodil Svennblad Tom Britton Research Report 2007:2 ISSN 650-0377 Postal

More information

Tree of Life iological Sequence nalysis Chapter http://tolweb.org/tree/ Phylogenetic Prediction ll organisms on Earth have a common ancestor. ll species are related. The relationship is called a phylogeny

More information

Anatomy of a tree. clade is group of organisms with a shared ancestor. a monophyletic group shares a single common ancestor = tapirs-rhinos-horses

Anatomy of a tree. clade is group of organisms with a shared ancestor. a monophyletic group shares a single common ancestor = tapirs-rhinos-horses Anatomy of a tree outgroup: an early branching relative of the interest groups sister taxa: taxa derived from the same recent ancestor polytomy: >2 taxa emerge from a node Anatomy of a tree clade is group

More information

Theory of Evolution. Charles Darwin

Theory of Evolution. Charles Darwin Theory of Evolution harles arwin 858-59: Origin of Species 5 year voyage of H.M.S. eagle (8-6) Populations have variations. Natural Selection & Survival of the fittest: nature selects best adapted varieties

More information

Multiple Sequence Alignment. Sequences

Multiple Sequence Alignment. Sequences Multiple Sequence Alignment Sequences > YOR020c mstllksaksivplmdrvlvqrikaqaktasglylpe knveklnqaevvavgpgftdangnkvvpqvkvgdqvl ipqfggstiklgnddevilfrdaeilakiakd > crassa mattvrsvksliplldrvlvqrvkaeaktasgiflpe

More information

How should we go about modeling this? Model parameters? Time Substitution rate Can we observe time or subst. rate? What can we observe?

How should we go about modeling this? Model parameters? Time Substitution rate Can we observe time or subst. rate? What can we observe? How should we go about modeling this? gorilla GAAGTCCTTGAGAAATAAACTGCACACACTGG orangutan GGACTCCTTGAGAAATAAACTGCACACACTGG Model parameters? Time Substitution rate Can we observe time or subst. rate? What

More information

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other? Phylogeny and systematics Why are these disciplines important in evolutionary biology and how are they related to each other? Phylogeny and systematics Phylogeny: the evolutionary history of a species

More information

Understanding relationship between homologous sequences

Understanding relationship between homologous sequences Molecular Evolution Molecular Evolution How and when were genes and proteins created? How old is a gene? How can we calculate the age of a gene? How did the gene evolve to the present form? What selective

More information

Phylogene)cs. IMBB 2016 BecA- ILRI Hub, Nairobi May 9 20, Joyce Nzioki

Phylogene)cs. IMBB 2016 BecA- ILRI Hub, Nairobi May 9 20, Joyce Nzioki Phylogene)cs IMBB 2016 BecA- ILRI Hub, Nairobi May 9 20, 2016 Joyce Nzioki Phylogenetics The study of evolutionary relatedness of organisms. Derived from two Greek words:» Phle/Phylon: Tribe/Race» Genetikos:

More information

Bootstraps and testing trees. Alog-likelihoodcurveanditsconfidenceinterval

Bootstraps and testing trees. Alog-likelihoodcurveanditsconfidenceinterval ootstraps and testing trees Joe elsenstein epts. of Genome Sciences and of iology, University of Washington ootstraps and testing trees p.1/20 log-likelihoodcurveanditsconfidenceinterval 2620 2625 ln L

More information

Phylogenetic Analysis and Intraspeci c Variation : Performance of Parsimony, Likelihood, and Distance Methods

Phylogenetic Analysis and Intraspeci c Variation : Performance of Parsimony, Likelihood, and Distance Methods Syst. Biol. 47(2): 228± 23, 1998 Phylogenetic Analysis and Intraspeci c Variation : Performance of Parsimony, Likelihood, and Distance Methods JOHN J. WIENS1 AND MARIA R. SERVEDIO2 1Section of Amphibians

More information

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

Integrative Biology 200 PRINCIPLES OF PHYLOGENETICS Spring 2018 University of California, Berkeley Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley B.D. Mishler Feb. 14, 2018. Phylogenetic trees VI: Dating in the 21st century: clocks, & calibrations;

More information

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016 Molecular phylogeny - Using molecular sequences to infer evolutionary relationships Tore Samuelsson Feb 2016 Molecular phylogeny is being used in the identification and characterization of new pathogens,

More information

Probability Distribution of Molecular Evolutionary Trees: A New Method of Phylogenetic Inference

Probability Distribution of Molecular Evolutionary Trees: A New Method of Phylogenetic Inference J Mol Evol (1996) 43:304 311 Springer-Verlag New York Inc. 1996 Probability Distribution of Molecular Evolutionary Trees: A New Method of Phylogenetic Inference Bruce Rannala, Ziheng Yang Department of

More information

Estimating Evolutionary Trees. Phylogenetic Methods

Estimating Evolutionary Trees. Phylogenetic Methods Estimating Evolutionary Trees v if the data are consistent with infinite sites then all methods should yield the same tree v it gets more complicated when there is homoplasy, i.e., parallel or convergent

More information

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive. Additive distances Let T be a tree on leaf set S and let w : E R + be an edge-weighting of T, and assume T has no nodes of degree two. Let D ij = e P ij w(e), where P ij is the path in T from i to j. Then

More information

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University Phylogenetics: Building Phylogenetic Trees COMP 571 - Fall 2010 Luay Nakhleh, Rice University Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary

More information

C.DARWIN ( )

C.DARWIN ( ) C.DARWIN (1809-1882) LAMARCK Each evolutionary lineage has evolved, transforming itself, from a ancestor appeared by spontaneous generation DARWIN All organisms are historically interconnected. Their relationships

More information

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Chapter 12 (Strikberger) Molecular Phylogenies and Evolution METHODS FOR DETERMINING PHYLOGENY In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Modern

More information

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

More information

Phylogeny: building the tree of life

Phylogeny: building the tree of life Phylogeny: building the tree of life Dr. Fayyaz ul Amir Afsar Minhas Department of Computer and Information Sciences Pakistan Institute of Engineering & Applied Sciences PO Nilore, Islamabad, Pakistan

More information

Plan: Evolutionary trees, characters. Perfect phylogeny Methods: NJ, parsimony, max likelihood, Quartet method

Plan: Evolutionary trees, characters. Perfect phylogeny Methods: NJ, parsimony, max likelihood, Quartet method Phylogeny 1 Plan: Phylogeny is an important subject. We have 2.5 hours. So I will teach all the concepts via one example of a chain letter evolution. The concepts we will discuss include: Evolutionary

More information

ESTIMATION OF CONSERVATISM OF CHARACTERS BY CONSTANCY WITHIN BIOLOGICAL POPULATIONS

ESTIMATION OF CONSERVATISM OF CHARACTERS BY CONSTANCY WITHIN BIOLOGICAL POPULATIONS ESTIMATION OF CONSERVATISM OF CHARACTERS BY CONSTANCY WITHIN BIOLOGICAL POPULATIONS JAMES S. FARRIS Museum of Zoology, The University of Michigan, Ann Arbor Accepted March 30, 1966 The concept of conservatism

More information

Bioinformatics 1 -- lecture 9. Phylogenetic trees Distance-based tree building Parsimony

Bioinformatics 1 -- lecture 9. Phylogenetic trees Distance-based tree building Parsimony ioinformatics -- lecture 9 Phylogenetic trees istance-based tree building Parsimony (,(,(,))) rees can be represented in "parenthesis notation". Each set of parentheses represents a branch-point (bifurcation),

More information

PHYLOGENY AND SYSTEMATICS

PHYLOGENY AND SYSTEMATICS AP BIOLOGY EVOLUTION/HEREDITY UNIT Unit 1 Part 11 Chapter 26 Activity #15 NAME DATE PERIOD PHYLOGENY AND SYSTEMATICS PHYLOGENY Evolutionary history of species or group of related species SYSTEMATICS Study

More information

Inferring Phylogenies from Protein Sequences by. Parsimony, Distance, and Likelihood Methods. Joseph Felsenstein. Department of Genetics

Inferring Phylogenies from Protein Sequences by. Parsimony, Distance, and Likelihood Methods. Joseph Felsenstein. Department of Genetics Inferring Phylogenies from Protein Sequences by Parsimony, Distance, and Likelihood Methods Joseph Felsenstein Department of Genetics University of Washington Box 357360 Seattle, Washington 98195-7360

More information

7. Tests for selection

7. Tests for selection Sequence analysis and genomics 7. Tests for selection Dr. Katja Nowick Group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute for Brain Research www. nowicklab.info

More information

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression) Using phylogenetics to estimate species divergence times... More accurately... Basics and basic issues for Bayesian inference of divergence times (plus some digression) "A comparison of the structures

More information

Phylogenetic Analysis

Phylogenetic Analysis Phylogenetic Analysis Aristotle Through classification, one might discover the essence and purpose of species. Nelson & Platnick (1981) Systematics and Biogeography Carl Linnaeus Swedish botanist (1700s)

More information

A (short) introduction to phylogenetics

A (short) introduction to phylogenetics A (short) introduction to phylogenetics Thibaut Jombart, Marie-Pauline Beugin MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis with PR Statistics, Millport Field

More information

Letter to the Editor. Temperature Hypotheses. David P. Mindell, Alec Knight,? Christine Baer,$ and Christopher J. Huddlestons

Letter to the Editor. Temperature Hypotheses. David P. Mindell, Alec Knight,? Christine Baer,$ and Christopher J. Huddlestons Letter to the Editor Slow Rates of Molecular Evolution Temperature Hypotheses in Birds and the Metabolic Rate and Body David P. Mindell, Alec Knight,? Christine Baer,$ and Christopher J. Huddlestons *Department

More information

What is Phylogenetics

What is Phylogenetics What is Phylogenetics Phylogenetics is the area of research concerned with finding the genetic connections and relationships between species. The basic idea is to compare specific characters (features)

More information

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley

PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION Integrative Biology 200B Spring 2009 University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley B.D. Mishler Jan. 22, 2009. Trees I. Summary of previous lecture: Hennigian

More information

Distances that Perfectly Mislead

Distances that Perfectly Mislead Syst. Biol. 53(2):327 332, 2004 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150490423809 Distances that Perfectly Mislead DANIEL H. HUSON 1 AND

More information

How to read and make phylogenetic trees Zuzana Starostová

How to read and make phylogenetic trees Zuzana Starostová How to read and make phylogenetic trees Zuzana Starostová How to make phylogenetic trees? Workflow: obtain DNA sequence quality check sequence alignment calculating genetic distances phylogeny estimation

More information

Molecular Evolution and Phylogenetic Tree Reconstruction
