Assessing Phylogenetic Hypotheses and Phylogenetic Data

1 Assessing Phylogenetic Hypotheses and Phylogenetic Data
We use numerical phylogenetic methods because most data include potentially misleading evidence of relationships. We should not be content with constructing phylogenetic hypotheses but should also assess what confidence we can place in them. This is not always simple (but do not despair!).

2 Assessing Data Quality
We expect (or hope) that our data will be well structured and contain strong phylogenetic signal. We can test this using randomization tests of explicit null hypotheses. The behaviour of the real data, or some measure of their quality, is contrasted with that of comparable but phylogenetically uninformative data produced by randomizing the real data.

3 Random Permutation
Random permutation reduces any correlation among characters to that expected by chance alone. It preserves the number of taxa, the number of characters and the character states in each character (and the theoretical maximum and minimum tree lengths).
[Figure: two matrices of 8 taxa (R-P, A-E, N-R, D-M, O-U, M-T, L-E, Y-D) by 8 characters. Left: original structured data with strong correlations among characters. Right: randomly permuted data with any correlation among characters due to chance.]
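The permutation step can be sketched in a few lines of Python. This is a minimal illustration, not the routine used by PAUP*: the function name `permute_characters` and the dictionary layout of the matrix are assumptions made for the example, and the matrix is patterned on the slide (each taxon alternates between its two letters).

```python
import random

def permute_characters(matrix, rng):
    """Shuffle the states of each character independently among taxa.

    `matrix` maps taxon name -> list of character states (equal lengths).
    Hypothetical helper for illustration only.
    """
    taxa = list(matrix)
    n_chars = len(matrix[taxa[0]])
    permuted = {taxon: [None] * n_chars for taxon in taxa}
    for c in range(n_chars):
        column = [matrix[taxon][c] for taxon in taxa]
        rng.shuffle(column)  # same states, reassigned to taxa at random
        for taxon, state in zip(taxa, column):
            permuted[taxon][c] = state
    return permuted

# Matrix patterned on the slide: taxon "R-P" has states R P R P R P R P, etc.
names = ["R-P", "A-E", "N-R", "D-M", "O-U", "M-T", "L-E", "Y-D"]
original = {name: list((name[0] + name[-1]) * 4) for name in names}
permuted = permute_characters(original, random.Random(1))

for taxon in names:
    print(taxon, " ".join(permuted[taxon]))
```

Because each column is shuffled on its own, every character keeps exactly the same states in the same numbers; only the association between characters is lost.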

4 Matrix Randomization Tests
Compare some measure of data quality (hierarchical structure) for the real data and for many randomly permuted data sets. This allows us to define a test statistic for the null hypothesis that the real data are no better structured than randomly permuted, phylogenetically uninformative data. A permutation tail probability (PTP) is the proportion of data sets with as good a measure of quality as the real data, or better.

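5 Structure of Randomization Tests
Reject the null hypothesis if, for example, no more than 5% of random permutations have as good a measure as the real data, or better.
[Figure: frequency distribution of a measure of data quality (e.g. tree length, ML, pairwise incompatibilities) for permuted data, running from GOOD to BAD, with a 95% cutoff. Real data beyond the cutoff on the GOOD side PASS TEST (reject null hypothesis); real data within the distribution FAIL TEST.]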
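A PTP can be computed directly from the scores of the real and permuted data sets. The sketch below is illustrative only: the function name is hypothetical, the permuted tree lengths are invented, and counting the real data set as one of the data sets is an assumption about the convention used (it makes 0.01 the smallest PTP obtainable from 99 permutations).

```python
def permutation_tail_probability(real_score, permuted_scores, lower_is_better=True):
    """Proportion of data sets scoring as well as, or better than, the real data."""
    if lower_is_better:  # e.g. tree length, number of incompatibilities
        as_good = sum(score <= real_score for score in permuted_scores)
    else:                # e.g. log-likelihood
        as_good = sum(score >= real_score for score in permuted_scores)
    # Assumption: the real data set itself is included in the count.
    return (as_good + 1) / (len(permuted_scores) + 1)

real_length = 618                                            # real data, from the example slide
permuted_lengths = [780 + (7 * i) % 40 for i in range(99)]   # invented values
ptp = permutation_tail_probability(real_length, permuted_lengths)
reject_null = ptp <= 0.05

print(f"PTP = {ptp:.2f}, reject null hypothesis: {reject_null}")
```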

6 Matrix Randomization Tests
Measures of data quality include:
1. Tree length for the most parsimonious trees: the shorter the tree length, the better the data (PAUP*).
2. Number of pairwise incompatibilities between characters (pairs of incongruent characters): the fewer character conflicts, the better the data.
3. Skewness of the distribution of tree lengths (PAUP).

7 Matrix Randomization Tests
Ciliate SSUrDNA example (Min = 430, Max = 927)
Real data: 1 MPT, L = 618, PTP = 0.01. Significantly non-random.
Randomly permuted data: 3 MPTs (strict consensus shown), L = 792, PTP = 0.68. Not significantly different from random.
[Figure: the most parsimonious tree for the real data and the strict consensus tree for the permuted data, each relating Ochromonas, Symbiodinium, Prorocentrum, Loxodes, Tracheloraphis, Spirostomum, Gruberia, Euplotes and Tetrahymena. CI, RI and PC-PTP were also reported for each.]

8 Skewness of Tree Length Distributions
Studies with random (and phylogenetically uninformative) data showed that the distribution of tree lengths tends to be normal. In contrast, phylogenetically informative data are expected to have a strongly skewed distribution, with few shortest trees and few trees nearly as short.
[Figure: two histograms of number of trees against tree length, each with the shortest tree marked: one for random data and one for informative data.]

9 Skewness of Tree Length Distributions
Skewness of tree length distributions can be used as a measure of data quality in randomization tests. It is measured with the g1 statistic in PAUP. Significance cut-offs for data sets of up to eight taxa have been published, based on randomly generated data (rather than randomly permuted data). PAUP does not perform the more direct randomization test.
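For reference, g1 is the standard moment coefficient of skewness, the third central moment divided by the 3/2 power of the second. The sketch below computes it from a frequency table of tree lengths; the function name and the two small histograms are invented for illustration, and the code is not taken from PAUP.

```python
def g1_from_histogram(histogram):
    """Skewness g1 = m3 / m2**1.5 for a {tree_length: number_of_trees} table."""
    n = sum(histogram.values())
    mean = sum(length * count for length, count in histogram.items()) / n
    m2 = sum(count * (length - mean) ** 2 for length, count in histogram.items()) / n
    m3 = sum(count * (length - mean) ** 3 for length, count in histogram.items()) / n
    return m3 / m2 ** 1.5

# Invented toy distributions of tree lengths.
symmetric = {100: 1, 101: 2, 102: 1}
left_skewed = {90: 1, 99: 5, 100: 10}  # few short trees, many long ones

print(g1_from_histogram(symmetric))    # zero for a symmetric distribution
print(g1_from_histogram(left_skewed))  # negative: long tail towards short trees
```

A distribution with few short trees and many long ones has a tail towards short lengths, so informative data are expected to give a clearly negative g1.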

10 Skewness - example
[Figure: frequency distributions of tree lengths, each labelled with its g1 value. REAL DATA (Ciliate SSUrDNA): tree lengths from 722 to 900, with a long tail towards shorter trees and a peak near 850. RANDOMLY PERMUTED DATA: tree lengths from about 792 to 867, roughly symmetric about a peak near 830.]

11 Matrix Randomization Tests - use and limitations
Can detect very poor data that provide no good basis for phylogenetic inference (throw them away!). However, only very little structure may be needed to reject the null hypothesis (passing the test does not mean great data). Does not indicate the location of this structure (more discerning tests are possible). In the skewness test, significance levels for g1 have been determined for small numbers of taxa only, so this test remains of limited use.

12 Assessing Phylogenetic Hypotheses - groups on trees
Several methods have been proposed that attach numerical values to internal branches in trees; these values are intended to provide some measure of the strength of support for those branches and the corresponding groups. These methods include:
- character resampling methods (the bootstrap and jackknife)
- decay analyses
- additional randomization tests

13 Bootstrapping (non-parametric)
Bootstrapping is a modern statistical technique that uses computer-intensive random resampling of data to determine sampling error or confidence intervals for some estimated parameter.

14 Bootstrapping (non-parametric)
- Characters are resampled with replacement to create many bootstrap replicate data sets.
- Each bootstrap replicate data set is analysed (e.g. with parsimony, distance, ML).
- Agreement among the resulting trees is summarized with a majority-rule consensus tree.
- The frequencies of occurrence of groups, the bootstrap proportions (BPs), are a measure of support for those groups.
- Additional information is given in partition tables.

15 Bootstrapping
Original data matrix
Taxa    Characters
A       R R Y Y Y Y Y Y
B       R R Y Y Y Y Y Y
C       Y Y Y Y Y R R R
D       Y Y R R R R R R
Outgp   R R R R R R R R

Resampled data matrix
Taxa    Characters
A       R R R Y Y Y Y Y
B       R R R Y Y Y Y Y
C       Y Y Y Y Y R R R
D       Y Y Y R R R R R
Outgp   R R R R R R R R

Randomly resample characters from the original data with replacement to build many bootstrap replicate data sets of the same size as the original, and analyse each replicate data set. Summarise the results of the multiple analyses with a majority-rule consensus tree. Bootstrap proportions (BPs) are the frequencies with which groups are encountered in analyses of replicate data sets.
[Figure: trees of A, B, C, D and Outgroup from the original and resampled matrices, and the majority-rule consensus tree with bootstrap proportions of 96% and 66%.]
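The resampling and counting steps can be sketched as follows. This is a toy illustration under stated assumptions: the function names are hypothetical, and `toy_clades` (average-linkage clustering on the number of differing characters) is only a stand-in for a real parsimony, distance or ML analysis, so its bootstrap proportions are not expected to match the 96% and 66% on the slide.

```python
import random
from collections import Counter
from itertools import combinations

def resample_characters(matrix, rng):
    """Draw characters (columns) with replacement; same size as the original."""
    n_chars = len(next(iter(matrix.values())))
    picks = [rng.randrange(n_chars) for _ in range(n_chars)]
    return {taxon: [states[i] for i in picks] for taxon, states in matrix.items()}

def toy_clades(matrix):
    """Stand-in analysis: average-linkage clustering on Hamming distances.
    Returns the non-trivial clusters as frozensets of taxon names."""
    def dist(a, b):
        return sum(x != y for x, y in zip(matrix[a], matrix[b]))
    def avg(c1, c2):
        return sum(dist(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))
    clusters = [frozenset([taxon]) for taxon in sorted(matrix)]
    clades = set()
    while len(clusters) > 2:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        clades.add(c1 | c2)
    return clades

def bootstrap_proportions(matrix, analyse, replicates=200, seed=0):
    """Frequency with which each group appears across bootstrap replicates."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(replicates):
        counts.update(analyse(resample_characters(matrix, rng)))
    return {clade: n / replicates for clade, n in counts.items()}

# Original data matrix from the slide.
matrix = {
    "A": list("RRYYYYYY"), "B": list("RRYYYYYY"), "C": list("YYYYYRRR"),
    "D": list("YYRRRRRR"), "Outgp": list("RRRRRRRR"),
}
bps = bootstrap_proportions(matrix, toy_clades)
majority_rule = {clade for clade, bp in bps.items() if bp > 0.5}

for clade, bp in sorted(bps.items(), key=lambda item: -item[1]):
    print(sorted(clade), f"{100 * bp:.0f}%")
```

Passing the analysis in as a function keeps the resampling logic independent of the tree-building method, which mirrors the point that BPs measure support under whatever method of analysis is used.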

16 Bootstrapping - an example
Ciliate SSUrDNA - parsimony bootstrap
[Figure: majority-rule consensus tree of Ochromonas (1), Symbiodinium (2), Prorocentrum (3), Euplotes (8), Tetrahymena (9), Loxodes (4), Tracheloraphis (5), Spirostomum (6) and Gruberia (7), with a partition table giving the frequency of each group.]

17 Bootstrapping - random data
Randomly permuted data - parsimony bootstrap
[Figure: majority-rule consensus tree and majority-rule consensus with minority components for Ochromonas, Symbiodinium, Prorocentrum, Loxodes, Tracheloraphis, Spirostomum, Euplotes, Tetrahymena and Gruberia, with node labels 71 and 59, and a partition table giving the frequency of each group.]

18 Bootstrap - interpretation
Bootstrapping was introduced as a way of establishing confidence intervals for phylogenies. This interpretation of bootstrap proportions (BPs) depends on the assumption that the original data are a random sample from a much larger set of independent and identically distributed data. However, several things complicate this interpretation:
- Perhaps the assumptions are unreasonable, making any statistical interpretation of BPs invalid.
- Some theoretical work indicates that BPs are very conservative and may underestimate confidence intervals; the problem increases with the number of taxa.
- BPs can be high for incongruent relationships in separate analyses and can therefore be misleading (misleading data -> misleading BPs).
- With parsimony, BPs may be highly affected by the inclusion or exclusion of only a few characters.

19 Bootstrap - interpretation
Bootstrapping is a very valuable and widely used technique; it (or some suitable alternative) is demanded by some journals, but it may require a pragmatic interpretation. BPs depend on two aspects of the support for a group: the number of characters supporting the group and the level of support for incongruent groups. BPs thus provide an index of the relative support for groups provided by a set of data under whatever interpretation of the data (method of analysis) is used.

20 Bootstrap - interpretation
High BPs (e.g. > 85%) are indicative of strong signal in the data. Provided we have no evidence of strong misleading signal (e.g. base composition biases, great differences in branch lengths), high BPs are likely to reflect strong phylogenetic signal. Low BPs need not mean the relationship is false, only that it is poorly supported. Bootstrapping can be viewed as a way of exploring the robustness of phylogenetic inferences to perturbations in the balance of supporting and conflicting evidence for groups.

21 Jackknifing
Jackknifing is very similar to bootstrapping and differs only in the character resampling strategy. Some proportion of characters (e.g. 50%) is randomly selected and deleted. Replicate data sets are analysed and the results summarised with a majority-rule consensus tree. Jackknifing and bootstrapping tend to produce broadly similar results and have similar interpretations.
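The only step that differs from the bootstrap is the resampling, which might look like this. The function name and the `delete_fraction` parameter are hypothetical; the sketch deletes characters without replacement and keeps the survivors in their original order.

```python
import random

def jackknife_characters(matrix, rng, delete_fraction=0.5):
    """Delete a random fraction of the characters (columns), without replacement."""
    n_chars = len(next(iter(matrix.values())))
    n_keep = n_chars - round(delete_fraction * n_chars)
    kept = sorted(rng.sample(range(n_chars), n_keep))
    return {taxon: [states[i] for i in kept] for taxon, states in matrix.items()}

# Toy matrix with distinct states so the surviving characters are easy to see.
matrix = {"A": list("abcdefgh"), "B": list("ABCDEFGH")}
replicate = jackknife_characters(matrix, random.Random(0))

print("".join(replicate["A"]), "".join(replicate["B"]))
```

Each replicate is therefore smaller than the original matrix and contains no duplicated characters, whereas a bootstrap replicate has the original size and usually contains duplicates.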

22 Decay analysis
In parsimony analysis, one way to assess support for a group is to see whether the group also occurs in slightly less parsimonious trees. The length difference between the shortest trees that include the group and the shortest trees that exclude it (the extra steps required to overturn the group) is the decay index or Bremer support. Total support (for a tree) is the sum of all clade decay indices; this has been advocated as a measure for an as yet unavailable matrix randomization test.
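Given tree lengths for trees with and without a group, the decay index is a simple difference. The sketch below assumes that the list of trees already contains the shortest tree with the clade and the shortest tree without it (in practice these come from constrained searches); the function name, the clades and the lengths are all invented for illustration.

```python
def decay_index(trees, clade):
    """Extra steps needed to lose `clade`.

    `trees` is a list of (tree_length, set_of_clades) pairs that is assumed
    to include the shortest trees both with and without the clade.
    """
    shortest_with = min(length for length, clades in trees if clade in clades)
    shortest_without = min(length for length, clades in trees if clade not in clades)
    return shortest_without - shortest_with

AB, AC, ABC, DE, CDE, ABCD = (frozenset(s) for s in ("AB", "AC", "ABC", "DE", "CDE", "ABCD"))
trees = [                      # invented lengths
    (100, {AB, ABC, DE}),      # most parsimonious tree
    (101, {AB, DE, CDE}),      # shortest tree lacking ABC
    (103, {AB, ABC, ABCD}),    # shortest tree lacking DE
    (104, {AC, ABC, DE}),      # shortest tree lacking AB
]

indices = {name: decay_index(trees, clade) for name, clade in (("AB", AB), ("ABC", ABC), ("DE", DE))}
total_support = sum(indices.values())
print(indices, "total support =", total_support)
```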

23 Decay analysis - example
[Figure: trees for the Ciliate SSUrDNA data and for the randomly permuted data, each relating Ochromonas, Symbiodinium, Prorocentrum, Loxodes, Tracheloraphis, Spirostomum, Gruberia, Euplotes and Tetrahymena.]

24 Decay analyses - in practice
Decay indices for each clade can be determined by:
- saving increasingly less parsimonious trees and producing the corresponding strict component consensus trees until the consensus is completely unresolved
- analyses using reverse topological constraints to determine the shortest trees that lack each clade
- using the Autodecay or TreeRot programs (in conjunction with PAUP)

25 Decay indices - interpretation
Generally, the higher the decay index, the better the relative support for a group. Like BPs, decay indices may be misleading if the data are misleading. Unlike BPs, decay indices are not scaled (0-100), and it is less clear what is an acceptable decay index. The magnitudes of decay indices and BPs are generally correlated (i.e. they tend to agree). Only groups found in all most parsimonious trees have decay indices greater than zero.

26 Trees are typically complex - they can be thought of as sets of less complex relationships
[Figure: tree of taxa A, B, C, D and E.]
Clades: AB, ABC, DE
Resolved triplets: (AB)C, (AB)D, (AB)E, (AC)D, (AC)E, (BC)D, (BC)E, (DE)A, (DE)B, (DE)C
Resolved quartets: ABCD, ABDE, ABCE, ACDE, BCDE
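The triplets implied by a set of clades can be enumerated mechanically: a rooted tree resolves (ab)c whenever some clade contains a and b but not c. The following sketch uses a hypothetical helper name and assumes single-letter taxon names so that triplets can be written as short strings.

```python
from itertools import combinations

def resolved_triplets(taxa, clades):
    """All triplets (ab)c resolved by a rooted tree given as a list of clades."""
    triplets = set()
    for clade in clades:
        outside = sorted(set(taxa) - set(clade))
        for a, b in combinations(sorted(clade), 2):
            for c in outside:
                triplets.add(f"({a}{b}){c}")
    return triplets

triplets = resolved_triplets("ABCDE", ["AB", "ABC", "DE"])
print(len(triplets), sorted(triplets))
```

For the five-taxon tree on the slide this yields all ten possible triplets, as expected for a fully resolved rooted tree.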

27 Extending Support Measures
The same measures (BP, JP and DI) that are used for clades/splits can also be determined for triplets and quartets. This provides a lot more information because there are more triplets/quartets than there are clades. Furthermore...

28 The Decay Theorem
The DI of a hypothesis of relationships is equal to the lowest DI of the resolved triplets that the hypothesis entails. This applies equally to BPs and JPs as well as DIs. Thus a phylogenetic chain is no stronger than its weakest link, and measures of clade support may give a very incomplete picture of the distribution of support.
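Read as a formula, the theorem says that the support for a clade is the minimum over the triplets it entails. The sketch below applies that rule to invented triplet decay indices; the helper names are hypothetical and taxon names are assumed to be single letters.

```python
from itertools import combinations

def entailed_triplets(taxa, clade):
    """Triplets (ab)c with a and b inside the clade and c outside it."""
    outside = sorted(set(taxa) - set(clade))
    return [f"({a}{b}){c}" for a, b in combinations(sorted(clade), 2) for c in outside]

def support_from_triplets(taxa, clade, triplet_support):
    """Decay theorem: a hypothesis is only as well supported as its weakest triplet."""
    return min(triplet_support[t] for t in entailed_triplets(taxa, clade))

taxa = "ABCDE"
triplet_di = {                 # invented decay indices for each resolved triplet
    "(AB)C": 4, "(AB)D": 6, "(AB)E": 5,
    "(AC)D": 3, "(AC)E": 2, "(BC)D": 3, "(BC)E": 2,
    "(DE)A": 7, "(DE)B": 7, "(DE)C": 3,
}

for clade in ("AB", "ABC", "DE"):
    print(clade, support_from_triplets(taxa, clade, triplet_di))
```

In this toy case the clade ABC has a decay index of only 2 even though several of its triplets are much better supported, which is the incomplete picture the slide warns about.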

29 Extensions
- Double decay analysis is the determination of decay indices for all relationships; it gives a more comprehensive but potentially very complicated summary of support.
- Majority-rule reduced consensus provides a similarly more comprehensive/complicated summary of bootstrap/jackknife proportions.
- Leaf stability provides support values for the phylogenetic position of particular leaves.

30 Bootstrapping with Reduced Consensus
[Figure: trees of taxa A to J together with taxon X, which appears in different positions in different trees, and reduced consensus trees of A to J.]

31 Bootstrapping
[Figure: trees of taxa A to J with and without taxon X.]

32 Leaf Stability
Leaf stability is the average of the supports of the triplets/quartets containing the leaf.
[Figure: tree of Acanthostega, Ichthyostega, Greererpeton, Crassigyrinus, Eucritta, Gephyrostegus, Whatcheeria, Balanerpeton, Dendrerpeton, Pholiderpeton, Proterogyrinus, Baphetes, Loxomma and Megalocephalus, with a leaf stability value in parentheses for each leaf (values range from 49 to 98).]
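The averaging in this definition is straightforward to write down. In the sketch below the triplet bootstrap proportions are invented, the function name is hypothetical, and taxon names are assumed to be single letters so that membership can be checked on the triplet string.

```python
def leaf_stability(leaf, triplet_support):
    """Average support of all resolved triplets that contain `leaf`."""
    values = [support for triplet, support in triplet_support.items() if leaf in triplet]
    return sum(values) / len(values)

# Invented bootstrap proportions for the four triplets of a four-taxon tree.
triplet_bp = {"(AB)C": 0.9, "(AB)D": 1.0, "(AC)D": 0.6, "(BC)D": 0.7}

stability = {leaf: leaf_stability(leaf, triplet_bp) for leaf in "ABCD"}
for leaf, value in stability.items():
    print(leaf, f"{100 * value:.0f}")
```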

33 PTP tests of groups
A number of randomization tests have been proposed for evaluating particular groups, rather than entire data matrices, by testing null hypotheses regarding the level of support they receive from the data. Randomization can be of the data or of the group. These methods have not become widely used, both because they are not readily performed and because their properties are still under investigation. One type, the topology-dependent PTP test, is included in PAUP* but has serious problems.

34 Comparing competing phylogenetic hypotheses - tests of two trees
Particularly useful techniques are those designed to allow evaluation of alternative phylogenetic hypotheses. Several such tests allow us to determine whether one tree is statistically significantly worse than another:
- Winning sites test
- Templeton test
- Kishino-Hasegawa test
- Shimodaira-Hasegawa test
- parametric bootstrapping

35 Tests of two trees
All these tests are of the null hypothesis that the differences between two trees (A and B) are no greater than expected from sampling error. The simplest, the winning sites test, sums the number of sites supporting tree A over tree B and vice versa (those having fewer steps on, and a better fit to, one of the trees). Under the null hypothesis, characters are equally likely to support tree A or tree B, and a binomial distribution gives the probability of the observed difference in the numbers of winning sites.
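The binomial calculation can be sketched as a two-sided sign test on the per-site step counts. The function name and the step counts are invented for the example, and sites that fit both trees equally well are simply ignored, which is an assumption about how ties are handled.

```python
from math import comb

def winning_sites_test(steps_a, steps_b):
    """Two-sided sign test on the number of sites favouring each tree.

    `steps_a[i]` and `steps_b[i]` are the steps site i requires on trees A and B.
    """
    wins_a = sum(a < b for a, b in zip(steps_a, steps_b))
    wins_b = sum(b < a for a, b in zip(steps_a, steps_b))
    n = wins_a + wins_b            # tied sites carry no information
    k = max(wins_a, wins_b)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    p_value = min(1.0, 2 * upper_tail)
    return wins_a, wins_b, p_value

steps_a = [1, 1, 1, 1, 1, 1, 2, 1]   # invented step counts
steps_b = [2, 2, 2, 2, 2, 2, 1, 1]
wins_a, wins_b, p_value = winning_sites_test(steps_a, steps_b)
print(wins_a, wins_b, p_value)
```

With six sites favouring tree A and one favouring tree B, the two-sided probability is 2 x 8/128 = 0.125, so this difference would not be judged significant.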

36 The Templeton test
Templeton's test is a non-parametric Wilcoxon signed-ranks test of the differences in the fits of characters to two trees. It is like the winning sites test but also takes into account the magnitudes of the differences in the support of characters for the two trees.
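A small exact version of the signed-ranks calculation is sketched below. It is an illustration under stated assumptions, not the routine used by any particular program: the function name is hypothetical, the step counts are invented, tied magnitudes receive average ranks, and the p-value is obtained by enumerating every assignment of signs, which is only practical for roughly 20 or fewer differing characters.

```python
from itertools import product

def templeton_test(steps_a, steps_b):
    """Exact two-sided Wilcoxon signed-ranks test on per-character step differences."""
    diffs = [a - b for a, b in zip(steps_a, steps_b) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):                      # average ranks over tied magnitudes
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    t_obs = min(w_plus, w_minus)
    total = sum(ranks)
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):   # all equally likely under H0
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= t_obs:
            hits += 1
    return t_obs, hits / 2 ** len(ranks)

# Invented step counts: four characters favour tree A by 1 step, one favours B by 2.
t_stat, p_value = templeton_test([1, 1, 1, 1, 3], [2, 2, 2, 2, 1])
print(t_stat, p_value)
```

Unlike the winning sites test, the single character that favours tree B counts for more here because its difference is larger.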

37 Templeton's test - an example
[Figure: tree of Seymouriidae, Diadectomorpha, Synapsida, Parareptilia, Captorhinidae, Paleothyris, Araeoscelidia, Claudiosaurus, Younginiformes, Archosauromorpha, Lepidosauriformes, Placodus and Eosauropterygia, with the two alternative positions of turtles marked 1 and 2.]
Recent studies of the relationships of turtles using morphological data have produced very different results, with turtles grouping either within the parareptiles (H1) or within the diapsids (H2), the result depending on the morphologist. This suggests there may be:
- problems with the data
- special problems with turtles
- weak support for turtle relationships
Parsimony analysis of the most recent data favoured H2. However, analyses constrained by H1 produced trees that required only 3 extra steps (<1% of tree length). The Templeton test was used to evaluate the trees and showed that the slightly longer H1 tree found in the constrained analyses was not significantly worse than the unconstrained H2 tree. The morphological data do not allow a choice between H1 and H2.

38 Kishino-Hasegawa test
The Kishino-Hasegawa test is similar in using differences in the support provided by individual sites for two trees to determine whether the overall differences between the trees are significantly greater than expected from random sampling error. It is a parametric test that depends on the assumptions that the characters are independent and identically distributed (the same assumptions that underlie the statistical interpretation of bootstrapping). It can be used with parsimony and maximum likelihood, and is implemented in PHYLIP and PAUP*.

39 Kishino-Hasegawa test
[Figure: distribution of step/likelihood differences at each site, centred on the expected mean of 0, with sites favouring tree A on one side and sites favouring tree B on the other.]
Under the null hypothesis, the mean of the differences in parsimony steps or likelihoods for each site is expected to be zero, and the distribution normal. From the observed differences we calculate a standard deviation. If the difference between trees (tree lengths or likelihoods) is attributable to sampling error, then characters will randomly support tree A or B and the total difference will be close to zero. The observed difference is significantly different from zero if it lies more than 1.96 standard deviations from zero. This allows us to reject the null hypothesis and declare the suboptimal tree significantly worse than the optimal tree (p < 0.05).
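The calculation described here amounts to a z-test on the summed per-site differences. The sketch below is a simplified illustration with invented per-site scores and a hypothetical function name; it estimates the standard deviation of the total as the square root of n times the sample variance of the site differences and uses a normal approximation for the two-sided p-value.

```python
from math import erf, sqrt

def kh_test(site_scores_a, site_scores_b):
    """z-test on the total of per-site differences (steps or log-likelihoods)."""
    diffs = [a - b for a, b in zip(site_scores_a, site_scores_b)]
    n = len(diffs)
    total = sum(diffs)
    mean = total / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)   # sample variance per site
    sd_total = sqrt(n * variance)                              # SD of the summed difference
    z = total / sd_total
    p_value = 1 - erf(abs(z) / sqrt(2))                        # two-sided, normal approximation
    return total, sd_total, z, p_value

# Invented per-site step counts on two trees.
steps_a = [3, 1, 3, 1]
steps_b = [1, 1, 1, 1]
total, sd_total, z, p_value = kh_test(steps_a, steps_b)
significant = abs(z) > 1.96

print(f"difference = {total} ± {sd_total:.2f}, z = {z:.2f}, p = {p_value:.3f}")
```

Reporting the result as a difference with its standard deviation matches the "Difference and SD" layout used to summarise such tests.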

40 Kishino-Hasegawa test - an example
[Figure: maximum likelihood tree from Ciliate SSUrDNA data, with the anaerobic ciliates with hydrogenosomes highlighted.]
Parsimonious character optimization of the presence and absence of hydrogenosomes suggests four separate origins of hydrogenosomes within the ciliates.
Questions:
- how reliable is this result?
- in particular, how well supported is the idea of multiple origins?
- how many origins can we confidently infer?

41 Kishino-Hasegawa test - an example
[Figure: most parsimonious tree from the Ciliate SSUrDNA data.]
Parsimony analysis yields a very similar tree; in particular, parsimonious character optimization indicates four separate origins of hydrogenosomes within ciliates. Decay indices and BPs for parsimony and distance analyses indicate relative support for clades. Differences between the ML, MP and distance trees generally reflect the less well supported relationships.

42 Kishino-Hasegawa test - example
[Figure: two examples of the topological constraint trees.]
Parsimony analyses with topological constraints were used to find the shortest trees that forced hydrogenosomal ciliate lineages together and thereby reduced the number of separate origins of hydrogenosomes. Each of the constrained parsimony trees was compared to the ML tree, and the Kishino-Hasegawa test was used to determine which of these trees were significantly worse than the ML tree.

43 Kishino-Hasegawa test
Test summary and results - origins of ciliate hydrogenosomes (simplified)

No. origins   Constraint tree   Extra steps   Difference and SD   Significantly worse?
4             ML
              MP                              ± 18                No
3             (cp,pt)                         ± 22                No
3             (cp,rc)                         ± 40                Yes
3             (cp,m)                          ± 36                Yes
3             (pt,rc)                         ± 38                Yes
3             (pt,m)                          ± 29                Yes
3             (rc,m)                          ± 34                Yes
2             (pt,cp,rc)                      ± 40                Yes
2             (pt,rc,m)                       ± 43                Yes
2             (pt,cp,m)                       ± 37                Yes
2             (cp,rc,m)                       ± 49                Yes
2             (pt,cp)(rc,m)                   ± 39                Yes
2             (pt,m)(rc,cp)                   ± 48                Yes
2             (pt,rc)(cp,m)                   ± 50                Yes
1             (pt,cp,m,rc)                    ± 49                Yes

Constrained analyses were used to find the most parsimonious trees with fewer than four separate origins of hydrogenosomes, and these were tested against the ML tree. Trees with two origins or one origin are all significantly worse than the ML tree. We can confidently conclude that there have been at least three separate origins of hydrogenosomes within the sampled ciliates.

44 Shimodaira-Hasegawa Test
To be statistically valid, the Kishino-Hasegawa test should be of trees that are selected a priori. However, most applications have used trees selected a posteriori on the basis of the phylogenetic analysis. Where we test the best tree against some other tree, the KH test will be biased towards rejection of the null hypothesis. The SH test is a similar but more statistically correct technique in these circumstances and should be preferred.

45 Taxonomic Congruence
Trees inferred from different data sets (different genes, morphology) should agree if they are accurate. Congruence between trees is best explained by their accuracy. Congruence can be investigated using consensus (and supertree) methods. Incongruence requires further work to explain or resolve the disagreements.

46 Reliability of Phylogenetic Methods
Phylogenetic methods (e.g. parsimony, distance, ML) can also be evaluated in terms of their general performance, particularly their:
- consistency: whether they approach the truth with more data
- efficiency: how quickly they do so (how much data is needed)
- robustness: how sensitive they are to violations of assumptions
Studies of these properties can be analytical or by simulation.

47 Reliability of Phylogenetic Methods
There have been many arguments that ML methods are best because they have desirable statistical properties, such as consistency. However, ML does not always have these properties:
- if the model is wrong or inadequate (fortunately this is testable to some extent)
- the properties have not yet been demonstrated for complex inference problems such as phylogenetic trees

48 Reliability of Phylogenetic Methods
Simulations show that ML methods generally outperform distance and parsimony methods over a broad range of realistic conditions (Whelan et al., Trends in Genetics 17). Most simulations are very (unrealistically) simple:
- few taxa (typically just four)
- few parameters (standard models: JC, K2P etc.)

49 Reliability of Phylogenetic Methods
Simulations with four taxa have shown:
- Model-based methods (distance and maximum likelihood) perform well when the model is accurate (not surprising!).
- Violations of assumptions can lead to inconsistency for all methods (a Felsenstein zone) when branch lengths or rates are highly unequal.
- Maximum likelihood methods are quite robust to violations of model assumptions.
- Weighting can improve the performance of parsimony (reduce the size of the Felsenstein zone).

50 Reliability of Phylogenetic Methods
However:
- Generalising from four-taxon simulations may be dangerous, as conclusions may not hold for more complex cases
- A few large-scale simulations (many taxa) have suggested that parsimony can be very accurate and efficient
- Most methods are accurate in correctly recovering known phylogenies produced in laboratory studies
More study of methods, using more realistic simulations, is needed to help in the choice of method.

C3020 Molecular Evolution. Exercises #3: Phylogenetics

C3020 Molecular Evolution. Exercises #3: Phylogenetics C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from

More information

Questions we can ask. Recall. Accuracy and Precision. Systematics - Bio 615. Outline

Questions we can ask. Recall. Accuracy and Precision. Systematics - Bio 615. Outline Outline 1. Mechanistic comparison with Parsimony - branch lengths & parameters 2. Performance comparison with Parsimony - Desirable attributes of a method - The Felsenstein and Farris zones - Heterotachous

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

A Chain Is No Stronger than Its Weakest Link: Double Decay Analysis of Phylogenetic Hypotheses

A Chain Is No Stronger than Its Weakest Link: Double Decay Analysis of Phylogenetic Hypotheses Syst. Biol. 49(4):754 776, 2000 A Chain Is No Stronger than Its Weakest Link: Double Decay Analysis of Phylogenetic Hypotheses MARK WILKINSON, 1 JOSEPH L. THORLEY, 1,2 AND PAUL UPCHURCH 3 1 Department

More information

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley

PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION Integrative Biology 200B Spring 2009 University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley B.D. Mishler Jan. 22, 2009. Trees I. Summary of previous lecture: Hennigian

More information

Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30

Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30 Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30 A non-phylogeny

More information

Phylogenetic methods in molecular systematics

Phylogenetic methods in molecular systematics Phylogenetic methods in molecular systematics Niklas Wahlberg Stockholm University Acknowledgement Many of the slides in this lecture series modified from slides by others www.dbbm.fiocruz.br/james/lectures.html

More information

Evaluating phylogenetic hypotheses

Evaluating phylogenetic hypotheses Evaluating phylogenetic hypotheses Methods for evaluating topologies Topological comparisons: e.g., parametric bootstrapping, constrained searches Methods for evaluating nodes Resampling techniques: bootstrapping,

More information

Phylogenetic inference

Phylogenetic inference Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types

More information

Introduction to characters and parsimony analysis

Introduction to characters and parsimony analysis Introduction to characters and parsimony analysis Genetic Relationships Genetic relationships exist between individuals within populations These include ancestordescendent relationships and more indirect

More information

Systematics - Bio 615

Systematics - Bio 615 Bayesian Phylogenetic Inference 1. Introduction, history 2. Advantages over ML 3. Bayes Rule 4. The Priors 5. Marginal vs Joint estimation 6. MCMC Derek S. Sikes University of Alaska 7. Posteriors vs Bootstrap

More information

Integrative Biology 200A "PRINCIPLES OF PHYLOGENETICS" Spring 2008

Integrative Biology 200A PRINCIPLES OF PHYLOGENETICS Spring 2008 Integrative Biology 200A "PRINCIPLES OF PHYLOGENETICS" Spring 2008 University of California, Berkeley B.D. Mishler March 18, 2008. Phylogenetic Trees I: Reconstruction; Models, Algorithms & Assumptions

More information

Consensus Methods. * You are only responsible for the first two

Consensus Methods. * You are only responsible for the first two Consensus Trees * consensus trees reconcile clades from different trees * consensus is a conservative estimate of phylogeny that emphasizes points of agreement * philosophy: agreement among data sets is

More information

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution Today s topics Inferring phylogeny Introduction! Distance methods! Parsimony method!"#$%&'(!)* +,-.'/01!23454(6!7!2845*0&4'9#6!:&454(6 ;?@AB=C?DEF Overview of phylogenetic inferences Methodology Methods

More information

Properties of Consensus Methods for Inferring Species Trees from Gene Trees

Properties of Consensus Methods for Inferring Species Trees from Gene Trees Syst. Biol. 58(1):35 54, 2009 Copyright c Society of Systematic Biologists DOI:10.1093/sysbio/syp008 Properties of Consensus Methods for Inferring Species Trees from Gene Trees JAMES H. DEGNAN 1,4,,MICHAEL

More information

Assessing Congruence Among Ultrametric Distance Matrices

Assessing Congruence Among Ultrametric Distance Matrices Journal of Classification 26:103-117 (2009) DOI: 10.1007/s00357-009-9028-x Assessing Congruence Among Ultrametric Distance Matrices Véronique Campbell Université de Montréal, Canada Pierre Legendre Université

More information

(Stevens 1991) 1. morphological characters should be assumed to be quantitative unless demonstrated otherwise

(Stevens 1991) 1. morphological characters should be assumed to be quantitative unless demonstrated otherwise Bot 421/521 PHYLOGENETIC ANALYSIS I. Origins A. Hennig 1950 (German edition) Phylogenetic Systematics 1966 B. Zimmerman (Germany, 1930 s) C. Wagner (Michigan, 1920-2000) II. Characters and character states

More information

Hypothesis testing and phylogenetics

Hypothesis testing and phylogenetics Hypothesis testing and phylogenetics Woods Hole Workshop on Molecular Evolution, 2017 Mark T. Holder University of Kansas Thanks to Paul Lewis, Joe Felsenstein, and Peter Beerli for slides. Motivation

More information

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2016 University of California, Berkeley. Parsimony & Likelihood [draft]

Integrative Biology 200 PRINCIPLES OF PHYLOGENETICS Spring 2016 University of California, Berkeley. Parsimony & Likelihood [draft] Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2016 University of California, Berkeley K.W. Will Parsimony & Likelihood [draft] 1. Hennig and Parsimony: Hennig was not concerned with parsimony

More information

Bootstrap confidence levels for phylogenetic trees B. Efron, E. Halloran, and S. Holmes, 1996

Bootstrap confidence levels for phylogenetic trees B. Efron, E. Halloran, and S. Holmes, 1996 Bootstrap confidence levels for phylogenetic trees B. Efron, E. Halloran, and S. Holmes, 1996 Following Confidence limits on phylogenies: an approach using the bootstrap, J. Felsenstein, 1985 1 I. Short

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Supplementary Note S2 Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Phylogenetic trees reconstructed by a variety of methods from either single-copy orthologous loci (Class

More information

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

Integrative Biology 200 PRINCIPLES OF PHYLOGENETICS Spring 2018 University of California, Berkeley Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley B.D. Mishler Feb. 14, 2018. Phylogenetic trees VI: Dating in the 21st century: clocks, & calibrations;

More information

Chapter 9 BAYESIAN SUPERTREES. Fredrik Ronquist, John P. Huelsenbeck, and Tom Britton

Chapter 9 BAYESIAN SUPERTREES. Fredrik Ronquist, John P. Huelsenbeck, and Tom Britton Chapter 9 BAYESIAN SUPERTREES Fredrik Ronquist, John P. Huelsenbeck, and Tom Britton Abstract: Keywords: In this chapter, we develop a Bayesian approach to supertree construction. Bayesian inference requires

More information

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University Phylogenetics: Bayesian Phylogenetic Analysis COMP 571 - Spring 2015 Luay Nakhleh, Rice University Bayes Rule P(X = x Y = y) = P(X = x, Y = y) P(Y = y) = P(X = x)p(y = y X = x) P x P(X = x 0 )P(Y = y X

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

Bootstraps and testing trees. Alog-likelihoodcurveanditsconfidenceinterval

Bootstraps and testing trees. Alog-likelihoodcurveanditsconfidenceinterval ootstraps and testing trees Joe elsenstein epts. of Genome Sciences and of iology, University of Washington ootstraps and testing trees p.1/20 log-likelihoodcurveanditsconfidenceinterval 2620 2625 ln L

More information

PhyQuart-A new algorithm to avoid systematic bias & phylogenetic incongruence

PhyQuart-A new algorithm to avoid systematic bias & phylogenetic incongruence PhyQuart-A new algorithm to avoid systematic bias & phylogenetic incongruence Are directed quartets the key for more reliable supertrees? Patrick Kück Department of Life Science, Vertebrates Division,

More information

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition David D. Pollock* and William J. Bruno* *Theoretical Biology and Biophysics, Los Alamos National

More information

Bootstrapping and Tree reliability. Biol4230 Tues, March 13, 2018 Bill Pearson Pinn 6-057

Bootstrapping and Tree reliability. Biol4230 Tues, March 13, 2018 Bill Pearson Pinn 6-057 Bootstrapping and Tree reliability Biol4230 Tues, March 13, 2018 Bill Pearson wrp@virginia.edu 4-2818 Pinn 6-057 Rooting trees (outgroups) Bootstrapping given a set of sequences sample positions randomly,

More information

Reconstructing the history of lineages

Reconstructing the history of lineages Reconstructing the history of lineages Class outline Systematics Phylogenetic systematics Phylogenetic trees and maps Class outline Definitions Systematics Phylogenetic systematics/cladistics Systematics

More information

STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization)

STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization) STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization) Laura Salter Kubatko Departments of Statistics and Evolution, Ecology, and Organismal Biology The Ohio State University kubatko.2@osu.edu

More information

The Information Content of Trees and Their Matrix Representations

The Information Content of Trees and Their Matrix Representations 2004 POINTS OF VIEW 989 Syst. Biol. 53(6):989 1001, 2004 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150490522737 The Information Content of

More information

Letter to the Editor. Department of Biology, Arizona State University

Letter to the Editor. Department of Biology, Arizona State University Letter to the Editor Traditional Phylogenetic Reconstruction Methods Reconstruct Shallow and Deep Evolutionary Relationships Equally Well Michael S. Rosenberg and Sudhir Kumar Department of Biology, Arizona

More information

Consistency Index (CI)

Consistency Index (CI) Consistency Index (CI) minimum number of changes divided by the number required on the tree. CI=1 if there is no homoplasy negatively correlated with the number of species sampled Retention Index (RI)

More information

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007) FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter

More information

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057 Estimating Phylogenies (Evolutionary Trees) II Biol4230 Thurs, March 2, 2017 Bill Pearson wrp@virginia.edu 4-2818 Jordan 6-057 Tree estimation strategies: Parsimony?no model, simply count minimum number

More information

Pinvar approach. Remarks: invariable sites (evolve at relative rate 0) variable sites (evolves at relative rate r)

Pinvar approach. Remarks: invariable sites (evolve at relative rate 0) variable sites (evolves at relative rate r) Pinvar approach Unlike the site-specific rates approach, this approach does not require you to assign sites to rate categories Assumes there are only two classes of sites: invariable sites (evolve at relative

More information

Cladistics. The deterministic effects of alignment bias in phylogenetic inference. Mark P. Simmons a, *, Kai F. Mu ller b and Colleen T.

Cladistics. The deterministic effects of alignment bias in phylogenetic inference. Mark P. Simmons a, *, Kai F. Mu ller b and Colleen T. Cladistics Cladistics 27 (2) 42 46./j.96-3.2.333.x The deterministic effects of alignment bias in phylogenetic inference Mark P. Simmons a, *, Kai F. Mu ller b and Colleen T. Webb a a Department of Biology,

More information

Phylogenetic analyses. Kirsi Kostamo

Phylogenetic analyses. Kirsi Kostamo Phylogenetic analyses Kirsi Kostamo The aim: To construct a visual representation (a tree) to describe the assumed evolution occurring between and among different groups (individuals, populations, species,

More information

A Phylogenetic Network Construction due to Constrained Recombination

A Phylogenetic Network Construction due to Constrained Recombination A Phylogenetic Network Construction due to Constrained Recombination Mohd. Abdul Hai Zahid Research Scholar Research Supervisors: Dr. R.C. Joshi Dr. Ankush Mittal Department of Electronics and Computer

More information

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Ziheng Yang Department of Biology, University College, London An excess of nonsynonymous substitutions

More information

Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies

Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies 1 What is phylogeny? Essay written for the course in Markov Chains 2004 Torbjörn Karfunkel Phylogeny is the evolutionary development

More information

How to read and make phylogenetic trees Zuzana Starostová

How to read and make phylogenetic trees Zuzana Starostová How to read and make phylogenetic trees Zuzana Starostová How to make phylogenetic trees? Workflow: obtain DNA sequence quality check sequence alignment calculating genetic distances phylogeny estimation

More information

Combining Data Sets with Different Phylogenetic Histories

Combining Data Sets with Different Phylogenetic Histories Syst. Biol. 47(4):568 581, 1998 Combining Data Sets with Different Phylogenetic Histories JOHN J. WIENS Section of Amphibians and Reptiles, Carnegie Museum of Natural History, Pittsburgh, Pennsylvania

More information

Maximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington.

Maximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington. Maximum Likelihood This presentation is based almost entirely on Peter G. Fosters - "The Idiot s Guide to the Zen of Likelihood in a Nutshell in Seven Days for Dummies, Unleashed. http://www.bioinf.org/molsys/data/idiots.pdf

More information

Consensus methods. Strict consensus methods

Consensus methods. Strict consensus methods Consensus methods A consensus tree is a summary of the agreement among a set of fundamental trees There are many consensus methods that differ in: 1. the kind of agreement 2. the level of agreement Consensus

More information

Estimating Divergence Dates from Molecular Sequences

Estimating Divergence Dates from Molecular Sequences Estimating Divergence Dates from Molecular Sequences Andrew Rambaut and Lindell Bromham Department of Zoology, University of Oxford The ability to date the time of divergence between lineages using molecular

More information

Split Support and Split Con ict Randomization Tests in Phylogenetic Inference

Split Support and Split Con ict Randomization Tests in Phylogenetic Inference Syst. Biol. 47(4):673 695, 1998 Split Support and Split Con ict Randomization Tests in Phylogenetic Inference MARK WILKINSON School of Biological Sciences, University of Bristol, Bristol BS8 1UG, and Department

More information

Scientific ethics, tree testing, Open Tree of Life. Workshop on Molecular Evolution Marine Biological Lab, Woods Hole, MA.

Scientific ethics, tree testing, Open Tree of Life. Workshop on Molecular Evolution Marine Biological Lab, Woods Hole, MA. Scientific ethics, tree testing, Open Tree of Life Workshop on Molecular Evolution 2018 Marine Biological Lab, Woods Hole, MA. USA Mark T. Holder University of Kansas next 22 slides from David Hillis Data

More information

Estimating Evolutionary Trees. Phylogenetic Methods

Estimating Evolutionary Trees. Phylogenetic Methods Estimating Evolutionary Trees v if the data are consistent with infinite sites then all methods should yield the same tree v it gets more complicated when there is homoplasy, i.e., parallel or convergent

More information

UoN, CAS, DBSC BIOL102 lecture notes by: Dr. Mustafa A. Mansi. The Phylogenetic Systematics (Phylogeny and Systematics)

UoN, CAS, DBSC BIOL102 lecture notes by: Dr. Mustafa A. Mansi. The Phylogenetic Systematics (Phylogeny and Systematics) - Phylogeny? - Systematics? The Phylogenetic Systematics (Phylogeny and Systematics) - Phylogenetic systematics? Connection between phylogeny and classification. - Phylogenetic systematics informs the

More information

Week 8: Testing trees, Bootstraps, jackknifes, gene frequencies

Week 8: Testing trees, Bootstraps, jackknifes, gene frequencies Week 8: Testing trees, ootstraps, jackknifes, gene frequencies Genome 570 ebruary, 2016 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.1/69 density e log (density) Normal distribution:

More information

Letter to the Editor. The Effect of Taxonomic Sampling on Accuracy of Phylogeny Estimation: Test Case of a Known Phylogeny Steven Poe 1

Letter to the Editor. The Effect of Taxonomic Sampling on Accuracy of Phylogeny Estimation: Test Case of a Known Phylogeny Steven Poe 1 Letter to the Editor The Effect of Taxonomic Sampling on Accuracy of Phylogeny Estimation: Test Case of a Known Phylogeny Steven Poe 1 Department of Zoology and Texas Memorial Museum, University of Texas

More information

Ratio of explanatory power (REP): A new measure of group support

Ratio of explanatory power (REP): A new measure of group support Molecular Phylogenetics and Evolution 44 (2007) 483 487 Short communication Ratio of explanatory power (REP): A new measure of group support Taran Grant a, *, Arnold G. Kluge b a Division of Vertebrate

More information

Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018

Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018 Maximum Likelihood Tree Estimation Carrie Tribble IB 200 9 Feb 2018 Outline 1. Tree building process under maximum likelihood 2. Key differences between maximum likelihood and parsimony 3. Some fancy extras

More information

Appendix from L. J. Revell, On the Analysis of Evolutionary Change along Single Branches in a Phylogeny

Appendix from L. J. Revell, On the Analysis of Evolutionary Change along Single Branches in a Phylogeny 008 by The University of Chicago. All rights reserved.doi: 10.1086/588078 Appendix from L. J. Revell, On the Analysis of Evolutionary Change along Single Branches in a Phylogeny (Am. Nat., vol. 17, no.

More information

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Phylogenetic Analysis Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Outline Basic Concepts Tree Construction Methods Distance-based methods

More information

Today's project. Test input data Six alignments (from six independent markers) of Curcuma species

Today's project. Test input data Six alignments (from six independent markers) of Curcuma species DNA sequences II Analyses of multiple sequence data datasets, incongruence tests, gene trees vs. species tree reconstruction, networks, detection of hybrid species DNA sequences II Test of congruence of

More information

Isolating - A New Resampling Method for Gene Order Data

Isolating - A New Resampling Method for Gene Order Data Isolating - A New Resampling Method for Gene Order Data Jian Shi, William Arndt, Fei Hu and Jijun Tang Abstract The purpose of using resampling methods on phylogenetic data is to estimate the confidence

More information

Permutation Tests. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods

Permutation Tests. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods Permutation Tests Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods The Two-Sample Problem We observe two independent random samples: F z = z 1, z 2,, z n independently of

More information

Chapter 26 Phylogeny and the Tree of Life

Chapter 26 Phylogeny and the Tree of Life Chapter 26 Phylogeny and the Tree of Life Biologists estimate that there are about 5 to 100 million species of organisms living on Earth today. Evidence from morphological, biochemical, and gene sequence

More information

first (i.e., weaker) sense of the term, using a variety of algorithmic approaches. For example, some methods (e.g., *BEAST 20) co-estimate gene trees

first (i.e., weaker) sense of the term, using a variety of algorithmic approaches. For example, some methods (e.g., *BEAST 20) co-estimate gene trees Concatenation Analyses in the Presence of Incomplete Lineage Sorting May 22, 2015 Tree of Life Tandy Warnow Warnow T. Concatenation Analyses in the Presence of Incomplete Lineage Sorting.. 2015 May 22.

More information

Biol 206/306 Advanced Biostatistics Lab 11 Models of Trait Evolution Fall 2016

Biol 206/306 Advanced Biostatistics Lab 11 Models of Trait Evolution Fall 2016 Biol 206/306 Advanced Biostatistics Lab 11 Models of Trait Evolution Fall 2016 By Philip J. Bergmann 0. Laboratory Objectives 1. Explore how evolutionary trait modeling can reveal different information

More information

Workshop III: Evolutionary Genomics

Workshop III: Evolutionary Genomics Identifying Species Trees from Gene Trees Elizabeth S. Allman University of Alaska IPAM Los Angeles, CA November 17, 2011 Workshop III: Evolutionary Genomics Collaborators The work in today s talk is joint

More information

Phylogenetic Tree Reconstruction

Phylogenetic Tree Reconstruction I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

More information

Significance Testing with Incompletely Randomised Cases Cannot Possibly Work

Significance Testing with Incompletely Randomised Cases Cannot Possibly Work Human Journals Short Communication December 2018 Vol.:11, Issue:2 All rights are reserved by Stephen Gorard FRSA FAcSS Significance Testing with Incompletely Randomised Cases Cannot Possibly Work Keywords:

More information

A Bayesian Approach to Phylogenetics

A Bayesian Approach to Phylogenetics A Bayesian Approach to Phylogenetics Niklas Wahlberg Based largely on slides by Paul Lewis (www.eeb.uconn.edu) An Introduction to Bayesian Phylogenetics Bayesian inference in general Markov chain Monte

More information

Points of View. Congruence Versus Phylogenetic Accuracy: Revisiting the Incongruence Length Difference Test

Points of View. Congruence Versus Phylogenetic Accuracy: Revisiting the Incongruence Length Difference Test Points of View Syst. Biol. 53(1):81 89, 2004 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150490264752 Congruence Versus Phylogenetic Accuracy:

More information

Intraspecific gene genealogies: trees grafting into networks

Intraspecific gene genealogies: trees grafting into networks Intraspecific gene genealogies: trees grafting into networks by David Posada & Keith A. Crandall Kessy Abarenkov Tartu, 2004 Article describes: Population genetics principles Intraspecific genetic variation

More information

7. Tests for selection

7. Tests for selection Sequence analysis and genomics 7. Tests for selection Dr. Katja Nowick Group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute for Brain Research www. nowicklab.info

More information

A (short) introduction to phylogenetics

A (short) introduction to phylogenetics A (short) introduction to phylogenetics Thibaut Jombart, Marie-Pauline Beugin MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis with PR Statistics, Millport Field

More information

X X (2) X Pr(X = x θ) (3)

X X (2) X Pr(X = x θ) (3) Notes for 848 lecture 6: A ML basis for compatibility and parsimony Notation θ Θ (1) Θ is the space of all possible trees (and model parameters) θ is a point in the parameter space = a particular tree

More information

Definition 3.1 A statistical hypothesis is a statement about the unknown values of the parameters of the population distribution.

Definition 3.1 A statistical hypothesis is a statement about the unknown values of the parameters of the population distribution. Hypothesis Testing Definition 3.1 A statistical hypothesis is a statement about the unknown values of the parameters of the population distribution. Suppose the family of population distributions is indexed

More information

Points of View Matrix Representation with Parsimony, Taxonomic Congruence, and Total Evidence

Points of View Matrix Representation with Parsimony, Taxonomic Congruence, and Total Evidence Points of View Syst. Biol. 51(1):151 155, 2002 Matrix Representation with Parsimony, Taxonomic Congruence, and Total Evidence DAVIDE PISANI 1,2 AND MARK WILKINSON 2 1 Department of Earth Sciences, University

More information

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline Phylogenetics Todd Vision iology 522 March 26, 2007 pplications of phylogenetics Studying organismal or biogeographic history Systematics ating events in the fossil record onservation biology Studying

More information

Experimental Design and Data Analysis for Biologists

Experimental Design and Data Analysis for Biologists Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1

More information

The statistical and informatics challenges posed by ascertainment biases in phylogenetic data collection

The statistical and informatics challenges posed by ascertainment biases in phylogenetic data collection The statistical and informatics challenges posed by ascertainment biases in phylogenetic data collection Mark T. Holder and Jordan M. Koch Department of Ecology and Evolutionary Biology, University of

More information

PHYLOGENY & THE TREE OF LIFE

PHYLOGENY & THE TREE OF LIFE PHYLOGENY & THE TREE OF LIFE PREFACE In this powerpoint we learn how biologists distinguish and categorize the millions of species on earth. Early we looked at the process of evolution here we look at

More information

Phylogenetic Analysis and Intraspeci c Variation : Performance of Parsimony, Likelihood, and Distance Methods

Phylogenetic Analysis and Intraspeci c Variation : Performance of Parsimony, Likelihood, and Distance Methods Syst. Biol. 47(2): 228± 23, 1998 Phylogenetic Analysis and Intraspeci c Variation : Performance of Parsimony, Likelihood, and Distance Methods JOHN J. WIENS1 AND MARIA R. SERVEDIO2 1Section of Amphibians

More information

A data based parsimony method of cophylogenetic analysis

A data based parsimony method of cophylogenetic analysis Blackwell Science, Ltd A data based parsimony method of cophylogenetic analysis KEVIN P. JOHNSON, DEVIN M. DROWN & DALE H. CLAYTON Accepted: 20 October 2000 Johnson, K. P., Drown, D. M. & Clayton, D. H.

More information

Lecture 6 Phylogenetic Inference

Lecture 6 Phylogenetic Inference Lecture 6 Phylogenetic Inference From Darwin s notebook in 1837 Charles Darwin Willi Hennig From The Origin in 1859 Cladistics Phylogenetic inference Willi Hennig, Cladistics 1. Clade, Monophyletic group,

More information
