Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30

Size: px
Start display at page:

Transcription

1 Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30

2 A non-phylogeny example of the bootstrap estimate of θ (unknown) true value of θ empirical distribution of sample Bootstrap replicates (unknown) true distribution Distribution of estimates of parameters Bootstrap sampling from a distribution (a mixture of two normals) to estimate the variance of the mean Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.2/30

3 Bootstrap sampling To infer the error in a quantity, θ, estimated from a sample of points x 1, x 2,..., x n we can Do the following R times (R = 1000 or so) Draw a bootstrap sample" by sampling n times with replacement from the sample. Call these x 1, x 2,..., x n. Note that some of the original points are represented more than once in the bootstrap sample, some once, some not at all. Estimate θ from the bootstrap sample, call this ˆθ k (k = 1, 2,..., R) When all R bootstrap samples have been done, the distribution of ˆθ i estimates the distribution one would get if one were able to draw repeated samples of n points from the unknown true distribution. Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.3/30

4 Bootstrap sampling of phylogenies Original Data sites sequences Bootstrap sample #1 sequences sites sample same number of sites, with replacement Estimate of the tree Bootstrap sample #2 sequences sites sample same number of sites, with replacement Bootstrap estimate of the tree, #1 (and so on) Bootstrap estimate of the tree, #2 Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.4/30

5 More on the bootstrap for phylogenies The sites are assumed to have evolved independently given the tree. They are the entities that are sampled (the x i ). The trees play the role of the parameter. One ends up with a cloud of R sampled trees. To summarize this cloud, we ask, for each branch in the tree, how frequently it appears among the cloud of trees. We make a tree that summarizes this for all the most frequently occurring branches. This is the majority rule consensus tree of the bootstrap estimates of the tree. Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.5/30

6 Majority rule consensus trees Trees: E A C F B D E C A B D F E A F D B C E A D F B C E C A D F B Majority rule consensus tree of the unrooted trees: How many times each partition of species is found: AE BCDF 3 ACE BDF 3 ACEF BD 1 AC BDEF 1 AEF BCD 1 ADEF BC 2 ABDF EC 1 ABCE DF 3 E A C B D F Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.6/30

7 An example of bootstrap sampling of trees Bovine Mouse Squir Monk Chimp Human Gorilla Orang Gibbon Rhesus Mac Jpn Macaq Crab E.Mac BarbMacaq Tarsier Lemur 232 nucleotide, 14-species mitochondrial D-loop Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.7/30 analyzed by parsimony, 100 bootstrap replicates

8 Potential problems with the bootstrap 1. Sites may not evolve independently 2. Sites may not come from a common distribution (but can consider them sampled from a mixture of possible distributions) 3. If do not know which branch is of interest at the outset, a multiple-tests" problem means P values are overstated 4. P values are biased (too conservative) 5. Bootstrapping does not correct biases in phylogeny methods Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.8/30

9 Other resampling methods Delete-half jackknife. Sample a random 50% of the sites, without replacement. Delete-1/e jackknife (Farris et. al. 1996) (too little deletion from a statistical viewpoint). Reweighting characters by choosing weights from an exponential distribution. In fact, reweighting them by any exchangeable weights having coefficient of variation of 1 Parametric bootstrap simulate data sets of this size assuming the estimate of the tree is the truth (to correct for correlation among adjacent sites) (Künsch, 1989) Block-bootstrapping sample n/b blocks of b adjacent sites. Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.9/30

10 Delete half jackknife on the example Bovine Mouse Squir Monk Chimp Human Gorilla Orang Gibbon Rhesus Mac Jpn Macaq Crab E.Mac BarbMacaq 59 Tarsier Lemur Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.10/30

11 Calibrating the jackknife Exact computation of the effects of deletion fraction for the jackknife (suppose 1 and 2 are conflicting groups) n 1 n 2 n characters Prob( m > m ) 1 2 Bootstrap Jackknife δ = 1/2 δ = 1/e n(1 δ) characters Prob( m > m ) m 1 m Prob( m > m ) 1 2 Prob( m = m ) We can compute for various n s the probabilities of getting more evidence for group 1 than for group 2 A typical result is for n 1 = 10, n 2 = 8, n = 100 : Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.11/30

12 Parametric bootstrapping The Parametric Bootstrap (Efron, 1985) Suppose we have independent observations drawn from a known distribution: x 1, x 2, x 3,... x n and a parameter, θ, To infer the variability of θ Use the current estimate, θ^ Use the distribution that has that as its true parameter sample R data sets from that distribution, each having the same sample size as the original sample... x *, x *, x *,... x * n x *, x *, x *,... x * n x *, x *, x *,... x * n... x *, x *, x *,... x * n θ calculated from this. ^ θ1 ^ θ2 ^ θ3 ^ θr and take the distribution of the as the estimate of the distribution from which it is drawn θ ^ θ i Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.12/30

13 The parametric bootstrap for phylogenies computer simulation data set #1 estimation of tree T 1 original data estimate of tree data set #2 data set #3 T 2 T 3 data set #100 T 100 Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.13/30

14 An example of the parametric bootstrap 53 Tarsier Bovine 41 Lemur Sq_Monk 80 Human 100 Chimp Gorilla Orang Gibbon Rhes_Mac Jpn_Mac Crab E_Mac Barb_Mac Mouse Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.14/30

15 ln L Likelihood ratio confidence limits on Ts/Tn ratio Transition / transversion ratio for the 14-species primate data set Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.15/30

16 ln Likelihood Likelihoods in tree space a 3-species clock example 204 A C B A B C 205 x x 206 B C A x x Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.16/30

17 The constraints for a molecular clock A B C D E Constraints for a clock v 1 v 2 v 4 v 5 v 1 = v 2 v 6 v 3 v v 4 = 5 v 8 v 1 + v v 6 = 3 v 7 v v 3 + v 7 = 4 + v 8 Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.17/30

18 Testing for a molecular clock To test for a molecular clock: Obtain the likelihood with no constraint of a molecular clock (For primates data with T s /T n = 30 we get ln L 1 = Obtain the highest likelihood for a tree which is constrained to have a molecular clock: ln L 0 = Look up 2(ln L 1 ln L 0 ) = = on a χ 2 distribution with n 2 = 12 degrees of freedom (in this case the result is significant) Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.18/30

19 Goldman s simulation test of Likelihood Ratios Goldman (1993) suggests that, in cases where we may wonder whether the Likelihood Ratio Test statistic really has its desired χ 2 distribution we can: Take our best estimate of the tree Simulate on it the evolution of data sets of the same size For each replicate, calculate the LRT statistic Use this as the distribution and see where the actual LRT value lies in it (e.g.: in the upper 5%?) This, of course, is a parametric boostrap. Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.19/30

20 Two trees to be tested by paired sites tests Tree I Bovine Gibbon Orang Gorilla Chimp Human Mouse Tree II Bovine Mouse Gibbon Orang Gorilla Chimp Human Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.20/30

21 Differences in log likelihoods site by site Tree I II site ln L Diff Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.21/30

22 Histogram of log likelihood differences Difference in log likelihood at site Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.22/30

23 Paired sites tests Winning sites test (Prager and Wilson, 1988). Do a sign test on the signs of the differences. z test (me, 1993 in PHYLIP documentation). Assume differences are normal, do z test of whether mean (hence sum) difference is significant. t test. Swofford et. al., 1996: do a t test (paired) Wilcoxon ranked sums test (Templeton, 1983). RELL test (Kishino and Hasegawa, 1989 per my suggestion). Bootstrap resample sites, get distribution of difference of totals. Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.23/30

24 In this example Winning sites test. 160 of 232 sites favor tree I. P < z test. Difference of log-likeihood totals is standard deviations from 0, P = Not significant. t test. Same as z test for this large a number of sites. Wilcoxon ranked sums test. Rank sum is standard deviations below its expected value, P = RELL test. 8,326 out of 10,000 samples have a positive sum, P = (two-sided) Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.24/30

25 Bayesian methods In the Bayesian framework, one can avoid the separate calculation of confidence intervals. The posterior distribution of trees shows us how much credence to give different trees (for example, it assigns probabilities to different tree topologies). The unresolved issue is how to summarize this posterior distrution in the best way. In this respect Bayesian methods leave you in a situation analogous to having the cloud of bootstrap-sampled trees without yet having summarized them. Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.25/30

26 References Bremer, K The limits of amino acid sequence data in angiosperm phylogenetic reconstruction. Evolution 42: [Bremer support] Cavender, J. A Taxonomy with confidence. Mathematical Biosciences 40: (Erratum, vol. 44, p. 308, 1979) [First paper on testing trees] Efron, B Bootstrap methods: another look at the jackknife. Annals of Statistics 7: [The original bootstrap paper] Efron, B Bootstrap confidence intervals for a class of parametric problems. Biometrika 72: [The parametric bootstrap] Farris, J. S., V. A. Albert, M. Kallersjö, D. Lipscomb, and A. G. Kluge Parsimony jackknifing outperforms neighbor-joining. Cladistics 12: [The delete-1/e jackknife for phylogenies] Felsenstein, J Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39: [The bootstrap first applied to phylogenies] Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.26/30

27 more references Felsenstein, J. and H. Kishino Is there something wrong with the bootstrap on phylogenies? A reply to Hillis and Bull. Systematic Biology 42: [A more detailed exposition of the bias of P values in a normal case] Felsenstein, J. 1985c. Confidence limits on phylogenies with a molecular clock. Systematic Zoology 34: [A 3-species case where we can evaluate methods] Goldman, N Statistical tests of models of DNA substitution. Journal of Molecular Evolution 36: [Parametric bootstrapping for testing models] Harshman, J The effect of irrelevant characters on bootstrap values. Systematic Zoology 43: [Not much effect on parsimony whether or not you include invariant characters when bootstrapping] Hasegawa, M., H. Kishino Confidence limits on the maximum-likelihood estimate of the hominoid tree from mitochondrial-dna sequences. Evolution 43: [The KHT test] Hasegawa, M. and H. Kishino Accuracies of the simple methods for estimating the bootstrap probability of a maximum-likelihood tree. Molecular Biology and Evolution 11: [RELL probabilities] Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.27/30

28 more references Hillis, D. M. and J. J. Bull An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Systematic Biology 42: Bias in P values seen in a large simulation study] Kishino, H. and M. Hasegawa Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. Journal of Molecular Evolution 29: [The KHT test] Künsch, H. R The jackknife and the bootstrap for general stationary observations. Annals of Statistics 17: [The block-bootstrap] Margush, T. and F. R. McMorris Consensus n-trees. Bulletin of Mathematical Biology 43: i. [Majority-rule consensus trees] Prager, E. M. and A. C. Wilson Ancient origin of lactalbumin from lysozyme: analysis of DNA and amino acid sequences. Journal of Molecular Evolution 27: [winning-sites test] Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.28/30

29 more references Sanderson, M. J Objections to bootstrapping phylogenies: a critique. Systematic Biology 44: [Good but he accepts a few criticisms I would not have accepted] Shimodaira, H. and M. Hasegawa Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Molecular Biology and Evolution 16: [SH test for multiple trees] Sitnikova, T., A. Rzhetsky, and M. Nei Interior-branch and bootstrap tests of phylogenetic trees. Molecular Biology and Evolution 12: [The interior-branch test] Templeton, A. R Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the evolution of humans and the apes. Evolution 37: [First paper on KHT test] Wu, C. F. J Jackknife, bootstrap and other resampling plans in regression analysis. Annals of Statistics 14: [The delete-half jackknife] Zharkikh, A., and W.-H. Li Statistical properties of bootstrap estimation of phylogenetic variability from nucleotide sequences. I. Four taxa with a molecular clock. Molecular Biology and Evolution 9: [Discovery and explanation of bias in P values] Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.29/30

30 How it was done This projection produced as a PDF, not a PowerPoint file, and viewed using the Full Screen mode (in the View menu of Adobe Acrobat Reader): using the prosper style in LaTeX, using Latex to make a.dvi file, using dvips to turn this into a Postscript file, using ps2pdf to mill it into a PDF file, and displaying the slides in Adobe Acrobat Reader. Result: nice slides using freeware. Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.30/30

Bootstraps and testing trees. Alog-likelihoodcurveanditsconfidenceinterval

ootstraps and testing trees Joe elsenstein epts. of Genome Sciences and of iology, University of Washington ootstraps and testing trees p.1/20 log-likelihoodcurveanditsconfidenceinterval 2620 2625 ln L

Week 8: Testing trees, Bootstraps, jackknifes, gene frequencies

Week 8: Testing trees, ootstraps, jackknifes, gene frequencies Genome 570 ebruary, 2016 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.1/69 density e log (density) Normal distribution:

Week 7: Bayesian inference, Testing trees, Bootstraps

Week 7: ayesian inference, Testing trees, ootstraps Genome 570 May, 2008 Week 7: ayesian inference, Testing trees, ootstraps p.1/54 ayes Theorem onditional probability of hypothesis given data is: Prob

Inference of phylogenies, with some thoughts on statistics and geometry p.1/31

Inference of phylogenies, with some thoughts on statistics and geometry Joe Felsenstein University of Washington Inference of phylogenies, with some thoughts on statistics and geometry p.1/31 Darwin s

Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22

Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 24. Phylogeny methods, part 4 (Models of DNA and

Lecture 4. Models of DNA and protein change. Likelihood methods

Lecture 4. Models of DNA and protein change. Likelihood methods Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 4. Models of DNA and protein change. Likelihood methods p.1/36

Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26

Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 27. Phylogeny methods, part 4 (Models of DNA and

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution

Today s topics Inferring phylogeny Introduction! Distance methods! Parsimony method!"#\$%&'(!)* +,-.'/01!23454(6!7!2845*0&4'9#6!:&454(6 ;?@AB=C?DEF Overview of phylogenetic inferences Methodology Methods

Hypothesis testing and phylogenetics

Hypothesis testing and phylogenetics Woods Hole Workshop on Molecular Evolution, 2017 Mark T. Holder University of Kansas Thanks to Paul Lewis, Joe Felsenstein, and Peter Beerli for slides. Motivation

Scientific ethics, tree testing, Open Tree of Life. Workshop on Molecular Evolution Marine Biological Lab, Woods Hole, MA.

Scientific ethics, tree testing, Open Tree of Life Workshop on Molecular Evolution 2018 Marine Biological Lab, Woods Hole, MA. USA Mark T. Holder University of Kansas next 22 slides from David Hillis Data

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 2009 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 200 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

Lecture 4. Models of DNA and protein change. Likelihood methods

Lecture 4. Models of DNA and protein change. Likelihood methods Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 4. Models of DNA and protein change. Likelihood methods p.1/39

Summary statistics, distributions of sums and means

Summary statistics, distributions of sums and means Joe Felsenstein Summary statistics, distributions of sums and means p.1/17 Quantiles In both empirical distributions and in the underlying distribution,

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis

Consensus Methods. * You are only responsible for the first two

Consensus Trees * consensus trees reconcile clades from different trees * consensus is a conservative estimate of phylogeny that emphasizes points of agreement * philosophy: agreement among data sets is

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

Bootstrap confidence levels for phylogenetic trees B. Efron, E. Halloran, and S. Holmes, 1996

Bootstrap confidence levels for phylogenetic trees B. Efron, E. Halloran, and S. Holmes, 1996 Following Confidence limits on phylogenies: an approach using the bootstrap, J. Felsenstein, 1985 1 I. Short

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Phylogenetic Analysis Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Outline Basic Concepts Tree Construction Methods Distance-based methods

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley B.D. Mishler Feb. 14, 2018. Phylogenetic trees VI: Dating in the 21st century: clocks, & calibrations;

Phylogenetic Tree Reconstruction

I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

Letter to the Editor. Department of Biology, Arizona State University

Letter to the Editor Traditional Phylogenetic Reconstruction Methods Reconstruct Shallow and Deep Evolutionary Relationships Equally Well Michael S. Rosenberg and Sudhir Kumar Department of Biology, Arizona

Substitution = Mutation followed. by Fixation. Common Ancestor ACGATC 1:A G 2:C A GAGATC 3:G A 6:C T 5:T C 4:A C GAAATT 1:G A

GAGATC 3:G A 6:C T Common Ancestor ACGATC 1:A G 2:C A Substitution = Mutation followed 5:T C by Fixation GAAATT 4:A C 1:G A AAAATT GAAATT GAGCTC ACGACC Chimp Human Gorilla Gibbon AAAATT GAAATT GAGCTC ACGACC

Evaluating phylogenetic hypotheses

Evaluating phylogenetic hypotheses Methods for evaluating topologies Topological comparisons: e.g., parametric bootstrapping, constrained searches Methods for evaluating nodes Resampling techniques: bootstrapping,

Phylogenetic inference

Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types

= 1 = 4 3. Odds ratio justification for maximum likelihood. Likelihoods, Bootstraps and Testing Trees. Prob (H 2 D) Prob (H 1 D) Prob (D H 2 )

4 1 1/3 1 = 4 3 1 4 1/3 1 = 1 12 Odds ratio justification for maximum likelihood Likelihoods, ootstraps and Testing Trees Joe elsenstein the data H 1 Hypothesis 1 H 2 Hypothesis 2 the symbol for given

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley B.D. Mishler Jan. 22, 2009. Trees I. Summary of previous lecture: Hennigian

7. Tests for selection

Sequence analysis and genomics 7. Tests for selection Dr. Katja Nowick Group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute for Brain Research www. nowicklab.info

Dr. Amira A. AL-Hosary

Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

A (short) introduction to phylogenetics

A (short) introduction to phylogenetics Thibaut Jombart, Marie-Pauline Beugin MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis with PR Statistics, Millport Field

A Statistical Test of Phylogenies Estimated from Sequence Data

A Statistical Test of Phylogenies Estimated from Sequence Data Wen-Hsiung Li Center for Demographic and Population Genetics, University of Texas A simple approach to testing the significance of the branching

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships Tore Samuelsson Feb 2016 Molecular phylogeny is being used in the identification and characterization of new pathogens,

Probability Distribution of Molecular Evolutionary Trees: A New Method of Phylogenetic Inference

J Mol Evol (1996) 43:304 311 Springer-Verlag New York Inc. 1996 Probability Distribution of Molecular Evolutionary Trees: A New Method of Phylogenetic Inference Bruce Rannala, Ziheng Yang Department of

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the

EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

C3020 Molecular Evolution. Exercises #3: Phylogenetics

C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from

Stochastic Errors vs. Modeling Errors in Distance Based Phylogenetic Reconstructions

PLGW05 Stochastic Errors vs. Modeling Errors in Distance Based Phylogenetic Reconstructions 1 joint work with Ilan Gronau 2, Shlomo Moran 3, and Irad Yavneh 3 1 2 Dept. of Biological Statistics and Computational

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

Frequentist Properties of Bayesian Posterior Probabilities of Phylogenetic Trees Under Simple and Complex Substitution Models

Syst. Biol. 53(6):904 913, 2004 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150490522629 Frequentist Properties of Bayesian Posterior Probabilities

Phylogenetics: Building Phylogenetic Trees

1 Phylogenetics: Building Phylogenetic Trees COMP 571 Luay Nakhleh, Rice University 2 Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary model should

Estimating Divergence Dates from Molecular Sequences

Estimating Divergence Dates from Molecular Sequences Andrew Rambaut and Lindell Bromham Department of Zoology, University of Oxford The ability to date the time of divergence between lineages using molecular

Inferring Molecular Phylogeny

Dr. Walter Salzburger he tree of life, ustav Klimt (1907) Inferring Molecular Phylogeny Inferring Molecular Phylogeny 55 Maximum Parsimony (MP): objections long branches I!! B D long branch attraction

Thanks to Paul Lewis and Joe Felsenstein for the use of slides

Thanks to Paul Lewis and Joe Felsenstein for the use of slides Review Hennigian logic reconstructs the tree if we know polarity of characters and there is no homoplasy UPGMA infers a tree from a distance

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Ziheng Yang Department of Biology, University College, London An excess of nonsynonymous substitutions

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition David D. Pollock* and William J. Bruno* *Theoretical Biology and Biophysics, Los Alamos National

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057

Estimating Phylogenies (Evolutionary Trees) II Biol4230 Thurs, March 2, 2017 Bill Pearson wrp@virginia.edu 4-2818 Jordan 6-057 Tree estimation strategies: Parsimony?no model, simply count minimum number

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Five Sami Khuri Department of Computer Science San José State University San José, California, USA sami.khuri@sjsu.edu v Distance Methods v Character Methods v Molecular Clock v UPGMA v Maximum Parsimony

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally

Bootstrapping and Tree reliability. Biol4230 Tues, March 13, 2018 Bill Pearson Pinn 6-057

Bootstrapping and Tree reliability Biol4230 Tues, March 13, 2018 Bill Pearson wrp@virginia.edu 4-2818 Pinn 6-057 Rooting trees (outgroups) Bootstrapping given a set of sequences sample positions randomly,

Phylogenetic analyses. Kirsi Kostamo

Phylogenetic analyses Kirsi Kostamo The aim: To construct a visual representation (a tree) to describe the assumed evolution occurring between and among different groups (individuals, populations, species,

Consistency Index (CI)

Consistency Index (CI) minimum number of changes divided by the number required on the tree. CI=1 if there is no homoplasy negatively correlated with the number of species sampled Retention Index (RI)

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Phylogenetics: Building Phylogenetic Trees COMP 571 - Fall 2010 Luay Nakhleh, Rice University Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary

Assessing Congruence Among Ultrametric Distance Matrices

Journal of Classification 26:103-117 (2009) DOI: 10.1007/s00357-009-9028-x Assessing Congruence Among Ultrametric Distance Matrices Véronique Campbell Université de Montréal, Canada Pierre Legendre Université

Points of View. Congruence Versus Phylogenetic Accuracy: Revisiting the Incongruence Length Difference Test

Points of View Syst. Biol. 53(1):81 89, 2004 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150490264752 Congruence Versus Phylogenetic Accuracy:

Algorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004,

Tracing the Evolution of Numerical Phylogenetics: History, Philosophy, and Significance Adam W. Ferguson Phylogenetic Systematics 26 January 2009 Inferring Phylogenies Historical endeavor Darwin- 1837

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches

Int. J. Bioinformatics Research and Applications, Vol. x, No. x, xxxx Phylogenies Scores for Exhaustive Maximum Likelihood and s Searches Hyrum D. Carroll, Perry G. Ridge, Mark J. Clement, Quinn O. Snell

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Bayesian Phylogenetic Analysis COMP 571 - Spring 2015 Luay Nakhleh, Rice University Bayes Rule P(X = x Y = y) = P(X = x, Y = y) P(Y = y) = P(X = x)p(y = y X = x) P x P(X = x 0 )P(Y = y X

Bootstrapping phylogenies: Large deviations and dispersion effects

Biometrika (1996), 83, 2, pp. 315-328 Printed in Great Britain Bootstrapping phylogenies: Large deviations and dispersion effects BY MICHAEL A. NEWTON Department of Statistics, University of Wisconsin-Madison,

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2016 University of California, Berkeley. Parsimony & Likelihood [draft]

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2016 University of California, Berkeley K.W. Will Parsimony & Likelihood [draft] 1. Hennig and Parsimony: Hennig was not concerned with parsimony

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

Additive distances Let T be a tree on leaf set S and let w : E R + be an edge-weighting of T, and assume T has no nodes of degree two. Let D ij = e P ij w(e), where P ij is the path in T from i to j. Then

Effects of Gap Open and Gap Extension Penalties

Brigham Young University BYU ScholarsArchive All Faculty Publications 200-10-01 Effects of Gap Open and Gap Extension Penalties Hyrum Carroll hyrumcarroll@gmail.com Mark J. Clement clement@cs.byu.edu See

Multiple sequence alignment accuracy and phylogenetic inference

Utah Valley University From the SelectedWorks of T. Heath Ogden 2006 Multiple sequence alignment accuracy and phylogenetic inference T. Heath Ogden, Utah Valley University Available at: https://works.bepress.com/heath_ogden/6/

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Bioinformatics 1 Biology, Sequences, Phylogenetics Part 4 Sepp Hochreiter Klausur Mo. 30.01.2011 Zeit: 15:30 17:00 Raum: HS14 Anmeldung Kusss Contents Methods and Bootstrapping of Maximum Methods Methods

Molecular Clocks. The Holy Grail. Rate Constancy? Protein Variability. Evidence for Rate Constancy in Hemoglobin. Given

Molecular Clocks Rose Hoberman The Holy Grail Fossil evidence is sparse and imprecise (or nonexistent) Predict divergence times by comparing molecular data Given a phylogenetic tree branch lengths (rt)

Assessing Phylogenetic Hypotheses and Phylogenetic Data

Assessing Phylogenetic Hypotheses and Phylogenetic Data We use numerical phylogenetic methods because most data includes potentially misleading evidence of relationships We should not be content with constructing

Ratio of explanatory power (REP): A new measure of group support

Molecular Phylogenetics and Evolution 44 (2007) 483 487 Short communication Ratio of explanatory power (REP): A new measure of group support Taran Grant a, *, Arnold G. Kluge b a Division of Vertebrate

Week 5: Distance methods, DNA and protein models

Week 5: Distance methods, DNA and protein models Genome 570 February, 2016 Week 5: Distance methods, DNA and protein models p.1/69 A tree and the expected distances it predicts E A 0.08 0.05 0.06 0.03

Concepts and Methods in Molecular Divergence Time Estimation

Concepts and Methods in Molecular Divergence Time Estimation 26 November 2012 Prashant P. Sharma American Museum of Natural History Overview 1. Why do we date trees? 2. The molecular clock 3. Local clocks

Workshop III: Evolutionary Genomics

Identifying Species Trees from Gene Trees Elizabeth S. Allman University of Alaska IPAM Los Angeles, CA November 17, 2011 Workshop III: Evolutionary Genomics Collaborators The work in today s talk is joint

1 ATGGGTCTC 2 ATGAGTCTC

We need an optimality criterion to choose a best estimate (tree) Other optimality criteria used to choose a best estimate (tree) Parsimony: begins with the assumption that the simplest hypothesis that

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

Chapter 12 (Strikberger) Molecular Phylogenies and Evolution METHODS FOR DETERMINING PHYLOGENY In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Modern

Data Mining Chapter 4: Data Analysis and Uncertainty Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 4: Data Analysis and Uncertainty Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Why uncertainty? Why should data mining care about uncertainty? We

How should we go about modeling this? Model parameters? Time Substitution rate Can we observe time or subst. rate? What can we observe?

How should we go about modeling this? gorilla GAAGTCCTTGAGAAATAAACTGCACACACTGG orangutan GGACTCCTTGAGAAATAAACTGCACACACTGG Model parameters? Time Substitution rate Can we observe time or subst. rate? What

MOLECULAR SYSTEMATICS: A SYNTHESIS OF THE COMMON METHODS AND THE STATE OF KNOWLEDGE

CELLULAR & MOLECULAR BIOLOGY LETTERS http://www.cmbl.org.pl Received: 16 August 2009 Volume 15 (2010) pp 311-341 Final form accepted: 01 March 2010 DOI: 10.2478/s11658-010-0010-8 Published online: 19 March

Fractional Replications

Chapter 11 Fractional Replications Consider the set up of complete factorial experiment, say k. If there are four factors, then the total number of plots needed to conduct the experiment is 4 = 1. When

Bayesian Models for Phylogenetic Trees

Bayesian Models for Phylogenetic Trees Clarence Leung* 1 1 McGill Centre for Bioinformatics, McGill University, Montreal, Quebec, Canada ABSTRACT Introduction: Inferring genetic ancestry of different species

Chapter 9 BAYESIAN SUPERTREES. Fredrik Ronquist, John P. Huelsenbeck, and Tom Britton

Chapter 9 BAYESIAN SUPERTREES Fredrik Ronquist, John P. Huelsenbeck, and Tom Britton Abstract: Keywords: In this chapter, we develop a Bayesian approach to supertree construction. Bayesian inference requires

Homoplasy. Selection of models of molecular evolution. Evolutionary correction. Saturation

Homoplasy Selection of models of molecular evolution David Posada Homoplasy indicates identity not produced by descent from a common ancestor. Graduate class in Phylogenetics, Campus Agrário de Vairão,

Estimating Evolutionary Trees. Phylogenetic Methods

Estimating Evolutionary Trees v if the data are consistent with infinite sites then all methods should yield the same tree v it gets more complicated when there is homoplasy, i.e., parallel or convergent

Statistical nonmolecular phylogenetics: can molecular phylogenies illuminate morphological evolution?

Statistical nonmolecular phylogenetics: can molecular phylogenies illuminate morphological evolution? 30 July 2011. Joe Felsenstein Workshop on Molecular Evolution, MBL, Woods Hole Statistical nonmolecular

Branch-Length Prior Influences Bayesian Posterior Probability of Phylogeny

Syst. Biol. 54(3):455 470, 2005 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150590945313 Branch-Length Prior Influences Bayesian Posterior Probability

Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences

Mathematical Statistics Stockholm University Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences Bodil Svennblad Tom Britton Research Report 2007:2 ISSN 650-0377 Postal

X X (2) X Pr(X = x θ) (3)

Notes for 848 lecture 6: A ML basis for compatibility and parsimony Notation θ Θ (1) Θ is the space of all possible trees (and model parameters) θ is a point in the parameter space = a particular tree

Using algebraic geometry for phylogenetic reconstruction

Using algebraic geometry for phylogenetic reconstruction Marta Casanellas i Rius (joint work with Jesús Fernández-Sánchez) Departament de Matemàtica Aplicada I Universitat Politècnica de Catalunya IMA

Minimum evolution using ordinary least-squares is less robust than neighbor-joining

Minimum evolution using ordinary least-squares is less robust than neighbor-joining Stephen J. Willson Department of Mathematics Iowa State University Ames, IA 50011 USA email: swillson@iastate.edu November

Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D

7.91 Lecture #5 Database Searching & Molecular Phylogenetics Michael Yaffe B C D B C D (((,B)C)D) Outline Distance Matrix Methods Neighbor-Joining Method and Related Neighbor Methods Maximum Likelihood

arxiv: v1 [q-bio.pe] 27 Oct 2011

INVARIANT BASED QUARTET PUZZLING JOE RUSINKO AND BRIAN HIPP arxiv:1110.6194v1 [q-bio.pe] 27 Oct 2011 Abstract. Traditional Quartet Puzzling algorithms use maximum likelihood methods to reconstruct quartet

Algorithms in Bioinformatics

Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Distance Methods Character Methods

Algebraic Statistics Tutorial I

Algebraic Statistics Tutorial I Seth Sullivant North Carolina State University June 9, 2012 Seth Sullivant (NCSU) Algebraic Statistics June 9, 2012 1 / 34 Introduction to Algebraic Geometry Let R[p] =

The One-Quarter Fraction

The One-Quarter Fraction ST 516 Need two generating relations. E.g. a 2 6 2 design, with generating relations I = ABCE and I = BCDF. Product of these is ADEF. Complete defining relation is I = ABCE = BCDF

CREATING PHYLOGENETIC TREES FROM DNA SEQUENCES

INTRODUCTION CREATING PHYLOGENETIC TREES FROM DNA SEQUENCES This worksheet complements the Click and Learn developed in conjunction with the 2011 Holiday Lectures on Science, Bones, Stones, and Genes:

Theoretical Basis of Likelihood Methods in Molecular Phylogenetic Inference

Theoretical Basis of Likelihood Methods in Molecular Phylogenetic Inference Rhiju Das, Centre of Mathematics and Physical Sciences applied to Life science and EXperimental biology (CoMPLEX), University

Phylogenetic hypotheses and the utility of multiple sequence alignment

Phylogenetic hypotheses and the utility of multiple sequence alignment Ward C. Wheeler 1 and Gonzalo Giribet 2 1 Division of Invertebrate Zoology, American Museum of Natural History Central Park West at

Theory of Evolution Charles Darwin

Theory of Evolution Charles arwin 858-59: Origin of Species 5 year voyage of H.M.S. eagle (83-36) Populations have variations. Natural Selection & Survival of the fittest: nature selects best adapted varieties

Reconstruire le passé biologique modèles, méthodes, performances, limites

Reconstruire le passé biologique modèles, méthodes, performances, limites Olivier Gascuel Centre de Bioinformatique, Biostatistique et Biologie Intégrative C3BI USR 3756 Institut Pasteur & CNRS Reconstruire

Letter to the Editor. The Effect of Taxonomic Sampling on Accuracy of Phylogeny Estimation: Test Case of a Known Phylogeny Steven Poe 1

Letter to the Editor The Effect of Taxonomic Sampling on Accuracy of Phylogeny Estimation: Test Case of a Known Phylogeny Steven Poe 1 Department of Zoology and Texas Memorial Museum, University of Texas