"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley

B.D. Mishler, Feb. 1, 2011

Qualitative character evolution (cont.) - comparing two or more characters

1. Correlated evolution of two binary traits under the parsimony criterion

Maddison's (1990) concentrated changes test was one of the first tests introduced to test for correlations in the evolution of discrete traits in a phylogenetic context. The test examines the question: are changes in character B concentrated on portions of the phylogeny where character A has a particular state, more than expected by chance?

Examples shown on the following pages:
Does larval gregariousness in butterflies evolve more often than expected in lineages with warning coloration?
Does dioecy evolve from hermaphroditism more often than expected in lineages that have fleshy (vs. dry) fruits?

The test proceeds in several steps:
1) Reconstruct the evolution of character A on the tree. This reconstruction is considered fixed: an independent factor that may influence the evolution of character B, but is not dependent upon it.
2) Map character B on the tree, and tabulate the total number of evolutionary transitions (gains and losses) and the number of transitions that occur against each background state of character A.
3) By exact calculation or by simulation, count the number of possible ways to arrange the same number of changes on the given phylogeny, along with the number of arrangements that involve as many or more changes against the relevant background of character A.
4) Calculate the significance of the observed pattern as the proportion of possible arrangements with as many or more changes located in the selected background of character A.

[Figure annotation: 4 gains (out of 5 gains, 1 loss) on black branches: p = 0.0638]
[Figure annotation: 69 ways to arrange 2 gains and 0 losses; 15 have 2 gains on black branches]
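Step 4 is a simple proportion once the arrangements have been counted. The sketch below applies it to the second figure annotation (69 arrangements, 15 with both gains on black branches). It assumes those 15 are exactly the arrangements "as many or more" concentrated than the observed one, which holds if both observed gains fall on black branches. The function name is hypothetical, and the counting of arrangements itself (step 3) is not implemented here.

```python
def concentrated_changes_p(n_as_extreme: int, n_total: int) -> float:
    """Proportion of possible arrangements at least as concentrated as observed."""
    if n_total <= 0 or not 0 <= n_as_extreme <= n_total:
        raise ValueError("need n_total > 0 and 0 <= n_as_extreme <= n_total")
    return n_as_extreme / n_total


# Counts taken from the figure annotation: 69 arrangements in total,
# 15 of which place both gains on black branches.
p_two_gains = concentrated_changes_p(15, 69)
print(f"p = {p_two_gains:.3f}")  # about 0.22
```

With only two gains on this tree, even a perfectly concentrated pattern would not be significant at the 0.05 level.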

2. Maximum likelihood principles, and applications to discrete characters

Maximum likelihood (ML) principles provide an alternative to parsimony in the reconstruction of phylogenies and the estimation of ancestral states. ML also represents an important shift in thinking from standard probabilistic statistics. In standard statistics we focus on the probability of a given observation under a null hypothesis. If the actual observations are considered very unlikely, then we reject the null hypothesis. However, we don't actually accept a particular alternative hypothesis.

For example, consider a t-test of the following observations:

Treatment A: 5, 8, 10 (mean = 7.7, sd = 2.5)
Treatment B: 8, 12, 15 (mean = 11.7, sd = 3.5)

Null hypothesis:
Assume that the observations represent a finite sample from a normal distribution (in other words, the process in the natural world that generates these data would generate a normal distribution if you collected an infinite sample).
Assume that the means and variances of those distributions are equal in the two samples.
If these assumptions are true, the probability of drawing two samples that differ by as much as or more than the two above is $\Pr(d \mid H_0) = 0.18$.

Note that we can describe the full distribution of possible outcomes, in terms of the differences between the two groups, which will be a normal distribution that sums to 1.

Maximum likelihood reverses the entire process. Let's assume that our data are real and true, and that they reflect the outcome of some unknown process or model. Can we calculate the likelihood of the model, given these data, and compare that to the likelihood of alternative models? We are searching for the maximum likelihood model: the model of the world that best fits the data. The problem is that there are an infinite number of possible models, so unlike probability we can't describe the entire likelihood space as a distribution that sums to 1. So how can we calculate their relative likelihoods?
The fundamental insight (Edwards 1972) that makes ML statistics possible is that:

$$L(m \mid d) \propto \Pr(d \mid m)$$

Thus, we can obtain relative likelihoods of alternative models and compare them. One of those alternatives could be the traditional null hypothesis, leading to the same significance test, but in general the ability to specify a range of alternatives enhances our ability to explore specific hypotheses with the data. One of the main drawbacks is that it can be quite difficult in some cases to find the best model under ML if there is no analytic solution.
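The two viewpoints can be compared numerically on the t-test data above. The sketch below first recovers the null-hypothesis probability (a pooled-variance, two-sided t-test, with the tail area obtained by Simpson's rule so that only the standard library is needed), and then computes the relative likelihood of two normal models: one shared mean versus one mean per treatment, each evaluated at its ML parameter values. The choice of these two models and all function names are assumptions made for illustration.

```python
import math
from statistics import mean

a = [5, 8, 10]
b = [8, 12, 15]
n = len(a) + len(b)


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided tail probability; Simpson's rule on [0, |t|] (steps must be even)."""
    t = abs(t)
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 1.0 - 2.0 * (total * h / 3.0)


def sum_sq(xs, centre):
    return sum((x - centre) ** 2 for x in xs)


ss_within = sum_sq(a, mean(a)) + sum_sq(b, mean(b))  # one mean per treatment
ss_total = sum_sq(a + b, mean(a + b))                # one shared mean

# Classical view: probability of data this extreme under the null hypothesis.
pooled_var = ss_within / (n - 2)
t_stat = (mean(b) - mean(a)) / math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
p_null = two_sided_p(t_stat, n - 2)


def max_log_lik(ss_resid, n_obs):
    """Normal log-likelihood at the ML estimates of the mean(s) and variance."""
    var_hat = ss_resid / n_obs
    return -0.5 * n_obs * (math.log(2 * math.pi * var_hat) + 1)


# Likelihood view: how much better does the two-mean model fit than one mean?
relative_likelihood = math.exp(max_log_lik(ss_within, n) - max_log_lik(ss_total, n))

print(f"Pr(d | H0) = {p_null:.2f}")  # about 0.18
print(f"two-mean model is {relative_likelihood:.1f} times as likely as one-mean")
```

Only the ratio of the two likelihoods is interpreted here, which is all that the proportionality above licenses.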

Likelihood estimation of ancestral states and rates of character evolution on a phylogeny:

We'll start with one of the simplest problems in phylogenetics: estimating the rates of character evolution for a binary trait. First consider the problem of evolution along a single branch:

α = the instantaneous forward transition rate from 0 -> 1
β = the instantaneous reverse transition rate from 1 -> 0

With a little calculus one can show that the probabilities of change along a branch of length t are:

$$P_{01} = \frac{\alpha}{\alpha + \beta}\left(1 - \exp[-(\alpha + \beta)t]\right), \qquad P_{00} = 1 - P_{01}$$

$$P_{10} = \frac{\beta}{\alpha + \beta}\left(1 - \exp[-(\alpha + \beta)t]\right), \qquad P_{11} = 1 - P_{10}$$

If one assumes that the backward and forward transition rates are the same, P_01 and P_10 are also the same and simplify considerably: with α = β, $P_{01} = P_{10} = \tfrac{1}{2}\left(1 - \exp[-2\alpha t]\right)$.

For example: α = 0.5, β = 0.5, t = 1

           From: 0    From: 1
To: 0      0.684      0.316
To: 1      0.316      0.684

For example: α = 0.8, β = 0.2, t = 1

           From: 0    From: 1
To: 0      0.494      0.126
To: 1      0.506      0.874

For example: α = 0.8, β = 0.0, t = 1

           From: 0    From: 1
To: 0      0.449      0.0
To: 1      0.551      1.0
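The branch transition probabilities are easy to evaluate directly. The following minimal sketch implements the two-state formulas above and reprints the three worked examples; the function name and the dictionary layout are illustrative choices, not part of the original notes.

```python
import math


def transition_probs(alpha, beta, t):
    """Two-state transition probabilities along a branch of length t.

    alpha: instantaneous rate 0 -> 1; beta: instantaneous rate 1 -> 0.
    Keys are named P<from><to>, e.g. P01 is the probability of 0 -> 1.
    """
    total = alpha + beta
    if total == 0:
        # No change is possible when both rates are zero.
        return {"P00": 1.0, "P01": 0.0, "P10": 0.0, "P11": 1.0}
    moved = 1.0 - math.exp(-total * t)
    p01 = alpha / total * moved
    p10 = beta / total * moved
    return {"P00": 1.0 - p01, "P01": p01, "P10": p10, "P11": 1.0 - p10}


for alpha, beta in [(0.5, 0.5), (0.8, 0.2), (0.8, 0.0)]:
    p = transition_probs(alpha, beta, t=1.0)
    print(f"alpha = {alpha}, beta = {beta}, t = 1")
    print(f"  to 0:  from 0 = {p['P00']:.3f}   from 1 = {p['P10']:.3f}")
    print(f"  to 1:  from 0 = {p['P01']:.3f}   from 1 = {p['P11']:.3f}")
```

Each column of the printed tables (one starting state) sums to 1, as it must for a probability distribution over end states.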

The first REALLY important thing about the maximum likelihood view of evolutionary change is that branch lengths matter (unlike parsimony). Given instantaneous rates of change α and β, a sufficiently long branch will converge on a probability α/(α + β) that it ends in state 1 and a probability β/(α + β) that it ends in state 0, regardless of the initial state. (When α + β = 1, as in the first example, these are simply α and β.)

E.g., for α = 0.8, β = 0.2 the probabilities converge to 0.8 (state 1) and 0.2 (state 0). If both rates are lower but in the same ratio to each other, e.g. α = 0.4, β = 0.1, the branch will converge to the same point, but it will take longer.
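The convergence can be checked by evaluating the branch probabilities at increasing branch lengths. This sketch assumes the two-state formulas given earlier in these notes; the function name and the chosen values of t are arbitrary.

```python
import math


def prob_end_in_1(start, alpha, beta, t):
    """Probability that a branch of length t ends in state 1, given its start state."""
    total = alpha + beta
    moved = 1.0 - math.exp(-total * t)
    p01 = alpha / total * moved  # 0 -> 1
    p10 = beta / total * moved   # 1 -> 0
    return p01 if start == 0 else 1.0 - p10


for alpha, beta in [(0.8, 0.2), (0.4, 0.1)]:
    limit = alpha / (alpha + beta)
    print(f"alpha = {alpha}, beta = {beta}, limiting Pr(state 1) = {limit:.2f}")
    for t in (0.5, 1, 2, 5, 10, 50):
        from0 = prob_end_in_1(0, alpha, beta, t)
        from1 = prob_end_in_1(1, alpha, beta, t)
        print(f"  t = {t:>4}: start 0 -> {from0:.3f}, start 1 -> {from1:.3f}")
```

Halving both rates is equivalent to halving the branch length, so the slower pair reaches any given probability at exactly twice the value of t.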

ML estimation of transition rates and ancestral states

1. Calculate the likelihood for all possible rates over all possible combinations of ancestral states.
2. At each rate, sum the likelihoods over the sets of ancestral states; the rate with the maximum summed value is the ML estimate of the transition rate.
3. Now, using that rate, evaluate the likelihood of each ancestral set.
4. Sum the likelihoods for the individual states at each node over all possible values at the other nodes; normalize to a sum of 1 to get the likelihood of states 0 and 1 at each internal node.

Phylogeny and species trait values:

Tree (Newick, with branch lengths): ((t1:0.39,t2:0.39)B:0.93,t3:1.32)A
Tip states: t1 = 1, t2 = 1, t3 = 0

What is the likelihood of any one set of ancestral states given a hypothesized transition rate (assume equal forward and backward rates)?

Let A = 0, B = 1, α = β = 0.3. The overall likelihood of this combination is the product of the individual likelihoods on each branch:

A -> B:   P_01(t = 0.93) = 0.213
B -> t1:  P_11(t = 0.39) = 0.896
B -> t2:  P_11(t = 0.39) = 0.896
A -> t3:  P_00(t = 1.32) = 0.726
product = 0.1246
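The product above can be reproduced in a few lines. This is a minimal sketch assuming the equal-rates form of the branch probabilities (α = β = rate); the data structures and names are illustrative.

```python
import math


def p_symmetric(start, end, rate, t):
    """Branch probability under equal forward and backward rates."""
    change = 0.5 * (1.0 - math.exp(-2.0 * rate * t))
    return change if start != end else 1.0 - change


rate = 0.3
states = {"A": 0, "B": 1, "t1": 1, "t2": 1, "t3": 0}
# (parent, child, branch length), read off the tree in the example.
branches = [("A", "B", 0.93), ("B", "t1", 0.39), ("B", "t2", 0.39), ("A", "t3", 1.32)]

likelihood = 1.0
for parent, child, length in branches:
    p = p_symmetric(states[parent], states[child], rate, length)
    print(f"{parent} -> {child}: P{states[parent]}{states[child]}(t = {length}) = {p:.3f}")
    likelihood *= p

print(f"product = {likelihood:.4f}")
```

The product is computed from unrounded branch probabilities, which is why it agrees with 0.1246 rather than with the product of the three-decimal values.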

[Figure: likelihood (y-axis) of the transition rate (x-axis) for the phylogeny and species trait values above (t1 = 1, t2 = 1, t3 = 0), with one curve for each hypothesized pair of ancestral node values (A,B) = (0,0), (0,1), (1,0), (1,1), and one curve for their sum. ML rate = 0.5121.]
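The summed-likelihood curve can be sketched with a simple grid search over the rate. The code below assumes the three-tip tree and tip states from the worked example, equal forward and backward rates, and an unweighted sum over the four ancestral combinations (no prior on the root state); the grid spacing and function names are arbitrary choices. With the branch lengths as printed (rounded to two decimals), the grid maximum lands near 0.51, close to the reported ML rate of 0.5121; exact agreement should not be expected from rounded inputs.

```python
import math
from itertools import product

BRANCHES = [("A", "B", 0.93), ("B", "t1", 0.39), ("B", "t2", 0.39), ("A", "t3", 1.32)]
TIPS = {"t1": 1, "t2": 1, "t3": 0}


def p_branch(start, end, rate, t):
    """Branch probability under equal forward and backward rates."""
    change = 0.5 * (1.0 - math.exp(-2.0 * rate * t))
    return change if start != end else 1.0 - change


def joint_likelihood(a, b, rate):
    """Likelihood of the tip data for one hypothesized pair of ancestral states."""
    states = {"A": a, "B": b, **TIPS}
    lik = 1.0
    for parent, child, t in BRANCHES:
        lik *= p_branch(states[parent], states[child], rate, t)
    return lik


def summed_likelihood(rate):
    """Sum over all four ancestral combinations (A, B)."""
    return sum(joint_likelihood(a, b, rate) for a, b in product((0, 1), repeat=2))


rates = [i / 1000 for i in range(1, 3001)]  # 0.001 .. 3.000
ml_rate = max(rates, key=summed_likelihood)
print(f"grid ML rate = {ml_rate:.3f}, summed likelihood = {summed_likelihood(ml_rate):.4f}")
```

A grid is used only for transparency; any one-dimensional optimizer would serve equally well.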

Phylogeny and species trait values (t1 = 1, t2 = 1, t3 = 0): given ML rate = 0.5121, what are the probabilities of each combination of ancestral states (A,B)?

A \ B               B = 0     B = 1     P(A)     L(A) normalized
A = 0               0.0057    0.1218    0.127    0.429
A = 1               0.0005    0.1693    0.169    0.571
P(B)                0.0062    0.291
L(B) normalized     0.021     0.979
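The margins of this table follow from the four joint values by summing over the other node and normalizing. The sketch below takes the joint values exactly as printed in the table rather than recomputing them; the function name is hypothetical.

```python
# Joint likelihoods of (A, B), copied from the table.
joint = {(0, 0): 0.0057, (0, 1): 0.1218, (1, 0): 0.0005, (1, 1): 0.1693}


def normalized_marginals(joint_liks, node_index):
    """Sum over the other node, then normalize so the two states sum to 1.

    node_index 0 selects node A, node_index 1 selects node B.
    """
    totals = {0: 0.0, 1: 0.0}
    for node_states, lik in joint_liks.items():
        totals[node_states[node_index]] += lik
    grand_total = sum(totals.values())
    return {state: value / grand_total for state, value in totals.items()}


like_a = normalized_marginals(joint, 0)
like_b = normalized_marginals(joint, 1)
print(f"node A: state 0 = {like_a[0]:.3f}, state 1 = {like_a[1]:.3f}")
print(f"node B: state 0 = {like_b[0]:.3f}, state 1 = {like_b[1]:.3f}")
```

Node B is strongly supported as state 1 because both of its descendants share that state across short branches, while node A remains uncertain.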

Schluter et al. 1997 Evolution

Likelihood ratio tests: a measure of support for alternative hypotheses

$$LR = -2 \ln(L_1 / L_2)$$

For two hypotheses with the same number of parameters, there is no exact significance value attached to the LR. Values greater than 2 are considered 'strong support'.

For nested hypotheses with different numbers of parameters, where L_1 is the likelihood of the simpler model, LR is distributed as a chi-square with df = the difference in the number of parameters.

For example: if we find the maximum likelihood with α and β fit independently, α = 0.59, β = 0.31, and L(m) ≈ 0.256. If we allow only one transition rate, such that α = β, then α = β = ∞ and L ≈ 0.25.

LR = -2 ln(0.25/0.256) = 0.05
chisq(0.05, df = 1): p = 0.82

So these data are insufficient to reject a single-rate model.
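The test in this example can be reproduced with the standard library alone, because the upper tail of a chi-square distribution with 1 degree of freedom has a closed form in terms of the complementary error function. The function names below are illustrative, and the sketch covers only the df = 1 case.

```python
import math


def likelihood_ratio(l_constrained, l_full):
    """LR statistic for nested models: -2 ln(L_constrained / L_full)."""
    return -2.0 * math.log(l_constrained / l_full)


def chi2_upper_tail_df1(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


lr = likelihood_ratio(0.25, 0.256)
p_value = chi2_upper_tail_df1(lr)
print(f"LR = {lr:.3f}, p = {p_value:.2f}")
```

The unrounded statistic is about 0.047, giving p of about 0.83; rounding LR to 0.05 first, as in the notes, gives 0.82. Either way the two-rate model is not a significantly better fit.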

Pagel's (1994) discrete test of correlated evolution:

Same idea as above, but test for parameters of dependence in trait change. Number the four combinations of (trait 1, trait 2) as 1 = (0,0), 2 = (0,1), 3 = (1,0), 4 = (1,1). For example:

q12 is the rate at which trait 2 changes from 0 -> 1 when trait 1 = 0
q34 is the rate at which trait 2 changes from 0 -> 1 when trait 1 = 1

For an instantaneous model of change, assume that the two traits don't change simultaneously. The model with full dependence has 8 parameters. If the traits evolve independently, there are only 4 parameters, because:

q12 = q34; q13 = q24; q31 = q42; q21 = q43
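The parameter counts can be made concrete by writing out the 4 x 4 rate matrix. The sketch below assumes the state numbering 1 = (0,0), 2 = (0,1), 3 = (1,0), 4 = (1,1) for (trait 1, trait 2), which is what the definitions of q12 and q34 imply; the helper names and the example rate values are hypothetical.

```python
# Combined states (trait 1, trait 2), numbered 1..4 in this order.
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def build_q(rates):
    """Build the 4 x 4 rate matrix from a dict such as {"q12": 0.3, ...}.

    Only single-trait changes are allowed, so entries such as q14 and q23
    (both traits changing at once) stay at zero.
    """
    q = [[0.0] * 4 for _ in range(4)]
    for name, value in rates.items():
        i, j = int(name[1]) - 1, int(name[2]) - 1
        n_changed = sum(x != y for x, y in zip(STATES[i], STATES[j]))
        if n_changed != 1:
            raise ValueError(f"{name} is not a single-trait change")
        q[i][j] = value
    for i in range(4):
        # Diagonal entries make each row sum to zero.
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    return q


def independent_rates(gain1, loss1, gain2, loss2):
    """Eight rates constrained to four free parameters (independent evolution)."""
    return {
        "q12": gain2, "q34": gain2,  # trait 2 gains, whatever trait 1 is
        "q21": loss2, "q43": loss2,  # trait 2 losses
        "q13": gain1, "q24": gain1,  # trait 1 gains, whatever trait 2 is
        "q31": loss1, "q42": loss1,  # trait 1 losses
    }


rates = independent_rates(gain1=0.5, loss1=0.2, gain2=1.0, loss2=0.4)
q_matrix = build_q(rates)
for row in q_matrix:
    print("  ".join(f"{value:6.2f}" for value in row))
print(f"{len(rates)} rates, {len(set(rates.values()))} free parameters")
```

Passing eight unconstrained values to `build_q` instead gives the dependent model; the likelihood ratio test then compares the two fits with 4 degrees of freedom.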

ln(L(independent)) = -11.91
ln(L(dependent)) = -8.43
-2*ln(LI/LD) = 6.96
chisq(6.96, df = 4): p = 0.14

q12 = 0.29: gain of oestrous swellings (OS) in single-male breeding systems
q34 = 3.45: gain of OS in multi-male breeding systems
q13 = 1.87: gain of multi-male breeding systems (BS) in the absence of OS

CITATIONS:

Pagel M. (1994) Detecting correlated evolution on phylogenies: a general method for the comparative analysis of discrete characters. Proc. R. Soc. Lond. B, 255, 37-45.

Pagel M. (1999) The maximum likelihood approach to reconstructing ancestral character states of discrete characters on phylogenies. Syst. Biol., 48, 612-622.

Schluter D., Price T., Mooers A. & Ludwig D. (1997) Likelihood of ancestor states in adaptive radiation. Evolution, 51, 1699-1711.