Discrete & continuous characters: The threshold model

Similar documents
Statistical nonmolecular phylogenetics: can molecular phylogenies illuminate morphological evolution?

Algorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004,

ANCESTRAL CHARACTER ESTIMATION UNDER THE THRESHOLD MODEL FROM QUANTITATIVE GENETICS

Optimum selection and OU processes

How can molecular phylogenies illuminate morphological evolution?

Molecular Evolution & Phylogenetics

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Rapid evolution of the cerebellum in humans and other great apes

Estimating Evolutionary Trees. Phylogenetic Methods

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles

Inferring Speciation Times under an Episodic Molecular Clock

Dating r8s, multidistribute

Chapter 9: Beyond the Mk Model

Molecular Evolution, course # Final Exam, May 3, 2006

Metropolis-Hastings Algorithm

Chapter 7: Models of discrete character evolution

Appendix from L. J. Revell, On the Analysis of Evolutionary Change along Single Branches in a Phylogeny

Quantitative evolution of morphology

Phenotypic Evolution. and phylogenetic comparative methods. G562 Geometric Morphometrics. Department of Geological Sciences Indiana University

A Bayesian Approach to Phylogenetics

Markov chain Monte-Carlo to estimate speciation and extinction rates: making use of the forest hidden behind the (phylogenetic) tree

Contrasts for a within-species comparative method

Infer relationships among three species: Outgroup:

Lecture 6 Phylogenetic Inference

Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies

Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018

1. Can we use the CFN model for morphological traits?

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

A (short) introduction to phylogenetics

STA 4273H: Statistical Machine Learning

MCMC 2: Lecture 2 Coding and output. Phil O Neill Theo Kypraios School of Mathematical Sciences University of Nottingham

MCMC: Markov Chain Monte Carlo

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley

p L yi z n m x N n xi

Systematics - Bio 615

Testing quantitative genetic hypotheses about the evolutionary rate matrix for continuous characters

Reminder of some Markov Chain properties:

Monte Carlo in Bayesian Statistics

Markov Chain Monte Carlo

Machine Learning Techniques for Computer Vision

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200 Spring 2018 University of California, Berkeley

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

DNA-based species delimitation

Evolutionary Models. Evolutionary Models

Nonlinear Models. and. Hierarchical Nonlinear Models

BMI/CS 776 Lecture 4. Colin Dewey

An Introduction to Bayesian Inference of Phylogeny

Bayes Nets: Sampling

Bayesian Phylogenetics:

Reconstruire le passé biologique modèles, méthodes, performances, limites

Phylogenetics. BIOL 7711 Computational Bioscience

Markov Chain Monte Carlo Methods

Phylogeny, trees and morphospace

BRIEF COMMUNICATIONS

The phylogenetic effective sample size and jumps

Markov Chain Monte Carlo Methods

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

Accounting for Phylogenetic Uncertainty in Comparative Studies: MCMC and MCMCMC Approaches. Mark Pagel Reading University.

Biol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference

Taming the Beast Workshop

Biol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference Fall 2016

Approximate Bayesian Computation: a simulation based approach to inference

Non-Parametric Bayesian Population Dynamics Inference

CS 343: Artificial Intelligence

ABC methods for phase-type distributions with applications in insurance risk problems

Hamiltonian Monte Carlo

Statistical Methods in Particle Physics Lecture 1: Bayesian methods

Estimating the Rate of Evolution of the Rate of Molecular Evolution

The statistical and informatics challenges posed by ascertainment biases in phylogenetic data collection

An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if. 2 l i. i=1

Markov Networks.

The Metropolis-Hastings Algorithm. June 8, 2012

Evolution of Body Size in Bears. How to Build and Use a Phylogeny

Reconstructing Evolutionary Trees. Chapter 14

BIOINFORMATICS. Gilles Guillot

Bayesian construction of perceptrons to predict phenotypes from 584K SNP data.

Calculation of IBD probabilities

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016

MID-TERM EXAM ANSWERS. p t + δ t = Rp t 1 + η t (1.1)

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Markov Chain Monte Carlo methods

Today s Lecture: HMMs

Introduction to Quantitative Genetics. Introduction to Quantitative Genetics

Constructing Evolutionary/Phylogenetic Trees

Bayesian Inference and MCMC

Inferring Molecular Phylogeny

Bayesian Models for Phylogenetic Trees

The Origin of Deep Learning. Lili Mou Jan, 2015

Tutorial Session 2. MCMC for the analysis of genetic data on pedigrees:

Efficient recycled algorithms for quantitative trait models on phylogenies arxiv: v1 [q-bio.pe] 2 Dec 2015

(I AL BL 2 )z t = (I CL)ζ t, where

Calculation of IBD probabilities

EVOLUTIONARY DISTANCES

Bayesian Methods in Multilevel Regression

Bayesian Regression (1/31/13)

Assessing Reliability Using Developmental and Operational Test Data

Transcription:

Discrete & continuous characters: The threshold model

Discrete & continuous characters: the threshold model So far we have discussed continuous & discrete character models separately for estimating ancestral state; and for estimating the evolutionary correlation between characters. In recent years a new model has been proposed (or, more accurately, an old model has been revisited) to model the evolutionary covariance between discrete & continuous character on a phylogeny (Felsenstein 2005, 2012; Revell 2013). This model is called the threshold model.

Review: the Mk model The most commonly used model for discrete character evolution on trees is a model called the Mk model. M stands for Markov because the modeled process is a continuous-time Markov chain; and k because the model is generalized to include an arbitrary number (k) states. The central attribute of the Mk model is a transition matrix, Q, giving the instantaneous transition rates between states. Q = ( α + γ ε β ) α ( γ + δ ) ζ β δ ( ε + ζ ) p t = p0 exp( Qt)

Properties of the Mk model Because the process is (by definition) memoryless, a character that changes state from 0 -> 1 (or A -> B, etc.) has an indefinitely equal probability of reverting back, B -> A. That probability can be large or small (even 0), but it is indefinitely constant. In addition, for multistate data a character that has recently changed from A -> B immediately assumes the probability P bj of subsequently changing to state j. (Assuming fixation is rapid relative to the scale of time being studied) this could be a reasonable assumption for nucleotide data and some types of morphological characters. However, for complex morphological & ecological characters, it may be time to consider another model.

The threshold model Wright (1934) proposed a model for discrete characters in which the value of the discrete phenotype is determined by an underlying, unobserved continuous character called liability. If liability crossed a fixed threshold value, the character changed state. Sewall Wright (1889-1988)

What is liability? Liability is by definition unobserved or unmeasured. It could be a superficially invisible (but theoretically measurable) trait such as circulating blood hormone for instance. I also argue that liability could be a proxy for the complex, multilocus genetic changes that are likely to underlie a shift in a discretely measured ecological trait (Revell 2013).

The threshold model In spite of the long history of this model in quantitative genetics, Felsenstein (2005, 2012) was the first to apply it to comparative biology. He developed an approach to estimate the evolutionary correlation between discrete characters, or between discrete and continuous traits, using Joseph Felsenstein the threshold model. In that case, the correlation is merely the correlation of liabilities. Subsequently I (Revell 2013) proposed using the threshold model for ancestral state reconstruction.

Properties of the threshold model The threshold model is inherently ordered. Although we can use a Markov process to model the evolution of liability (e.g., Brownian motion), discrete character evolution under the threshold model is not memoryless. This is because if a character changed state recently from A -> B, it is much more likely to change back immediately (when near the threshold) than far in the future. The model also provides a natural framework for withinspecies polymorphism (although this is not implemented so far).

Properties of the threshold model

Simulating under the threshold model Simulating under the threshold model is trivial. We just simulate liability up the tree under our continuous character model (say, Brownian motion); and then we translate our simulated liabilities to the discrete threshold character.

Simulating under the threshold model 1. Simulate liability up the branches of the tree under our continuous character model.

Simulating under the threshold model 2. Apply the thresholds to translate tip & node states to the discrete character.

Simulating under the threshold model 2. Apply the thresholds to translate tip & node states to the discrete character.

Simulating under the threshold model 3. Project the implied states back onto the nodes of the tree.

Estimating ancestral states under the threshold model Fitting the threshold model to discrete character data is distinctly more difficult. This is because computing the probability of a character pattern would involve calculating a bunch of integrals of the multivariate normal distribution that we can t compute (and this ignores the positions of the thresholds). My solution is to sample the tip & node liabilities, and the relative positions of the thresholds, from their joint posterior probability distribution using Bayesian MCMC under the Metropolis-Hastings algorithm.

Estimating ancestral states under the threshold model While computing the likelihood of our discrete character data would be difficult; computing the likelihood (& thus posterior odds ratio) of a set of tip & node liabilities given the tip data & tree is easy: tip liabilities, likelihood ancestral states, & thresholds = liabilities P tree & model 1.0 if liabilities correct 0.0 otherwise l exp 1 ([ x, a] a 1) C ([ x, a] a 1) 1 2 0 ( x, a0, a, τ y, C) = ( n+ i 1) 2 1 2 (2π ) C 0 1 0 if if f ( x, τ) = y f ( x, τ) y

Estimating ancestral states under the threshold model Something that you might observe about this expression is that there is no rate of liability evolution, σ 2. This is because liability is scaleless, thus we can fix the position of the threshold(s) and estimate σ 2 ; or fix σ 2, and estimate the positions of the thresholds but not both. What value we fix σ 2 to is inconsequential, because σ 2 cancels from the numerator & denominator of the posterior odds ratio during MCMC.

Estimating the evolutionary correlation under the threshold model In addition to ancestral states, we can also use the threshold model to estimate the evolutionary correlation between discrete traits or between discrete & continuous characters. In this case the likelihood expression is as follows: tip liabilities, tip trait values, likelihood covariances, & thresholds* = liabilities, tip values, P & correlation tree & model 1.0 if correct 0.0 otherwise We can t maximize the likelihood, but we can sample liabilities & covariances from their joint posterior probability distribution.