Taming the Beast Workshop

Similar documents
Today's project. Test input data Six alignments (from six independent markers) of Curcuma species

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

Inferring Speciation Times under an Episodic Molecular Clock

Concepts and Methods in Molecular Divergence Time Estimation

Quartet Inference from SNP Data Under the Coalescent Model

first (i.e., weaker) sense of the term, using a variety of algorithmic approaches. For example, some methods (e.g., *BEAST 20) co-estimate gene trees

Anatomy of a species tree

Bayesian inference of species trees from multilocus data

WenEtAl-biorxiv 2017/12/21 10:55 page 2 #2

JML: testing hybridization from species trees

Inferring Species Trees Directly from Biallelic Genetic Markers: Bypassing Gene Trees in a Full Coalescent Analysis. Research article.

Introduction to Phylogenetic Analysis

Introduction to Phylogenetic Analysis

SpeciesNetwork Tutorial

Estimating Evolutionary Trees. Phylogenetic Methods

Efficient Bayesian species tree inference under the multi-species coalescent

The Inference of Gene Trees with Species Trees

Statistical Inference of Allopolyploid Species Networks in the Presence of Incomplete Lineage Sorting

Phylogenomics of closely related species and individuals

Evaluation of a Bayesian Coalescent Method of Species Delimitation

In comparisons of genomic sequences from multiple species, Challenges in Species Tree Estimation Under the Multispecies Coalescent Model REVIEW

Posterior predictive checks of coalescent models: P2C2M, an R package

Workshop III: Evolutionary Genomics

To link to this article: DOI: / URL:

A primer on phylogenetic biogeography and DEC models. March 13, 2017 Michael Landis Bodega Bay Workshop Sunny California

Efficient Bayesian Species Tree Inference under the Multispecies Coalescent

STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization)

An Evaluation of Different Partitioning Strategies for Bayesian Estimation of Species Divergence Times

From Individual-based Population Models to Lineage-based Models of Phylogenies

The problem Lineage model Examples. The lineage model

Incomplete Lineage Sorting: Consistent Phylogeny Estimation From Multiple Loci

8/23/2014. Phylogeny and the Tree of Life

7. Tests for selection

Probability Distribution of Molecular Evolutionary Trees: A New Method of Phylogenetic Inference

Bayesian species identification under the multispecies coalescent provides significant improvements to DNA barcoding analyses

CREATING PHYLOGENETIC TREES FROM DNA SEQUENCES

Hidden Markov models in population genetics and evolutionary biology

A tutorial of BPP for species tree estimation and species delimitation

The Probability of a Gene Tree Topology within a Phylogenetic Network with Applications to Hybridization Detection

EVOLUTIONARY DISTANCES

GenPhyloData: realistic simulation of gene family evolution

The BPP program for species tree estimation and species delimitation

Markov chain Monte-Carlo to estimate speciation and extinction rates: making use of the forest hidden behind the (phylogenetic) tree

Many of the slides that I ll use have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Upcoming challenges in phylogenomics. Siavash Mirarab University of California, San Diego

Gene Genealogies Coalescence Theory. Annabelle Haudry Glasgow, July 2009

Unified modeling of gene duplication, loss and coalescence using a locus tree

Fast coalescent-based branch support using local quartet frequencies

Estimation of species divergence dates with a sloppy molecular clock

Divergence Time Estimation using BEAST v2.2.0 Dating Species Divergences with the Fossilized Birth-Death Process

Coalescent based demographic inference. Daniel Wegmann University of Fribourg

A Bayesian Approach for Fast and Accurate Gene Tree Reconstruction

Non-Parametric Bayesian Population Dynamics Inference

Phylogenetic Tree Reconstruction

Molecular phylogeny How to infer phylogenetic trees using molecular sequences


Molecular phylogeny How to infer phylogenetic trees using molecular sequences

ASTRAL: Fast coalescent-based computation of the species tree topology, branch lengths, and local branch support

Understanding How Stochasticity Impacts Reconstructions of Recent Species Divergent History. Huateng Huang

Divergence Time Estimation using BEAST v2. Dating Species Divergences with the Fossilized Birth-Death Process Tracy A. Heath

Phylogenetic Inference using RevBayes

DNA-based species delimitation

Cladistics and Bioinformatics Questions 2013

Coalescent Histories on Phylogenetic Networks and Detection of Hybridization Despite Incomplete Lineage Sorting

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

Multiple Sequence Alignment. Sequences

arxiv: v2 [q-bio.pe] 4 Feb 2016

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

Discordance of Species Trees with Their Most Likely Gene Trees

Reconstruire le passé biologique modèles, méthodes, performances, limites

Approximate Bayesian Computation: a simulation based approach to inference

How should we go about modeling this? Model parameters? Time Substitution rate Can we observe time or subst. rate? What can we observe?

Processes of Evolution

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week (Jan 27 & 29):

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley

Fine-Scale Phylogenetic Discordance across the House Mouse Genome

Statistical nonmolecular phylogenetics: can molecular phylogenies illuminate morphological evolution?

Detection and Polarization of Introgression in a Five-Taxon Phylogeny

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Maximum Likelihood Inference of Reticulate Evolutionary Histories

Inference of Gene Tree Discordance and Recombination. Yujin Chung. Doctor of Philosophy (Statistics)

Relaxing the Molecular Clock to Different Degrees for Different Substitution Types

Evolution Problem Drill 10: Human Evolution

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

How can molecular phylogenies illuminate morphological evolution?

HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM

Inferring Molecular Phylogeny

Received 21 June 2011; reviews returned 24 August 2011; accepted 1 November 2011 Associate Editor: Elizabeth Jockusch

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

Lecture 11 Friday, October 21, 2011

MCMC: Markov Chain Monte Carlo

ESTIMATING DIVERGENCE TIMES FROM MOLECULAR DATA ON PHYLOGENETIC

Phylogenomics. Jeffrey P. Townsend Department of Ecology and Evolutionary Biology Yale University. Tuesday, January 29, 13

Phylogenetics. BIOL 7711 Computational Bioscience

Molecular Epidemiology Workshop: Bayesian Data Analysis

Genetic Drift in Human Evolution

C.DARWIN ( )

Supplementary Information

How Molecules Evolve. Advantages of Molecular Data for Tree Building. Advantages of Molecular Data for Tree Building

Transcription:

Workshop and Chi Zhang June 28, 2016 1 / 19

Species tree Species tree the phylogeny representing the relationships among a group of species Figure adapted from [Rogers and Gibbs, 2014] Gene tree the phylogeny for sequences at a particular gene locus from those species 2 / 19

Gene tree discordance Incomplete lineage sorting Figure adapted from [Patterson et al., 2006] 3 / 19

Gene tree discordance Horizontal gene transfer Gene duplication and loss Figure adapted from [Degnan and Rosenberg, 2009] 4 / 19

Gene tree discordance Hybridization Figure adapted from [Li et al., 2016] 5 / 19

Species tree inference and A Bayesian method to infer from multilocus sequence data [Heled and Drummond, 2010], a functionality of BEAST2 Gene trees are embedded in the under the multispecies coalescent model [Rannala and Yang, 2003] incomplete lineage sorting Gene trees are independent among loci Human Chimp Gorilla 6 / 19

The prior for S has two parts: P(S) = P(S T )P(N) ST species time tree N population size functions P(S T ) typically a Yule (pure-birth) or birth-death prior we can assign a hyperprior for the speciation (birth) rate (and extinction (death) rate, if birth-death) P(N) constant or continuous-linear 7 / 19

Constant population sizes Figure adapted from [Drummond and Bouckaert, 2015] 8 / 19

Continuous-linear population sizes Figure adapted from [Drummond and Bouckaert, 2015] 9 / 19

In, the prior type for N is fixed to gamma The gamma shape parameter k is fixed to 2, but we can assign a hyperprior for ψ, the scale parameter of the gamma (This ψ parameter is called population mean in Beauti, but the prior mean is actually 2ψ when the population sizes are constant) 10 / 19

model The prior for gene tree g, given S Figure adapted from [Drummond and Bouckaert, 2015] 11 / 19

model The prob. distribution of gene time tree g given S, is: P(g S) = 2s 1 j=1 P(L j (g) N j (t)) s number of extant species (2s 1 branches totally) N j (t) population size function (linear) L j (g) coalescent intervals for genealogy g that are contained in the j th branch of S 12 / 19

P(c) prior for the molecular clock model of genealogy g strict clock typically fix to 1.0 for the first locus, and infer the relative clock rates for the rest loci relaxed clock P(θ) prior for the substitution model parameters e.g. HKY85, Prior for transition/transversion rate ratio (κ), e.g. gamma(2,1) Prior for base frequencies (πt, π C, π A, π G ), e.g. Dirichlet(1,1,1,1) 13 / 19

The probability (likelihood) of data d i (alignment at locus i), given the gene time tree g i, molecular clock c i, and substitution model θ i, is: P(d i g i, c i, θ i ) 14 / 19

Priors and likelihood P(S) prior for P(g i S) prior for gene tree i (multispecies coalescent) P(c i ) prior for clock rate of locus i P(θ i ) prior for substitution parameters of locus i P(d i g i, c i, θ i ) likelihood of data at locus i 15 / 19

Posterior The posterior distribution of the S given data D is: ( n ) P(S D) P(d i g i, c i, θ i )P(g i S)P(c i )P(θ i ) P(S)dG G i=1 The data D = d 1, d 2,..., d n is composed of n alignments, one per locus. G = (G 1 G 2... G n ) is the space of all gene trees over the respective alignments where g i G i is one specific gene tree on the i th alignment. 16 / 19

Integrating out population sizes Assume constant population sizes Assign i.i.d inverse-gamma(α, β) prior for N j mean = β/(α 1) The population sizes N can be integrated out from P(g S) [Jones, 2015] Specify α and β in the invgamma prior (instead of ψ in the gamma prior) 17 / 19

A more efficient implementation and an upgrade of Population sizes integrated out [Jones, 2015] Relaxed molecular clock per branch (instead of per gene tree branch) More efficient MCMC proposals for the and gene trees (coordinated operators) [Jones, 2015, Rannala and Yang, 2015] Available at github.com/genomescale/starbeast2, will be released soon (as a BEAST2 add-on) 18 / 19

I - Degnan, J. H. and Rosenberg, N. A. (2009). Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends in Ecology & Evolution, 24(6):332 340. - Drummond, A. J. and Bouckaert, R. R. (2015). Bayesian Evolutionary Analysis with BEAST. Cambridge University Press. - Heled, J. and Drummond, A. J. (2010). s from multilocus data. Molecular Biology and Evolution, 27(3):570 580. - Jones, G. R. (2015). Species delimitation and phylogeny estimation under the multispecies coalescent. biorxiv. - Li, G., Davis, B. W., Eizirik, E., and Murphy, W. J. (2016). Phylogenomic evidence for ancient hybridization in the genomes of living cats (Felidae). Genome Research, 26(1):1 11. - Patterson, N., Richter, D. J., Gnerre, S., Lander, E. S., and Reich, D. (2006). Genetic evidence for complex speciation of humans and chimpanzees. Nature, 441(7097):1103 1108. - Rannala, B. and Yang, Z. (2003). Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics, 164(4):1645 1656. - Rannala, B. and Yang, Z. (2015). Efficient Bayesian inference under the multi-species coalescent. arxiv.org. - Rogers, J. and Gibbs, R. A. (2014). Comparative primate genomics: emerging patterns of genome content and dynamics. Nature Reviews Genetics, 15(5):347 359. 19 / 19