Identifiability of the GTR+Γ substitution model (and other models) of DNA evolution


Elizabeth S. Allman, Dept. of Mathematics and Statistics, University of Alaska Fairbanks. Current Challenges and Problems in Phylogenetics, Isaac Newton Institute, Cambridge, England, 5 September 2007.

Joint work with J. Rhodes and C. Ané.

Identifiability. A model M of molecular evolution is identifiable if the values of all its parameters can be determined from the joint distribution P of states at the leaves. Parameters include the tree topology (or topologies), stationary distribution, edge lengths, rate matrix Q, Γ shape parameter, Markov edge matrices, $p_{\mathrm{inv}}$, etc. Identifiability is necessary for consistency of statistical inference, whether using ML or Bayesian methods. INI Identifiability Slide 1
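As a toy illustration of the definition (our own example, not from the talk): if a single Jukes-Cantor edge were described by a separate substitution rate and elapsed time, the map from (rate, time) to the joint distribution would not be injective, because only their product enters the formula. This is why edge lengths, rather than rates and times, appear in the parameter list. The sketch assumes a uniform root distribution; the function name is hypothetical.

```python
import math

def jc_pair_distribution(rate, time):
    """Joint distribution of the states at the two ends of one edge under Jukes-Cantor,
    with the edge described by a separate substitution rate and elapsed time."""
    e = math.exp(-4.0 * rate * time / 3.0)
    same = 0.25 * (0.25 + 0.75 * e)   # P(x, x): uniform root times P_xx
    diff = 0.25 * (0.25 - 0.25 * e)   # P(x, y) for x != y
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

# Two different (rate, time) pairs with the same product give the same distribution,
# so (rate, time) -> P is not injective; only rate * time (the edge length) is identifiable.
p_slow = jc_pair_distribution(rate=1.0, time=0.2)
p_fast = jc_pair_distribution(rate=2.0, time=0.1)
print(all(math.isclose(x, y) for rx, ry in zip(p_slow, p_fast) for x, y in zip(rx, ry)))  # True
```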

Known identifiability results

Negative:
- For sufficiently complicated (non-explicit) rate-across-sites models, tree identifiability can fail (Steel-Székely-Hendy, J. Comp. Biol., 1994).
- Explicit non-generic examples (not rate-across-sites) of non-identifiability of mixtures (Štefankovič-Vigoda, Syst. Biol., 2007; J. Comp. Biol., 2007).
- Non-generic 2-class mixtures on one tree can exactly agree with a 1-class model on a different tree (Matsen-Steel, preprint).
- A more general study of many-class non-identifiable mixtures under the 2-state symmetric model (Matsen-Mossel-Steel, preprint).

Positive:
- GTR is identifiable (use the log-det distance to identify the tree, etc.).
- GM is identifiable (Chang, Math. Biosci., 1996).
- General result on mixture models on one tree with a small number of classes (Allman-Rhodes, J. Comp. Biol., 2006). For DNA models, the tree is generically identifiable for:
  - GTR+I
  - GTR with 3 rate-across-sites classes
  - GTR+GTR+GTR
  - GM+GM+GM
  - covarion with 3 rate classes
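The parenthetical "use the log-det distance to identify the tree" can be sketched as follows. This is a minimal illustration under stated assumptions, not the argument from the literature: it uses the paralinear form of the log-det distance, feeds it exact (infinite-sequence) pattern frequencies from a hypothetical Jukes-Cantor quartet ab|cd (JC being a special case of GTR), and picks the quartet topology with the four-point condition. Edge lengths and function names are made up for the example.

```python
import numpy as np
from itertools import combinations

def jc_transition(t):
    """Jukes-Cantor transition matrix for an edge of length t (expected substitutions per site)."""
    e = np.exp(-4.0 * t / 3.0)
    P = np.full((4, 4), 0.25 - 0.25 * e)
    np.fill_diagonal(P, 0.25 + 0.75 * e)
    return P

def paralinear_distance(F):
    """Paralinear (log-det family) distance from a 4x4 matrix of joint pattern frequencies."""
    F = np.asarray(F, dtype=float)
    F = F / F.sum()
    fx, fy = F.sum(axis=1), F.sum(axis=0)   # base composition of each sequence
    return -0.25 * (np.log(np.linalg.det(F)) - 0.5 * np.log(np.prod(fx) * np.prod(fy)))

def quartet_split(dist):
    """Four-point condition: choose the split whose two within-pair distances sum to the least."""
    splits = [(("a", "b"), ("c", "d")), (("a", "c"), ("b", "d")), (("a", "d"), ("b", "c"))]
    return min(splits, key=lambda s: dist[s[0]] + dist[s[1]])

# Hypothetical quartet ab|cd with JC edge lengths.
pendant = {"a": 0.1, "b": 0.2, "c": 0.15, "d": 0.05}
internal = 0.1

def path_length(x, y):
    """Leaf-to-leaf path length; the internal edge is crossed iff exactly one leaf is in {a, b}."""
    crosses = len({x, y} & {"a", "b"}) == 1
    return pendant[x] + pendant[y] + (internal if crosses else 0.0)

d = {}
for x, y in combinations("abcd", 2):
    F = 0.25 * jc_transition(path_length(x, y))   # exact 2-taxon frequencies, uniform root
    d[(x, y)] = paralinear_distance(F)
print(quartet_split(d))  # (('a', 'b'), ('c', 'd'))
```

Under JC with uniform base frequencies the paralinear distance reduces exactly to the path length, so the four-point condition recovers ab|cd; with finite alignments one would plug in observed pattern frequencies instead.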

Generic vs. non-generic identifiability. If $\mathcal{T}_n$ denotes n-leaf tree space and M any choice of model, then the parameterization map(s)
$$\varphi_M : \bigsqcup_{T \in \mathcal{T}_n} (T, S_T) \to \mathbb{C}^{\kappa^n}, \qquad (T, s_T) \mapsto P = \varphi_{M,T}(s_T)$$
give rise to the collection of joint distributions P for M. M is identifiable $\iff$ $\varphi_M$ is injective.

For a fixed tree T, the map
$$\varphi_{M,T} : \{\text{parameters on } T\} \to \mathbb{C}^{\kappa^n}, \qquad s_T \mapsto P = \varphi_{M,T}(s_T)$$
associates to each tree T its phylogenetic variety $V_T$. But $V_{T_1} \cap V_{T_2} \neq \emptyset$ always (star phylogenies lie on every $V_T$). If the intersection is of lower dimension, then the tree is identifiable for generic parameters.

[Figure: the phylogenetic varieties $V_{T_1}$ and $V_{T_2}$.]

Today:
- Q1: Is the GTR+Γ+I model identifiable?
- Q2: Are 2-tree mixtures identifiable?

Q1: Is the GTR+Γ+I model identifiable?
Rogers (Syst. Biol., 2001) claimed a proof, which is widely cited, but the argument has several major gaps in showing identifiability:
1) crucial use of an unjustified graphical claim;
2) the handling of generic vs. non-generic parameters.
There is no valid, published proof that ML or Bayesian inference using the GTR+Γ+I model is consistent.

None of the previous work applies to GTR+Γ or GTR+Γ+I, since:
- the continuous rate distribution prevents application of the Allman-Rhodes positive results (or of algebraic methods of proof);
- specifying a particular form of rate distribution prevents application of the negative Steel-Székely-Hendy or Matsen-Mossel-Steel results.

New result (Allman, Ané, Rhodes, 2007):
- For 4-state (DNA) models, GTR+Γ is identifiable.
- More generally, for κ-state models, GTR+Γ is generically identifiable.
Comments:
- This is the first proof of identifiability for a rate-across-sites model with a continuous distribution of rates.
- In the 4-state case, identifiability holds for all parameters, not just generic ones.
- The proof does not follow Rogers' approach.

Main points of the GTR+Γ proof:
- Obtain the stationary distribution and the eigenvectors of the rate matrix Q from the 1- and 2-taxon marginals.
- Focus on a 3-leaf tree (leaves $a_1$, $a_2$, $a_3$) to identify the Γ shape parameter α (this is the main work), then get Q and the edge lengths $t_e$.
- The result for an n-leaf tree then follows from combinatorial arguments.
- Use algebraic arguments to extract information from the 3-dimensional tensor.
- Use analytic arguments (convexity) for generic identifiability.
- A detailed analysis of the non-generic cases completes the proof.
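The first step above relies on the fact that, for GTR+Γ, the 2-taxon joint distribution is $F = \Pi M(t)$ with $M(t) = E_r[\exp(Qrt)]$, and $M(t)$ has the same eigenvectors as Q. A minimal numerical sketch is below, assuming a Γ distribution with mean 1 (shape α, rate α) and a time-reversible Q; recovering the eigenvectors this way additionally assumes Q has distinct eigenvalues. The helper names, base frequencies, and exchangeabilities are hypothetical. Each eigenvalue λ of Q is mapped to $(1 - \lambda t/\alpha)^{-\alpha}$, the Gamma moment-generating function, which is not polynomial in α.

```python
import numpy as np

def gtr_rate_matrix(pi, s):
    """GTR rate matrix from base frequencies pi and symmetric exchangeabilities s,
    rescaled to one expected substitution per unit time."""
    pi = np.asarray(pi, dtype=float)
    Q = np.asarray(s, dtype=float) * pi[None, :]   # q_ij = s_ij * pi_j for i != j
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))            # rows sum to zero
    return Q / -(pi @ np.diag(Q))

def gamma_transition(Q, pi, t, alpha):
    """E[exp(Q r t)] for r ~ Gamma(shape=alpha, mean 1), applied eigenvalue by eigenvalue
    via E[exp(lam r t)] = (1 - lam t / alpha)^(-alpha)."""
    d = np.sqrt(np.asarray(pi, dtype=float))
    A = d[:, None] * Q / d[None, :]     # symmetric because GTR is time-reversible
    lam, V = np.linalg.eigh(A)          # real eigenvalues, all <= 0
    g = (1.0 - lam * t / alpha) ** (-alpha)
    return (V * g) @ V.T * (d[None, :] / d[:, None])

pi = [0.1, 0.2, 0.3, 0.4]
s = [[0, 1, 2, 1], [1, 0, 1, 3], [2, 1, 0, 1], [1, 3, 1, 0]]
Q = gtr_rate_matrix(pi, s)
M = gamma_transition(Q, pi, t=0.5, alpha=0.7)
F = np.diag(pi) @ M   # 2-taxon joint distribution at stationarity
# pi is recovered from the marginals of F, and Pi^{-1} F = M commutes with Q (shares its eigenvectors).
print(np.allclose(F.sum(axis=1), pi), np.allclose(F, F.T), np.allclose(M @ Q, Q @ M))  # True True True
```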

Note: we still lack a proof that the tree is identifiable for GTR+Γ+I. This is likely to be significantly harder to prove, since Γ introduces only one parameter (the shape parameter α), whereas Γ+I introduces two (α and the proportion of invariable sites $p_{\mathrm{inv}}$).

Tree mixtures. Different parts of sequences may have evolved along different trees:
- gene tree vs. species tree (incomplete lineage sorting);
- horizontal gene transfer.
[Figure: a species tree with the gene trees for Gene 1 and Gene 2.]

Two-tree mixtures can confound analysis:
- Mossel, E. and Vigoda, E., "Phylogenetic MCMC algorithms are misleading on mixtures of trees", Science 309, 2207 (2005).
- Ronquist, F., Larget, B., Huelsenbeck, J., Kadane, J., Simon, D., and van der Mark, P., "Comment on 'Phylogenetic MCMC algorithms are misleading on mixtures of trees'", Science 312, 367a (2006).
- Mossel, E. and Vigoda, E., "Response to comment on 'Phylogenetic MCMC algorithms are misleading on mixtures of trees'", Science 312, 367b (2006).
- Matsen, F. and Steel, M., "Phylogenetic mixtures on a single tree can mimic a tree of another topology", preprint (2-state).
- Matsen, F., Mossel, E., and Steel, M., "Mixed-up trees: the structure of phylogenetic mixtures", preprint (2-state).

Simple model: the 4-taxon trees $T_1$, $T_2$, $T_3$.
[Figure: the three unrooted quartet trees $T_1$, $T_2$, $T_3$ on leaves a, b, c, d.]
Joint distributions $P_{1,2}$ are 2-tree mixtures with mixing parameter δ:
$$P_{1,2} = \delta\, P_{M,T_1} + (1-\delta)\, P_{M,T_2}.$$
Similarly for the other two mixtures.
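The mixture formula can be evaluated directly. The sketch below takes JC as the model M and, as an assumption about the figure, labels the quartets $T_1 = ab|cd$, $T_2 = ac|bd$, $T_3 = ad|bc$; edge lengths and function names are hypothetical. It also checks the earlier remark about star phylogenies: with a zero-length internal edge, $T_1$ and $T_2$ give the same distribution.

```python
import numpy as np

def jc_transition(t):
    """Jukes-Cantor transition matrix for an edge of length t."""
    e = np.exp(-4.0 * t / 3.0)
    P = np.full((4, 4), 0.25 - 0.25 * e)
    np.fill_diagonal(P, 0.25 + 0.75 * e)
    return P

# einsum patterns: u and v are the two internal nodes; u is the root (uniform distribution).
TOPOLOGIES = {
    "T1": "uv,ua,ub,vc,vd->abcd",   # ab|cd (assumed labeling)
    "T2": "uv,ua,uc,vb,vd->abcd",   # ac|bd
    "T3": "uv,ua,ud,vb,vc->abcd",   # ad|bc
}

def quartet_distribution(tree, lengths):
    """4x4x4x4 array of site-pattern probabilities under JC on one quartet tree.
    lengths maps 'int' and the leaf names 'a'-'d' to edge lengths."""
    pattern = TOPOLOGIES[tree]
    leaf_terms = pattern.split("->")[0].split(",")[1:]        # e.g. ['ua', 'ub', 'vc', 'vd']
    pendant = [jc_transition(lengths[term[1]]) for term in leaf_terms]
    root_and_internal = 0.25 * jc_transition(lengths["int"])  # uniform root at u
    return np.einsum(pattern, root_and_internal, *pendant)

def two_tree_mixture(delta, tree_i, lengths_i, tree_j, lengths_j):
    """P_{i,j} = delta * P_{M,T_i} + (1 - delta) * P_{M,T_j}."""
    return (delta * quartet_distribution(tree_i, lengths_i)
            + (1 - delta) * quartet_distribution(tree_j, lengths_j))

lengths1 = {"int": 0.3, "a": 0.1, "b": 0.2, "c": 0.1, "d": 0.2}
lengths2 = {"int": 0.2, "a": 0.15, "b": 0.1, "c": 0.2, "d": 0.1}
P12 = two_tree_mixture(0.6, "T1", lengths1, "T2", lengths2)
print(P12.shape, np.isclose(P12.sum(), 1.0))  # (4, 4, 4, 4) True
```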

Theorem. Suppose $P_{ij}$ is a joint distribution arising from a 2-tree GM mixture on 4-taxon trees with κ = 4 states. Then the trees $T_i$, $T_j$ and the stochastic parameters $s_i$, $s_j$ are generically identifiable from $P_{ij}$; i.e., given $P_{ij}$, we can generically identify $(T_i, s_i)$ and $(T_j, s_j)$. A similar result holds for 2-tree GTR mixtures (and JC mixtures).

Two-tree mixtures proof (GM). Find a specific point B that lies on both $V_{GM,T_1,T_2}$ and $V_{GM,T_1,T_3}$. Prove B is non-singular by computing in Maple the dimensions of the tangent spaces $H_{1,2}$ to $V_{GM,T_1,T_2}$ at B and $H_{1,3}$ to $V_{GM,T_1,T_3}$ at B:
$$\dim(H_{1,2}) = 127, \qquad \dim(H_{1,3}) = 127,$$
which matches the number of free parameters of a 2-tree GM mixture on quartets (63 per tree, plus the mixing parameter δ).

All computations for GM can be done exactly:
- B can be chosen to arise from rational parameter values;
- the parameterization is given by polynomials with rational coefficients;
- Maple performs exact rational arithmetic.
Another computation (roughly ten minutes) shows that the two tangent spaces intersect in a lower-dimensional linear subspace:
$$\dim(H_{1,2} \cap H_{1,3}) = 115.$$
This proves that $V_{GM,T_1,T_2}$ and $V_{GM,T_1,T_3}$ are different, and then by principles of algebraic geometry (AG) we have
$$\dim(V_{GM,T_1,T_2} \cap V_{GM,T_1,T_3}) < 127.$$
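The tangent-space argument can be tried on a much smaller toy problem. The sketch below is our own analog, not the Maple computation from the talk: it uses the 2-state symmetric (CFN) model on single quartet trees instead of 2-tree GM mixtures, exact rational arithmetic from Python's `fractions` module in place of Maple, and a hypothetical rational point B on the star tree (internal edge of length zero, i.e. θ = 1), which lies on the models for both $T_1 = ab|cd$ and $T_2 = ac|bd$. Tangent spaces are taken to be the spans of the Jacobian columns at B, and $\dim(H_1 \cap H_2) = \dim H_1 + \dim H_2 - \dim(H_1 + H_2)$.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)
LEAVES = "abcd"
# Quartet topologies as (leaves on internal node u, leaves on internal node v).
SPLITS = {"T1": ("ab", "cd"), "T2": ("ac", "bd"), "T3": ("ad", "bc")}
# Derivative of a CFN edge matrix with respect to theta (its entries are affine in theta).
D_THETA = [[HALF, -HALF], [-HALF, HALF]]

def cfn_matrix(theta):
    """2-state symmetric (CFN) edge matrix, theta = 1 - 2 * P(state flips on the edge)."""
    return [[(1 + theta) * HALF, (1 - theta) * HALF],
            [(1 - theta) * HALF, (1 + theta) * HALF]]

def joint(split, mats):
    """All 16 leaf-pattern probabilities on a quartet tree (uniform root at node u)."""
    left, right = split
    probs = []
    for pattern in product((0, 1), repeat=4):
        x = dict(zip(LEAVES, pattern))
        total = Fraction(0)
        for u, v in product((0, 1), repeat=2):
            term = HALF * mats["int"][u][v]
            for leaf in left:
                term *= mats[leaf][u][x[leaf]]
            for leaf in right:
                term *= mats[leaf][v][x[leaf]]
            total += term
        probs.append(total)
    return probs

def tangent_vectors(split, thetas):
    """Exact partial derivatives dP/dtheta_e, one vector per edge. P is linear in each
    edge matrix, so substituting D_THETA for that matrix gives the derivative."""
    base = {e: cfn_matrix(t) for e, t in thetas.items()}
    return [joint(split, {**base, edge: D_THETA}) for edge in thetas]

def rank(vectors):
    """Rank over the rationals by exact Gaussian elimination."""
    rows = [list(v) for v in vectors]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

# Hypothetical rational point B with a zero-length internal edge (theta = 1).
B = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 4),
     "d": Fraction(1, 5), "int": Fraction(1)}
H1 = tangent_vectors(SPLITS["T1"], B)
H2 = tangent_vectors(SPLITS["T2"], B)
dim_h1, dim_h2, dim_sum = rank(H1), rank(H2), rank(H1 + H2)
print(dim_h1, dim_h2, dim_h1 + dim_h2 - dim_sum)  # 5 5 4
```

Both tangent spaces have the full dimension 5 (one per edge), but they meet in a 4-dimensional subspace, so the two models differ near B. This mirrors, in miniature, the 127 vs. 115 comparison above.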

Extension to GTR (non-algebraic): observe that JC ⊂ GTR. Choose B to be a Jukes-Cantor point (rational, yet GTR) with
$$B \in X_{GTR,T_1,T_2} \cap X_{GTR,T_1,T_3}.$$
Prove that there is a vector v tangent to $X_{GTR,T_1,T_3}$ at B that does not lie in the tangent plane at B to $V_{GM,T_1,T_2}$.

Preprint:


More information

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression) Using phylogenetics to estimate species divergence times... More accurately... Basics and basic issues for Bayesian inference of divergence times (plus some digression) "A comparison of the structures

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos & Aarti Singh Contents Markov Chain Monte Carlo Methods Goal & Motivation Sampling Rejection Importance Markov

More information

Concepts and Methods in Molecular Divergence Time Estimation

Concepts and Methods in Molecular Divergence Time Estimation Concepts and Methods in Molecular Divergence Time Estimation 26 November 2012 Prashant P. Sharma American Museum of Natural History Overview 1. Why do we date trees? 2. The molecular clock 3. Local clocks

More information

Phylogenetic inference

Phylogenetic inference Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types

More information

Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018

Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018 Maximum Likelihood Tree Estimation Carrie Tribble IB 200 9 Feb 2018 Outline 1. Tree building process under maximum likelihood 2. Key differences between maximum likelihood and parsimony 3. Some fancy extras

More information

Preliminaries. Download PAUP* from: Tuesday, July 19, 16

Preliminaries. Download PAUP* from:   Tuesday, July 19, 16 Preliminaries Download PAUP* from: http://people.sc.fsu.edu/~dswofford/paup_test 1 A model of the Boston T System 1 Idea from Paul Lewis A simpler model? 2 Why do models matter? Model-based methods including

More information

Incomplete Lineage Sorting: Consistent Phylogeny Estimation From Multiple Loci

Incomplete Lineage Sorting: Consistent Phylogeny Estimation From Multiple Loci University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 1-2010 Incomplete Lineage Sorting: Consistent Phylogeny Estimation From Multiple Loci Elchanan Mossel University of

More information

Taming the Beast Workshop

Taming the Beast Workshop Workshop and Chi Zhang June 28, 2016 1 / 19 Species tree Species tree the phylogeny representing the relationships among a group of species Figure adapted from [Rogers and Gibbs, 2014] Gene tree the phylogeny

More information

The Effect of Ambiguous Data on Phylogenetic Estimates Obtained by Maximum Likelihood and Bayesian Inference

The Effect of Ambiguous Data on Phylogenetic Estimates Obtained by Maximum Likelihood and Bayesian Inference Syst. Biol. 58(1):130 145, 2009 Copyright c Society of Systematic Biologists DOI:10.1093/sysbio/syp017 Advance Access publication on May 21, 2009 The Effect of Ambiguous Data on Phylogenetic Estimates

More information

Phylogenetic Geometry

Phylogenetic Geometry Phylogenetic Geometry Ruth Davidson University of Illinois Urbana-Champaign Department of Mathematics Mathematics and Statistics Seminar Washington State University-Vancouver September 26, 2016 Phylogenies

More information

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from

More information

OLIVIER SERMAN. Theorem 1.1. The moduli space of rank 3 vector bundles over a curve of genus 2 is a local complete intersection.

OLIVIER SERMAN. Theorem 1.1. The moduli space of rank 3 vector bundles over a curve of genus 2 is a local complete intersection. LOCAL STRUCTURE OF SU C (3) FOR A CURVE OF GENUS 2 OLIVIER SERMAN Abstract. The aim of this note is to give a precise description of the local structure of the moduli space SU C (3) of rank 3 vector bundles

More information

Mathematical Biology. Phylogenetic mixtures and linear invariants for equal input models. B Mike Steel. Marta Casanellas 1 Mike Steel 2

Mathematical Biology. Phylogenetic mixtures and linear invariants for equal input models. B Mike Steel. Marta Casanellas 1 Mike Steel 2 J. Math. Biol. DOI 0.007/s00285-06-055-8 Mathematical Biology Phylogenetic mixtures and linear invariants for equal input models Marta Casanellas Mike Steel 2 Received: 5 February 206 / Revised: July 206

More information

Systematics - Bio 615

Systematics - Bio 615 Bayesian Phylogenetic Inference 1. Introduction, history 2. Advantages over ML 3. Bayes Rule 4. The Priors 5. Marginal vs Joint estimation 6. MCMC Derek S. Sikes University of Alaska 7. Posteriors vs Bootstrap

More information

Four Point Gauss Quadrature Runge Kuta Method Of Order 8 For Ordinary Differential Equations

Four Point Gauss Quadrature Runge Kuta Method Of Order 8 For Ordinary Differential Equations International journal of scientific and technical research in engineering (IJSTRE) www.ijstre.com Volume Issue ǁ July 206. Four Point Gauss Quadrature Runge Kuta Method Of Order 8 For Ordinary Differential

More information

Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26

Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26 Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 27. Phylogeny methods, part 4 (Models of DNA and

More information

Homoplasy. Selection of models of molecular evolution. Evolutionary correction. Saturation

Homoplasy. Selection of models of molecular evolution. Evolutionary correction. Saturation Homoplasy Selection of models of molecular evolution David Posada Homoplasy indicates identity not produced by descent from a common ancestor. Graduate class in Phylogenetics, Campus Agrário de Vairão,

More information

Anatomy of a species tree

Anatomy of a species tree Anatomy of a species tree T 1 Size of current and ancestral Populations (N) N Confidence in branches of species tree t/2n = 1 coalescent unit T 2 Branch lengths and divergence times of species & populations

More information

Markov chain Monte-Carlo to estimate speciation and extinction rates: making use of the forest hidden behind the (phylogenetic) tree

Markov chain Monte-Carlo to estimate speciation and extinction rates: making use of the forest hidden behind the (phylogenetic) tree Markov chain Monte-Carlo to estimate speciation and extinction rates: making use of the forest hidden behind the (phylogenetic) tree Nicolas Salamin Department of Ecology and Evolution University of Lausanne

More information

THE THREE-STATE PERFECT PHYLOGENY PROBLEM REDUCES TO 2-SAT

THE THREE-STATE PERFECT PHYLOGENY PROBLEM REDUCES TO 2-SAT COMMUNICATIONS IN INFORMATION AND SYSTEMS c 2009 International Press Vol. 9, No. 4, pp. 295-302, 2009 001 THE THREE-STATE PERFECT PHYLOGENY PROBLEM REDUCES TO 2-SAT DAN GUSFIELD AND YUFENG WU Abstract.

More information

V (v i + W i ) (v i + W i ) is path-connected and hence is connected.

V (v i + W i ) (v i + W i ) is path-connected and hence is connected. Math 396. Connectedness of hyperplane complements Note that the complement of a point in R is disconnected and the complement of a (translated) line in R 2 is disconnected. Quite generally, we claim that

More information

Tensors. Lek-Heng Lim. Statistics Department Retreat. October 27, Thanks: NSF DMS and DMS

Tensors. Lek-Heng Lim. Statistics Department Retreat. October 27, Thanks: NSF DMS and DMS Tensors Lek-Heng Lim Statistics Department Retreat October 27, 2012 Thanks: NSF DMS 1209136 and DMS 1057064 L.-H. Lim (Stat Retreat) Tensors October 27, 2012 1 / 20 tensors on one foot a tensor is a multilinear

More information

BIG4: Biosystematics, informatics and genomics of the big 4 insect groups- training tomorrow s researchers and entrepreneurs

BIG4: Biosystematics, informatics and genomics of the big 4 insect groups- training tomorrow s researchers and entrepreneurs BIG4: Biosystematics, informatics and genomics of the big 4 insect groups- training tomorrow s researchers and entrepreneurs Kick-Off Meeting 14-18 September 2015 Copenhagen, Denmark This project has received

More information