Phylogeny of Mixture Models

Daniel Štefankovič, Department of Computer Science, University of Rochester
Joint work with Eric Vigoda, College of Computing, Georgia Institute of Technology

Outline
- Introduction (phylogeny, molecular phylogeny)
- Mathematical models (CFN, JC, K2, K3)
- Maximum likelihood (ML) methods
- Our setting: mixtures of distributions
- ML, MCMC for ML fails for mixtures
- Duality theorem: tests/ambiguous mixtures
- Proofs (strictly separating hyperplanes, non-constructive ambiguous mixtures)

Phylogeny
Development of a group: the development over time of a species, genus, or group, as contrasted with the development of an individual (ontogeny).
Example taxa: orangutan, gorilla, chimpanzee, human.

Phylogeny: how?
- past: morphologic data (beak length, bones, etc.)
- present: molecular data (DNA, protein sequences)

Molecular phylogeny
INPUT: aligned DNA sequences
  Human:      ATCGGTAAGTACGTGCGAA
  Chimpanzee: TTCGGTAAGTAAGTGGGAT
  Gorilla:    TTAGGTCAGTAAGTGCGTT
  Orangutan:  TTGAGTCAGTAAGAGAGTT
OUTPUT: phylogenetic tree on the taxa orangutan, gorilla, chimpanzee, human

Example of a real phylogenetic tree
Universal phylogeny (Eucarya, Bacteria, Archaea) deduced from comparison of SSU and LSU rRNA sequences (2508 homologous sites) using Kimura's 2-parameter distance and the NJ method. The absence of a root in this tree is expressed using a circular design.
Source: Manolo Gouy, Introduction to Molecular Phylogeny

Dictionary
- Leaves = Taxa = {chimp, human, ...}
- Vertices = Nodes
- Edges = Branches
- Tree = Tree
- Unrooted/rooted trees (example taxa: orangutan, gorilla, chimpanzee, human)

Cavender-Farris-Neyman (CFN) model
Weight of an edge = probability that 0 and 1 get flipped.
[Figure: tree on the leaves orangutan, gorilla, chimpanzee, human with edge weights 0.15, 0.32, 0.22, 0.06, 0.12, 0.09]
Generating one character: the root gets a state (0 in the example). Moving down an edge, the state is flipped with probability equal to the edge weight. Below a node in state 0, the child across the edge of weight 0.32 is 1 with probability 0.32 and 0 with probability 0.68; the child across the edge of weight 0.15 is 1 with probability 0.15 and 0 with probability 0.85.
Repeating the process gives one column per character. After three characters the leaves orangutan, gorilla, chimpanzee, human carry 110.., 001.., 101.., 011..
Each character is a sample from a distribution on the leaf patterns 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, ...
Denote the distribution on leaves $\mu(T,w)$, where T = tree topology and w = set of weights on edges.
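The leaf distribution of the CFN model can be computed exactly on a small tree by summing over the hidden states. The sketch below is only an illustration: the tree shape, the assignment of the six weights to edges, and the uniform root state are assumptions, since the slide figure does not survive in the transcript.

```python
from itertools import product

# Hypothetical rooted tree: which weight sits on which edge is an assumption.
# Each entry maps child -> (parent, flip probability).
EDGES = {
    "x": ("root", 0.15),
    "orangutan": ("root", 0.32),
    "y": ("x", 0.06),
    "gorilla": ("x", 0.22),
    "chimpanzee": ("y", 0.12),
    "human": ("y", 0.09),
}
LEAVES = ("orangutan", "gorilla", "chimpanzee", "human")
INTERNAL = ("x", "y")


def leaf_distribution(edges=EDGES):
    """Exact CFN distribution over the 16 leaf patterns (uniform root assumed)."""
    dist = {}
    for leaf_bits in product((0, 1), repeat=len(LEAVES)):
        total = 0.0
        # Sum over all states of the root and the internal nodes.
        for root_bit in (0, 1):
            for inner_bits in product((0, 1), repeat=len(INTERNAL)):
                state = {"root": root_bit}
                state.update(zip(INTERNAL, inner_bits))
                state.update(zip(LEAVES, leaf_bits))
                prob = 0.5  # probability of the root state
                for child, (parent, w) in edges.items():
                    # Flip with probability w, keep the state with 1 - w.
                    prob *= w if state[child] != state[parent] else 1.0 - w
                total += prob
        dist["".join(map(str, leaf_bits))] = total
    return dist
```

Calling `leaf_distribution()` returns a dictionary keyed by patterns such as `"0110"`, in the leaf order orangutan, gorilla, chimpanzee, human. Because flips are symmetric, a pattern and its complement get the same probability.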

Generalization to more states
Weight of an edge = transition matrix (in place of the probability that 0 and 1 get flipped). Example over the states A, G, C, T:

        A     G     C     T
  A   0.90  0.05  0.03  0.02
  G   0.05  0.87  0.07  0.01
  C   0.03  0.07  0.89  0.01
  T   0.02  0.01  0.01  0.96

[Figure: tree on orangutan, gorilla, chimpanzee, human with example states at the nodes]

Models
Each model has a rate matrix R, and the transition matrix on an edge is exp(t·R).
- Jukes-Cantor (JC): there are 4 states
- Kimura's 2 parameter (K2): purine/pyrimidine mutations less likely
- Kimura's 3 parameter (K3): takes hydrogen bonds into account
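For the Jukes-Cantor model, exp(t·R) has a well-known closed form. The sketch below assumes a rate matrix R whose off-diagonal entries all equal a single rate `alpha` (so each diagonal entry is -3·alpha). The slides do not spell out R, so the parameter name and the scaling are assumptions.

```python
import math


def jc_transition_matrix(t, alpha=1.0):
    """Jukes-Cantor P(t) = exp(t*R) when every off-diagonal rate equals alpha."""
    decay = math.exp(-4.0 * alpha * t)
    same = 0.25 + 0.75 * decay   # probability of ending in the starting state
    diff = 0.25 - 0.25 * decay   # probability of each of the three other states
    return [[same if i == j else diff for j in range(4)] for i in range(4)]
```

At t = 0 this is the identity matrix, and as t grows every entry tends to 1/4. The example matrix shown for the four-state generalization is not of this form, since its off-diagonal entries differ.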

Reconstructing the tree?
Let D be samples from $\mu(T,w)$. Can we reconstruct T (and w)?
- parsimony
- distance based methods
- maximum likelihood methods (using MCMC)
- invariants?
Main obstacle for all methods: too many leaf-labeled trees, $(2n-3)!! = (2n-3)(2n-5)\cdots 3\cdot 1$.
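To get a feel for how quickly the count (2n-3)!! grows, it can be evaluated directly. This is a small sketch with a hypothetical function name. The count (2n-3)!! is the number of rooted binary leaf-labeled trees on n leaves.

```python
def count_leaf_labeled_trees(n):
    """Return (2n-3)!! = (2n-3)(2n-5)...3*1 for n >= 2 leaves."""
    if n < 2:
        raise ValueError("need at least two leaves")
    count = 1
    # Multiply the odd numbers 2n-3, 2n-5, ..., 3, 1.
    for factor in range(2 * n - 3, 0, -2):
        count *= factor
    return count
```

For 4 leaves this gives 15 trees and for 10 leaves 34459425, so exhaustive search over topologies is hopeless beyond very small n.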

Maximum likelihood method
Let D be samples from $\mu(T,w)$. The likelihood of a tree S is $L(S) = \max_w \Pr(D \mid S, w)$.
For $|D| \to \infty$ the maximum likelihood tree is T.

MCMC algorithms for max-likelihood
- Combinatorial steps: NNI moves (Nearest Neighbor Interchange)
- Numerical steps (i.e., changing the weights)
Move with probability $\min\{1, L(T_{new})/L(T_{old})\}$.

With only combinatorial steps (NNI moves): does this Markov chain mix rapidly? Not known!
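The acceptance rule min{1, L(T_new)/L(T_old)} is a standard Metropolis step. A minimal sketch follows, working with log-likelihoods to avoid underflow. The function name and the `rng` parameter are illustrative, and the proposal mechanism (NNI moves, weight changes) is left out.

```python
import math
import random


def metropolis_accept(loglik_new, loglik_old, rng=random):
    """Accept a proposed tree with probability min{1, L(T_new)/L(T_old)}."""
    if loglik_new >= loglik_old:
        return True  # the ratio is at least 1, so always move
    # Otherwise move with probability L(T_new)/L(T_old) = exp(difference of logs).
    return rng.random() < math.exp(loglik_new - loglik_old)
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible.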

Mixtures
One tree topology, multiple mixtures: the mutation rates differ for positions in DNA.
Can we reconstruct the tree T?
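A mixture in this setting can be read as a convex combination of leaf distributions that share the topology T but use different edge weights. The sketch below assumes distributions are given as dictionaries from leaf patterns to probabilities. The function name and the toy two-leaf numbers in the usage note are made up for illustration.

```python
def mixture(components):
    """Combine (proportion, distribution) pairs into one leaf-pattern distribution."""
    total = sum(proportion for proportion, _ in components)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("mixing proportions must sum to 1")
    mixed = {}
    for proportion, dist in components:
        for pattern, prob in dist.items():
            # Each site class contributes its pattern probability, scaled by its share.
            mixed[pattern] = mixed.get(pattern, 0.0) + proportion * prob
    return mixed
```

For example, mixing a slowly evolving class `{"00": 0.45, "01": 0.05, "10": 0.05, "11": 0.45}` and a faster one `{"00": 0.3, "01": 0.2, "10": 0.2, "11": 0.3}` in equal proportions gives probability 0.125 to the pattern `"01"`.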

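Reconstruction from mixtures - ML
Theorem 1: maximum likelihood fails for CFN, JC, K2, K3.
For all 0 < C < 1/2 and all x sufficiently small (C and x are parameters of the example mixture in the slide figure, which is not reproduced in this transcript):
(i) maximum likelihood tree ≠ true tree
(ii) 5-leaf version: MCMC is torpidly mixing
Similarly for the JC, K2, and K3 models.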

Reconstruction from mixtures - ML
Related results:
- [Kolaczkowski, Thornton] Nature, 2004. Experimental results for JC model.
- [Chang] Math. Biosci., 1996. Different example for CFN model.

Reconstruction from mixtures - ML
Proof:
- Difficulty: finding edge weights that maximize likelihood.
- For x=0, the trees are the same: a pure distribution, achievable on all topologies. So we know the max likelihood weights for every topology.
- Expected log-likelihood: $\mathrm{observed}^{\mathsf T} \log \mu(T,w)$. If observed comes from $\mu(S,v)$ then it is optimal to take T=S and w=v (basic property of log-likelihood).
- For x small, look at the Taylor expansion: bound the max likelihood in terms of the x=0 case and functions of the Jacobian and Hessian.
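The "basic property of log-likelihood" used in the proof is that the expected log-likelihood of data drawn from a distribution is maximized by that same distribution. A small numerical sketch, with hypothetical names and toy numbers:

```python
import math


def expected_log_likelihood(observed, model):
    """Return the sum over patterns s of observed[s] * log(model[s])."""
    return sum(p * math.log(model[s]) for s, p in observed.items() if p > 0)


# Toy distributions on four patterns (illustrative values only).
truth = {"00": 0.4, "01": 0.1, "10": 0.1, "11": 0.4}
other = {"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}

best = expected_log_likelihood(truth, truth)
worse = expected_log_likelihood(truth, other)
```

Here `best` is larger than `worse`, and the same holds for any `other` that differs from `truth`.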

Reconstruction: other algorithms?
GOAL: Determine tree topology.
Duality theorem: Every model has either:
A) ambiguous mixture distributions on 4-leaf trees (reconstruction impossible), or
B) linear tests (reconstruction easy).
The dimension of the space of possible linear tests: CFN = 2, JC = 2, K2 = 5, K3 = 9.

Ambiguity in CFN model
For all 0 < a, b < 1/2, there is c = c(a,b) where the mixture distribution on tree T (upper figure) is identical to the mixture distribution on tree S (lower figure). [Figures not reproduced in this transcript.]
Previously: non-constructive proof of nicer ambiguity in CFN model [Steel, Szekely, Hendy, 1996].

What about JC?
Reconstruction of the topology from a mixture is possible.
Linear test = linear function which is > 0 for mixtures from $T_2$ and < 0 for mixtures from $T_3$.
There exists a linear test for the JC model. This follows immediately from Lake's 1987 linear invariants.

Lake's invariants
Test: $f = \mu(AGCC) + \mu(ACAC) + \mu(AACT) + \mu(ACGT) - \mu(ACGC) - \mu(AACC) - \mu(ACAT) - \mu(AGCT)$
For $\mu = \mu(T_1,w)$: f = 0
For $\mu = \mu(T_2,w)$: f < 0
For $\mu = \mu(T_3,w)$: f > 0
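Because f is linear in the pattern probabilities, it can be evaluated directly on an empirical or exact leaf distribution. The sketch below copies the eight patterns and their signs from the slide. The function name and the dictionary representation are assumptions.

```python
# Patterns entering f with sign +1 and -1, as listed on the slide.
LAKE_PLUS = ("AGCC", "ACAC", "AACT", "ACGT")
LAKE_MINUS = ("ACGC", "AACC", "ACAT", "AGCT")


def lake_test(mu):
    """Evaluate f on a dict mapping 4-letter leaf patterns to probabilities."""
    plus = sum(mu.get(pattern, 0.0) for pattern in LAKE_PLUS)
    minus = sum(mu.get(pattern, 0.0) for pattern in LAKE_MINUS)
    return plus - minus
```

Since f is linear, its value on a mixture is the same weighted average of its values on the components, which is why a sign that holds for every pure distribution from a topology carries over to mixtures from that topology.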

Linear invariants v. tests
Linear invariant = hyperplane containing mixtures from $T_1$.
Test = hyperplane strictly separating mixtures from $T_2$ from mixtures from $T_3$.
[Figure: the pure distributions from $T_2$, the mixtures from $T_2$, the mixtures from $T_3$, and the test hyperplane between the two sets of mixtures]

Separating hyperplanes
Duality theorem (as stated earlier): every model has either A) ambiguous mixture distributions on 4-leaf trees (reconstruction impossible), or B) linear tests (reconstruction easy).
Separating hyperplane theorem: no ambiguous mixture ⇒ separating hyperplane.

Strictly separating hyperplanes???
A test needs strict separation. Does the separating hyperplane theorem give: no ambiguous mixture ⇒ strictly separating hyperplane?

Strictly separating not always possible
The convex set $\{(x,y) \mid x > 0\} \cup \{(0,y) \mid y > 0\}$ does not contain the point $\{(0,0)\}$, yet there is NO strictly separating hyperplane between the two.

When is strict separation possible?
The set above consists of the points $(x,\, y^2 - xz)$ with $x \ge 0$, $y > 0$, the image of a polynomial map that is not multi-linear.
Lemma: Sets which are convex hulls of images of open sets under a multi-linear polynomial map have a strictly separating hyperplane.
Standard phylogeny models satisfy the assumption.
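The failure of strict separation in the planar example can be checked in a few lines. The derivation below assumes that "strictly separating" means a linear functional that is strictly positive on the whole set (and zero at the origin). The coefficients s_1, s_2 are notation introduced here.

```latex
\begin{align*}
A &= \{(x,y) \mid x>0\} \cup \{(0,y) \mid y>0\}, \qquad (0,0)\notin A.\\
&\text{Suppose } s_1 x + s_2 y > 0 \text{ for all } (x,y)\in A.\\
&(1,y)\in A \text{ for all } y\in\mathbb{R}
  \;\Rightarrow\; s_1 + s_2 y > 0 \text{ for all } y
  \;\Rightarrow\; s_2 = 0.\\
&(0,1)\in A
  \;\Rightarrow\; s_1\cdot 0 + s_2\cdot 1 = 0 \not> 0,
  \text{ a contradiction.}
\end{align*}
```

The weak inequality does hold with $s = (1, 0)$, so the set is separated from the origin, just not strictly.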

Proof
Lemma: For sets which are convex hulls of images of open sets under a multi-linear polynomial map there exists a strictly separating hyperplane.
Proof:
- The map is $P_1(x_1,\dots,x_m),\dots,P_n(x_1,\dots,x_m)$ with $x=(x_1,\dots,x_m)\in O$, O open. WLOG $P_1,\dots,P_n$ are linearly independent.
- Have $s_1,\dots,s_n$ such that $s_1P_1(x)+\dots+s_nP_n(x)\ge 0$ for all $x\in O$.
- Goal: show $s_1P_1(x)+\dots+s_nP_n(x) > 0$ for all $x\in O$.
- Suppose $s_1P_1(a)+\dots+s_nP_n(a)=0$ for some $a\in O$. Translating coordinates keeps the map multi-linear, so take $a=0$: $s_1P_1(0)+\dots+s_nP_n(0)=0$.
- Let $R(x)=s_1P_1(x)+\dots+s_nP_n(x)$, a non-zero polynomial (by linear independence).
- $R(0)=0$ ⇒ no constant monomial.
- $R(0,\dots,0,x_i,0,\dots,0)\ge 0$ ⇒ no monomial $x_i$, since a linear term changes sign around 0.
- Repeating the argument for monomials in two, three, ... variables ⇒ no monomials at all, a contradiction.

Duality application: non-constructive proof of mixtures
Duality theorem: every model has either A) ambiguous mixture distributions on 4-leaf trees (reconstruction impossible), or B) linear tests (reconstruction easy).
For the K3 model the space of possible tests has dimension 9: $T = \sigma_1 T_1 + \dots + \sigma_9 T_9$.
Goal: show that there exists no test.

- Transition matrix $P(x) = \exp(xR)$, where R is the rate matrix.
- Entries in P are generalized polynomials: sums of terms $\mathrm{poly}(\alpha,\beta,\gamma,x)\,\exp(\mathrm{lin}(\alpha,\beta,\gamma,x))$.
- LEM: The set of roots of a non-zero generalized polynomial has measure 0.
- [Figure: tree whose edges carry the transition matrices P(0), P(x), P(2x), P(3x), P(4x)] On this family the test should be 0 by continuity.
- $T_1,\dots,T_9$ are generalized polynomials in $\alpha,\beta,\gamma,x$, so the Wronskian $\det W_x(T_1,\dots,T_9)$ is a generalized polynomial in $\alpha,\beta,\gamma,x$.
- $\det W_x(T_1,\dots,T_9) \ne 0$ and $W_x(T_1,\dots,T_9)\,[\sigma_1,\dots,\sigma_9] = 0$ ⇒ NO TEST!
- The last obstacle: showing that the Wronskian $W(T_1,\dots,T_9)$ is non-zero. The generalized polynomials are horrendous, even for e.g. $\alpha=1$, $\beta=2$, $\gamma=4$; plug in complex numbers.

Open questions
M a semigroup of doubly stochastic matrices (with multiplication). Under what conditions on M can you reconstruct the tree topology?

  Matrix form        Condition              Reconstruction possible?

  * x x x
  x * x x            0 < x < 1/4            yes
  x x * x
  x x x *

  * x
  x *                0 < x < 1/2            no

  * x y y
  x * y y            0 < y ≤ x < 1/4        yes
  y y * x
  y y x *

  * x y z
  x * z y            0 < z ≤ y ≤ x < 1/2    no
  y z * x
  z y x *

Open questions
Idealized setting: for data generated from a pure distribution (i.e., a single tree, no mixture):
- Are MCMC algorithms rapidly or torpidly mixing?
- How many characters (samples) are needed until the maximum likelihood tree is the true tree?