Algebraic Statistics Tutorial I


Seth Sullivant, North Carolina State University (NCSU)
June 9, 2012

Introduction to Algebraic Geometry

Let $\mathbb{R}[p] = \mathbb{R}[p_1, \ldots, p_m]$ be the set of all polynomials in indeterminates $p_1, \ldots, p_m$ with real coefficients.

Definition. Let $F \subseteq \mathbb{R}[p_1, \ldots, p_m]$. The variety defined by $F$ is the set
\[ V(F) := \{ a \in \mathbb{R}^m : f(a) = 0 \text{ for all } f \in F \}. \]

Example: $V(\{p_2 - p_1^2\})$ is the parabola $p_2 = p_1^2$. [Figure: plot of the variety.]

Definition. Given $S \subseteq \mathbb{R}^m$,
\[ I(S) := \{ f \in \mathbb{R}[p] : f(a) = 0 \text{ for all } a \in S \} \]
is the vanishing ideal of $S$.

Theorem (Hilbert's Basis Theorem). Every ideal $I \subseteq \mathbb{R}[p]$ has a finite generating set. That is, for each ideal $I$, there exists a finite set $F \subseteq I$ such that $\langle F \rangle = I$, where
\[ \langle F \rangle := \Big\{ \sum_i h_i f_i : h_i \in \mathbb{R}[p],\ f_i \in F \Big\}. \]

Example: Hardy-Weinberg Equilibrium

Suppose a gene has two alleles, $a$ and $A$. If allele $a$ occurs in the population with frequency $\theta$ (and $A$ with frequency $1 - \theta$) and these alleles are in Hardy-Weinberg equilibrium, the genotype frequencies are
\[ P(X = aa) = \theta^2, \quad P(X = aA) = 2\theta(1 - \theta), \quad P(X = AA) = (1 - \theta)^2. \]

The model of Hardy-Weinberg equilibrium is the set
\[ M = \big\{ (\theta^2,\ 2\theta(1 - \theta),\ (1 - \theta)^2) : \theta \in [0, 1] \big\} \subseteq \Delta_3, \]
where $\Delta_3$ is the probability simplex on the three genotypes. Its vanishing ideal is
\[ I(M) = \langle\, p_{aa} + p_{aA} + p_{AA} - 1,\ \ p_{aA}^2 - 4 p_{aa} p_{AA} \,\rangle. \]
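As a quick sanity check of the two generators of $I(M)$, the following sketch evaluates them at points of the Hardy-Weinberg curve and at one point off the curve. The function names `hardy_weinberg` and `invariants` are illustrative and not part of the tutorial; exact rational arithmetic is used so that "vanishes" means exactly zero.

```python
from fractions import Fraction


def hardy_weinberg(theta):
    """Genotype frequencies (p_aa, p_aA, p_AA) for allele frequency theta."""
    return (theta**2, 2 * theta * (1 - theta), (1 - theta) ** 2)


def invariants(p_aa, p_aA, p_AA):
    """Evaluate the two generators of I(M) at a point of R^3."""
    return (p_aa + p_aA + p_AA - 1, p_aA**2 - 4 * p_aa * p_AA)


# Points on the model: theta = 0, 1/10, ..., 1.
on_model = [invariants(*hardy_weinberg(Fraction(k, 10))) for k in range(11)]

# The uniform distribution lies in the simplex but not on the model,
# so the quadratic generator does not vanish there.
off_model = invariants(Fraction(1, 3), Fraction(1, 3), Fraction(1, 3))

print(all(v == (0, 0) for v in on_model))
print(off_model)
```

Both generators vanish at every sampled model point, while the uniform distribution satisfies only the linear one.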

Main Point of This Tutorial

Many statistical models are described by (semi-)algebraic constraints on a natural parameter space. Generators of the vanishing ideal can be useful for constructing algorithms or analyzing properties of statistical models.

Two Examples
- Phylogenetic Algebraic Geometry
- Sampling Contingency Tables

Phylogenetics

Problem. Given a collection of species, find the tree that explains their history.

Data consists of aligned DNA sequences from homologous genes:

H:   ...ACCGTGCAACGTGAACGA...
Chimp:   ...ACCTTGGAAGGTAAACGA...
Gorilla: ...ACCGTGCAACGTAAACTA...

Model-Based Phylogenetics

- Use a probabilistic model of mutations.
- Parameters for the model are the combinatorial tree $T$ and rate parameters for mutations on each edge.
- Models give a probability for observing a particular aligned collection of DNA sequences.

H:   ACCGTGCAACGTGAACGA
Chimp:   ACGTTGCAAGGTAAACGA
Gorilla: ACCGTGCAACGTAAACTA

- Assuming site independence, the data is summarized by the empirical distribution of columns in the alignment, e.g. $\hat{p}(AAA) = 6/18$, $\hat{p}(CGC) = 2/18$, etc.
- Use the empirical distribution and a test statistic to find the tree best explaining the data.
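The empirical distribution of alignment columns can be computed directly from the three sequences on this slide. The sketch below is a minimal illustration; the helper name `empirical_distribution` is an assumption made for this example.

```python
from collections import Counter
from fractions import Fraction

# The three aligned sequences from the slide (18 sites each).
alignment = {
    "Human": "ACCGTGCAACGTGAACGA",
    "Chimp": "ACGTTGCAAGGTAAACGA",
    "Gorilla": "ACCGTGCAACGTAAACTA",
}


def empirical_distribution(seqs):
    """Map each column pattern to its relative frequency in the alignment."""
    columns = ["".join(col) for col in zip(*seqs)]
    counts = Counter(columns)
    n_sites = len(columns)
    return {pattern: Fraction(c, n_sites) for pattern, c in counts.items()}


p_hat = empirical_distribution(list(alignment.values()))
print(p_hat["AAA"], p_hat["CGC"])
```

The pattern `AAA` occurs in 6 of the 18 columns and `CGC` in 2 of them, which agrees with the values quoted on the slide (the fractions are printed in lowest terms).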

Phylogenetic Models

Assuming site independence:
- A phylogenetic model is a latent class graphical model.
- Each vertex $v \in T$ gives a random variable $X_v \in \{A, C, G, T\}$.
- All random variables corresponding to internal nodes are latent.

[Figure: tree with leaves $X_1, X_2, X_3$ and internal nodes $Y_1, Y_2$; $X_1$ and $Y_2$ attach to $Y_1$, and $X_2, X_3$ attach to $Y_2$.]

\[ P(x_1, x_2, x_3) = \sum_{y_1} \sum_{y_2} P(y_1)\, P(y_2 \mid y_1)\, P(x_1 \mid y_1)\, P(x_2 \mid y_2)\, P(x_3 \mid y_2) \]

Phylogenetic Models (in coordinates)

For the same three-leaf tree with leaves $X_1, X_2, X_3$ and latent internal nodes $Y_1, Y_2$, the joint distribution of the leaves is

\[ p_{i_1 i_2 i_3} = \sum_{j_1} \sum_{j_2} \pi_{j_1}\, a_{j_2, j_1}\, b_{i_1, j_1}\, c_{i_2, j_2}\, d_{i_3, j_2} \]

Algebraic Perspective on Phylogenetic Models

- Once we fix a tree $T$ and model structure, we get a map $\phi^T : \Theta \to \mathbb{R}^{4^n}$.
- $\Theta \subseteq \mathbb{R}^d$ is a parameter space of numerical parameters (transition matrices associated to each edge).
- The map $\phi^T$ is given by polynomial functions of the parameters.
- For each $i_1 \cdots i_n \in \{A, C, G, T\}^n$, $\phi^T_{i_1 \cdots i_n}(\theta)$ gives the probability of the column $(i_1, \ldots, i_n)$ in the alignment for the particular parameter choice $\theta$.

\[ \phi^T_{i_1 i_2 i_3}(\pi, a, b, c, d) = \sum_{j_1} \sum_{j_2} \pi_{j_1}\, a_{j_2, j_1}\, b_{i_1, j_1}\, c_{i_2, j_2}\, d_{i_3, j_2} \]

The phylogenetic model is the set $M_T = \phi^T(\Theta) \subseteq \mathbb{R}^{4^n}$.
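To make the polynomial map concrete, here is a minimal sketch that evaluates $\phi^T$ for the three-leaf tree at one parameter point. The parameter values are made up for illustration, and the helper `jc_matrix` (a matrix with equal off-diagonal entries) is an assumption chosen only because it is an easy way to write down a valid transition matrix; matrices are indexed as `M[child][parent]`, matching the subscript order $a_{j_2, j_1}$.

```python
from fractions import Fraction
from itertools import product

STATES = "ACGT"


def jc_matrix(x):
    """4x4 transition matrix with off-diagonal entries x; columns sum to 1."""
    return [[1 - 3 * x if child == parent else x for parent in range(4)]
            for child in range(4)]


def phi(pi, a, b, c, d):
    """Evaluate all 64 coordinates p_{i1 i2 i3} of the three-leaf model."""
    p = {}
    for i1, i2, i3 in product(range(4), repeat=3):
        p[STATES[i1] + STATES[i2] + STATES[i3]] = sum(
            pi[j1] * a[j2][j1] * b[i1][j1] * c[i2][j2] * d[i3][j2]
            for j1 in range(4)
            for j2 in range(4)
        )
    return p


# Hypothetical parameter point: a root distribution and four edge matrices.
pi = [Fraction(k, 10) for k in (1, 2, 3, 4)]
a, b, c, d = (jc_matrix(Fraction(1, n)) for n in (10, 20, 30, 40))

p = phi(pi, a, b, c, d)
print(len(p), sum(p.values()))
```

Because every coordinate is a polynomial in the entries of $\pi, a, b, c, d$, the image point is a probability distribution on the $4^3 = 64$ column patterns whenever the parameters are stochastic.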

Phylogenetic Varieties and Phylogenetic Invariants

Let $\mathbb{R}[p] := \mathbb{R}[p_{i_1 \cdots i_n} : i_1 \cdots i_n \in \{A, C, G, T\}^n]$.

Definition. Let
\[ I_T := \langle f \in \mathbb{R}[p] : f(p) = 0 \text{ for all } p \in M_T \rangle \subseteq \mathbb{R}[p]. \]
$I_T$ is the ideal of phylogenetic invariants of $T$.

Let
\[ V_T := \{ p \in \mathbb{R}^{4^n} : f(p) = 0 \text{ for all } f \in I_T \}. \]
$V_T$ is the phylogenetic variety of $T$.

Note that $M_T \subseteq V_T$. Since $M_T$ is the image of a polynomial map, $\dim M_T = \dim V_T$.

[Figure: four-leaf tree with leaves $X_1, X_2, X_3, X_4$.]

\[ p_{lmno} = \sum_{i=1}^{4} \sum_{j=1}^{4} \sum_{k=1}^{4} \pi_i\, a_{ij}\, b_{ik}\, c_{jl}\, d_{jm}\, e_{kn}\, f_{ko} \]
\[ \phantom{p_{lmno}} = \sum_{i=1}^{4} \pi_i \Big( \sum_{j=1}^{4} a_{ij}\, c_{jl}\, d_{jm} \Big) \Big( \sum_{k=1}^{4} b_{ik}\, e_{kn}\, f_{ko} \Big) \]

\[ \Longrightarrow \quad \operatorname{rank} \begin{pmatrix} p_{1111} & p_{1112} & \cdots & p_{1144} \\ p_{1211} & p_{1212} & \cdots & p_{1244} \\ \vdots & \vdots & \ddots & \vdots \\ p_{4411} & p_{4412} & \cdots & p_{4444} \end{pmatrix} \le 4 \]

Splits and Phylogenetic Invariants

Definition. A split of a set is a bipartition $A|B$. A split $A|B$ of the leaves of a tree $T$ is valid for $T$ if the induced trees $T_A$ and $T_B$ do not intersect.

[Figure: four-leaf tree with leaves $X_1, X_2, X_3, X_4$.]

Valid: $12|34$
Not valid: $13|24$

2-way Flattenings and Matrix Ranks

Proposition. Let $P \in M_T$, with $p_{ijkl} = P(X_1 = i, X_2 = j, X_3 = k, X_4 = l)$, and

\[ \mathrm{Flat}_{12|34}(P) = \begin{pmatrix} p_{AAAA} & p_{AAAC} & p_{AAAG} & \cdots & p_{AATT} \\ p_{ACAA} & p_{ACAC} & p_{ACAG} & \cdots & p_{ACTT} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ p_{TTAA} & p_{TTAC} & p_{TTAG} & \cdots & p_{TTTT} \end{pmatrix}. \]

- If $A|B$ is a valid split for $T$, then $\operatorname{rank}(\mathrm{Flat}_{A|B}(P)) \le 4$. The $5 \times 5$ subdeterminants of $\mathrm{Flat}_{A|B}(P)$ are therefore invariants in $I_T$.
- If $C|D$ is not a valid split for $T$, then generically $\operatorname{rank}(\mathrm{Flat}_{C|D}(P)) > 4$.
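The rank statement can be checked numerically on a small instance. The sketch below makes two simplifying assumptions that are not on the slide: it uses a 2-state model instead of the 4-state DNA model (so the bound for a valid split becomes 2, and the flattenings are $4 \times 4$), and the parameter values are made up. It uses the four-leaf parametrization $p_{lmno} = \sum \pi_i a_{ij} b_{ik} c_{jl} d_{jm} e_{kn} f_{ko}$, and computes ranks exactly over the rationals.

```python
from fractions import Fraction as F
from itertools import product


def rank(matrix):
    """Exact rank by Gaussian elimination over the rationals."""
    rows = [list(r) for r in matrix]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            factor = rows[r][col] / rows[rk][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def stochastic(x, y):
    """2x2 row-stochastic matrix; invertible whenever x + y != 1."""
    return [[1 - x, x], [y, 1 - y]]


# Hypothetical parameters: root distribution and six edge matrices.
pi = [F(2, 5), F(3, 5)]
a, b = stochastic(F(1, 10), F(1, 5)), stochastic(F(1, 4), F(1, 10))
c, d = stochastic(F(1, 5), F(3, 10)), stochastic(F(1, 10), F(1, 4))
e, f = stochastic(F(3, 10), F(1, 5)), stochastic(F(1, 5), F(1, 10))


def p(l, m, n, o):
    """Joint probability of leaf states (l, m, n, o) on the tree 12|34."""
    return sum(pi[i] * a[i][j] * b[i][k] * c[j][l] * d[j][m] * e[k][n] * f[k][o]
               for i, j, k in product(range(2), repeat=3))


pairs = list(product(range(2), repeat=2))
flat_12_34 = [[p(l, m, n, o) for (n, o) in pairs] for (l, m) in pairs]
flat_13_24 = [[p(l, m, n, o) for (m, o) in pairs] for (l, n) in pairs]

print(rank(flat_12_34), rank(flat_13_24))
```

For these parameters the flattening along the valid split $12|34$ has rank 2 (the number of states), while the flattening along the invalid split $13|24$ has full rank 4, which is the 2-state analogue of the dichotomy in the proposition.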

Phylogenetic Algebraic Geometry

Phylogenetic Algebraic Geometry is the study of the phylogenetic varieties and ideals $V_T$ and $I_T$.

- Using Phylogenetic Invariants to Reconstruct Trees
- Identifiability of Phylogenetic Models
- Interesting Math Useful in Other Problems

Using Phylogenetic Invariants to Reconstruct Trees

Definition. A phylogenetic invariant $f \in I_T$ is phylogenetically informative if there is some other tree $T'$ such that $f \notin I_{T'}$.

Idea of Cavender-Felsenstein (1987), Lake (1987): Evaluate phylogenetically informative phylogenetic invariants at the empirical distribution $\hat{p}$ to reconstruct phylogenetic trees.

Proposition. For each $n$-leaf trivalent tree $T$, let $F_T \subseteq I_T$ be a set of phylogenetic invariants such that, for each $T' \ne T$, there is an $f \in F_T$ such that $f \notin I_{T'}$. Let $f_T := \sum_{f \in F_T} |f|$. Then for generic $p \in \bigcup_{T'} M_{T'}$, $f_T(p) = 0$ if and only if $p \in M_T$.

Performance of Invariants Methods in Simulations

- Huelsenbeck (1995) did a systematic simulation comparison of 26 different methods of constructing a phylogenetic tree on 4-leaf trees.
- Invariant-based methods did poorly.
- HOWEVER... Huelsenbeck only used linear invariants.
- Casanellas, Fernandez-Sanchez (2006) redid these simulations using a generating set of the phylogenetic ideal $I_T$.
- Phylogenetic invariants become comparable to other methods.
- For the particular model studied in Casanellas, Fernandez-Sanchez (2006) for a tree with 4 leaves, the ideal $I_T$ has 8002 generators.
- $f_T := \sum_{f \in F_T} |f|$ is a sum of 8002 terms.
- Major work is needed to overcome the combinatorial explosion for larger trees.

Identifiability of Phylogenetic Models

Definition. A parametric statistical model is identifiable if the map from parameters to probability distributions is one-to-one.

- Is it possible to infer the parameters of the model from data?
- Identifiability guarantees consistency of statistical methods (ML).
- Two types of parameters to consider for phylogenetic models:
  - Numerical parameters (transition matrices)
  - Tree parameter (combinatorial type of tree)

Geometric Perspective on Identifiability

Definition. The unrooted tree parameter $T$ in a phylogenetic model is identifiable if for all $p \in M_T$ there does not exist another $T' \ne T$ such that $p \in M_{T'}$.

[Figure: two panels showing model sets $M_1, M_2, M_3$, labeled "Identifiable" and "Not Identifiable".]

Generic Identifiability

Definition. The tree parameter in a phylogenetic model is generically identifiable if for all $n$-leaf trees with $T \ne T'$,
\[ \dim(M_T \cap M_{T'}) < \min(\dim(M_T), \dim(M_{T'})). \]

[Figure: model sets $M_1, M_2, M_3$.]

Proving Identifiability with Algebraic Geometry

Proposition. Let $M_0$ and $M_1$ be two algebraic models. If there exist phylogenetically informative invariants $f_0$ and $f_1$ such that $f_i(p) = 0$ for all $p \in M_i$, and $f_i(q) \ne 0$ for some $q \in M_{1-i}$, then
\[ \dim(M_0 \cap M_1) < \min(\dim M_0, \dim M_1). \]

[Figure: models $M_0$ and $M_1$ with the hypersurface $f = 0$.]

Phylogenetic Models are Identifiable

Theorem. The unrooted tree parameter of phylogenetic models is generically identifiable.

Proof. Edge flattening invariants can detect which splits are implied by a specific distribution in $M_T$. The splits in $T$ uniquely determine $T$.

Phylogenetic Mixture Models

- The basic phylogenetic model assumes the same parameters at every site.
- This assumption is not accurate within a single gene:
  - Some sites are more important and unlikely to change.
  - Tree structure may vary across genes.
- This leads to mixture models for different classes of sites.
- $M(T, r)$ denotes a same-tree mixture model with underlying tree $T$ and $r$ classes of sites.

Identifiability Questions for Mixture Models

Question. For a fixed number of trees $k$, are the tree parameters $T_1, \ldots, T_k$ and the rate parameters of each tree (generically) identified in phylogenetic mixture models?

- $k = 1$ (ordinary phylogenetic models): Most models are identifiable on 2, 3, 4 leaves. (Rogers, Chang, Steel, Hendy, Penny, Székely, Allman, Rhodes, Housworth, ...)
- $k > 1$, $T_1 = T_2 = \cdots = T_k$, but no restriction on the number of trees: Not identifiable (Matsen-Steel, Stefankovic-Vigoda)
- $k > 1$, $T_i$ arbitrary: Not identifiable (Mossel-Vigoda)

Theorem (Rhodes-Sullivant 2011). The unrooted tree and numerical parameters in an $r$-class, same-tree phylogenetic mixture model on $n$-leaf trivalent trees are generically identifiable if $r < 4^{n/4}$.

Proof Ideas.
- Phylogenetic invariants from flattenings
- Tensor rank (Kruskal's Theorem) [Allman-Matias-Rhodes 2009]
- Elementary tree combinatorics
- Solving tree and numerical parameter identifiability at the same time

How to Construct Phylogenetic Invariants?

Theorem (Sturmfels-S, Allman-Rhodes, Casanellas-S, Draisma-Kuttler). Consider a "nice" algebraic phylogenetic model. The problem of computing phylogenetic invariants for any tree $T$ can be reduced to the same problem for star trees $K_{1,k}$.

- The ideal $I_T$ is generated by local contributions from each $K_{1,k}$, plus flattening invariants from edges.
- The varieties $V_{K_{1,k}}$ are interesting classical algebraic varieties:
  - toric varieties
  - secant varieties $\mathrm{Sec}^4(\mathbb{P}^3 \times \mathbb{P}^3 \times \mathbb{P}^3)$

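Group-based models

\[ \begin{pmatrix} \alpha & \beta \\ \beta & \alpha \end{pmatrix} \qquad \begin{pmatrix} \alpha & \beta & \beta & \beta \\ \beta & \alpha & \beta & \beta \\ \beta & \beta & \alpha & \beta \\ \beta & \beta & \beta & \alpha \end{pmatrix} \qquad \begin{pmatrix} \alpha & \beta & \gamma & \gamma \\ \beta & \alpha & \gamma & \gamma \\ \gamma & \gamma & \alpha & \beta \\ \gamma & \gamma & \beta & \alpha \end{pmatrix} \qquad \begin{pmatrix} \alpha & \beta & \gamma & \delta \\ \beta & \alpha & \delta & \gamma \\ \gamma & \delta & \alpha & \beta \\ \delta & \gamma & \beta & \alpha \end{pmatrix} \]

- Random variables take values in a finite abelian group $G$.
- Transition probabilities satisfy $\mathrm{Prob}(X = g \mid Y = h) = f(g + h)$.
- This means that the formula for $\mathrm{Prob}(X_1 = g_1, \ldots, X_n = g_n)$ is a convolution (over $G^n$).
- Apply the discrete Fourier transform to turn the convolution into a product.

Theorem (Hendy-Penny 1993, Evans-Speed 1993). In the Fourier coordinates, a group-based model is parametrized by monomial functions in terms of the Fourier parameters. In particular, the CFN model is a toric variety.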

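Equations for the CFN Model

Theorem (Sturmfels-S 2005). For any tree $T$, the toric ideal $I_T$ for the CFN model is generated by degree 2 determinantal equations.

[Figure: four-leaf tree with leaves $X_1, X_2, X_3, X_4$.]

Fourier coordinates:
\[ q_{lmno} = \sum_{r,s,t,u \in \{0,1\}} (-1)^{rl + sm + tn + uo}\, p_{rstu} \]

$I_T$ is generated by the $2 \times 2$ minors of:
\[ \begin{pmatrix} q_{0000} & q_{0001} & q_{0010} & q_{0011} \\ q_{1100} & q_{1101} & q_{1110} & q_{1111} \end{pmatrix} \qquad \begin{pmatrix} q_{0100} & q_{0101} & q_{0110} & q_{0111} \\ q_{1000} & q_{1001} & q_{1010} & q_{1011} \end{pmatrix} \]
\[ \begin{pmatrix} q_{0000} & q_{0011} \\ q_{0100} & q_{0111} \\ q_{1000} & q_{1011} \\ q_{1100} & q_{1111} \end{pmatrix} \qquad \begin{pmatrix} q_{0001} & q_{0010} \\ q_{0101} & q_{0110} \\ q_{1001} & q_{1010} \\ q_{1101} & q_{1110} \end{pmatrix} \]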
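The determinantal equations can be verified on a concrete parameter point. The sketch below assumes the four-leaf parametrization $p_{lmno} = \sum \pi_i a_{ij} b_{ik} c_{jl} d_{jm} e_{kn} f_{ko}$ with an arbitrary root distribution and symmetric $2 \times 2$ (CFN) transition matrices; the flip probabilities and helper names are made up for illustration. It computes the Fourier coordinates with the formula above and checks that every $2 \times 2$ minor of the four matrices vanishes exactly.

```python
from fractions import Fraction as F
from itertools import combinations, product


def cfn(x):
    """Symmetric 2x2 transition matrix with flip probability x."""
    return [[1 - x, x], [x, 1 - x]]


# Hypothetical parameters: root distribution and six edge matrices.
pi = [F(2, 5), F(3, 5)]
a, b, c, d, e, f = (cfn(F(1, n)) for n in (5, 6, 7, 8, 9, 10))

bits = list(product(range(2), repeat=4))


def prob(l, m, n, o):
    """Joint probability of leaf states on the tree with split 12|34."""
    return sum(pi[i] * a[i][j] * b[i][k] * c[j][l] * d[j][m] * e[k][n] * f[k][o]
               for i, j, k in product(range(2), repeat=3))


p = {x: prob(*x) for x in bits}

# Discrete Fourier transform over (Z/2)^4.
q = {
    (l, m, n, o): sum((-1) ** (r * l + s * m + t * n + u * o) * p[(r, s, t, u)]
                      for (r, s, t, u) in bits)
    for (l, m, n, o) in bits
}


def block(rows, cols):
    """Matrix of Fourier coordinates q_{row + col}."""
    return [[q[row + col] for col in cols] for row in rows]


def minors_2x2(matrix):
    """Yield all 2x2 minors of a matrix."""
    for r1, r2 in combinations(range(len(matrix)), 2):
        for c1, c2 in combinations(range(len(matrix[0])), 2):
            yield matrix[r1][c1] * matrix[r2][c2] - matrix[r1][c2] * matrix[r2][c1]


all_pairs = list(product(range(2), repeat=2))
matrices = [
    block([(0, 0), (1, 1)], all_pairs),
    block([(0, 1), (1, 0)], all_pairs),
    block(all_pairs, [(0, 0), (1, 1)]),
    block(all_pairs, [(0, 1), (1, 0)]),
]

print(all(minor == 0 for mat in matrices for minor in minors_2x2(mat)))
```

The check works because, in Fourier coordinates, each $q_{lmno}$ factors as a product of one parameter per edge, so each of the four matrices has rank one on the model.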

Gluing Two Trees at a Leaf

Let $T = T_1 \# T_2$ be the tree obtained by joining two trees at a leaf. Each ring $\mathbb{C}[p]/I_{T_1}$, $\mathbb{C}[p]/I_{T_2}$ is invariant under the action of the group $G = \mathrm{Gl}_r(\mathbb{C})^k$ acting on the glue leaves.

Theorem (Draisma-Kuttler).
\[ \mathbb{C}[p]/I_T = \big( \mathbb{C}[p]/I_{T_1} \otimes_{\mathbb{C}} \mathbb{C}[p]/I_{T_2} \big)^G \]
\[ V_T = (V_{T_1} \times V_{T_2}) /\!/ G \quad \text{(GIT quotient)} \]

- Actions of the individual factors ($\mathrm{Gl}_r(\mathbb{C})$) do not interact.
- Use the Reynolds operator and the first fundamental theorem of classical invariant theory (CIT).

Gluing more complex graphs

[Figure: two examples of graphs glued along shared vertices.]

- Still a group action ($\mathrm{Gl}_r(\mathbb{C})^k$). But the factors are not acting independently.
- $\mathbb{C}[p]/I_G = \big( \mathbb{C}[p]/I_{G_1} \otimes_{\mathbb{C}} \mathbb{C}[p]/I_{G_2} \big)^G$
- $\mathbb{C}[p]/I_G$ is generated by the degree 1 part of $\big( \mathbb{C}[p]/I_{G_1} \otimes_{\mathbb{C}} \mathbb{C}[p]/I_{G_2} \big)^G$ (toric fiber product if $r = 1$).

Theorem (Engström-Kahle-S). Can determine generators of $I_G$ from $I_{G_1}$ and $I_{G_2}$ if the TFP has low codimension.

Useful for other problems in algebraic statistics.

Summary: Phylogenetic Algebraic Geometry

- Phylogenetic models are fundamentally algebraic-geometric objects.
- The algebraic perspective is useful for:
  - Developing new construction algorithms
  - Proving theorems about identifiability (currently best available for mixture models)
- Leads to interesting new mathematics, useful for other problems.
- Long way to go: Your Help Needed!

Problems

Theorem (Allman-Rhodes 2006). Let $T$ be a trivalent tree with $n$ leaves, and consider the general Markov model on binary characters. The ideal has generating set
\[ \bigcup_{A|B \in \Sigma(T)} \{ 3 \times 3 \text{ minors of } \mathrm{Flat}_{A|B}(P) \}, \]
where $\Sigma(T)$ is the set of all valid splits on $T$. Note that $P$ is a $2 \times 2 \times \cdots \times 2$, $n$-way tensor.

Problem.
1. Take your favorite 4, 5 and 6 leaf trees and write down all the matrices $\mathrm{Flat}_{A|B}(P)$ that are needed in the previous theorem.
2. Let $S_r = \bigcup_{A|B \in \Sigma(T)} \{ r \times r \text{ minors of } \mathrm{Flat}_{A|B}(P) \}$. What is $V(S_r) \subseteq \mathbb{C}^{2^n}$?

References

E. Allman, C. Matias, J. Rhodes. Identifiability of parameters in latent structure models with many observed variables. Annals of Statistics 37 no. 6A (2009) 3099-3132.
M. Casanellas, J. Fernandez-Sanchez. Performance of a new invariants method on homogeneous and non-homogeneous quartet trees. Molecular Biology and Evolution 24(1):288-293, 2006.
J. Cavender, J. Felsenstein. Invariants of phylogenies: a simple case with discrete states. Journal of Classification 4 (1987) 57-71.
J. Draisma, J. Kuttler. On the ideals of equivariant tree models. Mathematische Annalen 344(3):619-644, 2009.
A. Engström, T. Kahle, S. Sullivant. Multigraded commutative algebra of graph decompositions. arXiv:1102.2601.
S. Evans, T. Speed. Invariants of some probability models used in phylogenetic inference. Annals of Statistics 21 (1993) 355-377.
M. Hendy, D. Penny. Spectral analysis of phylogenetic data. Journal of Classification 10 (1993) 5-24.
J. Huelsenbeck. Performance of phylogenetic methods in simulation. Systematic Biology 42 no. 1 (1995) 17-48.
J. Lake. A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony. Molecular Biology and Evolution 4 (1987) 167-191.
F. A. Matsen, M. Steel. Phylogenetic mixtures on a single tree can mimic a tree of another topology. Systematic Biology, 2007.
E. Mossel, E. Vigoda. Phylogenetic MCMC Are Misleading on Mixtures of Trees. Science 309 (2005) 2207-2209.
J. Rhodes, S. Sullivant. Identifiability of large phylogenetic mixture models. To appear, Bulletin of Mathematical Biology, 2011. arXiv:1011.4134.
D. Stefankovic, E. Vigoda. Pitfalls of Heterogeneous Processes for Phylogenetic Reconstruction. Systematic Biology 56(1):113-124, 2007.
B. Sturmfels, S. Sullivant. Toric ideals of phylogenetic invariants. Journal of Computational Biology 12 (2005) 204-228.