Algebraic Statistics Tutorial I
|
|
- Basil Marshall
- 5 years ago
- Views:
Transcription
1 Algebraic Statistics Tutorial I Seth Sullivant North Carolina State University June 9, 2012 Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
2 Introduction to Algebraic Geometry Let R[p] = R[p 1,..., p m ] be the set of all polynomials in indeterminates p 1,..., p m with real coefficients. Definition Let F R[p 1,..., p m ]. The variety defined by F is the set V (F) := {a R m f (a) = 0 for all f F}. V ({p 2 p 2 1 }) = Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
3 Definition Given S R m is the vanishing ideal of S. I(S) := {f R[p] f (a) = 0 for all a S} Theorem (Hilbert s Basis Theorem) Every ideal I R[p] has a finite generating set. That is, for each ideal I, there exists a finite set F I such that F = I, where { } F := hi f i h i R[p], f i F. Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
4 Example: Hardy-Weinberg Equilibrium Suppose a gene has two alleles, a and A. If allele a occurs in the population with frequency θ (and A with frequency 1 θ) and these alleles are in Hardy-Weinberg equilibrium, the genotype frequencies are P(X = aa) = θ 2, P(X = aa) = 2θ(1 θ), P(X = AA) = (1 θ) 2 The model of Hardy-Weinberg equilibrium is the set M = {( θ 2, 2θ(1 θ), (1 θ) 2) } θ [0, 1] 3 I(M) = p aa +p aa +p AA 1, p 2 aa 4p aap AA Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
5 Main Point of This Tutorial Many statistical models are described by (semi)-algebraic constraints on a natural parameter space. Generators of the vanishing ideal can be useful for constructing algorithms or analyzing properties of statistical model. Two Examples Phylogenetic Algebraic Geometry Sampling Contingency Tables Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
6 Phylogenetics Problem Given a collection of species, find the tree that explains their history. Data consists of aligned DNA sequences from homologous genes Human: Chimp: Gorilla: Seth Sullivant (NCSU)...ACCGTGCAACGTGAACGA......ACCTTGGAAGGTAAACGA......ACCGTGCAACGTAAACTA... Algebraic Statistics June 9, / 34
7 Model-Based Phylogenetics Use a probabilistic model of mutations Parameters for the model are the combinatorial tree T, and rate parameters for mutations on each edge Models give a probability for observing a particular aligned collection of DNA sequences Human: Chimp: Gorilla: ACCGTGCAACGTGAACGA ACGTTGCAAGGTAAACGA ACCGTGCAACGTAAACTA Assuming site independence, data is summarized by empirical distribution of columns in the alignment. e.g. ˆp(AAA) = 6 18, ˆp(CGC) = 2 18, etc. Use empirical distribution and test statistic to find tree best explaining data Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
8 Phylogenetic Models Assuming site independence: Phylogenetic Model is a latent class graphical model Vertex v T gives a random variable X v {A, C, G, T} All random variables corresponding to internal nodes are latent X 1 X 2 X 3 Y 2 Y 1 P(x 1, x 2, x 3 ) = y 1 y 2 P(y 1 )P(y 2 y 1 )P(x 1 y 1 )P(x 2 y 2 )P(x 3 y 2 ) Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
9 Phylogenetic Models Assuming site independence: Phylogenetic Model is a latent class graphical model Vertex v T gives a random variable X v {A, C, G, T} All random variables corresponding to internal nodes are latent X 1 X 2 X 3 Y 2 Y 1 p i1 i 2 i 3 = j 1 j 2 π j1 a j2,j 1 b i1,j 1 c i2,j 2 d i3,j 2 Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
10 Algebraic Perspective on Phylogenetic Models Once we fix a tree T and model structure, we get a map φ T : Θ R 4n. Θ R d is a parameter space of numerical parameters (transition matrices associated to each edge). The map φ T is given by polynomial functions of the parameters. For each i 1 i n {A, C, G, T } n, φ T i 1 i n (θ) gives the probability of the column (i 1,..., i n ) in the alignment for the particular parameter choice θ. φ T i 1 i 2 i 3 (π, a, b, c, d) = π j1 a j2,j 1 b i1,j 1 c i2,j 2 d i3,j 2 j 1 j 2 The phylogenetic model is the set M T = φ T (Θ) R 4n. Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
11 Phylogenetic Varieties and Phylogenetic Invariants Let R[p] := R[p i1 i n : i 1 i n {A, C, G, T } n ] Definition Let I T := f R[p] : f (p) = 0 for all p M T R[p]. I T is the ideal of phylogenetic invariants of T. Let V T := {p R 4n : f (p) = 0 for all f I T }. V T is the phylogenetic variety of T. Note that M T V T. Since M T is image of a polynomial map dim M T = dim V T. Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
12 X 1 X X X p lmno = π i a ij b ik c jl d jm e kn f ko i=1 j=1 k=1 Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
13 X 1 X X X p lmno = π i a ij b ik c jl d jm e kn f ko i=1 j=1 k=1 4 a ij c jl d jm j=1 Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
14 X 1 X X X p lmno = π i a ij b ik c jl d jm e kn f ko i=1 j=1 k=1 4 4 a ij c jl d jm b ik e kn f ko j=1 k =1 Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
15 X 1 X X X p lmno = = π i a ij b ik c jl d jm e kn f ko i=1 j=1 k= a ij c jl d jm b ik e kn f ko π i i=1 j=1 k =1 Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
16 X 1 X X X p lmno = = π i a ij b ik c jl d jm e kn f ko i=1 j=1 k= a ij c jl d jm b ik e kn f ko π i i=1 j=1 k =1 p 1111 p 1112 p 1144 p = rank 1211 p 1212 p p 4411 p 4412 p 4444 Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
17 Splits and Phylogenetic Invariants Definition A split of a set is a bipartition A B. A split A B of the leaves of a tree T is valid for T if the induced trees T A and T B do not intersect. X 1 X X X Valid: Not Valid: Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
18 2-way Flattenings and Matrix Ranks Proposition Let P M T. p ijkl = P(X 1 = i, X 2 = j, X 3 = k, X 4 = l) p AAAA p AAAC p AAAG p AATT p ACAA p ACAC p ACAG p ACTT Flat (P) = p TTAA p TTAC p TTAG p TTTT If A B is a valid split for T, then rank(flat A B (P)) 4. Invariants in I T are subdeterminants of Flat A B (P). If C D is not a valid split for T, then generically rank(flat C D (P)) > 4. Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
19 Phylogenetic Algebraic Geometry Phylogenetic Algebraic Geometry is the study of the phylogenetic varieties and ideals V T and I T. Using Phylogenetic Invariants to Reconstruct Trees Identifiability of Phylogenetic Models Interesting Math Useful in Other Problems Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
20 Using Phylogenetic Invariants to Reconstruct Trees Definition A phylogenetic invariant f I T is phylogenetically informative if there is some other tree T such that f / I T. Idea of Cavender-Felsenstein (1987), Lake (1987): Evaluate phylogenetically informative phylogenetic invariants at empirical distribution ˆp to reconstruct phylogenetic trees Proposition For each n-leaf trivalent tree T, let F T I T be a set of phylogenetic invariants such that, for each T T, there is an f F T, such that f / I T. Let f T := f F T f. Then for generic p M T, f T (p) = 0 if and only if p M T. Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
21 Performance of Invariants Methods in Simulations Huelsenbeck (1995) did a systematic simulation comparison of 26 different methods of constructing a phylogenetic tree on 4 leaf trees. Invariant-based methods did poorly. HOWEVER... Huelsenbeck only used linear invariants. Casanellas, Fernandez-Sanchez (2006) redid these simulations using a generating set of the phylogenetic ideal I T. Phylogenetic invariants become comparable to other methods. For the particular model studied in Casanellas, Fernandez-Sanchez (2006) for a tree with 4 leaves, the ideal I T has 8002 generators. f T := f F T f is a sum of 8002 terms. Major work to overcome combinatorial explosion for larger trees. Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
22 Identifiability of Phylogenetic Models Definition A parametric statistical model is identifiable if it gives 1-to-1 map from parameters to probability distributions. Is it possible to infer the parameters of the model from data? Identifiability guarantees consistency of statistical methods (ML) Two types of parameters to consider for phylogenetic models: Numerical parameters (transition matrices) Tree parameter (combinatorial type of tree) Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
23 Geometric Perspective on Identifiability Definition The unrooted tree parameter T in a phylogenetic model is identifiable if for all p M T there does not exist another T T such that p M T. M 1 M 2 M1 M2 M 3 Identifiable Not Identifiable Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
24 Generic Identifiability Definition The tree parameter in a phylogenetic model is generically identifiable if for all n-leaf trees with T T, dim(m T M T ) < min(dim(m T ), dim(m T )). M1 M2 M3 Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
25 Proving Identifiability with Algebraic Geometry Proposition Let M 0 and M 1 be two algebraic models. If there exist phylogenetically informative invariants f 0 and f 1 such that f i (p) = 0 for all p M i, and f i (q) 0 for some q M 1 i, then dim(m 0 M 1 ) < min(dim M 0, dim M 1 ). M0 f = 0 M 1 Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
26 Phylogenetic Models are Identifiable Theorem The unrooted tree parameter of phylogenetic models is generically identifiable. Proof. Edge flattening invariants can detect which splits are implied by a specific distribution in M T. The splits in T uniquely determine T. Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
27 Phylogenetic Mixture Models Basic phylogenetic model assume same parameters at every site This assumption is not accurate within a single gene Some sites more important: unlikely to change Tree structure may vary across genes Leads to mixture models for different classes of sites M(T, r) denotes a same tree mixture model with underlying tree T and r classes of sites Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
28 Identifiability Questions for Mixture Models Question For fixed number of trees k, are the tree parameters T 1,..., T k, and rate parameters of each tree (generically) identified in phylogenetic mixture models? k = 1 (Ordinary phylogenetic models) Most models are identifiable on 2, 3, 4 leaves. ( Rogers, Chang, Steel, Hendy, Penny, Székely, Allman, Rhodes, Housworth,...) k > 1 T 1 = T 2 = = T k but no restriction on number of trees Not identifiable (Matsen-Steel, Stefankovic-Vigoda) k > 1, T i arbitrary Not identifiable (Mossel-Vigoda) Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
29 Theorem (Rhodes-Sullivant 2011) The unrooted tree and numerical parameters in a r-class, same tree phylogenetic mixture model on n-leaf trivalent trees are generically identifiable, if r < 4 n/4. Proof Ideas. Phylogenetic invariants from flattenings Tensor rank (Kruskal s Theorem) [Allman-Matias-Rhodes 2009] Elementary tree combinatorics Solving tree and numerical parameter identifiability at the same time Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
30 How to Construct Phylogenetic Invariants? Theorem (Sturmfels-S, Allman-Rhodes, Casanellas-S, Draisma-Kuttler) Consider nice algebraic phylogenetic model. The problem of computing phylogenetic invariants for any tree T can be reduced to the same problem for star trees K 1,k. The ideal I T generated by local contributions from each K 1,k, plus flattening invariants from edges. The varieties V K1,k are interesting classical algebraic varieties: toric varieties secant varieties Sec 4 (P 3 P 3 P 3 ) Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
31 Group-based models ( ) α β β α α β β β β α β β β β α β β β β α α β γ γ β α γ γ γ γ α β γ γ β α α β γ δ β α δ γ γ δ α β δ γ β α Random variables in finite abelian group G. Transitions probabilities satisfy Prob(X = g Y = h) = f (g + h). This means that the formula for Prob(X 1 = g 1,..., X n = g n ) is a convolution (over G n ). Apply discrete Fourier transform to turn convolution into a product. Theorem (Hendy-Penny 1993, Evans-Speed 1993) In the Fourier coordinates, a group-based model is parametrized by monomial functions in terms of the Fourier parameters. In particular, the CFN model is a toric variety. Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
32 Equations for the CFN Model Theorem (Sturmfels-S 2005) For any tree T, the toric ideal I T for the CFN model is generated by degree 2 determinantal equations. X 1 X 2 X3 X4 Fourier coordinates: q lmno = r,s,t,u {0,1} ( 1)rl+sm+tn+uo p rstu ( ) q0000 q 0001 q 0010 q 0011 q 1100 q 1101 q 1110 q 1111 ( q0100 q 0101 q 0110 ) q 0111 q 1000 q 1001 q 1010 q 1011 I T generated by 2 2 minors of: q 0000 q 0011 q 0001 q 0010 q 0100 q 0111 q 0101 q 0110 q 1000 q 1011 q 1001 q 1010 q 1100 q 1111 q 1101 q 1110 Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
33 Gluing Two Trees at a Leaf Let T = T 1 #T 2, tree obtained by joining two trees at a leaf. Each ring C[p]/I T1, C[p]/I T2 is invariant under action of group G = Gl r (C) k acting on the glue leaves. Theorem (Draisma-Kuttler) C[p]/I T = (C[p]/IT1 C C[p]/I T2 ) G V T = (V T1 V T2 )//G (GIT quotient) Actions of individual factors (Gl r (C)) do no interact. Use Reynolds operator and first fundamental theorem of CIT. Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
34 Gluing more complex graphs + = + = Still a group action (Gl r (C) k ). But factors are not acting independently. C[p]/I G = (C[p]/IG1 C C[p]/I G2 ) G C[p]/I G generated by degree 1 part of (C[p]/I G1 C C[p]/I G2 ) G (toric fiber product if r = 1) Theorem (Engström-Kahle-S) Can determine generators of I G from I G1 and I G2 if the TFP has low codimension. Useful for other problems in algebraic statistics. Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
35 Summary: Phylogenetic Algebraic Geometry Phylogenetic models are fundamentally algebraic-geometric objects. Algebraic perspective is useful for: Developing new construction algorithms Proving theorems about identifiability (currently best available for mixture models) Leads to interesting new mathematics, useful for other problems Long way to go: Your Help Needed! Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
36 Problems Theorem (Allman-Rhodes 2006) Let T be a trivalent tree with n leaves, and consider the general Markov model on binary characters. The ideal has generating set {3 3 minors of Flat A B (P) A B Σ(T ) where Σ(T ) is the set of all valid splits on T. Note that P is a 2 2 2, n-way tensor. Problem 1 Take your favorite 4, 5 and 6 leaf trees and write down all the matrices Flat A B (P) that are needed in the previous theorem. 2 Let S r = A B Σ(T ) {r r minors of Flat A B(P)}. What is V (S r ) C 2n? Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
37 References E. Allman, C. Matias, J. Rhodes. Identifiability of parameters in latent structure models with many observed variables. Annals of Statistics, 37 no.6a (2009) M. Casanellas, J. Fernandez-Sanchez. Performance of a new invariants method on homogeneous and non-homogeneous quartet trees, Molecular Biology and Evolution, 24(1): , J. Cavender, J. Felsenstein. Invariants of phylogenies: a simple case with discrete states. Journal of Classification 4 (1987) J. Draisma, J Kuttler. On the ideals of equivariant tree models, Mathematische Annalen 344(3): , A. Engström, T. Kahle, S. Sullivant. Multigraded commutative algebra of graph decompositions S. Evans and T. Speed. Invariants of some probability models used in phylogenetic inference. Annals of Statistics 21 (1993) M. Hendy, D. Penny. Spectral analysis of phylogenetic data. J. Classification 10 (1993) J. Huelsenbeck. Performance of phylogenetic methods in simulation. Systematic Biology 42 no.1 (1995) J. Lake. A rate-independent technique for analysis of nucleaic acid sequences: evolutionary parsimony. Molecular Biology and Evolution 4 (1987) Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
38 F.A. Matsen and M. Steel. Phylogenetic mixtures on a single tree can mimic a tree of another topology. Systematic Biology, E. Mossel and E. Vigoda Phylogenetic MCMC Are Misleading on Mixtures of Trees. Science 309, (2005) J. Rhodes, S. Sullivant. Identifiability of large phylogenetic mixture models. To appear Bulletin of Mathematical Biology, D. Stefankovic and E. Vigoda. Pitfalls of Heterogeneous Processes for Phylogenetic Reconstruction Systematic Biology 56(1): , B. Sturmfels, S. Sullivant. Toric ideals of phylogenetic invariants. Journal of Computational Biology 12 (2005) Seth Sullivant (NCSU) Algebraic Statistics June 9, / 34
Phylogenetic Algebraic Geometry
Phylogenetic Algebraic Geometry Seth Sullivant North Carolina State University January 4, 2012 Seth Sullivant (NCSU) Phylogenetic Algebraic Geometry January 4, 2012 1 / 28 Phylogenetics Problem Given a
More informationExample: Hardy-Weinberg Equilibrium. Algebraic Statistics Tutorial I. Phylogenetics. Main Point of This Tutorial. Model-Based Phylogenetics
Example: Hardy-Weinberg Equilibrium Algebraic Statistics Tutorial I Seth Sullivant North Carolina State University July 22, 2012 Suppose a gene has two alleles, a and A. If allele a occurs in the population
More informationIntroduction to Algebraic Statistics
Introduction to Algebraic Statistics Seth Sullivant North Carolina State University January 5, 2017 Seth Sullivant (NCSU) Algebraic Statistics January 5, 2017 1 / 28 What is Algebraic Statistics? Observation
More informationPhylogenetic invariants versus classical phylogenetics
Phylogenetic invariants versus classical phylogenetics Marta Casanellas Rius (joint work with Jesús Fernández-Sánchez) Departament de Matemàtica Aplicada I Universitat Politècnica de Catalunya Algebraic
More informationUsing algebraic geometry for phylogenetic reconstruction
Using algebraic geometry for phylogenetic reconstruction Marta Casanellas i Rius (joint work with Jesús Fernández-Sánchez) Departament de Matemàtica Aplicada I Universitat Politècnica de Catalunya IMA
More informationIdentifiability of the GTR+Γ substitution model (and other models) of DNA evolution
Identifiability of the GTR+Γ substitution model (and other models) of DNA evolution Elizabeth S. Allman Dept. of Mathematics and Statistics University of Alaska Fairbanks TM Current Challenges and Problems
More informationToric Fiber Products
Toric Fiber Products Seth Sullivant North Carolina State University June 8, 2011 Seth Sullivant (NCSU) Toric Fiber Products June 8, 2011 1 / 26 Families of Ideals Parametrized by Graphs Let G be a finite
More informationPHYLOGENETIC ALGEBRAIC GEOMETRY
PHYLOGENETIC ALGEBRAIC GEOMETRY NICHOLAS ERIKSSON, KRISTIAN RANESTAD, BERND STURMFELS, AND SETH SULLIVANT Abstract. Phylogenetic algebraic geometry is concerned with certain complex projective algebraic
More informationGeometry of Phylogenetic Inference
Geometry of Phylogenetic Inference Matilde Marcolli CS101: Mathematical and Computational Linguistics Winter 2015 References N. Eriksson, K. Ranestad, B. Sturmfels, S. Sullivant, Phylogenetic algebraic
More informationSpecies Tree Inference using SVDquartets
Species Tree Inference using SVDquartets Laura Kubatko and Dave Swofford May 19, 2015 Laura Kubatko SVDquartets May 19, 2015 1 / 11 SVDquartets In this tutorial, we ll discuss several different data types:
More informationOpen Problems in Algebraic Statistics
Open Problems inalgebraic Statistics p. Open Problems in Algebraic Statistics BERND STURMFELS UNIVERSITY OF CALIFORNIA, BERKELEY and TECHNISCHE UNIVERSITÄT BERLIN Advertisement Oberwolfach Seminar Algebraic
More informationMetric learning for phylogenetic invariants
Metric learning for phylogenetic invariants Nicholas Eriksson nke@stanford.edu Department of Statistics, Stanford University, Stanford, CA 94305-4065 Yuan Yao yuany@math.stanford.edu Department of Mathematics,
More informationarxiv: v1 [q-bio.pe] 23 Nov 2017
DIMENSIONS OF GROUP-BASED PHYLOGENETIC MIXTURES arxiv:1711.08686v1 [q-bio.pe] 23 Nov 2017 HECTOR BAÑOS, NATHANIEL BUSHEK, RUTH DAVIDSON, ELIZABETH GROSS, PAMELA E. HARRIS, ROBERT KRONE, COLBY LONG, ALLEN
More informationPRIMARY DECOMPOSITION FOR THE INTERSECTION AXIOM
PRIMARY DECOMPOSITION FOR THE INTERSECTION AXIOM ALEX FINK 1. Introduction and background Consider the discrete conditional independence model M given by {X 1 X 2 X 3, X 1 X 3 X 2 }. The intersection axiom
More informationarxiv: v1 [q-bio.pe] 27 Oct 2011
INVARIANT BASED QUARTET PUZZLING JOE RUSINKO AND BRIAN HIPP arxiv:1110.6194v1 [q-bio.pe] 27 Oct 2011 Abstract. Traditional Quartet Puzzling algorithms use maximum likelihood methods to reconstruct quartet
More informationELIZABETH S. ALLMAN and JOHN A. RHODES ABSTRACT 1. INTRODUCTION
JOURNAL OF COMPUTATIONAL BIOLOGY Volume 13, Number 5, 2006 Mary Ann Liebert, Inc. Pp. 1101 1113 The Identifiability of Tree Topology for Phylogenetic Models, Including Covarion and Mixture Models ELIZABETH
More informationPhylogeny of Mixture Models
Phylogeny of Mixture Models Daniel Štefankovič Department of Computer Science University of Rochester joint work with Eric Vigoda College of Computing Georgia Institute of Technology Outline Introduction
More informationTotal binomial decomposition (TBD) Thomas Kahle Otto-von-Guericke Universität Magdeburg
Total binomial decomposition (TBD) Thomas Kahle Otto-von-Guericke Universität Magdeburg Setup Let k be a field. For computations we use k = Q. k[p] := k[p 1,..., p n ] the polynomial ring in n indeterminates
More informationThe Generalized Neighbor Joining method
The Generalized Neighbor Joining method Ruriko Yoshida Dept. of Mathematics Duke University Joint work with Dan Levy and Lior Pachter www.math.duke.edu/ ruriko data mining 1 Challenge We would like to
More informationAlgebraic Invariants of Phylogenetic Trees
Harvey Mudd College November 21st, 2006 The Problem DNA Sequence Data Given a set of aligned DNA sequences... Human: A G A C G T T A C G T A... Chimp: A G A G C A A C T T T G... Monkey: A A A C G A T A
More informationIs Every Secant Variety of a Segre Product Arithmetically Cohen Macaulay? Oeding (Auburn) acm Secants March 6, / 23
Is Every Secant Variety of a Segre Product Arithmetically Cohen Macaulay? Luke Oeding Auburn University Oeding (Auburn) acm Secants March 6, 2016 1 / 23 Secant varieties and tensors Let V 1,..., V d, be
More informationPitfalls of Heterogeneous Processes for Phylogenetic Reconstruction
Pitfalls of Heterogeneous Processes for Phylogenetic Reconstruction DANIEL ŠTEFANKOVIČ 1 AND ERIC VIGODA 2 1 Department of Computer Science, University of Rochester, Rochester, New York 14627, USA; and
More informationmathematics Smoothness in Binomial Edge Ideals Article Hamid Damadi and Farhad Rahmati *
mathematics Article Smoothness in Binomial Edge Ideals Hamid Damadi and Farhad Rahmati * Faculty of Mathematics and Computer Science, Amirkabir University of Technology (Tehran Polytechnic), 424 Hafez
More informationTutorial: Gaussian conditional independence and graphical models. Thomas Kahle Otto-von-Guericke Universität Magdeburg
Tutorial: Gaussian conditional independence and graphical models Thomas Kahle Otto-von-Guericke Universität Magdeburg The central dogma of algebraic statistics Statistical models are varieties The central
More informationPitfalls of Heterogeneous Processes for Phylogenetic Reconstruction
Pitfalls of Heterogeneous Processes for Phylogenetic Reconstruction Daniel Štefankovič Eric Vigoda June 30, 2006 Department of Computer Science, University of Rochester, Rochester, NY 14627, and Comenius
More informationJed Chou. April 13, 2015
of of CS598 AGB April 13, 2015 Overview of 1 2 3 4 5 Competing Approaches of Two competing approaches to species tree inference: Summary methods: estimate a tree on each gene alignment then combine gene
More informationUSING INVARIANTS FOR PHYLOGENETIC TREE CONSTRUCTION
USING INVARIANTS FOR PHYLOGENETIC TREE CONSTRUCTION NICHOLAS ERIKSSON Abstract. Phylogenetic invariants are certain polynomials in the joint probability distribution of a Markov model on a phylogenetic
More informationMaximum Likelihood Estimates for Binary Random Variables on Trees via Phylogenetic Ideals
Maximum Likelihood Estimates for Binary Random Variables on Trees via Phylogenetic Ideals Robin Evans Abstract In their 2007 paper, E.S. Allman and J.A. Rhodes characterise the phylogenetic ideal of general
More informationWhen Do Phylogenetic Mixture Models Mimic Other Phylogenetic Models?
Syst. Biol. 61(6):1049 1059, 2012 The Author(s) 2012. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com
More information19 Tree Construction using Singular Value Decomposition
19 Tree Construction using Singular Value Decomposition Nicholas Eriksson We present a new, statistically consistent algorithm for phylogenetic tree construction that uses the algeraic theory of statistical
More informationThe binomial ideal of the intersection axiom for conditional probabilities
J Algebr Comb (2011) 33: 455 463 DOI 10.1007/s10801-010-0253-5 The binomial ideal of the intersection axiom for conditional probabilities Alex Fink Received: 10 February 2009 / Accepted: 27 August 2010
More informationCombinatorial Aspects of Tropical Geometry and its interactions with phylogenetics
Combinatorial Aspects of Tropical Geometry and its interactions with phylogenetics María Angélica Cueto Department of Mathematics Columbia University Rabadan Lab Metting Columbia University College of
More informationAdditive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.
Additive distances Let T be a tree on leaf set S and let w : E R + be an edge-weighting of T, and assume T has no nodes of degree two. Let D ij = e P ij w(e), where P ij is the path in T from i to j. Then
More informationNoetherianity up to symmetry
1 Noetherianity up to symmetry Jan Draisma TU Eindhoven and VU Amsterdam Singular Landscapes in honour of Bernard Teissier Aussois, June 2015 A landscape, and a disclaimer 2 # participants A landscape,
More informationPhylogenetic Geometry
Phylogenetic Geometry Ruth Davidson University of Illinois Urbana-Champaign Department of Mathematics Mathematics and Statistics Seminar Washington State University-Vancouver September 26, 2016 Phylogenies
More informationReconstruire le passé biologique modèles, méthodes, performances, limites
Reconstruire le passé biologique modèles, méthodes, performances, limites Olivier Gascuel Centre de Bioinformatique, Biostatistique et Biologie Intégrative C3BI USR 3756 Institut Pasteur & CNRS Reconstruire
More informationGeometry of Gaussoids
Geometry of Gaussoids Bernd Sturmfels MPI Leipzig and UC Berkeley p 3 p 13 p 23 a 12 3 p 123 a 23 a 13 2 a 23 1 a 13 p 2 p 12 a 12 p p 1 Figure 1: With The vertices Tobias andboege, 2-faces ofalessio the
More informationRecent Progress in Combinatorial Statistics
Elchanan Mossel U.C. Berkeley Recent Progress in Combinatorial Statistics At Penn Statistics, Sep 11 Combinatorial Statistics Combinatorial Statistics : Rigorous Analysis of Inference Problems where: Estimating
More informationOeding (Auburn) tensors of rank 5 December 15, / 24
Oeding (Auburn) 2 2 2 2 2 tensors of rank 5 December 15, 2015 1 / 24 Recall Peter Burgisser s overview lecture (Jan Draisma s SIAM News article). Big Goal: Bound the computational complexity of det n,
More informationAlgebraic Complexity in Statistics using Combinatorial and Tensor Methods
Algebraic Complexity in Statistics using Combinatorial and Tensor Methods BY ELIZABETH GROSS THESIS Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Mathematics
More informationInfinite-dimensional combinatorial commutative algebra
Infinite-dimensional combinatorial commutative algebra Steven Sam University of California, Berkeley September 20, 2014 1/15 Basic theme: there are many axes in commutative algebra which are governed by
More informationCSCI1950 Z Computa4onal Methods for Biology Lecture 5
CSCI1950 Z Computa4onal Methods for Biology Lecture 5 Ben Raphael February 6, 2009 hip://cs.brown.edu/courses/csci1950 z/ Alignment vs. Distance Matrix Mouse: ACAGTGACGCCACACACGT Gorilla: CCTGCGACGTAACAAACGC
More informationDissimilarity maps on trees and the representation theory of GL n (C)
Dissimilarity maps on trees and the representation theory of GL n (C) Christopher Manon Department of Mathematics University of California, Berkeley Berkeley, California, U.S.A. chris.manon@math.berkeley.edu
More informationConsistency Index (CI)
Consistency Index (CI) minimum number of changes divided by the number required on the tree. CI=1 if there is no homoplasy negatively correlated with the number of species sampled Retention Index (RI)
More information1. Can we use the CFN model for morphological traits?
1. Can we use the CFN model for morphological traits? 2. Can we use something like the GTR model for morphological traits? 3. Stochastic Dollo. 4. Continuous characters. Mk models k-state variants of the
More informationMixed-up Trees: the Structure of Phylogenetic Mixtures
Bulletin of Mathematical Biology (2008) 70: 1115 1139 DOI 10.1007/s11538-007-9293-y ORIGINAL ARTICLE Mixed-up Trees: the Structure of Phylogenetic Mixtures Frederick A. Matsen a,, Elchanan Mossel b, Mike
More informationMathematical Biology. Phylogenetic mixtures and linear invariants for equal input models. B Mike Steel. Marta Casanellas 1 Mike Steel 2
J. Math. Biol. DOI 0.007/s00285-06-055-8 Mathematical Biology Phylogenetic mixtures and linear invariants for equal input models Marta Casanellas Mike Steel 2 Received: 5 February 206 / Revised: July 206
More informationThe statistical and informatics challenges posed by ascertainment biases in phylogenetic data collection
The statistical and informatics challenges posed by ascertainment biases in phylogenetic data collection Mark T. Holder and Jordan M. Koch Department of Ecology and Evolutionary Biology, University of
More informationAlgebraic Classification of Small Bayesian Networks
GROSTAT VI, Menton IUT STID p. 1 Algebraic Classification of Small Bayesian Networks Luis David Garcia, Michael Stillman, and Bernd Sturmfels lgarcia@math.vt.edu Virginia Tech GROSTAT VI, Menton IUT STID
More informationHomotopy Techniques for Tensor Decomposition and Perfect Identifiability
Homotopy Techniques for Tensor Decomposition and Perfect Identifiability Luke Oeding Auburn University with Hauenstein (Notre Dame), Ottaviani (Firenze) and Sommese (Notre Dame) Oeding (Auburn) Homotopy
More informationThe intersection axiom of
The intersection axiom of conditional independence: some new [?] results Richard D. Gill Mathematical Institute, University Leiden This version: 26 March, 2019 (X Y Z) & (X Z Y) X (Y, Z) Presented at Algebraic
More informationTheory of Evolution Charles Darwin
Theory of Evolution Charles arwin 858-59: Origin of Species 5 year voyage of H.M.S. eagle (83-36) Populations have variations. Natural Selection & Survival of the fittest: nature selects best adapted varieties
More informationarxiv: v4 [math.ac] 31 May 2012
BETTI NUMBERS OF STANLEY-REISNER RINGS DETERMINE HIERARCHICAL MARKOV DEGREES SONJA PETROVIĆ AND ERIK STOKES arxiv:0910.1610v4 [math.ac] 31 May 2012 Abstract. There are two seemingly unrelated ideals associated
More informationCommuting birth-and-death processes
Commuting birth-and-death processes Caroline Uhler Department of Statistics UC Berkeley (joint work with Steven N. Evans and Bernd Sturmfels) MSRI Workshop on Algebraic Statistics December 18, 2008 Birth-and-death
More informationMaximum Agreement Subtrees
Maximum Agreement Subtrees Seth Sullivant North Carolina State University March 24, 2018 Seth Sullivant (NCSU) Maximum Agreement Subtrees March 24, 2018 1 / 23 Phylogenetics Problem Given a collection
More informationCombinatorics and geometry of E 7
Combinatorics and geometry of E 7 Steven Sam University of California, Berkeley September 19, 2012 1/24 Outline Macdonald representations Vinberg representations Root system Weyl group 7 points in P 2
More informationConstructing Evolutionary/Phylogenetic Trees
Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood
More informationarxiv: v1 [q-bio.pe] 1 Jun 2014
THE MOST PARSIMONIOUS TREE FOR RANDOM DATA MAREIKE FISCHER, MICHELLE GALLA, LINA HERBST AND MIKE STEEL arxiv:46.27v [q-bio.pe] Jun 24 Abstract. Applying a method to reconstruct a phylogenetic tree from
More informationVector bundles in Algebraic Geometry Enrique Arrondo. 1. The notion of vector bundle
Vector bundles in Algebraic Geometry Enrique Arrondo Notes(* prepared for the First Summer School on Complex Geometry (Villarrica, Chile 7-9 December 2010 1 The notion of vector bundle In affine geometry,
More informationUniversity of Warwick institutional repository:
University of Warwick institutional repository: http://go.warwick.ac.uk/wrap This paper is made available online in accordance with publisher policies. Please scroll down to view the document itself. Please
More informationLecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30
Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30 A non-phylogeny
More information5. Grassmannians and the Space of Trees In this lecture we shall be interested in a very particular ideal. The ambient polynomial ring C[p] has ( n
5. Grassmannians and the Space of Trees In this lecture we shall be interested in a very particular ideal. The ambient polynomial ring C[p] has ( n d) variables, which are called Plücker coordinates: C[p]
More informationToric Varieties in Statistics
Toric Varieties in Statistics Daniel Irving Bernstein and Seth Sullivant North Carolina State University dibernst@ncsu.edu http://www4.ncsu.edu/~dibernst/ http://arxiv.org/abs/50.063 http://arxiv.org/abs/508.0546
More informationReconstructing Trees from Subtree Weights
Reconstructing Trees from Subtree Weights Lior Pachter David E Speyer October 7, 2003 Abstract The tree-metric theorem provides a necessary and sufficient condition for a dissimilarity matrix to be a tree
More informationEVOLUTIONARY DISTANCES
EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:
More informationEvolutionary Tree Analysis. Overview
CSI/BINF 5330 Evolutionary Tree Analysis Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Backgrounds Distance-Based Evolutionary Tree Reconstruction Character-Based
More informationBMI/CS 776 Lecture 4. Colin Dewey
BMI/CS 776 Lecture 4 Colin Dewey 2007.02.01 Outline Common nucleotide substitution models Directed graphical models Ancestral sequence inference Poisson process continuous Markov process X t0 X t1 X t2
More informationBinomial Exercises A = 1 1 and 1
Lecture I. Toric ideals. Exhibit a point configuration A whose affine semigroup NA does not consist of the intersection of the lattice ZA spanned by the columns of A with the real cone generated by A.
More informationLecture 1. Toric Varieties: Basics
Lecture 1. Toric Varieties: Basics Taras Panov Lomonosov Moscow State University Summer School Current Developments in Geometry Novosibirsk, 27 August1 September 2018 Taras Panov (Moscow University) Lecture
More informationSpecial Session on Secant Varieties. and Related Topics. Secant Varieties of Grassmann and Segre Varieties. Giorgio Ottaviani
Special Session on Secant Varieties and Related Topics Secant Varieties of Grassmann and Segre Varieties Giorgio Ottaviani ottavian@math.unifi.it www.math.unifi.it/users/ottavian Università di Firenze
More informationAlgebraic Varieties. Notes by Mateusz Micha lek for the lecture on April 17, 2018, in the IMPRS Ringvorlesung Introduction to Nonlinear Algebra
Algebraic Varieties Notes by Mateusz Micha lek for the lecture on April 17, 2018, in the IMPRS Ringvorlesung Introduction to Nonlinear Algebra Algebraic varieties represent solutions of a system of polynomial
More informationarxiv: v3 [q-bio.pe] 3 Dec 2011
An algebraic analysis of the two state arkov model on tripod trees arxiv:1012.0062v3 [q-bio.pe] 3 Dec 2011 1 2 3 4 5 6 7 8 9 10 11 Abstract Steffen Klaere a,, Volkmar Liebscher b a Department of athematics
More informationT -equivariant tensor rank varieties and their K-theory classes
T -equivariant tensor rank varieties and their K-theory classes 2014 July 18 Advisor: Professor Anders Buch, Department of Mathematics, Rutgers University Overview 1 Equivariant K-theory Overview 2 Determinantal
More informationOn the extremal Betti numbers of binomial edge ideals of block graphs
On the extremal Betti numbers of binomial edge ideals of block graphs Jürgen Herzog Fachbereich Mathematik Universität Duisburg-Essen Essen, Germany juergen.herzog@uni-essen.de Giancarlo Rinaldo Department
More informationLECTURES ON SYMPLECTIC REFLECTION ALGEBRAS
LECTURES ON SYMPLECTIC REFLECTION ALGEBRAS IVAN LOSEV 16. Symplectic resolutions of µ 1 (0)//G and their deformations 16.1. GIT quotients. We will need to produce a resolution of singularities for C 2n
More informationSome of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!
Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis
More informationThe Maximum Likelihood Threshold of a Graph
The Maximum Likelihood Threshold of a Graph Elizabeth Gross and Seth Sullivant San Jose State University, North Carolina State University August 28, 2014 Seth Sullivant (NCSU) Maximum Likelihood Threshold
More informationarxiv: v1 [math.ra] 13 Jan 2009
A CONCISE PROOF OF KRUSKAL S THEOREM ON TENSOR DECOMPOSITION arxiv:0901.1796v1 [math.ra] 13 Jan 2009 JOHN A. RHODES Abstract. A theorem of J. Kruskal from 1977, motivated by a latent-class statistical
More informationSecant Varieties and Equations of Abo-Wan Hypersurfaces
Secant Varieties and Equations of Abo-Wan Hypersurfaces Luke Oeding Auburn University Oeding (Auburn, NIMS) Secants, Equations and Applications August 9, 214 1 / 34 Secant varieties Suppose X is an algebraic
More informationADVANCED TOPICS IN ALGEBRAIC GEOMETRY
ADVANCED TOPICS IN ALGEBRAIC GEOMETRY DAVID WHITE Outline of talk: My goal is to introduce a few more advanced topics in algebraic geometry but not to go into too much detail. This will be a survey of
More informationarxiv: v1 [cs.cl] 5 Dec 2017
PHYLOGENETICS OF INDO-EUROPEAN LANGUAGE FAMILIES VIA AN ALGEBRO-GEOMETRIC ANALYSIS OF THEIR SYNTACTIC STRUCTURES KEVIN SHU, ANDREW ORTEGARAY, ROBERT BERWICK AND MATILDE MARCOLLI arxiv:72.079v [cs.cl] 5
More informationA few logs suce to build (almost) all trees: Part II
Theoretical Computer Science 221 (1999) 77 118 www.elsevier.com/locate/tcs A few logs suce to build (almost) all trees: Part II Peter L. Erdős a;, Michael A. Steel b,laszlo A.Szekely c, Tandy J. Warnow
More informationProcesses of Evolution
15 Processes of Evolution Forces of Evolution Concept 15.4 Selection Can Be Stabilizing, Directional, or Disruptive Natural selection can act on quantitative traits in three ways: Stabilizing selection
More informationSome Algebraic and Combinatorial Properties of the Complete T -Partite Graphs
Iranian Journal of Mathematical Sciences and Informatics Vol. 13, No. 1 (2018), pp 131-138 DOI: 10.7508/ijmsi.2018.1.012 Some Algebraic and Combinatorial Properties of the Complete T -Partite Graphs Seyyede
More informationPhylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center
Phylogenetic Analysis Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Outline Basic Concepts Tree Construction Methods Distance-based methods
More informationToric ideals finitely generated up to symmetry
Toric ideals finitely generated up to symmetry Anton Leykin Georgia Tech MOCCA, Levico Terme, September 2014 based on [arxiv:1306.0828] Noetherianity for infinite-dimensional toric varieties (with Jan
More informationAlgorithms in Bioinformatics
Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Distance Methods Character Methods
More informationCSCI1950 Z Computa4onal Methods for Biology Lecture 4. Ben Raphael February 2, hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary
CSCI1950 Z Computa4onal Methods for Biology Lecture 4 Ben Raphael February 2, 2009 hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary Parsimony Probabilis4c Method Input Output Sankoff s & Fitch
More informationTheDisk-Covering MethodforTree Reconstruction
TheDisk-Covering MethodforTree Reconstruction Daniel Huson PACM, Princeton University Bonn, 1998 1 Copyright (c) 2008 Daniel Huson. Permission is granted to copy, distribute and/or modify this document
More informationPhylogenetic Networks, Trees, and Clusters
Phylogenetic Networks, Trees, and Clusters Luay Nakhleh 1 and Li-San Wang 2 1 Department of Computer Science Rice University Houston, TX 77005, USA nakhleh@cs.rice.edu 2 Department of Biology University
More informationMulti-parameter persistent homology: applications and algorithms
Multi-parameter persistent homology: applications and algorithms Nina Otter Mathematical Institute, University of Oxford Gudhi/Top Data Workshop Porquerolles, 18 October 2016 Multi-parameter persistent
More informationarxiv: v3 [math.ag] 21 Aug 2008
ON THE IDEALS OF EQUIVARIANT TREE MODELS arxiv:0712.3230v3 [math.ag] 21 Aug 2008 JAN DRAISMA AND JOCHEN KUTTLER Abstract. We introduce equivariant tree models in algebraic statistics, which unify and generalise
More informationALGEBRA: From Linear to Non-Linear. Bernd Sturmfels University of California at Berkeley
ALGEBRA: From Linear to Non-Linear Bernd Sturmfels University of California at Berkeley John von Neumann Lecture, SIAM Annual Meeting, Pittsburgh, July 13, 2010 Undergraduate Linear Algebra All undergraduate
More informationLie Markov models. Jeremy Sumner. School of Physical Sciences University of Tasmania, Australia
Lie Markov models Jeremy Sumner School of Physical Sciences University of Tasmania, Australia Stochastic Modelling Meets Phylogenetics, UTAS, November 2015 Jeremy Sumner Lie Markov models 1 / 23 The theory
More informationPhylogenetic Tree Reconstruction
I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven
More informationOn the topology of matrix configuration spaces
On the topology of matrix configuration spaces Daniel C. Cohen Department of Mathematics Louisiana State University Daniel C. Cohen (LSU) Matrix configuration spaces Summer 2013 1 Configuration spaces
More informationALGEBRAIC PROPERTIES OF BIER SPHERES
LE MATEMATICHE Vol. LXVII (2012 Fasc. I, pp. 91 101 doi: 10.4418/2012.67.1.9 ALGEBRAIC PROPERTIES OF BIER SPHERES INGA HEUDTLASS - LUKAS KATTHÄN We give a classification of flag Bier spheres, as well as
More informationTree of Life iological Sequence nalysis Chapter http://tolweb.org/tree/ Phylogenetic Prediction ll organisms on Earth have a common ancestor. ll species are related. The relationship is called a phylogeny
More informationPhylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz
Phylogenetic Trees What They Are Why We Do It & How To Do It Presented by Amy Harris Dr Brad Morantz Overview What is a phylogenetic tree Why do we do it How do we do it Methods and programs Parallels
More informationarxiv: v6 [math.st] 19 Oct 2011
Electronic Journal of Statistics Vol. 5 (2011) 1276 1312 ISSN: 1935-7524 DOI: 10.1214/11-EJS640 Implicit inequality constraints in a binary tree model arxiv:0904.1980v6 [math.st] 19 Oct 2011 Contents Piotr
More information