Open Problems in Algebraic Statistics

Similar documents
Geometry of Phylogenetic Inference

Introduction to Algebraic Statistics

Asymptotic Approximation of Marginal Likelihood Integrals

PHYLOGENETIC ALGEBRAIC GEOMETRY

Is Every Secant Variety of a Segre Product Arithmetically Cohen Macaulay? Oeding (Auburn) acm Secants March 6, / 23

Algebraic Statistics Tutorial I

The Geometry of Semidefinite Programming. Bernd Sturmfels UC Berkeley

ALGEBRAIC DEGREE OF POLYNOMIAL OPTIMIZATION. 1. Introduction. f 0 (x)

Commuting birth-and-death processes

Phylogenetic Algebraic Geometry

Semidefinite Programming

Tensor Networks in Algebraic Geometry and Statistics

Geometry of Gaussoids

The maximum likelihood degree of rank 2 matrices via Euler characteristics

The Algebraic Degree of Semidefinite Programming

PRIMARY DECOMPOSITION FOR THE INTERSECTION AXIOM

Algebraic Statistics progress report

ALGEBRAIC STATISTICAL MODELS

Special Session on Secant Varieties. and Related Topics. Secant Varieties of Grassmann and Segre Varieties. Giorgio Ottaviani

Multivariate Gaussians, semidefinite matrix completion, and convex algebraic geometry

Sheaf cohomology and non-normal varieties

Using algebraic geometry for phylogenetic reconstruction

Solving the 100 Swiss Francs Problem

SPECTRAHEDRA. Bernd Sturmfels UC Berkeley

Toric Fiber Products

Secant Varieties and Equations of Abo-Wan Hypersurfaces

Tensors: a geometric view Open lecture November 24, 2014 Simons Institute, Berkeley. Giorgio Ottaviani, Università di Firenze

Polynomials, Ideals, and Gröbner Bases

Algebraic Complexity in Statistics using Combinatorial and Tensor Methods

Maximum Likelihood Estimation in Latent Class Models For Contingency Table Data

Maximum Likelihood Estimation in Latent Class Models For Contingency Table Data

Oeding (Auburn) tensors of rank 5 December 15, / 24

Testing Algebraic Hypotheses

Algebra Homework, Edition 2 9 September 2010

ALGEBRA: From Linear to Non-Linear. Bernd Sturmfels University of California at Berkeley

Representations of Positive Polynomials: Theory, Practice, and

SPECTRAHEDRA. Bernd Sturmfels UC Berkeley

n P say, then (X A Y ) P

Noetherianity up to symmetry

Twisted commutative algebras and related structures

Tutorial: Gaussian conditional independence and graphical models. Thomas Kahle Otto-von-Guericke Universität Magdeburg

The Euclidean Distance Degree of an Algebraic Variety

Tensors. Lek-Heng Lim. Statistics Department Retreat. October 27, Thanks: NSF DMS and DMS

Lecture 1. Toric Varieties: Basics

Tensor decomposition and tensor rank

A new parametrization for binary hidden Markov modes

Infinite-dimensional combinatorial commutative algebra

Problems on Minkowski sums of convex lattice polytopes

The Maximum Likelihood Degree of Mixtures of Independence Models

The binomial ideal of the intersection axiom for conditional probabilities

Combinatorics and geometry of E 7

NOTES ON HYPERBOLICITY CONES

Algebraic Classification of Small Bayesian Networks

9. Birational Maps and Blowing Up

MA 206 notes: introduction to resolution of singularities

MIT Algebraic techniques and semidefinite optimization May 9, Lecture 21. Lecturer: Pablo A. Parrilo Scribe:???

E. GORLA, J. C. MIGLIORE, AND U. NAGEL

Toric Varieties in Statistics

4.4 Noetherian Rings

Total binomial decomposition (TBD) Thomas Kahle Otto-von-Guericke Universität Magdeburg

Maximum Likelihood Estimation in Latent Class Models For Contingency Table Data

Markov Chains and Hidden Markov Models

What is Singular Learning Theory?

Mixture Models and Representational Power of RBM s, DBN s and DBM s

The Generalized Neighbor Joining method

The Maximum Likelihood Threshold of a Graph

The intersection axiom of

On the centre of the generic algebra of M 1,1

ARITHMETICALLY COHEN-MACAULAY BUNDLES ON THREE DIMENSIONAL HYPERSURFACES

Example: Hardy-Weinberg Equilibrium. Algebraic Statistics Tutorial I. Phylogenetics. Main Point of This Tutorial. Model-Based Phylogenetics

Algebraic Statistics. Seth Sullivant. North Carolina State University address:

HILBERT BASIS OF THE LIPMAN SEMIGROUP

On the Waring problem for polynomial rings

12. Linear systems Theorem Let X be a scheme over a ring A. (1) If φ: X P n A is an A-morphism then L = φ O P n

DEFORMATIONS VIA DIMENSION THEORY

HYPERBOLIC POLYNOMIALS, INTERLACERS AND SUMS OF SQUARES

Algebraic combinatorics for computational biology. Nicholas Karl Eriksson. B.S. (Massachusetts Institute of Technology) 2001

INTERSECTION THEORY CLASS 2

SPACES OF RATIONAL CURVES IN COMPLETE INTERSECTIONS

THE MAXIMUM LIKELIHOOD DEGREE OF MIXTURES OF INDEPENDENCE MODELS

where m is the maximal ideal of O X,p. Note that m/m 2 is a vector space. Suppose that we are given a morphism

(Reprint of pp in Proc. 2nd Int. Workshop on Algebraic and Combinatorial coding Theory, Leningrad, Sept , 1990)

MAXIMUM LIKELIHOOD ESTIMATION AND EM FIXED POINT IDEALS FOR BINARY TENSORS. Daniel Lemke

Matrix rigidity and elimination theory

Maximum Likelihood Estimates for Binary Random Variables on Trees via Phylogenetic Ideals

Dissimilarity maps on trees and the representation theory of GL n (C)

9. Integral Ring Extensions

Received: 1 September 2018; Accepted: 10 October 2018; Published: 12 October 2018

11. Dimension. 96 Andreas Gathmann

SPACES OF RATIONAL CURVES ON COMPLETE INTERSECTIONS

Tensors. Notes by Mateusz Michalek and Bernd Sturmfels for the lecture on June 5, 2018, in the IMPRS Ringvorlesung Introduction to Nonlinear Algebra

The Geometry of Matrix Rigidity

FOURIER COEFFICIENTS OF VECTOR-VALUED MODULAR FORMS OF DIMENSION 2

INITIAL COMPLEX ASSOCIATED TO A JET SCHEME OF A DETERMINANTAL VARIETY. the affine space of dimension k over F. By a variety in A k F

arxiv: v2 [math.st] 22 Oct 2016

Algebraic matroids are almost entropic

A Robust Form of Kruskal s Identifiability Theorem

MATHEMATICAL ENGINEERING TECHNICAL REPORTS. Markov Bases for Two-way Subtable Sum Problems

GEOMETRY OF FEASIBLE SPACES OF TENSORS. A Dissertation YANG QI

Algebraic Geometry Spring 2009

Transcription:

Open Problems inalgebraic Statistics p. Open Problems in Algebraic Statistics BERND STURMFELS UNIVERSITY OF CALIFORNIA, BERKELEY and TECHNISCHE UNIVERSITÄT BERLIN Advertisement Oberwolfach Seminar Algebraic Statistics with Mathias Drton and Seth Sullivant May 11-17, 2008

Open Problems inalgebraic Statistics p. Models with Hidden Variables A Concrete Problem: Consider the variety of 4 4 4-tables of tensor rank at most 4. Do the known polynomial invariants of degree five and nine suffice to cut out this variety? [ASCB, Conjecture 3.24, page 102] The General Problem: Study the geometry and commutative algebra of graphical models with hidden random variables. Construct these varieties by gluing familiar secant varieties, and by applying representation theory.

Open Problems inalgebraic Statistics p. My favorite statistical model lives in the 64-dimensional space C 4 C 4 C 4 of 4 4 4-tables (p ijk ), where i,j,k {A,C,G,T}. It is the variety with parametric representation p ijk = ρ Ai σ Aj θ Ak + ρ Ci σ Cj θ Ck + ρ Gi σ Gj θ Gk + ρ Ti σ Tj θ Tk Our problem is to compute its homogeneous prime ideal I in the polynomial ring with 64 unknowns, Q [ p AAA,p AAC,p AAT,...,p TTG,p TTT ].

Open Problems inalgebraic Statistics p. Why (Pure) Mathematics? Mathematics is the Art of Giving the Same Name to Different Things (H. Poincaré). the set of 4 4 4-tables of tensor rank 4 mixture of independent random variables the naive Bayes model the CI model [X 1 X 2 X 3 Y ]. third secant variety of Segre variety P 3 P 3 P 3 general Markov model for phylogenetic tree K 1,3 entanglement of pure states (mean fields)

Open Problems inalgebraic Statistics p. Invariants of Degree Five Consider any 3 4 4-subtable of (p ijk ) and let A,B,C be the 4 4-slices gotten by fixing i. On our model, the following identity of 4 4-matrices holds: A B 1 C = C B 1 A After clearing the denominator det(b), this gives 16 quintic polynomials which lie in our prime ideal I. Proposition 1. The space of quintic polynomials in I has dimension 1728. As a GL(C 4 ) 3 -module, it equals S 311 (C 4 ) S 2111 (C 4 ) S 2111 (C 4 ) S 2111 (C 4 ) S 311 (C 4 ) S 2111 (C 4 ) S 2111 (C 4 ) S 2111 (C 4 ) S 311 (C 4 )

Open Problems inalgebraic Statistics p. Invariants of Degree Nine Consider any 3 3 3-subtable (A,B,C) of (p ijk ). On our model, the following 3 3-matrix is singular: A B 1 C C B 1 A The numerator of its determinant is a degree nine polynomial in I known as the Strassen invariant. Proposition 2. The GL(C 4 ) 3 -submodule of I 9 generated by the Strassen invariant is not contained in I 5. This module has dimension 8000 and it equals S 333 (C 4 ) S 333 (C 4 ) S 333 (C 4 ). References: [Garcia-Stillman-St], [Allman-Rhodes], [Landsberg-Manivel], [Landsberg-Weyman]

Open Problems inalgebraic Statistics p. Maximum Likelihood A Concrete Problem: Characterize all projective varieties whose maximum likelihood degree is equal to one. [HKS: Solving the Likelihood Equations, Problem 13] The General Problem: Study the geometry of maximum likelihood estimation for algebraic statistical models. What makes a model nice? Is the ML degree related to convergence properties of the EM algorithm?

Open Problems inalgebraic Statistics p. MLE in Projective Space Fix projective space P n with coordinates (p 0 : p 1 : : p n ). The n-dimensional probability simplex is identified with P n 0. For any data vector (u 0,u 1,...,u n ) N n+1 we consider the likelihood function L = p 0 u 0 p 1 u 1 p 2 u 2 p n u n (p 0 +p 1 + +p n ) u 0+u 1 + +u n This is a well-defined rational function on P n. Q: What are the critical points of this function?

Open Problems inalgebraic Statistics p. Algebraic Statistical Models are subvarieties of the projective space P n. The maximum likelihood degree of a model M is the number of critical points of the restriction to M of L = p 0 u 0 p 1 u 1 p 2 u 2 p n u n (p 0 +p 1 + +p n ) u 0+u 1 + +u n Here we only count critical points that are not poles or zeros, and u 0,u 1,...,u n are assumed to be generic. If M is smooth and the divisor of L is normal crossing then there is a formula for the ML degree. [Catanese-Hoşten-Khetan-St.] Otherwise, use resolution of singularities????

Open Problems inalgebraic Statistics p. 1 Determinantal Varieties are models M that are specified by imposing rank conditions on a matrix of unknowns. ( ) p00 p Example. 2 2-matrices 01 of rank 1. p 10 p 11 Independence for two binary random variables. The ML degree is one, i.e., the MLE is a rational function in the data (Take normalized row and column sums) Example. 3 3-matrices (p ij ) of rank 2. Mixture of two ternary random variables. The ML degree is ten. Open Problem. Determine the ML degree of the variety of m n-matrices of rank r.

Open Problems inalgebraic Statistics p. 1 The 100 Swiss Francs Problem Our data are two aligned DNA sequences... ATCACCAAACATTGGGATGCCTGTGCATTTGCAAGCGGCT ATGAGTCTTAAACGCTGGCCATGTGCCATCTTAGACAGCG... test the hypothesis that these two sequences were generated by DiaNA using one biased coin and four tetrahedral dice... [ASCB, Example 1.16] Model: Positive 4 4-matrices (p ij ) of rank 2. 3 3 2 2 True or False: The matrix (ˆp ) ij = 1 3 3 2 2 40 2 2 3 3 2 2 3 3 maximizes L = ( i p ii ) 4 ( i j p ij ) 2 ( i,j p ij ) 40.

Open Problems inalgebraic Statistics p. 1 Gaussian CI Models A Concrete Problem: Which sets of almost-principal minors can be zero for a positive definite symmetric 5 5-matrix? [Matùš, Studený, Drton, Sullivant, Parrilo,...] The General Problem: Study the geometry of conditional independence models for multivariate Gaussian random variables. Which semigraphoids can be realized by Gaussians?

Open Problems inalgebraic Statistics p. 1 Almost-Principal Minors A multivariate Gaussian with mean zero is specified by its covariance matrix Σ = (σ ij ). This is a positive definite n n-matrix: all principal minors are positive. An almost-principal minor of Σ has row indices {i} K and column indices {j} K for some K [n] and i,j [n]\k. It is denoted [i j K ]. Lemma. The determinant [i j K ] is zero if and only if the random variable i is independent of the random variable j given the joint random variable K.

Open Problems inalgebraic Statistics p. 1 Subvarieties of the PSD cone Let PSD n denote the ( ) n+1 2 -dimensional cone of positive semidefinite symmetric n n-matrices. A Gaussian conditional independence model is a subvariety of PSD n defined by equations [i j K ] = 0. In algebraic geometry, we would study these varieties over the complex numbers. Here we are interested in their real locus, and how it intersects the cone PSD n.

Open Problems inalgebraic Statistics p. 1 A Cyclic Example Let n = 5 and consider the CI model given by [ 1 2 {3} ] = σ 12 σ 33 σ 13 σ 23 [ 2 3 {4} ] = σ 23 σ 44 σ 24 σ 34 [ 3 4 {5} ] = σ 34 σ 55 σ 35 σ 45 [ 4 5 {1} ] = σ 45 σ 11 σ 14 σ 15 [ 5 1 {2} ] = σ 15 σ 22 σ 25 σ 12 This variety has two irreducible components σ 12,σ 23,σ 34,σ 45,σ 15 a toric variety of degree 31, which intersects PSD 5 only in the set of rank one matrices. Corollary: For Gaussians, the five given statements imply [ 1 2 ], [ 2 3 ], [ 3 4 ], [ 4 5 ], [ 5 1 ].

Open Problems inalgebraic Statistics p. 1 The Entropy Map The submodular cone SubMod n is solution set of the following system of linear inequalities in R 2n : H {i} K + H {j} K H {i,j} K + H K Taking the logarithms of all 2 n principal minors of any positive definite matrix defines the entropy map H : SDP n SubMod n Proposition. The Gaussian CI models are the inverse images of the faces of the polyhedral cone SubMod n. Problem: Characterize the image of H.

Open Problems inalgebraic Statistics p. 1 Conclusion This talk presented three concrete open problems: 1. Consider the variety of 4 4 4-tables of tensor rank at most 4. Do the known polynomial invariants of degree five and nine suffice to cut out this variety? 2. Characterize all projective varieties whose maximum likelihood degree is equal to one. 3. Which sets of almost-principal minors can be zero for a positive definite symmetric 5 5-matrix?

Open Problems inalgebraic Statistics p. 1 Bonus Problem: Rational Points Consider n discrete random variables X 1,X 2,...,X n with d 1,d 2,...,d n states. Any collection of CI statements defines a determinantal variety in the space of tables, C d 1 C d 2 C d n. The corresponding strict CI variety is the set of tables for which the given CI statements hold but all other CI statement do not hold. Problem: Does every strict CI variety have a Q-rational point? What if d 1,d 2,...,d n 0? [Matúš: Final Conclusions, CPC 8 (1999) p. 275]