Algebraic Complexity in Statistics using Combinatorial and Tensor Methods

BY ELIZABETH GROSS

THESIS

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Mathematics in the Graduate College of the University of Illinois at Chicago, 2013

Chicago, Illinois

Defense Committee: Shmuel Friedland, Chair and Advisor; Sonja Petrović, Advisor, Penn State; Jan Verschelde; Olga Kashcheyeva; Lek-Heng Lim, University of Chicago

To Ryan and Sebastian.

ACKNOWLEDGMENTS

Thank you to my advisor, Sonja Petrović, whose dedicated mentoring has prepared me for life as a mathematician. Sonja has been an excellent advisor, always challenging me to do better and always believing I could. I will continually be grateful for her guidance and knowledge.

Thank you to my committee, Shmuel Friedland, Jan Verschelde, Olga Kashcheyeva, and Lek-Heng Lim. Jan and Olga have been supportive through their active participation in the Graduate Computational Algebraic Geometry Seminar. Shmuel has been a second advisor to me, and I have enjoyed our many conversations. I am grateful to Jan for his early mentoring and for introducing me to numerical algebraic geometry.

I've been very lucky to have multiple people who have gone above and beyond in helping me succeed; of these people, Mathias Drton, Bernd Sturmfels, and Seth Sullivant deserve special recognition.

I am grateful to my fellow classmates at UIC and my colleagues from SFSU. It has been inspiring to watch everyone's triumphs and wonderful to be part of such a vibrant and stimulating department. I am also grateful for the support of my friends and family, especially my father.

Finally, I would like to thank Ryan. Ryan, you are a loving and devoted husband and father. You keep me grounded when I am in danger of floating away and soaring when I am leaden. Together, we have a beautiful life. Thank you for all your help.

TABLE OF CONTENTS

1 INTRODUCTION
2 BACKGROUND
2.1 Models, Ideals, and Varieties
2.2 Toric models, Markov bases and Markov complexity
2.3 Phylogenetic Models
2.4 Tensors, Rank, and Border Rank
2.5 Maximum Likelihood Estimation
3 TORIC IDEALS OF HYPERGRAPHS
3.1 Introduction
3.2 Preliminaries and notation
3.3 Splitting sets and reducible edge sets
3.4 Indispensable Binomials
3.5 General degree bounds
3.6 Hidden Subset Models
4 PHYLOGENETIC MODELS AND TENSORS OF BOUNDED RANK
4.1 Introduction
4.2 A characterization of $V_4(3, 3, 4)$
4.3 Proving case A.I.3 using degree 6 polynomials
4.3.1 The case L = R = e_3 e_3
4.3.2 The case L = e_3 e_3, R = e_3 e_2
4.4 The defining polynomials of $V_4(4, 4, 4)$
5 MAXIMUM LIKELIHOOD DEGREE OF VARIANCE COMPONENT MODELS
5.1 Introduction
5.2 The likelihood equations
5.2.1 Maximum likelihood
5.2.2 Restricted maximum likelihood
5.3 Proof of formula for ML degree
5.4 Proof of formula for REML degree
5.5 Linear mixed models with multimodal likelihood functions
6 CONCLUSION
CITED LITERATURE
VITA

LIST OF FIGURES

1 Reducible balanced edge set. The green edge $e_s$ is the separator.
2 Reducible balanced edge set with an improper separator. The separator consists of green edges $e_1$ and $e_2$.
3 Hypergraph associated to the hierarchical log-linear model for no 3-way interaction.
4 Example of a non-uniform hypergraph whose associated toric ideal is non-homogeneous.
5 Case 1. Proof of Theorem 32.
6 Case 3. Proof of Theorem 32.

SUMMARY

Fundamental questions in statistical modeling ask about the best methods for model selection, goodness-of-fit testing, and estimation of parameters. For example, given a collection of aligned DNA sequences from a group of extant species, how can we decide which evolutionary tree best describes the species' ancestral history, or, given a sparse high-dimensional contingency table, how can we perform goodness-of-fit testing when exact tests are infeasible? In questions such as these, combinatorics, commutative algebra and algebraic geometry play a leading role. We explore such questions for specific classes of models, e.g. toric models, phylogenetic models, and variance components models, and tackle the algebraic complexity problems that lie at the root of them.

We begin our exploration by studying toric ideals of hypergraphs, algebraic objects that are used for goodness-of-fit testing for log-linear models. In this study, we use the combinatorics of hypergraphs to give degree bounds on the generators of the ideals, give sufficient conditions for a binomial in the ideal to be indispensable, show that the ideal of $\mathrm{Tan}((\mathbb{P}^1)^n)$ is generated by quadratics and cubics in cumulant coordinates, and recover a well-known complexity theorem in algebraic statistics due to De Loera and Onn.

Second, we explore phylogenetic models by viewing the models as sets of tensors with bounded rank. We show that the variety of $4 \times 4 \times 4$ complex-valued tensors with border rank at most 4 is defined by polynomials of degree 5, 6, and 9. This variety corresponds to the 4-state general Markov model on the claw tree $K_{1,3}$ and its defining polynomials can be used in model selection. This result also gives further evidence that the phylogenetic ideal of the model can be generated by polynomials of degree 9 and less.

Finally, we look at the algebraic complexity of maximum likelihood estimation for variance components models, where we give explicit formulas for the ML and REML degree of the random effects model for the one-way layout and give examples of multimodal likelihood surfaces.

CHAPTER 1 INTRODUCTION

Algebraic statistics applies commutative algebra, algebraic geometry, and combinatorics to problems arising in statistics (for surveys see [22] and [25]). In this field of study, the idea of complexity comes up in several different respects. This thesis will explore three different complexity issues: Markov complexity for toric models, phylogenetic complexity for phylogenetic models, and maximum likelihood degree for variance components models. The first two of these are both defined as the maximum degree of polynomials in a minimal generating set of a specified ideal. The last measure of algebraic complexity, the maximum likelihood degree, is the degree of a zero-dimensional variety.

A statistical model is a family of probability distributions where joint probabilities are often specified parametrically. If the joint probabilities, or, more commonly, their logarithms, are parameterized by polynomials, then the closure of the model is an algebraic variety. The underlying idea of algebraic statistics is that information about the variety yields statistical information about the model. For example, the generators of the vanishing ideal of a statistical model are useful in goodness-of-fit testing and model selection [19] [9]. These generators are called model invariants or, in the case of log-linear models, Markov bases.

Part I of this dissertation is concerned with providing an efficient description of the model invariants for two different classes of discrete models, those encoded by hypergraphs and those specified as tensors of bounded border rank.

In Chapter 3, we focus on statistical models that are parameterized by square-free monomials. Examples of such models include log-linear models [24] [22], group-based phylogenetic models [58], and some hidden subset models [59]. In these cases, the parameterization can be encoded by a hypergraph $H$. The model invariants are the generators of the toric ideal of the hypergraph $H$, the kernel of the monomial map defined by the vertex-edge incidence matrix of $H$. In the context of algebraic statistics, a binomial generating set of a toric ideal is called a Markov basis. The Markov complexity, or Markov width, of a toric ideal is the maximum degree of the polynomials in a minimal generating set.

Section 3.4 focuses on indispensable binomials, i.e. binomials that are members of every minimal generating set of $I_H$. The degree of an indispensable binomial gives a lower bound on the Markov complexity of $I_H$. Proposition 19 gives a combinatorial sufficient condition for determining whether a binomial $f \in I_H$ is indispensable. Consequently, the Graver basis is the unique minimal generating set of $I_H$ for any 2-regular hypergraph (Proposition 20). In Corollary 26, we apply our combinatorial lower bounds to recover a well-known complexity result in algebraic statistics from [16], which states that Markov bases for the no 3-way interaction model on $3 \times r \times c$ contingency tables are arbitrarily complicated.

In Section 3.5, we show that a degree bound on the generators of $I_H$ is equivalent to a combinatorial criterion on $H$ (see Theorem 27). This result generalizes work of Villarreal [65] and Ohsugi and Hibi [47] regarding the toric ideals of graphs and offers a way of computing an upper bound for the Markov complexity of a given statistical model.

As an example, we consider hidden subset models. In [59], Sturmfels and Zwiernik show that the variety of the hidden subset model for $n$ random variables with subsets $\{\{1\}, \{2\}, \ldots, \{n\}\}$ is isomorphic to the image of the first tangential variety $\mathrm{Tan}((\mathbb{P}^1)^n)$ in cumulant coordinates. Using the hypergraph approach, we are able to show that in cumulant coordinates, the defining ideal of the first tangential variety $\mathrm{Tan}((\mathbb{P}^1)^n)$ is generated in quadratics and cubics. Thus, as $n$ grows, the Markov complexity of the hidden subset model associated to $\mathrm{Tan}((\mathbb{P}^1)^n)$ remains constant.

In Chapter 4, we study the 4-state general tree-based Markov model on the claw tree $K_{1,3}$; this model has applications to phylogenetics [1]. In terms of multilinear algebra, the variety of this model is the set of all tensors in $\mathbb{C}^4 \otimes \mathbb{C}^4 \otimes \mathbb{C}^4$ of border rank at most 4. Thus, we shift from the combinatorics of hypergraphs to the study of three-way tensors in this chapter.

For phylogenetic models, the vanishing ideal associated to the model is referred to as a phylogenetic ideal, and the maximum degree of the polynomials in a minimal generating set is called the phylogenetic complexity of the model. For the 4-state general Markov model on $K_{1,3}$, the phylogenetic complexity is conjectured to be 9 (see Conjecture 34); this was first conjectured in [57] and agrees with the numerical computations in [6]. In [26], Friedland shows that the variety associated to the model is cut out set-theoretically by polynomials of degree 5, 9, and 16. The goal of Chapter 4 is to replace the degree 16 equations with a set of degree 6 equations that are known to be in the ideal [39]. In Theorem 33, we tighten the results of Friedland in [26] and show that the variety of tensors in $\mathbb{C}^4 \otimes \mathbb{C}^4 \otimes \mathbb{C}^4$ of border rank at most 4 is cut out by polynomials of degree 5, 6, and 9. In addition to providing supporting evidence for Conjecture 34, this result, combined with results in [26] and [6], gives explicit polynomials that one can use to test whether a tensor in $\mathbb{C}^4 \otimes \mathbb{C}^4 \otimes \mathbb{C}^4$ has border rank at most four, or equivalently, whether given data could have arisen from the 4-state general Markov model on the claw tree $K_{1,3}$.

Chapter 5 is dedicated to a different kind of complexity problem: that of maximum likelihood estimation. In particular, we study the algebraic complexity, or ML degree, of maximum likelihood estimation for specific models. The ML degree gives us insight into the geometry of the likelihood surface. For example, it speaks to the total possible number of modes of the likelihood surface. Since common maximum likelihood estimation techniques use local numerical methods to find maxima, asking whether a given likelihood function could have more than one local maximum is an important, but often overlooked, question in statistics.

The goal of maximum likelihood estimation is to find parameters that best explain a given data point. This amounts to finding the zero set of the likelihood equations. This zero set is also referred to as the likelihood locus and has been studied within an algebraic geometry setting in [10] and [33]. The number of complex solutions to the system of likelihood equations for generic data is called the maximum likelihood degree (ML degree); it is the degree of the likelihood locus. It quantifies the feasibility of using symbolic algebraic methods to find maximum likelihood estimates and gives an upper bound on the number of modes of the real likelihood surface.

Another statistical method for finding likely parameters given an array of observations is restricted maximum likelihood estimation, a variation of maximum likelihood estimation. The REML method involves considering a projection of the observed array; for the one-way layout with random effects this method returns a likelihood function dependent on only two of the three parameters. In the case of restricted maximum likelihood estimation, the REML degree of a model is defined similarly to the ML degree: it is the number of complex solutions to the restricted maximum likelihood equations for generic data. To our knowledge, the REML degree has never been studied for any statistical model.

In Chapter 5, we give explicit formulas for the ML and REML degree for variance components models, specifically one-way layouts with random effects. We conclude Chapter 5 with two examples of multimodal likelihood functions.

CHAPTER 2 BACKGROUND

In this dissertation, we use a common shorthand notation for monomials using vectors. Let $x = (x_1, \ldots, x_N)$ be a vector of indeterminates and let $a \in \mathbb{Z}^N_{\geq 0}$ be a non-negative integer vector. Then we denote $x^a = x_1^{a_1} x_2^{a_2} \cdots x_N^{a_N}$.

2.1 Models, Ideals, and Varieties

In Part I, we will be concerned with discrete statistical models. The following notation and definitions follow [22] and [60]. In the case of discrete models, we can think of the joint probability distribution of $m$ random variables as an $m$-dimensional array $P$. Let $X_1, \ldots, X_m$ be discrete random variables with $X_l \in [r_l]$. Let $\mathcal{R} = \prod_{l=1}^{m} [r_l]$ and $N = \prod_{l=1}^{m} r_l$. The $(i_1, i_2, \ldots, i_m)$th entry of $P$ is the joint probability $P(X_1 = i_1, \ldots, X_m = i_m)$. For simplicity, we will often flatten the tensor $P$ into a vector $p$:

Definition 1 The joint probability vector for $X_1, \ldots, X_m$ is the $N$-dimensional vector $p = (p_i \mid i \in \mathcal{R}) \in \mathbb{R}^N$ where $p_i$ is the joint probability $p_i = p_{i_1 \ldots i_m} = P(X_1 = i_1, \ldots, X_m = i_m)$, $i = (i_1, \ldots, i_m) \in \mathcal{R}$.
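The monomial shorthand and the flattening of $P$ into $p$ can be illustrated with a small sketch. The helper names `index_set` and `monomial`, the lexicographic ordering of the index set, and the two binary marginals are assumptions made for illustration only; they are not part of the thesis.

```python
from itertools import product
from math import prod

def index_set(sizes):
    """All index tuples (i_1, ..., i_m) with 1 <= i_l <= r_l, in lexicographic order."""
    return list(product(*(range(1, r + 1) for r in sizes)))

def monomial(x, a):
    """Evaluate the shorthand x^a = x_1^{a_1} * ... * x_N^{a_N}."""
    return prod(xi ** ai for xi, ai in zip(x, a))

# Hypothetical example: two independent binary random variables.
sizes = (2, 2)                      # r_1 = r_2 = 2, so N = 4
marg1 = {1: 0.3, 2: 0.7}            # assumed marginal of X_1
marg2 = {1: 0.6, 2: 0.4}            # assumed marginal of X_2

R = index_set(sizes)                # the index set, one tuple per table cell
p = [marg1[i] * marg2[j] for (i, j) in R]   # flattened joint probability vector
```

Here `p` has $N = 4$ coordinates, one for each index in `R`, and its entries are non-negative and sum to one.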

Since we are concerned with probability distributions, the coordinates of $p$ satisfy $p_i \geq 0$ for all $i \in \mathcal{R}$ and the constraint $\sum_{i \in \mathcal{R}} p_i = 1$. The set of all $p$ that satisfy these constraints forms an $(N-1)$-dimensional simplex.

Definition 2 The probability simplex $\Delta_{N-1}$ is the set of all possible joint probability distributions for $m$ random variables with respective state spaces $[r_1], \ldots, [r_m]$:

$\Delta_{N-1} := \{p \in \mathbb{R}^N \mid p_i \geq 0 \text{ for all } i \in \mathcal{R} \text{ and } \sum_{i \in \mathcal{R}} p_i = 1\}$.

A statistical model $\mathcal{M}$ for the random variables $X_1, \ldots, X_m$ is a subset of $\Delta_{N-1}$. The models we consider in this dissertation are parametric statistical models.

Definition 3 Let $\Theta \subseteq \mathbb{R}^d$ be a parameter space and $\phi : \Theta \to \Delta_{N-1}$ a map. The image $\mathcal{M} = \phi(\Theta) \subseteq \Delta_{N-1}$ is a parametric statistical model.

In algebraic statistics, we are concerned with parametric statistical models where $\phi$ is a rational map. In this case, it is natural to consider $\phi$ as a map $\mathbb{C}^d \to \mathbb{C}^N$. The advantage of this view is that the image $\phi(\mathbb{C}^d)$ is well-approximated by a variety, namely its Zariski closure (see [60][Theorem 3.6] and the discussion following it). For completeness, we define here what we mean by variety, ideal of a subset of $\mathbb{C}^N$, and Zariski closure.

In the following discussion we work over the ring $\mathbb{R}[p]$, treating the joint probabilities $p_i$ for $i \in \mathcal{R}$ as indeterminates.

Definition 4 Let $F$ be a collection of polynomials in $\mathbb{R}[p]$. The variety of $F$ is the zero set of $F$:

$V(F) := \{x \in \mathbb{C}^N \mid f(x) = 0 \text{ for all } f \in F\}$.

Notice that in the above definition the variety $V(F)$ may not be irreducible; in general, we will use variety to mean an algebraic set. Let $S \subseteq \mathbb{C}^N$. We define the ideal of $S$ as

$I(S) := \{f \in \mathbb{R}[p] \mid f(x) = 0 \text{ for all } x \in S\}$.

Given a variety $V \subseteq \mathbb{C}^N$, the ideal $I(V)$ is the set of all polynomials that vanish on $V$. If $V = V(J)$ for an ideal $J$, then $J$ is called a set of defining equations for $V$. Notice that, in many cases, $V = V(J)$ does not imply $I(V) = J$; this implication is only true when $J$ is radical. Thus, when we say that $J$ defines a variety $V$ set-theoretically, this means $V = V(J)$ but not necessarily $I(V) = J$.

As stated above, we are often interested in the Zariski closure of the image of a rational map. The Zariski closure of $S$ is $V(I(S))$; this is the smallest variety that contains $S$. For a parametric model with a polynomial parameterization $\phi$, the ideal of the model will be denoted by $I_{\mathcal{M}} := I(\mathrm{Im}\, \phi)$, and the variety of the model by $V_{\mathcal{M}} := V(I_{\mathcal{M}})$.

Chapters 3 and 4 are motivated by parametric statistical models with polynomial parameterizations. Our main goal is to understand the complexity of the implicitization problem, i.e. finding the generators of $I_{\mathcal{M}}$.

2.2 Toric models, Markov bases and Markov complexity

Let $\mathbb{C}^* = \mathbb{C} \setminus \{0\}$. If the map $\phi : \mathbb{C}^d \to \mathbb{C}^N$ in Definition 3 is monomial, then the parametric statistical model $\mathcal{M} = \phi((\mathbb{C}^*)^d) \cap \Delta_{N-1}$ is called a toric model. Toric models are generally referred to as log-linear models in the statistical literature.

In statistics, for discrete data, observations are recorded in contingency tables: $m$-dimensional arrays $T$ where the $(i_1, \ldots, i_m)$th entry is the number of times the random vector $(X_1, \ldots, X_m)$ was observed in state $(i_1, \ldots, i_m)$. A Markov basis is a set of integer vectors (moves) that connects the set of all tables with the same sufficient statistics as $T$, for every $T$ in the sample space (the sufficient statistics are determined by the model; see [22]). Given a toric model $\mathcal{M}$, the Fundamental Theorem of Markov Bases, which originally appeared in [19], establishes a one-to-one correspondence between Markov bases and binomial generating sets of $I_{\mathcal{M}}$. The ideal $I_{\mathcal{M}}$ is a toric ideal, which is well-known to be a binomial ideal (see [56]). Since the focus of this dissertation is more algebraic than statistical, we will use the algebraic definition of Markov bases.
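As a concrete illustration of a Markov move, consider the independence model on $2 \times 2$ contingency tables, whose sufficient statistics are the row and column sums. The following is a minimal sketch; the function names and the example table are assumptions for illustration. The move used here corresponds to the binomial $p_{11} p_{22} - p_{12} p_{21}$.

```python
def margins(table):
    """Row sums and column sums: the sufficient statistics of the independence model."""
    rows = tuple(sum(r) for r in table)
    cols = tuple(sum(c) for c in zip(*table))
    return rows, cols

def apply_move(table, move):
    """Add an integer move to a table; return None if an entry would become negative."""
    new = [[t + m for t, m in zip(t_row, m_row)] for t_row, m_row in zip(table, move)]
    if any(x < 0 for row in new for x in row):
        return None
    return new

# The basic move for 2 x 2 tables; it encodes p_11 p_22 - p_12 p_21.
MOVE = [[1, -1], [-1, 1]]

table = [[2, 1], [1, 3]]          # hypothetical observed table
moved = apply_move(table, MOVE)   # a different table with the same margins
```

Applying the move changes the table but leaves both margins fixed, so the new table lies in the same fiber as the original one.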

Definition 5 Let $\mathcal{M}$ be a toric model. Let $F \subset \mathbb{R}[p]$ such that each $f \in F$ is a binomial. If $F$ generates the ideal $I_{\mathcal{M}}$, then $F$ is a Markov basis of $\mathcal{M}$. The Markov complexity of a toric model $\mathcal{M}$ is the maximum degree of all polynomials in a minimal generating set of $I_{\mathcal{M}}$.

In Section 3.4, we use Graver bases to bound the Markov complexity for certain toric models. The Graver basis of a toric ideal $I$ is the set of all binomials $p^u - p^v \in I$ such that there is no other binomial $p^w - p^z \in I$ such that $p^w$ divides $p^u$ and $p^z$ divides $p^v$. Binomials in the Graver basis are called primitive. The Graver complexity of a toric model $\mathcal{M}$ is the maximum degree of all polynomials in the Graver basis of $I_{\mathcal{M}}$. Since the Graver basis of an ideal $I$ contains the universal Gröbner basis of $I$ [56], we see that the Graver basis is a generating set of $I$, and thus,

Markov complexity of $\mathcal{M}$ $\leq$ Graver complexity of $\mathcal{M}$.

2.3 Phylogenetic Models

A problem that arises in computational biology and has been studied extensively in algebraic statistics is the problem of inferring phylogenetic trees given aligned DNA sequences of several living species. Algebraic methods for inferring phylogenetic trees were first proposed independently by Lake in [37] and Cavender and Felsenstein in [13]. The methods have been more recently explored by Casanellas and Fernández-Sánchez in [14]. The models used in these studies are hidden Markov models, parametric statistical models with a polynomial parameterization.

In a hidden Markov model, evolution is assumed to proceed along a directed tree with all edges moving away from the root. In the tree, each node corresponds to a species. The leaves correspond to extant species while the internal nodes represent extinct species. Common hidden Markov models used in phylogenetics are the 2-state and 4-state models. In the 4-state model, each internal node is a hidden random variable and each leaf is an observed random variable; the state space for each random variable $X_i$, hidden and observed, is $\{1, 2, 3, 4\}$, which corresponds to the nucleic bases $\{A, C, G, T\}$. Each edge $(i, j)$ is assigned a $4 \times 4$ transition matrix whose $(k, l)$th entry is the conditional probability $P(X_j = l \mid X_i = k)$. In the general Markov model, the model we explore in Chapter 4, the only constraint on the transition matrices is that the rows sum to 1.

We will refer to the four-state general Markov model on the tree $T$ as $\mathcal{M}_T$. The ideal $I_{\mathcal{M}_T}$ is referred to as a phylogenetic ideal and its generators are called phylogenetic invariants.

Definition 6 The phylogenetic complexity of a phylogenetic model $\mathcal{M}_T$ is the maximum degree over all polynomials in a minimal generating set of $I_{\mathcal{M}_T}$.

When the tree $T$ is a bifurcating tree, results in [21] and [2] state that all phylogenetic invariants of $\mathcal{M}_T$ can be obtained from the phylogenetic invariants of $\mathcal{M}_{K_{1,3}}$ where $K_{1,3}$ is the 3-leaf claw tree. Thus, it suffices to understand $I_{\mathcal{M}_{K_{1,3}}}$.

Proposition 7 [22][Proposition 4.1.11] Let $\mathcal{M}_0$ be the 4-state general Markov model on the claw tree $K_{1,3}$. Then $V_{\mathcal{M}_0}$ is isomorphic to $V_4(4, 4, 4)$, the set of all complex-valued $4 \times 4 \times 4$ tensors with border rank less than or equal to 4.

Thus, by Proposition 7, in order to understand $V_{\mathcal{M}_0}$, we need to be able to understand the set of all complex-valued $4 \times 4 \times 4$ tensors with border rank less than or equal to 4.

2.4 Tensors, Rank, and Border Rank

This section uses terminology and definitions from [26], [38], and [41]. In Chapter 4 we focus on elements of $\mathbb{C}^m \otimes \mathbb{C}^n \otimes \mathbb{C}^l$, equivalently, three-way complex-valued tensors of dimension $m \times n \times l$. We will take a coordinate-based perspective, considering a tensor $T \in \mathbb{C}^m \otimes \mathbb{C}^n \otimes \mathbb{C}^l$ as an array $T = [t_{i,j,k}]_{i=j=k=1}^{m,n,l} \in \mathbb{C}^{m \times n \times l}$ whose $(i, j, k)$th entry is $t_{i,j,k}$. Coordinate representations of tensors are also referred to as hypermatrices in order to call attention to the fact that they are equipped with algebraic operations arising from the algebraic structure of $\mathbb{C}^m \otimes \mathbb{C}^n \otimes \mathbb{C}^l$ rather than just data structures (see [41]).

Just as for matrices, we can define a notion of rank for tensors that is independent of the choice of bases for the vector spaces $\mathbb{C}^m$, $\mathbb{C}^n$, $\mathbb{C}^l$. This rank definition is also sometimes referred to as the outer product rank.

Definition 8 A three-way tensor $T \in \mathbb{C}^m \otimes \mathbb{C}^n \otimes \mathbb{C}^l$ is a rank one tensor if it can be written as the outer product of three vectors $u \in \mathbb{C}^m$, $v \in \mathbb{C}^n$, $w \in \mathbb{C}^l$, i.e. $T = u \otimes v \otimes w$. The $(i, j, k)$th element of $T$ is $u_i v_j w_k$.
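The entrywise description of a rank one tensor in Definition 8 translates directly into code. The following is a minimal sketch using nested lists; the function name `outer3` and the example vectors are assumptions for illustration, and indices in the code are 0-based whereas the text uses 1-based indices.

```python
def outer3(u, v, w):
    """Outer product of three vectors: entry (i, j, k) equals u_i * v_j * w_k."""
    return [[[ui * vj * wk for wk in w] for vj in v] for ui in u]

# Hypothetical example with m = 2, n = 3, l = 2.
u = [1, 2]
v = [1, 0, -1]
w = [3, 1]
T = outer3(u, v, w)   # a 2 x 3 x 2 rank one tensor (u, v, w are all nonzero)
```

A tensor of rank at most $r$ is then a sum of $r$ such arrays, added entry by entry.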

Definition 9 The rank of a non-zero tensor $T \in \mathbb{C}^m \otimes \mathbb{C}^n \otimes \mathbb{C}^l$, denoted $\operatorname{rank} T$, is the minimal number $r$ such that there exist $u_i \in \mathbb{C}^m$, $v_i \in \mathbb{C}^n$, $w_i \in \mathbb{C}^l$ for $1 \leq i \leq r$ such that

$T = \sum_{i=1}^{r} u_i \otimes v_i \otimes w_i$.

While the set of all matrices in $\mathbb{C}^m \otimes \mathbb{C}^n$ with rank at most $r$ is closed with respect to the Zariski topology, the corresponding set of tensors in $\mathbb{C}^m \otimes \mathbb{C}^n \otimes \mathbb{C}^l$ is not necessarily closed. Thus, we introduce the following notion of border rank as described in [38].

Definition 10 A tensor $T$ has border rank $r$ if it is a limit of tensors of rank $r$ but is not a limit of tensors of rank $s$ for any $s < r$. In this case, we write $\operatorname{brank} T = r$.

We use $V_r(m, n, l)$ to denote the set of all tensors in $\mathbb{C}^m \otimes \mathbb{C}^n \otimes \mathbb{C}^l$ of border rank at most $r$. The set $V_r(m, n, l)$ is a closed irreducible variety whose projectivization is the $r$th secant variety of $\mathbb{P}^{m-1} \times \mathbb{P}^{n-1} \times \mathbb{P}^{l-1}$. Chapter 4 is concerned with determining defining equations of $V_4(4, 4, 4)$, the variety associated to the 4-state general Markov model on the 3-leaf claw tree.

In Chapter 4, many of the results are phrased in terms of the slices of a tensor $T \in \mathbb{C}^m \otimes \mathbb{C}^n \otimes \mathbb{C}^l$. Slices are matrices obtained from a tensor $T = [t_{i,j,k}]_{i=j=k=1}^{m,n,l}$ by fixing one of the three indices: a 1-slice or horizontal slice is obtained by fixing the 1st index, a 2-slice or lateral slice is obtained by fixing the 2nd index, and a 3-slice or frontal slice is obtained by fixing the 3rd index. A tensor $T \in \mathbb{C}^m \otimes \mathbb{C}^n \otimes \mathbb{C}^l$ has $m$ horizontal slices, $n$ lateral slices, and $l$ frontal slices. We denote slices by $T_{q,p}$, where $p \in \{1, 2, 3\}$ indicates whether it is a 1-slice, 2-slice, or 3-slice, and $q$ gives the value of the fixed index. For example, $T_{1,3} = [t_{i,j,1}]_{i,j=1}^{m,n}$.
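The slice notation $T_{q,p}$ can be made concrete with a short sketch. The function name `slice_of` and the example tensor are assumptions for illustration; `q` is 1-based to match the text, and the example entries are chosen so that the entry at position $(i, j, k)$ is the number $100i + 10j + k$, which makes the slices easy to read off.

```python
def slice_of(T, q, p):
    """Return the slice T_{q,p}: fix the p-th index (p in {1, 2, 3}) at the 1-based value q."""
    m, n, l = len(T), len(T[0]), len(T[0][0])
    if p == 1:   # horizontal slice, an n x l matrix
        return [[T[q - 1][j][k] for k in range(l)] for j in range(n)]
    if p == 2:   # lateral slice, an m x l matrix
        return [[T[i][q - 1][k] for k in range(l)] for i in range(m)]
    if p == 3:   # frontal slice, an m x n matrix
        return [[T[i][j][q - 1] for j in range(n)] for i in range(m)]
    raise ValueError("p must be 1, 2, or 3")

# Hypothetical 2 x 3 x 2 tensor whose (i, j, k) entry is 100*i + 10*j + k.
T = [[[100 * i + 10 * j + k for k in (1, 2)] for j in (1, 2, 3)] for i in (1, 2)]

frontal_1 = slice_of(T, 1, 3)   # the slice T_{1,3}
```

In this example the first frontal slice collects exactly the entries whose last index is 1.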

One way to understand the rank of a tensor is to understand the span of its frontal slices (or horizontal or lateral slices). Let the span of the frontal slices be denoted

$\mathbf{T}_3(T) := \operatorname{span}(T_{1,3}, \ldots, T_{l,3}) \subseteq \mathbb{C}^{m \times n}$.

The following theorem from [26] states the connection between $\mathbf{T}_3(T)$ and the rank of $T$.

Theorem 11 [26][Theorem 2.1] Let $T \in \mathbb{C}^m \otimes \mathbb{C}^n \otimes \mathbb{C}^l$. Then $\operatorname{rank} T$ is the minimal dimension of a subspace $U \subseteq \mathbb{C}^{m \times n}$ that contains $\mathbf{T}_3(T)$ and is spanned by rank one matrices.

Theorem 11 is used in Chapter 4 as we show that $V_4(4, 4, 4)$ is cut out by polynomials of degree 5, 6, and 9.

2.5 Maximum Likelihood Estimation

In Chapter 5 we turn our attention towards algebraic complexity problems that arise in maximum likelihood estimation. Maximum likelihood estimation is a statistical method for estimating the most likely parameters of a probability density function given a set of observed data (e.g. a contingency table) and a statistical model. Let $\mathcal{M} = \{f(\cdot \mid \theta) \mid \theta \in \Theta\}$ be a parametric statistical model with parameters $\theta = (\theta_1, \ldots, \theta_m)$. Since the models we explore in Chapter 5 are continuous, we change our notation slightly from the preceding sections and denote a joint probability density function in $\mathcal{M}$ as $f(\cdot \mid \theta)$.

In maximum likelihood estimation, one assumes that observed data are independently and identically distributed according to a probability density function $f(\cdot \mid \theta_0) \in \mathcal{M}$. Thus, given a sample of $n$ observations $x_1, \ldots, x_n$, the goal is to find the best estimate of $\theta_0$. This amounts to maximizing the likelihood function. The likelihood function is the probability of observing $x_1, \ldots, x_n$ given $\theta_0 = \theta$; that is, it is a function of the parameters $\theta$:

$L(\theta \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i \mid \theta)$.

The maximum likelihood estimator (MLE) of $\theta_0$, denoted $\hat{\theta}$, is the value of the parameters $\theta$ that maximizes $L(\theta \mid x_1, \ldots, x_n)$. Since in many cases the logarithm of the likelihood function is easier to analyze, in statistics, we often consider the log-likelihood function:

$\ell(\theta \mid x_1, \ldots, x_n) = \sum_{i=1}^{n} \ln f(x_i \mid \theta)$.

If $\hat{\theta}$ is a maximum of the log-likelihood function, then $\hat{\theta}$ is the MLE for $L(\theta \mid x_1, \ldots, x_n)$. We will use the log-likelihood function in Chapter 5.

An interior maximum of $\ell(\theta \mid x_1, \ldots, x_n)$ occurs at a point where all its first partial derivatives are zero. The likelihood equations and the log-likelihood equations are the equations

$\left\{ \frac{\partial L}{\partial \theta_i} = 0,\ i = 1, \ldots, m \right\}$ and $\left\{ \frac{\partial \ell}{\partial \theta_i} = 0,\ i = 1, \ldots, m \right\}$,

respectively. In Chapter 5 we use likelihood equations to mean the log-likelihood equations.
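To see the definitions at work on a model simpler than those treated in this thesis, take the exponential density $f(x \mid \theta) = \theta e^{-\theta x}$ with a single parameter $\theta > 0$. The log-likelihood is $\ell(\theta) = n \ln \theta - \theta \sum_i x_i$, and the single log-likelihood equation $n/\theta - \sum_i x_i = 0$ has exactly one solution, $\hat{\theta} = n / \sum_i x_i$. The sketch below checks this numerically; the choice of model, the function names, and the sample values are assumptions made for illustration.

```python
import math

def log_likelihood(theta, xs):
    """Log-likelihood of the exponential model: sum of ln(theta * exp(-theta * x))."""
    return sum(math.log(theta) - theta * x for x in xs)

def score(theta, xs):
    """Derivative of the log-likelihood with respect to theta."""
    return len(xs) / theta - sum(xs)

xs = [0.5, 1.5, 1.0, 2.0]          # hypothetical sample
theta_hat = len(xs) / sum(xs)      # closed-form solution of the likelihood equation
```

Evaluating `score(theta_hat, xs)` gives zero up to rounding, and the log-likelihood at `theta_hat` exceeds its value at nearby parameter values.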

The number of complex solutions to the likelihood equations is constant with probability one, and a data set is generic if it is not part of the null set for which the number of complex solutions is different. Thus, we define the maximum likelihood degree as:

Definition 12 The maximum likelihood degree (ML degree) is the number of complex solutions to the maximum likelihood equations for generic data.

If the likelihood equations or log-likelihood equations are rational, there are symbolic (e.g. Macaulay2) and numerical (e.g. PHCpack) software packages that can find all the complex solutions to the likelihood equations. The remainder of the optimization problem then becomes evaluating the likelihood function at the solutions to determine at which point the maximum is attained. Thus the ML degree is a measure of the algebraic complexity of the problem of maximum likelihood estimation. For more background on ML degrees, see [33; 10; 8; 22; 57; 34].

The ML degree also gives insight into the geometry of maximum likelihood estimation. The likelihood surface is the real part of the hypersurface defined by the likelihood function. If the ML degree of a model is greater than one, then it is possible that the likelihood surface is multimodal, which suggests that local methods of obtaining the maximum of the likelihood surface could fail. Section 5.5 gives an example of a multimodal likelihood surface.

In Section 5.4, we study restricted maximum likelihood estimation, a variation of maximum likelihood estimation whose algebraic complexity has not been studied before for any statistical model. We define REML degree in terms of the restricted maximum likelihood equations.

Definition 13 The restricted maximum likelihood degree (REML degree) is the number of complex solutions to the restricted maximum likelihood equations for generic data.

Theorem 39 gives a formula for the REML degree for variance components models.

CHAPTER 3 TORIC IDEALS OF HYPERGRAPHS This chapter is based on work in [30] with Sonja Petrović. 3.1 Introduction Let H be a hypergraph on V = {1,..., n} with edge set E P(V ) \ { }. Each edge e i E of size d encodes a squarefree monomial x e i := j e i x j of degree d in the polynomial ring k[x 1,..., x n ]. The edge subring of the hypergraph H, denoted by k[h], is the following monomial subring: k[h] := k[x e i : e i E(H)]. The toric ideal of k[h], denoted I H, is the kernel of the monomial map φ H : k[t ei ] k[h] defined by φ H (t ei ) = x e i. The ideal I H encodes the algebraic relations among the edges of the hypergraph. For the special case where H is a graph, generating sets of the toric ideal of k[h] have been studied combinatorially in [47], [48], [51], [62], [65], and [66]. The combinatorial signatures of generators of I H are balanced edge sets of H. Balanced edge sets on uniform hypergraphs were introduced in [50], and are referred to as monomial walks. This chapter is based on the fact that the ideal I H is generated by binomials f E arising from primitive balanced edge sets E of H (See Proprosition 14, a generalization of [50, Theorem 2.8]). A balanced edge set of H is a multiset of bicolored edges E = E blue E red satisfying 18

19 the following balancing condition: for each vertex v covered by E, the number of red edges containing v equals the number of blue edges containing v, that is, deg blue (v) = deg red (v). (3.1.1) A binomial f E arises from E if it can be written as f E = t e t e. e E blue e E red Note that while H is a simple hypergraph (it contains no multiple edges), E allows repetition of edges. In addition, the balanced edge set E is primitive if there exists no other balanced edge set E = E blue E red such that E blue E blue and E red E red; this is the usual definition of an element in the Graver basis of I H. If H is a uniform hypergraph, a balanced edge set is called a monomial walk to conform with the terminology in [65], [66] and [50]. The motivation for studying toric ideals I H in this work is their connection to Markov bases for statistical models parameterized by monomials as described in Section 2.2. In what follows, we give two general degree bounds for generators of I H (Section 3.5), study the combinatorics of splitting sets and reducibility (defined in Section 3.3), and explore implications to algebraic statistics and Markov complexity throughout. Section 3.4 focuses on indispensable binomials, i.e. binomials that are members of every minimal generating set of I H. Proposition 19 gives a combinatorial sufficient condition for determining whether a binomial f I H is indispensable. Consequently, the Graver basis is the unique minimal generating set of I H for any 2-regular

hypergraph (Proposition 20). In particular, this means that the Graver basis is equal to the universal Gröbner basis, although the defining matrix need not be unimodular.

Theorem 27 is a combinatorial criterion for the ideal of a uniform hypergraph to be generated in degree at most $d \geq 2$. The criterion is based on decomposable balanced edge sets, separators, and splitting sets; see Definitions 15 and 16. Our result generalizes the well-known criterion for the toric ideal of a graph to be generated in degree 2 from [47], [65], and [66]. Splitting sets translate and extend the constructions used in [47], [65], and [66] to hypergraphs and arbitrary degrees. Proposition 29 provides a more general result for non-uniform hypergraphs.

Since log-linear models, by definition, have a monomial parametrization, we can also associate to any log-linear model $\mathcal{M}$ with a squarefree parametrization a (non-uniform) hypergraph $H_{\mathcal{M}}$. By Proposition 14, Markov moves for the model $\mathcal{M}$ are described by balanced edge sets of $H_{\mathcal{M}}$: if $\mathcal{E}$ is a balanced edge set of $H_{\mathcal{M}}$, then a Markov move on a fiber of the model corresponds to replacing the set of red edges in $\mathcal{E}$ by the set of blue edges in $\mathcal{E}$. Our degree bounds give a bound for the Markov complexity of the model $\mathcal{M}$.

We apply our combinatorial criteria to recover a well-known complexity result in algebraic statistics from [16] in Corollary 26. Finally, we study the Markov complexity of a set of models from [59] called hidden subset models; the Zariski closures of these models are tangential varieties. Namely, Theorem 32 says that the ideal associated to the image of $\mathrm{Tan}((\mathbb{P}^1)^n)$ in higher cumulants is generated by quadratics and cubics.
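The balancing condition (3.1.1) from this introduction lends itself to a mechanical check. The following Python sketch is only an illustration under stated assumptions: the helper names `vertex_degrees` and `is_balanced` and the toy example (a 4-cycle, viewed as a 2-uniform hypergraph) are introduced here and do not come from [30] or [50]. Each color class is stored as a `Counter` of edges, so repeated edges are allowed, and the degree vector of a color class is exactly the exponent vector of the image of its monomial under $\phi_H$.

```python
from collections import Counter

def vertex_degrees(edges):
    """Map each vertex to the number of edges (with multiplicity) containing it."""
    deg = Counter()
    for edge, mult in edges.items():
        for v in edge:
            deg[v] += mult
    return deg

def is_balanced(blue, red):
    """Condition (3.1.1): blue degree equals red degree at every vertex."""
    return vertex_degrees(blue) == vertex_degrees(red)

def edge(*vertices):
    # Hypothetical helper: an edge is a frozenset of vertex labels.
    return frozenset(vertices)

# Toy example (an assumption for illustration): the 4-cycle 1-2-3-4.
blue = Counter([edge(1, 2), edge(3, 4)])
red = Counter([edge(2, 3), edge(1, 4)])

cycle_is_balanced = is_balanced(blue, red)
path_is_balanced = is_balanced(Counter([edge(1, 2)]), Counter([edge(2, 3)]))
print(cycle_is_balanced, path_is_balanced)
```

In the toy example both monomials $t_{12}t_{34}$ and $t_{23}t_{14}$ map to $x_1x_2x_3x_4$, so their difference lies in the kernel of $\phi_H$; the single-edge pair fails the condition because the degrees at vertices 1 and 3 differ.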

3.2 Preliminaries and notation

We remind the reader that all hypergraphs in this chapter are simple, that is, they contain no multiple edges. In contrast, balanced edge sets of hypergraphs are not, since the binomials arising from the sets need not be squarefree. Therefore, for the purpose of this manuscript, we will refer to a balanced edge set as a multiset of edges, with implied vertex set; and, as usual, $V(\mathcal{E})$ denotes the vertex set contained in the edges in $\mathcal{E}$. For the remainder of this short section, we set out the technical details and notation needed for the proofs that follow.

A multiset, $M$, is an ordered pair $(A, f)$ such that $A$ is a set and $f$ is a function from $A$ to $\mathbb{N}_{>0}$ that records the multiplicity of each of the elements of $A$. For example, the multiset $M = (\{1, 2\}, f)$ with $f(1) = 1$ and $f(2) = 3$ represents $M = \{1, 2, 2, 2\}$, where the ordering does not matter. We will commonly use the latter notation.

Given a multiset $M = (A, f)$, the support of $M$ is $\mathrm{supp}(M) := A$, and its size is $|M| := \sum_{a \in A} f(a)$. For two multisets $M_1 = (A, f_1)$ and $M_2 = (B, f_2)$, we say $M_2 \subseteq M_1$ if $B \subseteq A$ and, for all $b \in B$, $f_2(b) \leq f_1(b)$. $M_2$ is a proper submultiset of $M_1$ if $M_2 \subseteq M_1$ and either $B \subsetneq A$ or there exists a $b \in B$ such that $f_2(b) < f_1(b)$.

Unions, intersections, and relative complements of multisets are defined in the canonical way:

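\[ M_1 \cup M_2 := (A \cup B, g) \quad \text{where} \quad g(a) = \begin{cases} f_1(a) & \text{if } a \in A \setminus B, \\ f_2(a) & \text{if } a \in B \setminus A, \\ \max(f_1(a), f_2(a)) & \text{if } a \in A \cap B; \end{cases} \]
\[ M_1 \cap M_2 := (A \cap B, g) \quad \text{where} \quad g(a) = \min(f_1(a), f_2(a)); \]
\[ M_1 - M_2 := (C, g) \quad \text{where} \quad g(a) = \begin{cases} f_1(a) & \text{if } a \in A \setminus B, \\ f_1(a) - f_2(a) & \text{otherwise,} \end{cases} \]
and $C = (A \setminus B) \cup \{a \in A \cap B \mid f_1(a) - f_2(a) > 0\}$.

Note that the support of the union (intersection) of two multisets is the union (intersection) of their supports. Finally, we define a sum of $M_1$ and $M_2$: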

\[ M_1 \sqcup M_2 := (A \cup B, g) \quad \text{where} \quad g(a) = \begin{cases} f_1(a) & \text{if } a \in A \setminus B, \\ f_2(a) & \text{if } a \in B \setminus A, \\ f_1(a) + f_2(a) & \text{if } a \in A \cap B. \end{cases} \]
If $M_1 \sqcup M_2$ is a balanced edge set, then the notation $M_1 \sqcup_m M_2$ will be used to record the bicoloring of $M_1 \sqcup M_2$: edges in $M_1$ are blue, and edges in $M_2$ are red.

Finally, the number of edges in a hypergraph $H$ containing a vertex $v$ will be denoted by $\deg(v; H)$. For a bicolored multiset $M := M_{blue} \sqcup_m M_{red}$, the blue degree $\deg_{blue}(v; M)$ of a vertex $v$ is defined to be $\deg(v; M_{blue})$. The red degree $\deg_{red}(v; M)$ is defined similarly.

3.3 Splitting sets and reducible edge sets

The aim of this section is to lay the combinatorial groundwork for studying toric ideals of hypergraphs. In particular, we explicitly state what it combinatorially means for a binomial arising from a monomial walk to be generated by binomials of a smaller degree.

We begin by describing the binomial generators of $I_H$. Unless otherwise stated, $H$ need not be uniform.

Proposition 14 Every binomial in the toric ideal of a hypergraph corresponds to a balanced edge set. In particular, the toric ideal $I_H$ is generated by primitive balanced edge sets.

Proof. Suppose $\mathcal{E}$ is a balanced multiset of edges over $H$. Define a binomial $f_{\mathcal{E}} \in k[t_e : e \in E(H)]$ as follows:
\[ f_{\mathcal{E}} = \prod_{e \in \mathcal{E}_{blue}} t_e - \prod_{e \in \mathcal{E}_{red}} t_e. \]

The balancing condition (3.1.1) ensures that $f_{\mathcal{E}}$ is in the kernel of the map $\phi_H$. The second claim is immediate.

Motivated by the application of reducible simplicial complexes to understand the Markov bases of hierarchical log-linear models [20], we now introduce notions of reducibility and separators for balanced edge sets. For simplicity, we will often abuse notation and use $H$ to denote the edge set of $H$.

Definition 15 A balanced edge set $\mathcal{E}$ is said to be reducible with separator $S$, $\mathrm{supp}(S) \subseteq \mathrm{supp}(\mathcal{E})$, and decomposition $(\Gamma_1, S, \Gamma_2)$, if there exist balanced edge sets $\Gamma_1 \neq \mathcal{E}$ and $\Gamma_2 \neq \mathcal{E}$ with $S \neq \emptyset$ such that $S = \Gamma_{1red} \cap \Gamma_{2blue}$, $\mathcal{E} = \Gamma_1 \sqcup \Gamma_2$, and the following coloring conditions hold: $\Gamma_{1red}, \Gamma_{2red} \subseteq \mathcal{E}_{red}$ and $\Gamma_{1blue}, \Gamma_{2blue} \subseteq \mathcal{E}_{blue}$.

We say that $S$ is proper with respect to $(\Gamma_1, S, \Gamma_2)$ if $S$ is a proper submultiset of both $\Gamma_{1red}$ and $\Gamma_{2blue}$. If $S$ is not proper, then $S$ is said to be blue with respect to $(\Gamma_1, S, \Gamma_2)$ if $\Gamma_{1red} = S$, and red with respect to $(\Gamma_1, S, \Gamma_2)$ if $\Gamma_{2blue} = S$.

Figure 1 shows an example of a reducible balanced edge set $\mathcal{E}$. The separator is proper and consists of the single green edge $e_s$; it appears twice in the balanced edge set $\mathcal{E}$, once as a blue edge and once as a red edge. Figure 2 shows a reducible balanced edge set where the separator, consisting of the two green edges $e_1$ and $e_2$, is not proper. As before, the separator edges appear twice in the balanced edge set.

Figure 1. Reducible balanced edge set. The green edge $e_s$ is the separator.

Figure 2. Reducible balanced edge set with an improper separator. The separator consists of green edges $e_1$ and $e_2$.

If $H$ is a hypergraph and $\mathcal{E}$ is a balanced edge set with $\mathrm{supp}(\mathcal{E}) \subseteq H$, given a multiset $S$ with $\mathrm{supp}(S) \subseteq H$, we can construct a new balanced edge set in the following manner:
\[ \mathcal{E} + S := (\mathcal{E}_{blue} \sqcup S) \sqcup_m (\mathcal{E}_{red} \sqcup S). \]

Definition 16 Let $H$ be a hypergraph. Let $\mathcal{E}$ be a balanced edge set with size $2n$ such that $\mathrm{supp}(\mathcal{E}) \subseteq H$. A non-empty multiset $S$ with $\mathrm{supp}(S) \subseteq H$ is a splitting set of $\mathcal{E}$ with decomposition $(\Gamma_1, S, \Gamma_2)$ if $\mathcal{E} + S$ is reducible with separator $S$ and decomposition $(\Gamma_1, S, \Gamma_2)$.

$S$ is said to be a blue (red, resp.) splitting set with respect to $(\Gamma_1, S, \Gamma_2)$ if $S$ is a blue (red, resp.) separator of $\mathcal{E} + S$ with respect to $(\Gamma_1, S, \Gamma_2)$.

$S$ is a proper splitting set of $\mathcal{E}$ if there exists a decomposition $(\Gamma_1, S, \Gamma_2)$ of $\mathcal{E} + S$ such that $S$ is a proper separator with respect to $(\Gamma_1, S, \Gamma_2)$.

Example 17 (Group-based Markov model) Let $V_1 = \{x_1, x_2, x_3, x_4\}$, $V_2 = \{y_1, y_2, y_3, y_4\}$, and $V_3 = \{z_1, z_2, z_3, z_4\}$. Let $V$ be the disjoint union of $V_1$, $V_2$, and $V_3$. Let $H$ be the 3-uniform hypergraph with vertex set $V$ and edge set:
\[ \begin{array}{llll}
e_{111} = \{x_1, y_1, z_1\} & e_{122} = \{x_1, y_2, z_2\} & e_{133} = \{x_1, y_3, z_3\} & e_{144} = \{x_1, y_4, z_4\} \\
e_{221} = \{x_2, y_2, z_1\} & e_{212} = \{x_2, y_1, z_2\} & e_{243} = \{x_2, y_4, z_3\} & e_{234} = \{x_2, y_3, z_4\} \\
e_{331} = \{x_3, y_3, z_1\} & e_{342} = \{x_3, y_4, z_2\} & e_{313} = \{x_3, y_1, z_3\} & e_{324} = \{x_3, y_2, z_4\} \\
e_{441} = \{x_4, y_4, z_1\} & e_{432} = \{x_4, y_3, z_2\} & e_{423} = \{x_4, y_2, z_3\} & e_{414} = \{x_4, y_1, z_4\}
\end{array} \]
The hypergraph $H$ has applications in algebraic phylogenetics: it represents the parametrization of the $\mathbb{Z}_2 \times \mathbb{Z}_2$ group-based Markov model on the claw tree $K_{1,2}$ (see [58, Example 25]). This model is a submodel of the 4-state general Markov model on $K_{1,2}$ described in [addme].

Consider the monomial walk
\[ \mathcal{W} = \{e_{324}, e_{111}, e_{243}, e_{432}\} \sqcup_m \{e_{122}, e_{313}, e_{234}, e_{441}\}. \]
Let $S = \{e_{133}, e_{212}\}$. Then $S$ is a splitting set of $\mathcal{W}$ with decomposition $(\Gamma_1, S, \Gamma_2)$ where
\[ \Gamma_1 = \{e_{111}, e_{243}, e_{432}\} \sqcup_m \{e_{133}, e_{212}, e_{441}\}, \]
\[ \Gamma_2 = \{e_{133}, e_{212}, e_{324}\} \sqcup_m \{e_{122}, e_{313}, e_{234}\}. \]
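The claims of Example 17 can be verified directly from Definitions 15 and 16. The Python sketch below is an illustration only; the helper names `e`, `ms`, `degrees`, and `is_balanced` are introduced here and are not notation from the thesis. Multisets are modeled with `collections.Counter`, whose `+` is the multiset sum $\sqcup$ and whose `&` is the multiset intersection.

```python
from collections import Counter

def e(label):
    """Edge e_ijk = {x_i, y_j, z_k}, given as a three-digit string such as '324'."""
    i, j, k = label
    return frozenset({"x" + i, "y" + j, "z" + k})

def ms(*labels):
    """Multiset of edges built from edge labels."""
    return Counter(e(label) for label in labels)

def degrees(edges):
    """Vertex -> number of edges (with multiplicity) containing it."""
    deg = Counter()
    for edge, mult in edges.items():
        for v in edge:
            deg[v] += mult
    return deg

def is_balanced(blue, red):
    return degrees(blue) == degrees(red)

# Data of Example 17.
W_blue, W_red = ms("324", "111", "243", "432"), ms("122", "313", "234", "441")
S = ms("133", "212")
G1_blue, G1_red = ms("111", "243", "432"), ms("133", "212", "441")
G2_blue, G2_red = ms("133", "212", "324"), ms("122", "313", "234")

checks = {
    "W is balanced": is_balanced(W_blue, W_red),
    "Gamma_1 is balanced": is_balanced(G1_blue, G1_red),
    "Gamma_2 is balanced": is_balanced(G2_blue, G2_red),
    # W + S = Gamma_1 ⊔ Gamma_2, color by color.
    "W + S equals the sum": (G1_blue + G2_blue == W_blue + S)
                            and (G1_red + G2_red == W_red + S),
    # S = Gamma_1red ∩ Gamma_2blue.
    "S is the separator": (G1_red & G2_blue) == S,
    # S is a proper submultiset of both Gamma_1red and Gamma_2blue.
    "S is proper": S != G1_red and S != G2_blue,
}
for name, ok in checks.items():
    print(name, ok)
```

Since $S$ is computed as an intersection, it is automatically a submultiset of both $\Gamma_{1red}$ and $\Gamma_{2blue}$, so the final check only needs to rule out equality.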

The decomposition $(\Gamma_1, S, \Gamma_2)$ encodes binomials in $I_H$ that generate $f_{\mathcal{W}}$:
\[ f_{\mathcal{W}} = t_{e_{324}} (t_{e_{111}} t_{e_{243}} t_{e_{432}} - t_{e_{133}} t_{e_{212}} t_{e_{441}}) + t_{e_{441}} (t_{e_{133}} t_{e_{212}} t_{e_{324}} - t_{e_{122}} t_{e_{313}} t_{e_{234}}). \]

The previous example illustrates the algebraic interpretation of a splitting set. Notice there is a correspondence between monomials in $k[t_{e_i}]$ and multisets of edges of $H$. We will write $E(t_{e_{i_1}}^{a_1} t_{e_{i_2}}^{a_2} \cdots t_{e_{i_l}}^{a_l})$ for the multiset $(\{e_{i_1}, \dots, e_{i_l}\}, f)$ where
\[ f : \{e_{i_1}, \dots, e_{i_l}\} \to \mathbb{N}, \qquad e_{i_j} \mapsto a_j. \]
Thus the support of $E(t_{e_{i_1}}^{a_1} t_{e_{i_2}}^{a_2} \cdots t_{e_{i_l}}^{a_l})$ corresponds to the support of the monomial $t_{e_{i_1}}^{a_1} t_{e_{i_2}}^{a_2} \cdots t_{e_{i_l}}^{a_l}$.

If $f_{\mathcal{E}} = u - v \in I_H$ is the binomial arising from the balanced edge set $\mathcal{E}$, then a monomial $s$ corresponds to a splitting set $S$ if and only if there exist two binomials $u_1 - v_1, u_2 - v_2 \in I_H$ such that $us = u_1 u_2$, $vs = v_1 v_2$ and $s = \gcd(v_1, u_2)$. In this case, the decomposition of $\mathcal{E} + S$ is $(\Gamma_1, S, \Gamma_2)$ where $\Gamma_1 = E(u_1) \sqcup_m E(v_1)$ and $\Gamma_2 = E(u_2) \sqcup_m E(v_2)$.

For a balanced edge set $\mathcal{E}$, the existence of a splitting set determines whether the binomial $f_{\mathcal{E}} \in I_H$ can be written as a linear combination of two binomials $f_{\Gamma_1}, f_{\Gamma_2} \in I_H$. While, in general, the existence of a splitting set does not imply $\deg(f_{\Gamma_1}), \deg(f_{\Gamma_2}) < \deg(f_{\mathcal{E}})$, if $H$ is uniform and the splitting set is proper, then the following lemma holds.

Lemma 18 Let $H$ be a uniform hypergraph and let $\mathcal{W}$ be a monomial walk with $\mathrm{supp}(\mathcal{W}) \subseteq H$ and $|\mathcal{W}| = 2n$. If $S$ is a proper splitting set of $\mathcal{W}$, then there exists a decomposition $(\Gamma_1, S, \Gamma_2)$ of $\mathcal{W} + S$ such that $|\Gamma_1| < |\mathcal{W}|$ and $|\Gamma_2| < |\mathcal{W}|$.

Proof. Let $S$ be a proper splitting set of $\mathcal{W}$. By definition, there exists a decomposition $(\Gamma_1, S, \Gamma_2)$ of $\mathcal{W} + S$ such that $S$ is a proper submultiset of $\Gamma_{1red}$ and $\Gamma_{2blue}$. Let $|\Gamma_1| = 2n_1$ and $|\Gamma_2| = 2n_2$. Since $\mathcal{W} + S = \Gamma_1 \sqcup \Gamma_2$, it follows that $|\mathcal{W} + S| = |\Gamma_1| + |\Gamma_2|$. Then $2n + 2|S| = 2n_1 + 2n_2$, which implies $2n - 2n_1 = 2n_2 - 2|S|$. But $S$ being a proper submultiset of $\Gamma_{2blue}$ gives that $n_2 > |S|$, which, in turn, implies that $n > n_1$. By a similar argument, $n > n_2$. Thus $|\Gamma_1| < |\mathcal{W}|$ and $|\Gamma_2| < |\mathcal{W}|$.

3.4 Indispensable Binomials

A binomial $f$ in a toric ideal $I$ is indispensable if $f$ or $-f$ belongs to every binomial generating set of $I$. Indispensable binomials of toric ideals were introduced by Takemura et al., and are studied in [63], [3], [11], [48], [51]. The degree of an indispensable binomial in $I_H$ is a lower bound on the Markov complexity of the model associated to $H$.

Proposition 19 Let $H$ be a hypergraph. Let $\mathcal{E}$ be a balanced edge set with $\mathrm{supp}(\mathcal{E}) \subseteq H$. Let $f_{\mathcal{E}}$ be the binomial arising from $\mathcal{E}$. If there does not exist a splitting set of $\mathcal{E}$, then $f_{\mathcal{E}}$ is an indispensable binomial of $I_H$.

Proof. Suppose $f_{\mathcal{E}}$ is not indispensable. Then there is a binomial generating set of $I_H$, $\mathcal{G} = \{f_1, \dots, f_n\}$, such that $f_{\mathcal{E}} \notin \mathcal{G}$ and $-f_{\mathcal{E}} \notin \mathcal{G}$.

Since $f_{\mathcal{E}} = f_{\mathcal{E}}^+ - f_{\mathcal{E}}^- \in I_H$, there is an $f_i = f_i^+ - f_i^- \in \mathcal{G}$ such that $f_i^+$ or $f_i^-$ divides $f_{\mathcal{E}}^+$. Without loss of generality, assume $f_i^+ \mid f_{\mathcal{E}}^+$. Since $f_i$ is a binomial in $I_H$, $f_i$ arises from a balanced edge set $\mathcal{E}_i$ on $H$. Let $S = \mathcal{E}_{i\,red}$. Let $\Gamma_1 = \mathcal{E}_i$ and $\Gamma_2 = \Gamma_{2blue} \sqcup_m \Gamma_{2red}$ where
\[ \Gamma_{2blue} = (\mathcal{E}_{blue} - \mathcal{E}_{i\,blue}) \sqcup \mathcal{E}_{i\,red}, \qquad \Gamma_{2red} = \mathcal{E}_{red}. \]
Since $f_i^+ \mid f_{\mathcal{E}}^+$, the multiset $\mathcal{E}_{i\,blue} \subseteq \mathcal{E}_{blue}$, and thus $\Gamma_1 \sqcup \Gamma_2 = \mathcal{E} + S$. By construction, $\Gamma_{1red} \cap \Gamma_{2blue} = S$. Therefore $S$ is a splitting set of $\mathcal{E}$.

If every Graver basis element of a binomial ideal $I_H$ is indispensable, then the Graver basis of $I_H$ is the unique minimal generating set of $I_H$. Propositions 20 and 24 describe two classes of hypergraphs where this is the case. In particular, for these hypergraphs, the universal Gröbner basis of $I_H$ is a minimal generating set.

Proposition 20 If $H$ is a 2-regular uniform hypergraph, then the Graver basis of $I_H$ is the unique minimal generating set of $I_H$.

For the proof of Proposition 20, we make use of Proposition 3.2 in [50], which concerns balanced edge sets that are pairs of perfect matchings.

Definition 21 A matching on a hypergraph $H = (V, E)$ is a subset $M \subseteq E$ such that the elements of $M$ are pairwise disjoint. A matching is called perfect if $V(M) = V$.
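Definition 21 translates into a short test. The Python sketch below is illustrative only: the function names and the small 3-uniform hypergraph on six vertices are assumptions made here, not objects from [50].

```python
def is_matching(M, E):
    """M is a matching on (V, E): a subset of E whose edges are pairwise disjoint."""
    if not all(edge in E for edge in M):
        return False
    covered = set()
    for edge in M:
        if covered & edge:          # shares a vertex with an earlier edge
            return False
        covered |= edge
    return True

def is_perfect_matching(M, V, E):
    """A matching is perfect if its edges cover every vertex of V."""
    return is_matching(M, E) and set().union(*M) == set(V)

# Hypothetical example: a 3-uniform hypergraph on {1, ..., 6}.
V = {1, 2, 3, 4, 5, 6}
a, b, c = frozenset({1, 2, 3}), frozenset({4, 5, 6}), frozenset({3, 4, 5})
E = {a, b, c}

perfect = is_perfect_matching([a, b], V, E)       # disjoint and covering
overlapping = is_matching([a, c], E)              # a and c share vertex 3
not_covering = is_perfect_matching([a], V, E)     # matching, but misses 4, 5, 6
print(perfect, overlapping, not_covering)
```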

Proof. [Proof of Proposition 20] Let $\mathcal{G}$ be the Graver basis of $I_H$ and let $f \in \mathcal{G}$. Since every element of $\mathcal{G}$ is binomial, $f$ arises from a primitive monomial walk $\mathcal{W}$ with $\mathrm{supp}(\mathcal{W}) \subseteq H$. Let $M_b = \mathrm{supp}(\mathcal{W}_{blue})$ and $M_r = \mathrm{supp}(\mathcal{W}_{red})$. By primitivity of $\mathcal{W}$, the intersection $M_r \cap M_b = \emptyset$. Since $\mathcal{W}$ satisfies condition (3.1.1) and $H$ is 2-regular, if $e_1, e_2 \in M_b$ and $e_1 \cap e_2 \neq \emptyset$, then $e_1 \in M_r$ or $e_2 \in M_r$, which would contradict the primitivity of $\mathcal{W}$. So $M_b$ and $M_r$ are two edge-disjoint perfect matchings on $V(\mathcal{W})$. By Proposition 3.2 in [50], $\mathcal{W}$ contains no multiple edges, i.e. $\mathcal{W} = M_b \sqcup_m M_r$. Furthermore, since $H$ is 2-regular, the edge set of the subhypergraph induced by $V(\mathcal{W})$ is $M_b \cup M_r$.

Suppose $S$ is a splitting set of $\mathcal{W}$ with decomposition $(\Gamma_1, S, \Gamma_2)$. By the correspondence between primitive monomial walks and primitive binomials, there exists a primitive monomial walk $\Gamma$ such that $\Gamma_{blue} \subseteq \Gamma_{1blue}$ and $\Gamma_{red} \subseteq \Gamma_{1red}$ (if $\Gamma_1$ is primitive, then $\Gamma = \Gamma_1$). By Proposition 3.2 in [50], $\Gamma$ must be a pair of perfect matchings on $V(\Gamma)$. This means $\Gamma$ is a proper balanced edge set of $\mathcal{W}$, a contradiction. Therefore, by Proposition 19, $f_{\mathcal{W}}$ is indispensable.

Since every element in the Graver basis of $I_H$ is indispensable, there is no generating set of $I_H$ strictly contained in the Graver basis, and the claim follows.

Definition 22 A $k$-uniform hypergraph $H = (V, E)$ is $k$-partite if there exists a partition of $V$ into $k$ disjoint subsets, $V_1, \dots, V_k$, such that each edge in $E$ contains exactly one vertex from each $V_i$.

Lemma 23 Let $H = (V, E)$ be a $k$-uniform $k$-partite hypergraph with $E = E_b \cup E_r$ and $E_b \cap E_r = \emptyset$. If there exists a $V_i$, $1 \leq i \leq k$, such that $\deg(v; E_r) = \deg(v; E_b) = 1$ for all $v \in V_i$, then a monomial walk $\mathcal{W}$ with support $E$ is primitive only if $\mathcal{W}$ contains no multiple edges.

Proof. Follows from the proof of necessity of Proposition 3.2 in [50].

Proposition 24 Let $H = (V, E)$ be a $k$-uniform $k$-partite hypergraph. If there exists a $V_i$ such that $\deg(v; E) = 2$ for all $v \in V_i$, then the Graver basis of $I_H$ is the unique minimal generating set of $I_H$.

Proof. The proof is similar to the proof of Proposition 20. Note that while $H$ may not be 2-regular, one of its parts, $V_i$, is locally 2-regular, and thus restricts the structure of monomial walks on $H$. In particular, Lemma 23 ensures that $M_r$ and $M_b$ are edge-disjoint perfect matchings on $V(\mathcal{W}) \cap V_i$, and the rest of the proof follows immediately.

Example 25 (No 3-way interaction) The toric ideal of the hypergraph $H$ in Figure 3 corresponds to the hierarchical log-linear model for no 3-way interaction on $2 \times 2 \times 2$ contingency tables. This statistical model is a common example in algebraic statistics [22, Example 1.2.7]. Since there is exactly one primitive monomial walk $\mathcal{W}$ on $H$ that travels through 8 edges, $I_H = (f_{\mathcal{W}})$.

For $2 \times 3 \times 3$ contingency tables with no 3-way interaction, the hypergraph corresponding to this log-linear model has 18 edges. The hypergraph in this case is $H = (V, E)$ where $V =$

$\{x_{00}, x_{01}, x_{02}, x_{10}, x_{11}, x_{12}, y_{00}, y_{01}, y_{02}, y_{10}, y_{11}, y_{12}, z_{00}, z_{01}, z_{02}, z_{10}, z_{11}, z_{12}, z_{20}, z_{21}, z_{22}\}$ and the edge set is:
\[ \begin{array}{lll}
e_{000} = \{x_{00}, y_{00}, z_{00}\} & e_{001} = \{x_{00}, y_{01}, z_{01}\} & e_{002} = \{x_{00}, y_{02}, z_{02}\} \\
e_{010} = \{x_{01}, y_{00}, z_{10}\} & e_{011} = \{x_{01}, y_{01}, z_{11}\} & e_{012} = \{x_{01}, y_{02}, z_{12}\} \\
e_{020} = \{x_{02}, y_{00}, z_{20}\} & e_{021} = \{x_{02}, y_{01}, z_{21}\} & e_{022} = \{x_{02}, y_{02}, z_{22}\} \\
e_{100} = \{x_{10}, y_{10}, z_{00}\} & e_{101} = \{x_{10}, y_{11}, z_{01}\} & e_{102} = \{x_{10}, y_{12}, z_{02}\} \\
e_{110} = \{x_{11}, y_{10}, z_{10}\} & e_{111} = \{x_{11}, y_{11}, z_{11}\} & e_{112} = \{x_{11}, y_{12}, z_{12}\} \\
e_{120} = \{x_{12}, y_{10}, z_{20}\} & e_{121} = \{x_{12}, y_{11}, z_{21}\} & e_{122} = \{x_{12}, y_{12}, z_{22}\}
\end{array} \]

Figure 3. Hypergraph associated to the hierarchical log-linear model for no 3-way interaction.

Let $\mathcal{W}$ be the primitive monomial walk
\[ \mathcal{W} = \{e_{000}, e_{101}, e_{011}, e_{112}, e_{022}, e_{120}\} \sqcup_m \{e_{100}, e_{001}, e_{111}, e_{012}, e_{122}, e_{020}\}. \]
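The edge list and the walk for the $2 \times 3 \times 3$ case can be generated and checked mechanically. The Python sketch below is an illustration under explicit assumptions: the indexing rule $e_{ijk} = \{x_{ij}, y_{ik}, z_{jk}\}$ is read off from the listed edges, the helper names are hypothetical, and the last red edge of the walk is taken to be $e_{020}$, since that is the only one of the 18 edges that balances the walk (the label $e_{220}$ does not occur in the edge list). Besides balance, the sketch checks that every edge of $H$ outside $\mathcal{W}$ uses a vertex outside $V(\mathcal{W})$.

```python
from collections import Counter
from itertools import product

def e(i, j, k):
    """Edge e_ijk = {x_ij, y_ik, z_jk} (indexing rule inferred from the edge list)."""
    return frozenset({f"x{i}{j}", f"y{i}{k}", f"z{j}{k}"})

# All 18 edges and the 21 vertices they cover.
H = {e(i, j, k) for i, j, k in product(range(2), range(3), range(3))}
V = set().union(*H)

blue = [e(0, 0, 0), e(1, 0, 1), e(0, 1, 1), e(1, 1, 2), e(0, 2, 2), e(1, 2, 0)]
# Assumption: the sixth red edge is e_020, the only edge of H that balances the walk.
red = [e(1, 0, 0), e(0, 0, 1), e(1, 1, 1), e(0, 1, 2), e(1, 2, 2), e(0, 2, 0)]

def degrees(edges):
    """Vertex -> number of listed edges containing it."""
    deg = Counter()
    for edge in edges:
        deg.update(edge)
    return deg

walk_is_balanced = degrees(blue) == degrees(red)

# Edges of H not used by the walk, and the vertices the walk covers.
V_W = set().union(*blue, *red)
remaining = H - set(blue) - set(red)
no_remaining_edge_inside = all(not edge <= V_W for edge in remaining)

print(len(H), len(V), walk_is_balanced, len(remaining), no_remaining_edge_inside)
```

The six unused edges are exactly those containing $z_{02}$, $z_{10}$, or $z_{21}$, the three vertices of $H$ that the walk does not visit.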

Every remaining edge of $H$ that does not appear in $\mathcal{W}$ is not contained in $V(\mathcal{W})$; thus it can be easily verified that there does not exist a splitting set of $\mathcal{W}$, so by Proposition 19, $f_{\mathcal{W}}$ is indispensable. In fact, $H$ satisfies the condition of Proposition 24, and thus every binomial in $I_H$ corresponding to a primitive monomial walk is indispensable.

From the above discussion, we can see that if a uniform hypergraph $H$ contains an induced subhypergraph $H_s$ that is 2-regular and there exists a bicoloring such that with this bicoloring $H_s$ is also a balanced edge set, then the maximum degree of any minimal generating set of $I_H$ is at least $|E(H_s)|/2$. A similar statement holds for $k$-uniform, $k$-partite hypergraphs with vertex partition $V = \bigsqcup_{i=1}^{k} V_i$. Namely, if $H$ contains an induced subhypergraph $H_s$ that is 2-regular on $V_i$ (i.e., $H$ satisfies the conditions of Proposition 24) and there exists a bicoloring such that with this bicoloring $H_s$ is a balanced edge set (e.g., $H_s$ is a pair of disjoint perfect matchings), then the maximum degree of any minimal generating set of $I_H$ is at least $|E(H_s)|/2$.

Recall that degree bounds on minimal generators give a Markov complexity bound for the corresponding log-linear model in algebraic statistics. This allows us to recover a well-known result:

Corollary 26 (Consequence of Theorem 1.2 in [16]; see also Theorem 1.2.17 in [22]) The Markov complexity for the no 3-way interaction model on $3 \times r \times c$ contingency tables grows arbitrarily large as $r$ and $c$ increase.

Proof. For the no 3-way interaction model on $2 \times r \times c$ contingency tables, we can construct a primitive binomial $f_{H_s}$ of degree $2\min(r, c)$ in its defining toric ideal by taking a cycle of

length $\min(r, c)$ on the bipartite graph $K_{r,c}$. (We remind the reader that this is precisely how $f_{\mathcal{W}}$ is constructed in Example 25.) By noting that the hypergraph $H_s$ associated to this binomial is an induced subhypergraph of the hypergraph associated to the $3 \times r \times c$ case and that $H_s$ is 2-regular in one of the partitions, the claim follows by Proposition 24.

3.5 General degree bounds

For uniform hypergraphs, balanced edge sets are referred to as monomial walks. In the previous sections, we saw that splitting sets of $\mathcal{W}$ translate to algebraic operations on the binomials $f_{\mathcal{W}}$, providing a general construction for rewriting a high-degree binomial in terms of binomials corresponding to shorter walks. This, along with Lemma 18, is the key to the general degree bound result.

Theorem 27 Given a $k$-uniform hypergraph $H$, the toric ideal $I_H$ is generated in degree at most $d$ if and only if for every primitive monomial walk $\mathcal{W}$ of length $2n > 2d$, with $\mathrm{supp}(\mathcal{W}) \subseteq H$, one of the following two conditions holds:

i) there exists a proper splitting set $S$ of $\mathcal{W}$, or

ii) there is a finite sequence of pairs, $(S_1, R_1), \dots, (S_N, R_N)$, such that

• $S_1$ and $R_1$ are blue and red splitting sets of $\mathcal{W}$ of size less than $n$ with decompositions $(\Gamma_{1_1}, S_1, \Gamma_{2_1})$ and $(\Upsilon_{1_1}, R_1, \Upsilon_{2_1})$,

• $S_{i+1}$ and $R_{i+1}$ are blue and red splitting sets of $\mathcal{W}_i = \Gamma_{2_i blue} \sqcup_m \Upsilon_{1_i red}$ of size less than $n$ with decompositions $(\Gamma_{1_{i+1}}, S_{i+1}, \Gamma_{2_{i+1}})$ and $(\Upsilon_{1_{i+1}}, R_{i+1}, \Upsilon_{2_{i+1}})$, and,

• $S_N \cap R_N \neq \emptyset$ or there exists a proper splitting set of $\mathcal{W}_N$.

Proof. [Proof of necessity ($\Rightarrow$)] Let $H$ be a $k$-uniform hypergraph whose toric ideal $I_H$ is generated in degree at most $d$. Let $\mathcal{W}$ be a primitive monomial walk of length $2n > 2d$. Let $p_{\mathcal{W}} = u - v$ be the binomial that arises from $\mathcal{W}$. Since $I_H$ is generated in degree at most $d$, there exist primitive binomials of degree at most $d$, $(u_1 - v_1), \dots, (u_s - v_s) \in k[t_{e_i}]$, and $m_1, \dots, m_s \in k[t_{e_i}]$, such that
\[ p_{\mathcal{W}} = m_1(u_1 - v_1) + m_2(u_2 - v_2) + \dots + m_s(u_s - v_s). \]
By expanding and reordering so that $m_1 u_1 = u$, $m_s v_s = v$, and $m_i v_i = m_{i+1} u_{i+1}$ for all $i = 1, \dots, s-1$, we may and will assume that $m_1, \dots, m_s$ are monomials.

If $\gcd(m_i, m_{i+1}) \neq 1$ for some $i$, we can add the terms $m_i(u_i - v_i)$ and $m_{i+1}(u_{i+1} - v_{i+1})$ to get a new term, $m'_i(u'_i - v'_i)$, where $m'_i = \gcd(m_i, m_{i+1})$ and $(u'_i - v'_i)$ is a binomial of $I_H$ of degree less than $n$. Continuing recursively in this manner, we have
\[ p_{\mathcal{W}} = m'_1(u'_1 - v'_1) + m'_2(u'_2 - v'_2) + \dots + m'_r(u'_r - v'_r) \]
where $m'_1 u'_1 = u$, $m'_r v'_r = v$, $m'_i v'_i = m'_{i+1} u'_{i+1}$, $\gcd(m'_i, m'_{i+1}) = 1$ for all $i = 1, \dots, r-1$, and $\deg(u'_i - v'_i) < n$ for all $i = 1, \dots, r$. For convenience, we will drop the superscripts and write
\[ p_{\mathcal{W}} = m_1(u_1 - v_1) + m_2(u_2 - v_2) + \dots + m_r(u_r - v_r). \]

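Case 1: $r = 2$. In this case, $p_{\mathcal{W}} = m_1(u_1 - v_1) + m_2(u_2 - v_2)$. Let
\[ \Gamma_1 := E(u_1) \sqcup_m E(v_1), \qquad \Gamma_2 := E(u_2) \sqcup_m E(v_2), \qquad S := E(v_1) \cap E(u_2) = E(\gcd(v_1, u_2)). \]
We want to show $(\Gamma_1, S, \Gamma_2)$ is a decomposition of $\mathcal{W} + S$. Since $S = \Gamma_{1red} \cap \Gamma_{2blue}$, $\Gamma_{1blue} \subseteq \mathcal{W}_{blue}$, and $\Gamma_{2red} \subseteq \mathcal{W}_{red}$, we only need to show $\mathcal{W} + S = \Gamma_1 \sqcup \Gamma_2$, $\Gamma_{1red} \subseteq (\mathcal{W} + S)_{red}$, and $\Gamma_{2blue} \subseteq (\mathcal{W} + S)_{blue}$.

First, notice the following equalities hold:
\[ \begin{aligned}
\mathcal{W} + S &= (\mathcal{W}_{blue} \sqcup S) \sqcup_m (\mathcal{W}_{red} \sqcup S) \\
&= (E(u) \sqcup S) \sqcup_m (E(v) \sqcup S) \\
&= (E(m_1 u_1) \sqcup S) \sqcup_m (E(m_2 v_2) \sqcup S) \\
&= (E(m_1) \sqcup E(u_1) \sqcup S) \sqcup_m (E(m_2) \sqcup E(v_2) \sqcup S).
\end{aligned} \]
Let $s \in k[t_{e_i}]$ be the monomial such that $E(s) = S$, so $s = \gcd(v_1, u_2)$. The equality $m_1 v_1 = m_2 u_2$ implies $m_1 \left(\frac{v_1}{s}\right) = m_2 \left(\frac{u_2}{s}\right)$. Now, $\frac{v_1}{s}$ and $\frac{u_2}{s}$ are clearly relatively prime, and by the assumptions on $p_{\mathcal{W}}$, $m_1$ and $m_2$ are relatively prime. This means the equality $m_1 \left(\frac{v_1}{s}\right) = m_2 \left(\frac{u_2}{s}\right)$ implies $m_1 = \frac{u_2}{s}$ and $m_2 = \frac{v_1}{s}$. Thus,
\[ \begin{aligned}
\Gamma_1 \sqcup \Gamma_2 &= (E(u_1) \sqcup_m E(v_1)) \sqcup (E(u_2) \sqcup_m E(v_2)) \\
&= \left(E(u_1) \sqcup_m \left(E\left(\tfrac{v_1}{s}\right) \sqcup S\right)\right) \sqcup \left(\left(E\left(\tfrac{u_2}{s}\right) \sqcup S\right) \sqcup_m E(v_2)\right) \\
&= (E(u_1) \sqcup_m (E(m_2) \sqcup S)) \sqcup ((E(m_1) \sqcup S) \sqcup_m E(v_2)).
\end{aligned} \]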

Consequently, $\mathcal{W} + S = \Gamma_1 \sqcup \Gamma_2$. Notice the equality $m_2 = \frac{v_1}{s}$ also implies $\Gamma_{1red} = E(v_1) = E(m_2) \sqcup S$. This means
\[ \Gamma_{1red} \subseteq (E(m_2 v_2) \sqcup S) = (\mathcal{W}_{red} \sqcup S) = (\mathcal{W} + S)_{red}. \]
By a similar observation, $\Gamma_{2blue} \subseteq (\mathcal{W} + S)_{blue}$.

Case 2: $r = 2N + 1$. For $1 \leq i \leq N$, let
\[ \begin{aligned}
\Gamma_{1_i} &= E(u_i) \sqcup_m E(v_i), \\
\Gamma_{2_i} &= E(m_{i+1} u_{i+1}) \sqcup_m E(m_{2N-i+2} v_{2N-i+2}), \\
S_i &= E(v_i) \cap E(m_{i+1} u_{i+1}) = E(\gcd(v_i, m_{i+1} u_{i+1})) = E(v_i).
\end{aligned} \]
For $1 \leq i \leq N$, let
\[ \begin{aligned}
\Upsilon_{1_i} &= E(m_i u_i) \sqcup_m E(m_{2N-i+1} v_{2N-i+1}), \\
\Upsilon_{2_i} &= E(u_{2N-i+2}) \sqcup_m E(v_{2N-i+2}), \\
R_i &= E(m_{2N-i+1} v_{2N-i+1}) \cap E(u_{2N-i+2}) = E(\gcd(m_{2N-i+1} v_{2N-i+1}, u_{2N-i+2})) = E(u_{2N-i+2}).
\end{aligned} \]
One can follow the proof of Case 1 to see that $S_1$ and $R_1$ are splitting sets of $\mathcal{W}$, and $S_{i+1}$ and $R_{i+1}$ are splitting sets of $\mathcal{W}_i = E(m_{i+1} u_{i+1}) \sqcup_m E(m_{2N-i+1} v_{2N-i+1})$ for $i = 1, \dots, N-1$. Furthermore, by definition, they are blue and red splitting sets (resp.) of size less than $n$.

Since $\mathcal{W}_{N-1\,blue} = \Gamma_{2_{N-1} blue}$ and $\mathcal{W}_{N-1\,red} = \Upsilon_{1_{N-1} red}$, the binomial arising from the walk $\mathcal{W}_{N-1}$ is
\[ m_N u_N - m_{N+2} v_{N+2} = m_N(u_N - v_N) + m_{N+1}(u_{N+1} - v_{N+1}) + m_{N+2}(u_{N+2} - v_{N+2}). \]
Choose $e \in H$ such that $t_e \mid m_{N+1}$; then $t_e \mid v_N$ and $t_e \mid u_{N+2}$. But since $S_N = E(v_N)$ and $R_N = E(u_{N+2})$, $e \in S_N$ and $e \in R_N$, so $S_N \cap R_N \neq \emptyset$.

Case 3: $r = 2N + 2$. For $1 \leq i \leq N$, let
\[ \begin{aligned}
\Gamma_{1_i} &= E(u_i) \sqcup_m E(v_i), \\
\Gamma_{2_i} &= E(m_{i+1} u_{i+1}) \sqcup_m E(m_{2N-i+3} v_{2N-i+3}), \\
S_i &= E(v_i) \cap E(m_{i+1} u_{i+1}) = E(\gcd(v_i, m_{i+1} u_{i+1})) = E(v_i).
\end{aligned} \]
For $1 \leq i \leq N$, let
\[ \begin{aligned}
\Upsilon_{1_i} &= E(m_i u_i) \sqcup_m E(m_{2N-i+2} v_{2N-i+2}), \\
\Upsilon_{2_i} &= E(u_{2N-i+3}) \sqcup_m E(v_{2N-i+3}), \\
R_i &= E(m_{2N-i+2} v_{2N-i+2}) \cap E(u_{2N-i+3}) = E(\gcd(m_{2N-i+2} v_{2N-i+2}, u_{2N-i+3})) = E(u_{2N-i+3}).
\end{aligned} \]
We can follow the proof of Case 1 to see that $S_1$ and $R_1$ are splitting sets of $\mathcal{W}$, and $S_{i+1}$ and $R_{i+1}$ are splitting sets of $\mathcal{W}_i = E(m_{i+1} u_{i+1}) \sqcup_m E(m_{2N-i+2} v_{2N-i+2})$ for $i = 1, \dots, N-1$.

Furthermore, by definition, they are blue and red (resp.) splitting sets of size less than $n$. Since $\mathcal{W}_{N\,blue} = \Gamma_{2_N blue}$ and $\mathcal{W}_{N\,red} = \Upsilon_{1_N red}$, the binomial arising from $\mathcal{W}_N$ is
\[ m_{N+1} u_{N+1} - m_{N+2} v_{N+2} = m_{N+1}(u_{N+1} - v_{N+1}) + m_{N+2}(u_{N+2} - v_{N+2}), \]
which is exactly Case 1, which means there exists a proper splitting set of $\mathcal{W}_N$.

Proof. [Proof of sufficiency ($\Leftarrow$)] Assume every primitive monomial walk $\mathcal{W}$ of length $2n > 2d$ with $\mathrm{supp}(\mathcal{W}) \subseteq H$ satisfies i) or ii). Let $p_{\mathcal{W}} = u - v$ be a generator of $I_H$ which arises from the monomial walk $\mathcal{W}$ on $H$. To show that $I_H = [I_H]_{\leq d}$, we proceed by induction on the degree of $p_{\mathcal{W}}$.

If $\deg p_{\mathcal{W}} = 2$, then $p_{\mathcal{W}} \in [I_H]_{\leq d}$. So assume $\deg p_{\mathcal{W}} = n > d$ and every generator of $I_H$ of degree less than $n$ is in $[I_H]_{\leq d}$. Since the size of $\mathcal{W}$ is greater than $2d$, either condition i) holds or condition ii) holds.

Suppose i) holds. By Lemma 18, there exists a decomposition $(\Gamma_1, S, \Gamma_2)$ of $\mathcal{W} + S$ such that $|\Gamma_1| < |\mathcal{W}|$ and $|\Gamma_2| < |\mathcal{W}|$. Let $p_{\Gamma_1} = u_1 - v_1$ ($p_{\Gamma_2} = u_2 - v_2$, respectively) be the binomial that arises from $\Gamma_1$ ($\Gamma_2$, respectively). Let $m_1 = u/u_1$ and $m_2 = v/v_2$. What remains to be shown is that $p_{\mathcal{W}} = m_1 p_{\Gamma_1} + m_2 p_{\Gamma_2}$, that is,
\[ u - v = m_1(u_1 - v_1) + m_2(u_2 - v_2). \]
However, it is clear that $u = m_1 u_1$ and $v = m_2 v_2$, so it suffices to show that $m_1 v_1 = m_2 u_2$, or equivalently, $E(m_1 v_1) = E(m_2 u_2)$.

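Let $s \in k[t_{e_i}]$ be the monomial such that $E(s) = S$. Then
\[ \Gamma_1 \sqcup \Gamma_2 = \left(E(u_1) \sqcup_m \left(E\left(\tfrac{v_1}{s}\right) \sqcup S\right)\right) \sqcup \left(\left(E\left(\tfrac{u_2}{s}\right) \sqcup S\right) \sqcup_m E(v_2)\right) \]
and
\[ \mathcal{W} + S = (E(m_1) \sqcup E(u_1) \sqcup S) \sqcup_m (E(m_2) \sqcup E(v_2) \sqcup S). \]
Thus, since $\mathcal{W} + S = \Gamma_1 \sqcup \Gamma_2$,
\[ E(m_1) \sqcup E(m_2) = E\left(\tfrac{v_1}{s}\right) \sqcup E\left(\tfrac{u_2}{s}\right), \]
which in turn implies
\[ m_1 m_2 = \left(\tfrac{v_1}{s}\right)\left(\tfrac{u_2}{s}\right). \]
Since $\mathcal{W}$ is primitive and the coloring conditions on $(\Gamma_1, S, \Gamma_2)$ imply $E\left(\tfrac{v_1}{s}\right) \subseteq \mathcal{W}_{red}$ and $E(m_1) \subseteq \mathcal{W}_{blue}$, the monomials $m_1$ and $\tfrac{v_1}{s}$ are relatively prime. A similar argument shows $m_2$ and $\tfrac{u_2}{s}$ are relatively prime. Thus, $m_1 = \tfrac{u_2}{s}$ and $m_2 = \tfrac{v_1}{s}$, and consequently, $E(m_1 v_1) = E(m_2 u_2)$ and $p_{\mathcal{W}} = m_1 p_{\Gamma_1} + m_2 p_{\Gamma_2}$.

Since $\deg p_{\Gamma_1}, \deg p_{\Gamma_2} < n$, the induction hypothesis applied to $p_{\Gamma_1}$ and $p_{\Gamma_2}$ shows that $p_{\mathcal{W}} \in [I_H]_{\leq d}$.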

Now suppose ii) holds. For $i$ from 1 to $N$, let $p_{\Gamma_{1_i}} = u_i - v_i$ and $p_{\Upsilon_{2_i}} = y_i - z_i$ be the binomials arising from $\Gamma_{1_i}$ and $\Upsilon_{2_i}$. Let $w_{ib} - w_{ir}$ be the binomial arising from the walk $\mathcal{W}_i$ and let $p_{\mathcal{W}} = w_{0b} - w_{0r}$. For $1 \leq i \leq N$, let $m_i = w_{(i-1)b}/u_i$ and $q_i = w_{(i-1)r}/z_i$. Then
\[ p_{\mathcal{W}} = \sum_{i=1}^{N} m_i(u_i - v_i) + w_{Nb} - w_{Nr} + \sum_{i=1}^{N} q_{N+1-i}(y_{N+1-i} - z_{N+1-i}). \]
The preceding claim follows from three observations: (1) by construction, $w_{0b} = m_1 u_1$ and $w_{0r} = q_1 z_1$; (2) by the definition of $\mathcal{W}_N$, $w_{Nb} = m_N v_N$ and $w_{Nr} = q_N y_N$; and (3) by the definitions of $m_i$, $q_i$, and the walk $\mathcal{W}_i$, $m_i v_i = m_{i+1} u_{i+1}$ and $q_{i+1} z_{i+1} = q_i y_i$ for $1 \leq i \leq N-1$.

As a consequence of the size conditions on the splitting sets of $\mathcal{W}_i$, the linear combinations $\sum_{i=1}^{N} m_i(u_i - v_i) \in [I_H]_{\leq d}$ and $\sum_{i=1}^{N} q_{N+1-i}(y_{N+1-i} - z_{N+1-i}) \in [I_H]_{\leq d}$. So if $\mathcal{W}_N$ satisfies condition i), the binomial $w_{Nb} - w_{Nr} \in [I_H]_{\leq d}$, and thus $p_{\mathcal{W}} \in [I_H]_{\leq d}$.

To finish the proof, assume that $S_N$ and $R_N$ share an edge, $e$. Then the claim above becomes
\[ p_{\mathcal{W}} = \sum_{i=1}^{N} m_i(u_i - v_i) + t_e \left( \frac{m_N v_N}{t_e} - \frac{q_N y_N}{t_e} \right) + \sum_{i=1}^{N} q_{N+1-i}(y_{N+1-i} - z_{N+1-i}), \]
and we just need to show that, in fact, $t_e$ divides $m_N v_N$ and $q_N y_N$. But this is clear, since $e \in S_N$ implies $t_e \mid v_N$ and $e \in R_N$ implies $t_e \mid y_N$.

Example 28 (Independence models) Let $H$ be the complete $k$-partite hypergraph with $d$ vertices in each partition $V_1, \dots, V_k$. These hypergraphs correspond to independence models in

statistics. Equivalently, the edge subring of the complete $k$-partite hypergraph with $d$ vertices in each partition parametrizes the Segre embedding of $\mathbb{P}^d \times \cdots \times \mathbb{P}^d$ with $k$ copies.

The ideal $I_H$ is generated by quadrics. To see this, let $\mathcal{W}$, $\mathrm{supp}(\mathcal{W}) \subseteq H$, be a primitive monomial walk of length $2n$, $n > 2$. Choose a multiset $\mathcal{E}' \subseteq \mathcal{W}$ consisting of $n - 1$ blue and $n - 1$ red edges. Since each edge must contain a vertex from each $V_i$, for each $i$, there is at most one vertex in $V(\mathcal{E}') \cap V_i$ that is not covered by a red edge and a blue edge from $\mathcal{E}'$. Consequently, $V(\mathcal{E}')$ contains a vertex from each $V_i$ that belongs to at least one red edge and at least one blue edge of $\mathcal{E}'$.

For a multiset of edges, $M$, with $\mathrm{supp}(M) \subseteq H$, we define the max degree of a vertex:
\[ \mathrm{maxdeg}(v; M) := \max(\deg_{red}(v; M), \deg_{blue}(v; M)). \]
The partitioning of the vertices ensures that $V(\mathcal{E}')$ cannot contain more than $k$ vertices whose maxdeg with respect to $\mathcal{E}'$ is $n - 1$. Indeed, if there are more than $k$ vertices with maxdeg equal to $n - 1$, then two of those vertices must belong to the same partition, $V_j$. This would imply that $\mathcal{W}$ contains at least $4(n - 1)$ edges, which is impossible when $n > 2$.

Next, choose $n - 1$ new blue edges and $n - 1$ red edges in the following manner. Let $d_b(v) := \deg_{blue}(v; \mathcal{E}')$ and $d_r(v) := \deg_{red}(v; \mathcal{E}')$. For $i = 1, \dots, k$ choose a vertex from $V(\mathcal{E}'_{blue}) \cap V(\mathcal{E}'_{red}) \cap V_i$ that has the largest maxdeg with respect to $\mathcal{E}'$; let $b_{n-1}$ and $r_{n-1}$ be this set of vertices. For all $v \in b_{n-1}$, reduce $d_b(v)$ and $d_r(v)$ by 1. Now choose $b_1, \dots, b_{n-2}$ by the following algorithm:

    for i from 1 to k do:
        let V_i := sort V(E') ∩ V_i by d_b(v) in decreasing order;
    for j from n−2 down to 1 do: (
        b_j := list {v_i : v_i is first element in V_i};
        for all v ∈ b_j do d_b(v) = d_b(v) − 1;
        for i from 1 to k do V_i = sort V_i by d_b(v) in decreasing order;
    ).

Let $R_1 = \{b_1, \dots, b_{n-1}\}$ and $S_1 = \{r_1, \dots, r_{n-1}\}$. Then $R_1$ and $S_1$ are red and blue splitting sets of $\mathcal{W}$ that share an edge. Thus, condition ii) of Theorem 27 is met, and consequently $I_H$ is generated in degree 2.

When $H$ is a non-uniform hypergraph, the toric ideal $I_H$ is not necessarily homogeneous. For example, Figure 4 supports a binomial in $I_H$ where $H$ consists of edges of size two and four; note that the edges still satisfy the balancing condition (3.1.1). However, we can still modify the conditions of Theorem 27 to find degree bounds for the toric ideals of non-uniform hypergraphs. Proposition 29 gives a prescription for determining a degree bound on the generators of $I_H$ in terms of local structures of $H$.

Proposition 29 Given a hypergraph $H$ and a binomial $f_{\mathcal{E}} \in I_H$ arising from the balanced edge set $\mathcal{E}$ with $n = |\mathcal{E}_{blue}| \geq |\mathcal{E}_{red}|$, $f_{\mathcal{E}}$ is a linear combination of binomials in $I_H$ of degree less than $n$ if one of the following two conditions holds:

i) there exists a proper splitting set $S$ of $\mathcal{E}$ with decomposition $(\Gamma_1, S, \Gamma_2)$ where $|\Gamma_{i\,blue}|, |\Gamma_{i\,red}| < n$ for $i = 1, 2$, or

ii) there is a pair of blue and red splitting sets of $\mathcal{E}$, $S$ and $R$, of size less than $n$ with decompositions $(\Gamma_1, S, \Gamma_2)$, $(\Upsilon_1, R, \Upsilon_2)$ such that $|\Gamma_{1blue}|, |\Upsilon_{2red}| < n$, $|\Gamma_{2blue}|, |\Upsilon_{1red}| \leq n$, and $S \cap R \neq \emptyset$.

Proof. This proof follows the proof of sufficiency for Theorem 27. Note that in the proof, the uniform condition does not play an essential role; it is only invoked to bound the size of the red and blue parts of each monomial hypergraph appearing in the decompositions involved. Thus, the hypothesis of Proposition 29 acts in place of the uniform condition in Theorem 27.

Figure 4. Example of a non-uniform hypergraph whose associated toric ideal is non-homogeneous.

3.6 Hidden Subset Models

For the remainder of this section, we will concern ourselves with the first tangential variety, $\mathrm{Tan}((\mathbb{P}^1)^n)$. In [59], Sturmfels and Zwiernik use cumulants to give a monomial parameterization