An Asymptotic Behaviour of the Marginal Likelihood for General Markov Models

Size: px
Start display at page:

Download "An Asymptotic Behaviour of the Marginal Likelihood for General Markov Models"

Transcription

1 Journal of Machine Learning Research 1 (011) Submitted 1/10; Revised 9/11; Published 11/11 An Asymptotic Behaviour of the Marginal Likelihood for General Markov Models Piotr Zwiernik Institute for Pure and Applied Mathematics 460 Portola Plaza Box Los Angeles, CA PIOTR.ZWIERNIK@GMAIL.COM Editor: Kenji Fukumizu Abstract The standard Bayesian Information Criterion (BIC) is derived under regularity conditions which are not always satisfied in the case of graphical models with hidden variables. In this paper we derive the BIC for the binary graphical tree models where all the inner nodes of a tree represent binary hidden variables. This provides an extension of a similar formula given by Rusakov and Geiger for naive Bayes models. The main tool used in this paper is the connection between the growth behavior of marginal likelihood integrals and the real log-canonical threshold. Keywords: BIC, marginal likelihood, singular models, tree models, Bayesian networks, real logcanonical threshold 1. Introduction A key step in the Bayesian learning of graphical models is to compute the marginal likelihood of the data, which is the likelihood function averaged over the parameters with respect to the prior distribution. Given a fully observed system, the theory of graphical models provides a simple way to obtain the marginal likelihood. This was explained for example by Cooper and Herskovits (199) and Heckerman et al. (1995). However, when some of the variables in the system are hidden (never observed), the exact determination of the marginal likelihood is typically intractable (for example Chickering and Heckerman, 1997). This motivates the search for efficient techniques to approximate the marginal likelihood. In this paper we focus on the large sample behavior of the marginal likelihood called the Bayes Information Criterion (BIC). To present basic results on the BIC we need to introduce some notation. Let X be a random variable with values in [m] := {1,...,m}. Its distribution q = (q 1,...,q m ) can be identified with a point in the probability simplex m 1 = {x R m : i x i = 1, x i 0} R m. Consider a map p : Θ m 1 and letm = p(θ) be a parametric discrete model for X with the parameter space Θ and parametrization p. Let X (N) = X 1,...,X N be a random sample from the distribution q m 1. By Z N we denote the marginal likelihood and by L(θ;X (N),M) = P(X (N) M,θ) the likelihood function. Thus Z N =P(X (N) M)= L(θ;X (N),M)ϕ(θ)dθ, Θ c 011 Piotr Zwiernik.

2 ZWIERNIK where θ denotes the model parameters and ϕ(θ) is a prior distribution on Θ. The stochastic complexity is defined by F N = logz N and the entropy function by S= m i=1 q i logq i. In statistical theory to obtain the BIC we usually require that the asymptotic limit of the likelihood function, as N, is maximized over a unique point in the interior of the parameter space where the Jacobian matrix of the parametrization is full rank. For the class of problems for which this assumption is satisfied Schwarz (1978) and Haughton (1988) showed that, as N, EF N = N S+ d logn+ O(1), where d = dimθ (Watanabe, 009, Corollary 1.15 and Section 6). The same formula works if the limit of the likelihood is maximized over a finite number of points in the interior of Θ. Geometrically, for large sample sizes function Z N concentrates around the maxima. This enables us to apply the Laplace approximation locally in the neighborhood of each maximum. It can be proved (see Proposition 5) that the above formula can be generalized for the case when the set, over which the limit of the likelihood is maximized, forms a sufficiently regular compact subset of the parameter space. Denote this subset by Θ. Then, as N, EF N = N S+ d d logn+ O(1), (1) where d = dim Θ. Note that in our case Θ is a set of zeros of a real analytic function. Therefore, it will be always a semi-analytic set, that is given by {g 1 (θ) 0,...,g r (θ) 0}, where g i are all analytic functions. It follows that the dimension is well defined (Bierstone and Milman, 1988, Remark.1). In the case of models with hidden variables the locus of the points maximizing the limit of the likelihood may not be sufficiently regular. In this case the likelihood will have a different asymptotic behavior around different points and relatively more mass of the marginal likelihood integral will be related to neighborhoods of singular points (see Figure 1). For these points we cannot use the Laplace approximation. Nevertheless, the computation of the BIC is still possible using results of Watanabe (009) and some earlier works of Arnold, Varchenko and collaborators (Arnold et al., 1988). This formula will differ from the standard BIC. First, the coefficient of log N can be different from d d in (1). Second, we sometimes encounter an additional log log N term affecting the asymptotics (see Theorem 4). Let again q be the true data generating distribution and M = p(θ) a discrete parametric statistical model with the parameter space Θ. Let ϕ : Θ R be a prior distribution. Throughout the paper we always assume: (A1) The prior distribution ϕ is strictly positive, bounded and smooth on Θ. (A) There exists θ Θ such that p(θ)=q and q lies in the interior of the probability simplex. (A3) The set Θ R d is a compact and semianalytic set of dimension d. 384

3 ASYMPTOTICS OF THE MARGINAL LIKELIHOOD FOR TREE MODELS Figure 1: The case when the likelihood is maximized over a singular subset of Θ=[ 1,1] given by θ 1 θ = 0. In this paper we consider an important class of parametric models with large number of hidden variables, called general Markov models, assuming for simplicity that all the random variables in the system are binary. This model class is extensively used in phylogenetics (Semple and Steel, 003, Chapter 8) and in the analysis of causal systems (see Pearl and Tarsi, 1986). We begin with a quick informal introduction to general Markov models which is then formalized in Section 3.1. Let T =(V,E) be an undirected tree with the vertex set V and the edge set E. Let T r denote T rooted in r, that is a tree with one distinguished vertex r and all the edges directed away from r. Consider the Markov process Y =(Y v ) v V on T r, which by definition is the Bayesian network on T r. Then, the general Markov model is a family of marginal distributions over the subvector of Y corresponding to the leaves of T r. It is well known that this model class does not depend on the rooting. Therefore, we denote this model class, omitting the rooting, bym T. For a tree T with n leaves we denote the subvector of Y corresponding to the leaves of T by X =(X 1,...,X n ) with some arbitrary numbering of leaves. The subvector of Y corresponding to the inner nodes is denoted by H. By construction the general Markov model is a statistical model for X. Let q M T be the true distribution and Σ = [ˆµ i j ] be the covariance matrix of X. A surprising fact proved in this paper is that the zeros in Σ, or equivalently, marginal independencies between components of X, completely determine the asymptotics for the marginal likelihood. We say that two nodes u,v of T are separated by another node w, if w lies on the unique path between u and v. Let l denote the number of inner nodes v of T such that for each triple i, j,k of leaves separated in T by v we have ˆµ i j ˆµ ik ˆµ jk = 0 but there exist leaves i, j separated by v such that ˆµ i j 0. In terms of the conditional independence defining the general Markov model, an inner node v contributes to l if for every three leaves i, j,k such that X i X j X k H v at least two are marginally independent but there exist two leaves i, j such that X i X j H v but not X i X j. In addition, we say that an inner node v is degenerate (or q-degenerate) if for any two leaves i, j separated by v we have ˆµ i j = 0. In other words v is degenerate if for every i, j such that X i X j H v also X i X j. All other nodes are called nondegenerate. We denote by n e the number of edges of T and by n v the number of its nodes. The following result is a special instance of (Watanabe, 009, the Main Formula II, p. 34): 385

4 ZWIERNIK Theorem 1 Let T r = (V,E) be a tree rooted in r and X (N) be a random sample from q. With assumptions (A1), (A) and (A3), if there are no q-degenerate nodes then, as N, EF N = N S+ n v+ n e l logn+ O(1). Note, in particular, that the above formula is independent of the rooting. Example 1 Let n=4 and assume that data are generated from the Bayesian network given by the quartet tree in Figure. If q is such that Σ has no zeros then l = 0 and the coefficient of logn is 11 Figure : A quartet tree rooted in r.. This corresponds to the classical BIC since the dimension of the parameter space is 11. If the true distribution q M T satisfies in addition the marginal independence condition X 1 (X,X 3,X 4 ) then ˆµ 1i = 0 for i=,3,4 and r contributes to l. We depict this situation on the left hand side in Figure 3. Here the dashed edge means that for every pair i, j of leaves separated by this edge ˆµ i j = 0 and an inner node contributes to l if its valency, in the forest with the dashed edges removed, is. In this case l = 1 and the coefficient of logn is 9. If, in addition, q satisfies X 1 X 3 (X,X 4 ) then l = and hence the coefficient is 7. The corresponding graph is depicted in the middle of Figure 3. Example (Naive Bayes model) Consider a star tree with one inner node and n leaves. If there are no degenerate nodes this corresponds to q being either a regular point or a type 1 singularity as defined by Rusakov and Geiger (005). If l = 0 then q is a regular point and the coefficient of logn is equal to n+1. If l = 1 then q is a type 1 singularity and the coefficient is equal to n 1. This corresponds exactly to (Rusakov and Geiger, 005, Theorem 4). If the inner node is degenerate this corresponds to the type singularity which does not satisfy assumptions of Theorem 1. If q is such that there are degenerate nodes the computation of the BIC is much harder because the likelihood in this case maximizes over a singular subset of the parameter space. The case of star Figure 3: Three graphs representing submodels of the quartet tree model with some additional marginal independencies. In the third case the root is degenerate. 386

5 ASYMPTOTICS OF THE MARGINAL LIKELIHOOD FOR TREE MODELS trees was investigated by Rusakov and Geiger (005). In this paper we obtain a closed form formula for the BIC in the case of trivalent trees, whose all inner nodes have valency three. This is provided in Theorem which together with Theorem 1 are the main results of this paper. The importance of trivalent trees follows mainly from the fact that any other tree model is a submodel of a model for a trivalent tree. They also form a natural class for models of evolution in biology. If T is trivalent then for every inner node v V there exist A,B,C [n] such that A B C = [n] and A, B,C are separated by v. By the defining conditional independence conditions we have that X A X B X C Y v, where X A = (X i ) i A. In this case we call v degenerate if q is such that X A X B X C. Let l 0 denote the number of degenerate nodes. Theorem Let T r =(V,E) be a rooted trivalent tree with n 3 leaves and root r. With assumptions (A1), (A) and (A3) if r is degenerate but all its neighbors are not, then, as N, ( nv + n e l EF N = N S+ 5l ) 0+ 1 logn+ O(1). 4 In all other cases, as N, ( nv + n e l EF N = N S+ 5l ) 0 logn cloglogn+ O(1), 4 where c is a nonnegative integer. Moreover c = 0 always if either both r is nondegenerate or if r and all its neighbors are degenerate. The coefficients of logn above are given in this special form to show the correction term with respect to the coefficient in the smooth case in Theorem 1. Example 3 Consider again the quartet model from Example 1. If Σ has no zeros then l = 0, l 0 = 0 and we get the same formula as previously with coefficient 11. Now assume that q is such that in addition the marginal independence X 1 X (X 3,X 4 ) holds. The situation is depicted on the right hand side in Figure 3. The edge(r,a) is dashed since for any two leaves separated by this edge the corresponding covariance is zero. In this case l = 1, l 0 = 1, the root is degenerate but all its neighbors are not and hence, by Theorem, the coefficient of logn is 3 and c=0. Consider finally the case when all off-diagonal elements of Σ are zero. In this case l = 0 and l 0 = and hence the coefficient of logn is also 3. However, later in Example 5 and Remark 9 we will see that c may be strictly greater than zero in this case. Following Rusakov and Geiger (005), the main method of the proof is to change the coordinates of the model so that the induced parameterization becomes simple. This gives us a much better insight into the model structure which is described by Zwiernik and Smith (011b) and Zwiernik and Smith (011a). Since the BIC is invariant with respect to these changes, the reparameterized problem still gives the solution to the original question. Our main analytical tool is the real logcanonical threshold (see for example Watanabe, 009). This is an important geometric invariant which in certain cases can be computed in a relatively simple way using discrete geometry. The relevance of this invariant to the BIC is given by Theorem 4. We remark that the techniques developed in this paper can be applied to obtain the BIC also in the case of non-trivalent trees. The paper is organized as follows. In Section, following Watanabe (009), we provide the theory of asymptotic expansion of marginal likelihood integrals. This theory enables us to analyze 387

6 ZWIERNIK the asymptotic behavior of the marginal likelihood without the standard regularity assumptions. In Section 3 we define Bayesian networks on rooted trees. We also obtain a basic result on the BIC in the case when the observed likelihood is maximised over a sufficiently smooth subset of the parameter space. This gives a simple proof of Theorem 1. The proof of Theorem is more technical and hence divided into three main steps split between Sections 4, 5 and 6. Finally, in Section 7, we combine all these results.. Asymptotics of Marginal Likelihood Integrals In this section we introduce the real log-canonical threshold and link it to the asymptotic behavior of marginal likelihood integrals. We present how this enables us to obtain the BIC in the case of a general class of statistical models, which is mostly based on previous results of Sumio Watanabe..1 The Real Log-canonical Threshold Given θ 0 R d, leta θ0 (R d ) be the ring of real-valued functions f :R d R that are analytic at θ 0. Given a subset Θ R d satisfying (A3), let A Θ (R d ) be the ring of real functions analytic at each point θ 0 Θ. If f A Θ (R d ), then for every θ 0 Θ, f can be locally represented as power series centered at θ 0. Denote by A Θ (Rd ) the subset of A Θ (R d ) consisting of all non-negative functions. Usually the ambient space is clear from the context and in this case we omit it in our notation writing A θ0,a Θ anda Θ. Definition 3 (The real log-canonical threshold) Given a compact semianalytic set Θ R d such that dimθ=d, a real analytic function f A Θ (Rd ) and a smooth positive function ϕ : R d R, consider the zeta function defined as ζ(z)= f(θ) z ϕ(θ)dθ. () Θ By Theorem.4 of Watanabe (009) this function is extended to a meromorphic function on the entire complex line and its poles are real and positive. The real log-canonical threshold of f denoted by rlct Θ ( f ;ϕ) is the smallest pole of ζ(z). By mult Θ ( f ;ϕ) we denote the multiplicity of this pole. By convention if ζ(z) has no poles then rlct Θ ( f ;ϕ) = and mult Θ ( f ;ϕ) = d. If ϕ(θ) 1 then we omit ϕ in the notation writing rclt Θ ( f) and mult Θ ( f). Define RLCT Θ ( f ;ϕ) to be the pair (rlct Θ ( f ;ϕ),mult Θ ( f ;ϕ)), and we order these pairs so that (r 1,m 1 )>(r,m ) if r 1 > r, or r 1 = r and m 1 < m. LetM = p(θ) m 1 be a parametric discrete model and q m 1 be a probability distribution. WithM and q fixed the Kullback-Leibler distance K : Θ R is defined by K(θ)= m i=1 q i log q i p i (θ). (3) It is well known that K(θ) 0 on Θ and K(θ) = 0 if and only if p(θ) = q. If q is the true data generating distribution then assumption (A) means that Θ={θ : K(θ)=0} is non-empty. The following theorem gives the motivation to study the real log-canonical threshold in the statistical context. 388

7 ASYMPTOTICS OF THE MARGINAL LIKELIHOOD FOR TREE MODELS Theorem 4 (Watanabe) Let M be a parametric discrete statistical model, q the true data generating distribution and K the corresponding Kullback-Leibler distance. With assumptions (A1), (A) and (A3), as N, EF N = N S+rlct Θ (K;ϕ)logN+(mult Θ (K;ϕ) 1)loglogN+ O(1). To compute the real log-canonical threshold we split the integral in () into a sum of finitely many integrals over small neighbourhoods Θ 0 of some points θ 0 Θ for which we have efficient tools of computation. We can always do this using the partition of unity since Θ is compact. For each of the local integrals we use Hironaka s theorem to reduce it to a locally monomial case. The details are presented by Watanabe (009). Let θ 0 Θ and let W 0 be any sufficiently small open ball around θ 0 in R d. Then, by Theorem.4 of Watanabe (009), RLCT W0 ( f ;ϕ) does not depend on the choice of W 0 and hence it is denoted by RLCT θ0 ( f ;ϕ). If f(θ 0 ) 0 then RLCT W0 ( f ;ϕ) = (,d) and hence we can constrain only to points θ 0 such that f(θ 0 )=0. In our context this means that we consider only points in the q-fiber Θ. The local computations give the answer to the global question because, by (Lin, 011, Proposition.5), the set of pairs RLCT Θ0 ( f ;ϕ) for θ 0 Θ has a minimum and RLCT Θ ( f ;ϕ)= min θ 0 Θ RLCT Θ 0 ( f ;ϕ), (4) where Θ 0 = W 0 Θ is the neighbourhood of θ 0 in Θ. For each θ 0 Θ to compute RLCT Θ0 ( f ;ϕ) we consider two cases. If θ 0 lies in the interior of Θ then we can assume Θ 0 = W 0 and hence RLCT Θ0 ( f ;ϕ)=rlct θ0 ( f ;ϕ). If θ 0 bd(θ), where bd(θ) denotes the set of boundary points of Θ, the computations may change significantly because the real log-canonical threshold depends on the boundary conditions (cf. Example.7 of Lin, 011). Nevertheless, it can be showed that at least if there exists an open subset U R d such that U Θ 0 and f A U (Rd ) then Because in this case RLCT Θ0 ( f) RLCT θ0 ( f). (5) ( f(θ)) z dθ= ( f(θ)) z dθ+ ( f(θ)) z dθ W 0 Θ 0 W 0 \Θ 0 which implies that RLCT θ0 ( f)=min{rlct Θ0 ( f),rlct W0 \Θ 0 ( f)}. Finally, whenever Θ /0 we have RLCT Θ (K)= min RLCT Θ0 (K). (6) θ 0 Θ The following result enables us to obtain the BIC in the smooth case. Proposition 5 Let M be a parametric statistical model with parametrization p, and q be the true data generating distribution. Let K A Θ (Rd ) be the Kullback-Leibler distance defined in (3). Given (A1), (A) and (A3) assume that there exists a smooth manifold M R d satisfying Θ=M Θ. Then, as N, d d EF N = N S+ logn+ O(1), where d = dim Θ. 389

8 ZWIERNIK Proof By assumption (A1) there exist two constants c,c> 0 such that c<ϕ(θ)<c on Θ. Therefore c (K(θ)) z dθ<ζ(z)< C (K(θ)) z dθ Θ Θ and it follows that RLCT Θ (K;ϕ) = RLCT Θ (K). By Theorem 4 it suffices to prove the following lemma which generalises Proposition 3.3 of Saito (007). Lemma 6 Let Θ R d be a compact semianalytic set and f A Θ (Rd ). If there exists a smooth manifold M R d such that Θ = M Θ and θ 0 Θ then RLCT θ0 ( f) = RLCT Θ ( f) = ( d d,1) where d = dim Θ. To prove this recall that the real log-canonical threshold RLCT θ0 ( f) does not depend on the choice of a sufficiently small neighborhood W 0 of θ 0. Since Θ = M Θ and M is a smooth manifold it follows that for each point θ 0 of Θ there exists an open neighborhood W 0 of θ 0 in R d with local coordinates w 1,...,w d centered at θ 0 such that the local equation of Θ is w 1 + +w c = 0, where c=d d. A single blow-up π at the origin satisfies all the conditions of Hironaka s Theorem since in the new coordinates over one of the charts f(π(u)) = u 1a(u) where a(u) is nowhere vanishing and π (u)=u c 1 1. For other charts the situation is the same and hence RLCT θ0 ( f)=(c/,1). Since by (4) RLCT Θ ( f)=min θ0 Θ RLCT W0 Θ( f) it suffices to show that if θ 0 is a boundary point of Θ then RLCT W0 Θ( f) (c/,1). But this follows from (5) and the fact that RLCT θ0 ( f)=(c/,1) as θ 0 is a smooth point of M. The lemma is hence proved. 3. General Markov Models In this section we formally define the general Markov modelm T and give in Theorem 1 the asymptotic expansion of the marginal likelihood when q andm T satisfy conditions of Proposition Definition of the Model Class All random variables considered in this paper are assumed to be binary with the value either 0 or 1. Let T r =(V,E) be a rooted tree. For any directed edge e=(k,l) E we say that k and l are adjacent and k is a parent of l and we denote it by k=pa(l). For every β {0,1} V let p β =P( v V{Y v = β v }). The Markov process on T r is a sequence Y =(Y v ) v V of binary random variables such that for each β=(β v ) v V {0,1} V p β (θ)=θ (r) β r v V\r θ (v) β v β pa(v), (7) where θ (v) β v β = P(Y pa(v) v = β v Y pa(v) = β pa(v) ) and θ (r) β r = P(Y r = β r ). In a more standard statistical language these models are just fully observed Bayesian networks on rooted trees. Recall that n e = E and n v = V. Since θ (v) 0 i + θ(v) 1 i = 1 for all v V and i = 0,1 then the Markov process on T r defined by (7) has exactly n e + 1 free parameters in the vector θ: one for the root distribution θ (r) 1 and two for each edge (u,v) E given by θ (v) 1 0 and θ(v) 1 1. The parameter space is Θ T =[0,1] n e+1. The general Markov model on T r is induced from the Markov process on T r by assuming that all the inner nodes represent hidden random variables. Hence we consider induced marginal 390

9 ASYMPTOTICS OF THE MARGINAL LIKELIHOOD FOR TREE MODELS probability distributions over the leaves of T r. The set of leaves is denoted by L. We assume that T r has n leaves and hence we can associate L with the set [n] given some arbitrary numbering of the leaves. Let Y =(X,H) where X =(X 1,...,X n ) denotes the variables represented by the leaves of T r and H denotes the vector of variables represented by inner nodes, that is X = (Y v ) v L and H =(Y v ) v V\L. We define the general Markov modelm T to be the model in the probability simplex n 1 obtained by summing out in (7) over all possible values of the inner nodes. By definitionm T is the image of the map p : Θ T n 1 given by p α (θ)= H θ (r) β r v V\r θ (v) β v β pa(v) for any α {0,1} L, whereh is the set of all vectors β=(β v ) v V such that (β v ) v L = α. Because the model class does not depend on the rooting we usually omit the root r in the notation. For a more detailed treatment see (Semple and Steel, 003, Chapter 8). 3. The BIC in the Smooth Case For q M T let Σ = [ˆµ i j ] R n n be the corresponding covariance matrix of the random vector represented by the leaves of T. We define the q-fiber as Θ T = {θ Θ T : p(θ)=q} = {θ Θ T : K(θ)=0}. The geometry of Θ T is directly related to the real log-canonical threshold of the Kullback-Leibler distance. We now show that this geometry is determined by zeros in Σ. For this we need to introduce some new concepts. We say that that an edge e E is isolated relative to q if ˆµ i j = 0 for all i, j [n] such that e E(i j), where E(i j) denotes the set of edges in the path joining i and j. By Ê E we denote the set of all edges of T which are isolated relative to q. By T = (V,E\ Ê) we denote the forest obtained from T by removing edges in Ê. We now define relations on Ê and E\ Ê. For two edges e,e with either {e,e } Ê or {e,e } E\Ê write e e if either e=e or e and e are adjacent and all the edges that are incident with both e and e are isolated relative to q. Let us now take the transitive closure of restricted to pairs of edges in Ê to form an equivalence relation on Ê. Similarly, take the transitive closure of restricted to the pairs of edges in E\ Ê to form an equivalence relation in E\ Ê. We will let [Ê] and [E\ Ê] denote the set of equivalence classes of Ê and E\ Ê respectively. By construction all the inner nodes of T have either degree zero in T or the degree is strictly greater than one. We say that a node v V is non-degenerate with respect to q if either v is a leaf of T or degv in T. Otherwise we say that the node is degenerate with respect to q. Note that this coincides with the definition of a degenerate node given in the introduction. Moreover, the isolated edges in Examples 1 and 3 correspond precisely to the dashed edges in Figure 3. The set of all nodes which are degenerate with respect to q is denoted by V. Proposition 7 (Zwiernik and Smith, 011b) Let T be a tree with n leaves. Let q M T and let T be defined as above. If each of the inner nodes of T has degree at least two in T then Θ T is a manifold with corners and dim Θ T = l, where l is the number of nodes which have degree two in T. In this way we can compute the asymptotic behavior of the marginal likelihood in the case when assumptions of Proposition 7 are satisfied. 391

10 ZWIERNIK Proposition 8 Let T be a tree and M T be the corresponding general Markov model. Let q M T be the real distribution generating the data such that each inner node of T has degree at least two in T. Then ( ) nv + n e l RLCT ΘT (K)=,1. Proof Since every inner node of T has degree at least two in T then by Proposition 7 there exists a smooth manifold M R n v+n e such that Θ T = M Θ T and dim Θ=l. The result follows from Proposition 5 and the fact that dimθ T = n e + 1=n v + n e. By Theorem 4, Proposition 8 implies Theorem 1 since l in its statement is exactly the number of inner nodes v such that the degree of v in T is two. Remark 9 Theorem 1 is still true if (A1) is replaced by the assumption that the prior distribution is bounded on Θ T and there exists an open subset of Θ T with a non-empty intersection with Θ T where the prior is strictly positive. In particular we can use conjugate Beta priors θ (v) 1 i Beta(α(v) i,β (v) i ) as long as α (v) i,β (v) i The Ideal-theoretic Approach In this section we define the real log-canonical threshold of an ideal. Theorem 11 translates the problem of finding the real log-canonical threshold of the Kullback-Leibler distance into algebra. We then analyse general Markov models from this perspective. In Theorem 14 we apply a useful change of coordinates which enables us to work out the real log-canonical threshold in the singular case. 4.1 The Real Log-canonical Threshold of an Ideal Let f 1,..., f r A Θ then the ideal generated by f 1,..., f r is a subset ofa Θ denoted by f 1,..., f r ={ f A Θ : f(θ)= r i=1 h i (θ) f i (θ),h i A Θ }. Following Lin (011) we generalize the notion of the real log-canonical thresholds to the ideal I = f 1,..., f r. This mirrors the analytic definition of the log-canonical threshold of an ideal (see for example Lazarsfeld, 004, Section 9.3.D). By definition RLCT Θ (I;ϕ)=RLCT Θ ( f 1,..., f r ;ϕ) := RLCT Θ ( f ;ϕ), where f(θ)= f 1 (θ)+ + f r(θ). By (Lin, 011, Proposition 4.5) the real log-canonical threshold does not depend on the choice of generators of I. The following important proposition enables us to use the full power of the ideal-theoretic approach. Proposition 10 Let f,g A Θ (R d ) and let I be an ideal ina Θ (R d ). Then 39

11 ASYMPTOTICS OF THE MARGINAL LIKELIHOOD FOR TREE MODELS i Let ρ : Ω Θ be a proper real analytic isomorphism and ρ I ={ f ρ : f I} be the pullback of I toa Ω. Then, RLCT Θ (I;ϕ)=RLCT Ω (ρ I;(ϕ ρ) ρ ), where ρ denotes the Jacobian of ρ. ii If ϕ is positive and bounded on Θ then RLCT Θ (I;ϕ)=RLCT Θ (I). iii If there exist constants c,c > 0 such that cg(θ) f(θ) c g(θ) for every θ Θ then RLCT Θ ( f)= RLCT Θ (g). iv Let I = f 1,..., f r and J = g 1,...,g r where g i = u i f i for i = 1,...,r and there exist positive constants c,c such that c<u i (θ)< C for all θ Θ and for all i=1,...,r. Then RLCT Θ (I)= RLCT Θ (J). The ideal-theoretic approach proves to be useful in a fairly general statistical context: Theorem 11 (Lin, 011) Let p = (p 1,..., p m ) : Θ m 1 be a polynomial mapping and M = p(θ) be the statistical model of X with values in [m]. For a given point q M define I = p 1 (θ) q 1,..., p m (θ) q m A Θ. (8) Let q denote the true data generating distribution and K(θ) be the corresponding Kullback-Leibler distance defined in (3). Moreover, let ϕ the prior distribution on Θ satisfying (A1). Then RLCT Θ (K;ϕ)=RLCT Θ (I ;ϕ)=rlct Θ (I), (9) where the second equation in (9) follows from Proposition 10 ii. We now perform the change of coordinates f θω : Θ T Ω T, f pκ : n 1 K T discussed in detail by Zwiernik and Smith (011b). We have the following diagram, where the top row is the original parametrization and where the induced parameterisation ψ T is given in the bottom row. Θ T f ωθ f θω Ω T p ψ T n 1 f κp K T f pκ Here f θω and f pκ are polynomial isomorphisms, and hence, by Proposition 10 (i), in our computations of the real-log canonical threshold, we can alternatively constrain to the bottom row of the diagram. We denote the coordinates of K T by κ=(κ I ) for I [n], I /0. The coordinates of Ω T are denoted by ω=((s v ),(η uv )) for all v V and (u,v) E. Both ω and κ have a statistical meaning as described by Zwiernik and Smith (011b). However, in this work we use them in a purely algebraic manner. We just note that the coordinates ofk T are certain functions of the moments. In particular, κ i =EX i for i=1,...,n and each κ i j corresponds to the covariance between X i and X j. Interpretation of other coordinates is more complicated. 393

12 ZWIERNIK Simple linear constraints defining Θ T become only slightly more complicated when expressed in the new parameters. The choice of parameter values is not free anymore in the sense that constraints for each of the parameters involve other parameters. The new parameter space Ω T is given by s r [ 1,1] and for each (u,v) E (cf. Equation (19) in Zwiernik and Smith, 011b) (1+s v ) (1 s u )η uv (1 s v ) (1 s v ) (1+s u )η uv (1+s v ). (10) In the new coordinate system the situation is more tractable because ψ T has a simpler structure. Proposition 1 (Zwiernik and Smith, 011b) Let T r = (V,E) be a rooted trivalent tree with n leaves. Then for each i=1,...,n one has κ i (ω)= 1 (1 s i) and κ I (ω)= 1 4 (1 s r(i) ) v V(I)\I s degv v (u,v) E(I) η uv for all I, where the degree of v V(I) is considered in T(I)=(V(I),E(I)), which is the smallest subtree of T containing I. Let I denote the pullback of the ideal I A ΘT to the ideal in A ΩT induced by f θω. Thus I = f ωθ I ={ f f ωθ : f I}. The ideal describes Ω T = f θω ( Θ T ) as a subset of Ω T. Let [n] k denote all subsets of [n] with at least k elements. Then the pullback of I satisfies ( ) I = κ 1 ˆκ 1,...,κ n ˆκ n + κ I (ω) ˆκ I, (11) I [n] where ˆκ I are the corresponding coordinates of f pκ (q). Here the sum of ideals results in another ideal with the generating set which is the sum of generating sets of the summands. For local computations we use the following reduction. Proposition 13 (Lin, 011) Let I A x0 (R m ), J A y0 (R n ) be two ideals. If RLCT x0 (I)=(λ x,m x ) and RLCT y0 (J)=(λ y,m y ) then RLCT (x0,y 0 )(I+ J)=(λ x + λ y,m x + m y 1). Theorem 14 Let T r be a rooted tree with n leaves and q M T. Let I be the ideal defined by (8) and I the ideal defined by (11). Then RLCT ΘT (I)=RLCT ΩT (I)= min ω 0 Ω T RLCT Ω0 (I), where Ω 0 is a sufficiently small neighborhood of ω 0 in Ω T. Moreover, let J = I [n] κ I (ω) ˆκ I. Then, for every ω 0 Ω T ( n ) RLCT ω0 (I)=,0 + RLCT ω0 (J). (1) 394

13 ASYMPTOTICS OF THE MARGINAL LIKELIHOOD FOR TREE MODELS Proof Since f ωθ is an isomorphism with a constant Jacobian then the first part of the theorem follows from Proposition 10 (i). Let now W be an ε-box around ω 0 =((s 0 v),(η 0 uv)). If T is rooted in an inner leaf then by Proposition 1 the ideal J does not depend on s 1,...,s n. Since for every i=1,...,n the expression κ i ˆκ i depends only on s i then RLCT ω0 ( κ 1 ˆκ 1,...,κ n ˆκ n )=( n,1), which can be easily checked (see for example Proposition 3.3 of Saito, 007). Equation (1) follows from Proposition 13. Now assume that T is rooted in one of the leaves. In this case both κ 1 ˆκ 1,...,κ n ˆκ n and J depend on s r because κ I (ω)=(1 s r) f I (ω) for some monomial f I (ω) whenever r I. Therefore, we cannot use Proposition 13 directly. However, by assumption (A), q lies in the interior of the probability simplex and hence ˆκ i (0,1) for i = 1,...,n which is equivalent to s 0 i ( 1,1). Therefore, for each ω 0 one can find two positive constants c,c such that c 1 s r C in W. By Proposition 10 (iv) the real log canonical threshold of J in W is equal to the real log-canonical threshold of a an ideal with generators induced from the generators of J by replacing each 1 s r by 1. Now again (1) follows from Proposition The Main Reduction Step Recall that ˆκ i j = Cov(X i,x j ). In this section we prove a technical result which enables us to reduce the computations of RLCT ω0 (J) to two simpler cases. First, when q is such that ˆκ i j 0 for all i, j [n]. Second, when q is such that ˆκ i j = 0 for all i, j [n]. Moreover, the second case is reduced to computations for monomial ideals which are amenable to various combinatorial techniques. Let T be a trivalent tree with n 3 leaves and let q M T. If all the equivalence classes in [Ê] are singletones or [Ê] is empty, which is equivalent to every inner node being of degree at least two in T, then Theorem 1 gives us the asymptotic behavior of the marginal likelihood. Thus, let assume that there is at least one class in[ê] which is not a singleton. Let T 1,...,T k denote trees representing the equivalence classes in [Ê] and let S 1,...,S m denote trees induced by the connected components of E\ Ê. Let L 1,...L k denote the sets of leaves of T 1,...,T k. For each S i i=1,...,m by Remark 5. (iv) of Zwiernik and Smith (011b) its set of leaves denoted by [n i ] is a subset of [n]. For each S i the number of nodes, edges and nodes of degree in T is denoted by n i v, n i e and l i respectively. We illustrate this notation in Figure 4 where the dashed edges represent edges in Ê. Simpler examples are given in Figure 3. Lemma 15 Let T =(V,E) be a trivalent rooted tree with n 4 leaves and let q M T. Let J = I [n] κ I (ω) ˆκ I as in Theorem 14. If ω 0 Ω T then RLCT ω0 (J)= m i=1 RLCT ω0 (J(S i ))+ k i=1 RLCT ω0 (J(T i ))+(0,1 m k), (13) where J(S i )= I [ni ] κ I (ω) ˆκ I for i=1,...,m and J(T i )= w,w L i κ ww (ω) for i=1,...,k. Proof We first show that I:ˆκI =0 κ I (ω) = i, j:ˆκi j =0 κ i j (ω). The inclusion is clear. We now show. First note that for every I [n] if ˆκ I = 0 then either η 0 e = 0 for an edge e E(I) or 395

14 ZWIERNIK Figure 4: An example of a forest T induced by a point q. s r(i) = 1. There exist i, j I such that such that ˆκ i j= 0 and the r(i j)=r(i). It follows by Proposition 1 that κ I (ω)=κ i j (ω) f(ω) for a polynomial f(ω) and therefore the inclusion is also true. This implies J = κ I (ω) ˆκ I + I:ˆκ I 0 I:ˆκ I =0 κ I (ω) = Hence, to proof the lemma, it suffices to show that for every ω 0 Ω T RLCT ω0 ( m i=1 m i=1 J(S i )+ κ i j (ω). i, j:ˆκ i j =0 J(S i )+ κ i j (ω) ) (14) i, j:ˆκ i j =0 is equal to the right hand side of (13). If e E \ Ê then by definition there exist i, j [n] such that ˆκ i j 0 and e E(i j). Since, by Proposition 1, ˆκ i j = η 0 e f(ω 0 ) for a polynomial f then in particular η 0 e 0. It follows that for a sufficiently small ε for each E E\ Ê one can find positive constants c(ε),c(ε) such that c(ε) e E η e C(ε) holds in the ε-box around ω 0. Similarly if v / V (cf. Section 3.) then there exist positive constants d(ε),d(ε) such that d(ε) (1 s v) D(ε) in the ε-box around ω 0. It follows by Proposition 10 (iv) that in computations of the real log-canonical threshold in (14) we can replace each κ i j (ω) by (1 s r(i j) )δ r(i j) η e (15) e E(i j) Ê where δ r(i j) = 1 if r(i j) V and δ r(i j) = 0 otherwise. Thus, in (14) we can replace the ideal i, j:ˆκi j =0 κ i j (ω) by the ideal J 1 = i, j: ˆκi j =0 (1 s r(i j) )δ r(i j) η e E(i j) Ê e. However, if we define J = k i=1 w,w L i (1 s r(ww ) )δ r(ww ) e E(ww ) η e (16) then it can be checked that J 1 = J. To show that J J 1, fix j = 1,...,k and w,w L j, and show that the corresponding generator of J lies in J 1. Note that by construction each of w,w either has degree two in T or is a leaf of T. Hence, by the definition of Ê, there exist i, j [n] such that E(i j) Ê = E(ww ). It follows that each generator in (16) is also in the set of generators of J 1 and hence J J 1. To show the opposite inclusion, note that, if E(i j) intersects with more than one component T 1,...,T k then the corresponding generator in (15) is a product of some generators in (16) and hence it lies in J. 396

15 ASYMPTOTICS OF THE MARGINAL LIKELIHOOD FOR TREE MODELS Since the generators of every J(S i ) for i=1,...,m and every (1 s r(ww ))δr(ww ) w,w L j e E(ww ) for j= 1,...,k involve disjoint sets of variables then by Proposition 13 the term in (14) is equal to ( ) m k RLCT ω0 (J(S i ))+ RLCT ω0 (1 s r(ww ) )δr(ww ) η e +(0,1 m k). i=1 i=1 w,w L j η e e E(ww ) Again by Proposition 10 (iv) for each i=1,...,k ( ) RLCT ω0 (1 s r(ww ) )δr(ww ) η e = RLCT ω0 (J(T i )) w,w L i e E(ww ) which finishes the proof. We note that, by Proposition 8 and the formula in (1) for each S i : 6. The Case of Zero Covariances RLCT(J(S i ))+ n i = ni v+ n i e l i. (17) In this subsection we assume that q M T is such that ˆκ i j = 0 for all i, j [n]. This implies the full joint marginal independence X 1 X n. The aim is to prove the following proposition. Proposition 16 Let T be a trivalent tree with n 3 leaves rooted in r V. Let q M T be such that ˆκ i j = 0 for all i, j [n]. Let J be the ideal defined in Theorem 14. Then ( n ) min RLCT ω0 (J)= ω 0 Ω T 4,m, where m=1 if either r is a leaf of T or r together with all its neighbors are all inner nodes of T. In all other cases we cannot obtain an explicit upper bound for m and hence m 1. The strategy of the proof of Proposition 16 is as follows. First, in Section 6.1, we show that the local computations can be restricted to a special subset of Ω T over which J can be replaced by a monomial ideal. Then, in Section 6., we present a method to compute the real log-canonical threshold of a monomial ideal. We use this method in Section The Deepest Singularity Note that, by Lemma 15, RLCT ω0 (J) = RLCT ω0 ( i, j [n] κ i j (ω) ) so without loss of generality we will assume in this section that J = i, j [n] κ i j (ω). Moreover, for each v V these ideals depend on s v only through the value of s v. It follows that the computations can be reduced only to points satisfying s v 0 for all inner nodes v of T. Henceforth, in this section, we always assume this is the case. We define the deepest singularity of Ω T as Ω deep :={ω Ω T : η e = 0 for all e Ê, s v = 1 for all v V}. 397

16 ZWIERNIK We note that, since ˆκ i j = 0 for all i, j [n], then Ê = E and V is equal to the set of all inner nodes of T. It follows that Ω deep is an affine subspace constrained to Ω T with all coordinates either 0 or 1. Proposition 17 Let T be a tree with n leaves. Let q M T such that ˆκ i j = 0 for all i, j [n]. Then min ω 0 Ω T RLCT ω0 (J)= min ω 0 Ω deep RLCT ω0 (J). Proof We first show that Ω T is a union of affine subspaces constrained to Ω T with a common intersection given by Ω deep. Let V 0 V and E 0 Ê and Ω (V0,E 0 ) ={ω Ω T : s v = 1 for all v V 0, η uv = 0 for all (u,v) E 0 }. We say that(v 0,E 0 ) is minimal for Σ if for every point ω in Ω (V0,E 0 ) and for every i, j [n] κ i j (ω)=0, and furthermore, that (V 0,E 0 ) is minimal with such a property (with respect to inclusion on both coordinates). We now show that Ω T = (V 0,E 0 ) min. Ω (V0,E 0 ). The first inclusion follows from the fact that if ω Ω T then κ i j (ω)= ˆκ i j = 0 for all i, j [n]. Therefore ω Ω (V0,E 0 ) for some minimal(v 0,E 0 ). The second inclusion is obvious. Each Ω (V0,E 0 ) is an affine subspace in R V + E, denoted by M (V0,E 0 ), constrained to Ω T. Let S denote the intersection lattice of all M (V0,E 0 ) for (V 0,E 0 ) minimal with ordering denoted by. For each i S let M (i) denote the corresponding intersection and define S i = M (i) \ j<im ( j). In this way we obtain an S-induced decomposition of R V + E (cf. Section 3.1 in Goresky and MacPherson, 1988). By (Lazarsfeld, 004, Example ) the function ω rlct ω (J) is lower semicontinuous (the argument used there works over the real numbers). This means that for every ω 0 Ω T and ε>0 there exists a neighborhood U of ω 0 such that rlct ω0 (J) rlct ω (J)+ε for all ω U. Since the set of values of the real log-canonical threshold is discrete this means that for every ω 0 Ω T and any sufficiently small neighborhood W 0 of ω 0, one has rlct ω0 (J) rlct ω (J) for all ω W 0. Moreover, rlct(j) is constant on each S i. Since for any neighborhood W 0 of ω 0 Ω deep we have W 0 S i /0 for all i S then necessarily the minimum of the real log-canonical threshold is attained for a point in the deepest singularity. Proposition 17 shows that in the singular case we can restrict our analysis to the neighborhood of Ω deep. Often however, we also consider points in a bigger set Ω 0 ={ω Ω T : η uv = 0 for all (u,v) Ê}. Note that Ω deep lies on the boundary of Ω T (cf. (10)) but Ω 0 also contains internal points of Ω T which will be crucial for some of the arguments later. We now formulate another technical lemma which enables us to reduce computations to the monomial case. 398

17 ASYMPTOTICS OF THE MARGINAL LIKELIHOOD FOR TREE MODELS Lemma 18 Assume that q M T is such that ˆκ i j = 0 for all i, j [n]. Let J(ω 0 ) be the ideal J translated to the origin. Then for every ω 0 Ω 0 RLCT 0 (J(ω 0 ))=RLCT 0 (J ), (18) where J is a monomial ideal such that each κ i j (ω+ω 0 ) in the set of generators of J(ω 0 ) is replaced either by s r(i j) (u,v) E(i j) η uv if s 0 r(i j) = 1, or by (u,v) E(i j) η uv if s 0 r(i j) 1. Proof Let i, j [n] and assume that ω 0 =((s 0 v),(η 0 e)) Ω 0 so that η 0 e = 0 for all e E. Then, by Proposition 1: κ i j (ω+ω 0 )= 1 4 (1 (s r(i j)+ s 0 r(i j) ) ) η e. (19) e E(i j) If s 0 r(i j) 1 for a sufficiently small ε>0 there exist positive constants c(ε), C(ε) such that c(ε)< 1 (s r(i j) +s 0 r(i j) ) < C(ε) for s r(i j) ( ε,ε). Therefore, by Proposition 10 (iv), we can replace this term in (19) with 1. If s 0 r(i j) = 1 rewrite 1 (1+s r(i j) ) as s r(i j) (+s r(i j) ). For a sufficiently small ε we can find two positive constants c(ε),c(ε) such that c<+s r(i j) < C whenever s r(i j) ( ε,ε). Again, by Proposition 10 (iv), we can replace +s r(i j) with 1. This proves Equation (18). Since J is a monomial ideal then, by (Lin, 011, Proposition 4.11) and Theorem 0 below, we can compute RLCT 0 (J ) using the method of Newton diagrams. We present this method in the following subsection. 6. Newton Diagram Method Given an analytic function f A 0 (R d ) we pick local coordinates x=(x 1,...,x d ) in a neighborhood of the origin. This allows us to represent f as a power series in x 1,...,x d such that f(x)= α c α x α. The exponents of terms of the polynomial f are vectors inn d. The Newton polyhedron of f denoted by Γ + ( f) is the convex hull of the subset {α+α : c α 0,α R d 0}. A subset γ Γ + ( f) is a face of Γ + ( f) if there exists β R d such that γ={α Γ + ( f) : α,β α,β for all α Γ + ( f)}. If γ is a subset of Γ + ( f) then we define f γ (x) = α γ N d c α x α. The principal part of f is, by definition, the sum of all terms of f supported on all compact faces of Γ + ( f). Example 4 Let f(x,y)=x 3 + xy+6x y+3x 4 y+y. Then the Newton diagram looks as follows: 399

18 ZWIERNIK where the dots correspond to the terms of f. There are only two bounded facets of Γ + ( f) and the principal part of f is equal to x 3 + xy+y. Definition 19 The principal part of the power series f with real coefficients is R-nondegenerate if for all compact faces γ of Γ + ( f) { x R n : f γ (x)= = f } γ (x)=0 {ω R n : x 1 x n = 0}. x 1 x n From the geometric point of view this condition means that the singular locus of the hypersurface defined by f γ (x)=0 lies outside of (R ) n for all compact faces γ of Γ + ( f). The following theorem shows that, if the principal part of f isr-nondegenerate and f A Θ, the computations are greatly facilitated. An example of an application of these methods in statistical analysis can be found in Yamazaki and Watanabe (004). Theorem 0 (Arnold et al., 1988) Let f A 0 (Rd ) and f(0)=0. If the principal part of f is R- nondegenerate then RLCT 0 ( f)=( 1 t,c) where t is the smallest number such that the vector(t,...,t) hits the polyhedron Γ + ( f) and c is the codimension of the face it hits. For a proof see Theorem 4.8, Lin (011). Let now f A θ 0 such that f(θ 0 )=0. We can then center f at θ 0 obtaining a function in A 0. Then we can use Theorem 0 to compute RLCT θ0 ( f). Remark 1 Note that this theorem in general will not give us RLCT Θ0 ( f) if 0 is a boundary point of Θ in which case we also need to resolve the defining inequalities. For a discussion see (Arnold et al., 1988, Section 8.3.4) and Example.7 in Lin (011). 6.3 Proof of Proposition 16 Let n 4. For each ω 0 Ω 0, let δ=δ(ω 0 ) {0,1} V denote the indicator vector satisfying δ v = 1 if v V is such that s 0 v = 1 and δ v = 0 otherwise. In particular δ i = 0 for all i=1,...,n because the leaves, by (A), are assumed to be non-degenerate. LetV δ =R n e+ δ =R δ R n e, where δ = v δ v, be the real space with variables representing the edges (x e ) e E and nodes (y v ) for all v such that δ v = 1. With some arbitrary numbering of the nodes and edges we order the variables as follows: y 1 y δ x e1 x ene. In Lemma 18, for each ω 0 Ω 0, we reduced our computations to the analysis of RLCT 0 (J ) where J has a simple monomial form. Let Q δ be a polynomial 3300

19 ASYMPTOTICS OF THE MARGINAL LIKELIHOOD FOR TREE MODELS function on Ω T defined as a sum of squares of generators of J. In particular RLCT 0 (J ) = RLCT 0 (Q δ ). The exponents of terms of the polynomial Q δ (ω) are vectors in {0,} ne+ δ. We have that Q δ (ω)= i j [n] s δ r(i j) r(i j) (u,v) E(i j) η uv. (0) The convex hull, in V δ, of the exponents of the terms in Q δ is called the Newton polytope of Q δ and denoted Γ(Q δ ). We now investigate this polytope which is needed to understand the polyhedron Γ + (Q δ ), which is needed to use Theorem 0. Since each term of Q δ corresponds to a path between two leaves then the construction of the Newton polytope Γ(Q δ ) V δ gives a direct relationship between paths in T and the points generating the polytope. Convex combinations of points corresponding to paths give rise to points in the polytope. Let E 0 E be the subset of edges of T such that one of the ends is in the set of leaves of T. We call these edges terminal. Note that each point generating Γ(Q δ ) satisfies e E0 x e = 4. This follows from the fact that each of these points corresponds to a path between two leaves in T and every such a path need to cross exactly two terminal edges. Consequently each point of Γ(Q δ ) needs to satisfy this equation as well. The induced facet of the Newton polyhedron Γ + (Q δ ) is given as and each point of Γ + (Q δ ) satisfies e E0 x e 4. The following lemma proves one part of Proposition 16. F 0 ={(y,x) Γ + (Q δ ) : e E 0 x e = 4} (1) Lemma (The real log-canonical threshold of J ) Under assumptions of Proposition 16 we have that rclt 0 (J )= n 4. Proof If n= then, since s 0 1,s0 1, by Lemma 18 we have that ( ) 1 RLCT ω0 (J )=RLCT 0 (η 1)=,1. Therefore Proposition 16 holds in this case. Now assume that n 4. By Theorem 0 we have to show that t= 4 n is the smallest t such that the vector(t,...,t) hits Γ +(Q δ ). To show that 4 n 1 Γ +(Q δ ) we construct a point q Γ(Q δ ) such that q 4 n1 coordinatewise. The point is constructed as follows. Construction 3 Let T =(V,E) be a trivalent tree with n 4 leaves, rooted in r. We present two constructions of networks of paths between the leaves of T. The first construction is for the case when the root is degenerate, δ r = 1. In this case T is necessarily rooted in an inner node. If n = 4 then the network consists of the two paths within cherries counted with multiplicity two. Each of the paths corresponds to a point in Γ(Q δ ). We order the coordinates ofv δ =R 5+ δ by y a y b x 1 x 5 where y a,y b are included only if δ a,δ b = 1. For example the point corresponding to the path involving edges e 1 and e is(,0;,,0,0,0). The barycenter of the points corresponding to all the four paths in the network is (1,1;1,1,0,1,1) both if T is rooted in a or b. 3301

20 ZWIERNIK If n>4 then we build the network recursively. Assume that T is rooted in an inner node a and pick an inner edge (a,b). Label the edges incident with a and b as for the quartet tree above and consider the subtree given by the quartet tree. Draw four paths as on the picture above. Let v be any leaf of the quartet subtree which is not a leaf of T and label the two additional edges incident with v by e 6 and e 7. Now we extend the network by adding e 6 to one of the paths terminating in v and e 7 to the other. Next we add an additional path involving only e 6 and e 7 like on the picture below. By construction v is the root of the additional path. We extend the network cherry by cherry until it covers all terminal edges. Note that we have made some choices building up the network and hence the construction is not unique. However, each of the inner nodes is always a root of at least one and at most two paths. Moreover, each edge is covered at most twice and each terminating edge is covered exactly two times. We have n paths in the network, all representing points of Γ(Q δ ) denoted by q 1,...,q n. Let q= 1 n n i=1 q i then q Γ(Q δ ) is given by x ab = 0, x e = 4 n for all e E\(a,b). The other coordinates by construction satisfy y a = 4 n, y b = 4 n if δ b = 1, and y v = n for all v V \{a,b} such that δ v = 1. If δ r = 0 then we proceed as follows. For n=4 consider a network of all the possible paths all counted with multiplicity one apart from the cherry paths (paths of length two) counted with multiplicity two. This makes eight paths and each edge is covered exactly four times. The coordinates of the point representing the barycenter of all paths in the network satisfy x e = 1 for all e E and y v = 1 for all v such that δ v = 1. This construction generalizes recursively in a similar way as the one for T rooted in an inner node. We always have n paths and each edge is covered exactly four times. The network induces a point q Γ(Q δ ) with coordinates given by y v = n for all v V such that δ v = 1 and x e = 4 n for e E. (This finishes the construction.) The point 4 n 1 lies in Γ +(Q δ ), which follows from Construction 3 and the fact that the constructed point q Γ(Q δ ) satisfies q 4 n 1. Moreover, for any s < 4 n the point s(1,...,1) does not satisfy e E0 x e 4 and hence it cannot be in Γ + (Q δ ). It follows that 4 n is the smallest t such that t1 Γ + (Q δ ) and therefore rlct 0 (J )= n 4. Note that the result does not depend on δ. To compute the multiplicity of the real log-canonical threshold of Q δ we have to get a better understanding of the polyhedron Γ + (Q δ ). According to Theorem 0 we need to find the codimension of the face of Γ + (Q δ ) hit by the vector 4 n1. First we find the hyperplane representation of the Newton polytope Γ(Q δ ) reducing the problem to a simpler but equivalent one. Definition 4 (A pair-edge incidence polytope) Let T =(V, E) be a trivalent tree with n 4 leaves. We define a polytope P n R n e, where n e = n 3, as the convex combination of points (q i j ) i, j [n] where k-th coordinate of q i j is one if the k-th edge is in the path between i and j and there is zero otherwise. We call P n a pair-edge incidence polytope by analogy to the pair-edge incidence matrix defined in (Mihaescu and Pachter, 008, Definition 1). 330

Asymptotic Approximation of Marginal Likelihood Integrals

Asymptotic Approximation of Marginal Likelihood Integrals Asymptotic Approximation of Marginal Likelihood Integrals Shaowei Lin 10 Dec 2008 Abstract We study the asymptotics of marginal likelihood integrals for discrete models using resolution of singularities

More information

arxiv: v6 [math.st] 19 Oct 2011

arxiv: v6 [math.st] 19 Oct 2011 Electronic Journal of Statistics Vol. 5 (2011) 1276 1312 ISSN: 1935-7524 DOI: 10.1214/11-EJS640 Implicit inequality constraints in a binary tree model arxiv:0904.1980v6 [math.st] 19 Oct 2011 Contents Piotr

More information

What is Singular Learning Theory?

What is Singular Learning Theory? What is Singular Learning Theory? Shaowei Lin (UC Berkeley) shaowei@math.berkeley.edu 23 Sep 2011 McGill University Singular Learning Theory A statistical model is regular if it is identifiable and its

More information

University of Warwick institutional repository:

University of Warwick institutional repository: University of Warwick institutional repository: http://go.warwick.ac.uk/wrap This paper is made available online in accordance with publisher policies. Please scroll down to view the document itself. Please

More information

arxiv: v2 [stat.me] 23 Dec 2015

arxiv: v2 [stat.me] 23 Dec 2015 Marginal likelihood and model selection for Gaussian latent tree and forest models arxiv:141.885v [stat.me] 3 Dec 015 Mathias Drton 1 Shaowei Lin Luca Weihs 1 and Piotr Zwiernik 3 1 Department of Statistics,

More information

2. Intersection Multiplicities

2. Intersection Multiplicities 2. Intersection Multiplicities 11 2. Intersection Multiplicities Let us start our study of curves by introducing the concept of intersection multiplicity, which will be central throughout these notes.

More information

10. Smooth Varieties. 82 Andreas Gathmann

10. Smooth Varieties. 82 Andreas Gathmann 82 Andreas Gathmann 10. Smooth Varieties Let a be a point on a variety X. In the last chapter we have introduced the tangent cone C a X as a way to study X locally around a (see Construction 9.20). It

More information

Total positivity in Markov structures

Total positivity in Markov structures 1 based on joint work with Shaun Fallat, Kayvan Sadeghi, Caroline Uhler, Nanny Wermuth, and Piotr Zwiernik (arxiv:1510.01290) Faculty of Science Total positivity in Markov structures Steffen Lauritzen

More information

9. Birational Maps and Blowing Up

9. Birational Maps and Blowing Up 72 Andreas Gathmann 9. Birational Maps and Blowing Up In the course of this class we have already seen many examples of varieties that are almost the same in the sense that they contain isomorphic dense

More information

Affine Geometry and Discrete Legendre Transform

Affine Geometry and Discrete Legendre Transform Affine Geometry and Discrete Legendre Transform Andrey Novoseltsev April 24, 2008 Abstract The combinatorial duality used in Gross-Siebert approach to mirror symmetry is the discrete Legendre transform

More information

12. Hilbert Polynomials and Bézout s Theorem

12. Hilbert Polynomials and Bézout s Theorem 12. Hilbert Polynomials and Bézout s Theorem 95 12. Hilbert Polynomials and Bézout s Theorem After our study of smooth cubic surfaces in the last chapter, let us now come back to the general theory of

More information

Partial cubes: structures, characterizations, and constructions

Partial cubes: structures, characterizations, and constructions Partial cubes: structures, characterizations, and constructions Sergei Ovchinnikov San Francisco State University, Mathematics Department, 1600 Holloway Ave., San Francisco, CA 94132 Abstract Partial cubes

More information

Facets for Node-Capacitated Multicut Polytopes from Path-Block Cycles with Two Common Nodes

Facets for Node-Capacitated Multicut Polytopes from Path-Block Cycles with Two Common Nodes Facets for Node-Capacitated Multicut Polytopes from Path-Block Cycles with Two Common Nodes Michael M. Sørensen July 2016 Abstract Path-block-cycle inequalities are valid, and sometimes facet-defining,

More information

Algebraic Varieties. Notes by Mateusz Micha lek for the lecture on April 17, 2018, in the IMPRS Ringvorlesung Introduction to Nonlinear Algebra

Algebraic Varieties. Notes by Mateusz Micha lek for the lecture on April 17, 2018, in the IMPRS Ringvorlesung Introduction to Nonlinear Algebra Algebraic Varieties Notes by Mateusz Micha lek for the lecture on April 17, 2018, in the IMPRS Ringvorlesung Introduction to Nonlinear Algebra Algebraic varieties represent solutions of a system of polynomial

More information

MATH 8253 ALGEBRAIC GEOMETRY WEEK 12

MATH 8253 ALGEBRAIC GEOMETRY WEEK 12 MATH 8253 ALGEBRAIC GEOMETRY WEEK 2 CİHAN BAHRAN 3.2.. Let Y be a Noetherian scheme. Show that any Y -scheme X of finite type is Noetherian. Moreover, if Y is of finite dimension, then so is X. Write f

More information

arxiv: v2 [math.ag] 24 Jun 2015

arxiv: v2 [math.ag] 24 Jun 2015 TRIANGULATIONS OF MONOTONE FAMILIES I: TWO-DIMENSIONAL FAMILIES arxiv:1402.0460v2 [math.ag] 24 Jun 2015 SAUGATA BASU, ANDREI GABRIELOV, AND NICOLAI VOROBJOV Abstract. Let K R n be a compact definable set

More information

Lecture 4 October 18th

Lecture 4 October 18th Directed and undirected graphical models Fall 2017 Lecture 4 October 18th Lecturer: Guillaume Obozinski Scribe: In this lecture, we will assume that all random variables are discrete, to keep notations

More information

Resolution of Singularities in Algebraic Varieties

Resolution of Singularities in Algebraic Varieties Resolution of Singularities in Algebraic Varieties Emma Whitten Summer 28 Introduction Recall that algebraic geometry is the study of objects which are or locally resemble solution sets of polynomial equations.

More information

Lecture 1. Toric Varieties: Basics

Lecture 1. Toric Varieties: Basics Lecture 1. Toric Varieties: Basics Taras Panov Lomonosov Moscow State University Summer School Current Developments in Geometry Novosibirsk, 27 August1 September 2018 Taras Panov (Moscow University) Lecture

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Semidefinite Programming

Semidefinite Programming Semidefinite Programming Notes by Bernd Sturmfels for the lecture on June 26, 208, in the IMPRS Ringvorlesung Introduction to Nonlinear Algebra The transition from linear algebra to nonlinear algebra has

More information

where m is the maximal ideal of O X,p. Note that m/m 2 is a vector space. Suppose that we are given a morphism

where m is the maximal ideal of O X,p. Note that m/m 2 is a vector space. Suppose that we are given a morphism 8. Smoothness and the Zariski tangent space We want to give an algebraic notion of the tangent space. In differential geometry, tangent vectors are equivalence classes of maps of intervals in R into the

More information

QUIVERS AND LATTICES.

QUIVERS AND LATTICES. QUIVERS AND LATTICES. KEVIN MCGERTY We will discuss two classification results in quite different areas which turn out to have the same answer. This note is an slightly expanded version of the talk given

More information

Globalization and compactness of McCrory Parusiński conditions. Riccardo Ghiloni 1

Globalization and compactness of McCrory Parusiński conditions. Riccardo Ghiloni 1 Globalization and compactness of McCrory Parusiński conditions Riccardo Ghiloni 1 Department of Mathematics, University of Trento, 38050 Povo, Italy ghiloni@science.unitn.it Abstract Let X R n be a closed

More information

5 Quiver Representations

5 Quiver Representations 5 Quiver Representations 5. Problems Problem 5.. Field embeddings. Recall that k(y,..., y m ) denotes the field of rational functions of y,..., y m over a field k. Let f : k[x,..., x n ] k(y,..., y m )

More information

Matroid Optimisation Problems with Nested Non-linear Monomials in the Objective Function

Matroid Optimisation Problems with Nested Non-linear Monomials in the Objective Function atroid Optimisation Problems with Nested Non-linear onomials in the Objective Function Anja Fischer Frank Fischer S. Thomas ccormick 14th arch 2016 Abstract Recently, Buchheim and Klein [4] suggested to

More information

11. Dimension. 96 Andreas Gathmann

11. Dimension. 96 Andreas Gathmann 96 Andreas Gathmann 11. Dimension We have already met several situations in this course in which it seemed to be desirable to have a notion of dimension (of a variety, or more generally of a ring): for

More information

Algebraic Geometry. Andreas Gathmann. Class Notes TU Kaiserslautern 2014

Algebraic Geometry. Andreas Gathmann. Class Notes TU Kaiserslautern 2014 Algebraic Geometry Andreas Gathmann Class Notes TU Kaiserslautern 2014 Contents 0. Introduction......................... 3 1. Affine Varieties........................ 9 2. The Zariski Topology......................

More information

Topological properties

Topological properties CHAPTER 4 Topological properties 1. Connectedness Definitions and examples Basic properties Connected components Connected versus path connected, again 2. Compactness Definition and first examples Topological

More information

The Triangle Closure is a Polyhedron

The Triangle Closure is a Polyhedron The Triangle Closure is a Polyhedron Amitabh Basu Robert Hildebrand Matthias Köppe November 7, 21 Abstract Recently, cutting planes derived from maximal lattice-free convex sets have been studied intensively

More information

Classification of root systems

Classification of root systems Classification of root systems September 8, 2017 1 Introduction These notes are an approximate outline of some of the material to be covered on Thursday, April 9; Tuesday, April 14; and Thursday, April

More information

Optimization and Optimal Control in Banach Spaces

Optimization and Optimal Control in Banach Spaces Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,

More information

Eilenberg-Steenrod properties. (Hatcher, 2.1, 2.3, 3.1; Conlon, 2.6, 8.1, )

Eilenberg-Steenrod properties. (Hatcher, 2.1, 2.3, 3.1; Conlon, 2.6, 8.1, ) II.3 : Eilenberg-Steenrod properties (Hatcher, 2.1, 2.3, 3.1; Conlon, 2.6, 8.1, 8.3 8.5 Definition. Let U be an open subset of R n for some n. The de Rham cohomology groups (U are the cohomology groups

More information

Let X be a topological space. We want it to look locally like C. So we make the following definition.

Let X be a topological space. We want it to look locally like C. So we make the following definition. February 17, 2010 1 Riemann surfaces 1.1 Definitions and examples Let X be a topological space. We want it to look locally like C. So we make the following definition. Definition 1. A complex chart on

More information

4. Images of Varieties Given a morphism f : X Y of quasi-projective varieties, a basic question might be to ask what is the image of a closed subset

4. Images of Varieties Given a morphism f : X Y of quasi-projective varieties, a basic question might be to ask what is the image of a closed subset 4. Images of Varieties Given a morphism f : X Y of quasi-projective varieties, a basic question might be to ask what is the image of a closed subset Z X. Replacing X by Z we might as well assume that Z

More information

SYMPLECTIC GEOMETRY: LECTURE 5

SYMPLECTIC GEOMETRY: LECTURE 5 SYMPLECTIC GEOMETRY: LECTURE 5 LIAT KESSLER Let (M, ω) be a connected compact symplectic manifold, T a torus, T M M a Hamiltonian action of T on M, and Φ: M t the assoaciated moment map. Theorem 0.1 (The

More information

THE CAPORASO-HARRIS FORMULA AND PLANE RELATIVE GROMOV-WITTEN INVARIANTS IN TROPICAL GEOMETRY

THE CAPORASO-HARRIS FORMULA AND PLANE RELATIVE GROMOV-WITTEN INVARIANTS IN TROPICAL GEOMETRY THE CAPORASO-HARRIS FORMULA AND PLANE RELATIVE GROMOV-WITTEN INVARIANTS IN TROPICAL GEOMETRY ANDREAS GATHMANN AND HANNAH MARKWIG Abstract. Some years ago Caporaso and Harris have found a nice way to compute

More information

π X : X Y X and π Y : X Y Y

π X : X Y X and π Y : X Y Y Math 6130 Notes. Fall 2002. 6. Hausdorffness and Compactness. We would like to be able to say that all quasi-projective varieties are Hausdorff and that projective varieties are the only compact varieties.

More information

MA 206 notes: introduction to resolution of singularities

MA 206 notes: introduction to resolution of singularities MA 206 notes: introduction to resolution of singularities Dan Abramovich Brown University March 4, 2018 Abramovich Introduction to resolution of singularities 1 / 31 Resolution of singularities Let k be

More information

THE POINCARE-HOPF THEOREM

THE POINCARE-HOPF THEOREM THE POINCARE-HOPF THEOREM ALEX WRIGHT AND KAEL DIXON Abstract. Mapping degree, intersection number, and the index of a zero of a vector field are defined. The Poincare-Hopf theorem, which states that under

More information

Tree sets. Reinhard Diestel

Tree sets. Reinhard Diestel 1 Tree sets Reinhard Diestel Abstract We study an abstract notion of tree structure which generalizes treedecompositions of graphs and matroids. Unlike tree-decompositions, which are too closely linked

More information

SOME REMARKS ON THE TOPOLOGY OF HYPERBOLIC ACTIONS OF R n ON n-manifolds

SOME REMARKS ON THE TOPOLOGY OF HYPERBOLIC ACTIONS OF R n ON n-manifolds SOME REMARKS ON THE TOPOLOGY OF HYPERBOLIC ACTIONS OF R n ON n-manifolds DAMIEN BOULOC Abstract. This paper contains some more results on the topology of a nondegenerate action of R n on a compact connected

More information

A new parametrization for binary hidden Markov modes

A new parametrization for binary hidden Markov modes A new parametrization for binary hidden Markov models Andrew Critch, UC Berkeley at Pennsylvania State University June 11, 2012 See Binary hidden Markov models and varieties [, 2012], arxiv:1206.0500,

More information

The Strong Largeur d Arborescence

The Strong Largeur d Arborescence The Strong Largeur d Arborescence Rik Steenkamp (5887321) November 12, 2013 Master Thesis Supervisor: prof.dr. Monique Laurent Local Supervisor: prof.dr. Alexander Schrijver KdV Institute for Mathematics

More information

4.1 Notation and probability review

4.1 Notation and probability review Directed and undirected graphical models Fall 2015 Lecture 4 October 21st Lecturer: Simon Lacoste-Julien Scribe: Jaime Roquero, JieYing Wu 4.1 Notation and probability review 4.1.1 Notations Let us recall

More information

COMP538: Introduction to Bayesian Networks

COMP538: Introduction to Bayesian Networks COMP538: Introduction to Bayesian Networks Lecture 9: Optimal Structure Learning Nevin L. Zhang lzhang@cse.ust.hk Department of Computer Science and Engineering Hong Kong University of Science and Technology

More information

FOUNDATIONS OF ALGEBRAIC GEOMETRY CLASS 27

FOUNDATIONS OF ALGEBRAIC GEOMETRY CLASS 27 FOUNDATIONS OF ALGEBRAIC GEOMETRY CLASS 27 RAVI VAKIL CONTENTS 1. Proper morphisms 1 2. Scheme-theoretic closure, and scheme-theoretic image 2 3. Rational maps 3 4. Examples of rational maps 5 Last day:

More information

On Maps Taking Lines to Plane Curves

On Maps Taking Lines to Plane Curves Arnold Math J. (2016) 2:1 20 DOI 10.1007/s40598-015-0027-1 RESEARCH CONTRIBUTION On Maps Taking Lines to Plane Curves Vsevolod Petrushchenko 1 Vladlen Timorin 1 Received: 24 March 2015 / Accepted: 16 October

More information

Normal Fans of Polyhedral Convex Sets

Normal Fans of Polyhedral Convex Sets Set-Valued Analysis manuscript No. (will be inserted by the editor) Normal Fans of Polyhedral Convex Sets Structures and Connections Shu Lu Stephen M. Robinson Received: date / Accepted: date Dedicated

More information

Tangent spaces, normals and extrema

Tangent spaces, normals and extrema Chapter 3 Tangent spaces, normals and extrema If S is a surface in 3-space, with a point a S where S looks smooth, i.e., without any fold or cusp or self-crossing, we can intuitively define the tangent

More information

June 21, Peking University. Dual Connections. Zhengchao Wan. Overview. Duality of connections. Divergence: general contrast functions

June 21, Peking University. Dual Connections. Zhengchao Wan. Overview. Duality of connections. Divergence: general contrast functions Dual Peking University June 21, 2016 Divergences: Riemannian connection Let M be a manifold on which there is given a Riemannian metric g =,. A connection satisfying Z X, Y = Z X, Y + X, Z Y (1) for all

More information

The Lefthanded Local Lemma characterizes chordal dependency graphs

The Lefthanded Local Lemma characterizes chordal dependency graphs The Lefthanded Local Lemma characterizes chordal dependency graphs Wesley Pegden March 30, 2012 Abstract Shearer gave a general theorem characterizing the family L of dependency graphs labeled with probabilities

More information

DISCRETE INVARIANTS OF GENERICALLY INCONSISTENT SYSTEMS OF LAURENT POLYNOMIALS LEONID MONIN

DISCRETE INVARIANTS OF GENERICALLY INCONSISTENT SYSTEMS OF LAURENT POLYNOMIALS LEONID MONIN DISCRETE INVARIANTS OF GENERICALLY INCONSISTENT SYSTEMS OF LAURENT POLYNOMIALS LEONID MONIN Abstract. Consider a system of equations f 1 = = f k = 0 in (C ) n, where f 1,..., f k are Laurent polynomials

More information

Parameterizing orbits in flag varieties

Parameterizing orbits in flag varieties Parameterizing orbits in flag varieties W. Ethan Duckworth April 2008 Abstract In this document we parameterize the orbits of certain groups acting on partial flag varieties with finitely many orbits.

More information

Undirected Graphical Models

Undirected Graphical Models Undirected Graphical Models 1 Conditional Independence Graphs Let G = (V, E) be an undirected graph with vertex set V and edge set E, and let A, B, and C be subsets of vertices. We say that C separates

More information

9. Integral Ring Extensions

9. Integral Ring Extensions 80 Andreas Gathmann 9. Integral ing Extensions In this chapter we want to discuss a concept in commutative algebra that has its original motivation in algebra, but turns out to have surprisingly many applications

More information

LECTURE 10: THE ATIYAH-GUILLEMIN-STERNBERG CONVEXITY THEOREM

LECTURE 10: THE ATIYAH-GUILLEMIN-STERNBERG CONVEXITY THEOREM LECTURE 10: THE ATIYAH-GUILLEMIN-STERNBERG CONVEXITY THEOREM Contents 1. The Atiyah-Guillemin-Sternberg Convexity Theorem 1 2. Proof of the Atiyah-Guillemin-Sternberg Convexity theorem 3 3. Morse theory

More information

Course 311: Michaelmas Term 2005 Part III: Topics in Commutative Algebra

Course 311: Michaelmas Term 2005 Part III: Topics in Commutative Algebra Course 311: Michaelmas Term 2005 Part III: Topics in Commutative Algebra D. R. Wilkins Contents 3 Topics in Commutative Algebra 2 3.1 Rings and Fields......................... 2 3.2 Ideals...............................

More information

On improving matchings in trees, via bounded-length augmentations 1

On improving matchings in trees, via bounded-length augmentations 1 On improving matchings in trees, via bounded-length augmentations 1 Julien Bensmail a, Valentin Garnero a, Nicolas Nisse a a Université Côte d Azur, CNRS, Inria, I3S, France Abstract Due to a classical

More information

In English, this means that if we travel on a straight line between any two points in C, then we never leave C.

In English, this means that if we travel on a straight line between any two points in C, then we never leave C. Convex sets In this section, we will be introduced to some of the mathematical fundamentals of convex sets. In order to motivate some of the definitions, we will look at the closest point problem from

More information

Week 3: Faces of convex sets

Week 3: Faces of convex sets Week 3: Faces of convex sets Conic Optimisation MATH515 Semester 018 Vera Roshchina School of Mathematics and Statistics, UNSW August 9, 018 Contents 1. Faces of convex sets 1. Minkowski theorem 3 3. Minimal

More information

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,

More information

Chapter 1. Preliminaries

Chapter 1. Preliminaries Introduction This dissertation is a reading of chapter 4 in part I of the book : Integer and Combinatorial Optimization by George L. Nemhauser & Laurence A. Wolsey. The chapter elaborates links between

More information

Logarithmic functional and reciprocity laws

Logarithmic functional and reciprocity laws Contemporary Mathematics Volume 00, 1997 Logarithmic functional and reciprocity laws Askold Khovanskii Abstract. In this paper, we give a short survey of results related to the reciprocity laws over the

More information

14 : Theory of Variational Inference: Inner and Outer Approximation

14 : Theory of Variational Inference: Inner and Outer Approximation 10-708: Probabilistic Graphical Models 10-708, Spring 2017 14 : Theory of Variational Inference: Inner and Outer Approximation Lecturer: Eric P. Xing Scribes: Maria Ryskina, Yen-Chia Hsu 1 Introduction

More information

Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension. n=1

Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension. n=1 Chapter 2 Probability measures 1. Existence Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension to the generated σ-field Proof of Theorem 2.1. Let F 0 be

More information

8. Prime Factorization and Primary Decompositions

8. Prime Factorization and Primary Decompositions 70 Andreas Gathmann 8. Prime Factorization and Primary Decompositions 13 When it comes to actual computations, Euclidean domains (or more generally principal ideal domains) are probably the nicest rings

More information

For Ramin. From Jonathan December 9, 2014

For Ramin. From Jonathan December 9, 2014 For Ramin From Jonathan December 9, 2014 1 Foundations. 1.0 Overview. Traditionally, knot diagrams are employed as a device which converts a topological object into a combinatorial one. One begins with

More information

When does a mixture of products contain a product of mixtures?

When does a mixture of products contain a product of mixtures? When does a mixture of products contain a product of mixtures? Jason Morton Penn State May 19, 2014 Algebraic Statistics 2014 IIT Joint work with Guido Montufar Supported by DARPA FA8650-11-1-7145 Jason

More information

ACO Comprehensive Exam March 20 and 21, Computability, Complexity and Algorithms

ACO Comprehensive Exam March 20 and 21, Computability, Complexity and Algorithms 1. Computability, Complexity and Algorithms Part a: You are given a graph G = (V,E) with edge weights w(e) > 0 for e E. You are also given a minimum cost spanning tree (MST) T. For one particular edge

More information

Preliminaries and Complexity Theory

Preliminaries and Complexity Theory Preliminaries and Complexity Theory Oleksandr Romanko CAS 746 - Advanced Topics in Combinatorial Optimization McMaster University, January 16, 2006 Introduction Book structure: 2 Part I Linear Algebra

More information

Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem

Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem 56 Chapter 7 Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem Recall that C(X) is not a normed linear space when X is not compact. On the other hand we could use semi

More information

PERVERSE SHEAVES ON A TRIANGULATED SPACE

PERVERSE SHEAVES ON A TRIANGULATED SPACE PERVERSE SHEAVES ON A TRIANGULATED SPACE A. POLISHCHUK The goal of this note is to prove that the category of perverse sheaves constructible with respect to a triangulation is Koszul (i.e. equivalent to

More information

1 Directional Derivatives and Differentiability

1 Directional Derivatives and Differentiability Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=

More information

INTRODUCTION TO REAL ANALYTIC GEOMETRY

INTRODUCTION TO REAL ANALYTIC GEOMETRY INTRODUCTION TO REAL ANALYTIC GEOMETRY KRZYSZTOF KURDYKA 1. Analytic functions in several variables 1.1. Summable families. Let (E, ) be a normed space over the field R or C, dim E

More information

ALGEBRA: From Linear to Non-Linear. Bernd Sturmfels University of California at Berkeley

ALGEBRA: From Linear to Non-Linear. Bernd Sturmfels University of California at Berkeley ALGEBRA: From Linear to Non-Linear Bernd Sturmfels University of California at Berkeley John von Neumann Lecture, SIAM Annual Meeting, Pittsburgh, July 13, 2010 Undergraduate Linear Algebra All undergraduate

More information

Algebraic Methods in Combinatorics

Algebraic Methods in Combinatorics Algebraic Methods in Combinatorics Po-Shen Loh 27 June 2008 1 Warm-up 1. (A result of Bourbaki on finite geometries, from Răzvan) Let X be a finite set, and let F be a family of distinct proper subsets

More information

14 : Theory of Variational Inference: Inner and Outer Approximation

14 : Theory of Variational Inference: Inner and Outer Approximation 10-708: Probabilistic Graphical Models 10-708, Spring 2014 14 : Theory of Variational Inference: Inner and Outer Approximation Lecturer: Eric P. Xing Scribes: Yu-Hsin Kuo, Amos Ng 1 Introduction Last lecture

More information

Stochastic Complexity of Variational Bayesian Hidden Markov Models

Stochastic Complexity of Variational Bayesian Hidden Markov Models Stochastic Complexity of Variational Bayesian Hidden Markov Models Tikara Hosino Department of Computational Intelligence and System Science, Tokyo Institute of Technology Mailbox R-5, 459 Nagatsuta, Midori-ku,

More information

Root systems and optimal block designs

Root systems and optimal block designs Root systems and optimal block designs Peter J. Cameron School of Mathematical Sciences Queen Mary, University of London Mile End Road London E1 4NS, UK p.j.cameron@qmul.ac.uk Abstract Motivated by a question

More information

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015 Part IA Probability Theorems Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

More information

Algebraic Statistics progress report

Algebraic Statistics progress report Algebraic Statistics progress report Joe Neeman December 11, 2008 1 A model for biochemical reaction networks We consider a model introduced by Craciun, Pantea and Rempala [2] for identifying biochemical

More information

Model Complexity of Pseudo-independent Models

Model Complexity of Pseudo-independent Models Model Complexity of Pseudo-independent Models Jae-Hyuck Lee and Yang Xiang Department of Computing and Information Science University of Guelph, Guelph, Canada {jaehyuck, yxiang}@cis.uoguelph,ca Abstract

More information

The cocycle lattice of binary matroids

The cocycle lattice of binary matroids Published in: Europ. J. Comb. 14 (1993), 241 250. The cocycle lattice of binary matroids László Lovász Eötvös University, Budapest, Hungary, H-1088 Princeton University, Princeton, NJ 08544 Ákos Seress*

More information

On the mean connected induced subgraph order of cographs

On the mean connected induced subgraph order of cographs AUSTRALASIAN JOURNAL OF COMBINATORICS Volume 71(1) (018), Pages 161 183 On the mean connected induced subgraph order of cographs Matthew E Kroeker Lucas Mol Ortrud R Oellermann University of Winnipeg Winnipeg,

More information

Explicit Examples of Strebel Differentials

Explicit Examples of Strebel Differentials Explicit Examples of Strebel Differentials arxiv:0910.475v [math.dg] 30 Oct 009 1 Introduction Philip Tynan November 14, 018 In this paper, we investigate Strebel differentials, which are a special class

More information

A necessary and sufficient condition for the existence of a spanning tree with specified vertices having large degrees

A necessary and sufficient condition for the existence of a spanning tree with specified vertices having large degrees A necessary and sufficient condition for the existence of a spanning tree with specified vertices having large degrees Yoshimi Egawa Department of Mathematical Information Science, Tokyo University of

More information

Asymptotic Model Selection for Naive Bayesian Networks

Asymptotic Model Selection for Naive Bayesian Networks Journal of Machine Learning Research 6 (5) 1-35 Submitted 7/3; Revised /4; Published 1/5 Asymptotic Model Selection for Naive Bayesian Networks Dmitry Rusakov Dan Geiger Computer Science Department Technion

More information

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2. APPENDIX A Background Mathematics A. Linear Algebra A.. Vector algebra Let x denote the n-dimensional column vector with components 0 x x 2 B C @. A x n Definition 6 (scalar product). The scalar product

More information

Affine Geometry and the Discrete Legendre Transfrom

Affine Geometry and the Discrete Legendre Transfrom Affine Geometry and the Discrete Legendre Transfrom Andrey Novoseltsev Department of Mathematics University of Washington April 25, 2008 Andrey Novoseltsev (UW) Affine Geometry April 25, 2008 1 / 24 Outline

More information

On John type ellipsoids

On John type ellipsoids On John type ellipsoids B. Klartag Tel Aviv University Abstract Given an arbitrary convex symmetric body K R n, we construct a natural and non-trivial continuous map u K which associates ellipsoids to

More information

1 Differentiable manifolds and smooth maps

1 Differentiable manifolds and smooth maps 1 Differentiable manifolds and smooth maps Last updated: April 14, 2011. 1.1 Examples and definitions Roughly, manifolds are sets where one can introduce coordinates. An n-dimensional manifold is a set

More information

fy (X(g)) Y (f)x(g) gy (X(f)) Y (g)x(f)) = fx(y (g)) + gx(y (f)) fy (X(g)) gy (X(f))

fy (X(g)) Y (f)x(g) gy (X(f)) Y (g)x(f)) = fx(y (g)) + gx(y (f)) fy (X(g)) gy (X(f)) 1. Basic algebra of vector fields Let V be a finite dimensional vector space over R. Recall that V = {L : V R} is defined to be the set of all linear maps to R. V is isomorphic to V, but there is no canonical

More information

TORIC WEAK FANO VARIETIES ASSOCIATED TO BUILDING SETS

TORIC WEAK FANO VARIETIES ASSOCIATED TO BUILDING SETS TORIC WEAK FANO VARIETIES ASSOCIATED TO BUILDING SETS YUSUKE SUYAMA Abstract. We give a necessary and sufficient condition for the nonsingular projective toric variety associated to a building set to be

More information

Part III. 10 Topological Space Basics. Topological Spaces

Part III. 10 Topological Space Basics. Topological Spaces Part III 10 Topological Space Basics Topological Spaces Using the metric space results above as motivation we will axiomatize the notion of being an open set to more general settings. Definition 10.1.

More information

The Topology of Intersections of Coloring Complexes

The Topology of Intersections of Coloring Complexes The Topology of Intersections of Coloring Complexes Jakob Jonsson October 19, 2005 Abstract In a construction due to Steingrímsson, a simplicial complex is associated to each simple graph; the complex

More information

Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 5

Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 5 Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 5 Instructor: Farid Alizadeh Scribe: Anton Riabov 10/08/2001 1 Overview We continue studying the maximum eigenvalue SDP, and generalize

More information

Contents. O-minimal geometry. Tobias Kaiser. Universität Passau. 19. Juli O-minimal geometry

Contents. O-minimal geometry. Tobias Kaiser. Universität Passau. 19. Juli O-minimal geometry 19. Juli 2016 1. Axiom of o-minimality 1.1 Semialgebraic sets 2.1 Structures 2.2 O-minimal structures 2. Tameness 2.1 Cell decomposition 2.2 Smoothness 2.3 Stratification 2.4 Triangulation 2.5 Trivialization

More information

4 Packing T-joins and T-cuts

4 Packing T-joins and T-cuts 4 Packing T-joins and T-cuts Introduction Graft: A graft consists of a connected graph G = (V, E) with a distinguished subset T V where T is even. T-cut: A T -cut of G is an edge-cut C which separates

More information

DISCRETIZED CONFIGURATIONS AND PARTIAL PARTITIONS

DISCRETIZED CONFIGURATIONS AND PARTIAL PARTITIONS DISCRETIZED CONFIGURATIONS AND PARTIAL PARTITIONS AARON ABRAMS, DAVID GAY, AND VALERIE HOWER Abstract. We show that the discretized configuration space of k points in the n-simplex is homotopy equivalent

More information