An Asymptotic Behaviour of the Marginal Likelihood for General Markov Models

Size: px

Start display at page:

Download "An Asymptotic Behaviour of the Marginal Likelihood for General Markov Models"

Alison McKinney
5 years ago
Views:

1 Journal of Machine Learning Research 1 (011) Submitted 1/10; Revised 9/11; Published 11/11 An Asymptotic Behaviour of the Marginal Likelihood for General Markov Models Piotr Zwiernik Institute for Pure and Applied Mathematics 460 Portola Plaza Box Los Angeles, CA PIOTR.ZWIERNIK@GMAIL.COM Editor: Kenji Fukumizu Abstract The standard Bayesian Information Criterion (BIC) is derived under regularity conditions which are not always satisfied in the case of graphical models with hidden variables. In this paper we derive the BIC for the binary graphical tree models where all the inner nodes of a tree represent binary hidden variables. This provides an extension of a similar formula given by Rusakov and Geiger for naive Bayes models. The main tool used in this paper is the connection between the growth behavior of marginal likelihood integrals and the real log-canonical threshold. Keywords: BIC, marginal likelihood, singular models, tree models, Bayesian networks, real logcanonical threshold 1. Introduction A key step in the Bayesian learning of graphical models is to compute the marginal likelihood of the data, which is the likelihood function averaged over the parameters with respect to the prior distribution. Given a fully observed system, the theory of graphical models provides a simple way to obtain the marginal likelihood. This was explained for example by Cooper and Herskovits (199) and Heckerman et al. (1995). However, when some of the variables in the system are hidden (never observed), the exact determination of the marginal likelihood is typically intractable (for example Chickering and Heckerman, 1997). This motivates the search for efficient techniques to approximate the marginal likelihood. In this paper we focus on the large sample behavior of the marginal likelihood called the Bayes Information Criterion (BIC). To present basic results on the BIC we need to introduce some notation. Let X be a random variable with values in [m] := {1,...,m}. Its distribution q = (q 1,...,q m ) can be identified with a point in the probability simplex m 1 = {x R m : i x i = 1, x i 0} R m. Consider a map p : Θ m 1 and letm = p(θ) be a parametric discrete model for X with the parameter space Θ and parametrization p. Let X (N) = X 1,...,X N be a random sample from the distribution q m 1. By Z N we denote the marginal likelihood and by L(θ;X (N),M) = P(X (N) M,θ) the likelihood function. Thus Z N =P(X (N) M)= L(θ;X (N),M)ϕ(θ)dθ, Θ c 011 Piotr Zwiernik.

2 ZWIERNIK where θ denotes the model parameters and ϕ(θ) is a prior distribution on Θ. The stochastic complexity is defined by F N = logz N and the entropy function by S= m i=1 q i logq i. In statistical theory to obtain the BIC we usually require that the asymptotic limit of the likelihood function, as N, is maximized over a unique point in the interior of the parameter space where the Jacobian matrix of the parametrization is full rank. For the class of problems for which this assumption is satisfied Schwarz (1978) and Haughton (1988) showed that, as N, EF N = N S+ d logn+ O(1), where d = dimθ (Watanabe, 009, Corollary 1.15 and Section 6). The same formula works if the limit of the likelihood is maximized over a finite number of points in the interior of Θ. Geometrically, for large sample sizes function Z N concentrates around the maxima. This enables us to apply the Laplace approximation locally in the neighborhood of each maximum. It can be proved (see Proposition 5) that the above formula can be generalized for the case when the set, over which the limit of the likelihood is maximized, forms a sufficiently regular compact subset of the parameter space. Denote this subset by Θ. Then, as N, EF N = N S+ d d logn+ O(1), (1) where d = dim Θ. Note that in our case Θ is a set of zeros of a real analytic function. Therefore, it will be always a semi-analytic set, that is given by {g 1 (θ) 0,...,g r (θ) 0}, where g i are all analytic functions. It follows that the dimension is well defined (Bierstone and Milman, 1988, Remark.1). In the case of models with hidden variables the locus of the points maximizing the limit of the likelihood may not be sufficiently regular. In this case the likelihood will have a different asymptotic behavior around different points and relatively more mass of the marginal likelihood integral will be related to neighborhoods of singular points (see Figure 1). For these points we cannot use the Laplace approximation. Nevertheless, the computation of the BIC is still possible using results of Watanabe (009) and some earlier works of Arnold, Varchenko and collaborators (Arnold et al., 1988). This formula will differ from the standard BIC. First, the coefficient of log N can be different from d d in (1). Second, we sometimes encounter an additional log log N term affecting the asymptotics (see Theorem 4). Let again q be the true data generating distribution and M = p(θ) a discrete parametric statistical model with the parameter space Θ. Let ϕ : Θ R be a prior distribution. Throughout the paper we always assume: (A1) The prior distribution ϕ is strictly positive, bounded and smooth on Θ. (A) There exists θ Θ such that p(θ)=q and q lies in the interior of the probability simplex. (A3) The set Θ R d is a compact and semianalytic set of dimension d. 384

ASYMPTOTICS OF THE MARGINAL LIKELIHOOD FOR TREE MODELS Figure 1: The case when the likelihood is maximized over a singular subset of Θ=[ 1,1] given by θ 1 θ = 0.

3 ASYMPTOTICS OF THE MARGINAL LIKELIHOOD FOR TREE MODELS Figure 1: The case when the likelihood is maximized over a singular subset of Θ=[ 1,1] given by θ 1 θ = 0. In this paper we consider an important class of parametric models with large number of hidden variables, called general Markov models, assuming for simplicity that all the random variables in the system are binary. This model class is extensively used in phylogenetics (Semple and Steel, 003, Chapter 8) and in the analysis of causal systems (see Pearl and Tarsi, 1986). We begin with a quick informal introduction to general Markov models which is then formalized in Section 3.1. Let T =(V,E) be an undirected tree with the vertex set V and the edge set E. Let T r denote T rooted in r, that is a tree with one distinguished vertex r and all the edges directed away from r. Consider the Markov process Y =(Y v ) v V on T r, which by definition is the Bayesian network on T r. Then, the general Markov model is a family of marginal distributions over the subvector of Y corresponding to the leaves of T r. It is well known that this model class does not depend on the rooting. Therefore, we denote this model class, omitting the rooting, bym T. For a tree T with n leaves we denote the subvector of Y corresponding to the leaves of T by X =(X 1,...,X n ) with some arbitrary numbering of leaves. The subvector of Y corresponding to the inner nodes is denoted by H. By construction the general Markov model is a statistical model for X. Let q M T be the true distribution and Σ = [ˆµ i j ] be the covariance matrix of X. A surprising fact proved in this paper is that the zeros in Σ, or equivalently, marginal independencies between components of X, completely determine the asymptotics for the marginal likelihood. We say that two nodes u,v of T are separated by another node w, if w lies on the unique path between u and v. Let l denote the number of inner nodes v of T such that for each triple i, j,k of leaves separated in T by v we have ˆµ i j ˆµ ik ˆµ jk = 0 but there exist leaves i, j separated by v such that ˆµ i j 0. In terms of the conditional independence defining the general Markov model, an inner node v contributes to l if for every three leaves i, j,k such that X i X j X k H v at least two are marginally independent but there exist two leaves i, j such that X i X j H v but not X i X j. In addition, we say that an inner node v is degenerate (or q-degenerate) if for any two leaves i, j separated by v we have ˆµ i j = 0. In other words v is degenerate if for every i, j such that X i X j H v also X i X j. All other nodes are called nondegenerate. We denote by n e the number of edges of T and by n v the number of its nodes. The following result is a special instance of (Watanabe, 009, the Main Formula II, p. 34): 385

4 ZWIERNIK Theorem 1 Let T r = (V,E) be a tree rooted in r and X (N) be a random sample from q. With assumptions (A1), (A) and (A3), if there are no q-degenerate nodes then, as N, EF N = N S+ n v+ n e l logn+ O(1). Note, in particular, that the above formula is independent of the rooting. Example 1 Let n=4 and assume that data are generated from the Bayesian network given by the quartet tree in Figure. If q is such that Σ has no zeros then l = 0 and the coefficient of logn is 11 Figure : A quartet tree rooted in r.. This corresponds to the classical BIC since the dimension of the parameter space is 11. If the true distribution q M T satisfies in addition the marginal independence condition X 1 (X,X 3,X 4 ) then ˆµ 1i = 0 for i=,3,4 and r contributes to l. We depict this situation on the left hand side in Figure 3. Here the dashed edge means that for every pair i, j of leaves separated by this edge ˆµ i j = 0 and an inner node contributes to l if its valency, in the forest with the dashed edges removed, is. In this case l = 1 and the coefficient of logn is 9. If, in addition, q satisfies X 1 X 3 (X,X 4 ) then l = and hence the coefficient is 7. The corresponding graph is depicted in the middle of Figure 3. Example (Naive Bayes model) Consider a star tree with one inner node and n leaves. If there are no degenerate nodes this corresponds to q being either a regular point or a type 1 singularity as defined by Rusakov and Geiger (005). If l = 0 then q is a regular point and the coefficient of logn is equal to n+1. If l = 1 then q is a type 1 singularity and the coefficient is equal to n 1. This corresponds exactly to (Rusakov and Geiger, 005, Theorem 4). If the inner node is degenerate this corresponds to the type singularity which does not satisfy assumptions of Theorem 1. If q is such that there are degenerate nodes the computation of the BIC is much harder because the likelihood in this case maximizes over a singular subset of the parameter space. The case of star Figure 3: Three graphs representing submodels of the quartet tree model with some additional marginal independencies. In the third case the root is degenerate. 386

5 ASYMPTOTICS OF THE MARGINAL LIKELIHOOD FOR TREE MODELS trees was investigated by Rusakov and Geiger (005). In this paper we obtain a closed form formula for the BIC in the case of trivalent trees, whose all inner nodes have valency three. This is provided in Theorem which together with Theorem 1 are the main results of this paper. The importance of trivalent trees follows mainly from the fact that any other tree model is a submodel of a model for a trivalent tree. They also form a natural class for models of evolution in biology. If T is trivalent then for every inner node v V there exist A,B,C [n] such that A B C = [n] and A, B,C are separated by v. By the defining conditional independence conditions we have that X A X B X C Y v, where X A = (X i ) i A. In this case we call v degenerate if q is such that X A X B X C. Let l 0 denote the number of degenerate nodes. Theorem Let T r =(V,E) be a rooted trivalent tree with n 3 leaves and root r. With assumptions (A1), (A) and (A3) if r is degenerate but all its neighbors are not, then, as N, ( nv + n e l EF N = N S+ 5l ) 0+ 1 logn+ O(1). 4 In all other cases, as N, ( nv + n e l EF N = N S+ 5l ) 0 logn cloglogn+ O(1), 4 where c is a nonnegative integer. Moreover c = 0 always if either both r is nondegenerate or if r and all its neighbors are degenerate. The coefficients of logn above are given in this special form to show the correction term with respect to the coefficient in the smooth case in Theorem 1. Example 3 Consider again the quartet model from Example 1. If Σ has no zeros then l = 0, l 0 = 0 and we get the same formula as previously with coefficient 11. Now assume that q is such that in addition the marginal independence X 1 X (X 3,X 4 ) holds. The situation is depicted on the right hand side in Figure 3. The edge(r,a) is dashed since for any two leaves separated by this edge the corresponding covariance is zero. In this case l = 1, l 0 = 1, the root is degenerate but all its neighbors are not and hence, by Theorem, the coefficient of logn is 3 and c=0. Consider finally the case when all off-diagonal elements of Σ are zero. In this case l = 0 and l 0 = and hence the coefficient of logn is also 3. However, later in Example 5 and Remark 9 we will see that c may be strictly greater than zero in this case. Following Rusakov and Geiger (005), the main method of the proof is to change the coordinates of the model so that the induced parameterization becomes simple. This gives us a much better insight into the model structure which is described by Zwiernik and Smith (011b) and Zwiernik and Smith (011a). Since the BIC is invariant with respect to these changes, the reparameterized problem still gives the solution to the original question. Our main analytical tool is the real logcanonical threshold (see for example Watanabe, 009). This is an important geometric invariant which in certain cases can be computed in a relatively simple way using discrete geometry. The relevance of this invariant to the BIC is given by Theorem 4. We remark that the techniques developed in this paper can be applied to obtain the BIC also in the case of non-trivalent trees. The paper is organized as follows. In Section, following Watanabe (009), we provide the theory of asymptotic expansion of marginal likelihood integrals. This theory enables us to analyze 387

6 ZWIERNIK the asymptotic behavior of the marginal likelihood without the standard regularity assumptions. In Section 3 we define Bayesian networks on rooted trees. We also obtain a basic result on the BIC in the case when the observed likelihood is maximised over a sufficiently smooth subset of the parameter space. This gives a simple proof of Theorem 1. The proof of Theorem is more technical and hence divided into three main steps split between Sections 4, 5 and 6. Finally, in Section 7, we combine all these results.. Asymptotics of Marginal Likelihood Integrals In this section we introduce the real log-canonical threshold and link it to the asymptotic behavior of marginal likelihood integrals. We present how this enables us to obtain the BIC in the case of a general class of statistical models, which is mostly based on previous results of Sumio Watanabe..1 The Real Log-canonical Threshold Given θ 0 R d, leta θ0 (R d ) be the ring of real-valued functions f :R d R that are analytic at θ 0. Given a subset Θ R d satisfying (A3), let A Θ (R d ) be the ring of real functions analytic at each point θ 0 Θ. If f A Θ (R d ), then for every θ 0 Θ, f can be locally represented as power series centered at θ 0. Denote by A Θ (Rd ) the subset of A Θ (R d ) consisting of all non-negative functions. Usually the ambient space is clear from the context and in this case we omit it in our notation writing A θ0,a Θ anda Θ. Definition 3 (The real log-canonical threshold) Given a compact semianalytic set Θ R d such that dimθ=d, a real analytic function f A Θ (Rd ) and a smooth positive function ϕ : R d R, consider the zeta function defined as ζ(z)= f(θ) z ϕ(θ)dθ. () Θ By Theorem.4 of Watanabe (009) this function is extended to a meromorphic function on the entire complex line and its poles are real and positive. The real log-canonical threshold of f denoted by rlct Θ ( f ;ϕ) is the smallest pole of ζ(z). By mult Θ ( f ;ϕ) we denote the multiplicity of this pole. By convention if ζ(z) has no poles then rlct Θ ( f ;ϕ) = and mult Θ ( f ;ϕ) = d. If ϕ(θ) 1 then we omit ϕ in the notation writing rclt Θ ( f) and mult Θ ( f). Define RLCT Θ ( f ;ϕ) to be the pair (rlct Θ ( f ;ϕ),mult Θ ( f ;ϕ)), and we order these pairs so that (r 1,m 1 )>(r,m ) if r 1 > r, or r 1 = r and m 1 < m. LetM = p(θ) m 1 be a parametric discrete model and q m 1 be a probability distribution. WithM and q fixed the Kullback-Leibler distance K : Θ R is defined by K(θ)= m i=1 q i log q i p i (θ). (3) It is well known that K(θ) 0 on Θ and K(θ) = 0 if and only if p(θ) = q. If q is the true data generating distribution then assumption (A) means that Θ={θ : K(θ)=0} is non-empty. The following theorem gives the motivation to study the real log-canonical threshold in the statistical context. 388

7 ASYMPTOTICS OF THE MARGINAL LIKELIHOOD FOR TREE MODELS Theorem 4 (Watanabe) Let M be a parametric discrete statistical model, q the true data generating distribution and K the corresponding Kullback-Leibler distance. With assumptions (A1), (A) and (A3), as N, EF N = N S+rlct Θ (K;ϕ)logN+(mult Θ (K;ϕ) 1)loglogN+ O(1). To compute the real log-canonical threshold we split the integral in () into a sum of finitely many integrals over small neighbourhoods Θ 0 of some points θ 0 Θ for which we have efficient tools of computation. We can always do this using the partition of unity since Θ is compact. For each of the local integrals we use Hironaka s theorem to reduce it to a locally monomial case. The details are presented by Watanabe (009). Let θ 0 Θ and let W 0 be any sufficiently small open ball around θ 0 in R d. Then, by Theorem.4 of Watanabe (009), RLCT W0 ( f ;ϕ) does not depend on the choice of W 0 and hence it is denoted by RLCT θ0 ( f ;ϕ). If f(θ 0 ) 0 then RLCT W0 ( f ;ϕ) = (,d) and hence we can constrain only to points θ 0 such that f(θ 0 )=0. In our context this means that we consider only points in the q-fiber Θ. The local computations give the answer to the global question because, by (Lin, 011, Proposition.5), the set of pairs RLCT Θ0 ( f ;ϕ) for θ 0 Θ has a minimum and RLCT Θ ( f ;ϕ)= min θ 0 Θ RLCT Θ 0 ( f ;ϕ), (4) where Θ 0 = W 0 Θ is the neighbourhood of θ 0 in Θ. For each θ 0 Θ to compute RLCT Θ0 ( f ;ϕ) we consider two cases. If θ 0 lies in the interior of Θ then we can assume Θ 0 = W 0 and hence RLCT Θ0 ( f ;ϕ)=rlct θ0 ( f ;ϕ). If θ 0 bd(θ), where bd(θ) denotes the set of boundary points of Θ, the computations may change significantly because the real log-canonical threshold depends on the boundary conditions (cf. Example.7 of Lin, 011). Nevertheless, it can be showed that at least if there exists an open subset U R d such that U Θ 0 and f A U (Rd ) then Because in this case RLCT Θ0 ( f) RLCT θ0 ( f). (5) ( f(θ)) z dθ= ( f(θ)) z dθ+ ( f(θ)) z dθ W 0 Θ 0 W 0 \Θ 0 which implies that RLCT θ0 ( f)=min{rlct Θ0 ( f),rlct W0 \Θ 0 ( f)}. Finally, whenever Θ /0 we have RLCT Θ (K)= min RLCT Θ0 (K). (6) θ 0 Θ The following result enables us to obtain the BIC in the smooth case. Proposition 5 Let M be a parametric statistical model with parametrization p, and q be the true data generating distribution. Let K A Θ (Rd ) be the Kullback-Leibler distance defined in (3). Given (A1), (A) and (A3) assume that there exists a smooth manifold M R d satisfying Θ=M Θ. Then, as N, d d EF N = N S+ logn+ O(1), where d = dim Θ. 389

8 ZWIERNIK Proof By assumption (A1) there exist two constants c,c> 0 such that c<ϕ(θ)<c on Θ. Therefore c (K(θ)) z dθ<ζ(z)< C (K(θ)) z dθ Θ Θ and it follows that RLCT Θ (K;ϕ) = RLCT Θ (K). By Theorem 4 it suffices to prove the following lemma which generalises Proposition 3.3 of Saito (007). Lemma 6 Let Θ R d be a compact semianalytic set and f A Θ (Rd ). If there exists a smooth manifold M R d such that Θ = M Θ and θ 0 Θ then RLCT θ0 ( f) = RLCT Θ ( f) = ( d d,1) where d = dim Θ. To prove this recall that the real log-canonical threshold RLCT θ0 ( f) does not depend on the choice of a sufficiently small neighborhood W 0 of θ 0. Since Θ = M Θ and M is a smooth manifold it follows that for each point θ 0 of Θ there exists an open neighborhood W 0 of θ 0 in R d with local coordinates w 1,...,w d centered at θ 0 such that the local equation of Θ is w 1 + +w c = 0, where c=d d. A single blow-up π at the origin satisfies all the conditions of Hironaka s Theorem since in the new coordinates over one of the charts f(π(u)) = u 1a(u) where a(u) is nowhere vanishing and π (u)=u c 1 1. For other charts the situation is the same and hence RLCT θ0 ( f)=(c/,1). Since by (4) RLCT Θ ( f)=min θ0 Θ RLCT W0 Θ( f) it suffices to show that if θ 0 is a boundary point of Θ then RLCT W0 Θ( f) (c/,1). But this follows from (5) and the fact that RLCT θ0 ( f)=(c/,1) as θ 0 is a smooth point of M. The lemma is hence proved. 3. General Markov Models In this section we formally define the general Markov modelm T and give in Theorem 1 the asymptotic expansion of the marginal likelihood when q andm T satisfy conditions of Proposition Definition of the Model Class All random variables considered in this paper are assumed to be binary with the value either 0 or 1. Let T r =(V,E) be a rooted tree. For any directed edge e=(k,l) E we say that k and l are adjacent and k is a parent of l and we denote it by k=pa(l). For every β {0,1} V let p β =P( v V{Y v = β v }). The Markov process on T r is a sequence Y =(Y v ) v V of binary random variables such that for each β=(β v ) v V {0,1} V p β (θ)=θ (r) β r v V\r θ (v) β v β pa(v), (7) where θ (v) β v β = P(Y pa(v) v = β v Y pa(v) = β pa(v) ) and θ (r) β r = P(Y r = β r ). In a more standard statistical language these models are just fully observed Bayesian networks on rooted trees. Recall that n e = E and n v = V. Since θ (v) 0 i + θ(v) 1 i = 1 for all v V and i = 0,1 then the Markov process on T r defined by (7) has exactly n e + 1 free parameters in the vector θ: one for the root distribution θ (r) 1 and two for each edge (u,v) E given by θ (v) 1 0 and θ(v) 1 1. The parameter space is Θ T =[0,1] n e+1. The general Markov model on T r is induced from the Markov process on T r by assuming that all the inner nodes represent hidden random variables. Hence we consider induced marginal 390

9 ASYMPTOTICS OF THE MARGINAL LIKELIHOOD FOR TREE MODELS probability distributions over the leaves of T r. The set of leaves is denoted by L. We assume that T r has n leaves and hence we can associate L with the set [n] given some arbitrary numbering of the leaves. Let Y =(X,H) where X =(X 1,...,X n ) denotes the variables represented by the leaves of T r and H denotes the vector of variables represented by inner nodes, that is X = (Y v ) v L and H =(Y v ) v V\L. We define the general Markov modelm T to be the model in the probability simplex n 1 obtained by summing out in (7) over all possible values of the inner nodes. By definitionm T is the image of the map p : Θ T n 1 given by p α (θ)= H θ (r) β r v V\r θ (v) β v β pa(v) for any α {0,1} L, whereh is the set of all vectors β=(β v ) v V such that (β v ) v L = α. Because the model class does not depend on the rooting we usually omit the root r in the notation. For a more detailed treatment see (Semple and Steel, 003, Chapter 8). 3. The BIC in the Smooth Case For q M T let Σ = [ˆµ i j ] R n n be the corresponding covariance matrix of the random vector represented by the leaves of T. We define the q-fiber as Θ T = {θ Θ T : p(θ)=q} = {θ Θ T : K(θ)=0}. The geometry of Θ T is directly related to the real log-canonical threshold of the Kullback-Leibler distance. We now show that this geometry is determined by zeros in Σ. For this we need to introduce some new concepts. We say that that an edge e E is isolated relative to q if ˆµ i j = 0 for all i, j [n] such that e E(i j), where E(i j) denotes the set of edges in the path joining i and j. By Ê E we denote the set of all edges of T which are isolated relative to q. By T = (V,E\ Ê) we denote the forest obtained from T by removing edges in Ê. We now define relations on Ê and E\ Ê. For two edges e,e with either {e,e } Ê or {e,e } E\Ê write e e if either e=e or e and e are adjacent and all the edges that are incident with both e and e are isolated relative to q. Let us now take the transitive closure of restricted to pairs of edges in Ê to form an equivalence relation on Ê. Similarly, take the transitive closure of restricted to the pairs of edges in E\ Ê to form an equivalence relation in E\ Ê. We will let [Ê] and [E\ Ê] denote the set of equivalence classes of Ê and E\ Ê respectively. By construction all the inner nodes of T have either degree zero in T or the degree is strictly greater than one. We say that a node v V is non-degenerate with respect to q if either v is a leaf of T or degv in T. Otherwise we say that the node is degenerate with respect to q. Note that this coincides with the definition of a degenerate node given in the introduction. Moreover, the isolated edges in Examples 1 and 3 correspond precisely to the dashed edges in Figure 3. The set of all nodes which are degenerate with respect to q is denoted by V. Proposition 7 (Zwiernik and Smith, 011b) Let T be a tree with n leaves. Let q M T and let T be defined as above. If each of the inner nodes of T has degree at least two in T then Θ T is a manifold with corners and dim Θ T = l, where l is the number of nodes which have degree two in T. In this way we can compute the asymptotic behavior of the marginal likelihood in the case when assumptions of Proposition 7 are satisfied. 391

10 ZWIERNIK Proposition 8 Let T be a tree and M T be the corresponding general Markov model. Let q M T be the real distribution generating the data such that each inner node of T has degree at least two in T. Then ( ) nv + n e l RLCT ΘT (K)=,1. Proof Since every inner node of T has degree at least two in T then by Proposition 7 there exists a smooth manifold M R n v+n e such that Θ T = M Θ T and dim Θ=l. The result follows from Proposition 5 and the fact that dimθ T = n e + 1=n v + n e. By Theorem 4, Proposition 8 implies Theorem 1 since l in its statement is exactly the number of inner nodes v such that the degree of v in T is two. Remark 9 Theorem 1 is still true if (A1) is replaced by the assumption that the prior distribution is bounded on Θ T and there exists an open subset of Θ T with a non-empty intersection with Θ T where the prior is strictly positive. In particular we can use conjugate Beta priors θ (v) 1 i Beta(α(v) i,β (v) i ) as long as α (v) i,β (v) i The Ideal-theoretic Approach In this section we define the real log-canonical threshold of an ideal. Theorem 11 translates the problem of finding the real log-canonical threshold of the Kullback-Leibler distance into algebra. We then analyse general Markov models from this perspective. In Theorem 14 we apply a useful change of coordinates which enables us to work out the real log-canonical threshold in the singular case. 4.1 The Real Log-canonical Threshold of an Ideal Let f 1,..., f r A Θ then the ideal generated by f 1,..., f r is a subset ofa Θ denoted by f 1,..., f r ={ f A Θ : f(θ)= r i=1 h i (θ) f i (θ),h i A Θ }. Following Lin (011) we generalize the notion of the real log-canonical thresholds to the ideal I = f 1,..., f r. This mirrors the analytic definition of the log-canonical threshold of an ideal (see for example Lazarsfeld, 004, Section 9.3.D). By definition RLCT Θ (I;ϕ)=RLCT Θ ( f 1,..., f r ;ϕ) := RLCT Θ ( f ;ϕ), where f(θ)= f 1 (θ)+ + f r(θ). By (Lin, 011, Proposition 4.5) the real log-canonical threshold does not depend on the choice of generators of I. The following important proposition enables us to use the full power of the ideal-theoretic approach. Proposition 10 Let f,g A Θ (R d ) and let I be an ideal ina Θ (R d ). Then 39

11 ASYMPTOTICS OF THE MARGINAL LIKELIHOOD FOR TREE MODELS i Let ρ : Ω Θ be a proper real analytic isomorphism and ρ I ={ f ρ : f I} be the pullback of I toa Ω. Then, RLCT Θ (I;ϕ)=RLCT Ω (ρ I;(ϕ ρ) ρ ), where ρ denotes the Jacobian of ρ. ii If ϕ is positive and bounded on Θ then RLCT Θ (I;ϕ)=RLCT Θ (I). iii If there exist constants c,c > 0 such that cg(θ) f(θ) c g(θ) for every θ Θ then RLCT Θ ( f)= RLCT Θ (g). iv Let I = f 1,..., f r and J = g 1,...,g r where g i = u i f i for i = 1,...,r and there exist positive constants c,c such that c<u i (θ)< C for all θ Θ and for all i=1,...,r. Then RLCT Θ (I)= RLCT Θ (J). The ideal-theoretic approach proves to be useful in a fairly general statistical context: Theorem 11 (Lin, 011) Let p = (p 1,..., p m ) : Θ m 1 be a polynomial mapping and M = p(θ) be the statistical model of X with values in [m]. For a given point q M define I = p 1 (θ) q 1,..., p m (θ) q m A Θ. (8) Let q denote the true data generating distribution and K(θ) be the corresponding Kullback-Leibler distance defined in (3). Moreover, let ϕ the prior distribution on Θ satisfying (A1). Then RLCT Θ (K;ϕ)=RLCT Θ (I ;ϕ)=rlct Θ (I), (9) where the second equation in (9) follows from Proposition 10 ii. We now perform the change of coordinates f θω : Θ T Ω T, f pκ : n 1 K T discussed in detail by Zwiernik and Smith (011b). We have the following diagram, where the top row is the original parametrization and where the induced parameterisation ψ T is given in the bottom row. Θ T f ωθ f θω Ω T p ψ T n 1 f κp K T f pκ Here f θω and f pκ are polynomial isomorphisms, and hence, by Proposition 10 (i), in our computations of the real-log canonical threshold, we can alternatively constrain to the bottom row of the diagram. We denote the coordinates of K T by κ=(κ I ) for I [n], I /0. The coordinates of Ω T are denoted by ω=((s v ),(η uv )) for all v V and (u,v) E. Both ω and κ have a statistical meaning as described by Zwiernik and Smith (011b). However, in this work we use them in a purely algebraic manner. We just note that the coordinates ofk T are certain functions of the moments. In particular, κ i =EX i for i=1,...,n and each κ i j corresponds to the covariance between X i and X j. Interpretation of other coordinates is more complicated. 393

12 ZWIERNIK Simple linear constraints defining Θ T become only slightly more complicated when expressed in the new parameters. The choice of parameter values is not free anymore in the sense that constraints for each of the parameters involve other parameters. The new parameter space Ω T is given by s r [ 1,1] and for each (u,v) E (cf. Equation (19) in Zwiernik and Smith, 011b) (1+s v ) (1 s u )η uv (1 s v ) (1 s v ) (1+s u )η uv (1+s v ). (10) In the new coordinate system the situation is more tractable because ψ T has a simpler structure. Proposition 1 (Zwiernik and Smith, 011b) Let T r = (V,E) be a rooted trivalent tree with n leaves. Then for each i=1,...,n one has κ i (ω)= 1 (1 s i) and κ I (ω)= 1 4 (1 s r(i) ) v V(I)\I s degv v (u,v) E(I) η uv for all I, where the degree of v V(I) is considered in T(I)=(V(I),E(I)), which is the smallest subtree of T containing I. Let I denote the pullback of the ideal I A ΘT to the ideal in A ΩT induced by f θω. Thus I = f ωθ I ={ f f ωθ : f I}. The ideal describes Ω T = f θω ( Θ T ) as a subset of Ω T. Let [n] k denote all subsets of [n] with at least k elements. Then the pullback of I satisfies ( ) I = κ 1 ˆκ 1,...,κ n ˆκ n + κ I (ω) ˆκ I, (11) I [n] where ˆκ I are the corresponding coordinates of f pκ (q). Here the sum of ideals results in another ideal with the generating set which is the sum of generating sets of the summands. For local computations we use the following reduction. Proposition 13 (Lin, 011) Let I A x0 (R m ), J A y0 (R n ) be two ideals. If RLCT x0 (I)=(λ x,m x ) and RLCT y0 (J)=(λ y,m y ) then RLCT (x0,y 0 )(I+ J)=(λ x + λ y,m x + m y 1). Theorem 14 Let T r be a rooted tree with n leaves and q M T. Let I be the ideal defined by (8) and I the ideal defined by (11). Then RLCT ΘT (I)=RLCT ΩT (I)= min ω 0 Ω T RLCT Ω0 (I), where Ω 0 is a sufficiently small neighborhood of ω 0 in Ω T. Moreover, let J = I [n] κ I (ω) ˆκ I. Then, for every ω 0 Ω T ( n ) RLCT ω0 (I)=,0 + RLCT ω0 (J). (1) 394

13 ASYMPTOTICS OF THE MARGINAL LIKELIHOOD FOR TREE MODELS Proof Since f ωθ is an isomorphism with a constant Jacobian then the first part of the theorem follows from Proposition 10 (i). Let now W be an ε-box around ω 0 =((s 0 v),(η 0 uv)). If T is rooted in an inner leaf then by Proposition 1 the ideal J does not depend on s 1,...,s n. Since for every i=1,...,n the expression κ i ˆκ i depends only on s i then RLCT ω0 ( κ 1 ˆκ 1,...,κ n ˆκ n )=( n,1), which can be easily checked (see for example Proposition 3.3 of Saito, 007). Equation (1) follows from Proposition 13. Now assume that T is rooted in one of the leaves. In this case both κ 1 ˆκ 1,...,κ n ˆκ n and J depend on s r because κ I (ω)=(1 s r) f I (ω) for some monomial f I (ω) whenever r I. Therefore, we cannot use Proposition 13 directly. However, by assumption (A), q lies in the interior of the probability simplex and hence ˆκ i (0,1) for i = 1,...,n which is equivalent to s 0 i ( 1,1). Therefore, for each ω 0 one can find two positive constants c,c such that c 1 s r C in W. By Proposition 10 (iv) the real log canonical threshold of J in W is equal to the real log-canonical threshold of a an ideal with generators induced from the generators of J by replacing each 1 s r by 1. Now again (1) follows from Proposition The Main Reduction Step Recall that ˆκ i j = Cov(X i,x j ). In this section we prove a technical result which enables us to reduce the computations of RLCT ω0 (J) to two simpler cases. First, when q is such that ˆκ i j 0 for all i, j [n]. Second, when q is such that ˆκ i j = 0 for all i, j [n]. Moreover, the second case is reduced to computations for monomial ideals which are amenable to various combinatorial techniques. Let T be a trivalent tree with n 3 leaves and let q M T. If all the equivalence classes in [Ê] are singletones or [Ê] is empty, which is equivalent to every inner node being of degree at least two in T, then Theorem 1 gives us the asymptotic behavior of the marginal likelihood. Thus, let assume that there is at least one class in[ê] which is not a singleton. Let T 1,...,T k denote trees representing the equivalence classes in [Ê] and let S 1,...,S m denote trees induced by the connected components of E\ Ê. Let L 1,...L k denote the sets of leaves of T 1,...,T k. For each S i i=1,...,m by Remark 5. (iv) of Zwiernik and Smith (011b) its set of leaves denoted by [n i ] is a subset of [n]. For each S i the number of nodes, edges and nodes of degree in T is denoted by n i v, n i e and l i respectively. We illustrate this notation in Figure 4 where the dashed edges represent edges in Ê. Simpler examples are given in Figure 3. Lemma 15 Let T =(V,E) be a trivalent rooted tree with n 4 leaves and let q M T. Let J = I [n] κ I (ω) ˆκ I as in Theorem 14. If ω 0 Ω T then RLCT ω0 (J)= m i=1 RLCT ω0 (J(S i ))+ k i=1 RLCT ω0 (J(T i ))+(0,1 m k), (13) where J(S i )= I [ni ] κ I (ω) ˆκ I for i=1,...,m and J(T i )= w,w L i κ ww (ω) for i=1,...,k. Proof We first show that I:ˆκI =0 κ I (ω) = i, j:ˆκi j =0 κ i j (ω). The inclusion is clear. We now show. First note that for every I [n] if ˆκ I = 0 then either η 0 e = 0 for an edge e E(I) or 395

14 ZWIERNIK Figure 4: An example of a forest T induced by a point q. s r(i) = 1. There exist i, j I such that such that ˆκ i j= 0 and the r(i j)=r(i). It follows by Proposition 1 that κ I (ω)=κ i j (ω) f(ω) for a polynomial f(ω) and therefore the inclusion is also true. This implies J = κ I (ω) ˆκ I + I:ˆκ I 0 I:ˆκ I =0 κ I (ω) = Hence, to proof the lemma, it suffices to show that for every ω 0 Ω T RLCT ω0 ( m i=1 m i=1 J(S i )+ κ i j (ω). i, j:ˆκ i j =0 J(S i )+ κ i j (ω) ) (14) i, j:ˆκ i j =0 is equal to the right hand side of (13). If e E \ Ê then by definition there exist i, j [n] such that ˆκ i j 0 and e E(i j). Since, by Proposition 1, ˆκ i j = η 0 e f(ω 0 ) for a polynomial f then in particular η 0 e 0. It follows that for a sufficiently small ε for each E E\ Ê one can find positive constants c(ε),c(ε) such that c(ε) e E η e C(ε) holds in the ε-box around ω 0. Similarly if v / V (cf. Section 3.) then there exist positive constants d(ε),d(ε) such that d(ε) (1 s v) D(ε) in the ε-box around ω 0. It follows by Proposition 10 (iv) that in computations of the real log-canonical threshold in (14) we can replace each κ i j (ω) by (1 s r(i j) )δ r(i j) η e (15) e E(i j) Ê where δ r(i j) = 1 if r(i j) V and δ r(i j) = 0 otherwise. Thus, in (14) we can replace the ideal i, j:ˆκi j =0 κ i j (ω) by the ideal J 1 = i, j: ˆκi j =0 (1 s r(i j) )δ r(i j) η e E(i j) Ê e. However, if we define J = k i=1 w,w L i (1 s r(ww ) )δ r(ww ) e E(ww ) η e (16) then it can be checked that J 1 = J. To show that J J 1, fix j = 1,...,k and w,w L j, and show that the corresponding generator of J lies in J 1. Note that by construction each of w,w either has degree two in T or is a leaf of T. Hence, by the definition of Ê, there exist i, j [n] such that E(i j) Ê = E(ww ). It follows that each generator in (16) is also in the set of generators of J 1 and hence J J 1. To show the opposite inclusion, note that, if E(i j) intersects with more than one component T 1,...,T k then the corresponding generator in (15) is a product of some generators in (16) and hence it lies in J. 396

15 ASYMPTOTICS OF THE MARGINAL LIKELIHOOD FOR TREE MODELS Since the generators of every J(S i ) for i=1,...,m and every (1 s r(ww ))δr(ww ) w,w L j e E(ww ) for j= 1,...,k involve disjoint sets of variables then by Proposition 13 the term in (14) is equal to ( ) m k RLCT ω0 (J(S i ))+ RLCT ω0 (1 s r(ww ) )δr(ww ) η e +(0,1 m k). i=1 i=1 w,w L j η e e E(ww ) Again by Proposition 10 (iv) for each i=1,...,k ( ) RLCT ω0 (1 s r(ww ) )δr(ww ) η e = RLCT ω0 (J(T i )) w,w L i e E(ww ) which finishes the proof. We note that, by Proposition 8 and the formula in (1) for each S i : 6. The Case of Zero Covariances RLCT(J(S i ))+ n i = ni v+ n i e l i. (17) In this subsection we assume that q M T is such that ˆκ i j = 0 for all i, j [n]. This implies the full joint marginal independence X 1 X n. The aim is to prove the following proposition. Proposition 16 Let T be a trivalent tree with n 3 leaves rooted in r V. Let q M T be such that ˆκ i j = 0 for all i, j [n]. Let J be the ideal defined in Theorem 14. Then ( n ) min RLCT ω0 (J)= ω 0 Ω T 4,m, where m=1 if either r is a leaf of T or r together with all its neighbors are all inner nodes of T. In all other cases we cannot obtain an explicit upper bound for m and hence m 1. The strategy of the proof of Proposition 16 is as follows. First, in Section 6.1, we show that the local computations can be restricted to a special subset of Ω T over which J can be replaced by a monomial ideal. Then, in Section 6., we present a method to compute the real log-canonical threshold of a monomial ideal. We use this method in Section The Deepest Singularity Note that, by Lemma 15, RLCT ω0 (J) = RLCT ω0 ( i, j [n] κ i j (ω) ) so without loss of generality we will assume in this section that J = i, j [n] κ i j (ω). Moreover, for each v V these ideals depend on s v only through the value of s v. It follows that the computations can be reduced only to points satisfying s v 0 for all inner nodes v of T. Henceforth, in this section, we always assume this is the case. We define the deepest singularity of Ω T as Ω deep :={ω Ω T : η e = 0 for all e Ê, s v = 1 for all v V}. 397

16 ZWIERNIK We note that, since ˆκ i j = 0 for all i, j [n], then Ê = E and V is equal to the set of all inner nodes of T. It follows that Ω deep is an affine subspace constrained to Ω T with all coordinates either 0 or 1. Proposition 17 Let T be a tree with n leaves. Let q M T such that ˆκ i j = 0 for all i, j [n]. Then min ω 0 Ω T RLCT ω0 (J)= min ω 0 Ω deep RLCT ω0 (J). Proof We first show that Ω T is a union of affine subspaces constrained to Ω T with a common intersection given by Ω deep. Let V 0 V and E 0 Ê and Ω (V0,E 0 ) ={ω Ω T : s v = 1 for all v V 0, η uv = 0 for all (u,v) E 0 }. We say that(v 0,E 0 ) is minimal for Σ if for every point ω in Ω (V0,E 0 ) and for every i, j [n] κ i j (ω)=0, and furthermore, that (V 0,E 0 ) is minimal with such a property (with respect to inclusion on both coordinates). We now show that Ω T = (V 0,E 0 ) min. Ω (V0,E 0 ). The first inclusion follows from the fact that if ω Ω T then κ i j (ω)= ˆκ i j = 0 for all i, j [n]. Therefore ω Ω (V0,E 0 ) for some minimal(v 0,E 0 ). The second inclusion is obvious. Each Ω (V0,E 0 ) is an affine subspace in R V + E, denoted by M (V0,E 0 ), constrained to Ω T. Let S denote the intersection lattice of all M (V0,E 0 ) for (V 0,E 0 ) minimal with ordering denoted by. For each i S let M (i) denote the corresponding intersection and define S i = M (i) \ j<im ( j). In this way we obtain an S-induced decomposition of R V + E (cf. Section 3.1 in Goresky and MacPherson, 1988). By (Lazarsfeld, 004, Example ) the function ω rlct ω (J) is lower semicontinuous (the argument used there works over the real numbers). This means that for every ω 0 Ω T and ε>0 there exists a neighborhood U of ω 0 such that rlct ω0 (J) rlct ω (J)+ε for all ω U. Since the set of values of the real log-canonical threshold is discrete this means that for every ω 0 Ω T and any sufficiently small neighborhood W 0 of ω 0, one has rlct ω0 (J) rlct ω (J) for all ω W 0. Moreover, rlct(j) is constant on each S i. Since for any neighborhood W 0 of ω 0 Ω deep we have W 0 S i /0 for all i S then necessarily the minimum of the real log-canonical threshold is attained for a point in the deepest singularity. Proposition 17 shows that in the singular case we can restrict our analysis to the neighborhood of Ω deep. Often however, we also consider points in a bigger set Ω 0 ={ω Ω T : η uv = 0 for all (u,v) Ê}. Note that Ω deep lies on the boundary of Ω T (cf. (10)) but Ω 0 also contains internal points of Ω T which will be crucial for some of the arguments later. We now formulate another technical lemma which enables us to reduce computations to the monomial case. 398

17 ASYMPTOTICS OF THE MARGINAL LIKELIHOOD FOR TREE MODELS Lemma 18 Assume that q M T is such that ˆκ i j = 0 for all i, j [n]. Let J(ω 0 ) be the ideal J translated to the origin. Then for every ω 0 Ω 0 RLCT 0 (J(ω 0 ))=RLCT 0 (J ), (18) where J is a monomial ideal such that each κ i j (ω+ω 0 ) in the set of generators of J(ω 0 ) is replaced either by s r(i j) (u,v) E(i j) η uv if s 0 r(i j) = 1, or by (u,v) E(i j) η uv if s 0 r(i j) 1. Proof Let i, j [n] and assume that ω 0 =((s 0 v),(η 0 e)) Ω 0 so that η 0 e = 0 for all e E. Then, by Proposition 1: κ i j (ω+ω 0 )= 1 4 (1 (s r(i j)+ s 0 r(i j) ) ) η e. (19) e E(i j) If s 0 r(i j) 1 for a sufficiently small ε>0 there exist positive constants c(ε), C(ε) such that c(ε)< 1 (s r(i j) +s 0 r(i j) ) < C(ε) for s r(i j) ( ε,ε). Therefore, by Proposition 10 (iv), we can replace this term in (19) with 1. If s 0 r(i j) = 1 rewrite 1 (1+s r(i j) ) as s r(i j) (+s r(i j) ). For a sufficiently small ε we can find two positive constants c(ε),c(ε) such that c<+s r(i j) < C whenever s r(i j) ( ε,ε). Again, by Proposition 10 (iv), we can replace +s r(i j) with 1. This proves Equation (18). Since J is a monomial ideal then, by (Lin, 011, Proposition 4.11) and Theorem 0 below, we can compute RLCT 0 (J ) using the method of Newton diagrams. We present this method in the following subsection. 6. Newton Diagram Method Given an analytic function f A 0 (R d ) we pick local coordinates x=(x 1,...,x d ) in a neighborhood of the origin. This allows us to represent f as a power series in x 1,...,x d such that f(x)= α c α x α. The exponents of terms of the polynomial f are vectors inn d. The Newton polyhedron of f denoted by Γ + ( f) is the convex hull of the subset {α+α : c α 0,α R d 0}. A subset γ Γ + ( f) is a face of Γ + ( f) if there exists β R d such that γ={α Γ + ( f) : α,β α,β for all α Γ + ( f)}. If γ is a subset of Γ + ( f) then we define f γ (x) = α γ N d c α x α. The principal part of f is, by definition, the sum of all terms of f supported on all compact faces of Γ + ( f). Example 4 Let f(x,y)=x 3 + xy+6x y+3x 4 y+y. Then the Newton diagram looks as follows: 399

18 ZWIERNIK where the dots correspond to the terms of f. There are only two bounded facets of Γ + ( f) and the principal part of f is equal to x 3 + xy+y. Definition 19 The principal part of the power series f with real coefficients is R-nondegenerate if for all compact faces γ of Γ + ( f) { x R n : f γ (x)= = f } γ (x)=0 {ω R n : x 1 x n = 0}. x 1 x n From the geometric point of view this condition means that the singular locus of the hypersurface defined by f γ (x)=0 lies outside of (R ) n for all compact faces γ of Γ + ( f). The following theorem shows that, if the principal part of f isr-nondegenerate and f A Θ, the computations are greatly facilitated. An example of an application of these methods in statistical analysis can be found in Yamazaki and Watanabe (004). Theorem 0 (Arnold et al., 1988) Let f A 0 (Rd ) and f(0)=0. If the principal part of f is R- nondegenerate then RLCT 0 ( f)=( 1 t,c) where t is the smallest number such that the vector(t,...,t) hits the polyhedron Γ + ( f) and c is the codimension of the face it hits. For a proof see Theorem 4.8, Lin (011). Let now f A θ 0 such that f(θ 0 )=0. We can then center f at θ 0 obtaining a function in A 0. Then we can use Theorem 0 to compute RLCT θ0 ( f). Remark 1 Note that this theorem in general will not give us RLCT Θ0 ( f) if 0 is a boundary point of Θ in which case we also need to resolve the defining inequalities. For a discussion see (Arnold et al., 1988, Section 8.3.4) and Example.7 in Lin (011). 6.3 Proof of Proposition 16 Let n 4. For each ω 0 Ω 0, let δ=δ(ω 0 ) {0,1} V denote the indicator vector satisfying δ v = 1 if v V is such that s 0 v = 1 and δ v = 0 otherwise. In particular δ i = 0 for all i=1,...,n because the leaves, by (A), are assumed to be non-degenerate. LetV δ =R n e+ δ =R δ R n e, where δ = v δ v, be the real space with variables representing the edges (x e ) e E and nodes (y v ) for all v such that δ v = 1. With some arbitrary numbering of the nodes and edges we order the variables as follows: y 1 y δ x e1 x ene. In Lemma 18, for each ω 0 Ω 0, we reduced our computations to the analysis of RLCT 0 (J ) where J has a simple monomial form. Let Q δ be a polynomial 3300

19 ASYMPTOTICS OF THE MARGINAL LIKELIHOOD FOR TREE MODELS function on Ω T defined as a sum of squares of generators of J. In particular RLCT 0 (J ) = RLCT 0 (Q δ ). The exponents of terms of the polynomial Q δ (ω) are vectors in {0,} ne+ δ. We have that Q δ (ω)= i j [n] s δ r(i j) r(i j) (u,v) E(i j) η uv. (0) The convex hull, in V δ, of the exponents of the terms in Q δ is called the Newton polytope of Q δ and denoted Γ(Q δ ). We now investigate this polytope which is needed to understand the polyhedron Γ + (Q δ ), which is needed to use Theorem 0. Since each term of Q δ corresponds to a path between two leaves then the construction of the Newton polytope Γ(Q δ ) V δ gives a direct relationship between paths in T and the points generating the polytope. Convex combinations of points corresponding to paths give rise to points in the polytope. Let E 0 E be the subset of edges of T such that one of the ends is in the set of leaves of T. We call these edges terminal. Note that each point generating Γ(Q δ ) satisfies e E0 x e = 4. This follows from the fact that each of these points corresponds to a path between two leaves in T and every such a path need to cross exactly two terminal edges. Consequently each point of Γ(Q δ ) needs to satisfy this equation as well. The induced facet of the Newton polyhedron Γ + (Q δ ) is given as and each point of Γ + (Q δ ) satisfies e E0 x e 4. The following lemma proves one part of Proposition 16. F 0 ={(y,x) Γ + (Q δ ) : e E 0 x e = 4} (1) Lemma (The real log-canonical threshold of J ) Under assumptions of Proposition 16 we have that rclt 0 (J )= n 4. Proof If n= then, since s 0 1,s0 1, by Lemma 18 we have that ( ) 1 RLCT ω0 (J )=RLCT 0 (η 1)=,1. Therefore Proposition 16 holds in this case. Now assume that n 4. By Theorem 0 we have to show that t= 4 n is the smallest t such that the vector(t,...,t) hits Γ +(Q δ ). To show that 4 n 1 Γ +(Q δ ) we construct a point q Γ(Q δ ) such that q 4 n1 coordinatewise. The point is constructed as follows. Construction 3 Let T =(V,E) be a trivalent tree with n 4 leaves, rooted in r. We present two constructions of networks of paths between the leaves of T. The first construction is for the case when the root is degenerate, δ r = 1. In this case T is necessarily rooted in an inner node. If n = 4 then the network consists of the two paths within cherries counted with multiplicity two. Each of the paths corresponds to a point in Γ(Q δ ). We order the coordinates ofv δ =R 5+ δ by y a y b x 1 x 5 where y a,y b are included only if δ a,δ b = 1. For example the point corresponding to the path involving edges e 1 and e is(,0;,,0,0,0). The barycenter of the points corresponding to all the four paths in the network is (1,1;1,1,0,1,1) both if T is rooted in a or b. 3301

20 ZWIERNIK If n>4 then we build the network recursively. Assume that T is rooted in an inner node a and pick an inner edge (a,b). Label the edges incident with a and b as for the quartet tree above and consider the subtree given by the quartet tree. Draw four paths as on the picture above. Let v be any leaf of the quartet subtree which is not a leaf of T and label the two additional edges incident with v by e 6 and e 7. Now we extend the network by adding e 6 to one of the paths terminating in v and e 7 to the other. Next we add an additional path involving only e 6 and e 7 like on the picture below. By construction v is the root of the additional path. We extend the network cherry by cherry until it covers all terminal edges. Note that we have made some choices building up the network and hence the construction is not unique. However, each of the inner nodes is always a root of at least one and at most two paths. Moreover, each edge is covered at most twice and each terminating edge is covered exactly two times. We have n paths in the network, all representing points of Γ(Q δ ) denoted by q 1,...,q n. Let q= 1 n n i=1 q i then q Γ(Q δ ) is given by x ab = 0, x e = 4 n for all e E\(a,b). The other coordinates by construction satisfy y a = 4 n, y b = 4 n if δ b = 1, and y v = n for all v V \{a,b} such that δ v = 1. If δ r = 0 then we proceed as follows. For n=4 consider a network of all the possible paths all counted with multiplicity one apart from the cherry paths (paths of length two) counted with multiplicity two. This makes eight paths and each edge is covered exactly four times. The coordinates of the point representing the barycenter of all paths in the network satisfy x e = 1 for all e E and y v = 1 for all v such that δ v = 1. This construction generalizes recursively in a similar way as the one for T rooted in an inner node. We always have n paths and each edge is covered exactly four times. The network induces a point q Γ(Q δ ) with coordinates given by y v = n for all v V such that δ v = 1 and x e = 4 n for e E. (This finishes the construction.) The point 4 n 1 lies in Γ +(Q δ ), which follows from Construction 3 and the fact that the constructed point q Γ(Q δ ) satisfies q 4 n 1. Moreover, for any s < 4 n the point s(1,...,1) does not satisfy e E0 x e 4 and hence it cannot be in Γ + (Q δ ). It follows that 4 n is the smallest t such that t1 Γ + (Q δ ) and therefore rlct 0 (J )= n 4. Note that the result does not depend on δ. To compute the multiplicity of the real log-canonical threshold of Q δ we have to get a better understanding of the polyhedron Γ + (Q δ ). According to Theorem 0 we need to find the codimension of the face of Γ + (Q δ ) hit by the vector 4 n1. First we find the hyperplane representation of the Newton polytope Γ(Q δ ) reducing the problem to a simpler but equivalent one. Definition 4 (A pair-edge incidence polytope) Let T =(V, E) be a trivalent tree with n 4 leaves. We define a polytope P n R n e, where n e = n 3, as the convex combination of points (q i j ) i, j [n] where k-th coordinate of q i j is one if the k-th edge is in the path between i and j and there is zero otherwise. We call P n a pair-edge incidence polytope by analogy to the pair-edge incidence matrix defined in (Mihaescu and Pachter, 008, Definition 1). 330

Asymptotic Approximation of Marginal Likelihood Integrals

Asymptotic Approximation of Marginal Likelihood Integrals Shaowei Lin 10 Dec 2008 Abstract We study the asymptotics of marginal likelihood integrals for discrete models using resolution of singularities