COUNTING INTEGER POINTS IN POLYHEDRA. Alexander Barvinok

Papers are available at http://www.math.lsa.umich.edu/~barvinok/papers.html

Let $P \subset \mathbb{R}^d$ be a polytope. We want to compute (exactly or approximately) the number $|P \cap \mathbb{Z}^d|$ of integer points in $P$.

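PART I: EXACT COUNTING IN FIXED DIMENSION

Rational polyhedra

$$P = \Big\{ (\xi_1, \ldots, \xi_d) : \ \sum_{j=1}^{d} a_{ij} \xi_j \le b_i, \quad i = 1, \ldots, n \Big\}, \quad \text{where } a_{ij}, b_i \in \mathbb{Z}.$$

The input size of $P$:

$$L(P) = n(d+1) + \sum_{i,j} \lceil \log_2 (|a_{ij}| + 1) \rceil + \sum_{i} \lceil \log_2 (|b_i| + 1) \rceil,$$

the number of bits needed to define $P$.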

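Generating functions

We consider the sum

$$\sum_{m \in P \cap \mathbb{Z}^d} x^m, \quad \text{where } x^m = x_1^{\mu_1} \cdots x_d^{\mu_d} \text{ for } m = (\mu_1, \ldots, \mu_d).$$

The motivating example is the sum over the integer points on an interval:

$$\sum_{m=0}^{n} x^m = \frac{1 - x^{n+1}}{1 - x}.$$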
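The interval example can be checked numerically. The following is a minimal sketch, assuming the motivating identity reads $\sum_{m=0}^{n} x^m = (1 - x^{n+1})/(1 - x)$; the function names are illustrative and not part of the slides. Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction

def interval_sum(n, x):
    """Sum of x**m over the integer points m = 0, ..., n, term by term."""
    return sum(x**m for m in range(n + 1))

def interval_closed_form(n, x):
    """Closed form (1 - x**(n+1)) / (1 - x); only valid for x != 1."""
    return (1 - x**(n + 1)) / (1 - x)

# Compare both expressions at a rational point for several interval lengths.
x = Fraction(2, 3)
for n in (0, 1, 5, 20):
    assert interval_sum(n, x) == interval_closed_form(n, x)
```

The closed form has a fixed number of terms however long the interval is, which is the feature the general theorem extends to higher dimensions.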

Theorem. Let us fix $d$. There exists a polynomial time algorithm which, given a rational polyhedron $P \subset \mathbb{R}^d$ without lines, computes the generating function

$$f(P, x) = \sum_{m \in P \cap \mathbb{Z}^d} x^m$$

in the form

$$f(P, x) = \sum_{i \in I} \epsilon_i \frac{x^{v_i}}{(1 - x^{u_{i1}}) \cdots (1 - x^{u_{id}})},$$

where $\epsilon_i \in \{-1, 1\}$, $v_i \in \mathbb{Z}^d$, and $u_{ij} \in \mathbb{Z}^d \setminus \{0\}$.

The complexity of the algorithm is $L^{O(d)}$, where $L = L(P)$ is the input size of $P$. In particular, $|I| = L^{O(d)}$.

Proved in 1993 (with the $L^{O(d^2)}$ bound) and then again in 1999.

Applications

Having computed

$$f(P, x) = \sum_{m \in P \cap \mathbb{Z}^d} x^m,$$

we can

1) efficiently count $|P \cap \mathbb{Z}^d|$, the number of integer points in a given rational polytope $P$: carefully substitute $x_1 = \ldots = x_d = 1$ into $f(P, x)$;

2) solve integer programming problems of optimizing a given linear function on the set $P \cap \mathbb{Z}^d$ of integer points in $P$.

There are at least two implementations: LattE (lattice point enumerator) by J. De Loera et al., now subsumed by LattE macchiato by M. Köppe, and barvinok by S. Verdoolaege.

http://www.math.ucdavis.edu/~latte
http://www.kotnet.org/~skimo/barvinok
http://freshmeat.net/projects/barvinok/
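Why the substitution has to be done carefully can be seen in dimension one. The sketch below assumes the interval generating function $(1 - x^{n+1})/(1 - x)$: plugging in $x = 1$ gives $0/0$, so the value is taken as a limit, here by one application of L'Hôpital's rule to polynomials stored as coefficient lists. This is only an illustration with hypothetical helper names; it is not the procedure used by LattE or barvinok, which the slides do not describe.

```python
from fractions import Fraction

def poly_eval(coeffs, x):
    """Evaluate c0 + c1*x + c2*x**2 + ... by Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def poly_derivative(coeffs):
    """Coefficient list of the derivative polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def count_interval_points(n):
    """Limit of (1 - x**(n+1)) / (1 - x) as x -> 1."""
    numerator = [1] + [0] * n + [-1]   # 1 - x**(n+1)
    denominator = [1, -1]              # 1 - x
    # Direct substitution fails: both polynomials vanish at x = 1.
    assert poly_eval(numerator, 1) == 0
    assert poly_eval(denominator, 1) == 0
    # L'Hopital: the limit is the ratio of the derivatives at x = 1.
    return Fraction(poly_eval(poly_derivative(numerator), 1),
                    poly_eval(poly_derivative(denominator), 1))

# The interval 0..n contains n + 1 integer points.
for n in (0, 1, 7, 100):
    assert count_interval_points(n) == n + 1
```

In higher dimensions each term of the short rational function has a pole at $x = (1, \ldots, 1)$ and only the sum is regular there, so a comparable limiting argument is needed.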

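Some ideas of the proof

Reducing polyhedra to cones

We can represent the polyhedron as a hyperplane section of a higher-dimensional cone.

[Figure: the polyhedron $P$ as the section of the cone $K$ by the hyperplane $H$.]

We have $P = K \cap H$, where $P \subset \mathbb{R}^d$, $K \subset \mathbb{R}^{d+1}$, and $H \subset \mathbb{R}^{d+1}$ is an affine hyperplane identified with $\mathbb{R}^d$. Consequently, we have

$$f(P, x) = \frac{\partial}{\partial x_{d+1}} f(K, \tilde{x}) \Big|_{x_{d+1} = 0}, \quad \text{where } \tilde{x} = (x, x_{d+1}).$$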

Dealing with cones: unimodular cones

A cone $K \subset \mathbb{R}^d$ generated by a basis $u_1, \ldots, u_d$ of $\mathbb{Z}^d$ is called unimodular. For such a cone, we have

$$f(K, x) = \sum_{m \in K \cap \mathbb{Z}^d} x^m = \frac{1}{(1 - x^{u_1}) \cdots (1 - x^{u_d})}.$$

Integer points in the non-negative orthant:

$$\sum_{m \in \mathbb{Z}^d_+} x^m = \frac{1}{(1 - x_1) \cdots (1 - x_d)}.$$

A unimodular cone differs from the non-negative orthant by a unimodular transformation.

The main construction (1993): For any fixed $d$ there is a polynomial time algorithm which decomposes a given rational cone $K \subset \mathbb{R}^d$ into a combination of unimodular cones $K_i$:

$$[K] = \sum_{i \in I} \epsilon_i [K_i],$$

where $\epsilon_i \in \{-1, 1\}$, the $K_i$ are unimodular cones, and

$$[A] : \mathbb{R}^d \to \mathbb{R}, \quad [A](x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{otherwise} \end{cases}$$

is the indicator of a set $A$.

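Example: continued fractions in dimension 2

Suppose that $K \subset \mathbb{R}^2$ is generated by $(1, 0)$ and $(31, 164)$. First, we compute

$$\frac{164}{31} = 5 + \frac{9}{31} = 5 + \cfrac{1}{3 + \cfrac{4}{9}} = 5 + \cfrac{1}{3 + \cfrac{1}{2 + \cfrac{1}{4}}}.$$

Hence $164/31 = [5; 3, 2, 4]$. Now we compute the convergents:

$$[5; 3, 2] = 5 + \cfrac{1}{3 + \cfrac{1}{2}} = \frac{37}{7}, \quad [5; 3] = 5 + \frac{1}{3} = \frac{16}{3}, \quad [5] = 5.$$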
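The expansion and its convergents can be reproduced mechanically. The sketch below reads the example as the fraction $164/31$ and uses the Euclidean algorithm; the function names are illustrative.

```python
from fractions import Fraction

def continued_fraction(numerator, denominator):
    """Partial quotients [a0; a1, a2, ...] of numerator/denominator."""
    terms = []
    while denominator:
        quotient, remainder = divmod(numerator, denominator)
        terms.append(quotient)
        numerator, denominator = denominator, remainder
    return terms

def convergents(terms):
    """Exact values of the truncations [a0], [a0; a1], [a0; a1, a2], ..."""
    result = []
    for length in range(1, len(terms) + 1):
        # Evaluate the truncated fraction from the innermost term outwards.
        value = Fraction(terms[length - 1])
        for a in reversed(terms[:length - 1]):
            value = a + 1 / value
        result.append(value)
    return result

terms = continued_fraction(164, 31)
for c in convergents(terms):
    # Each convergent p/q corresponds to the lattice vector (q, p).
    print((c.denominator, c.numerator))
```

Each convergent $p/q$ gives a lattice vector $(q, p)$; these vectors are the generators of the cones used in the cutting and pasting step.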

Then we do cutting and pasting of cones.

[Figure: the cones spanned by consecutive pairs of the vectors $(0, 1)$, $(1, 5)$, $(3, 16)$, $(7, 37)$, $(31, 164)$, combined with alternating signs.]

Let $K_0$ be the cone generated by $(1, 0)$ and $(0, 1)$. Starting with $K_0$, we cut the cone generated by $(0, 1)$ and $(1, 5)$; paste the cone generated by $(1, 5)$ and $(3, 16)$; cut the cone generated by $(3, 16)$ and $(7, 37)$; paste the cone generated by $(7, 37)$ and $(31, 164)$ to finally get $K$ generated by $(1, 0)$ and $(31, 164)$. The four cones we cut and paste are unimodular. So we get

$$f(K, x) = \frac{1}{(1 - x_1)(1 - x_2)} - \frac{1}{(1 - x_2)(1 - x_1 x_2^5)} + \frac{1}{(1 - x_1 x_2^5)(1 - x_1^3 x_2^{16})} - \frac{1}{(1 - x_1^3 x_2^{16})(1 - x_1^7 x_2^{37})} + \frac{1}{(1 - x_1^7 x_2^{37})(1 - x_1^{31} x_2^{164})}.$$
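The cut-and-paste identity can be sanity-checked on lattice points. The sketch below assumes the generators are $(1,0)$, $(0,1)$, $(1,5)$, $(3,16)$, $(7,37)$ and $(31,164)$, taken from the convergents of $164/31$, and compares the alternating sum of cone indicators with the indicator of $K$ on a finite box. It is a brute-force check with illustrative names, not part of the decomposition algorithm itself.

```python
def det(u, v):
    """Determinant of the 2x2 matrix with columns u and v."""
    return u[0] * v[1] - u[1] * v[0]

def in_cone(p, u, v):
    """True if p = a*u + b*v with a, b >= 0 (closed cone, u and v independent)."""
    d = det(u, v)
    # Cramer's rule: a = det(p, v) / d and b = det(u, p) / d.
    return det(p, v) * d >= 0 and det(u, p) * d >= 0

# (sign, generator, generator): +1 means "paste", -1 means "cut".
SIGNED_CONES = [
    (+1, (1, 0), (0, 1)),
    (-1, (0, 1), (1, 5)),
    (+1, (1, 5), (3, 16)),
    (-1, (3, 16), (7, 37)),
    (+1, (7, 37), (31, 164)),
]
TARGET = ((1, 0), (31, 164))

def signed_indicator(p):
    """Alternating sum of the indicators of the five cones at the point p."""
    return sum(sign * in_cone(p, u, v) for sign, u, v in SIGNED_CONES)

def identity_holds(x_range, y_range):
    """Check [K] = sum of signed indicators at every lattice point of a box."""
    return all(signed_indicator((x, y)) == in_cone((x, y), *TARGET)
               for x in x_range for y in y_range)

# Every cone in the decomposition is spanned by a basis of Z^2.
assert all(abs(det(u, v)) == 1 for _, u, v in SIGNED_CONES)
assert identity_holds(range(-3, 41), range(-3, 221))
```

In this particular example the closed cones telescope exactly, so the identity holds at boundary points too; in general such identities are stated up to lower-dimensional cones.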

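Higher dimensions

For $K$ generated by linearly independent $u_1, \ldots, u_d \in \mathbb{Z}^d$, let $\operatorname{ind} K$ be the volume of the parallelepiped spanned by $u_1, \ldots, u_d$ (so $\operatorname{ind} K = 1$ if and only if $K$ is unimodular).

Using Minkowski's Convex Body Theorem, compute $w \in \mathbb{Z}^d \setminus \{0\}$ such that

$$w = \sum_{i=1}^{d} \alpha_i u_i \quad \text{where } |\alpha_i| \le (\operatorname{ind} K)^{-1/d}.$$

Let $K_i$ be the cone generated by $u_1, \ldots, u_{i-1}, w, u_{i+1}, \ldots, u_d$. Then

$$[K] = \sum_{i=1}^{d} \epsilon_i [K_i] + \text{indicators of lower-dimensional cones},$$

where $\epsilon_i \in \{-1, 1\}$ and

$$\operatorname{ind} K_i \le (\operatorname{ind} K)^{\frac{d-1}{d}}.$$

Now iterate.

[Figure: a three-dimensional cone generated by $u_1, u_2, u_3$ written as a signed sum of the cones obtained by replacing one generator with $w$.]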
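One step of this reduction can be illustrated in dimension 2. The sketch below assumes the cone generated by $(1,0)$ and $(31,164)$ from the earlier example and finds a suitable $w$ by brute-force search over a small box. The search and the function names are illustrative assumptions: the slides obtain $w$ from Minkowski's theorem, and an efficient algorithm would not enumerate candidates like this.

```python
from fractions import Fraction
from itertools import product

def det(u, v):
    """Determinant of the 2x2 matrix with columns u and v."""
    return u[0] * v[1] - u[1] * v[0]

def coordinates(w, u1, u2):
    """Exact (a1, a2) with w = a1*u1 + a2*u2, by Cramer's rule."""
    d = det(u1, u2)
    return Fraction(det(w, u2), d), Fraction(det(u1, w), d)

def find_short_vector(u1, u2, box):
    """Search for nonzero integer w with |a_i| <= ind**(-1/2) (the case d = 2)."""
    ind = abs(det(u1, u2))
    for w in product(range(-box, box + 1), repeat=2):
        if w == (0, 0):
            continue
        a1, a2 = coordinates(w, u1, u2)
        # |a_i| <= ind**(-1/2) is the same as a_i**2 * ind <= 1 (exact arithmetic).
        if a1 * a1 * ind <= 1 and a2 * a2 * ind <= 1:
            return w
    return None

u1, u2 = (1, 0), (31, 164)
w = find_short_vector(u1, u2, box=6)
# Indices of the cones obtained by replacing u1 or u2 with w.
new_indices = [abs(det(w, u2)), abs(det(u1, w))]
# Each new index is at most ind**(1/2), i.e. its square is at most ind.
assert all(i * i <= abs(det(u1, u2)) for i in new_indices)
```

Replacing $u_i$ by $w$ multiplies the index by $|\alpha_i|$, which is where the bound $(\operatorname{ind} K)^{(d-1)/d}$ comes from.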

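PART II: APPROXIMATE COUNTING IN HIGHER DIMENSIONS

We assume that $P$ is defined by a system of linear equations $Ax = b$ and inequalities $x \ge 0$, so the picture looks more like this:

[Figure: the polytope $P$ as a section of the non-negative orthant $x \ge 0$ by an affine subspace.]

Here $A = (a_{ij})$ is an integer $k \times n$ matrix of rank $k < n$ and $b$ is an integer $k$-vector.

[Figure: the matrix $A$, with $k$ rows, $n$ columns and integer entries.]

Let $\Lambda = \{ Ax : x \in \mathbb{Z}^n \}$ be the lattice in $\mathbb{Z}^k$. Unless $b \in \Lambda$, we have $P \cap \mathbb{Z}^n = \emptyset$.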

The Gaussian formula

Joint work with J.A. Hartigan (Yale).

Let us consider the function

$$g(\xi) = (\xi + 1) \ln(\xi + 1) - \xi \ln \xi \quad \text{for } \xi \ge 0.$$

[Figure: the graph of $g$.]

Let us solve the optimization problem:

Find $\max g(x) = \sum_{j=1}^{n} g(\xi_j)$
Subject to: $x = (\xi_1, \ldots, \xi_n) \in P$.

Since $g$ is strictly concave, the maximum point $z = (\zeta_1, \ldots, \zeta_n)$ is unique and can be found efficiently by interior point methods, for example. Computing $z$ is easy both in theory and in practice.

Let us compute a $k \times k$ matrix $Q = (q_{ij})$ by

$$q_{ij} = \sum_{m=1}^{n} a_{im} a_{jm} \left( \zeta_m^2 + \zeta_m \right).$$

We approximate the number of integer points in $P$ by

$$|P \cap \mathbb{Z}^n| \approx \frac{e^{g(z)} \det \Lambda}{(2\pi)^{k/2} (\det Q)^{1/2}}.$$
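A toy case shows the formula at work without any optimization code. The sketch below assumes the simplest polytope of this kind, $P = \{x \ge 0 : \xi_1 + \ldots + \xi_n = N\}$, so that $A = (1, \ldots, 1)$, $k = 1$ and $\det \Lambda = 1$. By symmetry and strict concavity the maximum point is $\zeta_j = N/n$, and the exact count is the binomial coefficient $\binom{N+n-1}{n-1}$. The example and the function names are illustrative choices, not taken from the slides.

```python
import math

def g(xi):
    """g(xi) = (xi + 1) ln(xi + 1) - xi ln(xi), with g(0) = 0."""
    if xi == 0:
        return 0.0
    return (xi + 1) * math.log(xi + 1) - xi * math.log(xi)

def gaussian_estimate_simplex(n, total):
    """Gaussian formula for {x >= 0 : x_1 + ... + x_n = total}."""
    zeta = total / n                  # maximum point, by symmetry
    q = n * (zeta**2 + zeta)          # Q is 1x1 because A = (1, ..., 1)
    det_lattice = 1                   # A(Z^n) = Z
    return math.exp(n * g(zeta)) * det_lattice / math.sqrt(2 * math.pi * q)

def exact_count_simplex(n, total):
    """Number of non-negative integer vectors of length n with the given sum."""
    return math.comb(total + n - 1, n - 1)

n, total = 10, 20
ratio = gaussian_estimate_simplex(n, total) / exact_count_simplex(n, total)
print(f"estimate / exact = {ratio:.4f}")
```

For general $A$ the point $z$ has to be computed numerically, as the slide says; the symmetric example only sidesteps that step.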

A couple of examples

Let us compute the number of $4 \times 4$ non-negative integer matrices with row sums 220, 215, 93 and 64 and column sums 108, 286, 71 and 127.

          108   286    71   127
    220     *     *     *     *
    215     *     *     *     *
     93     *     *     *     *
     64     *     *     *     *

The number of such matrices is $1225914276768514 \approx 1.23 \times 10^{15}$. We have $n = 16$ variables and $k = 7$ equations (one equation can be thrown out). The Gaussian formula overestimates by about 6%.

J. De Loera computed more examples. Here is one of them: the number of $3 \times 3 \times 3$ arrays of non-negative integers with the sums [31, 22, 87], [50, 13, 77], [42, 87, 11] along the affine coordinate hyperplanes is approximately $8.84 \times 10^{15}$. The Gaussian formula gives the relative error of .85%.

[Figure: a $3 \times 3 \times 3$ array with the sums 31, 22, 87; 50, 13, 77; 42, 87, 11 marked along its three axes.]

Here we have $n = 27$ variables and $k = 7$ equations (two equations can be thrown out).

The (first level of) intuition

A random variable $x$ is geometric if

$$\Pr\{ x = m \} = p q^m \quad \text{for } m = 0, 1, \ldots$$

where $p + q = 1$ and $p, q > 0$. We have

$$\mathbf{E}\, x = \frac{q}{p} \quad \text{and} \quad \operatorname{var} x = \frac{q}{p^2}.$$

Conversely, if $\mathbf{E}\, x = z$ then

$$p = \frac{1}{1 + z}, \quad q = \frac{z}{1 + z} \quad \text{and} \quad \operatorname{var} x = z^2 + z.$$

Theorem. Let $P \subset \mathbb{R}^n$ be a polytope that is the intersection of an affine subspace in $\mathbb{R}^n$ and the non-negative orthant $\mathbb{R}^n_+$. Suppose that $P$ has a non-empty interior (contains a point with strictly positive coordinates). Then the strictly concave function

$$g(x) = \sum_{j=1}^{n} \Big( (\xi_j + 1) \ln(\xi_j + 1) - \xi_j \ln \xi_j \Big)$$

attains its maximum on $P$ at a unique point $z = (\zeta_1, \ldots, \zeta_n)$ with positive coordinates.

Suppose that $x_1, \ldots, x_n$ are independent geometric random variables with expectations $\zeta_1, \ldots, \zeta_n$ and let $X = (x_1, \ldots, x_n)$. Then the probability mass function of $X$ is constant on $P \cap \mathbb{Z}^n$ and equal to $e^{-g(z)}$ at every $x \in P \cap \mathbb{Z}^n$. In particular,

$$|P \cap \mathbb{Z}^n| = e^{g(z)} \Pr\{ X \in P \}.$$
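The statement that the probability mass function is constant on $P \cap \mathbb{Z}^n$ can be checked on a small case. The sketch below assumes $P = \{x \ge 0 : \xi_1 + \xi_2 + \xi_3 = 4\}$, where symmetry gives the maximum point $z = (4/3, 4/3, 4/3)$, and compares the joint mass function of three independent geometric variables with $e^{-g(z)}$ at a few integer points of $P$. The polytope and the function names are illustrative assumptions.

```python
import math

def geometric_pmf(m, mean):
    """Pr{x = m} = p * q**m with p = 1/(1 + mean) and q = mean/(1 + mean)."""
    p = 1 / (1 + mean)
    q = mean / (1 + mean)
    return p * q**m

def g(xi):
    """g(xi) = (xi + 1) ln(xi + 1) - xi ln(xi) for xi > 0."""
    return (xi + 1) * math.log(xi + 1) - xi * math.log(xi)

def joint_pmf(point, means):
    """Mass function of independent geometric variables at an integer point."""
    return math.prod(geometric_pmf(m, z) for m, z in zip(point, means))

means = (4 / 3, 4 / 3, 4 / 3)               # maximum point of g on P, by symmetry
target = math.exp(-sum(g(z) for z in means))

# Three different integer points of P: each has coordinate sum 4.
for point in [(4, 0, 0), (2, 1, 1), (0, 3, 1)]:
    assert math.isclose(joint_pmf(point, means), target)
```

The constancy comes from the optimality conditions at $z$: $\ln q_j$ is a linear combination of the rows of the constraint matrix, so the mass depends on $x$ only through $Ax$.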

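Now, suppose that

$$P = \{ x : Ax = b, \ x \ge 0 \}.$$

Let $Y = AX$, that is, $Y = (y_1, \ldots, y_k)$, where

$$y_i = \sum_{j=1}^{n} a_{ij} x_j.$$

The theorem implies that

$$|P \cap \mathbb{Z}^n| = e^{g(z)} \Pr\{ Y = b \}.$$

We note that $\mathbf{E}\, Y = b$ and that

$$\operatorname{cov}(y_i, y_j) = \sum_{m=1}^{n} a_{im} a_{jm} \operatorname{var} x_m = \sum_{m=1}^{n} a_{im} a_{jm} \left( \zeta_m^2 + \zeta_m \right).$$

Now, we observe that $Y$ is the sum of $n$ independent random vectors $x_j A_j$, where $A_j$ is the $j$-th column of $A$, so we make a leap of faith and assume that the distribution of $Y$ in the vicinity of its expectation is close to the distribution of a Gaussian random vector $Y^*$ with the same expectation $b$ and the covariance matrix $Q$.

[Figure: the point $b$ together with a fundamental domain of $\Lambda$ around it.]

Hence it is not unreasonable to assume that

$$\Pr\{ Y = b \} \approx \Pr\{ Y^* \in b + \text{fundamental domain of } \Lambda \} \approx \frac{\det \Lambda}{(2\pi)^{k/2} (\det Q)^{1/2}}.$$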

More intuition from statistics and statistical physics

In 1957, E.T. Jaynes formulated a general principle. Let $\Omega$ be a large but finite probability space with an unknown measure $\mu$, let $f_1, \ldots, f_k : \Omega \to \mathbb{R}$ be random variables with known expectations $\mathbf{E} f_i = \alpha_i$ for $i = 1, \ldots, k$ and let $g : \Omega \to \mathbb{R}$ be yet another random variable. Then to compute or estimate $\mathbf{E}\, g$ one should assume that $\mu$ is the probability measure on $\Omega$ of the largest entropy such that $\mathbf{E} f_i = \alpha_i$ for $i = 1, \ldots, k$. Works great for the Maxwell-Boltzmann distribution.

In 1963, I.J. Good argued that the null hypothesis concerning an unknown probability distribution from a given class should be the one stating that the distribution is the maximum entropy distribution in the class.

In our case, $\Omega$ is the set $\mathbb{Z}^n_+$ of non-negative integer vectors, the $f_i$ are the linear equations defining the polytope $P$, and $\mu$ is the counting probability measure on $P \cap \mathbb{Z}^n_+$. We approximate $\mu$ by the maximum entropy distribution on $\mathbb{Z}^n_+$ subject to the constraints $\mathbf{E} f_i = \alpha_i$, where the $f_i$ are the linear equations defining $P$.

Fact: Among all distributions on $\mathbb{Z}_+$ with a given expectation, the geometric distribution has the maximum entropy. The entropy of a geometric distribution with expectation $x$ is

$$g(x) = (x + 1) \ln(x + 1) - x \ln x.$$

[Figure: a set $S$ and an affine subspace $A$.]

Let us choose a maximum entropy distribution among all probability distributions on a given set $S \subset \mathbb{R}^n$ with the expectation in a given affine subspace $A$. It is not hard to argue that the conditional distribution on $S \cap A$ must be uniform.

Multi-way transportation polytopes and multi-way contingency tables

Transportation polytopes. Let us choose positive $r_1, \ldots, r_m$ and $c_1, \ldots, c_n$ such that

$$r_1 + \ldots + r_m = c_1 + \ldots + c_n = N.$$

The polytope $P$ of non-negative $m \times n$ matrices $(x_{ij})$ with row sums $r_1, \ldots, r_m$ and column sums $c_1, \ldots, c_n$ is called a (two-index) transportation polytope. We have

$$\dim P = (m - 1)(n - 1).$$

Suppose the $r_i$ and $c_j$ are integers. Integer points in $P$ are called (two-way) contingency tables.

$\nu$-way transportation polytopes. Let us fix an integer $\nu \ge 2$. The polytope $P$ of non-negative $\nu$-dimensional $k_1 \times \ldots \times k_\nu$ arrays $(x_{j_1 \ldots j_\nu})$ with prescribed sectional sums

$$\sum_{1 \le j_1 \le k_1} \cdots \sum_{1 \le j_{i-1} \le k_{i-1}} \ \sum_{1 \le j_{i+1} \le k_{i+1}} \cdots \sum_{1 \le j_\nu \le k_\nu} x_{j_1 \ldots j_{i-1}, j, j_{i+1} \ldots j_\nu}$$

is called a $\nu$-way transportation polytope. As long as the natural balance conditions are met, $P$ is a polytope with

$$\dim P = k_1 \cdots k_\nu - (k_1 + \ldots + k_\nu) + \nu - 1.$$

Integer points in $P$ are called $\nu$-way contingency tables.

Some results

Fix $\nu$ and let $k_1, \ldots, k_\nu$ grow roughly proportionately. We proved:

The number of integer points in $P$ is asymptotically Gaussian provided $\nu \ge 5$. We suspect it is Gaussian already for $\nu \ge 3$.

In particular, the number of non-negative integer $k \times \ldots \times k$ magic cubes, that is, contingency tables with all sectional sums equal to $r = \alpha k^{\nu - 1}$, is

$$\big( 1 + o(1) \big) \left( (\alpha + 1)^{\alpha + 1} \alpha^{-\alpha} \right)^{k^\nu} \left( 2\pi\alpha^2 + 2\pi\alpha \right)^{-(k\nu - \nu + 1)/2} k^{(\nu - \nu^2)(k - 1)/2} \quad \text{as } k \to +\infty,$$

provided $\nu \ge 5$, $r$ is an integer and $\alpha$ is separated away from 0.

For $\nu = 2$, the asymptotic of the number of integer contingency tables with equal row sums and equal column sums was recently computed by E.R. Canfield and B. McKay. It differs from the Gaussian estimate by a constant factor (generally, greater than 1). This corresponds to the Edgeworth correction to the Gaussian distribution.

The curious case of $\nu = 2$

The Gaussian formula should be multiplied by the correction factor, computed as follows. Let $Z = (z_{ij})$ be the matrix maximizing

$$g(x) = \sum_{ij} \big( (x_{ij} + 1) \ln(x_{ij} + 1) - x_{ij} \ln x_{ij} \big)$$

on the polytope of non-negative $m \times n$ matrices with row sums $(r_1, \ldots, r_m)$ and column sums $(c_1, \ldots, c_n)$. Let us consider the quadratic form $q : \mathbb{R}^{m+n} \to \mathbb{R}$:

$$q(s, t) = \frac{1}{2} \sum_{ij} \left( z_{ij}^2 + z_{ij} \right) (s_i + t_j)^2 \quad \text{for } (s, t) = (s_1, \ldots, s_m; t_1, \ldots, t_n).$$

Let

$$u = \big( \underbrace{1, \ldots, 1}_{m \text{ times}}; \underbrace{-1, \ldots, -1}_{n \text{ times}} \big)$$

and let $L \subset \mathbb{R}^{m+n}$ be any hyperplane not containing $u$. Then the restriction of $q$ onto $L$ is strictly positive definite and we consider the Gaussian probability measure on $L$ with the density proportional to $e^{-q}$. Let us define random variables $f, h : L \to \mathbb{R}$ by

$$f(s, t) = \frac{1}{6} \sum_{ij} z_{ij} (z_{ij} + 1)(2 z_{ij} + 1)(s_i + t_j)^3 \quad \text{and}$$

$$h(s, t) = \frac{1}{24} \sum_{ij} z_{ij} (z_{ij} + 1) \left( 6 z_{ij}^2 + 6 z_{ij} + 1 \right) (s_i + t_j)^4$$

for $(s, t) = (s_1, \ldots, s_m; t_1, \ldots, t_n)$. Let

$$\mu = \mathbf{E} f^2 \quad \text{and} \quad \nu = \mathbf{E}\, h.$$

Then the correction factor is

$$\exp\left\{ -\frac{\mu}{2} + \nu \right\}.$$

Ramifications: counting 0-1 points in polytopes

A similar approach works for the set $P \cap \{0, 1\}^n$ of points with 0-1 coordinates in $P \subset \mathbb{R}^n$, where $P$ is defined by the system

$$P = \{ x : Ax = b, \ 0 \le x \le 1 \}.$$

Here $A = (a_{ij})$ is a $k \times n$ integer matrix of rank $k < n$ and $b \in \mathbb{Z}^k$. Let $\Lambda = A(\mathbb{Z}^n) \subset \mathbb{Z}^k$.

The geometric distribution is replaced by the Bernoulli distribution and the function $g(\xi) = (\xi + 1) \ln(\xi + 1) - \xi \ln \xi$ is replaced by the standard entropy function

$$h(\xi) = \xi \ln \frac{1}{\xi} + (1 - \xi) \ln \frac{1}{1 - \xi} \quad \text{for } 0 \le \xi \le 1.$$

[Figure: the graph of $h$ on the interval from 0 to 1.]

We solve a convex optimization problem:

Find $\max h(x) = \sum_{j=1}^{n} h(\xi_j)$
Subject to: $x = (\xi_1, \ldots, \xi_n) \in P$.

Since $h$ is strictly concave, the maximum point $z = (\zeta_1, \ldots, \zeta_n)$ is unique and can be found efficiently by interior point methods, for example.

We compute the $k \times k$ matrix $Q = (q_{ij})$ by

$$q_{ij} = \sum_{m=1}^{n} a_{im} a_{jm} \left( \zeta_m - \zeta_m^2 \right).$$

The Gaussian formula:

$$|P \cap \{0, 1\}^n| \approx \frac{e^{h(z)} \det \Lambda}{(2\pi)^{k/2} (\det Q)^{1/2}}.$$
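The 0-1 version can be tried on a case with a known answer. The sketch below assumes $P = \{0 \le x \le 1 : \xi_1 + \ldots + \xi_n = N\}$, so that $A = (1, \ldots, 1)$, $k = 1$, $\det \Lambda = 1$, the maximum point is $\zeta_j = N/n$ by symmetry, and the exact count is $\binom{n}{N}$. The example and the function names are illustrative and do not come from the slides.

```python
import math

def h(xi):
    """Entropy h(xi) = xi ln(1/xi) + (1 - xi) ln(1/(1 - xi)), with h(0) = h(1) = 0."""
    if xi in (0, 1):
        return 0.0
    return xi * math.log(1 / xi) + (1 - xi) * math.log(1 / (1 - xi))

def gaussian_estimate_01(n, total):
    """Gaussian formula for 0-1 vectors of length n with coordinate sum total."""
    zeta = total / n                  # maximum point, by symmetry
    q = n * (zeta - zeta**2)          # Q is 1x1 because A = (1, ..., 1)
    det_lattice = 1                   # A(Z^n) = Z
    return math.exp(n * h(zeta)) * det_lattice / math.sqrt(2 * math.pi * q)

n, total = 20, 10
exact = math.comb(n, total)
ratio = gaussian_estimate_01(n, total) / exact
print(f"estimate / exact = {ratio:.4f}")
```

In this special case the Gaussian formula reduces to the familiar Stirling-type approximation of a binomial coefficient.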