Packing, coding, and ground states: From information theory to physics
Lecture III. Packing and energy minimization bounds in compact spaces

Henry Cohn
Microsoft Research New England

Pair correlations

For simplicity, we'll focus on finite point configurations in $S^{n-1}$. The distance distribution of a finite subset $C$ of $S^{n-1}$ measures how often each distance occurs between pairs of points. For $-1 \le t \le 1$, define
\[ A_t = \#\{(x, y) \in C^2 : \langle x, y \rangle = t\}. \]
Recall that $|x - y|^2 = 2 - 2\langle x, y \rangle$, so $A_t$ counts the number of (ordered) pairs at distance $\sqrt{2 - 2t}$. In physics terms, this is equivalent to the pair correlation function.

We can express energy for a pair potential in terms of pair correlations:
\[ \sum_{x \ne y} f(|x - y|^2) = \sum_{-1 \le t < 1} f(2 - 2t) A_t. \]
(The right side has only finitely many nonzero summands.)
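As a small sanity check of this identity, the following sketch computes the distance distribution of the regular octahedron in $S^2$ and compares both sides for the illustrative potential $f(r) = 1/r$. The helper names, the choice of configuration and potential, and the rounding used to group equal inner products are assumptions made for this example, not part of the lecture.

```python
from collections import Counter
from itertools import product

def inner(x, y):
    """Euclidean inner product <x, y>."""
    return sum(a * b for a, b in zip(x, y))

def distance_distribution(points, digits=9):
    """Map each inner product t to A_t, the number of ordered pairs with <x, y> = t."""
    counts = Counter()
    for x, y in product(points, repeat=2):
        # Round so that inner products equal up to floating-point noise share a key.
        counts[round(float(inner(x, y)), digits) + 0.0] += 1
    return dict(counts)

# Vertices of the regular octahedron: +-e_1, +-e_2, +-e_3.
octahedron = [
    (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0), (0.0, -1.0, 0.0),
    (0.0, 0.0, 1.0), (0.0, 0.0, -1.0),
]

def f(r):
    """Illustrative potential as a function of squared distance."""
    return 1.0 / r

A = distance_distribution(octahedron)

# Left side: sum over ordered pairs of distinct points.
energy_direct = sum(
    f(2 - 2 * inner(x, y))
    for x, y in product(octahedron, repeat=2)
    if x != y
)
# Right side: sum over inner products t < 1, weighted by A_t.
energy_from_A = sum(f(2 - 2 * t) * count for t, count in A.items() if t < 1)

print(A)
print(energy_direct, energy_from_A)
```

Here $A_1 = 6$, $A_0 = 24$, and $A_{-1} = 6$, so both sides equal $24 \cdot \tfrac12 + 6 \cdot \tfrac14 = 13.5$.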

Constraints on pair correlations

Because
\[ \sum_{x \ne y} f(|x - y|^2) = \sum_{-1 \le t < 1} f(2 - 2t) A_t, \]
figuring out how low the energy can be amounts to understanding what the possible pair correlation functions are.

There are some obvious constraints for a configuration with $N$ points: $A_t \ge 0$ for all $t$, $A_1 = N$, and $\sum_t A_t = N^2$. These follow trivially from the definition $A_t = \#\{(x, y) \in C^2 : \langle x, y \rangle = t\}$.

There are also less obvious constraints, such as
\[ \sum_t A_t \, t = \sum_{x, y \in C} \langle x, y \rangle = \Big| \sum_{x \in C} x \Big|^2 \ge 0. \]
This is the inequality we used to analyze simplices.
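These constraints are easy to test numerically. The sketch below draws a random configuration (the dimension, size, seed, and helper names are arbitrary choices for illustration) and checks $\sum_t A_t = N^2$, $A_1 = N$, and the first-moment identity. It computes $\sum_t A_t t$ directly as the double sum of inner products, which is the same quantity by the definition of $A_t$.

```python
import math
import random
from itertools import product

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def random_unit_vector(n, rng):
    """Normalize a Gaussian vector to get a uniformly random point on S^{n-1}."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

rng = random.Random(0)
n, N = 4, 7
C = [random_unit_vector(n, rng) for _ in range(N)]
pairs = list(product(C, repeat=2))

total_pairs = len(pairs)                                          # sum_t A_t
A_1 = sum(1 for x, y in pairs if abs(inner(x, y) - 1) < 1e-12)    # pairs with <x, y> = 1
first_moment = sum(inner(x, y) for x, y in pairs)                 # sum_t A_t * t

vector_sum = [sum(x[i] for x in C) for i in range(n)]
norm_squared = sum(c * c for c in vector_sum)                     # |sum_x x|^2

print(total_pairs == N * N, A_1 == N)
print(first_moment, norm_squared)
```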

Delsarte linear programming inequalities

Delsarte discovered (in a closely related context) an infinite sequence of linear inequalities generalizing this last one. They use special functions, namely Gegenbauer or ultraspherical polynomials. These are a family $P_k^n$ of polynomials in one variable, with $\deg(P_k^n) = k$, such that
\[ \sum_t A_t P_k^n(t) \ge 0 \]
for all $k$. In particular, $P_1^n(t) = t$, so we recover the previous inequality, and $P_0^n(t) = 1$.

Equivalently, for every finite set $C \subset S^{n-1}$,
\[ \sum_{x, y \in C} P_k^n(\langle x, y \rangle) \ge 0. \]
We'll return shortly to what these polynomials are and why they have this property.
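To experiment with these inequalities, one option is to evaluate $P_k^n$ with the standard three-term recurrence for Gegenbauer polynomials. The sketch below assumes the normalization $P_k^n(1) = 1$ (the lecture leaves the scaling free) and checks the inequality on a random configuration. The function names, dimension, size, and seed are illustrative choices.

```python
import math
import random
from itertools import product

def ultraspherical(k, n, t):
    """Evaluate P_k^n(t), scaled so that P_k^n(1) = 1, by the three-term recurrence."""
    if k == 0:
        return 1.0
    prev, cur = 1.0, t
    for j in range(1, k):
        # (j + n - 2) P_{j+1}(t) = (2j + n - 2) t P_j(t) - j P_{j-1}(t)
        prev, cur = cur, ((2 * j + n - 2) * t * cur - j * prev) / (j + n - 2)
    return cur

def random_unit_vector(n, rng):
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

def delsarte_sum(points, k, n):
    """Sum of P_k^n(<x, y>) over all ordered pairs (x, y), including x = y."""
    return sum(
        ultraspherical(k, n, sum(a * b for a, b in zip(x, y)))
        for x, y in product(points, repeat=2)
    )

rng = random.Random(0)
n, N = 4, 10
C = [random_unit_vector(n, rng) for _ in range(N)]
sums = [delsarte_sum(C, k, n) for k in range(1, 9)]

# Every sum should be nonnegative, up to floating-point error.
print(all(s >= -1e-9 for s in sums))
```

For $n = 3$ the recurrence reduces to the Legendre recurrence, and for $n = 2$ to the Chebyshev recurrence $P_{j+1} = 2tP_j - P_{j-1}$.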

Linear programming bounds

Now we can try to minimize $\sum_{-1 \le t < 1} f(2 - 2t) A_t$ subject to these inequalities. This is a linear function of the $A_t$, and all our inequalities are linear as well. Thus, we get a lower bound for energy from the following infinite-dimensional linear programming problem:

Find $A_t$ for $-1 \le t \le 1$ to minimize
\[ \sum_{-1 \le t < 1} A_t f(2 - 2t) \]
subject to:
\[ A_1 = N, \qquad A_t \ge 0, \qquad \sum_t A_t = N^2, \qquad \sum_t A_t P_k^n(t) \ge 0 \ \text{for } k \ge 1. \]

Linear programming duality

We can formulate the dual linear program, in which we try to prove bounds on energy by taking linear combinations of the constraints. That amounts to the following theorem:

Theorem (Delsarte, ..., Yudin). Suppose $h = \sum_k h_k P_k^n$ with $h_k \ge 0$ for $k \ge 1$, and suppose $h(t) \le f(2 - 2t)$ for $t \in [-1, 1]$. Then every $N$-point configuration $C$ on $S^{n-1}$ satisfies
\[ \sum_{x \ne y} f(|x - y|^2) \ge N^2 h_0 - N h(1). \]

In other words, we need a lower bound $h$ for the potential function $f$ such that $h$ has non-negative ultraspherical coefficients. Then we get a lower bound for $f$-energy.

How do we choose $h$ to optimize this bound? Nobody knows in general, but we can do it in certain special cases.

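Theorem (Delsarte, ..., Yudin). Suppose $h = \sum_k h_k P_k^n$ with $h_k \ge 0$ for $k \ge 1$, and suppose $h(t) \le f(2 - 2t)$ for $t \in [-1, 1]$. Then every $N$-point configuration $C$ on $S^{n-1}$ satisfies
\[ \sum_{x \ne y} f(|x - y|^2) \ge N^2 h_0 - N h(1). \]

Proof: We have
\[
\begin{aligned}
\sum_{x \ne y} f(|x - y|^2) &\ge \sum_{x \ne y} h(\langle x, y \rangle) \\
&= \sum_{x, y} h(\langle x, y \rangle) - N h(1) \\
&= N^2 h_0 - N h(1) + \sum_{k \ge 1} h_k \sum_{x, y} P_k^n(\langle x, y \rangle) \\
&\ge N^2 h_0 - N h(1).
\end{aligned}
\]
Q.E.D.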

This all rests on the fundamental inequality
\[ \sum_{x, y \in C} P_k^n(\langle x, y \rangle) \ge 0. \]
It might seem like an extraordinarily wasteful proof technique, since we are throwing away tons of terms. But in fact, for $k \ge 1$, $P_k^n(\langle x, y \rangle)$ averages to zero over the whole sphere, so perhaps those terms aren't likely to be so large anyway.

Applying these bounds

LP bounds are behind almost every case in which universal optimality, or indeed any sharp bound on energy, is known.

When could the bound be sharp? We need $f(|x - y|^2) = h(\langle x, y \rangle)$ for all $x, y \in C$ with $x \ne y$, and we need $\sum_{x, y \in C} P_k^n(\langle x, y \rangle) = 0$ for all $k \ge 1$ for which $h_k > 0$.

In practice, we choose $h$ to be a polynomial of as low a degree as possible subject to agreeing with $f$ to order 2 at each inner product that occurs between distinct points in $C$. Then you can check that everything works and treat it as an undeserved miracle, or explain it via the following theorem.
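Here is a worked instance of this recipe, as a sketch under stated assumptions: the regular octahedron in $S^2$ ($N = 6$, inner products $0$ and $-1$ between distinct points) with the illustrative potential $f(r) = 1/r$. Matching $g(t) = f(2 - 2t)$ in value and first derivative at $t = 0$ and $t = -1$ gives $h(t) = \tfrac12 + \tfrac12 t + \tfrac38 t^2 + \tfrac18 t^3$, whose Legendre ($n = 3$) coefficients are $h_0 = \tfrac58$, $h_1 = \tfrac{23}{40}$, $h_2 = \tfrac14$, $h_3 = \tfrac1{20}$. These numbers were worked out for this example and do not appear in the lecture. The code checks $h \le g$ on a grid and compares the bound with the actual energy.

```python
from itertools import product

def ultraspherical(k, n, t):
    """P_k^n(t) with P_k^n(1) = 1, via the three-term recurrence."""
    if k == 0:
        return 1.0
    prev, cur = 1.0, t
    for j in range(1, k):
        prev, cur = cur, ((2 * j + n - 2) * t * cur - j * prev) / (j + n - 2)
    return cur

n, N = 3, 6
h_coeffs = [5 / 8, 23 / 40, 1 / 4, 1 / 20]   # h_0, ..., h_3, all nonnegative

def h(t):
    return sum(c * ultraspherical(k, n, t) for k, c in enumerate(h_coeffs))

def g(t):
    """g(t) = f(2 - 2t) for the potential f(r) = 1/r."""
    return 1.0 / (2 - 2 * t)

# Check h <= g on a grid in [-1, 1); the tolerance absorbs rounding at the touching points.
grid = [-1 + 2 * i / 1000 for i in range(1000)]
h_below_g = all(h(t) <= g(t) + 1e-12 for t in grid)

lp_bound = N ** 2 * h_coeffs[0] - N * h(1.0)

octahedron = [
    (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0), (0.0, -1.0, 0.0),
    (0.0, 0.0, 1.0), (0.0, 0.0, -1.0),
]
energy = sum(
    g(sum(a * b for a, b in zip(x, y)))
    for x, y in product(octahedron, repeat=2)
    if x != y
)

print(h_below_g, lp_bound, energy)
```

Both the bound and the energy come out to $13.5$, so the bound is sharp for this configuration and potential.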

Theorem (Cohn and Kumar). Every $m$-distance set that is a spherical $(2m-1)$-design is universally optimal.

$m$-distance set = set with $m$ distances between distinct points.

Spherical $k$-design = finite subset $D$ of the sphere $S^{n-1}$ such that for all polynomials $p \colon \mathbb{R}^n \to \mathbb{R}$ of total degree at most $k$, the average of $p$ over $D$ equals the average of $p$ over $S^{n-1}$. (I.e., averaging at these points gives exact numerical integration for polynomials up to degree $k$.)

This theorem handles every known universal optimum except the regular 600-cell.

H. Cohn and A. Kumar, Universally optimal distribution of points on spheres, Journal of the American Mathematical Society 20 (2007), 99-148.
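The hypotheses can be checked numerically for small examples. The sketch below relies on a standard characterization that is not stated on the slide: $C$ is a spherical $K$-design exactly when $\sum_{x, y \in C} P_k^n(\langle x, y \rangle) = 0$ for $k = 1, \dots, K$. It counts distances and estimates design strength for the octahedron and icosahedron. The function names, tolerances, and the cutoff `max_k` are assumptions for illustration.

```python
import math
from itertools import product

def ultraspherical(k, n, t):
    """P_k^n(t) with P_k^n(1) = 1, via the three-term recurrence."""
    if k == 0:
        return 1.0
    prev, cur = 1.0, t
    for j in range(1, k):
        prev, cur = cur, ((2 * j + n - 2) * t * cur - j * prev) / (j + n - 2)
    return cur

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def num_distances(points, digits=9):
    """Number of distinct inner products between distinct points."""
    return len({round(inner(x, y), digits)
                for x, y in product(points, repeat=2) if x != y})

def design_strength(points, n, max_k=10, tol=1e-9):
    """Largest K <= max_k such that the k-th Delsarte sum vanishes for k = 1..K."""
    for k in range(1, max_k + 1):
        total = sum(ultraspherical(k, n, inner(x, y))
                    for x, y in product(points, repeat=2))
        if abs(total) > tol:
            return k - 1
    return max_k

octahedron = [
    (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0), (0.0, -1.0, 0.0),
    (0.0, 0.0, 1.0), (0.0, 0.0, -1.0),
]

# Icosahedron: cyclic permutations of (0, +-1, +-phi), scaled to unit length.
phi = (1 + math.sqrt(5)) / 2
scale = math.sqrt(1 + phi * phi)
icosahedron = []
for a, b in product((1.0, -1.0), repeat=2):
    for v in ((0.0, a, b * phi), (a, b * phi, 0.0), (b * phi, 0.0, a)):
        icosahedron.append(tuple(c / scale for c in v))

for name, pts in (("octahedron", octahedron), ("icosahedron", icosahedron)):
    print(name, num_distances(pts), design_strength(pts, 3))
```

The octahedron is a 2-distance set and a 3-design, and the icosahedron is a 3-distance set and a 5-design, so both satisfy the hypothesis with $m = 2$ and $m = 3$ respectively.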

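Packing bounds

Theorem. Suppose $h = \sum_k h_k P_k^n$ with $h_k \ge 0$ for $k \ge 0$ and $h_0 > 0$, and suppose $h(t) \le 0$ for $t \in [-1, \cos\theta]$. Then every configuration $C$ on $S^{n-1}$ with minimal angle at least $\theta$ satisfies
\[ |C| \le h(1)/h_0. \]

Proof: We have
\[ |C| \, h(1) \ge \sum_{x, y \in C} h(\langle x, y \rangle) = \sum_k h_k \sum_{x, y \in C} P_k^n(\langle x, y \rangle) \ge |C|^2 h_0. \]
Q.E.D.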
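A well-known application of this theorem, not worked out in the lecture itself, is the kissing number in $\mathbb{R}^8$: with $n = 8$, $\theta = 60^\circ$, and $h(t) = (t+1)(t+\tfrac12)^2 t^2 (t-\tfrac12)$, the bound is $240$. The sketch below verifies the hypotheses in exact rational arithmetic. The helper names are hypothetical, and the normalization $P_k^n(1) = 1$ is an assumption (any positive scaling gives the same bound).

```python
from fractions import Fraction
from functools import reduce

def poly_mul(p, q):
    """Multiply polynomials given as ascending coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_eval(p, t):
    """Evaluate a polynomial by Horner's rule."""
    return reduce(lambda acc, c: acc * t + c, reversed(p), Fraction(0))

def ultraspherical_poly(k, n):
    """Ascending coefficients of P_k^n, scaled so that P_k^n(1) = 1."""
    prev, cur = [Fraction(1)], [Fraction(0), Fraction(1)]
    if k == 0:
        return prev
    for j in range(1, k):
        # (j + n - 2) P_{j+1} = (2j + n - 2) t P_j - j P_{j-1}
        nxt = [Fraction(2 * j + n - 2) * c for c in [Fraction(0)] + cur]
        for i, c in enumerate(prev):
            nxt[i] -= j * c
        prev, cur = cur, [c / (j + n - 2) for c in nxt]
    return cur

def expand(h, n):
    """Return [h_0, ..., h_d] with h = sum_k h_k P_k^n, peeling off top degrees first."""
    rest = list(h)
    coeffs = [Fraction(0)] * len(rest)
    for k in range(len(rest) - 1, -1, -1):
        P = ultraspherical_poly(k, n)
        coeffs[k] = rest[k] / P[k]
        for i, c in enumerate(P):
            rest[i] -= coeffs[k] * c
    return coeffs

half = Fraction(1, 2)
factors = [[Fraction(1), Fraction(1)],                      # t + 1
           [half, Fraction(1)], [half, Fraction(1)],        # (t + 1/2)^2
           [Fraction(0), Fraction(1)], [Fraction(0), Fraction(1)],  # t^2
           [-half, Fraction(1)]]                            # t - 1/2
h = reduce(poly_mul, factors)

coeffs = expand(h, 8)
# Grid check of h <= 0 on [-1, cos 60 degrees] = [-1, 1/2].
nonpositive = all(poly_eval(h, Fraction(-1) + Fraction(3 * i, 600)) <= 0
                  for i in range(301))
bound = poly_eval(h, Fraction(1)) / coeffs[0]

print(all(c >= 0 for c in coeffs), nonpositive, bound)
```

All coefficients are nonnegative, $h_0 = 3/320$, and $h(1) = 9/4$, so $|C| \le 240$.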

So what are ultraspherical polynomials?

Orthogonal polynomials with respect to $(1 - t^2)^{(n-3)/2} \, dt$ on $[-1, 1]$. I.e.,
\[ \int_{-1}^{1} P_k^n(t) P_l^n(t) (1 - t^2)^{(n-3)/2} \, dt = 0 \]
if $k \ne l$.

Equivalently, $P_k^n$ is orthogonal to all polynomials of degree less than $k$ with respect to this measure. This uniquely determines them up to scaling (which is irrelevant for us, as long as we take $P_k^n(1) > 0$).

Just apply Gram-Schmidt orthogonalization to $1, t, t^2, \dots$ to compute them recursively.
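The Gram-Schmidt recipe can be carried out exactly. The sketch below assumes the closed form for the normalized even moments of the weight, $\int t^{2j} \, d\nu = \prod_{i=0}^{j-1} \frac{2i+1}{n+2i}$ (odd moments vanish), which is a standard fact not derived in the lecture. It scales the result so that $P_k^n(1) = 1$. The function names are illustrative.

```python
from fractions import Fraction

def moment(j, n):
    """Integral of t^j against the weight (1 - t^2)^((n-3)/2), scaled to total mass 1."""
    if j % 2 == 1:
        return Fraction(0)
    m = Fraction(1)
    for i in range(j // 2):
        m *= Fraction(2 * i + 1, n + 2 * i)
    return m

def weighted_inner(p, q, n):
    """Inner product of two polynomials (ascending coefficient lists) under the weight."""
    return sum(a * b * moment(i + j, n)
               for i, a in enumerate(p) for j, b in enumerate(q))

def gram_schmidt(max_degree, n):
    """Orthogonalize 1, t, t^2, ... and rescale each polynomial so that P(1) = 1."""
    orthogonal = []
    for k in range(max_degree + 1):
        monomial = [Fraction(0)] * k + [Fraction(1)]
        p = list(monomial)
        for q in orthogonal:
            # Subtract the projection of t^k onto each lower-degree polynomial.
            c = weighted_inner(monomial, q, n) / weighted_inner(q, q, n)
            for i, coefficient in enumerate(q):
                p[i] -= c * coefficient
        orthogonal.append(p)
    return [[c / sum(p) for c in p] for p in orthogonal]   # sum(p) equals p(1)

legendre = gram_schmidt(3, 3)
print(legendre[2])   # coefficients of (3t^2 - 1)/2
print(gram_schmidt(2, 8)[2])   # coefficients of (8t^2 - 1)/7
```

For $n = 3$ the weight is constant and the output is the Legendre polynomials, which gives a quick consistency check.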

Orthogonal polynomials

Many wonderful implications. For example, orthogonality shows that $P_k^n$ has $k$ distinct roots in $[-1, 1]$.

To see why, suppose $P_k^n$ changed sign at only $m$ points $r_1, \dots, r_m$ in $[-1, 1]$, with $m < k$. Then $P_k^n(t)(t - r_1) \cdots (t - r_m)$ would never change sign on $[-1, 1]$, which would contradict
\[ \int_{-1}^{1} P_k^n(t)(t - r_1) \cdots (t - r_m)(1 - t^2)^{(n-3)/2} \, dt = 0 \]
(which holds because $(t - r_1) \cdots (t - r_m)$ has degree less than $k$).
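A quick numerical illustration of the root count, as a sketch: evaluate $P_k^n$ on a fine grid and count sign changes. The grid size, the dimensions tried, and the normalization $P_k^n(1) = 1$ are arbitrary choices for this example. A grid check like this is only evidence, not a proof.

```python
def ultraspherical(k, n, t):
    """P_k^n(t) with P_k^n(1) = 1, via the three-term recurrence."""
    if k == 0:
        return 1.0
    prev, cur = 1.0, t
    for j in range(1, k):
        prev, cur = cur, ((2 * j + n - 2) * t * cur - j * prev) / (j + n - 2)
    return cur

def count_sign_changes(k, n, intervals=2001):
    """Count sign changes of P_k^n on a uniform grid over [-1, 1]."""
    # An odd number of intervals keeps t = 0, a root of every odd-degree P_k^n, off the grid.
    values = [ultraspherical(k, n, -1 + 2 * i / intervals) for i in range(intervals + 1)]
    return sum(1 for a, b in zip(values, values[1:]) if a * b < 0)

root_counts = {(k, n): count_sign_changes(k, n) for n in (3, 8) for k in range(1, 7)}
print(all(count == k for (k, n), count in root_counts.items()))
```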

Recall that as a representation of $O(n)$, we can decompose $L^2(S^{n-1})$ as
\[ L^2(S^{n-1}) = \bigoplus_{k \ge 0} W_k, \]
where $W_k$ consists of degree $k$ spherical harmonics.

Let $x \in S^{n-1}$, and consider the linear map that takes $f \in W_k$ to $f(x)$. This map must be the inner product with some unique element $w_{k,x}$ of $W_k$, called a reproducing kernel. That is,
\[ f(x) = \langle w_{k,x}, f \rangle \]
for all $f \in W_k$.

The function $w_{k,x}$ is invariant under all rotations that fix $x$, since such rotations preserve $f(x)$. Thus, $w_{k,x}(y)$ must be a function of $\langle x, y \rangle$ alone. We define $P_k^n$ by $w_{k,x}(y) = P_k^n(\langle x, y \rangle)$.

The reproducing kernel $w_{k,x}$ is a polynomial of degree $k$ (because it is a spherical harmonic), and thus so is $P_k^n$.

Furthermore, $w_{k,x}$ and $w_{l,x}$ are orthogonal in $L^2(S^{n-1})$ for $k \ne l$, since they are spherical harmonics of different degrees. Thus,
\[ \int_{S^{n-1}} P_k^n(\langle x, y \rangle) P_l^n(\langle x, y \rangle) \, d\mu(y) = 0, \]
where $\mu$ is surface measure.

If we orthogonally project from the surface of the sphere onto the axis from $-x$ to $x$, then $\mu$ projects to a constant times the measure $(1 - t^2)^{(n-3)/2} \, dt$ on $[-1, 1]$. (This is a good multivariable calculus exercise.)

Thus, we have recovered all the properties of ultraspherical polynomials we needed except for the fundamental inequality
\[ \sum_{x, y \in C} P_k^n(\langle x, y \rangle) \ge 0. \]

As a side comment, we can now see that $W_k$ is an irreducible representation of $O(n)$. If it broke up further, then each summand would have its own reproducing kernel, which would give two different polynomials of degree $k$ that would be orthogonal to each other as well as to all lower degree polynomials. That is impossible (the space of polynomials of degree at most $k$ has dimension too low to contain that).

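The fundamental inequality

Recall that the reproducing kernel property means $\langle w_{k,x}, f \rangle = f(x)$ for all $f \in W_k$. In particular, taking $f = w_{k,y}$ yields $\langle w_{k,x}, w_{k,y} \rangle = w_{k,y}(x)$. Recall also that $w_{k,y}(x) = P_k^n(\langle x, y \rangle)$.

Now we have
\[ \sum_{x, y \in C} P_k^n(\langle x, y \rangle) = \sum_{x, y \in C} w_{k,y}(x) = \sum_{x, y \in C} \langle w_{k,x}, w_{k,y} \rangle = \Big| \sum_{x \in C} w_{k,x} \Big|^2 \ge 0. \]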

This is a perfect generalization of $\big| \sum_{x \in C} x \big|^2 \ge 0$, except instead of summing the vectors $x$, we are summing vectors $w_{k,x}$ in the Hilbert space $W_k$.

One interpretation is that $x \mapsto w_{k,x}$ maps $S^{n-1}$ into the unit sphere in the higher-dimensional space $W_k$, and we're combining the trivial inequality $\big| \sum_{x \in C} w_{k,x} \big|^2 \ge 0$ with that nontrivial mapping.

When $n = 2$, the space $W_k$ has dimension 2 for $k \ge 1$, so we are mapping $S^1$ to itself. This map wraps $S^1$ around itself $k$ times, while the analogues for $n \ge 3$ are more subtle.
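The $n = 2$ case can be made completely explicit. Writing points of $S^1$ as angles, the wrapping map is $\theta \mapsto e^{ik\theta}$, and $P_k^2$ is the Chebyshev polynomial $T_k$ (with $T_k(\cos\varphi) = \cos k\varphi$). The sketch below checks numerically that $\sum_{x,y} T_k(\langle x, y \rangle) = \big|\sum_x e^{ik\theta_x}\big|^2$ for random angles. The names, number of points, and seed are illustrative assumptions.

```python
import cmath
import math
import random

def chebyshev_T(k, t):
    """P_k^2(t) = T_k(t), via the recurrence T_{j+1} = 2 t T_j - T_{j-1}."""
    if k == 0:
        return 1.0
    prev, cur = 1.0, t
    for _ in range(1, k):
        prev, cur = cur, 2 * t * cur - prev
    return cur

rng = random.Random(1)
angles = [rng.uniform(0.0, 2 * math.pi) for _ in range(9)]

results = []
for k in range(1, 6):
    # For unit vectors at angles a and b, the inner product is cos(a - b).
    kernel_sum = sum(chebyshev_T(k, math.cos(a - b)) for a in angles for b in angles)
    # Wrap the circle around itself k times and sum the images.
    wrapped_norm_sq = abs(sum(cmath.exp(1j * k * a) for a in angles)) ** 2
    results.append((kernel_sum, wrapped_norm_sq))

print(all(abs(s - w) < 1e-9 for s, w in results))
```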

Do ultraspherical polynomials span all the functions $P$ satisfying
\[ \sum_{x, y \in C} P(\langle x, y \rangle) \ge 0 \]
for all $C$? No; see

F. Pfender, Improved Delsarte bounds for spherical codes in small dimensions, J. Combin. Theory Ser. A 114 (2007), 1133-1147.

However, they span all the positive-definite kernels: functions $P$ such that for all $x_1, \dots, x_N \in S^{n-1}$, the $N \times N$ matrix with entries $P(\langle x_i, x_j \rangle)$ is positive semidefinite.

I. J. Schoenberg, Positive definite functions on spheres, Duke Math. J. 9 (1942), 96-108.

Semidefinite programming bounds

Generalizations put semidefinite constraints on higher correlation functions.

A. Schrijver, New code upper bounds from the Terwilliger algebra and semidefinite programming, IEEE Transactions on Information Theory 51 (2005), 2859-2866.

C. Bachoc and F. Vallentin, New upper bounds for kissing numbers from semidefinite programming, Journal of the American Mathematical Society 21 (2008), 909-924.

H. Cohn and J. Woo, Three-point bounds for energy minimization, Journal of the American Mathematical Society 25 (2012), 929-958.

D. de Laat and F. Vallentin, A semidefinite programming hierarchy for packing problems in discrete geometry, arXiv:1311.3789.

For more information

Papers are available from: http://research.microsoft.com/~cohn

Specifically, see Order and disorder in energy minimization: http://arxiv.org/abs/1003.3053