New York University, Fall 2013    Lattices, Convexity & Algorithms    Lecture 2
The Dual Lattice, Integer Linear Systems and Hermite Normal Form
Lecturers: D. Dadush, O. Regev    Scribe: D. Dadush

1 Dual Lattice

In the first lecture, we saw that lattices can be viewed equivalently in two different ways: as discrete additive subgroups of $\mathbb{R}^n$, or as additive subgroups of $\mathbb{R}^n$ with linearly independent generators. In this section, we give a final equivalent viewpoint, which relates the discreteness of a lattice to the existence of an appropriate dual.

Definition 1 (Dual Lattice) For an additive subgroup $G \subseteq \mathbb{R}^n$, we define the dual lattice of $G$ to be
$$G^* = \{x \in \mathrm{span}(G) : \langle x, y \rangle \in \mathbb{Z} \ \forall y \in G\}.$$

The dual lattice is a very useful object for proving geometric inequalities about the original lattice. In particular, we can often use dual vectors to provide an efficiently checkable proof of some property of the original lattice. In the following lemma, we show that dual vectors provide a simple witness for lower bounds on the length of the shortest non-zero vector.

Lemma 2 Let $G \subseteq \mathbb{R}^n$ be an additive subgroup of rank $k \geq 1$. If there exist linearly independent vectors $b_1, \dots, b_k \in G^*$, then
$$\lambda_1(G) \geq \min_{1 \leq i \leq k} \frac{1}{\|b_i\|_2}.$$
In particular, if $\dim(G^*) = \dim(G)$, then $G$ is a lattice.

Proof: Take $x \in G \setminus \{0\}$. Since $x \in \mathrm{span}(b_1, \dots, b_k)$ and $x \neq 0$, there exists $j \in [k]$ such that $\langle x, b_j \rangle \neq 0$. Furthermore, since $\langle x, b_j \rangle \in \mathbb{Z}$, we have $|\langle x, b_j \rangle| \geq 1$. Therefore
$$1 \leq |\langle x, b_j \rangle| \leq \|x\|_2 \|b_j\|_2 \implies \|x\|_2 \geq \frac{1}{\|b_j\|_2} \geq \min_{1 \leq i \leq k} \frac{1}{\|b_i\|_2}.$$
Since this lower bound holds for all $x \in G \setminus \{0\}$, we get that $\lambda_1(G) \geq \min_{1 \leq i \leq k} 1/\|b_i\|_2$, as needed. Since $\lambda_1(G) > 0$, we get that $G$ is a lattice. Lastly, note that $G^*$ contains $k$ linearly independent vectors if and only if $\dim(G^*) = k = \dim(G)$. □

We leave the following as an exercise.

Exercise 1 For any additive subgroup $G \subseteq \mathbb{R}^n$, show that $G^*$ is a lattice.

Definition 3 (Pseudo-Inverse) For a non-singular matrix $B = (b_1, \dots, b_k) \in \mathbb{R}^{n \times k}$, we define its pseudo-inverse to be $B^+ \stackrel{\mathrm{def}}{=} (B^t B)^{-1} B^t \in \mathbb{R}^{k \times n}$.
We define $b_1^*, \dots, b_k^*$ to be the columns of $(B^+)^t$, i.e. $(B^+)^t = (b_1^*, \dots, b_k^*)$, and call this set of vectors the dual basis associated with $b_1, \dots, b_k$.

Remark 4 Note that when $B$ is a square matrix (and hence invertible), the pseudo-inverse coincides with the standard inverse, since $B^+ = (B^t B)^{-1} B^t = B^{-1} B^{-t} B^t = B^{-1}$.

In the next lemma, we establish the basic properties of the pseudo-inverse.
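To make the definitions concrete, here is a small Python sketch (illustrative only, not part of the notes) that computes the pseudo-inverse of a $3 \times 2$ integer matrix using exact rational arithmetic, and checks the identities $B^+ B = I_k$ and $\langle b_i^*, b_j \rangle = \delta_{ij}$ established in the next lemma. The helper names (`transpose`, `matmul`, `inv2`) are my own.

```python
from fractions import Fraction

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inv2(M):
    # inverse of a 2x2 matrix via the adjugate formula
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# B has columns b1 = (1, 0, 1) and b2 = (0, 1, 1); rows are listed below
B = [[Fraction(1), Fraction(0)],
     [Fraction(0), Fraction(1)],
     [Fraction(1), Fraction(1)]]

gram = matmul(transpose(B), B)             # B^t B, here [[2, 1], [1, 2]]
B_plus = matmul(inv2(gram), transpose(B))  # pseudo-inverse (B^t B)^{-1} B^t

# Lemma 5, property 4: B^+ B = I_2
print([[int(v) for v in row] for row in matmul(B_plus, B)])   # [[1, 0], [0, 1]]

# the dual basis vectors b*_i are the columns of (B^+)^t, i.e. the rows of B_plus;
# Lemma 5, property 5: <b*_i, b_j> = delta_ij
Bt = transpose(B)
pairings = [[sum(x * y for x, y in zip(B_plus[i], Bt[j])) for j in range(2)]
            for i in range(2)]
print([[int(v) for v in row] for row in pairings])            # [[1, 0], [0, 1]]
```

Working over `Fraction` rather than floats keeps every pairing exactly integral, so the checks above are exact rather than approximate.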
Lemma 5 Let $B = (b_1, \dots, b_k) \in \mathbb{R}^{n \times k}$ be a non-singular matrix. The following hold:
1. $B^+$ is well-defined.
2. The dual basis vectors $b_1^*, \dots, b_k^*$ are contained in $\mathrm{span}(B)$ and are linearly independent.
3. $\ker(B^+) = \mathrm{span}(B)^\perp$.
4. $B^+ B = I_k$, where $I_k$ is the $k \times k$ identity.
5. For $i, j \in [k]$, $\langle b_i^*, b_j \rangle = \delta_{ij}$, where $\delta_{ij} = 1$ if $i = j$ and $0$ otherwise.

Proof:

Proof of 1. Since $B^+ = (B^t B)^{-1} B^t$, to prove that $B^+$ is well-defined we need only show that $B^t B \in \mathbb{R}^{k \times k}$ is invertible, i.e. that $B^t B$ is non-singular. Take $x \in \mathbb{R}^k$ with $B^t B x = 0$; we must show $x = 0$. Note that $0 = x^t B^t B x = \|Bx\|_2^2$. Hence $\|Bx\|_2^2 = 0 \Rightarrow Bx = 0 \Rightarrow x = 0$, since $B$ is non-singular, as needed.

Proof of 2. By the definition of $b_1^*, \dots, b_k^*$, we have that
$$(b_1^*, \dots, b_k^*) = ((B^t B)^{-1} B^t)^t = B (B^t B)^{-t} = B (B^t B)^{-1},$$
where the last equality follows since $B^t B$ is symmetric (and hence has a symmetric inverse). Therefore $b_i^* = B\left((B^t B)^{-1} e_i\right) \in \mathrm{span}(B)$, as needed. Furthermore, since both $B$ and $(B^t B)^{-1}$ are non-singular, it follows that $(B^+)^t = B (B^t B)^{-1}$ is non-singular. Since $b_1^*, \dots, b_k^*$ are the columns of $(B^+)^t$, they are linearly independent.

Proof of 3. Since $(B^t B)^{-1}$ is non-singular, for $x \in \mathbb{R}^n$ we have that
$$B^+ x = (B^t B)^{-1}(B^t x) = 0 \iff B^t x = 0 \iff x \in \mathrm{span}(B)^\perp.$$
Therefore $x \in \ker(B^+) \iff x \in \mathrm{span}(B)^\perp$, as needed.

Proof of 4. $B^+ B = (B^t B)^{-1}(B^t B) = I_k$, as needed.

Proof of 5. For $i, j \in [k]$, $\langle b_i^*, b_j \rangle = \langle (B^+)^t e_i, B e_j \rangle = e_i^t B^+ B e_j = e_i^t I_k e_j = \delta_{ij}$, as needed. □

Lemma 6 Let $L \subseteq \mathbb{R}^n$ be a rank $k \geq 1$ lattice with basis $B = (b_1, \dots, b_k) \in \mathbb{R}^{n \times k}$. Then $L^*$ is a rank $k$ lattice with basis $b_1^*, \dots, b_k^*$. Furthermore, $(L^*)^* = L$.

Proof: We wish to prove that $L(b_1^*, \dots, b_k^*) = L^*$. We first show that $L(b_1^*, \dots, b_k^*) \subseteq L^*$. Since $b_1^*, \dots, b_k^* \in \mathrm{span}(B) = \mathrm{span}(L)$, we have that $L(b_1^*, \dots, b_k^*) \subseteq \mathrm{span}(L)$. Now take $x \in L$ and $y \in L(b_1^*, \dots, b_k^*)$.
Since $b_1, \dots, b_k$ is a basis for $L$, we may express $x = \sum_{i=1}^k a_i b_i$ for $a_1, \dots, a_k \in \mathbb{Z}$. Similarly, $y = \sum_{i=1}^k c_i b_i^*$ with $c_1, \dots, c_k \in \mathbb{Z}$. Since $\langle b_i^*, b_j \rangle = \delta_{ij}$ for $i, j \in [k]$, we have that
$$\langle x, y \rangle = \Big\langle \sum_{i=1}^k a_i b_i, \sum_{j=1}^k c_j b_j^* \Big\rangle = \sum_{1 \leq i,j \leq k} a_i c_j \langle b_i, b_j^* \rangle = \sum_{i=1}^k a_i c_i \in \mathbb{Z},$$
since $a_i, c_i \in \mathbb{Z}$ for all $i \in [k]$. Therefore $L(b_1^*, \dots, b_k^*) \subseteq L^*$, as needed.

We now prove that $L^* \subseteq L(b_1^*, \dots, b_k^*)$. Take $y \in L^*$, and examine $\hat{y} = \sum_{i=1}^k \langle y, b_i \rangle b_i^*$. Since $\langle y, b_i \rangle \in \mathbb{Z}$ for all $i \in [k]$, we clearly have that $\hat{y} \in L(b_1^*, \dots, b_k^*)$. For $j \in [k]$, note that
$$\langle y - \hat{y}, b_j \rangle = \langle y, b_j \rangle - \sum_{i=1}^k \langle y, b_i \rangle \langle b_i^*, b_j \rangle = \langle y, b_j \rangle - \sum_{i=1}^k \langle y, b_i \rangle \delta_{ij} = \langle y, b_j \rangle - \langle y, b_j \rangle = 0.$$
From the above, we see that $y - \hat{y} \in \mathrm{span}(B)^\perp$. Furthermore, since $y, \hat{y} \in \mathrm{span}(B)$, we get that $y - \hat{y} \in \mathrm{span}(B) \cap \mathrm{span}(B)^\perp \Rightarrow y - \hat{y} = 0$. Therefore $y = \hat{y} \Rightarrow y \in L(b_1^*, \dots, b_k^*)$, as needed.

For the furthermore, we wish to show that $(L^*)^* = L$. Since $b_1^*, \dots, b_k^* \in \mathrm{span}(B)$ and are linearly independent, we clearly have that $\mathrm{span}(L) = \mathrm{span}(B) = \mathrm{span}(L^*)$. Next, since $\langle x, y \rangle \in \mathbb{Z}$ for all $x \in L$, $y \in L^*$, we get that $L \subseteq (L^*)^*$. Now take $z \in (L^*)^*$. Since $z \in \mathrm{span}(B)$, we can write $z = \sum_{i=1}^k a_i b_i$ for $a_1, \dots, a_k \in \mathbb{R}$. Since $z \in (L^*)^*$, we note that $\langle z, b_i^* \rangle = a_i \in \mathbb{Z}$ for all $i \in [k]$. Therefore $z \in L$, as needed. □

Given the above lemmas, we get an alternate characterization of lattices.

Theorem 7 Let $G \subseteq \mathbb{R}^n$ be an additive subgroup. Then the following are equivalent:
1. $G$ is a lattice.
2. $(G^*)^* = G$.
3. $\dim(G^*) = \dim(G)$.

Proof:

$1 \Rightarrow 2$. Follows directly from Lemma 6.

$2 \Rightarrow 3$. Since $G^* \subseteq \mathrm{span}(G)$, we have the trivial inequality $\dim(G^*) \leq \dim(G)$. Since $(G^*)^* = G$, we have that $\dim(G) \geq \dim(G^*) \geq \dim((G^*)^*) = \dim(G)$. Therefore $\dim(G) = \dim(G^*)$, as needed.

$3 \Rightarrow 1$. Follows directly from Lemma 2. □

Using the dual lattice, we can also get an exact description of when the orthogonal projection of a lattice remains a lattice.

Lemma 8 Let $L \subseteq \mathbb{R}^n$ be a lattice. Let $S = \mathrm{span}(L)$ and let $W \subseteq S$ be a linear subspace.
1. $\pi_W(L)^* = L^* \cap W$. Furthermore, this holds if $L$ is any additive subgroup.
2. $\pi_W(L)$ is a lattice $\iff \dim(W \cap L^*) = \dim(W \cap S) \iff \dim(W^\perp \cap L) = \dim(W^\perp \cap S)$.

Proof:

Proof of 1. We first show that $\pi_W(L)^* \subseteq L^* \cap W$. Take $x \in \pi_W(L)^*$. First, we clearly have that $x \in \mathrm{span}(\pi_W(L)) = \pi_W(\mathrm{span}(L)) = \pi_W(S) = W$, using $W \subseteq S$. Next note that for $y \in L$, since $\pi_W(x) = x$, we have that $\langle x, y \rangle = \langle x, \pi_W(y) \rangle \in \mathbb{Z}$, since $\pi_W(y) \in \pi_W(L)$. Therefore $x \in L^* \cap W$, as needed. Now we show that $L^* \cap W \subseteq \pi_W(L)^*$. Take $x \in L^* \cap W$. Note that $x \in \mathrm{span}(L^*) \cap W \subseteq S \cap W = W = \mathrm{span}(\pi_W(L))$. Take any $y \in \pi_W(L)$, and let $\hat{y} \in L$ be any lifting of $y$ satisfying $\pi_W(\hat{y}) = y$. Since $\pi_W(x) = x$, we have that $\langle x, y \rangle = \langle x, \pi_W(\hat{y}) \rangle = \langle x, \hat{y} \rangle \in \mathbb{Z}$, since $\hat{y} \in L$ and $x \in L^*$. Therefore $x \in \pi_W(L)^*$, as needed. Since the proof uses only the properties of additive subgroups, the characterization holds when $L$ is any additive subgroup.
Proof of 2. From Theorem 7 we have that $\pi_W(L)$ is a lattice $\iff \dim(\pi_W(L)^*) = \dim(\pi_W(L))$. By the first part of the lemma, we have that $\pi_W(L)^* = L^* \cap W$, and hence $\dim(\pi_W(L)^*) = \dim(\pi_W(L)) \iff \dim(\pi_W(L)) = \dim(L^* \cap W)$. Next, note that $\mathrm{span}(\pi_W(L)) = \pi_W(\mathrm{span}(L)) = W$. This proves that $\pi_W(L)$ is a lattice $\iff \dim(W) = \dim(W \cap S) = \dim(L^* \cap W)$.

We now show that $\dim(W \cap S) = \dim(L^* \cap W) \iff \dim(W^\perp \cap S) = \dim(L \cap W^\perp)$. Since the statements are symmetric (i.e. since $(W^\perp)^\perp = W$ and $(L^*)^* = L$), it suffices to prove the implication one way. We assume that $\dim(W \cap S) = \dim(L^* \cap W)$. Let $l = \dim(W \cap S)$ and let $k = \dim(S)$. Since $L \cap W^\perp \subseteq S \cap W^\perp$, we clearly have that $\dim(L \cap W^\perp) \leq \dim(S \cap W^\perp)$. It therefore suffices to prove the reverse inequality. By the assumption on $L^* \cap W$, i.e. that $\dim(L^* \cap W) = l$, the lattice $L^* \cap W$ has a basis $b_1^*, \dots, b_l^* \in L^* \cap W$, and by the assumption $\mathrm{span}(L^* \cap W) = W \cap S$. By Theorem 5 of Lecture 1, we can extend $b_1^*, \dots, b_l^*$ to a basis $b_1^*, \dots, b_k^*$ of $L^*$. Let $b_1, \dots, b_k$ denote the associated basis of $L$, i.e. which satisfies $\langle b_i^*, b_j \rangle = \delta_{ij}$ for $i, j \in [k]$. Note that $\langle b_i^*, b_j \rangle = 0$ for $i \in \{1, \dots, l\}$ and $j \in \{l+1, \dots, k\}$. Therefore
$$b_{l+1}, \dots, b_k \in \mathrm{span}(L) \cap \mathrm{span}(L^* \cap W)^\perp = S \cap (S \cap W)^\perp = S \cap W^\perp.$$
Since $b_{l+1}, \dots, b_k$ are linearly independent, we have that
$$\dim(L \cap W^\perp) \geq \dim(\mathrm{span}(b_{l+1}, \dots, b_k)) = k - l = \dim(S) - \dim(S \cap W) = \dim(S \cap W^\perp),$$
as needed. □

Using the previous lemma, we easily derive the following corollary.

Corollary 9 Let $L \subseteq \mathbb{R}^n$ be a lattice.
1. For linearly independent vectors $v_1, \dots, v_k \in L$, let $\pi$ denote the orthogonal projection onto $\mathrm{span}(v_1, \dots, v_k)^\perp$. Then $\pi(L)$ is a lattice.
2. For linearly independent vectors $v_1^*, \dots, v_k^* \in L^*$, let $\pi$ denote the orthogonal projection onto $\mathrm{span}(v_1^*, \dots, v_k^*)$. Then $\pi(L)$ is a lattice.

Proof:

Proof of 1. Let $W = \mathrm{span}(v_1, \dots, v_k)^\perp \cap \mathrm{span}(L)$. Since $v_1, \dots, v_k \in L \subseteq \mathrm{span}(L)$, note that $\pi(L) = \pi_W(L)$. Since $W^\perp = \mathrm{span}(v_1, \dots, v_k) + \mathrm{span}(L)^\perp$ and $v_1, \dots, v_k \in L$, we see that $W^\perp \cap \mathrm{span}(L) = \mathrm{span}(v_1, \dots, v_k) = \mathrm{span}(L \cap W^\perp)$, and hence $\dim(\mathrm{span}(L) \cap W^\perp) = \dim(L \cap W^\perp)$. Therefore, by Lemma 8, we have that $\pi_W(L) = \pi(L)$ is a lattice, as needed.

Proof of 2. Let $W = \mathrm{span}(v_1^*, \dots, v_k^*)$.
Given that $L$ is a lattice, we know that $\mathrm{span}(L) = \mathrm{span}(L^*)$. Since $v_1^*, \dots, v_k^* \in L^*$, we see that $W \subseteq \mathrm{span}(L^*) = \mathrm{span}(L)$, and hence $W \cap \mathrm{span}(L) = W = \mathrm{span}(L^* \cap W)$. Therefore $\dim(W \cap \mathrm{span}(L)) = \dim(W \cap L^*)$. By Lemma 8, we now have that $\pi_W(L) = \pi(L)$ is a lattice, as needed. □

2 Deciding Lattice Membership and Building a Sublattice Basis

In this section, we focus on solving some easy, but useful, computational problems on lattices. The setup is as follows: let $L \subseteq \mathbb{R}^n$ be a rank $k$ lattice with basis $B \in \mathbb{R}^{n \times k}$, and let $y_1, \dots, y_m$ be vectors in $L$ (not necessarily linearly independent). The two main tasks we will address are the following:
1. Lattice Basis Problem. Determine a basis for $L(y_1, \dots, y_m)$.
2. Lattice Membership Problem. Given $x \in \mathbb{R}^n$, decide whether $x \in L(y_1, \dots, y_m)$. If so, find integer coefficients $z_1, \dots, z_m \in \mathbb{Z}$ such that $x = \sum_{i=1}^m z_i y_i$.

Remark 10 For the first question, since $L(y_1, \dots, y_m)$ is an additive subgroup of $L$, it is clearly discrete and hence a lattice. Therefore it makes sense to ask for a basis of $L(y_1, \dots, y_m)$.

Simplifications and Reductions. We now make some direct simplifications. Letting $B^+ \in \mathbb{R}^{k \times n}$ denote the pseudo-inverse of $B$, we note that finding a basis of $L(y_1, \dots, y_m)$ is equivalent to finding a basis for $L(B^+ y_1, \dots, B^+ y_m)$. Note that each $B^+ y_i \in \mathbb{Z}^k$, $i \in [m]$, since $B^+$ gives the coordinates of $y_i$ in terms of $B$. To see the equivalence, note that given a basis $z_1, \dots, z_l \in \mathbb{Z}^k$ for $L(B^+ y_1, \dots, B^+ y_m) \subseteq \mathbb{Z}^k$, we get that $B z_1, \dots, B z_l$ is a basis for $L(y_1, \dots, y_m)$. Therefore, we may assume that all the given vectors are integer vectors.

For the second question, we can make analogous simplifications. First, using linear algebra, one may directly check whether $x \in \mathrm{span}(y_1, \dots, y_m)$. Next, we may reduce to a problem on integer vectors as above, where we now check whether $B^+ x \in L(B^+ y_1, \dots, B^+ y_m) \subseteq \mathbb{Z}^k$. Clearly if $B^+ x \notin \mathbb{Z}^k$, then $x \notin L(y_1, \dots, y_m)$, and hence we may assume that $B^+ x \in \mathbb{Z}^k$ as well.

Thus far, we have reduced both questions 1 and 2 to the case where all the vectors are integral. We now show that the decisional version of question 2 reduces to question 1. Given $y_1, \dots, y_m \in \mathbb{Z}^n$, let $z_1, \dots, z_k \in \mathbb{Z}^n$ denote a basis for $L(y_1, \dots, y_m)$ as produced by any basis-finding algorithm (a solver for question 1). We now reduce checking whether $x \in L(y_1, \dots, y_m) = L(z_1, \dots, z_k)$ to solving the linear system of equations $(z_1, \dots, z_k) a = x$ for $a \in \mathbb{R}^k$. Here there are three cases: either (1) the system has no solution, (2) the system has a non-integer solution, or (3) the system has an integer solution.
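As a hedged illustration of this case analysis, here is a small Python sketch for the special case of a square, non-singular basis matrix, using exact rational arithmetic; the helper `membership` and the $2 \times 2$ restriction are mine, not part of the notes.

```python
from fractions import Fraction

def membership(Z, x):
    """Decide whether x lies in the lattice spanned by the columns of the
    2x2 integer matrix Z; return the integer coefficients (a1, a2) with
    a1*z1 + a2*z2 = x if so, and None otherwise."""
    (a, b), (c, d) = [[Fraction(v) for v in row] for row in Z]
    det = a * d - b * c
    if det == 0:
        raise ValueError("columns of Z must be linearly independent")
    # solve Z a = x exactly by Cramer's rule; for a square non-singular Z,
    # case (1) (no real solution) cannot occur, so integrality of the
    # solution decides between cases (2) and (3)
    s = (d * x[0] - b * x[1]) / det
    t = (a * x[1] - c * x[0]) / det
    if s.denominator == 1 and t.denominator == 1:
        return (int(s), int(t))   # case (3): integer solution, x is a lattice point
    return None                   # case (2): non-integer solution, x is not

Z = [[1, 0],
     [2, 3]]                      # basis vectors z1 = (1, 2), z2 = (0, 3) as columns
print(membership(Z, (2, 7)))     # (2, 1), since 2*z1 + 1*z2 = (2, 7)
print(membership(Z, (1, 1)))     # None
```

In the general rectangular case one would first check $x \in \mathrm{span}(z_1, \dots, z_k)$ and then solve the system by exact Gaussian elimination; the integrality test at the end is the same.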
Since $z_1, \dots, z_k$ is a basis for the lattice, both cases (1) and (2) imply that $x \notin L(y_1, \dots, y_m)$, and case (3) implies that $x \in L(y_1, \dots, y_m)$, as needed. This completes the reduction. We note that in the case that $x \in L(y_1, \dots, y_m)$, the previous reduction does not give specific integer multipliers $a_1, \dots, a_m$ such that $x = \sum_{i=1}^m a_i y_i$. However, as we shall see, the techniques used to generate the basis for $L(y_1, \dots, y_m)$ will yield a method to compute these coefficients.

2.1 Applications

In this section, we give two direct applications of the questions from the previous section.

Solving Integer Linear Systems of Equations. Given $A \in \mathbb{Z}^{n \times m}$, $b \in \mathbb{Z}^n$, decide whether the system
$$Ax = b, \quad x \in \mathbb{Z}^m, \qquad (1)$$
has a solution, and if so, find a satisfying $x \in \mathbb{Z}^m$.

Proposition 11 The Integer Linear System problem is equivalent to the Lattice Membership Problem.

Proof: Let $L(A)$ denote the integer sublattice generated by the columns of $A$. By definition, $Ax = b$, $x \in \mathbb{Z}^m$, has a solution $\iff$ $b$ is an integral combination of the columns of $A$ $\iff$ $b \in L(A)$. Hence the integer linear system problem is equivalent to the lattice membership problem
for integer sublattices. However, from the remarks in the previous section, we know that the lattice membership problem reduces to the case where all the vectors are integral, and hence the problems are completely equivalent. □

Remark 12 We note that when the columns of $A$ are linearly independent, the above problem reduces to linear algebra. This is exactly the setting where the columns of $A$ form a basis of $L(A)$.

Solving Modular Systems of Equations. Given $A \in \mathbb{Z}^{n \times m}$, $b \in \mathbb{Z}^n$, $c \in \mathbb{Z}^n_+$, decide whether the system
$$Ax \equiv \begin{pmatrix} b_1 \pmod{c_1} \\ \vdots \\ b_n \pmod{c_n} \end{pmatrix}, \quad x \in \mathbb{Z}^m, \qquad (2)$$
has a solution, and if so, find a satisfying $x \in \mathbb{Z}^m$.

A natural question is: when do integer linear systems of equations in which the columns of $A$ are not linearly independent occur in practice? As we will see in the following reduction, the problem of solving modular equations reduces to solving an integer linear system where we do not have linear independence.

Proposition 13 Solving a Modular System of Equations reduces to solving an Integer Linear System.

Proof: By definition, we have that $a \equiv b \pmod{c}$, for $a, b \in \mathbb{Z}$, $c \in \mathbb{N}$, if and only if $a + zc = b$ for some $z \in \mathbb{Z}$. By the identical reasoning, we have that
$$Ax \equiv \begin{pmatrix} b_1 \pmod{c_1} \\ \vdots \\ b_n \pmod{c_n} \end{pmatrix}, \quad x \in \mathbb{Z}^m, \text{ has a solution} \iff Ax + \begin{pmatrix} c_1 & 0 & \cdots & 0 \\ 0 & c_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & c_n \end{pmatrix} z = b, \quad x \in \mathbb{Z}^m,\ z \in \mathbb{Z}^n, \text{ has a solution.}$$
Furthermore, given a solution $(x, z)$ to the integer linear system, the vector $x$ gives a solution to the modular system. Hence, the modular system of equations reduces to an integer system of equations. Lastly, note that the columns of the extended coefficient matrix are not linearly independent: the diagonal matrix of moduli already spans $\mathbb{R}^n$, so the columns of $A$ are redundant. □

The following exercise shows that both of the above problems either have solutions or have succinct proofs of infeasibility.

Exercise 2 (Duality for Integer Linear Systems) Take $A \in \mathbb{Z}^{n \times m}$, $b \in \mathbb{Z}^n$, $c \in \mathbb{Z}^n_+$.
1. Prove that the system $Ax = b$, $x \in \mathbb{Z}^m$, has a solution if and only if there does not exist $y \in \mathbb{R}^n$ such that $y^t A \in \mathbb{Z}^m$ and $y^t b \notin \mathbb{Z}$.
(Hint: split up the analysis based on whether $Ax = b$ has a real solution or not. If it has a real solution, examine the appropriate dual lattice.)
2. Prove that the system
$$Ax \equiv \begin{pmatrix} b_1 \pmod{c_1} \\ \vdots \\ b_n \pmod{c_n} \end{pmatrix}, \quad x \in \mathbb{Z}^m,$$
has a solution if and only if there does not exist $y \in \mathbb{R}^n$ with $y_i \in \{0, \frac{1}{c_i}, \dots, \frac{c_i - 1}{c_i}\}$ for all $i \in [n]$, such that $y^t A \in \mathbb{Z}^m$ and $y^t b \notin \mathbb{Z}$.

2.2 Hermite Normal Form

In this section, we describe a method for solving the lattice basis problem for finitely generated integer sublattices. As mentioned previously, this suffices to solve the general lattice basis problem, as well as the decisional version of the lattice membership problem.

Definition 14 (Hermite Normal Form) Let $A \in \mathbb{Z}^{n \times m}$, and let $k = \mathrm{rank}(A)$. For each non-zero column $i$, let $r(i)$ denote the index of the first non-zero entry in the $i$th column of $A$. The matrix $A$ is in Hermite Normal Form (HNF) if it satisfies the following:
1. The first $k$ columns of $A$ are non-zero, and the remaining columns are zero.
2. $r(1) < r(2) < \cdots < r(k)$.
3. $A_{r(i),i} > 0$ for all $i \in [k]$.
4. $0 \leq A_{r(i),j} < A_{r(i),i}$ for all $1 \leq j < i \leq k$.

The main goal of this section is to prove the following theorem:

Theorem 15 For any $A \in \mathbb{Z}^{n \times m}$, there exists a unimodular transformation $U \in \mathbb{Z}^{m \times m}$ such that $AU$ is in HNF. Furthermore, if $AU_1$ and $AU_2$ are in HNF, with $U_1, U_2 \in \mathbb{Z}^{m \times m}$ unimodular, then $AU_1 = AU_2$.

The above theorem is constructive: we will give an algorithm which computes both the HNF and the corresponding unimodular transformation for any integer matrix $A \in \mathbb{Z}^{n \times m}$. Using the above procedure for computing the HNF, the following algorithm computes a basis for $L(A)$:

1. Compute the HNF $\bar{A} = AU$ of $A$, with $U \in \mathbb{Z}^{m \times m}$ unimodular.
2. Return the first $\mathrm{rank}(\bar{A})$ columns of $\bar{A}$.

We now justify the correctness of the above procedure. Let $k = \mathrm{rank}(\bar{A})$ and let $\bar{A}_{,[k]}$ denote the first $k$ columns of $\bar{A}$. Since the remaining columns of $\bar{A}$ are zero, we clearly have that $L(\bar{A}_{,[k]}) = L(\bar{A})$. Next, since $\bar{A} = AU$ for a unimodular $U \in \mathbb{Z}^{m \times m}$, we have that $L(\bar{A}) = L(AU) = L(A)$. Furthermore, note that $k = \mathrm{rank}(\bar{A}) = \mathrm{rank}(A)$, since $U$ is non-singular. Given this, the columns of $\bar{A}_{,[k]}$ are linearly independent and $L(\bar{A}_{,[k]}) = L(A)$.
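As an illustrative aside, the two-step procedure just described can be sketched in Python. The following `hnf_basis` (my own minimal, unoptimized helper, not code from the notes) carries out the column operations of the HNF algorithm given later in the proof of Theorem 15 and returns the non-zero columns, with no attention paid to intermediate coefficient growth.

```python
def hnf_basis(A):
    """Put an integer matrix A (list of rows) into column-style HNF and
    return its non-zero columns, which form a basis of the lattice L(A)."""
    A = [row[:] for row in A]          # work on a copy
    n, m = len(A), len(A[0])
    c = 0                              # number of pivot columns found so far
    for r in range(n):
        if all(A[r][j] == 0 for j in range(c, m)):
            continue                   # no new pivot in this row
        # make the entries of row r non-negative from column c onward
        for j in range(c, m):
            if A[r][j] < 0:
                for i in range(n):
                    A[i][j] = -A[i][j]
        # Euclidean steps: leave a single non-zero entry in column c of row r
        while True:
            j = min((j for j in range(c, m) if A[r][j] != 0),
                    key=lambda j: A[r][j])
            if j != c:                 # move smallest non-zero entry to column c
                for i in range(n):
                    A[i][c], A[i][j] = A[i][j], A[i][c]
            done = True
            for j in range(c + 1, m):  # reduce remaining entries modulo the pivot
                q = A[r][j] // A[r][c]
                for i in range(n):
                    A[i][j] -= q * A[i][c]
                if A[r][j] != 0:
                    done = False
            if done:
                break
        # cleanup: reduce earlier columns modulo the new pivot (HNF condition 4)
        for j in range(c):
            q = A[r][j] // A[r][c]
            for i in range(n):
                A[i][j] -= q * A[i][c]
        c += 1
    return [list(col) for col in zip(*A)][:c]

# generators (2, 0), (0, 2), (1, 1) of a sublattice of Z^2, given as columns
print(hnf_basis([[2, 0, 1],
                 [0, 2, 1]]))          # [[1, 1], [0, 2]]
```

Here the three dependent generators collapse to the basis $(1,1), (0,2)$ of the index-2 sublattice of $\mathbb{Z}^2$ consisting of points with even coordinate sum.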
Hence $\bar{A}_{,[k]}$ forms a basis of $L(A)$, as desired.

To construct the HNF, we will use the following elementary integer column operations:
1. $A_{,i} \leftarrow A_{,i} + z A_{,j}$, $z \in \mathbb{Z}$, $i \neq j$. (add an integer multiple of column $j$ to column $i$)
2. $A_{,i} \leftrightarrow A_{,j}$, $i \neq j$. (swap columns $i$ and $j$)
3. $A_{,i} \leftarrow -A_{,i}$. (negate column $i$)

We leave it as an exercise to show that each of the above operations corresponds to multiplying the matrix $A$ on the right by a unimodular transformation. Furthermore, any sequence of such operations also corresponds to right multiplication by a unimodular matrix, since unimodular transformations form a group under multiplication.

Proof: [of Theorem 15] Given a matrix $A \in \mathbb{Z}^{n \times m}$, we shall use the following algorithm to put it in HNF:

Require: $A \in \mathbb{Z}^{n \times m}$
Ensure: $A$ in HNF
  $c \leftarrow 0$  { lower bound on rank(A) }
  for $r = 1$ to $n$ do
    if $A_{r,[c+1,m]} \neq 0$ then
      $c \leftarrow c + 1$  { increase rank lower bound by 1 }
      for $i = c$ to $m$ do
        $A_{,i} \leftarrow \mathrm{sign}(A_{r,i}) \cdot A_{,i}$  { make partial row non-negative }
      repeat
        $i \leftarrow \arg\min \{ A_{r,j} : A_{r,j} \neq 0,\ c \leq j \leq m \}$  { find index of smallest non-zero entry }
        $A_{,c} \leftrightarrow A_{,i}$  { swap column c with smallest non-zero entry column }
        for $j = c + 1$ to $m$ do
          $A_{,j} \leftarrow A_{,j} - \lfloor A_{r,j} / A_{r,c} \rfloor A_{,c}$  { main Euclidean algorithm step }
      until $A_{r,[c+1,m]} = 0$
      for $j = 1$ to $c - 1$ do
        $A_{,j} \leftarrow A_{,j} - \lfloor A_{r,j} / A_{r,c} \rfloor A_{,c}$  { row cleanup }

We shall prove that the above algorithm terminates in a bounded number of iterations and that it puts $A$ in HNF. We note that the algorithm presented above is not known to terminate in polynomial time. The main issue is that, as written, the intermediate numbers in the working matrix $A$ could become very large (i.e. have a super-polynomial number of bits). There are many ways to overcome this problem, though we shall not cover them here. For a thorough reference on the Hermite Normal Form, one may consult [Sch86].

References

[Sch86] A. Schrijver. Theory of Linear and Integer Programming. Wiley-Interscience, New York, NY, 1986.