RESTRICTED INVERTIBILITY REVISITED
ASSAF NAOR AND PIERRE YOUSSEF

Dedicated to Jirka Matoušek

Abstract. Suppose that $m,n\in\mathbb{N}$ and that $A:\mathbb{R}^m\to\mathbb{R}^n$ is a linear operator. It is shown here that if $k,r\in\mathbb{N}$ satisfy $k<r\le\mathrm{rank}(A)$ then there exists a subset $\sigma\subseteq\{1,\dots,m\}$ with $|\sigma|=k$ such that the restriction of $A$ to $\mathbb{R}^\sigma\subseteq\mathbb{R}^m$ is invertible, and moreover the operator norm of the inverse $A^{-1}:A(\mathbb{R}^\sigma)\to\mathbb{R}^\sigma$ is at most a constant multiple of the quantity $\sqrt{mr/\big((r-k)\sum_{i=r}^m s_i(A)^2\big)}$, where $s_1(A)\ge\dots\ge s_m(A)$ are the singular values of $A$. This improves over a series of works, starting from the seminal Bourgain–Tzafriri Restricted Invertibility Principle, through the works of Vershynin, Spielman–Srivastava and Marcus–Spielman–Srivastava. In particular, this directly implies an improved restricted invertibility principle in terms of Schatten–von Neumann norms.

1. Introduction

Given $m,n\in\mathbb{N}$, the rank of a linear operator $A:\mathbb{R}^m\to\mathbb{R}^n$ equals the largest possible dimension of a linear subspace $V\subseteq\mathbb{R}^m$ on which $A$ is injective, i.e., such that the inverse $A^{-1}:A(V)\to V$ exists. The restricted invertibility problem asks for conditions on $A$ that ensure a strengthening of this basic fact from linear algebra in two ways, corresponding to additional structural information on the subspace $V\subseteq\mathbb{R}^m$ on which $A$ is injective, as well as quantitative information on the behavior of the inverse $A^{-1}:A(V)\to V$. Firstly, the goal is to find a large-dimensional coordinate subspace on which $A$ is invertible, i.e., we wish to find a large subset $\sigma\subseteq\{1,\dots,m\}$ such that $A$ is injective on $\mathbb{R}^\sigma\subseteq\mathbb{R}^m$. Secondly, rather than being satisfied with mere invertibility, we ask for $A$ to be quantitatively invertible on $\mathbb{R}^\sigma$, in the sense that the operator norm of the inverse $A^{-1}:A(\mathbb{R}^\sigma)\to\mathbb{R}^\sigma$ is not too large. Obviously, additional assumptions on $A$ are required for such conclusions to hold true.
The following theorem, known as the Bourgain–Tzafriri Restricted Invertibility Principle [BT87, BT89, BT91], is a seminal result that addressed the above question and had major influence on subsequent research, with a variety of interesting applications to several areas. Throughout what follows, for $m\in\mathbb{N}$ the standard coordinate basis of $\mathbb{R}^m$ will be denoted by $e_1,\dots,e_m\in\mathbb{R}^m$.

Theorem 1 (Bourgain–Tzafriri). There exist two universal constants $c,C\in(0,\infty)$ with the following property. Suppose that $m\in\mathbb{N}$ and that $A:\mathbb{R}^m\to\mathbb{R}^m$ is a linear operator such that the Euclidean norm of the vector $Ae_j\in\mathbb{R}^m$ equals $1$ for every $j\in\{1,\dots,m\}$. Letting $\|A\|$ denote the operator norm of $A$, there exists a subset $\sigma\subseteq\{1,\dots,m\}$ with $|\sigma|\ge cm/\|A\|^2$ such that $A$ is injective on $\mathbb{R}^\sigma$ and the operator norm of the inverse $A^{-1}:A(\mathbb{R}^\sigma)\to\mathbb{R}^\sigma$ is at most $C$.

In what follows, for $p\in[1,\infty]$ and $m\in\mathbb{N}$ the $\ell_p$ norm of a vector $x\in\mathbb{R}^m$ will be denoted as usual by $\|x\|_p$. Thus $\|x\|_2$ is the Euclidean norm of $x$. We shall also denote (as usual) by $\ell_p^m$ the normed space $\mathbb{R}^m$ equipped with the $\ell_p$ norm. The standard scalar product on $\mathbb{R}^m$ will be denoted $\langle\cdot,\cdot\rangle$. For $k,m,n\in\mathbb{N}$ and a $k$-dimensional subspace $V\subseteq\mathbb{R}^m$, the Schatten–von Neumann $p$ norm of a linear operator $A:V\to\mathbb{R}^n$ will be denoted below by $\|A\|_{S_p}$. Thus
$$\|A\|_{S_p}=\Big(\mathrm{Tr}\big((A^*A)^{p/2}\big)\Big)^{1/p}=\Big(\sum_{j=1}^k s_j(A)^p\Big)^{1/p},$$

(A. N. was supported by a BSF grant, the Packard Foundation and the Simons Foundation.)
where $s_1(A)\ge s_2(A)\ge\dots\ge s_k(A)$ denote the singular values of $A$, i.e., they are the decreasing rearrangement of the square roots of the eigenvalues of the positive semidefinite operator $A^*A:V\to V$. Thus $\|A\|_{S_\infty}=s_1(A)$ is the operator norm of $A$. Also, $\|A\|_{S_2}$ is the Hilbert–Schmidt norm of $A$, i.e., for every orthonormal basis $u_1,\dots,u_k$ of $V$ we have $\|A\|_{S_2}^2=\sum_{i=1}^k\sum_{j=1}^n\langle Au_i,e_j\rangle^2=\sum_{i=1}^k\|Au_i\|_2^2$. Below it will sometimes be convenient to denote the smallest singular value of $A$ by $s_{\min}(A)=s_k(A)$. Thus $A$ is injective if and only if $s_{\min}(A)>0$, in which case $\|A^{-1}\|_{S_\infty}=1/s_{\min}(A)$.

Given $m\in\mathbb{N}$ and $\sigma\subseteq\{1,\dots,m\}$ it will be convenient to denote the formal identity from $\mathbb{R}^\sigma$ to $\mathbb{R}^m$ by $J_\sigma:\mathbb{R}^\sigma\to\mathbb{R}^m$, i.e., $J_\sigma\big((a_j)_{j\in\sigma}\big)=\sum_{j\in\sigma}a_je_j$ for every $(a_j)_{j\in\sigma}\in\mathbb{R}^\sigma$. With this notation, given an operator $A:\mathbb{R}^m\to\mathbb{R}^n$ that is injective on $\mathbb{R}^\sigma$ we can consider the operator $(AJ_\sigma)^{-1}:A(\mathbb{R}^\sigma)\to\mathbb{R}^\sigma$. We shall sometimes drop the need to mention explicitly that $A$ is injective on $\mathbb{R}^\sigma$ by adhering to the convention that if $A$ is not injective on $\mathbb{R}^\sigma$ then $\|(AJ_\sigma)^{-1}\|_{S_\infty}=\infty$. Using the above notation, Theorem 1 asserts that if $A:\mathbb{R}^m\to\mathbb{R}^m$ is a linear operator that satisfies $\|Ae_j\|_2=1$ for all $j\in\{1,\dots,m\}$ then there exists $\sigma\subseteq\{1,\dots,m\}$ with $|\sigma|\gtrsim m/\|A\|_{S_\infty}^2$ such that $\|(AJ_\sigma)^{-1}\|_{S_\infty}\lesssim1$, or equivalently $s_{\min}(AJ_\sigma)\gtrsim1$.

Here, and in what follows, we use the following standard asymptotic notation. Given two quantities $K,L\in\mathbb{R}$, the notation $K\lesssim L$ (respectively $K\gtrsim L$) means that there exists a universal constant $c\in(0,\infty)$ such that $K\le cL$ (respectively $K\ge cL$). The notation $K\asymp L$ means that both $K\lesssim L$ and $K\gtrsim L$ hold true.

The following theorem is a useful strengthening of the Bourgain–Tzafriri Restricted Invertibility Principle that was discovered by Vershynin in [Ver01].

Theorem 2 (Vershynin). There exists a universal constant $c\in(0,\infty)$ with the following property. Fix $k,m,n\in\mathbb{N}$. Let $A:\mathbb{R}^m\to\mathbb{R}^n$ be a linear operator with $\|Ae_j\|_2=1$ for all $j\in\{1,\dots,m\}$. Also, let $\Delta:\mathbb{R}^n\to\mathbb{R}^n$ be a positive definite diagonal operator, i.e., there exist $d_1,\dots,d_n\in(0,\infty)$ such that $\Delta x=(d_1x_1,\dots,d_nx_n)$ for every $x=(x_1,\dots,x_n)\in\mathbb{R}^n$.
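As a quick numerical sanity check of the Schatten–von Neumann norms just defined (a minimal sketch in Python/NumPy, not part of the paper's arguments; the function name `schatten_norm` is ours), these norms can be computed directly from the singular values:

```python
import numpy as np

def schatten_norm(A, p):
    """Schatten-von Neumann p-norm of A: the l_p norm of its singular values.
    p = inf gives the operator norm s_1(A); p = 2 gives the Hilbert-Schmidt norm."""
    s = np.linalg.svd(A, compute_uv=False)
    if np.isinf(p):
        return float(s[0])
    return float((s ** p).sum() ** (1.0 / p))

# Diagonal example with singular values 4 and 3.
A = np.diag([4.0, 3.0])
print(schatten_norm(A, np.inf))  # operator norm, here 4
print(schatten_norm(A, 2))       # Hilbert-Schmidt norm, here 5
```

For $p=2$ this agrees with the Frobenius norm, matching the identity $\|A\|_{S_2}^2=\sum_i\|Ae_i\|_2^2$ recorded above.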
Suppose that $k<\|\Delta A\|_{S_2}^2/\|\Delta A\|_{S_\infty}^2$ and write $k=(1-\varepsilon)\|\Delta A\|_{S_2}^2/\|\Delta A\|_{S_\infty}^2$, where $\varepsilon\in(0,1)$ (thus $\varepsilon=1-k\|\Delta A\|_{S_\infty}^2/\|\Delta A\|_{S_2}^2$). Then there exists a subset $\sigma\subseteq\{1,\dots,m\}$ with $|\sigma|=k$ such that $\|(AJ_\sigma)^{-1}\|_{S_\infty}\le\varepsilon^{-c\log(1/\varepsilon)}$.

For a linear operator $T:\mathbb{R}^m\to\mathbb{R}^n$, the quantity $\|T\|_{S_2}^2/\|T\|_{S_\infty}^2$ is often called the stable rank of $T$, though this terminology sometimes also refers to the quantity $\|T\|_{S_1}/\|T\|_{S_\infty}$. In both cases, the use of the term "stable" in this context expresses the fact that the quantity in question is a robust replacement for the rank of $T$: the rank of $T$ could be large merely because $T$ has many positive but nevertheless very small singular values, while if the stable rank of $T$ is large then its singular values are large on average. Below we shall use the terminology "stable rank" exclusively for the former quantity, which we denote by $\mathrm{srank}(T)=\|T\|_{S_2}^2/\|T\|_{S_\infty}^2$.

Theorem 1 coincides with the special case $\varepsilon\asymp1$ and $\Delta=I_n$ of Theorem 2, where $I_n$ is the identity operator on $\mathbb{R}^n$. However, Theorem 2 improves over Theorem 1 in three ways that are important for geometric applications. Firstly, Theorem 2 treats rectangular matrices while Theorem 1 treats only the case $m=n$. Secondly, even in the special case $\Delta=I_n$ of Theorem 2 the size of the subset $\sigma\subseteq\{1,\dots,m\}$ is allowed to be arbitrarily close to $\mathrm{srank}(A)$, while in Theorem 1 it can only be taken to be a constant multiple of $\mathrm{srank}(A)$. Lastly, Theorem 2 actually allows the size of the subset $\sigma\subseteq\{1,\dots,m\}$ to be arbitrarily close to the supremum of $\mathrm{srank}(\Delta A)$ over all positive definite diagonal operators $\Delta:\mathbb{R}^n\to\mathbb{R}^n$, a quantity that could be much larger than $\mathrm{srank}(A)$.

Remark 3. Theorem 2 is often stated in the literature as a subset selection principle for John decompositions of the identity. Namely, suppose that $k,m,n\in\mathbb{N}$ and $x_1,\dots,x_m\in\mathbb{R}^n\setminus\{0\}$ satisfy $\sum_{j=1}^m\langle x_j,y\rangle^2=\|y\|_2^2$ for all $y\in\mathbb{R}^n$. Equivalently, we have $\sum_{j=1}^m x_j\otimes x_j=I_n$, where for $x,y\in\mathbb{R}^n$ the rank-one operator $x\otimes y:\mathbb{R}^n\to\mathbb{R}^n$ is defined as usual by setting $(x\otimes y)(z)=\langle x,z\rangle y$ for every $z\in\mathbb{R}^n$.
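The robustness of the stable rank described above is easy to see numerically. The following minimal Python/NumPy sketch (our own illustration, not from the paper) contrasts $\mathrm{rank}$ with $\mathrm{srank}$ on a matrix that has one tiny singular value:

```python
import numpy as np

def srank(T):
    """Stable rank srank(T) = ||T||_{S_2}^2 / ||T||_{S_inf}^2."""
    s = np.linalg.svd(T, compute_uv=False)
    return float((s ** 2).sum() / s[0] ** 2)

# The rank jumps to 3 because of the tiny third singular value,
# but the stable rank stays essentially 2: it is robust to small perturbations.
T = np.diag([1.0, 1.0, 1e-8])
print(np.linalg.matrix_rank(T), srank(T))
```

Note that $\mathrm{srank}(T)\le\mathrm{rank}(T)$ always, with equality for (nonzero multiples of) orthogonal projections.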
Suppose that $T:\mathbb{R}^n\to\mathbb{R}^n$ is a linear operator satisfying $Tx_1,\dots,Tx_m\ne0$, and that
$k=(1-\varepsilon)\,\mathrm{srank}(T)$ for some $\varepsilon\in(0,1)$. Then there exists $\sigma\subseteq\{1,\dots,m\}$ with $|\sigma|=k$ such that for every $\{a_j\}_{j\in\sigma}\subseteq\mathbb{R}$,
$$\Big\|\sum_{j\in\sigma}a_j\frac{Tx_j}{\|Tx_j\|_2}\Big\|_2\ge\varepsilon^{c\log(1/\varepsilon)}\sqrt{\sum_{j\in\sigma}a_j^2}.$$
The above formulation is equivalent to Theorem 2 as stated in terms of rectangular matrices by considering the operator $A:\mathbb{R}^m\to\mathbb{R}^n$ that is given by $Ae_j=Tx_j/\|Tx_j\|_2$ for every $j\in\{1,\dots,m\}$.

A recent breakthrough of Spielman–Srivastava [SS12], which relies nontrivially on a remarkable method for sparsifying quadratic forms that was developed by Batson–Spielman–Srivastava [BSS12] (see also the survey [Nao12]), yielded the following improved restricted invertibility principle, via techniques that are entirely different from those used by Bourgain–Tzafriri and Vershynin.

Theorem 4 (Spielman–Srivastava). Suppose that $k,m,n\in\mathbb{N}$ and let $A:\mathbb{R}^m\to\mathbb{R}^n$ be a linear operator such that $k<\mathrm{srank}(A)$. Write $k=(1-\varepsilon)\,\mathrm{srank}(A)$ where $\varepsilon\in(0,1)$. Then there exists a subset $\sigma\subseteq\{1,\dots,m\}$ with $|\sigma|=k$ such that
$$\|(AJ_\sigma)^{-1}\|_{S_\infty}\lesssim\frac{\sqrt m}{\varepsilon\|A\|_{S_2}}.$$

In the setting of Theorem 1, since $\|A\|_{S_2}^2=m$ when the columns of $A$ have unit Euclidean norm, Theorem 1 is a special case of Theorem 4. As in the case $\Delta=I_n$ of Theorem 2, the statement of Theorem 4 has the additional feature that the subset $\sigma\subseteq\{1,\dots,m\}$ can have size arbitrarily close to $\mathrm{srank}(A)$. Moreover, in Theorem 4 the columns of $A$ need not have unit Euclidean norm, and the upper bound on $\|(AJ_\sigma)^{-1}\|_{S_\infty}$ in terms of $\varepsilon$ is much better in Theorem 4 than the corresponding bound in the case $\Delta=I_n$ of Theorem 2; in fact this bound is asymptotically sharp [BHKW88] as $\varepsilon\to0$. An additional feature of Theorem 4 is that its proof in [SS12] yields a deterministic polynomial time algorithm for finding the subset $\sigma$, while prior to [SS12] only a randomized polynomial time algorithm was available [Tro09]. Theorem 2 does have a feature that Theorem 4 does not: the size of the subset $\sigma\subseteq\{1,\dots,m\}$ can be taken to be arbitrarily close to the supremum of $\mathrm{srank}(\Delta A)$ over all positive definite diagonal operators $\Delta:\mathbb{R}^n\to\mathbb{R}^n$, albeit with a worse dependence on $\varepsilon$.
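To get a feel for the quantities in Theorem 4 one can brute-force over all $k$-subsets of columns of a small matrix (this is emphatically not the deterministic algorithm of [SS12]; it is a hypothetical exhaustive-search illustration of ours, feasible only for tiny $m$):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
m = n = 7
A = rng.standard_normal((n, m))

s = np.linalg.svd(A, compute_uv=False)
stable_rank = (s ** 2).sum() / s[0] ** 2
eps = 0.5
k = max(1, int((1 - eps) * stable_rank))   # k = (1 - eps) * srank(A), rounded down
hs = np.sqrt((s ** 2).sum())               # Hilbert-Schmidt norm ||A||_{S_2}

# Exhaustive search over all k-subsets of columns for the one on which A is
# best invertible, i.e. with the largest smallest singular value of A J_sigma.
best = max(
    np.linalg.svd(A[:, list(sig)], compute_uv=False)[-1]
    for sig in itertools.combinations(range(m), k)
)
# Theorem 4 guarantees some sigma with s_min(A J_sigma) >~ eps * ||A||_{S_2} / sqrt(m).
print(k, best, eps * hs / np.sqrt(m))
```

The printed pair compares the best achievable $s_{\min}(AJ_\sigma)$ against the scale $\varepsilon\|A\|_{S_2}/\sqrt m$ appearing in the theorem (up to the unspecified universal constant).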
However, in [You14] it was shown how to combine the features of Theorem 2 and Theorem 4 so as to yield this stronger guarantee with the better dependence on $\varepsilon$ that is asserted in Theorem 4. This improvement is important for certain geometric applications [You14]. The new results that are presented below have this stronger weighted feature, but for the sake of simplicity of the initial discussion in the Introduction we shall first present all the ensuing statements in their unweighted form, corresponding to the way Theorem 4 is stated above.

A different proof of Theorem 4 in the special case $AA^*=I_n$ was found by Marcus, Spielman and Srivastava in [MSS14], using their powerful method of interlacing polynomials [MSS15a, MSS15b]. In fact, their forthcoming work [MSS16] obtains Theorem 5 below, which yields for the first time a restricted invertibility principle for subsets that can be asymptotically larger than the stable rank, with their size depending on the ratio of the Hilbert–Schmidt norm and the Schatten–von Neumann 4 norm. This result was announced by Srivastava in his talk at the conference Banach Spaces: Geometry and Analysis (Hebrew University, May 2013), and it is actually a precursor to the outstanding subsequent work [MSS15b]. Its proof will appear for the first time in the forthcoming preprint [MSS16], but we confirmed with the authors that they obtain Theorem 5 as stated below.

Theorem 5 (Marcus–Spielman–Srivastava). Suppose that $k,m,n\in\mathbb{N}$ and let $A:\mathbb{R}^m\to\mathbb{R}^n$ be a linear operator such that $k<\tfrac14\big(\|A\|_{S_2}/\|A\|_{S_4}\big)^4$. Define $\varepsilon\in(3/4,1)$ by $k=(1-\varepsilon)\|A\|_{S_2}^4/\|A\|_{S_4}^4$. Then there exists a subset $\sigma\subseteq\{1,\dots,m\}$ with $|\sigma|=k$ such that
$$\|(AJ_\sigma)^{-1}\|_{S_\infty}\lesssim\frac{\sqrt m}{\sqrt{\varepsilon-3/4}\,\|A\|_{S_2}}.\qquad(1)$$
Theorem 5 can be much better than the previously known restricted invertibility principles at detecting large well-invertible sub-matrices. To state a concrete example, suppose that the singular values of $A$ are $s_1(A)=\sqrt[4]{m}$ and $s_2(A)=s_3(A)=\dots=s_m(A)=1$. Then Theorem 4 yields a subset $\sigma\subseteq\{1,\dots,m\}$ of size of order $\sqrt m$ for which the operator norm of the inverse of $AJ_\sigma$ is $O(1)$, while Theorem 5 yields such a subset whose size is at least a constant multiple of $m$.

The restriction $k<\tfrac14\big(\|A\|_{S_2}/\|A\|_{S_4}\big)^4$ in Theorem 5 ensures that $\varepsilon>3/4$, so that the quantity appearing under the square root in (1) is positive. Thus, in the statement of Theorem 5, $k$ cannot be arbitrarily close to the modified stable rank $\|A\|_{S_2}^4/\|A\|_{S_4}^4$, but this will be remedied below. It is important to note that the quantity $\|A\|_{S_2}^4/\|A\|_{S_4}^4$ is always at least $\mathrm{srank}(A)$. More generally, given $p\in(2,\infty]$, if we define the $p$-stable rank of $A$ to be the quantity
$$\mathrm{srank}_p(A)=\Big(\frac{\|A\|_{S_2}}{\|A\|_{S_p}}\Big)^{\frac{2p}{p-2}},\qquad(2)$$
then in particular $\mathrm{srank}_4(A)=\|A\|_{S_2}^4/\|A\|_{S_4}^4$ and $\mathrm{srank}_\infty(A)=\mathrm{srank}(A)$. We claim that
$$p\ge q>2\implies\mathrm{srank}_p(A)\le\mathrm{srank}_q(A).\qquad(3)$$
Indeed, by a direct application of Hölder's inequality we have
$$\|A\|_{S_q}\le\|A\|_{S_2}^{\frac{2(p-q)}{q(p-2)}}\|A\|_{S_p}^{\frac{p(q-2)}{q(p-2)}},$$
which simplifies to give (3). The limit as $p\to2^+$ of $\mathrm{srank}_p(A)$ can be computed explicitly, yielding the quantity below, denoted $\mathrm{Entrank}(A)$, which we naturally call the entropic stable rank of $A$:
$$\mathrm{Entrank}(A)=\lim_{p\to2^+}\mathrm{srank}_p(A)=\exp\Big(-\sum_{j=1}^m\frac{s_j(A)^2}{\|A\|_{S_2}^2}\log\frac{s_j(A)^2}{\|A\|_{S_2}^2}\Big)=\exp\Big(\frac{\mathrm{Tr}(A^*A)\log\mathrm{Tr}(A^*A)-\mathrm{Tr}\big(A^*A\log(A^*A)\big)}{\mathrm{Tr}(A^*A)}\Big).$$
As we shall explain in the next section, here we obtain an improved restricted invertibility theorem that in particular yields a strengthening of Theorem 5 that allows one to make use of the $p$-stable rank of $A$ for every $p>2$, thus producing well-invertible sub-matrices of $A$ of size that can be any integer that is less than the entropic stable rank of $A$.

1.1. Restricted invertibility in terms of rank. Our main new result is the following theorem.

Theorem 6. Suppose that $k,m,n\in\mathbb{N}$.
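The monotonicity claim (3) and the limit defining the entropic stable rank are easy to verify numerically. The following Python/NumPy sketch (our own check; the function names are hypothetical) computes $\mathrm{srank}_p$ from (2) and the entropy formula for $\mathrm{Entrank}$:

```python
import numpy as np

def srank_p(A, p):
    """p-stable rank (||A||_{S_2} / ||A||_{S_p})^(2p/(p-2)) for p > 2, as in (2)."""
    s = np.linalg.svd(A, compute_uv=False)
    s2 = np.sqrt((s ** 2).sum())
    sp = (s ** p).sum() ** (1.0 / p)
    return float((s2 / sp) ** (2 * p / (p - 2)))

def entrank(A):
    """Entropic stable rank: exp of the Shannon entropy of q_j = s_j(A)^2 / ||A||_{S_2}^2."""
    s = np.linalg.svd(A, compute_uv=False)
    q = s ** 2 / (s ** 2).sum()
    q = q[q > 0]
    return float(np.exp(-(q * np.log(q)).sum()))

A = np.diag([3.0, 2.0, 1.0, 0.5])
# Monotonicity (3): srank_p decreases as p grows; srank_p -> Entrank as p -> 2+.
print([round(srank_p(A, p), 4) for p in (2.1, 3, 4, 10)], round(entrank(A), 4))
```

Taking $p$ slightly above $2$ makes $\mathrm{srank}_p(A)$ approach $\mathrm{Entrank}(A)$, in line with the displayed limit.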
Let $A:\mathbb{R}^m\to\mathbb{R}^n$ be a linear operator with $\mathrm{rank}(A)>k$. Then there exists a subset $\sigma\subseteq\{1,\dots,m\}$ with $|\sigma|=k$ such that
$$\|(AJ_\sigma)^{-1}\|_{S_\infty}\lesssim\min_{r\in\{k+1,\dots,\mathrm{rank}(A)\}}\sqrt{\frac{mr}{(r-k)\sum_{i=r}^m s_i(A)^2}}.\qquad(4)$$

Example 7. To illustrate the relation between Theorem 4, Theorem 5 and Theorem 6, consider a linear operator $A:\mathbb{R}^m\to\mathbb{R}^n$ with $s_j(A)=1/\sqrt j$ for every $j\in\{1,\dots,m\}$. Thus $\mathrm{rank}(A)=m$, $\mathrm{srank}(A)\asymp\log m$ and $\mathrm{srank}_4(A)\asymp(\log m)^2$. Since $\sqrt m/\|A\|_{S_2}\asymp\sqrt{m/\log m}$, Theorem 4 yields $\sigma\subseteq\{1,\dots,m\}$ with $|\sigma|\gtrsim\log m$ and $\|(AJ_\sigma)^{-1}\|_{S_\infty}\lesssim\sqrt{m/\log m}$, Theorem 5 yields such a subset with $|\sigma|\gtrsim(\log m)^2$, and Theorem 6 yields such a subset with $|\sigma|\gtrsim\sqrt m$. In fact, for every $\varepsilon\in(0,1)$, Theorem 6 yields $\sigma\subseteq\{1,\dots,m\}$ with $|\sigma|\ge m^{1-\varepsilon}$ such that $\|(AJ_\sigma)^{-1}\|_{S_\infty}\lesssim_\varepsilon\sqrt{m/\log m}$.
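The right-hand side of (4) is explicit, so the behavior described in Example 7 can be checked numerically. The sketch below (an illustration of ours, under the assumption $s_j(A)=1/\sqrt j$ of Example 7) evaluates the minimum in (4) over $r$:

```python
import numpy as np

m = 1000
s = 1.0 / np.sqrt(np.arange(1, m + 1))    # singular values s_j = 1/sqrt(j)
tail = np.cumsum((s ** 2)[::-1])[::-1]    # tail[r-1] = sum_{i=r}^m s_i(A)^2

def thm6_bound(k):
    """Right-hand side of (4): min over r in {k+1,...,m} of
    sqrt(m*r / ((r-k) * sum_{i=r}^m s_i(A)^2))."""
    r = np.arange(k + 1, m + 1)
    return float(np.sqrt(m * r / ((r - k) * tail[r - 1])).min())

# For |sigma| of order sqrt(m) the guaranteed inverse norm is of order
# sqrt(m/log m), matching the discussion in Example 7.
k = int(np.sqrt(m))
print(k, thm6_bound(k), np.sqrt(m / np.log(m)))
```

The computed bound for $k\approx\sqrt m$ indeed sits within a small constant factor of $\sqrt{m/\log m}$.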
Theorem 6 has the feature that it asserts the existence of a coordinate subspace of dimension arbitrarily close to the rank of the given operator on which that operator is invertible, with quantitative control on the operator norm of the inverse. The rank is not a stable quantity, but it is simple to deduce stable consequences of Theorem 6 that are stronger than Theorem 5. Indeed, continuing with the notation of Theorem 6, for every $p\in(2,\infty)$ we can apply Hölder's inequality to deduce that
$$\|A\|_{S_2}^2=\sum_{i=1}^{r-1}s_i(A)^2+\sum_{i=r}^ms_i(A)^2\le(r-1)^{1-\frac2p}\Big(\sum_{i=1}^{r-1}s_i(A)^p\Big)^{\frac2p}+\sum_{i=r}^ms_i(A)^2\le(r-1)^{1-\frac2p}\|A\|_{S_p}^2+\sum_{i=r}^ms_i(A)^2.$$
Hence,
$$\sum_{i=r}^ms_i(A)^2\ge\Big(1-(r-1)^{1-\frac2p}\frac{\|A\|_{S_p}^2}{\|A\|_{S_2}^2}\Big)\|A\|_{S_2}^2=\Big(1-\Big(\frac{r-1}{\mathrm{srank}_p(A)}\Big)^{\frac{p-2}p}\Big)\|A\|_{S_2}^2.\qquad(5)$$
A substitution of (5) into (4) yields the following estimate:
$$s_{\min}(AJ_\sigma)\gtrsim\max_{r\in\{k+1,\dots,\lfloor\mathrm{srank}_p(A)\rfloor\}}\sqrt{\Big(1-\frac kr\Big)\Big(1-\Big(\frac{r-1}{\mathrm{srank}_p(A)}\Big)^{\frac{p-2}p}\Big)}\cdot\frac{\|A\|_{S_2}}{\sqrt m}.\qquad(6)$$
The estimate (6) is nontrivial only when $k<\mathrm{srank}_p(A)$, so write $k=(1-\varepsilon)\,\mathrm{srank}_p(A)$ for some $\varepsilon\in(0,1)$. One checks that the following choice of $r\in\{k+1,\dots,\lfloor\mathrm{srank}_p(A)\rfloor\}$ attains the maximum in the right-hand side of (6), up to universal constant factors. If $\varepsilon$ is bounded away from $1$, say $\varepsilon\in(0,1/2]$, choose $r\asymp(1-\varepsilon/2)\,\mathrm{srank}_p(A)$. If $1/2<\varepsilon\le1-e^{-p/(p-2)}$ then choose $r\asymp\mathrm{srank}_p(A)/\big(1+\log\frac1{1-\varepsilon}\big)$. If $1-e^{-p/(p-2)}<\varepsilon<1$ then choose $r\asymp e^{-p/(p-2)}\,\mathrm{srank}_p(A)$. Thus,
$$0<\varepsilon\le\frac12\implies\|(AJ_\sigma)^{-1}\|_{S_\infty}\lesssim\sqrt{\frac p{p-2}}\cdot\frac{\sqrt m}{\varepsilon\|A\|_{S_2}},$$
$$\frac12<\varepsilon\le1-e^{-\frac p{p-2}}\implies\|(AJ_\sigma)^{-1}\|_{S_\infty}\lesssim\sqrt{\frac p{(p-2)\log\frac1{1-\varepsilon}}}\cdot\frac{\sqrt m}{\|A\|_{S_2}},$$
$$1-e^{-\frac p{p-2}}<\varepsilon<1\implies\|(AJ_\sigma)^{-1}\|_{S_\infty}\lesssim\frac{\sqrt m}{\|A\|_{S_2}}.$$
A more concise way to write these estimates is as follows:
$$\|(AJ_\sigma)^{-1}\|_{S_\infty}\lesssim\sqrt{\frac1\varepsilon\Big(1+\frac p{(p-2)\log\frac1{1-\varepsilon}}\Big)}\cdot\frac{\sqrt m}{\|A\|_{S_2}}.$$
For ease of future reference, we record the above corollary of Theorem 6 as Theorem 8 below.

Theorem 8 (Restricted invertibility in terms of Schatten–von Neumann norms). Suppose that $k,m,n\in\mathbb{N}$, $\varepsilon\in(0,1)$ and $p\in(2,\infty)$. Let $A:\mathbb{R}^m\to\mathbb{R}^n$ be a linear operator that satisfies $k\le(1-\varepsilon)\,\mathrm{srank}_p(A)$. Then there exists a subset $\sigma\subseteq\{1,\dots,m\}$ with $|\sigma|=k$ such that
$$\|(AJ_\sigma)^{-1}\|_{S_\infty}\lesssim\sqrt{\frac1\varepsilon\Big(1+\frac p{(p-2)\log\frac1{1-\varepsilon}}\Big)}\cdot\frac{\sqrt m}{\|A\|_{S_2}}.$$
Equivalently, if $k<\mathrm{Entrank}(A)$ then there exists $\sigma\subseteq\{1,\dots,m\}$ with $|\sigma|=k$ such that
$$\|(AJ_\sigma)^{-1}\|_{S_\infty}\lesssim\inf_{p>2}\psi_p\Big(1-\frac k{\mathrm{srank}_p(A)}\Big)\cdot\frac{\sqrt m}{\|A\|_{S_2}},$$
where $\psi_p:\mathbb{R}\to[0,\infty]$ is defined by $\psi_p(\varepsilon)=\infty$ if $\varepsilon\le0$, $\psi_p(\varepsilon)=\sqrt{p/(p-2)}/\varepsilon$ if $0<\varepsilon\le1/2$, $\psi_p(\varepsilon)=\sqrt{p/(p-2)}/\sqrt{\log(1/(1-\varepsilon))}$ if $1/2<\varepsilon\le1-e^{-p/(p-2)}$, and $\psi_p(\varepsilon)=1$ if $\varepsilon>1-e^{-p/(p-2)}$.

The case $p=4$ of Theorem 8 implies (up to constant factors) the conclusion of Theorem 5, though now treating any $\varepsilon\in(0,1)$, i.e., $k$ arbitrarily close to $\mathrm{srank}_4(A)$, while Theorem 5 applies only when $\varepsilon>3/4$. Theorem 8 can detect the well-invertibility of $A$ on coordinate subspaces that are much larger than those detected by Theorem 5. For example, suppose that the singular values of $A$ are $s_1(A)=\sqrt[3]{m}$ and $s_2(A)=s_3(A)=\dots=s_m(A)=1$. Then Theorem 5 yields a subset $\sigma\subseteq\{1,\dots,m\}$ of size of order $m^{2/3}$ for which the operator norm of the inverse of $AJ_\sigma$ is $O(1)$, while (the case $p=3$ of) Theorem 8 yields such a subset whose size is proportional to $m$.

We shall prove Theorem 6 through an application of Theorem 9 below, which is a restricted invertibility statement of independent interest, in combination with a volumetric argument that leads to Lemma 10 below. Throughout what follows, given $n\in\mathbb{N}$ and a linear subspace $F\subseteq\mathbb{R}^n$, we shall denote the orthogonal projection from $\mathbb{R}^n$ onto $F$ by $\mathrm{Proj}_F:\mathbb{R}^n\to F$.

Theorem 9. Fix $k,m,n\in\mathbb{N}$ and a linear operator $A:\mathbb{R}^m\to\mathbb{R}^n$ satisfying $\mathrm{rank}(A)>k$. Let $\omega\subseteq\{1,\dots,m\}$ be any subset with $|\omega|=\mathrm{rank}(A)$ such that the vectors $\{Ae_i\}_{i\in\omega}\subseteq\mathbb{R}^n$ are linearly independent. For every $j\in\omega$ let $F_j\subseteq\mathbb{R}^n$ be the orthogonal complement of the span of $\{Ae_i\}_{i\in\omega\setminus\{j\}}\subseteq\mathbb{R}^n$, i.e.,
$$F_j=\mathrm{span}\big(\{Ae_i\}_{i\in\omega\setminus\{j\}}\big)^{\perp}.\qquad(7)$$
Then there exists a subset $\sigma\subseteq\omega$ with $|\sigma|=k$ such that
$$\|(AJ_\sigma)^{-1}\|_{S_\infty}\lesssim\sqrt{\frac{\mathrm{rank}(A)}{\mathrm{rank}(A)-k}}\cdot\max_{j\in\omega}\frac1{\|\mathrm{Proj}_{F_j}Ae_j\|_2}.\qquad(8)$$

The link between Theorem 9 and Theorem 6 is furnished through the following lemma.

Lemma 10. Fix $r,m,n\in\mathbb{N}$. Let $A:\mathbb{R}^m\to\mathbb{R}^n$ be a linear operator with $\mathrm{rank}(A)\ge r$. For every $\tau\subseteq\{1,\dots,m\}$ let $E_\tau\subseteq\mathbb{R}^n$ be the orthogonal complement of the span of $\{Ae_j\}_{j\in\tau}\subseteq\mathbb{R}^n$, i.e.,
$$E_\tau=\mathrm{span}\big(\{Ae_j\}_{j\in\tau}\big)^{\perp}.\qquad(9)$$
Then there exists a subset $\tau\subseteq\{1,\dots,m\}$ with $|\tau|=r$ such that
$$\forall\,j\in\tau,\qquad\|\mathrm{Proj}_{E_{\tau\setminus\{j\}}}Ae_j\|_2^2\ge\frac1m\sum_{i=r}^ms_i(A)^2.
\qquad(10)$$

The deduction of Theorem 6 from Theorem 9 and Lemma 10 is simple. Indeed, in the setting of Theorem 6, take $r\in\{k+1,\dots,\mathrm{rank}(A)\}$ and apply Lemma 10 to obtain a subset $\tau\subseteq\{1,\dots,m\}$ with $|\tau|=r$ that satisfies (10). This implies in particular that $\{Ae_j\}_{j\in\tau}$ are linearly independent, hence the operator $AJ_\tau:\mathbb{R}^\tau\to\mathbb{R}^n$ has rank $r$. By Theorem 9, applied with $A$ replaced by $AJ_\tau$, $m=r=\mathrm{rank}(A)$ and $\omega=\tau$, we obtain a further subset $\sigma\subseteq\tau$ with $|\sigma|=k$ such that
$$\|(AJ_\sigma)^{-1}\|_{S_\infty}\overset{(8)\wedge(10)}{\lesssim}\sqrt{\frac{mr}{(r-k)\sum_{i=r}^ms_i(A)^2}}.$$
This is precisely the assertion of Theorem 6. In Section 5 we shall prove the following variant of Theorem 9. (Note that, comparing (7) and (9), we have $F_j=E_{\omega\setminus\{j\}}$ for every $j\in\omega$.)
Theorem 11. Fix $k,m,n\in\mathbb{N}$ and a linear operator $A:\mathbb{R}^m\to\mathbb{R}^n$ satisfying $\mathrm{rank}(A)>k$. Then there exists a subset $\sigma\subseteq\{1,\dots,m\}$ with $|\sigma|=k$ such that
$$\|(AJ_\sigma)^{-1}\|_{S_\infty}\lesssim\frac{m}{\mathrm{rank}(A)-k}\sqrt{\frac1{m\,\mathrm{rank}(A)}\sum_{i=1}^{\mathrm{rank}(A)}\frac1{s_i(A)^2}}.\qquad(11)$$

To explain how Theorem 11 relates to Theorem 6, note that in the setting of Theorem 9 we have
$$\sum_{j\in\omega}\frac1{\|\mathrm{Proj}_{F_j}Ae_j\|_2^2}=\sum_{i=1}^{\mathrm{rank}(A)}\frac1{s_i(AJ_\omega)^2}.\qquad(12)$$
The simple linear-algebraic justification of (12) appears in Section 2.1 below. For simplicity suppose that $\omega=\{1,\dots,m\}$, so $\mathrm{rank}(A)=m$, and write $k=(1-\varepsilon)m$ for some $\varepsilon\in(0,1)$. Then Theorem 9 yields a subset $\sigma\subseteq\{1,\dots,m\}$ with $|\sigma|=k$ such that
$$\|(AJ_\sigma)^{-1}\|_{S_\infty}\lesssim\frac1{\sqrt\varepsilon}\max_{j\in\{1,\dots,m\}}\frac1{\|\mathrm{Proj}_{F_j}Ae_j\|_2},\qquad(13)$$
while, due to (12), Theorem 11 yields a subset $\sigma\subseteq\{1,\dots,m\}$ with $|\sigma|=k$ such that
$$\|(AJ_\sigma)^{-1}\|_{S_\infty}\lesssim\frac1\varepsilon\sqrt{\frac1m\sum_{j=1}^m\frac1{\|\mathrm{Proj}_{F_j}Ae_j\|_2^2}}.\qquad(14)$$
The estimates (13) and (14) are incomparable, since (13) yields a dependence on $\varepsilon$ that is better than that of (14) as $\varepsilon\to0$, while the bound in (14) is in terms of the average of the quantities $\{1/\|\mathrm{Proj}_{F_j}Ae_j\|_2^2\}_{j=1}^m$ rather than their maximum. It remains an interesting open question whether one could obtain a restricted invertibility theorem that combines the best terms in (13) and (14).

Remark 12. Theorem 9 is best possible, up to constant factors. Indeed, fix $k,m\in\mathbb{N}$ with $k<m$ and let $B$ be the $m\times m$ matrix all of whose diagonal entries equal $m$ and all of whose off-diagonal entries equal $-1$. Then $B$ is positive definite (diagonally dominant) and we choose $A=\sqrt B$. We are thus in the setting of Theorem 9 with $m=n=\mathrm{rank}(A)$ and $\omega=\{1,\dots,m\}$. The quantity $1/\|\mathrm{Proj}_{F_j}Ae_j\|_2^2$ is equal to the $j$th diagonal entry of $(A^*A)^{-1}=B^{-1}$; see equation (16) in Section 2.1 below for a simple justification of this fact. The matrix $B$ is an invertible circulant matrix, and as such $B^{-1}$ is also a circulant matrix, whose diagonal entries equal $2/(m+1)$; see [Dav79, KS12] for more on the explicit evaluation of basic quantities related to circulant matrices, including their inverses and eigenvalues, which we use here.
Therefore $1/\|\mathrm{Proj}_{F_j}Ae_j\|_2^2=2/(m+1)$ for every $j\in\{1,\dots,m\}$, so that the right-hand side of (8) equals $\sqrt{2m/((m+1)(m-k))}\asymp1/\sqrt{m-k}$. At the same time, take any $\sigma\subseteq\{1,\dots,m\}$ with $|\sigma|=k$. Then $(AJ_\sigma)^*(AJ_\sigma)=J_\sigma^*BJ_\sigma$ corresponds to a $k\times k$ matrix whose diagonal entries equal $m$ and whose off-diagonal entries equal $-1$. This is again a circulant matrix, whose eigenvalues equal $m+1$ with multiplicity $k-1$ and $m+1-k$ with multiplicity $1$. Thus $s_1(AJ_\sigma)=\dots=s_{k-1}(AJ_\sigma)=\sqrt{m+1}$ and $s_k(AJ_\sigma)=s_{\min}(AJ_\sigma)=1/\|(AJ_\sigma)^{-1}\|_{S_\infty}=\sqrt{m+1-k}$. This shows that $\|(AJ_\sigma)^{-1}\|_{S_\infty}\asymp1/\sqrt{m-k}$, so that (8) is sharp up to constant factors.

1.2. Remarks on the proofs. The original proof of Bourgain and Tzafriri of Theorem 1 consists of a beautiful combination of probabilistic, combinatorial and analytic arguments. It proceeds roughly along three steps. Firstly, using random selectors one finds a large collection of columns of $A$ that is well separated. In the second step one uses the Sauer–Shelah lemma [Sau72, She72] to find a further subset of the columns such that the inverse of the restriction of $A$ to this subset, when viewed as an operator from $\ell_2$ to $\ell_1$, has small norm; the Sauer–Shelah lemma is discussed in Section 2.4 below, since it plays an important role here as well. The third step of the Bourgain–Tzafriri proof uses tools from functional analysis, specifically the Little Grothendieck Inequality [Gro53] and
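The spectral computations in Remark 12 can be verified mechanically. The following Python/NumPy sketch (our own check, assuming the reconstruction of $B$ as the matrix with diagonal entries $m$ and off-diagonal entries $-1$) confirms the diagonal of $B^{-1}$ and the eigenvalues of the restricted Gram matrix:

```python
import numpy as np

m, k = 10, 4
B = (m + 1) * np.eye(m) - np.ones((m, m))   # diagonal entries m, off-diagonal entries -1

# A = B^{1/2} via the spectral decomposition, so that A^*A = B.
w, V = np.linalg.eigh(B)
A = V @ np.diag(np.sqrt(w)) @ V.T

# Diagonal entries of B^{-1} all equal 2/(m+1), as claimed in Remark 12.
print(np.diag(np.linalg.inv(B)))

# For sigma = {1,...,k}: (A J_sigma)^*(A J_sigma) = (m+1) I_k - J_k, with
# eigenvalues m+1 (multiplicity k-1) and m+1-k (multiplicity 1).
eigs = np.sort(np.linalg.eigvalsh(B[:k, :k]))
print(eigs)
```

In particular $s_{\min}(AJ_\sigma)=\sqrt{m+1-k}$, which is what makes (8) sharp up to constants in this example.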
the Pietsch Domination Theorem [Pie67], to control the desired Hilbertian operator norm; these analytic tools are used here as well, and are explained in detail in Section 2.2 and Section 2.3 below. Vershynin's proof of Theorem 2 uses the Bourgain–Tzafriri restricted invertibility theorem as a black box, alongside (unpublished) work of Kashin and Tzafriri (see Theorem 2.5 in [Ver01]). A key contribution of Vershynin was the idea to work with the Hilbert–Schmidt norm so as to allow for an iterative argument. As we stated earlier, the proof of Spielman and Srivastava of Theorem 4 is entirely different from the previously used methods in this context, relying on the sparsification method of Batson–Spielman–Srivastava [BSS12]. This refreshing approach led to many important developments, and it was subsequently augmented by the powerful method of interlacing polynomials of Marcus–Spielman–Srivastava, which they used to prove Theorem 5, showing that one could use higher Schatten–von Neumann norms to address the restricted invertibility problem.

Our starting point here was the realization that one could use ideas and techniques that predate the works of Vershynin, Spielman–Srivastava and Marcus–Spielman–Srivastava to obtain asymptotically sharp results such as Theorem 4, and even to strengthen the statement in terms of higher Schatten–von Neumann norms that is contained in Theorem 5. These later results were based on the discovery of powerful new techniques, leading to many additional applications (crowned by the solution of the Kadison–Singer problem [MSS15b]) that are not covered here, but the present work shows how to apply classical methods to improve over the best known bounds on the restricted invertibility problem.
Specifically, we rely on the beautiful work of Giannopoulos [Gia96], which treats a seemingly unrelated geometric question (see also [Gia95]), though it is partially inspired by the work of Bourgain–Tzafriri [BT87] itself, as well as the works of Bourgain–Szarek [BS88] and Szarek–Talagrand [ST89] (see also [Sza91]). The key step is to use Giannopoulos' clever iterative application of the Sauer–Shelah lemma (Bourgain–Tzafriri used the Sauer–Shelah lemma only once in their original argument) in the proof of Theorem 9. In fact, one could use a geometric statement of Giannopoulos [Gia96] as a black box so as to obtain a shorter proof of Theorem 9; this is carried out in Section 4.1 below, but only after we present a self-contained argument in Section 4.

Theorem 11 is of a different nature, since its proof uses the Marcus–Spielman–Srivastava method of interlacing polynomials. We do not see how to prove it using the classical analytic techniques that are utilized elsewhere in this article, and in fact we do not need it for the applications that are obtained here (as we explained earlier, Theorem 11 is incomparable to Theorem 9, being weaker in terms of the dependence on certain parameters and stronger in other respects). Nevertheless, Theorem 11 certainly belongs to the family of restricted invertibility results that we study here.

Among the interesting questions that arise naturally from the present work, we ask whether Theorem 6, Theorem 8, Theorem 9 and Theorem 11 can be made algorithmic. Our current proofs do not yield a polynomial time algorithm that finds the desired coordinate subspace, for various reasons, including (but not limited to) the use of the Sauer–Shelah lemma (in Theorem 6, Theorem 8 and Theorem 9) and the use of the method of interlacing polynomials (in Theorem 11).

1.3. Roadmap.
While this article is primarily devoted to new results, it also has an expository component, due to the fact that we are using tools and ideas from diverse fields with which some readers may not be familiar. Being very much inspired by Matoušek's exceptionally clear style of mathematical exposition, we also made an effort for the ensuing arguments to be self-contained, by including quick explanations of the classical results that are being used. It seems impossible to fully achieve a Matoušek-style exposition, but hopefully his influence helped us to make an important area of mathematics and a collection of powerful and versatile tools accessible to a wider audience.

Section 2 describes auxiliary statements that will be used in the subsequent proofs. These include classical results of major importance to several fields, and we include brief deductions of what we need so as to make this article self-contained. Section 3 contains the proof of Lemma 10. A self-contained proof of Theorem 6, using a clever iterative procedure of Giannopoulos [Gia96], appears
in Section 4. This is followed by Section 4.1, where it is shown that Theorem 6 is equivalent to a geometric theorem of Giannopoulos [Gia96], thus yielding a shorter (but not self-contained) proof of Theorem 6. Section 5 contains the proof of Theorem 11.

Acknowledgements. We thank Bill Johnson for helpful discussions. This work was initiated while we were participating in the workshop Beyond Kadison–Singer: paving and consequences at the American Institute of Mathematics. We thank the organizers for the excellent working conditions.

2. Preliminaries

In this section we shall describe several tools that will be used in the ensuing arguments, and derive certain corollaries of them in forms that will be easy to quote as the need arises later.

2.1. A bit of linear algebra. We shall start with elementary linear-algebraic reasoning that clarifies the meaning of some of the quantities that were discussed in the Introduction. In particular, we shall see why the identity (12) holds true. We work here in the setting of Theorem 9: namely, we are given $k,m,n\in\mathbb{N}$ and a linear operator $A:\mathbb{R}^m\to\mathbb{R}^n$ satisfying $\mathrm{rank}(A)>k$. We also fix a subset $\omega\subseteq\{1,\dots,m\}$ with $|\omega|=\mathrm{rank}(A)$ such that the vectors $\{Ae_i\}_{i\in\omega}\subseteq\mathbb{R}^n$ are linearly independent. For $j\in\omega$ we consider the linear subspace $F_j\subseteq\mathbb{R}^n$ that is defined in (7), namely $F_j$ is the orthogonal complement of the span of $\{Ae_i\}_{i\in\omega\setminus\{j\}}\subseteq\mathbb{R}^n$. For every $j\in\omega$ define a vector $v_j\in\mathbb{R}^n$ as follows:
$$v_j=\frac{\mathrm{Proj}_{F_j}Ae_j}{\|\mathrm{Proj}_{F_j}Ae_j\|_2^2}\in\mathbb{R}^n.\qquad(15)$$
For every $j\in\omega$, since $I_n-\mathrm{Proj}_{F_j}$ is the orthogonal projection onto $\mathrm{span}(\{Ae_i\}_{i\in\omega\setminus\{j\}})\subseteq\mathbb{R}^n$, we know that $(I_n-\mathrm{Proj}_{F_j})Ae_j\in\mathrm{span}(\{Ae_i\}_{i\in\omega\setminus\{j\}})$. So, $\{\mathrm{Proj}_{F_j}Ae_j\}_{j\in\omega}\subseteq\mathrm{span}(\{Ae_i\}_{i\in\omega})$, and therefore $\{v_j\}_{j\in\omega}\subseteq\mathrm{span}(\{Ae_i\}_{i\in\omega})$. For $j\in\omega$ we have $\langle\mathrm{Proj}_{F_j}Ae_j,Ae_j\rangle=\|\mathrm{Proj}_{F_j}Ae_j\|_2^2$, so $\langle v_j,Ae_j\rangle=1$. Also, because $\mathrm{Proj}_{F_j}Ae_j$ is orthogonal to $\{Ae_i\}_{i\in\omega\setminus\{j\}}$, we have $\langle v_j,Ae_i\rangle=0$ for every $i\in\omega\setminus\{j\}$.
Since $\{Ae_i\}_{i\in\omega}$ is a basis of $\mathrm{span}(\{Ae_i\}_{i\in\omega})$ and $\{v_j\}_{j\in\omega}\subseteq\mathrm{span}(\{Ae_i\}_{i\in\omega})$, this means that $\{v_j\}_{j\in\omega}$ is the unique dual basis of $\{Ae_i\}_{i\in\omega}$ in $\mathrm{span}(\{Ae_i\}_{i\in\omega})$. The operator $(AJ_\omega)^*(AJ_\omega):\mathbb{R}^\omega\to\mathbb{R}^\omega$ has rank $|\omega|=\mathrm{rank}(A)$, hence it is invertible. For every $j\in\omega$ we may therefore consider the vector
$$w_j=AJ_\omega\big((AJ_\omega)^*(AJ_\omega)\big)^{-1}e_j\in\mathrm{span}(\{Ae_i\}_{i\in\omega}).$$
Observe that for every $i,j\in\omega$ we have
$$\langle w_j,Ae_i\rangle=\Big\langle AJ_\omega\big((AJ_\omega)^*(AJ_\omega)\big)^{-1}e_j,(AJ_\omega)e_i\Big\rangle=\Big\langle(AJ_\omega)^*(AJ_\omega)\big((AJ_\omega)^*(AJ_\omega)\big)^{-1}e_j,e_i\Big\rangle=\langle e_j,e_i\rangle.$$
By the uniqueness of the dual basis of $\{Ae_i\}_{i\in\omega}$ in $\mathrm{span}(\{Ae_i\}_{i\in\omega})$, we conclude that $v_j=w_j$ for every $j\in\omega$. This implies in particular that for every $j\in\omega$ we have
$$\frac1{\|\mathrm{Proj}_{F_j}Ae_j\|_2^2}=\|v_j\|_2^2=\langle w_j,w_j\rangle=\Big\langle\big((AJ_\omega)^*(AJ_\omega)\big)^{-1}e_j,e_j\Big\rangle.\qquad(16)$$
Consequently,
$$\sum_{j\in\omega}\frac1{\|\mathrm{Proj}_{F_j}Ae_j\|_2^2}=\sum_{j\in\omega}\Big\langle\big((AJ_\omega)^*(AJ_\omega)\big)^{-1}e_j,e_j\Big\rangle=\mathrm{Tr}\Big(\big((AJ_\omega)^*(AJ_\omega)\big)^{-1}\Big)=\sum_{i=1}^{\mathrm{rank}(A)}\frac1{s_i(AJ_\omega)^2}.$$
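The dual-basis identity (16), and hence (12), can be confirmed numerically on a random instance. The following Python/NumPy sketch (our own check) computes $\|\mathrm{Proj}_{F_j}Ae_j\|_2$ directly via an orthonormal basis of the span of the other columns, and compares with the diagonal of the inverse Gram matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 6, 4
A = rng.standard_normal((n, r))              # columns Ae_1,...,Ae_r (a.s. independent)
G = A.T @ A                                  # Gram matrix (A J_omega)^* (A J_omega)
Ginv = np.linalg.inv(G)

def proj_complement_norm(j):
    """||Proj_{F_j} A e_j||_2, where F_j = span({A e_i : i != j})^perp."""
    Q, _ = np.linalg.qr(np.delete(A, j, axis=1))
    v = A[:, j] - Q @ (Q.T @ A[:, j])        # component of A e_j orthogonal to the others
    return float(np.linalg.norm(v))

# Identity (16): 1/||Proj_{F_j} A e_j||^2 = <G^{-1} e_j, e_j>; summing over j
# gives sum_i 1/s_i^2, which is identity (12).
lhs = np.array([proj_complement_norm(j) ** -2 for j in range(r)])
print(lhs - np.diag(Ginv))                   # numerically a zero vector
```

The same computation, run with $r=|\omega|$ columns of any operator with independent columns, verifies (12) term by term.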
This is precisely the identity (12). The above discussion, and in particular the auxiliary vectors (15) and their properties that were derived above, will play a role in later arguments as well.

2.2. Grothendieck. We shall use later the following important theorem of Grothendieck [Gro53].

Theorem 13 (Little Grothendieck Inequality). Fix $k,m,n\in\mathbb{N}$. Suppose that $T:\mathbb{R}^m\to\mathbb{R}^n$ is a linear operator. Then for every $x_1,\dots,x_k\in\mathbb{R}^m$ there exists $i\in\{1,\dots,m\}$ such that
$$\sum_{r=1}^k\|Tx_r\|_2^2\le\frac\pi2\,\|T\|_{\ell_\infty^m\to\ell_2^n}^2\sum_{r=1}^kx_{ri}^2.\qquad(17)$$
Here $\|T\|_{\ell_\infty^m\to\ell_2^n}=\max_{x\in[-1,1]^m}\|Tx\|_2$ is the operator norm of $T$ when it is viewed as an operator from $\ell_\infty^m$ to $\ell_2^n$, and $x_{ri}=\langle x_r,e_i\rangle$ is the $i$th coordinate of $x_r\in\mathbb{R}^m$.

To see the significance of Theorem 13, note that the operator norm of $T$, when it is viewed as an operator from $\ell_\infty^m$ to $\ell_2^n$, is nothing more than the smallest $C\ge0$ such that for every $x\in\mathbb{R}^m$ there exists $i\in\{1,\dots,m\}$ for which $\|Tx\|_2\le C|x_i|$. So, the case $k=1$ of (17) without the factor $\pi/2$ in the right-hand side is a tautology. Theorem 13 asserts that the case $k=1$ of (17) automatically upgrades to (17) for general $k\in\mathbb{N}$, at the cost of a loss of the constant factor $\pi/2$. The literature contains clear expositions of Theorem 13 and its various useful generalizations and equivalent formulations; see e.g. [Pis86, DJT95]. Nevertheless, for the sake of completeness we shall now quickly explain why Theorem 13 holds true, following (a specialization of) the standard proofs of this fact [Pis86, DJT95]. We note that the factor $\pi/2$ in (17) is sharp; see e.g. the remark immediately following the proof of Theorem 5.4 in [Pis86].

To prove Theorem 13, by rescaling both $T$ and $(x_1,\dots,x_k)$ we may assume without loss of generality that $\|T\|_{\ell_\infty^m\to\ell_2^n}=1$ and $\sum_{r=1}^k\|Tx_r\|_2^2=1$. With this normalization, we claim that
$$\sum_{j=1}^m\Big(\sum_{r=1}^k(T^*Tx_r)_j^2\Big)^{1/2}\le\sqrt{\frac\pi2}.\qquad(18)$$
Once proven, (18) implies the desired estimate (17) via the following application of Cauchy–Schwarz.
$$1=\sum_{r=1}^k\|Tx_r\|_2^2=\sum_{r=1}^k\langle x_r,T^*Tx_r\rangle=\sum_{j=1}^m\sum_{r=1}^kx_{rj}(T^*Tx_r)_j\le\sum_{j=1}^m\Big(\sum_{r=1}^kx_{rj}^2\Big)^{1/2}\Big(\sum_{r=1}^k(T^*Tx_r)_j^2\Big)^{1/2}$$
$$\le\max_{i\in\{1,\dots,m\}}\Big(\sum_{r=1}^kx_{ri}^2\Big)^{1/2}\sum_{j=1}^m\Big(\sum_{r=1}^k(T^*Tx_r)_j^2\Big)^{1/2}\overset{(18)}{\le}\sqrt{\frac\pi2}\max_{i\in\{1,\dots,m\}}\Big(\sum_{r=1}^kx_{ri}^2\Big)^{1/2},$$
which, after squaring, is precisely (17) under the above normalization.

To prove (18), let $\{g_r\}_{r=1}^k$ be i.i.d. standard Gaussian random variables. For every $j\in\{1,\dots,m\}$ the random variable $\sum_{r=1}^kg_r(T^*Tx_r)_j$ is Gaussian with mean $0$ and variance $\sum_{r=1}^k(T^*Tx_r)_j^2$. So,
$$\mathbb{E}\Big[\Big\|T^*T\Big(\sum_{r=1}^kg_rx_r\Big)\Big\|_1\Big]=\sum_{j=1}^m\mathbb{E}\Big[\Big|\sum_{r=1}^kg_r(T^*Tx_r)_j\Big|\Big]=\mathbb{E}[|g_1|]\sum_{j=1}^m\Big(\sum_{r=1}^k(T^*Tx_r)_j^2\Big)^{1/2}=\sqrt{\frac2\pi}\sum_{j=1}^m\Big(\sum_{r=1}^k(T^*Tx_r)_j^2\Big)^{1/2}.\qquad(19)$$
Let $z\in\{-1,1\}^m$ be the random vector given by $z_j=\mathrm{sign}\big(\big(T^*T\sum_{r=1}^kg_rx_r\big)_j\big)$. Then
$$\Big\|T^*T\Big(\sum_{r=1}^kg_rx_r\Big)\Big\|_1=\Big\langle z,T^*T\Big(\sum_{r=1}^kg_rx_r\Big)\Big\rangle=\Big\langle Tz,\sum_{r=1}^kg_rTx_r\Big\rangle\le\|Tz\|_2\Big\|\sum_{r=1}^kg_rTx_r\Big\|_2\le\Big\|\sum_{r=1}^kg_rTx_r\Big\|_2,\qquad(20)$$
using $\|Tz\|_2\le\|T\|_{\ell_\infty^m\to\ell_2^n}=1$. By taking expectations in (20) we see that
$$\sqrt{\frac2\pi}\sum_{j=1}^m\Big(\sum_{r=1}^k(T^*Tx_r)_j^2\Big)^{1/2}\overset{(19)}{=}\mathbb{E}\Big[\Big\|T^*T\Big(\sum_{r=1}^kg_rx_r\Big)\Big\|_1\Big]\overset{(20)}{\le}\mathbb{E}\Big[\Big\|\sum_{r=1}^kg_rTx_r\Big\|_2\Big]\le\Big(\mathbb{E}\Big[\Big\|\sum_{r=1}^kg_rTx_r\Big\|_2^2\Big]\Big)^{1/2}=\Big(\sum_{r=1}^k\|Tx_r\|_2^2\Big)^{1/2}=1.$$
This is precisely the desired estimate (18), thus completing the proof of Theorem 13.

2.3. Pietsch. Another classical tool that will be used later (together with the Little Grothendieck Inequality) is the Pietsch Domination Theorem [Pie67].

Theorem 14 (Pietsch Domination). Fix $m,n\in\mathbb{N}$ and $M\in(0,\infty)$. Suppose that $T:\mathbb{R}^m\to\mathbb{R}^n$ is a linear operator such that for every $k\in\mathbb{N}$ and $x_1,\dots,x_k\in\mathbb{R}^m$ there exists $i\in\{1,\dots,m\}$ with $\sum_{r=1}^k\|Tx_r\|_2^2\le M^2\sum_{r=1}^kx_{ri}^2$. Then there exist $\mu_1,\dots,\mu_m\in[0,1]$ with $\sum_{i=1}^m\mu_i=1$ such that
$$\forall\,w=(w_1,\dots,w_m)\in\mathbb{R}^m,\qquad\|Tw\|_2\le M\sqrt{\sum_{i=1}^m\mu_iw_i^2}.$$
Observe in passing that the conclusion of Theorem 14 immediately implies its assumption. Indeed, by applying this conclusion with $w=x_r$ for each $r\in\{1,\dots,k\}$, and then summing the resulting inequalities over $r\in\{1,\dots,k\}$, we get that $\sum_{r=1}^k\|Tx_r\|_2^2\le M^2\sum_{i=1}^m\mu_i\sum_{r=1}^kx_{ri}^2$, so the existence of the desired index $i\in\{1,\dots,m\}$ follows from the fact that $(\mu_1,\dots,\mu_m)$ is a probability measure. The main point here is therefore the reverse implication, as stated in Theorem 14. In Banach space theoretic terminology, the assumption on the operator $T$ in Theorem 14 says that $T$ has $2$-summing norm at most $M$ when it is viewed as an operator from $\ell_\infty^m$ to $\ell_2^n$. We refer to the monographs [TJ89, DJT95] for much more on this topic, as well as proofs of (more general versions of) the Pietsch Domination Theorem.
As before, for the sake of completeness we shall now explain why Theorem 14 holds true, following (a specialization of) the standard proofs [TJ89, DJT95] of this fact, which amount to an application of the separation theorem (equivalently, Hahn–Banach, or duality of linear programming) to appropriately chosen convex sets.

Let $K\subseteq\mathbb{R}^m$ be the set of all those vectors $y\in\mathbb{R}^m$ for which there exist $k\in\mathbb{N}$ and $x_1,\dots,x_k\in\mathbb{R}^m$ such that $y_i=\sum_{r=1}^k\|Tx_r\|_2^2-M^2\sum_{r=1}^k x_{ri}^2$ for every $i\in\{1,\dots,m\}$. It is immediate to check that $K$ is convex, and the assumption on $T$ can be restated as saying that $K\cap(0,\infty)^m=\emptyset$. By the separation theorem there exists $\mu=(\mu_1,\dots,\mu_m)\in\mathbb{R}^m$ such that $\sum_{i=1}^m\mu_iy_i<\sum_{i=1}^m\mu_iz_i$ for every $y\in K$ and $z\in(0,\infty)^m$. In particular, $\mu\neq 0$ and $\inf_{z\in(0,\infty)^m}\langle z,\mu\rangle>-\infty$, so necessarily $\mu_i\ge 0$ for all $i\in\{1,\dots,m\}$. We may rescale so that $\sum_{i=1}^m\mu_i=1$. If $w\in\mathbb{R}^m$ then $\big(\|Tw\|_2^2-M^2w_i^2\big)_{i=1}^m\in K$, so
\[
\|Tw\|_2^2-M^2\sum_{i=1}^m\mu_iw_i^2=\sum_{i=1}^m\mu_i\big(\|Tw\|_2^2-M^2w_i^2\big)\le\inf_{z\in(0,\infty)^m}\sum_{i=1}^m\mu_iz_i=0. \qquad\square
\]
The following lemma is a combination of the Little Grothendieck Inequality and the Pietsch Domination Theorem; this is how Theorem 13 and Theorem 14 will be used in what follows.
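The easy direction observed above can be illustrated numerically. In this hedged sketch (not from the paper; the measure $\mu$, the constant $M$ and the diagonal operator are hypothetical choices, and numpy is assumed) we take $T=M\,\mathrm{diag}(\sqrt{\mu})$, for which the Pietsch domination $\|Tw\|_2=M(\sum_i\mu_iw_i^2)^{1/2}$ holds with equality, and confirm the 2-summing condition by averaging against $\mu$.

```python
import numpy as np

# Hypothetical data: a Pietsch measure mu and the diagonal operator
# T = M * diag(sqrt(mu)), for which ||Tw||_2 = M * sqrt(sum_i mu_i w_i^2).
rng = np.random.default_rng(1)
m, k, M = 5, 7, 2.0
mu = rng.random(m)
mu /= mu.sum()  # a probability vector
T = M * np.diag(np.sqrt(mu))

X = rng.standard_normal((k, m))  # rows are x_1, ..., x_k
lhs = sum(np.linalg.norm(T @ x) ** 2 for x in X)

# Summing the domination inequality over r and averaging against mu:
avg = M**2 * sum(mu[i] * np.sum(X[:, i] ** 2) for i in range(m))
assert lhs <= avg + 1e-9
# ... hence some coordinate i witnesses the 2-summing condition:
assert lhs <= M**2 * max(np.sum(X[:, i] ** 2) for i in range(m)) + 1e-9
```

Since an average against a probability vector never exceeds the maximum, the last assertion is exactly the "assumption" of the Pietsch theorem, recovered from its conclusion.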
Lemma 15. Fix $m,n\in\mathbb{N}$ and $\varepsilon\in(0,1)$. Let $T:\mathbb{R}^n\to\mathbb{R}^m$ be a linear operator. Then there exists a subset $\sigma\subseteq\{1,\dots,m\}$ with $|\sigma|\ge(1-\varepsilon)m$ such that
\[
\big\|\mathrm{Proj}_{\mathbb{R}^\sigma}T\big\|_{S_\infty}\le\sqrt{\frac{\pi}{2\varepsilon m}}\,\|T\|_{\ell_2^n\to\ell_1^m}. \tag{22}
\]
Proof. Since $\|T^*\|_{\ell_\infty^m\to\ell_2^n}=\|T\|_{\ell_2^n\to\ell_1^m}$, an application of Theorem 13 to $T^*:\mathbb{R}^m\to\mathbb{R}^n$ shows that the assumption of Theorem 14 holds true with $T$ replaced by $T^*$ and $M=\sqrt{\pi/2}\,\|T\|_{\ell_2^n\to\ell_1^m}$. Hence, Theorem 14 shows that there exists $\mu\in[0,1]^m$ with $\sum_{i=1}^m\mu_i=1$ such that
\[
\forall\, y\in\mathbb{R}^m,\qquad \|T^*y\|_2\le\sqrt{\frac{\pi}{2}}\,\|T\|_{\ell_2^n\to\ell_1^m}\Big(\sum_{i=1}^m\mu_iy_i^2\Big)^{\frac12}.
\]
Define
\[
\sigma=\Big\{i\in\{1,\dots,m\}:\ \mu_i\le\frac{1}{m\varepsilon}\Big\}. \tag{23}
\]
Since $\mu$ is a probability measure on $\{1,\dots,m\}$, by Markov's inequality we have $|\sigma|\ge(1-\varepsilon)m$. Take $x\in\mathbb{R}^n$ and choose $y\in\mathbb{R}^m$ such that $\|y\|_2=1$ and $\|\mathrm{Proj}_{\mathbb{R}^\sigma}Tx\|_2=\langle y,\mathrm{Proj}_{\mathbb{R}^\sigma}Tx\rangle$. Then,
\[
\big\|\mathrm{Proj}_{\mathbb{R}^\sigma}Tx\big\|_2=\big\langle y,\mathrm{Proj}_{\mathbb{R}^\sigma}Tx\big\rangle=\big\langle T^*\mathrm{Proj}_{\mathbb{R}^\sigma}y,x\big\rangle\le\big\|T^*\mathrm{Proj}_{\mathbb{R}^\sigma}y\big\|_2\|x\|_2\le\sqrt{\frac{\pi}{2}}\,\|T\|_{\ell_2^n\to\ell_1^m}\Big(\sum_{i\in\sigma}\mu_iy_i^2\Big)^{\frac12}\|x\|_2\stackrel{(23)}{\le}\sqrt{\frac{\pi}{2m\varepsilon}}\,\|T\|_{\ell_2^n\to\ell_1^m}\|y\|_2\|x\|_2=\sqrt{\frac{\pi}{2m\varepsilon}}\,\|T\|_{\ell_2^n\to\ell_1^m}\|x\|_2. \tag{24}
\]
Since (24) holds true for every $x\in\mathbb{R}^n$, this completes the proof of the desired estimate (22). $\square$

2.4. Sauer–Shelah. The Sauer–Shelah lemma [Sau72, She72] is a fundamental combinatorial principle of wide applicability that will be used crucially later.

Lemma 16 (Sauer–Shelah). Fix $m,n\in\mathbb{N}$. Suppose that $\Omega\subseteq\{-1,1\}^n$ satisfies $|\Omega|>\sum_{k=0}^{m-1}\binom{n}{k}$. Then there exists a subset $\sigma\subseteq\{1,\dots,n\}$ with $|\sigma|\ge m$ such that $\mathrm{Proj}_{\mathbb{R}^\sigma}\Omega=\{-1,1\}^\sigma$, i.e., for every $\varepsilon\in\{-1,1\}^\sigma$ there exists $\delta\in\Omega$ such that $\delta_j=\varepsilon_j$ for every $j\in\sigma$. In particular, if $|\Omega|>2^{n-1}$ then such a subset $\sigma\subseteq\{1,\dots,n\}$ exists with $|\sigma|\ge\lceil n/2\rceil\ge n/2$.

It is simple to prove Lemma 16 by induction on $n$ once one strengthens the inductive hypothesis as follows. Denoting $\mathrm{sh}(\Omega)=\{\sigma\subseteq\{1,\dots,n\}:\ \mathrm{Proj}_{\mathbb{R}^\sigma}\Omega=\{-1,1\}^\sigma\}$, we claim that $|\mathrm{sh}(\Omega)|\ge|\Omega|$; this would imply Lemma 16 since the number of subsets of $\{1,\dots,n\}$ of size at most $m-1$ equals $\sum_{k=0}^{m-1}\binom{n}{k}$. This stronger statement is due to Pajor [Paj85], and the resulting very short inductive proof (which we shall now sketch for completeness) appears in [ARS02]. The case $n=1$ holds trivially (here we use the convention that $\{-1,1\}^\emptyset$ consists of a single empty vector, so that $\emptyset\in\mathrm{sh}(\Omega)$ whenever $\Omega\neq\emptyset$).
Assuming the validity of the above statement for $n$, take $\Omega\subseteq\{-1,1\}^{n+1}=\{-1,1\}^n\times\{-1,1\}$ and denote $\Omega_1=\{x\in\{-1,1\}^n:\ (x,1)\in\Omega\}$ and $\Omega_2=\{x\in\{-1,1\}^n:\ (x,-1)\in\Omega\}$. Then $|\Omega_1|+|\Omega_2|=|\Omega|$, and by the inductive hypothesis we have $|\mathrm{sh}(\Omega_1)|\ge|\Omega_1|$ and $|\mathrm{sh}(\Omega_2)|\ge|\Omega_2|$. By our definitions we have $\mathrm{sh}(\Omega)\supseteq\big(\mathrm{sh}(\Omega_1)\cup\mathrm{sh}(\Omega_2)\big)\cup\big\{\sigma\cup\{n+1\}:\ \sigma\in\mathrm{sh}(\Omega_1)\cap\mathrm{sh}(\Omega_2)\big\}$, so
\[
|\mathrm{sh}(\Omega)|\ge\big|\mathrm{sh}(\Omega_1)\cup\mathrm{sh}(\Omega_2)\big|+\big|\mathrm{sh}(\Omega_1)\cap\mathrm{sh}(\Omega_2)\big|=|\mathrm{sh}(\Omega_1)|+|\mathrm{sh}(\Omega_2)|\ge|\Omega_1|+|\Omega_2|=|\Omega|.
\]

2.5. Fan and Hilbert–Schmidt. We record for ease of future use the following lemma that controls the influence of multiplication by an orthogonal projection on the Hilbert–Schmidt norm of a linear operator. Its proof is a simple consequence of the classical Fan Maximum Principle [Fan49], but we couldn't locate a reference where it is stated explicitly in the form that we will use later.
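Pajor's inequality $|\mathrm{sh}(\Omega)|\ge|\Omega|$ is easy to confirm exhaustively in low dimension. The following sketch (an illustration, not part of the paper) enumerates all coordinate subsets of $\{-1,1\}^n$ for small $n$ and checks the bound on random subsets of the cube.

```python
import itertools
from random import Random

def sh(omega, n):
    """All coordinate subsets of {0, ..., n-1} shattered by omega."""
    shattered = []
    for r in range(n + 1):
        for sigma in itertools.combinations(range(n), r):
            proj = {tuple(x[i] for i in sigma) for x in omega}
            if len(proj) == 2 ** len(sigma):
                shattered.append(sigma)
    return shattered

rnd = Random(0)
n = 5
cube = list(itertools.product([-1, 1], repeat=n))
for _ in range(25):
    omega = rnd.sample(cube, rnd.randint(1, 2**n))
    # Pajor's inequality: |sh(omega)| >= |omega|.
    assert len(sh(omega, n)) >= len(omega)
```

Note that the empty set is counted as shattered by any nonempty $\Omega$, matching the convention used in the base case of the induction.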
Lemma 17. Fix $m,n\in\mathbb{N}$ and $r\in\{1,\dots,n\}$. Let $A:\mathbb{R}^m\to\mathbb{R}^n$ be a linear operator and let $P:\mathbb{R}^n\to\mathbb{R}^n$ be an orthogonal projection of rank $r$. Then
\[
\|PA\|_{S_2}\ge\Big(\sum_{i=n-r+1}^{m}s_i(A)^2\Big)^{\frac12}.
\]
Proof. Since $I_n-P$ is an orthogonal projection of rank $n-r$, by a classical result of Fan [Fan49],
\[
\mathrm{Tr}\big(AA^*(I_n-P)\big)\le\sum_{i=1}^{n-r}s_i(AA^*)=\sum_{i=1}^{n-r}s_i(A)^2. \tag{25}
\]
The proof of (25) is simple; see e.g. [Stø13] for a short proof and [Bha97, Chapter III] for more general variational principles along these lines. Now, since $P$ is an orthogonal projection,
\[
\|PA\|_{S_2}^2=\mathrm{Tr}\big((PA)^*(PA)\big)=\mathrm{Tr}(A^*PA)=\mathrm{Tr}(AA^*P)=\mathrm{Tr}(AA^*)-\mathrm{Tr}\big(AA^*(I_n-P)\big)=\sum_{i=1}^{m}s_i(A)^2-\mathrm{Tr}\big(AA^*(I_n-P)\big)\stackrel{(25)}{\ge}\sum_{i=1}^{m}s_i(A)^2-\sum_{i=1}^{n-r}s_i(A)^2=\sum_{i=n-r+1}^{m}s_i(A)^2. \qquad\square
\]

3. Proof of Lemma 10

In this section we shall prove Lemma 10 in a more general weighted form that corresponds to the renormalization step in Vershynin's theorem. Using this weighted version of Lemma 10, one can directly deduce weighted versions of Theorem 6 and Theorem 8 as well, by combining Lemma 18 below with Theorem 9, exactly as we did in the Introduction.

Lemma 18 (weighted version of Lemma 10). Fix $r,m,n\in\mathbb{N}$. Let $A:\mathbb{R}^m\to\mathbb{R}^n$ be a linear operator with $\mathrm{rank}(A)\ge r$. For every $\tau\subseteq\{1,\dots,m\}$ let $E_\tau\subseteq\mathbb{R}^n$ be defined as in (9), i.e., it is the orthogonal complement of the span of $\{Ae_j\}_{j\in\tau}\subseteq\mathbb{R}^n$. Then for every $d_1,\dots,d_m\in(0,\infty)$ there exists a subset $\tau\subseteq\{1,\dots,m\}$ with $|\tau|=r$ such that
\[
\forall\, j\in\tau,\qquad \frac{\big\|\mathrm{Proj}_{E_{\tau\smallsetminus\{j\}}}Ae_j\big\|_2}{d_j}\ge\Big(\frac{1}{\sum_{i=1}^{m}d_i^2}\sum_{i=r}^{m}s_i(A)^2\Big)^{\frac12}. \tag{26}
\]
Proof. For every $\tau\subseteq\{1,\dots,m\}$ let $K_\tau\subseteq\mathbb{R}^n$ be the convex hull of the vectors $\{\pm Ae_j/d_j\}_{j\in\tau}$, i.e.,
\[
K_\tau=\mathrm{conv}\Big(\Big\{\frac{Ae_j}{d_j}:\ j\in\tau\Big\}\cup\Big\{-\frac{Ae_j}{d_j}:\ j\in\tau\Big\}\Big). \tag{27}
\]
The desired subset $\tau\subseteq\{1,\dots,m\}$ will be chosen so as to maximize the $r$-dimensional volume of $K_\sigma$ over all those subsets $\sigma$ of $\{1,\dots,m\}$ of size $r$. Namely, we shall fix from now on a subset $\tau\subseteq\{1,\dots,m\}$ with $|\tau|=r$ such that
\[
\mathrm{vol}_r(K_\tau)=\max_{\substack{\sigma\subseteq\{1,\dots,m\}\\ |\sigma|=r}}\mathrm{vol}_r(K_\sigma). \tag{28}
\]
Take any $\beta\subseteq\{1,\dots,m\}$ with $|\beta|=r-1$ and fix $i\in\{1,\dots,m\}\smallsetminus\beta$.
Then by the definition (27) we have $K_{\beta\cup\{i\}}=\mathrm{conv}\big(\{\pm Ae_i/d_i\}\cup K_\beta\big)$, i.e., $K_{\beta\cup\{i\}}$ is the union of the two cones with base $K_\beta$ and apexes at $\pm Ae_i/d_i$. Recalling (9), note that $K_\beta\subseteq\mathrm{span}(K_\beta)=E_\beta^{\perp}$. Hence, the height of each of these two cones equals the Euclidean length of the orthogonal projection of $Ae_i/d_i$ onto $E_\beta$. Therefore,
\[
\mathrm{vol}_r\big(K_{\beta\cup\{i\}}\big)=\frac{2\big\|\mathrm{Proj}_{E_\beta}Ae_i\big\|_2}{rd_i}\,\mathrm{vol}_{r-1}(K_\beta). \tag{29}
\]
Returning to the subset $\tau$ that was chosen in (28), we see that if $j\in\tau$ and $i\in\{1,\dots,m\}$ then
\[
\frac{2\big\|\mathrm{Proj}_{E_{\tau\smallsetminus\{j\}}}Ae_j\big\|_2}{rd_j}\,\mathrm{vol}_{r-1}\big(K_{\tau\smallsetminus\{j\}}\big)\stackrel{(29)}{=}\mathrm{vol}_r(K_\tau)\stackrel{(28)}{\ge}\mathrm{vol}_r\big(K_{(\tau\smallsetminus\{j\})\cup\{i\}}\big)\stackrel{(29)}{=}\frac{2\big\|\mathrm{Proj}_{E_{\tau\smallsetminus\{j\}}}Ae_i\big\|_2}{rd_i}\,\mathrm{vol}_{r-1}\big(K_{\tau\smallsetminus\{j\}}\big). \tag{30}
\]
Since we are assuming that $r\le\mathrm{rank}(A)$, we know that $\mathrm{vol}_r(K_\tau)>0$. It therefore follows from (30) that also $\mathrm{vol}_{r-1}\big(K_{\tau\smallsetminus\{j\}}\big)>0$, so we may cancel the quantity $2\,\mathrm{vol}_{r-1}\big(K_{\tau\smallsetminus\{j\}}\big)/r$ from both sides of (30). Since the resulting estimate holds true for every $i\in\{1,\dots,m\}$, we conclude that
\[
\forall\, j\in\tau,\qquad \frac{\big\|\mathrm{Proj}_{E_{\tau\smallsetminus\{j\}}}Ae_j\big\|_2}{d_j}=\max_{i\in\{1,\dots,m\}}\frac{\big\|\mathrm{Proj}_{E_{\tau\smallsetminus\{j\}}}Ae_i\big\|_2}{d_i}. \tag{31}
\]
Consequently, for every $j\in\tau$ we have
\[
\frac{\big\|\mathrm{Proj}_{E_{\tau\smallsetminus\{j\}}}Ae_j\big\|_2^2}{d_j^2}\stackrel{(31)}{\ge}\frac{1}{\sum_{i=1}^{m}d_i^2}\sum_{i=1}^{m}\big\|\mathrm{Proj}_{E_{\tau\smallsetminus\{j\}}}Ae_i\big\|_2^2=\frac{\big\|\mathrm{Proj}_{E_{\tau\smallsetminus\{j\}}}A\big\|_{S_2}^2}{\sum_{i=1}^{m}d_i^2}. \tag{32}
\]
Recalling (9), since $|\tau|=r$ we know that $\dim\big(E_{\tau\smallsetminus\{j\}}\big)=n-(r-1)$ for every $j\in\tau$. Consequently, $\mathrm{Proj}_{E_{\tau\smallsetminus\{j\}}}:\mathbb{R}^n\to\mathbb{R}^n$ is an orthogonal projection of rank $n-(r-1)$, so the desired inequality (26) follows from (32) and Lemma 17. $\square$

4. Giannopoulos

In this section we shall prove Theorem 9, following the lines of a clever iterative procedure that was devised by Giannopoulos in [Gia96]. Throughout the ensuing discussion, we may assume in the setting of Theorem 9 that $\omega=\{1,\dots,m\}$, in which case $\mathrm{rank}(A)=m$. Indeed, there is no loss of generality by doing so because for general $\omega\subseteq\{1,\dots,m\}$ we could then consider the restricted operator $AJ_\omega:\mathbb{R}^\omega\to\mathbb{R}^n$ in order to obtain Theorem 9 as stated in the Introduction.

Proof overview. The overall strategy of the ensuing proof can be explained in broad strokes given the tools that were already presented in Section 2. The ultimate goal of Theorem 9 is to obtain an upper bound on the operator norm $S_\infty$ of a certain $m$ by $n$ matrix (the inverse of an appropriate coordinate restriction of the given $n$ by $m$ matrix $A$), while we have already seen in Lemma 15 that, if one does not mind composing with a further coordinate projection, then such a bound follows automatically from a weaker upper estimate on the operator norm $\ell_2^n\to\ell_1^m$.
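The volume-maximization argument of Lemma 18 can be tested by brute force on a small instance. In this hedged sketch (an illustration, not part of the paper; numpy and the tiny dimensions are assumptions) we use the fact that $\mathrm{vol}_r(K_\tau)^2$ is proportional to the determinant of the Gram matrix of the columns $Ae_j/d_j$, $j\in\tau$, enumerate all subsets of size $r$, and verify the conclusion (26) for the maximizer.

```python
import itertools

import numpy as np

# Hypothetical instance; sizes kept tiny so that all subsets can be enumerated.
rng = np.random.default_rng(3)
m, n, r = 6, 5, 3
A = rng.standard_normal((n, m))  # rank(A) = 5 >= r almost surely
d = 0.5 + rng.random(m)
C = A / d  # column j is A e_j / d_j

def gram_det(tau):
    # vol_r(K_tau)^2 is proportional to the Gram determinant of the columns.
    B = C[:, list(tau)]
    return np.linalg.det(B.T @ B)

tau = max(itertools.combinations(range(m), r), key=gram_det)

def proj_orth_complement(cols, v):
    # Orthogonal projection of v onto the orthocomplement of span(cols).
    Q, _ = np.linalg.qr(cols)
    return v - Q @ (Q.T @ v)

s = np.linalg.svd(A, compute_uv=False)
bound = np.sqrt(np.sum(s[r - 1 :] ** 2) / np.sum(d**2))  # RHS of (26)
for j in tau:
    rest = [i for i in tau if i != j]
    lhs = np.linalg.norm(proj_orth_complement(A[:, rest], A[:, j])) / d[j]
    assert lhs >= bound - 1e-9
```

Enumerating Gram determinants is exactly the exhaustive analogue of the choice (28); for large $m$ one would of course not enumerate, which is why the lemma is an existence statement rather than an algorithm.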
The latter quantity can be controlled using the Sauer–Shelah lemma, due to the following reasoning. Let $\{v_j\}_{j=1}^m$ be the dual basis of $\{Ae_j\}_{j=1}^m$ that is given in (15). Consider the subset $\Omega$ of the hypercube $\{-1,1\}^m$ consisting of all those sign vectors $\varepsilon=(\varepsilon_1,\dots,\varepsilon_m)$ for which the Euclidean norm $\|\sum_{j=1}^m\varepsilon_jv_j\|_2$ is not too large, with the precise meaning of "not too large" to be specified in the proof of Lemma 19 below; see (37). The parallelogram identity says that if $\varepsilon\in\{-1,1\}^m$ is chosen uniformly at random then the expectation of $\|\sum_{j=1}^m\varepsilon_jv_j\|_2^2$ equals $\sum_{j=1}^m\|v_j\|_2^2$. So, by Markov's inequality, an appropriate setting of the parameters would yield that the cardinality of $\Omega$ is greater than $2^{m-1}=|\{-1,1\}^m|/2$. The Sauer–Shelah lemma would then furnish a coordinate subset $\beta\subseteq\{1,\dots,m\}$ with the property that every sign pattern $(\varepsilon_j)_{j\in\beta}\in\{-1,1\}^\beta$ can be completed to a full dimensional sign vector $\varepsilon\in\{-1,1\}^m$ such that $\sum_{j=1}^m\varepsilon_jv_j$ is short in the Euclidean norm.
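The two elementary facts driving this paragraph, the dual-basis relation $\langle v_i,Ae_j\rangle=\delta_{ij}$ and the parallelogram identity combined with Markov's inequality, can be checked exactly on a small random example. This is a hedged illustration (not from the paper; numpy and the dimensions are assumptions), with $v_j=\mathrm{Proj}_{F_j}Ae_j/\|\mathrm{Proj}_{F_j}Ae_j\|_2^2$ computed via QR.

```python
import itertools

import numpy as np

# Hypothetical small instance with linearly independent columns A e_j.
rng = np.random.default_rng(4)
m, n = 5, 7
A = rng.standard_normal((n, m))

# Dual basis v_j = Proj_{F_j} A e_j / ||Proj_{F_j} A e_j||^2.
V = np.zeros((n, m))
for j in range(m):
    rest = [i for i in range(m) if i != j]
    Q, _ = np.linalg.qr(A[:, rest])
    w = A[:, j] - Q @ (Q.T @ A[:, j])  # Proj_{F_j} A e_j
    V[:, j] = w / np.linalg.norm(w) ** 2
assert np.allclose(V.T @ A, np.eye(m))  # <v_i, A e_j> = delta_ij

# Parallelogram identity: the average of ||sum_j eps_j v_j||^2 over all signs
# equals sum_j ||v_j||^2, so at least half of the sign vectors are "short".
norms = [
    np.linalg.norm(V @ np.array(e)) ** 2
    for e in itertools.product([-1.0, 1.0], repeat=m)
]
total = np.linalg.norm(V, "fro") ** 2
assert abs(np.mean(norms) - total) < 1e-8
assert sum(x <= 2 * total for x in norms) >= 2 ** (m - 1)
```

The last assertion is the Markov-inequality count used above: the sign vectors whose combination has squared length at most twice the average form more than half of the cube.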
The above conclusion implies an upper bound on the operator norm of the inverse of the restriction of $A$ to $\mathbb{R}^\beta$, when that inverse is viewed as an operator from $A(\mathbb{R}^\beta)\subseteq\ell_2^n$ to $\ell_1^\beta$. Indeed, given an arbitrary vector $(a_j)_{j\in\beta}\in\mathbb{R}^\beta$, the goal is to bound $\sum_{j\in\beta}|a_j|$ in terms of $\|\sum_{j\in\beta}a_jAe_j\|_2$. The sign pattern to be considered is then the signs of the coefficients $(a_j)_{j\in\beta}\in\mathbb{R}^\beta$, i.e., set $\varepsilon_j=\mathrm{sign}(a_j)$ for every $j\in\beta$. The (Sauer–Shelah) subset $\beta\subseteq\{1,\dots,m\}$ was constructed so that this sign vector can be completed to a full dimensional sign vector $\varepsilon\in\{-1,1\}^m$ with control on the Euclidean length of $\sum_{j=1}^m\varepsilon_jv_j$. But $\{v_j\}_{j=1}^m$ is a dual basis of $\{Ae_j\}_{j=1}^m$, so by the definition of $(\varepsilon_j)_{j\in\beta}$ the quantity $\sum_{j\in\beta}|a_j|$ is equal to the scalar product of $\sum_{j\in\beta}a_jAe_j$ with the short vector $\sum_{j=1}^m\varepsilon_jv_j$. By Cauchy–Schwarz this scalar product is bounded from above by the Euclidean length of $\sum_{j\in\beta}a_jAe_j$ times the Euclidean length of $\sum_{j=1}^m\varepsilon_jv_j$, with the latter quantity being bounded above by design. By Lemma 15 we can now pass to a further subset of $\beta$ and compose the resulting inverse matrix with the coordinate projection onto that subset, so as to upgrade this control on the $\ell_2^n\to\ell_1^\beta$ operator norm to a better upper bound on the $S_\infty$ norm.

Complications arise when one examines the above strategy from the quantitative perspective. The Sauer–Shelah lemma can at best produce a coordinate subset of size $m/2$, while we desire to obtain restricted invertibility on a potentially larger subset. Moreover, in the above procedure the Sauer–Shelah subset is further reduced in size due to the subsequent use of Lemma 15. Since we desire to extract larger coordinate subsets, one can attempt to apply this reasoning iteratively, i.e., start by using the Sauer–Shelah lemma to obtain a coordinate subset, followed by an application of Lemma 15 to pass to a further subset $\beta_1\subseteq\{1,\dots,m\}$.
Now apply the same double selection procedure to $\{1,\dots,m\}\smallsetminus\beta_1$, thus obtaining a subset $\beta_2\subseteq\{1,\dots,m\}\smallsetminus\beta_1$, and iterate this procedure by now considering $\{1,\dots,m\}\smallsetminus(\beta_1\cup\beta_2)$, and so forth. To make this strategy work, one needs to formulate a stronger inductive hypothesis so as to allow one to glue the local information on the subsets that are extracted in each step of the iteration into global information on their union, while ensuring that the end result is a sufficiently large coordinate subset. This is the reason why the assumptions of Lemma 19 below are more complicated. The technical details that implement the above strategy are explained in the remainder of this section.

Lemma 19. Fix $n\in\mathbb{N}$ and $m\in\{1,\dots,n\}$. Let $A:\mathbb{R}^m\to\mathbb{R}^n$ be a linear operator such that the vectors $\{Ae_j\}_{j=1}^m\subseteq\mathbb{R}^n$ are linearly independent. Suppose that $k\in\mathbb{N}\cup\{0\}$ and $\sigma\subseteq\{1,\dots,m\}$. For $j\in\{1,\dots,m\}$ recall the definition of the subspace $F_j\subseteq\mathbb{R}^n$ in (7) (with $\omega=\{1,\dots,m\}$), i.e.,
\[
F_j=\Big(\mathrm{span}\big(\{Ae_i\}_{i\in\{1,\dots,m\}\smallsetminus\{j\}}\big)\Big)^{\perp}.
\]
Then there exists $\tau\subseteq\sigma$ with $|\tau|\ge(1-2^{-k})|\sigma|$ such that for every $\vartheta\subseteq\{1,\dots,m\}$ that satisfies $\vartheta\supseteq\tau$ and every $a=(a_1,\dots,a_m)\in\mathbb{R}^m$ we have
\[
\sum_{i\in\tau}|a_i|\le\Big(\sum_{r=1}^{k}2^{r/2}\Big)\frac{\sqrt{|\sigma|}}{\min_{j\in\{1,\dots,m\}}\|\mathrm{Proj}_{F_j}Ae_j\|_2}\Big\|\sum_{i\in\vartheta}a_iAe_i\Big\|_2+\big(2^k-1\big)\sum_{i\in\vartheta\cap(\sigma\smallsetminus\tau)}|a_i|. \tag{33}
\]
Proof. It will be convenient to introduce the following notation.
\[
M=\max_{j\in\{1,\dots,m\}}\frac{1}{\|\mathrm{Proj}_{F_j}Ae_j\|_2}\qquad\text{and}\qquad\alpha_k=\sum_{r=1}^{k}2^{r/2}. \tag{34}
\]
Throughout we adhere to the convention that an empty sum vanishes, thus in particular $\alpha_0=0$. Under the notation (34), our goal becomes to show that there exists $\tau\subseteq\sigma$ with $|\tau|\ge(1-2^{-k})|\sigma|$ such that for every $\vartheta\subseteq\{1,\dots,m\}$ that satisfies $\vartheta\supseteq\tau$ and every $a\in\mathbb{R}^m$ we have
\[
\sum_{i\in\tau}|a_i|\le\alpha_kM\sqrt{|\sigma|}\,\Big\|\sum_{i\in\vartheta}a_iAe_i\Big\|_2+\big(2^k-1\big)\sum_{i\in\vartheta\cap(\sigma\smallsetminus\tau)}|a_i|. \tag{35}
\]
We shall prove this statement by induction on $k$. The case $k=0$ holds vacuously by taking $\tau=\emptyset$. Assuming the validity of this statement for $k$, we shall proceed to deduce its validity for $k+1$. We are given $\tau\subseteq\sigma$ with $|\tau|\ge(1-2^{-k})|\sigma|$ such that for every $\vartheta\subseteq\{1,\dots,m\}$ that satisfies $\vartheta\supseteq\tau$ we know that (35) holds true for every $a\in\mathbb{R}^m$. Observe that if $\tau=\sigma$ then $\tau$ itself would satisfy the required statement for $k+1$, so we may assume from now on that $\sigma\smallsetminus\tau\neq\emptyset$. For every $j\in\{1,\dots,m\}$ let $v_j$ be given as in (15), i.e.,
\[
v_j=\frac{\mathrm{Proj}_{F_j}Ae_j}{\|\mathrm{Proj}_{F_j}Ae_j\|_2^2}\in\mathbb{R}^n. \tag{36}
\]
Observe that the denominator in (36) (and also in (33) and (34)) does not vanish since we are assuming in Lemma 19 that $\{Ae_j\}_{j=1}^m$ are linearly independent. Define $\Omega\subseteq\{-1,1\}^{\sigma\smallsetminus\tau}$ as follows.
\[
\Omega=\Big\{\varepsilon\in\{-1,1\}^{\sigma\smallsetminus\tau}:\ \Big\|\sum_{i\in\sigma\smallsetminus\tau}\varepsilon_iv_i\Big\|_2\le M\sqrt{2|\sigma\smallsetminus\tau|}\Big\}. \tag{37}
\]
By the parallelogram identity we have
\[
M^2|\sigma\smallsetminus\tau|\stackrel{(34)\wedge(36)}{\ge}\sum_{i\in\sigma\smallsetminus\tau}\|v_i\|_2^2=\frac{1}{2^{|\sigma\smallsetminus\tau|}}\sum_{\varepsilon\in\{-1,1\}^{\sigma\smallsetminus\tau}}\Big\|\sum_{i\in\sigma\smallsetminus\tau}\varepsilon_iv_i\Big\|_2^2\ge\frac{1}{2^{|\sigma\smallsetminus\tau|}}\sum_{\substack{\varepsilon\in\{-1,1\}^{\sigma\smallsetminus\tau}\\ \varepsilon\notin\Omega}}\Big\|\sum_{i\in\sigma\smallsetminus\tau}\varepsilon_iv_i\Big\|_2^2\stackrel{(37)}{\ge}\frac{2^{|\sigma\smallsetminus\tau|}-|\Omega|}{2^{|\sigma\smallsetminus\tau|}}\cdot 2M^2|\sigma\smallsetminus\tau|. \tag{38}
\]
Since $|\sigma\smallsetminus\tau|>0$, it follows from (38) that $|\Omega|>2^{|\sigma\smallsetminus\tau|-1}$. We can now apply the Sauer–Shelah lemma, i.e., Lemma 16, thus deducing that there exists a subset $\beta\subseteq\sigma\smallsetminus\tau$ with $|\beta|\ge|\sigma\smallsetminus\tau|/2$ such that $\mathrm{Proj}_{\mathbb{R}^\beta}\Omega=\{-1,1\}^\beta$. Defining $\tau'=\tau\cup\beta$, we shall now proceed to show that $\tau'$ satisfies the inductive hypothesis with $k$ replaced by $k+1$. Since $\beta\cap\tau=\emptyset$, $\tau\subseteq\sigma$ and $|\beta|\ge|\sigma\smallsetminus\tau|/2$ we have
\[
|\tau'|=|\tau|+|\beta|\ge|\tau|+\frac{|\sigma|-|\tau|}{2}=\frac{|\tau|+|\sigma|}{2}\ge\frac{(1-2^{-k})|\sigma|+|\sigma|}{2}=\big(1-2^{-(k+1)}\big)|\sigma|. \tag{39}
\]
Next, suppose that $\vartheta\subseteq\{1,\dots,m\}$ satisfies $\vartheta\supseteq\tau'$. If $a\in\mathbb{R}^m$ then, because $\mathrm{Proj}_{\mathbb{R}^\beta}\Omega=\{-1,1\}^\beta$, there exists $\varepsilon\in\Omega$ such that $\varepsilon_j=\mathrm{sign}(a_j)$ for every $j\in\beta$. The fact that $\varepsilon\in\Omega$ means that
\[
\Big\|\sum_{i\in\sigma\smallsetminus\tau}\varepsilon_iv_i\Big\|_2\le M\sqrt{2|\sigma\smallsetminus\tau|}\le 2^{\frac{1-k}{2}}M\sqrt{|\sigma|}, \tag{40}
\]
where in the last step of (40) we used the fact that $|\tau|\ge(1-2^{-k})|\sigma|$, so that $|\sigma\smallsetminus\tau|\le 2^{-k}|\sigma|$. The definition (36) of $\{v_j\}_{j=1}^m$ implies that $\langle v_i,Ae_j\rangle=\delta_{ij}$ for every $i,j\in\{1,\dots,m\}$. Hence,
\[
\sum_{i\in\beta}|a_i|=\sum_{i\in\beta}\varepsilon_ia_i=\Big\langle\sum_{i\in\vartheta}a_iAe_i,\sum_{i\in\sigma\smallsetminus\tau}\varepsilon_iv_i\Big\rangle-\sum_{i\in\vartheta\cap(\sigma\smallsetminus\tau')}\varepsilon_ia_i\le\Big\|\sum_{i\in\vartheta}a_iAe_i\Big\|_2\Big\|\sum_{i\in\sigma\smallsetminus\tau}\varepsilon_iv_i\Big\|_2+\sum_{i\in\vartheta\cap(\sigma\smallsetminus\tau')}|a_i|\stackrel{(40)}{\le}2^{\frac{1-k}{2}}M\sqrt{|\sigma|}\,\Big\|\sum_{i\in\vartheta}a_iAe_i\Big\|_2+\sum_{i\in\vartheta\cap(\sigma\smallsetminus\tau')}|a_i|. \tag{41}
\]
The penultimate step of (41) uses the Cauchy–Schwarz inequality and the fact that, by the definition of $\tau'$, we have $(\vartheta\smallsetminus\beta)\cap(\sigma\smallsetminus\tau)=\vartheta\cap(\sigma\smallsetminus\tau')$. Now,
\[
\sum_{i\in\tau'}|a_i|=\sum_{i\in\tau}|a_i|+\sum_{i\in\beta}|a_i|\stackrel{(35)}{\le}\alpha_kM\sqrt{|\sigma|}\,\Big\|\sum_{i\in\vartheta}a_iAe_i\Big\|_2+\big(2^k-1\big)\sum_{i\in\vartheta\cap(\sigma\smallsetminus\tau)}|a_i|+\sum_{i\in\beta}|a_i|=\alpha_kM\sqrt{|\sigma|}\,\Big\|\sum_{i\in\vartheta}a_iAe_i\Big\|_2+\big(2^k-1\big)\sum_{i\in\vartheta\cap(\sigma\smallsetminus\tau')}|a_i|+2^k\sum_{i\in\beta}|a_i|, \tag{42}
\]
where for the last step of (42) recall that $\vartheta\cap(\sigma\smallsetminus\tau)=\big(\vartheta\cap(\sigma\smallsetminus\tau')\big)\cup\beta$, and that this union is disjoint. It remains to combine (41) and (42) to deduce that
\[
\sum_{i\in\tau'}|a_i|\le\Big(\alpha_k+2^{\frac{k+1}{2}}\Big)M\sqrt{|\sigma|}\,\Big\|\sum_{i\in\vartheta}a_iAe_i\Big\|_2+\big(2^{k+1}-1\big)\sum_{i\in\vartheta\cap(\sigma\smallsetminus\tau')}|a_i|. \tag{43}
\]
Recalling the definition of $\alpha_k$ in (34), we have $\alpha_{k+1}=\alpha_k+2^{(k+1)/2}$, so the validity of (39) and (43) completes the proof that $\tau'$ satisfies the inductive hypothesis with $k$ replaced by $k+1$. $\square$

Lemma 20. Fix $m,n,t\in\mathbb{N}$ and $\beta\subseteq\{1,\dots,m\}$. Let $A:\mathbb{R}^m\to\mathbb{R}^n$ be a linear operator such that the vectors $\{Ae_j\}_{j=1}^m\subseteq\mathbb{R}^n$ are linearly independent. Then there exist two subsets $\sigma,\tau\subseteq\beta$ satisfying $\sigma\subseteq\tau$, $|\tau|\ge(1-2^{-t})|\beta|$ and $|\tau\smallsetminus\sigma|\le|\beta|/4$, such that if we denote $\vartheta=\tau\cup\big(\{1,\dots,m\}\smallsetminus\beta\big)$ then
\[
\big\|\mathrm{Proj}_{\mathbb{R}^\sigma}(AJ_\vartheta)^{-1}\big\|_{S_\infty}\le 2^{\frac{t+5}{2}}\sqrt{\pi}\max_{j\in\{1,\dots,m\}}\frac{1}{\|\mathrm{Proj}_{F_j}Ae_j\|_2},
\]
where we recall that the definition of the subspace $F_j\subseteq\mathbb{R}^n$ is given in (7).

Proof. An application of Lemma 19 with $\sigma=\beta$ and $k=t$ produces $\tau\subseteq\beta$ with $|\tau|\ge(1-2^{-t})|\beta|$ such that if we choose $\vartheta=\tau\cup(\{1,\dots,m\}\smallsetminus\beta)$ in (33), and continue with the notation in (34), then
\[
\forall\, a\in\mathbb{R}^m,\qquad \sum_{i\in\tau}|a_i|\le\alpha_tM\sqrt{|\beta|}\,\Big\|\sum_{i\in\vartheta}a_iAe_i\Big\|_2\le 2^{\frac{t}{2}+2}M\sqrt{|\beta|}\,\Big\|\sum_{i\in\vartheta}a_iAe_i\Big\|_2. \tag{44}
\]
Note that the above choice of $\vartheta$ makes the second term in the right hand side of (33) vanish (since $\vartheta\cap(\beta\smallsetminus\tau)=\emptyset$), and this is the only way by which (33) will be used here. However, the more complicated form of (33) was needed in Lemma 19 to allow for the inductive construction to go through. In the last step of (44) we used $\alpha_t=\sum_{r=1}^t2^{r/2}\le 2^{t/2}\sqrt{2}/(\sqrt{2}-1)\le 2^{t/2+2}$. A different way to state (44) is the following operator norm bound.
\[
\big\|\mathrm{Proj}_{\mathbb{R}^\tau}(AJ_\vartheta)^{-1}\big\|_{\ell_2^n\to\ell_1^\tau}\le 2^{\frac{t}{2}+2}M\sqrt{|\beta|}.
\]
Since $|\tau|\ge(1-2^{-t})|\beta|\ge|\beta|/2$, if we set $\varepsilon=|\beta|/(4|\tau|)$ then $\varepsilon\in(0,1/2]$. We are therefore in position to use Lemma 15, thus producing a subset $\sigma\subseteq\tau$ with $|\tau\smallsetminus\sigma|\le\varepsilon|\tau|=|\beta|/4$ such that
\[
\big\|\mathrm{Proj}_{\mathbb{R}^\sigma}(AJ_\vartheta)^{-1}\big\|_{S_\infty}=\big\|\mathrm{Proj}_{\mathbb{R}^\sigma}\mathrm{Proj}_{\mathbb{R}^\tau}(AJ_\vartheta)^{-1}\big\|_{S_\infty}\le\sqrt{\frac{\pi}{2\varepsilon|\tau|}}\cdot 2^{\frac{t}{2}+2}M\sqrt{|\beta|}=2^{\frac{t+5}{2}}\sqrt{\pi}\,M. \qquad\square
\]

Proof of Theorem 9. Recall that, in the setting of Theorem 9, we are currently assuming without loss of generality that $\omega=\{1,\dots,m\}$. Choose $r\in\mathbb{N}\cup\{0\}$ such that
\[
2^{r}\le\frac{m}{m-k}\le 2^{r+1}. \tag{45}
\]
Denote $\tau_0=\{1,\dots,m\}$ and $\sigma_0=\emptyset$. We shall construct by induction on $u\in\{0,\dots,r+1\}$ two subsets $\sigma_u,\tau_u\subseteq\{1,\dots,m\}$ such that, if we denote $\beta_u=\tau_u\smallsetminus\sigma_u$ and
\[
\forall\, u\in\{1,\dots,r+1\},\qquad \vartheta_u=\tau_u\cup\big(\{1,\dots,m\}\smallsetminus\beta_{u-1}\big), \tag{46}
\]
then the following properties hold true for every $u\in\{1,\dots,r+1\}$.
a) $\sigma_u\subseteq\tau_u\subseteq\beta_{u-1}$.
b) $|\tau_u|\ge\big(1-2^{-(r-u+4)}\big)|\beta_{u-1}|$ and $|\beta_u|\le|\beta_{u-1}|/4$.
c) $\big\|\mathrm{Proj}_{\mathbb{R}^{\sigma_u}}(AJ_{\vartheta_u})^{-1}\big\|_{S_\infty}\le 2^{\frac{r-u+9}{2}}\sqrt{\pi}\,M$, where $M$ is defined in (34).
Indeed, assuming inductively that $\sigma_{u-1},\tau_{u-1}$ have been constructed, the existence of sets $\sigma_u,\tau_u$ with the desired properties follows from an application of Lemma 20 with $\beta=\beta_{u-1}$ and $t=r-u+4$. Recalling (46), by a) we have $\beta_u=(\beta_{u-1}\smallsetminus\sigma_u)\smallsetminus(\beta_{u-1}\smallsetminus\tau_u)$ for every $u\in\{1,\dots,r+1\}$. Hence,
\[
|\sigma_u|=|\beta_{u-1}|-|\beta_u|-|\beta_{u-1}\smallsetminus\tau_u|\ge|\beta_{u-1}|-|\beta_u|-\frac{|\beta_{u-1}|}{2^{r-u+4}}\ge|\beta_{u-1}|-|\beta_u|-\frac{m}{2^{r+u+2}}, \tag{47}
\]
where the penultimate inequality in (47) uses the first assertion in b), and the final inequality in (47) uses the fact that, by induction, the second assertion in b) implies that $|\beta_{u-1}|\le m/4^{u-1}$ (since $\beta_0=\{1,\dots,m\}$). Observe that the sets $\{\sigma_u\}_{u=1}^{r+1}$ are pairwise disjoint, so if we denote
\[
\sigma=\bigcup_{u=1}^{r+1}\sigma_u, \tag{48}
\]
then
\[
|\sigma|=\sum_{u=1}^{r+1}|\sigma_u|\stackrel{(47)}{\ge}|\beta_0|-|\beta_{r+1}|-\sum_{u=1}^{r+1}\frac{m}{2^{r+u+2}}\ge m-\frac{m}{4^{r+1}}-\frac{m}{2^{r+2}}\ge m-\frac{m}{2^{r+1}}\stackrel{(45)}{\ge}k. \tag{49}
\]
Next, recalling the definition of $\vartheta_u$ in (46), observe that
\[
\sigma\subseteq\bigcap_{u=1}^{r+1}\vartheta_u. \tag{50}
\]
Indeed, in order to verify the validity of (50) note that due to a) we have $\sigma_u,\sigma_{u+1},\dots,\sigma_{r+1}\subseteq\tau_u$ and $\sigma_1,\dots,\sigma_{u-1}\subseteq\{1,\dots,m\}\smallsetminus\beta_{u-1}$ for every $u\in\{1,\dots,r+1\}$. It follows from (50) that if $a\in\mathbb{R}^\sigma$ then for every $u\in\{1,\dots,r+1\}$ we have $J_\sigma a\in J_{\vartheta_u}(\mathbb{R}^{\vartheta_u})\subseteq\mathbb{R}^m$. Consequently, $\mathrm{Proj}_{\mathbb{R}^{\sigma_u}}J_\sigma a=\mathrm{Proj}_{\mathbb{R}^{\sigma_u}}(AJ_{\vartheta_u})^{-1}(AJ_\sigma)a$. We therefore have the following estimate.
\[
\|J_\sigma a\|_2\stackrel{(48)}{=}\Big(\sum_{u=1}^{r+1}\big\|\mathrm{Proj}_{\mathbb{R}^{\sigma_u}}J_\sigma a\big\|_2^2\Big)^{\frac12}=\Big(\sum_{u=1}^{r+1}\big\|\mathrm{Proj}_{\mathbb{R}^{\sigma_u}}(AJ_{\vartheta_u})^{-1}(AJ_\sigma)a\big\|_2^2\Big)^{\frac12}\stackrel{\mathrm{c)}}{\le}\Big(\sum_{u=1}^{r+1}2^{r-u+9}\pi M^2\Big)^{\frac12}\big\|(AJ_\sigma)a\big\|_2\le 2^{\frac{r+9}{2}}\sqrt{\pi}\,M\big\|(AJ_\sigma)a\big\|_2\stackrel{(45)}{\le}2^{\frac{9}{2}}\sqrt{\frac{\pi m}{m-k}}\,M\big\|(AJ_\sigma)a\big\|_2. \tag{51}
\]
Recalling the definition of $M$ in (34), since (51) holds true for every $a\in\mathbb{R}^\sigma$, we conclude that
\[
\big\|(AJ_\sigma)^{-1}\big\|_{S_\infty}\le 2^{\frac{9}{2}}\sqrt{\frac{\pi m}{m-k}}\max_{j\in\{1,\dots,m\}}\frac{1}{\|\mathrm{Proj}_{F_j}Ae_j\|_2}. \tag{52}
\]
This is the desired estimate (8), which, together with (49), concludes the proof of Theorem 9. $\square$

4.1. Geometric interpretation of Theorem 9. The following theorem is a result of Giannopoulos [Gia96]. It can be viewed as a geometric analogue of the Sauer–Shelah lemma for ellipsoids.
The (rough) analogy between the two results is that they both assert that certain large subsets of $\mathbb{R}^n$ must admit a large rank coordinate projection that contains a certain canonical shape (a full hypercube in the Sauer–Shelah case, and a large Euclidean ball in Giannopoulos' case). A different geometric analogue of the Sauer–Shelah lemma was proved by Szarek and Talagrand in [ST89].
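Stepping back from the geometric interpretation, the quantitative content of Theorem 9 can be illustrated end to end on a small random instance. This is a hedged sketch (not part of the paper; numpy, the dimensions, and the explicit constant $2^{9/2}\sqrt{\pi}$ from the reconstruction above are all assumptions of the illustration): brute-force the $k$-column restriction of $A$ with the largest smallest singular value, and compare the norm of its inverse with the Theorem 9 style bound $2^{9/2}\sqrt{\pi m/(m-k)}\max_j\|\mathrm{Proj}_{F_j}Ae_j\|_2^{-1}$.

```python
import itertools
import math

import numpy as np

# Hypothetical instance of restricted invertibility with omega = {1,...,m}.
rng = np.random.default_rng(5)
m, n, k = 6, 8, 4
A = rng.standard_normal((n, m))

def smallest_singular(cols):
    return np.linalg.svd(A[:, list(cols)], compute_uv=False)[-1]

# Brute-force the best k-column restriction; Theorem 9 guarantees that the
# best subset does at least as well as the iteratively constructed one.
sigma = max(itertools.combinations(range(m), k), key=smallest_singular)
inv_norm = 1.0 / smallest_singular(sigma)  # ||(A J_sigma)^{-1}||_{S_infty}

def proj_fj_norm(j):
    # ||Proj_{F_j} A e_j||, with F_j the orthocomplement of span{A e_i : i != j}.
    rest = [i for i in range(m) if i != j]
    Q, _ = np.linalg.qr(A[:, rest])
    return np.linalg.norm(A[:, j] - Q @ (Q.T @ A[:, j]))

M = max(1.0 / proj_fj_norm(j) for j in range(m))
bound = 2**4.5 * math.sqrt(math.pi * m / (m - k)) * M
assert inv_norm <= bound
```

The gap between `inv_norm` and `bound` is typically large, which reflects the fact that the theorem is a worst-case guarantee with a universal constant rather than a sharp per-instance estimate.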
ECE 83 Fall 2 Statistical Signal Processing instructor: R Nowak Lecture 3: Review of Linear Algebra Very often in this course we will represent signals as vectors and operators (eg, filters, transforms,
More informationAn introduction to some aspects of functional analysis
An introduction to some aspects of functional analysis Stephen Semmes Rice University Abstract These informal notes deal with some very basic objects in functional analysis, including norms and seminorms
More informationFREE PROBABILITY THEORY
FREE PROBABILITY THEORY ROLAND SPEICHER Lecture 4 Applications of Freeness to Operator Algebras Now we want to see what kind of information the idea can yield that free group factors can be realized by
More informationEstimates for probabilities of independent events and infinite series
Estimates for probabilities of independent events and infinite series Jürgen Grahl and Shahar evo September 9, 06 arxiv:609.0894v [math.pr] 8 Sep 06 Abstract This paper deals with finite or infinite sequences
More informationLecture 3: Review of Linear Algebra
ECE 83 Fall 2 Statistical Signal Processing instructor: R Nowak, scribe: R Nowak Lecture 3: Review of Linear Algebra Very often in this course we will represent signals as vectors and operators (eg, filters,
More informationLinear Algebra, Summer 2011, pt. 2
Linear Algebra, Summer 2, pt. 2 June 8, 2 Contents Inverses. 2 Vector Spaces. 3 2. Examples of vector spaces..................... 3 2.2 The column space......................... 6 2.3 The null space...........................
More informationBanach Spaces V: A Closer Look at the w- and the w -Topologies
BS V c Gabriel Nagy Banach Spaces V: A Closer Look at the w- and the w -Topologies Notes from the Functional Analysis Course (Fall 07 - Spring 08) In this section we discuss two important, but highly non-trivial,
More informationThe Dual Lattice, Integer Linear Systems and Hermite Normal Form
New York University, Fall 2013 Lattices, Convexity & Algorithms Lecture 2 The Dual Lattice, Integer Linear Systems and Hermite Normal Form Lecturers: D. Dadush, O. Regev Scribe: D. Dadush 1 Dual Lattice
More informationExercise Solutions to Functional Analysis
Exercise Solutions to Functional Analysis Note: References refer to M. Schechter, Principles of Functional Analysis Exersize that. Let φ,..., φ n be an orthonormal set in a Hilbert space H. Show n f n
More informationLecture notes: Applied linear algebra Part 1. Version 2
Lecture notes: Applied linear algebra Part 1. Version 2 Michael Karow Berlin University of Technology karow@math.tu-berlin.de October 2, 2008 1 Notation, basic notions and facts 1.1 Subspaces, range and
More informationVector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis.
Vector spaces DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Vector space Consists of: A set V A scalar
More informationOn families of anticommuting matrices
On families of anticommuting matrices Pavel Hrubeš December 18, 214 Abstract Let e 1,..., e k be complex n n matrices such that e ie j = e je i whenever i j. We conjecture that rk(e 2 1) + rk(e 2 2) +
More informationJUHA KINNUNEN. Harmonic Analysis
JUHA KINNUNEN Harmonic Analysis Department of Mathematics and Systems Analysis, Aalto University 27 Contents Calderón-Zygmund decomposition. Dyadic subcubes of a cube.........................2 Dyadic cubes
More informationLecture 1: Basic Concepts
ENGG 5781: Matrix Analysis and Computations Lecture 1: Basic Concepts 2018-19 First Term Instructor: Wing-Kin Ma This note is not a supplementary material for the main slides. I will write notes such as
More informationHilbert Spaces. Hilbert space is a vector space with some extra structure. We start with formal (axiomatic) definition of a vector space.
Hilbert Spaces Hilbert space is a vector space with some extra structure. We start with formal (axiomatic) definition of a vector space. Vector Space. Vector space, ν, over the field of complex numbers,
More informationNORMS ON SPACE OF MATRICES
NORMS ON SPACE OF MATRICES. Operator Norms on Space of linear maps Let A be an n n real matrix and x 0 be a vector in R n. We would like to use the Picard iteration method to solve for the following system
More informationLinear Algebra. Preliminary Lecture Notes
Linear Algebra Preliminary Lecture Notes Adolfo J. Rumbos c Draft date May 9, 29 2 Contents 1 Motivation for the course 5 2 Euclidean n dimensional Space 7 2.1 Definition of n Dimensional Euclidean Space...........
More information1: Introduction to Lattices
CSE 206A: Lattice Algorithms and Applications Winter 2012 Instructor: Daniele Micciancio 1: Introduction to Lattices UCSD CSE Lattices are regular arrangements of points in Euclidean space. The simplest
More informationTopological vectorspaces
(July 25, 2011) Topological vectorspaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ Natural non-fréchet spaces Topological vector spaces Quotients and linear maps More topological
More informationEntropy, mixing, and independence
Entropy, mixing, and independence David Kerr Texas A&M University Joint work with Hanfeng Li Let (X, µ) be a probability space. Two sets A, B X are independent if µ(a B) = µ(a)µ(b). Suppose that we have
More informationA Randomized Algorithm for the Approximation of Matrices
A Randomized Algorithm for the Approximation of Matrices Per-Gunnar Martinsson, Vladimir Rokhlin, and Mark Tygert Technical Report YALEU/DCS/TR-36 June 29, 2006 Abstract Given an m n matrix A and a positive
More informationVector Spaces. Vector space, ν, over the field of complex numbers, C, is a set of elements a, b,..., satisfying the following axioms.
Vector Spaces Vector space, ν, over the field of complex numbers, C, is a set of elements a, b,..., satisfying the following axioms. For each two vectors a, b ν there exists a summation procedure: a +
More informationSupplementary Notes on Linear Algebra
Supplementary Notes on Linear Algebra Mariusz Wodzicki May 3, 2015 1 Vector spaces 1.1 Coordinatization of a vector space 1.1.1 Given a basis B = {b 1,..., b n } in a vector space V, any vector v V can
More informationLinear Algebra Massoud Malek
CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product
More informationStability and Robustness of Weak Orthogonal Matching Pursuits
Stability and Robustness of Weak Orthogonal Matching Pursuits Simon Foucart, Drexel University Abstract A recent result establishing, under restricted isometry conditions, the success of sparse recovery
More informationInner product on B -algebras of operators on a free Banach space over the Levi-Civita field
Available online at wwwsciencedirectcom ScienceDirect Indagationes Mathematicae 26 (215) 191 25 wwwelseviercom/locate/indag Inner product on B -algebras of operators on a free Banach space over the Levi-Civita
More informationOptimization Theory. A Concise Introduction. Jiongmin Yong
October 11, 017 16:5 ws-book9x6 Book Title Optimization Theory 017-08-Lecture Notes page 1 1 Optimization Theory A Concise Introduction Jiongmin Yong Optimization Theory 017-08-Lecture Notes page Optimization
More informationChapter 4 Euclid Space
Chapter 4 Euclid Space Inner Product Spaces Definition.. Let V be a real vector space over IR. A real inner product on V is a real valued function on V V, denoted by (, ), which satisfies () (x, y) = (y,
More informationCharacterization of half-radial matrices
Characterization of half-radial matrices Iveta Hnětynková, Petr Tichý Faculty of Mathematics and Physics, Charles University, Sokolovská 83, Prague 8, Czech Republic Abstract Numerical radius r(a) is the
More informationLecture Summaries for Linear Algebra M51A
These lecture summaries may also be viewed online by clicking the L icon at the top right of any lecture screen. Lecture Summaries for Linear Algebra M51A refers to the section in the textbook. Lecture
More informationUpper Bound for Intermediate Singular Values of Random Sub-Gaussian Matrices 1
Upper Bound for Intermediate Singular Values of Random Sub-Gaussian Matrices 1 Feng Wei 2 University of Michigan July 29, 2016 1 This presentation is based a project under the supervision of M. Rudelson.
More informationl(y j ) = 0 for all y j (1)
Problem 1. The closed linear span of a subset {y j } of a normed vector space is defined as the intersection of all closed subspaces containing all y j and thus the smallest such subspace. 1 Show that
More informationarxiv: v1 [math.pr] 22 Dec 2018
arxiv:1812.09618v1 [math.pr] 22 Dec 2018 Operator norm upper bound for sub-gaussian tailed random matrices Eric Benhamou Jamal Atif Rida Laraki December 27, 2018 Abstract This paper investigates an upper
More informationA Concise Course on Stochastic Partial Differential Equations
A Concise Course on Stochastic Partial Differential Equations Michael Röckner Reference: C. Prevot, M. Röckner: Springer LN in Math. 1905, Berlin (2007) And see the references therein for the original
More informationSUMMARY OF MATH 1600
SUMMARY OF MATH 1600 Note: The following list is intended as a study guide for the final exam. It is a continuation of the study guide for the midterm. It does not claim to be a comprehensive list. You
More informationContents Real Vector Spaces Linear Equations and Linear Inequalities Polyhedra Linear Programs and the Simplex Method Lagrangian Duality
Contents Introduction v Chapter 1. Real Vector Spaces 1 1.1. Linear and Affine Spaces 1 1.2. Maps and Matrices 4 1.3. Inner Products and Norms 7 1.4. Continuous and Differentiable Functions 11 Chapter
More informationNotes 6 : First and second moment methods
Notes 6 : First and second moment methods Math 733-734: Theory of Probability Lecturer: Sebastien Roch References: [Roc, Sections 2.1-2.3]. Recall: THM 6.1 (Markov s inequality) Let X be a non-negative
More informationAnalysis in weighted spaces : preliminary version
Analysis in weighted spaces : preliminary version Frank Pacard To cite this version: Frank Pacard. Analysis in weighted spaces : preliminary version. 3rd cycle. Téhéran (Iran, 2006, pp.75.
More informationLINEAR EQUATIONS WITH UNKNOWNS FROM A MULTIPLICATIVE GROUP IN A FUNCTION FIELD. To Professor Wolfgang Schmidt on his 75th birthday
LINEAR EQUATIONS WITH UNKNOWNS FROM A MULTIPLICATIVE GROUP IN A FUNCTION FIELD JAN-HENDRIK EVERTSE AND UMBERTO ZANNIER To Professor Wolfgang Schmidt on his 75th birthday 1. Introduction Let K be a field
More informationWe continue our study of the Pythagorean Theorem begun in ref. 1. The numbering of results and remarks in ref. 1 will
The Pythagorean Theorem: II. The infinite discrete case Richard V. Kadison Mathematics Department, University of Pennsylvania, Philadelphia, PA 19104-6395 Contributed by Richard V. Kadison, December 17,
More informationThe Cyclic Decomposition of a Nilpotent Operator
The Cyclic Decomposition of a Nilpotent Operator 1 Introduction. J.H. Shapiro Suppose T is a linear transformation on a vector space V. Recall Exercise #3 of Chapter 8 of our text, which we restate here
More information7. Dimension and Structure.
7. Dimension and Structure 7.1. Basis and Dimension Bases for Subspaces Example 2 The standard unit vectors e 1, e 2,, e n are linearly independent, for if we write (2) in component form, then we obtain
More informationBanach Spaces II: Elementary Banach Space Theory
BS II c Gabriel Nagy Banach Spaces II: Elementary Banach Space Theory Notes from the Functional Analysis Course (Fall 07 - Spring 08) In this section we introduce Banach spaces and examine some of their
More informationOctober 25, 2013 INNER PRODUCT SPACES
October 25, 2013 INNER PRODUCT SPACES RODICA D. COSTIN Contents 1. Inner product 2 1.1. Inner product 2 1.2. Inner product spaces 4 2. Orthogonal bases 5 2.1. Existence of an orthogonal basis 7 2.2. Orthogonal
More informationAnalysis Preliminary Exam Workshop: Hilbert Spaces
Analysis Preliminary Exam Workshop: Hilbert Spaces 1. Hilbert spaces A Hilbert space H is a complete real or complex inner product space. Consider complex Hilbert spaces for definiteness. If (, ) : H H
More informationMathematics Department Stanford University Math 61CM/DM Inner products
Mathematics Department Stanford University Math 61CM/DM Inner products Recall the definition of an inner product space; see Appendix A.8 of the textbook. Definition 1 An inner product space V is a vector
More informationNOTES on LINEAR ALGEBRA 1
School of Economics, Management and Statistics University of Bologna Academic Year 207/8 NOTES on LINEAR ALGEBRA for the students of Stats and Maths This is a modified version of the notes by Prof Laura
More information