1 Lesson 0: Intro and motivation (Brunn-Minkowski)


Convex sets are in correspondence with norms, which is our first motivation for studying them. A very basic inequality is Brunn-Minkowski, which is in fact valid also without the assumption of convexity, so we start with it.

Definition 1.1. The Minkowski sum of two sets A, B ⊆ Rⁿ is A + B := {x + y : x ∈ A, y ∈ B} ⊆ Rⁿ.

Theorem 1.2. Let K, T ⊆ Rⁿ be compact, measurable, and non-empty. Then

Vol(K + T)^{1/n} ≥ Vol(K)^{1/n} + Vol(T)^{1/n}.

Here Vol(K + T) can be taken to mean outer measure if K + T is not measurable.

Remark: what is Vol? We shall later see that in the case of a convex set, measurability is a rather simple issue, not requiring much. In general, I assume Chedva III, so that you know how to define the volume of a set: take its indicator function; if it is integrable, the value of this integral is the volume. Whether it is integrable or not depends on the structure of the boundary, namely on its being a null set, i.e. coverable by boxes of total volume at most ε for any ε > 0. Later we'll see that convex sets can be approximated by polytopes, and the volume of a polytope can be defined via pyramids, so we'll have a more elementary notion of volume (which coincides with Lebesgue measure on convex sets).

Proof. First we show the inequality when K and T are boxes parallel to the coordinate axes, with side lengths (a_i)_{i=1}^n and (b_i)_{i=1}^n. This is simply

∏_{i=1}^n (a_i + b_i)^{1/n} ≥ ∏_{i=1}^n a_i^{1/n} + ∏_{i=1}^n b_i^{1/n},

which after division by the left-hand side becomes

∏_{i=1}^n ( a_i/(a_i + b_i) )^{1/n} + ∏_{i=1}^n ( b_i/(a_i + b_i) )^{1/n} ≤ 1.

But by AM-GM,

∏_{i=1}^n ( a_i/(a_i + b_i) )^{1/n} ≤ (1/n) ∑_{i=1}^n a_i/(a_i + b_i),

and similarly for the second product, so summing the two bounds we get exactly 1, which is what is needed.

Next we show the inequality for A and B which are disjoint unions of such coordinate boxes. Use induction on the total number of boxes in A and B. Split A by some coordinate hyperplane into A⁺ and A⁻, each of which has at least one box fewer. Translate B so that it is split by the same hyperplane into B⁺ and B⁻, with

Vol(B⁺)/Vol(B⁻) = Vol(A⁺)/Vol(A⁻) =: λ.
Then, by the induction hypothesis applied on each side of the hyperplane,

Vol(A + B) ≥ Vol(A⁺ + B⁺) + Vol(A⁻ + B⁻)
 ≥ (Vol(A⁺)^{1/n} + Vol(B⁺)^{1/n})ⁿ + (Vol(A⁻)^{1/n} + Vol(B⁻)^{1/n})ⁿ
 = (1 + λ)(Vol(A⁻)^{1/n} + Vol(B⁻)^{1/n})ⁿ
 = (Vol(A)^{1/n} + Vol(B)^{1/n})ⁿ.
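As a quick numerical sanity check (not part of the notes' formal development), the box case of the proof can be verified directly from the product formula; the helper name bm_gap_for_boxes is ours:

```python
# Check Vol(K+T)^(1/n) >= Vol(K)^(1/n) + Vol(T)^(1/n) for axis-parallel
# boxes K, T with side lengths a, b, where K + T is again a box with
# side lengths a_i + b_i.
import itertools
import math
import random

def bm_gap_for_boxes(a, b):
    """Vol(K+T)^(1/n) - (Vol(K)^(1/n) + Vol(T)^(1/n)); should be >= 0."""
    n = len(a)
    vol_sum = math.prod(ai + bi for ai, bi in zip(a, b))
    return vol_sum ** (1 / n) - (math.prod(a) ** (1 / n) + math.prod(b) ** (1 / n))

random.seed(0)
for _ in range(1000):
    n = random.randint(1, 6)
    a = [random.uniform(0.1, 5) for _ in range(n)]
    b = [random.uniform(0.1, 5) for _ in range(n)]
    assert bm_gap_for_boxes(a, b) >= -1e-12
```

For equal boxes (a = b) the gap is zero, in line with the equality case of the AM-GM step.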

Finally, use continuity of volume for measurable sets, so that both A and B are approximated from the inside by unions of boxes.

A few things to start considering:

Dimension-free form: We need to define homothety, λA = {λx : x ∈ A}. Note that in the non-convex situation it is not even true that 2A = A + A. However, it is true from the properties of volume that Vol(λA) = λⁿ Vol(A). We thus get as a consequence of Brunn-Minkowski the following form:

Vol(λA + (1−λ)B) ≥ Vol(A)^λ Vol(B)^{1−λ}.

Remark 1.3. One easily deduces the usual form from the dimension-free form (assumed, however, for all λ) by a change of variables as follows: given A and B, choose λ such that Vol(A/λ) = Vol(B/(1−λ)), and set A' = A/λ, B' = B/(1−λ). Then

Vol(A + B)^{1/n} = Vol(λA' + (1−λ)B')^{1/n} ≥ Vol(A')^{λ/n} Vol(B')^{(1−λ)/n} = Vol(A')^{1/n} = λ Vol(A')^{1/n} + (1−λ) Vol(B')^{1/n} = Vol(A)^{1/n} + Vol(B)^{1/n}.

One then thinks of Brunn-Minkowski as (log-)concavity of the volume function with respect to Minkowski addition. We will later note, for example, that one can also write

Vol(A + tB) ≥ (Vol(A)^{1/n} + t Vol(B)^{1/n})ⁿ,

with equality for t = 0, which implies an inequality of the derivatives with respect to t. Writing this derivative,

lim_{t→0⁺} (Vol(A + tB) − Vol(A))/t ≥ n Vol(B)^{1/n} Vol(A)^{(n−1)/n}.

The left-hand side is the surface area with respect to B, and is the usual surface area when B = B₂ⁿ is the Euclidean ball. Assuming A has the same volume as the Euclidean ball B₂ⁿ, we get S(A) ≥ n Vol(B₂ⁿ) = S(B₂ⁿ); this is the isoperimetric inequality. (You did not have to remember the surface area of B₂ⁿ; just note that in the case A = B₂ⁿ there is equality all the way.)

(In fact, we shall see further in the course that for convex sets the left-hand side is a polynomial in t, by the Steiner formula, which we shall later learn: in the convex world

Vol(K + tL) = ∑_{j=0}^n V_j(K, L) tʲ, where V₀ = Vol(K).

We thus get an inequality between polynomials with equal constant terms,
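For Euclidean balls, where equality holds throughout, the derivative inequality above can be checked numerically from the closed formula Vol(rB₂ⁿ) = ωₙrⁿ; the helper name ball_volume is ours:

```python
# With A = B = B_2^n we have A + tB = (1+t)B_2^n, and
# d/dt Vol(A + tB)|_{t=0+} should equal n Vol(B)^{1/n} Vol(A)^{(n-1)/n}.
import math

def ball_volume(n, r):
    """Volume of the Euclidean ball of radius r in R^n."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) * r ** n

for n in range(1, 8):
    vol_A = ball_volume(n, 1.0)
    vol_B = ball_volume(n, 1.0)
    t = 1e-7
    deriv = (ball_volume(n, 1 + t) - vol_A) / t      # forward difference
    rhs = n * vol_B ** (1 / n) * vol_A ** ((n - 1) / n)
    assert abs(deriv - rhs) / rhs < 1e-5             # equality up to O(t)
```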

which implies an inequality of the next coefficients, namely

V₁(K, L) ≥ n Vol(K)^{(n−1)/n} Vol(L)^{1/n}.

This is called Minkowski's inequality. Later in the course we shall see another (coinciding) definition of surface area based on polytopes, prove the Steiner formula, and more.)

2 Lesson 1: Basic convexity

Definition 2.1. A set A ⊆ Rⁿ is called convex if for any x, y ∈ A and any λ ∈ [0, 1], (1−λ)x + λy ∈ A.

Definition 2.2. The affine hull of a set of points is the smallest affine subspace they all belong to. A set of points is called affinely independent if none of them belongs to the affine hull of the rest; equivalently, one cannot produce a non-zero solution of ∑λᵢxᵢ = 0 with ∑λᵢ = 0. A convex combination of points {xᵢ}_{i=1}^k ⊆ Rⁿ is a point of the form ∑_{i=1}^k λᵢxᵢ where ∑λᵢ = 1 and λᵢ ≥ 0.

Remark: an affinely independent set in Rⁿ has at most n + 1 points [translate so that one of them is at the origin; now it is a matter of linear independence].

Examples of convex sets: intervals, subspaces (also affine), half-spaces, Euclidean balls. General convex sets which include 0 can be thought of as unit balls of non-symmetric norms.

Theorem 2.3. A set A ⊆ Rⁿ is convex if and only if all convex combinations of points in A lie in A.

Proof. Induction.

An intersection of convex sets is convex. This enables us to define the convex hull of a general set A.

Definition 2.4. For a set A ⊆ Rⁿ, its convex hull conv(A) is the intersection of all convex sets including A.

Linear Algebra

Theorem 2.5. For a set A ⊆ Rⁿ,

conv(A) = { ∑_{i=1}^k αᵢxᵢ : k ∈ N, xᵢ ∈ A, αᵢ ∈ [0, 1], ∑αᵢ = 1 }.

Proof. The right-hand side is convex, and clearly must be included in any convex set which includes A.

In fact, Caratheodory's theorem implies that one may take k = n + 1; we shall see this very soon (idea: take the minimal number of such points; they must be affinely independent, or else one could reduce the number further).

Recall homothety and Minkowski sum. For convex sets, αA + βA = (α + β)A for α, β ≥ 0, and this characterizes convexity.
Note that A − A is defined (and is not A \ A = ∅, nor the neutral

element {0}, unless A = {x}). Question for thought: does the cancellation law hold, that is, does A + C = B + C imply A = B (when C is a convex body, compact, say)?

Clearly convexity is preserved by both operations. Also by any linear or affine mapping, and by the inverse image under an affine mapping. In particular, projections of convex sets are convex (but a set all of whose projections are convex need not be convex).

Definition 2.6. The intersection of finitely many closed half-spaces is called a polyhedral set. The convex hull of finitely many points is called a polytope. The convex hull of r + 1 affinely independent points is called an r-simplex. We shall later see that polytopes and polyhedral sets are somewhat dual notions, and that bounded polyhedral sets are polytopes and vice versa.

Definition 2.7. The vertices of a polytope P are the points x ∈ P such that P \ {x} is still convex.

Up to here Shiur 1.1, 2 hours, date 2.11

Theorem 2.8. Let P = conv{xᵢ}_{i=1}^m. Then xᵢ is a vertex if and only if it does not belong to the convex hull of the others. Also, P is the convex hull of its vertices.

Proof. Clearly no vertex can come from outside the initial set, by the definition of the convex hull and of a vertex. If xᵢ is a vertex then P \ {xᵢ} is convex and includes the convex hull of the rest, but not xᵢ. If xᵢ does not belong to conv{xⱼ}_{j≠i} and is not a vertex, then P \ {xᵢ} is not convex, so there are λ ∈ (0, 1) and a, b ∈ P \ {xᵢ} with xᵢ = (1−λ)a + λb. Write a and b as convex combinations of the xⱼ to get xᵢ as a convex combination of the rest (since the coefficients of xᵢ in a and b are not 1). This completes the first part. Now remove one element at a time, without changing the convex hull, until you reach a set consisting only of vertices and still with the same convex hull.

Polytopes are preserved by linear maps, affine maps, Minkowski addition (an exercise: conv(A + B) = conv(A) + conv(B)), intersection, and convex hull. The fact about intersection requires a bit of work (say, working with polyhedral sets).

Theorem 2.9.
A set A ⊆ Rⁿ is an r-simplex if and only if there exist {x₀, ..., x_r} ⊆ A such that any point x ∈ A is uniquely represented as a convex combination of them.

Proof. An r-simplex was defined as the convex hull of r + 1 affinely independent points. Given a simplex, denote these points by {xᵢ}_{i=0}^r; then any point inside has a representation, and it is unique due to the affine independence of the points. Given a set A with the above property, it must equal conv{xᵢ}, as it is convex, contains them, and is included in their convex hull. The points must be affinely independent: otherwise one could produce two different convex combinations of the points representing the same point of the convex hull (trivial from the definitions).

Combinatorics

Theorem 2.10 (Radon). Let x₁, ..., x_m ∈ Rⁿ be affinely dependent. Then there is a partition into two disjoint sets, I ∪ J = {1, ..., m}, I ∩ J = ∅, such that

conv{xᵢ}_{i∈I} ∩ conv{xⱼ}_{j∈J} ≠ ∅.

Proof. Since the points are affinely dependent, one can find ∑αᵢxᵢ = 0 with ∑αᵢ = 0, not all αᵢ zero. Partition the indices into those with positive and those with negative coefficients, and normalize.
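The Radon construction is completely effective; a small sketch (the function name radon_partition is ours, and the null vector is taken from an SVD, assuming NumPy is available):

```python
# Find an affine dependence sum(alpha_i x_i) = 0 with sum(alpha_i) = 0,
# split the indices by the sign of alpha, normalize each side, and check
# that both sides produce the same point of the two convex hulls.
import numpy as np

def radon_partition(points):
    """points: (m, n) array with m >= n + 2.  Returns (I, J, zI, zJ) with
    zI in conv{points[I]}, zJ in conv{points[J]}, and zI == zJ."""
    pts = np.asarray(points, dtype=float)
    m, _ = pts.shape
    # Rows (x_i, 1): a null vector alpha of M gives the affine dependence.
    M = np.hstack([pts, np.ones((m, 1))]).T
    alpha = np.linalg.svd(M)[2][-1]          # M @ alpha ~ 0
    I, J = np.where(alpha > 0)[0], np.where(alpha <= 0)[0]
    s = alpha[I].sum()                       # = -alpha[J].sum() since sum = 0
    zI = (alpha[I] / s) @ pts[I]             # convex combination over I
    zJ = (-alpha[J] / s) @ pts[J]            # convex combination over J
    return I, J, zI, zJ

pts = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
I, J, zI, zJ = radon_partition(pts)
assert np.allclose(zI, zJ)
```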

Note that in particular this applies to any n + 2 points in Rⁿ.

Theorem 2.11 (Helly). Let {Aᵢ}_{i=1}^m be convex sets in Rⁿ with m ≥ n + 1. If every n + 1 of them have non-empty intersection, then the intersection of all of them is non-empty.

Proof. Induction step: find an element xᵢ in the intersection of all but the i-th set. Apply Radon to the resulting m ≥ n + 2 points to find, w.l.o.g.,

x ∈ conv(xᵢ)_{i=1}^k ∩ conv(xᵢ)_{i=k+1}^m.

The first convex hull lies in the intersection of the Aᵢ for i = k+1, ..., m, and the second in the rest.

If the sets are compact, this can be generalized to an infinite family by a standard topological argument [the finite intersection property].

Theorem 2.12 (Caratheodory). Let A ⊆ Rⁿ and x ∈ conv(A). Then there exist {xᵢ}_{i=0}^n ⊆ A such that x ∈ conv{xᵢ}.

Proof. Write, by Theorem 2.5, x = ∑_{i=1}^k αᵢxᵢ with ∑αᵢ = 1, αᵢ > 0, and k least possible. Then the xᵢ are affinely independent: otherwise there would be some ∑βᵢxᵢ = 0 with ∑βᵢ = 0; we would split the indices into negative and positive ones, pick among the positive ones the index with the smallest ratio αᵢ/βᵢ, say i₀, and then

x = ∑_{i=1}^k (αᵢ − βᵢ α_{i₀}/β_{i₀}) xᵢ

is a way of writing x as a convex combination of the xᵢ with at most k − 1 non-zero coefficients (all terms are non-negative and still sum to 1), a contradiction. Since they are affinely independent, there are at most n + 1 of them.

Up to here Shiur 1.2, 1 hour, date 3.11

Topology

Proposition 2.13. Let 1 ≤ k ≤ n. The relative interior of a k-simplex P = conv{xᵢ}_{i=0}^k ⊆ Rⁿ is

relint(P) = { ∑ᵢ αᵢxᵢ : ∑αᵢ = 1, αᵢ > 0 }.

Proof. Straightforward. Assume w.l.o.g. that x₀ = 0. Affine independence of {xᵢ}_{i=0}^k then translates to {xᵢ}_{i=1}^k being linearly independent. Without loss of generality they span Rᵏ and form a basis for it. The coordinate mapping x ↦ (a₁, ..., a_k), where x = ∑aᵢxᵢ, is then a homeomorphism (it is a linear map), and the simplex is the inverse image of the standard simplex in Rᵏ, {a : ∑aᵢ ≤ 1, aᵢ ≥ 0}.
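The reduction step in the Caratheodory proof can be run as an algorithm; a sketch (the function name caratheodory_reduce is ours, the affine dependence again comes from an SVD, and NumPy is assumed):

```python
# While the points carrying positive weight are affinely dependent, shift
# the weights along the dependence by the smallest ratio alpha_i/beta_i,
# so one coefficient drops to zero, until at most n + 1 points remain.
import numpy as np

def caratheodory_reduce(pts, alpha, tol=1e-10):
    """Rewrite x = sum alpha_i pts_i (alpha >= 0, sum = 1) using at most
    n + 1 of the points.  Returns (points_used, weights)."""
    pts, alpha = np.asarray(pts, float), np.asarray(alpha, float)
    while True:
        keep = alpha > tol
        P, a = pts[keep], alpha[keep]
        k, n = P.shape
        if k <= n + 1:
            return P, a
        M = np.hstack([P, np.ones((k, 1))]).T
        beta = np.linalg.svd(M)[2][-1]       # affine dependence among P
        pos = beta > tol
        if not pos.any():                    # null vector sign is arbitrary
            beta, pos = -beta, -beta > tol
        t = np.min(a[pos] / beta[pos])       # smallest ratio alpha_i / beta_i
        alpha = np.zeros_like(alpha)
        alpha[keep] = a - t * beta           # one coefficient vanishes

rng = np.random.default_rng(1)
pts = rng.standard_normal((6, 2))
alpha = np.full(6, 1 / 6)
x = alpha @ pts                              # a point of conv(pts) in R^2
P, a = caratheodory_reduce(pts, alpha)
assert len(P) <= 3 and np.allclose(a @ P, x)
```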
By continuity (of both the mapping and its inverse), its interior is the inverse image of the interior of the standard simplex, which is trivially {a : ∑aᵢ < 1, aᵢ > 0}.

Theorem 2.14. If A ⊆ Rⁿ is convex and non-empty, then relint(A) ≠ ∅.

Proof. Take k + 1 affinely independent points in A, where k is the dimension of A. Their convex hull contains relative interior points, which are relative interior points of A as well.

Theorem 2.15. Let A ⊆ Rⁿ be convex. Then

Ā = {x ∈ Rⁿ : ∃y ∈ A with [y, x) ⊆ A},
int(A) = {x ∈ Rⁿ : ∀y ≠ x ∃z ∈ (x, y) with [x, z] ⊆ A}.

It is useful to first prove the following.

Lemma 2.16. Let A ⊆ Rⁿ be convex. If x ∈ Ā and y ∈ relint(A), then [y, x) ⊆ relint(A).

Proof. We work with full-dimensional sets for simplicity. Let z = (1−α)x + αy with 0 < α < 1. Take some x_k → x with x_k ∈ A. Then (z − (1−α)x_k)/α → y. Take some open ball around y which is contained in A; the points (z − (1−α)x_k)/α reach it for large k. When they do, there is a ball V around y_k = (z − (1−α)x_k)/α which is contained in A, but then z ∈ αV + (1−α)x_k, which by convexity is included in A and is an open neighborhood of z.

Proof of Theorem 2.15. We begin with the closure. The inclusion of the RHS in the LHS is immediate. For the other direction, take a point x in the closure, and take y in the relative interior of A, which exists by Theorem 2.14; by the above Lemma, [y, x) is a subset of the relative interior, and in particular of A.

For the second assertion, assume the dimension is full, or else both sides are empty. LHS ⊆ RHS is trivial by the definition of an open set. Let x belong to the RHS. Choose (by Theorem 2.14) some y ∈ relint(A), and apply the assumption to 2x − y (any affine combination with negative y-coefficient would do): there must be some z ∈ (x, 2x − y) with [x, z] ⊆ A. But z ∈ (x, 2x − y) implies z = λx + (1−λ)(2x − y), so (2 − λ)x = z + (1−λ)y, and thus x ∈ (z, y). Since y is in the relative interior and z ∈ A ⊆ Ā, we get by the above lemma that x ∈ relint(A), as claimed.

Up to here Shiur 2.1, 1 hour, date 9.11

Corollary 2.17. For a convex A ⊆ Rⁿ, both relint(A) and Ā are convex.

Proof. Follows from the above characterization and some convex combinations. For the relative interior it is immediate from the lemma: start with x, x' ∈ relint(A); then x ∈ Ā and so [x', x) ⊆ relint(A). For the closure, let x, x' ∈ Ā; then there are y, y' ∈ A with (y, x), (y', x') ⊆ A, and in particular, by convexity,

(λy + (1−λ)y', λx + (1−λ)x') ⊆ λ(y, x) + (1−λ)(y', x') ⊆ A,

so again by the lemma the convex combinations of x and x' are in Ā.

Corollary 2.18. For a convex A ⊆ Rⁿ, relint(A) = relint(Ā), and the closure of relint(A) is Ā.

Proof. Again, simple manipulations of the definitions. Start with the relative interiors. Clearly the LHS is included in the RHS. If x ∈ relint(Ā), then by convexity of this set and Theorem 2.15, for all y ≠ x there exists z ∈ (x, y) with [x, z] ⊆ Ā. Pick y ∈ relint(A) (if it equals x we are done, so assume not) and apply this to 2x − y as before: there exists z ∈ (x, 2x − y) such that [x, z] ⊆ Ā. But in particular x ∈ (z, y) and z ∈ Ā, so x ∈ relint(A).

Next we work with the closures; this is even easier. Now it is clear that the RHS, the closure of relint(A), is included in the LHS, Ā. But if x ∈ Ā, then there is some y ∈ relint(A) with (x, y] ⊆ relint(A), so x is in the closure of relint(A).

A convex hull need not be closed, even if one starts with a closed set (e.g., {(x, 1/x) : x > 0} ∪ {(0, 0)}). It does send open sets to open sets, and compact sets to compact sets.

Theorem 2.19. conv sends open sets to open sets and compact sets to compact sets.

Proof. Open: let x ∈ conv(A), and write x = ∑αᵢxᵢ with xᵢ ∈ A. Take balls xᵢ + εB ⊆ A around each xᵢ; then x + εB ⊆ conv(A), since x + v = ∑αᵢ(xᵢ + v), so conv(A) contains a neighbourhood of x. Compact: boundedness is clear, since everything is contained in a large ball. Closedness is obtained by writing the terms of a converging sequence as convex combinations of n + 1 points (Caratheodory) and taking subsequences (both of the points and of the coefficients).

Support/separation

Theorem 2.20 (from the Hilbert-space course). For a closed convex set K ⊆ Rⁿ and a point x, there is a unique closest point to x in K in the Euclidean metric, denoted P_K(x).

Proof. (Existence from closedness, uniqueness from convexity.) Assume x ∉ K, otherwise it is trivial that P_K(x) = x. Find some r with B(x, r) ∩ K ≠ ∅; then the closest point must lie in B(x, r), so the infimum is over the compact set B(x, r) ∩ K, and thus is attained at some y. If there were two minimizing points, they would form, together with x, a triangle with two equal sides, and by Pythagoras their midpoint would be closer to x.
This midpoint is in K by convexity, and this is a contradiction.

Remarks: sets of positive reach are those for which uniqueness is guaranteed on K + εB₂ⁿ. There is a theorem that uniqueness of the metric projection implies convexity. One may project with respect to other metrics but lose uniqueness; for uniqueness one needs strict convexity of the unit ball (see below: no segments on the boundary). We shall not prove the following theorem.

Theorem 2.21 (Motzkin). Let S be a non-empty closed set in Rⁿ. Assume the unit sphere of the norm is smooth and strictly convex (e.g., the Euclidean case defined above). Then S is convex if and only if for every x ∈ Rⁿ there exists a unique closest point in S. Both smoothness and strict convexity are essential.

Up to here Shiur 2.2, 1 hour, date

Definition 2.22. Let K ⊆ Rⁿ be closed and convex. Let H be an affine hyperplane, that is, a set of the form {x : f(x) = c} for some non-zero linear f : Rⁿ → R and c ∈ R. It has the two associated half-spaces H⁺ = {f ≥ c} and H⁻ = {f ≤ c}. We say that H is a supporting hyperplane for K if either H⁻ or H⁺ includes K, and H ∩ K ≠ ∅. (We then also have a support set K ∩ H, and any x in it is a support point of K. The corresponding half-space is a supporting half-space.)

Remark: in the non-compact case, in a given direction there can be a smallest half-space containing the set whose boundary hyperplane does not intersect the set, hence is not a supporting hyperplane.

Theorem 2.23. For x ∉ K, with K ⊆ Rⁿ closed and convex, the hyperplane through P_K(x) orthogonal to x − P_K(x) is a supporting hyperplane.

Proof. Denote this hyperplane by H = P_K x + u^⊥, where u = x − P_K x. Clearly x ∉ H, since ⟨u, x − P_K x⟩ = |u|² > 0. Take H⁺ to be the half-space bounded by H not containing x. We claim K ⊆ H⁺. Assume not; then there is some y ∈ K \ H⁺, meaning ⟨y − P_K x, u⟩ > 0. Take the segment joining y and P_K x, which is included in K and forms an acute angle with the segment [P_K x, x]. The points on this segment which are close to P_K x are closer to x than P_K x is (use Pythagoras, say, or simply

|x − (P_K x + δ(y − P_K x))|² = |x − P_K x|² − 2δ⟨x − P_K x, y − P_K x⟩ + δ²|y − P_K x|² < |x − P_K x|²

for small δ), a contradiction.

Corollary 2.24. If x ∈ (P_K x, y), then P_K y = P_K x.

Proof. Indeed, by the above theorem, the supporting hyperplane H at P_K x is orthogonal to x − P_K x, thus also to y − P_K x, and since K is contained in the half-space bounded by it, the points of K are farther from y than the closest point of H, which is P_K x.
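For K the Euclidean unit ball the projection has the closed form P_K(x) = x / max(1, |x|), and both the minimality and the supporting-hyperplane condition of Theorem 2.23 can be checked numerically (a sketch, assuming NumPy; the name proj_ball is ours):

```python
# Check that p = P_K(x) beats sampled competitors of K in distance to x,
# and that <x - p, y - p> <= 0 for all y in K, i.e. the hyperplane through
# p orthogonal to x - p supports K.
import numpy as np

def proj_ball(x):
    """Metric projection onto the closed Euclidean unit ball."""
    return x / max(1.0, np.linalg.norm(x))

rng = np.random.default_rng(0)
x = np.array([3.0, 4.0])
p = proj_ball(x)                         # = (0.6, 0.8)
for _ in range(2000):
    y = rng.standard_normal(2)
    y /= max(1.0, np.linalg.norm(y))     # an arbitrary point of K
    assert np.linalg.norm(x - p) <= np.linalg.norm(x - y) + 1e-12
    assert np.dot(x - p, y - p) <= 1e-12
```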

Corollary 2.25. Every closed convex set (other than ∅ and Rⁿ) is the intersection of its supporting half-spaces.

Something stronger holds: every boundary point has a supporting hyperplane passing through it. To this end, first note:

Lemma 2.26. Given a closed convex set K, the mapping P_K is continuous, with Lipschitz constant 1.

Proof. Indeed,

⟨x − P_K(x), P_K(y) − P_K(x)⟩ ≤ 0,
⟨y − P_K(y), P_K(x) − P_K(y)⟩ ≤ 0,

so that, adding the two,

⟨x − y + P_K(y) − P_K(x), P_K(y) − P_K(x)⟩ ≤ 0,

which implies

|P_K(y) − P_K(x)|² ≤ ⟨y − x, P_K(y) − P_K(x)⟩ ≤ |y − x| · |P_K(y) − P_K(x)|,

and dividing we get what we claimed. (The cases when the two points have the same projection, or when they are inside K, are easy.)

Theorem 2.27. Let K ⊆ Rⁿ be closed and convex, and let x ∈ ∂K. Then there exists a supporting hyperplane for K passing through x.

Proof. By the definition of the boundary, take a sequence x_k ∉ K, x_k → x. Then by the lemma, P_K(x_k) → P_K(x) = x. Use Corollary 2.24 to choose elements y_k ∈ x + ∂B₂ⁿ with P_K(x_k) = P_K(y_k), by continuing the ray from P_K(x_k) through x_k until it hits the sphere x + ∂B₂ⁿ. By compactness take a converging subsequence y_k → y. Then P_K(y) = x (in particular, y ∉ K), and by Theorem 2.23 we are done.

The above is a kind of separation theorem. Here is a stronger one:

Theorem 2.28. Let A, B ⊆ Rⁿ be non-empty and convex, and assume relint(A) ∩ relint(B) = ∅. Then there exists a hyperplane E separating them.

Proof. Clearly, by the assumption, 0 ∉ relint(A) − relint(B) = relint(A − B) (the latter equality is in the exercises). So 0 is either in the closure of A − B (thus in its boundary) or outside it. In both cases we have seen above that we may separate 0 from A − B using a linear functional f with, say, f(A − B) ≤ f(0) = 0. This exactly means that f(a) ≤ f(b) for all a ∈ A and b ∈ B.

To end this section, we shall return to the simple class of polytopes, and check that separation there is indeed what we imagine. A support set of K was defined before: K ∩ H, where H is a supporting hyperplane.

Definition 2.29. The support sets of a polytope are called faces; they can have dimension between 0 and n − 1.
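The 1-Lipschitz bound of Lemma 2.26 is easy to observe numerically for the unit ball, where the projection is explicit (a sketch, assuming NumPy):

```python
# |P_K(x) - P_K(y)| <= |x - y| for K the Euclidean unit ball in R^3,
# with P_K(x) = x / max(1, |x|).
import numpy as np

proj = lambda x: x / max(1.0, np.linalg.norm(x))
rng = np.random.default_rng(42)
for _ in range(5000):
    x, y = rng.standard_normal(3) * 3, rng.standard_normal(3) * 3
    assert np.linalg.norm(proj(x) - proj(y)) <= np.linalg.norm(x - y) + 1e-12
```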
The (n−1)-dimensional ones are called facets, and the 1-dimensional ones edges. Note that support sets are convex.

Proposition 2.30. Given a polytope P ⊆ Rⁿ, its 0-faces are its vertices.

Proof. Given a 0-face {v}, there is a supporting hyperplane H with H ∩ P = {v} and P ⊆ H⁻. Clearly P \ {v} = P ∩ int(H⁻) is convex, so v is a vertex by the usual definition. For the other direction, if v is a vertex then P = conv(v, v₁, ..., v_k) and v ∉ conv(vⱼ) =: P' (we saw this in Theorem 2.8). One may project v onto P' and find a supporting half-space for P' through P_{P'}(v). We claim that the translate of this half-space passing through v is supporting for P, with intersection with P equal to {v}. Indeed, if we encode the hyperplane by a functional f with, say, f(P') ≤ 0 and f(v) = c > 0, then for any point x ∈ P which is not v we still have, by convex combinations, f(x) < c, so that {f = c} ∩ P = {v}, as needed.

The faces of a polytope are as we imagine them: convex hulls of vertices.

Theorem 2.31. Let P ⊆ Rⁿ be a polytope, and let F be a face of P. Then F = conv(V(P) ∩ F).

Proof. Denote the supporting hyperplane associated with F by H, so that P ⊆ H⁻ and H ∩ P = F. Encode it by {f = c}, so that f(P) ≤ c. For every vertex in F, f(v) = c, and for every vertex of P not in F, f(v) < c. Let δ > 0 be chosen so that f(v) ≤ c − δ for every vertex v ∉ F. Every point in F can be written as a convex combination of all the vertices. But by applying f we see that the weights of all vertices v ∉ F must equal 0, so that the point is in conv(V(P) ∩ F).

Corollary 2.32. A polytope P ⊆ Rⁿ has finitely many faces: if it has k vertices, it has fewer than 2ᵏ faces.

Theorem 2.33. A polytope is a polyhedral set.

Proof. First we may restrict to the full-dimensional case, since the affine span can be obtained by intersecting finitely many half-spaces. Next take the (finitely many) faces, a supporting half-space for each one, and intersect them. One gets a polyhedral set P' which clearly contains P. However, if there were some point in P' \ P, then one could connect it with a relative interior point of P and get a boundary point of P in between; that boundary point lies on some face, so it has a corresponding half-space containing it which participated in the intersection defining P'.
So this boundary point lies on the boundary of that half-space. But it is also in its interior, by the much-loved lemma from before and the fact that it lies in an open interval connecting an interior point of the half-space with a point of the half-space. This is a contradiction.

Up to here Shiur 3.1, 2 hours, date

Extremal elements

Generalizing slightly from polytopes, we define extreme points of a convex set.

Definition 2.34. Let K ⊆ Rⁿ be closed and convex. (i) A point x ∈ K is called extremal (or extreme) if whenever x = λy + (1−λ)z with y, z ∈ K and λ ∈ (0, 1), one has y = z = x. The set of extreme points is denoted ext(K). (ii) A point x ∈ K is called an exposed point if there exists a supporting hyperplane H for K such that K ∩ H = {x} (that is, {x} is a support set). The set of exposed points is denoted exp(K).

Remark 2.35. For a closed half-space H, ext(H) = ∅, and for a closed convex set A, ext(A) ≠ ∅ if and only if A does not contain any straight line (this will be an exercise). Clearly exp(K) ⊆ ext(K). The converse is not true, by the example of a football stadium (a rectangle with two half-discs glued to opposite sides: the four gluing points are extreme but not exposed). As with polytopes, x ∈ ext(K) if and only if K \ {x} is convex. For polytopes we saw V(P) = ext(P) = exp(P).

Theorem 2.36 (Minkowski). Let A ⊆ K for a compact convex K. Then K = conv(A) if and only if ext(K) ⊆ A. In particular, K = conv(ext(K)).

Proof. (This appears also in the functional analysis course.) If K = conv(A) and x is an extreme point which is not in A, then, since K \ {x} is convex and contains A, x ∉ conv(A), a contradiction.

To show that K = conv(ext(K)) one uses a standard argument, which we recall here for convenience. One first shows that the set ext(K) is non-empty. Once this is done, assume some point of K is not in conv(ext(K)). Separate it from conv(ext(K)) by a functional, and take the subset of K where the maximal value of this functional is attained. This is an extremal set (defined in the exercises). By the non-emptiness of ext applied to it, the extremal set has an extreme point, and it must be an extreme point of K as well (by an exercise again, say). Contradiction.

To show that the set ext(K) is non-empty, one may use Zorn's lemma on the set of extremal sets, (partially) ordered by inclusion. A minimal element must be a point: otherwise one may separate two points of the extremal set using a functional and take the subset where it attains its maximal value; this is an extremal set strictly included in the original one, a contradiction to minimality.

Another proof: use induction on the dimension. Dimension 1 is easy. To show any x ∈ K is in conv(ext(K)), take some line through x; it intersects K in a segment, with endpoints on the boundary. By the support theorem, there are supporting hyperplanes through these endpoints. The corresponding support sets are, by induction, the convex hulls of their extreme points, and these extreme points are also extreme points of K. So the endpoints of the segment are in the convex hull of the extreme points of the support sets, and in particular of K, and therefore so is x.

Corollary 2.37. A compact convex set is a polytope if and only if it has a finite number of extreme points.

Proof. We already saw that a polytope has a finite number of vertices and that these are its extreme points. If a compact convex set has a finite number of extreme points then by the last theorem it is their convex hull, so it is a polytope.
Corollary 2.38. A bounded polyhedral set is a polytope.

Proof. Let P be a bounded polyhedral set, P = ∩_{j=1}^m Hⱼ⁺. We need to show that ext(P) is finite. We claim that any x ∈ ext(P) has to be of a very specific form: the intersection of some of the hyperplanes above (therefore, only a finite number of possibilities exists). Indeed, consider the intersection of those hyperplanes Hⱼ for which x ∈ Hⱼ, and intersect the resulting set with all the sets Hⱼ⁺ \ Hⱼ for the remaining indices (so that x ∈ Hⱼ⁺ \ Hⱼ). This is a subset of P, it is relatively open, and it includes the extreme point x; thus it must equal {x}.

Theorem 2.39. For a compact convex K ⊆ Rⁿ, K is the closure of conv(exp(K)).

Proof. In the exercises.

Corollary 2.40 (Straszewicz). For a compact convex K ⊆ Rⁿ, ext(K) is contained in the closure of exp(K).

Proof. This follows from Theorem 2.39 together with Minkowski's theorem for A = the closure of exp(K), and the fact that the closure of conv(exp(K)) is contained in conv of the closure of exp(K).

Up to here Shiur 3.2, 1 hour, date
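The proof of Corollary 2.38 is effectively an (exponential-time) algorithm: candidate extreme points are intersections of n of the bounding hyperplanes, kept if feasible. A brute-force sketch on a tiny instance (the function name polyhedron_vertices is ours, NumPy assumed):

```python
# Extreme points of P = {x : A x <= b} found by solving each n-subset of
# the constraints as equalities and keeping the feasible solutions.
import itertools
import numpy as np

def polyhedron_vertices(A, b, tol=1e-9):
    A, b = np.asarray(A, float), np.asarray(b, float)
    m, n = A.shape
    verts = []
    for idx in itertools.combinations(range(m), n):
        Asub = A[list(idx)]
        if abs(np.linalg.det(Asub)) < tol:
            continue                          # hyperplanes not independent
        x = np.linalg.solve(Asub, b[list(idx)])
        if np.all(A @ x <= b + tol) and not any(np.allclose(x, v) for v in verts):
            verts.append(x)
    return verts

# The unit square: 0 <= x <= 1, 0 <= y <= 1, written as A x <= b.
A = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]])
b = np.array([1, 0, 1, 0])
vs = polyhedron_vertices(A, b)
assert len(vs) == 4
```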

More theorems, for future courses

There are many other interesting things to tell here, which we don't have time for. One might include some as exercises (we have, actually). Here are a few (we do not include the proofs for now):

Theorem 2.41 (Krasnoselski). Let S be a non-empty compact connected set in Rⁿ. Suppose every n + 1 points {xᵢ} of S have some point y of S which they can all see (that is, [xᵢ, y] ⊆ S). Then S is star-shaped.

Theorem 2.42. In R³, a compact set is convex if and only if each 2-dimensional section of it is simply connected.

Theorem 2.43 (Bing). In R², assume one is given a bounded set with at least two points, such that with any two points it includes a third which forms together with them an equilateral triangle. Then the set is exactly the set of vertices of one equilateral triangle.

3 Lesson 2: Convex functions

Introduction, properties, operations

In fact, the main function we will need is the support function

h_K(u) = sup_{x ∈ K} ⟨x, u⟩.

We have already encountered it in the exercises. But convex sets are so intimately connected with convex functions that we may as well work with them for a couple of lessons. We will also use, for a convex set K, its convex indicator function δ_K(x), which is 0 on K and +∞ elsewhere.

Definition 3.1. A function f : Rⁿ → (−∞, +∞] is called convex if its epigraph, defined by

epi(f) = {(x, y) ∈ Rⁿ × R : f(x) ≤ y},

is a convex subset of Rⁿ⁺¹. A function f : A → (−∞, +∞] is called convex if its extension to Rⁿ by +∞ outside A is convex, so in particular A must be convex. The domain dom(f) of a convex function f is the (convex) set where f is finite. The constant +∞ function is called improper.

Theorem 3.2. A function f is convex if and only if

f(λx + (1−λ)y) ≤ λf(x) + (1−λ)f(y)

for all x, y ∈ Rⁿ and λ ∈ (0, 1). (Here, for positive c, c·(+∞) + a = +∞.)

Proof. By the definitions. The convexity of the epigraph means that if f(x) ≤ a and f(y) ≤ b, then f(λx + (1−λ)y) ≤ λa + (1−λ)b. This is equivalent to the same statement for a = f(x) and b = f(y).
The only slightly interesting thing to check is the behaviour of the +∞ values, and they work out.

Remark 3.3. Affine functions are convex and concave (and are the only functions which are both). The sub-level sets of a convex function are convex. Convex functions are closed under addition and multiplication by positive scalars. Also under pointwise supremum (since the epigraphs are intersected), though properness does not follow.
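The support function h_K mentioned above is, for a polytope, just a maximum over the vertices, and its convexity shows up as subadditivity plus positive homogeneity (Remark 3.5 below). A quick numerical sketch (the helper h is ours, NumPy assumed):

```python
# h_K(u) = max over vertices of <v, u>; check h(u + w) <= h(u) + h(w)
# and h(t u) = t h(u) for t >= 0 on random directions.
import numpy as np

def h(vertices, u):
    """Support function of conv(vertices) in direction u."""
    return max(np.dot(v, u) for v in vertices)

rng = np.random.default_rng(7)
V = rng.standard_normal((5, 3))          # vertices of a random polytope
for _ in range(1000):
    u, w = rng.standard_normal(3), rng.standard_normal(3)
    assert h(V, u + w) <= h(V, u) + h(V, w) + 1e-9
    t = rng.uniform(0, 3)
    assert abs(h(V, t * u) - t * h(V, u)) < 1e-9
```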

Remark 3.4 (Jensen's inequality). If f is convex, λᵢ ≥ 0 and ∑_{i=1}^m λᵢ = 1, then

f(∑_{i=1}^m λᵢxᵢ) ≤ ∑_{i=1}^m λᵢf(xᵢ).

More generally, given a measure µ on Rⁿ with ∫dµ = 1 (so, a probability measure), we have

f(∫x dµ(x)) ≤ ∫f(x) dµ(x),

which can be written as f(E_µ(X)) ≤ E_µ(f(X)). You may have seen a proof of this (in a probability class, say) as follows: take a supporting hyperplane to epi(f) at E_µ(X). (Why does it exist? The point (E_µ(X), f(E_µ(X))) is a boundary point of the epigraph, which is a convex set (one may look at its closure, if one is worried that it might not be closed), so by Theorem 2.27 we have a supporting hyperplane through it. It must be the boundary of an epigraph of an affine function: otherwise it is vertical, pointing in the Rⁿ direction, which means that E_µ(X) is at the boundary of the domain; this can only occur if X is distributed on a lower-dimensional set, in which case we restrict to this set to begin with.) We thus have, for some u,

f(x) ≥ f(E_µ(X)) + ⟨u, x − E_µ(X)⟩.

Taking expectation on both sides and using its linearity, we get that E_µ(f(X)) ≥ f(E_µ(X)).

Remark 3.5. For positively homogeneous functions, convex means subadditive. Indeed,

f(λx + (1−λ)y) ≤ f(λx) + f((1−λ)y) = λf(x) + (1−λ)f(y),

and vice versa. In particular, norms (even non-symmetric ones) are convex.

We mention that convex functions are important in optimization, mainly because a local minimum is global. Indeed, suppose x₀ is a local minimum, say in a ball around it of radius ρ. Take any y farther from x₀ and create

x = (ρ/|y − x₀|) y + (1 − ρ/|y − x₀|) x₀,

which satisfies |x − x₀| = ρ, thus f(x) ≥ f(x₀); but f(x) ≤ (ρ/|y − x₀|) f(y) + (1 − ρ/|y − x₀|) f(x₀), and thus f(x₀) ≤ f(y).

Next we look at a lemma on slopes, which really belongs with the discussion of regularity of convex functions, but is so elementary that we give it here. It is one-dimensional.

Lemma 3.6. Let I ⊆ R be an interval and f : I → R convex. Then for x < y < z in I,

(f(y) − f(x))/(y − x) ≤ (f(z) − f(x))/(z − x) ≤ (f(z) − f(y))/(z − y).
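Jensen's inequality for an empirical (finitely supported) distribution is easy to observe numerically (a sketch, assuming NumPy):

```python
# f(sum lam_i x_i) <= sum lam_i f(x_i) for convex f and random weights.
import numpy as np

f = lambda x: np.exp(x)                  # a convex function on R
rng = np.random.default_rng(3)
for _ in range(100):
    x = rng.standard_normal(10)
    lam = rng.random(10)
    lam /= lam.sum()                     # lam_i >= 0, sum lam_i = 1
    assert f(np.dot(lam, x)) <= np.dot(lam, f(x)) + 1e-12
```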

Proof. Let y = (1−λ)x + λz; then use the convexity of f:

(f(y) − f(x))/(y − x) ≤ ((1−λ)f(x) + λf(z) − f(x))/((1−λ)x + λz − x) = (f(z) − f(x))/(z − x),

since the numerator bound is convexity and the denominator equals λ(z − x). The second inequality is similar.

Corollary 3.7. Let I ⊆ R be an interval and f : I → R convex. Then f is Lipschitz on any fixed closed subinterval I' ⊆ int(I).

Proof. Take two points x' < x to the left of I' and two points y < y' to the right of I'. Then by the above Lemma (applied twice, say), for any z < z' in I' we have

(f(x) − f(x'))/(x − x') ≤ (f(z') − f(z))/(z' − z) ≤ (f(y') − f(y))/(y' − y).

Therefore

|f(z') − f(z)| ≤ |z' − z| · max( |f(y') − f(y)|/(y' − y), |f(x) − f(x')|/(x − x') ).

Below we shall show Lipschitz continuity in n dimensions too (on compact subsets of the interior of the domain), but this we do in the regularity subsection. Before that, some more introductory definitions and properties.

Definition 3.8. A convex function f is called closed if epi(f) ⊆ Rⁿ × R is closed. The closure of a convex function f is the largest closed function below it; its epigraph is the closure of epi(f). (Note (in the exercises) that, given a convex function, the closure of the epigraph is the epigraph of another convex function.)

Definition 3.9. The convex hull (or envelope, or convexification) of a function f : Rⁿ → (−∞, +∞] is defined as

conv(f) = sup{g : g ≤ f, g convex},

provided there exists some affine g ≤ f (which is equivalent to this set being non-empty). The convex hull, or convex infimum, of two, or many, functions is defined as the convex hull of their infimum, or, equivalently, via the operation of convex hull on the epigraphs. This can also be written as

conv(inf_{α∈I} f_α) = sup{g : g ≤ f_α for all α ∈ I, g convex}.

It exists if and only if there is an affine function below all the f_α.
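The monotonicity of the three difference quotients in Lemma 3.6 can be checked numerically, including at a point of non-differentiability (a sketch; the helper slope is ours):

```python
# For convex f and x < y < z, slope(x,y) <= slope(x,z) <= slope(y,z).
import random

def slope(f, a, b):
    return (f(b) - f(a)) / (b - a)

f = lambda t: abs(t) + t * t             # convex, kink at 0
random.seed(5)
for _ in range(1000):
    x, y, z = sorted(random.uniform(-5, 5) for _ in range(3))
    if y - x < 1e-9 or z - y < 1e-9:
        continue
    assert slope(f, x, y) <= slope(f, x, z) + 1e-9
    assert slope(f, x, z) <= slope(f, y, z) + 1e-9
```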

Theorem 3.10. Let f : Rⁿ → (−∞, +∞] be closed, convex and proper. Then

f = sup{h : h ≤ f, h affine}.

Proof. By the corresponding theorem for convex sets (Corollary 2.25), we can write the epigraph as an intersection of half-spaces in Rⁿ⁺¹. Separate these into three types, according to the (n+1)-st coordinate of the normal. Those pointing upwards, with a positive (n+1)-st coordinate, are precisely the epigraphs of affine functions. Those pointing downwards, with a negative (n+1)-st coordinate, cannot occur for proper functions, since then no ray pointing upwards would fit in the epigraph. Finally, there are the vertical ones, with zero (n+1)-st coordinate. Thus we should show that each such vertical half-space can be obtained as an intersection of half-spaces of the first type.

Let us work, without loss of generality, with the vertical half-space {⟨x, e⟩ ≤ 0}, where e ∈ Rⁿ ⊆ Rⁿ × R. Its participation in the intersection means that the domain of f satisfies ⟨x, e⟩ ≤ 0 for every x ∈ dom(f). There exists some affine function below f, as f is proper (just separate a point below epi(f) from epi(f)); call it l, so that l(x) ≤ f(x) for all x ∈ dom(f). We get that also l_a(x) = l(x) + a⟨x, e⟩ ≤ f(x) for all a > 0. The corresponding half-spaces epi(l_a) = {(x, y) : y ≥ l_a(x)} participate in the intersection. We claim that

∩_{a>0} epi(l_a) ⊆ {⟨x, e⟩ ≤ 0}.

Indeed, if y₀ ≥ l(x₀) + a⟨x₀, e⟩ for all a > 0, then ⟨x₀, e⟩ ≤ 0.

The Legendre transform

Definition 3.11. Let f : Rⁿ → (−∞, +∞] be convex and not identically +∞ (proper). Its Legendre transform is

(Lf)(y) = f*(y) = sup_{x ∈ Rⁿ} (⟨x, y⟩ − f(x)).

Theorem 3.12. For f : Rⁿ → (−∞, +∞] convex and proper, f* is always closed, convex and proper. We have that f** is the closure of f.

Remark 3.13. In fact, for any f we can take f*, and it will make sense, and we will have f** = the closure of conv(f).

Proof. The supremum is effectively only over the domain. Convexity follows from f* being a supremum of affine functions; closedness also (the epigraph is an intersection of closed sets).
Note that there is some affine function below f, say, y +c and thus f (y) sup x, y ( x, y +c) = c is not all + (proper). 15
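The definition can be illustrated numerically. A sketch of a discrete Legendre transform on a grid, using the test function f(x) = x²/2, which satisfies f* = f; the grids are arbitrary choices, and the x-grid is assumed wide enough that the supremum is attained inside it:

```python
import numpy as np

# Discrete Legendre transform on a grid (a sketch; the grids and the test
# function are arbitrary choices; the x-grid is assumed wide enough that
# the supremum is attained inside it).
def legendre(xs, fx, ys):
    """f*(y) = sup_x (<x, y> - f(x)), maximized over the grid points xs."""
    return np.array([np.max(y * xs - fx) for y in ys])

xs = np.linspace(-5.0, 5.0, 2001)
fx = 0.5 * xs**2                     # f(x) = x^2/2 satisfies f* = f
ys = np.linspace(-1.0, 1.0, 21)
fstar = legendre(xs, fx, ys)
assert np.allclose(fstar, 0.5 * ys**2, atol=1e-4)

# Biconjugate: f** = f for closed f, checked at one point
fss = legendre(ys, fstar, np.array([0.3]))
assert abs(fss[0] - 0.5 * 0.3**2) < 1e-2
```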

Next we check that it is an involution on closed functions, and that applying it twice is like taking a closure. One side is quite easy: f** ≤ f. Indeed, we see this by choosing y = x in the inner infimum:

f**(x) = sup_z (⟨x, z⟩ − f*(z)) = sup_z inf_y (⟨x, z⟩ − ⟨z, y⟩ + f(y)) ≤ f(x).

As f** is closed, also f** ≤ cl f. For the other direction, we use Theorem 3.10, which says that cl f is the supremum of the affine functions below f (this is separation, which one always needs for claims such as K°° = K). Thus cl f = sup{⟨·, y⟩ − c : ⟨·, y⟩ − c ≤ f}. The latter condition, after moving terms from side to side, can be written as c ≥ f*(y). So

cl f = sup{⟨·, y⟩ − c : c ≥ f*(y)} ≤ sup_y (⟨·, y⟩ − f*(y)) = f**.

If we had started with a closed function, this would be the end. It is anyway the end: since cl f ≤ f, we have (cl f)* ≥ f*, and applying Legendre again, (cl f)** ≤ f**. Putting these together we get cl f = (cl f)** ≤ f** ≤ cl f. The claim is done.

Up to here Shiur 4.1, 2 hours.

Regularity

Theorem 3.14. Convex functions are continuous in the interior of their domains, and Lipschitz continuous on compact subsets of the interior of their domain.

Proof. Given x in the interior of the domain, we can restrict to a simplex P in the interior of the domain which includes x in its interior. By convexity, the value on the whole simplex is bounded by the maximum of the values on its vertices, say C. Next we restrict to some neighborhood B_2^n(x, ρ) ⊆ P of x, and assume |y| = ρ. Now,

f(x + εy) = f((1 − ε)x + ε(x + y)) ≤ (1 − ε)f(x) + εC.

Thus f(x + εy) − f(x) ≤ ε(C − f(x)) ≤ 2εC. On the other hand,

x = (1/(1 + ε))(x + εy) + (ε/(1 + ε))(x − y),

so that

f(x) ≤ (1/(1 + ε)) f(x + εy) + (ε/(1 + ε)) C.

Thus f(x + εy) − f(x) ≥ ε(f(x) − C) ≥ −2εC. Together,

|f(x + εy) − f(x)| ≤ 2εC = |εy| · 2C/ρ,

and this gives continuity as ε → 0. Actually, this gives that for z = x + εy ∈ B_2^n(x, ρ),

|f(x) − f(z)| ≤ K|z − x|, where K = 2C/ρ.

To get that f is Lipschitz on compact subsets of the interior, fix some such compact subset A. By openness of the interior and compactness of A there is some ρ with A + ρB_2^n ⊆ int(dom f) (take ρ less than the distance, which is not 0, between A and the boundary of dom(f)). We can now use for C the maximum of f on A + ρB_2^n, which is finite by continuity. The same argument as above then gives that |f(x) − f(z)| ≤ |x − z| · 2C/ρ.

Differentiability is a slightly more complicated issue. For this introduction, we shall start with n = 1. We then consider directional derivatives of higher dimensional functions, but do not go deep into the theory here. Note that by Lebesgue's theorem, an absolutely continuous function on an interval is almost everywhere differentiable (and we have locally Lipschitz; in one dimension this was even easier than above). In the convex case, however, this becomes even stronger: there are at most countably many points without differentiability.

Theorem 3.15. Let f : R → (−∞, +∞] be convex, x ∈ int(dom f).

(i) The right and left derivatives exist and satisfy f'_- ≤ f'_+.

(ii) Each is monotonically increasing, and they are equal except at at most countably many points.

(iii) The function f'_+ is continuous from the right, and f'_- from the left, and f is the indefinite integral of each.

Note that (iii) in particular implies that if a convex function is differentiable, it is continuously differentiable!

Proof. We start by showing the existence of f'_- and f'_+ with, for x < y,

f'_-(x) ≤ f'_+(x) ≤ f'_-(y).

Indeed, taking w < x < z < y, we have by Lemma 3.6,

(f(w) − f(x))/(w − x) ≤ (f(z) − f(x))/(z − x) ≤ (f(y) − f(x))/(y − x) ≤ (f(y) − f(z))/(y − z).

By the same Lemma, the first two expressions, as functions of w and z, are non-decreasing, so they have limits as w → x⁻ and z → x⁺, and these are precisely f'_-(x) ≤ f'_+(x). Taking the limit of the fourth expression as z → y⁻ we get that

f'_+(x) ≤ (f(y) − f(x))/(y − x) ≤ f'_-(y) (≤ f'_+(y)),

as required. So, we have shown their existence, the inequality between them, and that they are increasing. An increasing function has at most countably many points of discontinuity. If x is a point of continuity of f'_-, say, then

f'_-(x) ≤ f'_+(x) ≤ lim_{y→x⁺} f'_-(y) = f'_-(x).

So, in particular, the left and right derivatives are equal except at countably many points, and at such points f is differentiable. This completes (ii).

Let us show that f'_+ is continuous from the right, namely lim_{z→x⁺} f'_+(z) = f'_+(x). The inequality ≥ is immediate from monotonicity. On the other hand, for any y > x we have

(f(y) − f(x))/(y − x) = lim_{z→x⁺} (f(y) − f(z))/(y − z) ≥ lim_{z→x⁺} f'_+(z),

and taking the limit of the left hand side as y → x⁺ we get that f'_+(x) ≥ lim_{z→x⁺} f'_+(z). We remark that one could have actually used the more general fact that if a non-decreasing sequence of real continuous non-decreasing functions on an interval I has a limit g, then g is continuous from the left. Then note that f'_-, say, is the limit of the continuous non-decreasing functions

g_n(x) = (f(x − 1/n) − f(x))/(−1/n).

Finally, we need to complete (iii) by showing that the indefinite integral of f'_-, say, is f.

Pick some a and define

g(x) = f(a) + ∫_a^x f'_-(t) dt.

Convexity of g is easy: let z = (1 − λ)x + λy for x < y and λ ∈ [0, 1]; then

g(z) − g(x) = ∫_x^z f'_-(t) dt ≤ (z − x) f'_-(z),

g(y) − g(z) = ∫_z^y f'_-(t) dt ≥ (y − z) f'_-(z),

so

λg(y) + (1 − λ)g(x) ≥ λ(y − z)f'_-(z) + (1 − λ)(x − z)f'_-(z) + g(z) = g(z),

since λ(y − z) + (1 − λ)(x − z) = 0. But if g is convex, it has left and right derivatives, and then one computes that

g'_+(x) = lim_{y→x⁺} (g(y) − g(x))/(y − x) = lim_{y→x⁺} (1/(y − x)) ∫_x^y f'_-(t) dt = lim_{y→x⁺} (1/(y − x)) ∫_x^y f'_+(t) dt ≥ f'_+(x),

where we used that f'_- = f'_+ except at countably many points. Analogously, g'_-(x) ≤ f'_-(x). So we have the chain g'_- ≤ f'_- ≤ f'_+ ≤ g'_+, with equality except at countably many points. In particular, g'_- = f'_- and g'_+ = f'_+ except at countably many points. But as both pairs of functions are continuous from the left (resp. right), they are equal everywhere. So f − g has derivative 0 everywhere, thus is constant, and the constant is 0 by comparing at a.

Up to here Shiur 4.2, 1 hour.

Remark 3.16. The connection between f'(x) (or, more generally, f'_+(x) and f'_-(x)) and the supporting half-plane is straightforward: f(y) ≥ f(x) + u(y − x) whenever u ∈ [f'_-(x), f'_+(x)]. This was noted in the above proof; indeed,

(f(y) − f(x))/(y − x) ≥ f'_+(x) for y > x, and (f(y) − f(x))/(y − x) ≤ f'_-(x) for y < x.

Corollary 3.17. Let f : R^n → (−∞, +∞] be convex, x ∈ int(dom(f)). Then for each u ∈ S^{n-1}, or, more generally, each u ∈ R^n, the directional derivative

f'(x; u) := lim_{t→0⁺} (f(x + tu) − f(x))/t

exists.

Proof. Simply define g : R → (−∞, +∞] by g(t) = f(x + tu) and use that it is convex, thus differentiable from the right at 0.

Remark 3.18. Note that even though the directional derivatives exist, this need not be a linear operation with respect to u; in other words, f is not necessarily differentiable. However, as a function of u (defined for all u ∈ R^n) the expression f'(x; u) is convex (and positively

homogeneous). Indeed,

f'(x; λu + (1 − λ)v) = lim_{t→0⁺} (f(x + t(λu + (1 − λ)v)) − f(x))/t
= lim_{t→0⁺} (f(λ(x + tu) + (1 − λ)(x + tv)) − f(x))/t
≤ lim_{t→0⁺} (λ(f(x + tu) − f(x)) + (1 − λ)(f(x + tv) − f(x)))/t
= λ lim_{t→0⁺} (f(x + tu) − f(x))/t + (1 − λ) lim_{t→0⁺} (f(x + tv) − f(x))/t.

Remark 3.19. Let us list some things which are not too hard to show but which we skip; parts of them appear below, though. For a convex function, differentiability at a point x is equivalent to the existence of all the partial derivatives ∂f/∂e_j(x). Another thing equivalent to both is that at (x, f(x)) the epigraph has a unique supporting affine hyperplane, and then it is given by a(y) = f(x) + ⟨∇f(x), y − x⟩. It is true that f is differentiable almost everywhere (with respect to Lebesgue measure). Actually, the second derivative also exists almost everywhere; this is a deeper theorem, due to Alexandrov. Finally, a convex differentiable function (on an open domain) is C¹, that is, all partial derivatives are continuous. We might get back to these things later in the course.

In the case where a function is differentiable, or twice so, there is a simple way to characterize convexity. We summarize these in the following claim.

Theorem 3.20. Let f : A → R, where A ⊆ R^n is convex and open.

1) In n = 1: if f is differentiable, f is convex if and only if f' is increasing; if f is twice differentiable, f is convex if and only if f'' ≥ 0.

2) In general n: if f is differentiable, f is convex if and only if ∇f is monotone in the following sense: for all x, y ∈ A,

⟨∇f(x) − ∇f(y), x − y⟩ ≥ 0.

A third equivalent condition is that f(y) − f(x) ≥ ⟨∇f(x), y − x⟩ for all x, y in the domain. If f is twice differentiable, f is convex if and only if ∇²f(x) ⪰ 0 (that is, it is a positive semi-definite matrix) for all x ∈ A.

Proof. Part (1) is from basic calculus courses. If f is convex, we have already shown that its (one-sided) derivative is increasing. If the derivative exists and is increasing, one uses the mean value theorem twice, once for f(λx + (1 − λ)y) − f(x) and once for f(y) − f(λx + (1 − λ)y).
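The monotonicity criterion in part (2) can be sanity-checked numerically. A sketch with the arbitrary smooth convex example f(x) = |x|² + exp(x₁), which is not from the text:

```python
import numpy as np

# Gradient-monotonicity check for the arbitrary smooth convex example
# f(x) = |x|^2 + exp(x_1) (a sketch, not from the text).
def f(x):
    return float(np.dot(x, x) + np.exp(x[0]))

def grad_f(x):
    g = 2.0 * x
    g[0] += np.exp(x[0])
    return g

rng = np.random.default_rng(0)
for _ in range(100):
    x, y = rng.normal(size=2), rng.normal(size=2)
    # <grad f(x) - grad f(y), x - y> >= 0 characterizes convexity
    assert np.dot(grad_f(x) - grad_f(y), x - y) >= 0.0
    # the supporting-hyperplane condition f(y) >= f(x) + <grad f(x), y - x>
    assert f(y) >= f(x) + np.dot(grad_f(x), y - x) - 1e-9
```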
The fact about the second derivative is just the connection between a function being increasing and non-negativity of its derivative.

Let us turn to the more general part (2). Start with differentiable f. Let us show that (a) f is convex implies (b) for all x, y, f(y) − f(x) ≥ ⟨∇f(x), y − x⟩, which implies (c) ⟨∇f(x) − ∇f(y), x − y⟩ ≥ 0, which implies (a).

Assume f is convex; then f(x + λ(y − x)) ≤ (1 − λ)f(x) + λf(y), therefore

(f(x + λ(y − x)) − f(x))/λ − ⟨∇f(x), y − x⟩ ≤ f(y) − f(x) − ⟨∇f(x), y − x⟩.

The chain rule implies that the left hand side tends to 0 as λ → 0⁺, and thus the right hand side is non-negative, as claimed.

(b) implies (c): if f(y) ≥ f(x) + ⟨∇f(x), y − x⟩ and f(x) ≥ f(y) + ⟨∇f(y), x − y⟩, then adding the two inequalities one gets the mentioned monotonicity.

(c) implies (a): convexity is a line-issue, so we may restrict to a line; the condition implies that on every line the derivative is increasing, and this implies convexity. More precisely, define g(t) = f(x + tv) where v = y − x; all we need is to show that g is convex. For s < t,

g'(t) − g'(s) = ⟨∇f(x + tv), v⟩ − ⟨∇f(x + sv), v⟩ = ⟨∇f(x + tv) − ∇f(x + sv), (x + tv) − (x + sv)⟩/(t − s),

which is non-negative by (c); by the one-dimensional claim, g is convex, and thus so is f.

As for the Hessian: intuitively, one thinks of Taylor's remainder theorem, which in the case of a positive semi-definite Hessian implies that the difference between f and its tangent hyperplane is non-negative. Instead (and to show the "if and only if"), we will work in one dimension again. Namely, we define as above g(t) = f(x + tv), and f is convex if and only if g is convex for any choice of x and v. This is equivalent to g'' ≥ 0. It is in fact enough that g''(0) ≥ 0, because we may move x accordingly. Now

g''(0) = lim_{t→0⁺} (g'(t) − g'(0))/t = lim_{t→0⁺} ⟨∇f(x + tv) − ∇f(x), v⟩/t = Σ_i v_i lim_{t→0⁺} (∂_i f(x + tv) − ∂_i f(x))/t = Σ_i Σ_j ∂_{ij} f(x) v_i v_j = ⟨∇²f(x)v, v⟩.

The subdifferential

In the non-differentiable case, there is a replacement for the gradient, which is given in the following definition.

Definition 3.21. Let f : R^n → (−∞, +∞] be convex and x ∈ dom(f). The subdifferential of f at x is defined by

∂f(x) = {u ∈ R^n : f(y) ≥ f(x) + ⟨u, y − x⟩ for all y}.

Any u ∈ ∂f(x) is called a subgradient of f at x.

Remark 3.22. Let x ∈ int(dom(f)). Then the set ∂f(x) is non-empty, since there is at least one supporting hyperplane at (x, f(x)), and it is the graph of an affine function which lies below f and touches it at (x, f(x)). The set ∂f(x) is convex. For x ∈ int(dom(f)), it is also compact (the defining condition is closed, and ∂f(x) cannot include a ray, since x is in the interior of the domain).

Theorem 3.23. Let f : R^n → (−∞, +∞] be convex and x ∈ dom(f). Then u ∈ ∂f(x) if and only if (u, −1) is an exterior normal to epi(f) at (x, f(x)), that is,

epi(f) ⊆ (x, f(x)) + {w : ⟨w, (u, −1)⟩ ≤ 0}.

Proof. The condition epi(f) ⊆ (x, f(x)) + {w : ⟨w, (u, −1)⟩ ≤ 0} means that whenever (y, t) ∈ epi(f) we have ⟨y − x, u⟩ − (t − f(x)) ≤ 0, which is equivalent to ⟨y − x, u⟩ ≤ f(y) − f(x) for all y, which is the definition of the subgradient.

Remark 3.24. Let f : R^n → (−∞, +∞] be convex, let x ∈ int(dom(f)), and assume f is differentiable at x. Then ∂f(x) = {∇f(x)}. Indeed, let u ∈ ∂f(x). Then for all t > 0 we have (f(x + tv) − f(x))/t ≥ ⟨u, v⟩, so that f'(x; v) ≥ ⟨u, v⟩. But under differentiability f'(x; v) = −f'(x; −v) ≤ ⟨u, v⟩, and we get that f'(x; v) = ⟨u, v⟩. This holds for all v, so u is unique, and it is ∇f(x). We remark that this is in fact an "if and only if", but the other direction is slightly more work: move to one dimension, get that the function is differentiable on any line, and use the fact (which we also did not bother to prove) that this implies differentiability in the convex case.

We shall show below that in fact f'(x; u) = sup_{v ∈ ∂f(x)} ⟨v, u⟩. One probably gets the feeling how much this subgradient is related to the Legendre transform. Let us mention only one of many such relations.

Theorem 3.25. Let f : R^n → (−∞, +∞] be convex, closed and proper. Then

f*(y) + f(x) ≥ ⟨x, y⟩,

with equality if and only if y ∈ ∂f(x), and if and only if x ∈ ∂f*(y).

Proof. The inequality is by definition: f*(y) = sup_z (⟨z, y⟩ − f(z)) ≥ ⟨x, y⟩ − f(x). In the differentiable case, the equality case can be produced by differentiation (to find the maximum). In the general case: assume y ∈ ∂f(x); then f(z) ≥ f(x) + ⟨y, z − x⟩ for all z, and so

f*(y) = sup_z (⟨z, y⟩ − f(z)) ≤ sup_z (⟨z, y⟩ − f(x) − ⟨y, z − x⟩) = ⟨x, y⟩ − f(x) ≤ f*(y).

Assume equality holds. Then sup_z (⟨z, y⟩ − f(z)) = ⟨x, y⟩ − f(x), which means that for all z, f(z) ≥ f(x) + ⟨z − x, y⟩, that is, y ∈ ∂f(x). Since, under the conditions, f** = f, we also get the second "if and only if". Note that in particular we have just shown that y ∈ ∂f(x) iff x ∈ ∂f*(y). In the differentiable setting this is sometimes stated as (∇f*) ∘ ∇f = Id, which can be further differentiated... maybe later in the course.

The support function

Definition 3.26. For K ⊆ R^n non-empty and convex,

h_K(u) = sup_{x ∈ K} ⟨x, u⟩, where u ∈ R^n.

Theorem 3.27. Let K, T ⊆ R^n be non-empty and convex. Then

1) h_K is positively homogeneous, closed, convex.

2) h_K = h_{cl K}, and cl K = {x : ⟨x, u⟩ ≤ h_K(u) for all u}.

3) h is order preserving: K ⊆ T implies h_K ≤ h_T, and h_K ≤ h_T implies cl K ⊆ cl T.

4) h_K is finite if and only if K is bounded.

5) h_{λK+μT} = λh_K + μh_T for λ, μ ≥ 0, and h_{−K}(u) = h_K(−u).

6) h_{conv(∪K_i)} = sup h_{K_i}, and h_{∩K_i} is the convex infimum of the h_{K_i}.

7) (δ_K)* = h_K, where δ_K is the convex indicator of K.

Proof. Straightforward.

1) By the definitions we get homogeneity and subadditivity. Closed as a supremum of closed (affine) functions.

2) Taking the supremum over the closure does not change its value. The second statement is like a double dual; one needs some separation argument. Clearly the left hand side is a subset of the right hand side. Assume thus one has some x such that ⟨x, u⟩ ≤ h_K(u) for all u. If x ∉ cl K then we may separate them with a hyperplane. Assume it is in direction u, that is, ⟨x, u⟩ > c ≥ ⟨y, u⟩ for all y ∈ K. Then h_K(u) ≤ c < ⟨x, u⟩, which is a contradiction.

3) The first part is from the definition. The second is from the second part of 2).

4) If K is bounded, within a ball of radius R, then h_K ≤ h_{B_2^n(R)} < ∞. If h_K is finite then from convexity and regularity it must be continuous. In particular, on S^{n-1} it is bounded, and this gives h_K ≤ h_{B_2^n(R)} for some R, which by 3) implies K is bounded.

5) By definition: separate within the supremum, or change signs within the expression.

6) Clearly K_i ⊆ K := conv(∪K_i), so h_{K_i} ≤ h_K, and so sup h_{K_i} ≤ h_K. On the other hand, given some x ∈ K, it can be written as x = Σλ_i x_i with x_i ∈ K_i, λ_i ≥ 0 and Σλ_i = 1 (for some finite sub-collection, of size n + 1). Thus

h_K(u) = sup_{x∈K} ⟨x, u⟩ = sup_{x_i ∈ K_i, λ_i ≥ 0, Σλ_i = 1} ⟨Σλ_i x_i, u⟩ ≤ sup Σλ_i h_{K_i}(u) ≤ sup h_{K_i}(u).

The second one will be in the exercises.

7) Was in the previous exercise.

Up to here Shiur 5.1, 2 hours.

Theorem 3.28. Every positively homogeneous closed convex h : R^n → (−∞, +∞] is the support function of some (unique) closed convex set K.

Proof. Take h*. For a > 0, replacing x by ax in the supremum and using homogeneity, h*(u) = sup_x (⟨ax, u⟩ − h(ax)) = a h*(u), so h* takes only the values 0 and +∞. Letting K be the domain of h* (where it is 0), we have h* = δ_K, and then h = h** = (δ_K)* = h_K.
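For polytopes the support function is a finite maximum over the vertices, which makes properties such as homogeneity and Minkowski additivity easy to verify numerically. A sketch with arbitrary example bodies (not from the text):

```python
import numpy as np

# Support functions of polytopes as finite maxima (a sketch; the vertex
# sets below are arbitrary examples, not from the text).
def h(vertices, u):
    """h_K(u) = max_i <x_i, u> for K = conv(vertices)."""
    return max(float(np.dot(x, u)) for x in vertices)

K = [np.array(v, float) for v in [(0, 0), (1, 0), (0, 1)]]              # a triangle
T = [np.array(v, float) for v in [(-1, -1), (1, -1), (1, 1), (-1, 1)]]  # a square
KT = [x + y for x in K for y in T]  # vertices of K + T are among the pairwise sums

rng = np.random.default_rng(1)
for _ in range(50):
    u = rng.normal(size=2)
    # Minkowski additivity: h_{K+T}(u) = h_K(u) + h_T(u)
    assert abs(h(KT, u) - (h(K, u) + h(T, u))) < 1e-9
    # positive homogeneity
    assert abs(h(K, 3.0 * u) - 3.0 * h(K, u)) < 1e-9
```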
Remark 3.29. 1) K = {x} iff h_K = ⟨x, ·⟩. 2) h_{K+{x}} = h_K + ⟨x, ·⟩. 3) K = −K iff h_K(−u) = h_K(u) for all u. 4) 0 ∈ K iff h_K ≥ 0.

Same as we had support sets before, we revisit them again, but with the aid of the support function. This fixes the directions which were less determined before.

Definition 3.30. Let K ⊆ R^n be non-empty, closed and convex. Define

H(u) = {x : ⟨x, u⟩ = h_K(u)}, K(u) = K ∩ H(u) = {x ∈ K : ⟨x, u⟩ = h_K(u)}.

If h_K(u) < ∞ then the hyperplane H(u) is a good candidate for being a supporting hyperplane for K, as we always have K ⊆ H⁻(u) (with no possibility to translate it further). The corresponding support set ought to be K(u). As usual, in the unbounded case, the set K(u) might be empty, in which case this is not formally a supporting hyperplane. If K(u) ≠ ∅ then indeed K(u) is a support set and H(u) a supporting hyperplane.

We shall from now on work in the setting of non-empty compact convex sets, and to these we refer as convex bodies.

Definition 3.31. A compact convex non-empty set is called a convex body. The class of all convex bodies in R^n is denoted K^n. If we also require them to include 0, this is K^n_0. If 0 is assumed to be in the interior, this is denoted K^n_{(0)}.

The support function of polytopes is very simple: if P = conv{x_i}_{i=1}^m, by the above remarks

h_P(u) = max_{i=1,...,m} ⟨x_i, u⟩.

So, we see that the support function is piecewise linear. (The definition of piecewise linear is that there is a partition of R^n into finitely many convex sets such that the function is linear on each.)

Proposition 3.32. Let K ∈ K^n. Then h_K is piecewise linear if and only if K is a polytope.

Proof. One direction we have just seen. For the other, assume h_K is piecewise linear. Since it is positively homogeneous, its linearity sets must be cones, say some finitely many A_i's. On each A_i, the function is given by ⟨·, x_i⟩ for some x_i (from homogeneity). Convexity of h_K implies that h_K ≤ max_i ⟨·, x_i⟩, but since on each A_i the maximum is attained for the i-th element, we have equality.

We conclude this chapter with a claim that seems esoteric but will later play a part.

Proposition 3.33. Let f : R^n → (−∞, +∞] be convex and let x ∈ int(dom(f)). The directional derivative in direction z is given by

f'(x; z) = sup_{v ∈ ∂f(x)} ⟨v, z⟩ = h_{∂f(x)}(z).

Up to here Shiur 5.2, 1 hour, date 1.12

Proof. One direction is easy: if v ∈ ∂f(x) then f(w) ≥ f(x) + ⟨v, w − x⟩ for all w, in particular for w = x + tz, so we get

f'(x; z) = lim_{t→0⁺} (f(x + tz) − f(x))/t ≥ lim_{t→0⁺} (f(x) + t⟨v, z⟩ − f(x))/t = ⟨v, z⟩.

For the full statement, we shall use the fact that, as a function of z, f'(x; z) is positively homogeneous and convex (so, sub-additive), as we saw in Remark 3.18. Further, we may assume it is finite, since x is in the interior of the domain and we have the simple lemma on slopes of one-dimensional functions. Therefore, it is also closed (in fact, continuous).

There is therefore, by Theorem 3.28, some convex set K_x such that f'(x; z) = h_{K_x}(z). We need to show that K_x = ∂f(x). Let y ∈ K_x. Then, for unit z ∈ S^{n-1}, from convexity,

f(x + tz) − f(x) ≥ t f'(x; z) = t h_{K_x}(z) ≥ t⟨y, z⟩.

Thus, for any w, letting w = x + tz with unit z, we get f(w) − f(x) ≥ ⟨y, w − x⟩. This means that y ∈ ∂f(x). If y ∈ ∂f(x) then for any z, f(x + tz) ≥ f(x) + t⟨y, z⟩. Taking the limit as t → 0⁺ of the relevant quantity we see that h_{K_x}(z) = f'(x; z) ≥ ⟨y, z⟩. This already means that y ∈ K_x, as needed.

Theorem 3.34. Let K ∈ K^n and u ∈ R^n \ {0}. Then

∂h_K(u) = K(u),

and in particular, for any z ∈ R^n, h'_K(u; z) = h_{K(u)}(z).

Proof. We shall use the previous theorem with the function f given by h_K. The only thing left to show is that the subdifferential of h_K at a point u is precisely the support set K(u). Now

∂h_K(u) = {v ∈ R^n : h_K(y) ≥ h_K(u) + ⟨v, y − u⟩ for all y} = {v ∈ R^n : sup_{x∈K}⟨x, y⟩ ≥ sup_{x∈K}⟨x, u⟩ + ⟨v, y − u⟩ for all y}.

If v ∈ K(u) then h_K(u) = ⟨u, v⟩, and then for all y,

h_K(y) ≥ ⟨v, y⟩ = h_K(u) + ⟨v, y − u⟩,

since in particular v ∈ K. So we see v ∈ K(u) ⟹ v ∈ ∂h_K(u). If v ∈ ∂h_K(u), then plug in y = 0 to get ⟨v, u⟩ ≥ h_K(u), and y = 2u to get 2h_K(u) ≥ h_K(u) + ⟨v, u⟩, so ⟨v, u⟩ ≤ h_K(u), which means there is equality. To see that v ∈ K (which would complete the fact that v ∈ K(u)), note that

⟨v, y⟩ − h_K(y) ≤ ⟨v, u⟩ − h_K(u) = 0

for all y, so ⟨v, y⟩ ≤ h_K(y) for all y, and by Theorem 3.27(2), v ∈ K.

Corollary 3.35. Let K ∈ K^n and u ∈ R^n \ {0}. The set K(u) is one point if and only if h_K is differentiable at u (in which case h'_K(u; ·) is the support function of a point, thus linear, and is given by the gradient). In such a case, one has the well known formula

⟨∇h_K(u), u⟩ = h_K(u).

For h_K to be differentiable on all of R^n \ {0}, one needs (iff) the boundary of the set K not to contain any segments. Such K's are called strictly convex.
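The corollary can be illustrated with the Euclidean ball, where h(u) = |u| and the unique support point in direction u is u/|u|. A sketch using a finite-difference gradient (the step size is an arbitrary choice):

```python
import numpy as np

# For the Euclidean ball K = B_2^n, h_K(u) = |u| and the unique support
# point is x(u) = u/|u| = grad h_K(u) (a sketch, not from the text).
def h_ball(u):
    return float(np.linalg.norm(u))

def num_grad(f, u, eps=1e-6):
    """Central finite-difference gradient (illustration only)."""
    g = np.zeros_like(u)
    for i in range(len(u)):
        e = np.zeros_like(u)
        e[i] = eps
        g[i] = (f(u + e) - f(u - e)) / (2 * eps)
    return g

u = np.array([3.0, 4.0])
g = num_grad(h_ball, u)
assert np.allclose(g, u / np.linalg.norm(u), atol=1e-5)   # x(u) = u/|u|
assert abs(np.dot(g, u) - h_ball(u)) < 1e-4               # Euler: <grad h, u> = h(u)
```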
Remark 3.36. To see more clearly why indeed the support set is the subdifferential, one may restrict to the differentiable case and call K(u) = {x(u)} the support set, so we have an equation h_K(u) = ⟨x(u), u⟩, which when differentiated yields ∇h_K(u) = x(u) + D_x(u)u. However, as

x(u) is clearly constant on rays (it is not defined at 0), its differential at u must include u in its kernel (all components do not change in the direction u), so we are left with x(u) only.

4 Lesson 3: Mixed Volumes

We are heading towards the formula for Vol(K + sT), which is a polynomial of degree n in s > 0, or more generally for Vol(Σ_{i=1}^m λ_i K_i). In R², for a polytope P with perimeter l(P) plus a ball, it is pretty easy to see that

Vol(P + εB_2²) = Vol(P) + ε·l(P) + ε²π.

Here is a picture in R³ of the cube plus a small ball εB_2³; try to think what the coefficients of ε, ε², ε³ are.

We shall start with polytopes, and then use them to approximate convex bodies. To this end we need two things: a metric on convex bodies, and some kind of compactness argument to have converging sub-sequences (this is useful in general). After these two tools, we shall investigate volume and surface area on polytopes and on general convex bodies. We shall see that convexity allows us to define these objects more naïvely than is done in Chedva III, say.

4.1 Hausdorff distance

The space of convex bodies in R^n is endowed with a natural topology. It can be introduced as the topology induced by the Hausdorff metric.

Definition 4.1.

δ_H(K, T) = max{ max_{x∈K} min_{y∈T} |x − y|, max_{x∈T} min_{y∈K} |x − y| }.
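The max-min definition can be computed by brute force on finite samples of two bodies. A crude sketch (the boxes and grid resolution are arbitrary choices, and sampling only approximates the true distance):

```python
import numpy as np

# Brute-force Hausdorff distance between two sampled bodies (a sketch;
# the boxes and grid resolution are arbitrary choices).
def hausdorff(A, B):
    """The max-min definition, evaluated on finite samples A, B."""
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # pairwise distances
    return max(D.min(axis=1).max(), D.min(axis=0).max())

def box(x1, y1, n=40):
    xs, ys = np.linspace(0, x1, n), np.linspace(0, y1, n)
    return np.array([(x, y) for x in xs for y in ys])

K, T = box(1, 1), box(2, 1)   # K = [0,1]^2 is contained in T = [0,2]x[0,1]
# The farthest point of T from K is (2, y), at distance 1:
assert abs(hausdorff(K, T) - 1.0) < 0.05
```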

Lemma 4.2. For K, T ∈ K^n we have

(i) δ_H(K, T) = inf{δ ≥ 0 : K ⊆ T + δB_2^n, T ⊆ K + δB_2^n}

(ii) δ_H(K, T) = max{|h_K(u) − h_T(u)| : u ∈ S^{n-1}}.

Note that (ii) in particular means that the embedding K ↦ h_K from K^n to C(S^{n-1}) is an isometry between (K^n, δ_H) and the subset H^n ⊆ C(S^{n-1}) consisting of the convex functions on the sphere, endowed with the supremum norm. (A function on S^{n-1} is called convex if its homogeneous extension to R^n is.) Note that this mapping T : K^n → H^n is positively linear, bijective, and order-preserving (between inclusion and pointwise inequality). Note that there is no linearity for negative scalars, namely h_{K−T} ≠ h_K − h_T in general.

Proof. The first is from the definitions: K ⊆ T + δB_2^n iff for every x ∈ K there exists y ∈ T with |x − y| ≤ δ, iff max_{x∈K} min_{y∈T} |x − y| ≤ δ, and the same for the other one. The second is not much more: K ⊆ T + δB_2^n iff h_K ≤ h_T + δh_{B_2^n} which, as functions on S^{n-1}, means h_K − h_T ≤ δ. We see that δ_H(K, T) ≤ δ iff ‖h_K − h_T‖_{C(S^{n-1})} ≤ δ.

Corollary 4.3. For K, L, M ∈ K^n we have δ_H(K, L) = δ_H(K + M, L + M). In particular, we have the cancellation law mentioned much earlier:

K + M = L + M ⟹ K = L.

Remark 4.4. Maybe you are asking yourselves what are the isometries of the space of convex bodies with respect to this metric. Apparently such results exist (Gruber, Schneider, Lettl), and the only isometries are of the form K ↦ UK + L for a fixed body L and rigid motion U. See page 236 of Gruber's book.

4.2 Compactness: Blaschke's selection theorem

Theorem 4.5. Any sequence of convex bodies K_j ∈ K^n, all of whose elements are contained in some fixed ball, has a converging subsequence.

Proof. One way is to use Ascoli-Arzelà for the distance functions. We shall instead give a direct proof with subsequences. First we construct a rectangular array of bodies K_{i,j}, where each row is a subsequence of the row above it, namely {K_{i,j}}_j is a subsequence of {K_{i−1,j}}_j, and moreover the distances between all elements in the i-th row {K_{i,j}}_j are at most 1/2^i. This is done by

covering the big ball by small ones of radius 2^{−i} (a finite number, N_i, using compactness) and passing to an infinite subsequence of {K_{i−1,j}}_j whose elements all intersect exactly the same balls of the covering (that is, they determine the same subset of {1, ..., N_i}). Next, take the diagonal, to get a subsequence K_{j,j} with distance between each two at most 1/2^{min(i,j)}. We claim that, taking K_{n_j} = K_{j,j}, the sequence K_{n_j} + 2^{−(j−1)}B_2^n is monotone decreasing: since K_{n_{j+1}} ⊆ K_{n_j} + 2^{−j}B_2^n, just add 2^{−j}B_2^n to both sides. We thus may intersect all of its elements to get a set which we shall call

K_0 = ∩_j (K_{n_j} + 2^{−(j−1)}B_2^n).

We claim this K_0 is the limit of the K_{n_j}. Indeed, on the one hand, for any ε > 0 and j large enough so that 2^{−(j−1)} < ε, clearly K_0 is included in K_{n_j} + εB_2^n. On the other hand, take K_0 + εB_2^n and let G denote its interior. We may subtract G from each K_{n_j} + 2^{−(j−1)}B_2^n and get a decreasing sequence of compact sets

... ⊇ (K_{n_j} + 2^{−(j−1)}B_2^n) \ G ⊇ (K_{n_{j+1}} + 2^{−j}B_2^n) \ G ⊇ ...

whose intersection is included both in K_0 (from its definition) and in R^n \ G, and is thus empty (since K_0 and R^n \ G do not intersect). From the finite intersection property, the sequence must be empty from some point onwards, and in particular, for large enough j, K_{n_j} ⊆ G ⊆ K_0 + εB_2^n. We thus get that K_{n_j} → K_0, as needed.

To be able to define things first on polytopes and then extend to general bodies, we shall need to approximate bodies by polytopes (and hope that the property/quantity discussed is monotone or continuous). Here is the density property:

Theorem 4.6. Let K ∈ K^n and ε > 0.

(i) There exists a polytope P ⊆ K with δ_H(P, K) ≤ ε.

(ii) There exists a polytope P ⊇ K with δ_H(P, K) ≤ ε.

(iii) If 0 ∈ relint(K), there exists a polytope P ⊆ relint(K) with K ⊆ relint((1 + ε)P) (and in particular, the same holds omitting the relint's).

Proof. (i) We cover the boundary ∂K by a finite number of open balls of radius ε centred at boundary points, ∂K ⊆ ∪_{i=1}^m (x_i + εB_2^n). Let P = conv(x_i)_{i=1}^m. Then P ⊆ K and ∂K ⊆ P + εB_2^n, so by convexity K ⊆ P + εB_2^n, as needed.

(ii) We cover ∂(K + εB_2^n) by open halfspaces whose boundaries are supporting hyperplanes of K. (This can be done since points on this boundary are necessarily not in K.) By compactness, we find a finite sub-covering, ∂(K + εB_2^n) ⊆ ∪_{i=1}^m int(H_i^+). We let P = ∩_{i=1}^m H_i^−. Then K ⊆ P; and since P is convex, contains K, and does not meet ∂(K + εB_2^n), we get P ⊆ K + εB_2^n, as needed.

(iii) Exercise number 42.

Up to here Shiur 6.1, 2 hours.

4.3 Volume and surface area: polytopes

The volume we are acquainted with is Lebesgue measure on R^n. This notion coincides with a much more elementary one, which we next define for convex sets. We shall use approximation by polytopes, and for polytopes a recursive definition with respect to dimension. To this end we shall need to consider the support sets of a convex body as sets in a lower dimensional space -

this is done by, for example, translating a support set K(u) to the hyperplane u^⊥, which is then identified with R^{n−1}. (Any choice of an orthonormal basis there will work in the same way; the only thing that matters is the euclidean metric, which is inherited from R^n.)

Definition 4.7. Let P ∈ K^n be a polytope. For n = 1 we define Vol_1([a, b]) = b − a and S_1(P) = 2 (surface area). For n ≥ 2 we define

Vol_n(P) = (1/n) Σ_u h_P(u) Vol_{n−1}(P(u)) if dim(P) ≥ n − 1, and Vol_n(P) = 0 if dim(P) ≤ n − 2;

S_n(P) = Σ_u Vol_{n−1}(P(u)),

where the sums run over the unit outer normals u of the facets of P. Note that if the polytope is of dimension n − 1, its volume thus defined is 0 (since the only two support sets of dimension n − 1 have h_P(u) = −h_P(−u)), as is its usual Lebesgue measure. Its surface area is twice its (n − 1)-dimensional volume.

The next proposition says that the above formula coincides with the usual Lebesgue measure for polytopes, and gives some further useful properties.

Proposition 4.8. Let P ∈ K^n be a polytope. Then Vol_n(P) is the usual Lebesgue measure of P. In particular, both Vol_n and S_n are invariant with respect to rigid motions; for λ > 0,

Vol_n(λP) = λ^n Vol_n(P), S_n(λP) = λ^{n−1} S_n(P);

Vol_n(P) = 0 if and only if dim(P) ≤ n − 1; and if P ⊆ Q are polytopes then Vol_n(P) ≤ Vol_n(Q). Moreover, we have that in this case also S_n(P) ≤ S_n(Q).

Proof. We shall use induction on the dimension. For n = 1 this is clear. Let n ≥ 2 and assume wlog that dim(P) = n (otherwise we have already shown that Vol_n(P) = 0). P has some finitely many facets P(u_i), i = 1, ..., m, of dimension n − 1. Let us first work in the case where 0 is in the interior of P. In this case, h_P(u_i) > 0 for all i. We consider the pyramids over each facet, P_i = conv({0} ∪ P(u_i)). The (usual) volume of these pyramids is (1/n) h_P(u_i) Vol_{n−1}(P(u_i)), and this also coincides with the definition above for Vol_n, since the support function of P_i vanishes in the directions of all its facets other than the one in direction u_i (they all contain 0). Thus

Vol_n(P) = Σ_{i=1}^m Vol_n(P_i) = Σ_{i=1}^m (Lebesgue volume)(P_i),

and as these pyramids give a dissection of P into parts with disjoint interiors, the latter equals the Lebesgue volume of P.
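In the plane the pyramid decomposition is easy to test: (1/2) Σ_u h_P(u) Vol_1(P(u)) should agree with the shoelace formula. A sketch with an arbitrary convex polygon, vertices in counterclockwise order:

```python
import numpy as np

# 2-D instance of the pyramid formula Vol_2(P) = (1/2) Σ_u h_P(u)·len(P(u)),
# compared with the shoelace formula (a sketch; the polygon is an arbitrary
# convex example with vertices in counterclockwise order).
V = np.array([(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.5, 2.0)])

def area_shoelace(V):
    x, y = V[:, 0], V[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def area_pyramids(V):
    total = 0.0
    for i in range(len(V)):
        a, b = V[i], V[(i + 1) % len(V)]
        e = b - a
        u = np.array([e[1], -e[0]]) / np.linalg.norm(e)  # outward unit normal (CCW order)
        total += np.dot(a, u) * np.linalg.norm(e)        # h_P(u) * Vol_1(facet)
    return total / 2.0

assert abs(area_pyramids(V) - area_shoelace(V)) < 1e-9
```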

In the general case, when 0 is not assumed to be in the interior, we check how a translation affects the expression. The volumes of the facets remain the same, but the support function gains an extra linear term. We thus need to show that for any x_0,

(1) Σ_i ⟨x_0, u_i⟩ Vol_{n−1}(P(u_i)) = 0

(which is equivalent to Σ_i u_i Vol_{n−1}(P(u_i)) = 0, though this is not relevant here). Note that we do know that for small ε, any y_0 ∈ int(P) and any x_0, we have Vol_n(P − y_0 + εx_0) = Vol_n(P − y_0), since when 0 is in the interior Vol_n is Lebesgue measure, which is translation invariant. Thus for small ε,

Σ_i ⟨εx_0, u_i⟩ Vol_{n−1}(P(u_i)) = 0,

but the formula is homogeneous in ε, so (1) must hold.

All that is left is to show the "moreover" part (which uses convexity seriously). Assume P ⊆ Q. We will need the following "triangle inequality": if {P(u_i)}_{i=1}^m are all the facets of a given polytope, then

Vol_{n−1}(P(u_1)) ≤ Σ_{i=2}^m Vol_{n−1}(P(u_i)).

To see this, just project all the facets to the hyperplane u_1^⊥; projections do not increase (n − 1)-dimensional volume (they multiply it by the cosine of an angle), and the projections of the other facets cover the projected facet, and that's that. This formula implies that S_n(Q ∩ H^+) ≤ S_n(Q) for a halfspace H^+ (since one is exchanging the many facets cut off from Q for just one flat facet). Inductively, using this, reduce Q to P by intersecting each time with one halfspace containing P.

Up to here Shiur 6.2, 1 hour, date 8.12

Remark 4.9. We have just seen that if P is a polytope with facets P(u_i) with volumes V_i = Vol_{n−1}(P(u_i)), then Σ u_i V_i = 0. The opposite is also true and is called Minkowski's theorem: given vectors u_i and numbers V_i such that the u_i positively span R^n and such that Σ u_i V_i = 0, one can build a polytope in R^n with external normals {u_i} and with V_i = Vol_{n−1}(P(u_i)). The proof can be done with the use of Brouwer's fixed point theorem, or by a clever use of Lagrange multipliers.
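The relation Σ u_i V_i = 0 is immediate to verify for polygons, where u_i times the edge length is just a rotation of the edge vector, and the edge vectors of a closed polygon sum to zero. A sketch with an arbitrary convex quadrilateral:

```python
import numpy as np

# The Minkowski relation Σ u_i · Vol_{n-1}(P(u_i)) = 0 for a polygon:
# outward unit edge normals weighted by edge lengths sum to zero
# (a sketch; arbitrary convex polygon, vertices counterclockwise).
V = np.array([(0.0, 0.0), (3.0, 0.0), (4.0, 2.0), (1.0, 3.0)])

total = np.zeros(2)
for i in range(len(V)):
    e = V[(i + 1) % len(V)] - V[i]
    u = np.array([e[1], -e[0]]) / np.linalg.norm(e)  # outward unit normal
    total += u * np.linalg.norm(e)                   # u_i * Vol_1(edge)
assert np.allclose(total, 0.0, atol=1e-12)
```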
We have all the tools for it, but we delay this to a later section, Section 12, where it is stated as a theorem. The reason is that we think of it as a special case of a general problem of finding bodies which generate certain area measures on the sphere, and we shall have a whole section devoted to that.

4.4 Volume and surface area: bodies

We may now extend the definition of volume and of surface area to convex bodies in the natural way, using monotonicity. We do so, and state the main properties, in a theorem.

Theorem 4.10. Let K ∈ K^n. Then, with P ranging over polytopes,

inf_{P ⊇ K} Vol_n(P) = sup_{P ⊆ K} Vol_n(P),

and this is the Lebesgue measure of K. Similarly,

inf_{P ⊇ K} S_n(P) = sup_{P ⊆ K} S_n(P)

(and this is the (n − 1)-dimensional Lebesgue measure of ∂K ⊆ R^n; but we just use it as a definition). Both S_n and Vol_n are invariant with respect to rigid motions; Vol_n(λK) = λ^n Vol_n(K) and S_n(λK) = λ^{n−1} S_n(K) for λ ≥ 0; Vol_n(K) = 0 if and only if dim(K) ≤ n − 1; K ⊆ T implies Vol_n(K) ≤ Vol_n(T) and S_n(K) ≤ S_n(T); and K ↦ Vol_n(K) is continuous. (Actually S_n is also continuous - since things are convex - but we wait with this one more subsection.)

Proof. First note that for polytopes we arrive at the same quantities as before, since we have monotonicity of both expressions, so that the infimum and the supremum are attained for the polytope itself. We continue with the existence of the limit. Again using monotonicity for polytopes, for a general convex body we have

inf_{P ⊇ K} Vol_n(P) ≥ sup_{P ⊆ K} Vol_n(P),

and the same for the surface area formula. Also, all quantities are invariant with respect to rigid motions, by the same fact for polytopes. We may thus assume wlog that 0 ∈ relint(K). Given a convex body, we use Theorem 4.6 (iii) to get some P with P ⊆ K ⊆ (1 + ε)P, so that

inf_{P ⊇ K} Vol_n(P) ≤ (1 + ε)^n sup_{P ⊆ K} Vol_n(P),

and for surface area the same with (1 + ε)^{n−1} instead. As ε > 0 was arbitrary, the two sides are equal. The fact that Vol_n for convex bodies coincides with Lebesgue measure now follows from its monotonicity and the fact that the two functions coincide on polytopes. (So from now on we use Vol_n to denote also Lebesgue measure, and these two limits.)

We can now prove all the rest by the corresponding properties for polytopes, or for Lebesgue measure. We shall elaborate only on the last one, which is continuity of volume with respect to Hausdorff distance. Assume K_j → K. This means that for every r > 0 there exists an index J such that for all j > J we have K_j ⊆ K + rB_2^n and K ⊆ K_j + rB_2^n. Let us assume wlog that 0 ∈ relint(K). We distinguish between two cases: either K is of full dimension (that is, with non-empty interior) or not. Assume first that dim K = n.

Choose by Theorem 4.6 (iii) some polytope $P$ with $P \subseteq \mathrm{int}(K)$ and $K \subseteq (1+\varepsilon)\,\mathrm{int}(P)$. Thus for some small $r > 0$ we have $P + rB_2^n \subseteq K$, and therefore for large enough $j$ we have $P + rB_2^n \subseteq K_j + rB_2^n$. We can now use the generalized cancellation law to get that for large $j$, $P \subseteq K_j$. Similarly, for some $r$ we have $K + rB_2^n \subseteq (1+\varepsilon)P$, and so for large enough $j$ we have $K_j \subseteq (1+\varepsilon)P$. We get that for $j > J$ also $P \subseteq K_j \subseteq (1+\varepsilon)P$, so that
$$|\mathrm{Vol}_n(K_j) - \mathrm{Vol}_n(K)| \le \mathrm{Vol}_n((1+\varepsilon)P) - \mathrm{Vol}_n(P) = ((1+\varepsilon)^n - 1)\,\mathrm{Vol}_n(P).$$
As $\varepsilon$ was arbitrary, and $\mathrm{Vol}_n(P)$ is bounded by $\mathrm{Vol}_n(K)$, this completes the proof in the full-dimensional case. If $K$ is of lower dimension, then $\mathrm{Vol}_n(K) = 0$, and we should show that $\lim \mathrm{Vol}_n(K_j) = 0$. This is simple. Pick some polytope $P$ with $\dim P = n-1$ such that $(K + B_2^n) \cap \mathrm{aff}(K) \subseteq P$. Then for any $1 > \delta > 0$, for $j$ large enough, $K_j \subseteq K + \delta B_2^n$, and the volume of this set is at most $2\delta\,\mathrm{Vol}_{n-1}(P)$, which can be as small as needed.

4.5 Mixed volumes: polytopes

Here we really start preparing for polynomiality of volume with respect to Minkowski addition. Let us give some intuition about the way to prove it (though eventually we shall use a slightly different route; the full route which we indicate here will be in the exercises and in Section 4.7). We know that the volume of a polytope is given by
$$\mathrm{Vol}_n(P) = \frac1n \sum_u h_P(u)\,\mathrm{Vol}_{n-1}(P(u)),$$
where the sum is over all unit outer normals of facets of $P$. Let us consider what happens when we sum two polytopes:
$$\mathrm{Vol}_n(sP_1 + tP_2) = \frac1n \sum_u \big(s\,h_{P_1}(u) + t\,h_{P_2}(u)\big)\,\mathrm{Vol}_{n-1}\big((sP_1 + tP_2)(u)\big).$$
We now have to sum over unit normals of the sum, which is a set independent of $s$ and $t$, as we shall see below in Lemma 4.11. Inductively we could continue, and use that $(sP_1 + tP_2)(u) = sP_1(u) + tP_2(u)$ (again, we shall see this in Lemma 4.11 below), and by induction $\mathrm{Vol}_{n-1}(sP_1(u) + tP_2(u))$ is also a homogeneous polynomial in $s, t$, this time of degree $n-1$.
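As a small numerical sketch of the facet formula above (the polygon is an arbitrary illustrative choice): in the plane each facet is an edge, its support set has $1$-dimensional volume equal to the edge length, and the formula becomes "sum of $h_P(u)$ times edge length, divided by $2$".

```python
import math

def area(poly):
    # shoelace formula; poly is a list of vertices in counterclockwise order
    return 0.5 * sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))

def area_via_support(poly):
    # (1/n) * sum_u h_P(u) * Vol_{n-1}(P(u)) with n = 2: each edge contributes
    # (support value at its outer unit normal) * (edge length)
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        ex, ey = x2 - x1, y2 - y1
        L = math.hypot(ex, ey)
        ux, uy = ey / L, -ex / L          # outer unit normal (CCW orientation)
        s += (x1 * ux + y1 * uy) * L      # h_P(u) is attained on this edge
    return s / 2.0

P = [(0, 0), (3, 0), (2, 2), (0, 1)]      # a convex polygon, CCW; area 4
```

Both functions return the same value for any convex polygon given in counterclockwise order.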
In doing so, we would get a proof, but it is somewhat hard to track down the coefficients, since we would be dealing with, for instance, $h_{(sP_1+tP_2)(u)}(v)$ in the inductive step. The second lemma below, Lemma 4.13, helps us deal with these support functions and with the sets themselves, realizing the support sets as intersections of facets and recovering the vectors with respect to which these are support sets (certain combinations of the normals of the facets involved). We shall start with the easy lemma regarding the behaviour of support sets with respect to Minkowski addition and multiplication by positive scalars.

Lemma 4.11. Let $\lambda_i > 0$ for $i = 1, \ldots, m$ and let $P_i \in \mathcal{K}^n$ be polytopes. Let $u, v \in S^{n-1}$. Then
(a) $\big(\sum_{i=1}^m \lambda_i P_i\big)(u) = \sum_{i=1}^m \lambda_i P_i(u)$;
(b) $\dim\big(\sum_{i=1}^m \lambda_i P_i\big)(u) = \dim\big(\sum_{i=1}^m P_i\big)(u)$;
(c) if $\big(\sum_{i=1}^m P_i\big)(u) \cap \big(\sum_{i=1}^m P_i\big)(v) \neq \emptyset$ then
$$\Big(\sum_{i=1}^m P_i\Big)(u) \cap \Big(\sum_{i=1}^m P_i\Big)(v) = \sum_{i=1}^m \big(P_i(u) \cap P_i(v)\big).$$

Proof. The first, (a), was more or less in an exercise. We can also prove it using the subdifferential, since
$$h_{(\sum \lambda_i P_i)(u)}(z) = h'_{\sum \lambda_i P_i}(u; z) = \sum \lambda_i h'_{P_i}(u; z) = \sum \lambda_i h_{P_i(u)}(z).$$
For (b) we first move all polytopes to have $0$ in their relative interior, in which case also their sum has $0$ in its relative interior (this too was in an exercise), and then, letting $\lambda_{\min} = \min \lambda_i$ and $\lambda_{\max} = \max \lambda_i$, we have, using (a), that
$$\lambda_{\min} \sum P_i(u) \subseteq \sum_{i=1}^m \lambda_i P_i(u) \subseteq \lambda_{\max} \sum P_i(u).$$
Thus they have the same dimension (and in fact the same spanned subspace, after this translation). Finally, for (c), assume the intersection is non-empty and take some $x$ inside it. Thus $x = \sum x_i$ with $x_i \in P_i$. Clearly $h_{P_i}(u) = \langle x_i, u\rangle$ and $h_{P_i}(v) = \langle x_i, v\rangle$, since otherwise $x$ would not satisfy $h_{\sum P_i}(u) = \langle x, u\rangle$ and $h_{\sum P_i}(v) = \langle x, v\rangle$. Thus $x_i \in P_i(u) \cap P_i(v)$ for all $i$ if and only if $x \in P(u) \cap P(v)$. The converse works much the same way (using (a), say).

Remark 4.12. One consequence of the last lemma is that the facet normals of $P$ form a subset of the facet normals of $P + Q$. Indeed, $(P + Q)(u) = P(u) + Q(u)$, so if $\dim P(u) = n-1$ then $\dim(P + Q)(u) = n-1$ as well. (This is easily seen if one assumes $0$ to be in the bodies, which does not matter for dimensional issues.) We need a slightly more intricate fact about support sets which intersect.

Lemma 4.13. Let $K \in \mathcal{K}^n$ and let $u, v \in S^{n-1}$ be linearly independent. Let $w = \lambda u + \mu v$ where $\mu > 0$ and $\lambda \in \mathbb{R}$. Assume $K(u) \cap K(v) \neq \emptyset$. Then
$$K(u) \cap K(v) = (K(u))(w).$$
In particular, for any $x \in K(u) \cap K(v)$ we have
$$h_{K(u)}(w) = \langle x, w\rangle = \lambda \langle x, u\rangle + \mu \langle x, v\rangle = \lambda h_K(u) + \mu h_K(v).$$

Remark 4.14. Note that we are not dealing with simple polytopes, so the dimension of the intersection is not necessarily $n-2$. Assume, however, that we are in the case of polytopes, and that $u$ and $v$ are unit normals of two intersecting facets, so that the intersection is of dimension $n-2$. In such a case, we could speak about the normal of the intersection with respect to one

of the facets, say $P(u)$. In such a case we get two unit normals, $u$ and $v$, of the polytope $P$, and another normal $w$ of $P(u) \cap P(v)$ with respect to the polytope $P(u)$, say. Geometrically we can see they all lie on a plane, and if we let $\theta$ stand for the angle between $u$ and $v$, namely $\cos\theta = \langle u, v\rangle$, one sees that $v = u\cos\theta + w\sin\theta$ (in other words, $w$ is the normalized projection of $v$ onto $u^\perp$). Since the facet $P(u)$ lies in a translate of $u^\perp$, the support set in direction $w + \lambda u$, for any $\lambda \in \mathbb{R}$, is still the same one, which is as claimed. Lemma 4.13 above does not involve just polytopes and is quite general. In particular, the dimension of the intersection can be smaller than $n-2$, in which case there are more normals than just this half-plane.

Proof. Assume $z \in K(u) \cap K(v)$. Clearly $\langle z, u\rangle = h_K(u) = h_{K(u)}(u)$. Also clearly $h_{K(u)}(-u) = -\langle z, u\rangle$, since the inner product with $u$ (and with $-u$) is constant on $K(u)$. In particular, for all $\lambda \in \mathbb{R}$, $h_{K(u)}(\lambda u) = \lambda h_K(u) = \langle z, \lambda u\rangle$. Thus for $w = \lambda u + \mu v$ we have (as $\langle z, v\rangle = h_K(v)$) that
$$\langle z, w\rangle = h_{K(u)}(\lambda u) + \langle z, \mu v\rangle = h_{K(u)}(\lambda u) + h_K(\mu v) \ge h_{K(u)}(\lambda u) + h_{K(u)}(\mu v) \ge h_{K(u)}(w) \ge \langle z, w\rangle,$$
thus $z \in (K(u))(w)$. Conversely, if $z \in (K(u))(w)$, take some $x \in K(u) \cap K(v)$; then $x$ also belongs to $(K(u))(w)$ by the above, so
$$\lambda \langle x, u\rangle + \mu \langle x, v\rangle = \langle x, w\rangle = \langle z, w\rangle = \lambda \langle z, u\rangle + \mu \langle z, v\rangle,$$
but we know that $z$ and $x$ have the same inner product with $u$, so we conclude that also $z \in K(v)$, as needed.

Up to here Shiur 7.1, 2 hours, date

We are finally in a position to define the mixed volume of $n$ polytopes in $\mathbb{R}^n$. Usually one first shows that the quantity $\mathrm{Vol}(\sum_{i=1}^m \lambda_i K_i)$ is a polynomial in the $\lambda_i$'s, then calls the coefficients mixed volumes, and then gives a precise formula for them. We work in the opposite way: first give the formula, and then claim these are the coefficients of the polynomial. In the exercises an alternative route will be provided (and summarized in the next subsection).

Definition. Let $P_1, \ldots, P_n \in \mathcal{K}^n$ be polytopes. For $n = 1$ we define, for $P_1 = [a, b]$,
$$V^{(1)}(P_1) = \mathrm{Vol}_1(P_1) = h_{P_1}(1) + h_{P_1}(-1) = b - a.$$
For $n \ge 2$ we define inductively
$$V^{(n)}(P_1, \ldots, P_n) = \frac{1}{n} \sum_{u \in N(\sum_{i=2}^n P_i)} h_{P_1}(u)\, V^{(n-1)}(P_2(u), \ldots, P_n(u)).$$
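For $n = 2$ the inductive definition reads $V^{(2)}(P_1, P_2) = \frac12 \sum_{u \in N(P_2)} h_{P_1}(u)\,\mathrm{Vol}_1(P_2(u))$. A minimal sketch, evaluating this for two axis-parallel rectangles (an illustrative choice; the normals and edge lengths below are hard-coded for rectangles) and cross-checking against volumes of sums:

```python
def h_rect(a, b, u):
    # support function of the rectangle [0, a] x [0, b]
    return max(0.0, a * u[0]) + max(0.0, b * u[1])

def mixed_area(a1, b1, a2, b2):
    # V2(P1, P2) = (1/2) * sum over facet normals u of P2 of h_{P1}(u) * Vol_1(P2(u))
    normals = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    edge = {(1, 0): b2, (-1, 0): b2, (0, 1): a2, (0, -1): a2}   # Vol_1(P2(u))
    return 0.5 * sum(h_rect(a1, b1, u) * edge[u] for u in normals)

a1, b1, a2, b2 = 3.0, 1.0, 1.0, 2.0
V = mixed_area(a1, b1, a2, b2)
# sanity: Vol(P1 + P2) - Vol(P1) - Vol(P2) should equal 2 * V(P1, P2)
sums_check = (a1 + a2) * (b1 + b2) - a1 * b1 - a2 * b2
```

For these rectangles $V^{(2)}(P_1, P_2) = \frac{a_1 b_2 + a_2 b_1}{2}$, matching the coefficient of $st$ in $\mathrm{Vol}_2(sP_1 + tP_2) = (sa_1 + ta_2)(sb_1 + tb_2)$.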

Note that we are summing over unit normals of the $(n-1)$-dimensional faces of $\sum_{i=2}^n P_i$. When one of the sets is empty we let the mixed volume equal $0$.

Remark. Note that a priori the expression for mixed volume does not seem to be symmetric in its arguments, and this is one of the objectives of the next theorem. Also note that one easily sees by induction that $V^{(n)}(P, P, \ldots, P) = \mathrm{Vol}_n(P)$. If $B_2^n$ were a polytope, one would also see that $nV^{(n)}(B_2^n, P, \ldots, P) = S_n(P)$, and we shall return to this later.

Theorem. For polytopes $P_1, \ldots, P_m \in \mathcal{K}^n$ we have that
$$\mathrm{Vol}\Big(\sum_{i=1}^m \lambda_i P_i\Big) = \sum_{i_1, \ldots, i_n = 1}^m \lambda_{i_1} \cdots \lambda_{i_n}\, V^{(n)}(P_{i_1}, \ldots, P_{i_n}).$$
The function $V^{(n)}(P_1, \ldots, P_n)$ is non-negative, translation invariant, symmetric in its arguments, and vanishes when the dimension of $\sum_{i=1}^n P_i$ is less than $n$.

Remark. To understand the result it may be useful to look at the special case $m = 2$. In that case, we get that
$$\mathrm{Vol}_n(sP_1 + tP_2) = \sum_{j=0}^n \binom nj s^j t^{n-j}\, V^{(n)}(P_1, \ldots, P_1, P_2, \ldots, P_2),$$
where in the $j$-th summand $P_1$ is repeated $j$ times and $P_2$ is repeated $n-j$ times. We remark that in the proof we shall use inductively mixed volumes in lower dimensions of bodies (which will be support sets) which lie in a subspace. We shall thus define also, for $k$ polytopes $Q_1, \ldots, Q_k$ in $\mathbb{R}^n$ which satisfy $\dim(\sum_{i=1}^k Q_i) \le k$, the quantity $V^{(k)}(Q_1, \ldots, Q_k)$, computed inside a $k$-dimensional subspace $E$ which contains a translate of each of them. Because of translation invariance the choice of $E$ is not important, in particular in the case where the dimension of their sum is less than $k$.

Up to here Shiur 7.2, 1 hour, date

Proof. We use induction on the dimension. In dimension $1$, summing intervals is linear, and volume is linear, so everything holds:
$$\mathrm{Vol}_1\Big(\sum_{i=1}^m \lambda_i [a_i, b_i]\Big) = \mathrm{Vol}_1\Big(\Big[\sum \lambda_i a_i, \sum \lambda_i b_i\Big]\Big) = \sum \lambda_i (b_i - a_i).$$
Next assume the assertion is true in dimension $n-1$.
Let us first show that $V^{(n)}$ vanishes when the dimension of the Minkowski sum of the bodies is less than $n$. In such a case, the sum is contained in some translate of a hyperplane $u^\perp$. In particular, the $(n-1)$-dimensional faces (of the sum) with non-zero volume are either non-existent (in which case we sum over an empty set) or are in the directions $u$ and $-u$ for some vector $u$, in which case $h_{P_1}(-u) = -h_{P_1}(u)$ and $P_j(u) = P_j(-u)$ for all $j$, so that the expression vanishes. To prove polynomiality (with these given coefficients) it is sufficient to use only positive $\lambda_i$, since being given zero coefficients for some of the polytopes simply means reducing $m$. We use the definition of volume and the inductive hypothesis to get

$$\mathrm{Vol}_n\Big(\sum_{i=1}^m \lambda_i P_i\Big) = \frac1n \sum_{u \in N(\sum_{i=1}^m \lambda_i P_i)} h_{\sum_{i=1}^m \lambda_i P_i}(u)\, \mathrm{Vol}_{n-1}\Big(\Big(\sum_{i=1}^m \lambda_i P_i\Big)(u)\Big)$$
$$= \frac1n \sum_{u \in N(\sum_{i=1}^m \lambda_i P_i)} \sum_{i_1=1}^m \lambda_{i_1} h_{P_{i_1}}(u)\, \mathrm{Vol}_{n-1}\Big(\sum_{i=1}^m \lambda_i P_i(u)\Big) = \sum_{i_1=1}^m \lambda_{i_1}\, \frac1n \sum_{u \in N(\sum_{i=1}^m \lambda_i P_i)} h_{P_{i_1}}(u)\, \mathrm{Vol}_{n-1}\Big(\sum_{i=1}^m \lambda_i P_i(u)\Big).$$
Using the induction hypothesis,
$$\mathrm{Vol}_{n-1}\Big(\sum_{i=1}^m \lambda_i P_i(u)\Big) = \sum_{i_2, \ldots, i_n = 1}^m \Big(\prod_{j=2}^n \lambda_{i_j}\Big)\, V^{(n-1)}(P_{i_2}(u), \ldots, P_{i_n}(u)).$$
Plugging this in, we get that
$$\mathrm{Vol}_n\Big(\sum_{i=1}^m \lambda_i P_i\Big) = \sum_{i_1=1}^m \sum_{i_2,\ldots,i_n=1}^m \lambda_{i_1} \prod_{j=2}^n \lambda_{i_j}\; \frac1n \sum_{u \in N(\sum_i P_i)} h_{P_{i_1}}(u)\, V^{(n-1)}(P_{i_2}(u), \ldots, P_{i_n}(u))$$
$$= \sum_{i_1,\ldots,i_n=1}^m \prod_{j=1}^n \lambda_{i_j}\; \frac1n \sum_{u \in N(\sum_{j=2}^n P_{i_j})} h_{P_{i_1}}(u)\, V^{(n-1)}(P_{i_2}(u), \ldots, P_{i_n}(u)) = \sum_{i_1,\ldots,i_n=1}^m \prod_{j=1}^n \lambda_{i_j}\, V^{(n)}(P_{i_1}, \ldots, P_{i_n}).$$
We have exchanged the summation over $N(\sum_{i=1}^m \lambda_i P_i)$ first with $N(\sum_{i=1}^m P_i)$ by Lemma 4.11, and then with $N(\sum_{j=2}^n P_{i_j})$, which is a smaller set (by Remark 4.12). The latter is allowed because any $u$ which is not in the set $N(\sum_{j=2}^n P_{i_j})$ will produce support sets $\{P_{i_j}(u)\}_{j=2}^n$ such that the dimension of the sum satisfies $\dim(\sum_{j=2}^n P_{i_j}(u)) \le n-2$, and by the previous argument the $(n-1)$-dimensional mixed volumes of the corresponding support sets equal $0$ and do not contribute to the sum.

To show symmetry (which is important because we gave an explicit formula; otherwise we could just symmetrize any polynomial) we shall assume it inductively, and show that $V^{(n)}(P_1, P_2, P_3, \ldots, P_n) = V^{(n)}(P_2, P_1, P_3, \ldots, P_n)$. We may clearly assume that the dimension of the sum $P = \sum_{i=1}^n P_i$ is $n$; otherwise both sides vanish. Let us write the expression for the left-hand side (in a way which is symmetric in $P_1$ and $P_2$).

Going back to the definition, we see that
$$V^{(n)}(P_1, P_2, P_3, \ldots, P_n) = \frac1n \sum_{u \in N(\sum_{i=2}^n P_i)} h_{P_1}(u)\, V^{(n-1)}(P_2(u), \ldots, P_n(u))$$
$$= \frac1n \sum_{u \in N(\sum_{i=2}^n P_i)} h_{P_1}(u)\, \frac{1}{n-1} \sum_{w \in N(\sum_{i=3}^n P_i(u))} h_{P_2(u)}(w)\, V^{(n-2)}\big(P_3(u)(w), \ldots, P_n(u)(w)\big).$$
In the inner sum, the vectors $w$ summed upon are the normals of the facets of $(\sum_{i=3}^n P_i)(u)$. By the same argument as before, we change all sums to sums over the facets of $\sum_{i=1}^n P_i = P$ and of $P(u)$, respectively. We see that
$$V^{(n)}(P_1, P_2, P_3, \ldots, P_n) = \frac1n\,\frac{1}{n-1} \sum_{u \in N(P)} h_{P_1}(u) \sum_{w \in N(P(u))} h_{P_2(u)}(w)\, V^{(n-2)}\big(P_3(u)(w), \ldots, P_n(u)(w)\big).$$
Each such face $(\sum_{i=1}^n P_i(u))(w)$ is an $(n-2)$-dimensional face of $P = \sum_{i=1}^n P_i$ contained in $P(u)$, thus given by some non-empty $P(u) \cap P(v)$. In particular, the vectors $w$ all lie in $u^\perp$, and are the normalized projections onto $u^\perp$ of vectors $v \in N(P)$. We are thus in fact summing over facets $P(v)$ which intersect $P(u)$, and we have the lemma about intersections, Lemma 4.11 (c), to assist us. Moreover, in the case where $P(u)$ intersects $P(v)$ and gives an $(n-2)$-dimensional face of $P(u)$, the normal of this face with respect to $P(u)$, which is the normalized projection $w = P_{u^\perp} v$, is a linear combination of the form $w = \lambda u + \mu v$ with $\mu > 0$ (see Remark 4.14 above). Thus $P(u)(w) = P(u) \cap P(v)$. Even further, we know by Lemma 4.11 that if $\sum_i P_i(u) \cap \sum_i P_i(v)$ is such a face then it is given as the sum $\sum_i (P_i(u) \cap P_i(v))$, so in particular these intersections are non-empty as well. We thus also have $P_i(u)(w) = P_i(u) \cap P_i(v)$, and
$$V^{(n)}(P_1, P_2, P_3, \ldots, P_n) = \frac{1}{n(n-1)} \sum_{u \in N(P)} h_{P_1}(u) \sum_{\substack{v \in N(P) \text{ s.t.}\\ P(u) \cap P(v) \neq \emptyset,\ v \neq \pm u}} h_{P_2(u)}(P_{u^\perp} v)\, V^{(n-2)}\big(P_3(u) \cap P_3(v), \ldots, P_n(u) \cap P_n(v)\big).$$
Note that, as in Remark 4.14, since $w$ is the normalized projection of $v$ onto $u^\perp$, we have
$$w = \frac{v - \langle u, v\rangle u}{|v - \langle u, v\rangle u|}.$$
Similarly, $v = u\cos\theta + w\sin\theta$, with $\theta = \theta_{u,v}$ given by $\cos\theta = \langle u, v\rangle$. Thus $\sin\theta = |v - \langle u, v\rangle u|$ and $w = \frac{1}{\sin\theta}v - \frac{1}{\tan\theta}u$. Therefore, as in Lemma 4.13, we have
$$h_{P_2(u)}(P_{u^\perp} v) = \frac{h_{P_2}(v)}{\sin\theta} - \frac{h_{P_2}(u)}{\tan\theta}.$$

(Indeed, for $x \in P_2(u) \cap P_2(v)$,
$$h_{P_2(u)}(w) = \langle x, w\rangle = \frac{\langle v, x\rangle}{\sin\theta} - \frac{\langle u, x\rangle}{\tan\theta} = \frac{h_{P_2}(v)}{\sin\theta} - \frac{h_{P_2}(u)}{\tan\theta},$$
as we have done before.) Going back to the main equation, we see that
$$V^{(n)}(P_1, P_2, P_3, \ldots, P_n) = \frac{1}{n(n-1)} \sum_{u \in N(P)} \sum_{\substack{v \in N(P),\ P(v) \cap P(u) \neq \emptyset,\ v \neq \pm u}} h_{P_1}(u)\Big[\frac{h_{P_2}(v)}{\sin\theta_{u,v}} - \frac{h_{P_2}(u)}{\tan\theta_{u,v}}\Big]\, V^{(n-2)}\big(P_3(u) \cap P_3(v), \ldots, P_n(u) \cap P_n(v)\big)$$
$$= \frac{1}{n(n-1)} \sum_{\substack{u, v \in N(P),\\ P(u) \cap P(v) \neq \emptyset}} \Big[\frac{h_{P_1}(u)\, h_{P_2}(v)}{\sin\theta_{u,v}} - \frac{h_{P_1}(u)\, h_{P_2}(u)}{\tan\theta_{u,v}}\Big]\, V^{(n-2)}\big(P_3(u) \cap P_3(v), \ldots, P_n(u) \cap P_n(v)\big).$$
The expression is clearly symmetric in $P_1, P_2$. What remains to be proven is that these numbers are invariant under independent translations of the bodies, which follows from the same property of $\mathrm{Vol}_n$ and uniqueness of the polynomial decomposition, for example when plugging in $m = n$. Indeed, write the symmetric polynomial for $\mathrm{Vol}(\sum_{i=1}^n \lambda_i (P_i + x_i)) = \mathrm{Vol}(\sum_{i=1}^n \lambda_i P_i)$ (the two sets are translates of each other). The coefficient of $\prod_{j=1}^n \lambda_{i_j}$ is both $V^{(n)}(P_{i_1}, \ldots, P_{i_n})$ and $V^{(n)}(P_{i_1} + x_{i_1}, \ldots, P_{i_n} + x_{i_n})$. For this, note that we are using the symmetry of the coefficients to have the relevant uniqueness result (otherwise $xy + 2yx = 2xy + yx$).

As is usual in similar situations, there exists a reverse correspondence, namely one can get an expression for mixed volumes in terms of volumes of sums (inverse in the sense that above, $\mathrm{Vol}_n(\sum \lambda_i P_i)$ was expressed via the mixed volumes $V^{(n)}(P_{i_1}, \ldots, P_{i_n})$). As a get-ready example, compute in $\mathbb{R}^2$ the quantity $V^{(2)}(P_1, P_2)$ from $\mathrm{Vol}_2(P_1 + P_2)$, $\mathrm{Vol}_2(P_1)$, $\mathrm{Vol}_2(P_2)$, and in $\mathbb{R}^3$ the quantity $V^{(3)}(P_1, P_2, P_3)$ from $\mathrm{Vol}_3(P_1 + P_2 + P_3)$, $\mathrm{Vol}_3(P_1 + P_2)$, $\mathrm{Vol}_3(P_2 + P_3)$, $\mathrm{Vol}_3(P_3 + P_1)$ and $\mathrm{Vol}_3(P_i)$.

Theorem 4.19 (Polar decomposition). For polytopes $P_1, \ldots, P_n$ in $\mathbb{R}^n$ we have
$$V^{(n)}(P_1, \ldots, P_n) = \frac{1}{n!} \sum_{k=1}^n (-1)^{n+k} \sum_{1 \le j_1 < \cdots < j_k \le n} \mathrm{Vol}_n\Big(\sum_{m=1}^k P_{j_m}\Big).$$

Proof. The right-hand side is clearly some function of the polytopes; denote it $F(P_1, \ldots, P_n)$. Consider, as a function of $(\lambda_1, \ldots, \lambda_n)$, the function $g(\lambda_1, \ldots, \lambda_n) = F(\lambda_1 P_1, \ldots, \lambda_n P_n)$.
We already know that it is a homogeneous polynomial of degree $n$ (on $(\mathbb{R}_+)^n$). Write it with symmetric coefficients.

Plug in $P_1 = \{0\}$ to get a telescoping sum (for each term with $P_1$ there is a corresponding term without it, with opposite sign, which cancels it). We thus have $g(0, \lambda_2, \ldots, \lambda_n) = 0$. This means that no term without $\lambda_1$ has a non-zero coefficient in the polynomial. The same applies to $\lambda_j$ for all $j$, so the only term in the polynomial is $C \prod_{j=1}^n \lambda_j$. Going back to the polynomial representation, the only summand which produces such a term (that is, whose terms do not cancel with others) is $\mathrm{Vol}_n(\sum_{m=1}^n \lambda_m P_m)$, with $j_1 = 1, \ldots, j_n = n$, and we have a name for the coefficient of the product: by the previous theorem it is simply $n!\, V^{(n)}(P_1, \ldots, P_n)$.

4.6 Mixed volumes: convex bodies

Naturally, we would like to extend the definition, and the theorem, from polytopes to general convex bodies. This is standard; we write it here (but may omit details in class).

Theorem. For $\{K_i\}_{i=1}^n \subseteq \mathcal{K}^n$ define their mixed volume by
$$V(K_1, \ldots, K_n) = \lim_{m\to\infty} V(P_1^{(m)}, \ldots, P_n^{(m)}),$$
where $P_i^{(m)} \to K_i$ are any approximating sequences of polytopes. Then this function is well defined, and we have
$$V(K_1, \ldots, K_n) = \frac{1}{n!} \sum_{k=1}^n (-1)^{n+k} \sum_{1 \le j_1 < \cdots < j_k \le n} \mathrm{Vol}_n\Big(\sum_{m=1}^k K_{j_m}\Big),$$
and for $\lambda_i > 0$
$$\mathrm{Vol}\Big(\sum_{i=1}^m \lambda_i K_i\Big) = \sum_{i_1, \ldots, i_n = 1}^m \Big(\prod_{j=1}^n \lambda_{i_j}\Big)\, V^{(n)}(K_{i_1}, \ldots, K_{i_n}).$$
Furthermore, if $K, L, K_j \in \mathcal{K}^n$ we have:
(a) $V(K, \ldots, K) = \mathrm{Vol}_n(K)$ and $nV(K, \ldots, K, B_2^n) = S_n(K)$;
(b) $V$ is symmetric;
(c) $V$ is multilinear in each argument, that is, for $\lambda, \mu > 0$,
$$V(\lambda K + \mu L, K_2, \ldots, K_n) = \lambda V(K, K_2, \ldots, K_n) + \mu V(L, K_2, \ldots, K_n);$$
(d) $V(K_1, K_2, \ldots, K_n) = V(K_1 + \{x_1\}, K_2 + \{x_2\}, \ldots, K_n + \{x_n\})$;
(e) $V(UK_1, UK_2, \ldots, UK_n) = V(K_1, K_2, \ldots, K_n)$ for $U \in O(n)$;
(f) $V$ is continuous with respect to each of its arguments;
(g) $V \ge 0$ and is monotone increasing in each argument.

Proof. We can use the formula for $V^{(n)}$ from Theorem 4.19, which proves that the limit exists, is continuous and independent of the choice of approximating sequences, and satisfies the corresponding equations. In particular we get (d), (e), (f).
To show (a) it suffices to consider polytopes, for which it follows immediately from the main formula with $m = 1$. As for surface area, we first consider a $K$ which is a polytope $P$. The definition requires approximating $B_2^n = \lim P_m$. We know already that
$$nV(P, \ldots, P, P_m) = \sum_{u \in N(P)} h_{P_m}(u)\, \mathrm{Vol}_{n-1}(P(u)).$$

In the limit we have $h_{P_m}(u) \to 1$, so we see that
$$nV(P, \ldots, P, B_2^n) = \sum_{u \in N(P)} \mathrm{Vol}_{n-1}(P(u)) = S_n(P),$$
as needed. For a general convex body, we can approximate it from inside and outside by polytopes, and as the mixed volume is continuous, we are done. (We only use here the monotonicity of $S_n$, and we get continuity of it!) The symmetry follows from symmetry for polytopes, as does non-negativity (where in the case of polytopes we move $0$ to be in the relative interior of each summand). The linearity in each argument is also not hard, expanding in two different ways the polynomial $\mathrm{Vol}(\lambda_1(\lambda K + \mu L) + \lambda_2 K_2 + \cdots + \lambda_n K_n)$. Another way is to use additivity of the support function and the definition for polytopes, and then approximate a general body. Monotonicity is also simple if one uses the definition with the support function, which is monotone.

Up to here Shiur 8.1, 2 hours, date

4.7 Mixed volumes via strongly isomorphic polytopes

An alternative route towards mixed volumes is to restrict to classes of polytopes with some fixed outer normals. Let us write the sequence of lemmas that provide such a scheme. In due time we may even fill in the details (much of it is in the exercises).

Definition. Strongly isomorphic polytopes.

Lemma. Strong isomorphism implies the same for faces, and is preserved under Minkowski addition and multiplication by positive scalars.

Lemma. Strongly isomorphic polytopes are dense in the following sense: given $K_1, \ldots, K_m$ and $\varepsilon > 0$, one may find strongly isomorphic polytopes $P_1, \ldots, P_m$ with $\delta_H(K_i, P_i) \le \varepsilon$. In fact, one may also make the $P_i$ simple, that is, such that each vertex belongs to exactly $n$ facets.

Proposition. Fixing a class of simple and strongly isomorphic polytopes, volume is a homogeneous polynomial of degree $n$ in the heights $h_P(u_j)$.

Corollary. Fixing a class of simple and strongly isomorphic polytopes, the volume of $\sum_{i=1}^m \lambda_i P_i$ is a homogeneous polynomial of degree $n$ in the $\lambda_i$.

Proposition. Polar decomposition.
Theorem. Polynomiality of mixed volumes, with all the needed properties.

4.8 Steiner's formula and Kubota's formulae

A special case of mixed volumes is when $m = 2$ and $K_2 = B_2^n$. In this case, which is a particular case of the polynomiality claim, the formula for $\mathrm{Vol}(K + \lambda B_2^n)$ is called the Steiner formula. Originally it was produced by directly studying the so-called parallel body. Here, instead of using the notation $V^{(n)}(K, \ldots, K, B_2^n, \ldots, B_2^n) =: V^{(n)}(K[k], B_2^n[n-k])$, where $K$ appears $k$ times and $B_2^n$ appears $n-k$ times, we introduce the following notation.

Definition. For $K \in \mathcal{K}^n$ and $0 \le j \le n$ we let the quermassintegrals $W_j(K)$ be defined via the Steiner formula
$$\mathrm{Vol}(K + \lambda B_2^n) = \sum_{j=0}^n \binom nj W_j(K)\, \lambda^j.$$
Note that $W_j(K) = V^{(n)}(K[n-j], B_2^n[j])$ (which may cause some confusion: for $\mu \ge 0$ we have $W_j(\mu K) = \mu^{n-j} W_j(K)$). See another normalization in the exercises.

Remark. Note that in particular one gets that
$$S_n(K) = nW_1(K) = \lim_{\varepsilon \to 0^+} \frac1\varepsilon \big(\mathrm{Vol}_n(K + \varepsilon B_2^n) - \mathrm{Vol}_n(K)\big).$$
Thus, surface area is indeed the derivative of volume.

This case of mixed volumes has been studied extensively. In the next section, when we start dealing with inequalities, this will (as we have already seen) play a role in the isoperimetric inequality and in other inequalities such as Urysohn's. In this subsection we present an elegant formula for these integrals, called the Kubota formula. To present the formula it is useful to introduce the Grassmannian $G_{n,k}$ of $k$-dimensional subspaces of $\mathbb{R}^n$, which carries a natural Haar measure, which we call $\sigma_{n,k}$ (normalized and rotation invariant). The measure $\sigma_{n,k}$ can be realized as follows: pick $k$ independent random vectors on the sphere and take their span. With probability one this is an element of $G_{n,k}$, and this also gives the measure. Another option: if one agrees that there is a Haar measure on $O(n)$ (which can be obtained by taking a random element $u_1 \in S^{n-1}$, then a random element in $S^{n-2} = S^{n-1} \cap u_1^\perp$, etc.), then pick a random element $U \in O(n)$ and consider the subspace $U(\mathrm{sp}\{e_1, \ldots, e_k\})$, which is an element of $G_{n,k}$. Different $U$ can give rise to the same element of $G_{n,k}$, so that $G_{n,k}$ is a quotient space of $O(n)$ and inherits the measure from it. By uniqueness of Haar measure, all these constructions are identical. We shall denote $\kappa_n = \mathrm{Vol}_n(B_2^n)$.

Remark. It is useful to know how $\kappa_n$ behaves as a function of $n$. It is not relevant to this chapter, but this is anyhow a good place to remark upon it.
In Chedva 3 one shows that
$$(2\pi)^{n/2} = \int_{\mathbb{R}^n} e^{-|x|^2/2}\, dx = \int_0^\infty \int_{S^{n-1}} e^{-r^2/2}\, r^{n-1}\, n\kappa_n\, d\sigma\, dr = \int_0^\infty e^{-s} (2s)^{\frac{n-2}{2}}\, n\kappa_n\, ds = n\kappa_n\, 2^{\frac{n-2}{2}}\, \Gamma\Big(\frac n2\Big),$$
so that
$$\kappa_n = \frac{2\pi^{n/2}}{n\,\Gamma(\frac n2)} = \frac{\pi^{n/2}}{\Gamma(\frac n2 + 1)}.$$
In particular $\kappa_1 = 2$, $\kappa_2 = \pi$, $\kappa_3 = \frac43\pi$, $\kappa_4 = \frac{\pi^2}{2}$, etc. Seems like a growing sequence? Not the case at all. Using Stirling's approximation $\Gamma(k+1) \simeq \sqrt{2\pi k}\,(k/e)^k$ (where $\simeq$ means that the quotient converges to $1$), we get
$$\kappa_n \simeq \frac{1}{\sqrt{\pi n}} \Big(\frac{2\pi e}{n}\Big)^{n/2}.$$
So, a ball of volume one has to have radius of order $\sqrt n$.

Theorem 4.31 (Kubota's formulae). Let $K \in \mathcal{K}^n$ and $1 \le j \le n-1$. Then
$$W_j(K) = \frac{\kappa_n}{\kappa_{n-j}} \int_{G_{n,n-j}} \mathrm{Vol}_{n-j}(P_E K)\, d\sigma_{n,n-j}(E).$$
Before proving it, let us write the first case, which is $j = 1$ (it will also serve as the base of an induction later). We get

Lemma 4.32 (Cauchy formula). Let $K \in \mathcal{K}^n$. Then
$$W_1(K) = \frac{\kappa_n}{\kappa_{n-1}} \int_{S^{n-1}} \mathrm{Vol}_{n-1}(P_{u^\perp} K)\, d\sigma(u).$$
Recall that $W_1(K) = V^{(n)}(K, K, \ldots, K, B_2^n) = \frac1n S_n(K)$, so this is in fact a formula for surface area. Also, sometimes one uses integration on the sphere with respect to the usual Lebesgue measure, the relation being $d\sigma(u) = du/\mathrm{Vol}_{n-1}(S^{n-1}) = du/(n\kappa_n)$, so we arrive here at the formula
$$S_n(K) = \frac{1}{\kappa_{n-1}} \int_{S^{n-1}} \mathrm{Vol}_{n-1}(P_{u^\perp} K)\, du,$$
which may be more familiar to you as the Cauchy formula.

Proof. We work with a polytope $P$ first. For a generic $u \in S^{n-1}$, each facet $F_i$ of $P$ has some angle $\theta_i$ between its normal $n_i$ and $u$ which is not $\pi/2$. When such a facet is projected onto $u^\perp$, its projected area is $|\cos\theta_i|$ times its original area. Clearly the projection $P_{u^\perp}(P)$ is covered twice by the projections of the facets of $P$. We get that
$$\mathrm{Vol}_{n-1}(P_{u^\perp}(P)) = \frac12 \sum_{i=1}^m |\cos\theta_i|\, \mathrm{Vol}_{n-1}(F_i).$$
When this is integrated over the sphere, we get
$$\int_{S^{n-1}} \mathrm{Vol}_{n-1}(P_{u^\perp}(P))\, d\sigma(u) = \frac12 \sum_{i=1}^m \mathrm{Vol}_{n-1}(F_i) \int_{S^{n-1}} |\cos\theta(n_i, u)|\, d\sigma(u).$$
By rotation invariance, the latter integral does not depend on $n_i$ and is simply a constant depending on the dimension (for example, let $n_i = (1, 0, \ldots, 0)$, so that $\cos\theta = u_1$). We have thus shown that there exists some constant $c_n$ such that for any polytope $P$ we have
$$\int_{S^{n-1}} \mathrm{Vol}_{n-1}(P_{u^\perp}(P))\, d\sigma(u) = c_n\, S_n(P).$$
Since both sides are well defined for convex bodies and are clearly monotone, and since we have equality for all polytopes, we have equality for bodies as well. Thus, for the same $c_n$ (defined for now as half the integral of $|u_1|$ over $u \in S^{n-1}$) we get for all $K \in \mathcal{K}^n$ that
$$\int_{S^{n-1}} \mathrm{Vol}_{n-1}(P_{u^\perp}(K))\, d\sigma(u) = c_n\, S_n(K).$$
Instead of computing the integral, we may simply plug in $K = B_2^n$, in which case we get
$$\mathrm{Vol}_{n-1}(B_2^{n-1}) = c_n\, S_n(B_2^n) = c_n\, n\, \mathrm{Vol}_n(B_2^n).$$
Thus $c_n = \frac{\kappa_{n-1}}{n\kappa_n}$.

Remark. A sanity check: we showed above that
$$\frac{\kappa_{n-1}}{n\kappa_n} = c_n = \frac12 \int_{S^{n-1}} |u_1|\, d\sigma(u).$$
It is good to see that this average of a coordinate is below $1$.
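The sanity check just stated can be pushed further numerically (a sketch; the dimension and sample size are arbitrary choices): $c_n = \kappa_{n-1}/(n\kappa_n)$ should equal half the spherical average of $|u_1|$, and the measure of a cap $\{u_1 > r\}$ should be at most $e^{-nr^2/2}$, as discussed below.

```python
import math, random

def kappa(n):
    # volume of the n-dimensional Euclidean unit ball
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

n = 20
c_n = kappa(n - 1) / (n * kappa(n))

random.seed(0)
N = 100_000
acc, caps = 0.0, 0
r0 = 0.5
for _ in range(N):
    # uniform random direction on S^{n-1} via normalized Gaussians
    g = [random.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in g))
    u1 = g[0] / norm
    acc += abs(u1)
    caps += (u1 > r0)
avg_abs_u1 = acc / N
cap_fraction = caps / N
```

With $n = 20$ one finds $2c_n \approx 0.18 \approx 1/\sqrt{2\pi n}\cdot 2$, matching the Monte Carlo average, and the cap fraction is well below $e^{-nr^2/2}$.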
We may compute
$$c_n = \frac{\kappa_{n-1}}{n\kappa_n} = \frac1n \cdot \frac{\pi^{\frac{n-1}{2}}\,\Gamma(\frac n2 + 1)}{\pi^{\frac n2}\,\Gamma(\frac{n-1}{2} + 1)} \simeq \frac1n \cdot \frac{1}{\sqrt\pi}\Big(\frac n2\Big)^{1/2} \simeq \frac{1}{\sqrt{2\pi n}}.$$

Here $\simeq$ means up to constants, using that
$$\frac1e \Big(\frac{n+1}{2}\Big)^{1/2} \le \frac{\Gamma(\frac n2 + 1)}{\Gamma(\frac{n+1}{2})} \le \Big(\frac{n+1}{2}\Big)^{1/2}.$$
(One may get a sharper estimate with exact constants.) So, the average of $|u_1|$ on the sphere is of order $n^{-1/2}$, which corresponds to the concentration phenomenon, which implies that almost all the mass of the sphere lies near any great circle. In fact, one can show that
$$\sigma\{u : u_1 > r\} \le (1 - r^2)^{n/2} \le e^{-\frac12 n r^2}.$$
Indeed, denoting the cap by $C_r$ and setting $A = \mathrm{conv}(C_r, \{0\})$, clearly $\sigma(C_r) = \frac{\mathrm{Vol}_n(A)}{\mathrm{Vol}_n(B_2^n)}$. However, $A$ is a subset of a ball of radius $(1-r^2)^{1/2}$ centred at the touching point of the cap and the ball of radius $r$ (a picture, omitted here, illustrated this), and this ends the proof.

Up to here Shiur 8.2, 1 hour, date

To prove the Kubota formula we shall use induction. The induction step will be given by a formula which compares mixed volumes in $\mathbb{R}^n$ with the mixed volumes of the projections in their respective copies of $\mathbb{R}^{n-1}$.

Remark. Note that the function $W_j$ depends on the dimension in which we are working (even its homogeneity does!). This is only a matter of normalization. For the sake of the following lemma, let us call $w_j$ the corresponding function in dimension $n-1$. (That is, $W_j(K) = V^{(n)}(K[n-j], B_2^n[j])$ and $w_j(K') = V^{(n-1)}(K'[n-1-j], B_2^{n-1}[j])$, where $K \subseteq \mathbb{R}^n$ and $K' \subseteq \mathbb{R}^{n-1}$.)

In fact, if $K$ is $(n-1)$-dimensional, contained in $\mathbb{R}^{n-1} = u^\perp$, we have
$$\sum_{j=0}^n \binom nj W_j(K)\lambda^j = \mathrm{Vol}(K + \lambda B_2^n) = \int_{-\lambda}^{\lambda} \mathrm{Vol}_{n-1}\big((K + \lambda B_2^n) \cap (\mathbb{R}^{n-1} + tu)\big)\, dt = \int_{-\lambda}^{\lambda} \mathrm{Vol}_{n-1}\big(K + (\lambda^2 - t^2)^{1/2} B_2^{n-1}\big)\, dt$$
$$= \sum_{j=0}^{n-1} \binom{n-1}{j} w_j(K) \int_{-\lambda}^{\lambda} (\lambda^2 - t^2)^{j/2}\, dt = \sum_{j=0}^{n-1} \binom{n-1}{j} w_j(K)\, \lambda^{j+1}\, \frac{\kappa_{j+1}}{\kappa_j}.$$
In the last equality we have computed the (simple) integral via the volume of a $(j+1)$-dimensional ball, using Fubini. We get by coefficient comparison
$$\binom nj W_j(K) = \frac{\kappa_j}{\kappa_{j-1}} \binom{n-1}{j-1} w_{j-1}(K),$$
which may be rewritten as
$$W_j(K) = \frac jn\,\frac{\kappa_j}{\kappa_{j-1}}\, w_{j-1}(K).$$

Theorem 4.35. Let $K \in \mathcal{K}^n$ and $1 \le j \le n-1$. Then the following relation holds:
$$W_j(K) = \frac{\kappa_n}{\kappa_{n-1}} \int_{S^{n-1}} w_{j-1}(P_{u^\perp}(K))\, d\sigma(u).$$
Here we have used the normalized Haar measure on $S^{n-1}$. Using the (non-normalized) Lebesgue measure on the sphere we can rewrite this as
$$W_j(K) = \frac{1}{n\kappa_{n-1}} \int_{S^{n-1}} w_{j-1}(P_{u^\perp}(K))\, du.$$
(In the case $j = 1$ we have $w_0 = \mathrm{Vol}_{n-1}$ and this is simply the Cauchy formula.)

Proof. We shall obtain two polynomial formulae for $W_1(K + \lambda B_2^n)$ and compare coefficients. On the one hand, we expand
$$P_{u^\perp}(K + \lambda B_2^n) = P_{u^\perp}(K) + \lambda (B_2^n \cap u^\perp)$$
to get
$$\mathrm{Vol}_{n-1}\big(P_{u^\perp}(K + \lambda B_2^n)\big) = \sum_{j=0}^{n-1} \binom{n-1}{j} w_j(P_{u^\perp}(K))\, \lambda^j,$$
and integrate over the sphere to get, by the Cauchy formula,
$$W_1(K + \lambda B_2^n) = \frac{\kappa_n}{\kappa_{n-1}} \sum_{j=0}^{n-1} \binom{n-1}{j} \lambda^j \int_{S^{n-1}} w_j(P_{u^\perp}(K))\, d\sigma(u).$$
On the other hand, using multilinearity we can get a polynomial expression in $\lambda$ for $W_1(K + \lambda B_2^n)$. It is easier to do this by expanding $\mathrm{Vol}_n(K + \lambda B_2^n + \mu B_2^n)$ and comparing the coefficients of $\mu$:
$$\mathrm{Vol}_n(K + \lambda B_2^n + \mu B_2^n) = \sum_{j=0}^n \binom nj (\lambda + \mu)^j W_j(K) = \sum_{j=0}^n \binom nj W_j(K) \sum_{m=0}^j \binom jm \lambda^m \mu^{j-m}.$$

45 Vol n (K + λb n 2 + µb n 2 ) = n j=0 ( ) n µ j W j (K + λb2 n ) j Since these two polynomials in µ are equal, the coefficient of µ is equal in both, and we get that W 1 (K + λb2 n ) = 1 n ( ) n W j (K)jλ j 1 n j j=1 Comparing our two expressions for W 1 (K + λb2 n) we can equate the coefficients of λj 1 to get ( ) 1 n W j (K) = κ ( ) n n 1 w j 1 (P n j κ n 1 j 1 u (K)). S n 1 Clearly this is the same as the claim in the theorem. Proof of Theorem The proof follows by induction on j, where the case j = 1 is Cauchy s formula. Assume the claim holds for j < k. By Theorem 4.35 we have that W j (K) = κ n w j 1 (P κ u (K)) n 1 S n 1 and by the inductive assumption the latter can be written as w j 1 (P u (K)) = κ n 1 Vol n j (P F (P κ u K))dσ n 1,n j (F ). n j G u,n j Where we consider each time the Grassmanian G u,n j of (n j) dimensional subspaces F of u with the corresponding Haar measure. Clearly dσ n 1,n j (F )dσ(u) = dσ n,n j (E). Putting the two formulae together we end up with W j (K) = κ n κ n 1 κ n 1 κ n j G n,n j Vol n j (P E K))dσ n,n j (E). As a corollary of Kubota s formula we get an interpretation for W n 1 (K). Definition 4.36 (Mean width). Let K K n. The mean width of K is defined as 2M (K) where M (K) = h K (u)dσ(u) S n 1 (where dσ(u) is, as usual, normalized Lebesgue measure on the sphere). It is called width because for each fixed u, the minimal slab with normals u which includes the body has width h K (u)+h K ( u) and the average of this width over the unit sphere is 2M. 45

Theorem. Let $K \in \mathcal{K}^n$. Then $M^*(K) = \frac{1}{\kappa_n} W_{n-1}(K)$.

Proof. To understand the formula we can take polytopes $P_m \to B_2^n$, so that
$$V(K, B_2^n, \ldots, B_2^n) = \lim_m \frac1n \sum_{u \in N(P_m)} h_K(u)\, \mathrm{Vol}_{n-1}(P_m(u)) = \frac1n \int_{S^{n-1}} h_K(u)\, du,$$
where integration is with respect to the spherical Lebesgue measure, so that using $du = S_n(B_2^n)\, d\sigma(u) = n\kappa_n\, d\sigma(u)$ we get the formula. However, here we are using that the atomic measures $d\mu_m$ on the sphere, with atoms at the normals of $P_m$ and weights which are the volumes of the corresponding facets, approximate $du$, the Lebesgue measure on the sphere, weakly. While this can easily be formalized, it will be easier to use Kubota's formula. Indeed, we have by Theorem 4.31 with $j = n-1$ that
$$W_{n-1}(K) = V(K, B_2^n, \ldots, B_2^n) = \frac{\kappa_n}{2} \int_{S^{n-1}} \mathrm{Vol}_1(P_{\mathbb{R}u} K)\, d\sigma(u).$$
Clearly $\mathrm{Vol}_1(P_{\mathbb{R}u}(K)) = h_K(u) + h_K(-u)$, and when integrated both terms contribute the same total integral, so that we get $W_{n-1}(K) = \kappa_n M^*(K)$.

Let us remark on a geometric way to see the last formula. The quantity $W_{n-1}$ can also be written as
$$nW_{n-1}(K) = \lim_{\varepsilon \to 0^+} \frac{\mathrm{Vol}(B_2^n + \varepsilon K) - \mathrm{Vol}(B_2^n)}{\varepsilon} = \lim_{R \to \infty} \frac{\mathrm{Vol}(RB_2^n + K) - \mathrm{Vol}(RB_2^n)}{R^{n-1}}.$$
A picture (omitted here) should explain why, roughly, $\mathrm{Vol}(RB_2^n + K) - \mathrm{Vol}(RB_2^n) \approx R^{n-1} \int_{S^{n-1}} h_K(u)\, du$, and thus
$$nW_{n-1}(K) = \int_{S^{n-1}} h_K(u)\, n\kappa_n\, d\sigma(u).$$
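The limiting expression above can be checked concretely in the plane (a sketch; the unit square is an arbitrary choice). By the Steiner formula, $\mathrm{Vol}(RB_2^2 + K) = A + PR + \pi R^2$ for a planar body of area $A$ and perimeter $P$, so the normalized excess tends to the perimeter, which also equals $\int_{S^1} h_K\, du$:

```python
import math

# K = [0,1]^2: area A = 1, perimeter P = 4; with n = 2 the normalized excess
# (Vol(R*B + K) - Vol(R*B)) / R^(n-1) = (A + P*R)/R tends to P = n*W_{n-1}(K).
A, P = 1.0, 4.0

def excess(R):
    return (A + P * R) / R

vals = [excess(R) for R in (1.0, 10.0, 1000.0)]

# Cross-check: the integral of h_K over S^1 should also equal P.
# For K = [0,1]^2, h_K(cos t, sin t) = max(0, cos t) + max(0, sin t).
steps = 20000
dt = 2 * math.pi / steps
integral = sum(max(0.0, math.cos(i * dt)) + max(0.0, math.sin(i * dt))
               for i in range(steps)) * dt
```

The excess values decrease toward 4, and the Riemann sum of the support function over the circle also gives 4.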

5 Lesson 4: Brunn-Minkowski inequality: revisited

5.1 Intro

The inequality
$$\mathrm{Vol}(K + L)^{1/n} \ge \mathrm{Vol}(K)^{1/n} + \mathrm{Vol}(L)^{1/n}$$
was presented as Theorem 1.2 in the very first lesson, and we saw a proof of it which does not involve convexity (indeed, the statement is true quite generally). We did not state the equality case, which is that either $K$ and $L$ are homothetic, or one of the volumes is zero (the latter does not imply equality though; a more precise statement is as follows: there is equality if and only if $K$ and $L$ are homothetic convex bodies from which a set of measure $0$ has been removed). We shall see more proofs below, perhaps also of the equality case. Now that we already know that $\mathrm{Vol}(tK + sL)$ is a polynomial in $t$ and $s$, the Brunn-Minkowski inequality can be interpreted as an inequality of polynomials, so in particular of the first non-trivial coefficients. This is called Minkowski's inequality. Its equality case is based on that of Brunn-Minkowski.

Theorem 5.1 (Minkowski's first inequality). Let $K, L \in \mathcal{K}^n$. Then
$$V(K, \ldots, K, L) \ge \mathrm{Vol}(K)^{\frac{n-1}{n}}\, \mathrm{Vol}(L)^{1/n}.$$
Moreover, equality holds if and only if either $K$ and $L$ are homothetic, $K$ and $L$ lie in parallel hyperplanes, or $\dim(K) \le n-2$.

Proof. For the inequality one can just write the inequality of polynomials
$$\sum_{j=0}^n \binom nj V(K[n-j], L[j])\, \lambda^j = \mathrm{Vol}(K + \lambda L) \ge \big(\mathrm{Vol}(K)^{1/n} + \lambda\,\mathrm{Vol}(L)^{1/n}\big)^n.$$
Since equality holds at $\lambda = 0$, we get an inequality of the first derivatives at $0$, namely
$$nV(K, \ldots, K, L) \ge n\big(\mathrm{Vol}(K)^{1/n}\big)^{n-1}\, \mathrm{Vol}(L)^{1/n}.$$
However, since we also want the equality case, we proceed slightly differently (in a way which will help you in the exercise too). We consider the function
$$f(\lambda) = \mathrm{Vol}\big((1-\lambda)K + \lambda L\big)^{1/n},$$

which is concave on $\lambda \in [0,1]$. Therefore we may bound its derivative at $0$ from below by $f(1) - f(0)$. Compute
$$f'(0) = \frac1n \mathrm{Vol}(K)^{\frac{1-n}{n}} \big[-n\,\mathrm{Vol}(K) + nV(K, \ldots, K, L)\big] \ge \mathrm{Vol}(L)^{1/n} - \mathrm{Vol}(K)^{1/n},$$
and cancellation gives
$$V(K, \ldots, K, L) \ge \mathrm{Vol}(L)^{1/n}\, \mathrm{Vol}(K)^{(n-1)/n}.$$
Moreover, if there is equality, this means that the concave function $f$ is actually linear, which means there is equality in the Brunn-Minkowski inequality for $K$ and $L$, unless $\mathrm{Vol}(K) = 0$, in which case we have to go back and see when there is equality, namely $V(K, \ldots, K, L) = 0$, and this is easily seen to hold if and only if either $\dim(K) \le n-2$ or $\dim(K + L) \le n-1$. Now we use the (not proved in this course, yet at least) equality case of the Brunn-Minkowski inequality.

Up to here Shiur 9.1, 2 hours, date

A special case of this is of course the isoperimetric inequality.

Corollary 5.2 (Isoperimetric inequality). Of all convex bodies with given volume, the one with minimal surface area is a Euclidean ball.

Proof. We rephrase it, using the homogeneity of $\mathrm{Vol}$ and $S_n$, as
$$\Big(\frac{S_n(K)}{S_n(B_2^n)}\Big)^{1/(n-1)} \ge \Big(\frac{\mathrm{Vol}_n(K)}{\mathrm{Vol}_n(B_2^n)}\Big)^{1/n}.$$
Use Minkowski's inequality to get that
$$S_n(K) = nV(K, \ldots, K, B_2^n) \ge n\,\mathrm{Vol}(K)^{(n-1)/n}\,\kappa_n^{1/n} = n\kappa_n \Big(\frac{\mathrm{Vol}(K)}{\kappa_n}\Big)^{(n-1)/n}.$$

There is another interesting inequality, where the roles of $K$ and $B_2^n$ are reversed.

Corollary 5.3 (Urysohn's inequality). Of all convex bodies with given volume, the one with minimal mean width is a Euclidean ball.

Proof. We rephrase it, using the homogeneity of $\mathrm{Vol}$ and $M^*$, as
$$\frac{M^*(K)}{M^*(B_2^n)} = M^*(K) \ge \Big(\frac{\mathrm{Vol}_n(K)}{\mathrm{Vol}_n(B_2^n)}\Big)^{1/n}.$$
Using Minkowski's inequality we get that
$$M^*(K) = \frac{1}{\kappa_n} V(K, B_2^n, \ldots, B_2^n) \ge \frac{1}{\kappa_n}\,\mathrm{Vol}(K)^{1/n}\,\kappa_n^{(n-1)/n} = \Big(\frac{\mathrm{Vol}_n(K)}{\mathrm{Vol}_n(B_2^n)}\Big)^{1/n}.$$

Let us remark on a reverse inequality. Assume that $K$ is a convex body which includes the origin (you can assume it is centrally symmetric if you wish) and such that $0$ is in its interior. Thus

it gives rise to a norm (which may be non-symmetric in the general case, and is then called a gauge function)

‖x‖ = ‖x‖_K = inf{r > 0 : x ∈ rK}.

Thus K = {x ∈ R^n : ‖x‖_K ≤ 1}. We may compute

Vol_n(K) = ∫_{R^n} 1_K(x) dx = ∫_{S^{n−1}} ∫_0^∞ 1_K(ru) r^{n−1} dr du = κ_n ∫_{S^{n−1}} ∫_0^∞ 1_K(ru) n r^{n−1} dr dσ(u),

and, using the above, 1_K(ru) = 1 if and only if r ≤ 1/‖u‖. We get that

Vol_n(K) = κ_n ∫_{S^{n−1}} ∫_0^{1/‖u‖} n r^{n−1} dr dσ(u) = κ_n ∫_{S^{n−1}} (1/‖u‖)^n dσ(u).

Thus the ratio between the volume of K and that of the Euclidean ball is

Vol_n(K)/κ_n = ∫_{S^{n−1}} ‖u‖^{−n} dσ(u).

Using convexity of t ↦ t^{−n} (Jensen's inequality) we see that

∫_{S^{n−1}} ‖u‖^{−n} dσ(u) ≥ (∫_{S^{n−1}} ‖u‖ dσ(u))^{−n},

so that, letting M(K) = ∫_{S^{n−1}} ‖u‖ dσ(u), we get a reverse inequality to that of Urysohn, namely

(Vol_n(K)/κ_n)^{1/n} ≥ 1/M(K).

Note that M(K) = M*(K°).

5.2 Steiner symmetrization

In this section we describe a way of symmetrizing a body to make it closer to a ball. This method, called Steiner symmetrization, is very useful in proving geometric inequalities, as we shall see below. In fact, it was the original method of Steiner in proving the isoperimetric inequality, and it can be used, as we shall see below, also to prove Brunn-Minkowski and many other inequalities.

Definition 5.4. Let K ⊂ R^n be a convex body and u ∈ S^{n−1}. The Steiner symmetral in direction u, S_u K, is defined so that for any x ∈ u^⊥

Vol((x + Ru) ∩ K) = Vol((x + Ru) ∩ S_u K),

and the right hand intersection is an interval centred at x.
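Definition 5.4 can be made concrete numerically. Here is a minimal pure-Python sketch (not from the notes): for the triangle conv{(0,0), (1,0), (0,1)} and direction u = e_2, each vertical chord is replaced by the interval of the same length centred on the axis, and the area, by Fubini over chord lengths, is preserved (this is property (vi) of Proposition 5.6 below).

```python
# A minimal numeric sketch (not from the notes, pure stdlib): Steiner
# symmetrization of the triangle conv{(0,0), (1,0), (0,1)} in direction e2.
def chord_length(x):
    """Length of the vertical section of the triangle over the point (x, 0)."""
    return 1.0 - x if 0.0 <= x <= 1.0 else 0.0

def symmetral_chord(x):
    """The chord of S_u K over x: same length, centred at the axis."""
    l = chord_length(x)
    return (-l / 2.0, l / 2.0)

N = 100000
dx = 1.0 / N
xs = [(i + 0.5) * dx for i in range(N)]
area_K = sum(chord_length(x) for x in xs) * dx
area_SK = sum(b - a for a, b in map(symmetral_chord, xs)) * dx
print(area_K, area_SK)  # both ≈ 0.5, the area of the triangle
```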

50 Remark 5.5. There is a continuous way to look at Steiner symmetrization, letting the various intervals move in constant velocity (which depends on the interval). This is call a system of moving shadows, and we will not go into this. We gather some basic properties of Steiner symmetrization. By a proper convex body we shall mean that it has non empty interior. Up to here Shiur 9.2, 1 hour, date Proposition 5.6. Let K, T K n and u S n 1. Then (i) S u (K) is a convex body (ii) λs u K = S u (λk) (iii) S u K + S u T S u (K + T ) (iv) K T implies S u K S u T (v) S u is continuous with respect to Hausdorff metric on proper bodies (vi) Vol(S u (K)) = Vol(K) (vii) S n (S u (K)) S n (K) (surface area) (true in fact for all W i : we do not explain for now) (viii) diam(s u (K)) diam(k) (ix) inradius(s u (K)) inradius(k), circumrad(s u K) circumrad(k) Proof. (i) follows from trapezoids. Indeed, convexity is a two-dimensional notion, so it is enough to consider x and y in u and check that for any 0 < λ < 1 the interval centred at z = (1 λ)x + λy with length l z = K (z + Ru) has length which is at most (1 λ)l x + λl y, which in turn follows from the fact that (1 λ)(k (x+ru))+λ(k (y +Ru)) K (z +Ru). (ii) is trivial, as all notions are homogeneous (iii) is a computation: Let as in (i) l z (K) = K (z +Ru). Take an element in the left hand side, it is of the form x + y where x = x + tu with x u and t 1 2 l(k) x and similarly y = y + su 50

51 with y u and s 1 2 l(t y ). Their sum is x + y + (t + s)u. To see that it is in the right hand side we need to see that the length of (K + T ) (x + y + Ru) is at least l x (K) + l y (T ). This is true since (K + T ) (x + y + Ru) (K (xru)) + (T (y + Ru)) and the latter is an interval of length l (K) x + l (T ) y. (iv) is immediate, since it is just an inequality of lengths. (v) One should first note that properness is important, as one can easily construct a non proper example with no continuity, as in the following picture In the subclass of proper convex bodies: Take a sequence of proper convex bodies which converges to a proper body, K j K. We may assume wlog that 0 int(k). Since clearly Steiner symmetrization or a translation is a translation of the original symmetrization (in a vector which is the projection onto u of the original translation). Then by definition of distance, and comparability of B 2 and K, for each ε > 0 there is some index M such taht for j > M we have (1 ε)k K j (1 + ε)k. Now use the homothety property (ii) to get that also (1 ε)s u K S u K j (1 + ε)s u K. (vi) is just Fubini. (vii) We use the equivalent definition of surface area as a limit of volume differences, and the Brunn-Minkowski property (iii) as follows: S n (S u K) = Vol n (S u K + εb2 n lim n(s u K) Vol n (S u (K + εb2 n lim n(s u K) ε 0 + ε ε 0 + ε = Vol n (K + εb2 n lim n(k) = S n (K). ε 0 + ε (viii) It is again a two dimensional claim. The diameter is attained as x y for some x, y K. Consider the trapezoid, all of which is inside K, which is tha convex hull of K (x + Ru) and K (y + Ru). It has two diagonals, the larger of which is x y. 51

52 The large diagonal is only decreased after symmetrizing, by Pythagoras more or less. (ix) The symmetral of a ball is a ball with the same radius and containment is preserved by (iv), this applies both for inradius and circumradius. Already what is here is good for proving various things via symmetrization, using a limiting argument about the existence of a sequence of symmetrizations converging to a ball. This argument and the corollaries will be given in Lemma 5.9 and after it. But first one more property about behaviour of Steiner with polarity. Since polarity may have not been defined yet (I think it has, with Legendre before, but to be sure) we digress a second: Definition 5.7. Let K K0 n. The polar body of K is defined by K = {y R n : x, y 1 x K}. It has many good properties some of which we have seen in exercises. It is a closed convex set, compact if 0 int(k), the mapping K K reverses order, it is an involution on K(0) n (and also on closed convex sets containing 0). On K0 n it exchanges intersection with closure of convex hull. Of particular interest in the volume product of K and K. One defines the Santalo product of a centrally symmetric K = K by S(K) = Vol n (K)Vol n (K ). Note that this number is invariant under invertible linear transformations. Indeed, (AK) = {x : Ay, x 1 y K} = {x : y, A x 1 y K} = (A ) 1 K. Thus S(AK) = det(a) det(a ) 1 S(K) = S(K). Actually, for non-symmetric bodies clearly the choice of where the origin is will affect the Santalo product, and usually one defines for general K K n (0) S(K) = inf x {Vol n(k)vol n ((K x) )}. It is known that S(K) is maximized when K is an ellipsoid. This is called Santalo inequality (and we shall prove below in Theorem 5.14). The question of where it is minimized remains to this day an open problem (even in dimension 3!), and Mahler s conjecture asserts that the minimizer is a simplex. 
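For a concrete feeling for the volume product just defined, here is a pure-Python sanity check (not from the notes) in the plane, where K = [−1,1]² and the polar body is the cross-polytope K° = {y : |y1| + |y2| ≤ 1}; the product 4 · 2 = 8 indeed sits below the ellipsoid value κ_2² = π².

```python
import math

N = 400                    # grid resolution; midpoint rule on [-1,1]^2
h = 2.0 / N
vol_K = 0.0                # area of the square K = [-1,1]^2
vol_polar = 0.0            # area of K° = {y : |y1| + |y2| <= 1}
for i in range(N):
    for j in range(N):
        y1 = -1.0 + (i + 0.5) * h
        y2 = -1.0 + (j + 0.5) * h
        vol_K += h * h                 # every grid cell lies in the square
        if abs(y1) + abs(y2) <= 1.0:   # membership test for the polar body
            vol_polar += h * h
santalo = vol_K * vol_polar            # ≈ 4 * 2 = 8
print(santalo <= math.pi ** 2)         # True: below kappa_2^2 = pi^2 ≈ 9.87
```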
Restricting to the class of centrally symmetric convex bodies, the conjecture asserts that the minimum is attained at the cube (however, in such a case it is far from unique). In order to use symmetrizations to prove Santalo's inequality we shall invoke the following property of Steiner symmetrizations:

Theorem 5.8 (Steiner symmetrization increases the polar volume). Let K = −K ∈ K^n_(0) and let u ∈ S^{n−1}. Then Vol(K°) ≤ Vol((S_u K)°).

Proof. Wlog we symmetrize with respect to the hyperplane x_n = 0, that is, u = e_n. We may write

S_u K = {(x, (s−t)/2) : (x, s), (x, t) ∈ K},
(S_u K)° = {(y, r) : ⟨x, y⟩ + r(s−t)/2 ≤ 1 for all (x, s), (x, t) ∈ K}.

For a set A ⊂ R^n let A(r) denote the (n−1)-dimensional part of A at height r, that is, A(r) = {x ∈ R^{n−1} : (x, r) ∈ A}. Then

(1/2)(K°(r) + K°(−r))
 = {(y+z)/2 : ⟨x, y⟩ + sr ≤ 1 and ⟨w, z⟩ − tr ≤ 1 for all (x, s), (w, t) ∈ K}
 ⊂ {(y+z)/2 : ⟨x, y⟩ + sr ≤ 1 and ⟨x, z⟩ − tr ≤ 1 for all (x, s), (x, t) ∈ K}
 ⊂ {(y+z)/2 : ⟨x, (y+z)/2⟩ + r(s−t)/2 ≤ 1 for all (x, s), (x, t) ∈ K}
 ⊂ {w : ⟨x, w⟩ + r(s−t)/2 ≤ 1 for all (x, s), (x, t) ∈ K} = (S_u K)°(r).

Note that since A := K° satisfies A = −A we have

A(−r) = {x : (x, −r) ∈ A} = {x : (−x, r) ∈ A} = {−y : (y, r) ∈ A} = −A(r),

so that K°(r) and K°(−r) have the same volume. Thus, by the Brunn-Minkowski inequality (in dimension n−1) we get that

Vol_{n−1}((K°(r) + K°(−r))/2) ≥ Vol_{n−1}(K°(r))^{1/2} Vol_{n−1}(K°(−r))^{1/2} = Vol_{n−1}(K°(r)).

Putting these together we see that

Vol_n((S_u K)°) = ∫ Vol_{n−1}((S_u K)°(r)) dr ≥ ∫ Vol_{n−1}(K°(r)) dr = Vol_n(K°).

Having seen that various quantities are constant, or monotone, under Steiner symmetrization, our next step is to explain that applying successive symmetrizations one approaches the Euclidean ball. We shall not prove this claim (for time considerations alone) but indicate roughly how it is done.

Lemma 5.9. Let K ∈ K^n_(0) satisfy Vol(K) = κ_n. Then there exists a sequence of vectors u_j ∈ S^{n−1} such that, letting K_j = S_{u_j} K_{j−1} (with K_0 = K), we have K_j → B_2^n.

Proof. The idea: take the class of all bodies obtained by successive symmetrizations of the original body K. They are all included in any ball around 0 which includes K. Let R_0 be the infimal circumradius (distance from 0) attained by bodies in this family. Take a sequence of symmetrals whose circumradii converge to R_0 and, by Blaschke's selection theorem, a convergent subsequence of it, K_j → L.
First we shall explain how to show that the limit body L is a ball (its circumradius is the infimal circumradius R_0, of course).

54 Assume not, then the L is included in the ball with the infimal radius, and misses some cap of this ball (say a cap or radius δ > 0). By compactness one may cover the boundary of this ball (the sphere) with mirror images of this cap with respect to a sequence of hyperplanes. Pick a body in the sequence which is very close to the limit, and in particular so are all of its symmetrizations. We claim that symmetrizing the limit body with respect to all these hyperplanes will produce a ball which is strictly inside the original one, contradicting minimality of its circumradius. We are almost done, as we have shown that for any ε we can find a sequence of symmetrizations which bring us close to a ball up to ε. We may now build our sequence inductively. Note that this proves in fact that we can find a sequence of symmetrizations with respect to which several bodies, together, converge to a ball. Remark The question of how fast these symmetrizations approach B2 n (with best choice of u j, or with a random choice, or with a semi-random one) is of high importance in this field but is beyond the scope of these notes (for now). See papers on the subject by Klartag and Milman, and the references therein (where it is shown that it takes O(n) symmetrizations to approach a Euclidean ball within small distance). Corollary 5.11 (Brunn-Minkowski). For convex bodies K, T K n (0) we have Vol n (K + T ) 1/n Vol n (K) 1/n + Vol n (T ) 1/n. Proof. By Proposition 5.6 (iii) we have for any u that S u (K + T ) S u (K) + S u (T ). Pick a sequence of symmetrizations such that subsequent symmetrizations with respect to them of both K, T and K + T approach a Euclidnean balls of radii r K, r T and r K+T respectively. By volume preservation, r n K κ n = Vol n (K) and similarly for the two other radii. 
Applying the proposition at every step we get, in the first step,

S_{u_2} S_{u_1}(K + T) ⊃ S_{u_2}(S_{u_1} K + S_{u_1} T) ⊃ S_{u_2} S_{u_1} K + S_{u_2} S_{u_1} T,

and by induction

S_{u_j} ··· S_{u_1}(K + T) ⊃ S_{u_j} ··· S_{u_1} K + S_{u_j} ··· S_{u_1} T.

Taking the limit we see that

r_{K+T} B_2^n ⊃ r_K B_2^n + r_T B_2^n = (r_K + r_T) B_2^n,

namely r_{K+T} ≥ r_K + r_T. Rewriting this in terms of volumes, and cancelling κ_n^{1/n} on both sides, we get

Vol_n(K + T)^{1/n} ≥ Vol_n(K)^{1/n} + Vol_n(T)^{1/n},

as needed.

Similarly one can get the isoperimetric inequality directly from property (vii) of Proposition 5.6 and a sequence of symmetrizations. Using property (viii), that diam(S_u(K)) ≤ diam(K), one produces the isodiametric inequality, namely

Theorem 5.12 (Isodiametric). For K ∈ K^n one has

Vol(K) ≤ κ_n (diam(K)/2)^n.

Note that in the centrally symmetric case this is trivial, since then K ⊂ (diam(K)/2) B_2^n. In the non-symmetric case a proof using Brunn-Minkowski directly is also easy to produce:

Using Brunn-Minkowski. Note that the difference body K − K is symmetric about the origin and has twice the diameter of K; indeed

diam(K) = max{|x − y| : x, y ∈ K} = max{|z| : z ∈ K − K}.

(Diameter is simply maximal width, and the maximal widths of K and of K − K are attained in the same direction.) From Brunn-Minkowski we know that Vol(K − K) ≥ 2^n Vol(K), and we get, from the trivial centrally symmetric case,

Vol(K) ≤ Vol((K − K)/2) ≤ κ_n (diam(K)/2)^n,

as needed.

Up to here Shiur 10.1, 2 hours, date 4.1

Remark 5.13. We have used that Vol(K − K) ≥ 2^n Vol(K). A reverse inequality also holds; it is called the Rogers-Shephard inequality and will be discussed in Section 7. It says that

Vol(K − K) ≤ (2n choose n) Vol(K).

Note that in general a reverse Brunn-Minkowski inequality cannot hold, so the above inequality seems to be a genuine special feature of K and −K. However, it turns out not to be the case: when any two bodies are placed in the right position (an SL_n image of them), a reverse Brunn-Minkowski type inequality of the form

Vol(K + T)^{1/n} ≤ C (Vol(K)^{1/n} + Vol(T)^{1/n})

holds true. This is a deep theorem of Bourgain and Milman, to which we will attend in Section 9. Still, in Rogers-Shephard we have the advantage that the exact constant is known (it is attained for the simplex).

Finally, a theorem which follows by Steiner symmetrization and which is not a simple consequence of Brunn-Minkowski is the Santalo inequality for the volume product:

Theorem 5.14 (Santalo). For any centrally symmetric convex body K ∈ K^n_(0) we have S(K) = Vol(K) Vol(K°) ≤ κ_n^2.

Proof. The inequality is clearly linearly invariant (see the discussion after Definition 5.7), so we may assume wlog that Vol(K) = κ_n. Take a sequence of successive symmetrizations K_j of K converging to B_2^n. Then by order reversal also K_j° → B_2^n. Indeed,

(1−ε)B_2^n ⊂ K_j ⊂ (1+ε)B_2^n implies (1+ε)^{−1} B_2^n ⊂ K_j° ⊂ (1−ε)^{−1} B_2^n.
Since Vol_n(K_j) = Vol_n(K_{j−1}) and Vol_n(K_j°) ≥ Vol_n(K_{j−1}°) (by Theorem 5.8), we see that the volume product is increasing,

S(K) ≤ S(K_1) ≤ ··· ≤ S(K_{j−1}) ≤ S(K_j) ≤ ··· ≤ S(B_2^n) = κ_n^2.

Remark 5.15. The non-symmetric case is more difficult to deal with, and one can consult the paper of Meyer and Pajor

where it is shown that, denoting K^z = (K − z)° + z, for every affine hyperplane H intersecting K there is a point z ∈ H such that

Vol_n(K) Vol_n(K^z) ≤ κ_n^2 / (4λ(1−λ)),

where λ is the ratio between the volume of the part of K on one side of H, say K ∩ H^+, and the total volume. In particular, choosing H so that λ = 1/2, one gets S(K) ≤ κ_n^2. To this end they prove a lemma of the same type as Theorem 5.8, namely: if a hyperplane H separates K into two parts with volumes λ Vol_n(K) and (1−λ) Vol_n(K), and if K is Steiner symmetrized with respect to H, then

Vol_n((S_H(K))^z) ≥ 4λ(1−λ) Vol_n(K^z).

However, the proof of this lemma requires an inequality for functions reminiscent of the Prekopa-Leindler inequality, which we shall prove in Section 5.3 below. Finally, the lemma is applied iteratively (with the extra factor involving λ appearing only at the first step).

5.3 Prekopa-Leindler

Because we saw too elegant a proof of Brunn-Minkowski, we will state a generalization of it to functions and give a perhaps more transparent proof (though in some sense the ingredients will be similar!). For more information you may consult the first pages of shiri/teaching/gard.pdf

Theorem 5.16 (Prekopa-Leindler). Let 0 < λ < 1 and let f, g, h : R^n → [0, ∞) be integrable, and satisfy for all x, y ∈ R^n

h((1−λ)x + λy) ≥ f(x)^{1−λ} g(y)^λ.

Then

∫_{R^n} h(x) dx ≥ (∫_{R^n} f(x) dx)^{1−λ} (∫_{R^n} g(x) dx)^λ.

Remark 5.17. 1. Notice that h satisfies in particular h(z) ≥ f(z)^{1−λ} g(z)^λ, but for the integral of the right hand side we have Hölder's inequality

∫_{R^n} f(z)^{1−λ} g(z)^λ dz ≤ (∫_{R^n} f(x) dx)^{1−λ} (∫_{R^n} g(x) dx)^λ,

so Prekopa-Leindler can be thought of as a reverse Hölder inequality.

2. Notice that if f = e^{−ϕ} and g = e^{−ψ}, the condition on h = e^{−ξ} is that

ξ(z) ≤ inf{(1−λ)ϕ(x) + λψ(y) : z = (1−λ)x + λy},

and the right hand side is (in the log-concave setting, at least) the inf-convolution of the two functions (whose epigraph is the Minkowski average of the two epigraphs).

3.
Note that this is indeed a generalization of the multiplicative Brunn-Minkowski inequality: taking h = 1_{(1−λ)K+λT}, f = 1_K and g = 1_T we get

Vol((1−λ)K + λT) ≥ Vol(K)^{1−λ} Vol(T)^λ.

57 As in Remark 1.3, this is equivalent to the usual version. 4. Note that in the statement of the theorem convexity assumptions are not involved. The natural ones would be that f, g and h are e ϕ for convex ϕ, what is called log-concave functions. In such a case the proofs below become easier, and we get Brunn-Minkowski for convex sets. Up to here Shiur 10.2, 1 hour, date 5.1 The proof of the inequality will proceed by induction on dimension, so we shall first prove the one-dimensional version. However, its proof will rely on a very simple version of the Brunn Minkowski inequality, the one dimensional one, which we produce first (note that in the convex setting, there is simply equality). Lemma Given two non-empty, bounded, and measurable X, Y R such that λx + (1 λ)y is measurable. Then Vol 1 (λx + (1 λ)y ) λvol 1 (X) + (1 λ)vol 1 (Y ). Proof. We shall work with compact sets, and the inequality will follow for non-compact by approximation. We shall write X = λx and Y = (1 λ)y. We shall translate X and Y so that X Y = {0} and X R and Y R +. Then X Y X + Y and Vol 1 (X Y ) = Vol 1 (X ) + Vol 1 (Y ) and the proof is complete. Theorem 5.19 (1-dimensional Prekopa-Leindler). Let 0 < λ < 1 and let f, g, h : R [0, ) be integrable, and satisfy that for all x, y R Then R h((1 λ)x + λy) f(x) (1 λ) g(y) λ. ( ) (1 λ) ( λ h(x)dx f(x)dx g(x)dx). R R Proof. Notice that by the condition on the three functions we have an inclusion of the form λ{x : f(x) t} + (1 λ){y : g(y) t} {z : h(z) t}. Therefore, using the one-dimensional Brunn-Minkowski inequality, we know that λvol 1 ({x : f(x) t}) + (1 λ)vol 1 ({y : g(y) t} Vol 1 ({z : h(z) t}). We use the equality R h = 0 Vol 1 (({z : h(z) t})dt (for h, g and f) to get R ( ) (1 λ) ( λ h (1 λ) f + λ g f g). R R R R We remark that if someone is worried by boundedness etc., one may work with bounded g and f using truncation, and also one may normalize f and g so that they have supremum 1, or integral 1, as (sometimes) needed. 
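Before the induction, the one-dimensional case can be illustrated numerically. The following is a minimal pure-Python sketch (not from the notes), with λ = 1/2 and shifted Gaussians for which the pointwise hypothesis of the theorem holds by convexity of t ↦ t²; the integral inequality then holds (here in fact with equality, all three integrals being √π).

```python
import math, random

lam = 0.5
f = lambda x: math.exp(-(x - 1.0) ** 2)   # log-concave, centred at +1
g = lambda y: math.exp(-(y + 1.0) ** 2)   # log-concave, centred at -1
h = lambda z: math.exp(-z ** 2)           # candidate middle function

# spot-check the hypothesis h((1-lam)x + lam*y) >= f(x)^(1-lam) g(y)^lam,
# which for lam = 1/2 follows from convexity of t -> t^2
random.seed(0)
for _ in range(1000):
    x, y = random.uniform(-5, 5), random.uniform(-5, 5)
    assert h((1 - lam) * x + lam * y) >= f(x) ** (1 - lam) * g(y) ** lam - 1e-12

def integral(F, a=-10.0, b=10.0, m=20000):
    """Midpoint rule; Gaussian tails beyond [a, b] are negligible."""
    d = (b - a) / m
    return sum(F(a + (i + 0.5) * d) for i in range(m)) * d

lhs = integral(h)
rhs = integral(f) ** (1 - lam) * integral(g) ** lam
print(lhs >= rhs - 1e-9)  # True; here all three integrals equal sqrt(pi)
```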
We are now ready to prove the Prekopa-Leindler theorem, using induction on the dimension.

Proof of Theorem 5.16. The case n = 1 was settled in the previous theorem. Assume we know the statement for all dimensions < n. Given f, g, h as in the theorem, we shall factor R^n = R^{n−1} × R

58 and define for any s R the functions f s, g s, h s by f s (x) = f(x, s) etc. c = (1 λ)a + λb. Then Let a, b R and and so by induction h c ((1 λ)x + λy) f a (x) (1 λ) g b (y) λ, ( ) (1 λ) ( ) λ h c (z)dz f a (x)dx g b (y)dy. R n 1 R n 1 R n 1 Letting these integrals define functions on R, H(c), F (a) and G(b) respectively, we see that the condition of the one-dimensional Prekopa Leindler theorem holds for these, and conclude that h = R n ( ) (1 λ) ( λ ( H(c)dc F (a)da G(b)db) = R R R n f ) (1 λ) ( R n g ) λ. Remark It is worth mentioning yet another proof of the Prekopa Leindler inequality which using measure transportation. The whole topic of transportation of measure was a subject of a course I gave last year for M.Sc., so we shall not get into it, but it is a powerful technique which allows to prove many results, and this is just one small example of this. We shall do it in one dimension only. We shall assume regularity throughout, though difficulties such as non-differentiability are easily overcome. Let the full integrals of f and g equal to 1 without loss of generality. Define u and v so that u(t) f(s)ds = t = v(t) g(s)ds. Clearly u and v are monotone, so almost everywhere they are differentiable. Let us assume for convenience they are everywhere differentiable. Let w(t) = (1 λ)u(t) + λv(t). It is also monotone and will serve as a change of variables for h. Note that by the chain rule f(u(t))u (t) = 1 and g(v(t))v (t) = 1. Therefore w (t) u (t) (1 λ) v (t) λ 1 1 =. f(u(t)) (1 λ) g(v(t)) λ Compute R h(x)dx = h(w(t))w (t)dt 1 f(u(t)) (1 λ) g(v(t)) λ f(u(t)) (1 λ) dt = 1. g(v(t)) λ h((1 λ)u(t) + λv(t)) f(u(t)) (1 λ) g(v(t)) λ dt Although the proof above (as did the proof for BM in the case of sets) works for rather general functions, the natural functions to work with are log-concave functions. Definition Let f : R n [0, + ). It be called log-concave if f = e ϕ for some convex ϕ. 
This can be reformulated as follows: for any x, y ∈ R^n and λ ∈ [0, 1] we have

f(λx + (1−λ)y) ≥ f(x)^λ f(y)^{1−λ}.

The support of f is the set {f > 0}. A Borel measure µ on R^n is called log-concave if for all Borel sets A, B and λ ∈ [0, 1]

µ((1−λ)A + λB) ≥ µ(A)^{1−λ} µ(B)^λ.
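The reformulation of log-concavity can be spot-checked numerically. Here is a minimal pure-Python sketch (not from the notes), using the standard Gaussian density shape on R², which is log-concave since |·|² is convex.

```python
import math, random

def f(x):
    """Standard Gaussian density shape exp(-|x|^2/2) on R^2 (log-concave)."""
    return math.exp(-(x[0] ** 2 + x[1] ** 2) / 2.0)

random.seed(1)
ok = True
for _ in range(2000):
    x = (random.uniform(-3, 3), random.uniform(-3, 3))
    y = (random.uniform(-3, 3), random.uniform(-3, 3))
    lam = random.random()
    z = (lam * x[0] + (1 - lam) * y[0], lam * x[1] + (1 - lam) * y[1])
    # log-concavity: f(lam*x + (1-lam)*y) >= f(x)^lam * f(y)^(1-lam)
    if f(z) < f(x) ** lam * f(y) ** (1 - lam) - 1e-12:
        ok = False
print(ok)  # True
```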

59 Note that for log-concave f, level sets of the form {x : f(x) t} of such functions are intervals. Lemma If a Borel measure µ has density f which is log-concave, then µ is log-concave. (A particular case is the Brunn Minkowski inequality.) Proof. Note that for z = (1 λ)x + λy we have that f(z)1 (1 λ)a+λb (z) (f(x)1 A (x)) (1 λ) (f(y)1 B (y)) λ. Therefore we may apply Prekopa-Leindler to get that µ((1 λ)a + λb) = (1 λ)a+λb ( ) (1 λ) ( f(z)dz f(y)dy A B f(x)dx) λ = µ(a) (1 λ) µ(b) λ. The opposite is also true, namely any log-concave measure which has a support of full dimension must have a density which is a log-concave function, this is a result due to C. Borell which we do not prove. You shall show in the exercises that Prekopa Leindler implies that the convolution of two log-concave functions is log-concave, and that the marginal of a log-concave function is a logconcave function. Since this is rather principal, let us demonstrate the proofs of these facts. Lemma Let f, g : R n [0, + ) be log-concave functions. Then (f g)(x) = f(y)g(x y)dy R n is a log-concave function, and so is P F (f) : E [0, ) defined for a subspace E by P E f(x) := f(x + y)dy. E Proof. Fix x and y, let z = (1 λ)x + λy and define the following three functions F (a) = f(a)g(x a), G(b) = f(b)g(y b), H(c) = f(c)g(z c). It is easy to check that log-concavity of f, g implies that H((1 λ)a + λb) F (a) (1 λ) G(b) λ. Thus, by Prekopa-Leindler H ( F ) 1 λ ( G) λ. This can be translated into (f g)(z) ((f g)(x)) (1 λ) ((f g)(y)) λ. The second assertion follows from the first by approximation of 1 E, but can also be done directly, since for every fixed x and y we have, for z = (1 λ)z + λz E that f((1 λ)x + λy) + z) f(x + z ) (1 λ) f(y + z ) λ 59

60 and by Prekopa Leindler on E we get that P E f((1 λ)x + λy) = f((1 λ)x + λy + z)dz (P E f(x)) (1 λ) (P E f(y)) λ. E In particular, marginals of indicator functions are log-concave (actually some power of them is concave, which is a stronger fact than log-concavity). Another result which we do not prove but is worth mentioning is that marginals of convex bodies (of arbitrary dimension) are dense in the space of log-concave measures (the is locally uniform convergence). 6 Lesson 5: John position 6.1 Introduction We have met the Hausdorff distance on K n. In its definition the Euclidean structure is used. If one thinks of convex bodies (say, in K(0) n and centrally symmetric) as representing normed space, then it seems un-natural to have a Euclidean structure which determines distances. It is more natural to identify two normed spaces which are isometric (via a linear transformation) and to talk about a distance which takes this into account. Such is the Banach Mazur distance. We shall denote by X K the space R n equipped with the norm given by K, that is, x K = inf{r > 0 : x rk}. Two spaces X K, X T (of the same dimension) are isometric if there exists a linear A : R n R n which preserves distances, which is equivalent to AK = T. Definition 6.1. Let K, T K n (0). (1) The geometric distance between them is d g (K, T ) = inf{ab : 1 a K T bk}. (2) The Banach Mazur distance between the normed spaces is d BM (X K, X T ) = inf{d g (K, ut ) : u GL n }. Note that this is a logarithmic distance, that is, d(k, K) = 1 and d(k, L) d(k, T )d(t, L). Example 6.2. It is easy to check that d g (B n 1, Bn 2 ) = d g(b n 2, [ 1, 1]n ) = n and also not difficult to convince ones-self that this position is optimal, namely also d BM (B n 2, [ 1, 1]n ) = n. However, the trivial computation d g (B, 1 [ 1, 1]n ) = n (which is easily seen to be non-optimal for n = 2, say) can actually be improved significantly to C n. 
(We do not discuss this, nor the open problem of the maximal distance between a symmetric convex body and the cube, which is known to be bounded by n^{5/6}.)
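The computation in Example 6.2 can be spot-checked numerically. The following pure-Python sketch (not from the notes; the dimension n = 4 is chosen arbitrarily) verifies the chain B_2^n ⊆ [−1,1]^n ⊆ √n B_2^n on random points, and notes that the corner (1, ..., 1) attains the factor √n.

```python
import math, random

n = 4
random.seed(2)
for _ in range(500):
    # sample a point of the Euclidean unit ball B_2^n by rejection
    while True:
        x = [random.uniform(-1.0, 1.0) for _ in range(n)]
        if sum(t * t for t in x) <= 1.0:
            break
    assert all(abs(t) <= 1.0 for t in x)               # B_2^n ⊆ [-1,1]^n
    y = [random.uniform(-1.0, 1.0) for _ in range(n)]  # a point of the cube
    assert math.sqrt(sum(t * t for t in y)) <= math.sqrt(n) + 1e-12  # cube ⊆ sqrt(n) B_2^n
corner_norm = math.sqrt(n)  # |(1,...,1)| = sqrt(n): the factor is attained
print(corner_norm)
```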

61 Up to here Shiur 11.1, 2 hour2, date 11.1 Remark 6.3. Note that there is no way to compare Hausdorff distance and Banach Mazur (BM) distance, on the one hand two bodies very far from one another in Hausdorff distnace may actually be identified in BM distance, and on the other hand any two sets inside, say, an ε ball in R n have Hausdorff distance less than ε whereas BM distance is invariant under multiplication by scalars. However, if a body K contains, say δb2 n, and has Hausdorff distance to T less than ε, K T + εb2 n T K + εb2 n then so that and d BM (K, T ) 1+εδ 1 εδ (1 εδ)k + εb n 2 T + εb n 2 T K(1 + εδ) (1 εδ)k T K(1 + εδ) 1 + 3εδ so that in a sense Hausdorff dominates BM. With this notion of distances, one may ask various questions about the space of equivalence classes of convex bodies (or normed spaces), a space which is called the Banach Mazur compactum. Questions such as : what is the diameter of this space, what is the centre, etc. Let us mention the easy fact that the Banach Mazur compactum is, as its name suggests, a compact metric space. This can be shown directly in a pretty elementary way but will follow easily from our discussion below. Remark 6.4. Here is an example of a proof for compactness: given a sequence [K j ] we may pick K j [K j ] for which the euclidean ball is contained in K j and is the ellipsoid with maximal volume inside it. Then we claim that for large R (which may depend on n) each K j RB2 n. Indeed, otherwise K j would include the cone conv(re, B2 n ) for some unit vector e, and when R is large enough this set includes an ellipsoid of volume greater than κ n, a contradiction to the choice of K j. Now when all K j are bounded, we may invoke Blaschke to get a converging subsequence in δ H, which by the remark above dominates the Banach Mazur distance (as B2 n K j ). We thus got a converging subsequence K jm δ H K and [K jm ] BM [K]. In this section we shall show that for any n dimensional normed space, d BM (X K, l n 2 ) n. 
To do so, given a centrally symmetric convex body K we shall take the ellipsoid E of maximal volume inside K, and show that K ⊂ √n E. (In other words, we quantify R = √n in the remark above.) A similar result holds in the non-symmetric case, with n instead of √n, but we shall not go into it.

62 6.2 John s theorem: take I The real, useful, John s theorem gives a characterization of the maximal volume ellipsoid, or, more precisely, a characterization for a body being in John position. Definition 6.5. Given a centrally symmetric K K(0) n we shall say it is in John position if the maximal volume ellipsoid of K is B2 n. Clearly every K has some linear image AK which is in John position, and by Lemma 6.11 this image is unique up to orthogonal transformations. In this subsection, however, we shall prove in an elementary way that in John s position, K nb2 n. This does not correspond to the original proof, and some key facts about the John position which will be important for us later will appear in the real, useful, John s theorem in subsection??. Theorem 6.6. Let B n 2 K Kn (0) be in John position. Then K nb n 2 Proof. Assume towards a contradiction that there exists some x 0 K with x 0 = R > n. Wlog assume x 0 = Re 1. From symmetry, conv(±re 1, B2 n ) K. Consider an ellipsoid n E = {x : x2 1 a 2 + x 2 i b 2 1}. Here b 2 = (1 ε) for a small ε, and a > 1 will be determined later so as to satisfy E K. The volume of E is ab n 1. To have it included in the aforementioned convex hull, note that we need just a two dimensional argument, since we have revolution bodies. We need some drawings, the first is the picture of the convex hull, the second is the same picture, squeezed in one axis by b/a. i=2 We see that 1/R = t/ R 2 + t 2 from the first picture. Thus t 2 R 2 = R 2 + t 2 and t 2 = R 2 /(R 2 1). From the second picture we see that a/r = b/(rb/a) = t/ R 2 b 2 /a 2 + t 2. That is, R 2 b 2 + t 2 a 2 = R 2 t 2 so that plugging in t 2 we get, after cancelling out unneeded terms, b 2 (R 2 1) + a 2 = R 2 a 2 = b 2 + R 2 (1 b 2 ). Plugging in the value for b 2 = 1 ε we see a = 1 + (R 2 1)ε and the volume of the whole ellipsoid (squared, say) is (1 ε) n 1 (1 + ε(r 2 1)) = (1 (n 1)ε + o(ε))(1 + (R 2 1)ε) = 1 + (R 2 n)ε + o(ε). 
So, when R > √n, for small enough ε this will indeed be greater than 1, a contradiction.
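The volume count at the end of this proof can be checked numerically. A small sketch (not from the notes): the competing ellipsoid has squared volume ratio q(ε) = (1−ε)^{n−1}(1 + (R²−1)ε) = 1 + (R²−n)ε + o(ε), so it beats the unit ball for small ε exactly when R > √n.

```python
import math

# Numeric sketch of the volume count in the proof of Theorem 6.6.
def vol_ratio_sq(n, R, eps):
    """Squared volume ratio of the perturbed ellipsoid to the unit ball."""
    return (1.0 - eps) ** (n - 1) * (1.0 + (R * R - 1.0) * eps)

n, eps = 5, 1e-4
big = vol_ratio_sq(n, math.sqrt(n) + 0.5, eps)    # R above sqrt(n): ratio > 1
small = vol_ratio_sq(n, math.sqrt(n) - 0.5, eps)  # R below sqrt(n): ratio < 1
coeff = (vol_ratio_sq(n, 3.0, eps) - 1.0) / eps   # first-order term ≈ R^2 - n = 4
print(big > 1.0, small < 1.0, coeff)
```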

63 Up to here Shiur 11.2, 1 hour, date Existence and uniqueness of ellipsoid with max volume Let us discuss ellipsoids a bit more before embarking on the journey to John s theorem (take II). Remark 6.7. A good reason for dicussing ellipsoids is that positions, which we defined as linear images of a body, turn up very naturally, not just the John position but many others as well. Recall that a position can be thought of in two equivalent ways: as linear images of the bosy, or as choices of a euclidean structure on a given normed space. Here are two more examples for positions: 1) Minimal surface area position. Minimize, over A SL n, the surface area of AK. (Or, over A GL n, the number S n (AK)/Vol n (K) (n 1)/n ). 2) Isotropic position. It can be given by a minimization probelm (of AK x 2 dx) but better thought of as follows: pick an A GL n so that for K = AK we have Vol(K ) = 1 and << ξ, η >>:= ξ, x x, η dx = C K ξ, η. K In isotropic position one may show that the volumes of central sections by hyperplanes are all equivalent up to a universal constant (independent of the dimension) and are actually (for θ S n 1 ) approximately ( ) 1/2 Vol n 1 (K θ x, θ 2 dx. K Thus the Slicing Problem (which asks whether for any body of volume 1, in any dimension, there exists a central hyperplane section of (n 1-dimensional) volume greater than 1/100, becomes a question about boundedness from above of C K for bodies of volume 1. The standard notation is C K = L K. (Best known bound: Klartag n 1/4, which improved Bourgains s n 1/4 log n.) 3) One more famous position is l position, we do not discuss it. Nor do we discuss M-position, which is not less important and was mentioned with respect to inverse Brunn-Minkowski inequalities. A centrally symmetric ellipsoid E is simply a linear image of the Euclidean unit ball. It is non-degenerate if E = AB2 n where A GL n. We shall need to know the following fact from linear algebra: Fact 6.8. 
A non-singular n × n matrix A ∈ GL_n can be represented as A = BU where B > 0 (symmetric and positive definite) and U is orthogonal. This polar decomposition is unique, and B = (AA^T)^{1/2}.

Note that therefore we can think of an ellipsoid E = AB_2^n = BUB_2^n = BB_2^n as a linear image of B_2^n by a symmetric positive definite matrix. Since such a matrix can be diagonalized orthogonally, say with normalized eigenvectors v_i, an ellipsoid is also just a set of the form

E = {x ∈ R^n : ∑_{i=1}^n ⟨x, v_i⟩² / α_i² ≤ 1}.

Here (v_i) is some orthonormal basis and α_i are the respective lengths of the semi-axes. Letting E = AB_2^n in the basis v_i = e_i, we have that A is diagonal with entries α_i.

The volume of the ellipsoid is of course det(A) Vol(B_2^n), which is simply κ_n ∏ α_i.

We shall also want to think of a symmetric matrix as an element of R^{n(n+1)/2}. The subset of R^{n(n+1)/2} corresponding to positive definite symmetric matrices is a convex cone (open, since we do not allow semi-definite). This cone has a very interesting extremal structure: its extremal rays (if one takes the closure, for convenience) are spanned by the rank one matrices vv^T =: v ⊗ v. Real symmetric matrices and the cone of positive definite matrices are a good model for Brunn-Minkowski type inequalities. Let us list a few examples. The polynomiality of the determinant is easy, simply by the formula for the determinant; polarization follows easily, as it did in the mixed volume statement we had.

Lemma 6.9. Let A_1, ..., A_m be symmetric n×n matrices. Then

det(λ_1 A_1 + ... + λ_m A_m) = Σ_{i_1,...,i_n=1}^m D(A_{i_1}, ..., A_{i_n}) λ_{i_1} ··· λ_{i_n}.

Here the mixed discriminants are

D(A_1, ..., A_n) = (1/n!) Σ_{ε ∈ {0,1}^n} (−1)^{n + Σ ε_i} det(Σ ε_i A_i).

Note that in this lemma one does not assume the matrices positive, and so one cannot claim that the coefficients, the mixed discriminants, are positive. In general, for positive definite matrices we have D(A_1, ..., A_n) ≥ ∏ det(A_i)^{1/n}. (We shall not prove, or use, this.)

Lemma 6.10. Let A_1, A_2 be symmetric positive definite n×n matrices. Then

det(A_1 + A_2)^{1/n} ≥ det(A_1)^{1/n} + det(A_2)^{1/n}.

Equality holds if and only if A_1 = µA_2 for some µ > 0. Equivalently,

det((1−λ)A_1 + λA_2) ≥ det(A_1)^{1−λ} det(A_2)^{λ},

where the equality condition is A_1 = A_2.

Proof. Since the determinant is multiplicative, we may replace A_i by A_2^{−1/2} A_i A_2^{−1/2} and prove that for B symmetric positive definite

det(B + Id)^{1/n} ≥ det(B)^{1/n} + det(Id)^{1/n} = det(B)^{1/n} + 1.

Since B is positive definite, it is diagonal in some orthonormal basis (and the identity remains the identity in any basis). Writing the matrices in this basis, with λ_i the eigenvalues of B, we need to show

∏_{i=1}^n (1 + λ_i)^{1/n} = det(B + Id)^{1/n} ≥ det(B)^{1/n} + 1 = (∏ λ_i)^{1/n} + 1.

Dividing by the left hand side we may rewrite the inequality as

(∏ λ_i/(1 + λ_i))^{1/n} + (∏ 1/(1 + λ_i))^{1/n} ≤ 1,

which follows by the arithmetic-geometric mean inequality:

(∏ λ_i/(1 + λ_i))^{1/n} + (∏ 1/(1 + λ_i))^{1/n} ≤ (1/n) Σ_{i=1}^n λ_i/(1 + λ_i) + (1/n) Σ_{i=1}^n 1/(1 + λ_i) = (1/n) Σ_{i=1}^n (λ_i + 1)/(1 + λ_i) = 1.

A (slightly odd) way of reformulating the above lemma is that the subset of the cone of positive definite matrices consisting of those with determinant at least one is a convex set. We shall use this fact in the next lemma. Let us mention yet another proof of this Brunn-Minkowski inequality for matrices, which shows that, in a sense, this is just Hölder.

Another proof of Lemma 6.10. We shall use the well known formula

∫_{R^n} e^{−⟨Ax,x⟩/2} dx = (2π)^{n/2} / det(A)^{1/2}.

Compute:

(2π)^{n/2} / det((1−λ)A + λB)^{1/2} = ∫_{R^n} e^{−⟨((1−λ)A+λB)x,x⟩/2} dx = ∫_{R^n} (e^{−⟨Ax,x⟩/2})^{1−λ} (e^{−⟨Bx,x⟩/2})^{λ} dx ≤ (∫_{R^n} e^{−⟨Ax,x⟩/2} dx)^{1−λ} (∫_{R^n} e^{−⟨Bx,x⟩/2} dx)^{λ} = (2π)^{n/2} (det A)^{−(1−λ)/2} (det B)^{−λ/2},

where the inequality is Hölder's; rearranging gives det((1−λ)A + λB) ≥ (det A)^{1−λ} (det B)^{λ}.

Lemma 6.11. Given a centrally symmetric proper convex body K ⊂ R^n, there exists a unique ellipsoid of maximal volume inscribed in K.

Proof. In the exercises you showed (using compactness in R^{n(n+1)/2}, or Blaschke selection) the existence of such an ellipsoid. Assume there are two different ellipsoids of maximal volume inside K. Without loss of generality one of them is B_2^n, and the other is given by some A B_2^n with A > 0 and det(A) = 1. By convexity of K, also ((A + Id)/2) B_2^n ⊂ (A B_2^n + B_2^n)/2 ⊂ K. This is an ellipsoid whose volume is det((A + Id)/2) κ_n, and this determinant is at least 1 by Lemma 6.10. In fact, the equality condition cannot be satisfied since we assumed the two ellipsoids were different, so we get a contradiction to maximality of the volume.
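The determinant inequality above, and its equality case, can be tested numerically on random positive definite matrices. A small sketch, assuming NumPy (the helper random_spd and all numeric choices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

def random_spd(n):
    """A random symmetric positive definite matrix M = G G^T + I."""
    G = rng.standard_normal((n, n))
    return G @ G.T + np.eye(n)

A1, A2 = random_spd(n), random_spd(n)

# det(A1 + A2)^{1/n} >= det(A1)^{1/n} + det(A2)^{1/n}
lhs = np.linalg.det(A1 + A2) ** (1 / n)
rhs = np.linalg.det(A1) ** (1 / n) + np.linalg.det(A2) ** (1 / n)
assert lhs >= rhs

# Equivalent multiplicative form (log-concavity of det on the cone):
# det((1-t)A1 + t A2) >= det(A1)^{1-t} det(A2)^t
for t in np.linspace(0, 1, 11):
    assert (np.linalg.det((1 - t) * A1 + t * A2)
            >= np.linalg.det(A1) ** (1 - t) * np.linalg.det(A2) ** t - 1e-9)

# Equality case: A1 = mu * A2 makes the first inequality tight.
mu = 3.0
tight_lhs = np.linalg.det((1 + mu) * A2) ** (1 / n)
tight_rhs = np.linalg.det(mu * A2) ** (1 / n) + np.linalg.det(A2) ** (1 / n)
assert np.isclose(tight_lhs, tight_rhs)
```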

6.4 Ellipsoids outside K

An equivalent construction to the maximal volume ellipsoid inside K is the minimal volume ellipsoid outside K. It is not just that the proofs are similar: there is a complete analogy, due to the following simple lemma.

Lemma 6.12. Let K ∈ K^n_(0). An ellipsoid E is (the ellipsoid) of maximal volume inside K if and only if E° is an (the) ellipsoid of minimal volume outside K°.

Proof. Polarity reverses order, so to every ellipsoid E ⊂ K there corresponds an ellipsoid E° ⊃ K°, and since polarity is an involution, also vice versa. Furthermore, by the computation (AK)° = (A^T)^{−1} K° (after Definition 5.7), or by the linear invariance of the Santalo product, we have that Vol(E) Vol(E°) = κ_n², so that if Vol(E_1) ≤ Vol(E_2) then Vol(E_1°) ≥ Vol(E_2°). Therefore the volume of the maximal ellipsoid in K is κ_n²/V, where V is the volume of the minimal ellipsoid outside K°.

To have things complete all around, let us prove independently (not really, obviously) that every convex body has a unique minimal volume ellipsoid outside it.

Lemma 6.13. Let K ⊂ R^n be a symmetric convex body. There exists a unique ellipsoid E ⊃ K with minimal volume.

Proof. Let V = inf{Vol(E) : E ⊃ K} > 0 and find a sequence T_m ∈ GL_n such that E_m = T_m^{−1}(B_2^n) ⊃ K with Vol(E_m) = κ_n / det(T_m) → V. Note that the operator norm of T_m as an operator from X_K to ℓ_2^n is bounded by 1 (this is precisely the meaning of K ⊂ E_m), and thus the elements T_m all lie inside some bounded set in GL_n (use that all norms on R^{n²} are equivalent, say). We can thus use compactness to deduce that there is a converging subsequence, so wlog T_m → S with det(S) = κ_n/V > 0 (in particular, S ∈ GL_n), and E = S^{−1}(B_2^n) ⊃ K attains the infimum.

For uniqueness, assume wlog that B_2^n and another ellipsoid E both contain K and have minimal volume; in particular both volumes equal κ_n, so ∏ α_i = 1 below. Let E be given by E = {x : Σ ⟨x, v_i⟩²/α_i² ≤ 1}. Construct an average ellipsoid

E' = {x ∈ R^n : Σ (1/2)(α_i^{−2} + 1) ⟨x, v_i⟩² ≤ 1}.

Clearly E' includes the intersection of B_2^n and E, so it also includes K, and its volume is

κ_n ∏ (2α_i²/(α_i² + 1))^{1/2} ≤ κ_n,

where we have used ∏ α_i = 1 and 2α_i ≤ 1 + α_i². There is equality only if all α_i = 1, that is, only if E = B_2^n, contradicting the assumption that the two ellipsoids are different.

6.5 John's theorem: take II

It turns out that in John position some special condition on the touching points of the body and the ball holds. This can be thought of (and this was the original method of John, I think) as an infinite dimensional version of Lagrange multipliers.
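The identity Vol(E) Vol(E°) = κ_n² used in the polarity lemma above can be checked directly for ellipsoids E = A B_2^n with A symmetric positive definite, for which E° = A^{−1} B_2^n. A sketch, assuming NumPy (the helper kappa and the sample matrix are illustrative):

```python
import math
import numpy as np

def kappa(n):
    """Volume of the Euclidean unit ball B_2^n."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

rng = np.random.default_rng(2)
n = 4
G = rng.standard_normal((n, n))
A = G @ G.T + np.eye(n)                  # symmetric positive definite

# E = A B_2^n has polar E° = A^{-1} B_2^n (A symmetric), so
# Vol(E) Vol(E°) = det(A) det(A^{-1}) kappa_n^2 = kappa_n^2.
vol_E  = abs(np.linalg.det(A)) * kappa(n)
vol_Eo = abs(np.linalg.det(np.linalg.inv(A))) * kappa(n)
assert np.isclose(vol_E * vol_Eo, kappa(n) ** 2)
```

Since the product of the two volumes is constant, a larger inscribed ellipsoid corresponds exactly to a smaller circumscribed polar, which is the order reversal used in the proof.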

Theorem 6.14 (John). Given a centrally symmetric K ∈ K^n_(0) with B_2^n ⊂ K, the following are equivalent:

(i) B_2^n is the maximal volume ellipsoid in K;

(ii) there are u_i ∈ ∂K ∩ S^{n−1} and λ_i > 0 for i = 1, ..., m such that Id = Σ λ_i u_i ⊗ u_i (and then Σ λ_i = n).

Remark 6.15. The notation u ⊗ u stands for the rank one symmetric operator (u ⊗ u)(x) = ⟨u, x⟩ u, so that the operator equation means x = Σ λ_i ⟨x, u_i⟩ u_i for every x. Note that by taking traces we readily get Σ λ_i = n, so we don't really have to put it in the statement, but it is useful to have it there to remember. Note also that |x|² = Σ λ_i ⟨x, u_i⟩². Further, if one knows the latter for every x, one may deduce the former as follows:

|x|² + 2⟨x, y⟩ + |y|² = |x + y|² = Σ λ_i ⟨x + y, u_i⟩² = Σ λ_i ⟨x, u_i⟩² + 2 Σ λ_i ⟨x, u_i⟩⟨y, u_i⟩ + Σ λ_i ⟨y, u_i⟩².

Thus ⟨x, y⟩ = Σ λ_i ⟨x, u_i⟩⟨u_i, y⟩. As this holds for every y, we get that x = Σ λ_i ⟨x, u_i⟩ u_i.

Proof that (i) implies (ii). We shall show that Id/n is a convex combination of u ⊗ u for u ∈ ∂K ∩ S^{n−1} (such u's are called touching points). To this end, we assume by contradiction that Id/n may be separated from the compact convex set conv{u ⊗ u : u ∈ ∂K, |u| = 1}. One should ponder a second on why this set is closed and bounded (considered as a subset of R^{n²}), but this is immediate from continuity of norms and from boundedness of the Euclidean unit ball, say, in the ℓ_∞ norm. The separation may be assumed to be given by a linear functional φ, that is,

⟨φ, Id/n⟩ < r ≤ ⟨φ, u ⊗ u⟩

for all such u ∈ S^{n−1} ∩ ∂K. We may actually assume wlog that φ = φ^T, since otherwise we let ψ = (φ + φ^T)/2, and since both u ⊗ u and Id/n are symmetric, ψ satisfies the same inequality. (Equivalently: one may separate in the space of symmetric matrices, which is isometric to its dual space.) Note that the traces of both u ⊗ u and of Id/n are 1, so that the scalar product with the identity is equal for both. In other words, we may subtract from φ the trace functional times any constant and still get separation, with r changed to r − const. If this constant is tr(φ)/n, the new functional vanishes on Id/n.

So far we have shown that there exists a linear functional B, given by a symmetric matrix B = B^T, such that ⟨B, Id⟩ = tr(B) = 0 and ⟨B, u ⊗ u⟩ = ⟨Bu, u⟩ > s > 0 for all touching points u ∈ S^{n−1} ∩ ∂K. (Notice that, for instance, this clearly cannot happen when the touching points contain an orthonormal basis, since then the trace would be at least ns.)

The idea is to use B in order to define an ellipsoid of greater volume than B_2^n which still resides in K. The ellipsoid shall be defined (for a suitably chosen δ) by

E_δ = {x : ⟨(I + δB)x, x⟩ ≤ 1}.

For small enough δ this is indeed an ellipsoid, as I + δB is still positive definite (simply take δ < 1/‖B‖, this will ensure positive definiteness), so it has a square root, say S_δ, and then E_δ = S_δ^{−1} B_2^n. We shall now show two things: first, that for small enough δ this ellipsoid is indeed inside K, and second, that (because the trace of B is 0) the volume of this ellipsoid is greater than the volume of the ball.

For the first part, consider the contact points of K and B_2^n; denote this set by U. We separate into points close to U and points far from it. Consider the points of S^{n−1} which are far from the contact points,

V = {v ∈ S^{n−1} : d(v, U) ≥ s/(2‖B‖)}.

This is a compact set, and thus it has positive distance from the boundary of K; in particular, r = max{‖v‖_K : v ∈ V} < 1. We claim that for small enough δ, v/r ∉ E_δ for all v ∈ V. This means that in direction v, E_δ ⊂ K (the radius of E_δ in direction v is less than 1/r ≤ 1/‖v‖_K, the radius of K in that direction).

Indeed, we let λ = min_{v ∈ V} ⟨Bv, v⟩; this is a negative number, for the following reason. Since the trace of B is zero, and there are some w's with ⟨Bw, w⟩ positive, for some w ∈ S^{n−1} we have ⟨Bw, w⟩ < 0. But for all u ∈ U we have ⟨Bu, u⟩ > s, so if |v − u| < s/(2‖B‖) we also have

⟨Bv, v⟩ = ⟨Bu, u⟩ + ⟨Bu, v − u⟩ + ⟨B(v − u), v⟩ > s − 2|v − u| ‖B‖ > 0.

Therefore w ∈ V, and we thus have that λ < 0.

Now is the time to choose δ < (1 − r²)/|λ|. Then for v ∈ V,

⟨(I + δB)(v/r), v/r⟩ = (1 + δ⟨Bv, v⟩)/r² ≥ (1 + δλ)/r² > 1.

This completes the assertion in the case of V. For S^{n−1} \ V this follows easily by our choice of the neighborhood of U: for every u ∈ U we know that ⟨B, u ⊗ u⟩ > s, so that ⟨Bu, u⟩ ≥ s, and thus ⟨(I + δB)u, u⟩ > 1 + δs; in particular u ∉ E_δ. Further, if v ∈ S^{n−1} then

|⟨(I + δB)v, v⟩ − ⟨(I + δB)u, u⟩| = δ |⟨Bv, v⟩ − ⟨Bu, u⟩| ≤ δ |⟨Bv, v⟩ − ⟨Bv, u⟩| + δ |⟨Bv, u⟩ − ⟨Bu, u⟩| ≤ 2δ ‖B‖ |u − v|,

so that for v in the (s/(2‖B‖))-neighborhood of U on S^{n−1}, that is, for v ∈ S^{n−1} \ V, we have ⟨(I + δB)v, v⟩ > 1, and in particular v ∉ E_δ. However v ∈ B_2^n ⊂ K, so that in direction v, E_δ ⊂ K. We have completed the first part, namely showing that E_δ ⊂ K for small enough δ.

As for the volume of this ellipsoid, Vol(E_δ) = κ_n / det(I + δB)^{1/2}, and by the arithmetic-geometric mean inequality det(I + δB)^{1/n} ≤ tr(I + δB)/n = 1, with equality only when I + δB is a multiple of the identity, that is, only when B = 0 (as tr(B) = 0). Since B ≠ 0, the ellipsoid E_δ has volume strictly greater than that of B_2^n, a contradiction to maximality.

We shall prove the converse, namely that (ii) implies (i), in the following slightly more general setting. We say that a measure µ on S^{n−1} is isotropic if for every x

∫ ⟨x, u⟩ u dµ(u) = x,

which we wrote before as ∫ u ⊗ u dµ(u) = Id. Note that there are only n² things to check (not really "for every x") because of the linearity of the identity above: namely one needs only to check that for every i, j

∫ u_i u_j dµ(u) = δ_{i,j}.

As we saw in the discrete case (in the exercises), equivalently one may check that for every x,

∫ ⟨x, u⟩² dµ(u) = |x|².

Note that, by applying this to the e_i and summing, one gets that µ(S^{n−1}) = ∫ |u|² dµ(u) = n. This is like the trace formula we had above, forcing Σ λ_i = n, but here we also allow a non-atomic measure.

Theorem 6.16. Let B_2^n ⊂ K = −K ⊂ R^n be convex. Assume there exists an isotropic Borel measure on S^{n−1} which is supported on the contact points ∂K ∩ S^{n−1}. Then K is in John position.

Remark 6.17. Given a measure µ, the support of µ is defined to be the set of all points x such that for all ε > 0, µ(B(x, ε)) > 0.

Proof. Instead of K consider the set

L = {y : ⟨x, y⟩ ≤ 1 for all x ∈ supp(µ)}.

(Note: this is the dual set to conv(±supp(µ)).) Clearly K ⊂ L, and so it is enough to show that B_2^n is maximal for L. Pick some ellipsoid E ⊂ L, given as before by E = {x ∈ R^n : Σ ⟨x, v_i⟩²/α_i² ≤ 1}. Here (v_i) is some orthonormal basis and the α_i are the respective lengths of the semi-axes. If u ∈ supp(µ) then y(u) = Σ_j α_j ⟨u, v_j⟩ v_j ∈ E ⊂ L. Indeed, y(u) has j-th coordinate (in the basis (v_j)) equal to α_j ⟨u, v_j⟩, so that the sum of the squares weighted by α_j^{−2} is |u|² = 1. Thus y(u) ∈ E ⊂ L, and thus ⟨u, y(u)⟩ ≤ 1, which gives Σ_j α_j ⟨u, v_j⟩² ≤ 1. This holds for all u ∈ supp(µ). Isotropicity means that for every x, ∫ ⟨x, u⟩² dµ(u) = |x|², so that in particular this is true for the v_j. We may now write

n = ∫ dµ(u) ≥ ∫ Σ_j α_j ⟨v_j, u⟩² dµ(u) = Σ_j α_j ∫ ⟨v_j, u⟩² dµ(u) = Σ_j α_j.

By the arithmetic-geometric mean inequality, Σ α_i ≤ n implies ∏ α_i ≤ 1, so Vol(E) ≤ κ_n, as claimed.

Corollary 6.18. The Banach-Mazur distance between ℓ_2^n and any n-dimensional normed space is bounded by √n.

Proof. By the definitions, this is equivalent to finding a position TK of the unit ball K such that B_2^n ⊂ TK ⊂ √n B_2^n. We shall use John position. Take the unit ball of the normed space to John position, and use John's theorem to find touching points with Σ λ_i u_i ⊗ u_i = Id. For x ∈ K we have ⟨x, u_i⟩ ≤ 1, because the supporting hyperplane to K at u_i is u_i + u_i^⊥. From symmetry, actually |⟨x, u_i⟩| ≤ 1. Hence

|x|² = ⟨x, Σ λ_i ⟨u_i, x⟩ u_i⟩ = Σ λ_i ⟨u_i, x⟩² ≤ Σ λ_i = n,

so that K ⊂ √n B_2^n.

6.6 Minimal Surface Area Position

When one wants to minimize the surface area S_n(AK) over all A ∈ SL_n(R), another position is created, called the minimal surface area position. It will be the subject of Section 8 to show that the minimal surface area of a given body is always bounded from above by the surface area of a regular simplex with the same volume, or, in the symmetric case, by the surface area of the cube with the same volume.
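For the cube [−1, 1]^n, the John decomposition and the √n estimate of the corollary can be written out explicitly: the touching points are ±e_i with weights λ_i = 1/2, and the corner (1, ..., 1) shows that no radius smaller than √n suffices. A sketch, assuming NumPy:

```python
import numpy as np

n = 4
# Touching points of B_2^n inside the cube [-1,1]^n are the 2n points
# ±e_i; taking lambda_i = 1/2 for each gives John's decomposition.
us = np.vstack([np.eye(n), -np.eye(n)])
lambdas = np.full(2 * n, 0.5)

Id = sum(l * np.outer(u, u) for l, u in zip(lambdas, us))
assert np.allclose(Id, np.eye(n))
assert np.isclose(lambdas.sum(), n)      # trace identity: sum lambda_i = n

# The estimate in the corollary: for x in the cube, |<x, u_i>| <= 1, so
# |x|^2 = sum lambda_i <x, u_i>^2 <= sum lambda_i = n.
rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, size=n)           # a random point of the cube
assert np.isclose(x @ x, sum(l * (x @ u) ** 2 for l, u in zip(lambdas, us)))
assert x @ x <= n

# The corner (1, ..., 1) has |x| = sqrt(n), so the cube is not contained
# in any smaller ball.
assert np.isclose(np.ones(n) @ np.ones(n), n)
```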

As we shall see there, the proof goes via John's position, and not the actual minimal surface area position, which is the subject of this subsection.

Definition 6.19. The surface area measure of a convex body K is a measure on S^{n−1}, denoted by σ_K, and given by

σ_K(W) = ν({x ∈ ∂K : n_K(x) ∈ W}).

Here n_K(x) is the outer normal to K at x, which is uniquely defined up to a set of measure 0, and we use ν to denote the (n−1)-dimensional Hausdorff measure on the boundary ∂K.

It turns out that the minimal surface area position is characterized by isotropicity of σ_K.

Theorem 6.20. Let K ∈ K^n with Vol_n(K) = 1. Then S_n(K) = min_{A ∈ SL_n} S_n(AK) if and only if for some constant c (given by S_n(K)/n) the measure (1/c) σ_K is isotropic. Moreover, this position is unique up to orthogonal transformations.

Proof. Let us first prove that under minimality we have isotropicity. Take some A ∈ SL_n. We claim that

S_n(A^{−1}K) = ∫_{S^{n−1}} |A^T u| dσ_K(u).

Indeed, we claim first that if u = n_K(x) is the normal at x, then the normal at A^{−1}x to A^{−1}K is A^T u/|A^T u|. This is because the tangent hyperplane x + u^⊥ is mapped by A^{−1} to the tangent hyperplane A^{−1}x + A^{−1}(u^⊥), and

A^{−1}(u^⊥) = {A^{−1}z : ⟨z, u⟩ = 0} = {w : ⟨Aw, u⟩ = 0} = {w : ⟨w, A^T u⟩ = 0} = (A^T u)^⊥.

Next we use that A ∈ SL_n: a surface element of ∂K at x with normal u and area dν is mapped by A^{−1} to a surface element of ∂(A^{−1}K) with normal w = A^T u/|A^T u| and area |det A^{−1}| |A^T u| dν = |A^T u| dν. Therefore, with w = A^T u/|A^T u| (so that u = (A^T)^{−1}w / |(A^T)^{−1}w|),

∫_{S^{n−1}} dσ_{A^{−1}K}(w) = ∫_{S^{n−1}} |A^T u| dσ_K(u),

which is the claim. We now continue, and use the assumption that K is in minimal surface area position, so that for any T and small ε, taking A^T = (I + εT)/det(I + εT)^{1/n} ∈ SL_n, we have the inequality ∫_{S^{n−1}} |A^T u| dσ_K(u) ≥ ∫_{S^{n−1}} dσ_K(u), which we rewrite as

∫_{S^{n−1}} |(I + εT)u| dσ_K(u) ≥ det(I + εT)^{1/n} S_n(K).
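The isotropy characterization can be illustrated on the volume-one cube K = [−1/2, 1/2]^n: its surface area measure consists of 2n unit atoms (the facet areas) at the normals ±e_i, so S_n(K) = 2n and c = S_n(K)/n = 2, and dividing σ_K by c indeed gives an isotropic measure. A sketch, assuming NumPy:

```python
import numpy as np

n = 3
# Volume-one cube K = [-1/2, 1/2]^n: 2n facets, each of area 1, with
# outer normals ±e_i, so sigma_K is a sum of unit atoms at ±e_i.
normals = np.vstack([np.eye(n), -np.eye(n)])
masses = np.ones(2 * n)                  # the facet areas

S = masses.sum()                         # surface area S_n(K) = 2n
c = S / n                                # the constant in the theorem, = 2

# Isotropy of (1/c) sigma_K: the integral of u (x) u is the identity.
M = sum((m / c) * np.outer(u, u) for m, u in zip(masses, normals))
assert np.allclose(M, np.eye(n))
assert np.isclose(S / c, n)              # total mass of (1/c) sigma_K is n
```

So the cube is in minimal surface area position, consistent with the theorem.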


More information

Algorithmic Convex Geometry

Algorithmic Convex Geometry Algorithmic Convex Geometry August 2011 2 Contents 1 Overview 5 1.1 Learning by random sampling.................. 5 2 The Brunn-Minkowski Inequality 7 2.1 The inequality.......................... 8 2.1.1

More information

Jensen s inequality for multivariate medians

Jensen s inequality for multivariate medians Jensen s inequality for multivariate medians Milan Merkle University of Belgrade, Serbia emerkle@etf.rs Given a probability measure µ on Borel sigma-field of R d, and a function f : R d R, the main issue

More information

Subgradients. subgradients. strong and weak subgradient calculus. optimality conditions via subgradients. directional derivatives

Subgradients. subgradients. strong and weak subgradient calculus. optimality conditions via subgradients. directional derivatives Subgradients subgradients strong and weak subgradient calculus optimality conditions via subgradients directional derivatives Prof. S. Boyd, EE364b, Stanford University Basic inequality recall basic inequality

More information

3.1 Convexity notions for functions and basic properties

3.1 Convexity notions for functions and basic properties 3.1 Convexity notions for functions and basic properties We start the chapter with the basic definition of a convex function. Definition 3.1.1 (Convex function) A function f : E R is said to be convex

More information

Optimization Theory. A Concise Introduction. Jiongmin Yong

Optimization Theory. A Concise Introduction. Jiongmin Yong October 11, 017 16:5 ws-book9x6 Book Title Optimization Theory 017-08-Lecture Notes page 1 1 Optimization Theory A Concise Introduction Jiongmin Yong Optimization Theory 017-08-Lecture Notes page Optimization

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 4. Subgradient

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 4. Subgradient Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 4 Subgradient Shiqian Ma, MAT-258A: Numerical Optimization 2 4.1. Subgradients definition subgradient calculus duality and optimality conditions Shiqian

More information

LECTURE 4 LECTURE OUTLINE

LECTURE 4 LECTURE OUTLINE LECTURE 4 LECTURE OUTLINE Relative interior and closure Algebra of relative interiors and closures Continuity of convex functions Closures of functions Reading: Section 1.3 All figures are courtesy of

More information

Mathematics for Economists

Mathematics for Economists Mathematics for Economists Victor Filipe Sao Paulo School of Economics FGV Metric Spaces: Basic Definitions Victor Filipe (EESP/FGV) Mathematics for Economists Jan.-Feb. 2017 1 / 34 Definitions and Examples

More information

Real Analysis Notes. Thomas Goller

Real Analysis Notes. Thomas Goller Real Analysis Notes Thomas Goller September 4, 2011 Contents 1 Abstract Measure Spaces 2 1.1 Basic Definitions........................... 2 1.2 Measurable Functions........................ 2 1.3 Integration..............................

More information

Appendix PRELIMINARIES 1. THEOREMS OF ALTERNATIVES FOR SYSTEMS OF LINEAR CONSTRAINTS

Appendix PRELIMINARIES 1. THEOREMS OF ALTERNATIVES FOR SYSTEMS OF LINEAR CONSTRAINTS Appendix PRELIMINARIES 1. THEOREMS OF ALTERNATIVES FOR SYSTEMS OF LINEAR CONSTRAINTS Here we consider systems of linear constraints, consisting of equations or inequalities or both. A feasible solution

More information

Exercises: Brunn, Minkowski and convex pie

Exercises: Brunn, Minkowski and convex pie Lecture 1 Exercises: Brunn, Minkowski and convex pie Consider the following problem: 1.1 Playing a convex pie Consider the following game with two players - you and me. I am cooking a pie, which should

More information

Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping.

Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping. Minimization Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping. 1 Minimization A Topological Result. Let S be a topological

More information

POLARS AND DUAL CONES

POLARS AND DUAL CONES POLARS AND DUAL CONES VERA ROSHCHINA Abstract. The goal of this note is to remind the basic definitions of convex sets and their polars. For more details see the classic references [1, 2] and [3] for polytopes.

More information

Topological properties

Topological properties CHAPTER 4 Topological properties 1. Connectedness Definitions and examples Basic properties Connected components Connected versus path connected, again 2. Compactness Definition and first examples Topological

More information

MA651 Topology. Lecture 10. Metric Spaces.

MA651 Topology. Lecture 10. Metric Spaces. MA65 Topology. Lecture 0. Metric Spaces. This text is based on the following books: Topology by James Dugundgji Fundamental concepts of topology by Peter O Neil Linear Algebra and Analysis by Marc Zamansky

More information

Convex Analysis and Optimization Chapter 4 Solutions

Convex Analysis and Optimization Chapter 4 Solutions Convex Analysis and Optimization Chapter 4 Solutions Dimitri P. Bertsekas with Angelia Nedić and Asuman E. Ozdaglar Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com

More information

1. Bounded linear maps. A linear map T : E F of real Banach

1. Bounded linear maps. A linear map T : E F of real Banach DIFFERENTIABLE MAPS 1. Bounded linear maps. A linear map T : E F of real Banach spaces E, F is bounded if M > 0 so that for all v E: T v M v. If v r T v C for some positive constants r, C, then T is bounded:

More information

Integral Jensen inequality

Integral Jensen inequality Integral Jensen inequality Let us consider a convex set R d, and a convex function f : (, + ]. For any x,..., x n and λ,..., λ n with n λ i =, we have () f( n λ ix i ) n λ if(x i ). For a R d, let δ a

More information

Convex Geometry. Carsten Schütt

Convex Geometry. Carsten Schütt Convex Geometry Carsten Schütt November 25, 2006 2 Contents 0.1 Convex sets... 4 0.2 Separation.... 9 0.3 Extreme points..... 15 0.4 Blaschke selection principle... 18 0.5 Polytopes and polyhedra.... 23

More information

SYMBOLIC CONVEX ANALYSIS

SYMBOLIC CONVEX ANALYSIS SYMBOLIC CONVEX ANALYSIS by Chris Hamilton B.Sc., Okanagan University College, 2001 a thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in the Department of

More information

Exercises to Applied Functional Analysis

Exercises to Applied Functional Analysis Exercises to Applied Functional Analysis Exercises to Lecture 1 Here are some exercises about metric spaces. Some of the solutions can be found in my own additional lecture notes on Blackboard, as the

More information

MATH 51H Section 4. October 16, Recall what it means for a function between metric spaces to be continuous:

MATH 51H Section 4. October 16, Recall what it means for a function between metric spaces to be continuous: MATH 51H Section 4 October 16, 2015 1 Continuity Recall what it means for a function between metric spaces to be continuous: Definition. Let (X, d X ), (Y, d Y ) be metric spaces. A function f : X Y is

More information

Metric Spaces and Topology

Metric Spaces and Topology Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies

More information

A Brunn Minkowski theory for coconvex sets of finite volume

A Brunn Minkowski theory for coconvex sets of finite volume A Brunn Minkowski theory for coconvex sets of finite volume Rolf Schneider Abstract Let C be a closed convex cone in R n, pointed and with interior points. We consider sets of the form A = C \ K, where

More information

Local strong convexity and local Lipschitz continuity of the gradient of convex functions

Local strong convexity and local Lipschitz continuity of the gradient of convex functions Local strong convexity and local Lipschitz continuity of the gradient of convex functions R. Goebel and R.T. Rockafellar May 23, 2007 Abstract. Given a pair of convex conjugate functions f and f, we investigate

More information

Analysis Finite and Infinite Sets The Real Numbers The Cantor Set

Analysis Finite and Infinite Sets The Real Numbers The Cantor Set Analysis Finite and Infinite Sets Definition. An initial segment is {n N n n 0 }. Definition. A finite set can be put into one-to-one correspondence with an initial segment. The empty set is also considered

More information

Real Analysis, 2nd Edition, G.B.Folland Elements of Functional Analysis

Real Analysis, 2nd Edition, G.B.Folland Elements of Functional Analysis Real Analysis, 2nd Edition, G.B.Folland Chapter 5 Elements of Functional Analysis Yung-Hsiang Huang 5.1 Normed Vector Spaces 1. Note for any x, y X and a, b K, x+y x + y and by ax b y x + b a x. 2. It

More information

REVIEW OF ESSENTIAL MATH 346 TOPICS

REVIEW OF ESSENTIAL MATH 346 TOPICS REVIEW OF ESSENTIAL MATH 346 TOPICS 1. AXIOMATIC STRUCTURE OF R Doğan Çömez The real number system is a complete ordered field, i.e., it is a set R which is endowed with addition and multiplication operations

More information

Chapter 1 Preliminaries

Chapter 1 Preliminaries Chapter 1 Preliminaries 1.1 Conventions and Notations Throughout the book we use the following notations for standard sets of numbers: N the set {1, 2,...} of natural numbers Z the set of integers Q the

More information

Chapter 2. Metric Spaces. 2.1 Metric Spaces

Chapter 2. Metric Spaces. 2.1 Metric Spaces Chapter 2 Metric Spaces ddddddddddddddddddddddddd ddddddd dd ddd A metric space is a mathematical object in which the distance between two points is meaningful. Metric spaces constitute an important class

More information

Tangent spaces, normals and extrema

Tangent spaces, normals and extrema Chapter 3 Tangent spaces, normals and extrema If S is a surface in 3-space, with a point a S where S looks smooth, i.e., without any fold or cusp or self-crossing, we can intuitively define the tangent

More information

Appendix B Convex analysis

Appendix B Convex analysis This version: 28/02/2014 Appendix B Convex analysis In this appendix we review a few basic notions of convexity and related notions that will be important for us at various times. B.1 The Hausdorff distance

More information

This chapter reviews some basic geometric facts that we will need during the course.

This chapter reviews some basic geometric facts that we will need during the course. Chapter 1 Some Basic Geometry This chapter reviews some basic geometric facts that we will need during the course. 1.1 Affine Geometry We will assume that you are familiar with the basic notions of linear

More information

U e = E (U\E) e E e + U\E e. (1.6)

U e = E (U\E) e E e + U\E e. (1.6) 12 1 Lebesgue Measure 1.2 Lebesgue Measure In Section 1.1 we defined the exterior Lebesgue measure of every subset of R d. Unfortunately, a major disadvantage of exterior measure is that it does not satisfy

More information

AN INTRODUCTION TO CONVEXITY

AN INTRODUCTION TO CONVEXITY AN INTRODUCTION TO CONVEXITY GEIR DAHL NOVEMBER 2010 University of Oslo, Centre of Mathematics for Applications, P.O.Box 1053, Blindern, 0316 Oslo, Norway (geird@math.uio.no) Contents 1 The basic concepts

More information

Part III. 10 Topological Space Basics. Topological Spaces

Part III. 10 Topological Space Basics. Topological Spaces Part III 10 Topological Space Basics Topological Spaces Using the metric space results above as motivation we will axiomatize the notion of being an open set to more general settings. Definition 10.1.

More information

Contents Ordered Fields... 2 Ordered sets and fields... 2 Construction of the Reals 1: Dedekind Cuts... 2 Metric Spaces... 3

Contents Ordered Fields... 2 Ordered sets and fields... 2 Construction of the Reals 1: Dedekind Cuts... 2 Metric Spaces... 3 Analysis Math Notes Study Guide Real Analysis Contents Ordered Fields 2 Ordered sets and fields 2 Construction of the Reals 1: Dedekind Cuts 2 Metric Spaces 3 Metric Spaces 3 Definitions 4 Separability

More information

Linear Programming. Larry Blume Cornell University, IHS Vienna and SFI. Summer 2016

Linear Programming. Larry Blume Cornell University, IHS Vienna and SFI. Summer 2016 Linear Programming Larry Blume Cornell University, IHS Vienna and SFI Summer 2016 These notes derive basic results in finite-dimensional linear programming using tools of convex analysis. Most sources

More information

Semicontinuous functions and convexity

Semicontinuous functions and convexity Semicontinuous functions and convexity Jordan Bell jordan.bell@gmail.com Department of Mathematics, University of Toronto April 3, 2014 1 Lattices If (A, ) is a partially ordered set and S is a subset

More information

CO 250 Final Exam Guide

CO 250 Final Exam Guide Spring 2017 CO 250 Final Exam Guide TABLE OF CONTENTS richardwu.ca CO 250 Final Exam Guide Introduction to Optimization Kanstantsin Pashkovich Spring 2017 University of Waterloo Last Revision: March 4,

More information