18.409 An Algorithmist's Toolkit Nov. 2009
Lecturer: Jonathan Kelner Lecture 17

1 Johnson-Lindenstrauss Theorem

1.1 Recap

We first recap a theorem (an isoperimetric inequality) and a lemma (a concentration bound) from last time:

Theorem 1 (Measure concentration on the sphere) Let S^{n−1} be the unit sphere in R^n, let A ⊆ S^{n−1} be a measurable set with vol(A) ≥ 1/2, and let A_ε denote the set of points of S^{n−1} at distance at most ε from A. Then
vol(A_ε) ≥ 1 − 2e^{−ε^2 n/2}.

Informally, the theorem says: if a set A covers at least half of the sphere, then once we add all points within distance ε of A, we have almost the whole sphere.

Definition 2 (c-Lipschitz) A function f : A → B is c-Lipschitz if, for all u, v ∈ A, we have |f(u) − f(v)| ≤ c‖u − v‖.

For a unit vector x ∈ S^{n−1}, the length of the projection onto the first k coordinates,
f(x) = √(x_1^2 + x_2^2 + ⋯ + x_k^2),
is a 1-Lipschitz function.

Lemma 3 Let f(x) = √(x_1^2 + ⋯ + x_k^2), let x be a vector chosen uniformly at random from S^{n−1}, and let M be the median of f(x). Then f(x) is sharply concentrated:
Pr[|f(x) − M| ≥ t] ≤ 4e^{−t^2 n/2}.

1.2 Metric Embedding

Definition 4 (D-embedding) Suppose that X = {x_1, x_2, …, x_n} is a finite set and d is a metric on X. A map f : X → R^k is a D-embedding of X if there is a constant α > 0 such that, for all x_i, x_j ∈ X,
α · d(x_i, x_j) ≤ ‖f(x_i) − f(x_j)‖ ≤ D · α · d(x_i, x_j).
The distortion of f is the minimum such D. When f is scaled to be 1-Lipschitz, i.e. ‖f(x_i) − f(x_j)‖ ≤ d(x_i, x_j) for all i, j, this amounts to requiring d(x_i, x_j)/D ≤ ‖f(x_i) − f(x_j)‖ ≤ d(x_i, x_j).

Claim of the Johnson-Lindenstrauss Theorem: the Euclidean metric on any finite set X of n points (a bunch of high-dimensional points) can be embedded with distortion D = 1 + ε in R^k for k = O(ε^{−2} log n). If we insist on zero distortion (ε = 0), essentially no dimension reduction is possible: a regular simplex on n + 1 points is a counterexample, since it cannot be embedded isometrically in fewer than n dimensions.

The Johnson-Lindenstrauss theorem rests on an interesting fact: if we project x onto a random subspace, the projection y gives us an approximation of the length of x up to a fixed multiplicative factor c, i.e. ‖x‖ ≈ c‖y‖, and the map x ↦ c·y is an embedding with distortion D = 1 + ε.
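To see the claim in action, here is a minimal numerical sketch in Python (assuming NumPy is available). It uses a common variant of the J-L map, a k × m matrix of independent Gaussians scaled by 1/√k, rather than the exact orthogonal projection used in the proof below; the sizes and the constant 8 are illustrative choices, not values from these notes.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m, eps = 60, 1000, 0.5
    k = int(np.ceil(8 * np.log(n) / eps**2))   # k = O(eps^-2 log n); constant 8 is illustrative

    X = rng.normal(size=(n, m))                # n arbitrary points in R^m
    Pi = rng.normal(size=(k, m)) / np.sqrt(k)  # scaled Gaussian map; E[||Pi v||^2] = ||v||^2
    Y = X @ Pi.T                               # embedded points in R^k

    ratios = []
    for i in range(n):
        for j in range(i + 1, n):
            ratios.append(np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j]))
    print(min(ratios), max(ratios))            # typically both lie within [1 - eps, 1 + eps]

Running this for several seeds shows all pairwise distance ratios staying inside the promised distortion window, while the dimension drops from m = 1000 to k ≈ 131.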
1.3 Proof of the Theorem

Next, we give a more precise statement of the Johnson-Lindenstrauss theorem:

Theorem 5 (Johnson-Lindenstrauss) Let X = {x_1, x_2, …, x_n} ⊆ R^m (for any m) and let k = O(ε^{−2} log n). Let:
• L ⊆ R^m be a uniformly random k-dimensional subspace;
• y_1′, y_2′, …, y_n′ be the projections of the x_i onto L;
• y_i = c·y_i′ for a fixed constant c = Θ(√(m/k)).
Then, with high probability, the map x_i ↦ y_i is a (1 + ε)-embedding of X into R^k, i.e. for all x_i, x_j ∈ X,
‖x_i − x_j‖ ≤ ‖y_i − y_j‖ ≤ (1 + ε)‖x_i − x_j‖.

Proof Let Π_L : R^m → L be the orthogonal projection of R^m onto the subspace L. For x_i, x_j ∈ X, let x be the unit vector in the direction of x_i − x_j. We need to prove that
(1 − φ)M‖x‖ ≤ ‖Π_L(x)‖ ≤ (1 + φ)M‖x‖
holds with high probability, where M is the median of the function f = √(x_1^2 + ⋯ + x_k^2) on S^{m−1}. By Definition 4, this shows that the mapping Π_L is a D-embedding of X into R^k with D = (1 + φ)/(1 − φ). We let φ = ε/3, so that D = (1 + ε/3)/(1 − ε/3) ≤ 1 + ε (for ε ≤ 1). Since ‖x‖ = 1, it is equivalent to show that the following inequality holds with high probability:
|‖Π_L(x)‖ − M| < (ε/3)M.   (1)

Lemma 3 describes the case where we have a random unit vector and project it onto a fixed subspace. This is distributionally identical to fixing the vector and projecting it onto a random subspace (we describe how this random subspace is generated in the next subsection). Applying Lemma 3 (with ambient dimension m) and plugging in t = (ε/3)M, the probability that inequality (1) fails is bounded by
Pr[|‖Π_L(x)‖ − M| ≥ (ε/3)M] ≤ 4e^{−t^2 m/2} = 4e^{−ε^2 M^2 m/18} ≤ 4e^{−ε^2 k/72}.
The last step holds since M = Ω(√(k/m)), indeed M ≥ (1/2)√(k/m) so that M^2 m ≥ k/4, based on the following reasoning. We have 1 = E[‖x‖^2] = Σ_i E[x_i^2], which by symmetry implies E[x_i^2] = 1/m. Consequently,
k/m = E[f^2] ≤ Pr[f ≤ M + t]·(M + t)^2 + Pr[f > M + t]·max(f^2) ≤ (M + t)^2 + 2e^{−t^2 m/2},
where we used f^2 = Σ_{i=1}^k x_i^2 and max(f^2) ≤ 1. Taking t = Θ(√(k/m)) with a suitable constant, this forces M = Ω(√(k/m)).
Finally, taking k = Cε^{−2} log n for a sufficiently large constant C makes the failure probability 4e^{−ε^2 k/72} at most n^{−Ω(1)} per pair, and a union bound over the O(n^2) pairs (x_i, x_j) shows that all pairwise distances are preserved simultaneously with high probability (for further details, please see [1]).

1.4 Random Subspace

Here we describe how a random subspace is generated. We first give a quick review of Gaussians. A multivariate Gaussian has density
p(x_1, x_2, …, x_N) = 1/((2π)^{N/2} |Σ|^{1/2}) · exp(−(1/2)(x − µ)^T Σ^{−1} (x − µ)),
where Σ is a nonsingular covariance matrix and the vector µ is the mean of x. Gaussians have several nice properties; in particular, the following operations on Gaussian variables again yield Gaussian variables:
• projection onto a lower-dimensional subspace;
• restriction to a lower-dimensional subspace, i.e. conditioning;
• any linear operation.
In addition, we can generate a vector with a (spherical) multi-dimensional Gaussian distribution by picking each coordinate independently according to a 1-dimensional Gaussian distribution.

How do we generate a random vector uniformly distributed on a sphere? The idea is to pick a point from a multi-dimensional Gaussian distribution (generate each coordinate independently with mean 0 and variance 1, i.e. N(0, 1)); most n-dimensional vectors generated this way have norm close to √n. Since the density of an independent Gaussian vector is spherically symmetric, this procedure does indeed generate a point uniformly at random from the sphere (after normalizing it). Generating each coordinate uniformly (say from [−1, 1]) does not work, since after normalization the result is not uniform on the sphere.

How do we get a random projection? This is no more than sampling n·k times from a N(0, 1) Gaussian distribution: each group of n samples forms an n-dimensional vector, giving k vectors v_1, v_2, …, v_k in total. We then orthonormalize these vectors (by Gram-Schmidt), denoting the results v̂_1, …, v̂_k; the random subspace L is their span, and the orthogonal projection onto L is given by the k × n matrix whose rows are v̂_1^T, …, v̂_k^T.
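A minimal Python/NumPy sketch of this construction (np.linalg.qr plays the role of Gram-Schmidt here; the dimensions are illustrative):

    import numpy as np

    rng = np.random.default_rng(1)
    n, k = 200, 20                     # ambient dimension and subspace dimension

    G = rng.normal(size=(n, k))        # k independent N(0,1) vectors in R^n (the columns)
    Q, _ = np.linalg.qr(G)             # columns of Q: orthonormal basis v_1, ..., v_k of L

    x = rng.normal(size=n)             # some fixed vector
    y = Q.T @ x                        # coordinates of the projection Pi_L(x) in this basis

    c = np.sqrt(n / k)                 # the rescaling constant c = Theta(sqrt(n/k)) of Theorem 5
    print(np.linalg.norm(x), c * np.linalg.norm(y))
    # the two printed values should be close (relative error roughly O(1/sqrt(k)))

The QR factorization of a Gaussian matrix yields a uniformly (Haar) distributed orthonormal frame, so span(Q) is exactly the uniform random k-dimensional subspace required by Theorem 5.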
1.5 Applications of the Johnson-Lindenstrauss Theorem

The Johnson-Lindenstrauss theorem is useful in several application areas, since it allows many problems to be solved approximately. Here we illustrate some of them:

Proximity problems: This is an immediate application of the J-L theorem. We are given a set of points in a high-dimensional space R^d and want to compute some property defined in terms of the distances between points. Using the J-L theorem, we can actually solve the problem in a lower-dimensional space (up to a distortion factor). Example problems include closest pair, furthest pair, minimum spanning tree, minimum-cost matching, and various clustering problems.

On-line problems: Problems of this type involve answering queries about points in a high-dimensional space. This is usually done by partitioning the space according to some error (distance) measure. However, the size of such a partition tends to grow exponentially with the dimension of the space (the "curse of dimensionality"). Projecting the points from the high-dimensional space into a lower-dimensional one significantly helps with these types of problems.

Data stream/storage problems: We receive data in a stream but cannot store all of it due to storage restrictions. One way of dealing with this is to maintain a count for each data entry and then examine how the counts are distributed. The idea is to maintain small sketches of such data based on the J-L theorem; a toy example is sketched below. For further details, please refer to Piotr Indyk's course and his survey paper.

In summary, applications that involve dimensionality reduction are very likely to be a good platform for the J-L theorem.
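Here is a minimal Python/NumPy illustration of such a sketch, in the spirit of J-L/AMS-style linear sketches. It is a simplification for illustration only: the stream and sizes are made up, and in practice the projection matrix would be generated implicitly (e.g. from hash functions) rather than stored explicitly.

    import numpy as np

    rng = np.random.default_rng(2)
    d, k = 10000, 64                   # huge space of items, small sketch (illustrative sizes)
    Pi = rng.normal(size=(k, d)) / np.sqrt(k)

    sketch = np.zeros(k)               # we store only these k numbers...
    counts = np.zeros(d)               # ...not the d counts (kept here only for verification)

    stream = [(3, 1.0), (42, 5.0), (3, 2.0), (9999, -1.0)] * 100
    for item, delta in stream:         # linear sketch: each update costs O(k) time
        sketch += delta * Pi[:, item]
        counts[item] += delta

    print(np.linalg.norm(counts), np.linalg.norm(sketch))
    # the sketch norm estimates the l2 norm of the unstored count vector

Because the sketch is linear in the updates, it supports increments, decrements, and merging of sketches from different streams.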
2 Dvoretzky's Theorem

Dvoretzky's theorem, proved by Aryeh Dvoretzky in his article "A Theorem on Convex Bodies and Applications to Banach Spaces" in 1959, addresses the following question. Let C be an origin-symmetric convex body in R^n and let S ⊆ R^n be a k-dimensional vector subspace. Does Y = C ∩ S look like a sphere? Furthermore, up to how high a dimension k does there exist an S for which this occurs?

A formal statement of Y's similarity to a sphere is that Y has small Banach-Mazur distance to the ball, i.e. there exists a linear transformation T such that
S_k(1) ⊆ T(Y) ⊆ S_k(1 + ε),
where S_k(r) denotes the ball of radius r. It turns out that k varies with the type of convex body: for an ellipsoid, k = n; for a cross-polytope, k = Θ(n); and for a cube, k = Θ(log n). The cube turns out to be the worst-case scenario. Here is a formal statement of Dvoretzky's theorem:

Theorem 6 (Dvoretzky) There is a positive constant c > 0 such that, for all ε > 0 and all n, every n-dimensional origin-symmetric convex body has a section within distance 1 + ε of the unit ball of dimension
k ≥ c ε^2 log n / log(1 + ε).

Instead of providing the whole proof, we give a sketch:
1. An origin-symmetric convex body C defines a norm, ‖·‖_C, whose unit ball is C.
2. We need the section on the subspace S to be spherical. This says precisely that for every vector θ in S, ‖θ‖_C is approximately constant.
3. This is close to the concentration of measure results shown before: the function f : θ ↦ ‖θ‖_C is sharply concentrated over the sphere (i.e. ‖θ‖_C is close to its median for most θ).
4. This is similar to Johnson-Lindenstrauss, except that we need every vector of the k-dimensional subspace to satisfy point 2 (in the J-L theorem, we only prove that most vectors have length close to a fixed constant, the median).
5. What we do is put a fine mesh (a δ-net) on the unit sphere of the k-dimensional subspace and show that every point of the mesh is good. The number of points we need to check is approximately O((4/δ)^k), where δ is the error parameter. Note that this count is exponential in k, which mirrors the dependence on k in the J-L theorem.
For further details of the proof, please see [1].
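The O((4/δ)^k) count in step 5 comes from a standard packing (volume) argument, sketched here under the usual conventions. Let N be a maximal subset of the unit sphere of the k-dimensional subspace with pairwise distances greater than δ; by maximality, every point of the sphere is within δ of N, so N is a δ-net. The balls of radius δ/2 centered at the points of N are disjoint and contained in the ball of radius 1 + δ/2, hence
|N| ≤ vol(B(1 + δ/2)) / vol(B(δ/2)) = (1 + 2/δ)^k ≤ (4/δ)^k   (for δ ≤ 2).
Each net point violates the concentration in step 3 with probability exponentially small in n, so the union bound over the net succeeds as long as (4/δ)^k times that failure probability is less than 1; this trade-off is what limits the achievable dimension k.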
References

1. Sariel Har-Peled, Geometric Approximation Algorithms, lecture notes, http://valis.cs.uiuc.edu/~sariel/teach/notes/aprx.
2. Aryeh Dvoretzky, A Theorem on Convex Bodies and Applications to Banach Spaces, Proceedings of the National Academy of Sciences, 1959.

MIT OpenCourseWare, http://ocw.mit.edu
18.409 Topics in Theoretical Computer Science: An Algorithmist's Toolkit, Fall 2009
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.