
18.409 An Algorithmist's Toolkit                                          Nov. 10, 2009
Lecturer: Jonathan Kelner                                                 Lecture 17

1 Johnson-Lindenstrauss Theorem

1.1 Recap

We first recap a theorem (an isoperimetric inequality) and a lemma (concentration) from last time:

Theorem 1 (Measure concentration on the sphere) Let S^{n−1} be the unit sphere in R^n, let A ⊆ S^{n−1} be a measurable set with vol(A) ≥ 1/2, and let A_ε denote the set of points of S^{n−1} at distance at most ε from A. Then vol(A_ε) ≥ 1 − 2e^{−nε²/2}.

This theorem basically says: if a set A covers at least half of the sphere, then after including all points within distance ε of A we have almost the whole sphere.

Definition 2 (c-Lipschitz) A function f : A → B is c-Lipschitz if, for any u, v ∈ A, we have
    ‖f(u) − f(v)‖ ≤ c‖u − v‖.

For a unit vector x ∈ S^{n−1}, the length of its projection onto the first k coordinates,
    f(x) = √(x_1² + x_2² + ··· + x_k²),
is a 1-Lipschitz function.

Lemma 3 Let f(x) = √(x_1² + x_2² + ··· + x_k²), let x be a vector chosen uniformly at random from S^{n−1}, and let M be the median of f(x). Then f(x) is sharply concentrated:
    Pr[ |f(x) − M| ≥ t ] ≤ 2e^{−t²n/2}.

1.2 Metric Embedding

Definition 4 (D-embedding) Suppose that X = {x_1, x_2, ..., x_n} is a finite set, d is a metric on X, and f : X → R^k is 1-Lipschitz, i.e. ‖f(x_i) − f(x_j)‖ ≤ d(x_i, x_j). The distortion of f is the minimum D for which
    ‖f(x_i) − f(x_j)‖ ≤ d(x_i, x_j) ≤ D · ‖f(x_i) − f(x_j)‖    for all i, j
(possibly after rescaling f by some positive constant α). We refer to f as a D-embedding of X.

Claim (Johnson-Lindenstrauss Theorem): The Euclidean metric on any finite set X of n points (a bunch of high-dimensional points) can be embedded with distortion D = 1 + ε in R^k for k = O(ε^{−2} log n). If we are not willing to lose anything (ε = 0), we essentially cannot do better than R^n itself; it is not hard to construct an example showing this, namely a regular simplex on n + 1 points, which cannot be embedded isometrically into any lower dimension.

The Johnson-Lindenstrauss theorem gives us an interesting result: if we project x onto a random subspace, the projection y gives an approximation of the length of x up to a fixed multiplicative factor c, i.e. ‖x‖ ≈ c‖y‖, and the map x ↦ c·y is an embedding with distortion D = 1 + ε.
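The following small experiment is an illustration added to these notes (it is not part of the original lecture); it assumes Python with NumPy, and all parameters (m, k, the number of trials, the RNG seed) are arbitrary. It projects one fixed unit vector x ∈ R^m onto many independent random k-dimensional subspaces and checks that the projection length concentrates around a median close to √(k/m), so that rescaling by c = √(m/k) approximately recovers ‖x‖, as claimed above.

    import numpy as np

    rng = np.random.default_rng(0)
    m, k, trials = 1000, 100, 500   # ambient dimension, subspace dimension, repetitions

    x = rng.standard_normal(m)
    x /= np.linalg.norm(x)          # a fixed unit vector in R^m

    lengths = np.empty(trials)
    for i in range(trials):
        # A uniformly random k-dimensional subspace L: orthonormalize k Gaussian vectors.
        Q, _ = np.linalg.qr(rng.standard_normal((m, k)))
        lengths[i] = np.linalg.norm(Q.T @ x)   # ||Pi_L(x)||, length of the projection onto L

    M = np.median(lengths)
    print("median of ||Pi_L(x)||      :", round(M, 4))
    print("sqrt(k/m)                  :", round((k / m) ** 0.5, 4))
    print("rescaled, sqrt(m/k) * med  :", round((m / k) ** 0.5 * M, 4), "(close to ||x|| = 1)")
    print("fraction within 10% of med :", np.mean(np.abs(lengths - M) <= 0.10 * M))
    print("fraction within 20% of med :", np.mean(np.abs(lengths - M) <= 0.20 * M))

Increasing k (for fixed m) tightens the relative concentration around the median, in line with Lemma 3.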

1.3 Proof of the Theorem

Next, we give a more precise statement of the Johnson-Lindenstrauss Theorem:

Theorem 5 (Johnson-Lindenstrauss) Let X = {x_1, x_2, ..., x_n} ⊆ R^m (for any m) and let k = O(ε^{−2} log n). Let:
- L ⊆ R^m be a uniformly random k-dimensional subspace;
- y'_1, y'_2, ..., y'_n be the projections of x_1, ..., x_n onto L;
- y_i = c · y'_i for some fixed constant c with c = Θ(√(m/k)).
Then, with high probability, the map x_i ↦ y_i is a (1 + ε)-embedding of X into R^k, i.e. for all x_i, x_j ∈ X,
    ‖x_i − x_j‖ ≤ ‖y_i − y_j‖ ≤ (1 + ε)‖x_i − x_j‖.

Proof  Let Π_L : R^m → L be the orthogonal projection of R^m onto the subspace L. For a pair x_i, x_j ∈ X, let x be the unit vector in the direction of x_i − x_j. We need to prove that
    (1 − φ) · M · ‖x‖ ≤ ‖Π_L(x)‖ ≤ (1 + φ) · M · ‖x‖
holds with high probability, where M is the median of the function f(x) = √(x_1² + ··· + x_k²) for x drawn uniformly from S^{m−1}. Following Definition 4, this shows that the mapping Π_L (after the rescaling by c allowed there) is a D-embedding of X into R^k with D = (1 + φ)/(1 − φ). We let φ = ε/3, so that D = (1 + ε/3)/(1 − ε/3) ≤ 1 + ε for ε ≤ 1. Since ‖x‖ = 1, it is equivalent to show that the following inequality holds with high probability:
    | ‖Π_L(x)‖ − M | < (ε/3) · M.    (1)

Lemma 3 describes the case where we take a random unit vector and project it onto a fixed subspace. This is identical to fixing a vector and projecting it onto a random subspace (we describe how this random subspace is generated in the next subsection). We apply Lemma 3 with t = (ε/3)·M; the probability that inequality (1) does not hold is bounded by
    Pr[ | ‖Π_L(x)‖ − M | ≥ (ε/3)·M ] ≤ 4e^{−t²m/2} = 4e^{−ε²M²m/18} ≤ 4e^{−ε²k/72}.
The last inequality holds since M = Ω(√(k/m)), based on the following reasoning: we have 1 = E[‖x‖²] = Σ_i E[x_i²], which implies that E[x_i²] = 1/m. Consequently,
    k/m = E[f²] ≤ Pr[f ≤ M + t] · (M + t)² + Pr[f > M + t] · max(f²) ≤ (M + t)² + 2e^{−t²m/2},
where we used the fact that f² = Σ_{i=1}^{k} x_i². Taking t = Θ(√(k/m)), we get M = Ω(√(k/m)).
Finally, for k = O(ε^{−2} log n) with a suitable constant, the failure probability 4e^{−ε²k/72} is smaller than 1/n², so a union bound over all pairs x_i, x_j completes the proof (for further details, please see [1]).
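As a sanity check on Theorem 5 (an addition to these notes, not part of the original; Python with NumPy is assumed, and the point set and the parameters n, m, k are arbitrary), the sketch below projects a set of points onto a uniformly random k-dimensional subspace L, built by orthonormalizing Gaussian vectors as described in the next subsection, rescales by c = √(m/k), and reports the worst relative distortion over all pairs.

    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(1)
    n, m, k = 100, 2000, 400          # number of points, ambient dimension, target dimension

    X = rng.standard_normal((n, m))   # an arbitrary point set in R^m

    # Uniformly random k-dimensional subspace: orthonormalize k Gaussian vectors in R^m.
    Q, _ = np.linalg.qr(rng.standard_normal((m, k)))   # columns of Q form a basis of L

    Y = (X @ Q) * np.sqrt(m / k)      # project onto L and rescale by c = sqrt(m/k)

    worst = 0.0
    for i, j in combinations(range(n), 2):
        d_orig = np.linalg.norm(X[i] - X[j])
        d_proj = np.linalg.norm(Y[i] - Y[j])
        worst = max(worst, abs(d_proj / d_orig - 1.0))
    print("worst relative distortion over all pairs:", round(worst, 4))

With these (arbitrary) parameters the worst distortion typically comes out roughly on the order of 0.1; shrinking k or increasing n makes it grow, consistent with k = O(ε^{−2} log n).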

1.4 Random Subspace

Here we describe how a random subspace is generated. We first give a quick review of Gaussians. A multivariate Gaussian has PDF
    p(x_1, x_2, ..., x_N) = \frac{1}{(2π)^{N/2} |Σ|^{1/2}} \exp\left( −\tfrac{1}{2} (x − µ)^T Σ^{−1} (x − µ) \right),
where Σ is a nonsingular covariance matrix and the vector µ is the mean of x.

Gaussians have several nice properties. The following operations on Gaussian variables again yield Gaussian variables:
- projecting onto a lower-dimensional subspace;
- restricting to a lower-dimensional subspace, i.e. taking a conditional distribution;
- any linear operation.

In addition, we can generate a vector with a multi-dimensional Gaussian distribution by picking each coordinate independently according to a 1-dimensional Gaussian distribution.

How do we generate a random vector on the sphere? The idea is to pick a point from a multi-dimensional Gaussian distribution (generate each coordinate with mean 0 and variance 1, i.e. N(0, 1)); most such n-dimensional vectors have norm close to √n. Since the joint PDF of independent standard Gaussians is rotationally symmetric, this procedure (after normalizing the vector) does indeed generate a point uniformly at random on the sphere. Generating a random vector with uniformly distributed coordinates does not work: after normalization it does not sample uniformly from the sphere.

How do we get a random projection? This is no more than sampling n·k times from a N(0, 1) Gaussian distribution. Group the samples into k vectors of dimension n each, v_1, v_2, ..., v_k, and orthonormalize them to obtain v̂_1, v̂_2, ..., v̂_k. The random subspace L is their span, represented by the matrix whose columns are the v̂_i:
    [ v̂_1  v̂_2  ···  v̂_k ].

1.5 Applications of the Johnson-Lindenstrauss Theorem

The Johnson-Lindenstrauss Theorem is very useful in several application areas, since it allows many problems to be solved approximately. Here we illustrate some of them:

Proximity problems: This is an immediate application of the J-L Theorem. We are given a set of points in a high-dimensional space R^d and want to compute some property defined in terms of the distances between points. Using the J-L Theorem, we can solve the problem in a lower-dimensional space (up to a distortion factor). Example problems include closest pair, furthest pair, minimum spanning tree, minimum-cost matching, and various clustering problems.

On-line problems: Problems of this type involve answering queries about points in a high-dimensional space. This is usually done by partitioning the space according to some error (distance) measure. However, this operation tends to depend exponentially on the dimension of the space, e.g. roughly (1/ε)^d (referred to as the curse of dimensionality). Projecting the points into a lower-dimensional space significantly helps with these types of problems.

Data stream/storage problems: We receive data in a stream but cannot store all of it due to storage restrictions. One way of dealing with this is to maintain a count for each data entry and then see how the counts are distributed. The idea is to maintain sketches of such data based on the J-L Theorem. For further details, please refer to Piotr Indyk's course and his survey paper.

In summary, applications related to dimensionality reduction are very likely to be a good platform for the J-L Theorem.
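To make the proximity-problem application concrete, here is a small illustration added to these notes (not part of the original lecture; Python with NumPy is assumed, and all parameters and the planted pair are artificial): we plant one unusually close pair of points in a high-dimensional data set, project everything down with a random subspace as in Section 1.4, and check that a brute-force closest-pair search in the low-dimensional projection finds the same pair.

    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(7)
    n, d, k = 200, 1000, 40                     # points, original dimension, reduced dimension

    X = rng.standard_normal((n, d))
    X[1] = X[0] + 0.1 * rng.standard_normal(d)  # plant a distinctly close pair (0, 1)

    # Random k-dimensional projection, rescaled as in Theorem 5.
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    Y = (X @ Q) * np.sqrt(d / k)

    def closest_pair(P):
        best, pair = np.inf, None
        for i, j in combinations(range(len(P)), 2):
            dist = np.linalg.norm(P[i] - P[j])
            if dist < best:
                best, pair = dist, (i, j)
        return pair, best

    pair_hi, dist_hi = closest_pair(X)          # closest pair in the original space
    pair_lo, _ = closest_pair(Y)                # closest pair found in the projected space
    dist_cand = np.linalg.norm(X[pair_lo[0]] - X[pair_lo[1]])

    print("closest pair in R^d :", pair_hi, "distance", round(dist_hi, 3))
    print("pair found in R^k   :", pair_lo, "original distance", round(dist_cand, 3))

Each distance evaluation in R^k touches only k coordinates instead of d, which is where the savings for proximity problems come from.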

2 Dvoretzky's Theorem

Dvoretzky's Theorem, proved by Aryeh Dvoretzky in his 1959 article "A Theorem on Convex Bodies and Applications to Banach Spaces", addresses the following question. Let C be an origin-symmetric convex body in R^n and let S ⊆ R^n be a vector subspace. Does Y = C ∩ S look like a sphere? Furthermore, up to how high a dimension of S (call it k) does such a subspace exist?

A formal statement of Y's similarity to a sphere is that Y has small Banach-Mazur distance to the ball, i.e. there exists a linear transformation under which
    S^k(1) ⊆ Y ⊆ S^k(1 + ε),
where S^k(r) denotes the k-dimensional sphere of radius r.

It turns out that k varies with the type of convex body: for an ellipsoid k = n, for a cross-polytope k = Θ(n), and for a cube k = Θ(log n). The cube turns out to be the worst-case scenario. Here is a formal statement of Dvoretzky's Theorem:

Theorem 6 (Dvoretzky) There is a positive constant c > 0 such that, for all ε > 0 and all n, every n-dimensional origin-symmetric convex body has a section within distance 1 + ε of the unit ball, of dimension
    k ≥ c ε² log n / log(1 + 1/ε).

Instead of providing the whole proof, we give a sketch:

1. An origin-symmetric convex body C defines a norm ‖·‖_C whose unit ball is C.
2. We need the section along a subspace S to be nearly spherical; this says that when we take any unit vector θ in S, ‖θ‖_C is approximately constant.
3. This is where the concentration of measure shown before comes in: the function f : θ ↦ ‖θ‖_C is sharply concentrated on the sphere (i.e. ‖θ‖_C is close to its median for most θ).
4. This is similar to Johnson-Lindenstrauss, except that we need every vector of the k-dimensional subspace to satisfy point 2 (in the J-L theorem we only proved that most vectors are close to a fixed constant, namely the median).
5. What we do is put a fine mesh on the unit sphere of the k-dimensional subspace and show that every point of the mesh behaves correctly. The number of points we need to check is approximately O((4/δ)^k), where δ is the error; this is exponential in k, which is reminiscent of the dependence on k in the J-L theorem.

For further details of the proof, please see [2].

References

[1] Sariel Har-Peled, Geometric Approximation Algorithms, http://valis.cs.uiuc.edu/~sariel/teach/notes/aprx
[2] Aryeh Dvoretzky, A theorem on convex bodies and applications to Banach spaces, Proceedings of the National Academy of Sciences, 1959.
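As a numerical aside on step 3 of the proof sketch in Section 2 (an illustration added to these notes, not part of the original lecture; Python with NumPy is assumed and the parameters are arbitrary), one can sample random directions θ on S^{n−1} and compare how tightly the cross-polytope norm ‖θ‖_1 and the cube norm ‖θ‖_∞ concentrate around their medians. The much tighter relative concentration of ‖·‖_1 is the measure-concentration phenomenon behind the cross-polytope admitting nearly spherical sections of dimension Θ(n), while the cube only reaches Θ(log n).

    import numpy as np

    rng = np.random.default_rng(3)
    n, trials = 500, 20000

    theta = rng.standard_normal((trials, n))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)   # uniform directions on S^(n-1)

    norms = {
        "cross-polytope ||.||_1": np.sum(np.abs(theta), axis=1),
        "cube ||.||_inf        ": np.max(np.abs(theta), axis=1),
    }
    for name, vals in norms.items():
        M = np.median(vals)
        spread = (np.quantile(vals, 0.99) - np.quantile(vals, 0.01)) / M
        print(f"{name}: median {M:.3f}, relative 1%-99% spread {spread:.3f}")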

MIT OpenCourseWare
http://ocw.mit.edu

18.409 Topics in Theoretical Computer Science: An Algorithmist's Toolkit
Fall 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.