CS-621 Theory Gems                                               November 28, 2012

Lecture

Lecturer: Aleksander Mądry      Scribes: Alhussein Fawzi, Dorina Thanou

1 Introduction

Today, we will briefly discuss an important technique in probability theory: measure concentration. Roughly speaking, measure concentration corresponds to exploiting the phenomenon that some functions of random variables are highly concentrated around their expectation/median. The main example that will be of our interest here is the Johnson-Lindenstrauss (JL) lemma. The JL lemma is a very powerful tool for dimensionality reduction in high-dimensional Euclidean spaces, and it is widely used to alleviate the curse of dimensionality that occurs in applications where one needs to deal with high-dimensional data.

2 Examples of Measure Concentration

Probably the most well-known example of a measure concentration result states that a sum of independent random variables is tightly concentrated around its expectation/median. In particular, if X_1, X_2, ..., X_n are independent and identically distributed (i.i.d.) random variables, with each X_i taking a value in {-1, +1} with equal probability, the celebrated Chernoff bound states that their sum X = \sum_{i=1}^n X_i is highly concentrated around its expectation. Specifically, the probability that X > t decays exponentially in t, i.e.,

    Pr[X > t] < e^{-t^2/2n}.                                                    (1)

(Note that the expectation of X is just zero.)

Although this result is the most well-known one and already has a plethora of applications, it can actually be seen as a special case of a more general measure concentration phenomenon. To this end, let us focus our attention on general real functions on the hypercube and say that a function f : {-1, 1}^n -> R is L-Lipschitz, for some L > 0 (with respect to the l_1 metric), iff, for all x, y in {-1, 1}^n,

    |f(x) - f(y)| <= L ||x - y||_1.                                             (2)

(One can view the L-Lipschitz condition as a quantified version of uniform continuity of f.) Now, one can show that for any 1-Lipschitz function f of n random variables X_1, X_2, ..., X_n that are i.i.d. and take the values +1 and -1 with equal probability, a concentration around the median \mu of f analogous to (1) occurs. Namely, we have

    Pr[f(X_1, ..., X_n) > \mu + t] < e^{-t^2/2n}.                               (3)

(One can get a result for an arbitrary Lipschitz constant L just by scaling.) As the sum function is clearly 1-Lipschitz, one can see that the Chernoff bound is indeed a consequence of this more general statement.
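To make the tail bound (1) concrete, here is a minimal numerical sketch (our addition, not part of the original notes; the values of n, the number of trials, and the thresholds t are arbitrary choices) that compares the empirical tail of a sum of n i.i.d. +/-1 variables with the bound e^{-t^2/2n}:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 200, 50_000

# Sample `trials` independent copies of X = sum of n i.i.d. +/-1 variables.
X = rng.choice([-1, 1], size=(trials, n)).sum(axis=1)

for t in [10, 20, 30, 40]:
    empirical = np.mean(X > t)
    bound = np.exp(-t**2 / (2 * n))
    print(f"t={t:3d}   Pr[X > t] ~= {empirical:.5f}   bound e^(-t^2/2n) = {bound:.5f}")
```

In each case the empirical tail should come out below the bound, with the gap growing as t increases.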

3 The Johnson-Lindenstrauss Lemma

The main example of the measure concentration phenomenon that we want to focus on today is captured by the Johnson-Lindenstrauss (JL) lemma and corresponds to the behavior of random vectors on a high-dimensional unit sphere. Roughly speaking, the Johnson-Lindenstrauss lemma tells us that the l_2-distance between high-dimensional vectors is well preserved under random projection to a (much) lower dimension.

Lemma 1 (Johnson-Lindenstrauss lemma) Consider a set of n vectors x_i in R^d and a random k-dimensional subspace of R^d. Let y_i be the projection of each x_i onto that subspace. For any ε > 0, if k = Ω(ε^{-2} log n), then with probability at least 1 - 1/n,

    (1 - ε) ||x_i - x_j|| <= \sqrt{d/k} ||y_i - y_j|| <= (1 + ε) ||x_i - x_j||,  for all i, j.    (4)

In the light of this lemma, if we have some high-dimensional data whose key characteristic of interest is captured by l_2-distance, then we can achieve even an exponential compression of this data's dimension at the price of introducing only a (1 ± ε) multiplicative error. (Note that the \sqrt{d/k} is just a normalizing scaling factor.) It turns out that there are a lot of scenarios (especially in statistics and machine learning) where this technique is applicable and allows one to lift the curse of dimensionality. Namely, in a lot of applications, (very) high-dimensional data arises naturally, and this kind of compression (often called dimensionality reduction) provides a powerful tool for dealing with the computational cost of processing such data.

3.1 Random Subspaces

Before proceeding to the proof of this lemma, we first need to make the notion of a random subspace precise. To this end, let us start by defining what we mean by a random unit vector x in S^{d-1}, where S^{d-1} is the (d-1)-dimensional unit sphere in R^d. We will view such a vector as the result of a generation procedure in which we first sample each of its d coordinates independently from a Gaussian distribution N(0, 1) with zero mean and standard deviation one, and then normalize the vector to make its norm equal to 1. (Note that one of the important and desirable properties of this definition is that the resulting probability measure on the sphere is rotationally invariant.)

Once we have defined our notion of a random unit vector, i.e., once we have defined our probability measure on the sphere, we can proceed to defining what we mean by a random subspace of dimension k. Again, we will do this by specifying the random process that generates it. This process is as follows: choose a random unit vector and make it the first basis vector v_1 of the subspace. Then, for each of the next k - 1 rounds, repeat the following: pick a random unit vector, subtract from it its projection onto the subspace spanned by the previously chosen vectors v_1, ..., v_{i-1}, and normalize it to form the next basis vector v_i. (It is easy to see that the randomly chosen unit vector is not in the span of the vectors v_1, ..., v_{i-1} with probability 1.) Clearly, after this procedure is finished we end up with an orthonormal basis v_1, ..., v_k that spans the desired (random) subspace of dimension k. (Note that the above procedure is nothing else than Gram-Schmidt orthogonalization applied to a set of k random unit vectors.)

Also, one can see that under this definition the projection y_i of a data point x_i onto such a random subspace can be written in matrix form as

    y_i = V x_i,

where V is the k x d projection matrix whose rows are the random basis vectors v_1, ..., v_k.
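The generation procedure above is straightforward to mirror in code. Below is a minimal sketch (our addition; the helper name and all parameter values are illustrative, not from the notes) that draws k Gaussian vectors and orthonormalizes them, using numpy's QR factorization to carry out exactly the Gram-Schmidt step described above, to obtain the projection matrix V:

```python
import numpy as np

def random_projection_matrix(d: int, k: int, rng=np.random.default_rng()):
    """Return a k x d matrix V whose rows form an orthonormal basis
    of a random k-dimensional subspace of R^d."""
    G = rng.standard_normal((d, k))   # k random Gaussian vectors (as columns)
    Q, _ = np.linalg.qr(G)            # QR factorization = Gram-Schmidt
    return Q.T                        # rows are the basis vectors v_1, ..., v_k

d, k = 1000, 50
V = random_projection_matrix(d, k)
x = np.ones(d)
y = V @ x                             # projection of x, in subspace coordinates
print(np.sqrt(d / k) * np.linalg.norm(y) / np.linalg.norm(x))  # close to 1
```

The printed ratio illustrates the normalizing role of the \sqrt{d/k} factor in (4): after rescaling, the projected norm is close to the original one.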

3.2 Proof of the JL Lemma

Now that we have defined what a random vector and a random subspace are, we are ready to prove the Johnson-Lindenstrauss lemma. As a first step, we show that this lemma follows from a simpler statement that just focuses on the norm of the projection of a fixed vector x in d dimensions onto a random k-dimensional subspace.

Lemma 2 Let x be an arbitrary vector in R^d and let z in R^k be its projection onto a random k-dimensional subspace. Then, for any ε > 0, as long as k = Ω(ε^{-2} log n), we have

    | ||z|| / (\sqrt{k/d} ||x||) - 1 | <= ε,

with probability exceeding 1 - 1/n^3.

It is not hard to see that once we prove Lemma 2, the Johnson-Lindenstrauss lemma follows easily. Indeed, by applying the above lemma with x = x_i - x_j, for any fixed i and j, we get

    Pr[ | ||z_{i,j}|| / (\sqrt{k/d} ||x_i - x_j||) - 1 | > ε ] <= 1/n^3,

where z_{i,j} is the projection of x_i - x_j onto the random subspace. Since the projection is a linear map, we have z_{i,j} = y_i - y_j. So, applying a union bound to the previous inequality, over all O(n^2) pairs (i, j), we get that

    Pr[ exists i != j : | ||y_i - y_j|| / (\sqrt{k/d} ||x_i - x_j||) - 1 | > ε ] <= (n(n-1)/2) (1/n^3) <= 1/n,

which can easily be seen to be equivalent to the statement of the Johnson-Lindenstrauss lemma. Hence, from now on we focus on proving Lemma 2. (Observe that by scaling, it suffices to prove this lemma for the case of x being a unit vector.)

To make our task easier, we want to first invert our perspective. Namely, instead of looking at the norm of the projection of an arbitrary vector onto a random k-dimensional subspace, we prefer to look at the norm of the projection of a random vector onto a fixed k-dimensional subspace, namely, the one corresponding to the first k coordinates of that vector. It is not hard to see that these two views are completely equivalent. To this end, note that we can always rotate the space in such a way that the random k-dimensional subspace we have chosen becomes just the projection onto the first k coordinates. Formally, let U denote the unitary matrix whose first k rows are equal to the vectors v_1, ..., v_k that form the basis of the random subspace we have chosen, and whose remaining rows are chosen arbitrarily so as to form an orthonormal basis of the orthogonal complement of our subspace. Then, we have that

    z_i = (v_i)^T x = (U v_i)^T (U x),

for any i, as U is unitary and thus satisfies U^T U = I. Since U v_i is equal to the i-th standard basis vector e_i, and U x is a random vector (as it corresponds to a random rotation of a fixed vector), it is indeed valid to view z as the projection of a random vector onto the subspace spanned by its first k coordinates.
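As a quick numerical sanity check of Lemma 2 (our addition, not part of the notes; all dimensions and trial counts are arbitrary), one can sample many random subspaces as in Section 3.1 and record the ratio ||z|| / (\sqrt{k/d} ||x||), which should concentrate tightly around 1:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, trials = 1000, 100, 200
x = rng.standard_normal(d)            # an arbitrary fixed vector

ratios = []
for _ in range(trials):
    G = rng.standard_normal((d, k))
    Q, _ = np.linalg.qr(G)            # orthonormal basis of a random subspace
    z = Q.T @ x                       # projection of x onto that subspace
    ratios.append(np.linalg.norm(z) / (np.sqrt(k / d) * np.linalg.norm(x)))

ratios = np.array(ratios)
print(f"mean ratio = {ratios.mean():.4f}, max deviation = {np.abs(ratios - 1).max():.4f}")
```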

Thanks to the above simplification of the perspective, our goal now is to study how the norm of the first k coordinates of a random vector (of unit norm) concentrates around a particular value. To this end, note that if z = (z_1, ..., z_d) is a random unit vector then clearly we have

    E[ \sum_{i=1}^d z_i^2 ] = 1.

Since the z_i are identically distributed, each of them satisfies E[z_i^2] = 1/d, and we obtain

    E[ \sum_{i=1}^k z_i^2 ] = k/d.

Thus, the l_2-norm of the first k coordinates of a random vector indeed has the desired expectation. However, to prove Lemma 2, we also need to study how this norm concentrates around its expectation. We will not do this today. Instead, just to give a flavor of the techniques involved, we prove here a simpler result that bounds the concentration of the corresponding norm for k = 1. Specifically, we show that the probability that |z_1| is larger than t is exponentially decaying with t.

Lemma 3 Let z = (z_1, ..., z_d) be a random vector in S^{d-1}. We have

    Pr[ |z_1| > t ] <= exp( -t^2 (d-1)/2 ),

for any 0 < t <= 1.

Proof The proof of this lemma is based on a simple geometric argument. Let us fix some t > 0.

[Figure 1: Illustration of the proof in two dimensions. (a) The caps corresponding to |z_1| > t are marked in red. (b) Pictorial argument justifying upper-bounding the area of these two caps by the area of a corresponding sphere of the same radius.]

As z is a random vector from the unit sphere S^{d-1}, we can see that the probability of choosing z with |z_1| > t is exactly the ratio of the area of the two spherical caps corresponding to |z_1| > t, each of radius R_cap = \sqrt{1 - t^2}, to the total area of the unit sphere S^{d-1}. (See Figure 1(a), which represents the situation in two dimensions, i.e., the case of d = 2.) We can upper-bound the area of these two caps by the area of a whole sphere of the same radius (see Figure 1(b)). As the area of a (d-1)-dimensional sphere S(R) of radius R has to be a function of the form C_d R^{d-1}, where C_d is some coefficient depending on d (but not on R), we have

    Pr[ |z_1| > t ] <= area(S(R_cap)) / area(S^{d-1}) = C_d (R_cap)^{d-1} / C_d = (1 - t^2)^{(d-1)/2}.

Using the fact that (1 - x/n)^n <= exp(-x), we conclude that

    Pr[ |z_1| > t ] <= exp( -t^2 (d-1)/2 ),

whenever 0 < t <= 1, as desired.
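To illustrate Lemma 3 empirically (our addition, not part of the notes; the dimension and thresholds are arbitrary), one can sample random unit vectors and compare the observed tail of |z_1| with the bound (1 - t^2)^{(d-1)/2} derived in the proof:

```python
import numpy as np

rng = np.random.default_rng(2)
d, trials = 100, 200_000

# Random unit vectors: normalized i.i.d. Gaussians (rotationally invariant).
g = rng.standard_normal((trials, d))
z1 = g[:, 0] / np.linalg.norm(g, axis=1)   # first coordinate of each unit vector

for t in [0.1, 0.2, 0.3, 0.4]:
    empirical = np.mean(np.abs(z1) > t)
    bound = (1 - t**2) ** ((d - 1) / 2)
    print(f"t={t:.1f}   Pr[|z_1| > t] ~= {empirical:.5f}   bound = {bound:.5f}")
```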

It is interesting to note that by applying Lemma 3 with t = Θ(\sqrt{(log n)/d}), we get that the probability that |z_1| exceeds \sqrt{(log n)/d} is bounded by 1/n^{O(1)}. This tells us that in high dimensions almost all the vectors on the unit sphere are close to being orthogonal. Indeed, thanks to the rotation invariance of the scalar product, we can always take one of the vectors to have its first coordinate equal to 1 and all the remaining coordinates equal to zero. Then, the scalar product of a random unit vector z with this vector is equal to z_1. In high dimensions, the quantity \sqrt{(log n)/d} is very small, which gives a very small scalar product with high probability.

Unfortunately, as we already mentioned, the bounds provided by Lemma 3 are too weak to yield the desired concentration of the norm of the projection of z onto the first k coordinates. Therefore, we state (without proof) a stronger version of Lemma 3 that allows one to take advantage of larger values of k.

Lemma 4 Let z = (z_1, ..., z_d) be a random vector in S^{d-1} and let \tilde{z} = (z_1, ..., z_k) denote its first k coordinates. We have

    Pr[ | ||\tilde{z}|| / \sqrt{k/d} - 1 | > t ] <= 2 e^{-t^2 k/4}.

Once we have this lemma, the proof of Lemma 2 is straightforward. We just take t = ε and k = 20 ε^{-2} ln n. We then have

    Pr[ | ||\tilde{z}|| / \sqrt{k/d} - 1 | > ε ] <= 2 e^{-5 ln n} <= 1/n^3,

which proves Lemma 2, and thus the Johnson-Lindenstrauss lemma.

3.3 Further Discussion

As we presented it here, the JL lemma is not very practical. This is so because our generation of the projection matrix V requires performing Gram-Schmidt orthonormalization, which is computationally quite expensive when n is large (which is often the case). To circumvent this issue and make the JL lemma more practical, there has been a lot of (successful) work on developing much more efficient constructions of the projection matrix V. In the latest of these constructions, the matrix is generated via a very simple and easy-to-implement procedure that makes V have only a few non-zero entries in each column. As a result, not only is the whole construction very efficient, but the resulting matrix V is also sparse (i.e., only a small fraction of its entries are non-zero), which makes the computation of the projections of the input vectors very efficient too. All of these advancements have made the JL lemma a truly practical tool.

Given the usefulness of the JL lemma in applications that operate based on l_2-distance, it is natural to wonder whether similar results could be achieved for other l_p-distances. Unfortunately, it seems that this is not the case, and in fact for some of these distances (e.g., l_1-distance) there are strong lower bounds on the possible dimension reduction. (Also, it is known that for l_2-distance, the dimension reduction offered by the JL lemma is essentially optimal.)
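To give a flavor of the sparse constructions mentioned above, here is a sketch of one well-known scheme of this kind, the "database-friendly" projection of Achlioptas, in which each entry of the projection matrix is drawn i.i.d. from {+1, 0, -1} with probabilities 1/6, 2/3, 1/6 and scaled appropriately. (This is our illustrative addition; the notes do not specify which sparse construction they have in mind.)

```python
import numpy as np

def sparse_jl_matrix(d: int, k: int, rng=np.random.default_rng()):
    """Achlioptas-style sparse projection matrix: about two thirds of the
    entries are zero, and E ||V x||^2 = ||x||^2 for every fixed x."""
    R = rng.choice([1.0, 0.0, -1.0], size=(k, d), p=[1/6, 2/3, 1/6])
    return np.sqrt(3.0 / k) * R        # scaling gives unit expected squared norm

d, k = 10_000, 500
V = sparse_jl_matrix(d, k)
x = np.ones(d)
print(np.linalg.norm(V @ x) / np.linalg.norm(x))   # close to 1
```

Note that, unlike the projection matrix of Section 3.1, the rows here are not exactly orthonormal; distances are preserved only in the approximate, probabilistic sense of the JL lemma, but the matrix can be generated and applied much faster.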