CS-621 Theory Gems                                                November 8, 2012
Lecture
Lecturer: Aleksander Mądry        Scribes: Alhussein Fawzi, Dorina Thanou

1 Introduction

Today, we will briefly discuss an important technique in probability theory: measure concentration. Roughly speaking, measure concentration corresponds to exploiting the phenomenon that some functions of random variables are highly concentrated around their expectation/median. The main example that will be of interest to us here is the Johnson-Lindenstrauss (JL) lemma. The JL lemma is a very powerful tool for dimensionality reduction in high-dimensional Euclidean spaces, and it is widely used to alleviate the curse of dimensionality that occurs in applications where one needs to deal with high-dimensional data.

2 Examples of Measure Concentration

Probably the most well-known example of a measure concentration result states that the sum of independent random variables is tightly concentrated around its expectation/median. In particular, if $X_1, X_2, \ldots, X_n$ are independent and identically distributed (i.i.d.) random variables with each $X_i$ taking a value in $\{-1, 1\}$ with equal probability, the celebrated Chernoff bound states that their sum $X = \sum_{i=1}^{n} X_i$ is highly concentrated around its expectation. Specifically, the probability that $|X| > t$ is exponentially decaying with $t$, i.e.,

$$\Pr[|X| > t] < 2 e^{-\frac{t^2}{2n}}. \qquad (1)$$

(Note that the expectation of $X$ is just zero.) Although this result is the most well-known one and it already has a plethora of applications, it can actually be seen as a special case of a more general measure concentration phenomenon. To this end, let us focus our attention on general real functions on the hypercube and say that a function $f : \{-1,1\}^n \to \mathbb{R}$ is $L$-Lipschitz, for some $L > 0$ (with respect to the $\ell_1$ metric), iff, for all $x, y \in \{-1,1\}^n$,

$$|f(x) - f(y)| \le L \, \|x - y\|_1. \qquad (2)$$

(One can view the $L$-Lipschitz condition as a quantified version of uniform continuity of $f$.) Now, one can show that for any $1$-Lipschitz function of $n$ random variables $X_1, X_2, \ldots, X_n$ that are i.i.d. and are $+1$ and $-1$ with equal probability, a concentration around the median $\mu$ of $f$ analogous to (1) occurs. Namely, we have

$$\Pr[f(X_1, \ldots, X_n) > \mu + t] < e^{-\frac{t^2}{2n}}. \qquad (3)$$

(One can get a result for an arbitrary Lipschitz constant $L$ just by scaling.) As the sum function is clearly $1$-Lipschitz, one can see that the Chernoff bound is indeed a consequence of this more general statement.

3 The Johnson-Lindenstrauss Lemma

The main example of the measure concentration phenomenon that we want to focus on today is captured by the Johnson-Lindenstrauss (JL) lemma and corresponds to the behavior of random vectors on a high-dimensional unit sphere. Roughly speaking, the Johnson-Lindenstrauss lemma tells us that the $\ell_2$-distance of high-dimensional vectors is well preserved under random projection to a (much) lower dimension.
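As an aside, the Chernoff-type bound above is easy to sanity-check by simulation. The sketch below is pure Python; the function name `chernoff_demo` and the parameter choices are ours, picked only for illustration. It estimates $\Pr[|X| > t]$ for a sum of $n$ random signs and compares it against $2e^{-t^2/(2n)}$:

```python
import math
import random

def chernoff_demo(n=500, t=60, trials=5000, seed=0):
    """Estimate Pr[|X| > t] for X = X_1 + ... + X_n with i.i.d. random
    signs X_i in {-1, +1}, and compare against the Chernoff-type bound
    2 * exp(-t^2 / (2n))."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = sum(rng.choice((-1, 1)) for _ in range(n))
        if abs(x) > t:
            hits += 1
    empirical = hits / trials
    bound = 2.0 * math.exp(-t * t / (2.0 * n))
    return empirical, bound
```

For these parameters the bound evaluates to $2e^{-3.6} \approx 0.055$, while the empirical frequency is typically well below it; the bound is not tight, but it does capture the exponential decay in $t$.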
Lemma 1 (Johnson-Lindenstrauss lemma). Consider a set of $n$ vectors $x_i \in \mathbb{R}^d$ and a random $k$-dimensional subspace of $\mathbb{R}^d$. Let $y_i$ be the projection of each $x_i$ on that subspace. For any $\varepsilon > 0$, if $k = \Omega(\varepsilon^{-2} \log n)$ then with probability at least $1 - \frac{1}{n}$,

$$(1-\varepsilon)\sqrt{\tfrac{k}{d}}\,\|x_i - x_j\| \;\le\; \|y_i - y_j\| \;\le\; (1+\varepsilon)\sqrt{\tfrac{k}{d}}\,\|x_i - x_j\|, \qquad \forall\, i, j. \qquad (4)$$

In the light of this lemma, if we have some high-dimensional data whose key characteristic of interest is captured by $\ell_2$-distance, then we can achieve even an exponential compression of this data's dimension at the price of introducing only a $(1 \pm \varepsilon)$ error (note that the $\sqrt{k/d}$ is just a normalizing scaling factor). It turns out that there are a lot of scenarios (especially in statistics and machine learning) where this technique is applicable and allows one to lift the curse of dimensionality. Namely, in a lot of applications, (very) high-dimensional data arises naturally, and this kind of compression (often called dimensionality reduction) provides a powerful tool for dealing with the computational cost of processing such data.

3.1 Random Subspaces

Before proceeding to the proof of this lemma, we first need to make the notion of a random subspace precise. To this end, let us start by defining what we mean by a random unit vector $x \in S^{d-1}$, where $S^{d-1}$ is the unit sphere in $\mathbb{R}^d$. We will view such a vector as the result of a generation procedure in which, first, we sample each of its $d$ coordinates independently from a Gaussian distribution $N(0,1)$ that has zero mean and standard deviation one, and then normalize it to make its norm equal to $1$. (Note that one of the important and desirable properties of this definition is that the resulting probability measure on the sphere is rotationally invariant.) Once we define our notion of a random unit vector, i.e., we define our probability measure on the sphere, we can proceed to defining what we mean by a random subspace of dimension $k$. Again, we will do this by specifying the random process that generates it. This process is as follows: Choose a random unit vector and make it the first basis vector $v_1$ of the subspace. For the next $k-1$ rounds repeat the following:
pick a random unit vector, subtract from it its projection on the subspace spanned by the previously chosen vectors $v_1, \ldots, v_{i-1}$, and normalize it to form the next basis vector $v_i$. Clearly, after this procedure is finished we end up with an orthonormal basis $v_1, \ldots, v_k$ that spans the desired (random) subspace of dimension $k$. (Note that the above procedure is nothing else than Gram-Schmidt orthogonalization applied to a set of $k$ random unit vectors.) Also, one can see that under this definition the projection $y_i$ of a data point $x_i$ onto such a random subspace can be written in matrix form as

$$y_i \;=\; \underbrace{\begin{pmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_k^T \end{pmatrix}}_{V}\, x_i \;=\; V x_i.$$

Here, each row of the projection matrix $V$ corresponds to one random basis vector $v_i$. (It is easy to see that each randomly chosen unit vector is not in the span of the vectors $v_1, \ldots, v_{i-1}$ with probability $1$.)
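The generation procedure just described (Gaussian sampling, normalization, and Gram-Schmidt orthogonalization) can be sketched in a few lines of Python. This is a naive rendering for illustration only, with our own function names, not an efficient implementation:

```python
import math
import random

def random_unit_vector(d, rng):
    """Sample d i.i.d. N(0, 1) coordinates, then normalize to unit norm.
    The resulting distribution on the sphere is rotationally invariant."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def random_subspace_basis(d, k, rng):
    """Gram-Schmidt applied to random unit vectors: returns an orthonormal
    basis v_1, ..., v_k of a random k-dimensional subspace of R^d."""
    basis = []
    while len(basis) < k:
        w = random_unit_vector(d, rng)
        for v in basis:  # subtract projections onto v_1, ..., v_{i-1}
            dot = sum(a * b for a, b in zip(w, v))
            w = [a - dot * b for a, b in zip(w, v)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm > 1e-12:  # a fresh random vector escapes the span w.p. 1
            basis.append([c / norm for c in w])
    return basis  # the rows of the projection matrix V

def project(x, basis):
    """Compute y = V x, the coordinates of x in the random subspace."""
    return [sum(v_c * x_c for v_c, x_c in zip(v, x)) for v in basis]
```

Note the cost: orthogonalizing each new vector against all previous ones takes $\Theta(k^2 d)$ arithmetic in total, which is one reason the discussion at the end of these notes turns to cheaper constructions of $V$.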
3.2 Proof of the JL Lemma

Now that we have defined what a random vector and a random subspace are, we are ready to prove the Johnson-Lindenstrauss lemma. As a first step, we show that this lemma follows from a simpler statement that just focuses on the norm of the projection of a fixed vector $x$ in $d$ dimensions onto a random $k$-dimensional subspace.

Lemma 2. Let $x$ be an arbitrary vector in $\mathbb{R}^d$ and $z \in \mathbb{R}^k$ be its projection onto a random $k$-dimensional subspace. Then, for any $\varepsilon > 0$, as long as $k = \Omega(\varepsilon^{-2} \log n)$, we have

$$\left| \frac{\|z\|}{\|x\|} - \sqrt{\tfrac{k}{d}} \right| \le \varepsilon \sqrt{\tfrac{k}{d}},$$

with probability exceeding $1 - \frac{1}{n^3}$.

It is not hard to see that once we prove Lemma 2, the Johnson-Lindenstrauss lemma follows easily. Indeed, by applying the above lemma with $x = x_i - x_j$, for any fixed $i$ and $j$, we get

$$\Pr\left[ \left| \frac{\|z_{i,j}\|}{\|x_i - x_j\|} - \sqrt{\tfrac{k}{d}} \right| > \varepsilon \sqrt{\tfrac{k}{d}} \right] \le \frac{1}{n^3},$$

where $z_{i,j}$ is the projection of $x_i - x_j$ on the random subspace. Since the projection is a linear map, we have $z_{i,j} = y_i - y_j$. So, applying a union bound to the previous inequality, over all $O(n^2)$ pairs $(i,j)$, we get that

$$\Pr\left[ \exists\, i \neq j: \ \left| \frac{\|y_i - y_j\|}{\|x_i - x_j\|} - \sqrt{\tfrac{k}{d}} \right| > \varepsilon \sqrt{\tfrac{k}{d}} \right] \le \frac{n(n-1)}{2} \cdot \frac{1}{n^3} \le \frac{1}{n},$$

which can be easily seen to be equivalent to the statement of the Johnson-Lindenstrauss lemma. Hence, from now on we focus on proving Lemma 2. (Observe that by scaling, it suffices to prove this lemma for the case of $x$ being a unit vector.) To make our task easier, we want to first invert our perspective. Namely, instead of looking at the norm of a projection of an arbitrary vector onto a random $k$-dimensional subspace, we prefer to look at the norm of a projection of a random vector onto a fixed $k$-dimensional subspace that corresponds to the first $k$ coordinates of that vector. It is not hard to see that these two views are completely equivalent. To this end, note that we can always rotate the space in such a way that the random $k$-dimensional subspace we have chosen is just the projection onto the first $k$ coordinates. Formally, let $U$ denote the unitary matrix whose first $k$ rows are equal to the vectors $v_i$ (that form the basis of the random subspace we have chosen) and whose remaining rows are chosen arbitrarily to form an orthonormal basis of the
orthogonal complement of our subspace. Then, we have that

$$z_i = (v_i)^T x = (U v_i)^T (U x),$$

for any $i$, as $U^{-1} = U^T$ is a unitary matrix too and thus satisfies $U^T U = I$. Since $U v_i$ is equal to the $i$-th standard basis vector $e_i$ and $U x$ is a random vector (as it corresponds to a random rotation of a fixed vector), it is indeed valid to see $z$ as the projection of a random vector onto the subspace spanned by its first $k$ coordinates. Thanks to the above simplification of the perspective, our goal now is to study how the norm of the first $k$ coordinates of a random vector (of unit norm) concentrates around a particular value. To this end, note that if $z = (z_1, \ldots, z_d) = (\hat{z}, z_{k+1}, \ldots, z_d)$ is a random unit vector, where $\hat{z} = (z_1, \ldots, z_k)$ denotes its first $k$ coordinates, then clearly we have

$$\mathbb{E}\left[ \sum_{i=1}^{d} z_i^2 \right] = 1.$$
Since the $z_i$'s are identically distributed, we obtain

$$\mathbb{E}\left[ \sum_{i=1}^{k} z_i^2 \right] = \frac{k}{d}.$$

Thus indeed the $\ell_2$-norm of the first $k$ coordinates of a random vector has the desired expectation. However, to prove Lemma 2, we also need to study how this norm is concentrated around its expectation. We will not do this today. Instead, just to give a flavor of the techniques involved, we prove here a simpler result that bounds the concentration of the corresponding norm for $k = 1$. Specifically, we show that the probability that $|z_1|$ is larger than $t$ is exponentially decaying with $t$.

Lemma 3. Let $z = (z_1, \ldots, z_d)$ be a random vector in $S^{d-1}$. We have

$$\Pr[|z_1| > t] \le 2 \exp\left( -\frac{t^2 (d-1)}{2} \right), \quad \text{for any } 0 < t \le 1.$$

Proof. The proof of this lemma is based on a simple geometric argument. Let us fix some $t > 0$. As $z$ is a random vector from the unit sphere $S^{d-1}$, we can see that the probability of choosing $z$ with $|z_1| > t$ is exactly the ratio of the area of two $(d-1)$-dimensional caps of radius $R_{cap} = \sqrt{1 - t^2}$ to the total area of the unit sphere $S^{d-1}$. (See Figure 1(a), which represents the situation in two dimensions, i.e., the case of $d = 2$.)

[Figure 1: Illustration of the proof in two dimensions. (a) The caps corresponding to $|z_1| > t$ are marked with red color. (b) Pictorial argument justifying upperbounding the area of these two caps by the area of a corresponding sphere of the same radius.]

We can upperbound the area of these two caps by the area of a whole sphere of the same radius (see Figure 1(b)). As the area of a $(d-1)$-dimensional sphere $S(R)$ of radius $R$ has to be a function of the form $C_d R^{d-1}$, where $C_d$ is some coefficient depending on $d$ but not on $R$, we have

$$\Pr[|z_1| > t] \;\le\; \frac{2\,\mathrm{area}(S(R_{cap}))}{\mathrm{area}(S^{d-1})} \;=\; \frac{2\, C_d\, R_{cap}^{d-1}}{C_d} \;=\; 2 \left(1 - t^2\right)^{\frac{d-1}{2}}.$$

Using the fact that $(1 - x/n)^n \le \exp(-x)$, we conclude that

$$\Pr[|z_1| > t] \le 2 \exp\left( -\frac{t^2 (d-1)}{2} \right),$$
whenever $0 < t \le 1$, as desired.

It is interesting to note that by applying Lemma 3 with $t = \Omega\left(\sqrt{\frac{\log n}{d}}\right)$, we get that the probability that $|z_1|$ exceeds $\sqrt{\frac{\log n}{d}}$ is bounded by $\frac{1}{n^{O(1)}}$. This tells us that in high dimensions almost all the vectors on the unit sphere are close to being orthogonal. Indeed, thanks to the rotation invariance of the scalar product, we can always take one of the vectors to have its first coordinate equal to $1$ and all the remaining coordinates equal to zero. Then, the scalar product of a random unit vector $z$ with this vector is equal to $z_1$. In high dimensions, the quantity $\sqrt{\frac{\log n}{d}}$ is very small, which gives a very small scalar product with high probability.

Unfortunately, as we already mentioned, the bounds provided by Lemma 3 are too weak to yield the desired concentration of the norm of the projection of $z$ on the first $k$ coordinates. Therefore, we state (without proof) a stronger version of Lemma 3 that allows one to take advantage of larger values of $k$.

Lemma 4. Let $z = (z_1, \ldots, z_d) = (\hat{z}, z_{k+1}, \ldots, z_d)$ be a random vector in $S^{d-1}$. We have

$$\Pr\left[ \left| \|\hat{z}\| - \sqrt{\tfrac{k}{d}} \right| > t \right] \le 2 e^{-\frac{t^2 d}{2}}.$$

Once we have this lemma, the proof of Lemma 2 is straightforward. We just take $t = \varepsilon \sqrt{\frac{k}{d}}$ and $k = 20\, \varepsilon^{-2} \ln n$. We then have

$$\Pr\left[ \left| \|\hat{z}\| - \sqrt{\tfrac{k}{d}} \right| > \varepsilon \sqrt{\tfrac{k}{d}} \right] \le \frac{1}{n^3},$$

which proves Lemma 2, and thus the Johnson-Lindenstrauss lemma.

3.3 Further Discussion

As we presented it here, the JL lemma is not very practical. This is so because our generation of the projection matrix $V$ requires performing Gram-Schmidt orthonormalization, which is computationally quite expensive when $n$ is large (which is often the case). To circumvent this issue and make the JL lemma more practical, there has been a lot of (successful) work on developing much more efficient constructions of the projection matrix $V$. In these latest constructions, this matrix is generated via a very simple and easy-to-implement procedure that makes $V$ have only a few non-zero entries in each column. As a result, not only is the whole construction very efficient, but the resulting matrix $V$ is also sparse (i.e., it has only a small fraction of entries non-zero), which leads to computations of the
projections of the input vectors being very efficient too. All of these advancements have made the JL lemma a truly practical tool.

Given the usefulness of the JL lemma in applications that operate based on $\ell_2$-distance, it is natural to wonder if similar results could be achieved for other $\ell_p$-distances. Unfortunately, it seems that this is not the case, and in fact for some of the distances (e.g., $\ell_1$-distance) there are strong lowerbounds on the possible dimension reduction. (Also, it is known that for $\ell_2$-distance, the dimension reduction offered by the JL lemma is essentially optimal.)
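One concrete example of such a simple construction, due to Achlioptas, replaces the orthonormalized Gaussian matrix by a matrix of i.i.d. entries that are $+1$ or $-1$ with probability $1/6$ each and $0$ with probability $2/3$, scaled by $\sqrt{3/k}$; two thirds of the entries are zero, yet a JL-type guarantee still holds. The sketch below uses our own naming, and the scaling is chosen so that $\mathbb{E}\,\|Vx\|^2 = \|x\|^2$, i.e., the normalizing factor $\sqrt{k/d}$ from the lemma is absorbed into the matrix:

```python
import math
import random

def sparse_jl_matrix(k, d, rng):
    """k x d 'database-friendly' projection in the spirit of Achlioptas:
    entries are sqrt(3/k) * {+1 w.p. 1/6, 0 w.p. 2/3, -1 w.p. 1/6},
    so E[entry^2] = 1/k and E ||V x||^2 = ||x||^2 for every x."""
    scale = math.sqrt(3.0 / k)
    signs = (1, 0, 0, -1, 0, 0)  # +1 and -1 each w.p. 1/6, 0 w.p. 2/3
    return [[scale * rng.choice(signs) for _ in range(d)] for _ in range(k)]

def apply_matrix(mat, x):
    """Compute V x."""
    return [sum(a * b for a, b in zip(row, x)) for row in mat]
```

With $k = \Theta(\varepsilon^{-2} \log n)$ one again gets $(1 \pm \varepsilon)$ distortion for $n$ points with high probability; the proof is a moment-based variant of the concentration argument above.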