High-Dimensional p-norms


Gérard Biau and David M. Mason

Gérard Biau, Université Pierre et Marie Curie, Ecole Normale Supérieure & Institut universitaire de France, e-mail: gerard.biau@upmc.fr
David M. Mason, University of Delaware, e-mail: davidm@udel.edu

Abstract. Let $\mathbf{X}=(X_1,\ldots,X_d)$ be a $\mathbb{R}^d$-valued random vector with i.i.d. components, and let $\|\mathbf{X}\|_p=\big(\sum_{j=1}^d|X_j|^p\big)^{1/p}$ be its p-norm, for $p>0$. The impact of letting $d$ go to infinity on $\|\mathbf{X}\|_p$ has surprising consequences, which may dramatically affect high-dimensional data processing. This effect is usually referred to as the distance concentration phenomenon in the computational learning literature. Despite a growing interest in this important question, previous work has essentially characterized the problem in terms of numerical experiments and incomplete mathematical statements. In the present paper, we solidify some of the arguments which previously appeared in the literature and offer new insights into the phenomenon.

1 Introduction

In what follows, for $\mathbf{x}=(x_1,\ldots,x_d)$ a vector of $\mathbb{R}^d$ and $0<p<\infty$, we set
$$\|\mathbf{x}\|_p=\Big(\sum_{j=1}^d|x_j|^p\Big)^{1/p}. \qquad (1)$$
It is recalled that for $p\ge1$, $\|\cdot\|_p$ is a norm on $\mathbb{R}^d$ (the $L^p$-norm), but for $0<p<1$ the triangle inequality does not hold and $\|\cdot\|_p$ is sometimes called a prenorm. In the sequel, we take the liberty to call p-norm a norm or prenorm of the form (1), with $p>0$.
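As a minimal illustration of definition (1), the following Python sketch evaluates $\|\mathbf{x}\|_p$ and measures the triangle-inequality gap on the vectors $(1,0)$ and $(0,1)$. The helper names `p_norm` and `triangle_gap` are hypothetical and chosen only for this example.

```python
import math
from typing import Sequence


def p_norm(x: Sequence[float], p: float) -> float:
    """Hypothetical helper: ||x||_p = (sum_j |x_j|^p)^(1/p) for p > 0, as in (1)."""
    if p <= 0:
        raise ValueError("p must be positive")
    # math.fsum reduces rounding error when d is large.
    return math.fsum(abs(v) ** p for v in x) ** (1.0 / p)


def triangle_gap(x: Sequence[float], y: Sequence[float], p: float) -> float:
    """||x||_p + ||y||_p - ||x + y||_p; a negative value means the triangle inequality fails."""
    s = [a + b for a, b in zip(x, y)]
    return p_norm(x, p) + p_norm(y, p) - p_norm(s, p)


if __name__ == "__main__":
    x, y = [1.0, 0.0], [0.0, 1.0]
    for p in (0.5, 1.0, 2.0):
        print(p, triangle_gap(x, y, p))
```

For $p=1/2$ the gap is $1+1-4=-2$, so $\|\cdot\|_{1/2}$ is only a prenorm, whereas for $p\ge1$ the gap is nonnegative, as Minkowski's inequality guarantees.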

Now, let $\mathbf{X}=(X_1,\ldots,X_d)$ be a $\mathbb{R}^d$-valued random vector with i.i.d. components. The study of the probabilistic properties of $\|\mathbf{X}\|_p$ as the dimension $d$ tends to infinity has recently witnessed an important research effort in the computational learning community (see, e.g., François et al., 2007, for a review). This activity is easily explained by the central role played by the quantity $\|\mathbf{X}\|_p$ in the analysis of nearest neighbor search algorithms, which are currently widely used in data management and database mining. Indeed, finding the closest matching object in an $L^p$-sense is of significant importance for numerous applications, including pattern recognition, multimedia content retrieval (images, videos, etc.), data mining, fraud detection and DNA sequence analysis, just to name a few. Most of these real applications involve very high-dimensional data (for example, pictures taken by a standard camera consist of several million pixels), and the curse of dimensionality (when $d\to\infty$) tends to be a major obstacle in the development of nearest neighbor-based techniques.

The effect on $\|\mathbf{X}\|_p$ of letting $d$ go large is usually referred to as the distance concentration phenomenon in the computational learning literature. It is in fact a rather vague term that encompasses several interpretations. For example, it has been observed by several authors (e.g., François et al., 2007) that, under appropriate moment assumptions, the so-called relative standard deviation $\sqrt{\mathrm{Var}\|\mathbf{X}\|_p}/\mathbb{E}\|\mathbf{X}\|_p$ tends to zero as $d$ tends to infinity. Consequently, by Chebyshev's inequality (this will be rigorously established in Section 2), for all $\varepsilon>0$,
$$\mathbb{P}\Big\{\Big|\frac{\|\mathbf{X}\|_p}{\mathbb{E}\|\mathbf{X}\|_p}-1\Big|\ge\varepsilon\Big\}\to0,\quad d\to\infty.$$
This simple result reveals that the relative error made by considering $\mathbb{E}\|\mathbf{X}\|_p$ instead of the random value $\|\mathbf{X}\|_p$ becomes asymptotically negligible. Therefore, high-dimensional vectors $\mathbf{X}$ appear to be distributed on a sphere of radius $\mathbb{E}\|\mathbf{X}\|_p$.

The distance concentration phenomenon is also often expressed by considering an i.i.d. sample $\mathbf{X}_1,\ldots,\mathbf{X}_n$ drawn from the distribution of $\mathbf{X}$ and observing that, under certain conditions, the relative contrast
$$\frac{\max_{1\le i\le n}\|\mathbf{X}_i\|_p-\min_{1\le i\le n}\|\mathbf{X}_i\|_p}{\min_{1\le i\le n}\|\mathbf{X}_i\|_p}$$
vanishes in probability as $d$ tends to infinity, whereas the contrast
$$\max_{1\le i\le n}\|\mathbf{X}_i\|_p-\min_{1\le i\le n}\|\mathbf{X}_i\|_p$$
behaves in expectation as $d^{1/p-1/2}$ (Beyer et al., 1999; Hinneburg et al., 2000; Aggarwal et al., 2001; Kabán). Thus the relative gap between the largest and smallest p-distances from the sample to the origin becomes negligible as the dimension increases, and all points seem to be located at approximately the same distance. This phenomenon may dramatically affect high-dimensional data processing, analysis, retrieval and indexing, insofar as these procedures rely on some notion of p-norm. Accordingly, serious questions are raised as to the validity of many nearest neighbor search heuristics in high dimension, a problem that can be further exacerbated by
techniques that find approximate neighbors in order to improve algorithmic performance (Beyer et al., 1999).

Even if the distance concentration phenomenon and its practical implications are now better understood, it is our belief that there is still a serious need to solidify its mathematical background. Indeed, previous work has essentially characterized the problem in terms of numerical experiments and (often) incomplete probabilistic statements, with missing assumptions and (sometimes) defective proofs. Thus, our objective in the present paper is to solidify some of the statements which previously appeared in the computational learning literature. We start in Section 2 by offering a thorough analysis of the behavior of the p-norm $\|\mathbf{X}\|_p$ (as a function of $p$ and of the properties of the distribution of $\mathbf{X}$) as $d\to\infty$. Section 3 is devoted to the investigation of some new asymptotic properties of the contrast $\max_{1\le i\le n}\|\mathbf{X}_i\|_p-\min_{1\le i\le n}\|\mathbf{X}_i\|_p$, both as $d\to\infty$ and as $n\to\infty$. For the sake of clarity, most technical proofs are gathered in Section 4. The basic tools that we shall use are the law of large numbers, the central limit theorem, moment bounds for sums of i.i.d. random variables, and a coupling inequality of Yurinskiĭ (1977).

2 Asymptotic behavior of p-norms

2.1 Consistency

Throughout the document, the notations $\xrightarrow{\mathbb{P}}$ and $\xrightarrow{\mathcal{D}}$ stand for convergence in probability and in distribution, respectively. The notations $u_n=o(v_n)$ and $u_n=O(v_n)$ mean, respectively, that $u_n/v_n\to0$ and $|u_n|\le C|v_n|$ for some constant $C$, as $n\to\infty$. The symbols $o_{\mathbb{P}}(v_n)$ and $O_{\mathbb{P}}(v_n)$ denote, respectively, a sequence of random variables $\{Y_n\}_{n\ge1}$ such that $Y_n/v_n\xrightarrow{\mathbb{P}}0$ and a sequence such that $Y_n/v_n$ is bounded in probability, as $n\to\infty$.

We start this section with a general proposition that plays a key role in the analysis.

Proposition 1. Let $\{U_d\}_{d\ge1}$ be a sequence of random variables such that $U_d\xrightarrow{\mathbb{P}}a$, and let $\varphi$ be a real-valued measurable function which is continuous at $a$. Assume that:
(i) $\varphi$ is bounded on $[-M,M]$ for some $M>|a|$;
(ii) $\mathbb{E}|\varphi(U_d)|<\infty$ for all $d\ge1$.
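Then, as $d\to\infty$,
$$\mathbb{E}\varphi(U_d)\to\varphi(a)$$
if and only if
$$\mathbb{E}\big(\varphi(U_d)\mathbf{1}\{|U_d|>M\}\big)\to0. \qquad (2)$$

Proof. The proof is easy. Condition (i) and the continuity of $\varphi$ at $a$ allow us to apply the bounded convergence theorem to get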

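$$\mathbb{E}\big(\varphi(U_d)\mathbf{1}\{|U_d|\le M\}\big)\to\varphi(a).$$
Since
$$\mathbb{E}\varphi(U_d)=\mathbb{E}\big(\varphi(U_d)\mathbf{1}\{|U_d|\le M\}\big)+\mathbb{E}\big(\varphi(U_d)\mathbf{1}\{|U_d|>M\}\big),$$
the rest of the proof is obvious.

We shall now specialize the result of Proposition 1 to the case when
$$U_d=\frac1d\sum_{j=1}^dY_j:=\bar Y_d,$$
where $\{Y_j\}_{j\ge1}$ is a sequence of i.i.d. random variables distributed as $Y$, with finite mean $\mu$. In this case, by the strong law of large numbers, $U_d\to\mu$ almost surely. The following lemma gives two sufficient conditions for (2) to hold when $U_d=\bar Y_d$.

Lemma 1. Let $\varphi$ be a real-valued measurable function. Assume that one of the following two conditions is satisfied:
Condition 1. The function $|\varphi|$ is convex on $\mathbb{R}$ and $\mathbb{E}|\varphi(Y)|<\infty$.
Condition 2. For some $s>1$, $\limsup_{d\to\infty}\mathbb{E}|\varphi(\bar Y_d)|^s<\infty$.
Then (2) is satisfied for the sequence $\{\bar Y_d\}_{d\ge1}$ with $a=\mu$ and any $M>|\mu|$.

Proof. Suppose that Condition 1 is satisfied. Then note that, by the convexity assumption and since the $Y_j$ are i.i.d.,
$$\mathbb{E}\big(|\varphi(\bar Y_d)|\mathbf{1}\{|\bar Y_d|>M\}\big)\le\frac1d\sum_{j=1}^d\mathbb{E}\big(|\varphi(Y_j)|\mathbf{1}\{|\bar Y_d|>M\}\big)=\mathbb{E}\big(|\varphi(Y_1)|\mathbf{1}\{|\bar Y_d|>M\}\big).$$
Since $M>|\mu|$, we conclude that, with probability one, $|\varphi(Y_1)|\mathbf{1}\{|\bar Y_d|>M\}\to0$. Also, $|\varphi(Y_1)|\mathbf{1}\{|\bar Y_d|>M\}\le|\varphi(Y_1)|$. Therefore, by the dominated convergence theorem, (2) holds. Next, suppose that Condition 2 is satisfied, and notice by Hölder's inequality with $1/r=1-1/s$ that
$$\mathbb{E}\big(|\varphi(\bar Y_d)|\mathbf{1}\{|\bar Y_d|>M\}\big)\le\big(\mathbb{E}|\varphi(\bar Y_d)|^s\big)^{1/s}\big(\mathbb{P}\{|\bar Y_d|>M\}\big)^{1/r}.$$
Since $\mathbb{P}\{|\bar Y_d|>M\}\to0$, (2) immediately follows from Condition 2.

Let us now return to the distance concentration problem, which has been discussed in the introduction. Recall that we denote by $\mathbf{X}=(X_1,\ldots,X_d)$ a $\mathbb{R}^d$-valued random vector with i.i.d. components distributed as $X$. Whenever $\mathbb{E}|X|^p<\infty$ for some $p>0$, we set $\mu_p=\mathbb{E}|X|^p$. Also, when $\mathrm{Var}|X|^p<\infty$, we shall write $\sigma_p^2=\mathrm{Var}|X|^p$. Proposition 1 and Lemma 1 yield the following corollary: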

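Corollary 1. Fix $p>0$ and $r>0$.
(i) Whenever $r/p<1$ and $\mathbb{E}|X|^p<\infty$,
$$\frac{\mathbb{E}\|\mathbf{X}\|_p^r}{d^{r/p}}\to\mu_p^{r/p},\quad d\to\infty,$$
whereas if $\mathbb{E}|X|^p=\infty$, then
$$\lim_{d\to\infty}\frac{\mathbb{E}\|\mathbf{X}\|_p^r}{d^{r/p}}=\infty.$$
(ii) Whenever $r/p\ge1$ and $\mathbb{E}|X|^r<\infty$,
$$\frac{\mathbb{E}\|\mathbf{X}\|_p^r}{d^{r/p}}\to\mu_p^{r/p},\quad d\to\infty,$$
whereas if $\mathbb{E}|X|^r=\infty$, then, for all $d\ge1$,
$$\frac{\mathbb{E}\|\mathbf{X}\|_p^r}{d^{r/p}}=\infty.$$

Proof. We shall apply Proposition 1 and Lemma 1 to $Y=|X|^p$, $Y_j=|X_j|^p$, $j\ge1$, and $\varphi(u)=|u|^{r/p}$, noting that $\varphi(\bar Y_d)=\|\mathbf{X}\|_p^r/d^{r/p}$.

Proof of (i). For the first part of (i), notice that, with $s=p/r>1$,
$$\mathbb{E}\Big|\varphi\Big(\frac1d\sum_{j=1}^d|X_j|^p\Big)\Big|^s=\frac1d\sum_{j=1}^d\mathbb{E}|X_j|^p=\mathbb{E}|X|^p<\infty.$$
This shows that sufficient Condition 2 of Lemma 1 holds, which by Proposition 1 gives the result. For the second part of (i), observe that for any $K>0$
$$\mathbb{E}\Big(\frac1d\sum_{j=1}^d|X_j|^p\Big)^{r/p}\ge\mathbb{E}\Big(\frac1d\sum_{j=1}^d|X_j|^p\mathbf{1}\{|X_j|\le K\}\Big)^{r/p}.$$
Observing that the right-hand side of the inequality converges to $\big(\mathbb{E}|X|^p\mathbf{1}\{|X|\le K\}\big)^{r/p}$ as $d\to\infty$, we get for any $K>0$
$$\liminf_{d\to\infty}\mathbb{E}\Big(\frac1d\sum_{j=1}^d|X_j|^p\Big)^{r/p}\ge\big(\mathbb{E}|X|^p\mathbf{1}\{|X|\le K\}\big)^{r/p}.$$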

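Since $K$ can be chosen arbitrarily large and we assume that $\mathbb{E}|X|^p=\infty$, we see that the conclusion holds.

Proof of (ii). For the first part of (ii), note that in this case $r/p\ge1$, so $\varphi$ is convex. Moreover, by Jensen's inequality,
$$\mathbb{E}\,\varphi\Big(\frac1d\sum_{j=1}^d|X_j|^p\Big)=\mathbb{E}\Big(\frac1d\sum_{j=1}^d|X_j|^p\Big)^{r/p}\le\frac1d\sum_{j=1}^d\mathbb{E}|X_j|^r=\mathbb{E}|X|^r<\infty.$$
Thus sufficient Condition 1 of Lemma 1 holds, which by Proposition 1 leads to the result. For the second part of (ii), observe that if $\mathbb{E}|X|^r=\infty$, then, for all $d\ge1$,
$$\mathbb{E}\Big(\frac1d\sum_{j=1}^d|X_j|^p\Big)^{r/p}\ge d^{-r/p}\,\mathbb{E}|X_1|^r=\infty.$$

Applying Corollary 1 with $p>0$ and $r\in\{1,2\}$ yields the following important result:

Proposition 2. Fix $p>0$ and assume that $0<\mathbb{E}|X|^m<\infty$ for $m=\max(2,p)$. Then, as $d\to\infty$,
$$\frac{\mathbb{E}\|\mathbf{X}\|_p}{d^{1/p}}\to\mu_p^{1/p}\quad\text{and}\quad\frac{\mathbb{E}\|\mathbf{X}\|_p^2}{d^{2/p}}\to\mu_p^{2/p},$$
which implies (since $\mathrm{Var}\|\mathbf{X}\|_p/\mathbb{E}^2\|\mathbf{X}\|_p=\mathbb{E}\|\mathbf{X}\|_p^2/\mathbb{E}^2\|\mathbf{X}\|_p-1$)
$$\frac{\sqrt{\mathrm{Var}\|\mathbf{X}\|_p}}{\mathbb{E}\|\mathbf{X}\|_p}\to0,\quad d\to\infty.$$
This result, when correctly stated, corresponds to Theorem 5 of François et al. (2007). It expresses the fact that the relative standard deviation converges towards zero when the dimension grows. It is known in the computational learning literature as the p-norm concentration in high-dimensional spaces. It is noteworthy that, by Chebyshev's inequality, for all $\varepsilon>0$,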

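$$\mathbb{P}\Big\{\Big|\frac{\|\mathbf{X}\|_p}{\mathbb{E}\|\mathbf{X}\|_p}-1\Big|\ge\varepsilon\Big\}=\mathbb{P}\big\{\big|\|\mathbf{X}\|_p-\mathbb{E}\|\mathbf{X}\|_p\big|\ge\varepsilon\,\mathbb{E}\|\mathbf{X}\|_p\big\}\le\frac{\mathrm{Var}\|\mathbf{X}\|_p}{\varepsilon^2\,\mathbb{E}^2\|\mathbf{X}\|_p}\to0,\quad d\to\infty. \qquad (3)$$
That is, $\|\mathbf{X}\|_p/\mathbb{E}\|\mathbf{X}\|_p\xrightarrow{\mathbb{P}}1$ or, in other words, the sequence $\{\|\mathbf{X}\|_p\}_{d\ge1}$ is relatively stable (Boucheron et al.). This property guarantees that the random fluctuations of $\|\mathbf{X}\|_p$ around its expectation are of negligible size when compared to the expectation, and therefore most information about the size of $\|\mathbf{X}\|_p$ is given by $\mathbb{E}\|\mathbf{X}\|_p$ as $d$ becomes large.

2.2 Rates of convergence

The asymptotic concentration statement of Corollary 1 can be made more precise by means of rates of convergence, at the price of stronger moment assumptions. To reach this objective, we first need a general result to control the behavior of a function of an i.i.d. empirical mean around its true value. Thus, assume that $\{Y_j\}_{j\ge1}$ are i.i.d. random variables distributed as $Y$, with mean $\mu$ and variance $\sigma^2$. As before, we define
$$\bar Y_d=\frac1d\sum_{j=1}^dY_j.$$
Let $\varphi$ be a real-valued function with derivatives $\varphi'$ and $\varphi''$. Khan (2004) provides sufficient conditions for
$$\mathbb{E}\varphi(\bar Y_d)=\varphi(\mu)+\frac{\varphi''(\mu)\sigma^2}{2d}+o\big(d^{-2}\big)$$
to hold. The following lemma, whose assumptions are less restrictive, can be used in place of Khan's result. For the sake of clarity, its proof is postponed to Section 4.

Lemma 2. Let $\{Y_j\}_{j\ge1}$ be a sequence of i.i.d. random variables distributed as $Y$, with mean $\mu$ and variance $\sigma^2$, and let $\varphi$ be a real-valued function with continuous derivatives $\varphi'$ and $\varphi''$ in a neighborhood of $\mu$. Assume that, for some $r>1$,
$$\mathbb{E}|Y|^{r+1}<\infty \qquad (4)$$
and, with $1/s=1-1/r$,
$$\limsup_{d\to\infty}\mathbb{E}|\varphi(\bar Y_d)|^s<\infty. \qquad (5)$$
Then, as $d\to\infty$,
$$\mathbb{E}\varphi(\bar Y_d)=\varphi(\mu)+\frac{\varphi''(\mu)\sigma^2}{2d}+o\Big(\frac1d\Big).$$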
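A quick numerical sanity check of Lemma 2 can be run in a case where $\mathbb{E}\varphi(\bar Y_d)$ has a closed form. The sketch below assumes, purely for illustration, that $Y\sim$ Bernoulli$(q)$ and $\varphi=\exp$, so that (4) and (5) hold trivially and $\mathbb{E}e^{\bar Y_d}=(1-q+qe^{1/d})^d$. Lemma 2 then predicts $d\,(\mathbb{E}e^{\bar Y_d}-e^{q})\to e^{q}q(1-q)/2$. The function names are hypothetical.

```python
import math


def exact_mean_exp(q: float, d: int) -> float:
    """E[exp(Ybar_d)] for i.i.d. Bernoulli(q) summands: (1 - q + q*e^(1/d))^d."""
    # log1p/expm1 keep the computation accurate when 1/d is tiny.
    return math.exp(d * math.log1p(q * math.expm1(1.0 / d)))


def scaled_remainder(q: float, d: int) -> float:
    """d * (E phi(Ybar_d) - phi(mu)) with phi = exp and mu = q."""
    return d * (exact_mean_exp(q, d) - math.exp(q))


def lemma2_limit(q: float) -> float:
    """phi''(mu) * sigma^2 / 2 with phi = exp, mu = q and sigma^2 = q(1 - q)."""
    return math.exp(q) * q * (1.0 - q) / 2.0


if __name__ == "__main__":
    q = 0.3
    for d in (10, 100, 1_000, 10_000):
        print(d, scaled_remainder(q, d), lemma2_limit(q))
```

In this example the gap between the two printed columns shrinks roughly like $1/d$, which is consistent with the $o(1/d)$ remainder in Lemma 2.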

The consequences of Lemma 2 in terms of p-norm concentration are summarized in the following proposition:

Proposition 3. Fix $p>0$ and assume that $0<\mathbb{E}|X|^m<\infty$ for $m=\max(4,3p)$. Then, as $d\to\infty$,
$$\mathbb{E}\|\mathbf{X}\|_p=d^{1/p}\mu_p^{1/p}+O\big(d^{1/p-1}\big)$$
and
$$\mathrm{Var}\|\mathbf{X}\|_p=\frac{\mu_p^{2/p-2}\sigma_p^2}{p^2}\,d^{2/p-1}+o\big(d^{2/p-1}\big),$$
which implies
$$\sqrt d\,\frac{\sqrt{\mathrm{Var}\|\mathbf{X}\|_p}}{\mathbb{E}\|\mathbf{X}\|_p}\to\frac{\sigma_p}{p\mu_p},\quad d\to\infty.$$
Proposition 3 is stated without assumptions as Theorem 6 in François et al. (2007), where it is provided with an ambiguous proof. This result shows that, for a fixed large $d$, the relative standard deviation evolves with $p$ as the ratio $\sigma_p/(p\mu_p)$. For instance, when $X$ is uniformly distributed on $[0,1]$, we have $\mathbb{E}|X|^p=1/(p+1)$ and $\mathbb{E}|X|^{2p}=1/(2p+1)$, so that
$$\mu_p=\frac1{p+1}\quad\text{and}\quad\sigma_p=\frac{p}{p+1}\sqrt{\frac1{2p+1}}.$$
In that case, we conclude that
$$\sqrt d\,\frac{\sqrt{\mathrm{Var}\|\mathbf{X}\|_p}}{\mathbb{E}\|\mathbf{X}\|_p}\to\frac1{\sqrt{2p+1}}.$$
Thus, in the uniform setting, the limiting relative standard deviation is a strictly decreasing function of $p$. This observation is often interpreted by saying that p-norms are more concentrated for larger values of $p$. There are, however, distributions for which this is not the case. A counterexample is given by a balanced mixture of two Gaussian random variables with unit variance and means $1$ and $-1$, respectively (see François et al., 2007, page 881). In that case, it can be seen that the asymptotic relative standard deviation with $p\le1$ is smaller than for values of $p\in[8,30]$, making fractional norms more concentrated.

Proof of Proposition 3. Fix $p>0$ and introduce the functions on $\mathbb{R}$
$$\varphi_1(u)=|u|^{1/p}\quad\text{and}\quad\varphi_2(u)=|u|^{2/p}.$$
Assume that $\mathbb{E}|X|^{\max(4,p)}<\infty$. Applying Corollary 1, we get that, as $d\to\infty$,
$$\mathbb{E}\Big(\frac1d\sum_{j=1}^d|X_j|^p\Big)^{2/p}\to\mu_p^{2/p}$$
and

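$$\mathbb{E}\Big(\frac1d\sum_{j=1}^d|X_j|^p\Big)^{4/p}\to\mu_p^{4/p}.$$
This says that, with $s=2$, for $i=1,2$,
$$\limsup_{d\to\infty}\mathbb{E}\Big|\varphi_i\Big(\frac1d\sum_{j=1}^d|X_j|^p\Big)\Big|^s<\infty.$$
Now, let $Y=|X|^p$ and set $r=2$. If we also assume that $\mathbb{E}|Y|^{r+1}=\mathbb{E}|Y|^3=\mathbb{E}|X|^{3p}<\infty$, we get by applying Lemma 2 to $\varphi_1$ and $\varphi_2$ that, for $i=1,2$,
$$\mathbb{E}\varphi_i(\bar Y_d)=\varphi_i(\mu_p)+\frac{\varphi_i''(\mu_p)\sigma_p^2}{2d}+o\Big(\frac1d\Big).$$
Thus, whenever $\mathbb{E}|X|^m<\infty$, where $m=\max(4,3p)$,
$$\mathbb{E}\bar Y_d^{1/p}=\mu_p^{1/p}+\frac1p\Big(\frac1p-1\Big)\frac{\mu_p^{1/p-2}\sigma_p^2}{2d}+o\Big(\frac1d\Big)$$
and
$$\mathbb{E}\bar Y_d^{2/p}=\mu_p^{2/p}+\frac2p\Big(\frac2p-1\Big)\frac{\mu_p^{2/p-2}\sigma_p^2}{2d}+o\Big(\frac1d\Big).$$
Therefore, we see that
$$\mathrm{Var}\,\bar Y_d^{1/p}=\mathbb{E}\bar Y_d^{2/p}-\mathbb{E}^2\bar Y_d^{1/p}=\frac{\mu_p^{2/p-2}\sigma_p^2}{p^2d}+o\Big(\frac1d\Big).$$
The identity $\bar Y_d=\frac1d\sum_{j=1}^d|X_j|^p=\|\mathbf{X}\|_p^p/d$ yields the desired results.

We conclude the section with a corollary, which specifies inequality (3).

Corollary 2. Fix $p>0$.
(i) If $0<\mathbb{E}|X|^m<\infty$ for $m=\max(4,3p)$, then, for all $\varepsilon>0$,
$$\mathbb{P}\Big\{\Big|\frac{\|\mathbf{X}\|_p}{\mathbb{E}\|\mathbf{X}\|_p}-1\Big|\ge\varepsilon\Big\}\le\frac{\sigma_p^2}{\varepsilon^2p^2\mu_p^2\,d}+o\Big(\frac1d\Big).$$
(ii) If, for some positive constant $C$, $0<|X|\le C$ almost surely, then, for $p\ge1$ and all $\varepsilon>0$,
$$\mathbb{P}\Big\{\Big|\frac{\|\mathbf{X}\|_p}{\mathbb{E}\|\mathbf{X}\|_p}-1\Big|\ge\varepsilon\Big\}\le2\exp\Big(-\frac{\varepsilon^2d^{2/p-1}\mu_p^{2/p}}{2C^2}+o\big(d^{2/p-1}\big)\Big).$$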
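Before the proof, it may help to see the size of the two bounds of Corollary 2 in a concrete case. The sketch below assumes (our choice) that $X$ is uniform on $[0,1]$, so that $C=1$ and the moments $\mu_p=1/(p+1)$, $\sigma_p^2=1/(2p+1)-\mu_p^2$ from the example after Proposition 3 apply. It evaluates only the leading terms of the bounds, dropping the $o(\cdot)$ remainders, so these are approximations rather than rigorous bounds. The Monte Carlo helper replaces $\mathbb{E}\|\mathbf{X}\|_p$ by the sample mean of the simulated norms. All function names are hypothetical.

```python
import math
import random


def uniform_moments(p: float) -> tuple:
    """(mu_p, sigma_p^2) for X ~ Uniform[0, 1]: E X^p = 1/(p+1), E X^(2p) = 1/(2p+1)."""
    mu = 1.0 / (p + 1.0)
    return mu, 1.0 / (2.0 * p + 1.0) - mu * mu


def relative_sd_limit(p: float) -> float:
    """sigma_p / (p mu_p): the limit of sqrt(d) * sd/mean in Proposition 3."""
    mu, var = uniform_moments(p)
    return math.sqrt(var) / (p * mu)


def chebyshev_leading(eps: float, d: int, p: float) -> float:
    """Leading term sigma_p^2 / (eps^2 p^2 mu_p^2 d) of Corollary 2(i); o(1/d) dropped."""
    return relative_sd_limit(p) ** 2 / (eps ** 2 * d)


def bounded_difference_leading(eps: float, d: int, p: float, c: float = 1.0) -> float:
    """Leading term of Corollary 2(ii), meaningful for p >= 1; o(d^(2/p-1)) dropped."""
    mu, _ = uniform_moments(p)
    exponent = eps ** 2 * d ** (2.0 / p - 1.0) * mu ** (2.0 / p) / (2.0 * c ** 2)
    return 2.0 * math.exp(-exponent)


def empirical_tail(eps: float, d: int, p: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo frequency of | ||X||_p / mean - 1 | >= eps (sample mean replaces E||X||_p)."""
    rng = random.Random(seed)
    norms = [math.fsum(rng.random() ** p for _ in range(d)) ** (1.0 / p)
             for _ in range(trials)]
    mean = math.fsum(norms) / trials
    return sum(abs(v / mean - 1.0) >= eps for v in norms) / trials


if __name__ == "__main__":
    eps, p = 0.05, 1.0
    for d in (100, 1_000, 10_000):
        print(d, chebyshev_leading(eps, d, p), bounded_difference_leading(eps, d, p))
    print("empirical tail, d = 1000:", empirical_tail(eps, 1_000, p, trials=500))
```

For $p=1$ the bounded-difference term behaves like $2e^{-\varepsilon^2d/8}$ and eventually beats the $1/(3\varepsilon^2d)$ Chebyshev term, whereas for $p=2$ its exponent does not grow with $d$, so in this example part (ii) yields a vanishing bound only for $1\le p<2$.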

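Proof. Statement (i) is an immediate consequence of Proposition 3 and Chebyshev's inequality. Now, assume that $p\ge1$, and let $A=[-C,C]$. For $\mathbf{x}=(x_1,\ldots,x_d)\in A^d$, let $g:A^d\to\mathbb{R}$ be defined by
$$g(\mathbf{x})=\|\mathbf{x}\|_p=\Big(\sum_{j=1}^d|x_j|^p\Big)^{1/p}.$$
Clearly, for each $1\le j\le d$,
$$\sup_{\substack{(x_1,\ldots,x_d)\in A^d\\x_j'\in A}}\big|g(x_1,\ldots,x_d)-g(x_1,\ldots,x_{j-1},x_j',x_{j+1},\ldots,x_d)\big|=\sup_{\substack{\mathbf{x}\in A^d\\x_j'\in A}}\big|\|\mathbf{x}\|_p-\|\mathbf{x}'\|_p\big|,$$
where $\mathbf{x}'$ is identical to $\mathbf{x}$, except on the $j$-th coordinate where it takes the value $x_j'$. It follows, by Minkowski's inequality (which is valid here since $p\ge1$), that
$$\sup_{\substack{(x_1,\ldots,x_d)\in A^d\\x_j'\in A}}\big|g(x_1,\ldots,x_d)-g(x_1,\ldots,x_{j-1},x_j',x_{j+1},\ldots,x_d)\big|\le\sup_{\substack{\mathbf{x}\in A^d\\x_j'\in A}}\|\mathbf{x}-\mathbf{x}'\|_p=\sup_{(x,x')\in A^2}|x-x'|\le2C.$$
Consequently, using the bounded difference inequality (McDiarmid, 1989), we obtain
$$\mathbb{P}\Big\{\Big|\frac{\|\mathbf{X}\|_p}{\mathbb{E}\|\mathbf{X}\|_p}-1\Big|\ge\varepsilon\Big\}=\mathbb{P}\big\{\big|\|\mathbf{X}\|_p-\mathbb{E}\|\mathbf{X}\|_p\big|\ge\varepsilon\,\mathbb{E}\|\mathbf{X}\|_p\big\}\le2\exp\Big(-\frac{2\big(\varepsilon\,\mathbb{E}\|\mathbf{X}\|_p\big)^2}{4C^2d}\Big)=2\exp\Big(-\frac{\varepsilon^2d^{2/p-1}\mu_p^{2/p}}{2C^2}+o\big(d^{2/p-1}\big)\Big),$$
where, in the last equality, we used Proposition 3. This concludes the proof.

3 Minima and maxima

Another important question arising in high-dimensional nearest neighbor search analysis concerns the relative asymptotic behavior of the minimum and maximum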
p-distances to the origin within a random sample. To be precise, let $\mathbf{X}_1,\ldots,\mathbf{X}_n$ be an i.i.d. sample drawn from the distribution of $\mathbf{X}$, where $\mathbf{X}=(X_1,\ldots,X_d)$ is as usual a $\mathbb{R}^d$-valued random vector with i.i.d. components distributed as $X$. We will be primarily interested in this section in the asymptotic properties of the difference (the contrast) $\max_{1\le i\le n}\|\mathbf{X}_i\|_p-\min_{1\le i\le n}\|\mathbf{X}_i\|_p$.

Assume, to start with, that $n$ is fixed and only $d$ is allowed to grow. Then an immediate application of the law of large numbers shows that, whenever $\mu_p=\mathbb{E}|X|^p<\infty$, almost surely as $d\to\infty$,
$$\frac{\max_{1\le i\le n}\|\mathbf{X}_i\|_p-\min_{1\le i\le n}\|\mathbf{X}_i\|_p}{d^{1/p}}\to0.$$
Moreover, if $0<\mu_p<\infty$, then
$$\frac{\max_{1\le i\le n}\|\mathbf{X}_i\|_p}{\min_{1\le i\le n}\|\mathbf{X}_i\|_p}\xrightarrow{\mathbb{P}}1,\quad\text{that is,}\quad\frac{\max_{1\le i\le n}\|\mathbf{X}_i\|_p-\min_{1\le i\le n}\|\mathbf{X}_i\|_p}{\min_{1\le i\le n}\|\mathbf{X}_i\|_p}\xrightarrow{\mathbb{P}}0.$$
The last ratio is sometimes called the relative contrast in the computational learning literature. Thus, as $d$ becomes large, all observations seem to be distributed at approximately the same p-distance from the origin. The concept of nearest neighbor measured by p-norms in high dimension is therefore less clear than in small dimension, with resulting computational difficulties and algorithmic inefficiencies.

These consistency results can be specified by means of asymptotic distributions. Recall that if $Z_1,\ldots,Z_n$ are i.i.d. standard normal random variables, the sample range is defined to be
$$M_n=\max_{1\le i\le n}Z_i-\min_{1\le i\le n}Z_i.$$
The asymptotic distribution of $M_n$ is well known (see, e.g., David). Namely, for any $x$ one has
$$\lim_{n\to\infty}\mathbb{P}\Big\{\sqrt{2\log n}\Big(M_n-2\sqrt{2\log n}+\frac{\log\log n+\log4\pi}{\sqrt{2\log n}}\Big)\le x\Big\}=\int_{-\infty}^{\infty}\exp\big(-t-e^{-t}-e^{-(x-t)}\big)\,dt.$$
For future reference, we shall sketch the proof of this fact here. It is well known that, with
$$a_n=\sqrt{2\log n}\quad\text{and}\quad b_n=\sqrt{2\log n}-\frac12\,\frac{\log\log n+\log4\pi}{\sqrt{2\log n}}, \qquad (6)$$
we have
$$\Big(a_n\big(\max_{1\le i\le n}Z_i-b_n\big),\;a_n\big(\min_{1\le i\le n}Z_i+b_n\big)\Big)\xrightarrow{\mathcal{D}}(E,-E'), \qquad (7)$$
where $E$ and $E'$ are independent, $E\overset{\mathcal{D}}{=}E'$, and $\mathbb{P}\{E\le x\}=\exp(-\exp(-x))$, $-\infty<x<\infty$. The asymptotic independence of the maximum and minimum part can be inferred from a theorem of Reiss (1989), and the asymptotic distribution part from Example 2 on page 71 of Resnick. From (7) we get

$$a_n\Big(\max_{1\le i\le n}Z_i-\min_{1\le i\le n}Z_i\Big)-2a_nb_n\xrightarrow{\mathcal{D}}E+E'.$$
Clearly,
$$\mathbb{P}\{E+E'\le x\}=\int_{-\infty}^{\infty}\exp\big(-e^{-(x-t)}\big)\exp\big(-e^{-t}\big)e^{-t}\,dt=\int_{-\infty}^{\infty}\exp\big(-t-e^{-t}-e^{-(x-t)}\big)\,dt.$$

Our first result treats the case when $n$ is fixed and $d\to\infty$.

Proposition 4. Fix $p>0$, and assume that $0<\mathbb{E}|X|^p<\infty$ and $0<\sigma_p<\infty$. Then, for fixed $n$, as $d\to\infty$,
$$d^{1/2-1/p}\Big(\max_{1\le i\le n}\|\mathbf{X}_i\|_p-\min_{1\le i\le n}\|\mathbf{X}_i\|_p\Big)\xrightarrow{\mathcal{D}}\frac{\sigma_p\mu_p^{1/p-1}}{p}\,M_n.$$

To our knowledge, this is the first statement of this type in the analysis of high-dimensional nearest neighbor problems. In fact, most of the existing results merely bound the asymptotic expectation of the (normalized) difference and ratio between the max and the min, but with bounds which are unfortunately not of the same order in $n$ as soon as $n\ge3$ (see, e.g., Theorem 3 in Hinneburg et al., 2000).

One of the consequences of Proposition 4 is that, for fixed $n$, the difference between the farthest and nearest neighbors does not necessarily go to zero in probability as $d$ tends to infinity. Indeed, we see that the size of
$$\max_{1\le i\le n}\|\mathbf{X}_i\|_p-\min_{1\le i\le n}\|\mathbf{X}_i\|_p$$
scales as $d^{1/p-1/2}$. For example, this difference increases with dimensionality as $\sqrt d$ for the $L^1$ (Manhattan) metric and remains stable in distribution for the $L^2$ (Euclidean) metric. It tends to infinity in probability for $p<2$ and to zero for $p>2$. This observation is in line with the conclusions of Hinneburg et al. (2000), who argue that nearest neighbor search in a high-dimensional space tends to be meaningless for norms with larger exponents, since the maximum observed distance tends towards the minimum one. It should be noted, however, that the variance of the limiting distribution depends on the value of $p$.

Remark 1. Let $Z_1,\ldots,Z_n$ be i.i.d. standard normal random variables, and let
$$R_n=\frac{\max_{1\le i\le n}Z_i}{\min_{1\le i\le n}Z_i}.$$
Assuming $\mu_p>0$ and $0<\sigma_p<\infty$, one can prove, using the same technique, that
$$\frac{\max_{1\le i\le n}\|\mathbf{X}_i\|_p-d^{1/p}\mu_p^{1/p}}{\min_{1\le i\le n}\|\mathbf{X}_i\|_p-d^{1/p}\mu_p^{1/p}}\xrightarrow{\mathcal{D}}R_n.$$
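The $d^{1/p-1/2}$ scaling in Proposition 4 is easy to probe by simulation. The sketch below assumes, for illustration only, uniform components on $[0,1]$ and $n=2$, in which case $\mathbb{E}M_2=\mathbb{E}|Z_1-Z_2|=2/\sqrt\pi$. The Monte Carlo mean of the rescaled contrast can then be compared with $(\sigma_p\mu_p^{1/p-1}/p)\cdot2/\sqrt\pi$. Convergence in distribution does not by itself imply convergence of means, so agreement of the means is an extra (plausible) assumption of this check. The helper names are hypothetical.

```python
import math
import random


def uniform_mu_sigma(p: float) -> tuple:
    """(mu_p, sigma_p) for X ~ Uniform[0, 1]."""
    mu = 1.0 / (p + 1.0)
    return mu, math.sqrt(1.0 / (2.0 * p + 1.0) - mu * mu)


def limit_scale(p: float) -> float:
    """sigma_p * mu_p^(1/p - 1) / p, the factor multiplying M_n in Proposition 4."""
    mu, sigma = uniform_mu_sigma(p)
    return sigma * mu ** (1.0 / p - 1.0) / p


def scaled_contrast_mean(p: float, d: int, n: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo mean of d^(1/2 - 1/p) * (max_i ||X_i||_p - min_i ||X_i||_p)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # n independent vectors, each with d i.i.d. Uniform[0, 1) components.
        norms = [math.fsum(rng.random() ** p for _ in range(d)) ** (1.0 / p)
                 for _ in range(n)]
        total += max(norms) - min(norms)
    return d ** (0.5 - 1.0 / p) * total / trials


if __name__ == "__main__":
    n = 2
    for p in (1.0, 2.0, 4.0):
        # For n = 2, E M_2 = E|Z_1 - Z_2| = 2 / sqrt(pi).
        predicted = limit_scale(p) * 2.0 / math.sqrt(math.pi)
        for d in (10, 100, 1_000):
            print(p, d, scaled_contrast_mean(p, d, n, trials=200), predicted)
```

Without the factor $d^{1/2-1/p}$, the same simulation would show the raw contrast growing with $d$ for $p=1$, staying roughly stable for $p=2$ and shrinking for $p=4$.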

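Proof of Proposition 4. Denote by $\mathbf{Z}_n$ a centered Gaussian random vector in $\mathbb{R}^n$, with identity covariance matrix. By the central limit theorem, as $d\to\infty$,
$$\sqrt d\Big[\Big(\frac{\|\mathbf{X}_1\|_p^p}{d},\ldots,\frac{\|\mathbf{X}_n\|_p^p}{d}\Big)-(\mu_p,\ldots,\mu_p)\Big]\xrightarrow{\mathcal{D}}\sigma_p\mathbf{Z}_n.$$
Applying the delta method with the mapping $f(x_1,\ldots,x_n)=(x_1^{1/p},\ldots,x_n^{1/p})$ (which is differentiable at $(\mu_p,\ldots,\mu_p)$ since $\mu_p>0$), we obtain
$$\sqrt d\Big[\Big(\frac{\|\mathbf{X}_1\|_p}{d^{1/p}},\ldots,\frac{\|\mathbf{X}_n\|_p}{d^{1/p}}\Big)-(\mu_p^{1/p},\ldots,\mu_p^{1/p})\Big]\xrightarrow{\mathcal{D}}\frac{\sigma_p\mu_p^{1/p-1}}{p}\,\mathbf{Z}_n.$$
Thus, by continuity of the maximum and minimum functions,
$$d^{1/2-1/p}\Big(\max_{1\le i\le n}\|\mathbf{X}_i\|_p-\min_{1\le i\le n}\|\mathbf{X}_i\|_p\Big)\xrightarrow{\mathcal{D}}\frac{\sigma_p\mu_p^{1/p-1}}{p}\,M_n.$$

In the previous analysis, $n$ (the sample size) was fixed whereas the dimension $d$ was allowed to grow to infinity. A natural question that arises concerns the impact of letting $n$ be a function of $d$ such that $n$ tends to infinity as $d\to\infty$ (Mallows). Proposition 5 below offers a first answer.

Proposition 5. Fix $p\ge1$, and assume that $0<\mathbb{E}|X|^{3p}<\infty$ and $\sigma_p>0$. For any sequence of positive integers $\{n(d)\}_{d\ge1}$ converging to infinity and satisfying
$$n(d)=o\Big(\frac{d^{1/5}}{\log^{6/5}d}\Big),\quad d\to\infty, \qquad (8)$$
we have
$$\frac{p\,a_n\,d^{1/2-1/p}}{\sigma_p\,\mu_p^{1/p-1}}\Big(\max_{1\le i\le n}\|\mathbf{X}_i\|_p-\min_{1\le i\le n}\|\mathbf{X}_i\|_p\Big)-2a_nb_n\xrightarrow{\mathcal{D}}E+E',$$
where $a_n$ and $b_n$ are as in (6), and $E$ and $E'$ are as in (7).

Proof. In the following, we let $\delta=1/\log d$. For future use, note that
$$\delta^2\log n\to0\quad\text{and}\quad\frac{n^5}{d\,\delta^6}\to0,\quad d\to\infty. \qquad (9)$$
In the proof we shall often suppress the dependence of $n$ and $\delta$ on $d$. For $1\le i\le n$, we set $\mathbf{X}_i=(X_{1,i},\ldots,X_{d,i})$ and $\|\mathbf{X}_i\|_p^p=\sum_{j=1}^d|X_{j,i}|^p$.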

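We see that, for $n\ge1$,
$$\Big(\frac{\|\mathbf{X}_1\|_p^p-d\mu_p}{\sqrt d\,\sigma_p},\ldots,\frac{\|\mathbf{X}_n\|_p^p-d\mu_p}{\sqrt d\,\sigma_p}\Big)=\Big(\sum_{j=1}^d\frac{|X_{j,1}|^p-\mu_p}{\sqrt d\,\sigma_p},\ldots,\sum_{j=1}^d\frac{|X_{j,n}|^p-\mu_p}{\sqrt d\,\sigma_p}\Big):=(Y_1,\ldots,Y_n)=\mathbf{Y}_n\in\mathbb{R}^n.$$
As above, let $\mathbf{Z}_n=(Z_1,\ldots,Z_n)$ be a centered Gaussian random vector in $\mathbb{R}^n$, with identity covariance matrix. Write, for $1\le j\le d$,
$$\boldsymbol{\xi}_j=\Big(\frac{|X_{j,1}|^p-\mu_p}{\sqrt d\,\sigma_p},\ldots,\frac{|X_{j,n}|^p-\mu_p}{\sqrt d\,\sigma_p}\Big)$$
and note that $\sum_{j=1}^d\boldsymbol{\xi}_j=\mathbf{Y}_n$. Set $\beta=\sum_{j=1}^d\mathbb{E}\|\boldsymbol{\xi}_j\|_2^3$. Then, by Jensen's inequality,
$$\mathbb{E}\|\boldsymbol{\xi}_j\|_2^3=\mathbb{E}\Big(\sum_{i=1}^n\frac{(|X_{j,i}|^p-\mu_p)^2}{d\,\sigma_p^2}\Big)^{3/2}\le\Big(\frac{n}{d\,\sigma_p^2}\Big)^{3/2}\mathbb{E}\big||X|^p-\mu_p\big|^3.$$
This gives that, for any $\delta>0$, possibly depending upon $n$,
$$B:=\frac{\beta n}{\delta^3}\le\frac{n^{5/2}}{\sqrt d\,\sigma_p^3\,\delta^3}\,\mathbb{E}\big||X|^p-\mu_p\big|^3.$$
Applying a result of Yurinskiĭ (1977), as formulated in Section 4 of Chapter 10 of Pollard (2001), we get that, on a suitable probability space depending on $\delta>0$ and $n\ge1$, there exist random vectors $\mathbf{Y}_n'$ and $\mathbf{Z}_n'$ satisfying $\mathbf{Y}_n'\overset{\mathcal{D}}{=}\mathbf{Y}_n$ and $\mathbf{Z}_n'\overset{\mathcal{D}}{=}\mathbf{Z}_n$ such that
$$\mathbb{P}\big\{\|\mathbf{Y}_n'-\mathbf{Z}_n'\|_2>3\delta\big\}\le CB\Big(1+\frac{|\log(1/B)|}{n}\Big), \qquad (10)$$
where $C$ is a universal constant. To avoid the use of primes, we shall from now on drop them from the notation and write $\mathbf{Y}_n=\mathbf{Y}_n'$ and $\mathbf{Z}_n=\mathbf{Z}_n'$, where it is understood that the pair $(\mathbf{Y}_n,\mathbf{Z}_n)$ satisfies inequality (10) for the given $\delta>0$. Using the fact that
$$\Big|\max_{1\le i\le n}x_i-\max_{1\le i\le n}y_i\Big|\le\Big(\sum_{i=1}^n(x_i-y_i)^2\Big)^{1/2},$$
we get, for all $\varepsilon>0$,
$$\mathbb{P}\Big\{a_n\Big|\max_{1\le i\le n}Y_i-\max_{1\le i\le n}Z_i\Big|>\varepsilon\Big\}\le\mathbb{P}\big\{\sqrt{2\log n}\,\|\mathbf{Y}_n-\mathbf{Z}_n\|_2>\varepsilon\big\}.$$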

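Thus, for all $d$ large enough,
$$\mathbb{P}\Big\{a_n\Big|\max_{1\le i\le n}Y_i-\max_{1\le i\le n}Z_i\Big|>\varepsilon\Big\}\le\mathbb{P}\big\{\sqrt{2\log n}\,\|\mathbf{Y}_n-\mathbf{Z}_n\|_2>3\delta\sqrt{2\log n}\big\}=\mathbb{P}\big\{\|\mathbf{Y}_n-\mathbf{Z}_n\|_2>3\delta\big\},$$
since $\delta\sqrt{\log n}\to0$ as $d\to\infty$. From (10), we deduce that, for all $\varepsilon>0$ and all $d$ large enough,
$$\mathbb{P}\Big\{a_n\Big|\max_{1\le i\le n}Y_i-\max_{1\le i\le n}Z_i\Big|>\varepsilon\Big\}\le CB\Big(1+\frac{|\log(1/B)|}{n}\Big).$$
But, by our choice of $\delta$ and (9), $B\big(1+|\log(1/B)|/n\big)\to0$, so that
$$a_n\Big(\max_{1\le i\le n}Y_i-\max_{1\le i\le n}Z_i\Big)=o_{\mathbb{P}}(1).$$
Similarly, one proves that
$$a_n\Big(\min_{1\le i\le n}Y_i-\min_{1\le i\le n}Z_i\Big)=o_{\mathbb{P}}(1).$$
Thus, by (7), we conclude that
$$\Big(a_n\big(\max_{1\le i\le n}Y_i-b_n\big),\;a_n\big(\min_{1\le i\le n}Y_i+b_n\big)\Big)\xrightarrow{\mathcal{D}}(E,-E'). \qquad (11)$$
Next, we have
$$\Big(a_n\big(\max_{1\le i\le n}Y_i-b_n\big),\;a_n\big(\min_{1\le i\le n}Y_i+b_n\big)\Big)=\Big(a_n\Big(\max_{1\le i\le n}\frac{\|\mathbf{X}_i\|_p^p}{\sqrt d\,\sigma_p}-\frac{\sqrt d\,\mu_p}{\sigma_p}-b_n\Big),\;a_n\Big(\min_{1\le i\le n}\frac{\|\mathbf{X}_i\|_p^p}{\sqrt d\,\sigma_p}-\frac{\sqrt d\,\mu_p}{\sigma_p}+b_n\Big)\Big)$$
$$=\Big(a_n\Big(\max_{1\le i\le n}\frac{\|\mathbf{X}_i\|_p^p}{\sqrt d\,\sigma_p}-\beta_n\Big),\;a_n\Big(\min_{1\le i\le n}\frac{\|\mathbf{X}_i\|_p^p}{\sqrt d\,\sigma_p}-\beta_n'\Big)\Big),$$
where $\beta_n=\sqrt d\,\mu_p/\sigma_p+b_n$ and $\beta_n'=\sqrt d\,\mu_p/\sigma_p-b_n$. Note that $a_n\to\infty$ and (11) imply that
$$\max_{1\le i\le n}\frac{\|\mathbf{X}_i\|_p^p}{\sqrt d\,\sigma_p}-\beta_n\xrightarrow{\mathbb{P}}0\quad\text{and}\quad\min_{1\le i\le n}\frac{\|\mathbf{X}_i\|_p^p}{\sqrt d\,\sigma_p}-\beta_n'\xrightarrow{\mathbb{P}}0. \qquad (12)$$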

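Observe also that, by a two-term Taylor expansion, for a suitable $\tilde\beta_n$ between $\beta_n$ and $\max_{1\le i\le n}\|\mathbf{X}_i\|_p^p/(\sqrt d\,\sigma_p)$,
$$p\,a_n\,\beta_n^{1-1/p}\Big(\Big(\max_{1\le i\le n}\frac{\|\mathbf{X}_i\|_p^p}{\sqrt d\,\sigma_p}\Big)^{1/p}-\beta_n^{1/p}\Big)=a_n\Big(\max_{1\le i\le n}\frac{\|\mathbf{X}_i\|_p^p}{\sqrt d\,\sigma_p}-\beta_n\Big)+\frac{1-p}{2p}\,\frac{\tilde\beta_n^{1/p-2}}{a_n\,\beta_n^{1/p-1}}\Big(a_n\Big(\max_{1\le i\le n}\frac{\|\mathbf{X}_i\|_p^p}{\sqrt d\,\sigma_p}-\beta_n\Big)\Big)^2.$$
We obtain by (11) and (12) that
$$\frac{1-p}{2p}\,\frac{\tilde\beta_n^{1/p-2}}{a_n\,\beta_n^{1/p-1}}\Big(a_n\Big(\max_{1\le i\le n}\frac{\|\mathbf{X}_i\|_p^p}{\sqrt d\,\sigma_p}-\beta_n\Big)\Big)^2=O_{\mathbb{P}}\Big(\frac1{a_n\beta_n}\Big)=o_{\mathbb{P}}(1).$$
Similarly,
$$p\,a_n\,\beta_n'^{\,1-1/p}\Big(\Big(\min_{1\le i\le n}\frac{\|\mathbf{X}_i\|_p^p}{\sqrt d\,\sigma_p}\Big)^{1/p}-\beta_n'^{\,1/p}\Big)=a_n\Big(\min_{1\le i\le n}\frac{\|\mathbf{X}_i\|_p^p}{\sqrt d\,\sigma_p}-\beta_n'\Big)+o_{\mathbb{P}}(1).$$
Keeping in mind that $\beta_n'/\beta_n\to1$, we get
$$\Big(p\,a_n\,\beta_n^{1-1/p}\Big(\Big(\max_{1\le i\le n}\frac{\|\mathbf{X}_i\|_p^p}{\sqrt d\,\sigma_p}\Big)^{1/p}-\beta_n^{1/p}\Big),\;p\,a_n\,\beta_n^{1-1/p}\Big(\Big(\min_{1\le i\le n}\frac{\|\mathbf{X}_i\|_p^p}{\sqrt d\,\sigma_p}\Big)^{1/p}-\beta_n'^{\,1/p}\Big)\Big)\xrightarrow{\mathcal{D}}(E,-E'),$$
and hence
$$p\,a_n\,\beta_n^{1-1/p}\Big(\frac{\max_{1\le i\le n}\|\mathbf{X}_i\|_p}{(\sqrt d\,\sigma_p)^{1/p}}-\frac{\min_{1\le i\le n}\|\mathbf{X}_i\|_p}{(\sqrt d\,\sigma_p)^{1/p}}-\beta_n^{1/p}+\beta_n'^{\,1/p}\Big)\xrightarrow{\mathcal{D}}E+E'.$$
Next, notice that (8) implies that $b_n/\sqrt d\to0$ as $d\to\infty$. Thus, recalling that
$$\frac{\beta_n}{\sqrt d\,\mu_p/\sigma_p}=1+\frac{b_n}{\sqrt d\,\mu_p/\sigma_p}\quad\text{and}\quad\frac{\beta_n'}{\sqrt d\,\mu_p/\sigma_p}=1-\frac{b_n}{\sqrt d\,\mu_p/\sigma_p},$$
we are led to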

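$$p\,a_n\,\beta_n^{1-1/p}\big(\beta_n^{1/p}-\beta_n'^{\,1/p}\big)=2a_nb_n+O\big(a_nb_n^2\beta_n^{-1}\big)=2a_nb_n+o(1).$$
Therefore we get
$$\frac{p\,a_n\,d^{1/2-1/p}}{\sigma_p\,\mu_p^{1/p-1}}\Big(\max_{1\le i\le n}\|\mathbf{X}_i\|_p-\min_{1\le i\le n}\|\mathbf{X}_i\|_p\Big)-2a_nb_n\xrightarrow{\mathcal{D}}E+E'.$$

4 Proof of Lemma 2

In the sequel, to lighten notation a bit, we set $\bar Y=\bar Y_d$. Choose any $\varepsilon>0$ and $\delta>0$ such that $\varphi$ has continuous derivatives $\varphi'$ and $\varphi''$ on $I_\delta=[\mu-\delta,\mu+\delta]$ and $|\varphi''(\mu)-\varphi''(x)|\le\varepsilon$ for all $x\in I_\delta$. We see by Taylor's theorem that, for $\bar Y\in I_\delta$,
$$\varphi(\bar Y)=\varphi(\mu)+\varphi'(\mu)(\bar Y-\mu)+\frac12\varphi''(\tilde\mu)(\bar Y-\mu)^2, \qquad (13)$$
where $\tilde\mu$ lies between $\bar Y$ and $\mu$. Clearly, since $\mathbb{E}(\bar Y-\mu)=0$ and $\mathbb{E}(\bar Y-\mu)^2=\sigma^2/d$,
$$\Big|\mathbb{E}\varphi(\bar Y)-\varphi(\mu)-\frac{\sigma^2\varphi''(\mu)}{2d}\Big|=\Big|\mathbb{E}\Big(\varphi(\bar Y)-\varphi(\mu)-\varphi'(\mu)(\bar Y-\mu)-\frac12\varphi''(\mu)(\bar Y-\mu)^2\Big)\Big|$$
$$\le\Big|\mathbb{E}\Big\{\Big(\varphi(\bar Y)-\varphi(\mu)-\varphi'(\mu)(\bar Y-\mu)-\frac12\varphi''(\mu)(\bar Y-\mu)^2\Big)\mathbf{1}\{\bar Y\in I_\delta\}\Big\}\Big|+\big|\mathbb{E}\varphi(\bar Y)\mathbf{1}\{\bar Y\notin I_\delta\}\big|+\big|\mathbb{E}P(\bar Y)\mathbf{1}\{\bar Y\notin I_\delta\}\big|,$$
where
$$P(y)=\varphi(\mu)+\varphi'(\mu)(y-\mu)+\frac12\varphi''(\mu)(y-\mu)^2.$$
Now, using (13) and $|\varphi''(\mu)-\varphi''(x)|\le\varepsilon$ for all $x\in I_\delta$, we may write
$$\Big|\mathbb{E}\Big\{\Big(\varphi(\bar Y)-\varphi(\mu)-\varphi'(\mu)(\bar Y-\mu)-\frac12\varphi''(\mu)(\bar Y-\mu)^2\Big)\mathbf{1}\{\bar Y\in I_\delta\}\Big\}\Big|\le\frac\varepsilon2\,\mathbb{E}(\bar Y-\mu)^2=\frac{\varepsilon\sigma^2}{2d}.$$
Next, we shall bound
$$\big|\mathbb{E}\varphi(\bar Y)\mathbf{1}\{\bar Y\notin I_\delta\}\big|+\big|\mathbb{E}P(\bar Y)\mathbf{1}\{\bar Y\notin I_\delta\}\big|:=\Delta_1+\Delta_2.$$
Recall that we assume that, for some $r>1$, condition (4) holds. In this case, by Theorem 28 on page 286 of Petrov (1975), applied with $r$ replaced by $r+1$, for all $\delta>0$,
$$\mathbb{P}\{|\bar Y-\mu|\ge\delta\}=o(d^{-r}). \qquad (14)$$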

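Then, by using Hölder's inequality, (5) and (14), we get
$$\Delta_1\le\big(\mathbb{E}|\varphi(\bar Y)|^s\big)^{1/s}\big(\mathbb{P}\{\bar Y\notin I_\delta\}\big)^{1/r}=o\Big(\frac1d\Big).$$
We shall next bound $\Delta_2$. Obviously, from (14), $|\varphi(\mu)|\,\mathbb{P}\{\bar Y\notin I_\delta\}=o(1/d)$. Furthermore, by the Cauchy-Schwarz inequality and (14),
$$\big|\mathbb{E}\varphi'(\mu)(\bar Y-\mu)\mathbf{1}\{\bar Y\notin I_\delta\}\big|\le|\varphi'(\mu)|\,\frac{\sigma}{\sqrt d}\,o\big(d^{-r/2}\big)=o\Big(\frac1d\Big),$$
and, by Hölder's inequality with exponents $(r+1)/2$ and $q=(r+1)/(r-1)$ (so that $2/(r+1)+1/q=1$), we have
$$\frac12|\varphi''(\mu)|\,\mathbb{E}(\bar Y-\mu)^2\mathbf{1}\{\bar Y\notin I_\delta\}\le\frac12|\varphi''(\mu)|\big(\mathbb{E}|\bar Y-\mu|^{r+1}\big)^{2/(r+1)}\big(\mathbb{P}\{\bar Y\notin I_\delta\}\big)^{1/q}.$$
Applying Rosenthal's inequality (see equation (2.3) in Giné et al., 2003), we obtain
$$\mathbb{E}|\bar Y-\mu|^{r+1}=\mathbb{E}\Big|\frac1d\sum_{i=1}^d(Y_i-\mu)\Big|^{r+1}\le\Big(\frac{15(r+1)}{\log(r+1)}\Big)^{r+1}\max\Big(\frac{\sigma^{r+1}}{d^{(r+1)/2}},\,\frac{\mathbb{E}|Y-\mu|^{r+1}}{d^r}\Big).$$
Thus $\big(\mathbb{E}|\bar Y-\mu|^{r+1}\big)^{2/(r+1)}=O(1/d)$ (recall that $r>1$), which, when combined with (14), gives
$$\frac12|\varphi''(\mu)|\big(\mathbb{E}|\bar Y-\mu|^{r+1}\big)^{2/(r+1)}\big(\mathbb{P}\{\bar Y\notin I_\delta\}\big)^{(r-1)/(r+1)}=o\Big(\frac1d\Big).$$
Thus $\Delta_2=o(1/d)$. Putting everything together, we conclude that, for any $\varepsilon>0$,
$$\limsup_{d\to\infty}\,d\,\Big|\mathbb{E}\varphi(\bar Y)-\varphi(\mu)-\frac{\sigma^2\varphi''(\mu)}{2d}\Big|\le\frac{\varepsilon\sigma^2}{2}.$$
Since $\varepsilon>0$ can be chosen arbitrarily small, this completes the proof.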
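The proof above relies on the tail estimate (14). For bounded summands the decay is in fact exponential, so $d^r\,\mathbb{P}\{|\bar Y_d-\mu|\ge\delta\}\to0$ for every $r$. The sketch below checks this numerically in the illustrative (assumed) case $Y\sim$ Bernoulli$(q)$, computing the binomial tail exactly through log-probabilities (`math.lgamma`) so that large $d$ causes no overflow. The function names are hypothetical.

```python
import math


def binomial_log_pmf(k: int, d: int, q: float) -> float:
    """log P{S = k} for S ~ Binomial(d, q), computed via log-gamma to avoid overflow."""
    return (math.lgamma(d + 1) - math.lgamma(k + 1) - math.lgamma(d - k + 1)
            + k * math.log(q) + (d - k) * math.log1p(-q))


def mean_tail(d: int, q: float, delta: float) -> float:
    """Exact P{|Ybar_d - q| >= delta} for Y_j i.i.d. Bernoulli(q), where Ybar_d = S / d."""
    return math.fsum(math.exp(binomial_log_pmf(k, d, q))
                     for k in range(d + 1) if abs(k / d - q) >= delta)


if __name__ == "__main__":
    q, delta, r = 0.3, 0.1, 2.0
    for d in (50, 100, 500, 1_000):
        tail = mean_tail(d, q, delta)
        print(d, tail, d ** r * tail)
```

The last printed column, $d^r$ times the tail, drops towards zero quickly, which is consistent with (14) for $r=2$.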

Acknowledgements

The authors thank the referee for pointing out a misstatement in the original version of the paper.

References

C.C. Aggarwal, A. Hinneburg, and D.A. Keim. On the surprising behavior of distance metrics in high dimensional space. In Proceedings of the 8th International Conference on Database Theory, Springer, Berlin, 2001.
K.S. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft. When is nearest neighbor meaningful? In Proceedings of the 7th International Conference on Database Theory, Springer, Berlin, 1999.
S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, Oxford.
H. David. Order Statistics. 2nd Edition. Wiley, New York.
D. François, V. Wertz, and M. Verleysen. The concentration of fractional distances. IEEE Transactions on Knowledge and Data Engineering, 19, 2007.
E. Giné, D.M. Mason, and A.Yu. Zaitsev. The L1-norm density estimator process. The Annals of Probability, 31, 2003.
A. Hinneburg, C.C. Aggarwal, and D.A. Keim. What is the nearest neighbor in high dimensional spaces? In Proceedings of the 26th International Conference on Very Large Data Bases, Morgan Kaufmann, San Francisco, 2000.
A. Kabán. Non-parametric detection of meaningless distances in high dimensional data. Statistics and Computing, 22.
R.A. Khan. Approximation for the expectation of a function of the sample mean. Statistics, 38, 2004.
C.L. Mallows. A note on asymptotic joint normality. The Annals of Mathematical Statistics, 43.
C. McDiarmid. On the method of bounded differences. In J. Siemons, editor, Surveys in Combinatorics, 1989, London Mathematical Society Lecture Note Series 141. Cambridge University Press, 1989.
V.V. Petrov. Sums of Independent Random Variables, volume 82 of Ergebnisse der Mathematik und ihrer Grenzgebiete. Springer, New York, 1975.
D. Pollard. A User's Guide to Measure Theoretic Probability. Cambridge University Press, Cambridge, 2001.
R.-D. Reiss. Approximate Distributions of Order Statistics. With Applications to Nonparametric Statistics. Springer, New York, 1989.
S.I. Resnick. Extreme Values, Regular Variation, and Point Processes. Springer, New York.
V.V. Yurinskiĭ. On the error of the Gaussian approximation for convolutions. Teoriya Veroyatnostei i ee Primeneniya, 22, 1977.
