High-Dimensional p-norms


Gérard Biau and David M. Mason

Abstract. Let $X = (X_1,\dots,X_d)$ be an $\mathbb{R}^d$-valued random vector with i.i.d. components, and let $\|X\|_p = (\sum_{j=1}^d |X_j|^p)^{1/p}$ be its $p$-norm, for $p > 0$. The impact of letting $d$ go to infinity on $\|X\|_p$ has surprising consequences, which may dramatically affect high-dimensional data processing. This effect is usually referred to as the distance concentration phenomenon in the computational learning literature. Despite a growing interest in this important question, previous work has essentially characterized the problem in terms of numerical experiments and incomplete mathematical statements. In the present paper, we solidify some of the arguments which previously appeared in the literature and offer new insights into the phenomenon.

1 Introduction

In what follows, for $x = (x_1,\dots,x_d)$ a vector of $\mathbb{R}^d$ and $0 < p < \infty$, we set
\[
\|x\|_p = \Big(\sum_{j=1}^d |x_j|^p\Big)^{1/p}. \tag{1}
\]
It is recalled that for $p \ge 1$, $\|\cdot\|_p$ is a norm on $\mathbb{R}^d$ (the $L^p$-norm), but for $0 < p < 1$ the triangle inequality does not hold, and $\|\cdot\|_p$ is then sometimes called a prenorm. In the sequel, we take the liberty to call $p$-norm a norm or prenorm of the form (1), with $p > 0$.

Gérard Biau: Université Pierre et Marie Curie, Ecole Normale Supérieure & Institut universitaire de France, e-mail: gerard.biau@upmc.fr
David M. Mason: University of Delaware, e-mail: davidm@udel.edu
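As a quick illustration of definition (1), the following sketch (ours, not part of the original text; the helper name `pnorm` is hypothetical) computes $\|x\|_p$ and exhibits the failure of the triangle inequality for $0 < p < 1$:

```python
import numpy as np

def pnorm(x, p):
    """Compute ||x||_p = (sum_j |x_j|^p)^(1/p) for p > 0."""
    return float(np.sum(np.abs(x) ** p) ** (1.0 / p))

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

# For p >= 1 the triangle inequality holds:
assert pnorm(x + y, 2.0) <= pnorm(x, 2.0) + pnorm(y, 2.0)

# For p = 1/2 it fails: ||x + y||_{1/2} = 4 > 2 = ||x||_{1/2} + ||y||_{1/2},
# which is why ||.||_p is only a prenorm when 0 < p < 1.
assert pnorm(x + y, 0.5) > pnorm(x, 0.5) + pnorm(y, 0.5)
```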

Now, let $X = (X_1,\dots,X_d)$ be an $\mathbb{R}^d$-valued random vector with i.i.d. components. The study of the probabilistic properties of $\|X\|_p$ as the dimension $d$ tends to infinity has recently witnessed an important research effort in the computational learning community (see, e.g., François et al., 2007, for a review). This activity is easily explained by the central role played by the quantity $\|X\|_p$ in the analysis of nearest neighbor search algorithms, which are currently widely used in data management and database mining. Indeed, finding the closest matching object in an $L^p$-sense is of significant importance for numerous applications, including pattern recognition, multimedia content retrieving (images, videos, etc.), data mining, fraud detection and DNA sequence analysis, just to name a few. Most of these real applications involve very high-dimensional data (for example, pictures taken by a standard camera consist of several million pixels), and the curse of dimensionality as $d$ grows tends to be a major obstacle in the development of nearest neighbor-based techniques.

The effect on $\|X\|_p$ of letting $d$ go large is usually referred to as the distance concentration phenomenon in the computational learning literature. It is in fact a quite vague term that encompasses several interpretations. For example, it has been observed by several authors (e.g., François et al., 2007) that, under appropriate moment assumptions, the so-called relative standard deviation $\sqrt{\mathrm{Var}\|X\|_p}/\mathbb{E}\|X\|_p$ tends to zero as $d$ tends to infinity. Consequently, by Chebyshev's inequality (this will be rigorously established in Section 2), for all $\varepsilon > 0$,
\[
\mathbb{P}\Big\{\Big|\frac{\|X\|_p}{\mathbb{E}\|X\|_p} - 1\Big| \ge \varepsilon\Big\} \to 0, \quad \text{as } d \to \infty.
\]
This simple result reveals that the relative error made by considering $\mathbb{E}\|X\|_p$ instead of the random value $\|X\|_p$ becomes asymptotically negligible. Therefore, high-dimensional vectors $X$ appear to be distributed on a sphere of radius $\mathbb{E}\|X\|_p$. The distance concentration phenomenon is also often expressed by considering an i.i.d. $X$ sample $X_1,\dots,X_n$ and observing that, under certain conditions, the relative contrast
\[
\frac{\max_{1\le i\le n}\|X_i\|_p - \min_{1\le i\le n}\|X_i\|_p}{\min_{1\le i\le n}\|X_i\|_p}
\]
vanishes in probability as $d$ tends to infinity, whereas the contrast
\[
\max_{1\le i\le n}\|X_i\|_p - \min_{1\le i\le n}\|X_i\|_p
\]
behaves in expectation as $d^{1/p-1/2}$ (Beyer et al., 1999; Hinneburg et al., 2000; Aggarwal et al., 2001; Kabán, 2012). Thus the ratio between the largest and smallest $p$-distances from the sample to the origin becomes negligible as the dimension increases, and all points seem to be located at approximately the same distance. This phenomenon may dramatically affect high-dimensional data processing, analysis, retrieval and indexing, insofar as these procedures rely on some notion of $p$-norm. Accordingly, serious questions are raised as to the validity of many nearest neighbor search heuristics in high dimension, a problem that can be further exacerbated by

techniques that find approximate neighbors in order to improve algorithmic performance (Beyer et al., 1999).

Even if people now have a better understanding of the distance concentration phenomenon and its practical implications, it is our belief that there is still a serious need to solidify its mathematical background. Indeed, previous work has essentially characterized the problem in terms of numerical experiments and often incomplete probabilistic statements, with missing assumptions and sometimes defective proofs. Thus, our objective in the present paper is to solidify some of the statements which previously appeared in the computational learning literature. We start in Section 2 by offering a thorough analysis of the behavior of the $p$-norm $\|X\|_p$ as a function of $p$ and the properties of the distribution of $X$ as $d \to \infty$. Section 3 is devoted to the investigation of some new asymptotic properties of the contrast $\max_{1\le i\le n}\|X_i\|_p - \min_{1\le i\le n}\|X_i\|_p$, both as $d \to \infty$ and $n \to \infty$. For the sake of clarity, most technical proofs are gathered in Section 4. The basic tools that we shall use are the law of large numbers, the central limit theorem, moment bounds for sums of i.i.d. random variables, and a coupling inequality of Yurinskiĭ (1977).

2 Asymptotic behavior of p-norms

2.1 Consistency

Throughout the document, the notations $\stackrel{P}{\to}$ and $\stackrel{D}{\to}$ stand for convergence in probability and in distribution, respectively. The notations $u_n = o(v_n)$ and $u_n = O(v_n)$ mean, respectively, that $u_n/v_n \to 0$ and $|u_n| \le C|v_n|$ for some constant $C$, as $n \to \infty$. The symbols $o_P(v_n)$ and $O_P(v_n)$ denote, respectively, a sequence of random variables $\{Y_n\}_{n\ge1}$ such that $Y_n/v_n \stackrel{P}{\to} 0$ and such that $Y_n/v_n$ is bounded in probability, as $n \to \infty$.

We start this section with a general proposition that plays a key role in the analysis.

Proposition 1. Let $\{U_d\}_{d\ge1}$ be a sequence of random variables such that $U_d \stackrel{P}{\to} a$, and let $\varphi$ be a real-valued measurable function which is continuous at $a$. Assume that:
(i) $\varphi$ is bounded on $[-M, M]$ for some $M > |a|$;
(ii) $\mathbb{E}|\varphi(U_d)| < \infty$ for all $d \ge 1$.
Then, as $d \to \infty$, $\mathbb{E}\varphi(U_d) \to \varphi(a)$ if and only if
\[
\mathbb{E}\,\varphi(U_d)\,\mathbf{1}\{|U_d| > M\} \to 0. \tag{2}
\]
Proof. The proof is easy. Condition (i) and continuity of $\varphi$ at $a$ allow us to apply the bounded convergence theorem to get

\[
\mathbb{E}\,\varphi(U_d)\,\mathbf{1}\{|U_d| \le M\} \to \varphi(a).
\]
Since
\[
\mathbb{E}\varphi(U_d) = \mathbb{E}\,\varphi(U_d)\,\mathbf{1}\{|U_d| \le M\} + \mathbb{E}\,\varphi(U_d)\,\mathbf{1}\{|U_d| > M\},
\]
the rest of the proof is obvious. $\square$

We shall now specialize the result of Proposition 1 to the case when
\[
U_d = \frac{1}{d}\sum_{j=1}^d Y_j =: \bar{Y}_d,
\]
where $\{Y_j\}_{j\ge1}$ is a sequence of i.i.d. $Y$ random variables with finite mean $\mu$. In this case, by the strong law of large numbers, $U_d \to \mu$ almost surely. The following lemma gives two sufficient conditions for (2) to hold when $U_d = \bar{Y}_d$.

Lemma 1. Let $\varphi$ be a real-valued measurable function. Assume that one of the following two conditions is satisfied:

Condition 1. The function $\varphi$ is convex on $\mathbb{R}$ and $\mathbb{E}|\varphi(Y)| < \infty$.

Condition 2. For some $s > 1$, $\limsup_{d\to\infty} \mathbb{E}|\varphi(\bar{Y}_d)|^s < \infty$.

Then (2) is satisfied for the sequence $\{\bar{Y}_d\}_{d\ge1}$ with $a = \mu$ and $M > |\mu|$.

Proof. Suppose that Condition 1 is satisfied. Then note that, by the convexity assumption,
\[
\mathbb{E}|\varphi(\bar{Y}_d)|\,\mathbf{1}\{|\bar{Y}_d| > M\} \le \frac{1}{d}\sum_{j=1}^d \mathbb{E}|\varphi(Y_j)|\,\mathbf{1}\{|\bar{Y}_d| > M\} = \mathbb{E}|\varphi(Y_1)|\,\mathbf{1}\{|\bar{Y}_d| > M\}.
\]
Since $M > |\mu|$, we conclude that with probability one, $|\varphi(Y_1)|\,\mathbf{1}\{|\bar{Y}_d| > M\} \to 0$. Also, $|\varphi(Y_1)|\,\mathbf{1}\{|\bar{Y}_d| > M\} \le |\varphi(Y_1)|$. Therefore, by the dominated convergence theorem, (2) holds.

Next, notice by Hölder's inequality, with $1/r = 1 - 1/s$, that
\[
\mathbb{E}|\varphi(\bar{Y}_d)|\,\mathbf{1}\{|\bar{Y}_d| > M\} \le \big(\mathbb{E}|\varphi(\bar{Y}_d)|^s\big)^{1/s}\,\big(\mathbb{P}\{|\bar{Y}_d| > M\}\big)^{1/r}.
\]
Since $\mathbb{P}\{|\bar{Y}_d| > M\} \to 0$, (2) immediately follows from Condition 2. $\square$

Let us now return to the distance concentration problem, which has been discussed in the introduction. Recall that we denote by $X = (X_1,\dots,X_d)$ an $\mathbb{R}^d$-valued random vector with i.i.d. components distributed as $X$. Whenever $\mathbb{E}|X|^p < \infty$ for $p > 0$, we set $\mu_p = \mathbb{E}|X|^p$. Also, when $\mathrm{Var}|X|^p < \infty$, we shall write $\sigma_p^2 = \mathrm{Var}|X|^p$. Proposition 1 and Lemma 1 yield the following corollary:

Corollary 1. Fix $p > 0$ and $r > 0$.

(i) Whenever $r/p < 1$ and $\mathbb{E}|X|^p < \infty$,
\[
\frac{\mathbb{E}\|X\|_p^r}{d^{r/p}} \to \mu_p^{r/p}, \quad \text{as } d \to \infty,
\]
whereas if $\mathbb{E}|X|^p = \infty$, then
\[
\lim_{d\to\infty} \frac{\mathbb{E}\|X\|_p^r}{d^{r/p}} = \infty.
\]

(ii) Whenever $r/p \ge 1$ and $\mathbb{E}|X|^r < \infty$,
\[
\frac{\mathbb{E}\|X\|_p^r}{d^{r/p}} \to \mu_p^{r/p}, \quad \text{as } d \to \infty,
\]
whereas if $\mathbb{E}|X|^r = \infty$, then, for all $d \ge 1$,
\[
\frac{\mathbb{E}\|X\|_p^r}{d^{r/p}} = \infty.
\]

Proof. We shall apply Proposition 1 and Lemma 1 to $Y = |X|^p$, $Y_j = |X_j|^p$, $j \ge 1$, and $\varphi(u) = |u|^{r/p}$.

Proof of (i). For the first part of (i), notice that with $s = p/r > 1$,
\[
\mathbb{E}\Big|\varphi\Big(\frac{1}{d}\sum_{j=1}^d |X_j|^p\Big)\Big|^s = \mathbb{E}\,\frac{1}{d}\sum_{j=1}^d |X_j|^p = \mathbb{E}|X|^p < \infty.
\]
This shows that sufficient Condition 2 of Lemma 1 holds, which by Proposition 1 gives the result. For the second part of (i), observe that for any $K > 0$,
\[
\mathbb{E}\Big(\frac{1}{d}\sum_{j=1}^d |X_j|^p\Big)^{r/p} \ge \mathbb{E}\Big(\frac{1}{d}\sum_{j=1}^d |X_j|^p\,\mathbf{1}\{|X_j| \le K\}\Big)^{r/p}.
\]
Observing that the right-hand side of the inequality converges to $(\mathbb{E}|X|^p\,\mathbf{1}\{|X| \le K\})^{r/p}$ as $d \to \infty$, we get for any $K > 0$
\[
\liminf_{d\to\infty}\, \mathbb{E}\Big(\frac{1}{d}\sum_{j=1}^d |X_j|^p\Big)^{r/p} \ge \big(\mathbb{E}|X|^p\,\mathbf{1}\{|X| \le K\}\big)^{r/p}.
\]

Since $K$ can be chosen arbitrarily large and we assume that $\mathbb{E}|X|^p = \infty$, we see that the conclusion holds.

Proof of (ii). For the first part of (ii), note that in this case $r/p \ge 1$, so $\varphi$ is convex. Moreover,
\[
\mathbb{E}\Big|\varphi\Big(\frac{1}{d}\sum_{j=1}^d |X_j|^p\Big)\Big| = \mathbb{E}\Big(\frac{1}{d}\sum_{j=1}^d |X_j|^p\Big)^{r/p} \le \mathbb{E}|X|^r < \infty \quad \text{(by Jensen's inequality)}.
\]
Thus sufficient Condition 1 of Lemma 1 holds, which by Proposition 1 leads to the result. For the second part of (ii), observe that if $\mathbb{E}|X|^r = \infty$, then, for all $d \ge 1$,
\[
\mathbb{E}\Big(\frac{1}{d}\sum_{j=1}^d |X_j|^p\Big)^{r/p} \ge d^{-r/p}\,\mathbb{E}|X|^r = \infty. \qquad \square
\]

Applying Corollary 1 with $p > 0$ and $r = 2$ yields the following important result:

Proposition 2. Fix $p > 0$ and assume that $0 < \mathbb{E}|X|^m < \infty$ for $m = \max(2, p)$. Then, as $d \to \infty$,
\[
\frac{\mathbb{E}\|X\|_p}{d^{1/p}} \to \mu_p^{1/p} \quad \text{and} \quad \frac{\mathbb{E}\|X\|_p^2}{d^{2/p}} \to \mu_p^{2/p},
\]
which implies
\[
\frac{\sqrt{\mathrm{Var}\|X\|_p}}{\mathbb{E}\|X\|_p} \to 0, \quad \text{as } d \to \infty.
\]

This result, when correctly stated, corresponds to Theorem 5 of François et al. (2007). It expresses the fact that the relative standard deviation converges towards zero when the dimension grows. It is known in the computational learning literature as the $p$-norm concentration in high-dimensional spaces. It is noteworthy that, by Chebyshev's inequality, for all $\varepsilon > 0$,

\[
\mathbb{P}\Big\{\Big|\frac{\|X\|_p}{\mathbb{E}\|X\|_p} - 1\Big| \ge \varepsilon\Big\} = \mathbb{P}\big\{\big|\|X\|_p - \mathbb{E}\|X\|_p\big| \ge \varepsilon\,\mathbb{E}\|X\|_p\big\} \le \frac{\mathrm{Var}\|X\|_p}{\varepsilon^2\,\mathbb{E}^2\|X\|_p} \to 0, \quad \text{as } d \to \infty. \tag{3}
\]
That is, $\|X\|_p/\mathbb{E}\|X\|_p \stackrel{P}{\to} 1$ or, in other words, the sequence $\{\|X\|_p\}_{d\ge1}$ is relatively stable (Boucheron et al., 2013). This property guarantees that the random fluctuations of $\|X\|_p$ around its expectation are of negligible size when compared to the expectation, and therefore most information about the size of $\|X\|_p$ is given by $\mathbb{E}\|X\|_p$ as $d$ becomes large.

2.2 Rates of convergence

The asymptotic concentration statement of Corollary 1 can be made more precise by means of rates of convergence, at the price of stronger moment assumptions. To reach this objective, we first need a general result to control the behavior of a function of an i.i.d. empirical mean around its true value. Thus, assume that $\{Y_j\}_{j\ge1}$ are i.i.d. $Y$ with mean $\mu$ and variance $\sigma^2$. As before, we define
\[
\bar{Y}_d = \frac{1}{d}\sum_{j=1}^d Y_j.
\]
Let $\varphi$ be a real-valued function with derivatives $\varphi'$ and $\varphi''$. Khan (2004) provides sufficient conditions for
\[
\mathbb{E}\varphi(\bar{Y}_d) = \varphi(\mu) + \frac{\varphi''(\mu)\,\sigma^2}{2d} + o\Big(\frac{1}{d}\Big)
\]
to hold. The following lemma, whose assumptions are less restrictive, can be used in place of Khan's result (2004). For the sake of clarity, its proof is postponed to Section 4.

Lemma 2. Let $\{Y_j\}_{j\ge1}$ be a sequence of i.i.d. $Y$ random variables with mean $\mu$ and variance $\sigma^2$, and let $\varphi$ be a real-valued function with continuous derivatives $\varphi'$ and $\varphi''$ in a neighborhood of $\mu$. Assume that for some $r > 1$,
\[
\mathbb{E}|Y|^{r+1} < \infty \tag{4}
\]
and, with $1/s = 1 - 1/r$,
\[
\limsup_{d\to\infty}\, \mathbb{E}|\varphi(\bar{Y}_d)|^s < \infty. \tag{5}
\]
Then, as $d \to \infty$,
\[
\mathbb{E}\varphi(\bar{Y}_d) = \varphi(\mu) + \frac{\varphi''(\mu)\,\sigma^2}{2d} + o\Big(\frac{1}{d}\Big).
\]
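Lemma 2 can be illustrated in a case where everything is available in closed form (our own numerical sketch, not part of the original text). Take $\varphi(u) = u^3$ and $Y$ standard exponential, so that $\mu = \sigma^2 = 1$ and $\mathbb{E}(Y-\mu)^3 = 2$; then $\mathbb{E}\varphi(\bar{Y}_d) = 1 + 3/d + 2/d^2$ exactly, and the remainder beyond the two-term expansion $\varphi(\mu) + \varphi''(\mu)\sigma^2/(2d) = 1 + 3/d$ is $2/d^2 = o(1/d)$, as the lemma predicts:

```python
# phi(u) = u**3 and Y ~ Exponential(1): mu = 1, sigma^2 = 1, and the third
# central moment is 2, so E[phi(Ybar_d)] = 1 + 3/d + 2/d**2 exactly.
def exact_mean_phi(d):
    return 1.0 + 3.0 / d + 2.0 / d ** 2

def lemma2_expansion(d):
    # phi(mu) + phi''(mu) * sigma^2 / (2d), with phi''(1) = 6
    return 1.0 + 6.0 / (2.0 * d)

for d in (10, 100, 1000):
    err = exact_mean_phi(d) - lemma2_expansion(d)
    assert abs(err - 2.0 / d ** 2) < 1e-12   # remainder is exactly 2/d^2
    assert d * abs(err) < 0.25               # hence d * remainder -> 0, i.e. o(1/d)
```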

The consequences of Lemma 2 in terms of $p$-norm concentration are summarized in the following proposition:

Proposition 3. Fix $p > 0$ and assume that $0 < \mathbb{E}|X|^m < \infty$ for $m = \max(4, 3p)$. Then, as $d \to \infty$,
\[
\mathbb{E}\|X\|_p = d^{1/p}\mu_p^{1/p} + O(d^{1/p - 1})
\]
and
\[
\mathrm{Var}\|X\|_p = \frac{\mu_p^{2/p - 2}\,\sigma_p^2}{p^2}\,d^{2/p - 1} + o(d^{2/p - 1}),
\]
which implies
\[
\sqrt{d}\;\frac{\sqrt{\mathrm{Var}\|X\|_p}}{\mathbb{E}\|X\|_p} \to \frac{\sigma_p}{p\,\mu_p}, \quad \text{as } d \to \infty.
\]

Proposition 3 is stated without assumptions as Theorem 6 in François et al. (2007), where it is provided with an ambiguous proof. This result shows that, for fixed large $d$, the relative standard deviation evolves with $p$ as the ratio $\sigma_p/(p\mu_p)$. For instance, when the distribution of $X$ is uniform on $[0,1]$,
\[
\mu_p = \frac{1}{p+1} \quad \text{and} \quad \sigma_p = \frac{p}{(p+1)\sqrt{2p+1}}.
\]
In that case, we conclude that
\[
\sqrt{d}\;\frac{\sqrt{\mathrm{Var}\|X\|_p}}{\mathbb{E}\|X\|_p} \to \frac{1}{\sqrt{2p+1}}.
\]
Thus, in the uniform setting, the limiting relative standard deviation is a strictly decreasing function of $p$. This observation is often interpreted by saying that $p$-norms are more concentrated for larger values of $p$. There are, however, distributions for which this is not the case. A counterexample is given by a balanced mixture of two standard Gaussian random variables with mean $-1$ and $1$, respectively (see François et al., 2007, page 881). In that case, it can be seen that the asymptotic relative standard deviation with $p \le 1$ is smaller than for values of $p \in [8, 30]$, making fractional norms more concentrated.

Proof (Proposition 3). Fix $p > 0$ and introduce the functions on $\mathbb{R}$
\[
\varphi_1(u) = |u|^{1/p} \quad \text{and} \quad \varphi_2(u) = |u|^{2/p}.
\]
Assume that $\mathbb{E}|X|^{\max(4,p)} < \infty$. Applying Corollary 1, we get that, as $d \to \infty$,
\[
\mathbb{E}\Big(\frac{1}{d}\sum_{j=1}^d |X_j|^p\Big)^{2/p} \to \mu_p^{2/p}
\]

and
\[
\mathbb{E}\Big(\frac{1}{d}\sum_{j=1}^d |X_j|^p\Big)^{4/p} \to \mu_p^{4/p}.
\]
This says that, with $s = 2$, for $i = 1, 2$,
\[
\limsup_{d\to\infty}\, \mathbb{E}\Big|\varphi_i\Big(\frac{1}{d}\sum_{j=1}^d |X_j|^p\Big)\Big|^s < \infty.
\]
Now, let $Y = |X|^p$ and set $r = 2$. If we also assume that $\mathbb{E}|Y|^{r+1} = \mathbb{E}|Y|^3 = \mathbb{E}|X|^{3p} < \infty$, we get, by applying Lemma 2 to $\varphi_1$ and $\varphi_2$, that for $i = 1, 2$,
\[
\mathbb{E}\varphi_i(\bar{Y}_d) = \varphi_i(\mu_p) + \frac{\varphi_i''(\mu_p)\,\sigma_p^2}{2d} + o\Big(\frac{1}{d}\Big).
\]
Thus, whenever $\mathbb{E}|X|^m < \infty$, where $m = \max(4, 3p)$,
\[
\mathbb{E}\,\bar{Y}_d^{1/p} = \mu_p^{1/p} + \frac{1}{p}\Big(\frac{1}{p} - 1\Big)\mu_p^{1/p - 2}\,\frac{\sigma_p^2}{2d} + o\Big(\frac{1}{d}\Big)
\]
and
\[
\mathbb{E}\,\bar{Y}_d^{2/p} = \mu_p^{2/p} + \frac{2}{p}\Big(\frac{2}{p} - 1\Big)\mu_p^{2/p - 2}\,\frac{\sigma_p^2}{2d} + o\Big(\frac{1}{d}\Big).
\]
Therefore, we see that
\[
\mathrm{Var}\,\bar{Y}_d^{1/p} = \mathbb{E}\,\bar{Y}_d^{2/p} - \mathbb{E}^2\,\bar{Y}_d^{1/p} = \frac{\mu_p^{2/p - 2}\,\sigma_p^2}{p^2\,d} + o\Big(\frac{1}{d}\Big).
\]
The identity $\|X\|_p = d^{1/p}\,\bar{Y}_d^{1/p}$, with $Y_j = |X_j|^p$, yields the desired results. $\square$

We conclude the section with a corollary, which specifies inequality (3).

Corollary 2. Fix $p > 0$.

(i) If $0 < \mathbb{E}|X|^m < \infty$ for $m = \max(4, 3p)$, then, for all $\varepsilon > 0$,
\[
\mathbb{P}\Big\{\Big|\frac{\|X\|_p}{\mathbb{E}\|X\|_p} - 1\Big| \ge \varepsilon\Big\} \le \frac{\sigma_p^2}{\varepsilon^2 p^2 \mu_p^2\,d} + o\Big(\frac{1}{d}\Big).
\]

(ii) If, for some positive constant $C$, $0 < |X| \le C$ almost surely, then, for $p \ge 1$ and all $\varepsilon > 0$,
\[
\mathbb{P}\Big\{\Big|\frac{\|X\|_p}{\mathbb{E}\|X\|_p} - 1\Big| \ge \varepsilon\Big\} \le 2\exp\Big(-\frac{\varepsilon^2\,d^{2/p - 1}\,\mu_p^{2/p}}{2C^2} + o(d^{2/p - 1})\Big).
\]
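The rates of this subsection are easy to probe by simulation. In the uniform setting discussed after Proposition 3, $\sqrt{d}\,\sqrt{\mathrm{Var}\|X\|_p}/\mathbb{E}\|X\|_p$ should be close to $1/\sqrt{2p+1}$ already for moderate $d$. A Monte Carlo sketch (ours, not part of the original text):

```python
import numpy as np

rng = np.random.default_rng(0)

def rel_std(d, p, n_sims=4000):
    """Monte Carlo relative standard deviation of ||X||_p for X with
    i.i.d. Uniform(0, 1) coordinates."""
    x = rng.uniform(size=(n_sims, d))
    norms = (x ** p).sum(axis=1) ** (1.0 / p)
    return norms.std() / norms.mean()

d = 1000
for p in (1.0, 2.0, 4.0):
    limit = 1.0 / np.sqrt(2.0 * p + 1.0)   # limiting value in the uniform case
    ratio = np.sqrt(d) * rel_std(d, p)
    # sqrt(d) times the relative standard deviation is already near its limit
    assert abs(ratio - limit) < 0.1 * limit
```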

Proof. Statement (i) is an immediate consequence of Proposition 3 and Chebyshev's inequality. Now, assume that $p \ge 1$, and let $A = [-C, C]^d$. For $x = (x_1,\dots,x_d) \in \mathbb{R}^d$, let $g : A \to \mathbb{R}$ be defined by
\[
g(x) = \|x\|_p = \Big(\sum_{j=1}^d |x_j|^p\Big)^{1/p}.
\]
Clearly, for each $1 \le j \le d$,
\[
\sup_{(x_1,\dots,x_d)\in A,\; x_j'\in[-C,C]} \big|g(x_1,\dots,x_d) - g(x_1,\dots,x_{j-1},x_j',x_{j+1},\dots,x_d)\big| = \sup_{x\in A,\; x_j'\in[-C,C]} \big|\,\|x\|_p - \|x'\|_p\,\big|,
\]
where $x'$ is identical to $x$, except on the $j$-th coordinate, where it takes the value $x_j'$. It follows, by the Minkowski inequality (which is valid here since $p \ge 1$), that
\[
\sup_{x\in A,\; x_j'\in[-C,C]} \big|\,\|x\|_p - \|x'\|_p\,\big| \le \sup_{x,x'} \|x - x'\|_p = \sup_{x_j,\,x_j'\in[-C,C]} |x_j - x_j'| \le 2C.
\]
Consequently, using the bounded difference inequality (McDiarmid, 1989), we obtain
\[
\mathbb{P}\Big\{\Big|\frac{\|X\|_p}{\mathbb{E}\|X\|_p} - 1\Big| \ge \varepsilon\Big\} = \mathbb{P}\big\{\big|\|X\|_p - \mathbb{E}\|X\|_p\big| \ge \varepsilon\,\mathbb{E}\|X\|_p\big\} \le 2\exp\Big(-\frac{2(\varepsilon\,\mathbb{E}\|X\|_p)^2}{4dC^2}\Big) = 2\exp\Big(-\frac{\varepsilon^2\,d^{2/p-1}\,\mu_p^{2/p}}{2C^2} + o(d^{2/p-1})\Big),
\]
where, in the last equality, we used Proposition 3. This concludes the proof. $\square$

3 Minima and maxima

Another important question arising in high-dimensional nearest neighbor search analysis concerns the relative asymptotic behavior of the minimum and maximum

$p$-distances to the origin within a random sample. To be precise, let $X_1,\dots,X_n$ be an i.i.d. $X$ sample, where $X = (X_1,\dots,X_d)$ is, as usual, an $\mathbb{R}^d$-valued random vector with i.i.d. components. We will be primarily interested in this section in the asymptotic properties of the difference (the contrast) $\max_{1\le i\le n}\|X_i\|_p - \min_{1\le i\le n}\|X_i\|_p$.

Assume, to start with, that $n$ is fixed and only $d$ is allowed to grow. Then an immediate application of the law of large numbers shows that, whenever $\mu_p = \mathbb{E}|X|^p < \infty$, almost surely as $d \to \infty$,
\[
\frac{\max_{1\le i\le n}\|X_i\|_p - \min_{1\le i\le n}\|X_i\|_p}{d^{1/p}} \to 0.
\]
Moreover, if $0 < \mu_p < \infty$, then
\[
\frac{\max_{1\le i\le n}\|X_i\|_p}{\min_{1\le i\le n}\|X_i\|_p} \stackrel{P}{\to} 1.
\]
The above ratio is sometimes called the relative contrast in the computational learning literature. Thus, as $d$ becomes large, all observations seem to be distributed at approximately the same $p$-distance from the origin. The concept of nearest neighbor, measured by $p$-norms in high dimension, is therefore less clear than in small dimension, with resulting computational difficulties and algorithmic inefficiencies.

These consistency results can be specified by means of asymptotic distributions. Recall that if $Z_1,\dots,Z_n$ are i.i.d. standard normal random variables, the sample range is defined to be
\[
M_n = \max_{1\le i\le n} Z_i - \min_{1\le i\le n} Z_i.
\]
The asymptotic distribution of $M_n$ is well known (see, e.g., David, 1981). Namely, for any $x$ one has
\[
\lim_{n\to\infty} \mathbb{P}\big\{\sqrt{2\log n}\,\big(M_n - 2\sqrt{2\log n}\big) + \log\log n + \log 4\pi \le x\big\} = \int_{-\infty}^{\infty} \exp\big(-t - e^{-t} - e^{-(x-t)}\big)\,dt.
\]
For future reference, we shall sketch the proof of this fact here. It is well known that, with
\[
a_n = \sqrt{2\log n} \quad \text{and} \quad b_n = \sqrt{2\log n} - \frac{\log\log n + \log 4\pi}{2\sqrt{2\log n}}, \tag{6}
\]
we have
\[
\Big(a_n\big(\max_{1\le i\le n} Z_i - b_n\big),\; a_n\big(\min_{1\le i\le n} Z_i + b_n\big)\Big) \stackrel{D}{\to} (E, -E'), \tag{7}
\]
where $E$ and $E'$ are independent, $E \stackrel{D}{=} E'$ and $\mathbb{P}\{E \le x\} = \exp(-\exp(-x))$, $-\infty < x < \infty$. The asymptotic independence of the maximum and minimum can be inferred from Theorem 4.2.8 of Reiss (1989), and the asymptotic distribution part from Example 2 on page 71 of Resnick (1987). From (7) we get

\[
a_n\Big(\max_{1\le i\le n} Z_i - \min_{1\le i\le n} Z_i - 2b_n\Big) \stackrel{D}{\to} E + E'.
\]
Clearly,
\[
\mathbb{P}\{E + E' \le x\} = \int_{-\infty}^{\infty} \exp\big(-e^{-(x-t)}\big)\exp\big(-e^{-t}\big)\,e^{-t}\,dt = \int_{-\infty}^{\infty} \exp\big(-t - e^{-t} - e^{-(x-t)}\big)\,dt.
\]
Our first result treats the case when $n$ is fixed and $d \to \infty$.

Proposition 4. Fix $p > 0$, and assume that $0 < \mathbb{E}|X|^p < \infty$ and $0 < \sigma_p < \infty$. Then, for fixed $n$, as $d \to \infty$,
\[
d^{1/2 - 1/p}\Big(\max_{1\le i\le n}\|X_i\|_p - \min_{1\le i\le n}\|X_i\|_p\Big) \stackrel{D}{\to} \frac{\sigma_p\,\mu_p^{1/p - 1}}{p}\,M_n.
\]

To our knowledge, this is the first statement of this type in the analysis of high-dimensional nearest neighbor problems. In fact, most of the existing results merely bound the asymptotic expectation of the normalized difference and ratio between the max and the min, but with bounds which are unfortunately not of the same order in $n$ as soon as $n \ge 3$ (see, e.g., Theorem 3 in Hinneburg et al., 2000). One of the consequences of Proposition 4 is that, for fixed $n$, the difference between the farthest and nearest neighbors does not necessarily go to zero in probability as $d$ tends to infinity. Indeed, we see that the size of $\max_{1\le i\le n}\|X_i\|_p - \min_{1\le i\le n}\|X_i\|_p$ grows as $d^{1/p - 1/2}$. For example, this difference increases with dimensionality as $\sqrt{d}$ for the $L^1$ (Manhattan) metric and remains stable in distribution for the $L^2$ (Euclidean) metric. It tends to infinity in probability for $p < 2$ and to zero for $p > 2$. This observation is in line with the conclusions of Hinneburg et al. (2000), who argue that nearest neighbor search in a high-dimensional space tends to be meaningless for norms with larger exponents, since the maximum observed distance tends towards the minimum one. It should be noted, however, that the variance of the limiting distribution depends on the value of $p$.

Remark 1. Let $Z_1,\dots,Z_n$ be i.i.d. standard normal random variables, and let $R_n = \max_{1\le i\le n} Z_i / \min_{1\le i\le n} Z_i$. Assuming $\mu_p > 0$ and $0 < \sigma_p < \infty$, one can prove, using the same technique, that
\[
\frac{\max_{1\le i\le n}\|X_i\|_p - d^{1/p}\mu_p^{1/p}}{\min_{1\le i\le n}\|X_i\|_p - d^{1/p}\mu_p^{1/p}} \stackrel{D}{\to} R_n.
\]
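The growth rate in Proposition 4 is visible in a small simulation (our sketch, not part of the original text): since the contrast grows as $d^{1/p-1/2}$, quadrupling $d$ should roughly double the mean contrast for $p = 1$ and leave it roughly unchanged for $p = 2$:

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_contrast(d, p, n=5, n_sims=1000):
    """Monte Carlo mean of max_i ||X_i||_p - min_i ||X_i||_p for a sample
    of n points with i.i.d. Uniform(0, 1) coordinates."""
    x = rng.uniform(size=(n_sims, n, d))
    norms = (x ** p).sum(axis=2) ** (1.0 / p)
    return float((norms.max(axis=1) - norms.min(axis=1)).mean())

r1 = mean_contrast(1600, 1.0) / mean_contrast(400, 1.0)   # exponent 1/2
r2 = mean_contrast(1600, 2.0) / mean_contrast(400, 2.0)   # exponent 0
assert abs(r1 - 2.0) < 0.2
assert abs(r2 - 1.0) < 0.1
```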

Proof (Proposition 4). Denote by $\mathbf{Z}_n$ a centered Gaussian random vector in $\mathbb{R}^n$ with identity covariance matrix. By the central limit theorem, as $d \to \infty$,
\[
\sqrt{d}\,\Big(\Big[\frac{\|X_1\|_p^p}{d},\dots,\frac{\|X_n\|_p^p}{d}\Big] - \big[\mu_p,\dots,\mu_p\big]\Big) \stackrel{D}{\to} \sigma_p\,\mathbf{Z}_n.
\]
Applying the delta method with the mapping $f(x_1,\dots,x_n) = (x_1^{1/p},\dots,x_n^{1/p})$, which is differentiable at $(\mu_p,\dots,\mu_p)$ since $\mu_p > 0$, we obtain
\[
\sqrt{d}\,\Big(\Big[\Big(\frac{\|X_1\|_p^p}{d}\Big)^{1/p},\dots,\Big(\frac{\|X_n\|_p^p}{d}\Big)^{1/p}\Big] - \big[\mu_p^{1/p},\dots,\mu_p^{1/p}\big]\Big) \stackrel{D}{\to} \frac{\sigma_p\,\mu_p^{1/p-1}}{p}\,\mathbf{Z}_n.
\]
Thus, by continuity of the maximum and minimum functions,
\[
d^{1/2 - 1/p}\Big(\max_{1\le i\le n}\|X_i\|_p - \min_{1\le i\le n}\|X_i\|_p\Big) \stackrel{D}{\to} \frac{\sigma_p\,\mu_p^{1/p-1}}{p}\,M_n. \qquad \square
\]

In the previous analysis, $n$ (the sample size) was fixed whereas the dimension $d$ was allowed to grow to infinity. A natural question that arises concerns the impact of letting $n$ be a function of $d$ such that $n(d)$ tends to infinity as $d \to \infty$ (Mallows, 1972). Proposition 5 below offers a first answer.

Proposition 5. Fix $p \ge 1$, and assume that $0 < \mathbb{E}|X|^{3p} < \infty$ and $\sigma_p > 0$. For any sequence of positive integers $\{n(d)\}_{d\ge1}$ converging to infinity and satisfying
\[
n(d) = o\big(d^{1/5}\log^{-6/5} d\big), \quad \text{as } d \to \infty, \tag{8}
\]
we have
\[
\frac{p\,a_n\,d^{1/2 - 1/p}\,\mu_p^{1 - 1/p}}{\sigma_p}\Big(\max_{1\le i\le n}\|X_i\|_p - \min_{1\le i\le n}\|X_i\|_p\Big) - 2a_n b_n \stackrel{D}{\to} E + E',
\]
where $a_n$ and $b_n$ are as in (6), and $E$ and $E'$ are as in (7).

Proof. In the following, we let $\delta = 1/\log d$. For future use, note that
\[
\delta^2 \log n \to 0 \quad \text{and} \quad \frac{n^5}{d\,\delta^6} \to 0, \quad \text{as } d \to \infty. \tag{9}
\]
In the proof, we shall often suppress the dependence of $n$ and $\delta$ on $d$. For $1 \le i \le n$, we set $X_i = (X_{1,i},\dots,X_{d,i})$ and $\|X_i\|_p^p = \sum_{j=1}^d |X_{j,i}|^p$.

We see that, for $n \ge 1$,
\[
\Big(\frac{\|X_1\|_p^p - d\mu_p}{\sigma_p\sqrt{d}},\dots,\frac{\|X_n\|_p^p - d\mu_p}{\sigma_p\sqrt{d}}\Big) = \Big(\sum_{j=1}^d \frac{|X_{j,1}|^p - \mu_p}{\sigma_p\sqrt{d}},\dots,\sum_{j=1}^d \frac{|X_{j,n}|^p - \mu_p}{\sigma_p\sqrt{d}}\Big) =: (Y_1,\dots,Y_n) = \mathbf{Y}_n \in \mathbb{R}^n.
\]
As above, let $\mathbf{Z}_n = (Z_1,\dots,Z_n)$ be a centered Gaussian random vector in $\mathbb{R}^n$ with identity covariance matrix. Write, for $1 \le j \le d$,
\[
\xi_j = \Big(\frac{|X_{j,1}|^p - \mu_p}{\sigma_p\sqrt{d}},\dots,\frac{|X_{j,n}|^p - \mu_p}{\sigma_p\sqrt{d}}\Big)
\]
and note that $\sum_{j=1}^d \xi_j = \mathbf{Y}_n$. Set $\beta = \sum_{j=1}^d \mathbb{E}\|\xi_j\|_2^3$. Then, by Jensen's inequality,
\[
\mathbb{E}\|\xi_j\|_2^3 = \mathbb{E}\Big(\sum_{i=1}^n \frac{(|X_{j,i}|^p - \mu_p)^2}{\sigma_p^2\,d}\Big)^{3/2} \le \Big(\frac{n}{\sigma_p^2\,d}\Big)^{3/2}\,\mathbb{E}\big||X|^p - \mu_p\big|^3.
\]
This gives that, for any $\delta > 0$, possibly depending upon $d$ and $n$,
\[
B := \beta\,n\,\delta^{-3} \le \frac{n^{5/2}}{\sigma_p^3\,\sqrt{d}\,\delta^3}\,\mathbb{E}\big||X|^p - \mu_p\big|^3.
\]
Applying a result of Yurinskiĭ (1977), as formulated in Section 4 of Chapter 10 of Pollard (2001), we get that on a suitable probability space (depending on $\delta > 0$ and $n \ge 1$) there exist random vectors $\mathbf{Y}_n'$ and $\mathbf{Z}_n'$ satisfying $\mathbf{Y}_n' \stackrel{D}{=} \mathbf{Y}_n$ and $\mathbf{Z}_n' \stackrel{D}{=} \mathbf{Z}_n$ such that
\[
\mathbb{P}\big\{\|\mathbf{Y}_n' - \mathbf{Z}_n'\|_2 > 3\delta\big\} \le C\,B\Big(1 + \frac{|\log B|}{n}\Big), \tag{10}
\]
where $C$ is a universal constant. To avoid the use of primes, we shall from now on drop them from the notation and write $\mathbf{Y}_n \stackrel{D}{=} \mathbf{Y}_n'$ and $\mathbf{Z}_n \stackrel{D}{=} \mathbf{Z}_n'$, where it is understood that the pair $(\mathbf{Y}_n, \mathbf{Z}_n)$ satisfies inequality (10) for the given $\delta > 0$. Using the fact that
\[
\Big|\max_{1\le i\le n} x_i - \max_{1\le i\le n} y_i\Big| \le \sqrt{\sum_{i=1}^n (x_i - y_i)^2},
\]
we get, for all $\varepsilon > 0$,
\[
\mathbb{P}\Big\{a_n\Big|\max_{1\le i\le n} Y_i - \max_{1\le i\le n} Z_i\Big| > \varepsilon\Big\} \le \mathbb{P}\big\{\sqrt{2\log n}\,\|\mathbf{Y}_n - \mathbf{Z}_n\|_2 > \varepsilon\big\}.
\]

Thus, for all $d$ large enough,
\[
\mathbb{P}\Big\{a_n\Big|\max_{1\le i\le n} Y_i - \max_{1\le i\le n} Z_i\Big| > \varepsilon\Big\} \le \mathbb{P}\big\{\sqrt{2\log n}\,\|\mathbf{Y}_n - \mathbf{Z}_n\|_2 > 3\delta\sqrt{2\log n}\big\} \quad \big(\text{since } \delta\sqrt{\log n} \to 0 \text{ as } d \to \infty\big)
\]
\[
= \mathbb{P}\big\{\|\mathbf{Y}_n - \mathbf{Z}_n\|_2 > 3\delta\big\}.
\]
From (10), we deduce that for all $\varepsilon > 0$ and all $d$ large enough,
\[
\mathbb{P}\Big\{a_n\Big|\max_{1\le i\le n} Y_i - \max_{1\le i\le n} Z_i\Big| > \varepsilon\Big\} \le C\,B\Big(1 + \frac{|\log B|}{n}\Big).
\]
But, by our choice of $\delta$ and (9),
\[
B\Big(1 + \frac{|\log B|}{n}\Big) \to 0,
\]
so that
\[
a_n\Big|\max_{1\le i\le n} Y_i - \max_{1\le i\le n} Z_i\Big| = o_P(1).
\]
Similarly, one proves that
\[
a_n\Big|\min_{1\le i\le n} Y_i - \min_{1\le i\le n} Z_i\Big| = o_P(1).
\]
Thus, by (7), we conclude that
\[
\Big(a_n\big(\max_{1\le i\le n} Y_i - b_n\big),\; a_n\big(\min_{1\le i\le n} Y_i + b_n\big)\Big) \stackrel{D}{\to} (E, -E'). \tag{11}
\]
Next, we have
\[
\Big(a_n\big(\max_{1\le i\le n} Y_i - b_n\big),\; a_n\big(\min_{1\le i\le n} Y_i + b_n\big)\Big) = \Big(a_n\Big(\frac{\max_{1\le i\le n}\|X_i\|_p^p}{\sigma_p\sqrt{d}} - \beta_n\Big),\; a_n\Big(\frac{\min_{1\le i\le n}\|X_i\|_p^p}{\sigma_p\sqrt{d}} - \beta_n'\Big)\Big),
\]
where
\[
\beta_n = \frac{\sqrt{d}\,\mu_p}{\sigma_p} + b_n \quad \text{and} \quad \beta_n' = \frac{\sqrt{d}\,\mu_p}{\sigma_p} - b_n.
\]
Note that $a_n \to \infty$ and (11) imply that
\[
\frac{\max_{1\le i\le n}\|X_i\|_p^p}{\sigma_p\sqrt{d}} - \beta_n \stackrel{P}{\to} 0 \quad \text{and} \quad \frac{\min_{1\le i\le n}\|X_i\|_p^p}{\sigma_p\sqrt{d}} - \beta_n' \stackrel{P}{\to} 0. \tag{12}
\]

Observe also that, by a two-term Taylor expansion, for a suitable $\tilde{\beta}_n$ between $\beta_n$ and $\max_{1\le i\le n}\|X_i\|_p^p/(\sigma_p\sqrt{d})$,
\[
p\,a_n\,\beta_n^{1 - 1/p}\Big(\Big(\frac{\max_{1\le i\le n}\|X_i\|_p^p}{\sigma_p\sqrt{d}}\Big)^{1/p} - \beta_n^{1/p}\Big) = a_n\Big(\frac{\max_{1\le i\le n}\|X_i\|_p^p}{\sigma_p\sqrt{d}} - \beta_n\Big) + a_n\,\frac{1 - p}{2p}\,\beta_n^{1 - 1/p}\,\tilde{\beta}_n^{1/p - 2}\Big(\frac{\max_{1\le i\le n}\|X_i\|_p^p}{\sigma_p\sqrt{d}} - \beta_n\Big)^2.
\]
We obtain by (11) and (12) that
\[
a_n\,\frac{1 - p}{2p}\,\beta_n^{1 - 1/p}\,\tilde{\beta}_n^{1/p - 2}\Big(\frac{\max_{1\le i\le n}\|X_i\|_p^p}{\sigma_p\sqrt{d}} - \beta_n\Big)^2 = O_P\Big(\frac{1}{a_n\,\beta_n}\Big) = o_P(1),
\]
so that
\[
p\,a_n\,\beta_n^{1 - 1/p}\Big(\Big(\frac{\max_{1\le i\le n}\|X_i\|_p^p}{\sigma_p\sqrt{d}}\Big)^{1/p} - \beta_n^{1/p}\Big) = a_n\Big(\frac{\max_{1\le i\le n}\|X_i\|_p^p}{\sigma_p\sqrt{d}} - \beta_n\Big) + o_P(1).
\]
Similarly,
\[
p\,a_n\,(\beta_n')^{1 - 1/p}\Big(\Big(\frac{\min_{1\le i\le n}\|X_i\|_p^p}{\sigma_p\sqrt{d}}\Big)^{1/p} - (\beta_n')^{1/p}\Big) = a_n\Big(\frac{\min_{1\le i\le n}\|X_i\|_p^p}{\sigma_p\sqrt{d}} - \beta_n'\Big) + o_P(1).
\]
Keeping in mind that $\beta_n/\beta_n' \to 1$, we get, combining the last two displays with (11),
\[
p\,a_n\,\beta_n^{1 - 1/p}\Big(\frac{\max_{1\le i\le n}\|X_i\|_p}{(\sigma_p\sqrt{d})^{1/p}} - \frac{\min_{1\le i\le n}\|X_i\|_p}{(\sigma_p\sqrt{d})^{1/p}} - \beta_n^{1/p} + (\beta_n')^{1/p}\Big) \stackrel{D}{\to} E + E'.
\]
Next, notice that (8) implies that $b_n/\sqrt{d} \to 0$, as $d \to \infty$. Thus, recalling that
\[
\frac{\beta_n}{\sqrt{d}\,\mu_p/\sigma_p} = 1 + \frac{b_n}{\sqrt{d}\,\mu_p/\sigma_p} \quad \text{and} \quad \frac{\beta_n'}{\sqrt{d}\,\mu_p/\sigma_p} = 1 - \frac{b_n}{\sqrt{d}\,\mu_p/\sigma_p},
\]
we are led to

\[
p\,a_n\,\beta_n^{1 - 1/p}\big(\beta_n^{1/p} - (\beta_n')^{1/p}\big) = 2a_n b_n + O\big(a_n b_n^2\,\beta_n^{-1}\big) = 2a_n b_n + o(1).
\]
Therefore, we get
\[
\frac{p\,a_n\,d^{1/2 - 1/p}\,\mu_p^{1 - 1/p}}{\sigma_p}\Big(\max_{1\le i\le n}\|X_i\|_p - \min_{1\le i\le n}\|X_i\|_p\Big) - 2a_n b_n \stackrel{D}{\to} E + E'. \qquad \square
\]

4 Proof of Lemma 2

In the sequel, to lighten notation a bit, we set $\bar{Y} = \bar{Y}_d$. Choose any $\varepsilon > 0$ and $\delta > 0$ such that $\varphi$ has continuous derivatives $\varphi'$ and $\varphi''$ on $I_\delta = [\mu - \delta, \mu + \delta]$ and $|\varphi''(\mu) - \varphi''(x)| \le \varepsilon$ for all $x \in I_\delta$. We see by Taylor's theorem that, for $\bar{Y} \in I_\delta$,
\[
\varphi(\bar{Y}) = \varphi(\mu) + \varphi'(\mu)(\bar{Y} - \mu) + 2^{-1}\varphi''(\tilde{\mu})(\bar{Y} - \mu)^2, \tag{13}
\]
where $\tilde{\mu}$ lies between $\bar{Y}$ and $\mu$. Clearly,
\[
\Big|\mathbb{E}\varphi(\bar{Y}) - \varphi(\mu) - \frac{\sigma^2\varphi''(\mu)}{2d}\Big| = \Big|\mathbb{E}\Big(\varphi(\bar{Y}) - \big[\varphi(\mu) + \varphi'(\mu)(\bar{Y} - \mu) + 2^{-1}\varphi''(\mu)(\bar{Y} - \mu)^2\big]\Big)\Big|
\]
\[
\le \mathbb{E}\Big|\varphi(\bar{Y}) - \big[\varphi(\mu) + \varphi'(\mu)(\bar{Y} - \mu) + 2^{-1}\varphi''(\mu)(\bar{Y} - \mu)^2\big]\Big|\,\mathbf{1}\{\bar{Y} \in I_\delta\} + \mathbb{E}|\varphi(\bar{Y})|\,\mathbf{1}\{\bar{Y} \notin I_\delta\} + \mathbb{E}|P(\bar{Y})|\,\mathbf{1}\{\bar{Y} \notin I_\delta\},
\]
where
\[
P(y) = \varphi(\mu) + \varphi'(\mu)(y - \mu) + 2^{-1}\varphi''(\mu)(y - \mu)^2.
\]
Now, using (13) and $|\varphi''(\mu) - \varphi''(x)| \le \varepsilon$ for all $x \in I_\delta$, we may write
\[
\mathbb{E}\Big|\varphi(\bar{Y}) - \big[\varphi(\mu) + \varphi'(\mu)(\bar{Y} - \mu) + 2^{-1}\varphi''(\mu)(\bar{Y} - \mu)^2\big]\Big|\,\mathbf{1}\{\bar{Y} \in I_\delta\} \le \frac{\varepsilon}{2}\,\mathbb{E}(\bar{Y} - \mu)^2 = \frac{\varepsilon\sigma^2}{2d}.
\]
Next, we shall bound
\[
\mathbb{E}|\varphi(\bar{Y})|\,\mathbf{1}\{\bar{Y} \notin I_\delta\} + \mathbb{E}|P(\bar{Y})|\,\mathbf{1}\{\bar{Y} \notin I_\delta\} =: I_1 + I_2.
\]
Recall that we assume that, for some $r > 1$, condition (4) holds. In this case, by Theorem 28 on page 286 of Petrov (1975), applied with $r$ replaced by $r + 1$, for all $\delta > 0$,
\[
\mathbb{P}\big\{|\bar{Y} - \mu| \ge \delta\big\} = o(d^{-r}). \tag{14}
\]

Then, by using Hölder's inequality, (5) and (14), we get
\[
I_1 \le \big(\mathbb{E}|\varphi(\bar{Y})|^s\big)^{1/s}\,\big(\mathbb{P}\{\bar{Y} \notin I_\delta\}\big)^{1/r} = o(d^{-1}).
\]
We shall next bound $I_2$. Obviously, from (14),
\[
|\varphi(\mu)|\,\mathbb{P}\{\bar{Y} \notin I_\delta\} = o(d^{-1}).
\]
Furthermore, by the Cauchy–Schwarz inequality and (14),
\[
\mathbb{E}\big|\varphi'(\mu)(\bar{Y} - \mu)\big|\,\mathbf{1}\{\bar{Y} \notin I_\delta\} \le |\varphi'(\mu)|\,\frac{\sigma}{\sqrt{d}}\,o(d^{-r/2}) = o(d^{-1}),
\]
and, by Hölder's inequality with $p = (r+1)/2$ and $q^{-1} = 1 - p^{-1} = 1 - 2/(r+1) = (r-1)/(r+1)$, we have
\[
2^{-1}|\varphi''(\mu)|\,\mathbb{E}(\bar{Y} - \mu)^2\,\mathbf{1}\{\bar{Y} \notin I_\delta\} \le 2^{-1}|\varphi''(\mu)|\,\big(\mathbb{E}|\bar{Y} - \mu|^{r+1}\big)^{2/(r+1)}\,\big(\mathbb{P}\{\bar{Y} \notin I_\delta\}\big)^{1/q}.
\]
Applying Rosenthal's inequality (see equation 2.3 in Giné et al., 2003), we obtain
\[
\mathbb{E}|\bar{Y} - \mu|^{r+1} = \mathbb{E}\Big|\frac{1}{d}\sum_{i=1}^d (Y_i - \mu)\Big|^{r+1} \le \Big(\frac{15(r+1)}{\log(r+1)}\Big)^{r+1} \max\Big(d^{-(r+1)/2}\big(\mathbb{E}(Y - \mu)^2\big)^{(r+1)/2},\; d^{-r}\,\mathbb{E}|Y - \mu|^{r+1}\Big).
\]
Thus $\big(\mathbb{E}|\bar{Y} - \mu|^{r+1}\big)^{2/(r+1)} = O(d^{-1})$, which, when combined with (14), gives
\[
2^{-1}|\varphi''(\mu)|\,\big(\mathbb{E}|\bar{Y} - \mu|^{r+1}\big)^{2/(r+1)}\,\big(\mathbb{P}\{\bar{Y} \notin I_\delta\}\big)^{(r-1)/(r+1)} = o(d^{-1}).
\]
Thus $I_2 = o(d^{-1})$. Putting everything together, we conclude that, for any $\varepsilon > 0$,
\[
\limsup_{d\to\infty}\, d\,\Big|\mathbb{E}\varphi(\bar{Y}) - \varphi(\mu) - \frac{\sigma^2\varphi''(\mu)}{2d}\Big| \le \frac{\varepsilon\sigma^2}{2}.
\]
Since $\varepsilon > 0$ can be chosen arbitrarily small, this completes the proof. $\square$
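The Rosenthal-type moment bound used above can also be checked numerically (our sketch, not part of the original text). With $r = 2$, it gives $\mathbb{E}|\bar{Y}_d - \mu|^3 = O(d^{-3/2})$ in the dominant regime, so quadrupling $d$ should shrink the third absolute central moment by a factor of about $4^{3/2} = 8$:

```python
import numpy as np

rng = np.random.default_rng(2)

def abs_central_moment(d, k=3, n_sims=10000):
    """Monte Carlo estimate of E|Ybar_d - mu|^k for Y ~ Uniform(0, 1)."""
    ybar = rng.uniform(size=(n_sims, d)).mean(axis=1)
    return float(np.mean(np.abs(ybar - 0.5) ** k))

ratio = abs_central_moment(250) / abs_central_moment(1000)
assert 6.0 < ratio < 10.0   # consistent with the d^(-3/2) rate
```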

Acknowledgements. The authors thank the referee for pointing out a misstatement in the original version of the paper.

References

C.C. Aggarwal, A. Hinneburg, and D.A. Keim. On the surprising behavior of distance metrics in high dimensional space. In Proceedings of the 8th International Conference on Database Theory, pages 420-434, Berlin, 2001. Springer.
K.S. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft. When is "nearest neighbor" meaningful? In Proceedings of the 7th International Conference on Database Theory, pages 217-235, Berlin, 1999. Springer.
S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, Oxford, 2013.
H. David. Order Statistics. 2nd Edition. Wiley, New York, 1981.
D. François, V. Wertz, and M. Verleysen. The concentration of fractional distances. IEEE Transactions on Knowledge and Data Engineering, 19:873-886, 2007.
E. Giné, D.M. Mason, and A.Yu. Zaitsev. The L1-norm density estimator process. The Annals of Probability, 31:719-768, 2003.
A. Hinneburg, C.C. Aggarwal, and D.A. Keim. What is the nearest neighbor in high dimensional spaces? In Proceedings of the 26th International Conference on Very Large Data Bases, pages 506-515, San Francisco, 2000. Morgan Kaufmann.
A. Kabán. Non-parametric detection of meaningless distances in high dimensional data. Statistics and Computing, 22:375-385, 2012.
R.A. Khan. Approximation for the expectation of a function of the sample mean. Statistics, 38:117-122, 2004.
C.L. Mallows. A note on asymptotic joint normality. The Annals of Mathematical Statistics, 43:508-515, 1972.
C. McDiarmid. On the method of bounded differences. In J. Siemons, editor, Surveys in Combinatorics, 1989, London Mathematical Society Lecture Note Series 141, pages 148-188. Cambridge University Press, 1989.
V.V. Petrov. Sums of Independent Random Variables, volume 82 of Ergebnisse der Mathematik und ihrer Grenzgebiete. Springer, New York, 1975.
D. Pollard. A User's Guide to Measure Theoretic Probability. Cambridge University Press, Cambridge, 2001.
R.-D. Reiss. Approximate Distributions of Order Statistics. With Applications to Nonparametric Statistics. Springer, New York, 1989.
S.I. Resnick. Extreme Values, Regular Variation, and Point Processes. Springer, New York, 1987.
V.V. Yurinskiĭ. On the error of the Gaussian approximation for convolutions. Teoriya Veroyatnostei i ee Primeneniya, 22:242-253, 1977.