Concentration Inequalities for Poisson Functionals


Dissertation for the attainment of the doctoral degree (Dr. rer. nat.) of the Department of Mathematics/Computer Science of the Universität Osnabrück, submitted by Sascha Bachmann from Mettingen. Osnabrück, 2015.


Foreword

The content of the present thesis is the result of two years of research that I did as a PhD student at the Institute for Mathematics of Osnabrück University. These two years have been an interesting and inspiring time, and I wish to first of all thank several people who contributed to this enriching period of my life.

Acknowledgments. I wish to thank my supervisor Matthias Reitzner for guiding me during my PhD studies, for being a source of motivation and inspiration, and in particular for drawing my attention to the fruitful topic of concentration inequalities for Poisson functionals. I also thank Giovanni Peccati for an amazingly productive and inspiring collaboration, as well as for inviting me to several research visits to the University of Luxembourg. Moreover, I wish to thank Günter Last, Matthias Schulte and Christoph Thäle for interesting discussions around topics related to the present thesis; I particularly thank Günter Last for drawing my attention to the study of concentration properties for component counts. In addition to this, I wish to thank my colleagues in Osnabrück for providing such a pleasant and welcoming working atmosphere, which certainly contributed significantly to the success of my PhD studies. In particular, I thank my colleague Gilles Bonnet, who shared an office with me during the last two years, for many interesting conversations about both mathematical and non-mathematical topics. I also thank my friends for all the great times we have spent together during the last years, and in particular for the countless movie nights that provided a lot of balance to my work. Furthermore, I wish to thank my parents for always believing in me and for their constant support in all situations of life. Last, but certainly not least, I very much thank my wife Saskia for always being by my side, for supporting me particularly in work-intensive times, and for being this uniquely special person to me.

Underlying papers. The presentation in this thesis is based on, and coincides in large part with, the content of the following three papers:

[Bac15] S. Bachmann. Concentration for Poisson functionals: component counts in random geometric graphs. Preprint (accepted for publication in Stochastic Process. Appl.), arXiv: v1 [math.PR].

[BP15] S. Bachmann and G. Peccati. Concentration Bounds for Geometric Poisson Functionals: Logarithmic Sobolev Inequalities Revisited. Preprint (accepted for publication in Electron. J. Probab.), arXiv: v1 [math.PR].

[BR15] S. Bachmann and M. Reitzner. Concentration for Poisson U-Statistics: Subgraph Counts in Random Geometric Graphs. Preprint (submitted to Ann. Appl. Probab.), arXiv: v1 [math.PR].

Whenever appropriate, at the beginning of a chapter or a section of the present thesis, a paragraph entitled "References to the underlying papers" clarifies which parts of the chapter or section correspond to which parts of the above papers.

Contents

Foreword
Chapter 1. Introduction
  1.1. Historical overview
  1.2. Random geometric graphs
  1.3. Outline
Chapter 2. Preliminaries
  2.1. Background from point process theory
  2.2. Background from probability theory
Chapter 3. General Methods for Concentration Inequalities
  3.1. Logarithmic Sobolev inequalities for concentration
  3.2. Concentration for U-statistics
  3.3. Concentration for the convex distance in Poisson-based models
Chapter 4. Concentration for Random Geometric Graphs
  4.1. Edge counting
  4.2. Length power functionals
  4.3. Length of random intersection graphs
  4.4. Subgraph counts and beyond
  4.5. Component counts
Chapter 5. Strong Laws for Random Geometric Graphs
  5.1. Overview of the literature on asymptotic results
  5.2. Subgraph counts
  5.3. Component counts
References
List of Notations


CHAPTER 1

Introduction

When dealing with a random quantity $F$, a fundamental task is to reliably predict the uncertain outcome. To do so, a natural first step is to describe $F$ in terms of some mean value that is representative of the distribution of $F$. For numerous well established reasons, the expectation or a median of $F$ are good choices for such a mean value. However, representing the random outcome by some mean value $M$, we do not know whether we can rely on this prediction, or whether $F$ may still take a value far away from $M$ with high probability. Therefore, it is very natural to ask for upper bounds on the probabilities

$$P(F \geq M + r) \quad \text{and} \quad P(F \leq M - r),$$

where $r \geq 0$ is a real number. Estimates of this type are commonly referred to as concentration inequalities for the upper tail and the lower tail of $F$, respectively. Further terms used for such inequalities are e.g. tail bounds, tail estimates or concentration estimates. One naturally aims for concentration inequalities with a fast decay, and a lot of related research particularly focuses on tail bounds that decay exponentially. A substantial contribution of the present thesis is to provide new methods for proving exponentially decaying concentration inequalities for random variables $F = F(\eta)$ that are functions of a Poisson point process $\eta$. As applications of the developed techniques, concentration inequalities and strong laws of large numbers for several quantities associated with random geometric graphs are established.

1.1. Historical overview

While the well-known Chebyshev inequality and also the Markov inequality yield tail bounds with a power law decay, the study of exponentially decaying tail inequalities originates from the investigation of sums of independent random variables. Early work on concentration properties of the latter objects was done in the 1950s and 1960s by Chernoff [19], Bennett [4], and Hoeffding [34], only to mention some of the most important pioneers in the field (see [10, Section 2.12] for further reading and details). Stimulated by these seminal studies, the development of new techniques to establish exponential concentration estimates also for more general functions of independent random variables has been at the core of a collaborative effort of many authors during the last decades. These techniques include methods based on martingale differences, which were in particular pioneered in the 1970s and 1980s by Yurinskiĭ [77], Maurey

[56], Milman and Schechtman [59]. Another approach based on couplings, commonly referred to as the transportation method, was developed in 1986 by Marton [53]. Further groundbreaking contributions were made by Talagrand [73] in the year 1995, where he in particular introduced the notion of convex distance, which turned out to be a remarkable tool for proving concentration inequalities.

An approach to derive concentration inequalities that is particularly relevant for the present thesis is based on logarithmic Sobolev inequalities together with a technique suggested by I. Herbst in an unpublished letter, which is known as the Herbst argument. This method, now commonly referred to as the entropy method, was initiated in 1984 with the publication [21] by Davies and Simon, and further pioneered from the late 1990s until today particularly by Ledoux [50-52], and by Boucheron, Lugosi and Massart [9, 11, 12]. The above historical overview on exponential tail bounds for functions of independent random variables is based on [10, Chapter 1 and Section 2.12], and we refer the interested reader to this reference for further details.

While the research described above provides a rich collection of concentration techniques for random variables that are functions of independent random elements, so far only a few authors have studied concentration properties of random variables that can be written as a function of a Poisson point process, usually referred to as Poisson functionals. Early seminal work on concentration for Poisson functionals was done in the year 2000 by Wu [75], extending previous breakthrough findings by Ané, Bobkov and Ledoux [1, 5]. The approach in the latter references is based on combining new logarithmic Sobolev inequalities for Poisson point processes with the Herbst argument, thus initiating the entropy method for Poisson functionals. Further contributions related to this method were made by Reynaud-Bouret and Houdré [36, 66], and also by Breton, Houdré and Privault [14, 35], where in the latter articles the authors combine the Herbst argument with integration by parts formulas instead of logarithmic Sobolev inequalities. The general concentration results that are obtained using this entropy method for Poisson processes only apply under very strong assumptions on the considered Poisson functionals. One of the main insights of the present thesis is that, by carefully combining the famous Mecke formula for Poisson processes with Wu's logarithmic Sobolev inequality, a significant amount of flexibility can be added to the entropy method. This allows one to adapt techniques for functions of independent random variables that were in particular established by Boucheron, Lugosi and Massart in [11, 12] and also by Maurer in [55] to the framework of Poisson point processes.

A further approach for establishing concentration inequalities for Poisson functionals is based on an adaptation of Talagrand's notion of convex distance to the setting of Poisson point processes that was recently suggested by Reitzner in [63]. First examples of how to use this method to obtain tail estimates for Poisson functionals are worked out by Lachièze-Rey, Reitzner, Schulte and Thäle in [45, 65], where concentration inequalities for certain U-statistics built over Poisson processes with finite intensity measures are derived. The techniques used in the latter references will

be enhanced in the present thesis, with the particular goal of deriving concentration inequalities that also hold if the underlying Poisson process has an infinite intensity measure.

We conclude this historical overview on the topic of concentration inequalities by mentioning the recent work [27] by Eichelsbacher, Raič and Schreiber, where concentration techniques for so-called stabilizing Poisson functionals based on cumulant expansions and cluster measures are investigated. We stress that in the latter work, only Poisson processes with finite intensity measures are considered.

1.2. Random geometric graphs

References to the underlying papers. The present section partially coincides with [Bac15, pp. 1-3] as well as [BR15, pp. 1-3].

The wish to understand concentration properties of several quantities that are naturally of interest when studying random geometric graphs has been a main motivation for the research presented in this thesis. In the following, we therefore give a brief introduction to the topic of random geometric graphs. In particular, we provide some historical notes and important references, point out important applications, and also introduce some of the random quantities that will be at the core of the investigations in this work.

Random geometric graphs have been studied extensively for some decades now. In the simplest version of these graphs, the vertices are given by a random set of points in $\mathbb{R}^d$, and two vertices are connected by an edge if their distance is less than a fixed positive real number. This model was introduced in the year 1961 by Gilbert in [30], and since then many authors have contributed to various directions of research on random geometric graphs. For a historical overview of the topic we refer the reader to the book [61] by Penrose. Recent contributions are e.g. [23, 43, 44, 65].

It is a well established fact that numerous real world phenomena can be modeled by means of a random geometric graph, like for example the spread of a disease or a fire (see e.g. [3, 29]). Also, as communication networks such as wireless and sensor networks have become increasingly important in recent years, random geometric graphs have gained considerable attention, since they provide natural models for these objects (see e.g. [18, 31, 60]). Further applications arise from cluster analysis, where one aims to divide a given set of objects into groups (or clusters) such that objects within the same group are similar to each other (see e.g. [7, 8] for further reading). If the objects are represented by points in $\mathbb{R}^d$, one way to perform this task is to build a geometric graph over the points and to take the connected components of the graph as the clusters. For the purpose of statistical inference, a probabilistic theory for the connected components of the graph is needed.

There are several ways to choose the random set of points in $\mathbb{R}^d$ that determines the vertices of the considered random geometric graph. The two most prominent

choices discussed in the literature are either to take independent and identically distributed (i.i.d.) points in $\mathbb{R}^d$ as the vertices, or to let the vertex set be given by a Poisson point process on $\mathbb{R}^d$. Naturally, since this work deals with concentration for Poisson functionals, we will focus throughout on the second variant of the model.

One class of random variables that will be investigated in the present thesis is given by the subgraph counts associated with random geometric graphs. For a fixed finite connected graph $H$, the corresponding subgraph count is the random variable that counts the number of subgraphs of the random geometric graph that are isomorphic to $H$. These random variables have been studied by many authors; see [61, Chapter 3] for a historical overview of related results. The subgraph counts are examples of Poisson U-statistics that satisfy some nice additional properties. Another interesting U-statistic associated with a random geometric graph is the total edge length, defined as the sum of all edge lengths.

A further class of quantities studied in the present thesis will be referred to as component counts. For example, one can consider the number of connected components of the graph with at most $k$ (or alternatively with exactly $k$) vertices. Moreover, our analysis covers the number of components that are isomorphic to a fixed finite connected graph $H$. Early work on the latter random variables was done by R. Hafner in [32], and further related results are presented in [61]. (A small simulation of these quantities is sketched at the end of this section.)

One might think that the study of subgraph and component counts needs to be restricted to finite graphs to ensure that all occurring variables are finite. We stress that this is not the case. There are, for example, Poisson processes on $\mathbb{R}^d$ such that the associated random geometric graph has a.s. infinitely many vertices and edges but still a.s. a finite number of triangles, and hence also a finite number of components that are isomorphic to a triangle. This phenomenon seems to be quite unexplored so far. In the context of concentration properties, a natural question is whether concentration inequalities also hold in these situations. We emphasize that the tail estimates for subgraph and component counts presented in this thesis only require that the considered random variables are almost surely finite and hence cover such cases. This is a particular advantage over previous concentration results for Poisson U-statistics from [45, 65], where the considered Poisson processes need to have finite intensity measures.

We conclude this introductory discussion on the topic of random geometric graphs by mentioning a closely related field of research. In recent years, the study of random geometric simplicial complexes built on random point sets in $\mathbb{R}^d$ has attracted considerable attention; see e.g. [24, 39, 40, 76] and in particular the survey [6] by Bobrowski and Kahle on this topic. Motivations to study these complexes arise particularly from topological data analysis (for a survey on this see [15]), but also from other applications of geometric complexes like sensor networks (see e.g. [22]). In order to investigate these random topological objects, it frequently turns out that results and properties of the underlying random geometric graphs can be very useful. The findings and methods presented in this thesis may therefore also be of interest for future research on random simplicial complexes.
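To make the quantities discussed above concrete, the following sketch builds a random geometric graph over a simulated Poisson sample and evaluates the edge count, a subgraph count, and two component counts. It is a minimal illustration only, not part of the thesis: it assumes a homogeneous Poisson process on the unit square with intensity lam and connection radius r, and all helper names are ours.

import math
import random
from itertools import combinations

rng = random.Random(0)

def poisson(mean):
    # Count unit-rate exponential arrivals in [0, mean]; the count is Poisson(mean).
    n, t = 0, rng.expovariate(1.0)
    while t <= mean:
        n, t = n + 1, t + rng.expovariate(1.0)
    return n

# Poisson process on the unit square: a Poisson(lam) number of i.i.d. uniform
# points; the random geometric graph connects points at distance less than r.
lam, r = 100.0, 0.08
pts = [(rng.random(), rng.random()) for _ in range(poisson(lam))]
edges = [(i, j) for i, j in combinations(range(len(pts)), 2)
         if math.dist(pts[i], pts[j]) < r]
adj = {i: set() for i in range(len(pts))}
for i, j in edges:
    adj[i].add(j)
    adj[j].add(i)

# Subgraph count for H = triangle: vertex triples spanning pairwise edges.
triangles = sum(1 for i, j, k in combinations(range(len(pts)), 3)
                if j in adj[i] and k in adj[i] and k in adj[j])

# Component counts: split the vertex set into connected components.
seen, comps = set(), []
for v in range(len(pts)):
    if v not in seen:
        comp, stack = set(), [v]
        while stack:
            w = stack.pop()
            if w not in comp:
                comp.add(w)
                stack.extend(adj[w] - comp)
        seen |= comp
        comps.append(comp)

# Components isomorphic to a triangle, and components with at most k = 3 vertices.
tri_comps = sum(1 for c in comps if len(c) == 3
                and all(b in adj[a] for a, b in combinations(sorted(c), 2)))
small_comps = sum(1 for c in comps if len(c) <= 3)

print(len(edges), triangles, tri_comps, small_comps)

Note that the triangle subgraph count dominates the triangle component count, since a component isomorphic to a triangle is in particular a triangle subgraph, but not conversely.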

1.3. Outline

In the following, we will briefly outline the content of the present thesis.

The second chapter aims to provide some mathematical background that is needed to read the thesis. The primary goal of Section 2.1 is to introduce the notion of a Poisson point process on a general measure space. In addition to this, in Section 2.2 we recall some basic concepts and facts from probability theory.

In the third chapter, a large collection of general concentration results is presented that may serve as a toolbox for deriving concentration inequalities for Poisson functionals. The proofs of the statements presented in Section 3.1 are based on inequalities involving the entropy of the considered random variables, which are derived using Wu's logarithmic Sobolev inequality together with the Mecke formula. Via variations of the Herbst argument, these entropy inequalities can be regarded as differential inequalities for the moment generating function, which yield exponential concentration inequalities for the considered functionals when combined with Markov's inequality. The procedure just described is commonly referred to as the entropy method; the novelty here is the use of the Mecke formula, which adds a lot of flexibility to the method.

Section 3.2 deals with concentration techniques for Poisson U-statistics. In addition to various results based on the entropy method, we will also present an approach based on the convex distance for Poisson processes, resulting in concentration inequalities with respect to a median instead of the expectation of the considered functionals. Interestingly, the condition on the random variables that needs to be satisfied for the latter inequalities to hold basically coincides with the condition that appears in one of the results derived using the entropy method.

In Section 3.3 we re-prove the convex distance inequality for Poisson processes that was established by Reitzner in [63]. This inequality is at the core of the method for proving concentration estimates via the convex distance, and we will point out how this result can be obtained (up to slightly worse constants) using the entropy method. In particular, the presented proof of the convex distance inequality only uses results from the theory of Poisson point processes and does not involve approximations by binomial processes, thus answering the question posed by Reitzner in [63] of whether such a direct proof is possible.

The fourth chapter provides a variety of applications of the general concentration methods established in the third chapter to quantities associated with random geometric graphs. The most basic of these quantities, studied in Section 4.1, is the number of edges of the graph. In order to obtain concentration inequalities for this random

variable, we will apply the results established in Chapter 3. A class of functionals that contains the number of edges as a special case is given by the length power functionals, which are investigated in Section 4.2. Also the total edge length of the graph, defined as the sum of the lengths of all edges, is a length power functional and is thus covered by the analysis in this section. The method used in Section 4.1 to obtain tail estimates for the number of edges cannot be adapted to the more general length power functionals. Therefore, the alternative approach from Section 3.2.5 is used to derive concentration inequalities.

In Section 4.3 we consider a slightly different model for random geometric graphs. The vertices in this model are again given by some Poisson point process on $\mathbb{R}^d$, and any two distinct vertices $x$ and $y$ are connected by an edge whenever their distance is at most $\rho(x) + \rho(y)$, where $\rho(x) = (\|x\| + 1)^{-\gamma}$ and $\gamma > 0$ is some fixed parameter. The functional studied now is the total edge length of the resulting random graph. Interestingly, in this model one can have almost surely infinitely many edges and still a finite total edge length. However, following again the approach from Section 3.2.5, we will obtain concentration inequalities for the total edge length, regardless of whether there are almost surely finitely or infinitely many edges.

Section 4.4 aims to generalize the method that was used to derive tail estimates for the number of edges in Section 4.1 to arbitrary subgraph counts. Actually, concentration inequalities for an even more general class of Poisson U-statistics will be established in this section. The tail estimates for subgraph counts will then follow as a special case. Finally, in Section 4.5 we will prove new concentration results for component counts associated with random geometric graphs. The class of functionals considered here contains in particular the number of connected components of the graph that are isomorphic to a fixed finite graph $H$. A further example covered by the analysis in this section is the number of components with at most $k$ vertices (or alternatively with exactly $k$ vertices).

In the fifth chapter we will use the concentration inequalities for subgraph and component counts proved in the fourth chapter to derive strong laws of large numbers for suitably rescaled versions of the considered random variables. These asymptotic results complement some statements from [61], where comparable strong laws are proved for the case where the considered random geometric graphs are built over independently and identically distributed points in $\mathbb{R}^d$. An overview of related asymptotic results for subgraph and component counts is contained in Section 5.1 of this chapter.

CHAPTER 2

Preliminaries

This chapter provides some mathematical background that is needed for reading the present thesis. However, basic notions and results from probability and measure theory will not be recalled, and the reader's knowledge is assumed to be at least at an undergraduate level in these areas of mathematics.

2.1. Background from point process theory

The fundamental notion introduced in this section is that of a Poisson point process on a general measure space. We also stress that the notations and definitions established in the following will be used throughout the present thesis. The upcoming presentation particularly follows the references [20, Chapter VI], [41, Chapter 12] and [46, Section 1], which are also recommended for further reading as well as for proofs of the presented statements.

2.1.1. Random measures and point processes. Throughout, we denote by $(\Omega, \mathcal{A}, P)$ a probability space, meaning that $\Omega$ is a set, $\mathcal{A}$ is a $\sigma$-algebra on $\Omega$ and $P$ is a probability measure on the measurable space $(\Omega, \mathcal{A})$. Denote by $\overline{\mathbb{R}}_+ = [0, \infty]$ the extended positive real line equipped with the Borel $\sigma$-algebra $\mathcal{B}(\overline{\mathbb{R}}_+)$. A random measure on a measurable space $(\mathbb{X}, \mathcal{X})$ is a map

$$\eta : \Omega \times \mathcal{X} \to \overline{\mathbb{R}}_+$$

such that for any $B \in \mathcal{X}$, the map $\eta(B) : \Omega \to \overline{\mathbb{R}}_+$, $\omega \mapsto \eta(\omega, B)$ is a random variable, and for any $\omega \in \Omega$, the map $\eta_\omega : \mathcal{X} \to \overline{\mathbb{R}}_+$, $B \mapsto \eta(\omega, B)$ is a measure on $(\mathbb{X}, \mathcal{X})$. We will assume throughout that the considered random measures are $\sigma$-finite, meaning that there exists a partition $(B_i)_{i \in \mathbb{N}}$ of $\mathbb{X}$ with $B_i \in \mathcal{X}$ such that almost surely $\eta(B_i) < \infty$ for all $i \in \mathbb{N}$. Denote by $\mathbf{M}$ the set of $\sigma$-finite measures on the measurable space $(\mathbb{X}, \mathcal{X})$, equipped with the $\sigma$-algebra $\mathcal{M}$ generated by the maps

$$\mathbf{M} \to \overline{\mathbb{R}}_+, \quad \xi \mapsto \xi(B),$$

where $B$ ranges over all sets in $\mathcal{X}$. Now, a random measure $\eta$ can also be regarded as a measurable map $\eta : \Omega \to \mathbf{M}$, $\omega \mapsto \eta_\omega$. The latter interpretation is consistent with the idea that a random measure is a random element in $\mathbf{M}$. A random measure $\eta$ is called integer-valued if $\eta_\omega$ takes only values in $\mathbb{N}_0 \cup \{\infty\}$ for almost every $\omega \in \Omega$. A point process is an integer-valued random measure.

Let $\eta$ be a random measure on $(\mathbb{X}, \mathcal{X})$. The intensity measure or mean measure of $\eta$ is the measure $\mu$ on $(\mathbb{X}, \mathcal{X})$ defined by $\mu(B) = E\eta(B)$, $B \in \mathcal{X}$, where $E$ denotes the expectation with respect to $P$. In terms of indicator functions, the above equality can be written as $\int 1_B \, d\mu = E \int 1_B \, d\eta$, and a basic result on random measures states that this equality extends to arbitrary non-negative measurable functions, i.e.

$$\int f \, d\mu = E \int f \, d\eta$$

holds for any measurable map $f : \mathbb{X} \to \overline{\mathbb{R}}_+$. This result is occasionally referred to as Campbell's theorem (see e.g. [68, Theorem 3.1.2]).

2.1.2. Poisson point processes. Recall that a discrete real random variable $Z$ is Poisson distributed with parameter $\tau \in (0, \infty)$ if

$$P(Z = k) = \frac{\tau^k e^{-\tau}}{k!}, \quad k \in \mathbb{N}_0.$$

We extend this definition by saying that $Z$ is Poisson distributed with parameter $\tau = 0$ if almost surely $Z = 0$. Moreover, we say that $Z$ is Poisson distributed with parameter $\tau = \infty$ if almost surely $Z = \infty$. A Poisson (point) process $\eta$ on $(\mathbb{X}, \mathcal{X})$ is a point process that satisfies:

(i) The random variables $\eta(B_1), \dots, \eta(B_n)$ are stochastically independent whenever $B_1, \dots, B_n \in \mathcal{X}$ are pairwise disjoint;
(ii) The random variable $\eta(B)$ is Poisson distributed for any $B \in \mathcal{X}$.

Note that, since we assume all random measures to be $\sigma$-finite, the intensity measure of any Poisson process is also $\sigma$-finite. Conversely, one can prove (see e.g. [20, Theorem 2.15 in Chapter VI]) that for any $\sigma$-finite measure $\mu$ on $(\mathbb{X}, \mathcal{X})$ there exists a Poisson process on $(\mathbb{X}, \mathcal{X})$ with intensity measure $\mu$. Moreover, according to [41, Lemma 12.1], the intensity measure of a Poisson process uniquely determines its distribution. Another basic fact is that any Poisson process $\eta$ can be written as a sum

$$(2.1.1) \quad \eta = \sum_{i=1}^{\eta(\mathbb{X})} \delta_{X_i},$$

where $(X_i)_{i \in \mathbb{N}}$ is a family of random elements in $\mathbb{X}$ and $\delta_z$ denotes the Dirac measure associated with a point $z \in \mathbb{X}$ (see e.g. [46, pp. 2-3]).
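For a homogeneous Poisson process on a bounded window, the representation (2.1.1) translates directly into a two-step simulation recipe: draw a Poisson number of points, then place them i.i.d. according to the normalized intensity measure. The sketch below is a minimal illustration under the assumption of a constant intensity on the unit square; it is ours, not the thesis's, and it also checks the two defining properties (i) and (ii) empirically.

import random
import statistics

rng = random.Random(1)
lam = 50.0  # intensity measure mu = lam * (Lebesgue measure on [0,1]^2)

def poisson(mean):
    # Count unit-rate exponential arrivals in [0, mean]; the count is Poisson(mean).
    n, t = 0, rng.expovariate(1.0)
    while t <= mean:
        n, t = n + 1, t + rng.expovariate(1.0)
    return n

def sample_process():
    # Representation (2.1.1): a sum of Dirac masses at N i.i.d. uniform points.
    return [(rng.random(), rng.random()) for _ in range(poisson(lam))]

# Empirical check on the disjoint sets B1 = left half, B2 = right half.
counts1, counts2 = [], []
for _ in range(20000):
    eta = sample_process()
    counts1.append(sum(1 for x, y in eta if x < 0.5))
    counts2.append(sum(1 for x, y in eta if x >= 0.5))

m1, v1 = statistics.fmean(counts1), statistics.variance(counts1)
cov = statistics.covariance(counts1, counts2)
# For eta(B1): mean and variance should both be close to mu(B1) = lam/2 = 25
# (a Poisson signature), and the covariance of the disjoint counts close to 0.
print(round(m1, 2), round(v1, 2), round(cov, 2))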

This representation implies that a Poisson process is indeed a random element in the space of $\sigma$-finite integer-valued discrete measures

$$\mathbf{N} = \left\{ \xi = \sum_i \delta_{x_i} : x_i \in \mathbb{X} \text{ for all } i \text{ and } \xi \text{ is } \sigma\text{-finite} \right\},$$

equipped with the $\sigma$-algebra $\mathcal{N}$ generated by the maps $\mathbf{N} \to \overline{\mathbb{R}}_+$, $\xi \mapsto \xi(B)$, where $B$ ranges over all sets in $\mathcal{X}$.

For the remainder of this section, let $\eta$ denote a Poisson process on the space $(\mathbb{X}, \mathcal{X})$ with intensity measure $\mu$. An extremely useful identity that is at the very core of the findings presented in the present work is the so-called Mecke formula, which was proved by J. Mecke (see [58, Satz 3.1]). According to this result, for any measurable map $H : \mathbf{N} \times \mathbb{X} \to \overline{\mathbb{R}}_+$ one has

$$(2.1.2) \quad E\left[ \int H(\eta, x) \, d\eta(x) \right] = \int E[H(\eta + \delta_x, x)] \, d\mu(x).$$

To state a multivariate generalization of the Mecke formula, for $k \in \mathbb{N}$, we introduce a further point process $\eta^{(k)}$ on the product space $(\mathbb{X}^k, \mathcal{X}^k)$. For this purpose, we use the representation (2.1.1) of the Poisson process as a sum $\eta = \sum_{i=1}^{\eta(\mathbb{X})} \delta_{X_i}$. Now let $[\eta(\mathbb{X})]^k$ be the random set of $k$-tuples $(i_1, \dots, i_k)$ such that the $i_j$ are pairwise distinct indices satisfying $1 \leq i_j \leq \eta(\mathbb{X})$. To define the point process $\eta^{(k)}$, let

$$\eta^{(k)}(B_k) = \sum_{(i_1, \dots, i_k) \in [\eta(\mathbb{X})]^k} 1\{(X_{i_1}, \dots, X_{i_k}) \in B_k\}$$

for any $B_k \in \mathcal{X}^k$. The Slivnyak-Mecke formula states that for any measurable map $H : \mathbf{N} \times \mathbb{X}^k \to \overline{\mathbb{R}}_+$,

$$(2.1.3) \quad E\left[ \int_{\mathbb{X}^k} H(\eta, x_1, \dots, x_k) \, d\eta^{(k)}(x_1, \dots, x_k) \right] = \int_{\mathbb{X}^k} E\left[ H\left( \eta + \sum_{i=1}^k \delta_{x_i}, x_1, \dots, x_k \right) \right] d\mu(x_1) \cdots d\mu(x_k).$$

A proof of the Slivnyak-Mecke formula for the case where the Poisson process $\eta$ is simple (this concept is explained below) and the space $\mathbb{X}$ is a locally compact topological space with a countable base can be found in [68, Corollary 3.2.3]. It is stated in [48, p. 670] that this identity extends to general Poisson processes, resulting in the formula presented above.

The objects of study in the present thesis are random variables that factorize over the Poisson process $\eta$. More precisely, we call a random variable $F$ a Poisson functional if there exists a measurable map $f : \mathbf{N} \to \mathbb{R}$ such that almost surely $F = f(\eta)$. The map $f$ is then called a representative of $F$. When considering a Poisson functional $F$, we will usually implicitly assume that a suitable representative $f$ of $F$ is chosen without mentioning this explicitly. Then, we will refer to the representative $f$ via the notation $F(\xi) = f(\xi)$ for any $\xi \in \mathbf{N}$. A particular consequence of this convention is that almost surely $F(\eta) = F$.
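The Mecke formula (2.1.2) can be verified numerically for simple choices of H. The sketch below, ours and purely illustrative, takes mu = lam times Lebesgue measure on [0,1] and H(eta, x) = x times the total number of points; for this choice both sides of (2.1.2) equal lam(lam + 1)/2, and the two Monte Carlo estimates should agree up to sampling error.

import random
import statistics

rng = random.Random(2)
lam = 5.0  # intensity measure mu = lam * (Lebesgue measure on [0, 1])

def poisson(mean):
    n, t = 0, rng.expovariate(1.0)
    while t <= mean:
        n, t = n + 1, t + rng.expovariate(1.0)
    return n

def sample_eta():
    return [rng.random() for _ in range(poisson(lam))]

def H(eta, x):
    # Test function H(eta, x) = x * eta([0,1]): depends on the location x and on
    # the configuration eta through the total number of points.
    return x * len(eta)

runs = 100_000
lhs, rhs = [], []
for _ in range(runs):
    eta = sample_eta()
    # Left-hand side of (2.1.2): the integral of H against eta is a sum over points.
    lhs.append(sum(H(eta, x) for x in eta))
    # Right-hand side: E[H(eta + delta_x, x)] integrated dmu(x), estimated by
    # Monte Carlo with x uniform on [0,1] and the density factor lam.
    x = rng.random()
    rhs.append(lam * H(sample_eta() + [x], x))

# Both sides equal lam * (lam + 1) / 2 = 15 for this choice of H.
print(round(statistics.fmean(lhs), 2), round(statistics.fmean(rhs), 2))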

2.1.3. Simple point processes. Throughout this subsection, we assume that $\{x\} \in \mathcal{X}$ for all $x \in \mathbb{X}$. Then, we call an integer-valued discrete measure $\xi \in \mathbf{N}$ simple if $\xi(\{x\}) \leq 1$ for all $x \in \mathbb{X}$. Clearly, a simple measure $\xi$ is completely determined by the (countable) set of its atoms

$$(2.1.4) \quad \{x \in \mathbb{X} : \xi(\{x\}) > 0\},$$

which we will call the atom-set of $\xi$. It is therefore common practice to identify a simple measure $\xi$ with its atom-set, which in particular allows one to use convenient set notations such as $\xi \cap B$ for some $B \in \mathcal{X}$ and also $x \in \xi$. A point process $\eta$ on $\mathbb{X}$ is called simple if the measure $\eta_\omega$ is simple for almost every $\omega \in \Omega$. So, a simple point process is a random simple measure on $\mathbb{X}$ and, identifying each $\eta_\omega$ with its atom-set as described above, we will also regard such a point process as a random countable subset of $\mathbb{X}$. For a Poisson point process $\eta$, the property of being simple is equivalent to its intensity measure $\mu$ being non-atomic, meaning that $\mu(\{x\}) = 0$ for all $x \in \mathbb{X}$ (see e.g. [20, Theorem 2.17 in Chapter VI]).

When dealing with simple point processes, it is often convenient, and more intuitive, to think of integrals with respect to the point process as random (possibly infinite) sums. Indeed, if $\eta$ is a simple point process on $\mathbb{X}$ and $f : \mathbb{X} \to \overline{\mathbb{R}}_+$ is a measurable map, then the integral of $f$ with respect to $\eta$ is given by the sum

$$\int f(x) \, d\eta(x) = \sum_{x \in \eta} f(x).$$

Similarly, the left-hand side of the Mecke formula (2.1.2) for simple Poisson processes can be rewritten using a sum instead of an integral. More generally, for a simple Poisson process $\eta$ on $\mathbb{X}$, the Slivnyak-Mecke formula (2.1.3) becomes

$$(2.1.5) \quad E \sum_{(x_1, \dots, x_k) \in \eta^k_{\neq}} H(\eta, x_1, \dots, x_k) = \int_{\mathbb{X}^k} E\left[ H\left( \eta + \sum_{i=1}^k \delta_{x_i}, x_1, \dots, x_k \right) \right] d\mu(x_1) \cdots d\mu(x_k),$$

where the random set $\eta^k_{\neq}$ is defined by

$$\eta^k_{\neq} := \{(x_1, \dots, x_k) \in \mathbb{X}^k : \eta(\{x_i\}) > 0 \text{ for all } i \text{ and } x_i \neq x_j \text{ whenever } i \neq j\}.$$

2.2. Background from probability theory

As already stated, the reader of the present thesis is assumed to have basic knowledge of probability and measure theory. Nevertheless, in this section we will briefly recall

some notions and results that are particularly relevant for the upcoming investigations.

2.2.1. Uniform integrability. The following presentation is based on [20, Section 3 in Chapter II] and [41, Chapter 4]. Consider a family of real random variables $(X_n)_{n \in \mathbb{N}}$ defined on a common probability space. Then $(X_n)_{n \in \mathbb{N}}$ is called uniformly integrable if

$$\lim_{t \to \infty} \sup_{n \in \mathbb{N}} E(|X_n| \, 1\{|X_n| > t\}) = 0.$$

We will use the following result, which is a special case of [41, Proposition 4.12].

Proposition 2.2.1. Let $(X_n)_{n \in \mathbb{N}}$ be a family of integrable random variables that converges in probability to some random variable $X$. Then the following statements are equivalent:
(i) $(X_n)_{n \in \mathbb{N}}$ is uniformly integrable;
(ii) $E|X_n| \to E|X|$ as $n \to \infty$.

As is for example pointed out in [20, Remark 3.13 (e) in Chapter II], a sufficient condition for a family of random variables $(X_n)_{n \in \mathbb{N}}$ to be uniformly integrable is that it is $L^p$-bounded for some $p > 1$, meaning that

$$(2.2.1) \quad \sup_{n \in \mathbb{N}} E|X_n|^p < \infty.$$

Moreover, according to [20, Remark 3.13 (c) in Chapter II], any uniformly integrable family of random variables $(X_n)_{n \in \mathbb{N}}$ satisfies $\sup_{n \in \mathbb{N}} E|X_n| < \infty$. From this together with Proposition 2.2.1 we immediately obtain the following result, which will be a crucial tool on several occasions.

Proposition 2.2.2. Let $(X_n)_{n \in \mathbb{N}}$ be a family of integrable random variables that is $L^p$-bounded for some $p > 1$. Assume that $X_n$ converges in probability to some random variable $X$. Then $E|X| < \infty$ and $EX_n \to EX$ as $n \to \infty$.

2.2.2. Complete convergence and the Borel-Cantelli lemma. In Chapter 5 of the present thesis, strong laws of large numbers will be proved, which are formulated in terms of complete convergence. Following the definition in [61, p. 15], we say that a sequence of random variables $(X_n)_{n \in \mathbb{N}}$ converges completely to some $a \in \mathbb{R}$ if for any $\varepsilon > 0$,

$$\sum_{n \in \mathbb{N}} P(|X_n - a| > \varepsilon) < \infty.$$

We write $X_n \xrightarrow{c.c.} a$ as $n \to \infty$ if the sequence $(X_n)_{n \in \mathbb{N}}$ converges completely to $a$. A basic fact from probability theory is that complete convergence implies almost sure convergence (denoted by $\xrightarrow{a.s.}$). We will see below that this implication easily follows

from the next result, well known as the Borel-Cantelli lemma (see e.g. [41, Theorem 3.18] together with [41, Theorem 3.13]).

Theorem 2.2.3. Let $(\Omega, \mathcal{A}, P)$ be a probability space and $A_n \in \mathcal{A}$, $n \in \mathbb{N}$. Assume that $\sum_{n \in \mathbb{N}} P(A_n) < \infty$. Then

$$P(A_n \text{ occurs for infinitely many } n) = 0.$$

If the events $A_n$ are independent, then $\sum_{n \in \mathbb{N}} P(A_n) = \infty$ implies

$$P(A_n \text{ occurs for infinitely many } n) = 1.$$

The Borel-Cantelli lemma implies the following result.

Lemma 2.2.4. Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of real random variables and $a \in \mathbb{R}$ such that $X_n \xrightarrow{c.c.} a$ as $n \to \infty$. Then $X_n \xrightarrow{a.s.} a$ as $n \to \infty$.

Proof. For any $\varepsilon > 0$ we define the event $A_\varepsilon$ by

$$A_\varepsilon = \{|X_n - a| > \varepsilon \text{ only holds for finitely many } n\}.$$

Since $X_n$ converges completely to $a$, it follows from the Borel-Cantelli lemma (Theorem 2.2.3) that $P(A_\varepsilon) = 1$ for all $\varepsilon > 0$. Moreover, we have

$$\{X_n \to a \text{ as } n \to \infty\} = \left\{ \forall \varepsilon > 0 \text{ the condition } |X_n - a| > \varepsilon \text{ only holds for finitely many } n \right\} = \bigcap_{\varepsilon > 0} A_\varepsilon.$$

Now, since for $\varepsilon \geq \varepsilon' > 0$ one has $A_{\varepsilon'} \subseteq A_\varepsilon$, it follows that

$$P\left( \bigcap_{\varepsilon > 0} A_\varepsilon \right) = P\left( \bigcap_{n \in \mathbb{N}} A_{1/n} \right) = \lim_{n \to \infty} P(A_{1/n}) = 1.$$

We now follow the presentations in [37, p. 26] and [57, p. 275] to further discuss the relationship between almost sure and complete convergence. As was e.g. pointed out in [37, p. 26], an almost surely convergent sequence of random variables can fail to be completely convergent. However, taking into account also the second part of the Borel-Cantelli lemma, one easily observes that a sequence of independent random variables $(X_n)_{n \in \mathbb{N}}$ converges almost surely if and only if it converges completely. Therefore, complete convergence of $(X_n)_{n \in \mathbb{N}}$ is indeed equivalent to almost sure convergence of any sequence $(Y_n)_{n \in \mathbb{N}}$ of independent random variables such that $Y_n$ has the same distribution as $X_n$ for all $n$.

A common setting where it is natural to distinguish between almost sure and complete convergence is the asymptotic study of random variables $F_n = f(Z_1, \dots, Z_n)$

that are functions of i.i.d. random elements $Z_1, \dots, Z_n$. One classical example is the arithmetic mean of i.i.d. real random variables, but quantities associated with random geometric graphs built over i.i.d. points in $\mathbb{R}^d$ also fit into this framework. A natural way to couple the distributions of the random variables $F_n$ is to consider a sequence $(Z_n)_{n \in \mathbb{N}}$ of i.i.d. random elements and then, for any $n \in \mathbb{N}$, to define $F_n = f(Z_1, \dots, Z_n)$. In this model, the step from $F_n$ to $F_{n+1}$ is performed by keeping the outcomes of $Z_1, \dots, Z_n$, then adding a further independent random element $Z_{n+1}$, and finally evaluating $f$ on $Z_1, \dots, Z_{n+1}$. Of course, the random variables $F_n$, $n \in \mathbb{N}$, are usually not independent in such a setting. In accordance with [57], we shall call the latter model the incrementing model.

Another natural coupling for the sequence $F_n$ is obtained by considering i.i.d. random elements $Z_1^{(n)}, \dots, Z_n^{(n)}$ for any $n \in \mathbb{N}$ such that all the $(Z_i^{(n)})_{i \leq n}$ are independent. The resulting quantities $F_n = f(Z_1^{(n)}, \dots, Z_n^{(n)})$ now form a sequence of independent random variables. In contrast to the incrementing model, the step from $F_n$ to $F_{n+1}$ is now performed by replacing the realizations of the random elements $Z_1^{(n)}, \dots, Z_n^{(n)}$ with new realizations $Z_1^{(n+1)}, \dots, Z_{n+1}^{(n+1)}$ independent of all previous outcomes. Again in accordance with [57], we shall call this model the independent model. (A small simulation contrasting the two couplings follows at the end of this section.)

In the framework described above, complete convergence results correspond to almost sure convergence for the independent model, and this also implies almost sure convergence for the incrementing model. An almost sure convergence result for the incrementing model, however, does not imply almost sure convergence for the independent model.

We conclude this section with a lemma that will be used on several occasions.

Lemma 2.2.5. Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of real random variables and let $(a_n)_{n \in \mathbb{N}}$ be a sequence of real numbers converging to some $a \in \mathbb{R}$. Assume that for any $\varepsilon > 0$,

$$(2.2.2) \quad \sum_{n=1}^{\infty} P(|X_n - a_n| \geq \varepsilon) < \infty.$$

Then $X_n \xrightarrow{c.c.} a$ as $n \to \infty$.

Proof. Let $\varepsilon > 0$. Choose $n_0 \in \mathbb{N}$ such that $|a_n - a| < \varepsilon/2$ for all $n \geq n_0$. Then for any $n \geq n_0$ we have that $|X_n - a| > \varepsilon$ implies $|X_n - a_n| \geq \varepsilon/2$. Thus,

$$P(|X_n - a| > \varepsilon) \leq P(|X_n - a_n| \geq \varepsilon/2)$$

for all $n \geq n_0$. From this together with assumption (2.2.2) we obtain

$$\sum_{n=1}^{\infty} P(|X_n - a| > \varepsilon) < \infty.$$
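The difference between the two couplings discussed above is easy to see in simulation. The sketch below, ours and purely illustrative, takes f to be the arithmetic mean of Bernoulli(1/2) variables (our choice of example): in the incrementing model consecutive values of F_n share all but one argument, while in the independent model each F_n is built from fresh randomness.

import random
import statistics

rng = random.Random(3)
N = 2000

def f(zs):
    # The function under study: the arithmetic mean of its arguments.
    return statistics.fmean(zs)

# Incrementing model: one i.i.d. stream Z_1, Z_2, ...; F_n reuses Z_1, ..., Z_n.
z_stream = []
incrementing = []
for n in range(1, N + 1):
    z_stream.append(rng.random() < 0.5)
    incrementing.append(f(z_stream))

# Independent model: for each n, a fresh i.i.d. sample Z_1^(n), ..., Z_n^(n).
independent = [f([rng.random() < 0.5 for _ in range(n)]) for n in range(1, N + 1)]

# Both sequences approach 1/2; the independent model forgets past fluctuations
# at every step, while the incrementing model smooths them out gradually.
print(round(incrementing[-1], 3), round(independent[-1], 3))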


CHAPTER 3

General Methods for Concentration Inequalities

This chapter aims to establish a variety of new general results that can be used to derive concentration inequalities for Poisson functionals. Most of the presented results are proved using techniques based on logarithmic Sobolev inequalities that are very close in spirit to the so-called entropy method, which was particularly investigated in [11, 12, 54, 55]. Instead of Poisson functionals, the latter references deal with random variables that can be written as functions of independent random elements. The collection of results derived in Section 3.1 of this chapter can therefore be regarded as a contribution to the entropy method for Poisson processes.

A particular focus of this thesis is to investigate concentration properties of Poisson U-statistics, a special class of Poisson functionals. In Section 3.2 of this chapter, general concentration results for Poisson U-statistics are derived, and one approach to obtain these results is to use the methods developed in Section 3.1. A further approach detailed in Section 3.2 is based on the convex distance for Poisson processes, a concept that was originally introduced by M. Talagrand [73] in the context of product spaces and that was adapted to Poisson processes by M. Reitzner in [63]. The latter method yields concentration inequalities with respect to a median instead of the expectation of the considered U-statistics. Finally, in Section 3.3 of the present chapter, the convex distance inequality for Poisson processes, which was proved by Reitzner in [63], will be re-proven (up to slightly worse constants) using only techniques based on the entropy method.

3.1. Logarithmic Sobolev inequalities for concentration

References to the underlying papers. The introductory part of the present section up to and including Remark 3.1.2 combines text passages from [BP15, pp. 2, 5]. Much of the remainder of the section, excluding Proposition 3.1.4 and its proof, coincides up to minor changes and some additional details with content from [BP15, Sections 3 and 4, pp. 6-18]. While in the latter reference the proofs and ancillary results are gathered in [BP15, Section 4], in the present section the statements are directly followed by their proofs, and ancillary results are presented when they are needed. Lemma 3.1.11 and its proof coincide with [Bac15, Lemma 4.1] and its proof. Finally, Theorem 3.1.12 combines [Bac15, Theorem 3.2] and [BP15, Theorem 3.10], and its proof is a combination of the proofs of these two results, supplemented by further details.

Throughout this section, we shall denote by $(\mathbb{X}, \mathcal{X}, \mu)$ a $\sigma$-finite measure space such that the $\sigma$-field $\mathcal{X}$ is countably generated and $\mu(\mathbb{X}) > 0$. We write $\eta$ to indicate a Poisson point process on $(\mathbb{X}, \mathcal{X})$ with intensity measure $\mu$. For the remainder of the section, we consider a Poisson functional $F$.

Our starting point is the following powerful Theorem 3.1.1, proved by Wu in [75] (see also [16]), and extending previous breakthrough findings contained in [1, 5]. Such a result involves two objects: (i) the entropy of a random variable $Z > 0$ with $EZ < \infty$, defined as

$$\mathrm{Ent}(Z) := E(Z \log Z) - E(Z) \log(EZ),$$

and (ii) the difference (or add-one cost) operator $DF$, defined for any $x \in \mathbb{X}$ and $\xi \in \mathbf{N}$ as

$$D_x F(\xi) = F(\xi + \delta_x) - F(\xi).$$

Theorem 3.1.1 (See Corollary 2.3 in [75]). For all $\nu \in \mathbb{R}$ satisfying $E(e^{\nu F}) < \infty$ we have

$$(3.1.1) \quad \mathrm{Ent}(e^{\nu F}) \leq E\left[ e^{\nu F} \int \psi(\nu D_x F(\eta)) \, d\mu(x) \right],$$

where $\psi(z) = z e^z - e^z + 1$.

Remark 3.1.2. In the statements of the upcoming results, we will often work under the assumption that the add-one cost operator $D_x F(\xi)$ verifies a given property P (for instance, $D_x F(\xi) \geq 0$) for every $x \in \mathbb{X}$ and every $\xi \in \mathbf{N}$. This requirement of course means that there exists a representative $f$ of $F$ such that the quantity $f(\xi + \delta_x) - f(\xi)$ verifies P for every $x \in \mathbb{X}$ and every $\xi \in \mathbf{N}$.

The heart of the method we are about to present is the modified logarithmic Sobolev inequality stated below. To shorten notation, we write

$$D^I_x F(\xi) = D_x F(\xi) \, 1\{(x, \xi) \in I\}$$

whenever $I \subseteq \mathbb{X} \times \mathbf{N}$ is measurable, $x \in \mathbb{X}$ and $\xi \in \mathbf{N}$. In the same spirit we will also use the notations

$$D^{\leq \beta}_x F(\xi) = D_x F(\xi) \, 1\{D_x F(\xi) \leq \beta\}, \qquad D^{+}_x F(\xi) = D_x F(\xi) \, 1\{D_x F(\xi) \geq 0\}.$$

The quantities $D^{\geq \beta}_x F(\xi)$ and $D^{-}_x F(\xi)$ are defined analogously, and so are the operators $D^{>\beta}_x F$ and $D^{<\beta}_x F$ (with strict inequalities).

The following observation is derived by combining Wu's modified logarithmic Sobolev inequality for Poisson point processes (3.1.1) with the Mecke formula (2.1.2).

Proposition 3.1.3. Let $I \subseteq \mathbb{X} \times \mathbf{N}$ be a measurable set. Then for all $\nu \in \mathbb{R}$ satisfying $E(e^{\nu F}) < \infty$ we have

$$\mathrm{Ent}(e^{\nu F}) \leq E\left[ e^{\nu F} \left( \int \psi(\nu D^I_x F(\eta)) \, d\mu(x) + \int \varphi(-\nu D^{I^c}_x F(\eta - \delta_x)) \, d\eta(x) \right) \right],$$

where $\varphi(z) = e^z - z - 1$ and $\psi(z) = z e^z - e^z + 1$.

Proof. By (3.1.1), the inequality

$$\mathrm{Ent}(e^{\nu F}) \leq E\left[ e^{\nu F} \int \psi(\nu D_x F) \, d\mu(x) \right]$$

holds. Now, since $\psi(0) = 0$, $\varphi(0) = 0$ and $\psi(z) = e^z \varphi(-z)$ for any $z \in \mathbb{R}$, we have

$$\psi(\nu D_x F(\eta)) = \psi(\nu D^I_x F(\eta)) + e^{\nu D_x F(\eta)} \varphi(-\nu D^{I^c}_x F(\eta)).$$

Hence, we compute

$$\begin{aligned} \mathrm{Ent}(e^{\nu F}) &\leq E \int e^{\nu F} \psi(\nu D^I_x F) \, d\mu(x) + E \int e^{\nu D_x F + \nu F} \varphi(-\nu D^{I^c}_x F) \, d\mu(x) \\ &= E \int e^{\nu F} \psi(\nu D^I_x F) \, d\mu(x) + E \int e^{\nu F(\eta + \delta_x)} \varphi(-\nu D^{I^c}_x F) \, d\mu(x) \\ &= E \int e^{\nu F} \psi(\nu D^I_x F) \, d\mu(x) + E \int e^{\nu F} \varphi(-\nu D^{I^c}_x F(\eta - \delta_x)) \, d\eta(x). \end{aligned}$$

Here, the last equality holds by the Mecke formula (2.1.2) together with the Fubini theorem.

The maps $\varphi$ and $\psi$ that appear in the above proposition will play a crucial role in the upcoming investigations. The next result gathers some analytic properties of these functions.

Proposition 3.1.4. For any $z \in \mathbb{R}$ let $\varphi(z) = e^z - z - 1$ and $\psi(z) = z e^z - e^z + 1$. The following statements hold:
(i) $\varphi$ and $\psi$ are non-negative, decreasing on $(-\infty, 0]$ and increasing on $[0, \infty)$;
(ii) $\lim_{z \to 0} \varphi(z)/z^2 = 1/2 = \lim_{z \to 0} \psi(z)/z^2$;
(iii) the maps $z \mapsto \varphi(z)/z^2$ and $z \mapsto \psi(z)/z^2$ are increasing on $\mathbb{R} \setminus \{0\}$;
(iv) the maps $z \mapsto \varphi(z)/z$ and $z \mapsto \psi(z)/z$ are increasing on $(0, \infty)$;
(v) $\psi(z) \geq \varphi(z) \geq z^2/2$ for $z \geq 0$, and $\varphi(z) \geq \psi(z)$ for $z \leq 0$.

Proof. [Proof of (i)] The derivative of $\varphi$ is given by $\varphi'(z) = e^z - 1$ and the derivative of $\psi$ is given by $\psi'(z) = z e^z$. Thus, $\varphi'(z), \psi'(z) \geq 0$ for $z \geq 0$ and $\varphi'(z), \psi'(z) \leq 0$ for $z \leq 0$. Since moreover $\varphi(0) = 0 = \psi(0)$, the result follows.

[Proof of (ii)] This follows from an easy application of L'Hôpital's rule.

[Proof of (iii)] We write $g(z) = \psi(z)/z^2$ for $z \neq 0$ and $g(0) = 1/2$. By virtue of statement (ii), $g$ is a continuous function, and it is easy to check that

$$g(z) = \sum_{n \geq 0} \frac{(n+1) z^n}{(n+2)!}.$$

Similarly, we write $h(z) = \varphi(z)/z^2$ for $z \neq 0$ and $h(0) = 1/2$. Then $h$ is continuous and

$$h(z) = \sum_{n \geq 0} \frac{z^n}{(n+2)!}.$$

It is immediate from the above representations of $g$ and $h$ that both functions are increasing on $[0, \infty)$. To prove monotonicity for negative $z$, we write

$$g(z) = \frac{\psi(z)}{z^2} = \frac{e^z \varphi(-z)}{z^2} = e^z h(-z).$$

Now, the derivative of $g$ for $z < 0$ equals $e^z (h(-z) - h'(-z))$. Hence, we have $g'(z) > 0$ if and only if $h(-z) - h'(-z) > 0$. The derivative of $h$ is given by

$$h'(z) = \sum_{n \geq 0} \frac{(n+1) z^n}{(n+3)!}.$$

Thus, monotonicity of $g$ follows from

$$h(-z) - h'(-z) = \sum_{n \geq 0} \left( \frac{1}{(n+2)!} - \frac{n+1}{(n+3)!} \right) (-z)^n > 0.$$

Similarly, one observes that $h'(z) > 0$ is equivalent to $g(-z) - g'(-z) > 0$. The derivative of $g$ can be written as

$$g'(z) = \sum_{n \geq 0} \frac{(n+2)(n+1) z^n}{(n+3)!}.$$

Monotonicity of $h$ now follows from

$$g(-z) - g'(-z) = \sum_{n \geq 0} \left( \frac{n+1}{(n+2)!} - \frac{(n+2)(n+1)}{(n+3)!} \right) (-z)^n > 0.$$

[Proof of (iv)] Write $g(z) = \varphi(z)/z^2$. Then the derivative of $g$ is non-negative by statement (iii). For any $z > 0$, the derivative of $\varphi(z)/z = g(z) z$ is now given by $g'(z) z + g(z) \geq 0$. Thus $\varphi(z)/z$ is increasing on $(0, \infty)$, and one deals with $\psi(z)/z$ analogously.

[Proof of (v)] For any $z \in \mathbb{R}$ we have $\psi'(z) - \varphi'(z) = z e^z - e^z + 1 = \psi(z)$, which is non-negative by statement (i). Thus $\psi' \geq \varphi'$, and since $\psi(0) = \varphi(0)$, it follows that $\psi(z) \geq \varphi(z)$ for $z \geq 0$ and $\varphi(z) \geq \psi(z)$ for $z \leq 0$. The inequality $\varphi(z) \geq z^2/2$ for $z \geq 0$ is immediate from statements (ii) and (iii).

Notice that in the following we will occasionally use the properties of $\varphi$ and $\psi$ derived in the above proposition without explicitly referring to this result.

We now aim to present the crucial ancillary Theorem 3.1.6, and we begin with some preliminary definitions. For any $\beta \in \mathbb{R}$ we define the random variables $V^+_\beta = V^+_\beta(F)$ and $V^-_\beta = V^-_\beta(F)$ by

$$V^+_\beta = \int (D^{\leq \beta}_x F(\eta))^2 \, d\mu(x) + \int (D^{>\beta}_x F(\eta - \delta_x))^2 \, d\eta(x),$$

$$V^-_\beta = \int (D^{\geq \beta}_x F(\eta))^2 \, d\mu(x) + \int (D^{<\beta}_x F(\eta - \delta_x))^2 \, d\eta(x).$$
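For a concrete functional, the quantities just defined can be computed or estimated directly from a sample. The sketch below, ours and purely illustrative, does this for the edge count F of a random geometric graph over a Poisson sample on the unit square (assumed intensity lam, assumed radius r). Since the add-one cost of the edge count is non-negative, for beta = 0 only one term of each definition survives: V^+ reduces to the sum of the squared vertex degrees, and V^- to an integral that we estimate by Monte Carlo.

import math
import random

rng = random.Random(4)
lam, r = 60.0, 0.1  # assumed intensity and connection radius

def poisson(mean):
    n, t = 0, rng.expovariate(1.0)
    while t <= mean:
        n, t = n + 1, t + rng.expovariate(1.0)
    return n

eta = [(rng.random(), rng.random()) for _ in range(poisson(lam))]

def add_one_cost(xi, x):
    # D_x F(xi) = F(xi + delta_x) - F(xi) for the edge count F: adding the point
    # x creates one edge per point of xi within distance r.
    return sum(1 for y in xi if math.dist(x, y) < r)

# V^+ = sum over points x of eta of D_x F(eta - delta_x)^2, i.e. squared degrees.
v_plus = sum(add_one_cost([y for y in eta if y != x], x) ** 2 for x in eta)

# V^- = integral of D_x F(eta)^2 dmu(x), estimated by Monte Carlo over x.
m = 20_000
v_minus = lam * sum(add_one_cost(eta, (rng.random(), rng.random())) ** 2
                    for _ in range(m)) / m

print(v_plus, round(v_minus, 2))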

Note that we will write $V^+ = V^+_0$ and $V^- = V^-_0$. The notation is in correspondence with [11], where the entropy method for product spaces was investigated.

For the proofs of several of the upcoming results, and in particular for the proof of Theorem 3.1.6, we will need the preliminary lemma below. To establish this statement, we use the Poincaré inequality for Poisson processes (see e.g. [75, Remark 1.4]), which holds for any square-integrable Poisson functional $F$ and provides the following bound on the variance of $F$:

$$(3.1.2) \quad \mathbb{V}F \leq E \int (D_x F(\eta))^2 \, d\mu(x).$$

Lemma 3.1.5. Assume that for some $\beta \in \mathbb{R}$ we have $EV^+_\beta < \infty$ or $EV^-_\beta < \infty$. Then $F$ is integrable.

Proof. The proof is basically a variation of arguments from the proof of [75, Proposition 3.1], combined with the Mecke formula. The statement for $V^-_\beta$ is proved in the same way as for $V^+_\beta$. Consider for any $n \in \mathbb{N}$ the truncation $F_n = \min(\max(F, -n), n)$. Then $E(F_n^2) < \infty$, hence the Poincaré inequality (3.1.2) yields

$$\mathbb{V}F_n \leq E \int (D_x F_n(\eta))^2 \, d\mu(x).$$

Since $|DF_n| \leq |DF|$, we also have

$$E \int (D_x F_n(\eta))^2 \, d\mu(x) \leq E \int (D_x F(\eta))^2 \, d\mu(x),$$

and the Mecke formula (2.1.2) gives

$$E \int (D_x F(\eta))^2 \, d\mu(x) = EV^+_\beta.$$

From the last three displays we obtain $\sup_{n \in \mathbb{N}} \mathbb{V}F_n \leq EV^+_\beta < \infty$.

Assume for contradiction that there is a subsequence $(F_{n_k})_{k \in \mathbb{N}}$ satisfying $\lim_{k \to \infty} EF_{n_k} = \infty$. Let $z > 0$ and put $\varepsilon_k = EF_{n_k} - z$. Then, by means of the Chebyshev inequality, one has for sufficiently large $k$,

$$P(F_{n_k} \leq z) \leq P(|F_{n_k} - EF_{n_k}| \geq \varepsilon_k) \leq \frac{EV^+_\beta}{(EF_{n_k} - z)^2}.$$

Thus, $\lim_{k \to \infty} P(F_{n_k} \leq z) = 0$ for any $z > 0$, contradicting the fact that $F_n \to F$ in distribution. We conclude that the sequence $(EF_n)_{n \in \mathbb{N}}$ is bounded from above, and analogously one observes that this sequence is also bounded from below. Together with $\sup_{n \in \mathbb{N}} \mathbb{V}F_n < \infty$, this also implies $\sup_{n \in \mathbb{N}} E(F_n^2) < \infty$. Thus, the family $(F_n)_{n \in \mathbb{N}}$ is $L^2$-bounded, and from Proposition 2.2.2 it follows that $E|F| < \infty$.

The upcoming result can be regarded as a generalized analogue of [11, Theorem 2] for Poisson point processes. The proof is similar to the product space version, where Proposition 3.1.3 now takes the role of the logarithmic Sobolev inequality. The generalization is achieved using arguments similar to those in the proof of [75, Proposition 3.1].
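Before turning to the theorem, a short worked example, which is ours and not part of the thesis, may help to fix the objects introduced above. Consider the elementary Poisson functional $F = \eta(B)$ for some $B \in \mathcal{X}$ with $\mu(B) < \infty$. Then $D_x F(\xi) = \mathbf{1}_B(x)$ for every $\xi \in \mathbf{N}$, and $F$ is Poisson distributed with parameter $\mu(B)$, so that

$$\mathbb{V}F = \mu(B) = \int_{\mathbb{X}} \mathbf{1}_B(x)^2 \, d\mu(x) = E \int_{\mathbb{X}} (D_x F(\eta))^2 \, d\mu(x),$$

i.e. the Poincaré inequality (3.1.2) holds with equality, so the bound cannot be improved in general. Moreover, for $\beta = 1$ one has $D^{\leq 1}_x F = \mathbf{1}_B$ and $D^{>1}_x F = 0$, as well as $D^{\geq 1}_x F = \mathbf{1}_B$ and $D^{<1}_x F = 0$, whence

$$V^+_1 = V^-_1 = \mu(B).$$

Both variation terms are thus deterministic in this example, which is exactly the situation treated by the corollaries following Theorem 3.1.6 below.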

To get prepared for the presentation of the theorem, for $\beta \in \mathbb{R}$ and $z > 0$, let

$$\Phi_\beta(z) = \begin{cases} \psi(z\beta)/(z\beta^2) & \beta > 0, \\ z/2 & \beta = 0, \\ \varphi(-z\beta)/(z\beta^2) & \beta < 0, \end{cases} \qquad \Psi_\beta(z) = \begin{cases} \varphi(z\beta)/(z\beta^2) & \beta > 0, \\ z/2 & \beta = 0, \\ \varphi(-z\beta)/(z\beta^2) & \beta < 0, \end{cases}$$

where $\varphi$ and $\psi$ are as in Proposition 3.1.4. Note that we will frequently use the fact that these functions are increasing, which follows from Proposition 3.1.4.

Theorem 3.1.6. Consider the Poisson functional $F$ and assume that one of the following conditions is satisfied:
(i) $\beta > 0$ and $D_x F(\xi) < \beta$ holds for all $(x, \xi) \in \mathbb{X} \times \mathbf{N}$;
(ii) $\beta < 0$ and $D_x F(\xi) > \beta$ holds for all $(x, \xi) \in \mathbb{X} \times \mathbf{N}$;
(iii) $\beta = 0$.

Let $\nu > 0$ be such that $E\exp(\nu F) < \infty$ and assume that $F$ is integrable. Then for any $\theta > 0$ satisfying $\Phi_\beta(\nu)\theta < 1$, we have

$$(3.1.3) \quad \log E[\exp(\nu(F - EF))] \leq \frac{\Psi_\beta(\nu)\theta}{1 - \Phi_\beta(\nu)\theta} \log E\left[ \exp\left( \frac{\nu V^+_\beta}{\theta} \right) \right].$$

Let $\nu > 0$ be such that $E\exp(-\nu F) < \infty$ and assume that $F$ is integrable. Then for any $\theta > 0$ with $\Phi_{-\beta}(\nu)\theta < 1$, we have

$$(3.1.4) \quad \log E[\exp(-\nu(F - EF))] \leq \frac{\Psi_{-\beta}(\nu)\theta}{1 - \Phi_{-\beta}(\nu)\theta} \log E\left[ \exp\left( \frac{\nu V^-_\beta}{\theta} \right) \right].$$

In addition to this, the relation $E\exp(\nu V^+_\beta/\theta) < \infty$ implies that $E|F| < \infty$ and $E\exp(\nu F) < \infty$. Also, the relation $E\exp(\nu V^-_\beta/\theta) < \infty$ implies that $E|F| < \infty$ and $E\exp(-\nu F) < \infty$.

For the proof of the above Theorem 3.1.6, as in the proof of the product space version [11, Theorem 2], we also need [54, Lemma 11]. This result states that for any $\nu > 0$ and any two random variables $X$ and $Y$ satisfying $E(e^{\nu X}), E(e^{\nu Y}) < \infty$, we have

$$(3.1.5) \quad \frac{\nu E(X e^{\nu Y})}{E(e^{\nu Y})} \leq \frac{\nu E(Y e^{\nu Y})}{E(e^{\nu Y})} + \log E(e^{\nu X}) - \log E(e^{\nu Y}).$$

Proof of Theorem 3.1.6. We prove (3.1.3). To prove the desired inequality, we adapt the proof of [11, Theorem 2] and combine it with arguments from the proof of [75, Proposition 3.1]. We first consider the case where $F$ is bounded. Let

$\varphi$ and $\psi$ be as in Proposition 3.1.4. Then, by Proposition 3.1.4 (iii), the maps $z \mapsto \psi(z)/z^2$ and $z \mapsto \varphi(z)/z^2$ are increasing on $\mathbb{R} \setminus \{0\}$. Hence, for any $u \in (0, \nu]$ and $\beta \neq 0$ we have

$$\psi(uz) \leq (\psi(u\beta)/\beta^2) z^2 \leq u \Phi_\beta(u) z^2 \quad \text{for } z \leq \beta,$$

$$\varphi(-uz) \leq (\varphi(-u\beta)/\beta^2) z^2 \leq u \Phi_\beta(u) z^2 \quad \text{for } z \geq \beta.$$

In each of the above two lines, the second inequality follows from the identity $e^z \varphi(-z) = \psi(z)$. As a direct consequence of Proposition 3.1.4 (v), in the case $\beta = 0$ one also has $\psi(uz) \leq u\Phi_\beta(u) z^2$ for $z \leq \beta$ and $\varphi(-uz) \leq u\Phi_\beta(u) z^2$ for $z \geq \beta$. Together with $\psi(0) = \varphi(0) = 0$ this gives

$$\psi(u D^{\leq\beta}_x F(\eta)) \leq u \Phi_\beta(u) (D^{\leq\beta}_x F(\eta))^2,$$

$$\varphi(-u D^{>\beta}_x F(\eta - \delta_x)) \leq u \Phi_\beta(u) (D^{>\beta}_x F(\eta - \delta_x))^2.$$

Hence, taking $I = \{(x, \xi) \in \mathbb{X} \times \mathbf{N} : D_x F(\xi) \leq \beta\}$, it follows from Proposition 3.1.3 that

$$\begin{aligned} \mathrm{Ent}(e^{uF}) &\leq E\left[ e^{uF} \left( \int \psi(u D^{\leq\beta}_x F(\eta)) \, d\mu(x) + \int \varphi(-u D^{>\beta}_x F(\eta - \delta_x)) \, d\eta(x) \right) \right] \\ &\leq u \Phi_\beta(u) \, E\left[ e^{uF} \left( \int (D^{\leq\beta}_x F(\eta))^2 \, d\mu(x) + \int (D^{>\beta}_x F(\eta - \delta_x))^2 \, d\eta(x) \right) \right] \\ &= u \Phi_\beta(u) \, E(V^+_\beta e^{uF}). \end{aligned}$$

Moreover, taking $X = V^+_\beta/\theta$ and $Y = F$, it follows from (3.1.5) that

$$\frac{E(V^+_\beta e^{uF})}{E(e^{uF})} \leq \theta \frac{E(F e^{uF})}{E(e^{uF})} + \frac{\theta}{u} \log E(e^{u V^+_\beta/\theta}) - \frac{\theta}{u} \log E(e^{uF}).$$

Invoking the definition of the entropy, it follows from the last two displays that

$$\frac{E(F e^{uF})}{u E(e^{uF})} - \frac{\log E(e^{uF})}{u^2} \leq \Phi_\beta(u)\theta \left( \frac{E(F e^{uF})}{u E(e^{uF})} + \frac{\log E(e^{u V^+_\beta/\theta})}{u^2} - \frac{\log E(e^{uF})}{u^2} \right).$$

Since by assumption $\Phi_\beta(u)\theta \leq \Phi_\beta(\nu)\theta < 1$, the latter inequality is equivalent to

$$\frac{E(F e^{uF})}{u E(e^{uF})} - \frac{\log E(e^{uF})}{u^2} \leq \frac{\Phi_\beta(u)\theta \, \log E(e^{u V^+_\beta/\theta})}{u^2 (1 - \Phi_\beta(u)\theta)}.$$

Defining $h(z) = \frac{1}{z} \log E(e^{zF})$ and $g(z) = \log E(e^{z V^+_\beta})$, the above estimate can be restated as follows:

$$h'(u) \leq \frac{\Phi_\beta(u)\theta \, g(u/\theta)}{u^2 (1 - \Phi_\beta(u)\theta)}$$

for any $u \in (0, \nu]$. Since $\lim_{u \to 0+} h(u) = EF$, integration from $0$ to $\nu$ gives

$$(3.1.6) \quad h(\nu) \leq EF + \int_0^\nu \frac{\Phi_\beta(u)\theta \, g(u/\theta)}{u^2 (1 - \Phi_\beta(u)\theta)} \, du.$$

It is a well-known basic fact that the logarithm of a moment generating function is convex; hence $g$ is convex on the interval $[0, \nu/\theta]$, and moreover $g(0) = 0$. In particular, we have for any $u \in (0, \nu]$ that $g(u/\theta) \leq (u/\nu) g(\nu/\theta)$, and since $\Phi_\beta$ is increasing,

$$\frac{g(u/\theta)}{u(1 - \Phi_\beta(u)\theta)} \leq \frac{g(\nu/\theta)}{\nu(1 - \Phi_\beta(\nu)\theta)}.$$

Hence,

$$\int_0^\nu \frac{\Phi_\beta(u)\theta \, g(u/\theta)}{u^2 (1 - \Phi_\beta(u)\theta)} \, du \leq \frac{\theta \, g(\nu/\theta)}{\nu(1 - \Phi_\beta(\nu)\theta)} \int_0^\nu \frac{\Phi_\beta(u)}{u} \, du.$$

In the case $\beta < 0$, we bound the remaining integral by $\Phi_\beta(\nu) = \Psi_\beta(\nu)$. This works since $\Phi_\beta(u)/u$ is increasing by Proposition 3.1.4 (iii). For $\beta = 0$ one immediately has $\int_0^\nu (\Phi_\beta(u)/u) \, du = \nu/2 = \Psi_\beta(\nu)$, and in the case $\beta > 0$ the integral can also be explicitly computed, and one obtains $\int_0^\nu (\Phi_\beta(u)/u) \, du = \varphi(\nu\beta)/(\nu\beta^2) = \Psi_\beta(\nu)$. Combining this with (3.1.6) gives

$$\log E(e^{\nu F}) \leq \nu EF + \frac{\Psi_\beta(\nu)\theta}{1 - \Phi_\beta(\nu)\theta} \, g(\nu/\theta).$$

This proves inequality (3.1.3) for bounded $F$. Applying (3.1.3) to the Poisson functional $-F$ and observing that $V^+_{-\beta}(-F) = V^-_\beta(F)$ yields (3.1.4) for bounded $F$.

Now assume that $F$ is not necessarily bounded and consider for $n \in \mathbb{N}$ the truncated random variables $F_n = \min(\max(F, -n), n)$. We will now conclude that if $E\exp(\nu V^+_\beta/\theta) < \infty$, then $F$ is integrable and the family of random variables

$$(3.1.7) \quad \{\exp(\nu(F_n - EF_n))\}_{n \in \mathbb{N}}$$

converges in probability to $\exp(\nu(F - EF))$ and is $L^\tau$-bounded for some $\tau > 1$. Thus, it follows from Proposition 2.2.2 that $\lim_{n \to \infty} E\exp(\nu(F_n - EF_n)) = E\exp(\nu(F - EF)) < \infty$, and hence also $E\exp(\nu F) < \infty$. This, together with the observation $V^+_\beta(F_n) \leq V^+_\beta$ and the fact that we already proved (3.1.3) for the bounded random variables $F_n$, then yields the result for the unbounded $F$.

Integrability of $F$ follows from Lemma 3.1.5, since the assumption $E(\exp(\nu V^+_\beta/\theta)) < \infty$ implies that $EV^+_\beta < \infty$. By dominated convergence, integrability of $F$ now implies the convergence in probability of the sequence in (3.1.7). Observe that, if (i), (ii) or (iii) holds, then $V^+_\beta(F_n) \leq V^+_\beta(F) = V^+_\beta$.

Also note that we can choose $\tau > 1$ such that $\Phi_\beta(\tau\nu)\tau\theta < 1$. Then it follows from (3.1.3) that

$$\begin{aligned} \log E[\exp(\tau\nu(F_n - EF_n))] &\leq \frac{\Psi_\beta(\tau\nu)\tau\theta}{1 - \Phi_\beta(\tau\nu)\tau\theta} \log E\left[ \exp\left( \frac{\nu V^+_\beta(F_n)}{\theta} \right) \right] \\ &\leq \frac{\Psi_\beta(\tau\nu)\tau\theta}{1 - \Phi_\beta(\tau\nu)\tau\theta} \log E\left[ \exp\left( \frac{\nu V^+_\beta}{\theta} \right) \right] < \infty. \end{aligned}$$

Thus

$$\sup_{n \in \mathbb{N}} E[\exp(\nu(F_n - EF_n))^\tau] < \infty,$$

so the family in (3.1.7) is $L^\tau$-bounded. Repeating the above reasoning for $-F$ instead of $F$, we can also extend inequality (3.1.4) to the case where $F$ is unbounded.

Using the above results, the methods for deriving concentration inequalities presented in [11] by Boucheron, Lugosi and Massart, and also in [55] by Maurer, naturally carry over to Poisson functionals. In the following, we present some variations of these techniques that will be used for the applications later on.

We start with some results that provide tail estimates for the random variable $F(\eta)$, given that $V^+_\beta$ or $V^-_\beta$ is almost surely bounded by a constant. If $\beta = 0$, i.e. if $V^+$ or $V^-$ is almost surely bounded, then the arguments of the exponential bounds presented below display quadratic terms, meaning that the decay is particularly fast in this case. Note that Wu's concentration inequality [75, Proposition 3.1] is implied by the following more general results.

Corollary 3.1.7. Assume that $F$ satisfies $V^+_\beta \leq c$ almost surely. Then $F$ is integrable and the following statements hold:
(i) If either condition (i) or (ii) of Theorem 3.1.6 is satisfied, then for all $r \geq 0$,

$$P(F \geq EF + r) \leq \exp\left( -\left( \frac{c}{\beta^2} + \frac{r}{\beta} \right) \log\left( 1 + \frac{\beta r}{c} \right) + \frac{r}{\beta} \right) \leq \exp\left( -\frac{r}{2\beta} \log\left( 1 + \frac{\beta r}{c} \right) \right).$$

(ii) If $\beta = 0$, that is if $V^+ \leq c$ holds almost surely, then for all $r \geq 0$,

$$P(F \geq EF + r) \leq \exp\left( -\frac{r^2}{2c} \right).$$

Proof. It follows from Lemma 3.1.5 that $F$ is integrable. Theorem 3.1.6 yields

$$\log E(e^{\nu(F - EF)}) \leq \inf_{\theta \in (0, 1/\Phi_\beta(\nu))} \frac{\Psi_\beta(\nu)\nu c}{1 - \Phi_\beta(\nu)\theta} = \Psi_\beta(\nu)\nu c.$$

Markov's inequality now gives, for any $\nu > 0$,

$$P(F \geq EF + r) = P(e^{\nu(F - EF)} \geq e^{\nu r}) \leq \frac{E(e^{\nu(F - EF)})}{e^{\nu r}} \leq \exp\left( \Psi_\beta(\nu)\nu c - \nu r \right).$$

Optimizing in $\nu$ yields the desired concentration bounds. Note also that the second inequality in (i) is obtained via a straightforward analysis that involves the estimate

$$(1 + z)\log(1 + z) - z \geq \frac{z}{2}\log(1 + z) \quad \text{for } z \geq 0,$$

or equivalently $(1 + z/2)\log(1 + z) - z \geq 0$. To verify the latter inequality, observe that the well-known estimate $\log(z + 1) \geq z/(z + 1)$ implies that the derivative of the left-hand side is non-negative.

We continue with the corresponding version for the lower tail. This corollary is obtained in the same way as the above one, where inequality (3.1.4) is used instead of (3.1.3). The proof is therefore omitted.

Corollary 3.1.8. Assume that $F$ satisfies $V^-_\beta \leq c$ almost surely. Then $F$ is integrable and the following statements hold:
(i) If either condition (i) or (ii) of Theorem 3.1.6 is satisfied, then for all $r \geq 0$,

$$P(F \leq EF - r) \leq \exp\left( -\left( \frac{c}{\beta^2} + \frac{r}{\beta} \right) \log\left( 1 + \frac{\beta r}{c} \right) + \frac{r}{\beta} \right) \leq \exp\left( -\frac{r}{2\beta} \log\left( 1 + \frac{\beta r}{c} \right) \right).$$

(ii) If $\beta = 0$, that is if $V^- \leq c$ holds almost surely, then for all $r \geq 0$,

$$P(F \leq EF - r) \leq \exp\left( -\frac{r^2}{2c} \right).$$

The following corollary is useful for obtaining concentration inequalities under less restrictive boundedness conditions on $V^+$. It is an adaptation for Poisson processes corresponding to a combination of the results [11, Theorem 8 and Theorem 9].

Corollary 3.1.9. Assume that $F \geq 0$ and that there is a random variable $G \geq 0$ and an $\alpha \in [0, 2)$ such that almost surely $V^+ \leq G F^\alpha$. Let $\theta > 0$ and $\nu \in (0, 2/\theta)$ be such that $E\exp(\nu G/\theta) < \infty$. Then $EF^{1-\alpha/2} < \infty$ and

$$\log E[\exp(\nu(F^{1-\alpha/2} - EF^{1-\alpha/2}))] \leq \frac{\nu\theta}{2 - \nu\theta} \log E\left[ \exp\left( \frac{\nu G}{\theta} \right) \right].$$

Proof. Here we adapt and combine the proofs of [11, Theorem 8 and Theorem 9]. For $\alpha = 0$, the statement follows directly from Theorem 3.1.6, so let $\alpha \in (0, 2)$.

Let $\gamma = 1 - \alpha/2$. Then, on the event $\{F \neq 0\}$, we have

$$\begin{aligned} \int (D^+_x F^\gamma(\eta - \delta_x))^2 \, d\eta(x) &= \int 1\{F(\eta)^\gamma F(\eta - \delta_x)^\gamma > 0\} \, (F(\eta)^\gamma - F(\eta - \delta_x)^\gamma)^2 \, d\eta(x) \\ &\quad + \int 1\{F(\eta)^\gamma F(\eta - \delta_x)^\gamma = 0\} \, F(\eta)^{2\gamma} \, d\eta(x) \\ &= \int 1\{F(\eta) F(\eta - \delta_x) > 0\} \left( \frac{F(\eta)}{F(\eta)^{1-\gamma}} - \frac{F(\eta - \delta_x)}{F(\eta - \delta_x)^{1-\gamma}} \right)^2 d\eta(x) \\ &\quad + \int 1\{F(\eta) F(\eta - \delta_x) = 0\} \, F(\eta)^{2\gamma} \, d\eta(x). \end{aligned}$$

Since $1 - \gamma > 0$, we have that $F(\eta) \geq F(\eta - \delta_x)$ implies $F(\eta)^{1-\gamma} \geq F(\eta - \delta_x)^{1-\gamma}$. Hence, the above expression is upper bounded by

$$\int 1\{F(\eta) F(\eta - \delta_x) > 0\} \left( \frac{F(\eta)}{F(\eta)^{1-\gamma}} - \frac{F(\eta - \delta_x)}{F(\eta)^{1-\gamma}} \right)^2 d\eta(x) + \int 1\{F(\eta) F(\eta - \delta_x) = 0\} \, F(\eta)^{2\gamma} \, d\eta(x) = \frac{1}{F(\eta)^\alpha} \int (D^+_x F(\eta - \delta_x))^2 \, d\eta(x).$$

Quite similarly one obtains that on the event $\{F \neq 0\}$,

$$\int (D^-_x F^\gamma(\eta))^2 \, d\mu(x) \leq \frac{1}{F(\eta)^\alpha} \int (D^-_x F(\eta))^2 \, d\mu(x).$$

Hence, it follows that on the event $\{F \neq 0, V^+ \leq GF^\alpha\}$,

$$V^+(F^\gamma) = \int (D^-_x F^\gamma(\eta))^2 \, d\mu(x) + \int (D^+_x F^\gamma(\eta - \delta_x))^2 \, d\eta(x) \leq \frac{V^+}{F(\eta)^\alpha} \leq G.$$

Moreover, it is easy to check that on the event $\{F = 0, V^+ \leq GF^\alpha\}$, one has $V^+(F^\gamma) = 0 = V^+$. Therefore, by virtue of the assumption that almost surely $V^+ \leq GF^\alpha$, it follows that almost surely $V^+(F^\gamma) \leq G$. Applying Theorem 3.1.6 to the random variable $F^\gamma$ yields the result.

In the case where the random variable $G$ in the above corollary is just a constant, we obtain a concentration inequality for the upper tail. The proof of the next result uses the following relation between the moments and the tail of a real random variable $Z \geq 0$, which is taken from [41, Lemma 3.4]:

$$(3.1.8) \quad EZ^p = p \int_0^\infty P(Z \geq r) \, r^{p-1} \, dr \quad \text{for any } p > 0.$$

32 30 3. GENERAL METHODS FOR CONCENTRATION INEQUALITIES Corollary Assume that F 0 and that for some α [0, 2) and c > 0 we have almost surely V + cf α. Then all moments of F exist and for all r 0, P(F EF + r) exp ( ((r + EF )1 α/2 (EF ) 1 α/2 ) 2 ). 2c Proof. For α = 0, the statement follows directly from Corollary (ii), so let α (0, 2). Let γ = 1 α/2. Continuing in the same way as in the proof of Corollary yields that almost surely V + (F γ ) = (Dx F γ (η)) 2 dµ(x) + (D x + F γ (η δ x )) 2 dη(x) c. We conclude that Corollary (ii) applies to F γ. So F γ is non-negative and has an exponentially decaying upper tail. Thus, by virtue of (3.1.8), all moments of F γ exist, hence all moments of F exist. As it was pointed out in [11, p. 1588], we can now write P(F EF + r) = P(F γ (r + EF ) γ ) P(F γ E(F γ ) (r + EF ) γ (EF ) γ ) exp ( ((r + EF )γ (EF ) γ ) 2 ). 2c We continue with a lemma that is used in the proof of the upcoming Theorem Note that for any real number z R we write z + = z1{z 0} and z = z1{z 0}. Lemma Let n N and consider F n = min(max(f, n), n) and V + (n) = V + (F n ). Then for any real number b 0, almost surely F (V + (n) b) F n(v + b) if F, F n 0, F (V + (n) b) F n(v + b) if F, F n 0. Proof. It is easy to see that V + (n) V +. Hence, the desired statement holds on the event {F = F n }. If F F n, then either F n = n < F or F n = n > F. The latter case implies V + (n) = 0 and F, F n 0, hence the desired statement holds. So consider the case F n = n < F and let A = F/n. Then the desired inequality is equivalent to AV + (n) V + + (A 1)b.

33 3.1. LOGARITHMIC SOBOLEV INEQUALITIES FOR CONCENTRATION 31 Since b 0 and A > 1, the above inequality is implied by AV + (n) V +, i.e. by ( ) A (n F n (η δ x )) 2 +dη(x) + (F n (η + δ x ) n) 2 dµ(x) (An F (η δ x )) 2 +dη(x) + (F (η + δ x ) An) 2 dµ(x). To prove this, it suffices to conclude (3.1.9) (3.1.10) A(n F n (η δ x )) 2 + (An F (η δ x )) 2 +, A(F n (η + δ x ) n) 2 (F (η + δ x ) An) 2. We prove (3.1.9). If F (η δ x ) > n, then If F (η δ x ) n, then A(n F n (η δ x )) 2 + = 0 (An F (η δ x )) 2 +. F (η δ x ) F n (η δ x ) =: m. Now, since m n and A > 1, we have An 2 m 2 0. This gives thus Hence, (A 2 A)n 2 + (1 A)m 2 0, (An m) 2 = A 2 n 2 2Anm + m 2 An 2 2Anm + Am 2 = A(n m) 2. A(n F n (η δ x )) 2 + (An F n (η δ x )) 2 + (An F (η δ x )) 2 +, where the last inequality follows from F (η δ x ) F n (η δ x ) < An. This proves (3.1.9) and analogously one obtains (3.1.10). The result follows. The next result is a variation of Corollary for Poisson functionals that are not necessarily non-negative. This is the Poisson space analogue of [11, Theorem 5]. Theorem Assume that the Poisson functional F is integrable and that for some a > 0 and b 0 we have almost surely V + af + b. Then for any ν (0, 2/a) we have E(e νf ) < and (3.1.11) log E(exp(ν(F EF ))) ν2 (aef + b). 2 aν Moreover, for any r 0, ( r 2 ) P(F EF + r) exp. 2aEF + 2b + ar

34 32 3. GENERAL METHODS FOR CONCENTRATION INEQUALITIES Proof. We first consider the case where F is bounded, and for this case we adapt the proof of [11, Theorem 5]. In order to establish (3.1.11), we argue in the same way as in the beginning of the proof of Theorem to obtain for any u (0, ν], Invoking the assumption on V + yields Ent(e uf ) 1 2 u2 E(V + e uf ). ue(f e uf ) E(e uf ) log E(e uf ) 1 2 u2 (ae(f e uf ) + be(e uf )). With h(u) = 1 u log E(euF ) this can be rearranged as h (u) 1 2 (a log(e(euf )) + b). Integrating this from 0 to ν gives h(ν) EF 1 ( a log(e(e νf ) + νb ). 2 Noting that aν < 2 and rearranging the above inequality, we obtain (3.1.11) for the bounded case. For the unbounded case, consider for any n N the truncated random variables F n = min(max(f, n), n). It follows from the assumptions and Lemma that almost surely F (V + (n) b) af nf if F 0, F (V + (n) b) af nf if F 0. Note also that for F = 0 = F n we have V + (n) V + b. Therefore, almost surely V + (n) af n + b, so (3.1.11) holds for all F n. By dominated convergence, the sequence EF n is convergent, hence bounded above by some constant C. Moreover, we can choose a τ > 1 such that τν < 2/a. Thus, since we already proved that (3.1.11) applies to all the F n, we conclude ( τ sup E[exp(ν(F n EF n )) τ 2 ν 2 ) ] exp (ac + b) <. n N 2 aτν This means that the family of random variables {exp(ν(f n EF n ))} n N is L τ -bounded. Hence, Proposition gives lim E exp(ν(f n EF n )) = E exp(ν(f EF )) <. n We note again that the result is already proved for the F n and that EF n EF as n. This concludes the proof of (3.1.11).

35 3.1. LOGARITHMIC SOBOLEV INEQUALITIES FOR CONCENTRATION 33 The concentration inequality now follows using the inequality we just proved together with Markov s inequality and the following estimate taken from [11, Lemma 11]: ) (νr Cν2 r 2 for all c, C > 0, 1 cν 2(2C + cr) sup ν [0,1/c) where we put c = a/2 and C = (aef + b)/2. Here we can indeed assume C > 0 since under the assumption that a.s. V + af + b one has that aef + b = 0 implies a.s. af + b = 0, thus a.s. F = b/a, and the result holds trivially in this case. We continue with some preparations for the proof of the upcoming Theorem For this we use the FKG inequality for Poisson point processes that was first published by Janson in [38, Lemma 2.1]. The following version, which is formulated in terms of the difference operator, is a special case of the result [48, Theorem 1.4] by Last and Penrose. Lemma Let F and G be bounded Poisson functionals and assume that D x F (ξ), D x G(ξ) 0 for all (x, ξ) N. Then E(F G) (EF )(EG). It was also remarked by Janson in [38, p. 318] that under conditions like F, G 0 or EF 2, EG 2 <, the above result easily extends to unbounded functionals by monotone convergence. For our purpose we need the following extension. Corollary Let F, G 0 be Poisson functionals. Assume that F is bounded and G is integrable. Moreover, assume that D x F (ξ) 0 and D x G(ξ) 0 for all (x, ξ) N. Then E(F G) (EF )(EG). Proof. Since F is bounded, it follows from EG < that also E(F G) <. Now consider for any n N the truncations G n = min(g, n). Then we have almost surely G n G and F G n F G as n. By monotone convergence, EG n EG and E(F G n ) E(F G) as n. It follows from Lemma that for any n N, E(F G n ) (EF )(EG n ). The result follows. The next result applies whenever F is increasing, meaning that DF 0, and V is increasing and integrable. In this case, we obtain lower tail inequalities for the random variable F that display a quadratic term in the argument of the exponential bound, providing a particularly fast decay of the estimate.

36 34 3. GENERAL METHODS FOR CONCENTRATION INEQUALITIES Theorem Assume that the Poisson functional F is not almost surely constant and satisfies D x F (ξ), D x V (ξ) 0 for all (x, ξ) N and EV (η) <. Then F = F (η) is integrable and EV > 0, and for all r 0 we have ) P(F EF r) exp ( r2 2EV. Proof. This proof is inspired by ideas from the proof of [11, Theorem 6]. For any n N consider the truncations F n = min(max(f, n), n). Then the F n are again increasing. Let ν < 0. It follows from Proposition with I = N that for any u [ν, 0) we have ] Ent(e ufn ) E [e ufn ψ(ud x F n ) dµ(x). Since by Proposition (v) one has ψ( z) (1/2)z 2 for z 0, the right-hand side of the above expression does not exceed ] 1 [e 2 E ufn (ud x F n ) 2 dµ(x) = 1 2 u2 E(e ufn V (F n )). We have V (F n ) V almost surely, hence E(e ufn V (F n )) in the above display can be upper bounded by E(e ufn V ). Now, since F n is increasing and u < 0, the functional e ufn is decreasing, meaning that D(e ufn ) 0, and bounded. Moreover, by assumption the functional V is increasing and EV <. Hence, by Corollary we have E(e ufn V ) E(e ufn ) EV. It follows from the above considerations that Integrating from ν to 0 yields h (u) 1 2 EV where h(u) = 1 u log E(euFn ). (3.1.12) log E[exp(ν(F n EF n ))] 1 2 ν2 EV. Since EV <, by Lemma we have E F <. Hence, the dominated convergence theorem yields that EF n EF as n. This implies exp(ν(f n EF n )) exp(ν(f EF )) as n almost surely and thus also in probability. Replacing ν in (3.1.12) by 2ν gives sup E[exp(ν(F n EF n )) 2 ] exp(2ν 2 EV ) <, n N

37 3.1. LOGARITHMIC SOBOLEV INEQUALITIES FOR CONCENTRATION 35 so the family {exp(ν(f n EF n ))} n N is also L 2 -bounded and invoking (3.1.12), we conclude that Applying Proposition Using Markov s inequality, one obtains log E[exp(ν(F EF ))] 1 2 ν2 EV. P(F EF r) E[exp(ν(F EF ))]e νr exp ( ) 1 2 ν2 EV + νr. Now, if EV > 0, the desired concentration estimate follows from an easy optimization in ν. Assume that EV = 0. Then, since the inequality in the above display holds true for any ν < 0, it follows that P(F EF r) = 0 for all r > 0. Therefore, continuity of P yields P(F < EF ) = 0 and this implies almost surely F = EF, contradicting the assumption that F is not a.s. constant. Remark Assume that the Poisson functional F is increasing. A sufficient condition for the assumption DV 0 in the above theorem is that the second iteration of the difference operator of F is non-negative. Indeed, assume that for every (x, ξ) N, D z D x F (ξ) 0 for µ-almost every z. Then, since obviously D z D x F (ξ) = D x D z F (ξ), one has D z F (ξ + δ x ) D z F (ξ) 0 and hence D z F (ξ + δ x ) 2 D z F (ξ) 2 for all (x, ξ) N and µ-a.e. z. So we see that also D(DF ) 2 is non-negative, thus yielding that for every (x, ξ) N, D x V (ξ) = D x (D z F (ξ)) 2 dµ(z) = D x (D z F (ξ)) 2 dµ(z) 0. We continue with an analytic lemma that is used in the upcoming proof of Theorem Lemma Let ψ(z) = ze z e z + 1. Then for any a > 0 and z > 0, one has aψ(z)/z aψ(z)/z max(a, 4/3). 2 Proof. The desired inequality can be rearranged as aψ(z)(1 cz) cz 2, where c = max(a, 4/3)/2. It will be established below that (3.1.13) ψ(z) ( z) 1 2 z2. In the case a 4/3, one has c = a/2. Hence, using (3.1.13) we obtain aψ(z)(1 cz) = aψ(z) ( 1 a 2 z) aψ(z) ( z) a 2 z2 = cz 2, so the result holds in this case. Now assume that a < 4/3. Then c = 2/3 and using again (3.1.13) yields aψ(z)(1 cz) = aψ(z) ( z) a 2 z2 cz 2.

38 36 3. GENERAL METHODS FOR CONCENTRATION INEQUALITIES It remains to prove (3.1.13). To do so, we first rearrange this inequality as To prove the above, we compute 5 3 zez z2 e z + e z z z zez + 1 = 5 z n = 2 z n+1 n! 3 + (n + 1)z n n! (n + 1)! n 0 n 0 n 0 = 2 3 n 0 z n+1 n! = 2 z n+1 3 n! n 1 = 2 3 z2 zn+1 z n+1 (n + 1)! n (n + 1)! + n 1 n z + n zn+1 (n + 1)! z2 + e z n z n (n + 1)! + 3 z n 2 (n + 1) + e z + 2 (n + 2)! 3 z z2. n 1 n 1 Now, the last expression in the above display can be upper bounded by 2 3 z2 z n e z + 2 n! 3 z z2 = 2 3 z2 e z + e z z z2, n 1 where we used the obvious estimate 1 3(n + 1) + (n + 1)! 2(n + 2)! 1 n!. We conclude this section with a result that deals with the situation when the difference operator DF is bounded. In this case, we obtain a concentration inequality for the lower tail by controlling the random variable V +. This is a generalized Poisson space analogue of [55, Theorem 13]. Theorem Assume that F 0 and EF > 0. some a > 0 we have Assume moreover that for D x F (ξ) 1 for any (x, ξ) N and almost surely V + (η) af (η). Then all moments of F exist and for any r 0 we have ( r 2 P(F EF r) exp 2 max(a, 4/3)EF If moreover D x F (ξ) 0 for any (x, ξ) N, then even ( r 2 P(F EF r) exp 2 max(a, 1)EF ). ).

39 3.1. LOGARITHMIC SOBOLEV INEQUALITIES FOR CONCENTRATION 37 Proof. Here we adapt the proof of the product space version [55, Theorem 13]. First consider the case where F is bounded. Then by Proposition with I = {(x, ξ) N : D x F (ξ) 0} we have for any u < 0, ( )] Ent(e uf ) E [e uf ψ(udx F (η)) dµ(x) + ϕ( ud x + F (η δ x )) dη(x). Moreover, by assumption we have D x F (η) 1, thus udx F (η) u. Since by Proposition (iii) the map z ψ(z)/z 2 is increasing, it follows that ψ(udx F (η))dµ(x) = u 2 ψ(udx F (η)) u 2 Dx F (η) 2 D x F (η) 2 dµ(x) u 2 ψ( u) u 2 Dx F (η) 2 dµ(x), where we interpret 0/0 as 1. Similarly, one also has ϕ( ud x + F (η)) dη(x) u 2 ϕ( u) u 2 D x + F (η δ x ) 2 dη(x). Now, since by Proposition (v) we have that ϕ( u) ψ( u) for any u < 0 and since by assumption Dx F (η) 2 dµ(x) + D x + F (η δ x ) 2 dη(x) = V + (η) af (η), it follows from the above considerations that Ent(e uf ) ψ( u)ae(f e uf ). Dividing this inequality by u 2 E(e uf ) gives (3.1.14) h (u) = Ent(euF ) u 2 E(e uf ) ψ( u)ae(f euf ) u 2 E(e uf, ) where h(u) = u 1 log E(e uf ). Let ν < 0. Integrating inequality (3.1.14) from ν to 0 and using that, since ψ( u)/u 2 is decreasing in u, one has ψ( u)/u 2 ψ( ν)/ν 2 for all u [ν, 0), yields EF 1 ν log E(eνF ) ψ( ν) ν 2 a log E(e νf ). Since 1 aψ( ν)/ν is positive, we can rearrange this inequality as (3.1.15) log E[exp(ν(F EF ))] ν 2 aψ( ν)/ν 2 1 aψ( ν)/ν EF. To extend the above inequality to the case where F is unbounded, we consider again the truncated random variables F n = min(max(f, n), n). Exactly as in the proof of Theorem , one concludes that almost surely V + (n) af n, and since moreover DF n DF 1, it follows that (3.1.15) holds true for the truncations F n. According to Corollary , the condition V + af ensures that all moments of F exist, in particular E F <. Thus, dominated convergence yields that EF n EF as n, hence also exp(ν(f n EF n )) exp(ν(f EF )) as n

40 38 3. GENERAL METHODS FOR CONCENTRATION INEQUALITIES in probability. Moreover, since the sequence (EF n ) n N is convergent, there exists a constant C > 0 such that EF n C for all n N. Using this together with (3.1.15), where ν is replaced by 2ν, yields sup E[exp(ν(F n EF n )) 2 ] exp n N (ν 2 aψ( 2ν)/ν 2 ) 1 aψ( 2ν)/(2ν) C <, so the family {exp(ν(f n EF n ))} n N is L 2 -bounded. Invoking Proposition and inequality (applied to F n ) together with the fact that lim n EF n = EF, one infers that also holds true for the unbounded F. Now, by Lemma we have (3.1.16) aψ( ν)/ν 2 1 aψ( ν)/ν max(a, 4/3). 2 The above relation combined with (3.1.15) and Markov s inequality yields ( ) P(F EF r) E[exp(ν(F EF ))]e νr 2 max(a, 4/3) exp ν EF + νr. 2 The first tail bound of the result is now obtained by an easy optimization in ν. The second tail bound for the case where the functional F is increasing is obtained analogously, where the estimate aϕ( ν)/ν 2 1 aϕ( ν)/ν max(a, 1) 2 is used instead of (3.1.16). The latter inequality is stated in [55, Equation (12)] Concentration for U-statistics References to the underlying papers. The first part of the present section up to and including Subsection coincides up to minor changes with the content from [BP15, Section 5, pp ], where the introductory part corresponds to [BP15, Subsection 5.1] and the Subsections and correspond to [BP15, Subsections 5.2 and 5.3], respectively. Subsection is based on and coincides partially with [BP15, Subsection 5.4], where Proposition corresponds to (a part of) [BP15, Proposition 5.7]. The first part of Subsection up to and including Theorem coincides up to minor changes with content from [BR15, p. 6]. The remainder of Subsection coincides up to minor changes with content from [BR15, pp ]. Subsection coincides up to minor changes with content from [BP15, Section 7, pp ]. The aim of the present section is to investigate the concentration properties of Poisson U-statistics. For this purpose, we need to specialize the very general framework that was in order so far. Throughout this section, we consider again a σ-finite measure space (,, µ) such that the σ-field is countably generated and µ() > 0. Moreover, we shall assume that {x} for every x and that µ is non-atomic. Recall that the Poisson point process η whose intensity measure is given by the non-atomic µ is now simple, meaning that almost surely η({x}) 1 for all x.

41 3.2. CONCENTRATION FOR U-STATISTICS 39 Also recall that in this setting we identify η with its atom-set {x : η({x}) > 0}. In addition to these conventions, for any integer-valued discrete measure ξ N we will write x ξ to indicate that x is an atom of ξ, meaning that x and ξ({x}) > 0. Note that this notation is consistent with the convention that we identify a simple measure ξ N with its atom-set. Remark Consider a Poisson functional F together with some representative f : N R. For any ξ N, we denote by ξ the simple measure uniquely determined by its value on singletons via the relation ξ ({x}) = 1{ξ({x}) > 0}, for all x. Then, since η is simple, we have that almost surely F = f(η ). It follows that another representative of F is given by f : N R, where f (ξ) = f(ξ ). Therefore, without loss of generality, we can assume that F (ξ) = F (ξ ) for all ξ N, that is, given an arbitrary functional ξ F (ξ), in this section we will systematically select a representative of F that only depends on ξ via the map ξ ξ. With this convention, one has that D x F (ξ) = D x F (ξ ), and also that D x F (ξ) = 0 whenever x ξ. Finally we observe that, again by virtue of the above convention and in accordance with the content of Remark 3.1.2, the fact that the quantity D x F (ξ) verifies some property P for every x and every ξ N is equivalent to the fact that P is verified for all (x, ξ) N such that ξ is simple. We now recall some relevant definitions. Let k N and let f : k [0, ) be a symmetric measurable map. Define the functional S f : N [0, ] by (3.2.1) S f (ξ) = f(x), where we shall recall that (3.2.2) x ξ k ξ k := {x = (x 1,..., x k ) : x i ξ for all i and x i x j whenever i j}. A (Poisson) U-statistic F of order k with kernel f is a random variable such that almost surely F = S f (η). According to the Slivnyak-Mecke formula (2.1.5), the expectation of a U-statistic F is given by EF = f(x 1,..., x k )dµ(x 1 ) dµ(x k ); see e.g. [64, Section 3] for more details as well as for an introduction to U-statistics with kernels that may have arbitrary sign Choice of a representative. In order to apply results from Section 3.1 to a Poisson U-statistic F with kernel f 0, we first need to choose a suitable representative of F. Whenever the considered U-statistic F is almost surely finite, we can choose as a representative of F the map f : N R defined by f(ξ) = S f (ξ) if S f (ξ) < and f(ξ) = 0 if S f (ξ) =. In order to avoid technical problems arising from the choice of this representative, we will often assume that a given U-statistic with kernel f is well-behaved. By this we mean that there exists a measurable set B N with P(η B) = 1 such that

42 40 3. GENERAL METHODS FOR CONCENTRATION INEQUALITIES (i) S f (ξ) < for all ξ B, (ii) ξ + δ x B whenever ξ B and x, (iii) ξ δ x B whenever ξ B and x ξ. If F is well-behaved, then we will choose as a representative of F the map f : N R defined by f(ξ) = S f (ξ) if ξ B and f(ξ) = 0 if ξ B c. Then, for any (x, ξ) N one has D x F (ξ) = S f (ξ + δ x ) S f (ξ) < if ξ B and D x F (ξ) = 0 if ξ B c. Note that by virtue of (3.2.2) and (3.2.1), the above choices of a representative imply F (ξ) = F (ξ ) for all ξ N, which is consistent with Remark Finally, note that U-statistics that arise in typical applications (in particular, all U-statistics considered in this work) are usually well-behaved in the sense described above General results. We will use an explicit expression for the difference operator of a U-statistic that was established by Reitzner and Schulte in [64]. The upcoming Proposition gathers together several results from [64, Lemma 3.3 and Theorem 3.6] (see also [69, Lemma 4.10 and Corollary 4.12] for analogous results formulated in a more general framework), in a form that is adapted to our setting. In the following, for every integer k 1 and every real p > 0, we will write L p (µ k ) := L p ( k, k, µ k ) for the L p -space associated with ( k, k, µ k ), and also use the shorthand notation L p (µ 1 ) = L p (µ). Proposition Let the above assumptions and notation prevail, let F be a U- statistic with non-negative kernel f and let S f be as in (3.2.1). Then, for any ξ N and x ξ, one has S f (ξ) S f (ξ δ x ) = kf (x, ξ) whenever S f (ξ) <, where, for any ξ N and every x ξ such that ξ({x}) = 1, the local version of F is defined as (3.2.3) F (x, ξ) := f(x, y), y (ξ\x) k 1 where ξ \ x is shorthand for ξ δ x, and F (x, ξ) := 0 whenever ξ({x}) > 1. Moreover, if EF 2 <, then it follows that f L 1 (µ k ) L 2 (µ k ). As a direct consequence of the above result together with our canonical choices of a representative, described in Subsection 3.2.1, we obtain: Corollary Let F be a U-statistic with non-negative kernel f. Then the following statements hold: (i) If F is almost surely finite, then there exists a measurable set B N that satisfies P(η B) = 1 such that for any ξ B and x ξ, the local version F (x, ξ) is finite and D x F (ξ δ x ) = kf (x, ξ).

43 3.2. CONCENTRATION FOR U-STATISTICS 41 (ii) If F is well-behaved, then there exists a measurable set B N that satisfies P(η B) = 1 such that the following holds: (a) For any ξ B, x ξ and z, the local versions F (x, ξ) and F (z, ξ + δ z ) are finite, and moreover D x F (ξ δ x ) = kf (x, ξ) and D z F (ξ) = kf (z, ξ + δ z ). (b) For any ξ B c, x ξ and z, one has D x F (ξ δ x ) = 0 = D z F (ξ). The previous Corollary implies that, if F is an almost surely finite U-statistic with kernel f 0, then almost surely (3.2.4) V + (F ) = k 2 x η F (x, η) 2. If F is in addition well-behaved, then almost surely (3.2.5) V (F ) = k 2 F (x, η + δ x ) 2 dµ(x). We have therefore the following consequences of Corollary and Theorem Corollary Consider an almost surely finite U-statistic F of order k with non-negative kernel f. Assume that for some α [0, 2) and c > 0 we have almost surely (3.2.6) F (x, η) 2 cf α. x η Then all moments of F exist and for all r 0, P(F EF + r) exp ( ((r + EF )1 α/2 (EF ) 1 α/2 ) 2 ) 2ck 2. While the above result is immediate from Corollary , in order to derive the corollary below, we carry out some details in the subsequent proof. Corollary Consider a well-behaved U-statistic F of order k with non-negative kernel f. Assume that EF > 0 and (3.2.7) V := E F (x, η + δ x ) 2 dµ(x) <. Then F is integrable and V > 0, and for all r 0 we have ) P(F EF r) exp ( r2 2k 2. V Proof. Using Corollary (ii) and the Mecke formula (2.1.2) we infer 0 < EF = E F (x, η) = E F (x, η + δ x ) dµ(x). x η

44 42 3. GENERAL METHODS FOR CONCENTRATION INEQUALITIES This implies V > 0. In particular, we only need to deal with the case where F is not a.s. constant. We have EV (F ) = k 2 EV < and since F is well-behaved, it follows from Corollary (ii) that D x F (ξ) 0 for any (x, ξ) N. So the result follows from Theorem together with Remark once we proved that for any (x, ξ) N one has that D z D x F (ξ) 0 holds for µ-a.e. z. Now, since the measure µ is non-atomic, it is sufficient if D z D x F (ξ) is non-negative for any (z, x, ξ) N such that x z. Let B N be a measurable set as described in Subsection For any ξ B c it is immediate that D z D x F (ξ) = 0, so let ξ B and x, z such that x z. If x ξ or z ξ, one also has clearly that D z D x F (ξ) = 0, so assume that x, y / ξ. Using Corollary (ii) we can now write D z D x F (ξ) = D x F (ξ + δ z ) D x F (ξ) = kf (x, ξ + δ x + δ z ) kf (x, ξ + δ x ) = k f(x, y) k f(x, y) y (ξ+δ z) k 1 = k(k 1) y ξ k 2 f(z, x, y). y ξ k 1 Note that the above formula for DDF is just a special case of the formula for the iterated difference operator of a U-statistic that was established in the proof of [64, Lemma 3.5], and that the above computation is only carried out for the reader s convenience. The result follows from the last display since f Computing V in formula (3.2.7). We will now point out (by gathering some facts from existing literature) that condition (3.2.7) is equivalent to squareintegrability of the U-statistic F and that one can obtain a rather explicit expression for V in terms of some set of auxiliary kernels built from f. Definition Let f be a symmetric element of L 1 (µ k ), for some k 1. For n N, we define the kernels f n : n R by ( ) k (3.2.8) f n (y 1,..., y n ) := f(y 1,..., y n, z 1,..., z k n )dµ k n (z 1,..., z k n ) n k n if n k and the integral on the right-hand side is well defined, and f n (y 1,..., y n ) := 0 otherwise. Observe that, since f is in L 1 (µ k ), then the set of those (y 1,..., y n ) such that the integral on the right-hand side of (3.2.8) is not defined has measure µ n equal to zero, for every n = 1,..., k. Plainly, each f n is a symmetric map from n into R and f n L 1 (µ n ), for every n = 1,..., k. Moreover, by definition we have f k = f and f n = 0 for all n > k. It was proved by Reitzner and Schulte in [64, Lemma 3.5] (see also [69, Theorem 4.1] for a similar result formulated in a more general framework) that for a squareintegrable U-statistic F with kernel f, the maps f n built from f as defined in Definition coincide with the kernels of the Wiener-Itô chaos expansion of F, and

45 3.2. CONCENTRATION FOR U-STATISTICS 43 that in this case, one has VF = k n! f n 2 n, n=1 where n denotes the norm on L 2 (µ n ). Therefore, a square-integrable U-statistic with kernel f also satisfies nn! f n 2 n = n=1 k nn! f n 2 n < n=1 and according to [46, Theorem 5.1], the latter condition implies k 2 V = E (D x F ) 2 dµ(x) <, where the first equality holds by (3.2.5) whenever F is well-behaved. Moreover, the computation in the last display of the proof of [46, Theorem 5.1] yields that if the above condition is verified, then E (D x F ) 2 dµ(x) = nn! f n 2 n. Finally, as a direct consequence of [47, Proposition 2.5] one has that an integrable U-statistic F that satisfies E (DF )2 dµ < is automatically square-integrable. n=1 Combining the above facts yields the following statement. Proposition Consider a well-behaved U-statistic F of order k 1, with non-negative kernel f L 1 (µ k ). Then, the following assertions are equivalent: (i) F is square-integrable; (ii) V <, where V is defined in (3.2.7). If one of the above conditions is verified, then (3.2.9) k 2 V = k nn! f n 2 n and VF = n=1 so that, in particular, V k 1 VF. k n! f n 2 n, Concentration around the median. Throughout this subsection, we consider the special case where the measurable space (, ) is given by the Euclidean space R d equipped with the Borel σ-algebra B(R d ). In this framework, in addition to the estimate from Corollary 3.2.4, a variation of the approach presented in [65, Sections 5.1 and 5.2] and [45, Section 3] gives that for finite intensity measure Poisson processes, the condition (3.2.6) also implies a concentration inequality around the median instead of the expectation of the considered U-statistic. Moreover, it is possible to extend these tail estimates to U-statistics built over non-finite intensity measure processes, resulting in the forthcoming theorem. To state this result, we first n=1

46 44 3. GENERAL METHODS FOR CONCENTRATION INEQUALITIES need to introduce a further notation. For any real random variable Z, we denote by MZ the smallest median of Z, i.e. (3.2.10) MZ = inf{x R : P(Z x) 1/2}. Note that MZ is exactly the value which the quantile-function of Z takes at 1/2. We are well prepared to state the announced result. Theorem Let F be a U-statistic of order k with kernel f 0. Assume that F is almost surely finite and satisfies (3.2.6) for some α [0, 2) and c > 0. Then for all r 0, ( P(F > MF + r) 2 exp r 2 4k 2 c(r + MF ) α If moreover MF > 0, then for all r 0, ( r 2 ) P(F < MF r) 2 exp 4k 2 c(mf ) α. be the space of finite simple measures on R d, equipped with the σ-algebra N (s) fin that is obtained by restricting N to N (s) fin. Note that as usual, we identify the elements in N(s) fin with their atom-sets, meaning in this case that N (s) fin is identified with the finite subsets of Rd. We will use the convex distance for Poisson point processes, which was introduced The approach that is described in the following to obtain the concentration inequalities presented in Theorem is a refinement of the method suggested in [65, Sections 5.1 and 5.2] and [45, Section 3]. Before we consider U-statistics under the general assumptions of Theorem above, we deal with the case where the intensity measure of the Poisson process η is finite. For this purpose, let N (s) fin by Reitzner in [63, pp. 1-2]. For any ξ N (s) fin and A N(s) fin when specialized to the case of simple measures, is given by where d T (ξ, A) = sup inf u S(ξ) ν A x ξ\ν u(x), ). S(ξ) = {u : R d R 0 measurable with x ξ u(x) 2 1}. the convex distance, To obtain the concentration inequalities for a U-statistic F associated with the finite intensity Poisson process η, we will first relate F in a reasonable way to d T. Then we will use the inequality (3.2.11) P(η A)P(d T (η, A) s) exp ) ( s2 4 for A N (s) fin, s 0, which is taken from [63, Theorem 3.2]. For the upcoming proof of Theorem we also need the following relation, stated in [45, p. 13]. We carry out the corresponding straightforward computations for the sake of completeness.

47 3.2. CONCENTRATION FOR U-STATISTICS 45 Lemma Let F be a U-statistic of order k with kernel f 0. Then for any ξ, ν N (s) fin we have F (ξ) k F (x, ξ) + F (ν). x ξ\ν Proof. We have F (ξ) = 1( x i / ν)f(x) + x ξ k x (ξ ν) k f(x) k 1(x i / ν)f(x) + f(x) i=1 x ξ k x ν k = k 1(x 1 / ν)f(x) + f(x) x ξ k = k x ξ\ν F (x, ξ) + F (ν), where the third line holds by symmetry of f. x ν k We are equipped to prove the result corresponding to Theorem for finite intensity measure processes. Proposition Assume that the intensity measure of η is finite. Let F be a U-statistic of order k with kernel f 0 and let m be a median of F. Assume that F satisfies (3.2.6) for some α [0, 2) and c > 0. Then for all r 0 one has ( r 2 ) P(F m + r) 2 exp 4k 2 c(r + m) α. If moreover m > 0, then for all r 0, P(F m r) 2 exp ( r2 4k 2 cm α Proof. This proof is obtained via a variation of the method that was suggested in [65, Sections 5.1 and 5.2] and [45, Section 3]. Let ξ N (s) fin and A N(s) fin such that F (ξ) > 0. Note that F (ξ) = y ξ F (y, ξ) and hence not all F (y, ξ) are equal to zero. Therefore, we can define the map u ξ : R d R 0 by u ξ (x) = ). F (x, ξ) y ξ F (y, ξ)2 if x ξ and u ξ (x) = 0 if x / ξ. Then we have u ξ S(ξ), thus x ξ\ν F (x, ξ) d T (ξ, A) inf ν A. y ξ F (y, ξ)2

48 46 3. GENERAL METHODS FOR CONCENTRATION INEQUALITIES Moreover, by Lemma 3.2.9, for any ν A we have F (ξ) k F (x, ξ) + F (ν). x ξ\ν Thus, since by (3.2.6) we have y ξ F (y, ξ)2 cf (ξ) α for P η -a.e. ξ N (s) fin, we obtain that (3.2.12) holds for P η -a.e. ξ N (s) fin d T (ξ, A) inf ν A F (ξ) F (ν) inf k y ξ F (y, ξ)2 ν A F (ξ) F (ν) k cf (ξ) α/2. satisfying F (ξ) > 0. Now, to prove the first inequality, let A = {ν N (s) fin : F (ν) m}. Let r > 0 and note that the desired inequality holds trivially for r = 0. Then, since the map s s/(s + m) α/2 is increasing, it follows from (3.2.12) that d T (ξ, A) for P η -a.e. ξ N (s) fin (3.2.11) and the fact that P(η A) 1 2 yields F (ξ) m k cf (ξ) r α/2 k c(r + m) α/2 that satisfies F (ξ) m + r. This observation together with P(F (η) m + r) P ( d T (η, A) ( 2 exp To prove the second inequality, let r 0 and A = {ν N (s) fin ) r k c(r + m) α/2 r 2 ). 4k 2 c(r + m) α : F (ν) m r}. If P(F (η) m r) = P(η A) = 0, the desired inequality holds trivially, so assume that P(F (η) m r) > 0 and note that this implies r m. Then, since the map s (s (m r))/s α/2 is increasing, it follows from (3.2.12) that d T (ξ, A) F (ξ) (m r) k m (m r) cf (ξ) α/2 k r = cm α/2 k cm α/2 for P η -a.e. ξ N (s) fin that satisfies F (ξ) m. Thus, it follows from (3.2.11) that ( ) 1 r P(F (η) m) P d T (η, A) 2 k cm α/2 ) 1 ( P(F (η) m r) exp r2 4k 2 cm α.

49 3.2. CONCENTRATION FOR U-STATISTICS 47 As a final preparation for the proof of Theorem 3.2.8, we establish the upcoming Lemma For the proof of this result, we use the following characterization of convergence in distribution, which is a part of the well-known Portmanteau theorem (see e.g. [41, Theorem 4.25]): Theorem Let and n, n N be random variables taking values in some metric space S. Then the following statements are equivalent: (i) n converges in distribution to ; (ii) lim inf n P( n B) P( B) for any open set B S; (iii) lim sup n P( n C) P( C) for any closed set C S. Recall that M is the smallest median of a random variable, as defined in (3.2.10). Lemma Let and n, n N be random variables such that a.s. n+1 a.s. n for all n N and n. Then there exists a non-decreasing sequence (m n ) n N where m n is a median of n such that lim n m n = M. Proof. For a random variable Z, let ˆMZ = sup{x R : P(Z x) 1/2} <. Note that ( ˆM n ) n N is a non-decreasing sequence and that ˆM n ˆM for all n N, hence ( ˆM n ) n N is convergent. We claim that (3.2.13) lim ˆM n M. n To see this, let x R be such that P( x) < 1/2. Since almost sure convergence of the n implies convergence in distribution, we have by the Portmanteau theorem (Theorem ) that lim sup P( n x) P( x) < 1/2. n Hence, for sufficiently large n, one has P( n x) < 1/2. This implies that for sufficiently large n, we have P( n x) 1/2 and thus ˆM n x. From these considerations it follows that lim ˆM n sup{x R : P( x) < 1 n 2 } = inf{x R : P( x) 1 2 } = M. Hence (3.2.13) is established. Now, for any n N, either M n = ˆM n is the unique median of n or all elements in the interval [M n, ˆM n ) are medians of n, where M n and ˆM n are non-decreasing in n. Taking (3.2.13) into account as well as the fact that M n M for all n N, the result follows. Proof of Theorem For any n N let η n = η B(0, n), where B(0, n) denotes the Euclidean ball centered at 0 with radius n. Then η n is a Poisson point process with finite intensity measure that is given by µ n (A) = µ(a B(0, n)) for any Borel set A R d. We define for any n N the random variable F n in terms of the functional S f : N [0, ], which was introduced in (3.2.1), by F n := S f (η n ).

50 48 3. GENERAL METHODS FOR CONCENTRATION INEQUALITIES One easily observes that, since F is a U-statistic with non-negative kernel, we have a.s. almost surely F n+1 F n for all n N and also F n F. According to Lemma we can choose a non-decreasing sequence (m n ) n N such that m n is a median of F n satisfying lim n m n = MF. Let r 0. By virtue of Proposition , for any n N we have ( r 2 ) ( r 2 ) P(F n m n r) 2 exp 4k 2 c(r + m n ) α 2 exp 4k 2 c(r + MF ) α. The sequence of random variables F n m n converges almost surely (and thus in distribution) to F MF. Therefore, by the Portmanteau theorem (Theorem ) we have P(F MF > r) lim inf n P(F n m n > r) 2 exp The inequality for the lower tail follows analogously. ( r 2 4k 2 c(r + MF ) α Another look at U-statistics of order two. In this section, we develop a different approach for obtaining concentration inequalities for the upper tail of U-statistics of order 2, which is based on combining Corollary with (a generalized version of) the result [66, Theorem 3] by Reynaud-Bouret, and which is in parts also vaguely similar in spirit to ideas from [66, Section 6.1.2] and [36]. Throughout this section, we let the general assumptions of Section 3.2 prevail; in particular, the intensity measure µ of η is a non-atomic positive measure on (, ) and the Poisson process η is simple. We begin by generalizing [66, Theorem 3] to Poisson processes with possibly non-finite intensity measure: Theorem Consider a countable family {f j } j J of functions [0, 1] and let G = sup f j (x). j J x η Assume that EG <. Then for any ν > 0 we have where ϕ(ν) = e ν ν 1. log E[exp(ν(G EG))] ϕ(ν)eg, Proof. Here we generalize the proof of [66, Theorem 3]. First note that by monotone convergence, we can assume without loss of generality that J <. For each n N let G n = min(g, n). Then E(e νgn ) <, hence Proposition with I = gives ( ) (3.2.14) Ent(e νgn ) E e νgn ϕ( νd x G n (η δ x )). x η ).

51 3.2. CONCENTRATION FOR U-STATISTICS 49 Consider some realization of η. Since we assumed J <, it follows that for some j J we have Now, for any x η we have G(η) = x η f j (x). 0 D x G(η δ x ) f j (x) 1. Moreover, if G(η δ x ) n, then D x G n (η δ x ) = 0 and if G(η δ x ) < n, then D x G n (η δ x ) = D x G(η δ x ) max(0, G(η) n). From this we obtain ( ) D x G n (η δ x ) f j (x) max(0, G(η) n) x η x η = G(η) max(0, G(η) n) = G n (η). Since ϕ is obviously convex and ϕ(0) = 0, we have that ϕ( νz) ϕ( ν)z for ν > 0 and 0 z 1. Thus, it follows from the above considerations that ( ) Ent(e νgn ) ϕ( ν)e e νgn D x G n (η δ x ) ϕ( ν)e(e νgn G n ). Continuing in the same way as in the proof of [54, Theorem 10] gives (3.2.15) x η log E[exp(ν(G n EG n ))] ϕ(ν)eg n ϕ(ν)eg. Now, since ν > 0 and EG <, by monotone convergence we have lim E[exp(ν(G n EG n ))] = E[exp(ν(G EG))]. n Invoking (3.2.15) yields the result. We continue with an analytic lemma that is used in the proof of the upcoming theorem, but which might be of independent interest in similar situations. Lemma For any z > 0, sup ν>0 [ νz e ν2 + 1 ] log(z + 1)z 3/2 4. z + 8 Proof. This proof is inspired by the proof of [50, Corollary 2.12]. We begin with a preliminary observation. Let a > 1 and consider the map ν 1 e ν2 + aν 2. Then this map takes 0 to 0 and it has a unique local extremum (a maximum) at log(a) on the positive reals. Hence, whenever 1 e x 2 + ax 2 0 for some x > 0, we have 1 e ν2 aν 2 for any ν (0, x]. Now assume that 0 < z 4. Then we take ν = z/4. Since ν 1 and 1 e + 2 0, the above observation implies 1 e ν2 2ν 2.

52 50 3. GENERAL METHODS FOR CONCENTRATION INEQUALITIES Hence, we have νz e ν2 + 1 νz 2ν 2 = z2 log(z + 1)z 3/ z + 8 For 4 < z 12 we take ν = z/8. Then ν 3/2 and since 1 e 9/ , the initial observation gives Thus, 1 e ν2 4ν 2. νz e ν2 + 1 νz 4ν 2 = z2 log(z + 1)z 3/ z + 8 It remains to prove that the result holds for z > 12. Here we can take ν = log(z) and see that the supremum is lower bounded by z( log(z) 1) + 1. Now, it is elementary to check that the map z log(z) log(z + 1) is increasing for z 12, thus log(z) log(z + 1) + A, where A = log(12) log(13) < 0. Hence, We claim that z( log(z) 1) + 1 z( log(z + 1) + A 1) + 1. z( log(z + 1) + A 1) z( log(z + 1)) for z > 12. This will imply the result since 1 4 z( log(z + 1)) 1 z( log(z + 1)) z = 4 z + 2 To prove the claim, define log(z + 1)z 3/2 4. z + 8 h(z) = z( log(z + 1) + A 1) z( log(z + 1)) = 3 4 z log(z + 1) + z(a 1) + 1. The derivative of h satisfies ( ) h (z) = 3 z + 2 log(z + 1)(z + 1) A 1 3 log(z + 1) + A 1. log(z + 1)(z + 1) 4 Since the right-hand side is increasing in z and positive for z = 12, we have that h (z) 0 for any z > 12. So h is increasing for z > 12 and since also h(12) 0, it follows that h(z) 0 for all z > 12. This proves the claim and concludes the proof of the result.

53 3.2. CONCENTRATION FOR U-STATISTICS 51 Theorem Let F be a U-statistic of order 2 with kernel f 0 such that 0 < EF <. Assume that there is a countable family {g j } j J of functions [0, 1] and a constant c > 0 such that satisfies almost surely (3.2.16) sup G = sup j J y η x η\y g j (x) x η f(y, x) cg and EG <. Then EG > 0 and for any r 0 we have ( ( )) EF + r EF P(F EF + r) exp E(G) χ, 4c EG where log(z + 1)z 3/2 χ(z) = 4. z + 8 Proof. First note that EG = 0 would imply a.s. G = 0 and hence (3.2.16) would give a.s. x η\y f(y, x) = 0 for all y η, yielding a.s. F = 0 and thus contradicting the assumption EF > 0. We see that EG > 0. Now, the assumptions together with (3.2.4) imply that almost surely V + = 4 F (y, η) 2 = 4 y η y η 2 f(y, x) 4cGF. inf θ (0,2/ν) x η\y By Corollary and Theorem this gives for any ν > 0, log E[exp(ν( F E [ F ))] exp inf θ (0,2/ν) νθ 2 νθ log E ( )] 4cν θ G νθ EG(exp(4cν/θ) 1) 2 νθ EG(exp(4cν 2 ) 1). Let r > 0 and note that the result holds trivially for r = 0. Then, using the above computation and Markov s inequality, we obtain for any ν > 0, P(F EF + r) P(e ν( F E F ) e ν( EF +r EF ) ) ( exp EG(exp(4cν 2 ) 1) ν( EF + r ) EF ). Hence, writing z = ( EF + r EF )/( 4cEG) and substituting ν by ν/ 4c, we obtain ( [ P(F EF + r) exp EG sup νz exp(ν 2 ) + 1 ]). ν>0 The result now follows from Lemma

54 52 3. GENERAL METHODS FOR CONCENTRATION INEQUALITIES 3.3. Concentration for the convex distance in Poisson-based models References to the underlying papers. The present section coincides up to minor changes with [BP15, Section 8]. The convex distance for product spaces that was introduced by M. Talagrand in [73] has proved to be a very useful tool in the context of concentration inequalities see e.g. [26, Chapter 11], [72, Chapter 6], [74, Chapter 2] and the references therein. In the recent paper [63] by M. Reitzner, this notion has been adapted for models based on Poisson point processes with finite intensity measure. For both the product space and the Poisson space version, the method of using the convex distance to establish concentration properties is based on an isoperimetric inequality. First applications of this method for Poisson-based models are worked out in [45, 65] where concentration inequalities for Poisson U-statistics are presented, and a variation of the approach that is used in the latter references has already been studied in Subsection of the present thesis. The proof of the convex distance isoperimetric inequality in [63] uses an approximation of the Poisson process by binomial processes. The goal in this section is to give an alternative proof for this inequality. Apart from slightly worse constants, we entirely recover Reitzner s result [63, Theorem 1.1] with the tools developed in the present work. In particular, we only use methods from Poisson process theory, thus answering the question proposed in [63] of whether such a direct proof is possible. Moreover, the assumptions on the space for our results are less restrictive than in [63] where only locally compact second countable Hausdorff spaces are studied. As before, we consider a measurable space (, ) and we will assume throughout the present section that {x} for all x. The upcoming presentation is based on [12, Section 2] and [11, pp ] where the convex distance for product spaces is recovered using the entropy method. The results presented in the following are derived by adapting the reasoning from the latter references to the setting of Poisson processes considered in this section Convex distance for Poisson processes. The framework of the present section allows for Poisson processes that are not simple, and is therefore significantly more general than the setting of Subsection 3.2.4, where we already defined a version of the convex distance that was adapted to simple point processes. We therefore need to first define the notion of convex distance in its full generality, as it was introduced by Reitzner in [63, pp. 1-2]. For this purpose, let N fin N denote the space of finite integer-valued discrete measures on which is equipped with the σ-algebra N fin obtained by restricting N to N fin. We will write ξ(x) = ξ({x}) whenever ξ N fin and x in order to simplify notations. For any two measures ξ, ν N fin, we define the measure ξ \ ν by ξ \ ν = x ξ(ξ(x) ν(x)) + δ x,

55 3.3. CONCENTRATION FOR THE CONVE DISTANCE IN POISSON-BASED MODELS 53 where x ξ indicates, as before, that x and ξ(x) > 0. The convex distance d T (ξ, A) is now defined for any measurable set A N fin and ξ N fin by d T (ξ, A) = sup inf u d(ξ \ ν), u ξ 1 ν A where the supremum ranges over all measurable maps u : R such that u ξ 1 and ξ denotes the 2-norm with respect to the measure ξ. Obviously, the supremum in the above right-hand side can also by taken only over non-negative maps u. It is immediate from the above definition that (3.3.1) d T (ξ, A) = sup inf u(x)(ξ(x) ν(x)) +. u ξ 1 ν A x ξ The upcoming Proposition gives an alternative characterization for the convex distance, which will be crucial for proving the isoperimetric inequality later on. For the proof of this result, as in the corresponding reasoning from [11, pp ], we will use Sion s minimax theorem [71, Corollary 3.3]. The following formulation of Sion s result is basically taken from [42, p. 5]: Theorem Let S and T be convex subsets of linear topological spaces and assume moreover that S is compact. Let f : S T R be a function such that Then (i) For each x S the function T R, y f(x, y) is upper semicontinuous and quasi-concave; (ii) For each y T the function S R, x f(x, y) is lower semicontinuous and quasi-convex. inf sup x S y T f(x, y) = sup inf f(x, y). y T x S We are well equipped for the proof of the announced characterization of the convex distance. Proposition Let A N fin and denote by M(A) the set of probability measures on A. Then, for any ξ N fin we have d T (ξ, A) = max min u(x)e ζ(ν) [(ξ(x) ν(x)) + ] u ξ 1 ζ M(A) x ξ = min max ζ M(A) u ξ 1 x ξ u(x)e ζ(ν) [(ξ(x) ν(x)) + ], where, here and for the rest of the section, we use the shorthand notation E ζ(ν) [h(ν)] = h(ν)dζ(ν), for every positive measurable map h : A R +. A

56 54 3. GENERAL METHODS FOR CONCENTRATION INEQUALITIES Proof. Here we adapt arguments from the proof of [11, Proposition 13]. We begin by proving that (3.3.2) d T (ξ, A) = sup inf u(x)e ζ(ν) [(ξ(x) ν(x)) + ]. u ξ 1 ζ M(A) x ξ For any ν A consider the probability measure ζ ν M(A) that is concentrated on ν. Then u(x)(ξ(x) ν(x)) +. x ξ u(x)e ζν(ν )[(ξ(x) ν (x)) + ] = x ξ Hence, for any u with u ξ 1, we have inf u(x)e ζ(ν) [(ξ(x) ν(x)) + ] inf u(x)(ξ(x) ν(x)) +. ζ M(A) x ξ ν A x ξ On the other hand, for all ζ M(A) we have inf u(x)(ξ(x) ν(x)) + E ζ(ν) u(x)(ξ(x) ν(x)) + Thus, ν A x ξ inf u(x)(ξ(x) ν(x)) + ν A x ξ This establishes equation (3.3.2). x ξ = x ξ u(x)e ζ(ν) [(ξ(x) ν(x)) + ]. inf ζ M(A) x ξ u(x)e ζ(ν) [(ξ(x) ν(x)) + ]. We aim at applying Sion s minimax theorem (Theorem 3.3.1). To get prepared for this, first note that the supremum in (3.3.2) can obviously by performed with respect to those functions u : R satisfying u(x) = 0 whenever x / ξ. Note also that these functions form a finite dimensional real vector space (whose dimension is given by {x : x ξ} ), which will be denoted by U. So the supremum is actually taken over U 1 = {u U : u ξ 1}, which is a convex and compact subset of U. Denote by Q the finite set of maps q : N 0 satisfying q(x) ξ(x) for all x ξ and q(x) = 0 whenever x / ξ. Moreover, define the map I by I : A Q, ν (x (ξ(x) ν(x)) + ). Then, for any ζ M(A) and x ξ we have E ζ(ν) [(ξ(x) ν(x)) + ] = E Iζ(q) [q(x)], where Iζ denotes the pushforward measure of ζ with respect to I. Now, instead of taking the infimum in M(A) we can also minimize in the set of pushforward measures IM(A). The set IM(A) coincides with the set of probability measures on I(A), denoted by MI(A). Observe that MI(A) is a convex and compact subset in

57 3.3. CONCENTRATION FOR THE CONVE DISTANCE IN POISSON-BASED MODELS 55 the finite dimensional real vector space of all signed measures on I(A), which we denote by SI(A). Obviously, the map U SI(A) R, (u, ζ) x ξ u(x)e ζ(q) [q(x)], is both linear in u and ζ. Hence, it is also upper semicontinuous and quasi-concave in u and lower semicontinuous and quasi-convex in ζ. According to the above considerations, the assumptions of Sion s theorem are satisfied and we obtain sup inf u(x)e ζ(ν) [(ξ(x) ν(x)) + ] = sup inf u(x)e ζ(q) [q(x)] u ξ 1 ζ M(A) x ξ = inf sup ζ MI(A) u U 1 x ξ u(x)e ζ(q) [q(x)] = inf u U 1 ζ MI(A) x ξ sup ζ M(A) u ξ 1 x ξ u(x)e ζ(ν) [(ξ(x) ν(x)) + ]. Since both U 1 and MI(A) are compact, the suprema and infima are actually maxima and minima Convex distance inequality. In what follows, we will give the announced new proof of the convex distance inequality for Poisson point processes. The result we aim to prove is the following: Theorem Let η be a Poisson point process on with finite intensity measure µ. Let A N fin be arbitrary. Then In particular, for any r 0, P(η A)E(e d T (η,a) 2 /10 ) 1. (3.3.3) P(η A)P(d T (η, A) r) e r2 /10. Note that in [63, Theorem 1.1] (under more restrictive assumptions on the space (,, µ)), an inequality stronger than (3.3.3) is proved, where the constant 1/10 is replaced by 1/4. To get prepared for the proof of Theorem 3.3.3, we first establish the following result. This is interesting in its own right since it particularly states that the variance of the convex distance is bounded by 1. Proposition Let η be a Poisson point process on with finite intensity measure µ. Then for any A N fin almost surely V + (d T (η, A)) = (D x d T (η δ x, A)) 2 dη(x) 1. In particular, Vd T (η, A) 1. Proof. For this proof we adapt arguments from the proof of [11, Proposition 13]. According to Proposition we can choose a map û : R with û ξ 1 and a probability measure ˆζ on A satisfying d T (ξ, A) = x ξ û(x)eˆζ(ν) [(ξ(x) ν(x)) + ].

58 56 3. GENERAL METHODS FOR CONCENTRATION INEQUALITIES Then, for any z ξ, we have d T (ξ δ z, A) min ζ M(A) x ξ δ z û(x)e ζ(ν) [((ξ δ z )(x) ν(x)) + ]. Choose some ζ M(A) that achieves the minimum in the above right-hand side. Then d T (ξ, A) x ξ û(x)e ζ(ν) [(ξ(x) ν(x)) + ]. It follows that d T (ξ, A) d T (ξ δ z, A) x ξ û(x)e ζ(ν) [(ξ(x) ν(x)) + ] = û(z)e ζ(ν) [(ξ(z) ν(z)) + (ξ(z) ν(z) 1) + ] = û(z)e ζ(ν) [1{ξ(z) > ν(z)}] û(z). x ξ δ z û(x)e ζ(ν) [((ξ δ z )(x) ν(x)) + ] This yields (d T (ξ, A) d T (ξ δ z, A)) 2 dξ(z) û(z) 2 ξ(z) = û 2 ξ 1. Hence, almost surely (3.3.4) (D x d T (η δ x, A)) 2 dη(x) 1. Plainly, one has d T (η, A) η() and the latter random variable is Poisson distributed. Hence, the convex distance is square-integrable, and therefore the Poincaré inequality for Poisson processes (3.1.2) gives Vd T (η, A) E (D x d T (η δ x, A)) 2 dη(x). Thus, Vd T (η, A) 1 by virtue of (3.3.4). As a final ingredient for the upcoming proof of the convex distance inequality, we derive the following consequence of the Cauchy-Schwarz inequality. Lemma Let ξ N fin and consider the measure space (,, ξ). Then for any measurable map h : R, (3.3.5) sup u ξ 1 u(x)h(x)dξ(x) = h ξ. Proof. Note that h is of course square-integrable with respect to ξ. Hence, by the Cauchy-Schwarz inequality, for any u such that u ξ 1, u(x)h(x)dξ(x) u ξ h ξ h ξ. We see that the left-hand side in (3.3.5) is less or equal to the right-hand side. Moreover, we can take u = h/ h ξ to conclude that the right-hand side is less or equal to the left-hand side.

59 3.3. CONCENTRATION FOR THE CONVE DISTANCE IN POISSON-BASED MODELS 57 Proof of Theorem Here we adapt arguments from the proofs of [12, Lemma 1 and Corollary 1]. We will prove below that (3.3.6) (3.3.7) 0 D x (d T (ξ, A) 2 ) 2 for any (x, ξ) N fin and almost surely V + (d T (η, A) 2 ) 4d T (η, A) 2. Hence, if follows from Theorem and Theorem (where the latter result is applied to the Poisson functional 1 2 d T (η, A) 2 ) that (i) For any ν (0, 1/2), log E(exp(ν(d T (η, A) 2 Ed T (η, A) 2 ))) 2ν2 Ed T (η, A) 2 ; 1 2ν (ii) If Ed T (η, A) 2 > 0, then for any r 0, ( P(d T (η, A) 2 Ed T (η, A) 2 r 2 ) r) exp 8Ed T (η, A) 2. Taking ν = 1/10 we obtain from (i) that ( Ee d T (η,a) 2 /10 EdT (η, A) 2 ) exp. 8 Moreover, since η A implies d T (η, A) = 0, it follows from (ii) with r = Ed T (η, A) 2 that P(η A) P ( d T (η, A) 2 Ed T (η, A) 2 Ed T (η, A) 2) exp ( Ed T (η, A) 2 ), 8 where this inequality holds trivially in the case Ed T (η, A) 2 = 0. So, the result follows once we have proven (3.3.6) and (3.3.7). To prove (3.3.7), first observe that d T (, A) is an increasing functional. Using this and Proposition we compute V + (d T (η, A) 2 ) = (d T (η, A) 2 d T (η δ x, A) 2 ) 2 dη(x) = (d T (η, A) d T (η δ x, A)) 2 (d T (η, A) + d T (η δ x, A)) 2 dη(x) (d T (η, A) d T (η δ x, A)) 2 4d T (η, A) 2 dη(x) = 4d T (η, A) 2 (D x d T (η δ x, A)) 2 dη(x) 4d T (η, A) 2. It remains to prove (3.3.6). For this, let (z, ξ) N fin. Then, according to Proposition 3.3.2, we can write d T (ξ, A) = max u(x)eˆζ(ν) [(ξ(x) ν(x)) + ] u ξ 1 x ξ = max u ξ 1 u(x)eˆζ(ν) [ ( 1 ν(x) ξ(x) ) + ] dξ(x)

60 58 3. GENERAL METHODS FOR CONCENTRATION INEQUALITIES for some probability measure ˆζ on A. By virtue of Lemma 3.3.5, the latter expression equals ( [ ( Eˆζ(ν) 1 ν(x) ) ]) 2 dξ(x). ξ(x) Invoking Proposition and again Lemma 3.3.5, we also obtain d T (ξ + δ z, A) max u(x)eˆζ(ν) [((ξ + δ z )(x) ν(x)) + ] u ξ+δz 1 x ξ+δ z = ( [ ( ) ]) 2 ν(x) Eˆζ(ν) 1 d(ξ + δ z )(x). (ξ + δ z )(x) + From this it follows that ( [ ( D z d 2 T Eˆζ(ν) 1 ν(z) ) ξ(z) ]) 2 ( [ ( (ξ(z) + 1) Eˆζ(ν) 1 ν(z) ) ξ(z) + ]) 2 ξ(z) where the subtrahend vanishes whenever ξ(z) = 0. Clearly, if ξ(z) = 0, then the righthand side in the above display is less or equal to 1. So assume that ξ(z) > 0. Then, using the abbreviations G(ν, z) = (ξ(z) ν(z) + 1) + and G (ν, z) = (ξ(z) ν(z)) +, one observes that the right-hand side in the last display can be upper bounded by (Eˆζ(ν) G(ν, z)) 2 (Eˆζ(ν) G (ν, z)) 2 ξ(z) + 1 = Eˆζ(ν) [G(ν, z) G (ν, z)] Eˆζ(ν) [G(ν, z) + G (ν, z)] ξ(z) + 1 Eˆζ(ν) [G(ν, z) + G (ν, z)] ξ(z) + 1 [ ] (ξ(z) ν(z) + 1)+ + (ξ(z) ν(z)) + = Eˆζ(ν) ξ(z) It follows that D z (d T (ξ, A) 2 ) 2. The functional d T (, A) is increasing and non-negative, thus D z (d T (ξ, A) 2 ) 0. This concludes the proof.

61 CHAPTER 4 Concentration for Random Geometric Graphs References to the underlying papers. Figure 1 below is a grayscale version of [BR15, Figure 1]. The paragraph following Figure 1 coincides up to minor changes with the introductory paragraph of [BR15, Section 4]. In this chapter, we will prove concentration inequalities for a variety of random quantities that are naturally of interest when studying random geometric graphs. The desired estimates are derived using the general results that were established in the preceding Chapter 3. Before introducing the model for random geometric graphs that will be considered in the following, we present a picture that illustrates how the resulting graphs might look like in the plane. Figure 1: Random unit disk graph, intensity measure dµ = 18( x + 1) 1 dx 59

62 60 4. CONCENTRATION FOR RANDOM GEOMETRIC GRAPHS The model that will be investigated in the present chapter has been particularly studied in [32, 43, 44], and it includes as a special case the classical model for random geometric graphs, which is extensively described in [61] for the case where the underlying point process has finite intensity measure. Let S R d be a Borel set such that S = S. To any countable subset ξ R d we assign the geometric graph G S (ξ) with vertex set ξ and an edge between two distinct vertices x and y whenever x y S. For x = (x 1,..., x k ) (R d ) k we will occasionally write G S (x) instead of G S ({x 1,..., x k }). Throughout the chapter, let η be a non-trivial Poisson point process on R d with non-atomic intensity measure µ. We also assume that µ is locally finite, meaning that µ(b) < for any bounded Borel set B. Note that the Poisson process η considered in this chapter is simple and will therefore be regarded as a random countable subset of R d. Now, the object that we are interested in is the random geometric graph G S (η). Denote the closed ball centered at x R d with radius ρ R + by B(x, ρ). Throughout, we will assume that B(0, ρ) S B(0, θρ) for some ρ > 0 and θ 1. Note that if we take θ = 1, then S = B(0, ρ) and we end up with the classical model of random geometric graphs for the Euclidean norm, often referred to as random disk graph. Also, the classical geometric graphs based on any other norm on R d are covered by the model introduced above. We emphasize that the setting introduced above fits into the general framework of the preceding Chapter 3, where the measurable space (, ) is now given by R d equipped with the Borel σ-algebra B(R d ). The definitions and notations that have been introduced and used in the previous chapters shall prevail throughout. A comprehensive part of research related to quantities associated with random geometric graphs deals with how they behave asymptotically when the parameters of the model are varied. We stress that throughout the present chapter, the parameters of the considered graph model are fixed. However, in the subsequent Chapter 5, asymptotic results in terms of strong laws will be derived, and an overview of the literature on the asymptotic theory for random geometric graphs is also included in that chapter Edge counting References to the underlying papers. The present section, excluding Lemma and its proof as well as Subsection 4.1.4, coincides up to minor changes with content from [BP15, Section 6, pp ]. Here the Subsections 4.1.1, 4.1.2, and correspond to [BP15, Subsections 6.1, 6.2, 6.3 and 6.4], respectively. In particular, Figure 2 is a grayscale version of [BP15, Figure 1]. In this section we will prove concentration inequalities for the number of edges in the random disk graph. For the remainder of the section, we fix some ρ > 0 and denote the corresponding random disk graph associated with η by G = G(η) = G S (η) where the set S R d is now just the Euclidean ball with radius ρ. The vertex set of G is then given by η, and two vertices x, y are linked by an edge (in symbols, x y) whenever 0 < x y ρ, where denotes the Euclidean norm on R d.

63 4.1. EDGE COUNTING 61 For technical reasons clarified below, we will assume for the rest of the section that the following condition on µ is verified: (4.1.1) µ(b(x, ρ)) dµ(x) <. R d Relation (4.1.1) is verified whenever µ(r d ) <, but such a finiteness condition is not necessary for (4.1.1) to hold 1. Note that, since µ is locally finite and (4.1.1) is in order, the map x µ(b(x, ρ)) is necessarily bounded. To see this, choose γ > 0 such that the ball B(0, ρ) can be written as a union of 1/γ many sets with diameter less than ρ. Then the pigeonhole principle (here we mean the extension that µ( n i=1 B i) = c < implies µ(b j ) c/n for some j) yields that for any y R d we can choose a set C y B(y, ρ) satisfying: (i) C y B(x, ρ) for all x C y, and (ii) µ(c y ) γµ(b(y, ρ)). Now, µ(b(x, ρ))dµ(x) sup µ(b(x, ρ))dµ(x) γ sup µ(b(y, ρ))dµ(x) R d y R d C y y R d C y = γ sup y R d µ(c y )µ(b(y, ρ)) γ 2 sup y R d µ(b(y, ρ)) 2. In the following, we will provide new concentration estimates for the random variable N = N(η) := { {x, y} η : x y }, corresponding to the number of edges of G, where A denotes the cardinality of a set A. It is immediately seen that N is a Poisson U-statistic of order 2 with non-negative kernel f(x, y) = 1 21{ x y ρ}. In particular, the Slivnyak-Mecke formula (2.1.5) yields that the assumption (4.1.1) is actually equivalent to integrability of N and that EN = 1 µ(b(x, ρ)) dµ(x). 2 R d We also see that assumption (4.1.1) implies that N < almost surely, yielding in turn that N is well-behaved in the sense described in Subsection Preparation: optimal rates. Let the above notation and assumptions prevail. In the forthcoming Section 4.1.2, we will provide estimates for the upper tail of N having the form (4.1.2) P(N EN + r) exp( I(r)), r > 0, where I is a positive map verifying (4.1.3) lim I(r) =. r The next statement contains a universal necessary condition on the asymptotic behavior of I(r). Note that we will write I 1 (r) I 2 (r) for functions I 1, I 2 if they are asymptotically equivalent, meaning that lim r I 1 (r)/i 2 (r) = 1. 1 Consider for instance the measure µ on R 2 having density p(x) = ( x + 1) 2 together with an arbitrary radius ρ > 0

Proposition 4.1.1. Let I(r) verify (4.1.2) and (4.1.3). Then,

lim sup_{r→∞} I(r) / (r^{1/2} log r) ≤ 1/√2.

Proof. Let x ∈ R^d be such that q := µ(B(x, ρ/2)) > 0. Then N̂ := η(B(x, ρ/2)) is Poisson distributed with expectation q. Moreover, the distance between any y, z ∈ B(x, ρ/2) is at most ρ, thus any two distinct vertices in B(x, ρ/2) are connected by an edge. This implies that almost surely N̂(N̂ − 1)/2 ≤ N. Hence, for any r ≥ 0 we have

P(N ≥ EN + r) ≥ P(N̂² − N̂ ≥ 2EN + 2r) ≥ P(N̂ ≥ h(r)),

where h(r) := ⌈(2EN + 2r)^{1/2}⌉ + 1. Now, using that h! ≤ h^h, we obtain for sufficiently large r

P(N̂ ≥ h(r)) ≥ q^{h(r)} e^{−q} / h(r)! ≥ exp(−h(r) log(h(r)/q) − q),

and therefore

lim inf_{r→∞} P(N ≥ EN + r) / exp(−h(r) log(h(r)/q) − q) ≥ 1.

The above considerations yield that there exists a constant C ≥ 0 such that, for sufficiently large r,

h(r) log(h(r)/q) + q − I(r) ≥ −C.

Dividing this inequality by I(r) and letting r diverge to infinity gives

(4.1.4) lim inf_{r→∞} h(r) log(h(r)/q) / I(r) ≥ 1.

The conclusion is obtained by observing that, as r → ∞, h(r) log(h(r)/q) ∼ (r/2)^{1/2} log r.

The following statement is an elementary consequence of Proposition 4.1.1.

Corollary 4.1.2. Let I be a positive map verifying (4.1.2), and assume that there exist constants a, b > 0 such that, as r → ∞, I(r) ∼ b r^a. Then, necessarily, a ≤ 1/2.

4.1.2. Concentration inequalities for the upper tail. We will now deal with bounds on the upper tail of N. We start by observing that, for every x ∈ η, the local version N(x, η), as defined in (3.2.3), is exactly given by the quantity deg(x)/2, where deg(x) = |{y ∈ η : x ∼ y}| is the degree of the vertex x. Our aim in what follows is to show that, for some constant c > 0, one has almost surely

(4.1.5) Σ_{x ∈ η} deg(x)² ≤ c N^{3/2}.
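Before establishing (4.1.5) rigorously, it is instructive to probe it by simulation. The sketch below (illustrative, not from the thesis; the window, the intensities and the radius are our own arbitrary choices) computes the ratio Σ_{x∈η} deg(x)² / N^{3/2} for increasing intensity; (4.1.5) asserts that such ratios remain bounded by a constant c:

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.1

def degrees_and_edge_count(points, rho):
    """Degree sequence and edge count N of the disk graph on a point set."""
    deg = np.zeros(len(points), dtype=int)
    for i in range(len(points)):
        dist = np.linalg.norm(points[i + 1:] - points[i], axis=1)
        hits = np.flatnonzero(dist <= rho) + i + 1   # neighbors with index > i
        deg[i] += len(hits)
        deg[hits] += 1
    return deg, int(deg.sum()) // 2                  # each edge counted twice

for lam in [50, 200, 800, 3200]:
    pts = rng.uniform(size=(rng.poisson(lam), 2))
    deg, N = degrees_and_edge_count(pts, rho)
    if N > 0:
        ratio = (deg.astype(float) ** 2).sum() / N ** 1.5
        print(f"lam={lam:5d}  N={N:7d}  sum(deg^2)/N^(3/2) = {ratio:.3f}")
```

If the printed ratios remain of constant order as λ grows, this is consistent with (4.1.5); note that the statement proved in this section is stronger, since it holds deterministically for every realization of η.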

Combining (4.1.5) with the corresponding corollary from Section 3.2 yields the following concentration inequality for the upper tail:

(4.1.6) P(N ≥ EN + r) ≤ exp(−((r + EN)^{1/4} − (EN)^{1/4})² / (2c)).

Observe that the right-hand side of (4.1.6) has the form exp(−I(r)), where I(r) ∼ r^{1/2}/(2c) as r → ∞. According to Corollary 4.1.2, the power 1/2 for r is optimal in this situation. We will see in Section 4.2 that, by adopting an alternative approach, the rate of decay of I(r) can indeed be improved by the square root of a logarithmic factor. Also notice that, by virtue of relation (4.1.5) together with Corollary 3.2.4, almost sure finiteness of N is equivalent to integrability of N. Hence, relation (4.1.1) actually holds if and only if N is almost surely finite.

We start by proving a geometric lemma, focusing on deterministic countable point sets. Note that the diameter of a subset B ⊆ R^d will be denoted by diam(B). In what follows, we shall write p = p(d) to indicate the smallest integer p such that the half ball

B_h(ρ) := {x ∈ R^d : ‖x‖ ≤ ρ, x_1 > 0}

can be written as a union B_h(ρ) = B_1 ∪ ... ∪ B_p of pairwise disjoint Borel sets such that diam(B_i) ≤ ρ for all i = 1, ..., p. Note that the value of p depends on the dimension d of the surrounding Euclidean space. In the plane R² one has for example that p = 3. The picture below illustrates the situation described in the proof of the upcoming lemma.

Figure 2: Partitioning of the half ball
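As a sanity check on the value p = 3 in the plane, one can split the half disk into three sectors of 60 degrees: two points in a sector of radius ρ and apex angle at most 60 degrees are at distance at most ρ, so each piece has diameter at most ρ. The sketch below verifies this numerically; it is illustrative only and independent of the thesis (it demonstrates p ≤ 3, while minimality requires a separate argument).

```python
import numpy as np

rng = np.random.default_rng(3)
rho = 1.0

# Half disk {x : |x| <= rho, x_1 > 0}, i.e. angles in (-90, 90) degrees,
# partitioned into three half-open 60-degree sectors.
def sector_of(points):
    ang = np.degrees(np.arctan2(points[:, 1], points[:, 0]))  # in (-90, 90)
    return ((ang + 90.0) // 60.0).astype(int)                 # 0, 1 or 2

# Rejection-sample points from the half disk.
pts = rng.uniform(-rho, rho, size=(200000, 2))
pts = pts[(np.linalg.norm(pts, axis=1) <= rho) & (pts[:, 0] > 0)]

for s in range(3):
    sec = pts[sector_of(pts) == s]
    sub = sec[rng.choice(len(sec), size=1000)]
    # Largest pairwise distance among the sampled points of this sector:
    d = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=2).max()
    print(f"sector {s}: {len(sec)} points, empirical diameter {d:.4f} <= rho")
```

The printed empirical diameters approach ρ from below, matching the fact that a circular sector of radius ρ and opening at most 60 degrees has diameter exactly ρ.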
