1. Principle of large deviations


This section provides a short introduction to the theory of large deviations. The basic principle behind the theory centers on the observation that certain functions $F_n$ of a large number $n$ of i.i.d. random variables often have the following property:

$$P(F_n \in dx) \approx e^{-s_n I(x)}\,dx \tag{1}$$

where the $s_n$ are real constants such that $s_n \to \infty$ as $n \to \infty$. In other words, the probability that $F_n$ takes a value near a point $x$ decays exponentially fast, with speed $s_n$ and rate function $I(x)$.

The following example from [1] helps explain the concept. Consider the sample mean of $n$ real random variables,

$$S_n = \frac{1}{n}\sum_{i=1}^{n} X_i \tag{2}$$

We want the probability density function of $S_n$, written $P_{S_n}(s)$, when the $n$ random variables are mutually independent and identically distributed. Since the random variables are independent, their joint density factorizes:

$$p(x_1, \ldots, x_n) = \prod_{i=1}^{n} p(x_i) \tag{3}$$

Depending on the form of $p(x)$, the common probability density (or mass) function of the random variables, an explicit expression for $P_{S_n}(s)$ can be obtained. If $p(x)$ is a Gaussian or an exponential distribution, it can be shown that, for large $n$,

$$P_{S_n}(s) \approx e^{-n I(s)} \tag{4}$$

where

$$I(s) = \frac{(s-\mu)^2}{2\sigma^2}$$

for the Gaussian pdf with mean $\mu$ and variance $\sigma^2$, and

$$I(s) = \frac{s}{\mu} - 1 - \ln\frac{s}{\mu}, \qquad s > 0$$

for the exponential pdf with mean $\mu$. If the random variables are Bernoulli with probability of success $\alpha$, then for $0 \le s \le 1$

$$I(s) = s \ln\frac{s}{\alpha} + (1-s)\ln\frac{1-s}{1-\alpha}$$

For simplicity (without going into the rigorous concepts of topology and measure theory), the author of [1] states that a random variable $S_n$, or its pdf $P_{S_n}(s)$, satisfies a Large Deviation Principle (LDP) if the following limit exists:

$$\lim_{n\to\infty} -\frac{1}{n}\ln P_{S_n}(s) = I(s) \tag{5}$$

and gives rise to a function $I(s)$, the rate function, that is not everywhere zero. This can be arrived at in the following manner. As mentioned earlier, $P_{S_n}(s) \approx e^{-nI(s)}$. This implies that

$$-\ln P_{S_n}(s) = n I(s) + o(n)$$

where $o(n)$ is a correction term that is sub-linear in $n$. Dividing by $n$ and taking the limit as $n$ tends to infinity, the correction $o(n)/n$ vanishes, which leads to

$$\lim_{n\to\infty} -\frac{1}{n}\ln P_{S_n}(s) = I(s)$$

The author emphasizes, however, that this is a simplified version of the rigorous definition provided by Varadhan [2]. The rigorous definition is expressed in terms of measures of sets rather than probability density functions, and involves upper and lower bounds on these probabilities rather than a simple limit.

Large deviation theory has two important aspects. On the one hand, there is the question of how to formalize the intuitive formula for the LDP. On the other, there is the question of determining the rate function for various functions of independent random variables.

The rate function in the LDP is related to the concept of entropy from statistical mechanics. A simple example illustrates this point. In statistical mechanics, the entropy of a macro-state is related to the number of micro-states compatible with that macro-state. Consider the earlier example of the sample mean in which each random variable is Bernoulli. There, a value of the sample mean $S_n$ designates a macro-state, whereas a particular sequence of heads and tails that gives rise to that value of $S_n$ represents a micro-state. Loosely speaking, a macro-state with a larger number of micro-states giving rise to it has higher entropy.
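The limit (5) and the micro-state counting argument can be checked numerically for coin tossing. The sketch below is only an illustration under stated assumptions: the function names `rate_function`, `log_microstates` and `empirical_rate` are hypothetical, the rate function used is the Bernoulli form $I(s) = s\ln(s/\alpha) + (1-s)\ln((1-s)/(1-\alpha))$, and the probability of the macro-state $S_n = k/n$ is taken from the exact binomial formula.

```python
import math


def rate_function(s, alpha):
    """Bernoulli rate function I(s), with the convention 0 * ln(0) = 0."""
    def term(x, y):
        return 0.0 if x == 0 else x * math.log(x / y)
    return term(s, alpha) + term(1.0 - s, 1.0 - alpha)


def log_microstates(n, k):
    """ln of the number of head/tail sequences of length n with k heads."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def empirical_rate(n, k, alpha):
    """-(1/n) ln P(S_n = k/n), using the exact binomial probability."""
    log_p = (log_microstates(n, k)
             + k * math.log(alpha)
             + (n - k) * math.log(1.0 - alpha))
    return -log_p / n


if __name__ == "__main__":
    alpha = 0.5  # fair coin (illustrative choice)
    for n in (10, 100, 1000, 10000):
        k = (7 * n) // 10  # macro-state with 70% heads
        print(n,
              empirical_rate(n, k, alpha),      # finite-n estimate of I(0.7)
              rate_function(k / n, alpha),      # limiting value I(0.7)
              log_microstates(n, n // 2) / n)   # entropy per toss at s = 1/2
```

As $n$ grows, the second column should approach the third, and the last column should approach $\ln 2$, the largest possible entropy per toss.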
A state with higher entropy has a higher chance of being realized in actual experiments. The macro-state with mean value ½, with an equal number of heads and tails, has the largest number of micro-states giving rise to it and is therefore the state with the highest entropy. In most practical situations this is indeed the macro-state observed for large numbers of trials.

The rate function in the LDP, on the other hand, measures how fast the probability of a particular macro-state decays as $n$ grows. The smaller the rate function, the higher the chance of that macro-state appearing. Consider again the rate function of the sample mean defined above when the random variables are Bernoulli with probability of success $\alpha$:

$$I(s) = s \ln\frac{s}{\alpha} + (1-s)\ln\frac{1-s}{1-\alpha}$$

In the coin-tossing experiment (a fair coin, with $\alpha$ = ½), the value of the rate function at a mean value of ½ is zero. This indicates that, as the number of trials approaches infinity, the sample mean approaches the expected value. The rate function also has the form of the Kullback-Leibler divergence between a Bernoulli distribution with parameter $s$ and one with parameter $\alpha$.

2. Chatterjee's Contribution

Chatterjee and Varadhan [3] provide a rigorous mathematical formulation of the LDP for random graphs. Just as Sanov's theorem provides a large deviation principle for an i.i.d. sample, [3] formulates a large deviation principle for the Erdős-Rényi graph, and this is the main goal of that paper. The formulation used in [3] was then extended by Chatterjee and Diaconis in [4], which introduces a theoretical method for the analysis of exponential random graph models. Without going into as much mathematical detail as the original papers, an attempt is made here to examine the results of [4] through a simplified approach.

The first hurdle was the construction and visualization of a space in which all graphs can be viewed as random elements irrespective of sample size. Chatterjee used the concept of graph limits to build this space, $W^*$. A large deviation principle (much like the one in equation (5)) was then formulated for the random graph in the space $W^*$. It was shown in [3] that, for the Erdős-Rényi random graph $G(n,p)$ viewed as a random element of $W^*$,

$$\limsup_{n\to\infty} \frac{1}{n^2}\ln P\big(G(n,p) \in F^*\big) \le -\inf_{h^* \in F^*} I_p(h^*) \tag{6}$$

where $h^* \in F^*$ and $I_p$ is the rate function associated with edge probability $p$. The above expression holds for any closed set $F^* \subseteq W^*$.

Now, let $T: W^* \to \mathbb{R}$ be a bounded continuous function on this metric space. Then the exponential random graph model assigns to a graph $G$ on $n$ vertices the probability

$$p_n(G) = e^{\,n^2\,(T(G^*) - \psi_n)} \tag{7}$$

where $G^*$ is the image of $G$ in $W^*$ and $\psi_n$ is the normalizing constant. The coefficient $n^2$ is introduced in order to ensure the presence of a non-trivial limit. Normalizing (7) so that the probabilities sum to one gives

$$\psi_n = \frac{1}{n^2}\ln \sum_{G} e^{\,n^2\,T(G^*)}$$

with the sum running over all graphs on $n$ vertices. Combining this with (6), and after the derivation given in [4], the following form is obtained:

$$\lim_{n\to\infty} \psi_n = \sup_{h^* \in W^*} \big( T(h^*) - I(h^*) \big)$$

where the supremum is over $h^* \in W^*$ and $I$ is the rate function of [4], discussed in Section 3. This is the main result of [4]. The incompleteness of (and inconsistencies present in) the above treatment must be emphasized.
The actual proof is provided in [4] and, as mentioned earlier, involves a complete treatment of the concepts. However, for the sake of intuitive understanding, the explanation given here provides a sense of the direction that the authors took in order to arrive at their theory for estimating and understanding exponential random graph models.

3. Determining the rate function

As mentioned earlier, determining the rate function $I$ is one of the prime objectives of large deviation theory. It was shown earlier that if the random variables are Bernoulli, $I(s)$ takes the following form:

$$I(s) = s \ln\frac{s}{\alpha} + (1-s)\ln\frac{1-s}{1-\alpha}$$

Accordingly, the rate function defined in [4] is $I: [0,1] \to \mathbb{R}$, such that

$$I(u) = \frac{1}{2}\,u \ln u + \frac{1}{2}\,(1-u)\ln(1-u)$$

Extending this over the space $h \in W^*$,

$$I(h) = \int_0^1\!\!\int_0^1 I\big(h(x,y)\big)\,dx\,dy \tag{8}$$

Any discussion of the derivation of the main result of [4] has to focus on how the rate function is determined. This is necessary because, as mentioned earlier, the rate function is related to the concept of entropy. The Gärtner-Ellis theorem provides a means of calculating the rate function. It defines the Scaled Cumulant Generating Function (SCGF) as

$$\lambda(k) = \lim_{n\to\infty} \frac{1}{n}\ln E\big[e^{\,n k A_n}\big]$$

Here, $E[\cdot]$ is the expectation operator and $A_n$ is the random variable being considered. The Gärtner-Ellis theorem states that if $\lambda(k)$ exists and is differentiable for all $k$, then $A_n$ satisfies a large deviation principle with rate function

$$I(a) = \sup_{k \in \mathbb{R}} \{\, k a - \lambda(k) \,\}$$

The transform defined by the supremum is the Legendre-Fenchel transform (also known as the Fenchel transform). Thus, the rate function is obtained by computing the Legendre-Fenchel transform of $\lambda(k)$, also called its Fenchel conjugate.

In the case of the random graph model, $k$ is replaced by a function $T$, which represents the graph statistic, and a similar treatment is performed. It is supposed that obtaining the Legendre-Fenchel transform of $\lambda$ as defined above would then lead to the form of the rate function defined by Chatterjee, as in equation (8).
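As a numerical illustration of the Gärtner-Ellis recipe in the simplest setting, the sketch below computes the Legendre-Fenchel transform for the sample mean of Bernoulli variables and compares it with the closed-form Bernoulli rate function. It relies on assumptions not stated in the text: for i.i.d. Bernoulli variables with success probability $\alpha$, the SCGF reduces to $\lambda(k) = \ln(1 - \alpha + \alpha e^{k})$; the supremum is located by a ternary search, which is valid because $ks - \lambda(k)$ is concave in $k$; and the search interval $[-50, 50]$ and the function names are arbitrary illustrative choices. It does not address the random graph case.

```python
import math


def scgf_bernoulli(k, alpha):
    """SCGF of the sample mean of i.i.d. Bernoulli(alpha) variables."""
    return math.log(1.0 - alpha + alpha * math.exp(k))


def legendre_fenchel(scgf, s, lo=-50.0, hi=50.0, iters=200):
    """Approximate sup_k { k*s - scgf(k) } by ternary search on [lo, hi].

    Assumes k*s - scgf(k) is concave in k and that its maximum lies
    inside [lo, hi] (true here for s strictly between 0 and 1).
    """
    def objective(k):
        return k * s - scgf(k)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1  # maximum lies to the right of m1
        else:
            hi = m2  # maximum lies to the left of m2
    return objective(0.5 * (lo + hi))


def bernoulli_rate(s, alpha):
    """Closed-form Bernoulli rate function for 0 < s < 1."""
    return s * math.log(s / alpha) + (1.0 - s) * math.log((1.0 - s) / (1.0 - alpha))


if __name__ == "__main__":
    alpha = 0.5
    for s in (0.1, 0.3, 0.5, 0.7, 0.9):
        numeric = legendre_fenchel(lambda k: scgf_bernoulli(k, alpha), s)
        print(s, numeric, bernoulli_rate(s, alpha))
```

The two printed columns should agree to within numerical precision, which is consistent with the rate function being the Fenchel conjugate of the SCGF in this case.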

4. Thoughts and discussions

Though the concepts are related, it is supposed that Chatterjee approached the problem of estimating ERGMs purely from the standpoint of large deviation theory. Dudík [5], on the other hand, provides the first complete framework for a generalized form of maxent, including fully general and unified performance guarantees, algorithms and convergence proofs. As part of that work, he derived the dual of the generalized maxent problem (which is the log likelihood), using the concept of Fenchel duality to do so. There is a definite relation between the rate function of the LDP and the concept of entropy. Because the Fenchel conjugate is employed in both cases, it leads to similar expressions.

5. References

[1] Touchette, H., The large deviation approach to statistical mechanics, Phys. Rep. 478 (1-3) (2009).
[2] Varadhan, S.R.S., Asymptotic probabilities and differential equations, Comm. Pure Appl. Math. 19.
[3] Chatterjee, S., Varadhan, S.R.S., The large deviation principle for the Erdős-Rényi random graph, Eur. J. Combinatorics 32 (2011).
[4] Chatterjee, S., Diaconis, P., Estimating and understanding exponential random graph models (2011). Available at arXiv.
[5] Dudík, M., Maximum entropy density estimation and modeling geographic distributions of species. Ph.D. thesis.
