Gärtner-Ellis Theorem and applications.

Size: px

Start display at page:

Download "Gärtner-Ellis Theorem and applications."

Marjory Randall
5 years ago
Views:

1 Gärtner-Ellis Theorem and applications. Elena Kosygina July 25, 208 In this lecture we turn to the non-i.i.d. case and discuss Gärtner-Ellis theorem. As an application, we study Curie-Weiss model with random external magnetic fields. We have discussed Cramér s theorem for empirical means of i.i.d. random variables which take values in R. This theorem under exactly the same condition can be shown to hold for empirical means of i.i.d. random vectors with values in R d (see 2 Corollary 6..6). The result we consider in this note is much more general: measures µ n are not required to be distributions of empirical means, they can be arbitrary distributions on (R d, B(R d )) as long as the limit Λ(t) := lim n n ln Λ n(nt) := lim n n ln e nt,x dµ n [, ] () R d exists for all t R d. A typical situation which arises in applications is the following: we have a sequence of random vectors (Y n ) n N on possibly different probability spaces (Ω n, G n, P n ), Y n : Ω n R d, and µ n (A) := P n (Y n A) for every A B(R d ). In the special case when µ n is the distribution of empirical means of i.i.d. random vectors (X i ) i N with a common distribution µ we have Λ(t) = n ln Λ n(nt) = [E (e )] n ln t,x n i = ln e t,x dµ R d - just the familiar logarithmic MGF of µ. The condition is 0 DΛ o, where Λ is the logarithmic MGF of the common distribution µ on R d. 2 Amir Dembo and Ofer Zeitouni. Large deviations techniques and applications, volume 38 of Applications of Mathematics. Springer-Verlag, New York, second edition, 998 We can always think of these random variables as defined on a common probability space so that their distributions are exactly those that we want (e.g. construct them on an infinite product space) but it is not necessary. Gärtner-Ellis Theorem Let (µ n ) n N be a sequence of probability measures on (R d, B(R d )). Assume that for all t R d a possibly infinite limit Λ(t) in () exists. The convexity of Λ n (t) := ln e t,x dµ n R d for each n N and the limit definition of Λ immediately imply that Λ is convex. Similarly to the assumption in Cramér s theorem, we shall assume throughout this note that 0 DΛ o. This will ensure, in particular, that

2 gärtner-ellis theorem and applications. 2 Λ >. Indeed, note that as Λ n (0) = 0 for all n, so Λ(0) = 0. If for some t we had Λ(t) = then by convexity we would have for all α (0, ] But then Λ(αt) = Λ(αt + ( α)0) αλ(t) + ( α)λ(0) =. 0 = Λ(0) = Λ ( 2 (αt) + ) 2 ( αt) 2 Λ(αt) + 2 Λ( αt), and we would also have Λ( αt) = for all α (0, ]. This contradicts the assumption 0 D o Λ. We shall also need the following definition. Definition. Let I be the Legendre-Fenchel transform of Λ, i.e. I(x) = sup x R d ( t, x Λ(t)). A point x D I := {x R d : I(x) < } is said to be exposed for I if there is a η R d such that I(y) I(x) > η, y x for all y = x y The hyperplane h x (y) = I(x) + η, y x is called an exposing hyperplane to the graph of I at x. For a given (x, I(x)), it is characterized by its normal η. With a slight abuse of terminology, η itself will be referred to as an exposing hyperplane. Figure : Points x (0, ] are not exposed for the pictured function. x Since I is the Legendre-Fenchel transform of Λ, I is convex and satisfies all conditions of a rate function, i.e. it is non-negative and has compact sub-level sets. A justification of the last two claims can be given along the same lines as in lecture notes (right after Cramér theorem). Theorem (Gärtner-Ellis). Let (µ n ) n N be a sequence of probability measures on (R d, B(R d )). Assume that for all t R d a possibly infinite limit Λ(t) in () exists and that 0 D o Λ. Then (i) for every closed set C R d, lim sup n n ln µ n(c) inf x C I(x). Jurgen Gärtner. On large deviations from an invariant measure. Teor. Verojatnost. i Primenen., 22():27 42, 977 Richard S. Ellis. Large deviations for a general class of random vectors. Ann. Probab., 2(): 2, 984

3 gärtner-ellis theorem and applications. 3 (ii) for every open set O R d, lim sup n n ln µ n(o) inf I(x), x O E where E is the set of those exposed points for I which have an exposing hyperplane in D o Λ. Suppose in addition that Λ is lower semi-continuous on R d, differentiable on D o Λ, and either D Λ = R d or Λ is steep, i.e. lim Λ(t n) = n whenever t n D o Λ, t n t D o Λ as n. Then (µ n) n N satisfies the LDP with rate function I. This theorem is rather general but still it does not capture all the cases in which a sequence of measures on R d satisfies a LDP. Our concern is, of course, only about the lower bound. Here is a simple example borrowed from 3, p. 45. Example. Let µ n ((, x]) = ( e nx ) [0, ) (x) (exponential distribution with parameter n), x R. Then ( ) Λ(t) = lim n n ln ne ntx nx 0, if t < ; dx = 0, if t. x, if x 0; I(x) = sup(tx Λ(t)) = t R, if x < 0. We see that E = {0} while D I = [0, ), and for each open set O with O E = Gärtner-Ellis theorem gives only a trivial lower bound. But it is easy to see directly that for every open set O for which O D I = lim n n ln µ n(o) = lim n n ln O [0, ) ne nx dx inf{x, x O [0, )} = inf x O I(x). This says that (µ n ) n N satisfy a LDP with rate I. 3 Amir Dembo and Ofer Zeitouni. Large deviations techniques and applications, volume 38 of Applications of Mathematics. Springer-Verlag, New York, second edition, 998 Exercise. Give the details of this computation. Hint: Every open set in R is a countable union of disjoint open intervals. Use the leftmost interval which intersects D I. We note that Gärtner-Ellis Theorem readily implies Cramér theorem on R d, d, when the logarithmic MGF of µ is finite on all of R d. Otherwise we have to impose an additional condition of

4 gärtner-ellis theorem and applications. 4 steepness (see Theorem ). As we have already mentioned above, the inclusion 0 DΛ o is sufficient for the result in Cramér theorem to hold. The following exercise shows that Theorem does not fully include Cramér theorem if D Λ = R d. Exercise (Exercise 2.3.7(a) in A. Dembo, O. Zeitouni). Let µ be a probability measure on (R d, B(R d )) with a density proportional to e x /( + x d+2 ) and Λ be its logarithmic MGF. (a) Find D o Λ. (b) Show that Λ is not steep. Conclude that in this example Cramér theorem yields the full LDP for empirical means while Gärtner-Ellis theorem does not. A general limitation of Gärtner-Ellis theorem is that the differentiability and steepness conditions on Λ are often hard to check in applications. This is due to the fact that the existence of the limit which defines Λ is often obtained by soft methods which do not give a usable formula for Λ. Let us look again at the already familiar Curie-Weiss model and see what Gärtner-Ellis theorem can and can not do in this case. This example also provides a good illustration of notions introduced in this note. Curie-Weiss model II: what does Gärtner-Ellis theorem give us? For each spin configuration σ = (σ, σ 2,..., σ n ) Σ n = {, } n we define P n,β (σ) = e βh n(σ) Z n,β, where H n (σ) = J 2n n σ i σ j and Z n,β = e βhn(σ). i,j= σ Σ n For each n we have a different probability space (Σ n, G n, P n,β ). We study random variables σ n = n i= σ i [, ]. Thus, the relevant probability measures µ n,β (the distributions of σ n under P n,β ), n N, are measures on the common measurable space ([, ], B([, ])). Thus, µ n,β (A) = P n,β (σ : σ n A), A B([, ]).

5 gärtner-ellis theorem and applications. 5 We obtained two main results for (µ n,β ) n N : the LDP with the rate function I β (x) = I 0 (x) βj [ 2 x2 inf I 0 (y) βj ] y [,] 2 y2, where +x 2 ln +x 2 + x 2 ln x 2, if x < ; I 0 (x) = ln 2, if x = ;, if x >. and weak convergence to a limiting measure. These results clearly show the existence of a phase transition as β crosses J. Let us apply Gärtner-Ellis theorem for this case and draw conclusions. We know from the start that when β > J we should run into trouble, since in this regime I β is not convex while every rate function in Gärtner-Ellis theorem is convex. To start, we need to compute Λ β (t). As we did before, we rewrite everything in terms of µ n,0 (β = 0) which assigns a binomial probability to each possible value of σ n {, + 2/n,..., 2/n, }, ( ) n µ n,0 ({ + 2m/n}) = 2 n m and use the LDP for these measures (with the rate function I 0 ) and Varadhan s lemma to get Λ β (t) = lim n n R ln E = lim n n ln µn,0 = sup x [,] [ tx + e ntx dµ n,β (e n(tσ n+ βj 2 (σ n) 2) E µn,0 ( e n βj 2 (σ n) 2) = sup x [,] ( βj 2 x2 I 0 (x) sup y [,] (2) ( tx + βj ) 2 x2 I 0 (x) Since we take the supremum of continuous functions over [, ], Λ β (t) < for all t, i.e. D Λβ = R. By Theorem (i) we get a LDP upper bound with the rate function Λ β. If β J then I β is convex and Λ β = I = I β with D Iβ = [, ]. Moreover, I β is strictly convex so that every point in (, ) is exposed with an exposing hyperplane in D Λβ simply because D Λβ = R. By Theorem (ii) we get a LDP lower bound: for every open set O lim inf n n ln µ n,β(o) inf I β(x) x O (,) = inf x O [,] I β(x) = inf x O I β(x). sup y [,] ( ) βj 2 y2 I 0 (y) ( ) )] βj 2 y2 I 0 (y) = sup (tx I β (x)). x [,]

6 gärtner-ellis theorem and applications. 6 The next to the last equality is due to continuity of I β on [, ]. The last equality holds because I β (x) = outside of [, ]. We recover an earlier obtained LDP in this case. Alternatively, we could have used the last statement of Theorem : Λ β satisfies all additional conditions needed for a full LDP. If β > J then Λ β is convex while I β is not and Λ β (x) = I β, if x m β ; 0, if x < m β, where ±m β (, 0) (0, ) are the points where I β attains its minimal value 0. In lecture note 3 we have shown that m β is the unique solution in (0, ) of the equation x = tanh(βjx). Graph of I β (x) Graph of Λ (x) Construction of Λ(t) 0. y 0. y 0. y y = tx y = Λ (x) m β m β x m β m β x m β m β Λ(t) x Our first observation is that Λ β < I β on ( m β, m β ) and an upper bound provided by Theorem is strictly larger than the one we obtained earlier: for every closed set G ( m β, m β ) inf x G I β(x) < inf x G Λ β (x). We also see that Λ β 0 on [ m β, m β ]. This implies that Λ β (t) = Iβ (t) = Λ β (t) t m β and, since Λ β (0) = 0, we conclude that Λ β has a corner at the origin. The differentiability condition fails at 0, and Theorem only gives us the following lower bound: for all open sets O lim inf n n ln µ n,β(o) inf x O E Λ β (x), where the set of exposed points E is (, m β ) (m β, ) (we have to exclude the set where Λ β is not strictly convex). If O [ m β, m β ] then Theorem gives us only a trivial lower bound (the infimum is taken over the empty set), while our earlier results give us more information in this case. Now it is the right time for an example where Gärtner-Ellis theorem provides an efficient and quick way of getting a LDP z Λ(t) m β t Figure 2: Graph of Λ(t) = Λ (t). t

7 gärtner-ellis theorem and applications. 7 Curie-Weiss model III: adding a random external field This subsection follows 4 and shows how to get a LDP for Curie- Weiss model in the case when a constant global external magnetic field h is replaced by a random local field h := (h i ) i N satisfying an appropriate averaging assumption (see Assumption below). Let (Ω, F, P) be a probability space and h = (h i ) i N be a sequence of random variables on it. We shall consider random probability measures on (Σ n, G n ) defined by 4 Matthias Löwe, Raphael Meiners, and Felipe Torres. Large deviations principle for Curie-Weiss models with random fields. J. Phys. A, 46(2):25004, 0 pp., 203 P n,β,h (σ) := e βh n,h(σ) Z n,β,h, where H n,h (σ) = J 2n n i,j= σ i σ j n i= h i σ i, Z n,β,h = σ Σ n e βh n,h(σ). An important difference with the original deterministic model is that the measure P n,β,h (σ) is not anymore completely determined by the value of σ n. For each ω Ω, let µ n,β,h be the distribution of σ n. It is a measure on [, ]. Our goal will be to state and prove a LDP for (µ n,β,h ) n N. For this we shall need an assumption on a random field h. Assumption: Let h = (h i ) i N be a sequence of random variables on some probability space (Ω, F, P). We assume that for almost every ω Ω f n (t, ω) := n n ln cosh(t + βh i (ω)) f (t) as n i= for every t R and some differentiable function f : R R. Exercise 2. Prove that if f satisfies the above assumption then necessarily f (t) for all t R and, therefore, f (x) := sup t R (tx f (t)) is equal to for all x >. The most basic example is given by a sequence (h i ) i N of i.i.d. random variables with a finite first moment, E[ h i ] <. The convergence follows by the strong law of large numbers (and a small additional argument, see the exercise below) and f (t) = E[ln cosh(t + βh i )]. Note that ln cosh t t and, thus, E[ ln cosh(t + βh i ) ] t + βe[ h i ] <.

8 gärtner-ellis theorem and applications. 8 Exercise 3. The strong law of large numbers implies that for every t the convergence holds outside of a set of probability 0. But R is uncountable. Start with rational t and show rigorously that the assumption is indeed satisfied. Another example, now with dependence, arises when (h i ) i N is an irreducible positive recurrent Markov chain on a countable subset S of R. Such Markov chain has a unique stationary distribution π on S. Assuming that ln cosh(t + βs) π(s) <, s S the ergodic theorem for Markov chains (see Theorem 4. in 5 ) implies that for any initial distribution with probability lim n n n ln cosh(t + βh i ) = f (t) := ln cosh(t + βs)π(s). i= s S Note that when the initial distribution is different from π the random variables (h i ) i N are not identically distributed. 5 Pierre Brémaud. Markov chains, volume 3 of Texts in Applied Mathematics. Springer-Verlag, New York, 999. Gibbs fields, Monte Carlo simulation, and queues For our last example let (h i ) i N {0} be a stationary ergodic sequence on some probability space (Ω, F, P). More precisely, let T : Ω Ω be a measure preserving transformation, i.e. T be F- measurable and P(T A) = P(A) for all A F. In addition, assume that T is ergodic, that is T A = A implies that P(A) {0, }. Let h be any random variable on (Ω, F, P) with a finite first moment. Define h i (ω) = h(t i ω), i N {0}. The ergodicity assumption and the finiteness of the first moment of h ensure (by Birkhoff ergodic theorem) that for almost every ω Ω n lim n n i=0 ln cosh(t + βh i (ω)) = f (t) := E[ln cosh(t + βh(ω))]. Now we are ready to state the main theorem of this subsection. A sequence (h i ) i N {0} is said to be stationary if for every k N {0} and n N the random vector (h n, h n+,..., h n+k ) has the same distribution as (h 0, h,..., h n ). In simple words, ergodicity of T means that there are no non-trivial T-invariant sets. This general framework, in fact, includes the previous two examples provided that the Markov chain in the second example starts from the stationary distribution. Theorem 2. Suppose that h = (h i ) i N satisfies the assumption above. Then for almost every ω Ω measures (µ n,β,h ) n N satisfy a LDP with deterministic rate function I β, f (x) := f (x) βj [ 2 x2 inf f (y) βj ] y [,] 2 y2, where f (x) := sup t R (tx f (t)).

9 gärtner-ellis theorem and applications. 9 Remark. Note that if h i h then f (t) = ln cosh (t + βh) and after a calculation we get that f (x) = I 0 (x) βhx, where I 0 is given by (2). Combining this result with Theorem 2 we get the rate function for the last exercise in lecture notes 3. A useful identity: ln cosh(arctanhx) = 2 ln( x2 ). Proof. The plan is to introduce a family of convenient reference measures which satisfies a LDP and then use tilting to get a desired LDP for the original measures. In our previous considerations measures (µ n,0 ) n N were such reference measures. They were the distributions of σ n under the uniform measure P n,0 on all spin configurations {, } n. Under this uniform measure spins σ i, i n, were independent and identically distributed with P n,0 (σ : σ i = ) = /2. We consider measures Q n,β,h under which individual spins are still independent but not identically distributed: Q n,β,h (σ Σ n : σ i = ±) = e ±βh i 2 cosh(βh i ). Let ν n,β,h be the distribution of σ n under Q n,β,h. These are our new reference measures. We are going to apply Gärtner-Ellis theorem to (ν n,β,h ) n N to get a LDP. We need to compute Λ β (t). Using our assumption we find that for almost every ω Ω Λ β (t) := lim n n R ln e ntx dν n,β,h = lim n n ln e t n i= σ n i Q n,β,h (σ) = lim n σ Σ n n ln e βhi+t + e βh i t 2 cosh(βh i= i ) for all t R. Therefore, D Λβ = R and Λ β is differentiable on R. By Gärtner-Ellis theorem (ν n,β,h ) n N satisfy a LDP with rate = lim n ( f n (t) f n (0)) = f (t) f (0) Λ β (x) = f (x) + f (0). According to Exercise 2, D Λ β [, ]. Our next step is tilting. For every A B([, ]) A simpler way to see this is to note that σ n [, ], and, therefore, the rate function has to be infinite outside of [, ]. P n,β,h (σ) µ n,β,h (A) = P n,β,h (σ) = Q σ: σ n A σ: σ n A n,β,h (σ) Q n,β,h(σ) = 2n n i= cosh(h iσ i ) Z n,β,h = 2n n i= cosh(h iσ i ) Z n,β,h σ: σ n A e nβj 2 (σ n) 2 Q n,β,h (σ) e nβj 2 x2 dν n,β,h =: A Z G n,β,h G(x) : = βj 2 x2, x [, ]; Z G n,β,h := Z n,β,h 2 n n i= cosh(h iσ i ) = e ng(x) dν n,β,h, where A [,] eng(x) dν n,β,h.

10 gärtner-ellis theorem and applications. 0 By part (b) of Lemma 0.3 from lecture notes 3 we conclude that for almost every ω Ω measures (µ n,β,h ) n N satisfy a LDP with rate function I G (x) = Λ β (x) G(x) inf y [,] (Λ β (y) G(y)) = I β, f (x) as claimed. References Pierre Brémaud. Markov chains, volume 3 of Texts in Applied Mathematics. Springer-Verlag, New York, 999. Gibbs fields, Monte Carlo simulation, and queues. Amir Dembo and Ofer Zeitouni. Large deviations techniques and applications, volume 38 of Applications of Mathematics. Springer-Verlag, New York, second edition, 998. Richard S. Ellis. Large deviations for a general class of random vectors. Ann. Probab., 2(): 2, 984. Jurgen Gärtner. On large deviations from an invariant measure. Teor. Verojatnost. i Primenen., 22():27 42, 977. Matthias Löwe, Raphael Meiners, and Felipe Torres. Large deviations principle for Curie-Weiss models with random fields. J. Phys. A, 46 (2):25004, 0 pp., 203.

Large Deviations for Weakly Dependent Sequences: The Gärtner-Ellis Theorem

Chapter 34 Large Deviations for Weakly Dependent Sequences: The Gärtner-Ellis Theorem This chapter proves the Gärtner-Ellis theorem, establishing an LDP for not-too-dependent processes taking values in