Large deviations, weak convergence, and relative entropy


Markus Fischer, University of Padua

Revised June 20, 2013

1 Introduction

Rare event probabilities and large deviations: basic example and definition in Section 2. Essential tools for large deviations analysis: weak convergence of probability measures (Section 3) and relative entropy (Section 4). Weak convergence is especially useful in the Dupuis and Ellis [1997] approach; see the lectures!

Table 1: Notation

X : a topological space
B(X) : σ-algebra of Borel sets over X, i.e., the smallest σ-algebra containing all open (closed) subsets of X
P(X) : space of probability measures on B(X), endowed with the topology of weak convergence of probability measures
M_b(X) : space of all bounded Borel measurable functions X → R
C(X) : space of all continuous functions X → R
C_b(X) : space of all bounded continuous functions X → R
C_c(X) : space of all continuous functions X → R with compact support
C^k(R^d) : space of all continuous functions R^d → R with continuous partial derivatives up to order k

Preparatory lecture notes for a course on "Representations and weak convergence methods for the analysis and approximation of rare events" given by Prof. Paul Dupuis, Brown University, at the Doctoral School in Mathematics, University of Padua, May 2013.

C_c^k(R^d) : space of all continuous functions R^d → R with compact support and continuous partial derivatives up to order k
T : a subset of R, usually [0, T] or [0, ∞)
C(T : X) : space of all continuous functions T → X
D(T : X) : space of all càdlàg functions T → X (i.e., functions continuous from the right with limits from the left)
∧ : minimum (as binary operator)
∨ : maximum (as binary operator)

2 Large deviations

A standard textbook on the theory of large deviations is Dembo and Zeitouni [1998]; also see Ellis [1985], Deuschel and Stroock [1989], Dupuis and Ellis [1997], den Hollander [2000], and the references therein. A foundational work in the theory is Varadhan [1966].

2.1 Coin tossing

Consider the following random experiments: given a number n ∈ N, toss n coins of the same type and count the number of coins that land heads up. Denote that (random) number by S_n. Then S_n/n is the empirical mean, here equal to the empirical probability of getting heads. What can be said about S_n/n for n large?

To construct a mathematical model for the coin tossing experiments, let X_1, X_2, ... be {0,1}-valued independent and identically distributed (i.i.d.) random variables defined on some probability space (Ω, F, P). Thus each X_i has Bernoulli distribution with parameter p := P(X_1 = 1). Interpret X_i(ω) = 1 as saying that coin i at realization ω ∈ Ω lands heads up. Then S_n = Σ_{i=1}^n X_i. By the strong / weak law of large numbers (LLN), S_n/n → p with probability one / in probability. In particular, by the weak law of large numbers, for all ε > 0, P{|S_n/n − p| ≥ ε} → 0.

More can be said about the asymptotic behavior of those deviation probabilities. Observe that S_n has binomial distribution with parameters (n, p),

that is,

\[ P\{S_n = k\} = \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k}, \quad k \in \{0, \dots, n\}. \]

By Stirling's formula, asymptotically for large n,

\[ P\{S_n = k\} \sim \frac{\sqrt{2\pi n}\, n^n e^{-n}}{\sqrt{2\pi k}\, k^k e^{-k}\, \sqrt{2\pi (n-k)}\, (n-k)^{n-k} e^{-(n-k)}}\; p^k (1-p)^{n-k}. \]

Therefore, if k ≈ n x for some x ∈ (0, 1), then

\[ \log P\{S_n = k\} \approx -\tfrac{1}{2}\bigl(\log(2\pi) + \log(x) + \log(1-x) + \log(n)\bigr) - n x \log\Bigl(\frac{x}{p}\Bigr) - n (1-x) \log\Bigl(\frac{1-x}{1-p}\Bigr), \]

hence

\[ \frac{1}{n} \log P\{S_n = k\} \approx -\Bigl( x \log\Bigl(\frac{x}{p}\Bigr) + (1-x) \log\Bigl(\frac{1-x}{1-p}\Bigr) \Bigr). \]

The expression x log(x/p) + (1−x) log((1−x)/(1−p)) gives the relative entropy of the Bernoulli distribution with parameter x w.r.t. the Bernoulli distribution with parameter p, which is minimal and zero if and only if x = p. The asymptotic equivalence

\[ \frac{1}{n} \log P\{S_n = k\} \sim -\Bigl( x \log\Bigl(\frac{x}{p}\Bigr) + (1-x) \log\Bigl(\frac{1-x}{1-p}\Bigr) \Bigr), \quad k \approx n x, \]

shows that the probabilities of the events {|S_n/n − p| ≥ ε} converge to zero exponentially fast with rate (up to arbitrarily small corrections)

\[ (p+\varepsilon) \log\Bigl(\frac{p+\varepsilon}{p}\Bigr) + (1-p-\varepsilon) \log\Bigl(\frac{1-p-\varepsilon}{1-p}\Bigr), \]

which corresponds to x = p + ε, the rate of slowest convergence. Events like {|S_n/n − p| ≥ ε} describe large deviations from the law of large numbers limit, in contrast to the fluctuations ("normal deviations") captured by the central limit theorem, which says here that the distribution of √n (S_n/n − p) is asymptotically Gaussian with mean zero and variance p(1−p).
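The asymptotics above are easy to check numerically. The following Python sketch (not part of the original notes; all function names are ours) computes −(1/n) log P{S_n = k} exactly via log-factorials and compares it with the Bernoulli relative entropy, here for the arbitrary choices p = 1/2 and x = 3/4:

```python
import math

def log_binom_pmf(n, k, p):
    """Exact log of P{S_n = k} for S_n ~ Binomial(n, p), via log-factorials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def bernoulli_relative_entropy(x, p):
    """R(Ber(x) || Ber(p)) = x log(x/p) + (1-x) log((1-x)/(1-p)), for 0 < x < 1."""
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

p, x = 0.5, 0.75
for n in [100, 1000, 10000]:
    k = round(n * x)
    rate = -log_binom_pmf(n, k, p) / n
    print(n, rate, bernoulli_relative_entropy(x, p))
# the empirical rate approaches the relative entropy (about 0.1308) as n grows,
# the gap being the (log n)/n Stirling correction visible in the display above
```

The remaining gap of order (log n)/n is exactly the polynomial prefactor from Stirling's formula.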

2.2 The large deviation principle

The theory of large deviations will be developed in this course for random variables taking values in a Polish space. A Polish space is a separable topological space whose topology is compatible with a complete metric. Examples of Polish spaces are:

- R^d with the standard topology;
- any closed subset of R^d (or of another Polish space) equipped with the induced topology;
- the space C(T : X) of continuous functions, T ⊆ (−∞, ∞) an interval, X a complete and separable metric space, equipped with the topology of uniform convergence on compact subsets of T;
- the space D(T : X) of càdlàg functions, T ⊆ (−∞, ∞) an interval, X a complete and separable metric space, equipped with the Skorohod topology [e.g. Billingsley, 1999, Chapter 3];
- the space P(X) of probability measures on B(X), X a Polish space, equipped with the weak convergence topology (cf. Section 3).

Let (ξ_n)_{n∈N} be a family of random variables with values in a Polish space S. Let I : S → [0, ∞] be a function with compact sublevel sets, i.e., {x ∈ S : I(x) ≤ c} is compact for every c ∈ [0, ∞). Such a function is lower semicontinuous and is called a (good) rate function.

Definition 2.1. The sequence (ξ_n)_{n∈N} satisfies the large deviation principle with rate function I iff for all G ∈ B(S),

\[ -\inf_{x \in G^\circ} I(x) \le \liminf_{n\to\infty} \frac{1}{n} \log P\{\xi_n \in G\} \le \limsup_{n\to\infty} \frac{1}{n} \log P\{\xi_n \in G\} \le -\inf_{x \in \mathrm{cl}(G)} I(x). \]

The large deviation principle is a distributional property: writing P_n for Law(ξ_n), (ξ_n) satisfies the large deviation principle with rate function I if and only if for all G ∈ B(S),

\[ -\inf_{x \in G^\circ} I(x) \le \liminf_{n\to\infty} \frac{1}{n} \log P_n(G) \le \limsup_{n\to\infty} \frac{1}{n} \log P_n(G) \le -\inf_{x \in \mathrm{cl}(G)} I(x). \]
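Definition 2.1 can be illustrated with the coin-tossing example: for G = [x, 1] with x > p, the two infima in the definition coincide, and −(1/n) log P{S_n/n ≥ x} should approach I(x) = x log(x/p) + (1−x) log((1−x)/(1−p)). A small Python sketch (ours, not from the notes), computing the binomial tail exactly with a log-sum-exp:

```python
import math

def log_binom_pmf(n, k, p):
    """log P{S_n = k} for S_n ~ Binomial(n, p), via log-factorials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def log_upper_tail(n, x, p):
    """log P{S_n / n >= x}, summed exactly with the log-sum-exp trick."""
    logs = [log_binom_pmf(n, k, p) for k in range(math.ceil(n * x), n + 1)]
    m = max(logs)
    return m + math.log(sum(math.exp(l - m) for l in logs))

def rate(x, p):
    """Rate function of the coin-tossing example, for 0 < x < 1."""
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

p, x = 0.5, 0.6
for n in [100, 1000, 10000]:
    print(n, -log_upper_tail(n, x, p) / n, rate(x, p))
# -(1/n) log P{S_n/n >= x} approaches I(x) (about 0.0201 here) as n grows
```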

The large deviation principle gives a rough description of the asymptotic behavior of the probabilities of rare events. For simplicity, consider only I-continuity sets; a set G ∈ B(S) is called an I-continuity set if inf_{x ∈ cl(G)} I(x) = inf_{x ∈ G°} I(x). For such G,

\[ \lim_{n\to\infty} \frac{1}{n} \log P\{\xi_n \in G\} = -\inf_{x \in G} I(x) =: -I(G), \]

hence

\[ P\{\xi_n \in G\} = P_n(G) = e^{-n (I(G) + o(1))}. \]

The probability of G w.r.t. the law of ξ_n therefore tends to zero exponentially fast as n → ∞ whenever I(G) > 0. Thus, if I(G) > 0, then G is a rare event (w.r.t. P_n for n large).

Example 1 (Coin tossing). The sequence ξ_n := S_n/n, n ∈ N, satisfies the large deviation principle in S := R (or S := [0, 1]) with rate function

\[ I(x) = \begin{cases} x \log\bigl(\frac{x}{p}\bigr) + (1-x) \log\bigl(\frac{1-x}{1-p}\bigr) & \text{if } x \in [0,1], \\ \infty & \text{otherwise.} \end{cases} \]

If p ∈ (0, 1), then I is finite and continuous on [0, 1], and convex on R.

A concept closely related to the large deviation principle is the so-called Laplace principle. Let (ξ_n)_{n∈N} be a family of S-valued random variables, P_n = Law(ξ_n).

Definition 2.2. The sequence (ξ_n) satisfies the Laplace principle with rate function I iff for all F ∈ C_b(S),

\[ \lim_{n\to\infty} \frac{1}{n} \log E\bigl[\exp\bigl(-n F(\xi_n)\bigr)\bigr] = -\inf_{x \in S} \{I(x) + F(x)\}. \]

In Definition 2.2 it is clearly equivalent to require that for all F ∈ C_b(S),

\[ \lim_{n\to\infty} \frac{1}{n} \log \int_S \exp\bigl(n F(x)\bigr)\, P_n(dx) = \sup_{x \in S} \{F(x) - I(x)\}. \]

If S = R^d and if we took F(x) = θ·x (such an F is clearly unbounded), then on the right-hand side above we would have the Legendre transform of I at θ ∈ R^d. Also notice the analogy with Laplace's method for asymptotically evaluating exponential integrals:

\[ \lim_{n\to\infty} \frac{1}{n} \log \int_0^1 e^{n f(x)}\, dx = \max_{x \in [0,1]} f(x) \quad \text{for all } f \in C([0,1]). \]

Basic results [for instance Dembo and Zeitouni, 1998, Chapter 4]:

1. The (good) rate function of a large deviation principle is uniquely determined.

2. If I is the rate function of a large deviation principle, then inf_{x∈S} I(x) = 0 and I(x_0) = 0 for some x_0 ∈ S. If I has a unique minimizer, then the large deviation principle implies a corresponding law of large numbers.

3. The large deviation principle holds if and only if the Laplace principle holds, and the (good) rate function is the same.

4. Contraction principle: Let Y be a Polish space and ψ : S → Y a measurable function. If (ξ_n) satisfies the large deviation principle with (good) rate function I and if ψ is continuous on {x ∈ S : I(x) < ∞}, then (ψ(ξ_n)) satisfies the large deviation principle with (good) rate function J(y) := inf_{x ∈ ψ^{-1}(y)} I(x).

2.3 Large deviations for empirical means

Let X_1, X_2, ... be R-valued i.i.d. random variables with common distribution µ such that m := ∫ x µ(dx) = E[X_1] is finite. As in the case of coin flipping, set S_n := Σ_{i=1}^n X_i and consider the asymptotic behavior of S_n/n, n ∈ N. By the law of large numbers, S_n/n → m with probability one. Let φ_µ be the moment generating function of µ (or of X_1, X_2, ...), that is,

\[ \varphi_\mu(t) := \int_{\mathbb R} e^{t x}\, \mu(dx) = E\bigl[e^{t X_1}\bigr], \quad t \in \mathbb R. \]

Theorem 2.1 (Cramér). Suppose that µ is such that φ_µ(t) is finite for all t ∈ R. Then (S_n/n)_{n∈N} satisfies the large deviation principle with rate function I given by

\[ I(x) := \sup_{t \in \mathbb R} \{t x - \log(\varphi_\mu(t))\}. \]

The rate function I in Theorem 2.1 is the Legendre transform of log φ_µ, the logarithmic moment generating function or cumulant generating function of the common distribution µ. Two particular cases:

1. Bernoulli distribution: µ the Bernoulli distribution on {0, 1} with parameter p. Then φ_µ(t) = 1 − p + p e^t and

\[ I(x) = x \log\Bigl(\frac{x}{p}\Bigr) + (1-x) \log\Bigl(\frac{1-x}{1-p}\Bigr), \quad x \in [0,1], \]

with I(x) = ∞ for x ∈ R \ [0, 1], as in Example 1.

2. Normal distribution: µ the normal distribution with mean 0 and variance σ². Then φ_µ(t) = e^{σ²t²/2} and

\[ I(x) = \frac{x^2}{2\sigma^2}, \quad x \in \mathbb R. \]

Some properties of log φ_µ and the associated rate function I from Theorem 2.1 (hypothesis: φ_µ(t) < ∞ for all t ∈ R):

- φ_µ is in C^∞(R : (0, ∞)), and log φ_µ is strictly convex;
- I(x) = ∞ for x ∉ [essinf X_1, esssup X_1] = Conv(supp(µ));
- I is non-negative and convex;
- I is strictly convex and infinitely differentiable on (essinf X_1, esssup X_1);
- I(m) = 0, I′(m) = 0, and I″(m) = 1/var(µ).

Here we give a sketch of the proof of Theorem 2.1; for a complete proof of Cramér's theorem under weaker assumptions on φ_µ see Section 2.2 in Dembo and Zeitouni [1998].

Proof of Theorem 2.1 (sketch). We may assume m = E[X_1] = 0. Given x > 0, consider the probabilities of the events {S_n/n ≥ x}. By Markov's inequality and since S_n is the sum of i.i.d. random variables with common distribution µ, we have for t > 0,

\[ P\{S_n \ge n x\} = P\bigl\{e^{t S_n} \ge e^{t n x}\bigr\} \le e^{-t n x}\, E\bigl[e^{t S_n}\bigr] = e^{-t n x}\, (\varphi_\mu(t))^n. \]

Since this inequality holds for any t > 0 and log is non-decreasing, we obtain

\[ \limsup_{n\to\infty} \frac{1}{n} \log P\{S_n/n \ge x\} \le -\sup_{t > 0} \{t x - \log(\varphi_\mu(t))\}. \]

By Jensen's inequality, log φ_µ(t) ≥ t E[X_1] = t m = 0. Since log φ_µ(0) = 0, we have I(m) = I(0) = 0 and

\[ t x - \log(\varphi_\mu(t)) \le t m - \log(\varphi_\mu(t)) \le 0 = I(m) \quad \text{for every } t \le 0. \]

It follows that

\[ \sup_{t > 0} \{t x - \log(\varphi_\mu(t))\} = \sup_{t \in \mathbb R} \{t x - \log(\varphi_\mu(t))\} = I(x), \]

which yields the large deviation upper bound for closed sets of the form [x, ∞), x > 0. A completely analogous argument gives the upper bound for closed sets of the form (−∞, x], x < 0. Let G ⊆ R be a closed set not containing zero. Then, thanks to the strict convexity of I and the fact that I ≥ 0 and I(0) = 0, there are x_+ > 0, x_− < 0 such that G ⊆ (−∞, x_−] ∪ [x_+, ∞) and inf_{x∈G} I(x) = inf_{x ∈ (−∞, x_−] ∪ [x_+, ∞)} I(x). This establishes the large deviation upper bound.

To obtain the large deviation lower bound, we first consider open sets of the form (x−δ, x+δ) for x ∈ (essinf X_1, esssup X_1), δ > 0. Fix such x, δ. Since φ_µ is everywhere finite by hypothesis, hence log φ_µ continuously differentiable and strictly convex on R, there exists a unique solution t_x ∈ R of the equation

\[ x = (\log \varphi_\mu)'(t_x) = \frac{\varphi_\mu'(t_x)}{\varphi_\mu(t_x)} = \frac{E\bigl[X_1 e^{t_x X_1}\bigr]}{E\bigl[e^{t_x X_1}\bigr]}, \]

and

\[ I(x) = t_x x - \log \varphi_\mu(t_x). \]

Define a probability measure µ̃ ∈ P(R), absolutely continuous with respect to µ, according to

\[ \frac{d\tilde\mu}{d\mu}(y) := \frac{e^{t_x y}}{\varphi_\mu(t_x)} = \exp\bigl(t_x y - \log \varphi_\mu(t_x)\bigr). \]

Let Y_1, Y_2, ... be i.i.d. random variables with common distribution µ̃, and set S̃_n := Σ_{i=1}^n Y_i, n ∈ N. Then E[Y_i] = x and, for ε ∈ (0, δ),

\[ \begin{aligned} P\{S_n/n \in (x-\varepsilon, x+\varepsilon)\} &= \int_{\{y \in \mathbb R^n : \sum_{i=1}^n y_i \in (n(x-\varepsilon),\, n(x+\varepsilon))\}} \mu^{\otimes n}(dy) \\ &\ge e^{-n (t_x(x-\varepsilon) \vee t_x(x+\varepsilon))} \int_{\{y \in \mathbb R^n : \sum_{i=1}^n y_i \in (n(x-\varepsilon),\, n(x+\varepsilon))\}} e^{t_x \sum_{i=1}^n y_i}\, \mu^{\otimes n}(dy) \\ &= e^{n \log \varphi_\mu(t_x)}\, e^{-n (t_x(x-\varepsilon) \vee t_x(x+\varepsilon))}\, P\bigl\{\tilde S_n/n \in (x-\varepsilon, x+\varepsilon)\bigr\}. \end{aligned} \]

Since Y_1, Y_2, ... are i.i.d. with E[Y_i] = x, we have S̃_n/n → x in probability as n → ∞ by the weak law of large numbers. It follows that

\[ \liminf_{n\to\infty} \frac{1}{n} \log P\{S_n/n \in (x-\varepsilon, x+\varepsilon)\} \ge \log \varphi_\mu(t_x) - \bigl(t_x(x-\varepsilon) \vee t_x(x+\varepsilon)\bigr). \]

Since ε > 0 was arbitrary and log φ_µ(t_x) − t_x x = −I(x), we obtain

\[ \liminf_{n\to\infty} \frac{1}{n} \log P\{S_n/n \in (x-\delta, x+\delta)\} \ge -I(x). \]

If G ⊆ R is an open set with G ⊆ (essinf X_1, esssup X_1), then the preceding inequality implies that

\[ \liminf_{n\to\infty} \frac{1}{n} \log P\{S_n/n \in G\} \ge \sup_{x \in G} (-I(x)) = -\inf_{x \in G} I(x). \]

□

The parameter t_x in the proof of Theorem 2.1 corresponds to an absolutely continuous change of measure with exponential density. Under the new measure, the random variable X_1 has expected value x instead of zero: the rare event {S_n/n ≈ x} becomes typical!

3 Weak convergence of probability measures

Here we collect some facts about weak convergence of probability measures. A standard reference on the topic is Billingsley [1968, 1999]; also see Chapter 3 in Ethier and Kurtz [1986] or Chapter 11 in Dudley [2002].

Let X be a Polish space, let B(X) be the Borel σ-algebra over X, and denote by P(X) the space of probability measures on B(X).

Definition 3.1. A sequence (θ_n)_{n∈N} ⊂ P(X) is said to converge weakly to θ ∈ P(X), in symbols $\theta_n \xrightarrow{w} \theta$, if

\[ \int_X f(x)\, \theta_n(dx) \to \int_X f(x)\, \theta(dx) \quad \text{for all } f \in C_b(X). \]

Remark 3.1. In the terminology of functional analysis, the weak convergence of Definition 3.1 would be called weak-∗ convergence. Denote by M_sgn(X) the space of finite signed measures on B(X). Then M_sgn(X) is a Banach space under the total variation norm ("strong convergence"), and P(X) is a closed convex subset of M_sgn(X) with respect to both the strong and the weak convergence topology. Moreover, M_sgn(X) can be identified with a subspace of the topological dual C_b(X)′ of the Banach space C_b(X) under the sup norm (if X is compact, then M_sgn(X) ≃ C_b(X)′; in general, C_b(X) is not reflexive, not even for X = [0, 1]). Thus P(X) ⊂ C_b(X)′, and the convergence of Definition 3.1 coincides with the weak-∗ convergence on the dual space induced by the original space C_b(X).

Example 2 (Dirac measures). Let (x_n)_{n∈N} ⊂ X be a convergent sequence with limit x ∈ X. Then the sequence of Dirac measures (δ_{x_n})_{n∈N} ⊂ P(X) converges weakly to the Dirac measure δ_x.
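Definition 3.1 can be tested concretely when the integrals are available in closed form. For normal laws θ_n = N(m_n, σ_n²) with m_n → m and σ_n² → σ², and the bounded continuous test function f = cos, one has ∫ cos dN(m, σ²) = cos(m) e^{−σ²/2} (the real part of the characteristic function at t = 1). A small Python sketch (ours, with illustrative parameter choices):

```python
import math

def gaussian_cos_integral(m, s2):
    """∫ cos(x) dN(m, s2)(x) = cos(m) * exp(-s2/2), the real part of the
    characteristic function of N(m, s2) evaluated at t = 1."""
    return math.cos(m) * math.exp(-s2 / 2.0)

# theta_n = N(1/n, 1 + 1/n) converges weakly to theta = N(0, 1)
vals = [gaussian_cos_integral(1.0 / n, 1.0 + 1.0 / n) for n in [1, 10, 100, 1000]]
limit = gaussian_cos_integral(0.0, 1.0)
print(vals, limit)   # ∫ f dθ_n approaches ∫ f dθ, as Definition 3.1 requires
```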

Example 3 (Normal distributions). Let (m_n)_{n∈N} ⊂ R, (σ²_n)_{n∈N} ⊂ [0, ∞) be convergent sequences with limits m and σ², respectively. Then the sequence of normal distributions (N(m_n, σ²_n))_{n∈N} ⊂ P(R) converges weakly to N(m, σ²).

Example 4 (Product measures). Suppose (θ_n)_{n∈N} ⊂ P(X), (µ_n)_{n∈N} ⊂ P(Y) are weakly convergent sequences with limits θ and µ, respectively, where Y is Polish, too. Then the sequence of product measures (θ_n ⊗ µ_n)_{n∈N} ⊂ P(X × Y) converges weakly to θ ⊗ µ.

Example 5 (Marginals vs. joint distribution). Let X, Y, Z be independent real-valued random variables defined on some probability space (Ω, F, P) such that Law(X) = N(0, 1) = Law(Y) (i.e., X, Y have standard normal distribution), and Z has Rademacher distribution, that is, P(Z = 1) = 1/2 = P(Z = −1). Define a sequence (θ_n)_{n∈N} ⊂ P(R²) by θ_{2n} := Law(X, Y), θ_{2n−1} := Law(X, Z·X), n ∈ N. Observe that the random variable Z·X has standard normal distribution N(0, 1). The marginal distributions of (θ_n) therefore converge weakly (they are constant and equal to N(0, 1)), while the sequence (θ_n) itself does not converge (indeed, the joint distribution of X and Z·X is not even Gaussian). To prove weak convergence of probability measures on a product space it is therefore not enough to check convergence of the marginal distributions. This suffices only in the case of product measures; cf. Example 4.

The limit of a weakly convergent sequence in P(X) is unique. Weak convergence induces a topology on P(X); under this topology, X being a Polish space, P(X) is a Polish space, too. Let d be a complete metric compatible with the topology of X; thus (X, d) is a complete and separable metric space. There are different choices for a complete metric on P(X) that is compatible with the topology of weak convergence. Two common choices are the Prohorov metric and the bounded Lipschitz metric, respectively. The Prohorov metric on P(X) is defined by

\[ \rho(\theta, \nu) := \inf\{\varepsilon > 0 : \theta(G) \le \nu(G^\varepsilon) + \varepsilon \text{ for all closed } G \subseteq X\}, \tag{3.1} \]

where G^ε := {x ∈ X : d(x, G) < ε}. Notice that ρ is indeed a metric. The bounded Lipschitz metric on P(X) is defined by

\[ \tilde\rho(\theta, \nu) := \sup\Bigl\{ \Bigl| \int f\, d\theta - \int f\, d\nu \Bigr| : f \in C_b(X) \text{ such that } \|f\|_{bl} \le 1 \Bigr\}, \tag{3.2} \]

where

\[ \|f\|_{bl} := \sup_{x \in X} |f(x)| + \sup_{x, y \in X,\; x \ne y} \frac{|f(x) - f(y)|}{d(x, y)}. \]

The following theorem gives a number of equivalent characterizations of weak convergence.

Theorem 3.1 ("Portemanteau theorem"). Let (θ_n)_{n∈N} ⊂ P(X) be a sequence of probability measures, and let θ ∈ P(X). Then the following are equivalent:

(i) $\theta_n \xrightarrow{w} \theta$ as n → ∞;
(ii) ρ(θ_n, θ) → 0 (Prohorov metric);
(iii) ρ̃(θ_n, θ) → 0 (bounded Lipschitz metric);
(iv) ∫ f dθ_n → ∫ f dθ for all bounded Lipschitz continuous f : X → R;
(v) lim inf θ_n(O) ≥ θ(O) for all open O ⊆ X;
(vi) lim sup θ_n(G) ≤ θ(G) for all closed G ⊆ X;
(vii) lim θ_n(B) = θ(B) for all B ∈ B(X) such that θ(∂B) = 0, where ∂B := cl(B) ∩ cl(B^c) denotes the boundary of the Borel set B;
(viii) ∫ f dθ_n → ∫ f dθ for all f ∈ M_b(X) such that θ(U_f) = 0, where U_f := {x ∈ X : f discontinuous at x}.

Proof. For the equivalence of conditions (i), (ii), (iii), and (iv) see Theorem 11.3.3 in Dudley [2002].

(i) ⇒ (v): Let O ⊆ X be open. If f ∈ C_b(X) is such that 0 ≤ f ≤ 1_O, then

\[ \liminf_{n\to\infty} \theta_n(O) \ge \int_X f\, d\theta, \]

since θ_n(O) ≥ ∫ f dθ_n for every n ∈ N and ∫ f dθ_n → ∫ f dθ as n → ∞ by (i). Since O is open, we can find (f_M)_{M∈N} ⊂ C_b(X) such that 0 ≤ f_M ≤ 1_O and f_M ↑ 1_O pointwise as M → ∞. Then ∫ f_M dθ → θ(O) as M → ∞ by monotone convergence. It follows that lim inf θ_n(O) ≥ θ(O).

(v) ⇔ (vi): This follows by taking complements (open/closed sets).

(v), (vi) ⇒ (vii): Let B ∈ B(X). Clearly, B° ⊆ B ⊆ cl(B), with B° open and cl(B) closed. Therefore, by (v) and (vi),

\[ \theta(B^\circ) \le \liminf_{n\to\infty} \theta_n(B^\circ) \le \liminf_{n\to\infty} \theta_n(B) \le \limsup_{n\to\infty} \theta_n(B) \le \limsup_{n\to\infty} \theta_n(\mathrm{cl}(B)) \le \theta(\mathrm{cl}(B)). \]

If θ(∂B) = 0, then θ(B°) = θ(cl(B)), hence lim θ_n(B) = θ(B).

(vii) ⇒ (viii): Let f ∈ M_b(X) be such that θ(U_f) = 0, and let A := {y ∈ R : θ(f^{-1}{y}) > 0} be the set of atoms of θ ∘ f^{-1}. Since θ ∘ f^{-1} is a finite measure, A is at most countable. Let ε > 0. Then there are N ∈ N and y_0, ..., y_N ∈ R \ A such that y_0 ≤ −‖f‖ < y_1 < ... < y_{N−1} < ‖f‖ ≤ y_N and y_i − y_{i−1} ≤ ε. For i ∈ {1, ..., N} set B_i := f^{-1}([y_{i−1}, y_i)). Then

\[ \theta(\partial B_i) \le \theta\bigl(f^{-1}\{y_{i-1}\}\bigr) + \theta\bigl(f^{-1}\{y_i\}\bigr) + \theta(U_f) = 0. \]

Using (vii) we obtain

\[ \limsup_{n\to\infty} \int f\, d\theta_n \le \limsup_{n\to\infty} \sum_{i=1}^N \theta_n(B_i)\, y_i = \sum_{i=1}^N \theta(B_i)\, y_i \le \varepsilon + \int f\, d\theta. \]

Since ε was arbitrary, it follows that lim sup ∫ f dθ_n ≤ ∫ f dθ. The same argument applied to −f yields the inequality lim inf ∫ f dθ_n ≥ ∫ f dθ. The implication (viii) ⇒ (i) is immediate. □

From Definition 3.1 it is clear that weak convergence is preserved under continuous mappings. The mapping theorem for weak convergence requires continuity only with probability one with respect to the limit measure; this should be compared to characterization (vii) in Theorem 3.1.

Theorem 3.2 (Mapping theorem). Let (θ_n)_{n∈N} ⊂ P(X), θ ∈ P(X). Let Y be a second Polish space, and let ψ : X → Y be a measurable mapping. If $\theta_n \xrightarrow{w} \theta$ and θ{x ∈ X : ψ discontinuous at x} = 0, then $\theta_n \circ \psi^{-1} \xrightarrow{w} \theta \circ \psi^{-1}$.

Proof. By part (v) of Theorem 3.1, it is enough to show that for every open O ⊆ Y,

\[ \liminf_{n\to\infty} \theta_n\bigl(\psi^{-1}(O)\bigr) \ge \theta\bigl(\psi^{-1}(O)\bigr). \]

Let O ⊆ Y be open. Set C := {x ∈ X : ψ continuous at x}. Then ψ^{-1}(O) ∩ C is contained in (ψ^{-1}(O))°, the interior of ψ^{-1}(O). Since (ψ^{-1}(O))° is open, θ(C) = 1, and $\theta_n \xrightarrow{w} \theta$ by hypothesis, it follows from part (v) of Theorem 3.1 that

\[ \theta\bigl(\psi^{-1}(O)\bigr) = \theta\bigl((\psi^{-1}(O))^\circ\bigr) \le \liminf_{n\to\infty} \theta_n\bigl((\psi^{-1}(O))^\circ\bigr) \le \liminf_{n\to\infty} \theta_n\bigl(\psi^{-1}(O)\bigr). \]

□
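The role of the continuity assumption in Theorem 3.2 can be seen in a one-line computation. In the Python sketch below (ours, not from the notes), θ_n = δ_{1/n} converges weakly to δ_0; pushing forward under a map ψ that is discontinuous exactly at 0, a point charged by the limit measure, destroys the convergence, while a continuous map preserves it:

```python
def integrate_dirac(point, f):
    """∫ f d(δ_point): integration against a Dirac measure is evaluation."""
    return f(point)

psi = lambda x: 1.0 if x > 0 else 0.0   # discontinuous exactly at 0
psi_cont = abs                           # continuous everywhere
f = lambda y: min(y, 1.0)                # bounded continuous test function

# ∫ f d(θ_n ∘ ψ⁻¹) = f(ψ(1/n)) = 1 for every n, but ∫ f d(δ_0 ∘ ψ⁻¹) = 0:
vals = [integrate_dirac(psi(1.0 / n), f) for n in [1, 10, 100]]
limit_val = integrate_dirac(psi(0.0), f)

# with a continuous ψ the conclusion of Theorem 3.2 holds:
vals_cont = [integrate_dirac(psi_cont(1.0 / n), f) for n in [1, 10, 100]]
print(vals, limit_val, vals_cont)
```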

The following generalization of Theorem 3.2 is sometimes useful.

Theorem 3.3 (Extended mapping theorem). Let (θ_n)_{n∈N} ⊂ P(X), θ ∈ P(X). Let Y be a second Polish space, and let ψ_n, n ∈ N, and ψ be measurable mappings X → Y. If $\theta_n \xrightarrow{w} \theta$ and

\[ \theta\bigl\{x \in X : \exists\, (x_n) \subset X \text{ such that } x_n \to x \text{ while } \psi_n(x_n) \not\to \psi(x)\bigr\} = 0, \]

then $\theta_n \circ \psi_n^{-1} \xrightarrow{w} \theta \circ \psi^{-1}$.

Proof. See Theorem 5.5 in Billingsley [1968, p. 34] or Theorem 4.27 in Kallenberg [2001].

In the following version of Fatou's lemma the probability measures converge weakly, while the integrand is fixed (the latter condition can be weakened).

Lemma 3.1 (Fatou). Let (θ_n)_{n∈N} ⊂ P(X), θ ∈ P(X). Let g : X → (−∞, ∞] be a lower semicontinuous function bounded from below. If $\theta_n \xrightarrow{w} \theta$, then

\[ \liminf_{n\to\infty} \int_X g\, d\theta_n \ge \int_X g\, d\theta. \]

Proof. See Theorem A.3.2 in Dupuis and Ellis [1997, p. 307].

A sequence (X_n)_{n∈N} of X-valued random variables (possibly defined on different probability spaces) is said to converge in distribution to some X-valued random variable X if the respective laws converge weakly. There is a version of the dominated convergence theorem in connection with convergence in distribution for real-valued (or R^d-valued) random variables; below, E_n denotes expectation on the probability space carrying X_n.

Theorem 3.4 (Dominated convergence). Let (X_n)_{n∈N} be a sequence of R-valued random variables. Suppose that (X_n) converges in distribution to X for some random variable X. If (X_n)_{n∈N} is uniformly integrable, then E[|X|] < ∞ and E_n[X_n] → E[X].

Proof. For M ∈ N, the functions x ↦ (−M) ∨ (x ∧ M) and x ↦ |x| ∧ M are in C_b(R). Since (X_n) converges in distribution to X by hypothesis, this implies

\[ E_n\bigl[|X_n| \wedge M\bigr] \to E\bigl[|X| \wedge M\bigr], \tag{3.3a} \]

\[ E_n\bigl[(-M) \vee (X_n \wedge M)\bigr] \to E\bigl[(-M) \vee (X \wedge M)\bigr]. \tag{3.3b} \]

By hypothesis, (X_n)_{n∈N} is uniformly integrable, that is,

\[ \lim_{M\to\infty} \sup_{n \in \mathbb N} E_n\bigl[|X_n|\, 1_{\{|X_n| \ge M\}}\bigr] = 0. \]

This implies, in particular, that sup_{n∈N} E_n[|X_n|] < ∞. By (3.3a), for every M ∈ N we can choose n_M ∈ N such that |E_{n_M}[|X_{n_M}| ∧ M] − E[|X| ∧ M]| ≤ 1. It follows that

\[ \sup_{M \in \mathbb N} E\bigl[|X| \wedge M\bigr] \le \sup_{M \in \mathbb N} E_{n_M}\bigl[|X_{n_M}| \wedge M\bigr] + 1 \le \sup_{n \in \mathbb N} E_n\bigl[|X_n|\bigr] + 1 < \infty. \]

This shows E[|X|] < ∞, that is, X is integrable, since E[|X| ∧ M] ↑ E[|X|] as M → ∞ by monotone convergence. Now, for any n ∈ N and any M ∈ N,

\[ \bigl|E[X] - E_n[X_n]\bigr| \le \bigl|E\bigl[(-M) \vee (X \wedge M)\bigr] - E_n\bigl[(-M) \vee (X_n \wedge M)\bigr]\bigr| + E\bigl[|X|\, 1_{\{|X| \ge M\}}\bigr] + E_n\bigl[|X_n|\, 1_{\{|X_n| \ge M\}}\bigr]. \]

Let ε > 0. By the integrability of X and the uniform integrability of (X_n) one finds M_ε ∈ N such that

\[ E\bigl[|X|\, 1_{\{|X| \ge M_\varepsilon\}}\bigr] + \sup_{k \in \mathbb N} E_k\bigl[|X_k|\, 1_{\{|X_k| \ge M_\varepsilon\}}\bigr] \le \frac{\varepsilon}{2}. \]

By (3.3b), we can choose n_ε = n(M_ε) such that for all n ≥ n_ε,

\[ \bigl|E\bigl[(-M_\varepsilon) \vee (X \wedge M_\varepsilon)\bigr] - E_n\bigl[(-M_\varepsilon) \vee (X_n \wedge M_\varepsilon)\bigr]\bigr| \le \frac{\varepsilon}{2}. \]

It follows that |E[X] − E_n[X_n]| ≤ ε for all n ≥ n_ε, which establishes the desired convergence since ε was arbitrary. □

Suppose we have a sequence (X_n)_{n∈N} of random variables that converges in distribution to some random variable X; thus $\mathrm{Law}(X_n) \xrightarrow{w} \mathrm{Law}(X)$. If the relation (in particular, the joint distribution) between X_1, X_2, ... is irrelevant, one may work instead with random variables that converge almost surely.

Theorem 3.5 (Skorohod representation). Let (θ_n)_{n∈N} ⊂ P(X). If $\theta_n \xrightarrow{w} \theta$ for some θ ∈ P(X), then there exists a probability space (Ω, F, P) carrying X-valued random variables X_n, n ∈ N, and X such that P ∘ X_n^{-1} = θ_n for every n ∈ N, P ∘ X^{-1} = θ, and X_n → X as n → ∞ P-almost surely.

Proof (for X finite). Assume that X = {1, ..., N} for some N ∈ N. Set Ω := [0, 1], F := B([0, 1]), and let P be Lebesgue measure on B([0, 1]). Set

\[ X(\omega) := \sum_{k=1}^N k\, 1_{(\theta\{i < k\},\, \theta\{i \le k\}]}(\omega), \qquad X_n(\omega) := \sum_{k=1}^N k\, 1_{(\theta_n\{i < k\},\, \theta_n\{i \le k\}]}(\omega). \]

The random variables thus defined have the desired properties since, for every k ∈ {1, ..., N},

\[ \theta_n\{i < k\} \to \theta\{i < k\}, \qquad \theta_n\{i \le k\} \to \theta\{i \le k\} \quad \text{as } n \to \infty. \]

For a general Polish space X (actually, completeness is not needed) one uses countable partitions and an approximation argument; cf. the proof of Theorem 4.30 in Kallenberg [2001] or of Theorem 11.7.2 in Dudley [2002]. □

A standard method for proving that a sequence (a_n)_{n∈N} of elements of a complete metric space S converges to a unique limit a is to proceed as follows. First show that (a_n)_{n∈N} is relatively compact (i.e., cl({a_n : n ∈ N}) is compact in S). Then, taking any convergent subsequence (a_{n(j)})_{j∈N} with limit ã, show that ã = a. This establishes a_n → a as n → ∞.

Relative compactness in P(X) with the topology of weak convergence, X Polish, is equivalent to uniform exhaustibility by compact sets, or tightness.

Definition 3.2. Let I be a non-empty index set. A family (θ_i)_{i∈I} ⊂ P(X) is called tight (or uniformly tight) if for any ε > 0 there is a compact set K_ε ⊆ X such that

\[ \inf_{i \in I} \theta_i(K_\varepsilon) \ge 1 - \varepsilon. \]

Theorem 3.6 (Prohorov). Let I be a non-empty index set, and let (θ_i)_{i∈I} ⊂ P(X), where X is Polish. Then (θ_i)_{i∈I} is tight if and only if (θ_i)_{i∈I} is relatively compact in P(X) with respect to the topology of weak convergence.

Proof. See, for instance, Section 1.5 in Billingsley [1999].

Depending on the structure of the underlying space X, conditions for tightness or relative compactness can be derived. Let us consider here the case X = C([0, ∞), R^d) with the topology of uniform convergence on compact time intervals. With this choice, X is the canonical path space for (R^d-valued) continuous processes. Let X be the canonical process on C([0, ∞), R^d), that is, X(t, ω) := ω(t) for t ≥ 0, ω ∈ C([0, ∞), R^d).
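The construction in the proof of Theorem 3.5 above for finite X can be implemented directly: on Ω = [0, 1] with Lebesgue measure, X_n is the generalized inverse of the cumulative distribution function of θ_n. A Python sketch (ours; the helper names and the concrete sequence θ_n → θ on {1, 2, 3} are illustrative):

```python
import bisect

def quantile_rv(law, omega):
    """X(omega) = sum_k k * 1_{(F(k-1), F(k)]}(omega) for a law on {1, ..., N}:
    the generalized inverse of the CDF, as in the finite-state proof."""
    values = sorted(law)
    cum, cdf = 0.0, []
    for v in values:
        cum += law[v]
        cdf.append(cum)
    return values[bisect.bisect_left(cdf, omega)]

theta = {1: 0.2, 2: 0.5, 3: 0.3}
def theta_n(n):
    # θ_n converges weakly to θ as n grows
    return {1: 0.2 + 1.0 / n, 2: 0.5 - 1.0 / n, 3: 0.3}

omega = 0.25                     # one fixed realization in Ω = [0, 1]
x = quantile_rv(theta, omega)
xs = [quantile_rv(theta_n(n), omega) for n in [10, 100, 1000]]
print(x, xs)                     # X_n(ω) = X(ω) for all n large enough
```

For every ω outside the finitely many jump points of the limiting CDF, X_n(ω) eventually equals X(ω), which is the almost sure convergence claimed in the theorem.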

Theorem 3.7. Let I be a non-empty index set, and let (θ_i)_{i∈I} ⊂ P(C([0, ∞), R^d)). Then (θ_i)_{i∈I} is relatively compact if and only if the following two conditions hold:

(i) (θ_i ∘ X(0)^{-1})_{i∈I} is tight in P(R^d), and

(ii) for every ε > 0 and every T ∈ N there is δ > 0 such that

\[ \sup_{i \in I} \theta_i\bigl(\bigl\{\omega \in C([0, \infty), \mathbb R^d) : w_T(\omega, \delta) \ge \varepsilon\bigr\}\bigr) \le \varepsilon, \]

where w_T(ω, δ) := sup_{s, t ∈ [0, T] : |t − s| ≤ δ} |ω(t) − ω(s)| is the modulus of continuity of ω with size δ over the time interval [0, T].

Proof. See, for instance, Billingsley [1999]; the extension from a compact time interval to [0, ∞) is straightforward.¹

¹ The issue is more delicate when passing from the Skorohod space D([0, T], X) to the Skorohod space D([0, ∞), X); cf. Chapter 3 in Billingsley [1999].

Theorem 3.7 should be compared to the Arzelà-Ascoli criterion for relative compactness in C([0, ∞), R^d). The next theorem gives a sufficient condition for relative compactness (or tightness) in P(C([0, ∞), R^d)); the result should be compared to the Kolmogorov-Chentsov continuity theorem.

Theorem 3.8 (Kolmogorov's sufficient condition). Let I be a non-empty index set, and let (θ_i)_{i∈I} ⊂ P(C([0, ∞), R^d)). Suppose that

(i) (θ_i ∘ X(0)^{-1})_{i∈I} is tight in P(R^d), and

(ii) there are strictly positive numbers C, α, β such that for all t, s ∈ [0, ∞) and all i ∈ I,

\[ E_{\theta_i}\bigl[|X(s) - X(t)|^\alpha\bigr] \le C\, |t - s|^{1+\beta}. \]

Then (θ_i)_{i∈I} is relatively compact in P(C([0, ∞), R^d)).

Proof. See, for instance, Corollary 6.5 in Kallenberg [2001].

Tightness of a family (θ_i)_{i∈I} ⊂ P(S) can often be established by showing that sup_{i∈I} G(θ_i) < ∞ for an appropriate function G. This works if G is a tightness function in the sense of the definition below.

Definition 3.3. A measurable function g : S → [0, ∞] is called a tightness function on S if g has relatively compact sublevel sets, that is, for every M ∈ R, cl{x ∈ S : g(x) ≤ M} is compact in S.

A way of constructing tightness functions on P(S) is provided by the following result.

Theorem 3.9. Suppose g is a tightness function on S. Define a function G on P(S) by G(θ) := ∫_S g(x) θ(dx). Then G is a tightness function on P(S).

Proof. The function G is well-defined with values in [0, ∞]. Let M ∈ [0, ∞), and set A := {θ ∈ P(S) : G(θ) ≤ M}. By Prohorov's theorem it is enough to show that A = A(M) is tight. Let ε > 0. Set K_ε := cl{x ∈ S : g(x) ≤ M/ε}. Then K_ε is compact since g is a tightness function, and for θ ∈ A,

\[ M \ge G(\theta) = \int_S g\, d\theta \ge \int_{S \setminus K_\varepsilon} g\, d\theta \ge \frac{M}{\varepsilon}\, \theta(S \setminus K_\varepsilon). \]

It follows that sup_{θ∈A} θ(S \ K_ε) ≤ ε, hence inf_{θ∈A} θ(K_ε) ≥ 1 − ε. □

4 Relative entropy

Here we collect properties of relative entropy for probability measures on Polish spaces. We will mostly refer to Dupuis and Ellis [1997]; also see the classical works by Kullback and Leibler [1951] and Kullback [1959], where relative entropy is introduced as an information measure and called directed divergence. Let S, X, Y be Polish spaces.

Definition 4.1. Let µ, ν ∈ P(S). The relative entropy of µ with respect to ν is given by

\[ R(\mu \,\|\, \nu) := \begin{cases} \int_S \log\bigl(\frac{d\mu}{d\nu}(x)\bigr)\, \mu(dx) & \text{if } \mu \ll \nu, \\ \infty & \text{else.} \end{cases} \]

Relative entropy is well-defined as a function P(S) × P(S) → [0, ∞]. Indeed, if µ ≪ ν, then a density f := dµ/dν exists by the Radon-Nikodym theorem, with f uniquely determined ν-almost surely. In this case,

\[ R(\mu \,\|\, \nu) = \int_S f(x) \log(f(x))\, \nu(dx). \]
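Definition 4.1 is easy to evaluate for discrete measures. A minimal Python sketch (ours, not from the notes): for laws given as atom-mass dictionaries it computes R(µ‖ν) and illustrates non-negativity, the characterization of equality, the convention R(µ‖ν) = ∞ when µ is not absolutely continuous with respect to ν, and the asymmetry of relative entropy:

```python
import math

def relative_entropy(mu, nu):
    """R(mu || nu) for discrete laws given as {atom: mass} dictionaries;
    returns math.inf unless mu is absolutely continuous w.r.t. nu."""
    r = 0.0
    for atom, m in mu.items():
        if m == 0.0:
            continue                  # convention: 0 log 0 = 0
        if nu.get(atom, 0.0) == 0.0:
            return math.inf           # mu not absolutely continuous w.r.t. nu
        r += m * math.log(m / nu[atom])
    return r

nu = {"a": 0.5, "b": 0.3, "c": 0.2}
mu = {"a": 0.1, "b": 0.6, "c": 0.3}
print(relative_entropy(mu, nu))          # > 0
print(relative_entropy(nu, nu))          # = 0 iff the two measures coincide
print(relative_entropy({"d": 1.0}, nu))  # inf: no absolute continuity
print(relative_entropy(nu, mu))          # differs from R(mu || nu): not a metric
```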

Clearly, lim_{x→0+} x log(x) = 0. Since ∫_S f dν = 1 and x log(x) ≥ x − 1 for all x ≥ 0, with equality if and only if x = 1, it follows that R(µ‖ν) ≥ 0, with R(µ‖ν) = 0 if and only if µ = ν. Relative entropy can actually be defined for σ-finite measures on an arbitrary measurable space.

Lemma 4.1 (Basic properties). Properties of relative entropy R(·‖·) for probability measures on a Polish space S:

(a) Relative entropy is a non-negative, convex, lower semicontinuous function P(S) × P(S) → [0, ∞].

(b) For ν ∈ P(S), R(·‖ν) is strictly convex on {µ ∈ P(S) : R(µ‖ν) < ∞}.

(c) For ν ∈ P(S), R(·‖ν) has compact sublevel sets.

(d) Let Π denote the set of finite measurable partitions of S. Then for all µ, ν ∈ P(S),

\[ R(\mu \,\|\, \nu) = \sup_{\pi \in \Pi} \sum_{A \in \pi} \mu(A) \log\Bigl(\frac{\mu(A)}{\nu(A)}\Bigr), \]

where x log(x/y) := 0 if x = 0, and x log(x/y) := ∞ if x > 0 and y = 0.

(e) In particular, taking π = {A, A^c} in (d): for every A ∈ B(S) and any µ, ν ∈ P(S),

\[ R(\mu \,\|\, \nu) \ge \mu(A) \log\Bigl(\frac{\mu(A)}{\nu(A)}\Bigr) + \mu(A^c) \log\Bigl(\frac{\mu(A^c)}{\nu(A^c)}\Bigr). \]

Proof. See Lemma 1.4.3, parts (b), (c), (g), in Dupuis and Ellis [1997].

Lemma 4.2 (Contraction property). Let ψ : Y → X be a Borel measurable mapping. Let η ∈ P(X), γ_0 ∈ P(Y). Then

\[ R\bigl(\eta \,\|\, \gamma_0 \circ \psi^{-1}\bigr) = \inf_{\gamma \in P(Y) :\, \gamma \circ \psi^{-1} = \eta} R(\gamma \,\|\, \gamma_0), \tag{4.1} \]

where inf ∅ := ∞ by convention.

Proof (sketch). The inequality "≤" is analogous to the proof of Lemma E.2.1 in Dupuis and Ellis [1997, p. 366]. For the opposite inequality, check that the probability measure γ̄ defined by

\[ \bar\gamma(dy) := \frac{d\eta}{d(\gamma_0 \circ \psi^{-1})}(\psi(y))\, \gamma_0(dy) \]

attains the infimum whenever that infimum is finite.²

² For more details and an application see, for instance, M. Fischer, On the form of the large deviation rate function for the empirical measures of weakly interacting systems, arXiv preprint [math.PR].
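Lemma 4.2 and its explicit minimizer can be checked on a small discrete example. The following Python sketch (ours; the measures and the map ψ are arbitrary illustrative choices) builds γ̄ from the formula in the proof sketch and verifies that it attains the infimum in (4.1), while another measure with the same pushforward does worse:

```python
import math

def relative_entropy(mu, nu):
    """R(mu || nu) for discrete laws {atom: mass}; math.inf unless mu << nu."""
    r = 0.0
    for a, m in mu.items():
        if m > 0.0:
            if nu.get(a, 0.0) == 0.0:
                return math.inf
            r += m * math.log(m / nu[a])
    return r

def pushforward(gamma, psi):
    out = {}
    for y, m in gamma.items():
        out[psi(y)] = out.get(psi(y), 0.0) + m
    return out

psi = lambda y: y % 2                       # measurable map Y = {0,1,2,3} -> X = {0,1}
gamma0 = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
eta = {0: 0.5, 1: 0.5}                      # target law on X

nu = pushforward(gamma0, psi)               # γ₀ ∘ ψ⁻¹
lhs = relative_entropy(eta, nu)             # left-hand side of (4.1)

# candidate minimizer from the proof sketch: γ̄(y) = (dη/dν)(ψ(y)) γ₀(y)
gamma_bar = {y: eta[psi(y)] / nu[psi(y)] * m for y, m in gamma0.items()}
print(lhs, relative_entropy(gamma_bar, gamma0))   # equal

# another γ with γ ∘ ψ⁻¹ = η has strictly larger relative entropy:
gamma_alt = {0: 0.3, 1: 0.1, 2: 0.2, 3: 0.4}
assert all(abs(pushforward(gamma_alt, psi)[a] - eta[a]) < 1e-12 for a in eta)
print(relative_entropy(gamma_alt, gamma0))
```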

Lemma 4.2 yields the invariance property of relative entropy established in Lemma E.2.1 [Dupuis and Ellis, 1997, p. 366] in the case when ψ is a bijective bi-measurable mapping; also cf. Theorem 4.1 in Kullback and Leibler [1951], where the inequality implied by Lemma 4.2 is derived.

Lemma 4.3 (Chain rule). Let X, Y be Polish spaces. Let α, β ∈ P(X × Y) and denote their marginal distributions on X by α_1 and β_1, respectively. Let α(·|·), β(·|·) be stochastic kernels on Y given X such that for all A ∈ B(X), B ∈ B(Y),

\[ \alpha(A \times B) = \int_A \alpha(B|x)\, \alpha_1(dx), \qquad \beta(A \times B) = \int_A \beta(B|x)\, \beta_1(dx). \]

Then the mapping x ↦ R(α(·|x) ‖ β(·|x)) is measurable and

\[ R(\alpha \,\|\, \beta) = R(\alpha_1 \,\|\, \beta_1) + \int_X R\bigl(\alpha(\cdot|x) \,\|\, \beta(\cdot|x)\bigr)\, \alpha_1(dx). \]

In particular, if α = α_1 ⊗ α_2 and β = β_1 ⊗ β_2 are product measures, then

\[ R(\alpha_1 \otimes \alpha_2 \,\|\, \beta_1 \otimes \beta_2) = R(\alpha_1 \,\|\, \beta_1) + R(\alpha_2 \,\|\, \beta_2). \]

Proof. See Appendix C.3 in Dupuis and Ellis [1997].

The variational representation for Laplace functionals given in Lemma 4.4 below is the starting point for the weak convergence approach to large deviations; its statement should be compared to Definition 2.2, the definition of the Laplace principle.

Lemma 4.4 (Laplace functionals). Let ν ∈ P(S). Then for all g ∈ M_b(S),

\[ -\log \int_S \exp(-g(x))\, \nu(dx) = \inf_{\mu \in P(S)} \Bigl\{ R(\mu \,\|\, \nu) + \int_S g(x)\, \mu(dx) \Bigr\}. \]

The infimum in the variational formula above is attained at µ_0 ∈ P(S) given by

\[ \frac{d\mu_0}{d\nu}(x) := \frac{\exp(-g(x))}{\int_S \exp(-g(y))\, \nu(dy)}, \quad x \in S. \]

Proof. Let g ∈ M_b(S), and define µ_0 through its density with respect to ν as above. Notice that µ_0 and ν are mutually absolutely continuous. Let µ ∈ P(S) be such that R(µ‖ν) < ∞. Then µ is absolutely continuous with respect to

ν with density dµ/dν, but also absolutely continuous with respect to µ_0 with density

\[ \frac{d\mu}{d\mu_0} = \frac{d\mu}{d\nu} \cdot \frac{d\nu}{d\mu_0}, \quad \text{where } \frac{d\nu}{d\mu_0} = e^{g} \int_S e^{-g}\, d\nu. \]

It follows that

\[ \begin{aligned} R(\mu \,\|\, \nu) + \int_S g\, d\mu &= \int_S \log\Bigl(\frac{d\mu}{d\nu}\Bigr)\, d\mu + \int_S g\, d\mu \\ &= \int_S \log\Bigl(\frac{d\mu}{d\mu_0}\Bigr)\, d\mu + \int_S \log\Bigl(\frac{d\mu_0}{d\nu}\Bigr)\, d\mu + \int_S g\, d\mu \\ &= R(\mu \,\|\, \mu_0) - \log \int_S e^{-g}\, d\nu. \end{aligned} \]

This yields the assertion since R(µ‖µ_0) ≥ 0, with R(µ‖µ_0) = 0 if and only if µ = µ_0. □

Lemma 4.4 also allows one to derive the Donsker-Varadhan variational formula for relative entropy itself.

Lemma 4.5 (Donsker-Varadhan). Let µ, ν ∈ P(S). Then

\[ R(\mu \,\|\, \nu) = \sup_{g \in M_b(S)} \Bigl\{ \int_S g(x)\, \mu(dx) - \log \int_S \exp(g(x))\, \nu(dx) \Bigr\}. \]

Proof. Let µ, ν ∈ P(S). By Lemma 4.4 (applied with −g in place of g), for every g ∈ M_b(S),

\[ R(\mu \,\|\, \nu) \ge \int_S g\, d\mu - \log \int_S e^{g}\, d\nu, \]

hence

\[ R(\mu \,\|\, \nu) \ge \sup_{g \in M_b(S)} \Bigl\{ \int_S g\, d\mu - \log \int_S e^{g}\, d\nu \Bigr\}. \]

For g ∈ M_b(S) set J(g) := ∫_S g dµ − log ∫_S e^g dν. Thus R(µ‖ν) ≥ sup_{g ∈ M_b(S)} J(g). To obtain equality, it is enough to find a sequence (g_M)_{M∈N} ⊂ M_b(S) such that lim sup_M J(g_M) = R(µ‖ν). We distinguish two cases.

First case: µ is not absolutely continuous with respect to ν. Then R(µ‖ν) = ∞ and there exists A ∈ B(S) such that µ(A) > 0 while ν(A) = 0. Choose such a set A and set g_M := M·1_A. Then, for every M ∈ N, g_M = 0 ν-almost surely, thus ∫_S e^{g_M} dν = ∫_S e^0 dν = 1, hence log ∫_S e^{g_M} dν = 0. It follows that

\[ \limsup_{M\to\infty} J(g_M) = \limsup_{M\to\infty} \int_S g_M\, d\mu = \limsup_{M\to\infty} M \mu(A) = \infty. \]

Second case: $\mu$ is absolutely continuous with respect to $\nu$. Then we can choose a measurable function $f: S \to [0,\infty)$ such that $f$ is a density for $\mu$ with respect to $\nu$ ($f$ a version of the Radon-Nikodym derivative $d\mu/d\nu$), and $R(\mu\,\|\,\nu) = \int_S f \log(f)\,d\nu$, where the value of the integral is in $[0,\infty]$. Set
$$g_M(x) \doteq \log(f(x))\,\mathbf{1}_{[1/M,M]}(f(x)) - M\,\mathbf{1}_{\{0\}}(f(x)), \qquad x \in S.$$
Then
$$\lim_{M\to\infty} \int_S g_M\,d\mu = \lim_{M\to\infty} \int_S f \log(f)\,\mathbf{1}_{[1/M,M]}(f)\,d\nu = \lim_{M\to\infty} \left( \int_S \left( f\log(f) + \mathbf{1}_{(0,\infty)}(f) \right) \mathbf{1}_{[1/M,M]}(f)\,d\nu - \int_S \mathbf{1}_{[1/M,M]}(f)\,d\nu \right) = \int_S f\log(f)\,d\nu + \nu\{f > 0\} - \nu\{f > 0\} = R(\mu\,\|\,\nu)$$
by dominated convergence and monotone convergence, since $t\log(t) \geq -1$ for every $t \geq 0$ and $t\log(t) = 0$ if $t = 0$, hence, for every $x \in S$,
$$\left( f(x)\log(f(x)) + \mathbf{1}_{(0,\infty)}(f(x)) \right) \mathbf{1}_{[1/M,M]}(f(x)) \nearrow f(x)\log(f(x)) + \mathbf{1}_{(0,\infty)}(f(x))$$
as $M \to \infty$. On the other hand, again using dominated and monotone convergence, respectively,
$$\lim_{M\to\infty} \log \int_S e^{g_M}\,d\nu = \log\left( \lim_{M\to\infty} \int_S \left( f\,\mathbf{1}_{[1/M,M]}(f) + \mathbf{1}_{(0,1/M)\cup(M,\infty)}(f) + e^{-M}\,\mathbf{1}_{\{0\}}(f) \right) d\nu \right) = \log \int_S f\,d\nu = \log(1) = 0.$$
It follows that $\limsup_{M\to\infty} J(g_M) = \lim_{M\to\infty} \int_S g_M\,d\mu = R(\mu\,\|\,\nu)$.

Remark 4.1. If the state space $S$ is Polish, as we assume, then the supremum in the Donsker-Varadhan formula of Lemma 4.5 can be restricted to bounded and continuous functions, that is,
$$\sup_{g \in M_b(S)} \left\{ \int_S g(x)\,\mu(dx) - \log \int_S \exp(g(x))\,\nu(dx) \right\} = \sup_{g \in C_b(S)} \left\{ \int_S g(x)\,\mu(dx) - \log \int_S \exp(g(x))\,\nu(dx) \right\};$$
see the proof of formula (C.1) in Dupuis and Ellis [1997].
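On a finite state space, all the integrals above reduce to finite sums, so the identities in Lemmas 4.3 through 4.5 can be sanity-checked numerically. The following sketch is not from the text; the helper names (`rel_entropy`, `J`) and the concrete probability vectors are our own choices. It verifies the variational formula of Lemma 4.4 together with its minimizer $\mu^*$, the Donsker-Varadhan formula of Lemma 4.5 with the maximizer $g = \log(d\mu/d\nu)$, and the product-measure case of the chain rule of Lemma 4.3.

```python
import math
import random

def rel_entropy(mu, nu):
    """R(mu || nu) = sum_i mu_i * log(mu_i / nu_i); +infinity if mu is not
    absolutely continuous with respect to nu (some mu_i > 0 where nu_i = 0)."""
    r = 0.0
    for m, n in zip(mu, nu):
        if m > 0.0:
            if n == 0.0:
                return math.inf
            r += m * math.log(m / n)
    return r

nu = [0.5, 0.3, 0.2]   # reference measure on a three-point space
g = [1.0, -0.5, 2.0]   # a bounded "test function"

# Lemma 4.4: -log int e^{-g} dnu = inf_mu { R(mu||nu) + int g dmu },
# attained at mu* with density dmu*/dnu proportional to e^{-g}.
z = sum(math.exp(-gi) * ni for gi, ni in zip(g, nu))
mu_star = [math.exp(-gi) * ni / z for gi, ni in zip(g, nu)]
lhs = -math.log(z)
val_at_star = rel_entropy(mu_star, nu) + sum(gi * mi for gi, mi in zip(g, mu_star))
assert abs(lhs - val_at_star) < 1e-12

random.seed(0)
for _ in range(1000):  # no randomly drawn mu beats the minimizer mu*
    w = [random.random() for _ in nu]
    mu = [wi / sum(w) for wi in w]
    assert rel_entropy(mu, nu) + sum(gi * mi for gi, mi in zip(g, mu)) >= lhs - 1e-12

# Lemma 4.5 (Donsker-Varadhan): R(mu||nu) = sup_g { int g dmu - log int e^g dnu };
# for mu << nu on a finite space the supremum is attained at g = log(dmu/dnu).
mu = [0.2, 0.5, 0.3]

def J(g):
    return (sum(gi * mi for gi, mi in zip(g, mu))
            - math.log(sum(math.exp(gi) * ni for gi, ni in zip(g, nu))))

g_opt = [math.log(m / n) for m, n in zip(mu, nu)]
assert abs(J(g_opt) - rel_entropy(mu, nu)) < 1e-12
for _ in range(1000):  # no randomly drawn g beats g_opt
    g_rand = [random.uniform(-5.0, 5.0) for _ in nu]
    assert J(g_rand) <= rel_entropy(mu, nu) + 1e-12

# Lemma 4.3, product-measure case: R(a1 x a2 || b1 x b2) = R(a1||b1) + R(a2||b2).
a1, a2 = [0.3, 0.7], [0.6, 0.4]
b1, b2 = [0.5, 0.5], [0.25, 0.75]
prod_a = [x * y for x in a1 for y in a2]
prod_b = [x * y for x in b1 for y in b2]
assert abs(rel_entropy(prod_a, prod_b)
           - (rel_entropy(a1, b1) + rel_entropy(a2, b2))) < 1e-12
```

On a finite space with $\mu \ll \nu$ the Donsker-Varadhan supremum over $M_b(S)$ reduces to a supremum over vectors in $\mathbb{R}^3$ and is attained, which is why the random-search checks above can only confirm inequalities while the explicit $\mu^*$ and $g_{\mathrm{opt}}$ witness equality.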

Remark 4.2. Lemmata 4.4 and 4.5 imply a relationship of convex duality between Laplace functionals and relative entropy. Let $\nu \in P(S)$. Then
$$R(\mu\,\|\,\nu) = \sup_{g \in M_b(S)} \left\{ \int_S g\,d\mu - \log \int_S e^{g}\,d\nu \right\}, \qquad \mu \in P(S),$$
$$\log \int_S e^{g}\,d\nu = \sup_{\mu \in P(S)} \left\{ \int_S g\,d\mu - R(\mu\,\|\,\nu) \right\}, \qquad g \in M_b(S),$$
that is, the functions $\mu \mapsto R(\mu\,\|\,\nu)$ and $g \mapsto \log \int_S e^{g}\,d\nu$ are convex conjugates.

References

P. Billingsley. Convergence of Probability Measures. Wiley Series in Probability and Statistics. John Wiley & Sons, New York, 1968.

P. Billingsley. Convergence of Probability Measures. Wiley Series in Probability and Statistics. John Wiley & Sons, New York, 2nd edition, 1999.

A. Dembo and O. Zeitouni. Large Deviations Techniques and Applications, volume 38 of Applications of Mathematics. Springer, New York, 2nd edition, 1998.

F. den Hollander. Large Deviations, volume 14 of Fields Institute Monographs. American Mathematical Society, Providence, RI, 2000.

J.-D. Deuschel and D. W. Stroock. Large Deviations. Academic Press, Boston, 1989.

R. Dudley. Real Analysis and Probability. Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2002.

P. Dupuis and R. S. Ellis. A Weak Convergence Approach to the Theory of Large Deviations. Wiley Series in Probability and Statistics. John Wiley & Sons, New York, 1997.

R. S. Ellis. Entropy, Large Deviations and Statistical Mechanics, volume 271 of Grundlehren der mathematischen Wissenschaften. Springer, New York, 1985.

S. N. Ethier and T. G. Kurtz. Markov Processes: Characterization and Convergence. Wiley Series in Probability and Statistics. John Wiley & Sons, New York, 1986.

O. Kallenberg. Foundations of Modern Probability. Probability and Its Applications. Springer, New York, 2nd edition, 2002.

S. Kullback. Information Theory and Statistics. John Wiley and Sons, Inc., New York, 1959.

S. Kullback and R. A. Leibler. On information and sufficiency. Ann. Math. Statistics, 22:79–86, 1951.

S. R. S. Varadhan. Asymptotic probabilities and differential equations. Comm. Pure Appl. Math., 19:261–286, 1966.


More information

7 The Radon-Nikodym-Theorem

7 The Radon-Nikodym-Theorem 32 CHPTER II. MESURE ND INTEGRL 7 The Radon-Nikodym-Theorem Given: a measure space (Ω,, µ). Put Z + = Z + (Ω, ). Definition 1. For f (quasi-)µ-integrable and, the integral of f over is (Note: 1 f f.) Theorem

More information

Large Deviations, Linear Statistics, and Scaling Limits for Mahler Ensemble of Complex Random Polynomials

Large Deviations, Linear Statistics, and Scaling Limits for Mahler Ensemble of Complex Random Polynomials Large Deviations, Linear Statistics, and Scaling Limits for Mahler Ensemble of Complex Random Polynomials Maxim L. Yattselev joint work with Christopher D. Sinclair International Conference on Approximation

More information

Local semiconvexity of Kantorovich potentials on non-compact manifolds

Local semiconvexity of Kantorovich potentials on non-compact manifolds Local semiconvexity of Kantorovich potentials on non-compact manifolds Alessio Figalli, Nicola Gigli Abstract We prove that any Kantorovich potential for the cost function c = d / on a Riemannian manifold

More information

Chapter 4. Measure Theory. 1. Measure Spaces

Chapter 4. Measure Theory. 1. Measure Spaces Chapter 4. Measure Theory 1. Measure Spaces Let X be a nonempty set. A collection S of subsets of X is said to be an algebra on X if S has the following properties: 1. X S; 2. if A S, then A c S; 3. if

More information

ELEMENTS OF PROBABILITY THEORY

ELEMENTS OF PROBABILITY THEORY ELEMENTS OF PROBABILITY THEORY Elements of Probability Theory A collection of subsets of a set Ω is called a σ algebra if it contains Ω and is closed under the operations of taking complements and countable

More information

1 Stochastic Dynamic Programming

1 Stochastic Dynamic Programming 1 Stochastic Dynamic Programming Formally, a stochastic dynamic program has the same components as a deterministic one; the only modification is to the state transition equation. When events in the future

More information

Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping.

Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping. Minimization Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping. 1 Minimization A Topological Result. Let S be a topological

More information

ψ(t) := log E[e tx ] (1) P X j x e Nψ (x) j=1

ψ(t) := log E[e tx ] (1) P X j x e Nψ (x) j=1 Seminars 15/01/018 Ex 1. Exponential estimate Let X be a real valued random variable, and suppose that there exists r > 0 such that E[e rx ]

More information

Analysis of Probabilistic Systems

Analysis of Probabilistic Systems Analysis of Probabilistic Systems Bootcamp Lecture 2: Measure and Integration Prakash Panangaden 1 1 School of Computer Science McGill University Fall 2016, Simons Institute Panangaden (McGill) Analysis

More information

1. Aufgabenblatt zur Vorlesung Probability Theory

1. Aufgabenblatt zur Vorlesung Probability Theory 24.10.17 1. Aufgabenblatt zur Vorlesung By (Ω, A, P ) we always enote the unerlying probability space, unless state otherwise. 1. Let r > 0, an efine f(x) = 1 [0, [ (x) exp( r x), x R. a) Show that p f

More information

Continuous Functions on Metric Spaces

Continuous Functions on Metric Spaces Continuous Functions on Metric Spaces Math 201A, Fall 2016 1 Continuous functions Definition 1. Let (X, d X ) and (Y, d Y ) be metric spaces. A function f : X Y is continuous at a X if for every ɛ > 0

More information