Weak convergence in Probability Theory: A summer excursion! Day 3
Slide 1 (BCAM, June)

Weak convergence in Probability Theory: A summer excursion! Day 3
Armand M. Makowski
ECE & ISR/HyNet, University of Maryland at College Park
armand@isr.umd.edu
Slide 2

Day 1: Basic definitions of convergence for random variables are reviewed, together with criteria and counter-examples.
Day 2: Skorokhod's theorem and coupling; examples from queueing theory, the theory of Markov chains and time series analysis.
Day 3: Poisson convergence: the Stein-Chen method, with applications to problems in the theory of random graphs.
Day 4: Weak convergence in function spaces; Prohorov's theorem and sequential compactness.
Day 5: An illustration: from random walks to Brownian motion.
Slide 3

A LITTLE DETOUR WITH N-VALUED RVS
Slide 4: Probability mass functions

Consider a rv X defined on the probability triple (Ω, F, P). It is said to be an N-valued rv if P[X ∈ N] = 1. Its probability mass function (pmf) p_X = (p_X(x), x = 0, 1, ...) is given by

    p_X(x) = P[X = x], x = 0, 1, ...
Slide 5

Since

    F_X(x) = Σ_{k ∈ N: k ≤ x} p_X(k), x ∈ R,

the pmf p_X determines the distribution function F_X (and conversely). More generally, any [0,1]-valued sequence p = (p(x), x = 0, 1, ...) is called a pmf on N if

    Σ_{x ∈ N} p(x) = 1.

Any pmf sequence uniquely defines a probability distribution function F: R → [0,1] through

    F(x) = Σ_{k ∈ N: k ≤ x} p(k), x ∈ R.
Slide 6

Fact: The N-valued rvs {X_n, n = 1, 2, ...} converge weakly to the N-valued rv X if and only if

    lim_{n→∞} p_{X_n}(x) = p_X(x), x = 0, 1, ...

We also write p_{X_n} ⇒_n p_X.

A natural idea: Look for the limiting behavior of the pmfs, namely lim_{n→∞} p_{X_n}(x) for each x ∈ N.

Beware: While this limit may exist, it is not always a pmf! Take X_n =_st n + Poi(1): every pointwise limit exists and equals 0.
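The escape of mass in this warning can be checked numerically. A minimal sketch (the helper name poisson_pmf is mine, not from the slides): the pmf of X_n = n + Poi(1) at a fixed x is the Poi(1) pmf at x − n, which is 0 as soon as n exceeds x.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    # Poi(lam) pmf, extended by 0 to negative integers
    return exp(-lam) * lam**k / factorial(k) if k >= 0 else 0.0

# X_n = n + Poi(1): its pmf at x is the Poi(1) pmf at x - n.
# For every fixed x the value is 0 once n > x, so every pointwise
# limit is 0 and the limiting sequence is not a pmf (no tightness).
for n in [1, 5, 10]:
    print(n, [round(poisson_pmf(x - n, 1.0), 6) for x in range(4)])
```

Each row shows (p_{X_n}(0), ..., p_{X_n}(3)); all entries vanish as n grows.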
Slide 7

Theorem 1: Consider a collection of N-valued rvs {X_n, n = 1, 2, ...} such that the limits

    p(x) = lim_{n→∞} p_{X_n}(x), x ∈ N,

all exist. Then the sequence p = (p(x), x ∈ N) is a pmf on N if and only if the collection {X_n, n = 1, 2, ...} is tight, in which case X_n ⇒_n X, where X is an N-valued rv distributed according to the pmf p.

Proof (necessity): If p is a pmf with p(k) = lim_{n→∞} p_{X_n}(k) for each k = 0, 1, ..., then

    lim_{n→∞} F_{X_n}(x) = F_X(x)

for all non-integer x in R. Thus F_{X_n} ⇒_n F_X, and the collection {X_n, n = 1, 2, ...} is tight.
Slide 8

Proof (sufficiency): Assume the collection {X_n, n = 1, 2, ...} to be tight. For every n = 1, 2, ...,

    1 = Σ_{x ∈ N} P[X_n = x] = Σ_{x=0}^{N} P[X_n = x] + P[X_n > N]

for arbitrary N ≥ 1. Thus,

    1 = lim_{n→∞} ( Σ_{x=0}^{N} P[X_n = x] + P[X_n > N] ) ≤ Σ_{x=0}^{N} p(x) + limsup_{n→∞} P[X_n > N].

By tightness, for every ε > 0 there exists N(ε) such that

    sup_{n=1,2,...} P[X_n > N(ε)] ≤ ε,
Slide 9

in which case

    1 ≤ Σ_{x=0}^{N(ε)} p(x) + ε.

Take ε = r^{-1} with r = 1, 2, ..., so that

    1 ≤ Σ_{x=0}^{N(r^{-1})} p(x) + r^{-1}.

The integers {N(r^{-1}), r = 1, 2, ...} can be selected to be strictly increasing, hence lim_{r→∞} N(r^{-1}) = ∞. Thus 1 ≤ Σ_{x ∈ N} p(x), and since the reverse inequality always holds,

    Σ_{x ∈ N} p(x) = 1,

and the conclusion follows.
Slide 10

POISSON CONVERGENCE
Slide 11: (Classical) Poisson convergence

For each p in [0, 1], let {B_n(p), n = 1, 2, ...} denote a collection of i.i.d. {0,1}-valued (Bernoulli) rvs with

    P[B_n(p) = 1] = 1 − P[B_n(p) = 0] = p, n = 1, 2, ...,

and define

    S_n(p) := B_1(p) + ... + B_n(p), n = 1, 2, ...,

so that S_n(p) =_st Bin(n; p).
Slide 12

Theorem 2: Consider a [0,1]-valued sequence {p_n, n = 1, 2, ...} with

    lim_{n→∞} n p_n = λ

for some λ > 0. Then it holds that

    S_n(p_n) ⇒_n Poi(λ),

where Poi(λ) denotes a Poisson rv with parameter λ. For n large, p_n ≈ λ/n and S_n(p_n) ≈_st Poi(n p_n).
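Theorem 2 can be illustrated numerically by computing the total variation distance between Bin(n, λ/n) and Poi(λ) for growing n. A sketch under the assumption that truncating both pmfs at K = 40 loses negligible mass for λ = 2 (helper names are mine):

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)   # comb(n, k) = 0 when k > n

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

lam = 2.0
K = 40   # both distributions carry negligible mass beyond K here
results = {}
for n in [10, 100, 1000]:
    p = lam / n
    results[n] = 0.5 * sum(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam))
                           for k in range(K))
    print(n, round(results[n], 6))
```

The distances shrink roughly like 1/n, consistent with the bound Σ p_i² = λ²/n derived later in the lecture.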
Slide 13: The Poisson paradigm

The setting: For each r = 1, 2, ..., let {B_{r,k}(p_{r,k}), k = 1, ..., k_r} denote a collection of {0,1}-valued rvs, which are not necessarily independent, and write

    S_r(p_{r,1}, ..., p_{r,k_r}) = B_{r,1}(p_{r,1}) + ... + B_{r,k_r}(p_{r,k_r}).
Slide 14

A typical result: With lim_{r→∞} k_r = ∞, if

    lim_{r→∞} ( max_{k=1,...,k_r} p_{r,k} ) = 0

and

    lim_{r→∞} (p_{r,1} + ... + p_{r,k_r}) = λ

for some λ > 0, then under additional conditions of vanishingly weak correlations,

    S_r(p_{r,1}, ..., p_{r,k_r}) ⇒_r Poi(λ).

Thus,

    E[S_r(p_{r,1}, ..., p_{r,k_r})] = p_{r,1} + ... + p_{r,k_r} ≈ λ

and

    S_r(p_{r,1}, ..., p_{r,k_r}) ≈_st Poi(λ).
Slide 15: Obvious ideas

Via pmfs:

    lim_{r→∞} P[S_r(p_{r,1}, ..., p_{r,k_r}) = x] = (λ^x / x!) e^{−λ}, x ∈ N.

Via pgfs:

    lim_{r→∞} E[ z^{S_r(p_{r,1}, ..., p_{r,k_r})} ] = e^{−λ(1−z)}, z ∈ R.
Slide 16

Via the method of moments: For each p = 0, 1, ...,

    lim_{r→∞} E[ S_r(p_{r,1}, ..., p_{r,k_r})^p ] = E[ Poi(λ)^p ].

Via the method of factorial moments: For each p = 0, 1, ...,

    lim_{r→∞} E[ Π_{l=0}^{p} ( S_r(p_{r,1}, ..., p_{r,k_r}) − l ) ] = λ^{p+1}.
Slide 17

It is not obvious how to carry this out under general conditions:
- Find new approaches
- Rates of convergence?
Slide 18

TOTAL VARIATION DISTANCE, COUPLING AND APPROXIMATIONS
Slide 19: Total variation between two pmfs

For pmfs µ and ν on N, with X ~ µ and Y ~ ν,

    d_TV(µ; ν) := (1/2) Σ_{x ∈ N} |µ(x) − ν(x)| = d_TV(X; Y).

This defines a distance on the space of all pmfs on N!
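The definition translates directly into code. A minimal sketch (function name and pmfs are mine) that also checks, on a small example, the sup-over-sets identity established on the next slides, using the optimal set A = {x : µ(x) ≥ ν(x)}:

```python
def d_tv(mu, nu):
    """Total variation distance between two finitely supported pmfs (dicts)."""
    support = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in support)

mu = {0: 0.5, 1: 0.3, 2: 0.2}    # illustrative pmfs
nu = {0: 0.25, 1: 0.25, 2: 0.5}
# the sup over sets A of |mu(A) - nu(A)| is attained at A = {x : mu(x) >= nu(x)}
A = [x for x in mu if mu.get(x, 0.0) >= nu.get(x, 0.0)]
print(d_tv(mu, nu), sum(mu[x] for x in A) - sum(nu[x] for x in A))
```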
Slide 20: Easy facts

For any subset A of N,

    |µ(A) − ν(A)| ≤ Σ_{x ∈ A} |µ(x) − ν(x)|,

so that

    (1/2) ( |µ(A) − ν(A)| + |µ(A^c) − ν(A^c)| ) ≤ d_TV(µ; ν),

where A^c denotes the complement of A in N. Since

    µ(A^c) − ν(A^c) = −( µ(A) − ν(A) ),

we conclude that

    |µ(A) − ν(A)| ≤ d_TV(µ; ν).
Slide 21

It follows that

    sup_{A ⊆ N} |µ(A) − ν(A)| ≤ d_TV(µ; ν).

From the discussion around the maximal coupling (see later) we see that equality is in fact achieved, i.e.,

    d_TV(µ; ν) = sup_{A ⊆ N} |µ(A) − ν(A)|.

This justifies the terminology "maximal coupling".
Slide 22: Weak convergence via total variation

Fact: For N-valued rvs {X, X_n, n = 1, 2, ...}, we have X_n ⇒_n X if and only if

    lim_{n→∞} d_TV(X_n; X) = 0.

If lim_{n→∞} d_TV(X_n; X) = 0, then

    lim_{n→∞} |P[X_n = x] − P[X = x]| = 0, x ∈ N.

Conversely, if X_n ⇒_n X, then for each n = 1, 2, ... we bound Σ_{x ∈ N} |P[X_n = x] − P[X = x]| as follows:
Slide 23

    Σ_{x ∈ N} |P[X_n = x] − P[X = x]|
        ≤ Σ_{x=0}^{N} |P[X_n = x] − P[X = x]| + Σ_{x=N+1}^{∞} ( P[X_n = x] + P[X = x] )
        = Σ_{x=0}^{N} |P[X_n = x] − P[X = x]| + P[X_n > N] + P[X > N]

for any integer N ≥ 1. By tightness, for every ε > 0 there exists N(ε) such that

    sup_{n=1,2,...} P[X_n > N(ε)] ≤ ε and P[X > N(ε)] ≤ ε,
Slide 24

whence

    2 d_TV(X_n; X) ≤ Σ_{x=0}^{N(ε)} |P[X_n = x] − P[X = x]| + 2ε.

Each term of the finite sum tends to 0 as n → ∞, therefore

    limsup_{n→∞} d_TV(X_n; X) ≤ ε,

and ε > 0 being arbitrary, lim_{n→∞} d_TV(X_n; X) = 0.
Slide 25: The coupling inequality

Lemma 1: For pmfs µ and ν on N, we have

    d_TV(µ; ν) ≤ P[X ≠ Y]

for any pair of N-valued rvs X and Y, with X ~ µ and Y ~ ν, defined on a common probability space (Ω, F, P).

A pair (X, Y) of N-valued rvs defined on the same probability space (Ω, F, P) is called a coupling for the pair of pmfs µ and ν if X ~ µ and Y ~ ν.
Slide 26

Indeed, for any coupling (X, Y), splitting P[X = x] = P[X = Y, X = x] + P[X ≠ Y, X = x] (and similarly for Y), the common term cancels and we have

    d_TV(µ; ν) = (1/2) Σ_{x ∈ N} |P[X = x] − P[Y = x]|
               = (1/2) Σ_{x ∈ N} |P[X ≠ Y, X = x] − P[X ≠ Y, Y = x]|
               ≤ (1/2) Σ_{x ∈ N} ( P[X ≠ Y, X = x] + P[X ≠ Y, Y = x] )
               = P[X ≠ Y].

Conclusion: d_TV(X; Y) is small if the coupling is selected so that P[X ≠ Y] is small.
Slides 27-28: Independent coupling

A coupling (X, Y) for the pair of pmfs µ and ν is an independent coupling if the rvs X and Y are independent, i.e.,

    P[X = x, Y = y] = P[X = x] P[Y = y], x, y ∈ N.

In that case,

    P[X ≠ Y] = Σ_{x ∈ N} P[X = x, Y ≠ x]
             = Σ_{x ∈ N} P[X = x] P[Y ≠ x]
             = Σ_{x ∈ N} P[X = x] ( 1 − P[Y = x] )
             = 1 − Σ_{x ∈ N} P[X = x] P[Y = x].
Slide 29: Maximal coupling (Dobrushin)

Theorem 3: For pmfs µ and ν on N, we have

    d_TV(µ; ν) = inf { P[X ≠ Y] : (X, Y) ∈ C(µ, ν) },

where C(µ, ν) denotes the collection of all couplings for the pair µ and ν.

Corollary 1: For pmfs µ and ν on N, there exists a coupling (X, Y) in C(µ, ν) such that

    d_TV(µ; ν) = P[X ≠ Y].

Such a coupling is called a maximal coupling for the pair µ and ν.
Slide 30

Here is one! Set

    γ(x) = min(ν(x), µ(x)), x ∈ N.

Define the pmf {η(x, y), (x, y) ∈ N²} on N² by

    η(x, y) = γ(x)                                        if x = y,
    η(x, y) = (ν(x) − γ(x)) (µ(y) − γ(y)) / d_TV(ν; µ)    if x ≠ y.

Take Ω = N², F = 2^Ω, and define P through the atoms, i.e.,

    P[(x, y)] = η(x, y), (x, y) ∈ N².
Slide 31

Need to check:

Marginal conditions:

    ν(x) = Σ_{y=0}^{∞} η(x, y), x ∈ N, and µ(y) = Σ_{x=0}^{∞} η(x, y), y ∈ N.

Achievability:

    P[X ≠ Y] = d_TV(µ; ν).
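These checks can be carried out numerically for finitely supported pmfs. A sketch (names and example pmfs are mine; it assumes µ ≠ ν so the division by d_TV(ν; µ) is legal) that builds η, then verifies the ν-marginal and the achievability condition:

```python
def d_tv(mu, nu):
    s = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in s)

def maximal_coupling(nu, mu):
    """Joint pmf eta with first marginal nu and second marginal mu (mu != nu)."""
    s = sorted(set(nu) | set(mu))
    d = d_tv(mu, nu)
    gamma = {x: min(nu.get(x, 0.0), mu.get(x, 0.0)) for x in s}
    eta = {}
    for x in s:
        for y in s:
            if x == y:
                eta[(x, y)] = gamma[x]   # the shared mass sits on the diagonal
            else:
                eta[(x, y)] = (nu.get(x, 0.0) - gamma[x]) * (mu.get(y, 0.0) - gamma[y]) / d
    return eta

nu = {0: 0.5, 1: 0.3, 2: 0.2}
mu = {0: 0.2, 1: 0.3, 2: 0.5}
eta = maximal_coupling(nu, mu)
marg_x = {x: sum(v for (a, _), v in eta.items() if a == x) for x in nu}  # should be nu
p_neq = sum(v for (a, b), v in eta.items() if a != b)                    # should be d_TV
print(marg_x, round(p_neq, 6), round(d_tv(mu, nu), 6))
```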
Slide 32

Key observation 1: With A = {x ∈ N : ν(x) ≥ µ(x)}, we have

    ν(A) − µ(A) = Σ_{x ∈ A} ( ν(x) − µ(x) ) = Σ_{x ∈ N} ( ν(x) − µ(x) )⁺.

Similarly,

    µ(A^c) − ν(A^c) = Σ_{x ∈ N} ( µ(x) − ν(x) )⁺.

Since ν(A) − µ(A) = µ(A^c) − ν(A^c), we get

    Σ_{x ∈ N} ( ν(x) − µ(x) )⁺ = Σ_{x ∈ N} ( µ(x) − ν(x) )⁺.
Slide 33

Key observation 2: The obvious relations

    |µ(x) − ν(x)| = ( µ(x) − ν(x) )⁺ + ( ν(x) − µ(x) )⁺, x ∈ N,

now give

    Σ_{x ∈ N} ( ν(x) − µ(x) )⁺ = Σ_{x ∈ N} ( µ(x) − ν(x) )⁺ = d_TV(µ; ν).

Key observation 3:

    Σ_{x ∈ N} ( ν(x) − γ(x) ) = Σ_{x ∈ N} ( ν(x) − µ(x) )⁺

and

    Σ_{x ∈ N} ( µ(x) − γ(x) ) = Σ_{x ∈ N} ( µ(x) − ν(x) )⁺.
Slide 34: Bernoulli rvs

Pick 0 < p < p' < 1. It is easy to verify that

    d_TV(B(p); B(p')) = p' − p.

The independent coupling is not maximal. The maximal coupling is achieved by taking

    B*(p) = 1[U ≤ p] and B*(p') = 1[U ≤ p']

with U uniform on (0, 1). Indeed,

    P[ 1[U ≤ p] ≠ 1[U ≤ p'] ] = P[p < U ≤ p'] = p' − p.
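The common-uniform coupling above is easy to simulate. A sketch (the parameter values are mine) drawing both indicators from one shared uniform and estimating the mismatch probability by Monte Carlo:

```python
import random

random.seed(0)
p, p_prime = 0.3, 0.7          # illustrative values with p < p'
n = 200_000
mismatch = 0
for _ in range(n):
    u = random.random()        # one shared uniform drives both indicators
    b = 1 if u <= p else 0            # B*(p)  = 1[U <= p]
    b_prime = 1 if u <= p_prime else 0  # B*(p') = 1[U <= p']
    mismatch += (b != b_prime)
print(mismatch / n)            # close to p' - p = 0.4
```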
Slide 35: Poisson rvs

Fact: For every λ > 0 and µ > 0, we have

    d_TV(Poi(λ); Poi(µ)) ≤ |λ − µ|.

Assume 0 < λ < µ. Pick X ~ Poi(λ) and Z ~ Poi(µ − λ) with X and Z independent. Thus Y = X + Z ~ Poi(µ). Obviously,

    P[X ≠ Y] = P[X ≠ X + Z] = P[Z ≠ 0] = 1 − e^{−(µ−λ)},

and use the fact that 1 − e^{−x} ≤ x for all x ≥ 0.
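The additive coupling bound can be compared with the exact distance. A sketch (λ = 2 and µ = 2.5 are chosen for illustration; the series is truncated at k = 60, which loses negligible mass):

```python
from math import exp, factorial

def poi_pmf(k, a):
    return exp(-a) * a**k / factorial(k)

lam, mu = 2.0, 2.5
dtv = 0.5 * sum(abs(poi_pmf(k, lam) - poi_pmf(k, mu)) for k in range(60))
bound = 1 - exp(-(mu - lam))   # coupling bound P[Z != 0]
print(round(dtv, 4), "<=", round(bound, 4), "<=", mu - lam)
```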
Slide 36: A useful fact via coupling

Proposition 1: For arbitrary pmfs µ_1, ..., µ_n, ν_1, ..., ν_n on N, it holds that

    d_TV(µ_1 ⋆ ... ⋆ µ_n ; ν_1 ⋆ ... ⋆ ν_n) ≤ Σ_{i=1}^{n} d_TV(µ_i; ν_i),

where ⋆ denotes convolution.
Slide 37: An equivalent form

Proposition 2: Consider mutually independent N-valued rvs X_1, ..., X_n defined on a common probability space with X_i ~ µ_i for all i = 1, ..., n. Similarly, consider mutually independent N-valued rvs Y_1, ..., Y_n defined on a common (possibly different) probability space with Y_i ~ ν_i for all i = 1, ..., n. Then it holds that

    d_TV(X_1 + ... + X_n ; Y_1 + ... + Y_n) ≤ Σ_{i=1}^{n} d_TV(X_i; Y_i).
Slide 38

For each i = 1, ..., n, consider any coupling (X_i, Y_i) in C(µ_i, ν_i) such that the N²-valued rvs (X_1, Y_1), ..., (X_n, Y_n) are mutually independent pairs defined on a common probability space. By construction,

    X_1 + ... + X_n ~ µ_1 ⋆ ... ⋆ µ_n and Y_1 + ... + Y_n ~ ν_1 ⋆ ... ⋆ ν_n,

since the rvs X_1, ..., X_n (resp. Y_1, ..., Y_n) are mutually independent.
Slide 39

By the coupling inequality,

    d_TV(µ_1 ⋆ ... ⋆ µ_n ; ν_1 ⋆ ... ⋆ ν_n) = d_TV(X_1 + ... + X_n ; Y_1 + ... + Y_n)
        ≤ P[X_1 + ... + X_n ≠ Y_1 + ... + Y_n]
        ≤ P[ ∪_{i=1}^{n} [X_i ≠ Y_i] ]
        ≤ Σ_{i=1}^{n} P[X_i ≠ Y_i].

Now use the maximal coupling for each i = 1, ..., n, so that d_TV(µ_i; ν_i) = P[X_i ≠ Y_i], and therefore

    d_TV(µ_1 ⋆ ... ⋆ µ_n ; ν_1 ⋆ ... ⋆ ν_n) ≤ Σ_{i=1}^{n} d_TV(µ_i; ν_i).
Slide 40: An easy Poisson approximation result

Consider a collection {B_k(p_k), k = 1, 2, ..., n} of mutually independent {0,1}-valued (Bernoulli) rvs with

    P[B_k(p_k) = 1] = 1 − P[B_k(p_k) = 0] = p_k, k = 1, ..., n,

and define S_n := B_1(p_1) + ... + B_n(p_n). Also write λ_n = p_1 + ... + p_n.
Slide 41

Question: How well is S_n approximated by a Poisson rv, say with parameter λ_n? In particular, what can we say about the distance d_TV(S_n; Poi(λ_n))?

Answer: With mutually independent Poisson rvs Poi(p_1), ..., Poi(p_n), whose sum is distributed as Poi(λ_n), we get

    d_TV(S_n; Poi(λ_n)) = d_TV(B_1(p_1) + ... + B_n(p_n) ; Poi(p_1) + ... + Poi(p_n))
                        ≤ Σ_{i=1}^{n} d_TV(B_i(p_i); Poi(p_i)),

and an approximation is just within grasp!
Slide 42: Computing d_TV(B(p); Poi(p)) (0 < p < 1)

The maximal coupling (B*(p), Poi*(p)) is given by

    P[B*(p) = x, Poi*(p) = y] = 1 − p                  if x = y = 0,
                              = e^{−p} − (1 − p)       if x = 1, y = 0,
                              = 0                      if x = 0, y = 1, 2, ...,
                              = (p^y / y!) e^{−p}      if x = 1, y = 1, 2, ...

See Dobrushin's theorem.
Slide 43

It is easy to see that

    P[B*(p) ≠ Poi*(p)] = ( e^{−p} − (1 − p) ) + Σ_{y=2}^{∞} (p^y / y!) e^{−p}
                       = ( e^{−p} − (1 − p) ) + ( 1 − e^{−p} − p e^{−p} )
                       = ( 1 − e^{−p} ) p.

Thus,

    d_TV(B(p); Poi(p)) = ( 1 − e^{−p} ) p ≤ p²

for all 0 < p < 1.
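The closed form can be confirmed against a direct computation of the total variation distance between B(p) and Poi(p). A sketch with p = 0.3 (my choice; truncation at 40 terms loses negligible mass):

```python
from math import exp, factorial

p = 0.3
K = 40                         # truncation; remaining Poisson mass is negligible
bern = {0: 1 - p, 1: p}
poi = {k: exp(-p) * p**k / factorial(k) for k in range(K)}
dtv = 0.5 * sum(abs(bern.get(k, 0.0) - poi[k]) for k in range(K))
closed_form = (1 - exp(-p)) * p
print(round(dtv, 10), round(closed_form, 10), "<=", p**2)
```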
Slide 44: A Poisson approximation is born!

Thus,

    d_TV(S_n; Poi(λ_n)) ≤ Σ_{i=1}^{n} d_TV(B_i(p_i); Poi(p_i)) ≤ Σ_{i=1}^{n} p_i².

Prohorov (1956) [homogeneous case], Le Cam (1960) [heterogeneous case].
Slide 45

THE STEIN-CHEN METHOD
Slide 46: Handling lack of independence!

Stein (1971) introduced a revolutionary method to build approximations to the distribution of sums of rvs as they appear in the CLT. Chen (1975) extended the method to the situation encountered in the Poisson paradigm.
Slide 47: The basic ideas

Fact 1: If Z =_st Poi(λ), then

    E[λ g(Z + 1) − Z g(Z)] = 0

for every bounded mapping g: N → R.

Fact 2: If Z =_st Poi(λ), then any mapping f: N → R satisfying E[f(Z)] = 0 can be put in the form

    f(x) = λ g(x + 1) − x g(x), x ∈ N,

for some bounded mapping g: N → R.
Slide 48

Fact 3: The N-valued rv Z is Poisson distributed with parameter λ if and only if

    E[λ g(Z + 1) − Z g(Z)] = 0                                        (1)

for every bounded mapping g: N → R.

Pick k = 0, 1, ... and take g_k(x) = δ(x; k), x ∈ N. Then condition (1) reads

    E[λ g_k(Z + 1) − Z g_k(Z)] = 0,

and this is equivalent to

    λ P[Z + 1 = k] = k P[Z = k].
Slide 49: The leap

Consider an N-valued rv W. If the distributional approximation W ≈_st Poi(λ) holds in some appropriate technical sense, then we should expect

    E[λ g(W + 1) − W g(W)] ≈ 0

for every bounded mapping g: N → R. Conversely, if the N-valued rv W has the property that

    E[λ g(W + 1) − W g(W)] ≈ 0

for every bounded mapping g: N → R, then it might be possible to tease out a formal statement for the approximation W ≈_st Poi(λ).
Slide 50: More formally

Step 1: For A ⊆ N, find the mapping g: N → R which solves the functional equation

    λ g(x + 1) − x g(x) = 1[x ∈ A] − Poi(λ; A), x ∈ N,

with g(0) = 0. Denote this solution by g_{λ,A}.

Why is this useful? Because for any N-valued rv W, we have

    E[λ g_{λ,A}(W + 1) − W g_{λ,A}(W)] = P[W ∈ A] − Poi(λ; A),

and bounds on the total variation distance between W and Poi(λ) are possibly within reach!
Slide 51

Step 2: The bounds

    ||g_{λ,A}|| = sup_{x ∈ N} |g_{λ,A}(x)| ≤ min(1, λ^{−1/2})

and

    ||Δg_{λ,A}|| = sup_{x ∈ N} |g_{λ,A}(x + 1) − g_{λ,A}(x)| ≤ λ^{−1} ( 1 − e^{−λ} )

both hold uniformly in A!

Question: How do we use these facts?
Slide 52: Back to the independent case

Consider a collection {B_k(p_k), k = 1, 2, ..., n} of mutually independent {0,1}-valued (Bernoulli) rvs with

    P[B_k(p_k) = 1] = 1 − P[B_k(p_k) = 0] = p_k, k = 1, ..., n,

and define W = B_1(p_1) + ... + B_n(p_n). Also, for each k = 1, ..., n, write

    W_k = W − B_k(p_k) = Σ_{l=1, l≠k}^{n} B_l(p_l).

Set λ = p_1 + ... + p_n.
Slide 53

Consider a bounded mapping g: N → R. First,

    E[W g(W)] = E[( B_1(p_1) + ... + B_n(p_n) ) g(W)]
              = Σ_{k=1}^{n} E[B_k(p_k) g(W)]
              = Σ_{k=1}^{n} E[B_k(p_k) g(W_k + 1)]
              = Σ_{k=1}^{n} E[B_k(p_k)] E[g(W_k + 1)]
              = Σ_{k=1}^{n} p_k E[g(W_k + 1)],

where the third equality uses W = W_k + 1 on the event [B_k(p_k) = 1], and the fourth the independence of the rvs B_k(p_k) and W_k.
Slide 54

On the other hand,

    E[λ g(W + 1)] = Σ_{k=1}^{n} p_k E[g(W + 1)].

Consequently,

    E[λ g(W + 1) − W g(W)] = λ E[g(W + 1)] − E[W g(W)]
                           = Σ_{k=1}^{n} p_k E[g(W + 1) − g(W_k + 1)]
                           = Σ_{k=1}^{n} p_k E[B_k(p_k) ( g(W_k + 2) − g(W_k + 1) )],

since W + 1 = W_k + 1 + B_k(p_k).
Slide 55

Without further invoking independence, we find two bounds:

    |E[λ g(W + 1) − W g(W)]| ≤ Σ_{k=1}^{n} p_k E[B_k(p_k) |g(W_k + 2) − g(W_k + 1)|]
                             ≤ ||Δg|| Σ_{k=1}^{n} p_k E[B_k(p_k)]
                             = ||Δg|| Σ_{k=1}^{n} p_k².

Also,

    |E[λ g(W + 1) − W g(W)]| ≤ 2 ||g|| Σ_{k=1}^{n} p_k².
Slide 56

Now pick A ⊆ N and apply this last bound with g = g_{λ,A}: We get

    |P[W ∈ A] − Poi(λ; A)| = |E[λ g_{λ,A}(W + 1) − W g_{λ,A}(W)]|
                           ≤ min( ||Δg_{λ,A}||, 2 ||g_{λ,A}|| ) Σ_{k=1}^{n} p_k².

Conclusion:

    d_TV(W; Poi(λ)) ≤ C(λ) Σ_{k=1}^{n} p_k², with C(λ) = λ^{−1} ( 1 − e^{−λ} ).
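The conclusion can be checked numerically: build the exact pmf of W by convolution, then compare the exact distance with both the Stein-Chen bound C(λ) Σ p_k² and the cruder Le Cam bound Σ p_k². A sketch (the p_k values are mine):

```python
from math import exp, factorial

ps = [0.1, 0.2, 0.05, 0.15, 0.3]   # hypothetical success probabilities
lam = sum(ps)

# exact pmf of W = B_1(p_1) + ... + B_n(p_n) by convolution
pmf = [1.0]
for p in ps:
    new = [0.0] * (len(pmf) + 1)
    for k, q in enumerate(pmf):
        new[k] += q * (1 - p)      # B_k = 0
        new[k + 1] += q * p        # B_k = 1
    pmf = new

K = 40                             # Poi(lam) mass beyond K is negligible
poi = [exp(-lam) * lam**k / factorial(k) for k in range(K)]
pmf += [0.0] * (K - len(pmf))
dtv = 0.5 * sum(abs(pmf[k] - poi[k]) for k in range(K))

lecam = sum(p**2 for p in ps)                  # Le Cam bound
stein = (1 - exp(-lam)) / lam * lecam          # Stein-Chen bound C(lam) * sum p_k^2
print(round(dtv, 5), "<=", round(stein, 5), "<=", round(lecam, 5))
```

Since C(λ) < 1 always, the Stein-Chen bound strictly improves on the Le Cam bound.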
Slide 57: And the dependent case?

For each k = 1, 2, ..., n, we have

    E[B_k(p_k) g(W)] = E[B_k(p_k) g(W_k + 1)]
                     = E[B_k(p_k) E[g(W_k + 1) | B_k(p_k)]]
                     = p_k E[g(W_k + 1) | B_k(p_k) = 1]
                     = p_k E[g(W) | B_k(p_k) = 1].

Consequently,

    E[λ g(W + 1) − W g(W)] = Σ_{k=1}^{n} p_k ( E[g(W + 1)] − E[g(W) | B_k(p_k) = 1] ).
Slide 58: What if?

Coupling assumption: Assume that for each k = 1, ..., n, there exist a probability triple (Ω_k, F_k, P_k) and N-valued rvs V_k and U_k, both defined on it, such that

    U_k =_st W and V_k + 1 =_st [W | B_k(p_k) = 1].

Therefore,

    | E[g(W + 1)] − E[g(W) | B_k(p_k) = 1] | = | E[g(U_k + 1) − g(V_k + 1)] |
                                             ≤ ||Δg|| E[|U_k − V_k|].
Slide 59

It is now plain that

    |E[λ g(W + 1) − W g(W)]| ≤ Σ_{k=1}^{n} p_k | E[g(U_k + 1) − g(V_k + 1)] |
                             ≤ ||Δg|| Σ_{k=1}^{n} p_k E[|U_k − V_k|].
Slide 60

Now pick A ⊆ N and apply this last bound with g = g_{λ,A}: We get

    |P[W ∈ A] − Poi(λ; A)| = |E[λ g_{λ,A}(W + 1) − W g_{λ,A}(W)]|
                           ≤ ||Δg_{λ,A}|| Σ_{k=1}^{n} p_k E[|U_k − V_k|].

Conclusion:

    d_TV(W; Poi(λ)) ≤ C(λ) Σ_{k=1}^{n} p_k E[|U_k − V_k|].
Slide 61: Strategy

A good approximation can be obtained if we are able to construct couplings (U_1, V_1), ..., (U_n, V_n) with

    U_k =_st W and V_k + 1 =_st [W | B_k(p_k) = 1], k = 1, ..., n,

such that each E[|U_k − V_k|] is small, since then

    d_TV(W; Poi(λ)) ≤ C(λ) Σ_{k=1}^{n} p_k E[|U_k − V_k|].

Many strategies have been developed to do just that!
Slide 62: Sometimes two moments suffice

An interesting situation occurs when the construction is such that

    V_k ≤ U_k, k = 1, 2, ..., n.

Then

    Σ_{k=1}^{n} p_k E[|U_k − V_k|] = Σ_{k=1}^{n} p_k E[U_k − V_k]
                                  = Σ_{k=1}^{n} p_k E[U_k] − Σ_{k=1}^{n} p_k E[V_k]
Slide 63

    = Σ_{k=1}^{n} p_k E[W] − Σ_{k=1}^{n} p_k ( E[W | B_k(p_k) = 1] − 1 )
    = E[W]² − Σ_{k=1}^{n} E[B_k(p_k) W] + λ
    = λ + E[W]² − E[W²]
    = λ − Var[W],

using Σ_{k=1}^{n} p_k = E[W] = λ, p_k E[W | B_k(p_k) = 1] = E[B_k(p_k) W] and Σ_{k=1}^{n} B_k(p_k) = W. Hence d_TV(W; Poi(λ)) ≤ C(λ) ( λ − Var[W] ): only the first two moments of W are needed.
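As a sanity check, in the fully independent case Var[W] = Σ p_k (1 − p_k), so λ − Var[W] collapses to Σ p_k², recovering the earlier Stein-Chen bound for independent Bernoullis. A few lines confirming the algebra (the p_k values are mine):

```python
ps = [0.1, 0.25, 0.05, 0.4]          # hypothetical success probabilities
lam = sum(ps)                        # E[W] for independent Bernoulli summands
var_w = sum(p * (1 - p) for p in ps)
print(lam - var_w, sum(p**2 for p in ps))   # the two quantities coincide
```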
Chapter 4 Signed Measures Up until now our measures have always assumed values that were greater than or equal to 0. In this chapter we will extend our definition to allow for both positive negative values.
More informationChapter 8. General Countably Additive Set Functions. 8.1 Hahn Decomposition Theorem
Chapter 8 General Countably dditive Set Functions In Theorem 5.2.2 the reader saw that if f : X R is integrable on the measure space (X,, µ) then we can define a countably additive set function ν on by
More informationChapter 2. Markov Chains. Introduction
Chapter 2 Markov Chains Introduction A Markov chain is a sequence of random variables {X n ; n = 0, 1, 2,...}, defined on some probability space (Ω, F, IP), taking its values in a set E which could be
More information7 Convergence in R d and in Metric Spaces
STA 711: Probability & Measure Theory Robert L. Wolpert 7 Convergence in R d and in Metric Spaces A sequence of elements a n of R d converges to a limit a if and only if, for each ǫ > 0, the sequence a
More informationA note on adiabatic theorem for Markov chains and adiabatic quantum computation. Yevgeniy Kovchegov Oregon State University
A note on adiabatic theorem for Markov chains and adiabatic quantum computation Yevgeniy Kovchegov Oregon State University Introduction Max Born and Vladimir Fock in 1928: a physical system remains in
More informationP i [B k ] = lim. n=1 p(n) ii <. n=1. V i :=
2.7. Recurrence and transience Consider a Markov chain {X n : n N 0 } on state space E with transition matrix P. Definition 2.7.1. A state i E is called recurrent if P i [X n = i for infinitely many n]
More informationLaplace s Equation. Chapter Mean Value Formulas
Chapter 1 Laplace s Equation Let be an open set in R n. A function u C 2 () is called harmonic in if it satisfies Laplace s equation n (1.1) u := D ii u = 0 in. i=1 A function u C 2 () is called subharmonic
More informationWhy study probability? Set theory. ECE 6010 Lecture 1 Introduction; Review of Random Variables
ECE 6010 Lecture 1 Introduction; Review of Random Variables Readings from G&S: Chapter 1. Section 2.1, Section 2.3, Section 2.4, Section 3.1, Section 3.2, Section 3.5, Section 4.1, Section 4.2, Section
More informationEmpirical Processes: General Weak Convergence Theory
Empirical Processes: General Weak Convergence Theory Moulinath Banerjee May 18, 2010 1 Extended Weak Convergence The lack of measurability of the empirical process with respect to the sigma-field generated
More informationUseful Probability Theorems
Useful Probability Theorems Shiu-Tang Li Finished: March 23, 2013 Last updated: November 2, 2013 1 Convergence in distribution Theorem 1.1. TFAE: (i) µ n µ, µ n, µ are probability measures. (ii) F n (x)
More informationSTAT 7032 Probability Spring Wlodek Bryc
STAT 7032 Probability Spring 2018 Wlodek Bryc Created: Friday, Jan 2, 2014 Revised for Spring 2018 Printed: January 9, 2018 File: Grad-Prob-2018.TEX Department of Mathematical Sciences, University of Cincinnati,
More informationDedicated to Michel Théra in honor of his 70th birthday
VARIATIONAL GEOMETRIC APPROACH TO GENERALIZED DIFFERENTIAL AND CONJUGATE CALCULI IN CONVEX ANALYSIS B. S. MORDUKHOVICH 1, N. M. NAM 2, R. B. RECTOR 3 and T. TRAN 4. Dedicated to Michel Théra in honor of
More informationConstraint qualifications for convex inequality systems with applications in constrained optimization
Constraint qualifications for convex inequality systems with applications in constrained optimization Chong Li, K. F. Ng and T. K. Pong Abstract. For an inequality system defined by an infinite family
More informationBanach Spaces II: Elementary Banach Space Theory
BS II c Gabriel Nagy Banach Spaces II: Elementary Banach Space Theory Notes from the Functional Analysis Course (Fall 07 - Spring 08) In this section we introduce Banach spaces and examine some of their
More informationConvergence Rate of Markov Chains
Convergence Rate of Markov Chains Will Perkins April 16, 2013 Convergence Last class we saw that if X n is an irreducible, aperiodic, positive recurrent Markov chain, then there exists a stationary distribution
More informationn [ F (b j ) F (a j ) ], n j=1(a j, b j ] E (4.1)
1.4. CONSTRUCTION OF LEBESGUE-STIELTJES MEASURES In this section we shall put to use the Carathéodory-Hahn theory, in order to construct measures with certain desirable properties first on the real line
More informationTHEOREMS, ETC., FOR MATH 515
THEOREMS, ETC., FOR MATH 515 Proposition 1 (=comment on page 17). If A is an algebra, then any finite union or finite intersection of sets in A is also in A. Proposition 2 (=Proposition 1.1). For every
More informationLECTURE 13. Last time: Lecture outline
LECTURE 13 Last time: Strong coding theorem Revisiting channel and codes Bound on probability of error Error exponent Lecture outline Fano s Lemma revisited Fano s inequality for codewords Converse to
More informationNormed and Banach Spaces
(August 30, 2005) Normed and Banach Spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ We have seen that many interesting spaces of functions have natural structures of Banach spaces:
More information1.1 Review of Probability Theory
1.1 Review of Probability Theory Angela Peace Biomathemtics II MATH 5355 Spring 2017 Lecture notes follow: Allen, Linda JS. An introduction to stochastic processes with applications to biology. CRC Press,
More informationJump Processes. Richard F. Bass
Jump Processes Richard F. Bass ii c Copyright 214 Richard F. Bass Contents 1 Poisson processes 1 1.1 Definitions............................. 1 1.2 Stopping times.......................... 3 1.3 Markov
More informationMAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9
MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended
More informationFUNDAMENTALS OF REAL ANALYSIS by. II.1. Prelude. Recall that the Riemann integral of a real-valued function f on an interval [a, b] is defined as
FUNDAMENTALS OF REAL ANALYSIS by Doğan Çömez II. MEASURES AND MEASURE SPACES II.1. Prelude Recall that the Riemann integral of a real-valued function f on an interval [a, b] is defined as b n f(xdx :=
More informationOptimization Theory. A Concise Introduction. Jiongmin Yong
October 11, 017 16:5 ws-book9x6 Book Title Optimization Theory 017-08-Lecture Notes page 1 1 Optimization Theory A Concise Introduction Jiongmin Yong Optimization Theory 017-08-Lecture Notes page Optimization
More informationDifferentiation of Measures and Functions
Chapter 6 Differentiation of Measures and Functions This chapter is concerned with the differentiation theory of Radon measures. In the first two sections we introduce the Radon measures and discuss two
More information1 Generating functions
1 Generating functions Even quite straightforward counting problems can lead to laborious and lengthy calculations. These are greatly simplified by using generating functions. 2 Definition 1.1. Given a
More informationChapter 7. Basic Probability Theory
Chapter 7. Basic Probability Theory I-Liang Chern October 20, 2016 1 / 49 What s kind of matrices satisfying RIP Random matrices with iid Gaussian entries iid Bernoulli entries (+/ 1) iid subgaussian entries
More information2.2 Some Consequences of the Completeness Axiom
60 CHAPTER 2. IMPORTANT PROPERTIES OF R 2.2 Some Consequences of the Completeness Axiom In this section, we use the fact that R is complete to establish some important results. First, we will prove that
More informationLecture 3: Probability Measures - 2
Lecture 3: Probability Measures - 2 1. Continuation of measures 1.1 Problem of continuation of a probability measure 1.2 Outer measure 1.3 Lebesgue outer measure 1.4 Lebesgue continuation of an elementary
More informationDivision of the Humanities and Social Sciences. Supergradients. KC Border Fall 2001 v ::15.45
Division of the Humanities and Social Sciences Supergradients KC Border Fall 2001 1 The supergradient of a concave function There is a useful way to characterize the concavity of differentiable functions.
More informationEcon Slides from Lecture 1
Econ 205 Sobel Econ 205 - Slides from Lecture 1 Joel Sobel August 23, 2010 Warning I can t start without assuming that something is common knowledge. You can find basic definitions of Sets and Set Operations
More informationLIMITS FOR QUEUES AS THE WAITING ROOM GROWS. Bell Communications Research AT&T Bell Laboratories Red Bank, NJ Murray Hill, NJ 07974
LIMITS FOR QUEUES AS THE WAITING ROOM GROWS by Daniel P. Heyman Ward Whitt Bell Communications Research AT&T Bell Laboratories Red Bank, NJ 07701 Murray Hill, NJ 07974 May 11, 1988 ABSTRACT We study the
More informationPreliminaries. Probability space
Preliminaries This section revises some parts of Core A Probability, which are essential for this course, and lists some other mathematical facts to be used (without proof) in the following. Probability
More informationLecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable
Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed
More informationFUZZY LIE IDEALS OVER A FUZZY FIELD. M. Akram. K.P. Shum. 1. Introduction
italian journal of pure and applied mathematics n. 27 2010 (281 292) 281 FUZZY LIE IDEALS OVER A FUZZY FIELD M. Akram Punjab University College of Information Technology University of the Punjab Old Campus,
More information