Draft. Advanced Probability Theory (Fall 2017). J. P. Kim, Dept. of Statistics. Last modified November 28, 2017.


Preface & Disclaimer

This note is a summary of the lecture Advanced Probability Theory A held at Seoul National University in Fall 2017. The lecturer was Minwoo Chae, and the note was written by J. P. Kim, a Ph.D. student. The textbooks and references for this course are the following:

- Weak Convergence and Empirical Processes: With Applications to Statistics, Van der Vaart & Wellner, Springer, 1996.
- Asymptotic Statistics, Van der Vaart, Cambridge University Press, 1998.

I also referred to the following books while writing this note; the list will be updated continuously:

- Convergence of Probability Measures, Billingsley, John Wiley & Sons, 2013.
- Lecture notes for Topics in Mathematics I, taught by Gerald Trutnau, Spring 2015.

Finally, some examples and motivation are complemented from my own lecture notes for Probability Theory I (Spring 2016) and Theory of Statistics II (Fall 2016), most of which are available online. To report typos or mistakes, please contact: joonpyokim@snu.ac.kr

Chapter 1: Stochastic Convergence

1.1 Motivation

Recall some basic results in asymptotics.

Theorem 1.1.1 (SLLN). Let $X_1, X_2, \dots, X_n$ be i.i.d. random variables with $E|X_1| < \infty$. Then
$$\frac{1}{n}\sum_{i=1}^n X_i \to EX_1 \quad \text{a.s.}$$

Theorem 1.1.2 (CLT). Let $X_1, X_2, \dots, X_n$ be i.i.d. random variables with $EX_1^2 < \infty$. Then
$$\sqrt{n}\Big(\frac{1}{n}\sum_{i=1}^n X_i - \mu\Big) \xrightarrow{d} N(0, \sigma^2),$$
where $\mu = EX_1$ and $\sigma^2 = EX_1^2 - \mu^2$.

From now on we will use the following notation. Let $(\Omega, \mathcal{A}, P)$ (or $(\Omega_i, \mathcal{A}_i, P_i)$) be an underlying probability space (or a sequence of them); $(\mathbb{D}, d)$ a metric space; $\mathcal{D} = \mathcal{B}(\mathbb{D})$ the Borel $\sigma$-algebra of $\mathbb{D}$; $C_b(\mathbb{D})$ the set of all bounded continuous real functions on $\mathbb{D}$; and $X$ ($X_n$, resp.) a map from $\Omega$ ($\Omega_i$, resp.) to $\mathbb{D}$, not necessarily measurable.

Remark 1.1.3. Note that the LLN and CLT also hold for the $f(X_i)$'s, i.e.,
$$\frac{1}{n}\sum_{i=1}^n f(X_i) \to Ef(X_1) \quad \text{$P$-a.s.}$$
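As a brief aside (not part of the original notes), Theorems 1.1.1 and 1.1.2 can be checked by simulation; the distribution, sample sizes, and seed below are arbitrary choices for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 2.0  # mean and sd of Exponential(scale=2)

# SLLN: the sample mean of n i.i.d. draws approaches mu = 2
n = 100_000
sample_mean = rng.exponential(scale=mu, size=n).mean()

# CLT: sqrt(n)*(mean - mu) is approximately N(0, sigma^2), so across many
# replications its empirical standard deviation should be close to sigma
n_clt, reps = 10_000, 400
z = np.sqrt(n_clt) * (rng.exponential(mu, size=(reps, n_clt)).mean(axis=1) - mu)
print(sample_mean, z.std())
```

The printed values should be close to $\mu = 2$ and $\sigma = 2$ respectively.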

4 and n d fx i EfX i N0, σ2 f holds for σ 2 f = varfx, provided that E[fX 2 ] <. Our question in this course is: For a class F of real functions. do LLN and CLT hold uniformly in some sense? For example, does it hold P-a.s. or in probability? For finite f,, f k, n sup n f F fx i EfX f X i Ef X,, n 0 f k X i Ef k X converges weakly to MVN. How the convergence of infinite dimensional joint net can be defined? n f X i Ef X For this we see more general notion of weak convergence here. Definition..4. Let P n, P be Borel probability measures on D, D, d. Then i P n converges weakly to P, denoted as P n P, iff D w fdp n D f F fdp f C b D. ii If X n and X are D-valued random variables with laws P n and P respectively, then X n converges weakly to X, denoted as X n X, iff P n w w P. For weak convergence of., we may use the definition..4. For this,. should be embedded into a metric space. Example..5. Let Ω n, A n, P n = [0, ], B, λ, and F = { [0,t] : 0 t } D[0, ],. where B = B[0, ] is a Borel σ-algebra on [0, ] and λ denotes the Lebesgue measure. Then. can be viewed as a D[0, ]-valued random variables. A natural metric on D[0, ] is the uniform metric 3

defined as
$$d(f_1, f_2) = \sup_{t \in [0,1]} |f_1(t) - f_2(t)|, \qquad f_1, f_2 \in D[0,1].$$
However, under this metric $D[0,1]$ is not separable, which makes the space too large to work with. Furthermore, under this metric, (1.1) may not even be measurable.

Proposition 1.1.6. The map $X : [0,1] \to D[0,1]$ defined by $X(\omega) = 1_{[\omega, 1]}$ is NOT Borel measurable with respect to the uniform metric.

Proof. Let $B_s$ be the open ball of radius $1/2$ in $D[0,1]$ centered at $1_{[s,1]}$. Then $G = \bigcup_{s \in S} B_s$ is an open set in $D[0,1]$ for any $S \subset [0,1]$. However, note that $X(\omega) \in B_s$ if and only if $\omega = s$, and hence $X^{-1}(G) = S$ holds. If $X$ were Borel measurable, then every subset $S$ of $[0,1]$ would be Borel measurable, a contradiction. (Figure 1.1 illustrates the proof.)

To handle this issue, we may consider some alternative approaches:

1. Consider a weaker $\sigma$-algebra, such as the ball $\sigma$-algebra, i.e., the $\sigma$-algebra generated by all open balls. If the space is separable, the ball $\sigma$-algebra coincides with the Borel $\sigma$-algebra. Note that with a smaller $\sigma$-algebra, the measurability condition becomes weaker.

2. Consider a weaker metric. This is one typical approach for dealing with empirical processes, using Skorokhod's metric. Under the Skorokhod metric, $D[0,1]$ becomes separable, and it is well known that there exists a metric equivalent to the Skorokhod metric making $D[0,1]$ also complete (Billingsley).

3. Drop the measurability requirement, i.e., extend the notion of weak convergence to non-measurable maps. We shall focus on this approach in this course.

1.2 Outer Integral

From now on, let $(\Omega, \mathcal{A}, P)$ be an underlying probability space. Also, let $T : \Omega \to \bar{\mathbb{R}} = [-\infty, \infty]$ be an arbitrary map, not necessarily measurable, and $B \subset \Omega$ an arbitrary set, not necessarily

measurable.

Definition 1.2.1.
(i) The outer integral of $T$ w.r.t. $P$ is defined as
$$E^* T := \inf\{ EU : U \ge T,\ U : \Omega \to \bar{\mathbb{R}} \text{ is measurable and } EU \text{ exists} \},$$
where "$EU$ exists" means $EU^+ < \infty$ or $EU^- < \infty$ (note that the expectation can be defined in every case except $\infty - \infty$).
(ii) The outer probability of $B$ is
$$P^*(B) = \inf\{ P(A) : A \supset B,\ A \in \mathcal{A} \}.$$
(iii) The inner integral of $T$ w.r.t. $P$ is defined as $E_* T = -E^*(-T)$.
(iv) The inner probability of $B$ is $P_*(B) = 1 - P^*(\Omega \setminus B)$.

Remark 1.2.2. Note that the definitions in (iii) and (iv) are equivalent to the following (by arguments similar to (i) and (ii)):
$$E_* T = \sup\{ EU : U \le T,\ U : \Omega \to \bar{\mathbb{R}} \text{ is measurable and } EU \text{ exists} \}$$
and
$$P_*(B) = \sup\{ P(A) : A \subset B,\ A \in \mathcal{A} \}.$$

It is well known that a measurable map attaining the infimum in (i) always exists, provided its expectation exists.

Lemma 1.2.3. For any map $T : \Omega \to \bar{\mathbb{R}}$, there exists a measurable map $T^* : \Omega \to \bar{\mathbb{R}}$ with
(i) $T^* \ge T$;
(ii) $T^* \le U$ $P$-a.s. for any measurable $U : \Omega \to \bar{\mathbb{R}}$ with $U \ge T$ $P$-a.s.
Furthermore, such a $T^*$ is unique up to $P$-null sets, and $E^* T = E T^*$ provided that $ET^*$ exists.

Definition 1.2.4. The function $T^*$ is called the minimal measurable majorant of $T$. Similarly, the maximal measurable minorant $T_*$ can be defined via $T_* = -(-T)^*$.
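As a small illustration (added here, not in the original notes), outer and inner probabilities can be computed by brute force on a toy finite space; the space and measure below are arbitrary choices for the sketch:

```python
from itertools import combinations

# Toy space Omega = {0, 1, 2} with the sigma-algebra generated by the
# partition {{0}, {1, 2}}, so the singleton {1} is NOT measurable.
atoms = [frozenset({0}), frozenset({1, 2})]
prob = {frozenset({0}): 0.3, frozenset({1, 2}): 0.7}

def sigma_algebra(atoms):
    # all unions of atoms (including the empty union)
    sets = []
    for r in range(len(atoms) + 1):
        for combo in combinations(atoms, r):
            sets.append(frozenset().union(*combo) if combo else frozenset())
    return sets

def P(A):
    return sum(prob[a] for a in atoms if a <= A)

def outer(B):
    # P*(B) = inf{P(A) : A measurable, A contains B}
    return min(P(A) for A in sigma_algebra(atoms) if B <= A)

def inner(B):
    # P_*(B) = sup{P(A) : A measurable, A inside B}
    return max(P(A) for A in sigma_algebra(atoms) if A <= B)

B = frozenset({1})
print(outer(B), inner(B))  # smallest cover {1,2} gives 0.7; only {} fits inside
```

The gap $P^*(B) = 0.7 > 0 = P_*(B)$ shows how non-measurable sets behave.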

There are several similarities between the outer integral and the ordinary one. Many concepts and propositions of probability theory can be extended to outer-probability statements. However, there are also several statements that do not hold in the outer-measure version. One example is Fubini's theorem; only the following one-sided version survives.

Lemma 1.2.5 (Fubini's theorem for outer integrals). Let $T$ be a real-valued function on the product space $(\Omega_1 \times \Omega_2, \mathcal{A}_1 \otimes \mathcal{A}_2, P_1 \times P_2)$. Then
$$E_* T \le E_{1*} E_{2*} T \le E_1^* E_2^* T \le E^* T,$$
where $E_2^*$ is defined as
$$E_2^* T(\omega_1) = \inf\{ E_2 U : U(\omega_2) \ge T(\omega_1, \omega_2)\ \forall \omega_2,\ U : \Omega_2 \to \bar{\mathbb{R}} \text{ is measurable and } E_2 U \text{ exists} \}$$
for $\omega_1 \in \Omega_1$, and similarly for the other iterated integrals.

Now we extend the notion of weak convergence to non-measurable maps.

1.3 Weak Convergence

Definition 1.3.1.
(i) A Borel probability measure $L$ on $\mathbb{D}$ is tight if for every $\epsilon > 0$ there exists a compact set $K$ with $L(K) \ge 1 - \epsilon$.
(ii) A Borel measurable map $X : \Omega \to \mathbb{D}$ is tight if the law of $X$, $L_X := P \circ X^{-1}$, is tight.
(iii) $L$ (or $X$) is separable if there exists a separable measurable set with probability 1, i.e., a separable measurable set $A \subset \mathbb{D}$ s.t. $L(A) = 1$ (or $P(X \in A) = 1$).

Lemma 1.3.2.
(i) If $L$ (or $X$) is tight, then $L$ (or $X$) is separable.
(ii) The converse is true if $\mathbb{D}$ is complete. That is, given that $\mathbb{D}$ is complete, separability of $L$ (or $X$) implies tightness.

Now we are ready to define weak convergence of arbitrary maps $X_n$.

Definition 1.3.3 (Weak Convergence). Let $(\Omega_n, \mathcal{A}_n, P_n)$ be a sequence of probability spaces and $X_n : \Omega_n \to \mathbb{D}$ arbitrary maps (possibly non-measurable). Then $X_n$ is said to converge weakly to a Borel measure $L$, denoted $X_n \xrightarrow{w} L$, if
$$E^* f(X_n) \to \int f\,dL \qquad \forall f \in C_b(\mathbb{D}).$$

Furthermore, if there is a Borel measurable map $X$ with law $L$, i.e., $L_X = L$, then we write $X_n \xrightarrow{w} X$.

Analogues of the classical results on weak convergence of measurable maps continue to hold.

Theorem 1.3.4 (Portmanteau). The following are equivalent:
(i) $X_n \xrightarrow{w} L$;
(ii) $\liminf_n P_*(X_n \in G) \ge L(G)$ for every open set $G$;
(iii) $\limsup_n P^*(X_n \in F) \le L(F)$ for every closed set $F$;
(iv) $\liminf_n E_* f(X_n) \ge \int f\,dL$ for every $f$ that is l.s.c. and bounded below;
(v) $\limsup_n E^* f(X_n) \le \int f\,dL$ for every $f$ that is u.s.c. and bounded above;
(vi) $\lim P^*(X_n \in B) = \lim P_*(X_n \in B) = L(B)$ for every $L$-continuity set $B$ (i.e., $L(\partial B) = 0$);
(vii) $\liminf_n E_* f(X_n) \ge \int f\,dL$ for every $f$ that is bounded, Lipschitz continuous, and nonnegative.

Recall that a function $f$ is lower semicontinuous (l.s.c.) if
$$\liminf_{x \to x_0} f(x) \ge f(x_0),$$
and upper semicontinuity is defined analogously.

Our first important result is the continuous mapping theorem.

Theorem 1.3.5 (Continuous mapping theorem). Let $(\mathbb{D}, d)$ and $(\mathbb{E}, e)$ be metric spaces and $g : \mathbb{D} \to \mathbb{E}$ be continuous at every point of a set $\mathbb{D}_0 \subset \mathbb{D}$. If $X_n \xrightarrow{w} X$ and $X$ takes its values in $\mathbb{D}_0$, then $g(X_n) \xrightarrow{w} g(X)$.

Next to the continuous mapping theorem, Prokhorov's theorem (or Helly's selection principle, in a special case) is the most important theorem on weak convergence. To formulate the result, two new concepts are needed.

Definition 1.3.6.
(i) $\{X_n\}$ is asymptotically measurable if
$$E^* f(X_n) - E_* f(X_n) \to 0 \qquad \forall f \in C_b(\mathbb{D}).$$
(ii) $\{X_n\}$ is asymptotically tight if
$$\forall \epsilon > 0\ \exists \text{ compact } K \text{ s.t. } \liminf_n P_*(X_n \in K^\delta) \ge 1 - \epsilon \quad \forall \delta > 0,$$
where $K^\delta := \{ y \in \mathbb{D} : d(y, K) < \delta \}$ is the $\delta$-enlargement of $K$.
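As a quick aside (added here, not in the original notes), the one-sided inequalities in the Portmanteau theorem can be seen in the simplest example $X_n = 1/n$, which converges weakly to the point mass at $0$:

```python
import numpy as np

# X_n = 1/n converges weakly to L = delta_0. For the open set G = (0, 2),
# P(X_n in G) = 1 for every n while L(G) = 0, so the liminf inequality in
# (ii) can be strict. For the closed set F = {0}, P(X_n in F) = 0 while
# L(F) = 1, matching the limsup inequality in (iii).
ns = np.arange(1, 101)
xn = 1.0 / ns

p_open = np.mean((xn > 0) & (xn < 2))   # fraction of n with X_n in (0, 2)
p_closed = np.mean(xn == 0.0)           # fraction of n with X_n in {0}
print(p_open, p_closed)
```

This is exactly why the open/closed-set characterizations use inequalities rather than equalities.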

Remark 1.3.7. A collection of Borel measurable maps $X_n$ is uniformly tight if
$$\forall \epsilon > 0\ \exists \text{ compact } K \text{ s.t. } \inf_n P(X_n \in K) \ge 1 - \epsilon.$$
An equivalent definition is obtained if $\inf$ in the last display is replaced by $\liminf$. The $\delta$ in the definition of asymptotic tightness may seem a bit overdone (it enlarges the set $K$), but nothing is gained in simple cases:

Proposition 1.3.8. If $\mathbb{D}$ is separable and complete, then uniform tightness and asymptotic tightness coincide for measurable maps.

The following result can be useful to verify asymptotic measurability or tightness.

Lemma 1.3.9.
(i) If $X_n \xrightarrow{w} X$, then $X_n$ is asymptotically measurable.
(ii) If $X_n \xrightarrow{w} X$, then $X_n$ is asymptotically tight if and only if $X$ is tight.

Now we are ready to state Prokhorov's theorem.

Theorem 1.3.10 (Prokhorov).
(i) If $\{X_n\}$ is asymptotically tight and asymptotically measurable, then $\{X_n\}$ is relatively compact, i.e., every subsequence $\{X_{n'}\}$ has a further subsequence $\{X_{n''}\}$ converging weakly to a tight Borel law.
(ii) A relatively compact collection $\{X_n\}$ of Borel measurable maps is asymptotically tight if $\mathbb{D}$ is a Polish space (i.e., separable and complete).

Remark 1.3.11. By the previous theorem, for Borel measures on a Polish space, the concepts relatively compact, asymptotically tight, and uniformly tight are all equivalent.

Our final extension is a Slutsky-type result:

Lemma 1.3.12. Let $X_n \xrightarrow{w} X$ and $Y_n \xrightarrow{w} c$, where $c$ is a constant and $X$ has a separable Borel law. Then $(X_n, Y_n) \xrightarrow{w} (X, c)$.

Corollary 1.3.13. Let $X_n$ and $X$ take values in a separable Banach space (a topological vector space, so that addition and scalar multiplication are defined and are continuous operators), and let $Y_n$ and $c$ be scalars. By the continuous mapping theorem we then get
$$X_n + Y_n \xrightarrow{w} X + c$$

and
$$X_n Y_n \xrightarrow{w} cX.$$
Furthermore, if $c \ne 0$, we also obtain $X_n / Y_n \xrightarrow{w} X / c$.

1.4 Spaces of Bounded Functions

Definition 1.4.1. Let $T$ be an arbitrary set. Then the space $\ell^\infty(T)$ is defined as
$$\ell^\infty(T) = \{ \text{all functions } f : T \to \mathbb{R} \text{ s.t. } \|f\|_\infty < \infty \},$$
where $\|f\|_\infty = \sup_{t \in T} |f(t)|$. It is well known that $\ell^\infty(T)$ is a Banach space.

Definition 1.4.2 (Stochastic Process). A collection $\{X(t) : t \in T\}$ of random variables defined on the same probability space $(\Omega, \mathcal{A}, P)$ is called a stochastic process.

Note that if every sample path $t \mapsto X(t, \omega)$ belongs to $\ell^\infty(T)$, i.e., every sample path is bounded, then $X$ can be viewed as a random map from $\Omega$ to $\ell^\infty(T)$. For an arbitrary map $X : \Omega \to \ell^\infty(T)$, it is natural to call a finite-dimensional projection $(X(t_1), X(t_2), \dots, X(t_k))$, for $t_1, t_2, \dots, t_k \in T$, a marginal. Our interest is to characterize weak convergence of a sequence of random maps $X_n$ through asymptotic tightness and marginals. Before starting, we introduce the following two lemmas, which will be used.

Lemma 1.4.3. Let $X_n : \Omega_n \to \ell^\infty(T)$ be asymptotically tight. Then $X_n$ is asymptotically measurable if and only if $X_n(t)$ is asymptotically measurable for every $t \in T$. This implies that every stochastic process is asymptotically measurable, since each marginal is a random variable and hence measurable.

Lemma 1.4.4. Let $X, Y$ be tight Borel measurable maps into $\ell^\infty(T)$. Then $L_X = L_Y$ if and only if all corresponding marginals are equal in law. That is, for tight measurable maps, the laws of the marginals determine the joint law.

Now we are ready to introduce our first result.

Theorem 1.4.5. Let $X_n : \Omega_n \to \ell^\infty(T)$, $n = 1, 2, \dots$, be arbitrary maps. Then $X_n$ converges weakly to a tight limit if and only if
(1) $X_n$ is asymptotically tight;
(2) every marginal converges weakly to a limit.

Proof. ($\Rightarrow$) (1) is trivial from Lemma 1.3.9. Next, note that, for any fixed $t_1, t_2, \dots, t_k \in T$, the projection
$$g : \ell^\infty(T) \to \mathbb{R}^k, \qquad z \mapsto (z(t_1), z(t_2), \dots, z(t_k))$$
is a continuous function on $\ell^\infty(T)$. Thus the continuous mapping theorem implies (2).

($\Leftarrow$) Let $t \in T$ be arbitrarily chosen. Then condition (2) implies that $X_n(t)$ is asymptotically measurable by Lemma 1.3.9. Since $t \in T$ was arbitrary, $X_n$ is asymptotically measurable by Lemma 1.4.3. Then by Prokhorov's theorem, every subsequence $\{n'\} \subset \{n\}$ has a further subsequence $\{n''\} \subset \{n'\}$ along which $X_{n''}$ converges weakly. If all such limits are equal, then $X_n$ converges weakly; this follows from convergence of every marginal (condition (2)) and Lemma 1.4.4. In detail, for any subsequence $\{n'\} \subset \{n\}$, there exist a further subsequence $\{n''\} \subset \{n'\}$ and a limit $Y_{n''}$ such that $X_{n''} \xrightarrow{w} Y_{n''}$. Note that $Y_{n''}$ is tight by condition (1) and Lemma 1.3.9, and by Lemma 1.4.4 every $Y_{n''}$ has the same law, for any choice of subsequence $\{n'\}$. Let $X$ be a tight r.v. with this common law. Then every subsequence $\{n'\} \subset \{n\}$ has a further subsequence $\{n''\} \subset \{n'\}$ s.t. $X_{n''} \xrightarrow{w} X$, and therefore $X_n \xrightarrow{w} X$. $\square$

Theorem 1.4.5 says that weak convergence of a sequence of random maps is equivalent to asymptotic tightness plus marginal convergence. Marginal convergence can be established by any of the well-known methods for proving weak convergence on Euclidean space. Asymptotic tightness can be given a more concrete form, either through finite approximation or through an (essentially) Arzelà–Ascoli characterization. The second approach is related to asymptotic continuity of the sample paths.

Definition 1.4.6. A map $\rho : T \times T \to \mathbb{R}$ is called a semimetric (or pseudometric) if
(1) $\rho(x, y) \ge 0$, and $x = y$ implies $\rho(x, y) = 0$;
(2) $\rho(x, y) = \rho(y, x)$;
(3) $\rho(x, z) \le \rho(x, y) + \rho(y, z)$.
It need not satisfy $\rho(x, y) = 0 \Rightarrow x = y$.
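As a concrete instance of Definition 1.4.6 (added here, not in the original notes), $\rho(s, t) = |s^2 - t^2|$ on $\mathbb{R}$ is a semimetric but not a metric: it is of the form $|f(s) - f(t)|$ for $f(x) = x^2$, so symmetry and the triangle inequality hold, yet $\rho(1, -1) = 0$ with $1 \ne -1$. A brute-force check on a small grid:

```python
import itertools

def rho(s, t):
    # semimetric |s^2 - t^2| = |f(s) - f(t)| with f(x) = x^2
    return abs(s * s - t * t)

pts = [-2.0, -1.0, 0.0, 0.5, 1.0, 3.0]
triangle_ok = all(rho(x, z) <= rho(x, y) + rho(y, z) + 1e-12
                  for x, y, z in itertools.product(pts, repeat=3))
print(triangle_ok, rho(1.0, -1.0))  # triangle inequality holds, but rho(1,-1)=0
```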

Definition 1.4.7. Let $X_n : \Omega_n \to \ell^\infty(T)$ be a sequence of maps and $\rho$ a semimetric on $T$. Then $X_n$ is called asymptotically uniformly $\rho$-equicontinuous in probability if
$$\forall \epsilon, \eta > 0\ \exists \delta > 0 \text{ s.t. } \limsup_n P^*\Big( \sup_{\rho(s,t) < \delta} |X_n(s) - X_n(t)| > \epsilon \Big) < \eta.$$

Recall that a collection $\{f_n : T \to \mathbb{R}\}$ of functions is uniformly equicontinuous if
$$\forall \epsilon > 0\ \exists \delta > 0 \text{ s.t. } \sup_{\rho(s,t) < \delta} |f_n(s) - f_n(t)| < \epsilon \text{ uniformly in } n.$$
Definition 1.4.7 modifies this notion slightly so that it holds "in probability". Now we are ready to see some equivalent conditions for asymptotic tightness, which is one of the goals of this section.

Theorem 1.4.8. The following are equivalent:
(i) $X_n$ is asymptotically tight.
(ii) (1) $X_n(t)$ is asymptotically tight for every $t \in T$; and (2) there exists a semimetric $\rho$ on $T$ s.t. $(T, \rho)$ is totally bounded and $X_n$ is asymptotically uniformly $\rho$-equicontinuous in probability.
(iii) (1) holds, together with
(3) $\forall \epsilon, \eta > 0\ \exists$ a finite partition $\{T_1, \dots, T_k\}$ of $T$ s.t.
$$\limsup_n P^*\Big( \max_i \sup_{s,t \in T_i} |X_n(s) - X_n(t)| > \epsilon \Big) < \eta. \qquad (1.2)$$

Remark 1.4.9. (ii) is related to an Arzelà–Ascoli characterization of the space, while (iii) is related to finite approximation of the index set $T$. (iii) means that for any $\epsilon > 0$, $T$ can be partitioned into finitely many subsets $T_i$ such that, asymptotically, the variation of the sample paths $t \mapsto X(t)$ is less than $\epsilon$ on every $T_i$.

Proof. (i) $\Rightarrow$ (ii). First we show (1). Let $\pi_t : x \mapsto x(t)$ be the coordinate projection. Given $\epsilon > 0$, there exists a compact set $K$ s.t.
$$\liminf_n P_*(X_n \in K^\delta) > 1 - \epsilon \quad \forall \delta > 0.$$
From
$$a \in K^\delta \Rightarrow \exists b \in K \text{ s.t. } \|b - a\|_\infty < \delta \Rightarrow |\pi_t(b) - \pi_t(a)| \le \|b - a\|_\infty < \delta \Rightarrow \pi_t(a) \in (\pi_t K)^\delta,$$

we get
$$\liminf_n P_*\big( X_n(t) \in (\pi_t K)^\delta \big) \ge \liminf_n P_*(X_n \in K^\delta) > 1 - \epsilon \quad \forall \delta > 0.$$
Since $\pi_t$ is continuous, $\pi_t(K)$ is compact, and it is the desired compact set.

Now we show (2). Let $K_1 \subset K_2 \subset \cdots$ be a sequence of compact subsets of $\ell^\infty(T)$ satisfying
$$\liminf_n P_*\big( X_n \in K_m^\delta \big) \ge 1 - \tfrac{1}{m} \quad \forall \delta > 0 \qquad \text{(asymptotic tightness)}.$$
For each $m$, define $\rho_m$ by
$$\rho_m(s, t) = \sup_{z \in K_m} |z(s) - z(t)|.$$

Claim 1. $(T, \rho_m)$ is totally bounded.

Remark 1.4.10. Total boundedness means that for every $\epsilon > 0$, $T$ can be covered by finitely many radius-$\epsilon$ balls w.r.t. $\rho$. Equivalently: for every $\epsilon > 0$ there exists a finite subset of $T$ whose distance from any element of $T$ is less than $\epsilon$.

Proof of Claim 1. For given $\eta > 0$, choose $z_1, z_2, \dots, z_k \in \ell^\infty(T)$ s.t.
$$K_m \subset \bigcup_{j=1}^k B_\eta(z_j),$$
which is possible by compactness of $K_m$. Since each $z_j$ is a bounded function, the set
$$A := \{ (z_1(t), \dots, z_k(t)) : t \in T \} \subset \mathbb{R}^k$$
is bounded, hence totally bounded (a totally bounded set is bounded, and the converse also holds in Euclidean space), and so there exist $t_1, t_2, \dots, t_p \in T$ s.t.
$$A \subset \bigcup_{i=1}^p B_\eta\big( (z_1(t_i), \dots, z_k(t_i)) \big).$$
This gives that for any $t \in T$ there exists $t_i$ s.t.
$$(z_1(t), \dots, z_k(t)) \in B_\eta\big( (z_1(t_i), \dots, z_k(t_i)) \big),$$
and hence
$$\rho_m(t, t_i) = \sup_{z \in K_m} |z(t) - z(t_i)| \le \sup_{z \in K_m} \min_{1 \le j \le k} \Big( \underbrace{|z(t) - z_j(t)|}_{\le \|z - z_j\|_\infty} + |z_j(t) - z_j(t_i)| + \underbrace{|z_j(t_i) - z(t_i)|}_{\le \|z - z_j\|_\infty} \Big)$$

$$\le 2 \sup_{z \in K_m} \min_{1 \le j \le k} \|z - z_j\|_\infty + \max_{1 \le j \le k} |z_j(t) - z_j(t_i)| \le \underbrace{2\eta}_{\text{def. of } \{z_j\}} + \underbrace{\eta}_{\text{def. of } \{t_i\}} = 3\eta.$$
In summary: $\forall \eta > 0\ \exists \{t_1, \dots, t_p\}$ s.t. every $t \in T$ has some $t_i$ with $\rho_m(t, t_i) \le 3\eta$, which gives total boundedness of $(T, \rho_m)$. $\square$ (Claim 1)

Claim 2. $(T, \rho)$ is totally bounded, where
$$\rho(s, t) = \sum_{m=1}^\infty 2^{-m} \big( \rho_m(s, t) \wedge 1 \big)$$
(the truncation by $1$ ensures the series converges).

Proof of Claim 2. Note that $\rho_m$ increases in $m$, since the $K_m$ increase. For $\eta > 0$, take $m$ s.t. $2^{-m} < \eta$. Then by Claim 1, there exist $\{t_1, t_2, \dots, t_p\}$ s.t. $T \subset \bigcup_{i=1}^p B_\eta(t_i; \rho_m)$. Then for every $t \in T$ there is $t_i$ s.t. $\rho_m(t, t_i) < \eta$, and so
$$\rho(t, t_i) \le \sum_{k=1}^m 2^{-k} \underbrace{\big(\rho_k(t, t_i) \wedge 1\big)}_{\le \rho_m(t, t_i) < \eta} + \underbrace{\sum_{k=m+1}^\infty 2^{-k}}_{= 2^{-m} < \eta} \le \eta + \eta = 2\eta.$$
Hence $\forall \eta > 0\ \exists \{t_1, \dots, t_p\}$ s.t. every $t \in T$ has some $t_i$ with $\rho(t, t_i) \le 2\eta$, which gives that $(T, \rho)$ is totally bounded. $\square$ (Claim 2)

Claim 3. $X_n$ is asymptotically uniformly $\rho$-equicontinuous in probability.

Proof of Claim 3. Let $\epsilon > 0$. If $\|z - z_0\|_\infty < \epsilon$ for some $z_0 \in K_m$, then
$$|z(s) - z(t)| \le \underbrace{|z(s) - z_0(s)|}_{< \epsilon} + \underbrace{|z_0(s) - z_0(t)|}_{\le \rho_m(s,t)} + \underbrace{|z_0(t) - z(t)|}_{< \epsilon} \le 2\epsilon + \rho_m(s, t).$$

If $\rho(s, t) < 2^{-m}\epsilon$ (taking $\epsilon < 1$ without loss of generality), then $\rho_m(s, t) \wedge 1 \le 2^m \rho(s, t) < \epsilon \le 1$, which gives $\rho_m(s, t) < \epsilon$. Thus
$$z \in K_m^\epsilon \Rightarrow \exists z_0 \in K_m \text{ s.t. } \|z - z_0\|_\infty < \epsilon \Rightarrow |z(s) - z(t)| \le 2\epsilon + \rho_m(s, t) \le 3\epsilon$$
provided that $\rho(s, t) < 2^{-m}\epsilon$. Therefore,
$$K_m^\epsilon \subset \Big\{ z : \sup_{\rho(s,t) < 2^{-m}\epsilon} |z(s) - z(t)| \le 3\epsilon \Big\}.$$
Now, letting $\delta < 2^{-m}\epsilon$, we get
$$\liminf_n P_*\Big( \sup_{\rho(s,t) < \delta} |X_n(s) - X_n(t)| \le 3\epsilon \Big) \ge \liminf_n P_*\big( X_n \in K_m^\epsilon \big) \ge 1 - \tfrac{1}{m}.$$
In summary: $\forall m \in \mathbb{N}\ \forall \epsilon > 0\ \exists \delta > 0$ s.t.
$$\liminf_n P_*\Big( \sup_{\rho(s,t) < \delta} |X_n(s) - X_n(t)| \le 3\epsilon \Big) \ge 1 - \tfrac{1}{m},$$
which implies that $X_n$ is asymptotically uniformly $\rho$-equicontinuous in probability. $\square$ (Claim 3)

(ii) $\Rightarrow$ (iii). By the assumption, given $\epsilon, \eta > 0$, there exists $\delta > 0$ s.t.
$$\limsup_n P^*\Big( \sup_{\rho(s,t) < \delta} |X_n(s) - X_n(t)| > \epsilon \Big) < \eta.$$
Since $(T, \rho)$ is totally bounded, there exists a finite set $\{t_1, t_2, \dots, t_p\} \subset T$ s.t.
$$T \subset \bigcup_{i=1}^p B_{\delta/2}(t_i; \rho).$$
Now, letting $T_i = B_{\delta/2}(t_i; \rho)$ (disjointified so that the $T_i$ partition $T$), we get
$$s, t \in T_i \Rightarrow \rho(s, t) \le \rho(s, t_i) + \rho(t_i, t) < \delta,$$
and therefore
$$\sup_{s,t \in T_i} |z(s) - z(t)| \le \sup_{\rho(s,t) < \delta} |z(s) - z(t)|$$
for any $i = 1, 2, \dots, p$. This implies the conclusion:
$$\limsup_n P^*\Big( \max_i \sup_{s,t \in T_i} |X_n(s) - X_n(t)| > \epsilon \Big) \le \limsup_n P^*\Big( \sup_{\rho(s,t) < \delta} |X_n(s) - X_n(t)| > \epsilon \Big) < \eta.$$

(iii) $\Rightarrow$ (i). Suppose that, for given $\epsilon, \eta > 0$, (1.2) holds. Note that, for a fixed $t_i \in T_i$,
$$\sup_{s,t \in T_i} |X_n(s) - X_n(t)| \le \epsilon \Rightarrow \sup_{s \in T_i} |X_n(s)| \le \sup_{s \in T_i} |X_n(s) - X_n(t_i)| + |X_n(t_i)| \le |X_n(t_i)| + \epsilon,$$
and hence
$$\liminf_n P_*\Big( \|X_n\|_\infty \le \max_{i \le p} |X_n(t_i)| + \epsilon \Big) \ge \liminf_n P_*\Big( \max_{i \le p} \sup_{s,t \in T_i} |X_n(s) - X_n(t)| \le \epsilon \Big) \ge 1 - \eta.$$
This implies that $\|X_n\|_\infty$ is asymptotically tight. Why? First note that, by asymptotic tightness of the marginals, for each $i$ there exists $M_i > 0$ s.t.
$$\liminf_n P_*\big( |X_n(t_i)| < M_i + \epsilon \big) \ge 1 - \eta \quad \forall \epsilon > 0.$$
Letting $M = \max_i M_i$, we get
$$\limsup_n P^*\Big( \max_i |X_n(t_i)| \ge M + \epsilon \Big) \le \sum_{i=1}^p \limsup_n P^*\big( |X_n(t_i)| \ge M + \epsilon \big) \le p\eta \quad \forall \epsilon > 0,$$
i.e., $\max_i |X_n(t_i)|$ is asymptotically tight. Now let $K \subset \mathbb{R}$ be a compact set s.t.
$$\liminf_n P_*\Big( \max_i X_n(t_i) \in K^\epsilon \Big) \ge 1 - \eta \quad \forall \epsilon > 0.$$
Then
$$\liminf_n P_*\Big( \|X_n\|_\infty \le \max_i |X_n(t_i)| + \epsilon,\ \max_i X_n(t_i) \in K^\epsilon \Big) \ge 1 - 2\eta,$$
which implies
$$\liminf_n P_*\big( \|X_n\|_\infty \in \tilde{K}^{3\epsilon} \big) \ge 1 - 2\eta$$
for a suitable compact $\tilde{K} \subset \mathbb{R}$, i.e., $\|X_n\|_\infty$ is asymptotically tight.

Now, let $\zeta > 0$ and a sequence $\epsilon_m \downarrow 0$ be given. Choose $M > 0$ s.t.
$$\limsup_n P^*\big( \|X_n\|_\infty > M \big) \le \zeta.$$

For $\epsilon_m$ and $\eta = 2^{-m}\zeta$, let $T = \bigcup_{i=1}^{k_m} T_{m,i}$ be a partition satisfying
$$\limsup_n P^*\Big( \max_{i \le k_m} \sup_{s,t \in T_{m,i}} |X_n(s) - X_n(t)| > \epsilon_m \Big) < 2^{-m}\zeta.$$
Now, let $\{z_{m,1}, z_{m,2}, \dots, z_{m,p_m}\}$ be the set of all functions in $\ell^\infty(T)$ that are constant on each $T_{m,i}$, taking values in
$$\{ 0, \pm\epsilon_m, \pm 2\epsilon_m, \dots, \pm \lceil M/\epsilon_m \rceil \epsilon_m \}.$$
(Figure 1.2 illustrates the functions $z_{m,i}$ and how they approximate elements of $\ell^\infty(T)$.) By construction, for each $m$, letting
$$K_m = \bigcup_{i=1}^{p_m} B_{\epsilon_m}(z_{m,i}) \quad \text{and} \quad K = \bigcap_{m=1}^\infty K_m,$$
we have
$$\|X_n\|_\infty \le M \ \text{ and } \ \max_i \sup_{s,t \in T_{m,i}} |X_n(s) - X_n(t)| < \epsilon_m \quad \Longrightarrow \quad X_n \in K_m. \qquad (1.3)$$
Since $K$ is closed (and hence complete, as a closed subset of the complete space $\ell^\infty(T)$) and totally bounded, it is compact. Thus our claim is:

Claim. $\forall \delta > 0\ \exists m$ s.t. $K^\delta \supset \bigcap_{i=1}^m K_i$.

Proof of Claim. Assume not. Then there exists $\delta > 0$ s.t. for every $m$, $K^\delta \not\supset \bigcap_{i=1}^m K_i$. That is, there exists $z_m \in \bigcap_{i=1}^m K_i$ with $z_m \notin K^\delta$. Now we use an Arzelà–Ascoli-type argument. Note that $\{z_n\} \subset K_1 = \bigcup_{i=1}^{p_1} B_{\epsilon_1}(z_{1,i})$, i.e., infinitely many of the $z_n$ belong to finitely many balls, so at least one of the balls contains infinitely many of them. Consider a subsequence $\{z_n^{(1)}\}$ of $\{z_n\}$ contained in $B_{\epsilon_1}(z_{1,i_1})$ for some $i_1$. In the same way,

there exists a further subsequence $\{z_n^{(2)}\}$ contained in $B_{\epsilon_2}(z_{2,i_2})$ for some $i_2$, and so on:
$$\{z_n^{(1)}\} \supset \{z_n^{(2)}\} \supset \{z_n^{(3)}\} \supset \cdots.$$
Now define the diagonal sequence $z^{(l)} := z_l^{(l)}$. Then $\{z^{(l)}\}$ is a Cauchy sequence: for $l, l' \ge m$, both $z^{(l)}$ and $z^{(l')}$ lie in the same ball of radius $\epsilon_m$, so their distance is at most $2\epsilon_m$. Since $\ell^\infty(T)$ is complete, $z^{(l)}$ converges. Note that $z^{(l)} \in \bigcap_{i=1}^m K_i$ by construction for any $l \ge m$, and $\bigcap_{i=1}^m K_i$ is closed, so the limit $z$ of $z^{(l)}$ belongs to $\bigcap_{i=1}^m K_i$ for every $m$. This implies $z \in K$, which contradicts $z_m \notin K^\delta$: the $z_m$ stay at distance $\ge \delta$ from $K$, hence so does their limit point. $\square$ (Claim)

Now, by the Claim and (1.3),
$$\limsup_n P^*\big( X_n \notin K^\delta \big) \le \limsup_n P^*\Big( X_n \notin \bigcap_{i=1}^m K_i \Big)$$
$$\le \limsup_n P^*\Big( \|X_n\|_\infty > M \ \text{ or } \ \max_i \sup_{s,t \in T_{m',i}} |X_n(s) - X_n(t)| > \epsilon_{m'} \text{ for some } m' \le m \Big)$$
$$\le \limsup_n P^*\big( \|X_n\|_\infty > M \big) + \sum_{m'=1}^m \limsup_n P^*\Big( \max_i \sup_{s,t \in T_{m',i}} |X_n(s) - X_n(t)| > \epsilon_{m'} \Big)$$
$$\le \zeta + \sum_{m'=1}^m 2^{-m'}\zeta < 2\zeta. \qquad \square$$

If the asymptotic tightness condition is replaced with weak convergence, then we can obtain a stronger statement.

Proposition 1.4.11. If $X_n \xrightarrow{w} X$, where $X$ is tight, then the sample paths $t \mapsto X(t, \omega)$ are uniformly $\rho$-continuous a.s., where $\rho$ is the semimetric constructed in the proof of the (i) $\Rightarrow$ (ii) part of Theorem 1.4.8.

Proof. Continuing with the same notation, we get
$$P\big( X \in \overline{K_m^\epsilon} \big) \ge \limsup_n P^*\big( X_n \in \overline{K_m^\epsilon} \big) \quad \text{(Portmanteau)} \quad \ge \liminf_n P_*\big( X_n \in K_m^\epsilon \big) \ge 1 - \tfrac{1}{m}$$

for any $m$ and $\epsilon > 0$ (applying the Portmanteau theorem to the closed set $\overline{K_m^\epsilon}$). Letting $\epsilon \downarrow 0$, we get $P(X \in K_m) \ge 1 - 1/m$, which gives
$$P\Big( X \in \bigcup_{m=1}^\infty K_m \Big) = 1. \qquad (1.4)$$
Hence, for
$$\rho_m(s, t) = \sup_{z \in K_m} |z(s) - z(t)| \quad \text{and} \quad \rho(s, t) = \sum_{m=1}^\infty 2^{-m} \big( \rho_m(s, t) \wedge 1 \big),$$
we get
$$z \in K_m \Rightarrow |z(s) - z(t)| \le \rho_m(s, t) \qquad \forall s, t \in T.$$
Also, from $\rho_m(s, t) \wedge 1 \le 2^m \rho(s, t)$, we get
$$\rho(s, t) < \delta \Rightarrow \rho_m(s, t) < \epsilon$$
for any $\delta < 2^{-m}\epsilon$ (with $\epsilon < 1$). Therefore we get the conclusion: for $m = m(\omega)$ s.t. $X(\omega) \in K_m$, for every $\epsilon > 0$ there exists $\delta = \delta(m)$ s.t.
$$\sup_{s,t \in T,\ \rho(s,t) < \delta} |X(s, \omega) - X(t, \omega)| < \epsilon. \qquad \square$$

Proposition 1.4.12. If $X_n \xrightarrow{w} X$, $(T, \rho)$ is totally bounded, and the sample paths $t \mapsto X(t, \omega)$ are uniformly $\rho$-continuous $P$-a.s., then $X_n$ is asymptotically tight and asymptotically uniformly $\rho$-equicontinuous in probability.

Remark 1.4.13. Before the proof, note that the set of uniformly continuous functions on a totally bounded set is complete and separable in the uniform metric. A brief proof is the following. It is well known that $C(T)$ is complete, and that $C(T)$ is separable if and only if $T$ is compact. This gives that the set of continuous functions is complete and separable when $T$ is compact. Meanwhile, the following facts are also well known: a totally bounded, complete set is compact; and a uniformly continuous function can be extended to a continuous function on the completion. In other words, a uniformly continuous function on a totally bounded set corresponds to a continuous function on a compact set (the completion).

Proof. Note that $(T, \rho)$ is totally bounded and $t \mapsto X(t, \omega)$ is uniformly $\rho$-continuous, so the set of realizations of such $X$ is complete and separable. Since a random variable on a complete separable space is tight, $X$ is tight, which implies that $X_n$ is asymptotically tight (Lemma 1.3.9). Since $X$ is tight and uniformly $\rho$-continuous a.s.,
$$\forall \eta > 0\ \exists K : \text{compact set of uniformly $\rho$-continuous functions s.t. } P(X \in K) \ge 1 - \eta.$$
Note that, from the Portmanteau lemma (applied to the open set $K^\epsilon$),
$$\liminf_n P_*\big( X_n \in K^\epsilon \big) \ge P(X \in K^\epsilon) \ge 1 - \eta \quad \forall \epsilon > 0.$$
Since $K$ is totally bounded, for every $\epsilon > 0$ there exist $z_1, z_2, \dots, z_k \in K$ s.t. $K \subset \bigcup_{i=1}^k B_\epsilon(z_i)$, which implies
$$K^\epsilon \subset \bigcup_{i=1}^k B_{2\epsilon}(z_i).$$
Since each $z_i$ is uniformly continuous,
$$\exists \delta > 0 \text{ s.t. } \rho(s, t) < \delta \Rightarrow \max_{i \le k} |z_i(s) - z_i(t)| < \epsilon.$$
Then we get, for $z \in B_{2\epsilon}(z_i)$,
$$\rho(s, t) < \delta \Rightarrow |z(s) - z(t)| \le |z(s) - z_i(s)| + |z_i(s) - z_i(t)| + |z_i(t) - z(t)| \le 2\epsilon + \epsilon + 2\epsilon = 5\epsilon,$$
and therefore
$$\liminf_n P_*\Big( \sup_{\rho(s,t) < \delta} |X_n(s) - X_n(t)| \le 5\epsilon \Big) \ge \liminf_n P_*\Big( X_n \in \bigcup_{i=1}^k B_{2\epsilon}(z_i) \Big) \ge \liminf_n P_*\big( X_n \in K^\epsilon \big) \ge 1 - \eta. \qquad \square$$

The concluding remark of this chapter is that, in most cases, we are interested in the situation where the limit process is Gaussian, whose finite-dimensional convergence is obtained from the CLT. In this case, the semimetric $\rho$ in Proposition 1.4.12 can be taken to be an $L_p$-type norm.

Definition 1.4.14. A stochastic process is called Gaussian if each marginal has a multivariate normal distribution.

Remark 1.4.15. Note that if $X_n \xrightarrow{w} X$, where $X$ is a tight Gaussian process, then the metric
$$\rho(s, t) = \rho_p(s, t) = \big( E|X(s) - X(t)|^p \big)^{1/p}, \qquad p \ge 1,$$
makes $X_n$ asymptotically uniformly $\rho$-equicontinuous in probability.
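To close the chapter with a concrete case (added here, not in the original notes): for Brownian motion on $[0,1]$, the semimetric of Remark 1.4.15 with $p = 2$ is $\rho_2(s, t) = \sqrt{|t - s|}$, since $E|B_s - B_t|^2 = |t - s|$. A quick simulation check:

```python
import numpy as np

rng = np.random.default_rng(7)
reps = 200_000
s, t = 0.3, 0.7

# Brownian motion marginals at s < t: B_s ~ N(0, s), and the increment
# B_t - B_s ~ N(0, t - s) is independent of B_s
Bs = np.sqrt(s) * rng.normal(size=reps)
Bt = Bs + np.sqrt(t - s) * rng.normal(size=reps)

# rho_2(s, t) = (E|B_s - B_t|^2)^{1/2} should equal sqrt(|t - s|)
rho2 = np.sqrt(np.mean((Bs - Bt) ** 2))
print(rho2, np.sqrt(t - s))
```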

Chapter 2: Maximal Inequalities and Symmetrization

2.1 Introduction

Here we use the following notation. Let $(\mathcal{X}, \mathcal{B}, P)$ be a baseline probability space, and $(\mathcal{X}^\infty, \mathcal{B}^\infty, P^\infty)$ the product space. We consider the projection onto the $i$th coordinate, $X_i : \mathcal{X}^\infty \to \mathcal{X}$. Then $X_1, X_2, \dots$ become i.i.d. random variables with distribution $P$.

Definition 2.1.1. Denote
$$\mathbb{P}_n := \frac{1}{n} \sum_{i=1}^n \delta_{X_i} \quad \text{(empirical measure)}$$
and
$$\mathbb{G}_n := \sqrt{n}\,(\mathbb{P}_n - P) \quad \text{(empirical process)}.$$
Here $\delta_x$ denotes the Dirac measure at $x$.

Remark 2.1.2. Often, $\mathbb{G}_n$ denotes the stochastic process $f \mapsto \mathbb{G}_n f$, i.e., $(\mathbb{G}_n f)_{f \in \mathcal{F}}$, where $\mathcal{F}$ is a collection of measurable functions and $Qf$ denotes $Qf = \int f\,dQ$ for a measurable function $f$ and signed measure $Q$. Note that
$$\mathbb{G}_n f = \sqrt{n}\Big( \frac{1}{n}\sum_{i=1}^n f(X_i) - Pf \Big).$$

Definition 2.1.3. For a signed measure $Q$, define
$$\|Q\|_{\mathcal{F}} := \sup\{ |Qf| : f \in \mathcal{F} \}.$$

Our first step consists of very well-known results:

Proposition 2.1.4. For each $f \in \mathcal{F}$,
(i) $\mathbb{P}_n f \to Pf$ a.s. (SLLN)

(ii) $\mathbb{G}_n f \xrightarrow{d} N\big( 0, Pf^2 - (Pf)^2 \big)$. (CLT)

We are interested in uniform versions of the previous proposition. The uniform version of (i) becomes
$$\|\mathbb{P}_n - P\|_{\mathcal{F}}^* \to 0 \quad \text{a.s.}, \qquad (2.1)$$
where $*$ denotes the outer version.

Definition 2.1.5. A collection of integrable measurable functions $\mathcal{F}$ satisfying (2.1) is called a $P$-Glivenko–Cantelli class.

Next, the uniform version of (ii) can be obtained as follows. Assume that
$$\sup_{f \in \mathcal{F}} |f(x) - Pf| < \infty \qquad \forall x \in \mathcal{X}.$$
Then $f \mapsto \mathbb{G}_n f$ can be viewed as a map into $\ell^\infty(\mathcal{F})$. If $\mathbb{G}_n$ is asymptotically tight in $\ell^\infty(\mathcal{F})$, then $\mathbb{G}_n$ converges weakly to a tight Borel measurable map $\mathbb{G}$ in $\ell^\infty(\mathcal{F})$, by a CLT argument for the marginals and Theorem 1.4.5.

Definition 2.1.6. A class $\mathcal{F}$ of square-integrable measurable functions is called a $P$-Donsker class if $\mathbb{G}_n$ is asymptotically tight.

Remark 2.1.7. A finite collection $\mathcal{F}$ of integrable functions is trivially $P$-Glivenko–Cantelli. Furthermore, a finite collection $\mathcal{F}$ of square-integrable functions is $P$-Donsker, by the (iii) $\Rightarrow$ (i) part of Theorem 1.4.8.

Example 2.1.8. Let $X_1, X_2, \dots$ be i.i.d. random variables in $\mathbb{R}$, and
$$\mathcal{F} := \{ 1_{(-\infty, t]} : t \in \mathbb{R} \}.$$
Then
$$\|\mathbb{P}_n - P\|_{\mathcal{F}} = \sup_{t \in \mathbb{R}} |F_n(t) - F(t)| \to 0 \quad \text{a.s.}$$
for any probability measure $P$ on $\mathbb{R}$ (the classical Glivenko–Cantelli theorem). This shows that $\mathcal{F}$ is $P$-Glivenko–Cantelli for any $P$. To show that $\mathcal{F}$ is $P$-Donsker, we should show asymptotic tightness of $\mathbb{G}_n$, which is obtained by controlling the supremum over a finite partition. For this, we need some maximal inequalities and techniques for controlling variation, which will be covered in the rest of this chapter.

2.2 Tail and Concentration Bounds

The simplest case, Markov's inequality, is well known.
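Before turning to the tail bounds, here is a quick numerical illustration of Example 2.1.8 above (a sketch added to these notes, with an arbitrary seed and sample sizes): for $P = \mathrm{Uniform}(0,1)$, the supremum $\sup_t |F_n(t) - F(t)|$ is attained at the order statistics, where $F_n$ jumps.

```python
import numpy as np

rng = np.random.default_rng(2)

def kolmogorov_stat(x):
    # sup_t |F_n(t) - F(t)| for F = Uniform(0,1); check both the value of
    # F_n just after each jump and just before it
    x = np.sort(x)
    n = len(x)
    upper = np.arange(1, n + 1) / n - x      # F_n(x_(i)) - F(x_(i))
    lower = x - np.arange(0, n) / n          # F(x_(i)) - F_n(x_(i)-)
    return max(upper.max(), lower.max())

d_small = kolmogorov_stat(rng.uniform(size=100))
d_large = kolmogorov_stat(rng.uniform(size=100_000))
print(d_small, d_large)  # the statistic shrinks as n grows
```

The decay rate is of order $n^{-1/2}$, consistent with $\mathbb{G}_n$ having a nondegenerate weak limit.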

Figure 2.1: The supremum over an infinite set can be controlled as an aggregation of the supremum over a finite net and the variation within each small ball.

Theorem 2.2.1 (Markov's inequality). Let $X$ be a random variable with mean $\mu$. Then
$$P\big( |X - \mu| \ge t \big) \le \frac{E|X - \mu|^k}{t^k} \qquad \forall t, k > 0.$$
This gives a polynomial bound for the tail probability. However, such a result may not be so useful because of its roughness. Some results about exponential bounds are also well known; these are often called concentration inequalities.

Theorem 2.2.2 (Chernoff bound).
$$P(X - \mu \ge t) \le \frac{E e^{\lambda(X - \mu)}}{e^{\lambda t}} \qquad \forall \lambda > 0,\ t \in \mathbb{R}.$$

Proof. Clear from
$$1\{X - \mu \ge t\} = 1\big\{ e^{\lambda(X - \mu)} \ge e^{\lambda t} \big\} \le \frac{e^{\lambda(X - \mu)}}{e^{\lambda t}}. \qquad \square$$

Example 2.2.3 (Gaussian tail bound). Let $X \sim N(\mu, \sigma^2)$ be a Gaussian random variable. Then by the Chernoff inequality,
$$P(X - \mu \ge t) \le e^{-\lambda t}\, E e^{\lambda(X - \mu)} = \exp\Big( -\lambda t + \frac{\sigma^2}{2}\lambda^2 \Big)$$
for any $t > 0$, $\lambda > 0$. Hence we get
$$P(X - \mu \ge t) \le \inf_{\lambda > 0} \exp\Big( -\lambda t + \frac{\sigma^2}{2}\lambda^2 \Big) = e^{-t^2/2\sigma^2} \qquad \forall t > 0.$$

As shown, a Gaussian random variable has a squared-exponential tail bound. In general, the collection of distributions with such a bound is called sub-Gaussian.

Definition 2.2.4. A random variable $X$ with mean $EX = \mu$ is called sub-Gaussian if there exists $\sigma > 0$ s.t.
$$E e^{\lambda(X - \mu)} \le e^{\sigma^2 \lambda^2 / 2} \qquad \forall \lambda \in \mathbb{R}.$$
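As a numerical aside (added here, not in the original notes), the Chernoff-derived bound of Example 2.2.3 can be compared with the actual Gaussian tail; the bound holds but is not sharp in constants:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, t = 1.0, 2.0
x = rng.normal(0.0, sigma, size=1_000_000)

empirical = (x >= t).mean()                 # Monte Carlo estimate of P(X >= t)
bound = np.exp(-t**2 / (2 * sigma**2))      # Chernoff bound e^{-t^2 / 2 sigma^2}
print(empirical, bound)
```

For $t = 2$, $\sigma = 1$: the true tail is about $0.023$, while the bound is $e^{-2} \approx 0.135$.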

Remark 2.2.5. Note that the right-hand side in the definition is the mgf of $N(0, \sigma^2)$. Thus sub-Gaussianity means that the mgf is dominated by a Gaussian one, i.e., the tail decays at least at a Gaussian rate.

Remark 2.2.6. Obviously, if $X$ is sub-Gaussian, we get
$$P(X - \mu \ge t) \le \exp\Big( -\frac{t^2}{2\sigma^2} \Big) \qquad \forall t \ge 0.$$
Furthermore, if $X$ is sub-Gaussian with parameter $\sigma$, so is $-X$, and hence
$$P\big( |X - \mu| \ge t \big) = P(X - \mu \ge t) + P(-(X - \mu) \ge t) \le 2 \exp\Big( -\frac{t^2}{2\sigma^2} \Big).$$

Example 2.2.7. A random variable $\epsilon$ is called Rademacher if
$$P(\epsilon = 1) = P(\epsilon = -1) = \tfrac12.$$
In this case,
$$E e^{\lambda \epsilon} = \frac{e^\lambda + e^{-\lambda}}{2} = \sum_{k=0}^\infty \frac{\lambda^{2k}}{(2k)!} \le \sum_{k=0}^\infty \frac{(\lambda^2/2)^k}{k!} = \exp\Big( \frac{\lambda^2}{2} \Big),$$
and hence $\epsilon$ is sub-Gaussian with $\sigma = 1$. Actually, this result is not so surprising, because a distribution with bounded support has an extremely light tail, clearly lighter than that of a Gaussian. We can formulate this as follows:

Example 2.2.8. Let $X$ be a random variable with $EX = \mu$ and $P(a \le X \le b) = 1$. Then $X$ is sub-Gaussian with $\sigma = \frac{b-a}{2}$. To show this, define
$$\psi(\lambda) = \log E e^{\lambda X} \quad \text{(cgf)}.$$
Then $\psi(0) = 0$, $\psi'(0) = \mu$, and
$$\psi''(\lambda) = E_\lambda X^2 - (E_\lambda X)^2, \quad \text{where} \quad E_\lambda f(X) := \frac{E f(X) e^{\lambda X}}{E e^{\lambda X}}.$$
Note that $E_\lambda$ can be viewed as an expectation operator w.r.t. a weight proportional to $e^{\lambda X}$. Now note: if $a \le Y \le b$ a.s., then
$$\mathrm{var}(Y) = \min_y E(Y - y)^2 \le E\Big( Y - \frac{b + a}{2} \Big)^2 \le \Big( \frac{b - a}{2} \Big)^2$$
holds.

Since $\psi''(\lambda)$ can be viewed as a variance (under the tilted distribution), we get
$$\psi''(\lambda) \le \Big( \frac{b - a}{2} \Big)^2 \qquad \forall \lambda \in \mathbb{R},$$
and hence $\sup_\lambda \psi''(\lambda) \le \big( \frac{b-a}{2} \big)^2$. Thus, by Taylor expansion with some intermediate point $\xi$, we obtain
$$\psi(\lambda) = \psi(0) + \psi'(0)\lambda + \psi''(\xi)\frac{\lambda^2}{2} \le \lambda\mu + \frac{\lambda^2}{2}\Big( \frac{b - a}{2} \Big)^2,$$
which yields
$$E e^{\lambda(X - \mu)} = e^{-\lambda\mu + \psi(\lambda)} \le \exp\Big( \frac{\lambda^2}{2}\Big( \frac{b - a}{2} \Big)^2 \Big).$$

Our next result is that an independent sum of sub-Gaussian random variables is also sub-Gaussian.

Theorem 2.2.9 (Hoeffding's inequality). Let $X_i$ be independent random variables with $EX_i = \mu_i$, where each $X_i$ is sub-Gaussian with parameter $\sigma_i$. Then $\sum_{i=1}^n X_i$ is also sub-Gaussian, with parameter $\big( \sum_{i=1}^n \sigma_i^2 \big)^{1/2}$, i.e.,
$$P\Big( \sum_{i=1}^n (X_i - \mu_i) \ge t \Big) \le \exp\Big( -\frac{t^2}{2\sum_{i=1}^n \sigma_i^2} \Big) \qquad \forall t \ge 0.$$

Proof. It suffices to show that $X_1 + X_2$ is sub-Gaussian with $\sigma^2 = \sigma_1^2 + \sigma_2^2$. This is clear from
$$E e^{\lambda(X_1 + X_2 - \mu_1 - \mu_2)} = E e^{\lambda(X_1 - \mu_1)}\, E e^{\lambda(X_2 - \mu_2)} \le \exp\Big( \frac{\sigma_1^2}{2}\lambda^2 \Big) \exp\Big( \frac{\sigma_2^2}{2}\lambda^2 \Big) = \exp\Big( \frac{(\sigma_1^2 + \sigma_2^2)\lambda^2}{2} \Big). \qquad \square$$

The following corollary is immediate from Hoeffding's inequality, but it is a very useful result and will be widely used in this course.

Corollary 2.2.10. If the $X_i$ are bounded and independent, i.e., $P(a_i \le X_i \le b_i) = 1$, then
$$P\Big( \sum_{i=1}^n (X_i - \mu_i) \ge t \Big) \le \exp\Big( -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \Big).$$
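As a numerical check of Hoeffding's inequality (added here, not in the original notes), take sums of Rademacher variables, each sub-Gaussian with $\sigma = 1$ by Example 2.2.7, so the bound reads $P(S_n \ge t) \le e^{-t^2/2n}$:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps, t = 100, 100_000, 25.0

# S = sum of n Rademacher signs, replicated many times
eps = rng.integers(0, 2, size=(reps, n), dtype=np.int8) * 2 - 1
S = eps.sum(axis=1)

empirical = (S >= t).mean()          # Monte Carlo estimate of P(S >= t)
bound = np.exp(-t**2 / (2 * n))      # Hoeffding bound exp(-t^2 / 2n)
print(empirical, bound)
```

The empirical tail probability is comfortably below the bound, as it must be.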

Before we move on, let us record some equivalent conditions for sub-Gaussianity.

Theorem 2.2.11. For any $X$ with $EX = 0$, the following are equivalent:
(i) $\exists \sigma > 0$ s.t. $E e^{\lambda X} \le \exp\big( \frac{\lambda^2}{2}\sigma^2 \big)$ $\forall \lambda \in \mathbb{R}$ (i.e., $X$ is sub-Gaussian).
(ii) $\exists c \ge 1$ and a Gaussian r.v. $Z \sim N(0, \tau^2)$ s.t. $P(|X| \ge s) \le c\,P(|Z| \ge s)$ $\forall s \ge 0$.
(iii) $\exists \theta \ge 0$ s.t. $E X^{2k} \le \frac{(2k)!}{2^k k!}\theta^{2k}$, $k = 1, 2, \dots$.
(iv) $\exists \sigma > 0$ s.t. $E e^{\lambda X^2 / 2\sigma^2} \le \frac{1}{\sqrt{1 - \lambda}}$ $\forall \lambda \in [0, 1)$.

Now we consider another notion. Sub-Gaussianity is fairly restrictive, so it is natural to consider various relaxations of it. The class of sub-exponential random variables is defined by a slightly milder condition on the mgf, and correspondingly allows a slower tail decay rate.

Definition 2.2.12. A random variable $X$ is called sub-exponential if $\exists \nu, b > 0$ s.t.
$$E e^{\lambda(X - \mu)} \le e^{\nu^2 \lambda^2 / 2} \qquad \forall \lambda : |\lambda| \le \tfrac{1}{b}.$$
Obviously, sub-Gaussianity implies sub-exponentiality. The converse is not true; sub-Gaussianity is the stronger condition.

Example 2.2.13. Let $Z \sim N(0, 1)$ and $X = Z^2 - 1$. Then
$$E e^{\lambda X} = \frac{e^{-\lambda}}{\sqrt{1 - 2\lambda}} \quad \text{for } \lambda < \tfrac12,$$
and the mgf does not exist for $\lambda \ge 1/2$. With a simple calculation, we can verify that
$$\frac{e^{-\lambda}}{\sqrt{1 - 2\lambda}} \le e^{2\lambda^2} \qquad \forall \lambda : |\lambda| < \tfrac14.$$
Therefore, $X$ is sub-exponential, but not sub-Gaussian.

Theorem 2.2.14. For any $X$ with $EX = 0$, the following are equivalent:
(i) $\exists \nu, b > 0$ s.t. $E e^{\lambda X} \le e^{\lambda^2 \nu^2 / 2}$ $\forall \lambda : |\lambda| \le 1/b$ (i.e., $X$ is sub-exponential).
(ii) $\exists c_0 > 0$ s.t. $E e^{\lambda X} < \infty$ $\forall \lambda : |\lambda| \le c_0$.
(iii) $\exists c_1, c_2 > 0$ s.t. $P(|X| \ge t) \le c_1 e^{-c_2 t}$ $\forall t > 0$.
(iv) $\exists \sigma, M > 0$ s.t. $|E X^k| \le \frac{\sigma^2}{2} k! M^{k-2}$, $k = 2, 3, \dots$ (Bernstein condition).

The condition (iv) is called the Bernstein condition. It is known that:

Lemma 2.2.15. If $EX = 0$ and $X$ satisfies the Bernstein condition, then
$$P(|X| \ge t) \le 2 \exp\Big( -\frac{t^2}{2(\sigma^2 + Mt)} \Big) \qquad \forall t > 0.$$

Proof. Note that
$$E e^{\lambda X} = \sum_{k=0}^\infty \frac{\lambda^k E X^k}{k!} = 1 + \sum_{k=2}^\infty \frac{\lambda^k E X^k}{k!} \le 1 + \frac{\lambda^2 \sigma^2}{2} \sum_{k=2}^\infty (|\lambda| M)^{k-2} = 1 + \frac{\lambda^2 \sigma^2}{2} \cdot \frac{1}{1 - |\lambda| M}$$
holds, which implies
$$E e^{\lambda X} \le 1 + \frac{\lambda^2 \sigma^2}{2(1 - |\lambda| M)} \le \exp\Big( \frac{\lambda^2 \sigma^2}{2(1 - |\lambda| M)} \Big)$$
provided that $|\lambda| < \frac{1}{M}$. This gives
$$P(X \ge t) \le e^{-\lambda t} \exp\Big( \frac{\lambda^2 \sigma^2}{2(1 - \lambda M)} \Big) \qquad \forall \lambda : 0 \le \lambda < \tfrac{1}{M},$$
and hence, choosing $\lambda = \frac{t}{\sigma^2 + Mt} \in [0, \frac{1}{M})$,
$$P(X \ge t) \le \exp\Big( -\frac{t^2}{2(\sigma^2 + Mt)} \Big).$$
The same argument applied to $-X$ gives the conclusion
$$P(|X| \ge t) \le 2 \exp\Big( -\frac{t^2}{2(\sigma^2 + Mt)} \Big). \qquad \square$$

We can easily extend the result to independent sums of random variables.

Corollary 2.2.16 (Bernstein's inequality). Let $X_i$ be independent random variables satisfying the Bernstein condition
$$E X_i = 0 \quad \text{and} \quad |E X_i^k| \le \frac{\sigma_i^2}{2} k! M^{k-2}, \qquad k = 2, 3, \dots.$$
Then
$$P\big( |X_1 + \cdots + X_n| \ge t \big) \le 2 \exp\Big( -\frac{t^2}{2\big( \sum_{i=1}^n \sigma_i^2 + Mt \big)} \Big).$$

Proof. By the Chernoff inequality, we get
$$P\big( |X_1 + \cdots + X_n| \ge t \big) \le 2 e^{-\lambda t}\, E e^{\lambda \sum_i X_i} \le 2 \exp\Big( -\lambda t + \frac{\lambda^2 \sum_i \sigma_i^2}{2(1 - \lambda M)} \Big).$$
We get the conclusion by letting $\lambda = \frac{t}{Mt + \sum_i \sigma_i^2}$. $\square$

Example 2.2.17. Let $Z_k \overset{\text{i.i.d.}}{\sim} N(0, 1)$. Then
$$P\Big( \Big| \frac{1}{n} \sum_{k=1}^n (Z_k^2 - 1) \Big| \ge t \Big) \le 2 e^{-\lambda n t}\, \prod_{k=1}^n E e^{\lambda(Z_k^2 - 1)} = 2 e^{-\lambda n t} \Big( \frac{e^{-\lambda}}{\sqrt{1 - 2\lambda}} \Big)^n \le 2 e^{-\lambda n t + 2n\lambda^2}$$
for any $\lambda$ with $|\lambda| < \frac14$. Since
$$\min_{|\lambda| < 1/4} \big( -\lambda n t + 2n\lambda^2 \big) = -\frac{nt^2}{8} \qquad \text{(attained at } \lambda = t/4 \text{, for } t < 1\text{)},$$
we get
$$P\Big( \Big| \frac{1}{n} \sum_{k=1}^n (Z_k^2 - 1) \Big| \ge t \Big) \le 2 e^{-nt^2/8}.$$

Example 2.2.18 (Johnson–Lindenstrauss embedding). Let $u_i \in \mathbb{R}^d$, $i = 1, 2, \dots, m$, be extremely high-dimensional vectors (i.e., $d$ is very large). We want to find a map $F : \mathbb{R}^d \to \mathbb{R}^n$ with $n \ll d$ and
$$(1 - \delta) \|u_i - u_j\|_2^2 \le \|F(u_i) - F(u_j)\|_2^2 \le (1 + \delta) \|u_i - u_j\|_2^2$$
for some $\delta \in (0, 1)$: an embedding into a low-dimensional space that approximately preserves the distances.

Remark 2.2.19. Such an embedding can be useful when using, for example, clustering algorithms. There are various distance-based clustering methods such as K-means. If one handles extremely high-dimensional data, then obtaining the distances between all pairs of data points might require heavy computation. For this reason, one can first embed the data into a low-dimensional space, approximately preserving distances, and then treat the data as low-dimensional.

Example 2.2.18 (continued). Define $F : \mathbb{R}^d \to \mathbb{R}^n$ by
$$F(u) = \frac{Xu}{\sqrt{n}}, \quad \text{where } X = (x_{ij}) \in \mathbb{R}^{n \times d} \text{ with } x_{ij} \overset{\text{i.i.d.}}{\sim} N(0, 1).$$

Then

‖F(u)‖₂²/‖u‖₂² = ‖Xu‖₂²/(n‖u‖₂²) = (1/n) Σ_{i=1}^n ⟨X_i, u⟩²/‖u‖₂²,

where X_i is the i-th row vector of X. Note that for any fixed u,

Σ_{i=1}^n ⟨X_i, u⟩²/‖u‖₂² ~ χ²_n

holds, and hence we get

P( ‖F(u)‖₂²/‖u‖₂² ∉ [1 − δ, 1 + δ] ) ≤ 2 e^{−nδ²/8}

for any u ≠ 0 by the previous example. Thus, using F(u_i) − F(u_j) = F(u_i − u_j), we get

P( ‖F(u_i) − F(u_j)‖₂²/‖u_i − u_j‖₂² ∉ [1 − δ, 1 + δ] for some i ≠ j )
≤ Σ_{i<j} P( ‖F(u_i) − F(u_j)‖₂²/‖u_i − u_j‖₂² ∉ [1 − δ, 1 + δ] )
≤ 2 (m choose 2) e^{−nδ²/8} ≤ m² e^{−nδ²/8}.

Finally, for any ε ∈ (0,1) and m ≥ 2,

m² e^{−nδ²/8} ≤ ε if n > (16/δ²) log(m/√ε),

so for such n we can find the desired map F.

From now on, we focus on our original interest. Whether a given class F is a Glivenko–Cantelli (Donsker) class depends on the size of the class. A finite class of square-integrable functions is always Donsker by Theorem 1.4.8, while at the other extreme the class of all square-integrable uniformly bounded functions is almost never Donsker. A relatively simple way to measure the size of a class is to use entropy numbers, which are essentially the logarithm of the number of balls or brackets of size ε needed to cover F. Let (F, ‖·‖) be a subset of a normed space of functions f : X → R.

Definition (Covering number). The covering number N(ε, F, ‖·‖) is the minimum number of balls {g : ‖g − f‖ < ε} of radius ε needed to cover F. The centers f of the balls need not belong to F. The entropy is the logarithm of the covering number, log N(ε, F, ‖·‖).

Definition (Bracketing number). Given two functions l and u, the bracket [l, u] is the set of all functions f with l ≤ f ≤ u. An ε-bracket is a bracket [l, u] with ‖u − l‖ < ε. The bracketing number N_[](ε, F, ‖·‖) is the minimum number of ε-brackets needed to cover F. The functions u and l need not belong to F.

The bracketing entropy (entropy with bracketing) is the logarithm of the bracketing number, log N_[](ε, F, ‖·‖).

We only consider norms with the property |f| ≤ |g| ⟹ ‖f‖ ≤ ‖g‖. For example, the L_r(Q) norm

‖f‖_{Q,r} = ( ∫ |f|^r dQ )^{1/r}

satisfies the property.

Remark. Note that

N(ε, F, ‖·‖) ≤ N_[](2ε, F, ‖·‖)

is satisfied, because for any bracket [l, u] with ‖u − l‖ < 2ε,

f ∈ [l, u] ⟹ ‖f − (u + l)/2‖ ≤ ‖(u − l)/2‖ < ε,

i.e., every 2ε-bracket is contained in the ε-ball centered at (u + l)/2.

Definition. An envelope function of F is any function F s.t.

|f(x)| ≤ F(x) ∀x ∈ X, ∀f ∈ F.

2.3 Maximal Inequalities

In this section, we will obtain bounds on the expectation of a maximum, for example the maximal variation of a stochastic process within a small time interval. For this we introduce the notion of the Orlicz norm.

Definition 2.3.1. For ψ : [0,∞) → [0,∞) strictly increasing and convex with ψ(0) = 0, and a random variable X, the Orlicz norm ‖X‖_ψ is defined as

‖X‖_ψ = inf{ C > 0 : E ψ(|X|/C) ≤ 1 }.

Of course, we should ask whether the Orlicz norm is actually a norm.

Proposition. ‖·‖_ψ is a norm on the set of all random variables with ‖X‖_ψ < ∞, i.e.,

(i) ‖aX‖_ψ = |a| ‖X‖_ψ ∀a ∈ R;
(ii) ‖X‖_ψ = 0 ⟺ X = 0 a.s.;
(iii) ‖X + Y‖_ψ ≤ ‖X‖_ψ + ‖Y‖_ψ.

Proof. (i) Trivial.

(ii) The (⇐) part is trivial. Assume ‖X‖_ψ = 0; this means E ψ(|X|/C) ≤ 1 for all C > 0. Note that, as C ↓ 0,

ψ(|X|/C) → ψ(∞) = ∞ on {X ≠ 0}, ψ(|X|/C) → ψ(0) = 0 on {X = 0}

(ψ(∞) = ∞ because ψ is a convex, strictly increasing function). If P(X ≠ 0) > 0, then by the monotone convergence theorem

lim_{C↓0} E ψ(|X|/C) = ∞,

which is contradictory. Hence X = 0 a.s.

(iii) It suffices to show that

E ψ(|X|/C₁) ≤ 1 and E ψ(|Y|/C₂) ≤ 1 ⟹ E ψ( |X + Y|/(C₁ + C₂) ) ≤ 1.

Indeed, suppose E ψ(|X|/C₁) ≤ 1 and E ψ(|Y|/C₂) ≤ 1. Then under our claim ‖X + Y‖_ψ ≤ C₁ + C₂ holds, and taking infima w.r.t. C₁ and C₂ sequentially, we get the desired result. The claim comes from

ψ( |X + Y|/(C₁ + C₂) ) ≤ ψ( (|X| + |Y|)/(C₁ + C₂) )   (ψ is strictly increasing)
= ψ( (C₁/(C₁ + C₂)) · |X|/C₁ + (C₂/(C₁ + C₂)) · |Y|/C₂ )
≤ (C₁/(C₁ + C₂)) ψ(|X|/C₁) + (C₂/(C₁ + C₂)) ψ(|Y|/C₂)   (ψ is convex),

after taking expectations. ∎

There are two often-used Orlicz norms.

Example. Let ψ(x) = x^p, p ≥ 1. Then trivially ψ satisfies the conditions in Definition 2.3.1, and

‖X‖_ψ = inf{ C > 0 : E (|X|/C)^p ≤ 1 } = inf{ C > 0 : E|X|^p ≤ C^p } = ( E|X|^p )^{1/p} =: ‖X‖_p,

i.e., the Orlicz norm w.r.t. ψ(x) = x^p is the L_p-norm.
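The infimum defining the Orlicz norm can also be computed numerically. The sketch below uses a hypothetical three-point distribution (not from the notes) and finds ‖X‖_ψ by bisection, exploiting that C ↦ E ψ(|X|/C) is decreasing in C; for ψ(x) = x² it recovers the L₂ norm, as the example predicts.

```python
import math

values, probs = [1.0, 2.0, 5.0], [0.5, 0.3, 0.2]  # hypothetical discrete X

def e_psi(C, psi):
    # E psi(|X|/C) for the discrete distribution above
    return sum(p * psi(abs(v) / C) for v, p in zip(values, probs))

def orlicz_norm(psi, lo=1e-6, hi=1e6, iters=200):
    # bisection: E psi(|X|/C) is decreasing in C, so the norm is where it crosses 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if e_psi(mid, psi) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi

l2 = math.sqrt(sum(p * v * v for v, p in zip(values, probs)))
print(orlicz_norm(lambda x: x * x), l2)  # the two agree
```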

Example. Let ψ_p(x) := e^{x^p} − 1, p ≥ 1. Then trivially ψ_p satisfies the conditions in Definition 2.3.1, and ψ_p(x) ≥ x^p. Hence

‖X‖_p ≤ ‖X‖_{ψ_p}.

Remark. Note that, for ‖X‖_p or ‖X‖_{ψ_p} to be finite,

E ψ(|X|/C) < ∞ or E ψ_p(|X|/C) < ∞

should hold for some C > 0, respectively. The former requires a polynomial-order tail bound, while the latter requires an exponential-order (p = 1) or squared-exponential-order (p = 2) one. In general, the following holds.

Proposition (Tail bound). If ‖X‖_ψ < ∞, then

P(|X| > x) ≤ 1/ψ( x/‖X‖_ψ ).

Proof. Since ψ is continuous from convexity,

E ψ( |X|/‖X‖_ψ ) = E lim_{C ↓ ‖X‖_ψ} ψ(|X|/C) = lim_{C ↓ ‖X‖_ψ} E ψ(|X|/C) ≤ 1

holds by MCT (actually "=" holds). Now the Markov inequality gives

P(|X| > x) = P( ψ(|X|/‖X‖_ψ) ≥ ψ(x/‖X‖_ψ) ) ≤ E ψ(|X|/‖X‖_ψ) / ψ(x/‖X‖_ψ) ≤ 1/ψ(x/‖X‖_ψ). ∎

This proposition gives a necessary condition for ‖X‖_ψ < ∞. Then what is a sufficient condition? In other words, is there any condition on the tail bound which implies ‖X‖_ψ < ∞?

Proposition. If P(|X| > x) ≤ C/x^{p+δ} for p ≥ 1 and C, δ > 0, then ‖X‖_p < ∞.

Proof. E|X|^p = ∫₀^∞ P(|X|^p > x) dx ≤ 1 + ∫₁^∞ C x^{−(1+δ/p)} dx < ∞. ∎

Proposition. If P(|X| > x) ≤ K e^{−Cx^p} for p ≥ 1 and C, K > 0, then ‖X‖_{ψ_p} < ∞.

34 Proof. Note that E e D X p X p = E = E = = KD holds for sufficiently small D > 0. It gives that Eψ p X De Ds ds Is < X p De Ds ds Ps < X p De Ds ds Ke Cs De Ds dx 0 D /p e C Ds ds for sufficiently small D > 0 precisely, if D C K+, i.e., X ψ p < precisely, X ψp K+ C /p. Remark Proposition gives that, if tail probability is bounded with p +δ order polynomial, then p-norm X p becomes finite; proposition gives that if tail probability is bounded with squared exponential exponential, resp., i.e., random variable has sub-gaussian sub-exponential, resp. distrbution, then X ψ2 < X ψ <, resp. is satisfied. Our origin goal of this section is to obtain some bounds for maximum of random variables. Such maximal inequalities can be found from the basic properties of Orlicz norm. Before starting, note following naive bound or similarly, E max X i i m m E X i m max E X i, i m max X /p m /p i i m = E max X /p i p E X i p m max E X i p = m /p max X i p. p i m i m i m Thus if random variable has smaller tail probability E max X i p <, then more tight bound for maximum is obtained m /p. Following proposition gives generalized bound. Theorem Let ψ be convex, strictly increasing function with ψ0 = 0. Further, assume that ψ satisfies lim sup x,y ψxψy ψcxy < for some c >

35 Then for any random variables X, X 2,, X m, max X i i m Kψ m max X i ψ, ψ i m where K is a constant depending only on ψ. Remark Note that: m /p in the naive bound is corresponding to ψ m. If ψ increases fast, then ψ m becomes smaller, which gives smaller bound. It holds for any random variables X,, X m ; it does not require additional assumption such as independence. Proof. Firstly, we assume that and ψ. In this case, 2 Thus, for y and any C > 0, ψxψy ψcxy x, y x ψ ψcx x y. y ψy c Xi max ψ Xi ψ C Xi max I i m Cy i m ψy Cy Xi + ψ Cy }{{} c Xi ψ C max + ψ i m ψy c Xi m ψ C + ψy 2 ψ on X i Cy < Xi I Cy < holds. Taking expectation with C = c max X i ψ and y = ψ 2m, we get i m [ ] [ ] max i m X i E ψ = E max Cy ψ Xi i m C X i m ψ max X i ψ E + ψy 2 { }} { Xi m Eψ X i ψ + 2m

36 2 + 2 =, and therefore max X i i m Cy ψ holds from ψ 2m 2ψ m, which comes from and increasingness of ψ. = cψ 2m max X i ψ i m 2cψ m max X i ψ i m m = ψ0 + ψψ 2m 0 + ψ 2m ψ 2 2 Now we see general ψ. Define φx = σψτx. If τ > 0 is large enough K > 0 s.t. x, y 0 φxφy = σ 2 ψτxψτy Kσ 2 ψcτ 2 xy = Kσφcτxy 2.3, so if σ < is small enough, we get φxφy φcτxy and φ = σψτ. Also note that 2 Putting C = X φ gives στ { } { X X ψ = inf C > 0 : Eψ = inf C > 0 : } X C σ Eφ. τc X σ Eφ = σ τc Eφ σ X X Eφ X φ X φ while holds from φσx + σ 0 σφx + σφ0. Hence we get On the other hand, and putting C = τ X ψ we get X ψ X φ στ. { } τ X X φ = inf C > 0 : σeψ, C τ X σeψ C X = σeψ X ψ, , which implies X φ τ X ψ. 35

37 Therefore we have max X i i m ψ στ max X i i m φ K στ φ m max X i φ i m K στ 2 ψ m max τ X i ψ i m = K ψ m max X i ψ. i m In, it was used that from φ x = τ ψ σ x and we have ψ σψ x σψ ψ x = x, σ σ φ x = x τ ψ σ στ ψ x. Remark Using previous theorem, we can obtain the bound of maximum of stochastic process. A common technique to handle maximum term is to partition the underlying space into finite net and control variation on the small ball, for example, sup X t max X t i + i m t T sup dt,t i <δ X ti X t. Partitioning the space into δ-balls is deeply related to the covering number; it will affect the bound. As δ becomes small, variation on each δ-ball might be smaller, while controlling maximum of finite net becomes challengeable. Definition Let T, d be an arbitrary semi-metric space. Then the covering number Nɛ is the minimum number of balls of radius ɛ needed to cover T ; a collection of points is ɛ-separated if the distance between each pair of points is strictly larger than ɛ; the packing number Dɛ is the maximum number of ɛ-separated points in T. We can naturally guess that the packing number Dɛ would have similar value with the covering number Nɛ. Proposition Nɛ Dɛ N 2 ɛ. 36

38 Proof. First, for D = Dɛ,let t, t 2,, t D be maximal ɛ-separated points. Then since the set {t, t 2,, t D } is maximal, adding any other point in T makes the set not ɛ-separated. That is, It means that i.e., Nɛ Dɛ. t T t i s.t. dt, t i ɛ. T D B ɛ t j, j= Next, let D = Dɛ and N = Nɛ/2. Assume that D > N. Then t, t 2,, t D which are ɛ- separated points, and s, s 2,, s N which balls centered with cover T, i.e., T N j= B ɛ/2s j. Then because we assumed that D > N, there exist two points t i and t i t i, t i B ɛ/2 s j. However it is contradictory to the assumption that t i, t i D N. Now we are ready for our main result for maximal inequality. those belong to the same ball are ɛ-separated. Therefore Definition A stochastic process X t t T is separable if for any countable dense subset T 0 T and δ > 0, sup ds,t<ssδ s,t T X s X t = sup ds,t<δ s,t T 0 Lemma If 0 X n X, then X n ψ X ψ. Proof. First, it is obvious that Now, for any C < X ψ, by definition, X s X t a.s.. 0 X y = X ψ Y ψ. lim Eψ It implies that X n ψ for large n, i.e., Since C < X ψ was arbitrary, we get Xn C X = Eψ >. MCT C lim inf X n ψ C. lim inf X n ψ X ψ. 37

39 Meanwhile, X n X implies X n ψ X ψ, which gives lim X n ψ = X ψ. Theorem Maximal Inequality. Let ψ be convex, strictly increasing function satisfying ψ0 = 0 and 2.3. Also assume that stochastic process X t t T is separable and satisfies Then for any η, δ > 0, sup ds,t δ X s X t ψ C ds, t s, t T. 2.4 { η } X s X t K ψ Dɛdɛ + δψ D 2 η ψ holds, where K is a constant depending only on C and ψ. 0 Proof. Construct T 0 T T recursively to satisfy that T j is a maximal η 2 j -separated set containing T j. Then by the definition of packing number, cardt j Dη 2 j. Note that by maximality t j+ T j+ t j T j s.t. dt j, t j+ η 2 j. Link every t j+ T j+ to a unique t j T j s.t. dt j, t j+ < η 2 j make any mapping which satisfies dt j, t j+ < η 2 j ; how it can be possible is not our interest. Now call t k+, t k,, t 0 to a chain. Note that is countable and dense subset by construction in T. Since X t t T is separable, sup k= T k ds,t δ X s X t = ψ = ds,t δ s,t k= T k MCT k sup X s X t ψ lim sup X s X t ds,t δ s,t T k+ Now let s k+ s k s 0 and t k+ t k t 0 be chains. Then X sk+ X tk+ X sk+ X s0 X tk+ X t0 + X s0 X t0 }{{} ψ. 38

40 holds. Now we get k { = Xsj+ X sj X tj+ X tj } 2 j=0 where L j is the set of all links from T j+ to T j. Then we get and hence by theorem 2.3.0, sup 2 s k+,t k+ T k+ ψ cardl j Dη 2 j, k max X u X v, u,v L j j=0 k max X u X v u,v L j j=0 2K K 4K 4K ψ k ψ cardl j max X u X v ψ }{{} u,v L j }{{} j=0 ψ Dη 2 j C du,v Cη 2 j k ψ Dη 2 j η 2 j 2 4 j=0 η/2 0 η 0 ψ Dɛdɛ ψ Dɛdɛ. Now, to control X s0 X t0, conversely for each pair of end points s 0, t 0, choose unique pair Figure 2.2: k ψ Dη 2 j η 2 j 2 j=0 η/2 0 ψ Dɛdɛ. s k+, t k+ T k+ which is different from those in previous paragraph; there is some abuse of notation. Then X s0 X t0 + X sk+ X tk+ 39

41 again, and hence max X s0 X t0 s 0,t 0 T 0 max s ψ k+,t k+ T k+ + max X sk+ X ψ tk+ ψ 4K η 0 ψ Dɛdɛ + max X sk+ X tk+ ψ. Note that the number of possible pairs of s 0, t 0 and consequently s k+, t k+ is at most cardt 0 2 Dη 2, and thus by theorem again, max X sk+ X tk+ ψ K ψ D 2 η max X sk+ X tk+ ψ. Since X sk+ X tk+ ψ C ds k+, t k+, we get max X s X t 8K ds,t δ s,t T k+ ψ η 0 ψ Dɛdɛ + K ψd 2 η Cδ = 8K η 0 ψ Dɛdɛ + KδψD 2 η. Remark Why we decomposed X sk+ X tk+ as and X s0 X t0, and decomposed X s0 X t0 again? If we bound X sk+ X tk+ directly with similar argument, then we obtain the bound with term ψ D 2 η 2 j, which might not be so useful. How such maximal inequality can be used? Following is one example which gives the bound for sub-gaussian stochastic process. process. Before we start, we should define sub-gaussianity of stochastic Definition A stochastic process X t t T is sub-gaussian with respect to semi-metric d if P X s X t > x 2 exp x 2 2 d 2 x. s, t Example Any zero-mean Gaussian process is sub-gaussian with respect to L 2 -distance ds, t = σx s X t = EX s X t 2. Example Let ɛ, ɛ 2,, ɛ n be Rademacher r.v. and X a = a i ɛ i, a R n. 40

42 Then by Hoeffding inequalitym, P a i ɛ i x 2 exp x 2 2 a 2 It implies that X a a R n is sub-gaussian stochastic process with respect to Euclidean distance da, b = a b 2. To apply maximal inequality, we should verify the condition 2.4. Proposition For sub-gaussian stochastic process X t t T and ψ 2 x = e x2, Proof. It suffices to show that It comes from X s X t ψ2 6ds, t. Xs X t Eψ 2. 6ds, t Xs X t Xs X Eψ 2 t 2 = E exp 6ds, t 6d 2 s, t Xs X t 2 = P exp 0 6d 2 > x dx s, t = P X s X t > 6ds, t log + x dx 0 2 exp 6d 2 s, t log + x 2 d 2 dx s, t Now we get the desired result. = 0 0 = 2. 2e 3 log+x dx Corollary Let X t t T be separable sub-gaussian stochastic process. Then E sup ds,t δ X s X t δ Remark From now on, A B denotes that 0. log Dɛdɛ δ > 0. A c B for some universal constant c > 0. Proof. Apply theorem with ψ = ψ 2 and η = δ. Since the constant K in theorem depended 4

43 only on ψ and C, which are all given in this example, K becomes universal. Therefore we get E sup ds,t δ X s X t = sup sup sup ds,t δ ds,t δ ds,t δ δ ψ2 0 δ ψ2 0 X s X t X s X t 2 X s X t ψ2 Dɛdɛ + δψ 2 D2 δ Dɛdɛ + δψ 2 Dδ ψ2 x = log + x and hence we get ψ δ ψ 0 2 x2 2ψ2 x for x 0 2 Dɛdɛ ψ2 is increasing, while D is decreasing, and hence = δψ δ 0 δ 0 2 Dδ δ 0 ψ log + Dɛdɛ log Dɛdɛ 2 Dɛdɛ log + x 2 log x for sufficiently large x Remark Note that log Dɛ is an entropy. Thus, whether the value of bound integral be finite or not depends on how fast the entropy grows as δ goes to Symmetrization In empirical process, our final goal is to obtain Glivenko-Cantelli and Donsker s theorem. They can be obtained from measuring the space F via covering number or bracketing numbers. The former one requires symmetrization technique, while the other one requires Bernstein inequality as follows. Lemma Let X,, X m be arbitrary r.v. s with x 2 P X i > x 2e 2 b+ax x > 0 42

44 for a, b > 0. Then max i m X i a log + m + b log + m. ψ Remark The bound can be also represented as aψ m + bψ2 m. Proof. First note that holds for p q φ defined as X ψp X ψq log 2 q p ψ p xlog 2 p = φ ψ q xlog 2 q, i.e., φ = ψ p ψ q for ψp x = 2 xp is concave function with φ =, and hence by Jensen, which gives log 2 q p X ψq X ψp. Now holds. Now recall that Thus for we get Therefore we have = φ φ Eψ q log 2 q log 2 X q X ψq Eφ ψ q log 2 q log 2 X q X ψq = Eψ p log 2 p X q, X ψq P X i > x 2e 2 x 2 b+ax 2e x2 4b 2e x 4a 0 x b x > b K + /p P X > x Ke Cxp, p = X ψp proposition C max i m X i X i = X i I X i b + X i I X i > b, a a }{{}}{{} P > x 2e x2 4b and hence ψ2 b P > x 2e x 4a and hence ψ a. max ψ i m + max ψ i m ψ a a 43

45 max i m + max ψ2 i m ψ ψ2 m max ψ 2 + ψ m max ψ i m i m ψ2 m b + ψ ma. Now we see very useful technique, which is called symmetrization. Recall that in empirical process we consider following setting: i.i.d X, X 2,, X n P P n f = n G n f = n fx i fx i Pf. Symmetrization technique is formulated based on the fact that, for Rademacher random variables ɛ,, ɛ n, f P n Pf would have similar behavior with f P 0 nf := n ɛ i fx i. Theorem Symmetrization. Let φ be a convex non-decreasing function and F be a class of measurable functions. Then E φ P n P F E φ 2 P 0 n F. Proof. We prove only under the measurability condition. Recall that under measurability, we can use Fubini theorem. Let Y, Y 2,, Y n be independent copies of X, X 2,, X n. Then and hence by non-decreasingness of φ, P n P F = sup fx i EfX i f F n = sup fx f F n i E Y fy i E Y sup fx i fy i n, f F Eφ P n P F E X φ E Y sup f F n E X E Y φ n sup f F fx i fy i fx i fy i Jensen 44

46 holds. Now note that, by symmetricity, and hence fx i fy i d fy i fx i fx i fy i d e i fy i fx i for any e i {, } symmetrization!. Consequently, we have sup n f F d fx i fy i sup n for any e,, e n {, } n. Therefore we get f F e i fx i fy i Eφ P n P F E ɛ E X,Y ɛ φ sup ɛ i fx i fy i f F n { } 2 E ɛ E X,Y φ sup ɛ i fx i 2 f F n + sup 2 ɛ i fy i f F n { } 2 E 2 2 ɛ E X,Y φ sup ɛ f F n i fx i + E X,Y φ sup ɛ f F n i fy i = 2 E 2 ɛ2e X φ sup ɛ i fx i f F n 2 = Eφ sup ɛ i fx i n f F = Eφ2 P 0 n F. Example Consider φx = x m, m. Then by symmetrization. If P 0 n F is measurable, then holds. The term E P n P m F 2 m E P 0 n m F E P 0 n F = E P 0 n F = E X E ɛ X sup n E ɛ X sup f F n f F ɛ i fx i ɛ i fx i can be viewed as a supremum of stochastic process n a i ɛ i for constants a i s, and hence its bound can be obtained via, for instance, Hoeffding inequality. Note that such argument requires measurability! Thus considering the class of functions which makes the target process measurable is a natural 45

procedure.

Definition. A class F of measurable functions f : X → R on (X, A, P) is called a P-measurable class if

(X_1, …, X_n) ↦ ‖ Σ_{i=1}^n e_i f(X_i) ‖_F

is measurable on the completion of (X^n, A^n, P^n) for every n and every (e_1, …, e_n) ∈ {−1, 1}^n.
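To close the section, the symmetrization inequality (with φ(x) = x) can be checked by simulation for a class of half-line indicators. Everything below is an illustrative sketch, not part of the notes: P is Uniform(0,1), so Pf = c for f = 1_{(−∞,c]}, and the supremum is taken over a finite grid of c values.

```python
import random

random.seed(4)

n, reps = 50, 1000
grid = [k / 20.0 for k in range(1, 20)]  # F = {1(-inf, c] : c in grid}

emp, sym = 0.0, 0.0
for _ in range(reps):
    xs = [random.random() for _ in range(n)]            # X_i ~ P = Uniform(0, 1)
    signs = [random.choice((-1, 1)) for _ in range(n)]  # Rademacher epsilon_i
    # ||P_n - P||_F and the symmetrized ||P_n^0||_F for this sample
    emp += max(abs(sum(x <= c for x in xs) / n - c) for c in grid)
    sym += max(abs(sum(s * (x <= c) for s, x in zip(signs, xs))) / n for c in grid)
emp /= reps
sym /= reps
print(emp, 2 * sym)  # E||P_n - P||_F <= 2 E||P_n^0||_F
```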

Chapter 3

Applications to Empirical Processes

3.1 Glivenko–Cantelli Theorems

Now we are ready for our first goal in empirical process theory: a uniform LLN. First we use a bracketing argument; it does not require measurability.

Theorem 3.1.1 (Bracketing Glivenko–Cantelli). If N_[](ε, F, L₁(P)) < ∞ ∀ε > 0, then F is Glivenko–Cantelli, i.e.,

‖P_n − P‖_F → 0 a.s.

Proof. First note that an ε-bracket w.r.t. the L₁(P) norm is [l, u] with l ≤ f ≤ u and ‖u − l‖ = ∫(u − l) dP = P(u − l) < ε. For given ε > 0, choose finitely many ε-brackets [l_i, u_i], 1 ≤ i ≤ N, covering F. For each f ∈ F, ∃i s.t.

(P_n − P)f = P_n f − Pf ≤ P_n u_i − Pf = (P_n − P)u_i + P(u_i − f) < (P_n − P)u_i + ε.

If f is fixed, then i is also fixed, and hence by the SLLN, (P_n − P)u_i → 0 almost surely. Since there are only finitely many i, we have

max_{1≤i≤N} (P_n − P)u_i → 0 almost surely,

and therefore

sup_{f∈F} (P_n − P)f < max_{1≤i≤N} (P_n − P)u_i + ε.

Similarly we get

inf_{f∈F} (P_n − P)f > −ε + min_{1≤i≤N} (P_n − P)l_i,

49 and combining both we obtain Since ɛ > 0 was arbitrary, we get or lim sup P n Pf F ɛ almost surely. lim sup P n P F = 0 a.s., P a.s. 0. P n P F Example Let P be a probability measure on R and Then for given ɛ > 0, let Then F = {,c] : c R}. = t 0 < t < < t m = with Pt i, t i+ < ɛ i. [,ti ],,ti+ are ɛ-brackets covering F, and hence we get Glivenko-Cantelli theorem, sup F n t F t 0 almost surely. t Next argument for other type of Glivenko-Cantelli theorem uses symmetrization technique. mentioned in example 2.4.4, we need measurability condition in here. Theorem 3..3 Covering Glivenko-Cantelli. Let F be P-measurable and F be an envelope of F with P F <. Furthermore assume that where log Nɛ, F M, L P n = o P n M, ɛ > 0, F M = {f F M : f F}. As Then E P n P F = o i.e., it implies P n P F P 0. 48
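Before the proof, the conclusion of Example 3.1.2 — the classical Glivenko–Cantelli theorem — can be watched numerically. The Uniform(0,1) samples and the sample sizes below are arbitrary choices; for the uniform cdf the supremum sup_t |F_n(t) − F(t)| is attained at the order statistics.

```python
import random

random.seed(5)

def ks_stat(n):
    # sup_t |F_n(t) - F(t)| for Uniform(0,1) data; the sup is attained at order statistics
    xs = sorted(random.random() for _ in range(n))
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

dists = [ks_stat(n) for n in (100, 1000, 10000)]
print(dists)  # shrinks toward 0 as n grows, as the theorem asserts
```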

50 Proof. Denote gf F = sup f F gf. Then by symmetrization, E P n P F 2E X E ɛ n holds. Note that and hence we get = 2E X E ɛ n 2E X E ɛ n ɛ i fx i F measurability! ɛ i fx i IF X i M + ɛ i fx i IF X i > M n ɛ i fx i + 2E X E ɛ ɛ i fx i IF X i > M n FM }{{ F } = sup ɛ i fx i IF X i > M f F n fx i IF X i > M n n E P n P F 2E X E ɛ n F X i IF X i > M ɛ i fx i + 2E XF X i IF X i > M }{{} FM =2P F IF >M M 0 P F < Now, for given X,, X n and ɛ > 0, let G be an ɛ-covering of F M s.t. cardg = Nɛ, F M, L P n. Note that and hence It gives E ɛ n n ɛ i fx i n ɛ i fx i FM f F M g G s.t. P n g f < ɛ, ɛ i gx i + E ɛ n ɛ i fx i + ɛ G ɛ i fx i +ɛ n }{{} = X ψ X ψ2 X = E ɛ max f G ɛ i gx i fx i n }{{} n i gx i fx i =P n g f <ɛ.. F 49

51 max ɛ f G i fx i + ɛ n ψ2 X + log G max ɛ f G i fx i n ψ2 X + ɛ /2 log Nɛ, F M, L P n max fx i 2 + ɛ f G n = /2 log Nɛ, F M, L P n max fx i 2 +ɛ f G n n }{{} log Nɛ, F M, L P n M n + ɛ = o P + ɛ =P nf 2 /2 by the assumption log Nɛ, F M, L P n = o P n. In part, following argument is used: For constants a i s, we get from ɛ 2 i E exp Cn = and in consequence Eψ 2 C n n a i ɛ i = log 2 ψ2 /2 n 2 a i ɛ i = a 2 i /2 n a 2 E exp i ɛ 2 i C 2 = exp n 2 C 2 a i ɛ i exp n 2 C 2 a 2 i a 2 i 2 C n log 2 /2 a 2 i. Or we can use some general arguments using Hoeffding inequality; see following remark. Since ɛ > 0 was arbitrary, we get Note that and therefore by BCT, we get E ɛ n E X E ɛ n E ɛ n ɛ i fx i = o P. FM ɛ i fx i M; FM ɛ i fx i = o as n. FM 50

52 Remark In part, we used the argument only can be applied on Rademacher ɛ s. However, we can also find more general argument using Hoeffding inequality. Note that since each a i ɛ i s are sub-gaussian, by Hoeffding s inequality, we can find K and C s.t. Now proposition gives P n n a i ɛ i > x Ke Cx2. K + /2 a i ɛ i, C ψ2 where K = 2 and C = 2 a 2 i precisely. Remark To make red-colored part in the proof of previous theorem rigorous, one should construct G to satisfy f M for f G. It can be assumed without loss of generality; if not, one can truncate the function as f M M so that truncated one also covers F M and satisfies f M. Just one have to check that it is still ɛ-covering of F M ; let f F M and g G s.t. P n g f < ɛ. Then for g = g M M, P n g f = n holds. n = n i: M gx i M i: M gx i M gx i fx i + gx i fx i + gx i fx i = P n g f < ɛ 3.2 Donsker Theorems i: gx i >M i: gx i >M M fx i + gx i fx i + i: M>gX i i: M>gX i In here we consider two versions of Donsker s theorem. From now on, Q,2 denotes for a probability measure Q. f Q,2 = f 2 dq /2 fx i + M fx i gx i Theorem 3.2. Covering Donsker. Let F δ := {f g : f, g F, f g P,2 < δ} be P-measurable for any δ 0, ] and F be an envelope of F with P F 2 <. If 0 sup log Nɛ F Q,2, F, L 2 Qdɛ <, 3. Q when the supremum is taken over all finitely discrete probability measures, then F is P-Donsker. 5

53 Proof. It suffices to prove that G n is asymptotically tight, where G n = {G n f : f F} is regarded as a stochastic process with index set F. By theorem.4.8 note that each G n f converges weakly by classical CLT, which implies asymptotic tightness of each marginal it s enough to show that: i F is totally bounded in L 2 P norm; ii G n is asymptotically uniformly L 2 P-equicontinuous in probability. For this, we need following lemma: Lemma Let a n : [0, ] [0, be a sequence of non-decreasing functions. Then Proof of lemma. lim lim sup a n δ = 0 a n δ n = 0 δ n 0. δ 0 = Let {δ n } be nonincreasing sequence convergin to 0. ɛ > 0 δ 0 > 0 s.t. lim sup a n δ 0 < ɛ 2 and hence N s.t. n N a n δ 0 < ɛ. Since a n is nondecreasing, N s.t. n N δ n < δ 0. Thus = It s sufficient to show that: n N N = δ n < δ 0 = a n δ n a n δ 0 < ɛ. δ n 0 s t. lim sup a n δ n = lim lim sup a n δ. δ 0 Let C = lim δ 0 lim sup a n δ. Then for any δ > 0, we get lim sup a n δ C, because a n decreases as δ 0. It gives that for any δ > 0 and for any ɛ > 0, Thus, for every fixed m, i.e., N, N 2, N 3, s.t. a n δ > C ɛ i.o.. a n > C m m i.o., a N > C a N2 > C 2 2, N 2 > N 52

54 a N3 3 > C 3, N 3 > N 2 and so on. Take δ n as Then by definition, holds, which gives that,,, }{{} 2, 2,,, }{{ 2} 3, 3,,,. }{{ 3} N, N 2 N a Nk δ Nk > C k N 3 N 2 lim sup a n δ n C. However, since a n δ n a n δ for any fixed δ > 0 and large n enough, we have which gives Therefore we get lim sup a n δ n lim sup a n δ δ > 0, lim sup a n δ n C. lim sup a n δ n = C = lim lim sup a n δ n. δ 0 Now we show ii first. ii is equivalent to Note that by definition thus ii is again equivalent to x, η > 0 δ > 0 s.t. lim sup P sup f g P,2 <δ G n f G n g > x Lemma sup f g P,2 <δ G n f G n g = G n Fδ ; x, η > 0 δ > 0 s.t. lim sup P G n Fδ > x < η. Note that G n δ decreases as δ 0, which makes P G n Fδ > x also non-decreasing of δ. Thus it is equivalent to which is also same as lim δ 0 lim sup P G n Fδ > x < η x > 0, < η. lim P G n Fδn > x < η x > 0 δ n

55 by the lemma. Now we will show 3.2 instead of ii. For given x > 0 and δ n 0, P G n Fδn > x x E G n Fδn 2 x E n ɛ i fx i Fδn symmetrization holds. Note that E becomes E in blue-colored part from the measurability of F δn. Now, note that where f n = n P ɛ X n ɛ i fx i gx i < x 2 exp x 2 2 f g 2, n fx i 2 by Hoeffding s inequality cf. example 2.3.2, which implies that the stochastic process f n corollary E ɛ X n ɛ i fx i is sub-gaussian w.r.t. n. Then by maximal inequality ɛ i fx i E ɛ X sup Fδn δ 0 f F δn f g n<δ holds for any δ > 0 and g F δn. Using 0 F δn E ɛ X n n ɛ i fx i gx i + E ɛ X n log Dɛ, Fδn, n dɛ + E ɛ X n ɛ i fx i Fδn Now using Dɛ Nɛ/2, we can obtain that E ɛ X n ɛ i gx i ɛ i gx i ɛ i fx i Fδn = 0 θn 0 and letting δ very big MCT, we can obtain log Nɛ, Fδn, n dɛ 0 log Dɛ, Fδn, n dɛ. log Nɛ, Fδn, n dɛ θ n = sup f n f F δn Nɛ, F δn, n = for large ɛ θn/ F n 0 θn/ F n 0 θn/ F n 0 log Nɛ F n, F, n dɛ F n F δn F sup log Nɛ F Q,2, F, L 2 Qdɛ F n Q sup Q log N ɛ 2 F Q,2, F, L 2 Q f Q,2 < ɛ, g Q,2 < ɛ f g Q,2 < 2ɛ, dɛ F n 54

56 Note that F n = n = which implies N2ɛ, F, L 2 Q N 2 ɛ, F, L 2 Q θn/2 F n log Nɛ F Q,2, F, L 2 Qdɛ 2 F n 0 θn/ F n 0 sup Q sup log Nɛ F Q,2, F, L 2 Qdɛ F n. Q F X i 2 converges to a positive constant by SLLN and the assumption, E X F 2 n = P F 2 <. Hence we get: θn/ F n E X sup log Nɛ F Q,2, F, L 2 Qdɛ F n 0 Q θn = E X sup log Nɛ F Q,2, F, L 2 Q F n I > ɛ dɛ 0 Q F n θn = sup log Nɛ F Q,2, F, L 2 QE F n I > ɛ dɛ. F n 0 Q By uniform entropy condition and DCT, the last term converges to 0 as n if E F n I θn > ɛ 0 ɛ > 0. F n If θ n / F n converges to 0 in probability, then Cauchy-Schwarz gives E F n Iθ n > ɛ F n E F 2 n P θ n > ɛ F n }{{}}{{} < 0 which gives the desired result. Thus our claim is that θ n / F n converges to a positive constant; therefore our final claim is: and Claim. θ n = o P. By definition, 0, P 0. However note that F n θn 2 = sup f 2 n = sup P n f 2 sup P n Pf 2 + sup Pf 2 sup P n Pf 2 + sup Pf 2 f F δn f F δn f F δn f F δn f F f F δn sup Pf 2 δn 2 0 def of F δ f F δn hold. Furthermore, since 4F 2 is an integrable envelope of G = {f 2 : f F }, we get for f, g F P n f 2 g 2 = P n f g f + g P n f g 4F f g n 4F n Cauchy-Schwarz f 2F 55

57 and hence Nɛ 2F 2 n, G, L P n Nɛ F n, F, n sup Nɛ F Q,2, F, Q,2. f g n ɛ F n P n f 2 g 2 f g n 4F n ɛ F n 4F n = ɛ 2F 2 n Hence Nɛ 2F 2 n, G, L P n is bounded by a fixed number depending only on ɛ, i.e., It implies that Nɛ 2F 2 n, G, L P n = O P ɛ > 0. log Nɛ, G, L P n = o P n ɛ > 0, cf. see following remark which implies that G is Glivenko-Cantelli thm 3..3, i.e., Claim sup P n Pf = sup P n Pf 2 P f G f F Q 0. Remark Assume that Nɛ 2F 2 n, G, L P n = O P for any ɛ > 0. For each ω, M > 0 and N s.t. and hence n > N = 2F 2 nω M, NɛM, G, L P n Nɛ 2F 2 n, G, L P n for such M and n. log Nɛ 2F 2 n, G, L P n = o P n ɛ > 0 implies that log Nɛ, G, L P n = o P n ɛ > 0. Proof Cont d. Now we show i. Since G is Glivenko-Cantelli, there exists a finitely discrete measure P n with P n Pf 2 F Meanwhile, by the uniform entropy condition, we get i.e., 0 log Nɛ F Pn,2, F, L 2 P n dɛ = F Pn,2 0. Nɛ, F, L 2 P n < ɛ > 0. 0 log Nɛ, F, L 2 P n dɛ <, 56

58 For f, g F, P n f g 2 < ɛ 2 implies Pf g 2 = P P n f g 2 + P n f g 2 P P n 2f 2 + 2g 2 +P }{{} n f g 2 ɛ 2 + ɛ 2 = 2ɛ 2 4 P n Pf 2 F for large n enough so that P n Pf 2 F ɛ 2 /4. It implies that for large n, i.e., for large n. Therefore we obtain i.e., F is totally bounded w.r.t L 2 P-norm. f g Pn,2 ɛ = f g P,2 2ɛ ɛ Nɛ, F, L 2 P N, F, L 2 P n < 2 Nɛ, F, L 2 P < ɛ > 0, Next we consider bracketing Donsker s theorem. It uses Bernstein s inequality in the proof. From now on, let F be a set of measurable functions with envelope F satisfying P F 2 <. Lemma If F < and f < for any f F, then Proof. Note that f E G n F max log F + max f F n G n f = n f F f P,2 fx i Pf. Each fx i Pf/ n has mean zero and satisfies Bernstein condition log F. 3.3 [ E fx i Pf k fxi Pf 2 ] fx = E i Pf k 2 n n n [ 2 f ] k 2 2f 2 X E i + Pf 2 n n 2 k 2 n Pf 2 2 k f n 2 k k! 4Pf 2 k 2 2n k! f n 57

59 holds. Thus by Bernstein s inequality, P G n f > x 2 exp x 2 2 4Pf 2 n + f 2 exp x 2 2 x 4 max f + max x n f F f F n holds for any x > 0 for large n. Now maximal inequality lemma 2.4. gives the conclusion E G n F max G nf f F ψ f max log + F + 4 max Pf 2 log + F f F n f F f max log F + max Pf 2 log F. f F n f F Theorem Bracketing Donsker. If then F is P-Donsker. 0 log N [ ] ɛ, F, L 2 Pdɛ <, Remark We use chaining technique and previous lemma in the proof. However, as the condition f satisfying < is required to apply the lemma, we should truncate the terms with the order f log F f P,2 n so that two terms in the RHS of 3.3 have equal order. Proof. There exists an envelope F of F with P F 2 < Recall remark ; bracketing number is larger than covering number with same diameter. Finiteness of the integral gives that N [ ] ɛ, F, L 2 P = for large ɛ. Let [l, u] be the only bracket covering F with u l P,2 < M. Also we get 0 log Nɛ, F, L 2 Pdɛ <, i.e., Nɛ, F, L 2 P is finite for any ɛ > 0. It implies that F is totally bounded; so F is bounded. Thus P u + l 2 2P u 2 + l 2 and u P,2 u f P,2 + f P,2 <, 58

60 l P,2 f l P,2 + f P,2 < for f F implies that P u + l 2 <. Letting F = sup u, l u + l, we get an envelope F of F with P F 2 <. For q, construct a sequence of nested partitions s.t. F q,i is a 2 q -bracket in P,2 and F = N q F q,i 2 q log N q <. 3.4 q= Figure 3.: Nested Partition i F q,i. Of course we have to show that we can find such partition satisfying 3.4. Note that N q is equal to the sum of the number of partitions of each F q,i, i.e., It implies that N q N q N [ ] 2 q, F, L 2 P. Figure 3.2: Relationship between N q and N q. log Nq log N q + log N [ ] 2 q, F, L 2 P log N q + log N [ ] 2 q, F, L 2 P a + b a + b 59

61 log N q 2 + log N [ ] 2 q, F, L 2 P + log N [ ] 2 q, F, L 2 P log N + and therefore 2 q log N q q= q log N [ ] 2 p, F, L 2 P, p=2 q= 2 q log N + = log N + = log N + log N + < log N + q= p= p= q=p q log N [ ] 2 p, F, L 2 P p=2 q 2 q log N [ ] 2 p, F, L 2 P 2 q log N [ ] 2 p, F, L 2 P 2 p log N [ ] 2 p, F, L 2 P p= 0 log N [ ] ɛ, F, L 2 Pdɛ holds, which yields 3.4. Now, fix f q,i F q,i fix representatives of each partition, and for f F q,i, define π q f := f q,i q f := sup g h g,h F q,i Since each F q,i is 2 q -bracket, g h P,2 2 q, and hence projection to the space of representatives. variation on each partition P q f 2 2 q. Figure 3.3: F q,i and representative f q,i. 60


Continuous Functions on Metric Spaces Continuous Functions on Metric Spaces Math 201A, Fall 2016 1 Continuous functions Definition 1. Let (X, d X ) and (Y, d Y ) be metric spaces. A function f : X Y is continuous at a X if for every ɛ > 0

More information

THEOREMS, ETC., FOR MATH 515

THEOREMS, ETC., FOR MATH 515 THEOREMS, ETC., FOR MATH 515 Proposition 1 (=comment on page 17). If A is an algebra, then any finite union or finite intersection of sets in A is also in A. Proposition 2 (=Proposition 1.1). For every

More information

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure? MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due 9/5). Prove that every countable set A is measurable and µ(a) = 0. 2 (Bonus). Let A consist of points (x, y) such that either x or y is

More information

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University February 7, 2007 2 Contents 1 Metric Spaces 1 1.1 Basic definitions...........................

More information

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS PROBABILITY: LIMIT THEOREMS II, SPRING 218. HOMEWORK PROBLEMS PROF. YURI BAKHTIN Instructions. You are allowed to work on solutions in groups, but you are required to write up solutions on your own. Please

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini April 27, 2018 1 / 80 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d

More information

MATH 6605: SUMMARY LECTURE NOTES

MATH 6605: SUMMARY LECTURE NOTES MATH 6605: SUMMARY LECTURE NOTES These notes summarize the lectures on weak convergence of stochastic processes. If you see any typos, please let me know. 1. Construction of Stochastic rocesses A stochastic

More information

Exercise Solutions to Functional Analysis

Exercise Solutions to Functional Analysis Exercise Solutions to Functional Analysis Note: References refer to M. Schechter, Principles of Functional Analysis Exersize that. Let φ,..., φ n be an orthonormal set in a Hilbert space H. Show n f n

More information

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9 MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended

More information

7 Complete metric spaces and function spaces

7 Complete metric spaces and function spaces 7 Complete metric spaces and function spaces 7.1 Completeness Let (X, d) be a metric space. Definition 7.1. A sequence (x n ) n N in X is a Cauchy sequence if for any ɛ > 0, there is N N such that n, m

More information

Metric Spaces. Exercises Fall 2017 Lecturer: Viveka Erlandsson. Written by M.van den Berg

Metric Spaces. Exercises Fall 2017 Lecturer: Viveka Erlandsson. Written by M.van den Berg Metric Spaces Exercises Fall 2017 Lecturer: Viveka Erlandsson Written by M.van den Berg School of Mathematics University of Bristol BS8 1TW Bristol, UK 1 Exercises. 1. Let X be a non-empty set, and suppose

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations

Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research

More information

4 Expectation & the Lebesgue Theorems

4 Expectation & the Lebesgue Theorems STA 205: Probability & Measure Theory Robert L. Wolpert 4 Expectation & the Lebesgue Theorems Let X and {X n : n N} be random variables on a probability space (Ω,F,P). If X n (ω) X(ω) for each ω Ω, does

More information

Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension. n=1

Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension. n=1 Chapter 2 Probability measures 1. Existence Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension to the generated σ-field Proof of Theorem 2.1. Let F 0 be

More information

PROBLEMS. (b) (Polarization Identity) Show that in any inner product space

PROBLEMS. (b) (Polarization Identity) Show that in any inner product space 1 Professor Carl Cowen Math 54600 Fall 09 PROBLEMS 1. (Geometry in Inner Product Spaces) (a) (Parallelogram Law) Show that in any inner product space x + y 2 + x y 2 = 2( x 2 + y 2 ). (b) (Polarization

More information

Weak convergence and Brownian Motion. (telegram style notes) P.J.C. Spreij

Weak convergence and Brownian Motion. (telegram style notes) P.J.C. Spreij Weak convergence and Brownian Motion (telegram style notes) P.J.C. Spreij this version: December 8, 2006 1 The space C[0, ) In this section we summarize some facts concerning the space C[0, ) of real

More information

4th Preparation Sheet - Solutions

4th Preparation Sheet - Solutions Prof. Dr. Rainer Dahlhaus Probability Theory Summer term 017 4th Preparation Sheet - Solutions Remark: Throughout the exercise sheet we use the two equivalent definitions of separability of a metric space

More information

Stat 8112 Lecture Notes Weak Convergence in Metric Spaces Charles J. Geyer January 23, Metric Spaces

Stat 8112 Lecture Notes Weak Convergence in Metric Spaces Charles J. Geyer January 23, Metric Spaces Stat 8112 Lecture Notes Weak Convergence in Metric Spaces Charles J. Geyer January 23, 2013 1 Metric Spaces Let X be an arbitrary set. A function d : X X R is called a metric if it satisfies the folloing

More information

Lecture 2: Uniform Entropy

Lecture 2: Uniform Entropy STAT 583: Advanced Theory of Statistical Inference Spring 218 Lecture 2: Uniform Entropy Lecturer: Fang Han April 16 Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal

More information

SUMMARY OF RESULTS ON PATH SPACES AND CONVERGENCE IN DISTRIBUTION FOR STOCHASTIC PROCESSES

SUMMARY OF RESULTS ON PATH SPACES AND CONVERGENCE IN DISTRIBUTION FOR STOCHASTIC PROCESSES SUMMARY OF RESULTS ON PATH SPACES AND CONVERGENCE IN DISTRIBUTION FOR STOCHASTIC PROCESSES RUTH J. WILLIAMS October 2, 2017 Department of Mathematics, University of California, San Diego, 9500 Gilman Drive,

More information

Probability and Measure

Probability and Measure Part II Year 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2018 84 Paper 4, Section II 26J Let (X, A) be a measurable space. Let T : X X be a measurable map, and µ a probability

More information

Real Analysis Problems

Real Analysis Problems Real Analysis Problems Cristian E. Gutiérrez September 14, 29 1 1 CONTINUITY 1 Continuity Problem 1.1 Let r n be the sequence of rational numbers and Prove that f(x) = 1. f is continuous on the irrationals.

More information

Applied Analysis (APPM 5440): Final exam 1:30pm 4:00pm, Dec. 14, Closed books.

Applied Analysis (APPM 5440): Final exam 1:30pm 4:00pm, Dec. 14, Closed books. Applied Analysis APPM 44: Final exam 1:3pm 4:pm, Dec. 14, 29. Closed books. Problem 1: 2p Set I = [, 1]. Prove that there is a continuous function u on I such that 1 ux 1 x sin ut 2 dt = cosx, x I. Define

More information

STA 711: Probability & Measure Theory Robert L. Wolpert

STA 711: Probability & Measure Theory Robert L. Wolpert STA 711: Probability & Measure Theory Robert L. Wolpert 6 Independence 6.1 Independent Events A collection of events {A i } F in a probability space (Ω,F,P) is called independent if P[ i I A i ] = P[A

More information

X n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2)

X n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2) 14:17 11/16/2 TOPIC. Convergence in distribution and related notions. This section studies the notion of the so-called convergence in distribution of real random variables. This is the kind of convergence

More information

x log x, which is strictly convex, and use Jensen s Inequality:

x log x, which is strictly convex, and use Jensen s Inequality: 2. Information measures: mutual information 2.1 Divergence: main inequality Theorem 2.1 (Information Inequality). D(P Q) 0 ; D(P Q) = 0 iff P = Q Proof. Let ϕ(x) x log x, which is strictly convex, and

More information

Metric Spaces and Topology

Metric Spaces and Topology Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies

More information

The main results about probability measures are the following two facts:

The main results about probability measures are the following two facts: Chapter 2 Probability measures The main results about probability measures are the following two facts: Theorem 2.1 (extension). If P is a (continuous) probability measure on a field F 0 then it has a

More information

ELEMENTS OF PROBABILITY THEORY

ELEMENTS OF PROBABILITY THEORY ELEMENTS OF PROBABILITY THEORY Elements of Probability Theory A collection of subsets of a set Ω is called a σ algebra if it contains Ω and is closed under the operations of taking complements and countable

More information

Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem

Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem 56 Chapter 7 Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem Recall that C(X) is not a normed linear space when X is not compact. On the other hand we could use semi

More information

Convergence of Feller Processes

Convergence of Feller Processes Chapter 15 Convergence of Feller Processes This chapter looks at the convergence of sequences of Feller processes to a iting process. Section 15.1 lays some ground work concerning weak convergence of processes

More information

2) Let X be a compact space. Prove that the space C(X) of continuous real-valued functions is a complete metric space.

2) Let X be a compact space. Prove that the space C(X) of continuous real-valued functions is a complete metric space. University of Bergen General Functional Analysis Problems with solutions 6 ) Prove that is unique in any normed space. Solution of ) Let us suppose that there are 2 zeros and 2. Then = + 2 = 2 + = 2. 2)

More information

REAL AND COMPLEX ANALYSIS

REAL AND COMPLEX ANALYSIS REAL AND COMPLE ANALYSIS Third Edition Walter Rudin Professor of Mathematics University of Wisconsin, Madison Version 1.1 No rights reserved. Any part of this work can be reproduced or transmitted in any

More information

Fundamental Inequalities, Convergence and the Optional Stopping Theorem for Continuous-Time Martingales

Fundamental Inequalities, Convergence and the Optional Stopping Theorem for Continuous-Time Martingales Fundamental Inequalities, Convergence and the Optional Stopping Theorem for Continuous-Time Martingales Prakash Balachandran Department of Mathematics Duke University April 2, 2008 1 Review of Discrete-Time

More information

Brownian Motion and Conditional Probability

Brownian Motion and Conditional Probability Math 561: Theory of Probability (Spring 2018) Week 10 Brownian Motion and Conditional Probability 10.1 Standard Brownian Motion (SBM) Brownian motion is a stochastic process with both practical and theoretical

More information

Lecture 4 Lebesgue spaces and inequalities

Lecture 4 Lebesgue spaces and inequalities Lecture 4: Lebesgue spaces and inequalities 1 of 10 Course: Theory of Probability I Term: Fall 2013 Instructor: Gordan Zitkovic Lecture 4 Lebesgue spaces and inequalities Lebesgue spaces We have seen how

More information

A Concise Course on Stochastic Partial Differential Equations

A Concise Course on Stochastic Partial Differential Equations A Concise Course on Stochastic Partial Differential Equations Michael Röckner Reference: C. Prevot, M. Röckner: Springer LN in Math. 1905, Berlin (2007) And see the references therein for the original

More information

Integral Jensen inequality

Integral Jensen inequality Integral Jensen inequality Let us consider a convex set R d, and a convex function f : (, + ]. For any x,..., x n and λ,..., λ n with n λ i =, we have () f( n λ ix i ) n λ if(x i ). For a R d, let δ a

More information

Probability and Measure

Probability and Measure Probability and Measure Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Convergence of Random Variables 1. Convergence Concepts 1.1. Convergence of Real

More information

(convex combination!). Use convexity of f and multiply by the common denominator to get. Interchanging the role of x and y, we obtain that f is ( 2M ε

(convex combination!). Use convexity of f and multiply by the common denominator to get. Interchanging the role of x and y, we obtain that f is ( 2M ε 1. Continuity of convex functions in normed spaces In this chapter, we consider continuity properties of real-valued convex functions defined on open convex sets in normed spaces. Recall that every infinitedimensional

More information

Continuity of convex functions in normed spaces

Continuity of convex functions in normed spaces Continuity of convex functions in normed spaces In this chapter, we consider continuity properties of real-valued convex functions defined on open convex sets in normed spaces. Recall that every infinitedimensional

More information

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS PROBABILITY: LIMIT THEOREMS II, SPRING 15. HOMEWORK PROBLEMS PROF. YURI BAKHTIN Instructions. You are allowed to work on solutions in groups, but you are required to write up solutions on your own. Please

More information

1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989),

1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989), Real Analysis 2, Math 651, Spring 2005 April 26, 2005 1 Real Analysis 2, Math 651, Spring 2005 Krzysztof Chris Ciesielski 1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer

More information

Random Process Lecture 1. Fundamentals of Probability

Random Process Lecture 1. Fundamentals of Probability Random Process Lecture 1. Fundamentals of Probability Husheng Li Min Kao Department of Electrical Engineering and Computer Science University of Tennessee, Knoxville Spring, 2016 1/43 Outline 2/43 1 Syllabus

More information

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability... Functional Analysis Franck Sueur 2018-2019 Contents 1 Metric spaces 1 1.1 Definitions........................................ 1 1.2 Completeness...................................... 3 1.3 Compactness......................................

More information

1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3

1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3 Index Page 1 Topology 2 1.1 Definition of a topology 2 1.2 Basis (Base) of a topology 2 1.3 The subspace topology & the product topology on X Y 3 1.4 Basic topology concepts: limit points, closed sets,

More information

Honours Analysis III

Honours Analysis III Honours Analysis III Math 354 Prof. Dmitry Jacobson Notes Taken By: R. Gibson Fall 2010 1 Contents 1 Overview 3 1.1 p-adic Distance............................................ 4 2 Introduction 5 2.1 Normed

More information

Concentration inequalities and the entropy method

Concentration inequalities and the entropy method Concentration inequalities and the entropy method Gábor Lugosi ICREA and Pompeu Fabra University Barcelona what is concentration? We are interested in bounding random fluctuations of functions of many

More information

2. Dual space is essential for the concept of gradient which, in turn, leads to the variational analysis of Lagrange multipliers.

2. Dual space is essential for the concept of gradient which, in turn, leads to the variational analysis of Lagrange multipliers. Chapter 3 Duality in Banach Space Modern optimization theory largely centers around the interplay of a normed vector space and its corresponding dual. The notion of duality is important for the following

More information

MA651 Topology. Lecture 10. Metric Spaces.

MA651 Topology. Lecture 10. Metric Spaces. MA65 Topology. Lecture 0. Metric Spaces. This text is based on the following books: Topology by James Dugundgji Fundamental concepts of topology by Peter O Neil Linear Algebra and Analysis by Marc Zamansky

More information

1 Glivenko-Cantelli type theorems

1 Glivenko-Cantelli type theorems STA79 Lecture Spring Semester Glivenko-Cantelli type theorems Given i.i.d. observations X,..., X n with unknown distribution function F (t, consider the empirical (sample CDF ˆF n (t = I [Xi t]. n Then

More information

Compact operators on Banach spaces

Compact operators on Banach spaces Compact operators on Banach spaces Jordan Bell jordan.bell@gmail.com Department of Mathematics, University of Toronto November 12, 2017 1 Introduction In this note I prove several things about compact

More information

Wiener Measure and Brownian Motion

Wiener Measure and Brownian Motion Chapter 16 Wiener Measure and Brownian Motion Diffusion of particles is a product of their apparently random motion. The density u(t, x) of diffusing particles satisfies the diffusion equation (16.1) u

More information

Problem set 1, Real Analysis I, Spring, 2015.

Problem set 1, Real Analysis I, Spring, 2015. Problem set 1, Real Analysis I, Spring, 015. (1) Let f n : D R be a sequence of functions with domain D R n. Recall that f n f uniformly if and only if for all ɛ > 0, there is an N = N(ɛ) so that if n

More information

Part II Probability and Measure

Part II Probability and Measure Part II Probability and Measure Theorems Based on lectures by J. Miller Notes taken by Dexter Chua Michaelmas 2016 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

Analysis Finite and Infinite Sets The Real Numbers The Cantor Set

Analysis Finite and Infinite Sets The Real Numbers The Cantor Set Analysis Finite and Infinite Sets Definition. An initial segment is {n N n n 0 }. Definition. A finite set can be put into one-to-one correspondence with an initial segment. The empty set is also considered

More information

The Arzelà-Ascoli Theorem

The Arzelà-Ascoli Theorem John Nachbar Washington University March 27, 2016 The Arzelà-Ascoli Theorem The Arzelà-Ascoli Theorem gives sufficient conditions for compactness in certain function spaces. Among other things, it helps

More information

converges as well if x < 1. 1 x n x n 1 1 = 2 a nx n

converges as well if x < 1. 1 x n x n 1 1 = 2 a nx n Solve the following 6 problems. 1. Prove that if series n=1 a nx n converges for all x such that x < 1, then the series n=1 a n xn 1 x converges as well if x < 1. n For x < 1, x n 0 as n, so there exists

More information

MATH MEASURE THEORY AND FOURIER ANALYSIS. Contents

MATH MEASURE THEORY AND FOURIER ANALYSIS. Contents MATH 3969 - MEASURE THEORY AND FOURIER ANALYSIS ANDREW TULLOCH Contents 1. Measure Theory 2 1.1. Properties of Measures 3 1.2. Constructing σ-algebras and measures 3 1.3. Properties of the Lebesgue measure

More information

Commutative Banach algebras 79

Commutative Banach algebras 79 8. Commutative Banach algebras In this chapter, we analyze commutative Banach algebras in greater detail. So we always assume that xy = yx for all x, y A here. Definition 8.1. Let A be a (commutative)

More information

Weak convergence. Amsterdam, 13 November Leiden University. Limit theorems. Shota Gugushvili. Generalities. Criteria

Weak convergence. Amsterdam, 13 November Leiden University. Limit theorems. Shota Gugushvili. Generalities. Criteria Weak Leiden University Amsterdam, 13 November 2013 Outline 1 2 3 4 5 6 7 Definition Definition Let µ, µ 1, µ 2,... be probability measures on (R, B). It is said that µ n converges weakly to µ, and we then

More information

MTH 404: Measure and Integration

MTH 404: Measure and Integration MTH 404: Measure and Integration Semester 2, 2012-2013 Dr. Prahlad Vaidyanathan Contents I. Introduction....................................... 3 1. Motivation................................... 3 2. The

More information

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R.

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R. Ergodic Theorems Samy Tindel Purdue University Probability Theory 2 - MA 539 Taken from Probability: Theory and examples by R. Durrett Samy T. Ergodic theorems Probability Theory 1 / 92 Outline 1 Definitions

More information

Stability of optimization problems with stochastic dominance constraints

Stability of optimization problems with stochastic dominance constraints Stability of optimization problems with stochastic dominance constraints D. Dentcheva and W. Römisch Stevens Institute of Technology, Hoboken Humboldt-University Berlin www.math.hu-berlin.de/~romisch SIAM

More information

STA205 Probability: Week 8 R. Wolpert

STA205 Probability: Week 8 R. Wolpert INFINITE COIN-TOSS AND THE LAWS OF LARGE NUMBERS The traditional interpretation of the probability of an event E is its asymptotic frequency: the limit as n of the fraction of n repeated, similar, and

More information

Exercises Measure Theoretic Probability

Exercises Measure Theoretic Probability Exercises Measure Theoretic Probability 2002-2003 Week 1 1. Prove the folloing statements. (a) The intersection of an arbitrary family of d-systems is again a d- system. (b) The intersection of an arbitrary

More information

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3 Brownian Motion Contents 1 Definition 2 1.1 Brownian Motion................................. 2 1.2 Wiener measure.................................. 3 2 Construction 4 2.1 Gaussian process.................................

More information

MATH 722, COMPLEX ANALYSIS, SPRING 2009 PART 5

MATH 722, COMPLEX ANALYSIS, SPRING 2009 PART 5 MATH 722, COMPLEX ANALYSIS, SPRING 2009 PART 5.. The Arzela-Ascoli Theorem.. The Riemann mapping theorem Let X be a metric space, and let F be a family of continuous complex-valued functions on X. We have

More information

THE SKOROKHOD OBLIQUE REFLECTION PROBLEM IN A CONVEX POLYHEDRON

THE SKOROKHOD OBLIQUE REFLECTION PROBLEM IN A CONVEX POLYHEDRON GEORGIAN MATHEMATICAL JOURNAL: Vol. 3, No. 2, 1996, 153-176 THE SKOROKHOD OBLIQUE REFLECTION PROBLEM IN A CONVEX POLYHEDRON M. SHASHIASHVILI Abstract. The Skorokhod oblique reflection problem is studied

More information

From now on, we will represent a metric space with (X, d). Here are some examples: i=1 (x i y i ) p ) 1 p, p 1.

From now on, we will represent a metric space with (X, d). Here are some examples: i=1 (x i y i ) p ) 1 p, p 1. Chapter 1 Metric spaces 1.1 Metric and convergence We will begin with some basic concepts. Definition 1.1. (Metric space) Metric space is a set X, with a metric satisfying: 1. d(x, y) 0, d(x, y) = 0 x

More information

An exponential family of distributions is a parametric statistical model having densities with respect to some positive measure λ of the form.

An exponential family of distributions is a parametric statistical model having densities with respect to some positive measure λ of the form. Stat 8112 Lecture Notes Asymptotics of Exponential Families Charles J. Geyer January 23, 2013 1 Exponential Families An exponential family of distributions is a parametric statistical model having densities

More information

Midterm 1. Every element of the set of functions is continuous

Midterm 1. Every element of the set of functions is continuous Econ 200 Mathematics for Economists Midterm Question.- Consider the set of functions F C(0, ) dened by { } F = f C(0, ) f(x) = ax b, a A R and b B R That is, F is a subset of the set of continuous functions

More information

Problem 1: Compactness (12 points, 2 points each)

Problem 1: Compactness (12 points, 2 points each) Final exam Selected Solutions APPM 5440 Fall 2014 Applied Analysis Date: Tuesday, Dec. 15 2014, 10:30 AM to 1 PM You may assume all vector spaces are over the real field unless otherwise specified. Your

More information

An introduction to some aspects of functional analysis

An introduction to some aspects of functional analysis An introduction to some aspects of functional analysis Stephen Semmes Rice University Abstract These informal notes deal with some very basic objects in functional analysis, including norms and seminorms

More information

CHAPTER VIII HILBERT SPACES

CHAPTER VIII HILBERT SPACES CHAPTER VIII HILBERT SPACES DEFINITION Let X and Y be two complex vector spaces. A map T : X Y is called a conjugate-linear transformation if it is a reallinear transformation from X into Y, and if T (λx)

More information

Brownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539

Brownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539 Brownian motion Samy Tindel Purdue University Probability Theory 2 - MA 539 Mostly taken from Brownian Motion and Stochastic Calculus by I. Karatzas and S. Shreve Samy T. Brownian motion Probability Theory

More information

δ xj β n = 1 n Theorem 1.1. The sequence {P n } satisfies a large deviation principle on M(X) with the rate function I(β) given by

δ xj β n = 1 n Theorem 1.1. The sequence {P n } satisfies a large deviation principle on M(X) with the rate function I(β) given by . Sanov s Theorem Here we consider a sequence of i.i.d. random variables with values in some complete separable metric space X with a common distribution α. Then the sample distribution β n = n maps X

More information

d(x n, x) d(x n, x nk ) + d(x nk, x) where we chose any fixed k > N

d(x n, x) d(x n, x nk ) + d(x nk, x) where we chose any fixed k > N Problem 1. Let f : A R R have the property that for every x A, there exists ɛ > 0 such that f(t) > ɛ if t (x ɛ, x + ɛ) A. If the set A is compact, prove there exists c > 0 such that f(x) > c for all x

More information

Stochastic integration. P.J.C. Spreij

Stochastic integration. P.J.C. Spreij Stochastic integration P.J.C. Spreij this version: April 22, 29 Contents 1 Stochastic processes 1 1.1 General theory............................... 1 1.2 Stopping times...............................

More information

The Lebesgue Integral

The Lebesgue Integral The Lebesgue Integral Brent Nelson In these notes we give an introduction to the Lebesgue integral, assuming only a knowledge of metric spaces and the iemann integral. For more details see [1, Chapters

More information

Aliprantis, Border: Infinite-dimensional Analysis A Hitchhiker s Guide

Aliprantis, Border: Infinite-dimensional Analysis A Hitchhiker s Guide aliprantis.tex May 10, 2011 Aliprantis, Border: Infinite-dimensional Analysis A Hitchhiker s Guide Notes from [AB2]. 1 Odds and Ends 2 Topology 2.1 Topological spaces Example. (2.2) A semimetric = triangle

More information

Real Analysis, 2nd Edition, G.B.Folland Elements of Functional Analysis

Real Analysis, 2nd Edition, G.B.Folland Elements of Functional Analysis Real Analysis, 2nd Edition, G.B.Folland Chapter 5 Elements of Functional Analysis Yung-Hsiang Huang 5.1 Normed Vector Spaces 1. Note for any x, y X and a, b K, x+y x + y and by ax b y x + b a x. 2. It

More information

Math212a1413 The Lebesgue integral.

Math212a1413 The Lebesgue integral. Math212a1413 The Lebesgue integral. October 28, 2014 Simple functions. In what follows, (X, F, m) is a space with a σ-field of sets, and m a measure on F. The purpose of today s lecture is to develop the

More information

Introduction and Preliminaries

Introduction and Preliminaries Chapter 1 Introduction and Preliminaries This chapter serves two purposes. The first purpose is to prepare the readers for the more systematic development in later chapters of methods of real analysis

More information

The Central Limit Theorem: More of the Story

The Central Limit Theorem: More of the Story The Central Limit Theorem: More of the Story Steven Janke November 2015 Steven Janke (Seminar) The Central Limit Theorem:More of the Story November 2015 1 / 33 Central Limit Theorem Theorem (Central Limit

More information

1. Stochastic Processes and filtrations

1. Stochastic Processes and filtrations 1. Stochastic Processes and 1. Stoch. pr., A stochastic process (X t ) t T is a collection of random variables on (Ω, F) with values in a measurable space (S, S), i.e., for all t, In our case X t : Ω S

More information

Useful Probability Theorems

Useful Probability Theorems Useful Probability Theorems Shiu-Tang Li Finished: March 23, 2013 Last updated: November 2, 2013 1 Convergence in distribution Theorem 1.1. TFAE: (i) µ n µ, µ n, µ are probability measures. (ii) F n (x)

More information

Notions such as convergent sequence and Cauchy sequence make sense for any metric space. Convergent Sequences are Cauchy

Notions such as convergent sequence and Cauchy sequence make sense for any metric space. Convergent Sequences are Cauchy Banach Spaces These notes provide an introduction to Banach spaces, which are complete normed vector spaces. For the purposes of these notes, all vector spaces are assumed to be over the real numbers.

More information

If g is also continuous and strictly increasing on J, we may apply the strictly increasing inverse function g 1 to this inequality to get

If g is also continuous and strictly increasing on J, we may apply the strictly increasing inverse function g 1 to this inequality to get 18:2 1/24/2 TOPIC. Inequalities; measures of spread. This lecture explores the implications of Jensen s inequality for g-means in general, and for harmonic, geometric, arithmetic, and related means in

More information

Math 341: Convex Geometry. Xi Chen

Math 341: Convex Geometry. Xi Chen Math 341: Convex Geometry Xi Chen 479 Central Academic Building, University of Alberta, Edmonton, Alberta T6G 2G1, CANADA E-mail address: xichen@math.ualberta.ca CHAPTER 1 Basics 1. Euclidean Geometry

More information