Chapter 7 Statistical Functionals and the Delta Method
|
|
- Myron Jordan
- 6 years ago
- Views:
Transcription
1 Chapter 7 Statistical Functionals and the Delta Method. Estimators as Functionals of F n or P n 2. Continuity of Functionals of F or P 3. Metrics for Distribution Functions F and Probability Distributions P 4. Differentiability of Functionals of F or P : Gateaux, Hadamard, and Frechet Derivatives 5. Higher Order Derivatives
2 2
3 Chapter 7 Statistical Functionals and the Delta Method Estimates as Functionals of F n or P n Often the quantity we want to estimate can be viewed as a functional T (F ) or T (P ) of the underlying distribution function F or P generating the data. Then a simple nonparametric estimator is simply T (F n ) or T (P n ) where F n and P n denote the empirical distribution function and empirical measure of the data. Notation. Suppose that X,..., X n are i.i.d. P on (X, A). We let P n n δ Xi the empirical measure of the sample, n i= where δ x the measure with mass one at x (so δ x (A) = A (x) for A A. When X = R k, especially when k =, we will write F n (x) = n n (,x] (X i )=P n (,x], F(x) =P (,x]. i= Here is a list of examples. Example. The mean T (F )= xdf (x). T (F n )= xdf n (x). Example.2 The r-th moment: for r an integer, T (F )= x r df (x), and T (F n )= x r df n (x). Example.3 The variance: T (F )=Var F (X) = (x xdf (x)) 2 df (x) = (x y) 2 df (x)df (y), 2 T (F n )=Var Fn (X) = (x xdf n (x)) 2 df n (x) = (x y) 2 df n (x)df n (y). 2 Example.4 The median: T (F )=F (/2). T (F n )=F n (/2). Example.5 The α trimmed mean: T (F n ) = ( 2α) α α F n (u)du. T (F ) = ( 2α) α α F (u)du for 0 < α < /2. 3
4 4 CHAPTER 7. STATISTICAL FUNCTIONALS AND THE DELTA METHOD Example.6 The Hodges-Lehmann functional: T (F ) = (/2){F F } (/2) where denotes convolution. Then T (F n ) = (/2){F n F n } (/2) = median{(x i + X j )/2}. Example.7 The Mann-Whitney functional. For X, Y independent with distribution functions F and G respectively, T (F, G) = F dg = P F,G (X Y ). Then T (F m, G n )= F m dg n (based on two independent samples X,..., X m i.i.d. F with empirical df F m and Y,..., Y n i.i.d. G with empirical df G n. Example.8 Multivariate mean: for P on (R k, B k ): T (P ) = xdp (x) (with values in R k ), T (P n )= xdp n (x) =n n i= X i. Example.9 Multivariate cross second moments: for P on (R k, B k ): T (P )= xx T dp (x) = x 2 dp (x); T (P n )= xx T dp n (x) = Note that T (P ) and T (P n ) take values in R k k. x 2 dp n (x) =n n X i Xi T. Example.0 Multivariate covariance matrix: for P on (R k, B k ): T (P )= (x ydp (y))(x ydp (y)) T dp (x) = (x y)(x y) T dp (x)dp (y), 2 T (P n )= (x ydp n (y))(x ydp n (y)) T dp n (x) = 2 i= n (x y)(x y) T dp n (x)dp n (y) =n (X i X n )(X i X n ) T. Example. k means clustering functional: T (P )=(T (P ),..., T k (P ) where the T i (P ) s minimize k x t 2 x t k 2 dp (x) = x t i 2 dp (x) C i where C i = {x R m : t i minimizes x t 2 over {t,..., t k }}. Then T (P n )=(T (P n ),..., T k (P n )) where the T i (P n ) s minimize x t 2 x t k 2 dp n (x). i= i= Example.2 The simplicial depth function: for P on R k and x R k, set T (P ) T (P )(x) = Pr P (x S(X,..., X k+ )) where X,..., X k+ are i.i.d. P and S(x,..., x k+ ) is the simplex in R k determined by x,..., x k+ ; e.g. for k = 2, the simplex determined by x,x 2,x 3 is just a triangle. Then T (P n )=Pr Pn (x S(X,..., X k+ )). Note that in this example T (P ) is a function from R k to R.
5 . ESTIMATES AS FUNCTIONALS OF F N OR P N 5 Example.3 (Z-functional derived from likelihood). A maximum likelihood estimator: for P on (X, A), suppose that P = {P θ : θ Θ R k } is a regular parametric model with vector scores function l θ ( ; θ). Then for general P, not necessarily in the model P, consider T defined by () l θ (x; T (P ))dp (x) = 0. Then l θ (x; T (P n ))dp n (x) =0 defines T (P n ). For estimation of location in one dimension with l(x; θ) =ψ(x θ) and ψ f /f, these become ψ(x T (F ))df (x) = 0 and ψ(x T (F n ))df n (x) =0. We expect that often the value T (P ) Θ satisfying () also satisfies T (P ) = argmin θ Θ K(P, P θ ). Here is a heuristic argument showing why this should be true: Note that for many cases we have ˆθ n = argmax θ n l n (θ) = argmax θ P n (log θ) argmax θ P (log θ) = argmax θ log p θ (x)dp (x). p Now ( ) pθ P (log p θ ) = P (log p)+p log p ( ) p = P (log p) P log p θ = P (log p) K(P, P θ ). Thus argmax θ log p θ (x)dp (x) = argmin θ K(P, P θ ) θ(p ). If we can interchange differentiation and integration it follows that θ K(P, P θ )= p(x) l θ (x; θ)dµ(x) = l θ (x; θ)dp (x), so the relation () is obtained by setting this gradient vector equal to 0. Example.4 A bootstrap functional: let T (F ) be a functional with estimator T (F n ), and consider estimating the distribution function of n(t (F n ) T (F )), H n (F ; ) =P F ( n(t (F n ) T (F )) ). A natural estimator is H n (F n, ).
6 6 CHAPTER 7. STATISTICAL FUNCTIONALS AND THE DELTA METHOD 2 Continuity of Functionals of F or P One of the basic properties of a functional T is continuity (or lack thereof). The sense in which we will want our functionals T to be continuous is in the sense of weak convergence. Definition 2. A. T : F R is weakly continuous at F 0 if F n F 0 implies T (F n ) T (F 0 ). T : F R is weakly lower-semicontinuous at F 0 if F n F 0 implies lim inf n T (F n ) T (F 0 ). B. T : P R is weakly continuous at P 0 P if P n P 0 implies T (P n ) T (P 0 ). Example 2. T (F )= xdf (x) is discontinuous at every F 0 : if F n = ( n )F 0 + n δ an, then F n F 0 since, for bounded ψ ψdf n = ( n ) ψdf 0 + n ψ(a n ) ψdf 0. But T (F n ) = ( n )T (F 0 )+n a n if we choose a n so that n a n. Example 2.2 T (F ) = ( 2α) α α F (u)du with 0 <α< /2 is continuous at every F 0 : F n F 0 implies that Fn (t) F0 (t) a.e. Lebesgue. Hence T (F n ) = ( 2α) α α ( 2α) α by the dominated convergence theorem. α Fn (u)du F 0 (u)du = T (F 0 ) Example 2.3 T (F )=F (/2) is continuous at every F 0 such that F 0 is continuous at /2. Example 2.4 (A lower-semicontinuous functional T ). Let T (F ) = V ar F (X) = (x E F X) 2 df (x) = 2 E F (X X ) 2 where X, X F are independent; recall example.3. If F n d F, then lim inf n T (F n ) T (F ); this follows from Skorokhod and Fatou. Here is the basic fact about empirical measures that makes weak continuity of a functional T useful: Theorem 2. (Varadarajan). If X,..., X n are i.i.d. P on a separable metric space (S, d), then Pr(P n P ) =. Proof. For each fixed bounded and continuous function ψ we have P n ψ ψdp n = n ψ(x i ) a.s. Pψ ψdp n i=
7 2. CONTINUITY OF FUNCTIONALS OF F OR P 7 by the ordinary strong law of large numbers. The proof is completed by noting that the collection of bounded continuous functions on a separable metric space (S, d) is itself separable. See Dudley (989), sections.2 and.4. Combining Varadarajan s theorem with weak continuity of T yields the following simple result. Proposition 2. Suppose that: (i) (X, A) =(S, B Borel ) where (S, d) is a separable metric space and B Borel denotes its usual Borel sigma - field. (ii) T : P R is weakly continuous at P 0 P. (iii) X,..., X n are i.i.d. P 0. Then T n T (P n ) a.s. T (P 0 ). Proof. By Varadarajan s theorem 2., P n P 0 a.s. Fix ω A with Pr(A) = so that P ω n P 0. Then by weak continuity of T, T n (P ω n ) T (P 0). A difficulty in using this theorem is typically in trying to verify weak-continuity of T. Weak continuity is a rather strong hypothesis, and many interesting functions fail to have this type of continuity. The following approach is often useful. Definition 2.2 Let F L (P ) be a collection of integrable functions. Say that P n P with respect to F if P n P F sup f F P n (f) P (f) 0. Furthermore, we say that T : P R is continuous with respect to F if P n P F 0 implies that T (P n ) T (P ). Definition 2.3 If F L (P ) is a collection of integrable functions with P n P F say that F is a Glivenko-Cantelli class for P and write F GC(P ). 0, we then Theorem 2.2 Suppose that: (i) F GC(P ); i.e. P n P F a.s. 0. (ii) T is continuous with respect to F. Then T (P n ) a.s. T (P ).
8 8 CHAPTER 7. STATISTICAL FUNCTIONALS AND THE DELTA METHOD 3 Metrics Probability Distributions F and P We have already encountered the total variation and Hellinger metrics in the course of studying Scheffé s lemma, Bayes estimators, and tests of hypotheses. As we will see, as useful as these metrics are, they are too strong: the empirical measure P n fails to converge to the true P in either the total variation or Hellinger distance in general. In fact this fails to hold in general for the Prohorov and dual bounded Lipschitz metrics which we introduce below, as has been shown by Dudley (969), Kersting (978), and Bretagnolle and Huber-Carol (977); also see the remarks in Huber (98), page 39. Nonetheless, it will be helpful to have in mind some some useful metrics for probability measures P and df s F, and their properties. Definition 3. The Kolmogorov or supremum metric between two distribution functions F and G is d K (F, G) F G sup x R k F (x) G(x). Definition 3.2 The Lévy metric between two distribution functions F and G is d L (F, G) inf{ɛ > 0: G(x ɛ) ɛ F (x) G(x + ɛ)+ɛ for all x R}. Definition 3.3 The Prohorov metric between two probability measures P, Q on a metric space (S, d) is d pr (P, Q) = inf{ɛ > 0: P (B) Q(B ɛ )+ɛ for all Borel sets B} where B ɛ {x : inf y B d(x, y) ɛ}. To define the next metric for P, Q on a metric space (S, d) for any real-valued function f on S, set f L sup x y f(x) f(y) /d(x, y), and denote the usual supremum norm by f sup x f(x). Finally, set f BL f L + f. Definition 3.4 The dual - bounded Lipschitz metric d BL is defined by d BL (P, Q) sup{ fdp fdq : f BL }. Definition 3.5 The total variation metric d TV is defined by d TV (P, Q) sup{ P (A) Q(A) : A A} = p q dµ 2 where p dp/dµ, q = dq/dµ for some measure µ dominating both P and Q (e.g. µ = P + Q). Definition 3.6 The Hellinger metric H is defined by H 2 (P, Q) = { p pqdµ q} 2 dµ = ρ(p, Q) 2 where µ is any measure dominating both P and Q. The quantity ρ(p, Q) pqdµ is called the affinity between P and Q. The following basic theorem establishes relationships between these metrics:
9 3. METRICS PROBABILITY DISTRIBUTIONS F AND P 9 Theorem 3. A. d Pr (P, Q) 2 d BL (P, Q) 2d Pr (P, Q). B. H 2 (P, Q) d TV (P, Q) H(P, Q){2 H 2 (P, Q)} /2. C. d Pr (P, Q) d TV (P, Q). D. For distributions P, Q on the real line, d L d K d TV. Proof. We proved B in chapter 2. For A, see Dudley (989) section.3, problem 5, and section.6, corollary.6.5. Also see Huber (98), corollary 2.4.3, page 33. Another useful reference is Whitt (974). Theorem 3.2 (Strassen). The following are equivalent: (a) d Pr (P, Q) ɛ. (b) There exist X P, Y Q defined on a common probability space (Ω, F, P r) such that Pr(d(X, Y ) ɛ) ɛ. Proof. (b) implies (a) is easy: for any Borel set B, [X B] = [X B, d(x, Y ) ɛ] [X B, d(x, Y ) >ɛ] [X B ɛ ] [d(x, Y ) >ɛ], so that P (B) Q(B ɛ )+ɛ. For the proof of (a) implies (b) see Strassen (965), Dudley (968), or Schay (974). A nice treatment of Strassen s theorem is given by Dudley (989).
10 0 CHAPTER 7. STATISTICAL FUNCTIONALS AND THE DELTA METHOD 4 Differentiability of Functionals T of F or P To be able to prove more than consistency, we will need stronger properties of the functional T, namely differentiability. Definition 4. T is Gateaux differentiable at F if there exists a linear functional T (F ; ) such that for F t = ( t)f + tg, T (F t ) T (F ) lim t 0 t = T (F ; G F )= = ψ F (x)dg(x) ψ(x)d(g(x) F (x)) where ψ F (x) =ψ ψdf (x) has mean zero under F. Or, T : P RisGateaux - differentiable at P if there exists T (P ; ) bounded and linear such that for P t ( t)p + tq T (P t ) T (P ) lim = T t 0 t (P ; Q P )= ψ(x)d(q(x) P (x)) = ψ P (x)dq(x). Definition 4.2 T has the influence function or influence curve IC(x; T, F ) at F if, with F t ( t)f + tδ x, T (F t ) T (F ) lim = IC(x; T, F )=ψ F (x). t 0 t Example 4. Probability of a set: suppose that T (F )=F(A) for a fixed measurable set A. Then T (F t ) T (F ) = { A (x) A (y)df (y)}dg(x) = ψ F (x)dg(x) t where ψ F (x) = A (x) F (A). Example 4.2 The mean: T (F )= xdf (x). Then T (F t ) T (F ) = {x T (F )}dg(x) = ψ F (x)dg(x) t where ψ F (x) =x T (F ). Note that the influence function ψ F (x) for the probability functional is bounded, but that the influence function ψ F (x) for the mean functional is unbounded. Example 4.3 The variance: T (F )=Var F (X) = (x µ(f )) 2 df (x). Now d dt T (F t) t=0 = d (x µ(f t )) 2 df t (x) dt = (x µ(f )) 2 d(g F )(x)+2 (x µ(f ))( ) µ(f ; G F )df (x) = (x µ(f )) 2 d(g F ) = {(x µ(f )) 2 σf 2 }dg(x). Hence IC(x; T, F )=ψ F (x) =(x µ(f )) 2 σ 2 F.
11 4. DIFFERENTIABILITY OF FUNCTIONALS T OF F OR P Example 4.4 T (F )=F (/2), and suppose that F has density f which is positive at F (/2). Then, with F t ( t)f + tg, d dt T (F t) = d t=0 dt F t (/2). t=0 Note that F t (Ft (/2)) = /2, and hence 0 = d dt F t(ft (/2)) so that Hence t=0 = d {F (Ft (/2)) + t(g F )(F t (/2))} dt t=0 = f(f (/2)) T (F ; G F )+(G F)(F (/2)) + 0, T (F ; G F ) = (G F )(F (/2)) f(f (/2)) ((,F = (/2)](x) /2)dG(x) f(f. (/2)) ψ F (x) =IC(x; T, F )= f(f (/2)) { (,F (/2)](x) /2}. Example 4.5 The p th quantile, T (F )=F (p). By a calculation similar to that for the median, and T (F ; G F ) = (G F )(F (p)) f(f (p)) ((,F = (p)](x) p)dg(x) f(f. (p)) ψ F (x) =IC(x; T, F )= f(f (p) { (,F (p)](x) p}. Now we need to consider other types of derivatives: in particular the stronger notions of derivative which we will discuss below are those of Fréchet and Hadamard derivatives. Definition 4.3 A functional T : F RisFréchet - differentiable at F F with respect to d if there exists a continuous linear functional T (F ; ) from finite signed measures in R such that () T (G) T (F ) T (F ; G F ) 0 as d (F, G) 0. d (G, F ) Here are some properties of Fréchet - differentiation: Theorem 4. Suppose that d is a metric for weak convergence (i.e. the Lévy metric for df s on the line; or the Prohorov or dual-bounded Lipschitz metric for measures on a metric space (S, d)). Then:
12 2 CHAPTER 7. STATISTICAL FUNCTIONALS AND THE DELTA METHOD A. If T exists in the Fréchet sense, then it is unique, and T is Gateaux differentiable with Gateaux derivative T. B. If T is Fréchet differentiable at F, then T is continuous at F. C. T (F ; G F )= ψd(g F )= (ψ ψdf )dg where the function ψ is bounded and continuous. Proof. See Huber (98), proposition 5., page 37. Fréchet differentiability leads to an easy proof of asymptotic normality if the metric d is compatible with the empirical df or empirical measure. Theorem 4.2 Suppose that T is Fréchet differentiable at F with respect to d and that (2) nd (F n,f)=o p (). Then n(t (Fn ) T (F )) = ψ F d{ n(f n F )} + o p () = d n n ψ F (X i )+o p () i= N(0, Eψ 2 F (X)). Proof. By Fréchet differentiability of T at F, n(t (Fn ) T (F )) = n ψ F df n + no(d (F n,f)) = n ψ F df n + o(d (F n,f)) nd (F n,f) d (F n,f) = n ψ F df n + o()o p () by (2) and (). Note that if d is the Lévy metric d L or the Kolmogorov metric d K on the line, then (2) is satisfied: ndl (F n,f) nd K (F n,f)= n F n F d = Un (F ) d U(F ). Unfortunately, if d = d Pr or d = d BL, then nd (F n,f) is not O p () in general; see Dudley (969), Kersting (978), and Huber-Carol (977). Thus we are lead to consideration of other metrics such as the Kolmogorov metric and generalizations thereof for problems concerning functionals T (P ) of probability distributions P. While some functionals T are Fréchet differentiable with respect to the supremum or Kolmogorov metric, we can make more functionals differentiable by considering a somewhat weaker notion of differentiability as follows:
13 4. DIFFERENTIABILITY OF FUNCTIONALS T OF F OR P 3 Definition 4.4 A functional T : F R is Hadamard differentiable at F with respect to the Kolmogorov distance d K = (or compactly differentiable with respect to d K ) if there exists T (F ; ) continuous and linear satisfying T (F t ) T (F ) T (F ; F t F ) = o() t for all {F t } satisfying t (F t F ) 0 for some function. The motivation for this definition is simply that we can write n(t (Fn ) T (F )) = T (F + n /2 n /2 (F n F )) T (F ) n /2 where n(f n F ) d = U n (F ) U(F ). Hence we can easily deduce the following theorem. Theorem 4.3 Suppose that T : F R is Hadamard - differentiable at F with respect to. Then n(t (Fn ) T (F )) d N(0,E( T 2 (F ; (, ] (X) F ))). Moreover, n(t (Fn ) T (F )) T (F ; n(f n F )) = o p (). Proof. This is easily proved using a Skorokhod construction of the empirical process, or by the extended continuous mapping theorem. Gill (989) used the Skorokhod approach; Wellner (989) pointed out the extended continuous mapping proof. One way of treating all the kinds of differentiability we have discussed so far is as follows. Define T (F t ) T (F ) T (F ; F t F ) Rem(F + th); Here h = t (F t F ). Let S be a collection of subsets of the metric space (F,d ). Then T is S differentiable at F with derivative T if for all S S Rem(F + th) t 0 as t 0 uniformly in h S. Now different choices of S yield different degrees of goodness of the linear approximation of T by T at F. The three most common choices are just those we have discussed: A. When S = {all singletons of (F,d )}, T is called Gateaux or directionally differentiable. B. When S = {all compact subsets of (F,d )}, T is called Hadamard or compactly differentiable. C. When S = {all bounded subsets of F,d )}, T is called Fréchet (or boundedly) differentiable. Here is a simple example of a function T defined on pairs of probability distributions (or, in this case, distribution functions) which is compactly differentiable with respect to the familiar supremum (or uniform or Kolomogorov) norm, but which is not Fréchet differentiable with respect to this norm.
14 4 CHAPTER 7. STATISTICAL FUNCTIONALS AND THE DELTA METHOD Example 4.6 For distribution functions F, G on R, define T by T (F, G) = F dg = P (X Y ) where X F, Y G are independent. Let F F sup x F (x) F (x) F F F where F = { (,t] : t R}. Proposition 4. A. T (F, G) is Hadamard differentiable with respect to at every pair of df s (F, G) with derivative T given by (3) T ((F, G); α, β) = αdg βdf. B. T (F, G) is not Fréchet differentiable with respect to. Proof. The following proof of A is basically Gill s (989), lemma 3. For F t F and G t G, define α t (F t F )/t and β t (G t G)/t; for Hadamard differentiability we have α t α and β t β with resepct to for some (bounded) functions α and β. Now T (F t,g t ) T (F, G) t T (α t,β t ) = t α t dβ t = αd(g t G)+ (α t α)d(g t G). Since T is continuous, it suffices to show that the right side converges to 0. The second term on the right is bounded by { α t α dg t + } dg 2 α t α 0. Fix ɛ> 0. Since the limit function α in the first term is right-continuous with left limits, there is a step function with a finite number m of jumps, α say, which satisfies α α <ɛ. Thus the first term may be bounded as follows: αd(g t G) (α α)d(g t G) + αd(g t G) 2 α α + m α(x j ) (G t G)[x j,x j ) j= 2ɛ +2m α G t G 2ɛ. Since ɛ is arbitrary, this completes the proof of A. Here is the proof of B. If T were Fréchet - differentiable, it would have to be true that (a) T (F n,g n ) T (F, G) T (F n F, G n G) = o( F n F G n G ) for every sequence of pairs of d.f. s {(F n,g n )} with F n F 0 and G n G 0. We now exhibit a sequence {(F n,g n )} for which (a) fails. By straightforward algebra using (3), (b) T (F n,g n ) T (F, G) T (F n F, G n G) = (F n F )d(g n G).
15 4. DIFFERENTIABILITY OF FUNCTIONALS T OF F OR P 5 Consider the d.f. s F n and G n corresponding to the measures which put masses n at 0,..., (n )/n and /n,..., respectively: n n F n = n δ k/n, and G n = n δ k/n. k=0 both of these sequences of df s converge uniformly to the uniform(0, ) df F (x) x G(x), and furthermore F n F = G n G =/n. Now (F n F )(x) = (F n F )() = 0, and n k= k= ( ) k n x [(k )/n,k/n) (x), (G n G)(x) = (F n F )(x) n ( k n = n k= ) x [(k )/n,k/n) (x) with (G n G)() = 0. Thus, separating G n G into its discrete and continuous parts, (F n F )d(g n G) = n (F n F ) k= = n n n n { n ( ) k (/n)+n n n 2 ( ) } 2 n /n = 2n = O(/n) o( F n F G n G )=o(/n). 0 ( ) n t { dt} Hence (a) fails and T is not Fréchet - differentiable. Remark 4. The previous example was suggested by R. M. Dudley. Dudley (992), (994) has studied other metrics, based p variation norms, for which this T is almost Fréchet - differentiable, and for which some functionals may be Fréchet differentiable even though Fréchet differentiability with respect to may fail. A particular refinement of Hadamard differentiability which is very useful is as follows: since the limiting P Brownian bridge process G P of the empirical process G n n(p n P ) is in C u (F,ρ P ) with probability one for any F CLT (P ), we say that T is Hadamard differentiable tangentially to C u (F,ρ P ) at P P if there is a continuous linear function T : C u (F,ρ P ) B so that T (P t ) T (P ) t T ( 0 ) holds for any path {P t } such that t (P t P 0 )/t satisfies t 0 F 0 with 0 C u (F,ρ P ). Then a nice version of the delta-method for nonlinear functions T of P n is given by the following theorem:
16 6 CHAPTER 7. STATISTICAL FUNCTIONALS AND THE DELTA METHOD Theorem 4.4 Suppose that: (i) T is Hadamard differentiable tangentially to C u (F,ρ P ) at P P. (ii) F CLT (P ): n(pn P ) G P (where G P takes values in C u (F,ρ P ) by definition of F CLT (P )). Then (4) n(t (Pn ) T (P )) T (G P ). Proof. Define g n : P l (F) B by g n (x) n(t (P + n /2 x) T (P )). Then, by (i), for { n } l (F) with n 0 F 0 and 0 C u (F,ρ P ), g n ( n ) T ( 0 ) g( 0 ). Thus by the extended continuous mapping theorem in the Hoffmann - Jorgensen weak convergence theory (see van der Vaart and Wellner (996), Theorem.., page 67), g n (G n ) g(g P )= T (G P ), and hence (4) holds. The immediate corollary for the classical Mann-Whitney form of the Wilcoxon statistic given in example 4.6 is: Corollary If X,..., X m are i.i.d. F and independent of Y,..., Y n which are i.i.d. G, 0< P F,G (X Y ) <, and λ N m/n m/(m + n) λ (0, ), then { } mn mn F m dg n F dg = N N {T (F m, G n ) T (F, G)} λ U(F )dg λ V(G)dF d N(0,σλ 2 (F, G)) where U and V are two independent Brownian bridge processes and σλ 2 (F, G) = ( λ)v ar(g(x)) + λv ar(f (Y )). This is, of course, well-known, and can be proved in a variety of other ways (by treating T (F m, G n ) as a two-sample U statistic, or a rank statistic, or by a direct analysis), but the proof via the differentiable functional approach seems instructive and useful. (See e.g. Lehmann (975), Statistical Methods Based on Ranks, Section 5, pages , and especially example 20, page 365.) Other interesting applications have been given by Grübel (988) (who studies the asymptotic theory of the length of the shorth); Pons and Turckheim (989) (who study bivariate hazard estimators and tests of independence based thereon), and Gill and Johansen (990) (who prove Hadamard differentiability of the product integral ). Gill, van der Laan, and Wellner (992) give applications to several problems connected with estimation of bivariate distributions. Arcones and Giné (990) study the delta-method in connection with M estimation and the bootstrap. van der Vaart (99b) shows that Hadamard differentiable functions preserve asymptotic efficiency properties of estimators.
17 5. HIGH ORDER DERIVATIVES 7 5 High Order Derivatives The following example illustrates the phenomena which we want to consider here in the simplest possible setting: Example 5. Suppose that X,X 2,..., X n are i.i.d. Bernoulli(p). Then n(x n p) d Z N(0,p( p)), and, if g(p) =p( p), n(g(xn ) g(p)) d g (p)z = ( 2p)Z N(0,p( p)( 2p) 2 ) by the delta-method (or g-prime theorem). But if p =/2, since g (/2) = 0 this yields only n(g(xn ) /4) d 0. Thus we need to study the higher derivatives of g at /2. since g is, in fact, a quadratic, we have g(p) =g(/2) + 0 (p /2) + 2! ( 2)(p /2)2 =/4 (p /2) 2. Thus n(g(x n ) g(/2)) = n(x n /2) 2 d Z 2 4 χ2. This is a very simple example of a more general limit theorem which we will develop below. Now consider a functional T : F Rasinsections - 4. Definition 5. T is k th order Gateaux differentiable at F if, with F t = F + t(g F ), exists. d k T (F ;(G F )) = dk dt k T (F t) t=0 Note that T (F ; G F )=d (T ; G F ) if it exists. It is usually the case that d k T (F ; G F ) = = ψ k (x,..., x k )d(g F )(x ) d(g F )(x k ) ψ k,f (x,..., x k )dg(x ) dg(x k ); here the function ψ k,f is determined from ψ k by a straightforward centering recipe: ψ,f (x) ψ (x) ψ df, ψ 2,F (x) ψ 2 (x,x 2 ) ψ 2 (x,x 2 )df (x 2 ) + ψ 2 (x,x 2 )df (x )df (x 2 ), ψ 2 (x,x 2 )df (x )
18 8 CHAPTER 7. STATISTICAL FUNCTIONALS AND THE DELTA METHOD and so forth; see Serfling section 6.3.2, lemma A, page 222. A consequence of this is that we can write n k/2 d k (T (F ; F n F ) = ψ k (x,,x k )d(f n (x ) F (x )) d(f n (x k ) F (x k )) = n k/2 ψ k,f (x,..., x k )df n (x ) df n (x k ) = n k/2 n i = n ψ k,f (X i,..., X ik ). i k = This is exactly n k/2 times a V- statistic of order k. Also not that, by Taylor s formula for a function of one real variable t, we have T (F t ) T (F )= k j=0 j! d jt (F ; G F )+ d k+ (k + )! dt k+ T (F t) for some t [0,t]. To analyze the asymptotic behavior of T in terms of d T, d 2 T,..., it is typically the first non-zero term d m T which dominates. Serfling s condition A m : suppose that t=t V ar F (ψ k,f (X,..., X k )) { = 0 for k < m > 0 for k = m. R mn T (F n ) T (F ) m! d mt (F ; F n F ) satisfies n m/2 R mn = o p (). This condition will be invoked with first m = and then m = 2 in the following two theorems. Theorem 5. (Serfling s theorem A). Suppose that X,..., X n are i.i.d. F, and suppose that T satisfies A. Let µ(t, F )=E F ψ,f (X ) = 0 and σ 2 (T, F )=V ar(ψ,f (X )) and suppose that σ 2 (T, F ) <. Then n(t (Fn ) T (F )) d N(0,σ 2 (T, F )). Proof. Now by (ii) of condition A, n(t (Fn ) T (F )) = o p () + n /2 d T (F ; F n F ) = o p () + n ψ,f (X i ) n i= d N(0,σ 2 (T, F )). This is essentially the same as in our earlier proofs of asymptotic normality using differentiability, but here we are hypothesizing that the remainder term is asymptotically negligible (and thus the real effort in using the theorem will be to verify the second part of A ).
19 5. HIGH ORDER DERIVATIVES 9 Theorem 5.2 (Serfling s theorem B). Suppose that X,..., X n are i.i.d. F, and suppose that T satisfies A 2 with ψ 2,F (x, y) =ψ 2,F (y, x) and E F ψ2,f 2 (X,X 2 ) <, E F ψ 2,F (X,X ) <, and E F ψ 2,F (x, X 2 ) = 0. Define A : L 2 (F ) L 2 (F ) by Ag(x) = ψ 2,F (x, y)g(y)df (y), g L 2 (F ), and let {λ k } be the eigenvalues of A. Then n{t (F n ) T (F )} d 2 λ k Zk 2 k= where Z,Z 2,... are i.i.d. N(0, ). Sketch of the proof: By condition A 2 we can write { n(t (F n ) T (F )) = n T (F n ) T (F ) } 2! d 2(F ; F n F ) + n 2! d 2(F ; F n F ) = o p () + n () ψ 2,F (x,x 2 )df n (x )df n (x 2 ). 2 Now denote the orthonormal eigenfunctions of the (Hilbert-Schmidt) operator A by {φ k } and the corresponding eigenvalues by {λ k }: thus Aφ k = λ k φ k. Then it is well-known that ψ 2,F (x, y) = λ k φ k (x)φ k (y) k= in the sense of L 2 (F F ) convergence. Hence n ψ 2,F (x,x 2 )df n (x )df n (x 2 ) (2) (3) = n = n d k= { λ k k= k= λ k Z 2 k λ k φ k (x)φ k (y)df n (x)df n (y) φ k df n } 2 = { λ k n k= n φ k (X i ) where the Z i s are i.i.d. N(0, ) since E F φ k (X i ) = 0, E F ψk 2(X i) =, and E F φ j (X i )φ k (X i ) = 0. Combining () and (3) completes the heuristic proof. The reason that this is heuristic is because of the infinite series appearing in (2) and (3). The complete proof entails consideration of finite sums and the corresponding approximation arguments; see Serfling (98), pages for the U statistic case. But note that the V statistic argument on page 227 just involves throwing the diagonal terms back in, and is therefore really easier. i= } 2 Remark 5. It seems to me that Serfling s µ(t, F ) = 0 as formulated above. It also seems to me that he has missed the factor of /2 appearing in the limit distribution.
20 20 CHAPTER 7. STATISTICAL FUNCTIONALS AND THE DELTA METHOD Remark 5.2 Gregory (977) gives related but stronger results which do not require E F ψ 2,F (X,X )= k= λ k < and apply to some interesting statistics with λ k =/k. Note that the infinite series k= k (Z2 k ) defined a proper random variable since the summands have mean 0 and variances 2/k 2 ; these actually arise as limit distributions of the popular Shapiro - Wilk tests for normality; see e.g. DeWet and Venter (972), (973).
Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation
Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider
More informationAsymptotic statistics using the Functional Delta Method
Quantiles, Order Statistics and L-Statsitics TU Kaiserslautern 15. Februar 2015 Motivation Functional The delta method introduced in chapter 3 is an useful technique to turn the weak convergence of random
More informationEmpirical Processes: General Weak Convergence Theory
Empirical Processes: General Weak Convergence Theory Moulinath Banerjee May 18, 2010 1 Extended Weak Convergence The lack of measurability of the empirical process with respect to the sigma-field generated
More information7 Influence Functions
7 Influence Functions The influence function is used to approximate the standard error of a plug-in estimator. The formal definition is as follows. 7.1 Definition. The Gâteaux derivative of T at F in the
More informationAsymptotic Statistics-VI. Changliang Zou
Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued
Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research
More information3 Integration and Expectation
3 Integration and Expectation 3.1 Construction of the Lebesgue Integral Let (, F, µ) be a measure space (not necessarily a probability space). Our objective will be to define the Lebesgue integral R fdµ
More informationEfficiency of Profile/Partial Likelihood in the Cox Model
Efficiency of Profile/Partial Likelihood in the Cox Model Yuichi Hirose School of Mathematics, Statistics and Operations Research, Victoria University of Wellington, New Zealand Summary. This paper shows
More informationMeasure and Integration: Solutions of CW2
Measure and Integration: s of CW2 Fall 206 [G. Holzegel] December 9, 206 Problem of Sheet 5 a) Left (f n ) and (g n ) be sequences of integrable functions with f n (x) f (x) and g n (x) g (x) for almost
More informationX n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2)
14:17 11/16/2 TOPIC. Convergence in distribution and related notions. This section studies the notion of the so-called convergence in distribution of real random variables. This is the kind of convergence
More informationThe main results about probability measures are the following two facts:
Chapter 2 Probability measures The main results about probability measures are the following two facts: Theorem 2.1 (extension). If P is a (continuous) probability measure on a field F 0 then it has a
More informationNotions such as convergent sequence and Cauchy sequence make sense for any metric space. Convergent Sequences are Cauchy
Banach Spaces These notes provide an introduction to Banach spaces, which are complete normed vector spaces. For the purposes of these notes, all vector spaces are assumed to be over the real numbers.
More informationStatistics 581 Revision of Section 4.4: Consistency of Maximum Likelihood Estimates Wellner; 11/30/2001
Statistics 581 Revision of Section 4.4: Consistency of Maximum Likelihood Estimates Wellner; 11/30/2001 Some Uniform Strong Laws of Large Numbers Suppose that: A. X, X 1,...,X n are i.i.d. P on the measurable
More informationSOME CONVERSE LIMIT THEOREMS FOR EXCHANGEABLE BOOTSTRAPS
SOME CONVERSE LIMIT THEOREMS OR EXCHANGEABLE BOOTSTRAPS Jon A. Wellner University of Washington The bootstrap Glivenko-Cantelli and bootstrap Donsker theorems of Giné and Zinn (990) contain both necessary
More information3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?
MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due ). Show that the open disk x 2 + y 2 < 1 is a countable union of planar elementary sets. Show that the closed disk x 2 + y 2 1 is a countable
More informationLecture 35: December The fundamental statistical distances
36-705: Intermediate Statistics Fall 207 Lecturer: Siva Balakrishnan Lecture 35: December 4 Today we will discuss distances and metrics between distributions that are useful in statistics. I will be lose
More informationAsymptotics of minimax stochastic programs
Asymptotics of minimax stochastic programs Alexander Shapiro Abstract. We discuss in this paper asymptotics of the sample average approximation (SAA) of the optimal value of a minimax stochastic programming
More information(B(t i+1 ) B(t i )) 2
ltcc5.tex Week 5 29 October 213 Ch. V. ITÔ (STOCHASTIC) CALCULUS. WEAK CONVERGENCE. 1. Quadratic Variation. A partition π n of [, t] is a finite set of points t ni such that = t n < t n1
More informationStochastic Convergence, Delta Method & Moment Estimators
Stochastic Convergence, Delta Method & Moment Estimators Seminar on Asymptotic Statistics Daniel Hoffmann University of Kaiserslautern Department of Mathematics February 13, 2015 Daniel Hoffmann (TU KL)
More information7 Convergence in R d and in Metric Spaces
STA 711: Probability & Measure Theory Robert L. Wolpert 7 Convergence in R d and in Metric Spaces A sequence of elements a n of R d converges to a limit a if and only if, for each ǫ > 0, the sequence a
More information1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3
Index Page 1 Topology 2 1.1 Definition of a topology 2 1.2 Basis (Base) of a topology 2 1.3 The subspace topology & the product topology on X Y 3 1.4 Basic topology concepts: limit points, closed sets,
More information2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?
MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due 9/5). Prove that every countable set A is measurable and µ(a) = 0. 2 (Bonus). Let A consist of points (x, y) such that either x or y is
More informationd(x n, x) d(x n, x nk ) + d(x nk, x) where we chose any fixed k > N
Problem 1. Let f : A R R have the property that for every x A, there exists ɛ > 0 such that f(t) > ɛ if t (x ɛ, x + ɛ) A. If the set A is compact, prove there exists c > 0 such that f(x) > c for all x
More informationMath 240 (Driver) Qual Exam (9/12/2017)
1 Name: I.D. #: Math 240 (Driver) Qual Exam (9/12/2017) Instructions: Clearly explain and justify your answers. You may cite theorems from the text, notes, or class as long as they are not what the problem
More informationf(x) f(z) c x z > 0 1
INVERSE AND IMPLICIT FUNCTION THEOREMS I use df x for the linear transformation that is the differential of f at x.. INVERSE FUNCTION THEOREM Definition. Suppose S R n is open, a S, and f : S R n is a
More informationTheoretical Statistics. Lecture 19.
Theoretical Statistics. Lecture 19. Peter Bartlett 1. Functional delta method. [vdv20] 2. Differentiability in normed spaces: Hadamard derivatives. [vdv20] 3. Quantile estimates. [vdv21] 1 Recall: Delta
More informationReal Analysis Problems
Real Analysis Problems Cristian E. Gutiérrez September 14, 29 1 1 CONTINUITY 1 Continuity Problem 1.1 Let r n be the sequence of rational numbers and Prove that f(x) = 1. f is continuous on the irrationals.
More informationMetric Spaces and Topology
Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies
More information+ 2x sin x. f(b i ) f(a i ) < ɛ. i=1. i=1
Appendix To understand weak derivatives and distributional derivatives in the simplest context of functions of a single variable, we describe without proof some results from real analysis (see [7] and
More informationL p Functions. Given a measure space (X, µ) and a real number p [1, ), recall that the L p -norm of a measurable function f : X R is defined by
L p Functions Given a measure space (, µ) and a real number p [, ), recall that the L p -norm of a measurable function f : R is defined by f p = ( ) /p f p dµ Note that the L p -norm of a function f may
More informationPreservation Theorems for Glivenko-Cantelli and Uniform Glivenko-Cantelli Classes
Preservation Theorems for Glivenko-Cantelli and Uniform Glivenko-Cantelli Classes This is page 5 Printer: Opaque this Aad van der Vaart and Jon A. Wellner ABSTRACT We show that the P Glivenko property
More informationIntegration on Measure Spaces
Chapter 3 Integration on Measure Spaces In this chapter we introduce the general notion of a measure on a space X, define the class of measurable functions, and define the integral, first on a class of
More informationGaussian Random Fields
Gaussian Random Fields Mini-Course by Prof. Voijkan Jaksic Vincent Larochelle, Alexandre Tomberg May 9, 009 Review Defnition.. Let, F, P ) be a probability space. Random variables {X,..., X n } are called
More informationLecture Notes on Metric Spaces
Lecture Notes on Metric Spaces Math 117: Summer 2007 John Douglas Moore Our goal of these notes is to explain a few facts regarding metric spaces not included in the first few chapters of the text [1],
More informationAsymptotic Distributions for the Nelson-Aalen and Kaplan-Meier estimators and for test statistics.
Asymptotic Distributions for the Nelson-Aalen and Kaplan-Meier estimators and for test statistics. Dragi Anevski Mathematical Sciences und University November 25, 21 1 Asymptotic distributions for statistical
More informationStat 710: Mathematical Statistics Lecture 31
Stat 710: Mathematical Statistics Lecture 31 Jun Shao Department of Statistics University of Wisconsin Madison, WI 53706, USA Jun Shao (UW-Madison) Stat 710, Lecture 31 April 13, 2009 1 / 13 Lecture 31:
More information1 Weak Convergence in R k
1 Weak Convergence in R k Byeong U. Park 1 Let X and X n, n 1, be random vectors taking values in R k. These random vectors are allowed to be defined on different probability spaces. Below, for the simplicity
More informationFunctional Analysis I
Functional Analysis I Course Notes by Stefan Richter Transcribed and Annotated by Gregory Zitelli Polar Decomposition Definition. An operator W B(H) is called a partial isometry if W x = X for all x (ker
More information(1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define
Homework, Real Analysis I, Fall, 2010. (1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define ρ(f, g) = 1 0 f(x) g(x) dx. Show that
More informationReal Analysis Notes. Thomas Goller
Real Analysis Notes Thomas Goller September 4, 2011 Contents 1 Abstract Measure Spaces 2 1.1 Basic Definitions........................... 2 1.2 Measurable Functions........................ 2 1.3 Integration..............................
More informationREAL AND COMPLEX ANALYSIS
REAL AND COMPLE ANALYSIS Third Edition Walter Rudin Professor of Mathematics University of Wisconsin, Madison Version 1.1 No rights reserved. Any part of this work can be reproduced or transmitted in any
More informationStability of optimization problems with stochastic dominance constraints
Stability of optimization problems with stochastic dominance constraints D. Dentcheva and W. Römisch Stevens Institute of Technology, Hoboken Humboldt-University Berlin www.math.hu-berlin.de/~romisch SIAM
More informationP-adic Functions - Part 1
P-adic Functions - Part 1 Nicolae Ciocan 22.11.2011 1 Locally constant functions Motivation: Another big difference between p-adic analysis and real analysis is the existence of nontrivial locally constant
More informationWeak convergence. Amsterdam, 13 November Leiden University. Limit theorems. Shota Gugushvili. Generalities. Criteria
Weak Leiden University Amsterdam, 13 November 2013 Outline 1 2 3 4 5 6 7 Definition Definition Let µ, µ 1, µ 2,... be probability measures on (R, B). It is said that µ n converges weakly to µ, and we then
More informationInference For High Dimensional M-estimates. Fixed Design Results
: Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and
More informationBrownian Motion. 1 Definition Brownian Motion Wiener measure... 3
Brownian Motion Contents 1 Definition 2 1.1 Brownian Motion................................. 2 1.2 Wiener measure.................................. 3 2 Construction 4 2.1 Gaussian process.................................
More informationSTAT 7032 Probability Spring Wlodek Bryc
STAT 7032 Probability Spring 2018 Wlodek Bryc Created: Friday, Jan 2, 2014 Revised for Spring 2018 Printed: January 9, 2018 File: Grad-Prob-2018.TEX Department of Mathematical Sciences, University of Cincinnati,
More informationAn elementary proof of the weak convergence of empirical processes
An elementary proof of the weak convergence of empirical processes Dragan Radulović Department of Mathematics, Florida Atlantic University Marten Wegkamp Department of Mathematics & Department of Statistical
More informationSection 8.2. Asymptotic normality
30 Section 8.2. Asymptotic normality We assume that X n =(X 1,...,X n ), where the X i s are i.i.d. with common density p(x; θ 0 ) P= {p(x; θ) :θ Θ}. We assume that θ 0 is identified in the sense that
More informationConstraint qualifications for convex inequality systems with applications in constrained optimization
Constraint qualifications for convex inequality systems with applications in constrained optimization Chong Li, K. F. Ng and T. K. Pong Abstract. For an inequality system defined by an infinite family
More informationAdvanced Statistics II: Non Parametric Tests
Advanced Statistics II: Non Parametric Tests Aurélien Garivier ParisTech February 27, 2011 Outline Fitting a distribution Rank Tests for the comparison of two samples Two unrelated samples: Mann-Whitney
More informationFinite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product
Chapter 4 Hilbert Spaces 4.1 Inner Product Spaces Inner Product Space. A complex vector space E is called an inner product space (or a pre-hilbert space, or a unitary space) if there is a mapping (, )
More informationX. Linearization and Newton s Method
163 X. Linearization and Newton s Method ** linearization ** X, Y nls s, f : G X Y. Given y Y, find z G s.t. fz = y. Since there is no assumption about f being linear, we might as well assume that y =.
More informationExercises Measure Theoretic Probability
Exercises Measure Theoretic Probability 2002-2003 Week 1 1. Prove the folloing statements. (a) The intersection of an arbitrary family of d-systems is again a d- system. (b) The intersection of an arbitrary
More informationarxiv: v1 [math.pr] 24 Aug 2018
On a class of norms generated by nonnegative integrable distributions arxiv:180808200v1 [mathpr] 24 Aug 2018 Michael Falk a and Gilles Stupfler b a Institute of Mathematics, University of Würzburg, Würzburg,
More informationOverview of normed linear spaces
20 Chapter 2 Overview of normed linear spaces Starting from this chapter, we begin examining linear spaces with at least one extra structure (topology or geometry). We assume linearity; this is a natural
More informationReal Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi
Real Analysis Math 3AH Rudin, Chapter # Dominique Abdi.. If r is rational (r 0) and x is irrational, prove that r + x and rx are irrational. Solution. Assume the contrary, that r+x and rx are rational.
More informationTHE INVERSE FUNCTION THEOREM
THE INVERSE FUNCTION THEOREM W. PATRICK HOOPER The implicit function theorem is the following result: Theorem 1. Let f be a C 1 function from a neighborhood of a point a R n into R n. Suppose A = Df(a)
More informationTHEOREMS, ETC., FOR MATH 515
THEOREMS, ETC., FOR MATH 515 Proposition 1 (=comment on page 17). If A is an algebra, then any finite union or finite intersection of sets in A is also in A. Proposition 2 (=Proposition 1.1). For every
More informationPart II Probability and Measure
Part II Probability and Measure Theorems Based on lectures by J. Miller Notes taken by Dexter Chua Michaelmas 2016 These notes are not endorsed by the lecturers, and I have modified them (often significantly)
More informationMATHS 730 FC Lecture Notes March 5, Introduction
1 INTRODUCTION MATHS 730 FC Lecture Notes March 5, 2014 1 Introduction Definition. If A, B are sets and there exists a bijection A B, they have the same cardinality, which we write as A, #A. If there exists
More informationAnalysis Qualifying Exam
Analysis Qualifying Exam Spring 2017 Problem 1: Let f be differentiable on R. Suppose that there exists M > 0 such that f(k) M for each integer k, and f (x) M for all x R. Show that f is bounded, i.e.,
More information9 Brownian Motion: Construction
9 Brownian Motion: Construction 9.1 Definition and Heuristics The central limit theorem states that the standard Gaussian distribution arises as the weak limit of the rescaled partial sums S n / p n of
More informationChapter 4: Asymptotic Properties of the MLE
Chapter 4: Asymptotic Properties of the MLE Daniel O. Scharfstein 09/19/13 1 / 1 Maximum Likelihood Maximum likelihood is the most powerful tool for estimation. In this part of the course, we will consider
More informationFunctional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...
Functional Analysis Franck Sueur 2018-2019 Contents 1 Metric spaces 1 1.1 Definitions........................................ 1 1.2 Completeness...................................... 3 1.3 Compactness......................................
More informationHILBERT SPACES AND THE RADON-NIKODYM THEOREM. where the bar in the first equation denotes complex conjugation. In either case, for any x V define
HILBERT SPACES AND THE RADON-NIKODYM THEOREM STEVEN P. LALLEY 1. DEFINITIONS Definition 1. A real inner product space is a real vector space V together with a symmetric, bilinear, positive-definite mapping,
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 08: Stochastic Convergence
Introduction to Empirical Processes and Semiparametric Inference Lecture 08: Stochastic Convergence Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations
More informationl(y j ) = 0 for all y j (1)
Problem 1. The closed linear span of a subset {y j } of a normed vector space is defined as the intersection of all closed subspaces containing all y j and thus the smallest such subspace. 1 Show that
More informationNonparametric estimation under Shape Restrictions
Nonparametric estimation under Shape Restrictions Jon A. Wellner University of Washington, Seattle Statistical Seminar, Frejus, France August 30 - September 3, 2010 Outline: Five Lectures on Shape Restrictions
More information1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989),
Real Analysis 2, Math 651, Spring 2005 April 26, 2005 1 Real Analysis 2, Math 651, Spring 2005 Krzysztof Chris Ciesielski 1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer
More informationProblem List MATH 5143 Fall, 2013
Problem List MATH 5143 Fall, 2013 On any problem you may use the result of any previous problem (even if you were not able to do it) and any information given in class up to the moment the problem was
More informationOn the Uniform Asymptotic Validity of Subsampling and the Bootstrap
On the Uniform Asymptotic Validity of Subsampling and the Bootstrap Joseph P. Romano Departments of Economics and Statistics Stanford University romano@stanford.edu Azeem M. Shaikh Department of Economics
More informationTheoretical Statistics. Lecture 17.
Theoretical Statistics. Lecture 17. Peter Bartlett 1. Asymptotic normality of Z-estimators: classical conditions. 2. Asymptotic equicontinuity. 1 Recall: Delta method Theorem: Supposeφ : R k R m is differentiable
More informationMAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9
MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended
More informationMath 5051 Measure Theory and Functional Analysis I Homework Assignment 2
Math 551 Measure Theory and Functional nalysis I Homework ssignment 2 Prof. Wickerhauser Due Friday, September 25th, 215 Please do Exercises 1, 4*, 7, 9*, 11, 12, 13, 16, 21*, 26, 28, 31, 32, 33, 36, 37.
More informationLecture 16: Sample quantiles and their asymptotic properties
Lecture 16: Sample quantiles and their asymptotic properties Estimation of quantiles (percentiles Suppose that X 1,...,X n are i.i.d. random variables from an unknown nonparametric F For p (0,1, G 1 (p
More informationLecture I: Asymptotics for large GUE random matrices
Lecture I: Asymptotics for large GUE random matrices Steen Thorbjørnsen, University of Aarhus andom Matrices Definition. Let (Ω, F, P) be a probability space and let n be a positive integer. Then a random
More informationNon-parametric Inference and Resampling
Non-parametric Inference and Resampling Exercises by David Wozabal (Last update. Juni 010) 1 Basic Facts about Rank and Order Statistics 1.1 10 students were asked about the amount of time they spend surfing
More informationsimple if it completely specifies the density of x
3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued
Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and
More informationSTATISTICAL INFERENCE UNDER ORDER RESTRICTIONS LIMIT THEORY FOR THE GRENANDER ESTIMATOR UNDER ALTERNATIVE HYPOTHESES
STATISTICAL INFERENCE UNDER ORDER RESTRICTIONS LIMIT THEORY FOR THE GRENANDER ESTIMATOR UNDER ALTERNATIVE HYPOTHESES By Jon A. Wellner University of Washington 1. Limit theory for the Grenander estimator.
More informationReview of Multi-Calculus (Study Guide for Spivak s CHAPTER ONE TO THREE)
Review of Multi-Calculus (Study Guide for Spivak s CHPTER ONE TO THREE) This material is for June 9 to 16 (Monday to Monday) Chapter I: Functions on R n Dot product and norm for vectors in R n : Let X
More informationExercises Measure Theoretic Probability
Exercises Measure Theoretic Probability Chapter 1 1. Prove the folloing statements. (a) The intersection of an arbitrary family of d-systems is again a d- system. (b) The intersection of an arbitrary family
More informationMeasure-theoretic probability
Measure-theoretic probability Koltay L. VEGTMAM144B November 28, 2012 (VEGTMAM144B) Measure-theoretic probability November 28, 2012 1 / 27 The probability space De nition The (Ω, A, P) measure space is
More informationMT804 Analysis Homework II
MT804 Analysis Homework II Eudoxus October 6, 2008 p. 135 4.5.1, 4.5.2 p. 136 4.5.3 part a only) p. 140 4.6.1 Exercise 4.5.1 Use the Intermediate Value Theorem to prove that every polynomial of with real
More informationFourier Series. 1. Review of Linear Algebra
Fourier Series In this section we give a short introduction to Fourier Analysis. If you are interested in Fourier analysis and would like to know more detail, I highly recommend the following book: Fourier
More informationδ xj β n = 1 n Theorem 1.1. The sequence {P n } satisfies a large deviation principle on M(X) with the rate function I(β) given by
. Sanov s Theorem Here we consider a sequence of i.i.d. random variables with values in some complete separable metric space X with a common distribution α. Then the sample distribution β n = n maps X
More informationMAA6617 COURSE NOTES SPRING 2014
MAA6617 COURSE NOTES SPRING 2014 19. Normed vector spaces Let X be a vector space over a field K (in this course we always have either K = R or K = C). Definition 19.1. A norm on X is a function : X K
More informationMeasure Theory on Topological Spaces. Course: Prof. Tony Dorlas 2010 Typset: Cathal Ormond
Measure Theory on Topological Spaces Course: Prof. Tony Dorlas 2010 Typset: Cathal Ormond May 22, 2011 Contents 1 Introduction 2 1.1 The Riemann Integral........................................ 2 1.2 Measurable..............................................
More informationLEBESGUE INTEGRATION. Introduction
LEBESGUE INTEGATION EYE SJAMAA Supplementary notes Math 414, Spring 25 Introduction The following heuristic argument is at the basis of the denition of the Lebesgue integral. This argument will be imprecise,
More informationProbability Theory. Richard F. Bass
Probability Theory Richard F. Bass ii c Copyright 2014 Richard F. Bass Contents 1 Basic notions 1 1.1 A few definitions from measure theory............. 1 1.2 Definitions............................. 2
More informationPROBLEMS. (b) (Polarization Identity) Show that in any inner product space
1 Professor Carl Cowen Math 54600 Fall 09 PROBLEMS 1. (Geometry in Inner Product Spaces) (a) (Parallelogram Law) Show that in any inner product space x + y 2 + x y 2 = 2( x 2 + y 2 ). (b) (Polarization
More informationChapter 8 Integral Operators
Chapter 8 Integral Operators In our development of metrics, norms, inner products, and operator theory in Chapters 1 7 we only tangentially considered topics that involved the use of Lebesgue measure,
More informationIndeed, if we want m to be compatible with taking limits, it should be countably additive, meaning that ( )
Lebesgue Measure The idea of the Lebesgue integral is to first define a measure on subsets of R. That is, we wish to assign a number m(s to each subset S of R, representing the total length that S takes
More informationNotes on uniform convergence
Notes on uniform convergence Erik Wahlén erik.wahlen@math.lu.se January 17, 2012 1 Numerical sequences We begin by recalling some properties of numerical sequences. By a numerical sequence we simply mean
More informationApplied Analysis (APPM 5440): Final exam 1:30pm 4:00pm, Dec. 14, Closed books.
Applied Analysis APPM 44: Final exam 1:3pm 4:pm, Dec. 14, 29. Closed books. Problem 1: 2p Set I = [, 1]. Prove that there is a continuous function u on I such that 1 ux 1 x sin ut 2 dt = cosx, x I. Define
More informationPreservation Theorems for Glivenko-Cantelli and Uniform Glivenko-Cantelli Classes
Preservation Theorems for Glivenko-Cantelli and Uniform Glivenko-Cantelli Classes Aad van der Vaart and Jon A. Wellner Free University and University of Washington ABSTRACT We show that the P Glivenko
More informationComparative and qualitative robustness for law-invariant risk measures
Comparative and qualitative robustness for law-invariant risk measures Volker Krätschmer Alexander Schied Henryk Zähle Abstract When estimating the risk of a P&L from historical data or Monte Carlo simulation,
More informationAnalysis Finite and Infinite Sets The Real Numbers The Cantor Set
Analysis Finite and Infinite Sets Definition. An initial segment is {n N n n 0 }. Definition. A finite set can be put into one-to-one correspondence with an initial segment. The empty set is also considered
More informationMATH MEASURE THEORY AND FOURIER ANALYSIS. Contents
MATH 3969 - MEASURE THEORY AND FOURIER ANALYSIS ANDREW TULLOCH Contents 1. Measure Theory 2 1.1. Properties of Measures 3 1.2. Constructing σ-algebras and measures 3 1.3. Properties of the Lebesgue measure
More information