Chapter 7 Statistical Functionals and the Delta Method


1. Estimators as Functionals of $F_n$ or $P_n$
2. Continuity of Functionals of $F$ or $P$
3. Metrics for Distribution Functions $F$ and Probability Distributions $P$
4. Differentiability of Functionals of $F$ or $P$: Gateaux, Hadamard, and Fréchet Derivatives
5. Higher Order Derivatives


1. Estimators as Functionals of $F_n$ or $P_n$

Often the quantity we want to estimate can be viewed as a functional $T(F)$ or $T(P)$ of the underlying distribution function $F$ or probability measure $P$ generating the data. A simple nonparametric estimator is then the plug-in estimator $T(F_n)$ or $T(P_n)$, where $F_n$ and $P_n$ denote the empirical distribution function and empirical measure of the data.

Notation. Suppose that $X_1,\ldots,X_n$ are i.i.d. $P$ on $(\mathcal{X},\mathcal{A})$. We let
$$ P_n \equiv \frac{1}{n}\sum_{i=1}^n \delta_{X_i}, $$
the empirical measure of the sample, where $\delta_x$ is the measure with mass one at $x$ (so $\delta_x(A) = 1_A(x)$ for $A \in \mathcal{A}$). When $\mathcal{X} = R^k$, especially when $k = 1$, we will write
$$ F_n(x) = \frac{1}{n}\sum_{i=1}^n 1_{(-\infty,x]}(X_i) = P_n(-\infty,x], \qquad F(x) = P(-\infty,x]. $$

Here is a list of examples.

Example 1.1 The mean: $T(F) = \int x\, dF(x)$; $T(F_n) = \int x\, dF_n(x)$.

Example 1.2 The $r$-th moment: for $r$ an integer, $T(F) = \int x^r\, dF(x)$, and $T(F_n) = \int x^r\, dF_n(x)$.

Example 1.3 The variance:
$$ T(F) = \mathrm{Var}_F(X) = \int \Big(x - \int x\, dF(x)\Big)^2 dF(x) = \frac12 \int\!\!\int (x-y)^2\, dF(x)\, dF(y), $$
$$ T(F_n) = \mathrm{Var}_{F_n}(X) = \int \Big(x - \int x\, dF_n(x)\Big)^2 dF_n(x) = \frac12 \int\!\!\int (x-y)^2\, dF_n(x)\, dF_n(y). $$

Example 1.4 The median: $T(F) = F^{-1}(1/2)$; $T(F_n) = F_n^{-1}(1/2)$.

Example 1.5 The $\alpha$-trimmed mean: for $0 < \alpha < 1/2$,
$$ T(F) = \frac{1}{1-2\alpha}\int_\alpha^{1-\alpha} F^{-1}(u)\, du, \qquad T(F_n) = \frac{1}{1-2\alpha}\int_\alpha^{1-\alpha} F_n^{-1}(u)\, du. $$
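The plug-in recipe of Examples 1.1-1.5 is easy to compute directly from the data. The following Python sketch is my own illustration, not part of the notes; the function names and the choice $F = N(0,1)$ are hypothetical.

```python
# A minimal sketch of the plug-in principle: each functional T(F) in
# Examples 1.1-1.5 is estimated by applying T to the empirical df F_n, which
# amounts to averaging over, or taking order statistics of, the sample.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)          # X_1,...,X_n i.i.d. F (here F = N(0,1))

def plug_in_mean(x):                    # Example 1.1: integral of t dF_n(t)
    return np.mean(x)

def plug_in_moment(x, r):               # Example 1.2: integral of t^r dF_n(t)
    return np.mean(x**r)

def plug_in_variance(x):                # Example 1.3: Var_{F_n}(X); note the 1/n, not 1/(n-1)
    return np.mean((x - np.mean(x))**2)

def plug_in_median(x):                  # Example 1.4: F_n^{-1}(1/2)
    return np.quantile(x, 0.5)

def plug_in_trimmed_mean(x, alpha):     # Example 1.5, via a Riemann sum over the quantile grid
    xs = np.sort(x)
    n = len(xs)
    u = (np.arange(n) + 0.5) / n        # midpoints u in (0,1) at which F_n^{-1}(u) = xs
    keep = (u > alpha) & (u < 1 - alpha)
    return np.mean(xs[keep])

print(plug_in_mean(x), plug_in_variance(x), plug_in_median(x), plug_in_trimmed_mean(x, 0.1))
```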

Example 1.6 The Hodges-Lehmann functional: $T(F) = (1/2)\{F \star F\}^{-1}(1/2)$ where $\star$ denotes convolution. Then $T(F_n) = (1/2)\{F_n \star F_n\}^{-1}(1/2) = \mathrm{median}\{(X_i + X_j)/2\}$.

Example 1.7 The Mann-Whitney functional: for $X, Y$ independent with distribution functions $F$ and $G$ respectively, $T(F,G) = \int F\, dG = P_{F,G}(X \le Y)$. Then $T(F_m, G_n) = \int F_m\, dG_n$, based on two independent samples $X_1,\ldots,X_m$ i.i.d. $F$ with empirical df $F_m$ and $Y_1,\ldots,Y_n$ i.i.d. $G$ with empirical df $G_n$.

Example 1.8 Multivariate mean: for $P$ on $(R^k, \mathcal{B}^k)$: $T(P) = \int x\, dP(x)$ (with values in $R^k$), $T(P_n) = \int x\, dP_n(x) = n^{-1}\sum_{i=1}^n X_i$.

Example 1.9 Multivariate cross second moments: for $P$ on $(R^k, \mathcal{B}^k)$:
$$ T(P) = \int x x^T dP(x) = \int x^{\otimes 2} dP(x); \qquad T(P_n) = \int x x^T dP_n(x) = \int x^{\otimes 2} dP_n(x) = n^{-1}\sum_{i=1}^n X_i X_i^T. $$
Note that $T(P)$ and $T(P_n)$ take values in $R^{k\times k}$.

Example 1.10 Multivariate covariance matrix: for $P$ on $(R^k, \mathcal{B}^k)$:
$$ T(P) = \int \Big(x - \int y\, dP(y)\Big)\Big(x - \int y\, dP(y)\Big)^T dP(x) = \frac12\int\!\!\int (x-y)(x-y)^T dP(x)\, dP(y), $$
$$ T(P_n) = \int \Big(x - \int y\, dP_n(y)\Big)\Big(x - \int y\, dP_n(y)\Big)^T dP_n(x) = \frac12\int\!\!\int (x-y)(x-y)^T dP_n(x)\, dP_n(y) = n^{-1}\sum_{i=1}^n (X_i - \bar X_n)(X_i - \bar X_n)^T. $$

Example 1.11 $k$-means clustering functional: $T(P) = (T_1(P),\ldots,T_k(P))$ where the $T_i(P)$'s minimize
$$ \int \big(|x - t_1|^2 \wedge \cdots \wedge |x - t_k|^2\big)\, dP(x) = \sum_{i=1}^k \int_{C_i} |x - t_i|^2\, dP(x), $$
where $C_i = \{x \in R^m : t_i \text{ minimizes } |x - t|^2 \text{ over } \{t_1,\ldots,t_k\}\}$. Then $T(P_n) = (T_1(P_n),\ldots,T_k(P_n))$ where the $T_i(P_n)$'s minimize $\int (|x - t_1|^2 \wedge \cdots \wedge |x - t_k|^2)\, dP_n(x)$.

Example 1.12 The simplicial depth function: for $P$ on $R^k$ and $x \in R^k$, set $T(P) \equiv T(P)(x) = \Pr_P(x \in S(X_1,\ldots,X_{k+1}))$ where $X_1,\ldots,X_{k+1}$ are i.i.d. $P$ and $S(x_1,\ldots,x_{k+1})$ is the simplex in $R^k$ determined by $x_1,\ldots,x_{k+1}$; e.g. for $k = 2$, the simplex determined by $x_1, x_2, x_3$ is just a triangle. Then $T(P_n) = \Pr_{P_n}(x \in S(X_1,\ldots,X_{k+1}))$. Note that in this example $T(P)$ is a function from $R^k$ to $R$.
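As a small numerical aside (my own, not from the notes; the data-generating distributions and the pairing convention are my choices), the plug-in estimators of Examples 1.6 and 1.7 reduce to simple operations on pairs:

```python
# Hedged illustration of Examples 1.6-1.7: the Hodges-Lehmann plug-in is the
# median of pairwise averages (X_i + X_j)/2, and the Mann-Whitney plug-in
# T(F_m, G_n) = int F_m dG_n is the proportion of pairs with X_i <= Y_j.
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(200)            # X_1,...,X_m i.i.d. F
y = rng.standard_normal(300) + 0.5      # Y_1,...,Y_n i.i.d. G

def hodges_lehmann(x):
    # median over all pairs (i, j), here including i = j, of (X_i + X_j)/2;
    # this matches (1/2){F_n * F_n}^{-1}(1/2) with * denoting convolution
    pairs = (x[:, None] + x[None, :]) / 2.0
    return np.median(pairs)

def mann_whitney(x, y):
    # T(F_m, G_n) = (1/(m n)) #{(i, j): X_i <= Y_j} = int F_m dG_n
    return np.mean(x[:, None] <= y[None, :])

print("Hodges-Lehmann:", hodges_lehmann(x))
print("Mann-Whitney estimate of P(X <= Y):", mann_whitney(x, y))
```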

5 . ESTIMATES AS FUNCTIONALS OF F N OR P N 5 Example.3 (Z-functional derived from likelihood). A maximum likelihood estimator: for P on (X, A), suppose that P = {P θ : θ Θ R k } is a regular parametric model with vector scores function l θ ( ; θ). Then for general P, not necessarily in the model P, consider T defined by () l θ (x; T (P ))dp (x) = 0. Then l θ (x; T (P n ))dp n (x) =0 defines T (P n ). For estimation of location in one dimension with l(x; θ) =ψ(x θ) and ψ f /f, these become ψ(x T (F ))df (x) = 0 and ψ(x T (F n ))df n (x) =0. We expect that often the value T (P ) Θ satisfying () also satisfies T (P ) = argmin θ Θ K(P, P θ ). Here is a heuristic argument showing why this should be true: Note that for many cases we have ˆθ n = argmax θ n l n (θ) = argmax θ P n (log θ) argmax θ P (log θ) = argmax θ log p θ (x)dp (x). p Now ( ) pθ P (log p θ ) = P (log p)+p log p ( ) p = P (log p) P log p θ = P (log p) K(P, P θ ). Thus argmax θ log p θ (x)dp (x) = argmin θ K(P, P θ ) θ(p ). If we can interchange differentiation and integration it follows that θ K(P, P θ )= p(x) l θ (x; θ)dµ(x) = l θ (x; θ)dp (x), so the relation () is obtained by setting this gradient vector equal to 0. Example.4 A bootstrap functional: let T (F ) be a functional with estimator T (F n ), and consider estimating the distribution function of n(t (F n ) T (F )), H n (F ; ) =P F ( n(t (F n ) T (F )) ). A natural estimator is H n (F n, ).
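Example 1.14 is the basis of the nonparametric bootstrap. The following sketch is mine, not from the notes; it assumes $T$ is the median and standard normal data, and approximates $H_n(F_n,\cdot)$ by resampling from $F_n$.

```python
# Bootstrap estimate H_n(F_n, .) of the distribution of sqrt(n)(T(F_n) - T(F)):
# resample from F_n (i.e. sample with replacement from the data) and recompute
# sqrt(n)(T(F_n*) - T(F_n)) many times.
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.standard_normal(n)                    # X_1,...,X_n i.i.d. F
T = np.median                                  # the functional applied to a sample

B = 2000
boot = np.empty(B)
for b in range(B):
    xstar = rng.choice(x, size=n, replace=True)   # a draw of size n from F_n
    boot[b] = np.sqrt(n) * (T(xstar) - T(x))      # sqrt(n)(T(F_n*) - T(F_n))

# H_n(F_n, t) evaluated on a grid of t values
t_grid = np.linspace(-3, 3, 7)
H_hat = [(boot <= t).mean() for t in t_grid]
print(dict(zip(np.round(t_grid, 2), np.round(H_hat, 3))))
```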

2. Continuity of Functionals of $F$ or $P$

One of the basic properties of a functional $T$ is continuity (or lack thereof). The sense in which we will want our functionals $T$ to be continuous is in the sense of weak convergence.

Definition 2.1 A. $T: \mathcal{F} \to R$ is weakly continuous at $F_0$ if $F_n \Rightarrow F_0$ implies $T(F_n) \to T(F_0)$. $T: \mathcal{F} \to R$ is weakly lower-semicontinuous at $F_0$ if $F_n \Rightarrow F_0$ implies $\liminf_n T(F_n) \ge T(F_0)$.
B. $T: \mathcal{P} \to R$ is weakly continuous at $P_0 \in \mathcal{P}$ if $P_n \Rightarrow P_0$ implies $T(P_n) \to T(P_0)$.

Example 2.1 $T(F) = \int x\, dF(x)$ is discontinuous at every $F_0$: if $F_n = (1 - n^{-1})F_0 + n^{-1}\delta_{a_n}$, then $F_n \Rightarrow F_0$ since, for bounded continuous $\psi$,
$$ \int \psi\, dF_n = (1 - n^{-1})\int\psi\, dF_0 + n^{-1}\psi(a_n) \to \int \psi\, dF_0. $$
But $T(F_n) = (1 - n^{-1})T(F_0) + n^{-1}a_n \to \infty$ if we choose $a_n$ so that $n^{-1}a_n \to \infty$.

Example 2.2 $T(F) = \frac{1}{1-2\alpha}\int_\alpha^{1-\alpha} F^{-1}(u)\, du$ with $0 < \alpha < 1/2$ is continuous at every $F_0$: $F_n \Rightarrow F_0$ implies that $F_n^{-1}(t) \to F_0^{-1}(t)$ a.e. Lebesgue. Hence
$$ T(F_n) = \frac{1}{1-2\alpha}\int_\alpha^{1-\alpha} F_n^{-1}(u)\, du \to \frac{1}{1-2\alpha}\int_\alpha^{1-\alpha} F_0^{-1}(u)\, du = T(F_0) $$
by the dominated convergence theorem.

Example 2.3 $T(F) = F^{-1}(1/2)$ is continuous at every $F_0$ such that $F_0^{-1}$ is continuous at $1/2$.

Example 2.4 (A lower-semicontinuous functional $T$). Let $T(F) = \mathrm{Var}_F(X) = \int (x - E_F X)^2\, dF(x) = \frac12 E_F(X - X')^2$ where $X, X' \sim F$ are independent; recall Example 1.3. If $F_n \to_d F$, then $\liminf_n T(F_n) \ge T(F)$; this follows from Skorokhod and Fatou.

Here is the basic fact about empirical measures that makes weak continuity of a functional $T$ useful:

Theorem 2.1 (Varadarajan). If $X_1,\ldots,X_n$ are i.i.d. $P$ on a separable metric space $(S,d)$, then $\Pr(P_n \Rightarrow P) = 1$.

Proof. For each fixed bounded and continuous function $\psi$ we have
$$ P_n\psi \equiv \int \psi\, dP_n = \frac1n\sum_{i=1}^n \psi(X_i) \to_{a.s.} P\psi \equiv \int\psi\, dP $$
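A numerical companion to Example 2.1 (my own illustration, with $F_0 = N(0,1)$ and $a_n = n^2$ as assumed choices): the contaminating point mass has vanishing effect on bounded test functions, so $F_n \Rightarrow F_0$, while the mean $T(F_n)$ diverges.

```python
# Example 2.1 demo: F_n = (1 - 1/n) F_0 + (1/n) delta_{a_n} with a_n = n^2.
# E_{F_n} psi -> E_{F_0} psi for a bounded psi (here psi = arctan), but the
# mean of F_n blows up.
import numpy as np
from scipy import stats

grid = np.linspace(-10, 10, 20001)
dx = grid[1] - grid[0]
E_psi_F0 = np.sum(np.arctan(grid) * stats.norm.pdf(grid)) * dx   # Riemann sum for E_{F_0} arctan(X)

for n in [10, 100, 1000, 10000]:
    a_n = float(n**2)                                  # chosen so that a_n / n -> infinity
    T_Fn = (1 - 1/n) * 0.0 + (1/n) * a_n               # mean of F_n (mean of F_0 is 0)
    E_psi_Fn = (1 - 1/n) * E_psi_F0 + (1/n) * np.arctan(a_n)
    print(n, "T(F_n) =", T_Fn, " E_Fn psi =", round(E_psi_Fn, 4), " E_F0 psi =", round(E_psi_F0, 4))
```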

by the ordinary strong law of large numbers. The proof is completed by noting that the collection of bounded continuous functions on a separable metric space $(S,d)$ is itself separable. See Dudley (1989), Sections 11.2 and 11.4.

Combining Varadarajan's theorem with weak continuity of $T$ yields the following simple result.

Proposition 2.1 Suppose that:
(i) $(\mathcal{X},\mathcal{A}) = (S, \mathcal{B}_{Borel})$ where $(S,d)$ is a separable metric space and $\mathcal{B}_{Borel}$ denotes its usual Borel sigma-field.
(ii) $T: \mathcal{P} \to R$ is weakly continuous at $P_0 \in \mathcal{P}$.
(iii) $X_1,\ldots,X_n$ are i.i.d. $P_0$.
Then $T_n \equiv T(P_n) \to_{a.s.} T(P_0)$.

Proof. By Varadarajan's Theorem 2.1, $P_n \Rightarrow P_0$ a.s. Fix $\omega \in A$ with $\Pr(A) = 1$ so that $P_n^\omega \Rightarrow P_0$. Then by weak continuity of $T$, $T(P_n^\omega) \to T(P_0)$.

A difficulty in using this theorem is typically in trying to verify weak continuity of $T$. Weak continuity is a rather strong hypothesis, and many interesting functionals fail to have this type of continuity. The following approach is often useful.

Definition 2.2 Let $\mathcal{F} \subset L_1(P)$ be a collection of integrable functions. Say that $P_n \to P$ with respect to $\mathcal{F}$ if
$$ \|P_n - P\|_{\mathcal{F}} \equiv \sup_{f\in\mathcal{F}} |P_n(f) - P(f)| \to 0. $$
Furthermore, we say that $T: \mathcal{P} \to R$ is continuous with respect to $\mathcal{F}$ if $\|P_n - P\|_{\mathcal{F}} \to 0$ implies that $T(P_n) \to T(P)$.

Definition 2.3 If $\mathcal{F} \subset L_1(P)$ is a collection of integrable functions with $\|P_n - P\|_{\mathcal{F}} \to_{a.s.} 0$, then we say that $\mathcal{F}$ is a Glivenko-Cantelli class for $P$ and write $\mathcal{F} \in GC(P)$.

Theorem 2.2 Suppose that:
(i) $\mathcal{F} \in GC(P)$; i.e. $\|P_n - P\|_{\mathcal{F}} \to_{a.s.} 0$.
(ii) $T$ is continuous with respect to $\mathcal{F}$.
Then $T(P_n) \to_{a.s.} T(P)$.
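For the classical class $\mathcal{F} = \{1_{(-\infty,t]} : t \in R\}$, $\|P_n - P\|_{\mathcal{F}}$ is just the Kolmogorov distance $\sup_t |F_n(t) - F(t)|$, and the Glivenko-Cantelli property can be watched numerically. The sketch below is my own illustration of Definition 2.3, assuming $F = N(0,1)$.

```python
# Classical Glivenko-Cantelli: sup_t |F_n(t) - F(t)| -> 0 a.s.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def kolmogorov_distance(x, cdf):
    """sup_t |F_n(t) - F(t)|, attained at (or just before) the order statistics."""
    xs = np.sort(x)
    n = len(xs)
    F = cdf(xs)
    upper = np.abs(np.arange(1, n + 1) / n - F)   # |F_n(x_(i)) - F(x_(i))|
    lower = np.abs(F - np.arange(0, n) / n)       # |F(x_(i)) - F_n(x_(i)-)|
    return max(upper.max(), lower.max())

for n in [100, 1000, 10000, 100000]:
    x = rng.standard_normal(n)
    print(n, kolmogorov_distance(x, stats.norm.cdf))
```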

3. Metrics for Distribution Functions $F$ and Probability Distributions $P$

We have already encountered the total variation and Hellinger metrics in the course of studying Scheffé's lemma, Bayes estimators, and tests of hypotheses. As we will see, as useful as these metrics are, they are too strong: the empirical measure $P_n$ fails to converge to the true $P$ in either the total variation or Hellinger distance in general. In fact, convergence at rate $\sqrt n$ fails to hold in general even for the Prohorov and dual bounded Lipschitz metrics which we introduce below, as has been shown by Dudley (1969), Kersting (1978), and Bretagnolle and Huber-Carol (1977); also see the remarks in Huber (1981), page 39. Nonetheless, it will be helpful to have in mind some useful metrics for probability measures $P$ and df's $F$, and their properties.

Definition 3.1 The Kolmogorov or supremum metric between two distribution functions $F$ and $G$ is
$$ d_K(F,G) \equiv \|F - G\|_\infty \equiv \sup_{x\in R^k} |F(x) - G(x)|. $$

Definition 3.2 The Lévy metric between two distribution functions $F$ and $G$ is
$$ d_L(F,G) \equiv \inf\{\epsilon > 0 : G(x-\epsilon) - \epsilon \le F(x) \le G(x+\epsilon) + \epsilon \text{ for all } x \in R\}. $$

Definition 3.3 The Prohorov metric between two probability measures $P, Q$ on a metric space $(S,d)$ is
$$ d_{Pr}(P,Q) = \inf\{\epsilon > 0 : P(B) \le Q(B^\epsilon) + \epsilon \text{ for all Borel sets } B\} $$
where $B^\epsilon \equiv \{x : \inf_{y\in B} d(x,y) \le \epsilon\}$.

To define the next metric, for $P, Q$ on a metric space $(S,d)$ and any real-valued function $f$ on $S$, set $\|f\|_L \equiv \sup_{x\ne y} |f(x)-f(y)|/d(x,y)$, denote the usual supremum norm by $\|f\|_\infty \equiv \sup_x |f(x)|$, and finally set $\|f\|_{BL} \equiv \|f\|_L + \|f\|_\infty$.

Definition 3.4 The dual-bounded Lipschitz metric $d_{BL}$ is defined by
$$ d_{BL}(P,Q) \equiv \sup\Big\{\Big|\int f\, dP - \int f\, dQ\Big| : \|f\|_{BL} \le 1\Big\}. $$

Definition 3.5 The total variation metric $d_{TV}$ is defined by
$$ d_{TV}(P,Q) \equiv \sup\{|P(A) - Q(A)| : A \in \mathcal{A}\} = \frac12\int |p - q|\, d\mu $$
where $p \equiv dP/d\mu$, $q \equiv dQ/d\mu$ for some measure $\mu$ dominating both $P$ and $Q$ (e.g. $\mu = P + Q$).

Definition 3.6 The Hellinger metric $H$ is defined by
$$ H^2(P,Q) \equiv \frac12\int \{\sqrt{p} - \sqrt{q}\}^2 d\mu = 1 - \int\sqrt{pq}\, d\mu = 1 - \rho(P,Q) $$
where $\mu$ is any measure dominating both $P$ and $Q$. The quantity $\rho(P,Q) \equiv \int\sqrt{pq}\, d\mu$ is called the affinity between $P$ and $Q$.

The following basic theorem establishes relationships between these metrics:
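For discrete $P$ and $Q$ the integrals in Definitions 3.1, 3.5, and 3.6 reduce to finite sums, so the metrics can be computed exactly. The following sketch is my own (the two binomial distributions are an arbitrary choice); it also checks the inequality of part B of the theorem stated next.

```python
# Kolmogorov, total variation and Hellinger distances between
# P = Binomial(10, 0.5) and Q = Binomial(10, 0.6).
import numpy as np
from scipy import stats

k = np.arange(0, 11)
p = stats.binom.pmf(k, 10, 0.5)
q = stats.binom.pmf(k, 10, 0.6)

d_K = np.max(np.abs(np.cumsum(p) - np.cumsum(q)))      # sup_x |F(x) - G(x)|
d_TV = 0.5 * np.sum(np.abs(p - q))                      # (1/2) int |p - q| d mu
H2 = 0.5 * np.sum((np.sqrt(p) - np.sqrt(q))**2)         # Hellinger^2 = 1 - affinity
rho = np.sum(np.sqrt(p * q))                            # affinity

print("d_K =", d_K, " d_TV =", d_TV, " H^2 =", H2, " rho =", rho)
# sanity check of the relation H^2 <= d_TV <= H * sqrt(2 - H^2)
H = np.sqrt(H2)
print(H2 <= d_TV <= H * np.sqrt(2 - H2))
```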

Theorem 3.1
A. $d_{Pr}(P,Q)^2 \le d_{BL}(P,Q) \le 2\, d_{Pr}(P,Q)$.
B. $H^2(P,Q) \le d_{TV}(P,Q) \le H(P,Q)\{2 - H^2(P,Q)\}^{1/2}$.
C. $d_{Pr}(P,Q) \le d_{TV}(P,Q)$.
D. For distributions $P, Q$ on the real line, $d_L \le d_K \le d_{TV}$.

Proof. We proved B in Chapter 2. For A, see Dudley (1989), Section 11.3, problem 5, and Section 11.6, Corollary 11.6.5. Also see Huber (1981), Corollary 2.4.3, page 33. Another useful reference is Whitt (1974).

Theorem 3.2 (Strassen). The following are equivalent:
(a) $d_{Pr}(P,Q) \le \epsilon$.
(b) There exist $X \sim P$, $Y \sim Q$ defined on a common probability space $(\Omega, \mathcal{F}, \Pr)$ such that $\Pr(d(X,Y) > \epsilon) \le \epsilon$.

Proof. That (b) implies (a) is easy: for any Borel set $B$,
$$ [X \in B] = [X\in B,\, d(X,Y)\le\epsilon] \cup [X\in B,\, d(X,Y) > \epsilon] \subset [Y \in B^\epsilon] \cup [d(X,Y) > \epsilon], $$
so that $P(B) \le Q(B^\epsilon) + \epsilon$. For the proof of (a) implies (b), see Strassen (1965), Dudley (1968), or Schay (1974). A nice treatment of Strassen's theorem is given by Dudley (1989).

4. Differentiability of Functionals $T$ of $F$ or $P$

To be able to prove more than consistency, we will need stronger properties of the functional $T$, namely differentiability.

Definition 4.1 $T$ is Gateaux differentiable at $F$ if there exists a linear functional $\dot T(F;\cdot)$ such that for $F_t = (1-t)F + tG$,
$$ \lim_{t\downarrow 0}\frac{T(F_t) - T(F)}{t} = \dot T(F; G - F) = \int \psi(x)\, d(G - F)(x) = \int \psi_F(x)\, dG(x), $$
where $\psi_F(x) \equiv \psi(x) - \int\psi\, dF$ has mean zero under $F$. Or, $T: \mathcal{P} \to R$ is Gateaux differentiable at $P$ if there exists $\dot T(P;\cdot)$ bounded and linear such that for $P_t \equiv (1-t)P + tQ$,
$$ \lim_{t\downarrow 0}\frac{T(P_t) - T(P)}{t} = \dot T(P; Q - P) = \int \psi(x)\, d(Q - P)(x) = \int \psi_P(x)\, dQ(x). $$

Definition 4.2 $T$ has the influence function or influence curve $IC(x; T, F)$ at $F$ if, with $F_t \equiv (1-t)F + t\delta_x$,
$$ \lim_{t\downarrow 0}\frac{T(F_t) - T(F)}{t} = IC(x; T, F) = \psi_F(x). $$

Example 4.1 Probability of a set: suppose that $T(F) = F(A)$ for a fixed measurable set $A$. Then
$$ \frac{T(F_t) - T(F)}{t} = \int \Big\{1_A(x) - \int 1_A(y)\, dF(y)\Big\}\, dG(x) = \int \psi_F(x)\, dG(x) $$
where $\psi_F(x) = 1_A(x) - F(A)$.

Example 4.2 The mean: $T(F) = \int x\, dF(x)$. Then
$$ \frac{T(F_t) - T(F)}{t} = \int \{x - T(F)\}\, dG(x) = \int \psi_F(x)\, dG(x) $$
where $\psi_F(x) = x - T(F)$. Note that the influence function $\psi_F(x)$ for the probability functional is bounded, but the influence function $\psi_F(x)$ for the mean functional is unbounded.

Example 4.3 The variance: $T(F) = \mathrm{Var}_F(X) = \int (x - \mu(F))^2\, dF(x)$. Now
$$ \frac{d}{dt}T(F_t)\Big|_{t=0} = \frac{d}{dt}\int (x - \mu(F_t))^2\, dF_t(x)\Big|_{t=0}
= \int (x - \mu(F))^2\, d(G-F)(x) + 2\int (x - \mu(F))(-1)\,\dot\mu(F; G-F)\, dF(x) $$
$$ = \int (x - \mu(F))^2\, d(G-F)(x) = \int \{(x - \mu(F))^2 - \sigma_F^2\}\, dG(x). $$
Hence $IC(x; T, F) = \psi_F(x) = (x - \mu(F))^2 - \sigma_F^2$.
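The Gateaux derivative along the direction $\delta_x - F$ (Definition 4.2) can be approximated numerically by the difference quotient with a small $t$, replacing $F$ by $F_n$. The sketch below is my own illustration (the helper names are hypothetical); it recovers the influence functions of Examples 4.2 and 4.3.

```python
# Numerical influence curve: (T((1-t) F_n + t delta_x) - T(F_n)) / t for small t,
# implemented by reweighting the sample. For the mean this is x - mean; for the
# (plug-in) variance it is (x - mean)^2 - var.
import numpy as np

rng = np.random.default_rng(4)
data = rng.standard_normal(5000)

def weighted_mean(x, w):
    return np.sum(w * x)

def weighted_var(x, w):
    m = weighted_mean(x, w)
    return np.sum(w * (x - m)**2)

def empirical_influence(T, data, x, t=1e-4):
    n = len(data)
    xs = np.append(data, x)                      # support of (1-t) F_n + t delta_x
    w = np.append(np.full(n, (1 - t) / n), t)    # weights of the contaminated df
    w0 = np.append(np.full(n, 1.0 / n), 0.0)     # weights of F_n itself
    return (T(xs, w) - T(xs, w0)) / t

for x0 in [-2.0, 0.0, 3.0]:
    ic_mean = empirical_influence(weighted_mean, data, x0)
    ic_var = empirical_influence(weighted_var, data, x0)
    print(x0, round(ic_mean, 3), round(ic_var, 3),
          "(theory:", round(x0 - data.mean(), 3), round((x0 - data.mean())**2 - data.var(), 3), ")")
```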

Example 4.4 $T(F) = F^{-1}(1/2)$, and suppose that $F$ has density $f$ which is positive at $F^{-1}(1/2)$. Then, with $F_t \equiv (1-t)F + tG$,
$$ \frac{d}{dt}T(F_t)\Big|_{t=0} = \frac{d}{dt}F_t^{-1}(1/2)\Big|_{t=0}. $$
Note that $F_t(F_t^{-1}(1/2)) = 1/2$, and hence
$$ 0 = \frac{d}{dt}F_t(F_t^{-1}(1/2))\Big|_{t=0} = \frac{d}{dt}\{F(F_t^{-1}(1/2)) + t(G-F)(F_t^{-1}(1/2))\}\Big|_{t=0}
= f(F^{-1}(1/2))\,\dot T(F; G-F) + (G-F)(F^{-1}(1/2)), $$
so that
$$ \dot T(F; G-F) = -\frac{(G-F)(F^{-1}(1/2))}{f(F^{-1}(1/2))} = -\int \frac{1_{(-\infty,F^{-1}(1/2)]}(x) - 1/2}{f(F^{-1}(1/2))}\, dG(x). $$
Hence
$$ \psi_F(x) = IC(x; T, F) = -\frac{1}{f(F^{-1}(1/2))}\{1_{(-\infty,F^{-1}(1/2)]}(x) - 1/2\}. $$

Example 4.5 The $p$-th quantile, $T(F) = F^{-1}(p)$. By a calculation similar to that for the median,
$$ \dot T(F; G-F) = -\frac{(G-F)(F^{-1}(p))}{f(F^{-1}(p))} = -\int \frac{1_{(-\infty,F^{-1}(p)]}(x) - p}{f(F^{-1}(p))}\, dG(x) $$
and
$$ \psi_F(x) = IC(x; T, F) = -\frac{1}{f(F^{-1}(p))}\{1_{(-\infty,F^{-1}(p)]}(x) - p\}. $$

Now we need to consider other types of derivatives; in particular, the stronger notions of derivative which we will discuss below are those of Fréchet and Hadamard.

Definition 4.3 A functional $T: \mathcal{F} \to R$ is Fréchet differentiable at $F \in \mathcal{F}$ with respect to $d_*$ if there exists a continuous linear functional $\dot T(F;\cdot)$ from finite signed measures to $R$ such that
(1)   $\dfrac{|T(G) - T(F) - \dot T(F; G - F)|}{d_*(G,F)} \to 0 \quad\text{as } d_*(F,G) \to 0.$

Here are some properties of Fréchet differentiation:

Theorem 4.1 Suppose that $d_*$ is a metric for weak convergence (i.e. the Lévy metric for df's on the line; or the Prohorov or dual-bounded-Lipschitz metric for measures on a metric space $(S,d)$). Then:
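The influence function of Example 4.4 has variance $1/(4 f(F^{-1}(1/2))^2)$, which equals $\pi/2$ for the standard normal. A quick Monte Carlo check (my own sketch, assuming $F = N(0,1)$):

```python
# Check that n * Var(sample median) is close to 1 / (4 f(0)^2) = pi/2 for N(0,1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, reps = 400, 5000
meds = np.array([np.median(rng.standard_normal(n)) for _ in range(reps)])
emp_var = n * meds.var()
theory = 0.25 / stats.norm.pdf(0.0)**2           # = pi/2, about 1.5708
print("empirical n*Var(median):", round(emp_var, 3), " theory:", round(theory, 4))
```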

A. If $\dot T$ exists in the Fréchet sense, then it is unique, and $T$ is Gateaux differentiable with Gateaux derivative $\dot T$.
B. If $T$ is Fréchet differentiable at $F$, then $T$ is continuous at $F$.
C. $\dot T(F; G - F) = \int \psi\, d(G - F) = \int\big(\psi - \int\psi\, dF\big)\, dG$, where the function $\psi$ is bounded and continuous.

Proof. See Huber (1981), Proposition 5.1, page 37.

Fréchet differentiability leads to an easy proof of asymptotic normality if the metric $d_*$ is compatible with the empirical df or empirical measure.

Theorem 4.2 Suppose that $T$ is Fréchet differentiable at $F$ with respect to $d_*$ and that
(2)   $\sqrt{n}\, d_*(F_n, F) = O_p(1).$
Then
$$ \sqrt{n}(T(F_n) - T(F)) = \int \psi_F\, d\{\sqrt{n}(F_n - F)\} + o_p(1) = \frac{1}{\sqrt n}\sum_{i=1}^n \psi_F(X_i) + o_p(1) \to_d N(0, E\psi_F^2(X)). $$

Proof. By Fréchet differentiability of $T$ at $F$,
$$ \sqrt{n}(T(F_n) - T(F)) = \sqrt n\int \psi_F\, dF_n + \sqrt{n}\, o(d_*(F_n,F))
= \sqrt n\int \psi_F\, dF_n + \frac{o(d_*(F_n,F))}{d_*(F_n,F)}\,\sqrt n\, d_*(F_n,F)
= \sqrt n\int \psi_F\, dF_n + o(1)\, O_p(1) $$
by (2) and (1).

Note that if $d_*$ is the Lévy metric $d_L$ or the Kolmogorov metric $d_K$ on the line, then (2) is satisfied:
$$ \sqrt n\, d_L(F_n,F) \le \sqrt n\, d_K(F_n,F) = \sqrt n\,\|F_n - F\|_\infty = \|U_n(F)\|_\infty \to_d \|U(F)\|_\infty. $$
Unfortunately, if $d_* = d_{Pr}$ or $d_* = d_{BL}$, then $\sqrt n\, d_*(F_n,F)$ is not $O_p(1)$ in general; see Dudley (1969), Kersting (1978), and Bretagnolle and Huber-Carol (1977). Thus we are led to consideration of other metrics, such as the Kolmogorov metric and generalizations thereof, for problems concerning functionals $T(P)$ of probability distributions $P$.

While some functionals $T$ are Fréchet differentiable with respect to the supremum or Kolmogorov metric, we can make more functionals differentiable by considering a somewhat weaker notion of differentiability as follows:
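Condition (2) with $d_* = d_K$ is just the statement that the Kolmogorov-Smirnov statistic is $O_p(1)$. The simulation below is my own sketch; it compares the quantiles of $\sqrt n\,\|F_n - F\|_\infty$ with those of the limiting distribution of $\|U\|_\infty$, which is available in scipy as kstwobign.

```python
# sqrt(n) ||F_n - F|| is O_p(1): simulate it for uniform(0,1) data and compare
# with the classical Kolmogorov limiting distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, reps = 1000, 2000
stat = np.empty(reps)
for r in range(reps):
    u = np.sort(rng.uniform(size=n))            # F = uniform(0,1) without loss of generality
    grid = np.arange(1, n + 1) / n
    # D_n = max(D_n^+, D_n^-), evaluated at the order statistics
    stat[r] = np.sqrt(n) * max(np.max(grid - u), np.max(u - (grid - 1/n)))

for q in [0.5, 0.9, 0.99]:
    print(q, round(np.quantile(stat, q), 3), round(stats.kstwobign.ppf(q), 3))
```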

Definition 4.4 A functional $T: \mathcal{F} \to R$ is Hadamard differentiable at $F$ with respect to the Kolmogorov distance $d_K = \|\cdot\|_\infty$ (or compactly differentiable with respect to $d_K$) if there exists $\dot T(F;\cdot)$ continuous and linear satisfying
$$ \frac{|T(F_t) - T(F) - \dot T(F; F_t - F)|}{t} = o(1) $$
for all $\{F_t\}$ satisfying $\|t^{-1}(F_t - F) - \Delta\|_\infty \to 0$ for some function $\Delta$.

The motivation for this definition is simply that we can write
$$ \sqrt n(T(F_n) - T(F)) = \frac{T\big(F + n^{-1/2}\,\sqrt n(F_n - F)\big) - T(F)}{n^{-1/2}} $$
where $\sqrt n(F_n - F) \stackrel{d}{=} U_n(F) \Rightarrow U(F)$. Hence we can easily deduce the following theorem.

Theorem 4.3 Suppose that $T: \mathcal{F} \to R$ is Hadamard differentiable at $F$ with respect to $\|\cdot\|_\infty$. Then
$$ \sqrt n(T(F_n) - T(F)) \to_d N\Big(0,\, E\big[\dot T\big(F;\, 1_{(-\infty,\cdot]}(X) - F\big)^2\big]\Big). $$
Moreover,
$$ \sqrt n(T(F_n) - T(F)) - \dot T\big(F;\, \sqrt n(F_n - F)\big) = o_p(1). $$

Proof. This is easily proved using a Skorokhod construction of the empirical process, or by the extended continuous mapping theorem. Gill (1989) used the Skorokhod approach; Wellner (1989) pointed out the extended continuous mapping proof.

One way of treating all the kinds of differentiability we have discussed so far is as follows. Define
$$ \mathrm{Rem}(F + th) \equiv T(F_t) - T(F) - \dot T(F; F_t - F); $$
here $h = t^{-1}(F_t - F)$. Let $\mathcal{S}$ be a collection of subsets of the metric space $(\mathcal{F}, d_*)$. Then $T$ is $\mathcal{S}$-differentiable at $F$ with derivative $\dot T$ if for all $S \in \mathcal{S}$
$$ \frac{\mathrm{Rem}(F + th)}{t} \to 0 \quad\text{as } t \to 0, \text{ uniformly in } h \in S. $$
Now different choices of $\mathcal{S}$ yield different degrees of goodness of the linear approximation of $T$ by $\dot T$ at $F$. The three most common choices are just those we have discussed:
A. When $\mathcal{S} = \{\text{all singletons of } (\mathcal{F}, d_*)\}$, $T$ is called Gateaux or directionally differentiable.
B. When $\mathcal{S} = \{\text{all compact subsets of } (\mathcal{F}, d_*)\}$, $T$ is called Hadamard or compactly differentiable.
C. When $\mathcal{S} = \{\text{all bounded subsets of } (\mathcal{F}, d_*)\}$, $T$ is called Fréchet (or boundedly) differentiable.

Here is a simple example of a functional $T$ defined on pairs of probability distributions (or, in this case, distribution functions) which is compactly differentiable with respect to the familiar supremum (or uniform or Kolmogorov) norm, but which is not Fréchet differentiable with respect to this norm.

Example 4.6 For distribution functions $F, G$ on $R$, define $T$ by $T(F,G) = \int F\, dG = P(X \le Y)$ where $X \sim F$, $Y \sim G$ are independent. Let $\|F\|_\infty \equiv \sup_x |F(x)| = \|F\|_{\mathcal{F}}$ where $\mathcal{F} = \{1_{(-\infty,t]} : t \in R\}$.

Proposition 4.1
A. $T(F,G)$ is Hadamard differentiable with respect to $\|\cdot\|_\infty$ at every pair of df's $(F,G)$, with derivative $\dot T$ given by
(3)   $\dot T((F,G);\, \alpha, \beta) = \int \alpha\, dG - \int \beta\, dF.$
B. $T(F,G)$ is not Fréchet differentiable with respect to $\|\cdot\|_\infty$.

Proof. The following proof of A is basically Gill's (1989), Lemma 3. For $F_t \to F$ and $G_t \to G$, define $\alpha_t \equiv (F_t - F)/t$ and $\beta_t \equiv (G_t - G)/t$; for Hadamard differentiability we have $\alpha_t \to \alpha$ and $\beta_t \to \beta$ with respect to $\|\cdot\|_\infty$ for some (bounded) functions $\alpha$ and $\beta$. Now
$$ \frac{T(F_t,G_t) - T(F,G)}{t} - \dot T(\alpha_t,\beta_t) = t\int \alpha_t\, d\beta_t = \int \alpha\, d(G_t - G) + \int (\alpha_t - \alpha)\, d(G_t - G). $$
Since $\dot T$ is continuous, it suffices to show that the right side converges to 0. The second term on the right is bounded by
$$ \|\alpha_t - \alpha\|_\infty\Big\{\int dG_t + \int dG\Big\} \le 2\|\alpha_t - \alpha\|_\infty \to 0. $$
Fix $\epsilon > 0$. Since the limit function $\alpha$ in the first term is right-continuous with left limits, there is a step function $\bar\alpha$ with a finite number $m$ of jumps which satisfies $\|\alpha - \bar\alpha\|_\infty < \epsilon$. Thus the first term may be bounded as follows:
$$ \Big|\int \alpha\, d(G_t - G)\Big| \le \Big|\int (\alpha - \bar\alpha)\, d(G_t - G)\Big| + \Big|\int \bar\alpha\, d(G_t - G)\Big|
\le 2\|\alpha - \bar\alpha\|_\infty + \sum_{j=1}^m |\bar\alpha(x_j)|\,\big|(G_t - G)[x_{j-1}, x_j)\big|
\le 2\epsilon + 2m\|\bar\alpha\|_\infty\|G_t - G\|_\infty \to 2\epsilon. $$
Since $\epsilon$ is arbitrary, this completes the proof of A.

Here is the proof of B. If $T$ were Fréchet differentiable, it would have to be true that
(a)   $T(F_n, G_n) - T(F,G) - \dot T(F_n - F,\, G_n - G) = o\big(\|F_n - F\|_\infty \vee \|G_n - G\|_\infty\big)$
for every sequence of pairs of df's $\{(F_n, G_n)\}$ with $\|F_n - F\|_\infty \to 0$ and $\|G_n - G\|_\infty \to 0$. We now exhibit a sequence $\{(F_n, G_n)\}$ for which (a) fails. By straightforward algebra using (3),
(b)   $T(F_n, G_n) - T(F,G) - \dot T(F_n - F,\, G_n - G) = \int (F_n - F)\, d(G_n - G).$

Consider the df's $F_n$ and $G_n$ corresponding to the measures which put masses $1/n$ at $0, 1/n, \ldots, (n-1)/n$ and at $1/n, 2/n, \ldots, 1$ respectively:
$$ F_n = \frac1n\sum_{k=0}^{n-1}\delta_{k/n}, \qquad G_n = \frac1n\sum_{k=1}^{n}\delta_{k/n}. $$
Both of these sequences of df's converge uniformly to the uniform$(0,1)$ df $F(x) \equiv x \equiv G(x)$, and furthermore $\|F_n - F\|_\infty = \|G_n - G\|_\infty = 1/n$. Now
$$ (F_n - F)(x) = \sum_{k=1}^n \Big(\frac{k}{n} - x\Big)\, 1_{[(k-1)/n,\, k/n)}(x), \qquad (F_n - F)(1) = 0, $$
and
$$ (G_n - G)(x) = (F_n - F)(x) - \frac1n = \sum_{k=1}^n \Big(\frac{k-1}{n} - x\Big)\, 1_{[(k-1)/n,\, k/n)}(x) \quad\text{for } x\in[0,1), $$
with $(G_n - G)(1) = 0$. Thus, separating $G_n - G$ into its discrete and continuous parts,
$$ \int (F_n - F)\, d(G_n - G) = \frac1n\sum_{k=1}^n (F_n - F)(k/n) - \int_0^1 (F_n - F)(t)\, dt, $$
and computing the two terms shows that this is of exact order $1/(2n)$, so that
$$ \int (F_n - F)\, d(G_n - G) = O(1/n) \ne o\big(\|F_n - F\|_\infty \vee \|G_n - G\|_\infty\big) = o(1/n). $$
Hence (a) fails and $T$ is not Fréchet differentiable.

Remark 4.1 The previous example was suggested by R. M. Dudley. Dudley (1992), (1994) has studied other metrics, based on $p$-variation norms, for which this $T$ is almost Fréchet differentiable, and for which some functionals may be Fréchet differentiable even though Fréchet differentiability with respect to $\|\cdot\|_\infty$ may fail.

A particular refinement of Hadamard differentiability which is very useful is as follows: since the limiting $P$-Brownian bridge process $G_P$ of the empirical process $G_n \equiv \sqrt n(P_n - P)$ is in $C_u(\mathcal{F},\rho_P)$ with probability one for any $\mathcal{F} \in CLT(P)$, we say that $T$ is Hadamard differentiable tangentially to $C_u(\mathcal{F},\rho_P)$ at $P \in \mathcal{P}$ if there is a continuous linear map $\dot T: C_u(\mathcal{F},\rho_P) \to B$ so that
$$ \frac{T(P_t) - T(P)}{t} \to \dot T(\Delta_0) $$
holds for any path $\{P_t\}$ such that $\Delta_t \equiv (P_t - P)/t$ satisfies $\|\Delta_t - \Delta_0\|_{\mathcal{F}} \to 0$ with $\Delta_0 \in C_u(\mathcal{F},\rho_P)$. Then a nice version of the delta method for nonlinear functionals $T$ of $P_n$ is given by the following theorem:
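The order-$1/(2n)$ claim can also be checked numerically. The sketch below is my own (not from the notes): it computes $T(F_n,G_n) - T(F,G) - \dot T(F_n - F,\, G_n - G)$ directly for the sequence above, approximating the derivative term by Riemann sums, and shows that $n$ times the remainder does not tend to zero.

```python
# Remainder int (F_n - F) d(G_n - G) for the counterexample: it is O(1/n),
# roughly 1/(2n), and hence not o(1/n).
import numpy as np

def remainder(n):
    x = np.arange(0, n) / n            # support of F_n: 0, 1/n, ..., (n-1)/n
    y = np.arange(1, n + 1) / n        # support of G_n: 1/n, ..., 1
    # T(F_n, G_n) = (1/n^2) #{(i, j): x_i <= y_j};  T(F, G) = 1/2 for F = G = U(0,1)
    T_n = np.mean(x[:, None] <= y[None, :])
    # derivative term dT = int (F_n - F) dG - int (G_n - G) dF with dF = dG = dt,
    # approximated by a Riemann sum on a fine grid of [0, 1)
    t = np.linspace(0, 1, 200001)[:-1]
    Fn = np.searchsorted(x, t, side="right") / n
    Gn = np.searchsorted(y, t, side="right") / n
    dT = np.mean(Fn - t) - np.mean(Gn - t)
    return T_n - 0.5 - dT

for n in [10, 50, 100, 500]:
    r = remainder(n)
    print(n, round(r, 6), " n * remainder =", round(n * r, 4), " 1/(2n) =", round(1/(2*n), 6))
```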

Theorem 4.4 Suppose that:
(i) $T$ is Hadamard differentiable tangentially to $C_u(\mathcal{F},\rho_P)$ at $P \in \mathcal{P}$.
(ii) $\mathcal{F} \in CLT(P)$: $\sqrt n(P_n - P) \Rightarrow G_P$ (where $G_P$ takes values in $C_u(\mathcal{F},\rho_P)$ by definition of $\mathcal{F} \in CLT(P)$).
Then
(4)   $\sqrt n(T(P_n) - T(P)) \Rightarrow \dot T(G_P).$

Proof. Define $g_n: \ell^\infty(\mathcal{F}) \to B$ by $g_n(x) \equiv \sqrt n(T(P + n^{-1/2}x) - T(P))$. Then, by (i), for $\{\Delta_n\} \subset \ell^\infty(\mathcal{F})$ with $\|\Delta_n - \Delta_0\|_{\mathcal{F}} \to 0$ and $\Delta_0 \in C_u(\mathcal{F},\rho_P)$,
$$ g_n(\Delta_n) \to \dot T(\Delta_0) \equiv g(\Delta_0). $$
Thus by the extended continuous mapping theorem in the Hoffmann-Jorgensen weak convergence theory (see van der Vaart and Wellner (1996), Theorem 1.11.1, page 67), $g_n(G_n) \Rightarrow g(G_P) = \dot T(G_P)$, and hence (4) holds.

The immediate corollary for the classical Mann-Whitney form of the Wilcoxon statistic given in Example 4.6 is:

Corollary 1 If $X_1,\ldots,X_m$ are i.i.d. $F$ and independent of $Y_1,\ldots,Y_n$ which are i.i.d. $G$, $0 < P_{F,G}(X \le Y) < 1$, and $\lambda_N \equiv m/N \equiv m/(m+n) \to \lambda \in (0,1)$, then
$$ \sqrt{\frac{mn}{N}}\Big\{\int F_m\, dG_n - \int F\, dG\Big\} = \sqrt{\frac{mn}{N}}\{T(F_m,G_n) - T(F,G)\}
\Rightarrow \sqrt{1-\lambda}\int U(F)\, dG - \sqrt{\lambda}\int V(G)\, dF \sim N(0, \sigma^2_\lambda(F,G)) $$
where $U$ and $V$ are two independent Brownian bridge processes and
$$ \sigma^2_\lambda(F,G) = (1-\lambda)\,\mathrm{Var}(G(X)) + \lambda\,\mathrm{Var}(F(Y)). $$

This is, of course, well known, and can be proved in a variety of other ways (by treating $T(F_m,G_n)$ as a two-sample U-statistic, or a rank statistic, or by a direct analysis), but the proof via the differentiable-functional approach seems instructive and useful. (See e.g. Lehmann (1975), Statistical Methods Based on Ranks, Section 5, and especially Example 20, page 365.) Other interesting applications have been given by Grübel (1988), who studies the asymptotic theory of the length of the shorth; Pons and Turckheim (1989), who study bivariate hazard estimators and tests of independence based thereon; and Gill and Johansen (1990), who prove Hadamard differentiability of the product integral. Gill, van der Laan, and Wellner (1992) give applications to several problems connected with estimation of bivariate distributions. Arcones and Giné (1990) study the delta method in connection with M-estimation and the bootstrap. van der Vaart (1991b) shows that Hadamard differentiable functionals preserve asymptotic efficiency properties of estimators.
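A Monte Carlo check of the Corollary (my own sketch, taking $F = G = N(0,1)$ so that $\mathrm{Var}(G(X)) = \mathrm{Var}(F(Y)) = 1/12$ and $\sigma^2_\lambda = 1/12$ for every $\lambda$):

```python
# sqrt(mn/N) (T(F_m, G_n) - 1/2) should have variance close to 1/12 when F = G.
import numpy as np

rng = np.random.default_rng(7)
m, n, reps = 150, 250, 2000
N = m + n
vals = np.empty(reps)
for r in range(reps):
    x = rng.standard_normal(m)
    y = rng.standard_normal(n)
    T_mn = np.mean(x[:, None] <= y[None, :])    # int F_m dG_n
    vals[r] = np.sqrt(m * n / N) * (T_mn - 0.5)

print("empirical variance:", round(vals.var(), 4), " theory 1/12 =", round(1/12, 4))
```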

5. Higher Order Derivatives

The following example illustrates the phenomena which we want to consider here in the simplest possible setting:

Example 5.1 Suppose that $X_1, X_2, \ldots, X_n$ are i.i.d. Bernoulli$(p)$. Then $\sqrt n(\bar X_n - p) \to_d Z \sim N(0, p(1-p))$, and, if $g(p) = p(1-p)$,
$$ \sqrt n(g(\bar X_n) - g(p)) \to_d g'(p)Z = (1-2p)Z \sim N(0, p(1-p)(1-2p)^2) $$
by the delta method (or "g-prime theorem"). But if $p = 1/2$, since $g'(1/2) = 0$ this yields only $\sqrt n(g(\bar X_n) - 1/4) \to_d 0$. Thus we need to study the higher derivatives of $g$ at $1/2$. Since $g$ is, in fact, a quadratic, we have
$$ g(p) = g(1/2) + 0\cdot(p - 1/2) + \frac{1}{2!}(-2)(p - 1/2)^2 = 1/4 - (p - 1/2)^2. $$
Thus
$$ n(g(\bar X_n) - g(1/2)) = -n(\bar X_n - 1/2)^2 \to_d -Z^2 = -\tfrac14\chi^2_1, $$
where $Z \sim N(0,1/4)$. This is a very simple example of a more general limit theorem which we will develop below.

Now consider a functional $T: \mathcal{F} \to R$ as in Sections 1-4.

Definition 5.1 $T$ is $k$-th order Gateaux differentiable at $F$ if, with $F_t = F + t(G - F)$,
$$ d_kT(F; G - F) \equiv \frac{d^k}{dt^k}T(F_t)\Big|_{t=0} $$
exists. Note that $\dot T(F; G - F) = d_1T(F; G - F)$ if it exists. It is usually the case that
$$ d_kT(F; G - F) = \int\!\cdots\!\int \psi_k(x_1,\ldots,x_k)\, d(G-F)(x_1)\cdots d(G-F)(x_k) = \int\!\cdots\!\int \psi_{k,F}(x_1,\ldots,x_k)\, dG(x_1)\cdots dG(x_k); $$
here the function $\psi_{k,F}$ is determined from $\psi_k$ by a straightforward centering recipe:
$$ \psi_{1,F}(x_1) \equiv \psi_1(x_1) - \int\psi_1\, dF, $$
$$ \psi_{2,F}(x_1,x_2) \equiv \psi_2(x_1,x_2) - \int\psi_2(x_1,x_2)\, dF(x_1) - \int\psi_2(x_1,x_2)\, dF(x_2) + \int\!\!\int\psi_2(x_1,x_2)\, dF(x_1)\, dF(x_2), $$
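Example 5.1 is easy to verify by simulation. The sketch below is my own: at $p = 1/2$ the statistic $n(g(\bar X_n) - 1/4)$ behaves like $-(1/4)\chi^2_1$.

```python
# Second-order delta method demo: n (g(Xbar) - 1/4) -> -(1/4) chi^2_1 for
# Bernoulli(1/2) data and g(p) = p(1-p).
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n, reps = 2000, 20000
xbar = rng.binomial(n, 0.5, size=reps) / n
g = xbar * (1 - xbar)
limit_sample = n * (g - 0.25)                  # should look like -(1/4) chi^2_1

for q in [0.1, 0.5, 0.9]:
    emp = np.quantile(limit_sample, q)
    theory = -0.25 * stats.chi2.ppf(1 - q, df=1)   # q-quantile of -(1/4) chi^2_1
    print(q, round(emp, 4), round(theory, 4))
```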

and so forth; see Serfling (1980), Section 6.3.2, Lemma A, page 222. A consequence of this is that we can write
$$ n^{k/2}\, d_kT(F; F_n - F) = n^{k/2}\int\!\cdots\!\int \psi_k(x_1,\ldots,x_k)\, d(F_n - F)(x_1)\cdots d(F_n - F)(x_k)
= n^{k/2}\int\!\cdots\!\int \psi_{k,F}(x_1,\ldots,x_k)\, dF_n(x_1)\cdots dF_n(x_k)
= n^{-k/2}\sum_{i_1=1}^n\cdots\sum_{i_k=1}^n \psi_{k,F}(X_{i_1},\ldots,X_{i_k}). $$
This is exactly $n^{k/2}$ times a V-statistic of order $k$. Also note that, by Taylor's formula for a function of one real variable $t$, we have
$$ T(F_t) - T(F) = \sum_{j=1}^{k}\frac{t^j}{j!}\, d_jT(F; G - F) + \frac{t^{k+1}}{(k+1)!}\,\frac{d^{k+1}}{ds^{k+1}}T(F_s)\Big|_{s=t^*} $$
for some $t^* \in [0,t]$. To analyze the asymptotic behavior of $T$ in terms of $d_1T, d_2T, \ldots$, it is typically the first non-zero term $d_mT$ which dominates.

Serfling's condition $A_m$: suppose that
(i)
$$ \mathrm{Var}_F\big(\psi_{k,F}(X_1,\ldots,X_k)\big) \;\begin{cases} = 0 & \text{for } k < m, \\ > 0 & \text{for } k = m; \end{cases} $$
(ii)
$$ R_{mn} \equiv T(F_n) - T(F) - \frac{1}{m!}\, d_mT(F; F_n - F) \quad\text{satisfies}\quad n^{m/2} R_{mn} = o_p(1). $$
This condition will be invoked first with $m = 1$ and then with $m = 2$ in the following two theorems.

Theorem 5.1 (Serfling's Theorem A). Suppose that $X_1,\ldots,X_n$ are i.i.d. $F$, and suppose that $T$ satisfies $A_1$. Let $\mu(T,F) \equiv E_F\psi_{1,F}(X_1) = 0$ and $\sigma^2(T,F) \equiv \mathrm{Var}(\psi_{1,F}(X_1))$, and suppose that $\sigma^2(T,F) < \infty$. Then
$$ \sqrt n(T(F_n) - T(F)) \to_d N(0, \sigma^2(T,F)). $$

Proof. By (ii) of condition $A_1$,
$$ \sqrt n(T(F_n) - T(F)) = o_p(1) + \sqrt n\, d_1T(F; F_n - F) = o_p(1) + \frac{1}{\sqrt n}\sum_{i=1}^n \psi_{1,F}(X_i) \to_d N(0, \sigma^2(T,F)). $$
This is essentially the same as in our earlier proofs of asymptotic normality using differentiability, but here we are hypothesizing that the remainder term is asymptotically negligible (and thus the real effort in using the theorem will be to verify the second part of $A_1$).

Theorem 5.2 (Serfling's Theorem B). Suppose that $X_1,\ldots,X_n$ are i.i.d. $F$, and suppose that $T$ satisfies $A_2$ with $\psi_{2,F}(x,y) = \psi_{2,F}(y,x)$, $E_F\psi^2_{2,F}(X_1,X_2) < \infty$, $E_F|\psi_{2,F}(X_1,X_1)| < \infty$, and $E_F\psi_{2,F}(x, X_2) = 0$. Define $A: L_2(F) \to L_2(F)$ by
$$ Ag(x) = \int \psi_{2,F}(x,y)\, g(y)\, dF(y), \qquad g \in L_2(F), $$
and let $\{\lambda_k\}$ be the eigenvalues of $A$. Then
$$ n\{T(F_n) - T(F)\} \to_d \frac12\sum_{k=1}^\infty \lambda_k Z_k^2 $$
where $Z_1, Z_2, \ldots$ are i.i.d. $N(0,1)$.

Sketch of the proof. By condition $A_2$ we can write
$$ n(T(F_n) - T(F)) = n\Big\{T(F_n) - T(F) - \frac{1}{2!}\, d_2T(F; F_n - F)\Big\} + \frac{n}{2!}\, d_2T(F; F_n - F)
= o_p(1) + \frac{n}{2}\int\!\!\int \psi_{2,F}(x_1,x_2)\, dF_n(x_1)\, dF_n(x_2). \tag{1} $$
Now denote the orthonormal eigenfunctions of the (Hilbert-Schmidt) operator $A$ by $\{\phi_k\}$ and the corresponding eigenvalues by $\{\lambda_k\}$: thus $A\phi_k = \lambda_k\phi_k$. Then it is well known that
$$ \psi_{2,F}(x,y) = \sum_{k=1}^\infty \lambda_k\phi_k(x)\phi_k(y) \tag{2} $$
in the sense of $L_2(F\times F)$ convergence. Hence
$$ n\int\!\!\int \psi_{2,F}(x_1,x_2)\, dF_n(x_1)\, dF_n(x_2) = n\sum_{k=1}^\infty \lambda_k\Big\{\int\phi_k\, dF_n\Big\}^2 = \sum_{k=1}^\infty \lambda_k\Big\{\frac{1}{\sqrt n}\sum_{i=1}^n\phi_k(X_i)\Big\}^2 \to_d \sum_{k=1}^\infty \lambda_k Z_k^2 \tag{3} $$
where the $Z_k$'s are i.i.d. $N(0,1)$ since $E_F\phi_k(X_i) = 0$, $E_F\phi_k^2(X_i) = 1$, and $E_F\phi_j(X_i)\phi_k(X_i) = 0$ for $j \ne k$. Combining (1) and (3) completes the heuristic proof. The reason that this is heuristic is because of the infinite series appearing in (2) and (3). The complete proof entails consideration of finite sums and the corresponding approximation arguments; see Serfling (1980) for the U-statistic case. But note that the V-statistic argument on page 227 just involves throwing the diagonal terms back in, and is therefore really easier.

Remark 5.1 It seems to me that Serfling's $\mu(T,F) = 0$ as formulated above. It also seems to me that he has missed the factor of $1/2$ appearing in the limit distribution.
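The simplest degenerate case (my own illustration, not in the notes) is $T(F) = (\int x\, dF)^2$ at a mean-zero $F$ with variance $\sigma^2$: then $d_1T(F;\cdot) = 0$, $\psi_{2,F}(x,y) = 2xy$, and the operator $A$ has the single eigenvalue $\lambda_1 = 2\sigma^2$ (eigenfunction $x/\sigma$), so Theorem 5.2 gives $n\,T(F_n) = n\bar X_n^2 \to_d \tfrac12\lambda_1 Z^2 = \sigma^2\chi^2_1$, which also exhibits the factor $1/2$ mentioned in Remark 5.1.

```python
# Degenerate V-statistic demo: n * Xbar^2 -> sigma^2 chi^2_1 for mean-zero data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
sigma = 2.0
n, reps = 500, 10000
xbar = sigma * rng.standard_normal((reps, n)).mean(axis=1)   # sample means, sd(X) = sigma
limit_sample = n * xbar**2                                   # n (T(F_n) - T(F)) with T(F) = 0

for q in [0.5, 0.9, 0.99]:
    print(q, round(np.quantile(limit_sample, q), 3),
          round(sigma**2 * stats.chi2.ppf(q, df=1), 3))
```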

Remark 5.2 Gregory (1977) gives related but stronger results which do not require $E_F\psi_{2,F}(X_1,X_1) = \sum_{k=1}^\infty\lambda_k < \infty$ and which apply to some interesting statistics with $\lambda_k = 1/k$. Note that the infinite series $\sum_{k=1}^\infty k^{-1}(Z_k^2 - 1)$ defines a proper random variable since the summands have mean 0 and variances $2/k^2$; such limits actually arise as limit distributions of the popular Shapiro-Wilk tests for normality; see e.g. DeWet and Venter (1972), (1973).
