Stochastic Convergence, Delta Method & Moment Estimators


1 Stochastic Convergence, Delta Method & Moment Estimators. Seminar on Asymptotic Statistics. Daniel Hoffmann, University of Kaiserslautern, Department of Mathematics. February 13, 2015. [Footer on every slide: Daniel Hoffmann (TU KL), Seminar: Asymptotic Statistics, February 13, 2015, slide x / 54]

2 Overview
1 Stochastic Convergence: concepts of convergence and basic results; theoretical examples: LLN and CLT; tools for weak convergence; more on weak convergence: tightness and Prohorov's theorem; stochastic Landau notation
2 Delta Method: basic result; Application I: testing variance; Application II: asymptotic confidence intervals and variance-stabilizing transformations
3 Moment Estimators: Method of Moments: definition; existence and asymptotic normality
4 List of literature

3 Chapter 1: Stochastic Convergence


5 Scope and general assumptions
We recall the basic notions of stochastic convergence from probability theory and take a closer look at weak convergence, culminating in Prohorov's theorem. Throughout this talk we fix a probability space (Ω, A, P) on which all appearing random variables are defined unless stated otherwise. Furthermore, let (S, d) be a separable¹ metric space, which serves as codomain; later on we restrict ourselves to the case S = R^k. Let L(P, S) := {X : Ω → S | X is A-B(S)-measurable} denote the space of all random variables of interest.
¹ I.e. there is some countable, dense subset of S. This is just a technical assumption to guarantee the measurability of events like {d(X, Y) > η} for random variables X, Y and a threshold η.

6 Concepts of convergence: definitions and properties
Definition. Let (X_n)_{n∈N} ⊆ L(P, S) and X ∈ L(P, S). The sequence (X_n)_{n∈N} is said to ...
- converge almost surely to X (notation: X_n →^{a.s.} X) if there is some P-null set N ∈ A such that X_n(ω) → X(ω) as n → ∞ for each ω ∈ Ω \ N;
- converge in probability to X (notation: X_n →^P X) if for all ε > 0: lim_{n→∞} P(d(X_n, X) > ε) = 0;
- converge weakly to X (notation: X_n ⇝ X) if for each f ∈ C_b(S): lim_{n→∞} ∫_S f dP(X_n) = ∫_S f dP(X);
- converge in L^p-sense, p ∈ [1, ∞], to X if S = R (notation: X_n →^{L^p} X) if lim_{n→∞} ‖X_n − X‖_{L^p(P)} = 0.
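Convergence in probability can be checked in closed form in simple cases. A minimal sketch of my own (not from the talk): for U uniform on (0, 1), the sequence X_n := U^n converges to 0 in probability, and P(X_n > ε) = P(U > ε^{1/n}) = 1 − ε^{1/n} can be evaluated exactly:

```python
# Convergence in probability, evaluated exactly: X_n = U^n with U ~ Unif(0,1).
# P(X_n > eps) = P(U > eps^(1/n)) = 1 - eps^(1/n) -> 0 as n -> infinity.
eps = 0.01

def prob_exceed(n: int, eps: float) -> float:
    """Exact value of P(U^n > eps) for U ~ Unif(0,1)."""
    return 1.0 - eps ** (1.0 / n)

probs = [prob_exceed(n, eps) for n in (1, 10, 100, 1000, 10000)]
print(probs)  # strictly decreasing towards 0
```

Since ε^{1/n} → 1, the exceedance probabilities shrink to 0 for every fixed ε, exactly as the definition requires.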

7 From probability theory one is familiar with the following relations between these different concepts of convergence:
Proposition (relations). Let (X_n)_{n∈N} ⊆ L(P, S) and X ∈ L(P, S). Then it holds:
(a) X_n →^{a.s.} X ⟹ X_n →^P X ⟹ X_n ⇝ X.
(b) Subsequence principle: X_n →^P X ⟺ for every subsequence (n_k)_{k∈N} there is a further subsequence (n_{k_l})_{l∈N} such that X_{n_{k_l}} →^{a.s.} X.
(c) Slutsky's lemma: let S = R^k and let A_n ∈ L(P, R^k), B_n ∈ L(P, R), n ∈ N, be such that A_n →^P a ∈ R^k and B_n →^P b ∈ R. If X_n ⇝ X, then A_n + B_n X_n ⇝ a + bX.
(d) Let S = R and p ∈ [1, ∞). Then X_n →^{L^p} X ⟹ X_n →^P X.

8 Moreover, the above notions of convergence are compatible with continuity, i.e. a convergent sequence of random variables can be transported to another space by a continuous function while preserving the convergence:
Proposition (continuous mapping principle). Let X_n, X ∈ L(P, S), n ∈ N, and let Φ : S → S' be a Borel-measurable, P(X)-a.e. continuous mapping, where (S', d') is another metric space. Then one has X_n →^* X ⟹ Φ(X_n) →^* Φ(X), where * ∈ {⇝, P, a.s.}.


10 Theoretical examples: where do these notions occur?
The most important examples include:
Theorem (weak law of large numbers). Let (X_n)_{n∈N} be a sequence of uncorrelated R-valued random variables satisfying sup_{n∈N} Var[X_n] < ∞. Then (1/n) Σ_{i=1}^n (X_i − E[X_i]) → 0 in probability and in L².
Theorem (strong law of large numbers). Let (X_n)_{n∈N} ⊆ L¹(P) be a sequence of i.i.d. random variables. Then (1/n) Σ_{i=1}^n X_i → E[X_1] almost surely and in L¹.
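A quick Monte Carlo illustration of the weak law (my own sketch, not part of the talk): for i.i.d. Unif(0, 1) observations, the probability P(|X̄_n − 1/2| > ε) can be estimated by repeated simulation and visibly shrinks as n grows:

```python
import random
import statistics

def exceed_prob(n: int, eps: float, reps: int = 2000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(|mean of n Unif(0,1) draws - 1/2| > eps)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        m = statistics.fmean(rng.random() for _ in range(n))
        if abs(m - 0.5) > eps:
            hits += 1
    return hits / reps

for n in (10, 100, 1000):
    print(n, exceed_prob(n, eps=0.05))
```

The estimated probabilities drop from roughly one half towards zero, matching the L²-bound Var(X̄_n) = 1/(12n) → 0 behind the weak law.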


12 Theorem (central limit theorem). Let (X_n)_{n∈N} be a sequence of i.i.d. R^k-valued random variables satisfying E[‖X_1‖₂²] < ∞. Then we have P( (1/√n) Σ_{i=1}^n (X_i − E[X_i]) ) ⇝ N_k(0, Cov[X_1]).
Theorem (weak law of small numbers). Let {X_{n,m} : m = 1, ..., n, n ∈ N} be a triangular array of independent random variables with P(X_{n,m}) = Bin(1, p_{n,m}), m = 1, ..., n, n ∈ N. Suppose that Σ_{m=1}^n p_{n,m} → λ > 0 and max_{m=1,...,n} p_{n,m} → 0 as n → ∞. Then we have P( Σ_{m=1}^n X_{n,m} ) ⇝ Poi(λ).
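In the homogeneous case p_{n,m} = λ/n the row sum is Bin(n, λ/n)-distributed, so the law of small numbers can be checked exactly. A small sketch of my own, measuring the total-variation distance to Poi(λ) with the sum truncated at a large k (the tail beyond is negligible here):

```python
import math

def tv_bin_poisson(n: int, lam: float, kmax: int = 200) -> float:
    """Total variation distance between Bin(n, lam/n) and Poi(lam),
    summing |pmf differences| up to kmax and halving."""
    p = lam / n
    tv = 0.0
    pois = math.exp(-lam)  # Poi(lam) pmf at k = 0, updated iteratively
    for k in range(kmax + 1):
        if k > 0:
            pois *= lam / k
        binom = math.comb(n, k) * p**k * (1 - p) ** (n - k) if k <= n else 0.0
        tv += abs(binom - pois)
    return tv / 2.0

for n in (5, 50, 500):
    print(n, tv_bin_poisson(n, lam=2.0))  # decreases towards 0
```

The distances shrink roughly like 1/n, consistent with the classical Le Cam bound TV ≤ Σ_m p_{n,m}² = λ²/n for this array.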


14 Tools for weak convergence
Definition (weak convergence, general approach). Let μ_n, μ, n ∈ N, be probability measures on B(S). Then the sequence (μ_n)_{n∈N} converges weakly to μ iff lim_{n→∞} ∫_S f dμ_n = ∫_S f dμ for all f ∈ C_b(S).
Remark (a slight generalization). Hence weak convergence of random variables only depends on their distributions: X_n ⇝ X ⟺ P(X_n) ⇝ P(X). Due to this equivalence, it is possible to define weak convergence for random variables defined on different probability spaces: X_n on (Ω_n, A_n, P_n), n ∈ N, and X on (Ω, A, P). For the sake of simplicity, we will not consider this slight generalization here.

15 From probability theory one is familiar with the following characterization of weak convergence:
Theorem (portmanteau theorem). Let X_n, X, n ∈ N, be S-valued random variables. Then the following are equivalent:
(a) X_n ⇝ X, i.e. E[f(X_n)] → E[f(X)] as n → ∞ for all f ∈ C_b(S).
(b) E[f(X_n)] → E[f(X)] for all Lipschitz-continuous f ∈ C_b(S).
(c) P(X ∈ O) ≤ liminf_{n→∞} P(X_n ∈ O) for all open O ⊆ S.
(d) P(X ∈ F) ≥ limsup_{n→∞} P(X_n ∈ F) for all closed F ⊆ S.
(e) P(X ∈ B) = lim_{n→∞} P(X_n ∈ B) for all B ∈ B(S) with P(X)-negligible boundary, i.e. P(X ∈ ∂B) = 0.
(f) E[f(X_n)] → E[f(X)] for all bounded, B(S)-measurable functions f : S → R that are P(X)-a.e. continuous.


17 In Euclidean k-space the distribution function is an appropriate tool to characterize weak convergence:
Definition. Let X ∈ L(P, R^k). Then its (cumulative) distribution function, for short cdf, is given by F_X : R^k → [0, 1], x ↦ P(X ≤ x) = P(X)( ×_{i=1}^k (−∞, x_i] ).
Remark. Note that, as a consequence of the uniqueness theorem for finite measures, the cdf determines the distribution of X uniquely, since E := { ×_{i=1}^k (−∞, x_i] : x_1, ..., x_k ∈ R } is a π-system (i.e. it is ∩-stable) that generates B(R^k).


19 Proposition (weak convergence on R^k via cdf). Let X_n, X, n ∈ N, be R^k-valued random variables. Then X_n ⇝ X iff F_{X_n}(x) → F_X(x) as n → ∞ for all x ∈ R^k at which F_X is continuous.
Example (N₁(0, 1/n) ⇝ δ₀).
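The example can be checked numerically (my own sketch): F_{X_n}(x) = Φ(√n · x) converges to 1_{[0,∞)}(x) = F_{δ₀}(x) for every x ≠ 0, while at the discontinuity point x = 0 of the limit cdf we only get Φ(0) = 1/2 ≠ 1 = F_{δ₀}(0), which is exactly the point the proposition excludes:

```python
import math
from statistics import NormalDist

std_normal = NormalDist()

def cdf_n(n: int, x: float) -> float:
    """cdf of N(0, 1/n) at x, i.e. Phi(sqrt(n) * x)."""
    return std_normal.cdf(math.sqrt(n) * x)

for x in (-0.1, 0.1):
    print(x, [cdf_n(n, x) for n in (1, 100, 10000)])
print(0.0, cdf_n(10**6, 0.0))  # stays at 0.5: x = 0 is a discontinuity point of F_delta0
```

So pointwise convergence of the cdfs holds at every continuity point of F_{δ₀}, and the failure at x = 0 does not contradict weak convergence.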


22 Some more theory on weak convergence
We have already observed that weak convergence is "weak" in the sense that it is implied by all other concepts of convergence that we have introduced. Let us have a closer look at weak convergence and recall from calculus:
Proposition.
(a) Every convergent sequence in R^k is bounded.
(b) Every bounded sequence in R^k has a convergent subsequence. (Bolzano-Weierstrass theorem)
Is there an analogue involving weak convergence and probabilistic boundedness?

23 Prohorov's theorem
Yes, indeed: Prohorov's theorem answers this question.
Theorem (Prohorov). Let (X_n)_{n∈N} be a sequence of R^k-valued random variables. Then it holds:
(a) If X_n ⇝ X for some R^k-valued random variable X, then (X_n)_{n∈N} is uniformly tight².
(b) If (X_n)_{n∈N} is uniformly tight², then there exists a subsequence (X_{n_j})_{j∈N} with X_{n_j} ⇝ X for some R^k-valued random variable X.
For proving this theorem we need some additional concepts and results.
² This will be made precise shortly.


25 Probabilistic boundedness
Definition (uniform tightness). Let I be an index set and F := {X_i}_{i∈I} a family of R^k-valued random variables. Then F is called uniformly tight or bounded in probability if for every ε > 0 there is a constant M_ε > 0 such that sup_{i∈I} P(‖X_i‖₂ > M_ε) < ε.
Remark. Uniform tightness of a sequence of random vectors in R^k (i.e. I = N) is exactly the definition of the stochastic Landau notation O_P: (X_n)_{n∈N} is uniformly tight iff X_n = O_P(1). We will scrutinize this notation later.


27 Helly's lemma
Definition. A function F : R^k → [0, 1] is called a defective distribution function if there is some finite measure μ on B(R^k) taking values in [0, 1] and a constant c_F ∈ [0, 1] such that F(x) − c_F = μ((−∞, x]) = μ( ×_{i=1}^k (−∞, x_i] ) for all x ∈ R^k.
Remark.
1 By continuity of measures from above we have c_F = lim_{x_i → −∞, i=1,...,k} F(x).
2 A defective distribution function is a cdf iff the underlying finite measure is a probability measure and c_F = 0.


29 Helly's lemma
Lemma (Helly's lemma / Helly's selection theorem). Let (F_n)_{n∈N} be a sequence of distribution functions with domain R^k. Then this sequence possesses a subsequence (F_{n_j})_{j∈N} with the property lim_{j→∞} F_{n_j}(x) = F(x) for each continuity point x ∈ R^k of some defective distribution function F.
Rough idea of the proof. The proof is quite technical, hence we only present the idea of the construction of F. For details, please refer to [Dur10] and [Van98, Lemma 2.5]. (BOARD)


31 Is it really possible that Helly's lemma fails to provide us with an honest cdf? Unfortunately, yes!
Example. Consider a sequence (X_n)_{n∈N} of real-valued random variables satisfying X_n ~ δ_n, n ∈ N. Then the corresponding sequence of distribution functions is given by F_n : R → {0, 1}, x ↦ 1_{[n,∞)}(x). Obviously lim_{j→∞} F_{n_j}(x) = 0 for each x ∈ R and each subsequence (n_j)_{j∈N}, as the mass escapes to infinity. Hence Helly's lemma cannot yield an honest cdf!

32 Prohorov's theorem
Now we are in a position to prove Prohorov's theorem:
Theorem (Prohorov). Let (X_n)_{n∈N} be a sequence of R^k-valued random variables. Then it holds:
(a) If X_n ⇝ X for some R^k-valued random variable X, then (X_n)_{n∈N} is uniformly tight.
(b) If (X_n)_{n∈N} is uniformly tight, then there exists a subsequence (X_{n_j})_{j∈N} with X_{n_j} ⇝ X for some R^k-valued random variable X.
Proof. (BOARD)


34 Stochastic Landau notation
Similar to the well-known O-notation from calculus, one can introduce a stochastic version of the Landau symbols in order to express the speed of convergence (in probability):
Definition. Let (X_n)_{n∈N} and (R_n)_{n∈N} be sequences of R^k-valued and R-valued random variables, respectively. We write:
(a) X_n = O_P(1) :⟺ {X_n : n ∈ N} is uniformly tight.
(b) X_n = O_P(R_n) :⟺ X_n = R_n Y_n for a sequence (Y_n)_{n∈N} of R^k-valued random variables satisfying Y_n = O_P(1).
(c) X_n = o_P(1) :⟺ X_n →^P 0.
(d) X_n = o_P(R_n) :⟺ X_n = R_n Y_n for a sequence (Y_n)_{n∈N} of R^k-valued random variables satisfying Y_n = o_P(1).
Commonly, (R_n)_{n∈N} is called the rate (of convergence).
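A simulation sketch of my own (not from the talk) of the rate statement X̄_n − E[X_1] = O_P(n^{-1/2}): the rescaled deviations √n (X̄_n − E[X_1]) stay bounded in probability, i.e. a single constant M captures them with high probability uniformly over n:

```python
import random
import statistics

def rescaled_dev(n: int, rng: random.Random) -> float:
    """|sqrt(n) * (sample mean of n Unif(0,1) draws - 1/2)|."""
    m = statistics.fmean(rng.random() for _ in range(n))
    return abs(n ** 0.5 * (m - 0.5))

rng = random.Random(7)
M = 1.0  # candidate tightness bound; the sd of the limit is sqrt(1/12) ~ 0.289
for n in (10, 100, 1000):
    reps = 2000
    frac = sum(rescaled_dev(n, rng) > M for _ in range(reps)) / reps
    print(n, frac)  # small for every n: sqrt(n)*(mean - 1/2) = O_P(1)
```

Without the √n rescaling the deviations would collapse to 0 (an o_P(1) statement); with it they neither collapse nor escape, which is precisely uniform tightness.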


37 In our next chapter we will use differential calculus and therefore need the following lemma. Think of R as the remainder term in some Taylor expansion.
Lemma. Let D ⊆ R^k be open with 0 ∈ D and let R : D → R^m be a function with R(0) = 0. Furthermore, let (X_n)_{n∈N} be a sequence of random variables taking values in D with X_n = o_P(1). Then for every p > 0 we have:
(a) R(h) = o(‖h‖₂^p) (h → 0) ⟹ R(X_n) = o_P(‖X_n‖₂^p);
(b) R(h) = O(‖h‖₂^p) (h → 0) ⟹ R(X_n) = O_P(‖X_n‖₂^p).
Proof. Let p > 0 and define g(h) := R(h)/‖h‖₂^p if h ≠ 0, and g(h) := 0 else. Then for each n ∈ N we have R(X_n) = ‖X_n‖₂^p g(X_n), i.e. R_n := ‖X_n‖₂^p and Y_n := g(X_n) in the sense of the preceding definition.


41 Proof (continued).
(a) By assumption lim_{h→0} g(h) = lim_{h→0} R(h)/‖h‖₂^p = 0, i.e. g is continuous at 0. Since X_n →^P 0 (by assumption), the continuous mapping principle yields g(X_n) →^P 0, i.e. g(X_n) = o_P(1). Thus R(X_n) = o_P(‖X_n‖₂^p).
(b) By assumption there are some δ > 0 and some M > 0 such that ‖g(h)‖₂ = ‖R(h)‖₂ / ‖h‖₂^p ≤ M for all h ∈ B_δ(0) ∩ D. Since X_n →^P 0 (by assumption), we obtain: P(‖g(X_n)‖₂ > M) ≤ P(‖X_n‖₂ > δ) → 0 as n → ∞.


44 Proof (continued). Hence for given ε > 0 we can choose n_ε ∈ N such that P(‖g(X_n)‖₂ > M) < ε/2 for all n > n_ε. For n ∈ {1, ..., n_ε} we choose M_ε ≥ M suitably large such that P(‖g(X_n)‖₂ > M_ε) < ε/2 for these n as well. Obviously, this yields sup_{n∈N} P(‖g(X_n)‖₂ > M_ε) ≤ ε/2 < ε, i.e. {g(X_n)}_{n∈N} is uniformly tight. Thus g(X_n) = O_P(1) and R(X_n) = O_P(‖X_n‖₂^p).
Example (LLN). Let (X_n)_{n∈N} ⊆ L¹(P) be a sequence of i.i.d. random variables and define S_n := Σ_{i=1}^n X_i. Then we know that S_n/n = X̄_n →^P E[X_1], i.e. S_n − nE[X_1] = o_P(n).

45 Chapter 2: Delta Method


47 Motivation/idea
Given a limit law of √n (T_n − θ) (often derived from the CLT), how can one deduce a limit law of √n (φ(T_n) − φ(θ)), where φ is some differentiable mapping? Use a Taylor expansion!
Remark. In applications, T_n often is an estimator for some parameter θ. Note that the question appeals to the limit distribution; hence φ(T_n) may inherit a property like asymptotic efficiency from T_n.

48 Let us recall some definitions concerning estimators:
Definition. Let {P_ϑ}_{ϑ∈Θ} be a family of probability measures on B(R^m), consider an i.i.d. sample X_1, ..., X_n with P(X_1) ∈ {P_ϑ}_{ϑ∈Θ}, and assume T_n = T(X_1, ..., X_n) : Ω → Θ to be an estimator for ϑ.
(a) T_n is called consistent iff T_n →^P ϑ under X_1 ~ P_ϑ, for all ϑ ∈ Θ.
(b) T_n is called unbiased iff bias_ϑ(T_n) := E_ϑ[T_n] − ϑ = 0 for all ϑ ∈ Θ. (Existence of the involved integral is required.)
(c) T_n is called asymptotically efficient iff (provided that T_n is R^k-valued) for all ϑ ∈ Θ we have P_ϑ( √n (T_n − ϑ) ) ⇝ N_k(0, I(P_ϑ)^{-1}). Here I(P_ϑ) := ( E_ϑ[ ∂/∂ϑ_i log f_ϑ(X) · ∂/∂ϑ_j log f_ϑ(X) ] )_{i,j=1,...,k} denotes the Fisher information matrix of P_ϑ, where Θ ⊆ R^k is assumed. Moreover, X ~ P_ϑ is an R^m-valued random variable and f_ϑ = dP_ϑ/dλ^m if P_ϑ ≪ λ^m, and f_ϑ = dP_ϑ/d#^m if P_ϑ ≪ #^m (in either case for all ϑ ∈ Θ).

49 Delta method: basic result
Theorem (delta method). Let D ⊆ R^k be an open subset and suppose that φ : D → R^m is a mapping that is differentiable at θ ∈ D. Furthermore, let {T_n}_{n∈N} be a family of D-valued random variables and let (r_n)_{n∈N} be a sequence of real numbers satisfying 0 < r_n ↑ ∞. If r_n (T_n − θ) ⇝ T for some R^k-valued random variable T, then r_n (φ(T_n) − φ(θ)) ⇝ Dφ(θ) T, where Dφ(θ) ∈ L(R^k, R^m) ≅ R^{m×k} denotes the Fréchet derivative (represented by the Jacobian matrix) of φ at θ. Moreover, we have ‖ r_n (φ(T_n) − φ(θ)) − Dφ(θ)( r_n (T_n − θ) ) ‖₂ →^P 0.
Proof. (BOARD)
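A Monte Carlo sketch of my own illustrating the theorem with k = m = 1: for i.i.d. Exp(1) data, √n (X̄_n − 1) ⇝ N(0, 1), and for φ(x) = x² the delta method predicts √n (X̄_n² − 1) ⇝ N(0, φ'(1)² · 1) = N(0, 4):

```python
import random
import statistics

def delta_stat(n: int, rng: random.Random) -> float:
    """sqrt(n) * (phi(sample mean) - phi(1)) for phi(x) = x^2, Exp(1) samples."""
    m = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    return n ** 0.5 * (m * m - 1.0)

rng = random.Random(42)
sample = [delta_stat(2000, rng) for _ in range(3000)]
print(statistics.fmean(sample))      # close to 0
print(statistics.pvariance(sample))  # close to phi'(1)^2 * Var[X_1] = 4
```

The empirical variance of the transformed, rescaled statistic lands near 4 rather than near Var[X_1] = 1, which is exactly the Jacobian factor Dφ(θ)² = 4 at work.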

50 There is also a slightly more general result if we assume φ to be of class C¹ around θ. We state it without proof:
Theorem (uniform delta method). Let D ⊆ R^k be an open subset and suppose that φ : D → R^m is continuously differentiable in an open neighborhood of θ ∈ D. Furthermore, let {T_n}_{n∈N} be a family of D-valued random variables and let (r_n)_{n∈N} be a sequence of real numbers satisfying 0 < r_n ↑ ∞. If r_n (T_n − θ_n) ⇝ T for some R^k-valued random variable T and some sequence θ_n → θ in D, then r_n (φ(T_n) − φ(θ_n)) ⇝ Dφ(θ) T, where Dφ(θ) ∈ L(R^k, R^m) ≅ R^{m×k} denotes the Fréchet derivative (represented by the Jacobian matrix) of φ at θ. Moreover, we have ‖ r_n (φ(T_n) − φ(θ_n)) − Dφ(θ)( r_n (T_n − θ_n) ) ‖₂ →^P 0.


52 Application I: testing variance, CLT revisited
Example (sample variance). Given a data set consisting of i.i.d. observations X_1, ..., X_n ∈ L⁴(P), n ∈ N, we want to estimate its variance. To this end we consider the biased estimator S̃²_n := (1/n) Σ_{i=1}^n (X_i − X̄_n)² = Q_n − (X̄_n)² = φ(X̄_n, Q_n), where Q_n := (1/n) Σ_{i=1}^n X_i² denotes the second empirical moment and φ(x, y) := y − x², (x, y)ᵀ ∈ R². Define μ_k := E[X_1^k], k = 1, ..., 4. Then for the vectors (X_i, X_i²)ᵀ, i = 1, ..., n, it holds by the CLT: √n( (X̄_n, Q_n)ᵀ − (μ₁, μ₂)ᵀ ) = (1/√n) Σ_{i=1}^n ( (X_i, X_i²)ᵀ − E[(X_i, X_i²)ᵀ] ) ⇝ Z.


54 Example (sample variance, continued). √n( (X̄_n, Q_n)ᵀ − (μ₁, μ₂)ᵀ ) ⇝ Z, where Z ~ N₂(0, Σ) with covariance matrix
Σ = ( μ₂ − μ₁²    μ₃ − μ₁μ₂
      μ₃ − μ₁μ₂   μ₄ − μ₂² ).
Hence, the delta method implies (since S̃²_n = φ(X̄_n, Q_n), where φ(x, y) = y − x²): √n( φ(X̄_n, Q_n) − φ(μ₁, μ₂) ) ⇝ Dφ(μ₁, μ₂) Z, i.e. (since Dφ(x, y) z = (−2x, 1) z, z ∈ R²): √n( S̃²_n − (μ₂ − μ₁²) ) ⇝ Dφ(μ₁, μ₂) Z = (−2μ₁, 1) Z.
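The scalar limit variance follows from the quadratic form v Σ vᵀ with v = Dφ(μ₁, μ₂) = (−2μ₁, 1); written out (a step the talk does on the board):

```latex
\begin{aligned}
\operatorname{Var}\bigl(D\varphi(\mu_1,\mu_2)\,Z\bigr)
  &= \begin{pmatrix}-2\mu_1 & 1\end{pmatrix}
     \begin{pmatrix}\mu_2-\mu_1^2 & \mu_3-\mu_1\mu_2\\[2pt] \mu_3-\mu_1\mu_2 & \mu_4-\mu_2^2\end{pmatrix}
     \begin{pmatrix}-2\mu_1\\ 1\end{pmatrix}\\
  &= 4\mu_1^2(\mu_2-\mu_1^2) - 4\mu_1(\mu_3-\mu_1\mu_2) + (\mu_4-\mu_2^2)\\
  &= \mu_4 - \mu_2^2 - 4\mu_1^4 - 4\mu_1\mu_3 + 8\mu_1^2\mu_2 .
\end{aligned}
```

For μ₁ = 0 this collapses to μ₄ − μ₂², the centered form used later in the talk.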


56 Example (sample variance, continued). √n( S̃²_n − (μ₂ − μ₁²) ) ⇝ Dφ(μ₁, μ₂) Z, where Dφ(μ₁, μ₂) Z = (−2μ₁, 1) Z ~ N₁( 0, (−2μ₁, 1) Σ (−2μ₁, 1)ᵀ ) = N₁( 0, μ₄ − μ₂² − 4μ₁⁴ − 4μ₁μ₃ + 8μ₁²μ₂ ). Since √n( Ŝ²_n − S̃²_n ) →^P 0 for Ŝ²_n := n/(n−1) · S̃²_n, Slutsky's lemma implies that this result also holds for the corresponding unbiased estimator Ŝ²_n. We will apply this result to construct a test for the variance of a data set.
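The limit variance equals the central-moment expression E[(X − μ₁)⁴] − (Var X)², the classical asymptotic variance of the sample variance. A quick check of my own, for a concrete three-point distribution, confirming that the raw-moment formula above matches:

```python
# Check: mu4 - mu2^2 - 4*mu1^4 - 4*mu1*mu3 + 8*mu1^2*mu2
# equals the central-moment form E[(X - mu1)^4] - (Var X)^2.
values = [0.0, 1.0, 3.0]   # a concrete three-point distribution
probs = [0.5, 0.3, 0.2]

mu = [sum(p * v**k for v, p in zip(values, probs)) for k in range(5)]  # mu[k] = E[X^k]
raw = mu[4] - mu[2]**2 - 4 * mu[1]**4 - 4 * mu[1] * mu[3] + 8 * mu[1]**2 * mu[2]

var = mu[2] - mu[1]**2
m4c = sum(p * (v - mu[1])**4 for v, p in zip(values, probs))  # central 4th moment
central = m4c - var**2

print(raw, central)  # identical up to rounding
```

Agreement of the two expressions (and positivity of the result) is a useful sanity check whenever the raw-moment form is transcribed by hand.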


59 First, we recall some notions and results from mathematical statistics:
Definition (kurtosis). Let X ∈ L⁴(P) be a random variable. Then the kurtosis of X is defined by κ_X := E[(X − E[X])⁴] / ( E[(X − E[X])²] )² = E[(X − E[X])⁴] / (Var(X))².
Remark. If X ~ N₁(μ, σ²), then κ_X = 3.
Definition (chi-square distribution). Let X_1, ..., X_n ~ N₁(0, 1) be i.i.d. random variables. Then the probability measure χ²_n := P( Σ_{i=1}^n X_i² ) on B(R) is called the chi-square distribution with n degrees of freedom.


61 From mathematical statistics one is familiar with the following
Proposition. Let X_1, ..., X_n ~ N₁(μ, σ²) be i.i.d. random variables and Ŝ²_n = 1/(n−1) Σ_{i=1}^n (X_i − X̄_n)² the unbiased estimator of σ² from above. Then P( (n−1) Ŝ²_n / σ² ) = χ²_{n−1}.
This result gives rise to the following test for normal data:
Example (one-sided test for σ²). In the situation of the preceding proposition and for given σ₀² > 0, we want to test H₀ : σ² = σ₀² vs. H₁ : σ² > σ₀² at level α ∈ (0, 1).
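The resulting test is easy to run in practice. A sketch of my own (the helper name `var_test` is mine, not from the talk): since the Python standard library has no χ² quantile function, the critical value is approximated via the CLT for chi-square distributions, q ≈ (n−1) + √(2(n−1)) · Φ⁻¹(1−α), so the stated level only holds approximately for large n:

```python
import math
import random
import statistics
from statistics import NormalDist

def var_test(xs: list[float], sigma0_sq: float, alpha: float = 0.05) -> bool:
    """One-sided chi-square variance test: reject H0 iff T_n = (n-1)*S^2/sigma0^2
    exceeds the (1-alpha)-quantile of chi^2_{n-1}, here approximated via the
    normal approximation q ~ (n-1) + sqrt(2(n-1)) * z_{1-alpha}."""
    n = len(xs)
    s2 = statistics.variance(xs)        # unbiased estimator of sigma^2
    t = (n - 1) * s2 / sigma0_sq        # test statistic T_n
    z = NormalDist().inv_cdf(1.0 - alpha)
    q_approx = (n - 1) + math.sqrt(2 * (n - 1)) * z
    return t > q_approx

# Estimated rejection rate under H0 (sigma^2 = 1) should be close to alpha.
rng = random.Random(3)
reps = 2000
rate = sum(var_test([rng.gauss(0, 1) for _ in range(200)], 1.0) for _ in range(reps)) / reps
print(rate)
```

For normal data of sample size 200 the simulated type-I error sits near the nominal 5%, slightly above it because the normal approximation underestimates the right-skewed χ² quantile.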


65 Example (one-sided test for σ², continued). As a test statistic we use T_n := (n−1) Ŝ²_n / σ₀² and decide to reject H₀ if T_n > q^{(1−α)}_{χ²_{n−1}} (the (1−α)-quantile of the χ²_{n−1}-distribution). Hence we obtain: P_{σ₀²}( T_n > q^{(1−α)}_{χ²_{n−1}} ) = 1 − χ²_{n−1}( (−∞, q^{(1−α)}_{χ²_{n−1}}] ) = α, i.e. the test has exactly level α.
What if we know that the given data are not normally distributed? We use the approximation √n( S̃²_n − (μ₂ − μ₁²) ) ⇝ Z, where Z ~ N₁( 0, μ₄ − μ₂² − 4μ₁⁴ − 4μ₁μ₃ + 8μ₁²μ₂ ), from above to derive a test of asymptotic level α for certain data sets.


67 Example (one-sided test for σ² without normality assumption). Let X_1, ..., X_n ∈ L⁴(P) be i.i.d. random variables. As above, let μ_k := E[X_1^k]. For the sake of simplicity we assume μ₁ = 0.³ We obtain: σ² := Var(X_1) = μ₂ and κ := κ_{X_1} = μ₄/μ₂². Hence, our approximation reduces to √n( S̃²_n/μ₂ − 1 ) ⇝ Z ~ N₁(0, κ − 1). Again, for given σ₀² > 0, we want to test H₀ : σ² = σ₀² vs. H₁ : σ² > σ₀² at level α ∈ (0, 1). We use T_n := √n( Ŝ²_n/σ₀² − 1 ) as test statistic.
³ This is not a restriction: centering the observations neither affects their dispersion nor Ŝ²_n = 1/(n−1) Σ_{i=1}^n (X_i − X̄_n)². (After centering one has to use centered moments instead of our μ_k.)

68/69 [Simulation slides: histograms of the standardized statistic T_n/√(κ−1) = √n/√(κ−1) · (Ŝ²_n/σ₀² − 1) against the N₁(0, 1) density, 1000 repetitions each.]


71 A remark on quantiles of χ²_{n−1}
Remark
Let C_{n−1} ∼ χ²_{n−1}, n ∈ N_{>1}. Then the CLT implies

    ( C_{n−1} − (n−1) ) / √(2n−2) → N₁(0, 1)  weakly.

Hence, for α ∈ (0, 1), the (1−α)-quantile of the latter probability distribution converges to that of a standard normal distribution, as quantiles of this distribution are uniquely determined⁴:

    lim_{n→∞} ( q^{(1−α)}_{χ²_{n−1}} − (n−1) ) / √(2n−2) = Φ⁻¹(1−α),

where the fraction on the left equals q_n / √2 for q_n := ( q^{(1−α)}_{χ²_{n−1}} − (n−1) ) / √(n−1); hence lim_{n→∞} q_n = √2 Φ⁻¹(1−α).
⁴ This is due to the fact that Φ increases strictly. (→ Probability Theory)
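This normal approximation of χ² quantiles can be checked directly. A sketch assuming SciPy (the sample sizes are arbitrary choices of mine):

```python
import math
from scipy.stats import chi2, norm

alpha = 0.05
z = norm.ppf(1 - alpha)                  # Phi^{-1}(1 - alpha)
target = math.sqrt(2) * z                # claimed limit of q_n
for n in (10, 100, 1000, 100000):
    q = chi2.ppf(1 - alpha, df=n - 1)    # (1-alpha)-quantile of chi^2_{n-1}
    q_n = (q - (n - 1)) / math.sqrt(n - 1)
    print(n, round(q_n, 4), round(target, 4))
```

For growing n the rescaled quantile q_n approaches √2 Φ⁻¹(1−α) ≈ 2.33 for α = 0.05.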




75 Example (One-sided test for σ² (continued))
Recall: T_n := √n ( Ŝ²_n / σ²_0 − 1 ) → Z ∼ N₁(0, κ − 1) under P_{σ²_0}, and lim_{n→∞} q_n = √2 Φ⁻¹(1−α).
Thus Slutsky's lemma implies: under P_{σ²_0},

    T_n − q_n → N₁( −√2 Φ⁻¹(1−α), κ − 1 ).

We implement the following decision rule: Reject H₀ if T_n > ( q^{(1−α)}_{χ²_{n−1}} − (n−1) ) / √(n−1) = q_n.
For the error of type I, we obtain:

    P_{σ²_0}(T_n > q_n) = P_{σ²_0}(T_n − q_n > 0) → 1 − N₁( −√2 Φ⁻¹(1−α), κ − 1 )( (−∞, 0] )  as n → ∞  (portmanteau (e)).

76 Example (One-sided test for σ² (continued))

    P_{σ²_0}(T_n > q_n) → 1 − Φ( √2 Φ⁻¹(1−α) / √(κ − 1) )  as n → ∞,

which equals α iff κ = 3, and is ≤ α iff κ ≤ 3.
Hence our decision rule establishes an (asymptotic) one-sided test (of level α) for σ² iff the distribution of the observations is platykurtic or mesokurtic, i.e. κ < 3 or κ = 3, respectively.
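The limiting type-I error can be tabulated as a function of the kurtosis. A sketch using only the standard library (the function name and example kurtosis values are my own):

```python
from statistics import NormalDist

nd = NormalDist()

def asymptotic_level(kappa, alpha=0.05):
    # Limit of P(T_n > q_n): 1 - Phi( sqrt(2) Phi^{-1}(1 - alpha) / sqrt(kappa - 1) )
    z = nd.inv_cdf(1 - alpha)
    return 1 - nd.cdf(2 ** 0.5 * z / (kappa - 1) ** 0.5)

for kappa in (2.0, 3.0, 9.0):  # platykurtic, mesokurtic (normal), leptokurtic
    print(kappa, round(asymptotic_level(kappa), 4))
```

For κ = 3 the limit is exactly α; for heavier tails (κ > 3) the χ²-based rule is anti-conservative.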

77 Recall:
H₀: σ² = σ²_0.
T̃_n := (n−1)Ŝ²_n / σ²_0 — used with normal data. Reject H₀ if T̃_n > q^{(1−α)}_{χ²_{n−1}}.
T_n := √n ( Ŝ²_n / σ²_0 − 1 ) — used with possibly non-normal data. Reject H₀ if T_n > q_n = ( q^{(1−α)}_{χ²_{n−1}} − (n−1) ) / √(n−1).
(The first statistic is written T̃_n here to distinguish the two.)
Remark
The presented testing procedures are closely related: They are based on the same (asymptotic) decision rule (if µ₁ = 0), as one can prove:

    T_n > q_n  ⇔  T̃_n − (n−1) > √((n−1)/n) · ( q^{(1−α)}_{χ²_{n−1}} − (n−1) ),

where √((n−1)/n) ≈ 1 for large n.



80 Application II: Asymptotic confidence intervals and variance-stabilizing transformations (VST)
We are given a parametric model {P_ϑ}_{ϑ∈Θ} of probability measures on B(R^k) and assume Θ := (ϑ₋, ϑ₊) ⊆ R to be an open interval. Furthermore assume the existence of an estimator T_n = T(X₁, ..., X_n) of ϑ ∈ Θ (where {X_l}_{l∈N} is a family of i.i.d. R^k-valued random variables whose distribution P^{X₁} belongs to {P_ϑ}_{ϑ∈Θ}⁵) that satisfies

    √n (T_n − ϑ) → T ∼ N₁(0, σ²(ϑ))  under P_ϑ, for all ϑ ∈ Θ.

We assume that σ²(·) is known as a function of ϑ.
Task: For fixed γ ∈ (0, 1), find an asymptotic confidence interval for ϑ.
First idea: Consider the asymptotic γ-confidence interval

    CI_{ϑ;n}(γ) := [ T_n − Φ⁻¹((1+γ)/2) σ(ϑ)/√n , T_n + Φ⁻¹((1+γ)/2) σ(ϑ)/√n ].

⁵ Formally, for l ∈ N, X_l is defined on (Ω, A, P) as fixed above. If ϑ ∈ Θ is the true (but unknown) parameter, then the image measure of P under X₁ (i.e. the distribution of X₁) is given by P_ϑ.
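As a sanity check of the first idea, the following sketch (my own, not from the slides; NumPy assumed) uses the Poisson model with T_n = X̄_n and σ²(ϑ) = ϑ, and plugs the true ϑ into σ(ϑ) — exactly the oracle knowledge that is not available in practice:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(2)
gamma, n, reps = 0.9, 200, 5000
theta = 4.0                                # true Poisson mean; sigma^2(theta) = theta
c = NormalDist().inv_cdf((1 + gamma) / 2)  # Phi^{-1}((1 + gamma)/2)

x = rng.poisson(theta, size=(reps, n))
t = x.mean(axis=1)                         # T_n = sample mean
half = c * np.sqrt(theta) / np.sqrt(n)     # oracle half-width: uses the true sigma(theta)
coverage = np.mean((t - half <= theta) & (theta <= t + half))
print(round(coverage, 3))                  # approximately gamma
```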



83 First idea: Consider the asymptotic γ-confidence interval

    CI_{ϑ;n}(γ) = [ T_n − Φ⁻¹((1+γ)/2) σ(ϑ)/√n , T_n + Φ⁻¹((1+γ)/2) σ(ϑ)/√n ].

Problem: ϑ, and thus σ(ϑ), is unknown in general. Hence these confidence intervals are useless in practice.
Solution 1: Estimate σ²(ϑ) using a consistent estimator. This approach is discussed in Mathematical Statistics.
Solution 2: Use a variance-stabilizing transformation of the given data set.

84 Variance-stabilizing transformations
Assumption
Let ϑ₀ ∈ Θ be fixed. We assume that the mapping

    Θ = (ϑ₋, ϑ₊) ∋ ϑ ↦ ∫_{ϑ₀}^{ϑ} 1/σ(θ) dθ ∈ R

is well-defined and differentiable (with derivative 1/σ(·)).
Definition (VST)
In the stated situation, under the latter assumption and for some fixed η > 0, the differentiable mapping

    φ : Θ = (ϑ₋, ϑ₊) → R,  ϑ ↦ ∫_{ϑ₀}^{ϑ} η/σ(θ) dθ

is called a variance-stabilizing transformation.
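For a concrete instance (my own example, not from the slides): in the Poisson model σ(θ) = √θ, and with η = 1 and ϑ₀ = 1 the definition gives φ(ϑ) = 2(√ϑ − √ϑ₀). A sketch comparing the defining integral (midpoint rule) with this closed form:

```python
from math import sqrt

def phi(theta, theta0=1.0, eta=1.0, steps=100000):
    # VST phi(theta) = integral from theta0 to theta of eta / sigma(s) ds,
    # evaluated by the midpoint rule, with sigma(s) = sqrt(s) (Poisson model).
    h = (theta - theta0) / steps
    return eta * h * sum(1.0 / sqrt(theta0 + (i + 0.5) * h) for i in range(steps))

print(phi(4.0), 2 * (sqrt(4.0) - sqrt(1.0)))  # both approximately 2.0
```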



87 Recall: φ(ϑ) = ∫_{ϑ₀}^{ϑ} η/σ(θ) dθ, ϑ ∈ (ϑ₋, ϑ₊)
Remark (Basic properties)
1. φ is continuous and, due to η > 0 and σ > 0 on Θ, also strictly increasing, as its derivative equals φ′ = η/σ. Hence φ is invertible.
2. φ exhibits the variance-stabilizing property: φ′ · σ ≡ η.
Remark (What is the origin of this name?)
Recall: √n (T_n − ϑ) → T ∼ N₁(0, σ²(ϑ)) under P_ϑ for all ϑ ∈ Θ, and φ is differentiable on Θ. Hence, the delta method implies:

    √n ( φ(T_n) − φ(ϑ) ) → φ′(ϑ) T ∼ N₁(0, (φ′(ϑ))² σ²(ϑ)) = N₁(0, η²)  under P_ϑ, for all ϑ ∈ Θ,

i.e. the asymptotic variance is stabilized to η² (which is usually chosen to be 1 in practice).
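The stabilization is visible in simulation. A sketch (my own; NumPy assumed) for the Poisson model with T_n = X̄_n, σ²(ϑ) = ϑ and φ(ϑ) = 2√ϑ, so that η = 1:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 400, 4000

for theta in (0.5, 2.0, 8.0):
    x = rng.poisson(theta, size=(reps, n))
    t = x.mean(axis=1)                                        # T_n = X bar
    raw = np.sqrt(n) * (t - theta)                            # variance approx. theta
    vst = np.sqrt(n) * (2 * np.sqrt(t) - 2 * np.sqrt(theta))  # variance approx. eta^2 = 1
    print(theta, round(raw.var(ddof=1), 2), round(vst.var(ddof=1), 2))
```

The raw variance tracks ϑ, while the transformed one stays near 1 for every ϑ.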


89 Recall: T_n is an estimator of ϑ with √n (T_n − ϑ) → T ∼ N₁(0, σ²(ϑ)) under P_ϑ for all ϑ ∈ Θ.
Goal: Find an asymptotic γ-confidence interval for ϑ.
Derived so far: √n ( φ(T_n) − φ(ϑ) ) → φ′(ϑ) T ∼ N₁(0, η²) under P_ϑ for all ϑ ∈ Θ.
Example (Asymptotic CI via VST)
In the above situation and using a variance-stabilizing transformation φ, our First Idea implies that

    CI_{φ(ϑ);n}(γ) := [ φ(T_n) − Φ⁻¹((1+γ)/2) η/√n , φ(T_n) + Φ⁻¹((1+γ)/2) η/√n ]

is an asymptotic γ-confidence interval for φ(ϑ).



92 Example (Asymptotic CI via VST (continued))

    CI_{φ(ϑ);n}(γ) := [ φ(T_n) − Φ⁻¹((1+γ)/2) η/√n , φ(T_n) + Φ⁻¹((1+γ)/2) η/√n ]

is an asymptotic γ-confidence interval for φ(ϑ).
Idea: Transform this interval using φ⁻¹ to obtain an asymptotic CI for ϑ.
We know for ϑ ∈ Θ:

    γ ≤ liminf_{n→∞} P_ϑ( φ(ϑ) ∈ CI_{φ(ϑ);n}(γ) ) = liminf_{n→∞} P_ϑ( ϑ ∈ φ⁻¹{ CI_{φ(ϑ);n}(γ) } ).

Now, since φ is continuous and strictly increasing (in particular one-to-one), φ⁻¹{ CI_{φ(ϑ);n}(γ) } ⊆ R is really an interval. Thus, depending on a specific φ, we obtain an asymptotic γ-confidence interval for ϑ. Note that this interval is easy to compute in practice: Just apply φ⁻¹ to the boundary values of CI_{φ(ϑ);n}(γ).
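As an illustration (my own choice of model and parameters, not from the slides): in the Poisson model with T_n = X̄_n and σ(ϑ) = √ϑ, one VST is φ(ϑ) = 2√ϑ with η = 1, and φ⁻¹(y) = (y/2)². A sketch (NumPy assumed) checking the coverage of the back-transformed interval:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(4)
gamma, n, reps, theta = 0.9, 200, 5000, 4.0
c = NormalDist().inv_cdf((1 + gamma) / 2)

x = rng.poisson(theta, size=(reps, n))
t = x.mean(axis=1)                        # T_n = X bar
lo = 2 * np.sqrt(t) - c / np.sqrt(n)      # CI for phi(theta) = 2 sqrt(theta), eta = 1
hi = 2 * np.sqrt(t) + c / np.sqrt(n)
lo = np.maximum(lo, 0.0)                  # phi^{-1}(y) = (y/2)^2 is only valid for y >= 0
cover = np.mean(((lo / 2) ** 2 <= theta) & (theta <= (hi / 2) ** 2))
print(round(cover, 3))                    # approximately gamma
```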

93 Chapter 3 Moment Estimators


95 Method of Moments
Let Θ ⊆ R^k and {P_ϑ}_{ϑ∈Θ} be a family of probability measures on B(R). As above we assume that {X_l}_{l∈N} is a family of i.i.d. R-valued random variables with P^{X₁} ∈ {P_ϑ}_{ϑ∈Θ}, i.e. the distribution of X₁ is known up to the parameter vector ϑ ∈ Θ.
Given some functions f₁, ..., f_k : R → R, the method of moments pursues the following ansatz⁶: Find ϑ ∈ Θ s.t.

    (1/n) Σ_{i=1}^n f_j(X_i) = E_ϑ[ f_j(X₁) ],  j = 1, ..., k.⁷

It is obvious that the LLN motivates this approach.
⁶ Of course, this requires certain integrability conditions on these functions and X₁ s.t. all involved expectations are well-defined.
⁷ Note that it is not clear a priori whether such a ϑ exists.
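A classical instance of this ansatz (my own worked example, not from the slides; NumPy assumed): for the Gamma(k, s) model, matching the first two moments E[X] = ks and Var(X) = E[X²] − E[X]² = ks² yields closed-form estimators:

```python
import numpy as np

rng = np.random.default_rng(5)
shape, scale, n = 3.0, 2.0, 200000       # true Gamma parameters (my choice)

x = rng.gamma(shape, scale, size=n)
m1, m2 = x.mean(), (x ** 2).mean()       # empirical moments for f_1(x) = x, f_2(x) = x^2
v = m2 - m1 ** 2                         # implied variance k s^2
shape_hat = m1 ** 2 / v                  # solves E[X] = k s, Var(X) = k s^2
scale_hat = v / m1
print(round(shape_hat, 2), round(scale_hat, 2))
```

With a large sample both estimators land close to the true values (3, 2).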



98 Ansatz: Find ϑ ∈ Θ s.t. (1/n) Σ_{i=1}^n f_j(X_i) = E_ϑ[ f_j(X₁) ], j = 1, ..., k.
Remark (Recourse to Mathematical Statistics)
Consider f_j(x) := x^j, j = 1, ..., k. Then the method of moments reduces to finding ϑ ∈ Θ s.t.

    (1/n) Σ_{i=1}^n X_i^j = E_ϑ[ X₁^j ],  j = 1, ..., k.

Now we want to scrutinize conditions for existence and asymptotic normality of this type of estimator (to be introduced shortly). Therefore, we use the following
Notation: f := (f₁, ..., f_k)^T,  e : Θ → R^k, ϑ ↦ E_ϑ[ f(X₁) ].

99 Moment estimators
Therefore, the equation of interest is given by

    ( (1/n) Σ_{i=1}^n f_j(X_i) )_{j=1,...,k} = e(ϑ).  (*)

Definition (moment estimators)
An estimator ϑ̂_n solving equation (*) is called a moment estimator.

100 Existence and asymptotic normality
Theorem
We consider the situation stated above. Let Θ ⊆ R^k be an open set and suppose e(ϑ) = E_ϑ[ f(X₁) ], ϑ ∈ Θ, is continuously differentiable in an open neighborhood of some point ϑ₀ ∈ Θ with det De(ϑ₀) ≠ 0. Moreover, assume that E_{ϑ₀}[ ‖f(X₁)‖₂² ] < ∞. Then e is C¹-invertible in an open neighborhood of ϑ₀, and moment estimators ϑ̂_n exist with probability tending to 1 as n → ∞.⁸ Furthermore they obey⁹

    √n ( ϑ̂_n − ϑ₀ ) → N_k( 0, (De(ϑ₀))⁻¹ Cov_{ϑ₀}[ f(X₁) ] ((De(ϑ₀))⁻¹)^T )  under P_{ϑ₀}.

Proof: BOARD
⁸ I.e., informally, the set of ω's where ϑ̂_n can be defined gains P-mass as n → ∞.
⁹ If ϑ₀ is the true parameter.
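The sandwich covariance can be illustrated in the simplest possible case (my own example; NumPy assumed): for the Exp(λ) model with f(x) = x we have e(λ) = 1/λ, De(λ) = −1/λ² and Cov_λ[f(X₁)] = 1/λ², so the theorem predicts asymptotic variance λ². The moment estimator is λ̂_n = 1/X̄_n:

```python
import numpy as np

rng = np.random.default_rng(6)
lam, n, reps = 2.0, 500, 4000

# Exp(lam): e(lam) = 1/lam, De(lam) = -1/lam^2, Cov[f(X_1)] = 1/lam^2,
# so (De)^{-1} Cov ((De)^{-1})^T = lam^4 * (1/lam^2) = lam^2.
x = rng.exponential(1.0 / lam, size=(reps, n))
lam_hat = 1.0 / x.mean(axis=1)            # moment estimator solving X bar = 1/lam
z = np.sqrt(n) * (lam_hat - lam)
print(round(z.mean(), 3), round(z.var(ddof=1), 3))  # variance approximately lam^2 = 4
```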

101 List of literature
Durrett, R.: Probability: Theory and Examples. Cambridge University Press.
Klenke, A.: Wahrscheinlichkeitstheorie. Berlin: Springer.
Redenbach, C.: Mathematical Statistics. Lecture Notes, TU Kaiserslautern.
Seifried, F. T.: Maß und Integration. Lecture Notes, TU Kaiserslautern.
Seifried, F. T.: Probability Theory. Lecture Notes, TU Kaiserslautern.
van der Vaart, A. W.: Asymptotic Statistics. Cambridge University Press.


University of Regina. Lecture Notes. Michael Kozdron University of Regina Statistics 252 Mathematical Statistics Lecture Notes Winter 2005 Michael Kozdron kozdron@math.uregina.ca www.math.uregina.ca/ kozdron Contents 1 The Basic Idea of Statistics: Estimating

More information

Asymptotic Statistics-VI. Changliang Zou

Asymptotic Statistics-VI. Changliang Zou Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous

More information

Spring 2012 Math 541B Exam 1

Spring 2012 Math 541B Exam 1 Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote

More information

Lecture 2. We now introduce some fundamental tools in martingale theory, which are useful in controlling the fluctuation of martingales.

Lecture 2. We now introduce some fundamental tools in martingale theory, which are useful in controlling the fluctuation of martingales. Lecture 2 1 Martingales We now introduce some fundamental tools in martingale theory, which are useful in controlling the fluctuation of martingales. 1.1 Doob s inequality We have the following maximal

More information

The properties of L p -GMM estimators

The properties of L p -GMM estimators The properties of L p -GMM estimators Robert de Jong and Chirok Han Michigan State University February 2000 Abstract This paper considers Generalized Method of Moment-type estimators for which a criterion

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 59 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d

More information

Statistics. Statistics

Statistics. Statistics The main aims of statistics 1 1 Choosing a model 2 Estimating its parameter(s) 1 point estimates 2 interval estimates 3 Testing hypotheses Distributions used in statistics: χ 2 n-distribution 2 Let X 1,

More information

CHAPTER 3: LARGE SAMPLE THEORY

CHAPTER 3: LARGE SAMPLE THEORY CHAPTER 3 LARGE SAMPLE THEORY 1 CHAPTER 3: LARGE SAMPLE THEORY CHAPTER 3 LARGE SAMPLE THEORY 2 Introduction CHAPTER 3 LARGE SAMPLE THEORY 3 Why large sample theory studying small sample property is usually

More information

Probability and Measure

Probability and Measure Part II Year 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2018 84 Paper 4, Section II 26J Let (X, A) be a measurable space. Let T : X X be a measurable map, and µ a probability

More information

The International Journal of Biostatistics

The International Journal of Biostatistics The International Journal of Biostatistics Volume 7, Issue 1 2011 Article 12 Consonance and the Closure Method in Multiple Testing Joseph P. Romano, Stanford University Azeem Shaikh, University of Chicago

More information

Convergence in Distribution

Convergence in Distribution Convergence in Distribution Undergraduate version of central limit theorem: if X 1,..., X n are iid from a population with mean µ and standard deviation σ then n 1/2 ( X µ)/σ has approximately a normal

More information

Chapter 5. Weak convergence

Chapter 5. Weak convergence Chapter 5 Weak convergence We will see later that if the X i are i.i.d. with mean zero and variance one, then S n / p n converges in the sense P(S n / p n 2 [a, b])! P(Z 2 [a, b]), where Z is a standard

More information

X n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2)

X n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2) 14:17 11/16/2 TOPIC. Convergence in distribution and related notions. This section studies the notion of the so-called convergence in distribution of real random variables. This is the kind of convergence

More information

Introduction to Probability

Introduction to Probability LECTURE NOTES Course 6.041-6.431 M.I.T. FALL 2000 Introduction to Probability Dimitri P. Bertsekas and John N. Tsitsiklis Professors of Electrical Engineering and Computer Science Massachusetts Institute

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research

More information

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Data from one or a series of random experiments are collected. Planning experiments and collecting data (not discussed here). Analysis:

More information

Limiting Distributions

Limiting Distributions Limiting Distributions We introduce the mode of convergence for a sequence of random variables, and discuss the convergence in probability and in distribution. The concept of convergence leads us to the

More information

STA 732: Inference. Notes 2. Neyman-Pearsonian Classical Hypothesis Testing B&D 4

STA 732: Inference. Notes 2. Neyman-Pearsonian Classical Hypothesis Testing B&D 4 STA 73: Inference Notes. Neyman-Pearsonian Classical Hypothesis Testing B&D 4 1 Testing as a rule Fisher s quantification of extremeness of observed evidence clearly lacked rigorous mathematical interpretation.

More information

Multivariate Analysis and Likelihood Inference

Multivariate Analysis and Likelihood Inference Multivariate Analysis and Likelihood Inference Outline 1 Joint Distribution of Random Variables 2 Principal Component Analysis (PCA) 3 Multivariate Normal Distribution 4 Likelihood Inference Joint density

More information

7 Convergence in R d and in Metric Spaces

7 Convergence in R d and in Metric Spaces STA 711: Probability & Measure Theory Robert L. Wolpert 7 Convergence in R d and in Metric Spaces A sequence of elements a n of R d converges to a limit a if and only if, for each ǫ > 0, the sequence a

More information

Chapter 3. Point Estimation. 3.1 Introduction

Chapter 3. Point Estimation. 3.1 Introduction Chapter 3 Point Estimation Let (Ω, A, P θ ), P θ P = {P θ θ Θ}be probability space, X 1, X 2,..., X n : (Ω, A) (IR k, B k ) random variables (X, B X ) sample space γ : Θ IR k measurable function, i.e.

More information

Chapter 3: Unbiased Estimation Lecture 22: UMVUE and the method of using a sufficient and complete statistic

Chapter 3: Unbiased Estimation Lecture 22: UMVUE and the method of using a sufficient and complete statistic Chapter 3: Unbiased Estimation Lecture 22: UMVUE and the method of using a sufficient and complete statistic Unbiased estimation Unbiased or asymptotically unbiased estimation plays an important role in

More information

Hochdimensionale Integration

Hochdimensionale Integration Oliver Ernst Institut für Numerische Mathematik und Optimierung Hochdimensionale Integration 14-tägige Vorlesung im Wintersemester 2010/11 im Rahmen des Moduls Ausgewählte Kapitel der Numerik Contents

More information

Overview of normed linear spaces

Overview of normed linear spaces 20 Chapter 2 Overview of normed linear spaces Starting from this chapter, we begin examining linear spaces with at least one extra structure (topology or geometry). We assume linearity; this is a natural

More information

Multivariate Distributions

Multivariate Distributions IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate

More information

MA651 Topology. Lecture 9. Compactness 2.

MA651 Topology. Lecture 9. Compactness 2. MA651 Topology. Lecture 9. Compactness 2. This text is based on the following books: Topology by James Dugundgji Fundamental concepts of topology by Peter O Neil Elements of Mathematics: General Topology

More information

Metric Spaces Lecture 17

Metric Spaces Lecture 17 Metric Spaces Lecture 17 Homeomorphisms At the end of last lecture an example was given of a bijective continuous function f such that f 1 is not continuous. For another example, consider the sets T =

More information

Stochastic Processes

Stochastic Processes Stochastic Processes A very simple introduction Péter Medvegyev 2009, January Medvegyev (CEU) Stochastic Processes 2009, January 1 / 54 Summary from measure theory De nition (X, A) is a measurable space

More information

STAT 7032 Probability Spring Wlodek Bryc

STAT 7032 Probability Spring Wlodek Bryc STAT 7032 Probability Spring 2018 Wlodek Bryc Created: Friday, Jan 2, 2014 Revised for Spring 2018 Printed: January 9, 2018 File: Grad-Prob-2018.TEX Department of Mathematical Sciences, University of Cincinnati,

More information

Notes 18 : Optional Sampling Theorem

Notes 18 : Optional Sampling Theorem Notes 18 : Optional Sampling Theorem Math 733-734: Theory of Probability Lecturer: Sebastien Roch References: [Wil91, Chapter 14], [Dur10, Section 5.7]. Recall: DEF 18.1 (Uniform Integrability) A collection

More information

Metric Spaces and Topology

Metric Spaces and Topology Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies

More information

Building Infinite Processes from Finite-Dimensional Distributions

Building Infinite Processes from Finite-Dimensional Distributions Chapter 2 Building Infinite Processes from Finite-Dimensional Distributions Section 2.1 introduces the finite-dimensional distributions of a stochastic process, and shows how they determine its infinite-dimensional

More information

Notions such as convergent sequence and Cauchy sequence make sense for any metric space. Convergent Sequences are Cauchy

Notions such as convergent sequence and Cauchy sequence make sense for any metric space. Convergent Sequences are Cauchy Banach Spaces These notes provide an introduction to Banach spaces, which are complete normed vector spaces. For the purposes of these notes, all vector spaces are assumed to be over the real numbers.

More information

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu

More information

Real Analysis. July 10, These notes are intended for use in the warm-up camp for incoming Berkeley Statistics

Real Analysis. July 10, These notes are intended for use in the warm-up camp for incoming Berkeley Statistics Real Analysis July 10, 2006 1 Introduction These notes are intended for use in the warm-up camp for incoming Berkeley Statistics graduate students. Welcome to Cal! The real analysis review presented here

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2 MA 575 Linear Models: Cedric E Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2 1 Revision: Probability Theory 11 Random Variables A real-valued random variable is

More information

Cherry Blossom run (1) The credit union Cherry Blossom Run is a 10 mile race that takes place every year in D.C. In 2009 there were participants

Cherry Blossom run (1) The credit union Cherry Blossom Run is a 10 mile race that takes place every year in D.C. In 2009 there were participants 18.650 Statistics for Applications Chapter 5: Parametric hypothesis testing 1/37 Cherry Blossom run (1) The credit union Cherry Blossom Run is a 10 mile race that takes place every year in D.C. In 2009

More information

Elementary Probability. Exam Number 38119

Elementary Probability. Exam Number 38119 Elementary Probability Exam Number 38119 2 1. Introduction Consider any experiment whose result is unknown, for example throwing a coin, the daily number of customers in a supermarket or the duration of

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Regression and Statistical Inference

Regression and Statistical Inference Regression and Statistical Inference Walid Mnif wmnif@uwo.ca Department of Applied Mathematics The University of Western Ontario, London, Canada 1 Elements of Probability 2 Elements of Probability CDF&PDF

More information

Statistics for scientists and engineers

Statistics for scientists and engineers Statistics for scientists and engineers February 0, 006 Contents Introduction. Motivation - why study statistics?................................... Examples..................................................3

More information

1 Fourier Integrals of finite measures.

1 Fourier Integrals of finite measures. 18.103 Fall 2013 1 Fourier Integrals of finite measures. Denote the space of finite, positive, measures on by M + () = {µ : µ is a positive measure on ; µ() < } Proposition 1 For µ M + (), we define the

More information

Lecture 21: Convergence of transformations and generating a random variable

Lecture 21: Convergence of transformations and generating a random variable Lecture 21: Convergence of transformations and generating a random variable If Z n converges to Z in some sense, we often need to check whether h(z n ) converges to h(z ) in the same sense. Continuous

More information

Recall that in order to prove Theorem 8.8, we argued that under certain regularity conditions, the following facts are true under H 0 : 1 n

Recall that in order to prove Theorem 8.8, we argued that under certain regularity conditions, the following facts are true under H 0 : 1 n Chapter 9 Hypothesis Testing 9.1 Wald, Rao, and Likelihood Ratio Tests Suppose we wish to test H 0 : θ = θ 0 against H 1 : θ θ 0. The likelihood-based results of Chapter 8 give rise to several possible

More information

Lecture 28: Asymptotic confidence sets

Lecture 28: Asymptotic confidence sets Lecture 28: Asymptotic confidence sets 1 α asymptotic confidence sets Similar to testing hypotheses, in many situations it is difficult to find a confidence set with a given confidence coefficient or level

More information