Stochastic Convergence, Delta Method & Moment Estimators


Stochastic Convergence, Delta Method & Moment Estimators
Seminar on Asymptotic Statistics
Daniel Hoffmann, University of Kaiserslautern, Department of Mathematics
February 13, 2015

Overview
1 Stochastic Convergence: concepts of convergence and basic results; theoretical examples: LLN and CLT; tools for weak convergence; more on weak convergence: tightness and Prohorov's theorem; stochastic Landau notation
2 Delta Method: basic result; application I: testing variance; application II: asymptotic confidence intervals and variance-stabilizing transformations
3 Moment Estimators: method of moments: definition; existence and asymptotic normality
4 List of literature

Chapter 1 Stochastic Convergence

Scope and general assumptions
We recall the basic notions of stochastic convergence from Probability Theory and take a closer look at weak convergence, culminating in Prohorov's theorem.
Throughout this talk we fix a probability space (Ω, A, P) on which all appearing random variables are defined unless stated otherwise. Furthermore let (S, d) be a separable¹ metric space which will serve as codomain. Later on we will restrict ourselves to the case S = R^k. Let
    L(P, S) := { X : Ω → S | X is A-B(S) measurable }
denote the space of all random variables of interest.
¹ I.e. there is some countable, dense subset of S. This is just a technical assumption to guarantee the measurability of events like {d(X, Y) > η} for random variables X, Y and a threshold η.

Concepts of convergence: definitions and properties
Definition. Let (X_n)_{n∈N} ⊆ L(P, S) and X ∈ L(P, S). The sequence (X_n)_{n∈N} is said to
- converge almost surely to X (notation: X_n →_{a.s.} X) if there is some P-null set N ∈ A such that X_n(ω) → X(ω) as n → ∞ for each ω ∈ Ω \ N;
- converge in probability to X (notation: X_n →_P X) if for every ε > 0: lim_{n→∞} P(d(X_n, X) > ε) = 0;
- converge weakly to X (notation: X_n ⇝ X) if for each f ∈ C_b(S): lim_{n→∞} ∫_S f dP(X_n) = ∫_S f dP(X);
- converge in L^p-sense, p ∈ [1, ∞], to X if S = R (notation: X_n →_{L^p} X) if lim_{n→∞} ‖X_n − X‖_{L^p(P)} = 0.

From Probability Theory one is familiar with the following relations between these different concepts of convergence:
Proposition (relations). Let (X_n)_{n∈N} ⊆ L(P, S) and X ∈ L(P, S). Then it holds:
(a) X_n →_{a.s.} X  ⟹  X_n →_P X  ⟹  X_n ⇝ X.
(b) Subsequence principle: X_n →_P X  ⟺  every subsequence (n_k)_{k∈N} has a further subsequence (n_{k_l})_{l∈N} with X_{n_{k_l}} →_{a.s.} X.
(c) Slutsky's lemma: let S = R^k, let A_n be R^k-valued and B_n be R-valued random variables, n ∈ N, with A_n →_P a ∈ R^k and B_n →_P b ∈ R. If X_n ⇝ X, then A_n + B_n X_n ⇝ a + b X.
(d) Let S = R and p ∈ [1, ∞). Then X_n →_{L^p} X  ⟹  X_n →_P X.

Moreover, the above notions of convergence are compatible with continuity, i.e. a convergent sequence of random variables can be transported to another space by a continuous function while preserving the convergence:
Proposition (continuous mapping principle). Let X_n, X ∈ L(P, S), n ∈ N, and let Φ : S → S′ be a Borel-measurable, P(X)-a.e. continuous mapping, where (S′, d′) is another metric space. Then:
    X_n →* X  ⟹  Φ(X_n) →* Φ(X),   where →* ∈ { ⇝, →_P, →_{a.s.} }.

Theoretical examples: Where do these notions occur?
The most important examples include:
Theorem (weak law of large numbers). Let (X_n)_{n∈N} be a sequence of uncorrelated R-valued random variables satisfying sup_{n∈N} Var[X_n] < ∞. Then
    (1/n) Σ_{i=1}^n (X_i − E[X_i]) → 0   in probability and in L².
Theorem (strong law of large numbers). Let (X_n)_{n∈N} ⊆ L¹(P) be a sequence of i.i.d. random variables. Then
    (1/n) Σ_{i=1}^n X_i → E[X_1]   almost surely and in L¹.

Theorem (central limit theorem). Let (X_n)_{n∈N} be a sequence of i.i.d. R^k-valued random variables satisfying E[‖X_1‖_2²] < ∞. Then
    (1/√n) Σ_{i=1}^n (X_i − E[X_i]) ⇝ N_k(0, Cov[X_1]).
Theorem (weak law of small numbers). Let {X_{n,m}}_{n∈N, m=1,...,n} be a triangular array of independent random variables with P(X_{n,m}) = Bin(1, p_{n,m}), m = 1, ..., n, n ∈ N. Suppose that Σ_{m=1}^n p_{n,m} → λ > 0 and max_{m=1,...,n} p_{n,m} → 0 as n → ∞. Then
    P(Σ_{m=1}^n X_{n,m}) ⇝ Poi(λ)   as n → ∞.
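
A quick Monte Carlo illustration of both limit theorems; a minimal sketch assuming NumPy, with the sample sizes, the Exp(1) model and λ = 4 chosen only for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    n, reps = 1000, 4000

    # CLT: standardized sums of i.i.d. Exp(1) variables (mean 1, variance 1)
    x = rng.exponential(scale=1.0, size=(reps, n))
    z = (x - 1.0).sum(axis=1) / np.sqrt(n)          # (1/sqrt(n)) * sum_i (X_i - E[X_i])
    print("CLT:     mean %.3f, var %.3f (targets 0 and 1)" % (z.mean(), z.var()))

    # Weak law of small numbers: rows of Bernoulli(lambda/n) entries, row sums approx. Poi(lambda)
    lam = 4.0
    s = (rng.random(size=(reps, n)) < lam / n).sum(axis=1)
    print("Poisson: mean %.3f, var %.3f (both close to lambda = %.1f)" % (s.mean(), s.var(), lam))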

Tools for weak convergence
Definition (Weak convergence: general approach). Let µ_n, µ, n ∈ N, be probability measures on B(S). Then the sequence (µ_n)_{n∈N} converges weakly to µ iff
    lim_{n→∞} ∫_S f dµ_n = ∫_S f dµ   for all f ∈ C_b(S).
Remark (a slight generalization). Hence weak convergence of random variables only depends on distributions: X_n ⇝ X ⟺ P(X_n) ⇝ P(X). Due to this equivalence, it is possible to define weak convergence for random variables defined on different probability spaces: X_n on (Ω_n, A_n, P_n), n ∈ N, and X on (Ω, A, P). For the sake of simplicity, we will not consider this slight generalization here.

From Probability Theory one is familiar with the following characterization of weak convergence:
Theorem (portmanteau theorem). Let X_n, X, n ∈ N, be S-valued random variables. Then the following are equivalent:
(a) X_n ⇝ X, i.e. E[f(X_n)] → E[f(X)] as n → ∞ for all f ∈ C_b(S).
(b) E[f(X_n)] → E[f(X)] for all Lipschitz-continuous f ∈ C_b(S).
(c) P(X ∈ O) ≤ liminf_{n→∞} P(X_n ∈ O) for all open O ⊆ S.
(d) P(X ∈ F) ≥ limsup_{n→∞} P(X_n ∈ F) for all closed F ⊆ S.
(e) P(X ∈ B) = lim_{n→∞} P(X_n ∈ B) for all B ∈ B(S) with P(X)-negligible boundary, i.e. P(X ∈ ∂B) = 0.
(f) E[f(X_n)] → E[f(X)] for all bounded B(S)-measurable functions f : S → R that are P(X)-a.e. continuous.

In Euclidean k-space the distribution function is an appropriate tool to characterize weak convergence:
Definition. Let X ∈ L(P, R^k). Then its (cumulative) distribution function, for short cdf, is given by
    F_X : R^k → [0, 1],  x ↦ P(X ≤ x) = P(X ∈ ∏_{i=1}^k (−∞, x_i]).
Remark. Note that, as a consequence of the uniqueness theorem for finite measures, the cdf characterizes the distribution of X uniquely, since
    E := { ∏_{i=1}^k (−∞, x_i] : x_1, ..., x_k ∈ R }
is a π-system (i.e. it is ∩-stable) that generates B(R^k).

Proposition (weak convergence on R^k via cdf). Let X_n, X, n ∈ N, be R^k-valued random variables. Then X_n ⇝ X iff F_{X_n}(x) → F_X(x) as n → ∞ for all x ∈ R^k at which F_X is continuous.
Example: N_1(0, 1/n) ⇝ δ_0.
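
A small numerical check of this example; a sketch assuming SciPy is available (the grid of points x and the values of n are arbitrary). Here F_{X_n}(x) = Φ(√n x) converges to the cdf of δ_0 at every continuity point x ≠ 0, while at the jump x = 0 the limit is 1/2:

    import numpy as np
    from scipy.stats import norm

    # X_n ~ N(0, 1/n), hence F_{X_n}(x) = Phi(sqrt(n) * x)
    for x in (-0.5, -0.01, 0.0, 0.01, 0.5):
        vals = ["%.4f" % norm.cdf(np.sqrt(n) * x) for n in (10, 100, 10000)]
        print("x = %+5.2f:" % x, vals)
    # Limits: 0 for x < 0 and 1 for x > 0 (the continuity points of the cdf of delta_0),
    # but 1/2 at the discontinuity point x = 0, where convergence is not required.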

Some more theory on weak convergence
We have already observed that weak convergence is "weak" in the sense that it is implied by all other concepts of convergence that we have introduced.
Let us have a closer look at weak convergence and recall from calculus:
Proposition.
(a) Every convergent sequence in R^k is bounded.
(b) Every bounded sequence in R^k has a convergent subsequence. (Bolzano-Weierstrass theorem)
Is there an analogue involving weak convergence and probabilistic boundedness?

Prohorov's theorem
Yes, indeed: Prohorov's theorem answers this question:
Theorem (Prohorov). Let (X_n)_{n∈N} be a sequence of R^k-valued random variables. Then it holds:
(a) If X_n ⇝ X for some R^k-valued random variable X, then (X_n)_{n∈N} is uniformly tight².
(b) If (X_n)_{n∈N} is uniformly tight², then there exists a subsequence (X_{n_j})_{j∈N} with X_{n_j} ⇝ X for some R^k-valued random variable X.
For proving this theorem we need some additional concepts and results.
² This will be made precise shortly.

Probabilistic boundedness
Definition (uniform tightness). Let I be an index set and F := {X_i}_{i∈I} a family of R^k-valued random variables. Then F is called uniformly tight or bounded in probability if for every ε > 0 there is a constant M_ε > 0 such that
    sup_{i∈I} P(‖X_i‖_2 > M_ε) < ε.
Remark. Uniform tightness of a sequence of random vectors in R^k (i.e. I = N) is exactly the definition of the stochastic Landau symbol O_P: (X_n)_{n∈N} is uniformly tight iff X_n = O_P(1). We will scrutinize this notation later.

Helly's lemma
Definition. A function F : R^k → [0, 1] is called a defective distribution function if there is some finite measure µ on B(R^k) taking values in [0, 1] and a constant c_F ∈ [0, 1] such that
    F(x) − c_F = µ((−∞, x]) = µ(∏_{i=1}^k (−∞, x_i]),   x ∈ R^k.
Remark.
1. By continuity of measures from above we have c_F = lim_{x_i → −∞, i=1,...,k} F(x).
2. A defective distribution function is a cdf iff the underlying finite measure is a probability measure and c_F = 0.

Lemma (Helly's lemma / Helly's selection theorem). Let (F_n)_{n∈N} be a sequence of distribution functions with domain R^k. Then this sequence possesses a subsequence (F_{n_j})_{j∈N} with the property
    lim_{j→∞} F_{n_j}(x) = F(x)   for each continuity point x ∈ R^k of some defective distribution function F.
Rough idea of the proof. The proof is quite technical, hence we only present the idea of the construction of F. For details, please refer to [Dur10, Thm. 1.1.6, Thm. 3.2.6] and [Van98, Lemma 2.5]. BOARD

Is it really possible that Helly's lemma fails to provide us with an honest cdf? Unfortunately, yes!
Example. Consider a sequence (X_n)_{n∈N} of real-valued random variables with P(X_n) = δ_n, n ∈ N. Then the corresponding sequence of distribution functions is given by F_n : R → {0, 1}, x ↦ 1_{[n,∞)}(x). Obviously lim_{j→∞} F_{n_j}(x) = 0 for each x ∈ R and each subsequence (n_j)_{j∈N}. Hence Helly's lemma cannot yield an honest cdf!

Prohorov's theorem
Now we are in a position to prove Prohorov's theorem:
Theorem (Prohorov). Let (X_n)_{n∈N} be a sequence of R^k-valued random variables. Then it holds:
(a) If X_n ⇝ X for some R^k-valued random variable X, then (X_n)_{n∈N} is uniformly tight.
(b) If (X_n)_{n∈N} is uniformly tight, then there exists a subsequence (X_{n_j})_{j∈N} with X_{n_j} ⇝ X for some R^k-valued random variable X.
Proof. BOARD

Stochastic Landau notation
Similar to the well-known O-notation from calculus, one can introduce a stochastic version of the Landau symbols in order to express the speed of convergence (in probability):
Definition. Let (X_n)_{n∈N} and (R_n)_{n∈N} be sequences of R^k- and R-valued random variables, respectively. We write:
(a) X_n = O_P(1) :⟺ {X_n : n ∈ N} is uniformly tight.
(b) X_n = O_P(R_n) :⟺ X_n = R_n Y_n for a sequence (Y_n)_{n∈N} of R^k-valued random variables satisfying Y_n = O_P(1).
(c) X_n = o_P(1) :⟺ X_n →_P 0.
(d) X_n = o_P(R_n) :⟺ X_n = R_n Y_n for a sequence (Y_n)_{n∈N} of R^k-valued random variables satisfying Y_n = o_P(1).
Commonly, (R_n)_{n∈N} is called the rate (of convergence).

In our next chapter we will use differential calculus and therefore need the following lemma. Think of R as the remainder term in some Taylor expansion.
Lemma. Let D ⊆ R^k be open with 0 ∈ D and let R : D → R^m be a function with R(0) = 0. Furthermore let (X_n)_{n∈N} be a sequence of random variables taking values in D with X_n = o_P(1). Then for every p > 0 we have:
(a) R(h) = o(‖h‖_2^p) as h → 0  ⟹  R(X_n) = o_P(‖X_n‖_2^p);
(b) R(h) = O(‖h‖_2^p) as h → 0  ⟹  R(X_n) = O_P(‖X_n‖_2^p).
Proof. Let p > 0 and define
    g(h) := R(h)/‖h‖_2^p  if h ≠ 0,   g(h) := 0  else.
Then for each n ∈ N we have R(X_n) = ‖X_n‖_2^p · g(X_n), which is of the form R_n · Y_n with R_n := ‖X_n‖_2^p and Y_n := g(X_n).

Proof (continued).
(a) By assumption lim_{h→0} g(h) = lim_{h→0} R(h)/‖h‖_2^p = 0, i.e. g is continuous at 0. Since X_n →_P 0 (by assumption), the continuous mapping principle yields g(X_n) →_P 0, i.e. g(X_n) = o_P(1). Thus R(X_n) = o_P(‖X_n‖_2^p).
(b) By assumption there are some δ > 0 and some M > 0 such that ‖g(h)‖_2 = ‖R(h)‖_2 / ‖h‖_2^p ≤ M for all h ∈ B_δ(0) ∩ D. Since X_n →_P 0 (by assumption), we obtain:
    P(‖g(X_n)‖_2 > M) ≤ P(‖X_n‖_2 > δ) → 0   as n → ∞.

Proof (continued). Hence for given ε > 0 we can choose n_ε ∈ N such that P(‖g(X_n)‖_2 > M) < ε/2 for all n > n_ε. For n ∈ {1, ..., n_ε} we choose M_ε ≥ M suitably large such that P(‖g(X_n)‖_2 > M_ε) < ε/2 for these n. Obviously, this yields
    sup_{n∈N} P(‖g(X_n)‖_2 > M_ε) ≤ ε/2 < ε,
i.e. {g(X_n)}_{n∈N} is uniformly tight. Thus g(X_n) = O_P(1) and R(X_n) = O_P(‖X_n‖_2^p).
Example (LLN). Let (X_n)_{n∈N} ⊆ L¹(P) be a sequence of i.i.d. random variables and define S_n := Σ_{i=1}^n X_i. Then we know that S_n/n = X̄_n →_P E[X_1], i.e. S_n − nE[X_1] = o_P(n).
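
A tiny simulation of this o_P(n) statement; a sketch assuming NumPy, with the Exp(1) model (so E[X_1] = 1) chosen only for illustration:

    import numpy as np

    rng = np.random.default_rng(1)
    for n in (10**2, 10**4, 10**6):
        s_n = rng.exponential(1.0, size=n).sum()           # S_n for i.i.d. Exp(1) data
        print(n, "(S_n - n*E[X_1]) / n =", (s_n - n) / n)  # tends to 0, i.e. S_n - n*E[X_1] = o_P(n)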

Chapter 2 Delta Method

Motivation / Idea
Given a limit law of √n (T_n − θ) (often derived from the CLT), how can one deduce a limit law of √n (φ(T_n) − φ(θ)), where φ is some differentiable mapping? Use a Taylor expansion!
Remark. In applications, T_n is often an estimator for some parameter θ. Note that the question refers to the limit distribution; hence φ(T_n) may inherit a property like asymptotic efficiency from T_n.

Let us recall some definitions concerning estimators:
Definition. Let {P_ϑ}_{ϑ∈Θ} be a family of probability measures on B(R^m), consider an i.i.d. sample X_1, ..., X_n with P(X_1) ∈ {P_ϑ}_{ϑ∈Θ}, and let T_n = T(X_1, ..., X_n) : Ω → Θ be an estimator for ϑ.
(a) T_n is called consistent iff T_n →_P ϑ under P_ϑ (i.e. when X_1 ~ P_ϑ) for all ϑ ∈ Θ.
(b) T_n is called unbiased iff bias_ϑ(T_n) := E_ϑ[T_n] − ϑ = 0 for all ϑ ∈ Θ. (Existence of the involved integral is required.)
(c) T_n is called asymptotically efficient iff (provided that T_n is R^k-valued) for all ϑ ∈ Θ we have: under P_ϑ,
    √n (T_n − ϑ) ⇝ N_k(0, I(P_ϑ)^{-1}).
Here I(P_ϑ) := (E_ϑ[∂_{ϑ_i} log f_ϑ(X) · ∂_{ϑ_j} log f_ϑ(X)])_{i,j=1,...,k} denotes the Fisher information matrix of P_ϑ, where Θ ⊆ R^k is assumed. Moreover X ~ P_ϑ is an R^m-valued random variable and f_ϑ = dP_ϑ/dλ^m if P_ϑ ≪ λ^m, and f_ϑ = dP_ϑ/d#^m if P_ϑ ≪ #^m (in either case for all ϑ ∈ Θ).

Delta method: basic result
Theorem (Delta method). Let D ⊆ R^k be an open subset and suppose that φ : D → R^m is a mapping that is differentiable at θ ∈ D. Furthermore let {T_n}_{n∈N} be a family of D-valued random variables and let (r_n)_{n∈N} be a sequence of real numbers satisfying 0 < r_n → ∞. If r_n (T_n − θ) ⇝ T for some R^k-valued random variable T, then
    r_n (φ(T_n) − φ(θ)) ⇝ Dφ(θ) T   as n → ∞,
where Dφ(θ) ∈ L(R^k, R^m) ≅ R^{m×k} denotes the Fréchet derivative (represented by the Jacobian matrix) of φ at θ. Moreover, we have
    ‖r_n (φ(T_n) − φ(θ)) − Dφ(θ)(r_n (T_n − θ))‖_2 →_P 0.
Proof. BOARD
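
A Monte Carlo sanity check of the theorem; a sketch assuming NumPy, where φ(x) = x², T_n = X̄_n for Exp(1) data, θ = 1 and r_n = √n are illustrative choices. Since √n (X̄_n − 1) ⇝ N(0, 1) and φ′(1) = 2, the empirical variance of √n (φ(T_n) − φ(θ)) should be close to 4:

    import numpy as np

    rng = np.random.default_rng(2)
    n, reps = 2000, 2000
    t_n = rng.exponential(1.0, size=(reps, n)).mean(axis=1)   # T_n = sample mean, theta = 1

    lhs = np.sqrt(n) * (t_n**2 - 1.0)     # r_n * (phi(T_n) - phi(theta)) with phi(x) = x^2
    print("empirical variance: %.3f   delta-method prediction (phi'(1)^2 * Var X_1): %.3f"
          % (lhs.var(), 4.0))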

There is also a slightly more general result if we assume φ to be of class C¹ around θ. We state it without proof:
Theorem (Uniform delta method). Let D ⊆ R^k be an open subset and suppose that φ : D → R^m is continuously differentiable in an open neighborhood of θ ∈ D. Furthermore let {T_n}_{n∈N} be a family of D-valued random variables and let (r_n)_{n∈N} be a sequence of real numbers satisfying 0 < r_n → ∞. If r_n (T_n − θ_n) ⇝ T for some R^k-valued random variable T and some sequence θ_n → θ in D, then
    r_n (φ(T_n) − φ(θ_n)) ⇝ Dφ(θ) T   as n → ∞,
where Dφ(θ) ∈ L(R^k, R^m) ≅ R^{m×k} denotes the Fréchet derivative (represented by the Jacobian matrix) of φ at θ. Moreover, we have
    ‖r_n (φ(T_n) − φ(θ_n)) − Dφ(θ)(r_n (T_n − θ_n))‖_2 →_P 0.

Application I: Testing variance (CLT revisited)
Example (sample variance). Given a data set consisting of i.i.d. observations X_1, ..., X_n ∈ L⁴(P), n ∈ N, we want to estimate its variance. Therefore we consider the biased estimator
    Ŝ²_{n,b} := (1/n) Σ_{i=1}^n (X_i − X̄_n)² = (1/n) Σ_{i=1}^n X_i² − (X̄_n)² = φ(X̄_n, (1/n) Σ_{i=1}^n X_i²),
where φ(x, y) := y − x², (x, y)^T ∈ R². Define µ_k := E[X_1^k], k = 1, ..., 4. Then for the vectors (X_i, X_i²)^T, i = 1, ..., n, it holds by the CLT:
    √n ( (X̄_n, (1/n) Σ_i X_i²)^T − (µ_1, µ_2)^T ) = (1/√n) Σ_{i=1}^n ( (X_i, X_i²)^T − E[(X_i, X_i²)^T] ) ⇝ Z.

Example (sample variance, continued).
    √n ( (X̄_n, (1/n) Σ_i X_i²)^T − (µ_1, µ_2)^T ) ⇝ Z,  where  Z ~ N_2( (0, 0)^T, [ [µ_2 − µ_1², µ_3 − µ_1µ_2], [µ_3 − µ_1µ_2, µ_4 − µ_2²] ] ).
Hence the delta method implies (since Ŝ²_{n,b} = φ(X̄_n, (1/n) Σ_i X_i²) with φ(x, y) = y − x²):
    √n ( φ(X̄_n, (1/n) Σ_i X_i²) − φ(µ_1, µ_2) ) ⇝ Dφ(µ_1, µ_2) Z,
i.e. (since Dφ(x, y) z = (−2x, 1) z for z ∈ R²)
    √n ( Ŝ²_{n,b} − (µ_2 − µ_1²) ) ⇝ Z̃ := Dφ(µ_1, µ_2) Z.

Example (sample variance, continued).
    √n ( Ŝ²_{n,b} − (µ_2 − µ_1²) ) ⇝ Z̃,  where
    Z̃ ~ N_1( 0, (−2µ_1, 1) [ [µ_2 − µ_1², µ_3 − µ_1µ_2], [µ_3 − µ_1µ_2, µ_4 − µ_2²] ] (−2µ_1, 1)^T ) = N_1( 0, µ_4 − µ_2² − 4µ_1⁴ − 4µ_1µ_3 + 8µ_1²µ_2 ).
Slutsky's lemma implies that this result also holds for the corresponding unbiased estimator Ŝ²_n := n/(n−1) · Ŝ²_{n,b}. We will apply this result to construct a test for the variance of a data set.
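
A quick cross-check of this asymptotic variance; a sketch assuming NumPy, with Exp(1) data (raw moments µ_1, ..., µ_4 = 1, 2, 6, 24) as an arbitrary non-normal example. The raw-moment expression should agree with the empirical variance of √n (Ŝ²_{n,b} − σ²):

    import numpy as np

    rng = np.random.default_rng(3)
    m1, m2, m3, m4 = 1.0, 2.0, 6.0, 24.0               # raw moments of Exp(1)
    avar = m4 - m2**2 - 4*m1**4 - 4*m1*m3 + 8*m1**2*m2
    print("asymptotic variance from raw moments:", avar)          # 8.0 for Exp(1)

    n, reps = 2000, 4000
    x = rng.exponential(1.0, size=(reps, n))
    stat = np.sqrt(n) * (x.var(axis=1) - 1.0)          # var(ddof=0) is the biased estimator, sigma^2 = 1
    print("empirical variance of sqrt(n)*(S^2_biased - sigma^2):", stat.var())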

First, we recall some notions and results from Mathematical Statistics:
Definition (kurtosis). Let X ∈ L⁴(P) be a random variable. Then the kurtosis of X is defined by
    κ_X := E[(X − E[X])⁴] / ( E[(X − E[X])²] )² = E[(X − E[X])⁴] / (Var(X))².
Remark. If X ~ N_1(µ, σ²), then κ_X = 3.
Definition (chi-square distribution). Let X_1, ..., X_n ~ N_1(0, 1) be i.i.d. random variables. Then the probability measure χ²_n := P(Σ_{i=1}^n X_i²) on B(R) is called the chi-square distribution with n degrees of freedom.
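
A tiny numeric check of the definition; a sketch assuming NumPy, with the Laplace distribution (κ = 6) and the normal distribution (κ = 3) as examples:

    import numpy as np

    rng = np.random.default_rng(4)

    def kurtosis(x):
        c = x - x.mean()
        return (c**4).mean() / (c**2).mean()**2    # E[(X - E X)^4] / (Var X)^2

    print("normal :", kurtosis(rng.normal(size=10**6)))    # close to 3
    print("Laplace:", kurtosis(rng.laplace(size=10**6)))   # close to 6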

From Mathematical Statistics one is familiar with the following
Proposition. Let X_1, ..., X_n ~ N_1(µ, σ²) be i.i.d. random variables and let Ŝ²_n = 1/(n−1) Σ_{i=1}^n (X_i − X̄_n)² be the unbiased estimator of σ² from above. Then
    P( (n−1) Ŝ²_n / σ² ) = χ²_{n−1}.
This result gives rise to the following test for normal data:
Example (one-sided test for σ²). In the situation of the preceding proposition and for given σ_0² > 0, we want to test H_0 : σ² = σ_0² vs. H_1 : σ² > σ_0² at level α ∈ (0, 1).

Example (one-sided test for σ², continued). As a test statistic we use T_n := (n−1) Ŝ²_n / σ_0² and decide to reject H_0 if T_n > q^{(1−α)}_{χ²_{n−1}} (the (1−α)-quantile of the χ²_{n−1}-distribution). Hence we obtain
    P_{σ_0²}( T_n > q^{(1−α)}_{χ²_{n−1}} ) = 1 − χ²_{n−1}( (−∞, q^{(1−α)}_{χ²_{n−1}}] ) = α,
i.e. the test has exactly level α.
What if we know that the given data are not normally distributed? We use the approximation
    √n ( Ŝ²_n − (µ_2 − µ_1²) ) ⇝ Z̃,  where  Z̃ ~ N_1( 0, µ_4 − µ_2² − 4µ_1⁴ − 4µ_1µ_3 + 8µ_1²µ_2 ),
from above to derive a test of asymptotic level α for certain data sets.

Example (one-sided test for σ² without normality assumption). Let X_1, ..., X_n ∈ L⁴(P) be i.i.d. random variables. As above let µ_k := E[X_1^k]. For the sake of simplicity we assume µ_1 = 0.³ We obtain σ² := Var(X_1) = µ_2 and κ := κ_{X_1} = µ_4/µ_2². Hence, our approximation reduces to
    √n ( Ŝ²_n / µ_2 − 1 ) ⇝ Z ~ N_1(0, κ − 1).
Again, for given σ_0² > 0, we want to test H_0 : σ² = σ_0² vs. H_1 : σ² > σ_0² at level α ∈ (0, 1). We use T̃_n := √n ( Ŝ²_n / σ_0² − 1 ) as test statistic.
³ This is not a restriction: centering the observations neither affects their dispersion nor Ŝ²_n = 1/(n−1) Σ_{i=1}^n (X_i − X̄_n)². (After centering one has to use centered moments instead of our µ_k.)

[Simulation figures: distribution of T̃_n / √(κ−1) = √n ( Ŝ²_n/σ_0² − 1 ) / √(κ−1) compared with N_1(0, 1); 1000 repetitions.]

A remark on quantiles of χ²_{n−1}
Remark. Let C_{n−1} ~ χ²_{n−1}, n ∈ N_{>1}. Then the CLT implies
    P( (C_{n−1} − (n−1)) / √(2n−2) ) ⇝ N_1(0, 1).
Hence, for α ∈ (0, 1), the (1−α)-quantile of the latter probability distribution converges to that of a standard normal distribution, as quantiles of this distribution are uniquely determined⁴:
    lim_{n→∞} ( q^{(1−α)}_{χ²_{n−1}} − (n−1) ) / √(2n−2) = Φ^{-1}(1−α),
where we abbreviate q̃_n := ( q^{(1−α)}_{χ²_{n−1}} − (n−1) ) / √(n−1), so that the quantity on the left equals (1/√2) q̃_n.
⁴ This is due to the fact that Φ increases strictly. (→ Probability Theory)
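
A quick look at the quality of this quantile approximation; a sketch assuming SciPy (α = 0.05 is an arbitrary choice):

    import numpy as np
    from scipy.stats import chi2, norm

    alpha = 0.05
    for n in (10, 100, 1000, 10000):
        exact = chi2.ppf(1 - alpha, n - 1)
        approx = (n - 1) + np.sqrt(2 * (n - 1)) * norm.ppf(1 - alpha)   # (n-1) + sqrt(2n-2) * z_{1-alpha}
        print("n = %5d:  exact quantile %9.2f   normal approximation %9.2f" % (n, exact, approx))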

Example (one-sided test for σ², continued). Recall: T̃_n := √n ( Ŝ²_n/σ_0² − 1 ) ⇝ Z ~ N_1(0, κ−1) under P_{σ_0²}, and lim_{n→∞} q̃_n = √2 Φ^{-1}(1−α).
Thus Slutsky's lemma implies: under P_{σ_0²},
    T̃_n − q̃_n ⇝ N_1( −√2 Φ^{-1}(1−α), κ−1 ).
We implement the following decision rule: reject H_0 if T̃_n > ( q^{(1−α)}_{χ²_{n−1}} − (n−1) ) / √(n−1) = q̃_n.
For the error of type I we obtain, by part (e) of the portmanteau theorem:
    P_{σ_0²}( T̃_n > q̃_n ) = P_{σ_0²}( T̃_n − q̃_n > 0 ) → 1 − N_1( −√2 Φ^{-1}(1−α), κ−1 )( (−∞, 0] )   as n → ∞.

Example (one-sided test for σ², continued).
    P_{σ_0²}( T̃_n > q̃_n ) → 1 − Φ( √2 Φ^{-1}(1−α) / √(κ−1) )   as n → ∞,
and this limit equals α iff κ = 3 and is at most α iff κ ≤ 3.
Hence our decision rule establishes an (asymptotic) one-sided test (of level α) for σ² iff the distribution of the observations is platykurtic or mesokurtic, i.e. κ < 3 and κ = 3, respectively.
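
A simulation of the resulting type I error; a minimal sketch assuming NumPy/SciPy, with a uniform model (κ = 1.8, platykurtic) and a Laplace model (κ = 6, leptokurtic), both scaled to variance σ_0² = 1 so that H_0 is true; the sample sizes and α = 0.05 are illustrative:

    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(5)
    alpha, n, reps, sigma0_sq = 0.05, 500, 5000, 1.0
    q_tilde = (chi2.ppf(1 - alpha, n - 1) - (n - 1)) / np.sqrt(n - 1)   # rescaled chi^2 quantile

    def rejection_rate(sample):
        s2 = sample.var(axis=1, ddof=1)              # unbiased variance estimator
        t = np.sqrt(n) * (s2 / sigma0_sq - 1.0)      # test statistic T~_n
        return (t > q_tilde).mean()

    unif = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(reps, n))   # kappa = 1.8 < 3
    lapl = rng.laplace(scale=1/np.sqrt(2), size=(reps, n))        # kappa = 6  > 3
    print("empirical level, uniform data:", rejection_rate(unif))  # typically below alpha
    print("empirical level, Laplace data:", rejection_rate(lapl))  # typically above alpha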

Recall: H_0 : σ² = σ_0².
T_n := (n−1) Ŝ²_n / σ_0², used with normal data; reject H_0 if T_n > q^{(1−α)}_{χ²_{n−1}}.
T̃_n := √n ( Ŝ²_n / σ_0² − 1 ), used with possibly non-normal data; reject H_0 if T̃_n > q̃_n = ( q^{(1−α)}_{χ²_{n−1}} − (n−1) ) / √(n−1).
Remark. The presented testing procedures are closely related: they are based on essentially the same (asymptotic) decision rule (if µ_1 = 0), as one can prove
    T̃_n > q̃_n  ⟺  T_n − (n−1) > √((n−1)/n) · ( q^{(1−α)}_{χ²_{n−1}} − (n−1) ),
where the factor √((n−1)/n) is approximately 1 for large n.

Application II: Asymptotic confidence intervals and variance-stabilizing transformations (VST)
We are given a parametric model {P_ϑ}_{ϑ∈Θ} of probability measures on B(R^k) and assume Θ := (ϑ_−, ϑ_+) ⊆ R to be an open interval. Furthermore assume the existence of an estimator T_n = T(X_1, ..., X_n) of ϑ ∈ Θ (where {X_l}_{l∈N} is a family of i.i.d. R^k-valued random variables with P(X_1) ∈ {P_ϑ}_{ϑ∈Θ}⁵) that satisfies
    √n (T_n − ϑ) ⇝ T ~ N_1(0, σ²(ϑ))   under ϑ, for all ϑ ∈ Θ.
We assume that σ²(·) is known as a function of ϑ.
Task: for fixed γ ∈ (0, 1), find an asymptotic confidence interval for ϑ.
First idea: consider the asymptotic γ-confidence interval
    CI_{ϑ;n}(γ) := [ T_n − Φ^{-1}((1+γ)/2) σ(ϑ)/√n,  T_n + Φ^{-1}((1+γ)/2) σ(ϑ)/√n ].
⁵ Formally, for l ∈ N, X_l is defined on (Ω, A, P) as fixed above. If ϑ ∈ Θ is the true (but unknown) parameter, then the image measure of P under X_1 (i.e. the distribution of X_1) is given by P_ϑ.

First idea: consider the asymptotic γ-confidence interval
    CI_{ϑ;n}(γ) = [ T_n − Φ^{-1}((1+γ)/2) σ(ϑ)/√n,  T_n + Φ^{-1}((1+γ)/2) σ(ϑ)/√n ].
Problem: ϑ, and thus σ(ϑ), is unknown in general. Hence these confidence intervals are useless in practice.
Solution 1: Estimate σ²(ϑ) using a consistent estimator. This approach is discussed in Mathematical Statistics.
Solution 2: Use a variance-stabilizing transformation of the given data set.

Variance-stabilizing transformations
Assumption. Let ϑ_0 ∈ Θ be fixed. We assume that the mapping
    Θ = (ϑ_−, ϑ_+) ∋ ϑ ↦ ∫_{ϑ_0}^{ϑ} 1/σ(θ) dθ ∈ R
is well-defined and differentiable (with derivative 1/σ(·)).
Definition (VST). In the stated situation, under the latter assumption and for some fixed η > 0, the differentiable mapping
    φ : Θ = (ϑ_−, ϑ_+) → R,  ϑ ↦ ∫_{ϑ_0}^{ϑ} η/σ(θ) dθ
is called a variance-stabilizing transformation.

Recall: φ(ϑ) = ∫_{ϑ_0}^{ϑ} η/σ(θ) dθ, ϑ ∈ (ϑ_−, ϑ_+).
Remark (basic properties).
1. φ is continuous and, due to η > 0 and σ > 0 on Θ, also strictly increasing, as its derivative equals φ′ = η/σ. Hence φ is invertible.
2. φ exhibits the variance-stabilizing property: φ′ · σ ≡ η.
Remark (What is the origin of this name?). Recall: √n (T_n − ϑ) ⇝ T ~ N_1(0, σ²(ϑ)) under ϑ for all ϑ ∈ Θ, and φ is differentiable on Θ. Hence, the delta method implies
    √n ( φ(T_n) − φ(ϑ) ) ⇝ φ′(ϑ) T ~ N_1( 0, (φ′(ϑ))² σ²(ϑ) ) = N_1(0, η²)   under ϑ, for all ϑ ∈ Θ,
i.e. the asymptotic variance is stabilized to η² (which is usually chosen to be 1 in practice).

Recall: T_n is an estimator of ϑ with √n (T_n − ϑ) ⇝ T ~ N_1(0, σ²(ϑ)) under ϑ, for all ϑ ∈ Θ.
Goal: find an asymptotic γ-confidence interval for ϑ.
Derived so far: √n ( φ(T_n) − φ(ϑ) ) ⇝ N_1(0, η²) under ϑ, for all ϑ ∈ Θ.
Example (asymptotic CI via VST). In the above situation and using a variance-stabilizing transformation φ, our first idea implies that
    CI_{φ(ϑ),n}(γ) := [ φ(T_n) − Φ^{-1}((1+γ)/2) η/√n,  φ(T_n) + Φ^{-1}((1+γ)/2) η/√n ]
is an asymptotic γ-confidence interval for φ(ϑ).

Example (asymptotic CI via VST, continued). Idea: transform this interval using φ^{-1} to obtain an asymptotic CI for ϑ. We know for ϑ ∈ Θ:
    γ ≤ liminf_{n→∞} P_ϑ( φ(ϑ) ∈ CI_{φ(ϑ),n}(γ) ) = liminf_{n→∞} P_ϑ( ϑ ∈ φ^{-1}( CI_{φ(ϑ),n}(γ) ) ).
Now, since φ is continuous and strictly increasing (in particular one-to-one), φ^{-1}( CI_{φ(ϑ),n}(γ) ) ⊆ R is really an interval. Thus, depending on the specific φ, we obtain an asymptotic γ-confidence interval for ϑ. Note that this interval is easy to compute in practice: just apply φ^{-1} to the boundary values of CI_{φ(ϑ),n}(γ).
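
A worked instance of the whole construction; a sketch assuming NumPy/SciPy, with the Poisson model as an illustrative choice that is not from the slides: for i.i.d. Poi(ϑ) data and T_n = X̄_n one has σ²(ϑ) = ϑ, so with η = 1 and ϑ_0 = 0 the VST is φ(ϑ) = 2√ϑ, and the interval for ϑ is obtained by applying φ^{-1}(u) = (u/2)² to the endpoints:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(6)
    gamma, n, theta = 0.9, 200, 3.0
    x = rng.poisson(theta, size=n)
    t_n = x.mean()                                    # T_n = sample mean

    z = norm.ppf((1 + gamma) / 2)
    lo = 2 * np.sqrt(t_n) - z / np.sqrt(n)            # CI for phi(theta) = 2*sqrt(theta)
    hi = 2 * np.sqrt(t_n) + z / np.sqrt(n)
    ci = ((max(lo, 0.0) / 2) ** 2, (hi / 2) ** 2)     # apply phi^{-1}(u) = (u/2)^2 to the endpoints
    print("asymptotic %.0f%% confidence interval for theta:" % (100 * gamma), ci)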

Chapter 3 Moment Estimators

Method of Moments
Let Θ ⊆ R^k and let {P_ϑ}_{ϑ∈Θ} be a family of probability measures on B(R). As above we assume that {X_l}_{l∈N} is a family of i.i.d. R-valued random variables with P(X_1) ∈ {P_ϑ}_{ϑ∈Θ}, i.e. the distribution of X_1 is known up to the parameter vector ϑ ∈ Θ.
Given some functions f_1, ..., f_k : R → R, the method of moments pursues the following ansatz⁶: find ϑ ∈ Θ such that
    (1/n) Σ_{i=1}^n f_j(X_i) = E_ϑ[f_j(X_1)],   j = 1, ..., k.⁷
It is obvious that the LLN motivates this approach.
⁶ Of course, this requires certain integrability conditions on these functions and on X_1, so that all involved expectations are well-defined.
⁷ Note that it is not clear a priori whether such a ϑ exists.

Ansatz: find ϑ ∈ Θ such that (1/n) Σ_{i=1}^n f_j(X_i) = E_ϑ[f_j(X_1)], j = 1, ..., k.
Remark (recourse to Mathematical Statistics). Consider f_j(x) := x^j, j = 1, ..., k. Then the method of moments reduces to finding ϑ ∈ Θ such that
    (1/n) Σ_{i=1}^n X_i^j = E_ϑ[X_1^j],   j = 1, ..., k.
Now we want to scrutinize conditions for existence and asymptotic normality of this type of estimator (to be introduced shortly). Therefore, we use the following
Notation: f := (f_1, ..., f_k)^T,  e : Θ → R^k, ϑ ↦ E_ϑ[f(X_1)].

Moment estimators
Therefore, the equation of interest is given by
    ( (1/n) Σ_{i=1}^n f_j(X_i) )_{j=1,...,k} = e(ϑ).   (*)
Definition (moment estimators). An estimator ϑ̂_n solving equation (*) is called a moment estimator.

Existence and asymptotic normality
Theorem. We consider the situation stated above. Let Θ ⊆ R^k be open and suppose e(ϑ) = E_ϑ[f(X_1)], ϑ ∈ Θ, is continuously differentiable in an open neighborhood of some point ϑ_0 ∈ Θ with det De(ϑ_0) ≠ 0. Moreover, assume that E_{ϑ_0}[ ‖f(X_1)‖_2² ] < ∞. Then e is C¹-invertible in an open neighborhood of ϑ_0, moment estimators ϑ̂_n exist with probability tending to 1 as n → ∞⁸, and they obey⁹
    √n ( ϑ̂_n − ϑ_0 ) ⇝ N_k( 0, (De(ϑ_0))^{-1} Cov_{ϑ_0}[f(X_1)] ((De(ϑ_0))^{-1})^T ).
Proof. BOARD
⁸ I.e., informally, the set of ω's where ϑ̂_n can be defined gains P-mass as n → ∞.
⁹ If ϑ_0 is the true parameter.
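
A small sketch of the classical case f_j(x) = x^j for a two-parameter family; this assumes NumPy, and the Gamma(shape a, scale s) model with a = 2.5, s = 1.5 is an illustrative choice. Here e(a, s) = (as, as² + a²s²), and solving the two moment equations in closed form gives â = m_1²/(m_2 − m_1²) and ŝ = (m_2 − m_1²)/m_1:

    import numpy as np

    rng = np.random.default_rng(7)
    a_true, s_true = 2.5, 1.5
    x = rng.gamma(shape=a_true, scale=s_true, size=100_000)

    # Empirical moment equations with f_1(x) = x, f_2(x) = x^2
    m1, m2 = x.mean(), (x**2).mean()
    v = m2 - m1**2                       # plug-in variance
    a_hat, s_hat = m1**2 / v, v / m1     # closed-form moment estimators
    print("moment estimates: a = %.3f, s = %.3f  (true values: %.1f, %.1f)"
          % (a_hat, s_hat, a_true, s_true))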

List of literature
[Dur10] Durrett, R.: Probability: Theory and Examples. Cambridge University Press, 2010.
[Kle13] Klenke, A.: Wahrscheinlichkeitstheorie. Berlin: Springer, 2013.
[Red14] Redenbach, C.: Mathematical Statistics. Lecture Notes, TU Kaiserslautern, 2014.
[Sei13] Seifried, F. T.: Maß und Integration. Lecture Notes, TU Kaiserslautern, 2013.
[Sei14] Seifried, F. T.: Probability Theory. Lecture Notes, TU Kaiserslautern, 2014.
[Van98] van der Vaart, A. W.: Asymptotic Statistics. Cambridge University Press, 1998.