Theoretical Statistics. Lecture 22.


Peter Bartlett

1. Recall: Asymptotic testing.
2. Quadratic mean differentiability.
3. Local asymptotic normality. [vdv7]

Recall: Asymptotic testing

Consider the asymptotics of a test. We have a parametric model $P_\theta$ for $\theta \in \Theta$, a null hypothesis $\theta = \theta_0$, and an alternative hypothesis $\theta = \theta_0 + h_n$.

Test: compute the log likelihood ratio,
\[
\lambda = \log \prod_{i=1}^n \frac{dP_{\theta_0+h_n}}{dP_{\theta_0}}(X_i),
\]
and reject the null hypothesis if it is sufficiently large.

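Recall: Asymptotic testing

For example, suppose $P_\theta = N(\theta, \sigma^2)$. Then we saw that
\[
\lambda = \frac{n h_n}{\sigma^2}\left(\bar X - \theta_0\right) - \frac{n h_n^2}{2\sigma^2}
\;\overset{\theta_0}{\sim}\; N\left(-\frac{n h_n^2}{2\sigma^2},\ \frac{n h_n^2}{\sigma^2}\right).
\]
For $\sqrt{n}\, h_n \to h \neq 0$, the normal parameters approach $\left(-h^2/(2\sigma^2),\ h^2/\sigma^2\right)$.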
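The normal example can be checked by simulation. The following is a minimal sketch, not part of the original slides: the function names, sample size, number of replications, and seed are all illustrative assumptions. It draws data under the null and compares the empirical mean and variance of $\lambda$ with $-h^2/(2\sigma^2)$ and $h^2/\sigma^2$.

```python
import math
import random
import statistics


def normal_llr(xs, theta0, h_n, sigma):
    """Log likelihood ratio of N(theta0 + h_n, sigma^2) against N(theta0, sigma^2)."""
    n = len(xs)
    xbar = sum(xs) / n
    return n * h_n * (xbar - theta0) / sigma**2 - n * h_n**2 / (2 * sigma**2)


def simulate_normal_llr(n=50, reps=4000, theta0=0.0, h=1.0, sigma=1.0, seed=0):
    """Draw `reps` samples of size n under the null and return the LLR values."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for reproducibility
    h_n = h / math.sqrt(n)     # local alternative, so sqrt(n) * h_n = h
    llrs = []
    for _ in range(reps):
        xs = [rng.gauss(theta0, sigma) for _ in range(n)]
        llrs.append(normal_llr(xs, theta0, h_n, sigma))
    return llrs


llrs = simulate_normal_llr()
# Theory for h = 1, sigma = 1: mean -h^2 / (2 sigma^2) = -0.5, variance h^2 / sigma^2 = 1.
print("sample mean of lambda:", statistics.fmean(llrs))
print("sample variance of lambda:", statistics.variance(llrs))
```

In the normal model the distribution of $\lambda$ is exactly normal for every $n$, so any discrepancy here is Monte Carlo error only.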

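Recall: Asymptotic testing

Another example. The exponential family with sufficient statistic $T$:
\[
p_\theta(x) = \exp\left(T(x)\theta - A(\theta)\right).
\]
For $h_n = h/\sqrt{n}$, we have
\[
\lambda = \log \prod_{i=1}^n \frac{dP_{\theta_0+h_n}}{dP_{\theta_0}}(X_i)
= h_n \sum_{i=1}^n \left(T(X_i) - P_{\theta_0} T(X_i)\right) - \frac{n}{2} A''(\theta_0) h_n^2 + o(n h_n^2)
\;\overset{\theta_0}{\rightsquigarrow}\; N\left(-\frac{h^2}{2}\operatorname{var}_{\theta_0}(T(X_1)),\ h^2 \operatorname{var}_{\theta_0}(T(X_1))\right).
\]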

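Local asymptotic normality: Taylor series

Suppose that we have a density $p_\theta$ with respect to some measure, and that the log likelihood $\ell_\theta(x) = \log p_\theta(x)$ is twice differentiable with respect to $\theta$ and can be approximated by its second-order Taylor series,
\[
\ell_{\theta+h}(x) = \ell_\theta(x) + h^T \dot\ell_\theta(x) + \frac{1}{2} h^T \ddot\ell_\theta(x) h + o(\|h\|^2).
\]
Then
\[
\lambda = \log \prod_{i=1}^n \frac{dP_{\theta+h_n}}{dP_\theta}(X_i)
= \sum_{i=1}^n \left(\log p_{\theta+h_n}(X_i) - \log p_\theta(X_i)\right)
= h_n^T \sum_{i=1}^n \dot\ell_\theta(X_i) + \frac{1}{2} h_n^T \sum_{i=1}^n \ddot\ell_\theta(X_i)\, h_n + o(n\|h_n\|^2).
\]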

Score functions

Consider the log likelihood function $\ell_\theta(x) = \log p_\theta(x)$. Its derivative $\dot\ell_\theta$ is called the score function. For $X \sim P_\theta$ (and for $\ell_\theta$ satisfying regularity conditions), we have

1. The score function has mean zero: $P_\theta \dot\ell_\theta = 0$.
2. The mean curvature of the log likelihood is the negative Fisher information: $P_\theta \ddot\ell_\theta = -I_\theta$, where $I_\theta = P_\theta \dot\ell_\theta \dot\ell_\theta^T$.

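Score functions: Proof

Notice that $\int p_\theta(x)\, d\mu(x) = 1$ implies
\[
\int \dot p_\theta(x)\, d\mu(x) = 0, \qquad \int \ddot p_\theta(x)\, d\mu(x) = 0.
\]
But
\[
P_\theta \dot\ell_\theta = \int \dot\ell_\theta\, dP_\theta = \int \frac{\dot p_\theta}{p_\theta}\, p_\theta\, d\mu = \int \dot p_\theta\, d\mu = 0,
\]
and
\[
P_\theta \ddot\ell_\theta = \int \ddot\ell_\theta\, p_\theta\, d\mu
= \int \left(\frac{\ddot p_\theta}{p_\theta} - \frac{\dot p_\theta \dot p_\theta^T}{p_\theta^2}\right) p_\theta\, d\mu
= -\int \dot\ell_\theta \dot\ell_\theta^T\, p_\theta\, d\mu = -I_\theta.
\]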
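The two score identities can be checked numerically on a small model. The sketch below is an illustration added here, not taken from the slides: it assumes a Bernoulli model in natural form, $p_\theta(x) = \exp(x\theta - A(\theta))$ with $A(\theta) = \log(1 + e^\theta)$, approximates the derivatives in $\theta$ by central finite differences, and takes expectations exactly over the two support points. The helper names and step sizes are hypothetical choices.

```python
import math


def log_p(x, theta):
    """Log density of a Bernoulli in natural form: x * theta - log(1 + exp(theta))."""
    return x * theta - math.log1p(math.exp(theta))


def score(x, theta, eps=1e-5):
    """Central finite-difference approximation to the first derivative in theta."""
    return (log_p(x, theta + eps) - log_p(x, theta - eps)) / (2 * eps)


def curvature(x, theta, eps=1e-4):
    """Central finite-difference approximation to the second derivative in theta."""
    return (log_p(x, theta + eps) - 2 * log_p(x, theta) + log_p(x, theta - eps)) / eps**2


def expect(f, theta):
    """Exact expectation under P_theta: a sum over the support {0, 1}."""
    return sum(f(x, theta) * math.exp(log_p(x, theta)) for x in (0, 1))


theta = 0.7
mean_score = expect(score, theta)                        # should be close to 0
fisher = expect(lambda x, t: score(x, t) ** 2, theta)    # I_theta = P score^2
mean_curvature = expect(curvature, theta)                # should be close to -I_theta

print("P score      =", mean_score)
print("I_theta      =", fisher)
print("-P curvature =", -mean_curvature)
```

For this model the Fisher information also has the closed form $p(1-p)$ with $p = e^\theta/(1+e^\theta)$, which gives a third value to compare against.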

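Local asymptotic normality: Taylor series

Thus,
\[
\lambda = h_n^T \sum_{i=1}^n \dot\ell_\theta(X_i) + \frac{1}{2} h_n^T \sum_{i=1}^n \ddot\ell_\theta(X_i)\, h_n + o(n\|h_n\|^2),
\]
where
\[
\frac{1}{n^{1/2}} \sum_{i=1}^n \dot\ell_\theta(X_i) \;\overset{P_\theta}{\rightsquigarrow}\; N(0, I_\theta),
\qquad
\frac{1}{n} \sum_{i=1}^n \ddot\ell_\theta(X_i) \;\overset{P_\theta}{\to}\; -I_\theta.
\]
So if $\sqrt{n}\, h_n \to h$,
\[
\lambda \;\overset{P_\theta}{\rightsquigarrow}\; N\left(-\frac{1}{2} h^T I_\theta h,\ h^T I_\theta h\right).
\]
This behavior is known as local asymptotic normality.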
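To see local asymptotic normality in a non-Gaussian model, here is a small simulation sketch. It is an added illustration under stated assumptions: the exponential rate model $p_\theta(x) = \theta e^{-\theta x}$, for which $\dot\ell_\theta(x) = 1/\theta - x$ and $I_\theta = 1/\theta^2$, and arbitrary choices of $n$, replications, and seed. The function names are hypothetical.

```python
import math
import random
import statistics


def exp_llr(xs, theta, h_n):
    """Log likelihood ratio of rate theta + h_n against rate theta."""
    n = len(xs)
    return n * math.log((theta + h_n) / theta) - h_n * sum(xs)


def simulate_exp_llr(n=400, reps=2000, theta=2.0, h=1.0, seed=1):
    """Sample the LLR under P_theta with local alternative h_n = h / sqrt(n)."""
    rng = random.Random(seed)
    h_n = h / math.sqrt(n)
    return [
        exp_llr([rng.expovariate(theta) for _ in range(n)], theta, h_n)
        for _ in range(reps)
    ]


theta, h = 2.0, 1.0
fisher = 1.0 / theta**2  # Fisher information of the exponential rate model
lam = simulate_exp_llr(theta=theta, h=h)

print("mean of lambda:", statistics.fmean(lam), " LAN limit:", -0.5 * h * h * fisher)
print("variance of lambda:", statistics.variance(lam), " LAN limit:", h * h * fisher)
```

Unlike the normal case, the agreement is only asymptotic: at finite $n$ the mean of $\lambda$ differs from $-\frac{1}{2}h^2 I_\theta$ by a term of order $n^{-1/2}$.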

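Quadratic mean differentiability

What conditions make this argument rigorous? A weaker condition than twice differentiability suffices: $\theta \mapsto \sqrt{p_\theta}$ differentiable for most $x$.

Definition: The root density $\theta \mapsto \sqrt{p_\theta}$ (for $\theta \in \mathbb{R}^k$) is differentiable in quadratic mean at $\theta$ if there exists a vector-valued measurable function $\dot\ell_\theta : \mathcal{X} \to \mathbb{R}^k$ such that, for $h \to 0$,
\[
\int \left(\sqrt{p_{\theta+h}} - \sqrt{p_\theta} - \frac{1}{2} h^T \dot\ell_\theta \sqrt{p_\theta}\right)^2 d\mu = o(\|h\|^2).
\]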

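Quadratic mean differentiability

Why the strange notation? If $\theta \mapsto \sqrt{p_\theta}$ is differentiable, then
\[
\frac{\partial}{\partial\theta} \sqrt{p_\theta}
= \frac{1}{2\sqrt{p_\theta}} \frac{\partial}{\partial\theta} p_\theta
= \frac{1}{2} \sqrt{p_\theta}\, \frac{1}{p_\theta} \frac{\partial}{\partial\theta} p_\theta
= \frac{1}{2} \sqrt{p_\theta}\, \frac{\partial}{\partial\theta} \ell_\theta
= \frac{1}{2} \sqrt{p_\theta}\, \dot\ell_\theta.
\]
Notice that we do not need differentiability at every $x$. Rather, the $L_2(\mu)$ error (the squared error integrated under $\mu$) should be small.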

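QMD and local asymptotic normality

Theorem: If $\Theta$ is an open subset of $\mathbb{R}^k$, and $P_\theta$ is QMD at $\theta \in \Theta$, then

1. $P_\theta \dot\ell_\theta = 0$.
2. $I_\theta = P_\theta \dot\ell_\theta \dot\ell_\theta^T$ exists.
3. For every $h_n$ satisfying $\sqrt{n}\, h_n \to h$,
\[
\log \prod_{i=1}^n \frac{p_{\theta+h_n}}{p_\theta}(X_i)
= \frac{1}{\sqrt{n}} \sum_{i=1}^n h^T \dot\ell_\theta(X_i) - \frac{1}{2} h^T I_\theta h + o_{P_\theta}(1)
\;\overset{\theta}{\rightsquigarrow}\; N\left(-\frac{1}{2} h^T I_\theta h,\ h^T I_\theta h\right).
\]

QMD of $\sqrt{p_\theta}$ is elegant: $\int (\sqrt{p})^2\, d\mu = 1$, so we can use inner products in $L_2(\mu)$.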

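QMD sufficient conditions

Theorem: If

1. $\Theta$ is an open subset of $\mathbb{R}^k$,
2. $\theta \mapsto \sqrt{p_\theta(x)}$ is continuously differentiable for $\mu$-almost all $x$,
3. $I_\theta = \int \dot p_\theta \dot p_\theta^T / p_\theta\, d\mu$ is continuous in $\theta$,

then $P_\theta$ is QMD at $\theta$, with $\dot\ell_\theta = \dot p_\theta / p_\theta$.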

QMD Examples

Exponential families are QMD. (See earlier example.)

Location families $p_\theta(x) = f(x - \theta)$, where $f$ is positive and continuously differentiable, with
\[
I_\theta = \int \left(\frac{f'(x)}{f(x)}\right)^2 f(x)\, dx < \infty,
\]
are QMD. (Note that, because we can shift $x$ by $\theta$, $I_\theta$ does not depend on $\theta$.)

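QMD Examples

The Laplace location model is QMD:
\[
p_\theta(x) = \frac{1}{2} \exp\left(-|x - \theta|\right).
\]
Notice that $p_\theta$ is not differentiable. But it is QMD (because the single point of non-differentiability, $\theta$, has measure zero).

The uniform distribution $p_\theta$ on $[0, \theta]$ is not QMD. Indeed, for $h > 0$, QMD requires
\[
o(h^2) = \int \left(\sqrt{p_{\theta+h}} - \sqrt{p_\theta} - \frac{1}{2} h\, \dot\ell_\theta \sqrt{p_\theta}\right)^2 d\mu
\geq \int_\theta^{\theta+h} \left(\sqrt{p_{\theta+h}} - \sqrt{p_\theta} - \frac{1}{2} h\, \dot\ell_\theta \sqrt{p_\theta}\right)^2 d\mu
= \frac{h}{\theta + h},
\]
which is a contradiction: on $(\theta, \theta+h]$ we have $p_\theta = 0$ and $p_{\theta+h} = 1/(\theta+h)$, so the right-hand side is of order $h$, not $o(h^2)$.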
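The contrast between the two examples can be made concrete by computing the QMD remainder numerically. The following is a rough sketch, not from the slides: it assumes $\theta = 0$ and the score $\dot\ell_\theta(x) = \operatorname{sign}(x - \theta)$ for the Laplace model, integrates with a simple midpoint rule on a truncated range, and uses the closed-form lower bound $h/(\theta+h)$ (with $\theta = 1$) for the uniform model. The function names, grid, and truncation range are illustrative assumptions.

```python
import math


def laplace_root(x, theta):
    """Square root of the Laplace location density (1/2) exp(-|x - theta|)."""
    return math.sqrt(0.5) * math.exp(-0.5 * abs(x - theta))


def laplace_score(x, theta):
    """Derivative in theta of log p_theta(x), defined for x != theta."""
    return 1.0 if x > theta else -1.0


def laplace_qmd_remainder(h, theta=0.0, lo=-25.0, hi=25.0, dx=1e-3):
    """Midpoint-rule approximation to the QMD remainder integral at step h."""
    cells = int(round((hi - lo) / dx))
    total = 0.0
    for i in range(cells):
        x = lo + (i + 0.5) * dx  # midpoints never coincide with the kinks at 0 and h
        root = laplace_root(x, theta)
        r = laplace_root(x, theta + h) - root - 0.5 * h * laplace_score(x, theta) * root
        total += r * r * dx
    return total


def uniform_qmd_lower_bound(h, theta=1.0):
    """Mass that the uniform density on [0, theta + h] puts on (theta, theta + h]."""
    return h / (theta + h)


steps = (0.4, 0.2, 0.1, 0.05)
laplace_ratios = [laplace_qmd_remainder(h) / h**2 for h in steps]
uniform_ratios = [uniform_qmd_lower_bound(h) / h**2 for h in steps]

for h, lap, uni in zip(steps, laplace_ratios, uniform_ratios):
    print(f"h={h}: Laplace remainder / h^2 = {lap:.4f}, uniform bound / h^2 = {uni:.2f}")
```

For the Laplace model the ratio shrinks roughly in proportion to $h$, consistent with a remainder of order $h^3$, while the uniform lower bound divided by $h^2$ grows like $1/h$.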

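Recall: Contiguity

Theorem: For $\log \frac{dQ_n}{dP_n} \overset{P_n}{\rightsquigarrow} N(\mu, \sigma^2)$, $Q_n \triangleleft P_n$ iff $\mu = -\sigma^2/2$. (Also, $P_n \triangleleft Q_n$ for any $\mu, \sigma^2$.)

But for QMD families, if $h_n$ satisfies $\sqrt{n}\, h_n \to h$,
\[
\log \prod_{i=1}^n \frac{p_{\theta+h_n}}{p_\theta}(X_i)
= \frac{1}{\sqrt{n}} \sum_{i=1}^n h^T \dot\ell_\theta(X_i) - \frac{1}{2} h^T I_\theta h + o_{P_\theta}(1)
\;\overset{\theta}{\rightsquigarrow}\; N\left(-\frac{1}{2} h^T I_\theta h,\ h^T I_\theta h\right).
\]
So $P^n_{\theta+h_n} \triangleleft\triangleright P^n_\theta$.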

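Recall: Contiguity and change of measure

Lemma: [Le Cam's Third Lemma] Suppose, for $X_n \in \mathbb{R}^k$,
\[
\left(X_n,\ \log \frac{dQ_n}{dP_n}\right) \;\overset{P_n}{\rightsquigarrow}\;
N\left(\begin{pmatrix} \mu \\ -\frac{\sigma^2}{2} \end{pmatrix},\ \begin{pmatrix} \Sigma & \tau \\ \tau^T & \sigma^2 \end{pmatrix}\right).
\]
Then
\[
X_n \;\overset{Q_n}{\rightsquigarrow}\; N(\mu + \tau, \Sigma).
\]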
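The mean shift in Le Cam's third lemma can be illustrated by reweighting. This is a minimal sketch under simplifying assumptions that are not in the slides: a scalar $X_n \sim N(0,1)$ under $P_n$ and a log likelihood ratio of exactly $hX_n - h^2/2$, so that $\mu = 0$, $\Sigma = 1$, $\tau = h$ and $\sigma^2 = h^2$. Expectations under $Q_n$ are then approximated by weighting draws from $P_n$ with $dQ_n/dP_n$. The function name and sample size are hypothetical.

```python
import math
import random


def reweighted_moments(h=1.0, reps=20000, seed=3):
    """Estimate the mean and variance of X under Q from draws under P.

    Under P, X ~ N(0, 1) and log dQ/dP = h * X - h^2 / 2, so
    cov(X, log dQ/dP) = h plays the role of tau in the lemma.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(reps)]
    ws = [math.exp(h * x - 0.5 * h * h) for x in xs]  # likelihood ratio weights
    total = sum(ws)
    mean_q = sum(w * x for w, x in zip(ws, xs)) / total
    var_q = sum(w * (x - mean_q) ** 2 for w, x in zip(ws, xs)) / total
    return mean_q, var_q


mean_q, var_q = reweighted_moments()
print("reweighted mean:", mean_q, " lemma predicts mu + tau = 1.0")
print("reweighted variance:", var_q, " lemma predicts Sigma = 1.0")
```

The covariance $\tau$ between the statistic and the log likelihood ratio under $P_n$ reappears as the shift in the mean under $Q_n$, while the variance is unchanged.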

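Asymptotically linear statistics

Suppose the model $\{P_\theta : \theta \in \Theta\}$ is QMD, and a statistic $T_n$ satisfies
\[
\sqrt{n}\,(T_n - \mu_\theta) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \psi_\theta(X_i) + o_{P_\theta}(1),
\]
where $P_\theta \psi_\theta = 0$ and $P_\theta \psi_\theta \psi_\theta^T = \Sigma$. Then for $h_n$ satisfying $\sqrt{n}\, h_n \to h$, the sequence of log likelihood ratios satisfies
\[
\log \frac{dP^n_{\theta+h_n}}{dP^n_\theta}(X_1, \ldots, X_n)
= \frac{1}{\sqrt{n}} \sum_{i=1}^n h^T \dot\ell_\theta(X_i) - \frac{1}{2} h^T I_\theta h + o_{P_\theta}(1).
\]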

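Asymptotically linear statistics

Thus, the central limit theorem implies
\[
\left(\sqrt{n}\,(T_n - \mu_\theta),\ \log \frac{dP^n_{\theta+h_n}}{dP^n_\theta}\right)
\;\overset{\theta}{\rightsquigarrow}\;
N\left(\begin{pmatrix} 0 \\ -\frac{1}{2} h^T I_\theta h \end{pmatrix},\ \begin{pmatrix} \Sigma & \tau \\ \tau^T & h^T I_\theta h \end{pmatrix}\right),
\]
where $\tau = P_\theta \psi_\theta h^T \dot\ell_\theta$. Then
\[
\sqrt{n}\,(T_n - \mu_\theta) \;\overset{\theta+h_n}{\rightsquigarrow}\; N\left(P_\theta \psi_\theta h^T \dot\ell_\theta,\ \Sigma\right).
\]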
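As a worked instance of this result, consider the sample median in the $N(\theta, 1)$ location model. The example is an added illustration, not from the slides. It assumes the standard influence function of the median, $\psi_\theta(x) = \operatorname{sign}(x - \theta) / (2\phi(0))$ with $\phi$ the standard normal density, and the score $\dot\ell_\theta(x) = x - \theta$. These give $\Sigma = \pi/2$ and $\tau = h\, P_\theta \psi_\theta \dot\ell_\theta = h$. The sketch simulates data under $\theta + h/\sqrt{n}$ and compares the moments of $\sqrt{n}(T_n - \theta)$ with these predictions; sample sizes and the seed are arbitrary.

```python
import math
import random
import statistics


def scaled_median_shift(n=401, reps=2000, theta=0.0, h=1.0, seed=2):
    """Simulate sqrt(n) * (median - theta) for data drawn from N(theta + h / sqrt(n), 1)."""
    rng = random.Random(seed)
    local_theta = theta + h / math.sqrt(n)
    values = []
    for _ in range(reps):
        xs = [rng.gauss(local_theta, 1.0) for _ in range(n)]
        values.append(math.sqrt(n) * (statistics.median(xs) - theta))
    return values


h = 1.0
# tau = h * E[psi(X) * (X - theta)] = h * (sqrt(2 * pi) / 2) * E|Z|, with E|Z| = sqrt(2 / pi).
tau = h * (math.sqrt(2 * math.pi) / 2) * math.sqrt(2 / math.pi)
# Sigma = E[psi^2] = 1 / (4 * phi(0)^2) = pi / 2.
sigma2 = math.pi / 2

values = scaled_median_shift(h=h)
print("simulated mean:", statistics.fmean(values), " predicted tau:", tau)
print("simulated variance:", statistics.variance(values), " predicted Sigma:", sigma2)
```

Here $\tau = h$, as location equivariance of the median would suggest, while the limiting variance $\pi/2$ exceeds the value $1$ attained by the sample mean in the same model.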
