Theoretical Statistics. Lecture 19.

Peter Bartlett

1. Functional delta method. [vdv20]
2. Differentiability in normed spaces: Hadamard derivatives. [vdv20]
3. Quantile estimates. [vdv21]

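Recall: Delta method

Theorem: If $\phi : \mathbb{R}^k \to \mathbb{R}^m$ is differentiable at $\theta$, and $\sqrt{n}(T_n - \theta) \rightsquigarrow T$, then
$$\sqrt{n}(\phi(T_n) - \phi(\theta)) \rightsquigarrow \phi'_\theta(T),$$
$$\sqrt{n}(\phi(T_n) - \phi(\theta)) - \phi'_\theta(\sqrt{n}(T_n - \theta)) \xrightarrow{P} 0.$$
Here, $\phi'_\theta$ is the derivative (a linear map) satisfying $\phi(\theta + h) - \phi(\theta) = \phi'_\theta(h) + o(\|h\|)$ for $h \to 0$.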

Functional delta method

What about more complex cases? For instance, what if we are interested in a property $\phi(P)$ of the probability distribution $P$, and we use an estimator $\phi(P_n)$? As a first example, we'll get some intuition by considering a Taylor series expansion about the value $\phi(P)$.

But what does it mean for the Taylor series expansion of $\phi$ to exist? To make this rigorous, we need to consider $\phi$ as a map between normed linear spaces, and we need it to be appropriately differentiable. The right notion is Hadamard differentiability.

Given $X_1, \ldots, X_n$ from a distribution $P$, and a parameter of interest $\phi(P)$, suppose that we estimate $\phi(P)$ by $\phi(P_n)$, where $P_n$ is the empirical distribution. What is the asymptotic behavior of $\phi(P_n)$?

Define $\phi^{(k)}_P(H)$ as the $k$th derivative of the map $t \mapsto \phi(P + tH)$ at $t = 0$, where $H$ is a perturbation direction. Then if the derivatives exist we have a Taylor series expansion:
$$\phi(P + tH) - \phi(P) = t\,\phi'_P(H) + \frac{1}{2} t^2 \phi^{(2)}_P(H) + \cdots + \frac{1}{m!} t^m \phi^{(m)}_P(H) + o(t^m).$$

Substituting $t = 1/\sqrt{n}$ and $H = G_n$ $(= \sqrt{n}(P_n - P))$, so that
$$tH = \frac{1}{\sqrt{n}} G_n = P_n - P,$$
gives the von Mises expansion:
$$\phi(P_n) - \phi(P) = \frac{1}{\sqrt{n}} \phi'_P(G_n) + \frac{1}{2n} \phi^{(2)}_P(G_n) + \cdots + \frac{1}{m!\, n^{m/2}} \phi^{(m)}_P(G_n) + o(n^{-m/2}).$$
(This is not rigorous: it requires a stronger notion of differentiability, because the perturbation direction is now random: $H = G_n$.)

If $\phi'_P$ is a linear map, we have
$$\phi(P_n) - \phi(P) = \frac{1}{\sqrt{n}} \phi'_P(G_n) + o(n^{-1/2}) = \frac{1}{n} \sum_{i=1}^n \phi'_P(\delta_{X_i} - P) + o(n^{-1/2}),$$
where $\delta_{X_i}$ is the discrete distribution concentrated on $X_i$. So if $\phi'_P(\delta_X - P)$ has mean zero and finite variance, then $\sqrt{n}(\phi(P_n) - \phi(P))$ is asymptotically normal.

Aside: influence functions

The function $x \mapsto \phi'_P(\delta_x - P)$ is the influence function of $\phi$:
$$\phi'_P(\delta_x - P) = \left.\frac{d}{dt} \phi(P + t(\delta_x - P))\right|_{t=0} = \left.\frac{d}{dt} \phi((1 - t)P + t\delta_x)\right|_{t=0}.$$
This measures the impact of changing $P$ by mixing in a tiny amount of $\delta_x$. It is important in robust statistics.
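As an illustration that is not part of the lecture, the derivative defining the influence function can be approximated by a finite difference in $t$. The sketch below assumes the variance functional $\phi(P) = \mathrm{var}_P(X)$ on a small discrete distribution, for which the influence function is known to be $(x - \mu)^2 - \sigma^2$. The helper names, the atoms, and the step size `t` are all hypothetical choices.

```python
def variance_functional(points, weights):
    """phi(P): variance of the discrete distribution with the given atoms and weights."""
    mean = sum(w * x for x, w in zip(points, weights))
    return sum(w * (x - mean) ** 2 for x, w in zip(points, weights))


def influence_numeric(functional, points, weights, x, t=1e-6):
    """Finite-difference approximation of d/dt phi((1 - t) P + t delta_x) at t = 0."""
    mixed_points = list(points) + [x]
    mixed_weights = [(1 - t) * w for w in weights] + [t]
    base = functional(points, weights)
    return (functional(mixed_points, mixed_weights) - base) / t


# Hypothetical discrete distribution P (atoms and probabilities chosen for illustration).
points = [0.0, 1.0, 3.0]
weights = [0.5, 0.25, 0.25]

mu = sum(w * x for x, w in zip(points, weights))
sigma2 = variance_functional(points, weights)

x = 4.0
numeric = influence_numeric(variance_functional, points, weights, x)
closed_form = (x - mu) ** 2 - sigma2  # known influence function of the variance

print(numeric, closed_form)
```

The two printed values should agree up to an error of order `t`, since the mixture's variance is a quadratic polynomial in $t$.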

Example: Quantiles

Suppose we wish to estimate the $p$th quantile of $P$, where $p \in (0,1)$. We'll write it as $\phi(F)$ for the cumulative distribution function $F$ of $P$. If $F$ is continuous at the appropriate point, we can define $\phi(F) = F^{-1}(p)$. We need to calculate
$$\phi'_F(s_x - F) = \left.\frac{d}{dt} \phi((1 - t)F + t s_x)\right|_{t=0},$$
where $s_x$ is the cdf of $\delta_x$, i.e., the step function $a \mapsto 1[a \ge x]$.

Write $F_t = (1 - t)F + t s_x$, then differentiate both sides of the equation $p = F_t(\phi(F_t))$ at $t = 0$ (ignoring the non-differentiability at $\phi(F_t) = x$):
$$\begin{aligned}
0 &= \left.\frac{d}{dt} F_t(\phi(F_t))\right|_{t=0} \\
&= \left.\frac{d}{dt} \Big[(1 - t)F(\phi(F_t)) + t s_x(\phi(F_t))\Big]\right|_{t=0} \\
&= \Big[-F(\phi(F_t)) + (1 - t) f(\phi(F_t))\, \phi'_{F_t}(s_x - F) + s_x(\phi(F_t))\Big]_{t=0} \\
&= -F(\phi(F)) + f(\phi(F))\, \phi'_F(s_x - F) + s_x(\phi(F)),
\end{aligned}$$
where $f$ is the density of $P$.

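Rearranging, we have
$$\phi'_F(s_x - F) = \frac{F(\phi(F)) - s_x(\phi(F))}{f(\phi(F))} = \frac{p - s_x(F^{-1}(p))}{f(F^{-1}(p))} = \begin{cases} \dfrac{p - 1}{f(F^{-1}(p))} & \text{if } x \le F^{-1}(p), \\[2ex] \dfrac{p}{f(F^{-1}(p))} & \text{if } x > F^{-1}(p). \end{cases}$$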

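So
$$\mathbb{E}\,\phi'_F(s_X - F) = \frac{p(p - 1) + (1 - p)p}{f(F^{-1}(p))} = 0,$$
$$\mathrm{var}\,\phi'_F(s_X - F) = \frac{p(1 - p)^2 + (1 - p)p^2}{f(F^{-1}(p))^2} = \frac{p(1 - p)}{f(F^{-1}(p))^2}.$$
And we have
$$\sqrt{n}(\phi(F_n) - \phi(F)) \rightsquigarrow N\left(0, \frac{p(1 - p)}{f(F^{-1}(p))^2}\right).$$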
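A quick Monte Carlo check of this limit, offered as a sketch rather than as part of the lecture: it assumes $P = N(0,1)$ and $p = 1/2$, so the asymptotic variance is $p(1-p)/f(0)^2 = \pi/2$. The sample size, number of replications, and seed are arbitrary choices, and $n$ is taken odd so that the median is a single order statistic.

```python
import math
import random


def sample_median(sample):
    """Middle order statistic; for odd n this is the empirical quantile at p = 1/2."""
    ordered = sorted(sample)
    return ordered[len(ordered) // 2]


def scaled_median_variance(n, reps, seed=0):
    """Monte Carlo variance of sqrt(n) * (median - 0) over N(0, 1) samples of size n."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        values.append(math.sqrt(n) * sample_median(sample))
    mean = sum(values) / reps
    return sum((v - mean) ** 2 for v in values) / (reps - 1)


p = 0.5
density_at_median = 1.0 / math.sqrt(2.0 * math.pi)  # N(0, 1) density at its median 0
asymptotic_variance = p * (1 - p) / density_at_median ** 2  # equals pi / 2

simulated_variance = scaled_median_variance(n=401, reps=2000)
print(asymptotic_variance, simulated_variance)
```

The simulated value should land near $\pi/2 \approx 1.57$, with Monte Carlo error of a few percent at these settings.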

Differentiability of functions in normed spaces

How do we make this rigorous? Functional delta method.

Theorem: Suppose $\phi : D \to E$, where $D$ and $E$ are normed linear spaces. Suppose the statistic $T_n : \Omega_n \to D$ satisfies $\sqrt{n}(T_n - \theta) \rightsquigarrow T$ for a random element $T$ in $D_0 \subset D$. If $\phi$ is Hadamard differentiable at $\theta$ tangentially to $D_0$, then
$$\sqrt{n}(\phi(T_n) - \phi(\theta)) \rightsquigarrow \phi'_\theta(T).$$
If we can extend $\phi'_\theta : D_0 \to E$ to a continuous map $\phi'_\theta : D \to E$, then
$$\sqrt{n}(\phi(T_n) - \phi(\theta)) = \phi'_\theta(\sqrt{n}(T_n - \theta)) + o_P(1).$$

For the von Mises expansion, we considered
$$\phi'_P(H) = \left.\frac{d}{dt} \phi(P + tH)\right|_{t=0}$$
for some perturbation direction $H$. This is the Gateaux derivative.

Definition: $\phi : D \to E$ is Gateaux differentiable at $\theta \in D$ if $\forall h \in D$, $\exists \phi'_\theta(h) \in E$ s.t. as $t \to 0$,
$$\left\| \frac{\phi(\theta + th) - \phi(\theta)}{t} - \phi'_\theta(h) \right\| \to 0.$$
$\phi'_\theta$ might not be (i) linear, or (ii) continuous (even if it's linear, it might not be continuous if $D$, $E$ are infinite dimensional).

Definition: $\phi : D \to E$ is Hadamard differentiable at $\theta \in D$ if $\exists \phi'_\theta : D \to E$ (linear, continuous), $\forall h \in D$, if $t \to 0$ and $\|h_t - h\| \to 0$, then
$$\left\| \frac{\phi(\theta + t h_t) - \phi(\theta)}{t} - \phi'_\theta(h) \right\| \to 0.$$
Gateaux requires the difference quotients to converge to some $\phi'_\theta(h)$ for each direction $h$; Hadamard requires a single $\phi'_\theta$ that works for every direction $h$, including along perturbed directions $h_t \to h$. It is equivalent to the convergence in the definition of Gateaux differentiability being uniform over $h$ in each compact subset of $D$.
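The gap between the two definitions already shows up in $\mathbb{R}^2$. The sketch below uses a standard counterexample that is not taken from the lecture: $\phi(x, y) = x^3 y / (x^6 + y^2)$ with $\phi(0, 0) = 0$. At $\theta = 0$ its Gateaux derivative is the zero map (linear and continuous), yet along the perturbed directions $h_t = (1, t^2) \to h = (1, 0)$ the difference quotient equals $1/(2t)$ and blows up, so $\phi$ is not Hadamard differentiable there. The function names are illustrative.

```python
def phi(x, y):
    """Counterexample map on R^2; equals 1/2 along the curve y = x^3 (x != 0)."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x ** 3 * y / (x ** 6 + y ** 2)


def gateaux_quotient(h, t):
    """(phi(0 + t h) - phi(0)) / t for a fixed direction h."""
    return phi(t * h[0], t * h[1]) / t


def hadamard_quotient(t):
    """(phi(0 + t h_t) - phi(0)) / t with h_t = (1, t^2), which converges to h = (1, 0)."""
    h_t = (1.0, t ** 2)
    return phi(t * h_t[0], t * h_t[1]) / t


for t in (1e-1, 1e-2, 1e-3):
    print(
        t,
        gateaux_quotient((1.0, 0.0), t),  # exactly 0 along the limit direction
        gateaux_quotient((1.0, 1.0), t),  # tends to 0 like t
        hadamard_quotient(t),             # grows like 1 / (2 t)
    )
```

In finite dimensions Hadamard and Fréchet differentiability coincide, so this is equally an example of a Gateaux differentiable map that is not Fréchet differentiable.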

Definition: $\phi : D \to E$ is Fréchet differentiable at $\theta \in D$ if $\exists \phi'_\theta : D \to E$ (linear, continuous), $\forall h \in D$, if $\|h\| \to 0$, then
$$\frac{\|\phi(\theta + h) - \phi(\theta) - \phi'_\theta(h)\|}{\|h\|} \to 0.$$
Hadamard requires the difference quotients to converge to zero for each direction, possibly with different rates for different directions; Fréchet requires the same rate for each direction. They are equivalent for $D = \mathbb{R}^d$.

We'll consider Hadamard differentiability, but allow the weaker notion of tangential differentiability: the limiting directions $h$ can be constrained.

Definition: $\phi : D \to E$ is Hadamard differentiable at $\theta \in D$ tangentially to $D_0 \subseteq D$ if $\exists \phi'_\theta : D_0 \to E$ (linear, continuous), $\forall h \in D_0$, if $t \to 0$ and $\|h_t - h\| \to 0$, then
$$\left\| \frac{\phi(\theta + t h_t) - \phi(\theta)}{t} - \phi'_\theta(h) \right\| \to 0.$$

Theorem: Suppose $\phi : D \to E$, where $D$ and $E$ are normed linear spaces. Suppose the statistic $T_n : \Omega_n \to D$ satisfies $\sqrt{n}(T_n - \theta) \rightsquigarrow T$ for a random element $T$ in $D_0 \subset D$. If $\phi$ is Hadamard differentiable at $\theta$ tangentially to $D_0$, then
$$\sqrt{n}(\phi(T_n) - \phi(\theta)) \rightsquigarrow \phi'_\theta(T).$$
If we can extend $\phi'_\theta : D_0 \to E$ to a continuous map $\phi'_\theta : D \to E$, then
$$\sqrt{n}(\phi(T_n) - \phi(\theta)) = \phi'_\theta(\sqrt{n}(T_n - \theta)) + o_P(1).$$

Proof: Based on the continuous mapping theorem. Consider the maps
$$f_n(h) = \sqrt{n}\left(\phi\left(\theta + \frac{1}{\sqrt{n}} h\right) - \phi(\theta)\right),$$
so that $f_n(\sqrt{n}(T_n - \theta)) = \sqrt{n}(\phi(T_n) - \phi(\theta))$. Hadamard differentiability implies that for any sequence $h_n \to h \in D_0$, we have $f_n(h_n) \to \phi'_\theta(h)$. So
$$f_n(\sqrt{n}(T_n - \theta)) \rightsquigarrow \phi'_\theta(T).$$
(The second statement follows from the continuous mapping theorem for the function $h \mapsto (f_n(h), \phi'_\theta(h))$.)

The chain rule lets us determine Hadamard derivatives of a composition of maps.

Theorem: Suppose $\phi : D \to E$, $\psi : E \to F$, where $D$, $E$ and $F$ are normed linear spaces. If
1. $\phi$ is Hadamard differentiable at $\theta$ tangentially to $D_0$, and
2. $\psi$ is Hadamard differentiable at $\phi(\theta)$ tangentially to $\phi'_\theta(D_0)$,
then $\psi \circ \phi : D \to F$ is Hadamard differentiable at $\theta$ tangentially to $D_0$, with derivative $\psi'_{\phi(\theta)} \circ \phi'_\theta$.

Example: Quantiles

Definition: The quantile function of $F$ is $F^{-1} : (0,1) \to \mathbb{R}$,
$$F^{-1}(p) = \inf\{x : F(x) \ge p\}.$$
[PICTURE]

For $p \in (0,1)$ and $x \in \mathbb{R}$,
$$F^{-1}(p) \le x \iff p \le F(x),$$
which implies the quantile transformation: for $U$ uniform on $(0,1)$, $F^{-1}(U) \sim F$.

Also, $F(F^{-1}(p)) \ge p$, with equality unless $F$ has a discontinuity at $F^{-1}(p)$. This implies the probability integral transformation: for $X \sim F$, $F(X)$ is uniform on $[0,1]$ iff $F$ is continuous on $\mathbb{R}$, because $F(X) \overset{d}{=} F(F^{-1}(U)) = U$ in that case.

Finally, $F^{-1}(F(x)) \le x$, with equality unless $F$ is flat to the left of $x$. Thus, $F^{-1}$ is an inverse (i.e., $F^{-1}(F(x)) = x$ and $F(F^{-1}(p)) = p$ for all $x$ and $p$) iff $F$ is continuous and strictly increasing.
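These relations are easy to check exactly on a small discrete distribution, where $F$ has both jumps and flat stretches. The following sketch is illustrative only: the atoms, probabilities, and grids are hypothetical, and exact rational arithmetic is used so that comparisons such as $F(x) \ge p$ are not affected by rounding.

```python
from fractions import Fraction

# Hypothetical discrete distribution: atoms 0, 1, 2 with probabilities 1/4, 1/2, 1/4.
atoms = [0, 1, 2]
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]


def cdf(x):
    """F(x) = P(X <= x)."""
    return sum((w for a, w in zip(atoms, probs) if a <= x), Fraction(0))


def quantile(p):
    """F^{-1}(p) = inf{x : F(x) >= p}; for a discrete F the infimum is attained at an atom."""
    for a in atoms:  # atoms are listed in increasing order
        if cdf(a) >= p:
            return a
    raise ValueError("p must lie in (0, 1]")


p_grid = [Fraction(k, 20) for k in range(1, 20)]  # p in (0, 1)
x_grid = [Fraction(k, 4) for k in range(-4, 13)]  # x from -1 to 3

# F^{-1}(p) <= x  iff  p <= F(x)
equivalence_holds = all(
    (quantile(p) <= x) == (p <= cdf(x)) for p in p_grid for x in x_grid
)

# F(F^{-1}(p)) >= p, with equality only when p is a value taken by F
overshoot = {p: cdf(quantile(p)) - p for p in p_grid}

# F^{-1}(F(x)) <= x, strict here because F is flat just to the left of x = 1/2
x = Fraction(1, 2)
print(equivalence_holds, quantile(cdf(x)), x)
```

Here `overshoot` is zero only at $p = 1/4$ and $p = 3/4$, the values $F$ takes inside $(0,1)$, and `quantile(cdf(x))` returns the atom 0 for $x = 1/2$.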

Empirical quantile function

For a sample with distribution function $F$, define the empirical quantile function as the quantile function $F_n^{-1}$ of the empirical distribution function $F_n$:
$$F_n^{-1}(p) = \inf\{x : F_n(x) \ge p\} = X_{n(i)},$$
where $i$ is chosen such that
$$\frac{i - 1}{n} < p \le \frac{i}{n},$$
and $X_{n(1)}, \ldots, X_{n(n)}$ are the order statistics of the sample, that is, $X_{n(1)} \le \cdots \le X_{n(n)}$ and $(X_{n(1)}, \ldots, X_{n(n)})$ is a permutation of the sample $(X_1, \ldots, X_n)$.
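The index condition $(i - 1)/n < p \le i/n$ is the same as $i = \lceil np \rceil$, which gives a direct way to compute $F_n^{-1}(p)$. The sketch below (function names and sample values are hypothetical) compares that order-statistic formula against a brute-force evaluation of $\inf\{x : F_n(x) \ge p\}$. It uses `Fraction` for $p$ and for $F_n$ because floating-point rounding in $np$ or in $F_n(x) \ge p$ can shift the index by one when $np$ is an integer.

```python
import math
from fractions import Fraction


def empirical_cdf(sample, x):
    """F_n(x): fraction of sample points that are <= x, as an exact rational."""
    return Fraction(sum(1 for v in sample if v <= x), len(sample))


def empirical_quantile(sample, p):
    """F_n^{-1}(p) = X_{n(i)} with i = ceil(n p), i.e. (i - 1)/n < p <= i/n."""
    if not 0 < p < 1:
        raise ValueError("p must lie in (0, 1)")
    ordered = sorted(sample)
    i = math.ceil(len(ordered) * p)
    return ordered[i - 1]  # order statistics are 1-indexed in the text


def empirical_quantile_by_definition(sample, p):
    """inf{x : F_n(x) >= p}; the infimum is attained at a sample point."""
    return min(v for v in sample if empirical_cdf(sample, v) >= p)


sample = [2.3, -1.0, 0.7, 5.1, 0.7]  # illustrative data containing a tie
for p in (Fraction(1, 5), Fraction(1, 2), Fraction(3, 5), Fraction(4, 5), Fraction(9, 10)):
    print(p, empirical_quantile(sample, p), empirical_quantile_by_definition(sample, p))
```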
