Lecture 16: Sample quantiles and their asymptotic properties Estimation of quantiles (percentiles Suppose that X 1,...,X n are i.i.d. random variables from an unknown nonparametric F For p (0,1, G 1 (p = inf{x : G(x p} is the pth quantile for any c.d.f. G on R. Quantiles of F are often the parameters of interest. θ p = F 1 (p = pth quantile of F F n = empirical c.d.f. based on X 1,...,X n θ p = Fn 1 (p = the pth sample quantile. θ p = c np X (mp + (1 c np X (mp +1, where X (j is the jth order statistic, m p is the integer part of np, c np = 1 if np is an integer, and c np = 0 if np is not an integer. Thus, θ p is a linear function of order statistics. UW-Madison (Statistics Stat 710, Lecture 16 Jan 2018 1 / 10
F(θ p = lim x θp,x<θ p F(x F(θ p = lim x θp,x>θ p F(x F(θ p p F(θ p F is not flat in a neighborhood of θ p if and only if p < F(θ p + ε for any ε > 0. Theorem 5.9 Let X 1,...,X n be i.i.d. random variables from a c.d.f. F satisfying p < F(θ p + ε for any ε > 0. Then, for every ε > 0 and n = 1,2,..., P ( θ p θ p > ε 2Ce 2nδ 2 ε, where δ ε is the smaller of F(θ p + ε p and p F(θ p ε and C is the same constant in Lemma 5.1(i. Remarks Theorem 5.9 implies that θ p is strongly consistent for θ p (exercise Theorem 5.9 implies that θ p is n-consistent for θ p if F (θ p and F (θ p + (the left and right derivatives of F at θ p exist (exercise. UW-Madison (Statistics Stat 710, Lecture 16 Jan 2018 2 / 10
F(θ p = lim x θp,x<θ p F(x F(θ p = lim x θp,x>θ p F(x F(θ p p F(θ p F is not flat in a neighborhood of θ p if and only if p < F(θ p + ε for any ε > 0. Theorem 5.9 Let X 1,...,X n be i.i.d. random variables from a c.d.f. F satisfying p < F(θ p + ε for any ε > 0. Then, for every ε > 0 and n = 1,2,..., P ( θ p θ p > ε 2Ce 2nδ 2 ε, where δ ε is the smaller of F(θ p + ε p and p F(θ p ε and C is the same constant in Lemma 5.1(i. Remarks Theorem 5.9 implies that θ p is strongly consistent for θ p (exercise Theorem 5.9 implies that θ p is n-consistent for θ p if F (θ p and F (θ p + (the left and right derivatives of F at θ p exist (exercise. UW-Madison (Statistics Stat 710, Lecture 16 Jan 2018 2 / 10
F(θ p = lim x θp,x<θ p F(x F(θ p = lim x θp,x>θ p F(x F(θ p p F(θ p F is not flat in a neighborhood of θ p if and only if p < F(θ p + ε for any ε > 0. Theorem 5.9 Let X 1,...,X n be i.i.d. random variables from a c.d.f. F satisfying p < F(θ p + ε for any ε > 0. Then, for every ε > 0 and n = 1,2,..., P ( θ p θ p > ε 2Ce 2nδ 2 ε, where δ ε is the smaller of F(θ p + ε p and p F(θ p ε and C is the same constant in Lemma 5.1(i. Remarks Theorem 5.9 implies that θ p is strongly consistent for θ p (exercise Theorem 5.9 implies that θ p is n-consistent for θ p if F (θ p and F (θ p + (the left and right derivatives of F at θ p exist (exercise. UW-Madison (Statistics Stat 710, Lecture 16 Jan 2018 2 / 10
Proof of Theorem 5.9 Let ε > 0 be fixed. Note that, for any c.d.f. G on R, (exercise. Hence G(x t if and only if x G 1 (t P ( θp > θ p + ε = P ( p > F n (θ p + ε = P ( F(θ p + ε F n (θ p + ε > F(θ p + ε p P ( ρ (F n,f > δ ε Ce 2nδ 2 ε, where the last inequality follows from DKW s inequality (Lemma 5.1(i. Similarly, P ( θp < θ p ε Ce 2nδ 2 ε. This completes the proof. UW-Madison (Statistics Stat 710, Lecture 16 Jan 2018 3 / 10
The distribution of a sample quantile The exact distribution of θ p can be obtained as follows. Since nf n (t has the binomial distribution Bi(F(t,n for any t R, P ( θp t = P ( F n (t p ( n = i n i=l p [F(t] i [1 F(t] n i, where l p = np if np is an integer and l p = 1+ the integer part of np if np is not an integer. If F has a Lebesgue p.d.f. f, then θ p has the Lebesgue p.d.f. ( n 1 ϕ n (t = n [F(t] lp 1 [1 F(t] n l p f (t. l p 1 This can be shown by differentiating P ( F n (t p term by term, which leads to UW-Madison (Statistics Stat 710, Lecture 16 Jan 2018 4 / 10
ϕ n (t = = n i=l p ( n ( n i n i=l p l p +n i[f(t] i 1 [1 F(t] n i f (t ( n i (n i[f(t] i [1 F(t] n i 1 f (t l p [F(t] l p 1 [1 F(t] n l p f (t n i=l p +1 ( n 1 i 1 [F(t] i 1 [1 F(t] n i f (t n 1( n 1 n [F(t] i [1 F(t] n i 1 f (t i=l p i ( n 1 = n [F(t] lp 1 [1 F(t] n l p f (t. l p 1 The following result provides an asymptotic distribution for n( θ p θ p. UW-Madison (Statistics Stat 710, Lecture 16 Jan 2018 5 / 10
Theorem 5.10 Let X 1,...,X n be i.i.d. random variables from F. (i If F(θ p = p, then P( n( θ p θ p 0 Φ(0 = 1 2, where Φ is the c.d.f. of the standard normal. (ii If F is continuous at θ p and there exists F (θ p > 0, then P ( n( θ p θ p t Φ(t/σ F, t < 0, where σ F = p(1 p/f (θ p. (iii If F is continuous at θ p and there exists F (θ p + > 0, then P ( n( θ p θ p t Φ(t/σ + F, t > 0, where σ + F = p(1 p/f (θ p +. (iv If F (θ p exists and is positive, then n( θp θ p d N(0,σ 2 F, where σ F = p(1 p/f (θ p. UW-Madison (Statistics Stat 710, Lecture 16 Jan 2018 6 / 10
Proof The proof of (i is left as an exercise. Part (iv is a direct consequence of (i-(iii and the proofs of (ii and (iii are similar. Thus, we only give a proof for (iii. Let t > 0, p nt = F(θ p + tσ + F n 1/2, c nt = n(p nt p/ p nt (1 p nt, and Z nt = [B n (p nt np nt ]/ np nt (1 p nt, where B n (q denotes a random variable having the binomial distribution Bi(q, n. Then P ( θp θ p + tσ + F n 1/2 = P ( p F n (θ p + tσ + F n 1/2 = P ( Z nt c nt. Under the assumed conditions on F, p nt p and c nt t. Hence, the result follows from P ( Z nt < c nt Φ( cnt 0. But this follows from the CLT (Example 1.33 and Pólya s theorem (Proposition 1.16. UW-Madison (Statistics Stat 710, Lecture 16 Jan 2018 7 / 10
If F (θ p and F (θ p + exist and are positive, but F (θ p F (θ p +, then the asymptotic distribution of n( θ p θ p has the c.d.f. Φ(t/σ F I (,0(t + Φ(t/σ + F I [0, (t, a mixture of two normal distributions. An example of such a case when p = 1/2 is F(x = xi [0, 1 2 (x + (2x 1 2 I [ 1 2, 3 4 (x + I [ 3 4, (x. Bahadur s representation When F (θ p = F (θ p + = F (θ p > 0, Theorem 5.9 shows that the asymptotic distribution of n( θ p θ p is the same as that of n[fn (θ p F(θ p ]/F (θ p. The next result reveals a stronger relationship between sample quantiles and the empirical c.d.f. Theorem 5.11 (Bahadur s representation Let X 1,...,X n be i.i.d. random variables from F. If F (θ p exists and is positive, then n( θp θ p = n[f n (θ p F(θ p ]/F (θ p + o p (1. UW-Madison (Statistics Stat 710, Lecture 16 Jan 2018 8 / 10
If F (θ p and F (θ p + exist and are positive, but F (θ p F (θ p +, then the asymptotic distribution of n( θ p θ p has the c.d.f. Φ(t/σ F I (,0(t + Φ(t/σ + F I [0, (t, a mixture of two normal distributions. An example of such a case when p = 1/2 is F(x = xi [0, 1 2 (x + (2x 1 2 I [ 1 2, 3 4 (x + I [ 3 4, (x. Bahadur s representation When F (θ p = F (θ p + = F (θ p > 0, Theorem 5.9 shows that the asymptotic distribution of n( θ p θ p is the same as that of n[fn (θ p F(θ p ]/F (θ p. The next result reveals a stronger relationship between sample quantiles and the empirical c.d.f. Theorem 5.11 (Bahadur s representation Let X 1,...,X n be i.i.d. random variables from F. If F (θ p exists and is positive, then n( θp θ p = n[f n (θ p F(θ p ]/F (θ p + o p (1. UW-Madison (Statistics Stat 710, Lecture 16 Jan 2018 8 / 10
If F (θ p and F (θ p + exist and are positive, but F (θ p F (θ p +, then the asymptotic distribution of n( θ p θ p has the c.d.f. Φ(t/σ F I (,0(t + Φ(t/σ + F I [0, (t, a mixture of two normal distributions. An example of such a case when p = 1/2 is F(x = xi [0, 1 2 (x + (2x 1 2 I [ 1 2, 3 4 (x + I [ 3 4, (x. Bahadur s representation When F (θ p = F (θ p + = F (θ p > 0, Theorem 5.9 shows that the asymptotic distribution of n( θ p θ p is the same as that of n[fn (θ p F(θ p ]/F (θ p. The next result reveals a stronger relationship between sample quantiles and the empirical c.d.f. Theorem 5.11 (Bahadur s representation Let X 1,...,X n be i.i.d. random variables from F. If F (θ p exists and is positive, then n( θp θ p = n[f n (θ p F(θ p ]/F (θ p + o p (1. UW-Madison (Statistics Stat 710, Lecture 16 Jan 2018 8 / 10
Proof Let t R, θ nt = θ p + tn 1/2, Z n (t = n[f(θ nt F n (θ nt ]/F (θ p, and U n (t = n[f(θ nt F n ( θ p ]/F (θ p. It can be shown (exercise that Since F n ( θ p n 1, Z n (t Z n (0 = o p (1. U n (t = n[f(θ nt p + p F n ( θ p ]/F (θ p = n[f(θ nt p]/f (θ p + O(n 1/2 t. Let ξ n = n( θ p θ p. Then, for any t R and ε > 0, P ( ξ n t,z n (0 t + ε = P ( Z n (t U n (t,z n (0 t + ε P ( Z n (t Z n (0 ε/2 +P ( U n (t t ε/2 0 UW-Madison (Statistics Stat 710, Lecture 16 Jan 2018 9 / 10
Proof (continued Similarly, P ( ξ n t + ε,z n (0 t 0. It follows from the result in Exercise 128 of 1.6 that which is what we need to prove. Corollary 5.1 ξ n Z n (0 = o p (1, Let X 1,...,X n be i.i.d. random variables from F having positive derivatives at θ pj, where 0 < p 1 < < p m < 1 are fixed constants. Then n[( θp1,..., θ pm (θ p1,...,θ pm ] d N m (0,D, where D is the m m symmetric matrix whose (i,jth element is p i (1 p j /[F (θ pi F (θ pj ], i j. UW-Madison (Statistics Stat 710, Lecture 16 Jan 2018 10 / 10
Proof (continued Similarly, P ( ξ n t + ε,z n (0 t 0. It follows from the result in Exercise 128 of 1.6 that which is what we need to prove. Corollary 5.1 ξ n Z n (0 = o p (1, Let X 1,...,X n be i.i.d. random variables from F having positive derivatives at θ pj, where 0 < p 1 < < p m < 1 are fixed constants. Then n[( θp1,..., θ pm (θ p1,...,θ pm ] d N m (0,D, where D is the m m symmetric matrix whose (i,jth element is p i (1 p j /[F (θ pi F (θ pj ], i j. UW-Madison (Statistics Stat 710, Lecture 16 Jan 2018 10 / 10