Non-parametric Inference and Resampling



Exercises by David Wozabal (Last update: 3 June 2013)

1 Basic Facts about Rank and Order Statistics

1.1 10 students were asked about the amount of time they spend surfing the Internet per month. The data table has the columns "Student $i$" and "Amount of time $x_i$ (in hours)"; its numerical entries are not preserved in this copy.
(a) What are the corresponding values of the order statistics?
(b) Determine the rank statistics $(r_1, \dots, r_{10})$.

1.2 Suppose that we have a sample of size $n$ in the setting of Exercise 1.1. In terms of the order statistics, which statistic would you use to study the following:
(a) the least amount of time spent
(b) the highest amount of time spent
(c) the median of the amount of time spent. Hint: distinguish between the cases where $n$ is odd and where $n$ is even.
(d) the range of the amount of time spent
(e) Determine the values of the statistics discussed in (a)-(d) based on the data in Exercise 1.1.

1.3 The empirical distribution function $\hat F_n$ corresponding to the sample $x_1, \dots, x_n$ is given by
$$\hat F_n(t) = \frac{1}{n} \cdot (\text{number of samples not exceeding } t).$$
(a) Calculate (and draw the graph of) the empirical distribution function for Exercise 1.1.
(b) Can you express the empirical distribution function in terms of the order statistics?

1.4 Prove that for a random sample $X = (X_1, \dots, X_n)$
$$\sum_{i=1}^n R_i = \frac{n(n+1)}{2},$$
where $R_i$ are the rank statistics of $X$.

1.5 Prove that for a density $f$ that is invariant with respect to permutations, i.e.,
$$f(x_1, \dots, x_n) = f(x_{s(1)}, \dots, x_{s(n)}), \quad \forall s \in S_n,$$
and for $X \sim f$ the following holds, where $f_{X_{()}}$ denotes the joint density of the order statistics:
(a) $P(R = r) = (n!)^{-1}$ for $R$ the vector of ranks.
(b) $f_{X_{()}}(x_{(1)}, \dots, x_{(n)}) = n!\, f(x_{(1)}, \dots, x_{(n)})$
(c) $R$ is independent of the vector of order statistics $X_{()}$.

Hint: Use that
$$f_{X_{()}}(x_{(1)}, \dots, x_{(n)}) = \sum_{s \in S_n} f(x_{(s(1))}, \dots, x_{(s(n))})$$
and that
$$P(R = r \mid X_{()} = x_{()}) = \frac{f(x_{(r_1)}, \dots, x_{(r_n)})}{f_{X_{()}}(x_{(1)}, \dots, x_{(n)})},$$
where $f_{X_{()}}$ is the joint density of the order statistics.

1.6 Let $X = (X_1, \dots, X_n)$ be a random vector with a continuous density, such that $R$ is uniformly distributed on $S_n$. Show that
(a) $P(R_i = k) = \frac{1}{n}$, $1 \le i, k \le n$
(b) $P(R_i = k, R_j = h) = \frac{1}{n(n-1)}$, $i \ne j$, $k \ne h$
(c) $P(R_i = R_j) = 0$, $1 \le i \ne j \le n$.

1.7 Let $X_1, \dots, X_n$ be i.i.d. random variables with distribution function $F$ and density $f$. Show that
$$f_{X_{(1)}, X_{(n)}}(x, y) = n(n-1)\,(F(y) - F(x))^{n-2} f(x) f(y), \quad -\infty < x \le y < \infty.$$
Hint: Use the fact that
$$P(X_{(1)} \le x, X_{(n)} \le y) = P(X_{(n)} \le y) - P(X_{(1)} > x, X_{(n)} \le y)$$
to obtain the joint distribution of $(X_{(1)}, X_{(n)})$.

1.8 Show that the density of the midrange $V_n = (X_{(1)} + X_{(n)})/2$ for an i.i.d. sample $X_1, \dots, X_n$ from an absolutely continuous distribution $F$ with density $f$ is given by
$$f_{V_n}(v) = 2n(n-1) \int_{-\infty}^{v} \{F(2v - x_1) - F(x_1)\}^{n-2} f(x_1) f(2v - x_1)\, dx_1, \quad -\infty < v < \infty.$$
Hint: Use the joint density of $(X_{(1)}, X_{(n)})$ from 1.7 to calculate the joint density of $(X_{(1)}, V_n)$ by finding a transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ such that $T(X_{(1)}, X_{(n)}) = (X_{(1)}, V_n)$, and by noting that the change of variable formula gives
$$P(T(X) \in A) = P(X \in T^{-1}(A)) = \int_{x \in T^{-1}(A)} f_X(x)\, dx = \int_{x \in A} f_X(T^{-1}(x))\, |J_{T^{-1}}(x)|\, dx,$$
where $J_{T^{-1}}$ is the determinant of the Jacobian of $T^{-1}$. Now integrate out the dependence on $x_{(1)}$.

1.9 Calculate the explicit form of the density of the midrange $V_n$ when $X_i \sim F = U(0, 1)$ for all $1 \le i \le n$.

1.10 For an i.i.d. sample $X_1, \dots, X_n$, calculate the density of the order statistic $X_{(i)}$ for $1 \le i \le n$.

1.11 Using Exercise 1.10, calculate the density of $U_{(i)}$ for $U_i$ i.i.d. with $U_i \sim U(0, 1)$ for all $1 \le i \le n$.

1.12 For the $U_1, \dots, U_n$ defined in Exercise 1.11 prove that
$$E(U_{(i)}) = \frac{i}{n+1}.$$
Hint: Use the following result (Beta function)
$$B(x, y) = \int_0^1 t^{x-1} (1-t)^{y-1}\, dt = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)},$$
where $\Gamma$ is the Gamma function.
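The Beta-function hint of Exercise 1.12 can be checked numerically before attempting the proof. The following is a minimal sketch, not part of the original exercise sheet. It assumes that $U_{(i)}$ has the density $t^{i-1}(1-t)^{n-i}/B(i, n-i+1)$ on $(0, 1)$, which is the result Exercise 1.11 is expected to produce. The function names `beta` and `mean_uniform_order_stat` are hypothetical.

```python
import math


def beta(x: float, y: float) -> float:
    """Beta function computed from its Gamma-function representation."""
    return math.gamma(x) * math.gamma(y) / math.gamma(x + y)


def mean_uniform_order_stat(i: int, n: int) -> float:
    """E(U_(i)) written as a ratio of Beta functions.

    Assumption: U_(i) has density t^(i-1) (1-t)^(n-i) / B(i, n-i+1) on (0, 1),
    so that E(U_(i)) = B(i+1, n-i+1) / B(i, n-i+1).
    """
    return beta(i + 1, n - i + 1) / beta(i, n - i + 1)


if __name__ == "__main__":
    n = 10
    for i in range(1, n + 1):
        # The two printed numbers should agree up to rounding error.
        print(i, mean_uniform_order_stat(i, n), i / (n + 1))
```

The agreement of the two columns is only a sanity check; the exercise still asks for the algebraic argument via $\Gamma(x+1) = x\Gamma(x)$.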

1.13 For the $U_1, \dots, U_n$ defined in Exercise 1.11 prove that
$$\mathrm{Var}(U_{(i)}) = \frac{i(n - i + 1)}{(n+1)^2 (n+2)}.$$
Hint: Again use the Beta function and its relation to the Gamma function.

1.14 Show that in the i.i.d. case the joint density of two arbitrary order statistics $(X_{(i)}, X_{(j)})$ with $1 \le i < j \le n$ is given by (for $-\infty < x_i < x_j < \infty$)
$$f_{X_{(i)}, X_{(j)}}(x_i, x_j) = \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!}\, F(x_i)^{i-1} (F(x_j) - F(x_i))^{j-i-1} (1 - F(x_j))^{n-j} f(x_i) f(x_j).$$
Hint: Use the formula for the joint density $f_{X_{()}}$ of all the order statistics (why does this hold?)
$$f_{X_{()}}(x_{(1)}, \dots, x_{(n)}) = \sum_{r \in S_n} f(x_{(r_1)}, \dots, x_{(r_n)}) = n!\, f(x_{(1)}, \dots, x_{(n)})$$
and integrate out the orders $k$ with $k \ne i$, $k \ne j$.

1.15 Let $U_1, \dots, U_n$ be an i.i.d. sample from the standard uniform distribution $U(0, 1)$. Use 1.14 and 1.10 to show that the random variables $U_{(i)}/U_{(j)}$ and $U_{(j)}$, for $1 \le i < j \le n$, are statistically independent and calculate the respective distribution functions.

1.16 Prove that
$$\mathrm{cov}(\hat F_n(x), \hat F_n(y)) = c(F_X(x), F_X(y))/n,$$
where $c(s, t) = \min(s, t) - st$ and $\hat F_n$ is the empirical distribution function of a random sample of size $n$ from the population $F_X$.
Hint: Write $\hat F_n(x) = n^{-1} \sum_{i=1}^n 1_{\{t : X_i \le t\}}(x)$.

1.17 Let $\hat F_n$ be the empirical cdf of a random sample of size $n$ from a continuous cdf $F_X$. Show that for $-\infty < x < y < \infty$,
$$\mathrm{cov}(\hat F_n(x), \hat F_n(y)) = \frac{F_X(x)(1 - F_X(y))}{n}.$$
Hint: Use the result from the last exercise.

1.18 Show that for an i.i.d. sample $(X_1, \dots, X_n)$ from a continuous distribution with ranks $(R_1, \dots, R_n)$ and $i \ne j$
(a) $\mathrm{Cov}(R_i, R_j) = -\frac{n+1}{12}$
(b) $\mathrm{Corr}(R_i, R_j) = -\frac{1}{n-1}$

2 Non-parametric tests

2.1 Use the sign test and the given i.i.d. sample of 10 observations (the sample values are not preserved in this copy) to test the null hypothesis that the median $\varphi$ of the underlying distribution equals 0, against $H_1 : \varphi \ne 0$ and $H_1 : \varphi > 0$ (at the level $\alpha = 0.05$).

2.2 Repeat the test in 2.1 with the Wilcoxon test.

2.3 Let $X_1, \dots, X_{48}$ be a random sample from a distribution with distribution function $F$. To test $H_0 : F(41) = \frac{1}{4}$ against $H_1 : F(41) < \frac{1}{4}$, use the statistic $Y$, which is the number of samples less than or equal to 41. If the observed value of $Y$ is smaller than 7, reject $H_0$ and accept $H_1$. If $p = F(41)$, find the power function $K(p)$, $0 < p < \frac{1}{4}$, of the test. Approximate $\alpha = K(\frac{1}{4})$.
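For Exercise 2.3, the power function can be tabulated directly once the distribution of $Y$ is identified. The sketch below is an illustration, not part of the original sheet. It assumes $Y \sim \mathrm{Binomial}(48, p)$ and reads "smaller than 7" as the rejection region $\{Y \le 6\}$. The function name `power` and its parameters are hypothetical.

```python
from math import comb


def power(p: float, n: int = 48, cutoff: int = 6) -> float:
    """K(p) = P(Y <= cutoff) for Y ~ Binomial(n, p).

    Assumption: H_0 is rejected when Y < 7, i.e. when Y <= 6.
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(cutoff + 1))


if __name__ == "__main__":
    for p in (0.05, 0.10, 0.15, 0.20, 0.25):
        print(f"p = {p:.2f}   K(p) = {power(p):.4f}")
```

The value at $p = \frac{1}{4}$ is the exact size of the test under this reading and can be compared with the normal approximation the exercise asks for.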

2.4 A statistician decides that working with numbers is a tedious job and designs a universal test, which rejects the null hypothesis in favor of the alternative hypothesis if a coin toss (whose outcome is independent of the testing problem in question) yields heads. What is the power of this test?

2.5 Calculate the expectation and the variance of the test statistic of the sign test under the null hypothesis of the test.

2.6 Calculate the expectation and the variance of the test statistic of the Wilcoxon test under the null hypothesis of the test.
Hint: The Wilcoxon test statistic for testing the null hypothesis that the median of $X$ equals $M$ is given by
$$W = \sum_{i=1}^n Z_i\, \mathrm{rank}(|D_i|),$$
where $D_i = X_i - M$ and $Z_i = \mathrm{sign}(D_i)$. Show that $W$ has the same distribution as $\sum_{i=1}^n V_i$, where the $V_i$ are independent random variables with $P(V_i = i) = P(V_i = -i) = \frac{1}{2}$. Use this representation to calculate the mean and the variance.

2.7 (Gibbons & Chakraborti, 2003) A random sample of 10 observations is drawn from a normal population with mean $\mu$ and variance 1. Instead of a normal-theory test, the ordinary sign test is used for $H_0 : \mu = 0$, $H_1 : \mu > 0$, and the null hypothesis is rejected if the test statistic $K \ge 8$.
(a) Plot the power curve using the exact distribution of $K$.
(b) Plot the power curve using the normal approximation to the distribution of $K$.

2.8 (Gibbons & Chakraborti, 2003) Hoskin et al. (1986) investigated the change in fatal motor-vehicle accidents after the legal minimum drinking age was raised in 10 states. Their data were the ratios of the number of single-vehicle nighttime fatalities to the number of licensed drivers in the affected age group before and after the laws were changed to raise the drinking age, shown in Table 1. The researchers hypothesized that raising the minimum drinking age resulted in a reduced median fatality ratio. Investigate this hypothesis.

Table 1. Columns: State, Affected Ages, Ratio before, Ratio after. Rows: Florida, Georgia, Illinois, Iowa, Maine, Michigan, Montana, Nebraska, New Hampshire, Tennessee. The numerical entries are not preserved in this copy.

2.9 Show that for $X$ symmetric around 0, $|X|$ and $Z = 1_{\{1\}}(\mathrm{sign}(X))$ are independent.

2.10 Check whether the Wilcoxon test statistic $W = \sum_{i=1}^n Z_i\, \mathrm{rank}(|D_i|)$ is asymptotically normally distributed. Use the following theorem:

Theorem 2.1. Let $U_1, \dots, U_n$ be an independent sample, where $E(U_i) = \mu_i$ and $\mathrm{Var}(U_i) = \sigma_i^2$. If the Liapounov condition
$$\lim_{n \to \infty} \frac{\sum_{i=1}^n E(|U_i - \mu_i|^3)}{\left(\sum_{i=1}^n \sigma_i^2\right)^{3/2}} = 0$$
holds, then
$$\frac{\sum_{i=1}^n U_i - \sum_{i=1}^n \mu_i}{\sqrt{\sum_{i=1}^n \sigma_i^2}} \to N(0, 1)$$
in distribution.

2.11 Write a computer program that calculates the exact distribution of the test statistic of the Wilcoxon test.

2.12 Rank-Sum-Test: Let $(X_1, \dots, X_m)$ and $(Y_1, \dots, Y_n)$ be two samples such that
$$Z = (Z_1, \dots, Z_k) = (X_1, \dots, X_m, Y_1, \dots, Y_n)$$
with $k = n + m$ is an i.i.d. sample from a continuous distribution and $R = (R_1, \dots, R_k)$ the corresponding ranks. Define $T = \sum_{i=1}^m R_i$. Show that
(a) $E(T) = \frac{m(k+1)}{2}$
(b) $\mathrm{Var}(T) = \frac{m(k-m)(k+1)}{12}$
Hint: Use the covariance of the ranks from the exercises in Section 1.

3 Density Estimation

In this section $K$ will always be a symmetric density and $K_h(x) = h^{-1} K\left(\frac{x}{h}\right)$. A kernel density estimator $\hat f_h(x; X_1, \dots, X_n)$ based on the random sample $X_1, \dots, X_n$ following density $f$ is given by
$$\hat f_h(x; X_1, \dots, X_n) = \frac{1}{n} \sum_{i=1}^n K_h(x - X_i).$$
Often the dependence on the random variables $X_1, \dots, X_n$ is suppressed in the notation and we write simply $\hat f_h(x; X_1, \dots, X_n) = \hat f(x, h)$.

3.1 Show that $\hat f_h(x; X_1, \dots, X_n)$ is a density.

3.2 The Mean Squared Error of an estimator $\hat\theta$ for $\theta$ is given by
$$\mathrm{MSE}(\hat\theta) = E\left[(\hat\theta - \theta)^2\right].$$
Show that
$$\mathrm{MSE}(\hat\theta) = \mathrm{Var}(\hat\theta) + \mathrm{Bias}(\hat\theta)^2$$
with $\mathrm{Bias}(\hat\theta) = E(\hat\theta) - \theta$.

3.3 Let $(f * g)(x) = \int f(x - y) g(y)\, dy$ be the convolution of $f$ and $g$. Show that

(a) $E(\hat f(x, h)) = \int K_h(x - y) f(y)\, dy = (K_h * f)(x)$
(b) $\mathrm{Var}(\hat f(x, h)) = n^{-1} \left( (K_h^2 * f)(x) - (K_h * f)^2(x) \right)$.

3.4 Show that the MSE of $\hat f(x, h)$ is
$$\mathrm{MSE}(\hat f(x, h)) = n^{-1} \left( (K_h^2 * f)(x) - (K_h * f)^2(x) \right) + \left( (K_h * f)(x) - f(x) \right)^2.$$

3.5 Let $f$ be a continuous, bounded function. Show that
$$\lim_{h \to 0} \int f(x) K_h(x - \hat x)\, dx = f(\hat x).$$

3.6 The Integrated Squared Error of $\hat f(x; X_1, \dots, X_n)$ is given as
$$\mathrm{ISE}(\hat f(x, h)) = \int (\hat f(x, h) - f(x))^2\, dx,$$
while the Mean Integrated Squared Error is given by
$$\mathrm{MISE}(\hat f(x, h)) = E\left[ \int (\hat f(x, h) - f(x))^2\, dx \right] = \frac{1}{n} \int \left[ (K_h^2 * f)(x) - (K_h * f)^2(x) \right] dx + \int \left[ (K_h * f)(x) - f(x) \right]^2 dx,$$
where $(f * g)(x) = \int f(y) g(x - y)\, dy = \int f(x - y) g(y)\, dy$ is the convolution. Using the above, show that
$$\mathrm{MISE}(\hat f(x, h)) = \frac{1}{nh} \int K^2(x)\, dx + \left(1 - \frac{1}{n}\right) \int (K_h * f)^2(x)\, dx - 2 \int (K_h * f)(x) f(x)\, dx + \int f(x)^2\, dx.$$

An asymptotic analysis of the MISE yields
$$\mathrm{MISE}(\hat f(x, h)) = \mathrm{AMISE}(\hat f(x, h)) + o\left( (nh)^{-1} + h^4 \right)$$
with
$$\mathrm{AMISE}(\hat f(x, h)) = (nh)^{-1} \int K(x)^2\, dx + \frac{1}{4} h^4 \left( \int x^2 K(x)\, dx \right)^2 \int f''(x)^2\, dx.$$
The first term in the above (asymptotically) corresponds to the variance of $\hat f(x, h)$, while the second term captures the (asymptotic) squared bias of $\hat f(x, h)$, showing a trade-off in the parameter $h$ similar to the general formula in 3.2.

3.7 Demonstrate the variance-bias trade-off discussed above by sampling from the logistic distribution with
$$F(x) = \frac{1}{1 + e^{-x}}.$$
Sample from the distribution, fit a kernel density estimator with $K$ being the density of the standard normal distribution, and vary the values of $h$. Plot the results against $f$ and complement this visual analysis by calculating the two components of the AMISE for each run.

3.8 Use the formula for the AMISE above to calculate the (asymptotically) optimal bandwidth $h^*$ (i.e. the value of $h$ for which the AMISE is minimal) and use $h^*$ to obtain the asymptotic rate of convergence of the kernel density estimate.
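One possible starting point for the numerical part of Exercise 3.7 is sketched below; it is an illustration under stated assumptions, not a reference solution. It uses the standard normal kernel, for which $\int K(x)^2 dx = 1/(2\sqrt{\pi})$ and $\int x^2 K(x) dx = 1$, and the identity $f''(x) = f(x)(1 - 6F(x) + 6F(x)^2)$ for the logistic density $f = F(1 - F)$. All function names, the integration range, and the sample size are hypothetical choices; plotting is left out.

```python
import math
import random

R_K = 1.0 / (2.0 * math.sqrt(math.pi))  # integral of K^2, standard normal kernel
MU2_K = 1.0                             # integral of x^2 K(x), standard normal kernel


def logistic_cdf(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def logistic_pdf(x: float) -> float:
    """Density f = F (1 - F) of the logistic distribution."""
    F = logistic_cdf(x)
    return F * (1.0 - F)


def logistic_pdf_dd(x: float) -> float:
    """Second derivative f'' = f (1 - 6 F + 6 F^2)."""
    F = logistic_cdf(x)
    return F * (1.0 - F) * (1.0 - 6.0 * F + 6.0 * F * F)


def roughness_fdd(lo: float = -40.0, hi: float = 40.0, steps: int = 80_000) -> float:
    """Midpoint-rule approximation of the integral of f''(x)^2."""
    dx = (hi - lo) / steps
    return sum(logistic_pdf_dd(lo + (k + 0.5) * dx) ** 2 for k in range(steps)) * dx


def amise_terms(n: int, h: float, r_fdd: float) -> tuple[float, float]:
    """Return (variance term, squared-bias term) of the AMISE."""
    return R_K / (n * h), 0.25 * h**4 * MU2_K**2 * r_fdd


def kde(x: float, sample: list[float], h: float) -> float:
    """Gaussian kernel density estimate (1/n) * sum K_h(x - X_i)."""
    c = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))
    return c * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)


def sample_logistic(n: int, rng: random.Random) -> list[float]:
    """Inverse-cdf sampling with F^{-1}(u) = log(u / (1 - u))."""
    out = []
    while len(out) < n:
        u = rng.random()
        if 0.0 < u < 1.0:  # guard against log(0)
            out.append(math.log(u / (1.0 - u)))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    n = 500
    data = sample_logistic(n, rng)
    r_fdd = roughness_fdd()
    for h in (0.1, 0.3, 0.6, 1.0, 2.0):
        var_term, bias_term = amise_terms(n, h, r_fdd)
        print(f"h={h:<4} variance term={var_term:.5f} squared-bias term={bias_term:.5f} "
              f"f_hat(0)={kde(0.0, data, h):.4f} f(0)={logistic_pdf(0.0):.4f}")
```

Small $h$ inflates the variance term and large $h$ inflates the squared-bias term, which is the trade-off the exercise asks to make visible.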

4 Non-Parametric Regression

In regression theory the aim is to find a relationship between two random variables $X$ and $Y$ by considering the following optimization (regression) problem
$$\min_{m : \mathbb{R} \to \mathbb{R}} E\left[ (Y - m(X))^2 \right]. \quad (1)$$
If $m$ is restricted to be a linear function, the above problem is the classical linear regression problem. In locally weighted polynomial regression, we determine the regression function at a point $x$ by solving the problem
$$\min_{\beta} \sum_{i=1}^n \left( Y_i - \beta_0 - \beta_1 (X_i - x) - \dots - \beta_p (X_i - x)^p \right)^2 K_h(X_i - x)$$
with $\beta = (\beta_0, \beta_1, \dots, \beta_p)^\top$. We then set $\hat m(x) = \hat m(x, p, h) = \hat\beta_0$. To make the problem conform with the usual regression setting, we define the $(n \times (p+1))$ matrix
$$X_x = \begin{pmatrix} 1 & (X_1 - x) & \cdots & (X_1 - x)^p \\ \vdots & \vdots & & \vdots \\ 1 & (X_n - x) & \cdots & (X_n - x)^p \end{pmatrix}$$
and the $(n \times n)$ matrix
$$W_x = \mathrm{diag}\left( K_h(X_1 - x), \dots, K_h(X_n - x) \right),$$
where $\mathrm{diag}(x_1, \dots, x_n)$ is the matrix $A = (a_{i,j})$ with $a_{i,i} = x_i$ and $a_{i,j} = 0$ if $i \ne j$. We can then write
$$\sum_{i=1}^n \left( Y_i - \beta_0 - \beta_1 (X_i - x) - \dots - \beta_p (X_i - x)^p \right)^2 K_h(X_i - x) = (Y - X_x \beta)^\top W_x (Y - X_x \beta)$$
with $Y = (Y_1, \dots, Y_n)^\top$. Now standard least squares theory can be used to get the optimal coefficients $\hat\beta = (\hat\beta_0, \hat\beta_1, \dots, \hat\beta_p)^\top$:
$$\hat\beta = (X_x^\top W_x X_x)^{-1} X_x^\top W_x Y \quad (2)$$
and therefore
$$\hat m(x, p, h) = e_1^\top (X_x^\top W_x X_x)^{-1} X_x^\top W_x Y,$$
where $e_1$ is the first unit vector in $\mathbb{R}^{p+1}$.

Exercises

4.1 Show that $m(x) = E(Y \mid X = x)$ in (1).
Hint: To calculate the expectation, the joint density $f_{X,Y}$ of $(X, Y)$ is needed, which can be used in the following form
$$f_{X,Y}(x, y) = f_{Y \mid X = x}(y)\, f_X(x),$$
where $f_{Y \mid X = x}(y)$ is the density of $Y$ conditional on $X = x$. We can then write
$$\min_{m : \mathbb{R} \to \mathbb{R}} E\left[ (Y - m(X))^2 \right] = \min_{m : \mathbb{R} \to \mathbb{R}} \int \int (y - m(x))^2 f_{Y \mid X = x}(y)\, f_X(x)\, dy\, dx.$$

4.2 Show (2).

4.3 The Nadaraya-Watson estimator: Show that for $p = 0$, we have that
$$\hat m(x, 0, h) = \hat\beta_0 = \frac{\sum_{i=1}^n K_h(x_i - x)\, y_i}{\sum_{i=1}^n K_h(x_i - x)}.$$
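The estimator in Exercise 4.3 is a kernel-weighted average of the responses, which the following minimal sketch makes concrete. It is an illustration only: the choice of the standard normal density for $K$ and the names `gaussian_kernel_h` and `nadaraya_watson` are assumptions, not part of the exercise sheet.

```python
import math


def gaussian_kernel_h(u: float, h: float) -> float:
    """Scaled kernel K_h(u) = K(u / h) / h with K the standard normal density."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def nadaraya_watson(x: float, xs: list[float], ys: list[float], h: float) -> float:
    """Local constant (p = 0) estimate at x: weighted mean of the y_i."""
    weights = [gaussian_kernel_h(xi - x, h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase h")
    return sum(w * yi for w, yi in zip(weights, ys)) / total


if __name__ == "__main__":
    xs = [0.0, 0.5, 1.0, 1.5, 2.0]
    ys = [0.1, 0.4, 1.1, 2.2, 3.9]
    for x in (0.25, 1.0, 1.75):
        print(x, nadaraya_watson(x, xs, ys, h=0.5))
```

Because the weights are normalized to sum to one, the estimate always lies between the smallest and the largest observed $y_i$.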

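4.4 Show that for $p = 1$, it holds that
$$\hat m(x, 1, h) = \hat\beta_0 = \frac{n^{-1} \sum_{i=1}^n \left( \hat s_2(x, h) - \hat s_1(x, h)(x_i - x) \right) K_h(x_i - x)\, y_i}{\hat s_2(x, h)\, \hat s_0(x, h) - \hat s_1(x, h)^2},$$
where
$$\hat s_r(x, h) = n^{-1} \sum_{i=1}^n (x_i - x)^r K_h(x_i - x).$$
Hint: Show that
$$\hat m(x, p, h) = e_1^\top \begin{pmatrix} \hat s_0(x, h) & \cdots & \hat s_p(x, h) \\ \vdots & \ddots & \vdots \\ \hat s_p(x, h) & \cdots & \hat s_{2p}(x, h) \end{pmatrix}^{-1} \begin{pmatrix} \hat t_0(x, h) \\ \vdots \\ \hat t_p(x, h) \end{pmatrix}$$
with
$$\hat t_r(x, h) = n^{-1} \sum_{i=1}^n (x_i - x)^r K_h(x_i - x)\, y_i.$$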
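The closed form for $p = 1$ can be compared numerically with the $2 \times 2$ system from the hint. The sketch below is an illustration under assumptions: $K$ is taken to be the standard normal density, and the names `k_h`, `s_hat`, `t_hat`, `local_linear`, and `local_linear_via_system` are hypothetical.

```python
import math


def k_h(u: float, h: float) -> float:
    """Scaled kernel K_h(u) = K(u / h) / h with K the standard normal density."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def s_hat(r: int, x: float, xs: list[float], h: float) -> float:
    """s_r(x, h) = (1/n) * sum (x_i - x)^r K_h(x_i - x)."""
    return sum((xi - x) ** r * k_h(xi - x, h) for xi in xs) / len(xs)


def t_hat(r: int, x: float, xs: list[float], ys: list[float], h: float) -> float:
    """t_r(x, h) = (1/n) * sum (x_i - x)^r K_h(x_i - x) y_i."""
    return sum((xi - x) ** r * k_h(xi - x, h) * yi for xi, yi in zip(xs, ys)) / len(xs)


def local_linear(x: float, xs: list[float], ys: list[float], h: float) -> float:
    """Closed-form local linear estimate from Exercise 4.4."""
    s0, s1, s2 = (s_hat(r, x, xs, h) for r in (0, 1, 2))
    denom = s2 * s0 - s1**2
    if denom == 0.0:
        raise ValueError("degenerate design at x; increase h or add points")
    num = sum((s2 - s1 * (xi - x)) * k_h(xi - x, h) * yi
              for xi, yi in zip(xs, ys)) / len(xs)
    return num / denom


def local_linear_via_system(x: float, xs: list[float], ys: list[float], h: float) -> float:
    """First component of the solution of [[s0, s1], [s1, s2]] b = (t0, t1), by Cramer's rule."""
    s0, s1, s2 = (s_hat(r, x, xs, h) for r in (0, 1, 2))
    t0, t1 = (t_hat(r, x, xs, ys, h) for r in (0, 1))
    return (t0 * s2 - s1 * t1) / (s0 * s2 - s1**2)


if __name__ == "__main__":
    xs = [0.0, 0.5, 1.0, 1.5, 2.0]
    ys = [0.1, 0.4, 1.1, 2.2, 3.9]
    for x in (0.25, 1.0, 1.75):
        print(x, local_linear(x, xs, ys, 0.5), local_linear_via_system(x, xs, ys, 0.5))
```

A useful sanity check is that exactly linear data $y_i = a + b x_i$ are reproduced without error, since the local linear fit then coincides with the global line.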
