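Assignment 9

Exercise 5.5 Let $X_n \sim \mbox{binomial}(n, p)$, where $p \in (0, 1)$ is unknown. Obtain confidence intervals for $p$ in two different ways:

(a) Since $\sqrt{n}(X_n/n - p) \stackrel{d}{\to} N[0, p(1-p)]$, the variance of the limiting distribution depends only on $p$. Use the fact that $X_n/n \stackrel{P}{\to} p$ to find a consistent estimator of the variance and use it to derive a 95% confidence interval for $p$.

Sketch of solution: Since $X_n/n \stackrel{P}{\to} p$, we know that $X_n(n - X_n)/n^2 \stackrel{P}{\to} p(1-p)$. Therefore, Slutsky's theorem implies that
\[
\sqrt{n}\left(\frac{X_n}{n} - p\right)\sqrt{\frac{n^2}{X_n(n - X_n)}} \stackrel{d}{\to} N(0, 1).
\]
Using the asymptotic result above to obtain an approximation for fixed $n$, we obtain
\[
P\left[-1.96 < \sqrt{n}\left(\frac{X_n}{n} - p\right)\sqrt{\frac{n^2}{X_n(n - X_n)}} < 1.96\right] \approx 0.95.
\]
A bit of algebra transforms this expression into
\[
P\left[\frac{X_n}{n} - 1.96\sqrt{\frac{X_n(n - X_n)}{n^3}} < p < \frac{X_n}{n} + 1.96\sqrt{\frac{X_n(n - X_n)}{n^3}}\right] \approx 0.95,
\]
which gives the approximate 95% confidence interval.

(b) Use the result of problem 5.3(b) to derive a 95% confidence interval for $p$.

Sketch of solution: Exercise 5.3(b) implies that
\[
2\sqrt{n}\left[\sin^{-1}\sqrt{X_n/n} - \sin^{-1}\sqrt{p}\right] \stackrel{d}{\to} N(0, 1).
\]
Therefore, we have
\[
P\left[-1.96 < 2\sqrt{n}\left(\sin^{-1}\sqrt{X_n/n} - \sin^{-1}\sqrt{p}\right) < 1.96\right] \approx 0.95.
\]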
After a bit of algebra, we obtain
\[
P\left[\sin^{-1}\sqrt{X_n/n} - \frac{0.98}{\sqrt{n}} < \sin^{-1}\sqrt{p} < \sin^{-1}\sqrt{X_n/n} + \frac{0.98}{\sqrt{n}}\right] \approx 0.95.
\]
However, we must be very careful about continuing; it is not acceptable to apply the square of the sine function throughout the above inequality, because the squared sine function is not nondecreasing and therefore does not preserve inequalities. Instead, we should apply the following function because it is nondecreasing:
\[
f(x) = \begin{cases} 0 & \mbox{if } x \le 0 \\ \sin^2 x & \mbox{if } 0 < x < \pi/2 \\ 1 & \mbox{if } x \ge \pi/2. \end{cases}
\]
Therefore, the approximate confidence interval for $p$ is given by
\[
P\left[f\left(\sin^{-1}\sqrt{X_n/n} - \frac{0.98}{\sqrt{n}}\right) < p < f\left(\sin^{-1}\sqrt{X_n/n} + \frac{0.98}{\sqrt{n}}\right)\right] \approx 0.95.
\]

(c) Evaluate the two confidence intervals in parts (a) and (b) numerically for all combinations of $n \in \{10, 100, 1000\}$ and $p \in \{.1, .3, .5\}$ as follows: For 1000 realizations of $X \sim \mbox{bin}(n, p)$, construct both 95% confidence intervals and keep track of how many times (out of 1000) the confidence intervals contain $p$. Report the observed proportion of successes for each $(n, p)$ combination. Does your study reveal any differences in the performance of these two competing methods?

Sketch of solution: Below is some R code to accomplish the simulations. Notice that the function sim uses the same simulated binomial random variables for each test. This is not required, but it does have the advantage of reducing one possible source of variability in this computer experiment. Here is a summary of the results:

                p = 0.1      p = 0.3      p = 0.5
    n = 10     (a) 0.617    (a) 0.858    (a) 0.899
               (b) 0.607    (b) 0.973    (b) 0.899
    n = 100    (a) 0.921    (a) 0.948    (a) 0.940
               (b) 0.947    (b) 0.948    (b) 0.940
    n = 1000   (a) 0.954    (a) 0.959    (a) 0.949
               (b) 0.947    (b) 0.955    (b) 0.949

It appears that both methods give coverage very close to the nominal 95% coverage probability with $p = 0.5$ and $n$ at least 100. However, with $n = 10$ neither method works well. When $n = 1000$, it looks like both methods work well for all values of $p$ tested. Finally, it seems that for smaller sample sizes or values of $p$ closer to zero, the method from part (b) might give slightly better coverage probabilities, though overall the two methods perform very similarly.

    sim <- function(n, p) {
      x <- rbinom(1000, n, p)   # 1000 binomial(n, p) realizations, shared by both methods
      # Part (a): count samples whose studentized statistic lies in (-1.96, 1.96)
      ci1 <- sum(abs(sqrt(n)*(x/n-p)*n/sqrt(x*(n-x))) < 1.96)
      # Part (b): count samples whose arcsine-scale statistic lies in (-0.98, 0.98)
      ci2 <- sum(abs(sqrt(n)*(asin(sqrt(x/n))-asin(sqrt(p)))) < .98)
      c(ci1, ci2)/1000          # observed coverage proportions for (a) and (b)
    }
    sim(10, .1) # Repeat with all combinations of n, p
    [1] 0.617 0.607

Exercise 5.8 Assume $(X_1, Y_1), \ldots, (X_n, Y_n)$ are independent and identically distributed from some bivariate normal distribution. Let $\rho$ denote the population correlation coefficient and $r$ the sample correlation coefficient.

(a) Describe a test of $H_0: \rho = 0$ against $H_1: \rho \ne 0$ based on the fact that
\[
\sqrt{n}[f(r) - f(\rho)] \stackrel{d}{\to} N(0, 1),
\]
where $f(x)$ is Fisher's transformation $f(x) = (1/2)\log[(1 + x)/(1 - x)]$. Use $\alpha = .05$.

Sketch of solution: With $f(x) = (1/2)\log[(1 + x)/(1 - x)]$, the asymptotic approximation gives
\[
P\left[f(\rho_0) - \frac{1.96}{\sqrt{n}} < f(r) < f(\rho_0) + \frac{1.96}{\sqrt{n}}\right] \approx 0.95
\]
under the null hypothesis $\rho = \rho_0$. With $\rho_0 = 0$, we obtain $f(\rho_0) = (1/2)\log 1 = 0$, which means that the above expression may be simplified to
\[
P\left[-\frac{1.96}{\sqrt{n}} < f(r) < \frac{1.96}{\sqrt{n}}\right] \approx 0.95.
\]
Solving for $r$, we obtain
\[
P\left[\frac{\exp\{-3.92/\sqrt{n}\} - 1}{\exp\{-3.92/\sqrt{n}\} + 1} < r < \frac{\exp\{3.92/\sqrt{n}\} - 1}{\exp\{3.92/\sqrt{n}\} + 1}\right] \approx 0.95.
\]
The test therefore rejects $H_0$ at level $\alpha = .05$ when $|f(r)| > 1.96/\sqrt{n}$, that is, when $r$ falls outside this interval.
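The step "solving for $r$" amounts to inverting Fisher's transformation. A short sketch of that algebra follows; the symbols $y$ (a generic value of $f(r)$) and $c = 1.96/\sqrt{n}$ are introduced here only for illustration and do not appear in the original solution.

```latex
\begin{align*}
y = \frac12 \log\frac{1+r}{1-r}
  &\iff e^{2y} = \frac{1+r}{1-r} \\
  &\iff r = \frac{e^{2y}-1}{e^{2y}+1} = \tanh(y).
\end{align*}
% The inverse map y -> tanh(y) is strictly increasing, so inequalities are preserved:
% -c < f(r) < c  if and only if  tanh(-c) < r < tanh(c).
% With c = 1.96/\sqrt{n} we have 2c = 3.92/\sqrt{n}, which gives
\[
\frac{e^{-3.92/\sqrt{n}}-1}{e^{-3.92/\sqrt{n}}+1} < r < \frac{e^{3.92/\sqrt{n}}-1}{e^{3.92/\sqrt{n}}+1}.
\]
```

Because the inverse of $f$ is increasing, no special care is needed here, in contrast to the squared sine function in Exercise 5.5(b).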
(b) Based on 5000 repetitions each, estimate the actual level for this test in the case when $E X_i = E Y_i = 0$, $\mbox{Var}\, X_i = \mbox{Var}\, Y_i = 1$, and $n \in \{3, 5, 10, 20\}$.

Sketch of solution: Below is some R code that will simulate 5000 tests of $H_0: \rho = 0$ on a bivariate normal sample of size $n$ in which the true correlation is zero. Based on its output, we estimate the true levels to be 0.3940, 0.1826, 0.1064, and 0.0746 for $n = 3$, 5, 10, and 20, respectively. This suggests that the small-sample approximation derived from the asymptotic result is not very good for $n < 20$.

    # Fisher's transformation
    fisher <- function(x) log((1+x)/(1-x))/2
    rtest <- function(n) {
      # 5000 samples of n independent standard normal pairs (true correlation zero)
      z <- array(rnorm(10000*n), c(5000, n, 2))
      # sample correlation coefficient for each of the 5000 samples
      r <- apply(z, 1, function(x) cor(x[,1], x[,2]))
      # number of rejections of H0 at nominal level .05
      sum(abs(fisher(r)) > 1.96/sqrt(n))
    }
    sapply(c(3,5,10,20), rtest)/5000
    [1] 0.3940 0.1826 0.1064 0.0746

Exercise 6.2 Let $X_1, X_2, \ldots$ be independent uniform$(0, \theta)$ random variables. Let $X_{(n)} = \max\{X_1, \ldots, X_n\}$ and consider the three estimators
\[
\delta_n^0 = X_{(n)}, \qquad \delta_n^1 = \frac{n}{n-1} X_{(n)}, \qquad \delta_n^2 = \left(\frac{n}{n-1}\right)^2 X_{(n)}.
\]

(a) Prove that each estimator is consistent for $\theta$.

Sketch of solution: Since $n/(n-1) \to 1$, it suffices to show that $X_{(n)} \stackrel{P}{\to} \theta$. This is true because for $0 < \epsilon < \theta$,
\[
P\left(|X_{(n)} - \theta| > \epsilon\right) = \left(\frac{\theta - \epsilon}{\theta}\right)^n \to 0.
\]

(b) Perform an empirical comparison of these three estimators for $n = 10^2, 10^3, 10^4$. Use $\theta = 1$ and simulate 1000 samples of size $n$ from uniform$(0, 1)$. From these 1000 samples, estimate the bias and mean squared error of each estimator. Which one of the three appears to be best?

Sketch of solution: Some R code that accomplishes this comparison is below. It is clear that $\delta_n^1$ is the best in terms of MSE; this is due to the fact that its bias is much lower than that of the other estimators.
    # One sample of size n: returns the three estimators delta0, delta1, delta2
    d <- function(n, theta=1) {
      x <- max(theta*runif(n)) * (n/(n-1))^c(0,1,2)
      names(x) <- paste("delta", 0:2, sep="")
      x
    }
    # x is a 3 x (number of replications) matrix of estimates; one row per estimator
    bias.and.mse <- function(x, theta)
      cbind(bias=rowMeans(x-theta),
            mse=apply(x, 1, function(a,b) mean((a-b)^2), theta))

    > bias.and.mse(replicate(1000, d(1e+2)), theta=1)
                    bias          mse
    delta0 -0.0097362758 1.893735e-04
    delta1  0.0002663880 9.656975e-05
    delta2  0.0103700889 2.059968e-04
    > bias.and.mse(replicate(1000, d(1e+3)), theta=1)
                    bias          mse
    delta0 -9.518221e-04 1.929893e-06
    delta1  4.822611e-05 1.028305e-06
    delta2  1.049275e-03 2.129013e-06
    > bias.and.mse(replicate(1000, d(1e+4)), theta=1)
                    bias          mse
    delta0 -1.010507e-04 2.062006e-08
    delta1 -1.050774e-06 1.041201e-08
    delta2  9.895912e-05 2.020589e-08

(c) Find the asymptotic distribution of $n(\theta - \delta_n^i)$ for $i = 0, 1, 2$. Based on your results, which of the three appears to be the best estimator and why? (For the latter question, don't attempt to make a rigorous mathematical argument; simply give an educated guess.)

Sketch of solution: After a little algebra, we obtain
\[
P\left[n(\theta - \delta_n^i) > t\right] = P\left[X_{(n)} < \left(1 - \frac{1}{n}\right)^i\left(\theta - \frac{t}{n}\right)\right] = \left[\left(1 - \frac{1}{n}\right)^i\left(1 - \frac{t}{n\theta}\right)\right]^n
\]
for certain values of $t$. This expression has a limit of $e^{-i}e^{-t/\theta} = e^{-(t + i\theta)/\theta}$. We can thus verify that if $X$ is exponentially distributed with mean $\theta$, then we have
\[
n(\theta - \delta_n^i) \stackrel{d}{\to} X - i\theta.
\]
This explains why $\delta_n^1$ is the best estimator in part (b): It is, roughly speaking, asymptotically unbiased, whereas the asymptotic distributions of $n(\theta - \delta_n^0)$ and $n(\theta - \delta_n^2)$ have means $\theta$ and $-\theta$, respectively.
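The limit quoted in part (c) can be checked factor by factor. The following is a sketch that relies only on the standard fact $(1 + a/n)^n \to e^a$; it is one way to fill in the step, not necessarily the argument the original solution had in mind.

```latex
\begin{align*}
\left[\left(1-\frac{1}{n}\right)^{i}\left(1-\frac{t}{n\theta}\right)\right]^{n}
  &= \left[\left(1-\frac{1}{n}\right)^{n}\right]^{i}
     \left(1-\frac{t/\theta}{n}\right)^{n} \\
  &\longrightarrow \left(e^{-1}\right)^{i} e^{-t/\theta}
   = e^{-(t+i\theta)/\theta}.
\end{align*}
% Survival function of X - i\theta when X is exponential with mean \theta:
\[
P(X - i\theta > t) = P(X > t + i\theta) = e^{-(t+i\theta)/\theta},
\qquad t > -i\theta.
\]
% The limiting mean is therefore E(X) - i\theta = (1-i)\theta,
% that is, \theta, 0, -\theta for i = 0, 1, 2.
```

This agrees with the simulation in part (b): since the bias is $-E(\theta - \delta)$, multiplying the estimated biases for $n = 100$ by $n$ gives roughly $-0.97$, $0.03$, and $1.04$, close to $-\theta$, $0$, and $\theta$ with $\theta = 1$.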
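Exercise 6.5 Let $X_1, \ldots, X_n$ be a simple random sample from the distribution function $F(x) = [1 - (1/x)] I\{x > 1\}$.

(a) Find the joint asymptotic distribution of $(X_{(n-1)}/n, X_{(n)}/n)$. Hint: Proceed as in Example 6.5.

Sketch of solution: The inverse of the cumulative distribution function is $F^{-1}(u) = 1/(1 - u)$. Now let $U_1, \ldots, U_n$ be a simple random sample from a standard uniform distribution. Since $1 - U$ is standard uniform if $U$ is, we claim that $(U_{(1)}, \ldots, U_{(n)})$ has the same joint distribution as $(1 - U_{(n)}, \ldots, 1 - U_{(1)})$. Based on this fact, we now claim by the inversion method for generating random variables that
\[
\left(X_{(1)}, \ldots, X_{(n)}\right)^T \stackrel{d}{=} \left(\frac{1}{U_{(n)}}, \ldots, \frac{1}{U_{(2)}}, \frac{1}{U_{(1)}}\right)^T. \tag{6.17}
\]
We have seen (e.g., in Example 6.4) that
\[
n\begin{pmatrix} U_{(1)} \\ U_{(2)} \end{pmatrix} \stackrel{d}{\to} \begin{pmatrix} Y_1 \\ Y_1 + Y_2 \end{pmatrix}, \tag{6.18}
\]
where $Y_1$ and $Y_2$ are independent standard exponential random variables. Therefore, (6.17) implies that
\[
\begin{pmatrix} X_{(n-1)}/n \\ X_{(n)}/n \end{pmatrix} \stackrel{d}{\to} \begin{pmatrix} 1/(Y_1 + Y_2) \\ 1/Y_1 \end{pmatrix}.
\]

(b) Find the asymptotic distribution of $X_{(n-1)}/X_{(n)}$.

Sketch of solution: By the result of part (a), we conclude that
\[
\frac{X_{(n-1)}}{X_{(n)}} \stackrel{d}{\to} \frac{Y_1}{Y_1 + Y_2}.
\]
The distribution of $Y_1/(Y_1 + Y_2)$ may be determined by direct evaluation of the cumulative distribution function. Clearly $Y_1/(Y_1 + Y_2)$ is between 0 and 1; for an arbitrary $t \in (0, 1)$,
\begin{align*}
P\left(\frac{Y_1}{Y_1 + Y_2} \le t\right)
  &= E\left[P\left(\frac{Y_1}{Y_1 + Y_2} \le t \,\Big|\, Y_2\right)\right]
   = E\left[P\left(Y_1 \le \frac{tY_2}{1 - t} \,\Big|\, Y_2\right)\right] \\
  &= E\left[1 - \exp\{-tY_2/(1 - t)\}\right] \\
  &= 1 - \int_0^\infty \exp\{-ty/(1 - t)\}\exp\{-y\}\,dy \\
  &= 1 - (1 - t)\int_0^\infty \frac{1}{1 - t}\exp\{-y/(1 - t)\}\,dy = t.
\end{align*}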
Therefore, $Y_1/(Y_1 + Y_2)$ has a standard uniform distribution.

Exercise 6.6 If $X_1, \ldots, X_n$ are independent and identically distributed uniform$(0, 1)$ variables, prove that $X_{(1)}/X_{(2)} \stackrel{d}{\to} \mbox{uniform}(0, 1)$.

Sketch of solution: Actually, this result is exact; no convergence in distribution is necessary. We showed in Section 6.2 that the joint distribution of $(X_{(1)}, X_{(2)})$ is the same as that of $(Y_1, Y_1 + Y_2)/(Y_1 + \cdots + Y_{n+1})$, where the $Y_j$ are i.i.d. standard exponential variables. Therefore, $X_{(1)}/X_{(2)}$ has the same distribution as $Y_1/(Y_1 + Y_2)$, which we showed in the previous exercise to be uniformly distributed.
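The exactness claim in Exercise 6.6 is easy to probe by simulation. The sketch below is not part of the original solutions; the function name ratio.min2, the seed, the sample size n = 5, and the number of replications are all illustrative assumptions. It compares the empirical quantiles of $X_{(1)}/X_{(2)}$ with those of a uniform(0, 1) distribution for a deliberately small $n$.

```r
# Minimal sketch (hypothetical names): ratio of the two smallest order
# statistics from a single uniform(0,1) sample of size n (requires n >= 2).
ratio.min2 <- function(n) {
  x <- sort(runif(n))   # order statistics X_(1) <= ... <= X_(n)
  x[1] / x[2]           # X_(1) / X_(2)
}

set.seed(1)             # arbitrary seed, for reproducibility only
n.rep <- 10000          # number of simulated samples (assumed value)
ratios <- replicate(n.rep, ratio.min2(5))

# Empirical quantiles next to the uniform(0,1) quantiles they should match
probs <- c(0.1, 0.25, 0.5, 0.75, 0.9)
print(rbind(empirical = quantile(ratios, probs), uniform = probs))

# Kolmogorov-Smirnov distance between the empirical cdf and the uniform cdf
print(ks.test(ratios, "punif")$statistic)
```

If the result is exact, the two rows of the quantile table should agree up to simulation error even though $n = 5$ is far from an asymptotic regime.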