Nonparametric Estimation of Smooth Conditional Distributions


Bruce E. Hansen
University of Wisconsin

May 2004

Abstract

This paper considers nonparametric estimation of smooth conditional distribution functions (CDFs) using kernel smoothing methods. We propose estimation using a new smoothed local linear (SLL) estimator. Estimation bias is reduced through the use of a local linear estimator rather than local averaging. Estimation variance is reduced through the use of smoothing. Asymptotic analysis of mean integrated squared error (MISE) reveals the form of these efficiency gains, and their magnitudes are demonstrated in numerical simulations. Considerable attention is devoted to the development of a plug-in rule for bandwidths which minimize estimates of the asymptotic MISE. We illustrate the estimation method with an application to the U.S. quarterly GDP growth rate.

Research supported by the National Science Foundation. Department of Economics, 1180 Observatory Drive, University of Wisconsin, Madison, WI 53706.

1 Introduction

This paper considers nonparametric estimation of smooth conditional distribution functions (CDFs). As the CDF is the conditional expectation of an indicator function, it may be estimated by nonparametric regression methods. A recent paper which exploits advances in nonparametric regression for estimation of the CDF is Hall, Wolff and Yao (1999). See also Fan and Yao (2003).

Standard nonparametric regression estimates of the CDF, however, are not smooth and therefore are inefficient. If the dependent variable in these regressions (the indicator function) is replaced by a smooth function, estimation variance can be reduced at the cost of an increase in bias. If the degree of smoothing is carefully chosen, the result is a lower finite-sample mean integrated squared error (MISE). The advantages of smoothed estimation of distribution functions have been discussed in the statistics literature; for references see Hansen (2004). The extension to CDF estimation considered here is new.

Our analysis concerns a pair of random variables $(Y, X) \in \mathbb{R} \times \mathbb{R}$ that have a smooth joint distribution. Let $f(y)$ and $g(x)$ denote the marginal densities of $Y$ and $X$, and let $f(y|x)$ denote the conditional density of $Y$ given $X = x$. The conditional distribution function is

$$ F(y|x) = \int_{-\infty}^{y} f(u|x)\,du. $$

Define as well the derivatives

$$ F^{(s)}(y|x) = \frac{\partial^s}{\partial x^s} F(y|x), \qquad f^{(s)}(y|x) = \frac{\partial^s}{\partial x^s} f(y|x), \qquad f'(y|x) = \frac{\partial}{\partial y} f(y|x). $$

Given a random sample $\{(Y_1, X_1), \ldots, (Y_n, X_n)\}$ from this distribution, the goal is to estimate $F(y|x)$ at a fixed value $x$. To simplify the notation we will sometimes suppress dependence on $x$.

Let $\Phi(s)$ and $\phi(s)$ denote the standard normal distribution and density functions, and let $\Phi_\sigma(s) = \Phi(s/\sigma)$ and $\phi_\sigma(s) = \sigma^{-1}\phi(s/\sigma)$. Let $\phi^{(m)}(s) = (d^m/ds^m)\phi(s)$ and $\phi_\sigma^{(m)}(s) = (d^m/ds^m)\phi_\sigma(s) = \sigma^{-(m+1)}\phi^{(m)}(s/\sigma)$ denote the derivatives of $\phi(s)$ and $\phi_\sigma(s)$. In particular, $\phi^{(1)}(s) = -s\phi(s)$ and $\phi^{(2)}(s) = (s^2 - 1)\phi(s)$.

Our estimators make use of two univariate kernel functions $w(s)$ and $k(s)$. We assume that both are symmetric and second-order, are normalized to have a unit variance (e.g. $\int s^2 w(s)\,ds = 1$), and have a finite sixth moment (e.g. $\int s^6 w(s)\,ds < \infty$). Let $R = \int w(s)^2\,ds$ denote the roughness of $w$. Let

$$ K(s) = \int_{-\infty}^{s} k(u)\,du $$

denote the integrated kernel of $k$ and define the constant

$$ \psi = 2\int s\,k(s)K(s)\,ds > 0. \qquad (1) $$

For our applications we set both $k(s)$ and $w(s)$ equal to the normal kernel $\phi(s)$, in which case $R = 1/(2\sqrt{\pi})$, $\psi = 1/\sqrt{\pi}$ and $K(s) = \Phi(s)$.

Section 2 presents unsmoothed estimators. Section 3 introduces our smoothed estimators. Section 4 develops plug-in bandwidth rules for the local linear and smoothed local linear estimators. Section 5 presents a numerical simulation of performance. Section 6 is a simple application to the U.S. quarterly real GDP growth rate. Section 7 concludes. The proof of Theorem 1 is presented in the Appendix. Gauss code which implements the estimators and bandwidth methods is available on the author's webpage.

2 Unsmoothed Estimators

Note that $F(y|x) = E(1(Y_i \le y) \mid X_i = x)$, and thus it is a regression function. Thus the CDF may be estimated by regression of $1(Y_i \le y)$ on $X_i$. A simple nonparametric estimator is the Nadaraya-Watson (NW) estimator, which takes the form of a local average

$$ \hat{F}(y|x) = \frac{\sum_{i=1}^{n} w_i\, 1(Y_i \le y)}{\sum_{i=1}^{n} w_i} \qquad (2) $$

where

$$ w_i = \frac{1}{b}\, w\!\left(\frac{X_i - x}{b}\right) \qquad (3) $$

are kernel weights and $b$ is a bandwidth.
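As a concrete illustration, the following is a minimal Python sketch of the NW estimator (2)-(3) with the Gaussian kernel. It is not the author's Gauss code; the function and variable names are ours.

```python
import numpy as np
from scipy.stats import norm

def nw_cdf(y_grid, x, Y, X, b):
    """Nadaraya-Watson estimate of F(y|x), eq. (2)-(3), with a Gaussian kernel."""
    w = norm.pdf((X - x) / b) / b                 # kernel weights w_i
    W = w.sum()
    # local average of the indicator 1(Y_i <= y) at each grid point
    return np.array([(w * (Y <= y)).sum() / W for y in y_grid])
```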

Reduced bias may be achieved using a local linear (LL) estimator. Local linear estimation is a special case of local polynomial regression, and is typically recommended for estimation of the mean. For a general treatment see Fan and Gijbels (1996). The estimator is the intercept from a weighted least-squares regression of $1(Y_i \le y)$ on $(X_i - x)$ using the weights $w_i$, and can be written as

$$ \bar{F}(y|x) = \frac{\sum_{i=1}^{n} w_i^*\, 1(Y_i \le y)}{\sum_{i=1}^{n} w_i^*} \qquad (4) $$

$$ w_i^* = w_i\left(1 - \hat{\beta}(X_i - x)\right), \qquad \hat{\beta} = \left(\sum_{i=1}^{n} w_i (X_i - x)^2\right)^{-1}\left(\sum_{i=1}^{n} w_i (X_i - x)\right). \qquad (5) $$

As shown in Theorem 1(b) of Hall, Wolff and Yao (1999), if $b = c n^{-1/5}$ then

$$ E\left(\bar{F}(y|x) - F(y|x)\right)^2 \simeq \frac{R\, F(y|x)(1 - F(y|x))}{g(x)\, n\, b} + \frac{b^4}{4} F^{(2)}(y|x)^2 + O\!\left(n^{-6/5}\right). $$

Integrating over $y$ we obtain the MISE

$$ \int E\left(\bar{F}(y|x) - F(y|x)\right)^2 dy = \frac{R\, V_0}{g\, n\, b} + \frac{b^4}{4} V_1 + O\!\left(n^{-6/5}\right) \qquad (6) $$

where $g = g(x)$,

$$ V_0 = \int F(y|x)(1 - F(y|x))\,dy, \qquad V_1 = \int F^{(2)}(y|x)^2\,dy. $$

The first term on the right side of (6) is the integrated asymptotic variance, and the second is the integrated squared asymptotic bias. This bias term is simplified relative to the NW estimator, and this is the source of the improved performance of $\bar{F}$ over $\hat{F}$.

Hall, Wolff and Yao (1999) point out that the local linear estimator $\bar{F}$ has two undesirable properties. It may be nonmonotonic in $y$, and it is not constrained to lie in $[0,1]$. Thus the estimator $\bar{F}$ is not a CDF. The problem occurs when the weights (5) are negative, which happens when $\hat{\beta}(X_i - x) > 1$. A simple solution(1) is to modify $w_i^*$ to exclude negative values. Thus we redefine the weights as follows:

$$ w_i^* = \begin{cases} 0 & \hat{\beta}(X_i - x) > 1 \\ w_i\left(1 - \hat{\beta}(X_i - x)\right) & \hat{\beta}(X_i - x) \le 1 \end{cases} \qquad (7) $$

This modification is asymptotically negligible, but constrains $\bar{F}$ to be a valid CDF. Using the modified weights (7) our estimator is a slight modification of the local linear estimator; however, we will continue to refer to it as the local linear (LL) estimator for simplicity.

(1) Hall, Wolff and Yao (1999) proposed an alternative solution, introducing weights to the Nadaraya-Watson estimator which are selected by an empirical likelihood criterion. Their estimator is asymptotically equivalent to ours, but computationally more complicated.
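In the same style, a sketch of the LL estimator (4)-(5) with the truncated weights (7); again the names are ours and this is only an illustrative implementation.

```python
import numpy as np
from scipy.stats import norm

def ll_cdf(y_grid, x, Y, X, b):
    """Local linear estimate of F(y|x) using the modified weights (7)."""
    w = norm.pdf((X - x) / b) / b                       # w_i, eq. (3)
    d = X - x
    beta = (w * d).sum() / (w * d ** 2).sum()           # beta-hat, eq. (5)
    wstar = np.where(beta * d > 1.0, 0.0, w * (1.0 - beta * d))   # eq. (7)
    W = wstar.sum()
    return np.array([(wstar * (Y <= y)).sum() / W for y in y_grid])
```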

3 Smoothed Estimators

For some bandwidth $h > 0$ let $K_h(s) = K(s/h)$ and consider the integrated kernel smooth $K_h(y - Y_i)$. For small $h$ it can be viewed as a smooth approximation to the indicator function $1(Y_i \le y)$. Smoothed CDF estimators replace the indicator function $1(Y_i \le y)$ in (2) and (4) with $K_h(y - Y_i)$, yielding the smoothed Nadaraya-Watson (SNW) estimator

$$ \hat{F}_{h,b}(y|x) = \frac{\sum_{i=1}^{n} w_i\, K_h(y - Y_i)}{\sum_{i=1}^{n} w_i} $$

and the smoothed local linear (SLL) estimator

$$ \bar{F}_{h,b}(y|x) = \frac{\sum_{i=1}^{n} w_i^*\, K_h(y - Y_i)}{\sum_{i=1}^{n} w_i^*}. $$

The weights $w_i$ and $w_i^*$ are defined in (3) and (7), respectively.

Theorem 1 Assume that $F(y|x)$ and $g(x)$ are continuously differentiable up to the fourth order in both $y$ and $x$. If $b = c n^{-1/5}$ and $h = O(b)$ as $n \to \infty$, then

$$ \int E\left(\bar{F}_{h,b}(y|x) - F(y|x)\right)^2 dy = \frac{R\,(V_0 - h\psi)}{g\, n\, b} + \frac{b^4}{4} V_1 - \frac{h^2 b^2}{2} V_2 + \frac{h^4}{4} V_3 + O\!\left(n^{-6/5}\right) \qquad (8) $$

where $g = g(x)$,

$$ V_0 = \int F(y|x)(1 - F(y|x))\,dy, \qquad V_1 = \int F^{(2)}(y|x)^2\,dy, $$
$$ V_2 = \int f(y|x)\, f^{(2)}(y|x)\,dy, \qquad V_3 = \int f'(y|x)^2\,dy. $$

The MISE in (8) is a generalization of (6) to allow for $h > 0$. The MISE is strictly decreasing at $h = 0$ (since $\psi > 0$), demonstrating that the asymptotic MISE is minimized at a strictly positive $h$. Thus there are (at least asymptotic) efficiency gains from smoothing.
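A sketch of the SLL estimator follows from the LL code above by replacing the indicator with $K_h(y - Y_i) = \Phi((y - Y_i)/h)$; as before this is our own illustrative code, not the author's implementation.

```python
import numpy as np
from scipy.stats import norm

def sll_cdf(y_grid, x, Y, X, b, h):
    """Smoothed local linear estimate of F(y|x): 1(Y_i <= y) replaced by K_h(y - Y_i)."""
    w = norm.pdf((X - x) / b) / b
    d = X - x
    beta = (w * d).sum() / (w * d ** 2).sum()
    wstar = np.where(beta * d > 1.0, 0.0, w * (1.0 - beta * d))
    W = wstar.sum()
    # with k = phi, the integrated kernel is K = Phi, so K_h(s) = Phi(s / h)
    return np.array([(wstar * norm.cdf((y - Y) / h)).sum() / W for y in y_grid])
```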

4 Plug-In Bandwidth Selection

We suggest selecting the bandwidths $(h, b)$ by the plug-in method. Plug-in bandwidths are the values which minimize an estimate of the asymptotic MISE (8). Estimation of the latter requires estimation of the parameters $g$, $V_0$, $V_1$, $V_2$, and $V_3$. In this section we discuss estimation of these components and the implementation of the plug-in bandwidth rule. Readers who are interested in the plug-in algorithm but not the motivational details can jump directly to Section 4.5, which summarizes the implementation details.

4.1 Kernel Estimation of g

The normal kernel density estimator is

$$ \hat{g} = \frac{1}{n}\sum_{i=1}^{n} \phi_r(X_i - x) \qquad (9) $$

where $r$ is a bandwidth. An estimate of the asymptotic MSE-minimizing choice for $r$ is

$$ \hat{r} = \left( \frac{\hat{g}_s(x)}{2\sqrt{\pi}\, \hat{g}_t^{(2)}(x)^2\, n} \right)^{1/5} \qquad (10) $$

where

$$ \hat{g}_s(x) = \frac{1}{n}\sum_{i=1}^{n} \phi_s(X_i - x), \qquad \hat{g}_t^{(2)}(x) = \frac{1}{n}\sum_{i=1}^{n} \phi_t^{(2)}(X_i - x), $$

and $s$ and $t$ are preliminary bandwidths. Letting $\hat{\sigma}_x$ denote the sample standard deviation of $X_i$, we suggest the Gaussian reference rules $s = 1.06\, \hat{\sigma}_x n^{-1/5}$ and $t = 0.94\, \hat{\sigma}_x n^{-1/9}$, which are MISE-optimal for estimation of the density and its second derivative when $g$ is Gaussian.
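A minimal sketch of the estimate $\hat{g}$ of Section 4.1, assuming the Gaussian reference rules above; the function name is ours.

```python
import numpy as np
from scipy.stats import norm

def g_hat(x, X):
    """Kernel estimate of g(x) using the plug-in bandwidth of eq. (9)-(10)."""
    n = len(X)
    sx = X.std(ddof=1)
    s = 1.06 * sx * n ** (-0.2)                    # reference rule for the density
    t = 0.94 * sx * n ** (-1 / 9)                  # reference rule for its second derivative
    gs = norm.pdf((X - x) / s).mean() / s          # g-hat_s(x)
    u = (X - x) / t
    g2t = ((u ** 2 - 1) * norm.pdf(u)).mean() / t ** 3   # g-hat_t^(2)(x); phi^(2)(u) = (u^2 - 1)phi(u)
    r = (gs / (2 * np.sqrt(np.pi) * g2t ** 2 * n)) ** 0.2   # eq. (10)
    return norm.pdf((X - x) / r).mean() / r        # eq. (9) evaluated with bandwidth r
```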

4.2 Estimation of V_0

The parameter $V_0$ is a function of the conditional distribution $F(y|x)$. A parametric estimator of the latter is the intercept from a $p$-th order polynomial regression of $1(Y_i \le y)$ on $(X_i - x)$. For $p \ge 0$ let

$$ X_p = \begin{pmatrix} 1 & X_1 - x & \cdots & (X_1 - x)^p \\ \vdots & \vdots & & \vdots \\ 1 & X_n - x & \cdots & (X_n - x)^p \end{pmatrix} \qquad (11) $$

and let $1(Y \le y)$ denote the stacked $1(Y_i \le y)$. Then the estimator can be written as

$$ \hat{F}(y|x) = \delta_1'\left(X_p'X_p\right)^{-1}X_p'\, 1(Y \le y) \qquad (12) $$

where $\delta_1 = (1, 0, \ldots, 0)'$ is the first unit vector. Given (12), an estimate of $V_0$ is

$$ \hat{V}_0 = \int \hat{F}(y|x)\left(1 - \hat{F}(y|x)\right) dy = \int \delta_1'(X_p'X_p)^{-1}X_p'\, 1(Y \le y)\, 1(Y > y)'\, X_p(X_p'X_p)^{-1}\delta_1\, dy $$
$$ = \delta_1'(X_p'X_p)^{-1}X_p'\, \left(1_n Y' - Y 1_n'\right)_+\, X_p(X_p'X_p)^{-1}\delta_1 \qquad (13) $$

where $(a)_+ = a\,1(a \ge 0)$ for scalar $a$ and is an element-by-element operation for matrices, $1_n$ is the $n \times 1$ vector of ones, and $1_n Y' - Y 1_n'$ is the matrix with $(i,j)$-th element $Y_j - Y_i$.

A nonparametric estimator of $V_0$ can be based on a local linear regression. Given a bandwidth $q_0$, let $w_{0i} = \phi_{q_0}(X_i - x)$ denote Gaussian kernel weights and set $W_0 = \mathrm{diag}\{w_{0i}\}$. Then the estimator is

$$ \tilde{F}(y|x) = \delta_1'\left(X_1'W_0X_1\right)^{-1}X_1'W_0\, 1(Y \le y) $$

and the estimate of $V_0$ is

$$ \tilde{V}_0 = \delta_1'(X_1'W_0X_1)^{-1}X_1'W_0\, \left(1_n Y' - Y 1_n'\right)_+\, W_0X_1(X_1'W_0X_1)^{-1}\delta_1. \qquad (14) $$

The bandwidth $q_0$ which minimizes the asymptotic mean integrated squared error of $\tilde{F}(y|x)$ is

$$ q_0 = \left( \frac{V_0}{2\sqrt{\pi}\, g\, V_1\, n} \right)^{1/5} $$

which depends on the unknowns $g$, $V_0$ and $V_1$. (See equation (8) with $h = 0$.) Using the estimates $\hat{g}$ from (9), $\hat{V}_0$ from (13), and $\hat{V}_1$ from (17) below, we obtain the plug-in bandwidth

$$ \hat{q}_0 = 0.78 \left( \frac{\hat{V}_0}{\hat{g}\,\hat{V}_1\, n} \right)^{1/5}. \qquad (15) $$
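A sketch of the polynomial-regression estimate (13), assuming $p = 4$ as in the summary algorithm of Section 4.5; the matrix M below collects the terms $(Y_j - Y_i)_+$, and the function name is ours.

```python
import numpy as np

def V0_hat(x, Y, X, p=4):
    """Estimate V0 = int F(y|x)(1 - F(y|x)) dy by polynomial regression, eq. (13)."""
    Xp = np.vander(X - x, p + 1, increasing=True)      # rows [1, X_i - x, ..., (X_i - x)^p]
    A = np.linalg.solve(Xp.T @ Xp, Xp.T)               # (Xp'Xp)^{-1} Xp'
    a1 = A[0]                                          # delta_1' (Xp'Xp)^{-1} Xp'
    M = np.maximum(Y[None, :] - Y[:, None], 0.0)       # (i,j) element (Y_j - Y_i)_+
    return a1 @ M @ a1
```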

4.3 Estimation of V_1

The parameter $V_1$ is a function of the second derivative $F^{(2)}(y|x)$, which can be estimated by polynomial regression. For some $p \ge 2$ let $X_p$ be defined as in (11). The estimator is

$$ \hat{F}^{(2)}(y|x) = \delta_3'\left(X_p'X_p\right)^{-1}X_p'\, 1(Y \le y) \qquad (16) $$

where $\delta_3$ is the third unit vector (a one in the third place, and zeros elsewhere). To calculate $\hat{V}_1$ from (16), observe that for any $c > \max_i Y_i$,

$$ \int_{-\infty}^{c} 1(Y \le y)\, 1(Y \le y)'\, dy = cD - \max(Y, Y') $$

where $D$ is an $n \times n$ matrix of ones and $\max(Y,Y')$ has $(i,j)$-th element $\max(Y_i, Y_j)$, and observe that $\delta_3'(X_p'X_p)^{-1}X_p' D = 0$. Thus

$$ \int_{-\infty}^{c} \hat{F}^{(2)}(y|x)^2\, dy = \delta_3'(X_p'X_p)^{-1}X_p' \left( \int_{-\infty}^{c} 1(Y \le y)\,1(Y \le y)'\,dy \right) X_p(X_p'X_p)^{-1}\delta_3 $$
$$ = \delta_3'(X_p'X_p)^{-1}X_p'\left(cD - \max(Y,Y')\right)X_p(X_p'X_p)^{-1}\delta_3 = -\delta_3'(X_p'X_p)^{-1}X_p'\, \max(Y,Y')\, X_p(X_p'X_p)^{-1}\delta_3, $$

where the max operator is element-by-element. Note that this expression is invariant to $c > \max_i Y_i$; thus we define

$$ \hat{V}_1 = -\delta_3'(X_p'X_p)^{-1}X_p'\, \max(Y,Y')\, X_p(X_p'X_p)^{-1}\delta_3. \qquad (17) $$

This estimator was also used in (15) to construct a plug-in rule for the nonparametric estimate $\tilde{V}_0$.

A nonparametric estimator of $V_1$ can be based on a local cubic regression. (While any local polynomial regression of order two or above is consistent, local cubic regression is recommended for estimation of a regression second derivative.) Given a bandwidth $q_1$, let $w_{1i} = \phi_{q_1}(X_i - x)$ denote Gaussian kernel weights and set $W_1 = \mathrm{diag}\{w_{1i}\}$. The estimator of $F^{(2)}(y|x)$ is

$$ \tilde{F}^{(2)}(y|x) = \delta_3'\left(X_3'W_1X_3\right)^{-1}X_3'W_1\, 1(Y \le y) \qquad (18) $$

and the estimate of $V_1$ is

$$ \tilde{V}_1 = -\delta_3'(X_3'W_1X_3)^{-1}X_3'W_1\, \max(Y,Y')\, W_1X_3(X_3'W_1X_3)^{-1}\delta_3. \qquad (19) $$

Using Fan and Gijbels (1996, Theorem 3.1 and equation (3.20)), the bandwidth which minimizes the asymptotic mean integrated squared error of $\tilde{F}^{(2)}(y|x)$ is

$$ q_1 = \left( \frac{V_0}{g\, V_4\, n} \right)^{1/9} $$

where

$$ V_4 = \int F^{(4)}(y|x)^2\, dy. $$

The constant $V_4$ may be estimated by

$$ \hat{V}_4 = -\delta_5'(X_p'X_p)^{-1}X_p'\, \max(Y,Y')\, X_p(X_p'X_p)^{-1}\delta_5 \qquad (20) $$

where $p \ge 4$ and $\delta_5$ is the fifth unit vector, suggesting the plug-in bandwidth

$$ \hat{q}_1 = \left( \frac{\hat{V}_0}{\hat{g}\,\hat{V}_4\, n} \right)^{1/9}. \qquad (21) $$
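The estimates (17) and (20) share the same structure and can be sketched together; this again assumes $p = 4$, and the names are ours.

```python
import numpy as np

def V1_V4_hat(x, Y, X, p=4):
    """Polynomial-regression estimates of V1 (eq. 17) and V4 (eq. 20)."""
    Xp = np.vander(X - x, p + 1, increasing=True)
    A = np.linalg.solve(Xp.T @ Xp, Xp.T)           # (Xp'Xp)^{-1} Xp'
    Mmax = np.maximum(Y[:, None], Y[None, :])      # element-wise max(Y_i, Y_j)
    V1 = -A[2] @ Mmax @ A[2]                       # delta_3 selects the third coefficient
    V4 = -A[4] @ Mmax @ A[4]                       # delta_5 selects the fifth coefficient
    return V1, V4
```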

4.4 Estimation of V_2 and V_3

If $f(y|x)$ does not depend on $x$ then we have the simplification $V_3 = \int f'(y)^2\, dy$, which is a well-studied estimation problem. For a bandwidth $a$, let $1_n$ be the $n \times 1$ vector of ones and let

$$ \hat{f}'_a(y) = \frac{1}{n}\sum_{j=1}^{n} \phi_a^{(1)}(y - Y_j) $$

be a Gaussian kernel estimate of $f'(y)$. An estimate of $V_3$ in this case is

$$ \hat{V}_3 = \int \hat{f}'_a(y)^2\, dy = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} \int \phi_a^{(1)}(y - Y_i)\,\phi_a^{(1)}(y - Y_j)\, dy = -\frac{1}{n^2}\, 1_n'\, \phi_{\bar{a}}^{(2)}(Y - Y')\, 1_n \qquad (22) $$

where $\bar{a} = \sqrt{2}\,a$ and $\phi_{\bar{a}}^{(2)}(Y - Y')$ denotes the matrix with $(i,j)$-th element $\phi_{\bar{a}}^{(2)}(Y_i - Y_j)$. The second equality is equation (7.1) of Marron and Wand (1992):

$$ \int \phi_a^{(r)}(y - Y_i)\,\phi_a^{(s)}(y - Y_j)\, dy = (-1)^r\, \phi_{\sqrt{2}a}^{(r+s)}(Y_i - Y_j). \qquad (23) $$

The estimator (22) is the Jones and Sheather (1991) estimator. Hansen (2004) describes a multiple plug-in method for selection of $a$. We recommend this choice and hold it fixed for the remainder of this sub-section and the next.

Now consider estimation of $V_3$ allowing for dependence on $x$. Note that the estimate $\hat{f}'_a(y)$ above is a regression of $\phi_a^{(1)}(y - Y_i)$ on a constant. Similarly, we can estimate $f'(y|x)$ as the intercept in a polynomial regression on $X_i - x$. For $X_p$ defined in (11), let

$$ \hat{f}'_a(y|x) = \delta_1'(X_p'X_p)^{-1}X_p'\, \phi_a^{(1)}(y - Y). \qquad (24) $$

The associated estimate of $V_3$ is

$$ \hat{V}_3 = \int \hat{f}'_a(y|x)^2\, dy = \int \delta_1'(X_p'X_p)^{-1}X_p'\, \phi_a^{(1)}(y - Y)\, \phi_a^{(1)}(y - Y)'\, X_p(X_p'X_p)^{-1}\delta_1\, dy $$
$$ = -\delta_1'(X_p'X_p)^{-1}X_p'\, \phi_{\bar{a}}^{(2)}(Y - Y')\, X_p(X_p'X_p)^{-1}\delta_1. \qquad (25) $$

Alternatively, we could use a fully nonparametric estimator such as a local linear estimator. However, the optimal bandwidths depend on derivatives of the conditional density that are difficult to estimate. We thus do not consider further complications.

In addition, using an analog of (16) and (24), a simple estimator of $V_2$ is

$$ \hat{V}_2 = \int \hat{f}_a^{(2)}(y|x)\, \hat{f}_a(y|x)\, dy = \int \delta_3'(X_p'X_p)^{-1}X_p'\, \phi_a(y - Y)\, \phi_a(y - Y)'\, X_p(X_p'X_p)^{-1}\delta_1\, dy $$
$$ = \delta_3'(X_p'X_p)^{-1}X_p'\, \phi_{\bar{a}}(Y - Y')\, X_p(X_p'X_p)^{-1}\delta_1 \qquad (26) $$

where the final equality uses (23).
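A sketch of (25) and (26), taking the smoothing bandwidth $a$ as given (the multiple plug-in selection of Hansen (2004) is not reproduced here); the names are ours.

```python
import numpy as np
from scipy.stats import norm

def V2_V3_hat(x, Y, X, a, p=4):
    """Estimates of V2 (eq. 26) and V3 (eq. 25) for a given bandwidth a."""
    Xp = np.vander(X - x, p + 1, increasing=True)
    A = np.linalg.solve(Xp.T @ Xp, Xp.T)                    # (Xp'Xp)^{-1} Xp'
    abar = np.sqrt(2.0) * a
    u = (Y[:, None] - Y[None, :]) / abar                    # (Y_i - Y_j) / abar
    phi_bar = norm.pdf(u) / abar                            # phi_abar(Y_i - Y_j)
    phi2_bar = (u ** 2 - 1) * norm.pdf(u) / abar ** 3       # phi_abar^(2)(Y_i - Y_j)
    V3 = -A[0] @ phi2_bar @ A[0]                            # eq. (25)
    V2 = A[2] @ phi_bar @ A[0]                              # eq. (26)
    return V2, V3
```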

4.5 Plug-In Bandwidth

Our plug-in estimate of (8) is

$$ M(h, b) = \frac{R\,(\tilde{V}_0 - h\psi)}{\hat{g}\, n\, b} + \frac{b^4}{4}\tilde{V}_1 - \frac{h^2 b^2}{2}\hat{V}_2 + \frac{h^4}{4}\hat{V}_3. \qquad (27) $$

The plug-in bandwidth pair $(\tilde{h}, \tilde{b})$ jointly minimizes $M(h, b)$. A closed-form solution is not available, so the function needs to be numerically minimized. One caveat is that we must constrain $h < \tilde{V}_0/\psi$ to ensure a sensible solution. (When $h > \tilde{V}_0/\psi$, $M$ is strictly increasing in $b$, so unconstrained minimization sets $b = 0$, which is not sensible.) As a practical matter it makes sense to bound $h$ more substantially; we suggest the constraint $h \le \tilde{V}_0/(2\psi)$, and this is done in our numerical applications.

The plug-in bandwidth for the unsmoothed estimator $\bar{F}$ is found by setting $h = 0$ and minimizing over $b$ to find

$$ \hat{b} = \left( \frac{R\,\tilde{V}_0}{\hat{g}\,\tilde{V}_1\, n} \right)^{1/5}. $$

To summarize, the computation algorithm is as follows. First, $\hat{g}$ is calculated from (9) using (10). Second, for some $p \ge 4$, $\hat{V}_0$, $\hat{V}_1$ and $\hat{V}_4$ are calculated using (13), (17) and (20). Third, using the bandwidth (15), $\tilde{V}_0$ is calculated using (14). Fourth, using the bandwidth (21), $\tilde{V}_1$ is calculated using (19). Fifth, the bandwidth $a$ is calculated using the method of Hansen (2004), allowing $\hat{V}_2$ and $\hat{V}_3$ to be calculated using (26) and (25). Finally, $\hat{g}$, $\tilde{V}_0$, $\tilde{V}_1$, $\hat{V}_2$ and $\hat{V}_3$ are inserted into (27), which is numerically minimized over $(h, b)$ to obtain $(\tilde{h}, \tilde{b})$.
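The joint minimization of (27) can be sketched with a generic numerical optimizer. The routine below assumes the components $\hat{g}$, $\tilde{V}_0$, $\tilde{V}_1$, $\hat{V}_2$, $\hat{V}_3$ have already been computed; the optimizer choice and starting values are ours, not the author's.

```python
import numpy as np
from scipy.optimize import minimize

def plugin_bandwidths(n, g_hat, V0, V1, V2, V3):
    """Minimize the plug-in MISE estimate M(h, b) of eq. (27) over (h, b)."""
    R, psi = 1 / (2 * np.sqrt(np.pi)), 1 / np.sqrt(np.pi)   # Gaussian-kernel constants

    def M(theta):
        h, b = theta
        return (R * (V0 - h * psi) / (g_hat * n * b)
                + b ** 4 / 4 * V1 - h ** 2 * b ** 2 / 2 * V2 + h ** 4 / 4 * V3)

    b0 = (R * V0 / (g_hat * V1 * n)) ** 0.2       # unsmoothed plug-in bandwidth as starting value
    h_max = V0 / (2 * psi)                        # bound on h suggested in Section 4.5
    res = minimize(M, x0=[h_max / 2, b0], method="L-BFGS-B",
                   bounds=[(1e-8, h_max), (1e-8, None)])
    h_tilde, b_tilde = res.x
    return h_tilde, b_tilde
```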

5 Simulation

The performance of the CDF estimators is investigated in a simple simulation experiment using bivariate data. The variables $X_i$ are generated as iid $N(0,1)$. The variable $Y_i$ is generated as $Y_i = X_i + e_i$ where $e_i$ is iid, independent of $X_i$. The error $e_i$ is distributed as $G$, one of the first nine mixture-normal distributions from Marron and Wand (1992). Consequently, the true CDF for $Y_i$ is $G(y - x)$. For each choice of $G$ we generated 1,000 independent samples of size $n = 50$, $100$ and $300$. For each sample we estimated the CDF at $x = 1$ and $y \in \{-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0\}$, using the NW, SNW, LL and SLL estimators. Our measure of precision is the exact mean-squared error, averaged over these values of $y$ (as an approximation to the MISE). The standard normal kernel was used for all estimates.

We first abstracted from the issue of bandwidth selection and evaluated the precision of the four estimators using the infeasible finite-sample-optimal bandwidths, calculated by quasi-Newton minimization. The results are presented in the first three columns of Table 1. We have taken the unsmoothed NW estimator as a baseline, and have divided the MISE of the other estimators by that of the NW estimator. Thus the numbers in Table 1 represent the improvement attained relative to the NW estimator. The results show that across models and sample sizes, the SLL estimator achieves the lowest MISE. In each case, using local linear estimation rather than local averaging reduces the MISE, and using smoothing lowers the MISE.

We next investigated the performance of our plug-in bandwidths $\hat{b}$ for the LL estimator and $(\tilde{h}, \tilde{b})$ for the SLL estimator. On each sample described above we calculated these plug-in bandwidths and CDF estimators, and evaluated the MISE across the 1,000 samples. The results are presented in the final two columns of Table 1. As before, the MISE is normalized by the MISE of the infeasible NW estimator. Thus the numbers represent the improvement attained by the feasible plug-in estimators relative to this infeasible estimator. In each case, the plug-in SLL estimator has lower MISE than the plug-in LL estimator. In some cases the difference is minor; in other cases it is quite substantial (the largest gains are for Model #5, where the MISE is reduced by about 25%). In each case the MISE is higher than if the infeasible optimal bandwidth had been used, in some cases substantially higher, suggesting that further improvements in bandwidth choice may be possible.

Overall, the simulation evidence confirms the implications of the asymptotic analysis. We can recommend use of the SLL estimator (with our proposed plug-in bandwidths) for empirical practice.

6 GDP Growth Forecasting

We illustrate the method with a simple application to the U.S. real GDP growth rate. Let $GDP_i$ denote the level of quarterly real GDP and let $y_i = 400(\ln(GDP_i) - \ln(GDP_{i-1}))$ denote the annualized quarterly growth rate. Set $x_i = y_{i-1}$. We restrict the sample to run from the third quarter of 1984 to the first quarter of 2004 (79 observations). This choice is consistent with the decline in output volatility documented by McConnell and Perez-Quiros (2000).

We estimated the CDF of $y_i$ given $x_i = x$ using the local linear (LL) and smoothed local linear (SLL) estimators, fixing $x$ to equal 2.0% and 4.5%, which are approximately the 25th and 75th quantiles of the unconditional distribution. The CDF was evaluated at 51 evenly spaced increments between -2 and 7, and the bandwidths were calculated by the plug-in method described in Section 4.5. To provide a contrast, we also estimated the CDF using a homoskedastic Gaussian AR(1).

Figure 1 displays the three CDFs for the case $x = 2.0$ and Figure 2 for the case $x = 4.5$. In both cases the three estimates are meaningfully different from one another. The parametric estimate has a fairly different location and variance. The two nonparametric estimates have the same basic shape, but the unsmoothed local linear estimate is visually quite erratic, while the smoothed local linear estimate appears to apply an intuitively correct degree of smoothing.

It is also interesting to note that the LL estimator uses a plug-in bandwidth of $\hat{b} = 1.59$ in Figure 1 and $\hat{b} = 2.13$ in Figure 2, while the SLL estimator uses $\tilde{b} = 1.44$, $\tilde{h} = 0.53$ and $\tilde{b} = 2.0$, $\tilde{h} = 0.23$, respectively. In both cases the plug-in rule selects similar values of $b$ for the two estimators (with that for the SLL slightly smaller), and selects a considerably smaller value for $h$ than for $b$.

7 Conclusion

We have shown how to combine local linear methods and smoothing to produce good nonparametric estimates of the conditional distribution function for bivariate data. We have shown that meaningful improvements in estimation efficiency can be achieved by using these techniques. Furthermore, we have derived data-based plug-in rules for bandwidth selection which appear to work in practice. These tools should prove useful for empirical application.

This analysis has been confined to bivariate data. An important extension would be to the case of multivariate data. In particular, this extension would require the study and development of feasible bandwidth selection rules.

8 Appendix: Proof of Theorem 1

The proof of Theorem 1 is based on the following set of expansions.

Lemma 1 Let $g^{(s)} = (d^s/dx^s)\, g(x)$ and $K_h^*(Y_i) = K_h(y - Y_i) - F(y|x)$. Then

$$ E w_i = g(x) + \frac{b^2}{2} g^{(2)}(x) + O(b^4) \qquad (28) $$
$$ E\left(w_i (X_i - x)\right) = b^2 g^{(1)}(x) + O(b^4) \qquad (29) $$
$$ E\left(w_i (X_i - x)^2\right) = b^2 g(x) + O(b^4) \qquad (30) $$
$$ E\left(w_i K_h^*(Y_i)\right) = \frac{b^2}{2} F^{(2)}(y|x)\, g(x) + b^2 F^{(1)}(y|x)\, g^{(1)}(x) + \frac{h^2}{2} f'(y|x)\, g(x) + O(b^4) \qquad (31) $$
$$ E\left(w_i (X_i - x) K_h^*(Y_i)\right) = b^2 F^{(1)}(y|x)\, g(x) + O(b^4) \qquad (32) $$
$$ E\left(w_i^2 K_h^*(Y_i)^2\right) = \frac{R}{b}\, g(x)\left(F(y|x)(1 - F(y|x)) - h\psi f(y|x)\right) + O(b) \qquad (33) $$
$$ E\left(w_i^2 (X_i - x)^2 K_h^*(Y_i)^2\right) = O(b) \qquad (34) $$

Proof. Throughout, we make use of the assumption that $h = O(b)$ to simplify the bounds. The derivations make use of the following Taylor series expansions, which are valid under the assumption that the functions are fourth-order differentiable:

$$ g(x - bv) = g(x) - bv\, g^{(1)}(x) + \frac{b^2 v^2}{2} g^{(2)}(x) - \frac{b^3 v^3}{6} g^{(3)}(x) + O(b^4) \qquad (35) $$
$$ F(y - hu|v) = F(y|v) - hu\, f(y|v) + \frac{h^2 u^2}{2} f'(y|v) - \frac{h^3 u^3}{6} f''(y|v) + O(h^4) \qquad (36) $$
$$ F(y|x - bv) = F(y|x) - bv\, F^{(1)}(y|x) + \frac{b^2 v^2}{2} F^{(2)}(y|x) - \frac{b^3 v^3}{6} F^{(3)}(y|x) + O(b^4) \qquad (37) $$
$$ f'(y|x - bv) = f'(y|x) - bv\, \frac{\partial}{\partial y} f^{(1)}(y|x) + O(b^2). \qquad (38) $$

To show (28), by a change of variables and (35),

$$ E w_i = \int \frac{1}{b}\, w\!\left(\frac{x - v}{b}\right) g(v)\, dv = \int w(v)\, g(x - bv)\, dv $$
$$ = \int w(v)\left(g(x) - bv\, g^{(1)}(x) + \frac{b^2 v^2}{2} g^{(2)}(x) - \frac{b^3 v^3}{6} g^{(3)}(x)\right) dv + O(b^4) = g(x) + \frac{b^2}{2} g^{(2)}(x) + O(b^4). $$

Note that this makes use of the facts that $\int w(v)\, dv = 1$, $\int w(v)\, v\, dv = 0$, $\int w(v)\, v^2\, dv = 1$ and $\int w(v)\, v^3\, dv = 0$. Similarly,

$$ E\left(w_i (X_i - x)\right) = \int (v - x)\, \frac{1}{b}\, w\!\left(\frac{x - v}{b}\right) g(v)\, dv = -b \int v\, w(v)\, g(x - bv)\, dv $$
$$ = -b \int v\, w(v)\left(g(x) - bv\, g^{(1)}(x) + \frac{b^2 v^2}{2} g^{(2)}(x)\right) dv + O(b^4) = b^2 g^{(1)}(x) + O(b^4), $$

yielding (29), and

$$ E\left(w_i (X_i - x)^2\right) = \int (v - x)^2\, \frac{1}{b}\, w\!\left(\frac{x - v}{b}\right) g(v)\, dv = b^2 \int v^2 w(v)\, g(x - bv)\, dv = b^2 g(x) + O(b^4), $$

yielding (30).

To show (31) and (32), first observe that by integration by parts, a change of variables and (36),

$$ \int K_h(y - u)\, f(u|v)\, du = \int k_h(y - u)\, F(u|v)\, du = \int k(u)\, F(y - hu|v)\, du $$
$$ = \int k(u)\left(F(y|v) - hu\, f(y|v) + \frac{h^2 u^2}{2} f'(y|v) - \frac{h^3 u^3}{6} f''(y|v)\right) du + O(h^4) = F(y|v) + \frac{h^2}{2} f'(y|v) + O(h^4). \qquad (39) $$

By a change of variables, and using (35) and (37),

$$ \int F(y|v)\, \frac{1}{b}\, w\!\left(\frac{x - v}{b}\right) g(v)\, dv = \int F(y|x - bv)\, g(x - bv)\, w(v)\, dv $$
$$ = \int \left(F(y|x) - bv F^{(1)}(y|x) + \frac{b^2v^2}{2}F^{(2)}(y|x) - \frac{b^3v^3}{6}F^{(3)}(y|x)\right)\left(g(x) - bv g^{(1)}(x) + \frac{b^2v^2}{2}g^{(2)}(x) - \frac{b^3v^3}{6}g^{(3)}(x)\right) w(v)\, dv + O(b^4) $$
$$ = F(y|x)\,g(x) + \frac{b^2}{2}F(y|x)\,g^{(2)}(x) + \frac{b^2}{2}F^{(2)}(y|x)\,g(x) + b^2 F^{(1)}(y|x)\,g^{(1)}(x) + O(b^4). \qquad (40) $$

Similarly, by a change of variables and (35) and (38),

$$ \int f'(y|v)\, \frac{1}{b}\, w\!\left(\frac{x - v}{b}\right) g(v)\, dv = \int f'(y|x - bv)\, g(x - bv)\, w(v)\, dv = f'(y|x)\, g(x) + O(b^2). \qquad (41) $$

Together, (28), (39), (40) and (41) show that

$$ E\left(w_i K_h^*(Y_i)\right) = E\left(w_i K_h(y - Y_i)\right) - E(w_i)\, F(y|x) $$
$$ = \int\!\!\int \frac{1}{b}\, w\!\left(\frac{x - v}{b}\right) K_h(y - u)\, f(u|v)\, g(v)\, du\, dv - \left(g(x) + \frac{b^2}{2}g^{(2)}(x)\right)F(y|x) + O(b^4) $$
$$ = \int \left(F(y|v) + \frac{h^2}{2} f'(y|v)\right) \frac{1}{b}\, w\!\left(\frac{x - v}{b}\right) g(v)\, dv - \left(g(x) + \frac{b^2}{2}g^{(2)}(x)\right)F(y|x) + O(b^4) $$
$$ = \frac{h^2}{2} f'(y|x)\, g(x) + \frac{b^2}{2} F^{(2)}(y|x)\, g(x) + b^2 F^{(1)}(y|x)\, g^{(1)}(x) + O(b^4), $$

which is (31).

Similarly, using (29), (39) and similar derivations,

$$ E\left(w_i (X_i - x) K_h^*(Y_i)\right) = E\left(w_i (X_i - x) K_h(y - Y_i)\right) - E\left(w_i (X_i - x)\right) F(y|x) $$
$$ = \int\!\!\int (v - x)\, \frac{1}{b}\, w\!\left(\frac{x - v}{b}\right) K_h(y - u)\, f(u|v)\, g(v)\, du\, dv - b^2 g^{(1)}(x) F(y|x) + O(b^4) $$
$$ = -b \int v\, F(y|x - bv)\, g(x - bv)\, w(v)\, dv - b\,\frac{h^2}{2} \int v\, f'(y|x - bv)\, g(x - bv)\, w(v)\, dv - b^2 g^{(1)}(x) F(y|x) + O(b^4) $$
$$ = b^2 F^{(1)}(y|x)\, g(x) + O(b^4), $$

which is (32).

To show (33), first observe that by a change of variables and (35),

$$ E w_i^2 = \int \frac{1}{b^2}\, w\!\left(\frac{x - v}{b}\right)^2 g(v)\, dv = \frac{1}{b} \int w(v)^2 g(x - bv)\, dv $$
$$ = \frac{1}{b} \int w(v)^2 \left(g(x) - bv\, g^{(1)}(x)\right) dv + O(b) = \frac{R\, g(x)}{b} + O(b). \qquad (42) $$

Second, using (39), a change of variables, (35) and (37),

$$ E\left(w_i^2 K_h(y - Y_i)\right) = \int\!\!\int \frac{1}{b^2}\, w\!\left(\frac{x - v}{b}\right)^2 K_h(y - u)\, f(u|v)\, g(v)\, du\, dv $$
$$ = \int \frac{1}{b^2}\, w\!\left(\frac{x - v}{b}\right)^2 g(v)\, F(y|v)\, dv + O\!\left(\frac{h^2}{b}\right) = \frac{1}{b} \int w(v)^2 g(x - bv)\, F(y|x - bv)\, dv + O\!\left(\frac{h^2}{b}\right) $$
$$ = \frac{1}{b} \int w(v)^2 \left(g(x) - bv\, g^{(1)}(x)\right)\left(F(y|x) - bv\, F^{(1)}(y|x)\right) dv + O(b) = \frac{R\, g(x)\, F(y|x)}{b} + O(b). \qquad (43) $$

Third, by integration by parts, a change of variables, and (36),

$$ \int K_h(y - u)^2 f(u|v)\, du = \int 2 K_h(y - u)\, k_h(y - u)\, F(u|v)\, du = 2 \int K(u)\, k(u)\, F(y - hu|v)\, du $$
$$ = 2 \int K(u)\, k(u) \left(F(y|v) - hu\, f(y|v)\right) du + O(h^2) = F(y|v) - h\psi\, f(y|v) + O(h^2). \qquad (44) $$

The final equality uses the definition (1) and the fact that the symmetry of $k(u)$ about zero implies $2\int K(u)\, k(u)\, du = 1$.

Fourth, using (44), (35), (37) and (38),

$$ E\left(w_i^2 K_h(y - Y_i)^2\right) = \int\!\!\int \frac{1}{b^2}\, w\!\left(\frac{x - v}{b}\right)^2 K_h(y - u)^2 f(u|v)\, g(v)\, du\, dv $$
$$ = \int \frac{1}{b^2}\, w\!\left(\frac{x - v}{b}\right)^2 \left(F(y|v) - h\psi f(y|v)\right) g(v)\, dv + O\!\left(\frac{h^2}{b}\right) $$
$$ = \frac{1}{b} \int w(v)^2 \left(F(y|x - bv) - h\psi f(y|x - bv)\right) g(x - bv)\, dv + O\!\left(\frac{h^2}{b}\right) = \frac{R\, g(x)}{b}\left(F(y|x) - h\psi\, f(y|x)\right) + O(b). \qquad (45) $$

Together, (42), (43) and (45) show

$$ E\left(w_i^2 K_h^*(Y_i)^2\right) = E\left(w_i^2 K_h(y - Y_i)^2\right) - 2 E\left(w_i^2 K_h(y - Y_i)\right) F(y|x) + E\left(w_i^2\right) F(y|x)^2 $$
$$ = \frac{R\, g(x)}{b}\left(F(y|x) - h\psi f(y|x)\right) - \frac{2R\, g(x)}{b} F(y|x)^2 + \frac{R\, g(x)}{b} F(y|x)^2 + O(b) $$
$$ = \frac{R\, g(x)}{b}\left(F(y|x)(1 - F(y|x)) - h\psi\, f(y|x)\right) + O(b), $$

which is (33). (34) is shown similarly.

Proof of Theorem 1: From equations (29)-(30),

$$ E\hat{\beta} \simeq \left(E\left(w_i(X_i - x)^2\right)\right)^{-1} E\left(w_i(X_i - x)\right) + O(b^2) = g(x)^{-1} g^{(1)}(x) + O(b^2). $$

Combined with (28), this yields

$$ E w_i^* = E w_i - E\hat{\beta}\, E\left(w_i (X_i - x)\right) + O(b^2) = g(x) + O(b^2). \qquad (46) $$

Next, using equations (31) and (32),

$$ E\left(w_i^* K_h^*(Y_i)\right) = E\left(w_i K_h^*(Y_i)\right) - E\hat{\beta}\, E\left(w_i (X_i - x) K_h^*(Y_i)\right) + O(b^4) = \frac{b^2}{2} F^{(2)}(y|x)\, g(x) + \frac{h^2}{2} f'(y|x)\, g(x) + O(b^4). $$

Together with (46) we find

$$ E\bar{F}_{h,b}(y|x) - F(y|x) = \frac{E\left(w_i^* K_h^*(Y_i)\right)}{E w_i^*} + O(b^4) = \frac{1}{2}\left(b^2 F^{(2)}(y|x) + h^2 f'(y|x)\right) + O(b^4). $$

Similarly, using (33) and (34),

$$ \mathrm{Var}\left(\bar{F}_{h,b}(y|x)\right) = \frac{1}{n}\,\frac{\mathrm{Var}\left(w_i^* K_h(y - Y_i)\right)}{\left(E w_i^*\right)^2} = \frac{1}{n}\left(\frac{E\!\left[\left(w_i^* K_h^*(Y_i)\right)^2\right]}{g(x)^2} + O(b)\right) = \frac{R\left(F(y|x)(1 - F(y|x)) - h\psi f(y|x)\right)}{g(x)\, n\, b} + O\!\left(n^{-6/5}\right). $$

Together, these expressions and the assumption that $b = c n^{-1/5}$ establish that

$$ E\left(\bar{F}_{h,b}(y|x) - F(y|x)\right)^2 = \frac{R\left(F(y|x)(1 - F(y|x)) - \psi h f(y|x)\right)}{g(x)\, n\, b} + \frac{b^4}{4} F^{(2)}(y|x)^2 + \frac{h^2 b^2}{2} f'(y|x)\, F^{(2)}(y|x) + \frac{h^4}{4} f'(y|x)^2 + O\!\left(n^{-6/5}\right). $$

Integrating over $y$ we obtain

$$ \int E\left(\bar{F}_{h,b}(y|x) - F(y|x)\right)^2 dy = \frac{R\left(\int F(y|x)(1 - F(y|x))\, dy - \psi h \int f(y|x)\, dy\right)}{g(x)\, n\, b} $$
$$ + \frac{b^4}{4} \int F^{(2)}(y|x)^2\, dy + \frac{h^2 b^2}{2} \int f'(y|x)\, F^{(2)}(y|x)\, dy + \frac{h^4}{4} \int f'(y|x)^2\, dy + O\!\left(n^{-6/5}\right), $$

which equals the expression in (8), using $\int f(y|x)\, dy = 1$ and the equivalence

$$ \int f'(y|x)\, F^{(2)}(y|x)\, dy = -\int f(y|x)\, f^{(2)}(y|x)\, dy, $$

which holds by integration by parts. This completes the proof.

Table 1: Normalized MISE, Conditional Distribution Function Estimators

For each sample size ($n = 50$, $100$ and $300$), the table reports, for each of the first nine Marron-Wand error densities (Gaussian, Skewed Unimodal, Strongly Skewed, Kurtotic Unimodal, Outlier, Bimodal, Separated Bimodal, Asymmetric Bimodal, Trimodal), the MISE of the SNW, LL and SLL estimators under the optimal bandwidth, and of the LL and SLL estimators under the plug-in bandwidths, each normalized by the MISE of the NW estimator with the optimal bandwidth. [Numerical entries not recovered in this transcription.]

References

[1] Fan, Jianqing and Irene Gijbels (1996): Local Polynomial Modelling and Its Applications. New York: Chapman and Hall.

[2] Fan, Jianqing and Qiwei Yao (2003): Nonlinear Time Series: Nonparametric and Parametric Methods. New York: Springer.

[3] Hall, Peter, Rodney C. L. Wolff, and Qiwei Yao (1999): "Methods for Estimating a Conditional Distribution Function," Journal of the American Statistical Association, 94.

[4] Hansen, Bruce E. (2004): "Bandwidth Selection for Nonparametric Distribution Estimation," manuscript, University of Wisconsin.

[5] Jones, M. C. and S. J. Sheather (1991): "Using non-stochastic terms to advantage in kernel-based estimation of integrated squared density derivatives," Statistics and Probability Letters, 11.

[6] Marron, J. S. and M. P. Wand (1992): "Exact mean integrated squared error," Annals of Statistics, 20.

[7] McConnell, M. M. and G. Perez-Quiros (2000): "Output fluctuations in the United States: What has changed since the early 1980s?" American Economic Review, 90.


Local linear multivariate. regression with variable. bandwidth in the presence of. heteroscedasticity Model ISSN 1440-771X Department of Econometrics and Business Statistics http://www.buseco.monash.edu.au/depts/ebs/pubs/wpapers/ Local linear multivariate regression with variable bandwidth in the presence

More information

NCoVaR Granger Causality

NCoVaR Granger Causality NCoVaR Granger Causality Cees Diks 1 Marcin Wolski 2 1 Universiteit van Amsterdam 2 European Investment Bank Bank of Italy Rome, 26 January 2018 The opinions expressed herein are those of the authors and

More information

Boundary Correction Methods in Kernel Density Estimation Tom Alberts C o u(r)a n (t) Institute joint work with R.J. Karunamuni University of Alberta

Boundary Correction Methods in Kernel Density Estimation Tom Alberts C o u(r)a n (t) Institute joint work with R.J. Karunamuni University of Alberta Boundary Correction Methods in Kernel Density Estimation Tom Alberts C o u(r)a n (t) Institute joint work with R.J. Karunamuni University of Alberta November 29, 2007 Outline Overview of Kernel Density

More information

Econ 423 Lecture Notes: Additional Topics in Time Series 1

Econ 423 Lecture Notes: Additional Topics in Time Series 1 Econ 423 Lecture Notes: Additional Topics in Time Series 1 John C. Chao April 25, 2017 1 These notes are based in large part on Chapter 16 of Stock and Watson (2011). They are for instructional purposes

More information

Efficient estimation of a semiparametric dynamic copula model

Efficient estimation of a semiparametric dynamic copula model Efficient estimation of a semiparametric dynamic copula model Christian Hafner Olga Reznikova Institute of Statistics Université catholique de Louvain Louvain-la-Neuve, Blgium 30 January 2009 Young Researchers

More information

Depth versus Breadth in Convolutional Polar Codes

Depth versus Breadth in Convolutional Polar Codes Depth versus Breadth in Convolutional Polar Codes Maxime Tremlay, Benjamin Bourassa and David Poulin,2 Département de physique & Institut quantique, Université de Sherrooke, Sherrooke, Quéec, Canada JK

More information

Nonparametric estimation and symmetry tests for

Nonparametric estimation and symmetry tests for Nonparametric estimation and symmetry tests for conditional density functions Rob J Hyndman 1 and Qiwei Yao 2 6 July 2001 Abstract: We suggest two improved methods for conditional density estimation. The

More information

Bayesian inference with reliability methods without knowing the maximum of the likelihood function

Bayesian inference with reliability methods without knowing the maximum of the likelihood function Bayesian inference with reliaility methods without knowing the maximum of the likelihood function Wolfgang Betz a,, James L. Beck, Iason Papaioannou a, Daniel Strau a a Engineering Risk Analysis Group,

More information

Spectrum Opportunity Detection with Weak and Correlated Signals

Spectrum Opportunity Detection with Weak and Correlated Signals Specum Opportunity Detection with Weak and Correlated Signals Yao Xie Department of Elecical and Computer Engineering Duke University orth Carolina 775 Email: yaoxie@dukeedu David Siegmund Department of

More information

Data-Based Choice of Histogram Bin Width. M. P. Wand. Australian Graduate School of Management. University of New South Wales.

Data-Based Choice of Histogram Bin Width. M. P. Wand. Australian Graduate School of Management. University of New South Wales. Data-Based Choice of Histogram Bin Width M. P. Wand Australian Graduate School of Management University of New South Wales 13th May, 199 Abstract The most important parameter of a histogram is the bin

More information