Transformation and Smoothing in Sample Survey Data


Scandinavian Journal of Statistics, Vol. 37, 2010. Published by Blackwell Publishing Ltd.

Transformation and Smoothing in Sample Survey Data

YANYUAN MA, Department of Statistics, Texas A&M University
ALAN H. WELSH, Centre for Mathematics and its Applications, Australian National University

ABSTRACT. We consider model-based prediction of a finite population total when a monotone transformation of the survey variable makes it appropriate to assume additive, homoscedastic errors. As the transformation to achieve this does not necessarily simultaneously produce an easily parameterized mean function, we assume only that the mean is a smooth function of the auxiliary variable and estimate it non-parametrically. The back transformation of predictions obtained on the transformed scale introduces bias which we remove using smearing. We obtain an asymptotic expansion for the prediction error which shows that prediction bias is asymptotically negligible and the prediction mean-squared error (MSE) using a non-parametric model remains in the same order as when a parametric model is adopted. The expansion also shows the effect of smearing on the prediction MSE and can be used to compute the asymptotic prediction MSE. We propose a model-based bootstrap estimate of the prediction MSE. The predictor produces competitive results in terms of bias and prediction MSE in a simulation study, and performs well on a population constructed from an Australian farm survey.

Key words: bootstrap, finite population prediction, model-based inference, non-parametric prediction, retransformation, transformation bias

1. Introduction

Consider a finite population of N units in which a survey variable Y has population values $Y_1, \dots, Y_N$ and an auxiliary variable X has population values $X_1, \dots, X_N$. For simplicity, we assume that X is a real variable; generalizations to vector X are reasonably straightforward. We assume that the values of X are known for all N population units, but that the values of Y are known only for a sample s of $n \le N$ population units. Furthermore, the process used to select the population units to include in the sample is assumed to be conditionally independent of the values of Y given the values of X. Once the sample has been selected, the values of $Y_i$, $i \in s$ are known. The problem is to use $Y_i$, $i \in s$ and $X_i$, $i = 1, \dots, N$ to predict the unknown finite population total $T = \sum_{i=1}^{N} Y_i$. In this article, we propose a model-based transformation approach which incorporates smoothing and smearing to predict T by
$$\hat{T} = \sum_{i \in s} Y_i + \sum_{j \notin s} n^{-1} \sum_{i \in s} g[\hat{m}(X_j) + \{g^{-1}(Y_i) - \hat{m}(X_i)\}], \qquad (1)$$
where g is a known transformation and $\hat{m}$ is either a parametric or a non-parametric estimator of the mean of the transformed survey variable $g^{-1}(Y)$. In the non-parametric case, $\hat{m}(x) = \sum_{k \in s} w_k(x) g^{-1}(Y_k)$, where the $w_k(x)$ are weights (such as the Nadaraya-Watson, local linear or Gasser-Müller weights) constructed from a kernel function K with bandwidth h > 0. When g is the identity function, there is no need for smearing so the second term in (1) simplifies to $\sum_{j \notin s} \hat{m}(X_j)$. In the remainder of this section, we give an intuitive motivation for our use of (1) to predict T and then discuss its theoretical properties.
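To make the construction concrete, the following is a minimal sketch of predictor (1) with a Nadaraya-Watson smoother, assuming the log-transformation (so g = exp); the function names, sample sizes and toy data here are illustrative assumptions, not the paper's actual code or design.

```python
import numpy as np

def nw_weights(x0, x_sample, h):
    # Nadaraya-Watson weights w_k(x0) built from an Epanechnikov kernel
    u = (x0 - x_sample) / h
    k = 0.75 * (1.0 - u**2) * (np.abs(u) < 1.0)
    total = k.sum()
    return k / total if total > 0 else np.full_like(k, 1.0 / len(k))

def smeared_total(y_s, x_s, x_ns, h, g=np.exp, g_inv=np.log):
    # Predictor (1): sample total plus smeared back-transformed predictions
    z = g_inv(y_s)                          # transformed survey variable
    m_hat_s = np.array([nw_weights(xi, x_s, h) @ z for xi in x_s])
    resid = z - m_hat_s                     # residuals on the transformed scale
    t_hat = y_s.sum()
    for xj in x_ns:
        m_hat_j = nw_weights(xj, x_s, h) @ z
        t_hat += g(m_hat_j + resid).mean()  # smearing over the sample residuals
    return t_hat

# Toy illustration: N = 800, n = 200 (not the paper's design)
rng = np.random.default_rng(0)
x = rng.uniform(0, 4, 800)
y = np.exp(0.8 * (x - 2)**2 + rng.normal(size=800))
s = rng.choice(800, 200, replace=False)
ns = np.setdiff1d(np.arange(800), s)
print(smeared_total(y[s], x[s], x[ns], h=200**-0.5 * 4), y.sum())
```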

The model-based approach to predicting T (see, e.g. chapter 2 of Valliant et al., 2000) is based on predictors of the general form
$$\hat{T} = \sum_{i \in s} Y_i + \sum_{j \notin s} \hat{Y}_j, \qquad (2)$$
where $\hat{Y}_j$ is an estimator of $E(Y_j \mid Y_i, i \in s, X_1, \dots, X_N)$, $j \notin s$. Under the usual assumption that the pairs $(Y_i, X_i)$ are independent, $E(Y_j \mid Y_i, i \in s, X_1, \dots, X_N) = E(Y_j \mid X_j)$ and $\hat{Y}_j$ is taken to be the fitted value from a parametric regression model relating Y and X. The parametric regression model is commonly taken to be linear but it is convenient for later comparison to allow the more general formulation. If the relationship between Y and X does not follow the assumed parametric form, it may be possible to transform Y so that the assumed parametric form holds on the transformed scale $g^{-1}(Y)$. That is,
$$g^{-1}(Y_i) = m(X_i) + \sigma \epsilon_i, \qquad (3)$$
where $m(\cdot) = m(\cdot, \theta)$ is the regression function which is known up to the unknown regression parameter $\theta$, $\sigma$ is an unknown scale parameter and the $\{\epsilon_i\}$ are i.i.d. with mean zero and variance one. (We also allow transformation of X but this does not introduce any bias so it is not necessary to make it explicit.) If $\hat\theta_g$ is an estimator of $\theta$ on the transformed scale, then we can predict $g^{-1}(Y_j)$ by $m(X_j, \hat\theta_g)$ and, after back transforming, predict $Y_j$ by $\hat{Y}_j = g\{m(X_j, \hat\theta_g)\}$. This simple predictor is biased for $Y_j$ because it estimates $g\{m(X_j, \theta)\}$ rather than $E[g\{m(X_j, \theta) + \sigma\epsilon\} \mid X_j]$. Even if the bias for predicting an individual $Y_j$ is small when N and n are large, there are $N - n$ such terms in the predictor (2) so the bias accumulates and can make $\hat{T}$ severely biased. If we are prepared to assume an analytic form for the distribution of $\epsilon$, we can try to adjust for this bias; for example, Karlberg (2000a,b) gives bias adjustments based on the log-transformation and the properties of the log-normal distribution. Alternatively, as pointed out by Chambers & Dorfman (2003b), we can use model calibration (Wu & Sitter, 2001) or smearing (Duan, 1983) to remove the transformation bias. We focus on smearing, which estimates the expectation by an empirical average over the sample residuals, leading to the predictor
$$\hat{Y}_j = n^{-1} \sum_{i \in s} g[m(X_j, \hat\theta_g) + \{g^{-1}(Y_i) - m(X_i, \hat\theta_g)\}].$$
This kind of prediction of individual observations is analysed in a very general (infinite population) context in Welsh & Zhou (2006). We have chosen to use smearing because it is conceptually simple, flexible enough to apply when m is non-parametric and fits into the model-based framework. In contrast, implementing model calibration when m is non-parametric is awkward and, as a model-assisted method which incorporates model information into design-based procedures, model calibration does not fit naturally into our model-based approach. Chambers & Dorfman (2003b) point out that, in practice, both model calibration and smearing assume that the specified model fits the data on the transformed scale and can perform poorly when it does not. In particular, mis-specification of the regression function, a high number of zeros in the data and outliers can all lead to poor performance. The incorporation of a delta spike for handling zeros and robust estimation for handling outliers are treated in general in Welsh & Zhou (2006); in the finite population context, Chambers & Dorfman (2003b) suggested using robust estimates and incorporating robustness weights into the smearing predictor to deal with outliers.
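As a concrete illustration of the parametric smeared predictor just described, here is a brief sketch assuming a linear working model on the log scale (g = exp) fitted by ordinary least squares; all names and the toy data are our own illustrative assumptions.

```python
import numpy as np

def parametric_smeared_predictions(y_s, x_s, x_ns):
    # Fit m(x, theta) = theta_0 + theta_1 * x on the log scale by OLS
    z = np.log(y_s)                               # g^{-1}(Y_i)
    X = np.column_stack([np.ones_like(x_s), x_s])
    theta = np.linalg.lstsq(X, z, rcond=None)[0]
    resid = z - X @ theta                         # g^{-1}(Y_i) - m(X_i, theta_hat)
    # Smearing (Duan, 1983): average the back-transformed fitted value plus
    # each sample residual, instead of back-transforming the fit alone
    fits_ns = theta[0] + theta[1] * x_ns
    return np.array([np.exp(f + resid).mean() for f in fits_ns])

# Toy illustration of predictor (2) built from these smeared predictions
rng = np.random.default_rng(1)
x = rng.uniform(0, 4, 500)
y = np.exp(1.0 + 0.5 * x + 0.7 * rng.normal(size=500))
s, ns = np.arange(150), np.arange(150, 500)
print(y[s].sum() + parametric_smeared_predictions(y[s], x[s], x[ns]).sum(), y.sum())
```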
Both these ideas can be incorporated into the methodology we develop but, for simplicity, in this article, we consider only the less well-addressed issue of mis-specification. We adopt a non-parametric approach in which we treat m as a smooth function and estimate it non-parametrically. In (2) we use the smeared predictor $\hat{Y}_j = n^{-1} \sum_{i \in s} g\{\hat{m}(X_j) + g^{-1}(Y_i) - \hat{m}(X_i)\}$ to obtain (1). A general treatment of model-based non-parametric estimation of regression functions in finite population problems is given by Chambers et al. (2003). Note that model-based non-parametric estimators are different and have different properties from design-based estimators of a smooth regression function such as those considered by Breidt & Opsomer (2000).

In this article, we develop the prediction mean-squared error (MSE) properties of the parametric and non-parametric predictors without transformation and then with transformation and smearing. The calculations in each case proceed by writing the prediction error of the predictor $\hat{T}$ in (2) as
$$\hat{T} - T = \sum_{i \in s} Y_i + \sum_{j \notin s} \hat{Y}_j - \sum_{i=1}^{N} Y_i = P - \sum_{j \notin s} \{Y_j - E(Y_j \mid X_j)\}, \qquad (4)$$
where $P = \sum_{j \notin s} \{\hat{Y}_j - E(Y_j \mid X_j)\}$. Note that the two terms on the right-hand side are independent, because the first term involves only $Y_i$ for i in sample, the second term involves only $Y_i$ for i not in sample and the observations are independent. This independence between the two terms implies that the prediction MSE is
$$E(\hat{T} - T)^2 = V + \sum_{j \notin s} \operatorname{var}(Y_j \mid X_j), \qquad (5)$$
where $V = E(P^2 \mid X) = E([\sum_{j \notin s} \{\hat{Y}_j - E(Y_j \mid X_j)\}]^2 \mid X)$. The second term on the right-hand side of (5) does not depend on the method of prediction so the problem is to evaluate V for different predictors. The calculations are fairly straightforward except in the case of the non-parametric predictor with transformation and smearing, for which they are surprisingly complicated. Our main contribution is to obtain the leading terms in the expansion of $\sum_{j \notin s} \{\hat{Y}_j - E(Y_j \mid X_j)\}$ and to show that V in (5) is of order $N - n$ or, equivalently, of order n because $O(N) = O(n)$. (We will use O(n) hereafter.) This is the same order as the second term in (5) so the prediction MSE is of order n in the parametric and non-parametric cases and even when smearing the predictions to remove transformation bias. A practical issue with using non-parametric methods is that the (asymptotic) prediction MSE can be difficult to estimate. This is usually because of the bias terms; our result shows that the bias terms are of lower order than the leading stochastic terms and so can be ignored when we estimate the asymptotic MSE. In this context, however, some of the omitted terms may not be much smaller than the retained terms unless N and n are both very large. To overcome this difficulty, we propose in section 3 a model-based bootstrap procedure to estimate the prediction MSE.

The remainder of the article is organized as follows. The main results are presented in section 2. Estimation of the prediction MSE and inference are discussed in section 3 and the method is illustrated by a small simulation study in section 4. The article ends with a discussion in section 5, where the main features and findings of the proposed methods are recast and the potential limitations and possible extensions mentioned. The outline of the proof of the theory is deferred to an Appendix, and the technical details of the proof to the online Supporting Information available in connection with the online version of the article.

2. Prediction MSE results

Our main result gives the lowest order terms in the expansion of $\sum_{j \notin s} \{\hat{Y}_j - E(Y_j \mid X_j)\}$ for the non-parametric predictor with smearing. Simultaneously, we present results for the other predictors we have discussed because they are interesting in their own right, enable us to present a complete picture (including making comparisons) and provide context for the final result.
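Before turning to the conditions, we record the short argument behind the step from (4) to (5); this is a sketch using only the conditional independence noted above.

```latex
% The step from (4) to (5): a sketch assuming only the independence of the
% observations, conditional on X = (X_1, ..., X_N).
\[
\hat{T}-T = P-\sum_{j\notin s}\{Y_j-E(Y_j\mid X_j)\},\qquad
P=\sum_{j\notin s}\{\hat{Y}_j-E(Y_j\mid X_j)\}.
\]
% P depends only on the sampled Y_i (through the \hat{Y}_j), while the second
% term depends only on the non-sampled Y_j and has conditional mean zero;
% conditionally on X the two terms are therefore independent, the cross term
% vanishes, and
\[
E\{(\hat{T}-T)^2\mid X\}=E(P^2\mid X)+\sum_{j\notin s}\operatorname{var}(Y_j\mid X_j)
 = V+\sum_{j\notin s}\operatorname{var}(Y_j\mid X_j).
\]
```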

We assume throughout this section that $N \to \infty$ and $n \to \infty$ such that $0 < \lim n/N < 1$ and that $E\epsilon^4 < \infty$. Define $a(x, \epsilon) = g\{m(x) + \sigma\epsilon\}$, $b(x, \epsilon) = g'\{m(x) + \sigma\epsilon\}$ and $c(x, \epsilon) = g''\{m(x) + \sigma\epsilon\}$. Then, we assume the following functions exist and are continuous functions of x on the support of X:
$$\alpha(x) = Ea(x, \epsilon), \quad \beta(x) = Eb(x, \epsilon), \quad \gamma(x) = Ec(x, \epsilon), \quad Ea(x, \epsilon)^2, \quad Eb(x, \epsilon)^2, \quad E\epsilon^2 b(x, \epsilon)^2, \quad Ec(x, \epsilon)^2, \quad E\epsilon^4 c(x, \epsilon)^2.$$
These assumptions allow us to bound various quantities on the support of X without requiring the transformation g and its derivatives to be bounded. This is useful because the common transformations are not bounded. Note that the Cauchy-Schwarz inequality enables us to bound other versions of these quantities; for example, as $E\epsilon^2 = 1$, $E\epsilon b(x, \epsilon) \le \{Eb(x, \epsilon)^2\}^{1/2}$, etc.

We present conditions for the two smeared predictors separately. Those listed in condition A are for the parametric model $m(x, \theta)$ and those in condition B are for the non-parametric model $m(x)$. The conditions for the predictors without transformation are a subset of these conditions.

Condition A
1. The estimator $\hat\theta$ satisfies $\hat\theta - \theta = O_p(n^{-1/2})$.
2. The regression function $m(x, b)$ is differentiable in b, $m'(x, b)$ is uniformly continuous in b and $E|m'(X, \theta)| < \infty$.
3. The transformation g is monotone and differentiable and the derivative $g'$ is uniformly continuous on any compact set.

Condition (A1) allows for a wide class of estimators, including the widely used weighted least squares estimator and various robust estimators. Conditions (A2) and (A3) are smoothness conditions which allow us to expand both the regression function and the transformation when required.

For the non-parametric predictors, we consider linear estimators $\hat{m}(x) = \sum_{k \in s} w_k(x) g^{-1}(Y_k)$ of the regression function $m(x)$, where the $w_k(x)$ are user-specified weights. Let K be a kernel function and h > 0 be the bandwidth. Then, common choices of weights include the Nadaraya-Watson weights
$$w_k(x) = \frac{K\{(x - X_k)/h\}}{\sum_{i \in s} K\{(x - X_i)/h\}},$$
the local linear weights
$$w_k(x) = \frac{K\{(x - X_k)/h\}\{S_{n,2} - (X_k - x)S_{n,1}\}}{\sum_{i \in s} K\{(x - X_i)/h\}\{S_{n,2} - (X_i - x)S_{n,1}\}}, \quad \text{where } S_{n,m} = \sum_{i \in s} K\{(x - X_i)/h\}(X_i - x)^m$$
(see Fan & Gijbels, 1996), or the Gasser-Müller weights
$$w_k(x) = \frac{1}{h} \int_{(X_{k-1} + X_k)/2}^{(X_k + X_{k+1})/2} K\{(x - u)/h\} \, du.$$
As we make explicit in the following conditions, any linear smoother whose weights satisfy standard conditions can be used in the predictor.
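The following sketch shows how the first two of these weight families can be computed; it is an illustrative implementation (the names are ours), with the local linear weights following the display above.

```python
import numpy as np

def epanechnikov(u):
    return 0.75 * (1.0 - u**2) * (np.abs(u) < 1.0)

def nadaraya_watson_weights(x0, xs, h, K=epanechnikov):
    k = K((x0 - xs) / h)
    return k / k.sum()

def local_linear_weights(x0, xs, h, K=epanechnikov):
    k = K((x0 - xs) / h)
    s1 = np.sum(k * (xs - x0))            # S_{n,1}
    s2 = np.sum(k * (xs - x0)**2)         # S_{n,2}
    num = k * (s2 - (xs - x0) * s1)       # numerator for each w_k(x0)
    return num / num.sum()

xs = np.sort(np.random.default_rng(2).uniform(0, 4, 50))
for weights in (nadaraya_watson_weights, local_linear_weights):
    w = weights(2.0, xs, 0.5)
    print(w.sum())   # a linear smoother's weights sum to one at every x
```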

Condition B
1. The linear estimator $\hat{m}(x)$ satisfies $\hat{m}(x) - m(x) = h^2 C m''(x) + \sigma \sum_{k \in s} w_k(x)\epsilon_k + o(h^2) + O(n^{-1})$.
2. The weights $w_k(x)$ satisfy
$$\sum_{k \in s} w_k(X_i) = O_p(1), \quad \sum_{i \in s} w_i(X_j) = O_p(1), \quad \sum_{i \in s} w_i^2(X_\ell) = O_p\{(nh)^{-1}\}, \quad \sum_{i \in s} w_i^2(X_i) = O_p\{(nh^2)^{-1}\},$$
$$\sum_{k \in s} w_k(X_j)w_k(X_\ell) = O_p(1), \quad \sum_{k \in s} w_k(X_i)w_k(X_\ell) = O_p(1), \quad \sum_{i \in s} w_i(X_\ell)\epsilon_i = O_p\{(nh)^{-1/2}\},$$
where the results hold uniformly with respect to the subindex and to X.
3. The density p of X has compact support (so X is bounded) and $0 < c_1 < p(x) < c_2 < \infty$ on the support.
4. The regression function $m(x)$ is three times continuously differentiable.
5. The first two derivatives of the transformation g exist and the second derivative $g''$ is uniformly continuous on any compact set.
6. The bandwidth $h = O(n^{-\tau})$ with $1/4 < \tau < 1/2$.

Conditions (B1) and (B2) are satisfied by the common choices of the estimator. Condition (B3) allows us to control the behaviour of the covariates and conditions (B4) and (B5) allow us to expand both the regression function and the transformation when required. In condition (B6), the lower bound $1/4 < \tau$ has to be strict for the estimation variance to dominate the bias, whereas the upper bound $\tau < 1/2$ can actually be relaxed to $\tau \le 1/2$ without affecting the first order; this is discussed in more detail after the statement of the theorem. However, some terms are of order $h^{-1} = n^{\tau}$, which is the same order as the leading terms if $\tau = 1/2$. Thus, choosing $\tau < 1/2$ ensures that these terms are of smaller order than the leading terms and hence can be neglected asymptotically; in other words, choosing $\tau = 1/2$ does not change the order of the prediction MSE but adds extra terms to it. Note that (B6) excludes the usual $O(n^{-1/5})$ bandwidth because we are estimating the total, which is an aggregate. The estimation variance is O(n), which does not depend on h and allows us to select a relatively small bandwidth to eliminate the effect of the accumulated bias. Specifically, in calculating the total, the bias from the non-parametric regression estimator is accumulated to order $nh^2$. To contain the bias within order $n^{1/2}$, we have to exclude the usual $n^{-1/5}$ order for h. However, the relatively small bandwidth does not cause the variance order to deteriorate because of the aggregated nature of the total. This is reflected throughout the proof of the theorem.

Theorem
Suppose that either condition A or condition B holds as appropriate for the predictor. Then the first term $P = \sum_{j \notin s} \{\hat{Y}_j - E(Y_j \mid X_j)\}$ in the expansion (4) for each predictor is:
(i) Parametric prediction without transformation:
$$P_1 = \sum_{j \notin s} m'(X_j, \theta)(\hat\theta - \theta).$$

(ii) Non-parametric prediction without transformation:
$$P_2 = \sigma \sum_{j \notin s} \sum_{k \in s} w_k(X_j)\epsilon_k.$$
(iii) Parametric prediction with transformation and smearing:
$$P_3 = \sum_{j \notin s} \beta(X_j)\Big\{m'(X_j, \theta) - n^{-1} \sum_{i \in s} m'(X_i, \theta)\Big\}(\hat\theta - \theta) + n^{-1} \sum_{j \notin s} \sum_{i \in s} \{a(X_j, \epsilon_i) - \alpha(X_j)\}.$$
(iv) Non-parametric prediction with transformation and smearing:
$$P_4 = \sigma \sum_{j \notin s} \beta(X_j) \sum_{k \in s} \Big\{w_k(X_j) - n^{-1} \sum_{i \in s} w_k(X_i)\Big\}\epsilon_k + n^{-1} \sum_{j \notin s} \sum_{i \in s} \{a(X_j, \epsilon_i) - \alpha(X_j)\} - n^{-1} \sigma \sum_{j \notin s} \sum_{i \in s} w_i(X_i)[b(X_j, \epsilon_i)\epsilon_i - E\{b(X_j, \epsilon_i)\epsilon_i \mid X_j\}].$$
The prediction MSE in each case is of order O(n). That is, $E\{(\hat{T} - T)^2\} = O(n)$.

The proof is given in the Appendix. The expression in (i) for the parametric predictor is familiar (see, e.g. Valliant et al., 2000) and is included to establish a baseline. The expression in (ii) for the non-parametric predictor is similar to (i) because the bias contribution is of smaller order. As discussed in the Introduction, this occurs because of aggregation: we are predicting a sum rather than individual observations. Specifically, from (B1),
$$\sum_{j \notin s} \{\hat{Y}_j - E(Y_j \mid X_j)\} = \sum_{j \notin s} \{\hat{m}(X_j) - m(X_j)\} = h^2 C \sum_{j \notin s} m''(X_j) + \sigma \sum_{j \notin s} \sum_{k \in s} w_k(X_j)\epsilon_k + o_p(nh^2) + O_p(1),$$
so the leading term in the variance is
$$V = \sigma^2 \sum_{k \in s} \Big\{\sum_{j \notin s} w_k(X_j)\Big\}^2 = \sigma^2 \sum_{k \in s} \sum_{j \notin s} \sum_{j' \notin s} w_k(X_j)w_k(X_{j'}),$$
which is of order n by condition (B2). When $h = O(n^{-\tau})$, the first term is of order $n^{1-2\tau}$, so the prediction bias is of smaller order than $P_2$ provided $\tau > 1/4$ and hence the prediction MSE is of order n. It is also possible to choose $\tau = 1/2$; this choice adds extra terms in (iv), although the final order of the prediction MSE remains O(n). The advantage of choosing the boundary case $\tau = 1/2$ is that it allows straightforward scaling when choosing the bandwidth; this is described in detail in section 4.

The transformation in (iii) affects the contribution of the parametric estimator through a multiplicative term $\beta(X_j)$, which comes from $g'$, and by imposing centring on $m'$. The second term in (iii) captures the effect of smearing to remove transformation bias. The effect of this term is to increase the variability of the predictor. The transformation in (iv) has a similar effect to (iii) on the non-parametric estimator by introducing $\beta(X_j)$ and centring the weights. The effect of smearing here is to introduce two extra terms: one of these is the same as the smearing contribution in (iii) and the other combines the smearing with the estimator. As in (ii) there is no asymptotic bias and, as in (iii), the cost of smearing to remove bias is to increase variability. When the transformation g is the identity function, the results in (iii) and (iv) reduce to those in (i) and (ii), respectively, in terms of their leading orders. The verification is straightforward and we omit the details here.
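When g is the identity, the claimed reduction of (iii) to (i) can be checked directly; the following display is our own sketch of the omitted verification, using the definitions of a, α and β given at the start of this section.

```latex
% With g the identity, a(x, \epsilon) = m(x) + \sigma\epsilon, so
% \alpha(x) = m(x) and \beta(x) = 1, and case (iii) becomes
\begin{align*}
P_3 &= \sum_{j\notin s}\Bigl\{m'(X_j,\theta)
       - n^{-1}\sum_{i\in s} m'(X_i,\theta)\Bigr\}(\hat\theta-\theta)
       + n^{-1}\sum_{j\notin s}\sum_{i\in s}\sigma\epsilon_i \\
    &= P_1 - \frac{N-n}{n}\sum_{i\in s} m'(X_i,\theta)(\hat\theta-\theta)
       + \frac{N-n}{n}\,\sigma\sum_{i\in s}\epsilon_i.
\end{align*}
% Since \hat\theta - \theta = O_p(n^{-1/2}), \sum_{i\in s}\epsilon_i =
% O_p(n^{1/2}) and (N-n)/n = O(1), the two extra terms are O_p(n^{1/2}),
% the same order as P_1, so the leading order, and hence V = O(n), is
% unchanged.
```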

3. Inference

We can base inferences on the smeared non-parametric predictor by estimating the asymptotic prediction MSE and using a Gaussian approximation. However, this approach may not work very well because the asymptotic prediction MSE contains several terms and some of the omitted terms are only of slightly smaller order than those retained in the approximation. This suggests that it may be useful to investigate the use of the bootstrap to estimate the uncertainty in the predictor and to make inferences about the population total. As the purpose of bootstrapping is to estimate the model-based prediction error or to set model-based confidence intervals for T, we construct a bootstrap distribution that is conditional on the sample s actually selected. We consider the approach suggested by Chambers & Dorfman (2003a), although other ways of bootstrapping may be possible. We treat the residuals
$$r_i = g^{-1}(Y_i) - \hat{m}(X_i) = \sigma\epsilon_i - h^2 C m''(X_i) - \sigma \sum_{k \in s} w_k(X_i)\epsilon_k + o(h^2) + O(n^{-1})$$
as estimates of the model errors. The residuals have conditional mean of order $h^2$ and approximate conditional variance $\sigma^2\{1 + \sum_{k \in s} w_k(X_i)^2 - 2w_i(X_i)\} = \sigma^2 + O\{(nh)^{-1}\}$, so we can also use the standardized residuals
$$\frac{g^{-1}(Y_i) - \hat{m}(X_i)}{[1 + \sum_{k \in s} w_k(X_i)^2 - 2w_i(X_i)]^{1/2}}, \qquad (6)$$
which have conditional mean of order $h^2$ and approximate conditional variance $\sigma^2$. The standardization produces a higher-order correction to the residuals and so can be omitted. If we let $r_i^*$, $i = 1, \dots, N$ denote a simple random sample of residuals sampled with replacement from the sample residuals $r_i$, $i \in s$, then we can construct the bootstrap population $Y_i^* = g\{\hat{m}(X_i) + r_i^*\}$, $i = 1, \dots, N$. Here we assume g is well defined at $\hat{m}(X_i) + r_i^*$. For commonly used transformations such as the log-transformation (so g is the exponential function) or the Box-Cox transformation, this is indeed the case. For this population, we compute the bootstrap population total $T^*$ and use the same sample units s as before to construct the predictor $\hat{T}^*$ as defined in (2). The difference $\hat{T}^* - T^*$ is the prediction error based on the sample s for this bootstrap population and, repeating these calculations, we obtain the bootstrap distribution for the prediction error $\hat{T} - T$. The expected squared error of this bootstrap distribution is an estimate of the prediction MSE. Following Chambers & Dorfman (2003a), we can also use the quantiles of the bootstrap distribution (percentile method) to set confidence intervals for T.

The bootstrap described previously is attractively simple because it ignores the possible effects of bias on the residuals. There is no transformation bias because the residuals are constructed without any back transformation; this is related to the fact that the bootstrap population is constructed by predicting individual values $Y_i^*$ rather than their conditional mean values $E(Y_i \mid X_i)$. [We do compute the bootstrap population total (an aggregated quantity) but it is more appropriate to treat this as the actual total of the bootstrap population (hence unsmeared) rather than as an estimate which can be smeared.] The residuals do include smoothing bias from estimating the regression function m; see Davison & Hinkley (1997, section 7.6). We can adjust for this bias if we can estimate C and $m''$ (and then consider modifying the approximate variance used in standardization to accommodate their uncertainty) but this introduces complications which we prefer to avoid.
As we are estimating individual errors rather than aggregated errors, we should use a classical (larger) bandwidth h to construct the residuals than the one we use to construct the non-parametric predictor of the population total.

4. Simulation

We illustrate the proposed method in a model-based simulation study. We performed 250 simulations in which samples of size n = 400 were generated from populations of size N = 1600, so the non-sample size is N − n = 1200. We generated, and kept fixed, 400 $X_i$s in the sample and 1200 $X_i$s in the non-sample, but new $Y_i$s were generated in each simulation. The $X_i$s were generated from a uniform distribution U(0, 4), whereas the $Y_i$s were generated as $Y_i = g\{m(X_i) + \epsilon_i\}$ with the $\epsilon_i$ from a standard normal distribution N(0, 1). We experimented with three different mean functions, $m(x) = 50x/\{1 + (x + 4)^2\}$, $m(x) = 2.5x/\{1 + (x - 1)^2\}$ and $m(x) = 0.8(x - 2)^2$, respectively, where the coefficients are selected so that the three functions have similar ranges. Finally, we used the identity transformation ($g^{-1}(Y) = Y$) and the log-transformation ($g^{-1}(Y) = \log(Y)$) as these seem to be the most widely used in practice.

We implemented the proposed method using the standard (Nadaraya-Watson) kernel estimator (p1) and the local linear estimator (p2) with the Epanechnikov kernel $K(x) = 0.75(1 - x^2)I(x^2 < 1)$ and with bandwidth $h_1 = O(n^{-1/2})$ on the observed pairs $\{X_i, g^{-1}(Y_i)\}$, $i \in s$, to estimate the regression function m(x). From condition (B6), $h_1 = O(n^{-\tau})$ for any $1/4 < \tau < 1/2$ suffices. As we pointed out, using the boundary order $h_1 = O(n^{-1/2})$ changes the coefficient but not the order of the prediction MSE. In the bootstrap, we do not need to estimate the coefficients so the procedure carries through. The advantage of using the boundary order bandwidth is that it allows straightforward scaling between samples of different sizes in the cross-validation procedure described in the next paragraph. We denote the estimates of m(x) by $\hat{m}_1(x)$ to emphasize the use of $h_1$. The predictors of the finite population total are then calculated through
$$\hat{T} = \sum_{i \in s} Y_i + n^{-1} \sum_{j \notin s} \sum_{i \in s} g\{\hat{m}_1(X_j) + r_i\}.$$
To estimate the bias and variability of the proposed methods, we used a bootstrap method. We first obtained a new estimate of m(x), denoted by $\hat{m}_2(x)$, from the samples $\{X_i, g^{-1}(Y_i)\}$, $i \in s$, using bandwidth $h_2 = O(n^{-1/5})$. Note that here $h_2$ has the same order as the usual optimal bandwidth for non-parametric regression as we use it to obtain individual residuals; see the discussion in the last paragraph of section 3. We then obtained the residuals $r_i = g^{-1}(Y_i) - \hat{m}_2(X_i)$, $i \in s$, and formed bootstrap samples by constructing $(X_k, Y_k^*)$, where $Y_k^* = g\{\hat{m}_2(X_k) + r_k^*\}$, $k = 1, \dots, N$, and the $r_k^*$ were randomly selected with replacement from $r_i$, $i \in s$. The bootstrap procedure was repeated 500 times.

In practice, we need to be able to select the two bandwidths $h_1$ and $h_2$. As $h_2$ optimizes the non-parametric fitting of m(x) on the data $\{X_i, g^{-1}(Y_i)\}$, $i \in s$, it can be selected by the usual leave-one-out cross-validation. However, the selection of $h_1$ is not so straightforward. The final goal is to minimize the MSE of the predictor of the total when 1200 out of 1600 responses are missing so, ideally, we should use a leave-3/4-out cross-validation and then scale $h_1$ back by multiplying by $4^{-1/2}$. Of course, it is not practical to calculate all possible cross-validation samples, so we select at random 50 sets of cross-validation samples and minimize the MSE of the predictor of the total over these 50 sets to obtain the optimal $h_1$.
Although increasing 50 to a larger number will yield a closer approximation to the true optimal cross-validation bandwidth, the computational burden increases very quickly as well. Hence, we use this relatively small number of samples in the simulation.
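Putting sections 3 and 4 together, a condensed, self-contained sketch of the bootstrap estimate of the prediction MSE might look as follows; the Nadaraya-Watson smoother, the choice g = exp and all names are illustrative assumptions rather than the paper's actual code.

```python
import numpy as np

def epan(u):
    # Epanechnikov kernel used in the simulations
    return 0.75 * (1.0 - u**2) * (np.abs(u) < 1.0)

def nw_fit(x0, xs, zs, h):
    # Nadaraya-Watson estimate m_hat(x0) from pairs (xs, zs); assumes every
    # point in x0 has at least one kernel neighbour among xs
    k = epan((x0[:, None] - xs[None, :]) / h)
    return (k @ zs) / k.sum(axis=1)

def smeared_total(y_s, x_s, x_ns, h):
    # Predictor (1) with g = exp: sample total plus smeared predictions
    z = np.log(y_s)
    resid = z - nw_fit(x_s, x_s, z, h)
    return y_s.sum() + sum(np.exp(mj + resid).mean()
                           for mj in nw_fit(x_ns, x_s, z, h))

def bootstrap_mse(y_s, x_s, x_ns, h1, h2, B=500, seed=0):
    # Model-based bootstrap of section 3: residuals at the larger bandwidth
    # h2, bootstrap populations on the original X values, re-prediction from
    # the same sample units with the smaller bandwidth h1
    rng = np.random.default_rng(seed)
    z, n = np.log(y_s), len(y_s)
    resid2 = z - nw_fit(x_s, x_s, z, h2)
    x_all = np.concatenate([x_s, x_ns])
    m2_all = nw_fit(x_all, x_s, z, h2)
    errs = np.empty(B)
    for b in range(B):
        r_star = rng.choice(resid2, size=len(x_all), replace=True)
        y_star = np.exp(m2_all + r_star)          # bootstrap population
        t_star = y_star.sum()                     # its actual (unsmeared) total
        errs[b] = smeared_total(y_star[:n], x_s, x_ns, h1) - t_star
    return np.mean(errs**2)                       # estimated prediction MSE

# Illustrative use on synthetic data (not the paper's exact design)
rng = np.random.default_rng(3)
x = rng.uniform(0, 4, 400)
y = np.exp(2.5 * x / (1 + (x - 1)**2) + rng.normal(size=400))
print(bootstrap_mse(y[:100], x[:100], x[100:], h1=0.4, h2=0.8, B=200))
```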

We compared our method with several other methods, including predictors based on fitting (a) a linear model (ax + b) with transformation and smearing; (b) a linear model through the origin (ax) without transformation; (c) a linear model without transformation; and (d) the expansion estimator of the total $\hat{T} = (N/n)\sum_{i \in s} Y_i$. To illustrate the importance of smearing when we transform, we also implemented no-smearing versions of the proposed estimators (p1), (p2) and method (a). The expansion estimator (d) is both a model-based estimator (for a simple homogeneous model) and a design-based estimator (for simple random sampling without replacement). We included two further design-based generalized regression (GREG) estimators of the total to complete the comparison between model- and design-based methods. We considered (g1) the GREG with constant variance and (g2) the GREG with variance proportional to $X_i$, which is just the familiar ratio estimator. These estimators differ from their model-based counterparts (2) in how they treat the in-sample observations.

The simulation results are presented in Table 1 and Fig. 1. Several comments are worth making. (i) In all the situations, in terms of the sample MSE, (p2) has better performance than (p1). Hence, comparing the kernel and local linear estimators, the local linear estimator is usually preferred. Intuitively, this may be explained by the fact that the Nadaraya-Watson kernel estimator is a local constant estimator, which is generally outperformed by the local linear estimator in regions with sparse observations and at the boundary (see Fan & Gijbels, 1996; Härdle et al., 2000, chapter 4). (ii) In most cases, the sample MSEs match the average of the estimated MSEs rather well. This demonstrates that the inference derived in section 3 is valid and the asymptotic properties do not require huge sample sizes or bootstrap sizes. Here, the sample MSE is the MSE of the estimators from the 250 simulation replicates compared with the true value, which is known in simulations, and the average of the estimated MSE is the average of the 250 estimates of the MSE, each one calculated within a simulation replicate using only the sample information. Note that for the proposed estimators (p1) and (p2), the MSEs are dominated by the estimation variance. As we are estimating the total, the variance is $N^2$ times bigger than that of the mean. With N = 1600 in the simulations, a variance of the order $10^6$ is quite usual and does not indicate a problem. (iii) There are occasional situations, for example, for the model $\log(Y) = 2.5x/\{1 + (x - 1)^2\}$, in which the average of the estimated MSE does not match the sample MSE very closely. We conjecture that this may be caused by difficulties in bandwidth selection. In the bootstrap, to save computation time, we used the values of $h_1$ and $h_2$ chosen for the sample rather than re-estimating the bandwidths for each bootstrap sample. This may contribute to the observed performance because in the other experiments we conducted (not reported here) we found that if we use fixed bandwidths for $h_1$ and $h_2$, the difference between the sample MSE and the estimated MSE is rather small. Intuitively, choosing a bandwidth specific to each bootstrap sample contributes extra variability and hence will increase the total MSE, a phenomenon verified in a small experiment that we conducted. However, the estimated MSE is not always an underestimate so increasing the variability in the bootstrap is not always helpful.
(iv) To examine the numerical usefulness of the asymptotic results derived in the theorem, we also calculated the estimated MSE for the estimator (p1) using the leading terms. As we expected, the results are too crude to be useful. In fact, for $m(x) = 2.5x/\{1 + (x - 1)^2\}$, the estimated MSE is about 10 per cent of the sample MSE, whereas it falls to about 5 per cent for the other two mean functions. This confirms our intuition that the leading order is only slightly larger than the smaller terms that are ignored. It also suggests that the main contribution of the theorem is the evaluation of the rate of convergence.

Although our method is proposed in the model-based framework, out of curiosity, we also performed a simulation in the design-based framework. We still performed 250 simulations with the same sample size and non-sample size but the data generation procedure is different.

Table 1. Simulation results. Replications = 250, n = 400, N = 1600

m(x) = 50x/{1 + (x + 4)^2}; left block: g^{-1}(y) = log y, right block: g^{-1}(y) = y
        Bias    Vars    MSE     MSE^    Bias    Vars    MSE     MSE^
(p1)    2.79e2  4.17e6  4.24e6  5.09e   e3      3.53e3  3.69e3
(p2)    3.88e2  3.84e6  3.99e6  4.01e   e3      3.39e3  3.55e3
(a)     2.33e3  5.39e6  1.08e7  5.78e6  1.92e1  3.38e3  3.74e3  4.03e3
(b)     6.90e2  5.04e6  5.51e6  3.04e6  3.03e2  2.77e3  9.43e4  3.85e3
(c)     6.52e1  4.14e6  4.15e6  3.74e6  1.92e1  3.38e3  3.74e3  4.05e3
(d)     1.55e3  3.51e6  5.92e6  4.20e6  7.05e1  3.36e3  8.34e3  6.01e3
(g1)    7.35e3  1.07e7  6.47e7  4.17e6  5.29e2  4.85e3  2.85e5  5.58e3
(g2)    1.67e2  4.90e6  4.93e6  4.55e6  2.32e1  3.59e3  4.12e3  6.41e3
(p1n)   2.57e3  8.51e6  1.51e7  1.12e   e3      5.28e3  5.31e3
(p2n)   1.05e4  9.77e5  1.12e8  1.06e8  4.39e1  4.07e3  6.00e3  5.03e3
(an)    1.02e4  1.06e6  1.05e8  1.48e8  1.92e1  3.38e3  3.74e3  4.03e3

m(x) = 2.5x/{1 + (x - 1)^2}; left block: g^{-1}(y) = log y, right block: g^{-1}(y) = y
        Bias    Vars    MSE     MSE^    Bias    Vars    MSE     MSE^
(p1)    3.95e2  1.84e6  2.00e6  1.59e   e3      3.54e3  3.74e3
(p2)    7.36e2  1.24e6  1.78e6  1.56e   e3      3.42e3  3.55e3
(a)     2.70e2  1.83e6  1.90e6  2.15e6  4.38e1  3.38e3  5.29e3  5.91e3
(b)     4.68e3  7.52e5  2.26e7  1.81e6  5.36e2  2.77e3  2.90e5  6.58e3
(c)     2.57e2  1.84e6  1.91e6  2.18e6  4.38e1  3.38e3  5.29e3  5.93e3
(d)     3.25e2  1.88e6  1.99e6  2.19e6  4.16e1  3.36e3  5.09e3  5.92e3
(g1)    1.06e3  1.15e6  2.28e6  2.43e   e3      4.88e3  9.54e3
(g2)    5.36e2  1.52e6  1.81e6  2.05e6  2.98e1  3.59e3  4.47e3  6.33e3
(p1n)   1.74e3  4.32e6  7.34e6  4.67e   e3      5.27e3  5.31e3
(p2n)   6.47e3  3.53e5  4.22e7  3.78e7  3.90e1  4.27e3  5.79e3  4.79e3
(an)    8.61e3  1.25e5  7.43e7  7.63e7  4.38e1  3.38e3  5.29e3  5.91e3

m(x) = 0.8(x - 2)^2; left block: g^{-1}(y) = log y, right block: g^{-1}(y) = y
        Bias    Vars    MSE     MSE^    Bias    Vars    MSE     MSE^
(p1)    2.00e1  7.75e5  7.76e5  8.21e   e3      4.14e3  3.97e3
(p2)    2.23e2  4.88e5  5.38e5  5.42e   e3      3.67e3  3.60e3
(a)     3.91e2  8.11e5  9.64e5  1.16e6  2.95e1  3.71e3  4.59e3  7.17e3
(b)     1.74e3  8.77e5  3.92e6  9.91e5  2.48e2  2.78e3  6.43e4  5.29e3
(c)     4.02e2  8.19e5  9.80e5  1.19e6  2.95e1  3.71e3  4.59e3  7.15e3
(d)     4.18e2  8.33e5  1.01e6  1.19e6  4.09e1  3.74e3  5.41e3  7.18e3
(g1)    4.20e2  1.50e6  1.68e6  1.26e6  9.96e1  5.06e3  1.50e4  8.03e3
(g2)    6.83e2  8.41e5  1.31e6  1.12e6  1.43e1  3.42e3  3.63e3  6.58e3
(p1n)   7.40e2  1.85e6  2.40e6  1.40e   e3      5.53e3  5.90e3
(p2n)   3.22e3  1.90e5  1.06e7  1.01e7  2.80e1  3.83e3  4.61e3  4.71e3
(an)    5.33e3  2.63e5  2.85e7  3.30e7  2.95e1  3.71e3  4.59e3  7.17e3

Bias, sample bias; Vars, sample variance; MSE, sample mean-squared error; MSE^, average of the estimated mean-squared error; (p1), kernel estimator with transformation and smearing; (p2), local linear estimator with transformation and smearing; (a), linear estimator with transformation; (b), linear regression through the origin without transformation; (c), linear regression without transformation; (d), expansion estimator; (g1), GREG estimator with constant variance; (g2), GREG/ratio estimator; (p1n), p1 without smearing; (p2n), p2 without smearing; (an), a without smearing.

Specifically, we generated a fixed population of 1600 $(X_i, Y_i)$ pairs and then drew 250 independent simple random samples without replacement of 400 pairs from this population. All other aspects of the simulation design were identical to the model-based case. The simulation results are given in Table 2. We can see that although the proposed method is targeted at the model-based situation, its performance in the design-based situation is very good. We also applied the proposed methods to samples drawn from the Australian Agricultural and Grazing Industries Survey (AAGIS) data from Kokic et al. (2000). This data set contains information on 1652 Australian broadacre farms.
The variables we used are the total farm area in hectares (X) and the total cash costs of the farm during the survey year in Australian dollars (Y). Of the 1652 farms surveyed, 8 farms exhibited unusual data patterns and hence were excluded from our analysis. The data are plotted in Fig. 2 on the raw scale and the log scale, respectively. A log-transformation reveals the data pattern more clearly, so the log-transformation was used in our proposed method for the analysis. We treated the 1644 farms as the population of interest and conducted a simulation study in which we selected 250 independent samples, each of size 411, from the population. To simplify the computations, we fixed the bandwidth at h = 1 for estimator (p1) and h = 3 for estimator (p2). The choice of h is based on a preliminary cross-validation result, where we averaged the selected bandwidths and rounded to the nearest integer.

[Fig. 1. Boxplots of the differences between the total estimates and the true totals for the cases presented in Table 1. The left column has g^{-1}(y) = log(y) and the right column has g^{-1}(y) = y. The top row has m(x) = 50x/{1 + (x + 4)^2}, the middle row has m(x) = 2.5x/{1 + (x - 1)^2} and the bottom row has m(x) = 0.8(x - 2)^2. The estimators are given in the same order as in Table 1.]

Table 2. Simulation results on design-based data. Replications = 250, n = 400, N = 1600

m(x) = 50x/{1 + (x + 4)^2}; left block: g^{-1}(y) = log y, right block: g^{-1}(y) = y
        Bias    Vars    MSE     MSE^    Bias    Vars    MSE     MSE^
(p1)    2.54e2  9.20e6  9.27e6  1.01e   e3      5.84e3  5.95e3
(p2)    6.48e2  7.78e6  8.20e6  7.11e6  9.72e   e3      5.82e3  5.75e3
(a)     1.20e3  8.23e6  9.67e6  8.40e   e3      6.25e3  6.26e3
(b)     1.30e3  1.53e7  1.70e7  8.42e6  3.29e2  5.89e3  1.14e5  6.23e3
(c)     4.03e2  1.02e7  1.04e7  1.01e   e3      6.25e3  6.31e3
(d)     4.46e2  1.07e7  1.09e7  1.07e   e3      8.18e3  8.77e3
(g1)    8.83e3  2.72e7  1.05e8  9.65e6  5.27e2  8.02e3  2.86e5  5.70e3
(g2)    2.37e2  1.09e7  1.10e7  9.19e   e3      8.65e3  6.09e3
(p1n)   1.22e4  5.45e6  1.55e8  1.32e8  1.13e1  9.28e3  9.41e3  9.01e3
(p2n)   1.23e4  2.31e6  1.54e8  1.34e8  3.95e1  6.08e3  7.64e3  7.28e3
(an)    1.19e4  2.47e6  1.43e8  1.73e   e3      6.25e3  6.26e3

m(x) = 2.5x/{1 + (x - 1)^2}; left block: g^{-1}(y) = log y, right block: g^{-1}(y) = y
        Bias    Vars    MSE     MSE^    Bias    Vars    MSE     MSE^
(p1)    1.30e2  2.71e6  2.72e6  3.08e   e3      5.94e3  6.01e3
(p2)    6.47e2  2.18e6  2.60e6  2.70e6  5.05e   e3      5.94e3  5.74e3
(a)     8.31e1  2.85e6  2.86e6  3.00e   e3      8.39e3  8.65e3
(b)     5.38e3  1.85e6  3.08e7  2.84e6  5.74e2  9.47e3  3.39e5  1.01e4
(c)     7.25e1  3.00e6  3.00e6  3.11e   e3      8.39e3  8.71e3
(d)     5.16e1  2.99e6  3.00e6  3.13e6  1.76e   e3      8.43e3  8.74e3
(g1)    1.68e3  2.71e6  5.52e6  2.65e6  2.81e1  1.30e4  1.38e4  9.68e3
(g2)    9.93e1  3.56e6  3.57e6  2.16e   e4      1.51e4  6.07e3
(p1n)   7.44e3  1.38e6  5.67e7  5.90e   e3      8.09e3  8.18e3
(p2n)   7.78e3  9.11e5  6.15e7  5.79e7  3.62e1  6.13e3  7.44e3  7.24e3
(an)    9.43e3  6.56e5  8.96e7  8.80e   e3      8.39e3  8.65e3

m(x) = 0.8(x - 2)^2; left block: g^{-1}(y) = log y, right block: g^{-1}(y) = y
        Bias    Vars    MSE     MSE^    Bias    Vars    MSE     MSE^
(p1)    1.80e2  1.59e6  1.62e6  1.85e   e3      6.18e3  6.14e3
(p2)    1.17e1  1.17e6  1.17e6  1.39e   e3      5.90e3  5.86e3
(a)     1.46e2  2.15e6  2.17e6  2.23e   e3      9.92e3  1.03e4
(b)     2.69e3  9.62e5  8.17e6  1.78e6  2.88e2  6.31e3  8.90e4  9.36e3
(c)     6.36e1  2.04e6  2.05e6  2.05e   e3      9.92e3  1.04e4
(d)     9.88e1  2.02e6  2.03e6  2.06e   e3      9.87e3  1.04e4
(g1)    e6      1.39e6  1.56e6  5.01e1  1.01e4  1.26e4  8.77e3
(g2)    4.45e1  1.91e6  1.91e6  1.34e   e4      1.08e4  7.20e3
(p1n)   4.20e3  8.67e5  1.85e7  1.80e   e3      8.18e3  8.27e3
(p2n)   3.73e3  5.20e5  1.44e7  1.58e7  7.10e1  6.25e3  1.13e4  1.04e4
(an)    6.44e3  2.71e5  4.17e7  4.39e   e3      9.92e3  1.03e4

Bias, sample bias; Vars, sample variance; MSE, sample mean-squared error; MSE^, average of the estimated mean-squared error; (p1), kernel estimator with transformation and smearing; (p2), local linear estimator with transformation and smearing; (a), linear estimator with transformation; (b), linear regression through the origin without transformation; (c), linear regression without transformation; (d), expansion estimator; (g1), GREG estimator with constant variance; (g2), GREG/ratio estimator; (p1n), p1 without smearing; (p2n), p2 without smearing; (an), a without smearing.

The results are presented in Table 3. Clearly, both proposed methods (p1) and (p2) outperformed the other methods in terms of the MSEs, although the gain in comparison with method (a) is not substantial. A closer look at the right panel of Fig. 2 indicates that a non-parametric curve might not provide a much better fit to the data than a linear function on the log scale, and this explains the similar performance of the three estimators. The estimated MSE is quite close to the sample MSE for both (p1) and (p2). We performed similar studies on the AAGIS data using the (cross-validation) bandwidth. In these studies, the performance of (p1), (p2) and (a) is very similar, whereas all three continue to outperform the remaining estimators.
The estimation of the MSE turns out to be less accurate for both (p1) and (p2).

[Fig. 2. Scatter plot of 1644 observations in the Australian Agricultural and Grazing Industries Survey data set: total cash cost (y) against total farm area (x), on the original scale (left panel) and the log scale (right panel), respectively.]

Table 3. Results for Australian Agricultural and Grazing Industries Survey data with eight outliers excluded. Results are based on 250 samples with sample size n = 411 and non-sample size N − n = 1233

        Bias    Vars    MSE     MSE^    Bias    Vars    MSE     MSE^
(p1)    5.49e3  4.78e8  5.08e8  4.66e8  4.47e3  5.17e8  5.37e8  4.79e8
(p2)    4.52e3  4.47e8  4.67e8  4.59e8  5.18e3  5.22e8  5.49e8  7.85e8
(a)     6.32e3  4.41e8  4.81e8  4.19e8
(b)     3.99e4  1.05e9  2.64e9  7.65e8
(c)     1.76e3  5.56e8  5.59e8  7.25e8
(d)     3.86e3  6.14e8  6.29e8  9.03e8
(g1)    1.48e5  2.37e9  2.43e   e8
(g2)    1.70e3  7.49e8  7.52e8  6.25e8
(p1n)   6.62e4  3.32e8  4.71e9  5.83e9  6.82e4  4.46e8  5.10e9  3.63e8
(p2n)   7.54e4  2.70e8  5.95e9  3.89e9  7.63e4  3.83e8  6.20e9  4.36e9
(an)    6.46e4  2.66e8  4.44e9  4.82e9

Bias, sample bias; Vars, sample variance; MSE, sample mean-squared error; MSE^, average of the estimated mean-squared error; (p1), kernel estimator; (p2), local linear estimator; (a), linear estimator with transformation; (b), linear regression through the origin without transformation; (c), linear regression without transformation; (d), expansion estimator; (g1), GREG estimator with constant variance; (g2), GREG/ratio estimator; (p1n), p1 without smearing; (p2n), p2 without smearing; (an), a without smearing.

5. Discussion

In this article, we propose to estimate finite population totals using a transformation and smoothing approach with smearing to remove the bias caused by back transformation. The purpose of the transformation is to achieve additive homoscedastic errors; in exchange, we may lose having an explicit parametric model for the mean. We have proved that even under a weak non-parametric model assumption, the prediction MSE can achieve the same order as under a parametric model. Moreover, the same result holds (so there is no further loss) under transformation with smearing. Hence, a non-parametric model does not cause any performance loss in terms of the order of the asymptotic error. The computation of the prediction MSE is nonetheless more challenging (as is always the case for non-parametric models), and we propose to use a bootstrap to obtain the estimated prediction MSE.

The result that non-parametric smoothing methods can achieve the same rate of convergence as parametric methods for predicting a finite population total is interesting and important. It is a consequence of the fact that we are predicting a sum or aggregate, which is like an integrated or expected regression function, so parametric rates of convergence are achievable. In contrast, the familiar non-parametric rates are obtained when predicting a single observation. We can intuitively relate predicting aggregates versus predicting single observations to distribution function estimation versus density estimation. The distribution function is obtained by integration which, like summation, is a form of smoothing through aggregation. Non-parametric estimation of density functions is difficult and the best possible rates of convergence are the famous non-parametric ones. Non-parametric estimation of distribution functions is relatively easy and can be done at parametric rates. The reason is that the integration or aggregation has made the problem smoother and hence easier. In the survey context, predicting an individual observation is a difficult problem like the density estimation problem, whereas predicting the total (an aggregate) is much easier, like the distribution function estimation problem. The possibility of achieving parametric rates of convergence from non-parametric predictors has been considered in the design-based case by Breidt & Opsomer (2000) but does not seem to have been discussed in the model-based sample survey literature. Hence, we give the result both with and without transformation for completeness.

Under transformation, smearing removes bias in the predictor but adds to its variability, an effect similar to that of the bias-adjustment approach discussed by Karlberg (2000a,b). However, as pointed out earlier, aggregating even small bias contributions can have a deleterious effect so it is important to remove the bias. Two sources of bias in our problem, namely smoothing and transformation, are reduced by aggregation and smearing, respectively. A third potential source of bias, model mis-specification, is reduced by our use of non-parametric smoothing. Nonetheless, although our model assumptions are much weaker than in parametric models, there are still assumptions and hence the potential for mis-specification effects. We assume throughout that the model is correct and do not consider bias caused by model mis-specification.

Our results are practically important because they give a better understanding of the properties of non-parametric predictors in finite population problems and demonstrate a flexible method of prediction. In particular, our results show that non-parametric prediction is competitive with parametric prediction of a finite population total in terms of efficiency and convenience, and is superior in terms of consistency and flexibility. As the non-parametric approach should work for any smooth regression function, why not simply smooth the raw data and avoid transformation altogether? Hastie & Tibshirani (1990, section 7.1) provided a partial answer to this question. Although non-parametric smoothing weakens the assumptions on the model, it does not eliminate all of them and transformation may be employed to ensure that these assumptions are satisfied. When transformation induces homoscedastic errors, we can simplify the analysis because we do not need to also model a variance function. Transformation is also useful in practice for restricting predictions to an allowable range.
Transformation can sometimes contribute to improving prediction performance, as we observe in our simulation study. Finally, when we have multiple covariates, an additive model is often used. In this case, transformation is often needed to achieve additivity. The analysis in the additive model inevitably builds on the simpler univariate case that we consider here. Our study of how the smoothing bias and transformation bias affect each other in the present simpler context provides baseline results and a useful approach that can be extended in the future. In summary, the application of a nonlinear transformation $g^{-1}$ to Y (often the logarithmic transformation) can be a useful way of achieving additive, homoscedastic errors. The purpose of making a transformation in a non-parametric model is different from that in parametric models (we change the error structure rather than the regression function) but the consequences in terms of prediction bias after back-transformation are the same and we need to extend the smearing technique to remove the bias.
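For the log-transformation in particular, the smearing in (1) factorizes into a naive back-transformed prediction times a common correction factor, which makes the removed bias explicit; the following display is a direct consequence of (1) rather than a result stated in the paper.

```latex
% A worked instance for the log case g^{-1}(y) = \log y: with
% r_i = \log Y_i - \hat{m}(X_i), the smearing in (1) factorizes as
\[
\hat{Y}_j = n^{-1}\sum_{i\in s}\exp\{\hat{m}(X_j)+r_i\}
          = \exp\{\hat{m}(X_j)\}\,n^{-1}\sum_{i\in s}e^{r_i},
\]
% so every naive back-transformed prediction \exp\{\hat{m}(X_j)\} is scaled
% by the common factor n^{-1}\sum_{i\in s}e^{r_i}, which is at least
% e^{\bar{r}} \approx 1 by Jensen's inequality (the residuals have mean
% approximately zero) and corrects the downward transformation bias.
```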

It is worth pointing out that, in practice, finding a suitable transformation g may not be easy. We can try to estimate the transformation within a parametric family of transformations, such as the Box-Cox family, or a family of smooth transformations. However, estimating the transformation and the non-parametric regression together gives rise to identifiability problems, so additional information, possibly in the form of instrumental variables, will be needed; see, for example, Linton et al. (1997). For simplicity and focus, in this article, we assumed that a suitable transformation g is known. In practice, the assumption that a transformation g can be found to achieve additive, homoscedastic errors should be checked using the sample observations. When such a g does not exist or is too hard to find, the method proposed here is not applicable. In such situations, we would recommend not performing any transformation. Instead, standard parametric or non-parametric regression should be performed on the sample data and the corresponding non-sample values should be predicted from these estimates. Straightforward summation of these non-sample predictions, together with the sample values, should then be performed to estimate the total. Of course, inferences should then be modified to incorporate the actual error structure.

Finally, we have developed the method as a model-based approach but a design-based version can be constructed by incorporating sample inclusion probabilities appropriately. In our numerical experiment, we found that under simple random sampling without replacement (for which the sample inclusion probabilities are constant), the (model-based) procedure performed well as a design-based procedure. We caution that the theory and its derivation will be very different in the design-based framework, and a systematic study is needed before definite conclusions can be drawn.

Acknowledgement

This research was supported by the Australian Research Council (DP grant) and a US NSF grant (DMS).

Supporting Information

Additional Supporting Information may be found in the online version of this article: a more detailed proof of result (iv) in the theorem, the non-parametric case with transformation and smearing. Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.

References

Breidt, F. J. & Opsomer, J. D. (2000). Local polynomial regression estimators in survey sampling. Ann. Statist. 28.
Chambers, R. L. & Dorfman, A. H. (2003a). Robust sample survey inference via bootstrapping and bias correction: the case of the ratio estimator. S3RI Methodology Working Paper M03/13, Southampton Statistical Sciences Research Institute, University of Southampton.
Chambers, R. L. & Dorfman, A. H. (2003b). Transformed variables in survey sampling. S3RI Methodology Working Paper M03/21, Southampton Statistical Sciences Research Institute, University of Southampton.
Chambers, R. L., Dorfman, A. H. & Sverchkov, M. Yu. (2003). Non-parametric regression with complex survey data. In Analysis of survey data (eds R. L. Chambers & C. J. Skinner), Wiley, New York.

Davison, A. C. & Hinkley, D. V. (1997). Bootstrap methods and their application. Cambridge University Press, Cambridge.
Duan, N. (1983). Smearing estimate: a non-parametric retransformation estimate. J. Amer. Statist. Assoc. 78.
Fan, J. & Gijbels, I. (1996). Local polynomial modelling and its applications. Chapman and Hall, London.
Härdle, W., Müller, M., Sperlich, S. & Werwaltz, A. (2000). Non-parametric and semiparametric models. Springer, New York.
Hastie, T. & Tibshirani, R. (1990). Generalized additive models. Chapman and Hall, London.
Karlberg, F. (2000a). Population total prediction under a lognormal superpopulation model. Metron 58.
Karlberg, F. (2000b). Survey estimation for highly skewed population in the presence of zeroes. J. Off. Stat. 16.
Kokic, P., Chambers, R. & Beare, S. (2000). Microsimulation of business performance. Int. Stat. Rev. 68.
Linton, O. B., Chen, R., Wang, N. & Härdle, W. (1997). An analysis of transformations for additive non-parametric regression. J. Amer. Statist. Assoc. 92.
Valliant, R., Dorfman, A. H. & Royall, R. M. (2000). Finite population sampling and inference: a prediction approach. Wiley, New York.
Welsh, A. H. & Zhou, X. H. (2006). Estimating the retransformed mean in a heteroscedastic two-part model. J. Statist. Plann. Inference 136.
Wu, C. & Sitter, R. R. (2001). A model calibration approach to using complete auxiliary information from survey data. J. Amer. Statist. Assoc. 96.

Received February 2009, in final form December 2009

Yanyuan Ma, Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX, USA. ma@stat.tamu.edu

Appendix: Proof of the theorem

We outline the proof of the theorem for case (iv) and then (iii). The full proof is available in the online Supporting Information, which may be found in the online version of this article. First note that the $\{\epsilon_i\}$ are identically distributed so that
$$\alpha(X_j) = n^{-1} \sum_{i \in s} E[g\{m(X_j) + \sigma\epsilon_i\} \mid X_j] = E[g\{m(X_j) + \sigma\epsilon_j\} \mid X_j] = E(Y_j \mid X_j).$$

Non-parametric case with transformation and smearing. Write
$$P = \sum_{j \notin s} \{\hat{Y}_j - E(Y_j \mid X_j)\} = \sum_{j \notin s} \{\hat{Y}_j - E(\hat{Y}_j \mid X)\} + \sum_{j \notin s} \{E(\hat{Y}_j \mid X) - E(Y_j \mid X_j)\}.$$
We develop an expansion for $\hat{Y}_j$ which will also lead to an expansion for $E(\hat{Y}_j \mid X)$. First, we make a (quadratic) Taylor series approximation to g in $\hat{Y}_j$ in powers of $\hat{m}(X_j) - \hat{m}(X_i) - m(X_j) + m(X_i)$ to obtain $\hat{Y}_j = S_1 + S_2 + S_3 + S_4$, where $S_4$ is the remainder term. Then, using the standard first-order bias plus stochastic term approximation to the estimator $\hat{m}$, we can write
$$\hat{m}(X_j) - \hat{m}(X_i) - m(X_j) + m(X_i) = h^2 C\{m''(X_j) - m''(X_i)\} + \sigma \sum_{k \in s} \{w_k(X_j) - w_k(X_i)\}\epsilon_k + o(h^2) + O(n^{-1}). \qquad (7)$$
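In the notation of section 2, the quadratic Taylor step just described can be written as follows; this display is our own expansion of the outline above, with the detailed splitting into S_1, ..., S_4 left to the Supporting Information.

```latex
% Our rendering of the quadratic Taylor step, in the notation a, b, c of
% section 2. Since g^{-1}(Y_i) = m(X_i) + \sigma\epsilon_i, writing
% \Delta_{ji} = \hat{m}(X_j) - \hat{m}(X_i) - m(X_j) + m(X_i) gives
\[
g\{\hat{m}(X_j)+g^{-1}(Y_i)-\hat{m}(X_i)\}
 = g\{m(X_j)+\sigma\epsilon_i+\Delta_{ji}\}
 = a(X_j,\epsilon_i)+b(X_j,\epsilon_i)\Delta_{ji}
   +\tfrac{1}{2}c(X_j,\epsilon_i)\Delta_{ji}^{2}+o(\Delta_{ji}^{2}),
\]
% and averaging over i \in s and summing over j \notin s collects the four
% terms S_1, ..., S_4, with S_4 absorbing the remainders and (7) supplying
% the expansion of \Delta_{ji}.
```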


University, Tempe, Arizona, USA b Department of Mathematics and Statistics, University of New. Mexico, Albuquerque, New Mexico, USA This article was downloaded by: [University of New Mexico] On: 27 September 2012, At: 22:13 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered

More information

Nonparametric Econometrics

Nonparametric Econometrics Applied Microeconometrics with Stata Nonparametric Econometrics Spring Term 2011 1 / 37 Contents Introduction The histogram estimator The kernel density estimator Nonparametric regression estimators Semi-

More information

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) 1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

A Simulation Study on Confidence Interval Procedures of Some Mean Cumulative Function Estimators

A Simulation Study on Confidence Interval Procedures of Some Mean Cumulative Function Estimators Statistics Preprints Statistics -00 A Simulation Study on Confidence Interval Procedures of Some Mean Cumulative Function Estimators Jianying Zuo Iowa State University, jiyizu@iastate.edu William Q. Meeker

More information

Smooth simultaneous confidence bands for cumulative distribution functions

Smooth simultaneous confidence bands for cumulative distribution functions Journal of Nonparametric Statistics, 2013 Vol. 25, No. 2, 395 407, http://dx.doi.org/10.1080/10485252.2012.759219 Smooth simultaneous confidence bands for cumulative distribution functions Jiangyan Wang

More information

On the Robust Modal Local Polynomial Regression

On the Robust Modal Local Polynomial Regression International Journal of Statistical Sciences ISSN 683 5603 Vol. 9(Special Issue), 2009, pp 27-23 c 2009 Dept. of Statistics, Univ. of Rajshahi, Bangladesh On the Robust Modal Local Polynomial Regression

More information

Lawrence D. Brown* and Daniel McCarthy*

Lawrence D. Brown* and Daniel McCarthy* Comments on the paper, An adaptive resampling test for detecting the presence of significant predictors by I. W. McKeague and M. Qian Lawrence D. Brown* and Daniel McCarthy* ABSTRACT: This commentary deals

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and

More information

DEPARTMENT MATHEMATIK ARBEITSBEREICH MATHEMATISCHE STATISTIK UND STOCHASTISCHE PROZESSE

DEPARTMENT MATHEMATIK ARBEITSBEREICH MATHEMATISCHE STATISTIK UND STOCHASTISCHE PROZESSE Estimating the error distribution in nonparametric multiple regression with applications to model testing Natalie Neumeyer & Ingrid Van Keilegom Preprint No. 2008-01 July 2008 DEPARTMENT MATHEMATIK ARBEITSBEREICH

More information

Local Polynomial Regression

Local Polynomial Regression VI Local Polynomial Regression (1) Global polynomial regression We observe random pairs (X 1, Y 1 ),, (X n, Y n ) where (X 1, Y 1 ),, (X n, Y n ) iid (X, Y ). We want to estimate m(x) = E(Y X = x) based

More information

UNIVERSITÄT POTSDAM Institut für Mathematik

UNIVERSITÄT POTSDAM Institut für Mathematik UNIVERSITÄT POTSDAM Institut für Mathematik Testing the Acceleration Function in Life Time Models Hannelore Liero Matthias Liero Mathematische Statistik und Wahrscheinlichkeitstheorie Universität Potsdam

More information

Chapter 9. Non-Parametric Density Function Estimation

Chapter 9. Non-Parametric Density Function Estimation 9-1 Density Estimation Version 1.2 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least

More information

Fractional Polynomial Regression

Fractional Polynomial Regression Chapter 382 Fractional Polynomial Regression Introduction This program fits fractional polynomial models in situations in which there is one dependent (Y) variable and one independent (X) variable. It

More information

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Statistics - Lecture One. Outline. Charlotte Wickham  1. Basic ideas about estimation Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence

More information

Economics Division University of Southampton Southampton SO17 1BJ, UK. Title Overlapping Sub-sampling and invariance to initial conditions

Economics Division University of Southampton Southampton SO17 1BJ, UK. Title Overlapping Sub-sampling and invariance to initial conditions Economics Division University of Southampton Southampton SO17 1BJ, UK Discussion Papers in Economics and Econometrics Title Overlapping Sub-sampling and invariance to initial conditions By Maria Kyriacou

More information

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University Integrated Likelihood Estimation in Semiparametric Regression Models Thomas A. Severini Department of Statistics Northwestern University Joint work with Heping He, University of York Introduction Let Y

More information

Time Series and Forecasting Lecture 4 NonLinear Time Series

Time Series and Forecasting Lecture 4 NonLinear Time Series Time Series and Forecasting Lecture 4 NonLinear Time Series Bruce E. Hansen Summer School in Economics and Econometrics University of Crete July 23-27, 2012 Bruce Hansen (University of Wisconsin) Foundations

More information

DIRECT AND INDIRECT LEAST SQUARES APPROXIMATING POLYNOMIALS FOR THE FIRST DERIVATIVE FUNCTION

DIRECT AND INDIRECT LEAST SQUARES APPROXIMATING POLYNOMIALS FOR THE FIRST DERIVATIVE FUNCTION Applied Probability Trust (27 October 2016) DIRECT AND INDIRECT LEAST SQUARES APPROXIMATING POLYNOMIALS FOR THE FIRST DERIVATIVE FUNCTION T. VAN HECKE, Ghent University Abstract Finite difference methods

More information

Local Polynomial Modelling and Its Applications

Local Polynomial Modelling and Its Applications Local Polynomial Modelling and Its Applications J. Fan Department of Statistics University of North Carolina Chapel Hill, USA and I. Gijbels Institute of Statistics Catholic University oflouvain Louvain-la-Neuve,

More information

Nonparametric Inference via Bootstrapping the Debiased Estimator

Nonparametric Inference via Bootstrapping the Debiased Estimator Nonparametric Inference via Bootstrapping the Debiased Estimator Yen-Chi Chen Department of Statistics, University of Washington ICSA-Canada Chapter Symposium 2017 1 / 21 Problem Setup Let X 1,, X n be

More information

Nonparametric Regression. Changliang Zou

Nonparametric Regression. Changliang Zou Nonparametric Regression Institute of Statistics, Nankai University Email: nk.chlzou@gmail.com Smoothing parameter selection An overall measure of how well m h (x) performs in estimating m(x) over x (0,

More information

A nonparametric method of multi-step ahead forecasting in diffusion processes

A nonparametric method of multi-step ahead forecasting in diffusion processes A nonparametric method of multi-step ahead forecasting in diffusion processes Mariko Yamamura a, Isao Shoji b a School of Pharmacy, Kitasato University, Minato-ku, Tokyo, 108-8641, Japan. b Graduate School

More information

Defect Detection using Nonparametric Regression

Defect Detection using Nonparametric Regression Defect Detection using Nonparametric Regression Siana Halim Industrial Engineering Department-Petra Christian University Siwalankerto 121-131 Surabaya- Indonesia halim@petra.ac.id Abstract: To compare

More information

Local linear multiple regression with variable. bandwidth in the presence of heteroscedasticity

Local linear multiple regression with variable. bandwidth in the presence of heteroscedasticity Local linear multiple regression with variable bandwidth in the presence of heteroscedasticity Azhong Ye 1 Rob J Hyndman 2 Zinai Li 3 23 January 2006 Abstract: We present local linear estimator with variable

More information

A Note on Data-Adaptive Bandwidth Selection for Sequential Kernel Smoothers

A Note on Data-Adaptive Bandwidth Selection for Sequential Kernel Smoothers 6th St.Petersburg Workshop on Simulation (2009) 1-3 A Note on Data-Adaptive Bandwidth Selection for Sequential Kernel Smoothers Ansgar Steland 1 Abstract Sequential kernel smoothers form a class of procedures

More information

Extending clustered point process-based rainfall models to a non-stationary climate

Extending clustered point process-based rainfall models to a non-stationary climate Extending clustered point process-based rainfall models to a non-stationary climate Jo Kaczmarska 1, 2 Valerie Isham 2 Paul Northrop 2 1 Risk Management Solutions 2 Department of Statistical Science, University

More information

Modification and Improvement of Empirical Likelihood for Missing Response Problem

Modification and Improvement of Empirical Likelihood for Missing Response Problem UW Biostatistics Working Paper Series 12-30-2010 Modification and Improvement of Empirical Likelihood for Missing Response Problem Kwun Chuen Gary Chan University of Washington - Seattle Campus, kcgchan@u.washington.edu

More information

Estimation of AUC from 0 to Infinity in Serial Sacrifice Designs

Estimation of AUC from 0 to Infinity in Serial Sacrifice Designs Estimation of AUC from 0 to Infinity in Serial Sacrifice Designs Martin J. Wolfsegger Department of Biostatistics, Baxter AG, Vienna, Austria Thomas Jaki Department of Statistics, University of South Carolina,

More information

A kernel indicator variogram and its application to groundwater pollution

A kernel indicator variogram and its application to groundwater pollution Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session IPS101) p.1514 A kernel indicator variogram and its application to groundwater pollution data Menezes, Raquel University

More information

Chapter 9. Non-Parametric Density Function Estimation

Chapter 9. Non-Parametric Density Function Estimation 9-1 Density Estimation Version 1.1 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

The choice of weights in kernel regression estimation

The choice of weights in kernel regression estimation Biometrika (1990), 77, 2, pp. 377-81 Printed in Great Britain The choice of weights in kernel regression estimation BY THEO GASSER Zentralinstitut fur Seelische Gesundheit, J5, P.O.B. 5970, 6800 Mannheim

More information

CONCEPT OF DENSITY FOR FUNCTIONAL DATA

CONCEPT OF DENSITY FOR FUNCTIONAL DATA CONCEPT OF DENSITY FOR FUNCTIONAL DATA AURORE DELAIGLE U MELBOURNE & U BRISTOL PETER HALL U MELBOURNE & UC DAVIS 1 CONCEPT OF DENSITY IN FUNCTIONAL DATA ANALYSIS The notion of probability density for a

More information

ECO Class 6 Nonparametric Econometrics

ECO Class 6 Nonparametric Econometrics ECO 523 - Class 6 Nonparametric Econometrics Carolina Caetano Contents 1 Nonparametric instrumental variable regression 1 2 Nonparametric Estimation of Average Treatment Effects 3 2.1 Asymptotic results................................

More information

Supporting Information for Estimating restricted mean. treatment effects with stacked survival models

Supporting Information for Estimating restricted mean. treatment effects with stacked survival models Supporting Information for Estimating restricted mean treatment effects with stacked survival models Andrew Wey, David Vock, John Connett, and Kyle Rudser Section 1 presents several extensions to the simulation

More information

Nonparametric Identi cation and Estimation of Truncated Regression Models with Heteroskedasticity

Nonparametric Identi cation and Estimation of Truncated Regression Models with Heteroskedasticity Nonparametric Identi cation and Estimation of Truncated Regression Models with Heteroskedasticity Songnian Chen a, Xun Lu a, Xianbo Zhou b and Yahong Zhou c a Department of Economics, Hong Kong University

More information

Monte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics

Monte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics Monte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics Amang S. Sukasih, Mathematica Policy Research, Inc. Donsig Jang, Mathematica Policy Research, Inc. Amang S. Sukasih,

More information

On the Accuracy of Bootstrap Confidence Intervals for Efficiency Levels in Stochastic Frontier Models with Panel Data

On the Accuracy of Bootstrap Confidence Intervals for Efficiency Levels in Stochastic Frontier Models with Panel Data On the Accuracy of Bootstrap Confidence Intervals for Efficiency Levels in Stochastic Frontier Models with Panel Data Myungsup Kim University of North Texas Yangseon Kim East-West Center Peter Schmidt

More information

The exact bootstrap method shown on the example of the mean and variance estimation

The exact bootstrap method shown on the example of the mean and variance estimation Comput Stat (2013) 28:1061 1077 DOI 10.1007/s00180-012-0350-0 ORIGINAL PAPER The exact bootstrap method shown on the example of the mean and variance estimation Joanna Kisielinska Received: 21 May 2011

More information

Inferences for the Ratio: Fieller s Interval, Log Ratio, and Large Sample Based Confidence Intervals

Inferences for the Ratio: Fieller s Interval, Log Ratio, and Large Sample Based Confidence Intervals Inferences for the Ratio: Fieller s Interval, Log Ratio, and Large Sample Based Confidence Intervals Michael Sherman Department of Statistics, 3143 TAMU, Texas A&M University, College Station, Texas 77843,

More information

Nonparametric Heteroscedastic Transformation Regression Models for Skewed Data, with an Application to Health Care Costs

Nonparametric Heteroscedastic Transformation Regression Models for Skewed Data, with an Application to Health Care Costs Nonparametric Heteroscedastic Transformation Regression Models for Skewed Data, with an Application to Health Care Costs Xiao-Hua Zhou, Huazhen Lin, Eric Johnson Journal of Royal Statistical Society Series

More information

Function of Longitudinal Data

Function of Longitudinal Data New Local Estimation Procedure for Nonparametric Regression Function of Longitudinal Data Weixin Yao and Runze Li Abstract This paper develops a new estimation of nonparametric regression functions for

More information

Minimax design criterion for fractional factorial designs

Minimax design criterion for fractional factorial designs Ann Inst Stat Math 205 67:673 685 DOI 0.007/s0463-04-0470-0 Minimax design criterion for fractional factorial designs Yue Yin Julie Zhou Received: 2 November 203 / Revised: 5 March 204 / Published online:

More information

A Shape Constrained Estimator of Bidding Function of First-Price Sealed-Bid Auctions

A Shape Constrained Estimator of Bidding Function of First-Price Sealed-Bid Auctions A Shape Constrained Estimator of Bidding Function of First-Price Sealed-Bid Auctions Yu Yvette Zhang Abstract This paper is concerned with economic analysis of first-price sealed-bid auctions with risk

More information

ESTIMATION OF NONLINEAR BERKSON-TYPE MEASUREMENT ERROR MODELS

ESTIMATION OF NONLINEAR BERKSON-TYPE MEASUREMENT ERROR MODELS Statistica Sinica 13(2003), 1201-1210 ESTIMATION OF NONLINEAR BERKSON-TYPE MEASUREMENT ERROR MODELS Liqun Wang University of Manitoba Abstract: This paper studies a minimum distance moment estimator for

More information

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Sahar Z Zangeneh Robert W. Keener Roderick J.A. Little Abstract In Probability proportional

More information

Efficient Robbins-Monro Procedure for Binary Data

Efficient Robbins-Monro Procedure for Binary Data Efficient Robbins-Monro Procedure for Binary Data V. Roshan Joseph School of Industrial and Systems Engineering Georgia Institute of Technology Atlanta, GA 30332-0205, USA roshan@isye.gatech.edu SUMMARY

More information

Bootstrap. Director of Center for Astrostatistics. G. Jogesh Babu. Penn State University babu.

Bootstrap. Director of Center for Astrostatistics. G. Jogesh Babu. Penn State University  babu. Bootstrap G. Jogesh Babu Penn State University http://www.stat.psu.edu/ babu Director of Center for Astrostatistics http://astrostatistics.psu.edu Outline 1 Motivation 2 Simple statistical problem 3 Resampling

More information

STA 2201/442 Assignment 2

STA 2201/442 Assignment 2 STA 2201/442 Assignment 2 1. This is about how to simulate from a continuous univariate distribution. Let the random variable X have a continuous distribution with density f X (x) and cumulative distribution

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics Jiti Gao Department of Statistics School of Mathematics and Statistics The University of Western Australia Crawley

More information

Estimating a frontier function using a high-order moments method

Estimating a frontier function using a high-order moments method 1/ 16 Estimating a frontier function using a high-order moments method Gilles STUPFLER (University of Nottingham) Joint work with Stéphane GIRARD (INRIA Rhône-Alpes) and Armelle GUILLOU (Université de

More information

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION VICTOR CHERNOZHUKOV CHRISTIAN HANSEN MICHAEL JANSSON Abstract. We consider asymptotic and finite-sample confidence bounds in instrumental

More information

Introduction. Linear Regression. coefficient estimates for the wage equation: E(Y X) = X 1 β X d β d = X β

Introduction. Linear Regression. coefficient estimates for the wage equation: E(Y X) = X 1 β X d β d = X β Introduction - Introduction -2 Introduction Linear Regression E(Y X) = X β +...+X d β d = X β Example: Wage equation Y = log wages, X = schooling (measured in years), labor market experience (measured

More information

Combining data from two independent surveys: model-assisted approach

Combining data from two independent surveys: model-assisted approach Combining data from two independent surveys: model-assisted approach Jae Kwang Kim 1 Iowa State University January 20, 2012 1 Joint work with J.N.K. Rao, Carleton University Reference Kim, J.K. and Rao,

More information

NADARAYA WATSON ESTIMATE JAN 10, 2006: version 2. Y ik ( x i

NADARAYA WATSON ESTIMATE JAN 10, 2006: version 2. Y ik ( x i NADARAYA WATSON ESTIMATE JAN 0, 2006: version 2 DATA: (x i, Y i, i =,..., n. ESTIMATE E(Y x = m(x by n i= ˆm (x = Y ik ( x i x n i= K ( x i x EXAMPLES OF K: K(u = I{ u c} (uniform or box kernel K(u = u

More information

Parametric Techniques Lecture 3

Parametric Techniques Lecture 3 Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to

More information

Non-parametric bootstrap mean squared error estimation for M-quantile estimates of small area means, quantiles and poverty indicators

Non-parametric bootstrap mean squared error estimation for M-quantile estimates of small area means, quantiles and poverty indicators Non-parametric bootstrap mean squared error estimation for M-quantile estimates of small area means, quantiles and poverty indicators Stefano Marchetti 1 Nikos Tzavidis 2 Monica Pratesi 3 1,3 Department

More information

Smooth nonparametric estimation of a quantile function under right censoring using beta kernels

Smooth nonparametric estimation of a quantile function under right censoring using beta kernels Smooth nonparametric estimation of a quantile function under right censoring using beta kernels Chanseok Park 1 Department of Mathematical Sciences, Clemson University, Clemson, SC 29634 Short Title: Smooth

More information

Computational treatment of the error distribution in nonparametric regression with right-censored and selection-biased data

Computational treatment of the error distribution in nonparametric regression with right-censored and selection-biased data Computational treatment of the error distribution in nonparametric regression with right-censored and selection-biased data Géraldine Laurent 1 and Cédric Heuchenne 2 1 QuantOM, HEC-Management School of

More information

A Note on Bootstraps and Robustness. Tony Lancaster, Brown University, December 2003.

A Note on Bootstraps and Robustness. Tony Lancaster, Brown University, December 2003. A Note on Bootstraps and Robustness Tony Lancaster, Brown University, December 2003. In this note we consider several versions of the bootstrap and argue that it is helpful in explaining and thinking about

More information

Statistica Sinica Preprint No: SS

Statistica Sinica Preprint No: SS Statistica Sinica Preprint No: SS-017-0013 Title A Bootstrap Method for Constructing Pointwise and Uniform Confidence Bands for Conditional Quantile Functions Manuscript ID SS-017-0013 URL http://wwwstatsinicaedutw/statistica/

More information

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 Last week... supervised and unsupervised methods need adaptive

More information

Rejoinder. 1 Phase I and Phase II Profile Monitoring. Peihua Qiu 1, Changliang Zou 2 and Zhaojun Wang 2

Rejoinder. 1 Phase I and Phase II Profile Monitoring. Peihua Qiu 1, Changliang Zou 2 and Zhaojun Wang 2 Rejoinder Peihua Qiu 1, Changliang Zou 2 and Zhaojun Wang 2 1 School of Statistics, University of Minnesota 2 LPMC and Department of Statistics, Nankai University, China We thank the editor Professor David

More information

STAT 704 Sections IRLS and Bootstrap

STAT 704 Sections IRLS and Bootstrap STAT 704 Sections 11.4-11.5. IRLS and John Grego Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1 / 14 LOWESS IRLS LOWESS LOWESS (LOcally WEighted Scatterplot Smoothing)

More information

Parametric Techniques

Parametric Techniques Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure

More information

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.

More information

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij = K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing

More information

Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model

Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model 1. Introduction Varying-coefficient partially linear model (Zhang, Lee, and Song, 2002; Xia, Zhang, and Tong, 2004;

More information

Robustness to Parametric Assumptions in Missing Data Models

Robustness to Parametric Assumptions in Missing Data Models Robustness to Parametric Assumptions in Missing Data Models Bryan Graham NYU Keisuke Hirano University of Arizona April 2011 Motivation Motivation We consider the classic missing data problem. In practice

More information

Additional results for model-based nonparametric variance estimation for systematic sampling in a forestry survey

Additional results for model-based nonparametric variance estimation for systematic sampling in a forestry survey Additional results for model-based nonparametric variance estimation for systematic sampling in a forestry survey J.D. Opsomer Colorado State University M. Francisco-Fernández Universidad de A Coruña July

More information

Diagnostics can identify two possible areas of failure of assumptions when fitting linear models.

Diagnostics can identify two possible areas of failure of assumptions when fitting linear models. 1 Transformations 1.1 Introduction Diagnostics can identify two possible areas of failure of assumptions when fitting linear models. (i) lack of Normality (ii) heterogeneity of variances It is important

More information

Contents. 1 Review of Residuals. 2 Detecting Outliers. 3 Influential Observations. 4 Multicollinearity and its Effects

Contents. 1 Review of Residuals. 2 Detecting Outliers. 3 Influential Observations. 4 Multicollinearity and its Effects Contents 1 Review of Residuals 2 Detecting Outliers 3 Influential Observations 4 Multicollinearity and its Effects W. Zhou (Colorado State University) STAT 540 July 6th, 2015 1 / 32 Model Diagnostics:

More information

Titolo Smooth Backfitting with R

Titolo Smooth Backfitting with R Rapporto n. 176 Titolo Smooth Backfitting with R Autori Alberto Arcagni, Luca Bagnato ottobre 2009 Dipartimento di Metodi Quantitativi per le Scienze Economiche ed Aziendali Università degli Studi di Milano

More information

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis Sílvia Gonçalves and Benoit Perron Département de sciences économiques,

More information

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Elizabeth C. Mannshardt-Shamseldin Advisor: Richard L. Smith Duke University Department

More information