Single-index model-assisted estimation in survey sampling


Journal of Nonparametric Statistics, Vol. 21, No. 4, May 2009

Li Wang*
Department of Statistics, University of Georgia, Athens, GA, 30602, USA

(Received 4 June 2008; final version received 19 January 2009)

A model-assisted semiparametric method of estimating finite-population totals is investigated to improve the precision of survey estimators by incorporating multivariate auxiliary information. The proposed superpopulation model is a single-index model (SIM), which has proven to be a simple and efficient semiparametric tool in multivariate regression. A class of estimators based on polynomial spline regression is proposed. These estimators are robust against deviation from SIMs. Under standard design conditions, the proposed estimators are asymptotically design-unbiased, consistent and asymptotically normal. An iterative optimisation routine is provided that is sufficiently fast for users to analyse large and complex survey data within seconds. The proposed method has been applied to simulated datasets and the MU281 dataset, which provide strong evidence corroborating the asymptotic theory.

Keywords: Horvitz-Thompson estimator; model-assisted estimation; semiparametric; spline smoothing; superpopulation
AMS Subject Classification: 62D05; 62G08

1. Introduction

In this article, the classic finite-population estimation problem is investigated. In what follows, let $U = \{1, \ldots, i, \ldots, N\}$ denote the units of a finite population. For each $i \in U$, let $y_i$ be a generic characteristic; the objective is to estimate the total $t_y = \sum_{i \in U} y_i$. A probability sample $s$ is drawn from $U$ according to a fixed sampling design $p_N(\cdot)$, where $p_N(s)$ is the probability of drawing the sample $s$. Let $\pi_i = \pi_{iN} = \Pr\{i \in s\} = \sum_{s \ni i} p_N(s)$ denote the inclusion probability for element $i \in U$, and let $\pi_{ij} = \pi_{ijN} = \Pr\{i, j \in s\} = \sum_{s \ni i,j} p_N(s)$ denote the joint inclusion probability for elements $i, j \in U$.
If no information other than the inclusion probabilities is used to estimate $t_y$, a well-known design-unbiased estimator is the Horvitz-Thompson (HT) estimator

$$\hat t_y = \sum_{i \in s} \frac{y_i}{\pi_i}. \tag{1}$$

* Email: lilywang@uga.edu. 2009 Taylor & Francis.
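As a minimal sketch of Equation (1), the HT estimator simply weights each sampled value by the inverse of its inclusion probability. The function name and the toy numbers below are illustrative assumptions, not part of the article.

```python
def horvitz_thompson_total(y_sample, pi_sample):
    """Estimate the population total t_y from sampled values and their
    first-order inclusion probabilities, as in Equation (1)."""
    if len(y_sample) != len(pi_sample):
        raise ValueError("y_sample and pi_sample must have the same length")
    return sum(y / pi for y, pi in zip(y_sample, pi_sample))


# Hypothetical example: simple random sampling of n = 2 units from N = 4,
# so every unit has inclusion probability n / N = 0.5.
estimate = horvitz_thompson_total([3.0, 5.0], [0.5, 0.5])
```

With equal inclusion probabilities the estimator reduces to the sample sum scaled by $N/n$.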

The variance of the HT estimator under the sampling design is

$$\mathrm{Var}_p(\hat t_y) = \sum_{i,j \in U} (\pi_{ij} - \pi_i \pi_j) \frac{y_i}{\pi_i} \frac{y_j}{\pi_j}.$$

The efficiency of the HT estimator can be significantly improved by incorporating inexpensive auxiliary information available at the population level in addition to the sample data. Such auxiliary information is available for all elements of the population of interest in many surveys. For instance, in many countries, administrative registers provide extensive sources of auxiliary information. Complete registers can give access to variables such as sex, age, income and country of birth. Studies of labor force characteristics or household expenditure patterns, for example, might benefit from these auxiliary data. Another example is the satellite images or GPS data used in spatial sampling. These data are often collected at the population level and are available at little or no extra cost, especially compared with the cost of collecting the survey data. For more examples of auxiliary information, see [1-4].

The use of auxiliary information to improve the accuracy of survey estimators dates back to post-stratification, calibration, ratio and regression estimation; see [5-8] for a general review of these methods. Auxiliary information can also be used to increase the accuracy of estimators of the finite-population distribution function; see, for example, [9].

In this article, let $x_i = \{x_{i1}, \ldots, x_{id}\}$ be a $d$-dimensional auxiliary variable vector, $i \in U$, and assume that $\{(x_i, y_i)\}_{i \in U}$ is a realisation of $(X, Y)$ from an infinite superpopulation, $\xi$, satisfying

$$Y = m(X) + \sigma(X)\varepsilon, \tag{2}$$

in which the $d$-variate function $m$ is the unknown mean function of $Y$ conditional on the auxiliary information vector $X$ and is often assumed to be smooth, and $\sigma$ is the unknown standard deviation (SD) function. The error $\varepsilon$ satisfies $E_\xi(\varepsilon \mid X) = 0$ and $E_\xi(\varepsilon^2 \mid X) = 1$, where $E_\xi$ is the expectation with respect to the superpopulation $\xi$.
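The design variance of the HT estimator stated earlier in this section can be checked numerically for simple random sampling (SRS) without replacement, where $\pi_i = n/N$ and $\pi_{ij} = n(n-1)/\{N(N-1)\}$ for $i \neq j$. The sketch below is an illustration under that SRS assumption; the function names are hypothetical, and the closed form $N^2(1 - n/N)S^2/n$ is the standard textbook expression used only as a cross-check.

```python
def ht_variance_srs(y_pop, n):
    """Evaluate the double-sum HT variance formula under SRS without
    replacement, using pi_ii = pi_i on the diagonal."""
    N = len(y_pop)
    pi = n / N
    pi_joint = n * (n - 1) / (N * (N - 1))
    total = 0.0
    for i in range(N):
        for j in range(N):
            pij = pi if i == j else pi_joint
            total += (pij - pi * pi) * (y_pop[i] / pi) * (y_pop[j] / pi)
    return total


def ht_variance_srs_closed_form(y_pop, n):
    """Textbook SRS variance of the expansion estimator: N^2 (1 - n/N) S^2 / n."""
    N = len(y_pop)
    mean = sum(y_pop) / N
    s2 = sum((y - mean) ** 2 for y in y_pop) / (N - 1)
    return N ** 2 * (1 - n / N) * s2 / n


# Hypothetical toy population of N = 5 units, sample size n = 2.
toy_population = [1.0, 2.0, 3.0, 4.0, 5.0]
v_double_sum = ht_variance_srs(toy_population, 2)
v_closed = ht_variance_srs_closed_form(toy_population, 2)
```

The two values agree up to floating-point error, which is a quick way to confirm that the signs and the diagonal terms of the double sum are handled correctly.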
The interesting problem is how to take advantage of the regression relationship (2) to better estimate $t_y$. The traditional parametric approach to analysing a regression relationship assumes that the superpopulation model is fully described by a finite set of parameters; an example is the linear regression (LREG) estimator discussed in [7]. However, this approach sometimes requires prohibitively complex models with a very large number of parameters to address various hypotheses. It is very difficult to obtain prior model information about the regression function $m$ in Equation (2), and substantial estimation bias can result if a preselected parametric model is too restrictive to fit unexpected features. As an alternative, one can estimate the unknown regression relationship nonparametrically, without reference to a specific form. The flexibility of nonparametric smoothing/regression is extremely helpful in exploratory data analysis as well as in obtaining robust predictions; see [10,11] for details.

Nonparametric methods for survey data are still rather sparse, but they have begun to emerge as important and practical tools; see [12-17]. Breidt and Opsomer [12] first proposed a nonparametric model-assisted estimator based on local polynomial regression, which generalised the parametric framework in survey sampling and improved the precision of the survey estimators. Their investigation is restricted to the scalar case, i.e., $d = 1$.

Nowadays most surveys involve more than one auxiliary variable [18]. For example, the auxiliary information obtained from remote sensing data, satellite images and GPS data provides a wide and growing range of variables to be employed in spatial sampling. The Northeastern lakes survey discussed in [19,20] is a good example. In that study, information such as the longitude, latitude and elevation of every lake in the population is known for the Environmental Monitoring and Assessment Program of the US Environmental Protection Agency.
In addition, the growing possibilities of information and communication technology have made it possible to develop very large and complex surveys. In this article, a $d$-dimensional auxiliary vector is considered to improve the efficiency of estimating $t_y$ for both small and large surveys.

Research in nonparametric survey theory and methodology is quite challenging, however, when the dimension of the auxiliary information vector is high. A key difficulty is the "curse of dimensionality": the optimal rate of convergence decreases with dimensionality [21]. One solution is regression in the form of the additive model popularised by Hastie and Tibshirani [22]; see [13,15,19] for possible applications of the additive model to survey sampling. A weakness of the purely additive model is that interactions between the explanatory variables are completely ignored [23].

An attractive alternative to the additive model is the single-index model (SIM) given in Equation (3). Similar to the first step of projection pursuit regression, the SIM reduces dimensionality while still incorporating interactions; see [24-28] for instance. The basic appeal of the SIM is that it is by nature a hybrid of parametric and nonparametric regression. It preserves the simplicity of parametric regression where simplicity is sufficient: the $d$-variate function $m(x) = m(x_1, \ldots, x_d)$ is expressed as a univariate function of $x^T\theta_0 = \sum_{q=1}^d x_q \theta_{0,q}$. It also employs the flexibility of nonparametric regression where flexibility is necessary.

In this article, I investigate the SIM-assisted estimator for the finite-population total, that is, the superpopulation model in Equation (2) is assumed to be an SIM. Under standard design conditions, a design-consistent estimator of $\theta_0$ is obtained using polynomial splines, and the proposed estimator of $t_y$ is asymptotically design-unbiased (ADU), consistent and asymptotically normal.
By taking advantage of spline smoothing and an iterative optimisation routine, the proposed method is particularly computationally efficient compared with the kernel additive model approaches in the literature on nonparametric survey estimation, in which iterative approaches such as a backfitting algorithm [19,22] or marginal integration [29] are necessary.

The rest of the article is organised as follows. Section 2 gives details of the model specification and proposes the estimation method. Section 3 describes the asymptotic properties of the proposed estimators. Section 4 provides the procedure used to implement the method. Section 5 reports the empirical results. Technical proofs are contained in the Appendix.

2. Superpopulation model and proposed estimator

2.1. Single-index superpopulation model

In this article, the proposed superpopulation model $\xi$ in (2) is an SIM of the form

$$Y = m(X^T\theta_0) + \sigma(X)\varepsilon, \tag{3}$$

where the unknown parameter $\theta_0$ is called the single-index coefficient, which is used for simple interpretation once estimated, and $m$ is an unknown smooth function used for further data summary. If the SIM is misspecified, however, a goodness-of-fit test is necessary and the estimation of $\theta_0$ must be rethought; see [30]. So in this article, instead of presuming that the underlying true function $m$ is a single-index function like the one defined in Equation (3), the single index is identified by the best approximation to the multivariate function $m$. Specifically, a univariate function $g$ is estimated, which optimally approximates the multivariate function $m$ in the sense of

$$g(\nu) = E_\xi[m(X) \mid X^T\theta_0 = \nu]. \tag{4}$$

The advantage of this approach is that it performs well even under model misspecification, which makes it more useful in applications than the traditional SIM given in Equation (3).

For the superpopulation model defined by Equations (2) and (4), let $m_\theta(X^T\theta) = E_\xi[Y \mid X^T\theta] = E_\xi[m(X) \mid X^T\theta]$ for any fixed $\theta$, where, as noted in the introduction, $E_\xi$ denotes the expected value with respect to the superpopulation $\xi$ in (2) and (4). Define the risk function of $\theta$ as

$$R(\theta) = E_\xi[\{Y - m_\theta(X^T\theta)\}^2] = E_\xi\{m(X) - m_\theta(X^T\theta)\}^2 + E_\xi\sigma^2(X), \tag{5}$$

which is uniquely minimised at $\theta_0 \in S_+^{d-1} = \{(\theta_1, \ldots, \theta_d) \mid \sum_{q=1}^d \theta_q^2 = 1, \theta_d > 0\}$.

Remark 2.1 Without constraints, the coefficient vector $\theta_0$ is identified only up to a constant factor. Typically, one requires that $\|\theta_0\| = 1$, which entails that at least one of the coordinates $\theta_{0,1}, \ldots, \theta_{0,d}$ is nonzero. One can assume without loss of generality that $\theta_{0,d} > 0$, so that the candidate $\theta_0$ belongs to the upper unit hemisphere $S_+^{d-1}$.

2.2. Spline smoothing

Estimation of both $\theta_0$ and $g(\cdot)$ in model (4) requires a degree of statistical smoothing. In this article, all estimation is carried out via polynomial splines. The use of polynomial spline smoothing in generalised nonparametric models can be traced back to [21]. As pointed out in [13,31], one of the important advantages of spline smoothing is the relative ease with which spline estimators can be computed, even for large datasets or datasets with regions of sparse data. In addition, spline smoothing is a global smoothing method: after the spline basis is chosen, the coefficients can be estimated by an efficient optimisation procedure. In contrast, kernel-based methods, such as kernel-based backfitting [19,22] and marginal integration [29], in which the optimisation has to be conducted repeatedly at every local data point, are very time-consuming.
To introduce the function space of splines of order $p$, one pre-selects an integer $J = J_n$ with $n^{1/6} \ll J \ll n^{1/5}(\log n)^{-2/5}$, see Assumption (A4), and divides $[0, 1]$ into $(J + 1)$ subintervals $[k_j, k_{j+1})$, $j = 0, \ldots, J - 1$, and $[k_J, 1]$, where $\{k_j\}_{j=1}^J$ is a sequence of equally spaced points, called interior knots, given as

$$k_{1-p} = \cdots = k_{-1} = k_0 = 0 < k_1 < \cdots < k_J < 1 = k_{J+1} = \cdots = k_{J+p},$$

in which $k_j = j/(J + 1)$, $j = 0, 1, \ldots, J + 1$. The $j$th B-spline of order $p$, denoted by $B_{j,p}$, is defined recursively as in [32]. In the following, let $\Gamma^{(2)} = \Gamma^{(2)}[0, 1]$ be the space of all functions with second-order smoothness that are polynomials of degree 3 on each subinterval.

Direct calculation shows that under Assumption (A1) in Section 3.2, for any $\theta \in S_+^{d-1}$, the variable $X^T\theta$ has a Lebesgue probability density function (pdf) that is uniformly bounded below and above, up to constant factors, by the pdf of a rescaled centered Beta$\{(d + 1)/2, (d + 1)/2\}$ distribution,

$$f_d(\nu) = \frac{\Gamma(d + 1)}{\Gamma\{(d + 1)/2\}^2\, 2^d a} \left(1 - \frac{\nu^2}{a^2}\right)^{(d-1)/2} I_{[-a,a]}(\nu),$$

which vanishes at the boundary points $-a$ and $a$. This makes nonparametric smoothing of $Y$ on $X^T\theta$ difficult. I therefore first transform the variable $X^T\theta$ by using the cumulative distribution function $F_d$ of $f_d$,

$$F_d(\nu) = \int_{-1}^{\nu/a} \frac{\Gamma(d + 1)}{\Gamma\{(d + 1)/2\}^2\, 2^d} (1 - t^2)^{(d-1)/2}\, dt, \quad \nu \in [-a, a]. \tag{6}$$

For the rest of the article, denote the transformed variable of the single-index variable $X^T\theta$ by $Z_\theta$ and let $\varphi_\theta$ be the conditional expectation of $m$ given the transformed variable $Z_\theta$, i.e.,

$$\varphi_\theta(Z_\theta) = E_\xi\{m(X) \mid Z_\theta\} = E_\xi\{m(X) \mid X^T\theta\} = m_\theta(X^T\theta). \tag{7}$$
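The transformation in Equation (6) has no closed form for general $d$, but it is a one-dimensional integral and is easy to approximate. The following is a minimal sketch using composite Simpson's rule from the standard library only. The function names, the number of integration steps, and the clamping of values outside $[-a, a]$ to the boundary are assumptions made for illustration; the article does not specify how the integral is evaluated.

```python
import math


def beta_constant(d):
    """Normalising constant Gamma(d+1) / (Gamma((d+1)/2)^2 * 2^d) in Eq. (6)."""
    return math.gamma(d + 1) / (math.gamma((d + 1) / 2) ** 2 * 2 ** d)


def cdf_transform(v, d, a, n_steps=2000):
    """Approximate F_d(v) by composite Simpson's rule on [-1, v/a].

    Values of v outside [-a, a] are clamped to the boundary (an assumption
    made here so that every unit receives a value in [0, 1])."""
    u = max(-1.0, min(1.0, v / a))
    if u <= -1.0:
        return 0.0

    def integrand(t):
        # max(...) guards against tiny negative values from rounding.
        return max(0.0, 1.0 - t * t) ** ((d - 1) / 2)

    h = (u + 1.0) / n_steps  # n_steps must be even for Simpson's rule
    total = integrand(-1.0) + integrand(u)
    for k in range(1, n_steps):
        total += (4 if k % 2 else 2) * integrand(-1.0 + k * h)
    return beta_constant(d) * total * h / 3.0


# Hypothetical usage: d = 3 auxiliary variables, radius a = 2.
z_mid = cdf_transform(0.0, d=3, a=2.0)
```

By symmetry of $f_d$, the transformed value at $\nu = 0$ is one half, which gives a simple check of the constant in Equation (6).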

Remark 2.2 The transformed variable $Z_\theta$ has a quasi-uniform $[0, 1]$ distribution, i.e., the pdf of the transformed variable is supported on $[0, 1]$ with a positive lower bound. In practice, the radius $a$ can be taken as the $100(1 - \alpha)$ percentile of $\{\|x_i\|\}_{i \in U}$ for a small $\alpha$; see Step 1 of the algorithm in Section 4.

2.3. Oracle population-based estimator

If the entire realisation were known by an oracle, one could construct an oracle estimator of $\theta_0$ and $g$ in Equation (4) through a profile least-squares method. Specifically, one estimates the single-index coefficient $\theta_0$ by a consistent estimator $\tilde\theta$ obtained by minimising the empirical version of the risk function $R(\theta)$ defined in Equation (5), i.e.,

$$\tilde\theta = \arg\min_{\theta \in S_+^{d-1}} \tilde R(\theta), \tag{8}$$

where

$$\tilde R(\theta) = N^{-1} \sum_{i \in U} \{y_i - \tilde\varphi_\theta(z_{\theta i})\}^2, \tag{9}$$

with

$$\tilde\varphi_\theta(\cdot) = \arg\min_{\varphi(\cdot) \in \Gamma^{(2)}[0,1]} \sum_{i \in U} \{y_i - \varphi(z_{\theta i})\}^2.$$

Then the link function $g$ can be estimated by $\tilde g$, a spline smoother of $\{y_i\}_{i \in U}$ on $\{z_{\tilde\theta i}\}_{i \in U}$, i.e., $\tilde g(\nu) = \tilde\varphi_{\tilde\theta}(F_d(\nu))$, where $F_d(\cdot)$ is defined in Equation (6). Thus, the best single-index approximation to $m(x)$ is $\tilde m(x) = \tilde g(x^T\tilde\theta) = \tilde\varphi_{\tilde\theta}(z_{\tilde\theta})$.

Let $y = (y_1, y_2, \ldots, y_N)^T$, let $B_\theta = \{B_{j,4}(z_{\theta i})\}_{i \in U,\, j = -3, \ldots, J}$ be the B-spline matrix for any fixed $\theta$, and let $e_i$ be an $N$-vector with a 1 in the $i$th position and 0 elsewhere. Write

$$\tilde m_i = \tilde g(x_i^T\tilde\theta) = \tilde\varphi_{\tilde\theta}(z_{\tilde\theta i}) = e_i^T B_{\tilde\theta} (B_{\tilde\theta}^T B_{\tilde\theta})^{-1} B_{\tilde\theta}^T y. \tag{10}$$

Clearly, $\tilde m_i$ is the spline single-index prediction at $x_i$ based on the entire finite population. If these pseudo-predictions $\tilde m_i$ were known, then a design-unbiased estimator of $t_y$ would be the generalised difference estimator

$$\tilde t_{y,\mathrm{diff}} = \sum_{i \in s} \frac{y_i - \tilde m_i}{\pi_i} + \sum_{i \in U} \tilde m_i, \tag{11}$$

as given in [7, p. 221]. The design variance of $\tilde t_{y,\mathrm{diff}}$ in Equation (11) is

$$\mathrm{Var}_p(\tilde t_{y,\mathrm{diff}}) = \sum_{i,j \in U} (\pi_{ij} - \pi_i \pi_j) \frac{y_i - \tilde m_i}{\pi_i} \frac{y_j - \tilde m_j}{\pi_j}.$$

2.4. Sample-based estimator

However, the predictions $\tilde m_i$ for $m(x_i)$ cannot be computed directly from data, because the only $y_i$'s observed are those with $i \in s$. Therefore, each $\tilde m_i$ needs to be replaced by a sample-based consistent estimator.
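The generalised difference estimator in Equation (11) is an HT estimator of the residual total plus the population total of the predictions. A minimal sketch follows; the function name and the toy predictions are hypothetical, and the predictions are taken as given rather than fitted by splines.

```python
def generalized_difference_total(y_sample, pi_sample, m_sample, m_population):
    """Equation (11): HT-weighted sum of sample residuals plus the sum of
    the predictions over the whole population.

    m_sample holds the predictions for the sampled units (same order as
    y_sample); m_population holds the predictions for every unit in U."""
    residual_total = sum(
        (y - m) / pi for y, m, pi in zip(y_sample, m_sample, pi_sample)
    )
    return residual_total + sum(m_population)


# Hypothetical toy case: N = 4, units 0 and 2 sampled with pi_i = 0.5.
m_population = [1.0, 2.0, 3.0, 4.0]
t_diff = generalized_difference_total(
    y_sample=[2.0, 4.0],
    pi_sample=[0.5, 0.5],
    m_sample=[1.0, 3.0],
    m_population=m_population,
)
```

When the predictions reproduce the sampled values exactly, the residual term vanishes and the estimator equals the population sum of the predictions, which illustrates why good predictions reduce the design variance.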
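For any fixed $\theta$, the sample-based cubic spline estimator $\hat\varphi_\theta$ of $\varphi_\theta$ in Equation (7) is defined as

$$\hat\varphi_\theta(\cdot) = \arg\min_{\varphi(\cdot) \in \Gamma^{(2)}[0,1]} \sum_{i \in s} \frac{1}{\pi_i} \{y_i - \varphi(z_{\theta i})\}^2.$$

Define the sample-based empirical risk function of $\theta$ as

$$\hat R(\theta) = N^{-1} \sum_{i \in s} \frac{1}{\pi_i} \{y_i - \hat\varphi_\theta(z_{\theta i})\}^2; \tag{12}$$

then the sample design-based spline estimator of $\theta_0$ is defined as

$$\hat\theta = \arg\min_{\theta \in S_+^{d-1}} \hat R(\theta), \tag{13}$$

and the spline estimator of $g$ is $\hat g$, i.e., $\hat g(\nu) = \hat\varphi_{\hat\theta}(F_d(\nu))$. For any $i \in s$, let

$$\hat m_i = \hat g(x_i^T\hat\theta) = \hat\varphi_{\hat\theta}(z_{\hat\theta i}) = e_i^T B_{\hat\theta,s} (B_{\hat\theta,s}^T W_s B_{\hat\theta,s})^{-1} B_{\hat\theta,s}^T W_s y_s, \tag{14}$$

where $y_s = \{y_i\}_{i \in s}$ is the $n_N$-vector of $y_i$ obtained in the sample and

$$B_{\hat\theta,s} = \{B_{j,4}(z_{\hat\theta i})\}_{i \in s,\, j = -3, \ldots, J}, \quad W_s = \mathrm{diag}\left\{\frac{1}{\pi_i}\right\}_{i \in s}.$$

For units outside the sample, the same fitted coefficient vector is applied to the spline basis values $\{B_{j,4}(z_{\hat\theta i})\}_{j = -3, \ldots, J}$ of that unit, so that $\hat m_i$ is available for every $i \in U$. Then the sample design-based spline estimator of $t_y$ is

$$\hat t_{y,\mathrm{diff}} = \sum_{i \in s} \frac{y_i - \hat m_i}{\pi_i} + \sum_{i \in U} \hat m_i. \tag{15}$$

3. Properties of the estimator

3.1. A simple alternative expression for the estimator

Like the ratio and LREG estimators [7] and the penalised spline estimators [13], the B-spline estimator defined in Equation (15) can also be represented in a simple form. Let $\hat t_z$ and $\hat t_{z\pi}$ be the two row vectors of population totals of the spline basis functions and of their HT estimators:

$$\hat t_z = \left\{\sum_{i \in U} B_{j,4}(z_{\hat\theta i})\right\}^T_{j = -3, \ldots, J}, \quad \hat t_{z\pi} = \left\{\sum_{i \in s} \frac{1}{\pi_i} B_{j,4}(z_{\hat\theta i})\right\}^T_{j = -3, \ldots, J}.$$

Then the estimator in Equation (15) can be written as

$$\hat t_{y,\mathrm{diff}} = \hat t_y + (\hat t_z - \hat t_{z\pi})\hat\gamma,$$

where $\hat\gamma = (B_{\hat\theta,s}^T W_s B_{\hat\theta,s})^{-1} B_{\hat\theta,s}^T W_s y_s$. Noting that $(1, \ldots, 1)_{J+4} B_{\hat\theta,s}^T = (1, \ldots, 1)_{n_N}$ and

$$\left\{\sum_{i \in s} \frac{1}{\pi_i} B_{j,4}(z_{\hat\theta i})\right\}^T_{j = -3, \ldots, J} = (1, \ldots, 1)_{J+4} B_{\hat\theta,s}^T W_s B_{\hat\theta,s},$$

one has

$$\hat t_{z\pi}\hat\gamma = (1, \ldots, 1)_{J+4} B_{\hat\theta,s}^T W_s B_{\hat\theta,s} (B_{\hat\theta,s}^T W_s B_{\hat\theta,s})^{-1} B_{\hat\theta,s}^T W_s y_s = (1, \ldots, 1)_{J+4} B_{\hat\theta,s}^T W_s y_s = (1, \ldots, 1)_{n_N} W_s y_s = \hat t_y.$$

So the proposed estimator takes the simple and attractive form

$$\hat t_{y,\mathrm{diff}} = \hat t_z \hat\gamma = \sum_{i \in U} \hat m_i.$$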
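The step $(1, \ldots, 1)_{J+4} B_{\hat\theta,s}^T = (1, \ldots, 1)_{n_N}$ in the derivation above relies on the B-spline basis summing to one at every point of $[0, 1]$. The sketch below evaluates the $J + 4$ cubic B-splines on the equally spaced knots of Section 2.2 with the Cox-de Boor recursion and lets one verify this property numerically. The function name and the convention of closing the last interval at $z = 1$ are assumptions for illustration; the article defers the recursive definition to [32].

```python
def bspline_basis(z, n_interior, order=4):
    """Evaluate all B-splines of the given order at z in [0, 1] on
    equally spaced knots k_j = j / (J + 1), with boundary knots repeated.

    Returns a list of J + order values (J + 4 for cubic splines)."""
    J = n_interior
    knots = (
        [0.0] * (order - 1)
        + [j / (J + 1) for j in range(J + 2)]
        + [1.0] * (order - 1)
    )
    # Order-1 (piecewise constant) B-splines; the last non-degenerate
    # interval is closed on the right so that z = 1 is covered.
    basis = []
    for j in range(len(knots) - 1):
        left, right = knots[j], knots[j + 1]
        inside = left <= z < right or (z == 1.0 and right == 1.0)
        basis.append(1.0 if left < right and inside else 0.0)
    # Cox-de Boor recursion up to the requested order.
    for k in range(2, order + 1):
        new_basis = []
        for j in range(len(knots) - k):
            d1 = knots[j + k - 1] - knots[j]
            d2 = knots[j + k] - knots[j + 1]
            term1 = (z - knots[j]) / d1 * basis[j] if d1 > 0 else 0.0
            term2 = (knots[j + k] - z) / d2 * basis[j + 1] if d2 > 0 else 0.0
            new_basis.append(term1 + term2)
        basis = new_basis
    return basis


# Hypothetical usage: J = 3 interior knots gives 7 cubic basis functions.
row = bspline_basis(0.37, n_interior=3)
row_sum = sum(row)
```

Each row of the B-spline matrix therefore sums to one, so a constant function lies in the spline space and the weighted least-squares fit reproduces the HT total of the sampled $y_i$.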

3.2. Assumptions

For the asymptotic properties of the estimators, I use the traditional asymptotic framework given in [12,33], in which both the population and sample sizes increase as $N \to \infty$. There are two sources of variation to be considered here. The first is introduced by the random sample design, and the corresponding measure is denoted by $p$. The "with $p$-probability 1", $O_p$, $o_p$ and $E_p(\cdot)$ notation below is with respect to this measure. The second is associated with the superpopulation from which the finite population is viewed as a sample. The corresponding measure and notation are $\xi$, "with $\xi$-probability 1", $O_\xi$, $o_\xi$ and $E_\xi(\cdot)$.

Before stating the asymptotic results, I formulate some assumptions. Let $B_a^d = \{x \in R^d \mid \|x\| \le a\}$ be the $d$-dimensional ball with radius $a$, center 0 and volume $\mathrm{Vol}_d(B_a^d)$. Let $C^{(k)}(B_a^d) = \{m \mid \text{the } k\text{th order partial derivatives of } m \text{ are continuous on } B_a^d\}$ be the space of $k$th order smooth functions.

(A1) The density function of $X$ satisfies $f(x) \in C^{(4)}(B_a^d)$ for some $a > 0$, and there are positive constants $c_f \le C_f$ such that $c_f/\mathrm{Vol}_d(B_a^d) \le f(x) \le C_f/\mathrm{Vol}_d(B_a^d)$ if $x \in B_a^d$, and $f(x) = 0$ if $x \notin B_a^d$.

(A2) The regression function in Equation (2) satisfies $m \in C^{(4)}(B_a^d)$.

(A3) The error $\varepsilon$ in Equation (2) satisfies $E_\xi(\varepsilon \mid X) = 0$, $E_\xi(\varepsilon^2 \mid X) = 1$, and there exists a positive constant $M$ such that $\sup_{x \in B_a^d} E_\xi(|\varepsilon|^3 \mid X = x) < M$. The SD function $\sigma(x)$ is continuous on $B_a^d$, with $0 < c_\sigma \le \inf_{x \in B_a^d} \sigma(x) \le \sup_{x \in B_a^d} \sigma(x) \le C_\sigma < \infty$.

(A4) As $N \to \infty$, $n_N N^{-1} \to \pi \in (0, 1)$, and the number of interior knots $J$ satisfies $n_N^{1/6} \ll J \ll n_N^{1/5} \{\log(n_N)\}^{-2/5}$.

(A5) For all $N$, $\min_{i \in U} \pi_i \ge \lambda > 0$, $\min_{i,j \in U} \pi_{ij} \ge \lambda^* > 0$ and

$$\limsup_{N \to \infty} n_N \max_{i,j \in U,\, i \neq j} |\pi_{ij} - \pi_i \pi_j| < \infty.$$

(A6) Let $D_{k,N}$ be the set of all distinct $k$-tuples $(i_1, i_2, \ldots, i_k)$ from $U$. Then

$$\limsup_{N \to \infty} n_N^2 \max_{(i_1,i_2,i_3,i_4) \in D_{4,N}} |E_p[(I_{i_1} - \pi_{i_1})(I_{i_2} - \pi_{i_2})(I_{i_3} - \pi_{i_3})(I_{i_4} - \pi_{i_4})]| < \infty,$$

$$\limsup_{N \to \infty} n_N^2 \max_{(i_1,i_2,i_3,i_4) \in D_{4,N}} |E_p[(I_{i_1} I_{i_2} - \pi_{i_1 i_2})(I_{i_3} I_{i_4} - \pi_{i_3 i_4})]| < \infty,$$

and

$$\limsup_{N \to \infty} n_N^2 \max_{(i_1,i_2,i_3) \in D_{3,N}} |E_p[(I_{i_1} - \pi_{i_1})^2 (I_{i_2} - \pi_{i_2})(I_{i_3} - \pi_{i_3})]| < \infty,$$

where $I_i = 1$ if $i \in s$ and $I_i = 0$ otherwise.

(A7) The risk function $\tilde R$ in Equation (9) is locally convex at $\tilde\theta$: for all $\varepsilon > 0$, there exists $\delta > 0$ such that $\|\theta - \tilde\theta\|_2 < \varepsilon$ if $\tilde R(\theta) - \tilde R(\tilde\theta) < \delta$.

(A8) The second-order derivative of $\tilde R(\theta)$ is bounded at $\theta = \tilde\theta$.

Remark 3.1 Assumptions (A1)-(A3) are typical in the nonparametric smoothing literature; see, for instance, [10,11,28]. Assumption (A4) concerns how to choose the number of knots in order to achieve the optimal nonparametric rate of convergence. In practice, the number of interior knots $J$ is chosen according to Equation (17). Assumptions (A5) and (A6) involve the inclusion probabilities of the design and are also assumed in [12]. Assumption (A7) is used to derive the design consistency of $\hat\theta$ to $\tilde\theta$, and Assumption (A8) is used to obtain the rate of the consistency.
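To make Assumption (A5) concrete, the quantity $n_N \max_{i \neq j} |\pi_{ij} - \pi_i \pi_j|$ can be computed in closed form for simple random sampling without replacement. The sketch below is an illustration for that design only (the assumptions themselves cover general designs); the function name is hypothetical.

```python
def srs_a5_quantity(N, n):
    """Return n * |pi_ij - pi_i * pi_j| for SRS without replacement,
    where pi_i = n / N and pi_ij = n (n - 1) / (N (N - 1)) for i != j.
    Under SRS this value is the same for every pair i != j."""
    pi = n / N
    pi_joint = n * (n - 1) / (N * (N - 1))
    return n * abs(pi_joint - pi * pi)


# Hypothetical check with a fixed sampling fraction n / N = 0.1:
# the quantity stays bounded as N grows.
values = [srs_a5_quantity(N, N // 10) for N in (1000, 10000, 100000)]
```

With a fixed sampling fraction $f = n/N$, the quantity approaches $f^2(1 - f)$, so the limsup condition holds for SRS, and $\min_i \pi_i = f > 0$ supplies the lower bound.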

3.3. Asymptotic properties of the estimator

The estimator $\hat\theta$ in Equation (13) of the single-index coefficient $\theta_0$ is asymptotically design-consistent, as the following theorem demonstrates.

Theorem 3.1 Under Assumptions (A1)-(A5) and (A7), $\hat\theta$ is asymptotically design-consistent in the sense that, with $p$-probability 1,

$$\lim_{N \to \infty} \|\hat\theta - \tilde\theta\| = 0,$$

and further, if (A8) holds, then

$$\|\hat\theta - \tilde\theta\| = O_p(J n_N^{-1/2}),$$

where $\tilde\theta$ and $\hat\theta$ are the population- and sample-based estimators of $\theta_0$ in Equations (8) and (13).

Like the local polynomial estimators in [12], the estimator $\hat t_{y,\mathrm{diff}}$ in Equation (15) is ADU and design-consistent, as the following theorem shows.

Theorem 3.2 Under Assumptions (A1)-(A5), (A7) and (A8), the model-assisted spline estimator $\hat t_{y,\mathrm{diff}}$ in Equation (15) is ADU in the sense that

$$\lim_{N \to \infty} E_p\left[\frac{\hat t_{y,\mathrm{diff}} - t_y}{N}\right] = 0 \quad \text{with } \xi\text{-probability 1},$$

and is design-consistent in the sense that, for all $\eta > 0$,

$$\lim_{N \to \infty} E_p[I_{\{|\hat t_{y,\mathrm{diff}} - t_y| > N\eta\}}] = 0 \quad \text{with } \xi\text{-probability 1}.$$

The estimator in Equation (15) also inherits the limiting distribution of the generalised difference estimator, again as for the local polynomial estimators in [12].

Theorem 3.3 Under Assumptions (A1)-(A8), for $\tilde t_{y,\mathrm{diff}}$ and $\hat t_{y,\mathrm{diff}}$ in Equations (11) and (15), as $N \to \infty$,

$$\frac{N^{-1}(\tilde t_{y,\mathrm{diff}} - t_y)}{\mathrm{Var}_p^{1/2}(N^{-1}\tilde t_{y,\mathrm{diff}})} \xrightarrow{d} N(0, 1)$$

implies

$$\frac{N^{-1}(\hat t_{y,\mathrm{diff}} - t_y)}{\hat V^{1/2}(N^{-1}\hat t_{y,\mathrm{diff}})} \xrightarrow{d} N(0, 1),$$

where

$$\hat V(N^{-1}\hat t_{y,\mathrm{diff}}) = \frac{1}{N^2} \sum_{i,j \in U} (\pi_{ij} - \pi_i \pi_j) \frac{y_i - \hat m_i}{\pi_i} \frac{y_j - \hat m_j}{\pi_j} \frac{I_i I_j}{\pi_{ij}}. \tag{16}$$

Details of the proofs of Theorems 3.1-3.3 are given in the Appendix.

Remark 3.2 In [13], the number of knots is fixed, so the bias caused by spline approximation is ignored in developing the asymptotic theory. It has been shown in many contexts of function estimation that, by letting the number of knots increase with the sample size at an appropriate rate, the spline estimate of an unknown function can achieve the optimal nonparametric rate of convergence; see [31,34]. For this purpose, in this article, $n_N^{1/6} \ll J \ll n_N^{1/5}\{\log(n_N)\}^{-2/5}$, as stated in Assumption (A4).

Remark 3.3 As one referee pointed out, the asymptotics with the number of knots allowed to grow is much more challenging, and only very recent work tackles this problem, e.g., [35,36]. However, the results obtained in this article are not directly comparable to those obtained in [35,36] because of the different model settings. The problem in [35,36] is a purely nonparametric curve estimation problem, and the objective is to study the asymptotics of the curve estimators fitted with penalised splines, whereas the problem here is a semiparametric one and the main interest is in estimating the parametric component $\theta$. At the population level, it has been shown that $\theta_0$ should be estimable at the usual root-$n$ rate of convergence using techniques similar to those used in deriving the asymptotics of maximum likelihood estimators. In this article, examination of the approximation of the derivatives (up to the second order) of the risk function in Equation (5) by their empirical versions implies that a range of smoothing parameters is allowed for the desired asymptotics; see Appendix A of [37]. This differs from the nonparametric curve estimation in [35,36], in which the optimal choice of the smoothing parameter is required to achieve the optimal rate of convergence.

4. Algorithm

In this section, the procedure used to implement the estimation of $\theta_0$ and $t_y$ is described. I first introduce some new notation. For any fixed $\theta$, write $P_{\theta,s} = B_{\theta,s}(B_{\theta,s}^T W_s B_{\theta,s})^{-1} B_{\theta,s}^T W_s$ as the sample projection matrix onto the cubic spline space.
For any $q = 1, \ldots, d$, write $\dot B_q = (\partial/\partial\theta_q) B_\theta$ and $\dot P_q = (\partial/\partial\theta_q) P_{\theta,s}$ as the first-order derivatives of $B_\theta$ and $P_{\theta,s}$ with respect to $\theta$. Write $\theta_{-d} = (\theta_1, \ldots, \theta_{d-1})^T$. Let $\hat S^*(\theta_{-d})$ be the score vector of the risk function $\hat R^*(\theta_{-d}) = \hat R(\theta_1, \theta_2, \ldots, \theta_{d-1}, \sqrt{1 - \|\theta_{-d}\|_2^2})$, that is, $\hat S^*(\theta_{-d}) = (\partial/\partial\theta_{-d}) \hat R^*(\theta_{-d})$. The exact form of $\hat S^*(\theta_{-d})$ is given in Lemma 4.1 of [37].

In practice, the estimation is implemented via the following procedure.

Step 1. Standardise the auxiliary variables $\{x_i\}_{i \in U}$ and find the radius $a$ used in the CDF transformation (6) by calculating the $100(1 - \alpha)$ percentile of $\{\|x_i\|\}_{i \in U}$ ($\alpha = 0.01$ or $0.05$, for example).

Step 2. Find the estimator $\hat\theta$ of $\theta_0$ by minimising $\hat R$ in Equation (12) through the PORT optimisation routine in the technical report [38], with $(0, 0, \ldots, 1)^T$ as the initial value and the gradient vector $\hat S^*$ in Equation (17) of [37]. If $d < n$, one can alternatively start from the simple OLS estimator (after standardisation) for $\{y_i, x_i\}_{i \in s}$ with its last coordinate positive.

Step 3. Obtain the estimator $\hat m_i$ of $m(x_i)$, $i \in U$, by applying formula (14).

Step 4. Calculate the sample design-based spline estimator of $t_y$ in Equation (15).

Remark 4.1 In Step 2, the number of interior knots is

$$J = \min\{c_1 [n^{1/5.5}], c_2\}, \tag{17}$$

where $c_1$ and $c_2$ are positive integers and $[\nu]$ denotes the integer part of $\nu$. The choice of the tuning parameter $c_1$ makes little difference for a large sample, and according to the asymptotic theory, there is no optimal way to set $c_1$ and $c_2$. I recommend using $c_1 = 1$ to save computing for massive data sets and $c_2 = 5, \ldots, 10$ for smooth monotonic or smooth unimodal regression, as suggested by Yu and Ruppert [39].
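The knot rule in Equation (17) is simple enough to sketch directly. The function name and the default values below are illustrative choices within the ranges recommended in Remark 4.1, not settings taken from the accompanying software.

```python
def n_interior_knots(n, c1=1, c2=5):
    """Number of interior knots J = min{c1 * [n^(1/5.5)], c2} from
    Equation (17), where [v] is the integer part of v."""
    return min(c1 * int(n ** (1 / 5.5)), c2)


# Hypothetical sample sizes matching those used in Section 5.
knots_by_n = {n: n_interior_knots(n) for n in (50, 100, 200, 5000)}
```

Because $n^{1/5.5}$ grows very slowly, the rule yields only a handful of interior knots even for large samples, which keeps the B-spline matrix small and the optimisation in Step 2 fast.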

5. Empirical results

In this section, empirical results are provided to demonstrate the applicability of the methodology. Besides the spline single-index (SIM) estimators proposed in the article, I have evaluated for comparison the performance of three other estimators: the HT estimator in Equation (1), the LREG estimator without interaction terms in Chapter 6 of [7], and the spline additive estimator (AM) in [13] with degrees 1, 2 and 3 and adaptive knots. The number of knots $J$ for the spline SIM estimator is selected according to Equation (17).

5.1. Simulated population

To illustrate the finite-sample behavior of the estimator $\hat t_{y,\mathrm{diff}}$, some simulation results are presented. For the superpopulation model (2), the following six mean functions are considered:

2-dimension (linear): $m_1(x) = x_1 + x_2$;
2-dimension (quadratic): $m_2(x) = 1 + (x_1 + x_2)^2$;
2-dimension (bump 1): $m_3(x) = x_1 + x_2 + 4\exp\{-(x_1 + x_2)^2\}$;
2-dimension (bump 2): $m_4(x) = x_1 + x_2 + 4\exp\{-(x_1 + x_2)^2\} + \sqrt{x_1^2 + x_2^2}$;
4-dimension (sinusoid): $m_5(x) = \sin(x^T\theta_0)$, $\theta_0 = (1, 1, 0, 1)^T/\sqrt{3}$;
10-dimension (sinusoid): $m_6(x) = \sin(x^T\theta_0)$, $\theta_0 = (1, 1, 0, \ldots, 0, 1)^T/\sqrt{3}$.

These represent various correct and incorrect SIM specifications. Function $m_1$ is a simple linear additive function with two auxiliary variables, and it is also a linear single-index function. Functions $m_2$, $m_3$, $m_5$ and $m_6$ are common SIMs, but unlike $m_1$, they are not additive, so the purely linear or additive model would be misspecified. Function $m_4$ is neither a genuine single-index nor a genuine additive function, so any of the above models would be misspecified. However, because the SIM in this article is identified by the best approximation (see Equation (4)) to the multivariate mean function, the estimator $\hat t_{y,\mathrm{diff}}$ is expected to be robust in this case.

The auxiliary vectors $\{x_i\}_{i \in U}$ are generated as i.i.d. $d$-dimensional uniform $(0, 1)$ random vectors.
The population values $y_i$ are generated from the mean functions by adding i.i.d. $N(0, \sigma^2)$ errors with $\sigma = 0.1$ and $0.4$. The population is of size $N$. Samples are generated by simple random sampling using sample sizes $n = 50$, $100$ and $200$. For each combination of mean function, SD and sample size, 1000 replicate samples are selected from the same population, the estimators are calculated, and the design bias, design variance and design mean squared error (MSE) are estimated.

Table 1 lists the average mean squared errors (AMSEs) of the spline estimators $\hat\theta$ in Equation (13) over the $d$ dimensions,

$$\mathrm{AMSE}(\hat\theta) = \frac{1}{d} \sum_{q=1}^d \mathrm{MSE}(\hat\theta_q), \tag{18}$$

from which one sees that, even for small sample sizes, the estimators $\hat\theta$ are very accurate for all the population models, and the precision improves as the sample size $n$ increases.

Table 1. AMSE of the spline estimators $\hat\theta$ defined in Equation (18). Columns: $\sigma$, $n$ (table entries not preserved).
Note: Based on 1000 replications of simple random samples of size $n$ from a population of size $N$.

In terms of the design biases, the percent relative design biases

$$\frac{E_p[\hat t_{y,\mathrm{diff}}] - t_y}{t_y} \times 100\%$$

defined in [12] have been measured for all the above models. The relative design biases of the SIM estimators are quite small (<1% for all cases in the simulation), even for sample size $n = 50$.

Table 2 shows the ratios of the design MSEs of the HT, LREG and AM estimators to the MSE of the proposed spline SIM estimator. From this table, one sees that the model-assisted estimators (LREG, AM and SIM) perform much better than the simple HT estimator regardless of the type of mean function and the error SD. For $m_1$, LREG is expected to be the preferred estimator, since the assumed model is correctly specified. The AM and SIM estimators behave similarly in this case, and the MSE ratios of AM to SIM are close to 1. Not much efficiency is lost by using SIM or AM instead of LREG: the MSE ratios of LREG to SIM are at least 0.78 in all cases. For the remaining populations, the SIM estimators perform consistently better than the LREG and AM estimators, because the interactions between the auxiliary variables are completely ignored by the LREG and AM estimators. Although $m_4$ is not a genuine single-index function, the SIM estimators are still much more accurate than the HT, LREG and AM estimators, confirming the theory that the proposed estimators are robust against deviation from the SIM.

To show how fast the computation is, Table 2 also provides the average time (based on 1000 replications) needed to obtain the SIM estimators on an ordinary PC with an Intel Pentium IV 1.86 GHz processor and 1.0 GB RAM. The proposed SIM estimation is extremely fast. For instance, for Model 6, the SIM estimation for a 10-dimensional sample of size 200 takes on average 0.23 s.
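The summary measures used in this section (the AMSE in Equation (18), the percent relative design bias and the MSE ratios of Table 2) can be computed from Monte Carlo replicates as sketched below. The function names and the toy replicate values are hypothetical; the reference vector passed to `amse` is whichever target the replicates are compared against.

```python
def mse(estimates, target):
    """Monte Carlo mean squared error of scalar estimates around a target."""
    return sum((e - target) ** 2 for e in estimates) / len(estimates)


def amse(theta_hats, theta_ref):
    """Equation (18): average over the d coordinates of the per-coordinate
    MSE. theta_hats is a list of replicate vectors of length d."""
    d = len(theta_ref)
    per_coordinate = [
        mse([theta[q] for theta in theta_hats], theta_ref[q]) for q in range(d)
    ]
    return sum(per_coordinate) / d


def percent_relative_bias(estimates, true_total):
    """Percent relative design bias: (E_p[t_hat] - t_y) / t_y * 100."""
    mean_estimate = sum(estimates) / len(estimates)
    return (mean_estimate - true_total) / true_total * 100.0


def mse_ratio(competitor_estimates, sim_estimates, true_total):
    """Ratio of a competitor's design MSE to that of the SIM estimator;
    values above 1 favour the SIM estimator."""
    return mse(competitor_estimates, true_total) / mse(sim_estimates, true_total)


# Hypothetical replicate values, for illustration only.
toy_amse = amse([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])
toy_bias = percent_relative_bias([9.0, 11.0, 13.0], 10.0)
toy_ratio = mse_ratio([8.0, 12.0], [9.0, 11.0], 10.0)
```

In a full simulation these functions would be applied to the 1000 replicate estimates obtained for each combination of mean function, error SD and sample size.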
I have also carried out the simulation with sample size $n = 5000$ generated from a population of size 50,000. Remarkably, it takes on average <8 s to obtain the SIM estimators for all the above models.

Table 2. Ratio of the MSE of the HT, LREG and additive model-assisted (AM) estimators to the MSE of the SIM-assisted estimators, and the average computing time of the SIM estimators. Columns: Model; $\sigma$; $n$; MSE ratio for HT, LREG, AM (degree = 1), AM (degree = 2), AM (degree = 3); time of SIM (s) (table entries not preserved).
Note: Based on 1000 replications of simple random sampling from a population of size $N$.

5.2. MU281 data

The MU284 data set from Appendix B of [7] contains data about Swedish municipalities. The study variable $y$ is defined from RMT85, the municipal tax receipts in 1985. Two auxiliary variables, $x_1$ (CS82) and $x_2$ (SS82), are used, where $x_1$ is the number of Conservative Party seats in the municipal council and $x_2$ is the number of Social Democrat Party seats. The three largest cities according to the 1975 population variable (pop75) are discarded because they are huge outliers and would be treated separately in practice. The population total $t_y$ is computed over the remaining $N = 281$ Swedish municipalities. The first coordinate of the oracle estimator $\tilde\theta$ (Equation (8)) at the population level is found to be 0.8412, so that, by the constraint $\|\tilde\theta\| = 1$ with positive last coordinate, the second is approximately 0.541.

A Monte Carlo simulation is carried out in which 1000 repeated SRS samples (each with $n = 50$ and $100$) are drawn from the MU281 population of Swedish municipalities. To demonstrate the closeness of the spline estimator $\hat\theta$ to the oracle index parameter $\tilde\theta$, Table 3 lists the sample mean (MEAN), design bias (BIAS), design SD, design MSE and the AMSE in Equation (18) of $\hat\theta$. From this table, one sees that the sample-based estimators $\hat\theta$ are very accurate even for sample size 50. As expected, when the sample size increases, the coefficient is estimated more accurately.

Table 3. Spline estimators $\hat\theta$ on MU281 data. Columns: $n$; $\theta$; MEAN; BIAS; SD; MSE; AMSE. Rows: $\theta_1$ and $\theta_2$ for $n = 50$ and $n = 100$ (table entries not preserved).
Note: Based on 1000 replications of simple random sampling from the population of 281 Swedish municipalities.

Table 4 shows the performance of the HT, LREG, AM and SIM estimators of $t_y$. One sees from this table that the model-assisted estimators are much more accurate than the simple HT estimator. Among all the model-assisted estimators, the spline SIM estimators are better than the other estimators in terms of the MSE.

Table 4. Estimators of $t_y$ on MU281 data. Columns: $n$; Estimator; MEAN; BIAS; SD; MSE. Rows: HT, LREG, AM (degree = 1), AM (degree = 2), AM (degree = 3) and SIM for $n = 50$ and $n = 100$ (table entries not preserved).
Note: Based on 1000 replications of simple random sampling from the population of 281 Swedish municipalities.

Note

The corresponding computing package in R, svyty_1.0.zip, can be freely downloaded from edu/research/svyty_1.0.zip.

References

[1] R.L. Chambers, Robust case-weighting for multipurpose establishment surveys, J. Off. Statist. 12 (1996).
[2] R.L. Chambers, A.H. Dorfman, and T.E. Wehrly, Bias robust estimation in finite populations using nonparametric calibration, J. Amer. Statist. Assoc. 88 (1993).
[3] A.H. Dorfman, Nonparametric regression for estimating totals in finite populations, Proceedings of the Section on Survey Research Methods, American Statistical Association, Alexandria, VA, 1992.
[4] A.H. Dorfman and P. Hall, Estimators of the finite population distribution function using nonparametric regression, Ann. Statist. 21 (1993).
[5] R.L. Chambers, A.H. Dorfman, and P. Hall, Properties of estimators of the finite distribution function, Biometrika 79 (1992).
[6] R.L. Chambers and C.J. Skinner, Analysis of Survey Data, Wiley, Chichester.
[7] C.E. Särndal, B. Swensson, and J. Wretman, Model Assisted Survey Sampling, Springer-Verlag, New York.
[8] M.E. Thompson, Theory of Sample Surveys, Chapman and Hall, London.
[9] S. Wang and A.H. Dorfman, A new estimator for the finite population distribution function, Biometrika 83 (1997).
[10] J. Fan and I. Gijbels, Local Polynomial Modelling and Its Applications, Chapman and Hall, London.
[11] W. Härdle, Applied Nonparametric Regression, Cambridge University Press, Cambridge.
[12] F.J. Breidt and J.D. Opsomer, Local polynomial regression estimators in survey sampling, Ann. Statist. 28 (2000).
[13] F.J. Breidt, G. Claeskens, and J.D. Opsomer, Model-assisted estimation for complex surveys using penalised splines, Biometrika 92 (2005).
[14] G.E. Montanari and M.G. Ranalli, Nonparametric model calibration estimation in survey sampling, J. Amer. Statist. Assoc. 100 (2005).
[15] J.D. Opsomer, F.J. Breidt, G.G. Moisen, and G. Kauermann, Model-assisted estimation of forest resources with generalized additive models (with discussion), J. Amer. Statist. Assoc. 102 (2007).

14 500 L. Wang [16] H. Zheng and R.J.A. Little, Penalized spline model-based estimation of finite population total from probabilityproportional-to-size samples, J. Off. Statist. 19 (2003), pp [17] H. Zheng and R.J.A. Little, Penalized spline nonparametric mixed models for inference about a finite population mean from two-stage samples, Survey Methodol. 30 (2004), pp [18] C.E. Särndal and S. Lundström, Estimation in Surveys with onresponse, Wiley, ew York, [19] F.J. Breidt, J.D. Opsomer, A.A. Johnson, and M.G., Ranalli, Semiparametric model-assisted estimation for natural resource surveys, Survey Methodol. 33 (2007), pp [20] S. Everson-Stewart, onparametric survey regression estimation in two-stage spatial sampling, unpublished masters project, Colorado State University. Available at [21] C.J. Stone, The dimensionality reduction principle for generalized additive models, Ann. Statist. 14 (1986), pp [22] T.J. Hastie and R.J. Tibshirani, Generalized Additive Models, Chapman and Hall, London, [23] S. Sperlich, D. Tjøstheim, and L. Yang, onparametric estimation and testing of interaction in additive models, Econ. Theory 18 (2002), pp [24] R. Carroll, J. Fan, I. Gijbels, and M.P. Wand, Generalized partially linear single-index models. J. Amer. Statist. Assoc. 92 (1997), pp [25] P. Hall, On projection pursuit regression, Ann. Statist. 17 (1989), pp [26] W. Härdle, P. Hall, and H. Ichimura, Optimal smoothing in single-index models, Ann. Statist. 21 (1993), pp [27] J.L. Horowitz and W. Härdle, Direct semiparametric estimation of single-index models with discrete covariates, J. Amer. Statist. Assoc. 91 (1996), pp [28] Y. Xia, H. Tong, W.K. Li, and L. Zhu, An adaptive estimation of dimension reduction space, J. R. Stat. Soc. Ser. B Stat. Methodol. 64 (2002), pp [29] O.B. Linton and J.P. ielsen, A kernel method of estimating structured nonparametric regression based on marginal integration, Biometrika 82 (1995), pp [30] Y. Xia, W.K. Li, H. Tong, and D. 
Zhang, A goodness-of-fit test for single-index models, Statist. Sinica 14 (2004), pp [31] L. Wang and L. Yang, Spline-backfitted kernel smoothing of nonlinear additive autoregression model, Ann. Statist. 35 (2007), pp [32] C. de Boor, A Practical Guide to Splines, Springer-Verlag, ew York, [33] C. Isaki and W.A. Fuller, Survey design under the regression superpopulation model, J. Amer. Statist. Assoc. 77 (1982), pp [34] J.Z. Huang, Local asymptotics for polynomial spline regression, Ann. Statist. 31 (2003), pp [35] G. Claeskens, T. Krivobokova, and J. Opsomer, Asymptotic properties of penalized spline regression estimators, to appear in Biometrika. [36] Y. Li and D. Ruppert, On the asymptotics of penalized splines, Biometrika 95 (2008), pp [37] L. Wang, Single-index model-assisted estimation in survey sampling, Technical Report. Available at [38] D.M. Gay, Usage summary for selected optimization routines, Computing Science Technical Report o Available at [39] Y. Yu and D. Ruppert, Penalized spline estimation for partially linear single index models, J. Amer. Statist. Assoc. 97 (2002), pp Appendix A.1. Proof of Theorem 3.1 Let (, A, P) be the design probability space with respect to the sampling design measure. By Lemma A.3 of [37], for any δ>0 and ω, there exists an integer 0 (ω), such that when > 0 (ω), ˆR( θ,ω) R( θ) < δ/2. ote that ˆθ = ˆθ(ω)is the minimiser of ˆR(θ, ω),so ˆR(ˆθ(ω),ω) R( θ)<δ/2. Using Lemma A.3 of [37] again, there exists 1 (ω), such that when > 1 (ω), R(ˆθ(ω)) ˆR(ˆθ(ω),ω)<δ/2. Thus, when >max( 0 (ω), 1 (ω)), R(ˆθ(ω)) R( θ) < δ 2 + ˆR(ˆθ(ω),ω) R( θ) < δ 2 + δ 2 = δ. By Assumption (A7), for any ε>0, if R(ˆθ(ω),ω) R( θ)<δ, then one would have ˆθ(ω) θ 2 <ε for large enough, which is true for any ω, and the strong consistency holds. ext, note that ˆR(θ) ˆR(θ) = 2 ˆR(θ) θ θ θ θ θ=ˆθ θ= θ T ( ˆθ θ), θ= θ with θ = t ˆθ + (I t) θ. So ( ) ˆθ θ 2 ˆR(θ) 1 ˆR(θ) = θ θ T, θ θ= θ θ= θ

15 Journal of onparametric Statistics 501 where according to Equation (A.6) of [37] and the above consistency result of ˆθ, lim 2 ˆR(θ) θ θ T θ= θ in probability p, and by Equation (A.6) of [37] again, one has ˆR(θ) θ sup ˆR(θ) θ= θ θ Sc d 1 θ Thus ˆθ θ = O p (J /n 1/2 ) by Assumption (A8). 2 R(θ) θ θ T R(θ) θ θ= θ ( = O p J n 1/2 ). A.2. Proof of Theorem 3.2 Lemma A.1 Under Assumptions (A1) (A5) and (A7) one has where m i and ˆm i are defined in Equations (10) and (14). Proof Let ˆ m i = ei TB ˆθ (BṰB θ ˆθ ) 1 B Ṱ y, then one can write θ 1 lim E p ( m i ˆm i ) 2 = 0, i U ( m i ˆm i ) 2 = ( m i ˆ m i ) 2 + ( ˆ m i ˆm i ) 2 + 2( m i ˆ m i )( ˆ m i ˆm i ). By Lemma A.2 of [37], (1/)E p [ i U ( ˆ m i ˆm i ) 2 ] 0, it suffices to show 1 E p ( m i ˆ m i ) 2 0. i U Let f(t)= e T i P t ˆθ+(1 t) θ y, then df(t)/dt = et i dq=1 ( / θ q )P t ˆθ+(1 t) θ ( ˆθ q θ q )y. Thus, (A.1) where t (0, 1). Therefore, m i ˆ m i = e T i B ˆθ (BṰ θ B ˆθ ) 1 B Ṱ θ y et i B θ (BT θ B θ ) 1 B T θ y d = f(1) f(0) = ei T P θ t θ+(1 t ˆ ) θ ( ˆθ q θ q )y, q=1 q 1 E p ( m i ˆ m i ) 2 = 1 i U ote that according to Theorem 3.1, with p-probability 1, E p et i i U q=1 P θ t ˆθ+(1 t ) θ P q θ θ, q 2 d P θ t ˆθ+(1 t ) θ ( ˆθ q θ q )y. q and ˆθ θ = O p (J /n 1/2 ). By Lemma A.4 of [37], there exists a positive constant C 0 such that sup 1 k d sup θ S d 1 ( / θ c k )P θ C 0 J with ξ-probability 1. Thus Equation (A.1) follows directly from the above arguments and Assumption (A4). Hence the result.
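The projection $P_\theta y = B_\theta (B_\theta^T B_\theta)^{-1} B_\theta^T y$ that drives the proof of Lemma A.1 can be sketched numerically. The following is a minimal illustration, not the paper's implementation: the truncated power basis, the equally spaced knots, the simulated population, and the names `truncated_power_basis` and `projection_fit` are all assumptions made for the example.

```python
import numpy as np

def truncated_power_basis(u, n_knots=4, degree=1):
    """Spline basis in the scalar index u: polynomial terms plus truncated powers.

    Assumption: equally spaced interior knots on the observed range of u;
    the paper's own basis may differ.
    """
    knots = np.linspace(u.min(), u.max(), n_knots + 2)[1:-1]
    cols = [u ** p for p in range(degree + 1)]
    cols += [np.clip(u - k, 0.0, None) ** degree for k in knots]
    return np.column_stack(cols)

def projection_fit(X, y, theta, **basis_kwargs):
    """Return P_theta y = B (B^T B)^{-1} B^T y for the index u = X theta."""
    theta = theta / np.linalg.norm(theta)          # keep the direction on the unit sphere
    B = truncated_power_basis(X @ theta, **basis_kwargs)
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)   # least squares, no explicit inverse
    return B @ coef

# Hypothetical finite population following a single-index structure.
rng = np.random.default_rng(0)
N = 500
X = rng.normal(size=(N, 2))
theta_tilde = np.array([0.6, 0.8])
y = np.sin(X @ theta_tilde) + 0.1 * rng.normal(size=N)

m_tilde = projection_fit(X, y, theta_tilde)
m_perturbed = projection_fit(X, y, theta_tilde + np.array([0.01, -0.01]))

# Average squared change in the fit when the index direction moves slightly.
gap = np.mean((m_tilde - m_perturbed) ** 2)
```

Here `gap` plays the role of $N^{-1}\sum_{i}(\tilde m_i - \hat{\tilde m}_i)^2$, the quantity that the lemma controls through the derivative of $P_\theta$ and the rate of $\hat\theta - \tilde\theta$.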

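Proof of Theorem 3.2 Note that

\[
\frac{\hat t_{y,\mathrm{diff}} - t_y}{N} = \sum_{i\in U_N} \frac{y_i - \tilde m_i}{N} \left( \frac{I_i}{\pi_i} - 1 \right) + \sum_{i\in U_N} \frac{\hat m_i - \tilde m_i}{N} \left( 1 - \frac{I_i}{\pi_i} \right).
\]

Then

\[
E_p \left| \frac{\hat t_{y,\mathrm{diff}} - t_y}{N} \right| \le E_p \left| \sum_{i\in U_N} \frac{y_i - \tilde m_i}{N} \left( \frac{I_i}{\pi_i} - 1 \right) \right| + \left\{ E_p \sum_{i\in U_N} \frac{(\hat m_i - \tilde m_i)^2}{N} \right\}^{1/2} \left\{ E_p \sum_{i\in U_N} \frac{(1 - I_i/\pi_i)^2}{N} \right\}^{1/2}. \tag{A.2}
\]

According to the definition of Equation (9), under Assumptions (A1)–(A4), one has

\[
\limsup_{N\to\infty} \frac{1}{N} \sum_{i\in U_N} (y_i - \tilde m_i)^2 < \infty .
\]

Following the same argument as in Theorem 1 of [12], the first term on the right-hand side of Equation (A.2) converges to zero as $N \to \infty$. For the second term, Assumption (A5) implies that

\[
E_p \sum_{i\in U_N} \frac{(1 - I_i/\pi_i)^2}{N} = \frac{1}{N} \sum_{i\in U_N} \frac{\pi_i (1 - \pi_i)}{\pi_i^2} \le \frac{1}{\lambda} .
\]

According to Lemma A.1,

\[
\lim_{N\to\infty} \frac{1}{N} \sum_{i\in U_N} E_p[(\hat m_i - \tilde m_i)^2] = 0 \quad \text{with } \xi\text{-probability 1},
\]

and the result follows from Markov's inequality.

A.3. Proof of Theorem 3.3

The next lemma derives the asymptotic MSE of the proposed spline estimator in Equation (15).

Lemma A.2 Under Assumptions (A1)–(A5) and (A7),

\[
n E_p \left( \frac{\hat t_{y,\mathrm{diff}} - t_y}{N} \right)^2 = \frac{n}{N^2} \sum_{i,j\in U_N} \left( \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \right) (y_i - \tilde m_i)(y_j - \tilde m_j) + o(1). \tag{A.3}
\]

Proof Let

\[
a_n = n^{1/2} \sum_{i\in U_N} \frac{y_i - \tilde m_i}{N} \left( \frac{I_i}{\pi_i} - 1 \right), \qquad b_n = n^{1/2} \sum_{i\in U_N} \frac{\hat m_i - \tilde m_i}{N} \left( 1 - \frac{I_i}{\pi_i} \right);
\]

then $n^{1/2} (\hat t_{y,\mathrm{diff}} - t_y)/N = a_n + b_n$. Note that

\[
E_p[a_n^2] = \frac{n}{N^2} \sum_{i,j\in U_N} \left( \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \right) (y_i - \tilde m_i)(y_j - \tilde m_j) \le \left( \frac{1}{\lambda} + \frac{n \max_{i,j\in U_N, i\ne j} |\pi_{ij} - \pi_i \pi_j|}{\lambda^2} \right) \frac{1}{N} \sum_{i\in U_N} (y_i - \tilde m_i)^2 < \infty,
\]

\[
E_p[b_n^2] = \frac{n}{N^2} E_p \left[ \sum_{i,j\in U_N} (\hat m_i - \tilde m_i)(\hat m_j - \tilde m_j) \left( 1 - \frac{I_i}{\pi_i} \right) \left( 1 - \frac{I_j}{\pi_j} \right) \right] \le \left( \frac{1}{\lambda} + \frac{n \max_{i,j\in U_N, i\ne j} |\pi_{ij} - \pi_i \pi_j|}{\lambda^2} \right) \frac{1}{N} \sum_{i\in U_N} E_p[\{\hat m_i - \tilde m_i\}^2].
\]

By Lemma A.1, one has $E_p[b_n^2] = o(1)$, and the Cauchy–Schwarz inequality implies $E_p[a_n b_n] = o(1)$. Therefore,

\[
n E_p \left( \frac{\hat t_{y,\mathrm{diff}} - t_y}{N} \right)^2 = E_p[a_n^2] + 2 E_p[a_n b_n] + E_p[b_n^2] = E_p[a_n^2] + o(1).
\]

Thus the desired result holds.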
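As a numerical check of the leading term in (A.3), the sketch below evaluates $N^{-2}\sum_{i,j}(\pi_{ij}-\pi_i\pi_j)/(\pi_i\pi_j)\,(y_i-\tilde m_i)(y_j-\tilde m_j)$ under simple random sampling without replacement and compares it with the textbook closed form $(1-n/N)S^2/n$, where $S^2$ is the population variance of the residuals. The quadratic polynomial fit standing in for $\tilde m_i$, the simulated population, and the function names are assumptions for illustration only.

```python
import numpy as np

def design_mse_double_sum(resid, pi, pi2):
    """(1/N^2) * sum_{i,j} (pi_ij - pi_i pi_j) / (pi_i pi_j) * e_i * e_j."""
    N = resid.size
    delta = pi2 - np.outer(pi, pi)   # pi_ij - pi_i pi_j, with pi_ii = pi_i
    w = resid / pi
    return (w @ delta @ w) / N**2

def srs_inclusion(N, n):
    """First- and second-order inclusion probabilities for SRS without replacement."""
    pi = np.full(N, n / N)
    pi2 = np.full((N, N), n * (n - 1) / (N * (N - 1)))
    np.fill_diagonal(pi2, n / N)
    return pi, pi2

# Hypothetical population; a quadratic fit plays the role of the population-level fit.
rng = np.random.default_rng(1)
N, n = 200, 40
x = rng.uniform(size=N)
y = np.exp(x) + 0.2 * rng.normal(size=N)
m_tilde = np.polyval(np.polyfit(x, y, 2), x)
resid = y - m_tilde

pi, pi2 = srs_inclusion(N, n)
amse = design_mse_double_sum(resid, pi, pi2)       # model-assisted (difference) estimator
closed_form = (1 - n / N) * resid.var(ddof=1) / n  # SRS closed form for the same quantity
amse_ht = design_mse_double_sum(y, pi, pi2)        # same formula with no model (HT)
```

Comparing `amse` with `amse_ht` shows how the residuals $y_i - \tilde m_i$, rather than the raw $y_i$, determine the design variance once a working model is used.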

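Denote

\[
\mathrm{AMSE}(N^{-1} \hat t_{y,\mathrm{diff}}) = \frac{1}{N^2} \sum_{i,j\in U_N} \left( \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \right) (y_i - \tilde m_i)(y_j - \tilde m_j)
\]

as the asymptotic MSE in (A.3). The next result shows that it can be estimated consistently by $\hat V(N^{-1} \hat t_{y,\mathrm{diff}})$ in Equation (16).

Lemma A.3 Under Assumptions (A1)–(A7), one has

\[
\lim_{N\to\infty} n E_p \left| \hat V(N^{-1} \hat t_{y,\mathrm{diff}}) - \mathrm{AMSE}(N^{-1} \hat t_{y,\mathrm{diff}}) \right| = 0.
\]

Proof Denote

\[
S_1 = \frac{1}{N^2} \sum_{i,j\in U_N} \left( \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \right) (y_i - \tilde m_i)(y_j - \tilde m_j) \frac{I_i I_j}{\pi_{ij}},
\]

\[
S_2 = \frac{2}{N^2} \sum_{i,j\in U_N} \left( \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \right) (y_i - \tilde m_i)(\tilde m_j - \hat m_j) \frac{I_i I_j}{\pi_{ij}},
\]

\[
S_3 = \frac{1}{N^2} \sum_{i,j\in U_N} \left( \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \right) (\tilde m_i - \hat m_i)(\tilde m_j - \hat m_j) \frac{I_i I_j}{\pi_{ij}},
\]

then

\[
\hat V(N^{-1} \hat t_{y,\mathrm{diff}}) - \mathrm{AMSE}(N^{-1} \hat t_{y,\mathrm{diff}}) = S_1 - \mathrm{AMSE}(N^{-1} \hat t_{y,\mathrm{diff}}) + S_2 + S_3. \tag{A.4}
\]

For the first term $S_1$, one has

\[
n E_p \left| S_1 - \mathrm{AMSE}(N^{-1} \hat t_{y,\mathrm{diff}}) \right| \le \frac{n}{N^2} \left\{ E_p \left[ \sum_{i,j\in U_N} \left( \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \right) (y_i - \tilde m_i)(y_j - \tilde m_j) \frac{I_i I_j - \pi_{ij}}{\pi_{ij}} \right]^2 \right\}^{1/2},
\]

and

\[
\begin{aligned}
\frac{n^2}{N^4} E_p \left[ \sum_{i,j\in U_N} \left( \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \right) (y_i - \tilde m_i)(y_j - \tilde m_j) \frac{I_i I_j - \pi_{ij}}{\pi_{ij}} \right]^2
&= \frac{n^2}{N^4} \sum_{i,k\in U_N} (y_i - \tilde m_i)^2 (y_k - \tilde m_k)^2 \left( \frac{1 - \pi_i}{\pi_i} \right) \left( \frac{1 - \pi_k}{\pi_k} \right) \left( \frac{\pi_{ik} - \pi_i \pi_k}{\pi_i \pi_k} \right) \\
&\quad + \frac{2 n^2}{N^4} \sum_{i\in U_N} \sum_{k\ne l} (y_i - \tilde m_i)^2 \left( \frac{1 - \pi_i}{\pi_i} \right) \left( \frac{\pi_{kl} - \pi_k \pi_l}{\pi_k \pi_l} \right) (y_k - \tilde m_k)(y_l - \tilde m_l)\, E_p \left[ \left( \frac{I_i - \pi_i}{\pi_i} \right) \left( \frac{I_k I_l - \pi_{kl}}{\pi_{kl}} \right) \right] \\
&\quad + \frac{n^2}{N^4} \sum_{i\ne j,\, k\ne l} (y_i - \tilde m_i)(y_j - \tilde m_j)(y_k - \tilde m_k)(y_l - \tilde m_l) \left( \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \right) \left( \frac{\pi_{kl} - \pi_k \pi_l}{\pi_k \pi_l} \right) E_p \left[ \left( \frac{I_i I_j - \pi_{ij}}{\pi_{ij}} \right) \left( \frac{I_k I_l - \pi_{kl}}{\pi_{kl}} \right) \right] \\
&\equiv s_1 + s_2 + s_3.
\end{aligned}
\]

Now

\[
s_1 \le \frac{n^2}{\lambda^3 N^4} \sum_{i\in U_N} (y_i - \tilde m_i)^4 + \frac{n^2}{\lambda^4 N^4} \sum_{i,k\in U_N,\, i\ne k} (y_i - \tilde m_i)^2 (y_k - \tilde m_k)^2 |\pi_{ik} - \pi_i \pi_k| \le \left( \frac{n^2}{\lambda^3 N^3} + \frac{n^2 \max_{i,k\in U_N, i\ne k} |\pi_{ik} - \pi_i \pi_k|}{\lambda^4 N^2} \right) \frac{1}{N} \sum_{i\in U_N} (y_i - \tilde m_i)^4,
\]

and $\limsup_{N\to\infty} (1/N) \sum_{i\in U_N} (y_i - \tilde m_i)^4 < \infty$. Thus $s_1$ goes to zero as $N \to \infty$. Next,

\[
\begin{aligned}
s_3 &\le \frac{(n \max_{i,k\in U_N, i\ne k} |\pi_{ik} - \pi_i \pi_k|)^2}{\lambda^4 \lambda^{*2} N^4} \sum_{i\ne j,\, k\ne l} |(y_i - \tilde m_i)(y_j - \tilde m_j)(y_k - \tilde m_k)(y_l - \tilde m_l)| \, \left| E_p [ (I_i I_j - \pi_{ij})(I_k I_l - \pi_{kl}) ] \right| \\
&\le O(N^{-1}) + \frac{(n \max_{i,k\in U_N, i\ne k} |\pi_{ik} - \pi_i \pi_k|)^2}{\lambda^4 \lambda^{*2}} \max_{(i,j,k,l)\in D_{4,N}} \left| E_p [ (I_i I_j - \pi_{ij})(I_k I_l - \pi_{kl}) ] \right| \frac{1}{N} \sum_{i\in U_N} (y_i - \tilde m_i)^4,
\end{aligned}
\]

which converges to zero as $N \to \infty$ by Assumption (A6). As a result of the Cauchy–Schwarz inequality, one can show that $s_2$ goes to zero as $N \to \infty$. Therefore,

\[
n E_p \left| S_1 - \mathrm{AMSE}(N^{-1} \hat t_{y,\mathrm{diff}}) \right| \to 0, \quad \text{as } N \to \infty. \tag{A.5}
\]

Next, for $S_2$, by Lemma A.1,

\[
n E_p \left| \frac{2}{N^2} \sum_{i,j\in U_N} \left( \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \right) (y_i - \tilde m_i)(\tilde m_j - \hat m_j) \frac{I_i I_j}{\pi_{ij}} \right| \le \left( \frac{2 n \max_{i,k\in U_N, i\ne k} |\pi_{ik} - \pi_i \pi_k|}{\lambda^2 \lambda^{*}} + \frac{2n}{N \lambda^2} \right) \left\{ \frac{1}{N} \sum_{i\in U_N} (y_i - \tilde m_i)^2 \right\}^{1/2} \left\{ \frac{1}{N} \sum_{i\in U_N} E_p (\tilde m_i - \hat m_i)^2 \right\}^{1/2}, \tag{A.6}
\]

which converges to zero. For $S_3$, applying Lemma A.1 again, one has

\[
E_p |n S_3| = \frac{n}{N^2} E_p \left| \sum_{i,j\in U_N} \left( \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \right) (\tilde m_i - \hat m_i)(\tilde m_j - \hat m_j) \frac{I_i I_j}{\pi_{ij}} \right| \le \left( \frac{n \max_{i,j\in U_N, i\ne j} |\pi_{ij} - \pi_i \pi_j|}{\lambda^2 \lambda^{*}} + \frac{n}{N \lambda^2} \right) \frac{1}{N} \sum_{i\in U_N} E_p (\tilde m_i - \hat m_i)^2 \to 0. \tag{A.7}
\]

The desired result follows from (A.4)–(A.7).

Proof of Theorem 3.3 By the proof of Lemma A.2,

\[
\frac{1}{N} (\hat t_{y,\mathrm{diff}} - t_y) = \frac{1}{N} \left\{ \sum_{i\in s} \frac{y_i - \hat m_i}{\pi_i} + \sum_{i\in U_N} \hat m_i - \sum_{i\in U_N} y_i \right\} = \frac{1}{N} (\tilde t_{y,\mathrm{diff}} - t_y) + o_p(n^{-1/2}),
\]

so the desired result follows from Lemma A.3.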
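The sum $S_1 + S_2 + S_3$ in the proof of Lemma A.3 is the sample-based double sum $N^{-2}\sum_{i,j}(\pi_{ij}-\pi_i\pi_j)/(\pi_i\pi_j)\,(y_i-\hat m_i)(y_j-\hat m_j)\,I_iI_j/\pi_{ij}$. A minimal sketch of that variance estimator under simple random sampling without replacement follows. It is an illustration under stated assumptions rather than the paper's code: the quadratic fit used for $\hat m_i$, the simulated population, and the name `v_hat` are hypothetical.

```python
import numpy as np

def v_hat(resid_s, pi_s, pi2_s, N):
    """(1/N^2) * sum over sampled i,j of (pi_ij - pi_i pi_j)/(pi_i pi_j) * e_i e_j / pi_ij."""
    delta = pi2_s - np.outer(pi_s, pi_s)
    w = resid_s / pi_s
    return (w @ (delta / pi2_s) @ w) / N**2

# Hypothetical population and one simple random sample of size n.
rng = np.random.default_rng(2)
N, n = 200, 40
x = rng.uniform(size=N)
y = np.exp(x) + 0.2 * rng.normal(size=N)
s = rng.choice(N, size=n, replace=False)

# Stand-in working model fitted on the sample and predicted for every unit.
m_hat = np.polyval(np.polyfit(x[s], y[s], 2), x)
resid_s = (y - m_hat)[s]

# Inclusion probabilities restricted to the sampled units (pi_ii = pi_i).
pi_s = np.full(n, n / N)
pi2_s = np.full((n, n), n * (n - 1) / (N * (N - 1)))
np.fill_diagonal(pi2_s, n / N)

t_diff = np.sum(resid_s / pi_s) + m_hat.sum()   # difference estimator of the total
var_mean = v_hat(resid_s, pi_s, pi2_s, N)       # estimated variance of t_diff / N
se_total = N * np.sqrt(var_mean)                # standard error on the total scale
```

Under this design the double sum collapses to $(1-n/N)\,s_e^2/n$ with $s_e^2$ the sample variance of the residuals, which gives a quick sanity check for any more general implementation.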

F. Jay Breidt Colorado State University

F. Jay Breidt Colorado State University Model-assisted survey regression estimation with the lasso 1 F. Jay Breidt Colorado State University Opening Workshop on Computational Methods in Social Sciences SAMSI August 2013 This research was supported

More information

Nonparametric Regression Estimation of Finite Population Totals under Two-Stage Sampling

Nonparametric Regression Estimation of Finite Population Totals under Two-Stage Sampling Nonparametric Regression Estimation of Finite Population Totals under Two-Stage Sampling Ji-Yeon Kim Iowa State University F. Jay Breidt Colorado State University Jean D. Opsomer Colorado State University

More information

INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING

INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING Statistica Sinica 24 (2014), 1001-1015 doi:http://dx.doi.org/10.5705/ss.2013.038 INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING Seunghwan Park and Jae Kwang Kim Seoul National Univeristy

More information

Cross-validation in model-assisted estimation

Cross-validation in model-assisted estimation Graduate Theses and Dissertations Iowa State University Capstones, Theses and Dissertations 009 Cross-validation in model-assisted estimation Lifeng You Iowa State University Follow this and additional

More information

New Local Estimation Procedure for Nonparametric Regression Function of Longitudinal Data

New Local Estimation Procedure for Nonparametric Regression Function of Longitudinal Data ew Local Estimation Procedure for onparametric Regression Function of Longitudinal Data Weixin Yao and Runze Li The Pennsylvania State University Technical Report Series #0-03 College of Health and Human

More information

Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model

Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model 1. Introduction Varying-coefficient partially linear model (Zhang, Lee, and Song, 2002; Xia, Zhang, and Tong, 2004;

More information

Calibration estimation using exponential tilting in sample surveys

Calibration estimation using exponential tilting in sample surveys Calibration estimation using exponential tilting in sample surveys Jae Kwang Kim February 23, 2010 Abstract We consider the problem of parameter estimation with auxiliary information, where the auxiliary

More information

Two Applications of Nonparametric Regression in Survey Estimation

Two Applications of Nonparametric Regression in Survey Estimation Two Applications of Nonparametric Regression in Survey Estimation 1/56 Jean Opsomer Iowa State University Joint work with Jay Breidt, Colorado State University Gerda Claeskens, Université Catholique de

More information

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Jianqing Fan Department of Statistics Chinese University of Hong Kong AND Department of Statistics

More information

Regularization in Cox Frailty Models

Regularization in Cox Frailty Models Regularization in Cox Frailty Models Andreas Groll 1, Trevor Hastie 2, Gerhard Tutz 3 1 Ludwig-Maximilians-Universität Munich, Department of Mathematics, Theresienstraße 39, 80333 Munich, Germany 2 University

More information

Weighting in survey analysis under informative sampling

Weighting in survey analysis under informative sampling Jae Kwang Kim and Chris J. Skinner Weighting in survey analysis under informative sampling Article (Accepted version) (Refereed) Original citation: Kim, Jae Kwang and Skinner, Chris J. (2013) Weighting

More information

Transformation and Smoothing in Sample Survey Data

Transformation and Smoothing in Sample Survey Data Scandinavian Journal of Statistics, Vol. 37: 496 513, 2010 doi: 10.1111/j.1467-9469.2010.00691.x Published by Blackwell Publishing Ltd. Transformation and Smoothing in Sample Survey Data YANYUAN MA Department

More information

Function of Longitudinal Data

Function of Longitudinal Data New Local Estimation Procedure for Nonparametric Regression Function of Longitudinal Data Weixin Yao and Runze Li Abstract This paper develops a new estimation of nonparametric regression functions for

More information

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract Journal of Data Science,17(1). P. 145-160,2019 DOI:10.6339/JDS.201901_17(1).0007 WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION Wei Xiong *, Maozai Tian 2 1 School of Statistics, University of

More information

Estimation of cumulative distribution function with spline functions

Estimation of cumulative distribution function with spline functions INTERNATIONAL JOURNAL OF ECONOMICS AND STATISTICS Volume 5, 017 Estimation of cumulative distribution function with functions Akhlitdin Nizamitdinov, Aladdin Shamilov Abstract The estimation of the cumulative

More information

A review of some semiparametric regression models with application to scoring

A review of some semiparametric regression models with application to scoring A review of some semiparametric regression models with application to scoring Jean-Loïc Berthet 1 and Valentin Patilea 2 1 ENSAI Campus de Ker-Lann Rue Blaise Pascal - BP 37203 35172 Bruz cedex, France

More information

Local linear multiple regression with variable. bandwidth in the presence of heteroscedasticity

Local linear multiple regression with variable. bandwidth in the presence of heteroscedasticity Local linear multiple regression with variable bandwidth in the presence of heteroscedasticity Azhong Ye 1 Rob J Hyndman 2 Zinai Li 3 23 January 2006 Abstract: We present local linear estimator with variable

More information

DESIGN-ADAPTIVE MINIMAX LOCAL LINEAR REGRESSION FOR LONGITUDINAL/CLUSTERED DATA

DESIGN-ADAPTIVE MINIMAX LOCAL LINEAR REGRESSION FOR LONGITUDINAL/CLUSTERED DATA Statistica Sinica 18(2008), 515-534 DESIGN-ADAPTIVE MINIMAX LOCAL LINEAR REGRESSION FOR LONGITUDINAL/CLUSTERED DATA Kani Chen 1, Jianqing Fan 2 and Zhezhen Jin 3 1 Hong Kong University of Science and Technology,

More information

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Vadim Marmer University of British Columbia Artyom Shneyerov CIRANO, CIREQ, and Concordia University August 30, 2010 Abstract

More information

Nonparametric Small Area Estimation Using Penalized Spline Regression

Nonparametric Small Area Estimation Using Penalized Spline Regression Nonparametric Small Area Estimation Using Penalized Spline Regression 0verview Spline-based nonparametric regression Nonparametric small area estimation Prediction mean squared error Bootstrapping small

More information

Nonparametric Small Area Estimation via M-quantile Regression using Penalized Splines

Nonparametric Small Area Estimation via M-quantile Regression using Penalized Splines Nonparametric Small Estimation via M-quantile Regression using Penalized Splines Monica Pratesi 10 August 2008 Abstract The demand of reliable statistics for small areas, when only reduced sizes of the

More information

ASYMPTOTICS FOR PENALIZED SPLINES IN ADDITIVE MODELS

ASYMPTOTICS FOR PENALIZED SPLINES IN ADDITIVE MODELS Mem. Gra. Sci. Eng. Shimane Univ. Series B: Mathematics 47 (2014), pp. 63 71 ASYMPTOTICS FOR PENALIZED SPLINES IN ADDITIVE MODELS TAKUMA YOSHIDA Communicated by Kanta Naito (Received: December 19, 2013)

More information

ADDITIVE COEFFICIENT MODELING VIA POLYNOMIAL SPLINE

ADDITIVE COEFFICIENT MODELING VIA POLYNOMIAL SPLINE ADDITIVE COEFFICIENT MODELING VIA POLYNOMIAL SPLINE Lan Xue and Lijian Yang Michigan State University Abstract: A flexible nonparametric regression model is considered in which the response depends linearly

More information

Single Index Quantile Regression for Heteroscedastic Data

Single Index Quantile Regression for Heteroscedastic Data Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University SMAC, November 6, 2015 E. Christou, M. G. Akritas (PSU) SIQR

More information

Illustration of the Varying Coefficient Model for Analyses the Tree Growth from the Age and Space Perspectives

Illustration of the Varying Coefficient Model for Analyses the Tree Growth from the Age and Space Perspectives TR-No. 14-06, Hiroshima Statistical Research Group, 1 11 Illustration of the Varying Coefficient Model for Analyses the Tree Growth from the Age and Space Perspectives Mariko Yamamura 1, Keisuke Fukui

More information

arxiv: v2 [math.st] 20 Jun 2014

arxiv: v2 [math.st] 20 Jun 2014 A solution in small area estimation problems Andrius Čiginas and Tomas Rudys Vilnius University Institute of Mathematics and Informatics, LT-08663 Vilnius, Lithuania arxiv:1306.2814v2 [math.st] 20 Jun

More information

NONPARAMETRIC ENDOGENOUS POST-STRATIFICATION ESTIMATION

NONPARAMETRIC ENDOGENOUS POST-STRATIFICATION ESTIMATION Statistica Sinica 2011): Preprint 1 NONPARAMETRIC ENDOGENOUS POST-STRATIFICATION ESTIMATION Mark Dahlke 1, F. Jay Breidt 1, Jean D. Opsomer 1 and Ingrid Van Keilegom 2 1 Colorado State University and 2

More information

DEPARTMENT MATHEMATIK ARBEITSBEREICH MATHEMATISCHE STATISTIK UND STOCHASTISCHE PROZESSE

DEPARTMENT MATHEMATIK ARBEITSBEREICH MATHEMATISCHE STATISTIK UND STOCHASTISCHE PROZESSE Estimating the error distribution in nonparametric multiple regression with applications to model testing Natalie Neumeyer & Ingrid Van Keilegom Preprint No. 2008-01 July 2008 DEPARTMENT MATHEMATIK ARBEITSBEREICH

More information

Nonparametric Methods

Nonparametric Methods Nonparametric Methods Michael R. Roberts Department of Finance The Wharton School University of Pennsylvania July 28, 2009 Michael R. Roberts Nonparametric Methods 1/42 Overview Great for data analysis

More information

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction

More information

Nonparametric Econometrics

Nonparametric Econometrics Applied Microeconometrics with Stata Nonparametric Econometrics Spring Term 2011 1 / 37 Contents Introduction The histogram estimator The kernel density estimator Nonparametric regression estimators Semi-

More information

Nonparametric Modal Regression

Nonparametric Modal Regression Nonparametric Modal Regression Summary In this article, we propose a new nonparametric modal regression model, which aims to estimate the mode of the conditional density of Y given predictors X. The nonparametric

More information

Asymptotic inference for a nonstationary double ar(1) model

Asymptotic inference for a nonstationary double ar(1) model Asymptotic inference for a nonstationary double ar() model By SHIQING LING and DONG LI Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong maling@ust.hk malidong@ust.hk

More information

REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY

REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY J.D. Opsomer, W.A. Fuller and X. Li Iowa State University, Ames, IA 50011, USA 1. Introduction Replication methods are often used in

More information

Efficient Estimation for the Partially Linear Models with Random Effects

Efficient Estimation for the Partially Linear Models with Random Effects A^VÇÚO 1 33 ò 1 5 Ï 2017 c 10 Chinese Journal of Applied Probability and Statistics Oct., 2017, Vol. 33, No. 5, pp. 529-537 doi: 10.3969/j.issn.1001-4268.2017.05.009 Efficient Estimation for the Partially

More information

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo Outline in High Dimensions Using the Rodeo Han Liu 1,2 John Lafferty 2,3 Larry Wasserman 1,2 1 Statistics Department, 2 Machine Learning Department, 3 Computer Science Department, Carnegie Mellon University

More information

Robust estimators for additive models using backfitting

Robust estimators for additive models using backfitting Robust estimators for additive models using backfitting Graciela Boente Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires and CONICET, Argentina Alejandra Martínez Facultad de Ciencias

More information

The EM Algorithm for the Finite Mixture of Exponential Distribution Models

The EM Algorithm for the Finite Mixture of Exponential Distribution Models Int. J. Contemp. Math. Sciences, Vol. 9, 2014, no. 2, 57-64 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ijcms.2014.312133 The EM Algorithm for the Finite Mixture of Exponential Distribution

More information

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

FULL LIKELIHOOD INFERENCES IN THE COX MODEL October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach

More information

Additional results for model-based nonparametric variance estimation for systematic sampling in a forestry survey

Additional results for model-based nonparametric variance estimation for systematic sampling in a forestry survey Additional results for model-based nonparametric variance estimation for systematic sampling in a forestry survey J.D. Opsomer Colorado State University M. Francisco-Fernández Universidad de A Coruña July

More information

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University Integrated Likelihood Estimation in Semiparametric Regression Models Thomas A. Severini Department of Statistics Northwestern University Joint work with Heping He, University of York Introduction Let Y

More information

BIAS-ROBUSTNESS AND EFFICIENCY OF MODEL-BASED INFERENCE IN SURVEY SAMPLING

BIAS-ROBUSTNESS AND EFFICIENCY OF MODEL-BASED INFERENCE IN SURVEY SAMPLING Statistica Sinica 22 (2012), 777-794 doi:http://dx.doi.org/10.5705/ss.2010.238 BIAS-ROBUSTNESS AND EFFICIENCY OF MODEL-BASED INFERENCE IN SURVEY SAMPLING Desislava Nedyalova and Yves Tillé University of

More information

Estimation of the Conditional Variance in Paired Experiments

Estimation of the Conditional Variance in Paired Experiments Estimation of the Conditional Variance in Paired Experiments Alberto Abadie & Guido W. Imbens Harvard University and BER June 008 Abstract In paired randomized experiments units are grouped in pairs, often

More information

Improving linear quantile regression for

Improving linear quantile regression for Improving linear quantile regression for replicated data arxiv:1901.0369v1 [stat.ap] 16 Jan 2019 Kaushik Jana 1 and Debasis Sengupta 2 1 Imperial College London, UK 2 Indian Statistical Institute, Kolkata,

More information

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Ruppert A. EMPIRICAL ESTIMATE OF THE KERNEL MIXTURE Here we

More information

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA Kasun Rathnayake ; A/Prof Jun Ma Department of Statistics Faculty of Science and Engineering Macquarie University

More information

Professors Lin and Ying are to be congratulated for an interesting paper on a challenging topic and for introducing survival analysis techniques to th

Professors Lin and Ying are to be congratulated for an interesting paper on a challenging topic and for introducing survival analysis techniques to th DISCUSSION OF THE PAPER BY LIN AND YING Xihong Lin and Raymond J. Carroll Λ July 21, 2000 Λ Xihong Lin (xlin@sph.umich.edu) is Associate Professor, Department ofbiostatistics, University of Michigan, Ann

More information

Combining data from two independent surveys: model-assisted approach

Combining data from two independent surveys: model-assisted approach Combining data from two independent surveys: model-assisted approach Jae Kwang Kim 1 Iowa State University January 20, 2012 1 Joint work with J.N.K. Rao, Carleton University Reference Kim, J.K. and Rao,

More information

The Use of Survey Weights in Regression Modelling

The Use of Survey Weights in Regression Modelling The Use of Survey Weights in Regression Modelling Chris Skinner London School of Economics and Political Science (with Jae-Kwang Kim, Iowa State University) Colorado State University, June 2013 1 Weighting

More information

Smooth functions and local extreme values

Smooth functions and local extreme values Smooth functions and local extreme values A. Kovac 1 Department of Mathematics University of Bristol Abstract Given a sample of n observations y 1,..., y n at time points t 1,..., t n we consider the problem

More information

Computation of an efficient and robust estimator in a semiparametric mixture model

Computation of an efficient and robust estimator in a semiparametric mixture model Journal of Statistical Computation and Simulation ISSN: 0094-9655 (Print) 1563-5163 (Online) Journal homepage: http://www.tandfonline.com/loi/gscs20 Computation of an efficient and robust estimator in

More information

Nonparametric Principal Components Regression

Nonparametric Principal Components Regression Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS031) p.4574 Nonparametric Principal Components Regression Barrios, Erniel University of the Philippines Diliman,

More information

Variance Function Estimation in Multivariate Nonparametric Regression

Variance Function Estimation in Multivariate Nonparametric Regression Variance Function Estimation in Multivariate Nonparametric Regression T. Tony Cai 1, Michael Levine Lie Wang 1 Abstract Variance function estimation in multivariate nonparametric regression is considered

More information

Bootstrap inference for the finite population total under complex sampling designs

Bootstrap inference for the finite population total under complex sampling designs Bootstrap inference for the finite population total under complex sampling designs Zhonglei Wang (Joint work with Dr. Jae Kwang Kim) Center for Survey Statistics and Methodology Iowa State University Jan.

More information

Finite Population Sampling and Inference
