Fractional hot deck imputation


Biometrika (2004), 91, 3, pp
Biometrika Trust
Printed in Great Britain

BY JAE KWANG KIM
Department of Applied Statistics, Yonsei University, Seoul, Korea

AND WAYNE FULLER
Department of Statistics, Iowa State University, Ames, Iowa 50011, U.S.A.

SUMMARY

To compensate for item nonresponse, hot deck imputation procedures replace missing values with values that occur in the sample. Fractional hot deck imputation replaces each missing observation with a set of imputed values and assigns a weight to each imputed value. Under the model in which observations in an imputation cell are independently and identically distributed, fractional hot deck imputation is shown to be an effective imputation procedure. A consistent replication variance estimation procedure for estimators computed with fractional imputation is suggested. Simulations show that fractional imputation and the suggested variance estimator are superior to multiple imputation estimators in general, and much superior to multiple imputation for estimating the variance of a domain mean.

Some key words: Cell mean model; Missing data; Nonresponse; Replication variance estimation.

1. INTRODUCTION

Item nonresponse occurs when a sampled unit provides some information but fails to respond to all items. Hot deck imputation is an imputation procedure in which the value assigned for a missing item is taken from respondents in the current sample. Many hot deck imputation procedures use auxiliary variables, known for both the respondents and the nonrespondents, to divide the sample into so-called imputation cells. The hot deck imputation method assigns the value from a record with a response to the record with a missing value. The record providing the value is called the donor and the record with the missing value is called the recipient. A desirable property of hot deck imputation is that all imputed values are observed values.
For example, imputed values for categorical variables will also be categorical, with the same number of categories as observed for the respondents. In random hot deck imputation, nonrespondents are assigned values chosen at random from respondents in the same imputation cell. Random hot deck imputation preserves the distributional properties of the imputed dataset; that is, the distribution function for imputed data within a cell differs from the distribution function for the respondents in the cell only because of the randomness of imputation.
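Random hot deck imputation within cells can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: each unit carries a cell label, a missing item is coded as `None`, and donors are drawn with replacement (the paragraph does not fix the donor-selection design). The function name `random_hot_deck` is hypothetical.

```python
import random
from collections import defaultdict


def random_hot_deck(cells, values, rng=None):
    """Random hot deck imputation within imputation cells (single imputation).

    cells  -- imputation-cell label for each sampled unit
    values -- observed value for each unit, or None for item nonresponse
    Each missing value is replaced by the value of a respondent drawn at
    random, with replacement, from the same cell.
    """
    rng = rng or random.Random()
    # Collect the donor pool (respondent values) in each cell.
    donors = defaultdict(list)
    for cell, y in zip(cells, values):
        if y is not None:
            donors[cell].append(y)

    completed = []
    for cell, y in zip(cells, values):
        if y is not None:
            completed.append(y)  # respondents keep their own value
        elif donors[cell]:
            completed.append(rng.choice(donors[cell]))  # random donor, same cell
        else:
            raise ValueError(f"cell {cell!r} has no respondents to donate")
    return completed


# Hypothetical example: a categorical item imputed within two cells.
cells = ["A", "A", "A", "B", "B"]
values = ["yes", None, "no", "no", None]
print(random_hot_deck(cells, values, random.Random(2004)))
```

Because every imputed entry is copied from a respondent in the same cell, a categorical item can only receive categories that were actually observed, which is the property noted above.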

Random selection of donors introduces variability that is termed imputation variance. Brick & Kalton (1996) describe two methods for reducing imputation variance. One is through the sample design used for selecting donors within each imputation cell. For instance, selecting donors by simple random sampling without replacement is preferable to simple random sampling of donors with replacement. A second approach is to use fractional imputation (Kalton & Kish, 1984; Fay, 1996), which involves using more than one donor for a recipient. For example, three imputed values might be assigned to each nonrespondent, with each entry allocated a weight of one-third of the nonrespondent's original weight.

Multiple imputation, proposed by Rubin (1978), is a procedure for handling missing data that allows the data analyst to use analyses designed for complete data, while at the same time providing a method for estimating the uncertainty due to the missing data. Both fractional imputation and multiple imputation can be called repeated imputation, in the sense that more than one value is assigned for each missing item. However, fractional imputation was designed to reduce the imputation variance, while the primary goals of multiple imputation are to simplify estimation and to provide an estimator of the variance. The approximate Bayesian bootstrap described in Rubin & Schenker (1986) can be viewed as a hot deck imputation method in the multiple imputation context.

In addition to the multiple imputation procedure of Rubin & Schenker (1986), several other methods have been proposed for estimating the variance of an estimated total after hot deck imputation. Rao & Shao (1992) proposed a jackknife variance estimator for hot deck imputation in which the donors are selected with replacement, with selection probability proportional to the sampling weights.
Tollefson & Fuller (1992) proposed a variance estimator for without-replacement hot deck imputation. Särndal (1992), Fay (1996) and Chen & Shao (2001) also proposed variance estimators for certain types of hot deck imputation; see Little & Rubin (2002) for a discussion of imputation methods.

In this paper, we first give assumptions under which hot deck imputation produces consistent estimators, provide a general representation for hot deck imputation estimators and give the variance of the estimators. Secondly, we propose a class of unbiased variance estimators for the variance of the imputed estimator that is applicable to complex survey designs. The proposed variance estimator is general in the sense that it does not depend on the specific hot deck imputation method used. For example, the proposed variance estimator is unbiased for both with-replacement and without-replacement selection of donors. Fractional imputation combined with the proposed replication variance estimator gives a set of replication weights that can be used to construct unbiased variance estimators for estimators based on the completely responding variables as well as for estimators based on imputed data.

2. MODELS FOR IMPUTATION

2.1. Introduction

Consider a population of N elements identified by a set of indices $U = \{1, 2, \ldots, N\}$. Associated with unit i of the population is the value $y_i$ of the study variable, and $Y = (y_1, y_2, \ldots, y_N)$ denotes the vector of values of the study variable for the N units in the population. Let A denote the set of indices of the elements in a sample selected by a set of probability rules called the sampling mechanism. Responses are obtained from the selected sample according to a probability mechanism called the response mechanism. Let the population quantity of interest be $\theta_N = h(y_1, y_2, \ldots, y_N)$ and let $\hat\theta$ be a linear estimator of $\theta_N$ based on the full sample,

$$\hat\theta = \sum_{i \in A} w_i y_i. \qquad (1)$$

If we observe $y_i$ on every element of the sample, then $\hat\theta$ with $w_i = \{\mathrm{pr}(i \in A)\}^{-1}$ is a design-unbiased estimator of the population sum of the $y_i$.

Assume that the finite population U is made up of G imputation cells. Let $n_g$ be the number of sample elements in imputation cell g and let $r_g$, $r_g > 0$, be the number of respondents in imputation cell g. The elements in cell g $(g = 1, \ldots, G)$ of the finite population are assumed to be a realisation of independently and identically distributed random variables with mean $\mu_g$ and variance $\sigma_g^2$. Thus, independently for each $i \in U_g$,

$$y_i \sim (\mu_g, \sigma_g^2), \qquad (2)$$

where $U_g$ denotes the set of indices for the gth imputation cell. If the $y_i$ are independent of the sampling mechanism and of the response mechanism conditional on the cell, then model (2) for the cell holds for the responding units as well as for the nonrespondents; that is, independently for each $i \in A_g$,

$$y_i \mid (A, A_R) \sim (\mu_g, \sigma_g^2), \qquad (3)$$

where $A_g$ is the set of indices of sample elements in cell g and $A_R$ is the set of indices of the sample respondents. We call model (3) the cell mean model. Note that the $y_i$ in the population may be related to the design, but the division into cells removes that dependence.

2.2. Hot deck imputation

A hot deck imputation method can be described by two factors. The first is the way in which donors are selected for each missing item. The distribution of

$$d = \{d_{ij};\ i \in A_R,\ j \in A_M\}, \qquad (4)$$

where $d_{ij}$ is the number of times that $y_i$ is used as a donor for $y_j$ and $A_M$ is the set of indices of the sample nonrespondents, is called the imputation mechanism. The second factor is the way the weight of the donor is defined for each missing item.
Let $w^*_{ij}$ be the fraction of the original weight of element j assigned to the value from donor i used for the missing value of element j. For element j with missing y,

$$y^*_j = \sum_{i \in A_R} w^*_{ij} y_i \qquad (5)$$

is the weighted mean of the imputed values. Note that $w^*_{ii} = 1$ for $i \in A_R$ and $w^*_{ii} = 0$ for $i \in A_M$. The sum of the imputation fractions for each item is required to be one:

$$\sum_{i \in A_R} w^*_{ij} = 1 \qquad (6)$$

for all $j \in A$. In the typical situation with M donors for element j $(j \in A_M)$, the $w^*_{ij}$ are all equal to $M^{-1}$ for $d_{ij} = 1$.

A linear estimator using hot deck imputation can be written in the form

$$\hat\theta_I = \sum_{i \in A_R} \Big( \sum_{j \in A} w_j w^*_{ij} \Big) y_i \equiv \sum_{i \in A_R} \alpha_i y_i. \qquad (7)$$

The sum of $w^*_{ij} w_j$ over all recipients j for which i is a donor, including i itself, is the total weight of donor i, and is denoted by $\alpha_i$.

3. ESTIMATION AFTER HOT DECK IMPUTATION

The properties of the estimator (7) under the cell mean model (3) are given in Theorem 1.

THEOREM 1. Let the population satisfy the cell mean model (3). Assume the distribution of d is independent of Y and depends only on $(r_1, r_2, \ldots, r_G)$ and $(n_1, n_2, \ldots, n_G)$. Also, $\mathrm{pr}(d_{ij} > 0) > 0$ if and only if i and j belong to the same imputation cell. Let $\hat\theta$ be a linear estimator of the form (1) constructed from the full sample and assume that $\hat\theta$ is design-unbiased for the population quantity $\theta_N$. Let the fractionally imputed estimator $\hat\theta_I$ be defined by (7) and assume that the imputation fractions are constructed so that (6) holds. Then

$$E(\hat\theta_I - \theta_N) = 0, \qquad (8)$$

$$\mathrm{var}(\hat\theta_I) = \mathrm{var}\Big( \sum_{g=1}^G \sum_{i \in A_g} w_i \mu_g \Big) + E\Big( \sum_{g=1}^G \sum_{i \in A_{Rg}} \alpha_i^2 \sigma_g^2 \Big), \qquad (9)$$

where $w_i$ is proportional to $\{\mathrm{pr}(i \in A)\}^{-1}$, $\alpha_i$ is the total weight of donor i defined in (7), A is the set of sample indices, $A_R$ is the set of respondent indices, G is the number of imputation cells, $A_g$ is the set of indices of the sample for the gth imputation cell, and $A_{Rg}$ is the set of indices of respondents for the gth imputation cell. If $\theta_N = \sum_{i=1}^N y_i$ and $\sum_{i \in A} w_i = N$, then

$$\mathrm{var}(\hat\theta_I - \theta_N) = \mathrm{var}\Big( \sum_{g=1}^G \sum_{i \in A_g} w_i \mu_g \Big) + E\Big\{ \sum_{g=1}^G \sum_{i \in A_{Rg}} (\alpha_i^2 - \alpha_i) \sigma_g^2 \Big\}. \qquad (10)$$

See Appendix 1 for the proof.

In Theorem 1, expression (9) is the variance of $\hat\theta_I$ as an estimator of the superpopulation parameter. Expression (10) is the variance of the estimator of the finite population total. The expectations on the right-hand sides of (9) and (10) are with respect to the joint distribution defined by the superpopulation model, the sampling mechanism, the response mechanism and the imputation mechanism.
Correspondingly, the variances are with respect to the joint distribution. Under the cell mean model (3), the variance of the estimator is a function of the expectation of $\alpha_i^2$, and that expectation depends on the procedure used to select donors. Under model (3), a procedure that produces equal $\alpha_i$ will minimise the conditional variance, conditional on the observed sample indices: by (6), $\sum_{i \in A_{Rg}} \alpha_i = \sum_{j \in A_g} w_j$ in every cell, and for a fixed total, $\sum_{i \in A_{Rg}} \alpha_i^2$ is smallest when the $\alpha_i$ are equal. While the best estimator of the cell mean under the cell mean model is the simple sample cell mean, this estimator is seldom used in practice. The practitioner may be willing to use the model to impute for missing values but not for estimation of the mean. Also, the model may not be appropriate for other y-values. Common approaches under model (3) are to select donors within the cell with equal probabilities or to select donors with probabilities proportional to the original weights of donors.
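The following Python sketch makes (6), (7) and the role of $\sum \alpha_i^2$ in (9) concrete. It assumes a toy single-cell sample of our own (unit weights, respondents 0, 1, 2 and nonrespondents 3, 4); the helper names `donor_weights` and `fractions_sum_to_one` are hypothetical. Three donor allocations with the same total weight are compared: one donor used twice, two different donors, and fractional imputation with M = 3.

```python
from fractions import Fraction


def donor_weights(w, frac, respondents):
    """Total donor weights alpha_i = sum_j w_j * w*_ij, as in (7).

    w           -- dict: unit -> sampling weight w_j, for every sampled unit
    frac        -- dict: (donor i, recipient j) -> imputation fraction w*_ij,
                   listed only for missing recipients j
    respondents -- respondent indices (A_R); each has w*_ii = 1
    """
    alpha = {i: w[i] for i in respondents}  # the respondent's own weight
    for (i, j), f in frac.items():
        alpha[i] += w[j] * f  # share of recipient j's weight carried by donor i
    return alpha


def fractions_sum_to_one(frac, nonrespondents):
    """Check condition (6) for every missing unit j."""
    totals = {j: 0 for j in nonrespondents}
    for (_, j), f in frac.items():
        totals[j] += f
    return all(t == 1 for t in totals.values())


# Toy single-cell sample: respondents 0, 1, 2; nonrespondents 3, 4; unit weights.
w = {j: 1 for j in range(5)}
resp, miss = [0, 1, 2], [3, 4]
allocations = {
    "one donor used twice": {(0, 3): 1, (0, 4): 1},
    "two different donors": {(0, 3): 1, (1, 4): 1},
    "fractional, M = 3": {(i, j): Fraction(1, 3) for i in resp for j in miss},
}
for name, frac in allocations.items():
    assert fractions_sum_to_one(frac, miss)
    alpha = donor_weights(w, frac, resp)
    # sigma_g^2 times this sum is the imputation part of (9) for this sample
    print(name, sum(a * a for a in alpha.values()))
```

The three allocations print 11, 9 and 25/3. In each case the $\alpha_i$ sum to 5, the total weight in the cell, and the fractional allocation, which makes all $\alpha_i$ equal to 5/3, gives the smallest sum of squares.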

Given nonresponse, one estimator of $\theta_N$ is the ratio estimator

$$\hat\theta_{FE} = \sum_{g=1}^G \Big( \sum_{i \in A_g} w_i \Big) \frac{\sum_{i \in A_{Rg}} w_i y_i}{\sum_{i \in A_{Rg}} w_i}, \qquad (11)$$

where $A_g$ and $A_{Rg}$ are defined in Theorem 1. The estimator (11) is called fully efficient because it contains no variability due to random selection of donors. As discussed, (11) is not fully model efficient under the cell mean model when the $w_i$ differ.

The estimator (11) can be implemented by using fractional imputation. Every responding unit in the imputation cell is used as a donor for each nonresponding element in the cell, and the imputation fractions $w^*_{ij}$ are defined to be proportional to the sampling weights. The resulting fractionally imputed estimator is

$$\hat\theta_{FEFI} = \sum_{j \in A} \sum_{i \in A_R} w_j w^*_{ij} y_i, \qquad (12)$$

where $w_j w^*_{ij}$ is the weight of donor i for recipient j, and $w^*_{ij}$ is the imputation fraction for donor i as a donor for recipient j, defined as

$$w^*_{ij} = \begin{cases} \big( \sum_{s \in A_{Rg}} w_s \big)^{-1} w_i, & \text{if } j \in A_{Mg} \text{ and } i \in A_{Rg}, \\ 1, & \text{if } j \in A_R \text{ and } i = j, \end{cases} \qquad (13)$$

with $A_{Mg}$ the set of indices of nonrespondents in cell g. The estimator (12) with $w^*_{ij}$ of (13), algebraically equivalent to (11), is called the fully efficient fractionally imputed estimator. If the sampling weights in a cell are the same, this estimator has the minimum conditional variance, conditional on the number in the cell, among linear unbiased estimators under the cell mean model. The imputed dataset permits the simple computation of statistics such as percentiles without recomputing the ratio.

4. REPLICATION VARIANCE ESTIMATION

In this section we consider variance estimation based on replication; Wolter (1985) and Rust & Rao (1996) contain discussions of such procedures. Let a replication variance estimator for a complete sample be

$$\hat V(\hat\theta) = \sum_{k=1}^L c_k \big( \hat\theta^{(k)} - \hat\theta \big)^2, \qquad (14)$$

where $\hat\theta^{(k)}$ is the kth estimator of $\theta_N$ based on the observations included in the kth replicate, L is the number of replicates, and $c_k$ is a factor associated with replicate k determined by the replication method.
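To show the algebraic equivalence of (11) and (12)–(13) numerically, here is a short Python sketch. The data are hypothetical (two cells, unequal weights, three nonrespondents), the function names `fefi_ratio` and `fefi_fractional` are our own, and every cell is assumed to contain at least one respondent ($r_g > 0$), as in Section 2.

```python
from collections import defaultdict


def fefi_ratio(cells, w, y):
    """Ratio form (11): sum over cells of (cell weight total) x (weighted respondent mean).

    cells, w, y are parallel lists; y[i] is None for a nonrespondent.
    Assumes every cell has at least one respondent.
    """
    total_w = defaultdict(float)
    resp_w = defaultdict(float)
    resp_wy = defaultdict(float)
    for g, wi, yi in zip(cells, w, y):
        total_w[g] += wi
        if yi is not None:
            resp_w[g] += wi
            resp_wy[g] += wi * yi
    return sum(total_w[g] * resp_wy[g] / resp_w[g] for g in total_w)


def fefi_fractional(cells, w, y):
    """Fractional form (12) with the fractions (13).

    Every respondent i in cell g donates to every nonrespondent j in cell g
    with fraction w*_ij = w_i / sum_{s in A_Rg} w_s; respondents keep w*_jj = 1.
    """
    resp_w = defaultdict(float)
    for g, wi, yi in zip(cells, w, y):
        if yi is not None:
            resp_w[g] += wi

    estimate = 0.0
    for gj, wj, yj in zip(cells, w, y):
        if yj is not None:
            estimate += wj * yj  # respondent: w_j * 1 * y_j
            continue
        for gi, wi, yi in zip(cells, w, y):
            if gi == gj and yi is not None:
                estimate += wj * (wi / resp_w[gj]) * yi  # w_j * w*_ij * y_i
    return estimate


# Hypothetical data: two cells, unequal weights, three nonrespondents.
cells = [1, 1, 1, 1, 2, 2, 2]
w = [1.0, 2.0, 1.5, 3.0, 1.0, 1.0, 2.0]
y = [3.0, None, 5.0, None, 10.0, None, 4.0]
print(fefi_ratio(cells, w, y), fefi_fractional(cells, w, y))
```

Both functions return 55.5 up to floating-point rounding. The fractional form stores one record per (donor, recipient) pair, which is what allows percentiles and other statistics to be computed directly from the imputed dataset.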
When the original estimator $\hat\theta$ is a linear estimator of the form (1), the kth replicate of $\hat\theta$ can be written

$$\hat\theta^{(k)} = \sum_{i \in A} w_i^{(k)} y_i, \qquad (15)$$

where $w_i^{(k)}$ denotes the replicate weight for the ith unit of the kth replicate. Theorem 2 provides criteria for constructing an unbiased replication variance estimator for the fractionally imputed estimator.

THEOREM 2. Let the conditions of Theorem 1 hold. Assume that $\hat\theta_I$ is of the form in (7), where the imputation fractions satisfy (6). Let the complete sample replication variance estimator of (14) be design-unbiased for the design variance of the complete sample estimator.

Let the kth replicate of $\hat\theta_I$ for the imputed sample be of the form

$$\hat\theta_I^{(k)} = \sum_{j \in A} \sum_{i \in A_R} w_j^{(k)} w^{*(k)}_{ij} y_i \qquad (16)$$

$$\phantom{\hat\theta_I^{(k)}} = \sum_{i \in A_R} \alpha_i^{(k)} y_i, \qquad (17)$$

where $\alpha_i^{(k)} = \sum_{j \in A} w_j^{(k)} w^{*(k)}_{ij}$, $w_j^{(k)}$ is the kth replication weight of unit j for the complete sample and $w^{*(k)}_{ij}$ is the kth replicate of the imputation fraction $w^*_{ij}$. Assume replicates are constructed so that

$$\sum_{k=1}^L c_k \sum_{i \in A_{Rg}} \big( \alpha_i^{(k)} - \alpha_i \big)^2 = \sum_{i \in A_{Rg}} \alpha_i^2 \quad (g = 1, 2, \ldots, G), \qquad (18)$$

$$\sum_{i \in A_R} w^{*(k)}_{ij} = 1 \quad (\text{for all } j \in A). \qquad (19)$$

Then the variance estimator

$$\hat V(\hat\theta_I) = \sum_{k=1}^L c_k \big( \hat\theta_I^{(k)} - \hat\theta_I \big)^2 \qquad (20)$$

is unbiased for the variance of $\hat\theta_I$.

See Appendix 1 for the proof. The estimator $\hat V(\hat\theta_I)$ is unbiased for the variance of $\hat\theta_I$, where, as in Theorem 1, the variance is defined by the joint distribution. The result holds for any survey design for which a replication variance estimator is available. Requirement (18) for the replicates follows from the representation of the variance given in (9) of Theorem 1.

There are numerous ways of constructing replicates that satisfy (18) and (19). We consider replicates of the jackknife type. A natural starting place is to consider replicates constructed by removing all of the imputed values associated with a deleted respondent and increasing the weights of the other donors. By slightly modifying this procedure we can construct an unbiased estimator of the variance. Let $\alpha_{1i}^{(k)} = \sum_{j \in A} w_j^{(k)} w^*_{ij}$ and let $P_k$ be the set of indices of elements deleted, or that have their weights reduced, at the kth replicate. Assume that the $P_k$ $(k = 1, 2, \ldots, L)$ are such that each element appears in one and only one of the $P_k$. The replicates will satisfy (18) if

$$\sum_{i \in A_{Rg}} c_k \big( \alpha_i^{(k)} - \alpha_i \big)^2 - \sum_{i \in A_{Rg}} c_k \big( \alpha_{1i}^{(k)} - \alpha_i \big)^2 = \sum_{s \in P_k \cap A_{Rg}} \Big\{ \alpha_s^2 - \sum_{i=1}^L c_i \big( \alpha_{1s}^{(i)} - \alpha_s \big)^2 \Big\} \qquad (21)$$

for $g = 1, 2, \ldots, G$ and $k = 1, 2, \ldots, L$. Consider a fractionally imputed procedure with M distinct donors for each missing value and $w^*_{ij} = M^{-1}$ for $d_{ij} = 1$ and $j \in A_M$. Define jackknife replicate fractions for missing unit j in cell g of replicate k by
$$w^{*(k)}_{ij} = \begin{cases} w^*_{ij} - d_{kg}, & \text{if } M_{jk} > 0 \text{ and } i \in P_k, \\ w^*_{ij} + (M - M_{jk})^{-1} M_{jk}\, d_{kg}, & \text{if } M_{jk} > 0 \text{ and } i \notin P_k, \\ w^*_{ij}, & \text{otherwise}, \end{cases} \qquad (22)$$

where $M_{jk} = \sum_{i \in P_k \cap A_R} d_{ij}$ is the number of donors in $P_k$ that are used for missing unit j, and the $d_{kg}$ $(k = 1, 2, \ldots, L)$ are to be determined. Equation (21) expressed as a quadratic function of $d_{kg}$ is

$$\sum_{i \in P_k \cap A_{Rg}} c_k \Big( \alpha_{1i}^{(k)} - \alpha_i - d_{kg} \sum_{j \neq i} w_j^{(k)} d_{ij} \Big)^2 + \sum_{i \in A_{Rg},\ i \notin P_k} c_k \Big\{ \alpha_{1i}^{(k)} - \alpha_i + d_{kg} \sum_{j \neq i} (M - M_{jk})^{-1} M_{jk}\, w_j^{(k)} d_{ij} \Big\}^2 - \sum_{i \in A_{Rg}} c_k \big( \alpha_{1i}^{(k)} - \alpha_i \big)^2 - \sum_{s \in P_k \cap A_{Rg}} \Big\{ \alpha_s^2 - \sum_{i=1}^L c_i \big( \alpha_{1s}^{(i)} - \alpha_s \big)^2 \Big\} = 0, \qquad (23)$$

where $\alpha_{1i}^{(k)} = \sum_{j \in A} w_j^{(k)} w^*_{ij}$. Thus, the $d_{kg}$, for $k = 1, 2, \ldots, L$ and $g = 1, 2, \ldots, G$, can be obtained by solving the LG quadratic equations. If the weights are to be positive, $d_{kg}$ must be less than the smallest $w^*_{ij}$. There are certain replication methods and sampling configurations where it is not possible to find a $d_{kg}$ less than $w^*_{ij}$ satisfying (23). In such cases an unbiased variance estimator can be constructed by adding replicates where the $w^{*(k)}_{ij}$ differ from $w^*_{ij}$, but $w_i^{(k)} = w_i$ for all i.

5. ESTIMATION OF LINEAR COMBINATIONS

Imputed datasets are often used to construct estimates for subdomains of the population where these domains were not part of the model used in the construction of imputed values. In fact, imputation may be carried out under the conscious assumption that the imputation model holds across domains. An estimated domain mean for the full sample can be represented as

$$\hat\theta_{zy} = \Big( \sum_{j \in A} w_j z_j \Big)^{-1} \sum_{j \in A} w_j z_j y_j,$$

where

$$z_j = \begin{cases} 1, & \text{if } j \text{ is in the domain}, \\ 0, & \text{otherwise}. \end{cases}$$

The estimator $\hat\theta_{zy}$ is a linear estimator if the z values are considered fixed. It is not linear in the survey design context because the $z_i$ vary from sample to sample. To evaluate the performance of the fractional imputation variance estimator for the domain mean, first consider the estimation of the mean of cross-products of z and y, where $z_i$ is known for all $i \in A$ and some missing values of y have been imputed using fractional imputation. Writing the estimator as a function of the observed y's, we have

$$\hat\theta_{zy,I} = \sum_{i \in A_R} \Big( w_i z_i + \sum_{j \in A_M} w_j w^*_{ij} z_j \Big) y_i \equiv \sum_{i \in A_R} \alpha_{z,i} y_i.$$

In constructing replicates for variance estimation for $\sum_{i \in A} w_i y_i$, we create replicates that satisfy (18), where $\alpha_i^{(k)} = w_i^{(k)} + \sum_{j \in A_M} w_j^{(k)} w^{*(k)}_{ij}$.
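A small Python sketch of the replicate fractions in (22) for a single missing unit j, checking that they still satisfy (19). It assumes the adjustment $d_{kg}$ is already known (in the text it solves the quadratic (23), which is not solved here), uses exact fractions for clarity, and raises an error when every donor of j lies in $P_k$, a case in which $(M - M_{jk})^{-1}$ is undefined. The function name `replicate_fractions` is hypothetical.

```python
from fractions import Fraction


def replicate_fractions(donors_of_j, P_k, M, d_kg):
    """Jackknife replicate fractions w*_ij^(k) of (22) for one missing unit j.

    donors_of_j -- the M distinct donors of j, each with w*_ij = 1/M
    P_k         -- respondents deleted or down-weighted in replicate k
    d_kg        -- cell-g adjustment for replicate k; in the text it solves
                   the quadratic (23), here it is simply supplied
    Returns a dict donor -> w*_ij^(k).
    """
    w_star = Fraction(1, M)
    M_jk = sum(1 for i in donors_of_j if i in P_k)  # donors of j that lie in P_k
    if M_jk == 0:
        return {i: w_star for i in donors_of_j}  # "otherwise" branch of (22)
    if M_jk == M:
        # (M - M_jk)^{-1} is undefined; this sketch assumes some donor survives.
        raise ValueError("every donor of j is in P_k")
    out = {}
    for i in donors_of_j:
        if i in P_k:
            out[i] = w_star - d_kg  # reduced fraction for donors in P_k
        else:
            out[i] = w_star + Fraction(M_jk, M - M_jk) * d_kg  # spread the reduction
    return out


frac_k = replicate_fractions(donors_of_j=[0, 1, 2], P_k={1}, M=3, d_kg=Fraction(1, 6))
print(frac_k, sum(frac_k.values()))  # fractions still sum to one, as (19) requires
```

The reduction $M_{jk} d_{kg}$ taken from the donors in $P_k$ is redistributed equally over the $M - M_{jk}$ remaining donors, which is why condition (19) holds for any value of $d_{kg}$. The positivity requirement $d_{kg} < w^*_{ij}$ is not enforced in this sketch.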

Only in special cases will the replicate weights

$$\alpha_{z,i}^{(k)} = w_i^{(k)} z_i + \sum_{j \in A_M} w_j^{(k)} w^{*(k)}_{ij} z_j$$

also satisfy equality (18). Hence, applying these replicates to $\hat\theta_{zy,I}$ gives a biased estimator of the variance of $\hat\theta_{zy,I}$. The relative bias in the fractionally imputed variance estimator for jackknife replicates for a simple random sample of size n, a single imputation cell and $w^*_{ij} = M^{-1}$ is of order $n^{-1}(n - r)M^{-1}$, where r is the number of respondents. Thus, the bias in the fractional imputation variance estimator can be reduced by increasing M, where M is bounded by r.

6. SIMULATION STUDIES

Simulation was used to compare several imputation methods. In the first study, simple random samples were generated from an infinite population composed of two equally sized cells, where, independently for each i,

$$y_i \sim \begin{cases} N(2.8,\ 1.16), & \text{in cell 1}, \\ N(3.8,\ 1.735), & \text{in cell 2}. \end{cases} \qquad (24)$$

For every i, an indicator variable $z_i$ was created, where the $z_i$ are independently and identically distributed with $\mathrm{pr}(z_i = 1) = 0.25$. The $z_i$ are independent of the $y_i$, define membership in a domain, and are always observed. The response rate for y is 0.7 in cell 1 and 0.6 in cell 2. Thirty thousand samples of size 60 and samples of size 120 were generated. All of the realised samples had more than one respondent in each imputation cell. The parameters of interest are the mean of y, denoted by $\theta_1$; the mean of y for z = 1, denoted by $\theta_2$; and the fraction of y's less than 2.0, denoted by $\theta_3$. The parameter $\theta_3$ is the mean of the variables

$$u_i = \begin{cases} 1, & \text{if } y_i < 2, \\ 0, & \text{otherwise}. \end{cases}$$

The full sample estimator of $\theta_2$ is

$$\hat\theta_2 = \Big( \sum_{i \in A} z_i \Big)^{-1} \sum_{i \in A} z_i y_i.$$

Estimators were constructed by fully efficient fractional imputation (FEFI), by a fractional imputation (FI) scheme with M = 3 donors explained below, by the approximate Bayesian bootstrap (ABB) of Rubin & Schenker (1986) with M = 3 imputations, by fractional imputation with M = 10 and by the approximate Bayesian bootstrap with M = 10.
The fractional imputation method was performed using without-replacement selection of M donors for each nonrespondent. Let there be $r_g$ respondents and $m_g$ missing values in cell g. Let $M m_g = t_g r_g + k_g$, where $t_g$ and $k_g$ are integers with $0 \le k_g < r_g$. Then $k_g$ respondents are used $t_g + 1$ times, and $r_g - k_g$ respondents are used $t_g$ times. Those to be used $t_g + 1$ times are selected with equal probabilities without replacement. The replicates were calculated using (22) and (23).
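The allocation rule above can be sketched in Python for one cell. The rule fixes how often each respondent is used but not which recipients it serves, so the matching step here is our own assumption: donor copies are laid out in a random order with each donor's copies adjacent, and recipient p takes every mth slot starting at p. Because no donor is used more than $m_g$ times when $M \le r_g$, this guarantees M distinct donors per recipient. The function name `balanced_donor_assignment` is hypothetical.

```python
import random


def balanced_donor_assignment(respondents, recipients, M, rng=None):
    """Give each recipient M distinct donors from one cell, using donors evenly.

    With r respondents and m recipients, write M*m = t*r + k (0 <= k < r):
    k respondents, chosen by equal-probability sampling without replacement,
    are used t + 1 times and the other r - k are used t times.
    """
    rng = rng or random.Random()
    respondents, recipients = list(respondents), list(recipients)
    r, m = len(respondents), len(recipients)
    if not 1 <= M <= r:
        raise ValueError("need 1 <= M <= number of respondents")
    t, k = divmod(M * m, r)
    extra = set(rng.sample(respondents, k))  # these are used t + 1 times
    uses = {i: t + 1 if i in extra else t for i in respondents}

    # Lay out donor "slots" with each donor's copies adjacent, in random order.
    order = respondents[:]
    rng.shuffle(order)
    pool = [i for i in order for _ in range(uses[i])]  # length M*m
    rng.shuffle(recipients)
    # Recipient p takes slots p, p+m, ..., p+(M-1)m; since no donor has more
    # than m adjacent copies, the M donors of a recipient are distinct.
    return {j: [pool[p + q * m] for q in range(M)] for p, j in enumerate(recipients)}


assignment = balanced_donor_assignment(range(7), range(100, 105), M=3, rng=random.Random(1))
for j, donors in sorted(assignment.items()):
    print(j, donors)
```

With $r_g = 7$, $m_g = 5$ and M = 3, we have $15 = 2 \times 7 + 1$, so one respondent is used three times and the other six are used twice. The imputation fractions would then be $w^*_{ij} = 1/3$ for each listed donor.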

The variance estimator for multiple imputation adopted from Rubin (1987) is

$$\hat V_M = W_M + (1 + M^{-1}) B_M, \qquad (25)$$

where

$$(W_M, \bar\theta_M) = M^{-1} \sum_{t=1}^M \big( \hat V_{(t)}, \hat\theta_{(t)} \big), \qquad B_M = (M - 1)^{-1} \sum_{t=1}^M \big( \hat\theta_{(t)} - \bar\theta_M \big)^2,$$

M is the number of multiple imputations, $\hat V_{(t)}$ is the complete-data variance estimator applied to the tth imputed dataset, and $\hat\theta_{(t)}$ is a version of $\hat\theta$ computed from the tth imputed dataset.

Table 1 shows the means, the variances and the standardised variances of the point estimators under the five imputation schemes. The standardised variance is the variance divided by the variance of the fully efficient fractionally imputed estimator, multiplied by 100.

Table 1: First simulation. Means, variances and standardised variances of the point estimators under five different imputation schemes, based on samples. Rows: n = 60 and n = 120, each with the mean $\theta_1$, the domain mean $\theta_2$ and the proportion $\theta_3$ under FEFI, FI (M=3), ABB (M=3), FI (M=10) and ABB (M=10); columns: Mean, Variance, Std var. FEFI, fully efficient fractional imputation; FI, fractional imputation; ABB, approximate Bayesian bootstrap; Std var, standardised variance.

All five imputation methods are unbiased for the three parameters, and the Monte Carlo results are consistent with that property. Under model (24), the theoretical variance of the full sample estimator of $\theta_1$ for n = 60 is …, and the variance of the fully efficient fractional imputation estimator of $\theta_1$ is

$$\mathrm{var}(\hat\theta_{1,FEFI}) = n^{-2} \big\{ \mathrm{var}(n_1 \mu_1 + n_2 \mu_2) + E(n_1^2 r_1^{-1}) \sigma_1^2 + E(n_2^2 r_2^{-1}) \sigma_2^2 \big\},$$

which equals … for n = 60 and … for n = 120, where $\mu_g$ is the mean of cell g, $\sigma_g^2$ is the variance of cell g, and $n_g$ is the number of sample elements falling in cell g. The theoretical increases in the variance of the estimator of the mean $\theta_1$ for n = 60 due to missing values are …, …, …, … and … for fully efficient fractional imputation, fractional imputation with M = 3, approximate Bayesian bootstrap imputation with M = 3, fractional imputation with M = 10 and approximate Bayesian bootstrap imputation with M = 10, respectively. Thus, the corresponding approximate relative efficiencies as estimators of the mean of the missing values are 100, 98.9, 75.5, 99.8 and 91.1 for a sample of size 60. The corresponding relative efficiencies for n = 120 are 100, 98.0, 75.2, 99.8 and …. The Monte Carlo results are in general agreement with the theoretical approximations.

The Monte Carlo variances of Table 1 show that fully efficient fractional imputation is always the most efficient and approximate Bayesian bootstrap imputation with M = 3 is always the least efficient. The fractional imputation scheme with M = 3 shows only about a 1% loss in efficiency relative to fully efficient fractional imputation for estimation of the mean. Fractional imputation with M = 10 has less than a 0.5% loss in efficiency relative to the fully efficient procedure for the mean. Losses are somewhat larger for the domain mean, with a loss of about 8% for M = 3 and about 2% for M = 10 relative to fully efficient fractional imputation.
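Rubin's combining rule (25), used for the approximate Bayesian bootstrap estimators in these comparisons, can be sketched in Python with the standard library. The function name `rubin_combine` and the numbers in the example are hypothetical; `statistics.variance` uses the divisor $M - 1$, matching $B_M$.

```python
from statistics import mean, variance


def rubin_combine(estimates, variances):
    """Combine M completed-data analyses with the rule in (25).

    estimates -- theta_(t), t = 1..M, one per imputed dataset
    variances -- V_(t), the complete-data variance estimate for each dataset
    Returns (theta_bar_M, V_M) with V_M = W_M + (1 + 1/M) B_M.
    """
    M = len(estimates)
    if M < 2 or len(variances) != M:
        raise ValueError("need M >= 2 paired estimates and variances")
    theta_bar = mean(estimates)  # theta_bar_M
    W = mean(variances)          # W_M: average within-imputation variance
    B = variance(estimates)      # B_M: between-imputation variance, divisor M - 1
    return theta_bar, W + (1 + 1 / M) * B


print(rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```

In the example, $W_M = 0.5$ and $B_M = 1$, so $\hat V_M = 0.5 + 4/3$. The between-imputation term $B_M$ rests on only $M - 1$ degrees of freedom whatever the sample size, which is relevant to the stability comparisons reported for Table 2.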
Table 2 shows the mean, relative bias, relative variance, standardised variance and t-statistic for the variance estimators. The relative bias of the estimated variance is the Monte Carlo bias divided by the Monte Carlo mean. The relative variance is the Monte Carlo variance of the variance estimator divided by the square of the variance, where the variance is given in Table 1. The t-statistic for testing the hypothesis of zero bias is the Monte Carlo estimated bias divided by the Monte Carlo standard error of the estimated bias.

The fractional imputation variance estimation procedures are unbiased for the variance of the mean, and the Monte Carlo results for $\theta_1$ and $\theta_3$ are in agreement with that property. For small imputation cell sizes, the approximate Bayesian bootstrap variance estimators are biased for the variance of the mean. The bias of the multiple imputation variance estimator for the approximate Bayesian bootstrap method is, up to $O(n^{-2})$ terms,

$$E(\hat V_M) - \mathrm{var}(\hat\theta_{ABB}) \doteq n^{-2} E\Big\{ \sum_{g=1}^{2} \frac{(n_g - r_g)\, n_g}{r_g} \Big( \frac{2}{n} + \frac{1}{n_g} + \frac{1}{r_g} \Big) \sigma_g^2 \Big\}.$$

The approximate percent relative bias of the variance estimator for $\hat\theta_1$ is 3.9% for (n, M) = (60, 10) and 1.9% for (n, M) = (120, 10), agreeing with the Monte Carlo results.

The fractionally imputed variance estimator is biased for the variance of $\hat\theta_2$. The bias comes from two sources, the first being the bias described in Section 5. The second source is the bias in the jackknife variance estimator for a ratio, where the domain mean is a ratio estimator.

Table 2: First simulation. Means, relative biases, relative variances, standardised variances and t-statistics for the variance estimator, based on samples. Rows: n = 60 and n = 120, each with $\theta_1$, $\theta_2$ and $\theta_3$ under FEFI, FI (M=3), ABB (M=3), FI (M=10) and ABB (M=10); columns: Mean, RB (%), RV (%), Std var, t-statistic. FEFI, fully efficient fractional imputation; FI, fractional imputation; ABB, approximate Bayesian bootstrap; RB, relative bias; RV, relative variance; Std var, standardised variance.

The relative bias in the estimated variance of the domain mean computed with the full sample is about 8% for n = 60 and about 3% for n = 120. The ratio bias decreases as n increases and the imputation bias decreases as M increases. Table 2 illustrates that multiple imputation with the approximate Bayesian bootstrap produces a seriously biased estimator of the variance of the estimator of $\theta_2$. The existence of a bias in the multiple imputation variance estimator for a domain mean, where the domain information is not used for imputation, was pointed out by Fay (1992). The relative biases of the approximate Bayesian bootstrap variance estimators for $\hat\theta_2$ are over 50% for both sample sizes. This is judged to be a serious shortcoming of multiple imputation variance estimation, because the construction of estimates for small domains is one of the reasons to choose imputation over weighting as the adjustment for nonresponse.

The bias in the multiple imputation variance estimator for the domain mean arises primarily because the $B_M$ of (25) is a biased estimator of the effect of imputing for missing values. The model used to generate the data is the model used in imputation, in that the analysis variable is assumed to be unrelated to z. The assumed independence is reflected in the imputation, in that the assignment of donors is made without consideration of z. As a result, the estimator for a domain calculated with imputed data can have smaller variance than the estimator computed from a complete sample. The multiple imputation variance estimator does not reflect the fact that some of the imputed values used in the domain come from observations outside the domain. Increasing M or increasing the sample size reduces only the part of the bias in the multiple imputation variance estimator that is due to the bias in the jackknife estimator of the variance of a ratio.

The variance estimators for the fractionally imputed procedures are more stable than those of the approximate Bayesian bootstrap methods. Fully efficient fractional imputation has the uniformly smallest variance of the variance estimators, and the approximate Bayesian bootstrap method with M = 3 has the largest. The relative efficiency of the fractional imputation variance estimator relative to the multiple imputation variance estimator with M = 10 for $\theta_1$ is 140% for n = 60 and 197% for n = 120. The relative efficiency of the fractional imputation variance estimator with M = 3 relative to the multiple imputation variance estimator with M = 3 for $\theta_1$ is 375% for n = 60 and 664% for n = 120.

The variance of the imputed estimator has two components, the variance of the full sample estimator and the variance due to imputation. The approximate Bayesian bootstrap method estimates the two components of variance separately. The degrees of freedom for the estimator of the variance due to imputation is $M - 1$, regardless of the sample size. Although the variance of the estimated sampling variance is of order $n^{-1}$, the estimated variance due to imputation places a lower bound on the variance of the estimated variance. Unlike the multiple imputation variance estimator, the variance of the fractional imputation estimated variance is inversely proportional to the number of sampling units. Hence, for fixed M, the efficiency of the fractional imputation variance estimator relative to the approximate Bayesian bootstrap increases without bound as the sample size increases.

Table 3 displays the means and variances of the lengths of 95% confidence intervals, and the size of a two-sided test for the true value. The coverage probability of a nominal 95% interval is one minus the size. The confidence intervals are $(\hat\theta - t_\nu \hat V^{1/2},\ \hat\theta + t_\nu \hat V^{1/2})$, where $t_\nu$ is the upper 2.5 percentile of the t-distribution with $\nu$ degrees of freedom. The degrees of freedom $\nu$ for fractional imputation methods is one less than the number of respondents. For the approximate Bayesian bootstrap method, the degrees of freedom is that suggested by Barnard & Rubin (1999).

The confidence intervals based on fractional imputation perform better than those based on the approximate Bayesian bootstrap. For $\theta_1$ and n = 60, the fractional imputation confidence intervals have coverages near the nominal level of 95%. The size given in Table 3 is the size of a nominal 5% two-tailed test for the mean and is one minus the coverage probability. The approximate Bayesian bootstrap with M = 10 produces intervals that are on average 2% narrower than the fractional imputation intervals, but the coverage is significantly less than 95%. Also, the lengths of the approximate Bayesian bootstrap intervals are much more variable.
The coverages of confidence intervals for $\theta_3$ constructed with the fractional imputation procedure are superior to those of the approximate Bayesian bootstrap procedure, but all procedures have coverages significantly less than 95%. Coverages deviate from 95% because the distribution of the test statistic deviates from normality. Coverages of intervals for $\theta_3$ generally improve as sample size increases for all procedures. The coverages of the fractional imputation procedure are close to 95% for $\theta_2$, the domain mean. The sizes of the approximate Bayesian bootstrap procedures for the domain mean are much smaller than the nominal 5% because the variance is seriously overestimated, and they do not improve as sample size increases.

Table 3: First simulation. Standardised means of confidence interval width, standardised variances of confidence interval width and sizes of a two-tailed test. Columns: Std mean, Std var, Size (%). Rows: for $n=60$ and $n=120$, and for each of $\theta_1$, $\theta_2$ and $\theta_3$, the methods FEFI, FI (M=3), ABB (M=3), FI (M=10) and ABB (M=10). FEFI, fully efficient fractional imputation; FI, fractional imputation; ABB, approximate Bayesian bootstrap; Std mean, standardised mean; Std var, standardised variance.

We also simulated the performance of a multiple imputation procedure, similar to that described by Rubin (1987, p. 83), that assumes the distribution of $y$ to be known. In our case the distribution of $y$ in a cell is normal, and we call the procedure normal distribution multiple imputation. The approximate Bayesian bootstrap and normal distribution multiple imputation procedures have similar efficiencies for $\theta_1$ and $\theta_2$, with the approximate Bayesian bootstrap marginally more efficient. Normal distribution multiple imputation is slightly more efficient for $\theta_3$ because it uses correct information about the distribution. The sizes for normal distribution multiple imputation were slightly better than those for the approximate Bayesian bootstrap for $\theta_1$ and worse for $\theta_2$. Normal distribution multiple imputation overestimated the variance of the imputed estimator of $\theta_2$ by more than 50% for both sample sizes. It was generally the best procedure for $\theta_3$, although it overestimated the variance and thus led to sizes less than 5% for $n=120$.

In a second simulation we generated simple random samples of clusters from an infinite population of clusters containing four types of cluster. Each cluster is composed of six elements drawn from the two imputation cells: one-quarter of the clusters have one element from cell 1 and five from cell 2; one-quarter have two elements from cell 1 and four from cell 2; one-quarter have four elements from cell 1 and two from cell 2; and one-quarter have five elements from cell 1 and one from cell 2. The response rates are 0.7 in cell 1 and 0.6 in cell 2, as in the first simulation. Three scenarios were simulated. In the first scenario, called population (A), the $y$ variable is generated by specification (24). In addition to $y$, a variable $x$ was generated as a uniform random variable independent of $y$. The estimator of the mean of $y$ and the estimator of the slope of the regression of $y$ on $x$ were computed.
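The cluster design of the second simulation can be sketched as follows. This is a minimal sketch under stated assumptions: each cluster type is drawn independently with probability 1/4, responses are independent Bernoulli draws with the cell response rate, and the $y$ and $x$ values of specification (24) are not generated. The names `CLUSTER_TYPES`, `RESPONSE_RATE` and `draw_cluster_sample` are hypothetical.

```python
import random

# (elements from cell 1, elements from cell 2) for the four cluster types
CLUSTER_TYPES = [(1, 5), (2, 4), (4, 2), (5, 1)]
RESPONSE_RATE = {1: 0.7, 2: 0.6}  # response probabilities in cells 1 and 2


def draw_cluster_sample(n_clusters, rng):
    """Return (cluster, type, cell, responded) records for one simple random
    sample of clusters; only the design and response pattern are simulated."""
    records = []
    for c in range(n_clusters):
        t = rng.randrange(len(CLUSTER_TYPES))  # each type with probability 1/4
        n1, n2 = CLUSTER_TYPES[t]
        for cell in [1] * n1 + [2] * n2:
            responded = rng.random() < RESPONSE_RATE[cell]
            records.append((c, t + 1, cell, responded))
    return records


# One sample of 20 clusters, the sample size used for Table 4.
sample = draw_cluster_sample(20, random.Random(2004))
```

Every cluster contributes six elements, so a sample of 20 clusters always has 120 elements, while the split between cells and the number of respondents vary from sample to sample.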
The slope was calculated as
$$\hat\beta = \frac{\sum_{i\in A_R}\sum_{j\in A}(x_j-\bar x)(y_i-\bar y)\,w^*_{ij}}{\sum_{i\in A}(x_i-\bar x)^2},$$
where $w^*_{ij}=M^{-1}d_{ij}$ if $j\in A_M$ and $i\in A_R$, and $\bar y$ is the mean of the imputed sample. Replicates for variance estimation were calculated using (22) and (23). For the population generated in this manner, model (2) holds for the variable $y$. Hence, as the first block of rows in Table 4 indicates, the imputed estimator is unbiased for the mean of $y$. Also, because $y$ is independent of $x$, the true slope is zero and the slope estimated with imputed data is unbiased for zero. The multiple imputation variance estimator is slightly biased for the variance of the mean because of the cell size bias described earlier. In this experiment, unlike the experiment of Table 2, the variance of the multiple imputation variance estimator of the mean for $M=10$ is smaller than the variance of the fractional imputation variance estimator for $M=10$. To estimate the full sample variance, the multiple imputation procedure uses the average of the full sample variance estimators computed from the ten imputed datasets. Because the imputed cluster samples are not perfectly correlated, this average of $M$ estimated full sample variances is a more efficient estimator of the full sample variance than the usual variance estimator applied to the full sample. The relative variance of the full sample variance estimator is about 10%, while the relative variance of the multiple imputation estimator of the full sample variance is about 7%. Thus, for multiple imputation with $M=10$, even after the contribution of the $B_M$ component is added, the relative variance of the multiple imputation variance estimator is less than 10%.
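The slope formula above can be sketched directly from the fractional weights. This minimal sketch assumes equal design weights, so that $\bar x$ and $\bar y$ are simple means, and assumes $w^*_{jj}=1$ for each respondent $j$ so that respondents carry their own values; the function name `fractional_slope` and the dictionary layout are hypothetical.

```python
def fractional_slope(x, y, wstar):
    """Slope of y on x from a fractionally imputed sample (hypothetical helper).

    x:     list of x_j for every sampled element j = 0, ..., n - 1 (the set A).
    y:     dict mapping each respondent i in A_R to y_i.
    wstar: dict mapping (i, j) to w*_ij; assumed to contain (j, j): 1 for
           respondents and (i, j): M^{-1} d_ij for donors i of recipient j.
    """
    n = len(x)
    x_bar = sum(x) / n
    # Imputed value of element j is sum_i w*_ij y_i (its own y_j if it responded).
    y_hat = [sum(w * y[i] for (i, jj), w in wstar.items() if jj == j) for j in range(n)]
    y_bar = sum(y_hat) / n  # mean of the imputed sample
    num = sum((x[j] - x_bar) * w * (y[i] - y_bar) for (i, j), w in wstar.items())
    den = sum((xj - x_bar) ** 2 for xj in x)
    return num / den


# Element 3 is missing and receives half of its weight from each of donors 0 and 2.
x = [0.0, 1.0, 2.0, 3.0]
y = {0: 1.0, 1: 2.0, 2: 3.0}
wstar = {(0, 0): 1.0, (1, 1): 1.0, (2, 2): 1.0, (0, 3): 0.5, (2, 3): 0.5}
print(fractional_slope(x, y, wstar))  # 0.4
```

With no missing values (`wstar` equal to the identity) the function reduces to the ordinary least squares slope.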

Table 4: Second simulation. Properties of estimators for different estimation schemes and populations based on samples of 20 clusters. Columns: Parameter (Population); Method; Mean; Variance; RB (%) of Est. var.; RV (%) of Est. var.; Power (%). Rows: Mean (A), Slope (A), Mean (B) and Slope (C), each for FI (M=3), MI (M=3), FI (M=10) and MI (M=10). FI, fractional imputation; MI, multiple imputation; RB, relative bias; RV, relative variance; Est. var., estimated variance.

The relative variance of the fractional imputation variance estimator is about 20% larger than the relative variance of the full sample variance estimator. The fractional imputation variance estimator is based on the same number of replicates as the full sample estimator, and those replicates reflect the increase in variability due to imputation. The nature of the bias in the fractional imputation estimated variance of the slope was described in § 5. As that discussion suggested, the empirical relative bias of the fractional imputation variance estimator declines from 15% at $M=3$ to 7% at $M=10$. The multiple imputation variance estimator has a bias of about 80% for the slope in this simulation; the nature of this bias is that described in the discussion of the estimation of domains. In population (B), the variables $y_{2i}$ were generated by the model $y_{2i}=y_i+b_i$, where $y_i$ is defined in (24) and $b_i=0.8$, $0.4$, $-0.4$ and $-0.8$ if the element is in a cluster of type 1, 2, 3 and 4, respectively. The remaining simulation parameters are as defined for population (A). The elements in a cell of population (B) are therefore not identically distributed.
However, the imputed estimator of the mean is unbiased because the response rate is the same for all elements in a cell. The multiple imputation estimator of the variance of the full sample estimator is constructed under the assumption that the imputed values in a cluster have the same mean as the missing values. Since that condition does not hold in population (B), the multiple imputation variance estimator has a negative bias. In contrast, the estimation method used by fractional imputation reflects the presence of a cluster effect, and the fractional imputation variance estimator for $M=10$ is nearly unbiased.

To study the behaviour of the procedures under a misspecified model, a variable was generated by the rule $y_{3i}=0.7x_i+y_i$, where $y_i$ is defined in (24). For this population, called population (C), the remaining parts of the data configuration are as described for population (A). The imputation procedures are as previously described and assume that $y_3$ is independent of $x$. As a result, the imputed estimator of the slope of $y_3$ on $x$ is biased: its expected value is 0.455, the true slope of 0.7 multiplied by the average response rate of 0.65; see Table 4.

A desirable feature of an imputation procedure is the ability to identify incorrect model specifications. Given that the imputation is constructed under the assumption that $y_3$ is independent of $x$, one might test the independence assumption by using the imputed data to compute the regression of $y_3$ on $x$. The entries in the final column of Table 4 for the estimated slope of population (C) are the powers of the nominal 5% test of the hypothesis that the slope is zero. Since the multiple imputation variance estimator has a bias of about 80%, the power of a test based on the multiple imputation variance is very poor. The bias of the fractional imputation variance estimator is much smaller, and hence the test based on the fractional imputation estimated variance has much higher power. Thus fractional imputation provides a much greater chance of identifying an incorrect model specification than does multiple imputation.
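The attenuation of the population (C) slope toward 0.455 can be reproduced with a small Monte Carlo sketch. The sketch simplifies the design under stated assumptions: a single imputation cell with response rate 0.65 instead of two cells with rates 0.7 and 0.6, standard normal errors in place of specification (24), no clustering, and a simple random hot deck donor for each nonrespondent. The function name `imputed_slope_sim` is hypothetical.

```python
import random


def imputed_slope_sim(n, beta, resp_rate, seed):
    """Simplified population (C): y3 = beta * x + e, with missing y3 values
    filled by hot deck donors chosen without regard to x."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]               # x ~ Uniform(0, 1)
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]  # e ~ N(0, 1): an assumption
    resp = [rng.random() < resp_rate for _ in range(n)]
    donors = [yi for yi, r in zip(y, resp) if r]
    # Hot deck: each nonrespondent receives the y value of a random respondent.
    y_imp = [yi if r else rng.choice(donors) for yi, r in zip(y, resp)]
    x_bar = sum(x) / n
    y_bar = sum(y_imp) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y_imp))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


slope = imputed_slope_sim(200_000, beta=0.7, resp_rate=0.65, seed=1)
print(round(slope, 3))  # close to 0.7 * 0.65 = 0.455
```

Imputed values are independent of $x$, so only the responding fraction of the sample carries the covariance between $x$ and $y_3$, which is why the estimated slope is scaled by the response rate.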
ACKNOWLEDGEMENT

This research was partially supported by a subcontract between Westat and Iowa State University under a contract between Westat and the Department of Education. We thank two referees, the associate editor and the editor for useful comments.

APPENDIX 1

Proofs

Proof of Theorem 1. By the assumptions of the imputation mechanism and by (3), independently for each $i\in A_g$,
$$y_i \mid (A, A_R, d) \sim (\mu_g, \sigma_g^2). \tag{A1}$$
Let the linear estimator for the full sample be as defined in (1) and let $\hat\theta$ be the imputed estimator. Under model (A1),
$$E(\hat\theta \mid A, A_R, d) = E\Big(\sum_{g=1}^{G}\sum_{i\in A_{Rg}}\alpha_i y_i \,\Big|\, A, A_R, d\Big) = \sum_{g=1}^{G}\sum_{i\in A_{Rg}}\alpha_i\mu_g = \sum_{g=1}^{G}\sum_{i\in A_g} w_i\mu_g, \tag{A2}$$
where the last equality follows from (6), and result (8) is established.
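The last equality in (A2) uses the fact, from (6), that within each cell the respondent weights $\alpha_i$ add up to the full-sample weights $w_i$. A minimal sketch of that identity follows, assuming, by analogy with Step 2 of Appendix 2, that $\alpha_i = w_i + \sum_{j} w_j w^*_{ij}$ with each recipient's weight split equally over donors from its own cell. The function name `respondent_alphas` and the data are hypothetical.

```python
from fractions import Fraction


def respondent_alphas(w, cell, donors):
    """alpha_i = w_i + sum_j w_j w*_ij for respondents i, with w*_ij = 1/len(donors[j])."""
    alpha = {i: Fraction(w[i]) for i in w if i not in donors}
    for j, ds in donors.items():
        assert all(cell[d] == cell[j] for d in ds), "donors must share the recipient's cell"
        for d in ds:
            alpha[d] += Fraction(w[j], len(ds))
    return alpha


# Hypothetical sample: units 3 and 6 are nonrespondents with two donors each.
w = {1: 2, 2: 3, 3: 1, 4: 4, 5: 2, 6: 5}
cell = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
donors = {3: [1, 2], 6: [4, 5]}
alpha = respondent_alphas(w, cell, donors)

mu = {"a": Fraction(10), "b": Fraction(20)}
lhs = sum(alpha[i] * mu[cell[i]] for i in alpha)  # sum over A_R of alpha_i mu_g
rhs = sum(w[i] * mu[cell[i]] for i in w)          # sum over A of w_i mu_g
print(lhs, rhs)  # 280 280
```

Exact rational arithmetic is used so that the equality of the two sums is checked without rounding error.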

The conditional variance of $\hat\theta$ is
$$\operatorname{var}(\hat\theta\mid A,A_R,d)=\sum_{g=1}^{G}\sum_{h=1}^{G}\sum_{i\in A_{Rg}}\sum_{j\in A_{Rh}}\alpha_i\alpha_j\operatorname{cov}(y_i,y_j\mid A,A_R,d)=\sum_{g=1}^{G}\sum_{i\in A_{Rg}}\alpha_i^2\sigma_g^2 \tag{A3}$$
by (A1). Inserting (A2) and (A3) into
$$\operatorname{var}(\hat\theta)=\operatorname{var}\{E(\hat\theta\mid A,A_R,d)\}+E\{\operatorname{var}(\hat\theta\mid A,A_R,d)\}, \tag{A4}$$
we obtain result (9). To show (10), note that, by (A2),
$$E(\hat\theta-\theta_N\mid A,A_R,d)=\sum_{g=1}^{G}\sum_{i\in A_g}w_i\mu_g-\sum_{g=1}^{G}\sum_{i\in U_g}\mu_g \tag{A5}$$
under model (A1). For $\theta_N=\sum_{i=1}^{N}Y_i$,
$$\operatorname{cov}(\hat\theta,\theta_N\mid A,A_R,d)=\sum_{g=1}^{G}\sum_{i\in A_{Rg}}\alpha_i\sigma_g^2,\qquad \operatorname{var}(\theta_N\mid A,A_R,d)=\sum_{g=1}^{G}\sum_{i\in U_g}\sigma_g^2,$$
and $\operatorname{var}(\hat\theta\mid A,A_R,d)$ is given in (A3). Therefore,
$$\operatorname{var}(\hat\theta-\theta_N\mid A,A_R,d)=\sum_{g=1}^{G}\sum_{i\in A_{Rg}}(\alpha_i^2-2\alpha_i)\sigma_g^2+\sum_{g=1}^{G}\sum_{i\in U_g}\sigma_g^2.$$
Note that, by (6),
$$E\Big(\sum_{g=1}^{G}\sum_{i\in A_{Rg}}\alpha_i\sigma_g^2\Big)=E\Big(\sum_{g=1}^{G}\sum_{i\in A_g}w_i\sigma_g^2\Big)=\sum_{g=1}^{G}\sum_{i\in U_g}\sigma_g^2,$$
so that
$$E\{\operatorname{var}(\hat\theta-\theta_N\mid A,A_R,d)\}=E\Big\{\sum_{g=1}^{G}\sum_{i\in A_{Rg}}(\alpha_i^2-2\alpha_i)\sigma_g^2+\sum_{g=1}^{G}\sum_{i\in U_g}\sigma_g^2\Big\}=E\Big\{\sum_{g=1}^{G}\sum_{i\in A_{Rg}}(\alpha_i^2-\alpha_i)\sigma_g^2\Big\}. \tag{A6}$$
Therefore, result (10) follows by inserting (A5) and (A6) into a decomposition of the form (A4) for $\hat\theta-\theta_N$. $\square$

Proof of Theorem 2. We write
$$E\Big\{\sum_{k=1}^{L}c_k(\hat\theta^{(k)}-\hat\theta)^2\Big\}=E\Big[\sum_{k=1}^{L}c_k\{E(\hat\theta^{(k)}-\hat\theta\mid A,A_R,d)\}^2\Big]+E\Big[\sum_{k=1}^{L}c_k\operatorname{var}(\hat\theta^{(k)}-\hat\theta\mid A,A_R,d)\Big]. \tag{A7}$$
We can write
$$\hat\theta^{(k)}=\sum_{i\in A_R}\alpha_i^{(k)}y_i,\qquad \hat\theta=\sum_{i\in A_R}\alpha_i y_i,$$
and, by (6) and (19), $\sum_{i\in A_R\cap U_g}(\alpha_i^{(k)}-\alpha_i)=\sum_{i\in A\cap U_g}(w_i^{(k)}-w_i)$. Therefore, under the cell mean model, we have
$$E(\hat\theta^{(k)}-\hat\theta\mid A,A_R,d)=\sum_{g=1}^{G}\sum_{i\in A\cap U_g}(w_i^{(k)}-w_i)\mu_g, \tag{A8}$$
$$\operatorname{var}(\hat\theta^{(k)}-\hat\theta\mid A,A_R,d)=\sum_{g=1}^{G}\sum_{i\in A_R\cap U_g}(\alpha_i^{(k)}-\alpha_i)^2\sigma_g^2. \tag{A9}$$
If we use (A8) and the unbiasedness of the complete sample variance estimator, the first term on the right-hand side of (A7) reduces to $\operatorname{var}\{\sum_{g=1}^{G}\sum_{i\in A\cap U_g}w_i\mu_g\}$. By (A9) and (18), the second term on the right-hand side of (A7) is $E\{\sum_{g=1}^{G}\sum_{i\in A_R\cap U_g}\alpha_i^2\sigma_g^2\}$. Therefore, the expectation on the right-hand side of (A7) is equal to the variance given in (9). $\square$

APPENDIX 2

Illustrative calculation for fractional imputation

We illustrate the weight construction with a cluster sample of four clusters. The data are given in Table A1. To simplify the presentation, we use initial weights of one. We use a single imputation cell and assume that $y_{11}$, $y_{21}$ and $y_{22}$ are the randomly selected donors for missing $y_{13}$ and that $y_{11}$, $y_{21}$ and $y_{31}$ are the three randomly selected donors for missing $y_{33}$. Fractional imputation is used with three donors for each missing value and with equal fractions assigned to each. The weights for the imputed dataset are given in the column headed Original.

Table A1. Replicate weights. Columns: Cluster; Obs.; Imp. value; Weights (Original, Rep. 1, Rep. 2, Rep. 3, Rep. 4). The replicate weights of the imputed rows are written in terms of $d_1$, $d_2$ and $d_3$ and the basic replicate weights 1.289 and 0.134. Obs., observation; Imp., imputed; Rep., replicate.

In general, it is a good idea to construct replicates with $c_k=1$. Such replicates will generally reduce the degrees-of-freedom effect of observing a subset of respondents.
For a simple random sample of size $n$, a jackknife replicate for the mean with $c_k=1$ can be created by reducing the weight on the deleted unit from $n^{-1}$ to $n^{-1}-\{n^{-3}(n-1)\}^{1/2}$ and increasing the weight on each of the remaining units by $(n-1)^{-1}\{n^{-3}(n-1)\}^{1/2}$.
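This weight rule can be checked numerically. The sketch below, with hypothetical function names, builds the $n$ replicates for a simple random sample and confirms that $\sum_k (\hat\theta^{(k)}-\hat\theta)^2$ with $c_k=1$ equals the usual unbiased estimator $s^2/n$ of the variance of the mean.

```python
import math


def jackknife_c1_weights(n):
    """Replicate weights for the mean of a simple random sample, with c_k = 1."""
    a = math.sqrt((n - 1) / n ** 3)  # {n^{-3}(n - 1)}^{1/2}
    reps = []
    for k in range(n):
        w = [1 / n + a / (n - 1)] * n  # remaining units gain a / (n - 1)
        w[k] = 1 / n - a               # deleted unit loses a
        reps.append(w)
    return reps


def jackknife_variance(y):
    """sum_k (theta^(k) - theta)^2 with c_k = 1."""
    n = len(y)
    theta = sum(y) / n
    return sum((sum(wi * yi for wi, yi in zip(w, y)) - theta) ** 2
               for w in jackknife_c1_weights(n))


y = [3.0, 7.0, 1.0, 9.0, 5.0]
print(jackknife_variance(y))  # equals s^2 / n = 10 / 5 = 2 up to rounding

# For n = 4, scaling to weights of one gives about 0.134 and 1.289.
w = jackknife_c1_weights(4)[0]
print(round(4 * w[0], 3), round(4 * w[1], 3))
```

Each replicate deviation equals $-\{n(n-1)\}^{-1/2}(y_k-\bar y)$, so squaring and summing over $k$ gives $s^2/n$ without any extra factor $c_k$.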

Applying this rule with a sample of size 4, the first replicate for the mean is created by decreasing the weight on the first cluster from one-quarter to $0.25-(3/64)^{1/2}\approx 0.0335$ and increasing each of the other three weights to $0.25+(3/64)^{1/2}/3\approx 0.3222$. In the illustration we use weights of one instead of one-quarter, so the basic replicate weights given in Table A1 are 0.134 and 1.289.

The computations for Table A1 were carried out as follows.

Step 1. Write the fractional replication weights in terms of the $d$'s; see Table A1.

Step 2. Calculate $\alpha_{1i}^{(k)}=\sum_{j\in A}w_j^{(k)}w_{ij}^*$ and $T_i=\alpha_i^2-\sum_{k=1}^{L}c_k(\alpha_{1i}^{(k)}-\alpha_i)^2$; see Table A2. Then the determining equation (21) of the text for $d_k$ can be written
$$\sum_{i\in A_{Rg}}c_k(\alpha_i^{(k)}-\alpha_i)^2-\sum_{i\in A_{Rg}}c_k(\alpha_{1i}^{(k)}-\alpha_i)^2=\sum_{i\in A_{Rg}\cap P_k}T_i \tag{A10}$$
for $k=1,2,3,4$.

Table A2. Weights for respondents. Columns: Obs.; Original $\alpha_i$; Rep. 1 $\alpha_{1i}^{(1)}$; Rep. 2 $\alpha_{1i}^{(2)}$; Rep. 3 $\alpha_{1i}^{(3)}$; Rep. 4 $\alpha_{1i}^{(4)}$; $T_i$. Obs., observation; Rep., replicate.

Step 3. Write the determining equations (A10) in terms of the $d$'s. For each of $k=1$, 2 and 3 this gives one equation, a sum of squared terms that are linear in $d_k$. There is no donor in cluster four, so there is no $d$ to be computed for replicate four.

Step 4. Solve the determining equations to get $d_1=0.267$, $d_2=0.223$ and $d_3=0$.
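Step 2 can be sketched for the donors described in the illustration. This is a minimal sketch under stated assumptions: initial weights of one, $c_k=1$, unrounded basic replicate weights $1-\sqrt3/2\approx0.134$ and $1+\sqrt3/6\approx1.289$, $w^*_{ii}=1$ for each respondent so that $\alpha^{(k)}_{1i}$ includes the respondent's own replicate weight, and only the respondents named in the text. All identifiers are hypothetical.

```python
import math

W_DEL = 1 - math.sqrt(3) / 2   # replicate weight of the deleted cluster, about 0.134
W_KEEP = 1 + math.sqrt(3) / 6  # replicate weight of the other clusters, about 1.289

RESPONDENTS = {"y11": 1, "y21": 2, "y22": 2, "y31": 3, "y41": 4}  # element -> cluster
RECIPIENTS = {"y13": 1, "y33": 3}
THIRD = 1 / 3
WSTAR = {("y11", "y13"): THIRD, ("y21", "y13"): THIRD, ("y22", "y13"): THIRD,
         ("y11", "y33"): THIRD, ("y21", "y33"): THIRD, ("y31", "y33"): THIRD}


def rep_weight(cluster, k):
    """Basic replicate weight of an element of `cluster` in replicate k."""
    return W_DEL if cluster == k else W_KEEP


def step2(respondents, recipients, wstar, n_reps=4, c_k=1.0):
    """Return {i: (alpha_i, [alpha_1i^(k) for k = 1..L], T_i)}."""
    out = {}
    for i, ci in respondents.items():
        shares = [(j, f) for (d, j), f in wstar.items() if d == i]
        alpha = 1.0 + sum(f for _, f in shares)  # own weight plus fractional shares
        alpha1 = [rep_weight(ci, k) + sum(f * rep_weight(recipients[j], k) for j, f in shares)
                  for k in range(1, n_reps + 1)]
        t_i = alpha ** 2 - sum(c_k * (a - alpha) ** 2 for a in alpha1)
        out[i] = (alpha, alpha1, t_i)
    return out


for name, (alpha, alpha1, t_i) in step2(RESPONDENTS, RECIPIENTS, WSTAR).items():
    print(name, round(alpha, 3), [round(a, 3) for a in alpha1], round(t_i, 3))
```

In this sketch a respondent that is never used as a donor, such as $y_{41}$, has $T_i=0$, while donors such as $y_{11}$ have $T_i>0$; the positive $T_i$ values are the shortfalls that the adjustments $d_k$ in (A10) are designed to make up.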

REFERENCES

BARNARD, J. & RUBIN, D. B. (1999). Small-sample degrees of freedom with multiple imputation. Biometrika 86.
BRICK, J. M. & KALTON, G. (1996). Handling missing data in survey research. Statist. Meth. Med. Res. 5.
CHEN, J. & SHAO, J. (2001). Jackknife variance estimation for nearest-neighbor imputation. J. Am. Statist. Assoc. 96.
FAY, R. E. (1992). When are inferences from multiple imputation valid? In Proc. Survey Res. Meth. Sect., Am. Statist. Assoc. Washington, DC: American Statistical Association.
FAY, R. E. (1996). Alternative paradigms for the analysis of imputed survey data. J. Am. Statist. Assoc. 91.
KALTON, G. & KISH, L. (1984). Some efficient random imputation methods. Commun. Statist. A 13.
LITTLE, R. J. A. & RUBIN, D. B. (2002). Statistical Analysis with Missing Data. New York: Wiley.
RAO, J. N. K. & SHAO, J. (1992). Jackknife variance estimation with survey data under hot deck imputation. Biometrika 79.
RUBIN, D. B. (1978). Multiple imputations in sample surveys: A phenomenological Bayesian approach to nonresponse. In Proc. Survey Res. Meth. Sect., Am. Statist. Assoc. Washington, DC: American Statistical Association.
RUBIN, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: Wiley.
RUBIN, D. B. & SCHENKER, N. (1986). Multiple imputation for interval estimation from simple random samples with ignorable nonresponse. J. Am. Statist. Assoc. 81.
RUST, K. F. & RAO, J. N. K. (1996). Variance estimation for complex surveys using replication techniques. Statist. Meth. Med. Res. 5.
SÄRNDAL, C.-E. (1992). Methods for estimating the precision of survey estimates when imputation has been used. Survey Methodol. 18.
TOLLEFSON, M. & FULLER, W. A. (1992). Variance estimation for samples with random imputation. In Proc. Survey Res. Meth. Sect., Am. Statist. Assoc. Washington, DC: American Statistical Association.
WOLTER, K. M. (1985). Introduction to Variance Estimation. New York: Springer-Verlag.
[Received May Revised February 2004]


More information

6 Single Sample Methods for a Location Parameter

6 Single Sample Methods for a Location Parameter 6 Single Sample Methods for a Location Parameter If there are serious departures from parametric test assumptions (e.g., normality or symmetry), nonparametric tests on a measure of central tendency (usually

More information

9/2/2010. Wildlife Management is a very quantitative field of study. throughout this course and throughout your career.

9/2/2010. Wildlife Management is a very quantitative field of study. throughout this course and throughout your career. Introduction to Data and Analysis Wildlife Management is a very quantitative field of study Results from studies will be used throughout this course and throughout your career. Sampling design influences

More information

BOOK REVIEW Sampling: Design and Analysis. Sharon L. Lohr. 2nd Edition, International Publication,

BOOK REVIEW Sampling: Design and Analysis. Sharon L. Lohr. 2nd Edition, International Publication, STATISTICS IN TRANSITION-new series, August 2011 223 STATISTICS IN TRANSITION-new series, August 2011 Vol. 12, No. 1, pp. 223 230 BOOK REVIEW Sampling: Design and Analysis. Sharon L. Lohr. 2nd Edition,

More information

No is the Easiest Answer: Using Calibration to Assess Nonignorable Nonresponse in the 2002 Census of Agriculture

No is the Easiest Answer: Using Calibration to Assess Nonignorable Nonresponse in the 2002 Census of Agriculture No is the Easiest Answer: Using Calibration to Assess Nonignorable Nonresponse in the 2002 Census of Agriculture Phillip S. Kott National Agricultural Statistics Service Key words: Weighting class, Calibration,

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

Streamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level

Streamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level Streamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level A Monte Carlo Simulation to Test the Tenability of the SuperMatrix Approach Kyle M Lang Quantitative Psychology

More information

Linear Regression Models. Based on Chapter 3 of Hastie, Tibshirani and Friedman

Linear Regression Models. Based on Chapter 3 of Hastie, Tibshirani and Friedman Linear Regression Models Based on Chapter 3 of Hastie, ibshirani and Friedman Linear Regression Models Here the X s might be: p f ( X = " + " 0 j= 1 X j Raw predictor variables (continuous or coded-categorical

More information

On the Use of Compromised Imputation for Missing data using Factor-Type Estimators

On the Use of Compromised Imputation for Missing data using Factor-Type Estimators J. Stat. Appl. Pro. Lett. 2, No. 2, 105-113 (2015) 105 Journal of Statistics Applications & Probability Letters An International Journal http://dx.doi.org/10.12785/jsapl/020202 On the Use of Compromised

More information

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at Biometrika Trust Robust Regression via Discriminant Analysis Author(s): A. C. Atkinson and D. R. Cox Source: Biometrika, Vol. 64, No. 1 (Apr., 1977), pp. 15-19 Published by: Oxford University Press on

More information

Imputation for Missing Data under PPSWR Sampling

Imputation for Missing Data under PPSWR Sampling July 5, 2010 Beijing Imputation for Missing Data under PPSWR Sampling Guohua Zou Academy of Mathematics and Systems Science Chinese Academy of Sciences 1 23 () Outline () Imputation method under PPSWR

More information

Combining Non-probability and Probability Survey Samples Through Mass Imputation

Combining Non-probability and Probability Survey Samples Through Mass Imputation Combining Non-probability and Probability Survey Samples Through Mass Imputation Jae-Kwang Kim 1 Iowa State University & KAIST October 27, 2018 1 Joint work with Seho Park, Yilin Chen, and Changbao Wu

More information

Estimating Causal Effects of Organ Transplantation Treatment Regimes

Estimating Causal Effects of Organ Transplantation Treatment Regimes Estimating Causal Effects of Organ Transplantation Treatment Regimes David M. Vock, Jeffrey A. Verdoliva Boatman Division of Biostatistics University of Minnesota July 31, 2018 1 / 27 Hot off the Press

More information

analysis of incomplete data in statistical surveys

analysis of incomplete data in statistical surveys analysis of incomplete data in statistical surveys Ugo Guarnera 1 1 Italian National Institute of Statistics, Italy guarnera@istat.it Jordan Twinning: Imputation - Amman, 6-13 Dec 2014 outline 1 origin

More information

Comments on Design-Based Prediction Using Auxilliary Information under Random Permutation Models (by Wenjun Li (5/21/03) Ed Stanek

Comments on Design-Based Prediction Using Auxilliary Information under Random Permutation Models (by Wenjun Li (5/21/03) Ed Stanek Comments on Design-Based Prediction Using Auxilliary Information under Random Permutation Models (by Wenjun Li (5/2/03) Ed Stanek Here are comments on the Draft Manuscript. They are all suggestions that

More information

University of Michigan School of Public Health

University of Michigan School of Public Health University of Michigan School of Public Health The University of Michigan Department of Biostatistics Working Paper Series Year 003 Paper Weighting Adustments for Unit Nonresponse with Multiple Outcome

More information

Methods Used for Estimating Statistics in EdSurvey Developed by Paul Bailey & Michael Cohen May 04, 2017

Methods Used for Estimating Statistics in EdSurvey Developed by Paul Bailey & Michael Cohen May 04, 2017 Methods Used for Estimating Statistics in EdSurvey 1.0.6 Developed by Paul Bailey & Michael Cohen May 04, 2017 This document describes estimation procedures for the EdSurvey package. It includes estimation

More information

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i,

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i, A Course in Applied Econometrics Lecture 18: Missing Data Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. When Can Missing Data be Ignored? 2. Inverse Probability Weighting 3. Imputation 4. Heckman-Type

More information

MISSING or INCOMPLETE DATA

MISSING or INCOMPLETE DATA MISSING or INCOMPLETE DATA A (fairly) complete review of basic practice Don McLeish and Cyntha Struthers University of Waterloo Dec 5, 2015 Structure of the Workshop Session 1 Common methods for dealing

More information

The Effect of Multiple Weighting Steps on Variance Estimation

The Effect of Multiple Weighting Steps on Variance Estimation Journal of Official Statistics, Vol. 20, No. 1, 2004, pp. 1 18 The Effect of Multiple Weighting Steps on Variance Estimation Richard Valliant 1 Multiple weight adjustments are common in surveys to account

More information

Data Integration for Big Data Analysis for finite population inference

Data Integration for Big Data Analysis for finite population inference for Big Data Analysis for finite population inference Jae-kwang Kim ISU January 23, 2018 1 / 36 What is big data? 2 / 36 Data do not speak for themselves Knowledge Reproducibility Information Intepretation

More information

ECNS 561 Multiple Regression Analysis

ECNS 561 Multiple Regression Analysis ECNS 561 Multiple Regression Analysis Model with Two Independent Variables Consider the following model Crime i = β 0 + β 1 Educ i + β 2 [what else would we like to control for?] + ε i Here, we are taking

More information

The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for Bayesian Estimation in a Finite Gaussian Mixture Model

The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for Bayesian Estimation in a Finite Gaussian Mixture Model Thai Journal of Mathematics : 45 58 Special Issue: Annual Meeting in Mathematics 207 http://thaijmath.in.cmu.ac.th ISSN 686-0209 The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for

More information

Basics of Modern Missing Data Analysis

Basics of Modern Missing Data Analysis Basics of Modern Missing Data Analysis Kyle M. Lang Center for Research Methods and Data Analysis University of Kansas March 8, 2013 Topics to be Covered An introduction to the missing data problem Missing

More information

11. Bootstrap Methods

11. Bootstrap Methods 11. Bootstrap Methods c A. Colin Cameron & Pravin K. Trivedi 2006 These transparencies were prepared in 20043. They can be used as an adjunct to Chapter 11 of our subsequent book Microeconometrics: Methods

More information

TWO-WAY CONTINGENCY TABLES WITH MARGINALLY AND CONDITIONALLY IMPUTED NONRESPONDENTS

TWO-WAY CONTINGENCY TABLES WITH MARGINALLY AND CONDITIONALLY IMPUTED NONRESPONDENTS TWO-WAY CONTINGENCY TABLES WITH MARGINALLY AND CONDITIONALLY IMPUTED NONRESPONDENTS By Hansheng Wang A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

More information

Inference with Imputed Conditional Means

Inference with Imputed Conditional Means Inference with Imputed Conditional Means Joseph L. Schafer and Nathaniel Schenker June 4, 1997 Abstract In this paper, we develop analytic techniques that can be used to produce appropriate inferences

More information

A Monte-Carlo study of asymptotically robust tests for correlation coefficients

A Monte-Carlo study of asymptotically robust tests for correlation coefficients Biometrika (1973), 6, 3, p. 661 551 Printed in Great Britain A Monte-Carlo study of asymptotically robust tests for correlation coefficients BY G. T. DUNCAN AND M. W. J. LAYAKD University of California,

More information

Final Overview. Introduction to ML. Marek Petrik 4/25/2017

Final Overview. Introduction to ML. Marek Petrik 4/25/2017 Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,

More information

Sociedad de Estadística e Investigación Operativa

Sociedad de Estadística e Investigación Operativa Sociedad de Estadística e Investigación Operativa Test Volume 14, Number 2. December 2005 Estimation of Regression Coefficients Subject to Exact Linear Restrictions when Some Observations are Missing and

More information

Statistical Data Analysis

Statistical Data Analysis DS-GA 0 Lecture notes 8 Fall 016 1 Descriptive statistics Statistical Data Analysis In this section we consider the problem of analyzing a set of data. We describe several techniques for visualizing the

More information

Comment on Tests of Certain Types of Ignorable Nonresponse in Surveys Subject to Item Nonresponse or Attrition

Comment on Tests of Certain Types of Ignorable Nonresponse in Surveys Subject to Item Nonresponse or Attrition Institute for Policy Research Northwestern University Working Paper Series WP-09-10 Comment on Tests of Certain Types of Ignorable Nonresponse in Surveys Subject to Item Nonresponse or Attrition Christopher

More information

ANALYSIS OF PANEL DATA MODELS WITH GROUPED OBSERVATIONS. 1. Introduction

ANALYSIS OF PANEL DATA MODELS WITH GROUPED OBSERVATIONS. 1. Introduction Tatra Mt Math Publ 39 (2008), 183 191 t m Mathematical Publications ANALYSIS OF PANEL DATA MODELS WITH GROUPED OBSERVATIONS Carlos Rivero Teófilo Valdés ABSTRACT We present an iterative estimation procedure

More information

Chapter 8: Estimation 1

Chapter 8: Estimation 1 Chapter 8: Estimation 1 Jae-Kwang Kim Iowa State University Fall, 2014 Kim (ISU) Ch. 8: Estimation 1 Fall, 2014 1 / 33 Introduction 1 Introduction 2 Ratio estimation 3 Regression estimator Kim (ISU) Ch.

More information

What s New in Econometrics. Lecture 13

What s New in Econometrics. Lecture 13 What s New in Econometrics Lecture 13 Weak Instruments and Many Instruments Guido Imbens NBER Summer Institute, 2007 Outline 1. Introduction 2. Motivation 3. Weak Instruments 4. Many Weak) Instruments

More information

Robust covariance estimator for small-sample adjustment in the generalized estimating equations: A simulation study

Robust covariance estimator for small-sample adjustment in the generalized estimating equations: A simulation study Science Journal of Applied Mathematics and Statistics 2014; 2(1): 20-25 Published online February 20, 2014 (http://www.sciencepublishinggroup.com/j/sjams) doi: 10.11648/j.sjams.20140201.13 Robust covariance

More information

Calibration estimation using exponential tilting in sample surveys

Calibration estimation using exponential tilting in sample surveys Calibration estimation using exponential tilting in sample surveys Jae Kwang Kim February 23, 2010 Abstract We consider the problem of parameter estimation with auxiliary information, where the auxiliary

More information

The Bayesian Approach to Multi-equation Econometric Model Estimation

The Bayesian Approach to Multi-equation Econometric Model Estimation Journal of Statistical and Econometric Methods, vol.3, no.1, 2014, 85-96 ISSN: 2241-0384 (print), 2241-0376 (online) Scienpress Ltd, 2014 The Bayesian Approach to Multi-equation Econometric Model Estimation

More information

A BOOTSTRAP VARIANCE ESTIMATOR FOR SYSTEMATIC PPS SAMPLING

A BOOTSTRAP VARIANCE ESTIMATOR FOR SYSTEMATIC PPS SAMPLING A BOOTSTRAP VARIANCE ESTIMATOR FOR SYSTEMATIC PPS SAMPLING Steven Kaufman, National Center for Education Statistics Room 422d, 555 New Jersey Ave. N.W., Washington, D.C. 20208 Key Words: Simulation, Half-Sample

More information

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Sahar Z Zangeneh Robert W. Keener Roderick J.A. Little Abstract In Probability proportional

More information

Alternative implementations of Monte Carlo EM algorithms for likelihood inferences

Alternative implementations of Monte Carlo EM algorithms for likelihood inferences Genet. Sel. Evol. 33 001) 443 45 443 INRA, EDP Sciences, 001 Alternative implementations of Monte Carlo EM algorithms for likelihood inferences Louis Alberto GARCÍA-CORTÉS a, Daniel SORENSEN b, Note a

More information

Likelihood-based inference with missing data under missing-at-random

Likelihood-based inference with missing data under missing-at-random Likelihood-based inference with missing data under missing-at-random Jae-kwang Kim Joint work with Shu Yang Department of Statistics, Iowa State University May 4, 014 Outline 1. Introduction. Parametric

More information

Discussing Effects of Different MAR-Settings

Discussing Effects of Different MAR-Settings Discussing Effects of Different MAR-Settings Research Seminar, Department of Statistics, LMU Munich Munich, 11.07.2014 Matthias Speidel Jörg Drechsler Joseph Sakshaug Outline What we basically want to

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Regression Diagnostics for Survey Data

Regression Diagnostics for Survey Data Regression Diagnostics for Survey Data Richard Valliant Joint Program in Survey Methodology, University of Maryland and University of Michigan USA Jianzhu Li (Westat), Dan Liao (JPSM) 1 Introduction Topics

More information

Measurement error as missing data: the case of epidemiologic assays. Roderick J. Little

Measurement error as missing data: the case of epidemiologic assays. Roderick J. Little Measurement error as missing data: the case of epidemiologic assays Roderick J. Little Outline Discuss two related calibration topics where classical methods are deficient (A) Limit of quantification methods

More information

STATISTICAL METHODS FOR MISSING DATA IN COMPLEX SAMPLE SURVEYS

STATISTICAL METHODS FOR MISSING DATA IN COMPLEX SAMPLE SURVEYS STATISTICAL METHODS FOR MISSING DATA IN COMPLEX SAMPLE SURVEYS by Rebecca Roberts Andridge A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Biostatistics)

More information

A construction method for orthogonal Latin hypercube designs

A construction method for orthogonal Latin hypercube designs Biometrika (2006), 93, 2, pp. 279 288 2006 Biometrika Trust Printed in Great Britain A construction method for orthogonal Latin hypercube designs BY DAVID M. STEINBERG Department of Statistics and Operations

More information

Unequal Probability Designs

Unequal Probability Designs Unequal Probability Designs Department of Statistics University of British Columbia This is prepares for Stat 344, 2014 Section 7.11 and 7.12 Probability Sampling Designs: A quick review A probability

More information

Chapter 3: Element sampling design: Part 1

Chapter 3: Element sampling design: Part 1 Chapter 3: Element sampling design: Part 1 Jae-Kwang Kim Fall, 2014 Simple random sampling 1 Simple random sampling 2 SRS with replacement 3 Systematic sampling Kim Ch. 3: Element sampling design: Part

More information