Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling

Jae-Kwang Kim, Iowa State University
Joint work with Shu Yang
June 26,

Outline
1. Introduction
2. Review
3. Fractional hot deck imputation
4. Simulation study
5. Conclusion

Kim (ISU) Fractional Hot Deck Imputation June 26, / 44

Introduction: Basic Setup
- $U = \{1, 2, \ldots, N\}$: index set of the finite population.
- $(x_i, y_i)$: study variables for unit $i$ in the population.
- $\eta$: parameter of interest, defined as the solution to
$$\sum_{i=1}^{N} U(\eta; x_i, y_i) = 0.$$
Examples:
1. Population mean: $U(\eta; x, y) = y - \eta$
2. Population proportion of $Y$ less than $q$: $U(\eta; x, y) = I(y < q) - \eta$
3. Population $p$-th quantile: $U(\eta; x, y) = I(y < \eta) - p$
4. Population regression coefficient: $U(\eta; x, y) = (y - x\eta)x$
5. Domain mean: $U(\eta; x, y) = (y - \eta)d(x)$

Introduction: Basic Setup (Cont'd)
- $A$: index set of the sample ($A \subset U$), obtained from a probability sampling design, with $\pi_i$ the first-order inclusion probability of unit $i$.
- From the sample, we collect measurements of $(x_i, y_i)$.
- Under complete response, a consistent estimator of $\eta$ can be obtained by solving
$$\sum_{i \in A} w_i U(\eta; x_i, y_i) = 0 \qquad (1)$$
for $\eta$, where $w_i = \pi_i^{-1}$.
- Under some regularity conditions, the solution to (1) is consistent and asymptotically normally distributed (Binder and Patak, 1994).
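As a minimal sketch of how equation (1) is solved in practice, the functions below handle three of the estimating functions listed earlier (mean, proportion, quantile). The function names are illustrative and not from the slides; the quantile case returns an approximate root because its estimating function is a step function of $\eta$.

```python
import numpy as np

def weighted_mean(y, w):
    """Solve sum_i w_i (y_i - eta) = 0 for eta."""
    y, w = np.asarray(y, float), np.asarray(w, float)
    return np.sum(w * y) / np.sum(w)

def weighted_proportion(y, w, q):
    """Solve sum_i w_i {I(y_i < q) - eta} = 0 for eta."""
    y, w = np.asarray(y, float), np.asarray(w, float)
    return np.sum(w * (y < q)) / np.sum(w)

def weighted_quantile(y, w, p):
    """Approximate root of sum_i w_i {I(y_i < eta) - p} = 0.

    An exact root may not exist, so this returns the smallest observed y
    whose weighted cumulative share reaches p (a common convention,
    assumed here rather than taken from the slides).
    """
    y, w = np.asarray(y, float), np.asarray(w, float)
    order = np.argsort(y)
    cum_share = np.cumsum(w[order]) / np.sum(w)
    idx = min(np.searchsorted(cum_share, p, side="left"), len(y) - 1)
    return y[order][idx]
```

With design weights $w_i = \pi_i^{-1}$ passed as `w`, each function returns the design-weighted estimator for its parameter.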

Introduction: Basic Setup (Cont'd)
- Assume that $x_i$ is always observed and $y_i$ is subject to nonresponse. Define
$$\delta_i = \begin{cases} 1 & \text{if } y_i \text{ is observed} \\ 0 & \text{otherwise.} \end{cases}$$
- A consistent estimator of $\eta$ is then obtained by taking the conditional expectation and solving $\bar{U}(\eta) = 0$ for $\eta$, where
$$\bar{U}(\eta) = \sum_{i \in A} w_i \left[ \delta_i U(\eta; x_i, y_i) + (1 - \delta_i) E\{U(\eta; x_i, Y) \mid x_i, \delta_i = 0\} \right]. \qquad (2)$$

Introduction: How to compute the conditional expectation in (2)?
1. Often, start by assuming missing at random (MAR). That is, $f(y \mid x, \delta) = f(y \mid x)$.
2. Build a (parametric) model for $f(y \mid x)$. That is, $f(y \mid x) = f(y \mid x; \theta)$ for some $\theta$.
3. Obtain a consistent estimator $\hat\theta$ of $\theta$ from the set of respondents. That is, solve
$$\sum_{i \in A} w_i \delta_i S(\theta; x_i, y_i) = 0$$
for $\theta$, where $S(\theta; x, y)$ is the score function of $\theta$.
4. Compute the conditional expectation by a Monte Carlo approximation using samples from $f(y \mid x_i; \hat\theta)$:
$$E\{U(\eta; x_i, Y) \mid x_i\} \approx \frac{1}{M} \sum_{j=1}^{M} U(\eta; x_i, y_i^{*(j)}),$$
where $y_i^{*(j)} \sim f(y \mid x_i; \hat\theta)$.
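The four steps can be sketched as follows, assuming a normal linear model $y \mid x \sim N(\beta_0 + \beta_1 x, \sigma^2)$ for $f(y \mid x; \theta)$. The model choice and the function names are illustrative assumptions; for this model the weighted score equation reduces to weighted least squares on the respondents.

```python
import numpy as np

def fit_normal_regression(x, y, w, delta):
    """Solve the weighted score equation on respondents (delta == 1)
    for the assumed model y | x ~ N(b0 + b1 * x, sigma2)."""
    r = np.asarray(delta) == 1
    X = np.column_stack([np.ones(r.sum()), x[r]])
    wr = w[r]
    beta = np.linalg.solve(X.T @ (wr[:, None] * X), X.T @ (wr * y[r]))
    resid = y[r] - X @ beta
    sigma2 = np.sum(wr * resid**2) / np.sum(wr)
    return beta, sigma2

def mc_conditional_expectation(g, x_i, beta, sigma2, M, rng):
    """Monte Carlo approximation of E{g(Y) | x_i} using M draws
    from the fitted model f(y | x_i; theta_hat)."""
    draws = rng.normal(beta[0] + beta[1] * x_i, np.sqrt(sigma2), size=M)
    return np.mean(g(draws))
```

Passing `g = lambda y: y - eta` or `g = lambda y: (y < q) - eta` gives the imputed part of the estimating function for a mean or a proportion.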

Introduction: Imputation
- Imputation: Monte Carlo approximation of the conditional expectation (given the observed data),
$$E\{U(\eta; x_i, Y) \mid x_i\} \approx \frac{1}{M} \sum_{j=1}^{M} U\left(\eta; x_i, y_i^{*(j)}\right).$$
1. Bayesian approach: generate $y_i^*$ from $f(y_i \mid x_i, \theta^*)$, where $\theta^*$ is generated from $p(\theta \mid x, y)$.
2. Frequentist approach: generate $y_i^*$ from $f(y_i \mid x_i; \hat\theta)$, where $\hat\theta$ is a consistent estimator.
- Once the conditional expectation is computed (approximately), we can obtain $\hat\eta$ by solving the imputed estimating equation.

Introduction: Imputation, Remark
- Imputation can be applied even when the parameter of interest $\eta$ is not known at the time of imputation. Thus, it is a useful tool for general-purpose estimation.
- It works even when $M = 1$ (single imputation).
- To reduce the variance and to enable variance estimation, $M > 1$ is often used.
- Bayesian approach: multiple imputation of Rubin (1987).
- Frequentist approach: parametric fractional imputation of Kim (2011).


Review: Multiple imputation
Generate $M$ imputed values (with equal weights).
Features:
1. Imputed values are generated from the posterior predictive distribution, which is the average of $f(y_i \mid x_i; \theta)$ over the posterior distribution $\pi(\theta \mid x, y_{obs})$.
2. The variance estimation formula is simple (Rubin's formula):
$$\hat{V}_{MI}(\bar\eta_M) = W_M + \left(1 + \frac{1}{M}\right) B_M,$$
where $W_M = M^{-1} \sum_{m=1}^{M} \hat{V}_I^{(m)}$, $B_M = (M - 1)^{-1} \sum_{m=1}^{M} (\hat\eta^{(m)} - \bar\eta_M)^2$, $\bar\eta_M = M^{-1} \sum_{m=1}^{M} \hat\eta^{(m)}$ is the average of the $M$ imputed estimators of $\eta$, and $\hat{V}_I^{(m)}$ is the imputed version of the variance estimator of $\hat\eta$ under complete response.
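Rubin's formula is short enough to write out directly. The sketch below takes the $M$ point estimates $\hat\eta^{(m)}$ and their complete-data variance estimates $\hat{V}_I^{(m)}$; the function name and argument names are illustrative.

```python
import numpy as np

def rubin_variance(eta_hats, v_hats):
    """Return (eta_bar, V_MI) from M imputed point estimates and
    their complete-data variance estimates."""
    eta_hats = np.asarray(eta_hats, float)
    v_hats = np.asarray(v_hats, float)
    M = len(eta_hats)
    eta_bar = eta_hats.mean()        # average of the M imputed estimators
    W = v_hats.mean()                # within-imputation variance W_M
    B = eta_hats.var(ddof=1)         # between-imputation variance B_M
    return eta_bar, W + (1.0 + 1.0 / M) * B
```

The formula needs $M \ge 2$, since $B_M$ divides by $M - 1$.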

Review: Multiple imputation, Remark
- The sampling design is incorporated by including $w_i$ among the covariates in order to make the sampling design non-informative. Thus, the imputed values are generated from the sample model, not from the population model:
$$y_i^* \sim f(y \mid x_i, I_i = 1),$$
where $I_i$ is the indicator function for sample inclusion.
- MAR is assumed at the sample level,
$$f(y \mid x, I = 1, \delta = 0) = f(y \mid x, I = 1, \delta = 1),$$
which is different from MAR at the population level,
$$f(y \mid x, \delta = 0) = f(y \mid x, \delta = 1).$$

Review: Multiple imputation, Remark (Cont'd)
- If the sampling design is non-informative, then the sample model and the population model are equivalent, and the sample MAR and the population MAR are equivalent.
- Variance estimation (using Rubin's formula) does not work when the sampling design is informative.
- Even when the sampling design is non-informative, consistency of the variance estimator is questionable (Kim et al., 2006).

Review: Multiple imputation, Variance estimation
- Rubin's formula is based on the following decomposition, where $\hat\eta_n$ denotes the complete-sample estimator:
$$V(\hat\eta_{MI}) = V(\hat\eta_n) + V(\hat\eta_{MI} - \hat\eta_n).$$
Basically, the $W_M$ term estimates $V(\hat\eta_n)$ and the $(1 + M^{-1}) B_M$ term estimates $V(\hat\eta_{MI} - \hat\eta_n)$.
- In general, we have
$$V(\hat\eta_{MI}) = V(\hat\eta_n) + V(\hat\eta_{MI} - \hat\eta_n) + 2\,\mathrm{Cov}(\hat\eta_{MI} - \hat\eta_n, \hat\eta_n),$$
and the covariance term can be non-negligible.
- The condition of zero covariance is called congeniality by Meng (1994). Congeniality holds when $\hat\eta_{MI}$ is a smooth function of the MLE of $\theta$ in $f(y \mid x; \theta)$. Otherwise, Rubin's variance estimator can be biased, as discussed in the simulation section.

Review: Parametric Fractional Imputation
Parametric fractional imputation of Kim (2011):
1. Generate more than one (say $M$) imputed values of $y_i$, namely $y_i^{*(1)}, \ldots, y_i^{*(M)}$, from some (initial) density $h(y_i \mid x_i)$.
2. Create the weighted data set
$$\left\{ \left( w_i w_{ij}^*, x_i, y_i^{*(j)} \right);\ j = 1, 2, \ldots, M;\ i \in A \right\},$$
where
$$w_{ij}^* \propto f(y_i^{*(j)} \mid x_i; \hat\theta) / h(y_i^{*(j)} \mid x_i)$$
and $\hat\theta$ is the (pseudo) maximum likelihood estimator of $\theta$.
3. The weights $w_{ij}^*$ are the normalized importance weights and are called fractional weights.

Review: Parametric Fractional Imputation (Cont'd)
- Product: fractionally imputed data set of size $nM$,
$$\left\{ \left( w_i w_{ij}^*, x_i, y_i^{*(j)} \right);\ j = 1, 2, \ldots, M;\ i \in A \right\}.$$
- Property: for sufficiently large $M$,
$$\sum_{j=1}^{M} w_{ij}^* g(x_i, y_i^{*(j)}) \approx \frac{\int g(x_i, y) \frac{f(y \mid x_i; \hat\theta)}{h(y \mid x_i)} h(y \mid x_i)\, dy}{\int \frac{f(y \mid x_i; \hat\theta)}{h(y \mid x_i)} h(y \mid x_i)\, dy} = E\{g(x_i, Y) \mid x_i; \hat\theta\}$$
for any $g$ such that the expectation exists.
- Can handle an informative sampling design by incorporating the sampling weights into the score equation. That is, solve
$$\sum_{i \in A} w_i \delta_i S(\theta; x_i, y_i) = 0, \qquad (3)$$
where $S(\theta; x, y) = \partial \log f(y \mid x; \theta) / \partial\theta$ is the score function of $\theta$.
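The fractional weights and the large-$M$ property can be checked numerically. This sketch assumes, for illustration only, a normal target $f(y \mid x_i; \hat\theta) = N(1, 1)$ and a wider normal proposal $h(y \mid x_i) = N(0, 2^2)$ for a single unit $i$; neither density comes from the slides.

```python
import numpy as np

def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return np.exp(-0.5 * z**2) / (sd * np.sqrt(2.0 * np.pi))

def fractional_weights(y_star, target_mean, target_sd, prop_mean, prop_sd):
    """Normalized importance weights f(y*; theta_hat) / h(y*) for one unit."""
    ratio = (normal_pdf(y_star, target_mean, target_sd)
             / normal_pdf(y_star, prop_mean, prop_sd))
    return ratio / ratio.sum()

rng = np.random.default_rng(1)
y_star = rng.normal(0.0, 2.0, size=50_000)            # draws from h
w_star = fractional_weights(y_star, 1.0, 1.0, 0.0, 2.0)
approx_mean = np.sum(w_star * y_star)                 # approximates E(Y | x_i) = 1
```

Because the weights are normalized within each unit, only the ratio $f/h$ up to a constant is needed.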

Review: Parametric Fractional Imputation (Cont'd), Remark
- Imputed values are generated from the population model, not from the sample model:
$$y_i^* \sim f(y \mid x_i) \neq f(y \mid x_i, I_i = 1).$$
Thus, we assume population MAR, not sample MAR.
- For variance estimation, either the linearization method or the replication method can be used.


Fractional Hot Deck Imputation: Motivation
- Hot deck imputation:
  - Imputed values are real observations.
  - Very popular in household surveys.
- We want to implement a hot deck version of fractional imputation.
- Kim (2004) and Fuller and Kim (2005) already considered fractional hot deck imputation for the case where $x$ is categorical in $f(y \mid x)$.
- Kim, Fuller and Bell (2011) extended the method of Kim (2004) to nearest neighbor imputation.
- We now want to extend it to the case where $x$ has continuous components.

Fractional Hot Deck Imputation: Proposed method, three steps
1. Fully efficient fractional imputation (FEFI), choosing all the respondents as donors. That is, we use $M = n_R$ imputed values for each missing unit, where $n_R$ is the number of respondents in the sample.
2. Use systematic PPS sampling to select $m$ ($m \ll n_R$) donors from the FEFI donor set.
3. Use a calibration weighting technique to compute the final fractional weights (which lead to the same estimates as FEFI for some items).

Fractional Hot Deck Imputation: Step 1, FEFI step
- We want to find the fractional weights $w_{ij}^*$ when the $j$-th imputed value $y_i^{*(j)}$ is taken from the $j$-th value in the set of respondents.
- Without loss of generality, we assume that the first $n_R$ elements respond and write $y_i^{*(j)} = y_j$.
- Recall that $w_{ij}^* \propto f(y_i^{*(j)} \mid x_i; \hat\theta) / h(y_i^{*(j)} \mid x_i)$ when the $y_i^{*(j)}$ are generated from $h(y \mid x_i)$.
- We have only to find $h(y_i^{*(j)} \mid x_i)$ when we use $y_i^{*(j)} = y_j$.

Fractional Hot Deck Imputation: Step 1, FEFI step (Cont'd)
- We can treat $\{y_i; \delta_i = 1\}$ as a realization from $f(y \mid \delta = 1)$, the marginal distribution of $y$ among respondents.
- Now, we can write
$$f(y_j \mid \delta_j = 1) = \int f(y_j \mid x, \delta_j = 1) f(x \mid \delta_j = 1)\, dx = \int f(y_j \mid x) f(x \mid \delta_j = 1)\, dx = \frac{1}{N_R} \sum_{k=1}^{N} \delta_k f(y_j \mid x_k),$$
where $N_R = \sum_{i=1}^{N} \delta_i$ is the population size of (potential) respondents.

Fractional Hot Deck Imputation: Step 1, FEFI step (Cont'd)
- Using the survey weights, we can approximate
$$f(y_j \mid \delta_j = 1) \approx \frac{\sum_{k \in A_R} w_k f(y_j \mid x_k)}{\sum_{k \in A_R} w_k},$$
and the fractional weight for $y_i^{*(j)} = y_j$ becomes
$$w_{ij}^* \propto \frac{f(y_j \mid x_i; \hat\theta)}{\sum_{k \in A_R} w_k f(y_j \mid x_k; \hat\theta)} \qquad (4)$$
with $\sum_{j \in A_R} w_{ij}^* = 1$, where $A_R = \{i \in A; \delta_i = 1\}$ and $\hat\theta$ is computed from the weighted score equation in (3).
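Equation (4) can be sketched as a matrix computation. The example assumes the working model is $y \mid x \sim N(\beta_0 + \beta_1 x, \sigma^2)$, as in the simulation study; the function names and array layout are illustrative choices.

```python
import numpy as np

def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return np.exp(-0.5 * z**2) / (sd * np.sqrt(2.0 * np.pi))

def fefi_weights(x_mis, x_resp, y_resp, w_resp, beta, sigma):
    """FEFI fractional weights: rows index nonrespondents i,
    columns index respondent donors j; each row sums to one."""
    # num[i, j] = f(y_j | x_i; theta_hat)
    num = normal_pdf(y_resp[None, :], beta[0] + beta[1] * x_mis[:, None], sigma)
    # dens[k, j] = f(y_j | x_k; theta_hat) for respondents k
    dens = normal_pdf(y_resp[None, :], beta[0] + beta[1] * x_resp[:, None], sigma)
    denom = w_resp @ dens            # sum_k w_k f(y_j | x_k), one value per donor j
    raw = num / denom
    return raw / raw.sum(axis=1, keepdims=True)
```

The intermediate `dens` array has $n_R^2$ entries, so for large respondent sets it may be preferable to accumulate the denominator in blocks.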

Fractional Hot Deck Imputation: Step 2, Sampling step
- FEFI uses all the elements in $A_R$ as donors for each missing unit $i$.
- We want to reduce the number of donors to, say, $m = 10$.
- For each $i$, we can treat the FEFI donor set as a weighted population and apply a sampling method to select a smaller set of donors.
- The fractional weights (4) for FEFI can be used as the selection probabilities for PPS sampling.
- That is, our goal is to obtain a (systematic) PPS sample $D_i$ of size $m$ from the FEFI donor set of size $M = n_R$, using $w_{ij}^*$ as the selection probability assigned to the $j$-th element in $A_R$. (Note that $w_{ij}^*$ satisfies $\sum_{j=1}^{M} w_{ij}^* = 1$ and $w_{ij}^* > 0$.)
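A minimal sketch of systematic PPS selection for one missing unit is given below. It uses the usual cumulative-total construction with a single random start; the slides do not give the algorithm's details, so the handling of large weights (a donor with $m\,w_{ij}^* > 1$ can be selected more than once) is an assumption of this sketch.

```python
import numpy as np

def systematic_pps(p, m, rng):
    """Select m donor indices by systematic PPS sampling.

    p   : selection probabilities for the donors (normalized internally)
    m   : number of donors to select
    rng : numpy random Generator supplying the single random start
    """
    p = np.asarray(p, float)
    cum = m * np.cumsum(p / p.sum())           # cumulative totals scaled to end at m
    points = rng.random() + np.arange(m)       # u, u + 1, ..., u + m - 1
    idx = np.searchsorted(cum, points, side="right")
    return np.minimum(idx, len(p) - 1)         # guard against rounding at the top end
```

Calling `systematic_pps(W[i], m, rng)` on row `i` of a FEFI weight matrix gives the donor set $D_i$ for that unit.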

Fractional Hot Deck Imputation: Step 3, Weighting step
- After we select $D_i$ from the complete set of respondents, the selected donors in $D_i$ are assigned the initial fractional weights $w_{ij0}^* = 1/m$.
- The fractional weights are further adjusted to satisfy
$$\sum_{i \in A} w_i \Big\{ (1 - \delta_i) \sum_{j \in D_i} w_{ij,c}^* q(x_i, y_j) \Big\} = \sum_{i \in A} w_i \Big\{ (1 - \delta_i) \sum_{j \in A_R} w_{ij}^* q(x_i, y_j) \Big\} \qquad (5)$$
for some $q(x_i, y_j)$, and $\sum_{j \in D_i} w_{ij,c}^* = 1$ for all $i$ with $\delta_i = 0$, where $w_{ij}^*$ are the fractional weights for the FEFI method, as defined in (4).
- Regarding the choice of the control function $q(x, y)$ in (5), we can use $q(x, y) = (y, y^2)$, which leads to fully efficient estimates for the mean and the variance of $y$.
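One way to satisfy (5) is a regression-type (linear) calibration of the initial weights. The slides do not say which calibration method is used, so the adjustment below is an assumption, chosen because it has a closed form; it can produce negative weights when the selected donors are far from the FEFI targets. The control function is $q(x, y) = (y, y^2)$ and all names are illustrative.

```python
import numpy as np

def calibrate_fractional_weights(w, donors, y_resp, w_fefi):
    """Adjust initial weights 1/m so that equation (5) holds for q = (y, y^2).

    w      : (n_mis,) design weights of the nonrespondents
    donors : (n_mis, m) integer indices of selected donors in y_resp
    y_resp : (n_R,) respondent values
    w_fefi : (n_mis, n_R) FEFI fractional weights from equation (4)
    """
    n_mis, m = donors.shape
    q_all = np.column_stack([y_resp, y_resp**2])            # (n_R, 2)
    target = (w[:, None] * w_fefi).sum(axis=0) @ q_all      # right side of (5)
    q = q_all[donors]                                       # (n_mis, m, 2)
    w0 = np.full((n_mis, m), 1.0 / m)
    q_bar = (w0[:, :, None] * q).sum(axis=1, keepdims=True)
    dev = q - q_bar                                         # centred within each unit
    current = np.einsum("i,ij,ijk->k", w, w0, q)            # left side before adjustment
    S = np.einsum("i,ij,ijk,ijl->kl", w, w0, dev, dev)
    lam = np.linalg.solve(S, target - current)
    return w0 * (1.0 + dev @ lam)                           # rows still sum to one
```

Centring $q$ within each unit keeps every row summing to one, so only the two global constraints in (5) need to be solved for.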

Fractional Hot Deck Imputation: Remark
- For variance estimation, the replication method can be used. The imputed values are not changed; only the fractional weights are changed for each replicate. (Details skipped.)
- The proposed fractional hot deck imputation is less sensitive to model misspecification in $f(y \mid x; \theta)$. (Details skipped.)
- The proposed method can be extended to the non-ignorable missing case under a parametric model assumption on the response mechanism. (Details skipped.)


Simulation Study: Study One
Factors considered:
- Correct vs. incorrect imputation model: to see the effect of misspecification of $f(y \mid x)$.
- Imputation methods: MI, PFI, FHDI.
- Parameters of interest: mean, proportion.

Simulation Study: Study One, Simulation setup
- Two sets of models:
1. Model A: $y_i = 0.5 x_i + e_i$, where $x_i \sim \exp(1)$ and $e_i \sim N(0, 1)$.
2. Model B: same as Model A except that $e_i \sim \{\chi^2(2) - 2\}/2$.
- Response mechanism: $y_i$ is observed only when $\delta_i = 1$, where $\delta_i \sim \mathrm{Bernoulli}(\pi_i)$ and $\pi_i = \{1 + \exp(0.2 - x_i)\}^{-1}$. Thus, we have MAR with 65% overall response in both models.
- $B = 5{,}000$ Monte Carlo samples of size $n = 200$.
- We used $y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$ as the imputation model in both cases. (Thus, the imputation model is misspecified under Model B.)
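The data-generating setup can be sketched as below. The transcription lost the signs in the response model; this sketch assumes $\pi_i = \{1 + \exp(0.2 - x_i)\}^{-1}$, the sign convention that reproduces the stated 65% overall response rate. The function name is illustrative.

```python
import numpy as np

def generate_sample(n, model, rng):
    """Generate one sample (x, y, delta) under Model A or Model B."""
    x = rng.exponential(1.0, n)
    if model == "A":
        e = rng.normal(0.0, 1.0, n)
    elif model == "B":
        e = (rng.chisquare(2, n) - 2.0) / 2.0    # mean 0, variance 1, skewed
    else:
        raise ValueError("model must be 'A' or 'B'")
    y = 0.5 * x + e
    pi = 1.0 / (1.0 + np.exp(0.2 - x))           # assumed sign convention (see lead-in)
    delta = rng.random(n) < pi                   # True where y is observed
    return x, y, delta
```

Both error distributions have mean zero and unit variance, so the two models differ only in the shape of the error distribution.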

Simulation Study: Study One, Simulation setup (Cont'd)
Two parameters considered:
1. $\eta_1 = E(Y)$: the population mean of $y$.
2. $\eta_2 = \Pr(Y < 1)$: the proportion of $Y$ less than one.
Four estimators computed:
1. Full sample estimator (FULL), computed using the full sample.
2. Multiple imputation (MI) estimator with imputation size $m = 10$.
3. Parametric fractional imputation (PFI) with imputation size $m = 10$.
4. Fractional hot deck imputation (FHDI) with imputation size $m = 10$.

Simulation Study: Study One, Simulation results under Model A
[Table: Point estimation. Columns: Parameter, Method, Mean, Var, Std Var. Rows: Full, MI, PFI, FHDI for each of $\eta_1 = \mu_y$ and $\eta_2 = \Pr(Y < 1)$.]

Simulation Study: Study One, Simulation results under Model A
[Table: Variance estimation. Columns: Parameter, Method, R.B. (%), t-statistics. Rows: MI, PFI, FHDI for each of $V(\hat\eta_1)$ and $V(\hat\eta_2)$.]

Simulation Study: Study One, Discussion of Model A results
- Point estimation is unbiased for both parameters under the correct model.
- For $\eta_1 = E(Y)$, imputation increases the variance by roughly 45-53%:
$$V(\hat\eta_{1,imp}) \approx \frac{1}{n} \sigma_y^2 + \left( \frac{1}{n_R} - \frac{1}{n} \right) \sigma_e^2.$$
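The variance formula can be evaluated with the Model A settings as a rough check. The plug-in values below are assumptions derived from the simulation setup ($n = 200$, $n_R \approx 0.65n$, $\sigma_e^2 = 1$, $\sigma_y^2 = 0.25\,\mathrm{Var}(x) + \sigma_e^2$ with $\mathrm{Var}(x) = 1$); they are an illustrative evaluation, not the slide's own figures. The result is an increase of about 43%, the same order as the simulated increase quoted above.

```python
# Plug-in evaluation of V(eta_1_imp) under Model A (assumed values, see lead-in).
n = 200
response_rate = 0.65
n_r = response_rate * n                  # expected number of respondents

sigma2_e = 1.0                           # Var(e) under Model A
sigma2_y = 0.5**2 * 1.0 + sigma2_e       # Var(y) = 0.25 * Var(x) + Var(e)

v_full = sigma2_y / n                    # variance of the full-sample mean
v_imp = sigma2_y / n + (1.0 / n_r - 1.0 / n) * sigma2_e
relative_increase = v_imp / v_full - 1.0
```

Treating $n_R$ as fixed at its expected value slightly understates the second term, since $E(1/n_R) > 1/E(n_R)$.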

Simulation Study: Study One, Discussion of Model A results (Cont'd)
- For $\eta_2 = \Pr(Y < 1)$, imputation increases the variance by roughly 25% for MI and PFI. Note that
$$\hat\eta_{2,imp} = \frac{1}{n} \sum_{i=1}^{n} \left[ \delta_i I(y_i < 1) + (1 - \delta_i) E\{I(Y < 1) \mid x_i\} \right],$$
where the imputation model is used to compute the conditional expectation. Thus, the estimator borrows strength from the normality assumption at the time of imputation.
- In some sense, the above imputation estimator can be viewed as a composite estimator, that is, a weighted average of a direct estimator and a synthetic estimator.

Simulation Study: Study One, Discussion of Model A results (Cont'd)
- In fact, under full response, there are two estimators of $\eta_2 = \Pr(Y < 1)$:
$$\hat\eta_{2,MME} = n^{-1} \sum_{i=1}^{n} I(y_i < 1), \qquad \hat\eta_{2,MLE} = \int_{-\infty}^{1} \frac{1}{\hat\sigma} \phi\left( \frac{y - \hat\mu}{\hat\sigma} \right) dy = \Phi\left( \frac{1 - \hat\mu}{\hat\sigma} \right).$$
- The MLE is more efficient than the MME, but it is less robust.
- The congeniality condition holds when the MLE is used, but not when the MME is used.
- Rubin's variance estimator for MI requires the congeniality condition. FI does not require congeniality.
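The two full-response estimators can be compared directly. This sketch assumes a marginal normal working model for $y$ when computing the MLE version, which is the assumption that makes that estimator efficient but non-robust; the function names are illustrative.

```python
import math
import numpy as np

def eta2_mme(y, c=1.0):
    """Method-of-moments estimator: sample proportion of y below c."""
    return float(np.mean(np.asarray(y) < c))

def eta2_mle(y, c=1.0):
    """Normal-model MLE: Phi((c - mu_hat) / sigma_hat)."""
    y = np.asarray(y, float)
    mu_hat = y.mean()
    sigma_hat = y.std()          # ddof=0, the normal-model MLE of sigma
    z = (c - mu_hat) / sigma_hat
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

When the data are normal both functions estimate the same quantity; when they are skewed, as under Model B, only the sample proportion remains consistent.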

Simulation Study: Study One, Simulation results under Model B
[Table: Point estimation. Columns: Parameter, Method, Mean, Var, Std Var. Rows: Full, MI, PFI, FHDI for each of $\eta_1 = \mu_y$ and $\eta_2 = \Pr(Y < 1)$.]

Simulation Study: Study One, Simulation results under Model B
[Table: Variance estimation. Columns: Parameter, Method, R.B. (%), t-statistics. Rows: MI, PFI, FHDI ($m = 10$) for each of $V(\hat\eta_1)$ and $V(\hat\eta_2)$.]

Simulation Study: Study One, Discussion of Model B results
- Point estimation is unbiased for $\eta_1 = E(Y)$ even when the imputation model is incorrect. Note that, for $m \to \infty$, the imputed estimator of $\eta_1$ can be written
$$\hat\eta_{1,imp} = \frac{1}{n} \sum_{i=1}^{n} \{\delta_i y_i + (1 - \delta_i) \hat{y}_i\} = \frac{1}{n} \sum_{i=1}^{n} \hat{y}_i,$$
which is called the projection estimator. The second equality holds because the fitted regression includes an intercept, so that $\sum_{i=1}^{n} \delta_i (y_i - \hat{y}_i) = 0$.
- Kim and Rao (2012) showed design-consistency of the projection estimator.

Simulation Study: Study One, Discussion of Model B results (Cont'd)
- However, all imputed estimators are biased for $\eta_2 = \Pr(Y < 1)$. The biases are much larger for MI and PFI than for FHDI: the corresponding z-statistics are -34.8, -33.5, and 5.5 for MI, PFI, and FHDI, respectively.
- Note that the true error distribution is $e_i \sim \{\chi^2(2) - 2\}/2$, while the imputation model errors are generated from $e_i^* \sim N(0, \hat\sigma_e^2)$. (See the picture on the next page.)
- In FHDI, the donors are still generated from the true distribution; only the fractional weights are computed from the wrong model. Thus, the effect of model misspecification is less severe than for the other imputation methods, which create synthetic values from the wrong model.

[Figure: density of the true error model compared with the imputation model; horizontal axis $x$, vertical axis Density.]

Simulation Study: Study Two
- Bivariate data $(x_i, y_i)$ of size $n = 100$ with
$$Y_i = \beta_0 + \beta_1 x_i + \beta_2 (x_i^2 - 1) + e_i, \qquad (6)$$
where $(\beta_0, \beta_1, \beta_2) = (0, 0.9, 0.06)$, $x_i \sim N(0, 1)$, $e_i \sim N(0, 0.16)$, and $x_i$ and $e_i$ are independent. The variable $x_i$ is always observed, but the probability that $y_i$ responds is 0.5.
- The imputation model is
$$Y_i = \beta_0 + \beta_1 x_i + e_i.$$
That is, the imputer's model uses the extra information that $\beta_2 = 0$.
- From the imputed data, we fit model (6) and computed the power of a test of $H_0: \beta_2 = 0$ at the 0.05 significance level.
- In addition, we also considered the complete-case (CC) method, which simply uses the complete cases only for the regression analysis.

Simulation Study: Study Two
[Table 5: Simulation results for the Monte Carlo experiment based on 10,000 Monte Carlo samples. Columns: Method, $E(\hat\theta)$, $V(\hat\theta)$, R.B. ($\hat{V}$), Power. Rows: MI, FI, CC.]
- Table 5 shows that MI provides a more efficient point estimator than the CC method, but its variance estimation is very conservative (more than 100% overestimation).
- Because of the serious positive bias of the MI variance estimator, the statistical power of the test based on MI is actually lower than that of the CC method.


Conclusion: Concluding Remarks
Advantages:
1. Hot deck imputation: uses real observations for imputed values.
2. Robust against model misspecification.
3. Applicable even when the sampling design is informative.
4. Does not require the congeniality condition for valid variance estimation.
Disadvantage: may have a higher imputation variance than imputation methods using synthetic values.

Conclusion: Future work
- Extension to single imputation ($m = 1$). The imputation variance component needs to be estimated.
- Instead of the calibration weighting step (Step 3), we may consider using balanced imputation (Chauvet et al., 2011).
- FHDI for multivariate missing data.
  - To be presented at the ISI meeting in Hong Kong.
  - To be implemented in SAS (in Proc Surveyimpute).

References
Binder, D. and Z. Patak (1994), Use of estimating functions for estimation from complex surveys, Journal of the American Statistical Association 89.
Chauvet, G., J.-C. Deville and D. Haziza (2011), On balanced random imputation in surveys, Biometrika 98.
Fuller, W. A. and J. K. Kim (2005), Hot deck imputation for the response model, Survey Methodology 31.
Kim, J. K. (2004), Finite sample properties of multiple imputation estimators, The Annals of Statistics 32.
Kim, J. K. (2011), Parametric fractional imputation for missing data analysis, Biometrika 98.
Kim, J. K. and J. N. K. Rao (2012), Combining data from two independent surveys: a model-assisted approach, Biometrika 99.
Kim, J. K., M. J. Brick, W. A. Fuller and G. Kalton (2006), On the bias of the multiple imputation variance estimator in survey sampling, Journal of the Royal Statistical Society: Series B 68.

Kim, J. K., W. A. Fuller and W. R. Bell (2011), Variance estimation for nearest neighbor imputation for U.S. Census long form data, Annals of Applied Statistics 5.
Meng, X. L. (1994), Multiple-imputation inferences with uncongenial sources of input (with discussion), Statistical Science 9.
Rubin, D. B. (1987), Multiple Imputation for Nonresponse in Surveys, Wiley, New York.


More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

The Use of Survey Weights in Regression Modelling

The Use of Survey Weights in Regression Modelling The Use of Survey Weights in Regression Modelling Chris Skinner London School of Economics and Political Science (with Jae-Kwang Kim, Iowa State University) Colorado State University, June 2013 1 Weighting

More information

An Overview of the Pros and Cons of Linearization versus Replication in Establishment Surveys

An Overview of the Pros and Cons of Linearization versus Replication in Establishment Surveys An Overview of the Pros and Cons of Linearization versus Replication in Establishment Surveys Richard Valliant University of Michigan and Joint Program in Survey Methodology University of Maryland 1 Introduction

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING

INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING Statistica Sinica 24 (2014), 1001-1015 doi:http://dx.doi.org/10.5705/ss.2013.038 INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING Seunghwan Park and Jae Kwang Kim Seoul National Univeristy

More information

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model

More information

Bayesian inference for multivariate extreme value distributions

Bayesian inference for multivariate extreme value distributions Bayesian inference for multivariate extreme value distributions Sebastian Engelke Clément Dombry, Marco Oesting Toronto, Fields Institute, May 4th, 2016 Main motivation For a parametric model Z F θ of

More information

Graybill Conference Poster Session Introductions

Graybill Conference Poster Session Introductions Graybill Conference Poster Session Introductions 2013 Graybill Conference in Modern Survey Statistics Colorado State University Fort Collins, CO June 10, 2013 Small Area Estimation with Incomplete Auxiliary

More information

Fractional imputation method of handling missing data and spatial statistics

Fractional imputation method of handling missing data and spatial statistics Graduate Theses and Dissertations Graduate College 2014 Fractional imputation method of handling missing data and spatial statistics Shu Yang Iowa State University Follow this and additional works at:

More information

Using Estimating Equations for Spatially Correlated A

Using Estimating Equations for Spatially Correlated A Using Estimating Equations for Spatially Correlated Areal Data December 8, 2009 Introduction GEEs Spatial Estimating Equations Implementation Simulation Conclusion Typical Problem Assess the relationship

More information

Chapter 3: Maximum Likelihood Theory

Chapter 3: Maximum Likelihood Theory Chapter 3: Maximum Likelihood Theory Florian Pelgrin HEC September-December, 2010 Florian Pelgrin (HEC) Maximum Likelihood Theory September-December, 2010 1 / 40 1 Introduction Example 2 Maximum likelihood

More information

Analyzing Pilot Studies with Missing Observations

Analyzing Pilot Studies with Missing Observations Analyzing Pilot Studies with Missing Observations Monnie McGee mmcgee@smu.edu. Department of Statistical Science Southern Methodist University, Dallas, Texas Co-authored with N. Bergasa (SUNY Downstate

More information

F & B Approaches to a simple model

F & B Approaches to a simple model A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 215 http://www.astro.cornell.edu/~cordes/a6523 Lecture 11 Applications: Model comparison Challenges in large-scale surveys

More information

Econometrics I, Estimation

Econometrics I, Estimation Econometrics I, Estimation Department of Economics Stanford University September, 2008 Part I Parameter, Estimator, Estimate A parametric is a feature of the population. An estimator is a function of the

More information

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score Causal Inference with General Treatment Regimes: Generalizing the Propensity Score David van Dyk Department of Statistics, University of California, Irvine vandyk@stat.harvard.edu Joint work with Kosuke

More information

Combining multiple observational data sources to estimate causal eects

Combining multiple observational data sources to estimate causal eects Department of Statistics, North Carolina State University Combining multiple observational data sources to estimate causal eects Shu Yang* syang24@ncsuedu Joint work with Peng Ding UC Berkeley May 23,

More information

An Introduction to Bayesian Linear Regression

An Introduction to Bayesian Linear Regression An Introduction to Bayesian Linear Regression APPM 5720: Bayesian Computation Fall 2018 A SIMPLE LINEAR MODEL Suppose that we observe explanatory variables x 1, x 2,..., x n and dependent variables y 1,

More information

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Institute of Statistics

More information

Some methods for handling missing values in outcome variables. Roderick J. Little

Some methods for handling missing values in outcome variables. Roderick J. Little Some methods for handling missing values in outcome variables Roderick J. Little Missing data principles Likelihood methods Outline ML, Bayes, Multiple Imputation (MI) Robust MAR methods Predictive mean

More information

arxiv: v2 [math.st] 20 Jun 2014

arxiv: v2 [math.st] 20 Jun 2014 A solution in small area estimation problems Andrius Čiginas and Tomas Rudys Vilnius University Institute of Mathematics and Informatics, LT-08663 Vilnius, Lithuania arxiv:1306.2814v2 [math.st] 20 Jun

More information

Model comparison and selection

Model comparison and selection BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)

More information

Basics of Modern Missing Data Analysis

Basics of Modern Missing Data Analysis Basics of Modern Missing Data Analysis Kyle M. Lang Center for Research Methods and Data Analysis University of Kansas March 8, 2013 Topics to be Covered An introduction to the missing data problem Missing

More information

Propensity score adjusted method for missing data

Propensity score adjusted method for missing data Graduate Theses and Dissertations Graduate College 2013 Propensity score adjusted method for missing data Minsun Kim Riddles Iowa State University Follow this and additional works at: http://lib.dr.iastate.edu/etd

More information

A Note on Bayesian Inference After Multiple Imputation

A Note on Bayesian Inference After Multiple Imputation A Note on Bayesian Inference After Multiple Imputation Xiang Zhou and Jerome P. Reiter Abstract This article is aimed at practitioners who plan to use Bayesian inference on multiplyimputed datasets in

More information

Interactions and Squares: Don t Transform, Just Impute!

Interactions and Squares: Don t Transform, Just Impute! Interactions and Squares: Don t Transform, Just Impute! Philipp Gaffert Volker Bosch Florian Meinfelder Abstract Multiple imputation [Rubin, 1987] is difficult to conduct if the analysis model includes

More information

Measurement Error and Linear Regression of Astronomical Data. Brandon Kelly Penn State Summer School in Astrostatistics, June 2007

Measurement Error and Linear Regression of Astronomical Data. Brandon Kelly Penn State Summer School in Astrostatistics, June 2007 Measurement Error and Linear Regression of Astronomical Data Brandon Kelly Penn State Summer School in Astrostatistics, June 2007 Classical Regression Model Collect n data points, denote i th pair as (η

More information

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models

The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models John M. Neuhaus Charles E. McCulloch Division of Biostatistics University of California, San

More information

Biostat 2065 Analysis of Incomplete Data

Biostat 2065 Analysis of Incomplete Data Biostat 2065 Analysis of Incomplete Data Gong Tang Dept of Biostatistics University of Pittsburgh October 20, 2005 1. Large-sample inference based on ML Let θ is the MLE, then the large-sample theory implies

More information

The propensity score with continuous treatments

The propensity score with continuous treatments 7 The propensity score with continuous treatments Keisuke Hirano and Guido W. Imbens 1 7.1 Introduction Much of the work on propensity score analysis has focused on the case in which the treatment is binary.

More information

Accounting for Complex Sample Designs via Mixture Models

Accounting for Complex Sample Designs via Mixture Models Accounting for Complex Sample Designs via Finite Normal Mixture Models 1 1 University of Michigan School of Public Health August 2009 Talk Outline 1 2 Accommodating Sampling Weights in Mixture Models 3

More information

Modification and Improvement of Empirical Likelihood for Missing Response Problem

Modification and Improvement of Empirical Likelihood for Missing Response Problem UW Biostatistics Working Paper Series 12-30-2010 Modification and Improvement of Empirical Likelihood for Missing Response Problem Kwun Chuen Gary Chan University of Washington - Seattle Campus, kcgchan@u.washington.edu

More information

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Econometrics Workshop UNC

More information

Bayesian Inference. Chapter 9. Linear models and regression

Bayesian Inference. Chapter 9. Linear models and regression Bayesian Inference Chapter 9. Linear models and regression M. Concepcion Ausin Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in Mathematical Engineering

More information

Testing Restrictions and Comparing Models

Testing Restrictions and Comparing Models Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by

More information

Calibration Estimation for Semiparametric Copula Models under Missing Data

Calibration Estimation for Semiparametric Copula Models under Missing Data Calibration Estimation for Semiparametric Copula Models under Missing Data Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Economics and Economic Growth Centre

More information

Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions

Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Joe Schafer Office of the Associate Director for Research and Methodology U.S. Census

More information

A weighted simulation-based estimator for incomplete longitudinal data models

A weighted simulation-based estimator for incomplete longitudinal data models To appear in Statistics and Probability Letters, 113 (2016), 16-22. doi 10.1016/j.spl.2016.02.004 A weighted simulation-based estimator for incomplete longitudinal data models Daniel H. Li 1 and Liqun

More information

Statistical Methods. Missing Data snijders/sm.htm. Tom A.B. Snijders. November, University of Oxford 1 / 23

Statistical Methods. Missing Data  snijders/sm.htm. Tom A.B. Snijders. November, University of Oxford 1 / 23 1 / 23 Statistical Methods Missing Data http://www.stats.ox.ac.uk/ snijders/sm.htm Tom A.B. Snijders University of Oxford November, 2011 2 / 23 Literature: Joseph L. Schafer and John W. Graham, Missing

More information

REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY

REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY J.D. Opsomer, W.A. Fuller and X. Li Iowa State University, Ames, IA 50011, USA 1. Introduction Replication methods are often used in

More information

Some methods for handling missing data in surveys

Some methods for handling missing data in surveys Graduate Theses and Dissertations Graduate College 2015 Some methods for handling missing data in surveys Jongho Im Iowa State University Follow this and additional works at: http://lib.dr.iastate.edu/etd

More information

Reconstruction of individual patient data for meta analysis via Bayesian approach

Reconstruction of individual patient data for meta analysis via Bayesian approach Reconstruction of individual patient data for meta analysis via Bayesian approach Yusuke Yamaguchi, Wataru Sakamoto and Shingo Shirahata Graduate School of Engineering Science, Osaka University Masashi

More information

Marginal Specifications and a Gaussian Copula Estimation

Marginal Specifications and a Gaussian Copula Estimation Marginal Specifications and a Gaussian Copula Estimation Kazim Azam Abstract Multivariate analysis involving random variables of different type like count, continuous or mixture of both is frequently required

More information

Imputation for Missing Data under PPSWR Sampling

Imputation for Missing Data under PPSWR Sampling July 5, 2010 Beijing Imputation for Missing Data under PPSWR Sampling Guohua Zou Academy of Mathematics and Systems Science Chinese Academy of Sciences 1 23 () Outline () Imputation method under PPSWR

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

Combining Non-probability and. Probability Survey Samples Through Mass Imputation

Combining Non-probability and. Probability Survey Samples Through Mass Imputation Combining Non-probability and arxiv:1812.10694v2 [stat.me] 31 Dec 2018 Probability Survey Samples Through Mass Imputation Jae Kwang Kim Seho Park Yilin Chen Changbao Wu January 1, 2019 Abstract. This paper

More information

Statistical Practice

Statistical Practice Statistical Practice A Note on Bayesian Inference After Multiple Imputation Xiang ZHOU and Jerome P. REITER This article is aimed at practitioners who plan to use Bayesian inference on multiply-imputed

More information

Comparison of multiple imputation methods for systematically and sporadically missing multilevel data

Comparison of multiple imputation methods for systematically and sporadically missing multilevel data Comparison of multiple imputation methods for systematically and sporadically missing multilevel data V. Audigier, I. White, S. Jolani, T. Debray, M. Quartagno, J. Carpenter, S. van Buuren, M. Resche-Rigon

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

EIE6207: Estimation Theory

EIE6207: Estimation Theory EIE6207: Estimation Theory Man-Wai MAK Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University enmwmak@polyu.edu.hk http://www.eie.polyu.edu.hk/ mwmak References: Steven M.

More information

Small Area Modeling of County Estimates for Corn and Soybean Yields in the US

Small Area Modeling of County Estimates for Corn and Soybean Yields in the US Small Area Modeling of County Estimates for Corn and Soybean Yields in the US Matt Williams National Agricultural Statistics Service United States Department of Agriculture Matt.Williams@nass.usda.gov

More information

Foundations of Statistical Inference

Foundations of Statistical Inference Foundations of Statistical Inference Julien Berestycki Department of Statistics University of Oxford MT 2015 Julien Berestycki (University of Oxford) SB2a MT 2015 1 / 16 Lecture 16 : Bayesian analysis

More information

PIRLS 2016 Achievement Scaling Methodology 1

PIRLS 2016 Achievement Scaling Methodology 1 CHAPTER 11 PIRLS 2016 Achievement Scaling Methodology 1 The PIRLS approach to scaling the achievement data, based on item response theory (IRT) scaling with marginal estimation, was developed originally

More information

Physics 403. Segev BenZvi. Propagation of Uncertainties. Department of Physics and Astronomy University of Rochester

Physics 403. Segev BenZvi. Propagation of Uncertainties. Department of Physics and Astronomy University of Rochester Physics 403 Propagation of Uncertainties Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Maximum Likelihood and Minimum Least Squares Uncertainty Intervals

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 05 Full points may be obtained for correct answers to eight questions Each numbered question (which may have several parts) is worth

More information

Statement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:.

Statement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:. MATHEMATICAL STATISTICS Homework assignment Instructions Please turn in the homework with this cover page. You do not need to edit the solutions. Just make sure the handwriting is legible. You may discuss

More information

STATISTICAL INFERENCE WITH DATA AUGMENTATION AND PARAMETER EXPANSION

STATISTICAL INFERENCE WITH DATA AUGMENTATION AND PARAMETER EXPANSION STATISTICAL INFERENCE WITH arxiv:1512.00847v1 [math.st] 2 Dec 2015 DATA AUGMENTATION AND PARAMETER EXPANSION Yannis G. Yatracos Faculty of Communication and Media Studies Cyprus University of Technology

More information

For more information about how to cite these materials visit

For more information about how to cite these materials visit Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/

More information

Topics and Papers for Spring 14 RIT

Topics and Papers for Spring 14 RIT Eric Slud Feb. 3, 204 Topics and Papers for Spring 4 RIT The general topic of the RIT is inference for parameters of interest, such as population means or nonlinearregression coefficients, in the presence

More information

Bayesian Analysis of Latent Variable Models using Mplus

Bayesian Analysis of Latent Variable Models using Mplus Bayesian Analysis of Latent Variable Models using Mplus Tihomir Asparouhov and Bengt Muthén Version 2 June 29, 2010 1 1 Introduction In this paper we describe some of the modeling possibilities that are

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

Cluster Sampling 2. Chapter Introduction

Cluster Sampling 2. Chapter Introduction Chapter 7 Cluster Sampling 7.1 Introduction In this chapter, we consider two-stage cluster sampling where the sample clusters are selected in the first stage and the sample elements are selected in the

More information

L09. PARTICLE FILTERING. NA568 Mobile Robotics: Methods & Algorithms

L09. PARTICLE FILTERING. NA568 Mobile Robotics: Methods & Algorithms L09. PARTICLE FILTERING NA568 Mobile Robotics: Methods & Algorithms Particle Filters Different approach to state estimation Instead of parametric description of state (and uncertainty), use a set of state

More information

A decision theoretic approach to Imputation in finite population sampling

A decision theoretic approach to Imputation in finite population sampling A decision theoretic approach to Imputation in finite population sampling Glen Meeden School of Statistics University of Minnesota Minneapolis, MN 55455 August 1997 Revised May and November 1999 To appear

More information