Imputation for Missing Data under PPSWR Sampling
|
|
- Felicia Malone
- 5 years ago
- Views:
Transcription
1 July 5, 2010 Beijing Imputation for Missing Data under PPSWR Sampling Guohua Zou Academy of Mathematics and Systems Science Chinese Academy of Sciences 1 23
2 () Outline () Imputation method under PPSWR sampling () Special case: SRS sampling and uniform response () () 2 23
3 () Item nonresponse: occurs frequently in sample surveys. Example In sample survey on transportation, some vehicles may not be found, but their tonnage or seat capacity is known to us. Solutions: (1) Increasing response probability; (2) Imputing the missing values of the sampled units. Imputation methods: Ratio imputation, regression imputation, random imputation etc. Shortcoming: Uniform response (often), simple random sampling PPSWR sampling: the sampling with probability proportional to size with replacement which is often used in the first stage of multistage sampling. 3 23
4 () Imputation method under PPSWR sampling 1. Notation Survey population U: consist of N distinct units identified through the labels i = 1,..., N. Suppose that the auxiliary variable, X, is available for each unit of the population, but the variable of interest, Y, is missing for some of the sampled units. s n : sample with size n drawn from U by PPSWR sampling. s r : the respondent set with size r( 1) s n r : the nonrespondent set with size n r.
5 Uniform response mechanism: independent response across sample units and equal response probability, p (q 1 p). Non-uniform response mechanism: independent response across sample units and possibly unequal response probabilities, p i (q i 1 p i ). Response indicator on y i : { 1, if the unit i responds to y i, I i = 0, otherwise. 5 23
6 2. Imputation method We first let p i be known. For the missing Y values, we suggest the following imputation method: y i = 1 n r q j y j x i, i s n r. p j s j x j r 6 23
7 It is interesting to note that yi is an approximation of the weighted least squares predictor under the following superpopulation model: y i = βx i + e i, ε(e i ) = 0, ε(e 2 i ) = σ2 x 2 i, ε(e ie j ) = 0 (i j). In fact, it can be seen that under the above superpopulation model, the weighted least squares estimator of β with the weights w i q i /(p i x 2 i ) is given by s ˆβ = r (q i y i )/(p i x i ) s r 1/p i r. Further, the expectation of s r 1/p i with respect to the response mechanism is n. 7 23
8 3. Estimator of population mean and its variance Applying the above imputation method to the PPSWR sampling, the corresponding Hansen-Hurwitz estimator of the population mean Ȳ is ˆȲ P P S = X n i s r y i = X p i x i n i s n y i p i x i I i. is design-unbiased under the non- Theorem 1. The estimator ˆȲ P P S uniform response mechanism. Theorem 2. Under the non-uniform response, the variance of ˆȲ P P S is given by V ( ˆȲ P P S) = 1 N 2 where Z i = X i /X. { 1 n N i=1 ( ) 2 Yi Z i Y + 1 Z i n N i=1 } q i Yi 2, p i Z i 8 23
9 4. Jackknife variance estimator Define the imputed values as follows: For i s n r, ( 1 q k y ) k x i, j s r, n r p k x k yi a k s (j) = r {j} ( 1 q k y ) k x i, j s n r, n r 1 p k x k k s r when the j-th sample unit is deleted. Based on these imputed values, the estimator of Ȳ can be obtained as X y i, j s r, n 1 p i x i ˆȲ P a i s P S(j) = r {j} X y i, j s n r, n 1 p i x i i s r when the j-th sample unit is deleted. 9 23
10 Define the j-th pseudovalue as follows: ˆȲ j = n ˆȲ P a P S (n 1) ˆȲ P P S(j). A jackknife variance estimator of ˆȲ P P S v J ( ˆȲ P P S ) = n 1 n = [ ˆȲ P P S j s n X 2 n(n 1) i s r is then given by y2 i p 2 i x2 i ˆȲ a P P S(j)] 2 1 n ( i s r y i p i x i ) 2. Theorem 3. Let n > 1. Then under the non-uniform response, we have E[v J ( ˆȲ P P S)] = V ( ˆȲ P P S). Theorem 3 shows that the jackknife variance estimator v J ( ˆȲ P P S ) is designunbiased under the non-uniform response
11 5. Extension to the case of unknown response probability In practice, the response probability p i is rare to be known. Here we model the response probability p i by a parametric model p i = g(x i ; θ), where g is a known smooth function and θ is an unknown finitedimensional parameter. An example of such a model is the logistic regression model. The estimator ˆθ of the parameter θ can be obtained by the maximum likelihood approach. Let ˆp i = g(x i ; ˆθ) be the estimated response probability. Then the corresponding estimator of the population mean Ȳ and its jackknife variance estimator are given by ˆȲ e P P S = X n i s r y i ˆp i x i, 11 23
12 and respectively. v J ( ˆȲ e P P S) = X 2 n(n 1) i s r y2 i ˆp 2 i x2 i 1 n ( i s r y i ˆp i x i ) 2, For the relationship between the estimators with known and unknown p i, under some regularity conditions, we have and ˆȲ e P P S = ˆȲ P P S + O p ( 1 n ), ( ) v J ( ˆȲ P e P S) = v J ( ˆȲ 1 P P S) + O p. n 3/
13 () Special case: SRS sampling and uniform response Imputed value: y i = Imputed estimator: 1 r y j x j s j r ˆȲ m SRS= ū r x n + where ū r = 1 u j with u j = y j /x j. r j s r x i, i s n r. r(n 1) (r 1)n (ȳ r ū r x r ), 13 23
14 The estimator ˆȲ SRS m is a design-unbiased estimator under uniform response. It is interesting that it is just the version of Hartley-Ross estimator (Hartley and Ross 1954, Nature) for estimating the population mean under two-phase sampling. Variance: V ( ˆȲ SRS) m = 1 ( ) np [S2 Y + (1 p)(2ȳ Ū X + Ū 2 T Ū 2 X2 2Ū Z)] 1 + O n 2, where T = 1 N and N T i with T i = Xi 2, and Z = 1 N i=1 S 2 Y = 1 N 1 N (Y i Ȳ )2. i=1 N Z i with Z i = Y i X i, i=
15 Jackknife variance estimator: v J ( ˆȲ SRS m ) = 1 { (n 1)(r 1)ū 2 n(n 1)(r 1) r s 2 x(n) + 1 (r 2) 2[n(r 2) x n (nr r 2) x r ] 2 s 2 u(r) + (n 2)2 r 2 (r 2) 2 s2 y(r) where s y ν 1 u ν 2x ν 3(r) = 1 r 1 + (n 2)r(nr 2r2 + 4r 4) (r 2) 2 ū 2 r s 2 x(r) 2(n 2)r + (r 2) 2 [n(r 2) x n (nr r 2) x r ] s yu (r) 2 (r 2) 2[(n2 r n(r 2 r + 2) (r 1)(r 2))(r 2) x n (n 2 r 2 nr(r 2 + 4) + 2(3r 2 4r + 4)) x r ]ū r s ux (r) 2(n 2)r(nr r2 + r 2) (r 2) 2 ū r s yx (r) 2 r 2 [n(r 2) x n (nr r 2) x r ]s u 2 x(r) + 2(nr r2 + r 2) ū r s ux 2(r) r 2 2r +s u 2 x2(r) n (ȳ r ū r x r )s ux (r) + 2(n 2)r s yux (r) r 2 } r2 n(r 1) (ȳ r ū r x r ) 2, j s r (y j ȳ r ) ν 1 (u j ū r ) ν 2 (x j x r ) ν
16 Two approximate Jackknife variance estimators: v J ( ˆȲ SRS m ) = nū2 1 rs 2 x(n) + 1 r [( x r x n ) 2 s 2 u(r) 2( x r x n )s yu (r) + s 2 y(r)] ( 1 + r 2 ) ( 1 ū 2 n rs 2 x(r) + 2 r 1 ) ū r [( x r x n )s ux (r) s yx (r)], n and v J( ˆȲ SRS) m = nū2 1 rs 2 x(n) + 1 ( 1 r s2 y(r) + r 2 ) ( 1 ū 2 n rs 2 x(r) 2 r 1 ) ū r s yx (r). n 16 23
17 Asymptotic design-unbiasedness: (i) E[v J ( ˆȲ SRS m m )] = V ( ˆȲ SRS ) + O ( ) 1 n. 2 (ii) E[v J ( ˆȲ SRS m m )] = V ( ˆȲ SRS ) + O ( ) 1 n. 2 (iii) E[v J ( ˆȲ SRS m m )] = V ( ˆȲ SRS ) + O ( ) 1 n. 2 Remark: We should note that the (approximate) design-unbiasedness is the main requirement for a good estimator in survey sampling. The approximate design-unbiasedness of the Jackknife variance estimators has been found first in Zou and Feng (1998), and then in Skinner and Rao (2002) and Zou et al. (2002). Such a property universally holds from our subsequent analysis
18 () The data are generated from the three ratio models which are different only in the auxiliary variables: y i = 3.9x i + x i ε i with x i U(0.1, 2.1), N(1, 1), and N(20, 16), respectively, ε i N(0, 1), and x i and ε i are assumed to be independent. In the case of uniform response, we set p = 0.76; in the case of nonuniform response, the unequal response probability p i for the unit i follows the logistic model p i = exp( x i) 1 + exp( x i )
19 We first generate a finite population with the size of N = 10, 000. Then the samples with n = 100 and n = 500 are drawn from the finite population by the PPSWR sampling, respectively. We repeat the process B = 5, 000 times. For the b-th run, denote the estimators of the population mean under the uniform (b) and non-uniform responses as and ˆȲ P P S, respectively. We calculate the ˆȲ I(b) P P S simulated means and variances as follows: and E ( ˆȲ I P P S) = 1 B V ( ˆȲ I P P S) = 1 B B ( b=1 B b=1 ˆȲ I(b) P P S, ˆȲ I(b) P P S Ȳ )2, E ( ˆȲ P P S) = 1 B V ( ˆȲ P P S) = 1 B B b=1 B ( b=1 ˆȲ (b) P P S ; ˆȲ (b) P P S Ȳ )2. Similarly, the corresponding jackknife variance estimators are calculated as E [v J ( ˆȲ I P P S)] = 1 B B b=1 v (b) J ( ˆȲ I P P S), and E [v J ( ˆȲ P P S)] = 1 B B b=1 v (b) J ( ˆȲ P P S)
20 20 23
21 Table 1 summarizes the results on the simulated mean, variance and jackknife variance estimate. It can be seen from the table that both of the estimators ˆȲ P I P S and ˆȲ P P S are very close to the true population means for the three distributions of auxiliary variable. Also, the jackknife variance estimators perform very well
22 To study the effect of the response probability, we set various response probabilities: p = 0.5 for uniform response, and p i follows p i = exp{0.3(x i X)} 1 + exp{0.3(x i X)} for non-uniform response. The results are presented in Table 2. It is observed that the approximate design-unbiasedness of the proposed estimators still holds. On the other hand, it is also clear that the variances become larger for low response probability. For some other settings of response probability, we obtain the similar results but omit them here for saving space
23 23 23
24 () Incomplete auxiliary information and multiple auxiliary information Estimation of response probability: the use of non-parametric approach. Other unequal probability sampling 24 23
25 Thank you! 25 23
Chapter 5: Models used in conjunction with sampling. J. Kim, W. Fuller (ISU) Chapter 5: Models used in conjunction with sampling 1 / 70
Chapter 5: Models used in conjunction with sampling J. Kim, W. Fuller (ISU) Chapter 5: Models used in conjunction with sampling 1 / 70 Nonresponse Unit Nonresponse: weight adjustment Item Nonresponse:
More informationNonresponse weighting adjustment using estimated response probability
Nonresponse weighting adjustment using estimated response probability Jae-kwang Kim Yonsei University, Seoul, Korea December 26, 2006 Introduction Nonresponse Unit nonresponse Item nonresponse Basic strategy
More information6. Fractional Imputation in Survey Sampling
6. Fractional Imputation in Survey Sampling 1 Introduction Consider a finite population of N units identified by a set of indices U = {1, 2,, N} with N known. Associated with each unit i in the population
More informationCombining data from two independent surveys: model-assisted approach
Combining data from two independent surveys: model-assisted approach Jae Kwang Kim 1 Iowa State University January 20, 2012 1 Joint work with J.N.K. Rao, Carleton University Reference Kim, J.K. and Rao,
More informationJong-Min Kim* and Jon E. Anderson. Statistics Discipline Division of Science and Mathematics University of Minnesota at Morris
Jackknife Variance Estimation for Two Samples after Imputation under Two-Phase Sampling Jong-Min Kim* and Jon E. Anderson jongmink@mrs.umn.edu Statistics Discipline Division of Science and Mathematics
More informationNonrespondent subsample multiple imputation in two-phase random sampling for nonresponse
Nonrespondent subsample multiple imputation in two-phase random sampling for nonresponse Nanhua Zhang Division of Biostatistics & Epidemiology Cincinnati Children s Hospital Medical Center (Joint work
More informationJong-Min Kim* and Jon E. Anderson. Statistics Discipline Division of Science and Mathematics University of Minnesota at Morris
Jackknife Variance Estimation of the Regression and Calibration Estimator for Two 2-Phase Samples Jong-Min Kim* and Jon E. Anderson jongmink@morris.umn.edu Statistics Discipline Division of Science and
More informationChapter 8: Estimation 1
Chapter 8: Estimation 1 Jae-Kwang Kim Iowa State University Fall, 2014 Kim (ISU) Ch. 8: Estimation 1 Fall, 2014 1 / 33 Introduction 1 Introduction 2 Ratio estimation 3 Regression estimator Kim (ISU) Ch.
More informationRecent Advances in the analysis of missing data with non-ignorable missingness
Recent Advances in the analysis of missing data with non-ignorable missingness Jae-Kwang Kim Department of Statistics, Iowa State University July 4th, 2014 1 Introduction 2 Full likelihood-based ML estimation
More informationA measurement error model approach to small area estimation
A measurement error model approach to small area estimation Jae-kwang Kim 1 Spring, 2015 1 Joint work with Seunghwan Park and Seoyoung Kim Ouline Introduction Basic Theory Application to Korean LFS Discussion
More informationModel Assisted Survey Sampling
Carl-Erik Sarndal Jan Wretman Bengt Swensson Model Assisted Survey Sampling Springer Preface v PARTI Principles of Estimation for Finite Populations and Important Sampling Designs CHAPTER 1 Survey Sampling
More informationESTIMATION OF DISTRIBUTION FUNCTION AND QUANTILES USING THE MODEL-CALIBRATED PSEUDO EMPIRICAL LIKELIHOOD METHOD
Statistica Sinica 12(2002), 1223-1239 ESTIMATION OF DISTRIBUTION FUNCTION AND QUANTILES USING THE MOD-CALIBRATED PSEUDO EMPIRICAL LIKIHOOD METHOD Jiahua Chen and Changbao Wu University of Waterloo Abstract:
More informationTopic 12 Overview of Estimation
Topic 12 Overview of Estimation Classical Statistics 1 / 9 Outline Introduction Parameter Estimation Classical Statistics Densities and Likelihoods 2 / 9 Introduction In the simplest possible terms, the
More informationAn Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data
An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data Jae-Kwang Kim 1 Iowa State University June 28, 2012 1 Joint work with Dr. Ming Zhou (when he was a PhD student at ISU)
More informationFractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling
Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction
More informationInference with Imputed Conditional Means
Inference with Imputed Conditional Means Joseph L. Schafer and Nathaniel Schenker June 4, 1997 Abstract In this paper, we develop analytic techniques that can be used to produce appropriate inferences
More informationREPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES
Statistica Sinica 8(1998), 1153-1164 REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES Wayne A. Fuller Iowa State University Abstract: The estimation of the variance of the regression estimator for
More informationGraybill Conference Poster Session Introductions
Graybill Conference Poster Session Introductions 2013 Graybill Conference in Modern Survey Statistics Colorado State University Fort Collins, CO June 10, 2013 Small Area Estimation with Incomplete Auxiliary
More informationChapter 3: Maximum Likelihood Theory
Chapter 3: Maximum Likelihood Theory Florian Pelgrin HEC September-December, 2010 Florian Pelgrin (HEC) Maximum Likelihood Theory September-December, 2010 1 / 40 1 Introduction Example 2 Maximum likelihood
More informationPh.D. Qualifying Exam Friday Saturday, January 3 4, 2014
Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014 Put your solution to each problem on a separate sheet of paper. Problem 1. (5166) Assume that two random samples {x i } and {y i } are independently
More informationRegression Models - Introduction
Regression Models - Introduction In regression models, two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent variable,
More informationASYMPTOTIC NORMALITY UNDER TWO-PHASE SAMPLING DESIGNS
Statistica Sinica 17(2007), 1047-1064 ASYMPTOTIC NORMALITY UNDER TWO-PHASE SAMPLING DESIGNS Jiahua Chen and J. N. K. Rao University of British Columbia and Carleton University Abstract: Large sample properties
More informationSampling from Finite Populations Jill M. Montaquila and Graham Kalton Westat 1600 Research Blvd., Rockville, MD 20850, U.S.A.
Sampling from Finite Populations Jill M. Montaquila and Graham Kalton Westat 1600 Research Blvd., Rockville, MD 20850, U.S.A. Keywords: Survey sampling, finite populations, simple random sampling, systematic
More informationanalysis of incomplete data in statistical surveys
analysis of incomplete data in statistical surveys Ugo Guarnera 1 1 Italian National Institute of Statistics, Italy guarnera@istat.it Jordan Twinning: Imputation - Amman, 6-13 Dec 2014 outline 1 origin
More informationCombining Non-probability and Probability Survey Samples Through Mass Imputation
Combining Non-probability and Probability Survey Samples Through Mass Imputation Jae-Kwang Kim 1 Iowa State University & KAIST October 27, 2018 1 Joint work with Seho Park, Yilin Chen, and Changbao Wu
More informationAsymptotic Normality under Two-Phase Sampling Designs
Asymptotic Normality under Two-Phase Sampling Designs Jiahua Chen and J. N. K. Rao University of Waterloo and University of Carleton Abstract Large sample properties of statistical inferences in the context
More informationA decision theoretic approach to Imputation in finite population sampling
A decision theoretic approach to Imputation in finite population sampling Glen Meeden School of Statistics University of Minnesota Minneapolis, MN 55455 August 1997 Revised May and November 1999 To appear
More informationMS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari
MS&E 226: Small Data Lecture 11: Maximum likelihood (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 18 The likelihood function 2 / 18 Estimating the parameter This lecture develops the methodology behind
More informationAn Overview of the Pros and Cons of Linearization versus Replication in Establishment Surveys
An Overview of the Pros and Cons of Linearization versus Replication in Establishment Surveys Richard Valliant University of Michigan and Joint Program in Survey Methodology University of Maryland 1 Introduction
More informationData Integration for Big Data Analysis for finite population inference
for Big Data Analysis for finite population inference Jae-kwang Kim ISU January 23, 2018 1 / 36 What is big data? 2 / 36 Data do not speak for themselves Knowledge Reproducibility Information Intepretation
More informationEmpirical Likelihood Methods for Sample Survey Data: An Overview
AUSTRIAN JOURNAL OF STATISTICS Volume 35 (2006), Number 2&3, 191 196 Empirical Likelihood Methods for Sample Survey Data: An Overview J. N. K. Rao Carleton University, Ottawa, Canada Abstract: The use
More informationWhat is Survey Weighting? Chris Skinner University of Southampton
What is Survey Weighting? Chris Skinner University of Southampton 1 Outline 1. Introduction 2. (Unresolved) Issues 3. Further reading etc. 2 Sampling 3 Representation 4 out of 8 1 out of 10 4 Weights 8/4
More informationIntroduction to Survey Data Integration
Introduction to Survey Data Integration Jae-Kwang Kim Iowa State University May 20, 2014 Outline 1 Introduction 2 Survey Integration Examples 3 Basic Theory for Survey Integration 4 NASS application 5
More informationReview of Econometrics
Review of Econometrics Zheng Tian June 5th, 2017 1 The Essence of the OLS Estimation Multiple regression model involves the models as follows Y i = β 0 + β 1 X 1i + β 2 X 2i + + β k X ki + u i, i = 1,...,
More informationOn the bias of the multiple-imputation variance estimator in survey sampling
J. R. Statist. Soc. B (2006) 68, Part 3, pp. 509 521 On the bias of the multiple-imputation variance estimator in survey sampling Jae Kwang Kim, Yonsei University, Seoul, Korea J. Michael Brick, Westat,
More informationChapter 4: Imputation
Chapter 4: Imputation Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Basic Theory for imputation 3 Variance estimation after imputation 4 Replication variance estimation
More informationFractional Imputation in Survey Sampling: A Comparative Review
Fractional Imputation in Survey Sampling: A Comparative Review Shu Yang Jae-Kwang Kim Iowa State University Joint Statistical Meetings, August 2015 Outline Introduction Fractional imputation Features Numerical
More informationBIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation
BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation Yujin Chung November 29th, 2016 Fall 2016 Yujin Chung Lec13: MLE Fall 2016 1/24 Previous Parametric tests Mean comparisons (normality assumption)
More informationMachine Learning Basics: Maximum Likelihood Estimation
Machine Learning Basics: Maximum Likelihood Estimation Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics 1. Learning
More informationGeneralized Linear Models Introduction
Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,
More informationChapter 4. Replication Variance Estimation. J. Kim, W. Fuller (ISU) Chapter 4 7/31/11 1 / 28
Chapter 4 Replication Variance Estimation J. Kim, W. Fuller (ISU) Chapter 4 7/31/11 1 / 28 Jackknife Variance Estimation Create a new sample by deleting one observation n 1 n n ( x (k) x) 2 = x (k) = n
More informationA Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning
A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning Saharon Rosset Data Analytics Research Group IBM T.J. Watson Research Center Yorktown Heights, NY 1598 srosset@us.ibm.com Hui
More informationA Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning
A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning Saharon Rosset Data Analytics Research Group IBM T.J. Watson Research Center Yorktown Heights, NY 1598 srosset@us.ibm.com Hui
More informationParametric fractional imputation for missing data analysis
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 Biometrika (????),??,?, pp. 1 15 C???? Biometrika Trust Printed in
More informationStatistics II. Management Degree Management Statistics IIDegree. Statistics II. 2 nd Sem. 2013/2014. Management Degree. Simple Linear Regression
Model 1 2 Ordinary Least Squares 3 4 Non-linearities 5 of the coefficients and their to the model We saw that econometrics studies E (Y x). More generally, we shall study regression analysis. : The regression
More informationTWO-WAY CONTINGENCY TABLES UNDER CONDITIONAL HOT DECK IMPUTATION
Statistica Sinica 13(2003), 613-623 TWO-WAY CONTINGENCY TABLES UNDER CONDITIONAL HOT DECK IMPUTATION Hansheng Wang and Jun Shao Peking University and University of Wisconsin Abstract: We consider the estimation
More informationUniversity of Michigan School of Public Health
University of Michigan School of Public Health The University of Michigan Department of Biostatistics Working Paper Series Year 003 Paper Weighting Adustments for Unit Nonresponse with Multiple Outcome
More informationABHELSINKI UNIVERSITY OF TECHNOLOGY
Cross-Validation, Information Criteria, Expected Utilities and the Effective Number of Parameters Aki Vehtari and Jouko Lampinen Laboratory of Computational Engineering Introduction Expected utility -
More informationCluster Sampling 2. Chapter Introduction
Chapter 7 Cluster Sampling 7.1 Introduction In this chapter, we consider two-stage cluster sampling where the sample clusters are selected in the first stage and the sample elements are selected in the
More informationPlausible Values for Latent Variables Using Mplus
Plausible Values for Latent Variables Using Mplus Tihomir Asparouhov and Bengt Muthén August 21, 2010 1 1 Introduction Plausible values are imputed values for latent variables. All latent variables can
More information2018 2019 1 9 sei@mistiu-tokyoacjp http://wwwstattu-tokyoacjp/~sei/lec-jhtml 11 552 3 0 1 2 3 4 5 6 7 13 14 33 4 1 4 4 2 1 1 2 2 1 1 12 13 R?boxplot boxplotstats which does the computation?boxplotstats
More informationTwo-phase sampling approach to fractional hot deck imputation
Two-phase sampling approach to fractional hot deck imputation Jongho Im 1, Jae-Kwang Kim 1 and Wayne A. Fuller 1 Abstract Hot deck imputation is popular for handling item nonresponse in survey sampling.
More informationWeighting in survey analysis under informative sampling
Jae Kwang Kim and Chris J. Skinner Weighting in survey analysis under informative sampling Article (Accepted version) (Refereed) Original citation: Kim, Jae Kwang and Skinner, Chris J. (2013) Weighting
More informationThe Effective Use of Complete Auxiliary Information From Survey Data
The Effective Use of Complete Auxiliary Information From Survey Data by Changbao Wu B.S., Anhui Laodong University, China, 1982 M.S. Diploma, East China Normal University, 1986 a thesis submitted in partial
More informationLink lecture - Lagrange Multipliers
Link lecture - Lagrange Multipliers Lagrange multipliers provide a method for finding a stationary point of a function, say f(x, y) when the variables are subject to constraints, say of the form g(x, y)
More informationRESEARCH REPORT. Vanishing auxiliary variables in PPS sampling with applications in microscopy.
CENTRE FOR STOCHASTIC GEOMETRY AND ADVANCED BIOIMAGING 2014 www.csgb.dk RESEARCH REPORT Ina Trolle Andersen, Ute Hahn and Eva B. Vedel Jensen Vanishing auxiliary variables in PPS sampling with applications
More informationin Survey Sampling Petr Novák, Václav Kosina Czech Statistical Office Using the Superpopulation Model for Imputations and Variance
Using the Superpopulation Model for Imputations and Variance Computation in Survey Sampling Czech Statistical Office Introduction Situation Let us have a population of N units: n sampled (sam) and N-n
More informationStatement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:.
MATHEMATICAL STATISTICS Homework assignment Instructions Please turn in the homework with this cover page. You do not need to edit the solutions. Just make sure the handwriting is legible. You may discuss
More informationLinear Models in Machine Learning
CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,
More informationJ.N.K. Rao, Carleton University Department of Mathematics & Statistics, Carleton University, Ottawa, Canada
JACKKNIFE VARIANCE ESTIMATION WITH IMPUTED SURVEY DATA J.N.K. Rao, Carleton University Department of Mathematics & Statistics, Carleton University, Ottawa, Canada KEY WORDS" Adusted imputed values, item
More informationof being selected and varying such probability across strata under optimal allocation leads to increased accuracy.
5 Sampling with Unequal Probabilities Simple random sampling and systematic sampling are schemes where every unit in the population has the same chance of being selected We will now consider unequal probability
More informationINSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING
Statistica Sinica 24 (2014), 1001-1015 doi:http://dx.doi.org/10.5705/ss.2013.038 INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING Seunghwan Park and Jae Kwang Kim Seoul National Univeristy
More informationIntroduction to Survey Data Analysis
Introduction to Survey Data Analysis JULY 2011 Afsaneh Yazdani Preface Learning from Data Four-step process by which we can learn from data: 1. Defining the Problem 2. Collecting the Data 3. Summarizing
More informationPetter Mostad Mathematical Statistics Chalmers and GU
Petter Mostad Mathematical Statistics Chalmers and GU Solution to MVE55/MSG8 Mathematical statistics and discrete mathematics MVE55/MSG8 Matematisk statistik och diskret matematik Re-exam: 4 August 6,
More informationarxiv: v2 [math.st] 20 Jun 2014
A solution in small area estimation problems Andrius Čiginas and Tomas Rudys Vilnius University Institute of Mathematics and Informatics, LT-08663 Vilnius, Lithuania arxiv:1306.2814v2 [math.st] 20 Jun
More informationA Design-Sensitive Approach to Fitting Regression Models With Complex Survey Data
A Design-Sensitive Approach to Fitting Regression Models With Complex Survey Data Phillip S. Kott RI International Rockville, MD Introduction: What Does Fitting a Regression Model with Survey Data Mean?
More informationREPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY
REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY J.D. Opsomer, W.A. Fuller and X. Li Iowa State University, Ames, IA 50011, USA 1. Introduction Replication methods are often used in
More informationSimple and Multiple Linear Regression
Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where
More informationDiscussing Effects of Different MAR-Settings
Discussing Effects of Different MAR-Settings Research Seminar, Department of Statistics, LMU Munich Munich, 11.07.2014 Matthias Speidel Jörg Drechsler Joseph Sakshaug Outline What we basically want to
More informationBootstrap. Director of Center for Astrostatistics. G. Jogesh Babu. Penn State University babu.
Bootstrap G. Jogesh Babu Penn State University http://www.stat.psu.edu/ babu Director of Center for Astrostatistics http://astrostatistics.psu.edu Outline 1 Motivation 2 Simple statistical problem 3 Resampling
More informationSimple Linear Regression Analysis
LINEAR REGRESSION ANALYSIS MODULE II Lecture - 6 Simple Linear Regression Analysis Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Prediction of values of study
More informationMax. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes
Maximum Likelihood Estimation Econometrics II Department of Economics Universidad Carlos III de Madrid Máster Universitario en Desarrollo y Crecimiento Económico Outline 1 3 4 General Approaches to Parameter
More informationThis paper is not to be removed from the Examination Halls
~~ST104B ZA d0 This paper is not to be removed from the Examination Halls UNIVERSITY OF LONDON ST104B ZB BSc degrees and Diplomas for Graduates in Economics, Management, Finance and the Social Sciences,
More informationA note on multiple imputation for general purpose estimation
A note on multiple imputation for general purpose estimation Shu Yang Jae Kwang Kim SSC meeting June 16, 2015 Shu Yang, Jae Kwang Kim Multiple Imputation June 16, 2015 1 / 32 Introduction Basic Setup Assume
More informationTABLES AND FORMULAS FOR MOORE Basic Practice of Statistics
TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Exploring Data: Distributions Look for overall pattern (shape, center, spread) and deviations (outliers). Mean (use a calculator): x = x 1 + x
More informationLinear Regression. Junhui Qian. October 27, 2014
Linear Regression Junhui Qian October 27, 2014 Outline The Model Estimation Ordinary Least Square Method of Moments Maximum Likelihood Estimation Properties of OLS Estimator Unbiasedness Consistency Efficiency
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Bootstrap for Regression Week 9, Lecture 1
MA 575 Linear Models: Cedric E. Ginestet, Boston University Bootstrap for Regression Week 9, Lecture 1 1 The General Bootstrap This is a computer-intensive resampling algorithm for estimating the empirical
More informationGraduate Econometrics I: Unbiased Estimation
Graduate Econometrics I: Unbiased Estimation Yves Dominicy Université libre de Bruxelles Solvay Brussels School of Economics and Management ECARES Yves Dominicy Graduate Econometrics I: Unbiased Estimation
More informationSome Assorted Formulae. Some confidence intervals: σ n. x ± z α/2. x ± t n 1;α/2 n. ˆp(1 ˆp) ˆp ± z α/2 n. χ 2 n 1;1 α/2. n 1;α/2
STA 248 H1S MIDTERM TEST February 26, 2008 SURNAME: SOLUTIONS GIVEN NAME: STUDENT NUMBER: INSTRUCTIONS: Time: 1 hour and 50 minutes Aids allowed: calculator Tables of the standard normal, t and chi-square
More informationMultidimensional Control Totals for Poststratified Weights
Multidimensional Control Totals for Poststratified Weights Darryl V. Creel and Mansour Fahimi Joint Statistical Meetings Minneapolis, MN August 7-11, 2005 RTI International is a trade name of Research
More informationSTAT 512 sp 2018 Summary Sheet
STAT 5 sp 08 Summary Sheet Karl B. Gregory Spring 08. Transformations of a random variable Let X be a rv with support X and let g be a function mapping X to Y with inverse mapping g (A = {x X : g(x A}
More informationThe logistic regression model is thus a glm-model with canonical link function so that the log-odds equals the linear predictor, that is
Example The logistic regression model is thus a glm-model with canonical link function so that the log-odds equals the linear predictor, that is log p 1 p = β 0 + β 1 f 1 (y 1 ) +... + β d f d (y d ).
More informationWorkpackage 5 Resampling Methods for Variance Estimation. Deliverable 5.1
Workpackage 5 Resampling Methods for Variance Estimation Deliverable 5.1 2004 II List of contributors: Anthony C. Davison and Sylvain Sardy, EPFL. Main responsibility: Sylvain Sardy, EPFL. IST 2000 26057
More informationAdmissible Estimation of a Finite Population Total under PPS Sampling
Research Journal of Mathematical and Statistical Sciences E-ISSN 2320-6047 Admissible Estimation of a Finite Population Total under PPS Sampling Abstract P.A. Patel 1* and Shradha Bhatt 2 1 Department
More informationPropensity score adjusted method for missing data
Graduate Theses and Dissertations Graduate College 2013 Propensity score adjusted method for missing data Minsun Kim Riddles Iowa State University Follow this and additional works at: http://lib.dr.iastate.edu/etd
More informationApplied Econometrics (QEM)
Applied Econometrics (QEM) The Simple Linear Regression Model based on Prinicples of Econometrics Jakub Mućk Department of Quantitative Economics Jakub Mućk Applied Econometrics (QEM) Meeting #2 The Simple
More informationCalibration estimation using exponential tilting in sample surveys
Calibration estimation using exponential tilting in sample surveys Jae Kwang Kim February 23, 2010 Abstract We consider the problem of parameter estimation with auxiliary information, where the auxiliary
More informationL2: Two-variable regression model
L2: Two-variable regression model Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Revision: September 4, 2014 What we have learned last time...
More informationNonstationary Panels
Nonstationary Panels Based on chapters 12.4, 12.5, and 12.6 of Baltagi, B. (2005): Econometric Analysis of Panel Data, 3rd edition. Chichester, John Wiley & Sons. June 3, 2009 Agenda 1 Spurious Regressions
More informationGibbs Sampling in Endogenous Variables Models
Gibbs Sampling in Endogenous Variables Models Econ 690 Purdue University Outline 1 Motivation 2 Identification Issues 3 Posterior Simulation #1 4 Posterior Simulation #2 Motivation In this lecture we take
More informationVARIANCE ESTIMATION FOR NEAREST NEIGHBOR IMPUTATION FOR U.S. CENSUS LONG FORM DATA
Submitted to the Annals of Applied Statistics VARIANCE ESTIMATION FOR NEAREST NEIGHBOR IMPUTATION FOR U.S. CENSUS LONG FORM DATA By Jae Kwang Kim, Wayne A. Fuller and William R. Bell Iowa State University
More informationThe Simple Regression Model. Part II. The Simple Regression Model
Part II The Simple Regression Model As of Sep 22, 2015 Definition 1 The Simple Regression Model Definition Estimation of the model, OLS OLS Statistics Algebraic properties Goodness-of-Fit, the R-square
More informationBessel s Equation. MATH 365 Ordinary Differential Equations. J. Robert Buchanan. Fall Department of Mathematics
Bessel s Equation MATH 365 Ordinary Differential Equations J. Robert Buchanan Department of Mathematics Fall 2018 Background Bessel s equation of order ν has the form where ν is a constant. x 2 y + xy
More informationTopic 16 Interval Estimation
Topic 16 Interval Estimation Additional Topics 1 / 9 Outline Linear Regression Interpretation of the Confidence Interval 2 / 9 Linear Regression For ordinary linear regression, we have given least squares
More informationStatistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach
Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score
More information2017 Financial Mathematics Orientation - Statistics
2017 Financial Mathematics Orientation - Statistics Written by Long Wang Edited by Joshua Agterberg August 21, 2018 Contents 1 Preliminaries 5 1.1 Samples and Population............................. 5
More informationEconomics 582 Random Effects Estimation
Economics 582 Random Effects Estimation Eric Zivot May 29, 2013 Random Effects Model Hence, the model can be re-written as = x 0 β + + [x ] = 0 (no endogeneity) [ x ] = = + x 0 β + + [x ] = 0 [ x ] = 0
More informationPh.D. Qualifying Exam Friday Saturday, January 6 7, 2017
Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a
More informationEconometrics Multiple Regression Analysis: Heteroskedasticity
Econometrics Multiple Regression Analysis: João Valle e Azevedo Faculdade de Economia Universidade Nova de Lisboa Spring Semester João Valle e Azevedo (FEUNL) Econometrics Lisbon, April 2011 1 / 19 Properties
More informationSimple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation.
Statistical Computation Math 475 Jimin Ding Department of Mathematics Washington University in St. Louis www.math.wustl.edu/ jmding/math475/index.html October 10, 2013 Ridge Part IV October 10, 2013 1
More information