Fractional Imputation in Survey Sampling: A Comparative Review
1 Fractional Imputation in Survey Sampling: A Comparative Review Shu Yang Jae-Kwang Kim Iowa State University Joint Statistical Meetings, August 2015
2 Outline
1. Introduction
2. Fractional imputation
3. Features
4. Numerical illustration
5. Conclusion
Yang & Kim (Iowa State University) Fractional Imputation 2015 JSM
3 Introduction: Basic Setup
$U = \{1, 2, \ldots, N\}$: finite population.
$A \subset U$: sample (selected by a probability sampling design).
Under complete response, suppose that
$$\hat{\eta}_{n,g} = \sum_{i \in A} w_i g(y_i)$$
is an unbiased estimator of $\eta_g = N^{-1} \sum_{i=1}^{N} g(y_i)$. Here, $g(\cdot)$ is a known function.
For example, $g(y) = I(y < 3)$ leads to $\eta_g = P(Y < 3)$.
4 Introduction: Basic Setup (Cont'd)
$A = A_R \cup A_M$, where $y_i$ are observed in $A_R$ and missing in $A_M$.
$R_i = 1$ if $i \in A_R$ and $R_i = 0$ if $i \in A_M$.
$y_i^*$: imputed value for $y_i$, $i \in A_M$.
Imputed estimator of $\eta_g$:
$$\hat{\eta}_{I,g} = \sum_{i \in A_R} w_i g(y_i) + \sum_{i \in A_M} w_i g(y_i^*)$$
Need $E\{g(y_i^*) \mid R_i = 0\} = E\{g(y_i) \mid R_i = 0\}$.
5 Introduction: ML estimation under missing data setup
Often, we can find $x$ (always observed) such that missing at random (MAR) holds: $f(y \mid x, R = 0) = f(y \mid x)$.
Imputed values are created from $f(y \mid x)$.
Computing the conditional expectation can be a challenging problem.
1. We do not know the true parameter $\theta$ in $f(y \mid x) = f(y \mid x; \theta)$: $E\{g(y) \mid x\} = E\{g(y_i) \mid x_i; \theta\}$.
2. Even if we know $\theta$, computing the conditional expectation can be numerically difficult.
6 Introduction: Imputation
Imputation: Monte Carlo approximation of the conditional expectation (given the observed data).
$$E\{g(y_i) \mid x_i\} \cong \frac{1}{M} \sum_{j=1}^{M} g\left(y_i^{*(j)}\right)$$
1. Bayesian approach: generate $y_i^*$ from $f(y_i \mid x_i, y_{obs}) = \int f(y_i \mid x_i, \theta)\, p(\theta \mid x_i, y_{obs})\, d\theta$.
2. Frequentist approach: generate $y_i^*$ from $f(y_i \mid x_i; \hat{\theta})$, where $\hat{\theta}$ is a consistent estimator.
7 Comparison
| | Bayesian | Frequentist |
| Model | Posterior distribution $f(\text{latent}, \theta \mid \text{data})$ | Prediction model $f(\text{latent} \mid \text{data}, \theta)$ |
| Computation | Data augmentation | EM algorithm |
| Prediction | I-step | E-step |
| Parameter update | P-step | M-step |
| Parameter estimation | Posterior mode | ML estimation |
| Imputation | Multiple imputation | Fractional imputation |
| Variance estimation | Rubin's formula | Linearization or resampling |
8 Fractional Imputation: Basic Idea
Approximate $E\{g(y_i) \mid x_i\}$ by
$$E\{g(y_i) \mid x_i\} \cong \sum_{j=1}^{M_i} w_{ij}^* g\left(y_i^{*(j)}\right)$$
where $w_{ij}^*$ is the fractional weight assigned to $y_i^{*(j)}$, the j-th imputed value of $y_i$.
The fractional weights satisfy $\sum_{j=1}^{M_i} w_{ij}^* = 1$.
9 Fractional Imputation for categorical data
If $y_i$ is a categorical variable, we can use
$M_i$ = total number of possible values of $y_i$,
$y_i^{*(j)}$ = the j-th possible value of $y_i$,
$w_{ij}^* = P(y_i = y_i^{*(j)} \mid x_i; \hat{\theta})$,
where $\hat{\theta}$ is the pseudo MLE of $\theta$.
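A minimal sketch of the categorical case, assuming a binary $y$ and a logistic model for $P(y = 1 \mid x; \theta)$. The model, the value of `theta_hat`, the toy data, and the choice to normalize the sampling weights by their sum are all illustrative assumptions, not part of the slides.

```python
import numpy as np

def fractional_weights_binary(x, theta_hat):
    """Fractional weights for y in {0, 1} under an assumed logistic model.

    Column 0 holds P(y = 0 | x), column 1 holds P(y = 1 | x).
    """
    p1 = 1.0 / (1.0 + np.exp(-(theta_hat[0] + theta_hat[1] * x)))
    return np.column_stack([1.0 - p1, p1])

# Toy data (hypothetical): y is missing (NaN) for units 1 and 3.
x = np.array([0.2, -1.0, 1.5, 0.0])
y = np.array([1.0, np.nan, 0.0, np.nan])
w = np.array([10.0, 20.0, 10.0, 20.0])   # sampling weights
theta_hat = np.array([0.0, 1.0])         # stands in for the pseudo MLE

values = np.array([0.0, 1.0])            # the M_i = 2 possible values of y
frac = fractional_weights_binary(x, theta_hat)
observed = ~np.isnan(y)

# With g(y) = y, each missing g(y_i) is replaced by sum_j w*_ij g(y*_i(j)).
g_imputed = frac @ values
g_filled = np.where(observed, y, g_imputed)

# Weighted mean with weights normalized by their sum (an assumption).
eta_hat = np.sum(w * g_filled) / np.sum(w)
print(eta_hat)
```

Because every possible value is used as an imputed value, no random draws are needed and the fractional weights are simply the fitted conditional probabilities.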
10 Parametric Fractional Imputation (Kim, 2011)
More generally, we can write $y_i = (y_{i1}, \ldots, y_{ip})$ and $y_i$ can be partitioned into $(y_{i,obs}, y_{i,mis})$.
1. More than one (say M) imputed values of $y_{i,mis}$, denoted by $y_{i,mis}^{*(1)}, \ldots, y_{i,mis}^{*(M)}$, are generated from some density $h(y_{i,mis} \mid y_{i,obs})$.
2. Create the weighted data set $\{(w_i w_{ij}^*, y_{ij}^*);\ j = 1, 2, \ldots, M;\ i \in A\}$, where $\sum_{j=1}^{M} w_{ij}^* = 1$, $y_{ij}^* = (y_{i,obs}, y_{i,mis}^{*(j)})$,
$$w_{ij}^* \propto f(y_{ij}^*; \hat{\theta}) / h(y_{i,mis}^{*(j)} \mid y_{i,obs}),$$
$\hat{\theta}$ is the (pseudo) maximum likelihood estimator of $\theta$, and $f(y; \theta)$ is the joint density of $y$.
3. The weights $w_{ij}^*$ are the normalized importance weights and can be called fractional weights.
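The weighting step can be sketched as self-normalized importance sampling. The example below assumes a scalar missing $y$ with $y \mid x \sim N(\beta_0 + \beta_1 x, \sigma^2)$ and a normal proposal $h$; since $x$ is fully observed in this toy setup, its marginal density cancels when the weights are normalized, so only the conditional density appears. The function names, the value of `theta_hat`, and the proposal parameters are hypothetical.

```python
import numpy as np

def normal_logpdf(y, mean, sd):
    """Log density of N(mean, sd^2) evaluated at y."""
    return -0.5 * np.log(2.0 * np.pi * sd**2) - 0.5 * ((y - mean) / sd) ** 2

def pfi_weights(y_star, x, theta_hat, h_mean, h_sd):
    """Fractional weights w*_ij proportional to f(y*_ij; theta_hat) / h(y*_ij).

    y_star has shape (n_missing, M); each row is normalized to sum to one.
    """
    b0, b1, sigma = theta_hat
    log_f = normal_logpdf(y_star, b0 + b1 * x[:, None], sigma)
    log_h = normal_logpdf(y_star, h_mean, h_sd)
    log_w = log_f - log_h
    log_w -= log_w.max(axis=1, keepdims=True)   # guard against overflow
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
x_mis = np.array([0.0, 1.0, 2.0])       # covariates of the units with missing y
M = 2000
h_mean, h_sd = 1.0, 3.0                 # proposal h = N(1, 3^2), an assumption
y_star = rng.normal(h_mean, h_sd, size=(x_mis.size, M))

theta_hat = (0.5, 1.0, 1.0)             # hypothetical (beta0, beta1, sigma)
w_star = pfi_weights(y_star, x_mis, theta_hat, h_mean, h_sd)

# Fractionally weighted approximation to E(y | x) for each missing unit;
# under the assumed model this should be close to 0.5 + 1.0 * x.
cond_mean = (w_star * y_star).sum(axis=1)
print(cond_mean)
```

A proposal with heavier spread than the target keeps the weights bounded; a proposal that is too narrow would concentrate the weight on a few draws.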
11 EM algorithm using PFI
EM algorithm by fractional imputation:
1. Initial imputation: generate $y_{i,mis}^{*(j)} \sim h(y_{i,mis} \mid y_{i,obs})$.
2. E-step: compute
$$w_{ij(t)}^* \propto f(y_{ij}^*; \hat{\theta}_{(t)}) / h(y_{i,mis}^{*(j)} \mid y_{i,obs}),$$
where $\sum_{j=1}^{M} w_{ij(t)}^* = 1$.
3. M-step: update $\hat{\theta}_{(t+1)}$ as the solution to
$$\sum_{i \in A} \sum_{j=1}^{M} w_i w_{ij(t)}^* S(\theta; y_{ij}^*) = 0,$$
where $S(\theta; y) = \partial \log f(y; \theta) / \partial \theta$ is the score function of $\theta$.
4. Repeat Step 2 and Step 3 until convergence.
We may add an optional step that checks if $w_{ij(t)}^*$ is too large for some j. In this case, $h(y_{i,mis} \mid y_{i,obs})$ needs to be changed.
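The four steps can be sketched for a normal linear regression with a scalar $y$ that is missing for some units. This is only an illustration under stated assumptions: the model $y \mid x \sim N(\beta_0 + \beta_1 x, \sigma^2)$, the proposal (centered at the complete-case fit with an inflated standard deviation), the fixed number of iterations in place of a convergence check, and all function names are hypothetical choices.

```python
import numpy as np

def normal_logpdf(y, mean, sd):
    return -0.5 * np.log(2.0 * np.pi * sd**2) - 0.5 * ((y - mean) / sd) ** 2

def weighted_normal_mle(x, y, wt):
    """Solve the weighted score equation of y | x ~ N(b0 + b1*x, sigma^2)."""
    X = np.column_stack([np.ones_like(x), x])
    XtW = X.T * wt
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    resid = y - X @ beta
    sigma = np.sqrt(np.sum(wt * resid**2) / np.sum(wt))
    return beta[0], beta[1], sigma

def em_pfi(x, y, w, M=200, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    obs = ~np.isnan(y)
    # Starting value: weighted fit on the respondents only.
    b0, b1, sigma = weighted_normal_mle(x[obs], y[obs], w[obs])
    # Step 1: draw imputed values once from the proposal h and keep them fixed.
    h_mean = (b0 + b1 * x[~obs])[:, None]
    h_sd = 2.0 * sigma
    y_star = rng.normal(h_mean, h_sd, size=(h_mean.shape[0], M))
    log_h = normal_logpdf(y_star, h_mean, h_sd)
    x_rep = np.repeat(x[~obs], M)            # aligns with y_star.ravel()
    for _ in range(n_iter):
        # E-step: fractional weights proportional to f(y*; theta_t) / h(y*).
        log_f = normal_logpdf(y_star, (b0 + b1 * x[~obs])[:, None], sigma)
        log_w = log_f - log_h
        log_w -= log_w.max(axis=1, keepdims=True)
        frac = np.exp(log_w)
        frac /= frac.sum(axis=1, keepdims=True)
        # M-step: weighted score equation over observed and imputed rows.
        x_all = np.concatenate([x[obs], x_rep])
        y_all = np.concatenate([y[obs], y_star.ravel()])
        wt_all = np.concatenate([w[obs], (w[~obs][:, None] * frac).ravel()])
        b0, b1, sigma = weighted_normal_mle(x_all, y_all, wt_all)
    return b0, b1, sigma

# Toy data (hypothetical): true (beta0, beta1, sigma) = (1, 2, 0.5).
rng = np.random.default_rng(42)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
respond = rng.random(n) < 1.0 / (1.0 + np.exp(-(0.5 + 0.5 * x)))  # depends on x only
y_obs = np.where(respond, y, np.nan)
w = np.ones(n)                               # equal sampling weights

b0_hat, b1_hat, sigma_hat = em_pfi(x, y_obs, w)
print(b0_hat, b1_hat, sigma_hat)
```

Only the fractional weights change between iterations; the imputed values themselves are generated once, which is what distinguishes this scheme from a Monte Carlo EM that redraws at every E-step.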
12 Approximation: Calibration Fractional Imputation
In large-scale survey sampling, we prefer to have a smaller M.
Two-step method for fractional imputation:
1. Create a set of fractionally imputed data with size nM (say M = 1000).
2. Use an efficient sampling and weighting method to get a final set of fractionally imputed data with size nm (say m = 10).
Thus, we treat the step-one imputed data as a finite population and the step-two imputed data as a sample. We can use an efficient sampling technique (such as systematic sampling or stratification) to get the final imputed data and use a calibration technique for fractional weighting.
13 Why Fractional Imputation?
1. To improve the efficiency of the point estimator (vs. single imputation).
2. To obtain valid frequentist inference without the congeniality condition (vs. multiple imputation).
3. To handle an informative sampling mechanism.
We will discuss the third issue first.
14 Informative sampling design
Let $f(y \mid x)$ be the conditional distribution of $y$ given $x$. $x$ is always observed but $y$ is subject to missingness.
A sampling design is called noninformative (w.r.t. $f$) if it satisfies
$$f(y \mid x, I = 1) = f(y \mid x) \quad (1)$$
where $I_i = 1$ if $i \in A$ and $I_i = 0$ otherwise.
If (1) does not hold, then the sampling design is informative.
15 Missing At Random
Two versions of Missing At Random (MAR):
1. PMAR (Population Missing At Random): $Y \perp R \mid X$
2. SMAR (Sample Missing At Random): $Y \perp R \mid (X, I)$
Fractional imputation assumes PMAR while multiple imputation assumes SMAR.
Under a noninformative sampling design, PMAR = SMAR.
16 Imputation under informative sampling
Two approaches under informative sampling when PMAR holds:
1. Weighting approach: use the weighted score equation to estimate $\theta$ in $f(y \mid x; \theta)$. The imputed values are generated from $f(y \mid x; \hat{\theta})$.
2. Augmented model approach: include $w$ in the model covariates to get the augmented model $f(y \mid x, w; \phi)$. The augmented model makes the sampling design noninformative in the sense that $f(y \mid x, w) = f(y \mid x, w, I = 1)$. The imputed values are generated from $f(y \mid x, w; \hat{\phi})$, where $\hat{\phi}$ is computed from the unweighted score equation.
17 Berg, Kim, and Skinner (2015)
Figure: A directed acyclic graph (DAG), with nodes R, W, I, Y, X, and U, for a setup where PMAR holds but SMAR does not hold. Variable U is latent in the sense that it is never observed.
$f(y \mid x, R) = f(y \mid x)$ holds but $f(y \mid x, w, R) \neq f(y \mid x, w)$.
18 Imputation under informative sampling
The weighting approach generates imputed values from $f(y \mid x, R = 1)$ and
$$\hat{\theta}_I = \sum_{i \in A} w_i \{R_i y_i + (1 - R_i) y_i^*\} \quad (2)$$
is unbiased under PMAR.
The augmented model approach generates imputed values from $f(y \mid x, w, I = 1, R = 1)$ and (2) is unbiased when
$$f(y \mid x, w, I = 1, R = 1) = f(y \mid x, w, I = 1, R = 0) \quad (3)$$
holds. PMAR does not necessarily imply SMAR in (3).
Fractional imputation is based on the weighting approach.
19 Numerical illustration
A pseudo finite population constructed from a single month of data in the Monthly Retail Trade Survey (MRTS) at the US Bureau of Census.
N = 7,260 retail business units in five strata.
Three variables in the data:
$h$: stratum
$x_{hi}$: inventory values
$y_{hi}$: sales
20 Box plot of log sales and log inventory values by strata
[Figure: two panels, "Box plot of sales data by strata" and "Box plot of inventory data by strata"; horizontal axis: strata; vertical axis: log scale.]
21 Imputation model
$$\log(y_{hi}) = \beta_{0h} + \beta_1 \log(x_{hi}) + e_{hi}$$
where $e_{hi} \sim N(0, \sigma^2)$.
22 Residual plot and residual QQ plot
[Figure: two panels, "Residuals vs Fitted" (fitted values against residuals) and "Normal Q-Q" (theoretical quantiles against standardized residuals), for the regression model of log(y) against log(x) and strata indicator.]
23 Stratified random sampling
Table: The sample allocation in stratified simple random sampling. Columns: strata, strata size $N_h$, sample size $n_h$, sampling weight.
24 Response mechanism: PMAR
Variable $x_{hi}$ is always observed and only $y_{hi}$ is subject to missingness.
PMAR: $R_{hi} \sim \text{Bernoulli}(\pi_{hi})$, $\pi_{hi} = 1 / [1 + \exp\{4 - 0.3 \log(x_{hi})\}]$.
The overall response rate is about 0.6.
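A small sketch of this response mechanism, assuming the exponent reads $4 - 0.3 \log(x)$, which is the reading consistent with the stated response rate of about 0.6. The MRTS inventory values are not available here, so the distribution used for `x` below is purely hypothetical and chosen only so that the simulated response rate lands near 0.6.

```python
import numpy as np

def response_prob(x):
    """pi = 1 / [1 + exp{4 - 0.3 * log(x)}], increasing in x."""
    return 1.0 / (1.0 + np.exp(4.0 - 0.3 * np.log(x)))

rng = np.random.default_rng(1)

# Hypothetical inventory values: log-normal, not the actual MRTS data.
x = np.exp(rng.normal(loc=14.7, scale=1.0, size=10_000))

pi = response_prob(x)
R = rng.binomial(1, pi)          # R = 1: y observed, R = 0: y missing

print(R.mean())                  # simulated overall response rate
```

Because the response probability depends on $x$ only, the mechanism is missing at random given $x$ in the population, which is the PMAR condition.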
25 Simulation Study
Table 1: Monte Carlo bias and variance of the point estimators. Columns: parameter, estimator, bias, variance, Std Var. Rows for $\theta = E(Y)$: complete sample, MI, FI.
Table 2: Monte Carlo relative bias of the variance estimator.
| Parameter | Imputation | Relative bias (%) |
| $V(\hat{\theta})$ | MI | 18.4 |
| $V(\hat{\theta})$ | FI | 2.7 |
26 Discussion
Rubin's formula is based on the following decomposition:
$$V(\hat{\theta}_{MI}) = V(\hat{\theta}_n) + V(\hat{\theta}_{MI} - \hat{\theta}_n)$$
where $\hat{\theta}_n$ is the complete-sample estimator of $\theta$. Basically, the $W_M$ term estimates $V(\hat{\theta}_n)$ and the $(1 + M^{-1}) B_M$ term estimates $V(\hat{\theta}_{MI} - \hat{\theta}_n)$.
For the general case, we have
$$V(\hat{\theta}_{MI}) = V(\hat{\theta}_n) + V(\hat{\theta}_{MI} - \hat{\theta}_n) + 2\,Cov(\hat{\theta}_{MI} - \hat{\theta}_n, \hat{\theta}_n)$$
and Rubin's variance estimator ignores the covariance term. Thus, a sufficient condition for Rubin's variance estimator to be unbiased is $Cov(\hat{\theta}_{MI} - \hat{\theta}_n, \hat{\theta}_n) = 0$.
Meng (1994) called the condition congeniality of $\hat{\theta}_n$.
Congeniality holds when $\hat{\theta}_n$ is the MLE of $\theta$ (self-efficient estimator).
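For reference, the two terms combine as $W_M + (1 + M^{-1}) B_M$, where $W_M$ is the average of the M within-imputation variance estimates and $B_M$ is the sample variance of the M point estimates. A minimal sketch follows; the function name and the toy inputs are illustrative.

```python
import numpy as np

def rubin_variance(point_estimates, variance_estimates):
    """Return (theta_MI, W_M + (1 + 1/M) * B_M) from M imputed data sets."""
    theta = np.asarray(point_estimates, dtype=float)
    v = np.asarray(variance_estimates, dtype=float)
    M = theta.size
    theta_mi = theta.mean()          # multiple-imputation point estimator
    W = v.mean()                     # within-imputation variance W_M
    B = theta.var(ddof=1)            # between-imputation variance B_M
    return theta_mi, W + (1.0 + 1.0 / M) * B

# Toy inputs (hypothetical): three imputed data sets.
theta_mi, total_var = rubin_variance([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(theta_mi, total_var)
```

Nothing in this formula accounts for the covariance term in the general decomposition, which is why its validity rests on that covariance being zero.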
27 Discussion
For example, there are two estimators of $\theta = E(Y)$ when $\log(Y)$ follows $N(\beta_0 + \beta_1 x, \sigma^2)$.
1. Maximum likelihood method:
$$\hat{\theta}_{MLE} = n^{-1} \sum_{i=1}^{n} \exp\{\hat{\beta}_0 + \hat{\beta}_1 x_i + 0.5 \hat{\sigma}^2\}$$
2. Method of moments:
$$\hat{\theta}_{MME} = n^{-1} \sum_{i=1}^{n} y_i$$
The MME of $\theta = E(Y)$ does not satisfy congeniality and Rubin's variance estimator is biased (R.B. = 58.5%).
Rubin's variance estimator is essentially unbiased for the MLE of $\theta$ (R.B. = -1.9%), but the MLE is rarely used in practice.
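The two estimators can be compared on complete data with a short sketch. The simulated data, the parameter values, and the function names are hypothetical; the regression is fitted by ordinary least squares on the log scale, which coincides with the normal MLE of $(\beta_0, \beta_1)$, and $\hat{\sigma}^2$ uses the divisor $n$.

```python
import numpy as np

def theta_mle(x, y):
    """n^{-1} sum_i exp(b0 + b1 * x_i + 0.5 * sigma^2) with ML estimates plugged in."""
    X = np.column_stack([np.ones_like(x), x])
    z = np.log(y)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    sigma2 = np.mean(resid**2)           # ML estimate of sigma^2 (divisor n)
    return np.mean(np.exp(X @ beta + 0.5 * sigma2))

def theta_mme(y):
    """Method-of-moments estimator: the sample mean of y."""
    return np.mean(y)

# Toy data (hypothetical): log(Y) | x ~ N(1 + 0.5 x, 0.3^2), x ~ N(0, 1).
rng = np.random.default_rng(7)
n = 5000
x = rng.normal(size=n)
y = np.exp(1.0 + 0.5 * x + rng.normal(scale=0.3, size=n))

# With x ~ N(0, 1): E(Y) = exp(1 + 0.5^2 / 2 + 0.3^2 / 2) = exp(1.17).
theta_true = np.exp(1.0 + 0.125 + 0.045)
print(theta_mle(x, y), theta_mme(y), theta_true)
```

Both estimators target the same $\theta$; they differ in how much they lean on the log-normal model, which is what drives the difference in congeniality noted above.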
28 Summary
Fractional imputation is developed as a frequentist imputation.
Multiple imputation is motivated from a Bayesian framework. The frequentist validity of multiple imputation requires congeniality.
Fractional imputation does not require the congeniality condition and works well for method-of-moments estimators.
For informative sampling, the augmented model approach does not necessarily achieve SMAR. Fractional imputation uses the weighting approach for informative sampling.
29 The end
More informationMultiple Imputation Methods for Treatment Noncompliance and Nonresponse in Randomized Clinical Trials
UW Biostatistics Working Paper Series 2-19-2009 Multiple Imputation Methods for Treatment Noncompliance and Nonresponse in Randomized Clinical Trials Leslie Taylor UW, taylorl@u.washington.edu Xiao-Hua
More informationStreamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level
Streamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level A Monte Carlo Simulation to Test the Tenability of the SuperMatrix Approach Kyle M Lang Quantitative Psychology
More informationProblem Selected Scores
Statistics Ph.D. Qualifying Exam: Part II November 20, 2010 Student Name: 1. Answer 8 out of 12 problems. Mark the problems you selected in the following table. Problem 1 2 3 4 5 6 7 8 9 10 11 12 Selected
More informationSimple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation.
Statistical Computation Math 475 Jimin Ding Department of Mathematics Washington University in St. Louis www.math.wustl.edu/ jmding/math475/index.html October 10, 2013 Ridge Part IV October 10, 2013 1
More informationParametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012
Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood
More informationA Review of Pseudo-Marginal Markov Chain Monte Carlo
A Review of Pseudo-Marginal Markov Chain Monte Carlo Discussed by: Yizhe Zhang October 21, 2016 Outline 1 Overview 2 Paper review 3 experiment 4 conclusion Motivation & overview Notation: θ denotes the
More informationInterval Estimation III: Fisher's Information & Bootstrapping
Interval Estimation III: Fisher's Information & Bootstrapping Frequentist Confidence Interval Will consider four approaches to estimating confidence interval Standard Error (+/- 1.96 se) Likelihood Profile
More informationF & B Approaches to a simple model
A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 215 http://www.astro.cornell.edu/~cordes/a6523 Lecture 11 Applications: Model comparison Challenges in large-scale surveys
More informationMeasurement Error and Linear Regression of Astronomical Data. Brandon Kelly Penn State Summer School in Astrostatistics, June 2007
Measurement Error and Linear Regression of Astronomical Data Brandon Kelly Penn State Summer School in Astrostatistics, June 2007 Classical Regression Model Collect n data points, denote i th pair as (η
More informationChapter 17: Undirected Graphical Models
Chapter 17: Undirected Graphical Models The Elements of Statistical Learning Biaobin Jiang Department of Biological Sciences Purdue University bjiang@purdue.edu October 30, 2014 Biaobin Jiang (Purdue)
More informationBayesian Analysis (Optional)
Bayesian Analysis (Optional) 1 2 Big Picture There are two ways to conduct statistical inference 1. Classical method (frequentist), which postulates (a) Probability refers to limiting relative frequencies
More informationMissing Covariate Data in Matched Case-Control Studies
Missing Covariate Data in Matched Case-Control Studies Department of Statistics North Carolina State University Paul Rathouz Dept. of Health Studies U. of Chicago prathouz@health.bsd.uchicago.edu with
More informationA STRATEGY FOR STEPWISE REGRESSION PROCEDURES IN SURVIVAL ANALYSIS WITH MISSING COVARIATES. by Jia Li B.S., Beijing Normal University, 1998
A STRATEGY FOR STEPWISE REGRESSION PROCEDURES IN SURVIVAL ANALYSIS WITH MISSING COVARIATES by Jia Li B.S., Beijing Normal University, 1998 Submitted to the Graduate Faculty of the Graduate School of Public
More informationLecture Notes: Some Core Ideas of Imputation for Nonresponse in Surveys. Tom Rosenström University of Helsinki May 14, 2014
Lecture Notes: Some Core Ideas of Imputation for Nonresponse in Surveys Tom Rosenström University of Helsinki May 14, 2014 1 Contents 1 Preface 3 2 Definitions 3 3 Different ways to handle MAR data 4 4
More informationCSC321 Lecture 18: Learning Probabilistic Models
CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling
More informationBayesian Analysis of Multivariate Normal Models when Dimensions are Absent
Bayesian Analysis of Multivariate Normal Models when Dimensions are Absent Robert Zeithammer University of Chicago Peter Lenk University of Michigan http://webuser.bus.umich.edu/plenk/downloads.htm SBIES
More informationMeasurement error as missing data: the case of epidemiologic assays. Roderick J. Little
Measurement error as missing data: the case of epidemiologic assays Roderick J. Little Outline Discuss two related calibration topics where classical methods are deficient (A) Limit of quantification methods
More informationSTATS 200: Introduction to Statistical Inference. Lecture 29: Course review
STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout
More informationFor more information about how to cite these materials visit
Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/
More informationBayesian inference for multivariate extreme value distributions
Bayesian inference for multivariate extreme value distributions Sebastian Engelke Clément Dombry, Marco Oesting Toronto, Fields Institute, May 4th, 2016 Main motivation For a parametric model Z F θ of
More informationGeneralized Estimating Equations
Outline Review of Generalized Linear Models (GLM) Generalized Linear Model Exponential Family Components of GLM MLE for GLM, Iterative Weighted Least Squares Measuring Goodness of Fit - Deviance and Pearson
More information