The Use of Survey Weights in Regression Modelling

1 The Use of Survey Weights in Regression Modelling Chris Skinner London School of Economics and Political Science (with Jae-Kwang Kim, Iowa State University) Colorado State University, June

2 Weighting in Regression Modelling
Established:
- whether to weight
- implementation in practice
Modern developments:
- adapting to new modelling settings
- new ways of constructing (modifying) weights (for particular models)

3 Modifying Weights
- to retain benefits of weights, while mitigating disadvantages

4 Pros and Cons of Weighting
Pros:
- to avoid bias from informative sampling, when inclusion probabilities $\pi_j$ are unequal (note: other approaches can also do this, e.g. sample likelihood, Pfeffermann, 2011)
- to protect against model misspecification
- to make efficient use of population-level information
Cons:
- variance inflation from unequal inclusion probabilities

5 Motivating Application: Cross-National Comparative Survey Analysis
Common for $\pi_j$ to vary, especially between countries:
"population sizes of countries have a tremendous, thousand-fold range; whereas sample sizes tend to be made more constant in order to obtain similar errors for national means" Kish (1994)
But with great variation in $\pi_j$, the loss of efficiency from $\pi_j^{-1}$ weighting may be high.
Want to retain consistency of weighted estimation while avoiding too much inflation of variance.

6 Outline
1. First kind of weight modification - theory
2. Application to European Social Survey
3. Second kind of weight modification - theory
4. Simulation study

7 Model
Population $U = \{1, 2, \ldots, N\}$
Regression model $f(y_j; x_j, \beta)$
Score $u_j(y_j; x_j, \beta) = \partial \log f(y_j; x_j, \beta) / \partial \beta$; write $u_j(\beta) = u_j(y_j; x_j, \beta)$
Mean score model: $E_m\{u_j(\beta)\} = 0$, $j = 1, 2, \ldots, N$

8 Census Parameter
$\beta_U$ solves $\sum_{j=1}^{N} u_j(\beta) = 0$.
Kish and Frankel (1974) and Godambe and Thompson (1986) say $\beta_U$ is of interest even if the model fails.
But is it really always the appropriate pseudo-parameter when the model fails? E.g. Scott and Wild (2002, 2003), Skinner (2003)

9 Sampling
Sample $s \subset U$
Sample indicator variable: $I_j = 1$ if $j \in s$, $I_j = 0$ if $j \notin s$

10 (Unweighted) Sample Estimating Equations
$\hat\beta$ solves $\sum_{j=1}^{N} I_j u_j(\beta) = 0$.
Design-model consistent if $E_m E_p\{I_j u_j(\beta)\} = 0$, $j = 1, 2, \ldots, N$
In particular, if $I_j$ and $Y_j$ are independent (given $x_j$): noninformative sampling

11 Weighted Sample Estimating Equations
$\hat\beta_w$ solves $\sum_{j=1}^{N} I_j w_j u_j(\beta) = 0$.
Design-model consistency condition: $E_m E_p\{I_j w_j u_j(\beta)\} = 0$, $j = 1, 2, \ldots, N$
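
As a minimal sketch, the weighted sample estimating equations can be solved by Newton-Raphson for a logistic regression with an intercept and one covariate. The function name `solve_weighted_logistic`, the toy data and the restriction to a single covariate are illustrative assumptions, not taken from the slides.

```python
import math

def solve_weighted_logistic(y, x, w, n_iter=25):
    """Solve sum_j w_j (y_j - p_j) (1, x_j) = 0 over the sampled units."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        s0 = s1 = h00 = h01 = h11 = 0.0
        for yj, xj, wj in zip(y, x, w):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xj)))
            r = wj * (yj - p)        # weighted score contribution
            v = wj * p * (1.0 - p)   # weighted information contribution
            s0 += r
            s1 += r * xj
            h00 += v
            h01 += v * xj
            h11 += v * xj * xj
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * step = score by hand
        b0 += (h11 * s0 - h01 * s1) / det
        b1 += (h00 * s1 - h01 * s0) / det
    return b0, b1

# Toy sample (hypothetical): binary covariate, unequal weights
y = [1, 0, 0, 1, 1, 0]
x = [0, 0, 0, 1, 1, 1]
w = [1, 1, 2, 2, 1, 1]
b0_w, b1_w = solve_weighted_logistic(y, x, w)        # weighted fit
b0_u, b1_u = solve_weighted_logistic(y, x, [1] * 6)  # unweighted fit
```

With a binary covariate the model is saturated, so the weighted solution reproduces the weighted proportions within each covariate group and the unweighted solution reproduces the raw proportions. That makes the effect of the weights easy to see on a small example.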

12 Design Consistency
$\hat\beta_w$ is design consistent for $\beta_U$ if
$E_p\{\sum_{j=1}^{N} I_j w_j u_j(\beta)\} = \sum_{j=1}^{N} u_j(\beta)$ for arbitrary $y_j$ and $\beta$,
so $E_p(I_j w_j) = 1$: design consistency condition

13 Design Weights
Hence $w_j = \pi_j^{-1}$, the Horvitz-Thompson weight or design weight, if $w_j$ is not sample-dependent.
Godambe and Thompson (1986, Theorem 1): Horvitz-Thompson weighted estimating equations are optimal within a broader class of sample estimating equations (which give design consistent estimation of $\beta_U$)
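
The design consistency condition can be checked exactly on a tiny population by enumerating every possible sample. The sketch below assumes Poisson sampling (independent inclusion of each unit), which is a simplifying choice made here for illustration. The function name and the numbers are hypothetical.

```python
from itertools import product

def expected_weighted_total(y, pi, w):
    """Exact E_p{ sum_j I_j w_j y_j } under Poisson sampling (all 2^N samples)."""
    total = 0.0
    for indicators in product([0, 1], repeat=len(y)):
        prob = 1.0
        for i, p in zip(indicators, pi):
            prob *= p if i else (1.0 - p)   # independent inclusions
        total += prob * sum(i * wj * yj for i, wj, yj in zip(indicators, w, y))
    return total

y = [3.0, -1.0, 4.0, 2.0]       # arbitrary values standing in for u_j(beta)
pi = [0.2, 0.5, 0.8, 0.4]       # unequal inclusion probabilities
ht = expected_weighted_total(y, pi, [1.0 / p for p in pi])  # design weights
unw = expected_weighted_total(y, pi, [1.0] * 4)             # no weights
```

With $w_j = \pi_j^{-1}$ the expectation equals the population total of the `y` values, whatever those values are. With unit weights it equals $\sum_j \pi_j y_j$ instead.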

14 Widening Class of Weights
But the design consistency condition may not be appropriate scientifically. We drop it to enable us to improve efficiency.
Now require only the design-model consistency condition, but do assume the mean score model holds.
design consistency condition $\Rightarrow$ design-model consistency condition
So the class of possible weights is widened

15 First Kind of Weight Modification
Modification by a function of covariates.
Class of modified weights: $w_j = d_j q_j$, where $d_j = \pi_j^{-1}$ and $q_j = q(x_j)$ (not sample dependent)
- meets the design-model consistency condition if the mean score model holds
- will generally not meet the design consistency condition unless $q_j$ is constant
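
A short sketch of why the modified weight keeps design-model consistency. It assumes that $E_p$ is taken with $y_j$ and $x_j$ held fixed, so that $E_p(I_j d_j) = \pi_j \pi_j^{-1} = 1$, and that $E_m$ is taken with the covariates $x_j$ treated as fixed:

```latex
\begin{align*}
E_m E_p\{ I_j d_j q(x_j) u_j(\beta) \}
  &= E_m\{ q(x_j)\, u_j(\beta)\, E_p(I_j d_j) \} \\
  &= q(x_j)\, E_m\{ u_j(\beta) \} \\
  &= 0 .
\end{align*}
```

The last step uses the mean score model. By contrast, $E_p(I_j w_j) = q(x_j)$, which equals 1 only when $q$ is identically one.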

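16 Optimization Problem
To determine the function $q(\cdot)$ which minimises $\mathrm{var}_{mp}(\hat\beta_w)$:
$\mathrm{var}_{mp}(\hat\beta_w) = J(\beta)^{-1} \, \mathrm{var}_{mp}\{\sum_{j=1}^{N} w_j I_j u_j(\beta)\} \, J(\beta)^{-1}$
where $J(\beta) = E_{mp}\{\sum_{j=1}^{N} I_j w_j \, \partial u_j(\beta) / \partial \beta\}$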

17 Approximations/Assumptions
- observations for different units are approximately independent
- generalized linear model, so that $u_j(\beta) = e_j x_j$
$\mathrm{var}_{mp}(\hat\beta_w) \approx \{\sum_{j=1}^{N} q_j E_m(e_j^2) x_j x_j^T\}^{-1} \{\sum_{j=1}^{N} q_j^2 E_m(d_j e_j^2) x_j x_j^T\} \{\sum_{j=1}^{N} q_j E_m(e_j^2) x_j x_j^T\}^{-1}$

18 (Approximately) Optimal Solution
$q_j \propto E_m(e_j^2 \mid x_j) / E_m(d_j e_j^2 \mid x_j)$
Equivalent to Fuller (2009, Sect 6.3.2) for the linear regression model
Requires fitting of a model to $E_m(d_j e_j^2 \mid x_j)$
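
To illustrate the optimal choice numerically, the sketch below evaluates the approximate sandwich variance in the special case of a single scalar covariate. Here `a` and `b` are hypothetical stand-ins for $E_m(e_j^2 \mid x_j)$ and $E_m(d_j e_j^2 \mid x_j)$, and all numbers are made up for the example.

```python
def approx_variance(q, a, b, x):
    """Scalar-covariate sandwich: sum(q^2 b x^2) / (sum(q a x^2))^2."""
    bread = sum(qj * aj * xj * xj for qj, aj, xj in zip(q, a, x))
    meat = sum(qj * qj * bj * xj * xj for qj, bj, xj in zip(q, b, x))
    return meat / bread ** 2

a = [1.0, 1.0, 2.0, 2.0]     # assumed values of E_m(e_j^2 | x_j)
b = [2.0, 10.0, 4.0, 40.0]   # assumed values of E_m(d_j e_j^2 | x_j)
x = [1.0, 2.0, 1.0, 3.0]

q_opt = [aj / bj for aj, bj in zip(a, b)]          # q_j proportional to a_j / b_j
v_opt = approx_variance(q_opt, a, b, x)
v_flat = approx_variance([1.0] * 4, a, b, x)       # q_j = 1, i.e. plain design weights
v_scaled = approx_variance([3.0 * qj for qj in q_opt], a, b, x)
```

In this scalar case the Cauchy-Schwarz inequality gives the lower bound $1 / \sum_j (a_j^2 x_j^2 / b_j)$, which is attained at $q_j \propto a_j / b_j$. Rescaling $q$ by a constant leaves the variance unchanged, which is why the solution is only defined up to proportionality.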

19 Estimating $q(\cdot)$
$e_j$ is a scaled version of the residual $y_j - E_m(Y_j \mid x_j)$
As a first approximation, suppose $d_j$ and $e_j^2$ are uncorrelated and set $q_j = 1/E_m(d_j \mid x_j)$
And thus $w_j = d_j q_j = d_j / E_m(d_j \mid x_j)$: design weight standardized for its dependence on $x_j$
Different from Pfeffermann and Sverchkov (1999, 2003): $q_j \propto 1/E_m(d_j \mid I_j = 1, x_j)$
Will discuss estimation of $E_m(d_j \mid x_j)$ later

20 Application: Voter Turnout in Europe
European Social Survey Round
- subsample of 2621 people aged in 19 European countries, providing data on variables relating to political interest and civic duty
- analysis similar to Fieldhouse, Tranmer and Russell (2007) European J. Political Research

21 Logistic Regression Analysis
y = 1 if voted in last national election in country, = 0 otherwise
x variables for rational choice model, including:
- political efficacy: principal components of questions measuring the extent to which respondents think they can understand and influence politics
- system benefits: principal components of respondent's feelings of civic duty

22 Estimated Coefficients of Logistic Regression
Columns: variable; unweighted; s.e.; weighted; s.e.
Rows: political efficacy; political efficacy; closeness of contest (%); partisanship; closeness*partisanship; collective benefits; system benefits; system benefits; is female; belongs to ethnic minority; has partner; has dependent child; born in country

24 Test for Informative Sampling
Augment the model by adding interactions between the weight and the x variables.
Test whether the coefficients of the new variables are all 0.
Wald test: F(14, 2607) = 2.0, p=
Conclude significant effect of weighting.
DuMouchel and Duncan (1983), Fuller (2009)
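
A minimal sketch of the augmented-model idea, written for ordinary linear regression with one covariate rather than the logistic model used in the application. It compares the fit of `y ~ 1 + x` with `y ~ 1 + x + d + d*x` through an extra-sum-of-squares F statistic. The helper names, the linear-model setting and the toy data are assumptions made for illustration only.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def rss(X, y):
    """Residual sum of squares from least squares of y on the columns of X."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[k] for row in X) for k in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(bk * xk for bk, xk in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

def informative_sampling_f(y, x, d):
    """F statistic for H0: the coefficients of d and d*x are both zero."""
    n = len(y)
    X0 = [[1.0, xj] for xj in x]
    X1 = [[1.0, xj, dj, dj * xj] for xj, dj in zip(x, d)]
    rss0, rss1 = rss(X0, y), rss(X1, y)
    k, p = 2, 4   # number of added terms, size of the augmented model
    return ((rss0 - rss1) / k) / (rss1 / (n - p))

# Toy data (hypothetical): in y_inf the outcome depends on the weight itself
x = [float(j) for j in range(12)]
d = [1.0, 2.0, 1.0, 3.0] * 3
e = [0.1, 0.1, -0.1, -0.1] * 3
y_inf = [1 + 2 * xj + 3 * dj + ej for xj, dj, ej in zip(x, d, e)]
y_non = [1 + 2 * xj + ej for xj, ej in zip(x, e)]
f_inf = informative_sampling_f(y_inf, x, d)
f_non = informative_sampling_f(y_non, x, d)
```

A large statistic, as for `y_inf`, suggests the weights carry information about the outcome beyond the covariates. For survey data, a design-based variance estimator would normally replace the simple residual variance used in this sketch.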

25 Test as Diagnostic for Misspecification
- informative sampling may indicate an omitted variable in the model
- are there variables which explain $\pi_j$ which could be included in the model?

26 Sample Selection
- separate sampling in different countries
- sampling schemes vary according to different sampling frames, e.g.
  - lists of residents
  - lists of households
  - lists of addresses

27 Respecified Model
- major variation in design weights between countries
- introduce country dummy variables in model
- scientifically reasonable, since the aim is to study variations in turnout of young people in the context of country variation

28 Initial model / New model
Columns: Variable; Initial model (unweighted, weighted); New model (unweighted, weighted)
Rows: political efficacy; political efficacy; closeness of contest (%); partisanship; closeness*partisanship; collective benefits; system benefits; system benefits; is female; belongs to ethnic minority; has partner; has dependent child; born in country

29 Results for Respecified Model
- effect of weighting reduced, no longer significant
- little sense in attempting to respecify the model further to include within-country design variables, since there is such variation in designs between countries

30 Standard Errors
Columns: Variable; Unweighted; Weighted
Rows: political efficacy; political efficacy; closeness of contest (%); partisanship; closeness*partisanship; collective benefits; system benefits; system benefits; is female; belongs to ethnic minority; has partner; has dependent child; born in country

31 Within Country Weights as Modified Weights
Recall modified weight: $w_j = d_j q_j = d_j / E_m(d_j \mid x_j)$
Let $x^{(1)}$ be the vector of country dummy variables, a subvector of $x$
Simplify $E_m(d_j \mid x_j)$ to $E_m(d_j \mid x_j^{(1)})$
Weighting with $d_j / E_m(d_j \mid x_j^{(1)})$ is still consistent - achieves most of the efficiency gain?
Estimate $E_m(d_j \mid x_j^{(1)})$ by $\bar d_{c(j)}$, the design-weighted mean of the design weights for country $c(j)$
$d_j / \bar d_{c(j)}$ is the within country weight
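
The within country weight can be sketched in a few lines. The function below takes the design-weighted mean of the design weights in country $c$ to be $\sum d_j^2 / \sum d_j$ over the sampled units of that country. The function name and the example weights are hypothetical.

```python
from collections import defaultdict

def within_country_weights(d, country):
    """Return d_j / dbar_c(j), with dbar_c the design-weighted mean of d in country c."""
    num = defaultdict(float)   # sum of d_j * d_j per country
    den = defaultdict(float)   # sum of d_j per country
    for dj, c in zip(d, country):
        num[c] += dj * dj
        den[c] += dj
    return [dj / (num[c] / den[c]) for dj, c in zip(d, country)]

country = ["A", "A", "A", "B", "B"]
w = within_country_weights([100.0, 100.0, 300.0, 2.0, 2.0], country)
# Multiplying every design weight in country A by 10 changes nothing:
w_scaled = within_country_weights([1000.0, 1000.0, 3000.0, 2.0, 2.0], country)
```

The rescaling check shows the point of the construction: between-country differences in the level of the design weights drop out, and only the within-country variation remains.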

32 Standard Errors
Columns: Variable; Unweighted; Within country weights; Design weights
Rows: political efficacy; political efficacy; closeness of contest (%); partisanship; closeness*partisanship; collective benefits; system benefits; system benefits; is female; belongs to ethnic minority; has partner; has dependent child; born in country

33 Within Country Weighting
- consistent estimation, provided dependence of y on countries is correctly specified in the model
- protects against bias from informative selection within countries (non-significant test may lack power)
- avoids inflation of standard errors
- may give consistent estimation for a more suitable pseudo-parameter under model misspecification?

34 Return to Theory

35 Widening Class of Weights Further
Still assume the design-model consistency condition and that the mean score model holds.
Want to minimize $\mathrm{var}_{mp}(\hat\beta_w) = J^{-1} \, \mathrm{var}_{mp}\{\sum_{j=1}^{N} w_j I_j u_j(\beta)\} \, J^{-1}$
Write
$\mathrm{var}\{\sum_{j=1}^{N} w_j I_j u_j(\beta)\} = \mathrm{var}\{\sum_{j=1}^{N} E(w_j \mid y_j, x_j, I_j) I_j u_j(\beta)\} + E\{\mathrm{var}(\sum_{j=1}^{N} w_j I_j u_j(\beta) \mid y_j, x_j, I_j)\}$

36 Second Kind of Weight Modification
Consistency unaffected if $E(w_j \mid y_j, x_j, I_j) I_j = E(d_j \mid y_j, x_j, I_j) I_j$
Variance minimized if $\mathrm{var}(w_j \mid y_j, x_j, I_j) = 0$
Achieved by setting $w_j = E(d_j \mid y_j, x_j, I_j = 1) \, q(x_j) \equiv \tilde d_j \, q(x_j)$
- modify $d_j$ to $\tilde d_j$: smooths noise in weights unrelated to $y_j$ given $x_j$
- c.f. Beaumont (2008) weight smoothing
- Chaudhuri et al. (2010), Pfeffermann (2011) (conditional) empirical likelihood

37 Weight Smoothing in Descriptive Surveys
$T = \sum_j y_j$
Horvitz-Thompson estimator: $\hat T_{HT} = \sum_j I_j d_j y_j$
Smoothed Horvitz-Thompson estimator: $\hat T_{SHT} = \sum_j I_j \tilde d_j y_j$, where $\tilde d_j = E_{mp}(d_j \mid y_j, I_j = 1)$
$E_{mp}(\hat T_{HT}) = E_{mp}(\hat T_{SHT}) = T$
$V_{mp}(\hat T_{HT}) \ge V_{mp}(\hat T_{SHT})$
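
The variance inequality can be illustrated with a small Monte Carlo sketch under strong simplifying assumptions: Poisson sampling, inclusion probabilities that are pure noise unrelated to $y_j$ (0.1 or 0.5 with equal chance), and a smoothing model that is known exactly. In that setting $E(d_j \mid y_j, I_j = 1) = E(d_j \pi_j)/E(\pi_j) = 1/0.3$. All names and numbers are illustrative.

```python
import random
import statistics

def simulate(n_rep=2000, N=200, seed=1):
    """Return replicated HT and smoothed-HT estimates of the total of y."""
    rng = random.Random(seed)
    y = [1.0 + (j % 5) for j in range(N)]   # fixed population values, total 600
    ht, sht = [], []
    for _ in range(n_rep):
        t_ht = t_sht = 0.0
        for yj in y:
            pi_j = 0.1 if rng.random() < 0.5 else 0.5   # noise unrelated to y
            if rng.random() < pi_j:                     # Poisson sampling
                t_ht += yj / pi_j                       # design weight 1/pi_j
                t_sht += yj / 0.3                       # smoothed weight 1/E(pi_j)
        ht.append(t_ht)
        sht.append(t_sht)
    return ht, sht

ht, sht = simulate()
mean_ht, mean_sht = statistics.mean(ht), statistics.mean(sht)
var_ht, var_sht = statistics.variance(ht), statistics.variance(sht)
```

Both estimators are centred near the true total, while the smoothed version has the smaller variance because the noise in the weights has been averaged out. In practice the smoothing model has to be estimated, so the gain depends on that model being well specified.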

38 Weight Smoothing in Regression
$w_j = \tilde d_j \, q(x_j)$, where $\tilde d_j = E(d_j \mid y_j, x_j, I_j = 1)$
Optimal choice: $q_j \propto E_m(e_j^2 \mid x_j) / E_m(\tilde d_j e_j^2 \mid x_j)$

39 Auxiliary Weight Model(s)
$\tilde d_j = E(d_j \mid y_j, x_j, I_j = 1) = d(y_j, x_j; \phi)$
E.g. $d(y_j, x_j; \phi) = 1 + \exp(-\phi_1 x_j - \phi_2 y_j)$
$q_j$ depends on $E(\tilde d_j e_j^2 \mid x_j)$
Iterative estimation of $\beta$ and $\phi$: Kim and Skinner (2013)
See also Beaumont (2008), Pfeffermann (2011)

40 Variance Estimation
First weight modification:
- replacement of $q(x_j)$ by $q(x_j; \hat\alpha)$ does not affect asymptotic variance
- consistent variance estimation by treating weights $d_j q_j$ as fixed
Second weight modification:
- replacement of $\tilde d_j$ by $d(y_j, x_j; \hat\phi)$ does affect asymptotic variance
- variance estimation does need to take account of error in estimating $\phi$
- Kim and Skinner (2013) give a linearization variance estimator

41 Simulation Study
$y_j = \beta_0 + \beta_1 x_j + e_j$
$\pi_j = n k_j / \sum_{i \in U} k_i$, $k_j = \exp(z_j)$, $z_j \sim N(1 + y_j, )$
Poisson and Chao (1982) sampling
Pfeffermann-Sverchkov: $q_j = 1/E(d_j \mid I_j = 1, x_j)$
Our first modification: $q_j = E_m(e_j^2 \mid x_j) / E_m(d_j e_j^2 \mid x_j)$

42 Simulation Standard Errors - Intercept
Columns: Weight; Model A; Model B
Rows: Design; Pfeffermann-Sverchkov; Our first modification; Smoothed design; Smoothed P-S; Our combined modification

43 Simulation Standard Errors - Slope
Columns: Weight; Model A; Model B
Rows: Design; Pfeffermann-Sverchkov; Our first modification; Smoothed design; Smoothed P-S; Our combined modification

44 Summary
Both weight modifications offer efficiency gains, assuming the (mean score) model holds.
First weight modification offers:
- consistency under misspecification of $q(x)$
- no need to modify variance estimation approach
Smoothing requires:
- correct specification of $E(d_j \mid y_j, x_j, I_j = 1)$ for consistency
- modification of variance estimation approach
In both cases, consider whether the implied pseudo-parameter under misspecification is scientifically reasonable.

45 References 1
Kim, J.K. and Skinner, C.J. (2013) Weighting in survey analysis under informative sampling. Biometrika, 100,
Skinner, C.J. and Mason, B. (2012) Weighting in the regression analysis of survey data with a cross-national application. Canadian Journal of Statistics, 39,

46 References 2
Beaumont, J.-F. (2008) Biometrika
Chaudhuri, S., Handcock, M. and Rendall, M. (2010) Working paper
DuMouchel, W. and Duncan, G. (1983) JASA
Fieldhouse, E. et al. (2007) Eur. J. Political Research
Fuller, W. (2009) Sampling Statistics. Wiley
Godambe, V. and Thompson, M. (1986) Int. Statist. Rev.
Kish, L. (1994) Int. Statist. Rev.
Kish, L. and Frankel, M. (1974) J.R.S.S.B
Pfeffermann, D. (2011) Survey Methodology
Pfeffermann, D. and Sverchkov, M. (1999) Sankhya, B
Pfeffermann, D. and Sverchkov, M. (2003) in Chambers, R. and Skinner, C. eds. Analysis of Survey Data. Wiley.
Scott, A. and Wild, C. (2002) J.R.S.S.B
Scott, A. and Wild, C. (2003) in Chambers, R. and Skinner, C. eds. Analysis of Survey Data. Wiley.
Skinner, C. (2003) in Chambers, R. and Skinner, C. eds. Analysis of Survey Data. Wiley.


More information

REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY

REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY J.D. Opsomer, W.A. Fuller and X. Li Iowa State University, Ames, IA 50011, USA 1. Introduction Replication methods are often used in

More information

Example. Multiple Regression. Review of ANOVA & Simple Regression /749 Experimental Design for Behavioral and Social Sciences

Example. Multiple Regression. Review of ANOVA & Simple Regression /749 Experimental Design for Behavioral and Social Sciences 36-309/749 Experimental Design for Behavioral and Social Sciences Sep. 29, 2015 Lecture 5: Multiple Regression Review of ANOVA & Simple Regression Both Quantitative outcome Independent, Gaussian errors

More information

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies Kosuke Imai Department of Politics Princeton University November 13, 2013 So far, we have essentially assumed

More information

Econometrics for PhDs

Econometrics for PhDs Econometrics for PhDs Amine Ouazad April 2012, Final Assessment - Answer Key 1 Questions with a require some Stata in the answer. Other questions do not. 1 Ordinary Least Squares: Equality of Estimates

More information

Auxiliary Variables in Mixture Modeling: Using the BCH Method in Mplus to Estimate a Distal Outcome Model and an Arbitrary Secondary Model

Auxiliary Variables in Mixture Modeling: Using the BCH Method in Mplus to Estimate a Distal Outcome Model and an Arbitrary Secondary Model Auxiliary Variables in Mixture Modeling: Using the BCH Method in Mplus to Estimate a Distal Outcome Model and an Arbitrary Secondary Model Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 21 Version

More information

Covariate Balancing Propensity Score for General Treatment Regimes

Covariate Balancing Propensity Score for General Treatment Regimes Covariate Balancing Propensity Score for General Treatment Regimes Kosuke Imai Princeton University October 14, 2014 Talk at the Department of Psychiatry, Columbia University Joint work with Christian

More information

Exercise sheet 6 Models with endogenous explanatory variables

Exercise sheet 6 Models with endogenous explanatory variables Exercise sheet 6 Models with endogenous explanatory variables Note: Some of the exercises include estimations and references to the data files. Use these to compare them to the results you obtained with

More information

Statistical Analysis of List Experiments

Statistical Analysis of List Experiments Statistical Analysis of List Experiments Graeme Blair Kosuke Imai Princeton University December 17, 2010 Blair and Imai (Princeton) List Experiments Political Methodology Seminar 1 / 32 Motivation Surveys

More information

1. Capitalize all surnames and attempt to match with Census list. 3. Split double-barreled names apart, and attempt to match first half of name.

1. Capitalize all surnames and attempt to match with Census list. 3. Split double-barreled names apart, and attempt to match first half of name. Supplementary Appendix: Imai, Kosuke and Kabir Kahnna. (2016). Improving Ecological Inference by Predicting Individual Ethnicity from Voter Registration Records. Political Analysis doi: 10.1093/pan/mpw001

More information

Two-phase sampling approach to fractional hot deck imputation

Two-phase sampling approach to fractional hot deck imputation Two-phase sampling approach to fractional hot deck imputation Jongho Im 1, Jae-Kwang Kim 1 and Wayne A. Fuller 1 Abstract Hot deck imputation is popular for handling item nonresponse in survey sampling.

More information

Recent Advances in the analysis of missing data with non-ignorable missingness

Recent Advances in the analysis of missing data with non-ignorable missingness Recent Advances in the analysis of missing data with non-ignorable missingness Jae-Kwang Kim Department of Statistics, Iowa State University July 4th, 2014 1 Introduction 2 Full likelihood-based ML estimation

More information

FREQUENTIST BEHAVIOR OF FORMAL BAYESIAN INFERENCE

FREQUENTIST BEHAVIOR OF FORMAL BAYESIAN INFERENCE FREQUENTIST BEHAVIOR OF FORMAL BAYESIAN INFERENCE Donald A. Pierce Oregon State Univ (Emeritus), RERF Hiroshima (Retired), Oregon Health Sciences Univ (Adjunct) Ruggero Bellio Univ of Udine For Perugia

More information

Weighting. Homework 2. Regression. Regression. Decisions Matching: Weighting (0) W i. (1) -å l i. )Y i. (1-W i 3/5/2014. (1) = Y i.

Weighting. Homework 2. Regression. Regression. Decisions Matching: Weighting (0) W i. (1) -å l i. )Y i. (1-W i 3/5/2014. (1) = Y i. Weighting Unconfounded Homework 2 Describe imbalance direction matters STA 320 Design and Analysis of Causal Studies Dr. Kari Lock Morgan and Dr. Fan Li Department of Statistical Science Duke University

More information

Unequal Probability Designs

Unequal Probability Designs Unequal Probability Designs Department of Statistics University of British Columbia This is prepares for Stat 344, 2014 Section 7.11 and 7.12 Probability Sampling Designs: A quick review A probability

More information

Weight calibration and the survey bootstrap

Weight calibration and the survey bootstrap Weight and the survey Department of Statistics University of Missouri-Columbia March 7, 2011 Motivating questions 1 Why are the large scale samples always so complex? 2 Why do I need to use weights? 3

More information

Selection on Observables

Selection on Observables Selection on Observables Hasin Yousaf (UC3M) 9th November Hasin Yousaf (UC3M) Selection on Observables 9th November 1 / 22 Summary Altonji, Elder and Taber, JPE, 2005 Bellows and Miguel, JPubE, 2009 Oster,

More information

Poisson Regression. Gelman & Hill Chapter 6. February 6, 2017

Poisson Regression. Gelman & Hill Chapter 6. February 6, 2017 Poisson Regression Gelman & Hill Chapter 6 February 6, 2017 Military Coups Background: Sub-Sahara Africa has experienced a high proportion of regime changes due to military takeover of governments for

More information

Appendix A. Numeric example of Dimick Staiger Estimator and comparison between Dimick-Staiger Estimator and Hierarchical Poisson Estimator

Appendix A. Numeric example of Dimick Staiger Estimator and comparison between Dimick-Staiger Estimator and Hierarchical Poisson Estimator Appendix A. Numeric example of Dimick Staiger Estimator and comparison between Dimick-Staiger Estimator and Hierarchical Poisson Estimator As described in the manuscript, the Dimick-Staiger (DS) estimator

More information

Likelihood-based inference with missing data under missing-at-random

Likelihood-based inference with missing data under missing-at-random Likelihood-based inference with missing data under missing-at-random Jae-kwang Kim Joint work with Shu Yang Department of Statistics, Iowa State University May 4, 014 Outline 1. Introduction. Parametric

More information

SAMPLING BIOS 662. Michael G. Hudgens, Ph.D. mhudgens :55. BIOS Sampling

SAMPLING BIOS 662. Michael G. Hudgens, Ph.D.   mhudgens :55. BIOS Sampling SAMPLIG BIOS 662 Michael G. Hudgens, Ph.D. mhudgens@bios.unc.edu http://www.bios.unc.edu/ mhudgens 2008-11-14 15:55 BIOS 662 1 Sampling Outline Preliminaries Simple random sampling Population mean Population

More information

A double-hurdle count model for completed fertility data from the developing world

A double-hurdle count model for completed fertility data from the developing world A double-hurdle count model for completed fertility data from the developing world Alfonso Miranda (alfonso.miranda@cide.edu) Center for Research and Teaching in Economics CIDE México c A. Miranda (p.

More information

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design 1 / 32 Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design Changbao Wu Department of Statistics and Actuarial Science University of Waterloo (Joint work with Min Chen and Mary

More information

AdaBoost. Lecturer: Authors: Center for Machine Perception Czech Technical University, Prague

AdaBoost. Lecturer: Authors: Center for Machine Perception Czech Technical University, Prague AdaBoost Lecturer: Jan Šochman Authors: Jan Šochman, Jiří Matas Center for Machine Perception Czech Technical University, Prague http://cmp.felk.cvut.cz Motivation Presentation 2/17 AdaBoost with trees

More information

Topics and Papers for Spring 14 RIT

Topics and Papers for Spring 14 RIT Eric Slud Feb. 3, 204 Topics and Papers for Spring 4 RIT The general topic of the RIT is inference for parameters of interest, such as population means or nonlinearregression coefficients, in the presence

More information

Økonomisk Kandidateksamen 2004 (I) Econometrics 2. Rettevejledning

Økonomisk Kandidateksamen 2004 (I) Econometrics 2. Rettevejledning Økonomisk Kandidateksamen 2004 (I) Econometrics 2 Rettevejledning This is a closed-book exam (uden hjælpemidler). Answer all questions! The group of questions 1 to 4 have equal weight. Within each group,

More information

Nonparametric Regression Estimation of Finite Population Totals under Two-Stage Sampling

Nonparametric Regression Estimation of Finite Population Totals under Two-Stage Sampling Nonparametric Regression Estimation of Finite Population Totals under Two-Stage Sampling Ji-Yeon Kim Iowa State University F. Jay Breidt Colorado State University Jean D. Opsomer Colorado State University

More information

Topic 10: Panel Data Analysis

Topic 10: Panel Data Analysis Topic 10: Panel Data Analysis Advanced Econometrics (I) Dong Chen School of Economics, Peking University 1 Introduction Panel data combine the features of cross section data time series. Usually a panel

More information

Primal-dual Covariate Balance and Minimal Double Robustness via Entropy Balancing

Primal-dual Covariate Balance and Minimal Double Robustness via Entropy Balancing Primal-dual Covariate Balance and Minimal Double Robustness via (Joint work with Daniel Percival) Department of Statistics, Stanford University JSM, August 9, 2015 Outline 1 2 3 1/18 Setting Rubin s causal

More information

Regression-Discontinuity Analysis

Regression-Discontinuity Analysis Page 1 of 11 Home» Analysis» Inferential Statistics» Regression-Discontinuity Analysis Analysis Requirements The basic RD Design is a two-group pretestposttest model as indicated in the design notation.

More information

ECNS 561 Multiple Regression Analysis

ECNS 561 Multiple Regression Analysis ECNS 561 Multiple Regression Analysis Model with Two Independent Variables Consider the following model Crime i = β 0 + β 1 Educ i + β 2 [what else would we like to control for?] + ε i Here, we are taking

More information

Statistical Models for Causal Analysis

Statistical Models for Causal Analysis Statistical Models for Causal Analysis Teppei Yamamoto Keio University Introduction to Causal Inference Spring 2016 Three Modes of Statistical Inference 1. Descriptive Inference: summarizing and exploring

More information

Survival Analysis Math 434 Fall 2011

Survival Analysis Math 434 Fall 2011 Survival Analysis Math 434 Fall 2011 Part IV: Chap. 8,9.2,9.3,11: Semiparametric Proportional Hazards Regression Jimin Ding Math Dept. www.math.wustl.edu/ jmding/math434/fall09/index.html Basic Model Setup

More information

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score Causal Inference with General Treatment Regimes: Generalizing the Propensity Score David van Dyk Department of Statistics, University of California, Irvine vandyk@stat.harvard.edu Joint work with Kosuke

More information

Combining multiple observational data sources to estimate causal eects

Combining multiple observational data sources to estimate causal eects Department of Statistics, North Carolina State University Combining multiple observational data sources to estimate causal eects Shu Yang* syang24@ncsuedu Joint work with Peng Ding UC Berkeley May 23,

More information

ISQS 5349 Spring 2013 Final Exam

ISQS 5349 Spring 2013 Final Exam ISQS 5349 Spring 2013 Final Exam Name: General Instructions: Closed books, notes, no electronic devices. Points (out of 200) are in parentheses. Put written answers on separate paper; multiple choices

More information

Lecture 9: Learning Optimal Dynamic Treatment Regimes. Donglin Zeng, Department of Biostatistics, University of North Carolina

Lecture 9: Learning Optimal Dynamic Treatment Regimes. Donglin Zeng, Department of Biostatistics, University of North Carolina Lecture 9: Learning Optimal Dynamic Treatment Regimes Introduction Refresh: Dynamic Treatment Regimes (DTRs) DTRs: sequential decision rules, tailored at each stage by patients time-varying features and

More information

Calibration estimation using exponential tilting in sample surveys

Calibration estimation using exponential tilting in sample surveys Calibration estimation using exponential tilting in sample surveys Jae Kwang Kim February 23, 2010 Abstract We consider the problem of parameter estimation with auxiliary information, where the auxiliary

More information

Problem Set 3: Bootstrap, Quantile Regression and MCMC Methods. MIT , Fall Due: Wednesday, 07 November 2007, 5:00 PM

Problem Set 3: Bootstrap, Quantile Regression and MCMC Methods. MIT , Fall Due: Wednesday, 07 November 2007, 5:00 PM Problem Set 3: Bootstrap, Quantile Regression and MCMC Methods MIT 14.385, Fall 2007 Due: Wednesday, 07 November 2007, 5:00 PM 1 Applied Problems Instructions: The page indications given below give you

More information

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Multilevel Models in Matrix Form Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Today s Lecture Linear models from a matrix perspective An example of how to do

More information

Technical Appendix C: Methods

Technical Appendix C: Methods Technical Appendix C: Methods As not all readers may be familiar with the multilevel analytical methods used in this study, a brief note helps to clarify the techniques. The general theory developed in

More information

Final Exam. Name: Solution:

Final Exam. Name: Solution: Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.

More information