The Use of Survey Weights in Regression Modelling

Size: px

Start display at page:

Download "The Use of Survey Weights in Regression Modelling"

Maurice Lyons
5 years ago
Views:

1 The Use of Survey Weights in Regression Modelling Chris Skinner London School of Economics and Political Science (with Jae-Kwang Kim, Iowa State University) Colorado State University, June

2 Weighting in Regression Modelling Established whether to weight implementation in practice Modern developments adapting to new modelling settings new ways of constructing (modifying) weights (for particular models) 2

3 Modifying Weights to retain benefits of weights, while mitigating disadvantages 3

4 Pros and Cons of Weighting Pros to avoid bias from informative sampling, when inclusion probabilities π j unequal (note: other approaches can also do this, e.g. sample likelihood, Pfeffermann, 2011) to protect against model misspecification to make efficient use of population-level information Cons variance inflation from unequal inclusion probabilities 4

5 Motivating Application: Cross-National Comparative Survey Analysis Common for π j to vary, especially between countries. population sizes of countries have a tremendous, thousand-fold range; whereas sample sizes tend to be made more constant in order to obtain similar errors for national means Kish (1994) But with great variation in π j, loss of efficiency from π 1 j weighting may be high. Want to retain consistency of weighted estimation while avoiding too much inflation of variance. 5

6 Outline 1. First kind of weight modification - theory 2. Application to European Social Survey 3. Second kind of weight modification - theory 4. Simulation study 6

7 Model Population U = {1, 2,..., N} Regression model f (y j ; x j, β) Score u j (y j ; x j, β) = log f (y j ;x j,β) β write u j (β) = u j (y j ; x j, β) Mean score model: E m { uj (β) } = 0, j = 1, 2,..., N 7

8 Census Parameter β U solves N u j (β) = 0. j=1 Kish and Frankel (1974), Godambe and Thompson (1986) say β U of interest even if model fails. But is it really always the appropriate pseudo-parameter when model fails? E.g. Scott and Wild (2002, 2003), Skinner (2003) 8

9 Sampling Sample s U Sample indicator variable I j = 1 if j s I j = 0 if j / s, 9

10 (Unweighted) Sample Estimating Equations β solves N I j u j (β) = 0. j=1 design-model consistent if E m E p { Ij u j (β) } = 0, j = 1, 2,..., N in particular, if I j and Y j independent (given x j ) noninformative sampling 10

11 Weighted Sample Estimating equations β w solves N I j w j u j (β) = 0. j=1 design-model consistency condition: E m E p { Ij w j u j (β) } = 0, j = 1, 2,..., N 11

12 Design Consistency β design consistent for β U if { N E p I j w j u j (β) } = j=1 N u j (β) j=1 for arbitrary y j and β so E p ( Ij w j ) = 1 design consistency condition 12

13 Design Weights Hence w j = π 1 j, Horvitz-Thompson weight or design weight, if w j is not sample-dependent Godambe and Thompson (1986, Theorem 1) Horvitz-Thompson weighted estimation equations optimal within broader class of sample estimating equations (which give design consistent estimation of β U ) 13

14 Widening Class of Weights But design consistency condition may not be appropriate scientifically. We drop it to enable us to improve efficiency. Now require only design-model consistency condition, but do assume mean score model holds. design consistency condition design-model consistency condition So class of possible weights is widened 14

15 First Kind of Weight Modification Modification by function of covariates. Class of modified weights w j = d j q j, where d j = π 1 j q j = q(x j ) (not sample dependent) and meets design-model consistency condition if mean score model holds. will generally not meet design-consistency condition unless q j constant 15

16 Optimization Problem to determine function q( ) which minimises var mp ( β w ) var mp ( β w ) = J(β) 1 var mp { N j=1 } w j I j u j (β) J(β) 1 where J(β) = E mp { N j=1 } u j (β) I j w j β 16

17 Approximations/Assumptions - observations for different units are approximately independent - generalized linear model so that u j (β) = e j x j var mp ( β w ) { N } 1 N { N } 1 j=1 q j E m ( e 2 j ) xj x T j j=1 qj 2 ( E m dj ej 2 ) xj x T j j=1 q j E m ( e 2 j ) xj x T j 17

18 (Approximately) Optimal Solution q j E ( ) m e 2 j x j ( ) E m dj ej 2 x j Equivalent to Fuller (2009, Sect 6.3.2) for linear regression model Requires fitting of model to E m ( dj e 2 j x j ) 18

19 Estimating q( ) e j is scaled version of residual y j E m (Y j x j ) As a first approximation, suppose d j and ej 2 set q j = 1/E m (d j x j ) are uncorrelated and And thus w j = d j q j = d j /E m (d j x j ) Design weight standardized for its dependence on x j Different from Pfeffermann and Sverchkov (1999, 2003) q j 1/E m ( dj I j = 1, x j ) Will discuss estimation of E m (d j x j ) later 19

20 Application: Voter Turnout in Europe European Social Survey Round subsample of 2621 people aged in 19 European countries, providing data on variables relating to political interest and civic duty analysis similar to Fieldhouse, Tranmer and Russell (2007) European J. Political Research 20

21 Logistic Regression Analysis y = 1, if voted in last national election in country = 0, otherwise x variables for rational choice model, including political efficacy - principal components of questions measuring extent to which respondents think they can understand and influence politics system benefits - principal components of respondent s feelings of civic duty 21

22 Estimated Coefficients of Logistic Regression variable unweighted s.e. weighted s.e political efficacy political efficacy closeness of contest (%) partisanship closeness*partnership collective benefits system benefits system benefits is female belongs to ethnic minority has partner has dependent child born in country

23 Estimated Coefficients of Logistic Regression variable unweighted s.e. weighted s.e political efficacy political efficacy closeness of contest (%) partisanship closeness*partnership collective benefits system benefits system benefits is female belongs to ethnic minority has partner has dependent child born in country

24 Test for Informative Sampling Augment model by adding interactions between weight and x variables. Test whether coefficients of new variables are all 0 Wald test F (14, 2607)=2.0, p= Conclude significant effect of weighting. DuMouchel and Duncan (1983), Fuller (2009) 24

25 Test as Diagnostic for Misspecification informative sampling may indicate omitted variable in model are there variables which explain π j which could be included in model? 25

26 Sample Selection separate sampling in different countries sampling schemes vary according to different sampling frames, e.g. - lists of residents - lists of households - lists of addresses 26

27 Respecified Model major variation in design weights between countries introduce country dummy variables in model scientifically reasonable, since aim is to study variations in turnout of young people in context of country variation 27

28 Initial model New model Variable unweighted weighted unweighted weighted political efficacy political efficacy closeness of contest (%) Partisanship closeness*partnership collective benefits system benefits system benefits is female belongs to ethnic minority has partner has dependent child born in country

29 Results for Respecified Model effect of weighting reduced no longer significant little sense in attempting to respecify model further to include within country design variables, since such variation in designs between countries 29

30 Standard Errors Variable Unweighted Weighted political efficacy political efficacy closeness of contest (%) Partisanship closeness*partnership collective benefits system benefits system benefits is female belongs to ethnic minority has partner has dependent child born in country

31 Within Country Weights as Modified Weights Recall modified weight: w j = d j q j = d j /E m (d j x j ) Let x (1) be vector of country dummy variables, subvector of x Simplify E m (d j x j ) to E m (d j x (1) j ) Weighting with d j /E m (d j x (1) j ) still consistent - achieves most of efficiency gain? Estimate E m (d j x (1) j ) by design-weighted mean of design weights d c(j) for country c(j) d j / d c(j) is within country weight 31

32 Standard Errors Within Design Variable Unweighted country weights political efficacy political efficacy closeness of contest (%) Partisanship closeness*partnership collective benefits system benefits system benefits is female belongs to ethnic minority has partner has dependent child born in country

33 Within Country Weighting consistent estimation, provided dependence of y on countries correctly specified in model protects against bias from informative selection within countries (non-significant test may lack power) avoids inflation of standard errors may give consistent estimation for more suitable pseudo-parameter under model misspecification? 33

34 Return to Theory 34

35 Widening Class of Weights Further Still assume design-model consistency condition and mean score model holds. Want to minimize Write var mp ( β w ) = J 1 var mp { N j=1 } w j I j u j (β) J 1 { N } var w j I j u j (β) = j=1 { N } { var E(w j y j, x j, I j )I j u j (β) + E var( j=1 35 N j=1 } w j I j u j (β) y j, x j, I j )

36 Second Kind of Weight Modification Consistency unaffected if E(w j y j, x j, I j )I j = E(d j y j, x j, I j )I j Variance minimized if var(w j y j, x j, I j ) = 0 Achieved by setting w j = E(d j y j, x j, I j = 1)q(x j ) d j q(x j ) modify d j to d j, smooths noise in weights unrelated to y j given x j c.f Beaumont (2008) weight smoothing Chaudhuri et al. (2010), Pfeffermann (2011) (conditional) empirical likelihood 36

37 Weight Smoothing in Descriptive Surveys T = j y j Horvitz-Thompson estimator T HT = j I jd j y j Smoothed Horvitz-Thompson estimator T SHT = j I j d j y j where d j E mp (d j y j, I j = 1) E mp ( T HT ) = E mp ( T SHT ) = T V mp ( T HT ) V mp ( T SHT ) 37

38 Weight Smoothing in Regression where d j E(d j y j, x j, I j = 1) w j = d j q(x j ) optimal choice q j E ( ) m e 2 j x j ) E m ( dj ej 2 x j 38

39 Auxiliary Weight Model(s) d j = E(d j y j, x j, I j = 1) = d(y j, x j ; φ) E.g. d(y j, x j ; φ) = 1 + exp( φ 1 x j φ 2 y j ) q j depends on E( d j e 2 j x j ) Iterative estimation of β and φ Kim and Skinner (2013) See also Beaumont (2008), Pfeffermann (2011) 39

40 first weight modification Variance Estimation - replacement of q(x j ) by q(x j ; α) does not affect asymptotic variance - consistent variance estimation by treating weights d j q j as fixed second weight modification - replacement of d j by d(y j, x j ; φ) does affect asymptotic variance - variance estimation does need take account of error in estimating φ - Kim and Skinner (2013) give linearization variance estimator 40

41 Simulation Study y j = β 0 + β 1 x j + e j π j = nk j i U k, k j = i Poisson and Chao (1982) sampling exp( z j ), z j N(1 + y j, ) Pfeffermann-Sverchkov q j = 1/E ( ) ( d j ) I j = 1, x j Our first modification q j = Em ej 2 (d x j ) j ej 2 x j E 41

42 Simulation Standard Errors - Intercept Weight Model A Model B Design Pfeffermann-Sverchkov Our first modification Smoothed design Smoothed P-S Our combined modification

43 Simulation Standard Errors - Slope Weight Model A Model B Design Pfeffermann-Sverchkov Our first modification Smoothed design Smoothed P-S Our combined modification

44 Summary Both weight modifications offer efficiency gains, assuming (mean score) model holds. First weight modification offers: consistency under misspecification of q(x) no need to modify variance estimation approach Smoothing requires: correct specification of E(d j y j, x j, I j = 1) for consistency modification of variance estimation approach In both cases, consider whether implied pseudoparameter under misspecification is scientifically reasonable. 44

45 References 1 Kim, J.K. and Skinner, C.J. (2013) Weighting in survey analysis under informative sampling. Biometrika, 100, Skinner, C.J. and Mason, B. (2012) Weighting in the regression analysis of survey data with a cross-national application. Canadian Journal of Statistics, 39,

46 References 2 Beaumont, J.-F. (2008) Biometrika Chaudhuri, S., Handcock, M. and Rendall, M. (2010) Working paper DuMouchel, W. and Duncan, G. (1983) JASA Fieldhouse, E. et al. (2007) Eur. J. Political Research Fuller, W. (2009) Sampling Statistics. Wiley Godambe, V. and Thompson, M. (1986) Int. Statist.Rev. Kish, L. (1994) Int. Statist. Rev. Kish, L. and Frankel, M. (1974) J.R.S.S.B Pfeffermann, D. (2011) Survey Methodology Pfeffermann, D. and Sverchkov, M. (1999) Sankhya, B Pfeffermann, D. and Sverchkov, M. (2003) in Chambers, R. and Skinner, C. eds. Analysis of Survey Data. Wiley. Scott, A. and Wild, C. (2002) J.R.S.S.B Scott, A. and Wild, C. (2003) in Chambers, R. and Skinner, C. eds. Analysis of Survey Data. Wiley. Skinner, C. (2003) in Chambers, R. and Skinner, C. eds. Analysis of Survey Data. Wiley. 46

47 47

Efficiency of Robust Regression Model in Sampling Designs Using Weighted Averages

IOSR Journal of Mathematics (IOSR-JM) e-iss: 78-578, p-iss: 319-765X. Volume 14, Issue 1 Ver. II (Jan. - Feb. 018), PP 41-45 www.iosrjournals.org Efficiency of Robust Regression Model in Sampling Designs