Can a Pseudo Panel be a Substitute for a Genuine Panel?


Min Hee Seo
Washington University in St. Louis
minheeseo@wustl.edu
February 16th

Outline
1. Motivation: gauging the mechanism of changes
2. Introducing pseudo panels as an alternative statistical tool
3. Limitations of existing pseudo panel approaches
4. Technique for improvement
5. Empirical analysis
6. Result
7. Conclusion

Motivation: Presidential Approval Rating
Figure: Changes in presidential approval rating at the individual level across 2010, 2012, and 2014, over five categories (Strongly Approve, Somewhat Approve, Neutral, Somewhat Disapprove, Strongly Disapprove). Data: CCES Panel, 2010-2012-2014.

Motivation
1. Panel survey data are rarely available
2. Panel surveys are costly and less feasible to conduct
3. Cross-sectional surveys have limitations

Pseudo Panel as Alternative Tool
Advantages:
1. Different sources can be combined
2. Approximation of a true panel
Disadvantages:
1. Measurement error (observed - true)
2. Absence of robust techniques
3. Contested reliability of pseudo panels
4. Not yet applied in political science
Types:
1. Macro/cohort level
2. Individual level

Pseudo Panel with Matching Technique
What it does:
1. Finds a unit with similar observable characteristics
2. Reduces bias due to confounding
3. Enables a comparison of outcomes among matched and original units
Nearest neighbor matching:
1. Propensity scores are a common tool for matching cases
2. Match on the nearest distance of a scalar, $\pi$
Propensity score: $\pi = \Pr(Y = 1 \mid X)$
Distance: $\mathrm{Distance}(X_i, X_j) = |\pi_i - \pi_j|$
Limitations:
1. Applies only to complete cases
2. Focuses on the distribution of covariates based on a single criterion rather than one-to-one exact matching
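The nearest neighbor step above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: it assumes the propensity scores have already been estimated, and the function name `nearest_neighbor_match`, the respondent IDs, and the score values are all hypothetical. Matching here is with replacement, which the slides do not specify.

```python
def nearest_neighbor_match(scores_a, scores_b):
    """For each unit in wave A, return the ID of the wave-B unit whose
    propensity score is closest in absolute distance |pi_i - pi_j|."""
    matches = {}
    for id_a, pi_a in scores_a.items():
        # min() keeps the first candidate on ties (dict insertion order).
        matches[id_a] = min(scores_b, key=lambda id_b: abs(pi_a - scores_b[id_b]))
    return matches


# Hypothetical, already-estimated propensity scores keyed by respondent ID.
wave_2000 = {1: 0.62, 2: 0.18}
wave_2002 = {1001: 0.20, 1002: 0.55, 1003: 0.90}

matches = nearest_neighbor_match(wave_2000, wave_2002)
print(matches)  # {1: 1002, 2: 1001}
```

Because the match relies on a single scalar, two respondents can be paired even when they differ on several individual covariates, which is the second limitation listed above.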

Pseudo Panel: Affinity Score Matching
Finds an exact match between two individuals based on n dimensions (accounting for discrete variables and missing values)

ID    year  y  x1  x2  x3  x4  x5
1     2000  3  NA  1   2   0   0
2     2000  1  1   0   2   0   1

ID    year  y  x1  x2  x3  x4  x5
1001  2002  4  0   2   2   0   0
1002  2002  5  0   1   1   1   0

Table. Process of Constructing a Pseudo Panel with Affinity Score Matching

Pseudo Panel: Affinity Score Matching (continued)
Finds an exact match between two individuals based on n dimensions (accounting for discrete variables and missing values)

ID    year  y  x1  x2  x3  x4  x5
1     2000  3  NA  1   0   0   0
2     2000  1  1   0   2   0   1

ID    year  y  x1  x2  x3  x4  x5
1001  2002  4  0   2   2   0   0
1002  2002  5  0   1   1   1   0

Table. Process of Constructing a Pseudo Panel with Affinity Score Matching
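Once pairwise affinity scores are available, linking two waves into pseudo-panel rows might look like the sketch below. It is an illustration under stated assumptions: the function `build_pseudo_panel`, the score values, and the rule of keeping only the single best candidate above a 0.8 threshold are hypothetical choices, since the slides do not describe the linking procedure in detail. The IDs, years, and y values are borrowed from the example table.

```python
def build_pseudo_panel(wave1, wave2, scores, threshold=0.8):
    """Link each wave-1 respondent to the best-scoring wave-2 respondent.

    wave1, wave2: dicts mapping respondent ID -> (year, y)
    scores: dict mapping (id1, id2) -> affinity score in [0, 1]
    Returns long-format rows (pseudo_id, year, y). Respondents without a
    candidate above the threshold are left out.
    """
    rows = []
    for id1, (year1, y1) in wave1.items():
        candidates = {id2: scores[(id1, id2)] for id2 in wave2}
        best = max(candidates, key=candidates.get)
        if candidates[best] > threshold:
            year2, y2 = wave2[best]
            # Both observations share the wave-1 ID as the pseudo-panel ID.
            rows.append((id1, year1, y1))
            rows.append((id1, year2, y2))
    return rows


wave_2000 = {1: (2000, 3), 2: (2000, 1)}
wave_2002 = {1001: (2002, 4), 1002: (2002, 5)}
# Hypothetical affinity scores for every (2000 ID, 2002 ID) pair.
scores = {(1, 1001): 0.86, (1, 1002): 0.57, (2, 1001): 0.43, (2, 1002): 0.71}

panel = build_pseudo_panel(wave_2000, wave_2002, scores)
print(panel)  # [(1, 2000, 3), (1, 2002, 4)]
```

In this toy run respondent 2 has no candidate above the threshold and therefore drops out of the pseudo panel.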

Validation and Empirical Application
Data:
1. Survey data: Cooperative Congressional Election Study (CCES)
2. Both panel and cross-sectional surveys (2010-2012-2014)
3. n = 9500
Measurement:
1. Response variable: Obama's approval rating (5-point scale)
2. Explanatory variable: positive perception of the national economy between two waves
3. Control variables: female, party identification, education, race, income
Model strategy:
$\text{Approval Rating}_i = \alpha_{j[i]} + \beta_{[i]}\,\text{time}_i \times \text{BetterEcon}_i + \varepsilon_i$
$\alpha_j \sim N(\mu_\alpha, \sigma^2_\alpha)$

Result: Varying Intercept Model on Obama's Approval Rating

                      True Panel     Affinity Matching Pseudo   Propensity Matching Pseudo
Democrat              0.93 (0.018)   2.30 (0.018)               2.01 (0.019)
Republican            0.59 (0.018)   0.88 (0.018)               0.79 (0.019)
Time2:BetterEcon.t1   0.17 (0.014)   0.66 (0.025)               1.25 (0.028)
Time3:BetterEcon.t2   0.10 (0.014)   0.72 (0.027)               0.26 (0.028)
$\sigma_\alpha$       1.24           0.16                       0.02
$\sigma_y$            0.28           0.98                       1.10
ICC                   0.82           0.14                       0.02

Number of observations = 28500; unique individuals = 9500. Standard errors are in parentheses.
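The ICC row can be checked against the two rows above it. The sketch below assumes the usual intraclass correlation for a varying-intercept model, between-group component divided by the sum of the between-group and residual components, and it assumes the reported $\sigma_\alpha$ and $\sigma_y$ entries are entered into that ratio as printed. The slides do not state the formula; this reading is adopted only because it reproduces the reported values.

```python
def icc(between, within):
    """Share of total variation attributable to the varying intercepts."""
    return between / (between + within)


# (sigma_alpha, sigma_y) as printed in the result table.
reported = {
    "True panel": (1.24, 0.28),
    "Pseudo panel (affinity)": (0.16, 0.98),
    "Pseudo panel (propensity)": (0.02, 1.10),
}

iccs = {name: round(icc(b, w), 2) for name, (b, w) in reported.items()}
print(iccs)
# {'True panel': 0.82, 'Pseudo panel (affinity)': 0.14, 'Pseudo panel (propensity)': 0.02}
```

Read this way, most of the variation in the true panel sits between individuals, while in both pseudo panels it sits almost entirely in the residual.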

Method: Statistical Power
Power: 1 - the probability of making a Type II error ($\beta$)
Estimates the precision of inferences
Expectations:
1. True panel > Pseudo panel (affinity score)
2. Pseudo panel (affinity score) > Pseudo panel (propensity score)
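To make the definition concrete, the following sketch computes power for a textbook two-sided z-test. The slides do not say how power was obtained for the hierarchical model, so this is only an illustration of power as 1 minus the Type II error rate; the function name `z_test_power` and the effect and standard error values are hypothetical.

```python
from statistics import NormalDist


def z_test_power(effect, se, alpha=0.05):
    """Power of a two-sided z-test: P(reject H0 | true effect)."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect / se
    # Rejection happens in either tail of the shifted test statistic.
    upper = 1 - std_normal.cdf(critical - shift)
    lower = std_normal.cdf(-critical - shift)
    return upper + lower


# Hypothetical effect size and standard error.
power = z_test_power(effect=0.17, se=0.08)
type_ii_error = 1 - power
print(round(power, 3), round(type_ii_error, 3))
```

With a true effect of zero the function returns the significance level itself, and power rises as the effect grows relative to its standard error.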

Result
Figure: Approval rating trajectories (Strongly Disapprove to Strongly Approve) by year for sampled individuals, in three panels: (a) True Panel, (b) Pseudo Panel (Affinity Score), (c) Pseudo Panel (Propensity Score). Power values listed: 0.533, 0.532, 0.513.

Conclusion
Summary:
1. Existing studies that construct pseudo panels with matching techniques have limitations
2. An improved matching technique for building pseudo panels is suggested
Findings:
1. A pseudo panel can approximate true panel data
2. A more feasible technique for building pseudo panels is introduced

Where to go next?
Limitations: examined 1) a short period, 2) a specific outcome variable, and 3) one specific type of pseudo panel
Future studies:
1. Explore the local level, different datasets, and different types of pseudo panel
2. Identify panel attrition by applying the affinity score matching technique
3. Power analysis in dynamic hierarchical models
4. Multiple imputation in longitudinal studies

Supplementary Materials
Detail:
1. Riverplot
2. Cohort Pseudo Panel
3. Affinity Score
4. CCES
5. Aggregated Estimation
6. Data - Graphics
Climate Change Model:
1. Individual-level - Table
2. Individual-level - Posterior Distribution
3. Cohort-level - Table
4. Cohort-level - Posterior Distribution

River plot of approval rating at the individual level:
1. Data: CCES
2. Panel: n = 9500, complete cases = 9449
3. Average percentage of n in each category: 47%, 8%, 1%, 25%, 19%
4. Percentage of n who changed their opinion over three waves: 32%

Cohort Pseudo Panel:
The sample is divided into a small number of cohorts with a large number of observations in each (Browning et al. 1985; Propper, Rees, and Green 2001). A cohort is defined by time-invariant variables such as birth year. The analysis is at the aggregated level.
$\bar{y}_{ct} = \bar{x}_{ct}\beta + \bar{\alpha}_{ct} + \bar{u}_{ct}$, where $c = 1, \dots, C$ and $t = 1, \dots, T$
1. If $n_c$ is large enough, the time-varying $\bar{\alpha}_{ct}$ can be treated as constant over time, $\bar{\alpha}_c$.
2. Bias due to sampling error in the cohort averages exists and can be substantial even for sample sizes in the thousands.
3. There is no robust approach to building a cohort pseudo panel: little discussion, but many blind applications.
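The aggregation step behind a cohort pseudo panel can be sketched as follows. This is a minimal illustration assuming cohorts are defined by fixed-width birth-year bands; the function name `cohort_means`, the band width, and the records are hypothetical and not taken from the CCES data.

```python
from collections import defaultdict


def cohort_means(records, span=10):
    """Average y within each (birth cohort, survey year) cell.

    records: iterable of (birth_year, survey_year, y) from repeated
    cross-sections; the same person need not appear in more than one year.
    """
    cells = defaultdict(lambda: [0.0, 0])
    for birth_year, survey_year, y in records:
        cohort = birth_year // span * span  # e.g. 1957 -> 1950 when span=10
        cell = cells[(cohort, survey_year)]
        cell[0] += y
        cell[1] += 1
    return {key: total / n for key, (total, n) in cells.items()}


# Hypothetical respondents drawn from two cross-sectional surveys.
records = [
    (1957, 2010, 4), (1952, 2010, 2), (1963, 2010, 3),
    (1957, 2012, 5), (1968, 2012, 1),
]
means = cohort_means(records)
print(means)
```

Each cohort then plays the role of one "individual" followed over time, which is why small cell sizes make the cohort averages noisy.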

Affinity Score Computation:
$\text{Affinity Score}_{i,j} = \dfrac{k_i - q_i - z_{i,j}}{k_i - q_i}$
$k_i$: the total number of variables of interest for individual $i$
$q_i$: the number of variables with missing values for individual $i$
$z_{i,j}$: the number of variables on which $i$ and $j$ have different values
$\text{Affinity Score}_{i,j}$: the number of variables on which the two individuals match exactly, divided by the total number of variables of interest for individual $i$
Threshold: > 0.8 (among 7 dimensions, 6 must match exactly)
Criteria: age, gender, education, race, party identification, ideology, income
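A small sketch of the score, reading the slide's formula as $(k_i - q_i - z_{i,j}) / (k_i - q_i)$. Two details are assumptions rather than statements from the slides: missing values are encoded as `None`, and a value that is missing for $j$ on a variable observed for $i$ is counted as a difference. The function name `affinity_score` is hypothetical.

```python
def affinity_score(x_i, x_j):
    """(k_i - q_i - z_ij) / (k_i - q_i), with None marking a missing value."""
    k = len(x_i)                          # variables of interest
    q = sum(v is None for v in x_i)       # variables missing for i
    if k == q:
        raise ValueError("individual i has no observed variables")
    # Compare only variables observed for i (assumption: a value missing
    # for j on such a variable counts as a difference).
    z = sum(a != b for a, b in zip(x_i, x_j) if a is not None)
    return (k - q - z) / (k - q)


# Seven hypothetical dimensions: six exact matches clear the 0.8 threshold,
# five do not.
six_of_seven = affinity_score([1, 0, 2, 3, 1, 4, 2], [1, 0, 2, 3, 1, 4, 5])
five_of_seven = affinity_score([1, 0, 2, 3, 1, 4, 2], [1, 0, 2, 3, 1, 0, 5])
print(six_of_seven > 0.8, five_of_seven > 0.8)  # True False

# A respondent with x1 missing compared with a fully observed respondent:
# four usable variables, one mismatch.
with_missing = affinity_score([None, 1, 2, 0, 0], [0, 2, 2, 0, 0])
print(with_missing)  # 0.75
```

Dropping the missing variables from both numerator and denominator lets incomplete cases be matched instead of being discarded.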

CCES
1. Cross-sectional: 2010 (n = 55400), 2012 (n = 54535), 2014 (n = 56200)
2. Approval rating: 1 (strongly disapprove) to 5 (strongly approve)
3. National economy status: 1 (gotten much worse) to 5 (gotten much better)

Result
Figure: Approval rating (Strongly Disapprove to Strongly Approve) against support for same-sex marriage (Oppose to Support), in three panels: (a) True Panel, (b) Pseudo Panel (Affinity Score), (c) Pseudo Panel (Propensity Score). Values of p listed: 0.533, 0.532, 0.513.

Panel and Pseudo Panel Dataset

1. Pseudo Panel by Nearest Neighbor Propensity Score Matching
ID  year  y  x1  x2  x3  x4
1   2000  3  1   1   2   0
1   2002  1  1   0   3   0
1   2004  5  0   1   2   0

2. Pseudo Panel by Affinity Score Matching
ID  year  y  x1  x2  x3  x4
1   2000  3  1   1   2   0
1   2002  1  1   1   2   0
1   2004  5  0   1   3   0

vs. True Panel Survey
ID  year  y  x1  x2  x3  x4
1   2000  3  1   1   2   0
1   2002  4  0   1   2   0
1   2004  5  0   1   2   0

Individual-Level Analysis Result
Logistic Hierarchical Model on Belief in Global Warming

                       Panel           Pseudo Panel (Affinity)
Unusual Temperature    0.041 (0.019)   0.012 (0.019)
Female                 0.575 (0.239)   0.605 (0.228)
White                  0.535 (0.311)   0.701 (0.309)
Age                    0.005 (0.008)   0.001 (0.008)
Education              0.160 (0.062)   0.213 (0.062)
Democrat               1.041 (0.317)   0.222 (0.270)
Republican             2.203 (0.248)   2.132 (0.247)
Interest in Politics   0.315 (0.151)   0.072 (0.140)
Intercept              0.655 (7.281)   0.646 (6.948)
$\sigma_t^2$           0.758 (2.374)   0.884 (3.044)
N                      654             654
N panelist             218             218
N wave                 3               3

Individual-Level Analysis Result
Figure: Comparison of posterior distributions of Unusual Temperature for the panel, the pseudo panel (affinity), and their difference. x-axis: estimate size of Unusual Temperature; y-axis: density.

Cohort-Level Analysis Result
Table: Logistic Hierarchical Model on Belief in Global Warming

                          Panel                            Pseudo Panel (Affinity)
                          10-year cohort   3-year cohort   10-year cohort   3-year cohort
Unusual Temperature       0.048 (0.020)    0.053 (0.023)   0.014 (0.019)    0.014 (0.019)
Intercept                 0.004 (0.997)    0.052 (0.986)   0.004 (0.961)    0.060 (0.957)
$\sigma_c^2$              1.013 (0.521)    1.043 (0.252)   0.042 (0.245)    0.131 (0.170)
$\sigma_t^2$              1.084 (2.636)    0.840 (1.515)   0.783 (1.765)    0.940 (2.714)
ICC for $\sigma_c^2$      0.235            0.241           0.013            0.038
Control Variables
N                         654              654             654              654
N panelist                218              218             218              218
N wave                    3                3               3                3
N cohort                  7                22              7                22

Cohort-Level Analysis Result
Figure: Comparison of posterior distributions of Unusual Temperature by cohort group for the panel, the pseudo panel (affinity), and their difference, in two panels: (a) 10-year Age Span Cohort Group, (b) 3-year Age Span Cohort Group. x-axis: estimate size of Unusual Temperature; y-axis: density.