Can a Pseudo Panel be a Substitute for a Genuine Panel?
Min Hee Seo
Washington University in St. Louis
minheeseo@wustl.edu
February 16th
Outline
1. Motivation: gauging the mechanism of change
2. Introducing pseudo panels as an alternative statistical tool
3. Limitations of existing pseudo panel approaches
4. Technique for improvement
5. Empirical analysis
6. Results
7. Conclusion
Motivation: Presidential Approval Rating
Figure: Changes in presidential approval rating at the individual level. Categories: Strongly Approve, Somewhat Approve, Neutral, Somewhat Disapprove, Strongly Disapprove. Waves: 2010, 2012, 2014.
Data: CCES Panel, 2010-2012-2014.
Motivation
1. Panel survey data are rarely available
2. Panel surveys are costly and often infeasible to conduct
3. Cross-sectional surveys are limited: they do not follow the same individuals over time
Pseudo Panel as Alternative Tool
Advantages:
1. Different sources can be combined
2. Approximation of a true panel
Disadvantages:
1. Measurement error (observed - true)
2. Absence of robust techniques
3. Controversial reliability of pseudo panels
4. Not applied in political science
Types:
1. Macro/cohort level
2. Individual level
Pseudo Panel with Matching Technique
What it does:
1. Finds a unit with similar observable characteristics
2. Reduces bias due to confounding
3. Enables a comparison of outcomes among matched and original units
Nearest neighbor matching:
Propensity scores are a common tool for matching cases
Match on the nearest distance of a scalar, $\pi$
Propensity score: $\pi = \Pr(Y = 1 \mid X)$
$\mathrm{Distance}(X_i, X_j) = |\pi_i - \pi_j|$
Limitations:
1. Applies only to complete cases
2. Focuses on the distribution of covariates based on a single criterion rather than one-to-one exact matching
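The nearest neighbor rule above can be sketched in a few lines. This is a minimal illustration, not the procedure used in the talk: the function name `nearest_neighbor_match`, the unit IDs, and the propensity scores are all hypothetical, and matching is done with replacement.

```python
def nearest_neighbor_match(scores_a, scores_b):
    """Match each unit in wave A to the wave-B unit with the closest score.

    scores_a, scores_b: dicts mapping unit ID -> propensity score (pi).
    Matching is with replacement, so a wave-B unit may be reused.
    """
    matches = {}
    for id_a, pi_a in scores_a.items():
        # Distance(X_i, X_j) = |pi_i - pi_j|
        matches[id_a] = min(scores_b, key=lambda id_b: abs(pi_a - scores_b[id_b]))
    return matches


# Hypothetical scores, for illustration only.
wave_2000 = {1: 0.62, 2: 0.30}
wave_2002 = {1001: 0.35, 1002: 0.58, 1003: 0.90}
pairs = nearest_neighbor_match(wave_2000, wave_2002)
print(pairs)
```

Because every covariate is collapsed into the single scalar $\pi$, two units can be matched even when few of their individual covariates agree, which is the second limitation listed above.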
Pseudo Panel: Affinity Score Matching
Finds exact matches between two individuals based on n dimensions (accounting for discrete variables and missing values)

ID    year  y  x1  x2  x3  x4  x5
1     2000  3  NA  1   2   0   0
2     2000  1  1   0   2   0   1

ID    year  y  x1  x2  x3  x4  x5
1001  2002  4  0   2   2   0   0
1002  2002  5  0   1   1   1   0

Table. Process of Constructing a Pseudo Panel with Affinity Score Matching
Pseudo Panel: Affinity Score Matching
Finds exact matches between two individuals based on n dimensions (accounting for discrete variables and missing values)

ID    year  y  x1  x2  x3  x4  x5
1     2000  3  NA  1   0   0   0
2     2000  1  1   0   2   0   1

ID    year  y  x1  x2  x3  x4  x5
1001  2002  4  0   2   2   0   0
1002  2002  5  0   1   1   1   0

Table. Process of Constructing a Pseudo Panel with Affinity Score Matching
Validation and Empirical Application
Data:
Survey data: Cooperative Congressional Election Study (CCES)
Both panel and cross-sectional surveys (2010-2012-2014)
n = 9500
Measurement:
Response variable: Obama's approval rating (5-point scale)
Explanatory variable: positive perception of the national economy between two waves
Control variables: female, party identification, education, race, income
Model strategy:
$\text{Approval Rating}_i = \alpha_{j[i]} + \beta_{[i]}\,\text{time}_i \times \text{BetterEcon}_i + \varepsilon_i$
$\alpha_j \sim N(\mu_\alpha, \sigma^2_\alpha)$
Result: Varying Intercept Model on Obama's Approval Rating

                      True Panel   Pseudo: Affinity Matching   Pseudo: Propensity Matching
Democrat              0.93         2.30                        2.01
                      (0.018)      (0.018)                     (0.019)
Republican            0.59         0.88                        0.79
                      (0.018)      (0.018)                     (0.019)
Time2:BetterEcon.t1   0.17         0.66                        1.25
                      (0.014)      (0.025)                     (0.028)
Time3:BetterEcon.t2   0.10         0.72                        0.26
                      (0.014)      (0.027)                     (0.028)
σ_α                   1.24         0.16                        0.02
σ_y                   0.28         0.98                        1.10
ICC                   0.82         0.14                        0.02

Number of observations = 28500, unique individuals = 9500. Standard errors are in parentheses.
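The ICC row can be checked against the two rows above it. The sketch below assumes that the σ α and σ y entries are the between-individual and residual components and that the ICC is their share, between / (between + within). The slides do not state the formula, so this reading is an assumption, though it does reproduce the reported ICCs to two decimals. The helper name `icc` is hypothetical.

```python
def icc(between, within):
    """Share of total variation attributable to the grouping level."""
    return between / (between + within)


# (sigma_alpha, sigma_y) as reported in the result table.
reported = {
    "True panel": (1.24, 0.28),
    "Pseudo (affinity)": (0.16, 0.98),
    "Pseudo (propensity)": (0.02, 1.10),
}
icc_values = {name: round(icc(b, w), 2) for name, (b, w) in reported.items()}
print(icc_values)
```

Under this reading, almost all variation in the true panel lies between individuals, while both pseudo panels attribute most of it to within-individual noise.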
Method: Statistical Power
Power: 1 - the probability of making a Type II error ($\beta$)
Estimates the precision of inferences
Expectations:
True panel > Pseudo panel (affinity score)
Pseudo panel (affinity score) > Pseudo panel (propensity score)
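To make the definition concrete, here is a minimal sketch of power for a two-sided z-test on a single coefficient. It assumes a normally distributed estimator. The slides do not say how power was computed for the hierarchical model, so the function `z_test_power` and its inputs are purely illustrative.

```python
from statistics import NormalDist


def z_test_power(effect, std_error, alpha=0.05):
    """Power (1 - beta) of a two-sided z-test of H0: effect = 0."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)   # critical value, about 1.96 for alpha = 0.05
    shift = effect / std_error        # true effect in standard-error units
    # Probability that the test statistic falls in either rejection region.
    return (1 - z.cdf(crit - shift)) + z.cdf(-crit - shift)


# With no true effect, the rejection rate equals alpha.
print(z_test_power(0.0, 1.0))
# An effect of 2.8 standard errors gives power of roughly 0.8.
print(z_test_power(2.8, 1.0))
```

A larger standard error for the same effect lowers power. This is the sense in which power reflects the precision of inferences.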
Result
Figure: Approval rating (Strongly Disapprove to Strongly Approve) by year (2000, 2002, 2004) for sampled individual respondents. Panels: (a) True Panel, power 0.533; (b) Pseudo Panel (Affinity Score), power 0.532; (c) Pseudo Panel (Propensity Score), power 0.513.
Conclusion
Summary:
1. Existing studies that construct pseudo panels with matching techniques have limitations
2. An improved matching technique for building pseudo panels is suggested
Findings:
1. A pseudo panel can serve as an approximation of true panel data
2. A more feasible technique for building pseudo panels is introduced
Where to go next?
Limitations: examined 1) a short period, 2) a specific outcome variable, 3) one specific type of pseudo panel
Future studies:
Explore the local level, different datasets, and different types of pseudo panel
Identify panel attrition by applying the affinity score matching technique
Power analysis in dynamic hierarchical models
Multiple imputation in longitudinal studies
Supplementary Materials
Detail: Riverplot. Cohort Pseudo Panel. Affinity Score. CCES. Aggregated Estimation. Data - Graphics.
Climate Change Model: Individual-level - Table. Individual-level - Posterior Distribution. Cohort-level - Table. Cohort-level - Posterior Distribution.
River plot of approval rating at the individual level:
1. Data: CCES
2. Panel: n = 9500, complete cases = 9449
3. Average percentage of n in each category: 47%, 8%, 1%, 25%, 19%
4. Percentage of n who changed their opinion over three waves: 32%
Cohort Pseudo Panel
The sample is divided into a small number of cohorts with a large number of observations in each (Browning et al. 1985; Propper, Rees, and Green 2001).
A cohort is defined by time-invariant variables such as birth year.
Aggregate-level analysis:
$\bar{y}_{ct} = \bar{x}_{ct}\beta + \bar{\alpha}_{ct} + \bar{u}_{ct}$, where $c = 1, \ldots, C$ and $t = 1, \ldots, T$
1. If $n_c$ is large enough, the time-varying $\bar{\alpha}_{ct}$ can be treated as constant over time, $\bar{\alpha}_c$.
2. Bias due to sampling error in the cohort averages exists and can be substantial even with sample sizes in the thousands.
3. There is no robust approach to building a cohort pseudo panel. Little discussion, but many blind applications.
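The aggregation step behind a cohort pseudo panel can be sketched as follows. This is a minimal illustration that assumes cohorts are defined by fixed-width birth-year bands. The function name `cohort_means`, the field names, and the toy records are hypothetical.

```python
from collections import defaultdict


def cohort_means(records, cohort_width=10):
    """Average the outcome y within each (birth cohort, survey year) cell.

    Returns {(cohort_start, year): (mean_y, n_c)}. Small n_c means the cell
    mean is a noisy estimate of the cohort's true mean.
    """
    cells = defaultdict(lambda: [0.0, 0])
    for r in records:
        cohort = (r["birth_year"] // cohort_width) * cohort_width
        cell = cells[(cohort, r["year"])]
        cell[0] += r["y"]
        cell[1] += 1
    return {key: (total / n, n) for key, (total, n) in cells.items()}


# Toy repeated cross-sections: different respondents appear in each year.
toy = [
    {"birth_year": 1961, "year": 2010, "y": 4},
    {"birth_year": 1968, "year": 2010, "y": 2},
    {"birth_year": 1975, "year": 2010, "y": 5},
    {"birth_year": 1963, "year": 2012, "y": 3},
]
means = cohort_means(toy)
print(means)
```

Returning the cell size alongside each mean makes the sampling-error concern visible: a cell built from one or two respondents should not be treated as a cohort average.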
Affinity Score
Computation:
$\text{Affinity Score}_{i,j} = \dfrac{k_i - q_i - z_{i,j}}{k_i - q_i}$
$k_i$: the total number of variables of interest for individual i
$q_i$: the number of variables with missing values for individual i
$z_{i,j}$: the number of variables on which i and j have different values
$\text{Affinity Score}_{i,j}$: the number of variables that match exactly between the two individuals, divided by the number of variables of interest for individual i (excluding those with missing values)
Threshold: > 0.8 (among 7 dimensions, 6 must match exactly)
Criteria: age, gender, education, race, party identification, ideology, income
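The computation can be sketched directly from the definitions of k, q, and z. This is a minimal illustration, not the author's implementation. The function name `affinity_score` is hypothetical, missing values are represented as `None`, and a missing value in j is counted as a mismatch (an assumption the slides do not address). The toy rows are those of the affinity score matching table (IDs 1, 2, 1001, 1002).

```python
def affinity_score(unit_i, unit_j, variables):
    """(k - q - z) / (k - q) for individual i against candidate j."""
    observed = [v for v in variables if unit_i[v] is not None]
    k = len(variables)          # variables of interest
    q = k - len(observed)       # variables missing for individual i
    if k == q:
        return 0.0              # nothing observed for i, so no basis for a match
    # z: observed variables on which i and j differ
    z = sum(1 for v in observed if unit_i[v] != unit_j[v])
    return (k - q - z) / (k - q)


variables = ["x1", "x2", "x3", "x4", "x5"]
wave_2000 = {
    1: {"x1": None, "x2": 1, "x3": 2, "x4": 0, "x5": 0},
    2: {"x1": 1, "x2": 0, "x3": 2, "x4": 0, "x5": 1},
}
wave_2002 = {
    1001: {"x1": 0, "x2": 2, "x3": 2, "x4": 0, "x5": 0},
    1002: {"x1": 0, "x2": 1, "x3": 1, "x4": 1, "x5": 0},
}
scores = {
    (i, j): affinity_score(wave_2000[i], wave_2002[j], variables)
    for i in wave_2000
    for j in wave_2002
}
print(scores)
```

In this toy example no pair exceeds the 0.8 threshold, so no match would be accepted. With the seven criteria listed above, a pair qualifies when at least six match (6/7 ≈ 0.86).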
CCES
Cross-sectional: 2010 (n = 55400), 2012 (n = 54535), 2014 (n = 56200)
Approval rating: 1 (strongly disapprove) to 5 (strongly approve)
National economy status: 1 (gotten much worse) to 5 (gotten much better)
Result
Figure: Approval rating (Strongly Disapprove to Strongly Approve) against support for same-sex marriage (Oppose to Support). Panels: (a) True Panel, p: 0.533; (b) Pseudo Panel (Affinity Score), p: 0.532; (c) Pseudo Panel (Propensity Score), p: 0.513.
Panel and Pseudo Panel Dataset

1. Pseudo Panel by Nearest Neighbor Propensity Score Matching
ID  year  y  x1  x2  x3  x4
1   2000  3  1   1   2   0
1   2002  1  1   0   3   0
1   2004  5  0   1   2   0

2. Pseudo Panel by Affinity Score Matching
ID  year  y  x1  x2  x3  x4
1   2000  3  1   1   2   0
1   2002  1  1   1   2   0
1   2004  5  0   1   3   0

vs. True Panel Survey
ID  year  y  x1  x2  x3  x4
1   2000  3  1   1   2   0
1   2002  4  0   1   2   0
1   2004  5  0   1   2   0
Individual-Level Analysis Result
Logistic Hierarchical Model on Belief in Global Warming

                       Panel     Pseudo Panel (Affinity)
Unusual Temperature    0.041     0.012
                       (0.019)   (0.019)
Female                 0.575     0.605
                       (0.239)   (0.228)
White                  0.535     0.701
                       (0.311)   (0.309)
Age                    0.005     0.001
                       (0.008)   (0.008)
Education              0.160     0.213
                       (0.062)   (0.062)
Democrat               1.041     0.222
                       (0.317)   (0.270)
Republican             2.203     2.132
                       (0.248)   (0.247)
Interest in Politics   0.315     0.072
                       (0.151)   (0.140)
Intercept              0.655     0.646
                       (7.281)   (6.948)
σ²_t                   0.758     0.884
                       (2.374)   (3.044)
N                      654       654
N panelist             218       218
N wave                 3         3
Individual-Level Analysis Result
Figure: Comparison of posterior distributions of Unusual Temperature. x-axis: estimated size of Unusual Temperature; y-axis: density. Series: panel, pseudo (affinity), difference.
Cohort-Level Analysis Result
Table: Logistic Hierarchical Model on Belief in Global Warming

                      Panel                            Pseudo Panel (Affinity)
                      10-year cohort   3-year cohort   10-year cohort   3-year cohort
Unusual Temperature   0.048            0.053           0.014            0.014
                      (0.020)          (0.023)         (0.019)          (0.019)
Intercept             0.004            0.052           0.004            0.060
                      (0.997)          (0.986)         (0.961)          (0.957)
σ²_c                  1.013            1.043           0.042            0.131
                      (0.521)          (0.252)         (0.245)          (0.170)
σ²_t                  1.084            0.840           0.783            0.940
                      (2.636)          (1.515)         (1.765)          (2.714)
ICC for σ²_c          0.235            0.241           0.013            0.038
Control Variables
N                     654              654             654              654
N panelist            218              218             218              218
N wave                3                3               3                3
N cohort              7                22              7                22
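The ICC row is consistent with the latent-variable ICC commonly used for logistic multilevel models, $\sigma^2_c / (\sigma^2_c + \pi^2/3)$, where $\pi^2/3$ is the variance of the standard logistic distribution. The slides do not state which formula was used, so the check below is a sketch under that assumption. The helper name `logistic_icc` is hypothetical.

```python
import math


def logistic_icc(group_variance):
    """Latent-variable ICC: group variance over group + logistic residual variance."""
    return group_variance / (group_variance + math.pi ** 2 / 3)


# Cohort-level variances as reported in the table.
cohort_variance = {
    "Panel, 10-year": 1.013,
    "Panel, 3-year": 1.043,
    "Pseudo (affinity), 10-year": 0.042,
    "Pseudo (affinity), 3-year": 0.131,
}
cohort_icc = {name: round(logistic_icc(v), 3) for name, v in cohort_variance.items()}
print(cohort_icc)
```

On this reading, cohort membership explains roughly a quarter of the latent variation in the true panel but only 1 to 4 percent in the affinity-matched pseudo panel.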
Cohort-Level Analysis Result
Figure: Comparison of posterior distributions of Unusual Temperature by cohort group. x-axis: estimated size of Unusual Temperature; y-axis: density. Series: panel, pseudo (affinity), difference. Panels: (a) 10-year Age Span Cohort Group; (b) 3-year Age Span Cohort Group.