2 Decomposition Methods - Illustrative Example


2.1 Reweighting

Reweighting is a simple way to construct a counterfactual distribution. In the case of gender, we may ask what the distribution of women's wages would look like if women had the same X's as men:

$$F_{Y_F^C}(y) = \int F_{Y_F|X_F}(y|X)\,\Psi(X)\,dF_{X_F}(X), \qquad (1)$$

where the reweighting function is

$$\Psi(X) = \frac{\Pr(X|M=1)}{\Pr(X|M=0)} = \frac{\Pr(M=1|X)/\Pr(M=1)}{\Pr(M=0|X)/\Pr(M=0)}.$$
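The second equality in the definition of the reweighting function follows from Bayes' rule. A short derivation, using only the symbols already defined, might read:

```latex
\begin{align*}
\Pr(X \mid M = m) &= \frac{\Pr(M = m \mid X)\,\Pr(X)}{\Pr(M = m)},
  \qquad m \in \{0, 1\}, \\[4pt]
\Psi(X) = \frac{\Pr(X \mid M = 1)}{\Pr(X \mid M = 0)}
  &= \frac{\Pr(M = 1 \mid X)\,\Pr(X) / \Pr(M = 1)}
          {\Pr(M = 0 \mid X)\,\Pr(X) / \Pr(M = 0)}
   = \frac{\Pr(M = 1 \mid X) / \Pr(M = 1)}
          {\Pr(M = 0 \mid X) / \Pr(M = 0)}.
\end{align*}
```

Because Pr(M = 0 | X) = 1 - Pr(M = 1 | X), a single predicted probability from the logit or probit model is enough to compute the factor.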

The reweighting procedure is as follows:

1. Pool the data for women (M = 0) and men (M = 1) and run a logit or probit model for the probability of belonging to group M = 1. Here it is important to use a flexible functional form that may include many interactions, and to pay attention to the issue of common support. It may be useful to create an artificial sample that will include {X_0, Ψ(X)}:

    save temp01, replace;
    keep if female==1;
    replace female=2;
    save temp2, replace;
    use temp01, clear;
    append using temp2;

2. Estimate the reweighting factor Ψ(X) for observations in group M = 0 using the predicted probability of belonging to group M = 1.

3. Compute the counterfactual statistic of interest using observations from the sample of women reweighted using Ψ(X).

    . gen sch_10afqt=sch_10*afqtp89; gen sch_10exp=sch_10*wkswk_18;
    . gen diploma_hsafqt=diploma_hs*afqtp89; gen diploma_hsexp=diploma_hs*wkswk_18;
    . gen ged_hsafqt=ged_hs*afqtp89; gen ged_hsext=ged_hs*wkswk_18;
    . gen smcolafqt=smcol*afqtp89; gen smcolexp=smcol*wkswk_18;
    . gen bachelor_colafqt=bachelor_col*afqtp89; gen bachelor_colexp=bachelor_col*wksw
    . gen master_colafqt=master_col*afqtp89; gen master_colexp=master_col*wkswk_18;
    . gen doctor_colafqt=doctor_col*afqtp89; gen doctor_colexp=doctor_col*wkswk_18;
    . gen expafqt=afqtp89*wkswk_18; gen expsq=wkswk_18^2; gen yrsmilsq=yrsmil78_00^2;
    . probit male age00 msa ctrlcity north_central south00 west hispanic black schl00
    > sch_10* diploma_hs* ged_hs* smcol* bachelor_col* master_col* doctor_col* afqt
    > expafqt famrspb wkswk_18 expsq yrsmil78_00 yrsmilsq pcntpt_22 manuf eduheal
    > if female==0 | female==1 ;

    [Probit regression output for male: Number of obs = 5309, LR chi2(41). The iteration log (iterations 0 to 5), the fit statistics, and the coefficient table (rows age00 through othind, plus _cons) are not recoverable from the source.]

    . predict pmale, p;
    . summ pmale if male~=1, detail;

    [Detailed summary of Pr(male): percentiles, mean, standard deviation, variance, skewness, and kurtosis; numeric values not recoverable from the source.]

    . replace pmale=0.99 if pmale>0.99 & male~=1;
    (0 real changes made)
    . quietly summ male if male<2 ;
    . gen pbar=r(mean);
    . gen phix=(pmale)/(1-pmale)*((1-pbar)/pbar) if female==2;
    (5309 missing values generated)
    . sum phix, detail;

    [Detailed summary of phix: percentiles, mean, standard deviation, variance, skewness, and kurtosis; numeric values not recoverable from the source.]
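The three steps of the reweighting procedure can be sketched end to end on simulated data. This is a minimal illustration, not the lecture's NLSY code: the variables `x` and `lwage`, the seed, and all parameter values are hypothetical, there is a single covariate instead of a flexible specification, and the block uses Stata's default line delimiter rather than the semicolon delimiter of the log.

```stata
* Sketch on simulated data; variable names and parameter values are hypothetical.
clear
set seed 12345
set obs 4000
gen male = runiform() < 0.5
* A single covariate whose mean is higher for men
gen x = rnormal(0.5*male, 1)
gen lwage = 1 + 0.4*x + 0.2*male + rnormal(0, 0.5)

* Step 1: probit for Pr(male = 1 | X) on the pooled sample
quietly probit male x
predict pmale, pr

* Step 2: reweighting factor Psi(X) for women
quietly summarize male
scalar pbar = r(mean)
gen phix = pmale/(1 - pmale) * (1 - scalar(pbar))/scalar(pbar) if male == 0

* Step 3: mean log wage of women, of women reweighted to men's X, and of men
quietly summarize lwage if male == 0
scalar mean_f = r(mean)
quietly summarize lwage if male == 0 [aweight = phix]
scalar mean_c = r(mean)
quietly summarize lwage if male == 1
scalar mean_m = r(mean)

display "women: " scalar(mean_f) "   women as men: " scalar(mean_c) "   men: " scalar(mean_m)
display "composition effect: " scalar(mean_c) - scalar(mean_f)
display "wage structure effect: " scalar(mean_m) - scalar(mean_c)
```

A quick sanity check on any such weights is that their average over the reweighted group should be close to one. The sketch weights the women in place instead of appending a copy of them as a third group, which is a simplification of the log's `female==2` device.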

Figure 1: Densities of Male and Female Wages
[Two density plots over Log(wage). First panel: Women, Women as Men, Men. Second panel: Women, Women as Men Excl. Fam. Rsp., Men.]

    . quietly sum lropc00 if female==0, detail ;
    . gen p90m=r(p90); gen p50m=r(p50); gen p10m=r(p10); gen pmeanm=r(mean);
    . quietly sum lropc00 if female==1, detail ;
    . gen p90f=r(p90); gen p50f=r(p50); gen p10f=r(p10); gen pmeanf=r(mean);
    . quietly sum lropc00 if female==2 [aweight=phix], detail;
    . gen p90fm=r(p90); gen p50fm=r(p50); gen p10fm=r(p10); gen pmeanfm=r(mean);
    . *aggregate decomposition;
    . foreach stat in mean 10 50 90 {;
      2. gen delta_o=p`stat'm-p`stat'f;
      3. gen delta_x=p`stat'fm-p`stat'f;
      4. gen delta_s=p`stat'm-p`stat'fm;
      5. di "for statistic `stat' " " delta_o= " delta_o " delta_x= " delta_x "
    > delta_s= " delta_s;
      6. drop delta_o delta_x delta_s;
      7. };

    [Output: one line for each statistic (mean, 10, 50, 90) reporting delta_o, delta_x, and delta_s; numeric values not recoverable from the source.]

2.2 RIF-regression

Recentered Influence Function (RIF) regressions are a convenient way to perform an OB-type detailed decomposition for statistics other than the mean, most commonly quantiles. For quantiles, RIF-regressions correspond to a rescaled linear probability model, where the rescaling factor depends on an estimate of the density at the quantile of interest:

$$RIF(y; Q_\tau) = Q_\tau + \frac{\tau - 1\{y \le Q_\tau\}}{f_Y(Q_\tau)} \qquad (2)$$
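A useful property of the quantile RIF is that its sample mean reproduces the quantile itself, which is what allows an OB decomposition of means to be applied to it. The following sketch checks this for the median on simulated data. The variable `y`, the seed, and the distribution are hypothetical, and the kernel density uses Stata's default bandwidth, which is a choice made for the sketch only.

```stata
* Sketch on simulated data; y and its distribution are hypothetical.
clear
set seed 2016
set obs 5000
gen y = rnormal(2, 0.6)

* Percentiles 1..99 of y, and a kernel density estimate at each of them
pctile qgrid = y, nq(100)
kdensity y, at(qgrid) generate(evalpt dens) nograph
scalar q50 = qgrid[50]
scalar f50 = dens[50]

* RIF(y; Q_tau) = Q_tau + (tau - 1{y <= Q_tau}) / f_Y(Q_tau), here tau = 0.5
gen double rif_50 = scalar(q50) + (0.5 - (y <= scalar(q50))) / scalar(f50)

* The mean of the RIF should be (approximately) the median itself
quietly summarize rif_50
display "median = " scalar(q50) "   mean of RIF = " r(mean)
```

The RIF takes only two values, one above and one below the quantile, which is why regressing it on X amounts to a rescaled linear probability model.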

Because the distributional statistic of interest can be written in terms of expectations of its conditional recentered influence function,

$$\nu(F_g) = E_X\left[E[RIF(y_g;\nu)|X=x]\right] = E[X|G=g]\,\gamma^\nu_g,$$

a standard OB decomposition (without reweighting) can be run using the RIF as the dependent variable.

    . forvalues qt = 10(40)90 { ;
      2. gen rif_`qt'=.;
      3. };
    . pctile eval1=lropc00 if female==1, nq(100) ;
    . kdensity lropc00 if female==1, at(eval1) gen(evalf densf) width(0.10) nograph
    . forvalues qt = 10(40)90 { ;
      2. local qc = `qt'/100.0;
      3. replace rif_`qt'=evalf[`qt']+`qc'/densf[`qt'] if lropc00>=evalf[`qt']
    > & female==1;
      4. replace rif_`qt'=evalf[`qt']-(1-`qc')/densf[`qt'] if lropc00<evalf[`qt']
    > & female==1;
      5. };
    . pctile eval2=lropc00 if female==0, nq(100) ;
    . kdensity lropc00 if female==0, at(eval2) gen(evalm densm) width(0.10) nograph
    . forvalues qt = 10(40)90 { ;
      2. local qc = `qt'/100.0;
      3. replace rif_`qt'=evalm[`qt']+`qc'/densm[`qt'] if lropc00>=evalm[`qt']
    > & female==0;
      4. replace rif_`qt'=evalm[`qt']-(1-`qc')/densm[`qt'] if lropc00<evalm[`qt']
    > & female==0;
      5. };
    . forvalues qt = 10(40)90 { ;
      2. oaxaca rif_`qt' age00 msa ctrlcity north_central south00 west hispanic
    > black sch_10 sch10_12 diploma_hs ged_hs bachelor_col master_col doctor_col
    > afqtp89 famrspb wkswk_18 yrsmil78_00 pcntpt_22 manuf eduheal othind,
    > by(female) weight(1)

Table 4. Gender Wage Gap: Quantile Decomposition Results (NLSY, 2000)
Reference Group: Male Coef.
[Point estimates are not recoverable from the source; only the standard errors, shown in parentheses, survive.]

                                                   10th percentile  50th percentile  90th percentile
A: Raw log wage gap:
   Q_τ[ln(w_m)] - Q_τ[ln(w_f)]                         (0.023)          (0.019)          (0.026)

B: Decomposition Method: Machado-Mata-Melly
   Estimated log wage gap:
   Q_τ[ln(w_m)] - Q_τ[ln(w_f)]                         (0.015)          (0.016)          (0.026)
   Total explained by characteristics                  (0.028)          (0.027)          (0.019)
   Total wage structure                                (0.027)          (0.024)          (0.025)

C: Decomposition Method: RIF regressions without reweighing
   Mean RIF gap:
   E[RIF_τ(ln(w_m))] - E[RIF_τ(ln(w_f))]               (0.023)          (0.019)          (0.026)
   Composition effects attributable to
     Age, race, region, etc.                           (0.005)          (0.004)          (0.004)
     Education                                         (0.005)          (0.006)          (0.01)
     AFQT                                              (0.02)           (0.004)          (0.005)
     L.T. withdrawal due to family                     (0.021)          (0.014)          (0.017)
     Life-time work experience                         (0.026)          (0.014)          (0.023)
     Industrial Sectors                                (0.012)          (0.008)          (0.011)
     Total explained by characteristics                (0.035)          (0.025)          (0.028)
   Wage structure effects attributable to
     Age, race, region, etc.                           (0.426)          (0.357)          (0.524)
     Education                                         (0.028)          (0.031)          (0.045)
     AFQT                                              (0.03)           (0.042)          (0.062)
     L.T. withdrawal due to family                     (0.032)          (0.025)          (0.032)
     Life-time work experience                         (0.148)          (0.082)          (0.119)
     Industrial Sectors                                (0.06)           (0.046)          (0.052)
     Constant                                          (0.349)          (0.323)          (0.493)
     Total wage structure                              (0.044)          (0.028)          (0.036)

Note: The data are an extract from the NLSY79 used in O'Neill and O'Neill (2006). Industrial sectors have been added to their analysis to illustrate issues linked to categorical variables. The other explanatory variables are age and dummies for black, hispanic, region, msa, and central city. Bootstrapped standard errors are in parentheses. Means are reported in Table 2.

    > detail(groupdem:age00 msa ctrlcity north_central south00 west hispanic black,
    > groupaf:afqtp89,
    > grouped:sch_10 sch10_12 diploma_hs ged_hs bachelor_col master_col doctor_col,
    > groupfam:famrspb,
    > groupex:wkswk_18 yrsmil78_00 pcntpt_22,
    > groupind:manuf eduheal othind) ;

    [Blinder-Oaxaca decomposition output for rif_50 (group 1: female = 0; group 2: female = 1). The Differential panel reports Prediction_1, Prediction_2, and Difference; the Explained panel reports groupdem, grouped, groupaf, groupfam, groupex, groupind, and Total; the Unexplained panel reports the same rows plus _cons. Numeric values are not recoverable from the source.]

    groupdem: age00 msa ctrlcity north_central south00 west hispanic black
    grouped: sch_10 sch10_12 diploma_hs ged_hs bachelor_col master_col doctor_col doctor_col
    groupaf: afqtp89
    groupfam: famrspb
    groupex: wkswk_18 yrsmil78_00 pcntpt_22
    groupind: manuf eduheal othind

The rifreg.ado file on my web site can do the computation of the RIF for the gini and the variance.

    . quietly rifreg lropc00 age00 msa ctrlcity north_central south00 west hispanic
    > black sch_10 sch10_12 diploma_hs ged_hs bachelor_col master_col doctor_col
    > afqtp89 famrspb wkswk_18 yrsmil78_00 pcntpt_22 union governmt nonprofit
    > if female==1, variance retain(rif_varf) ;
    . quietly rifreg lropc00 age00 msa ctrlcity north_central south00 west hispanic
    > black sch_10 sch10_12 diploma_hs ged_hs bachelor_col master_col doctor_col
    > afqtp89 famrspb wkswk_18 yrsmil78_00 pcntpt_22 union governmt nonprofit
    > if female==0, variance retain(rif_varm) ;

    . gen rif_var=rif_varf if female==1;
    (5309 missing values generated)
    . replace rif_var=rif_varm if female==0;
    (2655 real changes made)
    . oaxaca rif_var age00 msa ctrlcity north_central south00 west hispanic black
    > sch_10 sch10_12 diploma_hs ged_hs bachelor_col master_col doctor_col afqtp89
    > famrspb wkswk_18 yrsmil78_00 pcntpt_22 manuf eduheal othind
    > if female==0 | female==1, by(female) weight(1)
    > detail(groupdem:age00 msa ctrlcity north_central south00 west hispanic black,
    > groupaf:afqtp89,
    > grouped:sch_10 sch10_12 diploma_hs ged_hs bachelor_col master_col doctor_col,
    > groupfam:famrspb,
    > groupex:wkswk_18 yrsmil78_00 pcntpt_22,
    > groupind:manuf eduheal othind) ;

    [Blinder-Oaxaca decomposition output for rif_var (group 1: female = 0; group 2: female = 1). The Differential panel reports Prediction_1, Prediction_2, and Difference; the Explained panel reports groupdem, grouped, groupaf, groupfam, groupex, groupind, and Total; the Unexplained panel reports the same rows plus _cons. Numeric values are not recoverable from the source.]

    groupdem: age00 msa ctrlcity north_central south00 west hispanic black
    grouped: sch_10 sch10_12 diploma_hs ged_hs bachelor_col master_col doctor_col doctor_col
    groupaf: afqtp89
    groupfam: famrspb
    groupex: wkswk_18 yrsmil78_00 pcntpt_22
    groupind: manuf eduheal othind
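For the variance, the recentered influence function has a simple closed form, RIF(y; σ²) = (y - μ)², so its mean is the variance itself (with divisor N). The sketch below verifies this on simulated data. It is not the code inside rifreg.ado, which is not shown in the source; the variable `y`, the seed, and the distribution are hypothetical.

```stata
* Sketch on simulated data; y and its distribution are hypothetical.
clear
set seed 561
set obs 3000
gen y = rnormal(2, 0.6)

quietly summarize y
scalar mu = r(mean)
* summarize reports the variance with divisor N-1; convert to divisor N
scalar v_n = r(Var) * (r(N) - 1) / r(N)

* RIF(y; variance) = (y - mu)^2
gen double rif_var = (y - scalar(mu))^2

quietly summarize rif_var
display "variance (divisor N) = " scalar(v_n) "   mean of RIF = " r(mean)
```

Running an OB decomposition on such a variable, by group, then decomposes the gap in variances in the same way that the usual decomposition handles the gap in means.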

2.3 FFL: Reweighting and RIF-regressions

The aggregate decomposition can be obtained by simple reweighting, so for any statistic ν

$$\Delta^\nu_O = \nu(F_{Y_1|M=1}) - \nu(F_{Y_0|M=0}) = \underbrace{\nu(F_{Y_1|M=1}) - \nu(F_{Y_0|M=1})}_{\Delta^\nu_S} + \underbrace{\nu(F_{Y_0|M=1}) - \nu(F_{Y_0|M=0})}_{\Delta^\nu_X},$$

where $\Delta^\nu_S$ is the wage structure effect, while $\Delta^\nu_X$ is the composition effect.

To compute a detailed decomposition, we can run the corresponding RIF-regressions to obtain parameter estimates $\gamma^\nu_g$:

$$E[RIF(y_g;\nu)|X=x] = E[X|G=g]\,\gamma^\nu_g + \epsilon.$$

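Then the composition effect $\Delta^\nu_{X,R}$ is divided into a pure composition effect $\Delta^\nu_{X,p}$ and a component measuring the specification error, $\Delta^\nu_{X,e}$:

$$\Delta^\nu_{X,R} = \left(\bar{X}_{01}\hat{\gamma}^\nu_{01} - \bar{X}_0\hat{\gamma}^\nu_0\right) + \left(\bar{X}_{01}\hat{\gamma}^\nu_0 - \bar{X}_{01}\hat{\gamma}^\nu_0\right) = \left(\bar{X}_{01} - \bar{X}_0\right)\hat{\gamma}^\nu_0 + \bar{X}_{01}\left[\hat{\gamma}^\nu_{01} - \hat{\gamma}^\nu_0\right] = \Delta^\nu_{X,p} + \Delta^\nu_{X,e} \qquad (3)$$

Similarly, the wage structure effect is written as

$$\Delta^\nu_{S,R} = \left(\bar{X}_1\hat{\gamma}^\nu_1 - \bar{X}_{01}\hat{\gamma}^\nu_{01}\right) + \left(\bar{X}_1\hat{\gamma}^\nu_{01} - \bar{X}_1\hat{\gamma}^\nu_{01}\right) = \bar{X}_1\left(\hat{\gamma}^\nu_1 - \hat{\gamma}^\nu_{01}\right) + \left(\bar{X}_1 - \bar{X}_{01}\right)\hat{\gamma}^\nu_{01} = \Delta^\nu_{S,p} + \Delta^\nu_{S,e} \qquad (4)$$

and reduces to the first term $\Delta^\nu_{S,p}$, given that the reweighting error $\Delta^\nu_{S,e}$ goes to zero as $\bar{X}_{01} \to \bar{X}_1$ in large samples.

In practice, this is estimated by constructing a third sample, which in this case is the sample of women with male weights, sample 01. The detailed reweighted decomposition is thus obtained by running two Oaxaca-Blinder decompositions:

OB1) with sample 1 and sample 01 to get the pure wage structure effect,
OB2) with sample 0 and sample 01 to get the pure composition effect.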
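The two decompositions OB1 and OB2 can be worked through by hand for the simplest statistic, the mean, for which the RIF is just y. The sketch below uses simulated data with one covariate and plain `regress` calls rather than the user-written `oaxaca` command, so that every term of the pure composition effect, the specification error, the pure wage structure effect, and the reweighting error is visible. All variable names, the seed, and the parameter values are hypothetical.

```stata
* Sketch on simulated data; names and parameter values are hypothetical.
clear
set seed 42
set obs 6000
gen male = runiform() < 0.5
gen x = rnormal(0.5*male, 1)
gen y = 1 + 0.4*x + 0.1*male*x + 0.2*male + rnormal(0, 0.5)

* Reweighting factor that turns sample 0 (women) into sample 01
quietly probit male x
predict p, pr
quietly summarize male
scalar pbar = r(mean)
gen w = p/(1 - p) * (1 - scalar(pbar))/scalar(pbar) if male == 0

* Sample 0: women, unweighted
quietly regress y x if male == 0
scalar b0 = _b[x]
scalar a0 = _b[_cons]
quietly summarize x if male == 0
scalar xbar0 = r(mean)

* Sample 01: women reweighted to have men's X
quietly regress y x [aweight = w] if male == 0
scalar b01 = _b[x]
scalar a01 = _b[_cons]
quietly summarize x [aweight = w] if male == 0
scalar xbar01 = r(mean)

* Sample 1: men
quietly regress y x if male == 1
scalar b1 = _b[x]
scalar a1 = _b[_cons]
quietly summarize x if male == 1
scalar xbar1 = r(mean)

* OB2 (samples 0 and 01): pure composition effect and specification error
scalar dX_p = (scalar(xbar01) - scalar(xbar0)) * scalar(b0)
scalar dX_e = (scalar(a01) - scalar(a0)) + scalar(xbar01) * (scalar(b01) - scalar(b0))
* OB1 (samples 1 and 01): pure wage structure effect and reweighting error
scalar dS_p = (scalar(a1) - scalar(a01)) + scalar(xbar1) * (scalar(b1) - scalar(b01))
scalar dS_e = (scalar(xbar1) - scalar(xbar01)) * scalar(b01)

display "pure composition = " scalar(dX_p) "   specification error = " scalar(dX_e)
display "pure wage structure = " scalar(dS_p) "   reweighting error = " scalar(dS_e)
```

The four terms add up to the raw gap in means by construction. In this design the women's regression is linear in x, so the specification error should be small, and the reweighting error shrinks as the reweighted mean of x for women approaches the men's mean.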

    . *** get composition effects with reweighing [E(X_0|t=1)- E(X_0|t=0)]b_c ;
    . oaxaca rif_50 age00 msa ctrlcity north_central south00 west hispanic black
    > sch_10 sch10_12 diploma_hs ged_hs bachelor_col master_col doctor_col afqtp89
    > famrspb wkswk_18 yrsmil78_00 pcntpt_22 manuf eduheal othind
    > [aweight=wgt] if male==0 | male==2, by(male) weight(1) swap
    > detail(groupdem:age00 msa ctrlcity north_central south00 west hispanic black,
    > groupaf:afqtp89,
    > grouped:sch_10 sch10_12 diploma_hs ged_hs bachelor_col master_col doctor_col,
    > groupfam:famrspb,
    > groupex:wkswk_18 yrsmil78_00 pcntpt_22,
    > groupind:manuf eduheal othind) ;

    [Blinder-Oaxaca decomposition output for rif_50 (group 1: male = 2; group 2: male = 0). The Differential panel reports Prediction_1, Prediction_2, and Difference; the Explained panel reports groupdem, grouped, groupaf, groupfam, groupex, groupind, and Total; the Unexplained panel reports the same rows plus _cons. Numeric values are not recoverable from the source.]

    groupdem: age00 msa ctrlcity north_central south00 west hispanic black
    grouped: sch_10 sch10_12 diploma_hs ged_hs bachelor_col master_col doctor_col doctor_col
    groupaf: afqtp89
    groupfam: famrspb
    groupex: wkswk_18 yrsmil78_00 pcntpt_22
    groupind: manuf eduheal othind

    . *** get wage structure effects E(X_1|t=1)*[b_1-b_c] ;
    . oaxaca rif_50 age00 msa ctrlcity north_central south00 west hispanic black
    > sch_10 sch10_12 diploma_hs ged_hs bachelor_col master_col doctor_col afqtp89
    > famrspb wkswk_18 yrsmil78_00 pcntpt_22 manuf eduheal othind
    > [aweight=wgt] if male==1 | male==2, by(male) weight(0)
    > detail(groupdem:age00 msa ctrlcity north_central south00 west hispanic black,
    > groupaf:afqtp89,
    > grouped:sch_10 sch10_12 diploma_hs ged_hs bachelor_col master_col doctor_col,
    > groupfam:famrspb,
    > groupex:wkswk_18 yrsmil78_00 pcntpt_22,
    > groupind:manuf eduheal othind) ;

    [Blinder-Oaxaca decomposition output for rif_50 (group 1: male = 1; group 2: male = 2). The Differential panel reports Prediction_1, Prediction_2, and Difference; the Explained panel reports groupdem, grouped, groupaf, groupfam, groupex, groupind, and Total; the Unexplained panel reports the same rows plus _cons. Numeric values are not recoverable from the source.]

    groupdem: age00 msa ctrlcity north_central south00 west hispanic black
    grouped: sch_10 sch10_12 diploma_hs ged_hs bachelor_col master_col doctor_col doctor_col
    groupaf: afqtp89
    groupfam: famrspb
    groupex: wkswk_18 yrsmil78_00 pcntpt_22
    groupind: manuf eduheal othind

In contrast to the case of wage inequality, detailed wage structure effects in the case of gender are generally not statistically significant. In this small sample, reweighting is also less successful: we would like to see reweighting errors an order of magnitude smaller.
