Detailed Wage Decompositions: Revisiting the Identification Problem


ChangHwan Kim
August 1, 2012

Department of Sociology, University of Kansas
1415 Jayhawk Blvd., Room 716, Lawrence, KS
Tel: (785) Fax: (785)

Comments from the editor of Sociological Methodology, five anonymous reviewers, Arthur Sakamoto, Daniel Powers, Yu Xie, and Myeong-Su Yun have greatly improved previous drafts. All remaining errors are the author's sole responsibility.

Abstract

Yun (2005) suggested an averaging method to resolve the identification problem in detailed decompositions. Since then, detailed decomposition techniques have been widely discussed. This paper shows that the averaging method may not answer the identification problem. The method is built on unrealistic distributional constraints on a set of dummy variables and is sensitive to both the number of groups and the method of grouping. As an alternative, a weighted averaging method is suggested which is in line with Haisken-DeNew and Schmidt's (1997) two-step re-normalization. This method gives a distinct meaning to the intercept term and makes detailed decomposition feasible using a reasonable assumption. However, this paper underscores that there are multiple solutions to the identification problem, and that other detailed decomposition methods may be acceptable depending on theoretical or practical considerations.

Keywords: Blinder-Oaxaca Decomposition, Detailed Decomposition, Averaging Method, Grand-Mean Weighting Method

1 Introduction

Blinder-Oaxaca (BO) wage decomposition methods have been extensively applied in sociology and economics (e.g., Dodoo 1991; Farkas and Vicknair 1996; Sakamoto et al. 2000; DeLeire 2001; Sayer 2004; Van Hook et al. 2004; Phillips and Sweeney 2006; Stearns et al. 2007; Berends and Penaloza 2008). Although several variants of wage decompositions have been developed since the initial work of Blinder (1973) and Oaxaca (1973),[1] all these techniques share the basic idea that the mean wage gap between two groups at a given time can be broken down into two components (Jones and Kelley 1984; Cotton 1985). The first component is the coefficient effect and the second component is the endowment effect.

On top of the two-component decomposition, researchers have often reported the contributions of individual variables or of combined sets of dummy variables after implementing a BO decomposition (e.g., Fields and Wolff 1995; DeLeire 2001; Bobbitt-Zeher 2007; Chang and England 2010). However, these detailed decompositions can be erroneous, because the contribution of each individual variable or set of dummy variables changes with the choice of the reference group. This is the identification problem, a much-discussed (Jones 1983; Jones and Kelley 1984; Clogg and Eliason 1986; Oaxaca and Ransom 1999; Horrace and Oaxaca 2001; Gardeazabal and Ugidos 2004; Yun 2005) yet often ignored pitfall of detailed decomposition using BO techniques. To resolve the identification problem, Gardeazabal and Ugidos (2004) suggested normalizing the estimated coefficients, and Yun (2005) proposed an averaging method as a simple alternative to the unwieldy normalization. Since then, detailed BO-type decomposition techniques have been widely discussed in sociology and economics (e.g., Bhaumik et al. 2006; Powers and Yun 2009; Kim 2010; Fortin et al. 2010; Powers et al. forthcoming).
The averaging method is, however, based on unrealistic distributional constraints on a set of dummy variables. In this paper, I suggest the application of an alternative method and consider its limitations.

[1] Decomposition techniques were originally developed by sociologists and demographers (e.g., Kitagawa 1955). See Powers and Yun (2009) for a summary history.

2 Identification Problem and Averaging Method

Using ordinary least squares (OLS) regression, we can estimate the log wage, $y$, as follows:

$$y = a + \sum_{j=1}^{J} \sum_{k=1}^{K_j} b_{jk} x_{jk} + e \qquad (1)$$

where $j = 1, 2, 3, \ldots, J$ and $k = 1, 2, 3, \ldots, K_j$; $j$ refers to the $j$th factor, and $k$ represents the $k$th level of each of these $J$ factors. In equation 1, each factor includes a complete set of dummies, including the typically left-out reference group. For the reference group, $E[b_{jk} x_{jk} \mid x_{j1} = 1]$ takes a value of zero, which implies the identification constraint $b_{j1} = 0$. For simplicity, I restrict $J = 1$ in this paper, so that equation 1 becomes $y = a + \sum_{k=1}^{K} b_k x_k + e$.

Consider a comparison of white workers (group W) and black workers (group B). Estimating equation 1 separately for the two groups, the mean wage gap between groups W and B ($\bar{y}^W - \bar{y}^B$) can be partitioned into several components as follows:

$$\bar{y}^W - \bar{y}^B = \underbrace{\underbrace{(\hat{a}^W - \hat{a}^B)}_{D1A} + \underbrace{\sum_{k=1}^{K} (\hat{b}^W_k - \hat{b}^B_k)\,\bar{x}^B_k}_{D1B}}_{D1} + \underbrace{\sum_{k=1}^{K} (\bar{x}^W_k - \bar{x}^B_k)\,\hat{b}^W_k}_{D2} \qquad (2)$$

where D1 is the sum of D1A, which denotes the intercept component, and D1B, which denotes the coefficient component. D1 represents the total coefficient effect. D2 refers to the total endowment effect. By construction in OLS, the mean residuals $\bar{e}^W$ and $\bar{e}^B$ are both zero, so their difference is zero.

In a regression model, an estimated intercept varies with the choice of reference group, and that is the case with the intercept component (D1A) in equation 2. As the reference group changes in a regression model, the estimated coefficient $b_k$ changes accordingly. Therefore, estimates of the extent to which individual factors and factor levels contribute to the mean wage gap between groups W and B also vary with the choice of reference group. This illustrates the identification problem in detailed decompositions.
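The identification problem can be made concrete with a small numerical sketch. The Python snippet below is an illustration only: the level means and shares are hypothetical numbers chosen for the example (they are not taken from the CPS data used later), and the helper names are invented here. It relies on the fact that, with a single factor ($J = 1$), the OLS fit of a saturated dummy model reproduces the level means, so $a$ is the mean of the omitted level and $b_k$ is the difference from it.

```python
# Hypothetical level means of log wage and level shares for one factor
# (education) in two groups; none of these numbers come from the paper.
LEVELS = ["LTHS", "HSG", "SC", "BA+"]
MEAN_W = [2.40, 2.70, 2.90, 3.30]    # mean log wage by level, group W
MEAN_B = [2.30, 2.50, 2.70, 3.00]    # mean log wage by level, group B
SHARE_W = [0.10, 0.30, 0.30, 0.30]   # share of each level in group W
SHARE_B = [0.15, 0.35, 0.30, 0.20]   # share of each level in group B


def dummy_coefficients(level_means, ref):
    """Coefficients of a saturated one-factor dummy model with level `ref` omitted.

    With a single factor the fitted values are the level means, so
    a = mean of the reference level and b_k = mean_k - a (hence b_ref = 0).
    """
    a = level_means[ref]
    b = [m - a for m in level_means]
    return a, b


def bo_decomposition(ref):
    """Return D1A, per-level D1B and per-level D2 as defined in equation 2."""
    a_w, b_w = dummy_coefficients(MEAN_W, ref)
    a_b, b_b = dummy_coefficients(MEAN_B, ref)
    d1a = a_w - a_b
    d1b = [(bw - bb) * xb for bw, bb, xb in zip(b_w, b_b, SHARE_B)]
    d2 = [(xw - xb) * bw for xw, xb, bw in zip(SHARE_W, SHARE_B, b_w)]
    return d1a, d1b, d2


for ref, name in enumerate(LEVELS):
    d1a, d1b, d2 = bo_decomposition(ref)
    print(f"ref={name:5s} D1A={d1a:.3f} "
          f"D1={d1a + sum(d1b):.3f} D2={sum(d2):.3f} "
          f"D1B={[round(v, 3) for v in d1b]}")
```

Under these assumed inputs, the totals D1 and D2 are the same for every choice of omitted level, while D1A and the level-specific terms move with it, which is the pattern described above.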

To resolve the identification problem, Gardeazabal and Ugidos (2004) suggest normalizing the coefficients of the dummy variables by imposing the restriction $\sum_{k=1}^{K} b_k = 0$. As a simple way to impose such a restriction, Yun (2005) proposes the following averaging method:

$$y = (a + \bar{b}) + \sum_{k=1}^{K} (b_k - \bar{b})\,x_k + e = a^* + \sum_{k=1}^{K} b^*_k x_k + e \qquad (3)$$

where $a^*$ refers to $a + \bar{b}$ and $b^*_k$ refers to $(b_k - \bar{b})$. Here $\bar{b}$ is computed as the sum of $b_k$ divided by the number of factor levels; that is, $\bar{b} = \sum_{k=1}^{K} b_k / K$. For the reference group of equation 1, $b_1$ is zero by definition, so the transformed coefficient $b_1 - \bar{b}$ is equal to $-\bar{b}$. This transformation of $b_k$ causes $\sum_{k=1}^{K} b^*_k$ to become zero. In ANOVA, the restriction $\sum_{k} b^*_k = 0$ is referred to as a sigma constraint. It is one of a number of possible identifying normalizations (Fox 2008:145). Under this constraint, the constant or intercept term is a generalized grand mean (i.e., the mean of the means), and the effects $b^*_k$ are deviations from this. As a result of this normalization, neither the new coefficients for the independent variables, $(b_k - \bar{b})$, nor the new intercept, $a + \bar{b}$, depends on the choice of reference group. As the coefficient $b^*_1$ for the reference group becomes $-\bar{b}$, no group is omitted. Because equation 3 has exactly the same structure as equation 1, the BO decomposition technique for equation 2 can also be applied to equation 3.

On the surface, the averaging method appears to solve the identification problem. In fact, the averaging method does not resolve it but merely conceals it. In equation 3, the intercept $a^*$ is the expected wage given $x_k = 1/K$ for $k = 1, \ldots, K$. That is, $E[y \mid x_k = 1/K] = a^*$. The difference between the intercepts of the two groups, $a^{*W} - a^{*B}$, is the expected wage difference between hypothetical groups W and B, assuming the means of all $x$s equal $1/K$.
The normalized coefficients $b^*_k$ represent the expected differences in $y$ between a group with $x_k = 1$ for a specific $k$ and the hypothetical reference point (or group) at which the mean of $x_k$ is equal to $1/K$ for all $k$. That is, $b^*_k = E[y \mid x_k = 1] - a^*$.
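As a rough check of these two properties of the averaging method, the sketch below applies the transformation of equation 3 to dummy-coded coefficients obtained under every possible reference level. The level means are hypothetical and the function names are introduced only for this illustration; as before, a one-factor saturated model is assumed, so the dummy-coded coefficients follow directly from the level means.

```python
# Hypothetical level means of log wage for one group (not from the paper).
LEVEL_MEANS = [2.40, 2.70, 2.90, 3.30]


def dummy_coefficients(level_means, ref):
    """One-factor saturated model: a is the mean of the omitted level."""
    a = level_means[ref]
    return a, [m - a for m in level_means]


def averaging_normalization(a, b):
    """Equation 3: shift the intercept by b-bar and centre each b_k on b-bar."""
    b_bar = sum(b) / len(b)
    return a + b_bar, [bk - b_bar for bk in b]


# Normalize the coefficients obtained under each possible reference level.
normalized = []
for ref in range(len(LEVEL_MEANS)):
    a, b = dummy_coefficients(LEVEL_MEANS, ref)
    normalized.append(averaging_normalization(a, b))

a_star, b_star = normalized[0]
K = len(b_star)

# Predicted y at the hypothetical point where every x_k equals 1/K.
prediction_at_uniform = a_star + sum(bk / K for bk in b_star)

print("a* =", round(a_star, 3))
print("b* =", [round(bk, 3) for bk in b_star])
print("prediction at x_k = 1/K:", round(prediction_at_uniform, 3))
```

Every reference level leads to the same normalized coefficients, and the normalized intercept coincides with the prediction at the uniform point $x_k = 1/K$, whatever the actual distribution of $x$ may be.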

The identification problem occurs because the estimates of the intercept and of all other coefficients change with the arbitrary choice of reference group. If the intercept under one possible reference group (e.g., $E[y \mid x_1 = 1, x_2 = x_3 = \cdots = x_K = 0]$ when $x_1$ is the reference group) and that under another possible reference group (e.g., $E[y \mid x_2 = 1, x_1 = x_3 = \cdots = x_K = 0]$ when $x_2$ is the reference group) are arbitrary, then the intercept of the averaging method (i.e., $E[y \mid x_1 = x_2 = \cdots = x_K = 1/K]$, where the reference is a hypothetical point at which the means of all $x$s equal $1/K$) is also arbitrary in essence.

Some will argue that the averaging method nonetheless offers a normalized, standardized, and detailed decomposition. They will say that it is better to use $1/K$ than to cherry-pick the reference group. However, the averaging method is sensitive to the number of factor levels, $K$. This sensitivity leads to another kind of identification problem. Suppose a researcher uses four factor levels for education. One possible four-level grouping would be LTHS (less than high school), HSG (high school graduate), SC (some college), and BA+ (bachelor's degree or higher), in which case the intercept calculated using the averaging method is $E(y \mid x_{LTHS} = x_{HSG} = x_{SC} = x_{BA+} = .25)$. If the researcher changes her education categories to five, dividing BA+ into BA and Grad (graduate degrees), the intercept of the averaging method changes to $E(y \mid x_{LTHS} = x_{HSG} = x_{SC} = x_{BA} = x_{Grad} = .20)$. In the former case, workers with a bachelor's degree or higher are assigned a 25% share, whereas in the second grouping they (BA and Grad combined) are assigned a 40% share. As $K$ changes, so does the intercept estimate, as well as all the other coefficients.

The averaging method is sensitive not only to the number of groups, but also to the method of grouping. Suppose a researcher uses another four-level grouping, which combines LTHS and HSG into <HSG and divides BA+ into BA and Grad, so that the four groups are <HSG, SC, BA, and Grad.
In the previous four-level grouping, BA+ is assigned a 25% share of workers, whereas in the new grouping BA and Grad together are assigned a 50% share. Because the hypothetical distribution of $x$ behind the intercept differs with the grouping method, almost all estimated effects differ as well, including the intercept effect (D1A), the sum of coefficient effects (sum of D1B), and the coefficient effects (D1B) and endowment effects (D2) of each factor level. Only the sum of D1 and the sum of D2 are consistently estimated regardless of model specification. Note that the original BO decomposition also yields the same results regarding the sums of D1 and D2.

There is also an identification problem with dichotomous variables (Yun 2005). Which values should be coded as 1 and 0 is an arbitrary decision made by the researcher. With the averaging method, the intercept for a dichotomous variable is the value expected when the proportions of $x = 1$ and $x = 0$ are equal (i.e., .50). This simply is not the case for many variables of interest, such as union membership.

In sum, despite contentions to the contrary, the averaging method suffers from the same identification problem as the original BO decomposition. Different numbers of groups and different methods of grouping can lead to substantially divergent results.

3 A Suggested Alternative Approach: The Grand-Mean Weighting Method

If neither the current BO decomposition nor the averaging method can solve the identification problem, are detailed decompositions feasible at all? They are said to be infeasible because the intercept term in regression models depends on the choice of reference group. In fact, any normalization with the linear restriction of the parameters $\sum_{k=1}^{K} b_k w_k = 0$ will yield model-free decomposition estimates. There are an infinite number of possible restrictions. The averaging method (where the weighting factors $w_k$ are equal to $1/K$) is a special case of these restrictions. Even the original BO decomposition can be considered a special case of the restriction, with $w_1 = 1$ for $k = 1$ (factor level 1 is the reference group) and $w_k = 0$ for all other $k$. Given the restriction $\sum_{k=1}^{K} b_k w_k = 0$, no detailed decomposition is mathematically wrong. The feasibility of detailed decomposition does not hinge on mathematical tweaks, but rather on acceptable restrictions involving the weighting factors $w_k$.
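The family of restrictions $\sum_k b_k w_k = 0$ can be sketched as a single function of the weights. In the snippet below the starting coefficients and the third set of weights are hypothetical values chosen for illustration, and `normalize` is a name introduced here rather than an established routine. It assumes the weights sum to one and that the dummies of the factor are exhaustive and mutually exclusive.

```python
def normalize(a, b, w):
    """Re-express (a, b) so that sum_k w_k * b*_k = 0, for weights summing to one."""
    if abs(sum(w) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    shift = sum(wk * bk for wk, bk in zip(w, b))
    # Moving `shift` from the slopes to the intercept leaves a + b_k unchanged.
    return a + shift, [bk - shift for bk in b]


# Hypothetical dummy-coded estimates with factor level 1 omitted (b_1 = 0).
a, b = 2.40, [0.0, 0.30, 0.50, 0.90]
K = len(b)

schemes = {
    "reference level 1 (original BO)": [1.0, 0.0, 0.0, 0.0],
    "averaging (1/K)": [1.0 / K] * K,
    "other hypothetical weights": [0.12, 0.33, 0.30, 0.25],
}

results = {name: normalize(a, b, w) for name, w in schemes.items()}
for name, (a_star, b_star) in results.items():
    print(f"{name}: a*={a_star:.3f}, b*={[round(v, 3) for v in b_star]}")
```

All three schemes reproduce the same fitted value $a^* + b^*_k$ for every level; they differ only in how that value is split between the intercept and the level coefficient, which is why the choice among them rests on the acceptability of the weights rather than on the algebra.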
As an alternative approach, I suggest applying a weighted averaging method in which the weighting factors are grand means. The weighted normalization is in line with the two-step re-normalization proposed by Haisken-DeNew and Schmidt (1997) and the restricted least squares method discussed by Greene and Seaks (1991). I will call this the grand-mean (GM) weighting method. In the following section, I first discuss the mathematical advantage of the GM weighting method and then turn to its theoretical implications.

To carry out a detailed decomposition using the GM weighting method, it is necessary to transform the estimated regression coefficients of equation 1, $b_k$, as follows:

$$y = (a + \tilde{b}) + \sum_{k=1}^{K} (b_k - \tilde{b})\,x_k + e = a^* + \sum_{k=1}^{K} b^*_k x_k + e, \quad \text{where } \tilde{b} = \sum_{k=1}^{K} b_k \bar{x}_k. \qquad (4)$$

Yun (2005) used a simple arithmetic mean to compute $\bar{b}$ in equation 3. The normalization of the averaging method by $\sum_{k=1}^{K} b^*_k = 0$ is equivalent to the normalization based on $\sum_{k=1}^{K} b^*_k (1/K) = 0$. The GM weighting method, instead, normalizes the estimated coefficients by imposing $\sum_{k=1}^{K} b^*_k \bar{x}_k = 0$, where $\bar{x}_k$ refers to the grand mean over group W and group B combined. That is, $\tilde{b}$ is the grand-mean-weighted sum of $b_k$ in equation 4. After these transformations, the usual BO decomposition techniques can be applied.

The insensitivity of $a^*$ to the choice of reference group can easily be proved. For simplicity, let us assume there is only one dummy variable on the right-hand side of the equation. When $x_0$ is the reference group, the regression model looks like equation 5a, and when we change the reference group to $x_1$, the estimated regression model becomes equation 5b.

$$y = a + b x_1 + e \qquad (5a)$$

$$y = (a + b) + (-b)\,x_0 + e \qquad (5b)$$

Applying equation 4, equations 5a and 5b are transformed to equations 6a and 6b respectively:

$$y = (a + b\bar{x}_1) + (0 - b\bar{x}_1)\,x_0 + (b - b\bar{x}_1)\,x_1 + e \qquad (6a)$$

$$y = (a + b + [-b\bar{x}_0]) + (-b - [-b\bar{x}_0])\,x_0 + (0 - [-b\bar{x}_0])\,x_1 + e \qquad (6b)$$

The intercept of equation 6a is $(a + b\bar{x}_1)$ and that of equation 6b is $(a + b + [-b\bar{x}_0])$. Because $x_1$ is a dichotomous variable, $x_0 = 1 - x_1$ and $\bar{x}_0 = 1 - \bar{x}_1$. If we replace $\bar{x}_0$ with $1 - \bar{x}_1$ in the intercept of 6b, it reduces to $(a + b\bar{x}_1)$, which is identical to the intercept of 6a. This proves that the intercept of the GM weighting method is insensitive to the choice of reference group.

The intercept effect (D1A) of the GM weighting method quantifies the extent to which members of the disadvantaged group are, on average, treated differently than members of the advantaged group in a society. Thus, this value can also be interpreted as the average extent of discrimination (assuming no unobserved heterogeneity). The estimated contribution of the individual factor level (D1B) of the GM weighting method indicates a deviation from the mean discrimination level. The sum of D1B of the GM weighting method is close to zero.[2] When there is group-based discrimination, all members of the disadvantaged group suffer to a similar degree. If the extent of discrimination varies greatly within a given minority group, it would be hard to label the discrimination as group-based. Therefore, small D1B effects along with a large intercept effect are what one would expect if there is group-based discrimination.

The sums of D2 are identical between the averaging method and the GM weighting method. Unlike the averaging method, however, the GM weighting method yields consistent estimates of D2 for individual factor levels regardless of model specification.

[2] The sum of D1B of the GM weighting method will be meaningfully different from zero only if the value of $\bar{x}$ for each group differs from the grand mean.
A similar limitation is unavoidable in all detailed decomposition methods. For the averaging method, the sum of D1B can be substantively large only if the distribution of $x$ differs from $1/K$, and the sum of D1B becomes larger as a researcher arbitrarily applies a model specification that increases $|\bar{x}_k - 1/K|$. For the original BO method, the sum of D1B will be near zero if the proportion of the reference group approaches 1, and conversely it becomes larger as a researcher arbitrarily chooses a reference group whose proportion (i.e., $\bar{x}_k$) is smaller than that of the other groups.
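The one-dummy argument (equations 5a through 6b) can also be checked numerically. The values of $a$, $b$ and the grand mean of $x_1$ in this sketch are hypothetical, and `gm_normalization` is simply an illustrative name for the grand-mean-weighted transformation.

```python
def gm_normalization(a, b, grand_means):
    """Shift the intercept by the grand-mean-weighted sum of the b_k."""
    weighted_sum = sum(bk * xk for bk, xk in zip(b, grand_means))
    return a + weighted_sum, [bk - weighted_sum for bk in b]


# Hypothetical estimates for y = a + b*x1 + e, with x0 as the reference group.
a, b = 2.50, 0.20
x1_bar = 0.60            # hypothetical grand mean of x1
x0_bar = 1.0 - x1_bar    # x0 = 1 - x1 because the variable is dichotomous

# Coefficients ordered as (x0, x1) under each choice of reference group.
fit_x0_omitted = (a, [0.0, b])         # equation 5a
fit_x1_omitted = (a + b, [-b, 0.0])    # equation 5b

eq_6a = gm_normalization(*fit_x0_omitted, [x0_bar, x1_bar])
eq_6b = gm_normalization(*fit_x1_omitted, [x0_bar, x1_bar])

print("from 5a:", eq_6a)
print("from 5b:", eq_6b)
```

Both starting points lead to the same intercept, $a + b\bar{x}_1$, and the same pair of normalized coefficients, up to floating-point rounding.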

Table 1: Descriptive Statistics

Columns: Total ($\bar{x}$); White ($\bar{x}^W$); Black ($\bar{x}^B$); Gap ($\bar{x}^W - \bar{x}^B$)

Log Wage, $\bar{y}$
Less Than High School (LTHS)
High School Graduate (HSG)
Some College (SC)
Bachelor Degree (BA)
Graduate Degree (Grad)
Never Married
Currently Married
Widow/Divorce/Separated

4 An Illustrative Example

Using the 2009 Current Population Survey Monthly Outgoing Rotation Group (CPS-MORG), I decompose the log wage gap (.284 log dollars) between white male workers and black male workers. To examine the sensitivity of decomposition results to model specifications, I estimate three models, applying both the averaging method and the GM weighting method. Table 1 shows the grand means and the two group means. Table 2 presents the decomposition results.[3] Model 1 decomposes the racial gap using five educational levels, three age groups, and three marital statuses. In Model 2, LTHS and HSG are collapsed into <HSG and marital status is divided into two categories. Model 3 has the same specification as Model 2 except for the educational categories: BA and Grad are collapsed into BA+, and LTHS and HSG are separately identified.

[3] The variance-covariance matrix of the averaging method is discussed in detail by Yun (2008). The same technique can be applied for the modified GMC method with slight modification. The variance-covariance matrix of the normalized regression coefficients of the averaging method is computed as $\Sigma_{b^*} = W \Sigma_{B_0} W$, where $W$ is a weight matrix and $\Sigma_{B_0}$ is a reformatted variance-covariance matrix of the original regression coefficients ($\Sigma_B$). For the GM weighting method, everything except the weighting matrix $W$ is the same as for the averaging method. The weighting matrix needs to be rebuilt by replacing the set of matrices of $1/K_j$ for each factor with a set of matrices of weighting values based on the grand means. The new coefficient for the GM weighting method is obtained by taking the diagonal of $WB$. The variance-covariance matrix of the new coefficients is computed as $\Sigma_{b^*} = W \Sigma_B W$.
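The idea behind the variance computation in footnote 3 can be sketched for a single factor. The construction below is an assumption made for illustration, not necessarily the exact layout of $W$ used by Yun (2008): the coefficient vector is ordered as $(a, b_1, \ldots, b_K)$ with a zero entry for the omitted level, $W$ is built so that $W\beta$ returns the normalized coefficients, and the covariance is propagated with the standard rule for a linear transformation, $W \Sigma W'$, with the transpose written explicitly. All numbers are hypothetical.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def weight_matrix(w):
    """Matrix mapping (a, b_1..b_K) to (a*, b*_1..b*_K) for weights summing to one."""
    rows = [[1.0] + list(w)]                 # a* = a + sum_k w_k b_k
    for k in range(len(w)):
        row = [0.0] + [-wj for wj in w]      # b*_k = b_k - sum_j w_j b_j
        row[k + 1] += 1.0
        rows.append(row)
    return rows


# Hypothetical dummy-coded estimates (level 1 omitted) and their covariance,
# padded with a zero row and column for the omitted level.
BETA0 = [[2.40], [0.0], [0.30], [0.50]]
SIGMA0 = [[0.04, 0.0, -0.04, -0.04],
          [0.00, 0.0, 0.00, 0.00],
          [-0.04, 0.0, 0.09, 0.04],
          [-0.04, 0.0, 0.04, 0.12]]
GM_WEIGHTS = [0.2, 0.5, 0.3]                 # hypothetical grand means


def normalized_with_cov(w):
    """Return the normalized coefficients and their covariance matrix."""
    W = weight_matrix(w)
    beta_star = [row[0] for row in matmul(W, BETA0)]
    sigma_star = matmul(matmul(W, SIGMA0), transpose(W))
    return beta_star, sigma_star


beta_star, sigma_star = normalized_with_cov(GM_WEIGHTS)
print("normalized coefficients:", [round(v, 3) for v in beta_star])
print("variances:", [round(sigma_star[i][i], 4) for i in range(len(beta_star))])
```

Passing `[1/3, 1/3, 1/3]` instead of `GM_WEIGHTS` gives the averaging-method counterpart, so in this sketch the two methods differ only in the weights used to build $W$.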

In all three models, the sum of D1 (D1A + D1B) and the sum of D2 are identical between the two decomposition methods. The sum of D2 for each factor (e.g., D2 of Edu Effect) is also identical across methods. All other estimated decomposition components, however, differ by decomposition method. Importantly, the decomposition results of the GM weighting method are consistent across model specifications, while those of the averaging method are substantially altered by the model. The intercept component (D1A) of the averaging method is .175 in Model 1, but it becomes .167 in Model 2 and .200 in Model 3. In contrast, the intercept components using the GM weighting method barely change across the models.

The coefficient effects (D1) of individual factors and factor levels are not consistently estimated with the averaging method. For example, in Model 1, which uses five educational categories, the sum of D1 of education is .005. When the education factor levels are reduced to four categories in Model 2, the sum of D1 of education becomes .023. When I modify the classification of education factors again in Model 3, the effect now turns out to be -.012. These results imply that, with the averaging method, we cannot determine whether the coefficient effect of education contributes to a reduction or to an increase of the racial gap. As another example, the sum of D1 for two educational factor levels, LTHS and HSG, under the averaging method is .014 in Model 1, but it is .024 in Model 2 and .007 in Model 3. Unlike the averaging method, the GM weighting method yields consistent estimates of the D1 effects for individual factors and factor levels under the different classifications of factor levels.

The averaging method does not produce consistent estimates of the endowment effects (D2) for individual factor levels either.
The estimated effect of D2 for BA and Grad combined in the averaging method is either .028, .039, or .049 depending on the model specification, while that in the GM weighting method is almost identical across models. When a dichotomous variable is used, the averaging method reports equal effects for the two categories by design. As a result, the D2s of married and not-married are both .016 in Model 2. However, when three marital status factor levels are used in Model 1, the D2 of not married (i.e., the combination of never-married and widowed/divorced/separated) is smaller than that of married. Unlike the averaging method, the GM weighting method again produces consistent amounts of D2 for married regardless of the model.

Another noteworthy point concerns the effects of age. Unlike for other variables, the estimated effects D1 and D2 are near zero in both methods. This is simply because the proportion of each age group happens to equal $1/K$ (i.e., 1/3) for both racial groups.

In short, this illustration clearly shows that the detailed decompositions of the GM weighting method are model-free and consistent, while the averaging method is sensitive to model specifications.

Table 2: Decomposition of the Mean Log Wage Gap between White and Black Men Using Averaging and GM Weighting Methods

Columns, left to right: Averaging (Equation 3) D1, D2; GM weighting (Equation 4) D1, D2

A. Model 1
LTHS
HSG  **  .010*  .016**
SC
BA  **  **
Grad  **  *
[Σ Edu Effect]  [.005]  [.066]**  [.007]**  [.066]**
[Σ Age Effect]  [.000]  [-.001]**  [.001]*  [-.001]**
Never Married  -.011*  .012*  -.015**  .019**
Currently Married  .016**  .020**  .009**  .012**
Wid/Div/Sep
[Σ Marriage Effect]  [.005]*  [.034]**  [-.009]**  [.034]**
Intercept  .175**  .187**
[Total]  [.185]**  [.099]**  [.185]**  [.099]**

B. Model 2
<HSG  .024**  .031*  .017**  .024
SC
BA
Grad
[Σ Edu Effect]  [.023]**  [.063]**  [.007]**  [.063]**
[Σ Age Effect]  [.000]  [-.001]**  [.001]  [-.001]**
Currently Not-married  -.013**  **  .021
Currently Married  .012**  **  .012
[Σ Marriage Effect]  [-.001]**  [.032]**  [-.008]**  [.032]**
Intercept  .167**  .190**
[Total]  [.189]**  [.095]**  [.189]**  [.095]**

C. Model 3
LTHS
HSG  *  .016
SC
BA  **  **
[Σ Edu Effect]  [-.012]  [.064]**  [.005]  [.064]**
[Σ Age Effect]  [.000]  [-.001]**  [.000]  [-.001]**
Currently Not-married  -.014**  **  .021
Currently Married  .013**  **  .012
[Σ Marriage Effect]  [-.001]**  [.033]**  [-.009]**  [.033]**
Intercept  .200**  .191**
[Total]  [.188]**  [.096]**  [.188]**  [.096]**

5 The Best Practice

Even though I suggest applying the GM weighting method for detailed decomposition, I hasten to add that no method, not even the GM weighting method, can ultimately solve the identification problem and thus be universally applied to all situations. Recall that, given the restriction $\sum_{k=1}^{K} b_k w_k = 0$, all detailed decompositions are mathematically correct. Different restrictions lead to different interpretations of the estimates. Thus, the most essential question is what method is best and what principles should be applied when choosing a specific decomposition method.

If the task is to compare groups within a nation on economic performance, I argue that the GM weighting method is generally preferable. According to this method, the currently observed distribution of $x$ per se should be accepted as given. The intercept terms in equation 4 measure the expected wage when $x$ is distributed as $\bar{x}$. The reason why $w_k$ should be the grand mean rather than a group-specific mean (or other weighting factor) is that the wage is determined by supply and demand in the whole labor force of a society, not in a specific group. For example, the supply of highly educated workers can be measured best by $\bar{x}_{Grad}$, not by $\bar{x}^W_{Grad}$ or $\bar{x}^B_{Grad}$.
If the currently observed labor market reflects an equilibrium in employment, which in turn affects wages, the most reasonable and practical assumption on the current status of the labor market is $\bar{x}$.[4] The grand means need not be those of the two groups (W and B) only; their estimation can include other groups not of interest in the current study. Which distribution of $x$ yields the most realistic estimate of actual wages depends upon the researcher's judgment. When the sample in a given dataset is representative, the grand means are unbiased and consistent estimates of $E(x)$ for a given population. As far as $E(x)$ is considered a reflection of the current social conditions in a given society, the GM weighting method accurately estimates the extent to which each factor and factor level contributes to the group differences under these social conditions.

If there are theoretical or practical reasons to define a specific group as the reference group, the original BO decomposition method can be applied. For example, suppose a researcher conducts a detailed decomposition of a wage difference, with a particular focus on college premiums. College premiums are defined as the net difference in outcomes between HSG and BA. Therefore, HSG may be the natural choice for the reference group. Even in cases such as this, however, I recommend combining the original BO method with the GM weighting method. The dummy coding used in the original BO method need be applied only to the education factor, with the GM weighting method applied to all the others. This is because there are no strong theoretical or practical reasons to pick a certain factor level of another factor (e.g., the Pacific region) as the reference for computing college premiums.

One caveat researchers should bear in mind in interpreting the college premium effect of the BO decomposition is that the college premiums are computed relative to the within-group counterparts. By setting HSG as the reference point, we implicitly assume that HSG does not contribute to the wage gap between the two populations we are interested in.
A positive college premium effect in accounting for the wage gap between group B (black workers) and group W (white workers) does not necessarily indicate that the wage of the college-educated workers of group B exceeds that of group W. The higher college premium of group B can be a reflection of the excessive discrimination against the less-educated workers of group B.

[4] A similar logic was applied in a study of inter-industry wage differentials by Krueger and Summers (1988).

The example in the previous section illustrates, more generically, how analytic and substantive grounds may be invoked when choosing a specific decomposition method. In this spirit, I would recommend the averaging method only when it is reasonable to assume a uniform distribution across factor levels (although in the case of studies of labor market outcomes, such an assumption seems dubious). Other constraints of the form $\sum_{k=1}^{K} b_k w_k = 0$ might be possible, but there need to be compelling reasons to bypass the grand means.

The choice of weighting values becomes more subtle when detailed decomposition techniques are applied beyond the labor market. For group comparisons within a nation, such as racial differences in voting rates or gender differences in subjective well-being, the GM weighting method is still preferable in most cases. However, the GM weighting method is not always the best choice. In particular, grand means would not be appropriate weighting factors for international comparisons. Suppose a researcher is interested in the difference in mortality rates between the US and China. As the Chinese economy develops, the distributions of age, education level, and other covariates in China would approach those in the US. If a researcher wants to perform a decomposition under the assumption that the distributions of the $x$s in China are equal to those in the US, the weighting factors used to normalize the coefficients should be $\bar{x}^{US}$, not the (weighted) means of the group means, $\bar{x}^{US}$ and $\bar{x}^{CN}$.

There are other studies for which the averaging method is most appropriate. An example is the total fertility rate (TFR), which is a hypothetical fertility rate when a woman experiences the current, age-specific population fertility rates throughout her life. A uniform distribution of age groups should be assumed for computing the TFR.[5] If researchers want the intercept component (D1A) to represent the TFR after controlling for the other covariates, they can apply the averaging method to the age variable and the modified GM weighting method to the covariates.[6]

In short, there is no single best method of obtaining components for decomposition (Clogg et al. 1990:191). In principle, any detailed decomposition method is acceptable as long as there are theoretical or practical reasons to believe that the researcher's choice of reference group (or weighting factors) produces a meaningful decomposition result. This is why I refer to the GM weighting method as "A Suggested Alternative," not "The Solution."

[5] Yu Xie pointed this out in his comments on the previous draft of this paper at the quantitative methodology session of the 2011 American Sociological Association annual meeting.

[6] Note that the identification problem and its solutions as discussed in this paper are relevant to the methods used for rate standardization and the decomposition of rate differences, matters that have long been discussed in the demography literature (see, for example, Kitagawa (1955); Clogg and Eliason (1988); Liao (1989); Clogg et al. (1990)).

6 Conclusion

Since the development of the BO decomposition techniques, detailed decompositions have often been reported despite the caveats about the identification problem that have been raised by many scholars. To address this concern, Yun (2005) suggested the averaging method. Although that method is a notable advance and is undoubtedly applicable in some cases, it does not resolve the identification problem entirely. It is based on unrealistic distributional constraints on a set of dummy variables, and it is sensitive to the number of groups and the method of grouping.

The legitimacy of any detailed decomposition depends on the acceptability of the assumption of how the independent variables are distributed for the purpose of computing the intercept. There are multiple solutions to the identification problem. Different model specifications for detailed decompositions are appropriate, depending on various theoretical and/or practical considerations. Unless such considerations are compelling, however, the GM weighting method is likely to be more generally preferred for studies of labor market outcomes. This conclusion is based on the following reasons: (1) a state of equilibrium (or current social conditions) is the most reasonable assumption to make for labor market phenomena; (2) estimates of the contributions of individual factors and factor levels from the GM weighting method are the least sensitive to model specifications, such as the choice of coding scheme; and (3) as a result, the GM weighting method provides clear substantive interpretations of the intercept and coefficient components for individual factor levels.

Detailed decomposition can be applied to various sociological issues. Given the rise in the number of highly educated workers, estimating the detailed contributions of the differences in levels of education, fields of study, occupation, and other covariates in accounting for gender/race earnings gaps is especially promising. Wealth inequality is another area of interest, where a key issue is how to compute the extent to which each factor contributes to wealth accumulation (Spilerman 2000); detailed decomposition is an essential tool for such calculations (Scholz and Levine 2004).

Application of the GM weighting method to non-linear models is warranted. Although there is no general agreement on how to decompose the results of quantile regressions, the GM weighting method can be easily applied to quantile regressions as long as the use of the Blinder-Oaxaca type decomposition implemented by García et al. (2001) is acceptable.[7]

[7] See Gardeazabal and Ugidos (2005) for further discussion.

References

Berends, Mark, Samuel R. Lucas, and Roberto V. Penaloza. How Changes in Families and Schools Are Related to Trends in Black-White Test Scores. Sociology of Education 81:

Bhaumik, Sumon Kumar, Ira N. Gang, and Myeong-Su Yun. Ethnic conflict and economic disparity: Serbians and Albanians in Kosovo. Journal of Comparative Economics 34:

Blinder, Alan S. Wage Discrimination: Reduced Form and Structural Estimates. Journal of Human Resources 8:

Bobbitt-Zeher, Donna. The Gender Income Gap and the Role of Education. Sociology of Education 80:1-22.

Clogg, Clifford C. and Scott R. Eliason. On Regression Standardization for Moments. Sociological Methods and Research 14:

Clogg, Clifford C. and Scott R. Eliason. A Flexible Procedure for Adjusting Rates and Proportions, Including Statistical Methods for Group Comparisons. American Sociological Review 53:

Clogg, Clifford C., James W. Shockey, and Scott R. Eliason. A General Statistical Framework for Adjustment of Rates. Sociological Methods and Research 19:

Cotton, Jeremiah. Decomposing Income, Earnings, and Wage Differentials. Sociological Methods and Research 14:

DeLeire, Thomas. Changes in Wage Discrimination against People with Disabilities: Journal of Human Resources 36:

Dodoo, F. Nii-Amoo. Earnings differences among Blacks in America. Social Science Research 20:

Farkas, George and Kevin Vicknair. Appropriate Tests of Racial Wage Discrimination Require Controls for Cognitive Skill: Comment on Cancio, Evans, and Maume. American Sociological Review 61:

Fields, Judith and Edward N. Wolff. Interindustry Wage Differentials and the Gender Wage Gap. Industrial and Labor Relations Review 49:

Fortin, Nicole, Thomas Lemieux, and Sergio Firpo. Decomposition Methods in Economics. NBER Working Papers

Fox, John. Applied Regression Analysis and Generalized Linear Models. Thousand Oaks, CA: Sage Publications, Inc.

García, Jaume, Pedro J. Hernández, and Ángel López-Nicolás. How Wide is the Gap? An Investigation of Gender Wage Differences Using Quantile Regression. Empirical Economics 26:

Gardeazabal, Javier and Arantza Ugidos. More on Identification in Detailed Wage Decompositions. The Review of Economics and Statistics 86:

Gardeazabal, Javier and Arantza Ugidos. Gender Wage Discrimination at Quantiles. Journal of Population Economics 18:

Greene, William H. and Terry G. Seaks. The Restricted Least Squares Estimator: A Pedagogical Note. The Review of Economics and Statistics 73:

Haisken-DeNew, J. P. and C. M. Schmidt. Inter-Industry and Inter-Regional Differentials: Mechanics and Interpretation. The Review of Economics and Statistics 79:

Horrace, William C. and Ronald L. Oaxaca. Inter-Industry Wage Differentials and the Gender Wage Gap: An Identification Problem. Industrial and Labor Relations Review 54:

Jones, F. L. On Decomposing the Wage Gap: A Critical Comment on Blinder's Method. Journal of Human Resources 18:

Jones, F. L. and Jonathan Kelley. Decomposing Differences Between Groups: A Cautionary Note on Measuring Discrimination. Sociological Methods and Research 12:

Kim, ChangHwan. Decomposing the Change in the Wage Gap Between White and Black Men Over Time, : An Extension of the Blinder-Oaxaca Decomposition Method. Sociological Methods and Research 38:

Kitagawa, E. M. Components of a Difference between Two Rates. Journal of the American Statistical Association 50:

Krueger, Alan B. and Lawrence H. Summers. Efficiency Wages and the Inter-Industry Wage Structure. Econometrica 57:

Liao, Tim Futing. A Flexible Approach for the Decomposition of Rate Differences. Demography 26:

Oaxaca, Ronald L. Male-female Wage Differentials in Urban Labor Markets. International Economic Review 14:

Oaxaca, Ronald L. and Michael R. Ransom. Identification in Detailed Wage Decompositions. The Review of Economics and Statistics 81:

Phillips, Julie A. and Megan M. Sweeney. Can Differential Exposure to Risk Factors Explain Recent Racial and Ethnic Variation in Marital Disruption? Social Science Research 35:

Powers, Daniel A., Hirotoshi Yoshioka, and Myeong-Su Yun. forthcoming. mvdcmp: Multivariate Decomposition for Nonlinear Response Models. The Stata Journal.

Powers, Daniel A. and Myeong-Su Yun. Multivariate Decomposition for Hazard Rate Models. Sociological Methodology 39:

Sakamoto, Arthur, Huei-Hsia Wu, and Jessie M. Tzeng. The Declining Significance of Race among American Men During the Latter Half of the Twentieth Century. Demography 37:

Sayer, Liana C. Are Parents Investing Less in Children? Trends in Mothers' and Fathers' Time with Children. American Journal of Sociology 110:1-43.

Scholz, John Karl and Kara Levine. U.S. Black-White Wealth Inequality. In Social Inequality, edited by Kathryn M. Neckerman, pp , New York. Russell Sage Foundation.

Spilerman, Seymour. Wealth and Stratification Processes. Annual Review of Sociology 26:

Stearns, Elizabeth, Stephanie Moller, Judith Blau, and Stephanie Potochnick. Staying Back and Dropping Out: The Relationship Between Grade Retention and School Dropout. Sociology of Education 80:

Van Hook, Jennifer, Susan L. Brown, and Maxwell Ndigume Kwenda. A Decomposition of Trends in Poverty among Children of Immigrants. Demography 41:

Yun, Myeong-Su. A Simple Solution to the Identification Problem in Detailed Wage Decompositions. Economic Inquiry 43:

Yun, Myeong-Su. Identification Problem and Detailed Oaxaca Decomposition: A General Solution and Inference. Journal of Economic and Social Measurement 33:


More information

Mathematics for Economics MA course

Mathematics for Economics MA course Mathematics for Economics MA course Simple Linear Regression Dr. Seetha Bandara Simple Regression Simple linear regression is a statistical method that allows us to summarize and study relationships between

More information

Lecture 2: The Human Capital Model

Lecture 2: The Human Capital Model Lecture 2: The Human Capital Model Fatih Guvenen University of Minnesota February 7, 2018 Fatih Guvenen (2018) Lecture 2: Ben Porath February 7, 2018 1 / 16 Why Study Wages? Labor income is 2/3 of GDP.

More information

Social Vulnerability Index. Susan L. Cutter Department of Geography, University of South Carolina

Social Vulnerability Index. Susan L. Cutter Department of Geography, University of South Carolina Social Vulnerability Index Susan L. Cutter Department of Geography, University of South Carolina scutter@sc.edu Great Lakes and St. Lawrence Cities Initiative Webinar December 3, 2014 Vulnerability The

More information

Chapter 9. Dummy (Binary) Variables. 9.1 Introduction The multiple regression model (9.1.1) Assumption MR1 is

Chapter 9. Dummy (Binary) Variables. 9.1 Introduction The multiple regression model (9.1.1) Assumption MR1 is Chapter 9 Dummy (Binary) Variables 9.1 Introduction The multiple regression model y = β+β x +β x + +β x + e (9.1.1) t 1 2 t2 3 t3 K tk t Assumption MR1 is 1. yt =β 1+β 2xt2 + L+β KxtK + et, t = 1, K, T

More information

Logistic regression: Why we often can do what we think we can do. Maarten Buis 19 th UK Stata Users Group meeting, 10 Sept. 2015

Logistic regression: Why we often can do what we think we can do. Maarten Buis 19 th UK Stata Users Group meeting, 10 Sept. 2015 Logistic regression: Why we often can do what we think we can do Maarten Buis 19 th UK Stata Users Group meeting, 10 Sept. 2015 1 Introduction Introduction - In 2010 Carina Mood published an overview article

More information

Working Paper No Earnings Functions and the Measurement of the Determinants of Wage Dispersion: Extending Oaxaca's Approach*

Working Paper No Earnings Functions and the Measurement of the Determinants of Wage Dispersion: Extending Oaxaca's Approach* Working Paper No. 521 Earnings Functions and the Measurement of the Determinants of Wage Dispersion: Extending Oaxaca's Approach* by Joseph Deutsch Bar-Ilan University, Israel and Jacques Silber The Levy

More information

Lecture Notes on Measurement Error

Lecture Notes on Measurement Error Steve Pischke Spring 2000 Lecture Notes on Measurement Error These notes summarize a variety of simple results on measurement error which I nd useful. They also provide some references where more complete

More information

Lecture 24: Partial correlation, multiple regression, and correlation

Lecture 24: Partial correlation, multiple regression, and correlation Lecture 24: Partial correlation, multiple regression, and correlation Ernesto F. L. Amaral November 21, 2017 Advanced Methods of Social Research (SOCI 420) Source: Healey, Joseph F. 2015. Statistics: A

More information

Do not copy, post, or distribute

Do not copy, post, or distribute 14 CORRELATION ANALYSIS AND LINEAR REGRESSION Assessing the Covariability of Two Quantitative Properties 14.0 LEARNING OBJECTIVES In this chapter, we discuss two related techniques for assessing a possible

More information

One Economist s Perspective on Some Important Estimation Issues

One Economist s Perspective on Some Important Estimation Issues One Economist s Perspective on Some Important Estimation Issues Jere R. Behrman W.R. Kenan Jr. Professor of Economics & Sociology University of Pennsylvania SRCD Seattle Preconference on Interventions

More information

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies Kosuke Imai Department of Politics Princeton University November 13, 2013 So far, we have essentially assumed

More information

Combining Non-probability and Probability Survey Samples Through Mass Imputation

Combining Non-probability and Probability Survey Samples Through Mass Imputation Combining Non-probability and Probability Survey Samples Through Mass Imputation Jae-Kwang Kim 1 Iowa State University & KAIST October 27, 2018 1 Joint work with Seho Park, Yilin Chen, and Changbao Wu

More information

Applied Economics. Regression with a Binary Dependent Variable. Department of Economics Universidad Carlos III de Madrid

Applied Economics. Regression with a Binary Dependent Variable. Department of Economics Universidad Carlos III de Madrid Applied Economics Regression with a Binary Dependent Variable Department of Economics Universidad Carlos III de Madrid See Stock and Watson (chapter 11) 1 / 28 Binary Dependent Variables: What is Different?

More information

Trends in the Relative Distribution of Wages by Gender and Cohorts in Brazil ( )

Trends in the Relative Distribution of Wages by Gender and Cohorts in Brazil ( ) Trends in the Relative Distribution of Wages by Gender and Cohorts in Brazil (1981-2005) Ana Maria Hermeto Camilo de Oliveira Affiliation: CEDEPLAR/UFMG Address: Av. Antônio Carlos, 6627 FACE/UFMG Belo

More information

Gabriel Montes-Rojas, Lucas Siga & Ram Mainali The Journal of Economic Inequality

Gabriel Montes-Rojas, Lucas Siga & Ram Mainali The Journal of Economic Inequality Mean and quantile regression Oaxaca- Blinder decompositions with an application to caste discrimination Gabriel Montes-Rojas, Lucas Siga & Ram Mainali The Journal of Economic Inequality ISSN 1569-1721

More information

Tables and Figures. This draft, July 2, 2007

Tables and Figures. This draft, July 2, 2007 and Figures This draft, July 2, 2007 1 / 16 Figures 2 / 16 Figure 1: Density of Estimated Propensity Score Pr(D=1) % 50 40 Treated Group Untreated Group 30 f (P) 20 10 0.01~.10.11~.20.21~.30.31~.40.41~.50.51~.60.61~.70.71~.80.81~.90.91~.99

More information

Chapter 4 Regression with Categorical Predictor Variables Page 1. Overview of regression with categorical predictors

Chapter 4 Regression with Categorical Predictor Variables Page 1. Overview of regression with categorical predictors Chapter 4 Regression with Categorical Predictor Variables Page. Overview of regression with categorical predictors 4-. Dummy coding 4-3 4-5 A. Karpinski Regression with Categorical Predictor Variables.

More information

Eco 391, J. Sandford, spring 2013 April 5, Midterm 3 4/5/2013

Eco 391, J. Sandford, spring 2013 April 5, Midterm 3 4/5/2013 Midterm 3 4/5/2013 Instructions: You may use a calculator, and one sheet of notes. You will never be penalized for showing work, but if what is asked for can be computed directly, points awarded will depend

More information

Equation Number 1 Dependent Variable.. Y W's Childbearing expectations

Equation Number 1 Dependent Variable.. Y W's Childbearing expectations Sociology 592 - Homework #10 - Advanced Multiple Regression 1. In their classic 1982 paper, Beyond Wives' Family Sociology: A Method for Analyzing Couple Data, Thomson and Williams examined the relationship

More information