Trends in the Relative Distribution of Wages by Gender and Cohorts in Brazil ( )

Similar documents
An Introduction to Relative Distribution Methods

Sources of Inequality: Additive Decomposition of the Gini Coefficient.

Using Relative Distribution Software

On the Empirical Content of the Formal-Informal Labor Market Segmentation Hypothesis *

Population Aging, Labor Demand, and the Structure of Wages

Approximation of Functions

Nonparametric Density Estimation

Decomposing Changes (or Differences) in Distributions. Thomas Lemieux, UBC Econ 561 March 2016

Quantile methods. Class Notes Manuel Arellano December 1, Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be

41903: Introduction to Nonparametrics

Chapter 8. Quantile Regression and Quantile Treatment Effects

More formally, the Gini coefficient is defined as. with p(y) = F Y (y) and where GL(p, F ) the Generalized Lorenz ordinate of F Y is ( )

Making sense of Econometrics: Basics

Labour Supply Responses and the Extensive Margin: The US, UK and France

The decomposition of wage gaps in the Netherlands with sample selection adjustments

Regression #8: Loose Ends

Notes on Heterogeneity, Aggregation, and Market Wage Functions: An Empirical Model of Self-Selection in the Labor Market

The Changing Nature of Gender Selection into Employment: Europe over the Great Recession

Wages and housing transfers in China Inequality decomposition by source using copulas

More on Roy Model of Self-Selection

An Analysis of College Algebra Exam Scores December 14, James D Jones Math Section 01

The Gender Earnings Gap: Measurement and Analysis

1. Capitalize all surnames and attempt to match with Census list. 3. Split double-barreled names apart, and attempt to match first half of name.

Normalized Equation and Decomposition Analysis: Computation and Inference

Understanding Sources of Wage Inequality: Additive Decomposition of the Gini Coefficient Using. Quantile Regression

Rethinking Inequality Decomposition: Comment. *London School of Economics. University of Milan and Econpubblica

The Instability of Correlations: Measurement and the Implications for Market Risk

Econometrics I. Professor William Greene Stern School of Business Department of Economics 1-1/40. Part 1: Introduction

T-Test QUESTION T-TEST GROUPS = sex(1 2) /MISSING = ANALYSIS /VARIABLES = quiz1 quiz2 quiz3 quiz4 quiz5 final total /CRITERIA = CI(.95).

Stat 5101 Lecture Notes

Worker Heterogeneity, Wage Inequality, and International Trade: Theory and Evidence from Brazil

Distribution regression methods

What Accounts for the Growing Fluctuations in FamilyOECD Income March in the US? / 32

FINITE MIXTURES OF LOGNORMAL AND GAMMA DISTRIBUTIONS

MIT Graduate Labor Economics Spring 2015 Lecture Note 1: Wage Density Decompositions

Within-Groups Wage Inequality and Schooling: Further Evidence for Portugal

Model-free prediction intervals for regression and autoregression. Dimitris N. Politis University of California, San Diego

Conditional density estimation: an application to the Ecuadorian manufacturing sector. Abstract

Additive Decompositions with Interaction Effects

Comparative Advantage and Schooling

Nonparametric Methods

Potential Outcomes Model (POM)

Impact Evaluation Technical Workshop:

Answer Key: Problem Set 5

Applied Microeconometrics. Maximilian Kasy

Name Class Date. Adding and Subtracting Polynomials Going Deeper Essential question: How do you add and subtract polynomials?

The Nonparametric Bootstrap

Ignoring the matching variables in cohort studies - when is it valid, and why?

What s New in Econometrics? Lecture 14 Quantile Methods

Review of Multiple Regression

Semi and Nonparametric Models in Econometrics

The Regression Tool. Yona Rubinstein. July Yona Rubinstein (LSE) The Regression Tool 07/16 1 / 35

COUNTERFACTUAL DECOMPOSITION OF CHANGES IN WAGE DISTRIBUTIONS USING QUANTILE REGRESSION

Chapter 9: The Regression Model with Qualitative Information: Binary Variables (Dummies)

Introduction to Econometrics

3 Joint Distributions 71

GIST 4302/5302: Spatial Analysis and Modeling

ECON 482 / WH Hong Binary or Dummy Variables 1. Qualitative Information

ECON Interactions and Dummies

Can we do statistical inference in a non-asymptotic way? 1

Trade and Inequality: From Theory to Estimation

Sensitivity checks for the local average treatment effect

A Distributional Framework for Matched Employer Employee Data

The Allocation of Talent and U.S. Economic Growth

MIT Graduate Labor Economics Spring 2013 Lecture Note 6: Wage Density Decompositions

Financial Econometrics and Volatility Models Copulas

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

Ch 7: Dummy (binary, indicator) variables

1 Introduction Overview of the Book How to Use this Book Introduction to R 10

finite-sample optimal estimation and inference on average treatment effects under unconfoundedness

Bayesian vs frequentist techniques for the analysis of binary outcome data

12 - Nonparametric Density Estimation

Inference and Regression

INEQUALITY OF HAPPINESS IN THE U.S.: and. James Foster

Intermediate Econometrics

An Extension of the Blinder-Oaxaca Decomposition to a Continuum of Comparison Groups

Kernel density estimation of reliability with applications to extreme value distribution

Introduction to Linear Regression Analysis

Dover- Sherborn High School Mathematics Curriculum Probability and Statistics

Econometrics Problem Set 4

Econometrics in a nutshell: Variation and Identification Linear Regression Model in STATA. Research Methods. Carlos Noton.

ES103 Introduction to Econometrics

Machine Learning Linear Regression. Prof. Matteo Matteucci

Non-Parametric Tests for Imperfect Repair

Flexible Estimation of Treatment Effect Parameters

Semiparametric Generalized Linear Models

Statistics: Learning models from data

The reldist Package. April 5, 2006

Tables and Figures. This draft, July 2, 2007

Solutions to Odd-Numbered End-of-Chapter Exercises: Chapter 8

Frequency Analysis & Probability Plots

L7. DISTRIBUTION REGRESSION AND COUNTERFACTUAL ANALYSIS

Applied Quantitative Methods II

Akaike Information Criterion to Select the Parametric Detection Function for Kernel Estimator Using Line Transect Data

Inference for the Regression Coefficient

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

New Developments in Econometrics Lecture 16: Quantile Estimation

Firms, Informality and Development: Theory and evidence from Brazil. Online Appendix

ECON 361 Assignment 1 Solutions

Transcription:

Trends in the Relative Distribution of Wages by Gender and Cohorts in Brazil (1981-2005) Ana Maria Hermeto Camilo de Oliveira Affiliation: CEDEPLAR/UFMG Address: Av. Antônio Carlos, 6627 FACE/UFMG Belo Horizonte, MG, Brazil 31270-901 Phone Number: 55-31-34097154 E-mail: ahermeto@cedeplar.ufmg.br Raquel Rangel de Meireles Guimarães Affiliation: CEDEPLAR/UFMG Address: Av. Antônio Carlos, 6627 FACE/UFMG Belo Horizonte, MG, Brazil 31270-901 Phone Number: 55-31-34097154 E-mail: rguima@cedeplar.ufmg.br Extended Abstract Most of previous works regarding gender wage differentials are based on parametric methodologies for estimation of wage differentials by gender, using traditional mincerian regressions or quantile regression estimates, with classical or bayesian statistics. The aim of our paper is to analyse and decompose changes in earnings relative distribution between men and women in different cohorts, using the relative distribution framework. This methodology was proposed by Handcock and Morris (1999), considering non-parametrical tools which allow an exploratory analysis that is independent of parametric assumptions on the mathematical form of the response-variable probabilities. We use density estimates of the kernel probability for each sex and cohort and decompositions of the relative distribution to get substantive evidences for gender differentials and relative mobility in Brazil, from 1981 to 2005. We use microdata from the Brazilian Household Sample Survey (Pesquisa Nacional por Amostra de Domicílios - PNAD), for 1981, 1992 and 2005. This survey is yearly conducted by the Brazilian Census Bureau (Instituto Brasileiro de Geografia e Estatística IBGE) and presents a comprehensive data source, in particular about the labor market and earnings, statistically representative of Brazil. To analyze the wage differentials between male and female workers, our sample was restricted to those working at the survey reference week and earned a positive wage in the period. We constructed a pseudo-panel of these repeated cross-sections, that allows us to follow cohorts over time. Non-parametric approaches brought into consideration the fact that it was not strictly necessary to adopt assumptions about the mathematical form of the probabilities distribution of a variable. Most of parametric models (classical regressions and their decompositions) are sensitive to violations of their hypothesis, turning into misleading answers to studies questions (DINARDO e TOBIAS, 2001). Moreover, the non-parametric methodology allows the analysis of data as they are, without any prior distribution assumption. Therefore, the relative distribution method, proposed by Handcock and Morris (1999), constitutes a valuable instrument for the substantive analysis of the wage inequality and provides a consistent framework for data analysis. Intuitively, the relative distribution is a transformation of data from two distributions (reference and comparison - in our paper, men and women) into one

distribution that contains all information for comparison between them. In addition, the relative distribution combines the exploratory potential of data to a statistical tool of estimation and inference that is insensitive to recurring assumptions of parametric methodologies. Despite the development of data analysis based on the relative distribution and other nonparametric methods, there are few studies in Brazil that examine the evolution of gender wage inequality using these tools, particularly following cohorts over time. Besides, although inequality studies should focus on the distribution as a whole, most of empirical studies are stuck to methodologies based on mean values, which are not always representative of the population. The relative distribution method fills this lack and provides a framework of summary measures and figures that allows a substantive examination of data. The intuition of the relative distribution is the construction of counterfactual settings in which two populations are compared, based on the probabilities distributions. In our paper, we investigate how female workers in Brazil would be located at the male wage distribution, or how they would be located in 2005 if their wage distribution remains as in 1997, for example. The relative distribution has the following properties: (i) it is not affected by scale (invariant to any monotonic transformation of the original variable; e.g. wages versus log-wages); (ii) the basic unit of analysis is the population and not the individual; (iii) it measures individuals proportions and ranks, and not the earnings values, as the traditional methods. These characteristics distinguish the relative distribution for inequality questions. Formally, for our empirical application, consider Y 0 as a random variable for the hourly-wage logarithm of a reference population (men). The probability density funcion (pdf) of Y 0 is given by f 0 (y) and its cumulative distribution (cdf) is F 0 (y). Consider also the same measures for a comparison population Y (private sector workers), with a pdf of Y, f(y) and its distribution, F(y). The relative distribution of Y to Y 0 (R function) is defined by the random variable distribution: R = F 0 (Y) (1)

Where: R: indicates the relative distribution of log hourly-wages, F 0 : is the cumulative distribution function for the log hourly-wages of male workers, and Y: is the log hourly-wages of female workers. The R relative distribution is obtained from Y, changed by the cumulative distribution function for Y 0, F 0. A property of R is that it is continuous at the [0,1] range. We define the outcome of R, r, by relative data. An important measure related to the cumulative distribution functions (cdf) is their inverse function that derives the quantile function. For the log hourly-wages of a population, this function is described as: Where: Q { p} ( p) = F 1 ( p) = inf x F ( x) x Q(.): indicates the quantile function of the log hourly-wages, F -1 (.): is the inverse of the cumulative distributive function (cdf) of hourly-wages, x: is the observed hourly-wages, and p: indicates the quantile of interest. (2) The quantile function value, Q(p), defines the p quantile value of the hourly-wage distribution. An special case is the median (p = 0,5), for which the Q(0,5) value defines the hourly-wage that splits 50% of the population whose wage is below e 50% above. Given that, by definition, the relative distribution is a monotonic transformation of the variable of interest (log hourly-wages for relative data) and, if we know this variable distribution, we are able to show the equivalence between the original distribution and its transformation: Where: 1 ( ) = F ( h ( y )) ( y ) P X h ( x) F Y 1 = (3) Y: is the variable of interest, and h(x): indicates the monotonic transformation of this variable. Hence, the equation (3) proves that the cumulative distribution function of Y is equivalent to the same function for a monotonic transformation of Y. Using this property, we can reach the cumulative distribution function (cdf) of the random variable R: ( ) = F Q ( r ) 1 ( r ) = F F ( r ) ( ), 0 r 1 G (4) 0 0 The first derivate of G(r) indicates the relative density function (pdf): ( Q0 ( r )) ( Q ( r )) f g ( r ) =, 0 r 1 (5) f 0 0

So, the relative density g(r) can be interpreted as a density ratio: the ratio between the fraction of female workers and the fraction of male workers at a given level of the outcome Y (Q 0 (r)). Intuitively, the relative density points how the women would be located at the wage distribution of men. If wages over the distribution for the both sectors were equal, the g(r) function would assume the value 1 and, for example, the 10% of male workers that are located in the lower tail of the distribution would be equivalent to the 10% of lowest wages of the female distribution (if g(0,10) = 1). To estimate the hourly-wage probabilities densities for the men and women, f 0 (y) e f(y), we used the kernel non-parametric data smoothing method. This method allowed us to not adopt an unnecessary or wrong assumption about the wage distribution mathematical form. The estimation of kernel densities require two components: the bandwidth and the kernel function. The bandwidth indicates the distance from a point x 0 of the wage distribution to where the density will be smoothed and the kernel is the function that estimates local means. According to Dinardo and Tobias (2001), the kernel choice does not significantly affects the form of the estimated probabilities curves, but the bandwidth choice has significant implications on the results, since it involves a trade-off between bias and variance. We used the optimal criteria proposed by Silverman (1986), which adopts an Epanechnikov kernel (k(x)) for the estimation of ˆ x ), with a bandwidth ( h ) defined as: the probability density ( ( ) n x X fˆ 1 1 i n ( x) = k (6) n i = 1 n h k 2 ( x) ( 1 x ) I ( x) h = Where: f n 3 = (7) 4 13643, δ n -0, 2 σ (8) x: is the function value to be smoothed, k(x): is the kernel function, I(x): is the index function, h : is the optimal bandwidth, δ : is a constant for the Epanechnikov kernel (1,7188), n: is the sample size, and σ : is the sample standard-deviation. In addition to the easy and flexible interpretation, the relative distribution also provides decomposition methods of shifts in the median (location) and in the structure (shape), allowing interesting analyses about the effect over time of changes on wages. All this is connected to the explanatory power of the figures, which facilitates the interpretations even more.

Following Handcock e Morris (1999), consider Y 0L as a random variable that describes an outcome (here, standardized wages) for the reference group, adjusted to have the same median of the comparison group, i.e. Y 0L = Y 0 - ρ, where ρ is the variable median for the comparison group. In this case, we say that Y 0L is an hypothetical group that has the comparison group median, but the structure of the reference group. The cumulative distribution function of Y 0L is F 0L (y) or F 0 (y - ρ). In the same way, the probability density of Y 0L is given by f 0L (y) or f 0 (y - ρ). For the relative decomposition in location (median) and shape (structure) shifts, we constructed two relative distributions, derived from the distributions of three groups: the reference group Y, the comparison group Y 0, and the hypothetical group, Y 0L, which has the median of the comparison group, but the structure of the reference group. Consider R the relative distribution of the reference group to the comparison group, that is, F 0 (Y). Also consider R 0 0L as the relative distribution of the hypothetical group to the comparison group, F 0 (Y 0L ) or F 0 (Y 0 + ρ). In this case, if the reference and the comparison groups have the same median, then R 0 0L has an uniform distribution. Considering R 0L as the relative distribution of the reference group to the hypothetical group, F 0L (Y) or F 0 (Y - ρ), R 0L would have an uniform distribution when, net of median shifts, the two distributions (reference and comparison) have the same structure. Therefore, the overall relative density g(r) can be decomposed in two parts: a relative density that indicates location shifts g 0 0L (r) and a relative density for differences in the two distributions shapes g 0L (r): 0L 0 g(r) = g (r) g (r) (9) 0L An application of this decomposition refers to the analysis of changes in the wages distribution between men and women in two periods. Figure 1: Gender log-hourly wage gap. Brazil, 1981-2005 Figure 2: Relative distribution of the log-hourly wages by gender, Brazil, 1981-2005

Figure 3: Selected deciles of log-hourly wage by gender, Brazil, 1981-2005

Figure 4: CDF of the Relative Distribution of wages by gender, Brazil, 1981 and 1992 Proportion of women in 1981 Proportion of women in 1992 Proportion of men in 1981 Brazil, 1981 Proportion of men in 1992 Brazil, 1992 Following Handcock and Aldrich (2002), if the two distributions (of men and women) are identical, then the cumulated distribution function (cdf) of the relative distribution line is a 45 o line and the probability density function (pdf) is that of a uniform distribution. Then we can infer that the gender distribution of wages in Brazil has becomed more equal since 1981.

Figure 5: CDF of the Relative Distribution of wages by gender, Brazil, 1992 and 2005 Proportion of women in 1992 Proportion of women in 2005 Proportion of men in 1992 Proportion of men in 2005 Brazil, 1992 Figure 6: PDF of the Relative Distribution of wages by gender, Brazil, 1981 and 1992 Relative density 0.0 0.5 1.0 1.5 2.0 2.5 Relative density 0.0 0.5 1.0 1.5 2.0 2.5 0.0 0.2 0.4 0.6 0.8 1.0 Proportion of men Brazil, 1981 0.0 0.2 0.4 0.6 0.8 1.0 Proportion of men Brazil, 1992

Figure 7: PDF of the Relative Distribution of wages by gender, Brazil, 1992 and 2005. Relative density 0.0 0.5 1.0 1.5 2.0 2.5 Relative density 0.0 0.5 1.0 1.5 2.0 2.5 0.0 0.2 0.4 0.6 0.8 1.0 Proportion of men Brazil, 1992 0.0 0.2 0.4 0.6 0.8 1.0 Proportion of men Brazil, 2005 Figure 8: Kernel density estimator of the log-hourly wage PDF by gender, Brazil, 1981 and 1992 density 0.0 0.1 0.2 0.3 0.4 0.5 0.6 Male workers, 1981 Female workers, 1981 density 0.0 0.1 0.2 0.3 0.4 0.5 0.6 Male workers, 1992 Female workers, 1992-2 -1 0 1 2 3 4-2 -1 0 1 2 3 4 changes in the log-hourly wages changes in the log-hourly wages

Figure 9: Kernel density estimator of the log-hourly wage PDF by gender, Brazil, 1992 and 2005 density 0.0 0.1 0.2 0.3 0.4 0.5 0.6 Male workers, 1992 Female workers, 1992 density 0.0 0.1 0.2 0.3 0.4 0.5 0.6 Male workers, 2005 Female workers, 2005-2 -1 0 1 2 3 4-2 -1 0 1 2 3 4 changes in the log-hourly wages changes in the log-hourly wages