An Introduction to Relative Distribution Methods

Mark S. Handcock, Professor of Statistics and Sociology, Center for Statistics and the Social Sciences
CSDE Seminar Series, March 2, 2001
In collaboration with: Martina Morris

0.0 Introduction and Motivations

Motivating issue: to compare the economic status of groups across the full distribution, relative to each other, over time.

Tracking the earnings of men and women over time:
- Focus on yearly earnings of individual full-time, year-round workers, 16-65 years old, not in school, the military or farming
- Non-Hispanic white men and women, non-Hispanic black women
- Consider 1967 to 1987 (i.e., 21 years of data)
- Based on the US Current Population Survey (annual March Supplement)
- Sample sizes are large: 20,000 per year for males, 10,000 per year for females

How can/should we compare the earnings?

One approach is to consider a summary measure of level (e.g., mean, median).

What can we learn from such summaries?
- Where the centers of the distributions comparatively lie

What is lost from such summaries?
- How are the men's earnings distributed about their median value?
- How are the women's earnings distributed about their median value?
- Are they distributed in a similar fashion?

We do not know if substantively important differences between (the distributions of) men's and women's earnings are adequately summarized.

What other differences can matter?

[Fig. 1: The distributions of earnings in 1967. Curves: men, women. Horizontal axis: earnings (thousand dollars). Vertical axis: density.]

- Spread: variation about the median within each group
- Quantiles: proportions above or below fixed values (e.g., poverty line)
- Shape: the full distributional features of each curve

We require a framework that captures the differences between distributions:
- in a parsimonious way
- with a flexible representation of patterns of differences in the data that are not preconceived
- building on easily interpretable numerical summaries such as location and spread

Idea: Rescale the women's earnings relative to the men's so that both are on the same interpretable scale.

E.g., break up the men's distribution into deciles: 0-4420, 4421-5500, 5501-6500, ..., above 13500.
- What proportion of women fall within each of these groups?

[Fig. 2: The relative decile distribution of women's to men's earnings in 1967. Horizontal axis: proportion of men. Vertical axis: density of women.]

We can see how frequent the women are for groups of men of equal size.
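The decile idea above can be sketched in a few lines. This is a minimal illustration, not the authors' software: the function name `relative_decile_distribution` is hypothetical, and the earnings are synthetic lognormal draws standing in for the CPS data, which are not reproduced here.

```python
import numpy as np

def relative_decile_distribution(reference, comparison):
    """Relative frequency of the comparison group within each reference decile."""
    reference = np.asarray(reference, dtype=float)
    comparison = np.asarray(comparison, dtype=float)
    # Nine interior cut points that split the reference group into ten equal parts.
    cuts = np.quantile(reference, np.linspace(0.1, 0.9, 9))
    # For each comparison value, the index (0..9) of the reference decile it falls in.
    bins = np.searchsorted(cuts, comparison, side="right")
    shares = np.bincount(bins, minlength=10) / comparison.size
    # Each decile holds 10% of the reference group, so dividing by 0.1
    # puts the result on a density scale (1.0 means "as frequent as the reference").
    return shares / 0.1

# Synthetic data (an assumption for illustration only, not CPS earnings).
rng = np.random.default_rng(0)
men = rng.lognormal(mean=2.0, sigma=0.5, size=20_000)
women = rng.lognormal(mean=1.6, sigma=0.5, size=10_000)

rel = relative_decile_distribution(men, women)
print(np.round(rel, 2))
```

Values above 1 mark deciles of the reference group where the comparison group is over-represented; values below 1 mark deciles where it is under-represented.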

[Fig. 3: The relative distribution of women's to men's earnings in 1967. Horizontal axis: proportion of men. Vertical axis: proportion of women.]

1.0 Technical Stuff: The Relative Distribution

Does the idea have depth?

- $Y_0$: a measurement for a reference population, with CDF $F_0(y)$ and density $f_0(y)$
- $Y$: the measurement on a comparison population, with CDF $F(y)$ and density $f(y)$

The objective is to study the differences between the distributions of $Y$ and $Y_0$ using $Y_0$ as the reference.

Consider the grade transformation of $Y$ to $Y_0$ (Cwik and Mielniczuk 1989):

$$R = F_0(Y)$$

Refer to $R$ as the relative distribution of $Y$ to $Y_0$.
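As a rough sketch of the grade transformation, the reference CDF can be replaced by the empirical CDF of a reference sample. The helper name `relative_data` and the toy numbers are assumptions made for illustration; the slides do not prescribe an estimator.

```python
import numpy as np

def relative_data(reference, comparison):
    """Apply the empirical reference CDF to each comparison value (R = F_0(Y))."""
    reference = np.sort(np.asarray(reference, dtype=float))
    comparison = np.asarray(comparison, dtype=float)
    # Fraction of reference values less than or equal to each comparison value.
    return np.searchsorted(reference, comparison, side="right") / reference.size

# Toy values chosen by hand for illustration.
reference = [1.0, 2.0, 3.0, 4.0]
comparison = [0.5, 2.0, 10.0]
r = relative_data(reference, comparison)
print(r)  # [0.  0.5 1. ]
```

Each entry of `r` is the rank, on a 0 to 1 scale, that a comparison value would hold within the reference sample.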

The cumulative distribution function (CDF) of $R$ is

$$G(r) = F\left(F_0^{-1}(r)\right), \qquad 0 \le r \le 1$$

The corresponding density is

$$g(r) = \frac{f\left(F_0^{-1}(r)\right)}{f_0\left(F_0^{-1}(r)\right)}, \qquad 0 \le r \le 1$$

where $r$ represents the proportion of values, and $f$ and $f_0$ are the densities.

Interpretations:
- $G(r)$, the relative CDF: a proportion $G(r)$ of the target (comparison) population are below the level of a proportion $r$ of the reference population
- $g(r)$, the relative density: represents the ratio of the frequency of the target population to the frequency of the reference population at the $r$th quantile of the reference population level, $F_0^{-1}(r)$
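A worked special case may help fix ideas. It is an illustrative assumption, not an example from the talk: take the reference to be standard normal and the comparison to be the same distribution shifted by $\mu$, with $\Phi$ and $\phi$ the standard normal CDF and density.

```latex
\begin{aligned}
Y_0 &\sim N(0,1), \qquad Y \sim N(\mu,1), \qquad F_0^{-1}(r) = \Phi^{-1}(r), \\
G(r) &= F\left(F_0^{-1}(r)\right) = \Phi\left(\Phi^{-1}(r) - \mu\right), \\
g(r) &= \frac{\phi\left(\Phi^{-1}(r) - \mu\right)}{\phi\left(\Phi^{-1}(r)\right)}
      = \exp\left(\mu\,\Phi^{-1}(r) - \tfrac{1}{2}\mu^{2}\right), \qquad 0 \le r \le 1 .
\end{aligned}
```

With $\mu = 0$ the relative density is identically 1; with $\mu < 0$ it decreases in $r$, so the comparison group is over-represented at the low ranks of the reference group.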

- The relative distribution focuses directly on the comparison rather than on the individual distributions
- The scale is in terms of ranks rather than levels: we count people rather than dollars
- The relative distribution is invariant to monotone transformations of the variable (e.g., wages vs log wages)
- If the two distributions are identical then the relative distribution is uniform on $[0, 1]$
- Many indices and measures can be defined based on $g(p)$ alone, emphasizing interesting aspects of their relative shape

For example:

Kullback-Leibler information number:
$$KL(Y, Y_0) = \int_0^1 \log[g(p)]\, g(p)\, dp$$

$\chi^2$ divergence:
$$\chi^2(Y, Y_0) = \int_0^1 [g(p) - 1]^2\, dp$$

$L_1$ distance:
$$L_1(Y, Y_0) = \int \left| \frac{f(x)}{f_0(x)} - 1 \right| f_0(x)\, dx = \int_0^1 \left| g(p) - 1 \right| dp$$
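The three indices can be approximated numerically once the relative density is estimated. The sketch below assumes a simple histogram estimate of $g$ on equal-width bins (the talk does not specify an estimator), uses hypothetical helper names, and runs on synthetic normal samples.

```python
import numpy as np

def relative_density_hist(reference, comparison, bins=20):
    """Histogram estimate of the relative density g on [0, 1]."""
    ref = np.sort(np.asarray(reference, dtype=float))
    # Relative data: empirical reference CDF evaluated at the comparison values.
    r = np.searchsorted(ref, np.asarray(comparison, dtype=float), side="right") / ref.size
    g, edges = np.histogram(r, bins=bins, range=(0.0, 1.0), density=True)
    return g, edges

def summary_indices(g):
    """Riemann-sum versions of the KL, chi-square and L1 indices for a binned g."""
    width = 1.0 / g.size
    positive = g > 0  # empty bins contribute 0 to the KL sum (0 * log 0 taken as 0)
    kl = np.sum(g[positive] * np.log(g[positive])) * width
    chi2 = np.sum((g - 1.0) ** 2) * width
    l1 = np.sum(np.abs(g - 1.0)) * width
    return {"KL": kl, "chi2": chi2, "L1": l1}

# Synthetic samples (an assumption for illustration only).
rng = np.random.default_rng(1)
reference = rng.normal(0.0, 1.0, size=20_000)
comparison = rng.normal(-0.8, 1.0, size=10_000)

g, _ = relative_density_hist(reference, comparison)
indices = summary_indices(g)
print(indices)
```

All three indices are zero when the binned relative density is flat at 1, and grow as the comparison distribution departs from the reference.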

1.1 Application: Changes in Men's Earnings 1975-93

Consider distributions of real hourly wages for white men between 1975 and 1993 (based on the CPS March Supplement).

[Fig. 4: The distributions of real hourly wages in 1975 and 1993, expressed in 1993 dollars. Horizontal axis: wage rate (dollars per hour). Vertical axis: density.]

To explain the rise in wage inequality, researchers have started to look at the restructuring and reorganization of the American firm and its effects on workers (Cappelli 1994, Harrison 1994):
- a dramatic increase in the use of so-called contingent workers vs so-called core workers

[Figure 5: The distributions of usual weekly hours worked for 1975 and 1993.]

The Relative Distribution of Work Schedules

[Figure: relative density of work schedules. Horizontal axis: proportion of weekly hours in 1975. Vertical axis: density.]

- The polarization in work schedules is quite apparent
- Could this explain the growth in the dispersion of hourly wages?

2.0 Adjusting the Distributions for Differences in Covariates

Suppose the two groups differ in terms of a covariate $Z$. For example, we can adjust the relative distribution of wages for the changing distribution of work schedules.

- $(Y_0, Z_0)$: reference wages and work schedules
- $(Y, Z)$: comparison wages and work schedules

Idea: Construct a synthetic population for the reference group that has the same composition of the covariate as the comparison group. For example, what would the 1975 wages have looked like if the 1975 group had as many working the same hours as the 1993 group?

The marginal density of $Y_0$ is

$$f_0(y) = f_{Y_0}(y) = \int f_{Y_0 \mid Z_0}(y \mid z)\, f_{Z_0}(z)\, dz$$

Let $Y_A$ be the random variable that gives the measurement $Y_0$ for the (hypothetical) reference population with
- a marginal distribution of $Z$ that is the same as in the comparison population, and
- conditional distributions of the measurement given $Z$ (i.e., $f_{Y_0 \mid Z_0}(y \mid z)$) that are the same as in the reference population

The density of $Y_A$ can be written as:

$$f_A(y) = \int f_{Y_0 \mid Z_0}(y \mid z)\, f_Z(z)\, dz$$

Call $Y_A$ the random variable describing $Y_0$ compositionally adjusted to $Z$.
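For a categorical covariate, one way to approximate the adjusted population is to reweight each reference observation by the ratio of the comparison share to the reference share of its covariate level. This is a sketch under that assumption, not necessarily the procedure used in the talk; the function name, the two-level "full"/"part" covariate and the wage values are all hypothetical.

```python
import numpy as np

def composition_weights(z_reference, z_comparison):
    """Weights that give the reference sample the comparison group's mix of Z."""
    z_reference = np.asarray(z_reference)
    z_comparison = np.asarray(z_comparison)
    # Levels present in the reference sample, and the level index of each observation.
    levels, ref_idx = np.unique(z_reference, return_inverse=True)
    ref_share = np.bincount(ref_idx, minlength=levels.size) / z_reference.size
    cmp_share = np.array([(z_comparison == lev).mean() for lev in levels])
    # Ratio f_Z(z) / f_Z0(z) for each reference observation.
    # Levels seen only in the comparison group cannot be represented this way.
    w = cmp_share[ref_idx] / ref_share[ref_idx]
    return w / w.sum()

# Hypothetical toy data: work-schedule category and hourly wage.
hours_1975 = np.array(["full"] * 8 + ["part"] * 2)
hours_1993 = np.array(["full"] * 5 + ["part"] * 5)
wages_1975 = np.array([10.0, 11, 12, 13, 14, 15, 16, 17, 6, 7])

w = composition_weights(hours_1975, hours_1993)
adjusted_mean = np.average(wages_1975, weights=w)
print(adjusted_mean)
```

The weighted reference sample keeps the 1975 wage distribution within each schedule category but takes on the 1993 mix of categories, which is what the definition of $Y_A$ asks for.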

We can now determine the compositional effects of the covariate by comparing $Y_A$ to $Y_0$.

- Let $X_{A,0} = F_0(Y_A)$ be the relative distribution of $Y_A$ to $Y_0$. It describes the differences between $Y_0$ and $Y$ due to compositional effects.
- Let $X_{1,A} = F_A(Y)$ be the relative distribution of $Y$ to $Y_A$. It describes the differences between $Y$ and $Y_0$ not due to the compositional differences.
- Let $X_{1,0} = F_0(Y)$ be the relative distribution of $Y$ to $Y_0$.

$X_{1,A}$ is the relative distribution of $X_{1,0}$ to $X_{A,0}$.

Heuristically, we can represent the decomposition of the relative densities by:

overall relative density = (relative density representing the compositional effect of the covariate) × (covariate-adjusted relative density)

By comparing plots of $g_0^1$ (overall), $g_0^A$ (compositional effect), and $g_A^1$ (covariate adjusted) side by side we can gauge the relative size and nature of the components.
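The heuristic decomposition can be written out at a fixed wage level. The identity below is a sketch that assumes $f_0$ and $f_A$ are positive at $y_r$, where $y_r = F_0^{-1}(r)$ is the $r$th quantile of the reference population.

```latex
\underbrace{\frac{f(y_r)}{f_0(y_r)}}_{\text{overall}}
  \;=\;
\underbrace{\frac{f_A(y_r)}{f_0(y_r)}}_{\text{compositional effect}}
  \;\times\;
\underbrace{\frac{f(y_r)}{f_A(y_r)}}_{\text{covariate adjusted}},
\qquad y_r = F_0^{-1}(r), \quad 0 \le r \le 1 .
```

The first factor is the relative density of $Y_A$ to $Y_0$ at rank $r$; the second is the relative density of $Y$ to $Y_A$ evaluated at the rank $F_A(y_r)$ that the level $y_r$ holds in the adjusted population.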

2.1 Decomposition of Changes in Wages due to Changes in Work Schedules

[Figure: three relative density panels (a), (b), (c). Horizontal axes: proportion of 1975, proportion of 1975, proportion of adjusted 1975. Vertical axes: density.]

- The polarization in work schedules had a modest polarizing effect on the wages
- In terms of the magnitude of the effect, this shift did not drive the majority of the rising inequality in wages

3.0 Conclusions

- Relative distribution ideas represent a general framework for the comparative analysis of distributional difference and change. They can be used as the basis for exploratory, descriptive and analytical techniques.
- Summary measures based on the relative density can be used to test hypotheses about distributional differences.
- Decomposition techniques enable one to isolate location, shape and compositional effects. They enable one to distinguish the impact of changes in population mix (a demographic process) from changes in attribute allocation (a social or economic process).