Truncation and Censoring

Laura Magazzini
laura.magazzini@univr.it
Truncation and censoring

Truncation: sample data are drawn from a subset of a larger population of interest
- A characteristic of the distribution from which the sample data are drawn
- Example: studies of income based only on incomes above or below the poverty line (of limited usefulness for inference about the whole population)

Censoring: values of the dependent variable in a certain range are all transformed to (or reported at) a single value
- A defect in the sample data
- Example: in studies of income, people below the poverty line are reported at the poverty line

Truncation and censoring introduce similar distortions into conventional statistical results
Truncation

Aim: infer the characteristics of a full population from a sample drawn from a restricted population
- Example: characteristics of people with income above $100,000

Let y be a continuous random variable with pdf f(y). The conditional distribution of y given y > a (a a constant) is:

    f(y | y > a) = f(y) / Pr(y > a)

If y is normally distributed:

    f(y | y > a) = (1/σ) φ((y − µ)/σ) / [1 − Φ(α)],   where α = (a − µ)/σ
Truncation — Moments of truncated distributions

- E(y | y < a) < E(y)
- E(y | y > a) > E(y)
- V(y | trunc.) < V(y)
Truncation — Moments of the truncated normal distribution

Let y ~ N(µ, σ²) and a a constant:

    E(y | truncation) = µ + σλ(α)
    Var(y | truncation) = σ² [1 − δ(α)]

- α = (a − µ)/σ; φ(α) is the standard normal density
- λ(α) is called the inverse Mills ratio:
  - λ(α) = φ(α)/[1 − Φ(α)]  if truncation is y > a
  - λ(α) = −φ(α)/Φ(α)       if truncation is y < a
- δ(α) = λ(α)[λ(α) − α], where 0 < δ(α) < 1 for any α
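The moment formulas above can be checked numerically. A minimal sketch (parameter values µ, σ, a are illustrative assumptions) that compares the inverse-Mills-ratio formulas against scipy's built-in truncated normal:

```python
import numpy as np
from scipy.stats import norm, truncnorm

mu, sigma, a = 0.0, 1.0, 0.5                 # illustrative values (assumptions)
alpha = (a - mu) / sigma

# Inverse Mills ratio and delta for truncation y > a
lam = norm.pdf(alpha) / (1 - norm.cdf(alpha))
delta = lam * (lam - alpha)

mean_formula = mu + sigma * lam              # E(y | y > a)
var_formula = sigma**2 * (1 - delta)         # Var(y | y > a)

# scipy.stats.truncnorm takes the truncation bounds in standard units
rv = truncnorm(alpha, np.inf, loc=mu, scale=sigma)
print(mean_formula, rv.mean())               # the two should agree
print(var_formula, rv.var())
```

Note that 0 < δ(α) < 1 here, so the truncated variance is indeed smaller than σ².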
Truncation — Example: a truncated log-normal income distribution

From the New York Post (1987): "The typical upper affluent American... makes $142,000 per year..."
- The people surveyed had household income of at least $100,000
- Does this tell us anything about the typical American? ...only 2 percent of Americans make the grade
- Degree of truncation in the sample: 98%. The $142,000 is probably quite far from the mean in the full population
- Assuming income is lognormally distributed in the population (the log of income has a normal distribution), the information can be employed to deduce the population mean income

Let x = income and y = ln x:

    E[y | y > ln 100] = µ + σ φ(α)/[1 − Φ(α)]

Substituting into E[x] = E[e^y] = e^{µ+σ²/2} gives E[x] ≈ $22,087
- The 1987 Statistical Abstract of the US listed average household income of about $25,000 (a relatively good estimate based on little information!)
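The back-of-the-envelope calculation can be reproduced numerically. This is a rough sketch under two simplifying assumptions: $142,000 is treated as locating the truncated mean of log income, and the 2% truncation share pins down α via Φ(α) = 0.98; incomes are in thousands of dollars.

```python
import numpy as np
from scipy.stats import norm

# Work in thousands of dollars; y = ln(income)
a = np.log(100.0)                    # truncation point: income >= $100,000
trunc_mean = np.log(142.0)           # reported mean of the truncated sample (assumption)
alpha = norm.ppf(0.98)               # only 2% exceed the threshold: Phi(alpha) = 0.98

# Two equations in (mu, sigma):
#   (a - mu)/sigma = alpha     and     mu + sigma*lambda(alpha) = trunc_mean
lam = norm.pdf(alpha) / (1 - norm.cdf(alpha))
sigma = (trunc_mean - a) / (lam - alpha)
mu = a - alpha * sigma

pop_mean = np.exp(mu + sigma**2 / 2)   # mean of the lognormal population
print(round(pop_mean, 1))              # roughly 22, i.e. about $22,000
```

The result lands close to the $22,087 quoted on the slide, and not far from the $25,000 Statistical Abstract figure.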
Truncation — The truncated regression model

    y*_i = x_i'β + ε_i,   ε_i | x_i ~ N(0, σ²)

Unit i is observed only if y*_i crosses a threshold a:

    y_i = n.a.   if y*_i ≤ a
    y_i = y*_i   if y*_i > a

    E[y_i | y*_i > a] = x_i'β + σλ(α_i),  with α_i = (a − x_i'β)/σ

The marginal effect in the subpopulation is:

    ∂E[y_i | y*_i > a]/∂x_i = β + σ (dλ(α_i)/dα_i) ∂α_i/∂x_i = ... = β(1 − δ(α_i))

- Since 0 < δ(α_i) < 1, the marginal effect in the subpopulation is smaller than the corresponding coefficient
- If the interest is in the linear relationship between y and x in the full population, β can be directly interpreted
Truncation — Estimation

- OLS of y on x leads to inconsistent estimates. The model is

    y_i | y*_i > a = E(y_i | y*_i > a) + ε_i = x_i'β + σλ(α_i) + ε_i

- By construction, the error term is heteroskedastic
- Omitted variable bias (λ_i is not included in the regression)
- In applications, it is usually found that the OLS estimates are biased toward zero
- Under the normality assumption, the MLE can be obtained from

    f(y | y > a) = (1/σ) φ((y − µ)/σ) / [1 − Φ(α)],  with α = (a − µ)/σ

The log-likelihood can be written as

    log L = Σ_{i=1}^N log[ (1/σ) φ((y_i − x_i'β)/σ) ] − Σ_{i=1}^N log[ 1 − Φ((a − x_i'β)/σ) ]
Truncation — Example: simulated data

[Figure: simulated data generated as y = 1.5 + 0.5x + ε/2, N = 100, truncated at a = 0]
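A simulation in the spirit of the slide's example can illustrate both the OLS attenuation and the truncated-regression MLE. A minimal sketch: the data-generating process y* = 1.5 + 0.5x + ε/2 and the threshold a = 0 follow the slide, while the larger N and the spread of x are assumptions chosen so the truncation actually bites and the estimates are stable.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# DGP from the slide; N and sd(x) are assumptions for this sketch
N, a = 2000, 0.0
x = rng.normal(0.0, 3.0, N)
y_star = 1.5 + 0.5 * x + rng.normal(0.0, 1.0, N) / 2
keep = y_star > a                      # units with y* <= a are unobserved
y, xt = y_star[keep], x[keep]
X = np.column_stack([np.ones(keep.sum()), xt])

def neg_loglik(theta):
    b0, b1, log_s = theta
    s = np.exp(log_s)                  # enforce sigma > 0
    xb = b0 + b1 * xt
    # log f(y | y > a) = log phi((y - x'b)/s) - log s - log[1 - Phi((a - x'b)/s)]
    return -(norm.logpdf((y - xb) / s) - np.log(s)
             - norm.logsf((a - xb) / s)).sum()

res = minimize(neg_loglik, x0=[0.0, 0.0, 0.0], method="BFGS")
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("OLS slope:", b_ols[1], " MLE slope:", res.x[1])
```

As the slides predict, the OLS slope on the truncated sample is attenuated toward zero while the MLE recovers the population coefficient.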
Censored data

- Censored regression models generally apply when the variable to be explained is partly continuous but has positive probability mass at one or more points
- Assume there is a variable y with quantitative meaning and we are interested in E[y | x]
- If y and x were observed for everyone in the population, standard regression methods (ordinary or nonlinear least squares) could be applied
- With censored data, y is not observable for part of the population
- Conventional regression methods fail to account for the qualitative difference between limit (censored) and nonlimit (continuous) observations
- Two leading cases: top coding and corner solution outcomes
Censored data — Top coding: example (data generating process)

- Let wealth* denote actual family wealth, measured in thousands of dollars
- Suppose that wealth* follows the linear regression model E[wealth* | x] = x'β
- Censored data: we observe wealth* only when wealth* > 200
- When wealth* is smaller than 200 we know that it is, but we do not know the actual value of wealth*
- Therefore observed wealth can be written as wealth = max(wealth*, 200)
Censored data — Top coding: example (estimation of β)

- We assume that wealth* given x has a homoskedastic normal distribution:

    wealth* = x'β + ε,  ε | x ~ N(0, σ²)

- Recorded wealth is: wealth = max(wealth*, 200) = max(x'β + ε, 200)
- β is estimated via maximum likelihood using a mixture of discrete and continuous distributions (details later...)
Censored data — Example: seats demanded and tickets sold

[Figure: seats demanded vs. tickets sold]
Censored data — Corner solution outcomes

- Still labeled censored regression models
- Pioneering work by Tobin (1958): household purchases of durable goods
- Let y be an observable choice or outcome describing some economic agent, such as an individual or a firm, with the following characteristic: y takes on the value zero with positive probability but is a continuous random variable over strictly positive values
- Examples: amount of life insurance coverage chosen by an individual, family contributions to an individual retirement account, and firm expenditures on research and development
- We can imagine economic agents solving an optimization problem; for some agents the optimal choice will be the corner solution, y = 0
- The issue here is not data observability, but rather individual behaviour
- We are interested in features of the distribution of y given x, such as E[y | x] and Pr(y = 0 | x)
Censored data — The censored normal distribution

Let y* ~ N(µ, σ²), with observed data censored at a = 0:

    y = 0    if y* ≤ 0
    y = y*   if y* > 0

The distribution is a mixture of a discrete and a continuous distribution:
- If y = 0: Pr(y = 0) = Pr(y* ≤ 0) = Φ(−µ/σ) = 1 − Φ(µ/σ)
- If y > 0: f(y) = (1/σ) φ((y − µ)/σ)

    E[y] = 0 · Pr(y = 0) + E[y* | y* > 0] Pr(y* > 0) = (µ + σλ) Φ(µ/σ),  with λ = φ(µ/σ)/Φ(µ/σ)
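The closed-form mean of the censored normal can be verified by Monte Carlo. A minimal sketch; the values of µ and σ are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 1.0, 2.0                    # illustrative values (assumptions)

# Closed form: E[y] = (mu + sigma*lambda) * Phi(mu/sigma), lambda = phi(mu/sigma)/Phi(mu/sigma)
lam = norm.pdf(mu / sigma) / norm.cdf(mu / sigma)
mean_formula = (mu + sigma * lam) * norm.cdf(mu / sigma)

# Monte Carlo check: censor draws of y* ~ N(mu, sigma^2) at zero
rng = np.random.default_rng(1)
y = np.maximum(rng.normal(mu, sigma, 1_000_000), 0.0)
print(mean_formula, y.mean())           # the two should be close
```

Expanding the product also gives the equivalent form E[y] = µ Φ(µ/σ) + σ φ(µ/σ).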
Censored data — The censored regression model: Tobit model (Tobin, 1958)

Let y* be a continuous (latent) variable:

    y*_i = x_i'β + ε_i,  where ε | x ~ N(0, σ²)

The observed data y are

    y_i = max(0, y*_i) = 0     if y*_i ≤ 0
                       = y*_i  if y*_i > 0

- Why not OLS?
- Estimates can be obtained by MLE
Censored data — Estimation

A positive probability is assigned to the observations y_i = 0:

    Pr(y_i = 0 | x_i) = Pr(y*_i ≤ 0 | x_i) = Pr(x_i'β + ε_i ≤ 0) = Pr(ε_i ≤ −x_i'β) = 1 − Φ(x_i'β/σ)

The likelihood can be written as:

    L(β, σ² | y) = Π_{y_i = 0} [1 − Φ(x_i'β/σ)] · Π_{y_i > 0} (1/σ) φ((y_i − x_i'β)/σ)
                 = Π_{y_i = 0} [1 − Φ(x_i'β/σ)] · Π_{y_i > 0} (1/√(2πσ²)) exp[−(y_i − x_i'β)²/(2σ²)]
Censored data — Marginal effects in the tobit model

- In the case of censored data, β estimated from the tobit model can be employed to study the effect of x on E[y* | x]
- In the case of a corner solution outcome, the estimated β are not sufficient, since E[y | x] and E[y | x, y > 0] depend on β in a non-linear way:

    ∂E[y_i | x_i]/∂x_i = Φ(x_i'β/σ) β

    ∂E[y_i | x_i]/∂x_i = Pr(y_i > 0) ∂E[y_i | x_i, y_i > 0]/∂x_i + E[y_i | x_i, y_i > 0] ∂Pr(y_i > 0)/∂x_i

A change in x_i has two effects:
(1) it affects the conditional mean of y*_i in the positive part of the distribution
(2) it affects the probability that the observation will fall in the positive part of the distribution
Censored data — Example: simulated data

[Figure: simulated data generated as y = 1.5 + 0.5x + ε/2, N = 100, censored at 0]
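The tobit likelihood from the previous slide can be maximized directly. A minimal sketch using the slide's data-generating process y* = 1.5 + 0.5x + ε/2 censored at zero; the larger N and the spread of x are assumptions chosen so a sizeable share of observations is censored.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# DGP from the slide; N and sd(x) are assumptions for this sketch
N = 2000
x = rng.normal(0.0, 3.0, N)
y = np.maximum(1.5 + 0.5 * x + rng.normal(0.0, 1.0, N) / 2, 0.0)
pos = y > 0

def neg_loglik(theta):
    b0, b1, log_s = theta
    s = np.exp(log_s)                  # enforce sigma > 0
    xb = b0 + b1 * x
    # Censored observations contribute log Pr(y=0 | x) = log Phi(-x'b/s);
    # uncensored observations contribute the usual normal log density
    ll = np.where(pos,
                  norm.logpdf((y - xb) / s) - np.log(s),
                  norm.logcdf(-xb / s))
    return -ll.sum()

res = minimize(neg_loglik, x0=[0.0, 0.0, 0.0], method="BFGS")
b0_hat, b1_hat, sigma_hat = res.x[0], res.x[1], np.exp(res.x[2])
print(b0_hat, b1_hat, sigma_hat)       # close to (1.5, 0.5, 0.5)
```

The MLE recovers the parameters of the latent equation, which OLS on the censored y would not.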
Censored data — Some issues in specification

Heteroskedasticity
- The MLE is inconsistent
- However, the problem can be approached directly: σ_i can enter the likelihood function in place of σ. Specifying a particular model for σ_i provides the empirical model for estimation

Misspecification of Pr(y* < 0)
- In the tobit model, a variable that increases the probability of an observation being a non-limit observation also increases the mean of the variable
- Example: loss due to fire in buildings
- A more general model has been devised, involving a decision equation and a regression equation for nonlimit observations

Non-normality
- The MLE is inconsistent
- Research is ongoing both on alternative estimators and on methods for testing this type of misspecification
Sample selection

What if observation is driven by a different process?

(1) Data observability
- Saving function (in the population): saving = β0 + β1 income + β2 age + β3 married + β4 kids + u
- Survey data only include families whose household head was 45 years of age or older

(2) Individual behaviour (Boyes, Hoffman, Low, 1989; Greene, 1992)
- y1 = 1 if individual i defaults on a loan/credit card, 0 otherwise
- y2 = 1 if individual i is granted a loan/credit card, 0 otherwise
- For a given individual, y1 is not observed unless y2 equals 1
Sample selection / incidental truncation

- Let y and z have a bivariate distribution with correlation ρ
- We are interested in the distribution of y given that another variable z exceeds a particular value
- Intuition: if y and z are positively correlated, then the truncation of z should push the distribution of y to the right
- The truncated joint distribution is

    f(y, z | z > a) = f(y, z) / Pr(z > a)

- To obtain the incidentally truncated marginal density of y, we integrate z out of this expression
Sample selection — Moments of the incidentally truncated bivariate normal distribution

Let y and z have a bivariate normal distribution with means µ_y and µ_z, standard deviations σ_y and σ_z, and correlation ρ:

    E[y | z > a] = µ_y + ρ σ_y λ(α_z)
    V[y | z > a] = σ_y² [1 − ρ² δ(α_z)]

- α_z = (a − µ_z)/σ_z
- λ(α_z) = φ(α_z)/[1 − Φ(α_z)]
- δ(α_z) = λ(α_z)[λ(α_z) − α_z]
- If the truncation is z < a, then λ(α_z) = −φ(α_z)/Φ(α_z)
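These moments can be checked by simulation. A minimal sketch; all parameter values are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

# Illustrative parameters (assumptions)
mu_y, mu_z, s_y, s_z, rho, a = 1.0, 0.0, 2.0, 1.0, 0.6, 0.5

alpha_z = (a - mu_z) / s_z
lam = norm.pdf(alpha_z) / (1 - norm.cdf(alpha_z))
delta = lam * (lam - alpha_z)
mean_formula = mu_y + rho * s_y * lam
var_formula = s_y**2 * (1 - rho**2 * delta)

# Monte Carlo: draw (y, z) bivariate normal, keep only pairs with z > a
rng = np.random.default_rng(2)
cov = [[s_y**2, rho * s_y * s_z], [rho * s_y * s_z, s_z**2]]
y, z = rng.multivariate_normal([mu_y, mu_z], cov, size=1_000_000).T
sel = z > a
print(mean_formula, y[sel].mean())
print(var_formula, y[sel].var())
```

With ρ > 0 the truncation of z pushes the mean of y upward, exactly as the intuition on the previous slide suggests.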
Sample selection — Example: a model of labor supply

- Consider a population of women where only a subsample is engaged in market employment
- We are interested in identifying the determinants of the labor supply for all women
- A simple model of female labor supply consists of two equations:
(1) Wage equation: the difference between a person's market wage and her reservation wage, as a function of characteristics such as age, education, number of children, ... plus unobservables
(2) Hours equation: the desired number of labor hours supplied depends on the wage, home characteristics (e.g. presence of small children), marital status, ... plus unobservables
- Truncation: equation (2) describes desired hours, but an actual figure is observed only if the individual is working, i.e. when the market wage exceeds the reservation wage
- The hours variable is incidentally truncated
Sample selection — Example: a model of labor supply (cont.)

When is OLS on the working sample an option?
- Assume working women are chosen randomly: if the working subsample has endowments of characteristics (both observable and unobservable) similar to the nonworking sample, OLS is an option
- BUT the decision to work is not random: the working and nonworking samples potentially have different characteristics
- When the relationship runs purely through observables, appropriate conditioning variables can be included in the relevant equation
- If unobservable characteristics affecting the work decision are correlated with the unobservable characteristics affecting the wage, then a relationship arises that cannot be tackled by including appropriate controls
- A bias is induced due to sample selection
Sample selection — Regression in a model of selection (1)

Equation that determines sample selection:

    z*_i = w_i'γ + u_i

The equation of primary interest is:

    y_i = x_i'β + ε_i

where y_i is observed only when z*_i is greater than zero (otherwise data are not available).

This model is closely related to the Tobit model, although it is less restrictive: the parameters explaining the censoring are not constrained to equal those explaining the variation in the observed dependent variable. For this reason the model is also known as Tobit type two.
Sample selection — Regression in a model of selection (2)

If u_i and ε_i have a bivariate normal distribution with zero means and correlation ρ,

    E[y_i | y_i is observed] = E[y_i | z*_i > 0]
                             = E[y_i | u_i > −w_i'γ]
                             = x_i'β + E[ε_i | u_i > −w_i'γ]
                             = x_i'β + ρ σ_ε λ_i(α_u)

where α_u = w_i'γ/σ_u and λ(α_u) = φ(α_u)/Φ(α_u).

So, the regression model can be written as

    y_i | z*_i > 0 = E[y_i | z*_i > 0] + υ_i = x_i'β + ρ σ_ε λ_i(α_u) + υ_i
Sample selection — Regression in a model of selection (3)

    E[y_i | z*_i > 0] = x_i'β + ρ σ_ε λ_i(α_u)

- OLS regression using the observed data will lead to inconsistent estimates (omitted variable bias)
- The marginal effect of the regressors on y_i in the observed sample consists of two components:
  - a direct effect on the mean of y_i (β)
  - in addition, if the variable appears in the probability that z*_i is positive, it will influence y_i through its presence in λ_i:

    ∂E[y_i | z*_i > 0]/∂x_ik = β_k − γ_k (ρ σ_ε / σ_u) δ_i(α_u)

- Most often z*_i is not observed; rather, we can infer its sign but not its magnitude
- Since there is no information on the scale of z*, the disturbance variance in the selection equation cannot be estimated (we set σ_u² = 1)
Sample selection — Regression in a model of selection (4)

Selection mechanism:

    z*_i = w_i'γ + u_i,  where we observe z_i = 1 if z*_i > 0 and 0 otherwise
    Pr(z_i = 1 | w_i) = Φ(w_i'γ)
    Pr(z_i = 0 | w_i) = 1 − Φ(w_i'γ)

Regression model:

    y_i = x_i'β + ε_i

where y_i is observed only when z_i equals one (otherwise data are not available)

    (u_i, ε_i) ~ bivariate normal[0, 0, 1, σ_ε, ρ]
Sample selection — Estimation

- Least squares using the observed data produces inconsistent estimates of β (omitted variable)
- Least squares regression of y on x and λ would be a consistent estimator
- However, even if λ_i were observed, OLS would be inefficient: the υ_i are heteroskedastic
- Maximum likelihood estimation can be applied
- Heckman (1979) proposed a two-step procedure
Sample selection — Maximum likelihood estimation

The log likelihood for observation i, log L_i = l_i, can be written as:
- If y_i is not observed:

    l_i = log Φ(−w_i'γ)

- If y_i is observed:

    l_i = log Φ[ (w_i'γ + (y_i − x_i'β)ρ/σ_ε) / √(1 − ρ²) ] − (1/2) ((y_i − x_i'β)/σ_ε)² − log(√(2π) σ_ε)

- σ_ε and ρ are not directly estimated (σ_ε must be positive and ρ must lie in (−1, 1))
- Directly estimated are log σ_ε and atanh ρ:

    atanh ρ = (1/2) log[(1 + ρ)/(1 − ρ)]

- Estimation would be simplified if ρ = 0
Sample selection — Two-step procedure (Heckman, 1979)

    y_i | z*_i > 0 = E[y_i | z*_i > 0] + υ_i = x_i'β + ρ σ_ε λ_i(α_u) + υ_i

1. Estimate the probit equation by MLE to obtain estimates of γ. For each observation in the selected sample, compute λ̂_i (the inverse Mills ratio)
2. Estimate β and β_λ = ρ σ_ε by least squares regression of y on x and λ̂
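The two steps can be sketched on simulated data. All design values below (coefficients, ρ = 0.7, N, the extra selection-equation variable x2 serving as the exclusion restriction) are assumptions for illustration, not part of the slides:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Simulated design (assumptions):
#   selection: z* = 0.5 + x1 + x2 + u, z = 1[z* > 0]
#   outcome:   y  = 1 + 2*x1 + eps, observed only when z = 1, corr(u, eps) = 0.7
N = 5000
x1, x2 = rng.normal(size=N), rng.normal(size=N)   # x2: exclusion restriction
u = rng.normal(size=N)
eps = 0.7 * u + np.sqrt(1 - 0.7**2) * rng.normal(size=N)
z = (0.5 + x1 + x2 + u > 0).astype(float)
y = 1.0 + 2.0 * x1 + eps
W = np.column_stack([np.ones(N), x1, x2])

# Step 1: probit of z on w by MLE, then inverse Mills ratio on the selected sample
def probit_nll(g):
    q = 2 * z - 1
    return -norm.logcdf(q * (W @ g)).sum()

g_hat = minimize(probit_nll, np.zeros(3), method="BFGS").x
sel = z == 1
mills = norm.pdf(W[sel] @ g_hat) / norm.cdf(W[sel] @ g_hat)

# Step 2: OLS of y on x and the estimated Mills ratio, selected sample only
X2 = np.column_stack([np.ones(sel.sum()), x1[sel], mills])
beta_heck = np.linalg.lstsq(X2, y[sel], rcond=None)[0]

X_naive = np.column_stack([np.ones(sel.sum()), x1[sel]])
beta_naive = np.linalg.lstsq(X_naive, y[sel], rcond=None)[0]
print("naive OLS slope:", beta_naive[1], " Heckman slope:", beta_heck[1])
```

The naive OLS slope on the selected sample is biased (here downward, since ρ > 0 and λ is decreasing in the index), while the two-step estimate is close to the true coefficient.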
Sample selection — Estimators of the variance and standard errors

- Second-step standard errors need to be adjusted to account for the first-step estimation
- The estimation of σ_ε needs to be adjusted: at each observation, the true conditional variance of the disturbance would be

    σ_i² = σ_ε² (1 − ρ² δ_i)

- A consistent estimator of σ_ε² is given by:

    σ̂_ε² = (1/n) e'e + δ̄̂ b_λ²

  where e is the vector of second-step residuals, b_λ is the coefficient on λ̂, and δ̄̂ is the average of the estimated δ_i
- To test hypotheses, an estimate of the asymptotic covariance matrix of the coefficients (including β_λ) is needed
- Two problems arise: (1) the disturbances υ_i are heteroskedastic; (2) there are unknown parameters in λ_i
- The formulas are rather cumbersome, but can be calculated using the matrix of independent variables, the sample estimates of σ_ε² and ρ, and the assumed known values of λ_i and δ_i
Sample selection — Two-step procedure: discussion

Identification: exclusion restriction
- Although the inverse Mills ratio is nonlinear in the single index w_i'γ, the function mapping this index into the inverse Mills ratio is approximately linear over certain ranges of the index
- Accordingly, the inclusion of additional variables in w_i in the first step can be important for identification of the second-step estimates
- In the real world, there are few candidates for simultaneous inclusion in w_i and exclusion from x_i

Inclusion of the inverse Mills ratio in the equation of interest is driven by the normality assumption
- Recent research includes specific attempts to move away from the normality assumption:

    y_i | z*_i > 0 = x_i'β + µ(w_i'γ) + υ_i

  where µ(w_i'γ) is called the selectivity correction
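The near-linearity of the inverse Mills ratio, which is what makes the exclusion restriction important, is easy to see numerically. A minimal sketch; the index range [−1, 1] is an illustrative assumption:

```python
import numpy as np
from scipy.stats import norm

# Inverse Mills ratio lambda(t) = phi(t)/Phi(t) over a moderate range of the
# single index t = w'gamma (range chosen for illustration)
t = np.linspace(-1.0, 1.0, 200)
mills = norm.pdf(t) / norm.cdf(t)

# R^2 of a straight-line fit: close to 1, i.e. lambda is nearly linear here,
# so without an exclusion restriction lambda-hat is nearly collinear with x
slope, intercept = np.polyfit(t, mills, 1)
fitted = slope * t + intercept
r2 = 1 - ((mills - fitted) ** 2).sum() / ((mills - mills.mean()) ** 2).sum()
print(round(r2, 4))
```

When λ̂ is close to a linear function of variables already in x, the second-step regression identifies β_λ only from the mild curvature, which is why variables in w excluded from x help so much.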
Sample selection — Selection in qualitative response models

- The problem of sample selection has been modeled in other settings besides the linear regression model
- Binary choice models have been considered, but also count data models
- For example, in the case of the Poisson model:

    y_i | ε_i ~ Poisson(λ_i)
    log λ_i = x_i'β + ε_i

- (y_i, x_i) are only observed when z_i = 1, where z*_i = w_i'γ + u_i and z_i = 1 if z*_i > 0, 0 otherwise
- Assume that (ε_i, u_i) have a bivariate normal distribution with non-zero correlation
- Selection affects the mean (and the variance) of y_i, and in the observed data y_i no longer has a Poisson distribution