Department of Mathematics and Statistics, University of Vaasa, Finland Spring 2018
Part III: Limited Dependent Variable Models
As of Jan 30, 2017
Outline
1 Background
2 Binary Dependent Variable
   - The Linear Probability Model
   - The Logit and Probit Model
3 Tobit Model
   - Interpreting Tobit Estimates
   - Predicting with Tobit Regression
   - Checking Specification of Tobit Models
Limited dependent variables are variables whose range of values is substantially restricted. A binary variable, taking only the two values 0/1, is an example. Another example is a variable that takes only a small number of integer values. Other kinds of limited variables are those whose values are truncated for some reason, for example the number of passenger tickets for an airplane or a sports event. Note, however, that not all truncated cases need special treatment. An example is the wage, which must be positive. Typical truncated-value variables are those that have a large concentration of observations at the limiting value.
Binary Dependent Variable
The Linear Probability Model
Up until now, in the regression
y = x'β + u,  (1)
where x'β = β_0 + β_1 x_1 + ... + β_k x_k, y has had a quantitative meaning (e.g., wage). What if y indicates a qualitative event (e.g., a firm has gone bankrupt), such that y = 1 indicates the occurrence of the event ("success") and y = 0 its non-occurrence ("fail"), and we want to explain it by some explanatory variables?
Consider the meaning of the regression y = x'β + u when y is a binary variable. Then, because E[u|x] = 0,
E[y|x] = x'β.  (2)
Because y is a random variable that can take only the values 0 or 1, we can define the probabilities P(y = 1|x) and P(y = 0|x) = 1 − P(y = 1|x), such that
E[y|x] = 0 · P(y = 0|x) + 1 · P(y = 1|x) = P(y = 1|x).
Thus E[y|x] = P(y = 1|x) is the success probability, and the regression in equation (2) models
P(y = 1|x) = β_0 + β_1 x_1 + ... + β_k x_k,  (3)
the probability of success. This is called the linear probability model (LPM). The slope coefficients indicate the marginal effect of the corresponding x-variable on the success probability, i.e., the change in the probability as x changes, or
ΔP(y = 1|x) = β_j Δx_j.  (4)
In the OLS estimated model
ŷ = β̂_0 + β̂_1 x_1 + ... + β̂_k x_k,  (5)
ŷ is the estimated or predicted probability of success. In order to specify the binary variable correctly, it may be useful to name the variable according to the success category (e.g., in a bankruptcy study, bankrupt = 1 for bankrupt firms and bankrupt = 0 for non-bankrupt firms; thus "success" is just a generic term).
Example 1 (Married women's participation in the labor force (year 1975))
Linear probability model (see the R snippet for the R commands):

lm(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) + age +
    kidslt6 + kidsge6, data = wkng)

Residuals:
     Min       1Q   Median       3Q      Max
-0.93432 -0.37526  0.08833  0.34404  0.99417

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.5855192  0.1541780   3.798 0.000158 ***
nwifeinc    -0.0034052  0.0014485  -2.351 0.018991 *
educ         0.0379953  0.0073760   5.151 3.32e-07 ***
exper        0.0394924  0.0056727   6.962 7.38e-12 ***
I(exper^2)  -0.0005963  0.0001848  -3.227 0.001306 **
age         -0.0160908  0.0024847  -6.476 1.71e-10 ***
kidslt6     -0.2618105  0.0335058  -7.814 1.89e-14 ***
kidsge6      0.0130122  0.0131960   0.986 0.324415
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4271 on 745 degrees of freedom
Multiple R-squared: 0.2642, Adjusted R-squared: 0.2573
F-statistic: 38.22 on 7 and 745 DF, p-value: < 2.2e-16
All but kidsge6 are statistically significant, with signs as might be expected. The coefficients indicate the marginal effects of the variables on the probability that inlf = 1. Thus, e.g., an additional year of educ increases the probability by about 0.038 (other variables held fixed).

[Figure: Marginal effect of experience on married women's labor force participation (probability vs. experience in years, 0-40); marginal effect of education on married women's labor force participation (probability vs. education in years, 0-15).]
Some issues associated with the LPM:
- The left-hand side is restricted to [0, 1], while the right-hand side ranges over (−∞, ∞), which may result in probability predictions less than zero or larger than one.
- Heteroskedasticity of u: denoting p(x) = P(y = 1|x) = x'β,
  var[u|x] = p(x)(1 − p(x)),  (6)
  which is not constant but depends on x, violating Assumption 2.
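Both problems can be seen on simulated data. The sketch below (Python rather than the course's R; all data and parameter values are invented for illustration, not the wkng data set) fits an LPM by least squares to binary outcomes generated from a steep S-shaped success probability; the fitted line produces "probabilities" outside [0, 1], and the implied error variance p(x)(1 − p(x)) varies with x.

```python
import numpy as np

# Hypothetical data: binary outcomes driven by a steep logistic probability.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
p_true = 1.0 / (1.0 + np.exp(-2.0 * x))           # true success probability
y = (rng.uniform(size=x.size) < p_true).astype(float)

X = np.column_stack([np.ones_like(x), x])         # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)      # OLS estimates b0, b1
p_hat = X @ beta                                  # LPM "probabilities"

print(p_hat.min(), p_hat.max())                   # some fall outside [0, 1]

# Heteroskedasticity: var[u | x] = p(x)(1 - p(x)) depends on x.
p_clipped = np.clip(p_hat, 1e-12, 1 - 1e-12)
var_u = p_clipped * (1 - p_clipped)
```

Clipping is needed only because the LPM itself produces fitted values outside [0, 1], which is exactly the point of the illustration.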
The Logit and Probit Model
The first of the above problems can be solved technically by mapping the linear function on the right-hand side of equation (3) through a non-linear function onto the range (0, 1). Such a function is generally called a link function. That is, we write equation (3) instead as
P(y = 1|x) = G(x'β).  (7)
Although any function G : R → [0, 1] applies in principle, the so-called logit and probit transformations are in practice the most popular (the former is based on the logistic distribution, the latter on the normal distribution). Economists often favor the probit transformation, in which G is the distribution function of the standard normal density, i.e.,
G(z) = Φ(z) = ∫_{−∞}^{z} (1/√(2π)) e^{−v²/2} dv.  (8)
In the logit transformation
G(z) = e^z / (1 + e^z) = 1 / (1 + e^{−z}) = ∫_{−∞}^{z} e^{−v} / (1 + e^{−v})² dv.  (9)
Both are S-shaped.

[Figure: Probit and logit transformations G(z) plotted for z in (−3, 3); both are S-shaped curves rising from 0 to 1.]
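The two link functions in equations (8) and (9) can be sketched as follows (Python; the grid of z-values is arbitrary). Both map the real line into (0, 1) and are strictly increasing, and the two algebraic forms of the logit agree.

```python
import numpy as np
from scipy.stats import norm

def probit_link(z):
    # Phi(z): standard normal distribution function, equation (8)
    return norm.cdf(z)

def logit_link(z):
    # e^z / (1 + e^z) written in the equivalent form 1 / (1 + e^(-z)),
    # equation (9)
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-3, 3, 601)
G_probit = probit_link(z)   # S-shaped, strictly increasing, values in (0, 1)
G_logit = logit_link(z)
```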
The price, however, is that the interpretation of the marginal effects is no longer as straightforward as in the LPM. Still, a negative sign indicates a decreasing effect on the probability and a positive sign an increasing one. More precisely, using equation (7), the marginal change with respect to x_j (keeping the others unchanged) is
ΔP(y = 1|x) ≈ g(x'β) β_j Δx_j,  (10)
where g is the derivative function of G (g(x'β) = (1/√(2π)) exp(−(x'β)²/2) for probit and g(x'β) = exp(−x'β)/(1 + exp(−x'β))² for logit).
Typically the marginal effects are evaluated for unit changes in x_j (i.e., Δx_j = 1) at the sample means of the x-variables with the estimated β-coefficients [partial effect at the average (PEA)]. Another commonly used approach is to average the density over the sample,
(1/n) Σ_{i=1}^{n} g(x_i'β̂).  (11)
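The two evaluation strategies differ only in where the density g is evaluated. A Python sketch for the probit case (the coefficient vector and regressor sample are hypothetical, not the wkng estimates):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
beta = np.array([0.25, 0.8, -0.5])                # hypothetical estimates
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])

# PEA: evaluate the density at the sample means of the regressors.
x_bar = X.mean(axis=0)
pea_scale = norm.pdf(x_bar @ beta)

# Average over the sample, equation (11).
ape_scale = norm.pdf(X @ beta).mean()

# The marginal effect of x_j is the scale factor times beta_j, eq. (10):
pea_x1 = pea_scale * beta[1]
ape_x1 = ape_scale * beta[1]
```

Because the normal density peaks at zero, averaging g over a spread-out sample typically gives a smaller scale factor than evaluating g once at the means.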
There are various pseudo R-squared measures for binary response models. One is McFadden's measure. Another is the squared correlation between the ŷ_i's (predicted probabilities) and the observed y_i's (which have 0/1 values). Using R, the former can be computed as 1 − (residual deviance)/(null deviance), where the residual deviance is minus twice the log-likelihood of the fitted model and the null deviance is minus twice the log-likelihood when only the intercept is included in the model.
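As a small numerical sketch (Python; the outcomes and fitted probabilities below are invented for illustration), McFadden's measure can be computed directly from the two deviances:

```python
import numpy as np

def binomial_deviance(y, p):
    # Deviance = -2 * Bernoulli log-likelihood at probabilities p.
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -2.0 * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([0, 0, 1, 1, 1, 0, 1, 1])
p_fit = np.array([0.1, 0.3, 0.8, 0.7, 0.9, 0.2, 0.6, 0.8])  # fitted model
p_null = np.full_like(p_fit, y.mean())                       # intercept only

# McFadden: 1 - (residual deviance)/(null deviance)
pseudo_r2 = 1.0 - binomial_deviance(y, p_fit) / binomial_deviance(y, p_null)
```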
Example 2 (Married women's labor force ...)
Probit: (family = binomial(link = "probit") in glm)

Call:
glm(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) + age +
    kidslt6 + kidsge6, family = binomial(link = "probit"), data = wkng)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-2.2156 -0.9151  0.4315  0.8653  2.4553

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.2700736  0.5080782   0.532  0.59503
nwifeinc    -0.0120236  0.0049392  -2.434  0.01492 *
educ         0.1309040  0.0253987   5.154 2.55e-07 ***
exper        0.1233472  0.0187587   6.575 4.85e-11 ***
I(exper^2)  -0.0018871  0.0005999  -3.145  0.00166 **
age         -0.0528524  0.0084624  -6.246 4.22e-10 ***
kidslt6     -0.8683247  0.1183773  -7.335 2.21e-13 ***
kidsge6      0.0360056  0.0440303   0.818  0.41350
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Null deviance: 1029.7 on 752 degrees of freedom
Residual deviance: 802.6 on 745 degrees of freedom
AIC: 818.6
Pseudo R-squared: 1 - 802.6/1029.7 = 0.221
Logit: (family = binomial(link = "logit") in glm)

glm(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) + age +
    kidslt6 + kidsge6, family = binomial(link = "logit"), data = wkng)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-2.1770 -0.9063  0.4473  0.8561  2.4032

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.425452   0.860365   0.495  0.62095
nwifeinc    -0.021345   0.008421  -2.535  0.01126 *
educ         0.221170   0.043439   5.091 3.55e-07 ***
exper        0.205870   0.032057   6.422 1.34e-10 ***
I(exper^2)  -0.003154   0.001016  -3.104  0.00191 **
age         -0.088024   0.014573  -6.040 1.54e-09 ***
kidslt6     -1.443354   0.203583  -7.090 1.34e-12 ***
kidsge6      0.060112   0.074789   0.804  0.42154
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Null deviance: 1029.75 on 752 degrees of freedom
Residual deviance: 803.53 on 745 degrees of freedom
AIC: 819.53
Pseudo R-squared: 1 - 803.53/1029.75 = 0.220

Qualitatively the results are similar to those of the LPM. (R exercise: create graphs of the marginal effects similar to those in the linear case.)
Tobit Model
A limited dependent variable is called a corner solution response variable if the variable is zero (say) for a nontrivial fraction of the population but is roughly continuously distributed over positive values. An example is the amount of alcohol an individual consumes in a given month. Nothing in principle prevents using a linear model for such a y. The problem is that fitted values may be negative.
In cases where it is important to have a model that implies nonnegative predicted values for y, the Tobit model is convenient. The Tobit model (typically) expresses the observed response, y, in terms of an underlying latent variable, y*:
y* = x'β + u  (12)
with
y = max(0, y*)  (13)
and u|x ~ N(0, σ²).
Accordingly y*|x ~ N(x'β, σ²), and y = y* for y* ≥ 0 but y = 0 for y* < 0. Given a sample of observations on y, the parameters can be estimated by the method of maximum likelihood. The log-likelihood function for observation i is
l_i(β, σ) = 1(y_i = 0) log[1 − Φ(x_i'β/σ)] + 1(y_i > 0) log[(1/σ) φ((y_i − x_i'β)/σ)],  (14)
where 1(A) is an indicator function with value 1 if the condition A is true and zero otherwise, Φ(·) is the distribution function and φ(·) the density function of the N(0, 1) distribution. The maximization of the log-likelihood, l(β, σ) = Σ_i l_i(β, σ), to obtain the ML estimates of β and σ is done by numerical methods.
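The likelihood in equation (14) can be coded and maximized directly. The following Python sketch (simulated data; the true parameter values 1.0, 2.0, and 1.5 are arbitrary) recovers the parameters numerically, parameterizing σ through its logarithm to keep it positive:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

# Simulate a censored sample: latent y* = b0 + b1*x + u, observed y = max(0, y*).
rng = np.random.default_rng(2)
n = 2000
x = rng.normal(size=n)
beta0, beta1, sigma = 1.0, 2.0, 1.5
y_star = beta0 + beta1 * x + rng.normal(scale=sigma, size=n)
y = np.maximum(0.0, y_star)

def neg_loglik(theta):
    b0, b1, log_s = theta
    s = np.exp(log_s)                    # sigma > 0 via log-parameterization
    xb = b0 + b1 * x
    zero = y == 0
    # Equation (14): censored observations contribute log(1 - Phi(x'b/s)),
    # uncensored ones log((1/s) * phi((y - x'b)/s)).
    ll = np.sum(norm.logcdf(-xb[zero] / s))
    ll += np.sum(norm.logpdf((y[~zero] - xb[~zero]) / s) - log_s)
    return -ll

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
b0_hat, b1_hat, s_hat = res.x[0], res.x[1], np.exp(res.x[2])
```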
Example 3 (Married women's annual working hours)

[Figure: Histogram of married women's annual working hours, 0-5000 hours.]
OLS results

lm(formula = hours ~ nwifeinc + educ + exper + I(exper^2) + age +
    kidslt6 + kidsge6, data = wkng)

Residuals:
    Min      1Q  Median      3Q     Max
-1511.3  -537.8  -146.9   538.1  3555.6

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1330.4824   270.7846   4.913 1.10e-06 ***
nwifeinc      -3.4466     2.5440  -1.355   0.1759
educ          28.7611    12.9546   2.220   0.0267 *
exper         65.6725     9.9630   6.592 8.23e-11 ***
I(exper^2)    -0.7005     0.3246  -2.158   0.0312 *
age          -30.5116     4.3639  -6.992 6.04e-12 ***
kidslt6     -442.0899    58.8466  -7.513 1.66e-13 ***
kidsge6      -32.7792    23.1762  -1.414   0.1577
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 750.2 on 745 degrees of freedom
Multiple R-squared: 0.2656, Adjusted R-squared: 0.2587
F-statistic: 38.5 on 7 and 745 DF, p-value: < 2.2e-16
Tobit regression

vglm(formula = hours ~ nwifeinc + educ + exper + I(exper^2) + age +
    kidslt6 + kidsge6, family = tobit(lower = 0), data = wkng)

Pearson residuals:
            Min      1Q  Median     3Q    Max
mu       -8.429 -0.8331 -0.1352 0.8136  3.494
loge(sd) -0.994 -0.5814 -0.2366 0.2150 11.893

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept):1  965.28507  443.93450   2.174 0.029676 *
(Intercept):2    7.02289    0.03589 195.682  < 2e-16 ***
nwifeinc        -8.81433    4.48480  -1.965 0.049371 *
educ            80.64715   21.56529   3.740 0.000184 ***
exper          131.56501   17.01343   7.733 1.05e-14 ***
I(exper^2)      -1.86417    0.52992  -3.518 0.000435 ***
age            -54.40524    7.34462  -7.408 1.29e-13 ***
kidslt6       -894.02622  111.46120  -8.021 1.05e-15 ***
kidsge6        -16.21577   38.48134  -0.421 0.673468
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Number of linear predictors: 2
Names of linear predictors: mu, loge(sd)
Log-likelihood: -3819.095 on 1497 degrees of freedom
Number of iterations: 6
(Intercept):2 is an extra statistic related to the residual standard deviation (the estimate of log(σ)). OLS generally results in biased estimates due to the censored y-values; Tobit regression accounts for this biasing effect. However, we should make some adjustments to the Tobit coefficients before interpreting their magnitudes, as discussed below.
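The direction of the OLS bias is easy to see by simulation. In the Python sketch below (all parameter values hypothetical), OLS on the censored y = max(0, y*) attenuates the slope well below the true latent-model slope of 2:

```python
import numpy as np

# Simulate the latent model and censor at zero.
rng = np.random.default_rng(4)
n = 50_000
x = rng.normal(size=n)
y_star = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=n)  # true slope = 2
y = np.maximum(0.0, y_star)                             # censored response

# OLS of the censored y on x: the slope is pulled toward zero.
X = np.column_stack([np.ones(n), x])
slope_ols = np.linalg.lstsq(X, y, rcond=None)[0][1]
```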
Interpreting Tobit Estimates
As in ordinary regression, the interest is in the conditional expectation E[y|x]. Given E[y|y > 0, x], we can compute E[y|x] as follows: conditionally on x, the expectation of y is E[y|y > 0, x] with probability P[y > 0|x] (when y > 0), and E[y|y = 0, x] = 0 with probability P[y = 0|x] (when y = 0). Accordingly, using the law of iterated expectations (LIE)²,
E[y|x] = P[y > 0|x] E[y|y > 0, x].  (17)

² Generally, given random variables x, y, and z,
E[x|z] = E_y[E[x|y, z]]  (15)
and in particular
E[x] = E_y[E[x|y]].  (16)
Because y*|x ~ N(x'β, σ²), and y = y* for y* > 0 and y = 0 for y* ≤ 0, we have P(y > 0|x) = 1 − Φ(−x'β/σ) = Φ(x'β/σ), so that E[y|x] in (17) becomes
E[y|x] = Φ(x'β/σ) E[y|y > 0, x].  (18)
To obtain E[y|y > 0, x] we can use the general result for z ~ N(0, 1): for any c, E[z|z > c] = φ(c)/(1 − Φ(c)). Noting that y = x'β + u and E[y|y > 0, x] = x'β + E[u|u > −x'β, x], we obtain
E[y|y > 0, x] = x'β + σ λ(x'β/σ),  (19)
where λ(c) = φ(c)/Φ(c) is the inverse Mills ratio [note: φ(−c) = φ(c) and 1 − Φ(−c) = Φ(c)].
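The truncated-mean formula (19) can be verified by Monte Carlo (Python; the values x'β = 0.5 and σ = 2.0 are arbitrary illustrations):

```python
import numpy as np
from scipy.stats import norm

# Draw y* ~ N(x'beta, sigma^2) and compare the mean of the positive draws
# with x'beta + sigma * lambda(x'beta/sigma), equation (19).
rng = np.random.default_rng(3)
xb, sigma = 0.5, 2.0
y_star = rng.normal(loc=xb, scale=sigma, size=2_000_000)

mills = norm.pdf(xb / sigma) / norm.cdf(xb / sigma)   # inverse Mills ratio
analytic = xb + sigma * mills                         # equation (19)
simulated = y_star[y_star > 0].mean()                 # E[y | y > 0] estimate
```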
Thus the marginal contribution of x_j to the (conditional) expectation is
∂E[y|y > 0, x]/∂x_j = β_j + β_j λ'(x'β/σ),  (20)
where λ'(·) is the derivative of λ(·). Because for the standard normal distribution φ'(z) = dφ(z)/dz = −zφ(z) and Φ'(z) = dΦ(z)/dz = φ(z), so that λ'(c) = −λ(c)[c + λ(c)], we finally get
∂E[y|y > 0, x]/∂x_j = β_j (1 − λ(x'β/σ)[x'β/σ + λ(x'β/σ)]).  (21)
Equation (21) shows that β_j does not exactly reflect the marginal effect of x_j on E[y|y > 0, x]; it becomes adjusted by the factor (1 − λ(x'β/σ)[x'β/σ + λ(x'β/σ)]).
The marginal effect of x_j on E[y|x]: combining equations (17) and (19), we have
E[y|x] = Φ(x'β/σ) x'β + σ φ(x'β/σ),  (22)
where we have used the result Φ(c) λ(c) = φ(c).
From equation (22) we can compute the marginal effect of x_j by utilizing φ'(z) = −zφ(z), so that
∂E[y|x]/∂x_j = β_j Φ(x'β/σ) + β_j φ(x'β/σ)(x'β/σ) − β_j φ(x'β/σ)(x'β/σ) = β_j Φ(x'β/σ).  (23)
Again β becomes adjusted to some extent (causing a difference from OLS). After estimating β and σ, Φ(x'β/σ) is often evaluated at the mean, n⁻¹ Σ_i Φ(x_i'β̂/σ̂).
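Equation (23) can be checked numerically: differentiating the unconditional mean in equation (22) with respect to x_j by central finite differences should reproduce β_j Φ(x'β/σ). The values of β_j, σ, and x'β below are arbitrary illustrations (Python):

```python
import numpy as np
from scipy.stats import norm

def e_y(xb, s):
    # Unconditional mean E[y | x] from equation (22).
    c = xb / s
    return norm.cdf(c) * xb + s * norm.pdf(c)

beta_j, s = 0.7, 1.3
xb = 0.4                    # x'beta at the evaluation point
h = 1e-6                    # a step h in x_j changes x'beta by beta_j * h
numeric = (e_y(xb + beta_j * h, s) - e_y(xb - beta_j * h, s)) / (2 * h)
analytic = beta_j * norm.cdf(xb / s)   # equation (23)
```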
Predicting with Tobit Regression
Predictions of E[y|x] in equation (22) can be obtained by replacing the parameters with their estimates,
ŷ = Φ(x'β̂/σ̂) x'β̂ + σ̂ φ(x'β̂/σ̂),  (24)
where Φ is the standard normal cumulative distribution function and φ the standard normal density function (the derivative of Φ). Exercise: using R, plot the predicted values for working hours as a function of education (educ) when the other explanatory variables are set to their means (for a solution, see the R snippet for Example 3 on the course home page).
Remark 1
In OLS, the R-squared is the squared correlation of the observed values with the predicted values. Using this practice, one can compute an R-squared for a Tobit model as well. For the OLS solution, R² = 0.258. Saving the R vglm results into an object (above, wkh.tbt), the predicted values can be extracted with the fitted() function. In an R S4 object the sub-objects are called slots; the observed dependent values are in the slot @y, i.e., in our case wkh.tbt@y. Thus, for the Tobit model the command cor(wkh.tbt@y, fitted(wkh.tbt))^2 produces R² = 0.261, which is close to that of OLS.
Checking Specification of Tobit Models
If we introduce a dummy variable w = 0 when y = 0 and w = 1 when y > 0, then E[w|x] = P[w = 1|x] = Φ(x'β/σ) is the probit model. Accordingly, if the Tobit model holds, we can expect the (scaled) Tobit slope estimate β̂_j/σ̂ of x_j to be fairly close to the corresponding probit estimate γ̂_j. Comparing the closeness of the slope coefficients can be used as an informal specification check of the appropriateness of the Tobit model.

=================================
              Tobit/sigma   Probit
---------------------------------
(Intercept):1      0.8603   0.2701
nwifeinc          -0.0079  -0.0120
educ               0.0719   0.1309
exper              0.1173   0.1233
I(exper^2)        -0.0017  -0.0019
age               -0.0485  -0.0529
kidslt6           -0.7968  -0.8683
kidsge6           -0.0145   0.0360  (insignificant in both models)
=================================

The (scaled) slope coefficients of the Tobit model are fairly close to those of the probit model, suggesting the appropriateness of the Tobit model.