Econometrics II. Seppo Pynnönen. Spring 2018. Department of Mathematics and Statistics, University of Vaasa, Finland



Part III: Limited Dependent Variable Models (as of Jan 30, 2017)

1 Background
2 Binary Dependent Variable
    The Linear Probability Model
    The Logit and Probit Model
3 Tobit Model
    Interpreting Tobit Estimates
    Predicting with Tobit Regression
    Checking Specification of Tobit Models

Limited dependent variables are variables whose range of values is substantially restricted. A binary variable, which takes only the two values 0 and 1, is one example; another is a count variable that takes only a small number of integer values. Other kinds of limited variables are those whose values are truncated or censored for some reason, for example the number of passenger tickets sold for an airplane or a sports event. Note, however, that not all restricted cases need special treatment: wage, for instance, must be positive, yet ordinary regression is usually adequate. The typical cases that do need special treatment are variables with a large concentration of observations at the limiting value.


Up until now, in the regression

y = x'β + u, (1)

where x'β = β_0 + β_1 x_1 + ... + β_k x_k, the dependent variable y has had a quantitative meaning (e.g., wage). What if y indicates a qualitative event (e.g., the firm has gone bankrupt), such that y = 1 indicates the occurrence of the event ("success") and y = 0 its non-occurrence ("failure"), and we want to explain it by some explanatory variables?

Consider the meaning of the regression y = x'β + u when y is a binary variable. Because E[u|x] = 0,

E[y|x] = x'β. (2)

Because y is a random variable that can take only the values 0 and 1, we can define the probabilities P(y = 1|x) and P(y = 0|x) = 1 - P(y = 1|x), such that

E[y|x] = 0 · P(y = 0|x) + 1 · P(y = 1|x) = P(y = 1|x).

Thus E[y|x] = P(y = 1|x) is the success probability, and the regression in equation (2) models

P(y = 1|x) = β_0 + β_1 x_1 + ... + β_k x_k, (3)

the probability of success. This is called the linear probability model (LPM). The slope coefficients give the marginal effect of the corresponding x-variable on the success probability, i.e., the change in the probability as x_j changes:

ΔP(y = 1|x) = β_j Δx_j. (4)

In the OLS-estimated model

ŷ = β̂_0 + β̂_1 x_1 + ... + β̂_k x_k, (5)

ŷ is the estimated or predicted probability of success. To keep the coding of the binary variable clear, it may be useful to name the variable after the success category (e.g., in a bankruptcy study, bankrupt = 1 for bankrupt firms and bankrupt = 0 for non-bankrupt firms; "success" is thus just a generic term).

Example 1 (Married women's participation in the labor force, year 1975)

Linear probability model (see the R snippet for the R commands):

lm(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) + age +
    kidslt6 + kidsge6, data = wkng)

Residuals:
     Min       1Q   Median       3Q      Max
-0.93432 -0.37526  0.08833  0.34404  0.99417

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.5855192  0.1541780   3.798 0.000158 ***
nwifeinc    -0.0034052  0.0014485  -2.351 0.018991 *
educ         0.0379953  0.0073760   5.151 3.32e-07 ***
exper        0.0394924  0.0056727   6.962 7.38e-12 ***
I(exper^2)  -0.0005963  0.0001848  -3.227 0.001306 **
age         -0.0160908  0.0024847  -6.476 1.71e-10 ***
kidslt6     -0.2618105  0.0335058  -7.814 1.89e-14 ***
kidsge6      0.0130122  0.0131960   0.986 0.324415
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4271 on 745 degrees of freedom
Multiple R-squared: 0.2642, Adjusted R-squared: 0.2573
F-statistic: 38.22 on 7 and 745 DF, p-value: < 2.2e-16

All variables but kidsge6 are statistically significant, with signs as might be expected. The coefficients give the marginal effects of the variables on the probability that inlf = 1. Thus, e.g., an additional year of educ increases the probability by about 0.038 (other variables held fixed).

[Figure: Marginal effect of experience on married women's labor force participation (probability vs. experience in years, 0-40) and marginal effect of education on married women's labor force participation (probability vs. education in years, 0-15).]

Some issues associated with the LPM:

The left-hand side is a probability restricted to [0, 1], while the right-hand side ranges over (-∞, ∞), which may result in probability predictions less than zero or greater than one.

Heteroskedasticity of u: denoting p(x) = P(y = 1|x) = x'β,

var[u|x] = p(x)(1 - p(x)), (6)

which is not constant but depends on x, violating the homoskedasticity assumption (Assumption 2).
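Both issues are easy to see numerically. The following is a minimal sketch with made-up coefficients beta0 and beta1 (not the Example 1 estimates):

```python
# Minimal numerical sketch of the two LPM issues, using made-up
# coefficients (beta0, beta1) -- not the Example 1 estimates.
beta0, beta1 = 0.1, 0.2

def lpm_prob(x):
    """Fitted 'probability' P(y = 1 | x) = beta0 + beta1 * x from an LPM."""
    return beta0 + beta1 * x

def lpm_error_var(x):
    """var[u | x] = p(x) * (1 - p(x)), equation (6)."""
    p = lpm_prob(x)
    return p * (1 - p)

print(lpm_prob(6))                         # 1.3: a "probability" above one
print(lpm_error_var(1), lpm_error_var(2))  # differs with x: heteroskedastic
```

For a large enough x the fitted value exceeds one, and the error variance changes with x, which is exactly what equation (6) says.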


The first of the above problems can be solved technically by mapping the linear function on the right-hand side of equation (3) through a non-linear function onto the range (0, 1). Such a function is generally called a link function. That is, instead of equation (3) we write

P(y = 1|x) = G(x'β). (7)

Although in principle any function G: R → [0, 1] would do, the so-called logit and probit transformations are in practice the most popular (the former is based on the logistic distribution, the latter on the normal distribution). Economists often favor the probit transformation, in which G is the distribution function of the standard normal density, i.e.,

G(z) = Φ(z) = ∫_{-∞}^{z} (1/√(2π)) e^{-v²/2} dv. (8)

In the logit transformation

G(z) = e^z / (1 + e^z) = 1 / (1 + e^{-z}) = ∫_{-∞}^{z} e^{-v} / (1 + e^{-v})² dv. (9)

Both are S-shaped.

[Figure: Probit transformation and logit transformation; G(z) on the vertical axis (0 to 1) against z on the horizontal axis (-3 to 3).]
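Both link functions in (8) and (9) are easy to evaluate numerically. A small sketch using only the Python standard library (NormalDist supplies Φ):

```python
import math
from statistics import NormalDist

def probit_link(z):
    """G(z) = Phi(z), the standard normal cdf of equation (8)."""
    return NormalDist().cdf(z)

def logit_link(z):
    """G(z) = e^z / (1 + e^z) = 1 / (1 + e^(-z)), equation (9)."""
    return 1.0 / (1.0 + math.exp(-z))

# Both map the real line into (0, 1) and equal 0.5 at z = 0;
# the logistic cdf has heavier tails than the normal cdf.
for z in (-3.0, 0.0, 3.0):
    print(z, round(probit_link(z), 4), round(logit_link(z), 4))
```

Plotting either function over z in (-3, 3) reproduces the S-shapes shown in the slides.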

The price, however, is that the interpretation of the marginal effects is no longer as straightforward as with the LPM. Still, a negative sign indicates a decreasing effect on the probability and a positive sign an increasing one. More precisely, using equation (7), the marginal change with respect to x_j (keeping the others unchanged) is

ΔP(y = 1|x) ≈ g(x'β) β_j Δx_j, (10)

where g is the derivative function of G: g(x'β) = (1/√(2π)) exp(-(x'β)²/2) for the probit and g(x'β) = exp(-x'β) / (1 + exp(-x'β))² for the logit.
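The two density functions g in (10) can be sketched as follows; marginal_effect and its argument names are illustrative, not from the slides:

```python
import math

def probit_density(z):
    """g(z) = (1 / sqrt(2*pi)) * exp(-z^2 / 2): derivative of Phi."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def logit_density(z):
    """g(z) = e^(-z) / (1 + e^(-z))^2: derivative of the logistic cdf."""
    e = math.exp(-z)
    return e / (1.0 + e) ** 2

def marginal_effect(beta_j, index, g):
    """Approximate change in P(y = 1 | x) for a unit change in x_j,
    equation (10) with delta x_j = 1; 'index' stands for x'beta."""
    return g(index) * beta_j
```

For example, marginal_effect(0.13, 0.5, probit_density) scales a probit coefficient of 0.13 by φ(0.5) ≈ 0.352, so the coefficient itself overstates the probability change.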

Typically the marginal effects are evaluated for unit changes in x_j (i.e., Δx_j = 1) at the sample means of the x-variables, using the estimated β-coefficients [the partial effect at the average (PEA)]. Another commonly used approach is to average the scale factor g over the sample observations [the average partial effect (APE)]:

(1/n) Σ_{i=1}^{n} g(x_i'β̂). (11)
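The difference between the two evaluation points can be sketched for a probit model; the data and coefficient values below are toy numbers chosen only for illustration:

```python
from statistics import NormalDist

phi = NormalDist().pdf  # g for the probit model

def index(row, beta):
    """Linear index x'beta (first element of row/beta is the intercept)."""
    return sum(b * x for b, x in zip(beta, row))

def pea_scale(X, beta):
    """PEA scale factor: g evaluated at the sample means of the x's."""
    n = len(X)
    xbar = [sum(col) / n for col in zip(*X)]
    return phi(index(xbar, beta))

def ape_scale(X, beta):
    """APE scale factor: (1/n) * sum_i g(x_i'beta), equation (11)."""
    return sum(phi(index(row, beta)) for row in X) / len(X)

# Toy data: intercept plus one regressor; hypothetical coefficients.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
beta = [-1.0, 0.8]
print(pea_scale(X, beta), ape_scale(X, beta))  # generally not equal
```

Because g is non-linear, g at the mean of x and the mean of g over x differ, which is why PEA and APE generally give different marginal effects.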

There are various pseudo R-squared measures for binary response models. One is McFadden's measure; another is the squared correlation between the ŷ_i's (predicted probabilities) and the observed y_i's (which take the values 0/1). Using R, the former can be computed as 1 - (residual deviance)/(null deviance), where the residual deviance is minus twice the maximized log-likelihood of the fitted model and the null deviance is the corresponding value when only the intercept is included in the model.
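The computation itself is one line; a minimal sketch, using the deviances reported below for the probit fit of Example 2:

```python
def mcfadden_pseudo_r2(residual_deviance, null_deviance):
    """McFadden's pseudo R-squared: 1 - residual deviance / null deviance."""
    return 1.0 - residual_deviance / null_deviance

# Deviances from the probit output of Example 2:
print(round(mcfadden_pseudo_r2(802.6, 1029.7), 3))  # 0.221
```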

Example 2 (Married women's labor force...)

Probit: (family = binomial(link = "probit") in glm)

Call:
glm(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) + age +
    kidslt6 + kidsge6, family = binomial(link = "probit"), data = wkng)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-2.2156 -0.9151  0.4315  0.8653  2.4553

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.2700736  0.5080782   0.532  0.59503
nwifeinc    -0.0120236  0.0049392  -2.434  0.01492 *
educ         0.1309040  0.0253987   5.154 2.55e-07 ***
exper        0.1233472  0.0187587   6.575 4.85e-11 ***
I(exper^2)  -0.0018871  0.0005999  -3.145  0.00166 **
age         -0.0528524  0.0084624  -6.246 4.22e-10 ***
kidslt6     -0.8683247  0.1183773  -7.335 2.21e-13 ***
kidsge6      0.0360056  0.0440303   0.818  0.41350
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Null deviance: 1029.7 on 752 degrees of freedom
Residual deviance: 802.6 on 745 degrees of freedom
AIC: 818.6

Pseudo R-squared: 1 - 802.6/1029.7 = 0.221

Logit: (family = binomial(link = "logit") in glm)

glm(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) + age +
    kidslt6 + kidsge6, family = binomial(link = "logit"), data = wkng)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-2.1770 -0.9063  0.4473  0.8561  2.4032

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.425452   0.860365   0.495  0.62095
nwifeinc    -0.021345   0.008421  -2.535  0.01126 *
educ         0.221170   0.043439   5.091 3.55e-07 ***
exper        0.205870   0.032057   6.422 1.34e-10 ***
I(exper^2)  -0.003154   0.001016  -3.104  0.00191 **
age         -0.088024   0.014573  -6.040 1.54e-09 ***
kidslt6     -1.443354   0.203583  -7.090 1.34e-12 ***
kidsge6      0.060112   0.074789   0.804  0.42154
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Null deviance: 1029.75 on 752 degrees of freedom
Residual deviance: 803.53 on 745 degrees of freedom
AIC: 819.53

Pseudo R-squared: 1 - 803.53/1029.75 = 0.220

Qualitatively the results are similar to those of the LPM. (R exercise: create graphs of the marginal effects similar to those of the linear case.)


A limited dependent variable is called a corner solution response variable if the variable is zero (say) for a nontrivial fraction of the population but is roughly continuously distributed over positive values. An example is the amount of alcohol an individual consumes in a given month. Nothing in principle prevents using a linear model for such a y; the problem is that fitted values may be negative.

In cases where it is important to have a model that implies nonnegative predicted values for y, the Tobit model is convenient. The Tobit model (typically) expresses the observed response y in terms of an underlying latent variable y*:

y* = x'β + u (12)

with

y = max(0, y*) (13)

and u|x ~ N(0, σ²).

Accordingly y* ~ N(x'β, σ²), and y = y* for y* ≥ 0 but y = 0 for y* < 0. Given a sample of observations on y, the parameters can be estimated by the method of maximum likelihood. The log-likelihood function for observation i is

l_i(β, σ) = 1(y_i = 0) log[1 - Φ(x_i'β/σ)] + 1(y_i > 0) log[(1/σ) φ((y_i - x_i'β)/σ)], (14)

where 1(A) is an indicator function with value 1 if the condition A is true and zero otherwise, Φ(·) is the distribution function and φ(·) the density function of the N(0, 1) distribution. The maximization of the log-likelihood, l(β, σ) = Σ_i l_i(β, σ), to obtain the ML estimates of β and σ is done by numerical methods.

Example 3 (Married women's annual working hours)

[Figure: Histogram of married women's annual working hours; frequency (0-300) vs. hours (0-5000).]

OLS results:

lm(formula = hours ~ nwifeinc + educ + exper + I(exper^2) + age +
    kidslt6 + kidsge6, data = wkng)

Residuals:
    Min      1Q  Median      3Q     Max
-1511.3  -537.8  -146.9   538.1  3555.6

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1330.4824   270.7846   4.913 1.10e-06 ***
nwifeinc      -3.4466     2.5440  -1.355   0.1759
educ          28.7611    12.9546   2.220   0.0267 *
exper         65.6725     9.9630   6.592 8.23e-11 ***
I(exper^2)    -0.7005     0.3246  -2.158   0.0312 *
age          -30.5116     4.3639  -6.992 6.04e-12 ***
kidslt6     -442.0899    58.8466  -7.513 1.66e-13 ***
kidsge6      -32.7792    23.1762  -1.414   0.1577
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 750.2 on 745 degrees of freedom
Multiple R-squared: 0.2656, Adjusted R-squared: 0.2587
F-statistic: 38.5 on 7 and 745 DF, p-value: < 2.2e-16

Tobit regression:

vglm(formula = hours ~ nwifeinc + educ + exper + I(exper^2) + age +
    kidslt6 + kidsge6, family = tobit(lower = 0), data = wkng)

Pearson residuals:
            Min      1Q   Median      3Q     Max
mu       -8.429 -0.8331  -0.1352  0.8136   3.494
loge(sd) -0.994 -0.5814  -0.2366  0.2150  11.893

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept):1 965.28507  443.93450   2.174 0.029676 *
(Intercept):2   7.02289    0.03589 195.682  < 2e-16 ***
nwifeinc       -8.81433    4.48480  -1.965 0.049371 *
educ           80.64715   21.56529   3.740 0.000184 ***
exper         131.56501   17.01343   7.733 1.05e-14 ***
I(exper^2)     -1.86417    0.52992  -3.518 0.000435 ***
age           -54.40524    7.34462  -7.408 1.29e-13 ***
kidslt6      -894.02622  111.46120  -8.021 1.05e-15 ***
kidsge6       -16.21577   38.48134  -0.421 0.673468
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Number of linear predictors: 2
Names of linear predictors: mu, loge(sd)
Log-likelihood: -3819.095 on 1497 degrees of freedom
Number of iterations: 6

(Intercept):2 is an extra parameter related to the residual standard deviation (it is the estimate of log(σ)). OLS generally yields biased estimates due to the censored y-values; Tobit regression accounts for the censoring. However, we should make some adjustments to the Tobit coefficients before interpreting their magnitudes, as discussed below.

Interpreting Tobit Estimates

Similar to ordinary regression, the interest is in the conditional expectation E[y|x]. We can regard y as taking the value E[y|y > 0, x] with probability P(y > 0|x), and the value E[y|y = 0, x] = 0 with probability P(y = 0|x). Accordingly, using the law of iterated expectations (LIE),²

E[y|x] = P(y > 0|x) E[y|y > 0, x]. (17)

² Generally, for random variables x, y, and z,

E[x|z] = E_y[E[x|y, z]], (15)

and in particular

E[x] = E_y[E[x|y]]. (16)

Because y* ~ N(x'β, σ²) and y = y* for y* ≥ 0 but y = 0 for y* < 0, we have P(y > 0|x) = 1 - Φ(-x'β/σ) = Φ(x'β/σ), such that E[y|x] in (17) becomes

E[y|x] = Φ(x'β/σ) E[y|y > 0, x]. (18)

To obtain E[y|y > 0, x] we can use the general result for z ~ N(0, 1): for any c, E[z|z > c] = φ(c)/(1 - Φ(c)). Noting that y = x'β + u and E[y|y > 0, x] = x'β + E[u|u > -x'β], we obtain

E[y|y > 0, x] = x'β + σ λ(x'β/σ), (19)

where λ(c) = φ(c)/Φ(c) is the inverse Mills ratio [note: φ(-c) = φ(c) and 1 - Φ(-c) = Φ(c)].

Thus the marginal contribution of x_j to the conditional expectation is

∂E[y|y > 0, x]/∂x_j = β_j + β_j λ'(x'β/σ), (20)

where λ'(·) is the derivative of λ(·). Because for the standard normal distribution φ'(z) = dφ(z)/dz = -zφ(z) and Φ'(z) = dΦ(z)/dz = φ(z), we have λ'(c) = -λ(c)[c + λ(c)], and we get finally

∂E[y|y > 0, x]/∂x_j = β_j (1 - λ(x'β/σ)[x'β/σ + λ(x'β/σ)]). (21)

Equation (21) shows that β_j does not exactly reflect the marginal effect of x_j on E[y|y > 0, x]; it is adjusted by the factor 1 - λ(x'β/σ)[x'β/σ + λ(x'β/σ)], which lies between zero and one.

The marginal effect of x_j on E[y|x]: combining equations (18) and (19), we have

E[y|x] = Φ(x'β/σ) x'β + σ φ(x'β/σ), (22)

where we have used the result Φ(z)λ(z) = φ(z).

From equation (22) we can compute the marginal effect of x_j by utilizing φ'(z) = -zφ(z), so that

∂E[y|x]/∂x_j = β_j Φ(x'β/σ) + β_j (x'β/σ) φ(x'β/σ) - β_j (x'β/σ) φ(x'β/σ) = β_j Φ(x'β/σ). (23)

Again β_j is adjusted to some extent (causing a difference from OLS). After estimating β and σ, Φ(x'β/σ) is often evaluated at the sample average, (1/n) Σ_i Φ(x_i'β̂/σ̂).

Predicting with Tobit Regression

Predictions of E[y|x] in equation (22) can be obtained by replacing the parameters with their estimates:

ŷ = Φ(x'β̂/σ̂) x'β̂ + σ̂ φ(x'β̂/σ̂), (24)

where Φ is the standard normal cumulative distribution function and φ the standard normal density function (the derivative of Φ).

Exercise: Using R, plot the predicted values of working hours as a function of education (educ) when the other explanatory variables are set at their means (for a solution, see the R snippet for Example 3 on the course home page).

Remark 1: In OLS the R-squared equals the squared correlation between the observed and fitted values. Using this fact, one can compute an R-squared for a Tobit model as well. For the OLS solution, R^2 = 0.258. Saving the R vglm results into an object (above, wkh.tbt), the predicted values can be extracted with the fitted() function. In an R S4 object the sub-objects are called slots; the observed values of the dependent variable are in the slot @y, i.e., in our case wkh.tbt@y. Thus, for the Tobit model the command cor(wkh.tbt@y, fitted(wkh.tbt))^2 produces R^2 = 0.261, which is close to that of OLS.

Checking Specification of Tobit Models

If we introduce a dummy variable w with w = 0 when y = 0 and w = 1 when y > 0, then E[w|x] = P(w = 1|x) = Φ(x'β/σ), which is a probit model. Accordingly, if the Tobit model holds, the scaled Tobit slope estimate β̂_j/σ̂ of x_j should be fairly close to the corresponding probit estimate γ̂_j. Comparing the closeness of the slope coefficients can thus be used as an informal specification check of the appropriateness of the Tobit model.

=================================
             Tobit/sigma   Probit
---------------------------------
(Intercept):1    0.8603    0.2701
nwifeinc        -0.0079   -0.0120
educ             0.0719    0.1309
exper            0.1173    0.1233
I(exper^2)      -0.0017   -0.0019
age             -0.0485   -0.0529
kidslt6         -0.7968   -0.8683
kidsge6         -0.0145    0.0360  (insignificant in both models)
=================================

The (scaled) slope coefficients of the Tobit model are fairly close to those of the probit model, suggesting that the Tobit specification is appropriate.
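The scaling in the table can be reproduced directly from the reported estimates. A sketch using a few coefficients from the Example 2 (probit) and Example 3 (Tobit) outputs, with σ̂ recovered from (Intercept):2, the estimate of log(sd):

```python
import math

# sigma-hat from the Tobit fit: (Intercept):2 is the estimate of log(sd).
sigma_hat = math.exp(7.02289)  # roughly 1122 hours

# Selected slope estimates from the Tobit (Example 3) and probit (Example 2) fits.
tobit = {"educ": 80.64715, "exper": 131.56501, "age": -54.40524}
probit = {"educ": 0.1309040, "exper": 0.1233472, "age": -0.0528524}

for name in tobit:
    scaled = tobit[name] / sigma_hat
    # same sign and similar magnitude -> informal support for the Tobit model
    print(name, round(scaled, 4), round(probit[name], 4))
```

The scaled values match the Tobit/sigma column of the table above.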