Economics 671: Applied Econometrics
Department of Economics, Finance and Legal Studies
University of Alabama

Problem Set #1 (Random Data Generation)

1. Generate $n = 500$ random numbers from both the uniform ($U[0,1]$, uniform between zero and one) and exponential $\exp(\lambda)$ (set $\lambda = 2$ and let $u \sim U[0,1]$) distributions. Plot the histograms of each of the variables. What are the true and estimated means and variances?

2. Generate a random variable with $n = 500$ from the mixed normal distribution with $P[N(-3,1)] = P[N(3,1)] = 0.5$. Plot the histogram. Note that this should be a bimodal distribution. Do not add the two variables together; doing so leads to a unimodal density with mean near zero.

3. Generate a random variable with $n = 500$ from the Student-t distribution with 5 degrees of freedom, $t_5$. Plot the histogram.

4. Generate a random variable with $n = 500$ from the $\chi^2$ distribution with 1 degree of freedom. Plot the histogram.

5. Generate a random variable with $n = 500$ from the Normal distribution with mean 1 and variance 0.5. Plot the histogram.

6. This problem will show you how the central limit theorem (CLT) works. Using statistical software, show how the CLT works for the sample mean under each of the five distributions below. Use $n = 10$, $50$, and $100$ with 100, 200, and 500 Monte Carlo replications. Plot histograms of the sample means for each distribution and each sample size.
   (a) Standard normal, $N(0,1)$
   (b) Uniform, $U[0,1]$
   (c) Mixed normal, $P[N(-3,1)] = P[N(3,1)] = 0.5$
   (d) Exponential, $\exp(\lambda)$ (set $\lambda = 2$ and let $u \sim U[0,1]$)
   (e) Student-$t_5$

7. This problem is to conduct a Monte Carlo experiment to examine the finite sample performance of a test in terms of size (the probability of rejecting $H_0$ when $H_0$ is true) and power (the probability of rejecting $H_0$ when $H_0$ is false).
   (a) Size: Generate a random sample $x_1, x_2, \ldots, x_n$ of size $n = 50$, with $\mu = 0$ and $\sigma^2 = 1$, from two distributions, (i) normal and (ii) uniform. Pretend you do not know the mean and variance. Then test the null hypothesis that $\mu = 0$ against the alternative that it is not equal to zero. Use Monte Carlo experiments with 500 replications to evaluate the size of the $t$ statistic. Use the asymptotic critical value at the 5% level (1.96). Repeat with $n = 100$ and $200$. What do you find?
   (b) Power: Repeat part (a) with $\mu = -0.1$, $-0.3$, $0.1$, and $0.3$. Plot the power function (include $\mu = 0$ in the plot).
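
The random draws in Problems 1 and 2 and the rejection frequency in Problem 7 can be sketched as follows. This is only an illustrative outline using the Python standard library, not a required solution. It assumes that $\lambda$ is a rate parameter (so the exponential mean is $1/\lambda$) and it covers only the normal case of Problem 7. The function names and the seed are hypothetical choices made for this sketch.

```python
import math
import random


def draw_exponential(lam, rng):
    """Inverse-transform draw: if u ~ U[0,1], then -ln(u)/lam is exponential.

    Assumption: lam is a rate parameter, so the mean is 1/lam.
    """
    u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is finite
    return -math.log(u) / lam


def draw_mixed_normal(rng):
    """Draw from N(-3,1) or N(3,1), each with probability 0.5.

    One component is selected per draw; the two normals are never added.
    """
    center = -3.0 if rng.random() < 0.5 else 3.0
    return rng.gauss(center, 1.0)


def rejection_rate(mu, n, reps, rng, crit=1.96):
    """Share of replications with |t| > crit when testing H0: mu = 0.

    Data are N(mu, 1); mean and variance are estimated in every sample.
    """
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(mu, 1.0) for _ in range(n)]
        xbar = sum(x) / n
        s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
        t_stat = xbar / math.sqrt(s2 / n)
        if abs(t_stat) > crit:
            rejections += 1
    return rejections / reps


rng = random.Random(671)  # arbitrary seed, used only for reproducibility
size_n50 = rejection_rate(mu=0.0, n=50, reps=500, rng=rng)   # empirical size
power_n50 = rejection_rate(mu=0.3, n=50, reps=500, rng=rng)  # one point on the power curve
print(size_n50, power_n50)
```

Calling `rejection_rate` over a grid of `mu` values gives the points of the power function; the uniform case would need draws rescaled to mean zero and variance one.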

Problem Set #2 (Nonlinear Single Equation Models)

1. Consider the following special one-parameter case of the gamma distribution:

   $$f(y) = \frac{y}{\lambda^2} \exp\left(-\frac{y}{\lambda}\right), \quad y > 0, \ \lambda > 0.$$

   For this distribution it can be shown that $E(y) = 2\lambda$ and $V(y) = 2\lambda^2$. Here we introduce regressors and suppose that in the true model the parameter $\lambda$ depends on regressors according to $\lambda_i = \exp(x_i'\beta)/2$. Thus $E(y_i \mid x_i) = \exp(x_i'\beta)$ and $V(y_i \mid x_i) = [\exp(x_i'\beta)]^2/2$. Assume the data are independent over $i$, $x_i$ is nonstochastic, and $\beta = \beta_0$ in the dgp.

   (a) MLE: Show that the log-likelihood function (scaled by $1/n$) for this gamma model is

   $$Q_n(\beta) = \frac{1}{n} \sum_{i=1}^{n} \left[ \ln y_i - 2 x_i'\beta + 2 \ln 2 - 2 y_i \exp(-x_i'\beta) \right].$$

   (b) MM: Show that the conditions $E(y_i \mid x_i) = \exp(x_i'\beta)$ and $V(y_i \mid x_i) = [\exp(x_i'\beta)]^2/2$ imply that

   $$E\left\{ x_i \left[ \left( y_i - \exp(x_i'\beta) \right)^2 - \frac{[\exp(x_i'\beta)]^2}{2} \right] \right\} = 0.$$

   Use the moment condition to form a method of moments estimator $\hat{\beta}$.

   (c) GMM: Suppose we use the moment condition $E\{ x_i [ y_i - \exp(x_i'\beta) ] \} = 0$ in addition to that in part (b). Give the objective function for a GMM estimator of $\beta$.

   (d) An appropriate gamma variate can be generated by using $y = -\lambda \ln u_1 - \lambda \ln u_2$, where $\lambda = \exp(x'\beta)/2$ and $u_1$ and $u_2$ are random draws from $U[0,1]$. Let $x'\beta = \beta_0 + \beta_1 x$. Generate a sample of size $n = 500$ when $\beta_0 = -1.0$, $\beta_1 = 1.0$, and $x \sim N(0,1)$.
      1. Obtain NLS estimates of $\beta_0$ and $\beta_1$ from regression of $y$ on $\exp(\beta_0 + \beta_1 x)$.
      2. Obtain ML estimates of $\beta_0$ and $\beta_1$ from regression of $y$ on $\exp(\beta_0 + \beta_1 x)$.
      3. Obtain GMM estimates of $\beta_0$ and $\beta_1$ from regression of $y$ on $\exp(\beta_0 + \beta_1 x)$.

2. Consider the data set used in Zellner and Revankar (1970). This data set (available in Table F9.2 in Greene) contains 25 observations on transportation equipment manufacturing. The table includes data on output, capital, and labor for 25 U.S. states. Here we consider the Cobb-Douglas production function

   $$Y = A K^{\beta_1} L^{\beta_2}.$$

   Estimate the parameters $A$, $\beta_1$, and $\beta_2$ using the following procedures:

   (a) OLS in logs
   (b) MLE in logs and levels (Gaussian and Poisson, respectively)
   (c) NLS in logs and levels
   (d) GMM in logs and levels
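
For Problem 1(d) of this problem set, the data generation and the ML step can be sketched as below. This is a minimal illustration under stated assumptions, not the intended solution: it uses the gamma log-likelihood from part (a), a hand-written Newton-Raphson iteration for the two-parameter case, and a starting value (log of the sample mean of $y$, zero slope) that is simply a convenient choice. The function names and the seed are hypothetical.

```python
import math
import random


def simulate_gamma(n, b0, b1, rng):
    """Generate (x, y) with x ~ N(0,1) and y = -lam*ln(u1) - lam*ln(u2),
    where lam = exp(b0 + b1*x)/2 and u1, u2 are uniform draws."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        lam = math.exp(b0 + b1 * x) / 2.0
        u1 = 1.0 - rng.random()  # in (0, 1], keeps the log finite
        u2 = 1.0 - rng.random()
        xs.append(x)
        ys.append(-lam * math.log(u1) - lam * math.log(u2))
    return xs, ys


def gamma_mle(xs, ys, iters=50, tol=1e-10):
    """Newton-Raphson on sum_i [ln y_i - 2 x_i'b + 2 ln 2 - 2 y_i exp(-x_i'b)]."""
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0  # assumed starting values
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            w = 2.0 * y * math.exp(-(b0 + b1 * x))
            g0 += w - 2.0          # score with respect to b0
            g1 += (w - 2.0) * x    # score with respect to b1
            h00 -= w               # Hessian entries (negative definite)
            h01 -= w * x
            h11 -= w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det   # first element of H^{-1} g
        d1 = (h00 * g1 - h01 * g0) / det   # second element of H^{-1} g
        b0, b1 = b0 - d0, b1 - d1          # Newton step: b - H^{-1} g
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


rng = random.Random(2024)  # arbitrary seed, used only for reproducibility
xs, ys = simulate_gamma(500, -1.0, 1.0, rng)
b0_hat, b1_hat = gamma_mle(xs, ys)
print(b0_hat, b1_hat)
```

The NLS and GMM estimators would replace the score in `gamma_mle` with their own first-order conditions; in practice a numerical optimizer from a statistics package is the more usual route.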

Problem Set #3 (Limited Dependent Variables)

1. Consider a latent variable modeled by $y^* = x'\beta + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$. Suppose $y^*$ is censored from above so that we observe $y = y^*$ if $y^* < U$ and $y = U$ if $y^* \geq U$, where the upper limit $U$ is a known constant for each individual (i.e., data) and may differ over individuals.
   (a) Give the log-likelihood function for this model (hint: note that this differs from the standard case both owing to the presence of $U$ and because the inequalities are reversed, with $y = y^*$ if $y^* < U$).
   (b) Obtain the expression for the truncated mean $E(y \mid x, y < U)$ (hint: for $z \sim N(0,1)$, we have $E(z \mid z > c) = \phi(c)/[1 - \Phi(c)]$; $E(z \mid z < c) = -E(-z \mid -z > -c)$ and $-z \sim N(0,1)$).
   (c) Hence give Heckman's two-step estimator for this model.
   (d) Obtain the expression for the censored mean $E(y \mid x)$ (hint: an essential part is the answer in part (b)).

2. Given the data set

   y | 1  0  0  1  1  0  0  1  1  1
   x | 9  2  5  4  6  7  3  5  2  6

   estimate a probit model and test the hypothesis that $x$ is not influential in determining the probability that $y$ equals one. Estimate a logit model and test the hypothesis that $x$ is not influential in determining the probability that $y$ equals one.

3. We are interested in ordered models. Our data consist of 250 observations, of which the responses are

   y | 0   1   2   3   4
   n | 50  40  45  80  35

   Using the preceding data, obtain maximum likelihood estimates of the unknown parameters of the model under the normal and logistic distributions (hint: consider the probabilities as the unknown parameters).

4. Deb and Trivedi (2002) modeled the number of outpatient visits to a medical doctor and to all providers using count data models (data available from Cameron and Trivedi). Here we instead intend to model annual health expenditures. The model is a probit regression of DMED, an indicator variable for positive health expenditures, against just one regressor for simplicity, NDISEASE, the number of chronic diseases.

   (a) Obtain the OLS estimate of the slope parameter.
   (b) Obtain the probit estimate of the slope parameter.
   (c) Given part (b), obtain the marginal effect of chronic diseases in three ways: averaged over the sample, evaluated at the sample average of NDISEASE, and as a histogram of the individual marginal effects.
   (d) Obtain the logit estimate of the slope parameter.
   (e) Given part (d), obtain the marginal effect of chronic diseases in four ways: averaged over the sample, evaluated at the sample average of NDISEASE, as a histogram of the individual marginal effects, and evaluated at $\Lambda(x'\beta) = \bar{y}$.
   (f) For the logit model, calculate the proportionate change in the odds ratio when NDISEASE changes.
   (g) Compare the three binary models on the basis of the statistical significance of NDISEASE.
   (h) Compare the three binary models on the basis of the estimated marginal effect.
   (i) Compare the three binary models on the basis of the predicted probabilities.
   (j) Compare the logit and probit models on the basis of the log-likelihood.
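
The logit estimation and the first two marginal-effect calculations asked for above can be illustrated on the ten-observation data set from Problem 2 of this problem set. The sketch below is one possible approach, assuming a hand-coded Newton-Raphson iteration started at zero; the function names are hypothetical, and a packaged logit routine would normally be used instead. No numerical results are asserted here.

```python
import math

# Data from Problem 2: binary outcome y and regressor x.
Y = [1, 0, 0, 1, 1, 0, 0, 1, 1, 1]
X = [9, 2, 5, 4, 6, 7, 3, 5, 2, 6]


def logistic(z):
    """Logistic cdf, Lambda(z)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit_mle(xs, ys, iters=100, tol=1e-10):
    """Newton-Raphson for a logit with an intercept and one slope."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for x, y in zip(xs, ys):
            p = logistic(b0 + b1 * x)
            g0 += y - p            # score with respect to b0
            g1 += (y - p) * x      # score with respect to b1
            v = p * (1.0 - p)
            i00 += v               # information matrix (negative Hessian)
            i01 += v * x
            i11 += v * x * x
        det = i00 * i11 - i01 * i01
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1  # Newton step: b + I^{-1} g
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def log_likelihood(b0, b1, xs, ys):
    """Logit log-likelihood at (b0, b1)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = logistic(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def marginal_effects(b0, b1, xs):
    """Return (average marginal effect, marginal effect at the mean of x)."""
    dens = [logistic(b0 + b1 * x) * (1.0 - logistic(b0 + b1 * x)) for x in xs]
    ame = b1 * sum(dens) / len(dens)
    p_bar = logistic(b0 + b1 * sum(xs) / len(xs))
    mem = b1 * p_bar * (1.0 - p_bar)
    return ame, mem


b0_hat, b1_hat = logit_mle(X, Y)
ame, mem = marginal_effects(b0_hat, b1_hat, X)

# Likelihood ratio statistic for H0: slope = 0 (restricted model: intercept only).
y_bar = sum(Y) / len(Y)
ll_null = log_likelihood(math.log(y_bar / (1.0 - y_bar)), 0.0, X, Y)
lr_stat = 2.0 * (log_likelihood(b0_hat, b1_hat, X, Y) - ll_null)
print(b0_hat, b1_hat, ame, mem, lr_stat)
```

The statistic `lr_stat` would be compared with a $\chi^2(1)$ critical value; a probit version replaces `logistic` with the standard normal cdf and adjusts the score and density terms accordingly.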

Problem Set #4 (Panel Data)

1. In a fixed effects model, there are several ways to control for individual effects that may be correlated with the regressors. Three of them are the within estimator, the least-squares dummy variable (LSDV) estimator, and the first-difference estimator. For these three methods, show that
   (a) the within estimator and the LSDV estimator are equivalent;
   (b) when $T = 2$, the within estimator and the first-difference estimator are equivalent.

2. Koop and Tobias (2004) consider the relationship between wages and education, ability, and family characteristics. Their data (available on the JAE data archive) come in two parts. The first file contains the panel of 17,919 observations on the Person ID and 4 time-varying variables. The second file contains time-invariant variables for the individual or the 2,178 households. See the article for details on the empirical model and data construction. Consider the following regression model:

   $$\ln w_{it} = \alpha + \beta_1 \mathit{educ}_{it} + \beta_2 \mathit{exper}_{it} + \beta_3 \mathit{exper}_{it}^2 + \beta_4 \mathit{broken}_i + \beta_5 \mathit{sibs}_i + \mu_i + \varepsilon_{it},$$

   where $\ln w_{it}$ is the log of hourly wage, $\mathit{educ}_{it}$ is education, $\mathit{exper}_{it}$ is potential experience, $\mathit{broken}_i$ is a dummy variable for living in a broken home, and $\mathit{sibs}_i$ is the number of siblings. Note that the last two variables are time invariant. Using all of the 17,919 observations of the data provided, estimate the following:
   (a) Pooled OLS model
   (b) Random effects model (feasible GLS estimator with the pooled OLS estimator used for the first stage)
   (c) Fixed effects model (within estimator)
   (d) Hausman-Taylor estimator
   (e) Use the Hausman test to test for the difference between the models in (b) and (c)
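
The equivalence in Problem 1(b) of this problem set can be checked numerically before it is proved. The sketch below simulates a two-period panel with a single regressor and an individual effect that is correlated with it, then computes both estimators. It is only a numerical illustration under assumed settings (the data-generating process, the true slope of 1.5, and the function names are all hypothetical), and the first-difference regression is run without an intercept.

```python
import random


def simulate_panel(n, beta, rng):
    """Return n tuples (x1, x2, y1, y2) for T = 2.

    The individual effect alpha enters both x and y, so it is
    correlated with the regressor.
    """
    data = []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)
        x1 = alpha + rng.gauss(0.0, 1.0)
        x2 = alpha + rng.gauss(0.0, 1.0)
        y1 = alpha + beta * x1 + rng.gauss(0.0, 1.0)
        y2 = alpha + beta * x2 + rng.gauss(0.0, 1.0)
        data.append((x1, x2, y1, y2))
    return data


def within_estimator(data):
    """OLS on deviations from individual means."""
    num = den = 0.0
    for x1, x2, y1, y2 in data:
        x_bar = (x1 + x2) / 2.0
        y_bar = (y1 + y2) / 2.0
        for x, y in ((x1, y1), (x2, y2)):
            num += (x - x_bar) * (y - y_bar)
            den += (x - x_bar) ** 2
    return num / den


def first_difference_estimator(data):
    """OLS of (y2 - y1) on (x2 - x1), without an intercept."""
    num = sum((x2 - x1) * (y2 - y1) for x1, x2, y1, y2 in data)
    den = sum((x2 - x1) ** 2 for x1, x2, y1, y2 in data)
    return num / den


rng = random.Random(4)  # arbitrary seed, used only for reproducibility
panel = simulate_panel(500, 1.5, rng)
b_within = within_estimator(panel)
b_fd = first_difference_estimator(panel)
print(b_within, b_fd)  # the two values agree up to floating-point rounding
```

With $T = 2$ each deviation from the individual mean equals plus or minus half the first difference, so the numerator and denominator of the within estimator are each half of the corresponding first-difference sums, and the ratio is unchanged.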