Exercises for the course of Econometrics

Introduction

1. (*) A researcher is using data for a sample of 30 observations to investigate the relationship between a dependent variable $y_i$ and an independent variable $x_i$. Preliminary analysis of the sample data produces the following information:

$$\sum_{i=1}^{N} y_i = 7, \qquad \sum_{i=1}^{N} x_i y_i = 11, \qquad \sum_{i=1}^{N} x_i = 1, \qquad \sum_{i=1}^{N} y_i^2 = 55, \qquad \sum_{i=1}^{N} x_i^2 = 3,$$

with $N = 30$. Use the above sample information to answer all the following questions. Show explicitly all formulas and calculations.
   (a) Compute the OLS estimates of the intercept parameter $\beta_1$ and the slope parameter $\beta_2$.
   (b) Compute the value of the R-squared, the coefficient of determination for the estimated OLS sample regression equation. Briefly interpret what the calculated value of this R-squared means.
   (c) Calculate the estimates of the variance and the standard error of the estimators of $\beta_1$ and $\beta_2$.

2. Let child denote the number of children ever born to a woman, and let educ denote years of education for the woman. A simple model relating fertility to years of education is

$$\text{child}_i = \beta_1 + \beta_2\,\text{educ}_i + u_i,$$

where $u_i$ is the unobserved error term.
   (a) What kinds of factors are contained in $u_i$? Are these likely to be correlated with the level of education?
   (b) In your opinion, will a simple regression analysis uncover the causal effect of education on fertility? Explain.

3. Consider the simple regression model. Does a change in the unit of measurement for the dependent variable modify the parameter estimates or the R-squared? What about a change in the unit of measurement for the regressors? Explain and justify.

4. In the wage equation

$$\log(\text{wage}) = \beta_1 + \beta_2\,\text{educ} + u,$$

if wage is measured in cents rather than in dollars, what difference does it make to the equation? Justify.

5. Suppose that the constant is equal to zero, i.e., $\beta_1 = 0$. Then compute the OLS estimator of the slope coefficient $\beta_2$ and its variance.

Chapter 1

1. The variable rdintens is expenditures on research and development (R&D) as a percentage of sales. The variable sales is total sales measured in millions of dollars. The variable profmarg is profits as a percentage of sales. Using a data set for 32 firms in the chemical industry, the following equation is estimated:

$$\widehat{\text{rdintens}} = \underset{(1.369)}{0.472} + \underset{(0.216)}{0.321}\,\log(\text{sales}) + \underset{(0.046)}{0.050}\,\text{profmarg}, \qquad N = 32, \quad R^2 = 0.099.$$

   (a) Interpret the coefficient on log(sales). In particular, if sales increases by 10%, what is the estimated percentage point change in rdintens? Is this an economically large effect?
   (b) Test the hypothesis that R&D intensity does not change with sales, against the alternative that it does change with sales. Do the test at the 5% and 10% levels.
   (c) Does profmarg have a statistically significant effect on rdintens?

2. Regression analysis can be used to test whether the market efficiently uses information in valuing stocks. For concreteness, let return be the total return from holding a firm's stock over the four-year period from the end of 1990 to the end of 1994. The efficient markets hypothesis says that these returns should not be systematically related to information known in 1990. If firm characteristics known at the beginning of the period help to predict stock returns, then we could use this information in choosing stocks. For 1990, let dkr be a firm's debt to capital ratio, let eps denote the earnings per share, let (log)netinc denote net income, and let (log)salary denote total compensation for the CEO.
   (a) The following equation was estimated:

$$\widehat{\text{return}} = \underset{(29.30)}{40.44} + \underset{(0.854)}{0.952}\,\text{dkr} + \underset{(0.332)}{0.472}\,\text{eps} - \underset{(0.020)}{0.025}\,\text{netinc} - \underset{(0.009)}{0.003}\,\text{salary}, \qquad N = 142, \quad R^2 = 0.0285.$$

   Test whether the explanatory variables are jointly significant at the 5% level. Is any explanatory variable individually significant?
   (b) Now reestimate the model using the log form for netinc and salary:

$$\widehat{\text{return}} = \underset{(164.66)}{69.12} + \underset{(0.847)}{1.056}\,\text{dkr} + \underset{(0.336)}{0.586}\,\text{eps} - \underset{(14.16)}{31.18}\,\log(\text{netinc}) + \underset{(26.40)}{39.26}\,\log(\text{salary}), \qquad N = 142, \quad R^2 = 0.0531.$$

   Do any of your conclusions from part (a) change?
   (c) Overall, is the evidence for predictability of stock returns strong or weak?

3. (*) The following equation was estimated:

$$\widehat{\text{sleep}} = \underset{(112.28)}{3{,}638.25} - \underset{(0.017)}{0.148}\,\text{totwork} - \underset{(5.88)}{11.13}\,\text{educ} + \underset{(1.45)}{2.20}\,\text{age}, \qquad N = 706, \quad R^2 = 0.113,$$

where sleep is the number of sleeping hours over the year, totwork is the number of working hours, educ is education in years, and age is age in years.
   (a) Is either educ or age individually significant at the 5% level against a two-sided alternative? Show your work.
   (b) Dropping educ and age from the equation gives

$$\widehat{\text{sleep}} = \underset{(112.28)}{3{,}638.25} - \underset{(0.017)}{0.151}\,\text{totwork}, \qquad N = 706, \quad R^2 = 0.103.$$

   Are educ and age jointly significant in the original equation at the 5% level? Justify your answer.
   (c) Does including educ and age in the model greatly affect the estimated tradeoff between sleeping and working?
   (d) Suppose that the sleep equation contains heteroskedasticity. What does this mean about the tests computed in parts (a) and (b)?

4. (*) In demand analysis, household demands are usually supposed to be homogeneous of degree zero. That is to say, the demanded quantities do not change when prices and income vary in the same proportion. Suppose that there are only three goods, and consider the following equation for the demand of good 1:

$$\log Q_1 = \beta_1 + \beta_2 \log P_1 + \beta_3 \log P_2 + \beta_4 \log P_3 + \beta_5 \log Y + u,$$

where $Q_1$ is the demanded quantity of good 1, $P_1$, $P_2$, $P_3$ are the prices of goods 1, 2 and 3, and $Y$ is the household income. How can the homogeneity restriction be imposed in the model? How can this restriction be empirically tested?

5. (*) Define the projection matrix as $P = X(X'X)^{-1}X'$ and the annihilator matrix as $M = I - X(X'X)^{-1}X'$. Show the following.
   (a) $P$ and $M$ are symmetric and idempotent.
   (b) $PX = X$, $MX = 0$.
   (c) $\hat{y} = Py$, $\hat{u} = My = Mu$.
   (d) $\hat{u}'\hat{u} = u'Mu$.
   (e) $R_u^2 = \dfrac{y'Py}{y'y}$.

6. Prove that the R-squared is the square of the simple correlation coefficient between $y$ and $\hat{y}$, where $\hat{y} = X(X'X)^{-1}X'y$.
7. Prove that if a regression is fitted without a constant term, the residuals will not necessarily sum to zero, and the R-squared, if calculated as $1 - \hat{u}'\hat{u}/(y'y - N\bar{y}^2)$, may be negative.

8. Prove that the adjusted R-squared increases with the addition of a variable only if the F statistic for testing the significance of that variable exceeds unity.

9. Under the Gauss-Markov hypothesis, does there exist a linear, but not necessarily unbiased, estimator of $\beta$ that has a variance smaller than that of the OLS estimator? If so, how small can the variance be? Justify.

10. (*) In the restricted OLS, the sum of squared residuals is minimized subject to the constraint implied by the null hypothesis $R\beta = r$. Form the Lagrangian as:

$$\min_{b}\; SSR(b) = \frac{1}{2}(y - Xb)'(y - Xb) + (Rb - r)'\lambda,$$

where $\lambda$ is the $Q$-dimensional multiplier. The restricted least squares estimator $\hat\beta_R$ is the solution to the minimization problem.
   (a) Show that

$$\hat\beta_R = \hat\beta - (X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}(R\hat\beta - r), \qquad \lambda = \left[R(X'X)^{-1}R'\right]^{-1}(R\hat\beta - r),$$

   where $\hat\beta = (X'X)^{-1}X'y$ is the unrestricted least squares estimator.
   (b) Let $\hat{u}_R = y - X\hat\beta_R$ be the residuals from the restricted regression. Show that

$$SSR_R - SSR_U = (\hat\beta_R - \hat\beta)'(X'X)(\hat\beta_R - \hat\beta) = (R\hat\beta - r)'\left[R(X'X)^{-1}R'\right]^{-1}(R\hat\beta - r) = \lambda' R(X'X)^{-1}R'\lambda = \hat{u}_R' P \hat{u}_R,$$

   where $P = X(X'X)^{-1}X'$.
   (c) Verify that you have proved that

$$F = \frac{(SSR_R - SSR_U)/Q}{SSR_U/(N - K)}.$$

11. Criticize the following statement: "It is nonsense to test a hypothesis consisting of a large number of equality restrictions, because the t-test will most likely reject at least some of the restrictions".

12. Verify the identity

$$I(\theta_0) = -E\left[\frac{\partial^2 L(\theta_0)}{\partial\theta\,\partial\theta'}\right]$$

for the linear regression model.

13. (*) Prove that under the usual assumptions $\mathrm{Cov}(\hat{u}, \hat\beta \mid X) = 0$. Explain all the steps. Hint: $\mathrm{Cov}(\hat{u}, \hat\beta \mid X) = E\left[(\hat\beta - \beta)\,(\hat{u} - E(\hat{u} \mid X))' \mid X\right]$, and use Exercise 5.

Chapter 2

1. A sequence of real numbers is a trivial example of a sequence of random variables. Is it true that $\lim_{N\to\infty} z_N = \alpha$ implies $\operatorname{plim}_{N\to\infty} z_N = \alpha$? Hint: If $\lim_{N\to\infty} z_N = \alpha$, then $|z_N - \alpha| < \varepsilon$ for $N$ sufficiently large.

2. Suppose that $\sqrt{N}(\hat\beta - \beta) \xrightarrow{d} N(0, \sigma^2)$. Does it follow that $\operatorname{plim} \hat\beta = \beta$? Give a formal proof using the properties of convergence.

3. (*) Prove that under Assumptions 2.1-2.5, $\operatorname{plim} \hat\sigma^2 = \sigma^2$, where

$$\hat\sigma^2 \equiv \frac{1}{N - K}\sum_{i=1}^{N} \hat{u}_i^2.$$

Hint: $\hat{u}_i = y_i - x_i'\hat\beta = x_i'\beta - x_i'\hat\beta + u_i$.
4. There is no unique way to write the linear hypothesis $R\beta = r$, because for any conformable nonsingular matrix $F$, the same set of restrictions can be represented as $\tilde{R}\beta = \tilde{r}$ with $\tilde{R} = FR$ and $\tilde{r} = Fr$. Does a different choice of $R$ and $r$ affect the asymptotic distribution of the Wald statistic? And its numerical value?

5. (*) Find the 5% critical value of $F(10, \infty)$ and the 5% critical value of $\chi^2(10)$. What is the relationship between the two? Is it intuitive? Then prove that

$$\frac{SSR_R - SSR_U}{\hat\sigma^2} \xrightarrow{d} \chi^2(Q).$$

6. Let savings denote the amount saved by a household over the year, and let income denote its total income for the same period. Consider the savings function:

$$\text{savings} = \beta_1 + \beta_2\,\text{income} + u, \qquad u = \sqrt{\text{income}}\;\varepsilon,$$

where $\varepsilon$ is a random variable with $E(\varepsilon) = 0$ and $\mathrm{var}(\varepsilon) = \sigma^2$. Assume that $\varepsilon$ is independent of income.
   (a) Show that $E(u_i \mid \text{income}) = 0$, so that the zero conditional mean assumption is satisfied. Show that $\mathrm{var}(u_i \mid \text{income}) = \sigma^2\,\text{income}$, so that the homoskedasticity assumption is violated. Provide a discussion that supports the assumption that the variance of savings increases with family income.
   (b) How can the model be transformed to deal with heteroskedasticity, so that the estimators are efficient?

7. Suppose that the variance matrix of the error term $u$ is heteroskedastic with $\mathrm{var}(u) = \sigma^2 V$. Show that $\hat\beta_{GLS} = \beta + (X'V^{-1}X)^{-1}X'V^{-1}u$. Moreover, under the classical hypothesis and the null hypothesis, show that

$$F = \frac{(R\hat\beta_{GLS} - r)'\left[R(X'V^{-1}X)^{-1}R'\right]^{-1}(R\hat\beta_{GLS} - r)/Q}{\hat{u}'V^{-1}\hat{u}/(N - K)}$$

follows a Fisher distribution with $Q$ and $N - K$ degrees of freedom.
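As a numerical companion to Exercise 6(b) above, the following is a minimal sketch of one possible transformation: dividing the savings equation by $\sqrt{\text{income}}$ and running OLS, without an intercept, on the transformed variables. The function name `wls_sqrt_income` and the sample figures are hypothetical, chosen only for illustration. The sketch assumes the error structure stated in the exercise, $\mathrm{var}(u \mid \text{income}) = \sigma^2\,\text{income}$; it is not meant to replace the derivation asked for.

```python
import math


def wls_sqrt_income(savings, income):
    """Estimate (beta1, beta2) by OLS on the equation divided by sqrt(income).

    Transformed model, which has no intercept:
        savings/sqrt(inc) = beta1 * (1/sqrt(inc)) + beta2 * sqrt(inc) + eps
    Assumes var(u | income) = sigma^2 * income, as stated in Exercise 6.
    """
    # Transformed dependent variable and the two transformed regressors
    ys = [s / math.sqrt(m) for s, m in zip(savings, income)]
    a = [1.0 / math.sqrt(m) for m in income]  # multiplies beta1
    b = [math.sqrt(m) for m in income]        # multiplies beta2

    # Cross-products entering the 2x2 normal equations
    saa = sum(v * v for v in a)
    sbb = sum(v * v for v in b)
    sab = sum(u * v for u, v in zip(a, b))
    say = sum(u * v for u, v in zip(a, ys))
    sby = sum(u * v for u, v in zip(b, ys))

    # Solve the normal equations by Cramer's rule
    det = saa * sbb - sab * sab
    beta1 = (say * sbb - sby * sab) / det
    beta2 = (sby * saa - say * sab) / det
    return beta1, beta2


# Hypothetical noise-free data: savings = 2 + 0.1 * income
income = [10.0, 20.0, 40.0, 80.0]
savings = [2.0 + 0.1 * m for m in income]
print(wls_sqrt_income(savings, income))
```

With noise-free data the function recovers the coefficients used to generate the sample, up to floating-point error. With real data, the estimates generally differ from plain OLS because low-income observations receive more weight.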
Chapter 3

1. Consider a sample of individuals containing information on wages, education, experience and so on. The econometrician selects the subsample of individuals who participate in the labor market (for whom wages can be observed) and, using this sample, regresses the wage on the various explanatory variables. What can you say about the possible bias of the estimators?

2. (*) Consider the following macroeconomic model:

$$\text{consumption function:} \quad C_i = \beta_1 + \beta_2 Y_i + u_i \tag{1}$$

$$\text{GNP identity:} \quad Y_i = C_i + I_i \tag{2}$$

where $Y_i$ is GNP, $C_i$ is aggregate consumption, and $I_i$ is aggregate investment.
   (a) Derive the reduced form of the model.
   (b) Prove that $Y_i$ in (1) is endogenous and suggest a consistent estimator.

3. (*) Consider the simple regression model $y = \beta_1 + \beta_2 x + u$ under the Gauss-Markov assumptions. For some function $g(x)$, for example $g(x) = x^2$ or $g(x) = \log(1 + x^2)$, define $z_i = g(x_i)$. Define a slope estimator as

$$\tilde\beta_2 = \frac{\sum_{i=1}^{N}(z_i - \bar{z})\,y_i}{\sum_{i=1}^{N}(z_i - \bar{z})\,x_i}.$$

   (a) Show that $\tilde\beta_2$ is linear and unbiased. Remember, because $E(u \mid x) = 0$, you can treat both $z_i$ and $x_i$ as nonrandom.
   (b) Add the homoskedasticity assumption. Show that

$$\mathrm{var}(\tilde\beta_2) = \sigma^2\,\frac{\sum_{i=1}^{N}(z_i - \bar{z})^2}{\left[\sum_{i=1}^{N}(z_i - \bar{z})\,x_i\right]^2}.$$
   (c) Show directly that, under the Gauss-Markov assumptions, $\mathrm{var}(\hat\beta_2) \le \mathrm{var}(\tilde\beta_2)$, where $\hat\beta_2$ is the OLS estimator.

4. Show that $E(u^2 \mid z) = \sigma^2$ implies $E(u^2 z'z) = \sigma^2 E(z'z)$.

5. Let $A$ be a $Q \times K$ matrix of full row rank (so $Q \le K$) such that $A\Sigma_{zx}$ is of full column rank. Let $z_i^* = A z_i$ (so $z_i^*$ is a vector of $Q$ transformed instruments).
   (a) Verify that Assumptions 3.3 and 3.4 continue to hold with the set of instruments $z_i^*$.
   (b) Verify that the regression model satisfying Assumptions 3.3 and 3.4 reduces to the traditional regression model if $z_i = x_i$.

6. Show that 2SLS estimators and IV estimators coincide when the model is exactly identified.

Chapter 4

1. Consider the autoregressive process of order 2 [AR(2)] defined as:

$$z_t = \alpha + \phi_1 z_{t-1} + \phi_2 z_{t-2} + \varepsilon_t,$$

where $\varepsilon_t$ is an (independent) white noise process. Suppose that the process is stationary. Then,
   (a) Compute the unconditional mean, the variance and the autocovariances of order 1 and 2 of $z_t$ in the stationary case.
   (b) Derive the conditions on the parameters for stationarity.

2. Consider the simple regression model

$$y_t = \beta_1 + \beta_2 x_t + u_t. \tag{$*$}$$
The error term $u_t$ is heteroskedastic and autocorrelated with $u_{t-1}$ and $u_{t-2}$. More precisely,

$$u_t = \sqrt{\exp(\alpha_1 + \alpha_2 x_t)}\;v_t, \qquad v_t = \rho_1 v_{t-1} + \rho_2 v_{t-2} + \varepsilon_t,$$

where $\varepsilon_t$ is an (independent) white noise and $\alpha_1$, $\alpha_2$, $\rho_1$ and $\rho_2$ are parameters. The stationarity conditions are assumed to be satisfied. Show how to transform the initial equation and how to estimate this model using Generalized Least Squares (note that $\alpha_1$, $\alpha_2$, $\rho_1$ and $\rho_2$ are not known).

3. Consider the following regression model:

$$\log(\text{invpc})_t = \beta_1 + \beta_2 \log(\text{price})_t + \beta_3 t + u_t,$$

where invpc is the real per capita housing investment and price is a housing price index, for which Assumptions 4.1-4.4 are satisfied. The following model is estimated:

$$\log(\text{invpc})_t = \beta_1 + \beta_2 \log(\text{price})_t + u_t.$$

Determine whether the omission of the trend will lead to inconsistent estimators, and characterize the direction of the asymptotic bias as a function of the relationship between price and $t$ and between invpc and $t$.

4. Consider the simple regression model

$$y_t = \beta_1 + \beta_2 x_t + u_t$$

and suppose that the error term can be described by an AR(1) process:

$$u_t = \rho\,u_{t-1} + \varepsilon_t, \qquad |\rho| < 1,$$

where $\varepsilon_t$ is an (independent) white noise.
   (a) Compute the variance of the slope estimator and show that, if the AR(1) coefficient $\rho$ is positive, the OLS variance of the slope estimator tends to underestimate the true variance.
   (b) Consider that the error term can be described by an MA(1) process and show how the conclusions are modified.
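The claim in part (a) can be checked numerically before proving it. The sketch below is a hedged illustration, not a solution. It compares, for a fixed regressor, the variance of the OLS slope implied by stationary AR(1) errors with the usual formula $\sigma_u^2 / \sum_t (x_t - \bar{x})^2$ that ignores autocorrelation. The function name `slope_variances_ar1`, the trending regressor and the value $\rho = 0.5$ are assumptions made for the example. The error variance is treated as known, so the additional bias in its estimate is not covered.

```python
def slope_variances_ar1(x, rho, sigma2_u=1.0):
    """Return (true_var, naive_var) of the OLS slope, conditional on x.

    Assumes u_t is a stationary AR(1) process with Var(u_t) = sigma2_u,
    so that Cov(u_t, u_s) = sigma2_u * rho**|t - s|.
    """
    n = len(x)
    xbar = sum(x) / n
    d = [xi - xbar for xi in x]            # deviations from the mean
    sxx = sum(di * di for di in d)

    # Usual OLS formula, valid only without autocorrelation
    naive_var = sigma2_u / sxx

    # Variance of sum_t d_t u_t / sxx under the AR(1) covariance structure
    quad = sum(
        d[t] * d[s] * sigma2_u * rho ** abs(t - s)
        for t in range(n)
        for s in range(n)
    )
    true_var = quad / sxx ** 2
    return true_var, naive_var


# Hypothetical trending regressor, which is itself positively autocorrelated
x = [float(t) for t in range(1, 11)]
true_var, naive_var = slope_variances_ar1(x, rho=0.5)
print(true_var > naive_var)
```

With `rho=0.0` the two values coincide, which recovers the classical case. The direction of the gap for a positive `rho` depends on the regressor being positively autocorrelated, as it is for the trend used here.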