Econometrics Homework

Due Date: March, 24. by

This problem set includes questions for Lectures 1-4, covered before the midterm exam.

Question 1

Let $z$ be a random column vector of size 3: $z = (z_1, z_2, z_3)'$.

(a) Write out $z'z$ and $zz'$ in terms of $z_1$, $z_2$ and $z_3$.
(b) If $z \sim N(0, I_3)$, what is the distribution of $z'z$?
(c) If $z \sim N(0, I_3)$, what is $E(zz')$? What is $E(z'z)$?

Question 2

Given that $X$ is a random vector with 3 elements: $X = (X_1, X_2, X_3)'$. The expectation and the variance matrix are given by

    E(X) = @ Var(X) = @ A A 2 :5 :5 3

Calculate the expectation and variance of each of the following.

(i) $X_1 + X_2$
(ii) $.2X_1 + .5X_2 + .3X_3$
(iii) $3X_2 + X_3$

You can use the formulas without matrices. You may also calculate with the matrix formulas and verify.

Question 3

Determine whether each of the following statements is true or false.

(a) An estimator $\hat\theta$ for $\theta$ is unbiased if $\hat\theta = \theta$ with probability bigger than .5.
(b) If $\hat\theta_n$ is a sequence of estimators for $\theta$ indexed by the sample size $n$, with $\mathrm{Var}(\hat\theta_n) = .2\sigma^2/n$ and $E(\hat\theta_n) = (n-1)\theta/n$, then $\hat\theta_n$ is a consistent estimator for $\theta$.
(c) Now we want to estimate the mean $\mu$ of a symmetric distribution. Suppose both the sample median and the sample mean are unbiased estimators for $\mu$. The variance of the sample mean is $\sigma^2/n$ and the variance of the sample median is about $1.57\sigma^2/n$. Therefore, the sample mean is a more efficient estimator of $\mu$ than the sample median.
(d) If the population of the random variable $X$ follows a normal distribution, the sample mean $\bar X$ also follows a normal distribution exactly.
(e) For a test at the 5% significance level, we reject the null if the p-value is larger than 0.05.

Question 4

You work for the quality control department of a famous brand of canned food. We want to randomly draw a sample of the product and check whether the mean weight is up to standard. The canned food is designed to weigh g. Now you can draw of the cans of a particular production line, and want to test the hypothesis that the population mean weight is g, against the alternative hypothesis that the population mean weight is not g. The sample mean is $\bar X = 99.28$; the sample variance is $s^2 = 5.22$.

(a) What statistic do you use to test the hypothesis? What is its distribution? Should we use an asymptotic approximation?
(b) What is the standard error of the estimator of the mean weight?
(c) What is the t-statistic of your test? What is the p-value of the above test?
(d) Test the above hypothesis at the 5% significance level.
(e) Construct a 95% confidence interval for the mean weight of the can.

Question 5

We would like to investigate whether one's height can influence their income level. Suppose we have a sample of men with their height and income information. The following equation is estimated by OLS:

    $\widehat{Income}_i = 2983.4 + 53.3\,Height_i$,   n = ; $R^2 = .3$

where income is measured in yuan per month and height is measured in meters.

(a) Interpret the slope coefficient. How much does income change for a man cm higher?
(b) Interpret the intercept coefficient. Is it meaningful by itself?
(c) What is the predicted income in this model for a man who is 1.7 m tall?
(d) Can a man of 1.6 m have a higher actual income than a man of 1.7 m?
(e) If I change the unit of income to thousands, so that $I = Income/1000$, how does it change the OLS estimators $b_0$ and $b_1$?
(f) If instead I change the unit of height to centimeters, so that $H = 100 \times Height$, how does it change the OLS estimators $b_0$ and $b_1$? [Hint: for (e) and (f), you may derive the answer from the formulae of the estimators.]
(g) What does $R^2$ mean?

Question 6

Determine if each of the following statements is true or false. (Explanations are not required here, but you should know the reason.)

(1) The Gauss-Markov Theorem states that the OLS estimator is the most efficient estimator among all unbiased estimators under assumptions 1-5.
(2) Suppose the population regression function is $Earnings_i = \beta_0 + \beta_1 edu_i + \varepsilon_i$, where $edu_i$ is the years of education. If $\beta_1 > 0$, it is impossible for a person with more education to have lower earnings than another person with less education.
(3) The larger the error variance $\sigma^2$, the larger the variance of $b$ given $X$ (the values of the regressors), holding other things constant.
(4) In a multiple regression model, the higher the correlation among the regressors, the lower the variance of the corresponding OLS estimators given $X$, holding other things constant.
(5) Suppose income is a regressor used to explain food consumption. If the model holds for all income groups, we can reduce the variance of the estimator of the slope coefficient on income by drawing the sample only from the low-income group.
(6) If the potential labor market experience of an individual is defined by $exper = age - educ - 6$, where $educ$ is the years of education received, then we cannot include all of $age$, $educ$ and $exper$ as regressors in one regression equation.
(7) In an F-test, you reject the null hypothesis when the sample F-statistic falls in either the smallest or the biggest $\alpha/2$ portion of the corresponding F-distribution under the null.
(8) The t-value shown in software output is the t statistic for the null hypothesis of $\beta = .5$.
(9) If the estimated coefficient is $b = 2.8$ and $se(b) = .4$, the t-value shown in the software is 2..
(10) We need the regressors to be uncorrelated with each other in order to satisfy the basic assumptions of the linear regression model.
(11) Adjusted $R^2$ (i.e. $\bar R^2$) may decrease when we add one more regressor to the model.
(12) To test the hypothesis $\beta_2 = \beta_3 = 0$, we can use a t-test.

Question 7

We have data collected from a random sample of 22 home sales from a community in 23. Let Price denote the selling price (in $), BDR denote the number of bedrooms, Bath denote the number of bathrooms, Hsize denote the size of the house (in square feet),
Lsize denote the lot size (in square feet), Age denote the age of the house (in years), and Poor denote the condition of the house, where 1 means the house is in poor condition and 0 means it is not. An estimated regression equation yields

    $\widehat{Price}_i = 9.2 + .485\,BDR_i + 23.4\,Bath_i + .56\,Hsize_i + .2\,Lsize_i + .9\,Age_i - 48.8\,Poor_i$
    $R^2 = .72$,  $n = 22$

(a) Interpret each of the coefficient estimates. Do they make economic sense?
(b) What factors may contribute to the disturbance term?
(c) What is the expected price of a house with 3 bedrooms, 2 bathrooms, 2 square feet large with a square feet of the lot, years old and in good condition?
(d) The number of bedrooms and the house size are likely to be positively correlated. What is the consequence of such correlation?

Question 8

The following regression is estimated as a production function:

    $\ln Q_t = .37 + \underset{(.257)}{.632} \ln K_t + \underset{(.29)}{.452} \ln L_t + e_t$
    $R^2 = .9$,  $n = 5$,  $\mathrm{cov}(b_k, b_l) = .55$
where $Q$ is the output level, $K$ is the amount of capital input and $L$ is the amount of labor input. Standard errors (not variances) are given in parentheses.

(a) Interpret the coefficient estimates.
(b) Is each of the coefficients statistically significant at the 5% level?
(c) Using the information given above, test the following hypotheses separately:
    (i) The elasticity of labor input is .4.
    (ii) The capital and labor elasticities of output are the same ($\beta_K = \beta_L$).
    (iii) The production function exhibits constant returns to scale ($\beta_K + \beta_L = 1$).
What statistics can you use here? What are the degrees of freedom? Can you accept or reject the null hypothesis?

Question 9

This question analyzes the determination of the starting salary of business school master's program graduates in the United States. The following are the summary statistics of the data. Each data point is the average value for a program (school), and there are 47 programs in the data set.

. su asp lnasp tuition lntuition rating gpa acceptance

    Variable     Obs   Mean       Std. Dev.   Min       Max
    asp          47    8445.89    .           6634      732
    lnasp        47    .338       .3277       .255      .58357
    tuition      47    288.79     582.845     9826      37323
    lntuition    47    .24297     .243527     9.92787   .52736
    rating       47    3.74426    .44363      3         4.6
    gpa          47    3.369574   .23625      3.7       3.6
    acceptance   47    3.5949     .4893       9.2       56.6

where asp is the annual starting salary after graduation in US dollars, lnasp = ln(asp), tuition is the annual school tuition fee in US dollars, rating is the employers' assessment of the program, gpa is the undergraduate GPA of the students, and acceptance is the acceptance rate of the master's program in percentage points.

Specification A

. reg lnasp lntuition rating

    Source      SS           df   MS           Number of obs =  47
                                               F( 2, 44)     =  9.37
    Model       .62864263    2    .343235      Prob > F      =  .
    Residual    .536368      44   .34484       R-squared     =  .859
                                               Adj R-squared =  .797
    Total       .7863        46   .6956659     Root MSE      =  .5865

    lnasp       Coef.     Std. Err.   t       P>|t|   [95% Conf. Interval]
    lntuition   .228839   .446583     2.75    .9      .32889     .228868
    rating      .2797     .24533      8.86    .       .676937    .26654
    _cons       9.2669    .484897     22.67   .       8.43682    .8333

Specification B
. reg lnasp lntuition rating gpa acceptance

    Source      SS           df   MS           Number of obs =  47
                                               F( 4, 42)     =  6.37
    Model       .664443427   4    .66857       Prob > F      =  .
    Residual    .5562884     42   .275497      R-squared     =  .858
                                               Adj R-squared =  .8377
    Total       .7863        46   .6956659     Root MSE      =  .5245

    lnasp        Coef.     Std. Err.   t      P>|t|   [95% Conf. Interval]
    lntuition    .99399    .482582     2.5    .46     .759       .965288
    rating       .59375    .283499     5.62   .       .25        .2653
    gpa          .68343    .22594      .67    .58     .38635     .274672
    acceptance   .322      .825        2.96   .5      .53865     .75
    _cons        9.58796   .7572       3.6    .       8.6433     .9

(a) How do you interpret the coefficients for Specification A? Do they make sense?
(b) How do you interpret the coefficients for Specification B? Do they make sense? How do the coefficients on lntuition and rating change?
(c) Test the joint hypothesis that the coefficients on gpa and acceptance are both zero, using the F statistic

    $F = \frac{(RSS_R - RSS_U)/J}{RSS_U/(n-K)}$

Based on this statistic, decide whether the null hypothesis should be rejected at the 5% significance level. What are the degrees of freedom? [Critical values at 5%: $F_{4,42} = 2.59$; $F_{2,42} = 3.22$; $F_{2,5} = 5.79$. Choose the appropriate one.]

Question 10

In a simple regression model,

    $y_i = \beta_0 + \beta_1 x_i + u_i$

all other assumptions hold, but instead we have $E(u_i \mid x_i) = 2$.

(a) What is the conditional expectation $E(y_i \mid x_i)$ in terms of $\beta_0$ and $\beta_1$?
(b) What is the expectation given $X$ of the usual OLS estimators $b$?
(c) If we redefine the model as $y_i = \alpha_0 + \alpha_1 x_i + \varepsilon_i$ so that $E(\varepsilon_i \mid x_i) = 0$, what is the relation between $\alpha_0$ and $\beta_0$? What is the relation between $\alpha_1$ and $\beta_1$?

Question 11 (optional, no need to hand in)

(a) Consider the simplest regression model,

    $y_i = \beta_0 + u_i$

and we maintain all basic assumptions.
    (i) What is $E(y_i)$?
    (ii) What is the OLS estimator for $\beta_0$? Hint: Consider

    $\min_{\hat\beta_0} \sum_{i=1}^{n} (y_i - \hat\beta_0)^2$
and solve for $\hat\beta_0$.

(b) Consider another case: a regression function without an intercept. Suppose we know that the population regression function is

    $y_i = \beta_1 x_i + u_i$

where $\beta_0 = 0$.
    (i) What is $E(y_i \mid x_i)$? What is $E(y_i \mid x_i = 0)$?
    (ii) Derive the OLS estimator for $\beta_1$. That is, minimize the sum of squared residuals

    $\min_{\hat\beta_1} \sum_{i=1}^{n} (y_i - \hat\beta_1 x_i)^2$
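
The minimization in the no-intercept case can also be checked numerically, which may help when verifying a formula derived by hand. The following is a minimal Python sketch under stated assumptions: the data are simulated, the true slope 1.5, the sample size 200 and the noise scale are made-up values, and the helper names `ssr` and `minimize_ssr` are illustrative and not part of the problem set. It searches for the minimizer directly rather than using any closed-form expression.

```python
import random


def ssr(beta, xs, ys):
    """Sum of squared residuals for the no-intercept model y = beta * x + u."""
    return sum((y - beta * x) ** 2 for x, y in zip(xs, ys))


def minimize_ssr(xs, ys, lo=-10.0, hi=10.0, tol=1e-8):
    """Ternary search for the minimizer of ssr on [lo, hi].

    This works because the sum of squared residuals is a convex
    (quadratic) function of beta, so it has a single minimum.
    """
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if ssr(m1, xs, ys) < ssr(m2, xs, ys):
            hi = m2  # the minimum cannot lie to the right of m2
        else:
            lo = m1  # the minimum cannot lie to the left of m1
    return (lo + hi) / 2.0


# Simulated sample: all values below are assumptions chosen for illustration.
random.seed(0)
true_beta = 1.5
xs = [random.uniform(1.0, 5.0) for _ in range(200)]
ys = [true_beta * x + random.gauss(0.0, 1.0) for x in xs]

beta_numeric = minimize_ssr(xs, ys)
print(f"numerical minimizer of the sum of squared residuals: {beta_numeric:.4f}")
```

A formula obtained for part (ii) can be evaluated on the same `xs` and `ys` and compared with `beta_numeric`; the two should agree up to the search tolerance.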