Final Exam. Name:

Instructions: Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each.

HW1. Suppose the classical regression model holds, with β0 = 10, β1 = 1 and σ = 3. Graph the conditional distribution of Y when X = 5. Put numbers on the horizontal axis.

Solution: The conditional distribution is Y | X = 5 ~ N(β0 + β1(5), σ²) = N(15, 9): a normal curve centered at 15 with standard deviation 3, so the horizontal axis should show numbers such as 6, 9, 12, 15, 18, 21, 24. [Graph omitted.]
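A quick R sketch of this graph (the numbers 10, 1, 5, and 3 come from the problem; the plotting choices are only illustrative):

```r
# Under the classical model, Y | X = 5 ~ N(beta0 + beta1*5, sigma^2) = N(15, 9)
b0 <- 10; b1 <- 1; sigma <- 3
mu <- b0 + b1 * 5                        # conditional mean = 15
y <- seq(mu - 4 * sigma, mu + 4 * sigma, length.out = 200)
plot(y, dnorm(y, mean = mu, sd = sigma), type = "l",
     xlab = "y", ylab = "density", main = "p(y | X = 5)")
abline(v = mu, lty = 2)                  # center of the curve at 15
```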
HW2. You can estimate the slope, β1, by using ordinary least squares (OLS), or by using least sum of absolute deviations (LAD). These will be different estimates of the same parameter. You can simulate many data sets according to the classical model and calculate both estimates for each simulated data set. Based on these simulations, how will you know that the OLS estimate of β1 is better than the LAD estimate of β1?

Solution: The histograms of the OLS and LAD estimates that result from the simulations will show that the OLS values tend to be closer to the true value of β1. This will also be confirmed by the fact that the standard deviation of the simulated OLS estimates will be smaller than that of the LAD estimates.
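A simulation along these lines can be sketched in base R. The LAD fit here is done with optim() on the sum of absolute deviations (in practice one would use a dedicated package such as quantreg), and the parameter values 10, 1, 3 are borrowed from HW1:

```r
set.seed(1)
nsim <- 500
x <- rnorm(100)                           # fixed X values across simulations
ols <- lad <- numeric(nsim)
for (i in 1:nsim) {
  y <- 10 + 1 * x + rnorm(100, 0, 3)      # classical model, true beta1 = 1
  ols[i] <- coef(lm(y ~ x))[2]            # OLS slope estimate
  # LAD slope: minimize the sum of absolute deviations numerically
  lad[i] <- optim(c(0, 0), function(b) sum(abs(y - b[1] - b[2] * x)))$par[2]
}
# Under normal errors the OLS estimates cluster more tightly around 1:
c(sd.ols = sd(ols), sd.lad = sd(lad))
```

Histograms of the two sets of estimates (hist(ols); hist(lad)) show the same comparison visually.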
HW3. Suppose you fit a regression model using lm and get the following results.

Call:
lm(formula = Y ~ X)

Residuals:
   Min     1Q Median     3Q    Max
 -2.95  -0.60  0.037   0.72   1.57

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)     2.04      0.095    21.3   <2e-16 ***
X               9.95      0.092   107.6   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.956 on 98 degrees of freedom
Multiple R-squared:  0.9916,    Adjusted R-squared:  0.9915
F-statistic: 1.159e+04 on 1 and 98 DF,  p-value: < 2.2e-16

Give an approximate 95% confidence interval for β1.

Solution: 9.95 ± 2(0.092). Roughly 9.8 < β1 < 10.2.
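The arithmetic, as a short R check (numbers read off the lm output above):

```r
est <- 9.95; se <- 0.092                 # slope estimate and its standard error
ci <- c(lower = est - 2 * se, upper = est + 2 * se)
round(ci, 2)                             # roughly 9.77 to 10.13
```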
Midterm. The Toluca study has Y = workhours and X = lotsize for n = 25 jobs. The standard error of β̂1 is the estimated standard deviation of β̂1. To have a standard deviation of β̂1, there must be many values of β̂1. In the context of the Toluca study, describe what those many values of β̂1 refer to.

Solution: The observed workhours data values (25 of them) are just one sample of potentially observable workhours data values from the conditional distributions specified by the observed lotsize data values. Every other such sample of 25 workhours values, matched with the original lotsize values, will give a different β̂1. The many values of β̂1 refer to these values, one for each such randomly sampled data set.
HW4. Suppose you fit the model ln(Y) = β0 + β1X + ε, and you get the following output from lm.

Call:
lm(formula = lny ~ X)

Residuals:
      Min        1Q    Median        3Q       Max
-0.179915 -0.016900  0.008345  0.024582  0.071882

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.893437   0.004011  970.77   <2e-16 ***
X           0.206713   0.003880   53.28   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.04011 on 98 degrees of freedom
Multiple R-squared:  0.9666,    Adjusted R-squared:  0.9663
F-statistic: 2839 on 1 and 98 DF,  p-value: < 2.2e-16

Give the back-transformed estimated model to predict Y as a function of X.

Solution: The predicted value of ln(Y) is 3.89 + 0.206X. So the predicted value of Y is exp(3.89 + 0.206X).
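In R, the back-transformed prediction can be written directly from the coefficients above:

```r
b0 <- 3.893437; b1 <- 0.206713           # estimates from the lm() output
predict_Y <- function(X) exp(b0 + b1 * X)    # back-transform: Y = exp(lnY-hat)
predict_Y(c(0, 1, 2))                    # predicted Y at X = 0, 1, 2
```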
HW5. Recall the charitable contributions data set. Let Y = ln(charitable.cont) (logarithm of charitable contributions), X1 = DEPS (dependents), X2 = ln(income.agi) (logarithm of adjusted gross income), and consider the classical regression model Y = β0 + β1X1 + β2X2 + ε. Interpret the parameter β1 in this specific context.

Solution: Consider two conditional distributions of Y: (i) where X1 = 2 and X2 = 9; (ii) where X1 = 1 and X2 = 9. The conditional mean of Y in case (i) is β0 + β1(2) + β2(9). The conditional mean of Y in case (ii) is β0 + β1(1) + β2(9). So β1 is the difference in the conditional means of Y (= ln(charitable.cont)) for those two cases. In general, β1 is the increase in the mean of Y associated with one more dependent, holding X2 = ln(income.agi) fixed.
HW6. Consider the R simulation code

X1 = rnorm(100)
X2 = rnorm(100)
Y = 25 + 100*X1 + 50*X2 + rnorm(100, 0, 4)
summary(lm(Y ~ X1 + X2))

There is an F test in the lm output. In the context of this simulation, what hypothesis is being tested by the F test? Is the hypothesis true or false in this case?

Solution: The F test is testing the null (restricted) model in which both β1 and β2 are zero. Here, β1 = 100 and β2 = 50, so the null model is false.
HW7. An estimated conditional mean function using an interaction model is Ŷ = 2 + 3X1 + 1X2 − 4X1X2. Both X1 and X2 are interval variables, both having ranges from 0 to 1. Graph (i) the estimated conditional mean of Y as a function of X1, when X2 = 0, and (ii) the estimated conditional mean of Y as a function of X1, when X2 = 1, on the same axes.

Solution: When X2 = 0, Ŷ = 2 + 3X1 + 1(0) − 4X1(0) = 2 + 3X1. When X2 = 1, Ŷ = 2 + 3X1 + 1(1) − 4X1(1) = 3 − X1. [Graph omitted: the lines 2 + 3X1 and 3 − X1 plotted over 0 ≤ X1 ≤ 1 on the same axes.] (Interaction is apparent in that the effect of X1 on Y depends strongly on the value of X2: when X2 = 0, X1 has a positive effect on Y, and when X2 = 1, X1 has a negative effect on Y.)
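A base-R sketch of the graph, using the two fitted lines derived in the solution (2 + 3X1 for X2 = 0 and 3 − X1 for X2 = 1):

```r
x1 <- seq(0, 1, length.out = 50)
plot(x1, 2 + 3 * x1, type = "l", ylim = c(0, 5),
     xlab = "X1", ylab = "estimated mean of Y")   # X2 = 0: line 2 + 3*X1
lines(x1, 3 - x1, lty = 2)                        # X2 = 1: line 3 - X1
legend("topleft", legend = c("X2 = 0", "X2 = 1"), lty = c(1, 2))
```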
HW8. Consider the data pronunciation data set. The Y variable is age and the X variable is region of the U.S., either West, South, North Central, or East. The region variable is coded using indicator variables and the model is Y = β0 + β1West + β2South + β3NorthCentral + ε. Interpret β0 and β1 in the context of this study.

Solution: Consider the four groups:

West: mean of Y is β0 + β1
South: mean of Y is β0 + β2
North Central: mean of Y is β0 + β3
East: mean of Y is β0

So β0 is the mean of Y in the East, and β1 is the difference between the mean of Y in the West and the mean of Y in the East. (Note: Here the mean of Y is actually the probability of pronouncing day-tuh, since the Y variable is binary.)
HW9. In the data pronunciation data set, the Y variable is Y = 1 if the survey respondent pronounces data as day-tuh, and Y = 0 if the survey respondent pronounces data as daa-tuh. The X variable is the age of the survey respondent. Here are two fitted models.

fit0 = glm(Y ~ 1, family = "binomial")
fit1 = glm(Y ~ X, family = "binomial")

Explain (i) how to get the likelihood ratio chi-square statistic from the log likelihoods of these two fitted models, and (ii) how the degrees of freedom are determined.

Solution: (i) You get the log likelihoods for the fits using logLik(fit0) and logLik(fit1). The chi-square statistic is 2(logLik(fit1)) − 2(logLik(fit0)). (ii) There are two parameters in fit1 (β0 and β1 of the logistic regression model) and one in fit0 (just β0), so the degrees of freedom is 2 − 1 = 1.
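A runnable sketch of (i) and (ii), using simulated binary data in place of the actual survey file (the intercept −3 and slope 0.06 used to simulate are made up):

```r
set.seed(2)
X <- rnorm(200, mean = 40, sd = 12)               # hypothetical ages
Y <- rbinom(200, 1, plogis(-3 + 0.06 * X))        # 0/1 pronunciation outcome
fit0 <- glm(Y ~ 1, family = "binomial")
fit1 <- glm(Y ~ X, family = "binomial")
# (i) chi-square statistic from the two log likelihoods
chisq <- 2 * as.numeric(logLik(fit1)) - 2 * as.numeric(logLik(fit0))
# (ii) df = (2 parameters in fit1) - (1 parameter in fit0) = 1
pval <- pchisq(chisq, df = 1, lower.tail = FALSE)
```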
HW10. When data are count data (like in the financial planners example), why does the classical regression model give a bad estimate of the conditional distribution p(y | x)? Draw a graph or graphs as part of your answer.

Solution: [Graph omitted: the estimated p(y | x) for a 45-year-old female under the classical model, a continuous normal density curve.] This is obviously a bad estimate of p(y | x) because it predicts that non-integer numbers of financial planners (1.4, 0.1, etc.) might be used. It even predicts that negative numbers of financial planners might be used!
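The point of the graph can be sketched with hypothetical numbers (fitted mean 1.1 and residual sd 1.0 are made up): a normal density spreads probability over non-integer and negative counts, while a Poisson pmf does not.

```r
yhat <- 1.1; s <- 1.0                     # hypothetical fitted mean and sd
counts <- 0:6
plot(counts, dpois(counts, lambda = yhat), type = "h",
     xlab = "number of financial planners", ylab = "probability")
curve(dnorm(x, mean = yhat, sd = s), from = -3, to = 6, add = TRUE, lty = 2)
# The dashed normal curve has positive density at y = 1.4, y = 0.1,
# and even at negative y; the Poisson spikes sit only on 0, 1, 2, ...
```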
HW11. Suppose your data look like this:

      Y     X
      0   0.0
      0   1.1
120,000  11.2
 10,003   2.1
      0   5.3
      0   0.0
      0   3.9
990,891   0.8

Briefly state pros and cons of using Poisson regression with these data. Which has more weight here, pro or con? (Recall: pro = benefit, con = disadvantage.)

Solution: Pro: There are 0's in the Y variable, and that is one indication that Poisson might be appropriate. Con: The non-zero numbers should not be too large. Here they are huge, and not capable of being explained by a Poisson model in which there are also 0's. So the con has more weight here, and one should not use Poisson regression.
HW12. You can construct a 90% prediction interval for your Y variable, given X = 10, using quantile regressions. Explain how to do this.

Solution: Run two quantile regressions, one with τ = 0.05 and the other with τ = 0.95. Plug X = 10 into the two estimated linear models to get the endpoints of the interval.
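A base-R sketch of the procedure on simulated data (in practice quantreg::rq(Y ~ X, tau = c(0.05, 0.95)) does this properly; the data-generating numbers here are made up). The quantile "check" loss is minimized numerically at τ = 0.05 and τ = 0.95:

```r
set.seed(3)
X <- runif(200, 0, 20)
Y <- 2 + 3 * X + rnorm(200, 0, 4)
check <- function(u, tau) u * (tau - (u < 0))    # quantile "check" loss
qr_coef <- function(tau)                         # (intercept, slope) at tau
  optim(c(0, 3), function(b) sum(check(Y - b[1] - b[2] * X, tau)))$par
lo <- qr_coef(0.05)
hi <- qr_coef(0.95)
# Plug X = 10 into the two fitted lines to get the 90% prediction interval:
c(lower = lo[1] + lo[2] * 10, upper = hi[1] + hi[2] * 10)
```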
Quiz 1. When is E(Y | X = x) truly equal to β0 + β1x?
A. When there are three or more levels of the X variable
B. When the test for linearity passes (p > .05)
C. When E(Y | X = x) increases as x increases
D. When X and Y are independent (here, E(Y | X = x) = β0 + 0·x.)

Quiz 2. What are the maximum likelihood estimates when you assume a Laplace distribution for p(y | x)?
A. Quantile regression estimates (assuming τ = 0.5)
B. Ordinary least squares estimates
C. Weighted least squares estimates
D. Generalized least squares estimates

Quiz 3. If β̂1 is an unbiased estimator of β1, then
A. β̂1 is a good estimator of β1
B. β̂1 is sometimes equal to β1
C. β̂1 is close to β1
D. The mean of the probability distribution of potentially observable values of β̂1 is exactly equal to β1

Quiz 4. Under the classical regression model, what happens to the confidence interval for E(Y | X = x) when n increases?
A. It gets wider
B. It approaches β0 + β1x ± 0
C. It approaches β0 + β1x ± 1.96σ

Quiz 5. When checking an assumption using a testing (p-value based) method, you
A. reject the assumption when p > .05
B. reject the assumption when p < .05
C. fail to reject the assumption when p < .05
D. fail to accept the assumption when p > .05
Quiz 6. If there is heteroscedasticity, which statement must be true?
A. Var(Y | X = 20) ≠ Var(Y | X = 120)
B. The hypothesis test rejects the homoscedastic model
C. Var(Y | X = 20) is significantly different from Var(Y | X = 120)
D. Var(Y | X = a) is different from Var(Y | X = b), for some values a and b.

Quiz 7. When might you transform Y but not X? (Pick one answer only)
A. When there are outliers in your X data
B. When your conditional Y distribution is normal
C. When E(Y | X = x) is a nonlinear function of x
D. When the Box-Cox method indicates that λ = −1 is a good choice

Quiz 8. When is the transformation 1/Y easy to justify?
A. When the units of measurement of Y are ratio units
B. When the distribution of Y is lognormal
C. When the Box-Cox method indicates that λ = 0 is a good choice
D. When the Box-Cox method indicates that λ = −1 is a good choice

Quiz 9. Suppose the data come from the model Y | X = x ~ N(β0 + β1x, σ²). Given X = x, the best predictor of Y is
A. β0 + β1x
B. β0 + β1x + ε
C. β̂0 + β̂1x
D. β̂0 + β̂1x + ε̂

Quiz 10. Assuming the same model as in Quiz 9 above, σ is the
A. standard deviation of Y
B. standard deviation of X
C. standard deviation of Y when X = 10
D. standard deviation of X when Y = 10

Quiz 11. Which matrix tells you how much sample-to-sample variation there is in the potentially observable values of the OLS estimated regression coefficients?
A. I
B. (X′X)⁻¹
C. σ²I
D. σ²(X′X)⁻¹
Quiz 12. Your regression model is Y = β0 + β1X1 + β2X2 + ε. Lack of multicollinearity in your regression model is indicated when R² is close to 0.0 for which of the following? (Select only one.)
A. lm(Y ~ X1)
B. lm(Y ~ X2)
C. lm(Y ~ X1 + X2)
D. lm(X1 ~ X2)

Quiz 13. Consider the model E(Y | X1 = x1, X2 = x2) = β0 + β1x1 + β2x2. When graphed three-dimensionally, this function looks like
A. A plane
B. A line
C. A twisted plane
D. A bell curve

Quiz 14. Consider a two-way ANOVA with interaction to predict the price of a home as a function of region (A or B) and whether the home has a cellar (Yes or No). The model is

Price = β0 + β1Region.A + β2Cellar.Yes + β3Region.A*Cellar.Yes + ε

Region.A = 1 if the home is in Region A, and Region.A = 0 if the home is in Region B. Cellar.Yes = 1 if the home has a cellar, and Cellar.Yes = 0 if the home does not have a cellar. The mean price of a home in Region B that has a cellar is β0 + β2.

Quiz 15. Two models were analyzed:

fit1 = lm(CHARITY ~ DEPS)
fit2 = lm(CHARITY ~ as.factor(DEPS))

Here, CHARITY is a measure of charitable contributions and DEPS is the number of dependents claimed on a tax form, taking the values 0, 1, 2, 3, 4, 5 and 6 in the data set. Select one of the following choices.
A. The correct functional specification assumption is true in fit1
B. The correct functional specification assumption is true in fit2
C. The normality assumption is true in fit1
D. The normality assumption is true in fit2

Quiz 16. Models to predict Hans's graduate GPA using his GRE quant and verbal scores are

fit0 = lm(gpa ~ 1)
fit1 = lm(gpa ~ GRE.quant)
fit2 = lm(gpa ~ GRE.quant + GRE.verbal)

Which model gives a prediction having least bias?
A. fit0
B. fit1
C. fit2
Quiz 17. Suppose the variance function is Var(Y | X = x) = σ²x. Using this function, the maximum likelihood estimates are weighted least squares estimates, with weights equal to
A. σ
B. σ²
C. 1/x
D. 1/x²

Quiz 18. What is the benefit of using heteroscedasticity-consistent standard errors, as opposed to the ordinary standard errors, when there is heteroscedasticity?
A. The percentage of 95% confidence intervals for β1 that contain β1 becomes closer to 95%
B. The p-value for testing that β1 = 0 becomes exactly correct
C. The linearity assumption becomes approximately valid
D. The R² statistic becomes higher

Quiz 19. If the probability is 0.75, then the odds ratio is
A. 1/4
B. 4.0
C. 1/3
D. 3.0

Quiz 20. In the ordinal logistic regression model, Pr(Y = 1 | X = x) = Λ(α1 − β1x), where Λ is the logistic cdf. Thus, if β1 is negative, then the probability that Y = 1
A. increases as X increases
B. decreases as X increases
C. does not change as X increases

Quiz 21. The negative binomial regression model (NBRM) requires you to estimate one more parameter than does the Poisson regression model (PRM). If you estimate a NBRM when the data generating process is correctly modelled as a PRM, you can expect that
A. Your estimated parameters will have higher variances
B. Your estimated parameters will be biased
C. Your true parameters will have higher variances
D. Your true parameters will be biased

Quiz 22. Give an example of a censored data value.
A. A data value that is known only to be more than 4.5
B. A data value that is an outlier
C. A data value that has been excluded from the study
D. A data value from a lognormal distribution
Quiz 23. The Cox proportional hazards regression model is not fully parametric. What is nonparametric about the model?
A. The mean is a nonparametric function of the X variables
B. The baseline distribution is not a parametric distribution
C. The variance is a nonparametric function of the X variables
D. The hazard function is a nonparametric function of the X variables

Quiz 24. When is an outlier in X space a serious problem?
A. When you use OLS estimates
B. When you use ML estimates
C. When there is also a large absolute standardized residual
D. When there is also a small Cook's D statistic

Quiz 25. Consider the classical regression model Y | X = x ~ N(β0 + β1x, σ²). Assuming this model is true, the function that relates the 0.975 quantile of the distribution of Y to X = x is
A. y0.975 = β0 + β1x
B. y0.975 = β0 + β1x − 1.96σ
C. y0.975 = β0 + β1x + 1.96σ
D. y0.975 = β0 + β1x + 0.975

Quiz 26. Neural networks have what advantage over the classical linear regression model?
A. They are more easily interpreted
B. They allow nonlinear conditional mean functions with interactions
C. They allow non-normal distributions
D. They allow heteroscedasticity

Quiz 27. Which is called p-hacking?
A. Trying different Winsorizing thresholds (e.g., 95%, 98%, 99%, 99.5%, etc.) until your desired result becomes statistically significant.
B. Trying different models (e.g., lm(Y ~ X1 + X2), lm(Y ~ X1 + X3), lm(Y ~ X1 + X2 + X3), lm(Y ~ X1 + X2 + X3 + X4)) until you get a p-value for X1 that is less than 0.05.
C. Trying models where you violate the inclusion principle in the hope of proving your desired conclusion.
D. All of the above.