Chapter 8 Conclusion

Richard Jacobs
5 years ago

Three questions about test scores (score) and student-teacher ratio (str):

a) After controlling for differences in economic characteristics of different districts, does the effect of str on score depend on the fraction of English learners (pctel)?
b) Does this effect depend on str? (Is there a non-linear relationship?)
c) After taking economic factors and nonlinearities into account, what is the estimated effect on score of reducing str?

> teachdata = read.csv("
> attach(teachdata)
> head(teachdata)
  sublunch score str avginc pctel

An economics study should always include a description of the data:

sublunch   percent qualifying for reduced-price lunch
score      average test score
str        student-teacher ratio
avginc     district average income (in $1000s)
pctel      percentage of English learners

It is also common to provide descriptive statistics for the variables.

The variable of interest is str (the policy variable).
There are two measures of the economic background of students: sublunch and avginc.
pctel is also important because of omitted variable bias (O.V.B.).

In a previous lecture, it was argued that avginc might have a non-linear relationship with score:

> plot(avginc, score, xlim = c(5,60), ylim = c(600,710))

[Figure: scatterplot of score against avginc]

What are some ways we can deal with this?

(i) Polynomials:

> avginc2 = avginc^2
> avginc3 = avginc^3
> eqcubic = lm(score ~ avginc + avginc2 + avginc3)
> summary(eqcubic)

Call:
lm(formula = score ~ avginc + avginc2 + avginc3)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  6.001e e < 2e-16 ***
avginc       5.019e e e-08 ***
avginc2      e e *
avginc3      e e
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 416 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 3 and 416 DF, p-value: < 2.2e-16

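Let's plot the cubic regression function, using the estimated coefficients from eqcubic:

> par(new = TRUE)
> b = coef(eqcubic)
> curve(b[1] + b[2]*x + b[3]*x^2 + b[4]*x^3, xlim = c(5,60), ylim = c(600,710), ylab = "", xlab = "", col = 2)

[Figure: scatterplot of score against avginc with the fitted cubic regression function]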

(ii) Logarithms:

> eqlog = lm(score ~ log(avginc))
> summary(eqlog)

Call:
lm(formula = score ~ log(avginc))

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  <2e-16 ***
log(avginc)  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 418 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 1 and 418 DF, p-value: < 2.2e-16

Add this regression to the plot:

> par(new = TRUE)
> curve(coef(eqlog)[1] + coef(eqlog)[2]*log(x), xlim = c(5,60), ylim = c(600,710), ylab = "", xlab = "", col = 3)
> legend("bottomright", c("cubic", "Lin-Log"), lty = 1, col = c(2,3))

[Figure: scatterplot of score against avginc with the cubic and lin-log regression functions]
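One way to read the lin-log specification, as a sketch: write the fitted model as below, where \beta_0 and \beta_1 are generic symbols for the intercept and slope of eqlog (notation assumed here, not taken from the slides). A small proportional change in avginc then shifts score by roughly \beta_1 times that proportion.

```latex
\begin{align*}
\text{score} &= \beta_0 + \beta_1 \ln(\text{avginc}) + u \\
\Delta \text{score} &\approx \beta_1 \, \frac{\Delta \text{avginc}}{\text{avginc}}
\qquad \text{(a 1\% change in avginc gives } \Delta \text{score} \approx 0.01\,\beta_1 \text{)}
\end{align*}
```

This is why the lin-log curve flattens at high incomes: the same dollar increase is a smaller percentage change.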

Do you like the cubic or lin-log model better? What are the advantages/disadvantages? Does heteroskedasticity appear to be present?

We will proceed by using log(avginc). But first, to revise omitted variable bias, let's see what happens if we leave log(avginc) out of the regression.

> eq1 = lm(score ~ str + pctel + sublunch)
> summary(eq1)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  < 2e-16 ***
str          e-05 ***
pctel        ***
sublunch     < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.08 on 416 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 3 and 416 DF, p-value: < 2.2e-16

Now add log(avginc):

> eq2 = lm(score ~ str + pctel + sublunch + log(avginc))
> summary(eq2)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  < 2e-16 ***
str          **
pctel        e-08 ***
sublunch     < 2e-16 ***
log(avginc)  e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 415 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 4 and 415 DF, p-value: < 2.2e-16

How have the results changed? What is going on here?
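A small simulation can make the omitted variable bias mechanism concrete. The sketch below does not use the course data: x1, x2, y and all numeric values are hypothetical stand-ins, with x2 playing the role of a control that is correlated with the regressor of interest.

```r
# Simulated illustration of omitted variable bias.
# All names and numbers are hypothetical, not taken from the course data.
set.seed(1)
n  <- 1000
x1 <- rnorm(n)                         # regressor of interest
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)    # control, correlated with x1
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)   # true coefficient on x1 is 2

short <- lm(y ~ x1)        # leaves x2 out
long  <- lm(y ~ x1 + x2)   # includes x2

b_short <- unname(coef(short)["x1"])
b_long  <- unname(coef(long)["x1"])
print(c(short = b_short, long = b_long))
```

In the short regression the coefficient on x1 also picks up part of the effect of x2, because the two move together; adding x2 brings the estimate back toward the true value.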

Regressor    (1)              (2)              (3)              (4)              (5)              (6)              (7)
str          -1.00** (0.24)   -0.73** (0.23)
str^2
str^3
pctel        ** (0.033)       ** (0.032)
hiel
hiel*str
hiel*str^2
hiel*str^3
sublunch     ** (0.022)       ** (0.030)
log(avginc)                   11.57** (1.74)
Intercept    700.2** (4.7)    658.6** (7.7)
R^2

Let's address (a): After controlling for differences in economic characteristics of different districts, does the effect of str on score depend on the fraction of English learners (pctel)?

An easier way to examine this might be to create a dummy variable. Let's define a new variable (high percentage of English learners):

hiel = 0 for classes with a small percentage of English learners
hiel = 1 for classes with a large percentage of English learners

How should we determine the threshold?

> summary(pctel)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.

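Create hiel (initialised as a vector of zeros with one entry per district, so that no entries are left missing):

hiel = rep(0, length(pctel))
hiel[pctel >= 10] = 1

To address (a), create the interaction term:

hielstr = hiel*str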
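Written out, a regression with the dummy and the interaction term looks as follows. The \beta symbols are notation assumed here for illustration; the slides refer to the terms only by their variable names.

```latex
\begin{align*}
\text{score} &= \beta_0 + \beta_1\,\text{str} + \beta_2\,\text{hiel} + \beta_3\,(\text{hiel}\times\text{str}) + u \\
\frac{\partial\,\text{score}}{\partial\,\text{str}} &=
\begin{cases}
\beta_1 & \text{if hiel} = 0 \\
\beta_1 + \beta_3 & \text{if hiel} = 1
\end{cases}
\end{align*}
```

The dummy alone shifts the intercept between the two groups, while the interaction term lets the slope on str differ between them.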

Try a regression without economic controls:

> eq3 = lm(score ~ str + hiel + hielstr)
> summary(eq3)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  <2e-16 ***
str
hiel
hielstr
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 416 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: 62.4 on 3 and 416 DF, p-value: < 2.2e-16

Which coefficient should we be testing to see if str has a different effect for classes with many English learners? What do we conclude?

In anticipation of (c), let's test if str matters. Does it appear to matter from the results above?

H0: student-teacher ratio has no effect on test scores
H0: the coefficients on str and hielstr are both zero in model (3)

The model under the null hypothesis is:

> eqnul1 = lm(score ~ hiel)
> summary(eqnul1)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  <2e-16 ***
hiel         <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 418 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 1 and 418 DF, p-value: < 2.2e-16

Formula for F-statistic:

$$F = \frac{(R_U^2 - R_R^2)/q}{(1 - R_U^2)/(n - k_U - 1)}$$

Here q = 2 restrictions and n - k_U - 1 = 416, which gives F = 7.57.

Since this is greater than the 5% critical value of 3.00, we reject the null.

Alternatively, use the following R-code to perform the test:

> anova(eq3,eqnul1)
Analysis of Variance Table

Model 1: score ~ str + hiel + hielstr
Model 2: score ~ hiel
  Res.Df RSS Df Sum of Sq F Pr(>F)
                                  ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
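The R-squared form of the F-statistic and the anova() comparison should agree. The sketch below checks this on simulated data: x, d and y are hypothetical stand-ins for str, hiel and score, and the numbers are made up, so the output will not match the slides.

```r
# Manual F-statistic from R-squared values versus anova(), on simulated data.
# x, d, y are hypothetical stand-ins for str, hiel, score.
set.seed(42)
n  <- 420
x  <- rnorm(n, mean = 20, sd = 2)
d  <- rbinom(n, 1, 0.5)
y  <- 700 - 1 * x - 15 * d + rnorm(n, sd = 10)
dx <- d * x                      # interaction term

unres <- lm(y ~ x + d + dx)      # unrestricted model
res   <- lm(y ~ d)               # restricted model: x and dx dropped

r2_u <- summary(unres)$r.squared
r2_r <- summary(res)$r.squared
q    <- 2                        # number of restrictions
k_u  <- 3                        # regressors in the unrestricted model

f_manual <- ((r2_u - r2_r) / q) / ((1 - r2_u) / (n - k_u - 1))
f_anova  <- anova(res, unres)$F[2]
f_crit   <- qf(0.95, df1 = q, df2 = n - k_u - 1)   # 5% critical value

print(c(manual = f_manual, anova = f_anova, critical = f_crit))
```

The two statistics coincide because both models include an intercept and share the same dependent variable; qf() gives the finite-sample critical value, which is close to the large-sample 3.00 used above.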

Let's try a model with economic controls.

> eq4 = lm(score ~ str + hiel + hielstr + sublunch + log(avginc))
> summary(eq4)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  < 2e-16 ***
str
hiel
hielstr
sublunch     < 2e-16 ***
log(avginc)  e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 414 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 5 and 414 DF, p-value: < 2.2e-16

Has the conclusion (about a different effect for classes with many English learners) changed?

Again, let's test the null that str doesn't matter. Restricted model:

> eqnul2 = lm(score ~ hiel + sublunch + log(avginc))
> anova(eq4,eqnul2)
Analysis of Variance Table

Model 1: score ~ str + hiel + hielstr + sublunch + log(avginc)
Model 2: score ~ hiel + sublunch + log(avginc)
  Res.Df RSS Df Sum of Sq F Pr(>F)
                                  **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Regressor    (1)              (2)              (3)              (4)              (5)              (6)              (7)
str          -1.00** (0.24)   -0.73** (0.23)   (0.54)           (0.30)
str^2
str^3
pctel        ** (0.033)       ** (0.032)
hiel                                           5.64 (16.7)      5.50 (9.1)
hiel*str                                       (0.84)           (0.47)
hiel*str^2
hiel*str^3
sublunch     ** (0.022)       ** (0.030)                        ** (0.029)
log(avginc)                   11.57** (1.74)                    12.12** (1.8)
Intercept    700.2** (4.7)    658.6** (7.7)    682.2** (10.5)   653.7** (8.9)
R^2

Now let's address (b): is the relationship between str and score non-linear?

> str2 = str^2
> str3 = str^3
> eq5 = lm(score ~ str + str2 + str3 + hiel + sublunch + log(avginc))
> summary(eq5)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)
str          *
str2         **
str3         **
hiel         e-07 ***
sublunch     < 2e-16 ***
log(avginc)  e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 413 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 6 and 413 DF, p-value: < 2.2e-16
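To see why the cubic terms speak to question (b), it can help to write out the marginal effect of str implied by a cubic specification. The \beta symbols and the dots standing for the control variables are notation assumed here, not taken from the slides.

```latex
\begin{align*}
\text{score} &= \beta_0 + \beta_1\,\text{str} + \beta_2\,\text{str}^2 + \beta_3\,\text{str}^3 + \dots + u \\
\frac{\partial\,\text{score}}{\partial\,\text{str}} &= \beta_1 + 2\beta_2\,\text{str} + 3\beta_3\,\text{str}^2
\end{align*}
```

The effect of str is the same at every class size only if \beta_2 = \beta_3 = 0, so a test of linearity is a joint test of these two restrictions.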

Regressor    (1)              (2)              (3)              (4)              (5)              (6)              (7)
str          -1.00** (0.24)   -0.73** (0.23)   (0.54)           (0.30)           64.33** (25.5)
str^2                                                                            ** (1.29)
str^3                                                                            ** (0.022)
pctel        ** (0.033)       ** (0.032)
hiel                                           5.64 (16.7)      5.50 (9.1)       -5.47** (1.03)
hiel*str                                       (0.84)           (0.47)
hiel*str^2
hiel*str^3
sublunch     ** (0.022)       ** (0.030)                        ** (0.029)       ** (0.028)
log(avginc)                   11.57** (1.74)                    12.12** (1.8)    11.75** (1.7)
Intercept    700.2** (4.7)    658.6** (7.7)    682.2** (10.5)   653.7** (8.9)    (165.8)
R^2

To test the null hypothesis that the relationship between str and score is linear, estimate a restricted model that keeps str but drops str2 and str3, and compare it to model (5):

> eqnul3 = lm(score ~ str + hiel + sublunch + log(avginc))
> anova(eq5,eqnul3)
Analysis of Variance Table

Model 1: score ~ str + str2 + str3 + hiel + sublunch + log(avginc)
Model 2: score ~ str + hiel + sublunch + log(avginc)
  Res.Df RSS Df Sum of Sq F Pr(>F)
                                  ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

What do you conclude? What other way might you try to capture this non-linear effect? How would you test to see if str matters, using model (5)?

Let's reconsider (a) under the cubic specification. We want to know if the effect of str on score is different for classes with a high percentage of English learners. Again, the strategy is:

- have the dummy variable hiel interact with all terms involving str
- this allows for the marginal effect to differ between the two groups
- testing to see if the coefficients on the interaction terms are jointly equal to zero is equivalent to testing that there is no difference between the two groups

Create the new interaction terms:

hielstr2 = hiel*str2
hielstr3 = hiel*str3

Add the interaction terms to model (5):

eq6 = lm(score ~ str + str2 + str3 + hiel + hielstr + hielstr2 + hielstr3 + sublunch + log(avginc))
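In equation form, the fully interacted cubic model and the associated null hypothesis can be sketched as follows. The \beta and \delta symbols, and the dots standing for the economic controls, are notation assumed here for illustration.

```latex
\begin{align*}
\text{score} &= \beta_0 + \beta_1\,\text{str} + \beta_2\,\text{str}^2 + \beta_3\,\text{str}^3 + \delta_0\,\text{hiel} \\
&\quad + \delta_1\,(\text{hiel}\times\text{str}) + \delta_2\,(\text{hiel}\times\text{str}^2) + \delta_3\,(\text{hiel}\times\text{str}^3) + \dots + u \\
H_0 &: \delta_1 = \delta_2 = \delta_3 = 0 \qquad (q = 3)
\end{align*}
```

Under the null the two groups share the same cubic in str and differ only by the intercept shift \delta_0, which is model (5).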

Regressor    (1)              (2)              (3)              (4)              (5)              (6)              (7)
str          -1.00** (0.24)   -0.73** (0.23)   (0.54)           (0.30)           64.33** (25.5)   83.70** (29.69)
str^2                                                                            ** (1.29)        -4.38** (1.51)
str^3                                                                            ** (0.022)       0.075** (0.025)
pctel        ** (0.033)       ** (0.032)
hiel                                           5.64 (16.7)      5.50 (9.1)       -5.47** (1.03)   816.1* (434.61)
hiel*str                                       (0.84)           (0.47)                            * (66.35)
hiel*str^2                                                                                        * (3.35)
hiel*str^3                                                                                        * (0.056)
sublunch     ** (0.022)       ** (0.030)                        ** (0.029)       ** (0.028)       ** (0.029)
log(avginc)                   11.57** (1.74)                    12.12** (1.8)    11.75** (1.7)    11.80** (1.75)
Intercept    700.2** (4.7)    658.6** (7.7)    682.2** (10.5)   653.7** (8.9)    (165.8)          (192.2)
R^2

How do we test (a) using model (6)?

> anova(eq6,eq5)
Analysis of Variance Table

Model 1: score ~ str + str2 + str3 + hiel + hielstr + hielstr2 + hielstr3 + sublunch + log(avginc)
Model 2: score ~ str + str2 + str3 + hiel + sublunch + log(avginc)
  Res.Df RSS Df Sum of Sq F Pr(>F)

So, once again, we can't reject the null that the effect of str on score is the same regardless of the number of English learners. This suggests that the interaction terms are not needed, and model (5) is adequate.

For a final model, let's make sure that our results are invariant to the use of hiel or pctel.

eq7 = lm(score ~ str + str2 + str3 + pctel + sublunch + log(avginc))

Regressor    (1)              (2)              (3)              (4)              (5)              (6)              (7)
str          -1.00** (0.24)   -0.73** (0.23)   (0.54)           (0.30)           64.33** (25.5)   83.70** (29.69)  65.29** (25.48)
str^2                                                                            ** (1.29)        -4.38** (1.51)   -3.47** (1.30)
str^3                                                                            ** (0.022)       0.075** (0.025)  0.060** (0.022)
pctel        ** (0.033)       ** (0.032)                                                                           ** (0.032)
hiel                                           5.64 (16.7)      5.50 (9.1)       -5.47** (1.03)   816.1* (434.61)
hiel*str                                       (0.84)           (0.47)                            * (66.35)
hiel*str^2                                                                                        * (3.35)
hiel*str^3                                                                                        * (0.056)
sublunch     ** (0.022)       ** (0.030)                        ** (0.029)       ** (0.028)       ** (0.029)       ** (0.030)
log(avginc)                   11.57** (1.74)                    12.12** (1.8)    11.75** (1.7)    11.80** (1.75)   11.51** (1.73)
Intercept    700.2** (4.7)    658.6** (7.7)    682.2** (10.5)   653.7** (8.9)    (165.8)          (192.2)          (165.9)
R^2

Summary

(a) Based on hypothesis tests involving models (3), (4) and (6), there doesn't appear to be a substantial difference in the effect of str on score for classes with many English learners.
(b) A hypothesis test involving model (5) indicates the relationship between str and score is non-linear.
(c) Using F-tests, the null hypothesis that str has no effect on score is rejected in all models. (Only one of these F-tests was shown.)

Models (5) and (7) should be our preferred models based on the sequence of testing. Let's use them to provide some policy recommendations.

If str = 20, then reducing str to 18 would improve score by 3.00 using model (5), and 2.93 using model (7).
If str = 22, then reducing str to 20 would improve score by 1.93 (model 5) or 1.90 (model 7).
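The policy effects come from differencing the cubic regression function at two values of str; the intercept and the control variables cancel. As a sketch, the helper below (cubic_effect is a hypothetical name, not part of the course code) applies this to the model (7) coefficients as they appear in the regression table. Those table entries are rounded, so the results only approximate the figures quoted above; with the fitted objects available, differencing predict() output for eq5 or eq7 would avoid the rounding.

```r
# Change in the predicted outcome when x moves from `from` to `to`
# for a cubic b0 + b1*x + b2*x^2 + b3*x^3 (the intercept cancels).
# cubic_effect is a hypothetical helper written for illustration.
cubic_effect <- function(b1, b2, b3, from, to) {
  b1 * (to - from) + b2 * (to^2 - from^2) + b3 * (to^3 - from^3)
}

# Rounded model (7) coefficients on str, str^2, str^3 from the table.
b1 <- 65.29
b2 <- -3.47
b3 <- 0.060

effect_20_18 <- cubic_effect(b1, b2, b3, from = 20, to = 18)
effect_22_20 <- cubic_effect(b1, b2, b3, from = 22, to = 20)
print(round(c(from20to18 = effect_20_18, from22to20 = effect_22_20), 2))
```

Both calls show the same pattern as the summary: a two-student reduction is worth more when starting from str = 20 than from str = 22, which is the non-linearity found in (b).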
