School of Mathematical Sciences. Question 1

Ariel Houston
6 years ago

School of Mathematical Sciences
MTH5120 Statistical Modelling I
Practical 8 and Assignment 7 Solutions

Question 1

Figure 1: The residual plots do not contradict the model assumptions of normality, constant variance and linearity.

[Minitab output: Regression Analysis: Y versus X1, X2, X3, X4, X5; R-Sq = 91.3%, R-Sq(adj) = 89.9%; coefficient table and analysis of variance (Regression, Residual Error, Total) with sequential sums of squares for X1 to X5]

Comments:

T-tests: The hypotheses are $H_0: \beta_i = 0$, that is, that the coefficient of $x_i$ is zero given all other explanatory variables are in the model. The p-values for testing $\beta_1$ and $\beta_2$ are smaller than 0.01, hence we can reject the null hypothesis for each of these two parameters. The intercept is also highly significant. The p-value for testing $\beta_4$ is 0.063, hence we can reject the null hypothesis at the significance level $\alpha = 0.1$ but not at $\alpha = 0.05$. There is weak evidence against the null hypothesis. The p-values for testing $\beta_3$ and $\beta_5$ are larger than 0.1, hence there is no evidence against the null hypothesis for either $\beta_3$ or $\beta_5$.

F-test: For $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$ versus $H_1: \neg H_0$, $H_0$ can be rejected at $\alpha = 0.001$, that is, the overall regression is highly significant. This means that at least one of the explanatory variables explains a significant part of the variability of the response variable.

The sequential sums of squares show that $SS(\beta_1)$ is much larger than $SS(\beta_2 \mid \beta_1)$, that is, adding $X_2$ to the model given that $X_1$ is already in the model does not increase the regression sum of squares very much. Also $SS(\beta_3 \mid \beta_1, \beta_2)$ is very small, so adding $X_3$ to the model given that $X_1$ and $X_2$ are already there will not increase $SS_R$ by a large amount. Similarly, $SS(\beta_4 \mid \beta_1, \beta_2, \beta_3)$ and $SS(\beta_5 \mid \beta_1, \beta_2, \beta_3, \beta_4)$ are very small. However, their statistical significance still needs to be checked formally.

Figure 2: The matrix plot shows a clear relationship between Y and each of $X_1$ to $X_4$, but not $X_5$. Apart from $X_5$, which seems unrelated to all the other variables, the explanatory variables show some dependencies among themselves (multicollinearity), which may make the model fitting more difficult.
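The sequential sums of squares discussed above are the successive increases in $SS_R$ as $X_1, X_2, \dots$ enter the model in the stated order, and they add up to the full-model $SS_R$. A minimal Python sketch of that decomposition (standard library only), assuming simulated, purely hypothetical data rather than the data analysed in this practical; the helper names `solve`, `regression_ss` and `sequential_ss` are illustrative and are not Minitab functions.

```python
import random


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def regression_ss(columns, y):
    """SS_R = sum of (fitted - ybar)^2 for the least-squares fit of y on an intercept plus `columns`."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Normal equations (X'X) b = X'y; adequate for a small, well-conditioned illustration.
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    ybar = sum(y) / n
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    return sum((f - ybar) ** 2 for f in fitted)


def sequential_ss(columns, y):
    """[SS(b1), SS(b2 | b1), SS(b3 | b1, b2), ...]: increase in SS_R as each column enters in order."""
    result, previous = [], 0.0  # the intercept-only model has SS_R = 0
    for j in range(1, len(columns) + 1):
        current = regression_ss(columns[:j], y)
        result.append(current - previous)
        previous = current
    return result


# Hypothetical simulated data (not the data analysed in this practical).
random.seed(1)
n = 38
x1 = [random.uniform(0, 10) for _ in range(n)]
x2 = [a + random.gauss(0, 2) for a in x1]       # strongly correlated with x1
x3 = [random.uniform(0, 10) for _ in range(n)]  # unrelated to y
y = [1.0 + 2.0 * a + 0.5 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

seq = sequential_ss([x1, x2, x3], y)
print("sequential SS:", [round(s, 2) for s in seq])
print("sum:", round(sum(seq), 2), " full-model SS_R:", round(regression_ss([x1, x2, x3], y), 2))
```

Because the simulated $X_2$ is strongly correlated with $X_1$, $SS(\beta_2 \mid \beta_1)$ comes out small relative to $SS(\beta_1)$, mirroring the pattern described above. Reordering the columns changes the individual sequential sums of squares but not their total.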

Figure 3: The residual plots do not indicate any contradiction to the assumptions of normality, constant variance and linearity.

[Minitab output: Regression Analysis: Y versus X1, X2, X4; R-Sq = 90.2%, R-Sq(adj) = 89.3%; coefficient table and analysis of variance (Regression, Residual Error, Total) with sequential sums of squares for X1, X2, X4]

Comments: We have three predictors in the model. The p-values for testing $\beta_1$ and $\beta_2$ are smaller than 0.01, hence we can reject the null hypothesis for each of these parameters. The intercept is also highly significant. However, the p-value for $\beta_4$ shows that this parameter is only weakly significant. For $H_0: \beta_1 = \beta_2 = \beta_4 = 0$ versus $H_1: \neg H_0$, $H_0$ can be rejected at $\alpha = 0.001$, that is, the overall regression is highly significant. This means that at least one of the three predictors explains a significant part of the variability of the response variable. As before, the sequential sums of squares show that $SS(\beta_1)$ is much larger than $SS(\beta_2 \mid \beta_1)$, that is, adding $X_2$ to the model given that $X_1$ is already in the model does not increase the regression sum of squares very much. $SS(\beta_4 \mid \beta_1, \beta_2)$ is smaller than $SS(\beta_2 \mid \beta_1)$. However, the statistical significance of the parameters still needs to be tested.

Figure 4: The residual plots do not contradict the model assumptions of normality, constant variance and linearity.

[Minitab output: Regression Analysis: Y versus X1, X2; R-Sq = 89.3%, R-Sq(adj) = 88.7%; coefficient table and analysis of variance (Regression, Residual Error, Total) with sequential sums of squares for X1, X2]

The fitted regression equation is $\hat{Y} = -1.30 + 2.45\,X_1 - 0.00782\,X_2$.

Comments: The p-values for testing $\beta_1$ and $\beta_2$ are smaller than 0.01, hence we can reject the null hypothesis for each of these parameters. The intercept is also highly significant. For $H_0: \beta_1 = \beta_2 = 0$ versus $H_1: \neg H_0$, $H_0$ can be rejected at $\alpha = 0.001$, that is, the overall regression is highly significant. This means that at least one of the two explanatory variables explains a significant part of the variability of the response variable. As before, the sequential sums of squares show that $SS(\beta_1)$ is much larger than $SS(\beta_2 \mid \beta_1)$, that is, adding $X_2$ to the model given that $X_1$ is already in the model does not increase the regression sum of squares very much; again, however, the statistical significance of the parameters needs to be tested.
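For each of the three fits, the overall statistic $F = MS_R/MS_E$ can be written in terms of $R^2$ alone: dividing $SS_R/(p-1)$ and $SS_E/(n-p)$ by $SS_T$ gives $F = \dfrac{R^2/(p-1)}{(1-R^2)/(n-p)}$, with $p$ counting the intercept. A minimal Python sketch, assuming $n = 38$ observations (inferred from the 32 residual degrees of freedom of the full model with $p = 6$ used in Question 2) and using the rounded R-Sq values printed above, so the results are approximate reconstructions rather than the exact Minitab values; `overall_f_from_r2` is a hypothetical helper name.

```python
def overall_f_from_r2(r2, n, p):
    """Overall regression F = MS_R / MS_E expressed through R^2 (p counts the intercept)."""
    return (r2 / (p - 1)) / ((1 - r2) / (n - p))


n = 38  # assumption: inferred from the 32 residual degrees of freedom with p = 6
models = {"X1-X5": (0.913, 6), "X1, X2, X4": (0.902, 4), "X1, X2": (0.893, 3)}
for name, (r2, p) in models.items():
    # Compare each value with the F_{p-1, n-p} distribution.
    print(name, round(overall_f_from_r2(r2, n, p), 1))
```

The three values (roughly 67, 104 and 146 on $(5, 32)$, $(3, 34)$ and $(2, 35)$ degrees of freedom) are all very large, which is consistent with the overall regressions being highly significant.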

Question 2

The question is: can we reduce the set of regressors $X_1, X_2, \dots, X_{p-1}$ to, say, $X_1, X_2, \dots, X_{q-1}$ (renumbering if necessary), where $q < p$, by omitting $X_q, X_{q+1}, \dots, X_{p-1}$? This can be done by testing the hypothesis

$H_0: \beta_q = \beta_{q+1} = \dots = \beta_{p-1} = 0$ versus $H_1: \neg H_0$.

The test function is

$F = \dfrac{SS_{extra}/(p-q)}{MS_E} \overset{H_0}{\sim} F_{p-q,\,n-p}$,

where $SS_{extra} = SS_R - SS_R^{red}$; $SS_R$ and $MS_E$ are the sums of squares obtained in the full model, and $SS_R^{red}$ is the regression sum of squares obtained in the reduced model.

(a) Here we compare two models: the full model including all explanatory variables $X_1, X_2, X_3, X_4, X_5$ with the reduced model including only $X_1$, $X_2$ and $X_4$. The number of parameters in the reduced model is $q = 4$ and in the full model is $p = 6$, so $p - q = 2$. The null and alternative hypotheses are $H_0: \beta_3 = \beta_5 = 0$ versus $H_1: \neg H_0$. To obtain the value of the test statistic we calculate $SS_{extra} = SS_R - SS_R^{red} = \dots$ and $F_{obs} = \dots$. Comparing this value with the critical value $F_{\alpha;\,p-q,\,n-p} = F_{0.1;\,2,\,32} = \dots$, we see that there is no evidence to reject the null hypothesis.

(b) Here we compare two models: the full model including all explanatory variables $X_1, X_2, X_3, X_4, X_5$ with the reduced model including only $X_1$ and $X_2$. Here $q = 3$ and $p = 6$, so $p - q = 3$. The null and alternative hypotheses are $H_0: \beta_3 = \beta_4 = \beta_5 = 0$ versus $H_1: \neg H_0$. To obtain the value of the test statistic we calculate $SS_{extra} = SS_R - SS_R^{red} = \dots$ and $F_{obs} = 1.005/MS_E = \dots$. We have $F_{0.1;\,3,\,32} = \dots$ and $F_{0.05;\,3,\,32} = \dots$. Hence we would reject the null hypothesis at the significance level $\alpha = 0.1$ but not at $\alpha = 0.05$. There is weak evidence against the null hypothesis.

It is not very clear which model would be better for fitting the petrol consumption. The $X_4$ predictor is not highly significant when $X_1$ and $X_2$ are in the model; also, there is only weak evidence against $H_0: \beta_3 = \beta_4 = \beta_5 = 0$, while there is no evidence against $H_0: \beta_3 = \beta_5 = 0$.
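Because both models in each comparison share the same $SS_T$, the extra-sum-of-squares statistic can also be written as $F = \dfrac{(R^2_{full} - R^2_{red})/(p-q)}{(1-R^2_{full})/(n-p)}$. A rough Python sketch, assuming $n = 38$ (inferred from the 32 error degrees of freedom) and the rounded R-Sq values 91.3%, 90.2% and 89.3% reported in Question 1; since those values are rounded, the results only approximate $F_{obs}$, and `partial_f_from_r2` is a hypothetical helper, not part of any library.

```python
def partial_f_from_r2(r2_full, r2_reduced, n, p, q):
    """Extra-sum-of-squares F for H0: the p - q omitted coefficients are all zero.

    SS_extra = (R^2_full - R^2_reduced) * SS_T and MS_E = (1 - R^2_full) * SS_T / (n - p),
    so SS_T cancels out of the ratio.
    """
    extra = (r2_full - r2_reduced) / (p - q)
    mse = (1 - r2_full) / (n - p)
    return extra / mse


n, p = 38, 6  # full model X1-X5; n is an assumption inferred from 32 error df
f_a = partial_f_from_r2(0.913, 0.902, n, p, q=4)  # (a) H0: beta3 = beta5 = 0
f_b = partial_f_from_r2(0.913, 0.893, n, p, q=3)  # (b) H0: beta3 = beta4 = beta5 = 0
# Compare with tabulated F_{alpha; p-q, 32} critical values.
print(round(f_a, 2), round(f_b, 2))  # prints 2.02 2.45
```

The approximate statistics are about 2.0 on $(2, 32)$ and 2.5 on $(3, 32)$ degrees of freedom; the decisions still rest on the tabulated critical values quoted in (a) and (b).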
The values of adjusted $R^2$ are not much different (89.3% and 88.7%), and both are high. However, the model with $X_1$ and $X_2$ only is more parsimonious.
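The adjusted $R^2$ penalises each extra parameter through $R^2_{adj} = 1 - (1 - R^2)\dfrac{n-1}{n-p}$, which is why it barely separates the competing models here. A minimal Python check, assuming $n = 38$ (inferred from the 32 residual degrees of freedom of the full model), that reproduces the three R-Sq(adj) values in the Minitab output from the reported R-Sq values; `adjusted_r2` is an illustrative helper name.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p), where p counts the intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - p)


n = 38  # assumption: inferred from the 32 residual degrees of freedom with p = 6
for name, r2, p in [("X1-X5", 0.913, 6), ("X1, X2, X4", 0.902, 4), ("X1, X2", 0.893, 3)]:
    print(name, f"{100 * adjusted_r2(r2, n, p):.1f}%")
```

This prints 89.9%, 89.3% and 88.7%, matching the reported R-Sq(adj) values, so the small gap between the $X_1, X_2, X_4$ and $X_1, X_2$ models is what remains after the penalty for the extra parameter.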
