Chapter 8. R-squared, Adjusted R-Squared, the F test, and Multicollinearity


This chapter discusses additional output in the regression analysis, from the context of multiple regression in the classic model. It also discusses multicollinearity, its effects, and remedies.

8.1 The R-squared Statistic

The population R² statistic was introduced in Chapter 6 as Φ = 1 − E{v(X)}/Var(Y), where v(x) is the conditional variance of Y given X = x. This number tells you how well the X variable(s) predict your Y variable. Since the entire focus of this book is on conditional distributions p(y|x), I'd like you to understand the prediction concept in terms of separation of the distributions p(y | X = low) and p(y | X = high).

For example, suppose the true model is Y = 6 + 0.2X + ε, where X ~ N(20, 5²) and Var(ε) = σ². Then Var(Y) = σY² = 1 + σ², and v(x) = σ², implying Φ = 1 − σ²/(1 + σ²) = 1/(1 + σ²). Three cases I'd like you to consider: (i) σ² = 9.0, implying a low Φ = 0.1, (ii) σ² = 1.0, implying a medium value Φ = 0.5, and (iii) σ² = 1/9, implying a high Φ = 0.9.

In all cases, let's say a low value of X is 15.0, one standard deviation below the mean, and a high value of X is 25.0, one standard deviation above the mean. Now, when X = 15, the distribution p(y | X = 15) is the N(6 + 0.2(15) = 9.0, σ²) distribution; and when X = 25, the distribution p(y | X = 25) is the N(6 + 0.2(25) = 11.0, σ²) distribution. Figure 8.1.1 displays these distributions for the three cases above, where the population R² is either 0.1, 0.5, or 0.9 (which happens in this study when σ² is 9.0, 1.0, or 1/9, respectively). Notice that there is greater separation of the distributions p(y|x) when the population R² is higher.
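Here is a minimal R sketch that draws the two conditional densities p(y | X = 15) and p(y | X = 25) for each of the three values of σ² above; the plotting details are illustrative choices, not the code behind Figure 8.1.1.

## Separation of p(y | X = 15) and p(y | X = 25) under the model Y = 6 + 0.2*X + eps
sigma2 = c(9.0, 1.0, 1/9)                    # population R-squared = 0.1, 0.5, 0.9
ylist = seq(4, 16, 0.01)
par(mfrow = c(3, 1), mar = c(4, 4, 2, 1))
for (s2 in sigma2) {
  plot(ylist, dnorm(ylist, mean = 9.0, sd = sqrt(s2)), type = "l", col = "blue",
       xlab = "y", ylab = "density",
       main = paste0("Population R-squared = ", round(1/(1 + s2), 2)))
  points(ylist, dnorm(ylist, mean = 11.0, sd = sqrt(s2)), type = "l", col = "red")
}

The larger the population R², the less the blue (X = 15) and red (X = 25) curves overlap.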

Figure 8.1.1. Separation of distributions p(y | X = low) (left distributions) and p(y | X = high) (right distributions) in cases where the population R² is 0.1 (top panel), 0.5 (middle panel), and 0.9 (bottom panel). In all cases X = low and X = high refer to an X that is either one standard deviation below the mean or one standard deviation above the mean.

In the case of the classic regression model, which is instantiated by Figure 8.1.0, the conditional variance Var(Y | X = x) = v(x) is a constant σ², and does not depend on X = x. Also in the classic regression model, the maximum likelihood estimate of σ² is

σ̂² = SSE/n, where SSE = Σi=1..n (yi − ŷi)²,

the sum of squared vertical deviations from the yi values to the fitted OLS function.

The unconditional variance is Var(Y) = σY², so the population R² statistic in the classic regression model is

Φ = 1 − σ²/σY².

The maximum likelihood estimate of σY² is

σ̂Y² = SST/n, where SST = Σi=1..n (yi − ȳ)²,

the total sum of squared vertical deviations from the yi values to the flat line where y = ȳ. See Figure 8.1.2.

Figure 8.1.2. Scatterplot of n = 4 data points (indicated by X's). The horizontal red line is the y = ȳ line and the diagonal blue line is the least squares line. Vertical deviations from the y = ȳ line are shown in red; SST is the sum of these squared deviations. Vertical deviations from the least squares line are shown in blue; SSE is the sum of these squared deviations. The R² statistic equals 1 − SSE/SST.
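The following small sketch mimics Figure 8.1.2 with four made-up data points: SST comes from the flat line y = ȳ, SSE comes from the least squares line, and R² = 1 − SSE/SST.

## Four hypothetical data points, the y = ybar line, the least squares line, and R-squared
x = c(1, 2, 3, 4); y = c(2.0, 2.5, 4.5, 4.0)
fit = lm(y ~ x)
plot(x, y, pch = 4)                  # points drawn as X's
abline(h = mean(y), col = "red")     # flat line y = ybar (deviations give SST)
abline(fit, col = "blue")            # least squares line (deviations give SSE)
SST = sum((y - mean(y))^2)
SSE = sum(resid(fit)^2)
c(SST = SST, SSE = SSE, R.squared = 1 - SSE/SST)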

Using the maximum likelihood estimates of the conditional and unconditional variance, you get the estimate of the population R-squared statistic,

R² = 1 − (SSE/n)/(SST/n) = 1 − SSE/SST.

Recall Chapter 5, where I compared different transformations of the X variable. The model with the highest maximized log likelihood was the one with the smallest estimated conditional variance SSE/n, hence it was also the model with the smallest SSE, since n is always the same when considering different models for the same data set. Also, SST is always the same when considering different models for the same data set, because SST does not involve the predicted values from the model. Thus, among the different models having different transformed X variables (this discussion refers to X transformations only, not Y transformations), the model with the highest log likelihood corresponds precisely to the model with the highest R² statistic.

While it is mathematically factual that 0 ≤ R² ≤ 1.0, there is no Ugly Rule of Thumb for how large an R² statistic should be to be considered good. Rather, it depends on norms for the given subject area: In finance, any non-zero R² for predicting stock returns is interesting, because the efficient markets hypothesis states that the population R² is zero in this case. In chemical reaction modeling, the outputs are essentially deterministic functions of the inputs, so an R² statistic that is less than 1.0, e.g. 0.99, may not be good enough, because it indicates faulty experimental procedures. With human subjects and models to predict their behavior, the R² statistics are typically less than 0.50 because people are, well, people. We have our own minds, and are not robots that can be pigeon-holed by some regression model.

My advice is to rely less on R², and more on separation of distributions as seen in Figure 8.1.1. When we get to more complex models, the usual R² statistic becomes less interpretable, and in some cases it is non-existent. But you will always have conditional distributions p(y|x), and you can always graph those distributions as shown in Figure 8.1.1 to see how well your X predicts your Y.

8.2 The Adjusted R-Squared Statistic

Recall that, in the classic model, Φ = 1 − σ²/σY², and that the standard R² statistic replaces the two variances with their maximum likelihood estimates. Recall also that maximum likelihood estimates of variance are slightly biased. Replacing the variances with their unbiased estimates gives the adjusted R² statistic:

Ra² = 1 − {SSE/(n − k − 1)}/{SST/(n − 1)}

With a larger number of predictor variables k, the ordinary R² tends to be increasingly biased upward; the adjusted R² statistic is less biased. You can interpret the adjusted R² statistic in the same way as the ordinary one, but note that the adjusted R² statistic can give values less than 0.0, which are clearly bad estimates since the estimand Φ cannot be negative.
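The possibility of a negative adjusted R² is easy to see by simulation. In the sketch below, the response is pure noise, unrelated to three simulated predictors; the ordinary R² is positive by construction, but the adjusted version can easily come out negative (rerun with other seeds to see the variability).

## With pure-noise predictors, adjusted R-squared is often negative
set.seed(1)
n = 20; k = 3
y = rnorm(n)
X1 = rnorm(n); X2 = rnorm(n); X3 = rnorm(n)   # unrelated to y
fit = lm(y ~ X1 + X2 + X3)
SSE = sum(resid(fit)^2); SST = sum((y - mean(y))^2)
c(R.squared = 1 - SSE/SST,
  R.squared.adj = 1 - (SSE/(n - k - 1))/(SST/(n - 1)),
  from.summary = summary(fit)$adj.r.squared)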

The following R code indicates where these statistics are, as well as by-hand calculations of them.

sales = read.table("...")   # file path not shown
attach(sales); Y = NSOLD; X1 = INTRATE^-1; X2 = PPGGAS
n = nrow(sales)
fit = lm(Y ~ X1 + X2); summary(fit)
SST = sum( (Y - mean(Y))^2 )
SSE = sum(fit$residuals^2)
## By-hand calculations of R-squared statistics
R.squared = 1 - SSE/SST
R.squared.adj = 1 - (SSE/(n-3))/(SST/(n-1))
R.squared; R.squared.adj

The summary of the fit shows the following R² and adjusted R² statistics:

Multiple R-squared:  … ,    Adjusted R-squared:  …
F-statistic:  … on 2 and 15 DF,  p-value: 3.50e-08

The by-hand calculations agree:

> R.squared; R.squared.adj
[1] …
[1] …

8.3 The F Test

See the R output a few lines above: underneath the R² statistic is the F-statistic. This statistic is related to the R² statistic in that it is also a function of SST and SSE (see Figure 8.1.2 again). It is given by

F = {(SST − SSE)/k}/{SSE/(n − k − 1)}.

If you add the line ((SST-SSE)/2)/(SSE/(n-3)) to the R code above, you will get the reported F-statistic, although with more decimals. With a little algebra, you can relate this directly to the R² statistic, showing that for fixed k and n, larger R² corresponds to larger F:

F = {(n − k − 1)/k} R²/(1 − R²)

This statistic is used to test the global null hypothesis H0: β1 = β2 = … = βk = 0. In loose words, this hypothesis states that none of the regression variables X1, X2, …, or Xk is related to Y. Under the classic model where H0: β1 = β2 = … = βk = 0 is true, it can be proven mathematically that

F ~ F(k, n − k − 1)
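As a quick check on these formulas (assuming the objects fit, SST, SSE, and n from the sales code above are still in your workspace), the F statistic computed from SST and SSE, the F statistic computed from R², and the F statistic reported by summary() should all agree:

## Three routes to the same F statistic (k = 2 predictors here)
k = 2
F.from.SS = ((SST - SSE)/k) / (SSE/(n - k - 1))
R.squared = 1 - SSE/SST
F.from.R2 = ((n - k - 1)/k) * R.squared/(1 - R.squared)
F.reported = summary(fit)$fstatistic[1]
c(F.from.SS, F.from.R2, F.reported)
1 - pf(F.from.SS, k, n - k - 1)   # the global F test p-value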

In other words, the null distribution of the F statistic is the F distribution with k numerator degrees of freedom and n − k − 1 denominator degrees of freedom. Recall also that the degrees of freedom for error, dfe, was given by dfe = n − k − 1. The numerator degrees of freedom, k, is sometimes called the model degrees of freedom, hence symbolized as dfm, because it represents the flexibility (freedom) of the model. (For example, the quadratic regression model, which has dfm = 2, is more flexible than the linear model, which has dfm = 1.)

When H0: β1 = β2 = … = βk = 0 is true, the theoretical R² statistic is exactly Φ = 0. And when H0 is false you get larger values of R², hence larger F-statistics. Unlike the t-test for testing individual regression coefficients, the p-value for testing H0: β1 = β2 = … = βk = 0 via the F test considers the extreme values of F to be only the large values, not both the large and the small ones: smaller F values are expected under H0.

To understand the F statistic (when it is small and when it is large, its distribution, and the chance-only model where β1 = β2 = … = βk = 0), you should use simulation. (As always!)

Simulation Study to Understand the F Statistic

sales = read.table("...")   # file path not shown
attach(sales); X1 = INTRATE^-1; X2 = PPGGAS
n = nrow(sales)
Y = 5 + 0*X1 + 0*X2 + rnorm(n,0,4)  ## Notice the 0's: The null model is true
fit = lm(Y ~ X1 + X2); summary(fit)

The code above generates data Y that is unrelated to either X1 or X2; in other words, the null hypothesis H0: β1 = β2 = 0 is in fact true. Running the code above gives some particular F statistic (yours will vary by randomness). But to understand the range of possible F values that are explained by chance alone, you need to repeat this simulation many (ideally, infinitely many) times. So let's simulate a bunch of 'em, save the F values, draw their histogram, and overlay the theoretically correct F(dfm, dfe) density.

R Code for Figure 8.3.1

Nsim = 10000
Fsim.null = numeric(Nsim)
Fsim.alt = numeric(Nsim)
for (i in 1:Nsim) {
  Y.null = 5 + 0*X1 + 0*X2 + rnorm(n,0,4)
  Y.alt = 2*X1 + 50*X2 + rnorm(n,0,4)
  fit.null = lm(Y.null ~ X1 + X2)
  fit.alt = lm(Y.alt ~ X1 + X2)
  Fsim.null[i] = summary(fit.null)$fstatistic[1]
  Fsim.alt[i] = summary(fit.alt)$fstatistic[1]
}
par(mfrow=c(3,1))
par(mar=c(4,4,1,1))
hist(Fsim.null, breaks=100, freq=F, main="", xlab="F value")
hist(Fsim.null, breaks=100, freq=F, main="", xlab="F value")
flist = seq(0,15,.01)
fdist = df(flist, 2, 15)
crit = qf(.95, 2, 15)
points(flist, fdist, type="l")
abline(v=crit, col="blue")
hist(Fsim.null, breaks=100, freq=F, main="", xlab="F value")
points(flist, fdist, type="l")
abline(v=crit, col="blue")
hist(Fsim.alt, breaks=100, freq=F, add=T, lty=2, border="red")
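As a follow-up to the simulation (reusing Fsim.null, Fsim.alt, and crit from the code above), you can also check how often the F test rejects at the 0.05 level under each model; under the null model the fraction should be close to 0.05, and under the alternative model it estimates the power of the test.

## Fraction of simulated F statistics exceeding the 0.95 quantile of F(2, 15)
mean(Fsim.null > crit)   # should be near 0.05 under the null model
mean(Fsim.alt > crit)    # estimated power of the global F test under this alternative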

Figure 8.3.1. Top panel: Histogram of 10,000 simulated F statistics under the null model. Middle panel: Same as top panel but with the theoretically correct F(2, 15) distribution overlaid (solid black curve), as well as its 0.95 quantile 3.68 (blue line). Bottom panel: Same as middle panel, but with the histogram of 10,000 simulated F statistics under an alternative model superimposed (red histogram).

As seen in Figure 8.3.1, the observed F statistic from the original data is off the chart, and hence does not appear to be easily explained by the null model where β1 = β2 = 0. The p-value is calculated from the solid F(2, 15) curve shown in the middle and bottom panels

of Figure 8.3.1; it is the area under that curve beyond the observed F statistic, and is calculated in R as 1 - pf(F, 2, 15), where F is the observed statistic, giving 3.50e-08, in agreement with the p-value shown in the lm output above. The conclusion is that the F statistic is not easily explained under the model where β1 = β2 = 0, so it is logical to conclude that (β1 = β2 = 0) is not true; i.e., it is logical to conclude that either β1 ≠ 0, or β2 ≠ 0, or that both β1 and β2 differ from 0.

Be careful, though: the F test is not specific. A significant F test does not tell you that both parameters differ from zero, nor can it identify which parameter differs from zero. It can only tell you that at least one parameter (either β1 or β2) differs from zero.

8.4 Multicollinearity

Multicollinearity (MC) refers to the X variables being collinear to varying degrees. In the case of two X variables, X1 and X2, collinearity means that the two variables are close to linearly related. Perfect multicollinearity means that they are perfectly linearly related. See Figure 8.4.1.

Figure 8.4.1. Left panel: Collinear X variables whose correlation is high, but less than 1.0. Right panel: Perfectly collinear X variables having correlation 1.0.

Often, multicollinearity with just two X variables is called simply "collinearity"; Figure 8.4.1 illustrates the meaning of the term "collinear." With more X variables, it is not so easy to visualize multicollinearity. But if one of the X variables, say Xj, is closely related to all the other X variables via

Xj ≈ a0 + a1X1 + … + aj−1Xj−1 + aj+1Xj+1 + … + akXk,

then there is multicollinearity. And if the "≈" is in fact an "=" in the equation above, then there is a perfect multicollinearity.

A perfect multicollinearity causes the X^T X matrix to be non-invertible, implying that there are no unique least squares estimates. Equations 0 through k shown in Section 7.1 can still be solved for estimates of the β's, but there are infinitely many solutions, so it is unclear what the effects of the individual X variables are.

To understand this infinity of solutions for the estimated β's, consider the case where there is only one X variable. A perfect multicollinearity in this case means that X1 = a0, a constant, so that the X1 column is perfectly related to the intercept column of 1's; i.e., x1 = a0·1. Figure 8.4.2 shows how data might look in this case, where xi = 10 for every i = 1, 2, …, n, and also shows several possible least squares fits, all of which have the same sum of squared errors.

Figure 8.4.2. Non-unique least squares fits, all of which provide the minimum SSE, when the X column of data is perfectly related to the intercept column.

A similar phenomenon happens in the case of two X variables, as shown in the right panel of Figure 8.4.1: there is an infinity of planar functions (review the planar-function figure in Chapter 7) of X1 and X2 that all minimize the SSE.
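You can see the non-invertibility of X^T X directly in R. In the sketch below (the numbers are made up), one column of the design matrix is an exact linear function of another, and solve() refuses to invert X^T X:

## With perfectly collinear columns, t(X) %*% X is singular
x1 = c(1, 2, 3, 4, 5)
x2 = 2*x1 - 1                       # exact linear function of x1
X = cbind(Intercept = 1, x1, x2)    # design matrix
XtX = t(X) %*% X
det(XtX)                            # essentially zero
try(solve(XtX))                     # error: the matrix cannot be inverted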

In R, you will get one of these infinitely many estimated planar functions, but you can't trust the parameter estimates, because, again, they are just one of an infinity of possible estimates. For example, the R code below generates perfectly collinear (X1, X2) data, then generates Y data from these X data that satisfy all regression assumptions.

set.seed(1345)
X1 = rnorm(100)
X2 = 2*X1 - 1
Y = 1 + 2*X1 + 3*X2 + rnorm(100,0,1)
summary(lm(Y ~ X1 + X2))

This code produces the following output:

Call:
lm(formula = Y ~ X1 + X2)

Residuals:
     Min       1Q   Median       3Q      Max
       …        …        …        …        …

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)        …          …       …   <2e-16 ***
X1                 …          …       …   <2e-16 ***
X2                NA         NA      NA       NA
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: … on 98 degrees of freedom
Multiple R-squared:  … ,    Adjusted R-squared:  …
F-statistic: 7888 on 1 and 98 DF,  p-value: < 2.2e-16

Notice the NA for the coefficient of X2. Recognizing that X^T X is not invertible, and hence that there are infinitely many solutions for the estimated β's, R simply assigned β̂2 = 0 and estimated only β1. (While the reported estimate for X1 and the assigned 0 for X2 do not correspond well with the R code, where β1 = 2 and β2 = 3, the estimate actually makes sense when you replace X2 with 2X1 − 1 in the model equation; then you see that the true multiplier of X1 is exactly 2 + 3(2) = 8.) Note also the comment "Coefficients: (1 not defined because of singularities)". In matrix algebra, "singular" means "not invertible"; the comment lets you know that R recognizes that X^T X is not invertible.

To visualize the infinity of solutions for the regression plane in the example above, have a look at the 3-D representation of the data just simulated, in Figure 8.4.3 below. In that graph, there are infinitely many planes that will separate the positive and negative residuals as shown: some are steeper on one side of the vertical sheet where the data lie, and some are steeper on the other side of the sheet.
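A quick check, reusing X1, X2, and Y from the code above, confirms that algebra: substituting X2 = 2X1 − 1 into the true model 1 + 2X1 + 3X2 gives −2 + 8X1, so the one slope that R does estimate should be close to 8 (and the intercept close to −2):

## The lone estimated slope reflects the substituted model Y = -2 + 8*X1 + error
coef(lm(Y ~ X1 + X2))   # X2 coefficient is NA; X1 coefficient is near 8
coef(lm(Y ~ X1))        # the same fit, written with X1 only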

Figure 8.4.3. Three-D scatterplot of data where the X variables are perfectly collinear. The data lie in a vertical sheet above the line of collinearity on the (X1, X2) plane. There are infinitely many planes for which the given blue points are above and the given red points are below.

Intuitively, it makes sense that you cannot estimate the coefficients uniquely when there is perfect multicollinearity. Recall that β2 is the difference between the means of the distributions of potentially observable Y values in two cohorts:

Cohort 1: X1 = x1, X2 = x2
Cohort 2: X1 = x1, X2 = x2 + 1

However, if X2 is perfectly related to X1, it is impossible to increase X2 while leaving X1 fixed.

Hence, with perfectly collinear (X1, X2) variables, it is simply impossible to estimate the effect of larger X2 when X1 is held fixed. See the right panel of Figure 8.4.1: you cannot increase X2 while holding X1 constant.

The intuitive logic that you cannot estimate the effect of increasing X2 while X1 is held constant in the case of perfectly collinear X variables also explains the problem with near-perfect collinearity, as shown in the left panel of Figure 8.4.1. Since the data are so closely related, there is very little variation in X2 when you fix X1, say, by drawing a vertical line over any particular value of X1. Recall also that, to estimate the effect of an X variable on Y, you need variation in that X variable. The relevant variation in the case of multiple regression, where you are estimating the effect of an X variable holding the other variables fixed, is exactly the variation in that X variable when the other variables are fixed. If there is little such variation, as shown in the left panel of Figure 8.4.1, you will get unique estimates of the β's, but they will be relatively imprecise estimates because, again, there is so little relevant variation in the X data.

Therefore, the main problem with multicollinearity is that the β's are (relative to the case where the X variables are unrelated) imprecisely estimated. This imprecision manifests itself in higher standard errors of the estimated β's. There is a simple formula to explain how multicollinearity affects the standard errors of the estimated β's. Recall from Chapter 7, Section 3, that s.e.(β̂j) = σ̂ √cjj, j = 0, 1, 2, …, k. In simple regression, where there is just one X variable, this expression reduces to the form you saw in Chapter 3,

s.e.(β̂1) = σ̂ / {sx √(n − 1)}.    (8.4.1)

Some fairly complicated matrix algebra gives the following representation of the standard errors for the multiple regression case:

s.e.(β̂j) = σ̂ √cjj = [σ̂ / {sxj √(n − 1)}] × {1/(1 − Rj²)}^(1/2).

Here, Rj² is the R-squared statistic that you get by regressing Xj on all the other X variables. Higher Rj² is an indication of more extreme multicollinearity, and of its effect on the precision of the estimate β̂j. Two important special cases are (1) Rj² = 0, in which case the standard error formula for β̂j is exactly as given in the simple regression where there is only one X variable, see equation

(8.4.1) above, and (2) Rj² → 1, in which case the standard error tends to infinity, which is expected because when Xj is increasingly related to the other X variables, there is less and less variation in Xj when all other X variables are held fixed.

The term 1/(1 − Rj²) is called the variance inflation factor because it measures how much larger the variance of β̂j is due to multicollinearity. By the same token, {1/(1 − Rj²)}^(1/2) can be called a standard error inflation factor.

Example: Illustrating the effects of MC in a Simulation Study

## R code to illustrate the effects of MC
## This data set shows what happens with highly MC data. Note that the
## model has a highly significant F statistic (p-value = 2.306e-08),
## but neither X variable is significant via their t statistics. The MC
## between X1 and X2 makes it difficult to assess the effect
## of X1 when X2 is held fixed, and vice versa.
set.seed(1345)
x1 = rep(1:10, each=10)
x2 = x1 + rnorm(100, 0, .05)  # X2 differs from X1 by N(0, 0.05^2) random variation.
## You can see the collinearity in the graph:
plot(x1, x2)
## The true model has beta0 = 7, beta1 = 1, and beta2 = 1,
## with all assumptions satisfied.
y = 7 + x1 + x2 + rnorm(100, 0, 10)
dat.high.mc = data.frame(y, x1, x2)
high.mc = lm(y ~ x1 + x2, data = dat.high.mc)
summary(high.mc)

The output shows:

Call:
lm(formula = y ~ x1 + x2, data = dat.high.mc)

Residuals:
    Min      1Q  Median      3Q     Max
      …       …       …       …       …

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)        …          …       …        …  **
x1                 …          …       …        …
x2                 …          …       …        …
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: … on 97 degrees of freedom
Multiple R-squared:  … ,    Adjusted R-squared:  …
F-statistic:  … on 2 and 97 DF,  p-value: 2.306e-08

In the example above, the Rj² statistics (obtained via summary(lm(x1 ~ x2)) and summary(lm(x2 ~ x1))) are both extremely close to 1.0, implying a very large standard error inflation factor {1/(1 − Rj²)}^(1/2). (In the case of simple regression, the R² statistic is equal to the square of the correlation coefficient, so you get the same Rj² in both regressions. However, with more than two X variables, the Rj² statistics will all be different. Some of the X variables will be more highly related to the others; these are the variables that suffer most from multicollinearity.) Thus, the standard errors of β̂1 and β̂2 are many times larger than they would have been had the variables been uncorrelated. A slight modification of the simulation model that keeps everything else the same (same n, same σ, nearly the same variances of X1 and X2) except with uncorrelated X variables verifies this:

set.seed(1345)
x1 = rep(1:10, each=10)
x2 = rep(1:10, 10)
## You can see the lack of collinearity in the graph:
plot(x1, x2)
## The true model has beta0 = 7, beta1 = 1, and beta2 = 1,
## with all assumptions satisfied.
y = 7 + x1 + x2 + rnorm(100, 0, 10)
dat.no.mc = data.frame(y, x1, x2)
no.mc = lm(y ~ x1 + x2, data = dat.no.mc)
summary(no.mc)

The output shows:

Call:
lm(formula = y ~ x1 + x2, data = dat.no.mc)

Residuals:
    Min      1Q  Median      3Q     Max
      …       …       …       …       …

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)        …          …       …        …  **
x1                 …          …       …        …  **
x2                 …          …       …        …
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: … on 97 degrees of freedom
Multiple R-squared: 0.157,    Adjusted R-squared:  …
F-statistic:  … on 2 and 97 DF,  p-value: …
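To attach numbers to the inflation factors for these two data sets (reusing dat.high.mc and dat.no.mc from the code above), you can compute Rj² and the corresponding variance and standard error inflation factors by hand:

## By-hand variance inflation factors for the two simulated data sets
R2.high = summary(lm(x1 ~ x2, data = dat.high.mc))$r.squared
R2.no = summary(lm(x1 ~ x2, data = dat.no.mc))$r.squared
c(VIF.high = 1/(1 - R2.high), VIF.no = 1/(1 - R2.no))                 # variance inflation factors
c(SEIF.high = sqrt(1/(1 - R2.high)), SEIF.no = sqrt(1/(1 - R2.no)))   # standard error inflation factors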

The Rj² statistics relating X1 to X2 and X2 to X1 are both 0.0 in this second example. Multiplying the standard errors from this no-multicollinearity analysis by the standard error inflation factor from the first example (the one with high multicollinearity) gives values reasonably close to the standard errors in the original analysis with multicollinear variables; the differences are mostly explained by randomness in the estimates σ̂.

Summary of Multicollinearity (MC) and its effects

1. MC exists when the X's are correlated (i.e., almost always). It does not involve the Y's. Existence of MC violates none of the classic model assumptions. (Some books and web documents incorrectly state that there is an assumption of "no MC" in regression analysis.)

2. Greater MC causes larger standard errors of the parameter estimates. This means that your estimates of the parameters tend to be less precise with higher degrees of MC. You will tend to have more insignificant tests and wider confidence intervals in these cases. This happens because when X1 and X2 are closely related, the data cannot isolate the unique effect of X1 on Y, controlling for X2, as precisely as is the case when X1 and X2 are not closely related.

3. The more the MC, the less interpretable are the parameters. In particular, β1 is the effect of varying X1 when the other X's are held fixed. But it becomes difficult to even imagine varying X1 while holding X2 fixed when X1 and X2 are extremely highly correlated.

4. MC almost always exists in observational data, and often exists in experimental data as well. The question is therefore not "is there MC?", but rather "how strong is the MC and what are its effects?" Generally, the higher the correlations among the X's, the greater the degree of MC, and the greater the effects (high parameter standard errors; tenuous parameter interpretation).

5. The extreme case of MC is called perfect MC, and happens when the columns of the X matrix are perfectly linearly dependent, in which case there are no unique least squares estimates. The fact that there are no unique LSEs in this case does not mean you can't proceed; you can still estimate parameters (albeit non-uniquely) and make valid predictions resulting from such estimates. Most computer software allows you to estimate models in this case, but provides a warning message or other unusual output (such as R's NA for some parameter estimates) that you should pay attention to.

6. Regression models that are estimated using MC data can still be useful. There is no absolute requirement that MC be below a certain level. In fact, in some cases it is strongly recommended that highly correlated variables be retained in the model. For

example, in most cases you should include the linear term in a quadratic model, even though the linear and quadratic terms are highly correlated. This is called the "Variable Inclusion Principle"; more on this in the next chapter.

7. It is most important that you simply recognize the effects of multicollinearity, which are (i) high variances of parameter estimates, (ii) tenuous parameter interpretations, and (iii) in the extreme case of perfect multicollinearity, non-existence of unique least squares estimates.

When might MC be a Problem?

It makes no sense to test for MC in the usual hypothesis testing H0 vs. H1 sense. The following are not tests, they are just suggestions, essentially Ugly Rules of Thumb, aimed to help identify when MC might be a problem.

1. When correlations between the X variables are extremely high (e.g., many greater than 0.9) or variance inflation factors are very high (e.g., greater than 9.0, implying a standard error inflation factor greater than 3.0).

2. When variables that are important a priori are insignificant, you might suspect a MC problem (but consider also whether the sample size is simply too small).

What to do about MC?

1. Main Solution: Diagnose the problem/understand its effects. Display the correlation matrix of the X variables and analyze the variance inflation factors. MC always exists to a degree, and need not be removed, especially if MC is not severe; it violates no assumptions. You don't necessarily have to do anything at all about it.

2. In some cases, you can avoid using MC variables. Here are some suggestions. Evaluate them in your particular situation to see if they make sense; every situation is different.

a. Drop less important and/or redundant X variables.

b. Combine X variables into an index. For example, if X1, X2, and X3 are all measuring the same thing, then you might use their sum or average in the model in place of the original three X variables.

c. Use principal components to reduce the dimensionality of the X variables (this is discussed in courses in Multivariate Analysis; maybe also later in this book, I have not yet decided whether to include it).

d. Use common factors (or latent variables) to represent the correlated X variables, and fit a structural equations model relating the response Y to these common factors. This is a somewhat controversial solution because the common factors are unobservable, and therefore cannot be used for prediction. Nevertheless, this model is quite common in behavioral research. It is discussed in courses in Multivariate Analysis, but not here. I will use the related "combine into an index" approach (see 2.b. above) instead, which is very similar to the latent variable-based analysis, in some ways better, and in some ways worse.

e. Use ratios in size-related cases. For example, if you have the two firm-level variables X1 = Total Sales and X2 = Total Assets in your model, they are bound to be highly correlated. So you might use the two variables X1 = (Total Assets)/(Total Sales) and X2 = (Total Sales) (perhaps in log form) in your model instead of the two variables (Total Sales) and (Total Assets).

3. In some cases, you must simply leave multicollinear variables in the model. These cases include:

a. Predictive Multicollinearity: Two variables can be highly correlated, but both are essential for predicting Y. When you leave one or the other out of the model, you get a much poorer model (much lower R²). In the data set Turtles, if you predict a turtle's sex from its length and height, you will find that length and height are highly correlated (R² = 0.97). But you have to include them both in the model, because R²(length, height) = 0.61, whereas R²(length) = 0.31 and R²(height) alone is also much smaller. The scientific conclusion is that turtle sex is more related to turtle shape, a combination of length and height, than it is to either length or height individually. This probably makes sense to a biologist who studies the reproductive biology of turtles.

b. Variable Inclusion Rules: Whenever you include higher order terms in a model, you should also include the implied lower order terms. For example, if you include X² in the model, then you should also include X. But X and X² are highly correlated. Nevertheless, both X and X² should be used in the model, despite the fact that they are highly correlated, for reasons I will give in the next chapter.

c. Research Hypotheses: Your main research hypothesis is to assess the effect of X1, but you recognize that the effect of X1 on Y might be confounded by X2. If this is the case, you are simply stuck with including both X1 and X2 in the model.

4. Other solutions: Redesign the study or collect more data.

a. Selection of levels: If you have the opportunity to select the (X1, X2) values, then you should attempt to do so in a way that makes those variables as uncorrelated as possible. For example, (X1, X2) might refer to two process inputs, each either Low or High, and you should select them in the arrangement (L,L), (L,H), (H,L), (H,H), with equal numbers of runs at each combination, to ensure that X1 and X2 are uncorrelated.

b. Sample size: The main problem resulting from MC is that the standard errors are large. You can always make standard errors smaller by collecting a larger sample size: recall that

s.e.(β̂j) = [σ̂ / {sxj √(n − 1)}] × {1/(1 − Rj²)}^(1/2).

So even if you change nothing else, a larger sample size n will make the standard errors smaller.
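A brief simulation sketch, in the spirit of the earlier examples (the specific numbers here are illustrative), shows the standard error of β̂1 shrinking as n grows, even when the X variables are highly collinear:

## Standard errors shrink with n even under strong multicollinearity
se.of.b1 = function(n) {
  x1 = rnorm(n)
  x2 = x1 + rnorm(n, 0, .05)   # highly collinear with x1
  y = 7 + x1 + x2 + rnorm(n, 0, 10)
  summary(lm(y ~ x1 + x2))$coefficients["x1", "Std. Error"]
}
set.seed(1)
sapply(c(100, 1000, 10000), se.of.b1)   # s.e. of beta1-hat decreases as n increases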
