Statistics 5100 Spring 2018 Exam 1

Directions: You have 60 minutes to complete the exam. Be sure to answer every question, and do not spend too much time on any part of any question. Be concise with all your responses. Partial SAS output and statistical tables are found in an accompanying handout. The point value of each question is given, and the points sum to 100. Good luck!

(Q1) (1 point) What is your name?

Possibly useful formulas for this exam:
$b_1 = \mathrm{Corr}(X,Y)\,\dfrac{SD_Y}{SD_X}$
$b_0 = \bar{Y} - b_1\bar{X}$
$e_i = Y_i - \hat{Y}_i$
conf/pred interval: Estimate ± (Critical Value)(Standard Error of Estimate)
regression equation: $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$
$F = \dfrac{(SSE_{\text{reduced}} - SSE_{\text{full}})/(p - q)}{SSE_{\text{full}}/(n - p)}$, where
  p = # β's in full model (incl. intercept)
  q = # β's in reduced model (incl. intercept)
  n = sample size
$SSR(U \mid V) = SSE(V) - SSE(U, V)$
$R^2 = \dfrac{SS_{\text{model}}}{SS_{\text{total}}} = 1 - \dfrac{SSE}{SS_{\text{total}}}$
$R^2_{adj} = 1 - \dfrac{n-1}{n-p}\,\dfrac{SSE}{SS_{\text{total}}}$, where p = # β's in model (incl. intercept)

Data: This exam uses a single data set involving intelligence quotient (IQ). Data were collected on 38 students at a certain university. The relevant variables are summarized in the table below. Of interest is how the other variables could be used to predict IQ.

Variable    Description
IQ          Full-scale IQ, from Wechsler subtests
Sex         0 = Female, 1 = Male
BrainSize   Total pixel count (in 1000s) of brain in MRI scans
Weight      Body weight, in pounds
Height      Body height, in inches
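A minimal Python sketch of several formulas from the sheet above, for illustration only. The function names and the toy numbers are hypothetical (they are not the IQ data), and the adjusted R² function assumes p counts the β's including the intercept, the same convention the F statistic uses.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Sample correlation Corr(X, Y)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ls_coefficients(x, y):
    """Simple-regression fit: b1 = Corr(X,Y)*SD_Y/SD_X, b0 = Ybar - b1*Xbar."""
    b1 = pearson_r(x, y) * stdev(y) / stdev(x)
    b0 = mean(y) - b1 * mean(x)
    return b0, b1


def residuals(x, y, b0, b1):
    """e_i = Y_i - Yhat_i for the fitted line."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def general_linear_f(sse_reduced, sse_full, n, p, q):
    """F = [(SSE_reduced - SSE_full)/(p - q)] / [SSE_full/(n - p)].

    p and q count the betas (incl. intercept) in the full and reduced models.
    """
    return ((sse_reduced - sse_full) / (p - q)) / (sse_full / (n - p))


def r2_and_adjusted(sse, ss_total, n, p):
    """R^2 and adjusted R^2, assuming p = number of betas incl. intercept."""
    r2 = 1 - sse / ss_total
    r2_adj = 1 - (n - 1) / (n - p) * sse / ss_total
    return r2, r2_adj


# Toy usage with made-up numbers.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = ls_coefficients(x, y)
e = residuals(x, y, b0, b1)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, sum of residuals = {sum(e):.2e}")
```

With an intercept in the model, the least-squares residuals sum to zero (up to floating-point error), which is a quick sanity check on the slope and intercept formulas.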
(Q2) IQ is regressed on Weight, Height, and BrainSize; see the partial output from PROC REG in SAS Output 1.
(a) (7 points) What percentage of the variability in IQ can be explained by its linear relationship with Weight, Height, and BrainSize?
(b) (12 points) Give an appropriate interpretation of the number 0.06788, which is bolded and underlined in the last line of the Parameter Estimates table.
(c) (10 points) Give an appropriate interpretation of the number 0.0016, which is also bolded and underlined in the same table. (Note that you are not asked what you would do with this number or what you might conclude from it, but what the value of the number itself represents.)
(d) (8 points) Comment briefly on what the graphical diagnostics in this output suggest about two model assumptions regarding the error term distribution. For both assumptions, specify (i) the assumption, (ii) the name of the graphical diagnostic, and (iii) what the diagnostic suggests about the assumption here.
1: (i)
   (ii)
   (iii)
2: (i)
   (ii)
   (iii)
(Q3) (8 points) SAS Output 2 reports some numerical diagnostics on the residuals from the model fit in SAS Output 1. For the same two assumptions you referred to in part (d) of Q2 above, specify (i) the name of the numerical diagnostic, (ii) the value of the threshold against which you compare its results (i.e., when would it be significant?), and (iii) what the diagnostic suggests about the assumption here.
1: (i)
   (ii)
   (iii)
2: (i)
   (ii)
   (iii)

(Q4) A researcher wonders whether the effect of BrainSize on IQ is the same for both men and women. (See SAS Output 3, where the plotting symbol is Sex.)
(a) (6 points) Write out the linear regression model (in terms of β's and the variable names) to address this question.
(b) (1 point) Based on this model, write (but do not test) the appropriate null hypothesis to test the researcher's question.

(Q5) A researcher wonders whether the average IQ is different for men and women (see SAS Output 4). One could use a t-test or ANOVA model to consider this potential difference. Instead, the researcher wants to also control for possible effects of the factors Weight, Height, and BrainSize (see SAS Output 5), but with no interactions.
(a) (6 points) Write out the linear regression model (in terms of β's and the variable names) to address this question (including controlling for these other factors).
(b) (1 point) Based on this model, write (but do not test) the appropriate null hypothesis to test the researcher's question.
(Q6) SAS Output 6 provides evidence that multicollinearity is not problematic in the model fit there. For one diagnostic method to assess multicollinearity, identify the following:
(a) (1 point) The name of the diagnostic method
(b) (4 points) The numeric rule (or threshold) the method uses for determining multicollinearity (i.e., when would multicollinearity be called problematic?)
(c) (4 points) How the output in SAS Output 6 compares to this numeric rule (or threshold)

(Q7) A researcher regresses IQ on BrainSize, Sex, and the interaction term BrainSize*Sex. Assume that she has verified that the model assumptions are met. She reports 90% interval estimates for male IQ when BrainSize is 1000; partial results are given below. (No other output is needed to answer these questions.)

Interval Type   Left Endpoint   Right Endpoint
Confidence      108.285         115.455
Prediction      99.5534         124.1866

(a) (6 points) What is the correct interpretation of the confidence interval?
(b) (6 points) What is the point estimate of male IQ when BrainSize is 1000?
(Q8) [This collection of questions does not involve any data or output.] Consider a multiple linear regression model where Y is regressed on X1, X2, and X3.
(a) (6 points) When asked to state the multiple linear regression model, a student wrote the model below. Do you agree? Why or why not?
$$E[Y_i] = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$$
(b) (6 points) Suppose there is multicollinearity involving X1 and X3. State clearly what this means. (You are not asked how you would check for it.)
(c) (6 points) Suppose there is interaction involving X1 and X3. State clearly what this means. (You are not asked how you would check for it.)

(Q9) (1 point) What topic(s) did you study the most that did not appear on this exam?
Output and Tables for Spring 2018 STAT 5100 Exam 1

SAS Output 1

The REG Procedure
Dependent Variable: IQ

Number of Observations Read   38
Number of Observations Used   38

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3        711.15276     237.05092      4.91   0.0061
Error             34       1640.55776      48.25170
Corrected Total   37       2351.71053

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             75.50800         22.09766      3.42     0.0017
Weight       1              0.04012          0.06915      0.58     0.5656
Height       1             -0.53976          0.43146     -1.25     0.2195
BrainSize    1              0.06788          0.01977      3.43     0.0016
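The derived columns in SAS Output 1 follow directly from the printed sums of squares, estimates, and standard errors. The sketch below recomputes them in Python, assuming only the numbers shown in the tables (the variable names are illustrative, not SAS names); results should agree with the printout up to rounding.

```python
# Values copied from SAS Output 1.
ss_model, df_model = 711.15276, 3
ss_error, df_error = 1640.55776, 34   # n - p = 38 - 4 (four betas incl. intercept)
ss_total = 2351.71053

# Mean squares are sums of squares divided by their degrees of freedom.
ms_model = ss_model / df_model
ms_error = ss_error / df_error
# The overall F statistic is the ratio of the two mean squares.
f_value = ms_model / ms_error

# Each t value is a parameter estimate divided by its standard error.
estimates = {
    "Intercept": (75.50800, 22.09766),
    "Weight": (0.04012, 0.06915),
    "Height": (-0.53976, 0.43146),
    "BrainSize": (0.06788, 0.01977),
}
t_values = {name: b / se for name, (b, se) in estimates.items()}

print(f"MS model = {ms_model:.5f}, MS error = {ms_error:.5f}, F = {f_value:.2f}")
for name, t in t_values.items():
    print(f"{name}: t = {t:.2f}")
```

The model and error sums of squares also add to the corrected total (to within the last printed digit).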
SAS Output 1, continued
SAS Output 2

P-value for Brown-Forsythe test of constant variance in residual vs. predicted
Obs   t_bf      BF_pvalue
1     2.57013   0.014450

Output for correlation test of normality of residual (Check text Table B.6 for threshold)

The CORR Procedure
2 Variables: resid expectnorm

Simple Statistics
Variable     N    Mean   Std Dev   Sum   Minimum     Maximum    Label
resid        38   0      6.65878   0     -13.62938   13.19254   residual
expectnorm   38   0      0.97720   0     -2.13600    2.13600

Pearson Correlation Coefficients, N = 38
Prob > |r| under H0: Rho=0
                      resid      expectnorm
resid (residual)      1.00000    0.99417
                                 <.0001
expectnorm            0.99417    1.00000
                      <.0001
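Both diagnostics in SAS Output 2 can be sketched in a few lines of Python. This is a minimal illustration, not the course's SAS code: the Brown-Forsythe function assumes the residuals are split into two groups at the median fitted value (the actual split used to produce t_bf is not shown), and the normality check uses expected normal scores z[(k - 0.375)/(n + 0.25)]. For n = 38 that rule gives a smallest score of about -2.136, which appears to match the printed expectnorm minimum; scaling the scores by √MSE would not change the correlation. Function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, median


def pearson_r(x, y):
    """Sample correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def brown_forsythe_t(resid, fitted):
    """Two-group Brown-Forsythe t statistic (assumed split at median fitted value)."""
    cut = median(fitted)
    g1 = [e for e, f in zip(resid, fitted) if f <= cut]
    g2 = [e for e, f in zip(resid, fitted) if f > cut]
    # Absolute deviations of each residual from its own group's median.
    d1 = [abs(e - median(g1)) for e in g1]
    d2 = [abs(e - median(g2)) for e in g2]
    n1, n2 = len(d1), len(d2)
    pooled_ss = sum((d - mean(d1)) ** 2 for d in d1) + sum((d - mean(d2)) ** 2 for d in d2)
    s = sqrt(pooled_ss / (n1 + n2 - 2))
    # Compare |t| with a t distribution on n1 + n2 - 2 degrees of freedom.
    return (mean(d1) - mean(d2)) / (s * sqrt(1 / n1 + 1 / n2))


def normal_scores(n):
    """Unscaled expected normal scores z[(k - 0.375)/(n + 0.25)], k = 1..n."""
    nd = NormalDist()
    return [nd.inv_cdf((k - 0.375) / (n + 0.25)) for k in range(1, n + 1)]


def normality_correlation(resid):
    """Correlation between the ordered residuals and their expected normal scores."""
    return pearson_r(sorted(resid), normal_scores(len(resid)))


# Toy usage with made-up residuals: larger spread at small fitted values.
toy_resid = [-4, 4, -3, 3, -1, 1, -2, 2]
toy_fitted = [1, 2, 3, 4, 5, 6, 7, 8]
print(brown_forsythe_t(toy_resid, toy_fitted))
print(normality_correlation(toy_resid))
```

The normality correlation is then compared with a tabled critical value (here, the text's Table B.6); values below it point toward non-normal errors.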
SAS Output 4
SAS Output 6

The REG Procedure
Model: MODEL1
Dependent Variable: IQ

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3        711.15276     237.05092      4.91   0.0061
Error             34       1640.55776      48.25170
Corrected Total   37       2351.71053

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variance Inflation
Intercept    1             75.50800         22.09766      3.42     0.0017                    0
Weight       1              0.04012          0.06915      0.58     0.5656              2.02139
Height       1             -0.53976          0.43146     -1.25     0.2195              2.27686
BrainSize    1              0.06788          0.01977      3.43     0.0016              1.57844

Collinearity Diagnostics
                                     ------------ Proportion of Variation ------------
Number   Eigenvalue   Condition Index   Intercept    Weight       Height       BrainSize
1        3.98301       1.00000          0.00016327   0.00071000   0.00009143   0.00024682
2        0.01307      17.45495          0.04428      0.61650      0.00288      0.01280
3        0.00291      36.97864          0.21566      0.05134      0.02574      0.93488
4        0.00100      62.97774          0.73990      0.33145      0.97129      0.05207
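The two multicollinearity columns in SAS Output 6 rest on simple relationships: each condition index is the square root of the largest eigenvalue divided by that row's eigenvalue, and VIF_k = 1/(1 - R_k²), where R_k² comes from regressing predictor k on the other predictors. The Python sketch below applies both to the printed values (names are illustrative). Because the printed eigenvalues are rounded, the recomputed indices for rows 3 and 4 (about 37.0 and 63.1) differ slightly from SAS's 36.97864 and 62.97774.

```python
from math import sqrt

# Eigenvalues and variance inflation factors copied from SAS Output 6.
eigenvalues = [3.98301, 0.01307, 0.00291, 0.00100]
vifs = {"Weight": 2.02139, "Height": 2.27686, "BrainSize": 1.57844}

# Condition index: sqrt(largest eigenvalue / each eigenvalue).
largest = max(eigenvalues)
condition_indices = [sqrt(largest / lam) for lam in eigenvalues]

# Invert VIF_k = 1 / (1 - R_k^2) to recover R_k^2, the R^2 from regressing
# predictor k on the remaining predictors.
r2_from_vif = {name: 1 - 1 / v for name, v in vifs.items()}

for lam, ci in zip(eigenvalues, condition_indices):
    print(f"eigenvalue {lam:.5f}: condition index {ci:.2f}")
for name, r2 in r2_from_vif.items():
    print(f"{name}: R^2 on other predictors = {r2:.3f}")
```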