Sociology 592 - Research Statistics I Final Exam Answer Key December 15, 1993 Where appropriate, show your work - partial credit may be given. (On the other hand, don't waste a lot of time on excess verbiage.) Do not spend too much time on any one problem. You are free to refer to anything that was demonstrated in the homework or handouts. 1. (5 points each, 20 points total). For each of the following, indicate whether the statement is true or false. If you think the statement is false, indicate how the statement could be corrected. NOTE: These are all pretty easy, but you could waste a great deal of time on some of them or make stupid mistakes if you don't happen to see what the easiest way to approach each problem is. a. One of the problems with using listwise deletion of missing data is that it can produce a correlation matrix that would be impossible with any complete set of data. Solution. False. This problem can occur with pairwise deletion of missing data, because each correlation can be based on a different and non-equivalent sample. b. SPSS produced the following computer printout. If =.05 and backwards stepwise regression is being used, then EDUC should be removed from the equation next. Multiple R.50998 Analysis of Variance R Square.26008 DF Sum of Squares Mean Square Adjusted R Square.25561 Regression 3 29200.56017 9733.52006 Standard Error 12.94175 Residual 496 83074.43983 167.48879 ------------------------------- Variables in the Equation -------------------------------- Variable B SE B Beta Correl Part Cor Partial T Sig T PROGTYPE 8.351967 1.447318.278956.400000.222883.250827 5.771.0000 BLACK 10.698092 1.450267.286709.330000.284911.314422 7.377.0000 EDUC 1.282964.402837.153956.350000.123009.141563 3.185.0015 (Constant) 37.213275 3.878287 9.595.0000 Solution. False. EDUC will not be eliminated from the model because its effect is significant at the.05 level. Backwards deletion stops once all remaining variables are statistically significant. c. A sociologist believes that women are more religious than are men (where religiosity is measured on a scale that runs from a low of zero to a high of one hundred). In her sample of 300 subjects, when she regresses Religiosity (Y) on Gender (X, where 1 = Female, 0 = Male), she finds that b = 10, F = 25, the mean of Y is 50 and the mean of X is 0.5. Therefore, she should reject the null hypothesis. Solution. True. Recall that in a bivariate regression, T² = F. Since b is positive and F = 25, T = 5, which is highly significant. Note that if we did not know the sign of b, we wouldn't be able to answer the question, since the alternative hypothesis is one-tailed (i.e. a large F is not enough, the effect of gender must be in the predicted direction.) d. A researcher is interested in the relationship between Income (measured in dollars) and race (where 1 = White, 0 = NonWhite). Regression, Anova, and a T-Test are all possible means for addressing this issue.
Solution. True. You could regress income on the dummy variable for race. Or, you could do a one-way Anova or a T-Test where you broke down income by the two categories for race. If the IV had three or more categories, you could still use dummy variable regression or one-way Anova but not a T-test. 2. Short answer problems. (10 points each, 30 points total, up to 10 points extra credit) Answer three of the following. You will get up to five points extra credit for each additional problem you answer correctly. a. When Y is regressed on X1 and X2, r y1 =.6, r y2 =.7, TOL X1 =.75. Compute the semipartial and partial correlations. Solution. Recall that, when there are two IVs, TOL X1 = TOL X2 = 1 - r 12 ² ==> r 12 ² = 1 - TOL X1. Since TOL X1 =.75, r 12 ² =.25 and r 12 =.5 (or, it could conceivably equal -.5, but that would seem unlikely given the strong positive correlations each X has with Y). Now, it is simply a matter of plugging in the values into the appropriate formulas. Here is an SPSS printout (using an arbitrary sample size of 100, variable means of 0 and s.d.'s of 1.) --------------------------------------- Variables in the Equation --------------------------------------- Variable B SE B Beta Correl Part Cor Partial Tolerance VIF T X1.333333.076582.333333.600000.288675.404226.750000 1.333 4.353 X2.533333.076582.533333.700000.461880.577350.750000 1.333 6.964 (Constant).000000.065990.000 table. b. In a multiple regression, N = 160, K = 9, F = 5, SSE = 600. Construct the ANOVA Solution. Note that DFE = N - K -1 = 150, MSE = SSE/DFE = 600/150 = 4, MSR = F * MSE = 5 * 4 = 20, DFR = K,
SSR = DFR * MSR = 9 * 20 = 180 The rest follows pretty easily: Source SS d.f. MS F Regression (or explained) SSR = 180 K = 9 SSR / K = 20 MSR/MSE = 5 Error (or residual) SSE = 600 N - K - 1 = 150 SSE / (N-K-1) = 4 Total SST = 780 N - 1 = 159 SST / (N - 1) = 4.906 c. A sociologist has collected data from 100 respondents. When she regresses Y on X1 and X2, she gets F = 97. When she adds X3 to the model, R² =.75. Test H 0 : ß 3 = 0 H A : ß 3 <> 0. Solution. Recall that The above is the R² for the constrained model. In the unconstrained model, R² =.75. To test whether the constrained and unconstrained models significantly differ, compute With D.F. equal to 1 and 96, this F value is highly significant. Here is the SPSS printout from a hypothetical example that meets the above conditions. N of Cases = 100 Correlation: Y X1 X2 X3 Y 1.000.577.577.289 X1.577 1.000.000.000 X2.577.000 1.000.000 X3.289.000.000 1.000 Equation Number 1 Dependent Variable.. Y Block Number 1. Method: Enter X1 X2 Variable(s) Entered on Step Number 1.. X2 2.. X1 * * * * M U L T I P L E R E G R E S S I O N * * * * Multiple R.81650 R Square.66667 R Square Change.66667
Adjusted R Square.65979 F Change 97.00000 Standard Error.58327 Signif F Change.0000 Analysis of Variance DF Sum of Squares Mean Square Regression 2 66.00000 33.00000 Residual 97 33.00000.34021 F = 97.00000 Signif F =.0000 ------------------ Variables in the Equation ------------------ Variable B SE B Beta T Sig T X1.577350.058621.577350 9.849.0000 X2.577350.058621.577350 9.849.0000 (Constant).000000.058327.000 1.0000 ------------- Variables not in the Equation ------------- Variable Beta In Partial Min Toler T Sig T X3.288675.500000 1.000000 5.657.0000 Equation Number 1 Dependent Variable.. Y Block Number 2. Method: Enter X3 Variable(s) Entered on Step Number 3.. X3 * * * * M U L T I P L E R E G R E S S I O N * * * * Multiple R.86603 R Square.75000 R Square Change.08333 Adjusted R Square.74219 F Change 32.00000 Standard Error.50775 Signif F Change.0000 Analysis of Variance DF Sum of Squares Mean Square Regression 3 74.25000 24.75000 Residual 96 24.75000.25781 F = 96.00000 Signif F =.0000 ------------------ Variables in the Equation ------------------ Variable B SE B Beta T Sig T X1.577350.051031.577350 11.314.0000 X2.577350.051031.577350 11.314.0000 X3.288675.051031.288675 5.657.0000 (Constant).000000.050775.000 1.0000 d. The dean of the college wants to see what relationship, if any, there is between the scholarly output of a faculty member (measured on a scale that runs from 0 to 100) and the religion and gender of the faculty member. Hence, information is collected from 25 Catholic males, 25 Catholic females, 25 NonCatholic males, and 25 NonCatholic females on the following variables: Y = scholarly output X1 = 1 if Catholic, -1 otherwise X2 = 1 if male, -1 otherwise X3 = X1 * X2 (i.e. X3 is an interaction term) A regression analysis yields a = 60, b1 = -10, b2 = -10, b3 = 10. Compute the average scores for Catholic males, Catholic females, NonCatholic males, and NonCatholic females. Who is the most "scholarly" of these four groups? Solution. X1 X2 X3 = X1 * X2 60-10X1-10X2 + 10X3
Catholic Males 1 1 1 50 Catholic Females 1-1 -1 50 NonCatholic Males -1 1-1 50 NonCatholic Females -1-1 1 90 Thus, NonCatholic Females are the most scholarly group, and the Dean should of course plan accordingly as he makes future hires. e. A political scientist has collected data from voters concerning their opinion of Ross Perot. Her variables are Y = Level of support for Perot (measured on a scale that runs from a low of zero to a high of 100), X1 = Previously supported Perot (where 1 = voted for Perot in 1992, 0 = did not vote for Perot), and X2 = Gender (1 = Female, 0 = Male). She finds the following: Mean Label Y 35.000 X1.200 X2.500 N of Cases = 500 Covariance: Y X1 X2 Y 100.000 3.000-1.000 X1 3.000.160 0.000 X2-1.000 0.000.250 (a) Compute the standardized coefficients b1' and b2'. Solution. Note that r Y1 = s Y1 /sqrt(s YY *s 11 ) = 3/sqrt(100 *.16) =.75, r Y2 = s Y2 /sqrt(s YY *s 22 ) = -1/sqrt(100 *.25) = -.20, r 12 = s 12 /sqrt(s 11 *s 22 ) = 0/sqrt(.16 *. 25) = 0, b 1 ' = (r y1 - r 12 * r y2 ) / (1 - r² 12 ) = (.75) / (1) =.75 b 2 ' = (r y2 - r 12 * r y1 ) / ( 1 - r² 12 ) = (-2.0) / (1) = -.20 Note how, when the X's are uncorrelated, the standardized coefficients are the same as the correlations. Here is the rest of the printout: Multiple R.77621 Analysis of Variance R Square.60250 DF Sum of Squares Mean Square Adjusted R Square.60090 Regression 2 30064.75000 15032.37500 Standard Error 6.31743 Residual 497 19835.25000 39.90996 F = 376.65723 Signif F =.0000 ------------------------------- Variables in the Equation -------------------------------- Variable B SE B Beta Correl Part Cor Partial T Sig T X1 18.750000.707018.750000.750000.750000.765466 26.520.0000
X2-4.000000.565614 -.200000 -.200000 -.200000 -.302372-7.072.0000 (Constant) 33.250000.424022 78.416.0000 (b) The researcher believes that women are more supportive of Perot than are men. Is the evidence strong enough to support her case? Solution. Definitely not, because the negative correlation shows that women are actually less supportive of Perot, the exact opposite of what she predicted. 3. After his exciting come from behind win on the NAFTA vote, Bill Clinton wants to know where he stands with the American public. He has told his pollsters to go out and collect information on four key variables: Clinton Female Nafta Union Level of support for Bill Clinton, where 0 = strongly opposes and 100 = strongly supports. Coded 1 if the respondent is female, 0 if male Coded 1 if the respondent favored NAFTA, 0 otherwise Coded 1 if the respondent is a union member, 0 otherwise. Using forward stepwise regression, the first and only model obtained was as follows: MODEL I: FORWARD STEPWISE Mean Std Dev Label CLINTON 47.367 15.426 FEMALE.500.509 NAFTA.500.509 UNION.367.490 N of Cases = 30 Correlation: CLINTON FEMALE NAFTA UNION CLINTON 1.000.705.305.159 FEMALE.705 1.000.200 -.069 NAFTA.305.200 1.000 -.623 UNION.159 -.069 -.623 1.000 * * * * M U L T I P L E R E G R E S S I O N * * * * Equation Number 1 Dependent Variable.. CLINTON Block Number 1. Method: Forward Criterion PIN.0500 FEMALE NAFTA UNION Variable(s) Entered on Step Number 1.. FEMALE Multiple R.70549 Analysis of Variance R Square.49771 DF Sum of Squares Mean Square Adjusted R Square.47977 Regression 1 3434.70008 3434.70008 Standard Error 11.12633 Residual 28 3466.26656 123.79523 F = 27.74501 Signif F =.0000 ------------------ Variables in the Equation ------------------ Variable B SE B Beta T Sig T FEMALE 21.400001 4.062762.705488 5.267.0000 (Constant) 36.666666 2.872806 12.763.0000 ------------- Variables not in the Equation -------------
Variable Beta In Partial Min Toler T Sig T NAFTA.171244.236742.960000 1.266.2163 UNION.209274.294576.995215 1.602.1209 End Block Number 1 PIN =.050 Limits reached. a. (15 points) Interpret the results from model I. How many of the sample members supported NAFTA? How many belong to a union? Who supports Clinton more, men or women, and by how much? Did union members tend to support or oppose NAFTA? Why was FEMALE the first and only variable to enter into the forward stepwise regression? Solution. From the means, you can tell that 50% of the 30 sample members (15) supported NAFTA, and that 36.7% (11) are Union members. The regression b tells you women support Clinton more, by a margin of 21.4 points (or about 58.1 versus 36.7). The negative correlation between Nafta and Union shows that union members tended to oppose Nafta. FEMALE was the first variable to enter because it had the strongest correlation with Clinton. It was the only variable to enter because, once Female was in the equation, neither Union or Nafta could produce a statistically significant increase in R². (This result is deceptive, however, because, as we will see, while neither variable alone can produce a significant increase, the two variables together can.) Using backwards stepwise regression, the first and only model obtained was as follows: MODEL II: BACKWARDS STEPWISE Multiple R.82835 Analysis of Variance R Square.68616 DF Sum of Squares Mean Square Adjusted R Square.64995 Regression 3 4735.19331 1578.39777 Standard Error 9.12683 Residual 26 2165.77333 83.29897 F = 18.94859 Signif F =.0000 --------------------------------------- Variables in the Equation --------------------------------------- Variable B SE B Beta Correl Part Cor Partial Tolerance VIF T FEMALE (1) 3.410260.641671.705488 (2).745743.955000 1.047 5.708 NAFTA 15.060209 4.347246.496486.305492.380611.561975 (3) 1.702 3.464 UNION 16.143976 4.430058 (4).159473.400374.581453.609250 1.641 3.644 (Constant) 24.184991 3.943977 (5) 6.132 ------ in ------- Variable Sig T FEMALE.0000 NAFTA.0019 UNION.0012 (Constant).0000 End Block Number 2 POUT =.100 Limits reached. No variables removed for this block. b. (20 points) Fill in the missing items (1) - (5). Solution. (1) = b 1 = se b1 * T b1 = 3.410260 * 5.708 = 19.46576408 (2) = sr 1 = b 1 ' * SQRT(TOL 1 ) =.641671 * SQRT(.955) =.627067218 (3) = TOL 2 = 1/VIF 2 = 1/1.702 =.587544065 (4) = b 3 ' = b 3 * sd 3 / sd y = 16.143976 *.490 / 15.426 =.512806187 (5) = b 0 ' = 0 (the constant always equals zero when variables are standardized)
Here is the uncensored printout: --------------------------------------- Variables in the Equation --------------------------------------- Variable B SE B Beta Correl Part Cor Partial Tolerance VIF T FEMALE 19.464224 3.410260.641671.705488.627068.745743.955000 1.047 5.708 NAFTA 15.060209 4.347246.496486.305492.380611.561975.587692 1.702 3.464 UNION 16.143976 4.430058.512942.159473.400374.581453.609250 1.641 3.644 (Constant) 24.184991 3.943977 6.132 ------ in ------- Variable Sig T FEMALE.0000 NAFTA.0019 UNION.0012 (Constant).0000 Solution. c. (10 points) Do an F test of the hypothesis H 0 : ß Nafta = ß Union = 0. For an F with 2,26 d.f., the critical value is 3.37, so reject the null hypothesis. Here is the SPSS printout on this test: Hypothesis Tests Sum of DF Squares Rsq Chg F Sig F Source 2 1300.49322.18845 7.80618.0022 NAFTA UNION 3 4735.19331 18.94859.0000 Regression 26 2165.77333 Residual 29 6900.96664 Total d. (5 points) In the forward stepwise regression, only Female made it into the final model. Yet, in the backwards stepwise regression, all three variables made it. Why did the final models differ so much? Solution. This occurs because of suppression. While both Union and Nafta have positive direct effects on support for Clinton, they are negatively correlated with each other. This causes each variable to have a fairly weak correlation with Clinton. Hence, the two together have significant effects, but individually neither appears to have much of an impact. Thus, in a forwards regression, neither makes it in, and in a backwards regression neither makes it out. We will pursue the topic of suppression further in Research Methods class.
All SPSS runs were done with SPSS for Windows 6.0. Following are the control cards: SET WIDTH = 110 Title 'Final Exam, Soc 592, December 1993'. Subtitle 'Problem 1B - Backwards Regression'. MATRIX DATA VARIABLES = IQ ProgType Black Educ/ FORMAT = FREE full /FILE = INLINE / N = 500 /CONTENTS = MEAN SD CORR. BEGIN DATA. 57.000.500.200 10.5 15.000.501.402 1.8 1.000 0.400 0.330 0.350 0.400 1.000 0.100 0.600 0.330 0.100 1.000 0.100 0.350 0.600 0.100 1.000 END DATA. / matrix = in(*) /DESCRIPTIVES DEF / VARIABLES IQ ProgType Black Educ /STATISTICS DEF ZPP /CRITERIA=POUT(.051) /DEPENDENT IQ /method = backward progtype black educ. Subtitle 'Problem 2a - semipartial and partial correlations'. MATRIX DATA VARIABLES = Y X1 X2/ FORMAT = FREE full /FILE = INLINE / N = 100 /CONTENTS = CORR. BEGIN DATA. 1.000.600.700.600 1.000 0.500 0.700 0.500 1.000 END DATA. matrix = in(*) /DESCRIPTIVES corr / VARIABLES Y X1 X2 /STATISTICS DEF TOL ZPP /DEPENDENT Y /method ENTER X1 X2. Subtitle 'Problem 2c - Test of B3 = 0'. * This is one of an infinite number of examples that meets * the conditions set forth in the problem. MATRIX DATA VARIABLES = Y X1 X2 X3/ FORMAT = FREE full /FILE = INLINE / N = 100 /CONTENTS = CORR. BEGIN DATA. 1.000000000.577350269.577350269.288675134.577350269 1.000000000 0.0 0.0 0.577350269 0.0 1.000 0.0.288675134 0.0 0.0 1.000 END DATA. matrix = in(*) /DESCRIPTIVES corr / VARIABLES Y X1 X2 X3 /STATISTICS DEF CHA /DEPENDENT Y /method ENTER X1 X2 /ENTER X3. Subtitle 'Problem 2e - support for Ross'. MATRIX DATA VARIABLES = Y X1 X2/ FORMAT = FREE full /FILE = INLINE / N = 500 /CONTENTS = MEAN SD CORR. BEGIN DATA. 35.000 0.200 0.500 10.000.400.500 1.000.750 -.200.750 1.000 0.000 -.200 0.000 1.000 END DATA. matrix = in(*) /DESCRIPTIVES mean cov / VARIABLES Y X1 X2 /STATISTICS DEF ZPP /DEPENDENT Y /method ENTER X1 X2. Subtitle 'Problem 3 - Support for Clinton'. MATRIX DATA VARIABLES = Rowtype_ Clinton Female Nafta Union/ Format = Free Full/ File = Inline.
Begin Data. MEAN 47.3666667.5000000.5000000.3666667 STDDEV 15.4260937.5085476.5085476.4901325 N 30.0000000 30.0000000 30.0000000 30.0000000 CORR 1.0000000.7054877.3054916.1594728 CORR.7054877 1.0000000.2000000 -.0691714 CORR.3054916.2000000 1.0000000 -.6225430 CORR.1594728 -.0691714 -.6225430 1.0000000 End Data. * Forward Selection. /MATRIX = IN(*) /DESCRIPTIVES DEF /STATISTICS DEF /CRITERIA=PIN(.05) POUT(.10) /DEPENDENT clinton /METHOD=FORWARD female nafta union. * Backwards selection. /MATRIX = IN(*) /DESCRIPTIVES DEF /STATISTICS DEF ZPP TOL /CRITERIA=PIN(.05) POUT(.10) /DEPENDENT clinton /METHOD=BACKWARD female nafta union /TEST (nafta union).