Sociology 63993 Exam 2 Answer Key March 30, 2012 I. True-False. (20 points) Indicate whether the following statements are true or false. If false, briefly explain why. 1. A researcher has constructed scales that measure health, diet, exercise, income and education. When she regresses health on the other four variables, she finds that the effect of education is 0. This means that, at least when it comes to health, it makes no difference whether you are well educated or poorly educated. False (or at least, not necessarily true). Education could have indirect effects on health, e.g. better educated people may have higher incomes, better diets, and may exercise more. 2. A key advantage of centering independent variables is that centering often eliminates the need to include interaction terms in the model. False. Centering has no impact on whether or not interaction terms are needed in the model. It may make results easier to interpret. 3. A researcher runs the following:. ovtest Ramsey RESET test using powers of the fitted values of health Ho: model has no omitted variables F(3, 10330) = 5.62 Prob > F = 0.0008 This indicates that our model has too many extraneous variables. False. It indicates that polynomial terms (e.g. X 2 ) should be added to the model. 4. A researcher hypothesizes that income positively affects the self-image of men but has a negative effect on the self-image of women. She gets ˆ βincome = 12 ˆ βfemale = 0 ˆ β = 4 Income * Female = 1 if female, 0 if male. The T values for Income and for the interaction term are both highly significant. The evidence supports the researcher s hypothesis. False. The effect of income is smaller for women than it is for men (12 versus 8) but it is still positive. 5. A researcher has regressed Y on X1. However, X2 should also be in the model. As a result of this omission, the coefficient for X1 will definitely be biased. Female False. It won t be biased if X1 and X2 are uncorrelated. Sociology 63993 Exam 2 Page 1
II. Path Analysis/Model specification (25 pts). A sociologist believes that the following model describes the relationship between X1, X2, X3, and X4. All her variables are in standardized form. The estimated value of each path in her model is included in the diagram. u -.7 X2.3 X1.6.5 X4 X3 w v a. (5 pts) Write out the structural equation for each endogenous variable, using both the names for the paths (e.g. β 42 ) and the estimated value of the path coefficient. X2 = β 21 X1 + u = -.7 * X1 + v X3 = β 32 X2 + u =.5 * X2 + v X4 = β 41 X1 + β 42 X2 + w =.6 * X1 +.3* X2 + w b. (10 pts) Part of the correlation matrix is shown below. Determine the complete correlation matrix. (Remember, variables are standardized. You can use either normal equations or Sewell Wright, but you might want to use both as a double-check.) x1 x2 x3 x4 -------------+------------------------------------ x1 1.0000 x2-0.7000 1.0000 x3?? 1.0000 x4.??? 1.0000 The uncensored printout is x1 x2 x3 x4 -------------+------------------------------------ x1 1.0000 x2-0.7000 1.0000 x3-0.3500 0.5000 1.0000 x4 0.3900-0.1200-0.0600 1.0000 Sociology 63993 Exam 2 Page 2
Using normal equations, ρ 12 = β 21 E(X1 2 ) = β 21 = -.7 ρ 13 = β 32 E(X1X2) = β 32 * ρ 12 = -.5*.7 = -.35 ρ 23 = β 32 E(X2 2 ) = β 32 =.5 ρ 41 = β 41 E(X1 2 ) + β 42 E(X1X2) = β 41 + β 42 ρ 12 =.6 + (.3 * -.7) =.39 ρ 42 = β 41 E(X1X2) + β 42 E(X2 2 ) = β 41 ρ 12 + β 42 = (.6 * -.7) +.3 = -.12 ρ 43 = β 41 E(X1X3) + β 42 E(X2X3) = β 41 ρ 13 + β 42 ρ 23 = (.6 * -.35) + (.3 *.5) = -.06 Or, alternatively, ρ = β 21 21 =.7 ρ = β + β β 31 31 21 32 = 0 + (.7*.5) = 0.35 =.35 ρ = β + β β 32 32 31 21 =.5 + (0*.7) =.5 + 0 =.5 ρ = β + β β + β β + β β β 41 41 21 42 31 43 21 32 43 =.6 + (.7*.3) + (0*0) + (.7*.5*0) =.6.21+ 0 + 0 =.39 ρ = β + β β + β β + β β β 42 42 32 43 41 21 43 31 21 =.3 + (.5*0) + (.6*.7) + (0*0*.5) =.3+ 0.42 + 0 =.12 ρ = β + β β + β β + β β β + β β β 43 43 41 31 42 32 41 21 32 42 21 31 = 0 +.(6*0) + (.3*.5) + (.6*.7*.5) + (.3*.7*0) = 0 + 0 +.15.21+ 0 =.06 Sociology 63993 Exam 2 Page 3
c. (5 pts) Decompose the correlation between X1 and X4 into Correlation due to direct effects.6 Correlation due to indirect effects -.21 Correlation due to common causes 0 model: d. (5 pts) Suppose the above model is correct, but instead the researcher believed in and estimated the following X2 X4 w What conclusions would the researcher likely draw? In particular, what would the researcher conclude about the effect of changes in X2 on X4? Discuss the consequences of this mis-specification, and in what ways, if any, the results would be misleading. Why would she make these mistakes? The estimated effect of X2 on X4 would be the same as the correlation between the variables, i.e. -.12. The researchers would therefore conclude that increases in X2 will produce decreases in X4, when in reality the effect of X2 on X4 is positive. The error would occur because there are suppressor effects. The negative correlation produced by the common cause of the omitted X1 gets confounded with the correlation produced by the positive direct effect of X2 on X4. This mistake could cause the researchers to do the exact opposite of what they should do. III. Group comparisons (25 points). The Catholic Bishops are adamantly opposed to proposals that they feel would mandate that Catholic Institutions pay for contraceptive health care. However, they want to know more about how the Catholic laity feel about these issues. In particular, how do Catholic men and women differ in their beliefs? They have therefore collected information from almost 2,300 US Catholics on the following variables. Variable contracept female religious femrel Description Scale that measures support for government-mandated coverage of contraceptive health care. Scale ranges from 0 to 125, where a higher score indicates greater support. Coded 1 if female, 0 otherwise Scale that measures how religious the respondent is. Original scale ranges from 0 (not religious at all) to 75 (extremely religious). The scale used here has been centered to have a mean of zero. female * religious The results of the analysis are as follows:. * Estimate Models. nestreg: reg contracept religious female femrel Sociology 63993 Exam 2 Page 4
Block 1: religious Source SS df MS Number of obs = 2270 -------------+------------------------------ F( 1, 2268) = 540.90 Model 488051.725 1 488051.725 Prob > F = 0.0000 Residual 2046391.94 2268 902.289213 R-squared = 0.1926 -------------+------------------------------ Adj R-squared = 0.1922 Total 2534443.66 2269 1116.98707 Root MSE = 30.038 contracept Coef. Std. Err. t P> t [95% Conf. Interval] -------------+---------------------------------------------------------------- religious -3.689038.1586182-23.26 0.000-4.000091-3.377986 _cons 61.28458.6304635 97.21 0.000 60.04824 62.52093 Block 2: female Source SS df MS Number of obs = 2270 -------------+------------------------------ F( 2, 2267) = 1317.39 Model 1362299.24 2 681149.622 Prob > F = 0.0000 Residual 1172144.42 2267 517.0465 R-squared = 0.5375 -------------+------------------------------ Adj R-squared = 0.5371 Total 2534443.66 2269 1116.98707 Root MSE = 22.739 contracept Coef. Std. Err. t P> t [95% Conf. Interval] -------------+---------------------------------------------------------------- religious -1.359155.1327702-10.24 0.000-1.619519-1.098791 female 43.49509 1.057762 41.12 0.000 41.42081 45.56937 _cons 38.09997.7386994 51.58 0.000 36.65138 39.54857 Block 3: femrel Source SS df MS Number of obs = 2270 -------------+------------------------------ F( 3, 2266) = 879.57 Model 1363521.85 3 454507.284 Prob > F = 0.0000 Residual 1170921.81 2266 516.735131 R-squared = 0.5380 -------------+------------------------------ Adj R-squared = 0.5374 Total 2534443.66 2269 1116.98707 Root MSE = 22.732 contracept Coef. Std. Err. t P> t [95% Conf. Interval] -------------+---------------------------------------------------------------- religious -1.133977.1976051-5.74 0.000-1.521483 -.7464716 female 43.60929 1.060046 41.14 0.000 41.53052 45.68805 femrel -.4102893.2667354-1.54 0.124 -.9333604.1127818 _cons 37.69189.7846873 48.03 0.000 36.15311 39.23067 +-------------------------------------------------------------+ Block Residual Change Block F df df Pr > F R2 in R2 -------+----------------------------------------------------- 1 540.90 1 2268 0.0000 0.1926 2 1690.85 1 2267 0.0000 0.5375 0.3449 3 2.37 1 2266 0.1241 0.5380 0.0005 +-------------------------------------------------------------+ Sociology 63993 Exam 2 Page 5
. * Differences by gender. ttest contracept, by(female) Two-sample t test with equal variances Group Obs Mean Std. Err. Std. Dev. [95% Conf. Interval] Male 1060 35.63679.6628397 21.5805 34.33616 36.93742 Female 1210 83.75289.707922 24.62511 82.364 85.14178 combined 2270 61.28458.7014733 33.42136 59.90899 62.66018 diff -48.1161.9782482-50.03446-46.19775 diff = mean(male) - mean(female) t = -49.1860 Ho: diff = 0 degrees of freedom = 2268 Ha: diff < 0 Ha: diff!= 0 Ha: diff > 0 Pr(T < t) = 0.0000 Pr( T > t ) = 0.0000 Pr(T > t) = 1.0000. ttest religious, by(female) Two-sample t test with equal variances Group Obs Mean Std. Err. Std. Dev. [95% Conf. Interval] Male 1060 1.81229.1085763 3.534988 1.599241 2.025339 Female 1210-1.587626.1049 3.648953-1.793432-1.381819 combined 2270 2.45e-08.0834429 3.975598 -.1636324.1636325 diff 3.399915.1512899 3.103234 3.696596 diff = mean(male) - mean(female) t = 22.4729 Ho: diff = 0 degrees of freedom = 2268 Ha: diff < 0 Ha: diff!= 0 Ha: diff > 0 Pr(T < t) = 1.0000 Pr( T > t ) = 0.0000 Pr(T > t) = 0.0000 Based on the above results, advise the Bishops on the following. When thinking about your answers, keep in mind the various reasons that two groups can differ on some outcome measure. a) (5 pts) In block 3, the coefficient for female is 43.60929. Assuming the model is correct, the analyst for the Bishops thinks this means that, when a man and woman are equally religious, the woman is expected to score 43.6 points higher than the man on the contraception measure. Indicate whether you agree or disagree, and explain why. This is incorrect. The model includes an interaction term, so the expected difference between a man and women who are equally religious depends on how religious they are. Since religious is centered, 43.6 is the expected difference between a man and women of average religiosity. The more religious the man and woman are, the less the gap is between them because, according to the model, religious has a greater negative effect on women than men. By tweaking the problem a little bit we can see the differences displayed graphically. * Redo with factor variables. quietly reg contracept religious i.female i.female#c.religious. quietly margins female, at(religious=(-50(5)50)). marginsplot, noci scheme(sj) Sociology 63993 Exam 2 Page 6
Adjusted Predictions of female Linear Prediction 0 50 100 150-50 -45-40 -35-30 -25-20 -15-10 -5 0 5 10 15 20 25 30 35 40 45 50 religious Male Female b) (10 pts) The researchers begin by estimating a series of models. Which of the models do you think is best, and why? What do these models tell us about how religiosity affects support for mandatory contraceptive care? What ways (if any) do the determinants of support for contraceptive care differ by gender? Model 2 is best. It is a significant improvement over Model 2, while model 3 is not a significant improvement over model 2, i.e. the interaction term is not significant. This model says that the intercepts differ for men and women, but the effects of religious do not. People who are less religious, and also women, tend to have higher levels of support for mandated coverage of contraceptive care (after controlling for the other variable in the model). Put another way, when men and women are equally religious, women tend to support contraceptive care more. c) (10 pts) The researchers then run a series of t-tests. What do these t-tests tell us about how attitudes differ by gender? What additional insights, if any, do these tests give us as to why support for mandatory contraceptive care differs by gender? Mandated health care coverage is much more popular with women than with men women have scores that are 48 points higher on average than do men, and the difference is highly significant. Women also score about 3.4 points lower on the religiosity scale. Since lower levels of religiosity result in higher levels of support for contraceptive care, this compositional difference also contributes to women being more supportive of contraceptive care overall. In short, gender is important for two reasons. First, the intercept is greater for women than it is for men. Second, women tend to be less religious, which in turn causes them to support mandated contraceptive care more. IV. Short answer. Answer both of the following questions. (15 points each, 30 points total.) Each of the following describes a nonlinear or nonadditive relationship between variables. Draw a scatterplot that illustrates the relationship. Describe the harms that might result if you simply regressed Y on X, e.g. would values be over-estimated, under-estimated, or what? Indicate the model you think should be estimated, e.g. E(Y) = α + β 1 X + β 2 X 2 and/or give the Stata command that would estimate the model. Explain what variables you would need to compute in order to actually estimate the model, e.g. logs of variables, interaction terms. Finally, indicate how you would actually test whether or not nonlinearity or nonadditivity actually was a problem. If you find it helpful, you are welcome to present the Stata commands you would use, but the statistical rationale behind the command still needs to be clear. a. Mitt Romney is worried that issue positions that are helping him to win Republican votes in the primaries may hurt him in the general election. Specifically, he believes that, the more he opposes Obama s health care plan, the more Sociology 63993 Exam 2 Page 7
Republicans tend to support him. But, at the same time, he fears that the more he opposes Obama s health care plan, the less support he will get from Independents and Democrats. Interaction effects are called for. The effect of opposition to health care on support for Romney depends on Party Id. Greater opposition has a positive effect on support for Romney for Republicans but a negative effect on support for Democrats and Independents. A graph of the relationship might look something like Ignoring these differences would result in an estimated effect that might be 0 or small positive or small negative, but whatever it was it would be misleading. In Stata you might give a command like regress support opposehc i.partyid i.partyid#c.opposehc You could do an incremental F test or a Wald test of whether the interaction terms are needed, similar to what was done in Problem III. b. Obama too is wondering how much to emphasize his health care program. Based on data from past electoral campaigns, his advisors believe that each additional dollar he spends on ads promoting his health care program, up to $10 million, will steadily increase his support among voters. However, their research suggests that any dollars spent above that on health care advertising will have no effect on his support. Spline effects sound like a good idea. Up to $10 million dollars, each dollar spent promoting health care has a positive effect on support. Any additional amount above $10 million has zero effect. A graph of the relationship might look something like Failing to take this into account would probably underestimate how effective the first $10 million was and overestimate the impact of amounts above that. In Stata you could use the mkspline function, e.g. if advertising is measured in millions of dollars you could do something like mkspline adv1 10 adv2 = adv reg support adv1 adv2 Sociology 63993 Exam 2 Page 8
The coefficient for adv2 would give me the slope for the effect of advertising above $10 million. Given the way the problem is phrased, I would want to test whether the coefficient is 0. I d of course also want to make sure that the coefficient for adv1 was positive and significant. Appendix: Stata code used in this exam * Exam 2, 2012, Soc 63993 version 12.1 * Problem I-3. webuse nhanes2f, clear reg health weight ovtest * Problem II, Path Analysis clear all matrix input corr = (1,-.7,-.35,.39\-.7,1,.5,-.12\-.35,.5,1,-.06\.39,-.12,-.06,1) corr2data x1 x2 x3 x4, n(100) corr(corr) reg x2 x1 reg x3 x1 x2 reg x4 x1 x2 x3 corr * Problem III. use "http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta", clear keep in 1/2270 gen female = male==0 label define female 0 "Male" 1 "Female" label values female female gen religious = 60 - (female *.3 * ed + ed) *** Center religious sum religious replace religious = religious - r(mean) gen femrel = female * religious gen contracept = ((female *.04 * warm + warm + female*1.5) - 1) * 25 *** Results for exam start below *** Estimate Models nestreg: reg contracept religious female femrel *** Differences by gender ttest contracept, by(female) ttest religious, by(female) *** Supplemental - Redo with factor variables quietly reg contracept religious i.female i.female#c.religious quietly margins female, at(religious=(-50(5)50)) marginsplot, noci scheme(sj) Sociology 63993 Exam 2 Page 9