Sociology 362
Data Exercise 6
Logistic Regression 2

The questions below refer to the data and output that follow them. Although the raw data are given there, you do not have to do any Stata runs to answer the questions. All you have to do is use the results from the logistic regression models that have already been fit.

For each model, the dependent variable is the log of the odds that a person in group $j$ has lung disease. More formally, if $\pi_j$ is the probability that a unit in group $j$ has lung disease (i.e., $Y_{ij} = 1$ if the $i$th unit in group $j$ has lung disease, $Y_{ij} = 0$ otherwise), and $\phi_j = \pi_j/(1 - \pi_j)$ is the odds that a person in group $j$ has lung disease, then the dependent variable is $\ln(\phi_j)$, which is also known as $\mathrm{logit}(\pi_j)$.

For some of the models fitted below, the coefficients (say, $\beta$) are reported on the log-odds scale; for others they are reported on the plain odds scale, which means the coefficients (i.e., $e^{\beta}$) are interpretable as odds ratios. For example, if years of employment ($X$) has a coefficient $\beta = .06$ on the log-odds scale, then its (odds-ratio) coefficient on the plain odds scale is $e^{.06} = 1.0618$. The $\beta = .06$ means that someone with $x+1$ years of employment has a log-odds of lung disease that is .06 higher than the log-odds for someone with $x$ years of employment. The odds-ratio figure $e^{.06} = 1.0618$ means that the odds of lung disease for someone with $x+1$ years of employment are 1.0618 times the odds for someone with $x$ years of employment.

I have also printed out the predicted probabilities for some of the models. You should use these to check some of your answers.

One more thing. The accompanying output was produced using Stata's blogit command because the data are grouped. If the data had come to me as $\sum_j n_j = 560$ individual observations on lung disease, sex, etc., I would have used Stata's logit or logistic command. The answers to all of the questions below would be the same.

1. For Model 1:
   a. Find the fitted log-odds of lung disease for someone who smokes and for someone who doesn't smoke. What is the difference between the log-odds?
   b. Find the fitted (plain) odds of lung disease for someone who smokes and for someone who does not smoke. Compute the ratio of smoker to nonsmoker odds. Is it equal to $e^{\beta} = e^{1.151722}$ = 3.16+?
   c. How much does smoking increase the probability of lung disease?
   d. Do a likelihood ratio test of the null hypothesis $\beta = 0$ by computing the reduction in deviance due to moving from Model 0 to Model 1.

2. For Model 3:
   a. Show that positive coefficients on the log-odds scale imply odds-ratio coefficients greater than 1, while negative coefficients on the log-odds scale imply odds-ratio coefficients less than 1.
   b. The odds-ratio coefficient for sex is 2.32+. What does this mean?
   c. Construct the 95% confidence interval for the effect of a unit change in years on the log-odds of lung disease; then construct an interval estimate of the effect of a unit change in years on the odds of lung disease.
   d. Compare the probability of lung disease for a white male smoker with 5 years of employment and a white male smoker with 15 years of employment.
   e. Do a likelihood ratio test of the null hypothesis that, controlling for smoking behavior, $\beta_{sex} = \beta_{race} = \beta_{yrs} = 0$.
   f. Among the reported statistics, we find chi2(4) = 46.64. Identify the corresponding null hypothesis and carry out the likelihood ratio test that yields exactly this statistic.

3. Model 5 might be called the trait model of lung disease: it assumes that once sex and race are accounted for, behaviors like smoking or working in a dirty environment have no effect. Model 6, on the other hand, is behavioral: it assumes that once disease-relevant behaviors like smoking and dusty working conditions are accounted for, sex and race have no effect. Carry out the relevant likelihood ratio tests for adjudicating between these points of view.
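
The relationship between the two coefficient scales described in the introduction can be checked numerically. The following is a minimal Python sketch (Python rather than Stata, so it runs without the data). The baseline log-odds of -2.0 is a hypothetical value chosen only for illustration; it does not come from any of the fitted models.

```python
import math

# Log-odds coefficient for years of employment, from the example in the text.
beta = 0.06

# On the plain odds scale the coefficient is exp(beta), an odds ratio.
odds_ratio = math.exp(beta)

# Hypothetical baseline log-odds for someone with x years (illustration only).
base_log_odds = -2.0

# One more year adds beta to the log-odds ...
log_odds_next = base_log_odds + beta

# ... which multiplies the odds by exp(beta), whatever the baseline is.
ratio = math.exp(log_odds_next) / math.exp(base_log_odds)

print(f"{odds_ratio:.4f}")  # 1.0618
print(f"{ratio:.4f}")       # 1.0618
```

The additive change on the log-odds scale and the multiplicative change on the odds scale describe the same effect, which is why the ratio does not depend on the baseline value.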
Data definitions:
  n    = size of group
  r    = number in group with lung disease
  smk  = dummy coded 1 for smoker
  sex  = coded 1 for male
  race = coded 1 for white
  yrs  = years of employment in a hazardous, dusty workplace

. list r n smk sex race yrs

         r     n   smk   sex   race   yrs
  1.     3    37     1     1      1     5
  2.    25   139     1     1      0     5
  3.     0     5     1     0      1     5
  4.     2    22     1     0      0     5
  5.     0    16     0     1      1     5
  6.     6    75     0     1      0     5
  7.     0     4     0     0      1     5
  8.     1    24     0     0      0     5
  9.     8    21     1     1      1    15
 10.     8    30     1     1      0    15
 11.     2     8     0     1      1    15
 12.     1     9     0     1      0    15
 13.    31    77     1     1      1    25
 14.    10    31     1     1      0    25
 15.     5    47     0     1      1    25
 16.     3    15     0     1      0    25

Model 0
. blogit r n

chi2(0) = 0.00    Prob > chi2 = .
Log Likelihood = -270.24344    Pseudo R2 = 0.0000

_outcome       Coef.   Std. Err.         z    P>|z|     [95% Conf. Interval]
_cons      -1.466337    .1082664   -13.544    0.000    -1.678535   -1.254139

Model 1
. blogit r n smk

chi2(1) = 20.59
Log Likelihood = -259.94709    Pseudo R2 = 0.0381

_outcome       Coef.   Std. Err.         z    P>|z|     [95% Conf. Interval]
smk         1.151722    .2761103     4.171    0.000     .6105558    1.692888
_cons      -2.302585    .2471969    -9.315    0.000    -2.787082   -1.818088

Model 2
. blogit r n smk sex race

chi2(3) = 30.10
Log Likelihood = -255.19315    Pseudo R2 = 0.0557

_outcome       Coef.   Std. Err.         z    P>|z|     [95% Conf. Interval]
smk         1.111579    .2780959     3.997    0.000     .5665209    1.656637
sex         1.255928    .6116363     2.053    0.040     .0571424    2.454713
race        .3616097    .2242357     1.613    0.107    -.0778842    .8011036
_cons      -3.603122    .6340384    -5.683    0.000    -4.845814   -2.360429
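
The chi2 figure printed with each model can be reproduced from the log likelihoods: it matches twice the gain in log likelihood over the intercept-only Model 0. The following is a minimal Python sketch of that arithmetic, using only numbers copied from the output above; the helper name `lr_statistic` is made up for this illustration.

```python
# Log likelihoods copied from the Stata output for Models 0, 1 and 2.
loglik = {
    "Model 0": -270.24344,
    "Model 1": -259.94709,
    "Model 2": -255.19315,
}

def lr_statistic(ll_restricted, ll_full):
    """Likelihood ratio statistic: twice the difference in log likelihoods."""
    return 2.0 * (ll_full - ll_restricted)

lr_1 = lr_statistic(loglik["Model 0"], loglik["Model 1"])
lr_2 = lr_statistic(loglik["Model 0"], loglik["Model 2"])

print(f"{lr_1:.2f}")  # 20.59, the chi2(1) reported for Model 1
print(f"{lr_2:.2f}")  # 30.10, the chi2(3) reported for Model 2
```

The same function applies to any pair of nested models in the output, with degrees of freedom equal to the number of coefficients set to zero in the restricted model.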
Model 3a
. blogit r n smk sex race yrs

chi2(4) = 46.64
Log Likelihood = -246.9252    Pseudo R2 = 0.0863

_outcome       Coef.   Std. Err.         z    P>|z|     [95% Conf. Interval]
smk         1.160457     .282286     4.111    0.000     .6071863    1.713727
sex          .842054    .6238158     1.350    0.177    -.3806026    2.064711
race       -.1339447    .2610206    -0.513    0.608    -.6455358    .3776464
yrs         .0572981    .0143055     4.005    0.000     .0292597    .0853364
_cons      -3.831732    .6377202    -6.008    0.000    -5.081641   -2.581824

Model 3b
. blogit r n smk sex race yrs, or

chi2(4) = 46.64
Log Likelihood = -246.9252    Pseudo R2 = 0.0863

_outcome  Odds Ratio   Std. Err.         z    P>|z|     [95% Conf. Interval]
smk          3.19139    .9008848     4.111    0.000      1.83526    5.549607
sex          2.32113    1.447958     1.350    0.177     .6834495    7.883016
race        .8746384    .2282987    -0.513    0.608     .5243815    1.458847
yrs         1.058971    .0151492     4.005    0.000     1.029692    1.089083

. pred p_hat1
. list smk sex race yrs p_hat1

       smk   sex   race   yrs     p_hat1
  1.     1     1      1     5   .1575361
  2.     1     1      0     5   .1761386
  3.     1     0      1     5   .0745555
  4.     1     0      0     5   .0843403
  5.     0     1      1     5   .0553503
  6.     0     1      0     5   .0627855
  7.     0     0      1     5    .024622
  8.     0     0      0     5    .028052
  9.     1     1      1    15   .2490482
 10.     1     1      0    15   .2749302
 11.     0     1      1    15   .0941357
 12.     0     1      0    15   .1061953
 13.     1     1      1    25   .3703502
 14.     1     1      0    25   .4020887
 15.     0     1      1    25   .1556219
 16.     0     1      0    25    .174045

Model 4
. blogit r n yrs

chi2(1) = 23.07
Log Likelihood = -258.70632    Pseudo R2 = 0.0427

_outcome       Coef.   Std. Err.         z    P>|z|     [95% Conf. Interval]
yrs         .0566908    .0118836     4.771    0.000     .0333994    .0799823
_cons      -2.243938    .2112435   -10.623    0.000    -2.657968   -1.829909
. pred phat2
. list yrs phat2

       yrs      phat2
  1.     5   .1234147
  2.     5   .1234147
  3.     5   .1234147
  4.     5   .1234147
  5.     5   .1234147
  6.     5   .1234147
  7.     5   .1234147
  8.     5   .1234147
  9.    15   .1988375
 10.    15   .1988375
 11.    15   .1988375
 12.    15   .1988375
 13.    25   .3043501
 14.    25   .3043501
 15.    25   .3043501
 16.    25   .3043501

Model 5
. blogit r n sex race

chi2(2) = 11.38    Prob > chi2 = 0.0034
Log Likelihood = -264.55141    Pseudo R2 = 0.0211

_outcome       Coef.   Std. Err.         z    P>|z|     [95% Conf. Interval]
sex         1.394996    .6068325     2.299    0.022     .2056258    2.584365
race        .3389905    .2204747     1.538    0.124    -.0931319    .7711129
_cons      -2.915518    .5958392    -4.893    0.000    -4.083342   -1.747695

Model 6a
. blogit r n smk yrs

chi2(2) = 44.17
Log Likelihood = -248.15959    Pseudo R2 = 0.0817

_outcome       Coef.   Std. Err.         z    P>|z|     [95% Conf. Interval]
smk         1.189703    .2813393     4.229    0.000     .6382886    1.741118
yrs         .0587493     .012215     4.810    0.000     .0348084    .0826903
_cons      -3.136865    .3201372    -9.799    0.000    -3.764322   -2.509408

Model 6b
. blogit r n smk yrs, or

chi2(2) = 44.17
Log Likelihood = -248.15959    Pseudo R2 = 0.0817

_outcome  Odds Ratio   Std. Err.         z    P>|z|     [95% Conf. Interval]
smk         3.286106    .9245108     4.229    0.000     1.893238    5.703718
yrs         1.060509    .0129541     4.810    0.000     1.035421    1.086205
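
The listed predicted probabilities (p_hat1 for Model 3, phat2 for Model 4) can be recovered from the log-odds coefficients by applying the inverse logit to the fitted linear predictor. The following is a minimal Python sketch of that step, with coefficients copied from the Model 3a and Model 4 output; the function name `predicted_prob` and the dictionary layout are assumptions made for this illustration, not anything Stata produces.

```python
import math

def predicted_prob(coefs, x):
    """Inverse logit of the linear predictor; coefs must include '_cons'."""
    xb = coefs["_cons"] + sum(coefs[name] * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-xb))

# Log-odds coefficients copied from the Model 3a and Model 4 output.
model_3a = {"smk": 1.160457, "sex": 0.842054, "race": -0.1339447,
            "yrs": 0.0572981, "_cons": -3.831732}
model_4 = {"yrs": 0.0566908, "_cons": -2.243938}

# Row 1 of the p_hat1 listing: smk = 1, sex = 1, race = 1, yrs = 5.
p_model_3 = predicted_prob(model_3a, {"smk": 1, "sex": 1, "race": 1, "yrs": 5})

# Rows 1 to 8 of the phat2 listing: yrs = 5.
p_model_4 = predicted_prob(model_4, {"yrs": 5})

print(f"{p_model_3:.4f}")  # 0.1575, listed as .1575361
print(f"{p_model_4:.4f}")  # 0.1234, listed as .1234147
```

Changing the covariate values in the dictionaries reproduces the other rows of the two listings in the same way.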