Sociology 362 Data Exercise 6 Logistic Regression 2


The questions below refer to the data and output that follow. Although the raw data are given there, you do not have to do any Stata runs in order to answer the questions. All you have to do is use the results from the logistic regression models that have already been fit.

For each model, the dependent variable is the log of the odds that a person in group j has lung disease. More formally, if $\pi_j$ is the probability that a unit in group j has lung disease (i.e., $Y_{ij} = 1$ if the ith unit in group j has lung disease, $Y_{ij} = 0$ otherwise), and $\phi_j = \pi_j / (1 - \pi_j)$ is the odds that a person in group j has lung disease, then the dependent variable is $\ln(\phi_j)$, which is also known as $\mathrm{logit}(\pi_j)$.

For some of the models fitted below, the coefficients (say, $\beta$) are reported on the log-odds scale; for others they are reported on the plain odds scale, which means the coefficients (i.e., $e^{\beta}$) are interpretable as odds ratios. For example, if years of employment (X) has a coefficient $\beta = .06$ on the log-odds scale, then its odds-ratio coefficient on the plain odds scale is $e^{.06} = 1.0618$. The $\beta = .06$ means that someone with x+1 years of employment has a log-odds of lung disease that is .06 higher than the log-odds for someone with x years of employment. The odds-ratio figure $e^{.06} = 1.0618$ means that the odds of lung disease for someone with x+1 years of employment are 1.0618 times the odds for someone with x years of employment.

I have also printed out the predicted probabilities for some of the models. You should use these to check some of your answers.

One more thing. The accompanying output was produced using Stata's blogit command because the data are grouped. If the data had come to me as $\sum_j n_j = 560$ individual observations on lung disease, sex, etc., I would have used Stata's logit or logistic command. All of the answers to all of the questions below would be the same.

1. For Model 1:
   a. Find the fitted log-odds of lung disease for someone who smokes and for someone who doesn't smoke. What is the difference between the log-odds?
   b. Find the fitted (plain) odds of lung disease for someone who smokes and someone who does not smoke. Compute the ratio of smoker to nonsmoker odds. Is it equal to $e^{\beta} = e^{1.151722} = 3.16+$?
   c. How much does smoking increase the probability of lung disease?
   d. Do a likelihood ratio test of the null hypothesis $\beta = 0$ by computing the reduction in deviance due to moving from Model 0 to Model 1.

2. For Model 3:
   a. Show that positive coefficients on the log-odds scale imply odds-ratio coefficients greater than 1, while negative coefficients on the log-odds scale imply odds-ratio coefficients less than 1.
   b. The odds-ratio coefficient for sex is 2.32+. What does this mean?
   c. Construct the 95% confidence interval for the effect of a unit change in years on the log-odds of lung disease; then construct an interval estimate of the effect of a unit change in years on the odds of lung disease.
   d. Compare the probability of lung disease for a white male smoker with 5 years of employment and a white male smoker with 15 years of employment.
   e. Do a likelihood ratio test of the null hypothesis that, controlling for smoking behavior, $\beta_{sex} = \beta_{race} = \beta_{yrs} = 0$.
   f. Among the reported statistics, we find chi2(4) = 46.64. Identify the corresponding null hypothesis and carry out the likelihood ratio test that yields exactly this statistic.

3. Model 5 might be called the trait model of lung disease: it assumes that once sex and race are accounted for, behaviors like smoking or working in a dirty environment have no effect. Model 6, on the other hand, is behavioral: it assumes that once disease-relevant behaviors like smoking and dusty working conditions are accounted for, sex and race have no effect. Carry out the relevant likelihood ratio tests for adjudicating between these points of view.
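The link between the log-odds scale and the plain odds scale described in the introduction can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names and the 10% baseline probability are assumptions made for the example and do not come from the exercise or from Stata.

```python
import math

def logit(p: float) -> float:
    """Log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

def inv_logit(eta: float) -> float:
    """Probability implied by a log-odds value eta."""
    return 1.0 / (1.0 + math.exp(-eta))

# Coefficient for years of employment from the example in the text.
beta_years = 0.06
odds_ratio_years = math.exp(beta_years)   # coefficient on the plain odds scale
print(round(odds_ratio_years, 4))         # 1.0618

# Hypothetical baseline: a 10% probability of disease at x years (assumed value).
p_at_x = 0.10
eta_at_x = logit(p_at_x)
eta_at_x_plus_1 = eta_at_x + beta_years   # one more year adds beta to the log-odds

odds_at_x = math.exp(eta_at_x)
odds_at_x_plus_1 = math.exp(eta_at_x_plus_1)
ratio = odds_at_x_plus_1 / odds_at_x      # equals exp(beta), whatever the baseline

p_at_x_plus_1 = inv_logit(eta_at_x_plus_1)
print(ratio, p_at_x_plus_1)
```

The ratio of odds is the same for any baseline, while the change in probability depends on where the baseline sits, which is why questions about probabilities have to be answered at specific covariate values.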

Data definitions: n = size of group; r = number in group with lung disease; smk = dummy coded 1 for smoker; sex coded 1 for male; race coded 1 for white; and yrs is years of employment in a hazardous, dusty workplace.

. list r n smk sex race yrs

         r     n   smk   sex  race   yrs
  1.     3    37     1     1     1     5
  2.    25   139     1     1     0     5
  3.     0     5     1     0     1     5
  4.     2    22     1     0     0     5
  5.     0    16     0     1     1     5
  6.     6    75     0     1     0     5
  7.     0     4     0     0     1     5
  8.     1    24     0     0     0     5
  9.     8    21     1     1     1    15
 10.     8    30     1     1     0    15
 11.     2     8     0     1     1    15
 12.     1     9     0     1     0    15
 13.    31    77     1     1     1    25
 14.    10    31     1     1     0    25
 15.     5    47     0     1     1    25
 16.     3    15     0     1     0    25

Model 0

. blogit r n

chi2(0)        = 0.00
Prob > chi2    = .
Log Likelihood = -270.24344
Pseudo R2      = 0.0000

_outcome       Coef.   Std. Err.         z   P>|z|   [95% Conf.   Interval]
_cons      -1.466337    .1082664   -13.544   0.000    -1.678535   -1.254139

Model 1

. blogit r n smk

chi2(1)        = 20.59
Log Likelihood = -259.94709
Pseudo R2      = 0.0381

_outcome       Coef.   Std. Err.         z   P>|z|   [95% Conf.   Interval]
smk         1.151722    .2761103     4.171   0.000     .6105558    1.692888
_cons      -2.302585    .2471969    -9.315   0.000    -2.787082   -1.818088

Model 2

. blogit r n smk sex race

chi2(3)        = 30.10
Log Likelihood = -255.19315
Pseudo R2      = 0.0557

_outcome       Coef.   Std. Err.         z   P>|z|   [95% Conf.   Interval]
smk         1.111579    .2780959     3.997   0.000     .5665209    1.656637
sex         1.255928    .6116363     2.053   0.040     .0571424    2.454713
race        .3616097    .2242357     1.613   0.107    -.0778842    .8011036
_cons      -3.603122    .6340384    -5.683   0.000    -4.845814   -2.360429
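Because Models 0 and 1 contain at most one dummy variable, their fitted log-odds coincide with the observed log-odds in the pooled groups, so the reported coefficients can be checked directly against the grouped counts. The Python sketch below does this; the variable and function names are illustrative choices, and only the (r, n, smk) columns of the data listing are used.

```python
import math

# (r, n, smk) for the 16 groups, copied from the data listing.
groups = [
    (3, 37, 1), (25, 139, 1), (0, 5, 1), (2, 22, 1),
    (0, 16, 0), (6, 75, 0), (0, 4, 0), (1, 24, 0),
    (8, 21, 1), (8, 30, 1), (2, 8, 0), (1, 9, 0),
    (31, 77, 1), (10, 31, 1), (5, 47, 0), (3, 15, 0),
]

def log_odds(cases: int, total: int) -> float:
    """Observed log-odds of disease: ln(cases / non-cases)."""
    return math.log(cases / (total - cases))

def pooled(smk_value=None):
    """Total cases and total group size, optionally restricted to one smk value."""
    rows = [g for g in groups if smk_value is None or g[2] == smk_value]
    return sum(r for r, _, _ in rows), sum(n for _, n, _ in rows)

r_all, n_all = pooled()
r_smk, n_smk = pooled(1)
r_non, n_non = pooled(0)

cons_model0 = log_odds(r_all, n_all)           # compare with Model 0 _cons
cons_model1 = log_odds(r_non, n_non)           # compare with Model 1 _cons
b_smk_model1 = log_odds(r_smk, n_smk) - cons_model1  # compare with Model 1 smk

print(n_all, cons_model0, cons_model1, b_smk_model1)
```

This shortcut works only because these two models are saturated with respect to smoking; models with several predictors, such as Model 2, have to be fit by maximum likelihood.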

Model 3a

. blogit r n smk sex race yrs

chi2(4)        = 46.64
Log Likelihood = -246.9252
Pseudo R2      = 0.0863

_outcome       Coef.   Std. Err.         z   P>|z|   [95% Conf.   Interval]
smk         1.160457     .282286     4.111   0.000     .6071863    1.713727
sex          .842054    .6238158     1.350   0.177    -.3806026    2.064711
race       -.1339447    .2610206    -0.513   0.608    -.6455358    .3776464
yrs         .0572981    .0143055     4.005   0.000     .0292597    .0853364
_cons      -3.831732    .6377202    -6.008   0.000    -5.081641   -2.581824

Model 3b

. blogit r n smk sex race yrs, or

chi2(4)        = 46.64
Log Likelihood = -246.9252
Pseudo R2      = 0.0863

_outcome  Odds Ratio   Std. Err.         z   P>|z|   [95% Conf.   Interval]
smk          3.19139    .9008848     4.111   0.000      1.83526    5.549607
sex          2.32113    1.447958     1.350   0.177     .6834495    7.883016
race        .8746384    .2282987    -0.513   0.608     .5243815    1.458847
yrs         1.058971    .0151492     4.005   0.000     1.029692    1.089083

. pred p_hat1
. list smk sex race yrs p_hat1

       smk   sex  race   yrs     p_hat1
  1.     1     1     1     5   .1575361
  2.     1     1     0     5   .1761386
  3.     1     0     1     5   .0745555
  4.     1     0     0     5   .0843403
  5.     0     1     1     5   .0553503
  6.     0     1     0     5   .0627855
  7.     0     0     1     5    .024622
  8.     0     0     0     5    .028052
  9.     1     1     1    15   .2490482
 10.     1     1     0    15   .2749302
 11.     0     1     1    15   .0941357
 12.     0     1     0    15   .1061953
 13.     1     1     1    25   .3703502
 14.     1     1     0    25   .4020887
 15.     0     1     1    25   .1556219
 16.     0     1     0    25    .174045

Model 4

. blogit r n yrs

chi2(1)        = 23.07
Log Likelihood = -258.70632
Pseudo R2      = 0.0427

_outcome       Coef.   Std. Err.         z   P>|z|   [95% Conf.   Interval]
yrs         .0566908    .0118836     4.771   0.000     .0333994    .0799823
_cons      -2.243938    .2112435   -10.623   0.000    -2.657968   -1.829909
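The predicted probabilities listed as p_hat1 can be reproduced from the Model 3a coefficients by forming the linear predictor and applying the inverse logit. The following Python sketch shows that calculation; the dictionary and function names are illustrative and are not part of the Stata output.

```python
import math

# Model 3a coefficients on the log-odds scale, copied from the output.
coef = {
    "_cons": -3.831732,
    "smk": 1.160457,
    "sex": 0.842054,
    "race": -0.1339447,
    "yrs": 0.0572981,
}

def predicted_probability(smk: int, sex: int, race: int, yrs: float) -> float:
    """Inverse logit of the Model 3a linear predictor."""
    eta = (coef["_cons"] + coef["smk"] * smk + coef["sex"] * sex
           + coef["race"] * race + coef["yrs"] * yrs)
    return 1.0 / (1.0 + math.exp(-eta))

# Rows 1, 8 and 13 of the p_hat1 listing: (smk, sex, race, yrs).
p_row1 = predicted_probability(1, 1, 1, 5)    # listed as .1575361
p_row8 = predicted_probability(0, 0, 0, 5)    # listed as .028052
p_row13 = predicted_probability(1, 1, 1, 25)  # listed as .3703502
print(p_row1, p_row8, p_row13)
```

Small discrepancies in the last digits are possible because the printed coefficients are rounded to seven significant figures.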

. pred phat2
. list yrs phat2

       yrs      phat2
  1.     5   .1234147
  2.     5   .1234147
  3.     5   .1234147
  4.     5   .1234147
  5.     5   .1234147
  6.     5   .1234147
  7.     5   .1234147
  8.     5   .1234147
  9.    15   .1988375
 10.    15   .1988375
 11.    15   .1988375
 12.    15   .1988375
 13.    25   .3043501
 14.    25   .3043501
 15.    25   .3043501
 16.    25   .3043501

Model 5

. blogit r n sex race

chi2(2)        = 11.38
Prob > chi2    = 0.0034
Log Likelihood = -264.55141
Pseudo R2      = 0.0211

_outcome       Coef.   Std. Err.         z   P>|z|   [95% Conf.   Interval]
sex         1.394996    .6068325     2.299   0.022     .2056258    2.584365
race        .3389905    .2204747     1.538   0.124    -.0931319    .7711129
_cons      -2.915518    .5958392    -4.893   0.000    -4.083342   -1.747695

Model 6a

. blogit r n smk yrs

chi2(2)        = 44.17
Log Likelihood = -248.15959
Pseudo R2      = 0.0817

_outcome       Coef.   Std. Err.         z   P>|z|   [95% Conf.   Interval]
smk         1.189703    .2813393     4.229   0.000     .6382886    1.741118
yrs         .0587493     .012215     4.810   0.000     .0348084    .0826903
_cons      -3.136865    .3201372    -9.799   0.000    -3.764322   -2.509408

Model 6b

. blogit r n smk yrs, or

chi2(2)        = 44.17
Log Likelihood = -248.15959
Pseudo R2      = 0.0817

_outcome  Odds Ratio   Std. Err.         z   P>|z|   [95% Conf.   Interval]
smk         3.286106    .9245108     4.229   0.000     1.893238    5.703718
yrs         1.060509    .0129541     4.810   0.000     1.035421    1.086205
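A likelihood ratio statistic for two nested models is twice the difference in their log likelihoods, which is the same as the reduction in deviance. As a sketch of the mechanics, the Python below compares Model 0 with Model 6a and recovers the chi2(2) value printed with Model 6a. The function name is an illustrative choice, and the p-value line relies on the fact that the chi-square upper-tail probability with exactly 2 degrees of freedom is exp(-x/2); other degrees of freedom need a chi-square table or a statistics library.

```python
import math

def lr_statistic(loglik_restricted: float, loglik_full: float) -> float:
    """Likelihood ratio statistic: 2 * (lnL_full - lnL_restricted)."""
    return 2.0 * (loglik_full - loglik_restricted)

# Log likelihoods copied from the output.
loglik_model0 = -270.24344    # intercept only
loglik_model6a = -248.15959   # smk and yrs

lr_0_vs_6a = lr_statistic(loglik_model0, loglik_model6a)
df = 2  # Model 6a adds two coefficients (smk, yrs) to Model 0

# Upper-tail chi-square probability; this closed form holds for df = 2 only.
p_value = math.exp(-lr_0_vs_6a / 2.0)

print(round(lr_0_vs_6a, 2), df, p_value)
```

The same function applies to any pair of nested models in the output, with the degrees of freedom set to the number of coefficients constrained to zero under the null hypothesis.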